-
Notifications
You must be signed in to change notification settings - Fork 4.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement BaseFilesystemReader in S3Reader and SharePoint Reader #13408
Conversation
@Javtor you can update the base branch to be |
…rface-simpledirreader
…/fs-reader-interface
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a good start to implementing this, thanks @Javtor (also yay tests)
from llama_index.core.readers.base import BaseReader, BasePydanticReader | ||
from llama_index.core.schema import Document | ||
from llama_index.core.bridge.pydantic import Field | ||
|
||
|
||
class S3Reader(BasePydanticReader): | ||
class S3Reader(BasePydanticReader, BaseFilesystemReader): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I almost wonder if BaseFileSystemReader should just be subclassing BasePydanticReader? If we follow the dependency list of classes, now S3Reader inherits both BaseReader and BasePydanticReader lol
(This is where all the subclasses start to get confusing haha)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
agreed, this multiple inheritance is confusing.
…rface-simpledirreader
…/fs-reader-interface
Description
Add
BaseFilesystemReader
interface to be able to break down the process of reading files from a file system in two separate steps: first listing the files and then reading each individual file. Additionally, add a method to be able to read the info for a file without needing to actually get the file content from the filesystem.Depends on #13424 being merged first, because the changes done to integrations here (like S3 and SharePoint) depend on the core package version being bumped first to pass CI.
Version Bump?
Type of Change
How Has This Been Tested?