Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement BaseFilesystemReader in S3Reader and SharePoint Reader #13408

Merged
merged 20 commits into from
May 17, 2024

Conversation

Javtor
Copy link
Contributor

@Javtor Javtor commented May 10, 2024

Description

Add BaseFilesystemReader interface to be able to break down the process of reading files from a file system in two separate steps: first listing the files and then reading each individual file. Additionally, add a method to be able to read the info for a file without needing to actually get the file content from the filesystem.

Depends on #13424 being merged first, because the changes done to integrations here (like S3 and SharePoint) depend on the core package version being bumped first to pass CI.

Version Bump?

  • Yes

Type of Change

  • New feature (non-breaking change which adds functionality)

How Has This Been Tested?

  • Added new unit/integration tests

@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label May 10, 2024
@Javtor Javtor marked this pull request as draft May 10, 2024 00:09
@Javtor Javtor changed the title [WIP] Add BaseFilesystemReader interface + some implementations Implement BaseFilesystemReader in S3Reader and SharePoint Reader May 11, 2024
@Javtor Javtor marked this pull request as ready for review May 11, 2024 00:25
@Disiok
Copy link
Collaborator

Disiok commented May 13, 2024

@Javtor you can update the base branch to be javier/fs-reader-interface-simpledirreader to show the (stacked) diff.

@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels May 16, 2024
@Javtor Javtor changed the base branch from main to javier/fs-reader-interface-simpledirreader May 16, 2024 16:21
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels May 16, 2024
@Javtor Javtor requested a review from nerdai May 16, 2024 17:08
Copy link
Collaborator

@logan-markewich logan-markewich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a good start to implementing this, thanks @Javtor (also yay tests)

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label May 16, 2024
from llama_index.core.readers.base import BaseReader, BasePydanticReader
from llama_index.core.schema import Document
from llama_index.core.bridge.pydantic import Field


class S3Reader(BasePydanticReader):
class S3Reader(BasePydanticReader, BaseFilesystemReader):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I almost wonder if BaseFileSystemReader should just be subclassing BasePydanticReader? If we follow the dependency list of classes, now S3Reader inherits both BaseReader and BasePydanticReader lol

(This is where all the subclasses start to get confusing haha)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agreed, this multiple inheritance is confusing.

Base automatically changed from javier/fs-reader-interface-simpledirreader to main May 17, 2024 17:15
@dosubot dosubot bot added size:XL This PR changes 500-999 lines, ignoring generated files. and removed size:L This PR changes 100-499 lines, ignoring generated files. labels May 17, 2024
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:XL This PR changes 500-999 lines, ignoring generated files. labels May 17, 2024
@Javtor Javtor merged commit e6daed5 into main May 17, 2024
8 checks passed
@Javtor Javtor deleted the javier/fs-reader-interface branch May 17, 2024 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants