Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor page management to a PagesManager class #8639

Merged
merged 9 commits into from May 13, 2024

Conversation

kmcgrady
Copy link
Collaborator

@kmcgrady kmcgrady commented May 9, 2024

Describe your changes

Migrates pages are primarily managed in the source_util module to a PagesManager class. Because V2 can have a dynamic set of pages per session, the Pages Manager instance will live on the AppSession. For V1, we will leverage static variables/methods for page watching.

Testing Plan

  • Original Unit tests are applied (with patches adjusted)
  • Unit tests are applied for the PagesManager component

Contribution License Agreement

By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.

@kmcgrady kmcgrady changed the title Refactor backend navigation to handle multiple versions of MPA Refactor page management to a PagesManager class May 9, 2024
@kmcgrady kmcgrady marked this pull request as ready for review May 10, 2024 17:33
@kmcgrady kmcgrady requested a review from a team as a code owner May 10, 2024 17:33
absolute_path = self.get_absolute_path(self.root, self.path)
return super().validate_absolute_path(root, absolute_path)

raise e
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vdonato This might be the most controversial part. My assumption here is that we need these paths to actually 404 for the sake of identifying the correct path that provides the stream.

The rest can essentially be handled by the frontend, but let me know if my assumption is wrong.

Copy link
Collaborator

@vdonato vdonato left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some comments, but overall LGTM

@@ -69,7 +70,7 @@ def setUp(self):

@patch("streamlit.watcher.local_sources_watcher.PathWatcher")
def test_just_script(self, fob):
lso = local_sources_watcher.LocalSourcesWatcher(SCRIPT_PATH)
lso = local_sources_watcher.LocalSourcesWatcher(PagesManager(SCRIPT_PATH))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: not quite related to this change, but there variable used here is frequently lso (not sure if LocalSourcesWatcher previously had a different name and this variable stayed the same when class name was changed by a search and replace) when lsw would be clearer

_LOGGER: Final = get_logger(__name__)


class V1PagesManager:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this name should probably be something more like V1PagesWatcher since the class just installs watchers/doesn't do anything else related to page management

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would also suggest to flip the word order and call it PagesManagerV1. This way, file sorting etc. shows the different versions next to each other

# This is a static method because we only want to watch the pages directory
# once on initial load.
@staticmethod
def watch_pages_dir(pages_manager):
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to add type annotations to all parameters/return values throughout this file (same comment throughout the file)

Comment on lines 56 to 58
_cached_pages: dict[str, dict[str, str]] | None = None
_pages_cache_lock = threading.RLock()
_on_pages_changed = Signal(doc="Emitted when the set of pages has changed")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fairly sure all of these need to be moved into the class __init__, otherwise we'll run into some unexpected behavior.

  • _cached_pages doesn't really matter since the class-level variable defined here will just be different from the instance-level one set in get_pages
  • _pages_cache_lock and _on_pages_changed, however, will be shared by all class instances, which may result in unnecessary lock contention and signals emitted when pages for a different AppSession are changed

# once on initial load.
@staticmethod
def watch_pages_dir(pages_manager):
if V1PagesManager.is_watching_pages_dir:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have a potential race condition that can lead to multiple watchers here? If the goal is to have a single watcher, we might need to use

lock = threading.Lock()
with lock:

to make the lookup for if V1PagesManager.is_watching_pages_dir: and move V1PagesManager.is_watching_pages_dir = True into the lock.
If we just have a single thread and this is in the setup logic of Streamlit, it might make sense to add a comment to the method that its not thread-safe.

# once on initial load.
@staticmethod
def watch_pages_dir(pages_manager: PagesManager):
lock = threading.Lock()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we have to initialize the lock as a class variable; this way we create a new lock everytime and it does not do anything (sorry, my previous comment was not very accurate and I gave a bad example)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmhm maybe the _pages_cache_lock can be re-used from this line. but only if we can make it a class-level lock, because the watcher should exist only globally right?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be separate it cause the pages_cache_lock has to be an instance variable. I created a separate lock.

Copy link
Collaborator

@vdonato vdonato May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this lock currently does anything since it's created right before use

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yea, @vdonato that was my bad. I corrected it here. 😞https://github.com/streamlit/streamlit/pull/8650/files


class PagesManager:
def __init__(self, main_script_path, **kwargs):
self._cached_pages: dict[str, PageInfo] | None = None
Copy link
Collaborator

@raethlein raethlein May 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it might be a good idea to define a type-alias for the key and give it a proper name, such as
type PageName = str if the key is the page name, or PageHash or whatever and use that type for the key everywhere where we expect it. Then its easier to understand what we put in there semantically. Or perhaps a comment will do it 🙂
One pattern I personally use often is to name the dict / map like this: self._pagehash_to_cached_pages but I believe some people might find it too verbose

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Love the type alias, and I'll apply it. I'm going to keep the dict for now cause the type is quite descriptive for it, and it makes sense (and it also aligns to the source_util phrasing).

Comment on lines +139 to +140
callback: Callable[[str], None],
) -> Callable[[], None]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we probably should have a docstring here saying what will be passed as a string to the Callable (presumably the page_hash)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added the docstring in the new PR, though the argument is never used. I believed it's the path detected from the watcher.

@kmcgrady kmcgrady merged commit 86965dc into feature/mpa-v2 May 13, 2024
31 checks passed
@kmcgrady kmcgrady deleted the refactor/pages-manager-abstraction branch May 13, 2024 19:16
@kmcgrady kmcgrady restored the refactor/pages-manager-abstraction branch May 13, 2024 19:17
@kmcgrady kmcgrady deleted the refactor/pages-manager-abstraction branch May 13, 2024 19:29
kmcgrady added a commit that referenced this pull request May 14, 2024
## Describe your changes

From #8639, @raethlein made a good claim about the lock as well as some
types to make things clear, so I addressed those here.

## Testing Plan

- Type checks should pass
- Existing tests should pass with lock behavior (we don't usually manage
concurrency testing at the moment)

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
kmcgrady added a commit that referenced this pull request May 20, 2024
## Describe your changes

Migrates pages are primarily managed in the `source_util` module to a
`PagesManager` class. Because V2 can have a dynamic set of pages per
session, the Pages Manager instance will live on the `AppSession`. For
V1, we will leverage static variables/methods for page watching.

## Testing Plan

- Original Unit tests are applied (with patches adjusted)
- Unit tests are applied for the PagesManager component

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
kmcgrady added a commit that referenced this pull request May 20, 2024
## Describe your changes

From #8639, @raethlein made a good claim about the lock as well as some
types to make things clear, so I addressed those here.

## Testing Plan

- Type checks should pass
- Existing tests should pass with lock behavior (we don't usually manage
concurrency testing at the moment)

---

**Contribution License Agreement**

By submitting this pull request you agree that all contributions to this
project are made under the Apache 2.0 license.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants