New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Refactor page management to a PagesManager class #8639
Conversation
absolute_path = self.get_absolute_path(self.root, self.path) | ||
return super().validate_absolute_path(root, absolute_path) | ||
|
||
raise e |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vdonato This might be the most controversial part. My assumption here is that we need these paths to actually 404 for the sake of identifying the correct path that provides the stream.
The rest can essentially be handled by the frontend, but let me know if my assumption is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments, but overall LGTM
@@ -69,7 +70,7 @@ def setUp(self): | |||
|
|||
@patch("streamlit.watcher.local_sources_watcher.PathWatcher") | |||
def test_just_script(self, fob): | |||
lso = local_sources_watcher.LocalSourcesWatcher(SCRIPT_PATH) | |||
lso = local_sources_watcher.LocalSourcesWatcher(PagesManager(SCRIPT_PATH)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: not quite related to this change, but there variable used here is frequently lso
(not sure if LocalSourcesWatcher
previously had a different name and this variable stayed the same when class name was changed by a search and replace) when lsw
would be clearer
_LOGGER: Final = get_logger(__name__) | ||
|
||
|
||
class V1PagesManager: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this name should probably be something more like V1PagesWatcher
since the class just installs watchers/doesn't do anything else related to page management
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: I would also suggest to flip the word order and call it PagesManagerV1
. This way, file sorting etc. shows the different versions next to each other
# This is a static method because we only want to watch the pages directory | ||
# once on initial load. | ||
@staticmethod | ||
def watch_pages_dir(pages_manager): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to add type annotations to all parameters/return values throughout this file (same comment throughout the file)
_cached_pages: dict[str, dict[str, str]] | None = None | ||
_pages_cache_lock = threading.RLock() | ||
_on_pages_changed = Signal(doc="Emitted when the set of pages has changed") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fairly sure all of these need to be moved into the class __init__
, otherwise we'll run into some unexpected behavior.
_cached_pages
doesn't really matter since the class-level variable defined here will just be different from the instance-level one set inget_pages
_pages_cache_lock
and_on_pages_changed
, however, will be shared by all class instances, which may result in unnecessary lock contention and signals emitted when pages for a differentAppSession
are changed
# once on initial load. | ||
@staticmethod | ||
def watch_pages_dir(pages_manager): | ||
if V1PagesManager.is_watching_pages_dir: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a potential race condition that can lead to multiple watchers here? If the goal is to have a single watcher, we might need to use
lock = threading.Lock()
with lock:
to make the lookup for if V1PagesManager.is_watching_pages_dir:
and move V1PagesManager.is_watching_pages_dir = True
into the lock.
If we just have a single thread and this is in the setup logic of Streamlit, it might make sense to add a comment to the method that its not thread-safe.
# once on initial load. | ||
@staticmethod | ||
def watch_pages_dir(pages_manager: PagesManager): | ||
lock = threading.Lock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have to initialize the lock as a class variable; this way we create a new lock everytime and it does not do anything (sorry, my previous comment was not very accurate and I gave a bad example)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmhm maybe the _pages_cache_lock
can be re-used from this line. but only if we can make it a class-level lock, because the watcher should exist only globally right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be separate it cause the pages_cache_lock has to be an instance variable. I created a separate lock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think this lock currently does anything since it's created right before use
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, @vdonato that was my bad. I corrected it here. 😞https://github.com/streamlit/streamlit/pull/8650/files
|
||
class PagesManager: | ||
def __init__(self, main_script_path, **kwargs): | ||
self._cached_pages: dict[str, PageInfo] | None = None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: it might be a good idea to define a type-alias for the key and give it a proper name, such as
type PageName = str
if the key is the page name, or PageHash
or whatever and use that type for the key everywhere where we expect it. Then its easier to understand what we put in there semantically. Or perhaps a comment will do it 🙂
One pattern I personally use often is to name the dict / map like this: self._pagehash_to_cached_pages
but I believe some people might find it too verbose
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Love the type alias, and I'll apply it. I'm going to keep the dict for now cause the type is quite descriptive for it, and it makes sense (and it also aligns to the source_util phrasing).
callback: Callable[[str], None], | ||
) -> Callable[[], None]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: we probably should have a docstring here saying what will be passed as a string to the Callable (presumably the page_hash)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added the docstring in the new PR, though the argument is never used. I believed it's the path detected from the watcher.
## Describe your changes From #8639, @raethlein made a good claim about the lock as well as some types to make things clear, so I addressed those here. ## Testing Plan - Type checks should pass - Existing tests should pass with lock behavior (we don't usually manage concurrency testing at the moment) --- **Contribution License Agreement** By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.
## Describe your changes Migrates pages are primarily managed in the `source_util` module to a `PagesManager` class. Because V2 can have a dynamic set of pages per session, the Pages Manager instance will live on the `AppSession`. For V1, we will leverage static variables/methods for page watching. ## Testing Plan - Original Unit tests are applied (with patches adjusted) - Unit tests are applied for the PagesManager component --- **Contribution License Agreement** By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.
## Describe your changes From #8639, @raethlein made a good claim about the lock as well as some types to make things clear, so I addressed those here. ## Testing Plan - Type checks should pass - Existing tests should pass with lock behavior (we don't usually manage concurrency testing at the moment) --- **Contribution License Agreement** By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.
Describe your changes
Migrates pages are primarily managed in the
source_util
module to aPagesManager
class. Because V2 can have a dynamic set of pages per session, the Pages Manager instance will live on theAppSession
. For V1, we will leverage static variables/methods for page watching.Testing Plan
Contribution License Agreement
By submitting this pull request you agree that all contributions to this project are made under the Apache 2.0 license.