-
Notifications
You must be signed in to change notification settings - Fork 308
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enhancement: Query isin #847
base: master
Are you sure you want to change the base?
Conversation
Indeed! More are welcome :D This looks like a good addition to me. On a first read this seems straight-forward and great, but I'll give it a second pass and go over implementation a bit more slowly. The main expression docs page ought to have it's example updated with this, and some pages made for these new methods. Thanks! |
Depending on how much of the PR is accepted I'm perfectly happy to add a PR with the added documentation for the website too. |
My initial reaction: I really like the addition of The Anyhow, if I get a vote (an admittedly questionable proposition) I'd vote for |
Thank you for this PR :) I pretty much agree with the following:
|
Removed Query.isnot() expression. Everybody is correct, it's confusing. |
lektor/db.py
Outdated
|
||
if self.__evaluated: | ||
return self.__value | ||
self.__value = [eval(v) for v in self.__value] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this caching is not correct. What happens if __eval__
is called a second time with a different record
(and the sequence contains an Expression
)?
(Is the caching even worth the effort? The evaluation of non-Expression
values is cheap. The Expression
-valued items are potentially heavy, but how often do they get re-evaluated for the same record
?)
Cache or not, there is another optimization that could be made here. I'm not completely sure it's worth the effort, but it's worth considering.
Instead of always evaluating every item in the sequence, __eval__
could return an iterator/generator that evaluates the items one at a time, as they are iterated over, thus saving some (potentially heavy) evaluations if the calling is in
operator hits early.
This would make the caching logic more complicated, but not overly so.
If the caching is eliminated, then this optimization is simple enough that it's almost certainly worth making.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes the caching is a worry. I'll remove the __evaluated
test.
But returning a generator looks problematic too. Since the _ContainmentExpr into which
it is placed (see the isin
method) might be evalued more that once in which case the generator will be already
exhausted on the second evaluation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! I take the second paragraph back. _ContainmentExpr re-evalues seq for each on its eval's
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. Made the evaluation "lazier" but still had to guard against multiple evaluations of a generator
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if one just "listified" the sequence/iterable in the constructor for _SeqExpr? Something like:
class _SeqExpr(Expression):
def __init__(self, value):
if isinstance(value, collections.abc.Collection):
self.__value = value
else:
try:
self.__value = list(value)
except TypeError:
self.__value = [value]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't pick up the case for value being a generator (which can be the case -- my test in
test_db.py fail). I could use isinstance(value, Iterable) but....
There is a footnote in https://docs.python.org/3/library/collections.abc.html that says:
Checking isinstance(obj, Iterable) detects classes that are registered as Iterable or that have an iter() method, but it does not detect classes that iterate with the getitem() method. The only reliable way to determine whether an object is [iterable] is to call iter(obj)
Which the reason for the iter(value) test.
Also... I don't know if I should test for value being a string. Since currently: F.name.isin("ABC") == F.name.isin(['A','B','C'])
(since list("ABC") => ['A','B','C'], which is "probably" not what a user intended.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This won't pick up the case for value being a generator
I'm pretty sure it will. Note that the if isinstance(value, Collection)
check is just an optimization — it could be omitted altogether. The check for iterability happens at try: self.__value = list(value)
(which is essentially the same test as your try: iter(value)
, except that now we pre-cache the results of the iteration.
The idea behind the if instance(value, Collection)
check is to identify those values for which iter(value)
always returns a "fresh" iterator. Since we can already iterate these values multiple times without issue, we can skip the value = list(value)
cache step.
For the sake of concreteness, here is my previous proposal, in its entirety. I've rewritten the constructor for (I think) clarity, but it's equivalent to my previous proposal.)
class _SeqExpr(Expression):
def __init__(self, value):
if not isinstance(value, collections.abc.Collection):
try:
value = list(value)
except TypeError:
value = [value]
self.__value = value
def __eval__(self, record):
for item in self.__value:
if instance(item, record):
yield item["_id"]
elif isinstance(item, Expression):
yield item.__eval__(record)
else:
yield item
Since Collection
s are iterable and sized, the assumption made here is that iter(collection)
always produces a fresh iterable (since one would expect len(list(iter(collection))) == len(collection)
to always be true.)
Also... I don't know if I should test for value being a string. Since currently: F.name.isin("ABC") == F.name.isin(['A','B','C']) (since list("ABC") => ['A','B','C'], which is "probably" not what a user intended.
I don't have strong feelings, but, probably should either go all-in on the DWIM and test for string, or skip the DWIM altogether and just throw TypeError if value
is not iterable. I can see arguments either way. My personal feeling (at the moment) is explicit is better, and skip the DWIM. But, since Expression
s are largely intended for use in templates, I can see where the DWIM approach is reasonable, too.
If one does go full DWIM, perhaps things should be made so that F.name.isin('Fred', 'Joe')
is equivalent to F.name.isin(['Fred', 'Joe'])
? In that case the DWIM logic should probably be moved to Expression.isin
and Expression.isnotin
(abstracted to a separate helper function).
Now that I have had a couple of pull-requests merged the awesome lektor maintainers
probably think "I'm not a moron" (ref. Linus). So I'm going to impertinently proffer an enhancement...
isin
ExpressionLektor has a
contains
expression methode.g.
F.tags.contains('amazing')
but it doesn't have the complement viz:F.tag.isin(['amazing','zounds!'])
The
isin
name was chosen because is serves the same purpose as panda's DataFrameisin
.With an
isin
(andisnotin
) some of the necessarily "clunky" expression syntax forand
s andor
s goes away (sqlalchemy has this problem too).Also you don't need a separate syntax for jinja templates as is currently required.
It plays well with fields that are lists. e.g. with a model field like:
it's easy to select the correct slides:
There is also a
isnotin
method that makes long chains ofand
s moot:becomes:
This is the same as
F.name.isin(seq).false()
but possibly moreuser friendly.
There is also an
isnot
method attached to the query so thatAllows the entire query to be complemented.
How safe are these changes?
These additions should be safe for existing lektor projects
in the wild since everything is backwards compatible.
The exception being if someone is storing
get_structure_hash
values long term (Since I have changedQuery.__get_lektor_param_hash__
).Notes
Query.isnot
doesn't invert is_hidden and is_discoverable values.This seems more natural. You want to complement the query after selecting
-- or not -- for is_hidden, is_discoverable.
_id
values are special in that you can return full records:whereas for any other attribute you will need to do something
like.
Description of Changes