Restock & Price monitor - Use itemprop where available #2041
Also, I'm not sure if link elements are the only place these can occur. To me, the Google documentation looks like meta elements are also supported, e.g. And, while it may be out of scope for this PR, there is also the RDFa format, which looks like this:
@druppelt I just realised that we store What do you think about this XPath? Do all
I also added RDFa-style detection, but I couldn't find a web page "in the wild" to test it on. Do you have any links I could test it against?
Ah, according to https://schema.org/availability

- Microdata (completed in this PR) ✔️
- RDFa (not yet, needs also
- JSON-LD (handled by the 'follow JSON-LD embedded data?' prompt, but this should work here also, needs to be integrated)
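To make the Microdata and RDFa cases above concrete, here is a minimal sketch that uses only the standard library. It is not the code in this PR: `AvailabilityParser` and `find_availability` are hypothetical names, and the accepted RDFa `property` spellings (`schema:availability` with a prefix, or plain `availability` when a `vocab` is set) are an assumption about what pages typically emit.

```python
from html.parser import HTMLParser


class AvailabilityParser(HTMLParser):
    """Collect schema.org availability values from Microdata and RDFa markup."""

    def __init__(self):
        super().__init__()
        self.found = []  # (syntax, value) tuples in document order

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        # Microdata: <link itemprop="availability" href="..."> or
        # <meta itemprop="availability" content="...">
        if a.get("itemprop") == "availability":
            value = a.get("href") or a.get("content")
            if value:
                self.found.append(("microdata", value))
        # RDFa (assumed spellings): property="schema:availability" or
        # property="availability"
        elif a.get("property") in ("schema:availability", "availability"):
            value = a.get("href") or a.get("content") or a.get("resource")
            if value:
                self.found.append(("rdfa", value))


def find_availability(html):
    """Return every availability value found in the given HTML string."""
    parser = AvailabilityParser()
    parser.feed(html)
    parser.close()
    return parser.found
```

Because the check is on the attribute rather than the tag name, `link`, `meta`, and any other element carrying `itemprop="availability"` are all picked up. JSON-LD is deliberately left out of this sketch.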
…edetection.io into 2039-restock-use-itemprop
scrapinghub/extruct#232 stuck here
    @@ -240,7 +240,7 @@ def _get_stripped_text_from_json_match(match):
     # ensure_is_ldjson_info_type - str "product", optional, "@type == product" (I dont know how to do that as a json selector)
     def extract_json_as_string(content, json_filter, ensure_is_ldjson_info_type=None):
         stripped_text_from_html = False

     # https://github.com/dgtlmoon/changedetection.io/pull/2041#issuecomment-1848397161w
all of this code can be replaced with extruct
    @property
    def has_restock_info(self):
        # has either price or availability
        if self.get('restock'):
this should be moved to the actual Restock object
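One way to read this suggestion is sketched below. It is only an illustration: the `Restock` class shape, the `has_info` property name, and the `price` and `in_stock` keys are all assumptions, not the project's actual object.

```python
class Restock(dict):
    """Hypothetical container for restock data (keys are assumed)."""

    @property
    def has_info(self):
        # True when either a price or an availability flag was recorded.
        # `in_stock` may legitimately be False, so compare against None.
        return self.get("price") is not None or self.get("in_stock") is not None
```

With the check living on the object itself, the watch would only need to ask its restock object whether it has info, instead of inspecting its own keys.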
Re #2039 -
Make the restock monitor use machine-readable data first if available, then fall back to what the browser scraper reports.
https://schema.org/ItemAvailability
<link itemprop="availability" href="https://schema.org/OutOfStock" />
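The "machine data first, scraper second" rule could look roughly like the sketch below. The function names are hypothetical, and which `ItemAvailability` values count as in stock is a policy assumption made here for illustration, not something this PR defines.

```python
from typing import Optional

# Assumption: how each schema.org ItemAvailability value maps to a boolean.
IN_STOCK_VALUES = {"instock", "limitedavailability", "onlineonly", "instoreonly"}
OUT_OF_STOCK_VALUES = {"outofstock", "soldout", "discontinued"}


def availability_to_bool(value: Optional[str]) -> Optional[bool]:
    """Map an availability value to True/False, or None if unknown."""
    if not value:
        return None
    # Accept "https://schema.org/InStock", "schema:InStock" and "InStock".
    token = value.strip().rstrip("/").split("/")[-1].split(":")[-1].lower()
    if token in IN_STOCK_VALUES:
        return True
    if token in OUT_OF_STOCK_VALUES:
        return False
    return None


def resolve_in_stock(machine_value: Optional[str], scraper_in_stock: bool) -> bool:
    """Prefer the markup's answer; fall back to the scraper's result."""
    from_markup = availability_to_bool(machine_value)
    return scraper_in_stock if from_markup is None else from_markup
```

For the `link` element above, `resolve_in_stock("https://schema.org/OutOfStock", True)` returns `False`, because the markup overrides the scraper. Values this sketch does not classify, such as `PreOrder`, fall through to the scraper's result.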
Todo
- `'track_ldjson_price_data'` set should now be "restock detection" mode
- `'watch - in_stock'` should be `watch - restock - in_stock` bool
- `exception:...` errors exist with test set
- Actually, it would be better to add an extra attribute on the watch to record the most recent price (and stock status?) too; then we can add a setting to alert if the price moves
- Also add two extra columns in the watch-overview table (maybe this should be extensible so different functionalities can announce that they want an extra column in the table)
- Maybe some toggle button like "follow content change" or "follow price/stock change"