3033 Introduced V4 Oral Arguments Search API #4056
Conversation
- Added V4 OA Search API tests
Force-pushed from d3b1af7 to bc86869
… disabled - Refactored add_es_highlighting to centralize the highlights dict build using build_highlights_dict.
This sounds great, Alberto, thank you. Re the transcript, we can cross that bridge when we start having them. The good news about oral arguments is we have so few of them, we can do ugly non-performant things and get away with it! Eduardo, all yours for review! Thank you both!
…s-es-v3-oa-search-api 2680 Compatibility tweaks to make ES V3 OA API fully compatible with Solr version
Force-pushed from dc25500 to 56e5678
The code looks good. I tested using different filter combinations and it worked properly.
This PR introduces the `ORAL_ARGUMENT` search type to the V4 Search API. The object's structure looks as follows:
It possesses the same features as other V4 endpoints, such as cursor pagination and obtaining the total document count from a cardinality count when matched hits exceed 10,000 documents.
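The cursor used by the V4 endpoints can be thought of as an opaque encoding of the last hit's `search_after` sort values. This is only a minimal sketch of that idea; the helper names here are illustrative, not the actual implementation in this PR:

```python
import base64
import json


def encode_cursor(sort_values):
    """Pack the last hit's ES sort values (e.g. [score, id]) into an
    opaque, URL-safe cursor string for the next request."""
    payload = json.dumps(sort_values).encode("utf-8")
    return base64.urlsafe_b64encode(payload).decode("ascii")


def decode_cursor(cursor):
    """Recover the search_after values so ES can resume the next page
    exactly after the last document returned."""
    payload = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(payload)
```

A round trip like `decode_cursor(encode_cursor([12.5, 4056]))` returns the original sort values, which is what makes the cursor stable across pages.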
Sorting
It supports the same sorting keys as in the frontend:
As in RECAP and People, it was required to apply a custom function score to sort and paginate documents because `dateArgued` can be `None` in Audios. So the sorting keys in the OA V4 Search API are as follows:

1° `score desc` or the function score for `dateArgued`
2° `id desc` (as the tiebreaker key)
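A function score along these lines can make `None` dates sortable. This is a sketch under the assumption that a missing `dateArgued` is scored 0 (so those documents sort last); it is not the exact query or Painless script the PR builds:

```python
def date_argued_function_score():
    """Build a function_score clause whose score is dateArgued as epoch
    millis, with missing dates scored 0 so they sort last and keep
    search_after pagination stable."""
    return {
        "function_score": {
            "query": {"match_all": {}},
            "script_score": {
                "script": {
                    # Painless sketch: missing dateArgued -> 0.
                    "source": (
                        "doc['dateArgued'].size() == 0 ? 0 : "
                        "doc['dateArgued'].value.toInstant().toEpochMilli()"
                    )
                }
            },
            # Replace the relevance score with the script's value entirely.
            "boost_mode": "replace",
        }
    }
```

The query then sorts by `_score desc` with `id desc` as the tiebreaker, which is why every document needs a numeric score even when `dateArgued` is absent.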
I refactored `build_es_base_query` because applying the custom function score to plain documents like OA required a different approach, which was implemented in `combine_plain_filters_and_queries`.

Highlighting

As in the other search types, highlighting is disabled by default. When enabled by passing `highlight=on`, the HL fields are the same as in the frontend:
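A centralized highlight builder in the spirit of `build_highlights_dict` might look like the following. This is a sketch only: the field list and the function's signature here are hypothetical, not the PR's code:

```python
# Hypothetical OA highlight fields; the real list lives in the frontend
# search settings. fragment_size 0 means "highlight the whole field".
OA_HL_FIELDS = {"caseName": 0, "judge": 0, "docketNumber": 0, "text": 500}


def build_highlights_dict(hl_fields, hl_tag="mark"):
    """Build the ES 'highlight' section in one place: one entry per
    field, each wrapped in the requested highlight tag."""
    fields = {}
    for field, fragment_size in hl_fields.items():
        fields[field] = {
            "fragment_size": fragment_size,
            # 0 fragments tells ES to return the entire field highlighted.
            "number_of_fragments": 1 if fragment_size else 0,
            "pre_tags": [f"<{hl_tag}>"],
            "post_tags": [f"</{hl_tag}>"],
        }
    return {"fields": fields}
```

Centralizing the dict build like this lets `add_es_highlighting` include or exclude fields (such as `text`) from a single source of truth.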
The "text" field holds the audio transcript. Although we currently don't have transcripts, I left the `text` field prepared to be retrieved from ES if highlighting is enabled, or to get the snippet from the database if highlighting is off. To achieve this, I refactored the `add_es_highlighting` method so it can highlight or exclude fields as required, centralizing this process within the `build_highlights_dict` method.

In relation to this, I noticed an improvement that can be made to the `Audio` model for when we start getting transcripts for all audio files. Currently, a JSON object is stored in `stt_google_response`, and the transcript to display is retrieved by the `transcript` property. However, considering that transcripts can be massive, retrieving the whole JSON from the database and then extracting the best transcript via the property can be expensive, especially when merging transcripts from the database while highlighting is disabled in the API. A better approach would be to also store the final plain-text transcript in its own field, so we can get the snippet directly from the database using the `Substr` helper. Let me know what you think.
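The cost difference between the two approaches can be sketched in plain Python. The first function mimics what a `transcript`-style property has to do with the stored Google STT JSON; the second shows the cheap prefix read that a dedicated plain-text column would allow (in Django, via a `Substr('transcript_text', 1, 500)` annotation). The function names and the exact JSON shape here are illustrative assumptions, not the actual model code:

```python
import json


def transcript_from_stt(stt_google_response):
    """Current approach: deserialize the entire STT JSON blob, then join
    the best alternative of each result. The whole (possibly massive)
    JSON must cross the DB wire and be parsed just to get a snippet."""
    response = json.loads(stt_google_response)
    best = [r["alternatives"][0]["transcript"] for r in response["results"]]
    return " ".join(best)


def snippet_from_plain_text(plain_transcript, length=500):
    """Proposed approach: with the final transcript stored as plain text,
    only the prefix needs to be fetched -- which is exactly what a
    Substr annotation pushes down to the database."""
    return plain_transcript[:length]
```

With the plain-text column, the API can serve snippets when highlighting is off without ever materializing the full STT response.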