3033 Introduced V4 Oral Arguments Search API #4056

Merged
merged 15 commits into from
May 28, 2024


albertisfu
Contributor

@albertisfu albertisfu commented May 18, 2024

This PR introduces the ORAL_ARGUMENT search type to the V4 Search API.

The object's structure looks as follows:

{
    "absolute_url": "/audio/2/chase-cunningham-v-lawrence/",
    "caseName": "Chase-Cunningham v. Lawrence",
    "case_name_full": "",
    "court": "Thirteenth circuit of the Programming Horrors",
    "court_citation_string": "",
    "court_id": "xvyqd",
    "dateArgued": "1991-05-10",
    "dateReargued": null,
    "dateReargumentDenied": null,
    "docketNumber": "2:67-cv-18611",
    "docket_id": 3,
    "download_url": "http://allen.info/",
    "duration": null,
    "file_size_mp3": 0,
    "id": 2,
    "judge": "",
    "local_path": "dat/1991/05/10/example_2.dat",
    "meta": {
        "timestamp": "2024-05-20T22:53:35.213442Z",
        "date_created": "2024-05-20T22:52:49.659774Z"
    },
    "pacer_case_id": "248764",
    "panel_ids": [],
    "sha1": "911f8d4bbb4c6a7622ff92ab91fc9e15c4514e7d",
    "snippet": "This is the best transcript.",
    "source": "ZLU"
}

It has the same features as the other V4 endpoints, such as cursor pagination and taking the total document count from a cardinality count when matched hits exceed 10,000 documents.
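
For API consumers, cursor pagination means following the link to the next page instead of computing page offsets. Below is a minimal client-side sketch of that loop. It assumes a response envelope with `count`, `next`, and `results` keys, which is not shown in this PR, and `fetch_page` and `FAKE_PAGES` are hypothetical stand-ins for real HTTP calls.

```python
from typing import Callable, Iterator, Optional


def iter_results(
    fetch_page: Callable[[Optional[str]], dict],
    max_pages: int = 100,
) -> Iterator[dict]:
    """Yield every result by following cursor links until none remain.

    `fetch_page(url)` is a hypothetical callable that returns the decoded
    JSON of one page; passing None requests the first page.
    """
    url = None
    for _ in range(max_pages):  # hard cap so a bad cursor cannot loop forever
        page = fetch_page(url)
        yield from page.get("results", [])
        url = page.get("next")  # assumed key holding the next cursor URL
        if not url:
            break


# Stand-in for real HTTP calls: two fake pages keyed by cursor URL.
FAKE_PAGES = {
    None: {"count": 3, "next": "?cursor=abc", "results": [{"id": 3}, {"id": 2}]},
    "?cursor=abc": {"count": 3, "next": None, "results": [{"id": 1}]},
}

ids = [hit["id"] for hit in iter_results(FAKE_PAGES.get)]
```

The `count` value is read from the response rather than computed by the client, which fits the description above where the server switches to a cardinality count for large result sets.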

Sorting

It supports the same sorting keys as in the frontend:

"score desc"
"dateArgued desc"
"dateArgued asc"

As in RECAP and People, a custom function score was required to sort and paginate documents, because dateArgued can be None in Audio objects.

So the sorting keys in the OA V4 Search API are as follows:
1° score desc or function score for dateArgued
2° id desc (as the tiebreaker key)
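
The idea behind the two sorting keys can be illustrated in plain Python. This is only a sketch of the concept, not the Elasticsearch function score used in the PR: the `date_score` formula and the choice to rank missing dates last in both directions are assumptions made for illustration.

```python
from datetime import date
from typing import Optional


def date_score(date_argued: Optional[str], order: str) -> int:
    """Map a date to a score where a larger score ranks first.

    Missing dates get 0, so they land after every dated record in both
    orders (an assumption made for this sketch).
    """
    if date_argued is None:
        return 0
    ordinal = date.fromisoformat(date_argued).toordinal()
    if order == "desc":
        return ordinal  # newer dates get larger scores
    # Invert the ordinal so older dates get larger scores, keeping it above 0.
    return date.max.toordinal() - ordinal + 1


def sort_hits(hits: list[dict], order: str) -> list[dict]:
    # Primary key: score descending; tiebreaker: id descending.
    return sorted(
        hits,
        key=lambda h: (date_score(h["dateArgued"], order), h["id"]),
        reverse=True,
    )


hits = [
    {"id": 1, "dateArgued": "1991-05-10"},
    {"id": 2, "dateArgued": None},
    {"id": 3, "dateArgued": "2020-01-01"},
    {"id": 4, "dateArgued": "1991-05-10"},
    {"id": 5, "dateArgued": None},
]
desc_ids = [h["id"] for h in sort_hits(hits, "desc")]
asc_ids = [h["id"] for h in sort_hits(hits, "asc")]
```

Because every document gets a numeric score, including those without a date, the pair (score, id) gives a total order, which is what a cursor needs to resume from a stable position.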

I refactored build_es_base_query because applying the custom function score to plain documents like OA required a different approach, which was implemented in combine_plain_filters_and_queries.

Highlighting

As in the other search types, highlighting is disabled by default. When enabled by passing highlight=on, the HL fields are the same as in the frontend:

"caseName"
"judge"
"docketNumber"
"court_citation_string"
"text"

The "text" field holds the audio transcript. Although we currently don't have transcripts, I set up the text field so that it is retrieved from ES when highlighting is enabled, and the snippet is fetched from the database when highlighting is off. To achieve this, I refactored the add_es_highlighting method so it can highlight or exclude fields as required, centralizing this process within the build_highlights_dict method.
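
As a rough illustration of the highlight-or-exclude behavior described above, the sketch below builds the relevant fragments of an Elasticsearch request body for each mode. It is not the code from this PR: `sketch_highlight_options` is a hypothetical helper, and the tags, fragment size, and the decision to exclude only "text" from `_source` are assumptions.

```python
HL_FIELDS = ["caseName", "judge", "docketNumber", "court_citation_string", "text"]


def sketch_highlight_options(highlight_on: bool, fields=HL_FIELDS) -> dict:
    """Return hypothetical search-body fragments for the two modes."""
    if not highlight_on:
        # Highlighting off: leave the large transcript out of _source, since
        # the snippet would come from the database instead.
        return {"_source": {"excludes": ["text"]}}

    field_options = {}
    for name in fields:
        if name == "text":
            # Long field: return a single fragment of assumed size.
            field_options[name] = {"fragment_size": 500, "number_of_fragments": 1}
        else:
            # Short fields: 0 fragments means highlight the whole field value.
            field_options[name] = {"number_of_fragments": 0}

    return {
        "highlight": {
            "pre_tags": ["<mark>"],
            "post_tags": ["</mark>"],
            "fields": field_options,
        }
    }


with_hl = sketch_highlight_options(True)
without_hl = sketch_highlight_options(False)
```

Keeping both branches in one function mirrors the stated goal of centralizing the decision, so callers only pass the flag.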

Related to this, I noticed an improvement we could make to the Audio model once we start getting transcripts for all audio files. Currently, a JSON object is stored in stt_google_response, and the transcript to display is retrieved by the transcript property. However, since transcripts can be massive, loading the whole JSON from the database and then extracting the best transcript via the property can be expensive, especially when the API merges transcripts from the database because highlighting is disabled. A better approach would be to also store the final plain text transcript in its own field, so we can get the snippet directly from the database using the Substr helper.
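
The difference between the two approaches can be sketched with the standard library's sqlite3 instead of the Django ORM. The table, the `transcript_text` column, and the JSON shape are all hypothetical; SQL `substr` plays the role that Django's `Substr` helper would play in a queryset annotation.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE audio ("
    "id INTEGER PRIMARY KEY, stt_response TEXT, transcript_text TEXT)"
)

# Hypothetical data: a long transcript stored both inside a JSON payload
# and as a separate plain text column.
transcript = "This is the best transcript. " * 1000
conn.execute(
    "INSERT INTO audio VALUES (?, ?, ?)",
    (2, json.dumps({"results": [{"transcript": transcript}]}), transcript),
)

# Current style: load the whole JSON payload, then extract and trim in Python.
(raw,) = conn.execute("SELECT stt_response FROM audio WHERE id = 2").fetchone()
snippet_from_json = json.loads(raw)["results"][0]["transcript"][:50]

# Proposed style: the database returns only the first 50 characters.
# SQL substr is 1-indexed, as is Django's Substr.
(snippet_from_db,) = conn.execute(
    "SELECT substr(transcript_text, 1, 50) FROM audio WHERE id = 2"
).fetchone()
```

Both paths yield the same snippet, but the second transfers 50 characters per row instead of the full payload, which matters most when many rows are merged into one API page.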

Let me know what you think.

@albertisfu albertisfu force-pushed the 3033-develop-v4-oa-search-api branch from d3b1af7 to bc86869 Compare May 20, 2024 20:52
… disabled

- Refactored add_es_highlighting to centralize the highlights dict build using build_highlights_dict.
@albertisfu albertisfu marked this pull request as ready for review May 21, 2024 00:17
@albertisfu albertisfu requested a review from mlissner May 21, 2024 00:17
@mlissner
Member

This sounds great, Alberto, thank you. Re the transcript, we can cross that bridge when we start having them. The good news about oral arguments is we have so few of them, we can do ugly non-performant things and get away with it!

Eduardo, all yours for review! Thank you both!

mlissner and others added 2 commits May 23, 2024 16:47
…s-es-v3-oa-search-api

2680 Compatibility tweaks to make ES V3 OA API fully compatible with Solr version
@albertisfu albertisfu force-pushed the 3033-develop-v4-oa-search-api branch from dc25500 to 56e5678 Compare May 24, 2024 16:10
Contributor

@ERosendo ERosendo left a comment


The code looks good. I tested using different filter combinations and it worked properly.

@mlissner mlissner merged commit 596b59b into main May 28, 2024
13 checks passed
@mlissner mlissner deleted the 3033-develop-v4-oa-search-api branch May 28, 2024 22:11
Labels
None yet
Projects
Status: Done

3 participants