3033 Introduced V4 Oral Arguments Search API #4056

albertisfu · 2024-05-18T02:47:31Z

This PR introduces the ORAL_ARGUMENT search type to the V4 Search API.

The object's structure looks as follows:

{
    "absolute_url": "/audio/2/chase-cunningham-v-lawrence/",
    "caseName": "Chase-Cunningham v. Lawrence",
    "case_name_full": "",
    "court": "Thirteenth circuit of the Programming Horrors",
    "court_citation_string": "",
    "court_id": "xvyqd",
    "dateArgued": "1991-05-10",
    "dateReargued": null,
    "dateReargumentDenied": null,
    "docketNumber": "2:67-cv-18611",
    "docket_id": 3,
    "download_url": "http://allen.info/",
    "duration": null,
    "file_size_mp3": 0,
    "id": 2,
    "judge": "",
    "local_path": "dat/1991/05/10/example_2.dat",
    "meta": {
        "timestamp": "2024-05-20T22:53:35.213442Z",
        "date_created": "2024-05-20T22:52:49.659774Z"
    },
    "pacer_case_id": "248764",
    "panel_ids": [],
    "sha1": "911f8d4bbb4c6a7622ff92ab91fc9e15c4514e7d",
    "snippet": "This is the best transcript.",
    "source": "ZLU"
}

It has the same features as the other V4 endpoints, such as cursor pagination and a total document count obtained from a cardinality count when matched hits exceed 10,000 documents.
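To illustrate the counting behaviour described above, here is a minimal sketch, not the code in this PR. The function names and the `total_count` aggregation name are hypothetical. It relies on the fact that Elasticsearch, by default, caps `hits.total.value` at 10,000 and reports `"relation": "gte"` once the cap is reached, at which point a cardinality aggregation on a unique field can supply an approximate total.

```python
from __future__ import annotations

from typing import Any, Optional


def build_count_query(base_query: dict[str, Any], unique_field: str = "id") -> dict[str, Any]:
    """Build a size-0 request body that counts matches with a cardinality aggregation.

    `unique_field` is an assumption: any field holding one unique value per document works.
    """
    return {
        "size": 0,  # no hits needed, only the aggregation result
        "query": base_query,
        "aggs": {"total_count": {"cardinality": {"field": unique_field}}},
    }


def resolve_total(hits_total: dict[str, Any], cardinality_value: Optional[int]) -> int:
    """Pick the count to report to the API client.

    `hits_total` is the `hits.total` object from a search response,
    e.g. {"value": 10000, "relation": "gte"}.
    """
    capped = hits_total.get("relation") == "gte"
    if capped and cardinality_value is not None:
        # The exact hit count is unknown, so fall back to the cardinality estimate.
        return cardinality_value
    return hits_total["value"]
```

Note that a cardinality aggregation is approximate for large sets, so the reported total is an estimate once the cap is exceeded.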

Sorting

It supports the same sorting keys as in the frontend:

"score desc"
"dateArgued desc"
"dateArgued asc"

As in RECAP and People, a custom function score was required to sort and paginate documents, because dateArgued can be None in Audio objects.

So the sorting keys in the OA V4 Search API are as follows:
1° score desc or function score for dateArgued
2° id desc (as the tiebreaker key)
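The ordering above can be emulated in plain Python to show why a numeric score helps when dateArgued is missing. This is only a sketch under assumptions: the helper names are hypothetical, and it assumes documents without a dateArgued should sort last in both directions, which the description does not state explicitly. The real implementation computes the score inside Elasticsearch.

```python
from datetime import date
from typing import Optional


def date_sort_score(date_argued: Optional[date], descending: bool) -> int:
    """Map a possibly-missing date to an integer where a higher value sorts first.

    Missing dates get 0, which is lower than any real date's score,
    so they end up last for both "desc" and "asc".
    """
    if date_argued is None:
        return 0
    ordinal = date_argued.toordinal()  # always >= 1
    if descending:
        return ordinal
    # Invert the scale so that earlier dates receive higher scores.
    return date.max.toordinal() - ordinal + 1


def sort_results(rows: list, descending: bool = True) -> list:
    """Sort by the date score first, then by id desc as the tiebreaker."""
    return sorted(
        rows,
        key=lambda row: (date_sort_score(row["dateArgued"], descending), row["id"]),
        reverse=True,
    )
```

Because every document gets a concrete (score, id) pair, the last pair on a page can serve as an unambiguous cursor for the next page.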

I refactored build_es_base_query because applying the custom function score to plain documents like OA required a different approach, which was implemented in combine_plain_filters_and_queries.

Highlighting

As in the other search types, highlighting is disabled by default. When enabled by passing highlight=on, the HL fields are the same as in the frontend:

"caseName"
"judge"
"docketNumber"
"court_citation_string"
"text"

The "text" field holds the audio transcript. Although we currently don't have transcripts, I left the text field prepared to be retrieved from ES if highlighting is enabled or to get the snippet from the database if highlighting is off. To achieve this, I refactored the add_es_highlighting method so it can highlight or exclude fields as required, centralizing this process within the build_highlights_dict method.
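As a rough sketch of the highlight-or-exclude idea (not the actual build_highlights_dict from this PR), the request options could be assembled as below. The function name, the `<mark>` tags and the fragment size are assumptions; the `highlight` and `_source.excludes` keys are standard Elasticsearch search-body options.

```python
from __future__ import annotations

from typing import Any

# The HL fields listed above for the OA search type.
OA_HL_FIELDS = ["caseName", "judge", "docketNumber", "court_citation_string", "text"]


def sketch_highlight_options(highlight_on: bool, fragment_size: int = 100) -> dict[str, Any]:
    """Return the part of the search body that controls highlighting.

    Hypothetical helper: when highlighting is off, the large "text" field is
    excluded from _source so the snippet can be merged from the database instead.
    """
    if not highlight_on:
        return {"_source": {"excludes": ["text"]}}

    fields: dict[str, Any] = {}
    for name in OA_HL_FIELDS:
        if name == "text":
            # Only a short fragment of the transcript is needed for the snippet.
            fields[name] = {"fragment_size": fragment_size, "number_of_fragments": 1}
        else:
            # number_of_fragments=0 returns the whole field with matches marked.
            fields[name] = {"number_of_fragments": 0}

    return {
        "highlight": {
            "pre_tags": ["<mark>"],
            "post_tags": ["</mark>"],
            "fields": fields,
        }
    }
```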

In relation to this, I noticed an improvement that can be made in the Audio model for when we start getting transcripts for all audio files. Currently, a JSON object is stored in stt_google_response, and the transcript to display is retrieved by the transcript property. However, considering transcripts can be massive, retrieving the whole JSON from the database and then extracting the best transcript via the property can be expensive, especially when merging transcripts from the database when highlighting is disabled in the API. A better approach would be to also store the final plain text transcript in a field so we can get the snippet directly from the database using the Substr helper.
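To make the trade-off concrete, here is a small sketch comparing the two ways of getting a snippet. Everything in it is an assumption for illustration: the JSON shape follows the usual Google Speech-to-Text response layout (`results` with `alternatives` carrying `transcript` and `confidence`), the function names are hypothetical, and the 500-character length is arbitrary.

```python
import json

SNIPPET_LENGTH = 500  # hypothetical snippet size


def snippet_from_stt_json(raw_json: str, length: int = SNIPPET_LENGTH) -> str:
    """Current-style approach: load the whole JSON, then pick the best alternatives."""
    response = json.loads(raw_json)  # the entire payload must be fetched and parsed
    parts = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if alternatives:
            best = max(alternatives, key=lambda alt: alt.get("confidence", 0.0))
            parts.append(best.get("transcript", "").strip())
    return " ".join(parts)[:length]


def snippet_from_plain_text(transcript_text: str, length: int = SNIPPET_LENGTH) -> str:
    """Proposed approach: the final transcript is stored as text, so only a prefix is needed."""
    return transcript_text[:length]
```

With a stored plain-text field, the second function would not even need to run in Python: Django's `Substr` database function (which is 1-indexed) can return just the prefix from the database, so the full transcript never leaves it.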

Let me know what you think.


mlissner · 2024-05-21T00:41:57Z

This sounds great, Alberto, thank you. Re the transcript, we can cross that bridge when we start having them. The good news about oral arguments is we have so few of them, we can do ugly non-performant things and get away with it!

Eduardo, all yours for review! Thank you both!


ERosendo

The code looks good. I tested using different filter combinations and it worked properly.

albertisfu added 6 commits May 17, 2024 00:17

fix(api): Refactored V3 OA API Tests

b202e73

fix(api): Introduced V4 OA Search API

f980a39

- Added V4 OA Search API tests

fix(api): Fix minimum_should_match when combining query string and filters.

0200fe7

Merge branch 'main' into 3033-develop-v4-oa-search-api

e603059

fix(api): Updated branch and solved merge conflicts

88b1e5a

fix(api): Refactor apply_custom_score_to_main_query

bc86869

albertisfu force-pushed the 3033-develop-v4-oa-search-api branch from d3b1af7 to bc86869 Compare May 20, 2024 20:52

fix(api): Added support to merge OA snippet from DB when highlight is disabled

228d827

- Refactored add_es_highlighting to centralize the highlights dict build using build_highlights_dict.

semgrep-app bot reviewed May 20, 2024

cl/lib/elasticsearch_utils.py (resolved)

albertisfu added 2 commits May 20, 2024 17:46

Merge branch 'main' into 3033-develop-v4-oa-search-api

a6f17ab

fix(api): Fixed HL tag for OA Search alerts

5cef3c6

albertisfu marked this pull request as ready for review May 21, 2024 00:17

albertisfu requested a review from mlissner May 21, 2024 00:17

albertisfu and others added 3 commits May 21, 2024 15:30

fix(api): Verify and fix V3 OA Search API fields to be Solr-backward compatible

2515eaf

fix(api): Fixed audio_v4_fields mypy complaint

c3e8ac0

Merge branch 'main' into 3033-develop-v4-oa-search-api

8419251

ERosendo reviewed May 23, 2024

cl/lib/elasticsearch_utils.py (outdated, resolved)

ERosendo reviewed May 23, 2024

cl/lib/elasticsearch_utils.py (outdated, resolved)

mlissner and others added 2 commits May 23, 2024 16:47

Merge pull request #4063 from freelawproject/2680-compatibility-tweaks-es-v3-oa-search-api

a222921

2680 Compatibility tweaks to make ES V3 OA API fully compatible with Solr version

fix(elasticsearch): Applied elasticsearch_utils fix and suggestion

56e5678

albertisfu force-pushed the 3033-develop-v4-oa-search-api branch from dc25500 to 56e5678 Compare May 24, 2024 16:10

Merge branch 'main' into 3033-develop-v4-oa-search-api

21714a8

ERosendo approved these changes May 28, 2024


mlissner merged commit 596b59b into main May 28, 2024
13 checks passed

mlissner deleted the 3033-develop-v4-oa-search-api branch May 28, 2024 22:11
