Dynamic fields term set query not working #4945

Open
camerondavison opened this issue May 6, 2024 · 4 comments · May be fixed by #4983

@camerondavison
Contributor

I am able to get

data.event_type:StandardMgaQuoteIssued OR data.event_type:QuoteIssued

query to work, but I cannot get a term set query to work, like

data.event_type:IN [StandardMgaQuoteIssued QuoteIssued]

This is my Kafka ingestion configuration:

version: 0.6
index_id: events

doc_mapping:
  field_mappings:
    - name: id
      type: text
      tokenizer: raw
...
    - name: data
      type: json
      tokenizer: default
    - name: occurred_at
      type: datetime
      fast: true
      input_formats:
        - rfc3339
        - "%Y-%m-%dT%H:%M:%S.%f"
      precision: seconds
  timestamp_field: occurred_at

indexing_settings:
  commit_timeout_secs: 10

{
  "build": {
    "build_date": "2024-03-29T16:35:13Z",
    "build_profile": "release",
    "build_target": "x86_64-unknown-linux-gnu",
    "cargo_pkg_version": "0.8.1",
    "commit_date": "2024-03-29T14:09:41Z",
    "commit_hash": "e6c53967f8e57401d93bcc555d361dad69bd4ece",
    "commit_short_hash": "e6c5396",
    "commit_tags": [
      "v0.8.1"
    ],
    "version": "v0.8.1"
  },
  "runtime": {
    "num_cpus_logical": 4,
    "num_cpus_physical": 2,
    "num_threads_blocking": 3,
    "num_threads_non_blocking": 1
  }
}

I'm not sure whether this is a feature or a bug, given that the query targets dynamic JSON fields.

camerondavison added the bug (Something isn't working) label on May 6, 2024
@fmassot
Contributor

fmassot commented May 8, 2024

Thanks for the report @camerondavison

@trinity-1686a, can you look at this?

trinity-1686a self-assigned this on May 13, 2024
@trinity-1686a
Contributor

It looks like the problem is that TermSetQuery does not go through the tokenizer, so the values are not lowercased.
@camerondavison can you confirm that you use the default tokenizer, and that data.event_type:IN [standardmgaquoteissued quoteissued] returns the result you expect?

Note that a term set query with only a few elements is often less efficient than ORing a couple of term queries, so using a term set query for a set of 2 elements is not advisable.
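
Until the term set query goes through the tokenizer, one possible workaround is to lowercase the values on the client before building the query. The following is a minimal sketch, assuming every value is a single alphanumeric token, so that lowercasing alone approximates what the default tokenizer indexed. `term_set_query` is a hypothetical helper, not part of Quickwit.

```python
def term_set_query(field: str, values: list[str]) -> str:
    """Build a `field:IN [...]` term set query with lowercased values.

    Assumption: each value is a single alphanumeric token, so lowercasing
    is enough to match the tokens produced by the default tokenizer.
    """
    terms = []
    for value in values:
        # Reject values the default tokenizer would split into several tokens.
        if not value.isalnum():
            raise ValueError(f"not a single alphanumeric token: {value!r}")
        terms.append(value.lower())
    return f"{field}:IN [{' '.join(terms)}]"


print(term_set_query("data.event_type", ["StandardMgaQuoteIssued", "QuoteIssued"]))
# prints: data.event_type:IN [standardmgaquoteissued quoteissued]
```

Values containing punctuation or whitespace are rejected rather than guessed at, since the tokenizer would split them and a single lowercased term would not match.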

@camerondavison
Contributor Author

Yes that worked.

Good to know about OR vs. term set queries. That seems a little counterintuitive, to be honest. Thanks.

@trinity-1686a
Contributor

Sets become efficient once they contain many terms. Computation-wise, both approaches should be close, but network-wise, multiple term queries cause multiple small downloads, while a term set query downloads one big chunk of data. When you have many terms in your set, one large fetch is more efficient than thousands of small fetches, but when you need only a few terms, the small fetches are faster. In the future, we may improve term set queries so they are more efficient when only a few terms are requested.
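
The trade-off above could be applied on the client with a small helper that emits ORed term queries for small sets and a term set query for large ones. This is only a sketch: the cutoff of 10 is an arbitrary placeholder rather than a measured or documented value, and `build_query` is a hypothetical helper. Values are passed through unchanged, so for the term set branch they are assumed to already be in their indexed (lowercased) form.

```python
def build_query(field: str, values: list[str], max_or_terms: int = 10) -> str:
    """Pick between ORed term queries and a term set query by set size.

    `max_or_terms` is an illustrative cutoff, not a tuned value.
    """
    if not values:
        raise ValueError("at least one value is required")
    if len(values) <= max_or_terms:
        # Few terms: several small fetches are expected to be faster.
        return " OR ".join(f"{field}:{value}" for value in values)
    # Many terms: one large fetch is expected to be more efficient.
    return f"{field}:IN [{' '.join(values)}]"


print(build_query("data.event_type", ["standardmgaquoteissued", "quoteissued"]))
# prints: data.event_type:standardmgaquoteissued OR data.event_type:quoteissued
```

The right cutoff depends on the storage backend and its latency, so it is worth measuring on real data before settling on a value.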
