Alternative for SetNextReader to return all strings #910

mowali · 2024-01-30T17:19:57Z

Is there an existing issue for this?

I have searched the existing issues

Describe the documentation issue

PaulVrugt was asking this question, but never got a response to it:

The FieldCache GetStrings method was replaced by GetTerms, but GetTerms requires an AtomicReader. We used to be able to pass an IndexReader into this method, and it returned a string array containing the values. How do I get the same kind of behavior from the GetTerms method?

Is there no way to have the same behavior that GetStrings did in version 3.0.3?

Additional context

Here is the link to that thread:
#398

NightOwl888 · 2024-01-31T04:24:26Z

The Migration Guide covers this very issue with an example:

LUCENE-2380: FieldCache.GetStrings/Index --> FieldCache.GetDocTerms/Index

The field values returned when sorting by SortField.STRING are now
BytesRef. You can call value.Utf8ToString() to convert back to
string, if necessary.
In FieldCache, GetStrings (returning string[]) has been replaced
with GetTerms (returning a BinaryDocValues instance).
BinaryDocValues provides a Get method, taking a docID and a BytesRef
to fill (which must not be null), and it fills it in with the
reference to the bytes for that term.

If you had code like this before:
```
string[] values = FieldCache.DEFAULT.GetStrings(reader, field);
...
string aValue = values[docID];
```
you can do this instead:
```
BinaryDocValues values = FieldCache.DEFAULT.GetTerms(reader, field);
...
BytesRef term = new BytesRef();
values.Get(docID, term);
string aValue = term.Utf8ToString();
```
Note however that it can be costly to convert to String, so it's better to work directly with the BytesRef.
Similarly, in FieldCache, GetStringIndex (returning a StringIndex
instance, with direct arrays int[] order and String[] lookup) has
been replaced with GetTermsIndex (returning a
SortedDocValues instance). SortedDocValues provides the
GetOrd(int docID) method to lookup the int order for a document,
LookupOrd(int ord, BytesRef result) to lookup the term from a given
order, and the sugar method Get(int docID, BytesRef result)
which internally calls GetOrd and then LookupOrd.

If you had code like this before:
```
StringIndex idx = FieldCache.DEFAULT.GetStringIndex(reader, field);
...
int ord = idx.order[docID];
String aValue = idx.lookup[ord];
```
you can do this instead:
```
SortedDocValues idx = FieldCache.DEFAULT.GetTermsIndex(reader, field);
...
int ord = idx.GetOrd(docID);
BytesRef term = new BytesRef();
idx.LookupOrd(ord, term);
string aValue = term.Utf8ToString();
```
Note however that it can be costly to convert to String, so it's better to work directly with the BytesRef.

SortedDocValues also has a GetTermsEnum() method, which returns an iterator (TermsEnum) over the term values in the index (i.e., it iterates ord = 0..ValueCount-1).

Furthermore, if you drill down into the issue LUCENE-2380, there is an explanation for the change: it was done primarily for performance reasons. The field cache no longer stores a string[]; the underlying data is now a byte[], so extra steps are required to get a UTF-8 string.

Do note that you are meant to reuse the BytesRef instance that is passed in to get better performance.
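
For the original question (GetTerms wants an AtomicReader, but you have an IndexReader), one possible approach is to walk the reader's leaves and fill a string[] yourself. The following is only a sketch, not something taken from the Migration Guide: the FieldCacheStrings helper is a hypothetical name, it assumes the Lucene.NET 4.8 API, and it assumes the GetTerms overload that takes a setDocsWithField flag. It also reuses a single BytesRef for every document, as suggested above.

```csharp
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Util;

// Hypothetical helper, not part of Lucene.NET.
public static class FieldCacheStrings
{
    // Rebuilds a GetStrings-style array for a composite reader, one leaf (segment) at a time.
    public static string[] GetStrings(IndexReader reader, string field)
    {
        var result = new string[reader.MaxDoc];
        var term = new BytesRef(); // one instance, reused for every document

        foreach (AtomicReaderContext leaf in reader.Leaves)
        {
            // Assumption: the three-argument overload, with setDocsWithField = false.
            BinaryDocValues values = FieldCache.DEFAULT.GetTerms(leaf.AtomicReader, field, false);

            int maxDoc = leaf.AtomicReader.MaxDoc;
            for (int docID = 0; docID < maxDoc; docID++)
            {
                values.Get(docID, term);
                // leaf.DocBase maps the per-segment docID to the top-level docID.
                result[leaf.DocBase + docID] = term.Utf8ToString();
            }
        }

        return result;
    }
}
```

Keep in mind that this converts every value to a string up front, which is the cost the change was meant to avoid, and that documents without a value will probably come back as empty strings rather than null. Wrapping the reader with SlowCompositeReaderWrapper.Wrap(reader) to obtain a single AtomicReader may be another option, at some performance cost.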

mowali added the docs label Jan 30, 2024
