Telicent OpenSearch Plugin

Overview

This plugin provides an alternative implementation of the SynonymGraphTokenFilter for OpenSearch.

Instead of storing the synonyms in a file, this implementation loads them from an OpenSearch index. This makes them easier to update, especially when OpenSearch runs in a sandboxed environment or in a cluster, as it saves you from having to update the synonyms file on every single node.

Please consult the OpenSearch documentation for instructions on how to install and configure plugins.

You can also use the Telicent Docker image, which provides OpenSearch with the plugin pre-installed.

Issues/Questions

Please file an issue.

Installation

Compile the code with mvn clean package; you should find the plugin in target/releases.

You can then install it in OpenSearch with

$OS_HOME/bin/opensearch-plugin install --batch file:///target/releases/SynonymsPlugin-$VERSION.zip

When installing the plugin, you will see a message similar to this one:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@     WARNING: plugin requires additional permissions     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.net.SocketPermission localhost connect,listen,accept,resolve
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

This is because the plugin code needs to query OpenSearch and requires special permissions to do so.

If you are planning to use OpenSearch with Docker, you should build a custom version of the image using the Dockerfile provided and use it instead:

docker build --tag=opensearch-telicent-plugin .

This way the plugin will be preinstalled.

Alternatively, you can simply run the following command to pull the latest image from DockerHub:

docker pull telicent/opensearch
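To try the image locally, a minimal single-node setup could look like the sketch below, assuming the image accepts the same environment variables and ports as the standard OpenSearch image:

docker run -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" telicent/opensearch

If you built the image yourself with the Dockerfile, use opensearch-telicent-plugin as the image name instead.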

Getting Started

First, you need to declare the analyzers when creating your index (assuming OpenSearch is running locally on the default port and that the default security settings are applied):

curl --insecure -u admin:admin -XPUT "https://localhost:9200/my_index" -H 'Content-Type: application/json' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding"
          ]
        },
        "default_search": {
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "asciifolding",
            "graph_synonyms"
          ]
        }
      },
      "filter": {
        "graph_synonyms": {
          "type": "index_synonym_graph",
          "index": ".synonyms",
          "expand": true,
          "lenient": false,
          "username": "admin",
          "password": "admin"
        }
      }
    }
  }
}'

The index_synonym_graph filter is used only during search and can't be applied during indexing. The parameters lenient and expand are similar to those of the standard synonym graph token filter; their default values are indicated above. The parameter index specifies which index the plugin loads the synonym mappings from; the default value is .synonyms. The parameters "username" and "password" specify the credentials to use for connecting to OpenSearch. If the security plugin is deactivated, remove these parameters.
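For example, with the security plugin deactivated, a minimal filter definition could be the sketch below, assuming the defaults are applied when index, expand and lenient are omitted; it replaces the "filter" section of the settings above:

"filter": {
  "graph_synonyms": {
    "type": "index_synonym_graph"
  }
}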

The next step is to index the synonyms.

curl --insecure -u admin:admin -XPOST -H "Content-Type: application/json" "https://localhost:9200/.synonyms/_doc/synonyms" -d '{
  "synonyms": [
    "i-pod, i pod => ipod",
    "sea biscuit, sea biscit => seabiscuit",
    "ipod, i-pod, i pod",
    "universe , cosmos",
    "lol, laughing out loud"
  ]
}'

The plugin supports only the SOLR format.

The synonyms can be stored in any number of documents in the index; the plugin loads them all with a single query. The field names do not matter either. The values of the fields are either simple strings or arrays of strings, where each string corresponds to a line in the SOLR synonym format.
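For illustration, a second document with arbitrary field names would also be picked up; the document id extra and the field names below are made up for this example:

curl --insecure -u admin:admin -XPOST -H "Content-Type: application/json" "https://localhost:9200/.synonyms/_doc/extra" -d '{
  "more_synonyms": "tv, television",
  "other_synonyms": [
    "usa, united states, united states of america"
  ]
}'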

Testing

Now that the synonym index has been populated, you can check that it is being applied. First, since the synonym data were created after the search analysis was configured, the search analyzers must be reloaded with

curl --insecure -u admin:admin -XPOST "https://localhost:9200/_plugins/_refresh_search_analyzers/my_index"

You can then use the _analyze endpoint to get a description of how text will be analysed at search time, for instance

curl --insecure -u admin:admin -XPOST "https://localhost:9200/my_index/_analyze" -H 'Content-Type: application/json' -d'
{ 
  "analyzer": "default_search", 
  "text": "Is this universe déja vu?"
}'

should return

{
  "tokens" : [
    {
      "token" : "is",
      "start_offset" : 0,
      "end_offset" : 2,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "this",
      "start_offset" : 3,
      "end_offset" : 7,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "cosmos",
      "start_offset" : 8,
      "end_offset" : 16,
      "type" : "SYNONYM",
      "position" : 2
    },
    {
      "token" : "universe",
      "start_offset" : 8,
      "end_offset" : 16,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "deja",
      "start_offset" : 17,
      "end_offset" : 21,
      "type" : "<ALPHANUM>",
      "position" : 3
    },
    {
      "token" : "vu",
      "start_offset" : 22,
      "end_offset" : 24,
      "type" : "<ALPHANUM>",
      "position" : 4
    }
  ]
}

As you can see, universe has been expanded into cosmos at the same position and offsets. The output also illustrates the use of ASCII folding (déja became deja).
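To check the effect end to end, you can index a document and then query it with a synonym; the field name title and the sample text are made up for this illustration. Because the expansion happens at search time, a query for universe should match a document that only contains cosmos:

curl --insecure -u admin:admin -XPOST "https://localhost:9200/my_index/_doc/1?refresh=true" -H 'Content-Type: application/json' -d'
{ "title": "a brief history of the cosmos" }'

curl --insecure -u admin:admin -XGET "https://localhost:9200/my_index/_search" -H 'Content-Type: application/json' -d'
{ "query": { "match": { "title": "universe" } } }'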

Note to developers

Please format the code with

mvn git-code-format:format-code -Dgcf.globPattern=**/*

prior to submitting a PR.

License

This code is licensed under the Apache 2.0 License. See LICENSE.txt. All contributions are welcome and should be made under the Apache 2.0 License; see CONTRIBUTING for how to proceed.

Copyright

Copyright Telicent. See NOTICE for details.