apply unidirectional synonyms at query-time #411

@missinglink

as of today we finally removed all unidirectional synonyms (ones using the a=>b syntax) from our default synonyms file 🎉

unfortunately, I realized that there is a bug which is preventing those unidirectional synonyms from working properly when users specify them in a custom configuration.

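as per the example below, it's possible to index the term "hello" and then not be able to retrieve the document using the term "hello" 🤔 this happens because the index-time analyzer rewrites "hello" to "world", while the query-time analyzer leaves "hello" unchanged, so the query term never matches the indexed token.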

the solution to this problem is to split all the synonyms into two buckets: one for unidirectional synonyms (a=>b syntax) and one for bidirectional synonyms (a,b syntax). we will then need to apply both buckets at index-time and only the unidirectional synonyms at query-time.
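separate from the reproduction script further down, the bucket split itself could be sketched roughly like this. it is only an illustration: the rules in `synonyms` are made-up examples, the variable names are hypothetical, and it assumes one rule per line with no comments or blank lines (a real synonyms file would need those filtered out first).

```shell
# hypothetical synonym rules, one per line (not taken from any real synonyms file)
synonyms='hello => world
street, st
mt => mount
avenue, ave'

# unidirectional bucket: rules using the a=>b syntax
unidirectional=$(printf '%s\n' "$synonyms" | grep -F -e '=>')

# bidirectional bucket: everything else (a,b syntax)
bidirectional=$(printf '%s\n' "$synonyms" | grep -v -F -e '=>')

# both buckets would be applied at index-time
printf 'index-time (both buckets):\n%s\n%s\n\n' "$unidirectional" "$bidirectional"

# only the unidirectional bucket would be applied at query-time
printf 'query-time (unidirectional only):\n%s\n' "$unidirectional"
```

matching on the literal `=>` keeps the sketch simple; whatever actually loads the synonyms in the real code may want to parse the rules more strictly.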

curl -s -XDELETE "http://localhost:9200/foo?pretty=true"

curl -s -XPUT "http://localhost:9200/foo?pretty=true" \
  -H 'Content-Type: application/json' \
  -d '{
      "settings" : {
        "analysis": {
          "filter" : {
            "mySynonym" : {
              "type" : "synonym",
              "synonyms" : [
                "hello => world"
              ]
            }
          },
          "analyzer": {
            "myAnalyzer": {
              "type": "custom",
              "tokenizer": "standard",
              "filter": [
                "mySynonym"
              ]
            }
          }
        }
      },
      "mappings" : {
        "_doc" : {
          "properties" : {
            "field1": {
              "type": "text",
              "analyzer": "myAnalyzer",
              "search_analyzer": "standard"
            }
          }
        }
      }
    }'

curl -s -XPOST "http://localhost:9200/foo/_doc/example?pretty=true" \
  -H 'Content-Type: application/json' \
  -d '{
      "field1": "hello"
    }'

curl -s -XPOST "http://localhost:9200/foo/_refresh?pretty=true"

curl -XGET "http://localhost:9200/foo/_search?pretty=true" \
  -H 'Content-Type: application/json' \
  -d '{
      "query": {
        "match": {
          "field1": "hello"
        }
      }
    }'
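
for comparison, here is a rough sketch of what the index settings might look like once the two buckets are applied as proposed. this is an assumption about the shape of the fix, not the final implementation: the filter and analyzer names (`unidirectionalSynonyms`, `bidirectionalSynonyms`, `myIndexAnalyzer`, `mySearchAnalyzer`) and the `street, st` rule are invented for illustration. the snippet only builds and prints the request body, so it runs without a local Elasticsearch node.

```shell
# sketch only: filter/analyzer names and the "street, st" rule are hypothetical
body=$(cat <<'EOF'
{
  "settings": {
    "analysis": {
      "filter": {
        "unidirectionalSynonyms": {
          "type": "synonym",
          "synonyms": [ "hello => world" ]
        },
        "bidirectionalSynonyms": {
          "type": "synonym",
          "synonyms": [ "street, st" ]
        }
      },
      "analyzer": {
        "myIndexAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "unidirectionalSynonyms", "bidirectionalSynonyms" ]
        },
        "mySearchAnalyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [ "unidirectionalSynonyms" ]
        }
      }
    }
  },
  "mappings": {
    "_doc": {
      "properties": {
        "field1": {
          "type": "text",
          "analyzer": "myIndexAnalyzer",
          "search_analyzer": "mySearchAnalyzer"
        }
      }
    }
  }
}
EOF
)

# print the body so it can be inspected before sending it anywhere
printf '%s\n' "$body"

# to try it against a local node, use it as the body of the PUT request above:
# curl -s -XPUT "http://localhost:9200/foo?pretty=true" -H 'Content-Type: application/json' -d "$body"
```

with settings along these lines, the expectation is that a query for "hello" gets rewritten to "world" at query-time and so matches the token stored at index-time, while the bidirectional rules are still only expanded once, at index-time.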
