-
-
Notifications
You must be signed in to change notification settings - Fork 81
Open
Labels
Description
as of today we finally removed all unidirectional synonyms (ones using the a=>b syntax) from our default synonyms file 🎉
unfortunately, I realized that there is a bug which is preventing those unidirectional synonyms from working properly when users specify them in a custom configuration.
as per the example below, it's possible to index the term "hello" and then not be able to retrieve the document using the term "hello" 🤔
the solution to this problem is to split all the synonyms into two buckets, one for unidirectional synonyms (a=>b syntax) and one for bidirectional synonyms (a,b syntax), we will then need to apply both buckets at index-time and only the unidirectional synonyms at query-time.
curl -s -XDELETE "http://localhost:9200/foo?pretty=true"
curl -s -XPUT "http://localhost:9200/foo?pretty=true" \
-H 'Content-Type: application/json' \
-d '{
"settings" : {
"analysis": {
"filter" : {
"mySynonym" : {
"type" : "synonym",
"synonyms" : [
"hello => world"
]
}
},
"analyzer": {
"myAnalyzer": {
"type": "custom",
"tokenizer": "standard",
"filter": [
"mySynonym"
]
}
}
}
},
"mappings" : {
"_doc" : {
"properties" : {
"field1": {
"type": "text",
"analyzer": "myAnalyzer",
"search_analyzer": "standard"
}
}
}
}
}'
curl -s -XPOST "http://localhost:9200/foo/_doc/example?pretty=true" \
-H 'Content-Type: application/json' \
-d '{
"field1": "hello"
}'
curl -s -XPOST "http://localhost:9200/foo/_refresh?pretty=true"
curl -XGET "http://localhost:9200/foo/_search?pretty=true" \
-H 'Content-Type: application/json' \
-d '{
"query": {
"match": {
"field1": "hello"
}
}
}'