Sorting with accents in Elasticsearch

I'm trying to sort results in Elasticsearch, but some fields contain accented characters, such as city names. I tried fields with index not_analyzed and with a ptbr analyzer, as follows:

{
   "settings": {
      "analysis": {
         "analyzer": {
            "folding": {
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "asciifolding"
               ]
            },
            "analyzer_ptbr": {
               "tokenizer": "standard",
               "filter": [
                  "lowercase",
                  "stemmer_plural_portugues",
                  "asciifolding"
               ]
            }
         },
         "filter": {
            "stemmer_plural_portugues": {
               "type": "stemmer",
               "name": "minimal_portuguese"
            }
         }
      }
   },
   "mappings": {
      "post": {
         "properties": {
            "title": {
               "type": "multi_field",
               "fields": {
                  "title": {
                     "type": "string",
                     "analyzer": "standard"
                  },
                  "folded": {
                     "type": "string",
                     "analyzer": "folding"
                  },
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "ptbr": {
                     "type": "string",
                     "analyzer": "analyzer_ptbr"
                  }
               }
            }
         }
      }
   }
}

When trying to sort with:

{
   "query": {
      "match_all": {}
   },
   "sort": [
      {
         "title.ptbr": {
            "order": "asc"
         }
      }
   ]
}

It returns:

Abacate
Versão de Acentuação
Banana
Dois vizinhos

If I switch to the raw field (not analyzed):

{
   "query": {
      "match_all": {}
   },
   "sort": [
      {
         "title.raw": {
            "order": "desc"
         }
      }
   ]
}

It returns:

Ângelo
Versão de Acentuação
Dois vizinhos
Banana
Abacate

In other words, when the accents are ignored I can't sort by the first word of the sentence, and if I keep the field not analyzed, the accented characters come first in descending order. Has anyone run into this problem?
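Both behaviours can be reproduced outside Elasticsearch with a rough Python sketch. It assumes the titles are spelled as below, that a not_analyzed field compares whole strings by code point, and that an ascending sort on an analyzed field uses the smallest term of each document. The stemmer is left out, and Unicode decomposition only approximates asciifolding.

```python
import unicodedata

# Titles assumed from the thread; the exact spelling is an assumption.
titles = ["Ângelo", "Versão de Acentuação", "Dois vizinhos", "Banana", "Abacate"]


def fold(text):
    """Approximate asciifolding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def smallest_term(title):
    """Split on whitespace, lowercase and fold, then keep the smallest term."""
    return min(fold(token.lower()) for token in title.split())


# not_analyzed: the whole string is one term, compared by code point,
# so "Â" (U+00C2) ranks above every ASCII letter.
raw_desc = sorted(titles, reverse=True)

# analyzed: "Versão de Acentuação" is ranked by "acentuacao", not "versao".
ptbr_asc = sorted(titles, key=smallest_term)

print(raw_desc)
print(ptbr_asc)
```

This is why neither field sorts as intended: the raw field keeps the first word but not the accent handling, while the analyzed field handles accents but loses the word order.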

Thank you

    
asked by anonymous 20.07.2016 / 23:03

1 answer

I had this same problem and solved it by adding another version of the field to the multi_field, with a filter that removes all accents and spaces from the words. Your sentences then become:

Ângelo => angelo
Versão de Acentuação => versaodeacentuacao
Dois vizinhos => doisvizinhos
Banana => banana
Abacate => abacate

You can then sort on this version of the field. See the code:

{
   "settings": {
      "analysis": {
         "analyzer": {
            "without_space": {
               "filter": [
                  "lowercase",
                  "whitespace_remove",
                  "asciifolding"
               ],
               "type": "custom",
               "tokenizer": "keyword"
            }
         },
         "filter": {
            "whitespace_remove": {
               "type": "pattern_replace",
               "pattern": " ",
               "replacement": ""
            }
         }
      }
   },
   "mappings": {
      "my_type": {
         "properties": {
            "title": {
               "type": "multi_field",
               "fields": {
                  "title": {
                     "type": "string",
                     "analyzer": "standard"
                  },
                  "raw": {
                     "type": "string",
                     "index": "not_analyzed"
                  },
                  "sorting": {
                     "type": "string",
                     "analyzer": "without_space"
                  }
               }
            }
         }
      }
   }
}

In the query:

{
    "query": {
        "match_all": {

        }
    },
    "sort": [
       {
          "title.sorting": {
             "order": "desc"
          }
       }
    ]
}

This was the simplest solution I found for handling accents and the "relevance" of the first letter of the sentence.
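To check the expected order without a cluster, here is a rough Python approximation of what the without_space analyzer does to each title. The function name and the title spellings are assumptions for illustration, and Unicode decomposition is only a stand-in for asciifolding, which covers more characters.

```python
import unicodedata

# Titles assumed from the thread; the exact spelling is an assumption.
titles = ["Ângelo", "Versão de Acentuação", "Dois vizinhos", "Banana", "Abacate"]


def without_space(title):
    """Mimic keyword tokenizer + lowercase + whitespace_remove + asciifolding."""
    lowered = title.lower()                 # lowercase filter
    no_spaces = lowered.replace(" ", "")    # pattern_replace " " -> ""
    decomposed = unicodedata.normalize("NFKD", no_spaces)
    # Drop combining marks so "â" becomes "a" (stand-in for asciifolding).
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# Each title yields a single term, so the sort follows the first word.
sorted_desc = sorted(titles, key=without_space, reverse=True)

for title in sorted_desc:
    print(without_space(title), "<=", title)
```

Because the keyword tokenizer keeps the whole value as one term, "Versão de Acentuação" is ordered under "v" and "Ângelo" falls between "Banana" and "Abacate".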

I hope it helps.

    
21.07.2016 / 18:23