January 05, 2016

Elasticsearch Term or Terms Query Not Working? Start Here.

Posted by Christopher Davis

Summary: if the term(s) being searched contain spaces or special characters, you’ll need to run the search against a not_analyzed property to make it work.

Analyzers & the Inverted Index

By default, Elasticsearch runs incoming string data through an analyzer at index time. You can specify what sort of analysis you want done on a string with the property’s index parameter when you set up the mapping.

This analysis turns the raw data into a set of tokens that are stored in an inverted index (here’s a more in-depth guide).

When you search for something, the inverted index is queried and documents that match are returned.
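
To make the idea concrete, here is a minimal sketch of an inverted index in Python. It is a toy model, not how Elasticsearch is implemented: the `analyze` function is an assumed stand-in for the default analyzer that only lowercases and splits on non-alphanumeric characters, and `build_inverted_index` is a hypothetical helper.

```python
import re
from collections import defaultdict


def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def build_inverted_index(docs):
    """Map each token to the set of document ids whose text produced it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


docs = {1: "test", 2: "test two"}
index = build_inverted_index(docs)

# Show which documents each stored token points to.
print({token: sorted(ids) for token, ids in index.items()})
# {'test': [1, 2], 'two': [2]}
```

Note that the whole string "test two" never appears as a key: only the individual tokens are stored.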

Term & Terms Queries Are Not Analyzed

When you search with something like a query string or match query, Elasticsearch will use its analyzers again to tokenize the query and look up documents that match in the inverted index. You can control which analyzer is used with the analyzer parameter in the query object. You can see how Elasticsearch tokenizes a term with the analyze endpoint.

curl 'http://localhost:9200/_analyze?pretty&text=test%20two'

{
  "tokens" : [ {
    "token" : "test",
    "start_offset" : 0,
    "end_offset" : 4,
    "type" : "<ALPHANUM>",
    "position" : 1
  }, {
    "token" : "two",
    "start_offset" : 5,
    "end_offset" : 8,
    "type" : "<ALPHANUM>",
    "position" : 2
  } ]
}

The term and terms queries do no analysis: they look for values that match exactly what’s given to them. This makes all kinds of sense: you’re trying to look up the values exactly as you pass them in.

But there’s a catch: term and terms queries still search the inverted index.

This is unnoticeable if you’re doing those queries on values that are a single lowercase word or numeric, since the tokens stored in Elasticsearch are the same as the original values (the analyzer has no spaces or punctuation to tokenize on, and the default analyzer’s lowercasing changes nothing). But term values with spaces, punctuation, or uppercase letters will appear not to work unless the field you’re searching is set to be not_analyzed.
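
The difference between the two query styles can be sketched with a toy lookup in Python. This is only an illustration under stated assumptions: `inverted_index` is a hand-written dictionary standing in for the tokens an analyzed field would hold, `analyze` is a simplified stand-in for the default analyzer, and `term_lookup` and `match_lookup` are hypothetical helpers, not Elasticsearch APIs.

```python
import re


def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


# Tokens an analyzed field would hold for two documents: "test" and "test two".
inverted_index = {"test": {1, 2}, "two": {2}}


def term_lookup(index, value):
    """Like a term query: look the value up verbatim, with no analysis."""
    return index.get(value, set())


def match_lookup(index, text):
    """Like a match query with its default OR behavior: analyze, then look up each token."""
    hits = set()
    for token in analyze(text):
        hits |= index.get(token, set())
    return hits


print(term_lookup(inverted_index, "test"))       # {1, 2}
print(term_lookup(inverted_index, "test two"))   # set()
print(term_lookup(inverted_index, "Test"))       # set()
print(match_lookup(inverted_index, "test two"))  # {1, 2}
```

The verbatim lookup fails for "test two" and "Test" because neither string exists as a stored token, even though both documents were indexed from text containing them.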

A Quick Example

First let’s create an index with a single type and property.

curl -XPUT http://localhost:9200/analyzed_example -d '{
    "mappings": {
        "mytype": {
            "_source": {"enabled": true},
            "properties": {
                "content": {
                    "type": "string"
                }
            }
        }
    }
}'

Then we’ll index some documents:

curl -XPOST http://localhost:9200/analyzed_example/mytype -d '{"content": "test"}'
curl -XPOST http://localhost:9200/analyzed_example/mytype -d '{"content": "test two"}'

Now let’s try a term query with test, which you might expect to return just one document, but really returns two:

curl -XPOST 'http://localhost:9200/analyzed_example/mytype/_search?pretty' -d '{
  "query": {"term": {"content": "test"}}
}'

{
  "took" : 3,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 2,
    "max_score" : 0.5945348,
    "hits" : [ {
      "_index" : "analyzed_example",
      "_type" : "mytype",
      "_id" : "AVHotWCgWVxYklVnp_0-",
      "_score" : 0.5945348,
      "_source":{"content": "test"}
    }, {
      "_index" : "analyzed_example",
      "_type" : "mytype",
      "_id" : "AVHotYZ9WVxYklVnp_0_",
      "_score" : 0.37158427,
      "_source":{"content": "test two"}
    } ]
  }
}

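Why two documents? Because the analysis done on the content field in the second document put test and two into the inverted index as separate tokens. As such our term query matches both documents. But what happens when we do a term query on test two? No results, because the index only holds the tokens test and two, never the whole string test two.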

curl -XPOST 'http://localhost:9200/analyzed_example/mytype/_search?pretty' -d '{
  "query": {"term": {"content": "test two"}}
}'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 0,
    "max_score" : null,
    "hits" : [ ]
  }
}

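We can get around this by setting the index parameter of the field we want to “not_analyzed”. We’ll do that in a fresh index, since the analysis setting of an existing field can’t be changed in place: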

curl -XPUT http://localhost:9200/nonanalyzed_example -d '{
    "mappings": {
        "mytype": {
            "_source": {"enabled": true},
            "properties": {
                "content": {
                    "type": "string",
                    "index": "not_analyzed"
                }
            }
        }
    }
}'

curl -XPOST http://localhost:9200/nonanalyzed_example/mytype -d '{"content": "test"}'
curl -XPOST http://localhost:9200/nonanalyzed_example/mytype -d '{"content": "test two"}'

And now both of our queries turn out as expected:

curl -XPOST 'http://localhost:9200/nonanalyzed_example/mytype/_search?pretty' -d '{
  "query": {"term": {"content": "test"}}
}'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "nonanalyzed_example",
      "_type" : "mytype",
      "_id" : "AVHov1xVWVxYklVnp_1H",
      "_score" : 1.0,
      "_source":{"content": "test"}
    } ]
  }
}

curl -XPOST 'http://localhost:9200/nonanalyzed_example/mytype/_search?pretty' -d '{
  "query": {"term": {"content": "test two"}}
}'

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "failed" : 0
  },
  "hits" : {
    "total" : 1,
    "max_score" : 1.0,
    "hits" : [ {
      "_index" : "nonanalyzed_example",
      "_type" : "mytype",
      "_id" : "AVHov4K7WVxYklVnp_1I",
      "_score" : 1.0,
      "_source":{"content": "test two"}
    } ]
  }
}
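
Why the not_analyzed version behaves differently can be sketched with a toy model in Python. The assumption here is that a not_analyzed field stores each value as one whole token, while an analyzed field stores the analyzer’s output; `analyze`, `index_tokens`, and `build_index` are hypothetical helpers for illustration only.

```python
import re


def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [tok for tok in re.split(r"[^a-z0-9]+", text.lower()) if tok]


def index_tokens(value, analyzed=True):
    """Tokens stored for one field value; a not_analyzed field keeps the value whole."""
    return analyze(value) if analyzed else [value]


def build_index(values, analyzed):
    """Build a token -> document ids mapping, numbering documents from 1."""
    index = {}
    for doc_id, value in enumerate(values, start=1):
        for token in index_tokens(value, analyzed):
            index.setdefault(token, set()).add(doc_id)
    return index


values = ["test", "test two"]
analyzed_index = build_index(values, analyzed=True)
not_analyzed_index = build_index(values, analyzed=False)

print(sorted(analyzed_index))              # ['test', 'two']
print(sorted(not_analyzed_index))          # ['test', 'test two']
print(not_analyzed_index.get("test two"))  # {2}
print(not_analyzed_index.get("test"))      # {1}
```

With whole values stored as tokens, an exact lookup for test two finds only the second document, and a lookup for test finds only the first, which mirrors the two responses above.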

So What Should I Set to not_analyzed?

It’s up to your application’s needs. Some examples are document properties that map to identifiers external to Elasticsearch or things like URL slugs.

An application at PMG needed some exact matching on certain fields as well as the normal search functionality Elasticsearch provides. We ended up creating a specially named field that was not analyzed specifically to do the term and terms queries we needed.
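
One way such a setup could look is sketched below in Python, which just builds the JSON bodies. This is an assumption-heavy illustration, not the mapping PMG actually used: the `_exact` suffix, the `build_mapping` and `build_document` helpers, and the choice to have the application write the value into both fields are all hypothetical. The mapping syntax follows the string type and index parameter used in the examples above.

```python
import json


def build_mapping(type_name, field, exact_suffix="_exact"):
    """Return a mapping body with an analyzed field plus a not_analyzed twin."""
    return {
        "mappings": {
            type_name: {
                "properties": {
                    # Analyzed by default: used for normal full-text search.
                    field: {"type": "string"},
                    # Hypothetical twin field: used for term and terms queries.
                    field + exact_suffix: {"type": "string", "index": "not_analyzed"},
                }
            }
        }
    }


def build_document(field, value, exact_suffix="_exact"):
    """Store the same value in both fields so either query style can be used."""
    return {field: value, field + exact_suffix: value}


mapping = build_mapping("mytype", "content")
print(json.dumps(mapping, indent=2))
print(json.dumps(build_document("content", "test two")))
```

The printed mapping could be sent as the body of the index creation request, and a term query would then target content_exact while match queries keep using content.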

Elasticsearch is a trademark of Elasticsearch BV, registered in the U.S. and in other countries.

