
Elasticsearch Query DSL Builder

Build Elasticsearch Query DSL (match, term, range, bool) in JSON.


  

Elasticsearch Query DSL explained

Elasticsearch is a distributed search engine built on top of Apache Lucene, created by Shay Banon in 2010. Every interaction happens over a JSON REST API, and the Query DSL (Domain Specific Language) is the declarative format used to describe searches, aggregations and scoring. A query body is a JSON document sent to POST /index/_search. Its clauses run in one of two contexts: query context (does the document match, and how well; scored) and filter context (does the document match, yes or no; unscored and cacheable).

Query types you actually use

  • match: full-text search on analysed fields; by default the query text goes through the same analyzer the field was indexed with (unless a search_analyzer is set).
  • term / terms: exact, non-analysed lookup; use for keyword, numeric, date and boolean fields.
  • range: numeric or date range with gte, gt, lte, lt, plus optional format and time_zone for date fields.
  • bool: combinator with must, should, must_not and filter; the workhorse of every non-trivial search.
  • match_phrase / multi_match: phrase search and multi-field search with field boosts (title^3).
  • prefix, wildcard, regexp, fuzzy (Levenshtein): partial and approximate matching.
  • query_string / simple_query_string: query_string exposes the full Lucene mini-syntax for power users; simple_query_string offers a smaller, fault-tolerant syntax that is safer for user-facing search boxes.
  • exists, nested, has_child, geo_bounding_box: structural and geographical filters.
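
As a minimal sketch of how the leaf queries above compose inside bool, the helpers below build the JSON body with plain Python dictionaries. The helper names and the field names (title, status, created_at, discontinued) are hypothetical and only serve the illustration; nothing here talks to a cluster.

```python
import json

def match(field, text):
    """Full-text query on an analysed field."""
    return {"match": {field: {"query": text}}}

def term(field, value):
    """Exact lookup on a keyword, numeric, date or boolean field."""
    return {"term": {field: {"value": value}}}

def date_range(field, gte=None, lt=None, time_zone=None):
    """Range query; only the options that were supplied are emitted."""
    options = {"gte": gte, "lt": lt, "time_zone": time_zone}
    return {"range": {field: {k: v for k, v in options.items() if v is not None}}}

def bool_query(must=(), should=(), must_not=(), filters=()):
    """bool combinator; empty clause lists are left out of the JSON."""
    clauses = {"must": must, "should": should, "must_not": must_not, "filter": filters}
    return {"bool": {name: list(qs) for name, qs in clauses.items() if qs}}

body = {
    "query": bool_query(
        must=[match("title", "wireless headphones")],          # scored
        filters=[term("status", "published"),                   # yes/no, unscored
                 date_range("created_at", gte="2024-01-01", lt="2025-01-01")],
        must_not=[term("discontinued", True)],
    )
}

# The resulting string is what would be sent to POST /<index>/_search.
print(json.dumps(body, indent=2))
```

Keeping each leaf as a small function makes it easy to move a clause between must and filter, which is usually the first tuning step.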

Filter vs Query and the scoring story

Anything inside filter or must_not runs in filter context: it answers a yes/no question, is not scored, and frequently used filters are cached as bitsets in the node query cache. That is what you want for booleans, dates and exact terms. Anything inside must or should contributes to the relevance score. The default similarity is BM25 (since Elasticsearch 5.0), an evolution of TF-IDF that handles term saturation and document length better. For boosting and re-ranking you can use function_score with field_value_factor, script_score or random_score.
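
The split between scored and unscored clauses, plus a function_score boost, could look like the sketch below. The field names title, category and likes are assumptions made for the example, and the choice of log1p with boost_mode sum is one reasonable setup rather than a recommendation.

```python
import json

def popularity_boosted(text, category, popularity_field="likes"):
    """Wrap a bool query in function_score so a numeric field nudges the BM25 score."""
    base = {
        "bool": {
            # Query context: contributes to the relevance score.
            "must": [{"match": {"title": text}}],
            # Filter context: pure yes/no, unscored, eligible for caching.
            "filter": [{"term": {"category": category}}],
        }
    }
    return {
        "query": {
            "function_score": {
                "query": base,
                "field_value_factor": {
                    "field": popularity_field,  # hypothetical numeric field
                    "modifier": "log1p",        # dampens very large values
                    "factor": 1.0,
                    "missing": 0,               # value used when the field is absent
                },
                # Add the function value to the query score instead of multiplying,
                # so documents with zero popularity keep their text relevance.
                "boost_mode": "sum",
            }
        }
    }

print(json.dumps(popularity_boosted("wireless headphones", "audio"), indent=2))
```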

Aggregations: SQL GROUP BY on steroids

Aggregations are split into three families. Bucket aggregations group documents (terms, date_histogram, range, histogram, geohash_grid, filters, composite for paginated drill-downs). Metric aggregations compute values (avg, sum, min, max, percentiles, cardinality, value_count, stats). Pipeline aggregations operate on the output of other aggregations (avg_bucket, derivative, cumulative_sum, and moving_fn, which replaced moving_avg after its removal in 8.0). Combined, they power most of the dashboards you see in Kibana.
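
A small sketch that touches all three families: a date_histogram bucket, a sum metric inside each bucket, and a derivative pipeline reading that metric. The field names order_date and amount are hypothetical, and calendar_interval assumes Elasticsearch 7.2 or later.

```python
import json

def monthly_revenue_aggs(date_field="order_date", amount_field="amount"):
    """Search body that returns only aggregations (size 0 skips the hits)."""
    return {
        "size": 0,
        "aggs": {
            "per_month": {
                # Bucket aggregation: one bucket per calendar month.
                "date_histogram": {"field": date_field, "calendar_interval": "month"},
                "aggs": {
                    # Metric aggregation computed inside each bucket.
                    "revenue": {"sum": {"field": amount_field}},
                    # Pipeline aggregation: month-over-month change of "revenue".
                    "revenue_change": {"derivative": {"buckets_path": "revenue"}},
                },
            }
        },
    }

print(json.dumps(monthly_revenue_aggs(), indent=2))
```

The derivative sits next to the metric it reads, inside the histogram, because it is a parent pipeline aggregation that walks the histogram's buckets in order.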

Mapping, analyzers and the text vs keyword trap

A mapping is the schema of an index. The classic pitfall is the difference between text and keyword: text is analysed (lowercased, tokenised, possibly stemmed), enabling full-text search but disallowing sorting and aggregations without fielddata: true; keyword stores the value verbatim and is the right pick for IDs, tags, enums and anything you need to terms-aggregate. A custom analyzer is a chain of char_filter → tokenizer → token filters (lowercase, stop, stemmer, synonym, edge_ngram for autocomplete). ILM (Index Lifecycle Management) rolls indices through hot, warm, cold, frozen and delete phases. Capacity is tuned through shards: the replica count can be changed at any time, while the primary shard count is fixed at index creation and changing it requires the split or shrink APIs, or a reindex.
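
To make the text vs keyword distinction concrete, here is a sketch of an index body (as it might be sent with PUT /<index>) that declares a custom autocomplete analyzer and a title field with a keyword sub-field. The names autocomplete, autocomplete_ngram, title and tags are invented for the example; note that in the settings JSON the token filter list lives under the key filter. The helper only walks flat properties and multi-fields, not nested objects.

```python
index_body = {
    "settings": {
        "analysis": {
            "filter": {
                "autocomplete_ngram": {"type": "edge_ngram", "min_gram": 2, "max_gram": 15}
            },
            "analyzer": {
                "autocomplete": {
                    "type": "custom",
                    "char_filter": ["html_strip"],             # 1. character filters
                    "tokenizer": "standard",                   # 2. tokenizer
                    "filter": ["lowercase", "autocomplete_ngram"],  # 3. token filters
                }
            },
        }
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",                  # analysed: full-text search
                "analyzer": "autocomplete",
                "search_analyzer": "standard",   # do not n-gram the user's query
                "fields": {"raw": {"type": "keyword", "ignore_above": 256}},
            },
            "tags": {"type": "keyword"},         # verbatim: sort and aggregate
        }
    },
}

def keyword_paths(properties, prefix=""):
    """List field paths that are safe to sort or terms-aggregate on."""
    paths = []
    for name, spec in properties.items():
        path = prefix + name
        if spec.get("type") == "keyword":
            paths.append(path)
        # Multi-fields live under "fields" and are addressed as parent.child.
        paths.extend(keyword_paths(spec.get("fields", {}), path + "."))
    return paths

print(sorted(keyword_paths(index_body["mappings"]["properties"])))
```

With this mapping, searches go against title while sorting and terms aggregations use title.raw or tags.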

How it compares to the alternatives

OpenSearch is the AWS-sponsored fork born in 2021 after Elastic relicensed Elasticsearch under the SSPL and the Elastic License; it was forked from Elasticsearch 7.10.2 and remains largely API-compatible at the query level. MongoDB Atlas Search embeds Lucene into MongoDB clusters but exposes a smaller surface. Algolia is a fully managed alternative with millisecond latency and a different pricing model. Meilisearch and Typesense are newer projects that prioritise simplicity, instant typo tolerance and small clusters.

FAQ

Is it better than LIKE '%term%' in SQL? For full-text it is in another league: tokenisation, stemming, BM25 ranking, fuzziness, synonyms and phrase matching are all out of reach for a SQL LIKE.

Does Elasticsearch support updates? Yes, including partial updates through the _update API, but not in place. Lucene segments are immutable, so an update marks the old document as deleted and indexes a full new version. Frequent updates leave many deleted documents behind and drive extra segment merges, so batch updates with care.
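
One way to batch partial updates is the _bulk endpoint, which takes newline-delimited JSON: an action line followed by a payload line per document. The sketch below only builds that body; the index name products and the fields are hypothetical.

```python
import json

def bulk_update_body(index, partial_docs):
    """Build the NDJSON body for POST /_bulk from {doc_id: {field: value}}."""
    lines = []
    for doc_id, fields in partial_docs.items():
        # Action line: which document to update.
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        # Payload line: only the fields that change.
        lines.append(json.dumps({"doc": fields}))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"

ndjson = bulk_update_body("products", {"1": {"price": 19.9}, "2": {"stock": 0}})
print(ndjson, end="")
```

The request would be sent with the Content-Type header application/x-ndjson.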

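Why is deep pagination (from + size beyond 10 000) slow? Every shard must collect and sort its top from + size hits, and the coordinating node then merges the results of all shards only to discard everything except the last size entries. Cost and memory grow with page depth and shard count, which is why index.max_result_window caps from + size at 10 000 by default. For deep navigation use search_after with a sort tiebreaker, ideally on a point-in-time (PIT) so the view stays consistent; the scroll API remains an option for snapshotted exports.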
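
A minimal sketch of search_after paging: each request repeats the same sort, and the sort values of the last hit on one page become the cursor for the next. The fields created_at and id are assumptions (id stands for any unique field used as tiebreaker), and the hit below is a hand-written stand-in for a real response.

```python
import copy

first_page = {
    "size": 100,
    "query": {"match_all": {}},
    # A unique tiebreaker keeps the ordering stable between pages.
    "sort": [{"created_at": "desc"}, {"id": "asc"}],
}

def next_page(body, last_hit):
    """Return a new body that resumes right after last_hit from the previous page."""
    page = copy.deepcopy(body)
    page.pop("from", None)                    # search_after replaces offset paging
    page["search_after"] = last_hit["sort"]   # sort values returned with each hit
    return page

# Hand-written example of the last hit of a response (hypothetical values).
last_hit = {"_id": "abc", "sort": [1718000000000, "abc"]}
second_page = next_page(first_page, last_hit)
print(second_page["search_after"])
```

Because the cursor is a set of sort values rather than an offset, each shard only has to find the size hits following it, no matter how deep the page is.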

Can I write SQL against Elasticsearch? Yes. Elasticsearch SQL, bundled with the default distribution, accepts a subset of SQL and translates it to DSL under the hood. It is great for ad-hoc analytics; for production code, prefer the JSON DSL.
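
As a rough illustration, the SQL endpoints take a JSON body with the statement under query: POST /_sql runs it, and POST /_sql/translate returns the equivalent Query DSL without running it. The helper name and the products index with its category column are hypothetical.

```python
import json

def sql_request(statement, translate=False, fetch_size=None):
    """Return (method, path, body) for an Elasticsearch SQL call."""
    path = "/_sql/translate" if translate else "/_sql?format=json"
    payload = {"query": statement}
    if fetch_size is not None:
        payload["fetch_size"] = fetch_size  # rows per page of results
    return "POST", path, json.dumps(payload)

statement = "SELECT category, COUNT(*) AS n FROM products GROUP BY category"
method, path, body = sql_request(statement, translate=True)
print(method, path)
print(body)
```

Translating a statement first is a convenient way to learn which DSL a given SQL construct maps to.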
