Elasticsearch: How to get an exact total hit count when searching

Reference: okeyl.com

Since Elasticsearch 7.0, to improve search performance, the document count reported in the hits field is not always exact. By default, Elasticsearch caps the reported value at 10,000.

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    ...
  }
}

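When more than 10,000 documents match, the returned total value is 10000 and relation is set to gte, meaning the actual count is greater than or equal to that value.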
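To make the meaning of value and relation concrete, here is a minimal Python sketch that reads hits.total from a response that has already been parsed into a dictionary. The helper name read_total and the two sample dictionaries are assumptions made for illustration; only the value and relation fields come from the response shown above.

```python
from typing import Optional, Tuple


def read_total(response: dict) -> Optional[Tuple[int, bool]]:
    """Return (value, is_exact) from a search response, or None if no total is present."""
    total = response.get("hits", {}).get("total")
    if total is None:
        return None
    # "eq" means the value is exact; "gte" means it is only a lower bound.
    return total["value"], total["relation"] == "eq"


# Hand-written sample responses, trimmed to the fields this sketch needs.
capped = {"hits": {"total": {"value": 10000, "relation": "gte"}}}
exact = {"hits": {"total": {"value": 13059, "relation": "eq"}}}

print(read_total(capped))  # (10000, False)
print(read_total(exact))   # (13059, True)
```

Checking relation before trusting value keeps a capped total from being treated as the real count.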

We can check this with a small experiment. Start Kibana:

Then select "Add data":

This loads the Sample flight data set into Elasticsearch.

In Dev Tools, we query the number of documents:

We can see that there are 13059 documents. Suppose we then search as follows:

Clearly the reported count is 10000, which is not the real number of documents that match the query. To get the full count, we can do the following:

We add track_total_hits to the request and set it to true. The response then shows the correct number of matching documents.
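The original requests were shown as screenshots that are not reproduced here, so the following Python sketch only illustrates the idea: take a search body and add track_total_hits set to true. The helper name with_exact_total, the match_all query, and the index name kibana_sample_data_flights in the comment are assumptions, not taken from the original requests.

```python
import copy
import json


def with_exact_total(search_body: dict) -> dict:
    """Return a copy of a search body that asks for an exact total hit count."""
    body = copy.deepcopy(search_body)
    body["track_total_hits"] = True
    return body


# A match_all query stands in for the search used in the experiment (assumption).
default_body = {"query": {"match_all": {}}}
exact_body = with_exact_total(default_body)

# Body one might send with: GET kibana_sample_data_flights/_search (assumed index name)
print(json.dumps(exact_body, indent=2))
```

Copying the body rather than mutating it leaves the default request available for comparison.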

=========================================================================================================================

Elasticsearch Count API vs. track_total_hits: what is the difference?

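I want to count the total number of documents that match a given query (an exists query, for example). I looked at the official ES documentation for the count API: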

Gets the number of matches for a search query.

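It returns the number of documents that match the query, which should meet my needs.

However, I then came across another parameter, track_total_hits. Its documentation says that the total hits for a query may be inaccurate because the search does not visit all matches. The track_total_hits parameter sets a threshold up to which the count is tracked accurately; beyond it, the reported total is only a lower bound.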

Generally the total hit count can't be computed accurately without visiting all matches, which is costly for queries that match lots of documents. The track_total_hits parameter allows you to control how the total number of hits should be tracked.
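As a rough illustration of the quoted passage, this Python sketch summarises the forms that track_total_hits can take: true, false, or a positive integer threshold. The function name describe_track_total_hits is hypothetical, and any other value is rejected only because this sketch does not cover it.

```python
from typing import Union


def describe_track_total_hits(setting: Union[bool, int]) -> str:
    """Summarise what a track_total_hits value asks Elasticsearch to do."""
    # Booleans are checked first, because bool is a subclass of int in Python.
    if setting is True:
        return "count every match; the total is exact"
    if setting is False:
        return "do not track the total at all"
    if isinstance(setting, int) and setting > 0:
        return f"count accurately up to {setting}; beyond that the total is a lower bound"
    raise ValueError(f"value not covered by this sketch: {setting!r}")


for setting in (True, 100, False):
    print(setting, "->", describe_track_total_hits(setting))
```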

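My questions are:

The count returned by the count API is always exact, right? If it is, then it must visit all matches, so does that mean the count API is an expensive operation? When I need the exact number of documents matching a query, should I use the count API or track_total_hits? Are there any pitfalls to watch out for?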
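For reference, the two options being compared can be sketched in Python as plain request descriptions. This does not settle which one is cheaper; it only shows what each request might look like. The index name my-index and the field name user are hypothetical, as are the helper function names.

```python
from typing import Tuple


def count_request(index: str, query: dict) -> Tuple[str, str, dict]:
    """Request for the count API: the response carries the number of matches in "count"."""
    return "GET", f"/{index}/_count", {"query": query}


def exact_total_search_request(index: str, query: dict) -> Tuple[str, str, dict]:
    """Search request that returns no documents but tracks the exact total."""
    body = {"size": 0, "track_total_hits": True, "query": query}
    return "GET", f"/{index}/_search", body


# Hypothetical index and field names, chosen only for illustration.
exists_query = {"exists": {"field": "user"}}
print(count_request("my-index", exists_query))
print(exact_total_search_request("my-index", exists_query))
```

With the search variant the count is read from hits.total.value in the response, while size set to 0 avoids fetching any documents.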

Reference link: Do not compute hit counts by default

===========================================================================================================================================

In ES 7.x, add track_total_hits to the query when you need the exact total; otherwise the reported total stops at 10000.

{
  "track_total_hits": true,
  "query": {
    "range": {
      "ts": {
        "gte": 0
      }
    }
  }
}
