Realtime Search: Solr vs Elasticsearch

2019-03-27 01:08 | Source: Internet

Realtime Search: Solr vs Elasticsearch | Socialcast Engineering
Tuesday May 31st, 2011 by Ryan Sonnek
What is Elasticsearch?

Elasticsearch is a REST-based, distributed search engine powered by the excellent Lucene library. The built-in JSON + HTTP API provides an elegant platform that is easy to integrate with (e.g. the elastic_searchable Ruby gem). It’s simple, scalable and “cool, bonsai cool”.
Why is it better than Solr?

First of all, let’s set the record straight: Solr is fast. I’m serious…it’s really fast! Solr is the de facto search engine for a reason. It’s stable, reliable and, out of the box, it outperforms nearly every search solution for basic vanilla searches (including Elasticsearch).

Unfortunately, it is really easy to break Solr as well. All it takes is performing searches while concurrently updating the index with new content. This is a pretty serious problem if you need to update your search index regularly.

Now throw a few million documents into the index and Solr will be buckling at the knees while Elasticsearch doesn’t break a sweat!

It is painfully apparent that Solr’s architecture was not built for realtime search applications. The demands of realtime web applications require delivery of updates in near realtime as new content is generated by users. The distributed nature of Elasticsearch allows it to keep up with concurrent search + index requests without skipping a beat.
Realworld Results…

After transitioning our search infrastructure from Solr to Elasticsearch, we saw an instant ~50x improvement in search performance!
And now for something a bit more interesting…

The typical realtime search architecture goes something like this:

1. Index user content into the search engine.
2. Perform a set of queries against the search engine to determine if the content matches particular criteria.
3. Perform specific logic notifying registered channels that new content is available.
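
The three steps above can be sketched with a toy in-memory model. This is only an illustration of the polling pattern, not a Solr or Elasticsearch client: the `PollingSearch` class, the `notify` helper, and the "all terms must appear" matching rule are assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class PollingSearch:
    """Toy stand-in for a search engine (hypothetical, not a real client)."""
    docs: list = field(default_factory=list)
    saved_queries: dict = field(default_factory=dict)  # name -> set of required terms

    def index(self, text):
        # Step 1: index user content (here: store its lowercase tokens).
        self.docs.append(set(text.lower().split()))

    def poll(self):
        # Step 2: re-run every saved query against the whole index.
        hits = {}
        for name, terms in self.saved_queries.items():
            count = sum(1 for doc in self.docs if terms <= doc)
            if count:
                hits[name] = count
        return hits


def notify(hits):
    # Step 3: build one message per registered channel that has matches.
    return [f"{name}: {count} match(es)" for name, count in sorted(hits.items())]


engine = PollingSearch(saved_queries={"solr-news": {"solr"}, "es-news": {"elasticsearch"}})
engine.index("Elasticsearch ships a new release")
messages = notify(engine.poll())
print(messages)
```

Note that `poll` has to scan every saved query against every document each time it runs, which is the cost that grows as the index and the number of registered channels grow.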

Elasticsearch can support this model quite well, but it also offers a feature that turns this entire workflow on its head.
Introducing: Percolation!

Elasticsearch percolation is similar to webhooks. The idea is to have Elasticsearch notify your application when new content matches your filters instead of having to constantly poll the search engine to check for new updates.

The new workflow looks like this:

1. Register a specific query (percolation) in Elasticsearch.
2. Index new content (passing a flag to trigger percolation).
3. The response to the indexing operation will contain the matched percolations.
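
To make the inverted workflow concrete, here is a minimal in-memory sketch. It deliberately does not call Elasticsearch: the `ToyPercolator` class, its `register` and `index` methods, the `percolate` flag name, and the shape of the returned dictionary are all hypothetical, chosen only to mirror the three steps above.

```python
class ToyPercolator:
    """In-memory illustration of percolation; not Elasticsearch's actual API."""

    def __init__(self):
        self.queries = {}  # query name -> set of terms that must all appear
        self.docs = []

    def register(self, name, *terms):
        # Step 1: store the query itself, before any matching content exists.
        self.queries[name] = {t.lower() for t in terms}

    def index(self, text, percolate=False):
        # Step 2: index the document; the flag asks for matching queries back.
        tokens = set(text.lower().split())
        self.docs.append(tokens)
        response = {"ok": True}
        if percolate:
            # Step 3: the indexing response carries the names of matched queries.
            response["matches"] = sorted(
                name for name, terms in self.queries.items() if terms <= tokens
            )
        return response


engine = ToyPercolator()
engine.register("es-news", "elasticsearch")
engine.register("solr-news", "solr")
result = engine.index("Elasticsearch adds percolation", percolate=True)
print(result)
```

The point of the design is that only the single incoming document is checked against the registered queries, so the application learns about matches at index time instead of polling the full index afterwards.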

This is the perfect architecture for realtime search and a true gamechanger.
The Bottom Line

Solr may be the weapon of choice when building standard search applications, but Elasticsearch takes it to the next level with an architecture for creating modern realtime search applications. Percolation is an exciting and innovative feature that singlehandedly blows Solr right out of the water. Elasticsearch is scalable, speedy and a dream to integrate with. Adios Solr, it was nice knowing you.
Tagged: search
Comments

David says:

Cool article. Now I know why I love ES! ;-)
Commented on May 31, 2011
jrawlings says:

Was the ‘Search Fresh Index while Idle’ performed against an elasticsearch 5 shard index (the default setup for a newly created index) or a single shard index?
Commented on May 31, 2011
Ryan Sonnek says:

@jrawlings these benchmarks are for the “out of the box” vanilla install of Elasticsearch and Solr so yes, this is using the 5 shard index setting.
Commented on May 31, 2011
umad says:

Elasticsearch is a peach, when it doesn’t break. I’ve had so many nightmares trying to recover from a broken elasticsearch cluster that I wouldn’t recommend it to anyone.

I guess for small sites it’s ok. For serious business, I’ll stick with solr.

It would be nice to see a comparison with riaksearch as well.
Commented on May 31, 2011
Ryan Sonnek says:

@umad in our experience, the exact opposite is true. We pushed Solr so hard to try and support realtime search that we constantly had to deal with Java out of memory issues. Elasticsearch is much more stable (even for a beta application) and runs *so* much smoother.

I’m not sure what you classify as a “small” site. Our search index contains millions of documents and we’re performing hundreds of requests per minute, and Elasticsearch has not had a single hiccup yet.
Commented on June 1, 2011
Philip Ingram says:

That percolation business is awesome. Webhooks make updating realtime data sources easy, and it’s brilliant that Elasticsearch takes that approach. Thanks for sharing.
Commented on May 31, 2011
Ben says:

Good blog post. What were some of the parameters around index sizes (per shard) and commit rates? We have some massive warming times on our Solr indexes that require us to batch our adds before a commit, which is certainly not a position to be in with realtime search. I can see how, without tuning and with default cache warming, you might run into bunches of overlapping warming searchers.
Commented on May 31, 2011
MarcMarc says:

And why not use a master-slave configuration in Solr? Isn’t that a perfect solution for separating add-doc and query operations?
Commented on June 1, 2011
Ryan Sonnek says:

@MarcMarc master-slave really isn’t an option for realtime search applications. The current Solr replication solution is not synchronous so once your update operation is complete on the master, the data is not yet available on all slaves for subsequent searches.

Introducing master-slave for the search index also introduces a lot of operational complexity that if you can avoid, you really should. :)
Commented on June 1, 2011
Vlad Zloteanu says:

Ryan, what was the commit strategy you used with Solr? Commit after each request, autocommit after X secs, autocommit after X docs? This can greatly impact update performance. See http://wiki.apache.org/solr/SolrPerformanceFactors#Updates_and_Commit_Frequency_Tradeoffs, http://blog.raspberry.nl/2011/04/08/solr-update-performance/ and http://www.elevatedcode.com/articles/2009/01/14/speeding-up-solr-indexing/
Commented on June 1, 2011
Ryan Sonnek says:

@vlad we require all content to be immediately available for searches after indexing, so we commit after each update operation. This is the nature of the beast when building a true realtime search application and, as you point out, is not the “preferred” way to integrate with Solr.
Commented on June 1, 2011
Otis Gospodnetic says:

Nice post. You’ll need to compare ES and Solr once Solr starts making use of the underlying Lucene NRT mechanism.

Just to make it clear to readers not familiar with the underlying details:
It is Lucene that adds the NRT support. ES uses it, while Solr does not use it yet. That is different from Solr using the same Lucene API as ES and still performing poorly.
Commented on June 1, 2011
Peter Bengtsson says:

Being a Xapian fan as of many years I’d love to see Xapian benchmarked against ES.
Commented on June 1, 2011
Andy says:

What’s the difference between “search fresh index” and “search full index”?

Were you running Solr and ElasticSearch on the same hardware?
Commented on June 1, 2011
Ryan Sonnek says:

@andy the fresh index benchmarks are done against an empty/clean index. the “full index” benchmarks were done after populating the index with a few million documents. The index is never technically “full”, but it was just a quick way of getting more realistic and real world benchmarks.
Commented on June 1, 2011
db says:

Interesting that umad says he had so many issues with broken clusters, that he stopped recommending ES for production usage. We’ve been running in production for 6 months with significant traffic volume on behalf of demanding clients.

There have been some nice robustness improvements in ES 0.16

We evaluated Solr vs ES and for our data with a wide range of queries, ES was significantly faster than Solr. Tuning Solr is challenging.

David
Commented on June 7, 2011
Steven Hildreth says:

Solr doesn’t support GeoPolygons either, so if you need spatial searches look to ElasticSearch.
Commented on August 24, 2011
David says:

Field collapsing (grouping, or whatever you call it) is still awaited in ES, but exists in Solr.

In some particular use cases this is a must-have feature (think about SKUs in an index, where search results must be products and not SKUs).
Commented on September 16, 2011

Reposted from: http://www.cnblogs.com/lexus/archive/2011/10/11/2207984
