A Comparison of Real-Time Distributed Search Engines (senseidb, Solr, elasticsearch)

2019-03-27 01:09 | Source: Internet

1. Solr
- 1.1 Features
- 1.2 Pros & Cons
- 1.3 References
2. Senseidb
- 2.1 Features
- 2.2 Pros & Cons
- 2.3 Why not use Solr directly?
- 2.4 References
3. elasticsearch
- 3.1 Features
- 3.2 Pros & Cons
- 3.3 References
4. Conclusion
5. Other References

The comparison focuses on the following aspects:

Clustering
- Scalability on Storage and Service
- High Availability Considerations
Features
Flexibility

1. Solr

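Solr and Lucene clearly belong to the same family: Solr adds many extensions on top of Lucene and integrates with it well. Teams in the industry that put stability first also seem to choose Solr for their search stack.

That said, SolrCloud is the real long-term solution, and SolrCloud is not yet production-ready.

The Solr architecture diagram:

[Figure: solr architecture]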

1.1 Features

Solr's home page already lists and explains its features in detail, so they are not repeated here.

1.2 Pros & Cons

Pros
- a mature, proven solution
- rich documentation
- an active community
- plugin extension points
Cons
- the system seems rather sprawling, and the replication architecture may have some trouble scaling?!

1.3 References

New SolrCloud Design
Scaling Lucene and Solr
Turbocharging Solr Index Replication with BitTorrent
- funny and sparkling idea by introducing BitTorrent replication mechanism *****
Distributed Searching
Carrot2-OSS framework for building search clustering engine
- Solr search results clustering is based on the Carrot2 real-time document clustering engine.
Clustering Component
- clustering (grouping) of search results
SolrCloud
UniqueKey
Solr Near Realtime Search
- will be added in Solr4, currently available in trunk
Scaling Solr Indexing with SolrCloud, Hadoop and Behemoth

2. Senseidb

[Figure: architecture of sensei]

2.1 Features

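Mainly addresses high-speed index updates;
- backed by zoie's "2 swapping in-memory indexes + 1 on-disk index" structure
Requires a schema to be defined;
Can ingest multiple data sources through Gateways;
Data can be queried through BQL, the REST API, or bindings for various languages;
Supports bulk index updates through Hadoop MR jobs;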
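
To make the "2 swapping in-memory indexes + 1 on-disk index" idea more concrete, here is a toy Python model. It is only a sketch of the general double-buffering pattern, not zoie's actual API: the class `SwappingIndex`, its method names, and the use of plain dicts in place of Lucene indexes are all assumptions made for illustration.

```python
class SwappingIndex:
    """Toy model: two in-memory buffers that swap roles, plus one 'disk' index.

    Plain dicts stand in for real Lucene indexes (an assumption for illustration).
    """

    def __init__(self):
        self.disk = {}      # stands in for the on-disk index
        self.active = {}    # in-memory buffer receiving new writes
        self.flushing = {}  # in-memory buffer currently being copied to disk

    def add(self, doc_id, text):
        # A write becomes searchable as soon as it lands in the active buffer.
        self.active[doc_id] = text

    def begin_flush(self):
        # Swap roles: the filled buffer turns read-only, a fresh one takes writes.
        if self.flushing:
            raise RuntimeError("previous flush has not finished")
        self.flushing, self.active = self.active, {}

    def finish_flush(self):
        # Persist the read-only buffer, then release it from memory.
        self.disk.update(self.flushing)
        self.flushing = {}

    def search(self, term):
        # Merge oldest to newest so the most recent version of a document wins.
        merged = {**self.disk, **self.flushing, **self.active}
        return sorted(d for d, text in merged.items() if term in text.split())
```

Because searches always consult all three structures, a document stays visible before, during, and after a flush, which is what lets this layout keep up with frequent updates.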

2.2 Pros & Cons

Pros
- high-speed index updates
- multiple data sources
- flexible access interfaces
- integration with the Hadoop ecosystem
- excellent distributed scalability
Cons
- static schema
- application-side version maintenance

2.3 Why not use Solr directly?

An excerpt from an interview with John Wang:

Sensei leverages Lucene.

We weren’t able to leverage Solr because of the following requirements:

    * High update requirement, 10’s of thousands updates per second in to the system
    * Real distributed solution, current Solr’s distributed story has a SPOF at the master, and Solr Cloud is not yet completed.
    * Complex faceting support. Not just your standard terms based faceting. We needed to facet on social graph, dynamic time ranges and many other interesting faceting scenarios. Faceting behavior also needs to be highly customizable, which is not available via Solr.

2.4 References

3. elasticsearch

Very new (currently at version 0.19 RC3) and short on documentation. Still, ES has a lot worth applauding.

3.1 Features

Schema-Free | Schemaless
feed index engine with JSON formatted documents
Query by Lucene based query string or JSON based query DSL over HTTP or Native Java;
shards and replicas, LB and routings
cloud integration
multiple search types
multiple data sources integration with River
many more...
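
As a rough illustration of feeding JSON documents and querying over HTTP, the sketch below only builds the requests and sends nothing. The index name `articles`, the type `post`, and the local base URL are hypothetical; the `/{index}/{type}/{id}` path reflects ES releases of this era, and later releases changed it.

```python
import json

BASE_URL = "http://localhost:9200"  # assumption: a local node on the default HTTP port


def index_request(index, doc_type, doc_id, document):
    """Build the HTTP request that would index one JSON document."""
    url = f"{BASE_URL}/{index}/{doc_type}/{doc_id}"
    return "PUT", url, json.dumps(document, sort_keys=True)


def query_string_request(index, lucene_query):
    """Build a search request using a Lucene-style query string inside the JSON DSL."""
    body = {"query": {"query_string": {"query": lucene_query}}}
    return "POST", f"{BASE_URL}/{index}/_search", json.dumps(body, sort_keys=True)


def term_request(index, field, value):
    """Build a search request using a structured term query from the JSON DSL."""
    body = {"query": {"term": {field: value}}}
    return "POST", f"{BASE_URL}/{index}/_search", json.dumps(body, sort_keys=True)


if __name__ == "__main__":
    for request in (
        index_request("articles", "post", 1, {"title": "hello", "user": "kimchy"}),
        query_string_request("articles", "title:hello"),
        term_request("articles", "user", "kimchy"),
    ):
        print(*request)
```

No schema is declared anywhere in this flow: the document is just JSON, which is the practical meaning of "schema-free" in the list above.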

3.2 Pros & Cons

Pros
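- many flexible, excellent features (see the features list)
- the author has years of experience in the search field
- has essentially all of senseidb's pros as well
Cons
- insufficient documentation
- no large commercial organization backing it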

3.3 References

quick intro to elastic search
Flume, Hive and realtime indexing via ElasticSearch
The Future of Compass & ElasticSearch
Elastic Search: Distributed, Lucene-based Search Engine
ElasticSearch at berlinbuzzwords 2010
Elastic Search Vs. Apache Solr
- this one seems to lean toward ES
Your Data, Your Search
Search Engine Time Machine
- combines transient state with persisted state; write-behind strategy
NoSQL, Yes Search
- smooth integration of many kinds of data sources
Geo Location and Search
- sorting by geo location is a novel feature
Zero Conf Persistency
- Local Gateway (Local Storage | Local FileSystem)
The River
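- A River in ES is close to a Gateway in Senseidb: it is a channel to a data source, and a different River implementation can be provided for each kind of source, such as a River based on the MySQL binlog, an HBase River, a RabbitMQ River, a CouchDB River, etc.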
Percolator
- this Percolator is an ES concept; do not confuse it with Google's Percolator
Versioning
- Optimistic Concurrency Control
New Search Types
- Introduces the count and scan search types; the latter can be used to scroll through large result sets
Data Visualization with ElasticSearch and Protovis
Distributed Diagram (Video)
Road to a Distributed Search Engine (Video)*****
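
The Versioning entry in the list above refers to optimistic concurrency control. The following is a minimal in-memory sketch of that general technique, not ES's implementation: `VersionedStore`, `VersionConflict`, and the `expected_version` parameter are hypothetical names chosen for this example.

```python
class VersionConflict(Exception):
    """Raised when a writer's expected version no longer matches the stored one."""


class VersionedStore:
    """In-memory document store where every write bumps a per-document version."""

    def __init__(self):
        self._docs = {}  # doc_id -> (version, document)

    def get(self, doc_id):
        # Returns (version, document); readers keep the version for a later write.
        return self._docs[doc_id]

    def put(self, doc_id, document, expected_version=None):
        current_version = self._docs.get(doc_id, (0, None))[0]
        # No locks are taken: a stale writer is detected at write time and rejected.
        if expected_version is not None and expected_version != current_version:
            raise VersionConflict(
                f"expected version {expected_version}, found {current_version}"
            )
        new_version = current_version + 1
        self._docs[doc_id] = (new_version, document)
        return new_version
```

With two writers that both read version 1, the first `put(..., expected_version=1)` succeeds and moves the document to version 2, while the second raises `VersionConflict` and has to re-read before retrying.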

4. Conclusion

All are based on Lucene.
All are distributed.
- senseidb shards with multi-write?!
- solr shards with master-slaves and slave pull strategy;
- elasticsearch shards with primary-secondary push strategy;
All partition at document granularity, and all require a unique key for each document (optional in some situations);
Sensei excels at real-time index updates; Solr offers stability and wide adoption; Elasticsearch offers flexibility and good ideas;
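
Document-granularity partitioning on a unique key can be sketched as follows. This assumes a plain hash-modulo routing rule, which is a common approach but is not claimed here to be the exact function any of the three engines uses; `shard_for`, `partition`, and the `id` field name are hypothetical.

```python
import hashlib


def shard_for(unique_key, num_shards):
    """Map a document's unique key to a shard number with a stable hash."""
    # md5 is used only for a hash that is stable across processes, not for security.
    digest = hashlib.md5(unique_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def partition(docs, num_shards, key_field="id"):
    """Place each whole document on exactly one shard, chosen by its unique key."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(str(doc[key_field]), num_shards)].append(doc)
    return shards


if __name__ == "__main__":
    docs = [{"id": i, "title": f"doc {i}"} for i in range(10)]
    for number, shard in enumerate(partition(docs, 3)):
        print(number, [d["id"] for d in shard])
```

Since the shard depends only on the key, an update or delete for a given document always reaches the shard that holds it, which is why each engine wants a unique key per document.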

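5. Other References

Introduction to the Lily Architecture
- Implements data distribution with multiwrite + WAL + message queue inside its own Lily node, without making full use of the capabilities of the components/systems already in place (even though it is built on HBase tables); in part, this makes things more complicated than necessary.

Quoted from: http://afoo.me/notes-on-senseidb-solr-and-elasticsearch.html

Reposted from: http://www.cnblogs.com/ibook360/archive/2013/03/22/2975345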
