Using Neo4j and Lucene in a distributed system

I am looking into Neo4j as a stripped-down document store. A key aspect of document storage is search, and I know Neo4j includes full text search via legacy indices provided by Lucene.
I would be very interested in hearing the limitations of Neo4j search capabilities in a distributed environment. Does it provide a distributed index? In what ways is it inferior to Solr or ElasticSearch? How far can I take it before I must install Solr?
-- EDIT --
We are trying to integrate two distinct search efforts. The first is standard text content search. For instance, using the Enron emails, we want to search for every email that matches "bananas" or "going to the store" and get those document bodies in response. This is where people often turn to Solr.
The second case is more complicated: we have attached a great deal of metadata to each document. We may have decided that "these" emails were the result of late-night drunk-dialing. Now I want to search for all emails that may have been the result of late-night drunk-dialing. For this kind of metadata, we believe a graph database is in order.
In a perfect world, I could use one platform to perform both queries. I appreciate that neither Neo4j nor OrientDB, Arango, etc. is designed as a full-text search database, but I'm trying to understand the limitations that follow from that.
In terms of volume, we are operating at a very large scale with batch-style nightly updates. The data is content-heavy: some documents run to hundreds of pages of text, but most are on the order of a page or two.

Source: https://stackoverflow.com/questions/31295830

Updated: 2022-06-26 13:06

Best answer

capacity is the maximum number of characters that the string can currently hold without having to grow. size is the number of characters actually in the string. They are separate concepts because allocating memory is relatively expensive, so the string tries to allocate as rarely as possible by grabbing more memory than it needs at that moment. (Many data structures use a "doubling" strategy: if they hit their capacity of N and need more space, they allocate 2*N, so that they do not have to reallocate again any time soon.)
capacity increases automatically as you use the string and need more space. You can also increase it manually using the reserve function.
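
To make the difference concrete, here is a minimal C++ sketch. The helper name capacity_steps is made up for illustration and is not part of the standard library; it appends characters one at a time and records each distinct capacity the string passes through. The exact capacities and the growth factor are implementation-defined, so the sketch deliberately does not claim any particular numbers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper (an assumption for this example, not a library call):
// appends `count` characters to `s` one at a time and returns every distinct
// capacity value observed, beginning with the capacity before any append.
std::vector<std::size_t> capacity_steps(std::string& s, std::size_t count) {
    std::vector<std::size_t> steps{s.capacity()};
    for (std::size_t i = 0; i < count; ++i) {
        s.push_back('x');
        // size counts the characters stored; capacity is the room available.
        assert(s.size() <= s.capacity());
        if (s.capacity() != steps.back()) {
            steps.push_back(s.capacity());  // the string just grew its buffer
        }
    }
    return steps;
}
```

Calling capacity_steps on an empty string with a count of 1000 should report only a handful of capacity changes on typical implementations, far fewer than the number of characters appended. Calling s.reserve(1000) first should leave the capacity unchanged for the whole loop, which is the point of reserving when the final size is known in advance.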
