"Should I use multiple indices in Solr?", and some other quick questions

Imagine a classifieds website, a very simple one where users don't have login details.

I currently run this on MySQL. The database has several tables because of the categories, plus one main table for the classifieds themselves, for a total of 7 tables in my case.

I want to use only Solr as a "db" because some people on SO think it would be better, and I agree, provided it works.

Now, I have some quick questions about doing this:

  1. Should I have multiple schema.xml files or solrconfig.xml files?
  2. How do I query multiple indices?
  3. How would this (having multiple indices) affect performance and do I need a more powerful machine (memory, cpu etc...) for managing this?
  4. Would you eventually go with only Solr instead of what I planned to do, which is to use Solr to search and return ID numbers which I use to query and find the classifieds in MySql?

I have some 300,000 records today, and they probably won't be increasing.

I have not tested how the number of records would affect performance when using Solr with MySQL, because I am still building the website, but with MySQL alone it is quite slow. I am hoping it will be better with Solr + MySQL, but as I said, if it is possible I will go with only Solr.

Thanks


Source: https://stackoverflow.com/questions/2357341
Updated: 2022-06-18 12:06

Best answer

I have found one solution. In Haystack 2.0, the schema.xml generated for Solr contains the following definition for the default text field type:

   <fieldType name="text_en" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
          <filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
      <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory"
                ignoreCase="true"
                words="lang/stopwords_en.txt"
                enablePositionIncrements="true"
                />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPossessiveFilterFactory"/>
        <filter class="solr.KeywordMarkerFilterFactory" protected="protwords.txt"/>
        <!-- Optionally you may want to use this less aggressive stemmer instead of PorterStemFilterFactory:
          <filter class="solr.EnglishMinimalStemFilterFactory"/>
        -->
        <filter class="solr.PorterStemFilterFactory"/>
      </analyzer>
    </fieldType>

In Haystack 1.2, the schema.xml generated for Solr contains this definition instead:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

</fieldType>

If you compare the analyzers in the two definitions, you can see that the Haystack 1.2 version uses solr.WhitespaceTokenizerFactory, while the Haystack 2.0 version uses solr.StandardTokenizerFactory.

To keep the Haystack 1.2 behavior in Haystack 2.0, you can create a new text field type named, for example, "text_exact", with the same contents as the Haystack 1.2 text field definition, and then change every field in schema.xml that uses "text_en" so that it uses "text_exact" instead.

For example, starting from this field definition:

<field name="sales_user" type="text_en" indexed="true" stored="true" multiValued="false" />

we end up with this one:

<field name="sales_user" type="text_exact" indexed="true" stored="true" multiValued="false" /> 
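
As a rough sketch, the new type could look like the following. It assumes the Haystack 1.2 definition shown above is copied unchanged apart from the name; "text_exact" is only an illustrative name, the XML comments are added here for explanation, and every filter class listed must still be available in the Solr version you run.

<!-- Hypothetical "text_exact" type: the Haystack 1.2 "text" definition under a new name -->
<fieldType name="text_exact" class="solr.TextField" positionIncrementGap="100">
      <analyzer type="index">
        <!-- Whitespace tokenizer, as in the Haystack 1.2 schema -->
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

      <analyzer type="query">
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
        <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_en.txt"/>
        <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      </analyzer>

</fieldType>

Because the index-time analyzer changes, documents that were indexed under "text_en" will generally need to be reindexed before searches behave consistently.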

The official Haystack 2.0 documentation does not offer a solution for this major change. It would be useful to include examples like this one in the section dedicated to migrating from Haystack 1.x to 2.x.
