Efficiency aspect of delta import in Solr
I have about 2,100,000 rows of data. A full import takes about 2 minutes. To index updates to the table I use delta import, which takes about 6 minutes.
Judging by these numbers, a full import is more efficient than a delta import. So what is delta import for? Is there a better way to use delta import to make it more efficient?
I followed the steps in the documentation.
data-config.xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.dbschema.CassandraJdbcDriver"
              url="jdbc:cassandra://127.0.0.1:9042/test"
              autoCommit="true" rowLimit="-1" batchSize="-1"/>
  <document name="content">
    <entity name="test"
            query="SELECT * from person"
            deltaImportQuery="select * from person where seq=${dataimporter.delta.seq}"
            deltaQuery="select seq from person where last_modified > '${dataimporter.last_index_time}' ALLOW FILTERING"
            autoCommit="true">
      <field column="seq" name="id" />
      <field column="last" name="last_s" />
      <field column="first" name="first_s" />
      <field column="city" name="city_s" />
      <field column="zip" name="zip_s" />
      <field column="street" name="street_s" />
      <field column="age" name="age_s" />
      <field column="state" name="state_s" />
      <field column="dollar" name="dollar_s" />
      <field column="pick" name="pick_s" />
    </entity>
  </document>
</dataConfig>
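A possible way to make incremental updates cheaper, offered as a sketch rather than a verified fix: DataImportHandler's delta-import runs deltaQuery once to collect the changed seq values and then runs deltaImportQuery separately for each of them, so every changed row costs an extra round trip to Cassandra on top of the deltaQuery itself. The Solr wiki's "delta import via full-import" pattern instead selects all changed rows in one query and runs it as a full-import with clean=false. That pattern is usually written with an OR condition, which CQL does not support, so the fragment below assumes a second root entity instead. Its name, test_changed, is hypothetical, and the fragment has not been tried with com.dbschema.CassandraJdbcDriver.

```xml
<!-- Hypothetical extra root entity: place it inside <document name="content">
     alongside the existing entity "test". It fetches every row changed since
     the last import in a single query, instead of one deltaImportQuery per row. -->
<entity name="test_changed"
        query="SELECT * from person where last_modified > '${dataimporter.last_index_time}' ALLOW FILTERING">
  <!-- Same column-to-field mappings as entity "test" -->
  <field column="seq" name="id" />
  <field column="last" name="last_s" />
  <field column="first" name="first_s" />
  <field column="city" name="city_s" />
  <field column="zip" name="zip_s" />
  <field column="street" name="street_s" />
  <field column="age" name="age_s" />
  <field column="state" name="state_s" />
  <field column="dollar" name="dollar_s" />
  <field column="pick" name="pick_s" />
</entity>
```

Assuming the handler is at the usual /dataimport path, a full rebuild would then be requested with command=full-import&entity=test and an incremental update with command=full-import&entity=test_changed&clean=false. Naming the entity on each call matters: a full-import without an entity parameter runs every root entity, and without clean=false a full-import clears the index before indexing. The ALLOW FILTERING condition is unchanged from the original deltaQuery, so Cassandra may still have to read through the whole table to find the changed rows.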
Source: https://stackoverflow.com/questions/45540956