Solr: a custom Search RequestHandler
As you know, I've been playing with Solr lately, trying to see how feasible it would be to customize it for our needs. We have been a Lucene shop for a while, and we've built our own search framework around it, which has served us well so far. The rationale for moving to Solr is driven primarily by the need to expose our search tier as a service for our internal applications. While it would have been relatively simple (probably simpler) to slap on an HTTP interface over our current search tier, we also want to use the other Solr features such as incremental indexing and replication.
One of our challenges to using Solr is that the way we do search is quite different from the way Solr does search. A query string passed to the default Solr search handler is parsed into a Lucene query and a single search call is made on the underlying index. In our case, the query string is passed to our taxonomy, and depending on the type of query (as identified by the taxonomy), it is sent through one or more sub-handlers. Each sub-handler converts the query into a (different) Lucene query and executes the search against the underlying index. The results from each sub-handler are then layered together to present the final search result.
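To make the layering idea concrete, here is a minimal, library-free sketch of the flow described above. The names (SubHandler, layer) are hypothetical stand-ins, not our actual code, and plain ints stand in for Lucene hits; the point is just the shape: each sub-handler contributes hits in turn, and a Set of already-seen doc ids keeps later layers from duplicating earlier ones.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of the layered search flow: each sub-handler
// contributes hits in turn, and a Set of doc ids prevents duplicates.
public class LayeredSearchSketch {

    // A sub-handler returns doc ids (stand-ins for Lucene hits) for a query.
    interface SubHandler {
        List<Integer> search(String query);
    }

    static List<Integer> layer(String query, List<SubHandler> handlers) {
        Set<Integer> alreadyFound = new HashSet<>();
        List<Integer> results = new ArrayList<>();
        for (SubHandler handler : handlers) {
            for (int docId : handler.search(query)) {
                if (alreadyFound.add(docId)) { // add() returns false on duplicates
                    results.add(docId);
                }
            }
        }
        return results;
    }

    public static void main(String[] args) {
        SubHandler h1 = q -> Arrays.asList(1, 2, 3);
        SubHandler h2 = q -> Arrays.asList(3, 4); // 3 is already layered, so it is skipped
        System.out.println(layer("some query", Arrays.asList(h1, h2)));
    }
}
```

Earlier layers win: a document surfaced by the first sub-handler keeps its position and score even if a later sub-handler would also have matched it.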
Conceptually, the customization is quite simple - simply create a custom subclass of RequestHandlerBase (as advised on this wiki page) and override the handleRequestBody(SolrQueryRequest, SolrQueryResponse) method. In reality, I had quite a tough time doing this, admittedly caused (at least partly) by my ignorance of Solr internals. However, I did succeed, so, in this post, I outline my solution, along with some advice I feel would be useful to others embarking on a similar route.
Configuration and Code
The handler is configured to trigger in response to a /solr/mysearch request. Here is the (rewritten for readability) XML snippet from my solrconfig.xml file. I used the "invariants" block to pass in configuration parameters for the handler.
...
<requestHandler name="/mysearch"
class="org.apache.solr.handler.ext.MyRequestHandler">
<lst name="defaults">
<str name="echoParams">explicit</str>
<str name="fl">*,score</str>
<str name="wt">xml</str>
</lst>
<lst name="invariants">
<str name="prop1">value1</str>
<int name="prop2">value2</int>
<!-- ... more config items here ... -->
</lst>
</requestHandler>
...
And here is the (also rewritten for readability) code for the custom handler. I used the SearchHandler and MoreLikeThisHandler as my templates, but diverged from them in several ways in order to accommodate my requirements. I will describe them below.
package org.apache.solr.handler.ext;
// imports omitted
public class MyRequestHandler extends RequestHandlerBase {
private String prop1;
private int prop2;
...
private TaxoService taxoService;
@Override
public void init(NamedList args) {
super.init(args);
this.prop1 = invariants.get("prop1");
this.prop2 = Integer.valueOf(invariants.get("prop2"));
...
this.taxoService = new TaxoService(prop1);
}
@Override
public void handleRequestBody(SolrQueryRequest req, SolrQueryResponse rsp)
throws Exception {
// extract params from request
SolrParams params = req.getParams();
String q = params.get(CommonParams.Q);
String[] fqs = params.getParams(CommonParams.FQ);
int start = 0;
try { start = Integer.parseInt(params.get(CommonParams.START)); }
catch (Exception e) { /* default */ }
int rows = 0;
try { rows = Integer.parseInt(params.get(CommonParams.ROWS)); }
catch (Exception e) { /* default */ }
SolrPluginUtils.setReturnFields(req, rsp);
// build initial data structures
TaxoResult taxoResult = taxoService.getResult(q);
SolrDocumentList results = new SolrDocumentList();
SolrIndexSearcher searcher = req.getSearcher();
Map<String,SchemaField> fields = req.getSchema().getFields();
int ndocs = start + rows;
Filter filter = buildFilter(fqs, req);
Set<Integer> alreadyFound = new HashSet<Integer>();
// invoke the various sub-handlers in turn and return results
doSearch1(results, searcher, q, filter, taxoResult, ndocs, req,
fields, alreadyFound);
doSearch2(results, searcher, q, filter, taxoResult, ndocs, req,
fields, alreadyFound);
// ... more sub-handler calls here ...
// build and write response
float maxScore = 0.0F;
int numFound = 0;
List<SolrDocument> slice = new ArrayList<SolrDocument>();
for (Iterator<SolrDocument> it = results.iterator(); it.hasNext(); ) {
SolrDocument sdoc = it.next();
Float score = (Float) sdoc.getFieldValue("score");
if (maxScore < score) {
maxScore = score;
}
if (numFound >= start && numFound < start + rows) {
slice.add(sdoc);
}
numFound++;
}
results.clear();
results.addAll(slice);
results.setNumFound(numFound);
results.setMaxScore(maxScore);
results.setStart(start);
rsp.add("response", results);
}
private Filter buildFilter(String[] fqs, SolrQueryRequest req)
throws IOException, ParseException {
if (fqs != null && fqs.length > 0) {
BooleanQuery fquery = new BooleanQuery();
for (int i = 0; i < fqs.length; i++) {
QParser parser = QParser.getParser(fqs[i], null, req);
fquery.add(parser.getQuery(), Occur.MUST);
}
return new CachingWrapperFilter(new QueryWrapperFilter(fquery));
}
return null;
}
private void doSearch1(SolrDocumentList results,
SolrIndexSearcher searcher, String q, Filter filter,
TaxoResult taxoResult, int ndocs, SolrQueryRequest req,
Map<String,SchemaField> fields, Set<Integer> alreadyFound)
throws IOException {
// check entry condition
if (! canEnterSearch1(q, filter, taxoResult)) {
return;
}
// build custom query and extra fields
Query query = buildCustomQuery1(q, taxoResult);
Map<String,Object> extraFields = new HashMap<String,Object>();
extraFields.put("search_type", "search1");
boolean includeScore =
req.getParams().get(CommonParams.FL).contains("score");
append(results, searcher.search(
query, filter, maxDocsPerSearcherType).scoreDocs,
alreadyFound, fields, extraFields, maprelScoreCutoff,
searcher.getReader(), includeScore);
}
// ... more doSearchXXX() calls here ...
private void append(SolrDocumentList results, ScoreDoc[] more,
Set<Integer> alreadyFound, Map<String,SchemaField> fields,
Map<String,Object> extraFields, float scoreCutoff,
SolrIndexReader reader, boolean includeScore) throws IOException {
for (ScoreDoc hit : more) {
if (alreadyFound.contains(hit.doc)) {
continue;
}
Document doc = reader.document(hit.doc);
SolrDocument sdoc = new SolrDocument();
for (String fieldname : fields.keySet()) {
SchemaField sf = fields.get(fieldname);
if (sf.stored()) {
sdoc.addField(fieldname, doc.get(fieldname));
}
}
for (String extraField : extraFields.keySet()) {
sdoc.addField(extraField, extraFields.get(extraField));
}
if (includeScore) {
sdoc.addField("score", hit.score);
}
results.add(sdoc);
alreadyFound.add(hit.doc);
}
}
//////////////////////// SolrInfoMBeans methods //////////////////////
@Override
public String getDescription() {
return "My Search Handler";
}
@Override
public String getSource() {
return "$Source$";
}
@Override
public String getSourceId() {
return "$Id$";
}
@Override
public String getVersion() {
return "$Revision$";
}
}
Configuration Parameters - I started out baking most of my "configuration" parameters as constants within the handler code, but later moved them into the invariants block in the XML declaration. This is not ideal, since we still need to touch the solrconfig.xml file (which is regarded as application code in our environment) to change behavior. The ideal solution, given the circumstances, would probably be to hold the configuration parameters in JNDI and have the handler look up the properties it needs.
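A minimal sketch of what that JNDI lookup could look like, assuming the container registers properties under java:comp/env; the property name and default are hypothetical. The lookup falls back to a hard-coded default when no JNDI provider is available, so the handler still initializes in a bare JVM:

```java
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical sketch: pull a handler property from JNDI, falling back
// to a default when no provider is configured or the name is not bound.
public class JndiConfigSketch {

    static String lookupOrDefault(String name, String defaultValue) {
        try {
            InitialContext ctx = new InitialContext();
            Object value = ctx.lookup("java:comp/env/" + name);
            return value != null ? value.toString() : defaultValue;
        } catch (NamingException e) {
            // No initial context provider, or name not bound -- use the default.
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        // In a container with "mysearch/prop1" bound, this returns the bound
        // value; standalone it falls through to the default.
        System.out.println(lookupOrDefault("mysearch/prop1", "value1"));
    }
}
```

With this in place, init() would call lookupOrDefault() instead of reading the invariants block, and behavior could change without touching solrconfig.xml.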
Using Filter - The MoreLikeThis handler converts the fq (filter query) parameter into a List of Query objects, because that is what searcher.getDocList() expects. In my case, I couldn't use DocListAndSet because DocList is unmodifiable (i.e., DocList.add() throws an UnsupportedOperationException). So I fell back to the pattern I am used to, which is getting the ScoreDoc[] array from a standard searcher.search(Query, Filter, numDocs) call. That is why buildFilter() above returns a Filter and not a List<Query>.
Connect to external services - My handler needs to connect to the taxonomy service. Our taxonomy exposes an RMI service with a very rich and fine-grained API. I tried to use this at first, but ran into problems because it needs access to configuration files on the local system, and Jetty couldn't see these files because they were not within its context. I ended up solving this by exposing a coarse-grained JSON service over HTTP on the taxonomy service. The handler calls it once per query and gets back all the information it needs in a single call. Probably not ideal, since now the logic is spread out in two places - I will probably revisit the RMI client integration in the future.
Layer multiple resultsets - This is the main reason for writing the custom handler. Most of the work happens in the append() method above. Each sub-handler calls SolrSearcher.search(Query, Filter, numDocs) and populates its resulting ScoreDocs array into a List<SolrDocument>. Since previous sub-handlers may have already returned a result, subsequent sub-handlers check against a Set of docIds.
Add a pseudo-field to the Document - There are currently two competing initiatives in Solr (SOLR-1566 and SOLR-1298) on how to handle this situation. Since I was populating SolrDocument objects (this was one of the reasons I started using SolrDocumentList), it was relatively simple for me to pass in a Map of extra fields which are just tacked on to the end of the SolrDocument.
Some Miscellaneous advice
Here is some advice and tips which I wish someone had told me before I started out on this.
For your own sanity, standardize on a Solr release. I chose 1.4.1, which is the latest at the time of writing this. Prior to that, I was developing against the Solr trunk. One day (after about 60-70% of my code was working), I decided to do an svn update, and all of a sudden there was a huge bunch of compile failures (in my code as well as the Solr code). Some of them were probably caused by missing or out-of-date JARs in my .classpath. But the point is that Solr is being actively developed, there is quite a bit of code churn, and if you really want to work on the trunk (or a pre-release branch), you should be ready to deal with these situations.
Solr is well designed (so the flow is fairly intuitive) and reasonably well documented, but there are some places where you will probably need to step through the code in a debugger to figure out what's going on. I am still using the Jetty container in the examples subdirectory. This page on Lucid Imagination outlines the steps needed to run Solr within Eclipse using the Jetty plugin, but thanks to the information on this StackOverflow page, all I did was add some command-line parameters to the java call, like so:
sujit@cyclone:example$ java -Dsolr.solr.home=my_schema \
-agentlib:jdwp=transport=dt_socket,server=y,address=8883,suspend=n \
-jar start.jar
and then set up an external debug configuration for localhost:8883 in Eclipse, and I could step through the code just fine.
Solr has very aggressive caching (which is great for a production environment), but for development you need to disable it. I did this by commenting out the filterCache, queryResultCache and documentCache sections in solrconfig.xml, and changing httpCaching to use never304="true".
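For reference, the httpCaching change looks something like this in solrconfig.xml (a sketch based on the stock 1.4.x example config):

```xml
<!-- development setting: never send 304 Not Modified, so every
     request is re-evaluated against the current index -->
<httpCaching never304="true"/>
```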
Conclusion
The approach I described here is not as performant as the "standard" flow. Because I have to do multiple searches in a single request, I am doing more I/O. I am also consuming more CPU cycles since I have to dedup documents across each layer. I am also consuming more memory per request because I populate the SolrDocument inline rather than just pass the DocListAndSet to the ResponseBuilder. I don't see a way around it, though, given the nature of my requirements.
If you are a Solr expert, or someone who is familiar with the internals, I would appreciate hearing your thoughts about this approach - criticisms and suggestions are welcome.
http://sujitpal.blogspot.com/2011/02/solr-custom-search-requesthandler.html
Reposted from: http://www.cnblogs.com/chenying99/p/3470393
我找到了一种无需编程的方法。 在Solr Search配置页面(结构/视图)中,您可以使用视图公开过滤器来根据需要添加任意多个索引特定的表单输入。 I found a way without programming anything. In the Solr Search config page (Structure / Views) you can use the Views exposed filter to add as many index specific form input as you wa .../export端点仅与本地节点相关,但Streaming Expressions API (在没有任何进一步配置的/stream下可用)构建在/export端点之上,并且是云替代方案。 这也允许您在请求时处理内容(如果适用)。 /stream所需的参数与/ export相同。 但是,由于你在4.10.2上,你将不得不从Zookeeper请求clusterstate.json,然后在本地合并结果之前自己查询每个节点。 您可以通过连接到Zookeeper来检索此文件: zkCli.sh -server ip:2 ...