
Problems that can arise under high concurrency with the httpclient used by SearchHandler in Solr 1.4

2019-03-27 01:18 | Source: the web

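Distributed search in Solr 1.4 sends its requests to the shard nodes through httpclient. This is implemented mainly in the SearchHandler class, which contains an inner class, HttpCommComponent. That class holds an httpclient in a static field, so there is exactly one instance per JVM, shared and reused by all requests. The relevant code: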

static HttpClient client;

static {
    MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
    mgr.getParams().setDefaultMaxConnectionsPerHost(20);
    mgr.getParams().setMaxTotalConnections(10000);
    mgr.getParams().setConnectionTimeout(SearchHandler.connectionTimeout);
    mgr.getParams().setSoTimeout(SearchHandler.soTimeout);
    // mgr.getParams().setStaleCheckingEnabled(false);
    client = new HttpClient(mgr);
}

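Two of these parameters, the per-host and the total connection limits, are hard-coded. Both control the pool that manages the HTTP connections.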

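httpclient pools its connections when handling requests. Internally there are effectively two pools: a global ConnectionPool and a per-host HostConnectionPool. The parameter maxHostConnections is the number of connections a HostConnectionPool may hold for its host, and maxTotalConnections is the maximum number of connections the ConnectionPool may hold overall. Per-host settings are represented by HostConfiguration. HttpClient has a method int executeMethod(final HostConfiguration hostConfiguration, final HttpMethod method) that lets the caller choose a HostConfiguration, but most code does not pass one explicitly. In that case httpclient starts from its default HostConfiguration and, for an absolute request URI, takes the host from the URI, so every target host gets its own per-host pool and all of those pools are governed by the same default per-host limit. If these two parameters are not set, httpclient falls back on the defaults defined in MultiThreadedHttpConnectionManager: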

 /** The default maximum number of connections allowed per host */
    public static final int DEFAULT_MAX_HOST_CONNECTIONS = 2;   // Per RFC 2616 sec 8.1.4
 
    /** The default maximum number of connections allowed overall */
    public static final int DEFAULT_MAX_TOTAL_CONNECTIONS = 20;

The maximum number of connections to each target host is set to 20:

mgr.getParams().setDefaultMaxConnectionsPerHost(20);

The total number of connections is capped at 10000:

mgr.getParams().setMaxTotalConnections(10000);

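For exactly how connections are allocated, see the implementation of MultiThreadedHttpConnectionManager: as long as the overall limit is not exceeded, any single target host is given at most 20 connections.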
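To make the allocation rule concrete, here is a toy model in plain Java. It is not httpclient's code: the class `TwoLevelPoolModel` and its methods are hypothetical, and instead of blocking it simply refuses a connection when either the per-host count or the overall count has reached its limit.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model (hypothetical, not httpclient's code) of a per-host plus total connection limit. */
class TwoLevelPoolModel {
    private final int maxPerHost;
    private final int maxTotal;
    private final Map<String, Integer> inUsePerHost = new HashMap<>();
    private int inUseTotal = 0;

    TwoLevelPoolModel(int maxPerHost, int maxTotal) {
        this.maxPerHost = maxPerHost;
        this.maxTotal = maxTotal;
    }

    /** Grants a connection only if both the per-host and the total limit allow it. */
    public synchronized boolean tryAcquire(String host) {
        int forHost = inUsePerHost.getOrDefault(host, 0);
        if (forHost >= maxPerHost || inUseTotal >= maxTotal) {
            return false; // the real connection manager would make the caller wait here
        }
        inUsePerHost.put(host, forHost + 1);
        inUseTotal++;
        return true;
    }

    /** Returns a connection to the pool; assumes a matching successful tryAcquire(host). */
    public synchronized void release(String host) {
        inUsePerHost.merge(host, -1, Integer::sum);
        inUseTotal--;
    }

    public static void main(String[] args) {
        // The hard-coded Solr 1.4 limits: 20 per host, 10000 in total.
        TwoLevelPoolModel pool = new TwoLevelPoolModel(20, 10000);
        int granted = 0;
        for (int i = 0; i < 25; i++) {
            if (pool.tryAcquire("shard-a:8080")) {
                granted++;
            }
        }
        System.out.println("granted to shard-a: " + granted);               // 20
        System.out.println("shard-b available: " + pool.tryAcquire("shard-b:8080")); // true
    }
}
```

The 21st request to shard-a is refused even though the pool as a whole is nearly empty, which is the situation the next paragraph describes.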

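Here lies the problem: if requests go mainly to two machines, at most 20 × 2 = 40 connections can be open at once, and under high concurrency further requests block waiting for a free connection. For a high-traffic production service, 20 is rather stingy.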
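The arithmetic can be written out: the number of connections that can be in flight at once is bounded by the smaller of perHost × hosts and the total limit. A minimal sketch; `effectiveCapacity` is a hypothetical helper, not part of httpclient.

```java
/** Hypothetical helper: upper bound on simultaneous pooled connections to a set of hosts. */
class PoolCapacity {
    static int effectiveCapacity(int maxPerHost, int maxTotal, int hosts) {
        // Each host is capped at maxPerHost, and the whole pool at maxTotal.
        long perHostSum = (long) maxPerHost * hosts;
        return (int) Math.min(perHostSum, maxTotal);
    }

    public static void main(String[] args) {
        // Solr 1.4 settings with two shard hosts: min(20 * 2, 10000)
        System.out.println(effectiveCapacity(20, 10000, 2)); // 40
        // httpclient defaults (2 per host, 20 total) with two hosts: min(2 * 2, 20)
        System.out.println(effectiveCapacity(2, 20, 2));     // 4
    }
}
```

With only a handful of shards the total limit of 10000 never comes into play; the per-host limit of 20 is the one that binds.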

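Under high concurrency, a request in httpclient most likely blocks in getConnectionWithTimeout(), waiting for a pooled connection to be freed; the httpclient source can be traced for the details.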
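In httpclient 3.x the wait inside getConnectionWithTimeout() is bounded by the connection-manager timeout (set with `client.getParams().setConnectionManagerTimeout(ms)`), and its default of 0 means wait indefinitely. The sketch below models that with a plain `java.util.concurrent.Semaphore` standing in for one host's pool, which is a modeling assumption rather than httpclient code: with a bounded wait, an exhausted pool produces a quick failure instead of a hung thread.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

/** Models a per-host pool as a Semaphore (an illustration, not httpclient code). */
class BoundedWaitDemo {
    /** Tries to take a connection slot, waiting at most timeoutMs; false means the wait timed out. */
    static boolean acquireWithTimeout(Semaphore pool, long timeoutMs) throws InterruptedException {
        return pool.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    public static void main(String[] args) throws InterruptedException {
        Semaphore perHostPool = new Semaphore(2); // two connections per host
        System.out.println(acquireWithTimeout(perHostPool, 100)); // true
        System.out.println(acquireWithTimeout(perHostPool, 100)); // true
        // Both slots are held and never released, so a third caller gives up after about 100 ms.
        long start = System.nanoTime();
        boolean third = acquireWithTimeout(perHostPool, 100);
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(third + " after ~" + waitedMs + " ms");
    }
}
```

In httpclient 3.x the equivalent timeout surfaces as a ConnectionPoolTimeoutException, which the caller can handle instead of waiting forever.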

To reproduce the problem these parameters cause, I wrote a small test program:

package org.yzy.jetty;

import org.apache.commons.httpclient.HttpClient;
import org.apache.commons.httpclient.HttpMethod;
import org.apache.commons.httpclient.MultiThreadedHttpConnectionManager;
import org.apache.commons.httpclient.methods.GetMethod;

public class HttpClientTest {

	static HttpClient client;

	static {
		MultiThreadedHttpConnectionManager mgr = new MultiThreadedHttpConnectionManager();
		// deliberately small per-host limit so that three threads exhaust the pool
		mgr.getParams().setDefaultMaxConnectionsPerHost(2);
		mgr.getParams().setMaxTotalConnections(10);
		mgr.getParams().setConnectionTimeout(2000);
		mgr.getParams().setSoTimeout(1000);
		client = new HttpClient(mgr);
	}

	public static void main(String[] args) {
		
		Thread t[]=new Thread[3];

		for (int i = 0; i < t.length; i++) {
			t[i]=new Thread(new Send());
		}
		
		for (int i = 0; i < t.length; i++) {
			t[i].start();
		}
	
	}

	public static class Send implements Runnable {
		@Override
		public void run() {
			try{
				HttpMethod method = new GetMethod("http://localhost:8080/solr");
				System.out.println(Thread.currentThread().getName()+"-" +Thread.currentThread().getId() +":send");
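				// the response body is never read and releaseConnection() is never called,
				// so the pooled connection is not returned after executeMethod() completes
				int result = client.executeMethod(method);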
				System.out.println(Thread.currentThread().getName()+"-" +Thread.currentThread().getId() +":back" +result);
			}catch(Exception e){
				e.printStackTrace();
			}

		}

	}

}

Output:

Thread-2-11:send
Thread-1-10:send
Thread-3-12:send
Thread-3-12:back200
Thread-2-11:back200

One request never completes: its thread stays blocked waiting for a connection.

Now change the Send class so that it releases its connection:

	public static class Send implements Runnable {
		@Override
		public void run() {
			// created outside the try block so that finally can release it
			HttpMethod method = new GetMethod("http://localhost:8080/solr");
			try{
				System.out.println(Thread.currentThread().getName()+"-" +Thread.currentThread().getId() +":send");
				int result = client.executeMethod(method);
				System.out.println(Thread.currentThread().getName()+"-" +Thread.currentThread().getId() +":back" +result);
				// keep the connection for 1 s to simulate a slow request
				Thread.sleep(1000);
			}catch(Exception e){
				e.printStackTrace();
			}finally{
				// return the connection to the pool so that a waiting thread can use it
				method.releaseConnection();
				System.out.println("relase..");
			}

		}

	}

Run it again:

Thread-3-12:send
Thread-1-10:send
Thread-2-11:send
Thread-2-11:back200
Thread-3-12:back200
relase..
Thread-1-10:back200
relase..

Once a connection is released, the blocked thread can take it and complete its request.
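The same behaviour can be reproduced without a Solr server. In this sketch a `Semaphore` with two permits stands in for the per-host pool (an assumption for illustration); each worker releases its slot in a finally block, the counterpart of `method.releaseConnection()`, so the third worker waits for a slot and then finishes. `runWorkers` is a hypothetical helper.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicInteger;

/** Models the fixed Send class: every worker returns its slot in a finally block. */
class ReleaseInFinallyDemo {
    static int runWorkers(int workers, int poolSize, long holdMs) throws InterruptedException {
        Semaphore pool = new Semaphore(poolSize); // stands in for the per-host connection pool
        AtomicInteger completed = new AtomicInteger();
        Thread[] threads = new Thread[workers];
        for (int i = 0; i < workers; i++) {
            threads[i] = new Thread(() -> {
                try {
                    pool.acquire(); // blocks while every slot is in use
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                try {
                    Thread.sleep(holdMs); // simulates the request plus the sleep in Send
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    pool.release(); // counterpart of method.releaseConnection()
                }
            });
            threads[i].start();
        }
        for (Thread t : threads) {
            t.join();
        }
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        // Three workers share two slots; the third waits for a release and then completes.
        System.out.println(runWorkers(3, 2, 100)); // 3
    }
}
```

Removing the `pool.release()` call would leave the third worker blocked in `acquire()` forever, mirroring the first version of the test program.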

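There was one more puzzle. Each user request to Solr is fanned out into three requests: to the main index, the small index, and the album index. It was always the main index that threw a socket timeout exception. How can that be explained?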

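First, the three sub-requests are processed in parallel threads. Searches against the main index take a long time, while the small and album indexes respond quickly and call releaseConnection() early. So at any given moment those two have plenty of free connections, whereas the slow main (large) index holds its connections longer, and its demand for simultaneous connections exceeds the limit of 20. That is why requests to the large index block.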
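A rough way to see why only the slow shard runs out is Little's law: the average number of connections a shard keeps busy is about arrival rate × mean response time. The request rate and latencies below are made-up illustrative numbers, not measurements from the article; `ShardDemand` is a hypothetical class.

```java
/** Little's law estimate of concurrent connections per shard (illustrative numbers only). */
class ShardDemand {
    /** Average in-flight requests = arrival rate (req/s) * mean response time (s). */
    static double inFlight(double requestsPerSecond, double responseTimeSeconds) {
        return requestsPerSecond * responseTimeSeconds;
    }

    public static void main(String[] args) {
        int maxPerHost = 20; // the hard-coded per-host limit in Solr 1.4
        double rate = 100.0; // hypothetical: every user query hits all three shards
        String[] shards = {"main", "small", "album"};
        double[] latency = {0.40, 0.02, 0.05}; // hypothetical mean response times in seconds
        for (int i = 0; i < shards.length; i++) {
            double need = inFlight(rate, latency[i]);
            System.out.printf("%s: ~%.0f connections needed, limit %d%s%n",
                    shards[i], need, maxPerHost, need > maxPerHost ? " -> requests queue" : "");
        }
    }
}
```

Under these assumed numbers the main index needs about 40 concurrent connections against a limit of 20, while the other two need only a few, which matches the pattern described above.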

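As for why the error is a socket timeout: Solr runs on Tomcat, and Tomcat is configured with a connection timeout, for example 3000 ms. Because the blocked request has not completed by then, Tomcat drops the connection, and what we finally see is a socket timeout exception.

Socket timeouts mostly occur while waiting to read data. That is my analysis.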
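To illustrate what "waiting to read data" means at the socket level, here is a self-contained sketch using only java.net: a local server accepts the connection but never writes, so the client's read() fails with SocketTimeoutException once its SO_TIMEOUT expires. It only shows the client-side read timeout; it does not reproduce the Tomcat setup described above, and `readTimesOut` is a hypothetical helper.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/** Shows that a socket read timeout fires while waiting for data that never arrives. */
class ReadTimeoutDemo {
    static boolean readTimesOut(int soTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // A local server that accepts the connection but never sends a byte.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) { // kept open so read() does not see end of stream
            client.setSoTimeout(soTimeoutMs); // read timeout, the same idea as setSoTimeout(...) above
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks waiting for data
                return false;
            } catch (SocketTimeoutException e) {
                return true; // "Read timed out"
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(readTimesOut(200)); // true
    }
}
```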

Newer Solr versions have since fixed this: the connection pool is now configurable.

For the configuration options in newer Solr versions, see the wiki: http://wiki.apache.org/solr/SolrConfigXml/

<requestHandler name="standard" class="solr.SearchHandler" default="true">
    <!-- other params go here -->
    <shardHandlerFactory class="HttpShardHandlerFactory">
        <int name="socketTimeout">1000</int>
        <int name="connTimeout">5000</int>
    </shardHandlerFactory>
</requestHandler>

The parameters that can be specified are as follows:

socketTimeout. default: 0 (use OS default) - The amount of time in ms that a socket is allowed to wait for data.
connTimeout. default: 0 (use OS default) - The amount of time in ms allowed for binding/connecting a socket.
maxConnectionsPerHost. default: 20 - The maximum number of connections made to each individual shard in a distributed search.
corePoolSize. default: 0 - The retained lower limit on the number of threads used to coordinate distributed search.
maximumPoolSize. default: Integer.MAX_VALUE - The maximum number of threads used to coordinate distributed search.
maxThreadIdleTime. default: 5 seconds - How long to wait before threads are scaled back in response to a reduction in load.
sizeOfQueue. default: -1 - If specified, the thread pool uses a backing queue instead of a direct hand-off buffer. This may seem hard to grasp, but essentially, high-throughput systems will want a direct hand-off (-1), while systems that want better latency will want a reasonably sized queue to absorb variations in requests.
fairnessPolicy. default: false - Selects the JVM's fair queuing policy. If enabled, distributed searches are handled first in, first out, at a cost to throughput; if disabled, throughput is favoured over latency.
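The thread-pool parameters read most naturally as the arguments of a standard `java.util.concurrent.ThreadPoolExecutor`. The sketch below assumes that mapping (it is not copied from Solr, and `ShardExecutorSketch.build` is a hypothetical helper): sizeOfQueue = -1 selects a `SynchronousQueue` (direct hand-off), any other value a bounded `ArrayBlockingQueue`, and fairnessPolicy is passed to the queue's fairness flag.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Assumed mapping of the shard-handler pool parameters onto a ThreadPoolExecutor. */
class ShardExecutorSketch {
    static ThreadPoolExecutor build(int corePoolSize, int maximumPoolSize, long maxThreadIdleTimeMs,
                                    int sizeOfQueue, boolean fairnessPolicy) {
        BlockingQueue<Runnable> queue = (sizeOfQueue == -1)
                ? new SynchronousQueue<Runnable>(fairnessPolicy)                // direct hand-off
                : new ArrayBlockingQueue<Runnable>(sizeOfQueue, fairnessPolicy); // bounded backing queue
        return new ThreadPoolExecutor(corePoolSize, maximumPoolSize,
                maxThreadIdleTimeMs, TimeUnit.MILLISECONDS, queue);
    }

    public static void main(String[] args) {
        // The documented defaults: 0, Integer.MAX_VALUE, 5 s, -1, false
        ThreadPoolExecutor pool = build(0, Integer.MAX_VALUE, 5000, -1, false);
        System.out.println(pool.getQueue().getClass().getSimpleName()); // SynchronousQueue
        pool.shutdown();
    }
}
```

With a direct hand-off and an effectively unbounded maximumPoolSize, each submitted shard request gets a thread straight away; with a bounded queue, ThreadPoolExecutor only adds threads beyond corePoolSize once the queue is full, so requests wait in the queue instead.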

Reposted from: http://blog.csdn.net/duck_genuine/article/details/7839479
