Lucene's Caching Mechanisms and Solutions

2019-03-27 00:59 | Source: Internet

Reposted from: http://blog.csdn.net/buptdavid/article/details/5791125

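Overview

Lucene's caches fall into two categories: the filter cache and the field cache.

The filter cache is implemented by CachingWrapperFilter, which caches the results of other Filters.

The field cache is implemented by FieldCache, which caches the values of the fields used for sorting.

In short, the filter cache serves query filtering and the field cache serves sorting.

Both caches live inside a single IndexReader instance, so the key to improving Lucene query performance is to keep and reuse the same IndexReader (that is, the same IndexSearcher).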

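Filter Cache

Strictly speaking, Lucene has no query data cache of the kind a database server provides. Lucene's filter cache is implemented by CachingWrapperFilter, which caches the bits produced by the wrapped filter. Lucene also provides FilterManager, a singleton that caches the Filter objects themselves.

Here is the implementation of CachingWrapperFilter: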

public class CachingWrapperFilter extends Filter {

    protected Filter filter;

    protected transient Map cache; // the map used as the cache

    public CachingWrapperFilter(Filter filter) {
        this.filter = filter;
    }

    public BitSet bits(IndexReader reader) throws IOException {
        if (cache == null) {
            cache = new WeakHashMap(); // WeakHashMap, so the JVM can reclaim the memory
        }

        synchronized (cache) { // check cache
            // The key is the IndexReader and the value is the BitSet,
            // so a cached entry lives only as long as its IndexReader.
            BitSet cached = (BitSet) cache.get(reader);
            if (cached != null) {
                return cached;
            }
        }

        // Nothing cached for this reader: compute the bits again.
        final BitSet bits = filter.bits(reader);

        synchronized (cache) { // update cache
            cache.put(reader, bits);
        }

        return bits;
    }
}
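The caching pattern above (look up by owner, compute on a miss, store weakly) can be tried in isolation. The following is a minimal sketch that uses only the JDK: PerOwnerCache and its loader function are hypothetical names introduced for illustration, and a plain Object stands in for the IndexReader. It is not Lucene code.

```java
import java.util.BitSet;
import java.util.Map;
import java.util.WeakHashMap;
import java.util.function.Function;

/** Caches one computed value per owner object, held weakly (hypothetical helper, not a Lucene class). */
class PerOwnerCache<K, V> {
    private final Map<K, V> cache = new WeakHashMap<>();
    private final Function<K, V> loader;
    private int loads = 0;

    PerOwnerCache(Function<K, V> loader) {
        this.loader = loader;
    }

    V get(K owner) {
        synchronized (cache) {               // check cache
            V cached = cache.get(owner);
            if (cached != null) {
                return cached;
            }
        }
        V value = loader.apply(owner);       // computed outside the lock, like filter.bits(reader)
        synchronized (cache) {               // update cache
            loads++;
            cache.put(owner, value);
        }
        return value;
    }

    int loadCount() {
        synchronized (cache) {
            return loads;
        }
    }
}

class PerOwnerCacheDemo {
    public static void main(String[] args) {
        Object readerA = new Object();       // stand-in for one IndexReader
        Object readerB = new Object();       // stand-in for a second IndexReader
        PerOwnerCache<Object, BitSet> cache = new PerOwnerCache<>(r -> new BitSet(8));
        cache.get(readerA);
        cache.get(readerA);                  // served from the cache
        cache.get(readerB);                  // a different owner forces a new load
        System.out.println("loads = " + cache.loadCount());
    }
}
```

Reusing the same owner object hits the cache, while every new owner pays the load cost again, which is why the article stresses sharing one IndexReader.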

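FilterManager uses Filter.hashCode() as the cache key, so a custom Filter class should override hashCode().

Example: Filter filter = FilterManager.getInstance().getFilter(new CachingWrapperFilter(new MyFilter())); If the filter is already registered, FilterManager returns the cached Filter (which carries its bits cache); otherwise it returns the filter that was passed in (which has no bits cached yet).

FilterManager runs a timer thread that periodically cleans the cache, to prevent out-of-memory errors.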
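To see why the hashCode() override matters for a cache keyed this way, here is a minimal sketch using only the JDK. TermLikeFilter and HashKeyedRegistry are hypothetical stand-ins written for this illustration; they are not the Lucene classes, and the registry omits the periodic cleanup.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

/** Hypothetical filter that is fully described by a field name and a value. */
final class TermLikeFilter {
    final String field;
    final String value;

    TermLikeFilter(String field, String value) {
        this.field = field;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        if (!(o instanceof TermLikeFilter)) {
            return false;
        }
        TermLikeFilter other = (TermLikeFilter) o;
        return field.equals(other.field) && value.equals(other.value);
    }

    @Override
    public int hashCode() {
        // Equal filters must produce the same key, otherwise every lookup misses.
        return Objects.hash(field, value);
    }
}

/** Minimal stand-in for a registry keyed by hashCode() (no cleanup thread in this sketch). */
class HashKeyedRegistry {
    private final Map<Integer, Object> cache = new HashMap<>();

    synchronized Object getFilter(Object filter) {
        Object cached = cache.get(filter.hashCode());
        if (cached != null) {
            return cached;                     // reuse the instance registered earlier
        }
        cache.put(filter.hashCode(), filter);
        return filter;                         // first time seen: register and return it
    }

    public static void main(String[] args) {
        HashKeyedRegistry registry = new HashKeyedRegistry();
        Object first = registry.getFilter(new TermLikeFilter("type", "news"));
        Object second = registry.getFilter(new TermLikeFilter("type", "news"));
        System.out.println("same instance reused: " + (first == second));
    }
}
```

Without the override, each new filter object would get an identity-based hash code, so a logically identical filter would almost never find the earlier entry.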

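Field Cache

The field cache is used for sorting. Lucene reads every field that needs sorting into memory, and the memory used grows with the number of documents. People often run into out-of-memory errors when sorting with Lucene. The usual cause is that each query creates a new searcher instance; under high concurrency, many Searcher instances load the sort fields at the same time and exhaust memory.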
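A back-of-the-envelope estimate shows how quickly this adds up. The sketch below assumes one 4-byte int per document per reader (the size of a per-document ordinal array) and ignores the field values themselves and all object overhead. The document count and the number of concurrent readers are made-up figures for illustration.

```java
/** Rough memory estimate for per-document sort arrays (illustrative assumptions only). */
class FieldCacheMemoryEstimate {
    /** Bytes needed if each of `readers` independent readers holds its own int[maxDoc]. */
    static long orderArrayBytes(int maxDoc, int readers) {
        return 4L * maxDoc * readers;
    }

    public static void main(String[] args) {
        int maxDoc = 10_000_000;                       // hypothetical index size
        long shared = orderArrayBytes(maxDoc, 1);      // one shared reader: 40,000,000 bytes
        long perQuery = orderArrayBytes(maxDoc, 50);   // 50 concurrent readers: 2,000,000,000 bytes
        System.out.println("shared reader:       " + shared + " bytes");
        System.out.println("50 separate readers: " + perQuery + " bytes");
    }
}
```

Under these assumptions a single shared reader needs about 40 MB for one sort field, while 50 readers opened concurrently need about 2 GB for the same data.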

The field cache is implemented by FieldCacheImpl. Let's look at how sorting makes use of it.

Every sorted query ends up calling this method in the IndexSearcher class:

public TopFieldDocs search(Weight weight, Filter filter, final int nDocs, Sort sort) throws IOException {
    // Sorting is implemented by TopFieldDocCollector.
    TopFieldDocCollector collector = new TopFieldDocCollector(reader, sort, nDocs);

    // Run the query; the sort happens as each hit is passed back to Collector.collect().
    search(weight, filter, collector);

    // TopFieldDocs differs from TopDocs in that it also carries the sort field information
    // (field names and field values). Its ScoreDoc[] elements are FieldDoc instances.
    return (TopFieldDocs) collector.topDocs();
}

Now look at how TopFieldDocCollector.collect() is implemented:

public void collect(int doc, float score) {
    if (score > 0.0f) {
        totalHits++;
        if (reusableFD == null)
            reusableFD = new FieldDoc(doc, score);
        else {
            reusableFD.score = score;
            reusableFD.doc = doc;
        }
        // hq is a FieldSortedHitQueue, a subclass of PriorityQueue. insertWithOverflow()
        // maintains a fixed-size sorted queue and pushes out the lowest-ranked entry.
        reusableFD = (FieldDoc) hq.insertWithOverflow(reusableFD);
    }
}
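The fixed-size queue idea can be sketched with java.util.PriorityQueue. This is an illustration of the general technique, not Lucene's PriorityQueue class: TopN is a hypothetical name, it ranks plain integers, and a larger value is assumed to mean a better hit.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

/** Keeps only the `capacity` largest values seen so far (illustrative sketch). */
class TopN {
    private final int capacity;
    // Min-heap: the weakest value that is still kept sits on top, ready to be pushed out.
    private final PriorityQueue<Integer> heap = new PriorityQueue<>();

    TopN(int capacity) {
        if (capacity < 1) {
            throw new IllegalArgumentException("capacity must be at least 1");
        }
        this.capacity = capacity;
    }

    /** Offers a value and returns the value that was dropped, or null if nothing was dropped. */
    Integer insertWithOverflow(int value) {
        if (heap.size() < capacity) {
            heap.add(value);
            return null;
        }
        if (value > heap.peek()) {
            Integer dropped = heap.poll();   // the previous weakest entry leaves the queue
            heap.add(value);
            return dropped;
        }
        return value;                        // the new value itself does not make the cut
    }

    /** Returns the kept values, best first. */
    List<Integer> bestFirst() {
        List<Integer> out = new ArrayList<>(heap);
        out.sort(Collections.reverseOrder());
        return out;
    }

    public static void main(String[] args) {
        TopN top = new TopN(3);
        for (int v : new int[] {5, 1, 9, 3, 7}) {
            top.insertWithOverflow(v);
        }
        System.out.println(top.bestFirst()); // prints [9, 7, 5]
    }
}
```

Because the queue never grows beyond its capacity, collecting the top results costs memory proportional to the page size rather than to the number of hits.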

FieldSortedHitQueue implements the sort by overriding the lessThan() method:

protected boolean lessThan (final Object a, final Object b) {
    final ScoreDoc docA = (ScoreDoc) a;
    final ScoreDoc docB = (ScoreDoc) b;

    // run comparators
    final int n = comparators.length;
    int c = 0;
    for (int i=0; i<n && c==0; ++i) {
        // The sort is driven by comparators[]. What remains is to see how these
        // comparators are built and how they use the field cache.
        c = (fields[i].reverse) ? comparators[i].compare (docB, docA)
                                : comparators[i].compare (docA, docB);
    }

    // avoid random sort order that could lead to duplicates (bug #31241):
    if (c == 0)
        return docA.doc > docB.doc;

    return c > 0;
}
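The same chaining logic (try each comparator in turn, honour a per-field reverse flag, break full ties by doc id) can be expressed as an ordinary java.util.Comparator. This is a sketch for illustration: Hit and MultiFieldSort are hypothetical names, and unlike lessThan() above, which answers "is a ranked lower than b" for a priority queue, this comparator returns the usual negative, zero, or positive result for a plain list sort.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical search hit with two sortable fields. */
class Hit {
    final int doc;
    final String category;
    final int price;

    Hit(int doc, String category, int price) {
        this.doc = doc;
        this.category = category;
        this.price = price;
    }
}

class MultiFieldSort {
    /** Tries each comparator in turn; the first non-zero result wins, full ties fall back to doc id. */
    static Comparator<Hit> chain(List<Comparator<Hit>> comparators, boolean[] reverse) {
        return (a, b) -> {
            int c = 0;
            for (int i = 0; i < comparators.size() && c == 0; i++) {
                c = reverse[i] ? comparators.get(i).compare(b, a)
                               : comparators.get(i).compare(a, b);
            }
            return c != 0 ? c : Integer.compare(a.doc, b.doc);
        };
    }

    /** Sorts sample hits by category ascending, then price descending; returns doc ids in order. */
    static int[] sortedDocIds() {
        List<Hit> hits = new ArrayList<>(Arrays.asList(
                new Hit(2, "a", 10), new Hit(0, "a", 10), new Hit(1, "b", 5), new Hit(3, "a", 20)));
        Comparator<Hit> byCategory = Comparator.comparing((Hit h) -> h.category);
        Comparator<Hit> byPrice = Comparator.comparingInt((Hit h) -> h.price);
        List<Comparator<Hit>> comparators = Arrays.asList(byCategory, byPrice);
        hits.sort(chain(comparators, new boolean[] {false, true}));
        return hits.stream().mapToInt(h -> h.doc).toArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(sortedDocIds())); // prints [3, 0, 2, 1]
    }
}
```

The doc id fallback is what keeps the order deterministic when all sort fields are equal, which is the purpose of the tie-break in lessThan().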

The comparators are created in the FieldSortedHitQueue constructor:

public FieldSortedHitQueue (IndexReader reader, SortField[] fields, int size) throws IOException {
    final int n = fields.length;
    comparators = new ScoreDocComparator[n];
    this.fields = new SortField[n];
    for (int i=0; i<n; ++i) {
        String fieldname = fields[i].getField();
        // getCachedComparator() returns the cached comparator, an instance of ScoreDocComparator.
        comparators[i] = getCachedComparator (reader, fieldname, fields[i].getType(), fields[i].getLocale(), fields[i].getFactory());

        if (comparators[i].sortType() == SortField.STRING) {
            this.fields[i] = new SortField (fieldname, fields[i].getLocale(), fields[i].getReverse());
        } else {
            this.fields[i] = new SortField (fieldname, comparators[i].sortType(), fields[i].getReverse());
        }
    }
    initialize (size);
}

Now look at the implementation of getCachedComparator():

static final FieldCacheImpl.Cache Comparators = new FieldCacheImpl.Cache() {
    ...
};

static ScoreDocComparator getCachedComparator (IndexReader reader, String field, int type, Locale locale, SortComparatorSource factory) throws IOException {
    // These two sort types do not need to read any field.
    if (type == SortField.DOC) return ScoreDocComparator.INDEXORDER;   // sort by index order
    if (type == SortField.SCORE) return ScoreDocComparator.RELEVANCE;  // sort by relevance

    FieldCacheImpl.Entry entry = (factory != null)
        ? new FieldCacheImpl.Entry (field, factory)
        : new FieldCacheImpl.Entry (field, type, locale);

    // Every other sort type has to load the field into the cache.
    // Comparators is the FieldCacheImpl.Cache instance declared above.
    return (ScoreDocComparator) Comparators.get(reader, entry);
}

Comparators.get() returns a different ScoreDocComparator implementation depending on the type of the sort field. Looking at the String implementation shows where the field cache is called:

static ScoreDocComparator comparatorString (final IndexReader reader, final String fieldname)
throws IOException {
    final String field = fieldname.intern();

    // Read the cache to obtain the mapping between field values and document ids. If no
    // cache entry exists, the index files are read. The cache has the same lifetime as the
    // IndexReader, so when different queries share one Searcher there is only one copy of
    // the sort cache and no out-of-memory problem.
    final FieldCache.StringIndex index = FieldCache.DEFAULT.getStringIndex (reader, field);

    return new ScoreDocComparator () {

        public final int compare (final ScoreDoc i, final ScoreDoc j) {
            // index.order[] holds each document's position in the sort order of the field;
            // the array index is the Lucene doc id. See getStringIndex() for how these
            // values are loaded; it is not covered in detail here.
            final int fi = index.order[i.doc];
            final int fj = index.order[j.doc];
            if (fi < fj) return -1;
            if (fi > fj) return 1;
            return 0;
        }

        public Comparable sortValue (final ScoreDoc i) {
            return index.lookup[index.order[i.doc]];
        }

        public int sortType() {
            return SortField.STRING;
        }
    };
}
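The order[] and lookup[] arrays used by the comparator can be pictured with a small standalone sketch. StringOrdinals is a hypothetical class written for this illustration; it builds both arrays from an in-memory array of per-document values rather than from an index, and it assumes every document has a non-null value.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Ordinal-based string sorting: compare documents by int rank instead of by string (sketch). */
class StringOrdinals {
    final String[] lookup; // distinct values in sorted order
    final int[] order;     // order[doc] = position of that document's value in lookup

    StringOrdinals(String[] valueByDoc) {
        lookup = new TreeSet<>(Arrays.asList(valueByDoc)).toArray(new String[0]);
        order = new int[valueByDoc.length];
        for (int doc = 0; doc < valueByDoc.length; doc++) {
            order[doc] = Arrays.binarySearch(lookup, valueByDoc[doc]);
        }
    }

    /** Compares two documents using only their precomputed ranks. */
    int compare(int docA, int docB) {
        return Integer.compare(order[docA], order[docB]);
    }

    /** Recovers the original field value of a document. */
    String sortValue(int doc) {
        return lookup[order[doc]];
    }

    public static void main(String[] args) {
        StringOrdinals index = new StringOrdinals(new String[] {"pear", "apple", "pear", "fig"});
        System.out.println(Arrays.toString(index.lookup)); // prints [apple, fig, pear]
        System.out.println(Arrays.toString(index.order));  // prints [2, 0, 2, 1]
    }
}
```

Building the two arrays is the expensive part; once they exist, each comparison during sorting is a pair of array reads, which is why it pays to build them once per reader and reuse them.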

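Conclusion

The two caching mechanisms described above already solve the vast majority of problems. Solr, which is built on top of Lucene, adds further caches of its own, but arguably they contribute little and mainly make the code much more complex.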

Caching Solution

Lucene's caches live inside a single IndexReader instance, so the key to improving Lucene query performance is to keep and reuse the same IndexReader (that is, the same IndexSearcher).

We therefore write a new class, SingleIndexSearcher (source below), which extends IndexSearcher and makes it a singleton.

Add the SingleIndexSearcher class to LuceneBase and obtain every IndexSearcher object through SingleIndexSearcher.getInstance().

Using a cached Filter: Filter filter = new CachingWrapperFilter(new FieldFilter(field, value));

or

Filter filter = FilterManager.getInstance().getFilter(new CachingWrapperFilter(new FieldFilter(field, value)));

/**
 * Singleton implementation of IndexSearcher. A singleton is used to take full advantage of
 * Lucene's caches and to prevent the out-of-memory and concurrency problems caused by
 * multiple IndexSearcher objects.
 *
 * @author 路卫杰
 * @version 1.0, 2010-8-4
 * @see IndexSearcher
 */
public class SingleIndexSearcher extends IndexSearcher {

    /** The private static SingleIndexSearcher instance */
    private static IndexSearcher instance;

    static {
        try {
            instance = new SingleIndexSearcher(Configure.getProperties().getProperty("ZkAnalyzerPath"));
            System.out.println("constructed");
        } catch (CorruptIndexException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    /**
     * Constructor
     * @param path the index path
     * @throws IOException
     * @throws CorruptIndexException
     */
    public SingleIndexSearcher(String path) throws CorruptIndexException, IOException {
        super(path);
    }

    /**
     * Returns the singleton
     */
    public static IndexSearcher getInstance() {
        return instance;
    }
}
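The singleton idea itself can be shown without Lucene. The sketch below is a variation on the class above, not the author's code: it uses the initialization-on-demand holder idiom, so the expensive object is built on the first call to getInstance() instead of when the class loads. ExpensiveSearcher and the index path string are hypothetical, and the counter exists only to make the single construction observable.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Stand-in for an object that is costly to open and should be shared by all queries. */
class ExpensiveSearcher {
    static final AtomicInteger constructed = new AtomicInteger();

    final String indexPath;

    private ExpensiveSearcher(String indexPath) {
        this.indexPath = indexPath;
        constructed.incrementAndGet();       // counts how many instances were ever built
    }

    /** The JVM initializes Holder once, on first access, in a thread-safe way. */
    private static final class Holder {
        static final ExpensiveSearcher INSTANCE = new ExpensiveSearcher("/path/to/index");
    }

    static ExpensiveSearcher getInstance() {
        return Holder.INSTANCE;
    }

    public static void main(String[] args) {
        ExpensiveSearcher a = ExpensiveSearcher.getInstance();
        ExpensiveSearcher b = ExpensiveSearcher.getInstance();
        System.out.println("same instance: " + (a == b) + ", constructed: " + constructed.get());
    }
}
```

Every caller receives the same object, so any cache keyed on that object is filled once and then shared across queries.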

Search Speed Comparison

Searches with the same keyword and filter   Plain filter (ms)   Cached filter (ms)   Cached sort (ms)
1                                           2407                2438                 2093
5                                           4750                2531                 2219
10                                          8110                2672                 2313
20                                          14750               2922                 2593
50                                          34498               3672                 3250
100                                         67546               4844                 4407

Reposted from: http://www.cnblogs.com/zjw520/archive/2013/04/11/3015515
