Implementing Grouping and Facet Functionality with Lucene and FieldCache

2019-03-27 01:12 | Source: Internet

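Suppose you want to use Lucene for grouping, for example grouping documents by category. This is where things get difficult, because Lucene itself does not support grouping out of the box.

When you need this feature, you will probably turn to Solr, the search server built on Lucene.

However, you can also group the index yourself in code, using FieldCache and a single field, for example to build a category tree in which top-level categories contain subcategories.

Implementing this can be fairly tedious, mainly because Lucene offers little support for it and reference material is scarce.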
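The core idea can be sketched without Lucene at all: treat the FieldCache as an array holding one field value per document ID, then count the values of the matching documents. The following is a minimal sketch under that assumption; the class name `FieldCountSketch` and the sample data are hypothetical, and a plain `String[]` stands in for the real FieldCache.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class FieldCountSketch {

    // Counts how often each field value occurs among the matching documents.
    // fieldValues[docId] plays the role of the FieldCache array (a stand-in, not Lucene's API).
    public static Map<String, Integer> countByValue(String[] fieldValues, int[] matchingDocIds) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        for (int docId : matchingDocIds) {
            String value = fieldValues[docId];
            if (value == null || value.isEmpty()) {
                continue; // documents without a value are skipped
            }
            Integer old = counts.get(value);
            counts.put(value, old == null ? 1 : old + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample data: document 3 has no value for the field
        String[] fieldValues = {"books", "music", "books", null, "books"};
        int[] hits = {0, 1, 2, 3, 4};
        System.out.println(countByValue(fieldValues, hits));
    }
}
```

Because the lookup is a plain array access per hit, the cost of counting grows with the number of matching documents rather than with the number of distinct values.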

(I wrote all of the code below while testing; it can be adapted with minor changes to fit your own requirements.)

// GroupCollector: the collector object used for grouped counting

import java.io.IOException;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.Collector;
import org.apache.lucene.search.Scorer;

public class GroupCollector extends Collector {

   private GroupField gf = new GroupField();// holds the grouped counts
   private int docBase;
   // values loaded from the FieldCache, indexed by index-wide document ID
   private String[] fc;

   @Override
   public boolean acceptsDocsOutOfOrder() {
       return true;
   }

   @Override
   public void collect(int doc) throws IOException {
       // doc is relative to the current segment; add docBase to get the index-wide document ID
       final int docId = doc + this.docBase;
       // Pass the value to GroupField, which counts the documents for each distinct value
       this.gf.addValue(this.fc[docId]);

   }

   @Override
   public void setNextReader(IndexReader reader, int docBase) throws IOException {
       // Remember where the new segment starts; the original self-assignment left docBase at 0
       this.docBase = docBase;
   }

   @Override
   public void setScorer(Scorer arg0) throws IOException {

   }

   public GroupField getGf() {
       return this.gf;
   }

   public void setGf(GroupField gf) {
       this.gf = gf;
   }

   public int getDocBase() {
       return this.docBase;
   }

   public void setDocBase(int docBase) {
       this.docBase = docBase;
   }

   public String[] getFc() {
       return this.fc;
   }

   public void setFc(String[] fc) {
       this.fc = fc;
   }

}
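The collector receives segment-relative document IDs in collect, while the String[] it reads from is indexed by index-wide document ID, so the offset passed to setNextReader has to be stored and added. The following is a small sketch of that bookkeeping with no Lucene dependency; the class `DocBaseSketch` and its method names are hypothetical stand-ins for the collector callbacks.

```java
public class DocBaseSketch {

    private final String[] topLevelValues; // one value per index-wide document ID
    private int docBase;                   // starting ID of the current segment
    private final StringBuilder seen = new StringBuilder();

    public DocBaseSketch(String[] topLevelValues) {
        this.topLevelValues = topLevelValues;
    }

    // Stand-in for setNextReader: remember where the new segment starts.
    public void startSegment(int docBase) {
        this.docBase = docBase;
    }

    // Stand-in for collect: segmentDoc is relative to the current segment.
    public void collect(int segmentDoc) {
        seen.append(topLevelValues[segmentDoc + docBase]);
    }

    public String seen() {
        return seen.toString();
    }

    public static void main(String[] args) {
        // Five documents split across two segments: IDs 0-2 and IDs 3-4
        DocBaseSketch c = new DocBaseSketch(new String[]{"a", "b", "c", "d", "e"});
        c.startSegment(0);
        c.collect(0); c.collect(1); c.collect(2);
        c.startSegment(3);
        c.collect(0); c.collect(1); // segment-relative 0 and 1 map to 3 and 4
        System.out.println(c.seen());
    }
}
```

If the offset were never updated, the second segment would read the values of documents 0 and 1 again, which silently skews the counts instead of failing.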

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
* Holds the grouped counts collected for one field
*/
public class GroupField {

   /**
   * Field name
   */
   private String name;
   /**
   * List of category objects, one per distinct category
   */
   private List<SimpleCategory> values = new ArrayList<SimpleCategory>();
   /**
   * Maps each field value to its document count
   */
   private Map<String, Integer> countMap = new HashMap<String, Integer>();

   public Map<String, Integer> getCountMap() {
       return this.countMap;
   }

   public void setCountMap(Map<String, Integer> countMap) {
       this.countMap = countMap;
   }

   public String getName() {
       return this.name;
   }

   public void setName(String name) {
       this.name = name;
   }

   public List<SimpleCategory> getValues() {
       return this.values;
   }

   public void setValues(List<SimpleCategory> values) {
       this.values = values;
   }

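   /**
   * Builds the list of category objects and counts the documents per category.
   *
   * @param value comma-separated category data in the order
   *        categoryId,categoryName,parentId,sortIndex,parentCategoryName
   */
   public void addValue(String value) {
       if ((value == null) || "".equals(value)) return;
       // The field packs several attributes into one string; split it on commas
       final String[] temp = value.split(",");
       // Skip malformed values that lack the five expected parts
       if (temp.length < 5) return;

       if (this.countMap.get(temp[1]) == null) {
           this.countMap.put(temp[1], 1);
           // Build a category object the first time this category name is seen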
           final SimpleCategory simpleCategory = new SimpleCategory();

           simpleCategory.setCategoryId(Integer.parseInt(temp[0]));
           simpleCategory.setCategoryName(temp[1]);
           simpleCategory.setParentId(Integer.parseInt(temp[2]));
           simpleCategory.setSortIndex(temp[3]);
           simpleCategory.setParentCategoryName(temp[4]);
           // simpleCategory.setAdImag(temp[5]);
           // simpleCategory.setParentAdImage(temp[6]);
           this.values.add(simpleCategory);
       }
       else {
           this.countMap.put(temp[1], this.countMap.get(temp[1]) + 1);
       }
       // for( String str : temp ){
       // if(countMap.get(str)==null){
       // countMap.put(str,1);
       // values.add(str);
       // }
       // else{
       // countMap.put(str, countMap.get(str)+1);
       // }
       // }
   }
   // class ValueComparator implements Comparator<String>{
   //
   // // public int compare(String value0, String value1) {
   // // if(countMap.get(value0)>countMap.get(value1)){
   // // return -1;
   // // }
   // // else if(countMap.get(value0)<countMap.get(value1)){
   // // return 1;
   // // }
   // // return 0;
   // // }
   // }
}
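The commented-out ValueComparator suggests ordering values by document count. The following is a minimal sketch of that ordering, assuming a count map shaped like countMap; the helper class `CountSorter` is hypothetical and the alphabetical tie-break is an added assumption, not something the original code specifies.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CountSorter {

    // Returns the keys of countMap ordered by descending count.
    // Ties are broken alphabetically so the result is deterministic (an assumption).
    public static List<String> sortByCountDesc(final Map<String, Integer> countMap) {
        List<String> names = new ArrayList<String>(countMap.keySet());
        Collections.sort(names, new Comparator<String>() {
            public int compare(String a, String b) {
                int byCount = countMap.get(b).compareTo(countMap.get(a));
                return byCount != 0 ? byCount : a.compareTo(b);
            }
        });
        return names;
    }

    public static void main(String[] args) {
        // Made-up counts for illustration
        Map<String, Integer> counts = new HashMap<String, Integer>();
        counts.put("books", 3);
        counts.put("music", 7);
        counts.put("games", 3);
        System.out.println(sortByCountDesc(counts));
    }
}
```

Sorting the key list instead of the map keeps countMap usable for lookups while the display order is decided separately.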

Build the object you want to return yourself:

/**
* Converts the categoryIndex field stored in the Lucene index into a category object.
*
* @author xiaozd
*
*/
public class SimpleCategory extends BaseModel {

   private static final long serialVersionUID = -2345212345526771266L;
   private int parentId;
   private int categoryId;
   private String categoryName;
   private String sortIndex;
   private int goodsCount;
   private String parentCategoryName;

   public int getParentId() {
       return this.parentId;
   }

   public void setParentId(int parentId) {
       this.parentId = parentId;
   }

   public int getCategoryId() {
       return this.categoryId;
   }

   public void setCategoryId(int categoryId) {
       this.categoryId = categoryId;
   }

   public String getCategoryName() {
       return this.categoryName;
   }

   public void setCategoryName(String categoryName) {
       this.categoryName = categoryName;
   }

   public String getSortIndex() {
       return this.sortIndex;
   }

   public void setSortIndex(String sortIndex) {
       this.sortIndex = sortIndex;
   }

   public static long getSerialversionuid() {
       return SimpleCategory.serialVersionUID;
   }

   public int getGoodsCount() {
       return this.goodsCount;
   }

   public void setGoodsCount(int goodsCount) {
       this.goodsCount = goodsCount;
   }

   public String getParentCategoryName() {
       return this.parentCategoryName;
   }

   public void setParentCategoryName(String parentCategoryName) {
       this.parentCategoryName = parentCategoryName;
   }

}

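import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.index.IndexReader;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.MatchAllDocsQuery;
import org.apache.lucene.store.FSDirectory;

   // The method below belongs to an enclosing class (not shown) that defines luceneSearchPath.

   /**
   * Returns all goods categories by grouping the index on the categoryIndex field.
   *
   * @return one SimpleCategory per distinct category found in the index
   */
   public List<SimpleCategory> getGoodsCategory() {

       List<SimpleCategory> values = new ArrayList<SimpleCategory>();
       IndexReader reader = null;
       try {
           // only searching, so read-only=true
           reader = IndexReader.open(FSDirectory.open(new File(luceneSearchPath)), true);

           // Load the values of the "categoryIndex" field into the FieldCache
           final String[] fc = FieldCache.DEFAULT.getStrings(reader, "categoryIndex");
           IndexSearcher searcher = new IndexSearcher(reader);
           // GroupCollector is the custom collector that performs the grouped counting
           GroupCollector myCollector = new GroupCollector();
           myCollector.setFc(fc);
           searcher.search(new MatchAllDocsQuery(), myCollector);
           // GroupField holds the grouped counts
           GroupField gf = myCollector.getGf();
           values = gf.getValues();
           for (SimpleCategory value : values) {
               System.out.println("Category name: " + value.getCategoryName() + "  count: " + gf.getCountMap().get(value.getCategoryName()) + "  parent category name: " + value.getParentCategoryName());
           }

       } catch (Exception e) {
           e.printStackTrace();
       } finally {
           // Release the index files even if the search failed
           if (reader != null) {
               try {
                   reader.close();
               } catch (IOException e) {
                   e.printStackTrace();
               }
           }
       }

       return values;
   }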
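The flat category list can then be turned into the category tree mentioned at the start by grouping entries under their parent ID. The following is a minimal sketch of that step; `CategoryTreeSketch` and its nested `Node` class are hypothetical stand-ins for SimpleCategory that carry only the fields needed here, and the sample categories are made up.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CategoryTreeSketch {

    // Hypothetical, trimmed-down stand-in for SimpleCategory.
    public static class Node {
        public final int id;
        public final int parentId;
        public final String name;

        public Node(int id, int parentId, String name) {
            this.id = id;
            this.parentId = parentId;
            this.name = name;
        }
    }

    // Groups the flat list into parentId -> children, keeping the input order within each group.
    public static Map<Integer, List<Node>> childrenByParent(List<Node> flat) {
        Map<Integer, List<Node>> tree = new TreeMap<Integer, List<Node>>();
        for (Node n : flat) {
            List<Node> siblings = tree.get(n.parentId);
            if (siblings == null) {
                siblings = new ArrayList<Node>();
                tree.put(n.parentId, siblings);
            }
            siblings.add(n);
        }
        return tree;
    }

    public static void main(String[] args) {
        List<Node> flat = new ArrayList<Node>();
        flat.add(new Node(10, 1, "Novels"));
        flat.add(new Node(11, 1, "Comics"));
        flat.add(new Node(20, 2, "Rock"));
        for (Map.Entry<Integer, List<Node>> e : childrenByParent(flat).entrySet()) {
            System.out.println("parent " + e.getKey() + " has " + e.getValue().size() + " subcategories");
        }
    }
}
```

Keying the map by parent ID means a top-level category's subcategories can be fetched with a single lookup when rendering the tree.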

http://blog.csdn.net/xiaozhengdong/article/details/7035607

Reposted from: http://www.cnblogs.com/chenying99/p/3819336
