Integrating Nutch and Solr on Hadoop

2019-03-27 01:11 | Source: Internet

Nutch is an application; in this project I use it mainly as a crawler. The crawled content is stored on HDFS, so the HDFS integration is already in place.

Solr:

In Eclipse, create a new Dynamic Web Project and delete everything under WebContent.

Under solr/dist (or /solr3.6.2/example/webapps), unpack solr.war and copy all of its contents into WebContent.
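A .war file is an ordinary zip archive, so the unpack-and-copy step can also be scripted. The following is a minimal sketch, not part of the original post: the class name WarUnpacker and both command-line paths are placeholders, and it assumes a plain java.util.zip extraction is all you need.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper: extracts solr.war (a zip archive) into a target directory.
public class WarUnpacker {

    /** Extracts every entry of warFile into targetDir and returns the number of files written. */
    public static int unpack(Path warFile, Path targetDir) throws IOException {
        Path root = targetDir.toAbsolutePath().normalize();
        int files = 0;
        try (InputStream in = Files.newInputStream(warFile);
             ZipInputStream zip = new ZipInputStream(in)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                Path out = root.resolve(entry.getName()).normalize();
                // Refuse entries whose path would escape the target directory.
                if (!out.startsWith(root)) {
                    throw new IOException("Entry outside target dir: " + entry.getName());
                }
                if (entry.isDirectory()) {
                    Files.createDirectories(out);
                } else {
                    Files.createDirectories(out.getParent());
                    // Copies the bytes of the current entry only; the zip stream stays open.
                    Files.copy(zip, out, StandardCopyOption.REPLACE_EXISTING);
                    files++;
                }
            }
        }
        return files;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to solr.war, args[1]: path to the project's WebContent directory.
        int n = unpack(Paths.get(args[0]), Paths.get(args[1]));
        System.out.println("Extracted " + n + " files");
    }
}
```

Run it with the path to solr.war as the first argument and the WebContent directory as the second, then refresh the project in Eclipse.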

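Edit web.xml under WEB-INF.

Add

<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>/home/hadoop/solr3.6.2/example/solr</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>

at the end of the file, just before the closing </web-app> tag.

To explain: this path is the location of your Solr core.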
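To make the role of the solr/home entry concrete, here is a sketch of how a web application can read it. To my understanding, Solr 3.x first tries the JNDI name java:comp/env/solr/home, then the system property solr.solr.home, and finally falls back to a solr/ directory relative to the working directory. The class name SolrHomeLookup is hypothetical, and this mirrors that order rather than reproducing Solr's own code.

```java
import javax.naming.Context;
import javax.naming.InitialContext;
import javax.naming.NamingException;

// Hypothetical helper that mimics the lookup order described above.
public class SolrHomeLookup {

    /** Returns the Solr home from JNDI, else from a system property, else a default. */
    public static String resolveSolrHome() {
        try {
            // Inside Tomcat, env-entry values from web.xml are exposed under java:comp/env.
            Context ctx = new InitialContext();
            Object value = ctx.lookup("java:comp/env/solr/home");
            if (value != null) {
                return value.toString();
            }
        } catch (NamingException | RuntimeException e) {
            // No JNDI provider (for example outside a servlet container) or no such entry.
        }
        String prop = System.getProperty("solr.solr.home");
        if (prop != null && !prop.isEmpty()) {
            return prop;
        }
        return "solr/";
    }

    public static void main(String[] args) {
        System.out.println("Solr home: " + resolveSolrHome());
    }
}
```

Outside a container the JNDI lookup fails and the method falls through to the system property, which is handy for a quick check with -Dsolr.solr.home=... on the command line.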

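If you use Solr multicore, you can instead set the path to /home/hadoop/solr3.6.2/example/multicore, and also edit solr.xml under multicore so that each instanceDir points to the directory where that core is stored.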
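As a quick check of the multicore setup, the sketch below parses a solr.xml and prints where each core's instanceDir resolves to. It assumes the Solr 3.x layout of <core name="..." instanceDir="..."/> elements inside <cores>; the class name MulticoreConfig and the sample XML are illustrative, not taken from the original post.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper: lists core names and their resolved instance directories.
public class MulticoreConfig {

    /** Maps each core name to its instanceDir, resolved against solrHome when relative. */
    public static Map<String, Path> coreDirs(String solrXml, Path solrHome) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(solrXml.getBytes(StandardCharsets.UTF_8)));
        NodeList cores = doc.getElementsByTagName("core");
        Map<String, Path> result = new LinkedHashMap<>();
        for (int i = 0; i < cores.getLength(); i++) {
            Element core = (Element) cores.item(i);
            // Path.resolve keeps an absolute instanceDir as-is and appends a relative one.
            Path dir = solrHome.resolve(core.getAttribute("instanceDir")).normalize();
            result.put(core.getAttribute("name"), dir);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Sample content in the Solr 3.x multicore style; replace with your own solr.xml.
        String sample =
                "<solr persistent=\"false\">"
              + "<cores adminPath=\"/admin/cores\">"
              + "<core name=\"core0\" instanceDir=\"core0\"/>"
              + "<core name=\"core1\" instanceDir=\"core1\"/>"
              + "</cores></solr>";
        Path home = Paths.get("/home/hadoop/solr3.6.2/example/multicore");
        coreDirs(sample, home).forEach((name, dir) -> System.out.println(name + " -> " + dir));
    }
}
```

If a printed directory does not exist on disk, the instanceDir in solr.xml is the first thing to fix.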

In the Servers view, create a new Tomcat 7 server, then add the Dynamic Web Project you just created:

Create indexwrite and start crawling resources:

indexwrite.sprite("http://www.metabase.cn/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.jinanwuliangye.com/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.tongxinglong.com/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.qclchina.com/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.vipfuxin.com/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.minnan888.net/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.lcsyt.com/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://lf.yunnanw.cn/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.yzbljp.com/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.hyyfscl.com/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.shoudashou.com/", "utf-8"); // resource URL, charset
indexwrite.sprite("http://www.shuoma.com.cn/", "utf-8"); // resource URL, charset
inputStream.close(); // close the input stream when done; the variable name is assumed, the original does not show its declaration

Reposted from: http://www.cnblogs.com/haomad/p/3793222
