Nutch not deleting duplicates from Solr

When Nutch finishes its crawl, it recognises that there are duplicates to delete, reports "deleting xxx duplicates", and completes without errors. The only problem is that it hasn't actually deleted the duplicates, although it says it has.

I've also tried using the dedup command on its own and the result is the same.

I have Solr and Nutch set up as shown on my blog, if you wish to delve a little deeper. Each stage is in a different post:

http://amac4.blogspot.co.uk/2013/07/setting-up-solr-with-apache-tomcat-be.html
http://amac4.blogspot.co.uk/2013/07/setting-up-nutch-to-crawl-filesystem.html


Source: https://stackoverflow.com/questions/17901592
Updated: 2022-09-04 15:09

Best answer

The XML method is a hack in SQL Server. There is no reason to attempt it in any other database.

One method uses arrays:

select s.id, array_agg(s.term)
from search s
group by s.id;

Because the database supports arrays, you should learn to use them. You can convert the array to a string:

select s.id, array_join(array_agg(s.term), ',') as terms
from search s
group by s.id;
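
To make the intent of the two queries concrete, here is a minimal Python sketch of the same grouping logic. It is only an illustration: the sample rows and the helper names `aggregate_terms` and `join_terms` are hypothetical and are not part of the original answer or of any database API. It collects the terms for each id into a list (the role of `array_agg`) and then joins each list into a comma-separated string (the role of `array_join`).

```python
from collections import defaultdict

# Hypothetical rows standing in for the `search` table: (id, term).
rows = [(1, "nutch"), (1, "solr"), (2, "dedup")]


def aggregate_terms(rows):
    """Collect terms per id, mirroring array_agg(s.term) ... group by s.id."""
    grouped = defaultdict(list)
    for row_id, term in rows:
        grouped[row_id].append(term)
    return dict(grouped)


def join_terms(grouped, sep=","):
    """Join each id's list into one string, mirroring array_join(..., ',')."""
    return {row_id: sep.join(terms) for row_id, terms in grouped.items()}


arrays = aggregate_terms(rows)
strings = join_terms(arrays)
print(arrays)
print(strings)
```

Note that this sketch keeps terms in input order, whereas a SQL aggregate without an explicit ordering generally does not guarantee the order of the collected elements.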
