Nutch not deleting duplicates from Solr
When Nutch finishes its crawl, it recognises that there are duplicates to delete, reports "deleting xxx duplicates", and completes without errors. The only problem is that it hasn't actually deleted the duplicates, although it says it has.
I've also tried using the dedup command on its own and the result is the same.
I have Solr and Nutch set up as shown on my blog, with each stage in a separate post, if you wish to delve a little deeper:
http://amac4.blogspot.co.uk/2013/07/setting-up-solr-with-apache-tomcat-be.html
http://amac4.blogspot.co.uk/2013/07/setting-up-nutch-to-crawl-filesystem.html
Source: https://stackoverflow.com/questions/17901592
Best answer
The XML method is a broken workaround in SQL Server. There is no reason to attempt it in any other database.
One method uses arrays:
select s.id, array_agg(s.term) from search s group by s.id;
Because the database supports arrays, you should learn to use them. You can convert the array to a string:
select s.id, array_join(array_agg(s.term), ',') as terms from search s group by s.id;
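The aggregation in this answer can be tried out with a small sketch. The table contents below are made-up sample data, and SQLite is an assumption chosen only because it runs in memory: it has no array_agg or array_join, so its built-in group_concat stands in for the two-step "aggregate, then join" shown above.

```python
import sqlite3

# Hypothetical rows standing in for the `search` table from the answer.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE search (id INTEGER, term TEXT)")
conn.executemany(
    "INSERT INTO search (id, term) VALUES (?, ?)",
    [(1, "solr"), (1, "nutch"), (2, "dedup")],
)

# SQLite has no array_agg/array_join, so group_concat does both steps at once.
rows = conn.execute(
    "SELECT s.id, group_concat(s.term, ',') AS terms "
    "FROM search s GROUP BY s.id ORDER BY s.id"
).fetchall()

# The order of terms inside each string is not guaranteed, so compare as sets.
terms_by_id = {row_id: set(terms.split(",")) for row_id, terms in rows}
print(terms_by_id)
```

In a database that does support arrays, the queries from the answer itself are the ones to use; this sketch only illustrates the one-row-per-id result shape.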
Related Q&A
-
SELECT t.TicketID, Assignment = STUFF( ( SELECT ', ' + l.Name FROM dbo.Login AS l INNER JOIN dbo.TicketAssignments AS ta ON l.LoginID = ta.LoginID WHERE ta.TicketID = t.TicketID FOR XML PATH(''), TYPE ).value('.[1]','nvarchar(max)' ...
-
SQL Server copying data to another table using STUFF and GROUP BY [2022-07-04]
Give max(product_id) an alias: INSERT INTO [ProductsImported] (image_url, product_id, item_size) SELECT image_url, MAX(product_id) As product_id, STUFF((SELECT ',' + item_size AS [text()] FROM (SELECT DIS ... -
You could make this simpler by using right('0' + UNIT_ADM, 3) instead of stuff.
-
I solved it; here is the script: Select Col1, Col2, Col3, Col4, 'BDS: ' + STUFF (select ' , ' + BD from (select Col1, Col2, Col3, Col4, Col5, Col6, CONVERT(VARCHAR,Col7) + '-' + Col8+ '-' + CONVERT(VARCHAR,Col5) + '-' + CONVERT(VARCHA ...
-
You can try this: SELECT TOP 50 m.*, STUFF(( SELECT TOP 50 ',' + convert(varchar(10), lu.lookup_id) FROM data_lookups_ref lu WHERE lu.ref_id = m.member_id FOR XML PATH('')),1,1,'') AS ids FROM members m
-
Using STUFF in order to get multiple elements into one block in SQL [duplicate] [2021-02-02]
Wrap the script in a CTE. Once it is defined as a CTE, you can run it: SELECT Customer_name, STUFF(( SELECT ', ' + [Product They Own] FROM CTE WHERE Customer_name= final.Customer_name FOR XML PATH('')),1,1,'') AS [Products They Own] FROM CTE final GROUP BY Custom ... -
You should first inner join tb_departament, like this: Select sqd.id_question, STUFF(( SELECT ',' + td.nm_departament from tb_departament td INNER JOIN tb_survey_question_departament sqd1 ON sqd1.id_departament ...
-
Since you posted a query from another question (SQL most recent order? MS SQL), I will use my answer from there, because it is cleaner than the query above: SELECT o.* , OrderID as LastOrderID FROM ( SELECT BuyerEMail , Name , COUNT(*) as TotalOrders FROM Orders WHERE Pay != 'PayPal' GROUP BY BuyerEmail, Name ) o CROSS APPLY ( S ...
-
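SELECT SEL_PROFILE_DETAIL_FK FROM sel_pro_pmtmethod WHERE pmt_type IN ('EFT', 'RCC') GROUP BY SEL_PROFILE_DETAIL_FK HAVING COUNT(distinct pmt_type) = 2 Your GROUP BY is wrong. Grouping by pmt_type means each row shows only one type. Because you want the foreign key, you need to group by it. DISTINCT means each value is counted only once. If you really want all the related records, you can use a window ...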
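The grouping rule in the last snippet can be checked with a minimal sketch. The rows below are invented test data (including the 'CHK' type), and SQLite is an assumption used only so the query runs in memory; the query text itself is the one from the snippet.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sel_pro_pmtmethod (SEL_PROFILE_DETAIL_FK INTEGER, pmt_type TEXT)"
)
# Hypothetical rows: only key 10 has both wanted payment types.
conn.executemany(
    "INSERT INTO sel_pro_pmtmethod VALUES (?, ?)",
    [
        (10, "EFT"), (10, "RCC"),  # both types present
        (20, "EFT"), (20, "EFT"),  # same type twice, DISTINCT counts it once
        (30, "RCC"), (30, "CHK"),  # 'CHK' is filtered out by the WHERE clause
    ],
)

# Group by the foreign key, then keep groups that contain two distinct types.
matching = [
    row[0]
    for row in conn.execute(
        """
        SELECT SEL_PROFILE_DETAIL_FK
        FROM sel_pro_pmtmethod
        WHERE pmt_type IN ('EFT', 'RCC')
        GROUP BY SEL_PROFILE_DETAIL_FK
        HAVING COUNT(DISTINCT pmt_type) = 2
        """
    )
]
print(matching)
```

Without DISTINCT, key 20 would also pass the HAVING test, because its two 'EFT' rows would be counted separately.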