Question about decision trees
After studying decision trees for a while, I noticed there is a technique called boosting. From what I have seen, it usually improves the accuracy of a decision tree.
So I am wondering: why don't we simply incorporate boosting into every decision tree we build? Since boosting is currently treated as a separate technique, are there any disadvantages to using boosting compared with using a single decision tree?
Thanks for helping me out here!
Source: https://stackoverflow.com/questions/4262000
Best answer
You could use group by, count and having to filter the candidate ids, that is, the ids that occur more than once. If you want to delete the rows flagged y:
delete from my_table where id in ( select t.id from ( select id from my_table group by id having count(*) > 1 ) t ) and flag = 'y'
Otherwise, if you want to keep the rows flagged y:
delete from my_table where id in ( select t.id from ( select id from my_table group by id having count(*) > 1 ) t ) and flag <> 'y'
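The group by / having approach above can be tried end to end with a small sketch. This is only an illustration under stated assumptions: the answer does not name a database, so SQLite (via Python's `sqlite3` module) is used here; the two-column schema, the sample rows, and the helper name `delete_duplicates` are hypothetical; and the flag condition is placed on the outer delete, which is an assumption about the intended meaning of the query.

```python
import sqlite3


def delete_duplicates(conn, flag_to_delete="y"):
    """Delete rows whose id occurs more than once and whose flag matches.

    Hypothetical helper: assumes a table my_table(id, flag).
    """
    conn.execute(
        """
        DELETE FROM my_table
        WHERE id IN (
            -- ids that appear more than once
            SELECT t.id
            FROM (SELECT id FROM my_table GROUP BY id HAVING COUNT(*) > 1) t
        )
        AND flag = ?
        """,
        (flag_to_delete,),
    )
    conn.commit()


# Assumed sample data: ids 1 and 3 are duplicated, id 2 is unique.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE my_table (id INTEGER, flag TEXT)")
conn.executemany(
    "INSERT INTO my_table (id, flag) VALUES (?, ?)",
    [(1, "y"), (1, "n"), (2, "y"), (3, "n"), (3, "y")],
)

delete_duplicates(conn)

remaining = conn.execute(
    "SELECT id, flag FROM my_table ORDER BY id, flag"
).fetchall()
print(remaining)
```

Note that the unique row (2, 'y') is left alone because its id is not in the duplicated set; only the 'y' rows of ids 1 and 3 are removed. The extra derived table `t` is not needed in SQLite, but some databases reject a delete whose subquery reads the target table directly, which is presumably why the answer wraps it.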
Related Q&A
You can sort the data by date and mh using plyr::arrange, then remove the duplicates: df <- read.table(textConnection(" date wd ws temp sol octa pg mh daterep '2007-01-01 00:00:00' 100 1.5 9.0 0 8 D 100 FALSE '2007-01-01 01:00:00' 90 2.6 9.0 0 7 E 50 TRUE ' ...
-
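Delete duplicates where the value of another column in the same row = 0 [2021-02-20]
Here is a simple way in VBA to query a spreadsheet as if it were a table. You will have to adapt the RunQuery() method to your particular case, since I don't know your column names. This assumes the following: your worksheet is laid out like a table, with column names in row 1 and data below; you have saved the workbook before running this code. In my particular workbook, I have a column labelled "Category", a column labelled "Type", and I added a column named "DeleteMe". 'Adapt RunQuery for your particular needs Sub RunQuery() 'Change this SQL stat ... -
If you just want to select the rows, you can do the following: select t.* from t where t.columnA = (select max(t2.columnA) from t t2 where t2.columnB = t.columnB); If you really want to delete them, then one way is: delete from t where t.columnA < (select max(t2.columnA) from t t2 where t2.columnB = t.columnB); ...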
-
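This may help you. I assume you have a header row. If not, change iHeaderRowIndex to 0. The first part creates a dictionary object that collects all unique EAN numbers and assigns each EAN a very high price (10 million). It then rescans the list, this time applying "MIN" logic to determine the lowest price for each EAN. On another rescan, it places a MIN marker in a free column next to each minimum EAN (you should pick the name of a free, empty column; I entered "W" but you can change it). Finally, it rescans the list in reverse order to delete all rows that have no MIN marker. Also, at the end, it deletes the entries with M ...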
-
You can try the following using "dplyr": library(dplyr) data %>% ## Your data group_by(a) %>% ## grouped by "a" filter(b == max(b)) ## filtered to only include the rows where b == max(b) # Source: local data frame [2 x 3] # Groups: a # # ...
-
Reproducible example: > df = data.frame(v1=runif(10), v2=runif(10)+100, id=sample(c("High","Low"),10,TRUE)) > df v1 v2 id 1 0.5369817 100.7348 High 2 0.4603543 100.2849 Low 3 0.7916333 100.3077 High 4 0.9786784 100.6317 Low 5 0.9116897 100.6764 ...
-
Here is a solution that meets your needs, though it may not be the most elegant approach. data = read.table(header=TRUE, stringsAsFactors=FALSE, text="ColA ColB ColC ColD ColE rs778 C Can + C/T rs778 C Pro + C/T ...
-
Sum a column then group based on another column that has duplicate values [2022-03-17]
I believe you just need to introduce grouping into your query: SELECT ShipDate, Stockroom, COLineNumber, CONumber, ShippedQty, ReversedQty, SUM(CASE WHEN ReversedQty IS NULL THEN ShippedQty ELSE ReversedQty * - 1 END) AS FinalShippedQty FROM MyTable WHERE CONumber = 'RAN-00001000' GROUP BY S ... -
TOP 100 PERCENT is used to add an ORDER BY to a view's source (to avoid a final ORDER BY, which is just silly). You have to define the priority, with Agenzia listed first: SELECT Agenzia, Codice, Descrizione, FlBSP, MastroForn, CapocForn, ContoForn, SottocForn, CodIVANazAtt, CommNazAttiva, CommIntAttiva, FlCancellato, DataUltModifica, ...