How to stratify data using Orange?

Looking for some help from the Orange experts out there.
I have a data set of about 6 million rows. For simplicity's sake, consider only two columns. One holds positive decimal numbers and is imported as a continuous value. The other holds discrete values (either 0 or 1), with a ratio of 30:1 for 1s to 0s.
I am using a classification tree (which I label as 'learner') to get the classifier. I then try to cross-validate on my data set while adjusting for the overwhelming 30:1 class imbalance. I've tried several variations, but I keep getting the same result whether or not I stratify the data.
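For reference, stratifying a cross-validation means building the folds so that each one keeps roughly the class ratio of the full data set. The sketch below illustrates the idea in plain Python 3 on a small toy sample; it is not Orange's implementation, and `stratified_folds` is a hypothetical helper introduced only for this illustration.

```python
import random
from collections import Counter, defaultdict


def stratified_folds(labels, folds=5, seed=0):
    """Give every row a fold index so each fold keeps the overall class ratio."""
    rng = random.Random(seed)

    # Group row positions by class label.
    rows_by_class = defaultdict(list)
    for row, label in enumerate(labels):
        rows_by_class[label].append(row)

    fold_of = [None] * len(labels)
    for rows in rows_by_class.values():
        rng.shuffle(rows)
        # Deal the shuffled rows of this class out to the folds in turn.
        for position, row in enumerate(rows):
            fold_of[row] = position % folds
    return fold_of


# Toy labels with the 30:1 ratio of 1s to 0s (3100 rows, not 6 million).
labels = [1] * 3000 + [0] * 100
fold_of = stratified_folds(labels, folds=5)

for fold in range(5):
    counts = Counter(label for label, f in zip(labels, fold_of) if f == fold)
    print(fold, counts[1], counts[0])
```

With these toy numbers every fold receives 600 rows of class 1 and 20 rows of class 0, so each fold has the same 30:1 ratio as the whole sample. Stratification keeps that ratio; it does not rebalance the classes.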
Below is my code, with the variations I've tried commented out (I used both True and False values for stratification):
import time

import Orange

start = time.time()
print "Starting"
print ""

mydata = Orange.data.Table("testData.csv")

# This is used only for the test_with_indices method below
indicesCV = Orange.data.sample.SubsetIndicesCV(mydata)

# I only want the highest level classifier so max_depth=1
learner = Orange.classification.tree.TreeLearner(max_depth=1)

# These are the lines I've tried:
#res = Orange.evaluation.testing.cross_validation([learner], mydata, folds=5, stratified=True)
#res = Orange.evaluation.testing.proportion_test([learner], mydata, 0.8, 100, store_classifiers=1)
res = Orange.evaluation.testing.proportion_test([learner], mydata, learning_proportion=0.8, times=10, stratification=True, store_classifiers=1)
#res = Orange.evaluation.testing.test_with_indices([learner], mydata, indicesCV)

f = open('results.txt', 'a')
divString = "\n##### RESULTS (" + time.strftime("%Y-%m-%d %H:%M:%S") + ") #####"
f.write(divString)
f.write("\nAccuracy:     %.2f" % Orange.evaluation.scoring.CA(res)[0])
f.write("\nPrecision:    %.2f" % Orange.evaluation.scoring.Precision(res)[0])
f.write("\nRecall:       %.2f" % Orange.evaluation.scoring.Recall(res)[0])
f.write("\nF1:           %.2f\n" % Orange.evaluation.scoring.F1(res)[0])

tree = learner(mydata)

f.write(tree.to_string(leaf_str="%V (%M out of %N)"))
print tree.to_string(leaf_str="%V (%M out of %N)")

end = time.time()
print "Ending"
timeStr = "Execution time: " + str((end - start) / 60) + " minutes"
f.write(timeStr)

f.close()
 
Note: The keyword names may look like a typo (stratified vs. stratification), but the program runs as-is without exceptions. Also, I know the documentation shows values such as stratified=StratifiedIfPossible, but for some reason only boolean values work for me.
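One possible explanation for the unchanged scores, offered as an assumption rather than something confirmed for this data set: with a very large sample, a plain random split already reproduces the 30:1 class ratio closely, so a stratified split has little left to correct. The Python 3 sketch below compares the two kinds of 80/20 split on toy labels. It does not use Orange, and `class_ratio` and `majority_accuracy` are hypothetical helpers.

```python
import random


def class_ratio(labels):
    """Number of 1s per 0 in a list of 0/1 labels."""
    ones = sum(labels)
    zeros = len(labels) - ones
    return ones / zeros


def majority_accuracy(labels):
    """Accuracy of a model that always predicts the majority class 1."""
    return sum(labels) / len(labels)


rng = random.Random(0)

# Toy stand-in for the real file (assumption): 310,000 labels at 30:1.
labels = [1] * 300000 + [0] * 10000

# Plain random 80/20 split: shuffle everything, keep the last 20% for testing.
shuffled = labels[:]
rng.shuffle(shuffled)
cut = len(shuffled) * 4 // 5
random_test = shuffled[cut:]

# Stratified 80/20 split: take 20% of each class separately.
stratified_test = [1] * (300000 // 5) + [0] * (10000 // 5)

for name, test in (("random", random_test), ("stratified", stratified_test)):
    print("%-10s ratio %.2f  majority-class accuracy %.4f"
          % (name, class_ratio(test), majority_accuracy(test)))
```

The stratified test set has a ratio of exactly 30 by construction, and the random one should land close to it. In both cases a model that always predicts 1 is right about 30 times out of 31, which is why accuracy alone says little on data this imbalanced.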

Source: https://stackoverflow.com/questions/29973059

Updated: 2023-08-09 06:08

Best answer

%s is for null-terminated strings. magic is just an array of 2 bytes, not a string, so print its bytes one at a time with %c:
printf("magic number = %c%c\n", bmp_header_p->magic[0], bmp_header_p->magic[1]);
