Number of instances: 700
Number of attributes: 5236
=== Detailed Accuracy By Class ===
TP Rate  FP Rate  Precision  Recall  F-Measure  ROC Area  Class
0.93     0.002    0.989      0.93    ...
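Each row of Weka's "Detailed Accuracy By Class" table treats one class as positive and all others as negative. TP Rate is TP/(TP+FN), which is the same quantity as Recall. FP Rate is FP/(FP+TN), Precision is TP/(TP+FP), and F-Measure is 2PR/(P+R). The sketch below computes these from a confusion matrix with rows as actual classes and columns as predicted classes. The class name `PerClassMetrics` and the counts in `main` are hypothetical and are not the 700-instance data above. ROC Area is omitted because it needs per-instance scores, not just counts.

```java
import java.util.Locale;

/** Sketch: per-class metrics from a confusion matrix (rows = actual, columns = predicted). */
final class PerClassMetrics {
    private PerClassMetrics() {}

    /** Returns {tpRate, fpRate, precision, recall, fMeasure} for class index c (one-vs-rest). */
    static double[] forClass(int[][] cm, int c) {
        long tp = cm[c][c], fn = 0, fp = 0, total = 0;
        for (int i = 0; i < cm.length; i++) {
            for (int j = 0; j < cm.length; j++) {
                total += cm[i][j];
                if (i == c && j != c) fn += cm[i][j]; // actually c, predicted as another class
                if (i != c && j == c) fp += cm[i][j]; // another class, predicted as c
            }
        }
        long tn = total - tp - fn - fp;
        double tpRate = ratio(tp, tp + fn);           // identical to recall
        double fpRate = ratio(fp, fp + tn);
        double precision = ratio(tp, tp + fp);
        double f = (precision + tpRate) == 0 ? 0.0
                : 2 * precision * tpRate / (precision + tpRate);
        return new double[] {tpRate, fpRate, precision, tpRate, f};
    }

    /** Division that returns 0 instead of NaN when the denominator is empty. */
    private static double ratio(long num, long den) {
        return den == 0 ? 0.0 : (double) num / den;
    }

    public static void main(String[] args) {
        // Hypothetical 3-class confusion matrix for illustration only.
        int[][] cm = {{93, 5, 2}, {1, 80, 9}, {0, 4, 96}};
        for (int c = 0; c < cm.length; c++) {
            double[] m = forClass(cm, c);
            System.out.printf(Locale.ROOT, "class %d: TP %.3f FP %.3f P %.3f R %.3f F %.3f%n",
                    c, m[0], m[1], m[2], m[3], m[4]);
        }
    }
}
```

A very small FP Rate such as 0.002 is typical when one class is a small fraction of the data, because TN dominates the denominator.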
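I first ran Chinese word segmentation on the training set and then, without any feature selection, only vectorized it:
weka.filters.unsupervised.attribute.StringToWordVector in:9804
Number of instances: 9804
Number of attributes: 9302
The resulting ARFF file was about 30 MB. Using TF-IDF for feature selection on the same training set takes very little code:   ...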
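The TF-IDF weighting mentioned above can be sketched without Weka as follows. The sketch assumes the documents are already segmented into token lists and uses the textbook weight tf(t, d) × ln(N / df(t)). Weka's StringToWordVector has its own TF and IDF transform options whose exact formulas may differ. The selection rule in `selectTopK` (keep the k terms with the largest TF-IDF summed over all documents) is a hypothetical illustration, not the author's code. The class and method names are also hypothetical.

```java
import java.util.*;

/** Sketch: TF-IDF weighting over pre-segmented documents, plus a hypothetical top-k term selection. */
final class TfIdfSketch {
    private TfIdfSketch() {}

    /** Document frequency: number of documents that contain each term at least once. */
    static Map<String, Integer> documentFrequency(List<List<String>> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> doc : docs) {
            for (String term : new HashSet<>(doc)) {
                df.merge(term, 1, Integer::sum);
            }
        }
        return df;
    }

    /** Weight of each term in one document: raw count * ln(N / df). Terms in every document get 0. */
    static Map<String, Double> tfidf(List<String> doc, Map<String, Integer> df, int numDocs) {
        Map<String, Integer> tf = new HashMap<>();
        for (String term : doc) tf.merge(term, 1, Integer::sum);
        Map<String, Double> weights = new HashMap<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            double idf = Math.log((double) numDocs / df.get(e.getKey()));
            weights.put(e.getKey(), e.getValue() * idf);
        }
        return weights;
    }

    /** Hypothetical rule: keep the k terms with the largest TF-IDF summed across documents. */
    static List<String> selectTopK(List<List<String>> docs, int k) {
        Map<String, Integer> df = documentFrequency(docs);
        Map<String, Double> score = new HashMap<>();
        for (List<String> doc : docs) {
            tfidf(doc, df, docs.size()).forEach((t, w) -> score.merge(t, w, Double::sum));
        }
        List<Map.Entry<String, Double>> entries = new ArrayList<>(score.entrySet());
        entries.sort((a, b) -> {
            int byScore = Double.compare(b.getValue(), a.getValue());       // descending score
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey()); // alphabetical tie-break
        });
        List<String> top = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) top.add(entries.get(i).getKey());
        return top;
    }

    public static void main(String[] args) {
        // ASCII stand-ins for segmented Chinese words.
        List<List<String>> docs = Arrays.asList(
                Arrays.asList("stock", "market", "stock", "rise"),
                Arrays.asList("market", "team", "match"),
                Arrays.asList("team", "match", "goal"));
        System.out.println(selectTopK(docs, 3));
    }
}
```

Pruning the vocabulary this way shrinks the attribute count (9302 here) and therefore the ARFF file; the cutoff k is a tuning choice.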
Last night I retrained on the same training set as last time, this time with SMO, and the results improved somewhat:
Number of instances in the arff file: 9804
Number of attributes: 9302
weka.filters.unsupervised.attribute.ReplaceMissingValues in:9804
weka.filters.unsupervised.attribute.Normalize in:9804
weka.filters.unsupervised.attribute.Replace ...
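The log above shows the ReplaceMissingValues and Normalize filters running over the 9804 instances. For numeric attributes, ReplaceMissingValues fills missing cells with the attribute mean, and Normalize rescales values to [0, 1] by default. The sketch below mimics those two steps on a plain `double[][]` matrix, with `Double.NaN` standing in for a missing value. It is not Weka's code. The class name is hypothetical, and mapping constant columns to 0 is this sketch's own choice.

```java
import java.util.Arrays;

/** Sketch: mean imputation followed by per-column min-max scaling to [0, 1]. */
final class PreprocessSketch {
    private PreprocessSketch() {}

    /** Replaces NaN (missing) cells in each column with the mean of that column's non-missing cells. */
    static void replaceMissingWithMean(double[][] data) {
        int cols = data[0].length;
        for (int j = 0; j < cols; j++) {
            double sum = 0;
            int n = 0;
            for (double[] row : data) {
                if (!Double.isNaN(row[j])) { sum += row[j]; n++; }
            }
            double mean = n == 0 ? 0.0 : sum / n;
            for (double[] row : data) {
                if (Double.isNaN(row[j])) row[j] = mean;
            }
        }
    }

    /** Rescales each column to [0, 1]; a constant column becomes all zeros (assumption of this sketch). */
    static void minMaxNormalize(double[][] data) {
        int cols = data[0].length;
        for (int j = 0; j < cols; j++) {
            double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
            for (double[] row : data) {
                min = Math.min(min, row[j]);
                max = Math.max(max, row[j]);
            }
            double range = max - min;
            for (double[] row : data) {
                row[j] = range == 0 ? 0.0 : (row[j] - min) / range;
            }
        }
    }

    public static void main(String[] args) {
        double[][] data = {{1, 10}, {Double.NaN, 20}, {5, Double.NaN}};
        replaceMissingWithMean(data); // column means 3 and 15 fill the gaps
        minMaxNormalize(data);
        System.out.println(Arrays.deepToString(data)); // prints [[0.0, 0.0], [0.5, 1.0], [1.0, 0.5]]
    }
}
```

Imputation runs first so that the min and max used for scaling are computed over complete columns.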