Naive Bayes, dataset choice (sentences vs. dictionary)

I'm trying to classify emotions from text using naive Bayes. I have the ISEAR dataset and the NRC dataset, and ISEAR seems to give worse results than NRC. For those who don't know the difference: ISEAR is a dataset of sentences, while NRC is a dictionary of individual words. When I train on ISEAR and type in sentences manually, the results are far from what I expected.

I'm kinda new to machine learning, so correct me if I'm wrong.

As I understand it, naive Bayes works from the probability of each word appearing under each label, right? So suppose the phrase "I'm happy" appears 5 times in the "Joy" examples and 6 times in the "Surprise" examples. Couldn't this cause a wrong prediction? Compare that with a dictionary of words, where, for example, "happy" is labeled as both joy and surprise and occurs only once under each label.
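
To make my understanding concrete, here is a minimal sketch of word-count naive Bayes with add-one smoothing, in plain Python. The training examples are made up for illustration (they are not taken from ISEAR or NRC), and the names train, log_scores, and predict are my own, so treat this as a rough picture of the idea rather than a proper implementation.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs is a list of (text, label) pairs; returns (label_counts, word_counts, vocab)."""
    label_counts = Counter()            # number of training examples per label
    word_counts = defaultdict(Counter)  # word frequencies per label
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab

def log_scores(text, model, alpha=1.0):
    """Log of prior * product of smoothed word likelihoods, for each label."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    scores = {}
    for label in label_counts:
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # assumption: words never seen in training are ignored
            smoothed = (word_counts[label][word] + alpha) / (total_words + alpha * len(vocab))
            score += math.log(smoothed)
        scores[label] = score
    return scores

def predict(text, model):
    scores = log_scores(text, model)
    return max(scores, key=scores.get)

# Hypothetical sentence-style training data (ISEAR-like in shape only).
sentence_model = train([
    ("i am happy today", "joy"),
    ("so happy with my friends", "joy"),
    ("i am happy and shocked", "surprise"),
    ("i did not expect that", "surprise"),
])

# Hypothetical dictionary-style training data (NRC-like in shape only):
# every "document" is a single word, and "happy" carries both labels.
dictionary_model = train([
    ("happy", "joy"),
    ("delighted", "joy"),
    ("happy", "surprise"),
    ("shocked", "surprise"),
])

sentence_prediction = predict("i am happy", sentence_model)
dictionary_scores = log_scores("happy", dictionary_model)
```

In this toy setup, the dictionary-style model gives "happy" the same score for both labels, so it cannot separate them on that word alone, while the sentence-style model also uses the counts of the surrounding words.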

Is it okay to use a word dictionary as my training set with a simple naive Bayes method?
