ISSN
| 1941-7020 |
DDC
| 004 |
Title
| English Sentiment Classification using Only the Sentiment Lexicons with a JOHNSON Coefficient in a Parallel Network Environment / Vo Ngoc Phu, Vo Thi Ngoc Tran |
Publication information
| United States : Science Publications, 2017 |
Physical description
| 28 p. |
Abstract
| Sentiment classification matters in everyday life, for example in political activities, commodity production, and commercial activities. In this study, we propose a new model for Big Data sentiment classification. We use the sentiment lexicons of our basis English Sentiment Dictionary (bESD) to classify the 5,000,000 English documents of our testing data set, comprising 2,500,000 positive and 2,500,000 negative documents. We do not use any English training data set, and we do not use one-dimensional or multi-dimensional vectors in either the sequential environment or the distributed (parallel) network environment. We use a JOHNSON Coefficient (JC), computed through the Google search engine with the AND and OR operators, to identify the sentiment values of the sentiment lexicons of the bESD. A term (an English word or phrase) is clustered into the positive or the negative polarity when the JC similarity measures show it to be very close, that is, very similar, to that polarity. We tested the proposed model in both a sequential environment and a distributed network system and achieved 87.56% accuracy on the testing data set. The model runs faster in the parallel network environment than in the sequential system. The new model can classify the sentiment of millions of English documents based on the sentiment lexicons of the bESD in a parallel network environment, and it depends on neither a particular domain nor a training stage. This study draws on similarity coefficients from the data mining field. The results can be widely used in applications and research on English sentiment classification. |
Subject term
| Big Data sentiment classification |
Uncontrolled keyword
| English Sentiment Classification |
Uncontrolled keyword
| Distributed System |
Uncontrolled keyword
| Parallel System |
Uncontrolled keyword
| Cloudera |
Uncontrolled keyword
| JOHNSON Coefficient |
Uncontrolled keyword
| Hadoop Map and Hadoop Reduce |
Uncontrolled keyword
| Sentiment Lexicons |
Faculty
| Faculty of Information Technology |
Added author (personal)
| Vo, Ngoc Phu |
Added author (personal)
| Vo, Thi Ngoc Tran |
Source
| American Journal of Engineering and Applied Sciences.
Issue: Vol. 11, Issue 1, pp. 38-65 |
Location
| Nguyen Tat Thanh University Library |
Electronic file
| https://thescipub.com/abstract/10.3844/ajeassp.2018.38.65 |
|
000
| 00000nam#a2200000u##4500 |
---|
001 | 19566 |
---|
002 | 12 |
---|
004 | 4FA40FEF-20CF-4D5F-AC75-0F71AF1ECF81 |
---|
005 | 202003090045 |
---|
008 | 200227s2017 xxu eng |
---|
009 | 1 0 |
---|
022 | |a1941-7020 |
---|
039 | |a20200309004552|bphucvh|c20200227143339|dphucvh|y20200227093112|zphucvh |
---|
040 | |aNTT |
---|
041 | |aeng |
---|
044 | |axxu |
---|
082 | |a004|223 |
---|
245 | |aEnglish Sentiment Classification using Only the Sentiment Lexicons with a JOHNSON Coefficient in a Parallel Network Environment / |cVo Ngoc Phu, Vo Thi Ngoc Tran |
---|
260 | |aUnited States : |bScience Publications, |c2017 |
---|
300 | |a28 p. |
---|
520 | |aSentiment classification matters in everyday life, for example in political activities, commodity production, and commercial activities. In this study, we propose a new model for Big Data sentiment classification. We use the sentiment lexicons of our basis English Sentiment Dictionary (bESD) to classify the 5,000,000 English documents of our testing data set, comprising 2,500,000 positive and 2,500,000 negative documents. We do not use any English training data set, and we do not use one-dimensional or multi-dimensional vectors in either the sequential environment or the distributed (parallel) network environment. We use a JOHNSON Coefficient (JC), computed through the Google search engine with the AND and OR operators, to identify the sentiment values of the sentiment lexicons of the bESD. A term (an English word or phrase) is clustered into the positive or the negative polarity when the JC similarity measures show it to be very close, that is, very similar, to that polarity. We tested the proposed model in both a sequential environment and a distributed network system and achieved 87.56% accuracy on the testing data set. The model runs faster in the parallel network environment than in the sequential system. The new model can classify the sentiment of millions of English documents based on the sentiment lexicons of the bESD in a parallel network environment, and it depends on neither a particular domain nor a training stage. This study draws on similarity coefficients from the data mining field. The results can be widely used in applications and research on English sentiment classification. |
---|
650 | |aBig Data sentiment classification |
---|
653 | |aEnglish Sentiment Classification |
---|
653 | |aDistributed System |
---|
653 | |aParallel System |
---|
653 | |aCloudera |
---|
653 | |aJOHNSON Coefficient |
---|
653 | |aHadoop Map and Hadoop Reduce |
---|
653 | |aSentiment Lexicons |
---|
690 | |aFaculty of Information Technology |
---|
700 | |aVo, Ngoc Phu |
---|
700 | |aVo, Thi Ngoc Tran |
---|
773 | |tAmerican Journal of Engineering and Applied Sciences |gVol. 11, Issue 1, P.38-65 |
---|
852 | |aNguyen Tat Thanh University Library |
---|
856 | |uhttps://thescipub.com/abstract/10.3844/ajeassp.2018.38.65 |
---|
890 | |c1|a0|b0|d0 |
---|