ISSN
| 1992-8645 |
DDC
| 004 |
Personal author
| Vo, Ngoc Phu |
Title
| English sentiment classification using a Gower-2 coefficient and a genetic algorithm with a fitness-proportionate selection in a parallel network environment / Vo Ngoc Phu, Vo Thi Ngoc Tran |
Publication information
| Pakistan : Little Lion Scientific, 2018 |
Physical description
| 50 p. |
Abstract
| We have studied data mining and natural language processing for many years, and the two fields are closely related. Sentiment classification has made crucial contributions to many areas of everyday life, such as political activities, commodity production, and commercial activities. We propose a new sentiment classification model that uses a Gower-2 coefficient (HA) and a genetic algorithm (GA) whose fitness function (FF) is a fitness-proportionate selection (FPS). The model can be applied to big data. Because the GA operates on bit arrays, it needs little storage space, so storing big data does not require a large amount of storage. First, we create the sentiment lexicons of our basis English sentiment dictionary (bESD) by using the HA through the Google search engine with the AND and OR operators. Next, according to the sentiment lexicons of the bESD, we encode the 7,000,000 English sentences of our training data set (3,500,000 negative and 3,500,000 positive) into bit arrays in a small storage space. We likewise encode all sentences of the 8,000,000 English documents of our testing data set (4,000,000 positive and 4,000,000 negative) into bit arrays in a small storage space. We use the GA with the FPS to cluster one bit array (corresponding to one sentence) of one document of the testing data set into either the bit arrays of the negative sentences or the bit arrays of the positive sentences of the training data set. The sentiment classification of a document is based on the sentiment classification results of its sentences. We tested the proposed model in both a sequential environment and a distributed network system and achieved 88.12% accuracy on the testing data set. The model runs faster in the parallel network environment than in the sequential system. The results of this work can be widely used in applications and research on English sentiment classification. |
Subject term
| Bigdata-English sentiment-Algorithm |
Free keyword
| English sentiment classification |
Free keyword
| Distributed system |
Free keyword
| Cloudera |
Free keyword
| Gower-2 |
Free keyword
| Similarity coefficient |
Free keyword
| Hadoop map and hadoop reduce |
Free keyword
| Fitness-proportionate selection |
Free keyword
| Genetic algorithm |
Faculty
| Faculty of Information Technology |
Added personal author
| Vo, Thi Ngoc Tran |
Source
| Journal of Theoretical and Applied Information Technology, Vol. 96 (2018), No. 4, pp. 887-936 |
Location
| Thư Viện Đại học Nguyễn Tất Thành |
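The abstract above names fitness-proportionate selection over bit arrays but does not give its details. As a rough illustration only, the following Python sketch shows the textbook roulette-wheel form of fitness-proportionate selection applied to bit arrays. The function names, the bit-matching fitness, and the sample data are assumptions made for this example; they are not taken from the paper, which builds its encoding from Gower-2 sentiment lexicons.

```python
import random
from typing import List, Sequence

BitArray = List[int]


def matching_fraction(a: Sequence[int], b: Sequence[int]) -> float:
    """Hypothetical stand-in fitness: fraction of positions where two bit arrays agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def fitness_proportionate_select(
    population: Sequence[BitArray],
    fitnesses: Sequence[float],
    rng: random.Random,
) -> BitArray:
    """Pick one individual with probability proportional to its (non-negative) fitness."""
    total = sum(fitnesses)
    if total <= 0:
        # Degenerate case: no individual has positive fitness, so pick uniformly.
        return rng.choice(list(population))
    pick = rng.random() * total  # a point on the "roulette wheel" in [0, total)
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if pick < running:
            return individual
    # Guard against floating-point round-off in the running sum.
    return population[-1]


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative data only: one encoded test sentence and three training bit arrays.
    test_sentence = [1, 0, 1, 1]
    training = [[1, 0, 1, 0], [0, 1, 0, 0], [1, 0, 1, 1]]
    scores = [matching_fraction(test_sentence, t) for t in training]
    print(fitness_proportionate_select(training, scores, rng))
```

Candidates with higher fitness are drawn more often, but low-fitness candidates keep a nonzero chance, which is the property that distinguishes this scheme from simply taking the best match.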
000 | 00000nam#a2200000u##4500 |
001 | 19614 |
002 | 12 |
004 | 62EC1205-FA8D-46AA-936C-0CD285FE0524 |
005 | 202003090056 |
008 | 200302s2018 pk eng |
009 | 1 0 |
022 | |a1992-8645 |
039 | |a20200309005618|bphucvh|c20200302140113|dphucvh|y20200302135714|zphucvh |
040 | |aNTT |
041 | |aeng |
044 | |apk |
082 | |a004|223 |
100 | |aVo, Ngoc Phu|cDr. |
245 | |aEnglish sentiment classification using a Gower-2 coefficient and a genetic algorithm with a fitness-proportionate selection in a parallel network environment / |cVo Ngoc Phu, Vo Thi Ngoc Tran |
260 | |aPakistan : |bLittle Lion Scientific, |c2018 |
300 | |a50 p. |
520 | |aWe have studied data mining and natural language processing for many years, and the two fields are closely related. Sentiment classification has made crucial contributions to many areas of everyday life, such as political activities, commodity production, and commercial activities. We propose a new sentiment classification model that uses a Gower-2 coefficient (HA) and a genetic algorithm (GA) whose fitness function (FF) is a fitness-proportionate selection (FPS). The model can be applied to big data. Because the GA operates on bit arrays, it needs little storage space, so storing big data does not require a large amount of storage. First, we create the sentiment lexicons of our basis English sentiment dictionary (bESD) by using the HA through the Google search engine with the AND and OR operators. Next, according to the sentiment lexicons of the bESD, we encode the 7,000,000 English sentences of our training data set (3,500,000 negative and 3,500,000 positive) into bit arrays in a small storage space. We likewise encode all sentences of the 8,000,000 English documents of our testing data set (4,000,000 positive and 4,000,000 negative) into bit arrays in a small storage space. We use the GA with the FPS to cluster one bit array (corresponding to one sentence) of one document of the testing data set into either the bit arrays of the negative sentences or the bit arrays of the positive sentences of the training data set. The sentiment classification of a document is based on the sentiment classification results of its sentences. We tested the proposed model in both a sequential environment and a distributed network system and achieved 88.12% accuracy on the testing data set. The model runs faster in the parallel network environment than in the sequential system. The results of this work can be widely used in applications and research on English sentiment classification. |
650 | |aBigdata|vEnglish sentiment|xAlgorithm |
653 | |aEnglish sentiment classification |
653 | |aDistributed system |
653 | |aCloudera |
653 | |aGower-2 |
653 | |aSimilarity coefficient |
653 | |aHadoop map and hadoop reduce |
653 | |aFitness-proportionate selection |
653 | |aGenetic algorithm |
690 | |aFaculty of Information Technology |
700 | |aVo, Thi Ngoc Tran|cDr. |
773 | |tJournal of Theoretical and Applied Information Technology|gVol. 96 (2018), No. 4, P.887-936 |
852 | |aThư Viện Đại học Nguyễn Tất Thành |
890 | |c1|a0|b0|d1 |