Record information
  • Science and technology journal article
  • Classification/shelf mark: 004
    Title: English sentiment classification using a Gower-2 coefficient and a genetic algorithm with a fitness-proportionate selection in a parallel network environment

ISSN 1992-8645
DDC 004
Personal author Vo, Ngoc Phu
Title English sentiment classification using a Gower-2 coefficient and a genetic algorithm with a fitness-proportionate selection in a parallel network environment / Vo Ngoc Phu, Vo Thi Ngoc Tran
Publication information Pakistan : Little Lion Scientific, 2018
Physical description 50 p.
Abstract We have already studied a data mining field and a natural language processing field for many years. There are many significant relationships between the data mining and the natural language processing. Sentiment classification has had many crucial contributions to many different fields in everyday life, such as in political activities, commodity production, and commercial activities. A new model using a Gower-2 Coefficient (HA) and a Genetic Algorithm (GA) with a fitness function (FF) which is a Fitness-proportionate Selection (FPS) has been proposed for the sentiment classification. This can be applied to a big data. The GA can process many bit arrays. Thus, it saves a lot of storage spaces. We do not need lots of storage spaces to store a big data. Firstly, we create many sentiment lexicons of our basis English sentiment dictionary (bESD) by using the HA through a Google search engine with AND operator and OR operator. Next, according to the sentiment lexicons of the bESD, we encode 7,000,000 sentences of our training data set including the 3,500,000 negative and the 3,500,000 positive in English successfully into the bit arrays in a small storage space. We also encrypt all sentences of 8,000,000 documents of our testing data set comprising the 4,000,000 positive and the 4,000,000 negative in English successfully into the bit arrays in the small storage space. We use the GA with the FPS to cluster one bit array (corresponding to one sentence) of one document of the testing data set into either the bit arrays of the negative sentences or the bit arrays of the positive sentences of the training data set. The sentiment classification of one document is based on the results of the sentiment classification of the sentences of this document of the testing data set. We tested the proposed model in both a sequential environment and a distributed network system. We achieved 88.12% accuracy of the testing data set. The execution time of the model in the parallel network environment is faster than the execution time of the model in the sequential system. The results of this work can be widely used in applications and research of the English sentiment classification.
Subject term Bigdata-English sentiment-Algorithm
Uncontrolled keyword English sentiment classification
Uncontrolled keyword Distributed system
Uncontrolled keyword Cloudera
Uncontrolled keyword Gower-2
Uncontrolled keyword Similarity coefficient
Uncontrolled keyword Hadoop map and hadoop reduce
Uncontrolled keyword Fitness-proportionate selection
Uncontrolled keyword Genetic algorithm
Faculty Khoa Công nghệ Thông tin (Faculty of Information Technology)
Added personal author Vo, Thi Ngoc Tran
Source Journal of Theoretical and Applied Information Technology. Issue: Vol. 96 (2018), No. 4, P.887-936
Location Thư Viện Đại học Nguyễn Tất Thành (Nguyen Tat Thanh University Library)
000 00000nam#a2200000u##4500
00119614
00212
00462EC1205-FA8D-46AA-936C-0CD285FE0524
005202003090056
008200302s2018 pk eng
0091 0
022 |a1992-8645
039|a20200309005618|bphucvh|c20200302140113|dphucvh|y20200302135714|zphucvh
040 |aNTT
041 |aeng
044 |apk
082 |a004|223
100 |aVo, Ngoc Phu|cDr.
245 |aEnglish sentiment classification using a Gower-2 coefficient and a genetic algorithm with a fitness-proportionate selection in a parallel network environment / |cVo Ngoc Phu, Vo Thi Ngoc Tran
260 |aPakistan : |bLittle Lion Scientific, |c2018
300 |a50 p.
520 |aWe have already studied a data mining field and a natural language processing field for many years. There are many significant relationships between the data mining and the natural language processing. Sentiment classification has had many crucial contributions to many different fields in everyday life, such as in political activities, commodity production, and commercial activities. A new model using a Gower-2 Coefficient (HA) and a Genetic Algorithm (GA) with a fitness function (FF) which is a Fitness-proportionate Selection (FPS) has been proposed for the sentiment classification. This can be applied to a big data. The GA can process many bit arrays. Thus, it saves a lot of storage spaces. We do not need lots of storage spaces to store a big data. Firstly, we create many sentiment lexicons of our basis English sentiment dictionary (bESD) by using the HA through a Google search engine with AND operator and OR operator. Next, according to the sentiment lexicons of the bESD, we encode 7,000,000 sentences of our training data set including the 3,500,000 negative and the 3,500,000 positive in English successfully into the bit arrays in a small storage space. We also encrypt all sentences of 8,000,000 documents of our testing data set comprising the 4,000,000 positive and the 4,000,000 negative in English successfully into the bit arrays in the small storage space. We use the GA with the FPS to cluster one bit array (corresponding to one sentence) of one document of the testing data set into either the bit arrays of the negative sentences or the bit arrays of the positive sentences of the training data set. The sentiment classification of one document is based on the results of the sentiment classification of the sentences of this document of the testing data set. We tested the proposed model in both a sequential environment and a distributed network system. We achieved 88.12% accuracy of the testing data set. The execution time of the model in the parallel network environment is faster than the execution time of the model in the sequential system. The results of this work can be widely used in applications and research of the English sentiment classification.
650 |aBigdata|vEnglish sentiment|xAlgorithm
653 |aEnglish sentiment classification
653 |aDistributed system
653 |aCloudera
653 |aGower-2
653 |aSimilarity coefficient
653|aHadoop map and hadoop reduce
653|aFitness-proportionate selection
653|aGenetic algorithm
690 |aKhoa Công nghệ Thông tin
700 |aVo, Thi Ngoc Tran|cDr.
773|tJournal of Theoretical and Applied Information Technology|gVol. 96 (2018), No. 4, P.887-936
852 |aThư Viện Đại học Nguyễn Tất Thành
890|c1|a0|b0|d1