Row Content
1
A reformed K-Nearest neighbors algorithm for big data sets / Vo Ngoc Phu, Vo Thi Ngoc Tran // Journal of Computer Science. - . - Vol. 14, Issue 9, P.1213-1225. - ISSN:

New York : Science Publications, 2018
13 p.
Classification number (DDC): 004
Data mining already has many algorithms, of which the K-Nearest Neighbors algorithm (K-NN) is well known among researchers. K-NN is very effective on small data sets; however, it takes a long time to run on big data sets. Today, data sets often have millions of records; hence, it is difficult to apply K-NN to big data. In this research, we propose an improvement to K-NN that processes big data sets in a shorter execution time. The Reformed K-Nearest Neighbors algorithm (R-K-NN) can be applied to large data sets with millions or even billions of records. R-K-NN is tested on a data set with 500,000 records. The execution time of R-K-NN is much shorter than that of K-NN. In addition, R-K-NN is implemented in a parallel network system with Hadoop Map (M) and Hadoop Reduce (R).
Number of copies: (0) Digital documents: (1)
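For orientation, the baseline that the abstract above calls slow on big data can be sketched as follows. This is a minimal brute-force K-NN under stated assumptions (Euclidean distance, majority vote, and the illustrative name `knn_predict`); it is not the R-K-NN method, whose details the abstract does not give.

```python
import heapq
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest training points.

    `train` is a list of (feature_vector, label) pairs. Every call scans the
    whole training set, so the cost per query grows linearly with its size.
    """
    if k < 1 or not train:
        raise ValueError("need k >= 1 and a non-empty training set")
    # Keep only the k rows with the smallest Euclidean distance to the query.
    nearest = heapq.nsmallest(k, train, key=lambda row: math.dist(row[0], query))
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Tiny hypothetical data set, for illustration only.
train = [
    ((0.0, 0.0), "neg"),
    ((0.1, 0.2), "neg"),
    ((1.0, 1.0), "pos"),
    ((0.9, 1.1), "pos"),
    ((1.2, 0.8), "pos"),
]
label = knn_predict(train, (1.0, 0.9), k=3)
```

The full scan per query is what makes plain K-NN expensive when the training set holds millions of records.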
2
A STING algorithm and multi-dimensional vectors used for English sentiment classification in a distributed system / Vo Ngoc Phu, Vo Thi Ngoc Tran // American Journal of Engineering and Applied Sciences. - . - Vol. 11, Issue 1, P. 19-37. - ISSN:

Science Publications, 2017
19 p.
Classification number (DDC): 004
Sentiment classification is significant in everyday life, such as in political activities, commodity production, and commercial activities. Finding a fast, highly accurate way to classify emotion has been a challenge for scientists. In this research, we propose a new model for Big Data sentiment classification in a parallel network environment, a Cloudera system with Hadoop Map (M) and Hadoop Reduce (R). The model uses a Statistical Information Grid algorithm (STING) with multi-dimensional vectors and the 2,000,000 English documents of our English training data set for English document-level sentiment classification. It can classify the sentiment of millions of English documents in the parallel network environment. We tested the model on our testing data set (1,000,000 English reviews: 500,000 positive and 500,000 negative) and achieved 83.92% accuracy.
Number of copies: (0) Digital documents: (1)
3
English sentiment classification using a Gower-2 coefficient and a genetic algorithm with a fitness-proportionate selection in a parallel network environment / Vo Ngoc Phu, Vo Thi Ngoc Tran // Journal of Theoretical and Applied Information Technology. - . - Vol. 96 (2018), No. 4, P.887-936. - ISSN:

Pakistan : Little Lion Scientific, 2018
50 p.
Classification number (DDC): 004
We have studied the fields of data mining and natural language processing for many years, and there are many significant relationships between the two. Sentiment classification has made many crucial contributions to different fields of everyday life, such as political activities, commodity production, and commercial activities. We propose a new model for sentiment classification that uses a Gower-2 Coefficient (HA) and a Genetic Algorithm (GA) with a fitness function (FF) that is a Fitness-Proportionate Selection (FPS). The model can be applied to big data. Because the GA processes bit arrays, it saves a lot of storage space: we do not need much storage to hold a big data set. First, we create the sentiment lexicons of our basis English sentiment dictionary (bESD) by using the HA through a Google search engine with the AND and OR operators. Next, according to the sentiment lexicons of the bESD, we encode the 7,000,000 English sentences of our training data set (3,500,000 negative and 3,500,000 positive) into bit arrays in a small storage space. We also encode all sentences of the 8,000,000 English documents of our testing data set (4,000,000 positive and 4,000,000 negative) into bit arrays in a small storage space. We use the GA with the FPS to cluster each bit array (corresponding to one sentence) of a document in the testing data set into either the bit arrays of the negative sentences or the bit arrays of the positive sentences of the training data set. The sentiment classification of a document is based on the sentiment classification results of its sentences. We tested the proposed model in both a sequential environment and a distributed network system and achieved 88.12% accuracy on the testing data set. The execution time of the model in the parallel network environment is shorter than in the sequential system. The results of this work can be widely used in applications and research on English sentiment classification.
Number of copies: (0) Digital documents: (1)
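The selection step named in the abstract above, fitness-proportionate (roulette-wheel) selection, can be sketched generically as below. This is an illustration under assumptions, not the authors' implementation: the function name, the bit-string individuals, and the fitness values are all hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def fitness_proportionate_select(population, fitnesses, rng=random):
    """Return one individual, chosen with probability fitness / sum(fitnesses)."""
    if not population or len(population) != len(fitnesses):
        raise ValueError("population and fitnesses must be non-empty and equal in length")
    if any(f < 0 for f in fitnesses):
        raise ValueError("fitness values must be non-negative")
    cumulative = list(accumulate(fitnesses))  # running totals form the wheel
    total = cumulative[-1]
    if total <= 0:
        raise ValueError("at least one fitness value must be positive")
    spin = rng.random() * total  # uniform point in [0, total)
    index = bisect_right(cumulative, spin)  # first slot whose upper edge exceeds spin
    return population[min(index, len(population) - 1)]


# Hypothetical bit arrays (as strings) with made-up fitness scores.
population = ["1010", "0111", "1100"]
fitnesses = [1.0, 3.0, 0.0]
chosen = fitness_proportionate_select(population, fitnesses, random.Random(0))
```

An individual with zero fitness is never chosen, and one with three times the fitness of another is chosen about three times as often.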
4
English Sentiment Classification using Only the Sentiment Lexicons with a JOHNSON Coefficient in a Parallel Network Environment / Vo Ngoc Phu, Vo Thi Ngoc Tran // American Journal of Engineering and Applied Sciences. - . - Vol. 11, Issue 1, P.38-65. - ISSN:

United States : Science Publications, 2017
28 p.
Classification number (DDC): 004
Sentiment classification is significant in everyday life, such as in political activities, commodity production, and commercial activities. In this study, we propose a new model for Big Data sentiment classification. We use the sentiment lexicons of our basis English Sentiment Dictionary (bESD) to classify the 5,000,000 English documents of our testing data set (2,500,000 positive and 2,500,000 negative). We do not use any English training data set. We use neither one-dimensional vectors nor multi-dimensional vectors, in either the sequential environment or the distributed network system. We use a JOHNSON Coefficient (JC) through a Google search engine with the AND and OR operators to identify the sentiment values of the sentiment lexicons of the bESD in English. A term (an English word or phrase) is clustered into either the positive polarity or the negative polarity if the similarity measures of the JC show that it is very close, that is, very similar, to the positive or the negative. We tested the proposed model in both a sequential environment and a distributed network system and achieved 87.56% accuracy on the testing data set. The execution time of the model in the parallel network environment is shorter than in the sequential system. Our new model can classify the sentiment of millions of English documents based on the sentiment lexicons of the bESD in a parallel network environment. The proposed model does not depend on any particular domain or on any training stage. This study used many similarity coefficients from the data mining field. The results of this work can be widely used in applications and research on English sentiment classification.
Number of copies: (0) Digital documents: (1)
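As a rough illustration of scoring a term against positive and negative seeds, the sketch below assumes the Johnson similarity measure for binary data, a/(a+b) + a/(a+c), where a counts pages containing both the term and the seed, b pages with the term only, and c pages with the seed only. The abstract above does not state the exact formula, the seed words, or how ties are handled, so the formula choice, the function names, the "neutral" outcome, and the counts are all assumptions; the counts stand in for search-engine hit counts.

```python
def johnson_coefficient(both, only_term, only_seed):
    """Johnson similarity for binary data: a/(a+b) + a/(a+c), in [0, 2]."""
    with_term = both + only_term  # every page that contains the term
    with_seed = both + only_seed  # every page that contains the seed
    if with_term == 0 or with_seed == 0:
        return 0.0  # no evidence at all, treated as zero similarity
    return both / with_term + both / with_seed


def polarity(pos_counts, neg_counts):
    """Compare similarity to a positive seed against similarity to a negative seed."""
    score = johnson_coefficient(*pos_counts) - johnson_coefficient(*neg_counts)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"  # tie handling is an assumption of this sketch


# Hypothetical (both, only_term, only_seed) counts for one term.
pos_counts = (80, 20, 120)  # term with a positive seed
neg_counts = (5, 95, 195)   # term with a negative seed
result = polarity(pos_counts, neg_counts)
```

With real data, `both` would come from an AND query, and the two "only" counts from the individual hit counts minus `both`.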
5
K-Medoids algorithm used for English sentiment classification in a distributed system / Vo Ngoc Phu, Vo Thi Ngoc Tran // Computer Modelling And New Technologies. - . - Vol. 22 (2018), P.20-39. - ISSN:

Latvia : Latvian Transport Development and Education Association, 2018
20 p.
Classification number (DDC): 004
Sentiment classification is significant in everyday life, such as in political activities, commodity production, and commercial activities. Finding a fast, highly accurate way to classify emotion has been a challenge for scientists. In this research, we propose a new model for Big Data sentiment classification in a parallel network environment, a Cloudera system with Hadoop Map (M) and Hadoop Reduce (R). The model uses a K-Medoids algorithm (PAM) with multi-dimensional vectors and the 2,000,000 English documents of our English training data set for English document-level sentiment classification. It can classify the sentiment of millions of English documents in the parallel network environment. We tested the model on our testing data set (1,000,000 English reviews: 500,000 positive and 500,000 negative) and achieved 85.98% accuracy.
Number of copies: (0) Digital documents: (1)