Presentation on
SENTIMENT ANALYSIS IN TWITTER
B.Tech (IT), Session: Jul-Dec 2017
Submitted To: Department of Computer Science, AIM & ACT
Submitted By:
Name: KHUSHBOO GUPTA
ID: BTBTI15052
Exam Roll No: 8244
Class S.No: 34

S.NO.  TOPIC NAME
1      Introduction
2      Methodology
3      Why Twitter?
4      Classification of Techniques
5      Naïve Bayes Approach for SA
6      Applications
7      Future Scope
8      Conclusion
9      References

WHY USE SENTIMENT ANALYSIS?
• Promotion: Is this review positive or negative?
• Products: What do people think about the new iPhone?
• Politics: What do people think about this candidate or issue?
• Prediction: Predict election outcomes or market trends from sentiment.

WHAT IS SENTIMENT ANALYSIS?
• Sentiment analysis is the process of determining the feeling behind a piece of text, a conversation, or a social media update.
• It classifies a given tweet by polarity: the opinion is positive, negative, or neutral.
• It is used on Twitter and other social channels.
• 86% of marketers value it highly.

[Figure: the three sentiment classes: negative, positive, neutral]

METHODOLOGY
1. DATA COLLECTION: Sentiments are collected in the form of tweets from Twitter or any other platform.
2. TOKENIZER:
   • Filter the text.
   • Pass the text through a POS (part-of-speech) tagger:
       • nouns and pronouns are removed
       • the role of each word is assessed, i.e. whether it is used as a verb or an adjective
   • Remove slang words.
   • Remove URLs (e.g. friendorfollow.com/twitter/most-tweets/).
   • Remove hashtags (#) and numbers.
   • Replace sequences of repeated characters, e.g. "coooooool" becomes "cool".

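
The text-cleaning rules of the tokenizer step can be sketched in a few lines of Python. This is only a rough illustration under several assumptions: the function name `clean_tweet`, the tiny `SLANG` set, and the regular expressions are made up for this example, only the `#` marker is dropped (the tagged word is kept), and POS tagging is left out because it needs an external tagger.

```python
import re

# Hypothetical slang list, for illustration only.
SLANG = {"lol", "omg", "btw"}

def clean_tweet(text):
    """Apply the cleaning rules from the tokenizer step and return a list of tokens."""
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = text.replace("#", " ")                        # remove the hashtag marker, keep the word
    text = re.sub(r"\d+", " ", text)                     # remove numbers
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)           # runs of 3+ repeated characters -> 2
    # Lowercase, split on whitespace, and drop slang words.
    return [token for token in text.lower().split() if token not in SLANG]

tokens = clean_tweet("OMG this movie is coooooool #fun 2017 http://t.co/abc")
```

Collapsing a run to two characters rather than one is a design choice: it turns "coooooool" into "cool" without damaging ordinary double letters.
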
3. NEGATION: Negation handling is very important in sentiment analysis, because "not" does not always signal a negative opinion: in "not only" it can appear in a positive statement. Treating negation explicitly avoids this confusion.
4. FEATURE EXTRACTION:
   • Percentage of capitalized words
   • No. of negative/positive capitalized words
   • No. of positive/negative hashtags
   • No. of positive/negative emoticons
   • No. of negations
   • No. of special characters, for example: & $ @ %

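
Some of the features listed above can be computed with plain string operations. The sketch below is one possible reading, not the deck's actual implementation: the emoticon and negation sets are small hypothetical samples, "capitalized" is assumed to mean a word written entirely in upper case, and the polarity-specific counts for capitalized words and hashtags are omitted because they would need a sentiment lexicon.

```python
# Hypothetical sample sets, for illustration only.
POSITIVE_EMOTICONS = {":)", ":-)", ":D"}
NEGATIVE_EMOTICONS = {":(", ":-(", ":'("}
NEGATIONS = {"not", "no", "never"}
SPECIAL_CHARS = set("&$@%")

def extract_features(tweet):
    """Return a dictionary of simple surface features for one tweet."""
    tokens = tweet.split()
    words = [t for t in tokens if t.isalpha()]
    # Assumption: a "capitalized" word is fully upper case and longer than one letter.
    capitalized = [w for w in words if w.isupper() and len(w) > 1]
    return {
        "pct_capitalized": 100.0 * len(capitalized) / len(words) if words else 0.0,
        "positive_emoticons": sum(t in POSITIVE_EMOTICONS for t in tokens),
        "negative_emoticons": sum(t in NEGATIVE_EMOTICONS for t in tokens),
        "negations": sum(w.lower() in NEGATIONS for w in words),
        "special_chars": sum(ch in SPECIAL_CHARS for ch in tweet),
    }

features = extract_features("This is NOT good :( @bob 100%")
```
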
5. SENTIMENT CLASSIFICATION AT SENTENCE LEVEL: The same task can be carried out at the sentence level. For a sentence S, perform two sub-tasks:
   • Subjectivity classification: determine whether S is an objective sentence or a subjective sentence, according to whether it expresses an opinion.
   • Classification of subjective sentences: if S is subjective, determine whether it expresses a positive or a negative opinion.

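
The two sub-tasks can be illustrated with a deliberately simple keyword lookup. This is a toy sketch, not a trained classifier: the word lists are tiny illustrative lexicons, the function name `classify_sentence` is hypothetical, and the "mixed" label for ties is an assumption added for this example.

```python
# Tiny illustrative lexicons; a real system would use a much larger dictionary.
POSITIVE_WORDS = {"happy", "good", "great"}
NEGATIVE_WORDS = {"sad", "poor", "bad"}

def classify_sentence(sentence):
    """Step 1: subjectivity check. Step 2: polarity of a subjective sentence."""
    words = [w.strip(".,!?").lower() for w in sentence.split()]
    positive = sum(w in POSITIVE_WORDS for w in words)
    negative = sum(w in NEGATIVE_WORDS for w in words)
    # Step 1: a sentence without any opinion word is treated as objective.
    if positive + negative == 0:
        return "objective"
    # Step 2: compare the counts of positive and negative opinion words.
    if positive > negative:
        return "positive"
    if negative > positive:
        return "negative"
    return "mixed"  # assumption: ties are reported separately

label = classify_sentence("Great cast, good story")
```
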
PREDICTIONS
The extracted features are then fed to the classifier. The model is built to predict the sentiment of new tweets.

WHY TWITTER?
• Twitter is a social networking and microblogging service.
• It allows users to post real-time messages called tweets.
• Messages are restricted to 140 characters in length, so people use acronyms, make spelling mistakes, and use emoticons and other characters that carry a special meaning.
Brief terminology associated with tweets:
• EMOTICONS: express the user's mood.
• TARGET: users use the '@' symbol to refer to other users.
• HASHTAG: users usually use hashtags to mark topics.

CLASSIFICATION OF TECHNIQUES USED FOR SENTIMENT ANALYSIS
Sentiment Analysis
• Machine Learning Approach
    • Supervised Learning
        • Decision Tree Classifiers
        • Linear Classifiers
            • Support Vector Machines
            • Neural Networks
        • Rule-Based Classifiers
        • Probabilistic Classifiers
            • Naïve Bayes Classifiers
            • Bayesian Networks
            • Maximum Entropy
    • Unsupervised Learning
• Lexicon-Based Approach
    • Dictionary-Based Approach
    • Corpus-Based Approach
        • Statistical
        • Semantic

MACHINE LEARNING APPROACH
• Uses ML algorithms and linguistic features.
• Optimises the performance of the system using example data.
• Example: big data frameworks such as Mahout and Pentaho provide libraries and plugins for this purpose.
Two sets of documents are required by the ML approach:
1. Training set: used by the classifier to learn the document characteristics.
2. Testing set: used to validate classifier performance.
Machine learning methods:
• Supervised methods: use a large number of labelled training documents.
• Unsupervised methods: used when labelled training documents are not available.

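
Splitting the labelled documents into the two sets can be sketched as follows. The function name `split_documents`, the 80/20 ratio, and the fixed seed are assumptions chosen for this example; the deck does not state how its data was divided.

```python
import random

def split_documents(documents, test_fraction=0.2, seed=0):
    """Shuffle the labelled documents and return (training_set, testing_set)."""
    shuffled = list(documents)
    random.Random(seed).shuffle(shuffled)  # a fixed seed keeps the split reproducible
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical labelled tweets, for illustration only.
labelled = [("tweet %d" % i, "pos" if i % 2 else "neg") for i in range(10)]
training_set, testing_set = split_documents(labelled)
```

The classifier only ever sees `training_set`; `testing_set` is held back so that the reported performance reflects tweets the model has not learned from.
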
NAÏVE BAYES APPROACH FOR SENTIMENT ANALYSIS
Positive words: HAPPY, GOOD, GREAT
Negative words: SAD, POOR, BAD
Consider five training sentences:
1. I loved the movie.
2. I hated the movie.
3. A great movie, good movie.
4. Poor acting.
5. Great acting, a good movie.

Word counts for each training sentence:
Sentence  I  Loved  The  Movie  Hated  A  Great  Good  Poor  Acting  Class
1         1  1      1    1      0      0  0      0     0     0       Pos (+)
2         1  0      1    1      1      0  0      0     0     0       Neg (-)
3         0  0      0    2      0      1  1      1     0     0       Pos (+)
4         0  0      0    0      0      0  0      0     1     1       Neg (-)
5         0  0      0    1      0      1  1      1     0     1       Pos (+)

Class priors: P(+) = 3/5 = 0.6 and P(-) = 2/5 = 0.4

$$P(\text{word} \mid \text{label}) = \frac{\text{number of times the word occurs in the label} + 1}{\text{number of words in the label} + \text{number of keywords}}$$

The positive sentences contain 14 words in total, the negative sentences contain 6, and there are 10 distinct keywords.

Positive class:
P(I|+)      = (1+1)/(14+10) = 2/24 = 0.0833
P(loved|+)  = (1+1)/(14+10) = 2/24 = 0.0833
P(the|+)    = (1+1)/(14+10) = 2/24 = 0.0833
P(movie|+)  = (4+1)/(14+10) = 5/24 = 0.2083
P(hated|+)  = (0+1)/(14+10) = 1/24 = 0.0417
P(a|+)      = (2+1)/(14+10) = 3/24 = 0.125
P(great|+)  = (2+1)/(14+10) = 3/24 = 0.125
P(good|+)   = (2+1)/(14+10) = 3/24 = 0.125
P(poor|+)   = (0+1)/(14+10) = 1/24 = 0.0417
P(acting|+) = (1+1)/(14+10) = 2/24 = 0.0833

Negative class:
P(I|-)      = (1+1)/(6+10) = 2/16 = 0.125
P(loved|-)  = (0+1)/(6+10) = 1/16 = 0.0625
P(the|-)    = (1+1)/(6+10) = 2/16 = 0.125
P(movie|-)  = (1+1)/(6+10) = 2/16 = 0.125
P(hated|-)  = (1+1)/(6+10) = 2/16 = 0.125
P(poor|-)   = (1+1)/(6+10) = 2/16 = 0.125
P(acting|-) = (1+1)/(6+10) = 2/16 = 0.125
P(a|-)      = (0+1)/(6+10) = 1/16 = 0.0625
P(great|-)  = (0+1)/(6+10) = 1/16 = 0.0625
P(good|-)   = (0+1)/(6+10) = 1/16 = 0.0625

Test sentence: "I hated the poor acting."
P(positive) = P(+) P(I|+) P(hated|+) P(the|+) P(poor|+) P(acting|+)
            = 0.6 * 0.0833 * 0.0417 * 0.0833 * 0.0417 * 0.0833
            = 6.03 × 10^-7
P(negative) = P(-) P(I|-) P(hated|-) P(the|-) P(poor|-) P(acting|-)
            = 0.4 * 0.125 * 0.125 * 0.125 * 0.125 * 0.125
            = 1.2207 × 10^-5
These values are unnormalized scores, so only their comparison matters.
P(negative) > P(positive)
RESULT: There is more negativity in the tweet, so we label this tweet as NEGATIVE.

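
The worked example can be reproduced with a short from-scratch sketch. It uses the five training sentences and the add-one formula from the slides; the function names (`tokenize`, `train`, `score`, `classify`) and the crude punctuation-stripping tokenizer are assumptions made for this illustration, not part of the original deck.

```python
from collections import Counter

# Training sentences and labels copied from the worked example.
TRAIN = [
    ("I loved the movie", "pos"),
    ("I hated the movie", "neg"),
    ("A great movie, good movie", "pos"),
    ("Poor acting", "neg"),
    ("Great acting, a good movie", "pos"),
]

def tokenize(text):
    """Lowercase the text and strip simple punctuation from each word."""
    stripped = (w.strip(".,!?").lower() for w in text.split())
    return [w for w in stripped if w]

def train(examples):
    """Count words per label, sentences per label, and collect the vocabulary."""
    word_counts, doc_counts = {}, Counter()
    for text, label in examples:
        doc_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, doc_counts, vocab

def score(text, label, word_counts, doc_counts, vocab):
    """Unnormalized score: class prior times add-one smoothed word likelihoods."""
    result = doc_counts[label] / sum(doc_counts.values())
    words_in_label = sum(word_counts[label].values())
    for w in tokenize(text):
        result *= (word_counts[label][w] + 1) / (words_in_label + len(vocab))
    return result

def classify(text, model):
    """Return the label with the highest score."""
    _, doc_counts, _ = model
    return max(doc_counts, key=lambda label: score(text, label, *model))

model = train(TRAIN)
prediction = classify("I hated the poor acting.", model)
```

For long tweets, multiplying many small probabilities can underflow; summing logarithms of the same terms is the usual remedy and leaves the comparison unchanged.
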
APPLICATIONS
• Dissatisfaction-oriented online advertising
• Online commerce
    Ex: Brand A or B? Quality X or Y? Feature C or D?
• Voting advice applications
• Clarification of politicians' positions
• Real-world event monitoring
    Ex: Leader A or B?
• Legal matters: "blawgs" (a subset of blogs)
• Policy or government-regulation proposals
• Intelligent transportation systems
    Ex: Is the movement or law proposal advantageous?

FUTURE SCOPE
• Using other models and algorithms.
• Temporal analysis.
• Data pre-processing using more parameters to capture sentiment better.
• Improving accuracy in processing human sentiments.
• Updating the dictionary with new synonyms and antonyms of existing words.
• The web application can be converted to a mobile application.
• Context-aware sentiment analysis may be implemented in future to improve accuracy.

CONCLUSION
• "What others think" is important.
• Sentiment analysis, or opinion mining, is a field of study that analyzes people's sentiments, attitudes, or emotions towards certain entities.
• Supervised algorithms are still an open field for research.
• Naïve Bayes and support vector machines are the most frequently used ML algorithms for solving the sentiment classification problem.
• Micro-blogs, blogs, and forums, as well as news sources, are widely used as data.
• Hence we conclude that Twitter can be the best platform for sentiment analysis.

REFERENCES
• https://journalofbigdata.springeropen.com/articles/10.1186/s40537-015-0015-2
• http://ijiet.com/wp-content/uploads/2016/04/37.pdf
• https://github.com/mayank93/Twitter-Sentiment-Analysis
• http://www.pythonforbeginners.com/systemsprogramming/using-the-csv-module-in-python/
• http://www.academia.edu/6723240/Mining_Opinion_Features_in_Customer_Reviews
• http://content26.com/blog/bing-liu-the-science-of-detecting-fake-reviews/
• http://www.scienceforseo.com
• http://help.sentiment140.com/for-students
• Ronen Feldman, "Techniques and Applications for Sentiment Analysis", Communications of the ACM, April 2013, vol. 56, no. 4.
• Research paper: "Utilization of Project Sentiment Analysis as a Project Performance Predictor" by Bob Prieto
• Research paper: "Sentiment Analysis: Measuring Opinions" by Chetashri Bhadane, Hardi Dalal and Heenal Doshi
• Research paper: "Overview and Future Opportunities of Sentiment Analysis Approaches for Big Data" by Nurfadhlina Mohd Sharef, Harnani Mat Zin and Samaneh Nadali