by Arun Mathew Kurian
How to build a Twitter sentiment analyzer in Python using TextBlob
This blog is based on the video Twitter Sentiment Analysis — Learn Python for Data Science #2 by Siraj Raval. In this challenge, we will be building a sentiment analyzer that checks whether tweets about a subject are negative or positive. We will be making use of the Python library textblob for this.
Sentiment analysis, also called opinion mining or emotion AI, is the process of determining whether a piece of writing is positive, negative, or neutral. A common use case for this technology is to discover how people feel about a particular topic. Sentiment analysis is widely applied to reviews and social media for a variety of applications.
Sentiment analysis can be performed in many different ways. Many brands and marketers use keyword-based tools that classify data (i.e. social, news, review, blog, etc.) as positive/negative/neutral.
Automated sentiment tagging is usually achieved through word lists. For example, mentions of 'hate' would be tagged negatively.
There are two approaches to sentiment analysis:
1. Lexicon-based methods
2. Machine learning-based methods
In this problem, we will be using a lexicon-based method.
Lexicon-based methods define a list of positive and negative words, each with a valence (e.g. 'nice': +2, 'good': +1, 'terrible': -1.5, etc.). The algorithm looks through a text to find all known words. It then combines their individual scores by summing or averaging. Some extensions can also check grammatical rules, like negation or sentiment modifiers (such as the word "but", which weights sentiment values in a text differently, to emphasize the end of the text).
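As a rough illustration of the lexicon approach described above, here is a minimal sketch. The lexicon entries and the averaging rule are made up for this example; they are not TextBlob's actual implementation:

```python
# Toy lexicon mapping words to valence scores (illustrative values only).
lexicon = {'nice': 2.0, 'good': 1.0, 'great': 3.0, 'terrible': -1.5, 'bad': -2.0}

def score(text):
    """Average the valences of all known words found in the text."""
    words = text.lower().split()
    hits = [lexicon[w] for w in words if w in lexicon]
    if not hits:
        return 0.0  # no known words: treat as neutral
    return sum(hits) / len(hits)

print(score("the food was good but the service was terrible"))  # (1.0 + -1.5) / 2 = -0.25
```

A real lexicon-based analyzer would add tokenization, negation handling, and modifier rules on top of this core idea.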
Let's build the analyzer now.
Twitter API
Before we start coding, we need to register for the Twitter API at https://apps.twitter.com/. Here we need to register an app to generate the various keys associated with our API. The Twitter API can be used to perform many actions, like create and search.
Now, after creating the app, we can start coding.
We need to install two packages:
pip install tweepy
This package will be used for handling the Twitter API.
pip install textblob
This package will be used for the sentiment analysis.
sentiment_analyzer.py
import tweepy
from textblob import TextBlob
We need to declare the variables to store the various keys associated with the Twitter API.
consumer_key = '[consumer_key]'
consumer_key_secret = '[consumer_key_secret]'
access_token = '[access_token]'
access_token_secret = '[access_token_secret]'
The next step is to create a connection with the Twitter API using tweepy with these tokens.
Tweepy
Tweepy supports OAuth authentication. Authentication is handled by the tweepy.OAuthHandler class.
An OAuthHandler instance must be created by passing the consumer key and secret.
On this auth instance, we will call the function set_access_token, passing the access_token and access_token_secret.
Finally, we create our tweepy API instance by passing this auth instance into the API function of tweepy.
auth = tweepy.OAuthHandler(consumer_key, consumer_key_secret)
auth.set_access_token(access_token, access_token_secret)
api = tweepy.API(auth)
We can now search Twitter for any topic using the search method of the API.
public_tweets = api.search('Dogs')
Now we will be getting all the tweets related to the topic 'Dogs'. We can perform sentiment analysis using the library textblob.
TextBlob
TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
A textblob can be created in the following manner (this is an example, and not part of the original code):
example = TextBlob("Python is a high-level, general-purpose programming language.")
And tokenization can be performed by the following methods:
words: returns the words of the text
usage:
example.words
sentences: returns the sentences of the text
usage:
example.sentences
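To give a feel for what those accessors yield, here is a rough stand-in that approximates them with Python's re module. TextBlob's actual tokenizers are more sophisticated; this sketch only mimics the shape of the output:

```python
import re

text = "Python is a high-level, general-purpose programming language. It is widely used."

# Rough stand-in for TextBlob's .words: pull out word-like tokens, dropping punctuation
words = re.findall(r"[A-Za-z0-9'-]+", text)

# Rough stand-in for TextBlob's .sentences: split after sentence-ending punctuation
sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

print(words[:4])     # ['Python', 'is', 'a', 'high-level']
print(len(sentences))  # 2
```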
Part-of-speech Tagging
Part-of-speech tags can be accessed through the tags property.
example.tags
[('Python', 'NNP'), ('is', 'VBZ'), ('a', 'DT'), ('high-level', 'JJ'), ('general-purpose', 'JJ'), ('programming', 'NN'), ('language', 'NN')]
Sentiment Analysis
The sentiment property returns a namedtuple of the form Sentiment(polarity, subjectivity). The polarity score is a float within the range [-1.0, 1.0]. The subjectivity is a float within the range [0.0, 1.0], where 0.0 is very objective and 1.0 is very subjective.
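The way a polarity score maps to a label can be sketched as a small helper (the function name is ours; it mirrors the thresholding applied to the tweets below):

```python
def label(polarity):
    """Map a polarity score in [-1.0, 1.0] to a sentiment label."""
    if polarity > 0:
        return 'Positive'
    elif polarity < 0:
        return 'Negative'
    return 'Neutral'

print(label(0.8))   # Positive
print(label(-0.3))  # Negative
print(label(0.0))   # Neutral
```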
Now back to the code.
We can iterate over the public_tweets array and check the sentiment of the text of each tweet based on its polarity.
for tweet in public_tweets:
    print(tweet.text)
    analysis = TextBlob(tweet.text)
    print(analysis.sentiment)
    if analysis.sentiment[0] > 0:
        print('Positive')
    elif analysis.sentiment[0] < 0:
        print('Negative')
    else:
        print('Neutral')
Now we run the code using the following:
python sentiment_analyzer.py
and we get the output:
We can see that the sentiment of each tweet is displayed.
This is an example of how sentiment analysis can be done on data from social media like Twitter. I hope you find it useful!
Find the code at https://github.com/amkurian/twitter_sentiment_challenge
Source: https://www.freecodecamp.org/news/how-to-build-a-twitter-sentiments-analyzer-in-python-using-textblob-948e1e8aae14/