Python NLTK

90% of the world's data was generated over the last two years

A common Internet user creates visual and textual content: Instagram, Flickr, VSCOcam, Facebook, Tumblr, Blogger, Twitter, emails, customer reviews

The world is full of unstructured, text-rich data: everything from emails to customer tweets. The information buried in all that text holds the potential to deliver valuable business insights.

Text analytics is the practice of using technology to gather, store and mine textual information for hidden signals that can be used to inform smarter business decisions

An explosion of unstructured data

Many types of organizations are experiencing explosive growth in their unstructured enterprise data. At the same time, they have access to external sources of data such as social media, blogs, and mobile data.

Until now, much of this information passed through the organization virtually unanalyzed. Today, new tools for handling large amounts of complex data make it easier to squeeze value from such unlikely sources.

sentiment analysis, spam filtering, text categorization, topic detection, keyword frequency, plagiarism detection, document similarity, phrase extraction

Natural Language Toolkit (NLTK): a leading platform for building Python programs to work with human language data

sentence and word tokenization, text classification, corpora, parsing, clustering, part-of-speech tagging, text stemming, and much more
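As a rough illustration of two of the features listed above, word tokenization and stemming, here is a minimal sketch. The sample sentence is made up for this example. It uses `wordpunct_tokenize` and `PorterStemmer` because neither needs extra downloaded data; `nltk.word_tokenize` and `nltk.sent_tokenize` are the more common choices but require the Punkt tokenizer data to be installed first.

```python
from nltk.stem import PorterStemmer
from nltk.tokenize import wordpunct_tokenize

# Hypothetical sample text, not from the original slides.
text = "NLTK makes text analysis easier."

# Split the text into word and punctuation tokens with a simple regex tokenizer.
tokens = wordpunct_tokenize(text)

# Reduce each token to its stem with the Porter algorithm.
stemmer = PorterStemmer()
stems = [stemmer.stem(token) for token in tokens]

print(tokens)  # ['NLTK', 'makes', 'text', 'analysis', 'easier', '.']
print(stems)
```

Stems are not always dictionary words; the stemmer only strips suffixes by rule, which is usually good enough for counting or matching terms.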

Part-of-speech tagging: tag explanations
CC: Coordinating conjunction
CD: Cardinal number
DT: Determiner
EX: Existential "there"
FW: Foreign word
IN: Preposition or subordinating conjunction
JJ: Adjective
JJR: Adjective, comparative
JJS: Adjective, superlative
LS: List item marker
MD: Modal
NN: Noun, singular or mass
NNS: Noun, plural
NNP: Proper noun, singular
nltk.help.upenn_tagset() lists all tags
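To show how such tags attach to tokens, here is a small sketch using NLTK's `RegexpTagger`. The patterns are toy rules invented for this example and cover only a handful of the tags; in practice `nltk.pos_tag` is the usual entry point, but it needs a downloaded tagger model, so it is not used here.

```python
from nltk.tag import RegexpTagger

# Toy rules made up for illustration; they are tried in order, first match wins.
patterns = [
    (r"^(the|a|an)$", "DT"),   # determiners
    (r"^(and|or|but)$", "CC"), # coordinating conjunctions
    (r"^\d+$", "CD"),          # cardinal numbers
    (r".*est$", "JJS"),        # superlative adjectives
    (r".*s$", "NNS"),          # plural nouns
    (r".*", "NN"),             # fallback: singular noun
]

tagger = RegexpTagger(patterns)

tokens = ["the", "3", "cats", "and", "dog"]
tagged = tagger.tag(tokens)

print(tagged)
# [('the', 'DT'), ('3', 'CD'), ('cats', 'NNS'), ('and', 'CC'), ('dog', 'NN')]
```

Rule-based tagging like this is far less accurate than a trained tagger; it is shown only because it makes the token-to-tag mapping easy to follow.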

Text classification algorithms in NLTK: Naive Bayes, Maximum Entropy, Decision Tree
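A minimal sketch of the first of these, Naive Bayes, using `nltk.NaiveBayesClassifier`. The four training sentences, the labels, and the `word_features` helper are hypothetical and far too small for real use; they only show the shape of the input, which is a list of (feature dictionary, label) pairs.

```python
from nltk import NaiveBayesClassifier

def word_features(text):
    """Hypothetical helper: bag-of-words features, one boolean per word."""
    return {word: True for word in text.lower().split()}

# Tiny made-up training set of (text, label) pairs.
training_data = [
    ("great product love it", "pos"),
    ("excellent service great value", "pos"),
    ("terrible product hate it", "neg"),
    ("awful service terrible value", "neg"),
]

train_set = [(word_features(text), label) for text, label in training_data]
classifier = NaiveBayesClassifier.train(train_set)

print(classifier.classify(word_features("great service")))   # pos
print(classifier.classify(word_features("terrible value")))  # neg
```

The same feature sets can be passed to `nltk.DecisionTreeClassifier.train`; the Maximum Entropy classifier takes additional training options and is usually slower to train.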

Sentiment analysis https://github.com/pumpurs/SentimentWordsLV/
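One common approach to sentiment analysis is lexicon-based scoring: count the positive and negative words in a text. The sketch below assumes that approach. The word sets and the `sentiment_score` function are hypothetical placeholders and are not taken from the linked repository.

```python
import re

# Hypothetical English word lists, for illustration only.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "terrible", "hate", "awful"}

def sentiment_score(text):
    """Return (# positive words) - (# negative words) found in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum((word in POSITIVE) - (word in NEGATIVE) for word in words)

print(sentiment_score("I love this great phone"))       # 2
print(sentiment_score("terrible battery, bad screen"))  # -2
print(sentiment_score("it arrived on Tuesday"))         # 0
```

A score above zero suggests positive sentiment and below zero negative; this simple count ignores negation ("not good") and word intensity.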

Document similarity detection: TF-IDF stands for term frequency-inverse document frequency. The TF-IDF weight is often used in information retrieval and text mining; it is a statistical measure of how important a word is to a document in a collection or corpus.
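The weighting described above can be sketched in plain Python. This version assumes one common variant of the formula: term frequency is the word count divided by document length, and inverse document frequency is log(N / number of documents containing the word). Documents are then compared by cosine similarity. The function names and sample documents are made up for this example.

```python
import math
from collections import Counter

def tf_idf_vectors(docs):
    """Return one {term: tf-idf weight} dict per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)

    # Document frequency: in how many documents does each term appear?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

docs = [
    "the cat sat on the mat",
    "the cat lay on the rug",
    "stock markets fell sharply today",
]
vectors = tf_idf_vectors(docs)

sim_related = cosine_similarity(vectors[0], vectors[1])
sim_unrelated = cosine_similarity(vectors[0], vectors[2])
print(sim_related > sim_unrelated)  # True
print(sim_unrelated)                # 0.0
```

Words that appear in every document get a weight of zero under this formula, which is why very common words contribute nothing to the similarity score.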

“Market and product research”, “Social CMS”, “Customer profiling / analytics”. 1.97 billion social network users; 70% of marketers used Facebook to gain; 6.7 million people blog on blogging sites

pumpurs.alberts@gmail.com Big Data, Startups, Text Analysis, Internet of Things, Web Development
