Think of spaCy as a more intelligent library than NLTK. Let's start by installing the spaCy library.
You can use Google Colab to avoid the hassle of installing it locally.
Otherwise, run this in the terminal of your code editor:
pip install spacy
import spacy
nlp = spacy.load('en_core_web_lg')
This loads a large pre-trained English language model and makes it available for natural language processing tasks. The model, en_core_web_lg, provides tokenization, part-of-speech tagging, dependency parsing, and named entity recognition. If the model is not installed yet, spacy.load raises an error, so download it first and then load it:
python -m spacy download en_core_web_lg
import spacy
nlp = spacy.load('en_core_web_lg')
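If you are not sure whether the download worked, a quick check before loading can save you a confusing traceback. This is only a sketch, assuming spaCy v3: it uses spacy.util.is_package to see whether the model is installed, and the fallback to a blank English pipeline (tokenizer only, no tagger or parser) is just an illustrative choice, not something the steps above require.

```python
import spacy
from spacy.util import is_package

MODEL_NAME = "en_core_web_lg"

# is_package() reports whether the model is installed as a Python package.
model_installed = is_package(MODEL_NAME)

if model_installed:
    nlp = spacy.load(MODEL_NAME)
else:
    # Assumption for this sketch: fall back to a blank English pipeline,
    # which can tokenize but has no tagger, parser, or NER.
    print(f"{MODEL_NAME} not found. Run: python -m spacy download {MODEL_NAME}")
    nlp = spacy.blank("en")

# Lists the pipeline components that are actually available.
print(nlp.pipe_names)
```

With the blank pipeline the printed list is empty, which is a handy reminder that POS tags and sentence boundaries will not be available until the real model is installed.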
Tokenisation
nltk.tokenize
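import nltk
from nltk.tokenize import word_tokenize
# word_tokenize needs the Punkt tokenizer data (for example nltk.download('punkt');
# newer NLTK releases may ask for 'punkt_tab' instead).
txt = "Hello How it going U.S.A."
print(word_tokenize(txt))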
Output: ['Hello', 'How', 'it', 'going', 'U.S.A', '.']
NLTK's word_tokenize splits the final full stop '.' off 'U.S.A.' and returns it as a separate token.
spaCy tokenizer
import spacy
nlp = spacy.load('en_core_web_lg')
text = nlp("Hello How it going U.S.A.")
for token in text:
    print(token.text)
Output:
Hello
How
it
going
U.S.A.
spaCy keeps 'U.S.A.' as a single token and does not split off the full stop.
Here is a question for you.
text = nlp("I can't came there")
for token in text:
    print(token.text)
Output:
I
ca
n't
came
there
Why does spaCy split "can't" into two tokens, "ca" and "n't", and how can you handle this?
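One way to look at it: spaCy's English tokenizer splits contractions on purpose, so the verb part ("ca") and the negation ("n't") can each get their own tag. The sketch below assumes spaCy v3 and uses a blank English pipeline, so no model download is needed. It prints token.norm_, which should show the normalized form of each piece, and then uses doc.retokenize() to merge the two pieces back together in case you really want a single token.

```python
import spacy

# A blank pipeline only contains the tokenizer, which is all we need here.
nlp = spacy.blank("en")
doc = nlp("I can't came there")

tokens = [token.text for token in doc]
print(tokens)

# norm_ holds the normalized form the tokenizer assigns to each piece.
norms = [token.norm_ for token in doc]
print(norms)

# Collect each "n't" together with the token before it, then merge them.
spans = [doc[t.i - 1 : t.i + 1] for t in doc if t.text == "n't" and t.i > 0]
with doc.retokenize() as retokenizer:
    for span in spans:
        retokenizer.merge(span)

merged = [token.text for token in doc]
print(merged)
```

In most cases you can leave the split alone. Merging is mainly useful when a later step expects whole words, for example when counting word frequencies.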
Part of Speech (POS).
import spacy
nlp = spacy.load('en_core_web_lg')
text = nlp("Hello How it going U.S.A. we are 83 block")
for token in text:
    print(token.text, token.pos)
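Output:
Hello 91
How 98
it 95
going 100
U.S.A. 96
we 95
are 87
83 93
block 92
These numbers are the integer IDs spaCy assigns to each part of speech. To get the readable label instead, use token.pos_ (with a trailing underscore):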
import spacy
nlp = spacy.load('en_core_web_lg')
text = nlp("Hello How it going U.S.A. we are 83 block")
for token in text:
    print(token.text, token.pos_)
Output:
Hello INTJ
How SCONJ
it PRON
going VERB
U.S.A. PROPN
we PRON
are AUX
83 NUM
block NOUN
Now you can see that Hello is an interjection (INTJ), it is a pronoun (PRON), and so on.
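If a label such as SCONJ or PROPN is not obvious, spacy.explain can spell it out for you. The small sketch below just feeds it the labels printed above, so it runs without loading a model.

```python
import spacy

# The coarse-grained POS labels that appeared in the output above.
labels = ["INTJ", "SCONJ", "PRON", "VERB", "PROPN", "AUX", "NUM", "NOUN"]

# spacy.explain() looks each label up in spaCy's built-in glossary.
descriptions = {label: spacy.explain(label) for label in labels}

for label, description in descriptions.items():
    print(f"{label:6} {description}")
```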
Sentence Tokenisation
s = nlp("This is the first sentence. I gave given fullstop please check. Let's study now")
for sentence in s.sents:
    print(sentence)
Output:
This is the first sentence.
I gave given fullstop please check.
Let's study now
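The sentence boundaries above come from the pipeline of the loaded model. If sentence splitting is all you need, a lighter option is spaCy's rule-based sentencizer component, which splits on punctuation. A minimal sketch, assuming spaCy v3 (where nlp.add_pipe takes the component name as a string):

```python
import spacy

# Blank English pipeline plus the rule-based sentence splitter.
nlp = spacy.blank("en")
nlp.add_pipe("sentencizer")

doc = nlp("This is the first sentence. I gave given fullstop please check. Let's study now")

# doc.sents yields one Span per detected sentence.
sentences = [sent.text for sent in doc.sents]
for sentence in sentences:
    print(sentence)
```

Because the sentencizer only looks at punctuation, it can give different boundaries than the full model on trickier text, so treat it as a quick alternative rather than a replacement.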