Create Encoder and Decoder
Assign each word in the vocabulary a unique number, and build the reverse mapping from each number back to its word.
num = 1              # numbering starts at 1
word_to_num = {}     # encoder: word -> number
num_to_word = {}     # decoder: number -> word
for word in vocab:
    word_to_num[word] = num
    num_to_word[num] = word
    num += 1
print(num_to_word)
{1: 'north', 2: 'southwest', 3: 'democracy', 4: 'sea', 5: 'vicinity', 6: 'populous', 7: 'thailand', 8: 'share', 9: 'bounded', 10: 'india', 11: 'maldives', 12: 'area', 13: 'land', 14: 'bengal', 15: 'indian', 16: 'gaṇarājya', 17: 'southeast', 18: 'south', 19: 'world', 20: 'second-most', 21: 'pakistan', 22: 'bay', 23: 'arabian', 24: 'ocean', 25: 'border', 26: 'maritime', 27: 'nepal', 28: 'myanmar', 29: 'officially', 30: 'hindi', 31: 'seventh-largest', 32: 'china', 33: 'indonesia', 34: 'asia', 35: 'andaman', 36: 'bhārat', 37: 'nicobar', 38: 'borders', 39: 'west', 40: 'bhutan', 41: 'shares', 42: 'country', 43: 'bangladesh', 44: 'islands', 45: 'east', 46: 'lanka', 47: 'sri', 48: 'republic'}
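The same two dictionaries can also be built with enumerate. This is a minimal sketch, assuming a small made-up vocabulary (the vocab below is hypothetical, not the one from this post). It sorts the words first so the numbering comes out the same on every run:

```python
# Hypothetical vocabulary for illustration; the post's vocab is built earlier.
vocab = {"india", "country", "south", "asia"}

# Sort so the numbering is reproducible (set iteration order can vary between runs).
ordered_vocab = sorted(vocab)

# Number the words starting from 1, like the loop above.
word_to_num = {word: num for num, word in enumerate(ordered_vocab, start=1)}
# Invert the mapping to go from number back to word.
num_to_word = {num: word for word, num in word_to_num.items()}

print(word_to_num)
print(num_to_word)
```

Sorting is a design choice, not something the original loop does. It trades the arbitrary order of the set for a stable one, which makes encoded output comparable across runs.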
Encoding
Now we encode the corpus. Each sentence becomes a list of numbers, where each number represents one word.
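from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize

stop_words = set(stopwords.words('english'))  # build the stop word set once
data = []
for sent in sent_tokenize(corpus):
    temp = []
    for word in word_tokenize(sent):
        # keep words that are not stop words and have at least 2 characters
        if word.lower() not in stop_words and len(word) >= 2:
            temp.append(word_to_num[word.lower()])
    print(temp)
    data.append(temp)
    print()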
[20, 38, 31, 20, 17, 22, 33, 1, 15, 39]

[18, 1, 45, 46, 9, 1, 9, 25, 7]

[4, 10, 12, 15, 14, 47, 13, 21, 5, 6, 36, 27, 28, 24, 23, 8, 26, 2, 40, 48, 30, 32]

[10, 12, 20, 16, 37, 41, 3, 42, 35, 29, 19, 11, 43, 44, 30, 34]
Note: these numbers do not match the mapping printed earlier (for example, 20 stands for 'india' here but for 'second-most' there). The numbering depends on the order in which vocab is iterated, which can change between runs if vocab is a set, so the two outputs most likely come from different runs.
Explanation:
- Tokenize each sentence into words.
- Remove stop words and words shorter than two characters.
- Look up the number for each remaining word and append it to temp.
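The three steps above can be sketched without NLTK. This is only an illustration under assumptions: the regex tokenizers, the tiny STOP_WORDS set, the sample text, and the function names are all made up here and stand in for sent_tokenize, word_tokenize, and stopwords.words('english').

```python
import re

# Hypothetical stand-ins for illustration (not NLTK's real stop word list).
STOP_WORDS = {"is", "a", "in", "the", "it", "of"}
text = "India is a country in South Asia. It is the seventh-largest country."

def tokenize_sentences(text):
    # Naive split on whitespace that follows '.', '!' or '?'.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def tokenize_words(sentence):
    # A word is a run of word characters, optionally joined by inner hyphens.
    return re.findall(r"\w+(?:-\w+)*", sentence)

def keep(word):
    # Same filter as in the post: not a stop word, at least 2 characters.
    return word.lower() not in STOP_WORDS and len(word) >= 2

# Build the vocabulary and the word -> number mapping (numbers start at 1).
vocab = sorted({w.lower() for s in tokenize_sentences(text)
                for w in tokenize_words(s) if keep(w)})
word_to_num = {word: num for num, word in enumerate(vocab, start=1)}

def encode(text):
    data = []
    for sent in tokenize_sentences(text):
        data.append([word_to_num[w.lower()] for w in tokenize_words(sent) if keep(w)])
    return data

data = encode(text)
print(data)  # [[3, 2, 5, 1], [4, 2]]
```

The regex tokenizers are much cruder than NLTK's (they will mis-split abbreviations, for instance), so treat them as placeholders that keep the example self-contained.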
Decoding
To decode, look up each number in num_to_word and print the words of each sentence on one line.
for sent in data:
    for word in sent:
        print(num_to_word[word], end=' ')
    print()
india officially republic india hindi bhārat gaṇarājya country south asia
seventh-largest country area second-most populous country populous democracy world
bounded indian ocean south arabian sea southwest bay bengal southeast shares land borders pakistan west china nepal bhutan north bangladesh myanmar east
indian ocean india vicinity sri lanka maldives andaman nicobar islands share maritime border thailand myanmar indonesia
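The decoding loop above can also be wrapped in a function that returns strings instead of printing them, which makes a round-trip check easy. A minimal sketch, assuming a small made-up mapping and encoded data (the values below and the decode function name are hypothetical, not taken from this post):

```python
# Hypothetical mapping and encoded sentences for illustration.
num_to_word = {1: "asia", 2: "country", 3: "india", 4: "south"}
data = [[3, 2, 4, 1], [2, 3]]

def decode(data, num_to_word):
    # Turn each list of numbers back into a space-separated string of words.
    return [" ".join(num_to_word[num] for num in sent) for sent in data]

decoded = decode(data, num_to_word)
for line in decoded:
    print(line)

# Round trip: re-encoding the decoded words gives back the original numbers.
word_to_num = {word: num for num, word in num_to_word.items()}
reencoded = [[word_to_num[w] for w in line.split()] for line in decoded]
print(reencoded == data)
```

Decoding recovers only the lowercased words that survived filtering. The stop words and punctuation dropped during encoding are gone, as the output above shows.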
Hey guys, if you are reading this, I would like to ask for your feedback on what needs to change. It can be anything: if a topic is not explained well or you don't understand something, please let me know.