Python Forum
Saving a download of stopwords (nltk)
#1
I’ve got a basic Django project. One feature I am working on counts the most commonly occurring words in a .txt file, such as a large public domain book. I’ve used the Python Natural Language Toolkit (NLTK) to filter out “stopwords” (in SEO language, that means redundant words such as ‘the’, ‘you’, etc.).

Anyways, I’m getting this traceback on my Django server:

Quote: Resource stopwords not found.
Please use the NLTK Downloader to obtain the resource:

>>> import nltk
>>> nltk.download('stopwords')

For more information see: https://www.nltk.org/data.html

After Googling around, I discovered that the cause is that I need to download the library of stopwords. To resolve the issue, I simply open a Python REPL on my remote server and run these two straightforward lines:

>>> import nltk
>>> nltk.download('stopwords')
That resolves the issue, but only temporarily. As soon as the REPL session is terminated, the error returns. I figure I need to use the built-in .save class method, but I am not sure which attribute to pair it with.

Here are the relevant lines from my utils.py file:

import re
from collections import Counter
from nltk.corpus import stopwords  # library used to filter out common english words to produce more meaningful output
from blogs.models import Posts


def top_word_counts(text):
    stoplist = stopwords.words('english')
    stoplist.extend(["said", "gutenberg", "could", "would", "shall", "unto", "thou", "thy", "ye", "thee",
                     "upon", "hath", "came", "come", "things", "also", "saying", "say"])
    # Added the mechanism to extend the list to include integers between 0 and 1999
    extendinteger = list(range(0, 2000))
    # Using map() it will convert the given type with one by iterations
    # of the array and convert to the corresponding type
    stoplist.extend(list(map(str, extendinteger)))
    clean = []
    for word in re.split(r"\W+", text):
        if word not in stoplist:
            clean.append(word)
    top_10 = Counter(clean).most_common(10)
    return top_10
I tried adding import nltk to the top of this script and adding nltk.download('stopwords') to different locations within the top_word_counts function but that didn’t work.

So my question is: How do I invoke nltk.download('stopwords') so that it automatically runs once without having to manually load it in the Python REPL?

Here is the utility file in full in my GitHub repo.

I decided to post to the General Coding Help forum instead of web development because the answer to my question is more to do with Python in general rather than being specific to Django.
#2
It downloads to a data directory on disk that persists across sessions, so it's a one-time operation.
E.g. on Windows:
>>> import nltk
>>> nltk.download('stopwords')
[nltk_data] Downloading package stopwords to
[nltk_data]     C:\Users\Tom\AppData\Roaming\nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.
So now the import will work every time.
>>> from nltk.corpus import stopwords
>>> stoplist = stopwords.words('english')
>>> stoplist[:5]
['i', 'me', 'my', 'myself', 'we']
When using Django, it is most common (and highly advisable) to run in a virtual environment.
You can then point the download to that folder, so NLTK gets the data from the environment folder and not the system-wide one.
>>> import nltk
>>> nltk.download('stopwords', download_dir='E:/div_code/django_env/nltk_data')
[nltk_data] Downloading package stopwords to
[nltk_data]     E:/div_code/django_env/nltk_data...
[nltk_data]   Unzipping corpora\stopwords.zip.
True
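If you would rather have this happen automatically than as a manual REPL step, one option is a small check that runs at startup. This is only a sketch under some assumptions: ensure_stopwords and the directory you pass to it are hypothetical names, not part of NLTK or the original project. It relies on nltk.data.find raising LookupError when a resource is missing, and on nltk.data.path being the list of folders NLTK searches.

```python
import os


def ensure_stopwords(download_dir):
    """Download the NLTK stopwords corpus into download_dir unless NLTK can already find it.

    ensure_stopwords is a hypothetical helper name used for illustration.
    """
    # Imported here so the module still loads if NLTK is installed later.
    import nltk

    # Make NLTK search the chosen folder in addition to its default locations.
    if download_dir not in nltk.data.path:
        nltk.data.path.append(download_dir)

    try:
        # find() raises LookupError when the resource is not on any search path.
        nltk.data.find("corpora/stopwords")
    except LookupError:
        # Only reached the first time, or after the data folder was removed.
        os.makedirs(download_dir, exist_ok=True)
        nltk.download("stopwords", download_dir=download_dir, quiet=True)
```

You could call it once before the first use of stopwords.words('english'), for example ensure_stopwords('E:/div_code/django_env/nltk_data') near the top of utils.py, or run it as a deployment step. The folder has to be writable by the user the server runs as.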