Python Forum
Tkinter Web Scraping w/Multithreading Question....
#1
I have a simple Tkinter program that I'm using to scrape the ages of various people from Wikipedia, just for web scraping practice.

I can scrape each person's age one by one on a single thread, but I'm trying to use one thread per person so that all the ages are scraped at the same time and the program runs much faster.

In other words, the program currently scrapes only one person at a time and adds only one row at a time to the Treeview. I'd like a separate thread to work on each person concurrently, so that the Treeview fills in everyone's age in one shot.
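A minimal sketch of the "one thread per person" idea, outside of Tkinter. `fetch_age` and `scrape_all` are hypothetical names: `fetch_age` only sleeps to simulate a slow request instead of calling `requests.get`, so the timing is easy to see. Each thread writes only to its own slot in a list, so the rows come back together, in the original order, once every thread has been joined.

```python
import threading
import time


def fetch_age(name):
    """Hypothetical stand-in for the real scraper; sleeps to simulate network I/O."""
    time.sleep(0.1)
    return len(name)  # placeholder value, not a real age


def scrape_all(names, fetch):
    """Start one thread per name and return [(name, result), ...] in input order."""
    results = [None] * len(names)  # each thread writes only to its own slot

    def worker(index, name):
        results[index] = (name, fetch(name))

    threads = [threading.Thread(target=worker, args=(i, n))
               for i, n in enumerate(names)]
    for t in threads:
        t.start()  # every "request" is now in flight at the same time
    for t in threads:
        t.join()  # block until all of them have finished
    return results


if __name__ == '__main__':
    names = ['Adam Levine', 'Ken Shamrock', 'Oleksandr Usyk']
    start = time.perf_counter()
    print(scrape_all(names, fetch_age))
    # elapsed time is close to one 0.1 s "request", not three of them
    print(f'elapsed: {time.perf_counter() - start:.2f}s')
```

Because `join()` blocks, `scrape_all` itself would still have to run in a background thread (as `scrape_wiki` does now), and the finished rows would still need to be handed back to the tkinter thread before being inserted into the Treeview.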

Here's the code that I've come up with so far:

from tkinter import Tk, Button, Listbox
from tkinter import ttk
import threading
import requests
import re  # imports RegEx


class MainWindow(Tk):
    def __init__(self):
        super().__init__()
        self.option_add("*Font", "poppins, 11 bold")

        self.lb1 = Listbox(self, width=22, cursor='hand2')
        self.lb1.pack(side='left', fill='y', padx=20, pady=20)

        # create list of names
        names = ['Adam Levine', 'Arnold Schwarzenegger', 'Artur Beterbiev',
                 'Chris Hemsworth', 'Dan Henderson', 'Dustin Poirier',
                 'Fedor Emelianenko', 'Gennady Golovkin', 'Igor Vovchanchyn',
                 'Ken Shamrock', 'Mirko Cro Cop', 'Oleksandr Usyk',
                 'Ronnie Coleman', 'Vasiliy Lomachenko']

        # populate listbox with names
        for name in names:
            self.lb1.insert('end', name)

        self.tv1 = ttk.Treeview(self, show='tree headings', cursor='hand2')
        columns = ('NAME', 'AGE')
        self.tv1.config(columns=columns)

        style = ttk.Style()
        style.configure("Treeview", highlightthickness=2, bd=0, rowheight=26,
                        font=('Poppins', 11))  # Modify the font of the body
        style.configure("Treeview.Heading",
                        font=('Poppins', 12, 'bold'))  # Modify the font of the headings

        # configure headers
        self.tv1.column('#0', width=0, stretch=0)
        self.tv1.column('NAME', width=190)
        self.tv1.column('AGE', width=80, stretch=0)

        # define headings
        self.tv1.heading('NAME', text='NAME', anchor='w')
        self.tv1.heading('AGE', text='AGE', anchor='w')
        self.tv1.pack(fill='both', expand=1, padx=(0, 20), pady=20)

        # create start button
        self.b1 = Button(self, text='START', bg='green', fg='white',
                         cursor='hand2', command=self.start)
        self.b1.pack(pady=(0, 20))

    # scrape data from Wikipedia.org
    def start(self):
        for item in self.tv1.get_children():
            self.tv1.delete(item)
        t1 = threading.Thread(target=self.scrape_wiki, daemon=True)
        t1.start()

    def scrape_wiki(self):
        for i in range(self.lb1.size()):
            # select the name
            self.name = self.lb1.get(i).replace(' ', '_')
            # create a simple dictionary to hold the user agent inside of the headers
            headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0'}
            # the required first parameter of the 'get' method is the 'url':
            r = requests.get('https://en.wikipedia.org/wiki/' + self.name, headers=headers)
            # regex setup (raw string so that \d is passed to the regex engine as-is)
            regex = re.search(r'(age \d+)', r.text)
            age = regex.group(0).replace('age ', '').replace(')', '')
            # Populate Treeview with row data
            self.name = self.name.replace('_', ' ')
            self.tv1.insert(parent='', index='end', iid=i, values=(self.name, age))


if __name__ == '__main__':
    app = MainWindow()
    # app.iconbitmap('imgs/logo-icon.ico')
    app.title('Main Window')
    app.configure(bg='#333333')

    # center the Main Window:
    w = 600  # Width
    h = 520  # Height
    screen_width = app.winfo_screenwidth()  # Width of the screen
    screen_height = app.winfo_screenheight()  # Height of the screen

    # Calculate Starting X and Y coordinates for Window
    x = (screen_width / 2) - (w / 2)
    y = (screen_height / 2) - (h / 2)
    app.geometry(f'{w}x{h}+{int(x)}+{int(y)}')  # apply the computed size and position

    app.mainloop()
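The regex step in `scrape_wiki` can be pulled out into a small function and tested on its own. This sketch keeps the same `age \d+` idea but (as an assumption about what is wanted) captures only the digits, uses a raw string so Python doesn't warn about the `\d` escape, and returns `None` instead of raising `AttributeError` when a page has no match. `extract_age` is a hypothetical helper name. Keeping the name and age in local variables rather than on `self` also matters once there is one thread per person, because a shared `self.name` would be overwritten by the other threads.

```python
import re

# Same idea as the regex in scrape_wiki, but written as a raw string (so the
# \d escape reaches the regex engine untouched) and capturing only the digits.
AGE_PATTERN = re.compile(r'age (\d+)')


def extract_age(page_text):
    """Return the first number following 'age ' in page_text, or None if absent."""
    match = AGE_PATTERN.search(page_text)
    return match.group(1) if match else None


if __name__ == '__main__':
    print(extract_age('(1979-03-18) March 18, 1979 (age 43)'))
```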
So, how exactly do I create multiple threads to handle each person in the list and return them concurrently/simultaneously instead of row-by-row, one at a time?
I'd appreciate any support.
#2
Take a look at ProcessPoolExecutor.

https://docs.python.org/3/library/concur...olExecutor
#3
Hi Dean, and thank you for your reply.

So, I've already tried ThreadPoolExecutor from the concurrent.futures module, but it was not compatible with updating Tkinter's GUI. Is ProcessPoolExecutor going to be any different?
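One common way to combine ThreadPoolExecutor with Tkinter is to never touch widgets from the worker threads: workers only put results on a thread-safe `queue.Queue`, and the tkinter thread drains that queue on a timer with `after()`. The sketch below assumes that pattern; `start_scrape`, `drain` and `_post` are hypothetical names, and the `fetch` function passed in (here just `len`) stands in for the real scraping code.

```python
import functools
import queue
from concurrent.futures import ThreadPoolExecutor


def _post(future, name, results):
    """Done-callback: forward (name, result) to the queue; never touches widgets."""
    try:
        age = future.result()
    except Exception as exc:  # report failures instead of dropping them silently
        age = f'error: {exc}'
    results.put((name, age))


def start_scrape(names, fetch, results, executor):
    """Submit one fetch(name) task per name without waiting for any of them."""
    for name in names:
        future = executor.submit(fetch, name)
        future.add_done_callback(functools.partial(_post, name=name, results=results))


def drain(results, on_row):
    """Hand every queued (name, age) to on_row; meant to run in the tkinter thread."""
    count = 0
    while True:
        try:
            name, age = results.get_nowait()
        except queue.Empty:
            return count
        on_row(name, age)
        count += 1


if __name__ == '__main__':
    import tkinter as tk
    from tkinter import ttk

    root = tk.Tk()
    tree = ttk.Treeview(root, columns=('NAME', 'AGE'), show='headings')
    tree.heading('NAME', text='NAME')
    tree.heading('AGE', text='AGE')
    tree.pack(fill='both', expand=True)

    results = queue.Queue()
    executor = ThreadPoolExecutor(max_workers=8)

    def poll():
        drain(results, lambda name, age: tree.insert('', 'end', values=(name, age)))
        root.after(100, poll)  # check the queue again in 100 ms

    # len() is a placeholder for the real scraping function
    start_scrape(['Adam Levine', 'Ken Shamrock'], len, results, executor)
    poll()
    root.mainloop()
    executor.shutdown(wait=False)
```

Tkinter is only imported under the main guard, so `start_scrape` and `drain` can be exercised without opening a window. A Future's done-callback may run in the worker thread that finished the task, which is why `_post` only touches the queue.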

I'm currently away from home, but I'll give it a shot once I'm back, if you've tested this class and concluded that it does work with Tkinter's GUI.

Anyhow, thanks again in the meantime. 👍👍


(Dec-15-2022, 06:52 PM)deanhystad Wrote: Take a look at ProcessPoolExecutor.

https://docs.python.org/3/library/concur...olExecutor
#4
After my post I gave ProcessPoolExecutor a better look. I thought you could launch all the processes and use futures to get the results when available. Unfortunately, it looks like the executor blocks until all processes are completed, or at least started.
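For reference, the concurrent.futures documentation describes `Executor.submit()` as scheduling the call and returning a `Future`, so the waiting usually comes from calling `result()`, `wait()` or `as_completed()`, or from leaving a `with` block, which shuts the executor down with `wait=True`. Below is a sketch of a non-blocking alternative, assuming ThreadPoolExecutor (ProcessPoolExecutor exposes the same `submit`/`Future` interface): submit everything up front, then poll `Future.done()`, which a tkinter `after()` callback could do without freezing the window. `slow_square`, `submit_all` and `collect_done` are hypothetical names.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    """Hypothetical stand-in for a slow request."""
    time.sleep(0.2)
    return x * x


def submit_all(executor, func, items):
    """Schedule func(item) for every item; returns the Futures right away."""
    return [executor.submit(func, item) for item in items]


def collect_done(futures):
    """Non-blocking check: results of the futures that have already finished.

    A tkinter after() callback could call this repeatedly until all are done.
    """
    return [f.result() for f in futures if f.done()]


if __name__ == '__main__':
    executor = ThreadPoolExecutor(max_workers=4)
    futures = submit_all(executor, slow_square, range(4))
    print('right after submit:', collect_done(futures))  # usually [] at this point
    executor.shutdown(wait=True)  # this is the call that blocks
    print('after shutdown:', collect_done(futures))
```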

Have you tried asyncio? I think asyncio works great, but it is invasive. You only want to call one function asynchronously, and before you know it you have async def sprinkled everywhere.
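A small asyncio sketch of the same job. Since requests is blocking, this assumes `asyncio.to_thread` (Python 3.9+) to push each call onto a thread, and `asyncio.gather` to wait for all of them while keeping results in input order. `fetch_age` is a hypothetical stand-in that sleeps instead of making an HTTP request.

```python
import asyncio
import time


def fetch_age(name):
    """Hypothetical blocking fetch; sleeps to simulate a requests.get call."""
    time.sleep(0.1)
    return len(name)  # placeholder value, not a real age


async def scrape_all(names):
    """Run fetch_age for every name concurrently and pair names with results."""
    # to_thread() moves each blocking call onto a worker thread;
    # gather() waits for all of them and returns results in input order.
    ages = await asyncio.gather(*(asyncio.to_thread(fetch_age, n) for n in names))
    return list(zip(names, ages))


if __name__ == '__main__':
    print(asyncio.run(scrape_all(['Adam Levine', 'Ken Shamrock'])))
```

`asyncio.run()` blocks until everything finishes, so from a Tkinter button it would still need its own thread or some way of sharing the tkinter event loop, which is part of the invasiveness described above.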

Here's an idea that uses threads, a list of queue objects, and the tkinter after() function.
import tkinter as tk
from threading import Thread
from queue import Queue
from time import sleep
from random import randint
import functools


class TkinterThreadManager:
    """A thread pool manager that maintains function call order and doesn't block tkinter"""

    def __init__(self, window, period=100):
        self.queue = []  # one Queue per pending call, in submission order
        self.window = window
        self.period = period  # polling interval in milliseconds

    def submit(self, func, args=(), callback=None):
        """Add thread to the pool
        func: Function executed in thread (decorated with @thread_func)
        args: Optional arguments passed to function.
        callback: Optional function that is called using return value
        """
        q = Queue(maxsize=1)
        # thread_func wrappers always expect the queue as their first argument
        Thread(target=func, args=(q,) + tuple(args)).start()
        if callback is not None:
            q.callback = callback
            self.queue.append(q)
            self.join()

    def join(self):
        """Periodically check for thread completion. Execute callback."""
        # Only the oldest call is checked, so callbacks run in submission order
        while self.queue and self.queue[0].full():
            q = self.queue.pop(0)
            q.callback(q.get())  # runs in the tkinter thread, so widgets are safe here
        if self.queue:
            self.window.after(self.period, self.join)


def thread_func(func):
    """TkinterThreadManager function wrapper. Puts function return value in Queue to
    notify manager of thread completion.
    """
    @functools.wraps(func)
    def wrapper(q, *args, **kwargs):
        value = func(*args, **kwargs)
        print(value)  # For demonstration purposes
        q.put(value)
    return wrapper


@thread_func
def get_name(name, delay):
    """Dummy function to execute via thread"""
    sleep(randint(0, delay))
    return name


class MainWindow(tk.Tk):
    """Window to demonstrate TkinterThreadManager."""

    def __init__(self):
        super().__init__()
        self.threads = TkinterThreadManager(self)
        self.lb1 = tk.Listbox(self, width=22, height=30)
        self.lb1.pack(side='left', fill='y', padx=10, pady=10)
        button = tk.Button(self, text='Letters', command=self.letters)
        button.pack(padx=10, pady=10)
        button = tk.Button(self, text='Numbers', command=self.numbers)
        button.pack(padx=10, pady=10)

    def letters(self):
        """Add some letters to the listbox"""
        for delay, name in enumerate(('A', 'B', 'C')):
            self.threads.submit(
                func=get_name, args=(name, delay),
                callback=lambda x: self.lb1.insert(tk.END, x))

    def numbers(self):
        """Add some numbers to the listbox"""
        for delay, name in enumerate((1, 2, 3)):
            self.threads.submit(
                func=get_name, args=(name, delay),
                callback=lambda x: self.lb1.insert(tk.END, x))


if __name__ == '__main__':
    MainWindow().mainloop()
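The ordering guarantee in TkinterThreadManager comes from join() only ever looking at the oldest queue. The sketch below pulls that loop out into a standalone function (`release_in_order` is a hypothetical name) so the behaviour can be tried without a window: a task that finishes early waits until every earlier task has finished.

```python
from queue import Queue


def release_in_order(pending, callback):
    """Mirror of TkinterThreadManager.join() without the after() re-scheduling.

    pending: list of Queue(maxsize=1) objects, oldest submission first.
    Handles finished results from the front only and stops at the first
    queue that is still empty. Returns True while work is outstanding,
    which is when join() would schedule itself again.
    """
    while pending and pending[0].full():
        callback(pending.pop(0).get())
    return bool(pending)


if __name__ == '__main__':
    slots = [Queue(maxsize=1) for _ in range(3)]
    out = []
    slots[2].put('C')  # the last task happens to finish first
    print(release_in_order(slots, out.append), out)  # True [] ('A' not ready)
    slots[0].put('A')
    print(release_in_order(slots, out.append), out)  # True ['A']
    slots[1].put('B')
    print(release_in_order(slots, out.append), out)  # False ['A', 'B', 'C']
```

The trade-off is head-of-line blocking: one slow Wikipedia request holds back every row submitted after it. If row order doesn't matter, checking every queue and handling whichever ones are full would show each age as soon as it arrives.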