Python Forum
Download multiple large json files at once
#1
I'm trying to get the auction data from the World of Warcraft API for every realm so that I can scan for certain items posted across all realms.

import webbrowser
import urllib
import urllib.request
import urllib.error
import urllib, json
import re
import os.path
import time
import threading
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool
from concurrent.futures.thread import ThreadPoolExecutor
from queue import Queue
from threading import Thread
import concurrent.futures
import multiprocessing

q = Queue()
url = ""
apiUrls = []
apiRealms = []
curApi = 0
realms = []
realm = ""
items = [141582, 141583, 141585, 141581, 141580, 141590, 141589, 141588,
         141587, 141564, 141565, 141566, 141571, 141567, 141576, 141577,
         141578, 141570, 141569, 141568, 141572, 141573, 141574, 141575,
         141579]
itemnames = ["Fran's Intractable Loop", "Sameed's Vision Ring",
             "Six-Feather Fan", "Demar's Band of Amore",
             "Vastly Oversized Ring", "Cloak of Martayl Oceanstrider",
             "Treia's Handcrafted Shroud", "Talisman of Jaimil Lightheart",
             "Queen Yh'saerie's Pendant", "Telubis Binding of Patience",
             "Mir's Enthralling Grasp", "Serrinne's Maleficent Habit",
             "Mavanah's Shifting Wristguards", "Cyno's Mantle of Sin",
             "Aethrynn's Everwarm Chestplate", "Fists of Thane Kray-Tan",
             "Claud's War-Ravaged Boots", "Cainen's Preeminent Chestguard",
             "Samnoh's Exceptional Leggings", "Boughs of Archdruid Van-yali",
             "Geta of Tay'shute", "Shokell's Grim Cinch",
             "Ulfgor's Greaves of Bravery", "Gorrog's Serene Gaze",
             "Welded Hardskin Helmet"]
print(len(items))

# Import realms
with open('C:/test/RealmList.txt') as f:
    realms = f.read().splitlines()
    print("Realms: " + str(realms))
f.close()
# realms = ["Shu'halo", "Eitrigg", "Stormrage", "Moonguard"]


def scanRealm(realmb):
    startTime = time.time()
    print("Scanning " + realmb)
    url = 'https://us.api.blizzard.com/wow/auction/data/' + realmb + '?locale=en_US&access_token=hidden'
    print(url)
    # Get auction json url
    with urllib.request.urlopen(url) as response:
        html = response.read()
        html = html.decode('utf-8')
    # Get url from string
    urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', html)
    print("Got Auction Api url for realm " + realmb + ": ", urls[0])
    jsonurl = urllib.request.urlopen(urls[0])
    print("request")
    text = json.load(jsonurl.read())
    print("load")
    # write to txt
    with open('c:/Test/' + realmb + '.txt', 'w') as f:
        f.write(str(text).replace("{'auc", "\n{'auc"))
        print("Completed scanning " + realmb + " in " + str(time.time() - startTime) + " seconds.")
    f.close()
    return 1


def getJsonUrl(realmb):
    startTime = time.time()
    print("Scanning " + realmb)
    url = 'https://us.api.blizzard.com/wow/auction/data/' + realmb + '?locale=en_US&access_token=hidden'
    print(url)
    # Get auction json url
    with urllib.request.urlopen(url) as response:
        html = response.read()
        print(html)
        html = html.decode('utf-8')
    # Get url from string
    urls = re.findall('http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*\(\),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+', html)
    apiUrls.append(urls[0])
    apiRealms.append(realmb)
    print("Got Auction Api url for realm " + realmb + ": ", urls[0])


def downloadJson(realmb):
    print("request")
    response = urllib.request.urlopen(realmb)
    # print(response.read())
    text = json.load(response.read())
    print("load")
    # write to txt
    with open('c:/Test/' + apiRealms[curApi] + '.txt', 'w') as f:
        f.write(str(text).replace("{'auc", "\n{'auc"))
        print("Completed scanning " + realmb)
    f.close()
    curApi = curApi + 1
    return 1


starttime = time.time()

#Get the url's for the json files
with concurrent.futures.ThreadPoolExecutor(max_workers=20) as executor:
    pages = executor.map(getJsonUrl, realms)
print("Completed in " + str(time.time() - starttime) + " seconds.")

#download json
with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
    pages = executor.map(downloadJson, apiUrls)
print("Completed in " + str(time.time() - starttime) + " seconds.")

Initially I had it downloading one realm at a time, which worked fine but took almost 2 hours to complete. I'm trying to use threading to scan multiple realms at once and speed it up considerably.

The scanRealm function is the one that does one realm at a time and works fine. getJsonUrl appears to work fine and outputs the URLs of the JSON files that I need, for example http://auction-api-us.worldofwarcraft.co...tions.json.

The downloadJson function is where I believe things are going wrong. It never seems to reach the point where it prints "load". No files are ever created, no error is shown, and after fiddling around looking for solutions for the past few hours I'm stumped.
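One possible reason for the silence, sketched here under assumptions rather than confirmed against the real API: `executor.map` only re-raises a worker's exception when its results are iterated, and `json.load` expects a file-like object, not the bytes returned by `read()`. The names `parse_payload` and `parse_payload_fixed` and the sample payloads below are hypothetical stand-ins for the real downloads.

```python
import json
from concurrent.futures import ThreadPoolExecutor


def parse_payload(raw):
    """Hypothetical worker mimicking json.load(response.read())."""
    # json.load() calls .read() on its argument, so bytes raise AttributeError.
    return json.load(raw)


def parse_payload_fixed(raw):
    """Same worker using json.loads(), which accepts bytes or str directly."""
    return json.loads(raw)


# Made-up payloads standing in for downloaded response bodies (assumption).
payloads = [b'{"auctions": []}', b'{"auctions": [{"auc": 1}]}']

with ThreadPoolExecutor(max_workers=2) as executor:
    # Nothing is raised here because the results are never consumed.
    ignored = executor.map(parse_payload, payloads)

with ThreadPoolExecutor(max_workers=2) as executor:
    try:
        # Iterating the results re-raises the first worker exception.
        list(executor.map(parse_payload, payloads))
        error = None
    except AttributeError as exc:
        error = exc

with ThreadPoolExecutor(max_workers=2) as executor:
    parsed = list(executor.map(parse_payload_fixed, payloads))

print("hidden error:", error)
print("parsed:", parsed)
```

Wrapping the `executor.map(...)` call in `list(...)` is the smallest change that makes a failing worker visible. In `downloadJson`, the assignment `curApi = curApi + 1` would also make `curApi` a local name and raise `UnboundLocalError` where it is read earlier in the function.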

Sorry for the messy code. I'm no professional at Python and am mostly just trying to put together something functional that I can improve over time.
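As a rough sketch of one way the download step could be restructured (not a tested fix for the real API): each worker returns its realm name together with the parsed data, so no shared counter like `curApi` is needed, and `as_completed` surfaces errors per realm. The helpers `fetch_bytes`, `download_realm`, `download_all`, and `fake_fetch`, the `example.invalid` URLs, and the JSON shape are all assumptions made for illustration.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch_bytes(url):
    """Download a URL and return the raw response body."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def download_realm(realm, url, fetch=fetch_bytes):
    """Return (realm, parsed JSON) so the realm name travels with its data."""
    return realm, json.loads(fetch(url))


def download_all(realm_urls, fetch=fetch_bytes, max_workers=8):
    """Download realms concurrently; collect results and errors separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            executor.submit(download_realm, realm, url, fetch): realm
            for realm, url in realm_urls.items()
        }
        for future in as_completed(futures):
            realm = futures[future]
            try:
                _, data = future.result()  # re-raises any worker exception
                results[realm] = data
            except Exception as exc:
                errors[realm] = exc
    return results, errors


def fake_fetch(url):
    """Stand-in fetcher so the demo runs offline (hypothetical data)."""
    if "broken" in url:
        raise OSError("simulated download failure")
    return b'{"auctions": [{"auc": 1, "item": 141582}]}'


results, errors = download_all(
    {"Eitrigg": "https://example.invalid/eitrigg.json",
     "Stormrage": "https://example.invalid/broken.json"},
    fetch=fake_fetch,
)
print(sorted(results), sorted(errors))
```

With the real `fetch_bytes` in place of `fake_fetch`, the `errors` dictionary would show which realms failed and why, instead of the failures being dropped silently.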