If you’re working with location data, geocoding — the process of converting addresses into geographic coordinates — is often a key step. While many APIs offer geocoding, handling large lists of addresses while respecting rate limits can be a challenge.
In this article, we’ll walk through a Python script that solves exactly this problem using Geoapify’s Geocoding API. We’ll read addresses from a file, process them in rate-limited batches, and write the results to a newline-delimited JSON (NDJSON) file.
🔗 GitHub Repository: geoapify/maps-api-code-samples
🧩 What This Script Does
This script:
- Reads a list of addresses from a file.
- Sends asynchronous requests to the Geoapify Geocoding API.
- Respects API rate limits (5 requests/second).
- Optionally filters geocoding by country.
- Saves results to a file in NDJSON format.
Perfect for developers processing large CSV/Excel exports or building internal tools with address lookups.
📥 1. Reading Addresses from File
```python
with open(input_file, 'r') as f:
    addresses = f.read().strip().splitlines()
```
What it does:
Reads the input file line by line, strips extra whitespace, and stores the addresses in a list.
Why it’s needed:
Prepares a clean list of addresses for batch processing. Each line in the input file represents a separate address.
🧮 2. Batching Requests According to Rate Limit
```python
addresses = list(it.batched(addresses, REQUESTS_PER_SECOND))
```
What it does:
Splits the address list into smaller batches, each containing `REQUESTS_PER_SECOND` addresses (e.g. 5 per batch).
Why it’s needed:
Geoapify enforces a maximum number of API requests per second. Batching ensures we never send more than the allowed number of requests per second.
📝 If you're using Python < 3.12, see the Requirements section below for a drop-in replacement for `itertools.batched`.
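As a quick sanity check, here is how the batching step splits a list. The snippet includes a small fallback so it also runs on Python < 3.12; the address list is made up for illustration:

```python
import itertools
import sys

if sys.version_info >= (3, 12):
    batched = itertools.batched
else:
    def batched(iterable, n):
        # Equivalent behavior to itertools.batched for this use case.
        iterator = iter(iterable)
        while batch := tuple(itertools.islice(iterator, n)):
            yield batch

addresses = [f"address {i}" for i in range(12)]
batches = [list(b) for b in batched(addresses, 5)]
print([len(b) for b in batches])  # [5, 5, 2]
```

With 12 addresses and a limit of 5 requests per second, you get two full batches and one partial batch.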
🚀 3. Asynchronous Execution of Requests
```python
tasks = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for batch in addresses:
        logger.info(batch)
        tasks.extend([executor.submit(geocode_address, address, api_key, country_code) for address in batch])
        sleep(1)
```
What it does:
- Uses a thread pool to send multiple requests in parallel.
- Submits one thread per address.
- Waits 1 second between batches to comply with Geoapify's rate limit.
Why it’s needed:
Parallelism accelerates processing by making multiple requests simultaneously. `sleep(1)` ensures the API's request-per-second quota isn't exceeded.
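The same pattern can be exercised offline with a stand-in for the API call (`fake_geocode` below is a placeholder, not part of the actual script):

```python
from concurrent.futures import ThreadPoolExecutor
from time import sleep

def fake_geocode(address):
    # Placeholder for geocode_address, so the pattern runs without an API key.
    return {"query": address, "status": "ok"}

batches = [["addr 1", "addr 2"], ["addr 3"]]
tasks = []
with ThreadPoolExecutor(max_workers=10) as executor:
    for batch in batches:
        tasks.extend(executor.submit(fake_geocode, a) for a in batch)
        sleep(0.1)  # the real script uses sleep(1) to match the rate limit

results = [t.result() for t in tasks]
print(len(results))  # 3
```

Note that the pause happens between batch submissions, not between individual requests — that's what keeps each one-second window at or below the quota.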
🌍 4. Geocoding Function
```python
def geocode_address(address, api_key, country_code):
    params = {
        'format': 'json',
        'text': address,
        'limit': 1,
        'apiKey': api_key
    }
    if country_code:
        params['filter'] = 'countrycode:' + country_code
    try:
        response = requests.get(GEOAPIFY_API_URL, params=params)
        if response.status_code == 200:
            data = response.json()
            if len(data['results']) > 0:
                return data['results'][0]
            else:
                return {"error": "Not found"}
        else:
            logger.warning(f"Failed to geocode address '{address}': {response.text}")
            return {}
    except Exception as e:
        logger.error(f"Error while geocoding address '{address}': {e}")
        return {}
```
What it does:
Sends a request to the Geoapify Geocoding API with the given address and optional country code.
Parses the response and returns the top geocoding result as a dictionary. If no result is found, or an error occurs, it returns a fallback dictionary with an error message.
Why it’s needed:
Encapsulates the geocoding logic in a reusable function. Handles:
- URL building and query parameters,
- Optional filtering by country for accuracy,
- Error handling and logging,
- Response validation.
📚 Docs:
- Geoapify Geocoding API Docs
- Python `requests.get()`
- Python `dict` type
- Python `try`/`except`
- Python `logging` module
⏳ 5. Waiting for All Requests to Complete
```python
wait(tasks, return_when=ALL_COMPLETED)
results = [task.result() for task in tasks]
```
What it does:
Blocks until all geocoding requests have completed, then collects results into a list.
Why it’s needed:
Ensures that all asynchronous jobs finish before the output is saved. Prevents writing incomplete or partial results.
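Here is a minimal, self-contained illustration of the same wait-then-collect pattern, using `pow` as a trivial stand-in task:

```python
from concurrent.futures import ThreadPoolExecutor, wait, ALL_COMPLETED

with ThreadPoolExecutor(max_workers=4) as executor:
    tasks = [executor.submit(pow, 2, i) for i in range(4)]
    # Block until every submitted future has finished.
    done, not_done = wait(tasks, return_when=ALL_COMPLETED)

# Collecting from the original task list preserves submission order.
results = [task.result() for task in tasks]
print(results)  # [1, 2, 4, 8]
```

Iterating over the original `tasks` list (rather than the `done` set, which is unordered) keeps the results aligned with the input addresses.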
📝 6. Writing Results to NDJSON File
```python
with open(output_file, 'w') as f:
    for result in results:
        f.write(json.dumps(result) + '\n')
```
What it does:
Writes results as newline-delimited JSON objects to a file — a format known as NDJSON.
Why it’s needed:
NDJSON is ideal for large-scale processing. It's readable line-by-line, can be streamed, and integrates well with tools like `jq`, Elasticsearch, and data pipelines.
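Reading the output back is just as simple — one `json.loads` per line. The two sample records below are made up to mirror the script's success and "not found" cases:

```python
import json
import io

# Simulate an NDJSON file with two illustrative records.
ndjson_text = '{"lat": 48.8582, "lon": 2.2945}\n{"error": "Not found"}\n'
records = [json.loads(line) for line in io.StringIO(ndjson_text)]
print(len(records))  # 2
```

Because each line is an independent JSON object, you can process files of any size without loading them fully into memory.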
▶️ How to Use It
1. Save your addresses to a `.txt` file (one per line):

```
1600 Amphitheatre Parkway, Mountain View, CA
Eiffel Tower, Paris
Brandenburger Tor, Berlin
```
2. Run the script:

```shell
python geocode_addresses.py \
  --api_key=YOUR_GEOAPIFY_API_KEY \
  --input=addresses.txt \
  --output=results.ndjson \
  --country_code=us
```
- `--api_key`: Your Geoapify API key.
- `--input`: Input file containing addresses.
- `--output`: Output file in NDJSON format.
- `--country_code`: Optional ISO country code (e.g., `us`, `fr`, `de`) to increase accuracy.
🛠️ Requirements
The script uses the third-party `requests` library for HTTP calls; everything else comes from the Python standard library.
Python 3.12+ is required for `itertools.batched`.
If you're on an older version, you can define your own batching function:
```python
def batched(iterable, n):
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, n))
        if not batch:
            break
        yield batch
```
📦 Use Case Scenarios
- Clean and validate customer address lists.
- Pre-process logistics/delivery points.
- Enrich event registration data with geo-coordinates.
🔍 Conclusion
With just a few lines of Python, you can build a robust geocoding pipeline that respects API rate limits and scales to thousands of addresses. This script is a great foundation you can extend with features like:
- Retry logic
- Address deduplication
- Integration with Pandas or Google Sheets
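Retry logic, for instance, could be sketched as a small wrapper around the geocoding call (`with_retries` is a hypothetical helper, not part of the script):

```python
import time

def with_retries(fn, attempts=3, delay=1.0):
    # Hypothetical helper: re-run a flaky call, pausing between attempts.
    for attempt in range(attempts):
        try:
            return fn()
        except Exception:
            if attempt == attempts - 1:
                raise
            time.sleep(delay)

# Simulate a call that fails twice before succeeding.
calls = {"count": 0}
def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

result = with_retries(flaky, attempts=3, delay=0.0)
print(result)  # ok
```

In the real script, you would wrap the `requests.get` call this way, ideally with an increasing delay so retries don't themselves violate the rate limit.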
Try it yourself — and happy geocoding!