How to Get Home Property Data to Analyze Your Market using Python
A Python tutorial for getting data on both on-market and off-market properties
Property data is awesome to analyze.
But how do we get property data without being a real estate agent?
We need to use web scraping to retrieve home data from public real estate sites.
Can we do better? Web scraping is tedious!
Yes! We can use APIs that already scrape public-facing real estate sites for us, then query and consume the structured data they return.
What type of data can we get?
In this article, we will get more than 200 data points for each property in a list, including number of bedrooms, square footage, lot size, Zestimate, and more.
This post will get data for properties that are for sale (on-market) and properties that are not for sale (off-market) using Python.
Problem Statement
We have a list of property addresses. We need to get property details for each address.
The property details include home characteristics, sold history, tax data, and property estimates.
This will allow us to analyze properties in our real estate market.
Can we append property information to our list?
Yes, we can.
Data Source
We will use the Zillow.com API by APIMaker to get property details.
This API already does the web scraping for us and provides property information through several endpoints.
Disclosure: I am not the creator of the API; I am solely a consumer.
Framework
We will follow a four-step framework to gather property information.
- Upload a file with a list of property addresses
- Find the associated Zillow unique id (ZPID)
- Search ZPID and get property details from the API
- Append property details to our original file
Prerequisite
- Sign up for a free RapidAPI account to get an API key
- Subscribe to the Zillow.com API to request property data
The Zillow.com API offers a free subscription tier with 20 API credits per month (one API credit equals one API call).
Supporting Video
Follow along in my Python tutorial video.
Python Tutorial
If you do not have an existing Python environment, I highly suggest first cloning the notebook (linked at the bottom of the article).
This will allow you to run the Python code in Google Colab (free!). It is a cloud-based environment that lets you run code without having to install Python locally.
I. Install Packages
The first step is installing the necessary packages.
II. Import Libraries
Next, import the required libraries.
III. Locals & Constants
Sign up for a free RapidAPI account and subscribe to the Zillow.com API.
Create a variable to hold our API key.
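One way to hold the key is sketched below, assuming it is stored in an environment variable instead of being pasted into the notebook. The variable name `RAPID_API_KEY` and the helper `load_api_key` are illustrative choices, not names defined by RapidAPI or the Zillow.com API.

```python
import os


def load_api_key(env_var="RAPID_API_KEY"):
    """Read the RapidAPI key from an environment variable (the name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable to your RapidAPI key.")
    return key
```

After setting the environment variable, `rapid_api_key = load_api_key()` gives us the variable to pass along with each API request.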
IV. Data
Single Property Search
Let’s start by retrieving property data for a single address.
To request data from the API, we need to provide the ZPID of the address. We will get this ID by replicating a Google search.
Steps:
- Get ZPID (unique identifier for each property stored in the URL)
- Get Property Details data
Let’s generate the property search phrase to pass to the Google search function.
We append “ zillow home details” to the search string so that the Zillow link lands at the top of the returned URLs.
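Building the search phrase might look like the sketch below. The helper name `build_search_query` and the sample address are made up for illustration.

```python
def build_search_query(street, city, state, zip_code):
    """Join the address parts and append the Zillow hint phrase."""
    parts = [str(p).strip() for p in (street, city, state, zip_code) if p is not None]
    address = " ".join(p for p in parts if p)
    return f"{address} zillow home details"


# Invented address, for illustration only
query = build_search_query("123 Main St", "Springfield", "IL", "62701")
print(query)  # 123 Main St Springfield IL 62701 zillow home details
```

Converting each part with `str()` keeps the function working when a zip code arrives as a number.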
Pass the query string to the Google search function and set the stop value to 3 to return the top three search results.
Select the first URL, which is the most relevant search result.
This returns the unique URL for our property. The ZPID is located at the end of the URL string.
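The first result is not guaranteed to be a Zillow page, so a slightly more defensive variation is to pick the first result that looks like a Zillow property link. This is a sketch, not the selection described above: `pick_zillow_url` is a hypothetical helper, `urls` stands for whatever list of URL strings the search returned, and the check for `zillow.com` plus `_zpid` is an assumption about how these links are shaped.

```python
def pick_zillow_url(urls):
    """Return the first URL that looks like a Zillow property page, else None."""
    for url in urls:
        if "zillow.com" in url and "_zpid" in url:
            return url
    return None


# Invented search results, for illustration only
results = [
    "https://www.example.com/some-listing",
    "https://www.zillow.com/homedetails/123-Main-St-Springfield-IL-62701/12345678_zpid/",
]
print(pick_zillow_url(results))
```

Returning `None` when nothing matches lets the calling code skip that address instead of failing later.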
Let’s extract the ZPID in a few steps wrapped in one line of code:
- Split the URL by “/”
- Find the element that contains “zpid”
- Split that element by “_” and keep the first part, which is the ZPID
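The three steps above can be written as a small function. The name `extract_zpid` and the sample URL are illustrative; the URL only mimics the shape described here, with the ZPID at the end.

```python
def extract_zpid(url):
    """Split the URL on "/", find the piece containing "zpid", keep the part before "_"."""
    pieces = [piece for piece in url.split("/") if "zpid" in piece]
    return pieces[0].split("_")[0] if pieces else None


# Invented URL, for illustration only
sample_url = "https://www.zillow.com/homedetails/123-Main-St-Springfield-IL-62701/12345678_zpid/"
print(extract_zpid(sample_url))  # 12345678
```

The function returns the ZPID as a string, and `None` if the URL has no “zpid” segment.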
Here is our unique ID to pass into our API.
We now make a request to the API to get data on our property address.
We parse the response body as JSON.
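A minimal sketch of the request is shown here, using only the standard library so it runs without extra packages (any HTTP client would work). The `X-RapidAPI-Key` and `X-RapidAPI-Host` headers are the usual RapidAPI headers. The host name, the `/property` path, and the `zpid` query parameter are assumptions; confirm them on the API's RapidAPI page before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumed values: confirm the host and endpoint on the API's RapidAPI page.
API_HOST = "zillow-com1.p.rapidapi.com"
PROPERTY_URL = f"https://{API_HOST}/property"


def build_property_request(zpid, api_key):
    """Build (but do not send) a GET request for one property's details."""
    query = urllib.parse.urlencode({"zpid": zpid})
    headers = {
        "X-RapidAPI-Key": api_key,    # RapidAPI authentication header
        "X-RapidAPI-Host": API_HOST,  # RapidAPI host header
    }
    return urllib.request.Request(f"{PROPERTY_URL}?{query}", headers=headers)


def get_property_detail(zpid, api_key, timeout=30):
    """Send the request and parse the JSON body into a Python object."""
    request = build_property_request(zpid, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Keeping the request builder separate from the network call makes it easy to inspect the URL and headers without spending an API credit.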
This outputs a lengthy set of data.
Let’s transform this dataset into a pandas dataframe (rows and columns).
This will allow us to view our data in table format that we can download later on.
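One way to flatten a nested JSON response into a single-row table is `pandas.json_normalize`, sketched here on a made-up response fragment. The field names below are assumptions for illustration; a real response has many more fields.

```python
import pandas as pd

# Made-up response fragment; field names are assumptions for illustration.
sample_response = {
    "zpid": 12345678,
    "bedrooms": 3,
    "livingArea": 1450,
    "zestimate": 310000,
    "address": {"city": "Springfield", "state": "IL"},
}

# Nested dictionaries become dotted column names such as "address.city"
df_property = pd.json_normalize(sample_response)
print(df_property.shape)  # (1, 6)
print(list(df_property.columns))
```

Each top-level field becomes a column, and nested fields are expanded into their own columns, which is how one property can produce hundreds of columns.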
For our single property address, we have 259 columns of data! Wow!
Let’s select a few columns from our dataset to view.
We have information on home characteristics as well as property estimates.
Check out my post on how to calculate cash flow based on property estimates.
List of Properties
We tested our framework to get data for a single property.
Now, it is time to upload our own list of properties and get data for each row.
Steps:
- Upload CSV file — Check out PropStream for on and off-market deals
- Get ZPID (unique identifier for each property stored in the URL)
- Get Property Details data
For this example, I uploaded a file of property addresses for tax delinquent owners. I downloaded this data from PropStream.
In the file, we have a list of properties. There are four columns related to property address — Address, City, State, and Zip.
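Loading such a file with pandas could look like the sketch below. An in-memory string stands in for the uploaded CSV, and the rows are invented. Reading `Zip` as text is a design choice to preserve leading zeros.

```python
import io

import pandas as pd

# Stand-in for the uploaded file; the rows are invented.
csv_text = """Address,City,State,Zip
123 Main St,Springfield,IL,62701
45 Elm Ave,Boston,MA,02134
"""

# dtype={"Zip": str} keeps zip codes such as 02134 from losing the leading zero
df_upload = pd.read_csv(io.StringIO(csv_text), dtype={"Zip": str})
print(df_upload["Zip"].tolist())  # ['62701', '02134']
```

With a real file, the path to the uploaded CSV replaces `io.StringIO(csv_text)`.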
Let’s pass the property address columns into our code!
Functions
We need to recreate the same steps we performed to get data for our single property.
Let’s set up functions to repeat the process of retrieving the unique ID and requesting data from the API.
Function #1 — Get ZPID using Google Search
Function #2 — Get property detail from the API
Here we set up a for loop to perform actions for each row in our spreadsheet.
Steps:
- Map address related columns to variables (street, city, state, zip_code)
- Call Function #1 to get the ZPID
- Pause the script so we do not send Google Search requests too quickly
- Call Function #2 to get property detail from the API
- Transform the JSON object to a dataframe and append it to a list
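The loop steps above can be sketched as follows. To keep the example self-contained, the two functions are passed in as arguments: `get_zpid` stands for Function #1 and `get_detail` for Function #2. The name `collect_details`, the two-second default pause, and the choice to skip rows with no ZPID are assumptions for illustration.

```python
import time

import pandas as pd


def collect_details(df, get_zpid, get_detail, pause_seconds=2.0):
    """Look up each row's ZPID, fetch its details, and collect one dataframe per property."""
    zpids, df_list = [], []
    for row in df.itertuples(index=False):
        # Map the address columns to the lookup function's arguments
        zpid = get_zpid(row.Address, row.City, row.State, row.Zip)
        zpids.append(zpid)
        time.sleep(pause_seconds)  # pause between search requests
        if zpid is None:
            continue  # no ZPID found, so skip the API call for this row
        df_list.append(pd.json_normalize(get_detail(zpid)))
    return zpids, df_list
```

The returned `zpids` list has one entry per input row, so it can be stored as a new column on the original dataframe and used later as the merge key.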
We have a list of 5 dataframes in our df_list object. Each dataframe represents the response we received from the property details API.
Let’s concatenate these dataframes to create one single table.
This looks great! We have all the property details, such as the Zestimate, in a single table.
But how do we merge this new table with our original dataset?
Let’s merge our original and new dataframes on the unique ID column, ZPID.
This gives us a dataframe of 300+ columns.
Definitely not user friendly!
Let’s trim down the number of columns in our merge by selecting a subset of the property details columns.
Imagine that for our use case we only need property estimates to calculate metrics like cash flow.
We select 3 columns: ZPID, Zestimate, and rentZestimate.
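A compact sketch of the concatenate, select, and merge steps follows. It assumes the original dataframe already carries a `zpid` column and that the API columns are named `zpid`, `zestimate`, and `rentZestimate`; the helper name `merge_estimates` is hypothetical. Both keys are cast to strings because the ID may be text on one side and a number on the other.

```python
import pandas as pd


def merge_estimates(original, df_list, keep=("zpid", "zestimate", "rentZestimate")):
    """Stack the per-property dataframes, keep a few columns, and merge onto the original."""
    details = pd.concat(df_list, ignore_index=True)
    # reindex keeps the requested columns and fills any missing one with NaN
    subset = details.reindex(columns=list(keep))
    subset["zpid"] = subset["zpid"].astype(str)

    left = original.copy()
    left["zpid"] = left["zpid"].astype(str)
    # A left join keeps every original row, even when no details were found
    return left.merge(subset, on="zpid", how="left")
```

Selecting the subset before merging avoids carrying hundreds of unused columns through the join.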
Now we have trimmed our dataset down to 44 columns.
We can see our new columns, Zestimate and rentZestimate, appended at the end of our dataframe.
V. Visualize
By creating a Plotly box plot we can view the distribution of the Zestimate values.
The property estimates in our dataset range from $200K to $560K.
This can help us target certain properties over others.
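To read the same distribution as numbers, we can compute the five values a box plot is built around: minimum, first quartile, median, third quartile, and maximum. This is a sketch with pandas and invented values; `box_plot_summary` is a hypothetical helper, and plotting libraries may compute quartiles slightly differently.

```python
import pandas as pd


def box_plot_summary(values):
    """Return the min, quartiles, and max of a numeric column, ignoring missing values."""
    series = pd.Series(values, dtype="float64").dropna()
    labels = ["min", "q1", "median", "q3", "max"]
    quantiles = series.quantile([0.0, 0.25, 0.5, 0.75, 1.0]).tolist()
    return dict(zip(labels, quantiles))


# Invented estimates, for illustration only
summary = box_plot_summary([210000, 260000, 320000, 410000, 540000])
print(summary["median"])  # 320000.0
```

Properties whose estimate falls below the first quartile or above the third are the ones that stand out from the middle half of the list.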
VI. Automation
Check out my no code solution to upload your custom file and get property details.
Conclusion
Leveraging APIs is a great way to retrieve property data.
Using property datasets alongside economic data from the Census can provide insight on how your real estate market is performing and what future trends exist.
Check out my YouTube channel, AnalyticsAriel, for more insight on real estate data sources and data analytics!








