Data Engineering Concepts #2 —
Sending Data Using an API
Bar Dadon · Published in Dev Genius · 7 min read · Jul 17
Photo by Myriam Jessier on Unsplash
Introduction
One of the main responsibilities of data engineers is to transfer data between
a source and a destination, and there are many different ways to do it.
Depending on the problem, this job often requires a data engineer to build
and maintain a complex data pipeline. However, data pipelines are not the
only way to move data between machines or services.
In many cases, we can accomplish this task by building a simple API that
allows authorized users to request data from our services.
What is an API?
An API is simply an interface that allows users to send HTTP requests over
the internet to a server. Using these HTTP requests, a user can interact with
various services on the server, such as querying a database or executing a
function.
The developers who create the API control which operations users can
activate when they send HTTP requests.
For example, we can create an API that, given the correct request, runs a
function that executes a query retrieving the five most active customer IDs
over the last month from a table called “customers”.
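As a rough sketch, the function behind such an endpoint could look something
like this; the “customers” table, its last_active and activity_count columns,
and the warehouse.db file are all hypothetical and only illustrate the idea:

import sqlite3

def top_customer_ids(db_path="warehouse.db"):
    # Hypothetical query: the five most active customer IDs over the last month
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        """
        SELECT customer_id
        FROM customers
        WHERE last_active >= date('now', '-1 month')
        ORDER BY activity_count DESC
        LIMIT 5
        """
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]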
When to use an API instead of a data pipeline
APIs can be a great replacement for pipelines, but we should know when to
use them.
First, because APIs are used to send data over the internet, we can only send
relatively small amounts of data in each request. Also, if there’s a need for
highly complex processing of the data, then the API will be slow and
inefficient. In those cases, we should create a data pipeline instead.
However, APIs can replace a pipeline when the data needed is lightweight
and there’s no need for scheduling.
APIs also allow users to pull the data on their own. Users can interact with a
service whenever they choose, without having to request a data engineer to
execute a certain pipeline.
Of course, we can always use a hybrid approach: build a data pipeline that
transfers and processes large amounts of data into a repository of our
choice, then create an API that serves small amounts of that processed data
to users.
Example
To make this more concrete, let’s build a simple API using Flask. This API
will allow users to send a GET request to our service. If the request is valid,
the API will scrape the website example.com and return the requested
number of letters from it.
http://example.com/
1. Setting the environment
To get started, let’s create a virtual environment:
root@DESKTOP-3U7IV4I:/projects# python3 -m venv api_example
Then activate it:
root@DESKTOP-3U7IV4I:/projects/api_example# source bin/activate
To verify that we are currently in the virtual environment, the prompt should
look like this:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example#
Next, we need to pip install the libraries flask, bs4, and requests:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# pip install flask bs4 requests
Next, create a folder called “app” and a file app.py:
app/app.py
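For example, from the project root:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# mkdir app && touch app/app.py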
Great. Now we can build the API.
2. Building the API
First, let’s write the function for scraping the website example.com and
retrieving all the text we can find.
from bs4 import BeautifulSoup
import requests


def scrape_data(url="http://example.com/"):
    '''
    1. Send a GET request to http://example.com/.
    2. Parse the response.
    3. Return all the text in the website.

    Args:
        - url(str), default "http://example.com/"
    Returns:
        - text(str)
    '''
    def extract():
        # Request the page and make sure the connection succeeded
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        # Parse the HTML and concatenate the text of every <p> element
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())


if __name__ == "__main__":
    data = scrape_data()
    print(data)
Output:
Connection Successful
This domain is for use in illustrative examples in documents. You may use this
domain in literature without prior coordination or asking for permission.More information...
The function seems to be working properly. Let’s start building the API now. I
will use Flask to create a local app that listens on port 5000.
Any user who sends a GET request to the URL localhost:5000/ will activate
the above function and receive the text that we just scraped.
from flask import Flask
from bs4 import BeautifulSoup
import requests


def scrape_data(url="http://example.com/"):
    '''
    1. Send a GET request to http://example.com/.
    2. Parse the response.
    3. Return all the text in the website.

    Args:
        - url(str), default "http://example.com/"
    Returns:
        - text(str)
    '''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())


# Create a flask app
app = Flask(__name__)


# Implement a route to scrape data
@app.route('/')
def get_data():
    data = scrape_data()
    return data


# Run the app
if __name__ == "__main__":
    app.run(debug=True, host="localhost", port=5000)
To run the app, go to the folder “app” and run:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# cd app
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example/app# flask run
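Alternatively, since the script calls app.run() under if __name__ == "__main__",
we can also start it directly with the Python interpreter:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example/app# python app.py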
If we go to localhost:5000/ we will see the scraped text in our simple app:
Our app at: localhost:5000
3. Using the API
Now, let’s say that we are users who need this data and want to use the API
that the developers built. To access this data, we need to send a GET request
to localhost:5000/.
We can do that in many different ways. There are tons of tools for the job;
the simplest is the Linux command “curl”.
Let’s use a curl command to grab this data and store it in a text file called
“scraped_data.txt”:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# curl -o scraped_data.txt localhost:5000/
Output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   175  100   175    0     0    634      0 --:--:-- --:--:-- --:--:--   636
We should now have all the scraped text in the text file:
scraped_data.txt
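curl is not the only option. Any HTTP client works; for instance, here is a
minimal sketch that pulls the same data with Python’s requests library
(assuming the Flask app is still running locally on port 5000):

import requests

# Send a GET request to our local API and write the response body to a file
response = requests.get("http://localhost:5000/")
response.raise_for_status()  # raise an error on a non-2xx response
with open("scraped_data.txt", "w") as f:
    f.write(response.text)
print(response.text)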
4. Improving the API
Let’s go back to playing the developers. As the developers who built this
API, we are also tasked with adding a layer of security. We can’t allow
anyone who sends a simple GET request to grab our data.
a. Adding an API key
A very common way of adding a layer of security is by adding an API key.
For this simple example, let’s say that the API key is 12345. We want to
modify the code so that only requests to the URL
localhost:5000/api_key=12345 will be granted data. All other requests will
fail.
This will ensure that only users who know the API key we chose are
authorized to send GET requests.
from flask import Flask
from bs4 import BeautifulSoup
import requests


def scrape_data(url="http://example.com/"):
    '''
    1. Send a GET request to http://example.com/.
    2. Parse the response.
    3. Return all the text in the website.

    Args:
        - url(str), default "http://example.com/"
    Returns:
        - text(str)
    '''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())


# Create a flask app
app = Flask(__name__)
API_KEY = '12345'


# Implement a route to scrape data
@app.route('/api_key=<api_key>')
def get_data(api_key):
    if api_key != API_KEY:
        raise ConnectionRefusedError("Wrong API key!")
    else:
        data = scrape_data()
        return data


# Run the app
if __name__ == "__main__":
    app.run(debug=True, host="localhost", port=5000)
Now, let’s send a GET request, but this time with the API key 12345:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# curl -o scraped_data.txt localhost:5000/api_key=12345
Output:
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100   175  100   175    0     0    653      0 --:--:-- --:--:-- --:--:--   655
Great. Now, only authorized users who know that the API key is 12345 can
scrape our data.
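A quick note on the design: putting the key in the URL path works for a toy
example, but a more common pattern is to pass it as a query parameter (or a
request header) and return a proper 401 status when it is wrong. Here is a
minimal sketch of the query-parameter variant, reusing the scrape_data
function from above; the /data route and the api_key parameter name are
just illustrative:

from flask import Flask, request, abort

app = Flask(__name__)
API_KEY = '12345'

@app.route('/data')
def get_data():
    # Expect the key as a query parameter, e.g. localhost:5000/data?api_key=12345
    if request.args.get('api_key') != API_KEY:
        abort(401, description="Wrong API key!")
    # scrape_data() is the same scraping function defined earlier
    return scrape_data()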
b. Controlling the amount of data
Next, let’s allow users to control the amount of data they receive. Instead of
receiving all the data, users will be able to choose how many letters they
want. The code can look like this:
from flask import Flask
from bs4 import BeautifulSoup
import requests


def scrape_data(url="http://example.com/"):
    '''
    1. Send a GET request to http://example.com/.
    2. Parse the response.
    3. Return all the text in the website.

    Args:
        - url(str), default "http://example.com/"
    Returns:
        - text(str)
    '''
    def extract():
        response = requests.get(url)
        if response.status_code == 200:
            print("Connection Successful")
        else:
            raise ConnectionError("Something Went Wrong!")
        return response

    def transform(response):
        text = ''
        soup = BeautifulSoup(response.text, 'html.parser')
        elements = soup.find_all(name='p')
        for ele in elements:
            text += ele.text
        return text

    return transform(extract())


# Create a flask app
app = Flask(__name__)
API_KEY = '12345'


# Implement a route to scrape data
@app.route('/api_key=<api_key>/number_of_letters=<number_of_letters>')
def get_data(api_key, number_of_letters):
    if api_key != API_KEY:
        raise ConnectionRefusedError("Wrong API key!")
    else:
        data = scrape_data()
        return data[0:int(number_of_letters)]


# Run the app
if __name__ == "__main__":
    app.run(debug=True, host="localhost", port=5000)
Now let’s say that we want only the first 100 letters. We can send a GET
request like this:
(api_example) root@DESKTOP-3U7IV4I:/projects/api_example# curl -o scraped_data.txt localhost:5000/api_key=12345/number_of_letters=100
And the result is a text file with only the first 100 letters:
scraped_data.txt — only the first 100 letters
As we can see, APIs are a useful way to send small amounts of data online
and enable users to access services that developers provide.
This concludes the article. Hope you had a good read and learned something
new. If there are any questions, please don’t hesitate to ask in the comment
section.