
Commit f94cc88

benijake committed: Initial clean commit
0 parents, commit f94cc88


42 files changed (+3391 −0 lines)

.vscode/settings.json

Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
{
    "pgsql.connections": []
}

LICENSE

Lines changed: 21 additions & 0 deletions
@@ -0,0 +1,21 @@
MIT License

Copyright (c) 2025 Beni Jake

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md

Lines changed: 231 additions & 0 deletions
@@ -0,0 +1,231 @@
# How can I use a service principal to read and write to a database from Databricks?

This demo shows how to use an Entra service principal to read from and write to an Azure SQL DB or Azure Postgres database from a Databricks notebook.

## Solution
- [Prerequisites](#prerequisites)
- [Azure SQL DB](#azure-sql-db)
- [Azure Postgres](#azure-postgres)

## Prerequisites

### Create an Entra app registration
First, navigate to your tenant's Entra ID directory and add a new [app registration](https://learn.microsoft.com/en-us/entra/identity-platform/quickstart-register-app). This creates an Entra ID service principal that you can use to authenticate to Azure resources:
![app-registration](./databricks/AppRegistration.png)

Next, create a secret for the service principal.

Note: You'll need the tenant ID, service principal name, client ID and secret for the steps below.
![add-secret1](./databricks/AddSecret1.png)
![add-secret2](./databricks/AddSecret2.png)

The secret value will appear on the screen. Do not leave the page before recording it ([it will never be displayed again](https://learn.microsoft.com/en-us/entra/identity-platform/how-to-add-credentials?tabs=client-secret#add-a-credential-to-your-application)). You may want to keep the browser tab open and complete the remaining steps in a second tab for convenience.
![secret-id-value](./databricks/SpSecretIDValue.png)
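
If you prefer the command line, here's a minimal sketch of the same step using the Azure CLI (not part of the original portal walkthrough; it assumes you're signed in with `az login`):
``` bash
# Creates an app registration, a service principal and a client secret in one call.
# The output includes appId (client ID), password (client secret) and tenant --
# record them immediately, just as you would in the portal.
az ad sp create-for-rbac --name sql-dbx-read-write-test-sp
```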

### Create an Azure Key Vault
Create an Azure Key Vault in the same region as your other resources.
![azure-keyvault](./databricks/CreateKeyVault.png)

Assign yourself and the Azure Databricks application the Key Vault Secrets Officer role. (Databricks will access the key vault referenced in your secrets scope using the Databricks application's own service principal, which is unique to your tenant. You might expect a Unity Catalog-enabled workspace to use the workspace's managed identity to connect to the key vault, but unfortunately that's not the case.)
![add-role](./databricks/AddKeyVaultRole.png)
![secrets-officer](./databricks/KVSecretsOfficerRole.png)
![azure-databricks](./databricks/AssignDatabricksAppRBAC.png)
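
The same role assignment can be scripted. The sketch below assumes the Azure CLI and uses placeholder vault and resource group names; it looks up the AzureDatabricks enterprise application in your tenant and assigns it the Key Vault Secrets Officer role:
``` bash
# Object ID of the AzureDatabricks application's service principal in your tenant
DBX_SP_ID=$(az ad sp list --display-name AzureDatabricks --query "[0].id" -o tsv)

# Resource ID of the key vault (kv-xxxx and my-rg are placeholders)
KV_ID=$(az keyvault show --name kv-xxxx --resource-group my-rg --query id -o tsv)

az role assignment create \
  --role "Key Vault Secrets Officer" \
  --assignee-object-id "$DBX_SP_ID" \
  --assignee-principal-type ServicePrincipal \
  --scope "$KV_ID"
```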

Add three secrets to the key vault: one for your tenant ID, another for the service principal's client ID (not the secret ID) and a third for the secret value. You will need these to authenticate as the service principal in your Databricks notebook. Refer back to the secret value in the other browser tab as needed.
![secret-id](./databricks/CreateSPId.png)
![secret-value](./databricks/CreateSPSecret.png)
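
From the CLI, the same three secrets could be added roughly like this (the vault name is a placeholder; the secret names match the ones the notebook reads later with `dbutils.secrets.get`):
``` bash
# Store the tenant ID, the service principal's client ID and its secret value.
# Replace kv-xxxx and the <...> placeholders with your own values.
az keyvault secret set --vault-name kv-xxxx --name tenant-id --value "<your-tenant-id>"
az keyvault secret set --vault-name kv-xxxx --name sql-dbx-read-write-test-sp-client-id --value "<client-id>"
az keyvault secret set --vault-name kv-xxxx --name sql-dbx-read-write-test-sp-secret --value "<client-secret-value>"
```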

### Create a Databricks workspace
Many of the newer Databricks features like serverless compute, data lineage and managed identity support require Unity Catalog. Unity Catalog is now normally enabled by default when you create a new workspace and gives you the most authentication options when connecting to Azure resources like databases. The steps described in this demo should work both for Unity Catalog-enabled workspaces and for workspaces using the legacy Hive metastore.
![create-workspace](./databricks/create-databricks-workspace.png)
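
If you'd rather script the workspace creation, here is a minimal sketch with the Azure CLI (it assumes the `databricks` CLI extension is installed; names, region and SKU are placeholders):
``` bash
# az extension add --name databricks   # one-time install of the extension
az databricks workspace create \
  --resource-group my-rg \
  --name dbx-ws-xxxx \
  --location westeurope \
  --sku premium
```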

### Create a Databricks secret scope
Next, we need to [create a secrets scope](https://learn.microsoft.com/en-us/azure/databricks/security/secrets/) for our Databricks workspace. Open your workspace and add ```#secrets/createScope``` after the Databricks instance URL in your browser. It's [case-sensitive](https://learn.microsoft.com/en-us/azure/databricks/security/secrets/#create-an-azure-key-vault-backed-secret-scope-1), so be sure to use a capital S in 'createScope':
![dbx-scope](./databricks/CreateDBXSecretsScope.png)

A page to configure a new secrets scope should appear. Give your secrets scope a name and paste the key vault URI (DNS Name) and resource ID into the respective fields below. You can find these on the Properties blade of your key vault.
![dbx-uri-resource-id](./databricks/createDBXSecretsScope2.png)
![kv-properties](./databricks/KeyVaultProperties.png)
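
The scope can also be created from a terminal. This sketch uses the legacy Databricks CLI (`pip install databricks-cli`); newer CLI releases use a different syntax, and the resource ID and DNS name below are placeholders:
``` bash
databricks secrets create-scope \
  --scope default \
  --scope-backend-type AZURE_KEYVAULT \
  --resource-id "/subscriptions/<sub-id>/resourceGroups/my-rg/providers/Microsoft.KeyVault/vaults/kv-xxxx" \
  --dns-name "https://kv-xxxx.vault.azure.net/"
```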

### Create a Databricks cluster
If you just want to read from a database with a service principal, you can use a serverless cluster. Just add the azure-identity library to the environment configuration.

![azure-identity](./databricks/AddServerlessClusterLibrary.png)

If you also want to *write* to the database, however, you'll need to use a provisioned cluster.
![create-cluster](./databricks/CreateCluster.png)

After the cluster has been created, click on the cluster name in the list and then switch to the Libraries tab to install azure-identity from PyPI.
![install-library](./databricks/InstallLibrary.png)

## Azure SQL DB
### Create Database
Create an Azure SQL database with Entra ID authentication. You can use the AdventureWorksLT sample to pre-populate it with data if you like. For the purposes of this demo, we will only be working with the information_schema.
![adventure-works](./databricks/AdventureWorksLTDB.png)
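
As a scripted alternative, the database can be created with the sample data from the Azure CLI. This is just a sketch: it assumes the logical server already exists with an Entra ID admin configured, and the resource names are placeholders:
``` bash
az sql db create \
  --resource-group my-rg \
  --server sql-xxxx \
  --name testdb \
  --sample-name AdventureWorksLT
```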

Then connect to the database using the Entra ID admin user and [create a user for the service principal](https://learn.microsoft.com/en-us/azure/azure-sql/database/authentication-aad-service-principal-tutorial?view=azuresql#create-the-service-principal-user). You can use the Query editor in the portal or SQL Server Management Studio.
![query-editor](./databricks/QueryEditor.png)
![create-dbuser](./databricks/CreateDBUser.png)

``` sql
CREATE USER [sql-dbx-read-write-test-sp] FROM EXTERNAL PROVIDER;
```

Add the service principal user to the database roles db_datareader, db_datawriter and db_ddladmin. This allows the service principal to read and write data in existing tables; db_ddladmin is needed because overwrite mode automatically drops and recreates a table before writing to it.
``` sql
ALTER ROLE db_datareader
ADD MEMBER [sql-dbx-read-write-test-sp];

ALTER ROLE db_datawriter
ADD MEMBER [sql-dbx-read-write-test-sp];

ALTER ROLE db_ddladmin
ADD MEMBER [sql-dbx-read-write-test-sp];
```

### Create a Databricks notebook
Paste the code below into the first cell of the notebook. Using the dbutils library, retrieve the tenant ID, client ID and service principal secret from the key vault referenced in the secrets scope and get an Entra token.
``` python
from azure.identity import ClientSecretCredential

tenant_id = dbutils.secrets.get(scope = "default", key = "tenant-id")
client_id = dbutils.secrets.get(scope = "default", key = "sql-dbx-read-write-test-sp-client-id")
client_secret = dbutils.secrets.get(scope = "default", key = "sql-dbx-read-write-test-sp-secret")

credential = ClientSecretCredential(tenant_id, client_id, client_secret)
token = credential.get_token("https://database.windows.net/.default").token
```

In the next cell, create a JDBC connection to the database and query the list of tables from the information schema.
``` python
jdbc_url = "jdbc:sqlserver://sql-xxxx.database.windows.net:1433;database=testdb"

connection_properties = {
    "accessToken": token,
    "driver": "com.microsoft.sqlserver.jdbc.SQLServerDriver"
}

# Read from a table
df = spark.read.jdbc(url=jdbc_url, table="INFORMATION_SCHEMA.TABLES", properties=connection_properties)
df.show()
```
The output should look something like this:
![info-schema](./databricks/ReadInfoSchema.png)

If you're using a provisioned cluster, you can run the code below to write to the database.
``` python
df.write.jdbc(
    url=jdbc_url,
    table="dbo.Test",
    mode="overwrite",
    properties=connection_properties
)
```

To verify that the data was written as expected, you can read the table data back into a dataframe.
``` python
df_check = spark.read.jdbc(
    url=jdbc_url,
    table="dbo.Test",
    properties=connection_properties
)
df_check.show()
```

You can find the complete notebook [here](./databricks/Read_Write_SQL_DB_SP.ipynb).

## Azure Postgres
### Create Database
First [create an Azure Database for PostgreSQL flexible server with Entra ID authentication](https://learn.microsoft.com/en-us/azure/postgresql/flexible-server/how-to-configure-sign-in-azure-ad-authentication). Then connect to the **postgres** database with the Entra ID administrator user.
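
For reference, a hedged CLI sketch of the server setup (parameter names may differ between Azure CLI versions, and the names, region and admin account below are placeholders):
``` bash
# Create a flexible server with Entra ID authentication enabled.
az postgres flexible-server create \
  --resource-group my-rg \
  --name psql-xxxxx \
  --location westeurope \
  --active-directory-auth Enabled

# Make your own account the Entra ID administrator of the server.
az postgres flexible-server ad-admin create \
  --resource-group my-rg \
  --server-name psql-xxxxx \
  --display-name user@domain.com \
  --object-id $(az ad signed-in-user show --query id -o tsv)
```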

In the example below, we connect using bash commands in [Azure Cloud Shell](https://learn.microsoft.com/en-us/azure/cloud-shell/get-started/classic?tabs=azurecli) in the portal. (You can use the [PostgreSQL extension for VS Code](https://marketplace.visualstudio.com/items?itemName=ms-ossdata.vscode-pgsql) instead if you prefer.)
![create-PGUser](./databricks/connect-postgres3.png)

Set the environment variables in the Cloud Shell window.
``` bash
export PGHOST=psql-xxxxx.postgres.database.azure.com
export PGUSER=user@domain.com
export PGPORT=5432
export PGDATABASE=postgres
export PGPASSWORD="$(az account get-access-token --resource https://ossrdbms-aad.database.windows.net --query accessToken --output tsv)"
```

Now simply run psql. The environment variables set above will be used automatically to connect.
``` bash
psql
```

Once connected, run the statement below to create a user for the service principal.
``` sql
SELECT * FROM pgaadauth_create_principal('sql-dbx-read-write-test-sp', false, false);
```
![create-dbuser](./databricks/CreatePGUser.png)

Quit your connection to the postgres database and then open a new connection to your **application** database (```testdb``` in our example). Grant the service principal user read and write privileges on tables in the public schema.
![connect-application-db](./databricks/connect-testdb2.png)

``` sql
ALTER DEFAULT PRIVILEGES IN SCHEMA public
GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO "sql-dbx-read-write-test-sp";

GRANT ALL PRIVILEGES ON SCHEMA public TO "sql-dbx-read-write-test-sp";
```
![grant-pguser](./databricks/GrantPGUser.png)

### Create a Databricks notebook
Paste the code below into the first cell of the notebook. Using the dbutils library, retrieve the tenant ID, client ID and service principal secret from the key vault referenced in the secrets scope and get an Entra token. Note that the URL used to request the token is different from the one we used above to authenticate to [Azure SQL DB](#azure-sql-db).
``` python
from azure.identity import ClientSecretCredential

sp_name = "sql-dbx-read-write-test-sp"
tenant_id = dbutils.secrets.get(scope = "default", key = "tenant-id")
client_id = dbutils.secrets.get(scope = "default", key = "sql-dbx-read-write-test-sp-client-id")
client_secret = dbutils.secrets.get(scope = "default", key = "sql-dbx-read-write-test-sp-secret")

credential = ClientSecretCredential(tenant_id, client_id, client_secret)
token = credential.get_token("https://ossrdbms-aad.database.windows.net/.default").token
```

In the next cell, add the following code to connect to the database and read data into a dataframe. You need to specify the service principal name as the user and pass the token as the password. Once the connection is made, however, we read and write to the database the same way we did for SQL.
``` python
jdbc_url = "jdbc:postgresql://psql-xxxxxxx.postgres.database.azure.com:5432/testdb"

connection_properties = {
    "user": sp_name,
    "password": token,
    "driver": "org.postgresql.Driver",
    "ssl": "true",
    "sslfactory": "org.postgresql.ssl.NonValidatingFactory"
}

# Read from a table
df = spark.read.jdbc(url=jdbc_url, table="information_schema.tables", properties=connection_properties)
display(df)
```

Add a new cell and paste in the code below to write the contents of the dataframe to a new table in the database. The only difference from the SQL DB code above is that we are using the public schema in place of dbo.
``` python
df.write.jdbc(
    url=jdbc_url,
    table="public.Test",
    mode="overwrite",
    properties=connection_properties
)
```

Finally, let's add a cell to read the table contents into a dataframe to confirm that the data was written.
``` python
df_check = spark.read.jdbc(
    url=jdbc_url,
    table="public.Test",
    properties=connection_properties
)
df_check.show()
```

You can download the complete notebook [here](./databricks/Read_Write_PSQL_DB_SP.ipynb).

databricks/AddKeyVaultRole.png (134 KB)
databricks/AddPGUser1.png (242 KB)
databricks/AddSecret1.png (90.9 KB)
databricks/AddSecret2.png (15.3 KB)
(unnamed image, 41.9 KB)
databricks/AdventureWorksLTDB.png (97.9 KB)
databricks/AppRegistration.png (259 KB)
