For a Bedrock project I am working on, I needed to get my GitHub Dependabot alerts into an AWS DynamoDB table. This may be a complete edge case, but in case it helps anybody, here is how it went:
First, get your data out of GitHub using the REST API:
Install the GitHub CLI: https://cli.github.com/
Once the CLI is installed, you will need to authenticate to your GitHub account:
gh auth login
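If you want to confirm which account you are logged in as (and which scopes your token has) before calling the API, you can check with:
gh auth status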
Now you can call the API; the different variations are documented here:
https://docs.github.com/en/rest/dependabot/alerts?apiVersion=2022-11-28
If you are pulling data from your enterprise or organization, your command line will look slightly different from the one I am showing for my personal GitHub repositories, and you will need the correct permissions to query the data. Also, if you are calling for an entity with quite a few alerts, you will probably want to add --paginate to get them all.
gh api -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" /repos/yourgithubaccount/yourrepo/dependabot/alerts > dependabot.json
I redirected the output to a file, then repeated the call for each of the three repos I wanted to analyze. You can do it all in one fell swoop for an enterprise or organization.
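For reference, an organization-level pull looks roughly like this (your-org is a placeholder, and you typically need to be an organization owner or security manager for this endpoint to return anything):
gh api --paginate -H "Accept: application/vnd.github+json" -H "X-GitHub-Api-Version: 2022-11-28" /orgs/your-org/dependabot/alerts > dependabot.json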
I am not a developer, nor do I play one on TV, but this is what worked for me in Python.
## import your shenanigans
import pandas as pd
import boto3
import json
import awswrangler as wr

## read the JSON files produced by the GitHub CLI into dataframes
file_path = r"C:\Users\User\dependabot.json"
df = pd.read_json(file_path)

file_path = r"C:\Users\User\dependabot2.json"
df2 = pd.read_json(file_path)

file_path = r"C:\Users\User\dependabot3.json"
df3 = pd.read_json(file_path)

## concatenate the files
concatdf = pd.concat([df, df2, df3])

## I ended up converting all the datatypes to string, which was fine in this case
concatdf = concatdf.astype(str)

## function to trim the URL path; I just want the repo name out of html_url,
## stored in a new column named "repo"
def trim_url_path(url):
    return url.split('/')[4]

concatdf['repo'] = concatdf['html_url'].apply(trim_url_path)

## now let's wrangle the data with AWS SDK for pandas, formerly known as AWS Data Wrangler
wr.dynamodb.put_df(
    df=concatdf,
    table_name='concat_dependabotalert'
)
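One note on the upload: wr.dynamodb.put_df writes into a table that already exists, so the table has to be created first (more on that below). If you want to do that from Python too, here is a minimal sketch with boto3. The key schema is just an assumption on my part, a "repo" partition key plus a "number" sort key, both strings since the dataframe was cast to str; use whatever uniquely identifies an alert in your data.

## minimal sketch: create the target table first
import boto3

dynamodb = boto3.client('dynamodb')
dynamodb.create_table(
    TableName='concat_dependabotalert',
    AttributeDefinitions=[
        {'AttributeName': 'repo', 'AttributeType': 'S'},
        {'AttributeName': 'number', 'AttributeType': 'S'},
    ],
    KeySchema=[
        {'AttributeName': 'repo', 'KeyType': 'HASH'},     ## partition key (assumed)
        {'AttributeName': 'number', 'KeyType': 'RANGE'},  ## sort key (assumed)
    ],
    BillingMode='PAY_PER_REQUEST',
)

## the table takes a moment to become ACTIVE before it accepts writes
dynamodb.get_waiter('table_exists').wait(TableName='concat_dependabotalert')

If your version of awswrangler has it, wr.dynamodb.read_items(table_name='concat_dependabotalert', allow_full_scan=True) is a quick way to read everything back into a dataframe and sanity-check the row count after the upload.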
Yay, and now it will upload. Note: I already had a table created in DynamoDB when I started this process. I ran this a few separate times with the same amount of data, and the time it took to upload to DynamoDB varied quite a bit. It may well be a user-side issue: a watched table never populates. You can also use this same process for GitHub code scanning (CodeQL) alerts and secret scanning alerts. Hope this helps somebody. Thanks for reading!