This repository was archived by the owner on May 17, 2024. It is now read-only.

datafold/data-diff

Datafold

data-diff

What is data-diff?

data-diff is a free, open-source tool that enables data professionals to detect differences in values between any two tables. It's fast, easy to use, and reliable, even at massive scale.

Use cases

Quickly identify issues when migrating data between databases

(screenshots: diff1, diff2)

Improve code reviews by identifying data problems you don't have tests for

(video is rough draft, screenshot will be replaced)

Intro to Diff

   

Get started

Installation

First, install data-diff using pip.

pip install data-diff 

Note: Once you've installed Python 3.7+, pip and pip3 can most likely be used interchangeably.

Then, install one or more driver(s) specific to the database(s) you want to connect to.

  • pip install 'data-diff[postgresql]'

  • pip install 'data-diff[snowflake]'

  • We support 10+ other databases. Check out our detailed documentation for specifics.
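If you need more than one driver, pip accepts several extras in a single requirement, separated by commas. The sketch below assumes you want the two drivers named above; the variable names EXTRAS and PACKAGE_SPEC are illustrative only. It prints the install command so you can review it first.

```shell
#!/bin/sh
# Extras taken from the list above; pip accepts several, comma-separated.
EXTRAS="postgresql,snowflake"
PACKAGE_SPEC="data-diff[${EXTRAS}]"

# Print the command so it can be reviewed before running.
echo "pip install '${PACKAGE_SPEC}'"

# Uncomment to perform the installation:
# pip install "${PACKAGE_SPEC}"
```

The quotes around the requirement matter: some shells treat square brackets as glob characters when they are left unquoted.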

Run your first diff

Once you've installed data-diff, you can run it from the command line:

data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS] 

You can find all the correct syntax for your database setup in our documentation.

Here's an example command you can copy and paste, taken from the screenshot above:

data-diff \
  "postgresql://leoebfolsom:$PW_POSTGRES@localhost:5432/diff_test" \
  org_activity_stream \
  "snowflake://leo:$PW_SNOWFLAKE@BYA42734/analytics/ANALYTICS?warehouse=ANALYTICS&role=DATAFOLDROLE" \
  ORG_ACTIVITY_STREAM \
  -k activity_id \
  -c activity \
  -w "event_timestamp < '2022-10-10'"

That's just an example, so be sure to check out the documentation for more details about the options you can use to create a command that's useful to you.
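Because the command gets long, it can help to assemble it from variables in a small script and review it before running. The following is only a sketch: the variable names, the my_user and MY_ACCOUNT placeholders, and the dummy password fallback are assumptions for illustration. It uses only the -k, -c, and -w options from the example above, and it prints the arguments rather than running the diff.

```shell
#!/bin/sh
set -eu

# Hypothetical placeholders: passwords come from the environment, with a
# dummy fallback so the script can be dry-run without real secrets.
PW_POSTGRES="${PW_POSTGRES:-changeme}"
PW_SNOWFLAKE="${PW_SNOWFLAKE:-changeme}"

DB1_URI="postgresql://my_user:${PW_POSTGRES}@localhost:5432/diff_test"
TABLE1="org_activity_stream"
DB2_URI="snowflake://my_user:${PW_SNOWFLAKE}@MY_ACCOUNT/analytics/ANALYTICS?warehouse=ANALYTICS&role=DATAFOLDROLE"
TABLE2="ORG_ACTIVITY_STREAM"

# Store the command in the positional parameters so each argument stays
# intact, including the WHERE clause that contains spaces and quotes.
# -k: key column, -c: extra column to compare, -w: WHERE filter
# (confirm the exact meaning of each option in the documentation).
set -- data-diff \
  "$DB1_URI" "$TABLE1" \
  "$DB2_URI" "$TABLE2" \
  -k activity_id \
  -c activity \
  -w "event_timestamp < '2022-10-10'"

# Dry run: print one argument per line for review.
printf '%s\n' "$@"

# To run the diff for real, uncomment the next line:
# "$@"
```

Keeping the URIs in double-quoted variables lets the shell expand the password variables while protecting the & and ? characters in the Snowflake URI.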

We're here to help

We know that the data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS] command can become long! And maybe you're new to the command line. We're here to help on Slack if you have ANY questions as you use data-diff in your workflow.

Reporting bugs and contributing

License

This project is licensed under the terms of the MIT License.