data-diff is a free, open-source tool that enables data professionals to detect differences in values between any two tables. It's fast, easy to use, and reliable, even at massive scale.
pip install data-diff
Note: Once you've installed Python 3.7+, pip and pip3 can most likely be used interchangeably.
- pip install 'data-diff[postgresql]'
- pip install 'data-diff[snowflake]'
We support 10+ other databases. Check out our detailed documentation for specifics.
Once you've installed data-diff, you can run it from the command line:
data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
You can find all the correct syntax for your database setup in our documentation.
Here's an example command for your copy/pasting, taken from the screenshot above:
data-diff \
  postgresql://leoebfolsom:'$PW_POSTGRES'@localhost:5432/diff_test \
  org_activity_stream \
  "snowflake://leo:$PW_SNOWFLAKE@BYA42734/analytics/ANALYTICS?warehouse=ANALYTICS&role=DATAFOLDROLE" \
  ORG_ACTIVITY_STREAM \
  -k activity_id \
  -c activity \
  -w "event_timestamp < '2022-10-10'"
That's just an example, so be sure to check out the documentation for more details about the options you can use to create a command that's useful to you.
We know that the data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS] command can become long! And maybe you're new to the command line. We're here to help on Slack if you have ANY questions as you use data-diff in your workflow.
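If the command gets long, one option is to assemble it in a small Python script instead of typing it by hand. The sketch below is not part of data-diff: build_data_diff_command is a hypothetical helper, the URIs use placeholder credentials, and it only uses the -k, -c, and -w options shown in the example above. Passing -c once per column is an assumption, so check the documentation before relying on it.

```python
import shlex
from typing import List, Optional, Sequence


def build_data_diff_command(
    db1_uri: str,
    table1: str,
    db2_uri: str,
    table2: str,
    key_column: Optional[str] = None,
    columns: Sequence[str] = (),
    where: Optional[str] = None,
) -> List[str]:
    """Hypothetical helper: build the argument list for
    data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]."""
    argv = ["data-diff", db1_uri, table1, db2_uri, table2]
    if key_column is not None:
        argv += ["-k", key_column]
    # Assumption: -c is repeated once per extra column to compare.
    for column in columns:
        argv += ["-c", column]
    if where is not None:
        argv += ["-w", where]
    return argv


# Placeholder URIs; substitute your own connection details.
command = build_data_diff_command(
    "postgresql://user:password@localhost:5432/diff_test",
    "org_activity_stream",
    "snowflake://user:password@ACCOUNT/analytics/ANALYTICS?warehouse=ANALYTICS&role=DATAFOLDROLE",
    "ORG_ACTIVITY_STREAM",
    key_column="activity_id",
    columns=["activity"],
    where="event_timestamp < '2022-10-10'",
)

# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in command))
```

To run the diff directly from the script rather than printing it, the same list can be handed to subprocess.run(command, check=True). Because the arguments are passed as a list, no shell quoting is needed in that case.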
- Open an issue or chat with us on Slack.
- Interested in contributing to this open source project? Please see our Contributing Guideline!
- Did we mention we're hiring?
This project is licensed under the terms of the MIT License.