This repository was archived by the owner on May 17, 2024. It is now read-only.

datafold/data-diff

Datafold

data-diff

What is data-diff?

data-diff is a free, open-source tool that enables data professionals to detect differences in values between any two tables. It's fast, easy to use, and reliable, even at massive scale.

💸 Join our team

We're looking for developers with a deep understanding of databases and solid Python knowledge. Apply here!

📖 Documentation

Check out our detailed documentation for usage instructions, common use cases, features, and technical details.

How to use

Quickly identify issues when migrating data between databases

diff1 diff2

Improve code reviews by identifying data problems you don't have tests for

(video is rough draft, screenshot will be replaced with something better)

Intro to Diff

Get started

Installation

First, install data-diff using pip.

pip install data-diff

Note: Once you've installed Python 3.7+, pip and pip3 can most likely be used interchangeably.

Then, install one or more driver(s) specific to the database(s) you want to connect to.

  • pip install 'data-diff[postgresql]'

  • pip install 'data-diff[snowflake]'

  • TODO We support 10+ other databases. Check out [TODO link to documentation] for specifics.

Run your first diff

Once you've installed data-diff, you can run it from the command line:

data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
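As a rough illustration of how the positional arguments above fit together, here is a minimal Python sketch that assembles the command and prints it in a form you can paste into a terminal. The helper name `build_data_diff_command`, the connection URIs, and the table name `orders` are all placeholders invented for this example, not values from the project; substitute your own.

```python
import shlex
from typing import List, Sequence


def build_data_diff_command(
    db1_uri: str,
    table1: str,
    db2_uri: str,
    table2: str,
    options: Sequence[str] = (),
) -> List[str]:
    """Assemble the argument list for the documented invocation:

    data-diff DB1_URI TABLE1_NAME DB2_URI TABLE2_NAME [OPTIONS]
    """
    return ["data-diff", db1_uri, table1, db2_uri, table2, *options]


# Placeholder values (assumptions): replace with your own URIs and table names.
command = build_data_diff_command(
    "postgresql://user:password@localhost/source_db",
    "orders",
    "postgresql://user:password@localhost/target_db",
    "orders",
)

# Quote each argument so the printed line is safe to paste into a shell.
print(" ".join(shlex.quote(arg) for arg in command))

# To actually run the diff (this needs data-diff, the matching database
# driver, and reachable databases), you could do something like:
#   import subprocess
#   subprocess.run(command, check=True)
```

Building the command as a list rather than a single string avoids shell-quoting problems when a URI contains special characters such as those in a password.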

[TODO here's one example of code that you can copy and paste, just like from the screenshot]

Check out the Documentation TODO add link for all the options and database-specific configurations.

Reporting bugs and contributing

License

This project is licensed under the terms of the MIT License.