Skip to content
This repository was archived by the owner on May 17, 2024. It is now read-only.

Conversation

@erezsh
Copy link
Contributor

@erezsh erezsh commented May 30, 2022

No description provided.

@erezsh
Copy link
Contributor Author

erezsh commented May 30, 2022

Common usage will look like this:

from data_diff import diff_tables, TableRef # will change to datadiff in another PR table1 = TableRef('postgres:///', 'Rating') table2 = TableRef('postgres:///', 'Rating_del1') diff_tables(table1, table2, update_column="timestamp", bisection_factor=64)
@erezsh erezsh requested a review from sirupsen May 30, 2022 09:26
@@ -1,9 +1,86 @@
from typing import Tuple
from typing import Tuple, Iterator
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you're not doing this yet because you want it to be finalized, and not conflict with the other README PR, but I think we need to add some examples to the README, as well as a link to the Pydocs 👍🏻

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The other thing is tests (which I know you didn't add for the same reason; want to finalize it first)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep

@sirupsen
Copy link
Contributor

It's looking pretty good, let me try to play a bit around with the API in an almost-empty file and I'll see if any other feedback comes out of that!

@sirupsen
Copy link
Contributor

Looking really good! 🚀

Some thoughts from playing around with a script locally (thanks for the sample!):

  1. In the documentation, I think we should have an example line that sets the logging level to INFO so you can see the verbose output the first time you run it. I think that's very satisfying!
  2. I think we need to support counts and optionally stats out of the gate.... Maybe the simplest way to do this is that the helper helps create TableDiffer, you call diff_tables yourself, and then we can allow them to peek into the stats property? I think there's a case here for the helper to help you create the object, rather than run the whole thing 👀
  3. I'm still not sure I understand the value of the TableRef class. You mentioned it's because doing this kind of thing in the initializer in TableSegment is bad form... I'm not sure I agree. Just seems like a common question why we have both which will cause more confusion than I think automatically parsing these tables will 🤷🏻
  4. In your example of rating as table1 and rating_del1 as table2, why does the API return [('+', (12500048,))]? I would've expected the diff to be "left-to-right", that is, for it to return [('+', (12500048,))].
  5. When running this against rating and rating_001p it's somehow not working? Just gives me an empty list? Seems like it's not recursing? I ran the CLI, and it worked fine 🤔
@erezsh
Copy link
Contributor Author

erezsh commented Jun 6, 2022

@sirupsen Addressing your points:

  1. Sure, why not.
  2. What stats specifically do you think we should provide? (other than getting the table count)
  3. I don't want anyone to create a TableSegment on their own, because there's a lot they can get wrong. There won't be any confusion, because no one should access the TableSegment class directly. Will it help if I rename TableRef to connect_to_table(), or something like that?
  4. The two samples you wrote are exactly the same. ([('+', (12500048,))])
  5. I don't know. Can you include the code that doesn't work for you?
@erezsh erezsh requested a review from sirupsen June 6, 2022 09:30
@sirupsen
Copy link
Contributor

sirupsen commented Jun 8, 2022

What stats specifically do you think we should provide? (other than getting the table count)

Yeah, we can probably get away with just # of rows, and # of different rows. I guess the latter could even just be from len().

Will it help if I rename TableRef to connect_to_table(), or something like that?

I think a function that returns a properly crated TableSegment is more appropriate here, yeah.

The two samples you wrote are exactly the same. ([('+', (12500048,))])

I meant that I expected it to be ([('-', (12500048,))]). That represents a "diff" from left-to-right in my opinion, and is more consistent with how e.g. git diff works, no?

I don't know. Can you include the code that doesn't work for you?

Works great now! For my own reference, this is the code, but works just fine now!

from data_diff import diff_tables, TableRef # will change to datadiff in another PR import logging logging.basicConfig(level=logging.INFO) table1 = TableRef('postgres://postgres:Password1@localhost/postgres', 'rating') table2 = TableRef('postgres://postgres:Password1@localhost/postgres', 'rating_update001p') for diff in diff_tables(table1, table2, update_column="timestamp", debug=True, bisection_factor=64): print(diff)
Copy link
Contributor

@sirupsen sirupsen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me! Only smaller suggestions at your discretion.

README update with an example, a link to Pydocs, and then I think this is good to go!

@erezsh erezsh marked this pull request as ready for review June 8, 2022 14:13
@erezsh erezsh requested a review from sirupsen June 8, 2022 14:13
@erezsh
Copy link
Contributor Author

erezsh commented Jun 8, 2022

I believe I addressed all of the concerns, except for the docs which I will add later.

This PR is already too big.

@erezsh
Copy link
Contributor Author

erezsh commented Jun 8, 2022

@sirupsen Ok I am blocked from merging because there is an unresolved conversation and github is glitching.

I don't understand why I have to deal with these issues when I'm the one writing 95% of the commits. Whoever is playing with the settings isn't helping.

@sirupsen sirupsen merged commit 98b7b32 into master Jun 8, 2022
@sirupsen sirupsen deleted the new_api branch June 8, 2022 15:11
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

None yet

3 participants