Skip to content

Conversation

@jbrockmendel
Copy link
Member

ATM this implements a smoke-test that the cases don't raise, but I don't have a canonical answer to test them against. Could hard-code the result I get now. Thoughts?

except ValueError:
# GH#28969 if we contain tuples, astype fails here,
# apparently related to github.com/numpy/numpy/issues/9441
str_vals = np.array([str(x) for x in vals], dtype=object)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm not sure about this - wouldn't this really make everything hashable at this point, including non-hashable items?

>>> vals = np.array([("a",), ["c", "d"]]) >>> str_vals = np.array([str(x) for x in vals], dtype=object) array(["('a',)", "['c', 'd']"], dtype=object)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point

except TypeError:
# we have mixed types
try:
str_vals = vals.astype(str)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would handle this in hash_object_array as a case

@jreback jreback added Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode labels Dec 27, 2019
@jbrockmendel
Copy link
Member Author

moved the tuple-catching to hash_object_array, added check for non-hashable within a tuple

@jbrockmendel jbrockmendel marked this pull request as ready for review December 27, 2019 21:21
@jreback jreback added this to the 1.0 milestone Dec 30, 2019
@jreback
Copy link
Contributor

jreback commented Dec 30, 2019

also have 2 tests in the OP

pd.util.hash_pandas_object(pd.DataFrame({'data': [tuple('1'), tuple('2')]})) # fails pd.util.hash_pandas_object(pd.DataFrame({'data': [tuple([1]), tuple([2])]})) # fails 
@TomAugspurger
Copy link
Contributor

Also needs a whatsnew note.

@jbrockmendel
Copy link
Member Author

also have 2 tests in the OP

these are the df1 and df2 in the test that this adds.

Also needs a whatsnew note.

updated.

@TomAugspurger TomAugspurger merged commit f8a0989 into pandas-dev:master Dec 31, 2019
@TomAugspurger
Copy link
Contributor

Thanks!

@jbrockmendel jbrockmendel deleted the bug-hashtup branch December 31, 2019 16:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Bug Reshaping Concat, Merge/Join, Stack/Unstack, Explode

4 participants