-
-
Couldn't load subscription status.
- Fork 19.2k
ENH: add StringMethods (.str accessor) to Index, fixes #9068 #9667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
| @jreback please let me know what you think of this, thanks |
| this looks good |
| may also want to update the docs to mention u can do this on Index (I think it's in text.rst iirc) |
9bab07a to 00e6aea Compare | @jreback both release note and |
pandas/core/base.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a check here to exclude string methods on MultiIndex objects, too? They have object dtype, too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@shoyer you mean a simple check like this?
if isinstance(self, MultiIndex): raise AttributeError(".str accessor is not supported for MultiIndex objects") There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that looks about right
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Normally, I would suggest only putting this accessor on the appropriate types directly, but it's a little tricky with MultiIndex because it's a subclass of the generic Index type (which is what string indexes area)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think better to check the inferred type
which will be string or mixed for s multi index (iirc)
4b9468b to 78b68cb Compare There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| Nice, but 2 points:
|
78b68cb to e6b0a4b Compare | @shoyer, @sinhrks I see, indeed returning @jreback an example would be: If we return an would work naturally, whereas if we return a boolean |
e6b0a4b to 23cf6b3 Compare | Just added a check for the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be nice to add an example or two here, both to show off the new feature and to illustrate the behavior with boolean output.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good idea, will do
pandas/core/strings.py Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
use is_bool_dtype(result)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good catch I'll change this
| @mortada yeh I think we need to do something for
|
| @mortada Thanks for quick action:) |
| Didn't look in detail into this, but a quick question: what is the return type? I suppose in general a new index? But what if the method returns multiple elements for each entry? (eg returning a frame for a series)? |
| @jorisvandenbossche the return type is if the method returns multiple elements it will be an |
4b68a72 to 1b48868 Compare | By the way, I think this is a very nice feature, especially useful for cleaning up your column names (stripping whitespace at beginning or end -> typical problem you see on SO, replacing whitespace with |
| @jorisvandenbossche I don't think its a problem if certain methods which have a Another option is to exclude the methods entirely from the |
1b48868 to 8968009 Compare | @jreback I added |
| lgtm. @jorisvandenbossche |
8968009 to 7d734fb Compare | @mortada Are you sure you committed it? I still only see the frame/series options (or I am overlooking it completely ..) |
7d734fb to ed77c72 Compare | @jorisvandenbossche oops I had to do another rebase and missed the last commit, thanks for catching! |
| @jorisvandenbossche it's fixed, please take a look thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this {'series'/'index', 'frame'} to make it more clear they are aliases?
| The thing I am thinking now: won't it be confusing for users to get a Index back even if he/she supplies I was also thinking of another option: an But maybe we shouldn't block this PR with this discussion and open a new issue for this. So I am OK with merging. |
| how about we just do
but let's do that in another issue / PR |
| @jreback @jorisvandenbossche I'm happy to do another PR for the |
ed77c72 to f98bcb8 Compare | ok, merging this, but @mortada I would appreciate another issue (and PR!) for the |
ENH: add StringMethods (.str accessor) to Index, fixes #9068
| @mortada really a nice improvement! I will open an new issue to discuss the |
| @jorisvandenbossche thanks! Another thing you had mentioned was expanding the docs to cover some questions on SO about cleaning up column names. Could you please point me to some of those SO questions? I can add more examples to the docs. |
| @mortada I don't have a specific SO question on my mind, but you often see questions like: "why does So for me, this are typical ways to clean up column names:
|
| @mortada very nice, thanks! |
fixes #9068