@jzwick
Contributor

@jzwick jzwick commented Oct 1, 2025

  • closes #xxxx (Replace xxxx with the GitHub issue number)
  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest doc/source/whatsnew/vX.X.X.rst file if fixing a bug or adding a new feature.

This PR adds an AGENTS.md file, which is like "a README for agents: a dedicated, predictable place to provide the context and instructions to help AI coding agents work on your project." (more on AGENTS.md here)

This PR started as a copilot-instructions.md (see more on copilot-instructions.md files here), but the more generic AGENTS.md was suggested, which is supported not only by Copilot but also by many other AI agents.

I have initially populated the file with some simple guidance on Decision Heuristics, Type Hints and Docstrings, based on the existing guidance in the "Contributing" section of the project documentation. The expectation is that the content in this file would grow and become more comprehensive over subsequent PRs.

@WillAyd
Member

WillAyd commented Oct 2, 2025

Thanks for the PR. I don't have any experience with this so am open to experiences from other projects.

The github link provides an example of an instructions file that is much more compact than what is provided here:

https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions#writing-effective-repository-custom-instructions

Is there an advantage to this text copied from the guide versus something that would be more bulleted like the example?

@jzwick
Contributor Author

jzwick commented Oct 2, 2025

> Thanks for the PR. I don't have any experience with this so am open to experiences from other projects.
>
> The github link provides an example of an instructions file that is much more compact than what is provided here:
>
> https://docs.github.com/en/copilot/how-tos/configure-custom-instructions/add-repository-instructions#writing-effective-repository-custom-instructions
>
> Is there an advantage to this text copied from the guide versus something that would be more bulleted like the example?

Thanks Will. My goal with the copy-paste was to avoid any controversy over wording. I'll ask Copilot to rewrite it in a way that is better suited for a copilot-instructions.md and update the PR with what it gives me ;)

Separately, on Slack someone made me aware of AGENTS.md, which is now natively supported by GitHub Copilot (among many other agents), so I will update this to be an AGENTS.md instead.

@jzwick jzwick changed the title from "DOC: Add copilot-instructions.md with type hints" to "DOC: Add AGENTS.md with basic type and docstring guidelines" Oct 3, 2025
Member

@Alvaro-Kothe Alvaro-Kothe left a comment

I am neutral on these changes, but I think this file needs to be improved. I think the one from openai/agents.md is a great example.

- doc/source/development/contributing_docstring.rst
- doc/source/development/contributing_documentation.rst
- doc/source/development/contributing.rst

Member

Suggested change
+ doc/source/development/contributing_environment.rst

It needs instructions on how to build pandas from source. I am not sure if the model will be reliable if it reads all these files. I think it's best to create sections focusing on some aspects and link to the .rst files for additional information. The agents.md highlights:

  • Project overview
  • Build and test commands
  • Code style guidelines
  • Testing instructions
  • Security considerations
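
As a rough illustration of that outline, the following Python sketch reports which of the five sections listed above have no matching heading in a Markdown file. The `missing_sections` helper and its heading-matching rule are assumptions made for this example; they are not part of pandas or of any agents.md tooling.

```python
import re

# Section names taken from the list above; matching is case-insensitive.
RECOMMENDED_SECTIONS = [
    "Project overview",
    "Build and test commands",
    "Code style guidelines",
    "Testing instructions",
    "Security considerations",
]


def missing_sections(markdown_text: str) -> list:
    """Return the recommended sections that have no matching Markdown heading."""
    # Collect the text of every ATX heading ("# Title", "## Title", ...).
    headings = {
        match.group(1).strip().lower()
        for match in re.finditer(
            r"^#{1,6}[ \t]+(.+?)[ \t]*$", markdown_text, re.MULTILINE
        )
    }
    return [name for name in RECOMMENDED_SECTIONS if name.lower() not in headings]


# Hypothetical file content, used only to exercise the helper.
sample = "# Project overview\n\n## Code style guidelines\n\nUse type hints.\n"
print(missing_sections(sample))
```

An exact heading match is deliberately strict; a real AGENTS.md may word its headings differently, so a miss here is a prompt to look, not proof that guidance is absent.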
Contributor Author

Thanks for the input. The intent of this PR is to "get the ball rolling" with a simple AGENTS.md, and to continue to expand upon it with subsequent PRs. For the first PR we wanted to stay focused on some simple guidelines around style (types and docstrings), and on pointing to the existing contributing_*.rst files.

I agree that sections with more details for Build, Testing, and Security would be very valuable - we just prefer to have those contributed as separate PRs for more engaged conversation around each of those specifically.

Contributor Author

WRT whether to have the .rst files linked at the top or individually within each of the sections below, I'm open to what others have experience with. Keeping them in one place seemed more maintainable (e.g. easier to see if one is missing or renamed, and where to update the reference).

@jbrockmendel
Member

Is the idea to encourage AI PRs? That’s the opposite of what we want.

@jzwick
Contributor Author

jzwick commented Oct 8, 2025

> Is the idea to encourage AI PRs? That’s the opposite of what we want.

No, we are not trying to encourage AI PRs. IMO the motivation here is acknowledging the reality that many people are already developing with the help of AI agents, so we are trying to improve both the (human) developer's experience and the code quality that ends up in PRs. For example, if we have a good AGENTS.md with robust instructions around the Style Guidelines, then for folks who do leverage Copilot, Cursor, etc., hopefully that reduces the corrections made at PR review.

Member

@WillAyd WillAyd left a comment

lgtm

Member

@mroeschke mroeschke left a comment

I still have some healthy skepticism of the effectiveness of this file. It would be nice to see the impact of this file when AI is used to help/solve a good first issue.

As an experiment, could you open an experimental PR that uses AI to address a good first issue while this file is present (and share all the transcripts from the AI, especially what rules it is following)? It would also be good to see another PR with a similar premise but where other instructions exist, e.g. .cursor/rules, that may differ from or conflict with this AGENTS.md.

@WillAyd
Member

WillAyd commented Oct 9, 2025

I think it will be difficult or impossible to measure the effectiveness of an AI solution, and that will lead to a level of subjectivity with this in the future, but I'm also hesitant for either of those problems to be a blocker. AI tools are popular companions and learning tools, particularly for new contributors, so I am +1 to anything that promotes contribution.

@mroeschke
Member

mroeschke commented Oct 9, 2025

> difficult or impossible to measure the effectiveness of an AI solution

I'm just interested in whether the AI solution even uses this AGENTS.md file in its context, especially since it appears to be a newish standard. My impression is that AI coding tools should pick this up automatically, so I'm just curious whether a prompt like:

"Create a plan on how to solve $PANDAS_ISSUE and describe the conventions or rules to follow for this repository when implementing that plan."

gives any indication that it picked anything up from AGENTS.md.
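
One way to run that check repeatably is sketched below in Python: fill the `$PANDAS_ISSUE` placeholder, then look for rule phrases from AGENTS.md in the agent's reply. The helper names, the sample rule phrases, and the sample reply are all hypothetical, and plain substring matching is only a crude signal that the file was read.

```python
from string import Template

# Prompt wording taken from the comment above; $PANDAS_ISSUE is its placeholder.
PROBE_PROMPT = Template(
    "Create a plan on how to solve $PANDAS_ISSUE and describe the conventions "
    "or rules to follow for this repository when implementing that plan."
)


def build_probe_prompt(issue_ref: str) -> str:
    """Fill the placeholder with an issue reference chosen by the experimenter."""
    return PROBE_PROMPT.substitute(PANDAS_ISSUE=issue_ref)


def echoed_rules(agent_reply: str, rule_phrases: list) -> list:
    """Return the rule phrases that appear (case-insensitively) in the reply."""
    reply = agent_reply.lower()
    return [phrase for phrase in rule_phrases if phrase.lower() in reply]


# Hypothetical rule phrases and reply, used only to exercise the helpers.
rules = ["type hints", "NumPy-style docstrings", "succinct description"]
reply = "I will add type hints and write NumPy-style docstrings."
print(build_probe_prompt("<issue URL>"))
print(echoed_rules(reply, rules))
```

An agent could mention these conventions without having read AGENTS.md, so a match is weak evidence; comparing replies with and without the file present would be more telling.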

@jzwick
Contributor Author

jzwick commented Oct 9, 2025

@mroeschke I think that's a fair question. I will do a bit of "experimenting" and report back with my findings.

@mroeschke
Member

Thanks for being open to experimenting.

For context, I am also not opposed to contributors using AI, but I (and other maintainers of larger open source projects I've spoken to) have noticed an increase in poor-quality contributions in the past year that show signs of AI use, which makes AI feel like a net negative experience from the maintainers' side. So I'm hoping these "agent guidelines" can help AI follow the conventions that a human would when contributing to pandas.

@jbrockmendel
Member

There was a flood of AI slop PRs/issues a couple months ago but it seems to have calmed down. Any guesses as to why? Could it really be as simple as "because we asked people to stop"?

@jzwick
Contributor Author

jzwick commented Oct 10, 2025

@mroeschke @WillAyd here is a link to the results of my experimentation https://github.com/jzwick/pandas/blob/agentsmdexperiment/agentsmd_experiment.md

TL;DR: instruction files do have an impact, but (in my experience) only when the user prompts the agent to look at them. The agent only needs to be prompted once in the session, not with each instruction, even if a checkpoint is reverted. It didn't follow the instructions about loading other files or asking for the *.rst files to be provided as context, so IMO we could remove that for now and consider better ways of prompting or including the source content in subsequent PRs.

Interested to see what you think.

@jzwick jzwick marked this pull request as ready for review October 15, 2025 16:42
Member

@mroeschke mroeschke left a comment

I suppose I'm OK with adding this file, as I hope that over time AI coding tools can be smart enough to pick this up. Additionally, I think it should just reflect what we document in our "contributing guidelines" (https://pandas.pydata.org/docs/development/contributing_codebase.html), if you want to submit a follow-up.

Could you also add another checkbox to https://github.com/pandas-dev/pandas/blob/main/.github/PULL_REQUEST_TEMPLATE.md with something like - [ ] If I used AI to develop this pull request, I prompted it to follow AGENTS.md ?

It would be good to solicit more opinions from other maintainers though @pandas-dev/pandas-core
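
For illustration, a checkbox like the one proposed above could be read back from a PR description with a few lines of Python. This is a minimal sketch: the `agents_md_checkbox` helper is hypothetical, the sample description is made up, and it assumes the item is a standard Markdown task-list entry that mentions AGENTS.md by name.

```python
import re
from typing import Optional

# Matches Markdown task-list items such as "- [ ] text" or "* [x] text".
TASK_ITEM = re.compile(r"^\s*[-*]\s+\[([ xX])\]\s+(.*)$", re.MULTILINE)


def agents_md_checkbox(pr_body: str) -> Optional[bool]:
    """Return True if the AGENTS.md item is checked, False if unchecked,
    and None if no task-list item mentions AGENTS.md."""
    for match in TASK_ITEM.finditer(pr_body):
        if "AGENTS.md" in match.group(2):
            return match.group(1).lower() == "x"
    return None


# Hypothetical PR description, used only to exercise the helper.
body = (
    "- [x] All code checks passed.\n"
    "- [ ] If I used AI, I prompted it to follow AGENTS.md\n"
)
print(agents_md_checkbox(body))  # False
```

The three-way result keeps "unchecked" distinct from "template item removed", which a reviewer would likely want to treat differently.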

@fangchenli
Member

> There was a flood of AI slop PRs/issues a couple months ago but it seems to have calmed down. Any guesses as to why? Could it really be as simple as "because we asked people to stop"?

That flood of AI PRs was likely due to someone testing/promoting their agent product. Apparently, they didn't receive much positive feedback from us, so at some point, it just wasn’t worth continuing.

Another possibility is that the base models have improved so much recently that we just can’t tell when a PR comes from an LLM anymore.

@jzwick
Contributor Author

jzwick commented Oct 15, 2025

> I suppose I'm OK with adding this file, as I hope that over time AI coding tools can be smart enough to pick this up. Additionally, I think it should just reflect what we document in our "contributing guidelines" (https://pandas.pydata.org/docs/development/contributing_codebase.html), if you want to submit a follow-up.
>
> Could you also add another checkbox to https://github.com/pandas-dev/pandas/blob/main/.github/PULL_REQUEST_TEMPLATE.md with something like - [ ] If I used AI to develop this pull request, I prompted it to follow AGENTS.md ?
>
> It would be good to solicit more opinions from other maintainers though @pandas-dev/pandas-core

That's an interesting idea: to see whether I would have better success prompting Copilot to follow the guidance from the public website rather than the source files in the repo. I'll play around with that and report back.

Regardless, I'm inclined to say that it's better to keep pointing agents to the contributing docs (the website, the source *.rst files in the repo, or both) rather than duplicating all the info. IMO it is better to keep the guidance in AGENTS.md light and point agents to the source (even if that seems ineffective right now), in the hope that they will soon be able to reference it. Trying to keep all the guidance in sync in two places feels messy. Maybe someone with an MCP setup with Copilot, or a different AI agent, would have more success.

I'll add the box to the PR template, and re-add the instructions to refer to the *.rst files and/or the webpage.

@Dr-Irv
Contributor

Dr-Irv commented Oct 15, 2025

One thing we discussed is that with the AI generated PRs, there is a LOT of verbosity in the description of the PR. See this comment for an example: #62590 (comment) (note the comment is in an issue, but we've seen this in the PRs as well).

So if there is a way that you can indicate in AGENTS.md that any verbal description of what is done in the PR needs to be brief, that would be helpful to us.

Contributor

@Dr-Irv Dr-Irv left a comment

I'm good with this, if it helps us get better AI-generated PRs.

- PERF: Performance improvement
- TYP: Type annotations
- CLN: Code cleanup
- Pull request descriptions should follow the template, and **succinctly** describe the change being made. Usually a few sentences is sufficient.
Member

Can we add "Only add or update summaries in the opening post. Do not add summaries in other comments."

In the past, each update to an AI-assisted PR has come with a comment summarizing what was done in the commit. This is unnecessary, adds noise, and risks being inaccurate; the commit is sufficient for determining what was done.
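
The title-prefix and brevity guidance discussed in this thread could be spot-checked with a small script. The Python sketch below is only an illustration: the prefix set contains just the prefixes that appear in this conversation (the project's full list is longer), and the five-sentence threshold is an assumed reading of "a few sentences", not a project rule.

```python
import re
from typing import Optional

# Only the prefixes visible in this thread; the project's full list is longer.
KNOWN_PREFIXES = {"DOC", "PERF", "TYP", "CLN"}
MAX_SENTENCES = 5  # assumption: one possible reading of "a few sentences"


def title_prefix(title: str) -> Optional[str]:
    """Return the leading 'XXX:' prefix of a PR title, or None if absent."""
    match = re.match(r"([A-Z]+): \S", title)
    return match.group(1) if match else None


def sentence_count(description: str) -> int:
    """Crude count: split on ., ! or ? followed by whitespace or end of text."""
    parts = re.split(r"[.!?]+(?:\s+|$)", description)
    return len([part for part in parts if part.strip()])


title = "DOC: Add AGENTS.md with basic type and docstring guidelines"
description = "Adds AGENTS.md. Points to the contributing docs."
print(title_prefix(title) in KNOWN_PREFIXES)
print(sentence_count(description) <= MAX_SENTENCES)
```

The sentence splitter ignores periods that are not followed by whitespace, so file names such as AGENTS.md do not inflate the count; abbreviations like "e.g." still would.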
