ENH: Add Coefficient of Variation to DataFrame.describe() #61784

Closed
@ffaa1234

Feature Type

  • Adding new functionality to pandas

  • Changing existing functionality in pandas

  • Removing existing functionality in pandas

Problem Description

The DataFrame.describe() method reports the standard deviation (std), but std is hard to interpret on its own because it depends on the scale of the data. The coefficient of variation (CV = std / mean * 100) expresses variability relative to the mean, which makes it easier to judge whether a given std is "big."

Feature Description

Add CV as an extra row in the DataFrame.describe() output for numeric columns, enabled optionally via a new keyword argument, e.g. df.describe(include_cv=True).
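A minimal sketch of how the proposed behavior could be approximated today, assuming a user-side wrapper. The function name describe_with_cv is hypothetical and not part of pandas; include_cv is the keyword proposed above and is not an existing describe() parameter.

```python
import pandas as pd


def describe_with_cv(df: pd.DataFrame, include_cv: bool = False) -> pd.DataFrame:
    """Hypothetical helper mimicking the proposed describe(include_cv=True)."""
    # Restrict to numeric columns, since CV is only defined for numbers.
    numeric = df.select_dtypes(include="number")
    desc = numeric.describe()
    if include_cv:
        # CV = std / mean * 100, reusing the rows describe() already computed.
        desc.loc["CV (%)"] = desc.loc["std"] / desc.loc["mean"] * 100
    return desc


df = pd.DataFrame({"A": [10, 12, 14, 15, 13], "B": [1000, 1100, 900, 950, 1050]})
result = describe_with_cv(df, include_cv=True)
print(result)
```

With include_cv=False the helper returns the same summary as describe() on the numeric columns, so existing behavior would be unchanged by default.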

Example

import pandas as pd

data = {'A': [10, 12, 14, 15, 13], 'B': [1000, 1100, 900, 950, 1050]}
df = pd.DataFrame(data)
desc = df.describe()
desc.loc['CV (%)'] = (df.std() / df.mean() * 100)
print(desc)

Output:

                A            B
count    5.000000     5.000000
mean    12.800000  1000.000000
std      1.923538    79.056942
min     10.000000   900.000000
25%     12.000000   950.000000
50%     13.000000  1000.000000
75%     14.000000  1050.000000
max     15.000000  1100.000000
CV (%)  15.027641     7.905694

Benefits

  • Interpretability: CV shows relative variability, aiding comparison across columns.
  • Usability: Simplifies exploratory data analysis.
  • Relevance: Widely used in fields like finance and biology.

Alternative Solutions

Users can compute CV manually from df.std() and df.mean(), but this adds an extra step to every summary and is less convenient.
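For reference, a sketch of the manual route that returns CV as a standalone Series rather than a row in the describe() table. It uses only existing pandas calls. Replacing zero means with NaN is an assumption added here for illustration (CV is undefined when the mean is zero); it is not something the proposal specifies.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [10, 12, 14, 15, 13], "B": [1000, 1100, 900, 950, 1050]})

# Compute both statistics in one pass; rows are labeled "mean" and "std".
stats = df.agg(["mean", "std"])

# Assumption: treat a zero mean as missing so the result is NaN instead of inf.
safe_mean = stats.loc["mean"].replace(0, np.nan)

cv = stats.loc["std"] / safe_mean * 100
print(cv)
```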

Additional Context

No response
