imbalanced-learn vs ydata-profiling

imbalanced-learn

A Python Package to Tackle the Curse of Imbalanced Datasets in Machine Learning (by scikit-learn-contrib)

Source Code

imbalanced-learn.org

ydata-profiling

1 Line of code data quality profiling & exploratory data analysis for Pandas and Spark DataFrames. (by ydataai)

Source Code

docs.sdk.ydata.ai

Project	imbalanced-learn	ydata-profiling
Mentions	1	44
Stars	7,070	13,311
Growth	0.3%	0.7%
Activity	6.9	7.5
Latest Commit	4 months ago	8 days ago
Language	Python	Python
License	MIT License	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

imbalanced-learn

Posts with mentions or reviews of imbalanced-learn. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-05-26.

What’s your approach to highly imbalanced data sets?
5 projects | /r/datascience | 26 May 2023

There's a plethora of undersampling and oversampling methods you can try out. To avoid removing information from the dataset, you can focus on oversampling techniques. You can try imbalanced-learn or smote-variants. Given enough data, using fully synthetic data is also an option; you can check ydata-synthetic for it. Let us know how it turns out!
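To make the oversampling idea in the post above concrete, here is a minimal plain-Python sketch of random oversampling: minority-class rows are duplicated at random until every class is as large as the biggest one, so no original row is discarded. This is not imbalanced-learn's API; the function name `random_oversample`, its parameters, and the toy data are all hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate minority-class rows at random until all classes are equal in size.

    Hypothetical helper for illustration only; not part of imbalanced-learn.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # size of the largest class

    # Start from the original data so nothing is removed.
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        pool = [r for r, l in zip(rows, labels) if l == cls]
        # Draw (with replacement) as many extra rows as this class is short.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Toy data: five rows of class 0, one row of class 1.
rows = [[0.1], [0.2], [0.3], [0.4], [0.5], [0.9]]
labels = [0, 0, 0, 0, 0, 1]

new_rows, new_labels = random_oversample(rows, labels)
print(Counter(new_labels))
```

Plain duplication like this can encourage overfitting to the repeated rows, which is one reason interpolation-based methods such as SMOTE are often tried as well. Resampling should normally be applied to the training split only, never to the evaluation data.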

ydata-profiling

Posts with mentions or reviews of ydata-profiling. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2025-03-12.

The DuckDB Local UI
21 projects | news.ycombinator.com | 12 Mar 2025

WhatTheDuck does SQL with duckdb-wasm IIRC
Pygwalker does open-source descriptive statistics and charts from pandas dataframes: https://github.com/Kanaries/pygwalker
ydata-profiling does Exploratory Data Analysis (EDA) with Pandas and Spark DataFrames and integrates with various apps: https://github.com/ydataai/ydata-profiling

FLaNK 25 December 2023
33 projects | dev.to | 26 Dec 2023

First 15 Open Source Advent projects
16 projects | dev.to | 15 Dec 2023

6. Ydata-synthetic and Ydata-profiling by YData | Github | tutorial

Coding Wonderland: Contribute to YData Profiling and YData Synthetic in this Advent of Code
4 projects | dev.to | 5 Dec 2023

Send us your North ⭐️: "On the first day of Christmas, my true contributor gave to me..." a star in my GitHub tree! 🎵 If you love these projects too, star ydata-profiling or ydata-synthetic and let your friends know why you love them so much!

Data exploration is not dead
1 project | news.ycombinator.com | 24 Jun 2023

Explore your data in a single line of code
1 project | news.ycombinator.com | 24 Jun 2023

Which preprocessing steps to improve the performance of a naive bayes classifier
1 project | /r/learnmachinelearning | 23 Jun 2023

My suggestion: start with the EDA. There are a lot of packages that automate that for you already. My usual go-to: https://github.com/ydataai/ydata-profiling.

Simulating sales data
2 projects | /r/datascience | 12 Jun 2023

If you're not sure about the behaviour of your data (i.e., if the original data has properties like seasonality), you can use ydata-profiling to profile your data first.

I recorded a Data Science Project using Python and uploaded it on Youtube
2 projects | /r/datascienceproject | 1 Jun 2023

Super cool! For EDA, you could give ydata-profiling a spin sometime and speed up the process!

Ydata-Profiling and Dask
1 project | news.ycombinator.com | 19 May 2023

Hey guys,
We were recently at the Dask Demo Day and we're hoping to launch a new feature on ydata-profiling: support for Dask dataframes!
We're looking for Dask Wizards to start collaborating on this feature, so if you're interested, please join us to define the roadmap of the project and start making it real.
Current GitHub branch is here: https://github.com/ydataai/ydata-profiling/tree/feat/dask
Dedicated Dask channel here: https://discord.gg/EHDBuSSDuy
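
Several of the posts above recommend profiling a dataset before modelling it. As a rough illustration of what "profiling" means, the following plain-Python sketch computes a few per-column summary statistics (row count, missing values, distinct values, and basic numeric stats). It does not use ydata-profiling and does not reflect its API or report format; `profile_column`, `profile_table`, and the sample `sales` data are hypothetical names invented for this example.

```python
import statistics


def profile_column(values):
    """Summarise one column; None is treated as a missing value.

    Hypothetical helper for illustration only; not part of ydata-profiling.
    """
    present = [v for v in values if v is not None]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    # Add numeric statistics only when every present value is a number.
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    if numeric and len(numeric) == len(present):
        summary["min"] = min(numeric)
        summary["max"] = max(numeric)
        summary["mean"] = statistics.fmean(numeric)
    return summary


def profile_table(table):
    """Profile every column of a dict-of-lists table."""
    return {name: profile_column(col) for name, col in table.items()}


# Made-up sample data with one missing value per column.
sales = {
    "units": [3, 5, None, 8],
    "region": ["north", "south", "north", None],
}

report = profile_table(sales)
for name, summary in report.items():
    print(name, summary)
```

A dedicated profiling tool goes much further than this (distributions, correlations, warnings about data quality), but even these few numbers surface missing values and suspicious ranges before a model is trained.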

What are some alternatives?

When comparing imbalanced-learn and ydata-profiling you can also consider the following projects:

deodel - A mixed attributes predictive algorithm implemented in Python.

DataProfiler - What's in your data? Extract schema, statistics and entities from datasets

general_class_balancer - Data matching algorithm for categorical and continuous variables

dataprep - Open-source low-code data preparation library in Python. Collect, clean and visualize your data in Python with a few lines of code.

confidenceinterval - The long missing library for python confidence intervals

dtale - Visualizer for pandas data structures
