I would like to propose an AI-powered malware package scanner, because anyone can upload malware to PyPI, and even when people report it, damage has already been done.
This solution would help keep PyPI safe for developers and minimize malware. For example, soopsocks was a malicious package that was detected, but some users had already downloaded it before it was deleted from PyPI. A solution like this would minimize the risk from malicious package uploads.
There are already groups of security researchers who audit the packages on PyPI. The malware reporting feature on PyPI is open to everyone, so anyone who manages to build a malware-scanning AI that isn't as hopeless as all the general-purpose AI-powered antivirus services that have sprung up is free to download wheels, scan them, and report the bad ones themselves.
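For reference, the "download, scan, report" loop those researchers automate can be sketched with crude pattern heuristics. The patterns and function names below are illustrative assumptions, nothing like a real production scanner:

```python
import re

# Toy red-flag patterns loosely resembling tricks seen in malicious
# Python packages: executing decoded blobs, shelling out at install
# time, raw outbound connections. A real scanner is far more involved.
SUSPICIOUS_PATTERNS = [
    re.compile(r"exec\s*\(\s*base64"),            # exec of a base64 blob
    re.compile(r"eval\s*\(\s*compile"),           # eval of compiled code
    re.compile(r"subprocess.*setup\.py", re.S),   # shelling out during install
    re.compile(r"socket\.socket.*connect", re.S), # raw outbound connection
]

def scan_source(source: str) -> list[str]:
    """Return the patterns that matched the given source text."""
    return [p.pattern for p in SUSPICIOUS_PATTERNS if p.search(source)]

# A loader-style snippet trips a pattern; ordinary code does not.
bad = "import base64\nexec(base64.b64decode(payload))"
good = "def add(a, b):\n    return a + b"
print(scan_source(bad))   # non-empty list of matched patterns
print(scan_source(good))  # []
```

The hard part, of course, is not matching patterns but keeping this from flagging the thousands of legitimate packages that also use `exec`, `subprocess`, or sockets.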
No, this feature is designed so that when someone uploads a package, the AI scans it automatically; if malware is found, the package is deleted and a message is sent to the maintainer, to keep the community safe and protect developers before harm or damage is done.
(Somewhat tempted to move this to Help or Packaging, rather than Ideas, which is for ideas about the language.)
This post is of a kind that I don't very much like, which just lobs a vague thought out without doing much to build towards any kind of productive conversation.
One possibility is that you intend to build this and feel you have a lot of relevant expertise? In which case, cool, let us know when you have a proof of concept! I'm all in favor of building things, even projects which I personally think are probably doomed to fail, since they may spawn or inspire other projects or teach us all what does or does not work.
Another is that you'd like to build this, but are wondering if you should? My advice would be not to. There are already ML-based tools, decades of research, and novel attacks showing up all the time. Classifying packages as malware is incredibly hard. If you're approaching this as a fun hobby project, I think you'll end up not enjoying it.
And then third, and this is the possibility that makes me dislike such posts, you might be of the impression that none of the people who work on maintaining these software systems have thought of using software to make their jobs easier? i.e. If the response you were hoping for was "Wow, nobody ever considered that! What an amazing idea! We'll go build it!" then I think you're badly out of step with the community. If you are genuinely just curious if an idea, even an obvious one, had been considered or is in use, then you can ask that as a question rather than phrasing it as a novel idea.
These researchers will have the process of downloading and scanning new uploads automated. The only meaningful difference between what you're proposing and the status quo is you'd have the suspicious packages deleted straight away whereas a human currently has to verify the findings before acting. For that to have a chance of floating, you will need to demonstrate that the false positive rate is so exceptionally low that the proposal is worth the enormous disruption that even one misguided deletion could cause.
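To put rough numbers on that base-rate problem (every figure below is a made-up assumption for illustration, not a PyPI statistic):

```python
# Hypothetical illustration of why automatic deletion is so risky.
# All rates and volumes here are assumptions, not real PyPI numbers.
uploads_per_day = 10_000      # assumed daily new releases
malware_rate = 0.001          # assume 0.1% of uploads are malicious
false_positive_rate = 0.01    # a seemingly "good" 1% classifier FPR
true_positive_rate = 0.95     # assume it catches 95% of real malware

malicious = uploads_per_day * malware_rate        # 10 malicious uploads
benign = uploads_per_day - malicious              # 9990 harmless uploads
caught = malicious * true_positive_rate           # ~9.5 real detections
wrongly_deleted = benign * false_positive_rate    # ~99.9 wrongful deletions

print(f"real malware removed per day: {caught:.1f}")
print(f"harmless packages deleted per day: {wrongly_deleted:.1f}")
# Even at a 1% false positive rate, wrongful deletions would outnumber
# real detections roughly ten to one under these assumptions.
```

Because malicious uploads are rare relative to legitimate ones, even an impressively accurate classifier produces mostly false alarms, which is exactly why a human stays in the loop today.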
First, the AI needs to be trained to minimize false positives and false negatives before it is integrated. That can take years, which is no problem: after the training is finished, then integrate it into pypi.org.
Yes, AI/ML models need to be trained, and that takes some years; I think everyone here knows that.
But if you yourself realise that it's gonna take a long time to train a model that can accurately detect malware and not disturb harmless projects, then why propose something that will take a lot of time (and money) whilst not actually increasing the usefulness of PyPI that much?
IMO, if you download something from PyPI without checking the code beforehand and it turns out to be a piece of malware, well, that's your fault. It's the same situation that GitHub is facing. If they were to ban malware, that could even be "bad". Sometimes malware is explicitly placed somewhere in order to e.g. inform others about certain kinds of malware (e.g. "This kind of virus works by …"). You'd automatically lock out that kind of content too. Humans are the most reliable way to check for malicious code in a project.
I disagree. Supply chain attacks to start, but you should also feel safe trusting a project like numpy by reputation without having the requisite skills to audit the source. Securing the supply chain is a difficult (maybe, in the perfect sense, impossible) task, but it is an active effort.
However, that's all nuanced stuff, and I'm not sure this thread is the place to discuss it.
No, this should be implemented because of the threat of cyberattacks in 2025 and beyond. Attackers are leveraging AI, so defenders should train AI too. For example, the AI could monitor popular packages on PyPI: any obfuscated code appearing in a popular, supposedly safe package would be a red flag that triggers an email to the maintainer to take action and remove the malicious code, reducing the abuse of PyPI.
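As a rough illustration, one common "obfuscation red flag" heuristic is flagging long, high-entropy string literals (often packed or base64-encoded payloads). This is a toy sketch with an arbitrary threshold, not the proposed system:

```python
import math
from collections import Counter

def shannon_entropy(s: str) -> float:
    """Bits per character of the string's character distribution."""
    counts = Counter(s)
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def looks_obfuscated(source: str, threshold: float = 4.5) -> bool:
    # Flag any long whitespace-delimited token with unusually high
    # entropy; the 40-char / 4.5-bit cutoffs are arbitrary assumptions.
    tokens = [t for t in source.split() if len(t) > 40]
    return any(shannon_entropy(t) > threshold for t in tokens)

# A base64-looking blob trips the heuristic; plain code does not.
payload = "aHR0cHM6Ly9ldmlsLmV4YW1wbGUvc3RhZ2UyLnB5P2s9OTgzNDc1OTgzNDU="
print(looks_obfuscated(f"x = '{payload}'"))             # True
print(looks_obfuscated("def add(a, b): return a + b"))  # False
```

Note that minified or compressed data in perfectly legitimate packages trips the same heuristic, which circles back to the false-positive problem discussed above.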
It doesn't really matter what the motivation or specifics of how it's to be done are. The answer is the same. If you think this can work and want to do it yourself then no-one is stopping you. If you're expecting someone else to do the hard work on a whim because you suggested they should then forget it.