DEV Community

Matteo
Making scientific Python blazingly fast with PyTorch

If you’ve ever written scientific code in Python, you know the NumPy + SciPy stack. A lot of people say it is fast, but to me it is not fast enough. In my recent paper TorchFX: A Modern Approach to Audio DSP with PyTorch and GPU Acceleration (DAFx25), I explored a solution to this long-standing problem.

The idea is simple yet powerful: instead of relying on the traditional NumPy + SciPy stack, we can swap them out for PyTorch, instantly gaining GPU acceleration, an object-oriented API, and direct compatibility with AI workflows.
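Because torch mirrors much of NumPy’s API, the swap is often nearly mechanical. A minimal sketch (the RMS computation here is just an illustrative example, not code from the paper):

```python
import numpy as np
import torch

# Same computation in both stacks: the RMS of a one-second signal.
x_np = np.random.randn(48_000).astype(np.float32)
rms_np = np.sqrt(np.mean(x_np ** 2))

# The PyTorch version reads almost identically; moving it to a GPU
# (when one is available) is a single .to(device) call away.
device = "cuda" if torch.cuda.is_available() else "cpu"
x_t = torch.from_numpy(x_np).to(device)
rms_t = torch.sqrt(torch.mean(x_t ** 2))

print(float(rms_np), float(rms_t))  # agree to float32 precision
```

The same tensor code runs unchanged on CPU and GPU, which is exactly what makes the swap attractive.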

The slow Python problem

Python itself is notoriously slow for heavy computations. The interpreter adds overhead, and loops kill performance. That’s why scientific programming in Python almost always starts with NumPy and SciPy, which wrap C and Fortran under the hood. But even then, scaling to large matrix multiplications (as happens in multichannel audio) remains a challenge.
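To see why loops are the bottleneck, here is a small self-contained comparison (exact timings will of course vary by machine):

```python
import time
import numpy as np

n = 1_000_000
x = np.random.randn(n)

# Pure-Python loop: every iteration pays interpreter overhead.
t0 = time.perf_counter()
acc = 0.0
for v in x:
    acc += v * v
t_loop = time.perf_counter() - t0

# Vectorized: a single call into NumPy's compiled core.
t0 = time.perf_counter()
acc_vec = float(x @ x)
t_vec = time.perf_counter() - t0

print(f"loop: {t_loop:.3f}s   vectorized: {t_vec:.4f}s")
```

On a typical machine the vectorized version is orders of magnitude faster, even though both compute the same sum of squares.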


Existing solutions (and their limitations)

Over the years, developers have tried different hacks to speed up Python:

  • Vectorization with NumPy – great, but requires reshaping your code.
  • Cython – lets you drop into C, but you need to rewrite parts of your program.
  • Numba – JIT compilation is cool, but not everything is parallelizable, and libraries like SciPy aren’t fully supported.

All of these approaches share one problem: you must refactor your code, sometimes in not-so-elegant ways, and you’re still limited by CPU-based execution. (Numba does offer a @cuda.jit decorator to run code on an NVIDIA GPU, but even there you often need to add type annotations, as in Cython, and to understand CUDA kernels.)
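For concreteness, here is the kind of code where Numba helps and plain NumPy can’t: a recursive (IIR) one-pole filter, whose loop cannot be vectorized away because each output sample depends on the previous one. The one_pole function is my own illustrative example; the try/except fallback just keeps the snippet runnable when Numba isn’t installed:

```python
import numpy as np

try:
    from numba import njit          # JIT-compile the loop if Numba is available
except ImportError:
    def njit(f):                    # fallback: run as plain (slow) Python
        return f

# One-pole lowpass: y[n] depends on y[n-1], so the loop is inherently
# sequential and NumPy vectorization can't eliminate it.
@njit
def one_pole(x, a):
    y = np.empty_like(x)
    y[0] = x[0]
    for n in range(1, len(x)):
        y[n] = (1.0 - a) * x[n] + a * y[n - 1]
    return y

x = np.random.randn(100_000)
y = one_pole(x, 0.99)   # with Numba, the first call compiles; later calls are fast
```

This works well, but it illustrates the trade-off: the speedup is tied to writing loop-shaped, type-stable code that the JIT can handle.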


PyTorch to the rescue

This is where PyTorch comes in. Originally built for AI, PyTorch provides:

  • Tensors that work like NumPy arrays, but with GPU acceleration.
  • An object-oriented API, more structured than SciPy’s MATLAB-inspired procedural style.
  • Direct integration with AI models: any transformation you write as a subclass of nn.Module can be dropped straight into a neural network.
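As a sketch of that last point, here is a toy DSP block written as an nn.Module (the Gain class is my own illustrative example, not part of any library):

```python
import torch
from torch import nn

# A tiny DSP block as an nn.Module: a learnable per-channel gain.
# Because it's a Module, it composes with any PyTorch model and its
# parameter is even trainable by backprop.
class Gain(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.gain = nn.Parameter(torch.ones(channels, 1))

    def forward(self, x):           # x: (channels, samples)
        return self.gain * x

device = "cuda" if torch.cuda.is_available() else "cpu"
block = Gain(channels=8).to(device)
audio = torch.randn(8, 48_000, device=device)   # 8-channel, 1 s at 48 kHz
out = block(audio)                  # runs on the GPU if one is present
```

The same object works as a standalone effect or as a layer inside a larger network, which is what “direct integration with AI workflows” means in practice.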

In our paper, we introduce TorchFX, a new library built on top of PyTorch, designed specifically for audio DSP. With TorchFX, you can:

  • Apply digital filters and many DSP effects using GPU acceleration.
  • Work naturally with multichannel audio.
  • Build filter chains with an intuitive pipe operator (|), making code both clean and fast (similar to the chaining style used in LangChain).
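TorchFX’s real classes live in the repo linked below; the following is only a minimal sketch of how such a pipe-based chain can be wired up in plain PyTorch (FX, Amp, and Clip are illustrative names, not TorchFX’s actual API):

```python
import torch
from torch import nn

# Sketch of the pipe-operator pattern: overloading __or__ on a base
# module lets `fx1 | fx2` build a left-to-right processing chain.
class FX(nn.Module):
    def __or__(self, other):        # fx1 | fx2 -> chained effect
        return Chain(self, other)

class Chain(FX):
    def __init__(self, first, second):
        super().__init__()
        self.first, self.second = first, second

    def forward(self, x):
        return self.second(self.first(x))

class Amp(FX):
    def __init__(self, g):
        super().__init__()
        self.g = g

    def forward(self, x):
        return self.g * x

class Clip(FX):
    def forward(self, x):
        return torch.clamp(x, -1.0, 1.0)

wav = torch.randn(2, 48_000)        # stereo noise
chain = Amp(2.0) | Clip()           # reads left-to-right, like a signal path
out = chain(wav)
```

Because the chain is itself an nn.Module, it keeps GPU support and AI compatibility for free.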

And the performance? In our benchmarks, TorchFX crushed SciPy on multichannel and long-duration signals. While SciPy’s runtime grows linearly with the number of channels, TorchFX (especially on GPU) keeps execution times almost flat, even for huge workloads.

Conclusions

Python’s slowness has always been the elephant in the room for scientific computing. While solutions like Numba and Cython help, they demand compromises. Our proposal is simple: if you’re starting from NumPy and SciPy, switch to PyTorch.

With TorchFX, we show how this shift can significantly speed up scientific Python for audio DSP:

  • Cleaner, object-oriented code
  • Built-in AI compatibility
  • GPU acceleration out of the box

The result? Your code will run blazingly fast.

TorchFX is open-source and available here: https://github.com/matteospanio/torchfx

Top comments (1)

Paige Herman

Great write-up—PyTorch fits DSP well. Fun fact: cuFFT (used by torch.fft on CUDA) supports batched FFTs, so you can transform thousands of channels in one call with O(N log N) per channel and minimal launch overhead.