A curated list of recommended Python frameworks, libraries, software and resources, all particularly useful for scientific Python users.
Intended for students and researchers in the sciences who want to get the most out of the open-source Python ecosystem.
There is a section of must-haves for beginners.
List inspired by awesome-python, which is a great similar resource for anything else you might want to do with Python!
Some libraries appear multiple times where they are useful in multiple ways.
- Python-for-Science
- Algebra
- Animations
- Bayesian Analysis
- Code Quality
- Data Storage
- Dates and Times
- Debugging
- Development Environments
- Documentation
- Domain-specific
- Gotchas
- GPU Acceleration
- Graphical Interfaces
- Job Scheduling
- Labelled data
- Mathematical Library Functions
- Numerical Data
- Optimisation
- Package Management
- Parallelization
- Physical Units
- Plotting
- Presentations
- Profiling and Benchmarking
- Speed
- Statistics
- Testing
- Visualisation
- Workflow
- Beginners Recommendations
Libraries for the manipulation of symbolic algebra, analytic integration, etc.
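As a minimal sketch of what this kind of work looks like, here is symbolic integration and simplification with sympy (assumed here as the standard symbolic algebra library; it does not appear in the list above):

```python
import sympy as sp

x = sp.symbols('x')

# Analytic integration: the Gaussian integral evaluates to sqrt(pi)
print(sp.integrate(sp.exp(-x**2), (x, -sp.oo, sp.oo)))  # sqrt(pi)

# Symbolic simplification of a trigonometric identity
print(sp.simplify(sp.sin(x)**2 + sp.cos(x)**2))  # 1
```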
- pymc - Probabilistic programming in python: Bayesian statistical modelling and fitting, including MCMC.
- PEP8 - The official style guide for python code.
- flake8 - A tool which checks your code for errors and style problems (wraps pyflakes and pycodestyle).
- pycodestyle - Checks your python code against the PEP8 style conventions.
- structure - A widely recommended way to structure any python project.
- dateutil - Provides powerful extensions to the standard datetime module available in Python.
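A minimal sketch of the two dateutil features you will likely reach for most (the timestamp string is made up):

```python
from dateutil import parser
from dateutil.relativedelta import relativedelta

# Parse a timestamp from a loosely formatted string
when = parser.parse("12 March 2021 14:30")

# Add a calendar-aware offset (correctly handles month lengths and leap years)
print(when + relativedelta(months=1))  # 2021-04-12 14:30:00
```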
- pdb - The python debugger, built into the standard library.
Programs to write code in. The main choice is between a software-engineering-style IDE and one intended specifically for scientific users.
- JupyterLab - An IDE which incorporates Jupyter notebooks.
- PyCharm - A very powerful IDE for python. Use it if you want the full power a software engineer would expect. Has a professional version, which is free for students.
- spyder - A MATLAB-like development environment for scientific python users.
Libraries of tools developed for python users in various fields of science.
- astropy - A community-developed core package for astronomy.
- Biopython - Tools for biological computation.
- cartopy - A library of cartographic tools for handling and plotting geospatial data.
- geoviews - Makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research.
- MetPy - A collection of tools in Python for reading, visualizing, and performing calculations with weather data.
- PlasmaPy - A community-developed python package for plasma physics.
- psychopy - Package for creating and running experiments in psychology, neuroscience, and related behavioural sciences.
- pyrocko - A seismology toolkit for python.
- SpectroscoPyx - A community-developed python package for spectroscopy.
- TomoPy - Package for tomographic data processing and image reconstruction.
- prophet - A tool for forecasting time series data.
- pyqt5 - Python bindings for the Qt5 cross-platform GUI framework.
- experi - A tool for managing and running series of computational experiments.
- papermill - A tool for parameterizing, executing, and analyzing multiple Jupyter Notebooks.
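A minimal sketch of the papermill workflow (the notebook filenames and the `datafile` parameter are hypothetical):

```python
import papermill as pm

# Execute the same template notebook once per dataset, injecting values
# into the notebook's cell tagged "parameters"
for run in ["run1", "run2"]:
    pm.execute_notebook(
        "analysis.ipynb",                      # input: template notebook
        f"analysis_{run}.ipynb",               # output: executed copy with results
        parameters={"datafile": f"{run}.nc"},  # hypothetical parameter
    )
```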
- scipy - The standard resource for all kinds of mathematical functions (see the sketch after this list).
- xrft - Discrete Fourier transform operations for xarray data structures.
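For example, a minimal sketch of numerical integration with scipy:

```python
import numpy as np
from scipy import integrate

# Numerically integrate sin(x) over [0, pi]; the exact answer is 2
result, error_estimate = integrate.quad(np.sin, 0, np.pi)
print(result)  # 2.0 (to within error_estimate)
```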
- numpy - The fundamental package for numerical computation in python. So ubiquitous that it might as well be part of python's standard library at this point. Ultimately just a contiguous-in-memory C array, wrapped very nicely with python.
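A minimal sketch of the vectorised style numpy encourages (the data is made up):

```python
import numpy as np

# Whole-array operations run in compiled C, with no explicit python loop
x = np.linspace(0, 2 * np.pi, 1000)
y = np.sin(x) ** 2

print(y.mean())  # ~0.5
```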
- nlopt - A library for nonlinear optimization, wrapping many different algorithms under a common interface.
Keep track of module dependencies, python versions, and virtual environments.
- conda - A package manager specifically intended for use by the scientific python community. Developed by the authors of numpy to manage both python packages and the underlying C/Fortran libraries which make them fast. Also obviates the need for system virtual environments.
- anaconda - Conda, but packaged with a wide range of useful scientific python libraries, including many from this list.
- pip - The standard way to install python packages. Use it when a package isn't available through conda; the two play nicely together.
- setuptools - For when you make your own module and want to install it properly into your conda environment (so you never need to touch your `$PYTHONPATH`!).
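A minimal sketch of a `setup.py` (the package name is a placeholder); once this file exists, `pip install -e .` installs your module into the active environment while keeping it editable in place:

```python
# setup.py
from setuptools import setup, find_packages

setup(
    name="mypackage",          # placeholder name
    version="0.1",
    packages=find_packages(),  # automatically discover subpackages
)
```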
Use all the cores of your machine, and scale up to clusters!
- dask - Tools for splitting up computations and executing them across many processors in parallel. dask.array in particular provides a numpy-like interface to a chunked-in-memory array. Dask is especially useful for analysing datasets which are larger than your RAM.
- xarray - Employs dask behind the scenes to parallelize most operations. Simply load your dataset in "chunks" and xarray will operate on each chunk in parallel:

```python
import xarray

# Load data in chunks
ds = xarray.open_dataset('data.nc', chunks={'space': 100})

# Will operate on each spatial chunk in parallel using dask
ds['density'].mean(dim='time')
```

Producing static plots of publication quality.
- matplotlib - The standard plotting library for python (see the sketch after this list).
- anatomy of matplotlib - Tutorial on how matplotlib is structured.
- scientific-matplotlib -
- seaborn - Statistical data visualization built on top of matplotlib.
- xarray.plot - Plotting methods attached to xarray objects, which wrap matplotlib.
- colorcet - A set of useful perceptually uniform colormaps for plotting scientific data.
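A minimal sketch of the standard matplotlib workflow (the data and filenames are made up):

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(0, 10, 200)

fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="signal")
ax.set_xlabel("time (s)")
ax.set_ylabel("amplitude")
ax.legend()
fig.savefig("signal.png", dpi=300)  # high-resolution output for publication
```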
- RISE - Turns Jupyter notebooks into live reveal.js presentations.
- Binder - Online Jupyter Notebook hosting for GitHub repositories. Allows users to run Jupyter notebooks from GitHub repositories in the cloud, without Python installed locally.
- jupyter-rise - Auto-launches the RISE plugin from Binder.
- nb_pdf_template - A cleaner LaTeX template for converting Jupyter notebooks to PDF.
- py-spy - A sampling profiler for python code which doesn't interfere with the running process.
Python inevitably sacrifices some speed to gain increased clarity. Scientific programs usually have one or two functions which do 90% of the work, and there are various ways to dramatically speed these up. Use in conjunction with parallelization through dask if you want as much speed as possible.
- cython - A compiler which lets you annotate your python code with C types, and mix in snippets of C, for massive speed increases.
- F2PY - For calling fast, compiled Fortran subroutines from Python (part of SciPy)
- numba - Automatic generation of fast machine code from Python functions, using LLVM just-in-time compilation (see the sketch after this list).
- bottleneck - A collection of fast numpy array functions written in C.
- Theano - Allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
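A minimal sketch of numba in action (the function is a toy example):

```python
import numpy as np
from numba import njit

@njit  # compiled to fast machine code the first time it is called
def total(x):
    # An explicit loop like this would be very slow in pure python
    result = 0.0
    for i in range(x.shape[0]):
        result += x[i]
    return result

print(total(np.arange(1_000_000.0)))
```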
- statsmodels - Provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.
Check that your code actually does what you think it will do!
- pytest - The standard unit-testing framework for python. Essential: if you're not unit-testing your calculations then you are merely hoping that they actually do what you think they do. `pytest` does a lot of magic behind the scenes to make it as simple as possible to use, with no boilerplate (see the sketch after this list).
- pytest-clarity - A plugin which improves the readability of pytest output.
- hypothesis - Property-based testing for python. Normal tests check that your function behaves as expected for some specific input. Hypothesis tests check that your function behaves as expected for any input of some type, e.g. any string, or any numpy array. Basically magic, compatible with pytest, and the algorithms used in the implementation are very interesting.
- cosmic-ray - Mutation testing in python. Checks that your test coverage is robust by randomly changing pieces of your code and checking that this change is actually caught by a test failing.
- flaky - pytest plugin for automatically re-running inconsistent ("flaky") tests.
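A minimal sketch of a test file (the `rms` function under test is hypothetical); running `pytest` in the same directory will collect and run both tests:

```python
# test_analysis.py -- pytest collects files and functions named test_*
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays

def rms(signal):
    """Root-mean-square of a signal (hypothetical function under test)."""
    return np.sqrt(np.mean(signal ** 2))

def test_rms_of_constant_signal():
    # An ordinary pytest test: a plain assert, no boilerplate
    assert rms(np.full(100, -3.0)) == 3.0

@given(arrays(np.float64, 10, elements=st.floats(-1e3, 1e3)))
def test_rms_is_nonnegative(signal):
    # A hypothesis property test: must hold for *any* generated array
    assert rms(signal) >= 0
```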
*Don't just write and run python scripts. Tools to make your workflow faster, clearer, and easier to come back to later.*
- ipython - Run python interactively, like MATLAB! Forms the backend of jupyter notebooks.
- jupyter notebooks - Create and share documents which contain live code, equations, visualizations, and narrative text.
- jupyterlab - A development environment in which you can write Jupyter notebooks. The spiritual successor to spyder, in that it is designed specifically for scientists.
- papermill - A tool for parameterizing, executing, and analyzing multiple Jupyter Notebooks.
- First, install python through anaconda, which will also give you the packages you're about to use.
- Write your code in either `pycharm` (if you want a professional IDE), `spyder`, or `jupyterlab` (if you're used to MATLAB's environment).
- Become familiar with `numpy`, the fundamental numeric object in python, and `matplotlib`, the standard way to plot.
- Next, wrap your data into clearer, higher-level objects with either `pandas` or `xarray` (use `xarray` if your data has more than one dimension).
- Before writing new analysis functions, check if someone has already solved your problem for you in `scipy`, or in one of python's domain-specific scientific software packages.
- As soon as you start writing your own analysis functions, check they're correct with unit tests written with `pytest`.
- Analyse your data interactively with `ipython`, and record your work in a `jupyter notebook`.