TomNicholas/Python-for-Scientists

Python for Science

A curated list of recommended Python frameworks, libraries, software and resources, all particularly useful for scientific Python users.

Intended for students and researchers in the sciences who want to get the most out of the open-source Python ecosystem. Aims to provide a list of tools useful for common tasks for scientists, without mentioning things which they are unlikely ever to need (e.g. authentication, databases, networking, NLP).

There is a section of must-haves for beginners.

List inspired by awesome-python, which is a great similar resource for anything else you might want to do with Python!

Some libraries appear multiple times where they are useful in multiple ways.


Algebra

Libraries for manipulation of symbolic algebra, analytic integration etc.

  • SymPy - A python library for symbolic mathematics, written in pure python.
  • sagemath - Mathematical software system with features covering multiple aspects of mathematics, including algebra, combinatorics, numerical mathematics, number theory, and calculus.
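As a rough illustration of the analytic integration mentioned above, here is a minimal SymPy sketch. The integrand x·exp(-x) is an arbitrary choice made for this example, not something from the original list.

```python
import sympy as sp

# Declare a symbolic variable (the name "x" is arbitrary)
x = sp.symbols("x")

# An illustrative integrand, chosen only for this example
expr = x * sp.exp(-x)

# Indefinite (analytic) integral with respect to x
antiderivative = sp.integrate(expr, x)

# Definite integral from 0 to infinity
definite = sp.integrate(expr, (x, 0, sp.oo))

print(antiderivative)
print(definite)
```

Differentiating the antiderivative with `sp.diff` and simplifying the difference from the integrand is a quick way to check a symbolic result.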

Animations

  • animatplot - A wrapper around matplotlib's FuncAnimation class - makes it very easy to animate matplotlib plots.

Bayesian Analysis

  • pymc - Probabilistic programming for Bayesian statistical modelling.
  • arviz - Exploratory analysis of Bayesian models.

Better Scientific Software

Code Quality

Data Storage

Debugging

  • pdb - Python debugger

Development Environments

Programs to write code into. The main choice is between a software-engineering style IDE, and one intended specifically for scientific users.

  • JupyterLab - An IDE which incorporates Jupyter notebooks.
  • PyCharm - Very powerful IDE for python. Use if you want the full powers a software engineer would expect. Has a professional version, which is free for students.
  • spyder - MATLAB-like development environment for scientific python users.

Documentation

  • sphinx - Sphinx is a tool that makes it easy to create intelligent and beautiful documentation, from the docstrings in your code. Originally created for documenting the python language itself.
  • nbconvert - Convert jupyter notebooks to other formats such as PDF, LaTeX, HTML.

Domain-specific

Libraries of tools developed for python users in various fields of science.

  • astropy - Various tools and functionality for astronomy and astrophysics.
  • Biopython - Tools for biological computation.
  • geoviews - Makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research.
  • MetPy - MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
  • NetworkX - A package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
  • nilearn - Machine learning for Neuro-Imaging in python.
  • PlasmaPy - Various tools for plasma physics.
  • psychopy - An open-source application allowing you to run a wide range of neuroscience, psychology and psychophysics experiments.
  • pyrocko - A seismology toolkit for python.
  • scikit-beam - Data analysis tools for X-Ray, Neutron and Electron sciences.
  • scikit-spectra - A community developed python package for spectroscopy.
  • SunPy - SunPy is a data-analysis environment specializing in providing the software necessary to analyze solar and heliospheric data in Python.
  • TomoPy - Package for tomographic data processing and image reconstruction.

Error handling

  • errors -
  • warnings - Throw proper warnings instead of using print statements. Python standard library module.
  • logging
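A minimal sketch of using the standard library's warnings and logging modules in place of print statements. The function `safe_sqrt` and its clipping behaviour are hypothetical, invented purely for illustration.

```python
import logging
import warnings

# Show INFO-level messages and above; the level is a choice made for this sketch
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


def safe_sqrt(value):
    """Hypothetical helper: square root that clips negative input to zero."""
    if value < 0:
        # A warning flags a suspicious input without stopping the program;
        # stacklevel=2 points the message at the caller's line
        warnings.warn("negative input clipped to zero", RuntimeWarning, stacklevel=2)
        value = 0.0
    # Log messages can be filtered by level or redirected to a file later
    logger.info("computing sqrt of %s", value)
    return value ** 0.5


print(safe_sqrt(4.0))
print(safe_sqrt(-1.0))
```

Unlike print output, warnings can be silenced or turned into errors by the caller (for example with `warnings.simplefilter("error")`).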

Forecasting

  • prophet - Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth. Developed by Facebook.

Gotchas

GPU acceleration

Graphical interfaces

Job scheduling

  • experi -
  • papermill - A tool for parameterizing, executing, and analyzing multiple Jupyter Notebooks.

Labelled data

Mathematical library functions

  • scipy - The standard resource for all kinds of mathematical functions.
  • xrft - Discrete Fourier transform operations for xarray data structures.

Numerical data

  • numpy - The fundamental package for numerical computation in python. So ubiquitous that it might as well be part of python's standard library at this point. Ultimately just a contiguous-in-memory C array, wrapped very nicely with python.
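To make the "contiguous-in-memory C array" remark concrete, here is a small sketch that inspects an array's memory layout. The shape and dtype are arbitrary choices for illustration.

```python
import numpy as np

# A 3x4 array of 64-bit floats holding the values 0..11
a = np.arange(12, dtype=np.float64).reshape(3, 4)

# True when the data sit in one contiguous, row-major (C-ordered) block
print(a.flags["C_CONTIGUOUS"])

# Bytes to step in memory to move one element along each axis
print(a.strides)

# Operations run over the whole block in compiled code, with no python loop
col_means = a.mean(axis=0)
print(col_means)
```

The strides show that moving along a row steps by one 8-byte float, while moving down a column steps by a whole row of four floats.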

Optimisation problems

  • nlopt - Library for nonlinear optimization, wrapping many algorithms for global and local, constrained or unconstrained, optimization.

Package Management

Keep track of module dependencies, python versions, and virtual environments.

  • conda - A package manager specifically intended for use by the scientific python community. Developed by the authors of numpy to manage both python packages and the underlying C/Fortran libraries which make them fast. Also obviates the need for system virtual environments.
  • anaconda - Conda, but packaged with a wide range of useful scientific python libraries, including many from this list.
  • pip - The standard way to install python packages. Use it when a package isn't available through conda; the two will play nicely together.
  • setuptools - For when you make your own module, and want to install it properly into your conda environment (so you never need to touch your $PYTHONPATH!)

Parallelization

Use all the cores of your machine, and scale up to clusters!

  • dask - Tools for splitting up computations and executing them across many processors in parallel. dask.array in particular provides a numpy-like interface to a chunked-in-memory array. Dask is especially useful for analysing datasets which are larger than your RAM.
  • xarray - Employs dask behind the scenes to parallelize most operations. Simply load your dataset in "chunks" and xarray will operate on each chunk in parallel:

    import xarray as xr

    # Load data in chunks
    ds = xr.open_dataset('data.nc', chunks={'space': 100})

    # Will operate on each spatial chunk in parallel using dask
    ds['density'].mean(dim='time')

Physical Units

Plotting

Producing static plots of publication quality.

Presentations and sharing work

  • Binder - Online Jupyter Notebook hosting for GitHub repositories. Allows users to run Jupyter notebooks from GitHub repositories in the cloud, without Python installed locally.
  • nb_pdf_template - A more accurate representation of jupyter notebooks when converting to pdfs.
  • RISE - A plugin for Jupyter which turns notebooks into slick presentations.
  • jupyter-rise -

Profiling and benchmarking

  • py-spy - A profiler for python code which doesn't interfere with the running process.

Scripting

Tools which are likely to be useful when writing python scripts to automate common tasks.

  • click - Run your scripts from the command line, with as little extra code as possible.
  • dateutil - Provides powerful extensions to the standard datetime module available in Python.
  • gitpython - Interact with git from python. Useful for tasks like checking if your simulation code has uncommitted changes before executing it.
  • pathlib - Use this anytime you want to do anything with a file path. Obviates the need for os.path (and much of os) most of the time. A module in the python standard library.
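A short sketch of typical pathlib usage in a script. The directory and file names (`results`, `run_001`, `density.txt`) are hypothetical, and a temporary directory is used so the example leaves nothing behind.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    # Build paths with the / operator instead of string concatenation
    run_dir = Path(tmp) / "results" / "run_001"
    run_dir.mkdir(parents=True)

    # Write a small text file without an explicit open()/close()
    out_file = run_dir / "density.txt"
    out_file.write_text("1.0 2.0 3.0\n")

    # Find files by pattern and pick a path apart into its components
    found = sorted(p.name for p in run_dir.glob("*.txt"))
    stem, suffix = out_file.stem, out_file.suffix

print(found, stem, suffix)
```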

Speed

Python inevitably sacrifices some speed to gain increased clarity. Scientific programs usually have one or two functions which do 90% of the work, and there are various ways to dramatically speed these up. Use in conjunction with parallelization through dask if you want as much speed as possible.

  • cython - A compiler which allows you to add C type declarations to your python, and call C functions from it, for massive speed increases.
  • F2PY - For calling fast, compiled Fortran subroutines from Python (part of NumPy).
  • numba - Just-in-time compilation of Python functions into fast machine code.
  • bottleneck - A collection of fast numpy array functions written in C.
  • Theano - Allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.

Statistics

  • statsmodels - Provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.

Testing

Check that your code actually does what you think it will do!

  • pytest - The standard unit testing framework for python. Essential - if you're not unit-testing your calculations then you are merely hoping that they are actually doing what you think they are. pytest does a lot of magic behind the scenes to make it as simple as possible to use, with no boilerplate.
  • pytest-clarity - A plugin which improves the readability of pytest output.
  • hypothesis - Property-based testing for python (not statistical hypothesis testing). Normal tests check that your function behaves as expected for some specific input. Property-based tests check that your function behaves as expected for any input of some type, e.g. any string, or any numpy array. Basically magic, compatible with pytest, and the algorithms used in the implementation are very interesting.
  • cosmic-ray - Mutation testing in python. Checks that your test coverage is robust by randomly changing pieces of your code and checking that this change is actually caught by a test failing.
  • flaky - pytest plugin for automatically re-running inconsistent ("flaky") tests.
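A minimal sketch of what pytest-style unit tests for a calculation can look like. The function `trapezoid_integral` is a hypothetical analysis routine written for this example. pytest collects functions whose names start with `test_` from files named `test_*.py`, and plain `assert` statements are all that is needed.

```python
def trapezoid_integral(ys, dx):
    """Hypothetical analysis function: trapezoidal rule on evenly spaced samples."""
    if len(ys) < 2:
        raise ValueError("need at least two samples")
    # Interior points carry full weight, the two end points half weight
    return dx * (sum(ys) - 0.5 * (ys[0] + ys[-1]))


def test_constant_function():
    # The integral of f(x) = 2 over [0, 3], sampled at 4 points, is 6
    assert abs(trapezoid_integral([2.0, 2.0, 2.0, 2.0], dx=1.0) - 6.0) < 1e-12


def test_linear_function_is_exact():
    # The trapezoidal rule is exact for straight lines:
    # the integral of f(x) = x over [0, 2] is 2
    assert abs(trapezoid_integral([0.0, 1.0, 2.0], dx=1.0) - 2.0) < 1e-12
```

Saved as, say, `test_integrals.py` (a hypothetical file name), these would run with the `pytest` command in the same directory.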

Visualisation

  • animatplot - A wrapper around matplotlib's FuncAnimation class - makes it very easy to animate matplotlib plots.
  • mayavi - 3D scientific data visualization and plotting in Python.
  • cartopy - A library for cartographic projections and plots, with matplotlib support.
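  • bokeh - Interactive visualisation library which renders plots in the web browser.
  • plotly - Interactive, browser-based graphing library.
  • holoviews - Declarative, high-level plotting built on top of bokeh and matplotlib.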
  • ipyvolume - 3d plotting for Python in the Jupyter notebook.
  • vispy - Interactive scientific visualisation in python.
  • yt - Very powerful software suite for analysing and visualising volumetric data. Written by astrophysicists, but since applied to many other domains.

Workflow

Don't just write and run python scripts. Tools to make your workflow faster, clearer, and easier to come back to later.

  • ipython - Run python interactively, like MATLAB! Forms the backend of Jupyter notebooks.
  • jupyter notebooks -
  • jupyterlab - A development environment in which you can write Jupyter notebooks. The spiritual successor to spyder, in that it is designed specifically for scientists.
  • papermill - A tool for parameterizing, executing, and analyzing multiple Jupyter Notebooks.

Beginner Recommendations

  • First, install python through anaconda, which will also give you the packages you're about to use.
  • Write your code in either pycharm (if you want a professional IDE), spyder or jupyterlab (if you're used to MATLAB's environment).
  • Become familiar with numpy, the fundamental numerical array library in python, and matplotlib, the standard way to plot.
  • Next, wrap your data into clearer, higher-level objects with either Pandas or xarray (use xarray if your data has more than one dimension).
  • Before writing new analysis functions, check if someone has already solved your problem for you in scipy, or in one of python's domain-specific scientific software packages.
  • As soon as you start writing your own analysis functions, test they're correct with unit tests written with pytest.
  • Analyse your data interactively with ipython, and record your work in a Jupyter notebook.
