Commit 20127b8

Finished speed section
1 parent 93697bd commit 20127b8

1 file changed

+65
-24
lines changed

README.md

Lines changed: 65 additions & 24 deletions
@@ -1,6 +1,6 @@
11
# Python for Science
22

3-
A curated list of recommened Python frameworks, libraries, software and
3+
A curated list of recommended Python frameworks, libraries, software and
44
resources, all particularly useful for scientific Python users.
55

66
Intended for students and researchers in the sciences who want to get
@@ -32,6 +32,7 @@ ways.
3232
- [Graphical Interfaces](#graphical-interfaces)
3333
- [Job Scheduling](#job-scheduling)
3434
- [Labelled data](#labelled-data)
35+
- [Mathematical Library Functions](#mathematical-library-functions)
3536
- [Numerical Data](#numerical-data)
3637
- [Optimisation](#optimisation)
3738
- [Package Management](#package-management)
@@ -41,6 +42,7 @@ ways.
4142
- [Presentations](#presentations)
4243
- [Profiling and Benchmarking](#profiling-and-benchmarking)
4344
- [Speed](#speed)
45+
- [Statistics](#statistics)
4446
- [Testing](#testing)
4547
- [Visualisation](#visualisation)
4648
- [Workflow](#workflow)
@@ -55,6 +57,7 @@ ways.
5557
*Libraries for manipulation of symbolic algebra, analytic integration etc.*
5658

5759
* [SymPy]() -
60+
* [sage]()
5861

5962

6063
## Animations
@@ -71,9 +74,10 @@ ways.
7174

7275
* [PEP8]() -
7376
* [flake8]() -
74-
* [black]() -
77+
* [pycodestyle]() -
7578
* [structure](https://docs.python-guide.org/writing/structure/) - The officially recommended way to structure any python project.
7679

80+
7781
## Data Storage
7882

7983
* [netcdf4]() -
@@ -83,7 +87,7 @@ ways.
8387

8488
## Dates and Times
8589

86-
* []() -
90+
* [dateutil](https://dateutil.readthedocs.io/en/stable/) - Provides powerful extensions to the standard datetime module available in Python.
8791

8892

8993
## Debugging
@@ -93,25 +97,34 @@ ways.
9397

9498
## Development Environments
9599

96-
* [PyCharm]() -
97-
* [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) - (follow-on from SPyder)
100+
*Programs to write code in. The main choice is between a software-engineering-style IDE and one intended specifically for scientific users.*
101+
102+
* [JupyterLab](https://jupyterlab.readthedocs.io/en/stable/) - An IDE which incorporates Jupyter notebooks.
103+
* [PyCharm](https://www.jetbrains.com/pycharm/) - Very powerful IDE for python. Use if you want the full powers a software engineer would expect.
104+
Has a professional version, which is free for students.
105+
* [spyder](https://www.spyder-ide.org/) - MATLAB-like development environment for scientific python users.
98106

99107

100108
## Documentation
101109

102110
* [sphinx]() -
103-
* [nbconvert]() -
111+
* [nbconvert](https://nbconvert.readthedocs.io/en/latest/) - Convert jupyter notebooks to other formats such as PDF, LaTeX, HTML.
104112

105113

106114
## Domain-specific
107115

116+
*Libraries of tools developed for python users in various fields of science.*
117+
108118
* [astropy]() -
109-
* [plasmapy]() -
110-
* [psychopy]() -
119+
* [Biopython](https://biopython.org/) - Tools for biological computation.
111120
* [cartopy]() -
112-
* [geoviews]() -
113-
* [pyrocko]() -
114-
* [SpectroscoPyx]() -
121+
* [geoviews](http://geoviews.org/) - Makes it easy to explore and visualize geographical, meteorological, and oceanographic datasets, such as those used in weather, climate, and remote sensing research.
122+
* [MetPy](https://unidata.github.io/MetPy/latest/) - MetPy is a collection of tools in Python for reading, visualizing and performing calculations with weather data.
123+
* [PlasmaPy]() -
124+
* [psychopy]() -
125+
* [pyrocko](https://pyrocko.org/) - A seismology toolkit for python.
126+
* [SpectroscoPyx](https://github.com/PlasmaPy/SpectroscoPyx) - A community developed python package for spectroscopy.
127+
* [TomoPy](https://tomopy.readthedocs.io/en/latest/) - Package for tomographic data processing and image reconstruction.
115128

116129

117130
## Forecasting
@@ -138,7 +151,7 @@ ways.
138151
## Job scheduling
139152

140153
* [experi]() -
141-
* [papermill]() -
154+
* [papermill](https://papermill.readthedocs.io/en/latest/) - A tool for parameterizing, executing, and analyzing multiple Jupyter Notebooks.
142155

143156

144157
## Labelled data
@@ -147,9 +160,17 @@ ways.
147160
* [xarray]() -
148161

149162

163+
## Mathematical library functions
164+
165+
* [scipy](https://docs.scipy.org/doc/scipy/reference/) - The standard resource for all kinds of mathematical functions.
166+
* [xrft](https://xrft.readthedocs.io/en/latest/) - Discrete Fourier transform operations for xarray data structures.
167+
168+
150169
## Numerical data
151170

152-
* [numpy]() -
171+
* [numpy](http://www.numpy.org/) - The fundamental package for numerical computing in python.
172+
So ubiquitous that it might as well be part of python's standard library at this point.
173+
Ultimately just a contiguous-in-memory C array, wrapped very nicely with python.
153174

154175

155176
## Optimisation problems
@@ -159,6 +180,8 @@ ways.
159180

160181
## Package Management
161182

183+
*Keep track of module dependencies, python versions, and virtual environments.*
184+
162185
* [conda](https://conda.io/docs/index.html) - A package manager specifically intended for use by the scientific python community.
163186
Developed by the authors of numpy to manage both python packages and the underlying C/Fortran libraries which make them fast.
164187
Also obviates the need for system virtual environments.
@@ -169,6 +192,8 @@ Also obviates the need for system virtual environments.
169192

170193
## Parallelization
171194

195+
*Use all the cores of your machine, and scale up to clusters!*
196+
172197
* [dask](https://dask.org/) - Tools for splitting up computations and executing them across many processors in parallel.
173198
[dask.array](http://docs.dask.org/en/latest/array.html) in particular provides a numpy-like interface to a chunked-in-memory array.
174199
Dask is especially useful for analysing datasets which are larger than your RAM.
@@ -194,16 +219,18 @@ ds['density'].mean(dim='time')
194219
*Producing static plots of publication quality.*
195220

196221
* [matplotlib]() -
222+
* [anatomy of matplotlib](https://github.com/matplotlib/AnatomyOfMatplotlib) - Tutorial on how matplotlib is structured.
197223
* [scientific-matplotlib]() -
198224
* [seaborn]() -
199225
* [xarray.plot]() -
200-
* [colorcet]() -
226+
* [colorcet](http://colorcet.pyviz.org/) - A set of useful [perceptually uniform](https://arxiv.org/abs/1509.03700) colormaps for plotting scientific data
201227

202228

203229
## Presentations and sharing work
204230

205231
* [RISE]() -
206-
* [Binder]() -
232+
* [Binder](https://mybinder.org/) - Online Jupyter Notebook hosting for GitHub repositories.
233+
Allows users to run Jupyter notebooks from GitHub repositories in the cloud, without Python installed locally.
207234
* [jupyter-rise]() -
208235
* [nb_pdf_template]() -
209236

@@ -215,13 +242,26 @@ ds['density'].mean(dim='time')
215242

216243
## Speed
217244

218-
* [cython]() -
219-
* [numba]() -
220-
* [bottleneck]()
245+
*Python inevitably sacrifices some speed to gain increased clarity.
246+
Scientific programs usually have one or two functions which do 90% of the work, and there are various ways to dramatically speed these up.
247+
Use in conjunction with parallelization through dask if you want as much speed as possible.*
248+
249+
* [cython](https://cython.org/) - A compiler which lets you add C type declarations to your python and compile it to C extension modules, for massive speed increases in the functions that need it.
250+
* [F2PY](https://docs.scipy.org/doc/numpy/f2py/) - For calling fast, compiled Fortran subroutines from Python (part of NumPy).
251+
* [numba](https://numba.pydata.org/) - *Automatic* just-in-time compilation of Python functions to fast machine code.
252+
* [bottleneck](https://kwgoodman.github.io/bottleneck-doc/) - A collection of fast numpy array functions written in C.
253+
* [Theano](http://www.deeplearning.net/software/theano/) - Allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
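
Before reaching for any of the tools above, it helps to confirm which function actually dominates the runtime. The following is a minimal sketch using only the standard library's `cProfile` and `pstats`; the functions `slow_sum_of_squares`, `cheap_setup` and `analysis` are hypothetical stand-ins for your own code, not part of any library.

```python
import cProfile
import io
import pstats


def slow_sum_of_squares(n):
    """Deliberately slow pure-Python loop: the 'hot' function."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def cheap_setup():
    """A cheap helper that should barely register in the profile."""
    return list(range(10))


def analysis():
    """Hypothetical analysis routine that calls both functions."""
    cheap_setup()
    return slow_sum_of_squares(200_000)


# Record every function call made while analysis() runs.
profiler = cProfile.Profile()
profiler.enable()
result = analysis()
profiler.disable()

# Sort by cumulative time so the most expensive calls come first,
# and print only the top five entries.
report = io.StringIO()
stats = pstats.Stats(profiler, stream=report).sort_stats("cumulative")
stats.print_stats(5)
print(report.getvalue())
```

In a real program, the function at the top of this report is usually the one worth rewriting with `cython` or `numba`.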
254+
255+
256+
## Statistics
257+
258+
* [statsmodels](http://www.statsmodels.org/stable/index.html) - Provides classes and functions for the estimation of many different statistical models, as well as for conducting statistical tests, and statistical data exploration.
221259

222260

223261
## Testing
224262

263+
*Check that your code actually does what you think it will do!*
264+
225265
* [pytest](https://docs.pytest.org/en/latest/) - The standard unit testing framework for python.
226266
Essential - if you're not unit-testing your calculations then you are merely hoping that they are actually doing what you think they are.
227267
`pytest` does a lot of magic behind the scenes to make it as simple as possible to use, with no boilerplate.
@@ -241,6 +281,7 @@ Basically magic, compatible with pytest, and the algorithms used in the implemen
241281
* [cartopy]()
242282
* [bokeh]() -
243283
* [plotly]() -
284+
* [holoviews]() -
244285

245286

246287
## Workflow
@@ -251,12 +292,12 @@ Basically magic, compatible with pytest, and the algorithms used in the implemen
251292
* [papermill]() -
252293

253294

254-
# Beginners Recommendations
295+
# Beginner Recommendations
255296

256297
* First, install python through anaconda, which will also give you the packages you're about to use.
257-
* Write your code in either `pycharm` (if you want a good IDE) or `jupyterlab` (if you're used to MatLabs' environment).
298+
* Write your code in `pycharm` (if you want a professional IDE), or in `spyder` or `jupyterlab` (if you're used to MATLAB's environment).
258299
* Become familiar with `numpy`, the fundamental numeric object in python, and `matplotlib`, the standard way to plot.
259-
* Next, wrap your data into a clearer form with either `Pandas` or `xarray` (`xarray` if your data has more than one dimension).
260-
* As soon as you start writing your own analysis functions, test they are correct with `pytest`.
261-
* Examine your data on the fly with `ipython`, and record your work in a `jupyter notebook`.
262-
* Check if someone has already solved your problem for you in one of python's domain-specific scientific software packages.
300+
* Next, wrap your data into clearer, higher-level objects with either `Pandas` or `xarray` (use `xarray` if your data has more than one dimension).
301+
* Before writing new analysis functions, check if someone has already solved your problem for you in `scipy`, or in one of python's domain-specific scientific software packages.
302+
* As soon as you start writing your own analysis functions, check they're correct with unit tests written for `pytest`.
303+
* Analyse your data interactively with `ipython`, and record your work in a `jupyter notebook`.
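
As an illustration of the testing step above, here is a minimal sketch of an analysis function together with `pytest`-style tests. The function `running_mean` and the file name mentioned below are hypothetical examples, chosen only to show the pattern; the tests use plain `assert` statements, so nothing needs to be imported from `pytest` itself.

```python
import math


def running_mean(values, window):
    """Return the simple moving average of `values` over `window` points."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


# pytest collects any function whose name starts with `test_`
# and reports a failure if an `assert` inside it is false.
def test_running_mean_of_constant_is_constant():
    assert running_mean([2.0, 2.0, 2.0, 2.0], window=2) == [2.0, 2.0, 2.0]


def test_running_mean_known_values():
    result = running_mean([1.0, 2.0, 3.0, 4.0], window=2)
    expected = [1.5, 2.5, 3.5]
    assert all(math.isclose(r, e) for r, e in zip(result, expected))


def test_running_mean_full_window_is_overall_mean():
    assert running_mean([1.0, 2.0, 3.0], window=3) == [2.0]
```

Saved in a file such as `test_analysis.py`, these tests can be run by calling `pytest` in the same directory.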
