Python Pandas for Data Science cheatsheet

PythonForDataScience Cheat Sheet Pandas Basics Learn Python for Data Science Interactively at www.DataCamp.com Pandas DataCamp Learn Python for Data Science Interactively Series DataFrame 4 7 -5 3 D C B AA one-dimensional labeled array capable of holding any data type Index Index Columns A two-dimensional labeled data structure with columns of potentially different types The Pandas library is built on NumPy and provides easy-to-use data structures and data analysis tools for the Python programming language. >>> import pandas as pd Use the following import convention: Pandas Data Structures >>> s = pd.Series([3, -5, 7, 4], index=['a', 'b', 'c', 'd']) >>> data = {'Country': ['Belgium', 'India', 'Brazil'], 'Capital': ['Brussels', 'New Delhi', 'Brasília'], 'Population': [11190846, 1303171035, 207847528]} >>> df = pd.DataFrame(data, columns=['Country', 'Capital', 'Population']) Selection >>> s['b'] Get one element -5 >>> df[1:] Get subset of a DataFrame Country Capital Population 1 India New Delhi 1303171035 2 Brazil Brasília 207847528 By Position >>> df.iloc([0],[0]) Select single value by row & 'Belgium' column >>> df.iat([0],[0]) 'Belgium' By Label >>> df.loc([0], ['Country']) Select single value by row & 'Belgium' column labels >>> df.at([0], ['Country']) 'Belgium' By Label/Position >>> df.ix[2] Select single row of Country Brazil subset of rows Capital Brasília Population 207847528 >>> df.ix[:,'Capital'] Select a single column of 0 Brussels subset of columns 1 New Delhi 2 Brasília >>> df.ix[1,'Capital'] Select rows and columns 'New Delhi' Boolean Indexing >>> s[~(s > 1)] Series s where value is not >1 >>> s[(s < -1) | (s > 2)] s where value is <-1 or >2 >>> df[df['Population']>1200000000] Use filter to adjust DataFrame Setting >>> s['a'] = 6 Set index a of Series s to 6 Applying Functions >>> f = lambda x: x*2 >>> df.apply(f) Apply function >>> df.applymap(f) Apply function element-wise Retrieving Series/DataFrame Information >>> df.shape (rows,columns) >>> df.index Describe index >>> df.columns Describe DataFrame columns >>> df.info() Info on DataFrame >>> df.count() Number of non-NA values Getting Also see NumPy Arrays Selecting, Boolean Indexing & Setting Basic Information Summary >>> df.sum() Sum of values >>> df.cumsum() Cummulative sum of values >>> df.min()/df.max() Minimum/maximum values >>> df.idmin()/df.idmax() Minimum/Maximum index value >>> df.describe() Summary statistics >>> df.mean() Mean of values >>> df.median() Median of values Dropping >>> s.drop(['a', 'c']) Drop values from rows (axis=0) >>> df.drop('Country', axis=1) Drop values from columns(axis=1) Data Alignment >>> s.add(s3, fill_value=0) a 10.0 b -5.0 c 5.0 d 7.0 >>> s.sub(s3, fill_value=2) >>> s.div(s3, fill_value=4) >>> s.mul(s3, fill_value=3) >>> s3 = pd.Series([7, -2, 3], index=['a', 'c', 'd']) >>> s + s3 a 10.0 b NaN c 5.0 d 7.0 Arithmetic Operations with Fill Methods Internal Data Alignment NA values are introduced in the indices that don’t overlap: You can also do the internal data alignment yourself with the help of the fill methods: Sort & Rank >>> df.sort_index(by='Country') Sort by row or column index >>> s.order() Sort a series by its values >>> df.rank() Assign ranks to entries Belgium Brussels India New Delhi Brazil Brasília 1 2 3 Country Capital 11190846 1303171035 207847528 Population I/O Read and Write to CSV >>> pd.read_csv('file.csv', header=None, nrows=5) >>> pd.to_csv('myDataFrame.csv') Read and Write to Excel >>> pd.read_excel('file.xlsx') >>> pd.to_excel('dir/myDataFrame.xlsx', sheet_name='Sheet1') Read multiple sheets from the same file >>> xlsx = pd.ExcelFile('file.xls') >>> df = pd.read_excel(xlsx, 'Sheet1') >>> help(pd.Series.loc) Asking For Help Read and Write to SQL Query or Database Table >>> from sqlalchemy import create_engine >>> engine = create_engine('sqlite:///:memory:') >>> pd.read_sql("SELECT * FROM my_table;", engine) >>> pd.read_sql_table('my_table', engine) >>> pd.read_sql_query("SELECT * FROM my_table;", engine) >>> pd.to_sql('myDf', engine) read_sql()is a convenience wrapper around read_sql_table() and read_sql_query()

Python Pandas for Data Science cheatsheet

More Related Content

What's hot

Viewers also liked

Similar to Python Pandas for Data Science cheatsheet

More from Dr. Volkan OBAN

Recently uploaded

Python Pandas for Data Science cheatsheet