21/08/2020 - pandas

ok so now i am moving on to pandas

it's rather similar to excel and sql

RESOURCES
user guide: https://pandas.pydata.org/docs/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging
comparison to sql: https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join
youtube: https://www.youtube.com/watch?v=txMdrV1Ut64&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&index=8
FUNCTIONS
opening and preview
import pandas as pd: imports pandas with alias pd, as per convention
read_csv: imports data from a csv or tsv file. args: filepath, separator character (sep)
df.head(): returns the first n rows, for previewing. args: n, the number of rows to return
set_index: uses a column as the row index (to rename columns, use rename)
df.col.describe(): summary stats
pd.set_option: pandas settings, e.g. max rows to show
you can use list comprehensions to mass rename the column headers
df[new col name] = ...: create a new column from existing columns
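
a rough sketch of the opening and preview steps above. the csv text and the column names are made up for illustration, and io.StringIO stands in for a real file path:

```python
import io

import pandas as pd

# Hypothetical CSV content standing in for a real file on disk.
csv_text = (
    "first name,last name,age\n"
    "Ada,Lovelace,36\n"
    "Alan,Turing,41\n"
    "Grace,Hopper,85\n"
)

# read_csv accepts a path or a file-like object; sep is the separator character.
df = pd.read_csv(io.StringIO(csv_text), sep=",")

# Display setting only: show up to 20 rows when printing.
pd.set_option("display.max_rows", 20)

# Preview the first 2 rows.
preview = df.head(2)

# Mass rename the headers with a list comprehension.
df.columns = [col.replace(" ", "_") for col in df.columns]

# New column built from existing columns.
df["full_name"] = df["first_name"] + " " + df["last_name"]

# Summary stats for a single column.
age_stats = df["age"].describe()

# Use a column as the row index (returns a new df, the original is unchanged).
indexed = df.set_index("full_name")
```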
main
pd.DataFrame(): create a df from scratch
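
one common way to build a df from scratch is a dict of column name to list of values. the names and numbers here are invented for the example:

```python
import pandas as pd

# Each dict key becomes a column; each list holds that column's values.
people = {"name": ["Ada", "Alan"], "age": [36, 41]}
df = pd.DataFrame(people)
```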
selection
. or [] notation: just use the bracket notation
iloc: select regions of data by integer position. args: row, column; [:, 0] gives you all rows in the first column
loc: label based. the end value of a slice is excluded with iloc but included with loc
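
a small sketch of the iloc vs loc difference, assuming a made-up df with string labels a to d as the index:

```python
import pandas as pd

df = pd.DataFrame(
    {"name": ["Ada", "Alan", "Grace", "Linus"], "age": [36, 41, 85, 54]},
    index=["a", "b", "c", "d"],
)

# iloc is position based: all rows, first column.
first_col = df.iloc[:, 0]

# iloc slice: positions 0 and 1 only, the end value is excluded.
by_position = df.iloc[0:2]

# loc slice: labels a, b and c, the end value is included.
by_label = df.loc["a":"c"]

# loc with a row label and a column label gives a single value.
one_cell = df.loc["b", "age"]
```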
filter / conditions
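conditional selection: pass a comparison such as == inside the brackets; other operators (>, <, !=) work too, and conditions combine with & and |
df.isin([...]): checks membership in a list of values; can be used with loc to select a subset
isnull/notnull: flag missing / non-missing values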
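
a sketch of the filter rows above. the country codes and scores are invented, with one missing value added on purpose:

```python
import pandas as pd

df = pd.DataFrame(
    {"country": ["SG", "US", None, "UK"], "score": [1, 2, 3, 4]}
)

# Conditional selection with a comparison operator.
high = df[df["score"] >= 3]

# Two conditions combined with &; each one needs its own parentheses.
combined = df[(df["score"] >= 2) & (df["country"] == "US")]

# isin with loc: keep rows whose country is in the list, return only score.
subset = df.loc[df["country"].isin(["SG", "UK"]), "score"]

# isnull / notnull flag missing and non-missing values.
missing = df["country"].isnull()
present = df[df["country"].notnull()]
```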
math
df.col.mean(): mean
.map(lambda p: p + 1): returns a new series with transformed values; assign it to a column to keep it
rename; inplace=True: pass in a dict to rename a column. inplace=True will overwrite the df
== + - * /: operators can be used
value_counts: counts how often each value occurs
agg: use together with groupby to do multiple kinds of aggregation calculation
reset_index: resets the index (including a multi index) back to the default integer index
sort_values(by): you can sort by more than 1 column at a time
fillna: replaces missing values with a given value
rename: pass in a dict mapping old names to new names
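
a sketch of several of the math rows above on a made-up fruit price table (one price is missing on purpose):

```python
import pandas as pd

df = pd.DataFrame(
    {"fruit": ["apple", "pear", "apple", "kiwi"], "price": [3, 5, 4, None]}
)

# mean skips missing values by default: (3 + 5 + 4) / 3.
mean_price = df["price"].mean()

# fillna returns a new series with missing values replaced.
filled = df["price"].fillna(0)

# map returns a new series; assigning it creates the column.
df["price_plus_1"] = df["price"].map(lambda p: p + 1)

# How often each fruit appears.
counts = df["fruit"].value_counts()

# rename takes a dict of old name to new name; without inplace=True
# it returns a new df and leaves the original alone.
renamed = df.rename(columns={"price": "unit_price"})

# Sort by more than one column, with a direction per column.
ordered = df.sort_values(by=["fruit", "price"], ascending=[True, False])
```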
combining data sets
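concat: just use merge (concat is still what you need to stack dfs on top of each other)
append: just use merge (DataFrame.append was removed in pandas 2.0; use concat to add rows)
join: just use merge
merge: aka excel vlookup, sql inner/outer join. args: how (the join type: inner, outer, left, right), on (the key column)
lsuffix/rsuffix: helps to differentiate if datasets have the same col names (these are join args; merge takes suffixes instead)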
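
a sketch of merge with different how values. both tables and their column names are invented; they share a name column so the suffixes arg has something to do:

```python
import pandas as pd

orders = pd.DataFrame(
    {"customer_id": [1, 2, 4], "amount": [50, 20, 70], "name": ["o1", "o2", "o3"]}
)
customers = pd.DataFrame(
    {"customer_id": [1, 2, 3], "name": ["Ada", "Alan", "Grace"]}
)

suffixes = ("_order", "_customer")

# inner: only keys found in both tables (like a sql inner join).
inner = orders.merge(customers, on="customer_id", how="inner", suffixes=suffixes)

# left: every order, with customer columns filled where a match exists
# (the closest thing to an excel vlookup).
left = orders.merge(customers, on="customer_id", how="left", suffixes=suffixes)

# outer: every key from either table.
outer = orders.merge(customers, on="customer_id", how="outer", suffixes=suffixes)

# concat is for stacking rows rather than matching on a key.
stacked = pd.concat([orders, orders], ignore_index=True)
```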
aggregation
groupby: equivalent of a pivot table. for aggregating and filtering data
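
a sketch of groupby together with agg and reset_index, on a made-up sales table:

```python
import pandas as pd

sales = pd.DataFrame(
    {"region": ["east", "east", "west", "west"], "amount": [10, 30, 5, 15]}
)

# Group by region, then run several aggregations at once with agg.
summary = sales.groupby("region")["amount"].agg(["sum", "mean"])

# The group key is now the index; reset_index turns it back into a column.
flat = summary.reset_index()

# Filter on the aggregated result.
big_regions = flat[flat["sum"] > 25]
```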
string methods
regex
other col / row methods
df.drop: drop a column
inplace: persists the change to your df
split: string method (.str.split) to split data based on a delimiter or regex
expand: expand=True returns the split pieces as separate columns, which you can assign back to the df
sort; args: you can add the new rows to an existing df, but use sort org
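
a sketch of drop, inplace, split and expand, assuming a made-up full_name column separated by a single space:

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"], "age": [36, 41]})

# str.split with expand=True returns a df with one column per piece.
parts = df["full_name"].str.split(" ", expand=True)

# Assign the pieces back to keep them as new columns.
df[["first", "last"]] = parts

# drop returns a new df by default; df itself still has the age column.
without_age = df.drop(columns=["age"])

# inplace=True changes df directly instead of returning a copy.
df.drop(columns=["full_name"], inplace=True)
```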
notation
df['...'] == '...': returns a boolean series
df[filter statement]: returns a filtered dataframe
df.loc[filter condition]: same as above. args: additional param to specify which cols you want
df.loc[-filter cond]: if you filter by adding '-' in front of the condition, it gives you the opposite of the filter; '~' is the usual operator for this
df['...'].str.contains; na=False: you have to specify na=False so the result has trues and falses only
df.loc: filter with a string method, aka find in excel
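
a sketch of the notation rows above. the names and languages are invented, with one missing name to show what na=False does:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "name": ["Ada", "Alan", None, "Grace"],
        "lang": ["python", "c", "python", "cobol"],
    }
)

# A comparison on a column returns a boolean series.
mask = df["lang"] == "python"

# Passing the boolean series in brackets returns the filtered df.
filtered = df[mask]

# loc does the same, with a second argument to pick columns.
names_only = df.loc[mask, "name"]

# ~ inverts the filter.
opposite = df.loc[~mask]

# str.contains is like find in excel; na=False makes missing values
# count as False so the result is trues and falses only.
has_a = df["name"].str.contains("a", na=False)
matches = df.loc[has_a]
```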
