21/08/2020 - pandas

 ok so now i am moving on to pandas

it's rather similar to excel and sql

RESOURCES
user guide: https://pandas.pydata.org/docs/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging
comparison to sql: https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join
youtube: https://www.youtube.com/watch?v=txMdrV1Ut64&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&index=8
FUNCTIONS
opening and preview
import pandas as pd: imports pandas with alias pd, as per convention
read_csv: imports data from a csv or tsv file (args: filepath, separator character)
df.head(): returns the first n rows, for previewing (args: n rows to be returned)
set_index: use a column as the index (row labels)
df.col.describe(): summary stats
pd.set_option: pandas settings, eg max rows to show
you can use a list comprehension to mass rename the column headers
df['new col name'] = ...: create a new column from existing columns
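
a quick sketch of the opening and preview steps above. the data and the column names are made up for illustration, and io.StringIO stands in for a real file path so the example runs on its own.

```python
import io
import pandas as pd

# hypothetical tab-separated data standing in for a real file on disk
raw = "First Name\tLast Name\tage\nAda\tLovelace\t36\nAlan\tTuring\t41\nGrace\tHopper\t85\n"

# read_csv handles tsv too: pass the separator character with sep
df = pd.read_csv(io.StringIO(raw), sep="\t")

# show at most 20 rows when a df is printed
pd.set_option("display.max_rows", 20)

# preview the first 2 rows (head() alone gives 5)
preview = df.head(2)

# mass rename the headers with a list comprehension
df.columns = [c.lower().replace(" ", "_") for c in df.columns]

# new column built from existing columns
df["full_name"] = df["first_name"] + " " + df["last_name"]

# summary stats for one column
age_stats = df["age"].describe()

# use a column as the index (row labels)
df = df.set_index("full_name")
```

after this, df.loc["Ada Lovelace"] looks up a row by its new label.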
main
pd.DataFrame(): create df from scratch
selection
. or [] notation: just use the bracket notation
iloc: select regions of data by integer position, as [row, column]. [:, 0] gives you all rows in the first column
loc: loc is label based. the end value of a slice is excluded for iloc but included for loc
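
a small sketch of building a df from scratch and of the iloc vs loc end value difference. the row labels r0..r3 and the column names are made up for illustration.

```python
import pandas as pd

# made-up data with string row labels so position and label are easy to tell apart
df = pd.DataFrame(
    {"name": ["a", "b", "c", "d"], "score": [10, 20, 30, 40]},
    index=["r0", "r1", "r2", "r3"],
)

first_col = df.iloc[:, 0]         # all rows, first column (by position)
by_pos = df.iloc[0:2]             # positions 0 and 1, the end value is excluded
by_label = df.loc["r0":"r2"]      # labels r0, r1 and r2, the end value is included
one_cell = df.loc["r1", "score"]  # row label first, then column label
```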
filter / conditions
conditional selection: pass a comparison such as == inside the brackets. other operators work too
df.col.isin([...]): can use with loc to keep rows whose value is in a list of values
isnull/notnull
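
a sketch of the filter methods above on made-up data (the country and sales columns are assumptions for the example).

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["SG", "MY", "SG", None],
    "sales": [100, 250, 75, 300],
})

is_sg = df["country"] == "SG"        # boolean series, one True/False per row
sg_rows = df[is_sg]                  # keeps only the rows where it is True
big_sales = df[df["sales"] > 90]     # other comparison operators work the same way

# isin keeps rows whose value appears in the list
in_list = df.loc[df["country"].isin(["SG", "MY"])]

# isnull / notnull flag missing and non-missing values
missing = df[df["country"].isnull()]
present = df[df["country"].notnull()]
```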
math
df.col.mean(): mean
.map(lambda p: p + 1): returns the transformed values. assign the result to a column to keep it
rename; inplace=True: pass in a dict to rename a column. inplace=True will overwrite the df itself
== + - * /: operators can be used
value_counts
agg: use together with groupby to do multiple kinds of aggregation calculation
reset_index: resets the index (including a multi-level one) back to the default numbering
sort_values(by): you can sort by more than 1 column at a time
fillna
rename: pass in a dict (what js would call an object literal)
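
a sketch of a few of the math methods above. the item and price data, and the new column names, are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "item": ["tea", "coffee", "tea", "milk"],
    "price": [3.0, 4.0, None, 2.0],
})

avg = df["price"].mean()                 # missing values are skipped by default
df["price"] = df["price"].fillna(0)      # replace missing values with 0

# map returns new values, so assign them to a column to keep them
df["price_plus_1"] = df["price"].map(lambda p: p + 1)
df["double"] = df["price"] * 2           # operators work on whole columns

counts = df["item"].value_counts()       # how often each value occurs

# rename takes a dict of old name -> new name
df = df.rename(columns={"price_plus_1": "bumped"})

# sort by more than one column, each with its own direction
ordered = df.sort_values(by=["item", "price"], ascending=[True, False])
```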
combining data sets
concat: just use merge
append: just use merge
join: just use merge
merge: aka excel vlookup, sql inner/outer join (args: how = join type such as inner or outer, on = key column)
suffixes (merge) or lsuffix/rsuffix (join): helps to differentiate if datasets have the same col names
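
a sketch of merge on two made-up tables that share a key column (cust_id) and also share a non-key column name (note), so the suffixes are visible. all names here are assumptions for the example.

```python
import pandas as pd

orders = pd.DataFrame({
    "cust_id": [1, 2, 4],
    "amount": [50, 20, 70],
    "note": ["a", "b", "c"],
})
customers = pd.DataFrame({
    "cust_id": [1, 2, 3],
    "name": ["Ann", "Bob", "Cy"],
    "note": ["x", "y", "z"],
})

# inner join: only keys found in both tables
inner = pd.merge(orders, customers, on="cust_id", how="inner")

# outer join: every key from either table, gaps filled with NaN
outer = pd.merge(orders, customers, on="cust_id", how="outer")

# left join is the closest to a vlookup; suffixes label the clashing "note" columns
labelled = pd.merge(
    orders, customers, on="cust_id", how="left", suffixes=("_order", "_cust")
)
```

without the suffixes arg the clashing columns come back as note_x and note_y.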
aggregation
groupby: equivalent of a pivot table. for aggregating and filtering data
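
a sketch of groupby used as a pivot table, with agg for several calculations at once. the region and sales data is made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "region": ["east", "east", "west", "west", "west"],
    "sales": [10, 30, 5, 15, 40],
})

# several aggregations in one go, one row per region
summary = df.groupby("region")["sales"].agg(["mean", "sum", "count"])

# reset_index turns the region labels back into an ordinary column
flat = summary.reset_index()

# filtering: keep only the rows of groups whose total sales exceed 50
big_regions = df.groupby("region").filter(lambda g: g["sales"].sum() > 50)
```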
string methods
regex
other col / row methods
df.drop: drop a column
inplace: persists the change to your df
split: string method to split data on a character or regex
expand: expand=True returns the split pieces as separate columns, which you can then assign to the df
sort; args: you can add new rows to an existing df, but use the sort arg
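
a sketch of split with expand, and of drop with and without inplace. the full_name and temp columns are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"full_name": ["Ada Lovelace", "Alan Turing"], "temp": [1, 2]})

# expand=True gives a DataFrame with one column per split piece
parts = df["full_name"].str.split(" ", expand=True)

# assign the pieces to new columns to keep them in the df
df[["first", "last"]] = parts

# without inplace, drop returns a new df and leaves df alone
trimmed = df.drop(columns=["temp"])

# with inplace=True, df itself is changed and nothing is returned
df.drop(columns=["temp"], inplace=True)
```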
notation
df['...'] == '...': returns a boolean series
df[filter statement]: returns a filtered dataframe
df.loc[filter condition]: same as above, with an additional param to specify which cols you want
df.loc[~filter cond]: if you put '~' in front of the condition, it will give you the opposite of the filter for your results
df['...'].str.contains(...); na=False: you have to specify na=False so the result is only Trues and Falses
df.loc: filter with a string method, aka find in excel
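
a sketch of the notation above on made-up data. it uses ~ to flip a boolean filter, and na=False so a missing name counts as False.

```python
import pandas as pd

df = pd.DataFrame({
    "name": ["green tea", "black tea", None, "coffee"],
    "price": [3, 4, 5, 6],
})

mask = df["name"] == "coffee"                   # boolean series
only_coffee = df.loc[mask, ["name", "price"]]   # rows by filter, cols by list
not_coffee = df.loc[~mask]                      # ~ gives the opposite of the filter

# like find in excel: rows whose name contains "tea"
has_tea = df["name"].str.contains("tea", na=False)
tea_prices = df.loc[has_tea, "price"]
```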
