21/08/2020 - pandas
- Get link
- X
- Other Apps
ok so now i am moving on to pandas
its rather similar to excel and sql
RESOURCES | ||
user guide | https://pandas.pydata.org/docs/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging | |
comparison to sql | https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join | |
youtube | https://www.youtube.com/watch?v=txMdrV1Ut64&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&index=8 | |
FUNCTIONS | ||
opening and preview | ||
import pandas as pd | imports pandas with alias pd, as per convention | |
read_csv | imports data from csv or tsv file | filepath, seperator character |
df.head() | returns the first n rows, previewer | n rows to be returned |
set_index | rename columns | |
df.col.describe() | summary stats | |
pd.set_option | pandas settings eg max rows to show | |
you can use list comprehensions to mass rename header rows | ||
df[new col nam]=new column from existing columns | create new column from existing columns | |
main | ||
pd.DataFrame() | create df from scratch | |
selection | ||
. or [] notation | just use the bracket notation | |
iloc | select regions of data by index | row, column [:,0] gives you all rows in the first column |
loc | loc is label based | there's some difference in the interpretation of the end value for iloc vs loc |
filter / conditions | ||
conditional selection | passing in == into the args, you can use operators also | |
df.isin([...]) | can use with loc on a subset of values | |
isnull/notnull | ||
math | ||
df.col.mean() | mean | |
.map(lambda p: p +1) | creates a new column with transformed values | |
rename; inplace=true | pass in a dict to rename a column. inplace true will override it | |
== + - * / | operators can be used | |
value_counts | ||
agg | use together with groupby to use do multiple kinds of aggregration calculation | |
reset_index | resset multiple index back to the original | |
sort_values(by) | you can sort more than 1 column at a time | |
fillna | ||
rename | pass in an object literal | |
combining data sets | ||
concat | just use merge | |
append | just use merge | |
join | just use merge | |
merge | aka excel vlookup, sql inner/outer join | join, inner, outer, on, key |
l_suffix/r_suffix | helps to differentitate if datasets have same col names | |
aggregation | ||
group by | equivalent of pivot table. for aggregating and filtering data | |
string methods | ||
regex | ||
other col / row methods | ||
df.drop | drop a column | |
inplace | persists the change to your df | |
split | string method to split data based on regex char | |
expand | persist split columns to df | |
sort; args | you can add the new rows to existing df, but use sort org | |
notation | ||
df['...']=='...' | returns a boolean series | |
df[filter statement] | returns a filtered dataframe | |
df.loc[filter condition] | same as above | additional param to specify which cols you want |
df.loc[-filter cond] | if you filter by adding '-' to the condition, it will give you the opposite of the filter for your results | |
df.loc['...'].str.contains; na=false | you have to specify na is false so trues and falses only | |
df.loc | filter with string method aka find in excel |
- Get link
- X
- Other Apps
Comments