21/08/2020 - pandas
- Get link
- X
- Other Apps
ok so now i am moving on to pandas
its rather similar to excel and sql
| RESOURCES | ||
| user guide | https://pandas.pydata.org/docs/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging | |
| comparison to sql | https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join | |
| youtube | https://www.youtube.com/watch?v=txMdrV1Ut64&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&index=8 | |
| FUNCTIONS | ||
opening and preview | ||
| import pandas as pd | imports pandas with alias pd, as per convention | |
| read_csv | imports data from csv or tsv file | filepath, seperator character |
| df.head() | returns the first n rows, previewer | n rows to be returned |
| set_index | rename columns | |
| df.col.describe() | summary stats | |
| pd.set_option | pandas settings eg max rows to show | |
| you can use list comprehensions to mass rename header rows | ||
| df[new col nam]=new column from existing columns | create new column from existing columns | |
| main | ||
| pd.DataFrame() | create df from scratch | |
| selection | ||
| . or [] notation | just use the bracket notation | |
| iloc | select regions of data by index | row, column [:,0] gives you all rows in the first column |
| loc | loc is label based | there's some difference in the interpretation of the end value for iloc vs loc |
| filter / conditions | ||
| conditional selection | passing in == into the args, you can use operators also | |
| df.isin([...]) | can use with loc on a subset of values | |
| isnull/notnull | ||
| math | ||
| df.col.mean() | mean | |
| .map(lambda p: p +1) | creates a new column with transformed values | |
| rename; inplace=true | pass in a dict to rename a column. inplace true will override it | |
| == + - * / | operators can be used | |
| value_counts | ||
| agg | use together with groupby to use do multiple kinds of aggregration calculation | |
| reset_index | resset multiple index back to the original | |
| sort_values(by) | you can sort more than 1 column at a time | |
| fillna | ||
| rename | pass in an object literal | |
| combining data sets | ||
| concat | just use merge | |
| append | just use merge | |
| join | just use merge | |
| merge | aka excel vlookup, sql inner/outer join | join, inner, outer, on, key |
| l_suffix/r_suffix | helps to differentitate if datasets have same col names | |
| aggregation | ||
| group by | equivalent of pivot table. for aggregating and filtering data | |
| string methods | ||
| regex | ||
| other col / row methods | ||
| df.drop | drop a column | |
| inplace | persists the change to your df | |
| split | string method to split data based on regex char | |
| expand | persist split columns to df | |
| sort; args | you can add the new rows to existing df, but use sort org | |
| notation | ||
| df['...']=='...' | returns a boolean series | |
| df[filter statement] | returns a filtered dataframe | |
| df.loc[filter condition] | same as above | additional param to specify which cols you want |
| df.loc[-filter cond] | if you filter by adding '-' to the condition, it will give you the opposite of the filter for your results | |
| df.loc['...'].str.contains; na=false | you have to specify na is false so trues and falses only | |
| df.loc | filter with string method aka find in excel |
- Get link
- X
- Other Apps
Comments