21/08/2020 - pandas

- August 21, 2020

ok so now i am moving on to pandas

its rather similar to excel and sql

RESOURCES
user guide	https://pandas.pydata.org/docs/user_guide/merging.html#database-style-dataframe-or-named-series-joining-merging
comparison to sql	https://pandas.pydata.org/docs/getting_started/comparison/comparison_with_sql.html#compare-with-sql-join
youtube	https://www.youtube.com/watch?v=txMdrV1Ut64&list=PL-osiE80TeTsWmV9i9c58mdDCSskIFdDS&index=8



FUNCTIONS

opening and preview
import pandas as pd	imports pandas with alias pd, as per convention
read_csv	imports data from csv or tsv file	filepath, seperator character
df.head()	returns the first n rows, previewer	n rows to be returned
set_index	rename columns
df.col.describe()	summary stats
pd.set_option	pandas settings eg max rows to show
	you can use list comprehensions to mass rename header rows
df[new col nam]=new column from existing columns	create new column from existing columns

main
pd.DataFrame()	create df from scratch

selection
. or [] notation	just use the bracket notation
iloc	select regions of data by index	row, column [:,0] gives you all rows in the first column
loc	loc is label based	there's some difference in the interpretation of the end value for iloc vs loc

filter / conditions
	conditional selection	passing in == into the args, you can use operators also
df.isin([...])	can use with loc on a subset of values
isnull/notnull

math
df.col.mean()	mean

.map(lambda p: p +1)	creates a new column with transformed values
rename; inplace=true	pass in a dict to rename a column. inplace true will override it
== + - * /	operators can be used


value_counts
agg	use together with groupby to use do multiple kinds of aggregration calculation
reset_index	resset multiple index back to the original
sort_values(by)	you can sort more than 1 column at a time
fillna
rename	pass in an object literal

combining data sets
concat	just use merge
append	just use merge
join	just use merge
merge	aka excel vlookup, sql inner/outer join	join, inner, outer, on, key
l_suffix/r_suffix	helps to differentitate if datasets have same col names

aggregation
group by	equivalent of pivot table. for aggregating and filtering data

string methods
regex

other col / row methods
df.drop	drop a column
inplace	persists the change to your df
split	string method to split data based on regex char
expand	persist split columns to df
sort; args	you can add the new rows to existing df, but use sort org

notation
df['...']=='...'	returns a boolean series
df[filter statement]	returns a filtered dataframe
df.loc[filter condition]	same as above	additional param to specify which cols you want
df.loc[-filter cond]	if you filter by adding '-' to the condition, it will give you the opposite of the filter for your results
df.loc['...'].str.contains; na=false		you have to specify na is false so trues and falses only
df.loc	filter with string method aka find in excel

Search This Blog

Pensieve

21/08/2020 - pandas

Comments

Popular posts from this blog

green tea bitch

green tea bitch part 2