Using your own data
If you run into errors, check the common errors Google doc first.
All the information and functions you will need are in the notebooks, which follow the same general order you will want to use for your own project analysis.
If you need help, please consult the Data Peers!
Part 1
Import the right libraries
from datascience import *
import numpy as np
import matplotlib.pyplot as plots
import scipy as sp
import scipy.stats  # make sure sp.stats is loaded on older SciPy versions
%matplotlib inline
import statsmodels.formula.api as smf
plots.style.use('fivethirtyeight')
Read in your .csv files
Go back to the file browser (or look below) and choose your first dataset. All are located in the data directory.
Implicit-Age_IAT.csv Implicit-Weight_IAT.csv
Implicit-Disability_IAT.csv Outcome-FBI-Hate-Crimes.csv
Implicit-Race_IAT.csv Outcome-Heart-Attack-Mortality.csv
Implicit-Religion-Muslim_IAT.csv Outcome-Neonatal-Deaths.csv
Implicit-Sexuality_IAT.csv Outcome-Poverty.csv
Type its name, exactly as it appears above, between the single quotes below:
my_data = Table.read_table('YOUR-FILE-NAME.csv')
my_data
Now do the same with your second dataset:
my_data2 = Table.read_table('YOUR-FILE-NAME.csv')
my_data2
Wrangle your data
Look back at the previous notebook to figure out how to subset and join your data for analysis:
# work on subsetting and joining your data here
my_joined_data =
Now write your merged data to a .csv file:
my_joined_data.to_df().to_csv('my-joined-data.csv')
Part 2
Visualize your data
The .scatter method is perhaps the most useful tool for spotting an association between two variables.
Try creating a few scatter plots of variables in your data that you think might be related!
# create some visualizations of your data here
Correlate your data
Calculate a correlation coefficient for your data! Remember: sp.stats.pearsonr(var1, var2) returns the coefficient followed by its p-value.
# calculate the correlation coefficient here
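The sketch below shows the shape of a pearsonr call on two invented arrays, and checks the result against the standard-units definition of r (the mean of the products of the two variables in standard units). The arrays and the helper name standard_units are assumptions made for this example only.

```python
import numpy as np
import scipy as sp
import scipy.stats

# Invented values for illustration only.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# pearsonr returns the correlation coefficient and a two-sided p-value.
r, p_value = sp.stats.pearsonr(x, y)

def standard_units(values):
    """Convert an array to standard units (mean 0, SD 1)."""
    return (values - np.mean(values)) / np.std(values)

# The same coefficient computed by hand, as a sanity check.
r_by_hand = np.mean(standard_units(x) * standard_units(y))
print(r, r_by_hand, p_value)
```

With a Table, the two arrays would come from calls such as my_joined_data.column('some_label').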
Regress your data
Run a simple regression on your data:
# run a regression here
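For reference, a simple regression with smf.ols can be sketched as follows. The data are simulated (y is built as roughly 1 + 2x plus noise) and the column names x and y are placeholders, so only the pattern carries over to your own table.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration only: y is about 1 + 2*x plus random noise.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
toy = pd.DataFrame({'x': x, 'y': 1 + 2 * x + rng.normal(0, 0.5, size=50)})

# The formula reads "outcome ~ predictor"; an intercept is included automatically.
simple_model = smf.ols('y ~ x', data=toy).fit()

# Estimated intercept and slope.
print(simple_model.params)
```

smf.ols expects a pandas DataFrame, so a Table would first be converted with .to_df(). Column labels containing hyphens or spaces need to be wrapped as Q('my-label') inside the formula. Calling simple_model.summary() shows the full regression table.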
Try adding a covariate to that regression below:
# run another regression here, this time with a covariate
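To illustrate why a covariate can matter, the sketch below simulates an outcome that depends on both x and a covariate z, where z is itself related to x. All names and numbers are invented. It fits the model with and without z so the two coefficients on x can be compared.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Simulated data for illustration only.
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0, 10, size=n)
z = 0.5 * x + rng.normal(0, 1, size=n)          # covariate related to x
y = 1 + 2 * x + 3 * z + rng.normal(0, 1, size=n)
toy = pd.DataFrame({'x': x, 'z': z, 'y': y})

# Without the covariate, the coefficient on x also absorbs part of z's effect.
without_cov = smf.ols('y ~ x', data=toy).fit()

# Covariates are added to the right-hand side of the formula with "+".
with_cov = smf.ols('y ~ x + z', data=toy).fit()

print(without_cov.params['x'], with_cov.params['x'])
```

In this simulation the coefficient on x moves toward 2, the value used to generate the data, once z is included. With real data the true value is unknown, so a change in the coefficient is something to interpret carefully rather than proof of a causal effect.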