# Using your own data

*If you run into errors, check the common errors Google doc first.*

All the information and functions you will need are in the notebooks, which follow the general order you will want to use for your own data analysis in the project.

If you need help, please consult the Data Peers!

# Part 1

## Import the right libraries

```
from datascience import *
import numpy as np
import matplotlib.pyplot as plots
import scipy as sp
import scipy.stats  # load the stats submodule explicitly so sp.stats is available
%matplotlib inline
import statsmodels.formula.api as smf
plots.style.use('fivethirtyeight')
```

## Read in your `.csv` files

Go back to the file browser (or look below) and choose your first dataset. All datasets are located in the `data` directory.

```
Implicit-Age_IAT.csv Implicit-Weight_IAT.csv
Implicit-Disability_IAT.csv Outcome-FBI-Hate-Crimes.csv
Implicit-Race_IAT.csv Outcome-Heart-Attack-Mortality.csv
Implicit-Religion-Muslim_IAT.csv Outcome-Neonatal-Deaths.csv
Implicit-Sexuality_IAT.csv Outcome-Poverty.csv
```

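Type its name *exactly* as it appears between the quotation marks below. If this notebook is not stored in the `data` directory itself, include the folder in the path (for example, `'data/YOUR-FILE-NAME.csv'`):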

```
my_data = Table.read_table('YOUR-FILE-NAME.csv')
my_data
```

Now do the same with your second dataset:

```
my_data2 = Table.read_table('YOUR-FILE-NAME.csv')
my_data2
```

## Wrangle your data

Look back at the previous notebook to figure out how to subset and join your data for analysis:

```
# work on subsetting and joining your data here
my_joined_data = ...  # replace ... with your subsetting and joining code
```

Now write your merged data to a `.csv` file:

```
# index=False keeps pandas from writing its row index as an extra unnamed column
my_joined_data.to_df().to_csv('my-joined-data.csv', index=False)
```

# Part 2

## Visualize your data

To find an association between two variables, the `.scatter` method is perhaps the most useful one.
Try creating a few scatter plots of variables in your data that you think might be related!

```
# create some visualizations of your data here
```

## Correlate your data

Calculate a correlation coefficient on your data! Remember: `sp.stats.pearsonr(var1, var2)` returns the correlation coefficient followed by its p-value.

```
# calculate the correlation coefficient here
```
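A minimal sketch of the call, using two short made-up arrays in place of real columns. With a `Table`, each argument could instead be something like `my_joined_data.column('your-label')`, where the label is whatever your own data uses.

```python
import numpy as np
import scipy as sp
import scipy.stats

# Hypothetical values standing in for two columns of your joined data.
var1 = np.array([0.30, 0.33, 0.35, 0.38, 0.41])
var2 = np.array([12.1, 13.0, 14.2, 14.9, 16.3])

# pearsonr returns the correlation coefficient r and a two-sided p-value.
r, p_value = sp.stats.pearsonr(var1, var2)
print('r =', round(r, 3), 'p =', round(p_value, 4))
```

Values of `r` close to 1 or -1 indicate a strong linear association, while values near 0 indicate little or none.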

## Regress your data

Run a simple regression on your data:

```
# run a regression here
```
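One way to run a simple regression is the formula interface imported above as `smf`. The sketch below is only an illustration: the `outcome` and `predictor` columns and their values are invented, and it builds a pandas `DataFrame` directly, whereas with your own data you would likely start from `my_joined_data.to_df()`.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data; with a Table, my_joined_data.to_df() gives a DataFrame.
df = pd.DataFrame({
    'predictor': [1, 2, 3, 4, 5, 6],
    'outcome': [3.1, 4.9, 7.2, 8.8, 11.1, 12.9],
})

# The formula reads "outcome explained by predictor".
simple_fit = smf.ols('outcome ~ predictor', data=df).fit()
print(simple_fit.params)    # intercept and slope
print(simple_fit.rsquared)  # share of variance explained
```

Calling `simple_fit.summary()` prints the full regression table, including standard errors and p-values.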

Try adding a covariate to that regression below:

```
# run another regression here, this time with a covariate
```
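In the formula interface, a covariate is added with `+` on the right-hand side of the formula. The sketch below uses invented columns (`outcome`, `predictor`, `covariate`) and values purely for illustration; substitute the labels from your own joined data.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data; the outcome is built as roughly 1 + 2*predictor + 0.5*covariate.
df2 = pd.DataFrame({
    'predictor': [1, 2, 3, 4, 5, 6, 7, 8],
    'covariate': [2, 1, 4, 3, 6, 5, 8, 7],
    'outcome': [4.1, 5.4, 9.05, 10.45, 14.1, 15.4, 19.05, 20.45],
})

# Each term after "~" gets its own coefficient, holding the others fixed.
covariate_fit = smf.ols('outcome ~ predictor + covariate', data=df2).fit()
print(covariate_fit.params)
```

Comparing the `predictor` coefficient with and without the covariate shows how much of the original association the covariate accounts for.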