Lab 7

Intro to Python

In this lab you will be familiarizing yourself with the programming language called Python. Python is a so-called high level programming language. Which means it is very abstracted from machine languages and are easily readable by humans.

Python is one of the most popular programming languages nowadays, and is used in various spheres from research and Data Science to Machine Learning and Artificial Intelligence.

In this lab you will also learn some of the most popular Python libraries such as Pandas, Numpy, and Matplotlib. Libraries are used to make Python more efficient. By itself (solely with built-in functions and methods), Python is not a very useful language, hence people keep building libraries (sets of functions and methods) that can be used together with Python in order to make it more useful.

1. Python’s Data Types

1.1 Numerals

One of the data types that is commonly used in Python are numerals (aka numbers). With numerals, Python can be used as a calculator. Run the cells below to see it for yourself.

3+5

140-20

40/2

20.0

2*2

Python follows the same arithmetic rules as we do. So guess what the output of the next cell will be:

2*5+1

What about this cell:

2*(5+1)

Guess what a double star notation does? Run the cell below to see if you had the right intuition about it.

3**2

There are a few types of them: integers and floats. They are not compatible, so it is important to remember which one you are using in order to avoid bugs. Floats use decimal points, so it will be easy for you to spot them. See how the output changes when we print the same number as an integer and as a float.

int(1776)

float(1776)

1776.0

1.2 Variables

To store the values in Python (and in many other programming languages) we need to use variables. Variables can store all kinds of information. This is useful if you plan to use the same values later on in your project. It will save you time and make your code more readable. To see how we store values in variables run the cell below.

burr = "Wait for it!"
burr

'Wait for it!'

If you don’t assign any value to the variable and call it later, it will give you an error. We can see it in the cell below. (It will give you an error, but it was planned that way)

burr

'Wait for it!'

y = 8
z = x+y

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-33c5b6ca9af3> in <module>()
      1 y = 8
----> 2 z = x+y

NameError: name 'x' is not defined

Here, we did not assign x to anything, and since x was not defined, we cannot add its value to y, so this produces an error.

1.3 Strings

The phrases and words in Python are always stored inbetween double or single quotation marks. They are formally called strings.

"History has its eyes on you"

'History has its eyes on you'

Also, you can use both single and double quotes for the strings, but sometimes you need to use single quotes inside of the string and that might cause a problem (it will “exit it out”).

You can just always use the double quotation marks on the outside to avoid such problem. Or, you can use a backslash before the single quotation mark. It will tell Python to ignore it and not to exit it out. You can see an example of it below:

print("What'd I miss")
print('You\'ll be back')

What'd I miss
You'll be back

The strings can also be “added” onto one another. Run the cell below to see how it works.

"York"+"town"

'Yorktown'

When we are using Jupyter Notebooks, the last line is printed out automatically (unless it is performing some other operation, eg. getting assigned to a specific value). But if you would like to print more than just the last line, you can use the print( ) method. If you just run the next cell, it will not give you any output.

aaron = "Ev’ryone give it up for America’s favorite fighting Frenchman"
everyone = "Lafayette!"
mdl = "I’m takin this horse by the reins"

Try using print( ) in the cell below to output all the three lines of lyrics of “Guns and Ships”.

Hint: you’ll need to use it three times.

##fill this cell in

A string is a sequence of characters. If you want to see how many characters your string has, use the len() method. Do you think spaces are considered characters in Python? Check if your intuition was correct by running the cell below.

len(burr)

Now create a variable name in which you will save your full name. Note that you need to call the variable once again in order to get an output.

##fill this in
name = ...
name

Sometimes you need to switch the values in two variables. In the cell below we have the last names mixed up. The variable theodosia should have an output “burr” and philip should return “hamilton”. How would you approach this problem without manually retyping the outputs?

Hint Use a temporary variable in which you can temporarily save the value of one of the other variables.

theodosia = "hamilton"
philip = "burr"

##fill these in, Version 1:
theodosia, philip = ...

# Version 2
temp = ...
theodosia = ...
philip = ...

1.4 Lists

You have learnt about the strings and numerals, it’s time to take it up a notch. Another useful and very popular data type is a Python list. Lists can take various data types (both numerals, strings, arrays, other lists, etc).

To create an empty list just make your variable equal to “[ ]”, like so:

washington = []
washington

[]

To add the values into your list you can either use lst.append( ), lst.insert( ), or lst.extend( ).

washington.extend(["Right Hand Man"])
washington

['Right Hand Man']

washington.insert(0, "I need my")
washington

['I need my', 'Right Hand Man']

washington.append("Back")
washington

['I need my', 'Right Hand Man', 'Back']

Notice, that the insert method takes two positional values. You need to specify the index in the list at which it will be inserted, whilst append just adds the value to the end of the list. extend takes another list as its input. If you try to use a string or a number by itself, it will give you an error.

When you need to access an element at a specific index, that is the code you can use The code below returns the second element of the list.

washington[1]

'Right Hand Man'

Note: Python starts counting at 0, not 1.

Sometimes you just need to use part of the list. In this case you can use “:” to modify your list. Let’s make a new list:

Sometimes you just need to use part of the list or copy the whole list into a new variable. In this case you can use “:” to modify your list. The notation is:

a = b[start:stop:step]

start: the index you want to start with. (the default is 0)

stop: the index you want to end with. (the default is the last number)

step: use it, if you wish to skip some indeces (eg. use only every other value). For example, if you start with index 1 and step is 2, you will iterate through the 1st index, then 3rd, then 5th, etc. until you get to the stopping index. (the default is 1)

Note that python will stop iterating at the value before the last index. See the following example:

lst = [1,2,3,4,5]

lst[0:5:1]

[1, 2, 3, 4, 5]

Note: if you wish to start iterating from the end of a list backwards, we use a negative step:

lst[::-1]

[5, 4, 3, 2, 1]

ham_songs = ["Alexander Hamilton", "Aaron Burr, Sir", "My Shot", \
             "The Story of Tonight", "The Schuyler Sisters", "Farmer Refuted",
             "You'll be Back", "Right Hand Man", "A Winter's Ball", \
             "Helpless", "Satisfied"]

every_other = ham_songs[::2] 
every_other

['Alexander Hamilton',
 'My Shot',
 'The Schuyler Sisters',
 "You'll be Back",
 "A Winter's Ball",
 'Satisfied']

lyrics = ["I", "am", "not", "throwing", "away", "my", "shot"]
lyrics

['I', 'am', 'not', 'throwing', 'away', 'my', 'shot']

Let’s say you want to save only the two last words from your old list into your new list. That’s what you will need to do then:

alex = lyrics[-2:]
alex

# alternatively you can use lyrics[5:]

['my', 'shot']

Notice also that if you want to learn the length of your list, you can use the same method “len( )”, but it will not count all the characters anymore. Instead, it will count all the elements in it. Like so:

len(washington)

Exercise: Create a list with the values 1, 2, 3, 4, and 5 and only get the values 2 and 4

##fill this cell in

2. Built-in Functions

Although Python doesn’t have a lot of functions and methods by itself, it is not like it doesn’t have any. Let’s go over some of the most valuable built-in funsctions in Python.

min(1, 3)

max(15, 25, 70)

You can also use these functions with strings. Can you guess what the output will be?

max("Hamilton", "Washington")

'Washington'

So does Python know that George Washington was older than Alexander Hamilton? Of course not, when comparing strings, Python goes of off the letters of alphabet. Since “W” goes after “H” in the alphabet, “Washington” is a value that is “bigger” than “Hamilton”.

We have created two lists with different integers. Let’s now find the biggest number among the two smallest numbers in two lists. In other words, let’s find the max( ) of the two min( ). You can achieve it with either 3 or 1 lines of code.

dob = [1757, 1756, 1732, 1737, 1754]
dod = [1804, 1836, 1799, 1793, 1782] 

##fill in
min_dob = ...
min_dod = ...
max_of_mins = ...


# you can do it in one line

max_mins = max(min(), min())

Another valuable function you can use with your numerals is “round”. It will round your floats to the nearest integer. Like so:

round(8.7)

Another useful function we can use with numbers is “abs”. It outputs an absolute value of a number:

abs(-3.5)

3.5

3. Conditionals and For-loops

3.1 Conditionals

Conditionals are also known as booleans. With conditional statements we can let the computer know when (aka under which condition) we want a specific operation to be executed. Under the hood the computer evaluates if the condition is True and performs a specific operation (like print( ), return( ), etc). Some of the most popular conditional statements are:

A == B: True if A equals B

A != B: True if A is not equal to B

A > B: True if A is greater than B. Same syntax for “<”

A >= B: True if A is greater than or equal to B. Same syntax for “<=”

We have three types of statements in the Python conditionals: if, elif, and else. We always start with “if”, we always end with “else”. The “elif” statement is optional. It literally means “else if”. While both “if” and “else” can only be used once, “elif” can be used multiple times, and let you add many more conditions to your statement.

If the condition in the if case is False, then we move to the elif case. If the elif case is false, then we will move to the else case and perform whatever is inside the else condition. We move into the condition that has a True conditional value first and evaluate that.

playwright = "Lin-Manuel Miranda"

if playwright == "Lin-Manuel Miranda":
    print (True)
else:
    print (False)

True

Note: “=” and “==” are not the same operations in Python. A single equals sign (=) assignes the value to the right of it to the variable name to the left of it. While the double equals sign (==) compares if the value on the right is equal to the value to the left of it.

In the next line, let’s ask Python to print the name of the school two of the Founding Fathers went based on their name. Feel free to change the string our “name” variable is assigned to.

name = "Hamilton"

# Feel free to change the value of the variable "name"

if name == "Hamilton":
    print("Columbia")
elif name == "Burr":
    print("Princeton")
else:
    print("I don't know")

Columbia

In the cell below guess what the output will be, before you run it. Feel free to change the numeral under the variable “year”.

year = 1789

# Who's the president

if year>=1789 and year<1797:
    print("George Washington")
elif year>=1797:
    print("John Adams")
else:
    print("King George")

George Washington

We can also search through lists of values in our conditional statements to see if a word is in our list using the keyword ‘in’:

founding_fathers = ["Jefferson", "Hancock", "Hamilton", "Adams", "Washington", "Burr", "Madison"]

# change the name in "any_name"
if "any_name" in founding_fathers:
    print("You are not throwing away your shot!")
else:
    print("You've got to be in the room where it happens!")

You've got to be in the room where it happens!

3.2 Loops

Loops (aka for-loops) allow for some code to be executed repeatedly. For example, if you wish to print the numbers from 0 to 10, you can do that with the for-loop.

A for-loop iterates through a sequence of elements (list, string, array, etc.) and reassigns an element (which can have any name, most common are “x”, “i”, “elem”, the name in itself doesn’t matter) to each element of the sequence sequentially.

for elem in ['a', 'b', 'c']:   
    print(elem)

elem

a
b
c

'c'

That is why in the previous code cell we can see that when an “elem” is called at the end of the loop, it is equal to “c” which is the last element of the sequence.

In the cell below we will be using the built-in function called range( ). It enumerates the numbers from 0 up to a number you put in (exclusive of that last number).

Note: remember that Python starts to count from 0, hence the last number you specify is not going to be included.

for i in range(11):
    print(i)

Exercise: Let’s combine what you know so far about lists, conditionals, and for loops. Create a list with values 1, 2, 3, 4, 5. Iterate through the list with a for loop, and print only the values that are greater than 2. Otherwise, print the statement “This value is not greater than 2.”

##fill this cell in

4. Working with Libraries

It is time for us to introduce a concept of Python libraries. As we mentioned before, Python is not a very powerful programming language by itself. What makes it powerful are the libraries people wrote for it. Libraries are the sets of different functions that you can import and use. The two most popular libraries are numpy and pandas.

Since we are using anaconda, a lot of libraries have been preloaded for you. You just need to “call” them. Just run the cell below. It will make our notebook know that we will be using these preinstalled libraries from that point on.

import pandas as pd
import numpy as np

With the help of numpy we can introduce a new data type - arrays. Arrays are commonly used with Data Frames (basically, tables of values). There are a lot of functions and methods that can be used with arrays when one is analysing their data. The main difference between the lists and arrays is that an array can take only one type of data (eg. only numerals, or strings, but never both).

To create an array with values, just make your variable equal to np.array(“your value”)

arr = np.array(10)
arr

array(10)

A very useful method that can be used with an array of numbers is np.arange( ). It takes at least 2, sometimes 3 positional arguments. The first number will be the starting point of your array, the last one will identify up to which number your array will go (not inclusive). The third positional argument is optional, it shows how many steps an array should skip through. (This is very similar to lists)

For example, if you need an array with numbers from 0 to 20, but you only want for it to include every other number, you will need add a third positional argument.

every_other = np.arange(0, 21, 2)
every_other

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])

You can also convert a list of numbers in Python into an array:

num_lst = [1, 2, 3, 4, 5]
num_arr = np.array(num_lst)
num_arr

array([1, 2, 3, 4, 5])

You can do some arithmetic with arrays. Guess what will be the output of this line of code before you run it:

num_arr*2

array([ 2,  4,  6,  8, 10])

As you can see, all the values in our array got mutliplied by 2.

Try performing the same operation with the initial list of numbers instead. What do you think the output will be in the cell below?

num_lst*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]

As you can see, lists and arrays not only can be used differently, but they lso give different outputs when we are using the same operations on them.

5. Data Frames and Tables

Now that you are familiar with all the basics of Python coding, it is time for you to learn how to work with Tables and Pandas Data Frames. First let’s upload the dataset we have selected to work on.

broadway = pd.read_csv("broadway.csv")
broadway.head(7)

	Attendance	Capacity	Day	Full	Gross	Month	Name	Performances	Theatre	Type	Year
0	5500	88	26	08/26/1990	134456	8	Tru	8	Booth	Play	1990
1	1737	100	24	03/24/1991	100647	3	Miss Saigon	0	Broadway	Musical	1991
2	12160	100	31	03/31/1991	634424	3	Miss Saigon	0	Broadway	Musical	1991
3	13921	100	7	04/07/1991	713353	4	Miss Saigon	0	Broadway	Musical	1991
4	10973	90	14	04/14/1991	573981	4	Miss Saigon	4	Broadway	Musical	1991
5	14076	101	21	04/21/1991	706793	4	Miss Saigon	8	Broadway	Musical	1991
6	14065	101	28	04/28/1991	714968	4	Miss Saigon	8	Broadway	Musical	1991

We can call a column in our dataset by typing:

Note: it will return an array of values

broadway['Name']

                                                    Tru
                                            Miss Saigon
                                            Miss Saigon
                                            Miss Saigon
                                            Miss Saigon
                                            Miss Saigon
                                            Miss Saigon
                                            Miss Saigon
                                            Miss Saigon
                                            Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                                           Miss Saigon
                               ...                        
                                      The Lion King
                           The Phantom Of The Opera
                                           Waitress
                                             Wicked
                                            Aladdin
                                 An Act Of God 2016
                               An American In Paris
                                          Beautiful
                                          Cats 2016
                                            Chicago
                           Fiddler On The Roof 2015
                                  Finding Neverland
                                           Fun Home
                                           Hamilton
                                        Jersey Boys
                                        Kinky Boots
                                  Les Misrables '14
                                            Matilda
                                      On Your Feet!
                                           Paramour
                                     School Of Rock
                                  Something Rotten!
                                 The Book Of Mormon
                              The Color Purple 2015
  The Curious Incident Of The Dog In The Night-Time
                                         The Humans
                                      The Lion King
                           The Phantom Of The Opera
                                           Waitress
                                             Wicked
Name: Name, Length: 31296, dtype: object

Let’s find the most and the least profitable shows of all time. We want for our code to return the whole row for the highest value in the “Gross” column:

broadway[broadway['Gross'] == broadway['Gross'].max()]

	Attendance	Capacity	Day	Full	Gross	Gross Potential	Month	Name	Performances	Theatre	Type	Year
26997	17352	100	29	12/29/2013	3201333	121	12	Wicked	9	Gershwin	Musical	2013

broadway['Gross'].max()

Let’s go over the code in the previous line. We are asking Python to show as a row from a dataset where the value in column “Gross” equals the largest value in that column.

Now do the same for the smallest value in the “Gross” column.

##fill this cell in

Let’s now search for a show with the highest amount of people who attended it in a single day. Then, let’s find out what type of perfomance this one was.

Note: There are three types of shows listed in this dataset: Musical, Play, and Special.

broadway[broadway['Attendance'] == broadway['Attendance'].max()]["Type"]

1711    Special
Name: Type, dtype: object

In the code above we used the same things we used previously, but added the last element [“Type”] to it. What our code does is:

1) it searches through the column “Attendance”

2) finds the largest value there

3) finds a row in our dataset where it has found that value

4) returns only the value in the column “Type” for that row

To ensure your understanding, find the name of the show with the highest value in the “Gross” column.

##fill this cell in

Let’s see which 5 shows on Broadway have the largest amount of rows to their name (aka has been on Broadway the longest).

For that to work we’ll need to use method .groupby( ) which will combine all the rows for the shows with same name. Then we’ll use .size( ), which will count the total of every column. Then we’ll use .nlargest(n) that will return the n-largest values.

broadway.groupby(['Name']).size().nlargest(5)

Name
The Phantom Of The Opera    1053
Chicago                     1032
The Lion King                980
Mamma Mia!                   725
Beauty And The Beast         690
dtype: int64

To do visualizations, we will need to import yet another library.

import matplotlib.pyplot as plt

Sometimes it is easier to see a certain trend in our data by simply plotting the values from a few columns together. Like so:

plt.scatter(broadway['Year'], broadway['Gross'])
plt.show()

png

So we can see that the Gross amount for the shows were growing in the course of years. Which makes perfect sense. What if we wanted to see the same trend for a specific show? How about we see the changes in gross amount by week for all the years “The Phantom of the Opera” was on stage?

phantom = broadway[broadway['Name'] == "The Phantom Of The Opera"]
plt.scatter(phantom['Year'], phantom['Gross'])
plt.show()

png

Feel free to use the same code for any other show you are interested in.