Intro to Python

In this lab you will be familiarizing yourself with the programming language called Python. Python is a so-called high level programming language. Which means it is very abstracted from machine languages and are easily readable by humans.

Python is one of the most popular programming languages nowadays, and is used in various spheres from research and Data Science to Machine Learning and Artificial Intelligence.

In this lab you will also learn some of the most popular Python libraries such as Pandas, Numpy, and Matplotlib. Libraries are used to make Python more efficient. By itself (solely with built-in functions and methods), Python is not a very useful language, hence people keep building libraries (sets of functions and methods) that can be used together with Python in order to make it more useful.

1. Python’s Data Types

1.1 Numerals

One of the data types that is commonly used in Python are numerals (aka numbers). With numerals, Python can be used as a calculator. Run the cells below to see it for yourself.

3+5

8

140-20

120

40/2

20.0

2*2

4


Python follows the same arithmetic rules as we do. So guess what the output of the next cell will be:

2*5+1

11


2*(5+1)

12


Guess what a double star notation does? Run the cell below to see if you had the right intuition about it.

3**2

9


There are a few types of them: integers and floats. They are not compatible, so it is important to remember which one you are using in order to avoid bugs. Floats use decimal points, so it will be easy for you to spot them. See how the output changes when we print the same number as an integer and as a float.

int(1776)

1776

float(1776)

1776.0


1.2 Variables

To store the values in Python (and in many other programming languages) we need to use variables. Variables can store all kinds of information. This is useful if you plan to use the same values later on in your project. It will save you time and make your code more readable. To see how we store values in variables run the cell below.

burr = "Wait for it!"
burr

'Wait for it!'


If you don’t assign any value to the variable and call it later, it will give you an error. We can see it in the cell below. (It will give you an error, but it was planned that way)

burr

'Wait for it!'

y = 8
z = x+y

---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-25-33c5b6ca9af3> in <module>()
1 y = 8
----> 2 z = x+y

NameError: name 'x' is not defined


Here, we did not assign x to anything, and since x was not defined, we cannot add its value to y, so this produces an error.

1.3 Strings

The phrases and words in Python are always stored inbetween double or single quotation marks. They are formally called strings.

"History has its eyes on you"

'History has its eyes on you'


Also, you can use both single and double quotes for the strings, but sometimes you need to use single quotes inside of the string and that might cause a problem (it will “exit it out”).

You can just always use the double quotation marks on the outside to avoid such problem. Or, you can use a backslash before the single quotation mark. It will tell Python to ignore it and not to exit it out. You can see an example of it below:

print("What'd I miss")
print('You\'ll be back')

What'd I miss
You'll be back



The strings can also be “added” onto one another. Run the cell below to see how it works.

"York"+"town"

'Yorktown'


When we are using Jupyter Notebooks, the last line is printed out automatically (unless it is performing some other operation, eg. getting assigned to a specific value). But if you would like to print more than just the last line, you can use the print( ) method. If you just run the next cell, it will not give you any output.

aaron = "Ev’ryone give it up for America’s favorite fighting Frenchman"
everyone = "Lafayette!"
mdl = "I’m takin this horse by the reins"


Try using print( ) in the cell below to output all the three lines of lyrics of “Guns and Ships”.

Hint: you’ll need to use it three times.

##fill this cell in



A string is a sequence of characters. If you want to see how many characters your string has, use the len() method. Do you think spaces are considered characters in Python? Check if your intuition was correct by running the cell below.

len(burr)

12


Now create a variable name in which you will save your full name. Note that you need to call the variable once again in order to get an output.

##fill this in
name = ...
name


Sometimes you need to switch the values in two variables. In the cell below we have the last names mixed up. The variable theodosia should have an output “burr” and philip should return “hamilton”. How would you approach this problem without manually retyping the outputs?

Hint Use a temporary variable in which you can temporarily save the value of one of the other variables.

theodosia = "hamilton"
philip = "burr"

##fill these in, Version 1:
theodosia, philip = ...

# Version 2
temp = ...
theodosia = ...
philip = ...


1.4 Lists

You have learnt about the strings and numerals, it’s time to take it up a notch. Another useful and very popular data type is a Python list. Lists can take various data types (both numerals, strings, arrays, other lists, etc).

To create an empty list just make your variable equal to “[ ]”, like so:

washington = []
washington

[]


To add the values into your list you can either use lst.append( ), lst.insert( ), or lst.extend( ).

washington.extend(["Right Hand Man"])
washington

['Right Hand Man']

washington.insert(0, "I need my")
washington

['I need my', 'Right Hand Man']

washington.append("Back")
washington

['I need my', 'Right Hand Man', 'Back']


Notice, that the insert method takes two positional values. You need to specify the index in the list at which it will be inserted, whilst append just adds the value to the end of the list. extend takes another list as its input. If you try to use a string or a number by itself, it will give you an error.

When you need to access an element at a specific index, that is the code you can use The code below returns the second element of the list.

washington[1]

'Right Hand Man'


Note: Python starts counting at 0, not 1.

Sometimes you just need to use part of the list. In this case you can use “:” to modify your list. Let’s make a new list:

Sometimes you just need to use part of the list or copy the whole list into a new variable. In this case you can use “:” to modify your list. The notation is:

a = b[start:stop:step]

start: the index you want to start with. (the default is 0)

stop: the index you want to end with. (the default is the last number)

step: use it, if you wish to skip some indeces (eg. use only every other value). For example, if you start with index 1 and step is 2, you will iterate through the 1st index, then 3rd, then 5th, etc. until you get to the stopping index. (the default is 1)

Note that python will stop iterating at the value before the last index. See the following example:

lst = [1,2,3,4,5]

lst[0:5:1]

[1, 2, 3, 4, 5]


Note: if you wish to start iterating from the end of a list backwards, we use a negative step:

lst[::-1]

[5, 4, 3, 2, 1]

ham_songs = ["Alexander Hamilton", "Aaron Burr, Sir", "My Shot", \
"The Story of Tonight", "The Schuyler Sisters", "Farmer Refuted",
"You'll be Back", "Right Hand Man", "A Winter's Ball", \
"Helpless", "Satisfied"]

every_other = ham_songs[::2]
every_other

['Alexander Hamilton',
'My Shot',
'The Schuyler Sisters',
"You'll be Back",
"A Winter's Ball",
'Satisfied']

lyrics = ["I", "am", "not", "throwing", "away", "my", "shot"]
lyrics

['I', 'am', 'not', 'throwing', 'away', 'my', 'shot']


Let’s say you want to save only the two last words from your old list into your new list. That’s what you will need to do then:

alex = lyrics[-2:]
alex

# alternatively you can use lyrics[5:]

['my', 'shot']


Notice also that if you want to learn the length of your list, you can use the same method “len( )”, but it will not count all the characters anymore. Instead, it will count all the elements in it. Like so:

len(washington)

3


Exercise: Create a list with the values 1, 2, 3, 4, and 5 and only get the values 2 and 4

##fill this cell in



2. Built-in Functions

Although Python doesn’t have a lot of functions and methods by itself, it is not like it doesn’t have any. Let’s go over some of the most valuable built-in funsctions in Python.

min(1, 3)

1

max(15, 25, 70)

70


You can also use these functions with strings. Can you guess what the output will be?

max("Hamilton", "Washington")

'Washington'


So does Python know that George Washington was older than Alexander Hamilton? Of course not, when comparing strings, Python goes of off the letters of alphabet. Since “W” goes after “H” in the alphabet, “Washington” is a value that is “bigger” than “Hamilton”.

We have created two lists with different integers. Let’s now find the biggest number among the two smallest numbers in two lists. In other words, let’s find the max( ) of the two min( ). You can achieve it with either 3 or 1 lines of code.

dob = [1757, 1756, 1732, 1737, 1754]
dod = [1804, 1836, 1799, 1793, 1782]

##fill in
min_dob = ...
min_dod = ...
max_of_mins = ...

# you can do it in one line

max_mins = max(min(), min())


Another valuable function you can use with your numerals is “round”. It will round your floats to the nearest integer. Like so:

round(8.7)

9


Another useful function we can use with numbers is “abs”. It outputs an absolute value of a number:

abs(-3.5)

3.5


3. Conditionals and For-loops

3.1 Conditionals

Conditionals are also known as booleans. With conditional statements we can let the computer know when (aka under which condition) we want a specific operation to be executed. Under the hood the computer evaluates if the condition is True and performs a specific operation (like print( ), return( ), etc). Some of the most popular conditional statements are:

A == B: True if A equals B

A != B: True if A is not equal to B

A > B: True if A is greater than B. Same syntax for “<”

A >= B: True if A is greater than or equal to B. Same syntax for “<=”

We have three types of statements in the Python conditionals: if, elif, and else. We always start with “if”, we always end with “else”. The “elif” statement is optional. It literally means “else if”. While both “if” and “else” can only be used once, “elif” can be used multiple times, and let you add many more conditions to your statement.

If the condition in the if case is False, then we move to the elif case. If the elif case is false, then we will move to the else case and perform whatever is inside the else condition. We move into the condition that has a True conditional value first and evaluate that.

playwright = "Lin-Manuel Miranda"

if playwright == "Lin-Manuel Miranda":
print (True)
else:
print (False)

True



Note: “=” and “==” are not the same operations in Python. A single equals sign (=) assignes the value to the right of it to the variable name to the left of it. While the double equals sign (==) compares if the value on the right is equal to the value to the left of it.

In the next line, let’s ask Python to print the name of the school two of the Founding Fathers went based on their name. Feel free to change the string our “name” variable is assigned to.

name = "Hamilton"

# Feel free to change the value of the variable "name"

if name == "Hamilton":
print("Columbia")
elif name == "Burr":
print("Princeton")
else:
print("I don't know")

Columbia



In the cell below guess what the output will be, before you run it. Feel free to change the numeral under the variable “year”.

year = 1789

# Who's the president

if year>=1789 and year<1797:
print("George Washington")
elif year>=1797:
else:
print("King George")

George Washington



We can also search through lists of values in our conditional statements to see if a word is in our list using the keyword ‘in’:

founding_fathers = ["Jefferson", "Hancock", "Hamilton", "Adams", "Washington", "Burr", "Madison"]

# change the name in "any_name"
if "any_name" in founding_fathers:
print("You are not throwing away your shot!")
else:
print("You've got to be in the room where it happens!")

You've got to be in the room where it happens!



3.2 Loops

Loops (aka for-loops) allow for some code to be executed repeatedly. For example, if you wish to print the numbers from 0 to 10, you can do that with the for-loop.

A for-loop iterates through a sequence of elements (list, string, array, etc.) and reassigns an element (which can have any name, most common are “x”, “i”, “elem”, the name in itself doesn’t matter) to each element of the sequence sequentially.

for elem in ['a', 'b', 'c']:
print(elem)

elem

a
b
c


'c'


That is why in the previous code cell we can see that when an “elem” is called at the end of the loop, it is equal to “c” which is the last element of the sequence.

In the cell below we will be using the built-in function called range( ). It enumerates the numbers from 0 up to a number you put in (exclusive of that last number).

Note: remember that Python starts to count from 0, hence the last number you specify is not going to be included.

for i in range(11):
print(i)

0
1
2
3
4
5
6
7
8
9
10



Exercise: Let’s combine what you know so far about lists, conditionals, and for loops. Create a list with values 1, 2, 3, 4, 5. Iterate through the list with a for loop, and print only the values that are greater than 2. Otherwise, print the statement “This value is not greater than 2.”

##fill this cell in



4. Working with Libraries

It is time for us to introduce a concept of Python libraries. As we mentioned before, Python is not a very powerful programming language by itself. What makes it powerful are the libraries people wrote for it. Libraries are the sets of different functions that you can import and use. The two most popular libraries are numpy and pandas.

Since we are using anaconda, a lot of libraries have been preloaded for you. You just need to “call” them. Just run the cell below. It will make our notebook know that we will be using these preinstalled libraries from that point on.

import pandas as pd
import numpy as np


With the help of numpy we can introduce a new data type - arrays. Arrays are commonly used with Data Frames (basically, tables of values). There are a lot of functions and methods that can be used with arrays when one is analysing their data. The main difference between the lists and arrays is that an array can take only one type of data (eg. only numerals, or strings, but never both).

To create an array with values, just make your variable equal to np.array(“your value”)

arr = np.array(10)
arr

array(10)


A very useful method that can be used with an array of numbers is np.arange( ). It takes at least 2, sometimes 3 positional arguments. The first number will be the starting point of your array, the last one will identify up to which number your array will go (not inclusive). The third positional argument is optional, it shows how many steps an array should skip through. (This is very similar to lists)

For example, if you need an array with numbers from 0 to 20, but you only want for it to include every other number, you will need add a third positional argument.

every_other = np.arange(0, 21, 2)
every_other

array([ 0,  2,  4,  6,  8, 10, 12, 14, 16, 18, 20])


You can also convert a list of numbers in Python into an array:

num_lst = [1, 2, 3, 4, 5]
num_arr = np.array(num_lst)
num_arr

array([1, 2, 3, 4, 5])


You can do some arithmetic with arrays. Guess what will be the output of this line of code before you run it:

num_arr*2

array([ 2,  4,  6,  8, 10])


As you can see, all the values in our array got mutliplied by 2.

Try performing the same operation with the initial list of numbers instead. What do you think the output will be in the cell below?

num_lst*2

[1, 2, 3, 4, 5, 1, 2, 3, 4, 5]


As you can see, lists and arrays not only can be used differently, but they lso give different outputs when we are using the same operations on them.

5. Data Frames and Tables

Now that you are familiar with all the basics of Python coding, it is time for you to learn how to work with Tables and Pandas Data Frames. First let’s upload the dataset we have selected to work on.

broadway = pd.read_csv("broadway.csv")

Attendance Capacity Day Full Gross Gross Potential Month Name Performances Theatre Type Year
0 5500 88 26 08/26/1990 134456 0 8 Tru 8 Booth Play 1990
1 1737 100 24 03/24/1991 100647 0 3 Miss Saigon 0 Broadway Musical 1991
2 12160 100 31 03/31/1991 634424 0 3 Miss Saigon 0 Broadway Musical 1991
3 13921 100 7 04/07/1991 713353 0 4 Miss Saigon 0 Broadway Musical 1991
4 10973 90 14 04/14/1991 573981 0 4 Miss Saigon 4 Broadway Musical 1991
5 14076 101 21 04/21/1991 706793 0 4 Miss Saigon 8 Broadway Musical 1991
6 14065 101 28 04/28/1991 714968 0 4 Miss Saigon 8 Broadway Musical 1991

We can call a column in our dataset by typing:

Note: it will return an array of values

broadway['Name']

0                                                      Tru
1                                              Miss Saigon
2                                              Miss Saigon
3                                              Miss Saigon
4                                              Miss Saigon
5                                              Miss Saigon
6                                              Miss Saigon
7                                              Miss Saigon
8                                              Miss Saigon
9                                              Miss Saigon
10                                             Miss Saigon
11                                             Miss Saigon
12                                             Miss Saigon
13                                             Miss Saigon
14                                             Miss Saigon
15                                             Miss Saigon
16                                             Miss Saigon
17                                             Miss Saigon
18                                             Miss Saigon
19                                             Miss Saigon
20                                             Miss Saigon
21                                             Miss Saigon
22                                             Miss Saigon
23                                             Miss Saigon
24                                             Miss Saigon
25                                             Miss Saigon
26                                             Miss Saigon
27                                             Miss Saigon
28                                             Miss Saigon
29                                             Miss Saigon
...
31266                                        The Lion King
31267                             The Phantom Of The Opera
31268                                             Waitress
31269                                               Wicked
31271                                   An Act Of God 2016
31272                                 An American In Paris
31273                                            Beautiful
31274                                            Cats 2016
31275                                              Chicago
31276                             Fiddler On The Roof 2015
31277                                    Finding Neverland
31278                                             Fun Home
31279                                             Hamilton
31280                                          Jersey Boys
31281                                          Kinky Boots
31282                                    Les Misrables '14
31283                                              Matilda
31285                                             Paramour
31286                                       School Of Rock
31287                                    Something Rotten!
31288                                   The Book Of Mormon
31289                                The Color Purple 2015
31290    The Curious Incident Of The Dog In The Night-Time
31291                                           The Humans
31292                                        The Lion King
31293                             The Phantom Of The Opera
31294                                             Waitress
31295                                               Wicked
Name: Name, Length: 31296, dtype: object


Let’s find the most and the least profitable shows of all time. We want for our code to return the whole row for the highest value in the “Gross” column:

broadway[broadway['Gross'] == broadway['Gross'].max()]

Attendance Capacity Day Full Gross Gross Potential Month Name Performances Theatre Type Year
26997 17352 100 29 12/29/2013 3201333 121 12 Wicked 9 Gershwin Musical 2013
broadway['Gross'].max()

3201333


Let’s go over the code in the previous line. We are asking Python to show as a row from a dataset where the value in column “Gross” equals the largest value in that column.

Now do the same for the smallest value in the “Gross” column.

##fill this cell in



Let’s now search for a show with the highest amount of people who attended it in a single day. Then, let’s find out what type of perfomance this one was.

Note: There are three types of shows listed in this dataset: Musical, Play, and Special.

broadway[broadway['Attendance'] == broadway['Attendance'].max()]["Type"]

1711    Special
Name: Type, dtype: object


In the code above we used the same things we used previously, but added the last element [“Type”] to it. What our code does is:

1) it searches through the column “Attendance”

2) finds the largest value there

3) finds a row in our dataset where it has found that value

4) returns only the value in the column “Type” for that row

To ensure your understanding, find the name of the show with the highest value in the “Gross” column.

##fill this cell in



Let’s see which 5 shows on Broadway have the largest amount of rows to their name (aka has been on Broadway the longest).

For that to work we’ll need to use method .groupby( ) which will combine all the rows for the shows with same name. Then we’ll use .size( ), which will count the total of every column. Then we’ll use .nlargest(n) that will return the n-largest values.

broadway.groupby(['Name']).size().nlargest(5)

Name
The Phantom Of The Opera    1053
Chicago                     1032
The Lion King                980
Mamma Mia!                   725
Beauty And The Beast         690
dtype: int64


To do visualizations, we will need to import yet another library.

import matplotlib.pyplot as plt


Sometimes it is easier to see a certain trend in our data by simply plotting the values from a few columns together. Like so:

plt.scatter(broadway['Year'], broadway['Gross'])
plt.show()


So we can see that the Gross amount for the shows were growing in the course of years. Which makes perfect sense. What if we wanted to see the same trend for a specific show? How about we see the changes in gross amount by week for all the years “The Phantom of the Opera” was on stage?

phantom = broadway[broadway['Name'] == "The Phantom Of The Opera"]
plt.scatter(phantom['Year'], phantom['Gross'])
plt.show()


Feel free to use the same code for any other show you are interested in.