ENGLISH R1A: Chinatown and the Culture of Exclusion

Instructor: Amy Lee

Developers: Michaela Palmer, Maya Shen, Cynthia Leu, Chris Cheung

FPF 2017

Welcome to lab! Please read this lab in its entirety, as the analysis will make a lot more sense with the background context provided. This lab is intended to be a hands-on introduction to data science as it can be applied to Chinatown demographics and analyzing primary texts.

We will be reading and analyzing representations of Chinatown in the form of data and maps. In addition, we will learn how data tools can be used to read and analyze large volumes of text.

What this lab will cover

  • Running Jupyter Notebooks
  • Data Analysis of Chinatowns’ demographics
  • Visualization & Interpretation
  • Using Data Tools to Analyze Primary Texts

What you need to do

  • Read the content, complete the questions
  • Analyze the data
  • Submit the assignment

1. Running Jupyter Notebooks

You are currently working in a Jupyter Notebook. A Notebook allows text and code to be combined into one document. Each rectangular section of a notebook is called a “cell.” There are two types of cells in this notebook: text cells and code cells.

Jupyter allows you to run simulations and regressions in real time. To do this, select a code cell, and click the “run cell” button at the top that looks like ▶ to confirm any changes. Alternatively, you can hold down the shift key and then press return or enter.

In the following simulations, anytime you see In [ ] you should click the “run cell” button to see output. If you get an error message after running a cell, go back to the beginning of the lab and make sure that every previous code cell has been run.

Part 0: Introduction to Python and Jupyter Notebooks:

1. Cells, Arithmetic, and Code

In a notebook, each rectangle containing text or code is called a cell.

Cells (like this one) can be edited by double-clicking on them. This cell is a text cell, written in a simple format called Markdown to add formatting and section headings. You don’t need to worry about Markdown today, but it’s a fun and easy tool to learn.

After you edit a cell, click the “run cell” button at the top that looks like ▶ to confirm any changes. (Try not to delete the instructions.) You can also press SHIFT-ENTER to run any cell or progress from one cell to the next.

Other cells contain code in the Python programming language. Running a code cell will execute all of the code it contains.

Try running this cell:

print("Hello, World!")
Hello, World!

We will now quickly go through some very basic functionality of Python, which we’ll be using throughout the rest of this notebook.

1.1 Arithmetic

Quantitative information arises everywhere in data science. In addition to representing commands to print out lines, expressions can represent numbers and methods of combining numbers.

The expression 3.2500 evaluates to the number 3.25. (Run the cell and see.)


We don’t always need to say “print”, because Jupyter automatically displays the value of the last expression in a code cell. If you want to show more than one value from the same cell, though, do use “print”.
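To see the difference, here is a small sketch you could run in a code cell (the variable name value is made up for this example). Each print call shows its own line of output, and the final line is displayed by Jupyter without needing print.

```python
value = 3.2500        # the trailing zeros do not change the number

print(value)          # first line of output
print(value * 2)      # second line of output

value                 # in Jupyter, the value of the last expression is displayed automatically
```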



Many basic arithmetic operations are built in to Python, like * (multiplication), + (addition), - (subtraction), / (division), and ** (exponentiation). There are many others, which you can find in the Python documentation. Use parentheses to specify the order of operations; otherwise Python follows PEMDAS, just as you may have learned in school. Use parentheses for a happy new year!

2 + (6 * 5 - (6 * 3)) ** 2 * (( 2 ** 3 ) / 4 * 7)
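If the long expression above is hard to follow, it can help to break it into steps. The sketch below uses some made-up variable names (inner, squared, factor, total) to evaluate each piece in the order Python would, and then checks that the pieces give the same answer as the one-line version.

```python
inner = 6 * 5 - (6 * 3)        # 30 - 18 = 12
squared = inner ** 2           # 12 ** 2 = 144
factor = (2 ** 3) / 4 * 7      # 8 / 4 * 7 = 14.0
total = 2 + squared * factor   # 2 + 144 * 14.0 = 2018.0

# The step-by-step result matches the one-line expression.
full = 2 + (6 * 5 - (6 * 3)) ** 2 * ((2 ** 3) / 4 * 7)
print(total, full)
```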

1.2 Variables

We sometimes want to work with the result of some computation more than once. To do that without repeating the code everywhere we need it, we can store the result in a variable using an assignment statement, which has the variable name on the left, an equals sign, and the expression to be evaluated and stored on the right. In the cell below, (3 * 11 + 5) / 2 - 9 evaluates to 10.0 (division with / always produces a decimal number), and that value gets stored in the variable result.

result = (3 * 11 + 5) / 2 - 9

2. Functions

One important form of an expression is the call expression, which first names a function and then describes its arguments. The function returns some value, based on its arguments. Some important mathematical functions are:

Function Description
abs Returns the absolute value of its argument
max Returns the maximum of all its arguments
min Returns the minimum of all its arguments
round Rounds its argument to the nearest integer

Here are two call expressions that both evaluate to 3:

abs(2 - 5)
max(round(2.8), min(pow(2, 10), -1 * pow(2, 10)))

These function calls first evaluate the expressions in the arguments (inside the parentheses), then evaluate the function on the results. abs(2-5) evaluates first to abs(-3), then returns 3.

A statement is a whole line of code. Some statements are just expressions, like the examples above, which can be broken down into subexpressions that get evaluated individually before the statement is evaluated as a whole.
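To see how subexpressions get evaluated first, here is the second call expression from above broken into steps. The names inner_min, rounded, and outer are just labels chosen for this sketch.

```python
inner_min = min(pow(2, 10), -1 * pow(2, 10))   # min(1024, -1024) is -1024
rounded = round(2.8)                            # 3
outer = max(rounded, inner_min)                 # max(3, -1024) is 3

print(abs(2 - 5), outer)   # both call expressions give 3
```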

2.1 Calling functions

The most common way to combine or manipulate values in Python is by calling functions. Python comes with many built-in functions that perform common operations.

For example, the abs function takes a single number as its argument and returns the absolute value of that number. The absolute value of a number is its distance from 0 on the number line, so abs(5) is 5 and abs(-5) is also 5.


Functions can be called as above, with the arguments in parentheses after the function name, or by using “dot notation”, where the value comes first, followed by a dot and the name of the function to apply to it, as in the cell immediately below.

from datascience import make_array
nums = make_array(1, 2, 3)  # makes a list of items, in this case, numbers
nums.mean()  # finds the average of the array
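Here is a small sketch of the two styles side by side. It uses a plain Python list instead of make_array so that it runs without any extra libraries; the names numbers, length, and position are made up for this example.

```python
numbers = [1, 2, 3]          # a plain Python list, standing in for the array above

length = len(numbers)        # function call: the argument goes inside the parentheses
position = numbers.index(2)  # dot notation: the value comes first, then a dot, then the name

print(length, position)
```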

1: Exploring Demographic Data:

1.1 Importing Modules

First, we need to import libraries so that we can call the functions they contain. We are going to use these functions to manipulate data tables and conduct a statistical analysis. Run the code cell below to import these modules.

from datascience import *
import numpy as np
import matplotlib.pyplot as plt
from ipywidgets import *
%matplotlib inline
import folium
import pandas as pd
from IPython.display import HTML, display, IFrame
import spacy
from wordcloud import WordCloud
from textblob import TextBlob
import geojson

Official map of Chinatown in San Francisco - 1885

Prepared under the supervision of the special committee of the Board of Supervisors. July 1885.


This map reflects the pervasive bias against the Chinese in California and in turn further fostered the hysteria. It was published as part of an official report of a Special Committee established by the San Francisco Board of Supervisors “on the Condition of the Chinese Quarter.” The Report resulted from a dramatic increase in hostility to the Chinese, particularly because many Chinese laborers had been driven out of other Western states by vigilantes and sought safety in San Francisco (Shah 2001, 37).

The substance and tone of the Report is best illustrated by a few excerpts: “The general aspect of the streets and habitations was filthy in the extreme, . . . a slumbering pest, likely at any time to generate and spread disease, . . . a constant source of danger . . . , the filthiest spot inhabited by men, women and children on the American continent.” (Report 4-5). “The Chinese brought here with them and have successfully maintained and perpetuated the grossest habits of bestiality practiced by the human race.” (Ibid. 38).

The map highlights the Committee’s points, particularly the pervasiveness of gambling, prostitution and opium use. It shows the occupancy of the street floor of every building in Chinatown, color coded to show: General Chinese Occupancy, Chinese Gambling Houses, Chinese Prostitution, Chinese Opium Resorts, Chinese Joss Houses, and White Prostitution.

The Report concludes with a recommendation that the Chinese be driven out of the City by stern enforcement of the law: “compulsory obedience to our laws [is] necessarily obnoxious and revolting to the Chinese and the more rigidly this enforcement is insisted upon and carried out the less endurable will existence be to them here, the less attractive will life be to them in California. Fewer will come and fewer will remain. . . . Scatter them by such a policy as this to other States . . . .” (Ibid. 67-68)

2. Analyzing Demographics

In this section, we will examine some of the factors that influence population growth and how they are changing the landscape of Chinatowns across the U.S.

2.1 Reading Data, 2010-2015

Now it’s time to work with tables and explore some real data. A Table is a collection of named columns, where each column is an array like the one we made above with make_array, and each row holds one observation.

We’re going to first look at the most recent demographic data from 2010-2015:

historical_data = Table.read_table('data/2010-2015.csv')  # read in data from file
historical_data['FIPS'] = ['0' + str(x) for x in historical_data['FIPS']]  # fix FIPS column by restoring the leading zero
historical_data.show(10)  # show first ten rows
FIPS Population One_race Two_or_more_races Asian Year
06075010100 3744 3689 55 1276 2010
06075010200 4184 3942 242 385 2010
06075010300 4285 4173 112 1445 2010
06075010400 4154 4068 86 1161 2010
06075010500 2429 2357 72 317 2010
06075010600 3927 3789 138 2453 2010
06075010700 5683 5657 26 4541 2010
06075010800 4587 4475 112 2336 2010
06075010900 4463 4339 124 1056 2010
06075011000 5718 5653 65 3027 2010

... (1172 rows omitted)

We can get some quick summary statistics by calling the .stats() function on our Table variable:

statistic  FIPS         Population   One_race     Two_or_more_races  Asian       Year
min        06075010100  0            0            0                  0           2010
max        06075990100  12511        12157        660                5291        2015
median                  4026.5       3872         157                1004        2012.5
sum                     4.88225e+06  4.68046e+06  201782             1.6378e+06  2.37878e+06

So which census tract has the highest Asian population?

First we can find the highest population by using the max function:


Let’s plug that into a table that uses the where and are.equal_to functions:

historical_data.where('Asian', are.equal_to(max(historical_data['Asian'])))
FIPS Population One_race Two_or_more_races Asian Year
06075035300 7885 7667 218 5291 2014

This FIPS code 06075035300 is tract 353. Does this make sense to you?
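To see what the where and are.equal_to step is doing, here is a rough sketch in plain Python. It only uses the ten rows displayed earlier in this section, not the full file, so its answer differs from the table output above. The names rows, highest, and matches are made up for this illustration.

```python
# The first ten rows shown earlier, as (FIPS, Asian) pairs.
rows = [
    ("06075010100", 1276), ("06075010200", 385), ("06075010300", 1445),
    ("06075010400", 1161), ("06075010500", 317), ("06075010600", 2453),
    ("06075010700", 4541), ("06075010800", 2336), ("06075010900", 1056),
    ("06075011000", 3027),
]

# Step 1: find the largest value in the Asian column.
highest = max(asian for fips, asian in rows)

# Step 2: keep only the rows whose Asian value equals that maximum.
matches = [(fips, asian) for fips, asian in rows if asian == highest]

print(highest, matches)
```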

It might be better to look at which census tract has the highest proportion of Asian residents:

historical_data['Asian_percentage'] = historical_data['Asian'] / historical_data['Population']
FIPS Population One_race Two_or_more_races Asian Year Asian_percentage
06075010100 3744 3689 55 1276 2010 0.340812
06075010200 4184 3942 242 385 2010 0.0920172
06075010300 4285 4173 112 1445 2010 0.337223
06075010400 4154 4068 86 1161 2010 0.27949
06075010500 2429 2357 72 317 2010 0.130506

... (1177 rows omitted)
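The division above happens row by row. Here is a rough sketch in plain Python, using only the five rows displayed above. The helper asian_share is a hypothetical name for this example, and the check for zero is there because the summary statistics show a minimum Population of 0, and plain Python raises an error when dividing by zero.

```python
# (FIPS, Population, Asian) for the five 2010 rows shown above.
rows = [
    ("06075010100", 3744, 1276),
    ("06075010200", 4184, 385),
    ("06075010300", 4285, 1445),
    ("06075010400", 4154, 1161),
    ("06075010500", 2429, 317),
]

def asian_share(population, asian):
    """Return asian / population, or None when the tract has no residents."""
    if population == 0:
        return None
    return asian / population

shares = {fips: asian_share(pop, asian) for fips, pop, asian in rows}
print(round(shares["06075010100"], 6))
```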

Now we can use the same method to get the max and subset our table:

historical_data.where('Asian_percentage', are.equal_to(max(historical_data['Asian_percentage'])))
FIPS Population One_race Two_or_more_races Asian Year Asian_percentage
06075011800 1765 1765 0 1690 2010 0.957507

FIPS code 06075011800 is census tract 118. Does this make sense?

Question: Write one sentence describing the Asian population in Chinatown.

Replace this text with your response

Tables Essentials!

For your reference, here’s a table of useful Table functions:

Name Example Purpose
Table Table() Create an empty table, usually to extend with data
Table.read_table Table.read_table("my_data.csv") Create a table from a data file
with_columns tbl = Table().with_columns("N", np.arange(5), "2*N", np.arange(0, 10, 2)) Create a copy of a table with more columns
column tbl.column("N") Create an array containing the elements of a column
sort tbl.sort("N") Create a copy of a table sorted by the values in a column
where tbl.where("N", are.above(2)) Create a copy of a table with only the rows that match some predicate
num_rows tbl.num_rows Compute the number of rows in a table
num_columns tbl.num_columns Compute the number of columns in a table
select tbl.select("N") Create a copy of a table with only some of the columns
drop tbl.drop("2*N") Create a copy of a table without some of the columns
take tbl.take(np.arange(0, 6, 2)) Create a copy of the table with only the rows whose indices are in the given array
join tbl1.join("shared_column_name", tbl2) Join together two tables with a common column name
are.equal_to() tbl.where("SEX", are.equal_to(0)) Find values equal to the one indicated
are.not_equal_to() tbl.where("SEX", are.not_equal_to(0)) Find values not equal to the one indicated
are.above() tbl.where("AGE", are.above(30)) Find values greater than the one indicated
are.below() tbl.where("AGE", are.below(40)) Find values less than the one indicated
are.between() tbl.where("AGE", are.between(18, 60)) Find values between the two indicated

2.2 The correlation coefficient - r

If we were interested in the relationship between two variables in our dataset, we’d want to look at correlation.

The correlation coefficient ranges from −1 to 1. A value of 1 implies that a linear equation describes the relationship between X and Y perfectly, with all data points lying on a line for which Y increases as X increases. A value of −1 implies that all data points lie on a line for which Y decreases as X increases. A value of 0 implies that there is no linear correlation between the variables. ~Wikipedia

r = 1: the scatter diagram is a perfect straight line sloping upwards

r = -1: the scatter diagram is a perfect straight line sloping downwards.
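One way to see where r comes from is to compute it by hand. The sketch below is not the code used in this lab. It is a plain Python version of one standard definition (the average of the products of the two variables in standard units), with a made-up function name correlation and made-up data points that lie exactly on a line.

```python
import math

def correlation(xs, ys):
    """Pearson's r: the mean of the products of the two variables in standard units."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return sum(((x - mean_x) / sd_x) * ((y - mean_y) / sd_y)
               for x, y in zip(xs, ys)) / n

xs = [1, 2, 3, 4, 5]
rising = [2 * x + 1 for x in xs]     # points on a line sloping upwards
falling = [10 - 3 * x for x in xs]   # points on a line sloping downwards

print(correlation(xs, rising), correlation(xs, falling))
```

Real data rarely falls exactly on a line, which is why the values in a correlation table usually land somewhere between -1 and 1.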

Let’s calculate the correlation coefficient between each pair of continuous variables in our dataset. We can use the .to_df().corr() function:

Population One_race Two_or_more_races Asian Year Asian_percentage
Population 1.000000 0.998493 0.568193 0.690292 0.053188 0.264237
One_race 0.998493 1.000000 0.522172 0.696641 0.043588 0.273538
Two_or_more_races 0.568193 0.522172 1.000000 0.281429 0.172941 0.002922
Asian 0.690292 0.696641 0.281429 1.000000 0.029201 0.839065
Year 0.053188 0.043588 0.172941 0.029201 1.000000 0.002954
Asian_percentage 0.264237 0.273538 0.002922 0.839065 0.002954 1.000000

We often visualize correlations with a scatter plot:

historical_data.scatter('Population', 'Asian')


historical_data.scatter('One_race', 'Asian')


historical_data.scatter('Two_or_more_races', 'Asian')


To see how a single variable changes over time, we might prefer a simple line graph. We can first group the data by Year, then take the mean of Population for each year, and plot that against Year:

2010    4005.949239
2011    4050.675127
2012    4100.279188
2013    4149.751269
2014    4208.487310
2015    4267.832487
Name: Population, dtype: float64
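The group-then-average step can also be sketched in plain Python. This is only an illustration, not the cell that produced the output above: it uses just the ten rows for tracts 101 to 105 that are displayed in this lab for 2010 and 2015, so its averages will not match the full-dataset numbers.

```python
from collections import defaultdict

# (Year, Population) for tracts 101-105, copied from the 2010 and 2015 rows shown in this lab.
rows = [
    (2010, 3744), (2010, 4184), (2010, 4285), (2010, 4154), (2010, 2429),
    (2015, 3798), (2015, 4172), (2015, 4387), (2015, 5148), (2015, 2589),
]

# Group the population values by year.
by_year = defaultdict(list)
for year, population in rows:
    by_year[year].append(population)

# Take the mean of each group.
mean_by_year = {year: sum(values) / len(values) for year, values in by_year.items()}
print(mean_by_year)
```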


2.3 2015

Let’s look at only the year 2015:

historical_2015 = historical_data.where('Year', are.equal_to(2015))
FIPS Population One_race Two_or_more_races Asian Year Asian_percentage
06075010100 3798 3517 281 1284 2015 0.338073
06075010200 4172 4014 158 401 2015 0.096117
06075010300 4387 4334 53 1700 2015 0.387509
06075010400 5148 4709 439 1429 2015 0.277584
06075010500 2589 2337 252 804 2015 0.310545

... (192 rows omitted)

We can make a choropleth map with a little function. Don’t worry about the code below!

def choro_column(tab, column):
    sf_2010 = geojson.load(open("data/2010-sf.geojson"))
    threshold_scale = np.linspace(min(tab[column]), max(tab[column]), 6, dtype=float).tolist()

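    mapa = folium.Map(location=(37.7793784, -122.4063879), zoom_start=11)
    mapa.choropleth(  # the leading arguments of this call (before `columns`) are missing from the source
                    columns=['FIPS', column],
                    threshold_scale=threshold_scale)
    mapa.save("output/map-{}.html".format(column.replace(" ", "-")))  # write the map where IFrame looks for it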
    return mapa

Here’s a choropleth of all the population:

choro_column(historical_2015, 'Population')
IFrame('output/map-Population.html', width=700, height=400)

Let’s look at only Asian:

choro_column(historical_2015, 'Asian')

Try making one more choropleth below with only Asian_percentage:

Question: Where is the largest concentration of Asian residents?

Replace this text with your response


Create a choropleth for 2010 with the same `Asian_percentage` column. Do you see any differences from 2010 to 2015?

Replace this text with your response

2.4 1940-2010

Now let’s take a look at the historical data showing how the Asian population has changed over time, as compared to the Black population.

First, let’s load in all of our decennial San Francisco Chinatown census data, acquired from an online source called Social Explorer. Let’s first examine this dataset to get a sense of what’s in it.

Question: Can you explain how you would derive the Asian population from the given census data?

Replace this text with your response

historical = Table.read_table('data/process.csv')
Asian  Asian and Pacific Islander  Black  Federal Information Processing System Code  Total Population  White  Year
nan    nan                         78     06075A0005000                               3465              3199   1940
nan    nan                         32     06075A0006000                               4401              3203   1940
nan    nan                         71     06075A0007000                               5466              4489   1940
nan    nan                         6      06075A0008000                               5152              5124   1940
nan    nan                         49     06075A0009000                               5052              4959   1940

... (73 rows omitted)

historical['Other'] = historical['Total Population'] - historical['White'] - historical['Black']
Asian  Asian and Pacific Islander  Black  Federal Information Processing System Code  Total Population  White  Year  Other
nan    nan                         78     06075A0005000                               3465              3199   1940  188
nan    nan                         32     06075A0006000                               4401              3203   1940  1166
nan    nan                         71     06075A0007000                               5466              4489   1940  906
nan    nan                         6      06075A0008000                               5152              5124   1940  22
nan    nan                         49     06075A0009000                               5052              4959   1940  44

... (73 rows omitted)
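As a quick check of the subtraction above, here is a sketch in plain Python using the first three 1940 rows. Since the Asian column is nan for 1940, Other appears to serve as a stand-in for everyone not counted as White or Black; that reading is an assumption based on the columns shown.

```python
# (Total Population, White, Black) for the first three 1940 rows shown above.
rows = [(3465, 3199, 78), (4401, 3203, 32), (5466, 4489, 71)]

# Everyone not counted as White or Black falls into the derived "Other" column.
other = [total - white - black for total, white, black in rows]
print(other)
```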

You can use the mean function to find the average total population in Chinatown. Do you notice any significant changes between 1940 and 2010?

historical.to_df().groupby('Year')['Total Population'].mean()
1940    4521.900
1950    4592.400
1960    4123.400
1970    4137.000
1980    3841.100
1990    4007.300
2000    3886.300
2010    4478.125
Name: Total Population, dtype: float64

Let’s plot the results on a graph.

historical.to_df().groupby('Year')['Total Population'].mean().plot()


1940    2880.50
1950    2469.40
1960    1702.10
1970    1341.80
1980    1170.30
1990    1195.60
2000    1322.10
2010    1679.25
Name: White, dtype: float64

We can plot the average population of different racial groups.



1940    37.2
1950    84.3
1960    39.0
1970    33.3
1980    41.9
1990    52.0
2000    50.7
2010    71.5
Name: Black, dtype: float64


1940    1604.200
1950    2038.700
1960    2382.300
1970    2761.900
1980    2628.900
1990    2759.700
2000    2513.500
2010    2727.375
Name: Other, dtype: float64


Question: Describe the population trends you observed from the above graphs. How would you compare the changes in Asian vs Black vs White populations?

Replace this text with your response

2.5 Manhattan

One of the goals of this module is to compare different Chinatowns from across the US. We will now compare the SF Chinatown data to the census data from Manhattan’s Chinatown. Let’s load the Manhattan data.

manhattan = Table.read_table('data/manhattan_cleaned.csv')
Year  Census Tract  Total Population  Asian/Other Population  Chinese Population  White Population
1950  29            6695              3180                    3180                6228
1960  29            7091              4677                    4677                2390
1970  6             8322              2684                    2684                4669
1970  8             9597              4258                    4258                5311
1970  16            6381              4069                    4069                2141
1970  18            7764              2281                    2281                4766
1970  27            1671              623                     623                 993
1970  29            9435              5996                    5996                2321
1970  41            9294              5262                    5262                3845
1980  6             10638             5172                    5086                2179

... (53 rows omitted)

                        Year       Census Tract  Total Population  Asian/Other Population  Chinese Population  White Population
Year                    1.000000   0.054710      -0.191828         -0.087767               -0.118715           -0.539567
Census Tract            0.054710   1.000000      -0.412661         -0.376237               -0.378750           -0.014695
Total Population        -0.191828  -0.412661     1.000000          0.907103                0.904009            0.408393
Asian/Other Population  -0.087767  -0.376237     0.907103          1.000000                0.998946            0.119564
Chinese Population      -0.118715  -0.378750     0.904009          0.998946                1.000000            0.123380
White Population        -0.539567  -0.014695     0.408393          0.119564                0.123380            1.000000

manhattan.scatter('Chinese Population', 'White Population')


manhattan_2010 = manhattan.where('Year', are.equal_to(2010))
Year Census Tract Total Population Asian/Other Population Chinese Population White Population
2010 2.01 3058 1266 1202 777
2010 6 11367 7253 6886 1708
2010 8 10290 9069 8832 964
2010 14.02 2782 880 760 1142
2010 16 8478 7058 6733 1117
2010 18 8660 5281 4827 2338
2010 22.01 6398 1553 1267 2349
2010 25 4869 1919 1818 965
2010 27 1264 978 941 254
2010 29 6398 4461 4260 917
2010 30.01 4492 1029 592 2732
2010 31 2550 712 413 1550
2010 36.01 3393 852 678 1424
2010 41 7817 5015 4598 2445
2010 43 4270 1046 827 1750
2010 45 1136 230 176 851

def choro_column(tab, column):
    tab = tab.to_df()
    tab['Census Tract'] = tab['Census Tract'].astype(str).str.strip('0').str.strip('.')
    nyc_2010 = geojson.load(open("data/nyc-census-2010.geojson"))
    tracts = folium.features.GeoJson(nyc_2010)
    threshold_scale = np.linspace(min(tab[column]), max(tab[column]), 6, dtype=float).tolist()

    mapa = folium.Map(location=(40.7128, -74.00609), zoom_start=11)
    mapa.choropleth(  # the leading arguments of this call (before `columns`) are missing from the source
                    columns=['Census Tract', column],
                    threshold_scale=threshold_scale)
    mapa.save("output/map-{}.html".format(column.replace(" ", "-")))  # write the map where IFrame looks for it
    return mapa

choro_column(manhattan_2010, 'Chinese Population')
IFrame('output/map-Chinese-Population.html', width=700, height=400)

manhattan_2010['Asian_percentage'] = manhattan_2010['Asian/Other Population'] / manhattan_2010['Total Population']
Year Census Tract Total Population Asian/Other Population Chinese Population White Population Asian_percentage
2010 2.01 3058 1266 1202 777 0.413996
2010 6 11367 7253 6886 1708 0.638075
2010 8 10290 9069 8832 964 0.881341
2010 14.02 2782 880 760 1142 0.316319
2010 16 8478 7058 6733 1117 0.832508

... (11 rows omitted)

choro_column(manhattan_2010, 'Asian_percentage')
IFrame('output/map-Asian_percentage.html', width=700, height=400)
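Inside choro_column, threshold_scale is built with np.linspace, which returns evenly spaced values between a minimum and a maximum; these become the boundaries between the map's color bins. The sketch below imitates that step in plain Python for the Chinese Population column of the 2010 Manhattan table above. The helper evenly_spaced is a made-up name for this example, not part of NumPy or folium.

```python
def evenly_spaced(start, stop, count):
    """Return `count` evenly spaced values from start to stop, inclusive."""
    step = (stop - start) / (count - 1)
    return [start + i * step for i in range(count)]

# Chinese Population for the sixteen 2010 Manhattan tracts listed above.
chinese = [1202, 6886, 8832, 760, 6733, 4827, 1267, 1818,
           941, 4260, 592, 413, 678, 4598, 827, 176]

# Six boundaries from the smallest to the largest tract value.
scale = evenly_spaced(min(chinese), max(chinese), 6)
print(scale)
```

Because the boundaries are spread evenly between the smallest and largest values, a few very large tracts can push most of the other tracts into the lowest bins.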

3. Reading Primary Texts

In this class, we have been learning how to ‘close-read’ primary texts. Close-reading generally involves picking select passages and reading for the latent meanings embedded in word choice, syntax, the use of metaphors and symbols, etc. Here, we are introducing another way of analyzing primary texts using computational methods. Computational text analysis generally involves ‘counting’ words. Let’s see how this works by analyzing some of the poems written by Chinese immigrants on Angel Island.
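As a minimal sketch of what counting words can look like, the example below uses Python's built-in collections.Counter on two complete lines taken from the poems in this lab. Splitting words with a simple regular expression is an assumption made for this example; the text tools imported earlier in the lab may split text differently.

```python
import re
from collections import Counter

# Two complete lines from the Angel Island poems in this lab.
text = (
    "At ease, how was one to know he was to live in a wooden building? "
    "I suffered misery on the ship and sadness in the wooden building."
)

# Lowercase the text and pull out runs of letters (and apostrophes) as words.
words = re.findall(r"[a-z']+", text.lower())

# Count how many times each word appears.
counts = Counter(words)
print(counts.most_common(5))
```

Even in two lines, the repeated words "wooden" and "building" stand out, which is the kind of pattern that counting can surface across many poems.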

Run the following cell to import the poems from a .txt file.

with open('data/islandpoetry1_22.txt', "r") as f:
    raw = f.read()
The sea-scape resembles lichen twisting and
turning for a thousand li
There is no shore to land and it is difficult to
With a gentle breeze I arrived at the city
thinking all would be so.
At ease, how was one to know he was to live in a
wooden building?

Because my house had bare walls, I began
rushing all about.
The waves are happy, laughing "Ha-ha!"
When I arrived on Island, I heard I was
forbidden to land.
I could do nothing but frown and feel angry
at heaven.

As a rule, a person is twenty before he starts
making a living.
Family circumstances have forced me to
experience wind and dust.
The heartless months and years seem bent on
defeating me.
It is a pity that time quickly ages one.

The gold and silver of America is very
Jabbing an awl into the thigh in search of
I embarked on the journey.
Not only are my one-thousand pieces of gold
already depleted, but
My countenance is blackened. It is surely for
the sake of the family.

Four days before the Qiqiao Festival,
I boarded the steamship for America.
Time flew like a shooting arrow.
Already, a cool autumn has passed.
Counting on my fingers, several months have
Still I am at the beginning of the road.
I have yet to be interrogated.
My heart is nervous with anticipation.

Everyone says travelling to North America is
a pleasure.
I suffered misery on the ship and sadness in
the wooden building.
After several interrogations, still I am not
I sigh because my compatriots are being
forcibly detained.

Originally, I had intended to come to
America last year.
Lack of money delayed me until early
It was on the day that the Weaver Maiden
met the Cowherd
That I took passage on the President Lincoln.
I ate wind and tasted waves for more than
twenty days.
Fortunately, I arrived safely on the American
I thought I could land in a few days.
How was I to know I would become a
prisoner suffering in the wooden building?
The barbarians' abuse is really difficult to
When my family's circumstances stir my
emotions, a double stream of tears flow.
I only wish I can land in San Francisco soon.
Thus sparing me this additional sorrow here.

Instead of remaining a citizen of China, I
willingly became an ox.
I intended to come to America to earn a
The Western styled buildings are lofty; but I
have not the luck to live in them.
How was anyone to know that my dwelling
place would be a prison?

I used to admire the land of the Flowery
Flag as a country of abundance.
I immediately raised money and started my
For over a month, I have experienced enough
winds and waves.
Now on an extended sojourn in jail, I am
subject to the ordeals of prison life.
I look up and see Oakland so close by.
I wish to go back to my motherland to carry
the farmer's hoe.
Discontent fills my belly and it is difficult for
me to sleep.
I just write these few lines to express what is
on my mind.

Just talk about going to the land of the
Flowery Flag and my countenance fills
with happiness.
Not without hard work were 1,000 pieces of
gold dug up and gathered together.
There were words of farewell to the parents,
but the throat choked up first.
There were many feelings, many tears flowing
face to face, when parting with the wife.
Waves big as mountains often astonished this
With laws harsh as tigers, I had a taste of all
the barbarities.
Do not forget this day when you land ashore.
Push yourself ahead and do not be lazy or

I think back on the past when I had not
experienced hardship.
I resolved to go and seek Taogong.
The months and years are wasted and still it
has not ended.
Up to now, I am still trapped on a lonely

Today is the last day of winter.
Tomorrow morning is the vernal equinox.
One year's prospects have changed to another.
Sadness kills the person in the wooden

In the quiet of night, I heard, faintly, the
whistling of wind.
The forms and shadows saddened me, upon
seeing the landscape, I composed a poem.
The floating clouds, the fog, darken the sky.
The moon shines faintly as the insects chirp.
Grief and bitterness entwined are heaven sent.
The sad person sits alone, leaning by a window.

The night is cool as I lie stiff on the steel bunk.
Before the window the moon lady shines on me.
Bored, I get up and stand beneath the cold
Sadly, I count the time that's elapsed.
It is already mid-autumn.
We should all honor and enjoy her.
But I have not prepared even the most trifling
gift and I feel embarrassed.

The insects chirp outside the four walls.
The inmates often sigh.
Thinking of affairs back home.
Unconscious tears wet my lapel.

Depressed from living on Island, I sought the
Sleeping Village.
The uncertain future altogether wounds my
When I see my old country fraught with chaos,
I, a drifting leaf, become doubly saddened.

My belly is so full of discontent it is really
difficult to relax.
I can only worry silently to myself.
At times I gaze at the cloud- and fog-enshrouded
It only deepens my sadness.

Sadly, I listen to the sounds of insects and
angry surf.
The harsh laws pile layer upon layer; how
can I dissipate my hatred?
Drifting in as a traveller, I met with this
It is more miserable than owning only a flute
in the marketplace of Wu.

Living on Island away from home elicits a
hundred feelings.
My chest is filled with a sadness and anger I
cannot bear to explain.
Night and day, I sit passively and listlessly.
Fortunately, I have a novel as my companion.

Imprisonment at Youli, when will it end?
Fur and linen garments have been exchanged;
it is already another autumn.
My belly brims with discontent, too numerous
to inscribe on bamboo slips.
Snow falls, flowers wilt, expressing sorrow
through the ages.

The west wind ruffles my thin gauze clothing.
On the hill sits a tall building with a room of
wooden planks.
I wish I could travel on a cloud far away,
reunite with my wife and son.
When the moonlight shines on me alone, the
nights seem even longer.
At the head of the bed there is wine and my
heart is constantly drunk.
There is no flower beneath my pillow and
my dreams are not sweet.
To whom can I confide my innermost
I rely solely on close friends to relieve my

America has power, but not justice.
In prison, we were victimized as if we were
Given no opportunity to explain, it was really
I bow my head in reflection but there is
nothing I can do.

This place is called an island of immortals.
When, in fact, this mountain wilderness is a
Once you see the open net, why throw
yourself in?
It is only because of empty pockets I can do
nothing else.

I, a seven foot man, am ashamed I cannot
extend myself.
Curled up in an enclosure, my movements
are dictated by others.
Enduring a hundred humiliations, I can only
cry in vain.
This person's tears fall, but what can the blue
heavens do?

I have infinite feelings that the ocean
has changed into a mulberry grove.
My body is detained in this building.
I cannot fly from this grassy hill.
And green waters block the hero.
Impetuously, I threw away my writing brush.
My efforts have all been in vain.
It is up to me to answer carefully.
I have no words to murmur against the east

My grief, like dense clouds, cannot be
Whether deliberating or being melancholy
and bored,
I constantly pace to and fro.
Wang Can ascended the tower but who
pitied his sorrow?
Lord Yu who left his country could only wail
to himself.

Over a hundred poems are on the walls.
Looking at them, they are all pining at the
delayed progress.
What can one sad person say to another?
Unfortunate travellers everywhere wish to
Gain or lose, how is one to know what is
Rich or poor, who is to say it is not the will
of heaven?
Why should one complain if he is detained
and imprisoned here?
From ancient times, heroes often were the
first ones to face adversity.

The male eagle is also easy to tame.
One must be able to bend before one can
China experienced calamities for a thousand
Confucius was surrounded in Chen for seven
Great men exhibit quality.
Scholars take pride in being themselves.
Gains and losses are entangled in my bosom.
My restlessness is a sign of self-illumination.

Half way up the hill on Island, in the
building upstairs.
The imprisoned one has been separated from
his people summer to autumn.
Three times I dreamed of returning to the
native village.
My intestines are agitated in its nine turns by
the false Westerner.
I have run into hard times and am uselessly
There are many obstacles in life but who will
commiserate with me?
If at a later time I am allowed to land on the
American shore,
I will toss all the miseries of this jail to the
flowing current.

After leaping into prison, I cannot come out.
From endless sorrows, tears and blood streak.
The jingwei bird carries gravel to fill its old
The migrating wild goose complains to the
moon, mourning his harried life.
When Ziqing was in distant lands, who
pitied and inquired after him?
When Ruan Ji reached the end of the road,
he shed futile tears.
The scented grass and hidden orchids
complain of withering and falling.
When can I be allowed to rise above as I

There are tens of thousands of poems
composed on these walls.
They are all cries of complaint and sadness.
The day I am rid of this prison and attain
I must remember that this chapter once
In my daily needs, I must be frugal.
Needless extravagance leads youth to ruin.
All my compatriots should please be mindful.
Once you have some small gains, return
home early.

Imprisoned in the wooden building day after
My freedom withheld; how can I bear to talk
about it?
I look to see who is happy but they only sit
I am anxious and depressed and cannot fall
The days are long and the bottle constantly
empty; my sad mood, even so, is not
Nights are long and the pillow cold; who can
pity my loneliness?
After experiencing such loneliness and sorrow.
Why not just return home and learn to plow
the fields?

A building does not have to be tall; if it has
windows, it will be bright.
Island is not far, Angel Island.
Alas, this wooden building disrupts my
travelling schedule.
Paint on the four walls are green.
And green is the grass which surrounds.
It is noisy because of the many country folk.
And there are watchmen guarding during the
To exert influence, one can use a square-holed
elder brother.
There are children who disturb the ears.
But there are no incoherent sounds that
cause fatigue.
I gaze to the south at the hospital,
And look to the west at the army camp.
This author says, "What happiness is there in

For what reason must I sit in jail?
It is only because my country is weak and
my family poor.
My parents wait at the door but there is no
My wife and child wrap themselves in quilt,
sighing with loneliness.
Even if my petition is approved and I can
enter the country.
When can I return to the Mountains of Tang
with a full load?
From ancient times, those who venture out
usually become worthless.
How many people ever return from battles?

Leaving behind my writing brush and
removing my sword, I came to America.
Who was to know two streams of tears would
flow upon arriving here?
If there comes a day when I will have
attained my ambition and become
I will certainly behead the barbarians and
spare not a single blade of grass.

I am a member of the Huang clan from
I threw away my writing brush and pushed
forward, journeying to the capital of the
I bought an oar and arrived in the land of the
Golden Mountain.
Who was to know they would banish me to
If my country had contrived to make herself
strong, this never would have happened.
Then when the ship had docked, we could
have gone directly ashore.

Just now the five nationalities in China have
become one family,
But the powers still have not yet recognized
our China,
Primarily because foreign debts were piling
The foreigners pushed to control finances
and to seize power.

Being idle in the wooden building, I opened
a window.
The morning breeze and bright moon lingered
I reminisce the native village far away, cut off
by clouds and mountains.
On the little island the wailing of cold, wild
geese can be faintly heard.
The hero who has lost his way can talk
meaninglessly of the sword.
The poet at the end of the road can only
ascend a tower.
One should know that when the country is
weak, the people's spirit dies.
Why else do we come to this place to be

Twice I have passed through the blue ocean,
experienced the wind and dust of journey.
Confinement in the wooden building has
pained me doubly.
With a weak country, we must all join
together in urgent effort.
It depends on all of us together to roll back
the wild wave.

I lean on the railing and lift my head to look
at the cloudy sky.
All the mountains and rivers are dark.
Eastern Mongolia is lost and the date of her
return is uncertain.
The recovery of the Central Plains depends
on the youth.
Only the tongue of Changshan can slay the
To kill the bandit we must wave the whip of
Zu Di.
I am ashamed to be curled up like a worm
on Island.
I grieve for my native land but what else can
I say?

I have ten-thousand hopes that the
revolutionary armies will complete their
And help make the mining enterprises
successful in the ancestral land.
They will build many battleships and come to
the U.S. territory.
Vowing never to stop till the white men are
completely annihilated.

The dragon out of water is humiliated by
The fierce tiger who is caged is baited by a
As long as I am imprisoned, how can I dare
strive for supremacy?
An advantageous position for revenge will
surely come one day.

I left the village well behind me, bade
farewell to my father and mother.
Now I gaze at distant clouds and mountains,
tears forming like pearls.
The wandering son longed to be wealthy like
Who would have known I would be
imprisoned on Island?
I beat my breast when I think of China and
cry bitterly like Ruan Ji.
Our country's wealth is being drained by
foreigners, causing us to suffer national
My fellow countrymen, have foresight, plan
to be resolute.
And vow to conquer the U.S. and avenge
previous wrongs!

If the land of the Flowery Flag is occupied by
us in turn.
The wooden building will be left for the
angel's revenge.

If you have but one breath left, do not be
discouraged from your purpose.
I respectfully exhort my brothers who are all
talents of Chu.
Having a sense of shame, one can eradicate
Only by wielding the lance can one avoid
certain defeat.
Do not say that we have not the means to
level the ugly barbarians.
I am searching for a method that will turn
destiny back.
One-hundred-thousand men sharpen their
Swearing to behead the Loulan and open
the grasslands and fallow fields.

The low building with three beams merely
shelters the body.
It is unbearable to relate the stories
accumulated on the Island slopes.
Wait till the day I become successful and
fulfill my wish!
I will not speak of love when I level the
immigration station!

I am distressed that we Chinese are detained
in this wooden building.
It is actually racial barriers which cause
difficulties on Yingtai Island.
Even while they are tyrannical, they still
claim to be humanitarian.
I should regret my taking the risks of coming
in the first place.

I thoroughly hate the barbarians because they
do not respect justice.
They continually promulgate harsh laws to
show off their prowess.
They oppress the overseas Chinese and also
violate treaties.
They examine for hookworms and practice
hundreds of despotic acts.

I cannot bear to describe the harsh treatment
by the doctors.
Being stabbed for blood samples and
examined for hookworms was even more
After taking the medicine, I also drank
Like a dumb person eating the huanglian.

We’re interested in which words appear the most often in our set of poems. It’s pretty hard to see much of a pattern when the text is in this form. We’ll come back to the topic of the most common words, with actual numbers, a bit later. For now, run the following cell to generate two interesting visualizations of the most common words (minus those such as “the”, “a”, etc.).

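# First word cloud, using the default settings.
wordcloud = WordCloud().generate(raw)
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")

# Second word cloud with a lower max_font_size, drawn in its own figure
# so that it does not overwrite the first one.
wordcloud = WordCloud(max_font_size=40).generate(raw)
plt.figure()
plt.imshow(wordcloud, interpolation="bilinear")
plt.axis("off")
plt.show()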



Question: What are the most common words you notice? Judging from these words, what do you think these poems are about?

Replace this text with your response

Oops, it seems we’ve forgotten just how many poems we have in our set. Luckily we have a quick way of finding out! Each “\n” in the display of the poem text indicates a line break. Each poem is separated from the next by an empty line, that is, by two consecutive line breaks (“\n\n”).

num_poems = len(raw.split("\n\n"))
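
To see why splitting on a blank line counts poems, here is a minimal sketch on a made-up string. The names `sample` and `sample_poems` are hypothetical stand-ins, not the lab's data.

```python
# Hypothetical miniature stand-in for `raw`: three "poems" separated by blank lines.
sample = "first line\nsecond line\n\nonly line\n\nline a\nline b\nline c"

# Splitting on two consecutive line breaks yields one string per poem.
sample_poems = sample.split("\n\n")
num_sample_poems = len(sample_poems)

print(num_sample_poems)   # how many poems the sample contains
print(sample_poems[1])    # the text of the second poem
```

If a text ended with a blank line, the split would produce an extra empty string at the end, so calling `.strip()` on the text first is a reasonable precaution.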

We can also use this idea to calculate the number of characters in each poem.

num_char_per_poem = [len(p) for p in raw.split("\n\n")]
[259, 212, 227, 262, 306, 233, 654, 260, 488, 550, 206, 167, 331, 325, 128, 203, 196, 251, 221, 250, 488, 201, 206, 237, 344, 243, 442, 345, 507, 474, 382, 465, 624, 444, 302, 399, 230, 505, 268, 437, 288, 226, 532, 116, 487, 254, 279, 263, 233]

This is interesting but seems like just a long list of numbers. What about the average number of characters per poem?
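
One way to compute that average, as a minimal sketch: the list below is copied from the output above, and the standard-library `statistics.mean` is used only to keep the example dependency-free (NumPy's `np.mean` would work just as well). The name `avg_chars` is an assumption made for illustration.

```python
from statistics import mean

# Character counts copied from the output above, one entry per poem.
num_char_per_poem = [
    259, 212, 227, 262, 306, 233, 654, 260, 488, 550, 206, 167, 331, 325,
    128, 203, 196, 251, 221, 250, 488, 201, 206, 237, 344, 243, 442, 345,
    507, 474, 382, 465, 624, 444, 302, 399, 230, 505, 268, 437, 288, 226,
    532, 116, 487, 254, 279, 263, 233,
]

# The average is the total number of characters divided by the number of poems.
avg_chars = mean(num_char_per_poem)
print(avg_chars)
```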


Let’s look at it in histogram form to get a better idea of our data.

Table().with_columns("Character Count", np.asarray(num_char_per_poem)).hist()


Each bar of this histogram covers a range of character counts (its position and width on the x-axis). The taller the bar, the larger the share of poems whose character count falls in that range.
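
The binning behind a histogram can be sketched by hand. This example groups the first ten character counts from the output above into bins 100 characters wide. The bin width and the variable names are assumptions chosen for illustration, not the settings that `hist()` uses.

```python
from collections import Counter

# First ten character counts from the output above.
counts = [259, 212, 227, 262, 306, 233, 654, 260, 488, 550]
bin_width = 100  # assumed width, chosen only for this example

# Map each count to the lower edge of its bin, then tally the bins.
bins = Counter((c // bin_width) * bin_width for c in counts)

for start in sorted(bins):
    share = bins[start] / len(counts)
    print(f"{start}-{start + bin_width - 1}: {share:.0%} of poems")
```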

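We can also use “\n” to look at enjambment, where a line runs on into the next without closing punctuation. For each poem, let’s calculate the proportion of lines that end in a punctuation mark. Lines that do not end in punctuation are the enjambed ones, so the enjambed share is one minus this proportion.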

from string import punctuation

poems = raw.split("\n\n")

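all_poems_enjambment = []
for p in poems:
    lines = p.split("\n")
    # Count the lines whose last character is a punctuation mark
    # (lines without final punctuation run on into the next line).
    enjambment = 0
    for l in lines:
        # Skip empty lines so that l[-1] cannot raise an IndexError.
        if l and l[-1] in punctuation:
            enjambment += 1
    # Store this poem's proportion so the whole list can be inspected.
    all_poems_enjambment.append(enjambment / len(lines))
all_poems_enjambment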

[0.625, 0.42857142857142855, 0.42857142857142855, 0.5555555555555556, 0.1111111111111111, 0.5, 0.45, 0.5, 0.4666666666666667, 0.4375, 0.42857142857142855, 0.2, 0.25, 0.2222222222222222, 0.0, 0.3333333333333333, 0.3333333333333333, 0.5, 0.3333333333333333, 0.2857142857142857, 0.4, 0.42857142857142855, 0.42857142857142855, 0.5, 0.2, 0.4444444444444444, 0.42857142857142855, 0.2727272727272727, 0.5, 0.35714285714285715, 0.3333333333333333, 0.5, 0.3157894736842105, 0.35714285714285715, 0.5, 0.5384615384615384, 0.5, 0.5, 0.375, 0.4666666666666667, 0.5555555555555556, 0.5, 0.47058823529411764, 0.5, 0.5, 0.5, 0.5, 0.5, 0.4444444444444444]
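
If a line is treated as enjambed when it does not end in punctuation, the enjambed share can also be computed directly. The sketch below does so for one poem from the text above; `enjambed_proportion` is a hypothetical helper, not part of the lab's code.

```python
from string import punctuation

def enjambed_proportion(poem):
    """Share of non-empty lines whose last character is not punctuation."""
    lines = [line.strip() for line in poem.split("\n") if line.strip()]
    if not lines:
        return 0.0
    run_on = sum(1 for line in lines if line[-1] not in punctuation)
    return run_on / len(lines)

# One poem from the text above, used as a worked example.
poem = (
    "Over a hundred poems are on the walls.\n"
    "Looking at them, they are all pining at the\n"
    "delayed progress.\n"
    "What can one sad person say to another?"
)

# Only the second line lacks final punctuation, so 1 of 4 lines runs on.
print(enjambed_proportion(poem))
```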

Once again, what about the average?


Let’s now return to the question of the words that appear most frequently in these 49 poems. First we replace all the “\n”’s with spaces, then we use spaCy, an open-source software library for Natural Language Processing (NLP), to parse the text.

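# Load spaCy's English model. The 'en' shortcut works in older spaCy releases;
# newer releases need a full model name such as 'en_core_web_sm'.
nlp = spacy.load('en', parser=False)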
parsed_text = nlp(raw.replace("\n", " "))

We can separate all the words/symbols and put them in a table.

toks_tab = Table()
toks_tab.append_column(label="Word", values=[word.text for word in parsed_text])

... (3366 rows omitted)

toks_tab.append_column(label="POS", values=[word.pos_ for word in parsed_text])
Word POS
sea NOUN
scape NOUN
resembles NOUN
lichen ADJ
twisting VERB
turning VERB
for ADP

... (3366 rows omitted)

Now let’s create a new table with even more columns using the “tablefy” function below.

def tablefy(parsed_text):
    toks_tab = Table()
    toks_tab.append_column(label="Word", values=[word.text for word in parsed_text])
    toks_tab.append_column(label="POS", values=[word.pos_ for word in parsed_text])
    toks_tab.append_column(label="Lemma", values=[word.lemma_ for word in parsed_text])
    toks_tab.append_column(label="Stop Word", values=[word.is_stop for word in parsed_text])
    toks_tab.append_column(label="Punctuation", values=[word.is_punct for word in parsed_text])
    toks_tab.append_column(label="Space", values=[word.is_space for word in parsed_text])
    toks_tab.append_column(label="Number", values=[word.like_num for word in parsed_text])
    toks_tab.append_column(label="OOV", values=[word.is_oov for word in parsed_text])
    toks_tab.append_column(label="Dependency", values=[word.dep_ for word in parsed_text])
    return toks_tab
Word       POS    Lemma     Stop Word  Punctuation  Space  Number  OOV   Dependency
The        DET    the       True       False        False  False   True  det
sea        NOUN   sea       False      False        False  False   True  compound
-          PUNCT  -         False      True         False  False   True  punct
scape      NOUN   scape     False      False        False  False   True  nsubj
resembles  NOUN   resemble  False      False        False  False   True  ROOT
lichen     ADJ    lichen    False      False        False  False   True  compound
twisting   VERB   twist     False      False        False  False   True  dobj
and        CCONJ  and       True       False        False  False   True  cc
turning    VERB   turn      False      False        False  False   True  conj
for        ADP    for       True       False        False  False   True  prep

... (3366 rows omitted)

Next, let’s look at the frequency of words. However, we want to get rid of words such as “the” and “and” (stop words), punctuation, and spaces. We can do this by selecting rows that are not stop words, punctuation, or spaces, then grouping by word and sorting by count!

word_counts = tablefy(parsed_text).where("Stop Word", are.equal_to(False)).where(
    "Punctuation", are.equal_to(False)).where(
    "Space", are.equal_to(False)).group("Word").sort("count",descending=True)
Word count
building 15
land 12
wooden 11
Island 11
day 10
country 10
tears 8
know 7
America 7
wind 6

... (899 rows omitted)
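
The same filter-then-count idea can be sketched without spaCy or the `Table` class. In this minimal example the stop-word list is a tiny hypothetical one (spaCy's real list is much longer), and `count_words` is an illustrative helper rather than the lab's method.

```python
from collections import Counter
from string import punctuation

# Tiny illustrative stop-word list; spaCy's real list is much longer.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "are", "on", "at"}

def count_words(text):
    """Count words after stripping punctuation and dropping stop words."""
    counts = Counter()
    for token in text.split():
        word = token.strip(punctuation).lower()
        if word and word not in STOP_WORDS:
            counts[word] += 1
    return counts

sample = "The wooden building. Imprisoned in the wooden building day after day."
counts = count_words(sample)
print(counts.most_common(3))
```

Unlike the table above, this sketch lowercases every word, so capitalized and lowercase spellings of the same word are counted together.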

In this table, closely related forms such as “day” and “days” are counted separately, which seems strange. It turns out that these words are part of the same “lexeme”, or unit of meaning. For example, “run”, “runs”, “ran”, and “running” are all part of the same lexeme with the lemma ‘run’. Lemmas are another column in our table from above! Nice!

lemma_counts = tablefy(parsed_text).where("Stop Word", are.equal_to(False)).where(
    "Punctuation", are.equal_to(False)).where(
    "Space", are.equal_to(False)).group("Lemma").sort("count",descending=True)
Lemma count
building 16
day 15
land 13
wooden 11
country 10
Island 10
come 9
time 8
tear 8
know 8

... (796 rows omitted)
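
Grouping by lemma amounts to mapping each word to its base form before counting. The sketch below uses a small hand-written lookup, `LEMMAS`, which is purely hypothetical. In the lab, spaCy supplies the lemma for each token.

```python
from collections import Counter

# Hypothetical lookup from surface form to lemma, for illustration only.
LEMMAS = {"tears": "tear", "days": "day", "came": "come", "comes": "come"}

def lemma_counts(words):
    """Count words after replacing each one with its lemma when known."""
    return Counter(LEMMAS.get(w, w) for w in words)

words = ["day", "days", "tears", "come", "came", "day"]
by_lemma = lemma_counts(words)
print(by_lemma)
```

Here “day” and “days” end up in one row, which is why a lemma's count can be higher than the count of any single word form.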

Now let’s look at how many words there are of each part of speech.

pos_counts = tablefy(parsed_text).where("Stop Word", are.equal_to(False)).where(
    "Punctuation", are.equal_to(False)).where(
    "Space", are.equal_to(False)).group("POS").sort("count",descending=True)
POS count
NOUN 585
VERB 392
ADJ 173
PROPN 85
ADV 65
ADP 12
NUM 8
INTJ 1

We can also look at the proportions of each POS out of all the words!

# The total is the same for every row, so compute it once before the loop.
total = np.sum(pos_counts.column("count"))
for i in np.arange(pos_counts.num_rows):
    pos = pos_counts.column("POS").item(i)
    count = pos_counts.column("count").item(i)
    proportion = str(count / total)
    print(pos + " proportion: " + proportion)
NOUN proportion: 0.4428463285389856
VERB proportion: 0.2967448902346707
ADJ proportion: 0.13096139288417866
PROPN proportion: 0.0643451930355791
ADV proportion: 0.04920514761544285
ADP proportion: 0.009084027252081756
NUM proportion: 0.006056018168054504
INTJ proportion: 0.000757002271006813

If we’re interested in words’ relations with each other, we can look at words that are next to each other. The function below returns the word following the first instance of the word you search for in the specified source.

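def nextword(word, source):
    # Return the token that follows the first occurrence of `word`.
    for i, w in enumerate(source):
        # Require a following token so a match on the last word cannot fail.
        if w == word and i + 1 < len(source):
            return source[i+1]
    return None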

Mess around a bit with this function! Change the “word” argument.

split_txt = raw.split()

# Change the target or "home" to other words!
nextword("home", split_txt)
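
Since `nextword` stops at the first match, a natural variation is to collect the word after every match. A minimal sketch, where `all_next_words` is a hypothetical helper and the sample sentence is taken from one of the poems above:

```python
def all_next_words(word, source):
    """Return every token that directly follows `word` in `source`."""
    # source[:-1] skips the last token, which has nothing after it.
    return [source[i + 1] for i, w in enumerate(source[:-1]) if w == word]

tokens = "Once you have some small gains, return home early.".split()
print(all_next_words("home", tokens))
```

Because the text is split on whitespace only, punctuation stays attached to the words, so “early.” and “early” would count as different tokens.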

We are specifically interested in the word “I” and the words that poets use right after it. Let’s make an array of all the words that come after “I” in these poems. For easier viewing, the phrases have been printed out. What do you notice?

one_after_i = make_array()
for i, w in enumerate(split_txt):
    if w == "I":
        one_after_i = np.append(one_after_i, split_txt[i+1])
for i in one_after_i:
    print("I " + i)
I arrived
I began
I arrived
I heard
I was
I could
I embarked
I boarded
I am
I have
I suffered
I am
I sigh
I had
I took
I ate
I arrived
I thought
I could
I to
I would
I only
I can
I willingly
I intended
I have
I used
I immediately
I have
I am
I look
I wish
I just
I had
I think
I had
I resolved
I am
I heard,
I composed
I lie
I get
I count
I have
I feel
I sought
I see
I can
I gaze
I listen
I dissipate
I met
I cannot
I sit
I have
I wish
I could
I confide
I rely
I bow
I can
I can
I cannot
I can
I have
I cannot
I threw
I have
I constantly
I dreamed
I have
I am
I will
I cannot
I be
I please?
I am
I must
I must
I bear
I look
I am
I gaze
I sit
I can
I return
I came
I will
I will
I am
I threw
I bought
I opened
I reminisce
I have
I lean
I am
I grieve
I say?
I have
I am
I dare
I left
I gaze
I would
I beat
I think
I respectfully
I am
I become
I will
I level
I am
I should
I thoroughly
I cannot
I also

Above we have only shown the next word. What about the next two words? Does this give you any new insight?

two_after_i = make_array()
for i, w in enumerate(split_txt):
    if w == "I":
        two_after_i = np.append(two_after_i, split_txt[i+1] + " " + split_txt[i+2])
for i in two_after_i:
    print("I " + i)
I arrived at
I began rushing
I arrived on
I heard I
I was forbidden
I could do
I embarked on
I boarded the
I am at
I have yet
I suffered misery
I am not
I sigh because
I had intended
I took passage
I ate wind
I arrived safely
I thought I
I could land
I to know
I would become
I only wish
I can land
I willingly became
I intended to
I have not
I used to
I immediately raised
I have experienced
I am subject
I look up
I wish to
I just write
I had a
I think back
I had not
I resolved to
I am still
I heard, faintly,
I composed a
I lie stiff
I get up
I count the
I have not
I feel embarrassed.
I sought the
I see my
I can only
I gaze at
I listen to
I dissipate my
I met with
I cannot bear
I sit passively
I have a
I wish I
I could travel
I confide my
I rely solely
I bow my
I can do.
I can do
I cannot extend
I can only
I have infinite
I cannot fly
I threw away
I have no
I constantly pace
I dreamed of
I have run
I am allowed
I will toss
I cannot come
I be allowed
I please? There
I am rid
I must remember
I must be
I bear to
I look to
I am anxious
I gaze to
I sit in
I can enter
I return to
I came to
I will have
I will certainly
I am a
I threw away
I bought an
I opened a
I reminisce the
I have passed
I lean on
I am ashamed
I grieve for
I say? I
I have ten-thousand
I am imprisoned,
I dare strive
I left the
I gaze at
I would be
I beat my
I think of
I respectfully exhort
I am searching
I become successful
I will not
I level the
I am distressed
I should regret
I thoroughly hate
I cannot bear
I also drank
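
A long printed list is hard to scan, so one option is to tally how often each phrase occurs. The sketch below generalizes the loops above to any number of following words. `phrases_after` and the sample sentence are assumptions made for illustration.

```python
from collections import Counter

def phrases_after(word, tokens, n=2):
    """Tally the n-token phrases that follow each occurrence of `word`."""
    tally = Counter()
    for i, w in enumerate(tokens):
        following = tokens[i + 1 : i + 1 + n]
        # Skip matches too close to the end to have n following tokens.
        if w == word and len(following) == n:
            tally[" ".join(following)] += 1
    return tally

tokens = "I gaze at clouds and I gaze at the moon while I sit in jail".split()
tally = phrases_after("I", tokens)
print(tally.most_common())
```

Applied to `split_txt`, the same function would show which phrases after “I” recur most often across the poems.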

Try doing some exploring of your own! If you’re feeling stuck, feel free to copy and edit code from above.

# Write your own code here!

Sentiment Analysis

We can do some analysis of the overall sentiments, or emotions conveyed, in each of the poems using the code below. Here, we analyze the overall sentiment of each poem individually. Once you run the next cell, you’ll see the sentiment values for each poem. TextBlob reports polarity on a scale from -1 to 1: a value below 0 denotes negative sentiment, a value above 0 denotes positive sentiment, and 0 is neutral.

sentiments = make_array()
for p in poems:
    poem = TextBlob(p)
    sentiments = np.append(sentiments, poem.sentiment.polarity)
array([-0.04090909,  0.18333333, -0.02222222,  0.38333333,  0.175     ,
        0.        ,  0.09791667,  0.12878788, -0.02380952,  0.15104167,
       -0.19166667,  0.        , -0.46      , -0.08061224, -0.03333333,
        0.03333333, -0.03      , -0.28333333,  0.03333333,  0.        ,
       -0.195     , -0.6875    , -0.025     ,  0.        ,  0.16666667,
       -0.26      , -0.155     ,  0.38611111, -0.10083333, -0.03194444,
       -0.10833333, -0.07727273,  0.12777778, -0.134375  ,  0.15      ,
        0.08333333,  0.1375    , -0.12625   ,  0.075     , -0.05      ,
        0.29      ,  0.225     , -0.05138889,  0.        ,  0.00204082,
        0.265625  ,  0.0625    , -0.33333333, -0.025     ])

Now, what does this mean? It appears that the number of poems with negative sentiment is about the same as the number of poems with positive or neutral (0) sentiment. We can look at the proportion of negative poems in the next cell:

neg_proportion = np.count_nonzero(sentiments < 0)/len(sentiments)

Okay, so just under half of the poems have negative sentiment. So, on average the poems have slightly positive sentiment, right?

We can also perform sentiment analysis across the text of all of the poems at once and see what happens:

poems_all = TextBlob(raw.replace('\n', ' '))

This way of analyzing the text tells us that the language in all of the poems has slightly negative sentiment.

One more analysis we can perform is computing the average sentiment of the poems, given the list of each individual poem’s sentiments that we computed earlier:


This method also tells us that our poems have slightly negative sentiment, on average.
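
A share of negative poems below one half does not guarantee a positive average, because the average also depends on how strongly negative each poem is. A minimal sketch with five values taken from the array above (the variable names are assumptions):

```python
from statistics import mean

# Five polarity values copied from the array above.
sample_sentiments = [-0.46, 0.03333333, -0.03, 0.175, 0.18333333]

# Share of poems whose polarity is below zero.
negative_share = sum(1 for s in sample_sentiments if s < 0) / len(sample_sentiments)

# Average polarity across the same poems.
mean_polarity = mean(sample_sentiments)

print(negative_share)
print(mean_polarity)
```

Only two of the five values are negative, yet the average is below zero because one poem is strongly negative while the positive poems are only mildly positive.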

Here, let’s look at one of the poems with its sentiment value (the entry at index 3 of the array above, about 0.38):

poem_3 = poems[3].replace('\n', ' ')
The gold and silver of America is very appealing. Jabbing an awl into the thigh in search of glory, I embarked on the journey. Not only are my one-thousand pieces of gold already depleted, but My countenance is blackened. It is surely for the sake of the family.

Let’s look at one more poem (the entry at index 47 of the array above, about -0.33):

poem_47 = poems[47].replace('\n', ' ')
I thoroughly hate the barbarians because they do not respect justice. They continually promulgate harsh laws to show off their prowess. They oppress the overseas Chinese and also violate treaties. They examine for hookworms and practice hundreds of despotic acts.

Question: Do you think the sentiment analyzer did a good job assigning the sentiment to these poems? What might that mean for the trends we see in our average sentiment across the poems?

Replace this text with your response