U.S. Small Business Administration

Estimated Time: 60 Minutes
Developers: James Geronimo, Suparna Kompalli

Run the cell below before running any other code cells!

from utils import *

Table of Contents

  1. Background

  2. About the Data

  3. Inspecting the Data

  4. Top States by SBA-Approved Loan Amounts

  5. Top Cities by SBA-Approved Loan Amounts

  6. Top Industries by SBA-Approved Loan Amounts

  7. SBA Loan Counts and Proportions

  8. Spotlight on Los Angeles


1. Background

Small businesses serve as a core component of innovation, employment, and community development in the United States. Since its founding in 1953, the U.S. Small Business Administration (SBA) has played a critical role in expanding access to capital by offering loan guarantees to small enterprises that may struggle to obtain funding through traditional credit markets.

The importance of supporting small businesses goes beyond economics — it fosters entrepreneurship, reduces unemployment, and strengthens local economies. However, these loans are not without risk; defaults and charge-offs are also part of the picture, especially in volatile or highly competitive industries.

Question 1.1: What small business(es) in your local community hold importance in your everyday life?

Your Answer Here

Question 1.2: If you were to start your own small business, what would it specialize in and why?

Your Answer Here


2. About the Data

In this notebook, we will analyze a comprehensive dataset from the SBA, originally sourced from Kaggle:

Should This Loan Be Approved or Denied?

The dataset contains 899,164 SBA loan records, with detailed information about each loan; the full list of variables appears in the data dictionary in Section 3.

This dataset covers the geographic, demographic, and financial dimensions of small business lending in the U.S. It lets us explore a variety of questions about SBA-backed funding, successful industries, and loan approval trends.

Throughout this notebook, we will use data visualizations, descriptive statistics, and interactive tools to uncover insights about how and where SBA loans are distributed — and what that might say about the broader startup ecosystem.

For further academic insight, see the associated article by Li, Mickel, and Taylor (2018):

“Should This Loan Be Approved or Denied?” A Large Dataset with Class Assignment Guidelines

Question 2: Skim through the introduction of the paper linked above. What was the main purpose for the construction of the SBA’s dataset?

Your Answer Here


3. Inspecting the Data

We begin by importing the SBA loan dataset and displaying a DataFrame in the cell below.

To get a feel for the structure and contents of the dataset, we show the first 5 rows of the data in chunks of 9 columns at a time. This lets us explore the attributes associated with each loan, such as loan identifiers and business information, bank and loan approval information, and disbursement and financial data. A data dictionary is provided below:

Data Dictionary

| Variable Name | Description |
| --- | --- |
| LoanNr_ChkDgt | Identifier (Primary key) |
| Name | Borrower name |
| City | Borrower city |
| State | Borrower state |
| Zip | Borrower zip code |
| Bank | Bank name |
| BankState | Bank state |
| NAICS | North American Industry Classification System code |
| ApprovalDate | Date SBA commitment was issued |
| ApprovalFY | Fiscal year of SBA commitment |
| Term | Loan term in months |
| NoEmp | Number of business employees |
| NewExist | 1 = Existing business, 2 = New business |
| CreateJob | Number of jobs created |
| RetainedJob | Number of jobs retained |
| FranchiseCode | Franchise code (00000 or 00001 = No franchise) |
| UrbanRural | 1 = Urban, 2 = Rural, 0 = Undefined |
| RevLineCr | Revolving line of credit (Y = Yes, N = No) |
| LowDoc | LowDoc Loan Program (Y = Yes, N = No) |
| ChgOffDate | Date when the loan was charged off (defaulted) |
| DisbursementDate | Date when the loan was disbursed |
| DisbursementGross | Amount disbursed |
| BalanceGross | Gross amount still outstanding |
| MIS_Status | Loan status (CHGOFF = charged off, PIF = paid in full) |
| ChgOffPrinGr | Principal amount charged off |
| GrAppv | Gross amount of loan approved by the bank |
| SBA_Appv | SBA’s guaranteed portion of the approved loan |

This initial inspection is important for identifying which columns will be most useful in answering questions about geographic, financial, and industry-based trends in startup funding.

initial_inspection()
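
The source of `initial_inspection()` lives in the course's `utils` module and is not shown here. As a rough sketch of the chunked display described above, assuming pandas and a made-up stand-in table (the helper name `head_in_chunks` and the toy columns are hypothetical, not part of `utils`):

```python
import pandas as pd

# Toy stand-in for the SBA table: 20 made-up columns, 6 rows (not real data).
toy = pd.DataFrame({f"col_{i}": range(6) for i in range(20)})

def head_in_chunks(df, n_rows=5, chunk_size=9):
    """Return the first n_rows of df as a list of column-wise chunks."""
    head = df.head(n_rows)
    return [
        head.iloc[:, start:start + chunk_size]
        for start in range(0, df.shape[1], chunk_size)
    ]

chunks = head_in_chunks(toy)
for chunk in chunks:
    print(chunk.to_string())  # one block of up to 9 columns at a time
    print()
```

Splitting by columns keeps each printed block narrow enough to read without horizontal scrolling; the last chunk simply holds whatever columns remain.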

Question 3: How might using a data dictionary be useful when looking through our dataset?

Your Answer Here

4. Top States by SBA-Approved Loan Amounts

In this section, we investigate how SBA-approved funding is distributed geographically across U.S. states. Using a horizontal bar chart, we highlight the top 15 states by total approved loan volume.

The visualization uses a green color gradient to emphasize differences in loan volume, making it easier to compare across states.

This type of visualization helps reveal regional disparities in startup funding support. It also raises analytical questions for further exploration, setting the stage for deeper geographic or industry-specific analysis.

top_states_by_amount()
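
How `top_states_by_amount()` computes its totals is not shown in this notebook. One plausible sketch, assuming pandas, that the dollar columns arrive as text such as "$60,000.00", and that `SBA_Appv` is the amount being summed (the helper names and the toy rows below are hypothetical):

```python
import pandas as pd

# Toy rows (made up, not real SBA records). Dollar amounts are assumed to be
# stored as text with a "$" sign and thousands separators.
loans = pd.DataFrame({
    "State": ["CA", "CA", "TX", "NY", "TX", "OH"],
    "SBA_Appv": ["$60,000.00", "$40,000.00", "$25,000.00",
                 "$90,000.00", "$35,000.00", "$10,000.00"],
})

def to_dollars(col):
    """Strip '$' and ',' from currency strings and convert to float."""
    return col.str.replace(r"[$,]", "", regex=True).str.strip().astype(float)

def top_states(df, n=15, amount_col="SBA_Appv"):
    """Total approved dollars per state, largest first, top n only."""
    totals = to_dollars(df[amount_col]).groupby(df["State"]).sum()
    return totals.nlargest(n)

top = top_states(loans, n=3)
print(top)
```

With matplotlib installed, `top.sort_values().plot.barh()` would draw the result as a horizontal bar chart with the largest state at the top.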

Question 4.1: What are the top states in this plot? Why might this be the case?

Your Answer Here

Question 4.2: What states did you expect to be higher (or lower) on this list?

Your Answer Here


5. Top Cities by SBA-Approved Loan Amounts

In this section, we drill down from states to individual cities to see where SBA-backed loans are most heavily concentrated.

Below, we plot the top 20 cities by total SBA-guaranteed approval amount, coloring the bars by state for additional geographic context.

top_cities_by_amount()
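
A city-level ranking needs one extra precaution compared with the state-level one: the same city name can occur in several states. The sketch below, which is an illustration and not the `utils` implementation, groups on city and state together. It assumes pandas, numeric dollar amounts, and city names that may vary in case; the toy rows are made up.

```python
import pandas as pd

# Toy rows (made up). Dollar amounts are already numeric here for brevity.
loans = pd.DataFrame({
    "City": ["SPRINGFIELD", "SPRINGFIELD", "LOS ANGELES", "Los Angeles", "HOUSTON"],
    "State": ["IL", "MO", "CA", "CA", "TX"],
    "SBA_Appv": [50_000.0, 20_000.0, 80_000.0, 30_000.0, 60_000.0],
})

def top_cities(df, n=20):
    """Total SBA_Appv per (city, state) pair, largest first, top n only."""
    # Normalize spelling so "Los Angeles" and "LOS ANGELES" count as one city.
    city = df["City"].str.strip().str.upper()
    totals = df["SBA_Appv"].groupby([city, df["State"]]).sum()
    return totals.nlargest(n)

top = top_cities(loans, n=3)
print(top)
```

Keeping the state in the group key also makes it straightforward to color each bar by state afterwards.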

Question 5.1: Which three cities top this chart, and what factors (industry makeup, population, policies) might explain their high SBA-approved volumes?

Your Answer Here

Question 5.2: Do any states appear more than once among the top 20 cities? What might that tell you about how SBA funding is distributed within those states?

Your Answer Here

Question 5.3: How does the city-level picture compare to the state-level chart from Section 4? Do the same places dominate, or do we see different hotspots emerge?

Your Answer Here


6. Top Industries by SBA-Approved Loan Amounts

Having examined SBA-approved funding across geography (states in Section 4, cities in Section 5), we now turn to the industry level to understand which economic sectors receive the most support. By grouping North American Industry Classification System (NAICS) codes by their first two digits, we aggregate individual industries into broader sectors (e.g., “72” = Accommodation & Food Services, “44” = Retail Trade). This lets us compare sectors on a level playing field and uncover where SBA guarantees are most heavily concentrated.

The NAICS Sector Descriptions table below maps each code to its corresponding industry.

| NAICS Code | Sector Description |
| --- | --- |
| 11 | Agriculture, forestry, fishing and hunting |
| 21 | Mining, quarrying, and oil and gas extraction |
| 22 | Utilities |
| 23 | Construction |
| 31–33 | Manufacturing |
| 42 | Wholesale trade |
| 44–45 | Retail trade |
| 48–49 | Transportation and warehousing |
| 51 | Information |
| 52 | Finance and insurance |
| 53 | Real estate and rental and leasing |
| 54 | Professional, scientific, and technical services |
| 55 | Management of companies and enterprises |
| 56 | Administrative and support and waste management and remediation services |
| 61 | Educational services |
| 62 | Health care and social assistance |
| 71 | Arts, entertainment, and recreation |
| 72 | Accommodation and food services |
| 81 | Other services (except public administration) |
| 92 | Public administration |

The chart generated below presents a horizontal bar plot of total SBA-approved loan amounts by NAICS sector code. The length and color intensity of each bar correspond to the scale of funding. These insights can help highlight sectoral priorities in small business financing, inform risk assessments, and guide entrepreneurs toward areas with robust SBA backing.

top_industries_by_amount()
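
The two-digit grouping described above can be sketched as follows. This is an illustration under stated assumptions, not the `utils` code: it assumes pandas, numeric dollar amounts, and that a NAICS value of 0 means no code was recorded. Folding prefixes such as 31, 32, and 33 into a single "31-33" label is a choice made here to match the sector table; the helper name and toy rows are hypothetical.

```python
import pandas as pd

# Two-digit prefixes that belong to a multi-code sector in the table above.
SECTOR_RANGES = {
    "31": "31-33", "32": "31-33", "33": "31-33",
    "44": "44-45", "45": "44-45",
    "48": "48-49", "49": "48-49",
}

def naics_sector(code):
    """Map a full NAICS code to its sector label via the first two digits."""
    prefix = str(code)[:2]
    return SECTOR_RANGES.get(prefix, prefix)

# Toy rows (made up, not real SBA records).
loans = pd.DataFrame({
    "NAICS": [722110, 722211, 445110, 451120, 332721, 0],
    "SBA_Appv": [40_000.0, 10_000.0, 30_000.0, 5_000.0, 70_000.0, 15_000.0],
})

# Treat 0 as "no code recorded" and leave those loans out (an assumption).
coded = loans[loans["NAICS"] != 0]
by_sector = coded["SBA_Appv"].groupby(coded["NAICS"].map(naics_sector)).sum()
by_sector = by_sector.sort_values(ascending=False)
print(by_sector)
```

Whether uncoded loans are dropped or kept as their own bar changes the totals, so it is worth checking how many rows fall into that group before comparing sectors.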

Question 6.1: Which NAICS sector tops this chart, and why might it receive such high SBA support?

Your Answer Here

Question 6.2: Do any sectors surprise you with unexpectedly high or low funding levels? What factors might explain these outliers?

Your Answer Here

Question 6.3: How do the industry-level patterns compare to the geographic trends we saw earlier? Are certain sectors clustered in particular states or cities?

Your Answer Here


7. SBA Loan Counts and Proportions

Up to now, we’ve focused on total dollar-value metrics: total SBA-approved amounts by state, city, and industry. In this section, we examine two complementary views of SBA activity:

  1. Loan Count per State: Shows the total number of SBA loans issued in each state, highlighting where the SBA is most active.

  2. Average SBA-Approved Amount per Loan: Calculates the mean SBA-guaranteed amount per loan for each state, revealing where individual loans tend to be larger or smaller.

By comparing these two choropleth maps, we can see whether states with high loan counts also have high average loan sizes, or whether other patterns emerge.

loan_count_per_state()
avg_amount_per_loan_by_state()
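
Both views come from the same per-state grouping, just with different aggregations. A minimal sketch, assuming pandas and numeric `SBA_Appv` values (the toy rows and output column names are made up for illustration and are not taken from `utils`):

```python
import pandas as pd

# Toy rows (made up, not real SBA records).
loans = pd.DataFrame({
    "State": ["CA", "CA", "CA", "TX", "TX", "VT"],
    "SBA_Appv": [10_000.0, 20_000.0, 30_000.0, 100_000.0, 50_000.0, 200_000.0],
})

# One row per state: number of loans, mean amount per loan, and total amount.
per_state = loans.groupby("State")["SBA_Appv"].agg(
    loan_count="count",
    avg_amount="mean",
    total_amount="sum",
)
print(per_state)
```

In this toy table the state with the most loans (CA) has the smallest average, while the state with a single loan (VT) has the largest, which is the kind of contrast the two maps are meant to surface. Each column could then be passed to a mapping library, for example Plotly's `px.choropleth` with `locationmode="USA-states"`.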

Question 7.1: Which states have the highest loan counts, and do they also exhibit high average loan amounts? Does this differ from what we saw in Section 4?

Your Answer Here

Question 7.2: Identify any states with a high number of loans but a low average loan size (or vice versa). What might this indicate about small-business lending patterns in those states?

Your Answer Here

8. Spotlight on Los Angeles

In this section, we zero in on Los Angeles, CA, one of the country’s largest and most diverse startup ecosystems. First, we filter all SBA loans to those issued in Los Angeles and compute a concise summary table.

Next, we parse the approval dates and plot a line chart of total SBA-approved dollars per year, revealing how funding in LA has evolved over time.

los_angeles_summary()
annual_sba_approved_amount_la()
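
The filter-then-aggregate steps described above might look roughly like this. The sketch assumes pandas, numeric dollar amounts, and approval dates written like "28-Feb-97"; the date format and the toy rows are assumptions, and the actual `utils` functions may differ.

```python
import pandas as pd

# Toy rows (made up, not real SBA records).
loans = pd.DataFrame({
    "City": ["LOS ANGELES", "Los Angeles", "SAN DIEGO", "LOS ANGELES", "LOS ANGELES"],
    "State": ["CA", "CA", "CA", "CA", "TX"],
    "ApprovalDate": ["28-Feb-97", "07-Jun-97", "15-Mar-98", "02-Jan-06", "09-Sep-06"],
    "SBA_Appv": [40_000.0, 10_000.0, 25_000.0, 90_000.0, 5_000.0],
})

# Keep only Los Angeles, CA; checking the state guards against same-named cities.
is_la = (loans["City"].str.strip().str.upper() == "LOS ANGELES") & (loans["State"] == "CA")
la = loans[is_la].copy()

# Parse the dates and pull out the year. Note that %y reads two-digit years
# 69-99 as 1969-1999 and 00-68 as 2000-2068, so older loans need extra care.
la["ApprovalYear"] = pd.to_datetime(la["ApprovalDate"], format="%d-%b-%y").dt.year

# Total approved dollars per year, ready for a line chart.
annual = la.groupby("ApprovalYear")["SBA_Appv"].sum()
print(annual)
```

With matplotlib installed, `annual.plot()` would draw the yearly totals as a line. The `ApprovalFY` column in the data dictionary is an alternative source for the year that avoids two-digit date parsing.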

Question 8.1: Based on the summary table, which metric surprised you most, and what might explain that result in the context of LA’s economy?

Your Answer Here

Question 8.2: Examine the annual line chart—identify any sharp increases or declines. What local or national events (like policy changes or economic events) could correlate with those inflection points?

Your Answer Here


✅ Congrats! You’ve completed the SBA loan exploration notebook! 🎉

References
  1. Li, M., Mickel, A., & Taylor, S. (2018). “Should This Loan be Approved or Denied?”: A Large Dataset with Class Assignment Guidelines. Journal of Statistics Education, 26(1), 55–66. 10.1080/10691898.2018.1434342