Pycensuskr example – Insang Song

This post demonstrates how to use the pycensuskr package to analyze Korean census data.

Installation

You can install pycensuskr via pip:

pip install pycensuskr
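To confirm that the install worked before running the examples, a small check like the one below may help. It is only a sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is made up for this post and is not part of pycensuskr.

```python
from importlib import metadata
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; not part of pycensuskr.
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version("pycensuskr"))
```

If this prints `None`, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.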

Basic Usage

Here is a basic example of how to load census data and district boundaries.

from pycensuskr import CensusKR
from matplotlib import pyplot as plt
import geopandas as gpd

# Create a CensusKR instance
census = CensusKR()

# Load data for a specific year (e.g., 2020)
# Judging by the output below, this returns a DataFrame, so keys() lists its column names
data_2020 = census.load_data(2020)
print(data_2020.keys())

# Load district boundaries for a specific year
districts_2020 = census.load_districts(2020)
print(districts_2020.head())

Index(['year', 'adm1', 'adm1_code', 'adm2', 'adm2_code', 'type', 'class1',
       'class2', 'unit', 'value'],
      dtype='object')
   year  adm2_code                                           geometry
0  2020      21140  POLYGON ((1147104.09 1689056.052, 1147080.539 ...
1  2020      21020  POLYGON ((1137762.765 1683520.509, 1137613.699...
2  2020      21010  POLYGON ((1139121.024 1678920.638, 1139128.662...
3  2020      21040  POLYGON ((1144618.022 1676795.027, 1144559.535...
4  2020      21070  POLYGON ((1142639.495 1682655.233, 1142678.184...

# Example: Process and visualize tax data
# First, truncate district codes to their first four digits so they match
# the aggregation level of the tax data
districts_2020["adm2_re"] = districts_2020["adm2_code"].astype(str).str.slice(0, 4)
# Dissolve (union) the geometries that share the same four-digit prefix
districts_2020 = districts_2020.dissolve(by="adm2_re", as_index=False)
# Rebuild a five-digit integer code by appending "0" to the prefix
districts_2020["adm2_code"] = (districts_2020["adm2_re"] + "0").astype(int)

# Retrieve tax data
df_tax_2020 = census.anycensus(year=2020, type="tax", aggregator="sum")

# Merge spatial data with tax data
districts_tax_2020 = districts_2020.merge(df_tax_2020, on="adm2_code")

# Plot labor income
districts_tax_2020.plot("income_labor_mil", legend=True, figsize=(10, 10))
plt.title("Labor Income by District (2020)")
plt.show()
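The district-code step above is easy to get subtly wrong, and the merge uses an inner join by default, so rows with unmatched codes are dropped without warning. The sketch below restates the code normalization in plain Python and adds a key check that could be run before merging. The helper names `to_parent_code` and `unmatched_codes` are hypothetical, and apart from the five codes copied from the printed output, every code in it is made up for illustration.

```python
from typing import Iterable, List, Tuple


def to_parent_code(adm2_code: int) -> int:
    """Keep the first four digits of a code and append "0".

    Mirrors the str.slice(0, 4) + "0" step applied to the boundaries.
    """
    return int(str(adm2_code)[:4] + "0")


def unmatched_codes(
    left_codes: Iterable[int], right_codes: Iterable[int]
) -> Tuple[List[int], List[int]]:
    """Return the codes found only on the left and only on the right."""
    left, right = set(left_codes), set(right_codes)
    return sorted(left - right), sorted(right - left)


# The first five codes come from the boundary output shown earlier;
# 31011 and 31012 are hypothetical sub-district codes.
boundary_codes = [21140, 21020, 21010, 21040, 21070, 31011, 31012]
parent_codes = sorted({to_parent_code(code) for code in boundary_codes})

# Hypothetical codes standing in for the adm2_code column of the tax table.
tax_codes = [21010, 21020, 21040, 21070, 21140, 31010, 39999]

only_in_boundaries, only_in_tax = unmatched_codes(parent_codes, tax_codes)
print("Only in boundaries:", only_in_boundaries)
print("Only in tax data:", only_in_tax)
```

With the real data, the two lists would be built from `districts_2020["adm2_code"]` and `df_tax_2020["adm2_code"]`. If either list is non-empty, some districts will be missing from the map.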