Pycensuskr example

news
code
analysis
Author

Insang Song

Published

January 26, 2026

This post demonstrates how to use the pycensuskr package to analyze Korean census data.

Installation

You can install pycensuskr via pip:

pip install pycensuskr
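
To confirm the install from Python, a minimal standard-library sketch can look up the installed version. The helper name `installed_version` is hypothetical, and the sketch assumes the distribution is registered under the same name used in the pip command.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # "pycensuskr" is the name used in the pip command above
    print(installed_version("pycensuskr"))
```

A `None` result means pip did not install the package into the interpreter that is running the script, which is a common cause of import errors when several environments are present.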

Basic Usage

Here is a basic example of how to load census data and district boundaries.

from pycensuskr import CensusKR
from matplotlib import pyplot as plt
import geopandas as gpd

# Create a CensusKR instance
census = CensusKR()

# Load data for a specific year (e.g., 2020)
# Judging by the printed output, this returns a long-format pandas DataFrame,
# so .keys() lists its column names
data_2020 = census.load_data(2020)
print(data_2020.keys())

# Load district boundaries for a specific year
districts_2020 = census.load_districts(2020)
print(districts_2020.head())
Index(['year', 'adm1', 'adm1_code', 'adm2', 'adm2_code', 'type', 'class1',
       'class2', 'unit', 'value'],
      dtype='object')
   year  adm2_code                                           geometry
0  2020      21140  POLYGON ((1147104.09 1689056.052, 1147080.539 ...
1  2020      21020  POLYGON ((1137762.765 1683520.509, 1137613.699...
2  2020      21010  POLYGON ((1139121.024 1678920.638, 1139128.662...
3  2020      21040  POLYGON ((1144618.022 1676795.027, 1144559.535...
4  2020      21070  POLYGON ((1142639.495 1682655.233, 1142678.184...
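
The column listing printed above suggests a long-format table, with one row per district and measure. Below is a minimal sketch of narrowing such a table to one data type and reshaping it to one row per district, assuming plain pandas. The rows are made-up placeholders rather than real census values, the `class1` labels are hypothetical, and `to_wide` is an illustrative helper, not part of pycensuskr.

```python
import pandas as pd

# Toy stand-in for a long-format census table (placeholder values only)
toy = pd.DataFrame(
    {
        "year": [2020, 2020, 2020, 2020],
        "adm2_code": [21010, 21010, 21020, 21020],
        "type": ["tax", "tax", "tax", "tax"],
        "class1": ["income_labor", "income_general", "income_labor", "income_general"],
        "value": [10.0, 4.0, 7.0, 3.0],
    }
)


def to_wide(df: pd.DataFrame, data_type: str) -> pd.DataFrame:
    """Filter one data type and pivot class1 labels into columns."""
    subset = df[df["type"] == data_type]
    wide = subset.pivot_table(
        index="adm2_code", columns="class1", values="value", aggfunc="sum"
    )
    return wide.reset_index()


wide = to_wide(toy, "tax")
print(wide)
```

The wide layout has one row per `adm2_code`, which is the shape needed to join attributes onto district boundaries.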
# Example: process and visualize tax data
# Truncate each district code to its first four digits so that sub-districts
# share one key, matching the aggregation level of the tax data
districts_2020["adm2_re"] = districts_2020["adm2_code"].astype(str).str.slice(0, 4)
# Merge the geometries that share the same truncated code
districts_2020 = districts_2020.dissolve(by="adm2_re", as_index=False)
# Rebuild the five-digit integer code used as the merge key
districts_2020["adm2_code"] = (districts_2020["adm2_re"] + "0").astype(int)

# Retrieve tax data
df_tax_2020 = census.anycensus(year=2020, type="tax", aggregator="sum")

# Merge spatial data with tax data
districts_tax_2020 = districts_2020.merge(df_tax_2020, on="adm2_code")

# Plot labor income as a choropleth; GeoDataFrame.plot returns the Axes it drew on
ax = districts_tax_2020.plot(column="income_labor_mil", legend=True, figsize=(10, 10))
ax.set_title("Labor Income by District (2020)")
plt.show()
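
Because `merge` defaults to an inner join, any district whose code has no match in the tax table silently drops off the map. The sketch below shows one way to check for that, using plain pandas on placeholder tables. The codes 31011, 31012, and 99990 are hypothetical, the values are not real data, and `parent_code` is an illustrative helper that mirrors the truncation step above.

```python
import pandas as pd


def parent_code(codes: pd.Series) -> pd.Series:
    """Keep the first four digits of each code and append '0'."""
    return (codes.astype(str).str.slice(0, 4) + "0").astype(int)


# Hypothetical codes: 31011 and 31012 stand for two sub-districts of 31010
boundaries = pd.DataFrame({"adm2_code": [21010, 31011, 31012, 99990]})
boundaries["adm2_code"] = parent_code(boundaries["adm2_code"])
boundaries = boundaries.drop_duplicates(ignore_index=True)

# Placeholder attribute table; values are not real data
attrs = pd.DataFrame({"adm2_code": [21010, 31010], "value": [1.0, 2.0]})

# how="left" keeps every boundary row; indicator=True adds a "_merge" column
checked = boundaries.merge(attrs, on="adm2_code", how="left", indicator=True)
unmatched = checked.loc[checked["_merge"] == "left_only", "adm2_code"].tolist()
print(unmatched)
```

An empty `unmatched` list indicates that every boundary found a partner row. Any codes listed there would be missing from a map built on the default inner merge.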