Conversation
This update separates out some of the functionality, then addresses the ability to look back multiple years to grab the most recently available data. It also integrates the ability to split NAICS 72 into NAICS 721 and 722.
Changed the name of the function and updated all the spots where `read_sql_query_acs` was being used.
Pull request overview
This pull request adds employment estimates functionality to the codebase, implementing LEHD LODES data processing at the block level by 2-digit NAICS codes (with special handling to split NAICS 72 into 721 and 722) and scaling to QCEW county level controls.
Changes:
- Added new employment module with functions to retrieve LODES data, apply NAICS 72 splits, aggregate to MGRA level, and control to QCEW totals
- Renamed `read_sql_query_acs()` to `read_sql_query_custom()` to support querying ACS, LEHD LODES, and EDD point-level data with enhanced year lookback functionality
- Created SQL queries for employment data retrieval and database schema updates for storing employment inputs/outputs
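The year-lookback behavior described above can be sketched as follows. The function name, signature, and `params` key here are illustrative assumptions, not the actual `read_sql_query_custom()` API:

```python
import pandas as pd

def read_sql_query_with_lookback(sql, engine, year, max_lookback=5):
    """Sketch of the lookback idea (hypothetical name and signature):
    query the requested year, then step back one year at a time until
    data is found or the lookback window is exhausted."""
    for candidate in range(year, year - max_lookback - 1, -1):
        df = pd.read_sql_query(sql, engine, params={"year": candidate})
        if not df.empty:
            return df, candidate  # also report which year actually had data
    raise ValueError(f"no data within {max_lookback} years of {year}")
```

Returning the year that was actually used makes it easy to log when the fallback kicked in (e.g. a 2016 request being served with 2015 data).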
Reviewed changes
Copilot reviewed 16 out of 16 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| wiki/Utility.md | Updated documentation to reflect renamed function and expanded data source support |
| sql/employment/xref_block_to_mgra.sql | New query to retrieve Census block to MGRA crosswalk |
| sql/employment/get_naics72_split.sql | New query to split NAICS 72 into 721 and 722 using EDD point-level data |
| sql/employment/get_mgra.sql | New query to retrieve distinct MGRAs for a run |
| sql/employment/get_lodes_data.sql | New query to retrieve LEHD LODES employment data by block and industry code |
| sql/employment/QCEW_control.sql | New query to retrieve QCEW county-level employment controls |
| sql/create_objects.sql | Added database tables for employment control inputs and job outputs |
| python/utils.py | Renamed and enhanced query function with improved year lookback logic and added LEHD database engine |
| python/pop_type.py | Updated function call to use renamed utility function |
| python/parsers.py | Added employment module to configuration validation |
| python/hs_hh.py | Updated function call to use renamed utility function |
| python/hh_characteristics.py | Updated function calls to use renamed utility function |
| python/employment.py | New module implementing employment estimates workflow |
| python/ase.py | Updated function calls to use renamed utility function |
| main.py | Integrated employment module into main execution flow |
| config.yml | Added employment flag to debug configuration |
Eric-Liu-SANDAG left a comment:
Make sure all SQL files use only spaces; there are some tabs here and there.
I haven't looked at any output yet. Is there a [run_id] which has employment numbers yet?
- What? `--Drop Temp table and ok and spatial index if they exist`. Also, this doesn't drop any spatial index.
- What's the point of `DECLARE @qry NVARCHAR(max)`?
- If you have long math equations, would very much prefer new lines to start with an operator.
- I'm confused by the usage of `COALESCE`. Wouldn't `ISNULL` be a lot more clear?
- The column `[jobs]` is very poorly named. It should be something like `[average_monthly_jobs]`.
- My preferred formatting for longer `CASE` statements is:

```sql
SELECT
    CASE
        WHEN LEFT([code], 3) = '721' THEN '721'
        WHEN LEFT([code], 3) = '722' THEN '722'
        ELSE NULL
    END AS [industry_code],
    ...
FROM ...
```

- Shorter `CASE` statements can remain like `CASE WHEN [emp_m1] IS NOT NULL THEN 1 ELSE 0 END`.
- Your final `IF`/`ELSE` statement has no worst-case scenario for years before 2010.
- Not to be too pedantic, but `[pct_721]` and `[pct_722]` are technically proportions and not percentages.
- Fixed the comment; the drop table also drops the spatial index.
- Removed `DECLARE @qry NVARCHAR(max)`, it was not needed.
- Made changes where appropriate to start lines with an operator.
- Changed `COALESCE` to `ISNULL`.
- Changed `[jobs]` to `[average_monthly_jobs]`.
- Fixed formatting for longer `CASE` statements.
- Running before 2010 returns the message `'EDD point-level data does not exist'`. This is by design, as we don't want to run for any years prior to 2010.
- Fine as `[pct_721]` and `[pct_722]`.
@GregorSchroeder
Corrected a missing parenthesis in the `ELSE IF @year = 2015` section to do the correct addition and division.
Also tested 2014 and 2016, which return no data. This may not be a big deal since the data is pulled in using `utils.read_sql_query_fallback`, so it will just grab data for the previous year, but it may be worth looking into.
@Eric-Liu-SANDAG is tasked with integrating 2014 into [EMPCORE].
2016 data does not seem to exist, so the fallback will revert to 2015.
To avoid URL changes, download the PDF and add it to a new documentation folder in the repo. Probably add a README.md in the new documentation folder which notes the original source, i.e., the URL.
Dude, GitHub completely ate my comments on this file? I don't see them on my previous request-changes review...
- Need to restructure `employment.py` to match the format of the other modules. For example, see how `pop_type.py:run_pop()` is structured, with explicit inputs, outputs, validation, and insertion.
- The validation part is super, super important. Make sure to run it on both inputs and outputs.
- A lot of the processing can be combined via chained operators to remove a ton of the intermediate variables. For example, in `aggregate_lodes_to_mgra()`, you have the variables `lehd_to_mgra`, `lehd_to_mgra_summed`, and `final_lehd_to_mgra`, which feels excessive.
- As a continuation of the above, not a fan of variable names like `jobs_frame` (why not just `jobs`? we already know it's a `pd.DataFrame`) and `final_etc`.
- Remove self-explanatory comments like `# Add run_id column`.
- Be extremely careful with your usages of `utils.integerize_1d()`. If you look at other locations where it is used, it is nearly always preceded by some kind of sort. This is because a single row or value being different can completely change the output of `utils.integerize_1d()` due to its random nature. I would recommend that you sort values beforehand and, additionally, do two consecutive runs back to back and write a script to ensure that outputs are the same between runs. Note, SQL scripts are not guaranteed to output rows in the same order each time, unless you do an explicit `ORDER BY` and ensure that there are no ties.
- Add some comments at the top similar to other modules. See `pop_type.py` for example.
- Surely there's a better way to cross join in pandas without using that weird `key` thing.
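On the cross-join point above: since pandas 1.2, `merge` supports `how="cross"` directly, so the temporary constant `key` column is no longer needed. A minimal sketch with made-up frames:

```python
import pandas as pd

mgra = pd.DataFrame({"mgra": [1, 2]})
industry = pd.DataFrame({"industry_code": ["721", "722"]})

# Old pattern: add a constant key, merge on it, then drop it
old = (
    mgra.assign(key=1)
    .merge(industry.assign(key=1), on="key")
    .drop(columns="key")
)

# pandas >= 1.2: a true cross join, no helper column needed
new = mgra.merge(industry, how="cross")

assert old.equals(new)  # same 4-row Cartesian product
```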
Describe this pull request. What changes are being made?
Add the employment estimates functionality. This is the base for employment estimates using publicly available data. It starts with LEHD LODES data at the block level by 2-digit NAICS, except that NAICS 72 is split into NAICS 721 and 722, and scales the employment to QCEW county-level controls.
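The NAICS 72 split can be sketched in pandas as below. The frames and column names (`jobs_72`, `pct_721`, `pct_722`) are illustrative assumptions, not the actual columns used in `employment.py` or `get_naics72_split.sql`:

```python
import pandas as pd

# Hypothetical block-level LODES jobs for NAICS 72, plus EDD-derived
# 721/722 proportions per block (names are illustrative)
lodes = pd.DataFrame({"block": ["A", "B"], "jobs_72": [100.0, 50.0]})
splits = pd.DataFrame(
    {"block": ["A", "B"], "pct_721": [0.6, 0.2], "pct_722": [0.4, 0.8]}
)

# Apportion each block's NAICS 72 jobs by the proportions, then
# reshape to one row per (block, industry_code)
df = lodes.merge(splits, on="block")
df["721"] = df["jobs_72"] * df["pct_721"]
df["722"] = df["jobs_72"] * df["pct_722"]
jobs = df.melt(
    id_vars="block",
    value_vars=["721", "722"],
    var_name="industry_code",
    value_name="jobs",
)
```

Since `pct_721` and `pct_722` are proportions that sum to one within a block, total jobs are preserved by the split.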
What issues does this pull request address?
close #185
close #186
close #188
close #189
Additional context
Issue #188 was completed by changing the functionality of `read_sql_query_acs()` to work on more than just ACS data, which resulted in renaming the function to `read_sql_query_custom()`. There is still some functionality to be added in future issues, such as adding missing job categories not covered by LEHD LODES and QCEW.
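The control step described in the PR (scaling employment to QCEW county-level totals, then rounding back to whole jobs) can be sketched as below. This is a generic largest-remainder rounding on deterministically sorted data, not the repo's actual `utils.integerize_1d()` implementation, which the review notes is random and therefore needs a sort beforehand:

```python
import numpy as np
import pandas as pd

def scale_to_control(jobs: pd.Series, control: int) -> pd.Series:
    """Scale job counts so they sum to the county control, then round
    back to integers with largest-remainder rounding. Illustrative
    sketch only; the repo uses utils.integerize_1d() instead."""
    scaled = jobs * control / jobs.sum()
    floors = np.floor(scaled).astype(int)
    remainder = int(control - floors.sum())
    # Deterministic tie handling: stable sort by fractional part, so the
    # same sorted input always yields the same integerized output
    order = (scaled - floors).sort_values(ascending=False, kind="stable").index
    floors.loc[order[:remainder]] += 1
    return floors

mgra_jobs = pd.Series([10.0, 20.0, 30.0], index=["m1", "m2", "m3"])
controlled = scale_to_control(mgra_jobs, 100)
```

Running this twice on identically sorted input gives identical results, which is the reproducibility property the review asks to verify with back-to-back runs.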