USDA Public Comment Analysis System

A serverless application for analyzing public comments from regulations.gov using advanced NLP techniques, clustering, and AI-powered insights.

Disclaimers

Customers are responsible for making their own independent assessment of the information in this document.

This document:

(a) is for informational purposes only,

(b) references AWS product offerings and practices, which are subject to change without notice,

(c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided "as is" without warranties, representations, or conditions of any kind, whether express or implied. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers, and

(d) is not to be considered a recommendation or viewpoint of AWS.

Additionally, you are solely responsible for testing, securing, and optimizing all code and assets in the GitHub repo, and all such code and assets should be considered:

(a) as-is and without warranties or representations of any kind,

(b) not suitable for production environments, or on production or other critical data, and

(c) to include shortcuts in order to support rapid prototyping such as, but not limited to, relaxed authentication and authorization and a lack of strict adherence to security best practices.

All work produced is open source. More information can be found in the GitHub repo.

Project Overview

The USDA Public Comment Analysis system ingests public comments from regulations.gov, processes them using advanced natural language processing techniques, and generates insightful reports that include topic modeling, sentiment analysis, and detection of AI-generated content. This system assists USDA staff in efficiently analyzing public feedback on proposed regulations.

Key Features

Comment Processing: Retrieves and processes comments and attachments from regulations.gov API
Content Clustering: Groups similar comments using semantic similarity
Sentiment Analysis: Determines positive, neutral, or negative sentiment by cluster
Insight Generation: AI-powered summarization and action recommendations
AI Content Detection: Flags potentially AI-generated comments for review
Real-time Progress Updates: WebSocket-based progress monitoring
Interactive UI: React frontend for submission and result visualization
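
As a rough illustration of the comment retrieval step, the sketch below builds the URL for one page of comments. It assumes the public regulations.gov v4 API, where comments are filtered by the objectId of the document they were submitted on; the helper name commentsPageUrl and the default page size are illustrative and are not taken from this repository.

```typescript
// Hypothetical helper, not part of this repository.
// Assumes the regulations.gov v4 API; verify parameters against the official API docs.
const REGULATIONS_BASE_URL = "https://api.regulations.gov/v4";

function commentsPageUrl(objectId: string, pageNumber: number, pageSize: number = 250): string {
  // Page numbers are 1-based in this sketch.
  if (Math.floor(pageNumber) !== pageNumber || pageNumber < 1) {
    throw new RangeError("pageNumber must be a positive integer");
  }
  const params: Array<[string, string]> = [
    ["filter[commentOnId]", objectId], // objectId of the document, assumed filter name
    ["page[size]", String(pageSize)],
    ["page[number]", String(pageNumber)],
  ];
  // Percent-encode keys and values so the brackets survive any HTTP client.
  const query = params
    .map(([key, value]) => encodeURIComponent(key) + "=" + encodeURIComponent(value))
    .join("&");
  return REGULATIONS_BASE_URL + "/comments?" + query;
}
```

A caller would request each page in turn and would typically send the API key in an X-Api-Key header rather than in the URL, so that the key stays out of logs.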

Architecture

The system follows a serverless architecture on AWS:

Frontend: React application deployed on AWS Amplify
API Layer: REST API (API Gateway + Lambda) for submissions and results
Real-time Updates: WebSocket API for progress notifications
Processing Pipeline: Step Functions workflow for orchestration
ML Processing: SageMaker for clustering and Amazon Bedrock for analysis
Storage: S3 for comment data and DynamoDB for state management
Infrastructure: Defined with AWS CDK in TypeScript

Prerequisites

AWS account with sufficient permissions
AWS CLI installed and configured
Node.js (v14 or higher) and npm
AWS CDK installed globally
Regulations.gov API key
Access to Amazon Bedrock with Claude 3.5 Sonnet model enabled
GitHub account with access to the repository
GitHub personal access token with repository permissions
Docker installed, with the Docker daemon running before deployment (docker info should succeed)

Deployment

Preparing Secrets

Creating GitHub Access Token

In GitHub: Settings > Developer settings > Personal access tokens
Generate a new token with repo and admin:repo_hook scopes
Store the token in AWS Secrets Manager as a plaintext secret named github-token

Enabling Amazon Bedrock Model Access

In AWS Console, navigate to Amazon Bedrock
Go to "Model access" and click "Manage model access"
Enable "Anthropic Claude 3.5 Sonnet" model

Deployment Steps

Create IAM User and Access Keys

Open the IAM Console in AWS
Create a new IAM user
Under Attach Policies Directly, assign the required policies.
Go to the Security Credentials tab and generate new Access Keys.
Download or securely save the Access Key ID and Secret Access Key for later use.

Configure AWS CLI:
```
aws configure
```

Clone the repository:

```
git clone https://github.com/ASUCICREPO/public-comment-analysis.git
cd public-comment-analysis
```

Install global and project dependencies:
```
npm install -g aws-cdk
npm install
```

FOLLOW IF REPOSITORY IS FORKED

Update GitHub repository and owner [IF REPOSITORY IS FORKED]

Navigate to the public-comment-analysis.ts file under the bin/ folder
Update the values in the 'owner' and 'repository' parameters for the amplifyStack as seen below:

```
const amplifyStack = new AmplifyStack(app, 'AmplifyStack', {
  apiUrl: restApiStack.apiUrl,
  webSocketEndpoint: webSocketStack.webSocketEndpoint,
  owner: 'ASUCICREPO',                   // <- UPDATE HERE
  repository: 'public-comment-analysis', // <- UPDATE HERE
});
```

Changing remote access to your repository [IF REPOSITORY IS FORKED]

```
git remote remove origin
git remote add origin "YOUR_PERSONAL_GIT_REPO"
git add .
git commit -m "Initial Commit"
git push -u origin HEAD
```

Bootstrap your AWS environment:
```
cdk bootstrap
```

Deploy all stacks:
```
cdk deploy --all
```

Update the regulations.gov API key:

Go to AWS Secrets Manager
Navigate to regulations-gov-api-key and update the plaintext with the correct key

Note the outputs (API URLs and Amplify application URL).

Run Job in Amplify

Navigate to the Amplify service in the AWS Console.
Select the 'USDA-Comment-Analysis' Amplify App from the list
Open the 'main' deployment
Click 'Run Job'

This starts the initial frontend build and deployment. Once it finishes, you can open the application using the link shown in the Amplify console.

Usage

Access the Amplify application URL from deployment outputs
Enter a document ID from regulations.gov
Click "Add to Queue" followed by "Generate Insights"
Monitor real-time progress in the UI
View analysis results when processing completes

Stack Architecture

The project consists of the following CDK stacks:

PublicCommentAnalysisStack: Core document processing pipeline
WebSocketStack: Real-time communication infrastructure
RestApiStack: HTTP endpoints for frontend interaction
ClusteringStack: Comment clustering and analysis pipeline
ECRStack: Container infrastructure for processing
AmplifyStack: Frontend application deployment
TestLambdaStack: End-to-end testing resources

Technical Implementation

Clustering Algorithm

The system employs semantic clustering using:

SentenceTransformer embeddings (all-MiniLM-L6-v2 model)
K-means clustering with silhouette score evaluation
Text preprocessing and deduplication
Attachment content extraction and integration
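
To make the silhouette evaluation concrete, here is a minimal sketch of the mean silhouette coefficient for a given clustering. It is written in plain TypeScript with Euclidean distance for illustration only; the project's SageMaker job most likely relies on a library implementation, and the function names below are hypothetical. A pipeline of this kind would compute the score for each candidate number of clusters and keep the one with the highest value.

```typescript
type Vector = number[];

// Euclidean distance between two embeddings of equal length.
function euclidean(a: Vector, b: Vector): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Mean silhouette coefficient in [-1, 1]; labels[i] is the cluster of points[i].
function silhouetteScore(points: Vector[], labels: number[]): number {
  if (points.length !== labels.length) {
    throw new Error("points and labels must have the same length");
  }
  const clusterIds = labels.filter((label, i) => labels.indexOf(label) === i);
  if (clusterIds.length < 2) {
    throw new Error("silhouette needs at least two clusters");
  }
  let total = 0;
  for (let i = 0; i < points.length; i++) {
    let a = 0;        // mean distance to the other members of the own cluster
    let b = Infinity; // smallest mean distance to any other cluster
    let ownOthers = 0;
    for (const c of clusterIds) {
      let sum = 0;
      let count = 0;
      for (let j = 0; j < points.length; j++) {
        if (j === i || labels[j] !== c) continue;
        sum += euclidean(points[i], points[j]);
        count++;
      }
      if (c === labels[i]) {
        ownOthers = count;
        a = count > 0 ? sum / count : 0;
      } else if (count > 0) {
        b = Math.min(b, sum / count);
      }
    }
    const denom = Math.max(a, b);
    // Singleton clusters (and fully coincident points) score 0 by convention.
    total += ownOthers === 0 || denom === 0 ? 0 : (b - a) / denom;
  }
  return total / points.length;
}
```

Scores near 1 indicate compact, well-separated clusters; scores near 0 or below suggest that the chosen number of clusters does not fit the data.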

Analysis Generation

Insights are generated using:

Amazon Bedrock with Claude 3.5 Sonnet
Structured prompt engineering for consistent JSON output
Multi-perspective analysis (sentiment, actions, organizations)
Representative comment extraction
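
The structured-prompt approach can be sketched as a prompt builder paired with a strict parser for the model's reply. This is an illustration under assumptions: the ClusterInsight fields, the prompt wording, and both function names are hypothetical and do not reflect the project's actual schema, and the call to Amazon Bedrock itself is left out.

```typescript
// Hypothetical insight shape; field names are illustrative only.
interface ClusterInsight {
  sentiment: "positive" | "neutral" | "negative";
  summary: string;
  recommendedActions: string[];
}

// Builds a prompt that asks for exactly one JSON object.
function buildPrompt(comments: string[]): string {
  const numbered = comments.map((c, i) => (i + 1) + ". " + c).join("\n");
  return [
    "Analyze the following public comments from one cluster.",
    'Respond with a single JSON object with the keys "sentiment"',
    '("positive", "neutral", or "negative"), "summary" (string), and',
    '"recommendedActions" (array of strings). Do not add any other text.',
    "",
    numbered,
  ].join("\n");
}

// Extracts and validates the JSON object from the raw model output.
function parseInsight(raw: string): ClusterInsight {
  // Tolerate stray text around the JSON by taking the outermost braces.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end <= start) {
    throw new Error("no JSON object found in model output");
  }
  const data = JSON.parse(raw.slice(start, end + 1));
  if (["positive", "neutral", "negative"].indexOf(data.sentiment) === -1) {
    throw new Error("invalid sentiment");
  }
  if (typeof data.summary !== "string") {
    throw new Error("summary must be a string");
  }
  if (!Array.isArray(data.recommendedActions) ||
      data.recommendedActions.some((a: unknown) => typeof a !== "string")) {
    throw new Error("recommendedActions must be an array of strings");
  }
  return {
    sentiment: data.sentiment,
    summary: data.summary,
    recommendedActions: data.recommendedActions,
  };
}
```

Validating the reply before storing it means a malformed response fails loudly at the analysis step instead of surfacing later as a broken report in the frontend.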

API Implementation

REST API: Document submission and status checking
WebSocket API: Real-time progress notifications
State Management: Comprehensive processing state tracking
Error Handling: Graceful degradation and informative errors

Development

Project Structure

public-comment-analysis/
├── bin/                    # CDK application entry point
├── lib/                    # CDK stack definitions
├── lambda/                 # Lambda function code
│   ├── initializer/        # Document validation
│   ├── processor/          # Comment processing
│   ├── clustering-analyzer/# Analysis generation
│   └── ...
├── docker/                 # Container definitions
├── frontend/               # React frontend application
└── scripts/                # Utility scripts

Recent Improvements

The latest updates focus on improving pipeline reliability, frontend accuracy, and workflow efficiency across the clustering and analysis system.

Pipeline Fixes

Fixed SageMaker clustering failures by granting full read/write S3 permissions:
- s3:GetObject, s3:PutObject, s3:ListBucket
- s3:DeleteObject, s3:GetObjectVersion
Ensured SageMaker can read and write to the clustering bucket, preventing silent failures.
Updated clustering-stack.ts and public-comment-analysis.ts to correct insufficient permissions.
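
For reference, the permissions listed above correspond to an IAM policy document along the following lines. This is a sketch, not the policy from clustering-stack.ts: the function name and the bucket name passed to it are hypothetical, and the ARNs assume the standard aws partition. Note that s3:ListBucket applies to the bucket itself, while the object-level actions apply to the keys inside it.

```typescript
interface PolicyStatement {
  Effect: "Allow";
  Action: string[];
  Resource: string[];
}

interface PolicyDocument {
  Version: "2012-10-17";
  Statement: PolicyStatement[];
}

// Hypothetical helper: builds a read/write policy for one S3 bucket.
function clusteringBucketPolicy(bucketName: string): PolicyDocument {
  // Assumes the standard "aws" partition; other partitions use a different ARN prefix.
  const bucketArn = "arn:aws:s3:::" + bucketName;
  return {
    Version: "2012-10-17",
    Statement: [
      // Bucket-level action: listing keys.
      { Effect: "Allow", Action: ["s3:ListBucket"], Resource: [bucketArn] },
      // Object-level actions: reading, writing, and deleting keys.
      {
        Effect: "Allow",
        Action: ["s3:GetObject", "s3:GetObjectVersion", "s3:PutObject", "s3:DeleteObject"],
        Resource: [bucketArn + "/*"],
      },
    ],
  };
}
```

In a CDK stack the same effect is usually achieved with a grant method on the bucket construct, such as grantReadWrite on the SageMaker execution role, instead of a hand-written document.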

Clustering Logic Enhancements

Improved cluster size handling for small datasets:
- ≤ 2 texts → 1 cluster
- ≤ 5 texts → max of 2 clusters
Enhanced silhouette score calculation with error handling.
Added dynamic cluster adjustment based on dataset size.
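
The small-dataset rule can be expressed as a short function. The two thresholds come from the list above; the behavior for larger datasets (a default cap of 10 and at most one fewer cluster than texts) is an assumption made for this sketch, as is the function name.

```typescript
// Upper bound on the number of clusters to try for a dataset of textCount texts.
// defaultMax is a hypothetical cap for larger datasets, not a value from the project.
function maxClustersFor(textCount: number, defaultMax: number = 10): number {
  if (textCount < 1) {
    throw new RangeError("textCount must be at least 1");
  }
  if (textCount <= 2) return 1; // too few texts to split meaningfully
  if (textCount <= 5) return 2; // small dataset: at most two clusters
  // Assumed rule: silhouette scoring needs fewer clusters than texts.
  return Math.min(defaultMax, textCount - 1);
}
```

When the function returns 1, silhouette scoring can be skipped entirely, since the score is undefined for a single cluster.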

State Management and Progress Tracking

Preserved existing state during progress updates to prevent overwrites.
Added a “completed” stage and ensured accurate 100% progress display.
Prevented race conditions that could mark completed tasks as incomplete.
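
One way to get these guarantees is to merge every progress update into the stored state rather than replacing it, and to treat the completed stage as final. The sketch below illustrates that idea only; the stage names other than "completed", the DocumentState fields, and the function name are assumptions and do not mirror the project's DynamoDB schema.

```typescript
// Hypothetical stage names; only "completed" is named in this document.
type Stage = "queued" | "processing" | "clustering" | "analyzing" | "completed";

interface DocumentState {
  documentId: string;
  stage: Stage;
  progress: number; // percentage, 0 to 100
  totalComments?: number;
}

type StateUpdate = Partial<Omit<DocumentState, "documentId">>;

// Merges an update into the existing state without dropping untouched fields.
function applyUpdate(existing: DocumentState, update: StateUpdate): DocumentState {
  const merged: DocumentState = {
    documentId: existing.documentId,
    stage: update.stage !== undefined ? update.stage : existing.stage,
    progress: update.progress !== undefined ? update.progress : existing.progress,
    totalComments:
      update.totalComments !== undefined ? update.totalComments : existing.totalComments,
  };
  // A completed document stays completed, even if a late update arrives out of order.
  if (existing.stage === "completed") {
    merged.stage = "completed";
  }
  // Completion always reports 100% progress.
  if (merged.stage === "completed") {
    merged.progress = 100;
  }
  return merged;
}
```

With a shared store such as DynamoDB, the same rule would normally be enforced in the write itself (for example with a conditional update), because two concurrent readers can otherwise both see the pre-completion state.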

Workflow & WebSocket Optimization

Removed duplicate status updates (70+ redundant lines deleted).
Streamlined completion handling in the clustering pipeline.
Eliminated unnecessary WebSocket notifications to reduce system overhead.

Frontend Fixes (Minor)

Corrected total comments display in Body.jsx

Testing

A test Lambda function provides end-to-end testing:

```
aws lambda invoke --function-name PublicCommentAnalysis-TestFunction response.json
```

Troubleshooting

Deployment Issues: Check CloudFormation events and logs
API Errors: Review Lambda logs and API Gateway CloudWatch logs
Processing Failures: Examine Step Functions execution history
WebSocket Problems: Check WebSocket handler logs for connection issues
