A serverless application for analyzing public comments from regulations.gov using advanced NLP techniques, clustering, and AI-powered insights.
Customers are responsible for making their own independent assessment of the information in this document.
This document:
(a) is for informational purposes only,
(b) references AWS product offerings and practices, which are subject to change without notice,
(c) does not create any commitments or assurances from AWS and its affiliates, suppliers or licensors. AWS products or services are provided "as is" without warranties, representations, or conditions of any kind, whether express or implied. The responsibilities and liabilities of AWS to its customers are controlled by AWS agreements, and this document is not part of, nor does it modify, any agreement between AWS and its customers, and
(d) is not to be considered a recommendation or viewpoint of AWS.
Additionally, you are solely responsible for testing, securing, and optimizing all code and assets in the GitHub repo, and all such code and assets should be considered:
(a) as-is and without warranties or representations of any kind,
(b) not suitable for production environments, or on production or other critical data, and
(c) to include shortcuts in order to support rapid prototyping such as, but not limited to, relaxed authentication and authorization and a lack of strict adherence to security best practices.
All work produced is open source. More information can be found in the GitHub repo.
The USDA Public Comment Analysis system ingests public comments from regulations.gov, processes them using advanced natural language processing techniques, and generates insightful reports that include topic modeling, sentiment analysis, and detection of AI-generated content. This system assists USDA staff in efficiently analyzing public feedback on proposed regulations.
- Comment Processing: Retrieves and processes comments and attachments from regulations.gov API
- Content Clustering: Groups similar comments using semantic similarity
- Sentiment Analysis: Determines positive, neutral, or negative sentiment by cluster
- Insight Generation: AI-powered summarization and action recommendations
- AI Content Detection: Flags potentially AI-generated comments for review
- Real-time Progress Updates: WebSocket-based progress monitoring
- Interactive UI: React frontend for submission and result visualization
The system follows a serverless architecture on AWS:
- Frontend: React application deployed on AWS Amplify
- API Layer: REST API (API Gateway + Lambda) for submissions and results
- Real-time Updates: WebSocket API for progress notifications
- Processing Pipeline: Step Functions workflow for orchestration
- ML Processing: SageMaker for clustering and Amazon Bedrock for analysis
- Storage: S3 for comment data and DynamoDB for state management
- Infrastructure: Defined with AWS CDK in TypeScript
- AWS account with sufficient permissions
- AWS CLI installed and configured
- Node.js (v14 or higher) and npm
- AWS CDK installed globally
- Regulations.gov API key
- Access to Amazon Bedrock with Claude 3.5 Sonnet model enabled
- GitHub account with access to the repository
- GitHub personal access token with repository permissions
- Docker installed and running before deployment (verify with `docker info`)
- In GitHub: Settings > Developer settings > Personal access tokens
- Generate a new token with `repo` and `admin:repo_hook` scopes
- Store in AWS Secrets Manager as `github-token` by updating the plaintext
- In AWS Console, navigate to Amazon Bedrock
- Go to "Model access" and click "Manage model access"
- Enable "Anthropic Claude 3.5 Sonnet" model
- Create IAM User and Access Keys
  - Open the IAM Console in AWS
  - Create a new IAM user
  - Under Attach Policies Directly, assign the required policies.
  - Go to the Security Credentials tab and generate new Access Keys.
  - Download or securely save the Access Key ID and Secret Access Key for later use.
- Configure AWS CLI:

      aws configure

- Clone the repository:

      git clone https://github.com/ASUCICREPO/public-comment-analysis.git
      cd public-comment-analysis

- Install global and project dependencies:

      npm install -g aws-cdk
      npm install

- Update the GitHub repository and owner [IF REPOSITORY IS FORKED]
  - Navigate to the public-comment-analysis.ts file under the bin/ folder
  - Update the values of the 'owner' and 'repository' parameters for the amplifyStack as shown below:

        const amplifyStack = new AmplifyStack(app, 'AmplifyStack', {
          apiUrl: restApiStack.apiUrl,
          webSocketEndpoint: webSocketStack.webSocketEndpoint,
          owner: 'ASUCICREPO', // UPDATE HERE
          repository: 'public-comment-analysis', // UPDATE HERE
        });

- Change the remote to point to your repository [IF REPOSITORY IS FORKED]

      git remote remove origin
      git remote add origin "YOUR_PERSONAL_GIT_REPO"
      git add .
      git commit -m "Initial Commit"
      git push -u origin main

- Bootstrap your AWS environment:

      cdk bootstrap

- Deploy all stacks:

      cdk deploy --all

- Update the regulations.gov API key
  - Go to AWS Secrets Manager
  - Navigate to `regulations-gov-api-key` and update the plaintext with the correct key
- Note the outputs (API URLs and Amplify application URL)
- Run Job in Amplify
  - Navigate to the Amplify service in the AWS Console.
  - Select the 'USDA-Comment-Analysis' Amplify App from the list
  - Open the 'main deployment'
  - Click 'Run Job'

This should kick off the initial deployment, after which you can start using the application by following the link in the Amplify Service.
- Access the Amplify application URL from deployment outputs
- Enter a document ID from regulations.gov
- Click "Add to Queue" followed by "Generate Insights"
- Monitor real-time progress in the UI
- View analysis results when processing completes
The project consists of the following CDK stacks:
- PublicCommentAnalysisStack: Core document processing pipeline
- WebSocketStack: Real-time communication infrastructure
- RestApiStack: HTTP endpoints for frontend interaction
- ClusteringStack: Comment clustering and analysis pipeline
- ECRStack: Container infrastructure for processing
- AmplifyStack: Frontend application deployment
- TestLambdaStack: End-to-end testing resources
The system employs semantic clustering using:
- SentenceTransformer embeddings (`all-MiniLM-L6-v2` model)
- K-means clustering with silhouette score evaluation
- Text preprocessing and deduplication
- Attachment content extraction and integration
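
To make the silhouette evaluation mentioned above concrete, here is a minimal TypeScript sketch of the metric itself. It is an illustration only, not the pipeline's own code: the function names `euclidean` and `silhouetteScore` are hypothetical, plain number arrays stand in for the sentence embeddings, and returning `null` when the score is undefined is a design choice made for this example.

```typescript
// Euclidean distance between two equal-length vectors.
function euclidean(a: number[], b: number[]): number {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return Math.sqrt(sum);
}

// Mean silhouette score for labeled points (hypothetical helper).
// Returns null when the score is undefined: no points, mismatched
// inputs, or fewer than two clusters.
function silhouetteScore(points: number[][], labels: number[]): number | null {
  const n = points.length;
  if (n === 0 || labels.length !== n) return null;

  // Group point indices by cluster label.
  const clusters: { [label: number]: number[] } = {};
  labels.forEach((label, i) => {
    (clusters[label] = clusters[label] || []).push(i);
  });
  const clusterIds = Object.keys(clusters).map(Number);
  if (clusterIds.length < 2) return null;

  let total = 0;
  for (let i = 0; i < n; i++) {
    const own = clusters[labels[i]];
    if (own.length === 1) continue; // singleton clusters contribute 0

    // a: mean distance to the other members of the point's own cluster.
    let a = 0;
    for (const j of own) {
      if (j !== i) a += euclidean(points[i], points[j]);
    }
    a /= own.length - 1;

    // b: smallest mean distance to the members of any other cluster.
    let b = Infinity;
    for (const id of clusterIds) {
      if (id === labels[i]) continue;
      const members = clusters[id];
      let mean = 0;
      for (const j of members) mean += euclidean(points[i], points[j]);
      b = Math.min(b, mean / members.length);
    }

    const denom = Math.max(a, b);
    total += denom === 0 ? 0 : (b - a) / denom;
  }
  return total / n;
}
```

Scores near 1 indicate tight, well-separated clusters, while scores near 0 or below suggest the chosen number of clusters fits the data poorly. A caller comparing several candidate values of k would keep the one with the highest non-null score.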
Insights are generated using:
- Amazon Bedrock with Claude 3.5 Sonnet
- Structured prompt engineering for consistent JSON output
- Multi-perspective analysis (sentiment, actions, organizations)
- Representative comment extraction
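
Because the insights rely on the model returning consistent JSON, downstream code benefits from validating that output before using it. The following is a hedged sketch of such a check, assuming a response with `sentiment`, `actions`, and `organizations` fields. Those field names, the `ClusterInsight` type, and the `parseClusterInsight` helper are assumptions for illustration; the repository's actual schema may differ.

```typescript
// Hypothetical shape of one cluster's insight; field names are assumptions.
type Sentiment = "positive" | "neutral" | "negative";

interface ClusterInsight {
  sentiment: Sentiment;
  actions: string[];
  organizations: string[];
}

const SENTIMENTS: Sentiment[] = ["positive", "neutral", "negative"];

// Keep only string entries; anything that is not an array becomes [].
function toStringArray(value: unknown): string[] {
  if (!Array.isArray(value)) return [];
  return value.filter((item): item is string => typeof item === "string");
}

// Extract and validate the JSON object embedded in raw model text.
function parseClusterInsight(raw: string): ClusterInsight {
  // Models sometimes wrap JSON in prose, so keep only the outermost object.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end < start) {
    throw new Error("No JSON object found in model output");
  }
  const obj = JSON.parse(raw.slice(start, end + 1)) as { [key: string]: unknown };

  // Accept only the three sentiment values the analysis reports.
  const sentiment = SENTIMENTS.filter((s) => s === obj["sentiment"])[0];
  if (sentiment === undefined) {
    throw new Error("Unexpected sentiment value in model output");
  }

  return {
    sentiment,
    actions: toStringArray(obj["actions"]),
    organizations: toStringArray(obj["organizations"]),
  };
}
```

Throwing on a malformed response lets the caller decide whether to retry the prompt or mark the cluster as unanalyzed, rather than passing partial data to the frontend.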
- REST API: Document submission and status checking
- WebSocket API: Real-time progress notifications
- State Management: Comprehensive processing state tracking
- Error Handling: Graceful degradation and informative errors
public-comment-analysis/
├── bin/ # CDK application entry point
├── lib/ # CDK stack definitions
├── lambda/ # Lambda function code
│ ├── initializer/ # Document validation
│ ├── processor/ # Comment processing
│ ├── clustering-analyzer/ # Analysis generation
│ └── ...
├── docker/ # Container definitions
├── frontend/ # React frontend application
└── scripts/ # Utility scripts
The latest updates focus on improving pipeline reliability, frontend accuracy, and workflow efficiency across the clustering and analysis system.
- Fixed SageMaker clustering failures by granting full read/write S3 permissions:
  - `s3:GetObject`, `s3:PutObject`, `s3:ListBucket`, `s3:DeleteObject`, `s3:GetObjectVersion`
- Ensured SageMaker can read and write to the clustering bucket, preventing silent failures.
- Updated `clustering-stack.ts` and `public-comment-analysis.ts` to correct insufficient permissions.
- Improved cluster size handling for small datasets:
  - ≤ 2 texts → 1 cluster
  - ≤ 5 texts → max of 2 clusters
- Enhanced silhouette score calculation with error handling.
- Added dynamic cluster adjustment based on dataset size.
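
The small-dataset rules above can be sketched as a single capping function. This TypeScript version is illustrative only: `capClusterCount` is a hypothetical name, and the behavior for more than five texts (keep the requested count, but stay below the number of texts) is an assumption, since only the two small-dataset thresholds are stated here.

```typescript
// Hypothetical helper: limits the cluster count for small datasets.
function capClusterCount(textCount: number, requested: number): number {
  if (textCount < 1) {
    throw new RangeError("textCount must be at least 1");
  }
  // Two or fewer texts cannot be split meaningfully: use one cluster.
  if (textCount <= 2) return 1;

  const wanted = Math.max(1, Math.floor(requested));

  // Up to five texts: allow at most two clusters.
  if (textCount <= 5) return Math.min(wanted, 2);

  // Assumption: larger datasets keep the requested count,
  // but never reach one cluster per text.
  return Math.min(wanted, textCount - 1);
}
```

For example, `capClusterCount(4, 8)` yields 2, while `capClusterCount(100, 8)` leaves the requested 8 unchanged.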
- Preserved existing state during progress updates to prevent overwrites.
- Added a “completed” stage and ensured accurate 100% progress display.
- Prevented race conditions that could mark completed tasks as incomplete.
- Removed duplicate status updates (70+ redundant lines deleted).
- Streamlined completion handling in the clustering pipeline.
- Eliminated unnecessary WebSocket notifications to reduce system overhead.
- Corrected total comments display in `Body.jsx`.
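
The state-handling fixes listed above (preserving existing state, forcing 100% at the "completed" stage, and not reopening completed tasks) can be expressed as one merge rule. The sketch below is an assumption about how such a rule might look, not the project's code: the `ProgressState` fields and the `mergeProgress` name are hypothetical, and only the "completed" stage name comes from the list above.

```typescript
// Hypothetical progress record; only the "completed" stage name is from the text.
interface ProgressState {
  stage: string;
  progress: number; // percent, 0 to 100
  totalComments?: number;
}

// Merge an incoming partial update into the stored state.
function mergeProgress(
  existing: ProgressState,
  update: Partial<ProgressState>
): ProgressState {
  // A late or out-of-order update must not reopen a finished job.
  if (existing.stage === "completed") return existing;

  // Start from the stored record so fields absent from the update survive.
  const merged: ProgressState = { ...existing, ...update };

  // The completed stage always reports exactly 100%.
  if (merged.stage === "completed") {
    merged.progress = 100;
  }
  return merged;
}
```

In a deployed system the same guard would typically also be enforced at the storage layer, for instance with a conditional write, so that concurrent writers cannot bypass it.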
A test Lambda function provides end-to-end testing:

    aws lambda invoke --function-name PublicCommentAnalysis-TestFunction response.json

- Deployment Issues: Check CloudFormation events and logs
- API Errors: Review Lambda logs and API Gateway CloudWatch logs
- Processing Failures: Examine Step Functions execution history
- WebSocket Problems: Check WebSocket handler logs for connection issues
