Description
Is your feature request related to a problem? Please describe.
Currently, the application supports data ingestion from specific, persistent sources like URLs or file uploads. However, users often have ad-hoc text snippets (e.g., from an email, a chat, a PDF, or other documents) that they want to analyze quickly. The current workflow requires them to save this text into a file first and then upload it, which is an unnecessary and cumbersome step for quick analysis.
Describe the solution you'd like
I propose adding a fourth data source type: "Pasted Text".
- New UI Option: Introduce a new option in the data source selection UI, such as a "Paste Text" tab, which provides a large textarea field.
- Backend Processing:
  - A new API endpoint will accept the raw text submission.
  - Upon receiving the text, the system will create a corresponding `SourceDocument` record. This record is crucial for maintaining data integrity and associating the resulting events.
  - This new `SourceDocument` will have a special `source_type`, such as `PASTED_TEXT`. Its title could be automatically generated (e.g., "Pasted Text from [Timestamp]") to make it identifiable.
- Data Storage Handling:
  - To save storage and for privacy, the raw, original pasted text should not be stored in the database.
  - The system will process the ephemeral text through the existing event extraction pipeline.
  - The generated structured `Event` data will be stored and correctly linked to the new `SourceDocument` record.
This approach allows users to analyze text from any source seamlessly while integrating cleanly into our existing data model.
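To make the proposed flow concrete, here is a minimal in-memory sketch in Python. Everything in it is an assumption for illustration: the `ingest_pasted_text` function, the dataclass fields, and the `naive_extractor` stand-in are hypothetical and do not reflect the project's actual models or extraction pipeline. It only shows the intended order of operations: create the `SourceDocument`, run extraction on the ephemeral text, link the events, and discard the text.

```python
import itertools
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Callable, List, Optional, Tuple

PASTED_TEXT = "PASTED_TEXT"  # proposed source_type value


@dataclass
class SourceDocument:
    # Hypothetical shape: note there is deliberately no raw_text field here.
    id: int
    source_type: str
    title: str


@dataclass
class Event:
    # Hypothetical shape: each event points back at its source document.
    source_document_id: int
    description: str


_ids = itertools.count(1)  # stand-in for database-generated primary keys


def ingest_pasted_text(
    text: str,
    extract_events: Callable[[str], List[str]],
    now: Optional[datetime] = None,
) -> Tuple[SourceDocument, List[Event]]:
    """Create a metadata-only SourceDocument and link extracted events to it."""
    if not text.strip():
        raise ValueError("pasted text is empty")
    now = now or datetime.now(timezone.utc)
    doc = SourceDocument(
        id=next(_ids),
        source_type=PASTED_TEXT,
        title=f"Pasted Text from {now.isoformat(timespec='seconds')}",
    )
    # The text is only passed to the extractor; it is never attached to doc.
    events = [
        Event(source_document_id=doc.id, description=d)
        for d in extract_events(text)
    ]
    return doc, events


def naive_extractor(text: str) -> List[str]:
    # Placeholder for the existing extraction pipeline: one event per line.
    return [line.strip() for line in text.splitlines() if line.strip()]
```

A call such as `ingest_pasted_text(snippet, naive_extractor)` returns the document and its events. In a real implementation the extractor argument would be replaced by the existing pipeline, and both return values would be persisted in one transaction.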
Describe alternatives you've considered
The only alternative is to continue forcing users to save their text snippets as local files before uploading. This is less user-friendly and acts as a barrier to quick, spontaneous analysis, which is a key use case we should support.
Additional context
This feature requires a modification to how SourceDocument records are handled. The system must be able to create a SourceDocument entry that has neither stored raw text nor a permanent URL, serving purely as a metadata container and a foreign key anchor for the Event table.
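One possible shape for that data model change, sketched with Python's built-in `sqlite3` module. The table and column names (`source_document`, `event`, `url`, `raw_text`, `description`) are assumptions chosen for illustration, not the project's actual schema. The point is only that `url` and `raw_text` are nullable while the foreign key from `event` remains enforced.

```python
import sqlite3

# Hypothetical schema: url and raw_text are nullable so that a
# PASTED_TEXT document can exist as a metadata-only record.
SCHEMA = """
CREATE TABLE source_document (
    id          INTEGER PRIMARY KEY,
    source_type TEXT NOT NULL,
    title       TEXT NOT NULL,
    url         TEXT,
    raw_text    TEXT
);
CREATE TABLE event (
    id                 INTEGER PRIMARY KEY,
    source_document_id INTEGER NOT NULL REFERENCES source_document(id),
    description        TEXT NOT NULL
);
"""


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database holding one pasted-text document and one event."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
    conn.executescript(SCHEMA)
    cur = conn.execute(
        "INSERT INTO source_document (source_type, title) VALUES (?, ?)",
        ("PASTED_TEXT", "Pasted Text from 2024-01-02T03:04:05"),
    )
    # The event is anchored to the metadata-only document via its primary key.
    conn.execute(
        "INSERT INTO event (source_document_id, description) VALUES (?, ?)",
        (cur.lastrowid, "example event"),
    )
    conn.commit()
    return conn
```

With this layout, existing URL and file sources can keep populating `url` or `raw_text`, while pasted-text documents leave both as `NULL` and events still cannot be inserted without a valid parent document.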