
[FEAT] Support Pasted Text as a New Data Source #4

@liujuanjuan1984

Is your feature request related to a problem? Please describe.

Currently, the application supports data ingestion from specific, persistent sources such as URLs or file uploads. However, users often have ad-hoc text snippets (e.g., from an email, a chat, a PDF, or another document) that they want to analyze quickly. The current workflow requires them to save the text to a file first and then upload it, which is an unnecessary and cumbersome step for quick analysis.

Describe the solution you'd like

I propose adding a fourth data source type: "Pasted Text".

  1. New UI Option: Introduce a new option in the data source selection UI, such as a "Paste Text" tab, which provides a large textarea field.
  2. Backend Processing:
    • A new API endpoint will accept the raw text submission.
    • Upon receiving the text, the system will create a corresponding SourceDocument record. This record is needed to maintain data integrity and to give the resulting events a source to be associated with.
    • This new SourceDocument will have a special source_type, such as PASTED_TEXT. Its title could be automatically generated (e.g., "Pasted Text from [Timestamp]") to make it identifiable.
  3. Data Storage Handling:
    • To save storage and protect privacy, the raw, original pasted text should not be stored in the database.
    • The system will process the ephemeral text through the existing event extraction pipeline.
    • The generated structured Event data will be stored and correctly linked to the new SourceDocument record.
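
The backend steps above could be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `InMemoryStore`, `ingest_pasted_text`, the `extract_events` callable, the field names, and the exact title timestamp format are all assumptions made for the example. Only `SourceDocument`, `Event`, `source_type`, and `PASTED_TEXT` come from the proposal.

```python
import itertools
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable, Iterable, List, Optional

PASTED_TEXT = "PASTED_TEXT"  # proposed source_type value


@dataclass
class SourceDocument:
    # Metadata only: there is deliberately no raw_text field.
    id: int
    source_type: str
    title: str


@dataclass
class Event:
    source_document_id: int  # link back to the SourceDocument
    description: str


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for the real database layer."""
    documents: List[SourceDocument] = field(default_factory=list)
    events: List[Event] = field(default_factory=list)
    _ids: Iterable[int] = field(default_factory=lambda: itertools.count(1))


def ingest_pasted_text(
    text: str,
    extract_events: Callable[[str], Iterable[str]],  # assumed pipeline entry point
    store: InMemoryStore,
    now: Optional[datetime] = None,
) -> SourceDocument:
    if not text.strip():
        raise ValueError("pasted text is empty")
    now = now or datetime.now(timezone.utc)
    doc = SourceDocument(
        id=next(store._ids),
        source_type=PASTED_TEXT,
        title=f"Pasted Text from {now:%Y-%m-%d %H:%M:%S} UTC",
    )
    store.documents.append(doc)
    # The text is only passed through the pipeline; it is never stored.
    for description in extract_events(text):
        store.events.append(Event(doc.id, description))
    return doc
```

In this sketch the pasted text exists only as a function argument, so once the request finishes nothing but the `SourceDocument` metadata and the extracted `Event` rows remain.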

This approach allows users to analyze text from any source seamlessly while integrating cleanly into our existing data model.

Describe alternatives you've considered

The only alternative is to continue forcing users to save their text snippets as local files before uploading. This is less user-friendly and acts as a barrier to quick, spontaneous analysis, which is a key use case we should support.

Additional context

This feature requires a change to how SourceDocument records are handled. The system must be able to create a SourceDocument entry that has neither stored raw text nor a permanent URL, so that the entry serves purely as a metadata container and a foreign key anchor for the Event table.
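
One way to picture this requirement is a schema in which the URL column is nullable and no raw-text column exists at all. The sketch below uses SQLite purely for illustration. The table names, column names, and `create_schema` helper are hypothetical, and the project's real schema and database may differ.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    # SQLite only enforces foreign keys when this pragma is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE source_document (
            id          INTEGER PRIMARY KEY,
            source_type TEXT NOT NULL,   -- e.g. 'PASTED_TEXT'
            title       TEXT NOT NULL,
            url         TEXT             -- NULL for pasted text
            -- no raw_text column: pasted content is never stored
        );
        CREATE TABLE event (
            id                 INTEGER PRIMARY KEY,
            source_document_id INTEGER NOT NULL
                               REFERENCES source_document(id),
            description        TEXT NOT NULL
        );
        """
    )
```

With a layout like this, a pasted-text row carries only its type and title, while every event still has to point at an existing source document.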

Labels: enhancement (New feature or request)
