L3Gaunt commented Feb 20, 2025

This PR stores the web pages accessed during research in a downloaded-urls/ subdirectory, so a user can validate the report against its sources and build a knowledge base to consult for more detail. Filenames are derived from the URLs, sanitized with sanitize-url, plus a timestamp recording when each page was accessed. Each file contains the title, description, URL, accessed-at timestamp, and the markdown content returned by firecrawl.
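A minimal sketch of what one saved file could look like, assuming a metadata header followed by the page body. The `CrawledPage` interface, its field names, and `renderPageFile` are hypothetical and only illustrate the layout described above; they are not taken from the PR's code or from firecrawl's API.

```typescript
// Hypothetical shape of one crawled page; the field names are assumptions,
// not firecrawl's actual response type.
interface CrawledPage {
  url: string;
  title: string;
  description: string;
  markdown: string;
}

// Render a page as a markdown file: a small metadata header
// (title, description, URL, accessed-at) followed by the crawled content.
function renderPageFile(page: CrawledPage, accessedAt: Date): string {
  return [
    `# ${page.title}`,
    "",
    `- Description: ${page.description}`,
    `- URL: ${page.url}`,
    `- Accessed at: ${accessedAt.toISOString()}`,
    "",
    page.markdown,
    "",
  ].join("\n");
}
```

Keeping the metadata at the top of the same file means each source stays self-describing when it is opened on its own, without needing a separate index.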

In the near future, I also want to store the log of queries, research, and learnings, so that users can judge the quality of the research process for themselves and give feedback.

L3Gaunt commented Feb 21, 2025

I changed things as follows:

  • The accessed-at date is no longer put into the filenames of downloaded URLs. I think overwriting web pages with newer versions is usually the desired behavior; someone who really wants version tracking should add git to their knowledge base. In edge cases, though, the mapping of URLs to filenames is no longer 1-to-1.
  • The final report now includes the download locations of the files we fetch.
  • The output.md file now contains a timestamp, the initial and follow-up questions, and the final learnings. I want to add intermediate learnings too. I think having the option to supervise and judge the quality of what the tool did during the process is important for quality control, and someone who doesn't want to see it can always just scroll past it.
  • Folder and filename are now joined with path.join (so it should work on Windows now...?)
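To illustrate the filename change and the edge case mentioned in the first bullet, here is a rough sketch. `urlToFilename` is a hypothetical regex-based stand-in for the sanitizer, not the actual sanitize-url call used in the PR, and `downloadPath` is likewise an assumed helper name; only `path.join` is the real Node.js API.

```typescript
import * as path from "node:path";

// Hypothetical stand-in for the sanitizer used in the PR: drop the scheme,
// then collapse every run of characters that is unsafe in a filename to "_".
function urlToFilename(url: string): string {
  const stripped = url.replace(/^[a-z][a-z0-9+.-]*:\/\//i, "");
  const safe = stripped
    .replace(/[^a-zA-Z0-9._-]+/g, "_")
    .replace(/^_+|_+$/g, "");
  return `${safe}.md`;
}

// Join folder and filename with path.join so the platform's separator is used.
function downloadPath(dir: string, url: string): string {
  return path.join(dir, urlToFilename(url));
}

// Without a timestamp in the name, a re-crawl of the same URL overwrites the
// old file, and two distinct URLs can collapse to the same name:
const a = urlToFilename("https://example.com/a/b?x=1");
const b = urlToFilename("https://example.com/a/b/x/1");
console.log(a === b); // both become "example.com_a_b_x_1.md"
```

If such collisions ever matter in practice, one option would be appending a short hash of the full URL to the filename, at the cost of less readable names.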

Feel free to cherry-pick what you like.

L3Gaunt changed the title from "store crawled research results in a folder" to "store crawled research results in a folder, log questions+learnings in output file" on Feb 22, 2025
L3Gaunt changed the title from "store crawled research results in a folder, log questions+learnings in output file" to "store crawled research results in a folder, log research topic/follow-up questions+learnings in output file" on Feb 24, 2025