Use a single dataset per data provider / frontend app #1393

@Laurin-W

Issue
I'm about to refactor the data provider to use ldo in general.

This comes with improvements such as

  • typed objects
  • link-traversal (auto-dereferencing required resources)
  • the possibility to keep track of changes per object using transactions
  • translating changes to diffs or SPARQL queries
  • storing all data in RDF.js datasets, which hold data as quads
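
To make the "translating changes to diffs or SPARQL queries" point concrete, here is a minimal sketch of turning a set of removed and added triples into a SPARQL Update string. It does not use ldo's API; `Triple`, `term` and `toSparqlUpdate` are hypothetical names made up for illustration, and terms are simplified to IRIs and plain string literals.

```typescript
// Hypothetical, simplified triple: subject and predicate are IRIs,
// the object is either an IRI or a plain string literal.
interface Triple {
  subject: string;
  predicate: string;
  object: string;
  objectIsIri: boolean;
}

// Serialize an object term, escaping backslashes, quotes and newlines in literals.
function term(value: string, isIri: boolean): string {
  if (isIri) return `<${value}>`;
  const escaped = value
    .replace(/\\/g, "\\\\")
    .replace(/"/g, '\\"')
    .replace(/\n/g, "\\n");
  return `"${escaped}"`;
}

// Build a DELETE DATA / INSERT DATA request scoped to one named graph.
function toSparqlUpdate(graph: string, removed: Triple[], added: Triple[]): string {
  const block = (ts: Triple[]): string =>
    ts
      .map(t => `    <${t.subject}> <${t.predicate}> ${term(t.object, t.objectIsIri)} .`)
      .join("\n");
  const parts: string[] = [];
  if (removed.length > 0) {
    parts.push(`DELETE DATA {\n  GRAPH <${graph}> {\n${block(removed)}\n  }\n}`);
  }
  if (added.length > 0) {
    parts.push(`INSERT DATA {\n  GRAPH <${graph}> {\n${block(added)}\n  }\n}`);
  }
  return parts.join(" ;\n");
}

// Usage: rename a (made-up) resource.
const alice = "https://example.org/alice";
const name = "http://xmlns.com/foaf/0.1/name";
console.log(
  toSparqlUpdate(
    alice,
    [{ subject: alice, predicate: name, object: "Alice", objectIsIri: false }],
    [{ subject: alice, predicate: name, object: "Alice B.", objectIsIri: false }],
  ),
);
```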

What ldo doesn't implement (yet), but which we can add on top:

  • Schema Validation
  • Caching
  • Pagination

My question
Can we store all data in one RDF dataset or should we create one dataset per resource?

In practice, a resource URI translates to the graph URI, so storing everything in one RDF dataset should be fine in the vast majority of cases.
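
As a rough illustration of that idea, the sketch below keeps every quad in one store and uses the graph component as the resource URI, so a re-fetched resource can be replaced as a unit while queries still run across all resources. `Quad`, `Triple` and `SingleDataset` are hypothetical, string-based stand-ins; a real implementation would use an RDF.js dataset and RDF.js terms.

```typescript
// Simplified stand-in for an RDF.js quad (terms reduced to strings).
interface Triple {
  subject: string;
  predicate: string;
  object: string;
}
interface Quad extends Triple {
  graph: string; // assumption: graph URI === URI of the resource the quad came from
}

class SingleDataset {
  private quads: Quad[] = [];

  // Return all quads whose fields equal the fields given in the pattern.
  match(pattern: Partial<Quad>): Quad[] {
    const keys = Object.keys(pattern) as (keyof Quad)[];
    return this.quads.filter(q =>
      keys.every(k => pattern[k] === undefined || q[k] === pattern[k]),
    );
  }

  // Add a quad unless an identical one is already stored.
  add(q: Quad): void {
    if (this.match(q).length === 0) this.quads.push(q);
  }

  deleteGraph(graph: string): void {
    this.quads = this.quads.filter(q => q.graph !== graph);
  }

  // Replace everything previously stored for one resource (= one graph).
  replaceResource(resourceUri: string, triples: Triple[]): void {
    this.deleteGraph(resourceUri);
    for (const t of triples) this.add({ ...t, graph: resourceUri });
  }
}

// Usage with made-up resources: one store, two graphs.
const foafName = "http://xmlns.com/foaf/0.1/name";
const store = new SingleDataset();
store.replaceResource("https://example.org/alice", [
  { subject: "https://example.org/alice", predicate: foafName, object: "Alice" },
]);
store.replaceResource("https://example.org/bob", [
  { subject: "https://example.org/bob", predicate: foafName, object: "Bob" },
]);
// Cross-resource query: every name, regardless of which resource it came from.
console.log(store.match({ predicate: foafName }).map(q => q.object));
```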

Approach to using a single dataset
The data provider needs to be refactored so that it fetches a resource only if it is not cached already or if the cached copy is stale. This requires a slightly different caching approach.
Filtering will also need to be addressed.
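
The "fetch only if not cached or stale" rule could look roughly like the sketch below. It tracks a fetch timestamp per resource URI and is independent of how the quads themselves are stored. `ResourceCache`, `ensureLoaded` and the `maxAgeMs` parameter are hypothetical names, and a fixed maximum age is only one possible staleness criterion.

```typescript
// Tracks when each resource was last fetched into the shared dataset.
class ResourceCache {
  private fetchedAt: Record<string, number | undefined> = {};
  private maxAgeMs: number;
  private now: () => number;

  // `now` is injectable so that staleness can be tested with a fake clock.
  constructor(maxAgeMs: number, now: () => number = () => Date.now()) {
    this.maxAgeMs = maxAgeMs;
    this.now = now;
  }

  // Fresh = fetched before, and less than maxAgeMs ago.
  isFresh(uri: string): boolean {
    const t = this.fetchedAt[uri];
    return t !== undefined && this.now() - t < this.maxAgeMs;
  }

  markFetched(uri: string): void {
    this.fetchedAt[uri] = this.now();
  }

  // Force a refetch on next access, e.g. after a local write.
  invalidate(uri: string): void {
    delete this.fetchedAt[uri];
  }
}

// Runs `load` only when the resource is missing or stale.
// Resolves to true if a fetch happened, false if the cached data was used.
async function ensureLoaded(
  uri: string,
  cache: ResourceCache,
  load: (uri: string) => Promise<void>,
): Promise<boolean> {
  if (cache.isFresh(uri)) return false;
  await load(uri);
  cache.markFetched(uri);
  return true;
}
```

In this sketch, `load` would be whatever function fetches the resource and writes its quads into the single dataset.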

In either case, we need to add/refactor pagination and schema validation.

Possible Issues
Where do you see issues with using one dataset for all frontend data?

  • We might run into issues when the graph URI does not translate to the resource URI: JSON-LD documents can contain named graphs that do not belong to the fetched resource. E.g. I fetch https://foo.bar, which returns a JSON-LD resource containing a graph with ID https://graph.id. Should the data from that graph be stored in the general dataset? What happens if I then fetch https://graph.id and the data is contradictory?
  • Refactoring filtering / queries could be challenging. We might have to come up with a strategy for deciding whether to fetch LDP resources or use a SPARQL query (if SPARQL is available).
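
For the first point, one possible policy (an assumption, not something decided here) is to record, next to each quad, the document it was actually fetched from, and to prefer first-hand data once the graph's own URL has been fetched. `StoredQuad`, `ingest` and `authoritative` are hypothetical names, the empty string stands in for the default graph, and terms are simplified to strings.

```typescript
interface Quad {
  subject: string;
  predicate: string;
  object: string;
  graph: string; // "" stands for the default graph in this sketch
}

interface StoredQuad extends Quad {
  source: string; // URL of the document the quad was actually fetched from
}

// Tag every quad of a fetched document with its source; default-graph quads
// are attributed to the document's own URL.
function ingest(documentUrl: string, quads: Quad[]): StoredQuad[] {
  return quads.map(q => ({
    ...q,
    graph: q.graph === "" ? documentUrl : q.graph,
    source: documentUrl,
  }));
}

// Quads for a graph: first-hand data (fetched from the graph's own URL) wins;
// otherwise fall back to what other documents claimed about that graph.
function authoritative(stored: StoredQuad[], graph: string): StoredQuad[] {
  const inGraph = stored.filter(q => q.graph === graph);
  const firstHand = inGraph.filter(q => q.source === graph);
  return firstHand.length > 0 ? firstHand : inGraph;
}

// Usage, following the example above: https://foo.bar embeds a graph
// https://graph.id, which is later fetched directly with different data.
const title = "http://purl.org/dc/terms/title";
let all: StoredQuad[] = ingest("https://foo.bar", [
  { subject: "https://graph.id", predicate: title, object: "Claimed title", graph: "https://graph.id" },
]);
all = all.concat(
  ingest("https://graph.id", [
    { subject: "https://graph.id", predicate: title, object: "Actual title", graph: "" },
  ]),
);
console.log(authoritative(all, "https://graph.id").map(q => q.object));
```

Both versions stay in the single dataset, so the contradiction remains inspectable; only the read path decides which one to trust.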

Benefits
Currently, there is a lot of redundancy in the data fetched, so a single dataset would probably bring performance gains and reduce server load.

I think that the single-dataset approach brings us closer to a unified framework which supports both the Solid and the NextGraph SPARQL-based world.

Also, I suppose it reduces complexity because, in almost every case, you don't have to think about which dataset the data is stored in.

What next?
I would like to try using one dataset only with the use(typed)Collection implementation first and see how it goes.


@srosset81 do you have an opinion here?
