[Feature] Naming of the domains by an LLM #43

@quentinblampey

Description

Currently, domains are annotated with generic IDs such as D1015, and we have to provide a meaningful name manually afterwards.

Instead, we could:

  1. extract some information from the domains
  2. create a prompt
  3. send it to an LLM (requires an API key)
  4. format the output to create a dict[str, str] mapping, e.g. {"D1015": "Tertiary Lymphoid Structure", ...}
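
A minimal sketch of steps 2 to 4, assuming each domain has already been reduced to a short text summary. The helper names (`build_prompt`, `parse_mapping`, `name_domains`) and the `ask_llm` callable are hypothetical and not part of any existing API; `ask_llm` stands in for whichever LLM client is eventually chosen, so no specific provider is assumed here.

```python
from __future__ import annotations

import json
from typing import Callable


def build_prompt(summaries: dict[str, str], tissue_name: str | None = None) -> str:
    """Assemble one prompt from per-domain text summaries (hypothetical helper)."""
    lines = ["Suggest a short biological name for each spatial domain below."]
    if tissue_name is not None:
        lines.append(f"Tissue: {tissue_name}")
    for domain, summary in summaries.items():
        lines.append(f"- {domain}: {summary}")
    lines.append(
        'Answer with a JSON object only, e.g. {"D1015": "Tertiary Lymphoid Structure"}.'
    )
    return "\n".join(lines)


def parse_mapping(reply: str, domains: list[str]) -> dict[str, str]:
    """Extract the domain-to-name mapping from the raw LLM reply."""
    # Keep only the outermost {...} so any text around the JSON is ignored.
    start, end = reply.find("{"), reply.rfind("}")
    if start == -1 or end < start:
        raise ValueError("no JSON object found in the LLM reply")
    raw = json.loads(reply[start : end + 1])
    # Fall back to the original ID when the model skipped a domain.
    return {d: str(raw.get(d, d)) for d in domains}


def name_domains(
    summaries: dict[str, str],
    ask_llm: Callable[[str], str],  # assumption: takes a prompt, returns the reply text
    tissue_name: str | None = None,
) -> dict[str, str]:
    prompt = build_prompt(summaries, tissue_name)
    return parse_mapping(ask_llm(prompt), list(summaries))
```

Passing the LLM call in as a function keeps the prompt and parsing logic testable without an API key, and falling back to the original ID means a partial answer still yields a complete mapping.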

Ideas for information to include in the prompt:

  • DEGs between domains
  • percentage of cell types within each domain (if cell_type_key is provided)
  • tissue name (provided by the user?)
  • other ideas?
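
As an illustration of the cell-type idea, here is a rough sketch that computes the percentage of each cell type per domain and turns it into a short text summary for the prompt. It uses only the standard library and takes two parallel label sequences; the function names `cell_type_percentages` and `summarize` are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from typing import Iterable


def cell_type_percentages(
    domains: Iterable[str], cell_types: Iterable[str]
) -> dict[str, dict[str, float]]:
    """Percentage of each cell type within each domain, most frequent first."""
    counts: dict[str, Counter] = defaultdict(Counter)
    for domain, cell_type in zip(domains, cell_types):
        counts[domain][cell_type] += 1

    result: dict[str, dict[str, float]] = {}
    for domain, counter in counts.items():
        total = sum(counter.values())
        # most_common() sorts by count, so the dict keeps the dominant types first.
        result[domain] = {
            ct: round(100 * n / total, 1) for ct, n in counter.most_common()
        }
    return result


def summarize(
    percentages: dict[str, dict[str, float]], top_n: int = 3
) -> dict[str, str]:
    """One line of text per domain, listing its top_n cell types."""
    return {
        domain: ", ".join(f"{ct} {pct}%" for ct, pct in list(pcts.items())[:top_n])
        for domain, pcts in percentages.items()
    }
```

With an AnnData object, the two sequences would presumably be `adata.obs[obs_key]` and `adata.obs[cell_type_key]`; that wiring is an assumption and is left out so the sketch runs without extra dependencies.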

Function signature:

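from anndata import AnnData

def llm_domains_name_mapping(
    adata: AnnData,
    obs_key: str,
    cell_type_key: str | None = None,
    tissue_name: str | None = None,
) -> dict[str, str]:
    """Map each domain ID found under obs_key to an LLM-suggested name."""
    ...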
