
AIEdit

Alignment-free genome assembly polisher with an ML model trained on spaced seed hit/miss patterns.

Requirements

The environment.yaml file contains all the necessary dependencies for compiling AIEdit manually in a Conda environment:

conda env create -f environment.yaml
conda activate aiedit

If you would like to train new models:

Installation

Using conda (recommended)

AIEdit is available on Bioconda:

conda install bioconda::aiedit

This will make the aiedit command available in the environment.

Manually

Build AIEdit in the build folder by running the following in the project's root folder:

pip install . --no-build-isolation

For Developers

If you are modifying the C++ or Python code and want your changes to take effect immediately, use an editable install:

pip install -e . --no-build-isolation

Also install pybind11-stubgen so that the aiedit/core.pyi stub file is regenerated whenever the C++ bindings change.

Usage

Given a set of reads READS and a draft assembly ASSEMBLY, AIEdit runs all required polishing stages. Results are written to the output directory specified by -o (the current working directory by default). A minimal invocation is:

aiedit polish -r READS -a ASSEMBLY

Run aiedit polish --help for more details on the input parameters.

For polishing assemblies with ONT reads, we suggest setting -y 10 -p 0.8.

AIEdit uses half of the available CPUs on the machine by default. This can be adjusted with the -t parameter.
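If you drive AIEdit from a Python pipeline, the options above can be assembled into an argument list before shelling out. The following is a minimal sketch, assuming a single reads file and a Python wrapper script of your own; the helper names build_polish_command and default_threads are hypothetical, and the floor-division rule for "half of the available CPUs" is an assumption rather than AIEdit's documented internals. Only the flags -r, -a, -o, -t, -y and -p come from this README.

```python
import os
from typing import List, Optional


def default_threads() -> int:
    """Approximate the documented default of half the available CPUs.

    Hypothetical helper: rounding down with a floor of 1 is an assumption,
    not necessarily what AIEdit does internally.
    """
    cpus = os.cpu_count() or 1
    return max(1, cpus // 2)


def build_polish_command(
    reads: str,
    assembly: str,
    output_dir: Optional[str] = None,
    threads: Optional[int] = None,
    ont: bool = False,
) -> List[str]:
    """Build an `aiedit polish` argument list (hypothetical helper)."""
    cmd = ["aiedit", "polish", "-r", reads, "-a", assembly]
    if output_dir is not None:
        # Without -o, AIEdit writes to the current working directory.
        cmd += ["-o", output_dir]
    if threads is not None:
        cmd += ["-t", str(threads)]
    if ont:
        # Settings this README suggests for ONT reads.
        cmd += ["-y", "10", "-p", "0.8"]
    return cmd


if __name__ == "__main__":
    cmd = build_polish_command("reads.fq", "draft.fa", output_dir="polished",
                               threads=default_threads(), ont=True)
    print(" ".join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) so that a non-zero exit status from AIEdit raises an exception; building the list first keeps it easy to log or test without running the polisher.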

Models

To list available pretrained models with their configurations, run:

aiedit list_models

The default model supports 5 bp edit windows using 3 spaced seeds (aiedit/pretrained/s3m5i5.pt). More models are available in the pretrained directory, and new models can be trained with the aiedit train command. We recommend the default model for a good balance of computational performance and polishing accuracy, but feel free to train and experiment with other models.

Output Files

The following files are created in the output folder (specified by -o). <input_file> is replaced by the draft assembly file's name:

  • <input_file>-aiedit_edited.fa: polished assembly in FASTA format
  • <input_file>-aiedit_variants.vcf: list of AIEdit's changes
  • <input_file>-ntedit_variants.vcf: list of ntEdit's changes
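A quick sanity check after a run is to count how many changes each VCF records. The following is a minimal sketch, assuming the VCF files are uncompressed plain text and that <input_file> is substituted verbatim with the string you pass in (check your output folder for the exact naming); the helper names output_files and count_vcf_records are hypothetical and not part of AIEdit.

```python
from pathlib import Path
from typing import Dict


def output_files(output_dir: str, input_file: str) -> Dict[str, Path]:
    """Build the expected output paths (hypothetical helper).

    Assumes <input_file> is replaced verbatim by `input_file`.
    """
    base = Path(output_dir)
    return {
        "polished": base / f"{input_file}-aiedit_edited.fa",
        "aiedit_vcf": base / f"{input_file}-aiedit_variants.vcf",
        "ntedit_vcf": base / f"{input_file}-ntedit_variants.vcf",
    }


def count_vcf_records(path: Path) -> int:
    """Count data lines in an uncompressed VCF, skipping '#' header lines."""
    with open(path) as handle:
        return sum(1 for line in handle
                   if line.strip() and not line.startswith("#"))


if __name__ == "__main__":
    files = output_files(".", "draft")
    for key in ("aiedit_vcf", "ntedit_vcf"):
        if files[key].exists():
            print(key, count_vcf_records(files[key]))
```

Counting lines directly avoids an extra dependency; for compressed or very large VCFs, a dedicated parser such as pysam would be a better fit.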

Running Tests

After building the project manually in the build folder, run:

ctest --test-dir build/tests

License

AIEdit Copyright (c) 2025-present British Columbia Cancer Agency Branch. All rights reserved.

AIEdit is released under the GNU General Public License v3.

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, version 3.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see http://www.gnu.org/licenses/.

For commercial licensing options, please contact Patrick Rebstein (prebstein@bccancer.bc.ca).