A clean, dependency-free Python CLI for computing the union or intersection of two BED4 files. Designed for clarity, explicit edge-case handling, and easy reuse in bioinformatics pipelines.
- Input: two whitespace-separated BED4 files with columns `chrom start end name`
- Output: a BED4 file you specify
- Operations: `union` (merge by feature name within a chromosome), `isec` (pairwise interval overlap per chromosome)
- Python: 3.8+

Note: BED is typically 0-based, half-open; chromosome labels must match exactly (e.g., `chr1` vs `1` are different).
Goal: Read two BED4 files and, based on user choice, compute either the union or the intersection.
Intersection (isec)
Report all non-empty overlaps between each interval in file 1 and each interval in file 2 on the same chromosome.
- Overlap rule (half-open intervals): `[a,b)` and `[c,d)` overlap iff `a < d` and `c < b`.
- Example: `[30,50)` & `[50,70)` → no overlap
- Example: `[30,52)` & `[50,70)` → overlap `[50,52)`
- The feature name in the output is taken from file 1.
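The intersection rule can be sketched in plain Python as below. This is a minimal illustration under assumptions, not the code in `mycode.py`: the function name `intersect_pairs` is hypothetical, and records are assumed to be `(chrom, start, end, name)` tuples with 0-based half-open coordinates.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# A BED4 record: (chrom, start, end, name), 0-based half-open coordinates.
Interval = Tuple[str, int, int, str]


def intersect_pairs(file1: List[Interval], file2: List[Interval]) -> List[Interval]:
    """Hypothetical helper: report every non-empty overlap between an interval
    in file1 and an interval in file2 on the same chromosome."""
    # Index file2 by chromosome so only same-chromosome pairs are compared.
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for rec in file2:
        by_chrom[rec[0]].append(rec)

    results: List[Interval] = []
    for chrom, start_a, end_a, name_a in file1:
        for _, start_b, end_b, _ in by_chrom.get(chrom, []):
            overlap_start = max(start_a, start_b)
            overlap_end = min(end_a, end_b)
            # Half-open intervals: touching ends (50 == 50) are not an overlap.
            if overlap_start < overlap_end:
                # The output name comes from the file1 interval.
                results.append((chrom, overlap_start, overlap_end, name_a))
    return results


# The two examples from the overlap rule above.
print(intersect_pairs([("chr1", 30, 50, "A")], [("chr1", 50, 70, "B")]))  # []
print(intersect_pairs([("chr1", 30, 52, "A")], [("chr1", 50, 70, "B")]))  # [('chr1', 50, 52, 'A')]
```

The nested loop matches the all-pairs contract directly but is quadratic per chromosome; for large inputs a sorted sweep over both lists would scale better.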
Union (union)
Output all features that occur in at least one file.
- If a feature name occurs in only one file → include as-is.
- If a feature name occurs in both files on different chromosomes → exclude both.
- If a feature name occurs in both files on the same chromosome → output a single interval using the smallest covering span of both:
  - `[30,40)` + `[70,90)` → `[30,90)`
  - `[30,50)` + `[40,45)` → `[30,50)`
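The union rules can be sketched as a group-by-name pass. This is a hedged illustration, not the tool's own `find_unions`: the name `union_by_name` is hypothetical, and, like the `find_unions` description under "How it works", it pools intervals from both files before grouping, so a name repeated on two chromosomes within one file is also dropped.

```python
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name), 0-based half-open


def union_by_name(file1: List[Interval], file2: List[Interval]) -> List[Interval]:
    """Hypothetical helper following the union rules above."""
    # Group every interval from both files by feature name (dicts keep insertion order).
    groups: Dict[str, List[Interval]] = {}
    for rec in file1 + file2:
        groups.setdefault(rec[3], []).append(rec)

    results: List[Interval] = []
    for name, recs in groups.items():
        chroms = {chrom for chrom, _, _, _ in recs}
        if len(chroms) > 1:
            # Same name on different chromosomes: exclude the whole group.
            continue
        chrom = recs[0][0]
        # Smallest covering span: min of starts, max of ends (gaps are covered).
        # A name seen only once passes through unchanged.
        start = min(r[1] for r in recs)
        end = max(r[2] for r in recs)
        results.append((chrom, start, end, name))
    return results


print(union_by_name([("chr1", 30, 40, "geneA")], [("chr1", 70, 90, "geneA")]))
# [('chr1', 30, 90, 'geneA')]
```

One consequence of grouping purely by name: in this sketch, every feature whose name defaulted to `.` lands in the same group.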
CLI contract (argparse)
Positional arguments, in order:
- `operation` (`union` or `isec`)
- `input1` path
- `input2` path
- `output` path

The order of output lines is not significant.
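A parser for this contract could look like the sketch below. It uses only the standard `argparse` module; the helper names `build_parser` and `parse_cli` and the help strings are hypothetical and may differ from what `mycode.py` defines.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented positional arguments."""
    parser = argparse.ArgumentParser(
        description="Union or intersection of two BED4 files."
    )
    # choices= makes argparse reject anything other than union/isec.
    parser.add_argument("operation", choices=["union", "isec"],
                        help="union: merge by feature name; isec: pairwise overlaps")
    parser.add_argument("input1", help="path to the first BED4 file")
    parser.add_argument("input2", help="path to the second BED4 file")
    parser.add_argument("output", help="path for the BED4 output")
    return parser


def parse_cli(argv: Optional[List[str]] = None) -> argparse.Namespace:
    # With argv=None argparse reads sys.argv; on invalid input it prints
    # a usage message and exits with status 2.
    return build_parser().parse_args(argv)


args = parse_cli(["isec", "main.bed.txt", "intersectionsecondfile.bed.txt", "isec_results.bed.txt"])
print(args.operation, args.output)  # isec isec_results.bed.txt
```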
```bash
# Union (merge by feature name if on the same chromosome)
python mycode.py union main.bed.txt unionsecondfile.bed.txt union_results.bed.txt
cat union_results.bed.txt

# Intersection (interval overlap by chromosome; name is taken from file1)
python mycode.py isec main.bed.txt intersectionsecondfile.bed.txt isec_results.bed.txt
cat isec_results.bed.txt

# CLI help
python mycode.py -h
```

Robust parsing
- Skips blank lines and `#` comments
- Accepts whitespace-separated columns (tabs or spaces)
- Auto-swaps inverted intervals (`start > end`)
- Defaults `name` to `.` if the 4th column is missing
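The parsing rules above can be sketched as a single pass over input lines. This is an illustration under assumptions: the function name `parse_bed_lines`, the warning wording, and the choice to ignore columns beyond the fourth are hypothetical, not taken from `mycode.py`.

```python
import sys
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name)


def parse_bed_lines(lines: Iterable[str]) -> List[Interval]:
    """Hypothetical parser applying the robust-parsing rules listed above.
    Problems are reported on stderr with 1-based line numbers."""
    records: List[Interval] = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Skip blank lines and '#' comments.
        if not line or line.startswith("#"):
            continue
        # split() with no argument splits on any run of tabs or spaces.
        fields = line.split()
        if len(fields) < 3:
            print(f"warning: line {lineno}: expected at least 3 columns, skipping",
                  file=sys.stderr)
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            print(f"warning: line {lineno}: non-integer coordinates, skipping",
                  file=sys.stderr)
            continue
        if start > end:
            # Auto-swap inverted intervals.
            start, end = end, start
        # Default the name to '.' when the 4th column is missing.
        name = fields[3] if len(fields) >= 4 else "."
        records.append((fields[0], start, end, name))
    return records


example = ["# comment", "", "chr1\t50\t30\tgeneA", "chr2 10 20", "chr1 x 5 bad"]
print(parse_bed_lines(example))
# [('chr1', 30, 50, 'geneA'), ('chr2', 10, 20, '.')]
```

Because it accepts any iterable of strings, an open file handle works too, e.g. `with open(path) as fh: records = parse_bed_lines(fh)`.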
Union
- Groups intervals by name across both files
- Merges only when intervals with the same name are on the same chromosome
- Flags & drops “same name but different chromosome”
Intersection
- Checks pairwise overlaps by chromosome (no requirement to match names)
- Output `name` is inherited from the file 1 interval
Clear output & errors
- Summary stats printed to stdout
- Warnings/parse errors printed to stderr (with line numbers)
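The stdout/stderr split can be illustrated with the small sketch below. The helper names `write_bed` and `report`, the summary wording, and tab-separated output columns are all assumptions; the text above only specifies that the output is a BED4 file and that summary stats go to stdout.

```python
import io
from typing import List, TextIO, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name)


def write_bed(records: List[Interval], out: TextIO) -> int:
    """Hypothetical writer: one tab-separated BED4 line per record."""
    for chrom, start, end, name in records:
        out.write(f"{chrom}\t{start}\t{end}\t{name}\n")
    return len(records)


def report(operation: str, n_in1: int, n_in2: int, n_out: int) -> None:
    # Summary goes to stdout so it stays separate from warnings on stderr.
    print(f"{operation}: read {n_in1} + {n_in2} intervals, wrote {n_out}")


buf = io.StringIO()  # stands in for an opened output file
n = write_bed([("chr1", 50, 52, "A")], buf)
print(repr(buf.getvalue()))  # 'chr1\t50\t52\tA\n'
report("isec", 1, 1, n)      # isec: read 1 + 1 intervals, wrote 1
```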
Usage

```
python mycode.py {union|isec} <input1.bed> <input2.bed> <output.bed>
```

- `union`: merge intervals by feature name if they are on the same chromosome, with `start = min(start_a, start_b)` and `end = max(end_a, end_b)`. Pairs with identical names on different chromosomes are excluded.
- `isec`: report overlapping regions by chromosome for all pairs between file 1 and file 2. The output name is taken from the file 1 interval. Overlap calculation:

```
overlap_start = max(start_a, start_b)
overlap_end = min(end_a, end_b)
emit if overlap_start < overlap_end
```
How it works
`read_bed_file(path)`
Checks that the file exists, parses each row into `(chrom, start, end, name)`, skips malformed rows and rows with non-integer coordinates, and prints warnings with line numbers.
`find_unions(intervals1, intervals2)`
Concatenates both lists and groups the intervals by name. If the intervals for a given name do not all share the same chromosome, the group is marked invalid. Emits one merged interval per valid name.
Examples (files included)
The repository includes the following example inputs/outputs at the repo root:

- `main.bed.txt`
- `unionsecondfile.bed.txt`
- `intersectionsecondfile.bed.txt`
- `union_results.bed.txt` (generated by the union command below)
- `isec_results.bed.txt` (generated by the intersection command below)

Run the following:

Union example
```bash
python mycode.py union main.bed.txt unionsecondfile.bed.txt union_results.bed.txt
cat union_results.bed.txt
```

Intersection example
```bash
python mycode.py isec main.bed.txt intersectionsecondfile.bed.txt isec_results.bed.txt
cat isec_results.bed.txt
```

Repository layout:
```
.
├── mycode.py
├── main.bed.txt
├── unionsecondfile.bed.txt
├── intersectionsecondfile.bed.txt
├── union_results.bed.txt   # generated
└── isec_results.bed.txt    # generated
```
Appendix: BED4 Files
A BED4 file has four columns (whitespace separated):
- `chrom` – chromosome/contig label (e.g., `chr1`)
- `start` – 0-based start (inclusive)
- `end` – 0-based end (exclusive)
- `name` – feature label (optional; defaults to `.` in this tool)
Lines beginning with # are treated as comments and ignored.
Developed by Martina Debnath | MSc Genetics and Multiomics in Medicine | UCL
Thank you for using my intersection-and-union CLI <3
Feel free to reach out for collaboration.
GitHub: https://github.com/marti-dotcom
Email: martinadebnath@gmail.com