Intersection & Union of BED4 Intervals

A clean, dependency-free Python CLI for computing the union or intersection of two BED4 files. Designed for clarity, explicit edge-case handling, and easy reuse in bioinformatics pipelines.

Input: two whitespace-separated BED4 files with columns: chrom start end name
Output: a BED4 file you specify
Operations: union (merge by feature name within a chromosome), isec (pairwise interval overlap per chromosome)
Python: 3.8+

Note: BED coordinates are 0-based and half-open. Chromosome labels must match exactly (for example, chr1 and 1 are treated as different chromosomes).

Motivation

Goal: Read two BED4 files and, based on user choice, compute either the union or the intersection.

Intersection (isec)
Report all non-empty overlaps between each interval in file 1 and each interval in file 2 on the same chromosome.

Overlap rule (half-open intervals): [a,b) and [c,d) overlap iff a < d and c < b.
Example: [30,50) & [50,70) → no overlap
Example: [30,52) & [50,70) → overlap [50,52)
The feature name in the output is taken from file 1.
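The overlap rule above can be sketched as a small helper. This is a minimal illustration rather than the code in mycode.py; the function name `overlap` and its signature are assumptions made for this example.

```python
from typing import Optional, Tuple


def overlap(a_start: int, a_end: int, b_start: int, b_end: int) -> Optional[Tuple[int, int]]:
    """Return the half-open overlap of [a_start, a_end) and [b_start, b_end), or None."""
    start = max(a_start, b_start)
    end = min(a_end, b_end)
    # A non-empty half-open interval needs start strictly below end.
    return (start, end) if start < end else None


print(overlap(30, 50, 50, 70))  # None: the intervals only touch at 50
print(overlap(30, 52, 50, 70))  # (50, 52)
```

Because the intervals are half-open, two intervals that merely touch at a boundary produce no output.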

Union (union)
Output the features that occur in at least one file, subject to the following rules:

If a feature name occurs in only one file → include as-is.
If a feature name occurs in both files on different chromosomes → exclude both.
If a feature name occurs in both files on the same chromosome → output a single interval using the smallest covering span of both:
- [30,40) + [70,90) → [30,90)
- [30,50) + [40,45) → [30,50)
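The "smallest covering span" in the last rule is just the minimum start and maximum end of the two intervals. A minimal sketch, where the helper name `covering_span` is hypothetical and not taken from the script:

```python
from typing import Tuple


def covering_span(a: Tuple[int, int], b: Tuple[int, int]) -> Tuple[int, int]:
    """Return the smallest half-open interval that contains both a and b."""
    return (min(a[0], b[0]), max(a[1], b[1]))


print(covering_span((30, 40), (70, 90)))  # (30, 90): the gap between them is covered too
print(covering_span((30, 50), (40, 45)))  # (30, 50): the second interval is nested
```

Note that the span covers any gap between the two intervals, so the result can include positions that neither input covers.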

CLI contract (argparse)

operation (union or isec)
input1 path
input2 path
output path

The script does not guarantee any particular order of output lines.
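One way to express this contract with argparse is shown below. It is a sketch only: the argument names `operation`, `input1`, `input2`, and `output`, the help strings, and the `build_parser` function are assumptions based on the list above, not a copy of mycode.py.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with four positional arguments, in the order listed above."""
    parser = argparse.ArgumentParser(
        description="Compute the union or intersection of two BED4 files."
    )
    parser.add_argument("operation", choices=["union", "isec"], help="operation to run")
    parser.add_argument("input1", help="path to the first BED4 file")
    parser.add_argument("input2", help="path to the second BED4 file")
    parser.add_argument("output", help="path of the BED4 file to write")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(
    ["union", "main.bed.txt", "unionsecondfile.bed.txt", "union_results.bed.txt"]
)
print(args.operation, args.input1, args.input2, args.output)
```

Using `choices` lets argparse reject any operation other than union or isec before the files are read.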

Quick Start

# Union (merge by feature name if on the same chromosome)
python mycode.py union main.bed.txt unionsecondfile.bed.txt union_results.bed.txt
cat union_results.bed.txt

# Intersection (interval overlap by chromosome; name is taken from file1)
python mycode.py isec  main.bed.txt intersectionsecondfile.bed.txt isec_results.bed.txt
cat isec_results.bed.txt

# CLI help
python mycode.py -h

Features included

Robust parsing

Skips blank lines and # comments
Accepts whitespace-separated columns (tabs or spaces)
Auto-swaps inverted intervals (start > end)
Defaults name to . if the 4th column is missing

Union

Groups intervals by name across both files
Merges only when intervals with the same name are on the same chromosome
Flags & drops “same name but different chromosome”

Intersection

Checks pairwise overlaps by chromosome (no requirement to match names)
Output name is inherited from the file1 interval

Clear output & errors

Summary stats printed to stdout
Warnings/parse errors printed to stderr (with line numbers)

Usage

python mycode.py {union|isec} <input1.bed> <input2.bed> <output.bed>

- union: merge intervals by feature name if they are on the same chromosome, using start = min(start_a, start_b) and end = max(end_a, end_b). Intervals that share a name but lie on different chromosomes are excluded.

- isec: report overlapping regions, chromosome by chromosome, for all pairs of intervals between file1 and file2. Overlap calculation:

overlap_start = max(start_a, start_b)
overlap_end   = min(end_a, end_b)
emit if overlap_start < overlap_end

Output name is taken from file1’s interval.
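The pairwise check can be sketched as follows. The function name `find_intersections` and the tuple layout (chrom, start, end, name) are assumptions for illustration; the script's own implementation may be organised differently.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name)


def find_intersections(intervals1: List[Interval], intervals2: List[Interval]) -> List[Interval]:
    """Return every non-empty overlap between file1 and file2 intervals on the same chromosome."""
    # Index file2 by chromosome so each file1 interval is compared only with candidates
    # that share its chromosome label.
    by_chrom: Dict[str, List[Interval]] = defaultdict(list)
    for interval in intervals2:
        by_chrom[interval[0]].append(interval)

    results: List[Interval] = []
    for chrom, start_a, end_a, name_a in intervals1:
        for _, start_b, end_b, _name_b in by_chrom.get(chrom, []):
            overlap_start = max(start_a, start_b)
            overlap_end = min(end_a, end_b)
            if overlap_start < overlap_end:
                # The output name is inherited from the file1 interval.
                results.append((chrom, overlap_start, overlap_end, name_a))
    return results


file1 = [("chr1", 30, 52, "geneA"), ("chr2", 10, 20, "geneB")]
file2 = [("chr1", 50, 70, "x"), ("chr1", 0, 35, "y"), ("chr3", 10, 20, "z")]
print(find_intersections(file1, file2))
```

This compares all pairs within a chromosome, which is quadratic in the worst case; for small example files that is fine, while very large inputs would usually call for a sort-and-sweep approach.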

How it works

read_bed_file(path)

Checks that the file exists, parses each row into (chrom, start, end, name), skips malformed rows and rows with non-integer coordinates, and prints warnings with line numbers.
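The parsing rules can be sketched on an iterable of lines, which keeps the example runnable without a file on disk. The helper name `parse_bed_lines` and the exact warning wording are assumptions; the file-existence check performed by read_bed_file is not shown here.

```python
import sys
from typing import Iterable, List, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name)


def parse_bed_lines(lines: Iterable[str]) -> List[Interval]:
    """Parse BED4 rows, skipping blanks, comments, and malformed rows."""
    intervals: List[Interval] = []
    for line_no, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or comment
        fields = line.split()  # splits on tabs or spaces
        if len(fields) < 3:
            print(f"Warning: line {line_no}: fewer than 3 columns, skipped", file=sys.stderr)
            continue
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            print(f"Warning: line {line_no}: non-integer coordinate, skipped", file=sys.stderr)
            continue
        if start > end:
            start, end = end, start  # auto-swap inverted intervals
        name = fields[3] if len(fields) > 3 else "."  # default name when column 4 is missing
        intervals.append((fields[0], start, end, name))
    return intervals


sample = ["# comment", "", "chr1\t30\t50\tgeneA", "chr1 90 70 geneB", "chr2 5 x geneC", "chr3 1 2"]
print(parse_bed_lines(sample))
```

To use it on a real file, the lines can be supplied with `open(path)` inside a `with` block.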

find_unions(intervals1, intervals2)

Concatenates both lists; groups by name. Ensures all intervals for a given name share the same chromosome; otherwise marks the group as invalid. Emits one merged interval per valid name.
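The grouping logic described above might look like the following. This is a sketch under the stated rules, not the script's actual source; the tuple layout (chrom, start, end, name) is an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Interval = Tuple[str, int, int, str]  # (chrom, start, end, name)


def find_unions(intervals1: List[Interval], intervals2: List[Interval]) -> List[Interval]:
    """Merge intervals that share a name, dropping names seen on more than one chromosome."""
    groups: Dict[str, List[Interval]] = defaultdict(list)
    for interval in list(intervals1) + list(intervals2):
        groups[interval[3]].append(interval)

    merged: List[Interval] = []
    for name, members in groups.items():
        chroms = {chrom for chrom, _, _, _ in members}
        if len(chroms) != 1:
            continue  # same name on different chromosomes: the whole group is invalid
        start = min(s for _, s, _, _ in members)
        end = max(e for _, _, e, _ in members)
        merged.append((members[0][0], start, end, name))
    return merged


file1 = [("chr1", 30, 40, "a"), ("chr1", 10, 20, "b"), ("chr2", 5, 9, "c")]
file2 = [("chr1", 70, 90, "a"), ("chr3", 10, 20, "b"), ("chrX", 1, 2, "d")]
print(find_unions(file1, file2))
```

In this example, a is merged into a single chr1 interval, b is dropped because it appears on chr1 and chr3, and c and d pass through unchanged.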

Examples (files included)

python mycode.py union main.bed.txt unionsecondfile.bed.txt union_results.bed.txt
python mycode.py isec main.bed.txt intersectionsecondfile.bed.txt isec_results.bed.txt

The repository includes the following example inputs/outputs at the repo root:

main.bed.txt
unionsecondfile.bed.txt
intersectionsecondfile.bed.txt
union_results.bed.txt (generated by the union command above)
isec_results.bed.txt (generated by the isec command above)

To inspect the generated results, run:

Union example

cat union_results.bed.txt

Intersection example

cat isec_results.bed.txt

Project Structure

.
├── mycode.py
├── main.bed.txt
├── unionsecondfile.bed.txt
├── intersectionsecondfile.bed.txt
├── union_results.bed.txt         # generated
└── isec_results.bed.txt          # generated

Appendix: BED4 Files

A BED4 file has four columns (whitespace separated):

chrom – chromosome/contig label (e.g., chr1)
start – 0-based start (inclusive)
end – 0-based end (exclusive)
name – feature label (optional; defaults to . in this tool)

Lines beginning with # are treated as comments and ignored.

Contact

Developed by Martina Debnath | MSc Genetics and Multiomics in Medicine | UCL

Thank you for using my intersection-and-union CLI <3

Feel free to reach out for collaboration.

GitHub: https://github.com/marti-dotcom

Email: martinadebnath@gmail.com
