Im2col SIMD for 2D Tensors

A SIMD-optimized implementation of the im2col operation for 2D images (e.g., grayscale). This repository provides C++ source code targeting AVX2 (x86_64) and NEON (Armv8) architectures, as well as a reference (non-SIMD) implementation. You can integrate the resulting library into your own C++ project or access it from Python through a simple ctypes wrapper.

Core utility developed for Mosaic-SR, where it is used in production to accelerate patch extraction for the super-resolution pipeline on Arm CPU targets without GPU support.

Overview

The im2col (“image to column”) operation is commonly used in convolutional neural networks (CNNs) to transform a 2D image (or feature map) into a set of column vectors, facilitating efficient matrix-multiplication-based convolutions.
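To make the transformation concrete, here is a minimal scalar sketch of im2col for a single-channel image. It assumes a row-major H×W float input, no padding, and the common (kh·kw) × (out_h·out_w) output layout with one column per patch. The function name `im2col_scalar` is hypothetical, and the repository's reference implementation may use a different layout or signature.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical scalar im2col for a single-channel (grayscale) image.
// Assumptions: row-major h x w input, no padding ("valid" convolution),
// output of shape (kh * kw) rows x (out_h * out_w) columns, one column per patch.
std::vector<float> im2col_scalar(const std::vector<float>& img,
                                 std::size_t h, std::size_t w,
                                 std::size_t kh, std::size_t kw,
                                 std::size_t stride) {
    assert(img.size() == h * w && kh <= h && kw <= w && stride > 0);
    const std::size_t out_h = (h - kh) / stride + 1;
    const std::size_t out_w = (w - kw) / stride + 1;
    const std::size_t n_patches = out_h * out_w;
    std::vector<float> cols(kh * kw * n_patches);

    for (std::size_t ki = 0; ki < kh; ++ki) {          // kernel row
        for (std::size_t kj = 0; kj < kw; ++kj) {      // kernel column
            const std::size_t row = ki * kw + kj;      // row of the output matrix
            for (std::size_t oy = 0; oy < out_h; ++oy) {
                for (std::size_t ox = 0; ox < out_w; ++ox) {
                    const std::size_t y = oy * stride + ki;  // source pixel row
                    const std::size_t x = ox * stride + kj;  // source pixel column
                    cols[row * n_patches + oy * out_w + ox] = img[y * w + x];
                }
            }
        }
    }
    return cols;
}
```

With this layout, a convolution with a kh×kw filter becomes a single (1 × kh·kw) by (kh·kw × out_h·out_w) matrix product, which is what makes the transformation worthwhile despite duplicating overlapping pixels.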

By taking advantage of SIMD intrinsics, we can significantly speed up the im2col computation on CPUs that support vector operations. This repository includes:

AVX2 implementation for modern x86_64 processors
NEON implementation for Armv8 processors
Reference scalar implementation (no intrinsics) for portability or as a fallback
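The register widths behind the two SIMD back ends translate into simple arithmetic for a kernel row of kw floats. The sketch below is illustrative only: the helper names are hypothetical and nothing here is taken from the repository's sources. It counts how many full-width vector loads cover one kernel row and how many lanes are left unused.

```cpp
#include <cstddef>

// Illustrative arithmetic, not the repository's code: lanes per register,
// assuming 32-bit floats.
constexpr std::size_t kAvx2Lanes = 256 / (8 * sizeof(float));  // 256-bit register
constexpr std::size_t kNeonLanes = 128 / (8 * sizeof(float));  // 128-bit register

// Number of full-width loads needed to cover kw floats: ceil(kw / lanes).
constexpr std::size_t loads_per_row(std::size_t kw, std::size_t lanes) {
    return (kw + lanes - 1) / lanes;
}

// Lanes loaded but not needed for a row of kw floats.
constexpr std::size_t unused_lanes(std::size_t kw, std::size_t lanes) {
    return loads_per_row(kw, lanes) * lanes - kw;
}

static_assert(kAvx2Lanes == 8 && kNeonLanes == 4, "assumes 32-bit float");
```

For kernel widths of 5 to 8, one AVX2 register covers a whole row while NEON needs two 4-lane loads; this is only one plausible reason why a narrower instruction set might benefit from additional specialized cases.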

The repository also contains a Python wrapper and a Jupyter Notebook that shows how to use the library from Python.
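A ctypes wrapper can only call functions exported with C linkage. The sketch below shows that pattern with a hypothetical entry point `im2col_c_api` (the repository's actual exported names and argument lists may differ): the caller allocates both buffers and passes raw pointers, so nothing is copied on the C++ side. It assumes the same (kh·kw) × (out_h·out_w) output layout as a plain scalar im2col.

```cpp
#include <cassert>

#if defined(_WIN32)
#define IM2COL_EXPORT __declspec(dllexport)  // export from a Windows DLL
#else
#define IM2COL_EXPORT
#endif

// Hypothetical C-linkage entry point. extern "C" disables C++ name mangling,
// so ctypes can look the symbol up by its plain name. The caller owns both
// buffers: `out` must hold kh * kw * out_h * out_w floats.
// Returns the number of patches written, or -1 on invalid arguments.
extern "C" IM2COL_EXPORT int im2col_c_api(const float* in, float* out,
                                          int h, int w, int kh, int kw,
                                          int stride) {
    if (!in || !out || kh <= 0 || kw <= 0 || kh > h || kw > w || stride <= 0)
        return -1;
    const int out_h = (h - kh) / stride + 1;
    const int out_w = (w - kw) / stride + 1;
    const int n = out_h * out_w;
    assert(n > 0);  // guaranteed by the argument checks above
    for (int ki = 0; ki < kh; ++ki)
        for (int kj = 0; kj < kw; ++kj)
            for (int oy = 0; oy < out_h; ++oy)
                for (int ox = 0; ox < out_w; ++ox)
                    out[(ki * kw + kj) * n + oy * out_w + ox] =
                        in[(oy * stride + ki) * w + (ox * stride + kj)];
    return n;
}
```

On the Python side, such a function would be loaded with `ctypes.CDLL`, given `argtypes` of two `ctypes.POINTER(ctypes.c_float)` followed by five `ctypes.c_int`, and called with NumPy buffers obtained through `ndarray.ctypes.data_as`.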

Installation

Clone this repository:

```
 git clone https://github.com/Henvezz95/im2col_2D.git
 cd im2col_2D
```

Configure and build with CMake:
```
 mkdir build && cd build
 cmake -DCMAKE_BUILD_TYPE=Release ..
 cmake --build .
```
CMake will detect your CPU architecture and automatically compile the corresponding implementation (AVX2, NEON, or the scalar reference).
After building, you should find a shared library (e.g., libim2col.so on Linux or im2col.dll on Windows) in your build folder.
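CMake makes this choice at configure time. A comparable choice can also be expressed inside the source with compiler-predefined macros, as in the sketch below. The enum and function names are hypothetical, and the repository's CMakeLists.txt may select sources differently.

```cpp
// Hypothetical compile-time backend selection using predefined macros.
// __AVX2__: defined by GCC/Clang with -mavx2 (or a -march that includes AVX2)
// and by MSVC with /arch:AVX2.
// __ARM_NEON: defined by GCC/Clang when NEON is available (by default on AArch64).
// Anything else falls back to the scalar reference path.
enum class Im2colBackend { AVX2, NEON, Reference };

constexpr Im2colBackend compiled_backend() {
#if defined(__AVX2__)
    return Im2colBackend::AVX2;
#elif defined(__ARM_NEON) || defined(__ARM_NEON__)
    return Im2colBackend::NEON;
#else
    return Im2colBackend::Reference;
#endif
}

// Human-readable name, e.g. for logging which path a build uses.
constexpr const char* backend_name(Im2colBackend b) {
    return b == Im2colBackend::AVX2 ? "AVX2"
         : b == Im2colBackend::NEON ? "NEON"
                                    : "reference";
}
```

Selecting at compile time keeps dispatch free at run time, but a binary built this way only uses the instruction set it was compiled for.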

Folder Structure

 im2col_2D/
 ├── CMakeLists.txt        # Main CMake build script
 ├── src/
 │   ├── im2col_AVX2.cpp   # AVX2 implementation
 │   ├── im2col_NEON.cpp   # NEON implementation
 │   └── im2col_ref.cpp    # Reference (fallback) implementation
 ├── python/
 │   └── im2col.py         # Python ctypes wrapper
 ├── Test Notebook.ipynb   # Jupyter Notebook example
 ├── README.md
 └── img.webp

Implementation Notes

Note: The current implementation supports 2D (grayscale) arrays only. Multichannel (RGB) support via channel-stride is planned for a future release.

Architecture-aware dispatch: each implementation is tuned to its register width. AVX2 (256-bit, 8 floats) uses 2 cases; NEON (128-bit, 4 floats) uses 3, with Case 2 manually unrolled for common kernel sizes (5–8 columns).

Safe boundary handling: the Python wrapper over-allocates the output tensor by mem_tail elements to prevent out-of-bounds memory accesses at image edges, then trims the extra elements before returning.
Zero-copy interface: input and output tensors are passed directly via ctypes pointers, with no intermediate copies.
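The over-allocation idea can be sketched in a few lines. The code below is hypothetical and written in C++ rather than in the Python wrapper: it assumes a patch-major output (each patch flattened to kh·kw contiguous floats) and emulates a 256-bit AVX2 store with an 8-float `memcpy`, so each kernel-row store spills 8 - kw floats past the row. Inside the buffer the spill is overwritten by the next store; only the final store needs the `mem_tail` slack, which is trimmed before returning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical sketch of "over-allocate, then trim"; not the repository's kernel.
constexpr std::size_t kLanes = 8;  // emulated 256-bit register of floats

std::vector<float> im2col_with_tail(const float* img, std::size_t h,
                                    std::size_t w, std::size_t kh,
                                    std::size_t kw, std::size_t stride) {
    assert(kw <= kLanes && kh <= h && kw <= w && stride > 0);
    const std::size_t out_h = (h - kh) / stride + 1;
    const std::size_t out_w = (w - kw) / stride + 1;
    const std::size_t n = out_h * out_w * kh * kw;  // logical output size
    const std::size_t mem_tail = kLanes - kw;       // spill of the last store
    std::vector<float> out(n + mem_tail);

    float* dst = out.data();
    for (std::size_t oy = 0; oy < out_h; ++oy)
        for (std::size_t ox = 0; ox < out_w; ++ox)
            for (std::size_t ki = 0; ki < kh; ++ki) {
                float lane[kLanes] = {};  // zero-filled emulated register
                // Read only kw floats so the input is never over-read.
                std::memcpy(lane, img + (oy * stride + ki) * w + ox * stride,
                            kw * sizeof(float));
                std::memcpy(dst, lane, sizeof(lane));  // full-width store
                dst += kw;  // advance by kw; the next store overwrites the spill
            }
    out.resize(n);  // trim the tail before returning
    return out;
}
```

Writing full-width stores unconditionally avoids a masked or scalar tail path in the inner loop; the cost is a few extra floats of allocation per call.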
