
feat: add Vision Transformer (ViT) implementation for image classification #13333

Closed
devvratpathak wants to merge 4 commits into TheAlgorithms:master from devvratpathak:feat/vision-transformer

@devvratpathak

Description

This PR adds a comprehensive Vision Transformer (ViT) implementation to the computer_vision folder for image classification tasks.

Implementation Details

This PR implements the Vision Transformer architecture from "An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale" (Dosovitskiy et al., 2020).

Core Components:

  • Patch Embedding: Splits images into non-overlapping patches
  • Linear Projection: Projects flattened patches to embedding dimension
  • Positional Encoding: Prepends a CLS token and adds learnable positional embeddings
  • Attention Mechanism: Scaled dot-product attention
  • Layer Normalization: Normalizes layer outputs
  • Feed-Forward Network: Position-wise FFN with GELU activation
  • Transformer Encoder Block: Complete encoder with multi-head attention
  • Vision Transformer Pipeline: Full ViT for image classification
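As a rough illustration of the first few components listed above, the following NumPy sketch splits an image into non-overlapping patches, projects them linearly, prepends a CLS token, adds positional embeddings, and applies single-head scaled dot-product attention. The function names (`patchify`, `scaled_dot_product_attention`), the 16-pixel patch size, the 64-dimensional embedding, and the random weights are all assumptions made for this example; they are not taken from the PR's code.

```python
import numpy as np


def patchify(image: np.ndarray, patch_size: int) -> np.ndarray:
    """Split an (H, W, C) image into flattened non-overlapping patches."""
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        msg = "image dimensions must be divisible by patch_size"
        raise ValueError(msg)
    rows, cols = height // patch_size, width // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    patches = image.reshape(rows, patch_size, cols, patch_size, channels)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(rows * cols, patch_size * patch_size * channels)


def scaled_dot_product_attention(
    query: np.ndarray, key: np.ndarray, value: np.ndarray
) -> np.ndarray:
    """Compute softmax(Q K^T / sqrt(d_k)) V for 2-D inputs."""
    scores = query @ key.T / np.sqrt(key.shape[-1])
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ value


rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))
embed_dim = 64  # hypothetical embedding size, chosen for the example

patches = patchify(image, patch_size=16)  # 14 * 14 = 196 patches of 768 values
projection = rng.normal(0.0, 0.02, (patches.shape[1], embed_dim))
tokens = patches @ projection  # linear projection to the embedding dimension

cls_token = np.zeros((1, embed_dim))  # would be learned in a trained model
tokens = np.concatenate([cls_token, tokens], axis=0)
# Positional embeddings are random here; a trained model would learn them.
tokens = tokens + rng.normal(0.0, 0.02, tokens.shape)

attended = scaled_dot_product_attention(tokens, tokens, tokens)
print(patches.shape, tokens.shape, attended.shape)
```

A full encoder block would wrap this attention step with layer normalization, residual connections, and the GELU feed-forward network; those parts are omitted here to keep the sketch short.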

Code Quality:

All functions include docstrings, type hints, and doctests, and follow the repository's coding standards. All ruff checks pass.

Example Usage:

import numpy as np

from computer_vision.vision_transformer import vision_transformer

# Use a Generator rather than the legacy np.random.rand (ruff NPY002)
rng = np.random.default_rng()
image = rng.random((224, 224, 3))
logits = vision_transformer(image, num_classes=1000)
predicted_class = np.argmax(logits)

…features section

- Add comprehensive table of contents for easy navigation
- Include detailed installation steps with virtual environment setup
- Add usage examples showing how to run and import algorithms
- Create features section listing all algorithm categories
- Add explicit license section with MIT License information
- Expand contributing section with quick start guide
- Add about section explaining repository purpose

Fixes TheAlgorithms#13111

…ation

- Implement complete ViT architecture with patch embedding
- Add positional encoding with learnable CLS token
- Include scaled dot-product attention mechanism
- Implement transformer encoder blocks with layer normalization
- Add feed-forward network with GELU activation
- Include comprehensive docstrings and type hints
- Add doctests for all functions
- Provide example usage demonstrating the complete pipeline

Fixes TheAlgorithms#13326

- Replace Optional with X | None syntax (UP045)
- Use np.random.Generator instead of legacy np.random methods (NPY002)
- Fix line length violations (E501)
- Assign f-string literals to variables in exceptions (EM102)
- Remove unused variables and parameters (RUF059, F841)
- Add noqa comment for intentionally unused API parameter
- All doctests still pass successfully
@algorithms-keeper

Closing this pull request as invalid

@devvratpathak, this pull request is being closed as none of the checkboxes have been marked. It is important that you go through the checklist and mark the ones relevant to this pull request. Please read the Contributing guidelines.

If you're unsure how to mark a checkbox, please follow these instructions:

  • Read the points one at a time and decide whether each is relevant to the pull request.
  • If it is, mark it by putting an x between the square brackets, like so: [x]

NOTE: Only [x] is supported. If you have put any other letter or symbol between the brackets, the checkbox will be treated as invalid; in that case, please open a new pull request with the appropriate changes.

algorithms-keeper bot closed this on Oct 7, 2025
algorithms-keeper bot added the "awaiting reviews" label (This PR is ready to be reviewed) on Oct 7, 2025

Labels

awaiting reviews (This PR is ready to be reviewed), invalid
