Deep Learning, Computer Vision and Medical Imaging Papers

I review, track, and document interesting and relevant papers on image classification, object detection, image captioning, image segmentation, generative models, vision-language models, 3D vision, and medical imaging, covering convolutional networks, deep neural networks, and Transformer architectures.

Image Classification

☑ CNN paper - (LeNet) - Gradient-Based Learning Applied to Document Recognition (CNN Foundation paper by Yann LeCun)
☑ VGG paper - Very Deep Convolutional Networks for Large-Scale Image Recognition (By Visual Geometry Group, University of Oxford)
☑ ResNet paper - Deep Residual Learning for Image Recognition (By Microsoft Research team)
☑ Vision Transformers (ViT) paper - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
☑ ConvNeXt paper - A ConvNet for the 2020s

Image Captioning

☑ Show and Tell: A Neural Image Caption Generator (By Google Team)
☑ Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering

Object Detection

☑ R-CNN paper - Rich feature hierarchies for accurate object detection and semantic segmentation
☑ YOLO paper - You Only Look Once: Unified, Real-Time Object Detection

Image Segmentation

☑ Mask R-CNN (Instance Segmentation)
☑ SAM paper - Segment Anything, by Meta AI (Instance Segmentation)
☑ U-Net, for medical imaging (Semantic Segmentation)

Generative Models

☑ Pixel Recurrent Neural Networks, by Google DeepMind, 2016 (Autoregressive generative model paper) - (Explicit probability density approach: models the density of training images directly, using a tractable density)
☑ Auto-Encoding Variational Bayes, 2013 (Variational Autoencoders paper) - (Explicit probability density approach, approximate density)
☑ Generative Adversarial Nets, NeurIPS 2014 - Generative Adversarial Networks (GANs paper) - (Implicit Probability density approach)
☑ Denoising Diffusion Probabilistic Models (DDPM), 2020 - Diffusion models paper
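
As a quick reference for the density labels used in the list above, the four papers can be summarized by their standard training objectives. This is a sketch only: the notation (\theta, \phi, q_\phi, \epsilon_\theta, \bar{\alpha}_t) is an assumption chosen for illustration and may differ from the symbols used in each paper.

```latex
% Autoregressive (PixelRNN): explicit, tractable likelihood over pixels x_1..x_n
p_\theta(x) = \prod_{i=1}^{n} p_\theta(x_i \mid x_1, \dots, x_{i-1})

% VAE: explicit but intractable density, so maximize the evidence lower bound (ELBO)
\log p_\theta(x) \ge \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big]
  - D_{\mathrm{KL}}\big(q_\phi(z \mid x) \,\|\, p(z)\big)

% GAN: implicit density; generator G and discriminator D play a minimax game
\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]

% DDPM: simplified noise-prediction loss derived from a variational bound
L_{\text{simple}} = \mathbb{E}_{t, x_0, \epsilon}\Big[\big\| \epsilon
  - \epsilon_\theta\big(\sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon,\; t\big) \big\|^2\Big]
```

Only the autoregressive model evaluates the likelihood exactly. The VAE and DDPM optimize bounds on it, and the GAN can only draw samples.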

Foundational Vision Language Models (VLMs)

☑ CLIP paper: "Learning Transferable Visual Models From Natural Language Supervision", 2021
☑ Flamingo paper: "Flamingo: A Visual Language Model for Few-Shot Learning", 2022
☑ BLIP paper: "BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation", 2022

VLMs for Medical Imaging

☑ MedCLIP paper: "MedCLIP: Contrastive Learning from Unpaired Medical Images and Text" (Extends CLIP pretraining by decoupling images and texts, so that previously unusable unpaired data increases the training set size)
☑ SAM-Med3D: Towards General-purpose Segmentation Models for Volumetric Medical Images (Uses prompt points for guided 3D segmentation on an Alzheimer's dataset)
☑ MedBLIP paper: "MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts"
☑ Text3DSAM: Text-Guided 3D Medical Image Segmentation Using SAM-Inspired Architecture (CVPR 2025 challenge winner)

Other Visual Learning and representation papers

☑ DINOv3 (By Meta, August 2025)
☑ Densely Connected Convolutional Networks (DenseNet paper, CVPR 2017)

Depth Estimation (3D Vision)

MiDaS: Towards Robust Monocular Depth Estimation: Mixing Datasets for Zero-shot Cross-dataset Transfer
Monodepth2: Digging Into Self-Supervised Monocular Depth Estimation

3D Reconstruction

Mesh R-CNN paper
Occupancy networks paper
Neural Radiance Fields (NeRF paper)

Conferences and Venues

A summary of top conferences and journals for deep learning and medical imaging, including typical conference timelines.

NeurIPS
ICML
ICLR
CVPR
MICCAI (International Conference on Medical Image Computing and Computer Assisted Intervention)
Medical Imaging with Deep Learning (MIDL) Conference
ML4H
International Conference on Pattern Recognition (ICPR)
ICCV (IEEE-CVF International Conference on Computer Vision)
ECCV
AAAI
Neurocomputing
IEEE Transactions on Medical Imaging

Conference	Acronym	Typical Submission Deadline	Typical Conference Date
Neural Information Processing Systems	NeurIPS	Early May (Abstract) / Mid-May (Full)	Early December
International Conference on Machine Learning	ICML	Late January / Early February	Late July
International Conference on Learning Representations	ICLR	Late September / Early October	Early May
Conference on Computer Vision and Pattern Recognition	CVPR	Mid-November	Mid-June
International Conference on Computer Vision	ICCV	Mid-March	Mid-October (odd years)
European Conference on Computer Vision	ECCV	Early March	Late October (even years)
AAAI Conference on Artificial Intelligence	AAAI	Early September	End of February
Medical Image Computing and Computer Assisted Intervention	MICCAI	Early March	Mid-October
Medical Imaging with Deep Learning	MIDL	Mid-February	Early July
International Symposium on Biomedical Imaging	ISBI	Mid-November	April / May
Machine Learning for Health (Symposium)	ML4H	Late August	Early December

N.B. Dates are based on historical patterns.
More conference deadlines: https://github.com/khairulislam/ML-conferences?tab=readme-ov-file
Conference acceptance rates: https://github.com/lixin4ever/Conference-Acceptance-Rate
