Deep Learning, Computer Vision and Medical Imaging Papers
I review, track, and document interesting and relevant papers on image classification, object detection, image captioning, image segmentation, generative models, vision-language models, 3D vision, and medical imaging, covering convolutional networks, deep neural networks, and Transformer architectures.
Image Classification
☑ CNN paper - (LeNet) - Gradient-Based Learning Applied to Document Recognition (CNN Foundation paper by Yann LeCun)
☑ VGG paper - Very Deep Convolutional Networks for Large-Scale Image Recognition (By Visual Geometry Group, University of Oxford)
☑ ResNet paper - Deep Residual Learning for Image Recognition (By Microsoft Research team)
☑ Vision Transformers (ViT) paper - An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
☑ A ConvNet for the 2020s
Image Captioning
☑ Show and Tell: A Neural Image Caption Generator (By Google Team)
☑ Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Object Detection
☑ R-CNN paper - Rich feature hierarchies for accurate object detection and semantic segmentation
☑ YOLO paper - You Only Look Once: Unified, Real-Time Object Detection
Image Segmentation
☑ Mask R-CNN (Instance Segmentation)
☑ SAM paper - Segment Anything, By Meta AI team (Instance Segmentation)
☑ U-Net, for medical imaging (Semantic Segmentation)
Generative Models
☑ Pixel Recurrent Neural Networks, by Google DeepMind, 2016 (Autoregressive generative model paper) - (Explicit probability density approach, tractable density learned directly from training images)
☑ Auto-Encoding Variational Bayes, 2013 (Variational Autoencoders paper) - (Explicit probability density approach, approximate density)
☑ GANs paper - Generative Adversarial Nets, NeurIPS 2014 (Implicit probability density approach)
☑ MedCLIP paper: "MedCLIP: Contrastive Learning from Unpaired Medical Images and Text" (Extends CLIP pretraining by decoupling images and texts, so previously unusable unpaired data can enlarge the training set)
☑ SAM-Med3D: Towards General-purpose Segmentation Models for Volumetric Medical Images (Uses prompt points for guided 3D segmentation on an Alzheimer's dataset)
☑ MedBLIP paper: "MedBLIP: Bootstrapping Language-Image Pre-training from 3D Medical Images and Texts"
☑ Text3DSAM: Text-Guided 3D Medical Image Segmentation Using SAM-Inspired Architecture (CVPR 2025 challenge winner)
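As a compact reference for the three density approaches named under Generative Models (tractable, approximate, implicit), the standard textbook forms of the corresponding objectives are sketched below. The notation is an assumption made here for illustration and is not taken from this list: x_i is the i-th pixel of an n x n image, q_φ is the VAE encoder, p_θ the decoder, and D and G are the GAN discriminator and generator.

```latex
% Autoregressive (PixelRNN): explicit, tractable density via the chain rule over pixels
p(\mathbf{x}) = \prod_{i=1}^{n^2} p(x_i \mid x_1, \ldots, x_{i-1})

% VAE: explicit but intractable density, trained by maximizing a lower bound (ELBO)
\log p_\theta(\mathbf{x}) \ge
  \mathbb{E}_{q_\phi(\mathbf{z} \mid \mathbf{x})}\big[\log p_\theta(\mathbf{x} \mid \mathbf{z})\big]
  - D_{\mathrm{KL}}\big(q_\phi(\mathbf{z} \mid \mathbf{x}) \,\|\, p(\mathbf{z})\big)

% GAN: implicit density, no likelihood is written down; samples come from a two-player game
\min_G \max_D \;
  \mathbb{E}_{\mathbf{x} \sim p_{\text{data}}}\big[\log D(\mathbf{x})\big]
  + \mathbb{E}_{\mathbf{z} \sim p_{\mathbf{z}}}\big[\log\big(1 - D(G(\mathbf{z}))\big)\big]
```

Read together, the three lines show why the labels differ: the first can be evaluated exactly, the second only bounds the log-likelihood, and the third never defines a density at all.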