Amin Karimi Monsefi

I am a dedicated Ph.D. student in Computer Science at The Ohio State University, focusing on Computer Vision, Vision-Language Models, and Self-Supervised Learning under the supervision of Professor Rajiv Ramnath. My research encompasses image and video generation, as well as self-supervised learning techniques to advance the field of computer vision.

Research Interests:

Image and Video Generation:

Developing innovative methods for generating high-quality images and videos.
Extending image-based generative strategies into multi-frame sequences, focusing on preserving consistent identity, style, and motion dynamics.
Employing hierarchical knowledge structures to capture subtle morphological or stylistic variations, allowing for highly distinctive yet consistent image synthesis.
Exploring how the creativity of Large Language Models (LLMs) can be utilized in video generation with diffusion models.

Self-Supervised Learning for Vision:

Designing self-supervised approaches to learn meaningful representations from unlabeled data.
Projects include Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning and a self-supervised approach for general images using multimodal architectures like CLIP.
Applying self-supervised learning to medical image analysis to overcome the challenge of limited labeled data.

Medical Image Analysis:

Utilizing self-supervised learning to train models on unlabeled medical images.
Extracting valuable features for better analysis and interpretation in the medical domain.
Developed Masked LoGoNet, a neural network architecture with tailored self-supervised learning for efficient medical image segmentation.

Recent News and Updates:

Professional Experience:

ML Research Intern – Apple MIND Team
May 2025 – Sep 2025 · 5 months | Summer 2025 | Seattle, WA
• Researched and developed advanced generative models for efficient, few-step discrete diffusion, enabling faster and more scalable text generation.
• Collaborated with a cross-functional ML team to design novel algorithms and architectures for large-scale language modeling.
• Conducted experiments and delivered insights that advanced Apple's research in discrete generative modeling and shaped future projects.
• Authored the FS-DFM paper: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models.
Machine Learning Intern – Higharc
May 2024 – Aug 2024 · 4 months | Remote, Durham, NC
• Conducted research on semantic and panoptic segmentation tasks.
• Pre-trained a DETR-based model on unlabeled data, using self-supervised learning to address the challenge of limited labeled data.
Senior Data Scientist – JIBB
Dec 2020 – Dec 2021 · 1 year 1 month | Remote, San Francisco, CA
JIBB provides a smart platform to capture, save, and share handwriting from whiteboards or paper across devices.
• Designed and deployed computer vision pipelines for object detection and dynamic content filtering in both images and videos.
• Developed custom CNN architectures to accurately detect content color and remove shadows and reflections.
• Built automated tools for enhancing visual clarity in real-time handwriting sessions.
Senior Data Scientist & Back-End Developer – TAPSI
Mar 2018 – Dec 2020 · 2 years 10 months | Tehran, Iran
TAPSI is a leading online ride-hailing platform in Iran, providing intelligent mobility solutions through advanced technology and AI.
• Developed AI-powered pricing microservices in Python, communicating via RabbitMQ for real-time fare adjustments.
• Designed a GPS anomaly detection system to prevent fraud and ensure user safety.
• Built data-driven recommendation features (origin, destination, favorite places) using unsupervised learning.
• Created a microservice to estimate ETA based on live driver GPS; published a paper on the proposed method.
• Engineered a spatiotemporal forecasting tool to predict high-demand ride areas in urban regions.

Publications:

09/2025 - Submitted: FS-DFM: Fast and Accurate Long Text Generation with Few-Step Diffusion Language Models
07/2025 - Accepted in Biomedical Optics Express Journal: ISOSNet: a unified framework for cone photoreceptor detection and inner segment and outer segment length measurement from AO-OCT B-scans
06/2025 - Accepted in ICCV 2025: TaxaDiffusion: Progressively Trained Diffusion Model for Fine-Grained Generation and Trait Discovery
03/2025 - Accepted in CVEU Workshop of CVPR 2025: KnobGen: Controlling the Sophistication of Artwork in Sketch-Based Diffusion Models
02/2025 - Accepted in SSI-FM Workshop of ICLR 2025: DetailCLIP: Detail-Oriented CLIP for Fine-Grained Tasks
01/2025 - Accepted in ICLR 2025: Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
05/2024 - Accepted in SIGKDD 2024: Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain
03/2024 - Accepted in Biomedical Optics Express Journal: Reducing Manual Labeling Requirements and Improved Retinal Ganglion Cell Identification in 3D AO-OCT Volumes Using Semi-Supervised Learning
06/2023 - Accepted in ACM SIGSPATIAL International Workshop on Advances in Urban-AI: CrashFormer: A Multimodal Architecture to Predict the Risk of Crash
05/2023 - Accepted in SIGKDD 2023: Novel Physics-Based Machine-Learning Models for Indoor Air Quality Approximations
02/2023 - Accepted in Digital Communications and Networks Journal - 2023: Smart and collaborative industrial IoT: A federated learning and data space approach
08/2022 - Accepted in ACM SIGSPATIAL 2022: Will there be a construction? Predicting road constructions based on heterogeneous spatiotemporal data

Reviewer Appointments:

Selected to serve as a reviewer for SIGKDD 2026
Selected to serve as a reviewer for SIGKDD 2025 Second Round (selected as an Outstanding Reviewer, top 10% of reviewers)
Selected to serve as a reviewer for CVPR 2025
Selected to serve as a reviewer for ICLR 2025
Selected to serve as a reviewer for WACV 2025
Selected to serve as a reviewer for SIGKDD 2025 First Round (selected as an Excellent Reviewer, top 20% of reviewers)
Selected to serve as a reviewer for SIGKDD 2024

Bachelor’s and Master’s

My Bachelor’s thesis focused on applying reinforcement learning in a multi-object environment. In this setting, each object could train its own model individually. Additionally, I incorporated federated learning techniques so that the objects could share and generalize their models across one another. This research explored how combining these approaches could enhance learning and decision-making in complex environments.

For my Master’s thesis, I focused on software testing. Specifically, I proposed an approach that uses machine learning techniques to generate test datasets. The approach aimed to cover the main execution paths within the software, enabling effective fault detection. By leveraging machine learning, I sought to improve the efficiency and accuracy of the software testing process, and ultimately the overall quality and reliability of software systems.