Pham Xuan Trung

Korea Advanced Institute of Science and Technology (KAIST)

Personal Information

🎓 Greetings! I completed my integrated M.S.-Ph.D. degree in Electrical Engineering at KAIST, South Korea (2018-2025), under the guidance of Professor Chang D. Yoo. My research interests include Computer Vision, Deep Learning, Machine Learning, Generative AI, and Image/Video/Audio Processing, with a focus on Self-supervised Learning, Generative Models, Diffusion Models, Multimodal Learning, Speech Processing, and Natural Language Processing. After finishing my doctoral degree, I continued at KAIST as a postdoctoral researcher.

🎓 I received my Bachelor's degree in 2014 from the School of Electronics and Telecommunications at Hanoi University of Science and Technology (HUST, a top-tier university in Vietnam), ranking 10th out of 526 students (top 1.9%). After that, I worked for VNPT Technology Corporation (more than 1,000 employees) in Hanoi, Vietnam, for 3 years until 2018, mainly researching and deploying 2G, 3G, and 4G mobile communication network projects with vendors such as Alcatel-Lucent, Nokia Siemens, and Samsung.

Email / CV / Google Scholar / LinkedIn

International Conferences and Journals

My work has been published in top-tier venues: ICML (1), ICLR (3), CVPR (4), NeurIPS (2), ECCV (1), Advanced Materials (1, IF: 32), Nano Energy (1, IF: 19), IEEE TCSVT (1, IF: 8.4), IEEE TBC (1, IF: 4.5), IEEE Access (3, IF: 3.9), and ICASSP (1).

Reviewers & Program Committee Members

I have served as a reviewer and Program Committee member for the following conferences and journals:

ICML 2024, 2025, 2026 [The International Conference on Machine Learning]
ICLR 2024, 2025, 2026 [The International Conference on Learning Representations]
NeurIPS 2023, 2024, 2025 [The Conference on Neural Information Processing Systems]
CVPR 2023, 2024, 2025, 2026 [The Conference on Computer Vision and Pattern Recognition]
AAAI 2024, 2025, 2026 [The AAAI Conference on Artificial Intelligence]
ICCV 2023, 2025 [The International Conference on Computer Vision]
ECCV 2024, 2026 [The European Conference on Computer Vision]
ACCV 2024 [The Asian Conference on Computer Vision]
ICASSP 2024, 2025, 2026 [The International Conference on Acoustics, Speech, and Signal Processing]
AISTATS 2025, 2026 [International Conference on Artificial Intelligence and Statistics]
IJCNN 2025 [International Joint Conference on Neural Networks]
ACM Multimedia 2025 [ACM International Conference on Multimedia]
WACV 2026 [Winter Conference on Applications of Computer Vision]
Neural Networks (NN) 2023, Impact Factor: 8.67 [Certificate]
IEEE Transactions on Multimedia (TMM) 2023, Impact Factor: 7.39
Computer Vision and Image Understanding (CVIU) 2024, Impact Factor: 4.3 [Certificate]
ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS) 2024, Impact Factor: 11.83 [Certificate]
Expert Systems With Applications (ESWA) 2024, 2025, Impact Factor: 8.5 [Certificate]
Digital Signal Processing (DSP) 2025, Impact Factor: 3.4 [Certificate]
Transactions on Machine Learning Research (TMLR) 2025
Engineering Applications of Artificial Intelligence (EAAI) 2025, Impact Factor: 7.5 [Certificate]
Information Processing and Management (IPM) 2025, Impact Factor: 7.4 [Certificate]
IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2025, Impact Factor: 8.4 [Certificate]
IEEE Transactions on Emerging Topics in Computing (TETC) 2025, Impact Factor: 5.4 [Certificate]

Awards

🏅 Won the Best Oral Presentation Award at the 11th Annual Conference of Vietnamese Young Scientists (ACVYS 2025) [Certificate]
🏅 Selected for the Jang Young Sil Fellowship Program, funded by KAIST, 2025. It is the most prestigious fellowship offered by KAIST (South Korea) to support world-class researchers [Certificate]
🏅 Won the Top 100 Best Korean National Research award, 2023 [Certificate]
🏅 Annual Encouragement Scholarship from Hanoi University of Science and Technology (HUST), awarded every semester to students with outstanding performance, 2009-2014
🏅 Won the Trang Nguyen Flower award for the best student among thousands of students at Giao Thuy B High School, Nam Dinh, Vietnam, 2009
🏅 Gold Medal: won First Prize in the Mathematics Contest for Grade 12 High School Students, Nam Dinh, Vietnam, 2009

Breaking News

2026-01-26 [ICLR 2026]: "A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers", paper has been accepted to ICLR 2026.
>> Oral: 2025-11-09 [🛑] [News]: Dr. Trung X. Pham delivered a talk titled “Masked Diffusion: The New Frontier of Multimodal Generative AI” at the 11th Annual Conference of Vietnamese Young Scientists (ACVYS 2025) held at Yonsei University, where he received the Best Oral Presentation Award.
2025-09-18 [NeurIPS 2025]: "Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation", paper has been accepted to NeurIPS 2025.
>> Spotlight: 2025-09-05 [🛑] [News]: Dr. Trung X. Pham has been officially awarded the F-2-7S Talent Visa by KAIST's President and the Korean government, recognizing his global expertise and enabling him to live and work in Korea.
2025-09-01 [🛑] [News]: Dr. Trung X. Pham has started his research as a Postdoctoral Researcher at KAIST.
2025-07-14 [Reviewers]: Dr. Trung X. Pham has accepted an invitation to serve as a Program Committee (PC) member for AAAI 2026. He will contribute to the paper review and selection process for this leading conference in artificial intelligence.
2025-06-25 [Reviewers]: Dr. Trung X. Pham has accepted an invitation to serve as a Program Committee (PC) member for WACV 2026. He will contribute to the paper review and selection process for this leading conference in computer vision research.
>> Spotlight: 2025-05-29 [🛑] [News]: 🎓 Trung X. Pham has successfully defended his Ph.D. thesis in the School of Electrical Engineering at KAIST. Thesis title: "Multimodal Masked Diffusion-based Generative Models: Innovations and Applications". This marks the completion of his doctoral journey and the beginning of his postdoctoral research career.
2025-05-15 [arXiv 2025]: A research paper on zero-shot object customization has been released on arXiv.
2025-05-01 [Reviewers]: Trung X. Pham has accepted an invitation to serve as a Program Committee (PC) member for NeurIPS 2025. He will contribute to the paper review and selection process for this leading conference in AI and neural information processing systems.
>> Spotlight: 2025-04-09 [🛑] [News]: 🏅 Trung X. Pham (Ph.D.) has been selected for the 2025 KAIST Jang Young Sil Fellowship Program (Postdoctoral Track). This prestigious program is KAIST’s most competitive fellowship, designed to support the development of world-class researchers. It attracts top talent from around the world.
2025-02-22 [CVPR 2025]: "ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On", paper has been accepted to CVPR 2025.
2025-01-10 [ICLR 2025]: "MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation", paper has been accepted to ICLR 2025.
>> Spotlight: 2024-08-27 [🛑] [News]: 🎓 Trung X. Pham has successfully defended his Ph.D. proposal in the School of Electrical Engineering at KAIST.
2024-08-08 [TBC 2024]: "ACDMSR: Accelerated conditional diffusion models for single image super-resolution", paper has been accepted to IEEE Transactions on Broadcasting (TBC) 2024.
2024-05-02 [ICML 2024]: "Cross-view Masked Diffusion Transformers for Person Image Synthesis", paper has been accepted to ICML 2024.
2024-03-25 [TCSVT 2024]: "Learning from multi-perception features for real-word image super-resolution", paper has been accepted to IEEE Transactions on Circuits and Systems for Video Technology (TCSVT) 2024.

Recent Publications

* denotes equal contribution. My first publication appeared in 2018.

[2026] A Hidden Semantic Bottleneck in Conditional Embeddings of Diffusion Transformers

Trung X. Pham, Kang Zhang, Ji Woo Hong and Chang D. Yoo

International Conference on Learning Representations (ICLR 2026), acceptance rate 28%, held in Brazil [OpenReview] [Code]

We present the first systematic study of conditional embeddings in diffusion transformers and uncover a notable redundancy: class-conditioned embeddings exhibit extreme angular similarity, exceeding 99% on ImageNet-1K, while continuous-condition tasks such as pose-guided image generation and video-to-audio generation reach over 99.9%.

[2025] Model-Guided Dual-Role Alignment for High-Fidelity Open-Domain Video-to-Audio Generation

Kang Zhang*, Trung X. Pham*, Suyeon Lee, Axi Niu, Arda Senocak, Joon Son Chung

Neural Information Processing Systems (NeurIPS 2025), acceptance rate 24.5%, held in California, United States of America [OpenReview] [Code]

A novel framework for vision-inspired audio generation that replaces traditional classifier-free guidance (CFG) training with model guidance. It outperforms existing state-of-the-art approaches with only 10% of the training data.

[2025] E-MD3C: Taming Masked Diffusion Transformers for Efficient Zero-Shot Object Customization

Trung X. Pham, Kang Zhang, Ji Woo Hong, Xuran Zheng, Chang D. Yoo

arXiv, February 2025 [OpenReview] [Code]

A novel, state-of-the-art framework for vision-inspired generation based on denoising masked diffusion transformers. It is optimized for model parameter size, memory consumption, and inference speed, and enables efficient zero-shot object customization without relying on the giant pre-trained diffusion models used by existing works.

[2025] MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation

Trung X. Pham*, Tri Ton*, and Chang D. Yoo

International Conference on Learning Representations (ICLR 2025), acceptance rate 32.1%, held in Singapore [OpenReview] [Code]

A novel, state-of-the-art framework for vision-guided open-domain sound generation based on denoising masked diffusion transformers. It is optimized for model parameter size, memory consumption, and inference speed, and generates sound efficiently without relying on pre-trained diffusion models.

[2025] ITA-MDT: Image-Timestep-Adaptive Masked Diffusion Transformer Framework for Image-Based Virtual Try-On

Ji Woo Hong, Tri Ton, Trung X. Pham, Gwanhyeong Koo, Sunjae Yoon, and Chang D. Yoo

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2025), acceptance rate 22.1%, held in the United States of America [OpenReview] [Code]

A novel, state-of-the-art framework for virtual try-on using masked diffusion models.

[2024] Cross-view Masked Diffusion Transformers for Person Image Synthesis

Trung X. Pham*, Kang Zhang*, and Chang D. Yoo

International Conference on Machine Learning (ICML 2024), acceptance rate 27.5%, held in Vienna, Austria [OpenReview] [Code]

A state-of-the-art framework for pose-guided human image synthesis using the cutting-edge technique of masked diffusion transformers.

[2024] ACDMSR: Accelerated conditional diffusion models for single image super-resolution

Axi Niu, Trung X. Pham, Kang Zhang, Jinqiu Sun, Yu Zhu, Qingsen Yan, In So Kweon, Yanning Zhang

IEEE Transactions on Broadcasting (TBC 2024), IF: 5.19 [Links IEEE]

A novel framework for speeding up image super-resolution.

[2024] Learning from multi-perception features for real-word image super-resolution

Axi Niu, Kang Zhang, Trung X. Pham, Pei Wang, Jinqiu Sun, In So Kweon, Yanning Zhang

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT 2024), IF: 8.4 [Links IEEE]

MPF-Net for Single Image Super-Resolution.

[2023] Self-supervised visual representation learning via residual momentum

Trung X. Pham, Axi Niu, Kang Zhang, Tee Joshua Tian Jin, Ji Woo Hong, Chang D Yoo

IEEE Access 2023, acceptance rate 30% [Links IEEE]

Introduction of residual momentum that significantly improves self-supervised learning frameworks.

[2023] DimCL: Dimensional Contrastive Learning for Improving Self-Supervised Learning

Thanh Nguyen*, Trung X. Pham*, Chaoning Zhang, Tung M Luu, Thang Vu, Chang D Yoo

IEEE Access 2023, acceptance rate 30% [Links IEEE]

Introduction of a new regularization that significantly improves self-supervised learning frameworks.

[2023] CDPMSR: Conditional diffusion probabilistic models for single image super-resolution

Axi Niu, Kang Zhang, Trung X. Pham, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

IEEE International Conference on Image Processing (ICIP) 2023, acceptance rate 47% [Links]

A conditional diffusion model for image super-resolution with a post-processing technique.

[2022] Deep learning-based noise robust flexible piezoelectric acoustic sensors for speech processing

Young Hoon Jung*, Trung X. Pham*, Dias Issa, Hee Seung Wang, Jae Hee Lee, Mingi Chung, Bo-Yeon Lee, Gwangsu Kim, Chang D Yoo, Keon Jae Lee

Nano Energy (IF: 19.0) 2022 [Links]

>> Top 100 best Korea national researches 2023 [Certificate]

Combines deep learning with a flexible piezoelectric acoustic sensor to achieve >99% speaker recognition accuracy.

[2022] How does SimSiam avoid collapse without negative samples? A unified understanding with self-supervised contrastive learning

Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Trung X. Pham, Chang D Yoo, In So Kweon

International Conference on Learning Representations (ICLR 2022), acceptance rate 32.9% [OpenReview]

An in-depth analysis of contrastive learning frameworks that clarifies the collapse issue.

[2022] Dual temperature helps contrastive learning without many negative samples: Towards understanding and simplifying MoCo

Chaoning Zhang*, Kang Zhang*, Trung X. Pham*, Axi Niu, Zhinan Qiao, Chang D Yoo, In So Kweon

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), acceptance rate 25.3% [Links]

A new loss function that removes the need for a massive number of negative samples in contrastive learning frameworks.

[2022] On the pros and cons of momentum encoder in self-supervised visual representation learning

Trung X. Pham, Chaoning Zhang, Axi Niu, Kang Zhang, Chang D Yoo

arXiv 2022 [Links]

An in-depth investigation of the pros and cons of EMA-based contrastive learning frameworks.

[2022] LAD: A hybrid deep learning system for benign paroxysmal positional vertigo disorders diagnostic

Trung X. Pham*, Jin Woong Choi*, Rusty John Lloyd Mina, Thanh Xuan Nguyen, Sultan Rizky Madjid, Chang D Yoo

IEEE Access 2022, acceptance rate 30% [Links IEEE] [Code] [Dataset]

Uses deep learning to diagnose BPPV disorders in hospital patients, with data from Chungnam National University Hospital.

[2021] Self-supervised Learning with Local Attention-Aware Feature

Trung X. Pham*, Rusty John Lloyd Mina, Dias Issa, Chang D. Yoo

arXiv 2021 [Links]

Learning data representations without any labels.

[2021] Robust MAML: Prioritization task buffer with adaptive learning process for model-agnostic meta-learning

Thanh Nguyen, Tung Luu, Trung X. Pham, Sanzhar Rakhimkul, Chang D Yoo

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021, acceptance rate 48.4% [Links IEEE]

Learning a meta-model with an adaptive learning rate scheme.

[2020] Learning augmentation network via influence functions

Donghoon Lee, Hyunsin Park, Trung X. Pham, Chang D Yoo

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), acceptance rate 22% [Links IEEE]

Learns data augmentation with a differentiable neural network guided by influence functions.

[2020] Modality shifting attention network for multi-modal video question answering

Junyeong Kim, Minuk Ma, Trung X. Pham, Kyungsu Kim, Chang D Yoo

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), acceptance rate 22% [Links IEEE]

Multi-modal video question answering with a modality shifting attention network.

[2019] Cascade RPN: Delving into high-quality region proposal network with adaptive convolution

Thang Vu, Hyunjun Jang, Trung X. Pham, Chang D Yoo

Advances in Neural Information Processing Systems (NeurIPS 2019), acceptance rate 21.6% [Links]

>> Spotlight (top 2.4%)

Significantly improves object detection on 2D images through a cascade region proposal network with adaptive convolution.

[2020] Flexible piezoelectric acoustic sensors and machine learning for speech processing

Young Hoon Jung, Seong Kwang Hong, Hee Seung Wang, Jae Hyun Han, Trung X. Pham, Hyunsin Park, Junyeong Kim, Sunghun Kang, Chang D Yoo, Keon Jae Lee

Advanced Materials 2020 (IF: 32) [Links]

Combines machine learning with flexible piezoelectric acoustic sensors for speech processing.

[2018] Fast and efficient image quality enhancement via desubpixel convolutional neural networks

Thang Vu, Cao Van Nguyen, Trung X. Pham, Tung M Luu, Chang D Yoo

Proceedings of the European Conference on Computer Vision (ECCV 2018), acceptance rate 31.8% [Links]

An efficient framework for image super-resolution.

[2019] Short Convolutional Neural Network and MFCCs for Accurate Speaker Recognition Systems

Trung X. Pham and Chang D Yoo

The 34th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2019) [Links]

A lightweight, accurate, and efficient deep neural network for speaker recognition systems.


Collaborations

I am open to collaborating with researchers on various topics in Deep Learning, Machine Learning, and AI, including but not limited to Computer Vision, Generative AI, Video/Image/Audio Processing, and Natural Language Processing (NLP). Feel free to contact me at: trungpx@kaist.ac.kr or phamxuantrungbk@gmail.com