Pham Xuan Trung

Korea Advanced Institute of Science and Technology (KAIST)

Personal Information

Greetings! I am pursuing an integrated M.S.-Ph.D. degree in Electrical Engineering at KAIST (2018-2025) under the guidance of Professor Chang D. Yoo. My research interests span computer vision, deep learning, generative AI, and image/video/audio processing, including Self-supervised Learning, Generative Models, Diffusion Models, Multimodal Learning, Speech Processing, and Natural Language Processing.

I graduated in 2014 from the School of Electronics and Telecommunications at Hanoi University of Science and Technology (HUST, a top-tier university in Viet Nam), ranked 10th out of 526 students (top 1.9%). I then worked at VNPT Technology Corporation (more than 1,000 employees) in Hanoi for 3 years until 2018, mainly researching and deploying 2G, 3G, and 4G mobile communication network projects with vendors such as Alcatel-Lucent, Nokia Siemens, and SAMSUNG.

Email  /  CV  /  Google Scholar  /  LinkedIn


International Conferences and Journals

I take immense pride in my contributions to the academic community, having published my findings in top-tier venues: NeurIPS (1), ICML (1), ICLR (1), CVPR (3), ECCV (1), ICASSP (1), Advanced Materials (1, IF: 32), Nano Energy (1, IF: 19), IEEE TCSVT (1, IF: 8.4), IEEE TBD (1, IF: 4.5), IEEE Access (3, IF: 3.9).

Reviewer & Program Committee Service

I have served as a reviewer and program committee member for the following conferences and journals:

  1. CVPR 2023, 2024, 2025 [The Conference on Computer Vision and Pattern Recognition]
  2. NeurIPS 2023, 2024 [The Conference on Neural Information Processing Systems]
  3. ICML 2024 [The International Conference on Machine Learning]
  4. AAAI 2024, AAAI 2025 [The AAAI Conference on Artificial Intelligence]
  5. ICCV 2023 [The International Conference on Computer Vision]
  6. ECCV 2024 [The European Conference on Computer Vision]
  7. ICLR 2024, ICLR 2025 [The International Conference on Learning Representations]
  8. ACCV 2024 [The Asian Conference on Computer Vision]
  9. ICASSP 2024, ICASSP 2025 [The International Conference on Acoustics, Speech, and Signal Processing]
  10. AISTATS 2025 [The International Conference on Artificial Intelligence and Statistics]
  11. Neural Networks (NN) 2023, Impact Factor: 8.67
  12. IEEE Transactions on Multimedia (TMM) 2023, Impact Factor: 7.39
  13. Computer Vision and Image Understanding (CVIU) 2024, Impact Factor: 4.3
  14. ISPRS Journal of Photogrammetry and Remote Sensing (ISPRS) 2024, Impact Factor: 11.83

Awards

  • Won First Prize in the Mathematics Contest for Grade 12 High School Students, 2009
  • Won the Trang Nguyen Flower Award for the best student among thousands of students at Giao Thuy B High School, 2009
  • Annual Encouragement Scholarship from Hanoi University of Science and Technology (HUST), awarded to excellent students for outstanding performance every semester, 2009 – 2014
  • Won the Top 100 Best Korea National Research award, 2023 [Certificate]

Recent Publications

* denotes equal contribution. My publication record begins in 2018.

MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation

Trung X. Pham*, Tri Ton*, and Chang D. Yoo

arXiv, Oct 2024 [OpenReview] [Code]


A novel state-of-the-art framework for vision-guided open-domain sound generation, optimized for model parameter size, memory consumption, and inference speed. It uses denoising masked diffusion transformers and generates sound efficiently without relying on pre-trained diffusion models.


Cross-view Masked Diffusion Transformers for Person Image Synthesis

Trung X. Pham*, Kang Zhang*, and Chang D. Yoo

The Forty-first International Conference on Machine Learning (ICML 2024), acceptance rate 27.5% [OpenReview] [Code]


A state-of-the-art framework for pose-guided human image synthesis using the cutting-edge technique of masked diffusion transformers.


ACDMSR: Accelerated conditional diffusion models for single image super-resolution

Axi Niu, Trung X. Pham, Kang Zhang, Jinqiu Sun, Yu Zhu, Qingsen Yan, In So Kweon, Yanning Zhang

IEEE Transactions on Broadcasting (TBD 2024), IF: 5.19 [Links IEEE]


A novel framework for speeding up image super-resolution.


Learning from multi-perception features for real-word image super-resolution

Axi Niu, Kang Zhang, Trung X. Pham, Pei Wang, Jinqiu Sun, In So Kweon, Yanning Zhang

IEEE Transactions on Circuits and Systems for Video Technology (TCSVT 2024), IF: 8.4 [Links IEEE]


MPF-Net for Single Image Super-Resolution.


Self-supervised visual representation learning via residual momentum

Trung X. Pham, Axi Niu, Kang Zhang, Tee Joshua Tian Jin, Ji Woo Hong, Chang D Yoo

IEEE Access 2023, acceptance rate 30% [Links IEEE]


Introduction of residual momentum that significantly improves self-supervised learning frameworks.


DimCL: Dimensional Contrastive Learning for Improving Self-Supervised Learning

Thanh Nguyen*, Trung X. Pham*, Chaoning Zhang, Tung M Luu, Thang Vu, Chang D Yoo

IEEE Access 2023, acceptance rate 30% [Links IEEE]


Introduction of a new regularization that significantly improves self-supervised learning frameworks.


CDPMSR: Conditional diffusion probabilistic models for single image super-resolution

Axi Niu, Kang Zhang, Trung X. Pham, Jinqiu Sun, Yu Zhu, In So Kweon, Yanning Zhang

IEEE International Conference on Image Processing (ICIP) 2023, acceptance rate 47% [Links]


A conditional diffusion model for image super-resolution with a post-processing technique.


Deep learning-based noise robust flexible piezoelectric acoustic sensors for speech processing

Young Hoon Jung*, Trung X. Pham*, Dias Issa, Hee Seung Wang, Jae Hee Lee, Mingi Chung, Bo-Yeon Lee, Gwangsu Kim, Chang D Yoo, Keon Jae Lee

Nano Energy (IF: 19.0) 2022 [Links]

>> Top 100 Best Korea National Research award, 2023 [Certificate]


Combines deep learning with a flexible piezoelectric acoustic sensor to achieve >99% speaker recognition accuracy.


How does SimSiam avoid collapse without negative samples? A unified understanding with self-supervised contrastive learning

Chaoning Zhang, Kang Zhang, Chenshuang Zhang, Trung X. Pham, Chang D Yoo, In So Kweon

The Tenth International Conference on Learning Representations (ICLR 2022), acceptance rate 32.9% [OpenReview]


An in-depth analysis of contrastive learning frameworks that clarifies the collapse issue.


Dual temperature helps contrastive learning without many negative samples: Towards understanding and simplifying MoCo

Chaoning Zhang*, Kang Zhang*, Trung X. Pham*, Axi Niu, Zhinan Qiao, Chang D Yoo, In So Kweon

The IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2022), acceptance rate 25.3% [Links]


A new loss function that reduces the need for a massive number of negative samples in contrastive learning frameworks.


On the pros and cons of momentum encoder in self-supervised visual representation learning

Trung X. Pham, Chaoning Zhang, Axi Niu, Kang Zhang, Chang D Yoo

arXiv 2022 [Links]


An in-depth investigation of the pros and cons of EMA-based contrastive learning frameworks.


LAD: A hybrid deep learning system for benign paroxysmal positional vertigo disorders diagnostic

Trung X. Pham*, Jin Woong Choi*, Rusty John Lloyd Mina, Thanh Xuan Nguyen, Sultan Rizky Madjid, Chang D Yoo

IEEE Access 2022, acceptance rate 30% [Links IEEE] [Code]


Uses deep learning to diagnose BPPV disorders in hospital patients, with data from Chungnam National University Hospital.


Self-supervised Learning with Local Attention-Aware Feature

Trung X. Pham*, Rusty John Lloyd Mina, Dias Issa, Chang D. Yoo

arXiv 2021 [Links]


Learning data representations without any labels.


Robust MAML: Prioritization task buffer with adaptive learning process for model-agnostic meta-learning

Thanh Nguyen, Tung Luu, Trung X. Pham, Sanzhar Rakhimkul, Chang D Yoo

IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) 2021, acceptance rate 48.4% [Links IEEE]


Learning a meta-model with an adaptive learning rate scheme.


Learning augmentation network via influence functions

Donghoon Lee, Hyunsin Park, Trung X. Pham, Chang D Yoo

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), acceptance rate 22% [Links IEEE]


Learning data augmentation with a differentiable neural network guided by influence functions.


Modality shifting attention network for multi-modal video question answering

Junyeong Kim, Minuk Ma, Trung X. Pham, Kyungsu Kim, Chang D Yoo

Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR 2020), acceptance rate 22% [Links IEEE]


Multi-modal video question answering with a modality shifting attention network.


Cascade RPN: Delving into high-quality region proposal network with adaptive convolution

Thang Vu, Hyunjun Jang, Trung X. Pham, Chang D Yoo

Advances in Neural Information Processing Systems (NeurIPS 2019), acceptance rate 21.6% [Links]


>> Spotlight (top 2.4%)

Significantly improves object detection in 2D images with a cascaded region proposal network built on adaptive convolution.


Flexible piezoelectric acoustic sensors and machine learning for speech processing

Young Hoon Jung, Seong Kwang Hong, Hee Seung Wang, Jae Hyun Han, Trung X. Pham, Hyunsin Park, Junyeong Kim, Sunghun Kang, Chang D Yoo, Keon Jae Lee

Advanced Materials 2020 (IF: 32) [Links]


Combining AI/machine learning with flexible acoustic sensors for speech processing.


Fast and efficient image quality enhancement via desubpixel convolutional neural networks

Thang Vu, Cao Van Nguyen, Trung X. Pham, Tung M Luu, Chang D Yoo

Proceedings of the European Conference on Computer Vision (ECCV 2018), acceptance rate 31.8% [Links]


An efficient framework for image super-resolution.


Short Convolutional Neural Network and MFCCs for Accurate Speaker Recognition Systems

Trung X. Pham and Chang D Yoo

The 34th International Technical Conference on Circuits/Systems, Computers and Communications (ITC-CSCC 2019) [Links]


A lightweight, accurate, and efficient deep neural network for speaker recognition systems.


Collaborations

I am open to collaborating with researchers on various topics in Deep Learning, Machine Learning, and AI, including but not limited to Computer Vision, Audio Processing, and Natural Language Processing (NLP). Feel free to contact me at: trungpx@kaist.ac.kr or phamxuantrungbk@gmail.com