I am Jiaming Xu (许珈铭), a third-year Ph.D. student supervised by Prof. Guohao Dai (戴国浩) in the School of Computer Science, Shanghai Jiao Tong University (上海交通大学计算机学院) and the Shanghai Innovation Institute (上海创智学院). Previously, I obtained my Bachelor’s degree in 2023 from the School of Computer Science and Technology, Xidian University (西安电子科技大学计算机科学与技术学院), supervised by Prof. Nannan Wang (王楠楠). I was formerly an intern at Infinigence AI (无问芯穹) and still collaborate closely with Xiuhong Li (李秀红) there.

My research focuses on efficient machine learning systems (MLSys), primarily efficient AI inference (e.g., LLMs, sparse computing, embodied AI, multimodal models) through algorithm (e.g., quantization, pruning, speculative decoding) and system (e.g., kernel design, memory management, dataflow design, heterogeneous computing) co-design. I have published 10+ papers at top international conferences and journals such as IEEE TCAD, ISCA, MLSys, and DAC.

🔥 News

  • 2025.10:  🎉🎉 MARCA-v2 was accepted by IEEE TCAD!
  • 2025.10:  🎉🎉 Congratulations! I was awarded the National Scholarship (Ph.D. Students). This is my fourth National Scholarship.
  • 2025.09:  🎉🎉 We released SpecDiff and SpecPrune-VLA for diffusion and VLA model acceleration.
  • 2025.09:  🎉🎉 Two papers were accepted by ASP-DAC 2026. Look forward to meeting you next January in Hong Kong!
  • 2025.08:  🎤🎤 I gave an invited talk, "Efficient LLM inference via hardware-software codesign" (大模型推理软硬件协同优化), at CCF-HPC 2025 in Ordos, China. Thanks to CCF-HPC 2025 for the invitation.

👥 Team

I now lead the system team (DAI-Sys) in our lab. Our team currently consists of 10 students, including 4 Ph.D. students, 1 master's student, and 5 undergraduates. I am very happy to work with them. I am looking for students who are excited to tackle efficiency problems in AI from algorithm, modeling, and system/hardware perspectives to join us (the DAI-Sys group is recruiting).

Now

  • Jiaming Xu (许珈铭): third year Ph.D student in Shanghai Jiao Tong University and Shanghai Innovation Institute
  • Yaoxiu Lian (廉瑶秀): fourth year Ph.D student in Shanghai Jiao Tong University
  • Yongkang Zhou (周永康): first year Ph.D student in Shanghai Jiao Tong University and Shanghai Innovation Institute
  • Kele Shao (邵可乐): first year Ph.D student in Westlake University and Shanghai Innovation Institute
  • Jiayi Pan (潘佳一): second year master student in Shanghai Jiao Tong University
  • Tianlang Zhao (赵天朗): fourth year undergraduate in Shanghai Jiao Tong University
  • Hanzhen Wang (王翰楨): third year undergraduate in Shanghai Jiao Tong University
  • Haotian Fang (方皓天): second year undergraduate in Shanghai Jiao Tong University
  • Qiming Cheng (程淇铭): second year undergraduate in East China Normal University
  • Chengze Yuan (袁诚泽): second year undergraduate in Shanghai Jiao Tong University

Previous

  • Siming Chen (陈思铭, 2024.10~2025.6): fourth year undergraduate in Lanzhou University
  • Junyi Wu (吴俊逸, 2024.1~2025.3): third year undergraduate in Shanghai Jiao Tong University

📝 Publications

ISCA 2025

SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting

Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai

ISCA 2025 (CCF-A)

Paper | Code | 机器之心

IEEE TCAD 2025

Enabling Efficient Sparse Multiplications on GPUs with Heuristic Adaptability

Jiaming Xu*, Shan Huang*, Jinhao Li, Guyue Huang, Yuan Xie, Yu Wang, Guohao Dai

IEEE TCAD 2025 (CCF-A)

Paper | Project | Code

MLSys 2024

FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics

Ke Hong*, Guohao Dai*, Jiaming Xu*, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong and Yu Wang

MLSys 2024 (non-CCF)

Paper | Arxiv | 机器之心

Arxiv

SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation

Jiayi Pan*, Jiaming Xu*, Yongkang Zhou, Guohao Dai

Arxiv

Arxiv

SpAct-NDP: Efficient LLM Inference via Sparse Activation on NDP-GPU Heterogeneous Architecture

Jiaming Xu*, Tongxin Xie*, Yongkang Zhou, Jinhao Li, Yaoxiu Lian, Zhenhua Zhu, Yu Wang, Guohao Dai

ASP-DAC 2026 (CCF-C)

Arxiv

Accelerator for LLM-Enhanced GNN with Product Quantization and Unified Indexing

Jiaming Xu*, Jinhao Li*, Jun Liu, Hao Zhou and Guohao Dai

ASP-DAC 2025 (CCF-C)

Paper

ICCAD 2024

Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization

Jinhao Li*, Jiaming Xu*, Shiyao Li, Shan Huang, Jun Liu, Yaoxiu Lian and Guohao Dai

ICCAD 2024 (CCF-B)

Paper

Arxiv

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective

Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai

Arxiv | Code | Promotion

Arxiv

A Survey on Efficient Inference for Large Language Models

Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

Arxiv | 机器之心

🎖 Honors and Awards

  • 2025.10 National Scholarship (Top 1%), Ministry of Education of The People’s Republic of China.
  • 2023.06 Outstanding Graduate, Shaanxi.
  • 2023.06 Outstanding Graduate, Xidian University.
  • 2023.06 Graduate Star (1/10), Xidian University.
  • 2022.12 Thanks for the Modern Scientist Scholarship (感恩近现代科学家奖学金) (1/12), Xidian University.
  • 2022.12 Principal’s Scholarship (校长奖学金) (1/5), Xidian University.
  • 2022.12 National Scholarship (Top 1%), Ministry of Education of The People’s Republic of China.
  • 2022.10 Top 1.3% of World, IEEEXtreme Programming Competition.
  • 2022.10 Huawei Intelligent Base Scholarship (华为智能基座奖学金) (1/10), Xidian University.
  • 2021.12 National Scholarship (Top 1%), Ministry of Education of The People’s Republic of China.
  • 2021.11 Silver Medal, The ICPC International Collegiate Programming Contest of Shaanxi Province.
  • 2021.10 Huawei Intelligent Base Scholarship (华为智能基座奖学金) (1/10), Xidian University.
  • 2020.12 National Scholarship (Top 1%), Ministry of Education of The People’s Republic of China.
  • 2020.11 Bronze Medal, The ICPC International Collegiate Programming Contest of Shaanxi Province.

📖 Education

  • 2023.06 - 2028.06 (expected), School of Computer Science, Shanghai Jiao Tong University
  • 2019.09 - 2023.06, School of Computer Science and Technology, Xidian University

💬 Presentation

  • 2025.08, [Invited Talk] Efficient LLM inference via hardware-software codesign, CCF-HPC 2025 @Ordos, China
  • 2025.06, [Oral Presentation] Accelerating Large Language Model Inference with Speculative Early Exiting, ISCA 2025 @Tokyo, Japan
  • 2024.12, [Oral Presentation] Efficient LLM Inference on GPUs with Operator Optimization and Compilation, Chinasys 2024 @Tianjin, China
  • 2024.11, [Oral Presentation] MARCA: Mamba Accelerator with Reconfigurable Architecture, ICCAD 2024 @New York, USA
  • 2024.11, [Oral Presentation] Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization, ICCAD 2024 @New York, USA
  • 2024.11, [Oral Presentation] Towards Floating Point-Based Attention-Free LLM: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications, ICCAD 2024 @New York, USA
  • 2024.09, [Invited Talk] Efficient GPU computation in Large Language Models, CCF-HPC 2024 @Wuhan, China
  • 2024.05, [Oral Presentation] FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics, MLSys 2024 @California, USA
  • 2024.03, [Invited Talk] NVIDIA GPU and LLM Exploration, Flat GEMM Optimization, SJTU @Shanghai, China

💻 Service

  • 2025.09 - Now, Teaching Assistant, Thinking and Methodology in Programming (C++) (UG-CS1501-08) by Prof. Weiguo Gu
  • 2025.02 - 2025.06, Teaching Assistant, Algorithms and Complexity (UG-CS2308-01) by Prof. Qingshen Ren
  • 2025.02 - 2025.06, Teaching Assistant, Algorithm Design and Analysis (PG-CS7310H-033-M01) by Prof. Guohao Dai
  • 2024.09 - 2025.01, Teaching Assistant, Thinking and Methodology in Programming (C++) (UG-CS1501-04) by Prof. Weiguo Gu
  • 2024.05 - Now, IT Administrator of DAI-Lab, Shanghai Jiao Tong University
  • 2022.09 - 2023.06, Huawei Campus Ambassador, Xidian University
  • 2021.09 - 2022.09, Chairman of Huawei Innovation Club and Huawei Intelligent Base Club