I am Jiaming Xu (许珈铭), a third-year Ph.D. student supervised by Prof. Guohao Dai (戴国浩) in the School of Computer Science, Shanghai Jiao Tong University (上海交通大学计算机学院), and Shanghai Innovation Institute (上海创智学院). I received my Bachelor's degree in 2023 from the School of Computer Science and Technology, Xidian University (西安电子科技大学计算机科学与技术学院), supervised by Prof. Nannan Wang (王楠楠). I previously interned at Infinigence AI (无问芯穹) and still collaborate closely with Dr. Xiuhong Li (李秀红) there.

My research focuses on efficient machine learning systems (MLSys), primarily efficient AI inference (e.g., LLMs, sparse computing, embodied AI, multimodal models) through the co-design of algorithms (e.g., quantization, pruning, speculative decoding) and systems (e.g., kernel design, memory management, dataflow design, heterogeneous computing). I have published 10+ papers at top international conferences and journals such as IEEE TCAD, ASPLOS, ISCA, AAAI, MLSys, and DAC.

🔥 News

2025.11: 🎉🎉 SpeContext was accepted by ASPLOS 2026! The acceptance rate is only 10%! Looking forward to meeting you next March in Pittsburgh, USA!
2025.11: 🎉🎉 SpecDiff and MoSs were accepted by AAAI 2026, and SpecDiff was selected for an Oral presentation! Looking forward to meeting you next January in Singapore!
2025.10: 🎉🎉 MARCA-v2 was accepted by IEEE TCAD!
2025.10: 🎉🎉 Congratulations! I was awarded the National Scholarship (Ph.D. students). This is my fourth National Scholarship.
2025.09: 🎉🎉 We released SpecDiff and SpecPrune-VLA for accelerating diffusion models and VLA models.
2025.09: 🎉🎉 Two papers were accepted by ASP-DAC 2026. Looking forward to meeting you next January in Hong Kong!
2025.08: 🎤🎤 I gave an invited talk (大模型推理软硬件协同优化, "Efficient LLM Inference via Hardware-Software Co-design") in Ordos, China. Thanks to CCF-HPC 2025 for the invitation!

👥 Team

Now I lead the system team (DAI-Sys) in our lab. The team currently consists of 14 students: 3 Ph.D. students, 2 master's students, and 9 undergraduates. I really enjoy working with them. I am looking for students who are excited to tackle AI efficiency problems from algorithm, modeling, and system/hardware perspectives to join us. You can contact me via email (jiamingxu@sjtu.edu.cn) with your bio, or find me by searching for 'MathsCode' on social media (e.g., RedNote (小红书), Zhihu (知乎)). We welcome students from all over the world and support both online and offline collaboration.

Now

Jiaming Xu (许珈铭): third-year Ph.D. student at Shanghai Jiao Tong University and Shanghai Innovation Institute
Yaoxiu Lian (廉瑶秀): fourth-year Ph.D. student at Shanghai Jiao Tong University
Yongkang Zhou (周永康): first-year Ph.D. student at Shanghai Jiao Tong University and Shanghai Innovation Institute
Longsheng Zhou (周龙升): second-year master's student at University of Science and Technology of China
Jiayi Pan (潘佳一): second-year master's student at Shanghai Jiao Tong University
Tianlang Zhao (赵天朗): fourth-year undergraduate at Shanghai Jiao Tong University
Hanzhen Wang (王翰楨): third-year undergraduate at Shanghai Jiao Tong University
Yifan Jiao (焦一帆): second-year undergraduate at Shanghai Jiao Tong University
Mingyi Xu (徐铭怿): second-year undergraduate at Shanghai Jiao Tong University
Chengze Yuan (袁诚泽): second-year undergraduate at Shanghai Jiao Tong University
Qiming Chen (程淇铭): second-year undergraduate at East China Normal University
Jiewen Xiao (肖杰文): second-year undergraduate at Shanghai Jiao Tong University
Ziying Wu (吴梓莹): second-year undergraduate at Shanghai Jiao Tong University
Shuhuan Li (李姝浣): first-year undergraduate at Shanghai Jiao Tong University

Previous

Siming Chen (陈思铭, 2024.10~2025.06): fourth-year undergraduate at Lanzhou University
Junyi Wu (吴俊逸, 2024.01~2025.03): third-year undergraduate at Shanghai Jiao Tong University

📝 Publications

ASPLOS 2026

SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs

Jiaming Xu, Jiayi Pan, Hanzhen Wang, Yongkang Zhou, Jiancai Ye, Yu Wang, Guohao Dai

ASPLOS 2026 (CCF-A)

Arxiv

ISCA 2025

SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting

Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai

ISCA 2025 (CCF-A)

Paper | Code | 机器之心

IEEE TCAD 2025

Enabling Efficient Sparse Multiplications on GPUs with Heuristic Adaptability

Jiaming Xu*, Shan Huang*, Jinhao Li, Guyue Huang, Yuan Xie, Yu Wang, Guohao Dai

IEEE TCAD 2025 (CCF-A)

Paper | Project | Code

MLSys 2024

FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics

Ke Hong*, Guohao Dai*, Jiaming Xu*, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong and Yu Wang

MLSys 2024 (non-CCF)

Paper | Arxiv | 机器之心

AAAI 2026

SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation

Jiayi Pan*, Jiaming Xu*, Yongkang Zhou, Guohao Dai

AAAI 2026 Oral (CCF-A)

Arxiv

Arxiv

SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning

Hanzhen Wang*, Jiaming Xu*, Jiayi Pan, Yongkang Zhou, Guohao Dai

Arxiv

ASP-DAC 2026

SpAct-NDP: Efficient LLM Inference via Sparse Activation on NDP-GPU Heterogeneous Architecture

Jiaming Xu*, Tongxin Xie*, Yongkang Zhou, Jinhao Li, Yaoxiu Lian, Zhenhua Zhu, Yu Wang, Guohao Dai

ASP-DAC 2026 (CCF-C)

Arxiv

ASP-DAC 2025

Accelerator for LLM-Enhanced GNN with Product Quantization and Unified Indexing

Jiaming Xu*, Jinhao Li*, Jun Liu, Hao Zhou and Guohao Dai

ASP-DAC 2025 (CCF-C)

Paper

ICCAD 2024

Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization

Jinhao Li*, Jiaming Xu*, Shiyao Li, Shan Huang, Jun Liu, Yaoxiu Lian and Guohao Dai

ICCAD 2024 (CCF-B)

Paper

Arxiv

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective

Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai

Arxiv | Code | Promotion

Arxiv

A Survey on Efficient Inference for Large Language Models

Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang

Arxiv | 机器之心

IEEE TCAD 2026 (CCF-A), MARCA-v2: Mamba Accelerator with Complementary State Space Model Sparsity and Reconfigurable Architecture, Jinhao Li*, Shan Huang*, Jiaming Xu, Jun Liu, Ningyi Xu, Guohao Dai
IEEE TC 2025 (CCF-A), FlashDecoding++Next: High Throughput LLM Inference with Latency and Memory Optimization, Guohao Dai, Ke Hong, Qiuli Mao, Xiuhong Li, Jiaming Xu, Haofeng Huang, Hongtu Xia, Xuefei Ning, Shengen Yan, Yun Liang, Yu Wang
AAAI 2026 (CCF-A), MoSs: Mixture of Scales for Efficient High-Resolution Autoregressive Image Generation, Yaoxiu Lian, Liang Hao, Zhihong Gou, Yijia Zhang, Jiaming Xu, Guohao Dai, Ningyi Xu
ASP-DAC 2026 (CCF-C), BalanceGS: Algorithm-System Co-design for Efficient 3D Gaussian Splatting Training on GPU, Junyi Wu*, Jiaming Xu*, Jinhao Li, Yongkang Zhou, Jiayi Pan, Xingyang Li, Guohao Dai
CIKM 2025 (CCF-B), SG-Filter: Enhancing Similar Text Retrieval via Hierarchical Summarized-Semantic Index and Adaptive Filtering, Jiancai Ye*, Jun Liu*, Haoyu Zhang, Maojia Sheng, Tao Yang, Jiaming Xu, Jinhao Li, Yu Wang, Guohao Dai
DAC 2025 (CCF-A), A Cross-model Fusion-aware Framework for Optimizing (gather-matmul-scatter)s Workload, Yaoxiu Lian, Zhihong Gou, Yibo Han, Zhongming Yu, Jiaming Xu, Sheng Yuan, Zhilin Pei, Xingcheng Zhang, Ningyi Xu and Guohao Dai
DATE 2025 (CCF-B), DyLGNN: Efficient LM-GNN Fine-tuning with Dynamic Node Partitioning, Low-degree Sparsity, and Asynchronous Sub-batch, Zhen Yu*, Jinhao Li*, Jiaming Xu, Shan Huang, Jiancai Ye, Ningyi Xu and Guohao Dai
ASP-DAC 2025 (CCF-C), LLSM: LLM-enhanced Logic Synthesis Model with EDA-guided CoT Prompting, Hybrid Embedding and AIG-tailored Acceleration, Shan Huang*, Jinhao Li*, Zhen Yu, Jiancai Ye, Jiaming Xu, Ningyi Xu and Guohao Dai
ICCAD 2024 (CCF-B), MARCA: Mamba Accelerator with Reconfigurable Architecture, Jinhao Li*, Shan Huang*, Jiaming Xu, Jun Liu, Li Ding, Ningyi Xu and Guohao Dai
ICCAD 2023 (CCF-B), TSTC: Two-level Sparsity Tensor Core Enabling both Algorithm Flexibility and Hardware Efficiency, Jun Liu, Guohao Dai, Hao Xia, Lidong Guo, Xiangsheng Shi, Jiaming Xu, Huazhong Yang and Yu Wang

👨‍💻 Internship Experience

Research Intern @ Department of Foundation Large Models, 2012 Labs

Date: February 2026 - Now

Supervisor: Dr. Zhongzhe Hu

Research Direction:

Long Context Optimization in Large-scale LLM Inference
Design and Optimization on Multi-token Prediction in Large-scale LLM Inference

Research Intern @ Department of AI Inference, Infinigence AI

Date: April 2023 - April 2024

Supervisor: Dr. Xiuhong Li

Research Direction:

GPU Optimization on LLM Operator Kernels: FlashDecoding++ [MLSys 2024]
Algorithm and System Co-design on LLM Edge-side Inference: SpecEE [ISCA 2025], SpeContext [ASPLOS 2026]
Communication Optimization on Large-scale MoE LLM Inference

🎖 Honors and Awards

2025.10 National Scholarship (Top 1%), Ministry of Education of the People's Republic of China.
2023.06 Outstanding Graduate, Shaanxi Province.
2023.06 Outstanding Graduate, Xidian University.
2023.06 Graduate Star (1/10), Xidian University.
2022.12 Thanks for the Modern Scientist Scholarship (感恩近现代科学家奖学金) (1/12), Xidian University.
2022.12 Principal's Scholarship (校长奖学金) (1/5), Xidian University.
2022.12 National Scholarship (Top 1%), Ministry of Education of the People's Republic of China.
2022.10 Top 1.3% Worldwide, IEEEXtreme Programming Competition.
2022.10 Huawei Intelligent Base Scholarship (华为智能基座奖学金) (1/10), Xidian University.
2021.12 National Scholarship (Top 1%), Ministry of Education of the People's Republic of China.
2021.11 Silver Medal, The ICPC International Collegiate Programming Contest of Shaanxi Province.
2021.10 Huawei Intelligent Base Scholarship (华为智能基座奖学金) (1/10), Xidian University.
2020.12 National Scholarship (Top 1%), Ministry of Education of the People's Republic of China.
2020.11 Bronze Medal, The ICPC International Collegiate Programming Contest of Shaanxi Province.

📖 Education

2023.06 - 2028.06 (expected), School of Computer Science, Shanghai Jiao Tong University
2019.09 - 2023.06, School of Computer Science and Technology, Xidian University

💬 Presentation

2026.01, [Oral Presentation] SpAct-NDP: Efficient LLM Inference via Sparse Activation on NDP-GPU Heterogeneous Architecture, ASP-DAC 2026 @Hong Kong, China
2025.08, [Invited Talk] Efficient LLM inference via hardware-software codesign, CCF-HPC 2025 @Ordos, China
2025.06, [Oral Presentation] SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting, ISCA 2025 @Tokyo, Japan
2024.12, [Oral Presentation] Efficient LLM Inference on GPUs with Operator Optimization and Compilation, ChinaSys 2024 @Tianjin, China
2024.11, [Oral Presentation] MARCA: Mamba Accelerator with Reconfigurable Architecture, ICCAD 2024 @New York, USA
2024.11, [Oral Presentation] Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization, ICCAD 2024 @New York, USA
2024.11, [Oral Presentation] Towards Floating Point-Based Attention-Free LLM: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications, ICCAD 2024 @New York, USA
2024.09, [Invited Talk] Efficient GPU computation in Large Language Models, CCF-HPC 2024 @Wuhan, China
2024.05, [Oral Presentation] FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics, MLSys 2024 @California, USA
2024.03, [Invited Talk] NVIDIA GPU and LLM Exploration, Flat GEMM Optimization, SJTU @Shanghai, China

💻 Service

2026.01 - Now, Reviewer, IEEE Transactions on Parallel and Distributed Systems (CCF-A, Top Journal)
2026.01 - Now, Reviewer, IEEE Transactions on Image Processing (CCF-A, Top Journal)
2025.09 - Now, Teaching Assistant, Thinking and Methodology in Programming (C++) (UG-CS1501-08) by Prof. Weiguo Gu
2025.02 - 2025.06, Teaching Assistant, Algorithms and Complexity (UG-CS2308-01) by Prof. Qingshen Ren
2025.02 - 2025.06, Teaching Assistant, Algorithm Design and Analysis (PG-CS7310H-033-M01) by Prof. Guohao Dai
2024.09 - 2025.01, Teaching Assistant, Thinking and Methodology in Programming (C++) (UG-CS1501-04) by Prof. Weiguo Gu
2024.05 - Now, IT Administrator of DAI-Lab, Shanghai Jiao Tong University
2022.09 - 2023.06, Huawei Campus Ambassador, Xidian University
2021.09 - 2022.09, Chairman of Huawei Innovation Club and Huawei Intelligent Base Club
