I am Jiaming Xu (许珈铭), a third-year Ph.D. student supervised by Prof. Guohao Dai (戴国浩) in the School of Computer Science, Shanghai Jiao Tong University (上海交通大学计算机学院) and Shanghai Innovation Institute (上海创智学院). I obtained my Bachelor's degree in 2023 from the School of Computer Science and Technology, Xidian University (西安电子科技大学计算机科学与技术学院), supervised by Prof. Nannan Wang (王楠楠). I was previously an intern at Infinigence AI (无问芯穹) and continue to collaborate closely with Xiuhong Li (李秀红) there.
My research focuses on efficient machine learning systems (MLSys), primarily efficient AI inference (e.g., LLMs, sparse computing, embodied AI, multimodal models) through algorithm (e.g., quantization, pruning, speculative decoding) and system (e.g., kernel design, memory management, dataflow design, heterogeneous computing) co-design. I have published 10+ papers at top international conferences and journals such as IEEE TCAD, ISCA, MLSys, and DAC.
🔥 News
- 2025.10: 🎉🎉 MARCA-v2 was accepted by IEEE TCAD!
- 2025.10: 🎉🎉 I was awarded the National Scholarship (Ph.D. Students). This is my fourth National Scholarship.
- 2025.09: 🎉🎉 We release SpecDiff and SpecPrune-VLA for diffusion and VLA model acceleration.
- 2025.09: 🎉🎉 Two papers were accepted by ASP-DAC 2026. Looking forward to meeting you next January in Hong Kong!
- 2025.08: 🎤🎤 I gave an invited talk, "Efficient LLM inference via hardware-software codesign" (大模型推理软硬件协同优化), in Ordos, China. Thanks to CCF-HPC 2025 for the invitation.
👥 Team
I now lead the system team (DAI-Sys) in our lab. Our team currently consists of 10 students, including 4 Ph.D. students, 1 master's student, and 5 undergraduates. I am very happy to work with them. I am looking for students who are excited to tackle efficiency problems in AI from algorithm, modeling, and system/hardware perspectives to join us (the DAI-Sys team is recruiting).
Now
- Jiaming Xu (许珈铭): third year Ph.D student in Shanghai Jiao Tong University and Shanghai Innovation Institute
- Yaoxiu Lian (廉瑶秀): fourth year Ph.D student in Shanghai Jiao Tong University
- Yongkang Zhou (周永康): first year Ph.D student in Shanghai Jiao Tong University and Shanghai Innovation Institute
- Kele Shao (邵可乐): first year Ph.D student in Westlake University and Shanghai Innovation Institute
- Jiayi Pan (潘佳一): second year master student in Shanghai Jiao Tong University
- Tianlang Zhao (赵天朗): fourth year undergraduate in Shanghai Jiao Tong University
- Hanzhen Wang (王翰楨): third year undergraduate in Shanghai Jiao Tong University
- Haotian Fang (方皓天): second year undergraduate in Shanghai Jiao Tong University
- Qiming Cheng (程淇铭): second year undergraduate in East China Normal University
- Chengze Yuan (袁诚泽): second year undergraduate in Shanghai Jiao Tong University
Previous
- Siming Chen (陈思铭, 2024.10~2025.6): fourth year undergraduate in Lanzhou University
- Junyi Wu (吴俊逸, 2024.1~2025.3): third year undergraduate in Shanghai Jiao Tong University
📝 Publications

SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
Jiaming Xu, Jiayi Pan, Yongkang Zhou, Siming Chen, Jinhao Li, Yaoxiu Lian, Junyi Wu, Guohao Dai
ISCA 2025 (CCF-A)

Enabling Efficient Sparse Multiplications on GPUs with Heuristic Adaptability
Jiaming Xu*, Shan Huang*, Jinhao Li, Guyue Huang, Yuan Xie, Yu Wang, Guohao Dai
IEEE TCAD 2025 (CCF-A)

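FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics
Ke Hong*, Guohao Dai*, Jiaming Xu*, Qiuli Mao, Xiuhong Li, Jun Liu, Kangdi Chen, Yuhan Dong and Yu Wang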
MLSys 2024 (non-CCF)

SpecPrune-VLA: Accelerating Vision-Language-Action Models via Action-Aware Self-Speculative Pruning
Hanzhen Wang*, Jiaming Xu*, Jiayi Pan, Yongkang Zhou, Guohao Dai

SpecDiff: Accelerating Diffusion Model Inference with Self-Speculation
Jiayi Pan*, Jiaming Xu*, Yongkang Zhou, Guohao Dai

SpAct-NDP: Efficient LLM Inference via Sparse Activation on NDP-GPU Heterogeneous Architecture
Jiaming Xu*, Tongxin Xie*, Yongkang Zhou, Jinhao Li, Yaoxiu Lian, Zhenhua Zhu, Yu Wang, Guohao Dai
ASP-DAC 2026 (CCF-C)

Accelerator for LLM-Enhanced GNN with Product Quantization and Unified Indexing
Jiaming Xu*, Jinhao Li*, Jun Liu, Hao Zhou and Guohao Dai
ASP-DAC 2025 (CCF-C)

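Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization
Jinhao Li*, Jiaming Xu*, Shiyao Li, Shan Huang, Jun Liu, Yaoxiu Lian and Guohao Dai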
ICCAD 2024 (CCF-B)

Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li, Jiaming Xu, Shan Huang, Yonghua Chen, Wen Li, Jun Liu, Yaoxiu Lian, Jiayi Pan, Li Ding, Hao Zhou, Yu Wang, Guohao Dai

A Survey on Efficient Inference for Large Language Models
Zixuan Zhou, Xuefei Ning, Ke Hong, Tianyu Fu, Jiaming Xu, Shiyao Li, Yuming Lou, Luning Wang, Zhihang Yuan, Xiuhong Li, Shengen Yan, Guohao Dai, Xiao-Ping Zhang, Yuhan Dong, Yu Wang
- IEEE TCAD 2026 (CCF-A), MARCA-v2: Mamba Accelerator with Complementary State Space Model Sparsity and Reconfigurable Architecture, Jinhao Li*, Shan Huang*, Jiaming Xu, Jun Liu, Ningyi Xu, Guohao Dai
- IEEE TC 2025 (CCF-A), FlashDecoding++Next: High Throughput LLM Inference with Latency and Memory Optimization, Guohao Dai, Ke Hong, Qiuli Mao, Xiuhong Li, Jiaming Xu, Haofeng Huang, Hongtu Xia, Xuefei Ning, Shengen Yan, Yun Liang, Yu Wang
- ASP-DAC 2026 (CCF-C), BalanceGS: Algorithm-System Co-design for Efficient 3D Gaussian Splatting Training on GPU, Junyi Wu*, Jiaming Xu*, Jinhao Li, Yongkang Zhou, Jiayi Pan, Xingyang Li, Guohao Dai
- CIKM 2025 (CCF-B), SG-Filter: Enhancing Similar Text Retrieval via Hierarchical Summarized-Semantic Index and Adaptive Filtering, Jiancai Ye*, Jun Liu*, Haoyu Zhang, Maojia Sheng, Tao Yang, Jiaming Xu, Jinhao Li, Yu Wang, Guohao Dai
- DAC 2025 (CCF-A), A Cross-model Fusion-aware Framework for Optimizing (gather-matmul-scatter)s Workload, Yaoxiu Lian, Zhihong Gou, Yibo Han, Zhongming Yu, Jiaming Xu, Sheng Yuan, Zhilin Pei, Xingcheng Zhang, Ningyi Xu and Guohao Dai
- DATE 2025 (CCF-B), DyLGNN: Efficient LM-GNN Fine-tuning with Dynamic Node Partitioning, Low-degree Sparsity, and Asynchronous Sub-batch, Zhen Yu*, Jinhao Li*, Jiaming Xu, Shan Huang, Jiancai Ye, Ningyi Xu and Guohao Dai
- ASP-DAC 2025 (CCF-C), LLSM: LLM-enhanced Logic Synthesis Model with EDA-guided CoT Prompting, Hybrid Embedding and AIG-tailored Acceleration, Shan Huang*, Jinhao Li*, Zhen Yu, Jiancai Ye, Jiaming Xu, Ningyi Xu and Guohao Dai
- ICCAD 2024 (CCF-B), MARCA: Mamba Accelerator with Reconfigurable Architecture, Jinhao Li*, Shan Huang*, Jiaming Xu, Jun Liu, Li Ding, Ningyi Xu and Guohao Dai
- ICCAD 2023 (CCF-B), TSTC: Two-level Sparsity Tensor Core Enabling both Algorithm Flexibility and Hardware Efficiency, Jun Liu, Guohao Dai, Hao Xia, Lidong Guo, Xiangsheng Shi, Jiaming Xu, Huazhong Yang and Yu Wang
🎖 Honors and Awards
- 2025.10 National Scholarship (Top 1%), Ministry of Education of The People’s Republic of China.
- 2023.06 Outstanding Graduates, Shaanxi.
- 2023.06 Outstanding Graduates, Xidian University.
- 2023.06 Graduate Star (1/10), Xidian University.
- 2022.12 Thanks for the Modern Scientist Scholarship (感恩近现代科学家奖学金) (1/12), Xidian University.
- 2022.12 Principal’s Scholarship (校长奖学金) (1/5), Xidian University.
- 2022.12 National Scholarship (Top 1%), Ministry of Education of The People’s Republic of China.
- 2022.10 Top 1.3% worldwide, IEEEXtreme Programming Competition.
- 2022.10 Huawei Intelligent Base Scholarship (华为智能基座奖学金) (1/10), Xidian University.
- 2021.12 National Scholarship (Top 1%), Ministry of Education of The People’s Republic of China.
- 2021.11 Silver Medal, The ICPC International Collegiate Programming Contest of Shaanxi Province.
- 2021.10 Huawei Intelligent Base Scholarship (华为智能基座奖学金) (1/10), Xidian University.
- 2020.12 National Scholarship (Top 1%), Ministry of Education of The People’s Republic of China.
- 2020.11 Bronze Medal, The ICPC International Collegiate Programming Contest of Shaanxi Province.
📖 Education
- 2023.06 - 2028.06 (expected), School of Computer Science, Shanghai Jiao Tong University
- 2019.09 - 2023.06, School of Computer Science and Technology, Xidian University
💬 Presentation
- 2025.08, [Invited Talk] Efficient LLM inference via hardware-software codesign, CCF-HPC 2025 @Ordos, China
- 2025.06, [Oral Presentation] Accelerating Large Language Model Inference with Speculative Early Exiting, ISCA 2025 @Tokyo, Japan
- 2024.12, [Oral Presentation] Efficient LLM Inference on GPUs with Operator Optimization and Compilation, Chinasys 2024 @Tianjin, China
- 2024.11, [Oral Presentation] MARCA: Mamba Accelerator with Reconfigurable Architecture, ICCAD 2024 @New York, USA
- 2024.11, [Oral Presentation] Fast and Efficient 2-bit LLM Inference on GPU: 2/4/16-bit in a Weight Matrix with Asynchronous Dequantization, ICCAD 2024 @New York, USA
- 2024.11, [Oral Presentation] Towards Floating Point-Based Attention-Free LLM: Hybrid PIM with Non-Uniform Data Format and Reduced Multiplications, ICCAD 2024 @New York, USA
- 2024.09, [Invited Talk] Efficient GPU computation in Large Language Models, CCF-HPC 2024 @Wuhan, China
- 2024.05, [Oral Presentation] FlashDecoding++: Faster Large Language Model Inference with Asynchronization, Flat GEMM Optimization, and Heuristics, MLSys 2024 @California, USA
- 2024.03, [Invited Talk] NVIDIA GPU and LLM Exploration, Flat GEMM Optimization, SJTU @Shanghai, China
💻 Service
- 2025.09 - Now, Teaching Assistant, Thinking and Methodology in Programming (C++) (UG-CS1501-08) by Prof. Weiguo Gu
- 2025.02 - 2025.06, Teaching Assistant, Algorithms and Complexity (UG-CS2308-01) by Prof. Qingshen Ren
- 2025.02 - 2025.06, Teaching Assistant, Algorithm Design and Analysis (PG-CS7310H-033-M01) by Prof. Guohao Dai
- 2024.09 - 2025.01, Teaching Assistant, Thinking and Methodology in Programming (C++) (UG-CS1501-04) by Prof. Weiguo Gu
- 2024.05 - Now, IT Administrator of DAI-Lab, Shanghai Jiao Tong University
- 2022.09 - 2023.06, Huawei Campus Ambassador, Xidian University
- 2021.09 - 2022.09, Chairman of Huawei Innovation Club and Huawei Intelligent Base Club
