Date of Award
6-2025
Document Type
Thesis
Publisher
Santa Clara : Santa Clara University, 2025
Degree Name
Master of Science (MS)
Department
Computer Science and Engineering
First Advisor
Sean Choi
Abstract
Tokenization is a critical preprocessing step in machine learning pipelines, particularly for generative AI models such as large language models (LLMs). Despite significant advances in training and inference technologies, tokenization remains a notable performance bottleneck due to its CPU-bound nature and limited acceleration support. This thesis proposes NetTokenizer, a system designed to offload tokenization from CPUs to network data planes. Using both a software data plane built on the Data Plane Development Kit (DPDK) and hardware-based SmartNICs, NetTokenizer substantially improves inference efficiency. Experimental evaluations demonstrate up to a 95.9% reduction in tail latency and a 5.73x throughput increase compared to conventional CPU-based methods. This research provides a comprehensive analysis and practical foundation for deploying in-network tokenization solutions to enhance ML inference pipelines.
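To make the bottleneck concrete, the following is a minimal, illustrative sketch of the kind of CPU-bound work tokenization entails: greedy longest-match lookup of text spans against a fixed vocabulary, performed per request before inference. The vocabulary and the greedy matching scheme here are hypothetical examples for illustration only, not the tokenizer or design used in the thesis.

```python
def greedy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match tokenization against a fixed vocabulary.

    Illustrative only: shows the per-character scanning and dictionary
    lookups that make tokenization CPU-bound, which is the work
    NetTokenizer offloads to the network data plane.
    """
    max_len = max(map(len, vocab))  # longest vocabulary entry
    ids, i = [], 0
    while i < len(text):
        # Try the longest possible match first, shrinking the window.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No match: emit an unknown-token id and advance one char.
            ids.append(vocab["<unk>"])
            i += 1
    return ids

# Hypothetical toy vocabulary for demonstration.
vocab = {"<unk>": 0, "token": 1, "ization": 2, "net": 3}
print(greedy_tokenize("nettokenization", vocab))  # [3, 1, 2]
```

Because every input byte passes through loops and hash lookups like these on the host CPU, moving this matching into a DPDK pipeline or SmartNIC, as the thesis proposes, can remove it from the inference server's critical path.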
Recommended Citation
Mehta, Sudarshan, "Accelerating LLM Inference with Smart NIC Tokenization and Caching" (2025). Computer Science and Engineering Master's Theses. 52.
https://scholarcommons.scu.edu/cseng_mstr/52
