Date of Award
6-2025
Document Type
Thesis
Publisher
Santa Clara : Santa Clara University, 2025
Degree Name
Master of Science (MS)
Department
Computer Science and Engineering
First Advisor
Sean Choi
Abstract
Tokenization is a critical preprocessing step in machine learning pipelines, particularly for generative AI models such as large language models (LLMs). Despite significant advances in training and inference technologies, tokenization remains a notable performance bottleneck due to its CPU-bound nature and limited acceleration support. This thesis proposes NetTokenizer, a system designed to offload tokenization from CPUs to network data planes. Using both a software data plane built on the Data Plane Development Kit (DPDK) and hardware-based SmartNICs, NetTokenizer substantially improves inference efficiency. Experimental evaluations demonstrate up to a 95.9% reduction in tail latency and a 5.73x throughput increase compared to conventional CPU-based methods. This research provides a comprehensive analysis and practical foundation for deploying in-network tokenization solutions to enhance ML inference pipelines.
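To make the bottleneck concrete, the following is a minimal, illustrative sketch of the kind of CPU-bound work tokenization entails: greedy longest-match lookup of text spans against a fixed vocabulary, performed per request before inference. The vocabulary and the greedy matching scheme here are hypothetical examples for illustration only, not the tokenizer or design used in the thesis.

```python
def greedy_tokenize(text: str, vocab: dict[str, int]) -> list[int]:
    """Greedy longest-match tokenization against a fixed vocabulary.

    Illustrative only: shows the per-character scanning and dictionary
    lookups that make tokenization CPU-bound, which is the work
    NetTokenizer offloads to the network data plane.
    """
    max_len = max(map(len, vocab))  # longest vocabulary entry
    ids, i = [], 0
    while i < len(text):
        # Try the longest possible match first, shrinking the window.
        for j in range(min(len(text), i + max_len), i, -1):
            piece = text[i:j]
            if piece in vocab:
                ids.append(vocab[piece])
                i = j
                break
        else:
            # No match: emit an unknown-token id and advance one char.
            ids.append(vocab["<unk>"])
            i += 1
    return ids

# Hypothetical toy vocabulary for demonstration.
vocab = {"<unk>": 0, "token": 1, "ization": 2, "net": 3}
print(greedy_tokenize("nettokenization", vocab))  # [3, 1, 2]
```

Because every input byte passes through loops and hash lookups like these on the host CPU, moving this matching into a DPDK pipeline or SmartNIC, as the thesis proposes, can remove it from the inference server's critical path.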
Recommended Citation
Mehta, Sudarshan, "Accelerating LLM Inference with Smart NIC Tokenization and Caching" (2025). Computer Science and Engineering Master's Theses. 52.
https://scholarcommons.scu.edu/cseng_mstr/52
