Date of Award

6-2025

Document Type

Thesis

Publisher

Santa Clara : Santa Clara University, 2025

Degree Name

Master of Science (MS)

Department

Computer Science and Engineering

First Advisor

Sean Choi

Abstract

Tokenization is a critical preprocessing step in machine learning pipelines, particularly for generative AI models such as large language models (LLMs). Despite significant advances in training and inference technologies, tokenization continues to be a notable performance bottleneck due to its CPU-bound nature and limited acceleration support. This thesis proposes NetTokenizer, a system designed to offload tokenization from CPUs to network data planes. Utilizing software frameworks such as the Data Plane Development Kit (DPDK) and hardware-based SmartNIC solutions, NetTokenizer substantially improves inference efficiency. Experimental evaluations demonstrate up to a 95.9% reduction in tail latency and a 5.73x throughput increase compared to conventional CPU-based methods. This research provides a comprehensive analysis and practical foundation for deploying in-network tokenization solutions to enhance ML inference pipelines.
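To make the bottleneck the abstract describes concrete, the sketch below illustrates conventional CPU-side tokenization, i.e., the per-request preprocessing loop that NetTokenizer offloads to the network data plane. It is a minimal, self-contained illustration only: the toy vocabulary and the greedy longest-match scheme are assumptions for demonstration and are not the tokenizer, data set, or measurement methodology used in the thesis.

```python
# Illustrative sketch of CPU-bound tokenization (the baseline NetTokenizer targets).
# The vocabulary and greedy longest-match rule below are hypothetical, chosen only
# to show text -> token-ID conversion happening per request on the CPU.
import time

VOCAB = {"hello": 1, "world": 2, "token": 3, "ize": 4, "net": 5, "<unk>": 0}

def tokenize(text: str) -> list[int]:
    """Greedy longest-match tokenization over a toy vocabulary."""
    ids, i = [], 0
    text = text.lower().replace(" ", "")
    while i < len(text):
        match = None
        for j in range(len(text), i, -1):   # try the longest substring first
            if text[i:j] in VOCAB:
                match, i = VOCAB[text[i:j]], j
                break
        if match is None:                    # unknown character -> <unk>
            match, i = VOCAB["<unk>"], i + 1
        ids.append(match)
    return ids

if __name__ == "__main__":
    batch = ["hello world"] * 10_000         # simulated inference requests
    start = time.perf_counter()
    encoded = [tokenize(s) for s in batch]   # CPU-bound preprocessing loop
    elapsed = time.perf_counter() - start
    print(f"Tokenized {len(batch)} requests in {elapsed:.3f}s")
    print("Example IDs:", encoded[0])
```

In a conventional pipeline this loop runs on the host CPU for every incoming request before the model sees any input; the thesis's approach moves this step into the network data plane (e.g., via DPDK or a SmartNIC) so tokenized IDs arrive alongside the request.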
