Date of Award
6-8-2025
Document Type
Thesis
Publisher
Santa Clara : Santa Clara University, 2025
Department
Computer Science and Engineering
First Advisor
Sean Choi
Abstract
This project investigates the development of NetGen, a transformer-based machine learning model that generates realistic, usable synthetic network traffic from natural language inputs. Network research and testing in the cybersecurity field depend on publicly available packet capture (PCAP) data, but many of these datasets are outdated or fail to represent modern traffic patterns. We address this issue by preprocessing available network data into structured, tokenized sequences suitable for training a transformer model. The transformer architecture allows us to translate user-provided descriptions, such as “simulate a TCP handshake from IP A to IP B”, into structurally accurate network packet streams. The effectiveness of NetGen was evaluated using custom metrics, including Header Completeness, Sequence Consistency Index, and Field Validity Rate, which help demonstrate the viability of transformer-based packet generators. Our best 90M-parameter checkpoint achieves a Header Completeness (HC) of 0.65, a Sequence Consistency Index (SCI) of 0.43, and a Field Validity Rate (FVR) of 0.93, demonstrating that NetGen has the capacity to reproduce long-range protocol structures. We aim to make realistic network traffic more accessible to academic researchers, cybersecurity professionals, and educators while significantly reducing the technical complexity of generating it.
Recommended Citation
Pefourque, Arya; Fettkether, Cole; and Maschler, Johnathon, "NetGen: Network Traffic Generator" (2025). Computer Science and Engineering Senior Theses. 327.
https://scholarcommons.scu.edu/cseng_senior/327
