Date of Award

6-9-2025

Document Type

Thesis

Publisher

Santa Clara : Santa Clara University, 2025

Department

Computer Science and Engineering

First Advisor

Yuhong Liu

Abstract

Modern large language models (LLMs) speed up software development but frequently hallucinate code, forcing developers to debug and rewrite the generated output manually. This project introduces Documentation-Aware Code Generation, a retrieval-augmented generation (RAG) framework that injects authoritative documentation into the LLM prompt to reduce hallucinations in generated code. We build an ingestion pipeline that scrapes, cleans, chunks, and embeds official API documentation into a Pinecone vector database, then design a retrieval and re-ranking pipeline that selects the most relevant snippets for each user query. Results show that augmenting prompts with documentation reduces code hallucinations by up to 60%. The system is exposed through backend APIs and a Next.js frontend, offering developers a tool that generates code more reliably.
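
The chunk, embed, retrieve, re-rank, and prompt-augmentation flow described in the abstract can be illustrated with a minimal, standard-library-only sketch. This is not the thesis implementation: the bag-of-words `embed` function stands in for a real embedding model, the in-memory list stands in for the Pinecone index, and every function name, parameter, and sample documentation string below is a hypothetical chosen for illustration.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split cleaned documentation into overlapping word windows."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        if start + max_words >= len(words):
            break  # the last window already reaches the end of the text
    return chunks


def embed(text):
    """Toy bag-of-words vector (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, index, top_k=3):
    """Return the top_k chunks by cosine similarity to the query."""
    q = embed(query)
    scored = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in scored[:top_k]]


def rerank(query, chunks):
    """Hypothetical re-ranker: prefer chunks covering more distinct query terms."""
    q_terms = set(embed(query))
    return sorted(chunks, key=lambda c: len(q_terms & set(embed(c))), reverse=True)


def build_prompt(query, chunks):
    """Prepend the retrieved documentation to the user's request."""
    context = "\n---\n".join(chunks)
    return f"Use only the documentation below.\n\n{context}\n\nTask: {query}"


# Illustrative documentation snippets (paraphrased, not scraped from any source).
docs = [
    "requests.get(url, params=None) sends a GET request and returns a Response object.",
    "json.loads(s) parses a JSON string and returns a Python object.",
    "os.path.join(path, *paths) joins path components intelligently.",
]
index = [(c, embed(c)) for doc in docs for c in chunk_text(doc)]

query = "How do I send a GET request with requests?"
top_chunks = rerank(query, retrieve(query, index, top_k=2))
prompt = build_prompt(query, top_chunks)
```

In a real deployment, the retrieval step would query the vector database and the re-ranker would typically be a learned model; the sketch only shows how the stages connect and where the retrieved documentation enters the prompt.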
