Date of Award

6-9-2025

Document Type

Thesis

Publisher

Santa Clara : Santa Clara University, 2025

Department

Computer Science and Engineering

First Advisor

Yuhong Liu

Abstract

Modern large language models (LLMs) increase the speed of software development but frequently hallucinate code, forcing developers to debug and rewrite generated code manually. This project introduces Documentation-Aware Code Generation, a retrieval-augmented generation (RAG) framework that injects authoritative documentation into the LLM prompt to reduce hallucinations in generated code. We build an ingestion pipeline that scrapes, cleans, chunks, and embeds official API documentation into a Pinecone vector database, then design a retrieval and re-ranking pipeline that retrieves the most relevant snippets for each user query. Results show that augmenting prompts with documentation lowers code hallucination by up to 60%. The system is exposed through backend APIs and a Next.js frontend, offering developers a tool that reliably generates code.
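
The pipeline outlined in the abstract (chunk, embed, retrieve, re-rank, then inject into the prompt) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a toy bag-of-words vector in place of a real embedding model and an in-memory list in place of the Pinecone index. Every function name, the sample documentation strings, and the prompt wording are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split cleaned documentation into chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding (assumption): lowercase bag-of-words counts,
    standing in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, index, top_k=3):
    """Return the top_k chunks by cosine similarity to the query."""
    q = embed(query)
    scored = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in scored[:top_k]]


def rerank(query, chunks):
    """Toy re-ranker (assumption): order chunks by how many distinct
    query terms they contain."""
    q_terms = set(embed(query))
    return sorted(chunks, key=lambda c: len(q_terms & set(embed(c))), reverse=True)


def build_prompt(query, chunks):
    """Place the retrieved documentation ahead of the user's task."""
    context = "\n---\n".join(chunks)
    return f"Use only the documentation below.\n\n{context}\n\nTask: {query}"


# Hypothetical documentation snippets standing in for scraped API docs.
docs = [
    "requests.get(url, params=None) sends a GET request and returns a Response object.",
    "json.dumps(obj) serializes obj to a JSON formatted string.",
    "math.sqrt(x) returns the square root of x.",
]

# In-memory index of (chunk, vector) pairs, standing in for a vector database.
index = [(c, embed(c)) for d in docs for c in chunk_text(d)]

query = "send a GET request with requests"
top = rerank(query, retrieve(query, index, top_k=2))
prompt = build_prompt(query, top)
```

In a real system the `embed` function and the `index` list would be replaced by calls to an embedding model and a vector store. The point of the sketch is only the order of the stages: retrieval narrows the candidates, re-ranking orders them, and the prompt carries the documentation to the model.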

Recommended Citation

Amirtharaj, Daniel and Norwood, Rahmin, "Documentation-Aware Code Generation Via Retrieval Augmented Generation" (2025). Computer Science and Engineering Senior Theses. 319.
https://scholarcommons.scu.edu/cseng_senior/319

Included in

Computer Engineering Commons

Computer Science and Engineering Senior Theses

Documentation-Aware Code Generation Via Retrieval Augmented Generation
