Author

Colin Rioux

Date of Award

6-7-2022

Document Type

Thesis

Publisher

Santa Clara : Santa Clara University, 2022.

Degree Name

Master of Science (MS)

Department

Computer Science and Engineering

First Advisor

Sharon Hsiao

Abstract

Pseudocode is a traditional teaching tactic in computer science, yet it is not standardized and is programming-language dependent. As a result, writing it can be quite time consuming. Advances in AI methods for natural language processing (NLP) suggest that AI could help address this problem. This work investigates the quality of pseudocode generated by AI from source code. Five studies are conducted to measure pseudocode quality; each study modifies the model input to observe its effect on accuracy and generalizability. The results show an association between pseudocode quality and the similarity between the training and test sets. Furthermore, a sizable, diverse training set and a test set in the same programming language are critical for generating good-quality pseudocode. If creating more applicable datasets is infeasible, future work could explore language-independent embeddings to simplify datasets while preserving language semantics.
