Date of Award
Santa Clara : Santa Clara University, 2020.
Computer Science and Engineering
Deep convolutional generative adversarial networks (DCGANs) have proven capable of generating diverse, photorealistic images of human faces, but it is difficult and often time-consuming to control what kind of image these generative adversarial networks (GANs) produce. We create a simple, intuitive web application through which users may write a description of a human face in plain text and generate photos that appear to match the given description. In this paper, we show how text can be used to direct the output of a conditional GAN with a DCGAN architecture. Our images generally matched the input text descriptions and somewhat resembled human faces, but they often had artifacts that prevented them from looking like photographs. This was likely due to our using a relatively simple DCGAN architecture and training it for a relatively short amount of time. To improve image quality, we recommend experimenting with more advanced GAN architectures trained for a longer amount of time; such models would likely take longer to train but achieve more photorealistic results. To make the model more robust to natural language input, we recommend implementing a text-embedding model to encode the text data that is passed to the GAN.
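As a rough illustration of how text might direct a conditional GAN, the sketch below turns a description into a binary condition vector by keyword matching and concatenates it with random noise to form a generator input. This is an assumption for illustration only: the abstract does not specify the encoding used in the thesis, and the attribute list, the function names `encode_description` and `generator_input`, and the noise dimension of 100 are all hypothetical.

```python
import random

# Hypothetical attribute vocabulary; the attribute set used in the thesis is not stated here.
ATTRIBUTES = ["smiling", "eyeglasses", "beard", "blond hair", "black hair"]


def encode_description(text):
    """Map a plain-text description to a binary condition vector.

    Assumed scheme: an entry is 1.0 if the attribute phrase occurs in the text.
    """
    lowered = text.lower()
    return [1.0 if attr in lowered else 0.0 for attr in ATTRIBUTES]


def generator_input(text, noise_dim=100, seed=None):
    """Concatenate Gaussian noise with the condition vector.

    A conditional generator would receive this combined vector as its input,
    so the noise varies the face while the condition fixes the attributes.
    """
    rng = random.Random(seed)
    noise = [rng.gauss(0.0, 1.0) for _ in range(noise_dim)]
    return noise + encode_description(text)


if __name__ == "__main__":
    z = generator_input("A smiling man with a beard and eyeglasses", seed=0)
    print(len(z))                  # noise_dim plus the number of attributes
    print(z[-len(ATTRIBUTES):])    # the condition part of the input
```

Keyword matching of this kind fails on paraphrases such as "wearing glasses", which is the sort of brittleness a learned text-embedding model, as recommended above, would be expected to reduce.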
Cain, Ryan; Kralik, Gabriel; and Munson, Campbell, "Photo-Realistic Image Synthesis from Text Descriptions" (2020). Computer Science and Engineering Senior Theses. 176.