Date of Award
6-11-2024
Document Type
Thesis
Publisher
Santa Clara : Santa Clara University, 2024
Degree Name
Master of Science (MS)
Department
Computer Science and Engineering
First Advisor
Nam Ling
Abstract
This thesis presents a novel super-resolution model based on the Vector Quantized Generative Adversarial Network (VQGAN) to enhance image resolution. Inspired by recent advances in image reconstruction, we apply VQGAN to the super-resolution task, leveraging its generative capabilities to produce higher-quality high-resolution images.
Building on the VQGAN framework, we propose an improved architecture that adds a ConvNeXt feature extractor, a Convolutional Neural Network (CNN) design, to capture and refine features from low-resolution images. To further enhance model performance, we apply several strategies that optimize utilization of the codebook: capacity optimization, improved initialization, and an Exponential Moving Average (EMA) dynamic updating strategy, which together ensure more efficient and diverse codebook usage. Additionally, we divide training into two phases to improve stability and convergence. In the first phase, we train the codebook and decoder to create a strong high-resolution prior. In the second phase, we train the encoder, the ConvNeXt-based feature extractor, and the GAN. This phased approach enables the model to progressively learn and refine the complex details required for high-quality image reconstruction.
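The EMA codebook update mentioned above can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so this assumes the formulation commonly used for vector-quantized autoencoders: each code vector is the ratio of an exponentially averaged sum of its assigned encoder vectors to an exponentially averaged assignment count. The function names, the `decay` and `eps` values, and the pure-Python list representation are hypothetical choices for illustration, not the thesis implementation.

```python
def nearest_code(codebook, z):
    """Return the index of the code vector closest to z (squared L2 distance)."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sq_dist(codebook[k], z))


def ema_update(codebook, counts, sums, batch, decay=0.99, eps=1e-5):
    """Apply one EMA update step to the codebook, in place.

    codebook: list of K code vectors (lists of D floats)
    counts:   running EMA of how many vectors were assigned to each code
    sums:     running EMA of the sum of vectors assigned to each code
    batch:    encoder output vectors to quantize (assumed inputs)
    Returns the code index assigned to each batch vector.
    """
    num_codes, dim = len(codebook), len(codebook[0])
    assignments = [nearest_code(codebook, z) for z in batch]

    # Per-batch statistics: assignment counts and summed vectors per code.
    batch_counts = [0] * num_codes
    batch_sums = [[0.0] * dim for _ in range(num_codes)]
    for z, k in zip(batch, assignments):
        batch_counts[k] += 1
        for d in range(dim):
            batch_sums[k][d] += z[d]

    # Blend the batch statistics into the running averages.
    for k in range(num_codes):
        counts[k] = decay * counts[k] + (1 - decay) * batch_counts[k]
        for d in range(dim):
            sums[k][d] = decay * sums[k][d] + (1 - decay) * batch_sums[k][d]

    # Laplace smoothing keeps rarely used codes from dividing by ~0.
    total = sum(counts)
    for k in range(num_codes):
        smoothed = (counts[k] + eps) / (total + num_codes * eps) * total
        codebook[k] = [s / smoothed for s in sums[k]]
    return assignments
```

In this sketch the running statistics are assumed to start at `counts = [1.0] * K` and `sums` equal to a copy of the initial codebook, so a code that receives no assignments in a batch stays close to its previous value instead of collapsing.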
Experimental results demonstrate that our model improves on existing methods in image quality, as measured by Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). This work highlights the potential of combining VQGAN with advanced feature extractors in super-resolution models, paving the way for future research and development in this field.
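Of the three metrics named above, PSNR has a simple closed form: PSNR = 10 log10(MAX^2 / MSE), where MSE is the mean squared error between the reference and restored images and MAX is the largest possible pixel value. The following is a minimal sketch assuming 8-bit pixel values supplied as flat lists; the function name and the input layout are illustrative assumptions, and the thesis's own evaluation protocol (color space, cropping) is not stated in the abstract.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak Signal-to-Noise Ratio in dB between two equal-length pixel lists."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - s) ** 2 for r, s in zip(reference, restored)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)
```

Higher PSNR indicates a closer pixel-wise match to the reference. SSIM and LPIPS capture structural and perceptual similarity respectively and are not reproduced here.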
Recommended Citation
Zhou, Lebin, "Enhancing VQGAN-Based Model with ConNeXt for Blind Super-Resolution" (2024). Computer Science and Engineering Master's Theses. 39.
https://scholarcommons.scu.edu/cseng_mstr/39