Date of Award

6-11-2024

Document Type

Thesis

Publisher

Santa Clara : Santa Clara University, 2024

Degree Name

Master of Science (MS)

Department

Computer Science and Engineering

First Advisor

Nam Ling

Abstract

This thesis presents a novel super-resolution model based on the Vector Quantized Generative Adversarial Network (VQGAN). Inspired by recent advances in image reconstruction, we apply VQGAN to the super-resolution task, leveraging its generative capabilities to produce higher-quality high-resolution images.

Building on the VQGAN framework, we propose an improved architecture that adds a ConvNeXt feature extractor, based on Convolutional Neural Networks (CNNs), to capture and refine features from low-resolution images. To further improve performance, we apply several strategies to optimize codebook utilization: capacity optimization, improved initialization, and an Exponential Moving Average (EMA) dynamic update strategy, so that codebook entries are used more efficiently and more diversely. We also divide training into two phases to improve stability and convergence. In the first phase, we train the codebook and decoder to learn a strong high-resolution prior. In the second phase, we train the encoder, the ConvNeXt-based feature extractor, and the GAN. This phased approach lets the model progressively learn and refine the fine details required for high-quality image reconstruction.
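The abstract names an EMA dynamic update for the codebook but does not give its details. As a rough illustration only, here is the EMA codebook update commonly used in VQ-VAE-style models:

- each latent vector is assigned to its nearest code;
- running assignment counts N_i and running vector sums m_i are decayed by a factor gamma;
- each code is then reset to m_i / N_i, with Laplace smoothing on the counts.

The class name EMACodebook, the default decay of 0.99 and the smoothing constant eps are hypothetical choices for this sketch. This is not the thesis implementation.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_code(z: Sequence[float], codebook: Sequence[Sequence[float]]) -> int:
    """Index of the codebook entry closest to z (first one wins on ties)."""
    return min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))


class EMACodebook:
    """Hypothetical EMA-updated vector-quantization codebook (VQ-VAE-style sketch)."""

    def __init__(self, init_codes: Sequence[Sequence[float]], decay: float = 0.99, eps: float = 1e-5):
        self.codes: List[List[float]] = [list(c) for c in init_codes]
        self.decay = decay  # gamma: how much of the running statistics to keep
        self.eps = eps      # Laplace smoothing constant (assumed value)
        k = len(self.codes)
        # N_i starts at 1 and m_i at the initial code, so m_i / N_i equals the initial code.
        self.cluster_size: List[float] = [1.0] * k
        self.embed_sum: List[List[float]] = [list(c) for c in self.codes]

    def update(self, batch: Sequence[Sequence[float]]) -> List[int]:
        """Quantize a batch, update the EMA statistics, and return the chosen indices."""
        k, d = len(self.codes), len(self.codes[0])
        counts = [0.0] * k
        sums = [[0.0] * d for _ in range(k)]
        indices = []
        for z in batch:
            idx = nearest_code(z, self.codes)
            indices.append(idx)
            counts[idx] += 1.0
            for j in range(d):
                sums[idx][j] += z[j]
        g = self.decay
        for i in range(k):
            # N_i <- g * N_i + (1 - g) * n_i ;  m_i <- g * m_i + (1 - g) * sum of assigned z
            self.cluster_size[i] = g * self.cluster_size[i] + (1 - g) * counts[i]
            for j in range(d):
                self.embed_sum[i][j] = g * self.embed_sum[i][j] + (1 - g) * sums[i][j]
        # Laplace-smooth the counts so rarely used codes never divide by ~0.
        n = sum(self.cluster_size)
        for i in range(k):
            smoothed = (self.cluster_size[i] + self.eps) / (n + k * self.eps) * n
            self.codes[i] = [s / smoothed for s in self.embed_sum[i]]
        return indices
```

With a decay close to 1, each code drifts slowly toward the mean of the latents assigned to it, rather than being updated by gradients. Codes that stop receiving assignments keep most of their previous value, because their statistics only decay. The thesis's capacity optimization and improved initialization would sit on top of an update like this, but the abstract does not specify them.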

Experimental results show that our model improves on existing methods in image quality, as measured by Peak Signal-to-Noise Ratio (PSNR), the Structural Similarity Index (SSIM), and Learned Perceptual Image Patch Similarity (LPIPS). This work highlights the potential of combining VQGAN with advanced feature extractors in super-resolution models and points to directions for future research in this field.
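Of the three metrics, PSNR is the simplest to state: PSNR = 10 * log10(MAX^2 / MSE), where MAX is the largest possible pixel value and MSE is the mean squared error against the ground-truth image. Higher values mean the reconstruction is closer to the reference. The minimal sketch below works on flattened pixel lists. The function name psnr and the default max_value of 255 (8-bit images) are assumptions.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two flattened images of equal size."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all pixels.
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images: PSNR is unbounded
    return 10.0 * math.log10(max_value ** 2 / mse)


# Example: every pixel off by 1 on an 8-bit scale gives 20 * log10(255), about 48.13 dB.
print(round(psnr([0, 0, 0, 0], [1, 1, 1, 1]), 2))
```

Super-resolution papers often compute PSNR on the luminance channel only and crop image borders first. The abstract does not say which protocol the thesis used, so values from this sketch are not directly comparable to its reported numbers. SSIM and LPIPS need considerably more machinery (local windowed statistics and a pretrained network, respectively) and are not sketched here.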

Recommended Citation

Zhou, Lebin, "Enhancing VQGAN-Based Model with ConNeXt for Blind Super-Resolution" (2024). Computer Science and Engineering Master's Theses. 39.
https://scholarcommons.scu.edu/cseng_mstr/39

Included in

Computer Engineering Commons
