Date of Award
6-2024
Document Type
Dissertation
Publisher
Santa Clara : Santa Clara University, 2024
Degree Name
Doctor of Philosophy (PhD)
Department
Computer Science and Engineering
First Advisor
Nam Ling
Second Advisor
Ying Liu
Abstract
The task of video prediction is to generate unseen future video frames from past ones. It is an emerging yet challenging task because of its inherent uncertainty and complex spatiotemporal dynamics. The ability to anticipate future events through video prediction has applications in systems such as self-driving cars, weather forecasting, traffic flow prediction, and video compression. Following the success of deep learning in computer vision, several deep learning artificial intelligence (AI) architectures, such as convolutional neural networks (CNNs), long short-term memory networks (LSTMs), convolutional LSTMs (ConvLSTMs), and transformers, have been explored to improve prediction accuracy. In deep learning-based video prediction, an internal representation of the video, mainly its spatial correlations and temporal dynamics, is learned and used to predict the next frames. Several state-of-the-art deep learning methods have achieved superior video prediction accuracy at the expense of huge computational cost. In light of the recent popularity of Green AI, which aims for efficient, environmentally friendly solutions alongside accuracy, this research concentrates on efficient methods for video prediction. Such methods are suitable for memory-constrained and computation-limited platforms, such as mobile and embedded devices. We focus on CNN/LSTM methods and transformer-based architectures with fewer parameters for our lightweight, efficient, environmentally friendly video prediction techniques. We conducted experimental studies on popular video prediction datasets; compared to existing methods, our proposed methods achieved competitive frame prediction accuracy with significantly reduced model size, trainable parameters, and computational complexity.
Recommended Citation
Mathai, Mareeta, "Deep Learning-Based Video Prediction" (2024). Engineering Ph.D. Theses. 53.
https://scholarcommons.scu.edu/eng_phd_theses/53