Date of Award


Document Type



Santa Clara : Santa Clara University, 2022.


Computer Science and Engineering

First Advisor

David C. Anastasiu


In the field of neural networks, there is a long-standing problem that needs to be addressed: gaining insight into how neural networks make decisions. Neural networks are still considered black boxes and are often difficult to understand. This lack of understanding becomes an ethical dilemma, especially in the domain of self-driving cars. Given the limited number of works geared towards unravelling neural network logic for autonomous vehicles, our team seeks to create a novel neural network interpretability method that influences the network during its training process.

Deep neural networks have demonstrated impressive performance in complex tasks such as image classification and speech recognition. However, due to their multi-layer structure combined with non-linear decision boundaries, it is hard to understand what makes them arrive at a particular classification or recognition decision given new data. Recently, several approaches have been proposed to understand and interpret the reasoning in a deep neural network. In some state-of-the-art solutions, researchers try to actively improve the network during the training process through the use of penalty functions that are added to chosen layers of the model. However, in many other state-of-the-art solutions, interpretability is applied only after the model is trained, which means that no modifications are made to the network until after it is fully trained. The results of this kind of interpretability often take the form of decision trees, extracted logical rules, or highlighted images. Any improvement then has to be made through an additional training process, which takes more time. This is often the case with vision-based models such as those used in self-driving neural networks.

Based on previous work in the field of neural network interpretability, we propose adding a penalty function to the feature extraction layers of an autonomous driving neural network model rather than taking the common passive interpretability approach. Our method is intended to prevent the model from developing complex data representations that are not human-understandable. Our group specifically chose L1 regularization as our penalty function for the purpose of creating sparse feature maps that can ignore noise. Through the use of an autonomous driving simulator and feature extraction methods, we showed that our regularized model was more effective and interpretable than a baseline version of it. To determine model effectiveness, we observed the autonomy of both models. To determine interpretability, we measured the compressibility and randomness of the features learned by both models. For feature compressibility, the Principal Component Analysis (PCA) algorithm was applied to the features extracted from the model. Furthermore, the randomness of the features was calculated using entropy. Our results show that our regularized model was more autonomous and learned more compressible and less random features than the original baseline.
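
To make the quantities named above concrete, the following is a minimal NumPy sketch, not the implementation used in this work. The function names, the penalty weight, the 95% variance threshold, and the histogram bin count are all assumptions chosen for illustration; the abstract does not state how the penalty was weighted or exactly how compressibility and entropy were computed.

```python
import numpy as np


def l1_activation_penalty(feature_maps, weight=1e-4):
    """L1 penalty on feature maps: weight * mean absolute activation.

    In training, a term like this would be added to the task loss so that
    the feature extraction layers are pushed towards sparse activations.
    The weight value is a hypothetical default.
    """
    return weight * np.abs(feature_maps).mean()


def pca_components_needed(features, variance_target=0.95):
    """Number of principal components needed to reach variance_target.

    features has shape (n_samples, n_features). Fewer components means the
    features are more compressible. The 0.95 target is an assumption.
    """
    centered = features - features.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the variance explained
    # by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    total = explained.sum()
    if total == 0:
        return 0  # constant features carry no variance to explain
    cumulative = np.cumsum(explained) / total
    count = int(np.searchsorted(cumulative, variance_target)) + 1
    return min(count, len(cumulative))


def histogram_entropy(features, bins=32):
    """Shannon entropy (in bits) of a histogram of the feature values.

    Lower entropy means the values are concentrated in fewer bins, which is
    used here as a simple proxy for "less random" features. The bin count
    is an assumption.
    """
    counts, _ = np.histogram(features, bins=bins)
    probs = counts[counts > 0] / counts.sum()
    return float(-(probs * np.log2(probs)).sum())
```

Under these assumed definitions, a regularized model whose extracted features need fewer principal components and have lower histogram entropy than the baseline's would count as learning more compressible and less random features.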