Date of Award
Santa Clara : Santa Clara University, 2023.
Doctor of Philosophy (PhD)
Electrical and Computer Engineering
Convolutional Neural Networks (CNNs) have become highly accurate at classifying objects in a single image or in frames of video. A major function of a CNN model is the extraction and encoding of features from training or ground-truth images, and simple CNN models are trained to identify a dominant object in an image from those feature encodings. More complex models, such as R-CNN, can identify and locate multiple objects in an image. Feature maps from trained CNNs contain useful information beyond the encoding used for classification or detection. By examining the maximum activation values and statistics of early-layer feature maps, it is possible to identify key-points of objects, including their location, particularly for object types that were included in the original training data set. Methods are introduced that leverage the key-points extracted from these early layers to isolate objects for more accurate classification and detection, using simpler networks than more complex, integrated networks.
An examination of the feature extraction process provides insight into the information that is available in the various feature map layers of a CNN. While a basic CNN model does not explicitly create instances of visual or other types of information expression, it is possible to examine the feature map layers and create a framework for interpreting them. This can be valuable for a variety of goals, such as object location and size, feature statistics, and redundancy analysis. In this thesis we examine in detail the interpretation of feature maps in CNN models, and develop a method for extracting information from trained convolutional layers to locate objects whose types belong to the data set on which the model was trained. A major contribution of this work is the analysis of the statistical characteristics of early-layer feature maps and the development of a method for identifying key-points of objects without the benefit of information from deeper layers. A second contribution is an analysis of how accurately the selected points correspond to key-points of objects present in the image. A third contribution is the clustering of key-points to form partitions for cropping the original image and computing detection using the simple CNN model.
This key-point detection method has the potential to greatly improve the classification capability of simple CNNs by making it possible to identify multiple objects in a complex input image at a modest computational cost, and to provide localization information as well.
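The pipeline outlined in the abstract (peak activations in early-layer feature maps, a statistical test on those peaks, clustering of the resulting key-points, and crop regions for a simple classifier) can be illustrated with a minimal NumPy sketch. This is not the thesis implementation. The function names, the z-score threshold, the distance-based clustering rule, and the `stride` and `pad` parameters are all assumptions chosen for illustration; the abstract does not specify which statistics or which clustering method are used.

```python
import numpy as np

def select_keypoints(fmaps, z_thresh=3.0):
    """Return (row, col) of each channel's peak activation when it stands out.

    fmaps: array of shape (channels, height, width) from an early conv layer.
    Assumption: a peak counts as a key-point if it exceeds the channel mean
    by more than z_thresh standard deviations.
    """
    c, h, w = fmaps.shape
    flat = fmaps.reshape(c, -1)
    peak_idx = flat.argmax(axis=1)
    peak_val = flat.max(axis=1)
    mean = flat.mean(axis=1)
    std = flat.std(axis=1) + 1e-8  # avoid division by zero on flat channels
    keep = (peak_val - mean) / std > z_thresh
    rows, cols = np.unravel_index(peak_idx[keep], (h, w))
    return np.stack([rows, cols], axis=1)

def cluster_keypoints(points, radius=4.0):
    """Assumed clustering rule: points linked by hops of <= radius share a label."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        labels[i] = current
        stack = [i]
        while stack:
            j = stack.pop()
            dist = np.linalg.norm(points - points[j], axis=1)
            for k in np.nonzero((dist <= radius) & (labels == -1))[0]:
                labels[k] = current
                stack.append(k)
        current += 1
    return labels

def crop_boxes(points, labels, stride=1, pad=2):
    """Bounding box (top, left, bottom, right) per cluster, in input pixels.

    stride is the assumed downsampling factor between image and feature map;
    pad widens each box by a few feature-map cells.
    """
    boxes = []
    for lab in np.unique(labels):
        pts = points[labels == lab]
        top, left = pts.min(axis=0) - pad
        bottom, right = pts.max(axis=0) + pad
        boxes.append((max(int(top), 0) * stride, max(int(left), 0) * stride,
                      int(bottom) * stride, int(right) * stride))
    return boxes
```

Each box returned by `crop_boxes` could then be used to crop the original image and pass the crop to the simple CNN for classification. In practice the boxes would also need clipping to the image bounds, and the threshold and radius would need tuning per layer.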
Rush, Allen, "CNN Feature Map Interpretation and Key-Point Detection Using Statistics of Activation Layers" (2023). Engineering Ph.D. Theses. 51.