Zhiyuan Peng

Date of Award


Document Type

Thesis - SCU Access Only


Publisher

Santa Clara : Santa Clara University, 2020.

Degree Name

Master of Science (MS)


Department

Computer Science and Engineering

First Advisor

Yi Fang


Short texts are very common on the Internet today because of the popularity of Twitter, notes, and other lightweight blogging services. Accurately categorizing these short texts and recognizing the named entities they contain are critical for enhancing these services, since classification and named entity recognition (NER) provide the foundation for better ad targeting and recommendation. In this thesis, we focus on the classification and NER of short texts. For classification, traditional multi-label classification methods do not perform well on short texts because of the sparsity of their context information. We propose a novel Label Correlated Recurrent Neural Network (LC-RNN) for multi-label classification of short texts that exploits label correlations. We use a tree structure to represent the relationships between labels, which allows an efficient max-product algorithm to perform exact inference for label prediction. We conduct experiments on four testbeds, and the results demonstrate the effectiveness of the proposed model. For NER, we design a BioBERT+CRF model to recognize the named entities in call notes, a kind of short text written by pharmaceutical sales representatives. The results show that our model outperforms spaCy v2.0, a widely used public NER tool.
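To illustrate why a tree over labels makes exact inference cheap, here is a minimal sketch of generic max-product inference (run in log space, i.e. max-sum) over binary labels arranged in a tree. It is not the LC-RNN implementation from the thesis. It assumes that each label is simply present (1) or absent (0), that the label tree is given, and that some model supplies a unary log-score per label and a pairwise log-score per tree edge. The function `tree_map_inference` and the arguments `parent`, `unary`, and `pairwise` are hypothetical names chosen for this example, and the scores in the usage example are made up.

```python
from typing import Dict, List, Sequence, Tuple


def tree_map_inference(
    parent: Sequence[int],
    unary: Sequence[Tuple[float, float]],
    pairwise: Dict[int, List[List[float]]],
) -> Tuple[List[int], float]:
    """Exact MAP assignment of binary labels arranged in a tree (hypothetical helper).

    parent[i]           -- parent of label i in the label tree, or -1 for the single root
    unary[i][y]         -- assumed log-score for label i taking value y (0 absent, 1 present)
    pairwise[i][yp][yc] -- assumed log-score of edge (parent[i], i) when the parent
                           takes yp and label i takes yc (given for every non-root i)
    """
    n = len(parent)
    children: List[List[int]] = [[] for _ in range(n)]
    root = -1
    for i, p in enumerate(parent):
        if p == -1:
            root = i
        else:
            children[p].append(i)

    # Breadth-first order from the root; reversed, it lists children before parents.
    order = [root]
    k = 0
    while k < len(order):
        order.extend(children[order[k]])
        k += 1
    bottom_up = order[::-1]

    subtree = [[0.0, 0.0] for _ in range(n)]  # best score of i's subtree given y_i
    msg = [[0.0, 0.0] for _ in range(n)]      # max-product message i -> parent, per y_parent
    best_child = [[0, 0] for _ in range(n)]   # maximizing y_i for each y_parent

    # Upward pass: each node combines its children's messages, then sends one upward.
    for node in bottom_up:
        for y in (0, 1):
            subtree[node][y] = unary[node][y] + sum(msg[c][y] for c in children[node])
        if parent[node] != -1:
            for yp in (0, 1):
                scores = [pairwise[node][yp][yc] + subtree[node][yc] for yc in (0, 1)]
                yc_best = 0 if scores[0] >= scores[1] else 1
                best_child[node][yp] = yc_best
                msg[node][yp] = scores[yc_best]

    # Downward pass: fix the root, then read each child's best value given its parent.
    labels = [0] * n
    labels[root] = 0 if subtree[root][0] >= subtree[root][1] else 1
    for node in order:  # parents before children
        if parent[node] != -1:
            labels[node] = best_child[node][labels[parent[node]]]
    return labels, subtree[root][labels[root]]


# Hypothetical 4-label tree: 0 is the root, 1 and 2 are its children, 3 is a child of 1.
parent = [-1, 0, 0, 1]
unary = [(0.0, 1.0), (0.5, 0.0), (0.0, 0.2), (0.0, 0.3)]
# Penalize a child label being present while its parent is absent; reward co-occurrence.
pairwise = {i: [[0.0, -2.0], [0.0, 0.5]] for i in (1, 2, 3)}
labels, score = tree_map_inference(parent, unary, pairwise)
print(labels)  # [1, 1, 1, 1]
```

Because each message depends only on the subtree below it, one pass up and one pass down recover the exact highest-scoring label set in time linear in the number of labels, rather than by enumerating all 2^n subsets of n labels.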
