Date of Award
Thesis - SCU Access Only
Santa Clara : Santa Clara University, 2020.
Master of Science (MS)
Computer Science and Engineering
Short texts are now very common on the Internet owing to the popularity of Twitter, notes, and other lightweight blogging services. Accurate categorization of these short texts and recognition of the named entities in them are critical for enhancing these services, because classification and named entity recognition (NER) provide the foundation for better ad targeting and recommendation. In this thesis, we focus on classification and NER of short texts. For classification, traditional multi-label classification methods do not perform well on short texts because the context information is sparse. We propose a novel Label Correlated Recurrent Neural Network (LC-RNN) for multi-label classification of short texts that exploits label correlations. We use a tree structure to represent the relationships between labels, which allows an efficient max-product algorithm to be developed for exact inference of label predictions. We conduct experiments on four testbeds, and the results demonstrate the effectiveness of the proposed model. For NER, we design a BioBERT+CRF model to recognize named entities in call notes, a kind of short text written by pharmaceutical sales representatives. The results show that our model performs better than spaCy v2.0, a very popular public NER tool.
Peng, Zhiyuan, "Classification and Named Entity Recognition of Short Texts" (2020). Computer Science and Engineering Master's Theses. 21.