Date of Award

6-13-2024

Document Type

Thesis

Publisher

Santa Clara : Santa Clara University, 2024

Department

Computer Science and Engineering

First Advisor

Xiang Li

Abstract

A staggering number of news articles are published online every day: approximately 5,000 in the United States alone [1]. This volume of information makes it difficult for many people to stay up to date with current events, and the sheer number of articles, drawn from a multitude of sources, can feel overwhelming. In our project, we attempt to tackle this issue. We use web scraping to collect a dataset of news articles and combine it with a Large Language Model (LLM) that processes those articles to generate news summaries for the user. The user interacts with the program through a website, which presents a list of topics that may interest them. The user can then select a few articles, whether similar to or different from one another, and generate a summary of their key points. Overall, this approach is useful for condensing large amounts of information into more digestible pieces. Caution is warranted, however, because summaries generated by the LLM may not be completely factual, so the user must remain alert to possible inaccuracies. As LLMs, training datasets, and computer hardware advance, the accuracy and speed of the generated summaries will continue to improve.
