Date of Award
6-13-2024
Document Type
Thesis
Publisher
Santa Clara : Santa Clara University, 2024
Department
Computer Science and Engineering
First Advisor
Xiang Li
Abstract
A staggering number of news articles are published online every day: approximately 5,000 in the United States alone [1]. This volume of information makes it difficult for many people to stay up to date with current events, and the sheer number of articles and the multitude of sources they come from can feel overwhelming. Our project attempts to address this problem. We use web scraping to collect a dataset of news articles and pair it with a Large Language Model (LLM) that processes those articles to generate news summaries for the user. The user interacts with the program through a website, which presents a list of topics that may interest them. From these, the user can select a few articles, whether similar or dissimilar to each other, and generate a summary of their key points. Overall, this approach is useful for condensing large amounts of information into more digestible pieces. Caution is warranted, however: summaries generated by the LLM may not be completely factual, so users should remain alert to possible inaccuracies. As LLMs, training datasets, and computer hardware advance, the accuracy and speed of the generated summaries should continue to improve.
Recommended Citation
Wang, Justin; Maguin, Jack; Orloff, George; and Groves, Justin, "Daily Digest: A News Aggregation Site" (2024). Computer Science and Engineering Senior Theses. 278.
https://scholarcommons.scu.edu/cseng_senior/278