Author

Jingsen Wang

Date of Award

8-6-2024

Document Type

Thesis

Publisher

Santa Clara : Santa Clara University, 2024

Degree Name

Master of Science (MS)

Department

Computer Science and Engineering

First Advisor

Yi Fang

Abstract

This thesis explores the potential of Large Language Models (LLMs) in automating the extraction of sourcing information from news articles, a crucial step towards enhancing transparency and ethical analysis in journalism. We evaluate the performance of two state-of-the-art LLMs, GPT-4 and Claude 3, in identifying and categorizing various source types across four diverse news articles. The thesis employs a zero-shot learning approach with two different prompt designs, assessing the models’ ability to adapt to varying source structures and prompt instructions.

Our findings reveal that while LLMs show promise in extracting sourcing information, their performance varies significantly across different article types and source structures. The research highlights the complex interplay between prompt design, source types, and model performance, with both LLMs demonstrating strengths and limitations in handling diverse journalistic contexts. This thesis contributes to the growing body of work on AI in journalism by providing initial insights into the current capabilities of LLMs in sourcing analysis and outlining key areas for future research and development in automated ethical analysis of news content.
