Date of Award

6-9-2022

Document Type

Thesis - SCU Access Only

Publisher

Santa Clara : Santa Clara University, 2022.

Department

Computer Science and Engineering

First Advisor

David C. Anastasiu

Abstract

In Big Data applications, multivariate time series record high-dimensional observations of subjects over periodic time intervals, in sectors ranging from health services to entertainment to environmental studies. By analyzing time series, researchers can query, cluster, summarize, and segment them to identify and model subject behaviors that inform future decisions by end users such as manufacturers or service providers. Specifically, researchers can use an offline, iterative dynamic programming algorithm that optimally partitions a given batch of multivariate time series data into segments with central patterns; these patterns not only capture the overall behavior of a stored time series, but also allow researchers to track changes in the input batch’s underlying behavior.
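
As a rough illustration of the kind of offline dynamic programming segmentation described above, the following minimal sketch partitions a small multivariate series into k contiguous segments. It is not the thesis's algorithm: the squared-error cost, the use of each segment's mean as its "central pattern", and the names `segment_series` and `_segment_cost` are all assumptions made for this example.

```python
from typing import List, Sequence, Tuple


def _segment_cost(psum, psq, i: int, j: int) -> float:
    """Squared error of points i..j-1 around their mean (assumed cost)."""
    n = j - i
    cost = 0.0
    for d in range(len(psum[0])):
        s = psum[j][d] - psum[i][d]
        q = psq[j][d] - psq[i][d]
        cost += q - s * s / n
    return cost


def segment_series(
    series: Sequence[Sequence[float]], k: int
) -> Tuple[List[Tuple[int, int]], List[List[float]]]:
    """Split `series` into k contiguous segments minimizing total squared error.

    Returns half-open (start, end) bounds and one mean vector per segment.
    The mean stands in for a "central pattern" here; that is an assumption.
    """
    n = len(series)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(series)")
    dims = len(series[0])

    # Prefix sums let any segment's cost be computed in O(dims).
    psum = [[0.0] * dims]
    psq = [[0.0] * dims]
    for point in series:
        psum.append([psum[-1][d] + point[d] for d in range(dims)])
        psq.append([psq[-1][d] + point[d] ** 2 for d in range(dims)])

    inf = float("inf")
    # best[m][j]: minimum cost of covering the first j points with m segments.
    best = [[inf] * (n + 1) for _ in range(k + 1)]
    back = [[0] * (n + 1) for _ in range(k + 1)]
    best[0][0] = 0.0
    for m in range(1, k + 1):
        for j in range(m, n + 1):
            for i in range(m - 1, j):
                cost = best[m - 1][i] + _segment_cost(psum, psq, i, j)
                if cost < best[m][j]:
                    best[m][j] = cost
                    back[m][j] = i

    # Walk the back-pointers from the end to recover segment boundaries.
    bounds: List[Tuple[int, int]] = []
    j = n
    for m in range(k, 0, -1):
        i = back[m][j]
        bounds.append((i, j))
        j = i
    bounds.reverse()

    patterns = [
        [(psum[e][d] - psum[s][d]) / (e - s) for d in range(dims)]
        for s, e in bounds
    ]
    return bounds, patterns
```

This sketch runs in O(k n^2) segment evaluations and needs the whole batch in memory, which gives a sense of why a serial, batch-oriented approach of this kind is hard to apply directly to streamed data.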

As the speed of collecting Big Data increases, however, this serial, offline batch-processing algorithm is not fast enough for streamed, real-time multivariate time series data. Current research has developed online streaming segmentation algorithms that can handle streamed-in data immediately, but these algorithms segment the streamed-in time series semantically and do not produce the central patterns usually obtained from offline batch-processing segmentation algorithms.

To address these issues, I implemented an online, parallel algorithm that produces prototypical patterns capturing the overall behavior of streamed-in time series. The algorithm receives streamed-in minibatches as its input, which divide the stream into distinct units of data that can be analyzed efficiently. The method also smooths the transition between the patterns generated across successive minibatches.
