A framework for real-time, autonomous anomaly detection over voluminous time-series geospatial data streams

dc.contributor.author: Budgaga, Walid, author
dc.contributor.author: Pallickara, Shrideep, advisor
dc.contributor.author: Pallickara, Sangmi Lee, advisor
dc.contributor.author: Ben-Hur, Asa, committee member
dc.contributor.author: Schumacher, Russ, committee member
dc.date.accessioned: 2007-01-03T06:37:15Z
dc.date.available: 2007-01-03T06:37:15Z
dc.date.issued: 2014
dc.description.abstract: In this research work we present an approach encompassing both algorithm and system design to detect anomalies in data streams. Individual observations within these streams are multidimensional, with each dimension corresponding to a feature of interest. We consider time-series geospatial datasets generated by remote and in situ observational devices. Three aspects make this problem particularly challenging: (1) the cumulative volume and rates of data arrivals, (2) anomalies evolve over time, and (3) there are spatio-temporal correlations associated with the data. Therefore, anomaly detections must be accurate and performed in real time. Given the data volumes involved, solutions must minimize user intervention and be amenable to distributed processing to ensure scalability. Our approach achieves accurate, high-throughput classifications in real time. We rely on Expectation Maximization (EM) to build Gaussian Mixture Models (GMMs) that model the densities of the training data. Rather than one all-encompassing model, our approach involves multiple model instances, each of which is responsible for a particular geographical extent and can also adapt as data evolves. We have incorporated these algorithms into our distributed storage platform, Galileo, and profiled their suitability through empirical analysis which demonstrates high throughput (10,000 observations per second, per node) and low latency on real-world datasets.
dc.format.medium: born digital
dc.format.medium: masters theses
dc.identifier: Budgaga_colostate_0053N_12527.pdf
dc.identifier.uri: http://hdl.handle.net/10217/83908
dc.language: English
dc.language.iso: eng
dc.publisher: Colorado State University. Libraries
dc.relation.ispartof: 2000-2019
dc.rights: Copyright and other restrictions may apply. User is responsible for compliance with all applicable laws. For information about copyright law, please see https://libguides.colostate.edu/copyright.
dc.subject: big data
dc.subject: clustering
dc.subject: data streams
dc.subject: distributed system
dc.subject: online anomaly detection
dc.subject: time series analytics
dc.title: A framework for real-time, autonomous anomaly detection over voluminous time-series geospatial data streams
dc.type: Text
dcterms.rights.dpla: This Item is protected by copyright and/or related rights (https://rightsstatements.org/vocab/InC/1.0/). You are free to use this Item in any way that is permitted by the copyright and related rights legislation that applies to your use. For other uses you need to obtain permission from the rights-holder(s).
thesis.degree.discipline: Computer Science
thesis.degree.grantor: Colorado State University
thesis.degree.level: Masters
thesis.degree.name: Master of Science (M.S.)
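
Illustrative sketch (not part of the repository record, and not the thesis implementation): the abstract describes fitting Gaussian Mixture Models with Expectation Maximization and keeping one model instance per geographical extent. The Python below shows that general idea for a single feature using only the standard library. Everything named here is an assumption made for illustration: the `RegionalDetector` class, the string region keys, the 1-D restriction, and the rule of flagging observations whose log-density falls below a low quantile of the training scores. It does not use Galileo and does not model the adaptation over time that the abstract mentions.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(xs, k=2, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture to the list xs with plain EM."""
    rng = random.Random(seed)
    n = len(xs)
    grand_mean = sum(xs) / n
    grand_var = sum((x - grand_mean) ** 2 for x in xs) / n or 1.0
    weights = [1.0 / k] * k
    means = rng.sample(xs, k)        # initialise means at random data points
    variances = [grand_var] * k      # start every component at the overall variance
    for _ in range(iters):
        # E-step: responsibility of each component for each observation.
        resp = []
        for x in xs:
            ps = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(ps) or 1e-300
            resp.append([p / total for p in ps])
        # M-step: re-estimate weights, means and variances from responsibilities.
        for j in range(k):
            nj = sum(r[j] for r in resp)
            if nj < 1e-12:
                continue  # leave an effectively empty component unchanged
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nj
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nj
            variances[j] = max(var, 1e-6)  # floor to avoid a collapsed component
    return weights, means, variances


def log_density(x, model):
    """Log of the mixture density at x, floored to avoid log(0)."""
    weights, means, variances = model
    p = sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))
    return math.log(max(p, 1e-300))


class RegionalDetector:
    """One mixture model per region key (hypothetical stand-in for a geographical extent)."""

    def __init__(self, quantile=0.01):
        self.quantile = quantile  # assumed cutoff: lowest 1% of training scores
        self.models = {}          # region -> (weights, means, variances)
        self.thresholds = {}      # region -> log-density cutoff

    def train(self, region, xs, k=2):
        model = fit_gmm_1d(xs, k=k)
        scores = sorted(log_density(x, model) for x in xs)
        self.models[region] = model
        self.thresholds[region] = scores[int(self.quantile * len(scores))]

    def is_anomaly(self, region, x):
        """Flag x when its density under the region's model is below the cutoff."""
        return log_density(x, self.models[region]) < self.thresholds[region]
```

A caller would train one model per region with `detector.train("region_a", readings)` and then score each incoming observation with `detector.is_anomaly("region_a", value)`. Keeping models separate per region means a value that is ordinary in one extent can still be flagged in another.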

Files

Original bundle
Name: Budgaga_colostate_0053N_12527.pdf
Size: 928.86 KB
Format: Adobe Portable Document Format