Supporting scalable data analytics on large linked data

Update Item Information
Publication Type dissertation
School or College College of Engineering
Department Computing
Author Le, Wangchao
Title Supporting scalable data analytics on large linked data
Date 2013-12
Description Linked data are the de-facto standard in publishing and sharing data on the web. To date, we have been inundated with large amounts of ever-increasing linked data in constantly evolving structures. The proliferation of the data and the need to access and harvest knowledge from distributed data sources motivate us to revisit several classic problems in query processing and query optimization. The problem of answering queries over views is commonly encountered in a number of settings, including while enforcing security policies to access linked data, or when integrating data from disparate sources. We approach this problem by efficiently rewriting queries over the views to equivalent queries over the underlying linked data, thus avoiding the costs entailed by view materialization and maintenance. An outstanding problem of query rewriting is the number of rewritten queries is exponential to the size of the query and the views, which motivates us to study problem of multiquery optimization in the context of linked data. Our solutions are declarative and make no assumption for the underlying storage, i.e., being store-independent. Unlike relational and XML data, linked data are schema-less. While tracking the evolution of schema for linked data is hard, keyword search is an ideal tool to perform data integration. Existing works make crippling assumptions for the data and hence fall short in handling massive linked data with tens to hundreds of millions of facts. Our study for keyword search on linked data brought together the classical techniques in the literature and our novel ideas, which leads to much better query efficiency and quality of the results. Linked data also contain rich temporal semantics. To cope with the ever-increasing data, we have investigated how to partition and store large temporal or multiversion linked data for distributed and parallel computation, in an effort to achieve load-balancing to support scalable data analytics for massive linked data.
Type Text
Publisher University of Utah
Subject Database systems; Linked data; Query optimization; Query rewriting; RDF; SPARQL
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Rights Management Copyright © Wangchao Le 2013
Format Medium application/pdf
Format Extent 2,009,415 bytes
Identifier etd3/id/2660
ARK ark:/87278/s6380hw2
Setname ir_etd
ID 196235
Reference URL https://collections.lib.utah.edu/ark:/87278/s6380hw2