Automatic catalog construction for product search engines

Publication Type dissertation
School or College College of Engineering
Department Computing
Author Nguyen, Hoa Thanh
Title Automatic catalog construction for product search engines
Date 2011-12
Description With the steady increase in online shopping, more and more consumers are resorting to Product Search Engines and shopping sites such as Yahoo! Shopping, Google Product Search, and Bing Shopping as their first stop for purchasing goods online. These sites act as intermediaries between shoppers and merchants to drive user experience by enabling faceted search, comparison of products based on their specifications, and ranking of products based on their attributes. The success of these systems heavily relies on the variety and quality of the products that they present to users. In that sense, product catalogs are to online shopping what the Web index is to Web search. Therefore, comprehensive product catalogs are fundamental to the success of Product Search Engines. Given the large number of products and categories, and the speed at which they are released to the market, constructing catalogs and keeping them up to date becomes a challenging task, calling for automated techniques that do not rely on human intervention. The main goal of this dissertation is to automatically construct catalogs for product search engines. To achieve this goal, the following problems must be addressed by these search engines: (i) product synthesis: creation of product instances that conform with the catalog schema; (ii) product discovery: derivation of product instances for products whose schemata are not present in the catalog; (iii) schema synthesis: construction of schemata for new product categories. We propose an end-to-end framework that automates, to a great extent, these tasks. We present a detailed experimental evaluation using real data sets, which shows that our framework is effective, scaling to a large number of products and categories, and resilient to noise that is inherent in Web data.
Type Text
Publisher University of Utah
Subject Database; Data integration; Deep web; E-commerce; Machine learning; Schema matching
Dissertation Institution University of Utah
Dissertation Name Doctor of Philosophy
Language eng
Rights Management Copyright © Hoa Thanh Nguyen 2011
Format Medium application/pdf
Format Extent 2,772,263 bytes
Identifier us-etd3,75221
Source Original housed in Marriott Library Special Collections, TK7.5 2011 .N48
ARK ark:/87278/s6tx3w3h
Setname ir_etd
ID 194311
Reference URL https://collections.lib.utah.edu/ark:/87278/s6tx3w3h