
Representing the trajectory of a moving object more accurately with a fewer number of reported points is a crucial issue in designing MOD servers because the frequency of trajectory updates is a critical factor in determining the performance of a real-time MOD server. Conventional line-based models using a linear function create trajectories that have angles at reported (factual) locations. Thus it does not represent well the smooth trajectories of moving objects, especially with momentum. Our proposed trajectory representation models harness a higher degree polynomial to draw more accurate trajectories in which not only locations but also some important derivatives (e.g., velocity and acceleration) change smoothly. Our experimental results demonstrate the superiority of the proposed curve-based models over the conventional line-based ones in 1) tracking the actual trajectories, 2) reducing the number of reporting points (trajectory updates).
To support large-scale continuously changing data object (CCDO) applications, one requires a data management system that can store, update, and retrieve CCDOs. Each CCDO has both multidimensional-temporal (i.e., 2 or 3D geographic space or other information dimensions that vary with time) properties representing its continuous trajectory in an information space-time continuum as well as non-temporal properties such as identification, associated phone number, and address. Importantly, although CCDOs can continuously move or change (thus drawing continuous trajectories in a space-time), computer systems cannot deal with continuously occurring infinitesimal changes - this would effectively require infinite computational speed and sensor resolution. Thus, each object's continuously changing attribute values (e.g., location, velocity, and acceleration) can only be discretely updated. Hence, they are always associated with a degree of uncertainty, especially when there is a considerable time gap between two updated points. In this research, we proposed a novel and practical framework for managing multidimensional CCDOs (i.e., spatiotemporal trajectory representation and processing). The proposed spatiotemporal trajectory model, called the Tornado Model, can more efficiently support both conventional CCDOs that move in a 2- or 3-dimensional geographic space and emerging high-dimensional CCDOs, such as combined sensor streams and satellite data. Based on the proposed spatiotemporal trajectory modeling framework, one can significantly enhance the performance of spatiotemporal query processing.
In recent years, an increasing number of database applications deal with large sets of multidimensional data. In these applications, the performance of the query system is heavily dependent on the available access methods (such as R-trees) and the underlying disk system. Unlike traditional disk models, recently developed disk models provide multiple zones in a disk, where seek times and data transfer rates vary significantly across the zones. Although efficient index optimizations, such as bulk-loading and packing algorithms, have been developed for large data, there is a marked lack of investigation on how to optimize multidimensional access methods given a zoned disk model. We proposed novel zoned access techniques that can significantly enhance the query performance of various access methods by taking into account of the different page access times of the disk zones. The proposed index zoning algorithm can zone virtually any hierarchical (tree-type) index structure in such a way that all disk zone spaces are equally utilized and that more frequently accessed pages are stored in a faster zone. Then the proposed generic query algorithm makes each search be a sequence of zone-bursts, where each involved zone is intensively accessed during a certain amount of time, greatly reducing disk seek times. In our experiments, the proposed zoned access technique significantly improved the query performance of the R*-tree (i.e., 4-5 times faster query processing). We first started to apply our approaches to static database where data change doesn't happen or infrequently occur. Later we extended our approaches to dynamic database applications, where data insertions and deletions frequently occur at run time.
Access to a large volume of publicly available geospatial data on the web is hindered due to their restricted web interfaces. A typical scenario is the existence of numerous business web sites that provide the address of their branch locations through a limited "nearest location" web interface. For example, a chain restaurant's web site such as McDonalds can be queried to find some of the closest locations of its branches to the user's home address. However, even though the site has the location data of all restaurants in, for example, the state of California, the provided web interface makes it very difficult to retrieve this data set. We conceptualize this problem as a more general problem of running spatial range queries by utilizing only k-Nearest Neighbor (k-NN) searches. Subsequently, we propose a set of algorithms to cover the rectangular shape of the spatial range query with as few k-NN circles as possible.
Many Geographic Information Systems (GIS) handle large geospatial datasets stored in raster representation. Spatial joins over raster data are important queries in GIS for data analysis and decision support. However, evaluating spatial joins can be very time intensive due to the size of these datasets. In this paper we propose a new interactive framework that allows users to get approximate answers in near instantaneous time, thus allowing for truly interactive data exploration. Our method utilizes two proposed statistical approaches: probabilistic join and sampling based join. Our probabilistic join method provides speedup of two orders of magnitude with no correctness guarantee, while our sampling based method provides an order of magnitude improvement over the full quad-tree join and also provides running conffidence intervals. We propose a framework that combines the two approaches to allow end users to tradeoff speed versus bounded accuracy.
For many years, I have researched on the design and performance enhancement of continuous media servers, such as video servers, using data placement and retrieval scheduling techniques on disk subsystems. I have developed a series of techniques to support a hiccup-free display of video objects using modern multi-zone disk drives, where these disks are partitioned into multiple zones with different data transfer rates to maximize their storage capacity. However, the variable transfer rate of a disk drive has been problematic in guaranteeing real-time display of continuous media data. Naive approaches utilize only the minimum transfer rate of a disk to guarantee a hiccup-free display of continuous media. My techniques harness the average transfer rate of each disk (instead of its minimum), resulting in a significant improvement in the system throughput and, subsequently, a lower system cost. I produced many publications in this field and my work in this field was well organized and discussed in the textbook, Streaming Media Server Design by Prentice Hall PTR, 2003, that I co-authored with Ali Dashti, Cyrus Shahabi, and Roger Zimmermann.
Z-RAID is a data placement scheme to optimize data transfer rate of RAID (Redundant Array of Inexpensive Disks) system using multi-zone disks by constraining data placement, especially for streaming media server that require a high data transfer rate as well as fault tolerance. The main ideas of the proposed constrained data placement are 1) to store primary data blocks (for normal access) in faster zones and secondary blocks (for standby in case of a disk failure) in slower zones, and 2) to store frequently accessed data blocks (such as popular video clips) in faster zones and infrequently accessed blocks in slower zones. Its initial work was published in the proceedings of the 10th International Conference on Database Systems for Advanced Applications (DASFAA 2005), and its completed and extended journal version appears in the ACM Transactions on Storage (2007).
Another research interest is on latency-sensitive interactive applications such as interactive training systems or digital entertainment. In those applications, latency issues are more critical than system throughput which has traditionally been the main focus of continuous media server research. It has been known to be challenging for a server to achieve less than a second startup latency while supporting many simultaneous continuous displays, especially with user interaction. I have studied a deadline-driven retrieval scheduling with bulk prefetching to minimize startup latency (both average and variance). Combined with a random data placement, a less than a second startup latency can be achieved with a very low hiccup probability while maintaining a high throughput.
My research also includes various data replication techniques to achieve a further reduction in startup latency as well as fault tolerant design that is an essential feature in large scale servers. Especially, selective replication based on the popularity of multimedia objects provides a better performance and more cost-effective configuration of multimedia servers with limited system resources. My research on data replication has been extended to a more general reconfiguration problem that naturally arises in real large scale continuous media services where users' requests and popularity of objects change over time. Online reconfiguration of data objects following changing popularity usually requires time and disk bandwidth resulting in a degradation of the system performance during the process. I studied the optimum number of replicas based on varying popularity of objects and quantified the expected startup latency and reconfiguration overhead. After that, I devised an algorithm to control the reconfiguration process as a trade off between the reconfiguration overhead and the potential performance gain.