Future Research Directions

In the short term, I intend to extend my previous work on dynamic content sites based on local-area clusters. More work needs to be done to make such sites less dependent on administrator intervention (self-managing, self-tuning) and to improve their performance. To improve performance, I would like to investigate two techniques: integration of scheduler and database support for replication, and integration of caching and clustering in dynamic content sites. In the long term, I intend to explore wide-area replication of data onto smart web proxies. In this context, new trade-offs and consistency issues arise for caching, replication, and load distribution. Any of these research directions could become either a Master's or Ph.D. project for interested students.

Towards Self-Managing, Self-Tuning Dynamic Content Servers

Scaling and availability for web sites serving dynamic content have been an active focus of recent research. However, bursts of dynamic content traffic created by flash crowds can still overload the server, so much so, in fact, that the server appears unavailable to clients. Gross hardware over-provisioning may be possible for very large sites, but it is infeasible for smaller ones. Moreover, even with careful studies of past peak loads, future demand may exceed expectations and overwhelm the site (the so-called Slashdot effect). Finally, the inherent complexity of the multi-tiered architecture of such sites makes it difficult to predict exactly where the bottleneck will be for a particular workload at each point in time. Hence, over-provisioning is currently needed within all tiers of a multi-tiered dynamic content site. To address these problems, I intend to explore adaptive techniques for better utilization of all resources available to the dynamic content server. In particular, I intend to explore the following mechanisms, which could be added to current dynamic content sites transparently, with no client-side modifications:

  • A transparent caching scheme based on automatic program analysis
  • Dynamic resource re-assignment across different applications and among the different tiers of the dynamic content web site to handle transient bottlenecks in an individual application or tier (a sketch of such a controller follows below)
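
To make the second mechanism concrete, the following is a minimal sketch, in Python, of the kind of control loop I have in mind: it watches per-tier utilization and shifts spare machines toward a transiently bottlenecked tier. The monitoring hook (get_utilization), the actuation hook (reassign_machine), and the threshold values are all hypothetical placeholders for site-specific mechanisms, not an existing implementation.

    import time

    TIERS = ["web", "application", "database"]
    HIGH, LOW = 0.85, 0.30   # utilization thresholds (assumed values)

    def get_utilization(tier):
        """Placeholder: average utilization of a tier's machines, in [0, 1]."""
        raise NotImplementedError

    def reassign_machine(from_tier, to_tier):
        """Placeholder: repurpose one machine from one tier to another."""
        raise NotImplementedError

    def pick_donor(pools, bottleneck):
        """Choose an underloaded tier that can spare a machine, if any."""
        for tier in TIERS:
            if tier != bottleneck and pools[tier] > 1 and get_utilization(tier) < LOW:
                return tier
        return None

    def control_loop(pools, period=10):
        """pools maps each tier name to its current number of machines."""
        while True:
            for tier in TIERS:
                if get_utilization(tier) > HIGH:        # transient bottleneck
                    donor = pick_donor(pools, tier)
                    if donor is not None:
                        reassign_machine(donor, tier)
                        pools[donor] -= 1
                        pools[tier] += 1
            time.sleep(period)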

Integration of Scheduler and Database Support for Replication

I am interested in defining the database features, interfaces, and protocols necessary for a closer integration between the scheduler and the database's consistency maintenance policies in replicated database clusters.

Up to now, I have studied only scheduler policies that reduce consistency maintenance overheads, while the database has remained unchanged. I believe that further improvements are possible if the database is no longer treated as a black box, even if only minimal changes are made to the database engine. In particular, a promising avenue is combining the scheduler's conflict avoidance with fine-grained concurrency control, such as multiversioning, at the database.
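
As a first illustration, consider a sketch, under assumed interfaces, of how the scheduler could exploit multiversioning at the database: update transactions are applied to all replicas in commit order, and each read-only transaction is routed to a replica that has already applied every update the read must observe, where the engine's multiversioning then serves it a consistent snapshot without blocking concurrent updates. All of the names below (Replica, applied_version, and so on) are hypothetical.

    class Replica:
        """One database replica; applied_version is the last update applied."""
        def __init__(self, name, engine):
            self.name = name
            self.engine = engine        # local multiversioned database engine
            self.applied_version = 0

        def apply_update(self, txn, version):
            self.engine.execute(txn)    # the engine versions the new data
            self.applied_version = version

    class Scheduler:
        def __init__(self, replicas):
            self.replicas = replicas
            self.committed_version = 0

        def execute_update(self, txn):
            # conflict avoidance: updates are serialized here, then applied
            # to every replica in the same (commit) order
            self.committed_version += 1
            for r in self.replicas:
                r.apply_update(txn, self.committed_version)

        def route_read(self, txn):
            # the read must observe all updates committed before it arrived
            needed = self.committed_version
            fresh = [r for r in self.replicas if r.applied_version >= needed]
            if not fresh:
                raise RuntimeError("no replica fresh enough; the read must wait")
            return fresh[0]             # a real scheduler would also balance load

The point of the combination is that reads never block behind update transactions: multiversioning lets a replica serve the required snapshot even while later updates are being applied to it.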


Integration of Caching and Clustering in Dynamic Content Sites

While caching techniques for a standalone dynamic content web server have proven to be beneficial, such techniques increase CPU contention and memory pressure at the Web server. In a cluster with multiple Web servers, it seems natural to try to aggregate their memories and CPUs into what logically appears as a single larger cache.

Two problems need to be solved to accomplish this goal. First, consistency needs to be maintained between the different caches, and between the caches and the back-end databases. Second, an incoming request needs to be directed to the appropriate server, to maximize the chance that its response is found in that server's cache.
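
A minimal sketch of one possible answer to the second problem, assuming a hypothetical router component in front of the cluster: each cacheable request is hashed to an "owner" server, so that every result is cached on exactly one node and the aggregate cache capacity approaches the sum of the nodes' memories. The request object's cache_key and the servers' evict method are assumptions, and the modulo-based hash is a simplification; a real router would use consistent hashing to tolerate membership changes.

    import hashlib

    class ClusterCacheRouter:
        def __init__(self, servers):
            self.servers = servers

        def owner(self, cache_key):
            """Hash-partition the logical cache across the web servers."""
            h = int(hashlib.sha1(cache_key.encode()).hexdigest(), 16)
            return self.servers[h % len(self.servers)]

        def route(self, request):
            # send the request where its result, if cached, must reside
            return self.owner(request.cache_key)

        def invalidate(self, cache_key):
            # on a database update, only the owning server needs to be told,
            # which also simplifies the first problem, consistency maintenance
            self.owner(cache_key).evict(cache_key)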


Consistency Issues for Wide-Area Replication on Web Proxies

Currently, state-of-the-art web proxies (i.e., machines at the client edge of the network), with very limited exceptions, cache only static web content and media streams produced by origin servers. Now let us assume that the proxies at the client edge are trusted machines, either because they belong to a corporate content delivery network (CDN) operated by the content provider itself, or because they belong to a CDN operator under contract with the content provider. In this context, I intend to investigate moving some of the functionality from the server machines at the data center clusters to proxies at the client edge of the network. My goal here is to overcome limitations inherent in a single-site server (cluster), including long network latencies, network disconnection, and limits on the scalability of any single cluster.

In this design, the proxies will be capable of executing application logic and of storing persistent data belonging to that application in a local file system or a local relational database. A proxy may simply cache query results in its file system and operate as a dynamic content cache, or it may operate by means of SQL queries on a local view of the database, just like the main server.
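
The first mode of operation, a proxy-side query-result cache, might look like the following sketch. It caches SELECT results by query text and invalidates all cached results that read from a table whenever that table is updated; the explicit table lists and the origin connection object are assumptions standing in for real query analysis and connectivity.

    class QueryResultCache:
        def __init__(self, origin):
            self.origin = origin        # connection to the origin database
            self.results = {}           # query text -> cached result
            self.readers = {}           # table name -> queries that read it

        def select(self, query, tables):
            """Serve a read from the local cache, fetching on a miss."""
            if query not in self.results:
                self.results[query] = self.origin.execute(query)
                for t in tables:
                    self.readers.setdefault(t, set()).add(query)
            return self.results[query]

        def update(self, statement, table):
            """Write through to the origin and invalidate dependent reads."""
            self.origin.execute(statement)
            for q in self.readers.pop(table, set()):
                self.results.pop(q, None)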

While consistent replication and caching can be done transparently in a local cluster, more sophisticated methods may be needed in a geographically distributed system, because of problems of latency, bandwidth, and possible disconnected operation.
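
One candidate method, sketched below under assumed interfaces, is to replace synchronous invalidation, which is too expensive over the wide area, with asynchronous update propagation plus a per-application staleness bound: a proxy answers from its local replica only while its view of the origin is recent enough, and falls back to the origin otherwise.

    import time

    class BoundedStalenessProxy:
        def __init__(self, origin, local_db, bound_seconds):
            self.origin = origin          # connection to the origin server
            self.local_db = local_db      # local replica or cache
            self.bound = bound_seconds    # per-application staleness bound
            self.last_sync = 0.0

        def on_sync(self):
            """Called when an asynchronous update batch from the origin
            has been applied to the local replica."""
            self.last_sync = time.time()

        def read(self, query):
            if time.time() - self.last_sync <= self.bound:
                return self.local_db.execute(query)     # fresh enough
            return self.origin.execute(query)           # too stale: go remote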


Load Distribution onto Proxies

In CDNs for static content, a client request is redirected to a nearby proxy, where nearness is often equated with physical proximity in the network (for instance, the number of hops). Load on the network may also be taken into account, but the proxies themselves are always assumed to be unloaded, and load on the proxy is not taken into account. For static content proxies this is a reasonable assumption, but it may not hold for a proxy that executes application logic and contains persistent data. I intend to develop a load balancing algorithm that also takes proxy load into account. In such an algorithm, request distribution may still be driven primarily by locality, but load balancing considerations will take over in the case of overload.
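
The following is a minimal sketch of such an algorithm, with the proximity and load metrics and the threshold as assumed inputs: requests go to the nearest proxy as long as it is below an overload threshold, and only then do load balancing considerations take over.

    OVERLOAD = 0.9   # assumed utilization threshold

    def select_proxy(client, proxies, distance, load):
        """distance(client, p): network proximity (e.g., hop count);
        load(p): current utilization of proxy p, in [0, 1]."""
        ranked = sorted(proxies, key=lambda p: distance(client, p))
        for p in ranked:                  # locality first ...
            if load(p) < OVERLOAD:        # ... unless the proxy is overloaded
                return p
        return min(ranked, key=load)      # all overloaded: pick least loaded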

In this context, I also intend to explore a somewhat less intuitive idea. Currently, Internet data centers over-provision hardware to cover overload periods created by flash crowds. However, since some of the computation necessary for dynamic content generation is expensive, it may pay off to execute such computation remotely, on a trusted server or proxy, temporarily off-loading the main server during the load spike.
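
A sketch of the per-request decision the main server would make, with the load probe, the cost estimate, and the remote-execution hook all as hypothetical placeholders:

    SPIKE_THRESHOLD = 0.9    # local utilization above which to off-load (assumed)
    EXPENSIVE_COST = 0.05    # estimated CPU-seconds above which off-loading pays

    def handle_request(request, local, remote):
        """Off-load expensive dynamic content generation to a trusted
        remote server or proxy while the local site is in a load spike."""
        if (local.utilization() > SPIKE_THRESHOLD
                and local.estimate_cost(request) > EXPENSIVE_COST):
            return remote.execute(request)    # temporary off-loading
        return local.execute(request)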