Research in Large Datacenters and Cloud


We propose OX, a runtime system that shields applications in shared data centers from network congestion and failures.

OX enables customers to deploy network-intensive data analytics frameworks within existing infrastructures by protecting co-hosted, QoS-constrained applications from network interference and performance degradation.

Moreover, OX reduces the vulnerability of all applications to hardware failures, such as rack power outages. OX transparently discovers application topologies by monitoring network traffic among application components (virtual machines).

In addition, OX allows application owners to specify groups of highly available virtual machines, according to component roles and replication semantics.

Based on this information, OX builds on-line topology graphs for applications and incrementally partitions these graphs across the infrastructure to optimize communication between virtual machines and enforce availability constraints. We show the benefits of OX in a realistic shared data center setting using a mix of Hadoop and YCSB/Cassandra workloads.
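The idea of building a topology graph from observed traffic and placing virtual machines under availability constraints can be illustrated with a minimal sketch. This is not OX's actual algorithm: the function names (build_topology, place_vms), the flow tuple format, the per-rack capacity, and the one-shot greedy heuristic are all assumptions made for illustration, and the sketch places VMs once rather than repartitioning incrementally.

```python
from collections import defaultdict


def build_topology(flows):
    """Aggregate observed (src_vm, dst_vm, bytes) flows into an
    undirected, weighted communication graph (hypothetical format)."""
    graph = defaultdict(lambda: defaultdict(int))
    for src, dst, nbytes in flows:
        if src == dst:
            continue
        graph[src][dst] += nbytes
        graph[dst][src] += nbytes
    return graph


def place_vms(graph, vms, racks, capacity, ha_groups):
    """Greedy placement (an illustrative heuristic, not OX's method).

    Each VM goes to the rack that already hosts the peers it exchanges
    the most traffic with, skipping racks that are full or that already
    host another member of the same high-availability group.
    """
    group_of = {vm: gid for gid, members in enumerate(ha_groups) for vm in members}
    placement = {}
    load = {rack: 0 for rack in racks}

    # Place the heaviest communicators first.
    order = sorted(vms, key=lambda vm: -sum(graph[vm].values()))
    for vm in order:
        best_rack, best_score = None, -1
        gid = group_of.get(vm)
        for rack in racks:
            if load[rack] >= capacity:
                continue  # rack is full
            if gid is not None and any(
                placement.get(peer) == rack for peer in ha_groups[gid] if peer != vm
            ):
                continue  # would co-locate two replicas on one rack
            # Traffic that would stay inside the rack if vm is placed here.
            score = sum(w for peer, w in graph[vm].items() if placement.get(peer) == rack)
            if score > best_score:
                best_rack, best_score = rack, score
        if best_rack is None:
            raise ValueError(f"no feasible rack for {vm}")
        placement[vm] = best_rack
        load[best_rack] += 1
    return placement


# Hypothetical example: two web/database pairs, with replicas kept apart.
flows = [("web1", "db1", 100), ("web2", "db2", 100), ("db1", "db2", 10)]
graph = build_topology(flows)
placement = place_vms(
    graph,
    vms=["web1", "web2", "db1", "db2"],
    racks=["r1", "r2"],
    capacity=2,
    ha_groups=[["db1", "db2"], ["web1", "web2"]],
)
```

In this example each web VM ends up on the same rack as the database it talks to, while the two database replicas and the two web replicas land on different racks, so a single rack outage cannot take down a whole group.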




Adaptive Distributed Data Store

Cloud computing technologies have developed rapidly in recent years and continue to prosper. Many Cloud providers, e.g., Amazon EC2 and RackSpace, have built their own Cloud service interfaces, tools, and rules, and require users to access their services through their own APIs. The number of Cloud users keeps growing; at the same time, many new demands for Cloud services are emerging. For example, to achieve high availability, users may want to deploy their services on multiple Clouds, avoiding the risk of relying on a single Cloud provider.

We believe that demand for new Cloud features will keep increasing over time. However, it may take a long time for commercial Clouds to develop these new features and make them public.

We propose to build a flexible Cloud management tier above the various existing commercial and private Clouds. It interconnects the different types of Clouds and supports fast experimentation with, and deployment of, new Cloud services.


Enhancing Application Robustness in Infrastructure-as-a-Service Clouds

Authors: Madalin Mihailescu, Andres Rodriguez, Dmitrijs Palcikovs, Gabriel Iszlai, Andrew Trossman, Joanna Ng, and Cristiana Amza.

In the IBM Conference of the Centre for Advanced Studies on Collaborative Research (CASCON), Toronto, Ontario, Canada, November 2011. Best paper award.

Enhancing Application Robustness in Infrastructure-as-a-Service Clouds

Authors: Madalin Mihailescu, Andres Rodriguez, and Cristiana Amza.

In the First International Workshop on Dependability of Clouds, Data Centers and Virtual Computing Environments (DCDV 2011), in conjunction with the 41st IEEE/IFIP DSN 2011, June 2011. (Also a poster at NSDI 2011.)

SLIM: Network Decongestion for Storage Systems

Authors: Madalin Mihailescu, Gokul Soundararajan, and Cristiana Amza.

In the Second Workshop on I/O Virtualization (WIOV), San Jose, CA, USA, February 2010.


Madalin Mihailescu

Saeed Ghanbari

Jin Chen

Prof. Amza