I am interested in experimental research on large-scale distributed systems, such as clusters, grids, and clouds, with a particular focus on the performance and reliability of such systems.

Big Data Research

Contributed to the design of a dynamic resource allocation mechanism that balances the allocations of multiple MapReduce framework instances to provide each instance with a similar level of quality of service. (ACM SIGMETRICS'14)
Designed and analyzed various machine learning-based performance models to automatically tune Hadoop MapReduce applications. (MASCOTS'13, Best Paper Award)
Contributed to the design of a system to deploy multiple MapReduce framework instances to multi-cluster systems. (MTAGS'12 @ SuperComputing, Best Paper Award)
Designed and assessed several heuristics for energy-efficient scheduling of Hadoop MapReduce workloads on heterogeneous clusters comprising low-power and high-performance processors. (GCM'11 @ ACM/IFIP/USENIX Middleware)

Cloud Computing Research

Evaluated the variability of the performance delivered by production cloud services using the traces that we collected from Amazon Web Services and Google App Engine. (CCGRID'11)
Evaluated the performance of four production clouds, Amazon Elastic Compute Cloud (EC2), Mosso, Elastic Hosts, and GoGrid, using scientific computing workloads. (IEEE TPDS, Biggest Impact Award)
Developed a framework, C-Meter, for running performance tests on IaaS clouds. (Cloud'09)

Grid Computing Research

Investigated various overload control techniques for multi-cluster computer systems through extensive experiments in the DAS-3 multi-cluster grid in the Netherlands. (GRID'11)
Investigated the performance of dynamic workflow scheduling policies with both realistic simulations and real system experiments in the DAS-3 multi-cluster grid. (HPDC'10)
Investigated the impact of static and dynamic overprovisioning techniques on the performance consistency of Bag-of-Tasks workloads. (Grid'10)
Evaluated the performance and benefits of job runtime and queue wait time predictions in grids using traces gathered from various research and production grid environments. (HPDC'09)

Failure Characterization in Large-Scale Distributed Systems

Proposed a model for groups of failures (space-correlated failures) in large-scale distributed systems using fifteen traces from the Failure Trace Archive. (EuroPar'10)
Investigated time-correlated failures (their autocorrelation, periodicity, and peaks) in large-scale distributed systems and proposed a model for them using nineteen traces from the Failure Trace Archive. (Grid'10)

Incremental Placement of Interactive Perception Applications

This research was done in the context of the SLIPStream project while I was an Intel/CMU Summer Research Fellow at Intel Research Labs, Pittsburgh.
Designed and implemented algorithms for dynamic and incremental placement of perception applications to minimize the end-to-end latency subject to capacity, placement, and migration cost constraints. (HPDC'11)

Nezih Yigitbasi
