About me
I am a professor in the Electrical and Computer Engineering Department and (by courtesy) the Department of Computer Science at the University of Toronto. I received my Ph.D. from the Computer Science Department of the University of Illinois at Urbana-Champaign under the supervision of a great advisor, Yuanyuan Zhou. I was also a visiting PhD student in the awesome Systems and Networking group at the University of California, San Diego. My CV is here.
(New) My PhD student Xiang (Jenny) Ren is on the academic job market! She is GREAT!
I founded a startup company called YScope with my PhD students so that our research can have real-world impact. Check out CLP, an open-source tool that can compress text logs and search the compressed logs without decompressing them. This Uber Engineering Blog post describes a deployment case study of CLP.
My research interest is systems software, with a focus on developing practical solutions to improve the availability and performance of large software systems.
I am a Canada Research Chair in Systems Software and a recipient of the McCharles Prize for Early Career Research Distinction. I have also received several teaching awards, including the Gordon Slemon Award and the Student Choice Award (upper-year instructor) of the Faculty of Engineering. I am the vice-chair of ACM SIGOPS.
I am looking for self-motivated students to work with me. If you are interested, please submit your application here.
News
- Hacker News [1], [2]
- Discussions from HBase developers, which prompted a series of reactions to address the problems we mentioned in the paper.
- Twitter discussions: see this, this, and this (if you're looking for a screenshot that summarizes our paper, see this or this).
- Blogs: the morning paper (which also listed it as a highlight of 2016), It Will Never Work In Theory, Another word for it, Metadata, Fifty Quick Ideas to Improve Your Tests, Postmortem lessons, and some discussions on Google+.
- And quite a few emails from developers...
Selected publications
- μSlope: High Compression and Fast Search on Semi-Structured Logs. In the Proceedings of the 18th USENIX Symposium on Operating Systems Design and Implementation (OSDI'24), July 2024. Pages 529-544. [Code]
- Relational Debugging -- Pinpointing Root Causes of Performance Problems. In the Proceedings of the 17th USENIX Symposium on Operating Systems Design and Implementation (OSDI'23), July 2023. Pages 65-80. [SIGOPS Blog, Code]
- Investigating Managed Language Runtime Performance: Why JavaScript and Python are 8x and 29x slower than C++, yet Java and Go can be Faster? In the Proceedings of the 2022 USENIX Annual Technical Conference (ATC'22), July 11-13, 2022. Pages 835-852. [USENIX ;login: article] [Code]
- Hubble: Performance Debugging with In-Production, Just-In-Time Method Tracing on Android. In the Proceedings of the 16th USENIX Symposium on Operating Systems Design and Implementation (OSDI'22), July 11-13, 2022. Pages 787-803.
- ctFS: Replacing File Indexing with Hardware Memory Translation through Contiguous File Allocation for Persistent Memory. In the Proceedings of the 20th USENIX Conference on File and Storage Technologies (FAST'22), February 22-24, 2022. Best Paper Award runner-up. [ACM Transactions on Storage article] [USENIX ;login: article] [Code]
- Understanding and Detecting Software Upgrade Failures in Distributed Systems. In the Proceedings of the 28th ACM Symposium on Operating Systems Principles (SOSP'21), October 25-28, 2021. [Code]
- CLP: Efficient and Scalable Search on Compressed Text Logs. In the Proceedings of the 15th USENIX Symposium on Operating Systems Design and Implementation (OSDI'21), July 14-16, 2021. Pages 183-198. [Code]
- M3: End-to-End Memory Management in Elastic Systems Software Stack. In the 16th ACM European Conference on Computer Systems (EuroSys 2021), April 2021. Pages 507-522. [Code]
- The Inflection Point Hypothesis: A Principled Debugging Approach for Locating the Root Cause of a Failure. In the 27th ACM Symposium on Operating Systems Principles (SOSP’19), October 2019, Huntsville, Ontario, Canada. [Press: The morning paper] [USENIX ;login: article]
- An Analysis of Performance Evolution of Linux's Core Operations. In the 27th ACM Symposium on Operating Systems Principles (SOSP’19), October 2019, Huntsville, Ontario, Canada. [Press: The morning paper] [Code]
- Log20: Fully Automated Optimal Placement of Log Printing Statements under Specified Overhead Threshold. In the 26th ACM Symposium on Operating Systems Principles (SOSP’17), October 2017, Shanghai, China. [Press: The morning paper][Code][Impact: licensed by Netflix]
- Pensieve: Non-Intrusive Failure Reproduction for Distributed Systems using the Event Chaining Approach. In the 26th ACM Symposium on Operating Systems Principles (SOSP’17), October 2017, Shanghai, China.
- Non-intrusive Performance Profiling of Entire Software Stacks based on the Flow Reconstruction Principle. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), November 2016, Savannah, GA.
- Don't Get Caught In the Cold, Warm-up Your JVM: Understand and Eliminate JVM Warm-up Overhead In Data-parallel Systems. Proceedings of the 12th USENIX Symposium on Operating Systems Design and Implementation (OSDI’16), November 2016, Savannah, GA. [Press: Invited publication: USENIX ;login: 42(1), The Next Platform][Code]
- Simple Testing Can Prevent Most Critical Failures: An Analysis of Production Failures in Distributed Data-intensive Systems. Proceedings of the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14), October 2014, Broomfield, CO.
- lprof: A Non-intrusive Request Flow Profiler for Distributed Systems. In the 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI’14), October 2014, Broomfield, CO. *: Equally contributed.
- Do Not Blame Users for Misconfigurations. Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP’13), November 2013.
- Be Conservative: Enhancing Failure Diagnosis with Proactive Logging. Proceedings of the 9th ACM/USENIX Symposium on Operating Systems Design and Implementation (OSDI’12), Hollywood, CA, October 2012.
- Improving Software Diagnosability via Log Enhancement. ACM Transactions on Computer Systems (TOCS), February 2012. Fast-forwarded from ASPLOS'11.
- SherLog: Error Diagnosis by Connecting Clues from Run-time Logs. In the Proceedings of the 15th International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS’10), pages 143-154, Pittsburgh, PA, March 2010.
- /* iComment: Bugs or Bad Comments? */ In the Proceedings of the 21st ACM Symposium on Operating Systems Principles (SOSP’07), pages 145-158, October 2007.
Full publication list
Group
It is fun to work with the following incredible people:
Post-doc:
- Huangshi Tian
Graduate students:
- Rishikesh Devsot
- Adrian Chiu
- Devin Gibson
- Zhuqi Jin
- Ruibin Li
- Zhihao Lin
- Jack Luo
- Xiang (Jenny) Ren
- Sitao Wang
- Xiaochong Wei
- Haiqi Xu
- Yi Fan Yu
Alumni:
- Yongle Zhang, PhD 2020, First Employment: Assistant Professor, Department of Computer Science, Purdue University. Winner of the SIGOPS Dennis M. Ritchie Thesis Award.
- Xu Zhao, PhD 2021, First Employment: Research Scientist@Facebook. Recipient of a Facebook Fellowship.
- Kirk Rodrigues, PhD 2023, First Employment: Co-founder of YScope.
- David Lion, PhD 2023, First Employment: Co-founder of YScope.
- Hailong Sun, visiting scholar, now Professor at Beihang University.
- Serhei Makarov, Master of Applied Science, now at Red Hat.
- Rui Wang, Master of Applied Science 2023, now at YScope.
- Muhammad FaizanUllah (Undergraduate thesis) -> Microsoft
- Neil Newman (Undergraduate thesis) -> graduate school@UBC
- Alan Chung (Undergraduate thesis)
Teaching
- ECE344 Operating Systems: [Winter24][Winter23][Winter22][Winter21][Winter20][Winter18][Winter17][Winter16][Winter15][Winter14][Winter13]
- ECE454 Computer Systems Programming: [Fall18][Fall14][Fall13]
- ECE244 Programming Fundamentals: [Fall22][Fall17][Fall16]
- ECE1759 Graduate OS: [Fall23][Fall22][Fall21][Fall20][Fall17][Fall16][Fall14]
Program committee
- 2024: SOSP, OSDI
- 2023: SOSP, OSDI, EuroSys, EuroSys Poster (PC co-chair)
- 2022: OSDI
- 2021: OSDI, SOSP, HAOC, ASPLOS
- 2020: OSDI, NSDI
- 2019: HotOS (PC Co-chair with Jinyang Li), APSys (PC Co-chair with Yu Hua)
- 2018: OSDI, EuroSys, ASPLOS (ERC)
- 2017: SOSP, Student Research Competition@SOSP'17 (chair)
- 2016: ASPLOS (also chair of the poster and lightning sessions)
- 2015: USENIX Annual Technical Conference, USENIX LISA, SOSP (poster PC)
- 2014: OSDI (external review committee), USENIX Annual Technical Conference, SIGMETRICS, USENIX ICAC
- 2012: USENIX Workshop on Managing Systems Automatically and Dynamically
Misc
I play a lot of sports, including basketball, skiing, swimming, and running. I was the captain of Beihang's CSE basketball team when I was an undergrad, and co-captain of the UIUC CS faculty & grad-student basketball team in the intramural games. I have also run several marathons and half-marathons (see a not-so-recent photo here). When I have more time, I also play the accordion and piano.