FORCOLAB — Publications

2026

7 papers

TOSEM 2026

When Tools Overlook Domain Knowledge: An Empirical Study of Refactoring in Scientific Software

R. Pudari, A. M. Awon, N. Ernst, and S. Zhou

ACM Transactions on Software Engineering and Methodology (TOSEM) 2026

Abstract

Refactoring is a critical process for improving code quality, but anecdotal evidence has shown that refactoring in scientific software (Sci-SW) is not always feasible. The inherently exploratory nature of Sci-SW development, characterized by evolving requirements and limited adoption of traditional software engineering practices, could present significant challenges to refactoring. However, there is no systematic study exploring refactoring practices in scientific open-source software (Sci-OSS). To bridge this gap, we evaluate how effectively three state-of-the-art refactoring detection tools, RefDiff (C), RefactoringMiner (Java), and PyRef (Python), detect refactorings in Sci-OSS. Our findings reveal that these tools have significant limitations, detecting fewer refactorings in Sci-OSS than in non-scientific OSS (Non-Sci-OSS). Through a mixed-method approach, we identified that 67.54% of undetected refactorings in Sci-OSS require domain knowledge. To complement our analysis of the refactoring changes, we conducted surveys with 47 practitioners experienced in refactoring Sci-OSS and 14 follow-up interviews to gain deeper insights into the associated challenges. Our results revealed seven novel challenges for Sci-OSS refactoring, including a domain knowledge gap. These findings emphasize the necessity for specialized tools and strategies to support refactoring in Sci-OSS effectively.

TOSEM 2026

Detecting Protracted Vulnerabilities in Open Source Projects

A. Sridharkumar, S. Al Hajj Ibrahim, J. Zhou, Y. Wang, S. Hassan, A. Hassan, and S. Zhou

ACM Transactions on Software Engineering and Methodology (TOSEM) 2026

⬇ PDF

Abstract

Timely resolution and disclosure of vulnerabilities are essential for maintaining the security of open-source software. However, many vulnerabilities remain unreported, unpatched, or undisclosed for extended periods, exposing users to prolonged security risks. We investigate the vulnerability lifecycle by focusing on protracted vulnerabilities (PCVEs), which remain unresolved or undisclosed over long durations. We propose DeeptraVul, an enhanced detection approach tailored to protracted cases, integrating multiple development artifacts and code-level signals supported by a large language model-based summarization component.

TOSEM 2026

"Should I Give Up Now?" Investigating LLM Pitfalls in Software Engineering

J. Tie, B. Yao, T. Li, H. Fang, I. Ahmed, D. Wang and S. Zhou

ACM Transactions on Software Engineering and Methodology (TOSEM) 2026

⬇ PDF

Abstract

Software engineers are increasingly incorporating AI assistants into their workflows to enhance productivity and alleviate cognitive load. However, experiences with large language models (LLMs) such as ChatGPT vary widely. Analyzing data from 26 participants in a complex web development task, we identified nine failure types categorized into incorrect or incomplete responses, cognitive overload, and context loss. Our quantitative analysis revealed that unhelpful responses increased the likelihood of abandonment by a factor of 11, while each additional prompt reduced abandonment probability by 17%.

CHI 2026

Untangling the Timeline: Challenges and Opportunities in Supporting Version Control in Modern Computer-Aided Design

Y. Deng, S. Zhang, K. Cheng, A. Olechowski, and S. Zhou

The ACM CHI Conference on Human Factors in Computing Systems (CHI) 2026

⬇ PDF ▶ Slides

Abstract

Version control is critical in mechanical CAD to enable traceability, manage product variation, and support collaboration. This paper presents a systematic review of user-reported challenges with version control in modern CAD tools. Analyzing 170 online forum threads, we identify recurring socio-technical issues that span the management, continuity, scope, and distribution of versions. Our findings inform a broader reflection on how version control should be designed and improved for CAD.

CHI 2026

CADModelScope: Revealing the Dependency Structure Behind Parametric Computer-Aided Design Models

Y. Deng, Z. Zhang, S. Zhou, and A. Olechowski

The ACM CHI Conference on Human Factors in Computing Systems (CHI) 2026

⬇ PDF ▶ Slides

Abstract

Parametric CAD models are constructed by a sequence of operations, where each operation may reference geometries created by earlier ones. This network of dependencies enables efficient modelling of complex geometry but also results in fragile models where small modifications can trigger cascading errors. We present CADModelScope, a multi-level graph-based visualization of operation dependencies integrated into a commercial CAD platform.

ICSE 2026

Beyond Adoption: Examining the Evolution and Impact of Codes of Conduct on Open-Source Communities

J. Sun, H. Fang, J. Zhang, J. Shi, R. Lai, A. Ihuman, R. Littauer, and S. Zhou

The 48th IEEE/ACM International Conference on Software Engineering (ICSE) 2026

⬇ PDF ▶ Slides

Abstract

While open source software (OSS) communities thrive on collaboration, conflicts such as toxic behavior and discrimination can surface, threatening the sustainability of these projects. To address these concerns, many communities have adopted a Code of Conduct (CoC). Our study compiles a large-scale dataset of CoCs along with their change histories in OSS repositories on GitHub to quantitatively understand the evolution of CoC content and investigate the potential impact of CoC adoption on community engagement. We find that OSS communities with a CoC attract more new contributors and, in the long term, see fewer existing contributors disengage from the community.

SERS 2026

The Shared Language of Crowds: A Crowd-Sourced Approach to Mapping Research Software Engineering and Software Engineering Research Terminology

T. Kehrer, R. Haines, G. Juckeland, S. Zhou and D. Bernholdt

1st International Workshop on Software Engineering and Research Software (SERS 2026)

⬇ PDF ▶ Slides

Abstract

Research Software Engineers (RSEs) often use different terminologies than the Software Engineering Research (SER) community for similar concepts. As an outcome of the Dagstuhl Seminar 24161, we developed an approach to explore these terminologies using crowd-sourcing to build a website presenting a "mapping of terms" between the groups.

2025

8 papers

CSCW 2025

It's a Complete Haystack: Understanding Dependency Management Needs in Computer-Aided Design

K. Cheng, A. Olechowski, and S. Zhou

The 28th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2025

⬇ PDF

Abstract

Hardware development teams face increasing demands for better quality products, greater innovation, and shorter manufacturing lead times. One significant and unaddressed challenge is understanding and managing dependencies between 3D CAD models, especially when products can contain thousands of interconnected components. In this two-phase formative study, we explore designers' pain points of CAD dependency management through a thematic analysis of 100 online forum discussions and semi-structured interviews with 10 designers. We identify nine key challenges related to the traceability, navigation, and consistency of CAD dependencies.

CSCW 2025

Collaboration Challenges and Opportunities in Developing Scientific Open-Source Software Ecosystem: A Case Study on Astropy

J. Sun, Y. Li, A. Patil, J. Guo, and S. Zhou

The 28th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2025

⬇ PDF ▶ Slides

Abstract

Scientific open-source software (OSS) has greatly benefited research communities through its transparent and collaborative nature. This study examines the challenges and opportunities for improving collaboration efficiency in the development and maintenance of scientific OSS. We conducted a mixed-methods case study on Astropy, including analysis of commit history, cross-referenced issues and pull requests, and interviews with core contributors.

CSCW 2025

Who is to Blame: A Comprehensive Review of Challenges and Opportunities in Designer-Developer Collaboration

S. Zhang, T. Zhang, J. Cheng, and S. Zhou

The 28th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2025

⬇ PDF

Abstract

Software development relies on effective collaboration between Software Development Engineers (SDEs) and User eXperience Designers (UXDs). We conducted a systematic literature review of 45 papers published since 2004, uncovering three key collaboration challenges and two main categories of potential best practices. We then analyzed designer and developer forums and discussions from one open-source software repository to assess how the challenges and practices manifest in the status quo.

CiSE 2025

Do Research Software Engineers and Software Engineering Researchers Speak the Same Language?

T. Kehrer, R. Haines, G. Juckeland, S. Zhou and D. Bernholdt

Computing in Science & Engineering (CiSE) 2025

⬇ PDF

Abstract

Anecdotal evidence suggests that Research Software Engineers (RSEs) and Software Engineering Researchers (SERs) often use different terminologies for similar concepts, creating communication challenges. Our preliminary findings reveal opportunities for mutual learning and collaboration, and our systematic methodology for terminology mapping provides a foundation for crowd-sourced extension and validation.

WWW 2025

MAML: Towards a Faster Web in Developing Regions

A. Varvello, I. Ahmed, S. Zhou, L. Subramanian, and Y. Zaki

In Proceedings of the Web Conference (WWW) 2025

⬇ PDF

Abstract

The web experience in developing regions remains subpar, primarily due to the growing complexity of modern webpages. We introduce the Mobile Application Markup Language (MAML), a flat layout-based web specification language that reduces computational and data transmission demands, while replacing excessive bloat from JS with a new scripting language centered on essential web functionalities. When compared to Google AMP across 100 testing webpages, MAML offers speedups by tens of seconds under challenging network conditions.

CHASE 2025

Advancing Sustainable Communities in Scientific OSS: A Replication Study with Astropy

J. Sun, Y. Li, A. Patil, J. Guo, and S. Zhou

The 18th International Conference on Cooperative and Human Aspects of Software Engineering (CHASE) 2025

⬇ PDF

Abstract

Scientific OSS fosters transparency and collaboration. Through a survey-based replication study in the Astropy Project, we gathered insights from disengaged contributors regarding their motivations, reasons for disengagement, and suggestions for improving community sustainability. Our findings reveal key motivations driving scientific contributions to OSS and identify barriers to sustained engagement.

MOBILESoft 2025

LLMs in Mobile Apps: Practices, Challenges, and Opportunities

K. Hau, S. Hassan, and S. Zhou

The 12th International Conference on Mobile Software Engineering and Systems (MOBILESoft) 2025

⬇ PDF

Abstract

We constructed a comprehensive dataset of 149 LLM-enabled Android apps and conducted an exploratory analysis to understand how LLMs are deployed and used within mobile apps. This analysis highlights key characteristics of the dataset, prevalent integration strategies, and common challenges developers face when integrating LLMs, spanning mobile device constraints, API management, and code infrastructure.

ICSE 2025

The Product Beyond the Model — An Empirical Study of Repositories of Open-Source ML Products

N. Nahar, H. Zhang, G. Lewis, S. Zhou, and C. Kästner

47th International Conference on Software Engineering (ICSE) 2025

⬇ PDF

Abstract

We contribute a dataset of 262 open-source ML products for end users identified among more than half a million ML-related projects on GitHub. We qualitatively and quantitatively analyze 30 open-source ML products to answer six broad research questions about development practices and system architecture, reporting 21 findings including limited involvement of data scientists and unusually low modularity between ML and non-ML code.

2024

2 papers

ICSME 2024

Can We Do Better with What We Have Done? Unveiling the Potential of ML Pipeline in Notebooks

Y. Zou, X. Shan, S. Tan, and S. Zhou

International Conference on Software Maintenance and Evolution (ICSME) 2024

⬇ PDF ▶ Slides

Abstract

Computational notebooks are widely adopted by data scientists for experimenting with machine learning models. We conduct a qualitative analysis to examine how data scientists explore various alternatives through a series of versions of notebooks on Kaggle. By combining alternatives from all stages to form previously unexplored paths, we discover that certain untested combinations can outperform the best models as identified in the original notebooks.

CSCW 2024

"A Lot of Moving Parts": A Case Study of Open-Source Hardware Design Collaboration in the Thingiverse Community

K. Cheng, S. Zhou, and A. Olechowski

The 27th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2024

⬇ PDF

Abstract

We conduct a detailed case study of DrawBot, a successful open-source hardware project that remarkably fostered a long-term collaboration on Thingiverse, a platform not explicitly intended for complex collaborative design. By analyzing comment threads and design changes, we show how collaboration occurred, which challenges arose, and how the DrawBot community managed to overcome these obstacles.

2023

6 papers

ICSME 2023

Aligning Documentation and Q&A Forum through Constrained Decoding with Weak Supervision

R. Pudari, S. Zhou, I. Ahmed, Z. Dai, and S. Zhou

ICSME 2023 — New Ideas and Emerging Results

⬇ PDF ▶ Slides

Abstract

Stack Overflow plays a supplementary role to official documentation by offering practical examples and resolving uncertainties. We propose DOSA, a novel approach to automatically align Stack Overflow and documentation, injecting domain-specific knowledge about the documentation structure into large language models through weak supervision and constrained decoding. Our preliminary experiments find that DOSA outperforms various widely-used baselines.

CSCW 2023

User Perspectives on Branching in Computer-Aided Design

K. Cheng, P. Cuvin, A. Olechowski, and S. Zhou

The 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2023

⬇ PDF ▶ Slides

Abstract

We mine and analyze 719 user-generated posts from online CAD forums to qualitatively study designers' intentions for and preliminary use of branching in CAD. Our work contributes a taxonomy of CAD branching use cases, an identification of deficiencies of existing branching capabilities in CAD, and a discussion of the untapped potential of CAD branching to support a new paradigm of collaborative mechanical design.

CSCW 2023

In the Age of Collaboration, the Computer-Aided Design Ecosystem is Behind: Evidence from an Interview Study of Distributed CAD Practice

K. Cheng, S. Zhou, and A. Olechowski

The 26th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2023

⬇ PDF ▶ Slides

Abstract

We conduct semi-structured interviews with 20 CAD professionals of diverse industries, roles, and experience levels to understand their collaborative workflows with distributed CAD tools. In total, we identify 14 challenges related to collaborative design, communication, data management, and permissioning that are currently impeding effective collaboration in professional CAD teams.

CHI 2023

Interaction of Thoughts: Towards Mediating Task Assignment in Human-AI Cooperation with a Capability-Aware Shared Mental Model

Z. He, Y. Song, S. Zhou, and Z. Cai

The ACM CHI Conference on Human Factors in Computing Systems (CHI) 2023

⬇ PDF

Abstract

We propose a capability-aware shared mental model (CASMM) for task assignment in human-AI cooperation, utilizing tuples to break down tasks into sets of scenarios and dynamically merging task grouping ideas through negotiation. A 3-phase user study via an image labeling task shows that building CASMM significantly boosts accuracy and time efficiency by forming task assignments close to real capabilities within a few iterations.

CHI 2023

Aspirations and Practice of ML Model Documentation: Moving the Needle with Nudging and Traceability

A. Bhat, A. Coursey, G. Hu, S. Li, N. Nahar, S. Zhou, C. Kästner, and J. Guo

The ACM CHI Conference on Human Factors in Computing Systems (CHI) 2023

⬇ PDF

Abstract

Our analysis of publicly available model cards reveals a substantial gap between the model cards proposal and the practice. We design a tool named DocML aiming to nudge data scientists to comply with the model cards proposal during model development and to assess and manage documentation quality. A lab study reveals the benefit of our tool towards long-term documentation quality and accountability.

CAIN 2023 ★ Best Paper

A Meta-Summary of Challenges in Building Products with ML Components — Collecting Experiences from 4758+ Practitioners

N. Nahar, H. Zhang, G. Lewis, S. Zhou, and C. Kästner

International Conference on AI Engineering — Software Engineering for AI (CAIN) 2023

⬇ PDF

Abstract

Incorporating machine learning components into software products raises new software-engineering challenges and exacerbates existing ones. We provide a meta-summary synthesizing findings from studies involving 4758+ practitioners, identifying recurring challenges and providing a consolidated view of the landscape of ML engineering challenges in industry practice.

2022

4 papers

CASCON 2022

Exploring Trends and Practices of Forks in Open-Source Software Repositories

M. Hadian, S. Brisson, B. Adams, S. Ghari, E. Noei, M. Fokaefs, K. Lyons, and S. Zhou

32nd Annual International Conference on Computer Science and Software Engineering (CASCON) 2022

⬇ PDF

Abstract

Forking a software repository is a popular and recommended practice among developers. A fork is a copy of the original repository that can evolve independently from the parent repository, allowing developers to experiment with a code base or test new features without the danger of affecting the original project. In this work, we explore the motivation, the practices and the culture of forking open-source software repositories, studying how forks evolve compared to the parent repository, how they are related to pull requests, how they contribute back to the parent, and how dependencies are shared or differ within project families.

ICSME 2022 – NIER

Elevating Jupyter Notebook Maintenance Tooling by Identifying and Extracting Notebook Structures

Y. Jiang, C. Kästner, and S. Zhou

International Conference on Software Maintenance and Evolution (ICSME) 2022 — New Ideas and Emerging Results Track (NIER)

⬇ PDF

Abstract

Computational notebooks have become a popular tool for data analysis, but notebooks in practice are often criticized as hard to maintain and of low code quality. We argue that central to better tool support is identifying the structure of notebooks. We present a lightweight and accurate approach to extract notebook structure and outline several ways such structure can be used to improve maintenance tooling for notebooks, including navigation and finding alternatives.

IST 2022

An Empirical Study of Emoji Use in Software Development Communication

S. Rong, W. Wang, U. Mannan, E. Almeida, S. Zhou, and I. Ahmed

Information and Software Technology (IST) 2022

⬇ PDF

Abstract

We present a large-scale empirical study on the intention of emoji usage conducted on 2,712 Open Source Software projects. We build a machine learning model to automate classifying the intentions behind emoji usage in 39,980 posts. Our results show that we can classify the intention of emoji usage with high accuracy (AUC of 0.97), and that developers use emoji for varying intentions that change throughout a conversation.

ICSE 2022

Collaboration Challenges in Building ML-Enabled Software: Communication, Documentation, Engineering, and Process

N. Nahar, S. Zhang, S. Zhou, and C. Kästner

44th International Conference on Software Engineering (ICSE) 2022

Abstract

Building ML-enabled software involves collaboration between team members with different backgrounds and expertise. We conducted an interview study to understand collaboration challenges in building ML-enabled software, identifying challenges around communication, documentation, engineering, and process.

2021

5 papers

FSE 2021

Studying the Effect of Pull Request Revert on Software Quality

S. Zhou, et al.

ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE) 2021

Abstract

Pull requests are a central mechanism for code integration in modern collaborative software development. This study examines the effects of reverted pull requests on software quality, analyzing large-scale repository data to understand when and why pull requests are reverted and what impact this has on the codebase.

RAISE 2021

Splitting, Renaming, Removing: A Study of Common Cleaning Activities in Jupyter Notebooks

H. Dong, S. Zhou, J. Guo, and C. Kästner

8th International Workshop on Realizing Artificial Intelligence Synergies in Software Engineering (RAISE) 2021

Abstract

Data scientists commonly use computational notebooks because they provide a good environment for testing multiple models. In this paper, we perform a qualitative study on how scientists clean their code. By sampling notebooks from GitHub and analyzing changes between subsequent commits, we identified common cleaning activities, such as changes to markdown or comments as well as reordering cells. Our results provide a valuable foundation for tool builders and notebook users.

JSME 2021

Perceptions of Open-Source Software Developers on Collaborations: An Interview and Survey Study

K. Constantino, S. Zhou, M. Souza, E. Figueiredo, and C. Kästner

Journal of Software: Evolution and Process (JSME) 2021

⬇ PDF

Abstract

We investigate the perceptions of open-source software developers on collaborations, such as motivations, techniques, and tools to support global, productive, and collaborative development. Following an interview study with 12 open-source software developers from GitHub, we conducted an extensive survey with 121 developers. We found that most collaborators prefer to collaborate with the core team, and most collaboration happens in software development and maintenance tasks.

ASE 2021

Subtle Bugs Everywhere: Generating Documentation for Data Wrangling Code

C. Yang, S. Zhou, J. Guo, and C. Kästner

36th IEEE/ACM International Conference on Automated Software Engineering (ASE) 2021

⬇ PDF

Abstract

Data scientists reportedly spend a significant amount of their time on data wrangling. We present a technique to generate interactive documentation for data wrangling code using program synthesis techniques to automatically summarize data transformations and test case selection techniques to purposefully select representative examples. A user study shows that users with our JupyterLab plugin are faster and more effective at finding realistic bugs in data wrangling code.

ICSME 2021 🏆 Distinguished Paper Award

Interactive Patch Filtering as Debugging Aid

J. Liang, R. Ji, J. Jiang, S. Zhou, Y. Lou, Y. Xiong, and G. Huang

37th International Conference on Software Maintenance and Evolution (ICSME) 2021

⬇ PDF ⬇ Code

Abstract

We propose an interactive patch filtering approach that assists developers in the patch review process by effectively filtering out groups of incorrect patches. We implemented the approach as an Eclipse plugin, InPaFer, and evaluated its effectiveness. The results show that our approach improves the repair performance of developers, with 62.5% more successfully repaired bugs and 25.3% less debugging time.

≤2020

12 papers

ICSE 2020

How Has Forking Changed in the Last 20 Years? A Study of Hard Forks on GitHub

S. Zhou, B. Vasilescu, and C. Kästner

42nd International Conference on Software Engineering (ICSE) 2020 — Acceptance rate: 20.9% (129/617)

⬇ PDF ▶ Slides ▶ Talk ⬇ Data

Abstract

The notion of forking has changed with the rise of distributed version control systems and social coding environments like GitHub. To revisit hard forks, we identify, study, and classify 15,306 hard forks on GitHub and interview 18 owners of hard forks or forked repositories. We find that hard forks often evolve out of social forks rather than being planned deliberately and that perceptions about hard forks have changed dramatically, seeing them often as a positive noncompetitive alternative to the original project.

ICGSE 2020

Understanding Collaborative Software Development: An Interview Study

K. Constantino, S. Zhou, M. Souza, E. Figueiredo, and C. Kästner

15th ACM/IEEE International Conference on Global Software Engineering (ICGSE) 2020

⬇ PDF

Abstract

This paper presents an interview study aiming to understand the motivations for collaborative software development, how collaboration happens, and the challenges and barriers involved. After interviewing twelve experienced software developers from GitHub, we found different types of collaborative contributions. Our analysis indicates that the main barriers to collaboration are related to non-technical rather than technical issues.

MSR 2020 – Mining Challenge

An Exploratory Study to Find Motives behind Cross-platform Forks from Software Heritage Dataset

A. Bhattacharjee, S. Nath, S. Zhou, D. Chakroborti, B. Roy, C. Roy, and K. Schneider

17th International Conference on Mining Software Repositories (MSR) 2020 — Mining Challenge Track

⬇ PDF

Abstract

With the advent of the Software Heritage Graph Dataset, we have the opportunity to investigate forking activities across platforms. We conduct an exploratory study on 10 popular open-source projects to identify cross-platform forks and investigate the motivation behind them. We found that most cross-platform forks are mirrors of repositories on another platform, but we still find cases created due to a preference for certain functionalities supported by different platforms.

FSE 2019

What the Fork: A Study of Inefficient and Efficient Forking Practices in Social Coding

S. Zhou, B. Vasilescu, and C. Kästner

27th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering (ESEC/FSE) 2019 — Acceptance rate: 24% (74/303)

⬇ PDF ▶ Slides

Abstract

Forking and pull requests have been widely used in open-source communities as uniform development and contribution mechanisms. However, some projects observe severe inefficiencies, including lost and duplicate contributions and fragmented communities. Using logistic regression models, we analyzed the association of context factors with inefficiencies and found that better modularity and centralized management can encourage more contributions and a higher fraction of accepted pull requests.

SANER 2019

Identifying Redundancies in Fork-based Development

L. Ren, S. Zhou, C. Kästner, and A. Wąsowski

27th IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2019 — Acceptance rate: 27% (40/148)

⬇ PDF ▶ Slides

Abstract

Fork-based development makes it difficult to maintain an overview of the whole community when the number of forks increases, which may lead to redundant development. We designed an approach to identify redundant code changes in forks as early as possible by building a machine learning model to predict redundancies. The results show 57-83% precision for detecting duplicate code changes, and the approach could save developers an average of 1.9-3.0 commits of effort.

ISSRE 2019

How to Explain a Patch: An Empirical Study of Patch Explanations in Open Source Projects

J. Liang, Y. Hou, S. Zhou, J. Chen, Y. Xiong, and G. Huang

30th International Symposium on Software Reliability Engineering (ISSRE) 2019

⬇ PDF

Abstract

We explored how developers explain their patches by manually analyzing 300 merged bug-fixing pull requests from six projects on GitHub. We build a patch explanation model which summarizes the elements in a patch explanation and corresponding expressive forms. We also conducted a quantitative analysis to understand the distributions of elements and the correlation between elements and their expressive forms.

ASE 2019 – Doctoral Symposium

Improving Collaboration Efficiency in Fork-based Development

Companion of the International Conference on Automated Software Engineering (ASE) 2019

⬇ PDF ▶ Poster ▶ Slides

ICSE 2018 – Poster

Poster: Forks Insight: Providing an Overview of GitHub Forks

L. Ren, S. Zhou, and C. Kästner

Companion of the International Conference on Software Engineering (ICSE) 2018 — Poster

⬇ PDF

ICSE 2018

Identifying Features in Forks

S. Zhou, Ș. Stănciulescu, O. Leßenich, Y. Xiong, A. Wąsowski, and C. Kästner

40th International Conference on Software Engineering (ICSE) 2018 — Acceptance rate: 21%

⬇ PDF ▶ Slides

Abstract

We introduce INFOX, an approach to automatically identify unmerged features in forks and generate an overview of active forks in a project. The approach clusters cohesive code fragments using code and network analysis techniques and uses information-retrieval techniques to label clusters with keywords. The clustering is effective, with 90% accuracy on a set of known features, and a human-subject evaluation shows that INFOX can provide actionable insight for developers of forks.

ICSE 2018

Adding Sparkle to Social Coding: An Empirical Study of Repository Badges in the npm Ecosystem

A. Trockman, S. Zhou, C. Kästner, and B. Vasilescu

40th International Conference on Software Engineering (ICSE) 2018 — Acceptance rate: 21%

⬇ PDF

Abstract

We report on a large-scale, mixed-methods empirical study of npm packages exploring the emerging phenomenon of repository badges. After surveying developers, mining 294,941 repositories, and applying statistical modeling and time series analysis, we find that non-trivial badges are mostly reliable signals, correlating with more tests, better pull requests, and fresher dependencies.

Releng 2015

Extracting Configuration Knowledge from Build Files with Symbolic Analysis

S. Zhou, J. Al-Kofahi, T. Nguyen, C. Kästner, and S. Nadi

3rd International Workshop on Release Engineering (Releng) 2015

Abstract

Build systems contain a lot of configuration knowledge about a software system, such as under which conditions specific files are compiled. We design an approach, based on SYMake, that symbolically evaluates Makefiles and extracts configuration knowledge in terms of file presence conditions and conditional parameters.

Internetware 2013

Elastic Resource Management for Heterogeneous Applications on PaaS

W. Hao, S. Zhou, T. Yang, R. Zhang, and Q. Wang

5th Asia-Pacific Symposium on Internetware 2013 — ACM, New York, NY

Abstract

We propose a practical and effective elasticity approach based on the analysis of application features: CPU consumption, I/O consumption, and request rate. The evaluation experiment shows that, compared with traditional approaches, our approach can save up to 32.8% of VMs without a significant increase in average response time or SLA violations.