2026
5 papers
TOSEM 2026
Detecting Protracted Vulnerabilities in Open Source Projects
A. Sridharkumar, S. Al Hajj Ibrahim, J. Zhou, Y. Wang, S. Hassan, A. Hassan, and S. Zhou
ACM Transactions on Software Engineering and Methodology (TOSEM) 2026
⬇ PDF coming soon
Abstract
Timely resolution and disclosure of vulnerabilities are essential for maintaining the security of open-source software. However, many vulnerabilities remain unreported, unpatched, or undisclosed for extended periods, exposing users to prolonged security risks. We investigate the vulnerability lifecycle by focusing on protracted vulnerabilities (PCVEs), which remain unresolved or undisclosed over long durations. We propose DeeptraVul, an enhanced detection approach tailored to protracted cases, integrating multiple development artifacts and code-level signals supported by a large language model-based summarization component.
TOSEM 2026
"Should I Give Up Now?" Investigating LLM Pitfalls in Software Engineering
ACM Transactions on Software Engineering and Methodology (TOSEM) 2026
⬇ PDF
Abstract
Software engineers are increasingly incorporating AI assistants into their workflows to enhance productivity and alleviate cognitive load. However, experiences with large language models (LLMs) such as ChatGPT vary widely. Analyzing data from 26 participants in a complex web development task, we identified nine failure types categorized into incorrect or incomplete responses, cognitive overload, and context loss. Our quantitative analysis revealed that unhelpful responses increased the likelihood of abandonment by a factor of 11, while each additional prompt reduced abandonment probability by 17%.
CHI 2026
Untangling the Timeline: Challenges and Opportunities in Supporting Version Control in Modern Computer-Aided Design
The ACM CHI Conference on Human Factors in Computing Systems (CHI) 2026
⬇ PDF coming soon
Abstract
Version control is critical in mechanical CAD to enable traceability, manage product variation, and support collaboration. This paper presents a systematic review of user-reported challenges with version control in modern CAD tools. Analyzing 170 online forum threads, we identify recurring socio-technical issues that span the management, continuity, scope, and distribution of versions. Our findings inform a broader reflection on how version control should be designed and improved for CAD.
CHI 2026
CADModelScope: Revealing the Dependency Structure Behind Parametric Computer-Aided Design Models
The ACM CHI Conference on Human Factors in Computing Systems (CHI) 2026
⬇ PDF coming soon
Abstract
Parametric CAD models are constructed by a sequence of operations, where each operation may reference geometries created by earlier ones. This network of dependencies enables efficient modelling of complex geometry but also results in fragile models where small modifications can trigger cascading errors. We present CADModelScope, a multi-level graph-based visualization of operation dependencies integrated into a commercial CAD platform.
ICSE 2026
Beyond Adoption: Examining the Evolution and Impact of Codes of Conduct on Open-Source Communities
J. Sun, H. Fang, J. Zhang, J. Shi, R. Lai, A. Ihuman, R. Littauer, and S. Zhou
The 48th IEEE/ACM International Conference on Software Engineering (ICSE) 2026
⬇ PDF
Abstract
While open source software (OSS) communities thrive on collaboration, conflicts such as toxic behavior and discrimination can surface, threatening the sustainability of these projects. To address these concerns, many communities have adopted a Code of Conduct (CoC). Our study compiles a large-scale dataset of CoCs along with their change histories in OSS repositories on GitHub to quantitatively understand the evolution of CoC content and investigate the potential impact of CoC adoption on community engagement. We find that OSS communities with a CoC attract more new contributors and, in the long term, see fewer existing contributors disengage from the community.
SERS 2026
Do Research Software Engineers and Software Engineering Researchers Speak the Same Language?
1st International Workshop on Software Engineering and Research Software (SERS 2026)
⬇ PDF coming soon
Abstract
Research Software Engineers (RSEs) often use different terminologies than the Software Engineering Research (SER) community for similar concepts. As an outcome of the Dagstuhl Seminar 24161, we developed an approach to explore these terminologies using crowd-sourcing to build a website presenting a "mapping of terms" between the groups.
2025
8 papers
CSCW 2025
It's a Complete Haystack: Understanding Dependency Management Needs in Computer-Aided Design
The 28th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2025
⬇ PDF
Abstract
Hardware development teams face increasing demands for better quality products, greater innovation, and shorter manufacturing lead times. One significant and unaddressed challenge is understanding and managing dependencies between 3D CAD models, especially when products can contain thousands of interconnected components. In this two-phase formative study, we explore designers' pain points in CAD dependency management through a thematic analysis of 100 online forum discussions and semi-structured interviews with 10 designers. We identify nine key challenges related to the traceability, navigation, and consistency of CAD dependencies.
CSCW 2025
Collaboration Challenges and Opportunities in Developing Scientific Open-Source Software Ecosystem: A Case Study on Astropy
The 28th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2025
⬇ PDF ▶ Slides
Abstract
Scientific open-source software (OSS) has greatly benefited research communities through its transparent and collaborative nature. This study examines the challenges and opportunities for improving collaboration efficiency in the development and maintenance of scientific OSS. We conducted a mixed-methods case study on Astropy, including analysis of commit history, cross-referenced issues and pull requests, and interviews with core contributors.
CSCW 2025
Who is to Blame: A Comprehensive Review of Challenges and Opportunities in Designer-Developer Collaboration
The 28th ACM Conference on Computer-Supported Cooperative Work and Social Computing (CSCW) 2025
⬇ PDF
Abstract
Software development relies on effective collaboration between Software Development Engineers (SDEs) and User eXperience Designers (UXDs). We conducted a systematic literature review of 45 papers published since 2004, uncovering three key collaboration challenges and two main categories of potential best practices. We then analyzed designer and developer forums and discussions from one open-source software repository to assess how the challenges and practices manifest in the status quo.
CiSE 2025
Do Research Software Engineers and Software Engineering Researchers Speak the Same Language?
⬇ PDF
Abstract
Anecdotal evidence suggests that Research Software Engineers (RSEs) and Software Engineering Researchers (SERs) often use different terminologies for similar concepts, creating communication challenges. Our preliminary findings reveal opportunities for mutual learning and collaboration, and our systematic methodology for terminology mapping provides a foundation for crowd-sourced extension and validation.
WWW 2025
MAML: Towards a Faster Web in Developing Regions
A. Varvello, I. Ahmed, S. Zhou, L. Subramanian, and Y. Zaki
⬇ PDF
Abstract
The web experience in developing regions remains subpar, primarily due to the growing complexity of modern webpages. We introduce the Mobile Application Markup Language (MAML), a flat layout-based web specification language that reduces computational and data transmission demands, while replacing excessive bloat from JS with a new scripting language centered on essential web functionalities. When compared to Google AMP across 100 testing webpages, MAML offers speedups of tens of seconds under challenging network conditions.
CHASE 2025
Advancing Sustainable Communities in Scientific OSS: A Replication Study with Astropy
⬇ PDF
Abstract
Scientific OSS fosters transparency and collaboration. Through a survey-based replication study in the Astropy Project, we gathered insights from disengaged contributors regarding their motivations, reasons for disengagement, and suggestions for improving community sustainability. Our findings reveal key motivations driving scientific contributions to OSS and identify barriers to sustained engagement.
MOBILESoft 2025
LLMs in Mobile Apps: Practices, Challenges, and Opportunities
K. Hau, S. Hassan, and S. Zhou
⬇ PDF
Abstract
We constructed a comprehensive dataset of 149 LLM-enabled Android apps and conducted an exploratory analysis to understand how LLMs are deployed and used within mobile apps. This analysis highlights key characteristics of the dataset, prevalent integration strategies, and common challenges developers face when integrating LLMs, including mobile device constraints, API management, and code infrastructure.
ICSE 2025
The Product Beyond the Model — An Empirical Study of Repositories of Open-Source ML Products
N. Nahar, H. Zhang, G. Lewis, S. Zhou, and C. Kästner
⬇ PDF
Abstract
We contribute a dataset of 262 open-source ML products for end users identified among more than half a million ML-related projects on GitHub. We qualitatively and quantitatively analyze 30 open-source ML products to answer six broad research questions about development practices and system architecture, reporting 21 findings including limited involvement of data scientists and unusually low modularity between ML and non-ML code.
2024
3 papers
ICSME 2024
Can We Do Better with What We Have Done? Unveiling the Potential of ML Pipeline in Notebooks
Y. Zou, X. Shan, S. Tan, and S. Zhou
⬇ PDF ▶ Slides
Abstract
Computational notebooks are widely adopted by data scientists for experimenting with machine learning models. We conduct a qualitative analysis to examine how data scientists explore various alternatives through a series of versions of notebooks on Kaggle. By combining alternatives from all stages to form previously unexplored paths, we discover that certain untested combinations can outperform the best models as identified in the original notebooks.
CSCW 2024
"A Lot of Moving Parts": A Case Study of Open-Source Hardware Design Collaboration in the Thingiverse Community
⬇ PDF
Abstract
We conduct a detailed case study of DrawBot, a successful open-source hardware project that, remarkably, fostered a long-term collaboration on Thingiverse, a platform not explicitly intended for complex collaborative design. By analyzing comment threads and design changes, we show how collaboration occurred, which challenges the DrawBot community faced, and how it managed to overcome them.
2023
6 papers
ICSME 2023
Aligning Documentation and Q&A Forum through Constrained Decoding with Weak Supervision
⬇ PDF ▶ Slides
Abstract
Stack Overflow plays a supplementary role to official documentation by offering practical examples and resolving uncertainties. We propose DOSA, a novel approach to automatically align Stack Overflow and documentation, injecting domain-specific knowledge about the documentation structure into large language models through weak supervision and constrained decoding. Our preliminary experiments find that DOSA outperforms various widely used baselines.
CSCW 2023
⬇ PDF ▶ Slides
Abstract
We mine and analyze 719 user-generated posts from online CAD forums to qualitatively study designers' intentions for and preliminary use of branching in CAD. Our work contributes a taxonomy of CAD branching use cases, an identification of deficiencies of existing branching capabilities in CAD, and a discussion of the untapped potential of CAD branching to support a new paradigm of collaborative mechanical design.
CSCW 2023
In the Age of Collaboration, the Computer-Aided Design Ecosystem is Behind: Evidence from an Interview Study of Distributed CAD Practice
⬇ PDF ▶ Slides
Abstract
We conduct semi-structured interviews with 20 CAD professionals from diverse industries, roles, and experience levels to understand their collaborative workflows with distributed CAD tools. In total, we identify 14 challenges related to collaborative design, communication, data management, and permissioning that are currently impeding effective collaboration in professional CAD teams.
CHI 2023
Interaction of Thoughts: Towards Mediating Task Assignment in Human-AI Cooperation with a Capability-Aware Shared Mental Model
Z. He, Y. Song, S. Zhou, and Z. Cai
⬇ PDF
Abstract
We propose a capability-aware shared mental model (CASMM) for task assignment in human-AI cooperation, utilizing tuples to break down tasks into sets of scenarios and dynamically merging task grouping ideas through negotiation. A 3-phase user study via an image labeling task shows that building CASMM significantly improves accuracy and time efficiency by forming task assignments close to real capabilities within a few iterations.
CHI 2023
Aspirations and Practice of ML Model Documentation: Moving the Needle with Nudging and Traceability
A. Bhat, A. Coursey, G. Hu, S. Li, N. Nahar, S. Zhou, C. Kästner, and J. Guo
⬇ PDF
Abstract
Our analysis of publicly available model cards reveals a substantial gap between the model cards proposal and practice. We design a tool named DocML that aims to nudge data scientists to comply with the model cards proposal during model development and to assess and manage documentation quality. A lab study reveals the benefit of our tool for long-term documentation quality and accountability.
CAIN 2023 ★ Best Paper
A Meta-Summary of Challenges in Building Products with ML Components — Collecting Experiences from 4758+ Practitioners
N. Nahar, H. Zhang, G. Lewis, S. Zhou, and C. Kästner
⬇ PDF
Abstract
Incorporating machine learning components into software products raises new software-engineering challenges and exacerbates existing ones. We provide a meta-summary synthesizing findings from studies involving 4758+ practitioners, identifying recurring challenges and providing a consolidated view of the landscape of ML engineering challenges in industry practice.
2022
3 papers
ICSE 2022
Collaboration Challenges in Building ML-Enabled Software: Communication, Documentation, Engineering, and Process
N. Nahar, S. Zhang, S. Zhou, and C. Kästner
44th International Conference on Software Engineering (ICSE) 2022
Abstract
Building ML-enabled software involves collaboration between team members with different backgrounds and expertise. We conducted an interview study to understand collaboration challenges in building ML-enabled software, identifying challenges around communication, documentation, engineering, and process.
2021
papers
FSE 2021
Studying the Effect of Pull Request Revert on Software Quality
S. Zhou, et al.
ACM SIGSOFT International Symposium on the Foundations of Software Engineering (FSE) 2021
Abstract
Pull requests are a central mechanism for code integration in modern collaborative software development. This study examines the effects of reverted pull requests on software quality, analyzing large-scale repository data to understand when and why pull requests are reverted and what impact this has on the codebase.
2013–2018
4 papers
ICSE 2018
Identifying Features in Forks
40th International Conference on Software Engineering (ICSE) 2018 — Acceptance rate: 21%
⬇ PDF ▶ Slides
Abstract
We introduced INFOX, an approach to automatically identify not-merged features in forks and generate an overview of active forks in a project. The approach clusters cohesive code fragments using code and network analysis techniques and uses information-retrieval techniques to label clusters with keywords. The clustering is effective, with 90% accuracy on a set of known features, and a human-subject evaluation shows that INFOX can provide actionable insight for developers of forks.
ICSE 2018
Adding Sparkle to Social Coding: An Empirical Study of Repository Badges in the npm Ecosystem
40th International Conference on Software Engineering (ICSE) 2018 — Acceptance rate: 21%
⬇ PDF
Abstract
We report on a large-scale, mixed-methods empirical study of npm packages exploring the emerging phenomenon of repository badges. After surveying developers, mining 294,941 repositories, and applying statistical modeling and time series analysis, we find that non-trivial badges are mostly reliable signals, correlating with more tests, better pull requests, and fresher dependencies.
Releng 2015
Extracting Configuration Knowledge from Build Files with Symbolic Analysis
S. Zhou, J. Al-Kofahi, T. Nguyen, C. Kästner, and S. Nadi
Abstract
Build systems contain a lot of configuration knowledge about a software system, such as under which conditions specific files are compiled. We design an approach, based on SYMake, that symbolically evaluates Makefiles and extracts configuration knowledge in terms of file presence conditions and conditional parameters.
Internetware 2013
Elastic Resource Management for Heterogeneous Applications on PaaS
W. Hao, S. Zhou, T. Yang, R. Zhang, and Q. Wang
5th Asia-Pacific Symposium on Internetware 2013 — ACM, New York, NY
Abstract
We propose a practical and effective elasticity approach based on the analysis of application features: CPU consumption, I/O consumption, and request rate. The evaluation shows that, compared with traditional approaches, our approach can save up to 32.8% of VMs without a significant increase in average response time or SLA violations.