What is it about?

Researchers who study software projects often turn to platforms like GitHub to find projects to analyze. However, many GitHub repositories are not software at all; they hold datasets, documents, or other content. This makes it hard to pick suitable projects, which wastes time and can lead to flawed results. To address this, the authors developed a method that identifies and categorizes genuine software projects. They applied machine learning to scan and sort over 35,000 projects from a large dataset. The resulting system can tell whether a project is a complete application, a library for other developers, or a plugin that adds features to other software.

Why is it important?

This method helps researchers quickly filter non-software projects out of a dataset and find suitable software projects, which saves time and improves the quality of their studies. It can also help organizations better understand and manage their open-source projects.

Read the Original

This page is a summary of: Smarter Project Selection for Software Engineering Research, July 2024, ACM (Association for Computing Machinery), DOI: 10.1145/3663533.3664037.
