Methodology > Basic Concepts
The design and support of APIs (application programming interfaces) are of great importance in systems. Because APIs define the interaction between applications and operating systems, supporting an API is essentially a contract between operating system developers and application developers. When application developers write their software, they assume that every feature and specification in the API definitions is supported by the operating system. As a result, operating system developers must take extra care when dealing with API support.
The ability to maintain support for applications developed against the defined APIs is known as API compatibility. Because maintaining API compatibility is both important and difficult, operating system developers tend to keep old APIs around even when new ones replicate their functionality. In general, operating system developers are so concerned about breaking existing applications that they often avoid deprecating old APIs at all costs, or at least until after a lengthy confirmation process.
Even though API compatibility is no less important than other system properties (e.g., performance), the lack of a proper measurement has made it a property that is hard to evaluate or reason about. The definition of API compatibility used in systems so far is closer to bug-for-bug compatibility: a system must implement every specification of its predecessors, including the unneeded or buggy ones. Such an evaluation model leaves system developers no room to reduce API definitions or to verify the progress of system building.
The study of API usage and compatibility mostly benefits system developers. Today, when system developers make decisions about API support, they must rely on instinct to decide what is important and what is not. As a result, there is no clear principle to follow when deciding whether and how supporting an API makes a system more compatible.
One group of system developers who care about evaluating API usage and compatibility is system builders. While gradually building up API support, system builders may struggle to express or demonstrate their progress, or the completeness of the system they have built. Under bug-for-bug compatibility, system builders cannot make any claim until every API is supported in the system. Moreover, system builders need a principled way to prioritize the APIs they have to implement, to maximize support for users.
Another group that cares about this study is system API maintainers, who may or may not be the same people as the system builders. They are responsible for adding or deprecating APIs after the system is released. They need to know whether an API is truly unused so that they can deprecate it safely. If, for a strong reason, they must restrict or change the semantics of an API, they can use this study as a basis for assessing the damage the change would cause.
Finally, system researchers can find motivation for their work by observing the trends in API usage reported in the study.
To measure API usage and compatibility, we start with a simple evaluation model: counting the number of supported APIs or applications (as compatibility), or the number of applications depending on an API (as usage).
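The simple counting model above can be sketched in a few lines. This is only an illustration: the application names, the API names, and the two helper functions are hypothetical, and an application is treated as supported when every API it uses is supported.

```python
from typing import Dict, List, Set


def api_usage_counts(app_apis: Dict[str, Set[str]]) -> Dict[str, int]:
    """Count, for each API, how many applications depend on it (usage)."""
    counts: Dict[str, int] = {}
    for apis in app_apis.values():
        for api in apis:
            counts[api] = counts.get(api, 0) + 1
    return counts


def supported_apps(app_apis: Dict[str, Set[str]], supported: Set[str]) -> List[str]:
    """List applications whose required APIs are all supported (compatibility)."""
    return sorted(app for app, apis in app_apis.items() if apis <= supported)


# Hypothetical data: which APIs each application uses.
APP_APIS = {
    "editor": {"open", "read", "write"},
    "server": {"open", "read", "socket", "epoll_wait"},
    "viewer": {"open", "read"},
}

if __name__ == "__main__":
    print(api_usage_counts(APP_APIS))
    print(supported_apps(APP_APIS, {"open", "read", "write"}))
```

Both functions treat every API and every application as equal, which is exactly the simplification this model makes.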
The problem with such an evaluation model is that APIs are not equally popular, and neither are applications. Each API may be used more or less often by applications than its peers, so counting the number of supported APIs is mostly meaningless for comparison. Simply put, a system that supports a number of popular APIs cannot be called as compatible as one that supports the same number of unpopular APIs. The same problem exists when we consider the number of supported applications. Each application also has its own popularity among users, so supporting a number of popular applications cannot be called as compatible as supporting the same number of unpopular applications. For more information, see Linux > package popularity and Linux > study (on application developers) > system calls.
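A small numeric sketch makes the problem concrete. The installation counts and application names below are invented, and the install-weighted share is just one hypothetical way to account for popularity, not necessarily the metric this study defines. Two systems support the same number of applications, yet they serve very different fractions of installations.

```python
from typing import Dict, Set


def supported_app_count(installs: Dict[str, int], supported: Set[str]) -> int:
    """Naive metric: how many known applications the system supports."""
    return sum(1 for app in installs if app in supported)


def install_weighted_share(installs: Dict[str, int], supported: Set[str]) -> float:
    """Fraction of all installations that belong to supported applications."""
    total = sum(installs.values())
    if total == 0:
        return 0.0
    covered = sum(n for app, n in installs.items() if app in supported)
    return covered / total


# Hypothetical installation counts per application.
INSTALLS = {"shell": 900, "editor": 700, "niche_a": 20, "niche_b": 10}

SYSTEM_X = {"shell", "editor"}      # supports two popular applications
SYSTEM_Y = {"niche_a", "niche_b"}   # supports two unpopular applications

if __name__ == "__main__":
    for name, system in (("X", SYSTEM_X), ("Y", SYSTEM_Y)):
        print(name,
              supported_app_count(INSTALLS, system),
              round(install_weighted_share(INSTALLS, system), 3))
```

The naive count is identical for both systems, while the weighted share differs by more than an order of magnitude.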
In theory, API usage is determined by two factors: the choice of APIs made by application developers (including library developers) and the choice of applications made by users. These two types of data have to be collected in different ways. API usage by application developers has to be collected through application analysis at a large scale. Application popularity has to be collected through a study of users in the real world.
Our evaluation model can be generalized to different operating systems, provided that both application analysis and a user study are feasible on the target platform. In this study, we target Linux, particularly Ubuntu/Debian, because both application analysis and a user study are relatively straightforward on this platform. First, Ubuntu/Debian is the most widely used Linux distribution. Second, a large number of applications on Ubuntu are binaries in the well-defined ELF format, so static analysis can reasonably detect their API usage. Third, Ubuntu/Debian's package manager (APT) helps automate large-scale application analysis. Finally, and importantly, Ubuntu/Debian provides the popularity contest, a data set of installation statistics collected from a majority of Ubuntu/Debian users.
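As a rough illustration of why the ELF format makes large-scale analysis tractable, the sketch below selects candidate binaries by checking the four-byte ELF magic number at the start of each file. It is a minimal sketch of one possible pre-filtering step, not the analysis pipeline used in this study; the function names are hypothetical.

```python
from pathlib import Path
from typing import Iterable, List

ELF_MAGIC = b"\x7fELF"  # the first four bytes of every ELF file


def is_elf(path: Path) -> bool:
    """Return True if the file at `path` starts with the ELF magic number."""
    try:
        with open(path, "rb") as f:
            return f.read(4) == ELF_MAGIC
    except OSError:
        # Unreadable or missing files are simply treated as non-ELF.
        return False


def select_elf_files(paths: Iterable[Path]) -> List[Path]:
    """Keep only the paths that look like ELF binaries."""
    return [p for p in paths if is_elf(p)]


if __name__ == "__main__":
    # Example: scan a directory that typically holds installed binaries.
    candidates = [p for p in Path("/usr/bin").iterdir() if p.is_file()]
    print(len(select_elf_files(candidates)), "ELF files found")
```

Files that pass this check could then be handed to a static analyzer, while scripts and data files are skipped.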