Overview

Introduction

An increasing number of software systems today are very-large-scale software systems (VLSS) with decentralized control; support for multiple platforms; inherently conflicting requirements; continuous evolution and deployment; as well as heterogeneous, inconsistent, and changing elements (Maier 1998; Northrop et al. 2006). Such systems are often developed and evolved by heterogeneous and globally distributed teams and communities. VLSS are typically based on system-of-systems architectures comprising multiple heterogeneous systems, which evolve over many years and need to be continuously adapted to meet customer, market, and technology requirements. Terms used in the literature to describe VLSS include ultra-large-scale systems (Sullivan and Kazman 2008), systems-of-systems (Boehm 2006; Nielsen et al. 2015), and software ecosystems (Bosch 2009). Conventional software engineering has been compared to “building a house”, while the development and evolution of VLSS has been compared to the “development of cities”, which can typically no longer be planned, engineered, and evolved in a top-down fashion (Northrop et al. 2006). In such a context, methods for monitoring and evolution play an essential role:

Monitoring of VLSS. Ensuring the compliance of VLSS with their requirements is a necessity during evolution, as organizations are not willing to use software for business-critical operations if the relationship to its requirements is vague (Robinson 2006; Maiden 2013). However, due to the size and complexity of VLSS, a top-down approach to engineering and maintenance is no longer possible. For instance, the system is no longer under the control of a single stakeholder who determines a consistent set of requirements (Hall 2002). Novel approaches are required to monitor the emergent behavior of VLSS and to support the localization of problems. In particular, monitoring the performance and resource consumption of software systems is crucial to track down potential bottlenecks and software anomalies. For VLSS, performance monitoring is particularly challenging because systems might use different technologies, and user transactions can span several platform boundaries. Performance monitoring has to cooperate with the underlying levels of system software, such as the operating system, the virtual machine, or the hypervisor, to obtain relevant performance data.

Evolution of VLSS. Evolution is the rule and not the exception due to ever-changing requirements, technologies, and markets. Users demand flexible systems that can be adapted rapidly and reliably to meet changing requirements, to increase performance, or to update technology. Existing VLSS represent a significant investment which needs to be preserved. Many industrial systems have a lifetime of 10-30 years and face continuous evolution due to changing software technology, new development methods, and business decisions driven by market situations. These challenges are specifically relevant in the context of VLSS, which comprise multiple, heterogeneous software systems that are managed and evolved independently. It is common in industrial development that different systems of the VLSS are deployed to customers over time, which then exist in many versions and variants. Customers often even evolve the systems themselves, e.g., when developing specific extensions. In software ecosystems, a whole community of (often even unknown) developers builds solutions based on a common platform. In such situations, upgrading a system or parts of it at reasonable cost can become almost impossible, since both the original system and the deployed system may have evolved independently.

Research Areas

The paradigm shift from “building houses” to “building cities” requires new approaches and tools supporting monitoring and evolution. The figure shows the general themes and the specific research areas of the laboratory. We develop monitoring and evolution techniques for three different stages of VLSS development: operation and production, staging and simulation, and development.

Operation and production. The lab develops methods and tools for monitoring systems at runtime to collect data about system performance and system behavior. For instance, Module 1 has been developing an infrastructure for event-based monitoring of heterogeneous systems during commissioning and operation. Monitoring data will allow better diagnosis and more informed decisions about VLSS evolution. Module 3 has been developing non-intrusive performance and memory monitoring techniques inside Java virtual machines.
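To make the idea of event-based monitoring more concrete, the following is a minimal Python sketch, not the lab's actual infrastructure. It assumes that heterogeneous systems report events in a common format and that a requirement can be expressed as a predicate over single events. The names `Event`, `Constraint`, `check`, the source names, and the 200 ms latency limit are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Event:
    source: str       # hypothetical name of the emitting system
    kind: str         # hypothetical event type, e.g. "request_done"
    timestamp: float  # seconds since the start of observation
    value: float      # measured quantity, here a latency in milliseconds


@dataclass(frozen=True)
class Constraint:
    name: str
    applies_to: Callable[[Event], bool]  # selects the events the constraint covers
    holds: Callable[[Event], bool]       # True if the event satisfies the constraint


def check(events: Iterable[Event],
          constraints: List[Constraint]) -> List[Tuple[str, Event]]:
    """Return a (constraint name, event) pair for every violation found."""
    violations = []
    for event in events:
        for constraint in constraints:
            if constraint.applies_to(event) and not constraint.holds(event):
                violations.append((constraint.name, event))
    return violations


# Assumed example requirement: completed requests take less than 200 ms.
max_latency = Constraint(
    name="latency_below_200ms",
    applies_to=lambda e: e.kind == "request_done",
    holds=lambda e: e.value < 200.0,
)

# Events from two made-up systems, already normalized to the common format.
events = [
    Event("plc_gateway", "request_done", 0.5, 120.0),
    Event("web_frontend", "request_done", 0.9, 340.0),
    Event("web_frontend", "heartbeat", 1.0, 0.0),
]

violations = check(events, [max_latency])
for name, event in violations:
    print(f"{name} violated by {event.source} at t={event.timestamp}")
```

Keeping the constraint separate from the event sources means the same check can run over events from any system that can be mapped to the common format, which is the property that matters when systems are built on different technologies.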

Staging and simulation. Verification and validation in a VLSS context is challenging due to complex dependencies between the systems the VLSS comprises and their complex environment. Support for staging and simulating VLSS after changes becomes vital to ensure compliance with requirements after evolving components in a VLSS. Research in Module 1 has been developing requirements-based monitoring techniques that interact with existing system simulators to assess VLSS after modifications. Another area of research is capture-and-replay techniques to assess deviations from the expected VLSS behavior.
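The capture-and-replay idea can be illustrated with a small Python sketch, under strong simplifying assumptions: behavior is reduced to an ordered list of event names, and a deviation is any position where the replayed trace differs from the captured one. The function `deviations` and the event names are hypothetical. Real traces would also need to account for timing, concurrency, and acceptable reordering.

```python
from itertools import zip_longest
from typing import List, Optional, Sequence, Tuple


def deviations(expected: Sequence[str],
               observed: Sequence[str]) -> List[Tuple[int, Optional[str], Optional[str]]]:
    """Return (index, expected, observed) for each position where the traces differ.

    A missing event on either side is reported as None, so this sketch
    assumes that None never occurs as a real event name.
    """
    return [
        (index, exp, obs)
        for index, (exp, obs) in enumerate(zip_longest(expected, observed))
        if exp != obs
    ]


# Trace captured from the system before a change (made-up event names).
captured = ["start", "load_config", "open_valve", "close_valve", "stop"]

# Trace observed when replaying the same inputs against the modified system.
replayed = ["start", "load_config", "open_valve", "stop"]

for index, exp, obs in deviations(captured, replayed):
    print(f"position {index}: expected {exp!r}, observed {obs!r}")
```

A position-by-position comparison reports every later event as a deviation once one event is missing. A sequence alignment, for example with Python's `difflib.SequenceMatcher`, would localize such a gap more precisely.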

Development. VLSS are often developed in software ecosystems in a globally distributed manner by internal and external developers based on core software platforms. In such settings, monitoring ongoing development is essential to understand what has been changed, to determine the location of features in the code, and to assess the impact of changes. Module 2 focuses on feature-oriented development (e.g., based on program analysis and change tracking) in support of VLSS evolution. This work also covers multi-level feature models in software ecosystems and techniques for model-to-code consistency.