Nicola Bena, Marco Anisetti, Ernesto Damiani, Alex Della Bruna, Chan Yeob Yeun, Claudio A. Ardagna
The advent of Large Language Models (LLMs) is revolutionizing the design and deployment of modern applications, enabling intelligent, adaptive, and context-aware services across a wide range of domains. These AI-driven components now coexist with legacy systems, microservices, and nanoservices, forming complex and evolving ecosystems. While LLMs offer unprecedented capabilities, they also introduce new risks related to security, privacy, and ethics, especially due to their probabilistic nature and dependence on vast, often opaque, training data. This paradigm shift calls for robust mechanisms to assess and verify the non-functional behavior of LLM-based applications. Assurance emerges as a key strategy, with certification techniques traditionally used to validate non-functional properties. However, existing certification approaches for LLMs are still in their infancy, while traditional methods prove inadequate in this context. In this paper, we propose a multi-dimensional certification scheme for LLM-based applications that first captures the broader context and dynamic behavior introduced by LLMs and other AI components. To this aim, after proposing a taxonomy of LLM-based applications and discussing the gaps in LLM assessment and verification, we define a certification model in which a hypergraph represents a specific behavior supporting the property to be certified and drives the evidence collection underlying the corresponding certification process. We then instantiate the proposed approach in different use cases and experimentally evaluate it using an LLM-based application that implements a recommendation task in a security-critical scenario.
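To give an intuition of the hypergraph-based certification model mentioned above, the following is a minimal, hypothetical Python sketch (not the implementation used in this repository): it encodes a hypergraph whose nodes are components of an LLM-based application and whose hyperedges group the components whose joint behavior supports a non-functional property, and from which evidence-collection targets can be derived. All names (`CertificationHypergraph`, `Hyperedge`, `evidence_targets`, the component and property labels) are illustrative assumptions.

```python
from dataclasses import dataclass, field

# Illustrative sketch only: nodes are components of an LLM-based application,
# hyperedges group the components whose joint behavior supports a
# non-functional property, and each hyperedge drives evidence collection.

@dataclass(frozen=True)
class Hyperedge:
    property_name: str          # e.g., a hypothetical "confidential-recommendation"
    components: frozenset[str]  # components jointly exhibiting the behavior

@dataclass
class CertificationHypergraph:
    nodes: set[str] = field(default_factory=set)
    edges: list[Hyperedge] = field(default_factory=list)

    def add_behavior(self, property_name: str, components: set[str]) -> None:
        """Register a behavior (hyperedge) supporting a property."""
        self.nodes |= components
        self.edges.append(Hyperedge(property_name, frozenset(components)))

    def evidence_targets(self, property_name: str) -> list[frozenset[str]]:
        """Component groups to probe when collecting evidence for a property."""
        return [e.components for e in self.edges if e.property_name == property_name]

# Toy usage: an LLM recommender plus supporting services (hypothetical names).
hg = CertificationHypergraph()
hg.add_behavior("confidential-recommendation", {"llm-recommender", "retrieval-service"})
hg.add_behavior("confidential-recommendation", {"llm-recommender", "audit-logger"})
print(hg.evidence_targets("confidential-recommendation"))
```

The actual model, use cases, and evaluation pipeline are defined in the paper and in the source code contained in this repository.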
This repository contains the source code, input dataset, and the intermediate and detailed results of our experimental evaluation.