Software developers use datasets for training, validating, and testing AI systems and the models that underpin them. Data privacy risks, including the possibility of de-anonymisation, remain a challenge across the AI system development lifecycle (SDLC). Existing data governance frameworks often require specialist knowledge and manual processes, making risk assessment difficult. This paper presents the Privacy Anonymisation Risk Assessor (PARA), a prototype multi-agent system designed to assess dataset de-anonymisation risks. PARA is intended to help software developers detect these risks by inspecting datasets, calculating de-anonymisation metrics, and producing dataset-level assessment reports. The research follows the Design Science Research Methodology (DSRM) to develop and evaluate PARA. A demonstration on publicly available Australian Census data confirmed the prototype’s potential applicability by producing an assessment report for each dataset. While this initial evaluation is limited, the results suggest that PARA can support developers in assessing data privacy risks and producing auditable feedback throughout the AI SDLC.
• Data anonymisation and sanitisation • Privacy-preserving protocols • Software verification and validation
Risk Assessor, Data Privacy, Anonymisation, Governance
AI systems rely on data to train, validate, and test models [9]. Software developers must check datasets for fitness for purpose, copyright compliance, and privacy requirements to avoid legal, ethical, and reputational risks [14]. Even after preprocessing, combinations of indirect identifiers may still reveal identities [1,2,16]. Data privacy risks exist across AI SDLC stages: Initiate, Discover, Develop, Operate, Govern, and Adapt [9]. Current frameworks (NIST Privacy Framework, NIST AI RMF, EU AI Act) provide guidance but often rely on manual processes and specialist expertise, complicating consistent risk assessment [6,7,8,12].
This paper proposes the Privacy Anonymisation Risk Assessor (PARA), a multi-agent system for assessing de-anonymisation risk in tabular data. PARA generates anonymisation thresholds, calculates risk statistics, and produces auditable, per-dataset reports to help developers make informed decisions.
AI systems can expose sensitive information through model inversion and membership inference attacks [3,4,5,12]. Large language models may retain training data [5,10,11], demonstrating that anonymisation alone may not fully protect privacy [1,3,4,5]. Common methods to assess de-anonymisation risk include k-anonymity, l-diversity, t-closeness, and linkage-based risk estimation [2,13]. These methods are informative but can be complex to interpret in sparse or longitudinal datasets [2], indicating the need for tools like PARA to help developers assess privacy risks across AI SDLC stages [12].
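To illustrate two of these measures, the sketch below computes k-anonymity and l-diversity for a small tabular dataset using pandas. It is a minimal example with assumed column names and toy data, not PARA’s implementation.

```python
# Minimal sketch (not PARA's implementation): computing k-anonymity and
# l-diversity for a small tabular dataset. Column names and data are
# illustrative only.
import pandas as pd

df = pd.DataFrame({
    "age_band":  ["20-29", "20-29", "20-29", "30-39", "30-39"],
    "postcode":  ["2000",  "2000",  "2000",  "3000",  "3000"],
    "sex":       ["F",     "F",     "M",     "F",     "F"],
    "condition": ["A",     "B",     "A",     "C",     "C"],  # sensitive attribute
})

quasi_identifiers = ["age_band", "postcode", "sex"]
sensitive = "condition"

# k-anonymity: size of the smallest equivalence class formed by the quasi-identifiers.
class_sizes = df.groupby(quasi_identifiers).size()
k = int(class_sizes.min())

# l-diversity: smallest number of distinct sensitive values within any equivalence class.
l = int(df.groupby(quasi_identifiers)[sensitive].nunique().min())

print(f"k-anonymity: {k}, l-diversity: {l}")
# A record unique on its quasi-identifiers (k = 1), or an equivalence class with a
# single sensitive value (l = 1), signals elevated de-anonymisation risk.
```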
This research applied Design Science Research Methodology (DSRM) to develop and evaluate PARA [13]. The six DSRM steps are: (1) problem identification, (2) solution design, (3) prototype development, (4) demonstration, (5) evaluation, and (6) communication. This paper focuses on Steps 1–4 with indicative findings from Step 5.
Figure 1: Design Science Research Methodology (DSRM). Adapted from [13].
PARA uses a hierarchical multi-agent system (HMAS) with a top-level Orchestrator Agent supervising three specialised agents: Scanner, Validator, and Summariser (Table 1). The prototype was developed using Google’s Agent Development Kit (ADK) with Model Context Protocol (MCP) and Gemma LLM (“gemma-3n-e4b-it”) integration [19].
Figure 2: PARA architecture and agent interactions. Human developers interact with the dashboard (1), which communicates with the Orchestrator (2) coordinating Scanner (3), Validator (4), and Summariser (5). External resources (6–11) support data inspection, anonymisation computation, LLM assistance, observability, and downstream usage.
| NO. | COMPONENT | DESCRIPTION |
|---|---|---|
| 1 | User Interface (dashboard) | Web dashboard for triggering scans, monitoring progress, and viewing reports. |
| 2 | Orchestrator Agent | Central agent that coordinates the specialised agents. |
| 3 | Scanner Agent | Searches datasets; returns scan results including anonymisation thresholds, quasi-identifiers, and sensitive columns. |
| 4 | Validator Agent | Computes de-anonymisation risk statistics; returns assessment results. |
| 5 | Summariser Agent | Produces the auditable assessment report. |
| 6 | MCP Client | Client for agent-server communication. |
| 7 | MCP Server A | Provides access to the data repository for dataset inspection and schema retrieval. |
| 8 | MCP Server B | Assesses de-anonymisation risk via statistical modules. |
| 9 | LLM Model | Gemma LLM (“gemma-3n-e4b-it”), used for generating anonymisation thresholds and assessment reports. |
| 10 | Observability Store | Central logs and metrics for all agents. |
| 11 | External Layer | Downstream AI systems/services; PARA scans this repository to assess anonymisation risk. |
Table 1: Core user‑facing and runtime components; external resources (7–11) shown in Figure 2.
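To make the agent responsibilities in Table 1 concrete, the following is a minimal, hypothetical sketch of the Scanner, Validator, and Summariser flow under an orchestrator. It is plain Python and does not use Google ADK, MCP, or an LLM; the function names, column names, and threshold value are illustrative assumptions rather than PARA’s actual design.

```python
# Hypothetical sketch of a Scanner -> Validator -> Summariser flow
# (plain Python; no Google ADK, MCP, or LLM). All names, columns, and the
# k threshold below are illustrative assumptions.
import pandas as pd


def scanner_agent(df: pd.DataFrame) -> dict:
    """Inspect the dataset and propose quasi-identifiers, sensitive columns,
    and an anonymisation threshold (here a fixed minimum k)."""
    candidate_qids = ["age_band", "postcode", "sex"]  # assumed column names
    return {
        "quasi_identifiers": [c for c in candidate_qids if c in df.columns],
        "sensitive_columns": [c for c in ["condition"] if c in df.columns],
        "min_k_threshold": 5,
    }


def validator_agent(df: pd.DataFrame, scan: dict) -> dict:
    """Compute de-anonymisation risk statistics against the scan results."""
    sizes = df.groupby(scan["quasi_identifiers"]).size()
    k = int(sizes.min())
    return {
        "k_anonymity": k,
        "unique_records": int((sizes == 1).sum()),
        "meets_threshold": k >= scan["min_k_threshold"],
    }


def summariser_agent(name: str, scan: dict, assessment: dict) -> dict:
    """Assemble an auditable, per-dataset assessment report."""
    return {"dataset": name, "scan": scan, "assessment": assessment}


def orchestrator(name: str, df: pd.DataFrame) -> dict:
    """Coordinate the specialised agents and return the final report."""
    scan = scanner_agent(df)
    assessment = validator_agent(df, scan)
    return summariser_agent(name, scan, assessment)
```

In the prototype itself, the corresponding steps are delegated to LLM-backed agents communicating with the MCP servers described in Table 1, rather than local functions.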
The prototype was demonstrated in the Discover phase of the AI SDLC using 2021 Australian Census tabular datasets. Developers initiate assessments via the Privacy Monitor tab, with individual dataset results accessible in the Recent Reports tab. Figure 3 shows example assessment reports.
Figure 3: Detailed assessment reports from the PARA demonstration on 2021 Australian Census datasets.
The demonstration served as an indicative evaluation, showing that PARA can produce auditable assessment reports for multiple datasets in a controlled environment.
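As an illustration of the kind of statistic such an assessment might include for census-style tables, the sketch below estimates the share of records that are unique on a set of quasi-identifiers, a simple proxy for linkage-based re-identification risk. The column names are assumptions; the code is not taken from PARA or the ABS data.

```python
# Illustrative sketch (not PARA's code): share of records that are unique on
# a set of census-style quasi-identifiers. Column names are assumptions,
# not the ABS schema.
import pandas as pd


def uniqueness_risk(df: pd.DataFrame, quasi_identifiers: list[str]) -> float:
    """Return the fraction of records whose quasi-identifier combination
    appears exactly once in the dataset."""
    counts = df.groupby(quasi_identifiers).size()
    return float((counts == 1).sum()) / len(df)


# Example usage with assumed column names:
# risk = uniqueness_risk(census_df, ["age_band", "sex", "sa2_region"])
# A higher value means more records could be singled out by these attributes.
```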
The evaluation highlighted PARA’s potential to support developers in assessing anonymisation risks during the AI SDLC. Although the evaluation was limited in scope and did not include formal measures of accuracy, scalability, or usability, the prototype shows promise in generating dataset-level, auditable assessment reports. Future work will expand testing, conduct usability studies, and refine risk estimation techniques.
PARA is a multi-agent prototype that supports developers in assessing de-anonymisation risks in tabular datasets during the AI SDLC. The demonstration indicated that it can produce auditable assessment reports for multiple datasets, providing actionable feedback for dataset usage decisions. Future work will expand applicability, refine risk estimation, and conduct usability studies.
This research is supported by an Australian Government Research Training Program Scholarship. The authors gratefully acknowledge this support.
Prototype source code: GitHub Repository
Prototype demonstration: YouTube Video