ADR-014: New Alias Schema
| Status | Date | Author(s) |
|---|---|---|
| Accepted | 2026-02-23 | @nscuro |
Context¶
Vulnerability aliases are stored in the denormalized VULNERABILITYALIAS table:
| Column | Type | Constraints |
|---|---|---|
| ID | BIGINT | PK |
| CVE_ID | TEXT | |
| GHSA_ID | TEXT | |
| GSD_ID | TEXT | |
| INTERNAL_ID | TEXT | |
| OSV_ID | TEXT | |
| SNYK_ID | TEXT | |
| SONATYPE_ID | TEXT | |
| VULNDB_ID | TEXT | |
| UUID | UUID | NOT NULL, UNIQUE |
This design poses a few challenges:
- Rows lack a natural key, making it impossible to detect and prevent duplicates.
- Modifying rows (i.e. adding a new ID to an existing alias group) is prone to race conditions.
- Due to the combination of the points above, batching operations on this table is not possible.
- Vulnerability sources are hardcoded as columns, making it unnecessarily challenging to add new sources.
- Querying the table is unnecessarily hard, as it requires the caller to know what column to query on.
- The lack of provenance for alias relationships prevents safe removal of relationships, e.g. when upstream sources correct their data.
The logic to create or modify alias records is brittle and non-deterministic. Making it concurrency-safe would require acquisition of coarse advisory locks.
Alias synchronization unfortunately is in the hot path for vulnerability analysis result reconciliation, and is performed concurrently with potentially overlapping data. To ensure that synchronization is both performant and correct, we need a solution that allows us to batch database operations, while effectively shielding us against data races.
Decision¶
Schema¶
Normalize the data into a new VULNERABILITY_ALIAS table with the following schema:
| Column | Type | Constraints |
|---|---|---|
| GROUP_ID | UUID | NOT NULL |
| SOURCE | TEXT | PK |
| VULN_ID | TEXT | PK |
- The separate ID columns are collapsed into
SOURCEandVULN_ID. SOURCEandVULN_IDform the natural (primary) key, effectively preventing duplicates.- Alias relationships are identified via matching
GROUP_ID.
Querying¶
To query all aliases of a vulnerability identified by source and vulnId, excluding the input pair itself:
Alias Assertions¶
To track provenance of alias relationships, a separate VULNERABILITY_ALIAS_ASSERTION table records
which entity asserted that two vulnerabilities are aliases:
| Column | Type | Constraints |
|---|---|---|
| ASSERTER | TEXT | PK |
| VULN_SOURCE | TEXT | PK |
| VULN_ID | TEXT | PK |
| ALIAS_SOURCE | TEXT | PK |
| ALIAS_ID | TEXT | PK |
| CREATED_AT | TIMESTAMPTZ(3) | NOT NULL, DEFAULT NOW() |
Each row records that ASSERTER claimed (VULN_SOURCE, VULN_ID) and (ALIAS_SOURCE, ALIAS_ID)
are aliases. Assertions are directional: (VULN_SOURCE, VULN_ID) is the declaring vulnerability,
(ALIAS_SOURCE, ALIAS_ID) is the alias attributed to it. This enables efficient reconciliation
by querying existing assertions for a given vulnerability.
Alias groups in the VULNERABILITY_ALIAS table are derived from assertions and serve as a
materialized view for efficient read queries. They are recomputed whenever assertions change.
Assertions provide an audit trail and enable workflows such as revoking assertions from
a specific source, without affecting others.
Synchronization Algorithm¶
Given an asserter (e.g. NVD) and a map of declaring vulnerabilities to their asserted aliases:
- Begin transaction.
- Acquire PostgreSQL advisory locks for all declaring vulnerabilities, ordered by key to prevent deadlocks between concurrent transactions:
- Fetch existing assertions for the declaring vulnerabilities:
- Reconcile incoming aliases against existing assertions, scoped to the current asserter:
- Assertions to create: incoming alias keys minus existing alias keys for this asserter.
- Assertions to delete: existing alias keys for this asserter minus incoming alias keys.
UNKNOWNcleanup: if the asserter is notUNKNOWNandUNKNOWNassertions exist for the same declaring vulnerability, mark it for removal.
- Delete stale assertions:
- Create new assertions:
- Delete
UNKNOWNassertions for declaring vulnerabilities where a real asserter now provides claims: - Recompute alias groups for all modified vulnerabilities:
- Expand transitively: iteratively query both
VULNERABILITY_ALIASandVULNERABILITY_ALIAS_ASSERTIONto discover all transitively related keys. For example, ifCVE-1is being linked toGHSA-1, butGHSA-1already has an assertion linking it toGHSA-2, expansion ensuresGHSA-2is included. - Build a union-find from the expanded assertions to compute connected components.
- For each component, pick the lowest existing group UUID (deterministic via sorted set), or generate a new one if the component has no prior group.
- Upsert alias records, only writing when the group ID actually changed:
- Delete orphaned aliases no longer backed by any assertion.
- Expand transitively: iteratively query both
- Commit transaction and release locks (implicit).
Note
Advisory locks are scoped to declaring vulnerability only. This is sufficient because
assertions are directional: a given asserter always writes assertions under the declaring
vulnerability it owns (e.g. NVD writes assertions under NVD|CVE-*).
All SELECT, DELETE, and INSERT operations are batched via UNNEST, allowing multiple
vulnerabilities to be processed in a single transaction with minimal round trips.
The upsert's WHERE ... IS DISTINCT FROM clause avoids unnecessary writes.
Data Migration¶
Existing data is migrated from VULNERABILITYALIAS to VULNERABILITY_ALIAS via Liquibase.
The migration replicates the synchronization algorithm in SQL.
The old VULNERABILITYALIAS table is dropped afterwards.
Assertions are seeded from the migrated alias groups. For each group, one assertion per unordered
pair of members is inserted with ASSERTER = 'UNKNOWN', since the original data does not carry
provenance information.
An integration test verifies that the migration works as expected, including the handling of potential duplicates in the existing data set, and the correctness of seeded assertions.
Consequences¶
- Adding new vulnerability sources requires no schema changes.
- Alias synchronization can be fully batched, reducing round trips in the hot path.
- The natural primary key prevents duplicate alias entries by construction.
- Querying aliases is uniform, and callers no longer need source-specific column knowledge.
- The old
UUIDcolumn is dropped. Any external references to alias records by UUID will break. No known external consumers depend on this identifier. - Advisory locks add contention under concurrent writes to overlapping alias sets. This is bounded by the lock granularity (per declaring vulnerability key), and acceptable given the correctness guarantees it provides.
- Alias group recomputation requires transitive expansion, which issues additional queries. In practice, alias groups are small (< 5 members), so this is negligible.
- Alias assertions provide provenance but grow linearly with the number of aliases per declaring vulnerability. Given the small expected group sizes, this is acceptable.
UNKNOWNassertions seeded during migration are automatically superseded when a real asserter (e.g. NVD, GitHub) provides claims for the same declaring vulnerability.