ADR-015: Package Metadata
| Status | Date | Author(s) |
|---|---|---|
| Accepted | 2026-03-11 | @nscuro |
Context
To enable latest version checks, component age policies, and integrity verification, we retrieve the corresponding metadata from upstream package repositories.
Currently, this data is persisted in the REPOSITORY_META_COMPONENT and INTEGRITY_META_COMPONENT
tables. Note that INTEGRITY_META_COMPONENT is a new construct introduced in v5,
while REPOSITORY_META_COMPONENT already existed in v4.
REPOSITORY_META_COMPONENT schema
| Column | Type | Constraints |
|---|---|---|
| ID | BIGINT | PK |
| REPOSITORY_TYPE | TEXT | NOT NULL |
| NAMESPACE | TEXT | |
| NAME | TEXT | NOT NULL |
| LATEST_VERSION | TEXT | |
| PUBLISHED | TIMESTAMPTZ | |
| LAST_CHECK | TIMESTAMPTZ | |
INTEGRITY_META_COMPONENT schema
| Column | Type | Constraints |
|---|---|---|
| ID | BIGINT | PK |
| PURL | TEXT | NOT NULL, UNIQUE |
| MD5 | TEXT | |
| SHA1 | TEXT | |
| SHA256 | TEXT | |
| SHA512 | TEXT | |
| PUBLISHED_AT | TIMESTAMPTZ | |
| REPOSITORY_URL | TEXT | |
| LAST_FETCH | TIMESTAMPTZ | |
| STATUS | TEXT | |
Issues and limitations
It is not possible to join the COMPONENT and REPOSITORY_META_COMPONENT tables:
- COMPONENT has PURL and PURL_COORDINATES (PURL without qualifiers and subpath) columns.
- REPOSITORY_META_COMPONENT has REPOSITORY_TYPE, NAMESPACE and NAME columns.
- REPOSITORY_TYPE is not necessarily the same as a PURL type.
- Namespace and name segments of full PURLs might be URL-encoded (e.g. pkg:npm/%40foo/bar@1.0.0).
- NAMESPACE and NAME columns of REPOSITORY_META_COMPONENT contain PURL namespace and name in URL-decoded form.
Thus:
- There is no single column we can join on.
- Constructing or deconstructing PURLs we can join on in the query is not reliable.
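The encoding mismatch can be illustrated with a short sketch. The helper `naive_purl` is hypothetical: it only mimics what a query-side concatenation of REPOSITORY_TYPE, NAMESPACE and NAME might produce, and it assumes for the sake of the example that the repository type (written here as `NPM`) happens to match the PURL type once lower-cased.

```python
from typing import Optional
from urllib.parse import unquote


def naive_purl(repo_type: str, namespace: Optional[str], name: str) -> str:
    """Hypothetical helper: glue decoded column values into a PURL-like string."""
    parts = [part for part in (namespace, name) if part]
    return f"pkg:{repo_type.lower()}/" + "/".join(parts)


# A PURL as it might appear on a COMPONENT record (namespace is URL-encoded).
component_coordinates = "pkg:npm/%40foo/bar@1.0.0"

# REPOSITORY_META_COMPONENT holds namespace and name in URL-decoded form.
namespace, name = unquote("%40foo"), "bar"

rebuilt = naive_purl("NPM", namespace, name)
print(rebuilt)                                    # pkg:npm/@foo/bar
print(component_coordinates.startswith(rebuilt))  # False
```

Re-encoding inside the query would have to follow the PURL encoding rules exactly for every package type, which is why neither direction is treated as reliable here.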
This causes two significant limitations:
- When fetching data for the /api/v1/finding/project/{uuid} endpoint, we run into the N+1 problem because we can't join.
- It is impossible to delete REPOSITORY_META_COMPONENT records that refer to components that no longer exist in the portfolio. The table can thus only grow over time.
Finally, the names REPOSITORY_META_COMPONENT and INTEGRITY_META_COMPONENT are
confusing. Fundamentally, what they describe is package metadata and package artifact metadata, respectively.
Decision
Re-design the schema for this data to be more expressive, and easier to join.
REPOSITORY_META_COMPONENT becomes PACKAGE_METADATA.
Its primary key is a PURL without version, qualifiers, and subpath segments,
e.g. pkg:maven/com.acme/acme-lib. It stores:
- The latest available version for the component.
- Who resolved this data (i.e., name of the resolver implementation).
- When it was resolved.
- What source it was resolved from (e.g. maven-central).
| Column | Type | Constraints |
|---|---|---|
| PURL | TEXT | PK |
| LATEST_VERSION | TEXT | |
| RESOLVED_BY | TEXT | NOT NULL |
| RESOLVED_AT | TIMESTAMPTZ | NOT NULL |
| RESOLVED_FROM | TEXT | |
Its purpose is to store information that applies to all versions of the package.
Note that LATEST_VERSION is now nullable. A NULL value here is used to signal
that resolution was attempted, but no data was found.
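Deriving the version-less key from a full PURL can be sketched as below. This is a simplified illustration, not a complete PURL parser: `package_purl` is a hypothetical name, no validation or normalization is performed, and a real implementation would more likely rely on a dedicated PURL library.

```python
def package_purl(purl: str) -> str:
    """Strip subpath, qualifiers and version from a PURL string (simplified)."""
    without_subpath = purl.split("#", 1)[0]
    without_qualifiers = without_subpath.split("?", 1)[0]
    # Only treat '@' as the version separator when it follows the last '/',
    # so an unencoded scope such as '@foo' in the namespace is left alone.
    last_slash = without_qualifiers.rfind("/")
    at = without_qualifiers.rfind("@")
    if at > last_slash:
        return without_qualifiers[:at]
    return without_qualifiers


print(package_purl("pkg:maven/com.acme/acme-lib@1.2.3?type=jar"))
# pkg:maven/com.acme/acme-lib
print(package_purl("pkg:npm/%40foo/bar@1.0.0"))
# pkg:npm/%40foo/bar
```

Note that the sketch leaves URL-encoding untouched, so the result stays comparable with the canonicalized PURLs it was derived from.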
The PUBLISHED column from REPOSITORY_META_COMPONENT is dropped.
Its presence was a design mistake, as it conflated the publish timestamp of the latest
version with that of an actual artifact.
INTEGRITY_META_COMPONENT becomes PACKAGE_ARTIFACT_METADATA.
Its primary key is a full PURL, including any qualifiers and subpaths,
which enables it to be joined with COMPONENT records. It stores:
- Artifact hashes.
- The publish timestamp.
- Who resolved this data (i.e., name of the resolver implementation).
- When it was resolved.
- What source it was resolved from (e.g. maven-central).
| Column | Type | Constraints |
|---|---|---|
| PURL | TEXT | PK |
| PACKAGE_PURL | TEXT | FK, NOT NULL |
| HASH_MD5 | TEXT | |
| HASH_SHA1 | TEXT | |
| HASH_SHA256 | TEXT | |
| HASH_SHA512 | TEXT | |
| PUBLISHED_AT | TIMESTAMPTZ | |
| RESOLVED_BY | TEXT | NOT NULL |
| RESOLVED_AT | TIMESTAMPTZ | NOT NULL |
| RESOLVED_FROM | TEXT | |
Its purpose is to store artifact-specific information. An important detail here is that
considering PURL qualifiers is critical. pkg:maven/com.acme/acme-lib@1.2.3?type=jar
and pkg:maven/com.acme/acme-lib@1.2.3?type=pom refer to different artifacts, despite
sharing the same coordinates. Their hashes and potentially even publish timestamps differ.
The REPOSITORY_URL column from INTEGRITY_META_COMPONENT is replaced with the combination
of RESOLVED_BY and RESOLVED_FROM. It's not always possible to link metadata to a single
URL, as sometimes multiple requests are required to assemble all necessary information.
The PACKAGE_PURL column has a foreign key constraint referencing the PACKAGE_METADATA
table. This enables clean joins from COMPONENT to PACKAGE_METADATA, for example:
```sql
-- Resolve the latest known version for a single component.
SELECT pm."LATEST_VERSION"
  FROM "COMPONENT" AS c
 INNER JOIN "PACKAGE_ARTIFACT_METADATA" AS pam
    ON pam."PURL" = c."PURL"
 INNER JOIN "PACKAGE_METADATA" AS pm
    ON pm."PURL" = pam."PACKAGE_PURL"
 WHERE c."UUID" = '95d72ef7-a42f-4db6-9335-37aabe357315';
```
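The join above can be exercised end to end with a small, self-contained sketch. It uses Python's built-in `sqlite3` module purely for portability. The ADR targets PostgreSQL, so the DDL is an approximation: SQLite accepts TIMESTAMPTZ only as a type name and needs `PRAGMA foreign_keys = ON` to enforce the FK. The COMPONENT table is reduced to the two columns the query needs, and all sample values, including the resolver name `maven-resolver`, are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript("""
CREATE TABLE "PACKAGE_METADATA" (
  "PURL"           TEXT PRIMARY KEY,
  "LATEST_VERSION" TEXT,
  "RESOLVED_BY"    TEXT NOT NULL,
  "RESOLVED_AT"    TIMESTAMPTZ NOT NULL,
  "RESOLVED_FROM"  TEXT
);
CREATE TABLE "PACKAGE_ARTIFACT_METADATA" (
  "PURL"          TEXT PRIMARY KEY,
  "PACKAGE_PURL"  TEXT NOT NULL REFERENCES "PACKAGE_METADATA" ("PURL"),
  "HASH_SHA256"   TEXT,
  "PUBLISHED_AT"  TIMESTAMPTZ,
  "RESOLVED_BY"   TEXT NOT NULL,
  "RESOLVED_AT"   TIMESTAMPTZ NOT NULL,
  "RESOLVED_FROM" TEXT
);
-- Reduced stand-in for the real COMPONENT table.
CREATE TABLE "COMPONENT" ("UUID" TEXT PRIMARY KEY, "PURL" TEXT);
""")

now = "2026-03-11T00:00:00Z"
component_uuid = "95d72ef7-a42f-4db6-9335-37aabe357315"

# Package row first, then one artifact row per qualifier variant.
conn.execute(
    'INSERT INTO "PACKAGE_METADATA" VALUES (?, ?, ?, ?, ?)',
    ("pkg:maven/com.acme/acme-lib", "1.3.0", "maven-resolver", now, "maven-central"),
)
for artifact_type in ("jar", "pom"):
    conn.execute(
        'INSERT INTO "PACKAGE_ARTIFACT_METADATA" VALUES (?, ?, ?, ?, ?, ?, ?)',
        (f"pkg:maven/com.acme/acme-lib@1.2.3?type={artifact_type}",
         "pkg:maven/com.acme/acme-lib", None, None, "maven-resolver", now, "maven-central"),
    )
conn.execute(
    'INSERT INTO "COMPONENT" VALUES (?, ?)',
    (component_uuid, "pkg:maven/com.acme/acme-lib@1.2.3?type=jar"),
)

row = conn.execute("""
    SELECT pm."LATEST_VERSION"
      FROM "COMPONENT" AS c
     INNER JOIN "PACKAGE_ARTIFACT_METADATA" AS pam ON pam."PURL" = c."PURL"
     INNER JOIN "PACKAGE_METADATA" AS pm ON pm."PURL" = pam."PACKAGE_PURL"
     WHERE c."UUID" = ?
""", (component_uuid,)).fetchone()
print(row[0])  # 1.3.0
```

The jar and pom artifacts occupy separate PACKAGE_ARTIFACT_METADATA rows but share one PACKAGE_METADATA row, so the component's full PURL selects exactly one artifact record and, through it, the package-level data.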
Note
Package metadata resolution is fundamentally linked to PURL.
No other identifier allows for accurate resolution, so none is considered.
Not all COMPONENT records have a PURL.
Every unique COMPONENT.PURL value should have a corresponding PACKAGE_ARTIFACT_METADATA
record with matching PURL column. When ingesting data from BOMs or REST API requests,
we already canonicalize PURLs, which means that qualifier ordering etc. is generally stable
and will not lead to duplicate PACKAGE_ARTIFACT_METADATA rows.
The foreign key also improves data consistency: artifact metadata cannot exist without corresponding
package metadata, even if the latter is "unknown". However, it does force PACKAGE_METADATA
records to exist before PACKAGE_ARTIFACT_METADATA records can be created. Resolvers
are expected to always resolve both sets of information, and the logic that orchestrates
resolvers must ensure proper insertion order.
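The write-order dependency can be sketched as follows. `store_resolution` is a hypothetical orchestration helper, not a function from the codebase; it upserts the package row before the artifact row inside one transaction. SQLite (via Python's standard library) stands in for PostgreSQL, the schema is trimmed to the columns needed for the example, and the resolver name `npm-resolver` is invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to enforce the FK
conn.executescript("""
CREATE TABLE "PACKAGE_METADATA" (
  "PURL"           TEXT PRIMARY KEY,
  "LATEST_VERSION" TEXT,
  "RESOLVED_BY"    TEXT NOT NULL,
  "RESOLVED_AT"    TIMESTAMPTZ NOT NULL
);
CREATE TABLE "PACKAGE_ARTIFACT_METADATA" (
  "PURL"         TEXT PRIMARY KEY,
  "PACKAGE_PURL" TEXT NOT NULL REFERENCES "PACKAGE_METADATA" ("PURL"),
  "RESOLVED_BY"  TEXT NOT NULL,
  "RESOLVED_AT"  TIMESTAMPTZ NOT NULL
);
""")


def store_resolution(conn, package_purl, latest_version, artifact_purl,
                     resolved_by, resolved_at):
    """Hypothetical helper: persist both records in FK-safe order."""
    with conn:  # one transaction; rolled back if either statement fails
        conn.execute(
            '''INSERT INTO "PACKAGE_METADATA" VALUES (?, ?, ?, ?)
               ON CONFLICT ("PURL") DO UPDATE SET
                 "LATEST_VERSION" = excluded."LATEST_VERSION",
                 "RESOLVED_BY"    = excluded."RESOLVED_BY",
                 "RESOLVED_AT"    = excluded."RESOLVED_AT"''',
            (package_purl, latest_version, resolved_by, resolved_at),
        )
        conn.execute(
            '''INSERT INTO "PACKAGE_ARTIFACT_METADATA" VALUES (?, ?, ?, ?)
               ON CONFLICT ("PURL") DO UPDATE SET
                 "RESOLVED_BY" = excluded."RESOLVED_BY",
                 "RESOLVED_AT" = excluded."RESOLVED_AT"''',
            (artifact_purl, package_purl, resolved_by, resolved_at),
        )


now = "2026-03-11T00:00:00Z"

# Inserting the artifact row before its package row violates the foreign key.
try:
    conn.execute(
        'INSERT INTO "PACKAGE_ARTIFACT_METADATA" VALUES (?, ?, ?, ?)',
        ("pkg:npm/%40foo/bar@1.0.0", "pkg:npm/%40foo/bar", "npm-resolver", now),
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
conn.rollback()

# Correct order; LATEST_VERSION is NULL because nothing was found upstream.
store_resolution(conn, "pkg:npm/%40foo/bar", None,
                 "pkg:npm/%40foo/bar@1.0.0", "npm-resolver", now)
print(fk_enforced)  # True
```

Using upserts keeps the helper safe to call again when the same package or artifact is re-resolved later; whether the actual orchestration logic takes this form is not specified here.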
The names PACKAGE_METADATA and PACKAGE_ARTIFACT_METADATA better convey what data
the tables hold.
Consequences
- The N+1 problem for endpoints like /api/v1/finding/project/{uuid} is eliminated. Package and artifact metadata can be fetched in a single query via joins.
- Orphaned metadata rows can be identified and cleaned up by left-joining against COMPONENT, preventing unbounded table growth.
- The FK from PACKAGE_ARTIFACT_METADATA to PACKAGE_METADATA enforces data consistency at the database level, but introduces a write-order dependency that all resolvers and their orchestration logic must respect.
- Existing data in REPOSITORY_META_COMPONENT and INTEGRITY_META_COMPONENT must be migrated. REPOSITORY_META_COMPONENT rows cannot be migrated losslessly because they lack a PURL; they will need to be re-resolved.
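The orphan cleanup mentioned in the list above might look like the following sketch. It again uses SQLite through Python's standard library as a stand-in for PostgreSQL, with tables trimmed to their key columns and invented sample rows. Removing PACKAGE_METADATA rows that no artifact row references any more is an assumption of this example, not something the decision prescribes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE "PACKAGE_METADATA" ("PURL" TEXT PRIMARY KEY);
CREATE TABLE "PACKAGE_ARTIFACT_METADATA" (
  "PURL"         TEXT PRIMARY KEY,
  "PACKAGE_PURL" TEXT NOT NULL REFERENCES "PACKAGE_METADATA" ("PURL")
);
CREATE TABLE "COMPONENT" ("ID" INTEGER PRIMARY KEY, "PURL" TEXT);

INSERT INTO "PACKAGE_METADATA" VALUES
  ('pkg:maven/com.acme/acme-lib'),
  ('pkg:npm/%40foo/bar');
INSERT INTO "PACKAGE_ARTIFACT_METADATA" VALUES
  ('pkg:maven/com.acme/acme-lib@1.2.3?type=jar', 'pkg:maven/com.acme/acme-lib'),
  ('pkg:npm/%40foo/bar@1.0.0', 'pkg:npm/%40foo/bar');
-- Only the Maven artifact is still referenced by a component.
INSERT INTO "COMPONENT" ("PURL") VALUES ('pkg:maven/com.acme/acme-lib@1.2.3?type=jar');
""")

with conn:
    # Artifact rows go first: they hold the FK to PACKAGE_METADATA.
    conn.execute("""
        DELETE FROM "PACKAGE_ARTIFACT_METADATA"
         WHERE "PURL" IN (
           SELECT pam."PURL"
             FROM "PACKAGE_ARTIFACT_METADATA" AS pam
             LEFT JOIN "COMPONENT" AS c ON c."PURL" = pam."PURL"
            WHERE c."PURL" IS NULL)
    """)
    # Assumption: package rows without any remaining artifact row are removed too.
    conn.execute("""
        DELETE FROM "PACKAGE_METADATA"
         WHERE "PURL" IN (
           SELECT pm."PURL"
             FROM "PACKAGE_METADATA" AS pm
             LEFT JOIN "PACKAGE_ARTIFACT_METADATA" AS pam
               ON pam."PACKAGE_PURL" = pm."PURL"
            WHERE pam."PURL" IS NULL)
    """)

remaining = [r[0] for r in conn.execute('SELECT "PURL" FROM "PACKAGE_METADATA"')]
print(remaining)  # ['pkg:maven/com.acme/acme-lib']
```

The deletion order mirrors the insertion order in reverse, since the foreign key would otherwise reject removing a package row that an artifact row still references.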