IPIP-499: UnixFS CID Profiles #499

mishmosh · 2025-04-03T14:03:02Z

Currently, CIDs can be generated with a variety of settings and optimizations for chunking, DAG width, and more. This means the same file can yield multiple, different CIDs depending on which tools and settings are used, and it is not possible to reliably reproduce or verify the CID.

This proposal introduces profiles for IPFS CIDs. Profiles explicitly define CID version, hash algorithm, chunk size, DAG width, layout, and other parameters. They can be used to verify data across implementations, provide recommended settings depending on retrieval performance goals, and more.
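To show why these parameters change the resulting CID, here is a minimal Go sketch. It assumes a hypothetical `Profile` struct whose field names are illustrative and do not mirror the IPIP text or any implementation's config keys. It also assumes a fixed-size chunker and the balanced layout. The sketch computes how many leaf chunks a file produces and the minimal height of a balanced DAG for a given maximum number of links per `File` node. Two profiles with different chunk sizes and widths produce differently shaped DAGs for the same bytes, and therefore different root CIDs.

```go
package main

import "fmt"

// Profile is a hypothetical subset of the parameters a CID profile pins down.
// Field names are illustrative only.
type Profile struct {
	Name         string
	CIDVersion   int
	ChunkSize    int64 // fixed-size chunker: bytes per leaf
	FileMaxLinks int64 // DAG width: max links per File node
}

// LeafCount returns the number of fixed-size chunks for a file of fileSize bytes.
// Treating an empty file as a single (empty) leaf is an assumption of this sketch.
func (p Profile) LeafCount(fileSize int64) int64 {
	if fileSize <= 0 {
		return 1
	}
	return (fileSize + p.ChunkSize - 1) / p.ChunkSize
}

// BalancedHeight returns the minimal number of internal levels above the leaves
// of a balanced DAG, i.e. the smallest h with FileMaxLinks^h >= leaves.
func (p Profile) BalancedHeight(fileSize int64) int {
	n := p.LeafCount(fileSize)
	h := 0
	for n > 1 {
		n = (n + p.FileMaxLinks - 1) / p.FileMaxLinks
		h++
	}
	return h
}

func main() {
	// Illustrative values only; the profile tables in the IPIP are authoritative.
	narrow := Profile{Name: "example-narrow", CIDVersion: 1, ChunkSize: 256 << 10, FileMaxLinks: 174}
	wide := Profile{Name: "example-wide", CIDVersion: 1, ChunkSize: 1 << 20, FileMaxLinks: 1024}

	const size = 100 << 20 // a 100 MiB file
	for _, p := range []Profile{narrow, wide} {
		fmt.Printf("%s: %d leaves, height %d\n", p.Name, p.LeafCount(size), p.BalancedHeight(size))
	}
}
```

For a 100 MiB file, the narrow example yields 400 leaves under a tree of height 2. The wide example yields 100 leaves under a single root. The two shapes cannot share a root CID.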

src/ipips/ipip-0499.md

Let's make the fanout match the max links for files and rename the profile to `-wide`; this will make it easier to discuss in ipfs/specs#499

Co-authored-by: Bumblefudge <bumblefudge@learningproof.xyz>

Import.* config params for controlling DAG width were added in: ipfs/kubo#10774

lidel · 2025-04-15T22:37:05Z

Thank you for kicking this off and filling in the initial state.

I've incorporated specific "DAG width" settings for File, Directory and HAMTDirectory nodes, and updated the table to reflect the state from ipfs/kubo#10774 and the profiles that exist in the Kubo master branch: `legacy-cid-v0`, `test-cid-v1` and `test-cid-v1-wide`:

https://github.com/ipfs/kubo/blob/master/config/profile.go#L268-L307

- agree on what the "cid-2025" profile should look like
  - this will be the new default in "Kubo v1.0"
  - we have test-cid-v1 and test-cid-v1-wide in Kubo as potential candidates
- switch to a PR from a local branch (so we have a build preview)
- figure out how to render the information (currently the table is not supported by https://github.com/ipfs/spec-generator)

src/ipips/ipip-0499.md

Co-authored-by: Christian Paul <info@jaller.de>

darobin · 2025-10-31T15:25:23Z

src/ipips/ipip-0499.md

+1. UnixFS DAG layout (e.g. balanced, trickle etc...)
+1. UnixFS DAG width (max number of links per `File` node)
+1. `HAMTDirectory` bitwidth, i.e. the number of bits determines the fanout of the `HAMTDirectory` (default bitwidth is 8 == 256 leaves).
+1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links
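The bitwidth/fanout relationship in the list above is a power of two. A `HAMTDirectory` that uses b hash bits per level has 2^b buckets, so the default bitwidth of 8 gives a fanout of 256. The following minimal Go sketch converts in both directions. It assumes the fanout must be a power of two greater than 1, and the function names are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
	"math/bits"
)

// FanoutFromBitwidth returns the number of buckets addressed by `bitwidth` hash bits.
func FanoutFromBitwidth(bitwidth uint) uint64 {
	return uint64(1) << bitwidth
}

// BitwidthFromFanout inverts FanoutFromBitwidth.
// Requiring a power of two greater than 1 is an assumption of this sketch.
func BitwidthFromFanout(fanout uint64) (uint, error) {
	if fanout < 2 || fanout&(fanout-1) != 0 {
		return 0, errors.New("fanout must be a power of two greater than 1")
	}
	return uint(bits.TrailingZeros64(fanout)), nil
}

func main() {
	fmt.Println(FanoutFromBitwidth(8)) // 256
	b, err := BitwidthFromFanout(1024)
	fmt.Println(b, err) // 10 <nil>
	_, err = BitwidthFromFanout(300)
	fmt.Println(err)
}
```

Expressing the profile parameter as a fanout, rather than a bitwidth, only works if every implementation rejects or never emits non-power-of-two values. The error branch models that check.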


Suggested change

1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links

1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links. We do not include details about the estimation algorithm as we do not encourage implementations to support it.

It's a bit odd to discourage this when both of the most popular implementations, in Go and JS, use a size-based heuristic (see #499 (comment)).

Unsure how to handle this. Perhaps clarify that the heuristic is implementation-specific, and that a specific heuristic should be used when deterministic behavior is expected?

I don't think we should be estimating the block size, as it's trivial to calculate it exactly. Can we not just define this (and punt to the spec for the details) to make it less hand-wavy?

Suggested change

1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on an estimate of the block size by counting the size of PBNode.Links

1. `HAMTDirectory` threshold (max `Directory` size before switching to `HAMTDirectory`): based on the final size of the serialized form of the [PBNode protobuf message](https://specs.ipfs.tech/unixfs/#dag-pb-node) that represents the directory.
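To illustrate the gap between the estimate and the exact size discussed above, here is a minimal Go sketch. It computes the byte length of a dag-pb `PBNode` directly from the protobuf wire format, without serializing it. In `PBNode`, field 1 is `Data` and field 2 is the repeated `Links`. In `PBLink`, field 1 is `Hash`, field 2 is `Name`, and field 3 is `Tsize`. The sketch also computes the cheaper "name + CID bytes" estimate. It assumes every link carries both `Name` and `Tsize`, as UnixFS directory entries normally do. The type and function names are hypothetical.

```go
package main

import "fmt"

// Link is a hypothetical in-memory directory entry.
type Link struct {
	Hash  []byte // binary CID of the child
	Name  string
	Tsize uint64
}

// uvarintLen returns the number of bytes needed to encode x as a protobuf varint.
func uvarintLen(x uint64) int {
	n := 1
	for x >= 0x80 {
		x >>= 7
		n++
	}
	return n
}

// bytesField is the encoded size of a length-delimited field with a 1-byte tag
// (true for field numbers below 16, which covers every dag-pb field).
func bytesField(length int) int {
	return 1 + uvarintLen(uint64(length)) + length
}

// linkSize is the encoded size of one PBLink message with Hash, Name and Tsize present.
func linkSize(l Link) int {
	return bytesField(len(l.Hash)) + bytesField(len(l.Name)) + 1 + uvarintLen(l.Tsize)
}

// SerializedSize is the exact encoded size of a PBNode with the given links and Data.
func SerializedSize(links []Link, data []byte) int {
	size := 0
	for _, l := range links {
		size += bytesField(linkSize(l)) // each link is a length-delimited field 2
	}
	if data != nil {
		size += bytesField(len(data)) // Data is a length-delimited field 1
	}
	return size
}

// EstimatedSize is the "name + CID bytes" estimate, which ignores protobuf overhead.
func EstimatedSize(links []Link) int {
	size := 0
	for _, l := range links {
		size += len(l.Name) + len(l.Hash)
	}
	return size
}

func main() {
	cidV0 := make([]byte, 34) // sha2-256 multihash: 2-byte prefix + 32-byte digest
	links := []Link{{Hash: cidV0, Name: "a.txt", Tsize: 100}}
	unixfsDir := []byte{0x08, 0x01} // UnixFS Data message with Type = Directory
	fmt.Println(SerializedSize(links, unixfsDir), EstimatedSize(links)) // 51 39
}
```

Even for a single entry, the estimate misses 12 bytes of tags, lengths and `Tsize`. Because that overhead grows with the number of links, two implementations that share a numeric threshold but use different measuring methods can switch to `HAMTDirectory` at different entry counts.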

rvagg · 2025-11-12T12:39:32Z

Hey, I'd love to be able to reference this, even in "draft" form. Could we merge it and continue iterating on top of it to get it right?

Fixed outdated references, consistent profile names, streamlined Summary and Motivation sections.

github-actions · 2025-11-15T01:24:01Z

🚀 Build Preview on IPFS ready

🔎 Commit: 123be3d
🔏 CID bafybeig3s4k2ppn6olfk56r3yrcyfuxlygevn2bra4vq77hazd5wcw6vmu
📦 Preview:

mishmosh · 2025-11-15T01:35:10Z

I made a few changes/fixes, aiming to land this early next week.

- Added links to the UnixFS spec (now that it exists)
- Specified calendar versioning for profile names (line 64), per @b5's suggestion
  - @lidel I gave the 3 kubo profiles names that match the naming scheme. This would mean minor updates to kubo, but is probably better for future-proofing. Acceptable? Also happy to discuss live.
- Changed the "current defaults" section into a series of legacy profile names that implementations MAY support. This allows those profile sets to be referenced/used across implementations.
- We were using fanout and bitwidth interchangeably. I changed them all to fanout, in keeping with the UnixFS terminology. If we prefer bitwidth, I can PR that to the UnixFS spec and then also here.
- Streamlined lots of duplicate language from the Summary and Motivation sections

Open questions:

- How to handle the Test fixtures section (line 120)? (Not blocking, IMO)
- Thread on empty directory filtering (blocking)
- Thread on threshold size (blocking)

src/ipips/ipip-0499.md

lidel · 2025-11-20T15:04:03Z

src/ipips/ipip-0499.md

+
+As an alternative to profiles, users can store and transfer CAR files of UnixFS content, which include the merkle DAG nodes needed to verify the CID.
+
+## Test fixtures


Just noting this is (imo) a blocker.

We did not merge the UnixFS spec until we had a sensible set of fixtures that people could use as a reference.

The spec may be incomplete, but a fixture will let people reverse-engineer any details and then PR improvements to the spec.

Without fixtures for each UnixFS node type, we risk unknown unknowns silently impacting the final CID (e.g. because we did not know that one implementation may decide to place leaves one level sooner as an "optimization" while another always places them at the bottom for "formal consistency").

Tracking this in ipfs/kubo#11071

Thanks!

I will implement kubo-* profiles as part of 0.40 and test fixtures will be part of that work.

Then we will be able to link to them from the spec, like we did in https://specs.ipfs.tech/unixfs/#appendix-test-vectors

Co-authored-by: Rod Vagg <rod@vagg.org>

mishmosh · 2025-11-20T15:40:42Z

Just synced with @lidel. He wants to ship this with test fixtures in place (tracked in kubo/issues/11071). In the meantime, we don't anticipate changes to the profiles themselves, so you can reference this PR.

Co-authored-by: Rod Vagg <rod@vagg.org>

icidasset · 2025-11-21T14:56:43Z

Great work, glad to see this!

Couple notes/questions:

- The profiles (legacy + new) don't say whether the chunks are of a fixed size, or which algorithm they use.
- Small typo under "Compatibility": "support the the set of" (double the)
- Would it also be interesting to note whether an implementation respects symlinks and, if so, how the different kinds of symlinks are translated?

- add chunking algorithm parameter to both tables (fixed-size)
- add hidden entities row to legacy profiles table
- ensures both unixfs-2025 and legacy tables cover same parameters

- rename kubo-legacy-2015 to kubo-legacy-2025
- clarify (v0.39 default) instead of (kubo default)
- fix leaves value: dag-pb (UnixFSRawLeaves=False in legacy-cid-v0)

clarify empty directories and hidden entities handling with precise terminology based on kubo v0.39, helia, and storacha implementations:
- `included`: always in DAG, no option to exclude (kubo/helia empty dirs)
- `excluded`: never in DAG, no option to include (storacha empty dirs)
- `opt-in`: excluded by default, flag to include (all hidden entities)
- `opt-out`: included by default, flag to exclude

add terminology note to explain these terms
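The four filtering terms above reduce to a small decision table. In the following minimal Go sketch, the `FilterPolicy` type and the `flagSet` parameter are hypothetical stand-ins for "the user passed the relevant flag or config option". They are not any implementation's API.

```go
package main

import "fmt"

// FilterPolicy mirrors the four terms defined above (hypothetical type).
type FilterPolicy int

const (
	Included FilterPolicy = iota // always in DAG, no option to exclude
	Excluded                     // never in DAG, no option to include
	OptIn                        // excluded by default, flag to include
	OptOut                       // included by default, flag to exclude
)

// Keep reports whether an entry (e.g. an empty directory or a hidden file)
// ends up in the DAG, given the policy and whether the user set the flag.
func Keep(p FilterPolicy, flagSet bool) bool {
	switch p {
	case Included:
		return true
	case Excluded:
		return false
	case OptIn:
		return flagSet
	case OptOut:
		return !flagSet
	default:
		panic(fmt.Sprintf("unknown policy %d", p))
	}
}

func main() {
	// Hidden entities are listed as opt-in above.
	fmt.Println(Keep(OptIn, false), Keep(OptIn, true)) // false true
}
```

For a profile, the default outcome (`flagSet == false`) is what matters. Two implementations agree on a CID by default only if their policies agree in that column.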


add "Based on" row with package/tool versions and kubo profile names

- unixfs-2025: mark threshold as TODO, prefer Helia's block size approach
- unixfs-2025: note kubo needs opt-out flag for empty directories
- legacy profiles: add estimation method to kubo profiles
- parameters section: add backticks, clarify threshold estimation methods

- add Symlinks parameter to UnixFS parameters list
- add Symlinks row to unixfs-2025 (TODO) and legacy profiles tables
- kubo: preserved, helia/storacha: followed, dasl: not specified
- add terminology for preserved/followed with UnixFS spec reference
- clarify kubo --dereference-args behavior
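The preserved/followed distinction above depends on whether the importer inspects the link itself (`os.Lstat`) or the file it points to (`os.Stat`). In the following minimal Go sketch, all names are hypothetical. It only classifies entries and does not build UnixFS nodes. With preservation, the entry would become a UnixFS `Symlink` node that holds the target path. When the link is followed, the target's content is imported as a regular file or directory.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// Entry is a hypothetical summary of what an importer would add for a path.
type Entry struct {
	Kind   string // "file", "dir" or "symlink"
	Target string // symlink target, only set when Kind == "symlink"
}

// Inspect classifies path. preserve=true keeps symlinks as symlinks;
// preserve=false follows (dereferences) them.
func Inspect(path string, preserve bool) (Entry, error) {
	stat := os.Stat
	if preserve {
		stat = os.Lstat // does not follow a final symlink
	}
	info, err := stat(path)
	if err != nil {
		return Entry{}, err
	}
	switch {
	case info.Mode()&os.ModeSymlink != 0:
		target, err := os.Readlink(path)
		if err != nil {
			return Entry{}, err
		}
		return Entry{Kind: "symlink", Target: target}, nil
	case info.IsDir():
		return Entry{Kind: "dir"}, nil
	default:
		return Entry{Kind: "file"}, nil
	}
}

// demo creates target.txt and link -> target.txt in a temp dir and inspects the link both ways.
func demo() (preserved, followed Entry, err error) {
	dir, err := os.MkdirTemp("", "symlink-demo")
	if err != nil {
		return
	}
	defer os.RemoveAll(dir)
	if err = os.WriteFile(filepath.Join(dir, "target.txt"), []byte("hi"), 0o644); err != nil {
		return
	}
	link := filepath.Join(dir, "link")
	if err = os.Symlink("target.txt", link); err != nil {
		return
	}
	if preserved, err = Inspect(link, true); err != nil {
		return
	}
	followed, err = Inspect(link, false)
	return
}

func main() {
	p, f, err := demo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("preserved: %+v\nfollowed:  %+v\n", p, f)
}
```

The same directory tree produces a `Symlink` node in one case and a copy of `target.txt` in the other. For this reason, the symlink handling mode has to be part of a profile.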

lidel

Quick update: I've pushed several commits addressing feedback and gaps in the document:

Resolved / Research done

- Documented symlink behavior as suggested by @icidasset
  - Only Kubo 0.39 preserves symlinks; everything else dereferences them on the fly by default, turning each symlink into the real file or directory it points at
- Added a Chunking algorithm row to both profile tables for completeness
- Fixed the kubo-legacy-2025 profile: corrected Leaves from raw to dag-pb (verified against the kubo v0.39 legacy-cid-v0 profile, where UnixFSRawLeaves=false)
- Documented filtering behavior with clear terminology:
  - included: always in DAG (no option to exclude)
  - excluded: always excluded (no option to include)
  - opt-in: excluded by default, flag to include (e.g., `--hidden`)
  - opt-out: included by default, flag to exclude
- Added a "Based on" row with implementation versions and kubo profile names (`legacy-cid-v0`, `test-cid-v1`, `test-cid-v1-wide`)
- Clarified HAMTDirectory threshold estimation methods in the parameters section: link count (naive), PBNode.Links size (name + CID), or full dag-pb block size (most accurate)
- Noted in the Summary section that the legacy table includes non-UnixFS implementations (DASL)
- Added an estimation method suffix (`est:links[name+cid]`) to kubo profiles in the legacy table

Remaining TODOs in `unixfs-2025`

| Parameter | Status |
| --- | --- |
| HAMTDirectory threshold | TODO - fix kubo: likely based on full block size estimation (Helia approach) |
| Empty directories | TODO - use kubo? needs opt-out flag + `Import.*` |
| Hidden entities | TODO - use kubo? needs opt-in flag + `Import.*` |
| Symlinks | TODO - use kubo? needs flag + `Import.*` (for controlling whether all symlinks in an imported directory tree are preserved or dereferenced) |
| Test fixtures | TODO - reuse kubo: will reuse once kubo has them for `*-2025` profiles |

Other:

- Look into data-preservation-programs/singularity#525

Implementation Plan (Kubo 0.40, ETA 2026 Q1)

To finalize this IPIP, Kubo needs to support additional Import.* configuration flags for:

- Empty directories: opt-out flag to exclude them from the DAG
- Hidden files: already has `--hidden`, just needs to be wired up from config
- HAMTDirectory threshold: configurable to support both legacy estimation (name + CID size) and Helia-style full block size calculation

Test fixtures will likely be included in the same Kubo PR that adds these missing features.

I also think we may replace the two kubo-2025 and kubo-2025-wide profiles with a single one that decides what remains narrow and what is wide, but I will update once the Kubo changes land. (Now that we have a convention of doing IPIPs with profiles, we can always course-correct in `-202

recommend full serialized PBNode size, link to dag-pb spec ref: ipfs#499 (comment)

- rename to UnixFS CID Profiles
- add lidel as editor
- add thanks section with PR reviewers

Create ipip-0000.md

8842176

mishmosh requested a review from a team as a code owner April 3, 2025 14:03

Update and rename ipip-0000.md to ipip-0499.md

4ba68f0

mishmosh changed the title ~~Create ipip-0000.md: CID profiles~~ IPIP 0499: CID Profiles Apr 3, 2025

2color reviewed Apr 3, 2025


2color mentioned this pull request Apr 3, 2025

Inconsistent CID Calculation with Example: Addressing a file by CID with UnixFS ipfs/helia#765

Closed

bumblefudge reviewed Apr 4, 2025


bumblefudge reviewed Apr 4, 2025


bumblefudge reviewed Apr 4, 2025


lidel reviewed Apr 11, 2025


lidel mentioned this pull request Apr 11, 2025

feat(config): ipfs add and Import options for controling UnixFS DAG Width ipfs/kubo#10774

Merged

lidel added a commit to ipfs/kubo that referenced this pull request Apr 15, 2025

refactor: test-cid-v1-wide with UnixFSHAMTDirectoryMaxFanout=1024

b08bc4d


lidel and others added 2 commits April 15, 2025 23:41

add extra attributes proposed in review

6cc64cb

Co-authored-by: Bumblefudge <bumblefudge@learningproof.xyz>

incorporate kubo#10774

d8b8389


This was referenced Apr 18, 2025

Initial UnixFS specification #331

Merged

Protocol stewardship and improvements — IPFS/2025 ipshipyard/roadmaps#16

Open

Merge branch 'main' into patch-1

600d1fc

BrewTestBot mentioned this pull request May 21, 2025

ipfs 0.35.0 Homebrew/homebrew-core#224309

Merged

2color reviewed May 23, 2025


2color reviewed May 29, 2025


jaller94 reviewed Jun 30, 2025


This comment was marked as off-topic.


SethDocherty mentioned this pull request Jul 1, 2025

Difference in CID Generation between IPFS and Singularity data-preservation-programs/singularity#525

Open

2color and others added 6 commits August 12, 2025 09:21

Update src/ipips/ipip-0499.md

595588c

Co-authored-by: Christian Paul <info@jaller.de>

add daniel as editor

41f9b86

edit summary and motivation

229988f

edit summary

f37e610

edit parameters and design

7a12f0a

edit user benefit and compatibility

ff69e56

address feedback from rvagg

9c621ba

darobin reviewed Oct 31, 2025


Update ipip-0499.md

c109c1a


rvagg reviewed Nov 15, 2025


rvagg reviewed Nov 15, 2025


rvagg reviewed Nov 15, 2025


lidel reviewed Nov 20, 2025


Update src/ipips/ipip-0499.md

383f9e3

Co-authored-by: Rod Vagg <rod@vagg.org>

mishmosh mentioned this pull request Nov 20, 2025

Implement modern CID profile from IPIP-499 ipfs/kubo#11071

Open

4 tasks

Update src/ipips/ipip-0499.md

e564968

Co-authored-by: Rod Vagg <rod@vagg.org>

Update src/ipips/ipip-0499.md

bbd547f

Co-authored-by: Rod Vagg <rod@vagg.org>

fix typo (the the)

70514b9

lidel self-assigned this Dec 10, 2025

lidel added 8 commits December 12, 2025 23:53

Merge branch 'main' into patch-1

89c9c62

feat(ipip-0499): add chunking algorithm and align profile tables

92352d7


fix(ipip-0499): correct kubo legacy profile

9d0d415


fix(ipip-0499): note that legacy table includes non-UnixFS implementations

94a1b79

feat(ipip-0499): add implementation versions to legacy profiles table

7a8d6ab


lidel requested changes Dec 13, 2025


lidel added 2 commits December 13, 2025 02:33

fix(ipip-0499): clarify HAMTDirectory threshold calculation methods

3a092a4


fix(ipip-0499): update metadata and add contributors

123be3d


lidel changed the title ~~IPIP 0499: CID Profiles~~ IPIP-499: UnixFS CID Profiles Dec 13, 2025


mishmosh commented Apr 3, 2025


lidel commented Apr 15, 2025


This comment was marked as off-topic.

darobin Oct 31, 2025

lidel Nov 13, 2025

achingbrain Nov 13, 2025

rvagg commented Nov 12, 2025

github-actions bot commented Nov 15, 2025

mishmosh commented Nov 15, 2025


lidel Nov 20, 2025

mishmosh Nov 20, 2025

lidel Nov 20, 2025

mishmosh commented Nov 20, 2025

icidasset commented Nov 21, 2025

lidel left a comment


11 participants
