Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Documentation #52

Merged
merged 2 commits into from
Aug 7, 2020
Merged

Documentation #52

merged 2 commits into from
Aug 7, 2020

Conversation

rvagg
Copy link
Member

@rvagg rvagg commented Jul 21, 2020

The dual aim here is to properly document the public API and document the internal workings so future maintainers and alternative language implementers can follow how it all works. I'll also do a parallel addition to https://github.com/ipld/specs/blob/master/data-structures/hashmap.md to document the exact ways that this implementation differs from that spec. It's very close but the serialization format is different. That document can already serve as a more general overview of the algorithm here so we'll make sure they work together.

I need to work through the other files and I'm finding some areas of concern that need a deeper look and some additional areas that probably need more test coverage. Already I'm pretty confident that the collapse algorithm in cleanChild() isn't doing what it should, leading to non-canonical forms of HAMTs during deletion operations. I've noted that in here.

/cc @warpfork

@rvagg
Copy link
Member Author

rvagg commented Jul 22, 2020

A dedicated section in the IPLD HashMap spec that draws out the specific variations here. That doc has the correct algorithm so is useful as a general reference (minus some doubts I have about the canonical form child node collapse algorithm I've already documented in here but they should be treated as a bug if I'm correct). It now includes a section at the bottom that talks about implicit parameters and the block layout.

@rvagg rvagg force-pushed the rvagg/docs branch 2 times, most recently from a1053ce to e134aaa Compare July 22, 2020 04:58
@rvagg rvagg marked this pull request as ready for review July 22, 2020 04:58
@rvagg rvagg requested a review from warpfork July 22, 2020 04:59
@rvagg
Copy link
Member Author

rvagg commented Jul 22, 2020

Added a doc.go which copies some intro text from the IPLD HashMap spec and made the README slightly more useful. There's some TODOs in there, one particularly concerning one for cleanChild(). I've also opened #54, and #55 as additional TODOs that I think are worth checking and speccing if there's variation from what we have documented already. #53 has a proposal for changing block layout. And filecoin-project/specs-actors#517 has a proposal for changing the hash algorithm that might be worth doing in this repo so other users can easily choose as well.

This change couples with ipld/specs#282 to add specifics of how this impl and the Filecoin specific use of it varies from the IPLD HashMap, but that's just in block layout and implicit parameters. I'll also go into filecoin-project/specs and add a little bit more meat to the HAMT section instead of just pointing to the IPLD HashMap spec as it does now.

This PR is good to go now I think. @warpfork & @anorth reviews appreciated (anyone else from the Filecoin team you want to pull in for reviews?).

@rvagg rvagg changed the title WIP docs Documentation Jul 22, 2020
@anorth
Copy link
Member

anorth commented Jul 23, 2020

Could you please add @Stebalien for review too?

Copy link
Member

@anorth anorth left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wonderful, thank you so much.

// will set them. In the case of loadChild() we're not expecting a mutation so
// a save is likely going to mean we incur unnecessary serialization when
// we've simply inspected the tree. Copy() will only set a cached form if
// it already exists on the source. It's unclear exactly what Flush() is good
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI @whyrusleeping this, and a similar but (as I understand it) worse issue in the AMT may contribute a significant amount of unnecessary write load when we change only a single entry in a HAMT but re-encode and write back unchanged nodes.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mmm, this is a really good catch. We should look into addressing this. Do we know if filecoin usecases actually trigger this problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, I'm far from certain but it seems to me that in chain/validation/state.go StateWrapper holds on to a state tree HAMT and a flush is done whenever the vm is flushed, which seems to be triggered in a few places. So I think that this will mean that as the tree is read, it's building up these cache entries just from loading the nodes, then these nodes are all serialized and an attempted save happens but presumably most (or all) won't actually save because their CIDs would already be in storage because new nodes are saved when they are created and mutated anyway. It'd be interesting to see what happens if you noop this Flush() and run all the tests.

// If we perform this depth-first, then it's possible to see the collapse
// cascade upward such that we end up with some parent node with a bucket with
// only bucketSize elements. The canonical form of the HAMT requires that
// any node that could be collapsed into a parent bucket is collapsed and.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@rvagg rvagg requested a review from Stebalien July 23, 2020 04:50
hamt.go Outdated Show resolved Hide resolved
hamt.go Outdated Show resolved Hide resolved
hamt.go Outdated Show resolved Hide resolved
@rvagg rvagg mentioned this pull request Aug 5, 2020
@rvagg rvagg mentioned this pull request Aug 7, 2020
Copy link
Contributor

@warpfork warpfork left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't really consider myself enough of an owner to be a deciding approver for this, but fwiw, would really like to merge this. As I also poke around this code, I keep being tempted to write documentation for some of these methods and values, and then keep going "D'oh! Rod already did it in PR 52!". 😅

@anorth
Copy link
Member

anorth commented Aug 7, 2020

@Stebalien or @whyrusleeping please merge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants