Load-time block format validation #59

Merged (2 commits) Aug 17, 2020

Conversation

rvagg (Member) commented Aug 8, 2020

Includes #57, the first two commits. The changes so far are only in LoadNode and in the tests.

Adds some simple helper functions to the tests to help manually build CBOR blobs that will load and trigger various cases. So far I've only got as far as ensuring that the bitmap and the number of elements aren't out of alignment. That can get stricter with #54 if we put in precisely the size of bitmap we need for the array of elements.

Other things I'm working on in here are listed at the bottom of hamt_test.go. The aim is that this thing should refuse to load from blocks that smell funny (i.e. aren't exactly the right form). There should only be one way of representing this data and there should be no avenues for variation. That's an ideal, and not strictly achievable in the extreme for various reasons, but we can get close.

rvagg changed the title from "WIP load-time block format validation" to "Load-time block format validation" on Aug 10, 2020
rvagg marked this pull request as ready for review on August 10, 2020
rvagg (Member Author) commented Aug 10, 2020

Ready for review now. This includes 2 commits from #57, the cleanChild() changes. On top of those:

- Adds a bunch of code to LoadNode() and pulls the loading functionality out into a new loadNode() so we can toggle root/non-root for different validation.
- A minor change gets into the new cleanChild() because one bit of validation is no longer needed there.
- A new method in uhamt.go for counting the total number of set bits in the bitfield.
- A big-ol' test case named TestMalformedHamt which pushes this around with different incoming CBOR blocks. Some utilities in hamt_util_test.go, plus some context-specific ones inside the test, let us manually build CBOR that forces validation errors in various ways.
- Sanity-checking along the way to make sure we're not just hitting "this is all just bad CBOR" type errors.

The HAMT has (almost) zero chill now about malformed blocks: if a block smells bad, it's rejected, no pretending. One check that's omitted, because it would be complicated and costly (for now), is verifying that entries sit at the positions in the graph their keys dictate. If that's important it could be added later and called on demand. For now, validation is block-local only.

Stebalien self-requested a review August 10, 2020
if !((isLink && !isBucket) || (!isLink && isBucket)) {
return nil, ErrMalformedHamt
}
if isLink && ch.Link.Type() != cid.DagCBOR { // not dag-cbor
rvagg (Member Author):
Possibly the most controversial item in here that I want to highlight. We're validating that all links to child HAMT nodes (not links it might contain as values, only HAMT nodes) are of codec DAG-CBOR. Just another check to get some assurance we're not being fed bogus data in the structure, it doesn't go very far but we're asserting that this HAMT has homogeneity in its block generation, for now. This could be changed in future if we allowed some variability in codec or wanted to have a migration path.
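The codec check under discussion can be sketched without the cid library. This is illustrative only: `checkChildCodec` is a hypothetical stand-in for comparing `ch.Link.Type()` against `cid.DagCBOR`; the multicodec codes (0x71 for dag-cbor, 0x70 for dag-pb) are from the public multicodec table:

```go
package main

import (
	"errors"
	"fmt"
)

// dagCBOR is the multicodec code for dag-cbor.
const dagCBOR = 0x71

var errMalformedHamt = errors.New("malformed HAMT node")

// checkChildCodec mirrors the validation being discussed: links to child
// HAMT nodes (not value links) must carry the dag-cbor codec; anything
// else is rejected at load time.
func checkChildCodec(codec uint64) error {
	if codec != dagCBOR {
		return errMalformedHamt
	}
	return nil
}

func main() {
	fmt.Println(checkChildCodec(0x71)) // dag-cbor: accepted, prints <nil>
	fmt.Println(checkChildCodec(0x70)) // dag-pb: rejected
}
```

Hardcoding the codec is what enforces the "homogeneous block generation" property; relaxing it would mean threading an expected-codec parameter through instead.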

Contributor:
We could... make this another parameter? I'd agree, anyway, that having the interior tree all be one codec is probably pretty reasonable.

The repo name doesn't have "cbor" in it; that's my only reason for pause here in making it hardcoded to that value.

(Otoh, there's... several other details and import statements that seem to imply this system is already cbor-only, and the UnmarshalCBOR methods, and... hrm. Maybe the repo name should just also indicate cbor-specificness, if it's already quietly the case.)

Member:
This seems like a reasonable check for now, but I agree we should, ideally, make it more generic.

I mean, ideally we'd save the codec/hash-function, etc. when loading a node, then restore it when encoding. Then, we'd just check "do all linked children match their parents?". But at the moment, you're right. Everything is very DagCBOR specific.

rvagg (Member Author) commented Aug 11, 2020

Fixed CI bug with Go 1.11 and squashed this into just the two commits on the HEAD of this branch. Review should focus on those two, 04503fd particularly (the other just adds a coverage target to Makefile).

return nil, ErrMalformedHamt
}
for i := 1; i < len(ch.KVs); i++ {
if bytes.Compare(ch.KVs[i-1].Key, ch.KVs[i].Key) >= 0 {
Member:
Followup: we should probably be checking the hash prefix.
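The bucket-ordering invariant the diff above enforces can be sketched in isolation (an illustrative helper, not the repo's code; `checkBucketOrder` and the plain `[][]byte` key slice are assumptions standing in for the node's KV entries):

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
)

var errMalformedHamt = errors.New("malformed HAMT node")

// checkBucketOrder enforces the invariant from the diff: keys in a bucket
// must be strictly ascending in byte order, which also rules out
// duplicates (Compare returning 0 fails the >= 0 test too).
func checkBucketOrder(keys [][]byte) error {
	for i := 1; i < len(keys); i++ {
		if bytes.Compare(keys[i-1], keys[i]) >= 0 {
			return errMalformedHamt
		}
	}
	return nil
}

func main() {
	fmt.Println(checkBucketOrder([][]byte{[]byte("a"), []byte("b")})) // <nil>
	fmt.Println(checkBucketOrder([][]byte{[]byte("b"), []byte("a")})) // rejected
}
```

The followup suggestion would tighten this further: beyond byte order, each key's hash prefix should also match the bucket's position in the trie.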

Stebalien merged commit 1b0d1f9 into filecoin-project:master Aug 17, 2020