Peer storage for nodes to distribute small encrypted blobs. #1110

Open — wants to merge 1 commit into base: master
72 changes: 72 additions & 0 deletions 01-messaging.md
@@ -26,6 +26,8 @@ All data fields are unsigned big-endian unless otherwise specified.
* [The `error` and `warning` Messages](#the-error-and-warning-messages)
* [Control Messages](#control-messages)
* [The `ping` and `pong` Messages](#the-ping-and-pong-messages)
* [Peer Storage](#peer-storage)
* [The `peer_storage` and `peer_storage_retrieval` Messages](#the-peer_storage-and-peer_storage_retrieval-messages)
* [Appendix A: BigSize Test Vectors](#appendix-a-bigsize-test-vectors)
* [Appendix B: Type-Length-Value Test Vectors](#appendix-b-type-length-value-test-vectors)
* [Appendix C: Message Extension](#appendix-c-message-extension)
@@ -494,6 +496,76 @@ every message maximally).
Finally, the usage of periodic `ping` messages serves to promote frequent key
rotations as specified within [BOLT #8](08-transport.md).

## Peer Storage

### The `peer_storage` and `peer_storage_retrieval` Messages

Nodes that advertise the `option_provide_storage` feature offer to store
arbitrary data on behalf of their peers. The stored data must not exceed
65531 bytes, so that it fits in a single lightning message.
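For concreteness, the 65531-byte limit follows from the transport layer: BOLT #8 caps a complete Lightning message (type plus payload) at 65535 bytes, and the `peer_storage` type and `length` fields consume 4 of them. A small sketch of the bound (helper name is illustrative):

```python
# BOLT #8 caps a full Lightning message at 65535 bytes.
# A peer_storage message spends 2 bytes on the message type and 2 bytes
# on the u16 length field, leaving 65531 bytes for the blob itself.
MAX_LIGHTNING_MSG = 65535
MAX_BLOB_SIZE = MAX_LIGHTNING_MSG - 2 - 2  # type field + length field

def blob_fits(blob: bytes) -> bool:
    """True if the blob can be carried in a single peer_storage message."""
    return len(blob) <= MAX_BLOB_SIZE
```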

Nodes can verify at each reconnection that their `option_provide_storage`
peers correctly stored their data, by comparing the contents of the
retrieved data with the last data they sent. However, nodes should not expect
> **Comment on lines +508 to +509 (Collaborator):** If nodes can't rely on their peer having the latest data, presumably they shouldn't be checking against the latest :)
>
> Suggested change: "…comparing the contents of the retrieved data with the last one they sent" → "…comparing the contents of the retrieved data with one they sent".
>
> **Reply:** I think you should expect your peer to send you the latest data and check against the latest. However, you shouldn't rely on it, because your peer may be trying to cheat you. If the data your peer sent back is not the latest, you should mark them as unreliable.

their peers to always have their latest data available.
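One way a node might implement this reconnection check (an illustrative helper, not part of the spec: a node compares against the latest blob it sent, but tolerates the peer returning an older one):

```python
def classify_retrieved_blob(retrieved: bytes, sent_history: list) -> str:
    """Compare the blob a peer returned after reconnection against the
    blobs we previously sent it (ordered, newest last)."""
    if not sent_history:
        return "unknown"        # we have no record of sending anything
    if retrieved == sent_history[-1]:
        return "latest"         # peer stored our most recent data
    if retrieved in sent_history:
        return "stale"          # peer is behind; consider it unreliable
    return "unrecognized"       # corrupted, or our own records were lost
```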

Nodes ask their peers to store data using the `peer_storage` message and expect
peers to return the latest stored data using the `peer_storage_retrieval` message:
> **Comment on lines +512 to +513:** This seems to imply that `peer_storage_retrieval` is a reply to `peer_storage`, but the requirements below only say that `peer_storage_retrieval` must be sent after a reconnection, not as a reply to a `peer_storage` message.


1. type: 7 (`peer_storage`)
> **Comment (Contributor):** Should we not use even types for these messages, since we only send them if the peer advertises `option_provide_storage`?
>
> **Reply (Contributor):** Does IOKTBO ("it's OK to be odd") apply to message types? I thought it was just TLVs.
>
> **Reply (Contributor):** I don't think this applies to the low message type numbers though, afaict. If it did, then things like `accept_channel` would be treated the same, right? I would support using even messages, no issue with that at all, but I think I'm not quite understanding the intricacies of IOKTBO.
>
> **Reply (Contributor):** Yeah, I think for some of the very first messages this was sort of overlooked, but it wasn't a problem, since nothing would work if a node didn't understand something fundamental like `accept_channel`.
>
> **Reply (Collaborator):** IOKTBO definitely applies to all messages; it's just that in the very initial protocol there's no distinction, because everyone supports everything :). It's definitely nice for these to be odd, because then nodes can send `peer_storage` without bothering to check whether their peer supports the feature at all.

2. data:
* [`u16`:`length`]
* [`length*byte`:`blob`]

> **Comment:** Could we also extend this to have key-values instead of one single storage blob?
>
> Suggested change: add a `[`u16`:`key`]` field between `length` and `blob`.
>
> **Reply (Author):** Yes, good idea, but I think that would be more useful if we want to distribute multiple backups (or a backup with more than 65 kB of data). In that case we'd also have to specify which key would be considered an ack, and this could lead to spamming (i.e. a peer could send me many blobs that fill up my storage). That is why the current scheme just replaces the old storage blob with a new one.
>
> **Reply (Contributor):** Under the hood it might be a key-value structure, but I think that is better left to the layer above.
>
> **Reply (Contributor):** +1. A single opaque buffer is simplest to implement, and all other data structures can be implemented atop it if we want. KV storage also introduces the question of what we are supposed to send back on reconnect.
>
> **Reply:** Yeah, not a great idea to go KV.


1. type: 9 (`peer_storage_retrieval`)
2. data:
* [`u16`:`length`]
* [`length*byte`:`blob`]
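Both messages share the same wire layout: a big-endian `u16` type, a `u16` length, and the opaque blob. A minimal serialization sketch (function names are illustrative):

```python
import struct

PEER_STORAGE = 7
PEER_STORAGE_RETRIEVAL = 9
MAX_BLOB_SIZE = 65531

def encode(msg_type: int, blob: bytes) -> bytes:
    """Serialize a peer_storage or peer_storage_retrieval message:
    u16 type, u16 length, then the opaque blob (all big-endian)."""
    if len(blob) > MAX_BLOB_SIZE:
        raise ValueError("blob exceeds 65531 bytes")
    return struct.pack(">HH", msg_type, len(blob)) + blob

def decode(data: bytes) -> tuple:
    """Parse the wire format back into (type, blob)."""
    msg_type, length = struct.unpack(">HH", data[:4])
    blob = data[4:4 + length]
    if len(blob) != length:
        raise ValueError("truncated message")
    return msg_type, blob
```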


Requirements:

The sender of `peer_storage`:
- MAY send `peer_storage` whenever necessary.
- MUST limit its `blob` to 65531 bytes.

> **Comment:** Could also consider padding so that it is always fixed to 65531 bytes.
>
> **Reply (Contributor):** Why would you want to pad?
>
> **Reply (Contributor):** I think the reasoning would be privacy-related: i.e. so that the receiver can't deduce anything about the peer sending the data based on size changes. That being said, I think all of that detail doesn't need to be specced out at this layer; how the blob is structured is the responsibility of the layer above. Especially if the layer above is a "storage for payment" layer charging per byte (or byte bucket), we really don't want to force them to use the full 65 kB here.
>
> **Reply (Contributor):** We already handle this with the randomization of ping/pong responses, and the actual messages themselves are encrypted as well. Padding the message out to 65531 bytes actually negatively impacts privacy as best I can tell, since now we can do partial inference and filter all non-max-length messages out of the anonymity set.
>
> **Reply (Contributor):** Oops, I misread this. Indeed, the receiver may come to learn about the contents of the payload through size analysis. My commentary on ping/pong length randomization is about third-party privacy, not two-party privacy.
>
> **Reply:** Yeah, my initial thinking was that certain data sizes can be associated with certain types of backups or data in general. I see what you're saying about the layer above; in this case I assume the "layer above" lies within the node implementation itself? It's not like external software would have direct access to this field.
>
> **Reply:** Regardless, is it the BOLT's responsibility to document the need for padding?

- MUST encrypt the data in a manner that ensures its integrity upon receipt.
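The spec does not mandate a particular encryption scheme; in practice an AEAD such as ChaCha20-Poly1305 satisfies the integrity requirement. The stdlib-only sketch below is a toy encrypt-then-MAC construction purely to illustrate the property that the storing peer holds opaque data while only the owner can decrypt and detect tampering (it is NOT a production scheme):

```python
import hashlib
import hmac
import os

def _keystream(key: bytes, nonce: bytes, n: int) -> bytes:
    """Illustrative keystream: hash of key || nonce || counter blocks."""
    out = b""
    counter = 0
    while len(out) < n:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:n]

def seal_blob(key: bytes, plaintext: bytes) -> bytes:
    """Toy encrypt-then-MAC: nonce || ciphertext || HMAC tag.
    Real nodes should use a proper AEAD instead."""
    nonce = os.urandom(16)
    ciphertext = bytes(a ^ b for a, b in
                       zip(plaintext, _keystream(key, nonce, len(plaintext))))
    tag = hmac.new(key, nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag

def open_blob(key: bytes, sealed: bytes) -> bytes:
    """Verify integrity first, then decrypt; raises on any tampering."""
    nonce, body, tag = sealed[:16], sealed[16:-32], sealed[-32:]
    expected = hmac.new(key, nonce + body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("integrity check failed")
    return bytes(a ^ b for a, b in zip(body, _keystream(key, nonce, len(body))))
```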


The receiver of `peer_storage`:
- If it offered `option_provide_storage`:
  - if it has an open channel with the sender:
    - MUST store the message.
  - MAY store the message anyway.

- If it does store the message:
  - MAY delay storage to rate-limit the peer to no more than one update per second.
> **Comment (Collaborator):** This is a lot of data storage. Even with no overhead, 64 KiB/second will destroy almost any consumer SSD in a year. ISTM one update per second is much too much if you're not charging for it.
>
> **Reply (Contributor):** I think the idea is to replace it? Is it the IOPS themselves that will destroy the SSD, or are you saying it'll just overrun the storage capacity?
>
> **Reply (Author):** Yes, the peer is supposed to replace the received data every time. What do you suggest would be a good rate limit?
>
> **Reply (@Chinwendu20, May 14, 2024):** I think storage here means persistent storage? How often an implementation chooses to store the backup data should be up to them; there is really no need for consensus on it. Edit: that might not work, because there is no guarantee of a graceful shutdown. The way this is set up, the peer is expected to replace the backup data with the latest one, so only the latest data needs to be kept, and the data is only retrieved on reconnection, not throughout the connection. So there is really no need to persist this data until the peer connection is over, right? So no need for one update per second, but then again, this should be up to the implementation?
>
> **Reply:** Agreed, and I believe directly charging is too involved for a low-level feature like this. We could indirectly relate this rate to a cost, something like rate limiting to one update per HTLC that has been routed over the channel.
>
> **Reply (Collaborator):** Yes, it is the IOPS: over one year, 64 KiB/second is about 2 TB, which is more than most consumer SSDs' write lifetimes. Rate limiting to one update per HTLC is even worse :). ISTM a rate limit of one per minute makes a lot more sense.
>
> **Reply:** The problem with rate limiting like this, combined with no ACK, is that you can't guarantee that an honest peer will store the latest data, which I think is a very desirable property.
>
> **Reply:** Assuming they pay at least one sat per HTLC, that should easily cover the cost of replacing your SSD.
>
> **Reply (Collaborator):** This protocol (very deliberately, AFAIU) does not provide any guarantees whatsoever: any peer can trivially just tell you "hey, I stored it" and not store it. Even if there were an ACK, we'd just happily always tell peers we stored it but rate-limit actually going to disk to once a minute (or, really, whenever we're writing other data; we'll probably never go to disk just for a peer storage event). You cannot rely on the storage being super durable no matter what, and once a second is a lot of data.
>
> **Reply:** The rated endurance for SSDs is 600 rewrites (very underestimated, as this is the warranty figure). Assuming you store these on a 128 GB SSD (76 TBW), you would exhaust that in about 40 years.

- MUST replace the old `blob` with the latest received one.
- MUST send `peer_storage_retrieval` again after reconnection, after exchanging `init` messages.
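Reflecting the receiver requirements and the rate-limit discussion above, one possible implementation pattern (class and callback names, and the once-per-minute interval, are illustrative rather than normative) keeps only the latest blob per peer in memory and persists it lazily:

```python
import time

class PeerStorageStore:
    """Keep the latest blob per peer in memory; persist at most once per
    interval to limit disk wear (the discussion above suggests once per
    minute is friendlier to SSDs than once per second)."""

    def __init__(self, persist, min_interval=60.0, clock=time.monotonic):
        self.persist = persist          # callback: (peer_id, blob) -> None
        self.min_interval = min_interval
        self.clock = clock
        self.latest = {}                # peer_id -> newest blob
        self.last_write = {}            # peer_id -> last persist timestamp

    def on_peer_storage(self, peer_id, blob):
        self.latest[peer_id] = blob     # always replace the old blob
        now = self.clock()
        if now - self.last_write.get(peer_id, float("-inf")) >= self.min_interval:
            self.persist(peer_id, blob)
            self.last_write[peer_id] = now

    def flush(self):
        """Persist everything, e.g. during a periodic batch write."""
        for peer_id, blob in self.latest.items():
            self.persist(peer_id, blob)
            self.last_write[peer_id] = self.clock()
```

Note that without an ACK in the protocol, a crash between updates can lose the newest blob, which is consistent with the reviewers' point that senders cannot rely on storage being durable.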


The sender of `peer_storage_retrieval`:
- MUST include the last `blob` it stored for that peer.
- when all channels with that peer are closed:
  - SHOULD wait at least 2016 blocks (about two weeks) before deleting the `blob`.

> **Comment:** What's the thinking here?
>
> **Reply (Author):** To make it a bit more reliable, so that a user has a chance to get their peer storage back from recent old peers as well.


The receiver of `peer_storage_retrieval`:
- when it receives `peer_storage_retrieval` with outdated or irrelevant data:
  - MAY send a warning.

Rationale:
The `peer_storage` and `peer_storage_retrieval` messages enable nodes to securely store
and share data with other nodes in the network, serving as a backup mechanism for
important information. By using them, nodes can safeguard crucial data, improving the
network's resilience and reliability. Additionally, some nodes might offer this service
in exchange for sats and store `peer_storage` even for peers they have no open channel with.

`peer_storage_retrieval` should not be deferred until after `channel_reestablish`, because
the user would then have no opportunity to recover the node and update its state in case
they lost data.

> **Comment:** Can you elaborate a bit more here? I don't really get why sending `peer_storage_retrieval` after `channel_reestablish` hurts.
>
> **Reply (Author):** If data is lost, a node would need to see its storage before it can process the reestablish message.
>
> **Reply:** In that case, we want "`peer_storage_retrieval` MUST be sent before `channel_reestablish`"; there is nothing problematic about sending it again after `channel_reestablish`, it's just unnecessary.


Nodes should send a `peer_storage` message whenever they wish to update the `blob` stored with their peers.
This `blob` can be used to distribute encrypted data, which could be helpful in restoring the node.

## Appendix A: BigSize Test Vectors

The following test vectors can be used to assert the correctness of a BigSize
1 change: 1 addition & 0 deletions 09-features.md
@@ -46,6 +46,7 @@ The Context column decodes as follows:
| 26/27 | `option_shutdown_anysegwit` | Future segwit versions allowed in `shutdown` | IN | | [BOLT #2][bolt02-shutdown] |
| 28/29 | `option_dual_fund` | Use v2 of channel open, enables dual funding | IN | | [BOLT #2](02-peer-protocol.md) |
| 38/39 | `option_onion_messages` | Can forward onion messages | IN | | [BOLT #7](04-onion-routing.md#onion-messages) |
| 42/43 | `option_provide_storage` | Can store other nodes' encrypted backup data | IN | | [BOLT #1][bolt01-Peer-storage] |
| 44/45 | `option_channel_type` | Node supports the `channel_type` field in open/accept | IN | | [BOLT #2](02-peer-protocol.md#the-open_channel-message) |
| 46/47 | `option_scid_alias` | Supply channel aliases for routing | IN | | [BOLT #2][bolt02-channel-ready] |
| 48/49 | `option_payment_metadata` | Payment metadata in tlv record | 9 | | [BOLT #11](11-payment-encoding.md#tagged-fields) |