delta-only repositories #729

cgwalters · 2017-03-10T19:53:19Z

In the Fedora/CentOS case where by default we rely on e.g. university-owned mirrors that might be some random ext4 server and not a proper object store, we can hit performance issues with the archive format.

It should be quite possible to make it easier for server operators to manage a "delta-only" repository. See also: #701

So it'd be deltas only, plus a single "from empty" delta for the latest commit.
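To make that layout concrete, here is a small Python sketch of the paths such a repo might hold for one ref. It assumes ostree's static-delta naming as I understand it: checksums are written in a "modified base64" (standard base64 with the `=` padding stripped and `/` replaced by `_`), a "from empty" delta lives under `deltas/XX/REST` derived from the target commit, and a from-to delta under `deltas/XX/REST-TARGET` derived from the source commit. The helper names are hypothetical; check ostree's source before relying on the exact layout.

```python
import base64
from typing import List, Optional


def checksum_to_b64(checksum_hex: str) -> str:
    """Hex SHA-256 -> ostree-style "modified base64" (assumed: no '=' padding, '/' -> '_')."""
    raw = bytes.fromhex(checksum_hex)
    if len(raw) != 32:
        raise ValueError("expected a 64-character hex SHA-256 checksum")
    return base64.b64encode(raw).decode("ascii").rstrip("=").replace("/", "_")


def delta_dir(to_commit: str, from_commit: Optional[str] = None) -> str:
    """Relative directory of a static delta; from_commit=None means "from empty"."""
    to_b64 = checksum_to_b64(to_commit)
    if from_commit is None:
        # Scratch delta: split the target's encoding after two characters.
        return f"deltas/{to_b64[:2]}/{to_b64[2:]}"
    # From-to delta: split the source's encoding, then append "-<target>".
    from_b64 = checksum_to_b64(from_commit)
    return f"deltas/{from_b64[:2]}/{from_b64[2:]}-{to_b64}"


def delta_only_layout(latest: str, previous: Optional[str] = None) -> List[str]:
    """Paths for one ref: the summary, the latest's scratch delta, and previous->latest."""
    paths = ["summary", delta_dir(latest)]
    if previous is not None:
        paths.append(delta_dir(latest, previous))
    return paths
```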

I think it'd be possible to cobble this together today via `ostree static-delta generate --min-fallback-size 100000` for each delta you want, then `ostree summary -u`, then sync the summary and `deltas/` content to the "delta repo".
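A sketch of that recipe in Python, assuming a local archive repo at a path of your choosing. `delta_commands` only builds the argument lists (run each with `subprocess.run(cmd, check=True)`), and `sync_delta_repo` copies `deltas/` plus `summary` (and `summary.sig`, if the repo has one) into the delta repo. The function names are hypothetical; beyond the flags named above, the sketch uses the standard `--repo`, `--from`, `--to` and `--empty` options, and the reading of the large `--min-fallback-size` value as "avoid fallback objects" is an assumption.

```python
import shutil
from pathlib import Path
from typing import List, Optional, Tuple

# Value from the comment above; assumed large enough that no object becomes a fallback.
MIN_FALLBACK_SIZE = "100000"


def delta_commands(repo: str, pairs: List[Tuple[Optional[str], str]]) -> List[List[str]]:
    """argv lists: one `static-delta generate` per (from, to) pair, then a summary update.
    from=None requests a "from empty" (scratch) delta."""
    cmds = []
    for from_rev, to_rev in pairs:
        cmd = ["ostree", "static-delta", "generate", f"--repo={repo}",
               f"--min-fallback-size={MIN_FALLBACK_SIZE}"]
        cmd.append("--empty" if from_rev is None else f"--from={from_rev}")
        cmd.append(f"--to={to_rev}")
        cmds.append(cmd)
    cmds.append(["ostree", "summary", f"--repo={repo}", "-u"])
    return cmds


def sync_delta_repo(src_repo: Path, dest_repo: Path) -> None:
    """Copy only deltas/ and the summary files; objects/ is deliberately left behind."""
    dest_repo.mkdir(parents=True, exist_ok=True)
    # Deltas first, so a client never sees a summary advertising a missing delta.
    shutil.copytree(src_repo / "deltas", dest_repo / "deltas", dirs_exist_ok=True)
    for name in ("summary", "summary.sig"):
        if (src_repo / name).exists():
            shutil.copy2(src_repo / name, dest_repo / name)
```

Copying `deltas/` before the summary is the one ordering choice that matters here: the summary lists the available deltas, so it should only be published once everything it advertises is already in place.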


alexlarsson · 2017-03-24T12:54:17Z

I think this sounds good, as long as it properly falls back to the "from empty" delta if the local version we're pulling from is not the next-to-latest one.
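A minimal sketch of that client-side choice, assuming the client knows which `(from, to)` deltas the repo offers (e.g. from the summary). This is hypothetical logic, not ostree's pull code; `None` stands for "from empty".

```python
from typing import Optional, Set, Tuple

# (from_commit, to_commit); from_commit=None is a "from empty" (scratch) delta.
Delta = Tuple[Optional[str], str]


def choose_delta(local: Optional[str], target: str,
                 available: Set[Delta]) -> Optional[Delta]:
    """Prefer local->target; otherwise fall back to the scratch delta for target.
    Returns None when the client already has target."""
    if local == target:
        return None
    if local is not None and (local, target) in available:
        return (local, target)
    if (None, target) in available:
        return (None, target)
    # A delta-only repo has no objects/ to fall back to, so this is a hard error.
    raise LookupError(f"no usable delta to {target}")
```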

cgwalters · 2017-04-07T14:15:45Z

(But we need some unit test coverage, and there are various enhancements one could make on top of this, like being able to fall back to a separate archive repo for e.g. downgrades.)

cgwalters · 2017-04-07T14:20:41Z

Also, one thing occurs to me - we'd at least need to maintain the commit objects in the repo, otherwise prune would prune the deltas.

dustymabe · 2017-04-07T15:37:21Z

> (But we need some unit test coverage, and there's various enhancements one could make on top of this like being able to fall back to a separate archive repo for e.g. downgrades)

Does this issue cover the creation of unit tests for static-delta-only repos, or do we need another ticket for that?

> Also, one thing occurs to me - we'd at least need to maintain the commit objects in the repo, otherwise prune would prune the deltas.

Are we talking about the static-delta-only repo? Wouldn't that defeat the point of not having a bunch of small files in the repo? If we have a master repo where the small files and the static deltas live, and then just create static-delta-only repos by copying content out of that repo, then we don't need to worry about this, correct?

ramcq · 2018-01-09T10:19:53Z

> be some random ext4 server and not a proper object store

@cgwalters I'm kind of confused by this: what about a filesystem makes it unsuitable for storing/hosting an ostree repo? Is there a more effective backend in which you can store an ostree repo and serve it over HTTP? Or do mirror operators simply dislike having lots of files around?

alexlarsson · 2018-01-09T10:48:07Z

So, I recently chatted with someone who was running an "app store" about how they implement authorized downloads. Basically, they serve the app files from a CDN like CloudFront and use a feature like CloudFront secure URLs, as documented here: https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/PrivateContent.html. Their server, which knows whether the logged-in user is allowed to download a particular app, generates the final URL. The secure URL has a lifetime of 30 seconds and is signed on the server, so the client doesn't have to care and can just download the thing.

In the context of ostree, we could do the same thing if we had a delta-only repo on a CDN:

- Use `ostree pull --http-header=NAME=VALUE` to set the secure cookie with a custom policy
- In the policy, give access to "repo/deltas//${END OF COMMIT BASE64}" and only for 30 seconds
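A rough Python sketch of building such a cookie for `--http-header`, assuming CloudFront's signed-cookie scheme: a JSON custom policy, its URL-safe base64 form in `CloudFront-Policy`, plus `CloudFront-Signature` and `CloudFront-Key-Pair-Id`. Signing the policy with the CloudFront private key happens server-side and is out of scope here, so `signature` is taken as already-computed bytes. The resource URL is a placeholder, and the sketch assumes ostree splits `NAME=VALUE` at the first `=` so the cookie value itself may contain `=`.

```python
import base64
import json
import time
from typing import Optional


def cf_b64(data: bytes) -> str:
    """Base64 with CloudFront's URL-safe substitutions: '+'->'-', '='->'_', '/'->'~'."""
    return (base64.b64encode(data).decode("ascii")
            .replace("+", "-").replace("=", "_").replace("/", "~"))


def custom_policy(resource: str, lifetime_s: int = 30, now: Optional[int] = None) -> str:
    """JSON custom policy granting `resource` (may contain '*') until now + lifetime_s."""
    expires = (int(time.time()) if now is None else now) + lifetime_s
    policy = {"Statement": [{
        "Resource": resource,
        "Condition": {"DateLessThan": {"AWS:EpochTime": expires}},
    }]}
    return json.dumps(policy, separators=(",", ":"))


def http_header_arg(policy_json: str, signature: bytes, key_pair_id: str) -> str:
    """Build an `ostree pull --http-header=Cookie=...` argument.
    `signature` must be the server-side signature over exactly policy_json's bytes."""
    cookie = "; ".join([
        f"CloudFront-Policy={cf_b64(policy_json.encode('utf-8'))}",
        f"CloudFront-Signature={cf_b64(signature)}",
        f"CloudFront-Key-Pair-Id={key_pair_id}",
    ])
    return f"--http-header=Cookie={cookie}"
```

The policy string handed to `http_header_arg` has to be byte-for-byte the one that was signed, which is why the sketch serializes it once with fixed separators and passes the string around rather than re-encoding the dict.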

CloudFront allows you to use cookies for this, but it seems some other CDNs only support HTTP query parameters, so maybe ostree should have a feature similar to `--http-header` that adds an HTTP parameter to all URLs.
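For CDNs that only accept query parameters, the proposed feature could behave like this helper, which appends one parameter to every URL the client fetches while preserving any existing query string. The name `add_query_param` and the `token` parameter in the test are illustrative only; no such ostree option exists in this discussion.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_query_param(url: str, name: str, value: str) -> str:
    """Return `url` with name=value appended, keeping any existing query parameters."""
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((name, value))
    return urlunsplit(parts._replace(query=urlencode(query)))
```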

dustymabe · 2018-12-18T18:27:08Z

@jlebon, @cgwalters, @sinnykumari and I were discussing 'delta-only' repos today. One thing @jlebon brought up was:

  jlebon | @walters @ksinny @dustymabe, just remembered re. static deltas -- those can actually list
         | fallback objects the client should just fetch directly from `objects/`. so we'll have to be
         | careful of that, either also mirroring just those ones (i think they're usually big files), or
         | teach ostree to fetch fallback objects from a separate repo? (edited)
 walters | yeah, i think we need a repo config flag saying it's a delta-only repo
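To see what a delta-only mirror would still have to carry from `objects/`, here is a sketch assuming the standard loose-object layout of an archive repo (`objects/XX/REST.SUFFIX`, with `.filez` for content objects). It covers the commit objects mentioned earlier (so prune doesn't drop the deltas) plus any fallback objects the deltas list. How you obtain the fallback checksums is left open; the inputs are hypothetical.

```python
from typing import Iterable, List

# Loose-object suffixes in an archive-mode repo (content objects are compressed: .filez).
SUFFIX = {"commit": "commit", "dirtree": "dirtree", "dirmeta": "dirmeta", "file": "filez"}


def loose_object_path(checksum: str, objtype: str) -> str:
    """objects/XX/REST.SUFFIX, where XX is the first two hex characters of the checksum."""
    return f"objects/{checksum[:2]}/{checksum[2:]}.{SUFFIX[objtype]}"


def extra_mirror_paths(commits: Iterable[str], fallback_files: Iterable[str]) -> List[str]:
    """objects/ paths a delta-only mirror would still need, deduplicated and sorted."""
    paths = {loose_object_path(c, "commit") for c in commits}
    paths |= {loose_object_path(f, "file") for f in fallback_files}
    return sorted(paths)
```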

ramcq · 2018-12-18T21:15:04Z

I've been discussing this stuff with @alexlarsson a lot in the context of Flathub. At one point, the Flathub stats showed each download (whether an upgrade or a new pull) averaging 1GB of data transferred - but this was during a period when, if ostree didn't see a matching delta, it would pull the scratch delta instead of doing an object pull (madness, later resolved).

A delta-only repo is basically reinstating this: mirrors are great and everything, but they are a far less relevant way of distributing files than modern caching/proxying CDNs. BunnyCDN (for Endless) and Fastly (for Flathub) work a-OK for ostree repos: you can easily tune the caching to keep the immutable objects around ~forever and use short timeouts or explicit purges for everything else. It's pretty easy to cache ostree repos in CDNs, and the hit rate is superb (>97% in both cases I have access to, likely the two largest production ostree repos at present).
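That caching split could be expressed as a per-path `Cache-Control` policy at the origin. A sketch, assuming `objects/` (content-addressed) and `deltas/` (assumed never regenerated in place) can be cached essentially forever, while everything else (`summary`, `summary.sig`, `config`, `refs/`) gets a short TTL. The 60-second value is an arbitrary placeholder, not a recommendation from this thread.

```python
# Path prefixes treated as immutable: objects/ is content-addressed; deltas/ is
# assumed never to be regenerated in place (an assumption, not an ostree guarantee).
IMMUTABLE_PREFIXES = ("objects/", "deltas/")
LONG_TTL_S = 365 * 24 * 3600
SHORT_TTL_S = 60  # placeholder TTL for summary, summary.sig, config, refs/...


def cache_control_for(repo_path: str) -> str:
    """Cache-Control value for a path relative to the repo root."""
    path = repo_path.lstrip("/")
    if path.startswith(IMMUTABLE_PREFIXES):
        return f"public, max-age={LONG_TTL_S}, immutable"
    return f"public, max-age={SHORT_TTL_S}"
```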

So: what problem is really being solved here? When you look at your CDN bill, or at the time and data it costs the client when the server has only a very limited version of things, I'm really not convinced that, unless we make deltas heaps smarter, a delta-only repo is a benefit for clients. It makes mirroring easier, yes, because you have maybe one or a couple of delta folders per ref, but most people don't have a mirror network, so I think it represents a net loss for the bandwidth efficiency of the client, unless we:

- Figure out some better practices/heuristics for generating/retaining deltas, such as keeping a chain of them until the cumulative size approaches some percentage threshold of the scratch delta
- Have clients do some path-solving to actually pull a few deltas in a row, rather than falling back to an object pull
- Take the list of deltas out of the summary file; otherwise it will bloat massively once you start offering any non-useless deltas. Commit metadata might be the place for it, provided you could square the circle of the delta itself containing the commit ID as a variable. (This has been worked around in e.g. flatpak build-commit-from: because a delta is really about the content trees, not the commits, it can be monkey-patched to a different commit.)
- More, smarter things...
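The first two points can be sketched together: a retention rule that keeps consecutive deltas (walking back from the newest commit) while their cumulative size stays under a fraction of the scratch delta, and a client-side planner that runs Dijkstra over the available deltas to find the cheapest chain, treating "from empty" deltas as usable from any starting point. Both functions, the 0.5 threshold, and the byte-size inputs are hypothetical; nothing here reflects ostree's current behavior.

```python
import heapq
import itertools
from typing import Dict, List, Optional, Tuple

# (from_commit, to_commit); from_commit=None is a "from empty" (scratch) delta.
Delta = Tuple[Optional[str], str]


def deltas_to_retain(history: List[str], sizes: Dict[Delta, int],
                     scratch_size: int, threshold: float = 0.5) -> List[Delta]:
    """Walk history (newest commit first) and keep the parent->child deltas while
    their cumulative size stays within threshold * scratch_size."""
    kept: List[Delta] = []
    total = 0
    for newer, older in zip(history, history[1:]):
        size = sizes.get((older, newer))
        if size is None or total + size > threshold * scratch_size:
            break  # gap in the chain, or the chain is no longer worth keeping
        total += size
        kept.append((older, newer))
    return kept


def plan_delta_chain(local: Optional[str], target: str,
                     sizes: Dict[Delta, int]) -> Optional[List[Delta]]:
    """Cheapest sequence of deltas (by total bytes) from local to target, using
    Dijkstra; scratch deltas are usable from any state. None if unreachable."""
    outgoing: Dict[Optional[str], List[Tuple[Delta, str, int]]] = {}
    for (frm, to), size in sizes.items():
        outgoing.setdefault(frm, []).append(((frm, to), to, size))
    scratch = outgoing.get(None, [])

    dist: Dict[Optional[str], int] = {local: 0}
    via: Dict[str, Tuple[Delta, Optional[str]]] = {}  # node -> (delta used, previous node)
    tie = itertools.count()  # tiebreaker so heapq never compares None with str
    heap = [(0, next(tie), local)]
    while heap:
        cost, _, node = heapq.heappop(heap)
        if cost > dist[node]:
            continue  # stale heap entry
        if node == target:
            break
        edges = outgoing.get(node, []) + ([] if node is None else scratch)
        for delta, nxt, size in edges:
            if cost + size < dist.get(nxt, float("inf")):
                dist[nxt] = cost + size
                via[nxt] = (delta, node)
                heapq.heappush(heap, (cost + size, next(tie), nxt))

    if target not in dist:
        return None
    chain: List[Delta] = []
    node = target
    while node != local:
        delta, node = via[node]
        chain.append(delta)
    return chain[::-1]
```

The two halves are meant to work together: retention decides which per-commit deltas stay on the server, and the planner lets a client a few versions behind chain them instead of paying for the scratch delta.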

cgwalters · 2018-12-18T21:25:33Z

> but this was during a period that when ostree didn't see a matching delta it would pull the scratch delta instead of doing an object pull (madness, later resolved).

Right: #1709

ramcq · 2018-12-18T21:29:40Z

> > but this was during a period that when ostree didn't see a matching delta it would pull the scratch delta instead of doing an object pull (madness, later resolved).
>
> Right: #1709

Oh yeah! What I said back then. tl;dr: deltas are an amazing technical advantage of ostree and (modulo bringing any repo server to its knees when generating them on large files) incredibly smart and bandwidth-efficient, but they totally fail to deliver on that promise because of how they are currently deployed and managed. Let's make the repo management tools (ostree/flatpak/repo-manager) smarter before we force that ineffectual deployment cost onto our downstream mirrors and every end user by flipping a delta-only bit without solving the real problem. :)

cgwalters · 2018-12-18T22:23:56Z

We (FCOS) are discussing this in the context of this issue which links to this MirrorManager one. A concern some people have is tying ourselves solely to a CDN.

ramcq · 2018-12-18T22:31:47Z

This is the answer you get if you ask mirror operators, of course. :) Provide an OCI image that just runs a caching front-end, and you can deploy your own grassroots CDN with a GeoIP or round-robin frontend. Setting low TTLs or issuing PURGE is pretty easy after a summary update. I think if you "solve" this problem (making life easier for mirror operators), it will make things worse for users and undo e.g. the work on delta RPMs.
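Issuing the purge after a summary update might look like the following, assuming a CDN that accepts HTTP `PURGE` requests on individual URLs (Fastly does; some CDNs also require an API token header). The base URL and the list of mutable files are placeholders. Each request can then be sent with `urllib.request.urlopen(req)`.

```python
from typing import Iterable, List
from urllib.request import Request

# Files that change on every summary update (placeholder list; adjust per repo).
MUTABLE_FILES = ("summary", "summary.sig")


def purge_requests(base_url: str, extra_paths: Iterable[str] = ()) -> List[Request]:
    """Build (but don't send) one HTTP PURGE request per mutable repo file."""
    base = base_url.rstrip("/") + "/"
    paths = list(MUTABLE_FILES) + [p.lstrip("/") for p in extra_paths]
    return [Request(base + path, method="PURGE") for path in paths]
```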

dustymabe · 2018-12-19T20:44:17Z

> concern some people have is tying ourselves solely to a CDN.

For me, I'm not as concerned about tying ourselves to a CDN. We've been using a CDN for our ostree repo for a little while now, and people still complain about slow download speeds and timeouts all the time. So either we have things configured badly or things are getting cycled out of the cache too fast. See also #1541, where we were discussing one optimization (i.e. the many redirects might be what is slowing down the downloads).

If we can get a good CDN "answer", then I'd be fine with that too.

ramcq · 2018-12-19T21:18:22Z

Oh! Yeah, redirects absolutely rinse the performance of whatever pipelining ostree is doing - at least I've definitely seen that at some point early in Flathub's life - that's why we set up dl.flathub.org as a separate hostname for repo access only. You have to point the origin in ostree to the hostname and path served by the CDN; you could probably finesse that with a mirrorlist of one in ostree.

I am almost certain that any Flathub issues are all due to load on the origin server rather than any problem with the CDN. Debian for instance has two CDNs (CloudFront and Fastly) and pays for neither - for Flathub we got Fastly basically by me tweeting, and it wasn't the only offer we received, just one of the best CDNs so I didn't spend much time with the others.

ramcq · 2018-12-19T21:20:37Z

https://gist.github.com/ramcq/a3991b5834767c6da73eec1af08b52ab is how the origin is configured on Flathub, fwiw.

cgwalters added the enhancement label Mar 10, 2017

sinnykumari mentioned this issue Dec 20, 2018: [RFE] OSTree mirroring support in mirrormanager fedora-infra/mirrormanager2#258 (Open)

ramcq mentioned this issue Jan 9, 2019: Follow redirect (HTTP 302) just once for objects/ #1541 (Closed)
