
[RFE] OSTree mirroring support in mirrormanager #258

Open
sinnykumari opened this issue Dec 17, 2018 · 16 comments

Comments

@sinnykumari

The upcoming Fedora CoreOS is based on OSTree, and the plan is to deliver OS content via a plain OSTree repo. To get faster, verified content onto client systems, it would be nice to use the Fedora mirroring system.

Some information:

Can someone (@adrianreber ?) help in identifying what needs to be done on the mirrormanager side in order to support OSTree mirroring?

@adrianreber
Member

Looking at the current repository I think this needs a different approach than what MirrorManager currently does. This looks like a massive number of files, which makes mirroring and crawling really hard.

We really should not put any of these files in one of the existing mirror categories. Unfortunately this means that we will not have lots of mirrors. But if we put OSTree in one of the existing repositories ('Fedora Linux' in MirrorManager-speak, for example), this would break our existing setup.

From my point of view we need a new and separate location (rsync module) for the content.

Then we cannot crawl it like we do right now. We need to have a clever way of scanning for certain files which are most relevant for this new kind of repository.

We also need to know, for the metalink, which file defines a repository. Right now that is repomd.xml. We know the md5sum of that file and can verify whether the mirror has the correct file, and that md5sum is also what we distribute via the metalink to all the clients.
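To make the repomd.xml comparison concrete, here is a minimal Python sketch of checking a mirror by probing only its key files instead of crawling the whole tree. The names (`KEY_FILES`, `mirror_is_current`) are hypothetical, and the in-memory dicts stand in for real HTTP or rsync fetches; this is not how MirrorManager's crawler is actually implemented.

```python
import hashlib
from typing import Iterable, Mapping

# The file(s) whose checksum defines "this mirror is current". For yum repos
# that is repodata/repomd.xml; which OSTree file should play this role is the
# open question here, so the default only covers the yum case.
KEY_FILES = ("repodata/repomd.xml",)


def md5_hex(data: bytes) -> str:
    """Hex MD5 digest of a file's contents, as published in a metalink."""
    return hashlib.md5(data).hexdigest()


def mirror_is_current(
    master: Mapping[str, bytes],
    mirror: Mapping[str, bytes],
    key_files: Iterable[str] = KEY_FILES,
) -> bool:
    """Probe only the key files instead of crawling every file.

    `master` and `mirror` map repo-relative paths to file contents; they
    stand in for whatever fetching a real crawler would do.
    """
    for path in key_files:
        copy = mirror.get(path)
        # Missing file or checksum mismatch: the mirror is stale.
        if copy is None or md5_hex(copy) != md5_hex(master[path]):
            return False
    return True
```

The point of the sketch is the cost model: the check touches a handful of files per mirror, no matter how many files live under the repo.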

As it will not be possible to scan the complete mirrors (and probably not the master either), the client downloading the files has to be clever enough to switch to the next mirror if a file is missing or has the wrong content.
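A minimal sketch of that client-side failover, assuming the client already knows the expected MD5 of the file (for example from a metalink). `fetch_verified` and the injected `fetch` callable are hypothetical names, not an existing libostree or MirrorManager API.

```python
import hashlib
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical fetcher: returns the file's bytes, or None if the mirror lacks it.
Fetch = Callable[[str, str], Optional[bytes]]


def fetch_verified(
    path: str, expected_md5: str, mirrors: Sequence[str], fetch: Fetch
) -> Tuple[str, bytes]:
    """Return (mirror, data) from the first mirror holding a correct copy."""
    for mirror in mirrors:
        data = fetch(mirror, path)
        if data is None:
            continue  # missing on this mirror: try the next one
        if hashlib.md5(data).hexdigest() != expected_md5:
            continue  # stale or corrupt copy: try the next one
        return mirror, data
    raise LookupError(f"no mirror has a valid copy of {path}")
```

Passing `fetch` in keeps the retry logic independent of the transport; a real client would wrap an HTTP GET that returns None on a 404.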

> Can someone (@adrianreber ?) help in identifying what needs to be done on the mirrormanager side in order to support OSTree mirroring?

Right now I think this needs a lot of work on the mirrormanager side, as it requires different approaches from what we currently do. Also, the whole mirroring sounds to me like something which should not be done with rsync, but with a tool that knows exactly which files it needs to download.

@sinnykumari
Author

> Looking at the current repository I think this needs a different approach than what MirrorManager currently does. This looks like a massive number of files, which makes mirroring and crawling really hard.

Yeah, there are a lot of small files in objects/ which will be needed if we mirror the entire ostree repo.
If we mirror only the static deltas we create, we only need to mirror the deltas/ directory, which contains comparatively fewer files, each a few MB in size.

> We really should not put any of these files in one of the existing mirror categories. Unfortunately this means that we will not have lots of mirrors. But if we put OSTree in one of the existing repositories ('Fedora Linux' in MirrorManager-speak, for example), this would break our existing setup.

Umm, if we restrict the OSTree repo to only a few mirrors, some users may get slow download speeds when there is no mirror in their area, which defeats the purpose we are trying to achieve through mirroring.

> From my point of view we need a new and separate location (rsync module) for the content.

Any example?

> Then we cannot crawl it like we do right now. We need to have a clever way of scanning for certain files which are most relevant for this new kind of repository.

> We also need to know, for the metalink, which file defines a repository. Right now that is repomd.xml. We know the md5sum of that file and can verify whether the mirror has the correct file, and that md5sum is also what we distribute via the metalink to all the clients.

For the static delta case, maybe we can use the summary file to get the md5sum and size of each delta. The script at https://github.com/ostreedev/ostree-releng-scripts/blob/master/print-summary can be used to get human-readable content from the summary file.
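If parsing the GVariant summary file turns out to be awkward, one alternative (a swapped-in technique, not something proposed in this thread) is to walk deltas/ on the master and record each file's size and md5sum directly. A minimal sketch, where `delta_manifest` is a hypothetical helper:

```python
import hashlib
from pathlib import Path
from typing import Dict, Tuple


def delta_manifest(repo_root: str) -> Dict[str, Tuple[int, str]]:
    """Map each file under deltas/ to (size in bytes, md5 hex digest).

    Walks the files directly instead of reading the OSTree summary file.
    Delta files are a few MB each, so reading them whole is acceptable here.
    """
    root = Path(repo_root)
    manifest = {}
    for f in sorted((root / "deltas").rglob("*")):
        if f.is_file():
            data = f.read_bytes()
            key = f.relative_to(root).as_posix()  # e.g. "deltas/ab/.../superblock"
            manifest[key] = (len(data), hashlib.md5(data).hexdigest())
    return manifest
```

Only deltas/ is scanned, which matches the idea of mirroring static deltas without the many small files in objects/.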

Also, I see that some related work was done in mirrormanager in the past, in PR https://github.com/fedora-infra/mirrormanager2/pull/7/files .

> As it will not be possible to scan the complete mirrors (and probably not the master either), the client downloading the files has to be clever enough to switch to the next mirror if a file is missing or has the wrong content.

For the static delta case, the user side fetches a single file while pulling the update, so switching to another mirror should be easier? I'm not sure how to deal with the case of mirroring the entire ostree repo.

> Can someone (@adrianreber ?) help in identifying what needs to be done on the mirrormanager side in order to support OSTree mirroring?

> Right now I think this needs a lot of work on the mirrormanager side, as it requires different approaches from what we currently do. Also, the whole mirroring sounds to me like something which should not be done with rsync, but with a tool that knows exactly which files it needs to download.

Maybe we can use ostree pull to update content from the master mirror to the rest of the mirrors? ostree knows better what changes have been made on top.
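A sketch of what that sync step might look like, assuming each downstream mirror keeps its own archive-mode OSTree repo with the master configured as a remote (the repo path, remote name and ref are placeholders). `ostree pull --mirror` and `--depth` are real ostree options; `mirror_pull_argv` is a hypothetical helper that only assembles the command line.

```python
from typing import List, Optional, Sequence


def mirror_pull_argv(
    repo_path: str, remote: str, refs: Sequence[str], depth: Optional[int] = None
) -> List[str]:
    """Build an `ostree pull --mirror` command line for one downstream mirror."""
    argv = ["ostree", f"--repo={repo_path}", "pull", "--mirror"]
    if depth is not None:
        # --depth limits how many parent commits are fetched (-1 means all).
        argv.append(f"--depth={depth}")
    argv.append(remote)
    argv.extend(refs)  # with --mirror and no refs given, ostree pulls all refs
    return argv
```

The result can be passed to `subprocess.run(argv, check=True)`. Because ostree pull only fetches objects the local repo does not already have, repeated syncs transfer just the new content.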

@cgwalters @dustymabe @jlebon Did I miss anything?

@cgwalters

Do we need mirrormanager versus just using CloudFront?

@sinnykumari
Author

> Do we need mirrormanager versus just using CloudFront?

There was a concern with using CloudFront: the possibility of getting stale refs due to the longer cache time (https://pagure.io/fedora-infrastructure/issue/5970#comment-501897).

nirik closed this as completed Dec 17, 2018
nirik reopened this Dec 17, 2018
@nirik
Member

nirik commented Dec 17, 2018

Sorry about that close, GitHub decided that my clicking in a window to get focus was hitting the close button. :( Anyhow...

For CloudFront we could invalidate the cache after we push new content; however, I thought some folks were still seeing very slow speeds with CloudFront. Did we ever find out why that was?

@sinnykumari
Author

> For CloudFront we could invalidate the cache after we push new content; however, I thought some folks were still seeing very slow speeds with CloudFront. Did we ever find out why that was?

I am not aware of slow download speeds with the current CloudFront setup for the Atomic repo; maybe @dustymabe would know?
I see one ostree issue about handling HTTP redirects better when content from objects/ is fetched.

@cgwalters

> There was a concern with using CloudFront: the possibility of getting stale refs due to the longer cache time (https://pagure.io/fedora-infrastructure/issue/5970#comment-501897).

CF honors the origin server's caching headers; the idea here is that our origin server would set a lower expiry for refs/.
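A minimal sketch of that origin-side policy, assuming the origin can set Cache-Control per path. The max-age numbers are placeholders rather than Fedora's actual settings, and `cache_control_for` is a hypothetical helper.

```python
def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value for a file in an OSTree repo behind a CDN."""
    p = path.lstrip("/")
    # Mutable metadata: refs move to new commits and summary/config get
    # rewritten on every compose, so keep the CDN copy short-lived.
    if p.startswith("refs/") or p in ("summary", "summary.sig", "config"):
        return "max-age=60"
    # objects/ paths are content checksums and deltas/ are keyed by the
    # commits they connect, so these change rarely and can be cached long.
    if p.startswith(("objects/", "deltas/")):
        return "max-age=604800"
    return "max-age=300"  # anything else: a moderate default
```

The design choice is that only a few small files change in place, so only they need short expiry; the bulk of the bytes can stay cached.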

@cgwalters

> I see one ostree issue about handling HTTP redirects better when content from objects/ is fetched.

That issue talks about the fact that we don't necessarily need to change libostree if we tweak the server side.

Discussion on this now spawns 3 tickets so I'll reply to this comment here:

> Also, I do not think it's ideal to set contenturl= in our configurations, since the CloudFront fronting was meant as a temporary workaround to see if it helped, and is considered an implementation detail.

Agree, but we can maintain that independence by defining e.g. a DNS CNAME that redirects to the CF domain - I'm sure other people do something like this?
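A sketch of what that could look like on the client side, assuming a hypothetical hostname such as ostree-content.example.org that Fedora controls and CNAMEs to the CDN domain (the DNS record itself lives outside this code). contenturl is ostree's remote option for fetching content (objects, deltas) from a different base URL than metadata; `remote_stanza` is a hypothetical helper that only renders the remote definition.

```python
def remote_stanza(
    name: str, metadata_url: str, content_url: str, gpg_verify: bool = True
) -> str:
    """Render an OSTree remote definition in repo-config (keyfile) syntax.

    `url` serves metadata (refs, summary); `contenturl` serves content.
    Pointing contenturl at a hostname we control keeps the CDN an
    implementation detail.
    """
    return (
        f'[remote "{name}"]\n'
        f"url={metadata_url}\n"
        f"contenturl={content_url}\n"
        f"gpg-verify={'true' if gpg_verify else 'false'}\n"
    )
```

Swapping CDN providers would then only require repointing the CNAME, with no change to client configuration.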

@sinnykumari
Author

Considering the complexity of mirroring small files, for the short term (F30) we are exploring mirroring static-delta-only repos (ostreedev/ostree#729). In the long run, we will look into mirroring objects/ as well.

@adrianreber Since we are not dealing with small files now, do you think it will be achievable with a small or medium-sized change on the mirrormanager side?

@dustymabe

> Do we need mirrormanager versus just using CloudFront?

ostreedev/ostree#729 (comment) - TL;DR: no, if we can get it to work well enough that people don't have a bad experience.

@jlebon

jlebon commented Dec 21, 2018

So before going further on the delta-only front, should we investigate a bit, based on the discussions in ostreedev/ostree#729, whether there are any tweaks we can make to the current CDN setup?

@sinnykumari
Author

> So before going further on the delta-only front, should we investigate a bit, based on the discussions in ostreedev/ostree#729, whether there are any tweaks we can make to the current CDN setup?

In my opinion it does make sense to look for improvements on the CDN front, but as I understand it, that will also need some help from Fedora infra (who did the current ostree CDN setup)?

Meanwhile, I am also interested in understanding/exploring delta only repo in order to have a better opinion.

@dustymabe

It sounds like we need to investigate both options. The CDN is attractive because it's a rather passive setup. Fedora infra has been responsive in helping us make changes to settings there, but we need to be careful that we don't break existing setups. For the CDN we should probably explore the proposal in Luca's comment: ostreedev/ostree#1541 (comment)

> Meanwhile, I am also interested in understanding/exploring delta only repo in order to have a better opinion.

I'm interested to see how easy this option would be and the performance gains as well.

@adrianreber
Member

If there is something I need to comment on, please let me know. I do not understand most of the things discussed here about ostree. I have no idea what delta only means and how everything is connected to a CDN. I am happy to comment on changes needed for MirrorManager, but right now I am lost.

@dustymabe

> If there is something I need to comment on, please let me know

Yes! Thanks @adrianreber - we kind of carried the conversations from other places into this ticket. Sorry for the confusion. You are correct I believe. All action items are on us/infra for now.

@cgwalters

It feels like this issue is at least to some degree about something much more fundamental, which is "To what degree does Fedora infra rely on public cloud?". For quite a long time (AFAIK) the answer has basically been "it doesn't" - obviously we upload AMIs to EC2 etc., but if EC2 is down, then us not being able to upload AMIs doesn't matter.

And I cheer and support the model of not being wholly dependent on public cloud - particularly just one. But this is a well-trod path - in fact let's state the obvious - dealing with this is a whole part of Red Hat's business model and proposition.

So yeah, as we talk about "CDN" I do think it's a fundamental requirement that we at least have DNS-level control and can switch over to a separate CDN provider easily, or use MirrorManager, or a mix of the two - even if we default to CDN, maybe support a flag in the HTTP request that forces MM on, so that people who have set up e.g. mirrors in their university still hit that.
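A minimal sketch of the opt-in flag idea, assuming a hypothetical `backend=mm` query parameter checked by whatever front end receives client requests; neither the flag name nor `choose_backend` exists today.

```python
from urllib.parse import parse_qs, urlsplit


def choose_backend(request_url: str) -> str:
    """Return "mirrormanager" when the client opts in, otherwise "cdn".

    The CDN stays the default; only requests carrying the made-up
    `backend=mm` flag are routed to MirrorManager.
    """
    query = parse_qs(urlsplit(request_url).query)
    if query.get("backend", [""])[0] == "mm":
        return "mirrormanager"
    return "cdn"
```

A query parameter is used here only because it is easy to set from a client config URL; an HTTP header would work the same way.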
