All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: Jonathan Nieder <jrnieder@gmail.com>
Cc: "Jonathan Tan" <jonathantanmy@google.com>,
	git <git@vger.kernel.org>, "Junio C Hamano" <gitster@pobox.com>,
	"Jeff King" <peff@peff.net>,
	"Ævar Arnfjörð Bjarmason" <avarab@gmail.com>
Subject: Re: [WIP 0/7] CDN offloading of fetch response
Date: Tue, 26 Feb 2019 09:30:28 +0100	[thread overview]
Message-ID: <CAP8UFD1tKNFO9wU8CbgNnSnQyvHYPsZMk1Bit2y1jxH4vk62qQ@mail.gmail.com> (raw)
In-Reply-To: <20190225234528.GD16965@google.com>

Hi,

On Tue, Feb 26, 2019 at 12:45 AM Jonathan Nieder <jrnieder@gmail.com> wrote:
>
> Christian Couder wrote:
> > On Sun, Feb 24, 2019 at 12:39 AM Jonathan Tan <jonathantanmy@google.com> wrote:
>
> > Especially I'd like to know what should the client do if they find out
> > that for example a repo that contains a lot of large files is
> > configured so that the large files should be fetched from a CDN that
> > the client cannot use? Is the client forced to find or setup another
> > repo configured differently if the client still wants to use CDN
> > offloading?
>
> The example from that message:
>
>   For example I think the Great Firewall of China lets people in China
>   use GitHub.com but not Google.com. So if people start configuring
>   their repos on GitHub so that they send packs that contain Google.com
>   CDN URLs (or actually anything that the Firewall blocks), it might
>   create many problems for users in China if they don't have a way to
>   opt out of receiving packs with those kind of URLs.
>
> But the same thing can happen with redirects, with embedded assets in
> web pages, and so on.

I don't think it's the same situation, because the CDN offloading is
likely to be used for large objects that some hosting sites like
GitHub, GitLab and BitBucket might not be ok to have them store for
free on their machines. (I think the current limitations are around
10GB or 20GB, everything included, for each user.)

So it's likely that users will want a way to host on such sites
incomplete repos using CDN offloading to a CDN on another site. And
then if the CDN is not accessible for some reason, things will
completely break when users will clone.

You could say that it's the same issue as when a video is not
available on a web page, but the web browser can still render the page
when a video is not available. So I don't think it's the same kind of
issue.

And by the way that's a reason why I think it's important to think
about this in relation to promisor/partial clone remotes. Because with
them it's less of a big deal if the CDN is unavailable, temporarily or
not, for some reason.

> I think in this situation the user would likely
> (and rightly) blame the host (github.com) for requiring access to a
> separate inaccessible site, and the problem could be resolved with
> them.

The host will say that it's repo admins' responsibility to use a CDN
that works for the repo users (or to pay for more space on the host).
Then repo admins will say that they use this CDN because it's simpler
for them or the only thing they can afford or deal with. (For example
I don't think it would be easy for westerners to use a Chinese CDN.)
Then users will likely blame Git for not supporting a way to use a
different CDN than the one configured in each repo.

> The beauty of this is that it's transparent to the client: the fact
> that packfile transfer was offloaded to a CDN is an implementation
> detail, and the server takes full responsibility for it.

Who is "the server" in real life? Are you sure they would be ok to
take full responsibility?

And yes, I agree that transparency for the client is nice. And if it's
really nice, then why not have it for promisor/partial clone remotes
too? But then do we really need duplicated functionality between
promisor remotes and CDN offloading?

And also I just think that in real life there needs to be an easy way
to override this transparency, and we already have that with promisor
remotes.

> This doesn't stop a hosting provider from using e.g. server options to
> allow the client more control over how their response is served, just
> like can be done for other features of how the transfer works (how
> often to send progress updates, whether to prioritize latency or
> throughput, etc).

Could you give a more concrete example of what could be done?

> What the client *can* do is turn off support for packfile URLs in a
> request completely.  This is required for backward compatibility and
> allows working around a host that has configured the feature
> incorrectly.

If the full content of a repo is really large, the size of a full pack
file sent by an initial clone could be really big and many client
machines could not have enough memory to deal with that. And this
suppose that repo hosting providers would be ok to host very large
repos in the first place.

With promisor remotes, it's less of a problem if for example:

- a repo hosting provider is not ok with very large repos,
- a CDN is unavailable,
- a repo admin has not configured some repos very well.

Thanks for your answer,
Christian.

  reply	other threads:[~2019-02-26  8:30 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-02-23 23:38 [WIP 0/7] CDN offloading of fetch response Jonathan Tan
2019-02-23 23:38 ` [WIP 1/7] http: use --stdin and --keep when downloading pack Jonathan Tan
2019-02-23 23:38 ` [WIP 2/7] http: improve documentation of http_pack_request Jonathan Tan
2019-02-23 23:38 ` [WIP 3/7] http-fetch: support fetching packfiles by URL Jonathan Tan
2019-02-23 23:38 ` [WIP 4/7] Documentation: order protocol v2 sections Jonathan Tan
2019-02-23 23:38 ` [WIP 5/7] Documentation: add Packfile URIs design doc Jonathan Tan
2019-02-23 23:39 ` [WIP 6/7] upload-pack: refactor reading of pack-objects out Jonathan Tan
2019-02-23 23:39 ` [WIP 7/7] upload-pack: send part of packfile response as uri Jonathan Tan
2019-02-24 15:54   ` Junio C Hamano
2019-02-25 21:04   ` Christian Couder
2019-02-26  1:53     ` Jonathan Nieder
2019-02-26  7:08       ` Christian Couder
2019-03-01  0:09   ` Josh Steadmon
2019-03-01  0:17     ` Jonathan Tan
2019-02-25 21:30 ` [WIP 0/7] CDN offloading of fetch response Christian Couder
2019-02-25 23:45   ` Jonathan Nieder
2019-02-26  8:30     ` Christian Couder [this message]
2019-02-26  9:12       ` Ævar Arnfjörð Bjarmason
2019-03-04  8:24         ` Christian Couder
2019-02-28 23:21       ` Jonathan Nieder
2019-03-04  8:54         ` Christian Couder
2019-03-08 21:55 ` [PATCH v2 0/8] " Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 1/8] http: use --stdin when getting dumb HTTP pack Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 2/8] http: improve documentation of http_pack_request Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 3/8] http-fetch: support fetching packfiles by URL Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 4/8] Documentation: order protocol v2 sections Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 5/8] Documentation: add Packfile URIs design doc Jonathan Tan
2019-04-23  5:31     ` Jeff King
2019-04-23 20:38       ` Jonathan Tan
2019-04-23 22:18         ` Ævar Arnfjörð Bjarmason
2019-04-23 22:22           ` Jonathan Nieder
2019-04-23 22:30             ` Ævar Arnfjörð Bjarmason
2019-04-23 22:51               ` Jonathan Nieder
2019-04-23 22:11       ` Jonathan Nieder
2019-04-23 22:25         ` Ævar Arnfjörð Bjarmason
2019-04-23 22:48           ` Jonathan Nieder
2019-04-24  7:48             ` Ævar Arnfjörð Bjarmason
2019-04-24  3:01       ` Junio C Hamano
2019-03-08 21:55   ` [PATCH v2 6/8] upload-pack: refactor reading of pack-objects out Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 7/8] fetch-pack: support more than one pack lockfile Jonathan Tan
2019-03-08 21:55   ` [PATCH v2 8/8] upload-pack: send part of packfile response as uri Jonathan Tan
2019-03-19 20:48   ` [PATCH v2 0/8] CDN offloading of fetch response Josh Steadmon
2019-04-23  5:21   ` Jeff King
2019-04-23 19:23     ` Jonathan Tan
2019-04-24  9:09   ` Ævar Arnfjörð Bjarmason

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAP8UFD1tKNFO9wU8CbgNnSnQyvHYPsZMk1Bit2y1jxH4vk62qQ@mail.gmail.com \
    --to=christian.couder@gmail.com \
    --cc=avarab@gmail.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=jonathantanmy@google.com \
    --cc=jrnieder@gmail.com \
    --cc=peff@peff.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.