* [ANNOUNCE] Git Merge Contributor Summit topic planning
@ 2017-01-31  0:48 Jeff King
  2017-01-31  0:59 ` Jeff King
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jeff King @ 2017-01-31  0:48 UTC
  To: git

The Contributor Summit is only a few days away; I'd like to work out a
few bits of logistics ahead of time.

We're up to 26 attendees. The room layout will probably be three big
round-tables, with a central projector. We should be able to have
everybody pay attention to a single speaker, or break into 3 separate
conversations.

The list of topics is totally open. If you're coming and have something
you'd like to present or discuss, then propose it here. If you're _not_
coming, you may still chime in with input on topics, but please don't
suggest a topic unless somebody who is there will agree to lead the
discussion.

We'll write the final list on a whiteboard on Thursday morning, vote on
what looks good, and then work our way down the list.  Topics don't
_have_ to be proposed here ahead of time, but I'd encourage people to do
so as it leaves time for others to consider them and possibly do any
background thinking or research.

The rough schedule is:

  0830 to 0930 - registration, breakfast, milling about and socializing;
                 be aware that Git Merge Workshop attendees will be
                 doing the same things in the same space, so show up
                 with enough time to navigate a bit of a crowd.

  0930 to 1215 - we retire to our Fortress of Solitude to talk about
                 Very Important Git Things

  1215 to 1330 - lunch

  1330 to 1500 - Very Important Git Things, part deux. The end time
                 isn't a hard deadline, so we can go as late as 1600 if
                 the discussion keeps up.

There's no organized dinner planned. At our size, I think it's probably
most productive to let people form small groups for dinner if they want
to. But if somebody is really interested in trying to do a big group
reservation, they are welcome to try to organize it.

-Peff


* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-01-31  0:48 [ANNOUNCE] Git Merge Contributor Summit topic planning Jeff King
@ 2017-01-31  0:59 ` Jeff King
  2017-02-01 19:51   ` Christian Couder
  2017-02-01  9:32 ` Erik van Zijst
  2017-02-01 20:37 ` Stefan Beller
  2 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-01-31  0:59 UTC
  To: git

On Tue, Jan 31, 2017 at 01:48:05AM +0100, Jeff King wrote:

> The list of topics is totally open. If you're coming and have something
> you'd like to present or discuss, then propose it here. If you're _not_
> coming, you may still chime in with input on topics, but please don't
> suggest a topic unless somebody who is there will agree to lead the
> discussion.

Here are the two topics I plan on bringing:

  - Git / Software Freedom Conservancy yearly report. I'll plan to give
    a rundown of the past year's activities and financials, along with
    some open questions that could benefit from community input.

  - The git-scm.com website: who runs that thing, anyway? An overview
    of the site, how it's managed, and what it needs.

I plan to send out detailed emails on both topics to the list on
Wednesday, and then follow up with a summary of any useful in-person
discussion (since obviously not everybody will be at the summit).

I'd encourage anybody with a topic to present to consider doing
something similar.

-Peff


* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-01-31  0:48 [ANNOUNCE] Git Merge Contributor Summit topic planning Jeff King
  2017-01-31  0:59 ` Jeff King
@ 2017-02-01  9:32 ` Erik van Zijst
  2017-02-01 14:53   ` Jeff King
  2017-02-01 20:37 ` Stefan Beller
  2 siblings, 1 reply; 9+ messages in thread
From: Erik van Zijst @ 2017-02-01  9:32 UTC
  To: peff; +Cc: git, ssaasen, mheemskerk

On Tue, Jan 31, 2017 at 01:48:05AM +0100, Jeff King wrote:

> The list of topics is totally open. If you're coming and have something
> you'd like to present or discuss, then propose it here. If you're _not_
> coming, you may still chime in with input on topics, but please don't
> suggest a topic unless somebody who is there will agree to lead the
> discussion.

I would like to talk about the possibility of CDN-aided cloning
operations as mentioned on this list earlier this week:
http://public-inbox.org/git/CADoxLGPFgF7W4XJzt0X+xFJDoN6RmfFGx_96MO9GPSSOjDK0EQ@mail.gmail.com/

At Bitbucket we have recently rolled out so-called clonebundle support
for Mercurial repositories.

Full clone operations are rather expensive on the server and are
responsible for a substantial part of our CPU and IO load. CDN-based
clonebundles have allowed us to eliminate most of this load for
Mercurial repos and we've since built a clonebundle spike for Git.

Clients performing a full clone get redirected to a CDN where they seed
their new local repo from a pre-built bundle file, and then pull/fetch
any remaining changes. Mercurial has had native, built-in support for
this for a while now.
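
As a rough illustration of that two-step flow (not Bitbucket's or
Mercurial's actual implementation), the Python sketch below models
objects as plain strings held in sets. The name `seeded_clone` and its
parameters are hypothetical and exist only for this example.

```python
from typing import Set, Tuple


def seeded_clone(bundle: Set[str], server: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Return (local objects after the clone, objects fetched from the origin)."""
    # Step 1: seed the new repository from the pre-built bundle on the CDN.
    local = set(bundle)
    # Step 2: ask the origin server only for what the bundle did not contain.
    remaining = server - local
    local |= remaining
    return local, remaining
```

With a bundle that is only slightly stale, the origin serves the small
`remaining` set instead of the whole repository, which is where the
CPU and IO savings described above would come from.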

I imagine other large code hosts could benefit from this as well and
I'd love to gauge the group's interest for this. Could this make sense
for Git? Would it have a chance of landing?

Our spike implements it as an optional capability during ref
advertisement. What are your thoughts on this?

Cheers,
Erik


* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-02-01  9:32 ` Erik van Zijst
@ 2017-02-01 14:53   ` Jeff King
  2017-02-01 18:06     ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-02-01 14:53 UTC
  To: Erik van Zijst; +Cc: git, ssaasen, mheemskerk

On Wed, Feb 01, 2017 at 10:32:12AM +0100, Erik van Zijst wrote:

> Clients performing a full clone get redirected to a CDN where they seed
> their new local repo from a pre-built bundle file, and then pull/fetch
> any remaining changes. Mercurial has had native, built-in support for
> this for a while now.
> 
> I imagine other large code hosts could benefit from this as well and
> I'd love to gauge the group's interest for this. Could this make sense
> for Git? Would it have a chance of landing?
> 
> Our spike implements it as an optional capability during ref
> advertisement. What are your thoughts on this?

I think this is definitely an interesting topic to discuss tomorrow.

Here are a few observations from my past thinking on the issue. I
haven't read the proposal from earlier this week yet, so some of them
may be obsolete.

Seeding from a bundle CDN generally solves two problems: getting the
bulk of the data from someplace with higher bandwidth (the CDN), and
getting the bulk of the data over a protocol that can be resumed (the
bundle).

But we don't necessarily have to solve both problems simultaneously.
And you might not want to. Storing a separate bundle on another server
is complicated to configure, and doubles the amount of disk space you
need (just half of it is on the CDN). Using a bundle means you can't
seed from a non-bundle source.

So for any solution, I'd want to consider how you can put together the
pieces. Can you seed from a non-bundle? Can you seed from yourself and
just get resumability? If so, how hard is it to serve a pseudo-bundle
based on the packfiles you have on disk (i.e., getting resumability
at least in the common cases without paying the disk cost)? That would
mean saving just enough data that you could reconstruct the bundle
byte-for-byte when you need to.

If you _can_ do that latter part, and you take "I only care about
resumability" to the simplest extreme, you'd probably end up with a
protocol more like:

  Client: I need a packfile with this want/have
  Server: OK, here it is; its opaque id is XYZ.
  ... connection interrupted ...
  Client: It's me again. I have up to byte N of pack XYZ
  Server: OK, resuming
          [or: I don't have XYZ anymore; start from scratch]

Then generating XYZ and generating that bundle are basically the same
task.
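
The exchange above can be sketched as a toy in-memory server in
Python. Everything here is an assumption made for illustration: the
`ResumableServer` class, deriving the opaque id from a SHA-1 of the
pack bytes, and the fake pack contents. None of it is an existing Git
interface.

```python
import hashlib
from typing import Dict, List, Optional, Tuple


class ResumableServer:
    """Toy server that keeps each generated pack under an opaque id."""

    def __init__(self) -> None:
        self.spool: Dict[str, bytes] = {}

    def _generate_pack(self, want: List[str], have: List[str]) -> bytes:
        # Stand-in for real pack generation: deterministic bytes only.
        text = "PACK:" + ",".join(sorted(want)) + "|" + ",".join(sorted(have))
        return text.encode() * 4

    def request(self, want: List[str], have: List[str]) -> Tuple[str, bytes]:
        """'I need a packfile with this want/have' -> (opaque id, pack)."""
        pack = self._generate_pack(want, have)
        pack_id = hashlib.sha1(pack).hexdigest()
        self.spool[pack_id] = pack  # keep a copy so the transfer can resume
        return pack_id, pack

    def resume(self, pack_id: str, offset: int) -> Optional[bytes]:
        """'I have up to byte N of pack XYZ' -> the rest, or None."""
        pack = self.spool.get(pack_id)
        if pack is None:
            return None  # "I don't have XYZ anymore; start from scratch"
        return pack[offset:]
```

A client that lost the connection after N bytes would call
`resume(pack_id, N)` and append the result to what it already holds; a
`None` answer means falling back to a fresh `request`.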

All just food for thought. I look forward to digging into it more on the
list and in the in-person discussion.

-Peff


* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-02-01 14:53   ` Jeff King
@ 2017-02-01 18:06     ` Junio C Hamano
  2017-02-01 21:28       ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2017-02-01 18:06 UTC
  To: Jeff King; +Cc: Erik van Zijst, git, ssaasen, mheemskerk

Jeff King <peff@peff.net> writes:

> If you _can_ do that latter part, and you take "I only care about
> resumability" to the simplest extreme, you'd probably end up with a
> protocol more like:
>
>   Client: I need a packfile with this want/have
>   Server: OK, here it is; its opaque id is XYZ.
>   ... connection interrupted ...
>   Client: It's me again. I have up to byte N of pack XYZ
>   Server: OK, resuming
>           [or: I don't have XYZ anymore; start from scratch]
>
> Then generating XYZ and generating that bundle are basically the same
> task.

The above allows a simple and naive implementation of generating a
packstream and "tee"ing it to a spool file to be kept while sending
to the first client that asks for XYZ.

The story I heard from folks who run git servers at work for Android
and other projects, however, is that they rarely see two requests
with want/have that result in an identical XYZ, unless "have" is an
empty set (aka "clone").  In a busy repository, between two clone
requests relatively close together, somebody would be pushing, so
you'd need many XYZs in your spool even if you want to support only
the "clone" case.

So in real life, I think that the exchange needs to be more
like this:

    C: I need a packfile with this want/have
    ... C/S negotiate what "have"s are common ...
    S: Sorry, but our negotiation indicates that you are way too
       behind.  I'll send you a packfile that brings you up to a
       slightly older set of "want", so pretend that you asked for
       these slightly older "want"s instead.  The opaque id of that
       packfile is XYZ.  After getting XYZ, come back to me with
       your original set of "want"s.  You would give me more recent
       "have" in that request.  
    ... connection interrupted ...
    C: It's me again.  I have up to byte N of pack XYZ
    S: OK, resuming (or: I do not have it anymore, start from scratch)
    ... after 0 or more iterations C fully receives and digests XYZ ...

and then the above will iterate until the server does not have to
say "Sorry but you are way too behind" and returns a packfile
without having to tweak the "want".

That way, you can limit the number of XYZ you would need to keep to
a reasonable number.
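
A minimal Python sketch of that loop, under loud assumptions: history
is linear and commits are numbered 1..tip, the server has a fixed list
of "checkpoint" commits it is willing to serve stale packs for, and
`max_gap` decides what "way too behind" means. `CheckpointServer`,
`negotiate`, and `fetch` are hypothetical names, and the resume step is
left out to keep the example short.

```python
from typing import Dict, List, Tuple


class CheckpointServer:
    """Toy server over a linear history where commit i has number i."""

    def __init__(self, tip: int, checkpoints: List[int], max_gap: int) -> None:
        self.tip = tip
        self.checkpoints = sorted(checkpoints)
        self.max_gap = max_gap
        self.spool: Dict[str, Tuple[int, int]] = {}  # pack id -> (have, want)

    def negotiate(self, want: int, have: int) -> Tuple[int, str]:
        """Return (effective_want, pack_id), possibly lowering the want."""
        effective = want
        if want - have > self.max_gap:
            older = [c for c in self.checkpoints if have < c < want]
            if older:
                effective = older[-1]  # newest checkpoint below the want
        pack_id = f"{have}..{effective}"
        if effective != want:
            self.spool[pack_id] = (have, effective)  # only stale packs are kept
        return effective, pack_id


def fetch(server: CheckpointServer, have: int) -> List[str]:
    """Repeat requests until the server no longer lowers the want."""
    packs = []
    want = server.tip
    while have < want:
        effective, pack_id = server.negotiate(want, have)
        packs.append(pack_id)
        have = effective  # the client now has everything up to here
    return packs
```

In this model two clones made before and after a push both reuse the
same spooled pack up to the newest checkpoint, and only the small final
pack differs, so the spool stays at one entry.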

The recent proposal by Jonathan Tan also allows the server side to
tweak the final tips the client receives after the protocol exchange
started.  I suspect the above two will become related.


* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-01-31  0:59 ` Jeff King
@ 2017-02-01 19:51   ` Christian Couder
  0 siblings, 0 replies; 9+ messages in thread
From: Christian Couder @ 2017-02-01 19:51 UTC
  To: Jeff King; +Cc: git

On Tue, Jan 31, 2017 at 1:59 AM, Jeff King <peff@peff.net> wrote:
> On Tue, Jan 31, 2017 at 01:48:05AM +0100, Jeff King wrote:
>
>> The list of topics is totally open. If you're coming and have something
>> you'd like to present or discuss, then propose it here. If you're _not_
>> coming, you may still chime in with input on topics, but please don't
>> suggest a topic unless somebody who is there will agree to lead the
>> discussion.
>
> Here are the two topics I plan on bringing:
>
>   - Git / Software Freedom Conservancy yearly report. I'll plan to give
>     a rundown of the past year's activities and financials, along with
>     some open questions that could benefit from community input.
>
>   - The git-scm.com website: who runs that thing, anyway? An overview
>     of the site, how it's managed, and what it needs.
>
> I plan to send out detailed emails on both topics to the list on
> Wednesday, and then follow up with a summary of any useful in-person
> discussion (since obviously not everybody will be at the summit).
>
> I'd encourage anybody with a topic to present to consider doing
> something similar.

GitLab people at the Summit (this includes me) would like to spend a
few minutes to introduce https://gitlab.com/gitlab-org/gitaly/ and
answer any questions.


* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-01-31  0:48 [ANNOUNCE] Git Merge Contributor Summit topic planning Jeff King
  2017-01-31  0:59 ` Jeff King
  2017-02-01  9:32 ` Erik van Zijst
@ 2017-02-01 20:37 ` Stefan Beller
  2 siblings, 0 replies; 9+ messages in thread
From: Stefan Beller @ 2017-02-01 20:37 UTC
  To: Jeff King; +Cc: git

On Mon, Jan 30, 2017 at 4:48 PM, Jeff King <peff@peff.net> wrote:
> The Contributor Summit is only a few days away; I'd like to work out a
> few bits of logistics ahead of time.
>
> We're up to 26 attendees. The room layout will probably be three big
> round-tables, with a central projector. We should be able to have
> everybody pay attention to a single speaker, or break into 3 separate
> conversations.
>
> The list of topics is totally open. If you're coming and have something
> you'd like to present or discuss, then propose it here. If you're _not_
> coming, you may still chime in with input on topics, but please don't
> suggest a topic unless somebody who is there will agree to lead the
> discussion.

submodules and X (How do submodules and worktrees interact, and
should they? Which functions need support for submodules, e.g. checkout,
branch, grep, etc.? Are we interested in keeping a submodule its own
logical unit? Do we want to have dedicated plumbing commands for
submodules?)

... would be my line of talk,

Stefan


* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-02-01 18:06     ` Junio C Hamano
@ 2017-02-01 21:28       ` Jeff King
  2017-02-01 21:35         ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-02-01 21:28 UTC
  To: Junio C Hamano; +Cc: Erik van Zijst, git, ssaasen, mheemskerk

On Wed, Feb 01, 2017 at 10:06:15AM -0800, Junio C Hamano wrote:

> > If you _can_ do that latter part, and you take "I only care about
> > resumability" to the simplest extreme, you'd probably end up with a
> > protocol more like:
> >
> >   Client: I need a packfile with this want/have
> >   Server: OK, here it is; its opaque id is XYZ.
> >   ... connection interrupted ...
> >   Client: It's me again. I have up to byte N of pack XYZ
> >   Server: OK, resuming
> >           [or: I don't have XYZ anymore; start from scratch]
> >
> > Then generating XYZ and generating that bundle are basically the same
> > task.
> 
> The above allows a simple and naive implementation of generating a
> packstream and "tee"ing it to a spool file to be kept while sending
> > to the first client that asks for XYZ.
> 
> The story I heard from folks who run git servers at work for Android
> and other projects, however, is that they rarely see two requests
> with want/have that result in an identical XYZ, unless "have" is an
> empty set (aka "clone").  In a busy repository, between two clone
> requests relatively close together, somebody would be pushing, so
> you'd need many XYZs in your spool even if you want to support only
> the "clone" case.

Yeah, I agree a tag "XYZ" does not cover all cases, especially for
fetches.

We do caching at GitHub based on the sha1(want+have+options) tag, and it
does catch quite a lot of parallelism, but not all. It catches most
clones, and many fetches that are done by "thundering herds" of similar
clients.
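
The exact bytes that go into that hash are not spelled out here, so the
Python sketch below only illustrates the general idea. The function
name `pack_cache_key`, the line-based serialization, and the sorting
step are all assumptions of this example.

```python
import hashlib
from typing import Iterable


def pack_cache_key(want: Iterable[str], have: Iterable[str],
                   options: Iterable[str]) -> str:
    """Derive a cache tag from a fetch request (hypothetical encoding)."""
    h = hashlib.sha1()
    for label, values in (("want", want), ("have", have), ("option", options)):
        # Sort and de-duplicate so that equivalent requests hash the same.
        for value in sorted(set(values)):
            h.update(f"{label} {value}\n".encode())
    return h.hexdigest()
```

Two clients that send the same wants, haves, and options in a different
order would map to the same tag here, which is the kind of overlap a
"thundering herd" produces.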

One thing you could do with such a pure "resume XYZ" tag is to represent
the generated pack _without_ replicating the actual object bytes, but
take shortcuts by basing particular bits on the on-disk packfile. Just
enough to serve a deterministic packfile for the same want/have bits.

For instance, if the server knew that XYZ meant

  - send bytes m through n of packfile p, then...
  
  - send the object at position i of packfile p, as a delta against the
    object at position j of packfile q

  - ...and so on

Then you could store very small "instruction sheets" for each XYZ that
rely on the data in the packfiles. If those packfiles go away (e.g., due
to a repack) that invalidates all of your current XYZ tags. That's OK as
long as this is an optimization, not a correctness requirement.
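
To make the idea a little more concrete, here is a small Python sketch
of replaying such a sheet. It is an illustration under assumptions, not
a design: it handles only the "copy a byte range" instruction (the
re-delta instruction is left out), uses half-open `[start, end)` ranges,
and models packfiles as in-memory byte strings. `Instruction` and
`replay` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

# One instruction: copy bytes [start, end) of the named on-disk packfile.
Instruction = Tuple[str, int, int]


def replay(sheet: List[Instruction], packs: Dict[str, bytes],
           offset: int = 0) -> Optional[bytes]:
    """Rebuild the output stream from `offset` on, or None if a pack is gone."""
    out = bytearray()
    for name, start, end in sheet:
        data = packs.get(name)
        if data is None or end > len(data):
            return None  # e.g. repacked away: the tag is invalid, start over
        length = end - start
        if offset >= length:
            offset -= length  # this whole range was already received
            continue
        out += data[start + offset:end]
        offset = 0
    return bytes(out)
```

Because the sheet is deterministic for a given set of packfiles,
resuming at byte N is just a replay with `offset=N`, and a missing
packfile degrades to "start from scratch" rather than to a wrong
answer.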

I haven't actually built anything like this, though, so I don't have a
complete language for the instruction sheets, nor numbers on how big
they would be for average cases.

> So in real life, I think that the exchange needs to be more
> like this:
> 
>     C: I need a packfile with this want/have
>     ... C/S negotiate what "have"s are common ...
>     S: Sorry, but our negotiation indicates that you are way too
>        behind.  I'll send you a packfile that brings you up to a
>        slightly older set of "want", so pretend that you asked for
>        these slightly older "want"s instead.  The opaque id of that
>        packfile is XYZ.  After getting XYZ, come back to me with
>        your original set of "want"s.  You would give me more recent
>        "have" in that request.  
>     ... connection interrupted ...
>     C: It's me again.  I have up to byte N of pack XYZ
>     S: OK, resuming (or: I do not have it anymore, start from scratch)
>     ... after 0 or more iterations C fully receives and digests XYZ ...
> 
> and then the above will iterate until the server does not have to
> say "Sorry but you are way too behind" and returns a packfile
> without having to tweak the "want".

Yes, I think that is a reasonable variant. The client knows about
seeding, but the XYZ conversation continues to happen inside the git
protocol. So it loses flexibility versus a true CDN redirection, but it
would "just work" when the server/client both understand the feature,
without the server admin having to set up a separate bundle-over-http
infrastructure.

-Peff


* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-02-01 21:28       ` Jeff King
@ 2017-02-01 21:35         ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2017-02-01 21:35 UTC
  To: Jeff King; +Cc: Erik van Zijst, git, ssaasen, mheemskerk

Jeff King <peff@peff.net> writes:

> For instance, if the server knew that XYZ meant
>
>   - send bytes m through n of packfile p, then...
>   
>   - send the object at position i of packfile p, as a delta against the
>     object at position j of packfile q
>
>   - ...and so on
>
> Then you could store very small "instruction sheets" for each XYZ that
> rely on the data in the packfiles. If those packfiles go away (e.g., due
> to a repack) that invalidates all of your current XYZ tags. That's OK as
> long as this is an optimization, not a correctness requirement.

Yes.  You can play optimization games.

>> So in real life, I think that the exchange needs to be more
>> like this:
>> 
>>     C: I need a packfile with this want/have
>>     ... C/S negotiate what "have"s are common ...
>>     S: Sorry, but our negotiation indicates that you are way too
>>        behind.  I'll send you a packfile that brings you up to a
>>        slightly older set of "want", so pretend that you asked for
>>        these slightly older "want"s instead.  The opaque id of that
>>        packfile is XYZ.  After getting XYZ, come back to me with
>>        your original set of "want"s.  You would give me more recent
>>        "have" in that request.  
>>     ... connection interrupted ...
>>     C: It's me again.  I have up to byte N of pack XYZ
>>     S: OK, resuming (or: I do not have it anymore, start from scratch)
>>     ... after 0 or more iterations C fully receives and digests XYZ ...
>> 
>> and then the above will iterate until the server does not have to
>> say "Sorry but you are way too behind" and returns a packfile
>> without having to tweak the "want".
>
> Yes, I think that is a reasonable variant. The client knows about
> seeding, but the XYZ conversation continues to happen inside the git
> protocol. So it loses flexibility versus a true CDN redirection, but it
> would "just work" when the server/client both understand the feature,
> without the server admin having to set up a separate bundle-over-http
> infrastructure.

You can also do a CDN offline as a natural extension.  When the
server says "Sorry, you are way too behind.", the above example
tells "I'll update you to a slightly stale version first" to the
client.  A natural extension could say "Go update yourself to a
slightly stale version first by grabbing that bundle over there."

But I agree that doing everything in-line may be a logical and
simpler first step to get there.


