* [ANNOUNCE] Git Merge Contributor Summit topic planning
@ 2017-01-31 0:48 Jeff King
  2017-01-31 0:59 ` Jeff King
                  ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Jeff King @ 2017-01-31 0:48 UTC (permalink / raw)
  To: git

The Contributor Summit is only a few days away; I'd like to work out a
few bits of logistics ahead of time.

We're up to 26 attendees. The room layout will probably be three big
round-tables, with a central projector. We should be able to have
everybody pay attention to a single speaker, or break into 3 separate
conversations.

The list of topics is totally open. If you're coming and have something
you'd like to present or discuss, then propose it here. If you're _not_
coming, you may still chime in with input on topics, but please don't
suggest a topic unless somebody who is there will agree to lead the
discussion.

We'll write the final list on a whiteboard on Thursday morning, vote on
what looks good, and then work our way down the list.

Topics don't _have_ to be proposed here ahead of time, but I'd encourage
people to do so as it leaves time for others to consider them and
possibly do any background thinking or research.

The rough schedule is:

  0830 to 0930 - registration, breakfast, milling about and socializing;
                 be aware that Git Merge Workshop attendees will be
                 doing the same things in the same space, so show up
                 with enough time to navigate a bit of a crowd.

  0930 to 1215 - we retire to our Fortress of Solitude to talk about
                 Very Important Git Things

  1215 to 1330 - lunch

  1330 to 1500 - Very Important Git Things, part deux. The end time
                 isn't a hard deadline, so we can go as late as 1600 if
                 the discussion keeps up.

There's no organized dinner planned. At our size, I think it's probably
most productive to let people form small groups for dinner if they want
to. But if somebody is really interested in trying to do a big group
reservation, they are welcome to try to organize it.

-Peff

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-01-31 0:48 [ANNOUNCE] Git Merge Contributor Summit topic planning Jeff King
@ 2017-01-31 0:59 ` Jeff King
  2017-02-01 19:51   ` Christian Couder
  2017-02-01 9:32 ` Erik van Zijst
  2017-02-01 20:37 ` Stefan Beller
  2 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-01-31 0:59 UTC (permalink / raw)
  To: git

On Tue, Jan 31, 2017 at 01:48:05AM +0100, Jeff King wrote:

> The list of topics is totally open. If you're coming and have something
> you'd like to present or discuss, then propose it here. If you're _not_
> coming, you may still chime in with input on topics, but please don't
> suggest a topic unless somebody who is there will agree to lead the
> discussion.

Here are the two topics I plan on bringing:

  - Git / Software Freedom Conservancy yearly report. I'll plan to give
    a rundown of the past year's activities and financials, along with
    some open questions that could benefit from community input.

  - The git-scm.com website: who runs that thing, anyway? An overview
    of the site, how it's managed, and what it needs.

I plan to send out detailed emails on both topics to the list on
Wednesday, and then follow-up with a summary of any useful in-person
discussion (since obviously not everybody will be at the summit).

I'd encourage anybody with a topic to present to consider doing
something similar.

-Peff

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-01-31 0:59 ` Jeff King
@ 2017-02-01 19:51 ` Christian Couder
  0 siblings, 0 replies; 9+ messages in thread
From: Christian Couder @ 2017-02-01 19:51 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Tue, Jan 31, 2017 at 1:59 AM, Jeff King <peff@peff.net> wrote:
> On Tue, Jan 31, 2017 at 01:48:05AM +0100, Jeff King wrote:
>
>> The list of topics is totally open. If you're coming and have something
>> you'd like to present or discuss, then propose it here. If you're _not_
>> coming, you may still chime in with input on topics, but please don't
>> suggest a topic unless somebody who is there will agree to lead the
>> discussion.
>
> Here are the two topics I plan on bringing:
>
>   - Git / Software Freedom Conservancy yearly report. I'll plan to give
>     a rundown of the past year's activities and financials, along with
>     some open questions that could benefit from community input.
>
>   - The git-scm.com website: who runs that thing, anyway? An overview
>     of the site, how it's managed, and what it needs.
>
> I plan to send out detailed emails on both topics to the list on
> Wednesday, and then follow-up with a summary of any useful in-person
> discussion (since obviously not everybody will be at the summit).
>
> I'd encourage anybody with a topic to present to consider doing
> something similar.

GitLab people at the Summit (this includes me) would like to spend a
few minutes to introduce https://gitlab.com/gitlab-org/gitaly/ and
answer any questions.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-01-31 0:48 [ANNOUNCE] Git Merge Contributor Summit topic planning Jeff King
  2017-01-31 0:59 ` Jeff King
@ 2017-02-01 9:32 ` Erik van Zijst
  2017-02-01 14:53   ` Jeff King
  2017-02-01 20:37 ` Stefan Beller
  2 siblings, 1 reply; 9+ messages in thread
From: Erik van Zijst @ 2017-02-01 9:32 UTC (permalink / raw)
  To: peff; +Cc: git, ssaasen, mheemskerk

On Tue, Jan 31, 2017 at 01:48:05AM +0100, Jeff King wrote:

> The list of topics is totally open. If you're coming and have something
> you'd like to present or discuss, then propose it here. If you're _not_
> coming, you may still chime in with input on topics, but please don't
> suggest a topic unless somebody who is there will agree to lead the
> discussion.

I would like to talk about the possibility of CDN-aided cloning
operations as mentioned on this list earlier this week:
http://public-inbox.org/git/CADoxLGPFgF7W4XJzt0X+xFJDoN6RmfFGx_96MO9GPSSOjDK0EQ@mail.gmail.com/

At Bitbucket we have recently rolled out so-called clonebundle support
for Mercurial repositories. Full clone operations are rather expensive
on the server and are responsible for a substantial part of our CPU and
IO load. CDN-based clonebundles have allowed us to eliminate most of
this load for Mercurial repos and we've since built a clonebundle spike
for Git.

Clients performing a full clone get redirected to a CDN where they seed
their new local repo from a pre-built bundle file, and then pull/fetch
any remaining changes. Mercurial has had native, built-in support for
this for a while now.

I imagine other large code hosts could benefit from this as well and
I'd love to gauge the group's interest for this. Could this make sense
for Git? Would it have a chance of landing?

Our spike implements it as an optional capability during ref
advertisement. What are your thoughts on this?

Cheers,
Erik

^ permalink raw reply [flat|nested] 9+ messages in thread
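[Editorial sketch, not part of Erik's message.] The seed-then-fetch flow
described above can be approximated by hand with stock git commands. The
Python sketch below only builds the command lines. It assumes the bundle
has already been downloaded from the CDN to a local path, and it is not
the Bitbucket spike, which negotiates the bundle location during ref
advertisement. The function names and the example URL are hypothetical.

```python
import subprocess


def seeded_clone_commands(bundle_path, origin_url, dest):
    """Return the git invocations for a manual seed-then-fetch clone.

    Assumption: `bundle_path` is a bundle file already fetched from a CDN.
    """
    return [
        # 1. Seed the new repository from the pre-built bundle file.
        ["git", "clone", bundle_path, dest],
        # 2. Point "origin" at the real server instead of the bundle file.
        ["git", "-C", dest, "remote", "set-url", "origin", origin_url],
        # 3. Fetch whatever was pushed after the bundle was built.
        ["git", "-C", dest, "fetch", "origin"],
    ]


def run_seeded_clone(bundle_path, origin_url, dest):
    """Run the commands in order, stopping at the first failure."""
    for argv in seeded_clone_commands(bundle_path, origin_url, dest):
        subprocess.run(argv, check=True)


commands = seeded_clone_commands(
    "repo.bundle", "https://example.com/repo.git", "repo"
)
```

Only the final fetch touches the origin server, which is where the
CPU and IO savings described above would come from.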
* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-02-01 9:32 ` Erik van Zijst
@ 2017-02-01 14:53 ` Jeff King
  2017-02-01 18:06   ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-02-01 14:53 UTC (permalink / raw)
  To: Erik van Zijst; +Cc: git, ssaasen, mheemskerk

On Wed, Feb 01, 2017 at 10:32:12AM +0100, Erik van Zijst wrote:

> Clients performing a full clone get redirected to a CDN where they seed
> their new local repo from a pre-built bundle file, and then pull/fetch
> any remaining changes. Mercurial has had native, built-in support for
> this for a while now.
>
> I imagine other large code hosts could benefit from this as well and
> I'd love to gauge the group's interest for this. Could this make sense
> for Git? Would it have a chance of landing?
>
> Our spike implements it as an optional capability during ref
> advertisement. What are your thoughts on this?

I think this is definitely an interesting topic to discuss tomorrow.
Here are a few observations from my past thinking on the issue. I
haven't read the proposal from earlier this week yet, so some of them
may be obsolete.

Seeding from a bundle CDN generally solves two problems: getting the
bulk of the data from someplace with higher bandwidth (the CDN), and
getting the bulk of the data over a protocol that can be resumed (the
bundle).

But we don't necessarily have to solve both problems simultaneously. And
you might not want to. Storing a separate bundle on another server is
complicated to configure, and doubles the amount of disk space you need
(just half of it is on the CDN). Using a bundle means you can't seed
from a non-bundle source.

So for any solution, I'd want to consider how you can put together the
pieces. Can you seed from a non-bundle? Can you seed from yourself and
just get resumability? If so, how hard is it to serve a pseudo-bundle
based on the packfiles you have on disk (i.e., getting resumability at
least in the common cases without paying the disk cost). I.e., saving
enough data that you could reconstruct the bundle byte-for-byte when you
need to.

If you _can_ do that latter part, and you take "I only care about
resumability" to the simplest extreme, you'd probably end up with a
protocol more like:

  Client: I need a packfile with this want/have
  Server: OK, here it is; its opaque id is XYZ.
  ... connection interrupted ...
  Client: It's me again. I have up to byte N of pack XYZ
  Server: OK, resuming
          [or: I don't have XYZ anymore; start from scratch]

Then generating XYZ and generating that bundle are basically the same
task.

All just food for thought. I look forward to digging into it more on the
list and in the in-person discussion.

-Peff

^ permalink raw reply [flat|nested] 9+ messages in thread
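[Editorial sketch, not part of Peff's message.] The "resume pack XYZ at
byte N" exchange above can be modeled as a toy in-memory simulation. This
is a minimal sketch under loud assumptions: PackServer, fetch and resume
are hypothetical names, the "pack" is placeholder bytes rather than a real
packfile, and deriving the opaque id from a SHA-1 of the pack contents is
an illustrative choice, not something the thread specifies.

```python
import hashlib


class PackServer:
    """Toy server that spools generated packs so clients can resume by offset."""

    def __init__(self):
        self.spool = {}  # opaque pack id -> complete pack bytes

    def request_pack(self, wants, haves):
        # Stand-in for real pack generation: deterministic bytes per request.
        text = "PACK|" + ",".join(sorted(wants)) + "|" + ",".join(sorted(haves))
        pack = text.encode() * 50
        pack_id = hashlib.sha1(pack).hexdigest()
        self.spool[pack_id] = pack
        return pack_id

    def read(self, pack_id, offset, limit):
        """Return up to `limit` bytes from `offset`, or None if the pack is gone."""
        pack = self.spool.get(pack_id)
        if pack is None:
            return None
        return pack[offset:offset + limit]


def fetch(server, wants, haves, chunk=64, fail_after=None):
    """Download in chunks; `fail_after` simulates a dropped connection."""
    pack_id = server.request_pack(wants, haves)
    received = b""
    while True:
        if fail_after is not None and len(received) >= fail_after:
            return pack_id, received, False  # interrupted mid-transfer
        data = server.read(pack_id, len(received), chunk)
        if not data:
            return pack_id, received, True  # nothing left: transfer complete
        received += data


def resume(server, pack_id, received, chunk=64):
    """Continue from byte len(received); None means 'start from scratch'."""
    while True:
        data = server.read(pack_id, len(received), chunk)
        if data is None:
            return None
        if not data:
            return received
        received += data


server = PackServer()
pack_id, partial, done = fetch(server, ["refs/heads/master"], [], fail_after=100)
full = resume(server, pack_id, partial)
```

If the server has dropped XYZ from its spool in the meantime, resume()
returns None and the client would have to issue a fresh request.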
* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-02-01 14:53 ` Jeff King
@ 2017-02-01 18:06 ` Junio C Hamano
  2017-02-01 21:28   ` Jeff King
  0 siblings, 1 reply; 9+ messages in thread
From: Junio C Hamano @ 2017-02-01 18:06 UTC (permalink / raw)
  To: Jeff King; +Cc: Erik van Zijst, git, ssaasen, mheemskerk

Jeff King <peff@peff.net> writes:

> If you _can_ do that latter part, and you take "I only care about
> resumability" to the simplest extreme, you'd probably end up with a
> protocol more like:
>
>   Client: I need a packfile with this want/have
>   Server: OK, here it is; its opaque id is XYZ.
>   ... connection interrupted ...
>   Client: It's me again. I have up to byte N of pack XYZ
>   Server: OK, resuming
>           [or: I don't have XYZ anymore; start from scratch]
>
> Then generating XYZ and generating that bundle are basically the same
> task.

The above allows a simple and naive implementation of generating a
packstream and "tee"ing it to a spool file to be kept while sending
to the first client that asks XYZ.

The story I heard from folks who run git servers at work for Android
and other projects, however, is that they rarely see two requests
with want/have that result in an identical XYZ, unless "have" is an
empty set (aka "clone"). In a busy repository, between two clone
requests relatively close together, somebody would be pushing, so
you'd need many XYZs in your spool even if you want to support only
the "clone" case.

So in the real life, I think that the exchange needs to be more
like this:

    C: I need a packfile with this want/have
       ... C/S negotiate what "have"s are common ...
    S: Sorry, but our negotiation indicates that you are way too
       behind. I'll send you a packfile that brings you up to a
       slightly older set of "want", so pretend that you asked for
       these slightly older "want"s instead. The opaque id of that
       packfile is XYZ. After getting XYZ, come back to me with
       your original set of "want"s. You would give me more recent
       "have" in that request.
       ... connection interrupted ...
    C: It's me again. I have up to byte N of pack XYZ
    S: OK, resuming (or: I do not have it anymore, start from scratch)
       ... after 0 or more iterations C fully receives and digests XYZ ...

and then the above will iterate until the server does not have to
say "Sorry but you are way too behind" and returns a packfile
without having to tweak the "want".

That way, you can limit the number of XYZ you would need to keep to
a reasonable number.

The recent proposal by Jonathan Tan also allows the server side to
tweak the final tips the client receives after the protocol exchange
started. I suspect the above two will become related.

^ permalink raw reply [flat|nested] 9+ messages in thread
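[Editorial sketch, not part of Junio's message.] The iteration described
above ("you are way too behind, take this slightly older pack first, then
come back") can be illustrated with a toy model. Assumptions, all
hypothetical: history is linear and commits are numbered by integers,
`have = -1` stands for an empty repository (a clone), `checkpoints` are
the few tips for which the server keeps a spooled pack, and `max_gap` is
an arbitrary threshold for "way too behind".

```python
def serve(have, want, checkpoints, max_gap):
    """Return (target, seeded) for one round of the exchange.

    `target` is the tip the returned pack brings the client up to;
    `seeded` is True when the server substituted an older, spooled "want".
    """
    if want - have <= max_gap:
        return want, False  # close enough: answer the original want directly
    # Too far behind: newest spooled tip that still moves the client forward.
    usable = [c for c in checkpoints if have < c <= want]
    if not usable:
        return want, False  # nothing spooled helps; build a one-off pack
    return max(usable), True


def fetch_until_current(have, want, checkpoints, max_gap):
    """Repeat the exchange until the original want is satisfied."""
    steps = []
    while have < want:
        target, seeded = serve(have, want, checkpoints, max_gap)
        steps.append((have, target, seeded))
        have = target  # the next round advertises a more recent "have"
    return steps


steps = fetch_until_current(have=-1, want=1000, checkpoints=[500, 900], max_gap=150)
# steps == [(-1, 900, True), (900, 1000, False)]
```

With only two spooled packs the server covers every clone request, which
is the point about keeping the number of XYZs small.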
* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-02-01 18:06 ` Junio C Hamano
@ 2017-02-01 21:28 ` Jeff King
  2017-02-01 21:35   ` Junio C Hamano
  0 siblings, 1 reply; 9+ messages in thread
From: Jeff King @ 2017-02-01 21:28 UTC (permalink / raw)
  To: Junio C Hamano; +Cc: Erik van Zijst, git, ssaasen, mheemskerk

On Wed, Feb 01, 2017 at 10:06:15AM -0800, Junio C Hamano wrote:

> > If you _can_ do that latter part, and you take "I only care about
> > resumability" to the simplest extreme, you'd probably end up with a
> > protocol more like:
> >
> >   Client: I need a packfile with this want/have
> >   Server: OK, here it is; its opaque id is XYZ.
> >   ... connection interrupted ...
> >   Client: It's me again. I have up to byte N of pack XYZ
> >   Server: OK, resuming
> >           [or: I don't have XYZ anymore; start from scratch]
> >
> > Then generating XYZ and generating that bundle are basically the same
> > task.
>
> The above allows a simple and naive implementation of generating a
> packstream and "tee"ing it to a spool file to be kept while sending
> to the first client that asks XYZ.
>
> The story I heard from folks who run git servers at work for Android
> and other projects, however, is that they rarely see two requests
> with want/have that result in an identical XYZ, unless "have" is an
> empty set (aka "clone"). In a busy repository, between two clone
> requests relatively close together, somebody would be pushing, so
> you'd need many XYZs in your spool even if you want to support only
> the "clone" case.

Yeah, I agree a tag "XYZ" does not cover all cases, especially for
fetches. We do caching at GitHub based on the sha1(want+have+options)
tag, and it does catch quite a lot of parallelism, but not all. It
catches most clones, and many fetches that are done by "thundering
herds" of similar clients.

One thing you could do with such a pure "resume XYZ" tag is to represent
the generated pack _without_ replicating the actual object bytes, but
take shortcuts by basing particular bits on the on-disk packfile. Just
enough to serve a deterministic packfile for the same want/have bits.

For instance, if the server knew that XYZ meant

  - send bytes m through n of packfile p, then...

  - send the object at position i of packfile p, as a delta against the
    object at position j of packfile q

  - ...and so on

Then you could store very small "instruction sheets" for each XYZ that
rely on the data in the packfiles. If those packfiles go away (e.g., due
to a repack) that invalidates all of your current XYZ tags. That's OK as
long as this is an optimization, not a correctness requirement.

I haven't actually built anything like this, though, so I don't have a
complete language for the instruction sheets, nor numbers on how big
they would be for average cases.

> So in the real life, I think that the exchange needs to be more
> like this:
>
>     C: I need a packfile with this want/have
>        ... C/S negotiate what "have"s are common ...
>     S: Sorry, but our negotiation indicates that you are way too
>        behind. I'll send you a packfile that brings you up to a
>        slightly older set of "want", so pretend that you asked for
>        these slightly older "want"s instead. The opaque id of that
>        packfile is XYZ. After getting XYZ, come back to me with
>        your original set of "want"s. You would give me more recent
>        "have" in that request.
>        ... connection interrupted ...
>     C: It's me again. I have up to byte N of pack XYZ
>     S: OK, resuming (or: I do not have it anymore, start from scratch)
>        ... after 0 or more iterations C fully receives and digests XYZ ...
>
> and then the above will iterate until the server does not have to
> say "Sorry but you are way too behind" and returns a packfile
> without having to tweak the "want".

Yes, I think that is a reasonable variant. The client knows about
seeding, but the XYZ conversation continues to happen inside the git
protocol. So it loses flexibility versus a true CDN redirection, but it
would "just work" when the server/client both understand the feature,
without the server admin having to set up a separate bundle-over-http
infrastructure.

-Peff

^ permalink raw reply [flat|nested] 9+ messages in thread
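[Editorial sketch, not part of Peff's message.] The "instruction sheet"
idea in this message can be illustrated with a deliberately tiny model.
Peff notes he has no complete language for such sheets, so everything
here is hypothetical: the sheet format, the render_pack name, and the use
of a dict of byte strings as stand-ins for on-disk packfiles. Only the
"send bytes m through n of packfile p" instruction is modeled; the
"delta against another object" instruction is left out.

```python
def render_pack(sheet, packfiles):
    """Replay an instruction sheet against the packfiles currently on disk.

    `sheet` is a list of ("copy", pack_name, start, end) tuples, where
    [start, end) is a half-open byte range. Returns the reconstructed
    bytes, or None when a referenced packfile has gone away (e.g. after a
    repack), in which case the caller would generate a fresh pack instead.
    """
    out = bytearray()
    for op, pack_name, start, end in sheet:
        if op != "copy":
            raise ValueError("unsupported instruction: " + op)
        data = packfiles.get(pack_name)
        if data is None or end > len(data):
            return None  # sheet invalidated: optimization miss, not an error
        out += data[start:end]
    return bytes(out)


# Stand-ins for two on-disk packfiles.
packfiles = {"p": b"0123456789", "q": b"abcdef"}
# The sheet for some opaque id XYZ: a few small tuples, no object bytes.
sheet_xyz = [("copy", "p", 2, 5), ("copy", "q", 0, 3)]
stream = render_pack(sheet_xyz, packfiles)
```

Replaying the same sheet yields the same bytes every time, which is what
a byte-offset resume needs, while the stored sheet stays much smaller
than the pack it describes.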
* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-02-01 21:28 ` Jeff King
@ 2017-02-01 21:35 ` Junio C Hamano
  0 siblings, 0 replies; 9+ messages in thread
From: Junio C Hamano @ 2017-02-01 21:35 UTC (permalink / raw)
  To: Jeff King; +Cc: Erik van Zijst, git, ssaasen, mheemskerk

Jeff King <peff@peff.net> writes:

> For instance, if the server knew that XYZ meant
>
>   - send bytes m through n of packfile p, then...
>
>   - send the object at position i of packfile p, as a delta against the
>     object at position j of packfile q
>
>   - ...and so on
>
> Then you could store very small "instruction sheets" for each XYZ that
> rely on the data in the packfiles. If those packfiles go away (e.g., due
> to a repack) that invalidates all of your current XYZ tags. That's OK as
> long as this is an optimization, not a correctness requirement.

Yes. You can play optimization games.

>> So in the real life, I think that the exchange needs to be more
>> like this:
>>
>>     C: I need a packfile with this want/have
>>        ... C/S negotiate what "have"s are common ...
>>     S: Sorry, but our negotiation indicates that you are way too
>>        behind. I'll send you a packfile that brings you up to a
>>        slightly older set of "want", so pretend that you asked for
>>        these slightly older "want"s instead. The opaque id of that
>>        packfile is XYZ. After getting XYZ, come back to me with
>>        your original set of "want"s. You would give me more recent
>>        "have" in that request.
>>        ... connection interrupted ...
>>     C: It's me again. I have up to byte N of pack XYZ
>>     S: OK, resuming (or: I do not have it anymore, start from scratch)
>>        ... after 0 or more iterations C fully receives and digests XYZ ...
>>
>> and then the above will iterate until the server does not have to
>> say "Sorry but you are way too behind" and returns a packfile
>> without having to tweak the "want".
>
> Yes, I think that is a reasonable variant. The client knows about
> seeding, but the XYZ conversation continues to happen inside the git
> protocol. So it loses flexibility versus a true CDN redirection, but it
> would "just work" when the server/client both understand the feature,
> without the server admin having to set up a separate bundle-over-http
> infrastructure.

You can also do a CDN offline as a natural extension. When the
server says "Sorry, you are way too behind.", the above example
tells "I'll update you to a slightly stale version first" to the
client. A natural extension could say "Go update yourself to a
slightly stale version first by grabbing that bundle over there."

But I agree that doing everything in-line may be a logical and
simpler first step to get there.

^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [ANNOUNCE] Git Merge Contributor Summit topic planning
  2017-01-31 0:48 [ANNOUNCE] Git Merge Contributor Summit topic planning Jeff King
  2017-01-31 0:59 ` Jeff King
  2017-02-01 9:32 ` Erik van Zijst
@ 2017-02-01 20:37 ` Stefan Beller
  2 siblings, 0 replies; 9+ messages in thread
From: Stefan Beller @ 2017-02-01 20:37 UTC (permalink / raw)
  To: Jeff King; +Cc: git

On Mon, Jan 30, 2017 at 4:48 PM, Jeff King <peff@peff.net> wrote:
> The Contributor Summit is only a few days away; I'd like to work out a
> few bits of logistics ahead of time.
>
> We're up to 26 attendees. The room layout will probably be three big
> round-tables, with a central projector. We should be able to have
> everybody pay attention to a single speaker, or break into 3 separate
> conversations.
>
> The list of topics is totally open. If you're coming and have something
> you'd like to present or discuss, then propose it here. If you're _not_
> coming, you may still chime in with input on topics, but please don't
> suggest a topic unless somebody who is there will agree to lead the
> discussion.

submodules and X
(How do submodules and worktrees interact, should they?,
Which functions need support for submodules, e.g. checkout, branch,
grep, etc...?
Are we interested in keeping a submodule its own logical unit?
Do we want to have dedicated plumbing commands for submodules?)

... would be my line of talk,
Stefan

^ permalink raw reply [flat|nested] 9+ messages in thread