* random server hacks on top of git
From: Jeff King @ 2013-03-18 12:12 UTC
To: René Scharfe; +Cc: Junio C Hamano, git
[Re-titled, as we are off-topic from the original patch series]
On Sun, Mar 17, 2013 at 05:38:59PM +0100, René Scharfe wrote:
> On 17.03.2013 06:40, Jeff King wrote:
> >We do have the capability to roll out to one or a few of our servers
> >(the granularity is not 0.2%, but it is still small). I'm going to try
> >to keep us more in sync with upstream git, but I don't know if I will
> >get to the point of ever deploying "master" or "next", even for a small
> >portion of the population. We are accumulating more hacks[1] on top of
> >git, so it is not just "run master for an hour on this server"; I have
> >to actually merge our fork.
>
> Did you perhaps intend to list these hacks in a footnote or link to a
> repository containing them? (I can't find the counterpart of that
> [1].)
I was actually just going to say "some of which are gross hacks that
will never see the light of day, some of which have already gone
upstream, and some of which I am planning on submitting upstream".
But since I happened to be cataloguing them recently, here is the list
of things that have not yet gone upstream. If anybody is interested in
a particular topic, I'm happy to discuss and/or prioritize moving it
forward.
- blame-tree; re-rolled from my submission last year to build on top
of the revision machinery, handle merges sanely, etc. Mostly this
needs documentation and a clean-up of the output format (which is
very utilitarian, but probably should share output with git-blame).
- diff --max-depth; this is a requirement to do blame-tree efficiently
if you want to do GitHub-style listings (you must recurse to find
the history of some/subdir, but you do not want to recurse past that
for efficiency reasons). This is hung up on two things:
1. It does not integrate with the pathspec max-depth code, because
we do not use struct pathspec in the tree diff (but I think
Duy's patches are changing that).
2. My definition of --max-depth is subtly different from that of
"git grep". But I think mine is more useful, and I haven't
decided how to reconcile it.
- share ref selection code between "git branch", "git tag", and "git
for-each-ref". This includes cleaning up the "tag --contains" code
to be safer for general use (so that "branch --contains" can benefit
from the speedup), and then getting the same options for all three
commands (tag doesn't know about --merged, and for-each-ref
doesn't know about --contains or --merged).
- receive.maxsize; index-pack will happily spool data to disk
forever, and you never even get a chance to make a policy decision
like "hey, this is too big". This patch lets index-pack cut off the
client after a certain number of bytes. It's not elegant because
the cutoff transfer is not resumable, but we use it as a
last-ditch measure for DoS protection (the client can reconnect and send
more, of course, but at that point we have the opportunity to make
external policy decisions like locking their account). Not sure if
other sites would want this or not.
- receive.advertisealternates; basically turn off ".have"
advertisement. Some of our alternates networks are so huge that
the cost of collecting all of the alternate refs is very high (even
though it can save some transfer bandwidth). Not sure if other
sites want this or not (and I think it would be more elegant to
have a small static set of common refs that people build off of,
and advertise those. e.g., if you fork rails/rails, then we should
advertise rails/rails/refs/heads/master as a ".have", but not
anybody else's fork).
- receive.hiderefs; this is going to become redundant with Junio's
implementation
- an audit reflog; we keep a reflog for all refs at the root of the
repository. It differs from a regular reflog in that:
1. It never expires.
2. It is not part of reachability analysis.
3. It includes the refname for each entry, so you can see
deletions.
It's mostly useful for forensics when somebody has screwed up
their repository (or we're chasing down a git bug; it helped me
find the pack-refs race recently). Probably too GitHub-specific
for other people to want it (especially because it grows without
bound).
- statistics instrumentation; we keep counters for various things in
code (e.g., which phase of protocol upload-pack is in, how many
bytes sent, etc) and expose them in a few ways. One is over a
socket to run a "top"-like interface. Another is to tweak the argv
array of the process so that "ps" shows the process state. I think
it would be useful to other people running git servers, but the
code is currently quite nasty and invasive. I have a
work-in-progress to clean it up, but it's got a ways to go.
- hacks to set niceness and io-priority; this should be done by a
wrapper, but in our case it was simpler to catch all processes by
just building it into git. Too gross to go upstream.
- ignore some fsck warnings under transfer.fsckobjects; some of them
are annoyingly common when people pull old history from an
existing project and try to push it back up. It's not indicative
of a new bug in an implementation, but we have to live with the
broken history forever (e.g., zero-padded modes in trees).
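To make the zero-padded mode case concrete, here is a minimal sketch that scans the body of a raw tree object (entries of the form "<mode> <name>\0<20-byte SHA-1>") and reports entries whose mode starts with "0", e.g. "040000" where git itself writes "40000". The function name is hypothetical and this is not the fsck code, only an illustration of what such a check looks at.

```python
def zero_padded_entries(tree_body):
    """Return names of entries in a raw tree body whose mode is zero-padded.

    tree_body is the uncompressed tree object content without the
    "tree <size>\\0" header. Illustrative helper, not git's fsck.
    """
    names = []
    pos = 0
    while pos < len(tree_body):
        space = tree_body.index(b" ", pos)        # end of the ASCII octal mode
        nul = tree_body.index(b"\0", space + 1)   # end of the entry name
        mode = tree_body[pos:space]
        name = tree_body[space + 1:nul]
        if mode.startswith(b"0"):
            names.append(name.decode("utf-8", "replace"))
        pos = nul + 1 + 20                        # skip the 20-byte binary SHA-1
    return names
```

A server-side policy could then downgrade just this class of finding while still rejecting everything else; how that is wired into transfer.fsckobjects is not specified in this message.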
-Peff
* Re: random server hacks on top of git
From: Junio C Hamano @ 2013-03-18 18:24 UTC
To: Jeff King; +Cc: René Scharfe, git
Jeff King <peff@peff.net> writes:
> - blame-tree; re-rolled from my submission last year to build on top
>
> - diff --max-depth; this is a requirement to do blame-tree efficiently
Both look mildly interesting; once the magic pathspec work settles,
the latter in particular may be a good addition.
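For readers unfamiliar with the idea, a depth-limited tree diff can be sketched roughly as follows. This is only an illustration: trees are modelled as nested dicts (name to blob id, or name to subtree), the function name is made up, and depth is counted as the number of subtree levels descended from the root, which is an assumption since the exact --max-depth definition is not spelled out in this thread.

```python
def diff_trees(old, new, max_depth, prefix="", depth=0):
    """Return the paths that differ between two nested-dict trees.

    A tree maps each name to a blob id (str) or a subtree (dict).
    Changed subtrees deeper than max_depth are reported as a single
    path instead of being descended into.
    """
    changed = []
    for name in sorted(set(old) | set(new)):
        a, b = old.get(name), new.get(name)
        if a == b:
            # Stand-in for "same object id": nothing below can differ.
            continue
        path = prefix + name
        both_trees = isinstance(a, dict) and isinstance(b, dict)
        if both_trees and depth < max_depth:
            changed.extend(diff_trees(a, b, max_depth, path + "/", depth + 1))
        else:
            changed.append(path)
    return changed
```

With a limit of 1, a change to src/lib/a.c is reported as "src/lib" and nothing below it is examined, which is the kind of answer a per-directory listing of src/ needs.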
> - share ref selection code between "git branch", "git tag", and "git
> for-each-ref".
Nice.
> - receive.maxsize; index-pack will happily spool data to disk
Again nice.
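The cutoff behaviour described for receive.maxsize amounts to counting bytes while spooling and bailing out once a limit is crossed. A minimal sketch, assuming generic file-like objects rather than index-pack's actual input handling (the names TooBig and spool_with_limit are hypothetical):

```python
class TooBig(Exception):
    """Raised when the incoming stream exceeds the configured limit."""


def spool_with_limit(src, dst, max_bytes, chunk_size=65536):
    """Copy src to dst, aborting once more than max_bytes have been read.

    Returns the number of bytes copied. The chunk that crosses the
    limit is never written to dst.
    """
    total = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            return total
        total += len(chunk)
        if total > max_bytes:
            raise TooBig(f"input exceeded {max_bytes} bytes")
        dst.write(chunk)
```

The caller decides what to do on TooBig, for example dropping the connection and recording the event so an external policy can act on repeat offenders.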
> - receive.advertisealternates; basically turn off ".have"
> advertisement.
I think this is a sane thing to do, especially with some hints on
the most common reference tree everybody is expected to know about.
> - receive.hiderefs; this is going to become redundant with Junio's
> implementation
Yup.
> - an audit reflog; we keep a reflog for all refs at the root of the
> repository. It differs from a regular reflog in that:
>
> 1. It never expires.
>
> 2. It is not part of reachability analysis.
>
> 3. It includes the refname for each entry, so you can see
> deletions.
Interesting.
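To show why carrying the refname in each entry makes deletions visible, here is a small sketch. The record layout (refname, old id, new id, then a tab and a message) is invented for illustration and is not the on-disk format of the audit log discussed here; the only git convention relied on is that a deletion is recorded with an all-zero new id.

```python
NULL_SHA = "0" * 40  # git's convention for "no object", used for creation and deletion


def format_entry(refname, old, new, message):
    """Build one hypothetical audit-log line with the refname first."""
    return f"{refname} {old} {new}\t{message}\n"


def deleted_refs(lines):
    """Return refnames whose most recent entry records a deletion."""
    last_new = {}
    for line in lines:
        head, _, _message = line.rstrip("\n").partition("\t")
        refname, _old, new = head.split(" ")  # refnames cannot contain spaces
        last_new[refname] = new
    return sorted(ref for ref, new in last_new.items() if new == NULL_SHA)
```

A per-ref reflog is removed together with its ref, so a single log keyed by refname is what lets the deletion itself stay on record.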
> - ignore some fsck warnings under transfer.fsckobjects; some of them
> are annoyingly common when people pull old history from an
> existing project and try to push it back up.
Depending on the implementation, this may be very valuable.