From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on dcvr.yhbt.net X-Spam-Level: X-Spam-ASN: AS31976 209.132.180.0/23 X-Spam-Status: No, score=-6.5 required=3.0 tests=AWL,BAYES_00, HEADER_FROM_DIFFERENT_DOMAINS,RCVD_IN_DNSWL_HI,RP_MATCHES_RCVD shortcircuit=no autolearn=ham autolearn_force=no version=3.4.0 Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by dcvr.yhbt.net (Postfix) with ESMTP id 306342018A for ; Fri, 1 Jul 2016 20:11:14 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1752166AbcGAUCm (ORCPT ); Fri, 1 Jul 2016 16:02:42 -0400 Received: from cloud.peff.net ([50.56.180.127]:39351 "HELO cloud.peff.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with SMTP id S1751900AbcGAUCl (ORCPT ); Fri, 1 Jul 2016 16:02:41 -0400 Received: (qmail 27495 invoked by uid 102); 1 Jul 2016 20:01:27 -0000 Received: from Unknown (HELO peff.net) (10.0.1.2) by cloud.peff.net (qpsmtpd/0.84) with SMTP; Fri, 01 Jul 2016 16:01:27 -0400 Received: (qmail 22184 invoked by uid 107); 1 Jul 2016 20:01:43 -0000 Received: from sigill.intra.peff.net (HELO sigill.intra.peff.net) (10.0.0.7) by peff.net (qpsmtpd/0.84) with SMTP; Fri, 01 Jul 2016 16:01:43 -0400 Received: by sigill.intra.peff.net (sSMTP sendmail emulation); Fri, 01 Jul 2016 16:01:23 -0400 Date: Fri, 1 Jul 2016 16:01:23 -0400 From: Jeff King To: Stefan Beller Cc: Dennis Kaarsemaker , "git@vger.kernel.org" , Dan Wang Subject: Re: [RFC PATCHv1 0/4] Push options in C Git Message-ID: <20160701200123.GB23141@sigill.intra.peff.net> References: <20160630005951.7408-1-sbeller@google.com> <20160701070902.GF5358@sigill.intra.peff.net> <20160701175503.GA16235@sigill.intra.peff.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Sender: git-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org On Fri, Jul 01, 2016 at 11:25:40AM -0700, Stefan Beller wrote: > > So a litmus test for me would be something like: > > > > Can you get the same effect with: > > > > git push $remote my-branch && > > curl https://$remote/api/do-the-thing?q=my-branch > > > > as you would with: > > > > git push --push-data=do-the-thing $remote mybranch > > People are happy when they can use one tool instead of two, > e.g. "Deploy with Git" is a thing[1] and maybe they want to add a > switch "once pushed skip the tests, I know they are broken". > > [1] https://devcenter.heroku.com/articles/git Yes, but people are also happy when they can use a flexible and standardized tool to do a thing. I'd be more frustrated when I found out that Git's data-pushing protocol has arbitrary limitations (like, say, I can't push a data item larger than a single 64K pkt-line), which would easily just work with something like HTTP POSTs. I'm not saying my point trumps yours; just that there are a lot of tradeoffs in a design like this. > > With Gerrit, how do you refer to the commits you just pushed, when > > making a separate action? My impression is that Gerrit assigns > > change-ids centrally when you push to refs/for/master; > > actually you install a commit hook to do it decentralized. Ah, OK. So you could do roughly the same thing: git push ... && id=$( git cat-file commit HEAD | perl -lne '/^Change-Id: (.*)/ and print $1' ) && curl https://$remote/api/do-something?id=$id (Yes, I know Gerrit doesn't necessarily have an API like that; I was mostly just wondering how it compared to the GitHub PR example I had laid out). > I am used to read high quality emails from you and asked myself what I can > learn by reading this email. "Trying to figure out your point", nothing more. Heh. Sorry, they can't all be winners. ;) > So you suggest that we only allow key=value pairs and not "key" only? > I am not sure if that makes it easier, though (what part is easier?) compared > to completely free form. I had just assumed they would always have a key and a value. If you have no key, then you are stuck parsing positionally to know what each value means. If you have no value, you can treat each element like a boolean true. But it's not that much worse to turn "foo" into "foo=true". > > but they do complicate the server implementation (how do we > > communicate them to the hooks? Should there be limits for safety and > > DoS protection on the server?) > > Oh good point! How do you handle DoS protection for pushing large > blobs, or weird history (that may blow up the before-behind > implementation as referenced in the "topological index field for > commit objects" thread) At GitHub we use a combination of push-time size heuristics and CPU run-time limits. So at push time, we reject objects larger than 100MB (uncompressed) in index-pack, and we'll refuse to make a single allocation larger than a gigabyte or so (to prevent index-pack blowing up while trying to figure out how big the object is). We reject any single pack larger than 2GB. Rejected objects never even make it to the repo, and we clean up the tmp packs immediately. So you can't just stream data at us endlessly and chew up disk or RAM (you can chew up plenty of CPU, though). That prevents the most egregious DoS problems, and protects us from crap like "whoops, this tree entry has a 2GB name, and it wraps an integer somewhere deep in git that probably should have been a size_t". But git is complicated enough that you can still get plenty of malicious crap through. So we aggressively limit completion times for most operations, and we keep track of the runtime of every single git operation and correlate it with the repo, user, and source IP, and apply quotas when there's excessive use. That's not nice for a user who has a legitimately big request, but it prevents them from impacting other users (so it's a baseline, and then we try to optimize things that don't fit into the timeouts, or that cause users to have quotas applied). That was a bit of a tangent, but back to this topic: I'd probably want some kind of hard limit on the number of data items and the size of each (though if they are single pkt-lines, that puts a hard limit already). > Shouldn't that be server specific as well? Potentially, yes. > Mind that I proposed a "receive.advertisePushOptions" config option. > Maybe that can be fine tuned in a later patch "receive.PushOptionSizeLimit", > "receive.maxPushOptions" or such? Yep, I think that is a good approach (presumably with some sane safe-ish defaults). > Talking about authentication, any tool that we additionally have "to > just trigger this one thing remotely" would need to learn about proper > auth, too. so it will not be just a small helper script, but blows up > into a large thing. I think we could avoid that duplicated effort, > too. I think the right tool there is credential helpers. And I don't think it's a new problem. You already can log into Gerrit via the web browser, but you have to have separate auth for running git. Hopefully there are tools to make that less painful (either pulling the browser cookie, or getting an opaque application token from the web app to feed into git). -Peff