Subject: Re: [RFC PATCH 1/3] promised-blob, fsck: introduce promised blobs
From: Jeff Hostetler
To: Jonathan Nieder
Cc: Jonathan Tan, git@vger.kernel.org
Date: Thu, 13 Jul 2017 11:05:09 -0400

On 7/13/2017 10:48 AM, Jeff Hostetler wrote:
>
>
> On 7/12/2017 3:28 PM, Jonathan Nieder wrote:
>> Hi,
>>
>> Jeff Hostetler wrote:
>>
>>> My primary concern is scale and managing the list of objects over time.
>> [...]
>>> For
>>> example, on the Windows repo we have (conservatively) 100M+ blobs (and
>>> growing). Assuming 28 bytes per, gives a 2.8GB list to be manipulated.
>>>
>>> If I understand your proposal, newly-omitted blobs would need to be
>>> merged into the promised-blob list after each fetch. The fetch itself
>>> may not have that many new entries, but inserting them into the existing
>>> list will be slow.
>>
>> This is a good point. An alternative would be to avoid storing the
>> list and instead use a repository extension that treats all missing
>> blobs in the repository similarly to promised blobs (with a weaker
>> promise, where the server is allowed to return 404). The downsides:
>>
>> - blob sizes are not available without an additional request, e.g. for
>>   directory listings
>>
>> - the semantics of has_object_file become more complicated. If it
>>   is allowed to return false for omitted blobs, then callers have to
>>   be audited to tolerate that and try to look up the blob anyway.
>>   If it has to contact the server to find out whether an omitted blob
>>   is available, then callers have to be audited to skip this expensive
>>   operation when possible.
>>
>> - similarly, the semantics of sha1_object_info{,_extended} become more
>>   complicated. If they are allowed to return -1 for omitted blobs,
>>   then callers have to be audited to handle that. If they have to
>>   contact the server to find the object type and size, it becomes
>>   expensive in a way that affects callers.
>>
>> - causes futile repeated requests to the server for objects that don't
>>   exist. Caching negative lookups is fussy because a later push could
>>   cause those objects to exist --- though it should be possible for
>>   fetches to invalidate entries in such a cache using the list of
>>   promised blobs sent by the server.
>
> Also good points. :-)
>
> Would it be better to:
> [] let fetch-pack/index-pack on the client compute the set of missing
>    objects based upon the commit range in the response.
> [] build a "pack-.promised" as a peer of the "pack-.[pack,idx]"
>    files.
> [] Use Jonathan's binary format with (-1LL) for the sizes, but also add
>    a flags/object-type word. Since the local commit/object traversal
>    usually knows the type of object (either by context or the mode field
>    in the parent entry), index-pack can set it immediately.
> [] Repack would "do the right thing" when building new packfiles from
>    existing packfiles (and merge/prune/etc. a new promise file).
>
> Conceptually, these "pack-.promised" files would be read-only once
> created, like the corresponding .pack/.idx files. But since we have a
> fixed record size, we could let the dynamic object size requests update
> the sizes in-place. (And if an update failed for some reason, the
> client could just go on and use the result (as we do when trying to
> update mtime info in the index).)
>
> This gives us the full set of omitted objects without the network
> overhead and without the overhead of merging them into a single list.
> And we defer the cost of getting an object's size until we actually
> need the info.
>
>
>> [...]
>>> In such a "sparse clone", it would be nice to omit unneeded tree objects
>>> in addition to just blobs. I say that because we are finding with GVFS
>>> on the Windows repo, that even with commits-and-trees-only filtering,
>>> the number of tree objects is overwhelming. So I'm also concerned about
>>> limiting the list to just blobs. If we need to have this list, it
>>> should be able to contain any object. (Suggesting having an object type
>>> in the entry.)
>>
>> Would it work to have a separate lists of blobs and trees (either in
>> separate files or in the same file)?
>>
>> One option would be to add a version number / magic string to the start
>> of the file. That would allow making format changes later without a
>> proliferation of distinct repository extensions.
>
> As I suggested above, we could just put them in the same file with a
> flags/object-type field.
>
> Without getting too nerdy here, we could even say that the size doesn't
> need to be a int:64 -- definitely bigger than int:32, but we don't need
> the full int:64. We could make the size a int:40 or int:48 and use the
> rest of the QWORD for flags/object-type.
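
To make that last bit concrete, a fixed-width entry along those lines
might look roughly like the following. (Just a sketch -- the struct and
macro names and the exact 16/48 bit split are made up for illustration,
and on disk the QWORD would presumably be stored in network byte order
like the rest of the pack machinery.)

#include <stdint.h>

/*
 * Hypothetical fixed-width record for the proposed per-pack promise
 * file: the object id, then one QWORD holding the object type (and
 * any flags) in the high 16 bits and the size in the low 48 bits.
 * Illustrative names only, not actual Git code.
 */
struct promised_entry {
    unsigned char oid[20];     /* object name (SHA-1) */
    uint64_t type_and_size;    /* type/flags:16, size:48 */
};

#define PROMISED_SIZE_BITS    48
#define PROMISED_SIZE_MASK    ((((uint64_t)1) << PROMISED_SIZE_BITS) - 1)
#define PROMISED_SIZE_UNKNOWN PROMISED_SIZE_MASK   /* "-1": size not known yet */

static inline uint64_t promised_encode(unsigned type, uint64_t size)
{
    return ((uint64_t)type << PROMISED_SIZE_BITS) |
           (size & PROMISED_SIZE_MASK);
}

static inline unsigned promised_type(uint64_t v)
{
    return (unsigned)(v >> PROMISED_SIZE_BITS);
}

static inline uint64_t promised_size(uint64_t v)
{
    return v & PROMISED_SIZE_MASK;
}

Because every record is the same width, filling in a size later only
means overwriting the low bits of that one QWORD at a fixed offset,
rather than rewriting the whole file.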

>> [...]
>>> I assume that we'll also need a promised-blob.lock file to control
>>> access during list manipulation. This is already a sore spot with the
>>> index; I'd hate to create another one.
>>
>> Can you say more about this concern? My experience with concurrent
>> fetches has already not been great (since one fetch process is not
>> aware of what the other has fetched) --- is your concern that the
>> promised-blob facility would affect pushes as well some day?
>
> [I wrote this before I wrote the multi-peer-file suggestion above.]
>
> I'm assuming that anyone adding a new promised-blob to the single file
> will use the standard "read foo, write foo.lock, rename" trick. If we
> have a large number of omitted blobs, this file will be large and the
> insert will take a non-trivial amount of time.
>
> Concurrent fetches do sound a bit silly, so we only have to worry
> about the time to copy the bits on disk.
>
> If other operations (such as demand loading missing blobs in a log or
> diff) need to remove promised-blobs from the list as they create
> loose blobs, then they will get stuck on the lock. I suppose we could
> just say that the promised-blob list may have false-positives in it
> (because lookups in it won't happen until the packed and loose objects
> have been searched and failed) and just wait for GC to clean it.
>
> And there's always that stale lock problem where something dies holding
> the lock and doesn't delete it.
>
> Some of these could be addressed with a DB or something that allows
> incremental and atomic updates rather than a regular index-like
> flat file, but I don't want to dive into that here.
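
For clarity, the whole-file update I'm assuming above is essentially
the following (a bare-bones sketch with a made-up helper name, not
Git's actual lockfile machinery; short writes, fsync, and cleanup on
signals are all ignored here):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Sketch of the "write foo.lock, then rename onto foo" update.
 * Creating the lock with O_EXCL is what serializes writers -- and
 * what leaves a stale foo.lock behind if we die before the rename.
 */
static int update_promise_list(const char *path, const char *lockpath,
                               const void *buf, size_t len)
{
    int fd = open(lockpath, O_WRONLY | O_CREAT | O_EXCL, 0666);
    if (fd < 0)
        return -1;              /* someone else holds the lock */

    if (write(fd, buf, len) != (ssize_t)len) {
        close(fd);
        unlink(lockpath);
        return -1;
    }
    close(fd);

    return rename(lockpath, path);  /* atomic replace */
}

Note that the entire new list has to be written out on every update,
which is where a file in the gigabyte range starts to hurt.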

>>> I also have to wonder about the need to have a complete list of omitted
>>> blobs up front. It may be better to just relax the consistency checks
>>> and assume a missing blob is "intentionally missing" rather than
>>> indicating a corruption somewhere.
>>
>> We've discussed this previously on list and gone back and forth. :)
>>
>>> And then let the client do a later
>>> round-trip to either demand-load the object -or- demand-load the
>>> existence/size info if/when it really matters.
>>
>> The cost of demand-loading this existence/size information is what
>> ultimately convinced me of this approach.
>>
>> But I can see how the tradeoffs differ between the omit-large-blobs
>> case and the omit-all-blobs case. We might end up having to support
>> both modes. :(
>
> I think I've been in both camps myself.
>
> I'm not sure it would be that hard to support both (other than being
> unfortunate), assuming pack-objects could build the same binary data
> as I suggested be computed by fetch-pack/index-pack above. It could
> then just be a run-time decision.
>
> I could even see it being useful -- do the initial clone with
> client-side computation (because you're getting mostly ancient data
> that you don't care about) and then let fetches use server-side
> computation (because you're more likely to care about the sizes of
> objects near your branch heads).
>
>
>>> Maybe we should add a verb to your new fetch-blob endpoint to just get
>>> the size of one or more objects to help with this.
>>
>> No objections from me, though we don't need it yet.

I think it would also be helpful to consider more than one type of
size request:

[1] get the size of a single object
[2] get the sizes of a set of objects
[3] get the sizes for all objects directly referenced by a tree.
    if both sides have a tree object, then we don't need to send a
    list of blobs -- just ask for all the blobs referenced by the
    tree object. the server could just return an array of sizes
    ordered by blob position in the tree.
[4] recursive version of [3].
[5] recursive for a commit (basically just a convenience on [4] and
    the tree root).

This would let the clone do a client-side computation and then (in a
request to the new end-point) get the sizes of everything in HEAD,
for example.

Thanks,
Jeff