Date: Fri, 20 May 2022 19:17:52 -0400
From: Taylor Blau
To: git@vger.kernel.org
Cc: avarab@gmail.com, derrickstolee@github.com, gitster@pobox.com, jrnieder@gmail.com, larsxschneider@gmail.com, tytso@mit.edu
Subject: [PATCH v5 08/17] builtin/pack-objects.c: --cruft without expiration

Teach `pack-objects` how to generate a cruft pack when no objects are
dropped (i.e., `--cruft-expiration=never`). Later patches will teach
`pack-objects` how to generate a cruft pack that prunes objects.

When generating a cruft pack which does not prune objects, we want to
collect all unreachable objects into a single pack (noting and updating
their mtimes as we accumulate them). Ordinary use will pass the result
of a `git repack -A` as a kept pack, so when this patch says "kept
pack", readers should think "reachable objects".

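To make the "kept pack" framing concrete, here is a minimal sketch of the
selection rule: an object goes into the cruft pack only if no copy of it
lives in a kept pack, and its recorded mtime is updated as more copies are
seen. This is not code from this patch. The struct and function names are
hypothetical, and the sketch assumes that the newest mtime wins when the
same object is seen more than once.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* Hypothetical stand-in for one copy of an object found during enumeration. */
struct seen_object {
	const char *oid;   /* object name */
	time_t mtime;      /* mtime of the pack or loose file holding this copy */
	int in_kept_pack;  /* nonzero if this copy lives in a kept pack */
};

/* Hypothetical cruft pack entry: one per object, with its accumulated mtime. */
struct cruft_entry {
	const char *oid;
	time_t mtime;
};

/* Return 1 if any copy of `oid` was found in a kept pack. */
static int oid_is_kept(const struct seen_object *seen, size_t nr,
		       const char *oid)
{
	for (size_t i = 0; i < nr; i++)
		if (seen[i].in_kept_pack && !strcmp(seen[i].oid, oid))
			return 1;
	return 0;
}

/*
 * Fill `out` (which must have room for `nr` entries) with every object
 * that has no copy in a kept pack. Returns the number of entries written.
 * Assumption: when an object is seen again, the newer mtime replaces the
 * older one.
 */
size_t collect_cruft(const struct seen_object *seen, size_t nr,
		     struct cruft_entry *out)
{
	size_t nr_out = 0;

	for (size_t i = 0; i < nr; i++) {
		size_t j;

		if (oid_is_kept(seen, nr, seen[i].oid))
			continue; /* "kept" stands in for "reachable" */

		/* Look for an entry we already accumulated for this object. */
		for (j = 0; j < nr_out; j++)
			if (!strcmp(out[j].oid, seen[i].oid))
				break;

		if (j == nr_out) {
			out[nr_out].oid = seen[i].oid;
			out[nr_out].mtime = seen[i].mtime;
			nr_out++;
		} else if (seen[i].mtime > out[j].mtime) {
			out[j].mtime = seen[i].mtime;
		}
	}
	return nr_out;
}
```

The quadratic lookups keep the sketch short; a real implementation would
use a hash table keyed on the object name.
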
Generating a non-expiring cruft pack works as follows:

  - Callers provide a list of every pack they know about, and indicate
    which packs are about to be removed.

  - All packs which are going to be removed (we'll call these the
    redundant ones) are marked as kept in-core.

    Any packs the caller did not mention (but are known to the
    `pack-objects` process) are also marked as kept in-core. Packs not
    mentioned by the caller are assumed to be unknown to them, i.e.,
    they entered the repository after the caller decided which packs
    should be kept and which should be discarded.

    Since we do not want to include objects in these "unknown" packs
    (because we don't know which of their objects are or aren't
    reachable), these are also marked as kept in-core.

  - Then, we enumerate all objects in the repository, and add them to
    our packing list if they do not appear in an in-core kept pack.

This results in a new cruft pack which contains all known objects that
aren't included in the kept packs. When the kept pack is the result of
`git repack -A`, the resulting pack contains all unreachable objects.

Signed-off-by: Taylor Blau
---
 Documentation/git-pack-objects.txt |  30 ++++
 builtin/pack-objects.c             | 201 +++++++++++++++++++++++++-
 object-file.c                      |   2 +-
 object-store.h                     |   2 +
 t/t5329-pack-objects-cruft.sh      | 218 +++++++++++++++++++++++++++++
 5 files changed, 448 insertions(+), 5 deletions(-)
 create mode 100755 t/t5329-pack-objects-cruft.sh

diff --git a/Documentation/git-pack-objects.txt b/Documentation/git-pack-objects.txt
index f8344e1e5b..a9995a932c 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -13,6 +13,7 @@ SYNOPSIS
 	[--no-reuse-delta] [--delta-base-offset] [--non-empty]
 	[--local] [--incremental] [--window=] [--depth=]
 	[--revs [--unpacked | --all]] [--keep-pack=]
+	[--cruft] [--cruft-expiration=