From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-9.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 43505C5517A for ; Wed, 11 Nov 2020 19:44:05 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id D2C1E208B3 for ; Wed, 11 Nov 2020 19:44:04 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=ttaylorr-com.20150623.gappssmtp.com header.i=@ttaylorr-com.20150623.gappssmtp.com header.b="f8QjcU2R" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727841AbgKKToD (ORCPT ); Wed, 11 Nov 2020 14:44:03 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:55096 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727830AbgKKToD (ORCPT ); Wed, 11 Nov 2020 14:44:03 -0500 Received: from mail-qk1-x729.google.com (mail-qk1-x729.google.com [IPv6:2607:f8b0:4864:20::729]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 5FEDDC0613D1 for ; Wed, 11 Nov 2020 11:44:03 -0800 (PST) Received: by mail-qk1-x729.google.com with SMTP id d9so2903380qke.8 for ; Wed, 11 Nov 2020 11:44:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ttaylorr-com.20150623.gappssmtp.com; s=20150623; h=date:from:to:cc:subject:message-id:references:mime-version :content-disposition:in-reply-to; bh=HWQ/0PA991tIXD805VXbeNtT0ZCfavw0PqLbVnE1H9g=; b=f8QjcU2ROPDrR0ZFTjjzSjhnY+mzgb8+HDvL76xAzsnlItRAFEmw8mp6QA887XHGoP 0u3Syn0hPbV0T/IeaOS6mUDi2ftzHnzwkZV1jmtz/F5KzURY2544M/iJd+dc086Wv4ZF NH7aIqF0oz5xjo+fX6Wg9ks0JIfX1VyfmEkIccBEpHiS4Gx8/5HaOFCbjSU/v3hO9e2v nn3UbbSARQi2S5cDAyi24ZbrvKBFZ4TPz0nFtOhFTsbrnY5RD5UUhN13IBpRzr1rnGAa 5S2qUSBoeJ8D9umG20O3Jiudm7m4slEwPgsw3ohTWjh2xZYUIXhYERh+vqNYKVOQ9GEn UZ6w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:in-reply-to; bh=HWQ/0PA991tIXD805VXbeNtT0ZCfavw0PqLbVnE1H9g=; b=FkDrLdOGovRsIJ2a2lXvidp54EcqfHnOLMhAY8gpNvh1xLfEK5UKIitsr7Q5P1vQ9M 8fv/9nkm+/dT5U2MGGzE5PBtPUGQqKlrxomeYpdv0k5LeuGPswt6nSo4X/JdZQe6V5Yu W7Cx3++GNrhpeqaNFYme0Fy+zxvewdAjAMgFGJeS34VldECFMAsKbU+4k4n8q5FbahVa WS6WPYG/5rVZ+c5VNO26zpT7/Yal/aFMQ84cu87jNR1ylqyYzhlzfDsqx7cumqP97Ivo HeVHjUVYoasVa5fPMaw0T+4MYk7TsmUpxpv7Izrl+SobIfdznJaM7XQoIJW9h4OUk8d3 7+zg== X-Gm-Message-State: AOAM531pNlQKp2nBy9Hpme6bQpxN+Asj7F9U+ac2GI1HZcGti5p0d37M BRMWnuVrsglStDojM/P4i5zdqX0r/Za+Xu6b X-Google-Smtp-Source: ABdhPJwktK5j5DPnV9yw04DQRmsPqRGG+LXyjC7/XmX+Ia6UYXbaVV0J1SXQ9LyJxrCezHovM7hkbw== X-Received: by 2002:a37:7345:: with SMTP id o66mr27393731qkc.222.1605123842141; Wed, 11 Nov 2020 11:44:02 -0800 (PST) Received: from localhost ([2605:9480:22e:ff10:7ccc:9a51:1ad:2057]) by smtp.gmail.com with ESMTPSA id q1sm2990054qti.95.2020.11.11.11.44.01 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 11 Nov 2020 11:44:01 -0800 (PST) Date: Wed, 11 Nov 2020 14:43:59 -0500 From: Taylor Blau To: git@vger.kernel.org Cc: dstolee@microsoft.com, gitster@pobox.com, peff@peff.net Subject: [PATCH 18/23] pack-bitmap-write: ignore BITMAP_FLAG_REUSE Message-ID: <99a295416b2714c183aa39dcf515fb625090c2f2.1605123653.git.me@ttaylorr.com> References: MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Disposition: inline In-Reply-To: Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Jeff King The on-disk bitmap format has a flag to mark a bitmap to be "reused". This is a rather curious feature, and works like this: - a run of pack-objects would decide to mark the last 80% of the bitmaps it generates with the reuse flag - the next time we generate bitmaps, we'd see those reuse flags from the last run, and mark those commits as special: - we'd be more likely to select those commits to get bitmaps in the new output - when generating the bitmap for a selected commit, we'd reuse the old bitmap as-is (rearranging the bits to match the new pack, of course) However, neither of these behaviors particularly makes sense. Just because a commit happened to be bitmapped last time does not make it a good candidate for having a bitmap this time. In particular, we may choose bitmaps based on how recent they are in history, or whether a ref tip points to them, and those things will change. We're better off re-considering fresh which commits are good candidates. Reusing the existing bitmap _is_ a reasonable thing to do to save computation. But only reusing exact bitmaps is a weak form of this. If we have an old bitmap for A and now want a new bitmap for its child, we should be able to compute that only by looking at trees and that are new to the child. But this code would consider only exact reuse (which is perhaps why it was eager to select those commits in the first place). Furthermore, the recent switch to the reverse-edge algorithm for generating bitmaps dropped this optimization entirely (and yet still performs better). So let's do a few cleanups: - drop the whole "reusing bitmaps" phase of generating bitmaps. It's not helping anything, and is mostly unused code (or worse, code that is using CPU but not doing anything useful) - drop the use of the on-disk reuse flag to select commits to bitmap - stop setting the on-disk reuse flag in bitmaps we generate (since nothing respects it anymore) We will keep a few innards of the reuse code, which will help us implement a more capable version of the "reuse" optimization: - simplify rebuild_existing_bitmaps() into a function that only builds the mapping of bits between the old and new orders, but doesn't actually convert any bitmaps - make rebuild_bitmap() public; we'll call it lazily to convert bitmaps as we traverse (using the mapping created above) Signed-off-by: Jeff King Signed-off-by: Taylor Blau --- builtin/pack-objects.c | 1 - pack-bitmap-write.c | 50 +++++------------------------------------- pack-bitmap.c | 46 +++++--------------------------------- pack-bitmap.h | 6 ++++- 4 files changed, 16 insertions(+), 87 deletions(-) diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5617c01b5a..2a00358f34 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -1104,7 +1104,6 @@ static void write_pack_file(void) stop_progress(&progress_state); bitmap_writer_show_progress(progress); - bitmap_writer_reuse_bitmaps(&to_pack); bitmap_writer_select_commits(indexed_commits, indexed_commits_nr, -1); bitmap_writer_build(&to_pack); bitmap_writer_finish(written_list, nr_written, diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c index 7b4fc0f304..1995f75818 100644 --- a/pack-bitmap-write.c +++ b/pack-bitmap-write.c @@ -30,7 +30,6 @@ struct bitmap_writer { struct ewah_bitmap *tags; kh_oid_map_t *bitmaps; - kh_oid_map_t *reused; struct packing_data *to_pack; struct bitmapped_commit *selected; @@ -112,7 +111,7 @@ void bitmap_writer_build_type_index(struct packing_data *to_pack, * Compute the actual bitmaps */ -static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitmap *reused) +static inline void push_bitmapped_commit(struct commit *commit) { if (writer.selected_nr >= writer.selected_alloc) { writer.selected_alloc = (writer.selected_alloc + 32) * 2; @@ -120,7 +119,7 @@ static inline void push_bitmapped_commit(struct commit *commit, struct ewah_bitm } writer.selected[writer.selected_nr].commit = commit; - writer.selected[writer.selected_nr].bitmap = reused; + writer.selected[writer.selected_nr].bitmap = NULL; writer.selected[writer.selected_nr].flags = 0; writer.selected_nr++; @@ -372,13 +371,6 @@ static void store_selected(struct bb_commit *ent, struct commit *commit) khiter_t hash_pos; int hash_ret; - /* - * the "reuse bitmaps" phase may have stored something here, but - * our new algorithm doesn't use it. Drop it. - */ - if (stored->bitmap) - ewah_free(stored->bitmap); - stored->bitmap = bitmap_to_ewah(ent->bitmap); hash_pos = kh_put_oid_map(writer.bitmaps, commit->object.oid, &hash_ret); @@ -477,35 +469,6 @@ static int date_compare(const void *_a, const void *_b) return (long)b->date - (long)a->date; } -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack) -{ - struct bitmap_index *bitmap_git; - if (!(bitmap_git = prepare_bitmap_git(to_pack->repo))) - return; - - writer.reused = kh_init_oid_map(); - rebuild_existing_bitmaps(bitmap_git, to_pack, writer.reused, - writer.show_progress); - /* - * NEEDSWORK: rebuild_existing_bitmaps() makes writer.reused reference - * some bitmaps in bitmap_git, so we can't free the latter. - */ -} - -static struct ewah_bitmap *find_reused_bitmap(const struct object_id *oid) -{ - khiter_t hash_pos; - - if (!writer.reused) - return NULL; - - hash_pos = kh_get_oid_map(writer.reused, *oid); - if (hash_pos >= kh_end(writer.reused)) - return NULL; - - return kh_value(writer.reused, hash_pos); -} - void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps) @@ -519,12 +482,11 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (indexed_commits_nr < 100) { for (i = 0; i < indexed_commits_nr; ++i) - push_bitmapped_commit(indexed_commits[i], NULL); + push_bitmapped_commit(indexed_commits[i]); return; } for (;;) { - struct ewah_bitmap *reused_bitmap = NULL; struct commit *chosen = NULL; next = next_commit_index(i); @@ -539,15 +501,13 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, if (next == 0) { chosen = indexed_commits[i]; - reused_bitmap = find_reused_bitmap(&chosen->object.oid); } else { chosen = indexed_commits[i + next]; for (j = 0; j <= next; ++j) { struct commit *cm = indexed_commits[i + j]; - reused_bitmap = find_reused_bitmap(&cm->object.oid); - if (reused_bitmap || (cm->object.flags & NEEDS_BITMAP) != 0) { + if ((cm->object.flags & NEEDS_BITMAP) != 0) { chosen = cm; break; } @@ -557,7 +517,7 @@ void bitmap_writer_select_commits(struct commit **indexed_commits, } } - push_bitmapped_commit(chosen, reused_bitmap); + push_bitmapped_commit(chosen); i += next + 1; display_progress(writer.progress, i); diff --git a/pack-bitmap.c b/pack-bitmap.c index 82c6bf2843..682f4d19dd 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -1333,9 +1333,9 @@ void test_bitmap_walk(struct rev_info *revs) free_bitmap_index(bitmap_git); } -static int rebuild_bitmap(uint32_t *reposition, - struct ewah_bitmap *source, - struct bitmap *dest) +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest) { uint32_t pos = 0; struct ewah_iterator it; @@ -1364,19 +1364,11 @@ static int rebuild_bitmap(uint32_t *reposition, return 0; } -int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, - struct packing_data *mapping, - kh_oid_map_t *reused_bitmaps, - int show_progress) +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping) { uint32_t i, num_objects; uint32_t *reposition; - struct bitmap *rebuild; - struct stored_bitmap *stored; - struct progress *progress = NULL; - - khiter_t hash_pos; - int hash_ret; num_objects = bitmap_git->pack->num_objects; reposition = xcalloc(num_objects, sizeof(uint32_t)); @@ -1394,33 +1386,7 @@ int rebuild_existing_bitmaps(struct bitmap_index *bitmap_git, reposition[i] = oe_in_pack_pos(mapping, oe) + 1; } - rebuild = bitmap_new(); - i = 0; - - if (show_progress) - progress = start_progress("Reusing bitmaps", 0); - - kh_foreach_value(bitmap_git->bitmaps, stored, { - if (stored->flags & BITMAP_FLAG_REUSE) { - if (!rebuild_bitmap(reposition, - lookup_stored_bitmap(stored), - rebuild)) { - hash_pos = kh_put_oid_map(reused_bitmaps, - stored->oid, - &hash_ret); - kh_value(reused_bitmaps, hash_pos) = - bitmap_to_ewah(rebuild); - } - bitmap_reset(rebuild); - display_progress(progress, ++i); - } - }); - - stop_progress(&progress); - - free(reposition); - bitmap_free(rebuild); - return 0; + return reposition; } void free_bitmap_index(struct bitmap_index *b) diff --git a/pack-bitmap.h b/pack-bitmap.h index 1203120c43..afa4115136 100644 --- a/pack-bitmap.h +++ b/pack-bitmap.h @@ -73,7 +73,11 @@ void bitmap_writer_set_checksum(unsigned char *sha1); void bitmap_writer_build_type_index(struct packing_data *to_pack, struct pack_idx_entry **index, uint32_t index_nr); -void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack); +uint32_t *create_bitmap_mapping(struct bitmap_index *bitmap_git, + struct packing_data *mapping); +int rebuild_bitmap(const uint32_t *reposition, + struct ewah_bitmap *source, + struct bitmap *dest); void bitmap_writer_select_commits(struct commit **indexed_commits, unsigned int indexed_commits_nr, int max_bitmaps); void bitmap_writer_build(struct packing_data *to_pack); -- 2.29.2.156.gc03786897f