From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.6 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE, SPF_PASS,URIBL_BLOCKED autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id C83E0C55178 for ; Fri, 6 Nov 2020 17:56:34 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 6224C206FA for ; Fri, 6 Nov 2020 17:56:34 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=gmail.com header.i=@gmail.com header.b="Q3j3L0fI" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1727320AbgKFR4d (ORCPT ); Fri, 6 Nov 2020 12:56:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:48634 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726867AbgKFR4c (ORCPT ); Fri, 6 Nov 2020 12:56:32 -0500 Received: from mail-lj1-x242.google.com (mail-lj1-x242.google.com [IPv6:2a00:1450:4864:20::242]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 35AE0C0613CF for ; Fri, 6 Nov 2020 09:56:32 -0800 (PST) Received: by mail-lj1-x242.google.com with SMTP id y16so2300410ljk.1 for ; Fri, 06 Nov 2020 09:56:32 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:references:date:in-reply-to:message-id :user-agent:mime-version:content-transfer-encoding; bh=HYIO94DG7IsvnSGiMySBVAQVyL9OSkdShE0hidSeqas=; b=Q3j3L0fI+l5/RsjBaKApSSGCP6QSY2bxaXdhhMtpSynsDRo9Y1Q9dzKaGyue+RBZLN Vj+nw5p44+S+nmkEPgE4ZEpfFD0h3b9OHPqAjuX6sEncZYjXF1/l27Xde6tLgH3gcRim G9+lkDHu6QveynENe951ChAxePRex7AJl44WfurJugPQDcRSiWT+l9a0TfoWZ13v/hm3 lDBJuEt5qBH4XIO0eO6ptryZIj/4OMu2KM2AUr0sSLz5Gnb4d6WDOB5E++uj5JS1XCiX HVxB12BXy52SOZNhDNkmk8uopnfkCDXEBjB5AwDHtGAyY4nxx7AL0i/qZ/ChoStatWzB 89bg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:references:date:in-reply-to :message-id:user-agent:mime-version:content-transfer-encoding; bh=HYIO94DG7IsvnSGiMySBVAQVyL9OSkdShE0hidSeqas=; b=FBalXt12oY9GofK/+c3KpQK60d6MKke0MebT1RfZ4C8Puqr0suHiRphogV83xU3qsy QHwsLaG8ZRPkaZSER8wzo2a7mMF8+daq6vXfcg69fWJYBgddrsF0iHV+K1NsdR79U75D HeHNVhIKolFQ/fLdMUiuKNEf02VlwtaV3AgrueDVlgnfW2qCQ0+oiToSNqu38Z+Jih1r 9MB7psZ6CR7N7Fx52QlkEFUnm6iIgdWW/1Qk4PJ58akAOgw7R4h8AoOaaCUClPgUH6rY QZYfqeckQ4ZK4LPbFwyNfmQVd5ON78jEZtx/VoKsgPRkWoXe/ZTLfb+oXVRiBZfxuEHx YxGw== X-Gm-Message-State: AOAM532GjPDPViI8XTo0C9MFqW1U9T9Ud+LHH+si3WsQuxoqOInAuzLG JPIddNdYj0Mkg50mFL+dGandIl9Bu8wqyhN0 X-Google-Smtp-Source: ABdhPJxo5oo1JVtwtCL1j0kMdSnWT0zrgMDqqoM1px4W0NazAJHjPEfjZ3U8AiUqdECVnbZtkAxffQ== X-Received: by 2002:a2e:9746:: with SMTP id f6mr1215860ljj.270.1604685390565; Fri, 06 Nov 2020 09:56:30 -0800 (PST) Received: from LAPTOP-ACER-ASPIRE-F5 (host-89-229-7-83.dynamic.mm.pl. [89.229.7.83]) by smtp.gmail.com with ESMTPSA id 2sm241524lfd.67.2020.11.06.09.56.29 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Fri, 06 Nov 2020 09:56:29 -0800 (PST) From: Jakub =?utf-8?Q?Nar=C4=99bski?= To: Abhishek Kumar Cc: git@vger.kernel.org, Derrick Stolee , Taylor Blau , Abhishek Kumar via GitGitGadget Subject: Re: [PATCH v4 07/10] commit-graph: implement generation data chunk References: <854kmbx4pi.fsf@gmail.com> <20201106112513.GA1553@Abhishek-Arch> Date: Fri, 06 Nov 2020 18:56:28 +0100 In-Reply-To: <20201106112513.GA1553@Abhishek-Arch> (Abhishek Kumar's message of "Fri, 6 Nov 2020 16:55:13 +0530") Message-ID: <858sbel67n.fsf@gmail.com> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.3 (windows-nt) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org In short: I think that because current implementation of writing GDOV chunk follows an example of writing EDGE chunk, it should be left as it is now (simple), and posible performance improvements be postponed to some future commit. Abhishek Kumar writes: > On Fri, Oct 30, 2020 at 01:45:29PM +0100, Jakub Nar=C4=99bski wrote: [...] >>> We test the overflow-related code with the following repo history: >>> >>> F - N - U >>> / \ >>> U - N - U N >>> \ / >>> N - F - N >>=20 >> Do we need such complex history? I guess we need to test the handling of >> merge commits too. >>=20 > > I wanted to test three cases - a root epoch zero commit, a commit that's > far enough in past to overflow the offset and a commit that's far enough > in the future to overflow the offset. All right, if I understand this correctly this would be U as root, U-F pair of commits and N-F pair of commits, respectively. Did I get it right? Anyway, it might be a good idea to put this explanation in the commit message. >>> >>> Where the commits denoted by U have committer date of zero seconds >>> since Unix epoch, the commits denoted by N have committer date of >>> 1112354055 (default committer date for the test suite) seconds since >>> Unix epoch and the commits denoted by F have committer date of >>> (2 ^ 31 - 2) seconds since Unix epoch. >>> >>> The largest offset observed is 2 ^ 31, just large enough to overflow. [...] >>> @@ -1120,6 +1149,44 @@ static int write_graph_chunk_data(struct hashfil= e *f, >>> return 0; >>> } >>>=20=20 >>> +static int write_graph_chunk_generation_data(struct hashfile *f, >>> + struct write_commit_graph_context *ctx) >>> +{ >>> + int i, num_generation_data_overflows =3D 0; >>=20 >> Minor nitpick: in my opinion there should be empty line here, between >> the variables declaration and the code... however not all >> write_graph_chunk_*() functions have it. >>=20 >>> + for (i =3D 0; i < ctx->commits.nr; i++) { >>> + struct commit *c =3D ctx->commits.list[i]; >>> + timestamp_t offset =3D commit_graph_data_at(c)->generation - c->date; >>> + display_progress(ctx->progress, ++ctx->progress_cnt); >>=20 >> All right. >>=20 >>> + >>> + if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { >>> + offset =3D CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_d= ata_overflows; >>> + num_generation_data_overflows++; >>> + } >>=20 >> Hmmm... shouldn't we store these commits that need overflow handling >> (with corrected commit date offset greater than GENERATION_NUMBER_V2_OFF= SET_MAX) >> in a list or a queue, to remember them for writing GDOV chunk? >>=20 > > We could, although write_graph_chunk_extra_edges() (just like this functi= on) > prefers to iterate over all commits again. Both octopus merges and > overflowing corrected commit dates are exceedingly rare, might be > worthwhile to trade some memory to avoid looping again. I'm sorry, I have not looked what write_graph_chunk_extra_edges() does, or rather how it does what it does -- it is a good idea to pattern your solution in similar existing code. For me this is an even stronger hint that we should strive for simplicity first, and leave possible performance improvements for the future commit. Especially that you perform the most significant optimization for this overflow handling: ensuring that we do not perform any work if there are no commits with generation data overflow. Maybe, maybe we should add that information about similarity between write_graph_chunk_generation_data_overflow() and write_graph_chunk_extra_ed= ges()=20 in the commit message. I am unsure... >> We could store oids, or we could store commits themselves, for example: >>=20 >> if (offset > GENERATION_NUMBER_V2_OFFSET_MAX) { >> offset =3D CORRECTED_COMMIT_DATE_OFFSET_OVERFLOW | num_generation_dat= a_overflows; >> num_generation_data_overflows++; >>=20 >> ALLOC_GROW(ctx->gdov_commits.list, ctx->gdov_commits.nr + 1, ctx->gdo= v_commits.alloc); >> ctx->commits.list[ctx->gdov_commits.nr] =3D c; >> ctx->gdov_commits.nr++; >> } >>=20 >> Though in the above proposal we could get rid of `num_generation_data_ov= erflows`,=20 >> as it should be the same as `ctx->gdov_commits.nr`. >>=20 >> I have called the extra commit list member of write_commit_graph_context >> `gdov_commits`, but perhaps a better name would be `commits_gen_v2_overf= low`,=20 >> or similar more descriptive name. [...] >>> @@ -741,4 +742,47 @@ test_expect_success 'corrupt commit-graph write (m= issing tree)' ' >>> ) >>> ' >>>=20=20 >>> +test_commit_with_date() { >>> + file=3D"$1.t" && >>> + echo "$1" >"$file" && >>> + git add "$file" && >>> + GIT_COMMITTER_DATE=3D"$2" GIT_AUTHOR_DATE=3D"$2" git commit -m "$1" >>> + git tag "$1" >>> +} >>=20 >> Here we add a helper function. All right. >>=20 >> I wonder though if it wouldn't be a better idea to add `--date ` >> option to the test_commit() function in test-lib-functions.sh (which >> option would set GIT_COMMITTER_DATE and GIT_AUTHOR_DATE, and also >> set notick=3Dyes). >>=20 > > Yes, that's a better idea - I didn't know how to change test_commit() > well enough to tinker with what's working. > >> For example: >>=20 >> diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh >> index f1ae935fee..a1f9a2b09b 100644 >> --- a/t/test-lib-functions.sh >> +++ b/t/test-lib-functions.sh >> @@ -202,6 +202,12 @@ test_commit () { >> --signoff) >> signoff=3D"$1" >> ;; >> + --date) >> + notick=3Dyes >> + GIT_COMMITTER_DATE=3D"$2" >> + GIT_AUTHOR_DATE=3D"$2" >> + shift >> + ;; >> -C) >> indir=3D"$2" >> shift Note however that I have while I have followed example of other options (namely '-C '), I have not actually tested this proposed implementation in tests; I have just tested that it looks like it works OK. [...] >>> +test_expect_success 'overflow corrected commit date offset' ' >>> + objdir=3D".git/objects" && >>> + UNIX_EPOCH_ZERO=3D"1970-01-01 00:00 +0000" && >>> + FUTURE_DATE=3D"@2147483646 +0000" && >>=20 >> It is a bit funny to see UNIX_EPOCH_ZERO spelled one way, and >> FUTURE_DATE other way. >>=20 >> Wouldn't be more readable to use UNIX_EPOCH_ZERO=3D"@0 +0000"? > > It would, for some reason - I couldn't figure out the valid format for > this. Changed. Well, if "@2147483646 +0000" works (i.e. "@ "= ), why the same for timestamp 0, i.e. "@0 +0000", wouldn't work? [...] >>> +graph_git_behavior 'overflow corrected commit date offset' repo left r= ight >>=20 >> All right, here we compare the Git behavior with the commit-graph to the >> behavior without it... however I think that those two tests really >> should have distinct (different) test names. Currently they both use >> 'overflow corrected commit date offset'. >>=20 > > Following the earlier tests, the first test could be "set up and verify > repo with generation data overflow chunk" and the git behavior test can > be "generation data overflow chunk repo" First is OK, the second could possibly be improved but is all right. [...] >> Though to reduce "noise" in this patch, the rename of run_three_modes() >> to run_all_modes() and test_three_modes() to test_all_modes() could have >> been done in a separate preparatory patch. It would be pure refactoring >> patch, without introducing any new functionality. >>=20 > > Sure, that makes sense to me - this is patch is over 200 lines long > already. Thanks in advance. Best, --=20 Jakub Nar=C4=99bski