From mboxrd@z Thu Jan 1 00:00:00 1970
From: Junio C Hamano
To: "Neeraj Singh via GitGitGadget"
Cc: git@vger.kernel.org, Johannes.Schindelin@gmx.de, avarab@gmail.com,
	nksingh85@gmail.com, ps@pks.im, jeffhost@microsoft.com,
	Bagas Sanjaya, worldhello.net@gmail.com, "Neeraj K. Singh"
Subject: Re: [PATCH v5 04/14] core.fsyncmethod: batched disk flushes for loose-objects
References: <9e7ae22fa4a2693fe26659f875dd780080c4cfb2.1648616734.git.gitgitgadget@gmail.com>
Date: Wed, 30 Mar 2022 10:37:35 -0700
In-Reply-To: <9e7ae22fa4a2693fe26659f875dd780080c4cfb2.1648616734.git.gitgitgadget@gmail.com>
	(Neeraj Singh via GitGitGadget's message of "Wed, 30 Mar 2022 05:05:22 +0000")
Message-ID:
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-Mailing-List: git@vger.kernel.org

"Neeraj Singh via GitGitGadget" writes:

> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 9da3e5d88f6..3c90ba0b395 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -596,6 +596,14 @@ core.fsyncMethod::
>  * `writeout-only` issues pagecache writeback requests, but depending on the
>    filesystem and storage hardware, data added to the repository may not be
>    durable in the event of a system crash. This is the default mode on macOS.
> +* `batch` enables a mode that uses writeout-only flushes to stage multiple
> +  updates in the disk writeback cache and then does a single full fsync of
> +  a dummy file to trigger the disk cache flush at the end of the operation.
> ++
> +  Currently `batch` mode only applies to loose-object files. Other repository
> +  data is made durable as if `fsync` was specified. This mode is expected to
> +  be as safe as `fsync` on macOS for repos stored on HFS+ or APFS filesystems
> +  and on Windows for repos stored on NTFS or ReFS filesystems.

Does this format correctly?  I had an impression that the second and
subsequent paragraphs, connected by a line with a single "+" on it,
have to be flushed left without indentation.
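
In other words, I would have expected the new text to be laid out more
like this (illustration only; I haven't actually run it through
AsciiDoc to check how it renders):

    ++
    +Currently `batch` mode only applies to loose-object files. Other repository
    +data is made durable as if `fsync` was specified.
    ...

i.e. with the continuation paragraph starting at the left edge.  My
understanding is that an indented paragraph after the lone "+" line
gets typeset as a literal block instead of a normal paragraph.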
> diff --git a/bulk-checkin.c b/bulk-checkin.c
> index 8b0fd5c7723..9799d247cad 100644
> --- a/bulk-checkin.c
> +++ b/bulk-checkin.c
> @@ -3,15 +3,20 @@
>   */
>  #include "cache.h"
>  #include "bulk-checkin.h"
> +#include "lockfile.h"
>  #include "repository.h"
>  #include "csum-file.h"
>  #include "pack.h"
>  #include "strbuf.h"
> +#include "string-list.h"
> +#include "tmp-objdir.h"
>  #include "packfile.h"
>  #include "object-store.h"
>  
>  static int odb_transaction_nesting;
>  
> +static struct tmp_objdir *bulk_fsync_objdir;

I wonder if this should be added to the bulk_checkin_state structure
as a new member, especially if we fix the erroneous call to
finish_bulk_checkin() as a preliminary fix-up of a bug that existed
even before this series.
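
Something along these lines is what I have in mind, purely as an
illustration (not even compile-tested, and the member name
`fsync_objdir` is made up):

	static struct bulk_checkin_state {
		/* ... the existing members used for packfile creation ... */

		/* batched-fsync state, instead of a file-scope static */
		struct tmp_objdir *fsync_objdir;
	} bulk_checkin_state;

with do_batch_fsync() and friends looking at
bulk_checkin_state.fsync_objdir instead of their own global.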
> +/*
> + * Cleanup after batch-mode fsync_object_files.
> + */
> +static void do_batch_fsync(void)
> +{
> +	struct strbuf temp_path = STRBUF_INIT;
> +	struct tempfile *temp;
> +
> +	if (!bulk_fsync_objdir)
> +		return;
> +
> +	/*
> +	 * Issue a full hardware flush against a temporary file to ensure
> +	 * that all objects are durable before any renames occur. The code in
> +	 * fsync_loose_object_bulk_checkin has already issued a writeout
> +	 * request, but it has not flushed any writeback cache in the storage
> +	 * hardware or any filesystem logs. This fsync call acts as a barrier
> +	 * to ensure that the data in each new object file is durable before
> +	 * the final name is visible.
> +	 */
> +	strbuf_addf(&temp_path, "%s/bulk_fsync_XXXXXX", get_object_directory());
> +	temp = xmks_tempfile(temp_path.buf);
> +	fsync_or_die(get_tempfile_fd(temp), get_tempfile_path(temp));
> +	delete_tempfile(&temp);
> +	strbuf_release(&temp_path);
> +
> +	/*
> +	 * Make the object files visible in the primary ODB after their data is
> +	 * fully durable.
> +	 */
> +	tmp_objdir_migrate(bulk_fsync_objdir);
> +	bulk_fsync_objdir = NULL;
> +}

OK.

> +void prepare_loose_object_bulk_checkin(void)
> +{
> +	/*
> +	 * We lazily create the temporary object directory
> +	 * the first time an object might be added, since
> +	 * callers may not know whether any objects will be
> +	 * added at the time they call begin_odb_transaction.
> +	 */
> +	if (!odb_transaction_nesting || bulk_fsync_objdir)
> +		return;
> +
> +	bulk_fsync_objdir = tmp_objdir_create("bulk-fsync");
> +	if (bulk_fsync_objdir)
> +		tmp_objdir_replace_primary_odb(bulk_fsync_objdir, 0);
> +}

OK.  If we got a failure from tmp_objdir_create(), then we don't swap,
and we end up creating the new loose object file in the primary object
store.  I wonder if we at least want to note that fact for later use
at "unplug" time.  We may create a few loose objects in the primary
object store without any fsync; a later call may then successfully
create a temporary object directory, and we'd create more loose
objects in the temporary one, which are flushed with the "create a
dummy and fsync" trick and migrated.  But do we need to do something
to the ones we created in the primary object store before all that
happens?

> +void fsync_loose_object_bulk_checkin(int fd, const char *filename)
> +{
> +	/*
> +	 * If we have an active ODB transaction, we issue a call that
> +	 * cleans the filesystem page cache but avoids a hardware flush
> +	 * command. Later on we will issue a single hardware flush
> +	 * before as part of do_batch_fsync.
> +	 */
> +	if (!bulk_fsync_objdir ||
> +	    git_fsync(fd, FSYNC_WRITEOUT_ONLY) < 0) {
> +		fsync_or_die(fd, filename);
> +	}
> +}

Ah, if we have successfully created the temporary directory, we don't
do a full fsync but just a writeout-only one, so there is no need for
the worry I mentioned in the previous paragraph.  OK.

> @@ -301,4 +370,6 @@ void end_odb_transaction(void)
>  
>  	if (bulk_checkin_state.f)
>  		finish_bulk_checkin(&bulk_checkin_state);
> +
> +	do_batch_fsync();
>  }

OK.

> @@ -1961,6 +1963,9 @@ static int write_loose_object(const struct object_id *oid, char *hdr,
>  	static struct strbuf tmp_file = STRBUF_INIT;
>  	static struct strbuf filename = STRBUF_INIT;
>  
> +	if (batch_fsync_enabled(FSYNC_COMPONENT_LOOSE_OBJECT))
> +		prepare_loose_object_bulk_checkin();
> +
>  	loose_object_path(the_repository, &filename, oid);
>  
>  	fd = create_tmpfile(&tmp_file, filename.buf);

The necessary change to the "workhorse" code path is surprisingly
small, which is pleasing to see.

Thanks.