From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE371C433F5 for ; Thu, 24 Feb 2022 11:44:06 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S232775AbiBXLod (ORCPT ); Thu, 24 Feb 2022 06:44:33 -0500 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:50102 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S231176AbiBXLo3 (ORCPT ); Thu, 24 Feb 2022 06:44:29 -0500 Received: from mail-lf1-x130.google.com (mail-lf1-x130.google.com [IPv6:2a00:1450:4864:20::130]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 0BA8928DDF9 for ; Thu, 24 Feb 2022 03:43:59 -0800 (PST) Received: by mail-lf1-x130.google.com with SMTP id b11so3304783lfb.12 for ; Thu, 24 Feb 2022 03:43:58 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=bytedance-com.20210112.gappssmtp.com; s=20210112; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=0aj1sHJcqKfZb0FtBlcvqwZ+alQwzGOksJeJvrB2n3c=; b=lKuEAZnyhHTHhRa8sI+KaAy+L/24jIHphLByYoXbT7Laiv3Pp2hGZx5OkqoRCCwQiB G5snGJHKil1n7xQZDv4CkZYg5Oexa2ZeRnkMZGZ9K3qBO72NusQBRomYTmqJx26nK2bl k1uSUGzItmiRTTBKKl9kR2We0meBcXcQOJid+4UuFcbyBJ6ZnMFnk+3VVdO40M8QaLXn 215J84p6I9iqmAxns1anQP0EGVwzk+gvFKfqAt15OfNbp6i5onWDrRyNazdXg3ct2bC9 2GESHzW5dJ+rx0bqHIv4hXUxUzqVnQxr9B59xRg2RVCmLF+JL9/DauUh+hTdUK82Kyfo PgVw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=0aj1sHJcqKfZb0FtBlcvqwZ+alQwzGOksJeJvrB2n3c=; b=7VFjyxOTMCaMnidiCUmn/md5zPuUJpCsm/1RJiXnwuWPzsHuvOtWSRZ0WySaF5xm9h 3J00Mx3YQwlXlNE0jh76b52r+AmsOMV/GkEWA61BnyZmQJv5DQ5RsicauCVrlTuiRJwo +YnCfdSj/udFV2KlinO1atqAfuP4KwpCa+JJZerO86YiVYvuJXGp3j+UsGpuySTfDY9m ZdR609dvAuhsi0L/Cj4IjGEw8sMqr1m3+RmYWsDLvHbkfIGh9j5Y5pwNXVeZHwA51Hm5 G0d2U4rTNOl5GmVjHwOkZGoervXgo1Mw/83CEheeo9HAD2IMQa8kPPGqNUu67icZpl9r rVvA== X-Gm-Message-State: AOAM530EpSRfP3xqINJI02JFNJQGLmU0DI/RnhCN49L2cX9oRJj1NoxI UmGLQ6mdRRHpZihdeIdUJk0n89n9pnxTzYmTe4WQiw== X-Google-Smtp-Source: ABdhPJz3iN1wfagflfnHgoSqfOU5YkaziRD5rRzHy7J3nc0KrnbGmpLRrTnVqtp7IHNN091wV5xSs6YsGn582ChCuus= X-Received: by 2002:a19:e049:0:b0:42f:b0e2:10c4 with SMTP id g9-20020a19e049000000b0042fb0e210c4mr1553336lfj.170.1645703037284; Thu, 24 Feb 2022 03:43:57 -0800 (PST) MIME-Version: 1.0 References: <20220223135755.plbr2fvt66k3xyn5@riteshh-domain> In-Reply-To: <20220223135755.plbr2fvt66k3xyn5@riteshh-domain> From: Xin Yin Date: Thu, 24 Feb 2022 19:43:46 +0800 Message-ID: Subject: Re: [External] [RFC 9/9] ext4: fast_commit missing tracking updates to a file To: Ritesh Harjani Cc: linux-ext4@vger.kernel.org, Harshad Shirwadkar , "Theodore Ts'o" , Jan Kara , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, Feb 23, 2022 at 9:59 PM Ritesh Harjani wrote: > > On 22/02/23 11:50AM, Xin Yin wrote: > > On Wed, Feb 23, 2022 at 4:36 AM Ritesh Harjani wrote: > > > > > > > > > > > > Testcase > > > ========== > > > 1. i=0; while [ $i -lt 1000 ]; do xfs_io -f -c "pwrite -S 0xaa -b 32k 0 32k" -c "fsync" /mnt/$i; i=$(($i+1)); done && sudo ./src/godown -v /mnt && sudo umount /mnt && sudo mount /dev/loop2 /mnt' > > > 2. ls -alih /mnt/ -> In this you will observe one such file with 0 bytes (which ideally should not happen) > > > > > > ^^^ say if you don't see the issue because your underlying storage > > > device is very fast, then maybe try with commit=1 mount option. > > > > > > Analysis > > > ========== > > > It seems a file's updates can be a part of two transaction tid. > > > Below are the sequence of events which could cause this issue. > > > > > > jbd2_handle_start -> (t_tid = 38) > > > __ext4_new_inode > > > ext4_fc_track_template -> __track_inode -> (i_sync_tid = 38, t_tid = 38) > > > > > > jbd2_start_commit -> (t_tid = 38) > > > > > > jbd2_handle_start (tid = 39) > > > ext4_fc_track_template -> __track_inode -> (i_sync_tid = 38, t_tid 39) > > > -> ext4_fc_reset_inode & ei->i_sync_tid = t_tid > > > > > > ext4_fc_commit_start -> (will wait since jbd2 full commit is in progress) > > > jbd2_end_commit (t_tid = 38) > > > -> jbd2_fc_cleanup() -> this will cleanup entries in sbi->s_fc_q[FC_Q_MAIN] > > > -> And the above could result inode size as 0 as after effect. > > > ext4_fc_commit_stop > > > > > > You could find the logs for the above behavior for inode 979 at [1]. > > > > > > -> So what is happening here is since the ei->i_fc_list is not empty > > > (because it is already part of sb's MAIN queue), we don't add this inode > > > again into neither sb's MAIN or STAGING queue. > > > And after jbd2_fc_cleanup() is called from jbd2 full commit, we > > > just remove this inode from the main queue. > > > > > > So as a simple fix, what I did below was to check if it is a jbd2 full commit > > > in ext4_fc_cleanup(), and if the ei->i_sync_tid > tid, that means we > > > need not remove that from MAIN queue. This is since neither jbd2 nor FC > > > has committed updates of those inodes for this new txn tid yet. > > > > > > But below are some quick queries on this > > > ========================================= > > > > > > 1. why do we call ext4_fc_reset_inode() when inode tid and > > > running txn tid does not match? > > This is part of a change in commit:bdc8a53a6f2f, it fixes the issue > > for fc tracking logic while jbd2 commit is ongoing. > > Thanks Xin for pointing the other issue too. > But I think what I was mostly referring to was - calling ext4_fc_reset_inode() > in ext4_fc_track_template(). > Understood, I missed something here, then maybe Harshad can give some directions for this part. > <..> > 391 tid = handle->h_transaction->t_tid; > 392 mutex_lock(&ei->i_fc_lock); > 393 if (tid == ei->i_sync_tid) { > 394 update = true; > 395 } else { > 396 ext4_fc_reset_inode(inode); > 397 ei->i_sync_tid = tid; > 398 } > 399 ret = __fc_track_fn(inode, args, update); > 400 mutex_unlock(&ei->i_fc_lock); > <..> > > So, yes these are few corner cases which I want to take a deeper look at. > I vaugely understand that this reset inode is done since we anyway might have > done the full commit for previous tid, so we can reset the inode track range. > > So, yes, we should carefully review this as well that if jbd2 commit happens for > an inode which is still part of MAIN_Q, then does it make sense to still > call ext4_fc_reset_inode() for that inode in ext4_fc_track_template()? > > > If the inode tid is bigger than txn tid, that means this inode may be > > in the STAGING queue, if we reset it then it will lose the tack range. > > I think it's a similar issue, the difference is this inode is already > > Do you have a test case which was failing for your issue? > I would like to test that one too. This issue can be triggered by generic/455 , but this is one failed case for it. I also do not have a reproducer for this. > > > > in the MAIN queue before the jbd2 commit starts. > > And yes , I think in this case we can not remove it from the MAIN > > Yes. I too have a similar thought. But I still wanted to get few queries sorted > (like point 1 & 2). > > > queue, but still need to clear EXT4_STATE_FC_COMMITTING right? it may > > block some task still waiting for it. > > Sorry I didn't get you here. So I think we will end up in such situation > (where ext4_fc_cleanup() is getting called for an inode with i_sync_tid > tid) > only from full commit path right ? > And that won't set EXT4_FC_COMMITTING for this inode right anyways no? I am not sure if there are any other cases, But for now we also clear EXT4_STATE_FC_COMMITTING in the full commit path right? So maybe some further tests are needed. Thanks, Xin Yin > > Do you mean anything else, or am I missing something here? > > -ritesh > > > > > > Thanks, > > Xin Yin > > > > > > 2. Also is this an expected behavior from the design perspective of > > > fast_commit. i.e. > > > a. the inode can be part of two tids? > > > b. And that while a full commit is in progress, the inode can still > > > receive updates but using a new transaction tid. > > > > > > Frankly speaking, since I was also working on other things, so I haven't > > > yet got the chance to completely analyze the situation yet. > > > Once I have those things sorted, I will spend more time on this, to > > > understand it more. Meanwhile if you already have some answers to above > > > queries/observations, please do share those here. > > > > > > Links > > > ========= > > > [1] https://raw.githubusercontent.com/riteshharjani/LinuxStudy/master/ext4/fast_commit/fc_inode_missing_updates_ino_979.txt > > > > > > Signed-off-by: Ritesh Harjani > > > --- > > > fs/ext4/fast_commit.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/fs/ext4/fast_commit.c b/fs/ext4/fast_commit.c > > > index 8803ba087b07..769b584c2552 100644 > > > --- a/fs/ext4/fast_commit.c > > > +++ b/fs/ext4/fast_commit.c > > > @@ -1252,6 +1252,8 @@ static void ext4_fc_cleanup(journal_t *journal, int full, tid_t tid) > > > spin_lock(&sbi->s_fc_lock); > > > list_for_each_entry_safe(iter, iter_n, &sbi->s_fc_q[FC_Q_MAIN], > > > i_fc_list) { > > > + if (full && iter->i_sync_tid > tid) > > > + continue; > > > list_del_init(&iter->i_fc_list); > > > ext4_clear_inode_state(&iter->vfs_inode, > > > EXT4_STATE_FC_COMMITTING); > > > -- > > > 2.31.1 > > >