From mboxrd@z Thu Jan 1 00:00:00 1970
Subject: Re: [Bug report] journal data mode trigger panic in jbd2_journal_commit_transaction
From: yangerkun
To: Mauricio Oliveira
CC: "Theodore Y . Ts'o", Jan Kara, "zhangyi (F)", Hou Tao, Ye Bin
References: <68b9650e-bef2-69e2-ab5e-8aaddaf46cfe@huawei.com> <17d7ecde-5fda-cd03-6fef-e7b8250489f9@huawei.com>
Message-ID: <14879a89-b6d2-e142-2ea3-23fbb041444b@huawei.com>
Date: Fri, 20 Nov 2020 11:03:06 +0800
In-Reply-To: <17d7ecde-5fda-cd03-6fef-e7b8250489f9@huawei.com>
X-Mailing-List: linux-ext4@vger.kernel.org

On 2020/11/20 10:54, yangerkun wrote:
>
>
> On 2020/11/19 21:12, Mauricio Oliveira wrote:
>> On Thu, Nov 19, 2020 at 1:25 AM yangerkun wrote:
>>>
>>>
>>>
>>> On 2020/11/16 21:50, Mauricio Oliveira wrote:
>>>> Hi Kun,
>>>>
>>>> On Sat, Nov 14, 2020 at 5:18 AM yangerkun wrote:
>>>>> While using ext4 with data=journal (3.10 kernel), we hit a problem
>>>>> that we thought could never happen...
>>>> [...]
>>>>
>>>> Could you please confirm you mean 5.10-rc* kernel instead of 3.10?
>>>> (It seems so as you mention a recent commit below.)  Thanks!
>>>>
>>>>> For now, what I have seen that can dirty a buffer directly is
>>>>> ext4_page_mkwrite(64a9f1449950 ("ext4: data=journal: fixes for
>>>>> ext4_page_mkwrite()")), and running ext4_punch_hole with keep_size
>>>>> and ext4_page_mkwrite in parallel can trigger the above warning easily.
>>>> [...]
>>>>
>>>>
>>>
>>> Hi,
>>>
>>> Sorry for the long delay in replying... And thanks a lot for your
>>> advice! The bug triggers with a very low probability, so failing to
>>> trigger it on 5.10 does not prove that the bug is absent from 5.10.
>>>
>>
>> No worries, and thanks for following up.
>> So I understand that the bug report was indeed on 3.10, and 5.10-rcN
>> is not yet confirmed.
>>
>>> I searched a lot and noticed that someone has reported the same bug
>>> before[1].
>>> '3b136499e906 ("ext4: fix data corruption in data=journal mode")' seems
>>> to fix the problem. I will try to understand this, and give an analysis
>>> of how to reproduce it!
>>
>> Cool, thanks!
>>
>>> Thanks,
>>> Kun.
>>
>>
>>
> Hi,
>
> The following steps easily reproduce the bug[1] reported before, and the
> bug we hit seems to be the same one. The following patches fix the bug.
>
> 3b136499e906 ext4: fix data corruption in data=journal mode
> b90197b65518 ext4: use private version of page_zero_new_buffers() for
> data=journal mode
>
>
> 1. mkfs.ext4
> 2. touch $tofile(ino == 12)
> 3. touch $fromfile(ino == 13) and write 4k to fromfile and sync
>
> mmap $fromfile 4k
> and write 4k
> to $tofile
>
> ...
> generic_perform_write
>  ext4_write_begin
>   ext4_journal_start
>   (trans 1)
>  if (ino == 12) sleep for 30s
>  ...                           truncate $fromfile
>                                to 0
>  copied=0,bytes=4k
>  ext4_journalled_write_end
>   page_zero_new_buffers
>    mark_buffer_dirty
>   write_end_fn
>    ...
>    __jbd2_journal_file_buffer
>     test_clear_buffer_dirty
>     __jbd2_journal_temp_unlink_buffer
>      (this will mark the buffer dirty again!)
>   ext4_journal_stop
>   (trans 1)
>                                                  trans1 commit
>                                                  ...
>   ext4_truncate_failed_write
>    ...
>    journal_unmap_buffer
>     set_buffer_freed
>                                                  forget list
>                                                   ...
>                                                   clear_buffer_jbddirty
>                                                   ...
>                                                   J_ASSERT_BH(bh,
>                                                   !buffer_dirty(bh))
>                                                   ^^^^^^^^^^^^^^^^^
>                                                   trigger the bug...
>
>
>
> [1]. https://www.spinics.net/lists/linux-ext4/msg56447.html
>
> Thanks,
> Kun.
> .
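
For anyone who wants to try this, the userspace half of the steps above might be sketched roughly as below. This is only a sketch under assumptions, not the exact test that was used: the function names (prepare, write_from_mapping), the file names and the truncate_after parameter are made up for illustration, and the working directory is assumed to sit on a scratch ext4 filesystem mounted with data=journal. The "if (ino == 12) sleep for 30s" step in the trace is a delay inside the kernel write path and is not reproduced here, so without a similarly instrumented kernel the window is probably far too small to hit.

```python
import mmap
import os
import threading
import time

BLOCK = 4096  # the "4k" from the steps above


def prepare(workdir):
    """Steps 2 and 3: create $tofile, then $fromfile with 4k of synced data."""
    tofile = os.path.join(workdir, "tofile")      # hypothetical name
    fromfile = os.path.join(workdir, "fromfile")  # hypothetical name
    with open(tofile, "wb"):
        pass  # equivalent of "touch"
    with open(fromfile, "wb") as f:
        f.write(b"x" * BLOCK)
        f.flush()
        os.fsync(f.fileno())
    return tofile, fromfile


def write_from_mapping(tofile, fromfile, truncate_after=None):
    """write(2) 4k to tofile, using an mmap of fromfile as the user buffer.

    If truncate_after is given, a second thread truncates fromfile to 0
    after that many seconds, racing with the write.
    Returns the byte count from write(2), or a negative errno on failure.
    """
    src = os.open(fromfile, os.O_RDONLY)
    dst = os.open(tofile, os.O_WRONLY)
    buf = mmap.mmap(src, BLOCK, access=mmap.ACCESS_READ)
    racer = None
    if truncate_after is not None:
        def truncate_source():
            time.sleep(truncate_after)
            os.truncate(fromfile, 0)
        racer = threading.Thread(target=truncate_source)
        racer.start()
    try:
        # The mapping is handed to write(2) as the source buffer, so the
        # kernel has to fault in fromfile's page while copying the data.
        return os.write(dst, buf)
    except OSError as exc:
        return -exc.errno
    finally:
        if racer is not None:
            racer.join()
        buf.close()
        os.close(dst)
        os.close(src)
```

Calling prepare() on the mount point and then write_from_mapping(tofile, fromfile, truncate_after=...) with a delay chosen to land inside the kernel-side sleep is the intended usage. If the truncate wins the race, the write may fail or come back short, which would correspond to the copied=0 step in the trace.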