References: <93c4600e-5263-5cba-adf0-6f47526e7561@in.tum.de>
Reply-To: fdmanana@gmail.com
From: Filipe Manana
Date: Thu, 13 May 2021 11:50:52 +0100
Subject: Re: Leaf corruption due to csum range
To: Philipp Fent
Cc: linux-btrfs

On Thu, May 13, 2021 at 10:57 AM Filipe Manana wrote:
>
> On Tue, May 11, 2021 at 7:19 PM Philipp Fent wrote:
> >
> > Thanks for the explanation! I wasn't aware of these ioctls.
> >
> > > strace would be clear to me, which I'm more familiar with (or even better, bpftrace).
> >
> > I've attached an strace output that decompresses to about 200MB
> > logfiles. I can't make heads or tails of it, but I hope it helps.
> > I have never used bpftrace, do you have any pointers where I could start?
>
> There's some documentation and examples on their github.
>
> >
> > > I just remembered that 5.13-rc1 includes a fix for races between mmap writes and fsync that could fix that
> >
> > I tried 5.13-rc1, but I'm running into the same csum range issue:
> >
> > Linux version 5.13.0-rc1-1-mainline (linux-mainline@archlinux) (gcc (GCC) 10.2.0, GNU ld (GNU Binutils) 2.36.1) #1 SMP PREEMPT Tue, 11 May 2021 15:34:19 +0000
> > ...
> > BTRFS critical (device sda): corrupt leaf: root=18446744073709551610 block=507430633472 slot=5, csum end range (293918547968) goes beyond the start range (293918416896) of the next csum item
> > BTRFS info (device sda): leaf 507430633472 gen 18451 total ptrs 11 free space 5016 owner 18446744073709551610
> >         item 0 key (18446744073709551606 128 293837238272) itemoff 15923 itemsize 360
> >         item 1 key (18446744073709551606 128 293838544896) itemoff 15863 itemsize 60
> >         item 2 key (18446744073709551606 128 293838675968) itemoff 15563 itemsize 300
> >         item 3 key (18446744073709551606 128 293839527936) itemoff 15503 itemsize 60
> >         item 4 key (18446744073709551606 128 293872295936) itemoff 15263 itemsize 240
> >         item 5 key (18446744073709551606 128 293913763840) itemoff 10591 itemsize 4672
> >         item 6 key (18446744073709551606 128 293918416896) itemoff 8351 itemsize 2240
> >         item 7 key (18446744073709551606 128 293947658240) itemoff 8347 itemsize 4
> >         item 8 key (18446744073709551606 128 293965193216) itemoff 8287 itemsize 60
> >         item 9 key (18446744073709551606 128 293965848576) itemoff 8227 itemsize 60
> >         item 10 key (18446744073709551606 128 293966176256) itemoff 5291 itemsize 2936
> > BTRFS error (device sda): block=507430633472 write time tree block corruption detected
> > BTRFS critical (device sda): corrupt leaf: root=18446744073709551610 block=507447197696 slot=0, csum end range (320352133120) goes beyond the start range (320352116736) of the next csum item
> > BTRFS info (device sda): leaf 507447197696 gen 18451 total ptrs 3 free space 116 owner 18446744073709551610
> >         item 0 key (18446744073709551606 128 320336326656) itemoff 847 itemsize 15436
> >         item 1 key (18446744073709551606 128 320352116736) itemoff 831 itemsize 16
> >         item 2 key (18446744073709551606 128 320352247808) itemoff 191 itemsize 640
> > BTRFS error (device sda): block=507447197696 write time tree block corruption detected
> > BTRFS: error (device sda) in btrfs_sync_log:3136: errno=-5 IO failure
> > BTRFS info (device sda): forced readonly
>
> Ok, then it's something else.
>
> When I run your reproducer I get an error:
>
> $ ./runMssql.sh
> Starting MSSQL docker container...
> 9943b714ed210a2937d5fce27ec110981b471e6e9f2c619629cb66501621ebb5
> Loading TPC-H schema...
> Sqlcmd: Error: Microsoft ODBC Driver 17 for SQL Server : Login failed for user 'sa'..

Ok, never mind. I changed the 'sleep 5' to 'sleep 15' in the script and it
works now. It seems 5 seconds is too short on this VM for starting the
server.

I can also trigger the bug. I'll see what causes it.
Thanks for providing the reliable reproducer, it's really helpful.

>
> dbgen.sh ran successfully before.
>
> Any idea?
>
> >
> > Curiously, the second leaf range overshot by only 16KB....
> > Let me know if I can try anything else.
> >
>
> --
> Filipe David Manana,
>
> “Whether you think you can, or you think you can't — you're right.”

-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
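
For anyone following the thread, the two "csum end range" values in the log above can be re-derived from the item keys and sizes. The sketch below is only an illustration, not the kernel's actual check: it assumes 4-byte crc32c checksums and a 4096-byte sector size, neither of which is stated in the thread, and the names csum_end and overshoot are made up for this example.

```python
# Assumptions (not stated in the thread): crc32c checksums of 4 bytes each,
# one checksum per 4096-byte sector.
CSUM_SIZE = 4
SECTOR_SIZE = 4096


def csum_end(start: int, itemsize: int) -> int:
    """Return the first byte past the range covered by a csum item."""
    return start + (itemsize // CSUM_SIZE) * SECTOR_SIZE


def overshoot(start: int, itemsize: int, next_start: int) -> int:
    """Bytes by which a csum item runs into the next one (<= 0: no overlap)."""
    return csum_end(start, itemsize) - next_start


# (key offset, itemsize, key offset of the next item), copied from the log.
reported = {
    "leaf 507430633472, slot 5": (293913763840, 4672, 293918416896),
    "leaf 507447197696, slot 0": (320336326656, 15436, 320352116736),
}

for name, (start, itemsize, next_start) in reported.items():
    end = csum_end(start, itemsize)
    kib = overshoot(start, itemsize, next_start) // 1024
    print(f"{name}: end={end} overshoot={kib} KiB")
```

Under those assumptions the computed ends are 293918547968 and 320352133120, the same values the kernel printed. The second item overlaps its successor by 16 KiB, which agrees with the "overshot by only 16KB" remark, and the first overlaps by 128 KiB.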