Date: Thu, 2 Jul 2020 13:46:09 -0400
From: Sasha Levin
To: gregkh@linuxfoundation.org
Cc: fdmanana@suse.com, dsterba@suse.com, josef@toxicpanda.com,
 stable@vger.kernel.org
Subject: Re: FAILED: patch "[PATCH] btrfs: fix data block group relocation
 failure due to" failed to apply to 4.14-stable tree
Message-ID: <20200702174609.GC2722994@sasha-vm>
References: <1593428551379@kroah.com>
In-Reply-To: <1593428551379@kroah.com>
X-Mailing-List: stable@vger.kernel.org

On Mon, Jun 29, 2020 at 01:02:31PM +0200, gregkh@linuxfoundation.org wrote:
>
>The patch below does not apply to the 4.14-stable tree.
>If someone wants it applied there, or to any other stable or longterm
>tree, then please email the backport, including the original git commit
>id to .
>
>thanks,
>
>greg k-h
>
>------------------ original commit in Linus's tree ------------------
>
>From 432cd2a10f1c10cead91fe706ff5dc52f06d642a Mon Sep 17 00:00:00 2001
>From: Filipe Manana
>Date: Mon, 8 Jun 2020 13:32:55 +0100
>Subject: [PATCH] btrfs: fix data block group relocation failure due to
> concurrent scrub
>
>When running relocation of a data block group while scrub is running in
>parallel, it is possible that the relocation will fail and abort the
>current transaction with an -EINVAL error:
>
>  [134243.988595] BTRFS info (device sdc): found 14 extents, stage: move data extents
>  [134243.999871] ------------[ cut here ]------------
>  [134244.000741] BTRFS: Transaction aborted (error -22)
>  [134244.001692] WARNING: CPU: 0 PID: 26954 at fs/btrfs/ctree.c:1071 __btrfs_cow_block+0x6a7/0x790 [btrfs]
>  [134244.003380] Modules linked in: btrfs blake2b_generic xor raid6_pq (...)
>  [134244.012577] CPU: 0 PID: 26954 Comm: btrfs Tainted: G W 5.6.0-rc7-btrfs-next-58 #5
>  [134244.014162] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
>  [134244.016184] RIP: 0010:__btrfs_cow_block+0x6a7/0x790 [btrfs]
>  [134244.017151] Code: 48 c7 c7 (...)
>  [134244.020549] RSP: 0018:ffffa41607863888 EFLAGS: 00010286
>  [134244.021515] RAX: 0000000000000000 RBX: ffff9614bdfe09c8 RCX: 0000000000000000
>  [134244.022822] RDX: 0000000000000001 RSI: ffffffffb3d63980 RDI: 0000000000000001
>  [134244.024124] RBP: ffff961589e8c000 R08: 0000000000000000 R09: 0000000000000001
>  [134244.025424] R10: ffffffffc0ae5955 R11: 0000000000000000 R12: ffff9614bd530d08
>  [134244.026725] R13: ffff9614ced41b88 R14: ffff9614bdfe2a48 R15: 0000000000000000
>  [134244.028024] FS: 00007f29b63c08c0(0000) GS:ffff9615ba600000(0000) knlGS:0000000000000000
>  [134244.029491] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>  [134244.030560] CR2: 00007f4eb339b000 CR3: 0000000130d6e006 CR4: 00000000003606f0
>  [134244.031997] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
>  [134244.033153] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
>  [134244.034484] Call Trace:
>  [134244.034984] btrfs_cow_block+0x12b/0x2b0 [btrfs]
>  [134244.035859] do_relocation+0x30b/0x790 [btrfs]
>  [134244.036681] ? do_raw_spin_unlock+0x49/0xc0
>  [134244.037460] ? _raw_spin_unlock+0x29/0x40
>  [134244.038235] relocate_tree_blocks+0x37b/0x730 [btrfs]
>  [134244.039245] relocate_block_group+0x388/0x770 [btrfs]
>  [134244.040228] btrfs_relocate_block_group+0x161/0x2e0 [btrfs]
>  [134244.041323] btrfs_relocate_chunk+0x36/0x110 [btrfs]
>  [134244.041345] btrfs_balance+0xc06/0x1860 [btrfs]
>  [134244.043382] ? btrfs_ioctl_balance+0x27c/0x310 [btrfs]
>  [134244.045586] btrfs_ioctl_balance+0x1ed/0x310 [btrfs]
>  [134244.045611] btrfs_ioctl+0x1880/0x3760 [btrfs]
>  [134244.049043] ? do_raw_spin_unlock+0x49/0xc0
>  [134244.049838] ? _raw_spin_unlock+0x29/0x40
>  [134244.050587] ? __handle_mm_fault+0x11b3/0x14b0
>  [134244.051417] ? ksys_ioctl+0x92/0xb0
>  [134244.052070] ksys_ioctl+0x92/0xb0
>  [134244.052701] ? trace_hardirqs_off_thunk+0x1a/0x1c
>  [134244.053511] __x64_sys_ioctl+0x16/0x20
>  [134244.054206] do_syscall_64+0x5c/0x280
>  [134244.054891] entry_SYSCALL_64_after_hwframe+0x49/0xbe
>  [134244.055819] RIP: 0033:0x7f29b51c9dd7
>  [134244.056491] Code: 00 00 00 (...)
>  [134244.059767] RSP: 002b:00007ffcccc1dd08 EFLAGS: 00000202 ORIG_RAX: 0000000000000010
>  [134244.061168] RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007f29b51c9dd7
>  [134244.062474] RDX: 00007ffcccc1dda0 RSI: 00000000c4009420 RDI: 0000000000000003
>  [134244.063771] RBP: 0000000000000003 R08: 00005565cea4b000 R09: 0000000000000000
>  [134244.065032] R10: 0000000000000541 R11: 0000000000000202 R12: 00007ffcccc2060a
>  [134244.066327] R13: 00007ffcccc1dda0 R14: 0000000000000002 R15: 00007ffcccc1dec0
>  [134244.067626] irq event stamp: 0
>  [134244.068202] hardirqs last enabled at (0): [<0000000000000000>] 0x0
>  [134244.069351] hardirqs last disabled at (0): [] copy_process+0x74f/0x2020
>  [134244.070909] softirqs last enabled at (0): [] copy_process+0x74f/0x2020
>  [134244.072392] softirqs last disabled at (0): [<0000000000000000>] 0x0
>  [134244.073432] ---[ end trace bd7c03622e0b0a99 ]---
>
>The -EINVAL error comes from the following chain of function calls:
>
>  __btrfs_cow_block() <-- aborts the transaction
>    btrfs_reloc_cow_block()
>      replace_file_extents()
>        get_new_location() <-- returns -EINVAL
>
>When relocating a data block group, for each allocated extent of the block
>group, we preallocate another extent (at prealloc_file_extent_cluster()),
>associated with the data relocation inode, and then dirty all its pages.
>These preallocated extents have, and must have, the same size that extents
>from the data block group being relocated have.
>
>Later, before we start the relocation stage that updates pointers (bytenr
>field of file extent items) to point to the new extents, we trigger
>writeback for the data relocation inode. The expectation is that writeback
>will write the pages to the previously preallocated extents, that it
>follows the NOCOW path. That is generally the case; however, if a scrub
>is running, it may have turned the block group that contains those extents
>into RO mode, in which case writeback falls back to the COW path.
>
>However, in the COW path, instead of allocating exactly one extent with the
>expected size, the allocator may end up allocating several smaller extents
>due to free space fragmentation - because we tell it at cow_file_range()
>that the minimum allocation size can match the filesystem's sector size.
>This later breaks the relocation's expectation that an extent associated
>to a file extent item in the data relocation inode has the same size as
>the respective extent pointed by a file extent item in another tree - in
>this case the extent that the relocation inode points to is smaller,
>causing relocation.c:get_new_location() to return -EINVAL.
>
>For example, if we are relocating a data block group X that has a logical
>address of X and the block group has an extent allocated at the logical
>address X + 128KiB with a size of 64KiB:
>
>1) At prealloc_file_extent_cluster() we allocate an extent for the data
>   relocation inode with a size of 64KiB and associate it to the file
>   offset 128KiB (X + 128KiB - X) of the data relocation inode. This
>   preallocated extent was allocated at block group Z;
>
>2) A scrub running in parallel turns block group Z into RO mode and
>   starts scrubbing its extents;
>
>3) Relocation triggers writeback for the data relocation inode;
>
>4) When running delalloc (btrfs_run_delalloc_range()), we try first the
>   NOCOW path because the data relocation inode has BTRFS_INODE_PREALLOC
>   set in its flags. However, because block group Z is in RO mode, the
>   NOCOW path (run_delalloc_nocow()) falls back into the COW path, by
>   calling cow_file_range();
>
>5) At cow_file_range(), in the first iteration of the while loop we call
>   btrfs_reserve_extent() to allocate a 64KiB extent and pass it a minimum
>   allocation size of 4KiB (fs_info->sectorsize). Due to free space
>   fragmentation, btrfs_reserve_extent() ends up allocating two extents
>   of 32KiB each, each one on a different iteration of that while loop;
>
>6) Writeback of the data relocation inode completes;
>
>7) Relocation proceeds and ends up at relocation.c:replace_file_extents(),
>   with a leaf which has a file extent item that points to the data extent
>   from block group X, that has a logical address (bytenr) of X + 128KiB
>   and a size of 64KiB. Then it calls get_new_location(), which does a
>   lookup in the data relocation tree for a file extent item starting at
>   offset 128KiB (X + 128KiB - X) and belonging to the data relocation
>   inode. It finds a corresponding file extent item, however that item
>   points to an extent that has a size of 32KiB, which doesn't match the
>   expected size of 64KiB, resulting in -EINVAL being returned from this
>   function and propagated up to __btrfs_cow_block(), which aborts the
>   current transaction.
>
>To fix this, make sure that at cow_file_range() when we call the allocator
>we pass it a minimum allocation size corresponding to the desired extent
>size if the inode belongs to the data relocation tree, otherwise pass it
>the filesystem's sector size as the minimum allocation size.
>
>CC: stable@vger.kernel.org # 4.4+
>Reviewed-by: Josef Bacik
>Signed-off-by: Filipe Manana
>Signed-off-by: David Sterba

I've backported this to 4.4 by also taking 3752d22fcea1 ("btrfs:
cow_file_range() num_bytes and disk_num_bytes are same") and some light
massaging.

-- 
Thanks,
Sasha