From mboxrd@z Thu Jan 1 00:00:00 1970
Date: Thu, 12 Dec 2019 08:19:43 -0800
From: Dennis Zhou
To: dsterba@suse.cz, David Sterba, Chris Mason, Josef Bacik, kernel-team@fb.com, linux-btrfs@vger.kernel.org
Subject: Re: [PATCH 2/2] btrfs: fix compressed write bio attribution
Message-ID: <20191212161943.GA30404@dennisz-mbp>
References: <20191212151853.GT3929@twin.jikos.cz>
In-Reply-To: <20191212151853.GT3929@twin.jikos.cz>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.12.2 (2019-09-21)
X-Mailing-List: linux-btrfs@vger.kernel.org

On Thu, Dec 12, 2019 at 04:18:53PM +0100, David Sterba wrote:
> On Wed, Dec 11, 2019 at 04:07:07PM -0800, Dennis Zhou wrote:
> > Bio attribution is handled at bio_set_dev(): once we have a device, we
> > have a corresponding request_queue and can then derive the current css.
> > In special cases, we want to attribute the bio to someone else. This can
> > be done by calling bio_associate_blkg_from_css(). Btrfs does this for
> > compressed writeback, as it is handled by kworkers, which belong to the
> > root cgroup rather than the cgroup designated by the wbc.
> >
> > Commit 1a41802701ec ("btrfs: drop bio_set_dev where not needed") removes
> > early bio_set_dev() calls prior to submit_stripe_bio(). This breaks the
> > above assumption that we'll have a request_queue when we are doing the
> > association. To fix this, special-case passing the bio through just for
> > btrfs_submit_compressed_write().
> >
> > Without this, we crash in btrfs/024:
> >
> > [ 3052.093088] BUG: kernel NULL pointer dereference, address: 0000000000000510
> > [ 3052.107013] #PF: supervisor read access in kernel mode
> > [ 3052.107014] #PF: error_code(0x0000) - not-present page
> > [ 3052.107015] PGD 0 P4D 0
> > [ 3052.107021] Oops: 0000 [#1] SMP
> > [ 3052.138904] CPU: 42 PID: 201270 Comm: kworker/u161:0 Kdump: loaded Not tainted 5.5.0-rc1-00062-g4852d8ac90a9 #712
> > [ 3052.138905] Hardware name: Quanta Tioga Pass Single Side 01-0032211004/Tioga Pass Single Side, BIOS F08_3A18 12/20/2018
> > [ 3052.138912] Workqueue: btrfs-delalloc btrfs_work_helper
> > [ 3052.191375] RIP: 0010:bio_associate_blkg_from_css+0x1e/0x3c0
> > [ 3052.191377] Code: ff 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 89 f3 48 83 ec 08 48 8b 47 08 65 ff 05 ea 6e 9f 7e <48> 8b a8 10 05 00 00 45 31 c9 45 31 c0 31 d2 31 f6 b9 02 00 00 00
> > [ 3052.191379] RSP: 0018:ffffc900210cfc90 EFLAGS: 00010282
> > [ 3052.191380] RAX: 0000000000000000 RBX: ffff88bfe5573c00 RCX: 0000000000000000
> > [ 3052.191382] RDX: ffff889db48ec2f0 RSI: ffff88bfe5573c00 RDI: ffff889db48ec2f0
> > [ 3052.191386] RBP: 0000000000000800 R08: 0000000000203bb0 R09: ffff889db16b2400
> > [ 3052.293364] R10: 0000000000000000 R11: ffff88a07fffde80 R12: ffff889db48ec2f0
> > [ 3052.293365] R13: 0000000000001000 R14: ffff889de82bc000 R15: ffff889e2b7bdcc8
> > [ 3052.293367] FS:  0000000000000000(0000) GS:ffff889ffba00000(0000) knlGS:0000000000000000
> > [ 3052.293368] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [ 3052.293369] CR2: 0000000000000510 CR3: 0000000002611001 CR4: 00000000007606e0
> > [ 3052.293370] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > [ 3052.293371] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > [ 3052.293372] PKRU: 55555554
> > [ 3052.293376] Call Trace:
> > [ 3052.402552]  btrfs_submit_compressed_write+0x137/0x390
>
> Isn't it the same crash that Chris Murphy reported?
>
> https://lore.kernel.org/linux-btrfs/CAJCQCtS_7vjBnqeDsedBQJYuE_ap+Xo6D=MXY=rOxf66oJZkrA@mail.gmail.com/
>

Yeah, looks like the same crash.

> > [ 3052.402558]  submit_compressed_extents+0x40f/0x4c0
> > [ 3052.422401]  btrfs_work_helper+0x246/0x5a0
> > [ 3052.422408]  process_one_work+0x200/0x570
> > [ 3052.438601]  ? process_one_work+0x180/0x570
> > [ 3052.438605]  worker_thread+0x4c/0x3e0
> > [ 3052.438614]  kthread+0x103/0x140
> > [ 3052.460735]  ? process_one_work+0x570/0x570
> > [ 3052.460737]  ? kthread_mod_delayed_work+0xc0/0xc0
> > [ 3052.460744]  ret_from_fork+0x24/0x30
> >
> > Fixes: 1a41802701ec ("btrfs: drop bio_set_dev where not needed")
> > Cc: David Sterba
> > Cc: Josef Bacik
> > Signed-off-by: Dennis Zhou
> > ---
> >  fs/btrfs/compression.c | 14 +++++---------
> >  fs/btrfs/volumes.c     | 18 ++++++++++++++----
> >  fs/btrfs/volumes.h     |  3 +++
> >  3 files changed, 22 insertions(+), 13 deletions(-)
> >
> > diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c
> > index 4ce81571f0cd..67d604fcb606 100644
> > --- a/fs/btrfs/compression.c
> > +++ b/fs/btrfs/compression.c
> > @@ -444,11 +444,9 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  	bio->bi_opf = REQ_OP_WRITE | write_flags;
> >  	bio->bi_private = cb;
> >  	bio->bi_end_io = end_compressed_bio_write;
> > -
> > -	if (blkcg_css) {
> > +	if (blkcg_css)
> >  		bio->bi_opf |= REQ_CGROUP_PUNT;
> > -		bio_associate_blkg_from_css(bio, blkcg_css);
> > -	}
> > +
> >  	refcount_set(&cb->pending_bios, 1);
> >
> >  	/* create and submit bios for the compressed pages */
> > @@ -481,7 +479,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  			BUG_ON(ret); /* -ENOMEM */
> >  		}
> >
> > -		ret = btrfs_map_bio(fs_info, bio, 0);
> > +		ret = __btrfs_map_bio(fs_info, bio, 0, blkcg_css);
> >  		if (ret) {
> >  			bio->bi_status = ret;
> >  			bio_endio(bio);
> > @@ -491,10 +489,8 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  			bio->bi_opf = REQ_OP_WRITE | write_flags;
> >  			bio->bi_private = cb;
> >  			bio->bi_end_io = end_compressed_bio_write;
> > -			if (blkcg_css) {
> > +			if (blkcg_css)
> >  				bio->bi_opf |= REQ_CGROUP_PUNT;
> > -				bio_associate_blkg_from_css(bio, blkcg_css);
> > -			}
> >  			bio_add_page(bio, page, PAGE_SIZE, 0);
> >  		}
> >  		if (bytes_left < PAGE_SIZE) {
> > @@ -515,7 +511,7 @@ blk_status_t btrfs_submit_compressed_write(struct inode *inode, u64 start,
> >  			BUG_ON(ret); /* -ENOMEM */
> >  		}
> >
> > -		ret = btrfs_map_bio(fs_info, bio, 0);
> > +		ret = __btrfs_map_bio(fs_info, bio, 0, blkcg_css);
> >  		if (ret) {
> >  			bio->bi_status = ret;
> >  			bio_endio(bio);
> > diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> > index 66377e678504..c68d93a1aae8 100644
> > --- a/fs/btrfs/volumes.c
> > +++ b/fs/btrfs/volumes.c
> > @@ -6240,7 +6240,8 @@ static void btrfs_end_bio(struct bio *bio)
> >  }
> >
> >  static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
> > -			      u64 physical, int dev_nr)
> > +			      u64 physical, int dev_nr,
> > +			      struct cgroup_subsys_state *blkcg_css)
> >  {
> >  	struct btrfs_device *dev = bbio->stripes[dev_nr].dev;
> >  	struct btrfs_fs_info *fs_info = bbio->fs_info;
> > @@ -6255,6 +6256,8 @@ static void submit_stripe_bio(struct btrfs_bio *bbio, struct bio *bio,
> >  		(u_long)dev->bdev->bd_dev, rcu_str_deref(dev->name), dev->devid,
> >  		bio->bi_iter.bi_size);
> >  	bio_set_dev(bio, dev->bdev);
> > +	if (blkcg_css)
> > +		bio_associate_blkg_from_css(bio, blkcg_css);
>
> At this point we know the bdev is the correct one, but is the blkcg_css
> different for each device or is there one for all?
>

It's a single blkcg_css for all devices. It would be different blkgs, as
those are the request_queue-blkcg pairs.

> Passing the blkcg_css is one way, one single point where the bio and css
> are associated, and probably the cleanest. I only don't like the need to
> pass the blkcg_css around, but that's probably a smaller deal than
> forgetting to set the bdev somewhere.

Actually, let me try spinning one other way of doing this and get back to
you in a bit.