All of lore.kernel.org
 help / color / mirror / Atom feed
From: Jan Kara <jack@suse.cz>
To: Logan Gunthorpe <logang@deltatee.com>
Cc: Guoqing Jiang <guoqing.jiang@linux.dev>,
	Donald Buczek <buczek@molgen.mpg.de>, Song Liu <song@kernel.org>,
	Jan Kara <jack@suse.cz>, Jens Axboe <axboe@kernel.dk>,
	Paolo Valente <paolo.valente@linaro.org>,
	linux-raid <linux-raid@vger.kernel.org>,
	Tejun Heo <tj@kernel.org>, Christoph Hellwig <hch@infradead.org>,
	linux-block@vger.kernel.org
Subject: Re: [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held
Date: Thu, 26 May 2022 13:53:36 +0200	[thread overview]
Message-ID: <20220526115336.2whsfdcuqwfzk5fk@quack3.lan> (raw)
In-Reply-To: <ae6d294a-e9ec-a81d-6085-a9341ed8a470@deltatee.com>

[Added couple of CCs since this seems to be an issue in the generic block
layer]

On Wed 25-05-22 12:22:06, Logan Gunthorpe wrote:
> On 2022-05-25 03:04, Guoqing Jiang wrote:
> > I would prefer to focus on block tree or md tree. With latest block tree
> > (commit 44d8538d7e7dbee7246acda3b706c8134d15b9cb), I get below
> > similar issue as Donald reported, it happened with the cmd (which did
> > work with 5.12 kernel).
> > 
> > vm79:~/mdadm> sudo ./test --dev=loop --tests=05r1-add-internalbitmap
> 
> Ok, so this test passes for me, but my VM was not running with bfq. It
> also seems we have layers upon layers of different bugs to untangle.
> Perhaps you can try the tests with bfq disabled to make progress on the
> other regression I reported.
> 
> If I enable bfq and set the loop devices to the bfq scheduler, then I
> hit the same bug as you and Donald. It's clearly a NULL pointer
> de-reference in the bfq code, which seems to be triggered on the
> partition read after mdadm opens a block device (not sure if it's the md
> device or the loop device but I suspect the latter seeing it's not going
> through any md code).
> 
> Simplifying things down a bit, the null pointer dereference can be
> triggered by creating an md device with loop devices that have bfq
> scheduler set:
> 
>   mdadm --create --run /dev/md0 --level=1 -n2 /dev/loop0 /dev/loop1
> 
> The crash occurs in bfq_bio_bfqg() with blkg_to_bfqg() returning NULL.
> It's hard to trace where the NULL comes from in there -- the code is a
> bit complex.
> 
> I've found that the bfq bug exists in current md-next (42b805af102) but
> did not trigger in the base tag of v5.18-rc3. Bisecting revealed the bug
> was introduced by:
> 
>   4e54a2493e58 ("bfq: Get rid of __bio_blkcg() usage")
> 
> Reverting that commit and the next commit (075a53b7) on top of md-next
> was confirmed to fix the bug.
> 
> I've copied Jan, Jens and Paolo who can hopefully help with this. A
> cleaned up stack trace follows this email for their benefit.

So I've debugged this. The crash happens on the very first bio submitted to
the md0 device. The problem is that this bio gets remapped to loop0 - this
happens through bio_alloc_clone() -> __bio_clone() which ends up calling
bio_clone_blkg_association(). Now the resulting bio is inconsistent - it's
dst_bio->bi_bdev is pointing to loop0 while dst_bio->bi_blkg is pointing to
blkcg_gq associated with md0 request queue. And this breaks BFQ because
when this bio is inserted to loop0 request queue, BFQ looks at
bio->bi_blkg->q (it is a bit more complex than that but this is the gist
of the problem), expects its data there but BFQ is not initialized for md0
request_queue.

Now I think this is a bug in __bio_clone() but the inconsistency in the bio
is very much what we asked bio_clone_blkg_association() to do so maybe I'm
missing something and bios that are associated with one bdev but pointing
to blkg of another bdev are fine and controllers are supposed to handle
that (although I'm not sure how should they do that). So I'm asking here
before I just go and delete bio_clone_blkg_association() from
__bio_clone()...

								Honza
-- 
Jan Kara <jack@suse.com>
SUSE Labs, CR

  parent reply	other threads:[~2022-05-26 11:53 UTC|newest]

Thread overview: 45+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-05-05  8:16 [PATCH 0/2] two fixes for md Guoqing Jiang
2022-05-05  8:16 ` [PATCH V3 1/2] md: don't unregister sync_thread with reconfig_mutex held Guoqing Jiang
2022-05-05 14:02   ` kernel test robot
2022-05-05 18:04   ` kernel test robot
2022-05-06  2:34     ` Guoqing Jiang
2022-05-06  2:34       ` Guoqing Jiang
2022-05-05  8:16 ` [PATCH 2/2] md: protect md_unregister_thread from reentrancy Guoqing Jiang
2022-05-09  6:39   ` Song Liu
2022-05-09  8:12     ` Guoqing Jiang
2022-05-06 11:36 ` [Update PATCH V3] md: don't unregister sync_thread with reconfig_mutex held Guoqing Jiang
2022-05-09  6:37   ` Song Liu
2022-05-09  8:09     ` Guoqing Jiang
2022-05-09  9:32       ` Wols Lists
2022-05-09 10:37         ` Guoqing Jiang
2022-05-09 11:19           ` Wols Lists
2022-05-09 11:26             ` Guoqing Jiang
2022-05-10  6:44       ` Song Liu
2022-05-10 12:01         ` Donald Buczek
2022-05-10 12:09           ` Guoqing Jiang
2022-05-10 12:35             ` Donald Buczek
2022-05-10 18:02               ` Song Liu
2022-05-11  8:10                 ` Guoqing Jiang
2022-05-11 21:45                   ` Song Liu
2022-05-20 18:27         ` Logan Gunthorpe
2022-05-21 18:23           ` Donald Buczek
2022-05-23  1:08             ` Guoqing Jiang
2022-05-23  5:41               ` Donald Buczek
2022-05-23  9:51                 ` Guoqing Jiang
2022-05-24 16:13                   ` Logan Gunthorpe
2022-05-25  9:04                     ` Guoqing Jiang
2022-05-25 18:22                       ` Logan Gunthorpe
2022-05-26  9:46                         ` Jan Kara
2022-05-26 11:53                         ` Jan Kara [this message]
2022-05-31  6:11                           ` Christoph Hellwig
2022-05-31  7:43                             ` Jan Kara
2022-05-30  9:55                   ` Guoqing Jiang
2022-05-30 16:35                     ` Logan Gunthorpe
2022-05-31  8:13                       ` Guoqing Jiang
2022-05-24 15:58                 ` Logan Gunthorpe
2022-05-24 18:16                   ` Song Liu
2022-05-25  9:17                 ` Guoqing Jiang
2022-05-24 15:51             ` Logan Gunthorpe
2022-06-02  8:12           ` Xiao Ni
2022-05-09  8:18   ` Donald Buczek
2022-05-09  8:48     ` Guoqing Jiang

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20220526115336.2whsfdcuqwfzk5fk@quack3.lan \
    --to=jack@suse.cz \
    --cc=axboe@kernel.dk \
    --cc=buczek@molgen.mpg.de \
    --cc=guoqing.jiang@linux.dev \
    --cc=hch@infradead.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-raid@vger.kernel.org \
    --cc=logang@deltatee.com \
    --cc=paolo.valente@linaro.org \
    --cc=song@kernel.org \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.