From: Logan Gunthorpe <logang@deltatee.com>
To: Song Liu <songliubraving@fb.com>, Jens Axboe <axboe@kernel.dk>,
	linux-raid <linux-raid@vger.kernel.org>
Cc: David Sloan <david.sloan@eideticom.com>,
	Yu Kuai <yukuai3@huawei.com>,
	Mateusz Grzonka <mateusz.grzonka@intel.com>,
	Saurabh Sengar <ssengar@linux.microsoft.com>,
	XU pengfei <xupengfei@nfschina.com>,
	Guoqing Jiang <guoqing.jiang@linux.dev>,
	Zhou nan <zhounan@nfschina.com>
Subject: Re: [GIT PULL] md-next 20220921
Date: Wed, 21 Sep 2022 17:44:58 -0600
Message-ID: <80560b23-c124-c8ce-d66b-a7afe5b7fa41@deltatee.com>
In-Reply-To: <b347b8e9-d136-3430-5be0-b4b14d067dc4@deltatee.com>
On 2022-09-21 16:37, Logan Gunthorpe wrote:
>
>
> On 2022-09-21 15:33, Song Liu wrote:
>> Hi Jens,
>>
>> Please consider pulling the following changes for md-next on top of your
>> for-6.1/block branch (for-6.1/drivers branch doesn't exist yet).
>>
>> The major changes are:
>>
>> 1. Various raid5 fix and clean up, by Logan Gunthorpe and David Sloan.
>> 2. Raid10 performance optimization, by Yu Kuai.
>> 3. Generate CHANGE uevents for md device, by Mateusz Grzonka.
>
> I may have hit a bug with my tests on the latest md-next branch. Still
> trying to hit it again. The last tests I ran for several days with some
> patches on the previous md-next branch, but I didn't have Mateusz's
> changes, and it also looks like the branch was rebased today so it could
> be caused by either of those things. I'll let you know when I know more.

Yes, ok, I've found two separate issues, and both are fixed by reverting:

  21023a82bff7 ("md: generate CHANGE uevents for md device")

I suggest we drop that patch for this cycle so we can sort them out.

The issues are:

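1) The concrete issue comes when running mdadm test 01r1fail. I get the
kernel bugs at the end of this email. It seems we cannot call
kobject_uevent() in at least one of the contexts that md_new_event() is
called in, because kobject_uevent() can sleep and that caller is inside
an RCU read-side critical section (md_error() called from md_ioctl()
with rcu_read_lock() held, per the traces).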

2) With our custom test suite that creates and destroys arrays, adds and
removes disks, and runs data through them repeatedly, I randomly start
seeing these warnings:

  mdadm: Fail to create md0 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node

And then, very occasionally, I get that warning paired with this error:

  mdadm: unexpected failure opening /dev/md0

This stops the test because it fails to create an array. I also see a
lot of the same bugs as below, so it may be related.
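
For readers unfamiliar with the failure mode in issue 1, here is a
minimal userspace sketch of it, together with one common way to avoid
it: record the event inside the critical section and deliver it later
from a context that is allowed to sleep. This is only a model. Every
name in it (model_send_event(), model_flush_events(), and so on) is
hypothetical, none of it is kernel API, and it is not a fix proposed in
this thread.

```c
#include <stdbool.h>
#include <stdio.h>

/* Userspace model only: none of these names exist in the kernel. */

static int atomic_depth;      /* > 0 models "rcu_read_lock() is held" */
static int sleep_violations;  /* sleeps attempted while atomic */
static int events_delivered;  /* notifications actually sent */
static bool event_pending;    /* set by the deferred path */

static void model_atomic_enter(void) { atomic_depth++; }
static void model_atomic_exit(void)  { atomic_depth--; }

/*
 * Models a notifier that may sleep (it allocates memory and takes a
 * mutex). If called inside the modeled critical section, it records a
 * violation and sends nothing.
 */
static void model_send_event(void)
{
	if (atomic_depth > 0) {
		sleep_violations++;
		fprintf(stderr, "model: sleeping call in atomic context\n");
		return;
	}
	events_delivered++;
}

/* Direct path: unsafe whenever the caller is in a critical section. */
static void model_new_event_direct(void)
{
	model_send_event();
}

/* Deferred path: only notes that an event is due; it never sleeps. */
static void model_new_event_deferred(void)
{
	event_pending = true;
}

/* Runs later, outside any critical section (models a work item). */
static void model_flush_events(void)
{
	if (event_pending) {
		event_pending = false;
		model_send_event();
	}
}
```

Calling model_new_event_direct() between model_atomic_enter() and
model_atomic_exit() records a violation and delivers nothing. Calling
model_new_event_deferred() in the same place and then
model_flush_events() after model_atomic_exit() delivers the event with
no violation. Note that the deferred path coalesces several pending
events into one delivery, which may or may not be acceptable for a real
notifier.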
Logan
--
BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 853, name: mdadm
preempt_count: 0, expected: 0
RCU nest depth: 1, expected: 0
1 lock held by mdadm/853:
 #0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
CPU: 2 PID: 853 Comm: mdadm Not tainted 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x74
dump_stack+0x10/0x12
__might_resched.cold+0x146/0x17e
__might_sleep+0x66/0xc0
kmem_cache_alloc_trace+0x2f8/0x400
kobject_uevent_env+0x121/0xa30
kobject_uevent+0xb/0x10
md_new_event+0x6b/0x80
md_error+0x168/0x1b0
md_ioctl+0x989/0x2670
blkdev_ioctl+0x24d/0x450
__x64_sys_ioctl+0xc0/0x100
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0

=============================
[ BUG: Invalid wait context ]
6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680 Tainted: G W
-----------------------------
mdadm/853 is trying to lock:
ffffffff990e4950 (uevent_sock_mutex){+.+.}-{3:3}, at: kobject_uevent_env+0x460/0xa30
other info that might help us debug this:
context-{4:4}
1 lock held by mdadm/853:
 #0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
stack backtrace:
CPU: 2 PID: 853 Comm: mdadm Tainted: G W 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
Call Trace:
<TASK>
dump_stack_lvl+0x5a/0x74
dump_stack+0x10/0x12
__lock_acquire.cold+0x2f2/0x31a
lock_acquire+0x183/0x440
__mutex_lock+0x125/0xe20
mutex_lock_nested+0x1b/0x20
kobject_uevent_env+0x460/0xa30
kobject_uevent+0xb/0x10
md_new_event+0x6b/0x80
md_error+0x168/0x1b0
md_ioctl+0x989/0x2670
blkdev_ioctl+0x24d/0x450
__x64_sys_ioctl+0xc0/0x100
do_syscall_64+0x35/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0