From: Logan Gunthorpe <logang@deltatee.com>
To: Song Liu <songliubraving@fb.com>, Jens Axboe <axboe@kernel.dk>,
	linux-raid <linux-raid@vger.kernel.org>
Cc: David Sloan <david.sloan@eideticom.com>,
	Yu Kuai <yukuai3@huawei.com>,
	Mateusz Grzonka <mateusz.grzonka@intel.com>,
	Saurabh Sengar <ssengar@linux.microsoft.com>,
	XU pengfei <xupengfei@nfschina.com>,
	Guoqing Jiang <guoqing.jiang@linux.dev>,
	Zhou nan <zhounan@nfschina.com>
Subject: Re: [GIT PULL] md-next 20220921
Date: Wed, 21 Sep 2022 17:44:58 -0600
Message-ID: <80560b23-c124-c8ce-d66b-a7afe5b7fa41@deltatee.com>
In-Reply-To: <b347b8e9-d136-3430-5be0-b4b14d067dc4@deltatee.com>



On 2022-09-21 16:37, Logan Gunthorpe wrote:
> 
> 
> On 2022-09-21 15:33, Song Liu wrote:
>> Hi Jens, 
>>
>> Please consider pulling the following changes for md-next on top of your
>> for-6.1/block branch (the for-6.1/drivers branch doesn't exist yet).
>>
>> The major changes are:
>>
>> 1. Various raid5 fixes and cleanups, by Logan Gunthorpe and David Sloan.
>> 2. Raid10 performance optimization, by Yu Kuai. 
>> 3. Generate CHANGE uevents for md device, by Mateusz Grzonka. 
> 
> I may have hit a bug in my tests on the latest md-next branch and am
> still trying to reproduce it. The tests I last ran for several days used
> some patches on the previous md-next branch, but they did not include
> Mateusz's changes. The branch also appears to have been rebased today,
> so the bug could be caused by either of those. I'll let you know when I
> know more.

Yes, ok, I've found two separate issues and both are fixed by reverting

   21023a82bff7 ("md: generate CHANGE uevents for md device")

I suggest we drop that patch for this cycle so we can sort them out.

The issues are:

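1) The concrete issue appears when running the mdadm test 01r1fail. The
resulting kernel BUGs are at the end of this email. It seems we cannot
call kobject_uevent() in at least one of the contexts in which
md_new_event() is called. md_error() is reached from md_ioctl() while
rcu_read_lock() is held, and kobject_uevent() sleeps (it allocates memory
and takes uevent_sock_mutex) inside that RCU read-side critical section.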

2) With our custom test suite that creates and destroys arrays, adds and
removes disks, and runs data through them repeatedly, I randomly start
seeing these warnings:

   mdadm: Fail to create md0 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node

Very occasionally, I also get that warning paired with this error:

   mdadm: unexpected failure opening /dev/md0

This stops the test because the array cannot be created. I also see many
of the same BUGs shown below, so this may be related.
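Issue 1 follows a general rule. Code inside an RCU read-side critical section must not call anything that can sleep. Here, md_ioctl() holds rcu_read_lock() while calling md_error(). kobject_uevent() can sleep because, as both traces show, it allocates memory (kmem_cache_alloc_trace()) and takes uevent_sock_mutex.

One common way to avoid this class of bug is to record that an event is needed and emit it later from a context that may sleep. The userspace C sketch below models that rule. All `model_*` names are hypothetical. It is not kernel code, and it is not necessarily how md should fix this or did fix it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model, NOT kernel code. rcu_depth stands in
 * for the rcu_read_lock() nesting depth ("RCU nest depth" in the trace). */
static int rcu_depth;
static bool event_pending; /* a CHANGE event is waiting to be sent */
static int events_sent;    /* number of uevents actually emitted */

static void model_rcu_read_lock(void) { rcu_depth++; }

static void model_rcu_read_unlock(void)
{
    assert(rcu_depth > 0);
    rcu_depth--;
}

/* Stand-in for might_sleep(): sleeping inside an RCU read-side
 * critical section is a bug, so fail loudly if we are in one. */
static void model_might_sleep(void) { assert(rcu_depth == 0); }

/* Stand-in for kobject_uevent(): in the traces it allocates memory and
 * takes uevent_sock_mutex, so it may sleep. */
static void model_send_uevent(void)
{
    model_might_sleep();
    events_sent++;
}

/* Deferred variant of an md_new_event()-like hook: it only records that
 * an event is needed, which is safe in any context. */
static void model_note_event(void) { event_pending = true; }

/* Run later from a context that may sleep (for example after the RCU
 * section ends) to emit the recorded event exactly once. */
static void model_flush_event(void)
{
    if (event_pending) {
        event_pending = false;
        model_send_uevent();
    }
}
```

**How the model maps to the traces:**

- **Direct call (the bug).** Calling `model_send_uevent()` between `model_rcu_read_lock()` and `model_rcu_read_unlock()` trips the assert. This mirrors the might_sleep and lockdep splats below.
- **Deferred path (the workaround).** The event is recorded inside the section and emitted only after the section has ended.
- **In the kernel.** The flush step could run from a work item queued with schedule_work(). That is an assumption about one possible fix, not something this thread establishes.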

Logan

--

 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 853, name: mdadm
 preempt_count: 0, expected: 0
 RCU nest depth: 1, expected: 0
 1 lock held by mdadm/853:
  #0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
 CPU: 2 PID: 853 Comm: mdadm Not tainted 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5a/0x74
  dump_stack+0x10/0x12
  __might_resched.cold+0x146/0x17e
  __might_sleep+0x66/0xc0
  kmem_cache_alloc_trace+0x2f8/0x400
  kobject_uevent_env+0x121/0xa30
  kobject_uevent+0xb/0x10
  md_new_event+0x6b/0x80
  md_error+0x168/0x1b0
  md_ioctl+0x989/0x2670
  blkdev_ioctl+0x24d/0x450
  __x64_sys_ioctl+0xc0/0x100
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

 =============================
 [ BUG: Invalid wait context ]
 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680 Tainted: G        W
 -----------------------------
 mdadm/853 is trying to lock:
 ffffffff990e4950 (uevent_sock_mutex){+.+.}-{3:3}, at: kobject_uevent_env+0x460/0xa30
 other info that might help us debug this:
 context-{4:4}
 1 lock held by mdadm/853:
  #0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
 stack backtrace:
 CPU: 2 PID: 853 Comm: mdadm Tainted: G        W 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5a/0x74
  dump_stack+0x10/0x12
  __lock_acquire.cold+0x2f2/0x31a
  lock_acquire+0x183/0x440
  __mutex_lock+0x125/0xe20
  mutex_lock_nested+0x1b/0x20
  kobject_uevent_env+0x460/0xa30
  kobject_uevent+0xb/0x10
  md_new_event+0x6b/0x80
  md_error+0x168/0x1b0
  md_ioctl+0x989/0x2670
  blkdev_ioctl+0x24d/0x450
  __x64_sys_ioctl+0xc0/0x100
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0



Thread overview: 4+ messages
2022-09-21 21:33 [GIT PULL] md-next 20220921 Song Liu
2022-09-21 22:37 ` Logan Gunthorpe
2022-09-21 23:44   ` Logan Gunthorpe [this message]
2022-09-22  0:40     ` Song Liu