* [GIT PULL] md-next 20220921
@ 2022-09-21 21:33 Song Liu
  2022-09-21 22:37 ` Logan Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Song Liu @ 2022-09-21 21:33 UTC (permalink / raw)
  To: Jens Axboe, linux-raid
  Cc: Logan Gunthorpe, David Sloan, Yu Kuai, Mateusz Grzonka,
	Saurabh Sengar, XU pengfei, Guoqing Jiang, Zhou nan

Hi Jens, 

Please consider pulling the following changes for md-next on top of your
for-6.1/block branch (for-6.1/drivers branch doesn't exist yet). 

The major changes are:

1. Various raid5 fixes and cleanups, by Logan Gunthorpe and David Sloan.
2. Raid10 performance optimization, by Yu Kuai. 
3. Generate CHANGE uevents for md device, by Mateusz Grzonka. 

Thanks,
Song


The following changes since commit 8c5035dfbb9475b67c82b3fdb7351236525bf52b:

  blk-wbt: call rq_qos_add() after wb_normal is initialized (2022-09-21 08:36:13 -0600)

are available in the Git repository at:

  https://git.kernel.org/pub/scm/linux/kernel/git/song/md.git md-next

for you to fetch changes up to 9859e343daaf8b08bbb4bed63a378a05535bcb47:

  md: Fix spelling mistake in comments of r5l_log (2022-09-21 14:22:17 -0700)

----------------------------------------------------------------
David Sloan (1):
      md/raid5: Remove unnecessary bio_put() in raid5_read_one_chunk()

Guoqing Jiang (1):
      md/raid10: fix compile warning

Logan Gunthorpe (7):
      md/raid5: Refactor raid5_get_active_stripe()
      md/raid5: Drop extern on function declarations in raid5.h
      md/raid5: Cleanup prototype of raid5_get_active_stripe()
      md/raid5: Don't read ->active_stripes if it's not needed
      md/raid5: Ensure stripe_fill happens on non-read IO with journal
      md: Remove extra mddev_get() in md_seq_start()
      md/raid5: Wait for MD_SB_CHANGE_PENDING in raid5d

Mateusz Grzonka (1):
      md: generate CHANGE uevents for md device

Saurabh Sengar (1):
      md: Replace snprintf with scnprintf

Song Liu (1):
      Merge branch 'md-next-raid10-optimize' into md-next

XU pengfei (1):
      md/raid5: Fix spelling mistakes in comments

Yu Kuai (5):
      md/raid10: factor out code from wait_barrier() to stop_waiting_barrier()
      md/raid10: don't modify 'nr_waitng' in wait_barrier() for the case nowait
      md/raid10: prevent unnecessary calls to wake_up() in fast path
      md/raid10: fix improper BUG_ON() in raise_barrier()
      md/raid10: convert resync_lock to use seqlock

Zhou nan (1):
      md: Fix spelling mistake in comments of r5l_log

 drivers/md/md.c          |  32 ++++++++++++++++----------------
 drivers/md/md.h          |   2 +-
 drivers/md/raid0.c       |   2 +-
 drivers/md/raid10.c      | 153 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------------------
 drivers/md/raid10.h      |   2 +-
 drivers/md/raid5-cache.c |  11 ++++++-----
 drivers/md/raid5.c       | 149 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++--------------------------------------------------------------------
 drivers/md/raid5.h       |  32 ++++++++++++++++++++------------
 8 files changed, 223 insertions(+), 160 deletions(-)


* Re: [GIT PULL] md-next 20220921
  2022-09-21 21:33 [GIT PULL] md-next 20220921 Song Liu
@ 2022-09-21 22:37 ` Logan Gunthorpe
  2022-09-21 23:44   ` Logan Gunthorpe
  0 siblings, 1 reply; 4+ messages in thread
From: Logan Gunthorpe @ 2022-09-21 22:37 UTC (permalink / raw)
  To: Song Liu, Jens Axboe, linux-raid
  Cc: David Sloan, Yu Kuai, Mateusz Grzonka, Saurabh Sengar,
	XU pengfei, Guoqing Jiang, Zhou nan



On 2022-09-21 15:33, Song Liu wrote:
> Hi Jens, 
> 
> Please consider pulling the following changes for md-next on top of your
> for-6.1/block branch (for-6.1/drivers branch doesn't exist yet). 
> 
> The major changes are:
> 
> 1. Various raid5 fixes and cleanups, by Logan Gunthorpe and David Sloan.
> 2. Raid10 performance optimization, by Yu Kuai. 
> 3. Generate CHANGE uevents for md device, by Mateusz Grzonka. 

I may have hit a bug with my tests on the latest md-next branch. Still
trying to hit it again. The last tests I ran for several days with some
patches on the previous md-next branch, but I didn't have Mateusz's
changes, and it also looks like the branch was rebased today so it could
be caused by either of those things. I'll let you know when I know more.

Logan


* Re: [GIT PULL] md-next 20220921
  2022-09-21 22:37 ` Logan Gunthorpe
@ 2022-09-21 23:44   ` Logan Gunthorpe
  2022-09-22  0:40     ` Song Liu
  0 siblings, 1 reply; 4+ messages in thread
From: Logan Gunthorpe @ 2022-09-21 23:44 UTC (permalink / raw)
  To: Song Liu, Jens Axboe, linux-raid
  Cc: David Sloan, Yu Kuai, Mateusz Grzonka, Saurabh Sengar,
	XU pengfei, Guoqing Jiang, Zhou nan



On 2022-09-21 16:37, Logan Gunthorpe wrote:
> 
> 
> On 2022-09-21 15:33, Song Liu wrote:
>> Hi Jens, 
>>
>> Please consider pulling the following changes for md-next on top of your
>> for-6.1/block branch (for-6.1/drivers branch doesn't exist yet). 
>>
>> The major changes are:
>>
>> 1. Various raid5 fixes and cleanups, by Logan Gunthorpe and David Sloan.
>> 2. Raid10 performance optimization, by Yu Kuai. 
>> 3. Generate CHANGE uevents for md device, by Mateusz Grzonka. 
> 
> I may have hit a bug with my tests on the latest md-next branch. Still
> trying to hit it again. The last tests I ran for several days with some
> patches on the previous md-next branch, but I didn't have Mateusz's
> changes, and it also looks like the branch was rebased today so it could
> be caused by either of those things. I'll let you know when I know more.

Yes, ok, I've found two separate issues and both are fixed by reverting

   21023a82bff7 ("md: generate CHANGE uevents for md device")

I suggest we drop that patch for this cycle so we can sort them out.

The issues are:

1) The concrete issue comes when running mdadm test 01r1fail. I get the
kernel bugs at the end of this email. It seems we cannot call
kobject_uevent() in at least one of the contexts that md_new_event() is
called in because it sleeps in a critical section.

2) With our custom test suite that creates and destroys arrays, adds and
removes disks, and runs data through them repeatedly, I randomly start
seeing these warnings:

   mdadm: Fail to create md0 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node

And then very occasionally get that warning paired with this error:

   mdadm: unexpected failure opening /dev/md0

Which stops the test because it fails to create an array. I also see a
lot of the same bugs as below so it may be related.

Logan

--

 BUG: sleeping function called from invalid context at include/linux/sched/mm.h:274
 in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 853, name: mdadm
 preempt_count: 0, expected: 0
 RCU nest depth: 1, expected: 0
 1 lock held by mdadm/853:
  #0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
 CPU: 2 PID: 853 Comm: mdadm Not tainted 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5a/0x74
  dump_stack+0x10/0x12
  __might_resched.cold+0x146/0x17e
  __might_sleep+0x66/0xc0
  kmem_cache_alloc_trace+0x2f8/0x400
  kobject_uevent_env+0x121/0xa30
  kobject_uevent+0xb/0x10
  md_new_event+0x6b/0x80
  md_error+0x168/0x1b0
  md_ioctl+0x989/0x2670
  blkdev_ioctl+0x24d/0x450
  __x64_sys_ioctl+0xc0/0x100
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

 =============================
 [ BUG: Invalid wait context ]
 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680 Tainted: G        W
 -----------------------------
 mdadm/853 is trying to lock:
 ffffffff990e4950 (uevent_sock_mutex){+.+.}-{3:3}, at: kobject_uevent_env+0x460/0xa30
 other info that might help us debug this:
 context-{4:4}
 1 lock held by mdadm/853:
  #0: ffffffff98c623c0 (rcu_read_lock){....}-{1:2}, at: md_ioctl+0x8f0/0x2670
 stack backtrace:
 CPU: 2 PID: 853 Comm: mdadm Tainted: G        W 6.0.0-rc2-eid-vmlocalyes-dbg-00096-g9859e343daaf #2680
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.14.0-2 04/01/2014
 Call Trace:
  <TASK>
  dump_stack_lvl+0x5a/0x74
  dump_stack+0x10/0x12
  __lock_acquire.cold+0x2f2/0x31a
  lock_acquire+0x183/0x440
  __mutex_lock+0x125/0xe20
  mutex_lock_nested+0x1b/0x20
  kobject_uevent_env+0x460/0xa30
  kobject_uevent+0xb/0x10
  md_new_event+0x6b/0x80
  md_error+0x168/0x1b0
  md_ioctl+0x989/0x2670
  blkdev_ioctl+0x24d/0x450
  __x64_sys_ioctl+0xc0/0x100
  do_syscall_64+0x35/0x80
  entry_SYSCALL_64_after_hwframe+0x46/0xb0
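
Note on issue 1: both traces above show md_ioctl() holding rcu_read_lock
when md_error() reaches md_new_event() and then kobject_uevent(). The
first trace shows kobject_uevent_env() allocating memory and the second
shows it taking uevent_sock_mutex, so it may sleep. One common way to
avoid this class of problem is to only record the event inside the
critical section and send it later from a context that may sleep. The
thread does not say how the patch will be reworked, so the following is
only a userspace sketch of that idea. Every model_* name is hypothetical
and none of them is a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; these are NOT kernel functions. */
static int rcu_depth;        /* models "RCU nest depth" in the splat */
static int sleep_violations; /* models "sleeping function called from invalid context" */
static int events_sent;      /* number of uevents actually emitted */
static bool event_pending;   /* set when an event was deferred */

static void model_rcu_read_lock(void)
{
	rcu_depth++;
}

static void model_rcu_read_unlock(void)
{
	assert(rcu_depth > 0);
	rcu_depth--;
}

/* Stand-in for a call that may sleep, such as kobject_uevent(). */
static void model_send_uevent(void)
{
	if (rcu_depth > 0)
		sleep_violations++; /* a debug kernel would report a BUG here */
	events_sent++;
}

/* Direct variant: sends the event from whatever context the caller is in. */
static void model_new_event_direct(void)
{
	model_send_uevent();
}

/* Deferred variant: only records that an event is pending. */
static void model_new_event_deferred(void)
{
	event_pending = true;
}

/* Runs later, outside the critical section (a work item in a real kernel). */
static void model_flush_events(void)
{
	if (event_pending) {
		event_pending = false;
		model_send_uevent();
	}
}
```

In this model, calling model_new_event_direct() between
model_rcu_read_lock() and model_rcu_read_unlock() increments
sleep_violations, while calling model_new_event_deferred() there and
model_flush_events() after the unlock sends the event without a
violation.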




* Re: [GIT PULL] md-next 20220921
  2022-09-21 23:44   ` Logan Gunthorpe
@ 2022-09-22  0:40     ` Song Liu
  0 siblings, 0 replies; 4+ messages in thread
From: Song Liu @ 2022-09-22  0:40 UTC (permalink / raw)
  To: Logan Gunthorpe
  Cc: Jens Axboe, linux-raid, David Sloan, Yu Kuai, Mateusz Grzonka,
	Saurabh Sengar, XU pengfei, Guoqing Jiang, Zhou nan


Hi Logan, 

> On Sep 21, 2022, at 4:44 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> 
> On 2022-09-21 16:37, Logan Gunthorpe wrote:
>> 
>> 
>> On 2022-09-21 15:33, Song Liu wrote:
>>> Hi Jens, 
>>> 
>>> Please consider pulling the following changes for md-next on top of your
>>> for-6.1/block branch (for-6.1/drivers branch doesn't exist yet). 
>>> 
>>> The major changes are:
>>> 
>>> 1. Various raid5 fixes and cleanups, by Logan Gunthorpe and David Sloan.
>>> 2. Raid10 performance optimization, by Yu Kuai. 
>>> 3. Generate CHANGE uevents for md device, by Mateusz Grzonka. 
>> 
>> I may have hit a bug with my tests on the latest md-next branch. Still
>> trying to hit it again. The last tests I ran for several days with some
>> patches on the previous md-next branch, but I didn't have Mateusz's
>> changes, and it also looks like the branch was rebased today so it could
>> be caused by either of those things. I'll let you know when I know more.
> 
> Yes, ok, I've found two separate issues and both are fixed by reverting
> 
>   21023a82bff7 ("md: generate CHANGE uevents for md device")
> 
> I suggest we drop that patch for this cycle so we can sort them out.
> 
> The issues are:
> 
> 1) The concrete issue comes when running mdadm test 01r1fail. I get the
> kernel bugs at the end of this email. It seems we cannot call
> kobject_uevent() in at least one of the contexts that md_new_event() is
> called in because it sleeps in a critical section.
> 
> 2) With our custom test suite that creates and destroys arrays, adds and
> removes disks, and runs data through them repeatedly, I randomly start
> seeing these warnings:
> 
>   mdadm: Fail to create md0 when using /sys/module/md_mod/parameters/new_array, fallback to creation via node
> 
> And then very occasionally get that warning paired with this error:
> 
>   mdadm: unexpected failure opening /dev/md0
> 
> Which stops the test because it fails to create an array. I also see a
> lot of the same bugs as below so it may be related.

Thanks for testing and debugging these issues. I also see issue 1). 

Jens, please ignore this pull request. I will send v2 later. 

Song