From: john.p.donnelly@oracle.com
To: Waiman Long <longman@redhat.com>,
chenguanyou <chenguanyou9338@gmail.com>,
gregkh@linuxfoundation.org
Cc: dave@stgolabs.net, hdanton@sina.com,
linux-kernel@vger.kernel.org, mazhenhua@xiaomi.com,
mingo@redhat.com, peterz@infradead.org, quic_aiquny@quicinc.com,
will@kernel.org, sashal@kernel.org
Subject: Re: [PATCH v5] locking/rwsem: Make handoff bit handling more consistent
Date: Wed, 20 Apr 2022 08:55:12 -0500
Message-ID: <eae41639-cbca-4ea6-417f-f9b34a7138ea@oracle.com>
In-Reply-To: <2b6ed542-b3e0-1a87-33ac-d52fc0e0339c@oracle.com>
On 4/12/22 11:28 AM, john.p.donnelly@oracle.com wrote:
> On 4/11/22 4:07 PM, Waiman Long wrote:
>>
>> On 4/11/22 17:03, john.p.donnelly@oracle.com wrote:
>>>
>>>>>
>>>>> I have reached out to Waiman and he suggested this for our next
>>>>> test pass:
>>>>>
>>>>>
>>>>> 1ee326196c6658 locking/rwsem: Always try to wake waiters in
>>>>> out_nolock path
>>>>
>>>> Does this commit help to avoid the lockup problem?
>>>>
>>>> Commit 1ee326196c6658 fixes a potential missed wakeup problem when a
>>>> reader first in the wait queue is interrupted out without acquiring
>>>> the lock. It is actually not a fix for commit d257cc8cb8d5. However,
>>>> this commit changes the out_nolock path behavior of writers by
>>>> leaving the handoff bit set when the wait queue isn't empty. That
>>>> likely makes the missed wakeup problem easier to reproduce.
>>>>
>>>> Cheers,
>>>> Longman
>>>>
>>>
>>> Hi,
>>>
>>>
>>> We are testing now
>>>
>>> ETA for fio soak test completion is ~15hr from now.
>>>
>>> I wanted to share the stack traces for future reference + occurrences.
>>>
>> I am looking forward to your testing results tomorrow.
>>
>> Cheers,
>> Longman
>>
> Hi
>
> Our 24hr fio soak test with :
>
> 1ee326196c6658 locking/rwsem: Always try to wake waiters in out_nolock
> path
>
>
> applied to 5.15.30 passed.
>
> I suggest you append 1ee326196c6658 with :
>
>
> cc: stable
>
> Fixes: d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more
> consistent")
>
>
> I'll leave the implementation details up to the core maintainers how to
> do that ;-)
>
> ...
>
> Thank you
>
> John.
Hi,

We have observed another panic with

1ee326196c6658 locking/rwsem: Always try to wake waiters in out_nolock path

applied to 5.15.30:

PID: 3789 TASK: ffff900fc409b300 CPU: 29 COMMAND: "dio/dm-0"
#0 [fffffe00006bce50] crash_nmi_callback at ffffffff97c772c3
#1 [fffffe00006bce58] nmi_handle at ffffffff97c40778
#2 [fffffe00006bcea0] default_do_nmi at ffffffff988161e2
#3 [fffffe00006bcec8] exc_nmi at ffffffff9881648d
#4 [fffffe00006bcef0] end_repeat_nmi at ffffffff98a0153b
[exception RIP: _raw_spin_lock_irq+35]
RIP: ffffffff98827333 RSP: ffffa9320917fc78 RFLAGS: 00000046
RAX: 0000000000000000 RBX: ffff900fc409b300 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
RBP: ffffa9320917fd20 R8: 0000000000000000 R9: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffff90006259546c
R13: ffffa9320917fcb0 R14: ffff900062595458 R15: 0000000000000000
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
--- <NMI exception stack> ---
#5 [ffffa9320917fc78] _raw_spin_lock_irq at ffffffff98827333
#6 [ffffa9320917fc78] rwsem_down_write_slowpath at ffffffff97d25d49
#7 [ffffa9320917fd28] ext4_map_blocks at ffffffffc104b6dc [ext4]
#8 [ffffa9320917fd98] ext4_convert_unwritten_extents at ffffffffc10369e0 [ext4]
#9 [ffffa9320917fdf0] ext4_dio_write_end_io at ffffffffc103b2aa [ext4]
#10 [ffffa9320917fe18] iomap_dio_complete at ffffffff98013f45
#11 [ffffa9320917fe48] iomap_dio_complete_work at ffffffff98014047
#12 [ffffa9320917fe60] process_one_work at ffffffff97cd9191
#13 [ffffa9320917fea8] rescuer_thread at ffffffff97cd991b
#14 [ffffa9320917ff10] kthread at ffffffff97ce11f7
#15 [ffffa9320917ff50] ret_from_fork at ffffffff97c04cf2
crash>

The failure is observed while running the "fio test suite" as a 24-hour soak
test on an LVM volume composed of four NVMe devices on a 72-core Intel
server. The test cycles through a variety of file-system types.

This kernel has these commits:

1ee326196c6658 ("locking/rwsem: Always try to wake waiters in out_nolock path")
d257cc8cb8d5 ("locking/rwsem: Make handoff bit handling more consistent")

In earlier testing I had reverted d257cc8cb8d5 and did not observe these
panics. I still feel d257cc8cb8d5 is the root cause.

Thank you,
John.