* [PATCHv2 dlm/next 0/2] fs: dlm: some callback and lowcomm fixes
@ 2022-07-25 19:53 ` Alexander Aring
0 siblings, 0 replies; 8+ messages in thread
From: Alexander Aring @ 2022-07-25 19:53 UTC (permalink / raw)
To: teigland; +Cc: cluster-devel, stable, aahringo
Hi,
I currently look a little bit deeper into the callback handling of dlm.
I have some local branches which does more some rework and moving away
from the lkb_callbacks[] array per lkb and using a queue for handling
callbacks. However those are issues which I currently found for now
and should be fixed.
- Alex
changes since v2:
- drop patches 2/3 and 3/3 as it looks okay. Sorry about the noise.
- change commit messages regarding v2 changes.
- add patch "fs: dlm: fix race in lowcomms"
Alexander Aring (2):
fs: dlm: fix race in lowcomms
fs: dlm: fix race between test_bit() and queue_work()
fs/dlm/ast.c | 6 ++++--
fs/dlm/lowcomms.c | 4 ++++
2 files changed, 8 insertions(+), 2 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Cluster-devel] [PATCHv2 dlm/next 0/2] fs: dlm: some callback and lowcomm fixes
@ 2022-07-25 19:53 ` Alexander Aring
0 siblings, 0 replies; 8+ messages in thread
From: Alexander Aring @ 2022-07-25 19:53 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
I currently look a little bit deeper into the callback handling of dlm.
I have some local branches which does more some rework and moving away
from the lkb_callbacks[] array per lkb and using a queue for handling
callbacks. However those are issues which I currently found for now
and should be fixed.
- Alex
changes since v2:
- drop patches 2/3 and 3/3 as it looks okay. Sorry about the noise.
- change commit messages regarding v2 changes.
- add patch "fs: dlm: fix race in lowcomms"
Alexander Aring (2):
fs: dlm: fix race in lowcomms
fs: dlm: fix race between test_bit() and queue_work()
fs/dlm/ast.c | 6 ++++--
fs/dlm/lowcomms.c | 4 ++++
2 files changed, 8 insertions(+), 2 deletions(-)
--
2.31.1
^ permalink raw reply [flat|nested] 8+ messages in thread
* [PATCHv2 dlm/next 1/2] fs: dlm: fix race in lowcomms
2022-07-25 19:53 ` [Cluster-devel] " Alexander Aring
@ 2022-07-25 19:53 ` Alexander Aring
-1 siblings, 0 replies; 8+ messages in thread
From: Alexander Aring @ 2022-07-25 19:53 UTC (permalink / raw)
To: teigland; +Cc: cluster-devel, stable, aahringo
This patch fixes a race between queue_work() in
_dlm_lowcomms_commit_msg() and srcu_read_unlock(). The queue_work() can
take the final reference of a dlm_msg and so msg->idx can contain
garbage which is signaled by the following warning:
[ 676.237050] ------------[ cut here ]------------
[ 676.237052] WARNING: CPU: 0 PID: 1060 at include/linux/srcu.h:189 dlm_lowcomms_commit_msg+0x41/0x50
[ 676.238945] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support qxl kvm_intel drm_ttm_helper vmw_vsock_virtio_transport kvm vmw_vsock_virtio_transport_common ttm irqbypass crc32_pclmul joydev crc32c_intel serio_raw drm_kms_helper vsock virtio_scsi virtio_console virtio_balloon snd_pcm drm syscopyarea sysfillrect sysimgblt snd_timer fb_sys_fops i2c_i801 lpc_ich snd i2c_smbus soundcore pcspkr
[ 676.244227] CPU: 0 PID: 1060 Comm: lock_torture_wr Not tainted 5.19.0-rc3+ #1546
[ 676.245216] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+033991b0 04/01/2014
[ 676.246460] RIP: 0010:dlm_lowcomms_commit_msg+0x41/0x50
[ 676.247132] Code: fe ff ff ff 75 24 48 c7 c6 bd 0f 49 bb 48 c7 c7 38 7c 01 bd e8 00 e7 ca ff 89 de 48 c7 c7 60 78 01 bd e8 42 3d cd ff 5b 5d c3 <0f> 0b eb d8 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
[ 676.249253] RSP: 0018:ffffa401c18ffc68 EFLAGS: 00010282
[ 676.249855] RAX: 0000000000000001 RBX: 00000000ffff8b76 RCX: 0000000000000006
[ 676.250713] RDX: 0000000000000000 RSI: ffffffffbccf3a10 RDI: ffffffffbcc7b62e
[ 676.251610] RBP: ffffa401c18ffc70 R08: 0000000000000001 R09: 0000000000000001
[ 676.252481] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000005
[ 676.253421] R13: ffff8b76786ec370 R14: ffff8b76786ec370 R15: ffff8b76786ec480
[ 676.254257] FS: 0000000000000000(0000) GS:ffff8b7777800000(0000) knlGS:0000000000000000
[ 676.255239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 676.255897] CR2: 00005590205d88b8 CR3: 000000017656c003 CR4: 0000000000770ee0
[ 676.256734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 676.257567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 676.258397] PKRU: 55555554
[ 676.258729] Call Trace:
[ 676.259063] <TASK>
[ 676.259354] dlm_midcomms_commit_mhandle+0xcc/0x110
[ 676.259964] queue_bast+0x8b/0xb0
[ 676.260423] grant_pending_locks+0x166/0x1b0
[ 676.261007] _unlock_lock+0x75/0x90
[ 676.261469] unlock_lock.isra.57+0x62/0xa0
[ 676.262009] dlm_unlock+0x21e/0x330
[ 676.262457] ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
[ 676.263183] torture_unlock+0x5a/0x90 [dlm_locktorture]
[ 676.263815] ? preempt_count_sub+0xba/0x100
[ 676.264361] ? complete+0x1d/0x60
[ 676.264777] lock_torture_writer+0xb8/0x150 [dlm_locktorture]
[ 676.265555] kthread+0x10a/0x130
[ 676.266007] ? kthread_complete_and_exit+0x20/0x20
[ 676.266616] ret_from_fork+0x22/0x30
[ 676.267097] </TASK>
[ 676.267381] irq event stamp: 9579855
[ 676.267824] hardirqs last enabled at (9579863): [<ffffffffbb14e6f8>] __up_console_sem+0x58/0x60
[ 676.268896] hardirqs last disabled at (9579872): [<ffffffffbb14e6dd>] __up_console_sem+0x3d/0x60
[ 676.270008] softirqs last enabled at (9579798): [<ffffffffbc200349>] __do_softirq+0x349/0x4c7
[ 676.271438] softirqs last disabled at (9579897): [<ffffffffbb0d54c0>] irq_exit_rcu+0xb0/0xf0
[ 676.272796] ---[ end trace 0000000000000000 ]---
I reproduced this warning with dlm_locktorture test which is currently
not upstream. However this patch fix the issue by make a additional
refcount between dlm_lowcomms_new_msg() and dlm_lowcomms_commit_msg().
In case of the race the kref_put() in dlm_lowcomms_commit_msg() will be
the final put.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lowcomms.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index a4e84e8d94c8..59f64c596233 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1336,6 +1336,8 @@ struct dlm_msg *dlm_lowcomms_new_msg(int nodeid, int len, gfp_t allocation,
return NULL;
}
+ /* for dlm_lowcomms_commit_msg() */
+ kref_get(&msg->ref);
/* we assume if successful commit must called */
msg->idx = idx;
return msg;
@@ -1375,6 +1377,8 @@ void dlm_lowcomms_commit_msg(struct dlm_msg *msg)
{
_dlm_lowcomms_commit_msg(msg);
srcu_read_unlock(&connections_srcu, msg->idx);
+ /* because dlm_lowcomms_new_msg() */
+ kref_put(&msg->ref, dlm_msg_release);
}
#endif
--
2.31.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Cluster-devel] [PATCHv2 dlm/next 1/2] fs: dlm: fix race in lowcomms
@ 2022-07-25 19:53 ` Alexander Aring
0 siblings, 0 replies; 8+ messages in thread
From: Alexander Aring @ 2022-07-25 19:53 UTC (permalink / raw)
To: cluster-devel.redhat.com
This patch fixes a race between queue_work() in
_dlm_lowcomms_commit_msg() and srcu_read_unlock(). The queue_work() can
take the final reference of a dlm_msg and so msg->idx can contain
garbage which is signaled by the following warning:
[ 676.237050] ------------[ cut here ]------------
[ 676.237052] WARNING: CPU: 0 PID: 1060 at include/linux/srcu.h:189 dlm_lowcomms_commit_msg+0x41/0x50
[ 676.238945] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support qxl kvm_intel drm_ttm_helper vmw_vsock_virtio_transport kvm vmw_vsock_virtio_transport_common ttm irqbypass crc32_pclmul joydev crc32c_intel serio_raw drm_kms_helper vsock virtio_scsi virtio_console virtio_balloon snd_pcm drm syscopyarea sysfillrect sysimgblt snd_timer fb_sys_fops i2c_i801 lpc_ich snd i2c_smbus soundcore pcspkr
[ 676.244227] CPU: 0 PID: 1060 Comm: lock_torture_wr Not tainted 5.19.0-rc3+ #1546
[ 676.245216] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+033991b0 04/01/2014
[ 676.246460] RIP: 0010:dlm_lowcomms_commit_msg+0x41/0x50
[ 676.247132] Code: fe ff ff ff 75 24 48 c7 c6 bd 0f 49 bb 48 c7 c7 38 7c 01 bd e8 00 e7 ca ff 89 de 48 c7 c7 60 78 01 bd e8 42 3d cd ff 5b 5d c3 <0f> 0b eb d8 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
[ 676.249253] RSP: 0018:ffffa401c18ffc68 EFLAGS: 00010282
[ 676.249855] RAX: 0000000000000001 RBX: 00000000ffff8b76 RCX: 0000000000000006
[ 676.250713] RDX: 0000000000000000 RSI: ffffffffbccf3a10 RDI: ffffffffbcc7b62e
[ 676.251610] RBP: ffffa401c18ffc70 R08: 0000000000000001 R09: 0000000000000001
[ 676.252481] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000005
[ 676.253421] R13: ffff8b76786ec370 R14: ffff8b76786ec370 R15: ffff8b76786ec480
[ 676.254257] FS: 0000000000000000(0000) GS:ffff8b7777800000(0000) knlGS:0000000000000000
[ 676.255239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 676.255897] CR2: 00005590205d88b8 CR3: 000000017656c003 CR4: 0000000000770ee0
[ 676.256734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 676.257567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 676.258397] PKRU: 55555554
[ 676.258729] Call Trace:
[ 676.259063] <TASK>
[ 676.259354] dlm_midcomms_commit_mhandle+0xcc/0x110
[ 676.259964] queue_bast+0x8b/0xb0
[ 676.260423] grant_pending_locks+0x166/0x1b0
[ 676.261007] _unlock_lock+0x75/0x90
[ 676.261469] unlock_lock.isra.57+0x62/0xa0
[ 676.262009] dlm_unlock+0x21e/0x330
[ 676.262457] ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
[ 676.263183] torture_unlock+0x5a/0x90 [dlm_locktorture]
[ 676.263815] ? preempt_count_sub+0xba/0x100
[ 676.264361] ? complete+0x1d/0x60
[ 676.264777] lock_torture_writer+0xb8/0x150 [dlm_locktorture]
[ 676.265555] kthread+0x10a/0x130
[ 676.266007] ? kthread_complete_and_exit+0x20/0x20
[ 676.266616] ret_from_fork+0x22/0x30
[ 676.267097] </TASK>
[ 676.267381] irq event stamp: 9579855
[ 676.267824] hardirqs last enabled at (9579863): [<ffffffffbb14e6f8>] __up_console_sem+0x58/0x60
[ 676.268896] hardirqs last disabled at (9579872): [<ffffffffbb14e6dd>] __up_console_sem+0x3d/0x60
[ 676.270008] softirqs last enabled at (9579798): [<ffffffffbc200349>] __do_softirq+0x349/0x4c7
[ 676.271438] softirqs last disabled at (9579897): [<ffffffffbb0d54c0>] irq_exit_rcu+0xb0/0xf0
[ 676.272796] ---[ end trace 0000000000000000 ]---
I reproduced this warning with dlm_locktorture test which is currently
not upstream. However this patch fix the issue by make a additional
refcount between dlm_lowcomms_new_msg() and dlm_lowcomms_commit_msg().
In case of the race the kref_put() in dlm_lowcomms_commit_msg() will be
the final put.
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/lowcomms.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/fs/dlm/lowcomms.c b/fs/dlm/lowcomms.c
index a4e84e8d94c8..59f64c596233 100644
--- a/fs/dlm/lowcomms.c
+++ b/fs/dlm/lowcomms.c
@@ -1336,6 +1336,8 @@ struct dlm_msg *dlm_lowcomms_new_msg(int nodeid, int len, gfp_t allocation,
return NULL;
}
+ /* for dlm_lowcomms_commit_msg() */
+ kref_get(&msg->ref);
/* we assume if successful commit must called */
msg->idx = idx;
return msg;
@@ -1375,6 +1377,8 @@ void dlm_lowcomms_commit_msg(struct dlm_msg *msg)
{
_dlm_lowcomms_commit_msg(msg);
srcu_read_unlock(&connections_srcu, msg->idx);
+ /* because dlm_lowcomms_new_msg() */
+ kref_put(&msg->ref, dlm_msg_release);
}
#endif
--
2.31.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCHv2 dlm/next 2/2] fs: dlm: fix race between test_bit() and queue_work()
2022-07-25 19:53 ` [Cluster-devel] " Alexander Aring
@ 2022-07-25 19:53 ` Alexander Aring
-1 siblings, 0 replies; 8+ messages in thread
From: Alexander Aring @ 2022-07-25 19:53 UTC (permalink / raw)
To: teigland; +Cc: cluster-devel, stable, aahringo
This patch will fix a race by surround ls_cb_mutex in set_bit() and the
test_bit() and it's conditional code blocks for LSFL_CB_DELAY.
The function dlm_callback_stop() has the idea to stop all callbacks and
flush all currently queued onces. The set_bit() is not enough because
there can be still queue_work() around after the workqueue was flushed.
To avoid queue_work() after set_bit() we surround both by ls_cb_mutex
lock.
Cc: stable@vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/ast.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 19ef136f9e4f..a44cc42b6317 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -200,13 +200,13 @@ void dlm_add_cb(struct dlm_lkb *lkb, uint32_t flags, int mode, int status,
if (!prev_seq) {
kref_get(&lkb->lkb_ref);
+ mutex_lock(&ls->ls_cb_mutex);
if (test_bit(LSFL_CB_DELAY, &ls->ls_flags)) {
- mutex_lock(&ls->ls_cb_mutex);
list_add(&lkb->lkb_cb_list, &ls->ls_cb_delay);
- mutex_unlock(&ls->ls_cb_mutex);
} else {
queue_work(ls->ls_callback_wq, &lkb->lkb_cb_work);
}
+ mutex_unlock(&ls->ls_cb_mutex);
}
out:
mutex_unlock(&lkb->lkb_cb_mutex);
@@ -288,7 +288,9 @@ void dlm_callback_stop(struct dlm_ls *ls)
void dlm_callback_suspend(struct dlm_ls *ls)
{
+ mutex_lock(&ls->ls_cb_mutex);
set_bit(LSFL_CB_DELAY, &ls->ls_flags);
+ mutex_unlock(&ls->ls_cb_mutex);
if (ls->ls_callback_wq)
flush_workqueue(ls->ls_callback_wq);
--
2.31.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* [Cluster-devel] [PATCHv2 dlm/next 2/2] fs: dlm: fix race between test_bit() and queue_work()
@ 2022-07-25 19:53 ` Alexander Aring
0 siblings, 0 replies; 8+ messages in thread
From: Alexander Aring @ 2022-07-25 19:53 UTC (permalink / raw)
To: cluster-devel.redhat.com
This patch will fix a race by surround ls_cb_mutex in set_bit() and the
test_bit() and it's conditional code blocks for LSFL_CB_DELAY.
The function dlm_callback_stop() has the idea to stop all callbacks and
flush all currently queued onces. The set_bit() is not enough because
there can be still queue_work() around after the workqueue was flushed.
To avoid queue_work() after set_bit() we surround both by ls_cb_mutex
lock.
Cc: stable at vger.kernel.org
Signed-off-by: Alexander Aring <aahringo@redhat.com>
---
fs/dlm/ast.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/fs/dlm/ast.c b/fs/dlm/ast.c
index 19ef136f9e4f..a44cc42b6317 100644
--- a/fs/dlm/ast.c
+++ b/fs/dlm/ast.c
@@ -200,13 +200,13 @@ void dlm_add_cb(struct dlm_lkb *lkb, uint32_t flags, int mode, int status,
if (!prev_seq) {
kref_get(&lkb->lkb_ref);
+ mutex_lock(&ls->ls_cb_mutex);
if (test_bit(LSFL_CB_DELAY, &ls->ls_flags)) {
- mutex_lock(&ls->ls_cb_mutex);
list_add(&lkb->lkb_cb_list, &ls->ls_cb_delay);
- mutex_unlock(&ls->ls_cb_mutex);
} else {
queue_work(ls->ls_callback_wq, &lkb->lkb_cb_work);
}
+ mutex_unlock(&ls->ls_cb_mutex);
}
out:
mutex_unlock(&lkb->lkb_cb_mutex);
@@ -288,7 +288,9 @@ void dlm_callback_stop(struct dlm_ls *ls)
void dlm_callback_suspend(struct dlm_ls *ls)
{
+ mutex_lock(&ls->ls_cb_mutex);
set_bit(LSFL_CB_DELAY, &ls->ls_flags);
+ mutex_unlock(&ls->ls_cb_mutex);
if (ls->ls_callback_wq)
flush_workqueue(ls->ls_callback_wq);
--
2.31.1
^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [PATCHv2 dlm/next 1/2] fs: dlm: fix race in lowcomms
2022-07-25 19:53 ` [Cluster-devel] " Alexander Aring
@ 2022-07-25 21:41 ` Alexander Aring
-1 siblings, 0 replies; 8+ messages in thread
From: Alexander Aring @ 2022-07-25 21:41 UTC (permalink / raw)
To: David Teigland; +Cc: cluster-devel, stable
Hi,
On Mon, Jul 25, 2022 at 3:53 PM Alexander Aring <aahringo@redhat.com> wrote:
>
> This patch fixes a race between queue_work() in
> _dlm_lowcomms_commit_msg() and srcu_read_unlock(). The queue_work() can
> take the final reference of a dlm_msg and so msg->idx can contain
> garbage which is signaled by the following warning:
>
> [ 676.237050] ------------[ cut here ]------------
> [ 676.237052] WARNING: CPU: 0 PID: 1060 at include/linux/srcu.h:189 dlm_lowcomms_commit_msg+0x41/0x50
> [ 676.238945] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support qxl kvm_intel drm_ttm_helper vmw_vsock_virtio_transport kvm vmw_vsock_virtio_transport_common ttm irqbypass crc32_pclmul joydev crc32c_intel serio_raw drm_kms_helper vsock virtio_scsi virtio_console virtio_balloon snd_pcm drm syscopyarea sysfillrect sysimgblt snd_timer fb_sys_fops i2c_i801 lpc_ich snd i2c_smbus soundcore pcspkr
> [ 676.244227] CPU: 0 PID: 1060 Comm: lock_torture_wr Not tainted 5.19.0-rc3+ #1546
> [ 676.245216] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+033991b0 04/01/2014
> [ 676.246460] RIP: 0010:dlm_lowcomms_commit_msg+0x41/0x50
> [ 676.247132] Code: fe ff ff ff 75 24 48 c7 c6 bd 0f 49 bb 48 c7 c7 38 7c 01 bd e8 00 e7 ca ff 89 de 48 c7 c7 60 78 01 bd e8 42 3d cd ff 5b 5d c3 <0f> 0b eb d8 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
> [ 676.249253] RSP: 0018:ffffa401c18ffc68 EFLAGS: 00010282
> [ 676.249855] RAX: 0000000000000001 RBX: 00000000ffff8b76 RCX: 0000000000000006
> [ 676.250713] RDX: 0000000000000000 RSI: ffffffffbccf3a10 RDI: ffffffffbcc7b62e
> [ 676.251610] RBP: ffffa401c18ffc70 R08: 0000000000000001 R09: 0000000000000001
> [ 676.252481] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000005
> [ 676.253421] R13: ffff8b76786ec370 R14: ffff8b76786ec370 R15: ffff8b76786ec480
> [ 676.254257] FS: 0000000000000000(0000) GS:ffff8b7777800000(0000) knlGS:0000000000000000
> [ 676.255239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 676.255897] CR2: 00005590205d88b8 CR3: 000000017656c003 CR4: 0000000000770ee0
> [ 676.256734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 676.257567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 676.258397] PKRU: 55555554
> [ 676.258729] Call Trace:
> [ 676.259063] <TASK>
> [ 676.259354] dlm_midcomms_commit_mhandle+0xcc/0x110
> [ 676.259964] queue_bast+0x8b/0xb0
> [ 676.260423] grant_pending_locks+0x166/0x1b0
> [ 676.261007] _unlock_lock+0x75/0x90
> [ 676.261469] unlock_lock.isra.57+0x62/0xa0
> [ 676.262009] dlm_unlock+0x21e/0x330
> [ 676.262457] ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
> [ 676.263183] torture_unlock+0x5a/0x90 [dlm_locktorture]
> [ 676.263815] ? preempt_count_sub+0xba/0x100
> [ 676.264361] ? complete+0x1d/0x60
> [ 676.264777] lock_torture_writer+0xb8/0x150 [dlm_locktorture]
> [ 676.265555] kthread+0x10a/0x130
> [ 676.266007] ? kthread_complete_and_exit+0x20/0x20
> [ 676.266616] ret_from_fork+0x22/0x30
> [ 676.267097] </TASK>
> [ 676.267381] irq event stamp: 9579855
> [ 676.267824] hardirqs last enabled at (9579863): [<ffffffffbb14e6f8>] __up_console_sem+0x58/0x60
> [ 676.268896] hardirqs last disabled at (9579872): [<ffffffffbb14e6dd>] __up_console_sem+0x3d/0x60
> [ 676.270008] softirqs last enabled at (9579798): [<ffffffffbc200349>] __do_softirq+0x349/0x4c7
> [ 676.271438] softirqs last disabled at (9579897): [<ffffffffbb0d54c0>] irq_exit_rcu+0xb0/0xf0
> [ 676.272796] ---[ end trace 0000000000000000 ]---
>
> I reproduced this warning with dlm_locktorture test which is currently
> not upstream. However this patch fix the issue by make a additional
> refcount between dlm_lowcomms_new_msg() and dlm_lowcomms_commit_msg().
> In case of the race the kref_put() in dlm_lowcomms_commit_msg() will be
> the final put.
>
> Signed-off-by: Alexander Aring <aahringo@redhat.com>
grml, now I forgot in this patch Cc: stable and fixes-tag. Will send v3. Sorry.
- Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
* [Cluster-devel] [PATCHv2 dlm/next 1/2] fs: dlm: fix race in lowcomms
@ 2022-07-25 21:41 ` Alexander Aring
0 siblings, 0 replies; 8+ messages in thread
From: Alexander Aring @ 2022-07-25 21:41 UTC (permalink / raw)
To: cluster-devel.redhat.com
Hi,
On Mon, Jul 25, 2022 at 3:53 PM Alexander Aring <aahringo@redhat.com> wrote:
>
> This patch fixes a race between queue_work() in
> _dlm_lowcomms_commit_msg() and srcu_read_unlock(). The queue_work() can
> take the final reference of a dlm_msg and so msg->idx can contain
> garbage which is signaled by the following warning:
>
> [ 676.237050] ------------[ cut here ]------------
> [ 676.237052] WARNING: CPU: 0 PID: 1060 at include/linux/srcu.h:189 dlm_lowcomms_commit_msg+0x41/0x50
> [ 676.238945] Modules linked in: dlm_locktorture torture rpcsec_gss_krb5 intel_rapl_msr intel_rapl_common iTCO_wdt iTCO_vendor_support qxl kvm_intel drm_ttm_helper vmw_vsock_virtio_transport kvm vmw_vsock_virtio_transport_common ttm irqbypass crc32_pclmul joydev crc32c_intel serio_raw drm_kms_helper vsock virtio_scsi virtio_console virtio_balloon snd_pcm drm syscopyarea sysfillrect sysimgblt snd_timer fb_sys_fops i2c_i801 lpc_ich snd i2c_smbus soundcore pcspkr
> [ 676.244227] CPU: 0 PID: 1060 Comm: lock_torture_wr Not tainted 5.19.0-rc3+ #1546
> [ 676.245216] Hardware name: Red Hat KVM/RHEL-AV, BIOS 1.16.0-2.module+el8.7.0+15506+033991b0 04/01/2014
> [ 676.246460] RIP: 0010:dlm_lowcomms_commit_msg+0x41/0x50
> [ 676.247132] Code: fe ff ff ff 75 24 48 c7 c6 bd 0f 49 bb 48 c7 c7 38 7c 01 bd e8 00 e7 ca ff 89 de 48 c7 c7 60 78 01 bd e8 42 3d cd ff 5b 5d c3 <0f> 0b eb d8 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48
> [ 676.249253] RSP: 0018:ffffa401c18ffc68 EFLAGS: 00010282
> [ 676.249855] RAX: 0000000000000001 RBX: 00000000ffff8b76 RCX: 0000000000000006
> [ 676.250713] RDX: 0000000000000000 RSI: ffffffffbccf3a10 RDI: ffffffffbcc7b62e
> [ 676.251610] RBP: ffffa401c18ffc70 R08: 0000000000000001 R09: 0000000000000001
> [ 676.252481] R10: 0000000000000001 R11: 0000000000000001 R12: 0000000000000005
> [ 676.253421] R13: ffff8b76786ec370 R14: ffff8b76786ec370 R15: ffff8b76786ec480
> [ 676.254257] FS: 0000000000000000(0000) GS:ffff8b7777800000(0000) knlGS:0000000000000000
> [ 676.255239] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 676.255897] CR2: 00005590205d88b8 CR3: 000000017656c003 CR4: 0000000000770ee0
> [ 676.256734] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 676.257567] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 676.258397] PKRU: 55555554
> [ 676.258729] Call Trace:
> [ 676.259063] <TASK>
> [ 676.259354] dlm_midcomms_commit_mhandle+0xcc/0x110
> [ 676.259964] queue_bast+0x8b/0xb0
> [ 676.260423] grant_pending_locks+0x166/0x1b0
> [ 676.261007] _unlock_lock+0x75/0x90
> [ 676.261469] unlock_lock.isra.57+0x62/0xa0
> [ 676.262009] dlm_unlock+0x21e/0x330
> [ 676.262457] ? lock_torture_stats+0x80/0x80 [dlm_locktorture]
> [ 676.263183] torture_unlock+0x5a/0x90 [dlm_locktorture]
> [ 676.263815] ? preempt_count_sub+0xba/0x100
> [ 676.264361] ? complete+0x1d/0x60
> [ 676.264777] lock_torture_writer+0xb8/0x150 [dlm_locktorture]
> [ 676.265555] kthread+0x10a/0x130
> [ 676.266007] ? kthread_complete_and_exit+0x20/0x20
> [ 676.266616] ret_from_fork+0x22/0x30
> [ 676.267097] </TASK>
> [ 676.267381] irq event stamp: 9579855
> [ 676.267824] hardirqs last enabled at (9579863): [<ffffffffbb14e6f8>] __up_console_sem+0x58/0x60
> [ 676.268896] hardirqs last disabled at (9579872): [<ffffffffbb14e6dd>] __up_console_sem+0x3d/0x60
> [ 676.270008] softirqs last enabled at (9579798): [<ffffffffbc200349>] __do_softirq+0x349/0x4c7
> [ 676.271438] softirqs last disabled at (9579897): [<ffffffffbb0d54c0>] irq_exit_rcu+0xb0/0xf0
> [ 676.272796] ---[ end trace 0000000000000000 ]---
>
> I reproduced this warning with dlm_locktorture test which is currently
> not upstream. However this patch fix the issue by make a additional
> refcount between dlm_lowcomms_new_msg() and dlm_lowcomms_commit_msg().
> In case of the race the kref_put() in dlm_lowcomms_commit_msg() will be
> the final put.
>
> Signed-off-by: Alexander Aring <aahringo@redhat.com>
grml, now I forgot in this patch Cc: stable and fixes-tag. Will send v3. Sorry.
- Alex
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2022-07-25 21:41 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-07-25 19:53 [PATCHv2 dlm/next 0/2] fs: dlm: some callback and lowcomm fixes Alexander Aring
2022-07-25 19:53 ` [Cluster-devel] " Alexander Aring
2022-07-25 19:53 ` [PATCHv2 dlm/next 1/2] fs: dlm: fix race in lowcomms Alexander Aring
2022-07-25 19:53 ` [Cluster-devel] " Alexander Aring
2022-07-25 21:41 ` Alexander Aring
2022-07-25 21:41 ` [Cluster-devel] " Alexander Aring
2022-07-25 19:53 ` [PATCHv2 dlm/next 2/2] fs: dlm: fix race between test_bit() and queue_work() Alexander Aring
2022-07-25 19:53 ` [Cluster-devel] " Alexander Aring
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.