linux-kernel.vger.kernel.org archive mirror
* [PATCH] fscache: Use wake_up_var() to wake up pending volume acquisition
@ 2022-11-28  3:19 Hou Tao
  2022-11-28  7:56 ` Jingbo Xu
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Hou Tao @ 2022-11-28  3:19 UTC (permalink / raw)
  To: linux-cachefs
  Cc: David Howells, Jeff Layton, linux-erofs, linux-kernel, houtao1

From: Hou Tao <houtao1@huawei.com>

Freeing a relinquished volume wakes up the pending volume acquisition
with wake_up_bit(). However, this does not match the wait_var_event()
used in fscache_wait_on_volume_collision(), so the waiter is never woken
up, because the two functions operate on different wait-queues.

According to the implementation of fscache_wait_on_volume_collision(),
if the wake-up of the pending acquisition is delayed by more than 20
seconds (e.g., due to a delay in on-demand fd closing), the first
wait_var_event_timeout() will time out and the following
wait_var_event() will hang forever, as shown below:

 FS-Cache: Potential volume collision new=00000024 old=00000022
 ......
 INFO: task mount:1148 blocked for more than 122 seconds.
       Not tainted 6.1.0-rc6+ #1
 task:mount           state:D stack:0     pid:1148  ppid:1
 Call Trace:
  <TASK>
  __schedule+0x2f6/0xb80
  schedule+0x67/0xe0
  fscache_wait_on_volume_collision.cold+0x80/0x82
  __fscache_acquire_volume+0x40d/0x4e0
  erofs_fscache_register_volume+0x51/0xe0 [erofs]
  erofs_fscache_register_fs+0x19c/0x240 [erofs]
  erofs_fc_fill_super+0x746/0xaf0 [erofs]
  vfs_get_super+0x7d/0x100
  get_tree_nodev+0x16/0x20
  erofs_fc_get_tree+0x20/0x30 [erofs]
  vfs_get_tree+0x24/0xb0
  path_mount+0x2fa/0xa90
  do_mount+0x7c/0xa0
  __x64_sys_mount+0x8b/0xe0
  do_syscall_64+0x30/0x60
  entry_SYSCALL_64_after_hwframe+0x46/0xb0

Fix it by using wake_up_var() instead of wake_up_bit(). In addition,
because wake_up_var() uses waitqueue_active() and clear_bit() doesn't
imply any memory barrier, call smp_mb__after_atomic() before invoking
wake_up_var().

Fixes: 62ab63352350 ("fscache: Implement volume registration")
Signed-off-by: Hou Tao <houtao1@huawei.com>
---
 fs/fscache/volume.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
index ab8ceddf9efa..cf8293bb1aca 100644
--- a/fs/fscache/volume.c
+++ b/fs/fscache/volume.c
@@ -348,7 +348,12 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
 		if (fscache_volume_same(cursor, volume)) {
 			fscache_see_volume(cursor, fscache_volume_see_hash_wake);
 			clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
-			wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
+			/*
+			 * Paired with barrier in wait_var_event(). Check
+			 * waitqueue_active() and wake_up_var() for details.
+			 */
+			smp_mb__after_atomic();
+			wake_up_var(&cursor->flags);
 			return;
 		}
 	}
-- 
2.29.2



* Re: [PATCH] fscache: Use wake_up_var() to wake up pending volume acquisition
  2022-11-28  3:19 [PATCH] fscache: Use wake_up_var() to wake up pending volume acquisition Hou Tao
@ 2022-11-28  7:56 ` Jingbo Xu
  2022-12-09 11:17 ` Hou Tao
  2022-12-09 11:26 ` David Howells
  2 siblings, 0 replies; 5+ messages in thread
From: Jingbo Xu @ 2022-11-28  7:56 UTC (permalink / raw)
  To: Hou Tao, linux-cachefs
  Cc: David Howells, linux-erofs, Jeff Layton, linux-kernel, houtao1

Hi,

Thanks for catching this.


On 11/28/22 11:19 AM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
> 
> Freeing a relinquished volume wakes up the pending volume acquisition
> with wake_up_bit(). However, this does not match the wait_var_event()
> used in fscache_wait_on_volume_collision(), so the waiter is never woken
> up, because the two functions operate on different wait-queues.
> 
> According to the implementation of fscache_wait_on_volume_collision(),
> if the wake-up of the pending acquisition is delayed by more than 20
> seconds (e.g., due to a delay in on-demand fd closing), the first
> wait_var_event_timeout() will time out and the following
> wait_var_event() will hang forever, as shown below:
> 
>  FS-Cache: Potential volume collision new=00000024 old=00000022
>  ......
>  INFO: task mount:1148 blocked for more than 122 seconds.
>        Not tainted 6.1.0-rc6+ #1
>  task:mount           state:D stack:0     pid:1148  ppid:1
>  Call Trace:
>   <TASK>
>   __schedule+0x2f6/0xb80
>   schedule+0x67/0xe0
>   fscache_wait_on_volume_collision.cold+0x80/0x82
>   __fscache_acquire_volume+0x40d/0x4e0
>   erofs_fscache_register_volume+0x51/0xe0 [erofs]
>   erofs_fscache_register_fs+0x19c/0x240 [erofs]
>   erofs_fc_fill_super+0x746/0xaf0 [erofs]
>   vfs_get_super+0x7d/0x100
>   get_tree_nodev+0x16/0x20
>   erofs_fc_get_tree+0x20/0x30 [erofs]
>   vfs_get_tree+0x24/0xb0
>   path_mount+0x2fa/0xa90
>   do_mount+0x7c/0xa0
>   __x64_sys_mount+0x8b/0xe0
>   do_syscall_64+0x30/0x60
>   entry_SYSCALL_64_after_hwframe+0x46/0xb0
> 
> Fix it by using wake_up_var() instead of wake_up_bit(). In addition,
> because wake_up_var() uses waitqueue_active() and clear_bit() doesn't
> imply any memory barrier, call smp_mb__after_atomic() before invoking
> wake_up_var().
> 
> Fixes: 62ab63352350 ("fscache: Implement volume registration")
> Signed-off-by: Hou Tao <houtao1@huawei.com>

Reviewed-and-tested-by: Jingbo Xu <jefflexu@linux.alibaba.com>


> ---
>  fs/fscache/volume.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
> index ab8ceddf9efa..cf8293bb1aca 100644
> --- a/fs/fscache/volume.c
> +++ b/fs/fscache/volume.c
> @@ -348,7 +348,12 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
>  		if (fscache_volume_same(cursor, volume)) {
>  			fscache_see_volume(cursor, fscache_volume_see_hash_wake);
>  			clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
> -			wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
> +			/*
> +			 * Paired with barrier in wait_var_event(). Check
> +			 * waitqueue_active() and wake_up_var() for details.
> +			 */
> +			smp_mb__after_atomic();
> +			wake_up_var(&cursor->flags);
>  			return;
>  		}
>  	}

-- 
Thanks,
Jingbo


* Re: [PATCH] fscache: Use wake_up_var() to wake up pending volume acquisition
  2022-11-28  3:19 [PATCH] fscache: Use wake_up_var() to wake up pending volume acquisition Hou Tao
  2022-11-28  7:56 ` Jingbo Xu
@ 2022-12-09 11:17 ` Hou Tao
  2022-12-09 11:26 ` David Howells
  2 siblings, 0 replies; 5+ messages in thread
From: Hou Tao @ 2022-12-09 11:17 UTC (permalink / raw)
  To: linux-cachefs, David Howells
  Cc: Jeff Layton, linux-erofs, linux-kernel, houtao1

Hi David,

Could you please pick it up for v6.2?

On 11/28/2022 11:19 AM, Hou Tao wrote:
> From: Hou Tao <houtao1@huawei.com>
>
> Freeing a relinquished volume wakes up the pending volume acquisition
> with wake_up_bit(). However, this does not match the wait_var_event()
> used in fscache_wait_on_volume_collision(), so the waiter is never woken
> up, because the two functions operate on different wait-queues.
>
> According to the implementation of fscache_wait_on_volume_collision(),
> if the wake-up of the pending acquisition is delayed by more than 20
> seconds (e.g., due to a delay in on-demand fd closing), the first
> wait_var_event_timeout() will time out and the following
> wait_var_event() will hang forever, as shown below:
>
>  FS-Cache: Potential volume collision new=00000024 old=00000022
>  ......
>  INFO: task mount:1148 blocked for more than 122 seconds.
>        Not tainted 6.1.0-rc6+ #1
>  task:mount           state:D stack:0     pid:1148  ppid:1
>  Call Trace:
>   <TASK>
>   __schedule+0x2f6/0xb80
>   schedule+0x67/0xe0
>   fscache_wait_on_volume_collision.cold+0x80/0x82
>   __fscache_acquire_volume+0x40d/0x4e0
>   erofs_fscache_register_volume+0x51/0xe0 [erofs]
>   erofs_fscache_register_fs+0x19c/0x240 [erofs]
>   erofs_fc_fill_super+0x746/0xaf0 [erofs]
>   vfs_get_super+0x7d/0x100
>   get_tree_nodev+0x16/0x20
>   erofs_fc_get_tree+0x20/0x30 [erofs]
>   vfs_get_tree+0x24/0xb0
>   path_mount+0x2fa/0xa90
>   do_mount+0x7c/0xa0
>   __x64_sys_mount+0x8b/0xe0
>   do_syscall_64+0x30/0x60
>   entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Fix it by using wake_up_var() instead of wake_up_bit(). In addition,
> because wake_up_var() uses waitqueue_active() and clear_bit() doesn't
> imply any memory barrier, call smp_mb__after_atomic() before invoking
> wake_up_var().
>
> Fixes: 62ab63352350 ("fscache: Implement volume registration")
> Signed-off-by: Hou Tao <houtao1@huawei.com>
> ---
>  fs/fscache/volume.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/fscache/volume.c b/fs/fscache/volume.c
> index ab8ceddf9efa..cf8293bb1aca 100644
> --- a/fs/fscache/volume.c
> +++ b/fs/fscache/volume.c
> @@ -348,7 +348,12 @@ static void fscache_wake_pending_volume(struct fscache_volume *volume,
>  		if (fscache_volume_same(cursor, volume)) {
>  			fscache_see_volume(cursor, fscache_volume_see_hash_wake);
>  			clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
> -			wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
> +			/*
> +			 * Paired with barrier in wait_var_event(). Check
> +			 * waitqueue_active() and wake_up_var() for details.
> +			 */
> +			smp_mb__after_atomic();
> +			wake_up_var(&cursor->flags);
>  			return;
>  		}
>  	}



* Re: [PATCH] fscache: Use wake_up_var() to wake up pending volume acquisition
  2022-11-28  3:19 [PATCH] fscache: Use wake_up_var() to wake up pending volume acquisition Hou Tao
  2022-11-28  7:56 ` Jingbo Xu
  2022-12-09 11:17 ` Hou Tao
@ 2022-12-09 11:26 ` David Howells
  2022-12-15  8:28   ` Hou Tao
  2 siblings, 1 reply; 5+ messages in thread
From: David Howells @ 2022-12-09 11:26 UTC (permalink / raw)
  To: Hou Tao
  Cc: dhowells, linux-cachefs, Jeff Layton, linux-erofs, linux-kernel, houtao1

Hou Tao <houtao@huaweicloud.com> wrote:

> >  			clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);

Maybe this should be clear_bit_unlock() instead.

And I wonder if:

	set_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &candidate->flags);

in fscache_hash_volume() needs a barrier before it.

> > -			wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
> > +			/*
> > +			 * Paired with barrier in wait_var_event(). Check
> > +			 * waitqueue_active() and wake_up_var() for details.
> > +			 */
> > +			smp_mb__after_atomic();
> > +			wake_up_var(&cursor->flags);

That doesn't seem right.

wake_up_bit() is more selective, so should be preferred to wake_up_var().

David



* Re: [PATCH] fscache: Use wake_up_var() to wake up pending volume acquisition
  2022-12-09 11:26 ` David Howells
@ 2022-12-15  8:28   ` Hou Tao
  0 siblings, 0 replies; 5+ messages in thread
From: Hou Tao @ 2022-12-15  8:28 UTC (permalink / raw)
  To: David Howells
  Cc: linux-cachefs, Jeff Layton, linux-erofs, linux-kernel, houtao1,
	Jingbo Xu

Hi David,

Sorry for the late reply; I have been busy with other work.

On 12/9/2022 7:26 PM, David Howells wrote:
> Hou Tao <houtao@huaweicloud.com> wrote:
>
>>>  			clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags);
> Maybe this should be clear_bit_unlock() instead.
I'm not sure about that. As I understand it, clear_bit_unlock() is usually
paired with test_and_set_bit_lock() to implement a bit lock, so that writes made
before clear_bit_unlock() are visible to readers in a concurrent process, right?
But the caller of fscache_wake_pending_volume() only modifies cursor->flags and
nothing else, so I don't think it is needed here.
If its intended purpose is to provide the missing smp_mb() for wake_up_bit(), I
don't think that is right either, because the release barrier provided by
clear_bit_unlock() doesn't guarantee the ordering between cursor->flags and
wq_head, so I think an extra smp_mb__after_atomic() is still needed after
clear_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &cursor->flags).

If the above reasoning makes sense to you, I think we also need to add
smp_mb__after_atomic() before wake_up_bit() in fscache_create_volume_work().
> And I wonder if:
>
> 	set_bit(FSCACHE_VOLUME_ACQUIRE_PENDING, &candidate->flags);
>
> in fscache_hash_volume() needs a barrier before it.
I don't follow this one either. Is the barrier meant to guarantee the ordering
between cursor->flags and candidate->flags? The reads and writes of
cursor->flags and candidate->flags are already protected by the same hash lock.
>
>>> -			wake_up_bit(&cursor->flags, FSCACHE_VOLUME_ACQUIRE_PENDING);
>>> +			/*
>>> +			 * Paired with barrier in wait_var_event(). Check
>>> +			 * waitqueue_active() and wake_up_var() for details.
>>> +			 */
>>> +			smp_mb__after_atomic();
>>> +			wake_up_var(&cursor->flags);
> That doesn't seem right.
>
> wake_up_bit() is more selective, so should be preferred to wake_up_var().
OK. I will update fscache_wait_on_volume_collision() to use wait_on_bit() accordingly.
> David
>
>
> .


