Why do NBD requests prevent hibernation, and FUSE requests do not?


* Why do NBD requests prevent hibernation, and FUSE requests do not?
@ 2022-08-30  6:31 Nikolaus Rath
  2022-08-30 23:02 ` Bernd Schubert
  2022-09-02 12:49 ` Wouter Verhelst
  0 siblings, 2 replies; 6+ messages in thread
From: Nikolaus Rath @ 2022-08-30  6:31 UTC (permalink / raw)
  To: nbd, Linux FS Devel, miklos, Wouter Verhelst

Hello,

I am comparing the behavior of FUSE and NBD when attempting to hibernate
the system.

FUSE seems to be mostly compatible: I am able to suspend the system even
when there is ongoing I/O on the FUSE filesystem.

With NBD, on the other hand, most I/O seems to prevent hibernation of the
system. Example hibernation error:

  kernel: Freezing user space processes ... 
  kernel: Freezing of tasks failed after 20.003 seconds (1 tasks refusing to freeze, wq_busy=0):
  kernel: task:rsync           state:D stack:    0 pid:348105 ppid:348104 flags:0x00004004
  kernel: Call Trace:
  kernel:  <TASK>
  kernel:  __schedule+0x308/0x9e0
  kernel:  schedule+0x4e/0xb0
  kernel:  schedule_timeout+0x88/0x150
  kernel:  ? __bpf_trace_tick_stop+0x10/0x10
  kernel:  io_schedule_timeout+0x4c/0x80
  kernel:  __cv_timedwait_common+0x129/0x160 [spl]
  kernel:  ? dequeue_task_stop+0x70/0x70
  kernel:  __cv_timedwait_io+0x15/0x20 [spl]
  kernel:  zio_wait+0x129/0x2b0 [zfs]
  kernel:  dmu_buf_hold+0x5b/0x90 [zfs]
  kernel:  zap_lockdir+0x4e/0xb0 [zfs]
  kernel:  zap_cursor_retrieve+0x1ae/0x320 [zfs]
  kernel:  ? dbuf_prefetch+0xf/0x20 [zfs]
  kernel:  ? dmu_prefetch+0xc8/0x200 [zfs]
  kernel:  zfs_readdir+0x12a/0x440 [zfs]
  kernel:  ? preempt_count_add+0x68/0xa0
  kernel:  ? preempt_count_add+0x68/0xa0
  kernel:  ? aa_file_perm+0x120/0x4c0
  kernel:  ? rrw_exit+0x65/0x150 [zfs]
  kernel:  ? _copy_to_user+0x21/0x30
  kernel:  ? cp_new_stat+0x150/0x180
  kernel:  zpl_iterate+0x4c/0x70 [zfs]
  kernel:  iterate_dir+0x171/0x1c0
  kernel:  __x64_sys_getdents64+0x78/0x110
  kernel:  ? __ia32_sys_getdents64+0x110/0x110
  kernel:  do_syscall_64+0x38/0xc0
  kernel:  entry_SYSCALL_64_after_hwframe+0x44/0xae
  kernel: RIP: 0033:0x7f03c897a9c7
  kernel: RSP: 002b:00007ffd41e3c518 EFLAGS: 00000293 ORIG_RAX: 00000000000000d9
  kernel: RAX: ffffffffffffffda RBX: 0000561eff64dd40 RCX: 00007f03c897a9c7
  kernel: RDX: 0000000000008000 RSI: 0000561eff64dd70 RDI: 0000000000000000
  kernel: RBP: 0000561eff64dd70 R08: 0000000000000030 R09: 00007f03c8a72be0
  kernel: R10: 0000000000020000 R11: 0000000000000293 R12: ffffffffffffff80
  kernel: R13: 0000561eff64dd44 R14: 0000000000000000 R15: 0000000000000001
  kernel:  </TASK>

(this is with ZFS on top of the NBD device).

As far as I can tell, the problem is that while an NBD request is
pending, the task that waits for the result (in this case *rsync*)
refuses to freeze. This happens even when setting a 5 minute timeout
for freezing (which is more than enough time for the NBD request to
complete), so I suspect that the NBD server task (in this case nbdkit)
has already been frozen and is thus unable to make progress.
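
For anyone who wants to check this on their own system, here is a minimal
sketch (Python, not part of the original report). It lists processes that
are currently in uninterruptible sleep (state D in /proc/<pid>/stat, the
state rsync is shown in above) and reads the freezer timeout from
/sys/power/pm_freeze_timeout, which is in milliseconds; the 20-second
failure in the log matches the usual default of 20000. The function names
are hypothetical and only serve the illustration.

```python
import os


def uninterruptible_tasks(proc_root="/proc"):
    """Return (pid, comm) for each process whose main thread is in state D."""
    found = []
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return found  # no procfs available (assumption: Linux-only sketch)
    for entry in entries:
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat"),
                      errors="replace") as f:
                stat = f.read()
        except OSError:
            continue  # process exited while we were scanning
        # comm is wrapped in parentheses and may itself contain ")" or
        # spaces, so split on the last ")" in the line.
        head, _, tail = stat.rpartition(")")
        comm = head.partition("(")[2]
        fields = tail.split()
        if fields and fields[0] == "D":
            found.append((int(entry), comm))
    return sorted(found)


def freeze_timeout_ms(path="/sys/power/pm_freeze_timeout"):
    """Return the freezer timeout in milliseconds, or None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None
```

Running uninterruptible_tasks() shortly before a hibernation attempt shows
which tasks are likely to be reported as "refusing to freeze"; it does not
say which request they are waiting for.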

However, I do not understand why the same is not happening for FUSE
(with FUSE requests being stuck because the FUSE daemon is already
frozen). Was I just very lucky in my tests? Or are tasks waiting for
a FUSE request in a different kind of state? Or is NBD a red herring here,
and the real trouble is with ZFS?

It would be great if someone could shed some light on what's going on.

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: Why do NBD requests prevent hibernation, and FUSE requests do not?
  2022-08-30  6:31 Why do NBD requests prevent hibernation, and FUSE requests do not? Nikolaus Rath
@ 2022-08-30 23:02 ` Bernd Schubert
  2022-09-07 15:50   ` Nikolaus Rath
  2022-09-02 12:49 ` Wouter Verhelst
  1 sibling, 1 reply; 6+ messages in thread
From: Bernd Schubert @ 2022-08-30 23:02 UTC (permalink / raw)
  To: nbd, Linux FS Devel, miklos, Wouter Verhelst



On 8/30/22 08:31, Nikolaus Rath wrote:
> Hello,
> 
> I am comparing the behavior of FUSE and NBD when attempting to hibernate
> the system.
> 
> FUSE seems to be mostly compatible, I am able to suspend the system even
> when there is ongoing I/O on the fuse filesystem.
> 

....

> 
> As far as I can tell, the problem is that while an NBD request is
> pending, the task that waits for the result (in this case *rsync*)
> refuses to freeze. This happens even when setting a 5 minute timeout
> for freezing (which is more than enough time for the NBD request to
> complete), so I suspect that the NBD server task (in this case nbdkit)
> has already been frozen and is thus unable to make progress.
> 
> However, I do not understand why the same is not happening for FUSE
> (with FUSE requests being stuck because the FUSE daemon is already
> frozen). Was I just very lucky in my tests? Or are tasks waiting for
> a FUSE request in a different kind of state? Or is NBD a red herring here,
> and the real trouble is with ZFS?
> 
> It would be great if someone  could shed some light on what's going on.

I guess it is a generic issue that also affects FUSE; see this patch:

https://lore.kernel.org/lkml/20220511013057.245827-1-dlunev@chromium.org/

A bit further down that thread you can find a reference to this ancient patch:

https://linux-kernel.vger.kernel.narkive.com/UeBWfN1V/patch-fuse-make-fuse-daemon-frozen-along-with-kernel-threads

I had also asked about NFS when the server side is down (so that a
reply to the request will never arrive), but didn't get an answer.


- Bernd


* Re: Why do NBD requests prevent hibernation, and FUSE requests do not?
  2022-08-30  6:31 Why do NBD requests prevent hibernation, and FUSE requests do not? Nikolaus Rath
  2022-08-30 23:02 ` Bernd Schubert
@ 2022-09-02 12:49 ` Wouter Verhelst
  2022-09-07 10:18   ` Nikolaus Rath
  2022-09-16  8:05   ` Nikolaus Rath
  1 sibling, 2 replies; 6+ messages in thread
From: Wouter Verhelst @ 2022-09-02 12:49 UTC (permalink / raw)
  To: nbd, Linux FS Devel, miklos

Hi Nikolaus,

I do not know how FUSE works, so I can't comment on that.

NBD, however, is a message-passing protocol: the client sends a message
to request something over a network socket, which causes the server to
do some processing, and then to send a message back. As far as the
kernel is concerned (at least outside nbd.ko), there is no connection
between the request message and the reply message.

As such, when the kernel suspends the nbd server, it has no way of
knowing that the in-kernel client is still waiting on a reply for a
message that was sent earlier.
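
To make the "no connection outside nbd.ko" point concrete, here is a
minimal sketch (Python) of the NBD transmission-phase request and simple
reply headers as described in the NBD protocol document. The only thing
tying a reply to its request is the 64-bit handle (called "cookie" in
newer protocol text), and only the NBD client interprets it; to the socket
layer both are just bytes. The helper names and the pending-request
dictionary are assumptions made for the illustration, not how nbd.ko is
implemented.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_SIMPLE_REPLY_MAGIC = 0x67446698
NBD_CMD_READ = 0

# Big-endian layouts: request is 28 bytes, simple reply is 16 bytes.
REQUEST = struct.Struct(">IHHQQI")    # magic, flags, type, handle, offset, length
SIMPLE_REPLY = struct.Struct(">IIQ")  # magic, error, handle


def pack_request(handle, offset, length, cmd=NBD_CMD_READ, flags=0):
    """Build the header the client writes to the socket."""
    return REQUEST.pack(NBD_REQUEST_MAGIC, flags, cmd, handle, offset, length)


def pack_simple_reply(handle, error=0):
    """Build the header the server writes back, possibly much later."""
    return SIMPLE_REPLY.pack(NBD_SIMPLE_REPLY_MAGIC, error, handle)


def match_reply(pending, reply_bytes):
    """Client side: look up the waiting request by handle and remove it."""
    magic, error, handle = SIMPLE_REPLY.unpack(reply_bytes)
    if magic != NBD_SIMPLE_REPLY_MAGIC:
        raise ValueError("not a simple reply header")
    return pending.pop(handle), error
```

Until match_reply() sees a header carrying the right handle, the entry
stays in the pending table; nothing in the transport records that a
particular server process owes an answer.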

I'm guessing that for FUSE, there is such a link?

-- 
     w@uter.{be,co.za}
wouter@{grep.be,fosdem.org,debian.org}

I will have a Tin-Actinium-Potassium mixture, thanks.


* Re: Why do NBD requests prevent hibernation, and FUSE requests do not?
  2022-09-02 12:49 ` Wouter Verhelst
@ 2022-09-07 10:18   ` Nikolaus Rath
  2022-09-16  8:05   ` Nikolaus Rath
  1 sibling, 0 replies; 6+ messages in thread
From: Nikolaus Rath @ 2022-09-07 10:18 UTC (permalink / raw)
  To: Wouter Verhelst; +Cc: nbd, Linux FS Devel, miklos

Hi Wouter,

FUSE communication happens through a file descriptor (/dev/fuse) on the
same host, so this could in principle be used to make sure that the FUSE
daemon isn't frozen too early. But (per Bernd's response) it seems that
this information isn't actually used, since the patch at
https://linux-kernel.vger.kernel.narkive.com/UeBWfN1V/patch-fuse-make-fuse-daemon-frozen-along-with-kernel-threads
hasn't been applied. So I guess I've just been exceptionally lucky with
FUSE and unlucky with NBD in my experiments.

That said, since NBD can also operate over Unix domain sockets, I was
hoping that perhaps in that scenario there would be some way to
establish the same link for NBD?
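
As a small illustration of what is knowable in the Unix domain socket
case: on Linux, the SO_PEERCRED socket option reports the pid, uid and gid
recorded for the peer when the socket was connected. The sketch below
(Python; the helper name is hypothetical) only shows that this identity is
available for local sockets. Whether the freezer could or should make use
of anything like it is exactly the open question, and nothing here is an
existing kernel mechanism for ordering the freeze.

```python
import os
import socket
import struct


def peer_credentials(sock):
    """Return (pid, uid, gid) of the peer of a connected AF_UNIX socket."""
    # struct ucred on Linux is three 32-bit integers: pid, uid, gid.
    size = struct.calcsize("3i")
    creds = sock.getsockopt(socket.SOL_SOCKET, socket.SO_PEERCRED, size)
    return struct.unpack("3i", creds)
```

For a TCP connection there is no such option, which fits Wouter's point
that over a network socket the kernel sees no link to a particular server
process.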


Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: Why do NBD requests prevent hibernation, and FUSE requests do not?
  2022-08-30 23:02 ` Bernd Schubert
@ 2022-09-07 15:50   ` Nikolaus Rath
  0 siblings, 0 replies; 6+ messages in thread
From: Nikolaus Rath @ 2022-09-07 15:50 UTC (permalink / raw)
  To: Bernd Schubert; +Cc: nbd, Linux FS Devel, miklos, Wouter Verhelst

On Aug 31 2022, Bernd Schubert <bernd.schubert@fastmail.fm> wrote:
> I guess it is a generic issue also affecting fuse, see this patch
>
> https://lore.kernel.org/lkml/20220511013057.245827-1-dlunev@chromium.org/
>
> A bit down the thread you can find a reference to this ancient patch
>
> https://linux-kernel.vger.kernel.narkive.com/UeBWfN1V/patch-fuse-make-fuse-daemon-frozen-along-with-kernel-threads

Interesting, thank you for the link! So it seems that I just got lucky
with FUSE.

Does anyone know in which order the kernel freezes processes by default?
Could I perhaps work around the problem by calling the FUSE/NBD daemon
something like "zzzzz_mydaemon"?


Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«


* Re: Why do NBD requests prevent hibernation, and FUSE requests do not?
  2022-09-02 12:49 ` Wouter Verhelst
  2022-09-07 10:18   ` Nikolaus Rath
@ 2022-09-16  8:05   ` Nikolaus Rath
  1 sibling, 0 replies; 6+ messages in thread
From: Nikolaus Rath @ 2022-09-16  8:05 UTC (permalink / raw)
  To: Wouter Verhelst; +Cc: nbd, Linux FS Devel, miklos

Hi Wouter,

Following up on this: should the NBD server perhaps set
PR_SET_IO_FLUSHER, and the kernel freeze tasks with this flag last?
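
For reference, a minimal sketch (Python via ctypes) of how a server
process could set this flag today. PR_SET_IO_FLUSHER and PR_GET_IO_FLUSHER
are the prctl(2) options 57 and 58, available since Linux 5.6, and both
require CAP_SYS_RESOURCE. As documented, the flag influences memory
allocation behaviour for tasks that sit in the I/O path; the freezer
ordering suggested above is a proposal and, as far as I know, not
something the kernel does. The wrapper names are hypothetical.

```python
import ctypes
import errno

PR_SET_IO_FLUSHER = 57  # from <linux/prctl.h>
PR_GET_IO_FLUSHER = 58

_libc = ctypes.CDLL(None, use_errno=True)


def _prctl(option, arg2=0):
    # prctl() takes unsigned long arguments; pass them explicitly so the
    # variadic call gets full-width values.
    return _libc.prctl(ctypes.c_int(option), ctypes.c_ulong(arg2),
                       ctypes.c_ulong(0), ctypes.c_ulong(0),
                       ctypes.c_ulong(0))


def set_io_flusher(enabled=True):
    """Return 0 on success, otherwise the errno (e.g. EPERM, EINVAL)."""
    ctypes.set_errno(0)
    rc = _prctl(PR_SET_IO_FLUSHER, 1 if enabled else 0)
    return 0 if rc == 0 else ctypes.get_errno()


def get_io_flusher():
    """Return 1 or 0 for the current state, or None if the call failed."""
    rc = _prctl(PR_GET_IO_FLUSHER)
    return rc if rc >= 0 else None
```

A daemon would call set_io_flusher() once at startup and treat EPERM
(missing capability) or EINVAL (kernel too old) as non-fatal.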

Best,
-Nikolaus

-- 
GPG Fingerprint: ED31 791B 2C5C 1613 AF38 8B8A D113 FCAC 3C4E 599F

             »Time flies like an arrow, fruit flies like a Banana.«

