* fuse freeze and usb devices
@ 2020-02-21 18:04 taz.007
  2020-02-21 19:20 ` Alan Stern
  0 siblings, 1 reply; 4+ messages in thread
From: taz.007 @ 2020-02-21 18:04 UTC
  To: linux-usb

Hello linux-usb,

I'm seeing freezes in a FUSE userspace daemon. I'm not sure whether it is
actually a USB issue, so please point me to the right subsystem or mailing
list if another one would be a better fit.
Setup:
10 hard drives (ext3 or ext4) mounted on the system.
7 of those are SATA drives in USB enclosures (USB2 only).
2 of them are USB keys (USB1 and USB2).
1 of them is a regular SATA drive, directly connected.
I use mergerfs to gather all of them under a common mount point.
Scenario:
The machine is CPU loaded (2C/4T, nearly fully used).
rsync is running in a loop (in order to reproduce the issue), copying
some files (several GB) from the mergerfs mount point to another drive
(a regular ext4-mounted drive that is not part of the pool).
Some background processes are doing "light" (~50KB/sec) IO on the same
mergerfs pool.
After a while, any access to the mergerfs mount point freezes.
This is because mergerfs itself is stuck in a syscall (if I understand
correctly) that never returns.
However, I can still access the underlying mounted hard drives fine (by
doing an "ls", for example)!
And accessing the underlying hard drives via "ls" somehow unfreezes the
previously blocked syscall in the mergerfs daemon!
"ls" is not even needed: running hdparm -tT on the drives directly also
unfreezes mergerfs.
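
For illustration only (this is not from the original report): a minimal
Python sketch of how such a freeze could be detected without hanging the
checker itself. It runs os.stat() in a helper thread and gives up after a
timeout. The function name, the timeout value, and the /mnt/pool path
mentioned below are assumptions.

```python
import os
import threading


def responds_within(path, timeout_s=5.0):
    """Return True if os.stat(path) finishes within timeout_s seconds.

    An OSError (e.g. a missing file) still counts as a response, because
    the filesystem answered. Only a call that never returns counts as a
    freeze.
    """
    done = threading.Event()

    def probe():
        try:
            os.stat(path)
        except OSError:
            pass  # an error is still an answer from the filesystem
        finally:
            done.set()

    # Daemon thread: if stat() blocks forever, it will not keep the
    # interpreter alive at exit.
    threading.Thread(target=probe, daemon=True).start()
    return done.wait(timeout_s)
```

With a hypothetical pool mounted at /mnt/pool, responds_within("/mnt/pool")
returning False while the same call on a member drive returns True would
match the behaviour described above. Note that probing a member drive is
itself an access to that drive, so it may unfreeze the daemon.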

Now the link with USB:
When I tweak the value of /sys/block/sdX/device/max_sectors, I can alter
the behaviour.
With a value of 128 or 240, I'm unable to reproduce the issue.
With a value of 512, the issue reproduces after around 4-5 hours.
With a value of 1024, the issue reproduces after around 2 hours.
(Maybe those numbers are statistically insignificant and I'm just unlucky.)
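
For illustration only (not part of the original report): a minimal Python
sketch for reading and setting max_sectors on a device. The helper names
are hypothetical, and the 512-byte sector unit is an assumption about how
the attribute is counted. The sys_block parameter exists only so the
helpers can be pointed at a scratch directory for testing.

```python
from pathlib import Path

SECTOR_BYTES = 512  # assumption: max_sectors is counted in 512-byte sectors


def max_sectors_path(dev, sys_block="/sys/block"):
    """Build the sysfs path of the max_sectors attribute for e.g. 'sdb'."""
    return Path(sys_block) / dev / "device" / "max_sectors"


def read_max_sectors(dev, sys_block="/sys/block"):
    """Return the current max_sectors value as an integer."""
    return int(max_sectors_path(dev, sys_block).read_text().strip())


def write_max_sectors(dev, sectors, sys_block="/sys/block"):
    """Write a new max_sectors value (needs root on a real system)."""
    max_sectors_path(dev, sys_block).write_text(f"{sectors}\n")


def sectors_to_kib(sectors):
    """Convert a sector count to KiB under the 512-byte assumption."""
    return sectors * SECTOR_BYTES // 1024
```

Under that assumption, the tested values 128, 240, 512 and 1024 correspond
to transfers of at most 64, 120, 256 and 512 KiB respectively.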

There are no errors from the kernel, and the drives still seem to be
working fine.
I'm using Linux 5.5.3, but I also went back to 5.1.15, and the issue is
already present there.

For more detailed info on the mergerfs call stack, see the original
bug report thread:
https://github.com/trapexit/mergerfs/issues/708

Please don't forget to CC me, as I'm not subscribed to the ML.




* Re: fuse freeze and usb devices
  2020-02-21 18:04 fuse freeze and usb devices taz.007
@ 2020-02-21 19:20 ` Alan Stern
  2020-02-23 12:12   ` taz.007
  0 siblings, 1 reply; 4+ messages in thread
From: Alan Stern @ 2020-02-21 19:20 UTC
  To: taz.007; +Cc: linux-usb


It seems unlikely to me that your problem has anything to do with USB.  
You might try asking for help on the linux-kernel mailing list.

The GitHub bug report says that there are two threads stuck waiting in
splice system calls.  It also says that turning off splice doesn't
help.  When splice is off, what are the threads waiting for?

Some other things to consider...  They may not be related to your
problem:

What does "hdparm -B" show for your drives?

What do /sys/block/sdX/device/power/{control,runtime_status} contain?
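
For illustration only (not part of the original message): a minimal Python
sketch that collects those two attributes for every sd* device in one
pass. The function names are hypothetical, and the sys_block parameter
exists only so the code can be pointed at a scratch directory for testing.

```python
from pathlib import Path

POWER_ATTRS = ("control", "runtime_status")


def power_state(dev, sys_block="/sys/block"):
    """Return {'control': ..., 'runtime_status': ...} for one device."""
    power = Path(sys_block) / dev / "device" / "power"
    return {name: (power / name).read_text().strip() for name in POWER_ATTRS}


def all_power_states(sys_block="/sys/block", prefix="sd"):
    """Map each device whose name starts with prefix to its power state.

    Devices that lack the attributes (or cannot be read) are skipped.
    """
    states = {}
    for entry in sorted(Path(sys_block).glob(prefix + "*")):
        try:
            states[entry.name] = power_state(entry.name, sys_block)
        except OSError:
            continue
    return states
```

A drive that is not runtime-suspended would be expected to report "active"
for runtime_status.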

Alan Stern



* Re: fuse freeze and usb devices
  2020-02-21 19:20 ` Alan Stern
@ 2020-02-23 12:12   ` taz.007
  0 siblings, 0 replies; 4+ messages in thread
From: taz.007 @ 2020-02-23 12:12 UTC
  To: Alan Stern; +Cc: linux-usb

On 21/02/20 20:20, Alan Stern wrote:
> It seems unlikely to me that your problem has anything to do with USB.
> You might try asking for help on the linux-kernel mailing list.

I will, thanks.

>
> The GitHub bug report says that there are two threads stuck waiting in
> splice system calls.  It also says that turning off splice doesn't
> help.  When splice is off, what are the threads waiting for?

They are waiting in pread64() from /usr/lib/libpthread.so.0.
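
For illustration only (not from the original reply): a minimal Python
sketch that lists, for each thread of a process, the kernel wait channel
exposed in /proc/<pid>/task/<tid>/wchan. The function name is
hypothetical. Depending on the kernel configuration and on whether the
thread is blocked, the file may just contain "0".

```python
from pathlib import Path


def thread_wait_channels(pid, proc_root="/proc"):
    """Return {tid: wchan string} for every thread of the given pid.

    Threads whose wchan file cannot be read are reported as '?'.
    """
    task_dir = Path(proc_root) / str(pid) / "task"
    channels = {}
    for task in sorted(task_dir.iterdir(), key=lambda p: int(p.name)):
        try:
            channels[int(task.name)] = (task / "wchan").read_text().strip()
        except OSError:
            channels[int(task.name)] = "?"
    return channels
```

Run against the PID of the mergerfs daemon while the mount is frozen, this
could complement the userspace backtrace by showing where in the kernel
each thread is sleeping.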

> Some other things to consider...  They may not be related to your
> problem:
>
> What does "hdparm -B" show for your drives?

APM_level = off for the SATA drive; APM_level = not supported for all
the USB drives.

> What do /sys/block/sdX/device/power/{control,runtime_status} contain?
"on" and "active" for all of them.

I don't think it's related to a sleeping disk: the SATA drive was
busy (swapping a little) while the FUSE mount was frozen.

I did a manual sync (on the CLI), and that was enough to "wake up" the
system; rsync then resumed.

> Alan Stern
>



* fuse freeze and usb devices
@ 2020-02-23 13:03 taz.007
  0 siblings, 0 replies; 4+ messages in thread
From: taz.007 @ 2020-02-23 13:03 UTC
  To: linux-kernel

Hello kernel experts,


I'm seeing a FUSE userspace daemon freeze.

I've created a Bugzilla entry, so I won't repeat all the details here; see:

https://bugzilla.kernel.org/show_bug.cgi?id=206643

TLDR:

mergerfs is stuck inside pread64(). The system is idle; manually accessing
one of the drives via hdparm/ls (or sync) unblocks the syscall and
mergerfs resumes.

I have no idea which subsystem is responsible for this behaviour, as the
scenario is a bit complex.

Thanks for your help.


P.S. Please CC me, as I'm not subscribed to the ML.



