* USB disk disconnect problems
@ 2022-08-21 11:17 James Dutton
  2022-08-21 14:47 ` Alan Stern
  2022-10-03 18:04 ` James Dutton
  0 siblings, 2 replies; 15+ messages in thread
From: James Dutton @ 2022-08-21 11:17 UTC (permalink / raw)
  To: linux-usb

Hi,

Say I have mounted a usb disk.
I then disconnect the usb device
Linux complains about failed writes etc.
I then plug the usb device back in
Linux still complains about failed writes, and does not recover.

How do I get Linux to recognise the reinserted usb disk and carry on as normal?

I know my suggested behaviour might be detrimental for some users, in
case one modifies the usb disk in another computer and then comes
back, but I would like an option that assumes it has not been plugged
into anything else.

The reason being, I have a system that boots from a USB disk.
Due to interference, the USB device disconnects for a second or two
and then comes back, but Linux does not see it and I have to reboot
Linux to recover. So, in this situation I wish Linux to be able to
recover immediately, without needing a reboot.

The physical USB device removal then reinserting reproduces the
problem I am seeing, so I thought it would be a good example to get
working, if we could.

Can anyone give me any pointers as to where to start with fixing this?

Kind Regards

James


* Re: USB disk disconnect problems
  2022-08-21 11:17 USB disk disconnect problems James Dutton
@ 2022-08-21 14:47 ` Alan Stern
  2022-08-21 16:36   ` James Dutton
                     ` (2 more replies)
  2022-10-03 18:04 ` James Dutton
  1 sibling, 3 replies; 15+ messages in thread
From: Alan Stern @ 2022-08-21 14:47 UTC (permalink / raw)
  To: James Dutton; +Cc: linux-usb

On Sun, Aug 21, 2022 at 12:17:30PM +0100, James Dutton wrote:
> Hi,
> 
> Say I have mounted a usb disk.
> I then disconnect the usb device
> Linux complains about failed writes etc.
> I then plug the usb device back in
> Linux still complains about failed writes, and does not recover.
> 
> How do I get Linux to recognise the reinserted usb disk and carry on as normal?

As far as I know, there's only one way to do it: Go into system suspend 
before disconnecting the USB drive, and plug the drive back in before 
waking the system up.

> I know my suggested behaviour might be detrimental for some users, in
> case one modifies the usb disk in another computer and then comes
> back, but I would like an option that assumes it has not been plugged
> into anything else.

The resume procedure makes this assumption, if it finds that something 
has been disconnected and reconnected.

> The reason being, I have a system that boots from a USB disk.
> Due to interference, the USB device disconnects for a second or two
> and then comes back, but Linux does not see it and I have to reboot
> Linux to recover. So, in this situation I wish Linux to be able to
> recover immediately, without needing a reboot.

There is no way to do this.  For example, consider all those failed 
writes that you get error messages about.  Once they have failed, the 
system does not try to remember them in case there's a possibility of 
trying them again later.  They're just lost.

Similarly with failed reads.  When a program tries to read something 
from a disk and the read fails, the program generally does not wait for 
a while and then retry the read, to see if the disk will magically start 
working again.
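
To make the point about retries concrete, here is a minimal sketch of what an application-level "wait a while and retry" read would look like. The helper name `read_with_retry` and the choice of retryable errno values are assumptions made for illustration; nothing in this thread says any program does this. It also would not be expected to rescue the re-plugged disk case, because the old file descriptor still refers to the device that went away.

```python
import errno
import time


def read_with_retry(read_fn, attempts=5, delay_s=1.0, sleep=time.sleep):
    """Call read_fn(), retrying on EIO/ENODEV up to `attempts` times.

    read_fn is any zero-argument callable that performs the read.
    The set of retryable errors is an assumption for this sketch.
    """
    retryable = {errno.EIO, errno.ENODEV}
    for attempt in range(1, attempts + 1):
        try:
            return read_fn()
        except OSError as exc:
            # Give up on non-retryable errors or once attempts are used up.
            if exc.errno not in retryable or attempt == attempts:
                raise
            sleep(delay_s)
```

The wait is bounded by `attempts * delay_s`, so the caller still learns about a dead device after a known delay.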

> The physical USB device removal then reinserting reproduces the
> problem I am seeing, so I thought it would be a good example to get
> working, if we could.
> 
> Can anyone give me any pointers as to where to start with fixing this?

Sorry I can't be of any more help.

Alan Stern


* Re: USB disk disconnect problems
  2022-08-21 14:47 ` Alan Stern
@ 2022-08-21 16:36   ` James Dutton
  2022-08-21 16:40     ` James Dutton
       [not found]   ` <CAA6KcBC2wEc78fgrMLBfbyEinR3rVUY6z8HeUbE=wtv0c4BP2Q@mail.gmail.com>
  2022-08-21 20:03   ` Matthew Dharm
  2 siblings, 1 reply; 15+ messages in thread
From: James Dutton @ 2022-08-21 16:36 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-usb

On Sun, 21 Aug 2022 at 15:47, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> > The reason being, I have a system that boots from a USB disk.
> > Due to interference, the USB device disconnects for a second or two
> > and then comes back, but Linux does not see it and I have to reboot
> > Linux to recover. So, in this situation I wish Linux to be able to
> > recover immediately, without needing a reboot.
>
> There is no way to do this.  For example, consider all those failed
> writes that you get error messages about.  Once they have failed, the
> system does not try to remember them in case there's a possibility of
> trying them again later.  They're just lost.
I guess the solution would have to include a "retry in 1 second's
time" type failure mode, instead of just lost.
I.e. differentiate between the disk responding that the media failed,
and the link being down to the disk so the write message could not be
sent.
For example, NFS waits around for the network to return, maybe we
could add that functionality between a filesystem and usb storage.
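
The distinction proposed here can be sketched as a small classifier. This is only an illustration of the idea: `FailureKind`, `classify`, and the errno-to-category mapping are hypothetical, not something the kernel exposes in this form.

```python
import errno
from enum import Enum


class FailureKind(Enum):
    MEDIA = "media"  # the disk answered and reported a failure
    LINK = "link"    # the command could not be delivered to the disk


# Assumed mapping for illustration: these errors suggest the link is down.
LINK_ERRNOS = {errno.ENODEV, errno.ETIMEDOUT}


def classify(err):
    """Classify a kernel-style negative (or positive) errno value."""
    return FailureKind.LINK if abs(err) in LINK_ERRNOS else FailureKind.MEDIA


def should_retry(err):
    """Only link failures are candidates for a 'retry in 1 second' policy."""
    return classify(err) is FailureKind.LINK
```

With this split, a media error would still fail immediately, while a link error could be queued for a later retry.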


* Re: USB disk disconnect problems
  2022-08-21 16:36   ` James Dutton
@ 2022-08-21 16:40     ` James Dutton
  2022-08-21 18:11       ` Alan Stern
  0 siblings, 1 reply; 15+ messages in thread
From: James Dutton @ 2022-08-21 16:40 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-usb

On Sun, 21 Aug 2022 at 17:36, James Dutton <james.dutton@gmail.com> wrote:
>
> On Sun, 21 Aug 2022 at 15:47, Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > > The reason being, I have a system that boots from a USB disk.
> > > Due to interference, the USB device disconnects for a second or two
> > > and then comes back, but Linux does not see it and I have to reboot
> > > Linux to recover. So, in this situation I wish Linux to be able to
> > > recover immediately, without needing a reboot.
> >
> > There is no way to do this.  For example, consider all those failed
> > writes that you get error messages about.  Once they have failed, the
> > system does not try to remember them in case there's a possibility of
> > trying them again later.  They're just lost.
> I guess the solution would have to include a "retry in 1 second's
> time" type failure mode, instead of just lost.
> I.e. differentiate between the disk responding that the media failed,
> and the link being down to the disk so the write message could not be
> sent.
> For example, NFS waits around for the network to return, maybe we
> could add that functionality between a filesystem and usb storage.

As a side note, I have seen USB links failing. Normally just to
something like a keyboard or mouse, so it just comes back without the
user knowing anything was wrong.
The problem is USB links to disks don't recover currently.


* Re: USB disk disconnect problems
  2022-08-21 16:40     ` James Dutton
@ 2022-08-21 18:11       ` Alan Stern
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Stern @ 2022-08-21 18:11 UTC (permalink / raw)
  To: James Dutton; +Cc: linux-usb

On Sun, Aug 21, 2022 at 05:40:23PM +0100, James Dutton wrote:
> On Sun, 21 Aug 2022 at 17:36, James Dutton <james.dutton@gmail.com> wrote:
> >
> > On Sun, 21 Aug 2022 at 15:47, Alan Stern <stern@rowland.harvard.edu> wrote:
> > >
> > > > The reason being, I have a system that boots from a USB disk.
> > > > Due to interference, the USB device disconnects for a second or two
> > > > and then comes back, but Linux does not see it and I have to reboot
> > > > Linux to recover. So, in this situation I wish Linux to be able to
> > > > recover immediately, without needing a reboot.
> > >
> > > There is no way to do this.  For example, consider all those failed
> > > writes that you get error messages about.  Once they have failed, the
> > > system does not try to remember them in case there's a possibility of
> > > trying them again later.  They're just lost.
> > I guess the solution would have to include a "retry in 1 second's
> > time" type failure mode, instead of just lost.

Maybe, in theory.  In your case, I think a better solution would be to 
eliminate the interference that causes the transient disconnects to 
occur in the first place.  USB isn't designed to operate reliably in an 
environment filled with that much noise.

> > I.e. differentiate between the disk responding that the media failed,
> > and the link being down to the disk so the write message could not be
> > sent.
> > For example, NFS waits around for the network to return, maybe we
> > could add that functionality between a filesystem and usb storage.

In theory it could be done.  I suspect the overall benefit would not be 
very large; I have not heard lots of reports from other people facing 
the problem you have.  Consider that neither Windows nor Mac OS-X does 
this.

Also, doing this would lead to other problems.  For instance, I'm sure 
some people want to know that a device has stopped working as soon as 
the problem begins; they would get upset if the system kept trying to 
reconnect for tens of seconds before finally deciding the device was 
gone for good.  (Consider the way people have complained a lot over the 
years about NFS and its extremely long uninterruptible waits.)

> As a side note, I have seen USB links failing. Normally just to
> something like a keyboard or mouse, so it just comes back without the
> user knowing anything was wrong.

That's different.  When the link to a USB mouse fails and then starts 
working again, the system doesn't think the mouse has recovered; it 
regards what happened as a new mouse being plugged in.  (Same with 
keyboards.)  The user doesn't notice anything because the system treats 
all mice the same.  In fact, you can even plug in two mice at the same 
time (that is, without bothering to wait for the first one to fail) and 
the system will accept input from both of them interchangeably.

> The problem is USB links to disks don't recover currently.

Well, you have to admit that treating disks like mice -- considering all 
of them to be the same -- would not be a good strategy.  :-)

(On the other hand, sometimes two disks really do get treated as though 
they are the same.  That's what happens in a RAID-1 (mirroring) setup.  
If you have mirrored USB disks, you can unplug one of them and the 
system will continue working.  And when you plug it back in later, the 
system will repair it as necessary and then go on using it normally 
without your noticing.  But obviously this isn't what you have in mind.)

Alan Stern


* Re: USB disk disconnect problems
       [not found]   ` <CAA6KcBC2wEc78fgrMLBfbyEinR3rVUY6z8HeUbE=wtv0c4BP2Q@mail.gmail.com>
@ 2022-08-21 19:03     ` Alan Stern
  0 siblings, 0 replies; 15+ messages in thread
From: Alan Stern @ 2022-08-21 19:03 UTC (permalink / raw)
  To: Matthew Dharm; +Cc: James Dutton, linux-usb

On Sun, Aug 21, 2022 at 11:42:00AM -0700, Matthew Dharm wrote:
> On Sun, Aug 21, 2022 at 7:47 AM Alan Stern <stern@rowland.harvard.edu>
> wrote:
> 
> > On Sun, Aug 21, 2022 at 12:17:30PM +0100, James Dutton wrote:
> > > I know my suggested behaviour might be detrimental for some users, in
> > > case one modifies the usb disk in another computer and then comes
> > > back, but I would like an option that assumes it has not been plugged
> > > into anything else.
> 
> 
> In the “old days” (that is, my original design for usb-storage) it used to
> do exactly what you are looking for - based on VID, DID, and SerialNumber
> it would “remember” devices. The SCSI host would never be destroyed, and
> when a device re-appeared it would be re-connected to the existing host.

Ah yes...  I do remember those days, but not very often.  :-)

> That caused all sorts of problems. The SCSI and block layers just couldn’t
> handle it well. A clean umount / mount cycle worked fine, but if you
> unexpectedly disconnected the device all hell broke loose and there was no
> way to recover.
> 
> I did it this way because, way back when, there were issues dynamically
> destroying SCSI hosts. The people who worked on those other layers found it
> much, much easier to fix that problem than try to make it possible to
> recover from an unexpected disconnect.
> 
> Honestly, I’m not even sure where you would need to begin to make this
> work. It would require pretty radical changes in the block I/O layers to
> differentiate different failure modes, keep a lot more data around after
> certain types of failures, allow for specifying which devices this new
> policy (which is assuming reconnected devices really haven’t been altered)
> applies to, etc — it’s a big lift.

Provided you don't mind giving up after 30 seconds (the default SCSI 
timeout), you wouldn't need to change the block or other layers.  All 
you would have to do is avoid reporting a command failure if the reason 
for the failure is disconnection, wait for the device to reappear, and 
then retry the command.  (Yes, there would be a few extra complications 
but that's the basic idea.)  As far as the SCSI or block layers are 
concerned, it would look like the I/O succeeded but took an unusually 
long time to complete.
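
The basic idea can be modelled in a few lines. This is a user-space simulation under stated assumptions, not driver code: `Disconnected`, `submit_with_hold`, and the `wait_for_reconnect` callback are hypothetical names, and the "few extra complications" are ignored. Only the 30-second figure comes from the text above.

```python
import time

SCSI_TIMEOUT_S = 30.0  # default SCSI command timeout mentioned above


class Disconnected(Exception):
    """Raised by the hypothetical transport when the link is down."""


def submit_with_hold(send, wait_for_reconnect,
                     timeout_s=SCSI_TIMEOUT_S, clock=time.monotonic):
    """Run send(); on disconnect, wait for the device and retry.

    wait_for_reconnect(remaining_s) must return True if the device came
    back within remaining_s seconds. The failure is reported only once
    the overall deadline has passed or the device did not reappear.
    """
    deadline = clock() + timeout_s
    while True:
        try:
            return send()
        except Disconnected:
            remaining = deadline - clock()
            if remaining <= 0 or not wait_for_reconnect(remaining):
                raise
```

From the caller's point of view a successful retry is indistinguishable from a command that simply took a long time.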

Alan Stern


* Re: USB disk disconnect problems
  2022-08-21 14:47 ` Alan Stern
  2022-08-21 16:36   ` James Dutton
       [not found]   ` <CAA6KcBC2wEc78fgrMLBfbyEinR3rVUY6z8HeUbE=wtv0c4BP2Q@mail.gmail.com>
@ 2022-08-21 20:03   ` Matthew Dharm
  2022-08-21 20:59     ` James Dutton
  2 siblings, 1 reply; 15+ messages in thread
From: Matthew Dharm @ 2022-08-21 20:03 UTC (permalink / raw)
  To: Alan Stern; +Cc: James Dutton, linux-usb

(Re-sending, as the first one got blocked by the list for having an HTML part).

On Sun, Aug 21, 2022 at 7:47 AM Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Sun, Aug 21, 2022 at 12:17:30PM +0100, James Dutton wrote:
> > I know my suggested behaviour might be detrimental for some users, in
> > case one modifies the usb disk in another computer and then comes
> > back, but I would like an option that assumes it has not been plugged
> > into anything else.

In the “old days” (that is, my original design for usb-storage) it
used to do exactly what you are looking for - based on VID, DID, and
SerialNumber it would “remember” devices. The SCSI host would never be
destroyed, and when a device re-appeared it would be re-connected to
the existing host.
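
As a rough model of that old behaviour, the sketch below keeps a table keyed by (VID, DID, SerialNumber) and hands back the same host number when a known device reappears. `DeviceKey`, `HostRegistry`, and the example IDs in the usage note are hypothetical; this is not how the original usb-storage code was written.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DeviceKey:
    vid: int      # vendor ID
    did: int      # device ID
    serial: str   # SerialNumber string


@dataclass
class HostRegistry:
    hosts: dict = field(default_factory=dict)
    next_id: int = 0

    def attach(self, key):
        """Return the existing host number for a known device, else allocate one."""
        if key not in self.hosts:
            self.hosts[key] = self.next_id
            self.next_id += 1
        return self.hosts[key]
```

For example, calling `attach(DeviceKey(0x1234, 0x5678, "ABC"))` twice returns the same host number both times, which is exactly the "remembering" described above. Note that it says nothing about whether the media changed in between.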

That caused all sorts of problems. The SCSI and block layers just
couldn’t handle it well. A clean umount / mount cycle worked fine, but
if you unexpectedly disconnected the device all hell broke loose and
there was no way to recover.

I did it this way because, way back when, there were issues
dynamically destroying SCSI hosts. The people who worked on those
other layers found it much, much easier to fix that problem than try
to make it possible to recover from an unexpected disconnect.

Honestly, I’m not even sure where you would need to begin to make this
work. It would require pretty radical changes in the block I/O layers
to differentiate different failure modes, keep a lot more data around
after certain types of failures, allow for specifying which devices
this new policy (which is assuming reconnected devices really haven’t
been altered) applies to, etc — it’s a big lift.

Matt
aka “the guy who originally designed how this works”

-- 
Matthew Dharm
Former Maintainer, USB Mass Storage driver for Linux


* Re: USB disk disconnect problems
  2022-08-21 20:03   ` Matthew Dharm
@ 2022-08-21 20:59     ` James Dutton
  2022-08-21 21:26       ` Matthew Dharm
  2022-08-22 10:18       ` Oliver Neukum
  0 siblings, 2 replies; 15+ messages in thread
From: James Dutton @ 2022-08-21 20:59 UTC (permalink / raw)
  To: Matthew Dharm; +Cc: Alan Stern, linux-usb

On Sun, 21 Aug 2022 at 21:03, Matthew Dharm
<mdharm-usb@one-eyed-alien.net> wrote:
>
> (Re-sending, as the first one got blocked by the list for having an HTML part).
>
> On Sun, Aug 21, 2022 at 7:47 AM Alan Stern <stern@rowland.harvard.edu> wrote:
> >
> > On Sun, Aug 21, 2022 at 12:17:30PM +0100, James Dutton wrote:
> > > I know my suggested behaviour might be detrimental for some users, in
> > > case one modifies the usb disk in another computer and then comes
> > > back, but I would like an option that assumes it has not been plugged
> > > into anything else.
>
> In the “old days” (that is, my original design for usb-storage) it
> used to do exactly what you are looking for - based on VID, DID, and
> SerialNumber it would “remember” devices. The SCSI host would never be
> destroyed, and when a device re-appeared it would be re-connected to
> the existing host.
>
> That caused all sorts of problems. The SCSI and block layers just
> couldn’t handle it well. A clean umount / mount cycle worked fine, but
> if you unexpectedly disconnected the device all hell broke loose and
> there was no way to recover.
>
> I did it this way because, way back when, there were issues
> dynamically destroying SCSI hosts. The people who worked on those
> other layers found it much, much easier to fix that problem than try
> to make it possible to recover from an unexpected disconnect.
>
> Honestly, I’m not even sure where you would need to begin to make this
> work. It would require pretty radical changes in the block I/O layers
> to differentiate different failure modes, keep a lot more data around
> after certain types of failures, allow for specifying which devices
> this new policy (which is assuming reconnected devices really haven’t
> been altered) applies to, etc — it’s a big lift.
>

Are there any situations where we should actually try to recover?
What about:
The OS has not needed to read/write to the disk in a while. The USB
disk idles out and goes into a power save mode by itself.
The OS then wishes to write something, but would need to go through
some sort of wake up procedure first.

I don't know if that is a state that is available for USB devices, but
if it was, would it be fair to try and recover?


* Re: USB disk disconnect problems
  2022-08-21 20:59     ` James Dutton
@ 2022-08-21 21:26       ` Matthew Dharm
  2022-08-21 22:56         ` James Dutton
  2022-08-22 10:18       ` Oliver Neukum
  1 sibling, 1 reply; 15+ messages in thread
From: Matthew Dharm @ 2022-08-21 21:26 UTC (permalink / raw)
  To: James Dutton; +Cc: Alan Stern, linux-usb

On Sun, Aug 21, 2022 at 2:00 PM James Dutton <james.dutton@gmail.com> wrote:
>
> On Sun, 21 Aug 2022 at 21:03, Matthew Dharm
> <mdharm-usb@one-eyed-alien.net> wrote:
> >
> > In the “old days” (that is, my original design for usb-storage) it
> > used to do exactly what you are looking for - based on VID, DID, and
> > SerialNumber it would “remember” devices. The SCSI host would never be
> > destroyed, and when a device re-appeared it would be re-connected to
> > the existing host.
> >
> > That caused all sorts of problems. The SCSI and block layers just
> > couldn’t handle it well. A clean umount / mount cycle worked fine, but
> > if you unexpectedly disconnected the device all hell broke loose and
> > there was no way to recover.
>
> Are there any situations where we should actually try to recover?
> What about:
> The OS has not needed to read/write to the disk in a while. The USB
> disk idles out and goes into a power save mode by itself.
> The OS then wishes to write something, but would need to go through
> some sort of wake up procedure first.
>
> I don't know if that is a state that is available for USB devices, but
> if it was, would it be fair to try and recover?

That scenario already happens all the time; rotating disks often
spin-down after an idle period and then automatically spin-up at the
next media-access command.  So long as they spin-up within the command
timeout (typically 30 seconds), there is no issue.  BUT, this is very
different from what you originally asked about -- in a low-power
spin-down state, the USB interface is still connected; only the
rotation has stopped.  From the computer's perspective, the device has
always remained attached; the only anomaly is that a command takes
longer-than-usual to complete.

The next level of deeper power savings would be a system-wide suspend
/ resume, which we've already discussed and is a path which is already
handled (and also different from the original scenario you described).
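
For anyone wanting to check the command timeout on their own system, SCSI disks expose it in sysfs as `/sys/block/<disk>/device/timeout` (in seconds). A minimal sketch for reading it follows; the function name and the `sysfs_root` parameter are assumptions added so the code can be tried against a fake tree.

```python
from pathlib import Path


def scsi_timeout_seconds(disk, sysfs_root="/sys"):
    """Return the SCSI command timeout in seconds for e.g. disk='sda'."""
    path = Path(sysfs_root) / "block" / disk / "device" / "timeout"
    return int(path.read_text().strip())
```

A disk that needs longer than this value to spin up would hit the error handling path rather than just completing late.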

Matt

-- 
Matthew Dharm
Former Maintainer, USB Mass Storage driver for Linux


* Re: USB disk disconnect problems
  2022-08-21 21:26       ` Matthew Dharm
@ 2022-08-21 22:56         ` James Dutton
  2022-08-22 10:03           ` Oliver Neukum
  0 siblings, 1 reply; 15+ messages in thread
From: James Dutton @ 2022-08-21 22:56 UTC (permalink / raw)
  To: Matthew Dharm; +Cc: Alan Stern, linux-usb

On Sun, 21 Aug 2022 at 22:26, Matthew Dharm
<mdharm-usb@one-eyed-alien.net> wrote:
>
> The next level of deeper power savings would be a system-wide suspend
> / resume, which we've already discussed and is a path which is already
> handled (and also different from the original scenario you described).
>

I tried a suspend / resume cycle.
1) The laptop suspends in that the screen blanks and the power LED
fades in/out as an indicator of suspend mode.
2) The USB device remains powered while suspended. (The LED on the
USB device stays on during suspend.)
3) I can remove and reinsert the USB during suspend and it still resumes ok.
4) On exit from suspend everything looks to work ok.

I see these messages in the syslog during the suspend/resume cycle:
<6>1 2022-08-21T23:18:57+01:00 nvme2 kernel - - -  [ 1127.688557] usb 4-2: reset SuperSpeed USB device number 2 using xhci_hcd
<4>1 2022-08-21T23:18:57+01:00 nvme2 kernel - - -  [ 1127.782252] usb 4-2: Enable of device-initiated U1 failed.
<4>1 2022-08-21T23:18:57+01:00 nvme2 kernel - - -  [ 1127.784263] usb 4-2: Enable of device-initiated U2 failed.

Is U1/U2 failing a problem that could maybe be causing the problems I have seen?
The error is in the logs, but the resume works, and the disk is accessible.


When the real problem occurs (not during suspend/resume), I see the following (an extract):
<6>1 2022-05-04T14:32:53+01:00 nvme2 kernel - - -  [20782.100705] sd 0:0:0:0: [sda] tag#8 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD
<6>1 2022-05-04T14:32:53+01:00 nvme2 kernel - - -  [20782.100707] sd 0:0:0:0: [sda] tag#8 CDB: Write(10) 2a 00 1c 51 11 20 00 00 20 00
<6>1 2022-05-04T14:32:53+01:00 nvme2 kernel - - -  [20782.115321] scsi host0: uas_eh_device_reset_handler start
<6>1 2022-05-04T14:32:53+01:00 nvme2 kernel - - -  [20782.248337] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
<4>1 2022-05-04T14:32:58+01:00 nvme2 kernel - - -  [20787.463620] xhci_hcd 0000:00:14.0: Trying to add endpoint 0x83 without dropping it.
<3>1 2022-05-04T14:32:58+01:00 nvme2 kernel - - -  [20787.463633] usb 4-1: failed to restore interface 0 altsetting 1 (error=-110)
<6>1 2022-05-04T14:32:58+01:00 nvme2 kernel - - -  [20787.471524] scsi host0: uas_eh_device_reset_handler FAILED err -19
<6>1 2022-05-04T14:32:58+01:00 nvme2 kernel - - -  [20787.471540] sd 0:0:0:0: Device offlined - not ready after error recovery
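
As an aside, the Write(10) CDB in the extract above can be decoded to see which write was in flight when the abort happened. The sketch below assumes the standard 10-byte WRITE(10) layout (opcode 0x2A, big-endian LBA in bytes 2 to 5, transfer length in bytes 7 to 8); the function name is made up for this example.

```python
def decode_write10(cdb_hex):
    """Decode a WRITE(10) CDB given as a hex string; return (lba, blocks)."""
    cdb = bytes.fromhex(cdb_hex)  # spaces between bytes are accepted
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a 10-byte WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # transfer length in blocks
    return lba, blocks


lba, blocks = decode_write10("2a 00 1c 51 11 20 00 00 20 00")
print(lba, blocks)  # 475074848 32
```

So the command that timed out was a 32-block write starting at LBA 475074848.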


So, it is attempting to recover, but the recovery fails.
What are error -110 and err -19?

Are there any "quirks" that I could try enabling in relation to reset problems?

Kind Regards

James


* Re: USB disk disconnect problems
  2022-08-21 22:56         ` James Dutton
@ 2022-08-22 10:03           ` Oliver Neukum
  0 siblings, 0 replies; 15+ messages in thread
From: Oliver Neukum @ 2022-08-22 10:03 UTC (permalink / raw)
  To: James Dutton, Matthew Dharm; +Cc: Alan Stern, linux-usb



On 22.08.22 00:56, James Dutton wrote:

> I see these messages in the syslog during the suspend/resume cycle:
> <6>1 2022-08-21T23:18:57+01:00 nvme2 kernel - - -  [ 1127.688557] usb 4-2: reset SuperSpeed USB device number 2 using xhci_hcd
> <4>1 2022-08-21T23:18:57+01:00 nvme2 kernel - - -  [ 1127.782252] usb 4-2: Enable of device-initiated U1 failed.
> <4>1 2022-08-21T23:18:57+01:00 nvme2 kernel - - -  [ 1127.784263] usb 4-2: Enable of device-initiated U2 failed.
> 
> Is U1/U2 failing a problem that could maybe be causing the problems I have seen?
> The error is in the logs, but the resume works, and the disk is accessible.

That is power management. A disk that uses only power management
under the host's control is not a problem.

> When the real problem occurs (not during suspend/resume), an extract here:
> <6>1 2022-05-04T14:32:53+01:00 nvme2 kernel - - -  [20782.100705] sd 0:0:0:0: [sda] tag#8 uas_eh_abort_handler 0 uas-tag 2 inflight: CMD

A timeout has happened.

> <6>1 2022-05-04T14:32:53+01:00 nvme2 kernel - - -  [20782.100707] sd 0:0:0:0: [sda] tag#8 CDB: Write(10) 2a 00 1c 51 11 20 00 00 20 00
> <6>1 2022-05-04T14:32:53+01:00 nvme2 kernel - - -  [20782.115321] scsi host0: uas_eh_device_reset_handler start

At that time the SCSI layer does not know why a timeout has happened, so
it starts generic error handling, involving a reset.

> <6>1 2022-05-04T14:32:53+01:00 nvme2 kernel - - -  [20782.248337] usb 4-1: reset SuperSpeed USB device number 2 using xhci_hcd
> <4>1 2022-05-04T14:32:58+01:00 nvme2 kernel - - -  [20787.463620] xhci_hcd 0000:00:14.0: Trying to add endpoint 0x83 without dropping it.

This should not happen

> <3>1 2022-05-04T14:32:58+01:00 nvme2 kernel - - -  [20787.463633] usb 4-1: failed to restore interface 0 altsetting 1 (error=-110)
> <6>1 2022-05-04T14:32:58+01:00 nvme2 kernel - - -  [20787.471524] scsi host0: uas_eh_device_reset_handler FAILED err -19
> <6>1 2022-05-04T14:32:58+01:00 nvme2 kernel - - -  [20787.471540] sd 0:0:0:0: Device offlined - not ready after error recovery

In this case the kernel does not think that your device has been
disconnected. All error handling has failed. It gives up on the
device, but it is still known to the system.

> So, it is attempting to recover, but the recovery fails.
> What is error -110 and err -19 ?

-19 is ENODEV
-110 is ETIMEDOUT

Those numbers are to be found in
include/uapi/asm-generic/errno-base.h
include/uapi/asm-generic/errno.h
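
If the kernel headers are not at hand, the same lookup can be done from Python's standard `errno` module, which mirrors the platform's errno values. A small sketch (the helper name is made up; kernel logs print the values negated, hence the `abs`):

```python
import errno
import os


def describe_errno(code):
    """Map a kernel-style error code (e.g. -19) to (symbolic name, message)."""
    n = abs(code)
    return errno.errorcode.get(n, "UNKNOWN"), os.strerror(n)


for code in (-19, -110):
    print(code, *describe_errno(code))
```

On Linux this should report ENODEV for -19 and ETIMEDOUT for -110, matching the values above; the numbers for some errors differ on other operating systems.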


> Are there any "quirks" that I could try enabling in relation to reset problems?

Probably not. Is this log complete?

	Regards
		Oliver



* Re: USB disk disconnect problems
  2022-08-21 20:59     ` James Dutton
  2022-08-21 21:26       ` Matthew Dharm
@ 2022-08-22 10:18       ` Oliver Neukum
  1 sibling, 0 replies; 15+ messages in thread
From: Oliver Neukum @ 2022-08-22 10:18 UTC (permalink / raw)
  To: James Dutton, Matthew Dharm; +Cc: Alan Stern, linux-usb



On 21.08.22 22:59, James Dutton wrote:
> On Sun, 21 Aug 2022 at 21:03, Matthew Dharm
> <mdharm-usb@one-eyed-alien.net> wrote:

>> In the “old days” (that is, my original design for usb-storage) it
>> used to do exactly what you are looking for - based on VID, DID, and
>> SerialNumber it would “remember” devices. The SCSI host would never be
>> destroyed, and when a device re-appeared it would be re-connected to
>> the existing host.

Arguably, in case ACPI tells us that the port is internal we ought
to reintroduce that behavior. It is very much an edge case, though.

>> Honestly, I’m not even sure where you would need to begin to make this
>> work. It would require pretty radical changes in the block I/O layers
>> to differentiate different failure modes, keep a lot more data around
>> after certain types of failures, allow for specifying which devices
>> this new policy (which is assuming reconnected devices really haven’t
>> been altered) applies to, etc — it’s a big lift.

Basically like failover with multipath, I'd say.

> Are there any situations where we should actually try to recover?
> What about:
> The OS has not needed to read/write to the disk in a while. The USB
> disk idles out and goes into a power save mode by itself.
> The OS then wishes to write something, but would need to go through
> some sort of wake up procedure first.

We have three issues

1) Is this the same device?
2) Has it been altered while it was disconnected?
3) What do we do in case of memory pressure causing pages to be laundered?

In case of device persistence we ignore #1 and #2, and #3
does not exist.

> I don't know if that is a state that is available for USB devices, but
> if it was, would it be fair to try and recover?

That is strictly speaking not a USB question. Every device has this
issue. You just do not check on resumption from S3 or S4 whether somebody
has replaced the SATA disk in your system.

	Regards
		Oliver


* Re: USB disk disconnect problems
  2022-08-21 11:17 USB disk disconnect problems James Dutton
  2022-08-21 14:47 ` Alan Stern
@ 2022-10-03 18:04 ` James Dutton
  2022-10-03 18:17   ` Alan Stern
  1 sibling, 1 reply; 15+ messages in thread
From: James Dutton @ 2022-10-03 18:04 UTC (permalink / raw)
  To: linux-usb

On Sun, 21 Aug 2022 at 12:17, James Dutton <james.dutton@gmail.com> wrote:
>
> Hi,
>
> Say I have mounted a usb disk.
> I then disconnect the usb device
> Linux complains about failed writes etc.
> I then plug the usb device back in
> Linux still complains about failed writes, and does not recover.
>
> How do I get Linux to recognise the reinserted usb disk and carry on as normal?
>
> I know my suggested behaviour might be detrimental for some users, in
> case one modifies the usb disk in another computer and then comes
> back, but I would like an option that assumes it has not been plugged
> into anything else.
>
> The reason being, I have a system that boots from a USB disk.
> Due to interference, the USB device disconnects for a second or two
> and then comes back, but Linux does not see it and I have to reboot
> Linux to recover. So, in this situation I wish Linux to be able to
> recover immediately, without needing a reboot.
>
> The physical USB device removal then reinserting reproduces the
> problem I am seeing, so I thought it would be a good example to get
> working, if we could.
>
> Can anyone give me any pointers as to where to start with fixing this?
>
> Kind Regards
>
> James

I have done some more tests.
With the device plugged in, I manually send a command to reset
the USB device.
Using instructions listed here:
https://askubuntu.com/questions/645/how-do-you-reset-a-usb-device-from-the-command-line

The reset fails.
It never recovers.
So, I think there is some problem relating to USB 3.x reset, and maybe
just my specific device which is an NVME storage in a USB dock.
I think the problem is more to do with the Linux kernel's USB 3.x
reset procedure, rather than any other cause.
Is there any quirk or test I can add, that would remove power from the
USB port and return it, as part of the reset procedure?
Or, is there any extra debug logging I can enable to help diagnose
where the reset function is failing?


* Re: USB disk disconnect problems
  2022-10-03 18:04 ` James Dutton
@ 2022-10-03 18:17   ` Alan Stern
  2022-10-03 20:21     ` James Dutton
  0 siblings, 1 reply; 15+ messages in thread
From: Alan Stern @ 2022-10-03 18:17 UTC (permalink / raw)
  To: James Dutton; +Cc: linux-usb

On Mon, Oct 03, 2022 at 07:04:05PM +0100, James Dutton wrote:
> I have done some more tests.
> With the device plugged in, I manually send a command to reset
> the USB device.
> Using instructions listed here:
> https://askubuntu.com/questions/645/how-do-you-reset-a-usb-device-from-the-command-line
> 
> The reset fails.
> It never recovers.
> So, I think there is some problem relating to USB 3.x reset, and maybe
> just my specific device which is an NVME storage in a USB dock.
> I think the problem is more to do with the Linux kernel's USB 3.x
> reset procedure, rather than any other cause.
> Is there any quirk or test I can add, that would remove power from the
> USB port and return it, as part of the reset procedure?
> Or, is there any extra debug logging I can enable to help diagnose
> where the reset function is failing?

You can try collecting a usbmon trace of the reset (instructions on the 
web or in Documentation/usb/usbmon.rst in the kernel source).  That will 
provide some clues as to whether the problem lies in the reset itself or 
in the activities that follow the reset.
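
A rough sketch of such a capture through usbmon's text interface,
assuming debugfs is mounted at /sys/kernel/debug, the usbmon module is
loaded (modprobe usbmon) and the script runs as root. The function names
and the output path are illustrative only. Bus number 0 selects all
buses:

```python
def usbmon_path(bus: int) -> str:
    """Path of the usbmon text interface for one bus (0 means all buses)."""
    return f"/sys/kernel/debug/usb/usbmon/{bus}u"


def capture_usbmon(bus: int, out_path: str) -> None:
    """Copy usbmon text events to out_path until interrupted with Ctrl-C."""
    with open(usbmon_path(bus), "rb", buffering=0) as src, \
            open(out_path, "wb") as dst:
        try:
            while True:
                chunk = src.read(4096)  # blocks until an event is available
                if not chunk:
                    break
                dst.write(chunk)
                dst.flush()
        except KeyboardInterrupt:
            pass  # stop the capture cleanly once the reset has been issued
```

Start the capture first, trigger the reset from a second terminal, then
stop the capture once the device has either recovered or given up.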

Have you tried running a similar test using, say, a plain old USB thumb 
drive in place of the NVME storage device?

Alan Stern


* Re: USB disk disconnect problems
  2022-10-03 18:17   ` Alan Stern
@ 2022-10-03 20:21     ` James Dutton
  0 siblings, 0 replies; 15+ messages in thread
From: James Dutton @ 2022-10-03 20:21 UTC (permalink / raw)
  To: Alan Stern; +Cc: linux-usb

On Mon, 3 Oct 2022 at 19:17, Alan Stern <stern@rowland.harvard.edu> wrote:
>
> On Mon, Oct 03, 2022 at 07:04:05PM +0100, James Dutton wrote:
> > I have done some more tests.
> > With the device plugged in, I manually send a command to reset
> > the USB device.
> > Using instructions listed here:
> > https://askubuntu.com/questions/645/how-do-you-reset-a-usb-device-from-the-command-line
> >
> > The reset fails.
> > It never recovers.
> > So, I think there is some problem relating to USB 3.x reset, and maybe
> > just my specific device which is an NVME storage in a USB dock.
> > I think the problem is more to do with the Linux kernel's USB 3.x
> > reset procedure, rather than any other cause.
> > Is there any quirk or test I can add, that would remove power from the
> > USB port and return it, as part of the reset procedure?
> > Or, is there any extra debug logging I can enable to help diagnose
> > where the reset function is failing?
>
> You can try collecting a usbmon trace of the reset (instructions on the
> web or in Documentation/usb/usbmon.rst in the kernel source).  That will
> provide some clues as to whether the problem lies in the reset itself or
> in the activities that follow the reset.
>
> Have you tried running a similar test using, say, a plain old USB thumb
> drive in place of the NVME storage device?
>

I have tried the reset command on USB 2.0 and USB 3.0 flash sticks,
and they reset OK.
So, it seems to be a problem with this specific NVME USB device.
This NVME USB device says it is USB 3.2 when I do lsusb. I don't have
a USB 3.2 flash stick.
lsusb output:
Bus 004 Device 002: ID 0bda:9210 Realtek Semiconductor Corp. RTL9210 M.2 NVME Adapter
Device Descriptor:
  bLength                18
  bDescriptorType         1
  bcdUSB               3.20
  bDeviceClass            0
  bDeviceSubClass         0
  bDeviceProtocol         0
  bMaxPacketSize0         9
  idVendor           0x0bda Realtek Semiconductor Corp.
  idProduct          0x9210 RTL9210 M.2 NVME Adapter
  bcdDevice           20.01


I will try to capture a usbmon trace and compare the flash sticks' reset
with that of the NVME USB device.
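
To make two such traces easier to compare, the lines can be filtered
down to a single device. A small sketch, assuming the usbmon text format
described in Documentation/usb/usbmon.rst (URB tag, timestamp in
microseconds, event type, then an address word of the form
type:bus:device:endpoint). The class and function names are
hypothetical:

```python
from typing import Iterable, List, NamedTuple, Optional


class UsbmonEvent(NamedTuple):
    tag: str           # URB tag
    timestamp_us: int  # timestamp in microseconds
    event: str         # S = submission, C = callback, E = submission error
    urb_type: str      # e.g. "Ci" = control in, "Bo" = bulk out
    bus: int
    device: int
    endpoint: int
    rest: str          # remaining fields, left unparsed


def parse_usbmon_line(line: str) -> Optional[UsbmonEvent]:
    """Parse the leading fields of one usbmon text line; None if malformed."""
    parts = line.split(None, 4)
    if len(parts) < 4:
        return None
    tag, ts, event, addr = parts[:4]
    rest = parts[4].strip() if len(parts) > 4 else ""
    fields = addr.split(":")
    if len(fields) != 4:
        return None
    urb_type, bus, device, endpoint = fields
    try:
        return UsbmonEvent(tag, int(ts), event, urb_type,
                           int(bus), int(device), int(endpoint), rest)
    except ValueError:
        return None


def events_for_device(lines: Iterable[str], device: int) -> List[UsbmonEvent]:
    """Keep only the events addressed to one device number."""
    parsed = (parse_usbmon_line(line) for line in lines)
    return [ev for ev in parsed if ev is not None and ev.device == device]
```

Running events_for_device over each capture with the device number
reported by lsusb leaves two shorter lists that can be compared side by
side.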


Thread overview: 15+ messages
2022-08-21 11:17 USB disk disconnect problems James Dutton
2022-08-21 14:47 ` Alan Stern
2022-08-21 16:36   ` James Dutton
2022-08-21 16:40     ` James Dutton
2022-08-21 18:11       ` Alan Stern
     [not found]   ` <CAA6KcBC2wEc78fgrMLBfbyEinR3rVUY6z8HeUbE=wtv0c4BP2Q@mail.gmail.com>
2022-08-21 19:03     ` Alan Stern
2022-08-21 20:03   ` Matthew Dharm
2022-08-21 20:59     ` James Dutton
2022-08-21 21:26       ` Matthew Dharm
2022-08-21 22:56         ` James Dutton
2022-08-22 10:03           ` Oliver Neukum
2022-08-22 10:18       ` Oliver Neukum
2022-10-03 18:04 ` James Dutton
2022-10-03 18:17   ` Alan Stern
2022-10-03 20:21     ` James Dutton
