From: Alan Stern <stern@rowland.harvard.edu>
To: Oliver Neukum <oneukum@suse.com>
Cc: Hayes Wang <hayeswang@realtek.com>,
	Jason-ch Chen <jason-ch.chen@mediatek.com>,
	"matthias.bgg@gmail.com" <matthias.bgg@gmail.com>,
	"linux-usb@vger.kernel.org" <linux-usb@vger.kernel.org>,
	"netdev@vger.kernel.org" <netdev@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org" 
	<linux-arm-kernel@lists.infradead.org>,
	"linux-mediatek@lists.infradead.org" 
	<linux-mediatek@lists.infradead.org>,
	"Project_Global_Chrome_Upstream_Group@mediatek.com" 
	<Project_Global_Chrome_Upstream_Group@mediatek.com>,
	"hsinyi@google.com" <hsinyi@google.com>,
	nic_swsd <nic_swsd@realtek.com>
Subject: Re: [PATCH] r8152: stop submitting rx for -EPROTO
Date: Mon, 4 Oct 2021 10:33:05 -0400
Message-ID: <20211004143305.GA583555@rowland.harvard.edu>
In-Reply-To: <72573b91-11d7-55a0-0cd8-5afbc289b38c@suse.com>

On Mon, Oct 04, 2021 at 01:44:54PM +0200, Oliver Neukum wrote:
> 
> On 01.10.21 17:22, Alan Stern wrote:
> > On Fri, Oct 01, 2021 at 03:26:48AM +0000, Hayes Wang wrote:
> >>> Alan Stern <stern@rowland.harvard.edu>
> >>> [...]
> >>>> There has been some discussion about this in the past.
> >>>>
> >>>> In general, -EPROTO is almost always a non-recoverable error.
> >>> Excuse me. I am confused about the above description.
> >>> I got -EPROTO before, when I debugged another issue.
> >>> However, the bulk transfer still worked after I resubmitted
> >>> the transfer. I didn't do anything to recover it. That is why
> >>> I do resubmission for -EPROTO.
> >> I checked the Linux driver and the xHCI spec.
> >> The driver gets -EPROTO for bulk transfer, when the host
> >> returns COMP_USB_TRANSACTION_ERROR.
> >> According to the spec of xHCI, USB TRANSACTION ERROR
> >> means the host did not receive a valid response from the
> >> device (Timeout, CRC, Bad PID, unexpected NYET, etc.).
> > That's right.  If the device and cable are working properly, this 
> > should never happen.  Or only extremely rarely (for example, caused 
> > by external electromagnetic interference).
> And the device. I am afraid the condition in your conditional statement
> is not as likely to be true as would be desirable for quite a lot of setups.

But if the device isn't working, a simple retry is most unlikely to fix 
the problem.  Some form of active error recovery, such as a bus reset, 
will be necessary.  For a non-working cable, even a reset won't help -- 
the user would have to physically adjust or replace the cable.

> >> It seems to be reasonable why resubmission sometimes works.
> > Did you ever track down the reason why you got the -EPROTO error 
> > while debugging that other issue?  Can you reproduce it?
> 
> Is that really the issue though? We are seeing this issue with EPROTO.
> But wouldn't we see it with any recoverable error?

If you mean an error that can be fixed but only by doing something more 
than a simple retry, then yes.  However, the vast majority of USB 
drivers do not attempt anything more than a simple retry.  Relatively 
few of them (including usbhid and mass-storage) are more sophisticated 
in their error handling.

> AFAICT we are running into a situation without progress because drivers
> retry
> 
> * forever
> * immediately
> 
> If we broke any of these conditions the system would proceed and the
> hotplug event would eventually be processed. We may ask whether drivers should
> retry forever, but I don't see that you can blame it on error codes.

It's important to distinguish between:

    1.	errors that are transient and will disappear very quickly,
	meaning that a retry has a good chance of working, and

    2.	errors that are effectively permanent (or at least, long-lived)
	and therefore are highly unlikely to be fixed by retrying.

My point is that there is no reason to retry in case 2, and -EPROTO 
falls into this case (as do -EILSEQ and -ETIME).
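
A minimal user-space sketch of that classification follows.  The helper
name retry_is_pointless() is hypothetical and not a kernel API; only the
three error codes come from the text above.  How any other status should
be treated is not covered here, so the sketch simply reports "not known
to be long-lived" for everything else.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not a kernel API): returns true for the status
 * codes named above as case 2, where a simple resubmission is highly
 * unlikely to help.
 */
static bool retry_is_pointless(int status)
{
	assert(status <= 0);	/* URB-style status: 0 or a negative errno */

	switch (status) {
	case -EPROTO:
	case -EILSEQ:
	case -ETIME:
		return true;
	default:
		/* Assumption: anything else is outside this sketch. */
		return false;
	}
}
```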

Converting drivers to keep track of their retries, to avoid retrying 
forever, would be a fairly large change.  Even implementing delayed 
retries requires some significant work (as you can see in Hayes's recent 
patch -- and that was an easy case because the NAPI infrastructure was 
already present).  It's much simpler to avoid retrying entirely in 
situations where retries won't help.

And it would be even simpler if the USB core automatically prevented 
retries (by failing URB submissions after low-level protocol errors) in 
these situations.

Alan Stern
