* [RFC] Applicability of using 'txq_trans_update' during ring recovery
@ 2022-04-12 17:01 Ray Jui
  2022-04-12 17:37 ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Jui @ 2022-04-12 17:01 UTC (permalink / raw)
  To: David S. Miller, Jakub Kicinski; +Cc: netdev


Hi David/Jakub,

I'd like to run an idea by you: invoking 'txq_trans_update' to update
the last TX timestamp in the scenario where we temporarily stop a TX
queue to do some recovery work. Is that considered an acceptable
approach to prevent false-positive TX timeouts during the recovery
process?

I know that in general people use 'netif_carrier_off' while they
reset/change the entire TX/RX ring set and/or other resources on the
Ethernet card. But in our particular case, we have another driver
(i.e., RoCE) running on top, and calling 'netif_carrier_off' has a
significant side effect on that driver (e.g., all RoCE QPs will be
terminated). In addition, this special recovery work in our driver is
done on a per NAPI ring set basis while traffic keeps running on the
other queues. Using 'netif_carrier_off' would stop traffic on all the
other queues that are not going through recovery.
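
To make the proposal concrete, here is a rough, untested sketch of the
kind of per-queue flow I have in mind ('example_recover_tx_ring' and
'qid' are made-up names, the recovery step is a placeholder, and this
assumes the single-argument txq_trans_update(txq) signature):

#include <linux/netdevice.h>

static void example_recover_tx_ring(struct net_device *dev, unsigned int qid)
{
	struct netdev_queue *txq = netdev_get_tx_queue(dev, qid);

	netif_tx_stop_queue(txq);	/* stop only this TX queue */

	/* driver-specific per-ring recovery would go here (~1-2 ms) */

	/* refresh trans_start so dev_watchdog does not see a stale
	 * timestamp on the stopped queue and report a false TX timeout
	 */
	__netif_tx_lock_bh(txq);
	txq_trans_update(txq);
	__netif_tx_unlock_bh(txq);

	netif_tx_wake_queue(txq);	/* resume traffic on this queue */
}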

Thanks,

Ray



* Re: [RFC] Applicability of using 'txq_trans_update' during ring recovery
  2022-04-12 17:01 [RFC] Applicability of using 'txq_trans_update' during ring recovery Ray Jui
@ 2022-04-12 17:37 ` Jakub Kicinski
  2022-04-12 18:08   ` Ray Jui
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2022-04-12 17:37 UTC (permalink / raw)
  To: Ray Jui; +Cc: David S. Miller, netdev

On Tue, 12 Apr 2022 10:01:02 -0700 Ray Jui wrote:
> Hi David/Jakub,
> 
> I'd like to run an idea by you: invoking 'txq_trans_update' to update
> the last TX timestamp in the scenario where we temporarily stop a TX
> queue to do some recovery work. Is that considered an acceptable
> approach to prevent false-positive TX timeouts during the recovery
> process?
> 
> I know that in general people use 'netif_carrier_off' while they
> reset/change the entire TX/RX ring set and/or other resources on the
> Ethernet card. But in our particular case, we have another driver
> (i.e., RoCE) running on top, and calling 'netif_carrier_off' has a
> significant side effect on that driver (e.g., all RoCE QPs will be
> terminated). In addition, this special recovery work in our driver is
> done on a per NAPI ring set basis while traffic keeps running on the
> other queues. Using 'netif_carrier_off' would stop traffic on all the
> other queues that are not going through recovery.

Can you use netif_device_detach() to mark the device as not present?


* Re: [RFC] Applicability of using 'txq_trans_update' during ring recovery
  2022-04-12 17:37 ` Jakub Kicinski
@ 2022-04-12 18:08   ` Ray Jui
  2022-04-12 18:24     ` Michael Chan
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Jui @ 2022-04-12 18:08 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: David S. Miller, netdev


Hi Jakub,

On 4/12/2022 10:37 AM, Jakub Kicinski wrote:
> On Tue, 12 Apr 2022 10:01:02 -0700 Ray Jui wrote:
>> Hi David/Jakub,
>>
>> I'd like to run an idea by you: invoking 'txq_trans_update' to update
>> the last TX timestamp in the scenario where we temporarily stop a TX
>> queue to do some recovery work. Is that considered an acceptable
>> approach to prevent false-positive TX timeouts during the recovery
>> process?
>>
>> I know that in general people use 'netif_carrier_off' while they
>> reset/change the entire TX/RX ring set and/or other resources on the
>> Ethernet card. But in our particular case, we have another driver
>> (i.e., RoCE) running on top, and calling 'netif_carrier_off' has a
>> significant side effect on that driver (e.g., all RoCE QPs will be
>> terminated). In addition, this special recovery work in our driver is
>> done on a per NAPI ring set basis while traffic keeps running on the
>> other queues. Using 'netif_carrier_off' would stop traffic on all the
>> other queues that are not going through recovery.
> 
> Can you use netif_device_detach() to mark the device as not present?

It seems 'netif_device_detach' marks the netdev as no longer present
(by clearing __LINK_STATE_PRESENT) and stops all TX queues.
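
For reference, the core of netif_device_detach() in net/core/dev.c is
roughly the following (paraphrased; details may vary between kernel
versions):

void netif_device_detach(struct net_device *dev)
{
	if (test_and_clear_bit(__LINK_STATE_PRESENT, &dev->state) &&
	    netif_running(dev)) {
		netif_tx_stop_all_queues(dev);	/* stops every TX queue */
	}
}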

It also seems the core infiniband subsystem mainly relies on
'netif_carrier_ok' and 'netif_running', so 'netif_device_detach' might
potentially work. I also need to check with our internal RoCE driver
team to confirm.
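
For example, a RoCE driver's netdev event handler might map the netdev
state to an IB port event along these lines (illustrative sketch only;
'ibdev' and 'netdev' are placeholders and the exact logic differs per
driver):

	struct ib_event ev = { .device = ibdev, .element.port_num = 1 };

	if (netif_running(netdev) && netif_carrier_ok(netdev))
		ev.event = IB_EVENT_PORT_ACTIVE;
	else
		ev.event = IB_EVENT_PORT_ERR;	/* carrier off -> port error */
	ib_dispatch_event(&ev);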

One drawback of 'netif_device_detach' compared to the current solution
is that we would have to stop all TX queues for the entire duration of
the recovery process (instead of only the queues of the NAPI ring set
being recovered).

Can you please also comment on whether 'txq_trans_update' is considered
an acceptable approach in this particular scenario? And if not, is there
another mechanism in the kernel net subsystem that allows one to quiesce
traffic on a per NAPI ring set basis?

Thanks,

Ray



* Re: [RFC] Applicability of using 'txq_trans_update' during ring recovery
  2022-04-12 18:08   ` Ray Jui
@ 2022-04-12 18:24     ` Michael Chan
  2022-04-12 18:36       ` Ray Jui
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Chan @ 2022-04-12 18:24 UTC (permalink / raw)
  To: Ray Jui; +Cc: Jakub Kicinski, David S. Miller, Netdev


On Tue, Apr 12, 2022 at 11:08 AM Ray Jui <ray.jui@broadcom.com> wrote:

> Can you please also comment on whether 'txq_trans_update' is considered
> an acceptable approach in this particular scenario?

In my opinion, updating trans_start to the current jiffies to prevent
TX timeout is not a good solution.  It just buys you the arbitrary TX
timeout period before the next TX timeout.  If you take more than this
time to restart the TX queue, you will still get TX timeout.
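
For context, the check that triggers the TX timeout lives in
dev_watchdog() in net/sched/sch_generic.c and looks roughly like this
(heavily simplified paraphrase; details vary by kernel version):

	/* dev_watchdog() only checks at all while the device is present,
	 * running and has carrier -- detach or carrier-off suppress it
	 */
	if (netif_device_present(dev) && netif_running(dev) &&
	    netif_carrier_ok(dev)) {
		for (i = 0; i < dev->num_tx_queues; i++) {
			struct netdev_queue *txq = netdev_get_tx_queue(dev, i);
			unsigned long trans_start = READ_ONCE(txq->trans_start);

			if (netif_xmit_stopped(txq) &&
			    time_after(jiffies,
				       trans_start + dev->watchdog_timeo)) {
				/* -> ndo_tx_timeout() and the splat */
			}
		}
	}

So refreshing trans_start only pushes that check out by one
watchdog_timeo interval.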



* Re: [RFC] Applicability of using 'txq_trans_update' during ring recovery
  2022-04-12 18:24     ` Michael Chan
@ 2022-04-12 18:36       ` Ray Jui
  2022-04-12 19:19         ` Michael Chan
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Jui @ 2022-04-12 18:36 UTC (permalink / raw)
  To: Michael Chan; +Cc: Jakub Kicinski, David S. Miller, Netdev




On 4/12/22 11:24, Michael Chan wrote:
> On Tue, Apr 12, 2022 at 11:08 AM Ray Jui <ray.jui@broadcom.com> wrote:
> 
>> Can you please also comment on whether 'txq_trans_update' is considered
>> an acceptable approach in this particular scenario?
> 
> In my opinion, updating trans_start to the current jiffies to prevent
> TX timeout is not a good solution.  It just buys you the arbitrary TX
> timeout period before the next TX timeout.  If you take more than this
> time to restart the TX queue, you will still get TX timeout.

However, one can argue that the recovery work is expected to finish in
much less time than the (arbitrary) TX timeout period. If recovery of
the particular NAPI ring set takes longer than that period, then
something is genuinely wrong and we really should report a TX timeout.



* Re: [RFC] Applicability of using 'txq_trans_update' during ring recovery
  2022-04-12 18:36       ` Ray Jui
@ 2022-04-12 19:19         ` Michael Chan
  2022-04-12 19:34           ` Ray Jui
  0 siblings, 1 reply; 9+ messages in thread
From: Michael Chan @ 2022-04-12 19:19 UTC (permalink / raw)
  To: Ray Jui; +Cc: Jakub Kicinski, David S. Miller, Netdev


On Tue, Apr 12, 2022 at 11:36 AM Ray Jui <ray.jui@broadcom.com> wrote:
> On 4/12/22 11:24, Michael Chan wrote:
> > On Tue, Apr 12, 2022 at 11:08 AM Ray Jui <ray.jui@broadcom.com> wrote:
> >
> >> Can you please also comment on whether 'txq_trans_update' is considered
> >> an acceptable approach in this particular scenario?
> >
> > In my opinion, updating trans_start to the current jiffies to prevent
> > TX timeout is not a good solution.  It just buys you the arbitrary TX
> > timeout period before the next TX timeout.  If you take more than this
> > time to restart the TX queue, you will still get TX timeout.
>
> However, one can argue that the recovery work is expected to finish in
> much less time than the (arbitrary) TX timeout period. If recovery of
> the particular NAPI ring set takes longer than that period, then
> something is genuinely wrong and we really should report a TX timeout.

Even if it should work in a specific case, you are still expanding the
definition of TX timeout to be no shorter than this specific recovery
time.

Our general error recovery time that includes firmware and chip reset
can take longer than the TX timeout period.  And we call
netif_carrier_off() for the whole duration.
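
As a rough sketch of that path (simplified; the reset step is a
placeholder for the firmware/chip recovery work):

	netif_carrier_off(dev);		/* upper layers, incl. RoCE, see link down */
	netif_tx_disable(dev);		/* stop every TX queue */

	/* firmware + chip reset, ring teardown/realloc -- may exceed
	 * watchdog_timeo, but with carrier off dev_watchdog stays quiet
	 */

	netif_tx_wake_all_queues(dev);
	netif_carrier_on(dev);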



* Re: [RFC] Applicability of using 'txq_trans_update' during ring recovery
  2022-04-12 19:19         ` Michael Chan
@ 2022-04-12 19:34           ` Ray Jui
  2022-04-12 21:49             ` Jakub Kicinski
  0 siblings, 1 reply; 9+ messages in thread
From: Ray Jui @ 2022-04-12 19:34 UTC (permalink / raw)
  To: Michael Chan; +Cc: Jakub Kicinski, David S. Miller, Netdev




On 4/12/2022 12:19 PM, Michael Chan wrote:
> On Tue, Apr 12, 2022 at 11:36 AM Ray Jui <ray.jui@broadcom.com> wrote:
>> On 4/12/22 11:24, Michael Chan wrote:
>>> On Tue, Apr 12, 2022 at 11:08 AM Ray Jui <ray.jui@broadcom.com> wrote:
>>>
>>>> Can you please also comment on whether 'txq_trans_update' is considered
>>>> an acceptable approach in this particular scenario?
>>>
>>> In my opinion, updating trans_start to the current jiffies to prevent
>>> TX timeout is not a good solution.  It just buys you the arbitrary TX
>>> timeout period before the next TX timeout.  If you take more than this
>>> time to restart the TX queue, you will still get TX timeout.
>>
>> However, one can argue that the recovery work is expected to finish in
>> much less time than the (arbitrary) TX timeout period. If recovery of
>> the particular NAPI ring set takes longer than that period, then
>> something is genuinely wrong and we really should report a TX timeout.
> 
> Even if it should work in a specific case, you are still expanding the
> definition of TX timeout to be no shorter than this specific recovery
> time.
> 
> Our general error recovery time that includes firmware and chip reset
> can take longer than the TX timeout period.  And we call
> netif_carrier_off() for the whole duration.

Sure, but that is the general error recovery case, which is very
different from the specific recovery case we are discussing here. This
specific recovery is performed solely by the driver (without resetting
firmware or the chip) on a per NAPI ring set basis. While a specific
NAPI ring set is being recovered, traffic keeps flowing on the rest of
the NAPI ring sets. The average recovery time is in the 1 - 2 ms range
for this type of recovery.

Also, as I already said, 'netif_carrier_off' is not an option given
that the RoCE/infiniband subsystem depends on the carrier state (i.e.,
'netif_carrier_ok') for many of its operations.

Basically I'm looking for a solution that allows one to:
1) quiesce traffic and perform recovery on a per NAPI ring set basis
2) avoid any drastic effect on RoCE during the recovery

'txq_trans_update' may not be the most optimal solution, but it is a
solution that satisfies the two requirements above. If there is any
other option that is considered more optimal than 'txq_trans_update'
and can satisfy the two requirements, please let me know.

Thanks.





* Re: [RFC] Applicability of using 'txq_trans_update' during ring recovery
  2022-04-12 19:34           ` Ray Jui
@ 2022-04-12 21:49             ` Jakub Kicinski
  2022-04-12 22:21               ` Ray Jui
  0 siblings, 1 reply; 9+ messages in thread
From: Jakub Kicinski @ 2022-04-12 21:49 UTC (permalink / raw)
  To: Ray Jui; +Cc: Michael Chan, David S. Miller, Netdev

On Tue, 12 Apr 2022 12:34:23 -0700 Ray Jui wrote:
> Sure, but that is the general error recovery case, which is very
> different from the specific recovery case we are discussing here. This
> specific recovery is performed solely by the driver (without resetting
> firmware or the chip) on a per NAPI ring set basis. While a specific
> NAPI ring set is being recovered, traffic keeps flowing on the rest of
> the NAPI ring sets. The average recovery time is in the 1 - 2 ms range
> for this type of recovery.
> 
> Also, as I already said, 'netif_carrier_off' is not an option given
> that the RoCE/infiniband subsystem depends on the carrier state (i.e.,
> 'netif_carrier_ok') for many of its operations.
> 
> Basically I'm looking for a solution that allows one to:
> 1) quiesce traffic and perform recovery on a per NAPI ring set basis
> 2) avoid any drastic effect on RoCE during the recovery
> 
> 'txq_trans_update' may not be the most optimal solution, but it is a
> solution that satisfies the two requirements above. If there is any
> other option that is considered more optimal than 'txq_trans_update'
> and can satisfy the two requirements, please let me know.

The optimal solution would be to not have to reset your rings and
pretend like nothing happened :/ If you can't reset the ring in time
you'll have to live with the splat. End of story.


* Re: [RFC] Applicability of using 'txq_trans_update' during ring recovery
  2022-04-12 21:49             ` Jakub Kicinski
@ 2022-04-12 22:21               ` Ray Jui
  0 siblings, 0 replies; 9+ messages in thread
From: Ray Jui @ 2022-04-12 22:21 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Michael Chan, David S. Miller, Netdev




On 4/12/2022 2:49 PM, Jakub Kicinski wrote:
> On Tue, 12 Apr 2022 12:34:23 -0700 Ray Jui wrote:
>> Sure, but that is the general error recovery case, which is very
>> different from the specific recovery case we are discussing here. This
>> specific recovery is performed solely by the driver (without resetting
>> firmware or the chip) on a per NAPI ring set basis. While a specific
>> NAPI ring set is being recovered, traffic keeps flowing on the rest of
>> the NAPI ring sets. The average recovery time is in the 1 - 2 ms range
>> for this type of recovery.
>>
>> Also, as I already said, 'netif_carrier_off' is not an option given
>> that the RoCE/infiniband subsystem depends on the carrier state (i.e.,
>> 'netif_carrier_ok') for many of its operations.
>>
>> Basically I'm looking for a solution that allows one to:
>> 1) quiesce traffic and perform recovery on a per NAPI ring set basis
>> 2) avoid any drastic effect on RoCE during the recovery
>>
>> 'txq_trans_update' may not be the most optimal solution, but it is a
>> solution that satisfies the two requirements above. If there is any
>> other option that is considered more optimal than 'txq_trans_update'
>> and can satisfy the two requirements, please let me know.
> 
> The optimal solution would be to not have to reset your rings and
> pretend like nothing happened :/

Yes, I wish we had more robust HW so we didn't need to deal with this
in SW. Unfortunately that is not my choice.

> If you can't reset the ring in time
> you'll have to live with the splat. End of story.

But the splat is not caused by the fact that we cannot recover in time;
instead, it is caused by the fact that there was no activity on the TX
queue for some time, and when we then stop the individual TX queue
(without marking the device down or detached at the netif level), a TX
timeout can be falsely triggered.

Basically it sounds like we currently do not have a good way in the
kernel to stop and re-start TX queues on a per NAPI ring set basis
(whether we need it or not can be a separate discussion). Is this
statement correct?

Thanks!


