From: fengchengwen <fengchengwen@huawei.com>
To: Andrew Rybchenko <andrew.rybchenko@oktetlabs.ru>,
	<thomas@monjalon.net>, <ferruh.yigit@xilinx.com>,
	<ferruh.yigit@amd.com>
Cc: <dev@dpdk.org>, <kalesh-anakkur.purayil@broadcom.com>,
	<somnath.kotur@broadcom.com>, <ajit.khaparde@broadcom.com>,
	<mdr@ashroe.eu>
Subject: Re: [PATCH v11 2/5] ethdev: support proactive error handling mode
Date: Tue, 11 Oct 2022 22:48:37 +0800
Message-ID: <fcad90db-443f-bc22-16ae-30112c61cc9f@huawei.com>
In-Reply-To: <15fd413f-5759-2509-3d4c-35c3a2e5b2b8@oktetlabs.ru>

Hi Andrew,

On 2022/10/10 16:47, Andrew Rybchenko wrote:
> On 10/9/22 12:10, Chengwen Feng wrote:
>> From: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>>
>> Some PMDs (e.g. hns3) could detect hardware or firmware errors, and try
>> to recover from the errors. In this process, the PMD sets the data path
>> pointers to dummy functions (which will prevent the crash), and also
>> make sure the control path operations failed with retcode -EBUSY.
>
> Could you explain why passive mode is not good. Why is
> proactive better? What are the benefits? IMHO, it would
> be simpler to have just one error recovery mode.


I don't think either mode is inherently good or bad. To a large extent,
the choice is determined by the hardware and software design of the
network card chip. Take the hns3 driver as an example:

During error recovery, multiple handshakes are required between the
driver and the firmware, and the handshakes are subject to timeouts.

If passive mode is chosen, the application may not register the
callback (we found that, among many DPDK-based open source projects,
only ovs-dpdk registers for the reset event), so the recovery will
fail. Furthermore, even if the callback is registered, the recovery
process involves multiple handshakes, which may take too long to
complete. Imagine multiple ports reporting a reset at the same time.
(This can happen: consider a PF reset with multiple VFs under that PF.)
In this case many VFs report the event, but the event callbacks are
executed sequentially (because there is only one interrupt thread). As
a result, the later VFs cannot be processed in time, and their reset
may fail.
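
To make the sequential-callback problem concrete, here is a minimal
sketch. It assumes all ports report the event at the same instant, every
callback takes the same time, and the timeout is counted from the
report. The function name and all numbers are hypothetical and are not
taken from hns3.

```c
#include <stdint.h>

/*
 * Count how many of n_ports finish after timeout_ms when their recovery
 * callbacks run one after another on a single thread.
 * Illustrative model only: per_port_ms and timeout_ms are assumptions.
 */
unsigned int
count_late_ports(unsigned int n_ports, uint64_t per_port_ms,
		 uint64_t timeout_ms)
{
	unsigned int late = 0;
	uint64_t finish_ms = 0;

	for (unsigned int i = 0; i < n_ports; i++) {
		/* Port i completes only after all earlier ports are done. */
		finish_ms += per_port_ms;
		if (finish_ms > timeout_ms)
			late++;
	}
	return late;
}
```

For example, with 8 VFs, 100 ms per callback and a 500 ms timeout, the
callbacks finish at 100, 200, ..., 800 ms, so the last three VFs miss
the timeout.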


In conclusion, proactive mode is a viable troubleshooting method in
engineering practice.


>>
>> The above error handling mode is known as
>> RTE_ETH_ERROR_HANDLE_MODE_PROACTIVE (proactive error handling mode).
>>
>> In some service scenarios, application needs to be aware of the event
>> to determine whether to migrate services. So three events were
>> introduced:
>>
>> 1) RTE_ETH_EVENT_ERR_RECOVERING: used to notify the application that it
>> detected an error and the recovery is being started. Upon receiving the
>> event, the application should not invoke any control path APIs until
>> receiving RTE_ETH_EVENT_RECOVERY_SUCCESS or
>> RTE_ETH_EVENT_RECOVERY_FAILED event.
>>
>> 2) RTE_ETH_EVENT_RECOVERY_SUCCESS: used to notify the application that
>> it recovers successful from the error, the PMD already re-configures the
>> port, and the effect is the same as that of the restart operation.
>>
>> 3) RTE_ETH_EVENT_RECOVERY_FAILED: used to notify the application that it
>> recovers failed from the error, the port should not usable anymore. The
>> application should close the port.
>>
>> Signed-off-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
>> Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
>> Signed-off-by: Chengwen Feng <fengchengwen@huawei.com>
>> Reviewed-by: Ajit Khaparde <ajit.khaparde@broadcom.com>
>
> The code itself LGTM. I just want to understand why we need it.
> It should be proved in the description.
>
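
For reference, the three events quoted above could be tracked on the
application side roughly as follows. This is only a sketch: the enum
values are local stand-ins for the RTE_ETH_EVENT_* names, the port
states are hypothetical, and treating a failed port as permanently
unusable until it is closed is my assumption. In a real application the
handler would be a callback registered with
rte_eth_dev_callback_register().

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for the three events named in the commit message. */
enum recovery_event {
	EVENT_ERR_RECOVERING,
	EVENT_RECOVERY_SUCCESS,
	EVENT_RECOVERY_FAILED,
};

/* Hypothetical per-port state kept by the application. */
enum port_state {
	PORT_READY,      /* control path APIs may be called */
	PORT_RECOVERING, /* PMD is recovering: hold off control path calls */
	PORT_DEAD,       /* recovery failed: the port should be closed */
};

/* Advance the application's view of a port when an event arrives. */
enum port_state
port_on_event(enum port_state cur, enum recovery_event ev)
{
	/* Assumption: a failed port stays unusable until it is closed. */
	if (cur == PORT_DEAD)
		return PORT_DEAD;

	switch (ev) {
	case EVENT_ERR_RECOVERING:
		return PORT_RECOVERING;
	case EVENT_RECOVERY_SUCCESS:
		return PORT_READY;
	case EVENT_RECOVERY_FAILED:
		return PORT_DEAD;
	}
	assert(!"unknown recovery event");
	return cur;
}

/* Control path calls are only safe while the port is ready. */
bool
port_control_path_allowed(enum port_state s)
{
	return s == PORT_READY;
}
```

The application would check port_control_path_allowed() before calling
any control path API, and close the port once it reaches PORT_DEAD.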

