All of lore.kernel.org
 help / color / mirror / Atom feed
From: Pavan Chebbi <pavan.chebbi@broadcom.com>
To: David Christensen <drc@linux.vnet.ibm.com>
Cc: netdev@vger.kernel.org,
	Siva Reddy Kallam <siva.kallam@broadcom.com>,
	Prashant Sreedharan <prashant@broadcom.com>,
	Michael Chan <mchan@broadcom.com>
Subject: Re: [PATCH] net/tg3: resolve deadlock in tg3_reset_task() during EEH
Date: Tue, 24 Jan 2023 12:55:22 +0530	[thread overview]
Message-ID: <CALs4sv1OYthkDYBbj9i-jytWo7VZ2rL9VcHiWP55svVX8R20RQ@mail.gmail.com> (raw)
In-Reply-To: <20230124003107.214307-1-drc@linux.vnet.ibm.com>

[-- Attachment #1: Type: text/plain, Size: 4436 bytes --]

On Tue, Jan 24, 2023 at 6:01 AM David Christensen
<drc@linux.vnet.ibm.com> wrote:
>
> During EEH error injection testing, a deadlock was encountered in the tg3
> driver when tg3_io_error_detected() was attempting to cancel outstanding
> reset tasks:
>
> crash> foreach UN bt
> ...
> PID: 159    TASK: c0000000067c6000  CPU: 8   COMMAND: "eehd"
> ...
>  #5 [c00000000681f990] __cancel_work_timer at c00000000019fd18
>  #6 [c00000000681fa30] tg3_io_error_detected at c00800000295f098 [tg3]
>  #7 [c00000000681faf0] eeh_report_error at c00000000004e25c
> ...
>
> PID: 290    TASK: c000000036e5f800  CPU: 6   COMMAND: "kworker/6:1"
> ...
>  #4 [c00000003721fbc0] rtnl_lock at c000000000c940d8
>  #5 [c00000003721fbe0] tg3_reset_task at c008000002969358 [tg3]
>  #6 [c00000003721fc60] process_one_work at c00000000019e5c4
> ...
>
> PID: 296    TASK: c000000037a65800  CPU: 21  COMMAND: "kworker/21:1"
> ...
>  #4 [c000000037247bc0] rtnl_lock at c000000000c940d8
>  #5 [c000000037247be0] tg3_reset_task at c008000002969358 [tg3]
>  #6 [c000000037247c60] process_one_work at c00000000019e5c4
> ...
>
> PID: 655    TASK: c000000036f49000  CPU: 16  COMMAND: "kworker/16:2"
> ...:1
>
>  #4 [c0000000373ebbc0] rtnl_lock at c000000000c940d8
>  #5 [c0000000373ebbe0] tg3_reset_task at c008000002969358 [tg3]
>  #6 [c0000000373ebc60] process_one_work at c00000000019e5c4
> ...
>
> Code inspection shows that both tg3_io_error_detected() and
> tg3_reset_task() attempt to acquire the RTNL lock at the beginning of
> their code blocks.  If tg3_reset_task() should happen to execute between
> the times when tg3_io_error_deteced() acquires the RTNL lock and
> tg3_reset_task_cancel() is called, a deadlock will occur.
>
> Moving tg3_reset_task_cancel() call earlier within the code block, prior
> to acquiring RTNL, prevents this from happening, but also exposes another
> deadlock issue where tg3_reset_task() may execute AFTER
> tg3_io_error_detected() has executed:
>
> crash> foreach UN bt
> PID: 159    TASK: c0000000067d2000  CPU: 9   COMMAND: "eehd"
> ...
>  #4 [c000000006867a60] rtnl_lock at c000000000c940d8
>  #5 [c000000006867a80] tg3_io_slot_reset at c0080000026c2ea8 [tg3]
>  #6 [c000000006867b00] eeh_report_reset at c00000000004de88
> ...
> PID: 363    TASK: c000000037564000  CPU: 6   COMMAND: "kworker/6:1"
> ...
>  #3 [c000000036c1bb70] msleep at c000000000259e6c
>  #4 [c000000036c1bba0] napi_disable at c000000000c6b848
>  #5 [c000000036c1bbe0] tg3_reset_task at c0080000026d942c [tg3]
>  #6 [c000000036c1bc60] process_one_work at c00000000019e5c4
> ...
>
> This issue can be avoided by aborting tg3_reset_task() if EEH error
> recovery is already in progress.
>
> Signed-off-by: David Christensen <drc@linux.vnet.ibm.com>
> Cc: Siva Reddy Kallam <siva.kallam@broadcom.com>
> Cc: Prashant Sreedharan <prashant@broadcom.com>
> Cc: Michael Chan <mchan@broadcom.com>
> ---
>  drivers/net/ethernet/broadcom/tg3.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/net/ethernet/broadcom/tg3.c b/drivers/net/ethernet/broadcom/tg3.c
> index 59debdc344a5..ee4604e6900e 100644
> --- a/drivers/net/ethernet/broadcom/tg3.c
> +++ b/drivers/net/ethernet/broadcom/tg3.c
> @@ -11166,7 +11166,8 @@ static void tg3_reset_task(struct work_struct *work)
>         rtnl_lock();
>         tg3_full_lock(tp, 0);
>
> -       if (!netif_running(tp->dev)) {
> +       // Skip reset task if no netdev or already in PCI recovery
> +       if (!tp->dev || tp->pcierr_recovery || !netif_running(tp->dev)) {

Thanks for the patch. Can we not use netif_device_present() here?

>                 tg3_flag_clear(tp, RESET_TASK_PENDING);
>                 tg3_full_unlock(tp);
>                 rtnl_unlock();
> @@ -18101,6 +18102,9 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
>
>         netdev_info(netdev, "PCI I/O error detected\n");
>
> +       /* Want to make sure that the reset task doesn't run */
> +       tg3_reset_task_cancel(tp);
> +
>         rtnl_lock();
>
>         /* Could be second call or maybe we don't have netdev yet */
> @@ -18117,9 +18121,6 @@ static pci_ers_result_t tg3_io_error_detected(struct pci_dev *pdev,
>
>         tg3_timer_stop(tp);
>
> -       /* Want to make sure that the reset task doesn't run */
> -       tg3_reset_task_cancel(tp);
> -
>         netif_device_detach(netdev);
>
>         /* Clean up software state, even if MMIO is blocked */
> --
> 2.31.1
>

[-- Attachment #2: S/MIME Cryptographic Signature --]
[-- Type: application/pkcs7-signature, Size: 4209 bytes --]

  reply	other threads:[~2023-01-24  7:25 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-24  0:31 [PATCH] net/tg3: resolve deadlock in tg3_reset_task() during EEH David Christensen
2023-01-24  7:25 ` Pavan Chebbi [this message]
2023-01-24  8:43   ` Michael Chan
2023-01-24  9:12     ` Pavan Chebbi
2023-01-24 18:53 ` [PATCH v2] " David Christensen
2023-01-25  8:33   ` Pavan Chebbi
2023-01-26  6:50   ` patchwork-bot+netdevbpf

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CALs4sv1OYthkDYBbj9i-jytWo7VZ2rL9VcHiWP55svVX8R20RQ@mail.gmail.com \
    --to=pavan.chebbi@broadcom.com \
    --cc=drc@linux.vnet.ibm.com \
    --cc=mchan@broadcom.com \
    --cc=netdev@vger.kernel.org \
    --cc=prashant@broadcom.com \
    --cc=siva.kallam@broadcom.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.