netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Paolo Abeni <pabeni@redhat.com>
To: David Thompson <davthompson@nvidia.com>,
	davem@davemloft.net,  edumazet@google.com, kuba@kernel.org,
	u.kleine-koenig@pengutronix.de,  leon@kernel.org
Cc: asmaa@nvidia.com, netdev@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [PATCH net v1] mlxbf_gige: open() should call request_irq() after NAPI init
Date: Tue, 19 Mar 2024 15:32:16 +0100	[thread overview]
Message-ID: <0bfbe50a087725366a47ee36d4778292e19bbee9.camel@redhat.com> (raw)
In-Reply-To: <20240315145609.23950-1-davthompson@nvidia.com>

On Fri, 2024-03-15 at 10:56 -0400, David Thompson wrote:
> This patch fixes an exception that occurs during open()
> when kdump is enabled.  The sequence to reproduce the
> exception is as follows:
> a) enable kdump
> b) trigger kdump via "echo c > /proc/sysrq-trigger"
> c) kdump kernel executes
> d) kdump kernel loads mlxbf_gige module
> e) the mlxbf_gige module runs its open() as the
>    the "oob_net0" interface is brought up
> f) mlxbf_gige module will experience an exception
>    during its open(), something like:
> 
>      Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
>      Mem abort info:
>        ESR = 0x0000000086000004
>        EC = 0x21: IABT (current EL), IL = 32 bits
>        SET = 0, FnV = 0
>        EA = 0, S1PTW = 0
>        FSC = 0x04: level 0 translation fault
>      user pgtable: 4k pages, 48-bit VAs, pgdp=00000000e29a4000
>      [0000000000000000] pgd=0000000000000000, p4d=0000000000000000
>      Internal error: Oops: 0000000086000004 [#1] SMP
>      CPU: 0 PID: 812 Comm: NetworkManager Tainted: G           OE     5.15.0-1035-bluefield #37-Ubuntu
>      Hardware name: https://www.mellanox.com BlueField-3 SmartNIC Main Card/BlueField-3 SmartNIC Main Card, BIOS 4.6.0.13024 Jan 19 2024
>      pstate: 80400009 (Nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
>      pc : 0x0
>      lr : __napi_poll+0x40/0x230
>      sp : ffff800008003e00
>      x29: ffff800008003e00 x28: 0000000000000000 x27: 00000000ffffffff
>      x26: ffff000066027238 x25: ffff00007cedec00 x24: ffff800008003ec8
>      x23: 000000000000012c x22: ffff800008003eb7 x21: 0000000000000000
>      x20: 0000000000000001 x19: ffff000066027238 x18: 0000000000000000
>      x17: ffff578fcb450000 x16: ffffa870b083c7c0 x15: 0000aaab010441d0
>      x14: 0000000000000001 x13: 00726f7272655f65 x12: 6769675f6662786c
>      x11: 0000000000000000 x10: 0000000000000000 x9 : ffffa870b0842398
>      x8 : 0000000000000004 x7 : fe5a48b9069706ea x6 : 17fdb11fc84ae0d2
>      x5 : d94a82549d594f35 x4 : 0000000000000000 x3 : 0000000000400100
>      x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000066027238
>      Call trace:
>       0x0
>       net_rx_action+0x178/0x360
>       __do_softirq+0x15c/0x428
>       __irq_exit_rcu+0xac/0xec
>       irq_exit+0x18/0x2c
>       handle_domain_irq+0x6c/0xa0
>       gic_handle_irq+0xec/0x1b0
>       call_on_irq_stack+0x20/0x2c
>       do_interrupt_handler+0x5c/0x70
>       el1_interrupt+0x30/0x50
>       el1h_64_irq_handler+0x18/0x2c
>       el1h_64_irq+0x7c/0x80
>       __setup_irq+0x4c0/0x950
>       request_threaded_irq+0xf4/0x1bc
>       mlxbf_gige_request_irqs+0x68/0x110 [mlxbf_gige]
>       mlxbf_gige_open+0x5c/0x170 [mlxbf_gige]
>       __dev_open+0x100/0x220
>       __dev_change_flags+0x16c/0x1f0
>       dev_change_flags+0x2c/0x70
>       do_setlink+0x220/0xa40
>       __rtnl_newlink+0x56c/0x8a0
>       rtnl_newlink+0x58/0x84
>       rtnetlink_rcv_msg+0x138/0x3c4
>       netlink_rcv_skb+0x64/0x130
>       rtnetlink_rcv+0x20/0x30
>       netlink_unicast+0x2ec/0x360
>       netlink_sendmsg+0x278/0x490
>       __sock_sendmsg+0x5c/0x6c
>       ____sys_sendmsg+0x290/0x2d4
>       ___sys_sendmsg+0x84/0xd0
>       __sys_sendmsg+0x70/0xd0
>       __arm64_sys_sendmsg+0x2c/0x40
>       invoke_syscall+0x78/0x100
>       el0_svc_common.constprop.0+0x54/0x184
>       do_el0_svc+0x30/0xac
>       el0_svc+0x48/0x160
>       el0t_64_sync_handler+0xa4/0x12c
>       el0t_64_sync+0x1a4/0x1a8
>      Code: bad PC value
>      ---[ end trace 7d1c3f3bf9d81885 ]---
>      Kernel panic - not syncing: Oops: Fatal exception in interrupt
>      Kernel Offset: 0x2870a7a00000 from 0xffff800008000000
>      PHYS_OFFSET: 0x80000000
>      CPU features: 0x0,000005c1,a3332a5a
>      Memory Limit: none
>      ---[ end Kernel panic - not syncing: Oops: Fatal exception in interrupt ]---
> 
> The exception happens because there is a pending RX interrupt before the
> call to request_irq(RX IRQ) executes.  Then, the RX IRQ handler fires
> immediately after this request_irq() completes. The RX IRQ handler runs
> "napi_schedule()" before NAPI is fully initialized via "netif_napi_add()"
> and "napi_enable()", both which happen later in the open() logic.
> 
> This patch fixes the issue by re-ordering the logic in mlxbf_gige open()
> so that the request_irq() calls execute after NAPI is fully initialized.

It also adds a missing phy_stop(phydev) in the error path. It's worth
mentioning in the changelog or possibly place into a separate patch.

Cheers,

Paolo


      parent reply	other threads:[~2024-03-19 14:32 UTC|newest]

Thread overview: 3+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-15 14:56 [PATCH net v1] mlxbf_gige: open() should call request_irq() after NAPI init David Thompson
2024-03-18 11:40 ` Jiri Pirko
2024-03-19 14:32 ` Paolo Abeni [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=0bfbe50a087725366a47ee36d4778292e19bbee9.camel@redhat.com \
    --to=pabeni@redhat.com \
    --cc=asmaa@nvidia.com \
    --cc=davem@davemloft.net \
    --cc=davthompson@nvidia.com \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=u.kleine-koenig@pengutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).