All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
To: Dany Madden <drt@linux.ibm.com>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	netdev@vger.kernel.org, linyunsheng@huawei.com,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Antoine Tenart" <atenart@kernel.org>,
	"Alexander Lobakin" <alobakin@pm.me>,
	"Wei Wang" <weiwan@google.com>, "Taehee Yoo" <ap420073@gmail.com>,
	"Björn Töpel" <bjorn@kernel.org>, "Arnd Bergmann" <arnd@arndb.de>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	"Neil Horman" <nhorman@redhat.com>,
	"Dust Li" <dust.li@linux.alibaba.com>
Subject: Re: [PATCH net v2] napi: fix race inside napi_enable
Date: Thu, 21 Oct 2021 20:16:04 -0700	[thread overview]
Message-ID: <YXIs9GRNtNbl8MkZ@us.ibm.com> (raw)
In-Reply-To: <dc6902364a8f91c4292fe1c5e01b24be@imap.linux.ibm.com>

Dany Madden [drt@linux.ibm.com] wrote:
> 
> We hit two napi related crashes while attempting mtu size change.
> 
> 1st crash:
> [430425.020051] ------------[ cut here ]------------
> [430425.020053] kernel BUG at ../net/core/dev.c:6938!
> [430425.020057] Oops: Exception in kernel mode, sig: 5 [#1]
> [430425.020068] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [430425.020075] Modules linked in: binfmt_misc rpadlpar_io rpaphp xt_tcpudp
> iptable_filter ip_tables x_tables pseries_rng ibmvnic rng_core ibmveth
> vmx_crypto gf128mul fuse btrfs blake2b_generic xor zstd_compress
> lzo_compress raid6_pq dm_service_time crc32c_vpmsum dm_mirror dm_region_hash
> dm_log dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
> [430425.020123] CPU: 0 PID: 34337 Comm: kworker/0:3 Kdump: loaded Tainted: G
> W     5.15.0-rc2-suka-00486-gce916130f5f6 #3
> [430425.020133] Workqueue: events_long __ibmvnic_reset [ibmvnic]
> [430425.020145] NIP: c000000000cb03f4 LR: c0080000014a4ce8 CTR:
> c000000000cb03b0
> [430425.020151] REGS: c00000002e5d37e0 TRAP: 0700  Tainted: G    W
> (5.15.0-rc2-suka-00486-gce916130f5f6)
> [430425.020159] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR:
> 28248428 XER: 20000001
> [430425.020176] CFAR: c0080000014ad9cc IRQMASK: 0
>         GPR00: c0080000014a4ce8 c00000002e5d3a80 c000000001b12100
> c0000001274f3190
>         GPR04: 00000000ffff36dc fffffffffffffff6 0000000000000019
> 0000000000000010
>         GPR08: c00000002ec48900 0000000000000001 c0000001274f31a0
> c0080000014ad9b8
>         GPR12: c000000000cb03b0 c000000001d00000 0000000080000000
> 00000000000003fe
>         GPR16: 00000000000006e3 0000000000000000 0000000000000008
> c00000002ec48af8
>         GPR20: c00000002ec48db0 0000000000000000 0000000000000004
> 0000000000000000
>         GPR24: c00000002ec48000 0000000000000004 c00000002ec49070
> 0000000000000006
>         GPR28: c00000002ec48900 c00000002ec48900 0000000000000002
> c00000002ec48000
> [430425.020248] NIP [c000000000cb03f4] napi_enable+0x44/0xc0
> [430425.020257] LR [c0080000014a4ce8] __ibmvnic_open+0xf0/0x440 [ibmvnic]
> [430425.020265] Call Trace:
> [430425.020269] [c00000002e5d3a80] [c00000002ec48900] 0xc00000002ec48900
> (unreliable)
> [430425.020277] [c00000002e5d3ab0] [c0080000014a4f40]
> __ibmvnic_open+0x348/0x440 [ibmvnic]
> [430425.020286] [c00000002e5d3b40] [c0080000014ace58]
> __ibmvnic_reset+0xb10/0xe40 [ibmvnic]
> [430425.020296] [c00000002e5d3c60] [c0000000001673a4]
> process_one_work+0x2d4/0x5d0
> [430425.020305] [c00000002e5d3d00] [c000000000167718]
> worker_thread+0x78/0x6c0
> [430425.020314] [c00000002e5d3da0] [c000000000173388] kthread+0x188/0x190
> [430425.020322] [c00000002e5d3e10] [c00000000000cee4]
> ret_from_kernel_thread+0x5c/0x64
> [430425.020331] Instruction dump:
> [430425.020335] 38a0fff6 39430010 e92d0c80 f9210028 39200000 60000000
> 60000000 e9030010
> [430425.020348] f9010020 e9210020 7d2948f8 792907e0 <0b090000> e9230038
> 7d072838 89290889
> [430425.020364] ---[ end trace 3abb5ec5589518ca ]---
> [430425.068100]
> [430425.068108] Sending IPI to other CPUs
> [430425.068206] IPI complete
> [430425.090333] kexec: Starting switchover sequence.

Jakub,

We hit this napi_enable() BUG_ON() crash three times this week. In two
of those times it appears that

	napi->state = netdev_priv(netdev)

i.e it contains ibmvnic_adapter* in our case.

	# Crash was on eth3

	crash> net |grep eth3
	c00000002e948000  eth3   10.1.194.173

	crash> net_device |grep SIZE
	SIZE: 2304

	crash> px 2304
	$1 = 0x900

	crash> ibmvnic_adapter c00000002e948900 |grep napi
	  napi = 0xc00000003b7dc000,
	  num_active_rx_napi = 8,
	  napi_enabled = false,

	crash> napi_struct 0xc00000003b7dc000 |grep state
	  state = 13835058056063650048,
	    state = 0 '\000',

	crash> px 13835058056063650048
	$2 = 0xc00000002e948900		#eth3 ibmvnic_adapter above

In the third case napi->state was 16 (i.e NAPI_STATE_SCHED was clear and
hence the bug in napi_enable()).

Let us know if any other fields are of interest. Do we have any clues on
when this started?

Sukadev

  parent reply	other threads:[~2021-10-22  3:16 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-09-18  8:52 [PATCH net v2] napi: fix race inside napi_enable Xuan Zhuo
2021-09-20  8:50 ` patchwork-bot+netdevbpf
2021-09-20 19:20 ` Jakub Kicinski
2021-09-22  6:47   ` Xuan Zhuo
2021-09-23 13:14     ` Jakub Kicinski
     [not found]       ` <1632404456.506512-1-xuanzhuo@linux.alibaba.com>
2021-09-23 14:54         ` Jakub Kicinski
2021-10-18 21:58 ` Sukadev Bhattiprolu
2021-10-18 22:55   ` Jakub Kicinski
2021-10-18 23:36     ` Dany Madden
2021-10-18 23:47       ` Jakub Kicinski
2021-10-19  0:01         ` Sukadev Bhattiprolu
2021-10-22  3:16       ` Sukadev Bhattiprolu [this message]
2021-10-25 17:36         ` Jakub Kicinski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YXIs9GRNtNbl8MkZ@us.ibm.com \
    --to=sukadev@linux.ibm.com \
    --cc=alobakin@pm.me \
    --cc=ap420073@gmail.com \
    --cc=arnd@arndb.de \
    --cc=atenart@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=drt@linux.ibm.com \
    --cc=dust.li@linux.alibaba.com \
    --cc=edumazet@google.com \
    --cc=kuba@kernel.org \
    --cc=linyunsheng@huawei.com \
    --cc=memxor@gmail.com \
    --cc=netdev@vger.kernel.org \
    --cc=nhorman@redhat.com \
    --cc=weiwan@google.com \
    --cc=xuanzhuo@linux.alibaba.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.