From: Sukadev Bhattiprolu <sukadev@linux.ibm.com>
To: Dany Madden <drt@linux.ibm.com>
Cc: "Jakub Kicinski" <kuba@kernel.org>,
	"Xuan Zhuo" <xuanzhuo@linux.alibaba.com>,
	netdev@vger.kernel.org, linyunsheng@huawei.com,
	"David S. Miller" <davem@davemloft.net>,
	"Eric Dumazet" <edumazet@google.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Antoine Tenart" <atenart@kernel.org>,
	"Alexander Lobakin" <alobakin@pm.me>,
	"Wei Wang" <weiwan@google.com>, "Taehee Yoo" <ap420073@gmail.com>,
	"Björn Töpel" <bjorn@kernel.org>, "Arnd Bergmann" <arnd@arndb.de>,
	"Kumar Kartikeya Dwivedi" <memxor@gmail.com>,
	"Neil Horman" <nhorman@redhat.com>,
	"Dust Li" <dust.li@linux.alibaba.com>
Subject: Re: [PATCH net v2] napi: fix race inside napi_enable
Date: Thu, 21 Oct 2021 20:16:04 -0700
Message-ID: <YXIs9GRNtNbl8MkZ@us.ibm.com>
In-Reply-To: <dc6902364a8f91c4292fe1c5e01b24be@imap.linux.ibm.com>

Dany Madden [drt@linux.ibm.com] wrote:
> 
> We hit two napi related crashes while attempting mtu size change.
> 
> 1st crash:
> [430425.020051] ------------[ cut here ]------------
> [430425.020053] kernel BUG at ../net/core/dev.c:6938!
> [430425.020057] Oops: Exception in kernel mode, sig: 5 [#1]
> [430425.020068] LE PAGE_SIZE=64K MMU=Hash SMP NR_CPUS=2048 NUMA pSeries
> [430425.020075] Modules linked in: binfmt_misc rpadlpar_io rpaphp xt_tcpudp
> iptable_filter ip_tables x_tables pseries_rng ibmvnic rng_core ibmveth
> vmx_crypto gf128mul fuse btrfs blake2b_generic xor zstd_compress
> lzo_compress raid6_pq dm_service_time crc32c_vpmsum dm_mirror dm_region_hash
> dm_log dm_multipath scsi_dh_rdac scsi_dh_alua autofs4
> [430425.020123] CPU: 0 PID: 34337 Comm: kworker/0:3 Kdump: loaded Tainted: G
> W     5.15.0-rc2-suka-00486-gce916130f5f6 #3
> [430425.020133] Workqueue: events_long __ibmvnic_reset [ibmvnic]
> [430425.020145] NIP: c000000000cb03f4 LR: c0080000014a4ce8 CTR:
> c000000000cb03b0
> [430425.020151] REGS: c00000002e5d37e0 TRAP: 0700  Tainted: G    W
> (5.15.0-rc2-suka-00486-gce916130f5f6)
> [430425.020159] MSR: 800000000282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE> CR:
> 28248428 XER: 20000001
> [430425.020176] CFAR: c0080000014ad9cc IRQMASK: 0
>         GPR00: c0080000014a4ce8 c00000002e5d3a80 c000000001b12100
> c0000001274f3190
>         GPR04: 00000000ffff36dc fffffffffffffff6 0000000000000019
> 0000000000000010
>         GPR08: c00000002ec48900 0000000000000001 c0000001274f31a0
> c0080000014ad9b8
>         GPR12: c000000000cb03b0 c000000001d00000 0000000080000000
> 00000000000003fe
>         GPR16: 00000000000006e3 0000000000000000 0000000000000008
> c00000002ec48af8
>         GPR20: c00000002ec48db0 0000000000000000 0000000000000004
> 0000000000000000
>         GPR24: c00000002ec48000 0000000000000004 c00000002ec49070
> 0000000000000006
>         GPR28: c00000002ec48900 c00000002ec48900 0000000000000002
> c00000002ec48000
> [430425.020248] NIP [c000000000cb03f4] napi_enable+0x44/0xc0
> [430425.020257] LR [c0080000014a4ce8] __ibmvnic_open+0xf0/0x440 [ibmvnic]
> [430425.020265] Call Trace:
> [430425.020269] [c00000002e5d3a80] [c00000002ec48900] 0xc00000002ec48900
> (unreliable)
> [430425.020277] [c00000002e5d3ab0] [c0080000014a4f40]
> __ibmvnic_open+0x348/0x440 [ibmvnic]
> [430425.020286] [c00000002e5d3b40] [c0080000014ace58]
> __ibmvnic_reset+0xb10/0xe40 [ibmvnic]
> [430425.020296] [c00000002e5d3c60] [c0000000001673a4]
> process_one_work+0x2d4/0x5d0
> [430425.020305] [c00000002e5d3d00] [c000000000167718]
> worker_thread+0x78/0x6c0
> [430425.020314] [c00000002e5d3da0] [c000000000173388] kthread+0x188/0x190
> [430425.020322] [c00000002e5d3e10] [c00000000000cee4]
> ret_from_kernel_thread+0x5c/0x64
> [430425.020331] Instruction dump:
> [430425.020335] 38a0fff6 39430010 e92d0c80 f9210028 39200000 60000000
> 60000000 e9030010
> [430425.020348] f9010020 e9210020 7d2948f8 792907e0 <0b090000> e9230038
> 7d072838 89290889
> [430425.020364] ---[ end trace 3abb5ec5589518ca ]---
> [430425.068100]
> [430425.068108] Sending IPI to other CPUs
> [430425.068206] IPI complete
> [430425.090333] kexec: Starting switchover sequence.

Jakub,

We hit this napi_enable() BUG_ON() crash three times this week. In two
of those cases it appears that

	napi->state = netdev_priv(netdev)

i.e., it contains the ibmvnic_adapter pointer in our case.

	# Crash was on eth3

	crash> net |grep eth3
	c00000002e948000  eth3   10.1.194.173

	crash> net_device |grep SIZE
	SIZE: 2304

	crash> px 2304
	$1 = 0x900

	crash> ibmvnic_adapter c00000002e948900 |grep napi
	  napi = 0xc00000003b7dc000,
	  num_active_rx_napi = 8,
	  napi_enabled = false,

	crash> napi_struct 0xc00000003b7dc000 |grep state
	  state = 13835058056063650048,
	    state = 0 '\000',

	crash> px 13835058056063650048
	$2 = 0xc00000002e948900		#eth3 ibmvnic_adapter above

In the third case napi->state was 16 (i.e., NAPI_STATE_SCHED was clear,
which is why the BUG_ON() in napi_enable() fired).

Let us know if any other fields are of interest. Do we have any clues on
when this started?

Sukadev


Thread overview: 13+ messages
2021-09-18  8:52 Xuan Zhuo
2021-09-20  8:50 ` patchwork-bot+netdevbpf
2021-09-20 19:20 ` Jakub Kicinski
2021-09-22  6:47   ` Xuan Zhuo
2021-09-23 13:14     ` Jakub Kicinski
     [not found]       ` <1632404456.506512-1-xuanzhuo@linux.alibaba.com>
2021-09-23 14:54         ` Jakub Kicinski
2021-10-18 21:58 ` Sukadev Bhattiprolu
2021-10-18 22:55   ` Jakub Kicinski
2021-10-18 23:36     ` Dany Madden
2021-10-18 23:47       ` Jakub Kicinski
2021-10-19  0:01         ` Sukadev Bhattiprolu
2021-10-22  3:16       ` Sukadev Bhattiprolu [this message]
2021-10-25 17:36         ` Jakub Kicinski
