All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Dubel, Helena Anna" <helena.anna.dubel@intel.com>
To: "Ertman, David M" <david.m.ertman@intel.com>,
	"intel-wired-lan@lists.osuosl.org"
	<intel-wired-lan@lists.osuosl.org>
Subject: Re: [Intel-wired-lan] [PATCH net v2] ice: Don't double unplug aux on peer initiated reset
Date: Thu, 1 Sep 2022 09:55:12 +0000	[thread overview]
Message-ID: <BYAPR11MB2983DC4EFE6CEC213FF41F0BBE7B9@BYAPR11MB2983.namprd11.prod.outlook.com> (raw)
In-Reply-To: <20220809172423.1967513-1-david.m.ertman@intel.com>


> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Ertman, David M
> Sent: wtorek, 9 sierpnia 2022 19:24
> To: intel-wired-lan@lists.osuosl.org
> Subject: [Intel-wired-lan] [PATCH net v2] ice: Don't double unplug aux on
> peer initiated reset
> 
> In the IDC callback that is accessed when the aux drivers request a reset, the
> function to unplug the aux devices is called.  This function is also called in the
> ice_prepare_for_reset function. This double call is causing a "scheduling
> while atomic" BUG.
> 
> [  662.676430] ice 0000:4c:00.0 rocep76s0: cqp opcode = 0x1 maj_err_code =
> 0xffff min_err_code = 0x8003
> 
> [  662.676609] ice 0000:4c:00.0 rocep76s0: [Modify QP Cmd Error][op_code=8]
> status=-29 waiting=1 completion_err=1 maj=0xffff min=0x8003
> 
> [  662.815006] ice 0000:4c:00.0 rocep76s0: ICE OICR event notification: oicr =
> 0x10000003
> 
> [  662.815014] ice 0000:4c:00.0 rocep76s0: critical PE Error,
> GLPE_CRITERR=0x00011424
> 
> [  662.815017] ice 0000:4c:00.0 rocep76s0: Requesting a reset
> 
> [  662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002
> 
> [  662.815475] BUG: scheduling while atomic: swapper/37/0/0x00010002 [
> 662.815477] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4
> dns_resolver nfs lockd grace fscache netfs rfkill 8021q garp mrp stp llc vfat fat
> rpcrdma intel_rapl_msr intel_rapl_common sunrpc i10nm_edac rdma_ucm
> nfit ib_srpt libnvdimm ib_isert iscsi_target_mod x86_pkg_temp_thermal
> intel_powerclamp coretemp target_core_mod snd_hda_intel ib_iser
> snd_intel_dspcfg libiscsi snd_intel_sdw_acpi scsi_transport_iscsi kvm_intel
> iTCO_wdt rdma_cm snd_hda_codec kvm iw_cm ipmi_ssif
> iTCO_vendor_support snd_hda_core irqbypass crct10dif_pclmul
> crc32_pclmul ghash_clmulni_intel snd_hwdep snd_seq snd_seq_device rapl
> snd_pcm snd_timer isst_if_mbox_pci pcspkr isst_if_mmio irdma
> intel_uncore idxd acpi_ipmi joydev isst_if_common snd mei_me idxd_bus
> ipmi_si soundcore i2c_i801 mei ipmi_devintf i2c_smbus i2c_ismt
> ipmi_msghandler acpi_power_meter acpi_pad rv(OE) ib_uverbs ib_cm
> ib_core xfs libcrc32c ast i2c_algo_bit drm_vram_helper drm_kms_helper
> syscopyarea sysfillrect sysimgblt fb_sys_fops drm_ttm_helpe  r ttm [
> 662.815546]  nvme nvme_core ice drm crc32c_intel i40e t10_pi wmi
> pinctrl_emmitsburg dm_mirror dm_region_hash dm_log dm_mod fuse [
> 662.815557] Preemption disabled at:
> [  662.815558] [<0000000000000000>] 0x0
> [  662.815563] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S
> OE     5.17.1 #2
> [  662.815566] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS
> SE5C6301.86B.6624.D18.2111021741 11/02/2021 [  662.815568] Call Trace:
> [  662.815572]  <IRQ>
> [  662.815574]  dump_stack_lvl+0x33/0x42 [  662.815581]
> __schedule_bug.cold.147+0x7d/0x8a [  662.815588]
> __schedule+0x798/0x990 [  662.815595]  schedule+0x44/0xc0 [  662.815597]
> schedule_preempt_disabled+0x14/0x20
> [  662.815600]  __mutex_lock.isra.11+0x46c/0x490 [  662.815603]  ?
> __ibdev_printk+0x76/0xc0 [ib_core] [  662.815633]  device_del+0x37/0x3d0 [
> 662.815639]  ice_unplug_aux_dev+0x1a/0x40 [ice] [  662.815674]
> ice_schedule_reset+0x3c/0xd0 [ice] [  662.815693]
> irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [  662.815712]  ?
> bitmap_find_next_zero_area_off+0x45/0xa0
> [  662.815719]  ice_send_event_to_aux+0x54/0x70 [ice] [  662.815741]
> ice_misc_intr+0x21d/0x2d0 [ice] [  662.815756]
> __handle_irq_event_percpu+0x4c/0x180
> [  662.815762]  handle_irq_event_percpu+0xf/0x40 [  662.815764]
> handle_irq_event+0x34/0x60 [  662.815766]  handle_edge_irq+0x9a/0x1c0 [
> 662.815770]  __common_interrupt+0x62/0x100 [  662.815774]
> common_interrupt+0xb4/0xd0 [  662.815779]  </IRQ> [  662.815780]  <TASK>
> [  662.815780]  asm_common_interrupt+0x1e/0x40 [  662.815785] RIP:
> 0010:cpuidle_enter_state+0xd6/0x380
> [  662.815789] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c
> 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00
> 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [  662.815791] RSP:
> 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [  662.815793] RAX:
> ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [
> 662.815795] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI:
> ffffffff93f53ee7 [  662.815796] RBP: ff5e2bd11ff41928 R08: 0000000000000000
> R09: 000000000002f8c0 [  662.815797] R10: 0000010c3f18e2cf R11:
> 000000000000000f R12: 0000009a52da2d08 [  662.815798] R13:
> ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [
> 662.815801]  cpuidle_enter+0x29/0x40 [  662.815803]  do_idle+0x261/0x2b0 [
> 662.815807]  cpu_startup_entry+0x19/0x20 [  662.815809]
> start_secondary+0x114/0x150 [  662.815813]
> secondary_startup_64_no_verify+0xd5/0xdb
> [  662.815818]  </TASK>
> [  662.815846] bad: scheduling from the idle thread!
> [  662.815849] CPU: 37 PID: 0 Comm: swapper/37 Kdump: loaded Tainted: G S
> W  OE     5.17.1 #2
> [  662.815852] Hardware name: Intel Corporation D50DNP/D50DNP, BIOS
> SE5C6301.86B.6624.D18.2111021741 11/02/2021 [  662.815853] Call Trace:
> [  662.815855]  <IRQ>
> [  662.815856]  dump_stack_lvl+0x33/0x42 [  662.815860]
> dequeue_task_idle+0x20/0x30 [  662.815863]  __schedule+0x1c3/0x990 [
> 662.815868]  schedule+0x44/0xc0 [  662.815871]
> schedule_preempt_disabled+0x14/0x20
> [  662.815873]  __mutex_lock.isra.11+0x3a8/0x490 [  662.815876]  ?
> __ibdev_printk+0x76/0xc0 [ib_core] [  662.815904]  device_del+0x37/0x3d0 [
> 662.815909]  ice_unplug_aux_dev+0x1a/0x40 [ice] [  662.815937]
> ice_schedule_reset+0x3c/0xd0 [ice] [  662.815961]
> irdma_iidc_event_handler.cold.7+0xb6/0xd3 [irdma] [  662.815979]  ?
> bitmap_find_next_zero_area_off+0x45/0xa0
> [  662.815985]  ice_send_event_to_aux+0x54/0x70 [ice] [  662.816011]
> ice_misc_intr+0x21d/0x2d0 [ice] [  662.816033]
> __handle_irq_event_percpu+0x4c/0x180
> [  662.816037]  handle_irq_event_percpu+0xf/0x40 [  662.816039]
> handle_irq_event+0x34/0x60 [  662.816042]  handle_edge_irq+0x9a/0x1c0 [
> 662.816045]  __common_interrupt+0x62/0x100 [  662.816048]
> common_interrupt+0xb4/0xd0 [  662.816052]  </IRQ> [  662.816053]  <TASK>
> [  662.816054]  asm_common_interrupt+0x1e/0x40 [  662.816057] RIP:
> 0010:cpuidle_enter_state+0xd6/0x380
> [  662.816060] Code: 49 89 c4 0f 1f 44 00 00 31 ff e8 65 d7 95 ff 45 84 ff 74 12 9c
> 58 f6 c4 02 0f 85 64 02 00 00 31 ff e8 ae c5 9c ff fb 45 85 f6 <0f> 88 12 01 00 00
> 49 63 d6 4c 2b 24 24 48 8d 04 52 48 8d 04 82 49 [  662.816063] RSP:
> 0018:ff2c2c4f18edbe80 EFLAGS: 00000202 [  662.816065] RAX:
> ff280805df140000 RBX: 0000000000000002 RCX: 000000000000001f [
> 662.816067] RDX: 0000009a52da2d08 RSI: ffffffff93f8240b RDI:
> ffffffff93f53ee7 [  662.816068] RBP: ff5e2bd11ff41928 R08: 0000000000000000
> R09: 000000000002f8c0 [  662.816070] R10: 0000010c3f18e2cf R11:
> 000000000000000f R12: 0000009a52da2d08 [  662.816071] R13:
> ffffffff94ad7e20 R14: 0000000000000002 R15: 0000000000000000 [
> 662.816075]  cpuidle_enter+0x29/0x40 [  662.816077]  do_idle+0x261/0x2b0 [
> 662.816080]  cpu_startup_entry+0x19/0x20 [  662.816083]
> start_secondary+0x114/0x150 [  662.816087]
> secondary_startup_64_no_verify+0xd5/0xdb
> [  662.816091]  </TASK>
> [  662.816169] bad: scheduling from the idle thread!
> 
> The correct place to unplug the aux devices for a reset is in the
> prepare_for_reset function, as this is a common place for all reset flows.
> It also has built in protection from being called twice in a single reset instance
> before the aux devices are replugged.
> 
> Fixes: f9f5301e7e2d4 ("ice: Register auxiliary device to provide RDMA")
> Signed-off-by: Dave Ertman <david.m.ertman@intel.com>
> ---
> v2 - changed commit message to include BUG message and remove extra
> space
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 2 --
>  1 file changed, 2 deletions(-)
> 

Tested-by: Helena Anna Dubel <helena.anna.dubel@intel.com>
-------------------------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
ul. Slowackiego 173, 80-298 Gdansk
KRS 101882, NIP 957-07-52-316
_______________________________________________
Intel-wired-lan mailing list
Intel-wired-lan@osuosl.org
https://lists.osuosl.org/mailman/listinfo/intel-wired-lan

      reply	other threads:[~2022-09-01 14:03 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-08-09 17:24 [Intel-wired-lan] [PATCH net v2] ice: Don't double unplug aux on peer initiated reset Dave Ertman
2022-09-01  9:55 ` Dubel, Helena Anna [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=BYAPR11MB2983DC4EFE6CEC213FF41F0BBE7B9@BYAPR11MB2983.namprd11.prod.outlook.com \
    --to=helena.anna.dubel@intel.com \
    --cc=david.m.ertman@intel.com \
    --cc=intel-wired-lan@lists.osuosl.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.