netdev.vger.kernel.org archive mirror
* [regression] dpaa2: TSO offload on lx2160a causes fatal exception in interrupt
@ 2022-05-04  7:46 Thorsten Leemhuis
  2022-05-04  8:06 ` Ioana Ciornei
  0 siblings, 1 reply; 7+ messages in thread
From: Thorsten Leemhuis @ 2022-05-04  7:46 UTC (permalink / raw)
  To: Ioana Ciornei; +Cc: davem, kuba, netdev

Hi, this is your Linux kernel regression tracker.

Ioana, I noticed a regression report in bugzilla.kernel.org that afaics
nobody acted upon since it was reported about a week ago. The reporter
*suspects* it's caused by a recent change of yours. That's why I decided
to forward it to the lists and all people that seemed to be relevant
here. To quote from https://bugzilla.kernel.org/show_bug.cgi?id=215886 :

>  kernelbugs@63bit.net 2022-04-25 18:15:38 UTC
> 
> Network traffic eventually causes a fatal exception in interrupt. Disabling TSO prevents the bug. Likely related to recent changes to enable TSO?
> 
> Crash:
> [  487.231819] Unable to handle kernel paging request at virtual address fffffd9807000008
> [  487.239780] Mem abort info:
> [  487.242570]   ESR = 0x96000006
> [  487.245620]   EC = 0x25: DABT (current EL), IL = 32 bits
> [  487.250974]   SET = 0, FnV = 0
> [  487.254025]   EA = 0, S1PTW = 0
> [  487.257170]   FSC = 0x06: level 2 translation fault
> [  487.262050] Data abort info:
> [  487.264921]   ISV = 0, ISS = 0x00000006
> [  487.268748]   CM = 0, WnR = 0
> [  487.271747] swapper pgtable: 4k pages, 48-bit VAs, pgdp=00000000830fd000
> [  487.278449] [fffffd9807000008] pgd=100000277c353003, p4d=100000277c353003, pud=100000277c352003, pmd=0000000000000000
> [  487.289110] Internal error: Oops: 96000006 [#1] SMP
> [  487.293985] Modules linked in: rfkill fsl_dpaa2_ptp ltc2978 lm90 pmbus_core at24 ptp_qoriq fsl_dpaa2_eth pcs_lynx at803x phylink xgmac_mdio i2c_mux_pca954x i2c_mux sfp mdio_i2c qoriq_thermal qoriq_cpufreq layerscape_edac_mod vfat fat auth_rpcgss fuse sunrpc dpaa2_caam fsl_mc_dpio caam_jr nvme rtc_pcf2127 caamhash_desc mmc_block xhci_plat_hcd caamalg_desc regmap_spi dpaa2_console crct10dif_ce libdes ghash_ce nvme_core dwc3 caam sdhci_of_esdhc ulpi error sdhci_pltfm rtc_fsl_ftm_alarm udc_core sbsa_gwdt ahci_qoriq i2c_imx sdhci gpio_keys
> [  487.341467] CPU: 7 PID: 1772 Comm: sshd Tainted: G        W        --------  ---  5.18.0-0.rc3.20220422gitd569e86915b7f2f.31.fc37.aarch64 #1
> [  487.354061] Hardware name: SolidRun LX2160A Honeycomb (DT)
> [  487.359535] pstate: a0400005 (NzCv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
> [  487.366485] pc : kfree+0xac/0x304
> [  487.369799] lr : kfree+0x204/0x304
> [  487.373191] sp : ffff80000c4eb120
> [  487.376493] x29: ffff80000c4eb120 x28: ffff662240c46400 x27: 0000000000000001
> [  487.383621] x26: 0000000000000001 x25: ffff662246da0cc0 x24: ffff66224af78000
> [  487.390748] x23: ffffad184f4ce008 x22: ffffad1850185000 x21: ffffad1838d13cec
> [  487.397874] x20: ffff6601c0000000 x19: fffffd9807000000 x18: 0000000000000000
> [  487.405000] x17: ffffb910cdc49000 x16: ffffad184d7d9080 x15: 0000000000004000
> [  487.412126] x14: 0000000000000008 x13: 000000000000ffff x12: 0000000000000000
> [  487.419252] x11: 0000000000000004 x10: 0000000000000001 x9 : ffffad184d7d927c
> [  487.426379] x8 : 0000000000000000 x7 : 0000000ffffffd1d x6 : ffff662240a94900
> [  487.433505] x5 : 0000000000000003 x4 : 0000000000000009 x3 : ffffad184f4ce008
> [  487.440632] x2 : ffff662243eec000 x1 : 0000000100000100 x0 : fffffc0000000000
> [  487.447758] Call trace:
> [  487.450194]  kfree+0xac/0x304
> [  487.453151]  dpaa2_eth_free_tx_fd.isra.0+0x33c/0x3e0 [fsl_dpaa2_eth]
> [  487.459507]  dpaa2_eth_tx_conf+0x100/0x2e0 [fsl_dpaa2_eth]
> [  487.464989]  dpaa2_eth_poll+0xdc/0x380 [fsl_dpaa2_eth]
> [  487.470122]  __napi_poll.constprop.0+0x40/0x1a0
> [  487.474645]  net_rx_action+0x310/0x3d4
> [  487.478384]  __do_softirq+0x23c/0x6b4
> [  487.482036]  __irq_exit_rcu+0x104/0x214
> [  487.485862]  irq_exit_rcu+0x1c/0x50
> [  487.489339]  el1_interrupt+0x38/0x70
> [  487.492907]  el1h_64_irq_handler+0x18/0x24
> [  487.496993]  el1h_64_irq+0x68/0x6c
> [  487.500384]  __ip_finish_output+0x138/0x220
> [  487.504558]  ip_finish_output+0x40/0xf4
> [  487.508384]  ip_output+0xfc/0x2fc
> [  487.511689]  __ip_queue_xmit+0x1c0/0x5e0
> [  487.515601]  ip_queue_xmit+0x20/0x30
> [  487.519166]  __tcp_transmit_skb+0x3c0/0x7cc
> [  487.523339]  tcp_write_xmit+0x310/0x8ac
> [  487.527164]  __tcp_push_pending_frames+0x48/0x110
> [  487.531857]  tcp_push+0xbc/0x19c
> [  487.535075]  tcp_sendmsg_locked+0x2ac/0xad4
> [  487.539247]  tcp_sendmsg+0x44/0x6c
> [  487.542639]  inet_sendmsg+0x50/0x7c
> [  487.546117]  sock_sendmsg+0x60/0x70
> [  487.549595]  sock_write_iter+0x98/0xe0
> [  487.553333]  new_sync_write+0x124/0x130
> [  487.557159]  vfs_write+0x1c8/0x210
> [  487.560550]  ksys_write+0xd8/0xec
> [  487.563854]  __arm64_sys_write+0x28/0x34
> [  487.567766]  invoke_syscall+0x78/0x100
> [  487.571506]  el0_svc_common.constprop.0+0x68/0x124
> [  487.576287]  do_el0_svc+0x30/0x90
> [  487.579592]  el0_svc+0x60/0x1a4
> [  487.582723]  el0t_64_sync_handler+0x10c/0x140
> [  487.587070]  el0t_64_sync+0x190/0x194
> [  487.590723] Code: 8b130293 b25657e0 d34cfe73 8b131813 (f9400660) 
> [  487.596807] ---[ end trace 0000000000000000 ]---
> [  487.601413] Kernel panic - not syncing: Oops: Fatal exception in interrupt
> [  487.608276] SMP: stopping secondary CPUs
> [  487.612206] Kernel Offset: 0x2d1845400000 from 0xffff800008000000
> [  487.618287] PHYS_OFFSET: 0xffff99fe40000000
> [  487.622457] CPU features: 0x100,00004b09,00001086
> [  487.627150] Memory Limit: none
> [  487.630196] Rebooting in 1 seconds..
> 
> Mitigation:
> ethtool -K ethX tso off
> 
> [reply] [−] Comment 1 kernelbugs@63bit.net 2022-05-02 01:37:06 UTC
> 
> I believe this is related to commit 3dc709e0cd47c602a8d1a6747f1a91e9737eeed3
> 

That commit is "dpaa2-eth: add support for software TSO".
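
For readers following the oops quoted above, the fault details can be cross-checked by hand. The sketch below is not part of the original report; it assumes the standard AArch64 ESR_EL1 field layout for data aborts and the LDR (immediate, unsigned offset) encoding, and the function names are made up for illustration. It decodes the ESR value 0x96000006 and the faulting instruction word f9400660 from the "Code:" line.

```python
def decode_esr(esr):
    """Split an AArch64 ESR value into the data-abort fields the oops prints."""
    iss = esr & 0x1FFFFFF              # instruction specific syndrome, bits [24:0]
    return {
        "EC": (esr >> 26) & 0x3F,      # exception class, bits [31:26]
        "IL": (esr >> 25) & 0x1,       # 1 means a 32-bit instruction
        "ISS": iss,
        "ISV": (iss >> 24) & 0x1,
        "SET": (iss >> 11) & 0x3,
        "FnV": (iss >> 10) & 0x1,
        "EA": (iss >> 9) & 0x1,
        "CM": (iss >> 8) & 0x1,
        "S1PTW": (iss >> 7) & 0x1,
        "WnR": (iss >> 6) & 0x1,       # 0 means the faulting access was a read
        "FSC": iss & 0x3F,             # fault status code, 0x06 = level 2 translation fault
    }


def decode_ldr_x_uimm(insn):
    """Decode a 64-bit 'LDR Xt, [Xn, #imm]' word; return None for other encodings."""
    if insn & 0xFFC00000 != 0xF9400000:
        return None
    return {
        "rt": insn & 0x1F,                     # destination register
        "rn": (insn >> 5) & 0x1F,              # base register
        "offset": ((insn >> 10) & 0xFFF) * 8,  # imm12 is scaled by 8 for 64-bit loads
    }


if __name__ == "__main__":
    print(decode_esr(0x96000006))
    print(decode_ldr_x_uimm(0xF9400660))
```

Under those assumptions the fields agree with the log (EC = 0x25, FSC = 0x06, WnR = 0), and the instruction is a load from x19 + 8, which matches the faulting address fffffd9807000008 given x19 = fffffd9807000000 in the register dump.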

Could somebody take a look into this? Or was this discussed somewhere
else already? Or even fixed?

Anyway, to get this tracked:

#regzbot introduced: 3dc709e0cd47c602a8d1a6747f1a91e9737eeed3
#regzbot from: Unknown <kernelbugs@63bit.net>
#regzbot title: net: dpaa2: TSO offload on lx2160a causes fatal
exception in interrupt
#regzbot link: https://bugzilla.kernel.org/show_bug.cgi?id=215886

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)

P.S.: As the Linux kernel's regression tracker I deal with a lot of
reports and sometimes miss something important when writing mails like
this. If that's the case here, don't hesitate to tell me in a public
reply; it's in everyone's interest to set the public record straight.

-- 
Additional information about regzbot:

If you want to know more about regzbot, check out its web-interface, the
getting started guide, and the reference documentation:

https://linux-regtracking.leemhuis.info/regzbot/
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/getting_started.md
https://gitlab.com/knurd42/regzbot/-/blob/main/docs/reference.md

The last two documents will explain how you can interact with regzbot
yourself if you want to.

Hint for reporters: when reporting a regression it's in your interest to
CC the regression list and tell regzbot about the issue, as that ensures
the regression makes it onto the radar of the Linux kernel's regression
tracker and your report won't fall through the cracks unnoticed.

Hint for developers: you normally don't need to care about regzbot once
it's involved. Fix the issue as you normally would; just remember to
include 'Link:' tags in the patch description pointing to all reports
about the issue. This has been expected from developers even before
regzbot showed up for reasons explained in
'Documentation/process/submitting-patches.rst' and
'Documentation/process/5.Posting.rst'.



* Re: [regression] dpaa2: TSO offload on lx2160a causes fatal exception in interrupt
  2022-05-04  7:46 [regression] dpaa2: TSO offload on lx2160a causes fatal exception in interrupt Thorsten Leemhuis
@ 2022-05-04  8:06 ` Ioana Ciornei
  2022-05-12 16:43   ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Ioana Ciornei @ 2022-05-04  8:06 UTC (permalink / raw)
  To: Thorsten Leemhuis; +Cc: davem, kuba, netdev

On Wed, May 04, 2022 at 09:46:49AM +0200, Thorsten Leemhuis wrote:
> Hi, this is your Linux kernel regression tracker.
> 
> Ioana, I noticed a regression report in bugzilla.kernel.org that afaics
> nobody acted upon since it was reported about a week ago. The reporter
> *suspects* it's caused by a recent change of yours. 

I was not aware of this regression report, thanks for letting me know.

> That's why I decided
> to forward it to the lists and all people that seemed to be relevant
> here. To quote from https://bugzilla.kernel.org/show_bug.cgi?id=215886
> 
> >  kernelbugs@63bit.net 2022-04-25 18:15:38 UTC
> > 
> > Network traffic eventually causes a fatal exception in interrupt. Disabling TSO prevents the bug. Likely related to recent changes to enable TSO?
> > 
> > Crash:
> > 

<snip>

> > Mitigation:
> > ethtool -K ethX tso off
> > 
> > [reply] [−] Comment 1 kernelbugs@63bit.net 2022-05-02 01:37:06 UTC
> > 
> > I believe this is related to commit 3dc709e0cd47c602a8d1a6747f1a91e9737eeed3
> > 
> 
> That commit is "dpaa2-eth: add support for software TSO".
> 
> Could somebody take a look into this? Or was this discussed somewhere
> else already? Or even fixed?

I will take a look at it, it wasn't discussed already.

Ioana


* Re: [regression] dpaa2: TSO offload on lx2160a causes fatal exception in interrupt
  2022-05-04  8:06 ` Ioana Ciornei
@ 2022-05-12 16:43   ` Jakub Kicinski
  2022-05-19  5:12     ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-05-12 16:43 UTC (permalink / raw)
  To: Ioana Ciornei; +Cc: Thorsten Leemhuis, davem, netdev

On Wed, 4 May 2022 08:06:47 +0000 Ioana Ciornei wrote:
> > > Mitigation:
> > > ethtool -K ethX tso off
> > > 
> > > [reply] [−] Comment 1 kernelbugs@63bit.net 2022-05-02 01:37:06 UTC
> > > 
> > > I believe this is related to commit 3dc709e0cd47c602a8d1a6747f1a91e9737eeed3
> > >   
> > 
> > That commit is "dpaa2-eth: add support for software TSO".
> > 
> > Could somebody take a look into this? Or was this discussed somewhere
> > else already? Or even fixed?  
> 
> I will take a look at it, it wasn't discussed already.

Hi! Any progress on this one? AFAICT this is a new bug in 5.18, would
be great if we can close it in the next week or so, otherwise perhaps
consider disabling TSO by default, maybe?


* Re: [regression] dpaa2: TSO offload on lx2160a causes fatal exception in interrupt
  2022-05-12 16:43   ` Jakub Kicinski
@ 2022-05-19  5:12     ` Jakub Kicinski
  2022-05-19  6:07       ` Thorsten Leemhuis
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-05-19  5:12 UTC (permalink / raw)
  To: Ioana Ciornei; +Cc: Thorsten Leemhuis, davem, netdev

On Thu, 12 May 2022 09:43:23 -0700 Jakub Kicinski wrote:
> On Wed, 4 May 2022 08:06:47 +0000 Ioana Ciornei wrote:
> > > > Mitigation:
> > > > ethtool -K ethX tso off
> > > > 
> > > > [reply] [−] Comment 1 kernelbugs@63bit.net 2022-05-02 01:37:06 UTC
> > > > 
> > > > I believe this is related to commit 3dc709e0cd47c602a8d1a6747f1a91e9737eeed3
> > > >     
> > > 
> > > That commit is "dpaa2-eth: add support for software TSO".
> > > 
> > > Could somebody take a look into this? Or was this discussed somewhere
> > > else already? Or even fixed?    
> > 
> > I will take a look at it, it wasn't discussed already.  
> 
> Hi! Any progress on this one? AFAICT this is a new bug in 5.18, would
> be great if we can close it in the next week or so, otherwise perhaps
> consider disabling TSO by default, maybe?

ping?


* Re: [regression] dpaa2: TSO offload on lx2160a causes fatal exception in interrupt
  2022-05-19  5:12     ` Jakub Kicinski
@ 2022-05-19  6:07       ` Thorsten Leemhuis
  2022-05-19 16:03         ` Jakub Kicinski
  0 siblings, 1 reply; 7+ messages in thread
From: Thorsten Leemhuis @ 2022-05-19  6:07 UTC (permalink / raw)
  To: Jakub Kicinski, Ioana Ciornei; +Cc: davem, netdev



On 19.05.22 07:12, Jakub Kicinski wrote:
> On Thu, 12 May 2022 09:43:23 -0700 Jakub Kicinski wrote:
>> On Wed, 4 May 2022 08:06:47 +0000 Ioana Ciornei wrote:
>>>>> Mitigation:
>>>>> ethtool -K ethX tso off
>>>>>
>>>>> [reply] [−] Comment 1 kernelbugs@63bit.net 2022-05-02 01:37:06 UTC
>>>>>
>>>>> I believe this is related to commit 3dc709e0cd47c602a8d1a6747f1a91e9737eeed3
>>>>>     
>>>>
>>>> That commit is "dpaa2-eth: add support for software TSO".
>>>>
>>>> Could somebody take a look into this? Or was this discussed somewhere
>>>> else already? Or even fixed?    
>>>
>>> I will take a look at it, it wasn't discussed already.  
>>
>> Hi! Any progress on this one? AFAICT this is a new bug in 5.18, would
>> be great if we can close it in the next week or so, otherwise perhaps
>> consider disabling TSO by default, maybe?
> 
> ping?

ICYMI: There was some activity in bugzilla nearly ten days ago and Ioana
provided a patch, but it seems that didn't help. I asked for a status
update yesterday, but no reply yet:
https://bugzilla.kernel.org/show_bug.cgi?id=215886

Ciao, Thorsten


* Re: [regression] dpaa2: TSO offload on lx2160a causes fatal exception in interrupt
  2022-05-19  6:07       ` Thorsten Leemhuis
@ 2022-05-19 16:03         ` Jakub Kicinski
  2022-05-19 16:23           ` Ioana Ciornei
  0 siblings, 1 reply; 7+ messages in thread
From: Jakub Kicinski @ 2022-05-19 16:03 UTC (permalink / raw)
  To: Thorsten Leemhuis; +Cc: Ioana Ciornei, davem, netdev

On Thu, 19 May 2022 08:07:15 +0200 Thorsten Leemhuis wrote:
> ICYMI

Oh, I surely missed it; I don't look at the bz.

> There was some activity in bugzilla nearly ten days ago and Ioana
> provided a patch, but it seems that didn't help. I asked for a status
> update yesterday, but no reply yet:
> https://bugzilla.kernel.org/show_bug.cgi?id=215886

Well, GTK, thanks.


* Re: [regression] dpaa2: TSO offload on lx2160a causes fatal exception in interrupt
  2022-05-19 16:03         ` Jakub Kicinski
@ 2022-05-19 16:23           ` Ioana Ciornei
  0 siblings, 0 replies; 7+ messages in thread
From: Ioana Ciornei @ 2022-05-19 16:23 UTC (permalink / raw)
  To: Jakub Kicinski; +Cc: Thorsten Leemhuis, davem, netdev

On Thu, May 19, 2022 at 09:03:08AM -0700, Jakub Kicinski wrote:
> On Thu, 19 May 2022 08:07:15 +0200 Thorsten Leemhuis wrote:
> > ICYMI
> 
> Oh, I surely missed it; I don't look at the bz.
> 
> > There was some activity in bugzilla nearly ten days ago and Ioana
> > provided a patch, but it seems that didn't help. I asked for a status
> > update yesterday, but no reply yet:
> > https://bugzilla.kernel.org/show_bug.cgi?id=215886
> 
> Well, GTK, thanks.

I sent a new reply on the bugzilla thread just as you responded.
There is certainly a bug there, and I have posted a new patch for it to
be tried. I am not sure, however, that it will fix the crash seen by the
reporter (I am still unable to reproduce it).

I will try my best to get to the bottom of it; if I can't, I'll submit a
patch so that we start with TSO disabled.

Ioana
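
Until a fix or a default-off patch lands, the mitigation from the bug report (`ethtool -K ethX tso off`) can be applied per interface. The following is only a sketch, not something posted in this thread: it assumes `ethtool -k <iface>` reports the feature on a line beginning with "tcp-segmentation-offload:", and the helper names and the interface name are placeholders.

```python
import subprocess


def parse_tso_state(features_text):
    """Return True/False for the TSO line in 'ethtool -k' output, None if absent."""
    for line in features_text.splitlines():
        name, sep, value = line.strip().partition(":")
        if sep and name == "tcp-segmentation-offload":
            words = value.split()      # e.g. ["on"] or ["off", "[fixed]"]
            return bool(words) and words[0] == "on"
    return None


def tso_off_command(iface):
    """Build the mitigation command quoted in the report for one interface."""
    return ["ethtool", "-K", iface, "tso", "off"]


def disable_tso_if_enabled(iface):
    """Query the interface and turn TSO off only when it is currently on."""
    result = subprocess.run(["ethtool", "-k", iface],
                            capture_output=True, text=True, check=True)
    if parse_tso_state(result.stdout):
        subprocess.run(tso_off_command(iface), check=True)
        return True
    return False
```

Calling `disable_tso_if_enabled("eth0")` needs root privileges and an installed ethtool binary; the setting does not persist across reboots, so it would have to be reapplied at boot by whatever mechanism the distribution provides.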

