* [Intel-wired-lan] (bisected) ixgbe tx hang since 4.16-rc1
@ 2018-06-03  7:55 Andre Tomt
  2018-06-04  0:15 ` Shannon Nelson
  0 siblings, 1 reply; 5+ messages in thread
From: Andre Tomt @ 2018-06-03  7:55 UTC
  To: intel-wired-lan

Hello

Since kernel 4.16-rc1, up to and including Linus' mainline git tree as of 
earlier today, this Atom Denverton system with X553 embedded network 
controllers has been unable to send packets until I kick the interface 
with an ethtool -r or leave it alone until it trips the TX Unit Hang 
watchdog.

The watchdog gets it going eventually, but this can take a while, as 
seen here:

> [   14.132485] ixgbe 0000:05:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None
> [   14.133006] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
> [ 1085.067049] ixgbe 0000:05:00.0 eno1: Detected Tx Unit Hang 
>                  Tx Queue             <4>
>                  TDH, TDT             <1ab>, <1af>
>                  next_to_use          <1af>
>                  next_to_clean        <1ab>
>                tx_buffer_info[next_to_clean]
>                  time_stamp           <1000393d8>
>                  jiffies              <100039800>
> [ 1085.081184] ixgbe 0000:05:00.0 eno1: tx hang 1 detected on queue 4, resetting adapter
> [ 1085.081189] ixgbe 0000:05:00.0 eno1: initiating reset due to tx timeout
> [ 1085.082480] ixgbe 0000:05:00.0 eno1: Reset adapter
> [ 1085.083135] ixgbe 0000:05:00.0 eno1: NIC Link is Down
> [ 1088.729499] ixgbe 0000:05:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None
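
The hex fields in the hang report can be turned into plain numbers with a little arithmetic. The sketch below is a standalone model, not driver code: the helper names are hypothetical, and it assumes that time_stamp holds the jiffies value recorded when the buffer was queued and that TDH and TDT are the hardware head and tail indices of the descriptor ring. The ring size is not stated in the report and has to be supplied by the caller.

```c
#include <assert.h>
#include <stdint.h>

/* Ticks elapsed between the buffer's time_stamp and the jiffies value
 * sampled when the hang was reported. Unsigned subtraction gives the
 * right distance even if the counter wrapped in between. */
static uint64_t ticks_pending(uint64_t time_stamp, uint64_t jiffies_now)
{
	return jiffies_now - time_stamp;
}

/* Descriptors sitting between head (TDH) and tail (TDT) on a ring of
 * ring_size entries, allowing for the tail having wrapped past the end.
 * ring_size is an assumption supplied by the caller. */
static uint32_t descs_outstanding(uint32_t tdh, uint32_t tdt, uint32_t ring_size)
{
	assert(tdh < ring_size && tdt < ring_size);
	return tdt >= tdh ? tdt - tdh : ring_size - tdh + tdt;
}
```

With the values above, 0x100039800 - 0x1000393d8 = 0x428, i.e. 1064 ticks, and 0x1af - 0x1ab leaves 4 descriptors outstanding on queue 4. Converting the ticks to seconds needs the kernel's CONFIG_HZ, which is not given here.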

A bisection turned up this commit:
49a94d74d9489647dfc3449ee135af4bf8fab2c1 is the first bad commit
commit 49a94d74d9489647dfc3449ee135af4bf8fab2c1
Author: Shannon Nelson <shannon.nelson@oracle.com>
Date:   Tue Dec 19 15:59:55 2017 -0800

     ixgbe: add ipsec engine start and stop routines

     Add in the code for running and stopping the hardware ipsec
     encryption/decryption engine.  It is good to keep the engine
     off when not in use in order to save on the power draw.

     Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
     Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
     Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

:040000 040000 417954917c2d03feaa973e4cce5ca8f53b099c94 
66091763e776ef25a33417a375d8ce149f01b61f M	drivers

Just reverting that commit on 4.17 is a no go, so I verified it by 
commenting out the call to ixgbe_ipsec_stop_engine(adapter) in 
ixgbe_init_ipsec_offload(), and lo and behold, packets started moving 
without intervention again.

This is a Supermicro A2SDi-8C-HLN4F, Atom C3758 (Denverton) system with 
"Intel Corporation Ethernet Connection X553 1GbE (rev 11)" according to 
pci-id db.

I'm not using ipsec on this system. The modules are built but not loaded.


* [Intel-wired-lan] (bisected) ixgbe tx hang since 4.16-rc1
  2018-06-03  7:55 [Intel-wired-lan] (bisected) ixgbe tx hang since 4.16-rc1 Andre Tomt
@ 2018-06-04  0:15 ` Shannon Nelson
  2018-06-04  0:35   ` Andre Tomt
  2018-06-04  4:13   ` Rustad, Mark D
  0 siblings, 2 replies; 5+ messages in thread
From: Shannon Nelson @ 2018-06-04  0:15 UTC
  To: intel-wired-lan

On 6/3/2018 12:55 AM, Andre Tomt wrote:
> Hello
> 
> Since kernel 4.16-rc1, up to and including Linus' mainline git tree as of 
> earlier today, this Atom Denverton system with X553 embedded network 
> controllers has been unable to send packets until I kick the interface 
> with an ethtool -r or leave it alone until it trips the TX Unit Hang 
> watchdog.

Hi Andre,

Thanks for your bug report and the bisection.  I'm not aware of any 
differences in the x553 from the rest of the x5xx family, and I don't 
have access to one, so I can't do any testing of my own.

I can imagine how removing the call to ixgbe_ipsec_stop_engine() might 
help, but since the driver is still advertising support for the IPsec 
offload, I'm afraid there would still be other problems popping up if 
someone tried to use the offload.  If this particular device, or family 
of devices, is problematic, it might be better to simply filter them out 
at the top of ixgbe_init_ipsec_offload() like we do for the 82598.
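
A minimal sketch of what such an early filter could look like, written as a standalone model rather than as driver code. Every name in it (mac_type, adapter_model, mac_supports_ipsec, init_ipsec_offload_model) is hypothetical, and treating the X553 as an "x550em_a"-class MAC without IPsec support is an assumption that nothing in this thread has confirmed yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver's MAC type enum. */
enum mac_type {
	MAC_82598EB,
	MAC_82599EB,
	MAC_X540,
	MAC_X550,
	MAC_X550EM_A,	/* assumed class of the X553; not confirmed here */
};

/* Just enough adapter state to observe what the init path did. */
struct adapter_model {
	enum mac_type mac;
	bool ipsec_advertised;	/* offload feature flags exposed to the stack */
	bool engine_touched;	/* security engine registers were written */
};

/* Assumption: parts listed here get no IPsec offload at all. */
static bool mac_supports_ipsec(enum mac_type mac)
{
	switch (mac) {
	case MAC_82598EB:
	case MAC_X550EM_A:
		return false;
	default:
		return true;
	}
}

/* Bail out before touching the engine or advertising the feature. */
static void init_ipsec_offload_model(struct adapter_model *a)
{
	assert(a != NULL);
	a->ipsec_advertised = false;
	a->engine_touched = false;

	if (!mac_supports_ipsec(a->mac))
		return;

	/* engine stop/clear and feature-flag setup would go here */
	a->engine_touched = true;
	a->ipsec_advertised = true;
}
```

The point of returning before any register access is that a filtered part neither has its security engine poked nor advertises an offload it cannot deliver.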

In order to help our friends at Intel do some further investigation, 
would you please include the output from "lspci -s <pci-bdf> -nv" so we 
can know for sure which device this is?


Jeff, can you get someone in your test group to take a look at this?  It 
would be good to know if IPsec offload on the embedded devices is going 
to cause us problems.

Thanks,
sln


> 
> The watchdog gets it going eventually, but this can take a while, as 
> seen here:
> 
>> [   14.132485] ixgbe 0000:05:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None
>> [   14.133006] IPv6: ADDRCONF(NETDEV_CHANGE): eno1: link becomes ready
>> [ 1085.067049] ixgbe 0000:05:00.0 eno1: Detected Tx Unit Hang
>>                  Tx Queue             <4>
>>                  TDH, TDT             <1ab>, <1af>
>>                  next_to_use          <1af>
>>                  next_to_clean        <1ab>
>>                tx_buffer_info[next_to_clean]
>>                  time_stamp           <1000393d8>
>>                  jiffies              <100039800>
>> [ 1085.081184] ixgbe 0000:05:00.0 eno1: tx hang 1 detected on queue 4, resetting adapter
>> [ 1085.081189] ixgbe 0000:05:00.0 eno1: initiating reset due to tx timeout
>> [ 1085.082480] ixgbe 0000:05:00.0 eno1: Reset adapter
>> [ 1085.083135] ixgbe 0000:05:00.0 eno1: NIC Link is Down
>> [ 1088.729499] ixgbe 0000:05:00.0 eno1: NIC Link is Up 1 Gbps, Flow Control: None
> 
> A bisection turned up this commit:
> 49a94d74d9489647dfc3449ee135af4bf8fab2c1 is the first bad commit
> commit 49a94d74d9489647dfc3449ee135af4bf8fab2c1
> Author: Shannon Nelson <shannon.nelson@oracle.com>
> Date:   Tue Dec 19 15:59:55 2017 -0800
> 
>      ixgbe: add ipsec engine start and stop routines
> 
>      Add in the code for running and stopping the hardware ipsec
>      encryption/decryption engine.  It is good to keep the engine
>      off when not in use in order to save on the power draw.
> 
>      Signed-off-by: Shannon Nelson <shannon.nelson@oracle.com>
>      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
>      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
> 
> :040000 040000 417954917c2d03feaa973e4cce5ca8f53b099c94 
> 66091763e776ef25a33417a375d8ce149f01b61f M	drivers
> 
> Just reverting that commit on 4.17 is a no go, so I verified it by 
> commenting out the call to ixgbe_ipsec_stop_engine(adapter) in 
> ixgbe_init_ipsec_offload(), and lo and behold, packets started moving 
> without intervention again.
> 
> This is a Supermicro A2SDi-8C-HLN4F, Atom C3758 (Denverton) system with 
> "Intel Corporation Ethernet Connection X553 1GbE (rev 11)" according to 
> pci-id db.
> 
> I'm not using ipsec on this system. The modules are built but not loaded.







* [Intel-wired-lan] (bisected) ixgbe tx hang since 4.16-rc1
  2018-06-04  0:15 ` Shannon Nelson
@ 2018-06-04  0:35   ` Andre Tomt
  2018-06-04  2:32     ` Alexander Duyck
  2018-06-04  4:13   ` Rustad, Mark D
  1 sibling, 1 reply; 5+ messages in thread
From: Andre Tomt @ 2018-06-04  0:35 UTC
  To: intel-wired-lan

On 04. juni 2018 02:15, Shannon Nelson wrote:
> On 6/3/2018 12:55 AM, Andre Tomt wrote:
>> Hello
>>
>> Since kernel 4.16-rc1, up to and including Linus' mainline git tree as of 
>> earlier today, this Atom Denverton system with X553 embedded network 
>> controllers has been unable to send packets until I kick the interface 
>> with an ethtool -r or leave it alone until it trips the TX Unit Hang 
>> watchdog.
> 
> Hi Andre,
> 
> Thanks for your bug report and the bisection.  I'm not aware of any 
> differences in the x553 from the rest of the x5xx family, and I don't 
> have access to one, so I can't do any testing of my own.
> 
> I can imagine how removing the call to ixgbe_ipsec_stop_engine() might 
> help, but since the driver is still advertising support for the IPsec 
> offload, I'm afraid there would still be other problems popping up if 
> someone tried to use the offload.  If this particular device, or family 
> of devices, is problematic, it might be better to simply filter them out 
> at the top of ixgbe_init_ipsec_offload() like we do for the 82598.

Indeed, this was only a minimal semi-revert to verify the bisection result.

> In order to help our friends at Intel do some further investigation, 
> would you please include the output from "lspci -s <pci-bdf> -nv" so we 
> can know for sure which device this is?

Here it is, for one of the pci devices (they are all identical)
root@ubuntu1804:~# lspci -s 05:00.0 -nv
05:00.0 0200: 8086:15e4 (rev 11)
	Subsystem: 8086:0000
	Flags: bus master, fast devsel, latency 0, IRQ 16
	Memory at 7b7f800000 (64-bit, prefetchable) [size=2M]
	Memory at 7b7fa04000 (64-bit, prefetchable) [size=16K]
	Expansion ROM at df680000 [disabled] [size=512K]
	Capabilities: [40] Power Management version 3
	Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
	Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
	Capabilities: [a0] Express Endpoint, MSI 00
	Capabilities: [100] Advanced Error Reporting
	Capabilities: [140] Device Serial Number 00-00-c9-ff-ff-00-00-00
	Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
	Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
	Capabilities: [1b0] Access Control Services
	Kernel driver in use: ixgbe
	Kernel modules: ixgbe


> Jeff, can you get someone in your test group to take a look at this?  It 
> would be good to know if IPsec offload on the embedded devices is going 
> to cause us problems.

> Thanks,
> sln


* [Intel-wired-lan] (bisected) ixgbe tx hang since 4.16-rc1
  2018-06-04  0:35   ` Andre Tomt
@ 2018-06-04  2:32     ` Alexander Duyck
  0 siblings, 0 replies; 5+ messages in thread
From: Alexander Duyck @ 2018-06-04  2:32 UTC
  To: intel-wired-lan

On Sun, Jun 3, 2018 at 5:35 PM, Andre Tomt <andre@tomt.net> wrote:
> On 04. juni 2018 02:15, Shannon Nelson wrote:
>>
>> On 6/3/2018 12:55 AM, Andre Tomt wrote:
>>>
>>> Hello
>>>
>>> Since kernel 4.16-rc1, up to and including Linus' mainline git tree as of
>>> earlier today, this Atom Denverton system with X553 embedded network controllers
>>> has been unable to send packets until I kick the interface with an ethtool -r
>>> or leave it alone until it trips the TX Unit Hang watchdog.
>>
>>
>> Hi Andre,
>>
>> Thanks for your bug report and the bisection.  I'm not aware of any
>> differences in the x553 from the rest of the x5xx family, and I don't have
>> access to one, so I can't do any testing of my own.
>>
>> I can imagine how removing the call to ixgbe_ipsec_stop_engine() might
>> help, but since the driver is still advertising support for the IPsec
>> offload, I'm afraid there would still be other problems popping up if
>> someone tried to use the offload.  If this particular device, or family of
>> devices, is problematic, it might be better to simply filter them out at the
>> top of ixgbe_init_ipsec_offload() like we do for the 82598.
>
>
> Indeed, this was only a minimal semi-revert to verify the bisection result.
>
>> In order to help our friends at Intel do some further investigation, would
>> you please include the output from "lspci -s <pci-bdf> -nv" so we can know
>> for sure which device this is?
>
>
> Here it is, for one of the pci devices (they are all identical)
> root@ubuntu1804:~# lspci -s 05:00.0 -nv
> 05:00.0 0200: 8086:15e4 (rev 11)
>         Subsystem: 8086:0000
>         Flags: bus master, fast devsel, latency 0, IRQ 16
>         Memory at 7b7f800000 (64-bit, prefetchable) [size=2M]
>         Memory at 7b7fa04000 (64-bit, prefetchable) [size=16K]
>         Expansion ROM at df680000 [disabled] [size=512K]
>         Capabilities: [40] Power Management version 3
>         Capabilities: [50] MSI: Enable- Count=1/1 Maskable+ 64bit+
>         Capabilities: [70] MSI-X: Enable+ Count=64 Masked-
>         Capabilities: [a0] Express Endpoint, MSI 00
>         Capabilities: [100] Advanced Error Reporting
>         Capabilities: [140] Device Serial Number 00-00-c9-ff-ff-00-00-00
>         Capabilities: [150] Alternative Routing-ID Interpretation (ARI)
>         Capabilities: [160] Single Root I/O Virtualization (SR-IOV)
>         Capabilities: [1b0] Access Control Services
>         Kernel driver in use: ixgbe
>         Kernel modules: ixgbe
>
>
>> Jeff, can you get someone in your test group to take a look at this?  It
>> would be good to know if IPsec offload on the embedded devices is going to
>> cause us problems.
>
>
>> Thanks,
>> sln

Did we ever try testing this on any parts that had the security
offloads disabled? The 82599, x540, and x550 have the option to
disable the security via a pin on the board as per section 4.6.12 of
their data sheets. In the case of 82599 I guess it was one of the
software defined pins, and then for x540 and x550 it became an
explicit pin. I'm wondering if what we are seeing is a part that has
security offload explicitly disabled and us messing with the registers
is somehow confusing it since we are attempting to update fields that
should be locked out.

Another possibility is that the link tweaking in ixgbe_ipsec_stop_data
is causing some sort of issue since the part in use here is a 1G part
if I understand the part number correctly. One thing we may want to
look at doing is just test and bail if the tx_rdy and rx_rdy bits are
already set before we check the link state. Really the code should
also probably be checking to see if we are even tweaking the security
mode before going through all this. Maybe a check to see if SECTX_DIS
is already set before we try to change things. That would help to take
care of the case I called out above where SECTX_DIS and TX_DIS can be
read-only on a system with these registers fused off.
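
The ordering of those checks can be sketched as a small decision function. This is a standalone model under stated assumptions, not the driver's code: the register struct, the bit masks, the enum, and plan_stop_data() are all hypothetical, the bit positions are placeholders, and the Rx-side disable check is added by symmetry with the SECTX_DIS check suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit masks; the real register layout is not taken from
 * the data sheet here. */
#define SECTXCTRL_SECTX_DIS	0x1u
#define SECRXCTRL_SECRX_DIS	0x1u
#define SECTXSTAT_SECTX_RDY	0x1u
#define SECRXSTAT_SECRX_RDY	0x1u

/* Snapshot of the four security registers the decision looks at. */
struct sec_regs {
	uint32_t sectxctrl;
	uint32_t secrxctrl;
	uint32_t sectxstat;
	uint32_t secrxstat;
};

enum stop_plan {
	STOP_SKIP_ALREADY_DISABLED,	/* nothing to change, leave registers alone */
	STOP_PATHS_READY,		/* data paths idle, no link tweaking needed */
	STOP_NEEDS_LINK_CHECK,		/* fall through to the link/loopback path */
};

/* Decide how much work stopping the security data path needs, bailing
 * as early as possible so the link is only touched as a last resort. */
static enum stop_plan plan_stop_data(const struct sec_regs *r)
{
	bool tx_dis = (r->sectxctrl & SECTXCTRL_SECTX_DIS) != 0;
	bool rx_dis = (r->secrxctrl & SECRXCTRL_SECRX_DIS) != 0;
	bool tx_rdy = (r->sectxstat & SECTXSTAT_SECTX_RDY) != 0;
	bool rx_rdy = (r->secrxstat & SECRXSTAT_SECRX_RDY) != 0;

	/* Already disabled (or fused off and read-only): do not write. */
	if (tx_dis && rx_dis)
		return STOP_SKIP_ALREADY_DISABLED;

	/* Both paths report ready: bail before looking at link state. */
	if (tx_rdy && rx_rdy)
		return STOP_PATHS_READY;

	return STOP_NEEDS_LINK_CHECK;
}
```

Only the last outcome would reach the code that inspects the link and may enable loopback, which is the part suspected of upsetting a 1G device.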

Really I am not sure how safe that code is, since it enables and disables
loopback. Usually link changes like that are followed by a reset to
flush out the link setup. Doing this early in the setup after a reset
may be problematic at best and should be avoided if at all possible.
We might want to take a look at moving the ixgbe_init_ipsec_offload
call up from where it is currently to somewhere just after the
init_swfw_sync call but before the call to reset_hw. That may give us
enough leeway that we don't have to worry about the link issues since
from the sound of things all it takes is one reset and this issue goes
away anyway.

- Alex


* [Intel-wired-lan] (bisected) ixgbe tx hang since 4.16-rc1
  2018-06-04  0:15 ` Shannon Nelson
  2018-06-04  0:35   ` Andre Tomt
@ 2018-06-04  4:13   ` Rustad, Mark D
  1 sibling, 0 replies; 5+ messages in thread
From: Rustad, Mark D @ 2018-06-04  4:13 UTC
  To: intel-wired-lan

Shannon,

-Andre

On Jun 3, 2018, at 5:15 PM, Shannon Nelson <shannon.nelson@oracle.com> wrote:

> Thanks for your bug report and the bisection.  I'm not aware of any  
> differences in the x553 from the rest of the x5xx family, and I don't  
> have access to one, so I can't do any testing of my own.

I seem to recall that the X553 in Denverton does not have IPsec support.

-- 
Mark Rustad, Networking Division, Intel Corporation

