netdev.vger.kernel.org archive mirror
* NFS over NAT causes e1000e transmit hangs
@ 2017-04-18 18:18 Florian Fainelli
  2017-04-18 19:03 ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Fainelli @ 2017-04-18 18:18 UTC (permalink / raw)
  To: intel-wired-lan, jeffrey.t.kirsher; +Cc: netdev

Hi,

I am using NFS over a NAT with two e1000e adapters, with eth1 being
the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
16.10 kernel: 4.8.0-46-generic. The device doing NFS over NAT is just
mounting a remote folder and doing normal execution/file accesses. It's
enough to untar a file from this device onto an NFS share to expose the
problem.

The transmit hangs look like the ones below. Doing an rmmod/insmod does
not eliminate the problem, nor does a power cycle. Stopping the
NFS-over-NAT traffic definitely does let the adapter recover.

Happy to test any patches/newer kernels if you think there is something
obviously wrong. It *seems* to have started when I updated to 4.8.x, and
I was not able to see this under 4.4, so a first step could be to try a
bisection, time permitting.

The two devices involved in the NAT are:

fainelli@fainelli-desktop:[~/../linux]$ lspci -s 0000:09:00.0 -v
09:00.0 Ethernet controller: Intel Corporation 82574L Gigabit Network Connection
        Subsystem: Intel Corporation Gigabit CT Desktop Adapter
        Flags: bus master, fast devsel, latency 0, IRQ 17
        Memory at ef6c0000 (32-bit, non-prefetchable) [size=128K]
        Memory at ef600000 (32-bit, non-prefetchable) [size=512K]
        I/O ports at b000 [size=32]
        Memory at ef6e0000 (32-bit, non-prefetchable) [size=16K]
        Expansion ROM at ef680000 [disabled] [size=256K]
        Capabilities: <access denied>
        Kernel driver in use: e1000e
        Kernel modules: e1000e

fainelli@fainelli-desktop:[~/../linux]$ lspci -s 0000:00:19.0 -v
00:19.0 Ethernet controller: Intel Corporation 82579LM Gigabit Network Connection (rev 05)
        Subsystem: Dell 82579LM Gigabit Network Connection
        Flags: bus master, fast devsel, latency 0, IRQ 43
        Memory at ef900000 (32-bit, non-prefetchable) [size=128K]
        Memory at ef929000 (32-bit, non-prefetchable) [size=4K]
        I/O ports at f040 [size=32]
        Capabilities: <access denied>
        Kernel driver in use: e1000e
        Kernel modules: e1000e

[516481.589090] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
                  TDH                  <9b>
                  TDT                  <b0>
                  next_to_use          <b0>
                  next_to_clean        <96>
                buffer_info[next_to_clean]:
                  time_stamp           <107b0fc76>
                  next_to_watch        <9b>
                  jiffies              <107b10048>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3c00>
                PHY Extended Status    <3000>
                PCI Status             <10>
[516483.573120] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
                  TDH                  <9b>
                  TDT                  <b0>
                  next_to_use          <b0>
                  next_to_clean        <96>
                buffer_info[next_to_clean]:
                  time_stamp           <107b0fc76>
                  next_to_watch        <9b>
                  jiffies              <107b10238>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3c00>
                PHY Extended Status    <3000>
                PCI Status             <10>
[516485.589452] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
                  TDH                  <9b>
                  TDT                  <b0>
                  next_to_use          <b0>
                  next_to_clean        <96>
                buffer_info[next_to_clean]:
                  time_stamp           <107b0fc76>
                  next_to_watch        <9b>
                  jiffies              <107b10430>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3c00>
                PHY Extended Status    <3000>
                PCI Status             <10>
[516487.573397] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
                  TDH                  <9b>
                  TDT                  <b0>
                  next_to_use          <b0>
                  next_to_clean        <96>
                buffer_info[next_to_clean]:
                  time_stamp           <107b0fc76>
                  next_to_watch        <9b>
                  jiffies              <107b10620>
                  next_to_watch.status <0>
                MAC Status             <40080083>
                PHY Status             <796d>
                PHY 1000BASE-T Status  <3c00>
                PHY Extended Status    <3000>
                PCI Status             <10>
[516487.700509] e1000e 0000:00:19.0 eth0: Reset adapter unexpectedly
[516491.526799] e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: Rx/Tx
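
For anyone decoding the dumps above, here is a minimal Python sketch (not
part of the e1000e driver or any Intel tool) that parses "Detected
Hardware Unit Hang" reports and derives three numbers: descriptors still
queued to hardware (TDT - TDH), descriptors not yet cleaned by the driver
(next_to_use - next_to_clean), and how long the oldest buffer has been
stuck (jiffies - time_stamp). The function names and the ring size of 256
are assumptions made for illustration; the tick rate is estimated from two
consecutive reports rather than assumed.

```python
import re

# Matches indented "name   <hex>" lines from the hang dump.
FIELD = re.compile(r"^\s*(\S.*?)\s+<([0-9a-f]+)>\s*$")
# Matches the "[seconds.micros]" prefix printed by the kernel log.
HEADER = re.compile(r"^\[\s*(\d+\.\d+)\]")

# First two reports from the log above, reduced to the fields used here.
SAMPLE = """\
[516481.589090] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
                  TDH                  <9b>
                  TDT                  <b0>
                  next_to_use          <b0>
                  next_to_clean        <96>
                buffer_info[next_to_clean]:
                  time_stamp           <107b0fc76>
                  jiffies              <107b10048>
[516483.573120] e1000e 0000:00:19.0 eth0: Detected Hardware Unit Hang:
                  TDH                  <9b>
                  TDT                  <b0>
                  next_to_use          <b0>
                  next_to_clean        <96>
                buffer_info[next_to_clean]:
                  time_stamp           <107b0fc76>
                  jiffies              <107b10238>
"""


def parse_hang_reports(log_text):
    """Return one dict per hang report; hex fields are converted to int."""
    reports = []
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header and "Detected Hardware Unit Hang" in line:
            reports.append({"timestamp": float(header.group(1))})
            continue
        field = FIELD.match(line)
        if field and reports:
            reports[-1][field.group(1)] = int(field.group(2), 16)
    return reports


def estimate_hz(first, second):
    """Estimate jiffies per second from two consecutive reports."""
    ticks = second["jiffies"] - first["jiffies"]
    seconds = second["timestamp"] - first["timestamp"]
    return ticks / seconds


def summarize(report, ring_size=256):
    """ring_size=256 is an assumption, not a value read from the log."""
    return {
        "queued_to_hw": (report["TDT"] - report["TDH"]) % ring_size,
        "not_cleaned": (report["next_to_use"] - report["next_to_clean"]) % ring_size,
        "stalled_jiffies": report["jiffies"] - report["time_stamp"],
    }


reports = parse_hang_reports(SAMPLE)
hz = estimate_hz(reports[0], reports[1])
summary = summarize(reports[0])
print(summary, "estimated HZ:", round(hz))
print("stalled for about %.1f s" % (summary["stalled_jiffies"] / round(hz)))
```

TDH, TDT, next_to_clean and time_stamp are identical in every report
above while only jiffies advances, which is consistent with the transmit
ring making no progress at all between watchdog checks.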

Thanks for reading, here is a virtual potato: 0.
-- 
Florian


* Re: NFS over NAT causes e1000e transmit hangs
  2017-04-18 18:18 NFS over NAT causes e1000e transmit hangs Florian Fainelli
@ 2017-04-18 19:03 ` Eric Dumazet
  2017-04-18 19:05   ` Florian Fainelli
  0 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2017-04-18 19:03 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: intel-wired-lan, jeffrey.t.kirsher, netdev

On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:
> Hi,
> 
> I am using NFS over a NAT with two e1000e adapters and with eth1 being
> the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
> 16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
> mounting a remote folder and doing normal execution/file accesses. It's
> enough to untar a file from this device onto a NFS share to expose the
> problem.
> 
> The transmit hangs look like the ones below, doing a rmmod/insmod does
> not help eliminated the problem, nor does a power cycle. Stopping the
> NFS over NAT definitively does let the adapter recover.

Is this NFS over TCP or UDP ?

Thanks !


* Re: NFS over NAT causes e1000e transmit hangs
  2017-04-18 19:03 ` Eric Dumazet
@ 2017-04-18 19:05   ` Florian Fainelli
  2017-04-19  8:52     ` [Intel-wired-lan] " Neftin, Sasha
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Fainelli @ 2017-04-18 19:05 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: intel-wired-lan, jeffrey.t.kirsher, netdev

On 04/18/2017 12:03 PM, Eric Dumazet wrote:
> On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:
>> Hi,
>>
>> I am using NFS over a NAT with two e1000e adapters and with eth1 being
>> the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
>> 16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
>> mounting a remote folder and doing normal execution/file accesses. It's
>> enough to untar a file from this device onto a NFS share to expose the
>> problem.
>>
>> The transmit hangs look like the ones below, doing a rmmod/insmod does
>> not help eliminated the problem, nor does a power cycle. Stopping the
>> NFS over NAT definitively does let the adapter recover.
> 
> Is this NFS over TCP or UDP ?

This is NFS over TCP mounted with the following:

type nfs
(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)
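
The mount options above can be read mechanically. This is a small Python
sketch, assuming nothing beyond the option string itself, that splits it
into key/value pairs and works out roughly how many MSS-sized TCP
segments a single maximum-size NFS WRITE (wsize) turns into. The MSS of
1448 bytes is an assumption (1500-byte MTU with TCP timestamps), not
something stated in this thread, and parse_mount_options is a
hypothetical helper name.

```python
import math

MOUNT_OPTIONS = (
    "(rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,"
    "proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,"
    "addr=X.X.X.X)"
)

# Assumption: 1500-byte MTU minus IPv4/TCP headers and TCP timestamps.
ASSUMED_MSS = 1448


def parse_mount_options(text):
    """Split '(a,b=1,...)' into a dict; bare flags map to True."""
    options = {}
    for item in text.strip().strip("()").split(","):
        key, sep, value = item.partition("=")
        options[key] = value if sep else True
    return options


opts = parse_mount_options(MOUNT_OPTIONS)
wsize = int(opts["wsize"])
segments_per_write = math.ceil(wsize / ASSUMED_MSS)

print("transport:", opts["proto"], "NFS version:", opts["vers"])
print("wsize: %d bytes (%d MiB)" % (wsize, wsize // (1024 * 1024)))
print("segments per full-size WRITE at assumed MSS:", segments_per_write)
```

With 1 MiB reads and writes over TCP, each NFS operation spans hundreds
of wire-size segments, so segmentation offload is exercised heavily by
this workload.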

Thanks Eric!
-- 
Florian


* Re: [Intel-wired-lan] NFS over NAT causes e1000e transmit hangs
  2017-04-18 19:05   ` Florian Fainelli
@ 2017-04-19  8:52     ` Neftin, Sasha
  2017-04-19 21:15       ` Florian Fainelli
  0 siblings, 1 reply; 8+ messages in thread
From: Neftin, Sasha @ 2017-04-19  8:52 UTC (permalink / raw)
  To: Florian Fainelli, Eric Dumazet; +Cc: netdev, intel-wired-lan

On 4/18/2017 22:05, Florian Fainelli wrote:
> On 04/18/2017 12:03 PM, Eric Dumazet wrote:
>> On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:
>>> Hi,
>>>
>>> I am using NFS over a NAT with two e1000e adapters and with eth1 being
>>> the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
>>> 16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
>>> mounting a remote folder and doing normal execution/file accesses. It's
>>> enough to untar a file from this device onto a NFS share to expose the
>>> problem.
>>>
>>> The transmit hangs look like the ones below, doing a rmmod/insmod does
>>> not help eliminated the problem, nor does a power cycle. Stopping the
>>> NFS over NAT definitively does let the adapter recover.
>> Is this NFS over TCP or UDP ?
> This is NFS over TCP mounted with the following:
>
> type nfs
> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)
>
> Thanks Eric!

Please try disabling TCP segmentation offload (TSO): ethtool -K <adapter> tso off
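
To confirm that the setting actually took effect, the feature list
printed by `ethtool -k <adapter>` can be checked. Below is a minimal
Python sketch of that check. The helper names are hypothetical, and the
sample text is an illustrative excerpt of typical `ethtool -k` output,
not output captured from the reporter's machine.

```python
import subprocess


def tso_off_command(interface):
    """Build the command suggested above; run it with root privileges."""
    return ["ethtool", "-K", interface, "tso", "off"]


def parse_offload_features(ethtool_k_output):
    """Map feature name -> True/False from `ethtool -k` style output."""
    features = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.partition(":")
        if not sep or not value.strip():
            continue  # skips the "Features for <dev>:" header line
        state = value.split()[0]  # drops a trailing "[fixed]" marker
        if state in ("on", "off"):
            features[name.strip()] = state == "on"
    return features


def tso_enabled(interface):
    """Query the live interface; requires ethtool to be installed."""
    result = subprocess.run(
        ["ethtool", "-k", interface],
        check=True, capture_output=True, text=True,
    )
    return parse_offload_features(result.stdout).get("tcp-segmentation-offload")


# Illustrative sample only (assumed shape of the output after the change).
SAMPLE_AFTER = """\
Features for eth0:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp-segmentation-offload: off
generic-segmentation-offload: on
rx-vlan-offload: on [fixed]
"""

features = parse_offload_features(SAMPLE_AFTER)
print(" ".join(tso_off_command("eth0")))
print("TSO enabled:", features["tcp-segmentation-offload"])
```

Note that a setting made with `ethtool -K` does not persist across a
reboot or a driver reload, so it has to be reapplied by whatever brings
the interface up.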


* Re: [Intel-wired-lan] NFS over NAT causes e1000e transmit hangs
  2017-04-19  8:52     ` [Intel-wired-lan] " Neftin, Sasha
@ 2017-04-19 21:15       ` Florian Fainelli
  2017-04-23  6:46         ` Neftin, Sasha
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Fainelli @ 2017-04-19 21:15 UTC (permalink / raw)
  To: Neftin, Sasha, Eric Dumazet; +Cc: netdev, intel-wired-lan

On 04/19/2017 01:52 AM, Neftin, Sasha wrote:
> On 4/18/2017 22:05, Florian Fainelli wrote:
>> On 04/18/2017 12:03 PM, Eric Dumazet wrote:
>>> On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:
>>>> Hi,
>>>>
>>>> I am using NFS over a NAT with two e1000e adapters and with eth1 being
>>>> the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
>>>> 16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
>>>> mounting a remote folder and doing normal execution/file accesses. It's
>>>> enough to untar a file from this device onto a NFS share to expose the
>>>> problem.
>>>>
>>>> The transmit hangs look like the ones below, doing a rmmod/insmod does
>>>> not help eliminated the problem, nor does a power cycle. Stopping the
>>>> NFS over NAT definitively does let the adapter recover.
>>> Is this NFS over TCP or UDP ?
>> This is NFS over TCP mounted with the following:
>>
>> type nfs
>> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)
>>
>>
>> Thanks Eric!
> 
> Please, try disable TCP segmentation offload: ethtool -K <adapter> tso off.

I am not able to reproduce the hangs with TSO turned off. Is there a
specific patch you would want me to try?
-- 
Florian


* Re: [Intel-wired-lan] NFS over NAT causes e1000e transmit hangs
  2017-04-19 21:15       ` Florian Fainelli
@ 2017-04-23  6:46         ` Neftin, Sasha
  2017-04-23 17:08           ` Florian Fainelli
  0 siblings, 1 reply; 8+ messages in thread
From: Neftin, Sasha @ 2017-04-23  6:46 UTC (permalink / raw)
  To: Florian Fainelli, Eric Dumazet; +Cc: netdev, intel-wired-lan

On 4/20/2017 00:15, Florian Fainelli wrote:
> On 04/19/2017 01:52 AM, Neftin, Sasha wrote:
>> On 4/18/2017 22:05, Florian Fainelli wrote:
>>> On 04/18/2017 12:03 PM, Eric Dumazet wrote:
>>>> On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:
>>>>> Hi,
>>>>>
>>>>> I am using NFS over a NAT with two e1000e adapters and with eth1 being
>>>>> the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
>>>>> 16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
>>>>> mounting a remote folder and doing normal execution/file accesses. It's
>>>>> enough to untar a file from this device onto a NFS share to expose the
>>>>> problem.
>>>>>
>>>>> The transmit hangs look like the ones below, doing a rmmod/insmod does
>>>>> not help eliminated the problem, nor does a power cycle. Stopping the
>>>>> NFS over NAT definitively does let the adapter recover.
>>>> Is this NFS over TCP or UDP ?
>>> This is NFS over TCP mounted with the following:
>>>
>>> type nfs
>>> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)
>>>
>>>
>>> Thanks Eric!
>> Please, try disable TCP segmentation offload: ethtool -K <adapter> tso off.
> I am not able to reproduce the hangs with TSO turned off. Is there a
> specific patch you would want me to try?

Please keep working with TSO turned off, then. There is no patch for
this specific problem.


* Re: [Intel-wired-lan] NFS over NAT causes e1000e transmit hangs
  2017-04-23  6:46         ` Neftin, Sasha
@ 2017-04-23 17:08           ` Florian Fainelli
  2017-04-23 17:24             ` Eric Dumazet
  0 siblings, 1 reply; 8+ messages in thread
From: Florian Fainelli @ 2017-04-23 17:08 UTC (permalink / raw)
  To: Neftin, Sasha, Eric Dumazet; +Cc: netdev, intel-wired-lan



On 04/22/2017 11:46 PM, Neftin, Sasha wrote:
> On 4/20/2017 00:15, Florian Fainelli wrote:
>> On 04/19/2017 01:52 AM, Neftin, Sasha wrote:
>>> On 4/18/2017 22:05, Florian Fainelli wrote:
>>>> On 04/18/2017 12:03 PM, Eric Dumazet wrote:
>>>>> On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am using NFS over a NAT with two e1000e adapters and with eth1
>>>>>> being
>>>>>> the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
>>>>>> 16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
>>>>>> mounting a remote folder and doing normal execution/file accesses.
>>>>>> It's
>>>>>> enough to untar a file from this device onto a NFS share to expose
>>>>>> the
>>>>>> problem.
>>>>>>
>>>>>> The transmit hangs look like the ones below, doing a rmmod/insmod
>>>>>> does
>>>>>> not help eliminated the problem, nor does a power cycle. Stopping the
>>>>>> NFS over NAT definitively does let the adapter recover.
>>>>> Is this NFS over TCP or UDP ?
>>>> This is NFS over TCP mounted with the following:
>>>>
>>>> type nfs
>>>> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)
>>>>
>>>>
>>>>
>>>> Thanks Eric!
>>> Please, try disable TCP segmentation offload: ethtool -K <adapter>
>>> tso off.
>> I am not able to reproduce the hangs with TSO turned off. Is there a
>> specific patch you would want me to try?
> 
> Please, work with TSO turned off so. There is no patch for this specific
> problem.

OK, but aren't we interested in somehow identifying such problematic
packets coming from the networking stack and forcing TSO off for those?
Would an acceptable solution be to force-disable TSO for this specific
NIC model (provided it is some kind of HW bug)?

NB: I understand this is very old hardware for you at Intel, but
conversely, it is very widespread, and the chances of people running into
similar issues are pretty high, so fixing it once would de facto lower
the amount of support you'd have to provide in the future.

Thanks
-- 
Florian


* Re: [Intel-wired-lan] NFS over NAT causes e1000e transmit hangs
  2017-04-23 17:08           ` Florian Fainelli
@ 2017-04-23 17:24             ` Eric Dumazet
  0 siblings, 0 replies; 8+ messages in thread
From: Eric Dumazet @ 2017-04-23 17:24 UTC (permalink / raw)
  To: Florian Fainelli; +Cc: Neftin, Sasha, netdev, intel-wired-lan

On Sun, 2017-04-23 at 10:08 -0700, Florian Fainelli wrote:
> 
> On 04/22/2017 11:46 PM, Neftin, Sasha wrote:
> > On 4/20/2017 00:15, Florian Fainelli wrote:
> >> On 04/19/2017 01:52 AM, Neftin, Sasha wrote:
> >>> On 4/18/2017 22:05, Florian Fainelli wrote:
> >>>> On 04/18/2017 12:03 PM, Eric Dumazet wrote:
> >>>>> On Tue, 2017-04-18 at 11:18 -0700, Florian Fainelli wrote:
> >>>>>> Hi,
> >>>>>>
> >>>>>> I am using NFS over a NAT with two e1000e adapters and with eth1
> >>>>>> being
> >>>>>> the LAN interface and eth0 the WAN interface. The kernel is Ubuntu's
> >>>>>> 16.10 kernel: 4.8.0-46-generic. The device doing NAT over NFS is just
> >>>>>> mounting a remote folder and doing normal execution/file accesses.
> >>>>>> It's
> >>>>>> enough to untar a file from this device onto a NFS share to expose
> >>>>>> the
> >>>>>> problem.
> >>>>>>
> >>>>>> The transmit hangs look like the ones below, doing a rmmod/insmod
> >>>>>> does
> >>>>>> not help eliminated the problem, nor does a power cycle. Stopping the
> >>>>>> NFS over NAT definitively does let the adapter recover.
> >>>>> Is this NFS over TCP or UDP ?
> >>>> This is NFS over TCP mounted with the following:
> >>>>
> >>>> type nfs
> >>>> (rw,relatime,vers=3,rsize=1048576,wsize=1048576,namlen=255,hard,proto=tcp,port=2049,timeo=70,retrans=3,sec=sys,local_lock=none,addr=X.X.X.X)
> >>>>
> >>>>
> >>>>
> >>>> Thanks Eric!
> >>> Please, try disable TCP segmentation offload: ethtool -K <adapter>
> >>> tso off.
> >> I am not able to reproduce the hangs with TSO turned off. Is there a
> >> specific patch you would want me to try?
> > 
> > Please, work with TSO turned off so. There is no patch for this specific
> > problem.
> 
> OK, are not we interested in somehow being able to identify such
> problematic packets coming from the networking stack and force not using
> TSO for those? Would an acceptable solution be to force the disabling of
> TSO for this specific NIC model (provided it is some kind of HW bug)?
> 
> NB: I understand this is very old hardware for you at Intel, but
> conversely, it is very widespread, and chances of people running into
> similar issues are pretty high, so fixing it once would de-facto lower
> the amount of support you'd have to provide in the future.

Indeed it is very odd to disable TSO, especially if the problem only
shows up with NAT.

We probably have a nasty bug somewhere, or we might be able to work
around some hardware 'feature'.

