netdev.vger.kernel.org archive mirror
* arping stuck with ENOBUFS in 4.19.150
@ 2020-10-22 15:19 Joakim Tjernlund
  2020-10-26 12:58 ` Joakim Tjernlund
  0 siblings, 1 reply; 11+ messages in thread
From: Joakim Tjernlund @ 2020-10-22 15:19 UTC (permalink / raw)
  To: netdev

strace arping -q -c 1 -b -U  -I eth1 0.0.0.0
...
sendto(3, "\0\1\10\0\6\4\0\1\0\6\234\v\6 \v\v\v\v\377\377\377\377\377\377\0\0\0\0", 28, 0, {sa_family=AF_PACKET, proto=0x806, if4, pkttype=PACKET_HOST, addr(6)={1, ffffffffffff}, 20) = -1 ENOBUFS (No buffer space available)
....
and then arping loops.

In 4.19.127 it was:
sendto(3, "\0\1\10\0\6\4\0\1\0\6\234\5\271\362\n\322\212E\377\377\377\377\377\377\0\0\0\0", 28, 0, {sa_family=AF_PACKET, proto=0x806, if4, pkttype=PACKET_HOST, addr(6)={1, ffffffffffff}, 20) = 28

Seems like something changed the IP behaviour between 4.19.127 and 4.19.150?
eth1 is UP but not RUNNING and has an IP address.

 Jocke


* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-22 15:19 arping stuck with ENOBUFS in 4.19.150 Joakim Tjernlund
@ 2020-10-26 12:58 ` Joakim Tjernlund
  2020-10-26 16:27   ` Jakub Kicinski
  2020-10-26 18:31   ` David Ahern
  0 siblings, 2 replies; 11+ messages in thread
From: Joakim Tjernlund @ 2020-10-26 12:58 UTC (permalink / raw)
  To: netdev

Ping (maybe it should read "arping" instead :)

 Jocke

On Thu, 2020-10-22 at 17:19 +0200, Joakim Tjernlund wrote:
> strace arping -q -c 1 -b -U  -I eth1 0.0.0.0
> [...]



* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-26 12:58 ` Joakim Tjernlund
@ 2020-10-26 16:27   ` Jakub Kicinski
  2020-10-26 18:31   ` David Ahern
  1 sibling, 0 replies; 11+ messages in thread
From: Jakub Kicinski @ 2020-10-26 16:27 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: netdev

On Mon, 26 Oct 2020 12:58:16 +0000 Joakim Tjernlund wrote:
> On Thu, 2020-10-22 at 17:19 +0200, Joakim Tjernlund wrote:
> > [...]
> > Seems like something has changed the IP behaviour between now and then ?
> > eth1 is UP but not RUNNING and has an IP address.

Seems like nobody knows off the top of their heads.

Any chance you can try 5.10-rc1? Or 5.9?

Or some versions in between 4.19.127 and 4.19.150 to narrow down the
search?


* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-26 12:58 ` Joakim Tjernlund
  2020-10-26 16:27   ` Jakub Kicinski
@ 2020-10-26 18:31   ` David Ahern
  2020-10-29 14:10     ` Joakim Tjernlund
  1 sibling, 1 reply; 11+ messages in thread
From: David Ahern @ 2020-10-26 18:31 UTC (permalink / raw)
  To: Joakim Tjernlund, netdev

On 10/26/20 6:58 AM, Joakim Tjernlund wrote:
> Ping  (maybe it should read "arping" instead :)
> 
>  Jocke
> 
> On Thu, 2020-10-22 at 17:19 +0200, Joakim Tjernlund wrote:
>> [...]
>> Seems like something has changed the IP behaviour between now and then ?
>> eth1 is UP but not RUNNING and has an IP address.
>>
>>  Jocke
> 

Do a git bisect between the releases to find out which commit is causing
the change in behavior.


* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-26 18:31   ` David Ahern
@ 2020-10-29 14:10     ` Joakim Tjernlund
  2020-10-29 15:18       ` David Ahern
  2020-10-29 19:10       ` Cong Wang
  0 siblings, 2 replies; 11+ messages in thread
From: Joakim Tjernlund @ 2020-10-29 14:10 UTC (permalink / raw)
  To: dsahern, netdev, kuba

OK, I bisected it (a bit of a bother since we merge upstream releases into our tree; is there a way to bisect just the upstream changes?)

The result was commit "net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc" (749cc0b0c7f3dcdfe5842f998c0274e54987384f).

Reverting that commit on top of our tree made it work again. How do we fix this?

 Jocke
 
On Mon, 2020-10-26 at 12:31 -0600, David Ahern wrote:
> 
> do a git bisect between the releases to find out which commit is causing
> the change in behavior.



* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-29 14:10     ` Joakim Tjernlund
@ 2020-10-29 15:18       ` David Ahern
  2020-10-30  1:36         ` Yunsheng Lin
  2020-10-29 19:10       ` Cong Wang
  1 sibling, 1 reply; 11+ messages in thread
From: David Ahern @ 2020-10-29 15:18 UTC (permalink / raw)
  To: Joakim Tjernlund, netdev, kuba, Yunsheng Lin

On 10/29/20 8:10 AM, Joakim Tjernlund wrote:
> OK, bisecting (was a bit of a bother since we merge upstream releases into our tree, is there a way to just bisect that?)
> 
> Result was commit "net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc"  (749cc0b0c7f3dcdfe5842f998c0274e54987384f)
> 
> Reverting that commit on top of our tree made it work again. How to fix?

Adding the author of that patch (linyunsheng@huawei.com) to take a look.





* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-29 14:10     ` Joakim Tjernlund
  2020-10-29 15:18       ` David Ahern
@ 2020-10-29 19:10       ` Cong Wang
  1 sibling, 0 replies; 11+ messages in thread
From: Cong Wang @ 2020-10-29 19:10 UTC (permalink / raw)
  To: Joakim Tjernlund; +Cc: dsahern, netdev, kuba

On Thu, Oct 29, 2020 at 7:11 AM Joakim Tjernlund
<Joakim.Tjernlund@infinera.com> wrote:
>
> OK, bisecting (was a bit of a bother since we merge upstream releases into our tree, is there a way to just bisect that?)
>
> Result was commit "net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc"  (749cc0b0c7f3dcdfe5842f998c0274e54987384f)
>
> Reverting that commit on top of our tree made it work again. How to fix?

This is odd. The above commit touches the netdev reset path; did
your netdev get reset when you ran arping? You said your eth1 is UP;
is it always UP, or is it flapping?

In the other thread, a bisect also points to the same commit on 5.4.
I guess there might be something missing in the backport.

Thanks.


* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-29 15:18       ` David Ahern
@ 2020-10-30  1:36         ` Yunsheng Lin
  2020-10-30 11:50           ` Joakim Tjernlund
  0 siblings, 1 reply; 11+ messages in thread
From: Yunsheng Lin @ 2020-10-30  1:36 UTC (permalink / raw)
  To: David Ahern, Joakim Tjernlund, netdev, kuba

On 2020/10/29 23:18, David Ahern wrote:
> On 10/29/20 8:10 AM, Joakim Tjernlund wrote:
>> OK, bisecting (was a bit of a bother since we merge upstream releases into our tree, is there a way to just bisect that?)
>>
>> Result was commit "net: sch_generic: aviod concurrent reset and enqueue op for lockless qdisc"  (749cc0b0c7f3dcdfe5842f998c0274e54987384f)
>>
>> Reverting that commit on top of our tree made it work again. How to fix?
> 
> Adding the author of that patch (linyunsheng@huawei.com) to take a look.
>>>>> [...]
>>>>> eth1 is UP but not RUNNING and has an IP address.

"eth1 is UP but not RUNNING" usually means the user has configured the netdev as up,
but the hardware has not detected a link yet.

Also, what is the output of "ethtool eth1"?

It would also be good to see the status of the netdev before and after running
the arping command.

Thanks.

Unfortunately, I could not reproduce the above problem on 4.19.150 either.

root@(none)$ arping -q -c 1 -b -U  -I eth0 0.0.0.0
root@(none)$ arping -v
ARPing 2.21, by Thomas Habets <thomas@habets.se>
usage: arping [ -0aAbdDeFpPqrRuUv ] [ -w <sec> ] [ -W <sec> ] [ -S <host/ip> ]
              [ -T <host/ip ] [ -s <MAC> ] [ -t <MAC> ] [ -c <count> ]
              [ -C <count> ] [ -i <interface> ] [ -m <type> ] [ -g <group> ]
              [ -V <vlan> ] [ -Q <priority> ] <host/ip/MAC | -B>
For complete usage info, use --help or check the manpage.
root@(none)$ cat /proc/version
Linux version 4.19.150 (linyunsheng@ubuntu) (gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.12)) #4 SMP PREEMPT Fri Oct 30 09:22:06 CST 2020





* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-30  1:36         ` Yunsheng Lin
@ 2020-10-30 11:50           ` Joakim Tjernlund
  2020-10-31  1:48             ` Yunsheng Lin
  0 siblings, 1 reply; 11+ messages in thread
From: Joakim Tjernlund @ 2020-10-30 11:50 UTC (permalink / raw)
  To: dsahern, linyunsheng, netdev, kuba

On Fri, 2020-10-30 at 09:36 +0800, Yunsheng Lin wrote:
> "eth1 is UP but not RUNNING" usually mean user has configure the netdev as up,
> but the hardware has not detected a linkup yet.
> 
> Also What is the output of "ethtool eth1"?

echo 1 >  /sys/class/net/eth1/carrier
cu3-jocke ~ # arping -q -c 1 -b -U  -I eth1 0.0.0.0
cu3-jocke ~ # echo 0 >  /sys/class/net/eth1/carrier
cu3-jocke ~ # arping -q -c 1 -b -U  -I eth1 0.0.0.0
^C
cu3-jocke ~ # ethtool eth1
Settings for eth1:
	Supported ports: [ MII ]
	Supported link modes:   1000baseT/Full 
	Supported pause frame use: Symmetric Receive-only
	Supports auto-negotiation: Yes
	Advertised link modes:  1000baseT/Full 
	Advertised pause frame use: Symmetric Receive-only
	Advertised auto-negotiation: Yes
	Speed: 10Mb/s
	Duplex: Half
	Port: MII
	PHYAD: 1
	Transceiver: external
	Auto-negotiation: on
	Current message level: 0x00000037 (55)
			       drv probe link ifdown ifup
	Link detected: no

We have a writable carrier since the eth device is PHY-less. Maybe that path is different?
Check drivers/net/ethernet/freescale/dpaa/dpa_eth.c

> 
> It would be good to see the status of netdev before and after executing arping cmd
> too.

Hmm, what do you mean?




* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-30 11:50           ` Joakim Tjernlund
@ 2020-10-31  1:48             ` Yunsheng Lin
  2020-11-02  8:27               ` Joakim Tjernlund
  0 siblings, 1 reply; 11+ messages in thread
From: Yunsheng Lin @ 2020-10-31  1:48 UTC (permalink / raw)
  To: Joakim Tjernlund, dsahern, netdev, kuba

On 2020/10/30 19:50, Joakim Tjernlund wrote:
> On Fri, 2020-10-30 at 09:36 +0800, Yunsheng Lin wrote:
>> "eth1 is UP but not RUNNING" usually mean user has configure the netdev as up,
>> but the hardware has not detected a linkup yet.
>>
>> Also What is the output of "ethtool eth1"?
> 
> [...]
> 
> We have a writeable carrier since eth device is PHY less. Maybe that path is different ?
> Check drivers/net/ethernet/freescale/dpaa/dpa_eth.c

The above difference does not seem to matter.

> 
>>
>> It would be good to see the status of netdev before and after executing arping cmd
>> too.
> 
> hmm, how do you mean?

I was trying to find out when the netdev's state became "eth1 is UP but not RUNNING".

Anyway, when I looked at the backported patch, I found that the new qdisc
assignment from the upstream patch is missing.

Please see if the patch below fixes your problem, thanks:

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index bd96fd2..4e15913 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -1116,10 +1116,13 @@ static void dev_deactivate_queue(struct net_device *dev,
                                 struct netdev_queue *dev_queue,
                                 void *_qdisc_default)
 {
        struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc);
+       struct Qdisc *qdisc_default = _qdisc_default;

        if (qdisc) {
                if (!(qdisc->flags & TCQ_F_BUILTIN))
                        set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
+
+               rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
        }
 }






* Re: arping stuck with ENOBUFS in 4.19.150
  2020-10-31  1:48             ` Yunsheng Lin
@ 2020-11-02  8:27               ` Joakim Tjernlund
  0 siblings, 0 replies; 11+ messages in thread
From: Joakim Tjernlund @ 2020-11-02  8:27 UTC (permalink / raw)
  To: dsahern, linyunsheng, netdev, kuba

On Sat, 2020-10-31 at 09:48 +0800, Yunsheng Lin wrote:
> Please see if the below patch fix your problem, thanks:
> 
> diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
> index bd96fd2..4e15913 100644
> --- a/net/sched/sch_generic.c
> +++ b/net/sched/sch_generic.c
> @@ -1116,10 +1116,13 @@ static void dev_deactivate_queue(struct net_device *dev,
>                                  void *_qdisc_default)
>  {
>         struct Qdisc *qdisc = rtnl_dereference(dev_queue->qdisc);
> +       struct Qdisc *qdisc_default = _qdisc_default;
> 
>         if (qdisc) {
>                 if (!(qdisc->flags & TCQ_F_BUILTIN))
>                         set_bit(__QDISC_STATE_DEACTIVATED, &qdisc->state);
> +
> +               rcu_assign_pointer(dev_queue->qdisc, qdisc_default);
>         }
>  }

This patch seems to have resolved the problem, thanks.
Please CC me on the formal patch for 4.19.x.

 Jocke


Thread overview: 11+ messages
2020-10-22 15:19 arping stuck with ENOBUFS in 4.19.150 Joakim Tjernlund
2020-10-26 12:58 ` Joakim Tjernlund
2020-10-26 16:27   ` Jakub Kicinski
2020-10-26 18:31   ` David Ahern
2020-10-29 14:10     ` Joakim Tjernlund
2020-10-29 15:18       ` David Ahern
2020-10-30  1:36         ` Yunsheng Lin
2020-10-30 11:50           ` Joakim Tjernlund
2020-10-31  1:48             ` Yunsheng Lin
2020-11-02  8:27               ` Joakim Tjernlund
2020-10-29 19:10       ` Cong Wang
