* Very slow remove interface from kernel
@ 2023-05-09  8:22 Martin Zaharinov
  2023-05-09 10:20 ` Ido Schimmel
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09  8:22 UTC (permalink / raw)
  To: Eric Dumazet, netdev

Hi Eric,

I think you may be able to help with this.

I tried the following on kernel 6.3.1.

Add the VLANs:
for i in $(seq 2 4094); do ip link add link eth1 name vlan$i type vlan id $i; done
for i in $(seq 2 4094); do ip link set dev vlan$i up; done

and after that run:

for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done

Removing these 4093 VLANs takes 5-10 minutes.

Are there any options to make this faster?

The same problem occurs with 5-6k PPP interfaces: the kernel is very slow to unregister the devices.

Best regards,
m.


* Re: Very slow remove interface from kernel
  2023-05-09  8:22 Very slow remove interface from kernel Martin Zaharinov
@ 2023-05-09 10:20 ` Ido Schimmel
  2023-05-09 10:32   ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Ido Schimmel @ 2023-05-09 10:20 UTC (permalink / raw)
  To: Martin Zaharinov; +Cc: Eric Dumazet, netdev

On Tue, May 09, 2023 at 11:22:13AM +0300, Martin Zaharinov wrote:
> Add the VLANs:
> for i in $(seq 2 4094); do ip link add link eth1 name vlan$i type vlan id $i; done
> for i in $(seq 2 4094); do ip link set dev vlan$i up; done
> 
> and after that run:
> 
> for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
> 
> Removing these 4093 VLANs takes 5-10 minutes.
> 
> Are there any options to make this faster?

If you know you are going to delete all of them together, then you can
add them to the same group during creation:

for i in $(seq 2 4094); do ip link add link eth1 name vlan$i up group 10 type vlan id $i; done

Then delete the group:

ip link del group 10
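If the VLANs already exist without a group, they can presumably still be moved into one with `ip link set dev ... group N` and then removed with a single group delete. A minimal sketch, assuming the vlan2..vlan4094 names from the original loop and that group 10 is not used by any other device (otherwise `link del group 10` would delete those too). The helper name `group_batch` is hypothetical; it only prints ip(8) batch commands, so the output can be reviewed before it is fed to ip as root:

```shell
# Hypothetical helper: print ip(8) batch commands that put vlanFIRST..vlanLAST
# into group GROUP, then delete the whole group with one request.
group_batch() {
    first=$1 last=$2 group=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        # Relabelling a device only changes its group; nothing is torn down yet.
        printf 'link set dev vlan%s group %s\n' "$i" "$group"
        i=$((i + 1))
    done
    # A single delete for the whole group lets the kernel unregister all
    # members together instead of one device per request.
    printf 'link del group %s\n' "$group"
}
```

Usage (as root): `group_batch 2 4094 10 | ip -batch -`. The per-device `link set` requests should be cheap compared with unregistering each device separately; the expensive teardown happens once, in the final `link del group`.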

IIRC, in the past there was a patchset to allow passing a list of
ifindexes instead of a group number, but it never made its way upstream.


* Re: Very slow remove interface from kernel
  2023-05-09 10:20 ` Ido Schimmel
@ 2023-05-09 10:32   ` Eric Dumazet
  2023-05-09 11:10     ` Martin Zaharinov
  0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2023-05-09 10:32 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Martin Zaharinov, netdev

On Tue, May 9, 2023 at 12:20 PM Ido Schimmel <idosch@idosch.org> wrote:
>
> On Tue, May 09, 2023 at 11:22:13AM +0300, Martin Zaharinov wrote:
> > Add the VLANs:
> > for i in $(seq 2 4094); do ip link add link eth1 name vlan$i type vlan id $i; done
> > for i in $(seq 2 4094); do ip link set dev vlan$i up; done
> >
> > and after that run:
> >
> > for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
> >
> > Removing these 4093 VLANs takes 5-10 minutes.
> >
> > Are there any options to make this faster?
>
> If you know you are going to delete all of them together, then you can
> add them to the same group during creation:
>
> for i in $(seq 2 4094); do ip link add link eth1 name vlan$i up group 10 type vlan id $i; done
>
> Then delete the group:
>
> ip link del group 10
>

Another way is to create a netns for retiring devices,
move devices to the 'retirens' when they need to go away.

Then once per minute, delete the retirens and create a new one.

-> This batches netdev deletions.
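The retirement-namespace idea could look roughly like the sketch below, assuming iproute2's `ip netns` and the namespace name "retirens" used above; the helper names `retire_batch` and `recycle_retirens` are hypothetical. Moving a device out frees its name in the host namespace right away, and when the namespace is later destroyed the kernel dismantles the virtual devices left in it (VLANs included) together, in the background, instead of one request at a time.

```shell
# Hypothetical helper: print ip(8) batch commands that move the given
# devices into the retirement namespace NS instead of deleting them.
retire_batch() {
    ns=$1
    shift
    for dev in "$@"; do
        printf 'link set dev %s netns %s\n' "$dev" "$ns"
    done
}

# Hypothetical periodic job (for example once per minute from a timer):
# dropping the namespace lets the kernel tear down every device in it
# as one batch; a fresh namespace is created for the next round.
recycle_retirens() {
    ns=${1:-retirens}
    ip netns del "$ns" 2>/dev/null
    ip netns add "$ns"
}
```

Usage (as root): run `ip netns add retirens` once, then `retire_batch retirens vlan2 vlan3 | ip -batch -` whenever devices have to go away, and call `recycle_retirens` periodically.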


> IIRC, in the past there was a patchset to allow passing a list of
> ifindexes instead of a group number, but it never made its way upstream.


* Re: Very slow remove interface from kernel
  2023-05-09 10:32   ` Eric Dumazet
@ 2023-05-09 11:10     ` Martin Zaharinov
  2023-05-09 12:36       ` Eric Dumazet
  0 siblings, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 11:10 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ido Schimmel, netdev

Hi,

So in short, there is no way to make the kernel itself do this faster?

With older kernels, unregistering devices was much faster.

With the latest kernels (6.x and newer) it is very slow.


Is there any chance this could be made faster?


m.



* Re: Very slow remove interface from kernel
  2023-05-09 11:10     ` Martin Zaharinov
@ 2023-05-09 12:36       ` Eric Dumazet
  2023-05-09 18:50         ` Martin Zaharinov
  2023-05-09 20:08         ` Martin Zaharinov
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2023-05-09 12:36 UTC (permalink / raw)
  To: Martin Zaharinov; +Cc: Ido Schimmel, netdev

On Tue, May 9, 2023 at 1:10 PM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi,
>
> So in short, there is no way to make the kernel itself do this faster?

Make sure your kernel does not include options you do not need.

>
> With older kernels, unregistering devices was much faster.
>
> With the latest kernels (6.x and newer) it is very slow.
>
>

Yup, I feel your pain.

Maybe you should start a bisection then...

You might find that you have some CONFIG_ option that makes this
operation very slow.

Some layers (like hamradio and others) lack batch operations in their
netdev removal handlers.

For instance, on one machine I have access to and with my standard
.config, your benchmark gives a not too bad result with pristine
linux-6.3

modprobe dummy
ip link set dev dummy0 up
for i in $(seq 2 4094); do ip link add link dummy0 name vlan$i type vlan id $i; done
for i in $(seq 2 4094); do ip link set dev vlan$i up; done
time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
real 0m55.808s
user 0m0.788s
sys 0m6.868s

Without batching, I think one netdev removal needs three synchronize_net() calls.

I am reasonably certain numbers would not look so good if I booted a
"make allyesconfig" kernel.

* Re: Very slow remove interface from kernel
  2023-05-09 12:36       ` Eric Dumazet
@ 2023-05-09 18:50         ` Martin Zaharinov
  2023-05-09 20:08           ` Ido Schimmel
  2023-05-09 20:08         ` Martin Zaharinov
  1 sibling, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 18:50 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ido Schimmel, netdev

Hi Eric,

I tried on kernel 6.3.1:


time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done

real	4m51.633s  (here I stopped it with Ctrl+C and reran it; the second part finished after 3 more minutes)
user	0m7.479s
sys	0m0.367s


The config is very clean; I removed a large part of the CONFIG options.

Is there a way to debug what is happening?

m



* Re: Very slow remove interface from kernel
  2023-05-09 12:36       ` Eric Dumazet
  2023-05-09 18:50         ` Martin Zaharinov
@ 2023-05-09 20:08         ` Martin Zaharinov
  1 sibling, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 20:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Ido Schimmel, netdev

One more thing: see this video. During the removal I ran:

watch -n.1 "ip a | grep UP | wc"

to see how many interfaces are removed per second.

[-- Attachment #2: Screen Recording 2023-05-09 at 23.06.52.mov --]
[-- Type: video/quicktime, Size: 606423 bytes --]


* Re: Very slow remove interface from kernel
  2023-05-09 18:50         ` Martin Zaharinov
@ 2023-05-09 20:08           ` Ido Schimmel
  2023-05-09 20:16             ` Martin Zaharinov
                               ` (3 more replies)
  0 siblings, 4 replies; 16+ messages in thread
From: Ido Schimmel @ 2023-05-09 20:08 UTC (permalink / raw)
  To: Martin Zaharinov; +Cc: Eric Dumazet, netdev

On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
> I tried on kernel 6.3.1:
> 
> 
> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
> 
> real	4m51.633s  (here I stopped it with Ctrl+C and reran it; the second part finished after 3 more minutes)
> user	0m7.479s
> sys	0m0.367s

You are off-CPU most of the time; the question is what is blocking. I'm
getting the following results with net-next:

# time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
real 177.09
user 3.85
sys 31.26

When using a batch file to perform the deletion:

# time -p ip -b vlan_del.batch 
real 35.25
user 0.02
sys 3.61
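The contents of vlan_del.batch are not shown. Assuming the eth0.N names from the loop above, a file like it could be generated as follows; the helper name `vlan_del_batch` is hypothetical. Each line is an ip command without the leading `ip`, so the run mainly saves the per-command fork/exec and netlink setup, while every line is still a separate deletion request (which is consistent with the rcu_barrier() time still dominating).

```shell
# Hypothetical helper: print one "link del" command per VLAN device
# PARENT.FIRST .. PARENT.LAST, in ip(8) -batch syntax.
vlan_del_batch() {
    parent=$1 first=$2 last=$3
    i=$first
    while [ "$i" -le "$last" ]; do
        printf 'link del dev %s.%s\n' "$parent" "$i"
        i=$((i + 1))
    done
}
```

Usage: `vlan_del_batch eth0 2 4094 > vlan_del.batch`, then `time -p ip -b vlan_del.batch` as root.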

And to check where we are blocked most of the time while using the batch
file:

# ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
[...]
    __schedule
    schedule
    schedule_timeout
    wait_for_completion
    rcu_barrier
    netdev_run_todo
    rtnetlink_rcv_msg
    netlink_rcv_skb
    netlink_unicast
    netlink_sendmsg
    ____sys_sendmsg
    ___sys_sendmsg
    __sys_sendmsg
    do_syscall_64
    entry_SYSCALL_64_after_hwframe
    -                ip (3660)
        25089479
[...]

We are blocked for around 70% of the time on the rcu_barrier() in
netdev_run_todo().

Note that one big difference between my setup and yours is that in my
case eth0 is a dummy device and in your case it's probably a physical
device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
so, it's possible that a non-negligible amount of time is spent talking
to hardware/firmware to delete the 4K VIDs from the device's VLAN
filter.
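Whether the VLAN filter offload is active on the physical port can be checked with ethtool, which reports it as the `rx-vlan-filter` feature; when that feature is off, the VLAN code should not call ndo_vlan_rx_kill_vid() at all. A minimal sketch; the parser name `vlan_filter_state` is hypothetical, and it only reads `ethtool -k` output from stdin so it can be tried without hardware. If the feature is not `[fixed]`, temporarily running `ethtool -K eth1 rx-vlan-filter off` would be one way to test the hardware/firmware hypothesis, at the cost of the NIC no longer filtering VLAN tags in hardware.

```shell
# Hypothetical parser: read `ethtool -k DEV` output on stdin and print the
# rx-vlan-filter state ("on"/"off"), plus " fixed" if the driver does not
# allow the feature to be changed.
vlan_filter_state() {
    awk '$1 == "rx-vlan-filter:" {
        state = $2
        if ($3 == "[fixed]") state = state " fixed"
        print state
    }'
}
```

Usage: `ethtool -k eth1 | vlan_filter_state`.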


* Re: Very slow remove interface from kernel
  2023-05-09 20:08           ` Ido Schimmel
@ 2023-05-09 20:16             ` Martin Zaharinov
  2023-05-10  5:31             ` Martin Zaharinov
                               ` (2 subsequent siblings)
  3 siblings, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 20:16 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Eric Dumazet, netdev

Hi Ido,

Yes, it is a physical card: an Intel 82599 dual-port 10G NIC, in a 2-socket system with 24 cores at 3 GHz.

These are the timings:

time ./vlanadd

real	0m12.347s
user	0m8.863s
sys	0m2.594s

time ./vlanrem

real	8m59.105s
user	0m11.931s
sys	0m0.035s


Watching with watch -n.1 "ip a | grep UP | wc" while vlanrem runs, about 4-5 VLANs are removed per second.

I think RCU is causing the problem.

I found a post from 2009: https://lore.kernel.org/all/20091024144610.GC6638@linux.vnet.ibm.com/T/

Yes, it is old, and much may have changed since then.

I have the same slow interface removal with PPP interfaces: when more than 800-900 users drop at once, removing the devices and reconnecting (re-adding) them is just as slow.

m.


* Re: Very slow remove interface from kernel
  2023-05-09 20:08           ` Ido Schimmel
  2023-05-09 20:16             ` Martin Zaharinov
@ 2023-05-10  5:31             ` Martin Zaharinov
  2023-05-10  6:06             ` Martin Zaharinov
  2023-05-10  9:16             ` Martin Zaharinov
  3 siblings, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-10  5:31 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Eric Dumazet, netdev

Hi Eric and Ido,

After a little research: after changing CONFIG_HZ_100 to CONFIG_HZ_1000:


vlanadd

real	0m15.106s
user	0m2.420s
sys	0m13.250s

vlandel: 

real	1m10.995s
user	0m1.045s
sys	0m7.678s
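Normalizing the totals reported in this thread to a per-device cost makes them easier to compare. A minimal sketch; the helper name `per_device_ms` is hypothetical, and it assumes vlanrem/vlandel delete the same 4093 VLANs (IDs 2..4094) as the loops quoted earlier.

```shell
# Hypothetical helper: average milliseconds per deleted device, given the
# wall-clock total in seconds and the number of devices removed.
per_device_ms() {
    awk -v total="$1" -v count="$2" 'BEGIN { printf "%.1f\n", total / count * 1000 }'
}
```

With the numbers above, `per_device_ms 539.105 4093` gives 131.7 (vlanrem, HZ=100), `per_device_ms 70.995 4093` gives 17.3 (vlandel, HZ=1000), and `per_device_ms 55.808 4093` gives 13.6 (Eric's dummy0 run), so on this machine the HZ change alone cut the per-device cost by roughly 7.6x.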

I have used HZ=100 for the last 10 years on all installations; they are all networking servers.

Do you have any recommendations?

Best regards,
m



* Re: Very slow remove interface from kernel
  2023-05-09 20:08           ` Ido Schimmel
  2023-05-09 20:16             ` Martin Zaharinov
  2023-05-10  5:31             ` Martin Zaharinov
@ 2023-05-10  6:06             ` Martin Zaharinov
  2023-05-10  9:40               ` Eric Dumazet
  2023-05-10  9:16             ` Martin Zaharinov
  3 siblings, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-10  6:06 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Eric Dumazet, netdev

I think the problem is in this part of the code in net/core/dev.c:



#define WAIT_REFS_MIN_MSECS 1
#define WAIT_REFS_MAX_MSECS 250
/**
 * netdev_wait_allrefs_any - wait until all references are gone.
 * @list: list of net_devices to wait on
 *
 * This is called when unregistering network devices.
 *
 * Any protocol or device that holds a reference should register
 * for netdevice notification, and cleanup and put back the
 * reference if they receive an UNREGISTER event.
 * We can get stuck here if buggy protocols don't correctly
 * call dev_put.
 */
static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
{
        unsigned long rebroadcast_time, warning_time;
        struct net_device *dev;
        int wait = 0;

        rebroadcast_time = warning_time = jiffies;

        list_for_each_entry(dev, list, todo_list)
                if (netdev_refcnt_read(dev) == 1)
                        return dev;

        while (true) {
                if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
                        rtnl_lock();

                        /* Rebroadcast unregister notification */
                        list_for_each_entry(dev, list, todo_list)
                                call_netdevice_notifiers(NETDEV_UNREGISTER, dev);

                        __rtnl_unlock();
                        rcu_barrier();
                        rtnl_lock();

                        list_for_each_entry(dev, list, todo_list)
                                if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
                                             &dev->state)) {
                                        /* We must not have linkwatch events
                                         * pending on unregister. If this
                                         * happens, we simply run the queue
                                         * unscheduled, resulting in a noop
                                         * for this device.
                                         */
                                        linkwatch_run_queue();
                                        break;
                                }

                        __rtnl_unlock();

                        rebroadcast_time = jiffies;
                }

                if (!wait) {
                        rcu_barrier();
                        wait = WAIT_REFS_MIN_MSECS;
                } else {
                        msleep(wait);
                        wait = min(wait << 1, WAIT_REFS_MAX_MSECS);
                }

                list_for_each_entry(dev, list, todo_list)
                        if (netdev_refcnt_read(dev) == 1)
                                return dev;

                if (time_after(jiffies, warning_time +
                               READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
                        list_for_each_entry(dev, list, todo_list) {
                                pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
                                         dev->name, netdev_refcnt_read(dev));
                                ref_tracker_dir_print(&dev->refcnt_tracker, 10);
                        }

                        warning_time = jiffies;
                }
        }
}
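One rough way to see why CONFIG_HZ shows up in these timings: in the 6.3-era kernel, msleep(msecs) sleeps for msecs_to_jiffies(msecs) + 1 jiffies counted from a partially elapsed tick, and a jiffy lasts 1000/HZ ms, so even msleep(1) can take up to 20 ms at HZ=100. A minimal model of the worst case per call, assuming HZ divides 1000 and ignoring both timer-wheel slack and the rcu_barrier() time (which also tends to grow with tick length); the helper names are hypothetical.

```shell
# Hypothetical model: upper bound, in ms, of one msleep(ms) call at a given
# HZ, i.e. (ceil(ms * HZ / 1000) + 1) jiffies of 1000/HZ ms each.
msleep_max_ms() {
    hz=$1 ms=$2
    jiffies=$(( (ms * hz + 999) / 1000 + 1 ))
    echo $(( jiffies * 1000 / hz ))
}

# Sum of the upper bounds for the first N msleep() calls of the backoff in
# netdev_wait_allrefs_any(): 1, 2, 4, ... ms, capped at 250 ms.
backoff_max_ms() {
    hz=$1 n=$2 delay=1 total=0 i=0
    while [ "$i" -lt "$n" ]; do
        total=$(( total + $(msleep_max_ms "$hz" "$delay") ))
        delay=$(( delay * 2 ))
        if [ "$delay" -gt 250 ]; then
            delay=250
        fi
        i=$(( i + 1 ))
    done
    echo "$total"
}
```

For example, `msleep_max_ms 100 1` prints 20, `msleep_max_ms 250 1` prints 8 and `msleep_max_ms 1000 1` prints 2; the first three backoff steps add up to at most 60 ms at HZ=100 versus 10 ms at HZ=1000.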



m.



* Re: Very slow remove interface from kernel
  2023-05-09 20:08           ` Ido Schimmel
                               ` (2 preceding siblings ...)
  2023-05-10  6:06             ` Martin Zaharinov
@ 2023-05-10  9:16             ` Martin Zaharinov
  2023-05-10  9:22               ` Eric Dumazet
  3 siblings, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-10  9:16 UTC (permalink / raw)
  To: Ido Schimmel; +Cc: Eric Dumazet, netdev

Hi all,

One more update.

I tested directly on Proxmox with kernel 6.2.6:

modprobe dummy numdummies=1
ip link set dev dummy0 up
for i in $(seq 2 1999); do ip link add link dummy0 name vlan$i type vlan id $i; done
for i in $(seq 2 1999); do ip link set dev vlan$i up; done
time for i in $(seq 2 1999); do ip link del link dummy0 name vlan$i type vlan id $i; done

real	1m6.308s
user	0m4.451s
sys	0m1.589s


This kernel is configured with CONFIG_HZ=250, and as you can see I added only 1998 VLANs; with VLANs up to ID 4094, removal takes up to 4-5 minutes.

In my test kernel I set CONFIG_HZ to 1000, but I don't think that is a good choice for servers.




> On 9 May 2023, at 23:08, Ido Schimmel <idosch@idosch.org> wrote:
> 
> On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
>> i try on kernel 6.3.1 
>> 
>> 
>> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
>> 
>> real 4m51.633s  —— here i stop with Ctrl + C  -  and rerun  and second part finish after 3 min
>> user 0m7.479s
>> sys 0m0.367s
> 
> You are off-CPU most of the time, the question is what is blocking. I'm
> getting the following results with net-next:
> 
> # time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
> real 177.09
> user 3.85
> sys 31.26
> 
> When using a batch file to perform the deletion:
> 
> # time -p ip -b vlan_del.batch 
> real 35.25
> user 0.02
> sys 3.61
> 
> And to check where we are blocked most of the time while using the batch
> file:
> 
> # ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
> [...]
>    __schedule
>    schedule
>    schedule_timeout
>    wait_for_completion
>    rcu_barrier
>    netdev_run_todo
>    rtnetlink_rcv_msg
>    netlink_rcv_skb
>    netlink_unicast
>    netlink_sendmsg
>    ____sys_sendmsg
>    ___sys_sendmsg
>    __sys_sendmsg
>    do_syscall_64
>    entry_SYSCALL_64_after_hwframe
>    -                ip (3660)
>        25089479
> [...]
> 
> We are blocked for around 70% of the time on the rcu_barrier() in
> netdev_run_todo().
> 
> Note that one big difference between my setup and yours is that in my
> case eth0 is a dummy device and in your case it's probably a physical
> device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
> so, it's possible that a non-negligible amount of time is spent talking
> to hardware/firmware to delete the 4K VIDs from the device's VLAN
> filter.
> 
>> 
>> 
>> The config is very clean; I removed a large part of the CONFIG options.
>> 
>> Are there options to debug what is happening?
>> 
>> m
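Ido's point about ndo_vlan_rx_kill_vid() can be checked on the real NIC. The 8021q code generally reaches the driver's VLAN filter callbacks only when the device advertises hardware VLAN filtering. `ethtool -k` reports that capability as the `rx-vlan-filter` feature.

The sketch below extracts that line from `ethtool -k` output read on stdin. The function name `vlan_filter_state` is hypothetical.

```sh
#!/bin/sh
# Read `ethtool -k IFACE` output on stdin and print the value of the
# rx-vlan-filter feature (e.g. "on", "off", "on [fixed]"), or "unknown"
# if that line is absent. vlan_filter_state is a hypothetical helper name.
vlan_filter_state() {
    awk -F': ' '
        { key = $1; sub(/^[ \t]+/, "", key) }       # ignore indentation
        key == "rx-vlan-filter" { print $2; found = 1 }
        END { if (!found) print "unknown" }'
}

# On the affected machine (eth1 as in the original report):
#   ethtool -k eth1 | vlan_filter_state
```

If it reports `on`, each deletion may involve a driver or firmware round trip. Whether that dominates the total would still need measuring, for example with the offcputime run quoted above.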

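Ido's faster run feeds every deletion to a single `ip -b` (batch) process, but he did not post the contents of his vlan_del.batch. The sketch below generates an equivalent file. It assumes Martin's device names vlan2..vlan4094 (Ido's setup used eth0.N instead), and the helper `make_vlan_del_batch` is hypothetical. In an iproute2 batch file, each line is an `ip` command without the leading `ip`.

```sh
#!/bin/sh
# Write one "link del dev vlanN" line per VLAN, for N = FIRST..LAST.
# Usage: make_vlan_del_batch FIRST LAST FILE   (hypothetical helper)
make_vlan_del_batch() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "link del dev vlan$i"
        i=$((i + 1))
    done > "$3"
}

make_vlan_del_batch 2 4094 vlan_del.batch
```

Run it as root with `time -p ip -b vlan_del.batch`. Adding `-force` (`ip -force -b vlan_del.batch`) makes ip continue past failing lines, such as a VLAN that no longer exists, instead of stopping at the first error.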


* Re: Very slow remove interface from kernel
  2023-05-10  9:16             ` Martin Zaharinov
@ 2023-05-10  9:22               ` Eric Dumazet
  0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2023-05-10  9:22 UTC
  To: Martin Zaharinov; +Cc: Ido Schimmel, netdev

On Wed, May 10, 2023 at 11:17 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi all
>
> One more update.
>
> I tested directly on Proxmox with kernel 6.2.6:
>
> modprobe dummy numdummies=1
> ip link set dev dummy0 up
> for i in $(seq 2 1999); do ip link add link dummy0 name vlan$i type vlan id $i; done
> for i in $(seq 2 1999); do ip link set dev vlan$i up; done
>  time for i in $(seq 2 1999); do ip link del link dummy0 name vlan$i type vlan id $i; done
>
> real    1m6.308s
> user    0m4.451s
> sys     0m1.589s
>
>
> This kernel is configured with CONFIG_HZ=250. As you can see, I added 1998 VLANs here; with VLANs up to ID 4094, removal takes up to 4-5 min.
>
> In a test kernel I set CONFIG_HZ to 1000, but I don't think that is a good fit for every server.

We use CONFIG_HZ=1000 on server builds.

Other values cause suboptimal behavior, for instance in TCP stack.
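For context on the CONFIG_HZ discussion, one jiffy lasts 1000/HZ milliseconds. HZ=250 therefore gives 4 ms ticks and HZ=1000 gives 1 ms ticks, and timeouts expressed in jiffies can be no finer than one tick. The thread does not establish which jiffies-based wait in the unregister path is affected, so treat this as background rather than a diagnosis. The sketch below computes the tick length; `tick_ms` is a hypothetical helper.

```sh
#!/bin/sh
# Milliseconds per jiffy for a given CONFIG_HZ: 1000 / HZ.
tick_ms() {
    awk -v hz="$1" 'BEGIN { printf "%.2f\n", 1000 / hz }'
}

# The usual Kconfig choices (kernel/Kconfig.hz): 100, 250, 300, 1000
for hz in 100 250 300 1000; do
    echo "CONFIG_HZ=$hz -> $(tick_ms "$hz") ms per jiffy"
done

# The running kernel's setting can often be read from its config file;
# the location varies by distribution, e.g.:
#   grep '^CONFIG_HZ=' "/boot/config-$(uname -r)"
```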


>
> > On 9 May 2023, at 23:08, Ido Schimmel <idosch@idosch.org> wrote:
> > [...]
>


* Re: Very slow remove interface from kernel
  2023-05-10  6:06             ` Martin Zaharinov
@ 2023-05-10  9:40               ` Eric Dumazet
  2023-05-10 13:15                 ` Martin Zaharinov
  2023-05-25  7:50                 ` Martin Zaharinov
  0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2023-05-10  9:40 UTC
  To: Martin Zaharinov; +Cc: Ido Schimmel, netdev

On Wed, May 10, 2023 at 8:06 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> I think the problem is in this part of the code in net/core/dev.c:

What makes you think this?

msleep() is not called a single time on my test bed.

# perf probe -a msleep
# cat bench.sh
modprobe dummy 2>/dev/null
ip link set dev dummy0 up 2>/dev/null
for i in $(seq 2 4094); do ip link add link dummy0 name vlan$i type vlan id $i; done
for i in $(seq 2 4094); do ip link set dev vlan$i up; done
time for i in $(seq 2 4094); do ip link del link dummy0 name vlan$i type vlan id $i; done

#  perf record -e probe:msleep -a -g ./bench.sh

real 0m59.877s
user 0m0.588s
sys 0m7.023s
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 8.561 MB perf.data ]
# perf script
#   << empty, nothing >>




> #define WAIT_REFS_MIN_MSECS 1
> #define WAIT_REFS_MAX_MSECS 250
> /**
>  * netdev_wait_allrefs_any - wait until all references are gone.
>  * @list: list of net_devices to wait on
>  *
>  * This is called when unregistering network devices.
>  *
>  * Any protocol or device that holds a reference should register
>  * for netdevice notification, and cleanup and put back the
>  * reference if they receive an UNREGISTER event.
>  * We can get stuck here if buggy protocols don't correctly
>  * call dev_put.
>  */
> static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
> {
>         unsigned long rebroadcast_time, warning_time;
>         struct net_device *dev;
>         int wait = 0;
>
>         rebroadcast_time = warning_time = jiffies;
>
>         list_for_each_entry(dev, list, todo_list)
>                 if (netdev_refcnt_read(dev) == 1)
>                         return dev;
>
>         while (true) {
>                 if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
>                         rtnl_lock();
>
>                         /* Rebroadcast unregister notification */
>                         list_for_each_entry(dev, list, todo_list)
>                                 call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>
>                         __rtnl_unlock();
>                         rcu_barrier();
>                         rtnl_lock();
>
>                         list_for_each_entry(dev, list, todo_list)
>                                 if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
>                                              &dev->state)) {
>                                         /* We must not have linkwatch events
>                                          * pending on unregister. If this
>                                          * happens, we simply run the queue
>                                          * unscheduled, resulting in a noop
>                                          * for this device.
>                                          */
>                                         linkwatch_run_queue();
>                                         break;
>                                 }
>
>                         __rtnl_unlock();
>
>                         rebroadcast_time = jiffies;
>                 }
>
>                 if (!wait) {
>                         rcu_barrier();
>                         wait = WAIT_REFS_MIN_MSECS;
>                 } else {
>                         msleep(wait);
>                         wait = min(wait << 1, WAIT_REFS_MAX_MSECS);
>                 }
>
>                 list_for_each_entry(dev, list, todo_list)
>                         if (netdev_refcnt_read(dev) == 1)
>                                 return dev;
>
>                 if (time_after(jiffies, warning_time +
>                                READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
>                         list_for_each_entry(dev, list, todo_list) {
>                                 pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
>                                          dev->name, netdev_refcnt_read(dev));
>                                 ref_tracker_dir_print(&dev->refcnt_tracker, 10);
>                         }
>
>                         warning_time = jiffies;
>                 }
>         }
> }
>
>
>
> m.
>
>
> > On 9 May 2023, at 23:08, Ido Schimmel <idosch@idosch.org> wrote:
> > [...]
>

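In the netdev_wait_allrefs_any() loop quoted above, the first pass calls rcu_barrier() instead of sleeping. The msleep() calls only happen on later passes, when a device still holds extra references. From then on the delay starts at WAIT_REFS_MIN_MSECS (1 ms), doubles on each pass, and is capped at WAIT_REFS_MAX_MSECS (250 ms). Since Eric's probe recorded no msleep() hits, his test bed apparently never needed more than that initial rcu_barrier() pass.

The sketch below reproduces the sequence of msleep() arguments; `backoff_schedule` is a hypothetical helper.

```sh
#!/bin/sh
# Print the first N msleep() arguments of the retry loop:
# start at WAIT_REFS_MIN_MSECS, double each pass (wait << 1),
# never exceed WAIT_REFS_MAX_MSECS.
backoff_schedule() {
    n=$1
    delay=1            # WAIT_REFS_MIN_MSECS
    max=250            # WAIT_REFS_MAX_MSECS
    out=""
    while [ "$n" -gt 0 ]; do
        out="$out $delay"
        delay=$((delay * 2))
        if [ "$delay" -gt "$max" ]; then
            delay=$max
        fi
        n=$((n - 1))
    done
    echo "${out# }"    # drop the leading space
}

backoff_schedule 10
```

The first eight sleeps add up to 255 ms. After that, every further pass sleeps a flat 250 ms.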

* Re: Very slow remove interface from kernel
  2023-05-10  9:40               ` Eric Dumazet
@ 2023-05-10 13:15                 ` Martin Zaharinov
  2023-05-25  7:50                 ` Martin Zaharinov
  1 sibling, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-10 13:15 UTC
  To: Eric Dumazet; +Cc: Ido Schimmel, netdev

OK, I will try setting CONFIG_HZ to 1000 and run some tests.


Thanks, Eric

> On 10 May 2023, at 12:40, Eric Dumazet <edumazet@google.com> wrote:
> [...]



* Re: Very slow remove interface from kernel
  2023-05-10  9:40               ` Eric Dumazet
  2023-05-10 13:15                 ` Martin Zaharinov
@ 2023-05-25  7:50                 ` Martin Zaharinov
  1 sibling, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-25  7:50 UTC
  To: Eric Dumazet; +Cc: Ido Schimmel, netdev

Hi Eric,

After switching to HZ 1666, the time to remove 4093 VLANs dropped to 30 seconds.

Do you think this will cause any problems?

Best regards,
martin

> On 10 May 2023, at 12:40, Eric Dumazet <edumazet@google.com> wrote:
> [...]



end of thread

Thread overview: 16+ messages
2023-05-09  8:22 Very slow remove interface from kernel Martin Zaharinov
2023-05-09 10:20 ` Ido Schimmel
2023-05-09 10:32   ` Eric Dumazet
2023-05-09 11:10     ` Martin Zaharinov
2023-05-09 12:36       ` Eric Dumazet
2023-05-09 18:50         ` Martin Zaharinov
2023-05-09 20:08           ` Ido Schimmel
2023-05-09 20:16             ` Martin Zaharinov
2023-05-10  5:31             ` Martin Zaharinov
2023-05-10  6:06             ` Martin Zaharinov
2023-05-10  9:40               ` Eric Dumazet
2023-05-10 13:15                 ` Martin Zaharinov
2023-05-25  7:50                 ` Martin Zaharinov
2023-05-10  9:16             ` Martin Zaharinov
2023-05-10  9:22               ` Eric Dumazet
2023-05-09 20:08         ` Martin Zaharinov
