* Very slow remove interface from kernel
@ 2023-05-09 8:22 Martin Zaharinov
2023-05-09 10:20 ` Ido Schimmel
0 siblings, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 8:22 UTC (permalink / raw)
To: Eric Dumazet, netdev
Hi Eric
I think you may be able to help with this.
I tried this on kernel 6.3.1.
Add the VLANs:
for i in $(seq 2 4094); do ip link add link eth1 name vlan$i type vlan id $i; done
for i in $(seq 2 4094); do ip link set dev vlan$i up; done
and after that run:
for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
Removing these 4093 VLANs takes 5-10 minutes.
Is there any option to make this faster?
The same problem occurs with 5-6k ppp interfaces: the kernel is very slow to unregister the devices.
best regards,
m.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Very slow remove interface from kernel
2023-05-09 8:22 Very slow remove interface from kernel Martin Zaharinov
@ 2023-05-09 10:20 ` Ido Schimmel
2023-05-09 10:32 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Ido Schimmel @ 2023-05-09 10:20 UTC (permalink / raw)
To: Martin Zaharinov; +Cc: Eric Dumazet, netdev
On Tue, May 09, 2023 at 11:22:13AM +0300, Martin Zaharinov wrote:
> add vlans :
> for i in $(seq 2 4094); do ip link add link eth1 name vlan$i type vlan id $i; done
> for i in $(seq 2 4094); do ip link set dev vlan$i up; done
>
>
> and after that run :
>
> for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
>
>
> time for remove for this 4093 vlans is 5-10 min .
>
> Is there options to make fast this ?
If you know you are going to delete all of them together, then you can
add them to the same group during creation:
for i in $(seq 2 4094); do ip link add link eth1 name vlan$i up group 10 type vlan id $i; done
Then delete the group:
ip link del group 10
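For devices that already exist, the same idea can probably be applied after the fact by assigning the group with "ip link set". A minimal sketch, assuming the devices are named vlan2..vlan4094 as in the original report; GROUP and BATCH are illustrative names, and the script only writes a batch file and prints the commands to run as root:

```shell
#!/bin/sh
# Sketch: tag already-existing VLAN devices with a group, then delete the group.
# GROUP, BATCH and the vlan$i naming are assumptions for illustration.
GROUP=${GROUP:-10}
BATCH=${BATCH:-vlan_group.batch}

# One "link set" line per device, so a single ip process can apply them all.
for i in $(seq 2 4094); do
    echo "link set dev vlan$i group $GROUP"
done > "$BATCH"

# Commands to run as root once the batch file looks right:
echo "ip -b $BATCH"
echo "ip link del group $GROUP"
```

Nothing is changed on the system until the two printed commands are run by hand.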
IIRC, in the past there was a patchset to allow passing a list of
ifindexes instead of a group number, but it never made its way upstream.
* Re: Very slow remove interface from kernel
2023-05-09 10:20 ` Ido Schimmel
@ 2023-05-09 10:32 ` Eric Dumazet
2023-05-09 11:10 ` Martin Zaharinov
0 siblings, 1 reply; 16+ messages in thread
From: Eric Dumazet @ 2023-05-09 10:32 UTC (permalink / raw)
To: Ido Schimmel; +Cc: Martin Zaharinov, netdev
On Tue, May 9, 2023 at 12:20 PM Ido Schimmel <idosch@idosch.org> wrote:
>
> On Tue, May 09, 2023 at 11:22:13AM +0300, Martin Zaharinov wrote:
> > add vlans :
> > for i in $(seq 2 4094); do ip link add link eth1 name vlan$i type vlan id $i; done
> > for i in $(seq 2 4094); do ip link set dev vlan$i up; done
> >
> >
> > and after that run :
> >
> > for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
> >
> >
> > time for remove for this 4093 vlans is 5-10 min .
> >
> > Is there options to make fast this ?
>
> If you know you are going to delete all of them together, then you can
> add them to the same group during creation:
>
> for i in $(seq 2 4094); do ip link add link eth1 name vlan$i up group 10 type vlan id $i; done
>
> Then delete the group:
>
> ip link del group 10
>
Another way is to create a netns for retiring devices,
move devices to the 'retirens' when they need to go away.
Then once per minute, delete the retirens and create a new one.
-> This batches netdev deletions.
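A rough sketch of this retire-netns rotation, assuming the namespace is called retirens and the devices are the vlan$i interfaces from the report. RETIRE_NS, DRY_RUN, run, retire_dev and rotate_retire_ns are hypothetical names; by default the script only prints the ip commands instead of executing them:

```shell
#!/bin/sh
# Sketch of the "retire netns" idea. All names here are illustrative.
RETIRE_NS=${RETIRE_NS:-retirens}
DRY_RUN=${DRY_RUN:-1}   # 1: print the commands, anything else: execute them

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

# Move one device out of the way instead of deleting it directly.
retire_dev() {
    run ip link set dev "$1" netns "$RETIRE_NS"
}

# Deleting the namespace removes the virtual devices inside it together;
# an empty namespace is then recreated for the next round.
rotate_retire_ns() {
    run ip netns del "$RETIRE_NS"
    run ip netns add "$RETIRE_NS"
}

run ip netns add "$RETIRE_NS"
for i in 2 3 4; do retire_dev "vlan$i"; done
rotate_retire_ns
```

In a real setup rotate_retire_ns would be called from a timer (for example once per minute, as suggested above) with DRY_RUN=0 and root privileges.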
> IIRC, in the past there was a patchset to allow passing a list of
> ifindexes instead of a group number, but it never made its way upstream.
* Re: Very slow remove interface from kernel
2023-05-09 10:32 ` Eric Dumazet
@ 2023-05-09 11:10 ` Martin Zaharinov
2023-05-09 12:36 ` Eric Dumazet
0 siblings, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 11:10 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Ido Schimmel, netdev
Hi
in short, there is no way to make the kernel do it faster.
With older kernels, unregistering a device was much faster.
With the latest 6.x kernels it is very slow.
Is there any chance of making this faster?
m.
* Re: Very slow remove interface from kernel
2023-05-09 11:10 ` Martin Zaharinov
@ 2023-05-09 12:36 ` Eric Dumazet
2023-05-09 18:50 ` Martin Zaharinov
2023-05-09 20:08 ` Martin Zaharinov
0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2023-05-09 12:36 UTC (permalink / raw)
To: Martin Zaharinov; +Cc: Ido Schimmel, netdev
On Tue, May 9, 2023 at 1:10 PM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi
>
> in short, there is no way to make the kernel do it faster.
Make sure your kernel does not include options you do not need.
>
> Before time with old kernel unregister device make more faster .
>
> with latest kernel >6.x this make very slow .
>
Yup, I feel your pain.
Maybe you should start a bisection then...
You might find that you have some CONFIG_ option that makes this
operation very slow.
Some layers (like hamradio and others) lack batch operations in their
netdev removal handlers.
For instance, on one machine I have access to and with my standard
.config, your benchmark gives a not too bad result with pristine
linux-6.3
modprobe dummy
ip link set dev dummy0 up
for i in $(seq 2 4094); do ip link add link dummy0 name vlan$i type vlan id $i; done
for i in $(seq 2 4094); do ip link set dev vlan$i up; done
time for i in $(seq 2 4094); do ip link del link dummy0 name vlan$i type vlan id $i; done
real 0m55.808s
user 0m0.788s
sys 0m6.868s
Without batching, I think one netdev removal needs three synchronize_net() calls
I am reasonably certain numbers would not look so good if I booted a
"make allyesconfig" kernel.
* Re: Very slow remove interface from kernel
2023-05-09 12:36 ` Eric Dumazet
@ 2023-05-09 18:50 ` Martin Zaharinov
2023-05-09 20:08 ` Ido Schimmel
2023-05-09 20:08 ` Martin Zaharinov
1 sibling, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 18:50 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Ido Schimmel, netdev
Hi Eric
I tried on kernel 6.3.1:
time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
real 4m51.633s (here I stopped it with Ctrl+C and reran it; the second part finished after 3 min)
user 0m7.479s
sys 0m0.367s
The config is very clean; I removed most of the CONFIG options.
Is there any way to debug what is happening?
m
* Re: Very slow remove interface from kernel
2023-05-09 12:36 ` Eric Dumazet
2023-05-09 18:50 ` Martin Zaharinov
@ 2023-05-09 20:08 ` Martin Zaharinov
1 sibling, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 20:08 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Ido Schimmel, netdev
[-- Attachment #1: Type: text/plain, Size: 142 bytes --]
One more thing: see the attached video.
While the removal was running I ran: watch -n.1 "ip a | grep UP | wc"
to see how many interfaces are removed per second.
[-- Attachment #2: Screen Recording 2023-05-09 at 23.06.52.mov --]
[-- Type: video/quicktime, Size: 606423 bytes --]
* Re: Very slow remove interface from kernel
2023-05-09 18:50 ` Martin Zaharinov
@ 2023-05-09 20:08 ` Ido Schimmel
2023-05-09 20:16 ` Martin Zaharinov
` (3 more replies)
0 siblings, 4 replies; 16+ messages in thread
From: Ido Schimmel @ 2023-05-09 20:08 UTC (permalink / raw)
To: Martin Zaharinov; +Cc: Eric Dumazet, netdev
On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
> i try on kernel 6.3.1
>
>
> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
>
> real 4m51.633s —— here i stop with Ctrl + C - and rerun and second part finish after 3 min
> user 0m7.479s
> sys 0m0.367s
You are off-CPU most of the time, the question is what is blocking. I'm
getting the following results with net-next:
# time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
real 177.09
user 3.85
sys 31.26
When using a batch file to perform the deletion:
# time -p ip -b vlan_del.batch
real 35.25
user 0.02
sys 3.61
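The thread does not show how vlan_del.batch was built. A minimal sketch, assuming the VLAN devices are named eth0.$i as in the loop above; PARENT and BATCH are illustrative variable names:

```shell
#!/bin/sh
# Sketch: generate a batch file with one "link del" line per VLAN device.
# PARENT and BATCH are assumptions for illustration.
PARENT=${PARENT:-eth0}
BATCH=${BATCH:-vlan_del.batch}

for i in $(seq 2 4094); do
    echo "link del dev $PARENT.$i"
done > "$BATCH"

# Show how many deletions the batch contains.
wc -l < "$BATCH"
```

The file can then be fed to a single ip process with "ip -b vlan_del.batch", which avoids starting a new process for every device.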
And to check where we are blocked most of the time while using the batch
file:
# ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
[...]
__schedule
schedule
schedule_timeout
wait_for_completion
rcu_barrier
netdev_run_todo
rtnetlink_rcv_msg
netlink_rcv_skb
netlink_unicast
netlink_sendmsg
____sys_sendmsg
___sys_sendmsg
__sys_sendmsg
do_syscall_64
entry_SYSCALL_64_after_hwframe
- ip (3660)
25089479
[...]
We are blocked for around 70% of the time on the rcu_barrier() in
netdev_run_todo().
Note that one big difference between my setup and yours is that in my
case eth0 is a dummy device and in your case it's probably a physical
device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
so, it's possible that a non-negligible amount of time is spent talking
to hardware/firmware to delete the 4K VIDs from the device's VLAN
filter.
>
>
> Config is very clean i remove big part of CONFIG options .
>
> is there options to debug what is happen.
>
> m
* Re: Very slow remove interface from kernel
2023-05-09 20:08 ` Ido Schimmel
@ 2023-05-09 20:16 ` Martin Zaharinov
2023-05-10 5:31 ` Martin Zaharinov
` (2 subsequent siblings)
3 siblings, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-09 20:16 UTC (permalink / raw)
To: Ido Schimmel; +Cc: Eric Dumazet, netdev
Hi Ido
Yes, it is a physical card: an Intel 82599 dual-port 10G NIC in a 2-socket system with 24 cores at 3 GHz.
These are the timings:
time ./vlanadd
real 0m12.347s
user 0m8.863s
sys 0m2.594s
time ./vlanrem
real 8m59.105s
user 0m11.931s
sys 0m0.035s
Watching with: watch -n.1 "ip a | grep UP | wc"
while vlanrem runs, about 4-5 VLANs are removed per second.
I think RCU is causing the problem.
I found a post from 2009: https://lore.kernel.org/all/20091024144610.GC6638@linux.vnet.ibm.com/T/
Yes, it is old, and many changes have probably been made since then.
I have the same slow removal with ppp interfaces: when more than 800-900 users drop, removing the devices and reconnecting (re-adding) them is just as slow.
m.
* Re: Very slow remove interface from kernel
2023-05-09 20:08 ` Ido Schimmel
2023-05-09 20:16 ` Martin Zaharinov
@ 2023-05-10 5:31 ` Martin Zaharinov
2023-05-10 6:06 ` Martin Zaharinov
2023-05-10 9:16 ` Martin Zaharinov
3 siblings, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-10 5:31 UTC (permalink / raw)
To: Ido Schimmel; +Cc: Eric Dumazet, netdev
Hi Eric and Ido
After a little research, I changed CONFIG_HZ_100 to CONFIG_HZ_1000:
vlanadd
real 0m15.106s
user 0m2.420s
sys 0m13.250s
vlandel:
real 1m10.995s
user 0m1.045s
sys 0m7.678s
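To put the two removal timings side by side, here is a small sketch that converts them to an average wall-clock cost per device. It assumes both runs removed the same 4093 VLANs (IDs 2..4094) and that the earlier 8m59.105s run was taken with CONFIG_HZ=100; per_dev_ms is a hypothetical helper name:

```shell
#!/bin/sh
# Sketch: average wall-clock milliseconds per removed device.
# per_dev_ms TOTAL_SECONDS NUMBER_OF_DEVICES
per_dev_ms() {
    awk -v t="$1" -v n="$2" 'BEGIN { printf "%.1f\n", t * 1000 / n }'
}

NDEVS=$((4094 - 2 + 1))   # VLAN IDs 2..4094

echo "devices: $NDEVS"
echo "HZ=100:  $(per_dev_ms 539.105 "$NDEVS") ms per device"   # 8m59.105s
echo "HZ=1000: $(per_dev_ms 70.995 "$NDEVS") ms per device"    # 1m10.995s
```

Under those assumptions this works out to roughly 132 ms per device at HZ=100 versus roughly 17 ms at HZ=1000, which suggests the time is dominated by waiting rather than by CPU work.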
I have used 100 for the last 10 years; all installations are networking servers.
Do you have any recommendations?
best regards,
m
* Re: Very slow remove interface from kernel
2023-05-09 20:08 ` Ido Schimmel
2023-05-09 20:16 ` Martin Zaharinov
2023-05-10 5:31 ` Martin Zaharinov
@ 2023-05-10 6:06 ` Martin Zaharinov
2023-05-10 9:40 ` Eric Dumazet
2023-05-10 9:16 ` Martin Zaharinov
3 siblings, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-10 6:06 UTC (permalink / raw)
To: Ido Schimmel; +Cc: Eric Dumazet, netdev
I think the problem is in this part of the code in net/core/dev.c:
#define WAIT_REFS_MIN_MSECS 1
#define WAIT_REFS_MAX_MSECS 250
/**
 * netdev_wait_allrefs_any - wait until all references are gone.
 * @list: list of net_devices to wait on
 *
 * This is called when unregistering network devices.
 *
 * Any protocol or device that holds a reference should register
 * for netdevice notification, and cleanup and put back the
 * reference if they receive an UNREGISTER event.
 * We can get stuck here if buggy protocols don't correctly
 * call dev_put.
 */
static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
{
	unsigned long rebroadcast_time, warning_time;
	struct net_device *dev;
	int wait = 0;

	rebroadcast_time = warning_time = jiffies;

	list_for_each_entry(dev, list, todo_list)
		if (netdev_refcnt_read(dev) == 1)
			return dev;

	while (true) {
		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
			rtnl_lock();

			/* Rebroadcast unregister notification */
			list_for_each_entry(dev, list, todo_list)
				call_netdevice_notifiers(NETDEV_UNREGISTER, dev);

			__rtnl_unlock();
			rcu_barrier();
			rtnl_lock();

			list_for_each_entry(dev, list, todo_list)
				if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
					     &dev->state)) {
					/* We must not have linkwatch events
					 * pending on unregister. If this
					 * happens, we simply run the queue
					 * unscheduled, resulting in a noop
					 * for this device.
					 */
					linkwatch_run_queue();
					break;
				}

			__rtnl_unlock();

			rebroadcast_time = jiffies;
		}

		if (!wait) {
			rcu_barrier();
			wait = WAIT_REFS_MIN_MSECS;
		} else {
			msleep(wait);
			wait = min(wait << 1, WAIT_REFS_MAX_MSECS);
		}

		list_for_each_entry(dev, list, todo_list)
			if (netdev_refcnt_read(dev) == 1)
				return dev;

		if (time_after(jiffies, warning_time +
			       READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
			list_for_each_entry(dev, list, todo_list) {
				pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
					 dev->name, netdev_refcnt_read(dev));
				ref_tracker_dir_print(&dev->refcnt_tracker, 10);
			}

			warning_time = jiffies;
		}
	}
}
m.
* Re: Very slow remove interface from kernel
2023-05-09 20:08 ` Ido Schimmel
` (2 preceding siblings ...)
2023-05-10 6:06 ` Martin Zaharinov
@ 2023-05-10 9:16 ` Martin Zaharinov
2023-05-10 9:22 ` Eric Dumazet
3 siblings, 1 reply; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-10 9:16 UTC (permalink / raw)
To: Ido Schimmel; +Cc: Eric Dumazet, netdev
Hi all
One more update:
I tested directly on Proxmox with kernel 6.2.6:
modprobe dummy numdummies=1
ip link set dev dummy0 up
for i in $(seq 2 1999); do ip link add link dummy0 name vlan$i type vlan id $i; done
for i in $(seq 2 1999); do ip link set dev vlan$i up; done
time for i in $(seq 2 1999); do ip link del link dummy0 name vlan$i type vlan id $i; done
real 1m6.308s
user 0m4.451s
sys 0m1.589s
This kernel is configured with CONFIG_HZ=250 and, as you can see, I added 1998 VLANs; with 4094, removal takes up to 4-5 min.
In my test kernel I set CONFIG_HZ to 1000, but I don't think this is fine for every server.
* Re: Very slow remove interface from kernel
2023-05-10 9:16 ` Martin Zaharinov
@ 2023-05-10 9:22 ` Eric Dumazet
0 siblings, 0 replies; 16+ messages in thread
From: Eric Dumazet @ 2023-05-10 9:22 UTC (permalink / raw)
To: Martin Zaharinov; +Cc: Ido Schimmel, netdev
On Wed, May 10, 2023 at 11:17 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> Hi all
>
> one more update
>
> i test with Proxmox direct with kernel 6.2.6
>
> modprobe dummy numdummies=1
> ip link set dev dummy0 up
> for i in $(seq 2 1999); do ip link add link dummy0 name vlan$i type vlan id $i; done
> for i in $(seq 2 1999); do ip link set dev vlan$i up; done
> time for i in $(seq 2 1999); do ip link del link dummy0 name vlan$i type vlan id $i; done
>
> real 1m6.308s
> user 0m4.451s
> sys 0m1.589s
>
>
> This kernel is configured with CONFIG_HZ 250 and as you see i add 1998 vlans if add 4094 is time up to 4-5 min to remove
>
> in test kernel i set CONFIG_HZ to 1000 but i dont this this is fine for any server.
We use CONFIG_HZ=1000 on server builds.
Other values cause suboptimal behavior, for instance in TCP stack.
>
>
> > On 9 May 2023, at 23:08, Ido Schimmel <idosch@idosch.org> wrote:
> >
> > On Tue, May 09, 2023 at 09:50:18PM +0300, Martin Zaharinov wrote:
> >> i try on kernel 6.3.1
> >>
> >>
> >> time for i in $(seq 2 4094); do ip link del link eth1 name vlan$i type vlan id $i; done
> >>
> >> real 4m51.633s —— here i stop with Ctrl + C - and rerun and second part finish after 3 min
> >> user 0m7.479s
> >> sys 0m0.367s
> >
> > You are off-CPU most of the time, the question is what is blocking. I'm
> > getting the following results with net-next:
> >
> > # time -p for i in $(seq 2 4094); do ip link del dev eth0.$i; done
> > real 177.09
> > user 3.85
> > sys 31.26
> >
> > When using a batch file to perform the deletion:
> >
> > # time -p ip -b vlan_del.batch
> > real 35.25
> > user 0.02
> > sys 3.61
> >
> > And to check where we are blocked most of the time while using the batch
> > file:
> >
> > # ../bcc/libbpf-tools/offcputime -p `pgrep -nx ip`
> > [...]
> > __schedule
> > schedule
> > schedule_timeout
> > wait_for_completion
> > rcu_barrier
> > netdev_run_todo
> > rtnetlink_rcv_msg
> > netlink_rcv_skb
> > netlink_unicast
> > netlink_sendmsg
> > ____sys_sendmsg
> > ___sys_sendmsg
> > __sys_sendmsg
> > do_syscall_64
> > entry_SYSCALL_64_after_hwframe
> > - ip (3660)
> > 25089479
> > [...]
> >
> > We are blocked for around 70% of the time on the rcu_barrier() in
> > netdev_run_todo().
> >
> > Note that one big difference between my setup and yours is that in my
> > case eth0 is a dummy device and in your case it's probably a physical
> > device that actually implements netdev_ops::ndo_vlan_rx_kill_vid(). If
> > so, it's possible that a non-negligible amount of time is spent talking
> > to hardware/firmware to delete the 4K VIDs from the device's VLAN
> > filter.
> >
> >>
> >>
> >> The config is very clean; I removed a large part of the CONFIG options.
> >>
> >> Are there options to debug what is happening?
> >>
> >> m
>
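The batch-mode run quoted above uses `ip -b vlan_del.batch`, but the thread does not show the batch file itself. A minimal sketch of generating one follows; it assumes the vlan$i device names from the commands in this message (Ido's own devices were named eth0.$i), and the function name gen_vlan_del_batch is hypothetical. Each line of a batch file is an ip command without the leading "ip".

```sh
# Write one "link del" command per VLAN device into a batch file.
# $1 = first VLAN ID, $2 = last VLAN ID, $3 = output file.
# Assumes devices are named vlan<ID>; adjust the name pattern as needed.
gen_vlan_del_batch() {
    i=$1
    while [ "$i" -le "$2" ]; do
        echo "link del dev vlan$i"
        i=$((i + 1))
    done > "$3"
}
```

Something like `gen_vlan_del_batch 2 4094 vlan_del.batch` followed by `time -p ip -b vlan_del.batch` (as root) should then reproduce the batch measurement with a single ip process instead of one fork per device.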
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Very slow remove interface from kernel
2023-05-10 6:06 ` Martin Zaharinov
@ 2023-05-10 9:40 ` Eric Dumazet
2023-05-10 13:15 ` Martin Zaharinov
2023-05-25 7:50 ` Martin Zaharinov
0 siblings, 2 replies; 16+ messages in thread
From: Eric Dumazet @ 2023-05-10 9:40 UTC (permalink / raw)
To: Martin Zaharinov; +Cc: Ido Schimmel, netdev
On Wed, May 10, 2023 at 8:06 AM Martin Zaharinov <micron10@gmail.com> wrote:
>
> I think the problem is in this part of the code in net/core/dev.c
What makes you think this?
msleep() is not called a single time on my test bed.
# perf probe -a msleep
# cat bench.sh
modprobe dummy 2>/dev/null
ip link set dev dummy0 up 2>/dev/null
for i in $(seq 2 4094); do ip link add link dummy0 name vlan$i type vlan id $i; done
for i in $(seq 2 4094); do ip link set dev vlan$i up; done
time for i in $(seq 2 4094); do ip link del link dummy0 name vlan$i type vlan id $i; done
# perf record -e probe:msleep -a -g ./bench.sh
real 0m59.877s
user 0m0.588s
sys 0m7.023s
[ perf record: Woken up 6 times to write data ]
[ perf record: Captured and wrote 8.561 MB perf.data ]
# perf script
# << empty, nothing >>
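The timing above also shows how little of the wall-clock time is spent on a CPU. A rough sketch of that arithmetic, assuming the real/user/sys values are converted to seconds by hand and that the loop runs sequentially (the helper name off_cpu_pct is hypothetical):

```sh
# Estimate the share of wall-clock time spent off-CPU (blocked or sleeping).
# $1 = real, $2 = user, $3 = sys, all in seconds.
off_cpu_pct() {
    awk -v r="$1" -v u="$2" -v s="$3" \
        'BEGIN { printf "%.1f\n", 100 * (1 - (u + s) / r) }'
}

# Values from the bench.sh run above: real 59.877, user 0.588, sys 7.023
off_cpu_pct 59.877 0.588 7.023    # prints 87.3
```

So roughly 87% of the minute is spent waiting rather than executing, which is consistent with the deletion path blocking somewhere other than msleep().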
> #define WAIT_REFS_MIN_MSECS 1
> #define WAIT_REFS_MAX_MSECS 250
> /**
>  * netdev_wait_allrefs_any - wait until all references are gone.
>  * @list: list of net_devices to wait on
>  *
>  * This is called when unregistering network devices.
>  *
>  * Any protocol or device that holds a reference should register
>  * for netdevice notification, and cleanup and put back the
>  * reference if they receive an UNREGISTER event.
>  * We can get stuck here if buggy protocols don't correctly
>  * call dev_put.
>  */
> static struct net_device *netdev_wait_allrefs_any(struct list_head *list)
> {
> 	unsigned long rebroadcast_time, warning_time;
> 	struct net_device *dev;
> 	int wait = 0;
>
> 	rebroadcast_time = warning_time = jiffies;
>
> 	list_for_each_entry(dev, list, todo_list)
> 		if (netdev_refcnt_read(dev) == 1)
> 			return dev;
>
> 	while (true) {
> 		if (time_after(jiffies, rebroadcast_time + 1 * HZ)) {
> 			rtnl_lock();
>
> 			/* Rebroadcast unregister notification */
> 			list_for_each_entry(dev, list, todo_list)
> 				call_netdevice_notifiers(NETDEV_UNREGISTER, dev);
>
> 			__rtnl_unlock();
> 			rcu_barrier();
> 			rtnl_lock();
>
> 			list_for_each_entry(dev, list, todo_list)
> 				if (test_bit(__LINK_STATE_LINKWATCH_PENDING,
> 					     &dev->state)) {
> 					/* We must not have linkwatch events
> 					 * pending on unregister. If this
> 					 * happens, we simply run the queue
> 					 * unscheduled, resulting in a noop
> 					 * for this device.
> 					 */
> 					linkwatch_run_queue();
> 					break;
> 				}
>
> 			__rtnl_unlock();
>
> 			rebroadcast_time = jiffies;
> 		}
>
> 		if (!wait) {
> 			rcu_barrier();
> 			wait = WAIT_REFS_MIN_MSECS;
> 		} else {
> 			msleep(wait);
> 			wait = min(wait << 1, WAIT_REFS_MAX_MSECS);
> 		}
>
> 		list_for_each_entry(dev, list, todo_list)
> 			if (netdev_refcnt_read(dev) == 1)
> 				return dev;
>
> 		if (time_after(jiffies, warning_time +
> 			       READ_ONCE(netdev_unregister_timeout_secs) * HZ)) {
> 			list_for_each_entry(dev, list, todo_list) {
> 				pr_emerg("unregister_netdevice: waiting for %s to become free. Usage count = %d\n",
> 					 dev->name, netdev_refcnt_read(dev));
> 				ref_tracker_dir_print(&dev->refcnt_tracker, 10);
> 			}
>
> 			warning_time = jiffies;
> 		}
> 	}
> }
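For reference, the msleep() branch in the quoted function is only reached from the second loop iteration onward, and its delay doubles from WAIT_REFS_MIN_MSECS up to the WAIT_REFS_MAX_MSECS cap. The sketch below only reproduces that arithmetic in shell; it is not kernel code, and the function name backoff_schedule is hypothetical.

```sh
# Print the first N msleep() delays (in ms) of the doubling backoff:
# start at 1 ms, double each round, cap at 250 ms.
backoff_schedule() {
    n=$1
    wait=1      # WAIT_REFS_MIN_MSECS
    out=""
    while [ "$n" -gt 0 ]; do
        out="$out${out:+ }$wait"
        wait=$((wait * 2))
        if [ "$wait" -gt 250 ]; then
            wait=250    # WAIT_REFS_MAX_MSECS
        fi
        n=$((n - 1))
    done
    echo "$out"
}

backoff_schedule 10    # prints: 1 2 4 8 16 32 64 128 250 250
```

If a device's refcount drops to 1 before the first pass through the loop, or right after the initial rcu_barrier(), none of these sleeps happen at all, which matches the empty perf script output above.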
>
>
>
> m.
>
>
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Very slow remove interface from kernel
2023-05-10 9:40 ` Eric Dumazet
@ 2023-05-10 13:15 ` Martin Zaharinov
2023-05-25 7:50 ` Martin Zaharinov
1 sibling, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-10 13:15 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Ido Schimmel, netdev
OK, I will set CONFIG_HZ to 1000 and run the tests.
Thanks, Eric.
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Very slow remove interface from kernel
2023-05-10 9:40 ` Eric Dumazet
2023-05-10 13:15 ` Martin Zaharinov
@ 2023-05-25 7:50 ` Martin Zaharinov
1 sibling, 0 replies; 16+ messages in thread
From: Martin Zaharinov @ 2023-05-25 7:50 UTC (permalink / raw)
To: Eric Dumazet; +Cc: Ido Schimmel, netdev
Hi Eric,
After switching to HZ 1666, the time to remove 4093 VLANs dropped to 30 seconds.
Do you think this will cause a problem?
Best regards,
martin
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2023-05-25 7:50 UTC | newest]
Thread overview: 16+ messages
2023-05-09 8:22 Very slow remove interface from kernel Martin Zaharinov
2023-05-09 10:20 ` Ido Schimmel
2023-05-09 10:32 ` Eric Dumazet
2023-05-09 11:10 ` Martin Zaharinov
2023-05-09 12:36 ` Eric Dumazet
2023-05-09 18:50 ` Martin Zaharinov
2023-05-09 20:08 ` Ido Schimmel
2023-05-09 20:16 ` Martin Zaharinov
2023-05-10 5:31 ` Martin Zaharinov
2023-05-10 6:06 ` Martin Zaharinov
2023-05-10 9:40 ` Eric Dumazet
2023-05-10 13:15 ` Martin Zaharinov
2023-05-25 7:50 ` Martin Zaharinov
2023-05-10 9:16 ` Martin Zaharinov
2023-05-10 9:22 ` Eric Dumazet
2023-05-09 20:08 ` Martin Zaharinov