* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
[not found] <d1530cba-1a72-cae8-6a04-ed8ec0f82e6e@gmail.com>
@ 2023-01-19 10:17 ` Paul Menzel
2023-01-19 10:22 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance " Paul Menzel
2023-01-19 12:24 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
0 siblings, 2 replies; 14+ messages in thread
From: Paul Menzel @ 2023-01-19 10:17 UTC (permalink / raw)
To: Bartek Kois; +Cc: intel-wired-lan, regressions
#regzbot ^introduced: 4.9.88..5.10.149
Dear Bartek,
Am 14.01.23 um 11:23 schrieb Bartek Kois:
> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link set
> enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN based 10G
> adapter) I am experiencing high cpu load (even if no traffic is passing
> through the adapter) and network performance is low (when network is
> connected).
How do you test the network performance? Please give exact numbers for
comparison.
> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
> with no network attached. The problem can be observed on the
> following platforms: Supermicro X9SCL (Intel C202 PCH) and
> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
> X11SSL-F (Intel® C232 chipset) everything is working well.
>
> Tested environments:
> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no
> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel
> C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL (Intel
> C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) behave
> problematic as described above | newer platform: Supermicro X11SSL-F
> (Intel® C232 chipset) working well with no problems]
Maybe create a bug at the Linux kernel bug tracker [1], where you can
attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
> So far to solve the problem I was trying to upgrade system to the newest
> stable version, upgrade kernel to version 6.x, upgrade ixgbe driver to
> the newest version but with no luck.
Thank you for checking that. Too bad it’s still present. To rule out
some user space problem, could you test Debian 9.7 with a stable Linux
release, currently 6.1.7?
What does `sudo perf top --sort comm,dso` show, where the time is spent?
> Supermicro support suggested as follows:
> it might be kernel related debian 11.5 has kernel 5.10 which is a
> recent kernel it might not properly support the chipsets for X9
> therefore i suggest to use RHEL or CentOS as they use much older kernel
> versions. I expect that with ubuntu 20.04 you see the same problem it
> uses kernel 5.4
Testing another GNU/Linux distribution for another data point, might be
a good idea.
As nobody has responded yet, bisecting the issue is probably the fastest
way to get to the bottom of this. Luckily the problem seems reproducible
and you seem to be able to build a Linux kernel yourself, so that should
work. (For testing purposes you could also test with Ubuntu, as they
provide Linux kernel builds for (almost) all releases in their Linux
kernel mainline PPA [2].)
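
As a rough illustration of the bisection suggested above, the following sketch classifies the currently booted kernel as good or bad from the 1-minute load average, so the result can be fed to a manual `git bisect` step. The `THRESHOLD` value, the `classify` helper name, and the use of the mainline tags `v4.9` and `v5.10` as end points are assumptions for illustration, not something established in this thread.

```shell
#!/bin/sh
# Sketch: decide "good" or "bad" for one manual bisect step.
# THRESHOLD is an assumed cut-off between the idle load reported for the
# working kernel (about 0.0-0.1) and the affected one (0.1-0.3); adjust it
# to your own measurements.
THRESHOLD="${THRESHOLD:-0.10}"

# classify LOAD LIMIT: print "bad" if LOAD is above LIMIT, otherwise "good".
classify() {
    awk -v load="$1" -v limit="$2" \
        'BEGIN { print ((load + 0 > limit + 0) ? "bad" : "good") }'
}

# Manual flow in the kernel source tree (one build and reboot per step):
#   git bisect start
#   git bisect good v4.9
#   git bisect bad v5.10
#   ... build, install and boot the suggested commit, bring the link up,
#   ... wait a few minutes for the load average to settle, then:
#   git bisect "$(classify "$(cut -d' ' -f1 /proc/loadavg)" "$THRESHOLD")"
if [ -r /proc/loadavg ]; then
    load=$(cut -d' ' -f1 /proc/loadavg)
    echo "kernel $(uname -r): load $load -> $(classify "$load" "$THRESHOLD")"
fi
```

Because each step needs a reboot into the freshly built kernel, `git bisect run` does not apply directly here; the helper only removes the guesswork from the good/bad decision.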
Kind regards,
Paul
[1]: https://bugzilla.kernel.org/
[2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance after moving to Debian 11.5
2023-01-19 10:17 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5 Paul Menzel
@ 2023-01-19 10:22 ` Paul Menzel
2023-01-19 12:24 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
1 sibling, 0 replies; 14+ messages in thread
From: Paul Menzel @ 2023-01-19 10:22 UTC (permalink / raw)
To: Bartek Kois; +Cc: intel-wired-lan, regressions
Dear Bartek,
Am 19.01.23 um 11:17 schrieb Paul Menzel:
> #regzbot ^introduced: 4.9.88..5.10.149
> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>
>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link set
>> enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN based 10G
>> adapter) I am experiencing high cpu load (even if no traffic is
>> passing through the adapter) and network performance is low (when
>> network is connected).
>
> How do you test the network performance? Please give exact numbers for
> comparison.
>
>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>> with no network attached. The problem can be observed on the following
>> platforms: Supermicro X9SCL (Intel C202 PCH) and
>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>
>> Tested environments:
>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no
>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F
>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>
>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL
>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) behave
>> problematic as described above | newer platform: Supermicro X11SSL-F
>> (Intel® C232 chipset) working well with no problems]
>
> Maybe create a bug at the Linux kernel bug tracker [1], where you can
> attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>
>> So far to solve the problem I was trying to upgrade system to the
>> newest stable version, upgrade kernel to version 6.x, upgrade ixgbe
>> driver to the newest version but with no luck.
>
> Thank you for checking that. Too bad it’s still present. To rule out
> some user space problem, could you test Debian 9.7 with a stable Linux
> release, currently 6.1.7?
>
> What does `sudo perf top --sort comm,dso` show, where the time is spent?
>
>> Supermicro support suggested as follows:
>> it might be kernel related debian 11.5 has kernel 5.10 which is a
>> recent kernel it might not properly support the chipsets for X9
>> therefore i suggest to use RHEL or CentOS as they use much older
>> kernel versions. I expect that with ubuntu 20.04 you see the same
>> problem it uses kernel 5.4
>
> Testing another GNU/Linux distribution for another data point, might be
> a good idea.
>
> As nobody has responded yet, bisecting the issue is probably the fastest
> way to get to the bottom of this. Luckily the problem seems reproducible
> and you seem to be able to build a Linux kernel yourself, so that should
> work. (For testing purposes you could also test with Ubuntu, as they
> provide Linux kernel builds for (almost) all releases in their Linux
> kernel mainline PPA [2].)
You could also try to do that in a virtual machine by passing through
the network device to the VM. If that reproduces the issue, that’s quite
a fast setup for bisecting a regression, as start times are really fast.
(For example, you can pass the Linux kernel directly to a QEMU VM with
the `-kernel` switch.)
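
A minimal sketch of such a VM start, assuming the adapter sits at PCI address `0000:01:00.0` (a guess based on the interface name `enp1s0`), is already bound to `vfio-pci` on the host with the IOMMU enabled, and that a root file system image exists; the image name, root device and memory size are placeholders. By default the script only prints the command line so it can be reviewed first.

```shell
#!/bin/sh
# All three values are assumptions; override them from the environment.
NIC_BDF="${NIC_BDF:-0000:01:00.0}"          # PCI address of the 82599 port
KERNEL="${KERNEL:-arch/x86/boot/bzImage}"   # kernel built at this bisect step
ROOT_IMG="${ROOT_IMG:-rootfs.img}"          # hypothetical raw disk image

# run_vm: print the QEMU command line (DRY_RUN=1, the default) or execute it.
run_vm() {
    set -- qemu-system-x86_64 -enable-kvm -m 2G -smp 4 \
        -kernel "$KERNEL" \
        -append "root=/dev/vda console=ttyS0" \
        -drive "file=$ROOT_IMG,if=virtio,format=raw" \
        -device "vfio-pci,host=$NIC_BDF" \
        -nographic
    if [ "${DRY_RUN:-1}" = 1 ]; then
        # Display only: the -append value is shown without its quotes.
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

run_vm
```

Running it with `DRY_RUN=0` starts the VM; whether the regression reproduces under passthrough has to be checked once with a known-bad kernel before relying on this setup.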
Kind regards,
Paul
> [1]: https://bugzilla.kernel.org/
> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-19 10:17 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5 Paul Menzel
2023-01-19 10:22 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance " Paul Menzel
@ 2023-01-19 12:24 ` Bartek Kois
2023-01-19 16:58 ` Bartek Kois
1 sibling, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-19 12:24 UTC (permalink / raw)
To: Paul Menzel; +Cc: intel-wired-lan, regressions
W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>
> #regzbot ^introduced: 4.9.88..5.10.149
>
> Dear Bartek,
>
>
> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>
>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link
>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN based
>> 10G adapter) I am experiencing high cpu load (even if no traffic is
>> passing through the adapter) and network performance is low (when
>> network is connected).
>
> How do you test the network performance? Please give exact numbers for
> comparison.
>
I am using this server as a router for my subscribers with iptables (for
NAT and firewall) and hfsc (for QoS). I first encountered this problem
while migrating from Debian 9.7 to 11.5. Routers based on Supermicro
X11SSL-F (Intel® C232 chipset) work with no problems after that
migration, but routers based on Supermicro X9SCL (Intel C202 PCH) and
Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving strangely,
with high CPU load (0.5-0.8, while before it was around 0.0-0.1) and
subscribers not being able to utilize their plans. I tried to strip down
the problem and ended up with a clean system with no iptables or hfsc
rules behaving the same way (higher load) right after setting the 10G
link up, even if no traffic is passing through.
>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>> with no network attached. The problem can be observed on the
>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>
>> Tested environments:
>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no
>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F
>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>
>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL
>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH)
>> behave problematic as described above | newer platform: Supermicro
>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>
> Maybe create a bug at the Linux kernel bug tracker [1], where you can
> attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>
I've already reported that to the Debian team
(https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763), but so far
nobody has taken care of this issue.
>> So far to solve the problem I was trying to upgrade system to the
>> newest stable version, upgrade kernel to version 6.x, upgrade ixgbe
>> driver to the newest version but with no luck.
>
> Thank you for checking that. Too bad it’s still present. To rule out
> some user space problem, could you test Debian 9.7 with a stable Linux
> release, currently 6.1.7?
>
> What does `sudo perf top --sort comm,dso` show, where the time is spent?
During my first test in a real environment with subscribers I gathered
the following data through perf:
27.83% [kernel] [k] strncpy
14.80% [kernel] [k] nft_do_chain
7.61% [kernel] [k] memcmp
5.63% [kernel] [k] nft_meta_get_eval
3.14% [kernel] [k] nft_cmp_eval
2.79% [kernel] [k] asm_exc_nmi
1.07% [kernel] [k] module_get_kallsym
0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
0.85% [kernel] [k] ixgbe_poll
0.75% [kernel] [k] format_decode
0.61% [kernel] [k] number
0.56% [kernel] [k] menu_select
0.54% [kernel] [k] clflush_cache_range
0.52% [kernel] [k] cpuidle_enter_state
0.51% [kernel] [k] vsnprintf
0.50% [kernel] [k] u32_classify
0.49% [kernel] [k] fib_table_lookup
0.40% [kernel] [k] dma_pte_clear_level
0.39% [kernel] [k] domain_mapping
0.36% [kernel] [k] ixgbe_xmit_fram
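
To get a rough sense of how much of this profile falls under nftables, the percentages can be summed per symbol prefix. The sketch below does that for a few rows copied from the listing above; the `sum_prefix` helper is a hypothetical name, and it assumes the symbol is the last field of each row.

```shell
#!/bin/sh
# sum_prefix PREFIX: add up the leading percentage of every profile row on
# stdin whose symbol (last field) starts with PREFIX.
sum_prefix() {
    awk -v prefix="$1" '
        index($NF, prefix) == 1 { total += $1 + 0 }   # "14.80%" + 0 -> 14.8
        END { printf "%.2f\n", total }'
}

# Rows copied from the profile above (top entries only).
profile='27.83% [kernel] [k] strncpy
14.80% [kernel] [k] nft_do_chain
7.61% [kernel] [k] memcmp
5.63% [kernel] [k] nft_meta_get_eval
3.14% [kernel] [k] nft_cmp_eval'

printf '%s\n' "$profile" | sum_prefix nft_   # prints 23.57
```

The `nft_*` symbols alone account for 23.57% of the samples here; whether `strncpy` and `memcmp` are called from the same rule evaluation path is not visible in a flat profile and would need call graphs to confirm.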
    PID USER  PR  NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
     18 root  20   0      0     0    0 S 28.2  0.0 7:06.27 ksoftirqd/1
     12 root  20   0      0     0    0 R 12.0  0.0 4:10.88 ksoftirqd/0
     23 root  20   0      0     0    0 S  6.0  0.0 4:36.08 ksoftirqd/2
     28 root  20   0      0     0    0 S  5.3  0.0 6:46.47 ksoftirqd/3
 846449 root  20   0      0     0    0 I  1.0  0.0 0:01.61 kworker/0:0-events_power_efficient
     13 root  20   0      0     0    0 I  0.3  0.0 0:13.50 rcu_sched
   8264 root  20   0 101536  6944 4824 S  0.3  0.2 0:07.77 dhcpd
      1 root  20   0 164048 10184 7672 S  0.0  0.3 0:04.52 systemd
      2 root  20   0      0     0    0 S  0.0  0.0 0:00.00 kthreadd
      3 root   0 -20      0     0    0 I  0.0  0.0 0:00.00 rcu_gp
      4 root   0 -20      0     0    0 I  0.0  0.0 0:00.00 rcu_par_gp
      6 root   0 -20      0     0    0 I  0.0  0.0 0:00.00 kworker/0:0H-events_highpri
      9 root   0 -20      0     0    0 I  0.0  0.0 0:00.00 mm_percpu_wq
     10 root  20   0      0     0    0 S  0.0  0.0 0:00.00 rcu_tasks_rude_
     11 root  20   0      0     0    0 S  0.0  0.0 0:00.00 rcu_tasks_trace
     14 root  rt   0      0     0    0 S  0.0  0.0 0:00.26 migration/0
>
>> Supermicro support suggested as follows:
>> it might be kernel related debian 11.5 has kernel 5.10 which is a
>> recent kernel it might not properly support the chipsets for X9
>> therefore i suggest to use RHEL or CentOS as they use much older
>> kernel versions. I expect that with ubuntu 20.04 you see the same
>> problem it uses kernel 5.4
> Testing another GNU/Linux distribution for another data point, might
> be a good idea.
>
> As nobody has responded yet, bisecting the issue is probably the
> fastest way to get to the bottom of this. Luckily the problem seems
> reproducible and you seem to be able to build a Linux kernel yourself,
> so that should work. (For testing purposes you could also test with
> Ubuntu, as they provide Linux kernel builds for (almost) all releases
> in their Linux kernel mainline PPA [2].)
>
Of course I can try Ubuntu and report how it is working.
Best regards
Bartek Kois
>
> Kind regards,
>
> Paul
>
>
> [1]: https://bugzilla.kernel.org/
> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-19 12:24 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
@ 2023-01-19 16:58 ` Bartek Kois
2023-01-19 17:09 ` Paul Menzel
0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-19 16:58 UTC (permalink / raw)
To: Paul Menzel; +Cc: intel-wired-lan, regressions
W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>
> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>
>> #regzbot ^introduced: 4.9.88..5.10.149
>>
>> Dear Bartek,
>>
>>
>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>
>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link
>>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN
>>> based 10G adapter) I am experiencing high cpu load (even if no
>>> traffic is passing through the adapter) and network performance is
>>> low (when network is connected).
>>
>> How do you test the network performance? Please give exact numbers
>> for comparison.
>>
> I am using this server as a router for my subscribers with iptables
> (for NAT and firewall) and hfsc (for QoS). I first encountered this
> problem while migrating from Debian 9.7 to 11.5. Routers based on
> Supermicro X11SSL-F (Intel® C232 chipset) work with no problems after
> that migration, but routers based on Supermicro X9SCL (Intel C202 PCH)
> and Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
> strangely, with high CPU load (0.5-0.8, while before it was around
> 0.0-0.1) and subscribers not being able to utilize their plans. I
> tried to strip down the problem and ended up with a clean system with
> no iptables or hfsc rules behaving the same way (higher load) right
> after setting the 10G link up, even if no traffic is passing through.
>
>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>> with no network attached. The problem can be observed on the
>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>
>>> Tested environments:
>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no
>>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F
>>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>>
>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL
>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH)
>>> behave problematic as described above | newer platform: Supermicro
>>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>>
>> Maybe create a bug at the Linux kernel bug tracker [1], where you can
>> attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>
> I've already reported that to the Debian team
> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763), but so far
> nobody has taken care of this issue.
>
>>> So far to solve the problem I was trying to upgrade system to the
>>> newest stable version, upgrade kernel to version 6.x, upgrade ixgbe
>>> driver to the newest version but with no luck.
>>
>> Thank you for checking that. Too bad it’s still present. To rule out
>> some user space problem, could you test Debian 9.7 with a stable
>> Linux release, currently 6.1.7?
>>
>> What does `sudo perf top --sort comm,dso` show, where the time is spent?
>
> During my first test in a real environment with subscribers I gathered
> the following data through perf:
>
> 27.83% [kernel] [k] strncpy
> 14.80% [kernel] [k] nft_do_chain
> 7.61% [kernel] [k] memcmp
> 5.63% [kernel] [k] nft_meta_get_eval
> 3.14% [kernel] [k] nft_cmp_eval
> 2.79% [kernel] [k] asm_exc_nmi
> 1.07% [kernel] [k] module_get_kallsym
> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
> 0.85% [kernel] [k] ixgbe_poll
> 0.75% [kernel] [k] format_decode
> 0.61% [kernel] [k] number
> 0.56% [kernel] [k] menu_select
> 0.54% [kernel] [k] clflush_cache_range
> 0.52% [kernel] [k] cpuidle_enter_state
> 0.51% [kernel] [k] vsnprintf
> 0.50% [kernel] [k] u32_classify
> 0.49% [kernel] [k] fib_table_lookup
> 0.40% [kernel] [k] dma_pte_clear_level
> 0.39% [kernel] [k] domain_mapping
> 0.36% [kernel] [k] ixgbe_xmit_fram
>
>
>     PID USER  PR  NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
>      18 root  20   0      0     0    0 S 28.2  0.0 7:06.27 ksoftirqd/1
>      12 root  20   0      0     0    0 R 12.0  0.0 4:10.88 ksoftirqd/0
>      23 root  20   0      0     0    0 S  6.0  0.0 4:36.08 ksoftirqd/2
>      28 root  20   0      0     0    0 S  5.3  0.0 6:46.47 ksoftirqd/3
>  846449 root  20   0      0     0    0 I  1.0  0.0 0:01.61 kworker/0:0-events_power_efficient
>      13 root  20   0      0     0    0 I  0.3  0.0 0:13.50 rcu_sched
>    8264 root  20   0 101536  6944 4824 S  0.3  0.2 0:07.77 dhcpd
>       1 root  20   0 164048 10184 7672 S  0.0  0.3 0:04.52 systemd
>       2 root  20   0      0     0    0 S  0.0  0.0 0:00.00 kthreadd
>       3 root   0 -20      0     0    0 I  0.0  0.0 0:00.00 rcu_gp
>       4 root   0 -20      0     0    0 I  0.0  0.0 0:00.00 rcu_par_gp
>       6 root   0 -20      0     0    0 I  0.0  0.0 0:00.00 kworker/0:0H-events_highpri
>       9 root   0 -20      0     0    0 I  0.0  0.0 0:00.00 mm_percpu_wq
>      10 root  20   0      0     0    0 S  0.0  0.0 0:00.00 rcu_tasks_rude_
>      11 root  20   0      0     0    0 S  0.0  0.0 0:00.00 rcu_tasks_trace
>      14 root  rt   0      0     0    0 S  0.0  0.0 0:00.26 migration/0
>
>>
>>> Supermicro support suggested as follows:
>>> it might be kernel related debian 11.5 has kernel 5.10 which is a
>>> recent kernel it might not properly support the chipsets for X9
>>> therefore i suggest to use RHEL or CentOS as they use much older
>>> kernel versions. I expect that with ubuntu 20.04 you see the same
>>> problem it uses kernel 5.4
>> Testing another GNU/Linux distribution for another data point, might
>> be a good idea.
>>
>> As nobody has responded yet, bisecting the issue is probably the
>> fastest way to get to the bottom of this. Luckily the problem seems
>> reproducible and you seem to be able to build a Linux kernel
>> yourself, so that should work. (For testing purposes you could also
>> test with Ubuntu, as they provide Linux kernel builds for (almost)
>> all releases in their Linux kernel mainline PPA [2].)
>>
> Of course I can try Ubuntu and report how it is working.
>
Ubuntu (5.15.0-43-generic) seems to behave the same way, generating
higher load after executing "ip link set enp1s0 up".
Best regards
Bartek Kois
> Best regards
>
> Bartek Kois
>
>>
>> Kind regards,
>>
>> Paul
>>
>>
>> [1]: https://bugzilla.kernel.org/
>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-19 16:58 ` Bartek Kois
@ 2023-01-19 17:09 ` Paul Menzel
2023-01-19 17:17 ` Bartek Kois
0 siblings, 1 reply; 14+ messages in thread
From: Paul Menzel @ 2023-01-19 17:09 UTC (permalink / raw)
To: Bartek Kois; +Cc: intel-wired-lan, regressions
Dear Bartek,
Am 19.01.23 um 17:58 schrieb Bartek Kois:
> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>
>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>
>>> #regzbot ^introduced: 4.9.88..5.10.149
>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>
>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link
>>>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN
>>>> based 10G adapter) I am experiencing high cpu load (even if no
>>>> traffic is passing through the adapter) and network performance is
>>>> low (when network is connected).
>>>
>>> How do you test the network performance? Please give exact numbers
>>> for comparison.
>>>
>> I am using this server as a router for my subscribers with iptables
>> (for NAT and firewall) and hfsc (for QoS). I first encountered this
>> problem while migrating from Debian 9.7 to 11.5. Routers based on
>> Supermicro X11SSL-F (Intel® C232 chipset) work with no problems after
>> that migration, but routers based on Supermicro X9SCL (Intel C202 PCH)
>> and Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>> strangely, with high CPU load (0.5-0.8, while before it was around
>> 0.0-0.1) and subscribers not being able to utilize their plans. I
>> tried to strip down the problem and ended up with a clean system with
>> no iptables or hfsc rules behaving the same way (higher load) right
>> after setting the 10G link up, even if no traffic is passing through.
>>
>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>> with no network attached. The problem can be observed on the
>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>
>>>> Tested environments:
>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no
>>>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F
>>>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>>>
>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL
>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>> behave problematic as described above | newer platform: Supermicro
>>>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>>>
>>> Maybe create a bug at the Linux kernel bug tracker [1], where you can
>>> attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>
>> I've already reported that to the Debian team
>> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763), but so far
>> nobody has taken care of this issue.
>>
>>>> So far to solve the problem I was trying to upgrade system to the
>>>> newest stable version, upgrade kernel to version 6.x, upgrade ixgbe
>>>> driver to the newest version but with no luck.
>>>
>>> Thank you for checking that. Too bad it’s still present. To rule out
>>> some user space problem, could you test Debian 9.7 with a stable
>>> Linux release, currently 6.1.7?
>>>
>>> What does `sudo perf top --sort comm,dso` show, where the time is spent?
>>
>> During my first test in a real environment with subscribers I gathered
>> the following data through perf:
>>
>> 27.83% [kernel] [k] strncpy
>> 14.80% [kernel] [k] nft_do_chain
>> 7.61% [kernel] [k] memcmp
>> 5.63% [kernel] [k] nft_meta_get_eval
>> 3.14% [kernel] [k] nft_cmp_eval
>> 2.79% [kernel] [k] asm_exc_nmi
>> 1.07% [kernel] [k] module_get_kallsym
>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>> 0.85% [kernel] [k] ixgbe_poll
>> 0.75% [kernel] [k] format_decode
>> 0.61% [kernel] [k] number
>> 0.56% [kernel] [k] menu_select
>> 0.54% [kernel] [k] clflush_cache_range
>> 0.52% [kernel] [k] cpuidle_enter_state
>> 0.51% [kernel] [k] vsnprintf
>> 0.50% [kernel] [k] u32_classify
>> 0.49% [kernel] [k] fib_table_lookup
>> 0.40% [kernel] [k] dma_pte_clear_level
>> 0.39% [kernel] [k] domain_mapping
>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>
>>
>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
[…]
Do you see different behavior in `/proc/interrupts`?
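
One way to make that comparison concrete is to measure the interrupt rate of the adapter on both kernels. The sketch below sums the per-CPU counters of the interface's vectors twice and prints the rate; it assumes the vectors are named after the interface (for example `enp1s0-TxRx-0`), and the `sum_irqs` helper and the default interval are illustrative choices.

```shell
#!/bin/sh
IFACE="${IFACE:-enp1s0}"      # interface name used in this thread
INTERVAL="${INTERVAL:-1}"     # seconds between the two samples (assumed)

# sum_irqs NAME: total the per-CPU counters of every /proc/interrupts-style
# line on stdin whose last field contains NAME.
sum_irqs() {
    awk -v name="$1" '
        index($NF, name) > 0 {
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { printf "%d\n", total }'
}

if [ -r /proc/interrupts ]; then
    before=$(sum_irqs "$IFACE" < /proc/interrupts)
    sleep "$INTERVAL"
    after=$(sum_irqs "$IFACE" < /proc/interrupts)
    echo "$IFACE: $(( (after - before) / INTERVAL )) interrupts/s"
fi
```

Running it with the link up but idle on both the 4.9 and the 5.10 kernel would show whether the extra load comes with a higher interrupt rate.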
>>>> Supermicro support suggested as follows:
>>>> it might be kernel related debian 11.5 has kernel 5.10 which is a
>>>> recent kernel it might not properly support the chipsets for X9
>>>> therefore i suggest to use RHEL or CentOS as they use much older
>>>> kernel versions. I expect that with ubuntu 20.04 you see the same
>>>> problem it uses kernel 5.4
>>> Testing another GNU/Linux distribution for another data point, might
>>> be a good idea.
>>>
>>> As nobody has responded yet, bisecting the issue is probably the
>>> fastest way to get to the bottom of this. Luckily the problem seems
>>> reproducible and you seem to be able to build a Linux kernel
>>> yourself, so that should work. (For testing purposes you could also
>>> test with Ubuntu, as they provide Linux kernel builds for (almost)
>>> all releases in their Linux kernel mainline PPA [2].)
>>>
>> Of course I can try Ubuntu and report how it is working.
>>
> Ubuntu (5.15.0-43-generic) seems to behave the same way, generating
> higher load after executing "ip link set enp1s0 up".
That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 20.04
with Linux 5.4, and Ubuntu 18.04 with 4.15?
Anyway, I think you won’t get around bisecting. Another hint: make
sure that you can build a 4.9 Linux kernel yourself that does not
exhibit the issue.
Kind regards,
Paul
>>> [1]: https://bugzilla.kernel.org/
>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-19 17:09 ` Paul Menzel
@ 2023-01-19 17:17 ` Bartek Kois
2023-01-22 20:28 ` Paul Menzel
0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-19 17:17 UTC (permalink / raw)
To: Paul Menzel; +Cc: intel-wired-lan, regressions
W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
> Dear Bartek,
>
>
> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>
>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>
>>>> #regzbot ^introduced: 4.9.88..5.10.149
>
>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>
>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link
>>>>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN
>>>>> based 10G adapter) I am experiencing high cpu load (even if no
>>>>> traffic is passing through the adapter) and network performance is
>>>>> low (when network is connected).
>>>>
>>>> How do you test the network performance? Please give exact numbers
>>>> for comparison.
>>>>
>>> I am using this server as a router for my subscribers with iptables
>>> (for NAT and firewall) and hfsc (for QoS). I first encountered this
>>> problem while migrating from Debian 9.7 to 11.5. Routers based on
>>> Supermicro X11SSL-F (Intel® C232 chipset) work with no problems
>>> after that migration, but routers based on Supermicro X9SCL (Intel
>>> C202 PCH) and Supermicro X10SLL+-F (Intel C222 Express PCH) start
>>> behaving strangely, with high CPU load (0.5-0.8, while before it was
>>> around 0.0-0.1) and subscribers not being able to utilize their
>>> plans. I tried to strip down the problem and ended up with a clean
>>> system with no iptables or hfsc rules behaving the same way (higher
>>> load) right after setting the 10G link up, even if no traffic is
>>> passing through.
>>>
>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>> with no network attached. The problem can be observed on the
>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>
>>>>> Tested environments:
>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no
>>>>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F
>>>>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>
>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL
>>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>> behave problematic as described above | newer platform: Supermicro
>>>>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>>>>
>>>> Maybe create a bug at the Linux kernel bug tracker [1], where you
>>>> can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>
>>> I've already reported that to the Debian team
>>> (https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763), but so far
>>> nobody has taken care of this issue.
>>>
>>>>> So far to solve the problem I was trying to upgrade system to the
>>>>> newest stable version, upgrade kernel to version 6.x, upgrade
>>>>> ixgbe driver to the newest version but with no luck.
>>>>
>>>> Thank you for checking that. Too bad it’s still present. To rule
>>>> out some user space problem, could you test Debian 9.7 with a
>>>> stable Linux release, currently 6.1.7?
>>>>
>>>> What does `sudo perf top --sort comm,dso` show, where the time is
>>>> spent?
>>>
>>> During my first test in a real environment with subscribers I gathered
>>> the following data through perf:
>>>
>>> 27.83% [kernel] [k] strncpy
>>> 14.80% [kernel] [k] nft_do_chain
>>> 7.61% [kernel] [k] memcmp
>>> 5.63% [kernel] [k] nft_meta_get_eval
>>> 3.14% [kernel] [k] nft_cmp_eval
>>> 2.79% [kernel] [k] asm_exc_nmi
>>> 1.07% [kernel] [k] module_get_kallsym
>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>> 0.85% [kernel] [k] ixgbe_poll
>>> 0.75% [kernel] [k] format_decode
>>> 0.61% [kernel] [k] number
>>> 0.56% [kernel] [k] menu_select
>>> 0.54% [kernel] [k] clflush_cache_range
>>> 0.52% [kernel] [k] cpuidle_enter_state
>>> 0.51% [kernel] [k] vsnprintf
>>> 0.50% [kernel] [k] u32_classify
>>> 0.49% [kernel] [k] fib_table_lookup
>>> 0.40% [kernel] [k] dma_pte_clear_level
>>> 0.39% [kernel] [k] domain_mapping
>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>
>>>
>>>     PID USER  PR  NI   VIRT   RES  SHR S %CPU %MEM   TIME+ COMMAND
>>>      18 root  20   0      0     0    0 S 28.2  0.0 7:06.27 ksoftirqd/1
>>>      12 root  20   0      0     0    0 R 12.0  0.0 4:10.88 ksoftirqd/0
>
> […]
>
> Do you see different behavior in `/proc/interrupts`?
>
This is what it looks like for Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP
Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on Supermicro X10SLL+-F
(Intel C222 Express PCH):
1 root 20 0 163948 10288 7696 S 0.0 0.1 0:39.58 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.17 kthreadd
3 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_gp
4 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 rcu_par_gp
6 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/0:0H-kblockd
9 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 mm_percpu_wq
10 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tasks_rude_
11 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_tasks_trace
12 root 20 0 0 0 0 S 0.0 0.0 6:07.13 ksoftirqd/0
13 root 20 0 0 0 0 I 0.0 0.0 4:15.28 rcu_sched
14 root rt 0 0 0 0 S 0.0 0.0 0:03.20 migration/0
15 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
16 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
17 root rt 0 0 0 0 S 0.0 0.0 0:02.75 migration/1
18 root 20 0 0 0 0 S 0.0 0.0 4:35.84 ksoftirqd/1
20 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/1:0H-events_highpri
21 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2
22 root rt 0 0 0 0 S 0.0 0.0 0:01.37 migration/2
23 root 20 0 0 0 0 S 0.0 0.0 8:18.23 ksoftirqd/2
25 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/2:0H-events_highpri
26 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/3
27 root rt 0 0 0 0 S 0.0 0.0 0:01.76 migration/3
28 root 20 0 0 0 0 S 0.0 0.0 8:45.46 ksoftirqd/3
30 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/3:0H-events_highpri
31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/4
32 root rt 0 0 0 0 S 0.0 0.0 0:04.39 migration/4
33 root 20 0 0 0 0 S 0.0 0.0 3:44.08 ksoftirqd/4
35 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/4:0H-events_highpri
36 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/5
37 root rt 0 0 0 0 S 0.0 0.0 0:02.44 migration/5
38 root 20 0 0 0 0 S 0.0 0.0 4:04.34 ksoftirqd/5
40 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/5:0H-events_highpri
41 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/6
42 root rt 0 0 0 0 S 0.0 0.0 0:01.95 migration/6
43 root 20 0 0 0 0 S 0.0 0.0 3:35.38 ksoftirqd/6
45 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/6:0H-kblockd
46 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/7
47 root rt 0 0 0 0 S 0.0 0.0 0:01.07 migration/7
48 root 20 0 0 0 0 S 0.0 0.0 0:00.16 ksoftirqd/7
50 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/7:0H-kblockd
59 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs
60 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 netns
61 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kauditd
62 root 20 0 0 0 0 S 0.0 0.0 0:00.09 khungtaskd
63 root 20 0 0 0 0 S 0.0 0.0 0:00.00 oom_reaper
64 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 writeback
65 root 20 0 0 0 0 S 0.0 0.0 0:07.72 kcompactd0
66 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
67 root 39 19 0 0 0 S 0.0 0.0 0:01.19 khugepaged
85 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kintegrityd
86 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kblockd
87 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 blkcg_punt_bio
88 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 edac-poller
89 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 devfreq_wq
91 root 0 -20 0 0 0 I 0.0 0.0 0:02.57 kworker/1:1H-kblockd
92 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kswapd0
93 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kthrotld
94 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 acpi_thermal_pm
96 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 ipv6_addrconf
104 root 0 -20 0 0 0 I 0.0 0.0 0:00.68 kworker/2:1H-kblockd
109 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kstrp
112 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 zswap-shrink
113 root 0 -20 0 0 0 I 0.0 0.0 0:00.00 kworker/u17:0
and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
on Supermicro X10SLL+-F (Intel C222 Express PCH)
31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92 kworker/7:0
1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14 systemd
2 root 20 0 0 0 0 S 0.0 0.0 0:00.19 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:35.42 ksoftirqd/0
5 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/0:0H
7 root 20 0 0 0 0 S 0.0 0.0 2:36.16 rcu_sched
8 root 20 0 0 0 0 S 0.0 0.0 0:00.00 rcu_bh
9 root rt 0 0 0 0 S 0.0 0.0 0:00.28 migration/0
10 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 lru-add-drain
11 root rt 0 0 0 0 S 0.0 0.0 0:00.25 watchdog/0
12 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/0
13 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/1
14 root rt 0 0 0 0 S 0.0 0.0 0:00.31 watchdog/1
15 root rt 0 0 0 0 S 0.0 0.0 0:25.69 migration/1
16 root 20 0 0 0 0 S 0.0 0.0 1:10.62 ksoftirqd/1
18 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/1:0H
19 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/2
20 root rt 0 0 0 0 S 0.0 0.0 0:00.26 watchdog/2
21 root rt 0 0 0 0 S 0.0 0.0 0:10.18 migration/2
22 root 20 0 0 0 0 S 0.0 0.0 0:51.08 ksoftirqd/2
24 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/2:0H
25 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/3
26 root rt 0 0 0 0 S 0.0 0.0 0:00.23 watchdog/3
27 root rt 0 0 0 0 S 0.0 0.0 0:00.32 migration/3
28 root 20 0 0 0 0 S 0.0 0.0 0:48.46 ksoftirqd/3
30 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/3:0H
31 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/4
32 root rt 0 0 0 0 S 0.0 0.0 0:00.21 watchdog/4
33 root rt 0 0 0 0 S 0.0 0.0 0:00.25 migration/4
34 root 20 0 0 0 0 S 0.0 0.0 0:36.35 ksoftirqd/4
36 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/4:0H
37 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/5
38 root rt 0 0 0 0 S 0.0 0.0 0:00.22 watchdog/5
39 root rt 0 0 0 0 S 0.0 0.0 0:04.02 migration/5
40 root 20 0 0 0 0 S 0.0 0.0 0:41.43 ksoftirqd/5
42 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/5:0H
43 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/6
44 root rt 0 0 0 0 S 0.0 0.0 0:00.22 watchdog/6
45 root rt 0 0 0 0 S 0.0 0.0 0:01.53 migration/6
46 root 20 0 0 0 0 S 0.0 0.0 0:41.66 ksoftirqd/6
48 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/6:0H
49 root 20 0 0 0 0 S 0.0 0.0 0:00.00 cpuhp/7
50 root rt 0 0 0 0 S 0.0 0.0 0:00.24 watchdog/7
51 root rt 0 0 0 0 S 0.0 0.0 0:00.27 migration/7
52 root 20 0 0 0 0 S 0.0 0.0 0:46.13 ksoftirqd/7
54 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/7:0H
55 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kdevtmpfs
56 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 netns
57 root 20 0 0 0 0 S 0.0 0.0 0:00.07 khungtaskd
58 root 20 0 0 0 0 S 0.0 0.0 0:00.00 oom_reaper
59 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 writeback
60 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kcompactd0
62 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
63 root 39 19 0 0 0 S 0.0 0.0 0:00.00 khugepaged
64 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 crypto
65 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kintegrityd
66 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 bioset
67 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kblockd
75 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 devfreq_wq
76 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 watchdogd
77 root 20 0 0 0 0 S 0.0 0.0 0:00.00 kswapd0
78 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 vmstat
90 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kthrotld
91 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 ipv6_addrconf
121 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 acpi_thermal_pm
130 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 ata_sff
139 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 ixgbe
166 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_0
167 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 scsi_tmf_0
168 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_1
169 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 scsi_tmf_1
170 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_2
171 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 scsi_tmf_2
172 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_3
173 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 scsi_tmf_3
174 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_4
175 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 scsi_tmf_4
176 root 20 0 0 0 0 S 0.0 0.0 0:00.00 scsi_eh_5
>>>>> Supermicro support suggested as follows:
>>>>> it might be kernel related debian 11.5 has kernel 5.10 which is a
>>>>> recent kernel it might not properly support the chipsets for X9
>>>>> therefore i suggest to use RHEL or CentOS as they use much older
>>>>> kernel versions. I expect that with ubuntu 20.04 you see the same
>>>>> problem it uses kernel 5.4
>>>> Testing another GNU/Linux distribution for another data point might
>>>> be a good idea.
>>>>
>>>> As nobody has responded yet, bisecting the issue is probably the
>>>> fastest way to get to the bottom of this. Luckily the problem seems
>>>> reproducible and you seem to be able to build a Linux kernel
>>>> yourself, so that should work. (For testing purposes you could also
>>>> test with Ubuntu, as they provide Linux kernel builds for (almost)
>>>> all releases in their Linux kernel mainline PPA [2].)
>>>>
>>> Of course I can try Ubuntu and report how it is working.
>>>
>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>> generating higher load after executing "ip link set enp1s0 up".
>
> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 20.04
> with Linux 5.4, and Ubuntu 18.04 with 4.15?
>
> Anyway, I think you won’t get around bisecting. Another hint: make
> sure that you can build a 4.9 Linux kernel yourself that does not
> exhibit that issue.
>
That's right, it is 22.04. I don't have to build it. The standard kernel
Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems for the past
4 years.
Best regards
Bartek Kois
>
> Kind regards,
>
> Paul
>
>
>>>> [1]: https://bugzilla.kernel.org/
>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-19 17:17 ` Bartek Kois
@ 2023-01-22 20:28 ` Paul Menzel
2023-01-23 18:38 ` Bartek Kois
0 siblings, 1 reply; 14+ messages in thread
From: Paul Menzel @ 2023-01-22 20:28 UTC (permalink / raw)
To: Bartek Kois; +Cc: intel-wired-lan, regressions
Dear Bartek,
Am 19.01.23 um 18:17 schrieb Bartek Kois:
> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>
>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>
>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>
>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>
>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link
>>>>>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN
>>>>>> based 10G adapter) I am experiencing high cpu load (even if no
>>>>>> traffic is passing through the adapter) and network performance is
>>>>>> low (when network is connected).
>>>>>
>>>>> How do you test the network performance? Please give exact numbers
>>>>> for comparison.
>>>>>
>>>> I am using this server as a router for my subscribers with iptables
>>>> (for NAT and firewall) and hfsc (for QoS). I first encountered this
>>>> problem while migrating from Debian 9.7 to 11.5. Routers based on
>>>> Supermicro X11SSL-F (Intel® C232 chipset) work with no problems
>>>> after that migration, but routers based on Supermicro X9SCL (Intel
>>>> C202 PCH) and Supermicro X10SLL+-F (Intel C222 Express PCH) started
>>>> behaving strangely, with high cpu load (0.5-0.8, while before it was
>>>> around 0.0-0.1) and subscribers not being able to utilize their
>>>> plans. I tried to strip down the problem and ended up with a clean
>>>> system with no iptables or hfsc rules behaving the same (higher
>>>> load) right after setting the 10G link up, even if no traffic is
>>>> passing by.
>>>>
>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>> with no network attached. The problem can be observed on the
>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>
>>>>>> Tested environments:
>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no
>>>>>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F
>>>>>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>
>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL
>>>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>> behave problematic as described above | newer platform: Supermicro
>>>>>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>>>>>
>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where you
>>>>> can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>
>>>> I've already reported that to the Debian team at
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so far
>>>> nobody has taken care of this issue.
>>>>
>>>>>> So far to solve the problem I was trying to upgrade system to the
>>>>>> newest stable version, upgrade kernel to version 6.x, upgrade
>>>>>> ixgbe driver to the newest version but with no luck.
>>>>>
>>>>> Thank you for checking that. Too bad it’s still present. To rule
>>>>> out some user space problem, could you test Debian 9.7 with a
>>>>> stable Linux release, currently 6.1.7?
>>>>>
>>>>> What does `sudo perf top --sort comm,dso` show, where the time is
>>>>> spent?
>>>>
>>>> During my first test in a real environment with subscribers I gathered
>>>> the following data with perf:
>>>>
>>>> 27.83% [kernel] [k] strncpy
>>>> 14.80% [kernel] [k] nft_do_chain
>>>> 7.61% [kernel] [k] memcmp
>>>> 5.63% [kernel] [k] nft_meta_get_eval
>>>> 3.14% [kernel] [k] nft_cmp_eval
>>>> 2.79% [kernel] [k] asm_exc_nmi
>>>> 1.07% [kernel] [k] module_get_kallsym
>>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>>> 0.85% [kernel] [k] ixgbe_poll
>>>> 0.75% [kernel] [k] format_decode
>>>> 0.61% [kernel] [k] number
>>>> 0.56% [kernel] [k] menu_select
>>>> 0.54% [kernel] [k] clflush_cache_range
>>>> 0.52% [kernel] [k] cpuidle_enter_state
>>>> 0.51% [kernel] [k] vsnprintf
>>>> 0.50% [kernel] [k] u32_classify
>>>> 0.49% [kernel] [k] fib_table_lookup
>>>> 0.40% [kernel] [k] dma_pte_clear_level
>>>> 0.39% [kernel] [k] domain_mapping
>>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>>
>>>>
>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>>>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
>>
>> […]
>>
>> Do you see different behavior in `/proc/interrupts`?
>>
> This is what it looks like for Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP
> Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on Supermicro X10SLL+-F
> (Intel C222 Express PCH):
>
> 1 root 20 0 163948 10288 7696 S 0.0 0.1 0:39.58 systemd
[…]
The content of `/proc/interrupts` has a different format on my system.
```
$ head -3 /proc/interrupts
CPU0 CPU1 CPU2 CPU3
1: 55560 0 113 0 IR-IO-APIC 1-edge i8042
8: 0 0 0 0 IR-IO-APIC 8-edge rtc0
```
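To compare interrupt activity between the two kernels, the per-CPU counters in this file can be totalled per device and sampled twice. The following is only a sketch under assumptions: `irq_total` is a hypothetical helper, not something posted in this thread, and matching on `enp1s0` assumes the adapter's queue interrupts are named after the interface.

```shell
# irq_total PATTERN [FILE]
# Sum the per-CPU counters of every line in a /proc/interrupts-style
# file whose text matches PATTERN (an awk regular expression).
# FILE defaults to /proc/interrupts. Hypothetical helper for illustration.
irq_total() {
    awk -v pat="$1" '
        NR == 1 { ncpu = NF; next }      # header row: one column per CPU
        $0 ~ pat {
            # fields 2 .. ncpu+1 hold the per-CPU counts
            for (i = 2; i <= ncpu + 1; i++) total += $i
        }
        END { print total + 0 }          # prints 0 when nothing matched
    ' "${2:-/proc/interrupts}"
}
```

Calling `irq_total enp1s0`, waiting a fixed interval such as ten seconds, and calling it again gives the number of interrupts raised for the adapter in that interval. Doing this on both the 4.9 and the 5.10 kernel, with the link up and no traffic, would show whether the interrupt rate differs.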
[…]
> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
> on Supermicro X10SLL+-F (Intel C222 Express PCH)
>
> 31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92 kworker/7:0
> 1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14 systemd
[…]
>>>>>> Supermicro support suggested as follows:
>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which is a
>>>>>> recent kernel it might not properly support the chipsets for X9
>>>>>> therefore i suggest to use RHEL or CentOS as they use much older
>>>>>> kernel versions. I expect that with ubuntu 20.04 you see the same
>>>>>> problem it uses kernel 5.4
>>>>> Testing another GNU/Linux distribution for another data point might
>>>>> be a good idea.
>>>>>
>>>>> As nobody has responded yet, bisecting the issue is probably the
>>>>> fastest way to get to the bottom of this. Luckily the problem seems
>>>>> reproducible and you seem to be able to build a Linux kernel
>>>>> yourself, so that should work. (For testing purposes you could also
>>>>> test with Ubuntu, as they provide Linux kernel builds for (almost)
>>>>> all releases in their Linux kernel mainline PPA [2].)
>>>>>
>>>> Of course I can try Ubuntu and report how it is working.
>>>>
>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>> generating higher load after executing "ip link set enp1s0 up".
>>
>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 20.04
>> with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>
>> Anyway, I think you won’t get around bisecting. Another hint: make
>> sure that you can build a 4.9 Linux kernel yourself that does not
>> exhibit that issue.
>>
> That's right, it is 22.04. I don't have to build it. The standard kernel
> Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems for the past
> 4 years.
If none of the developers/maintainers steps up, you are on your own.
Again, as you can reproduce this easily, the fastest way is to bisect
the issue, which you can do on your own.
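A bisection over this range could look roughly as follows. This is a sketch under assumptions, not a tested recipe: the mainline tags `v4.9` and `v5.10` stand in for the Debian kernels, the helper names are hypothetical, and the 0.2 load threshold and 60-second settle time are illustrative values that would need tuning on the affected board.

```shell
# Typical manual session inside a Linux source tree:
#   git bisect start
#   git bisect bad v5.10      # assumed to show the high load
#   git bisect good v4.9      # assumed to behave like Debian 9.7
#   ... build and boot the proposed commit, run check_kernel, then
#   git bisect good           # or: git bisect bad
# Repeat until git names the first bad commit.

# load_exceeds LOAD LIMIT: succeed if LOAD is numerically above LIMIT.
load_exceeds() {
    awk -v load="$1" -v limit="$2" 'BEGIN { exit !(load > limit) }'
}

# check_kernel IFACE: bring the link up, let the load settle, and print
# "bad" or "good" for the currently booted kernel. Needs root.
check_kernel() {
    iface="$1"
    ip link set "$iface" up
    sleep 60                                   # assumed settle time
    load=$(cut -d ' ' -f 1 /proc/loadavg)      # 1-minute load average
    if load_exceeds "$load" 0.2; then          # assumed threshold
        echo "bad"
    else
        echo "good"
    fi
}
```

With roughly 13 to 17 build-and-boot rounds expected for a range of this size, each round only needs `check_kernel enp1s0` and the matching `git bisect good` or `git bisect bad`.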
Kind regards,
Paul
>>>>> [1]: https://bugzilla.kernel.org/
>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-22 20:28 ` Paul Menzel
@ 2023-01-23 18:38 ` Bartek Kois
2023-01-23 18:53 ` Paul Menzel
0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-23 18:38 UTC (permalink / raw)
To: Paul Menzel; +Cc: intel-wired-lan, regressions
W dniu 22.01.2023 o 21:28, Paul Menzel pisze:
> Dear Bartek,
>
>
> Am 19.01.23 um 18:17 schrieb Bartek Kois:
>> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>
>>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>>
>>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>>
>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>
>>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>>
>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load (even
>>>>>>> if no traffic is passing through the adapter) and network
>>>>>>> performance is low (when network is connected).
>>>>>>
>>>>>> How do you test the network performance? Please give exact
>>>>>> numbers for comparison.
>>>>>>
>>>>> I am using this server as a router for my subscribers with
>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>> encountered this problem while migrating from Debian 9.7 to 11.5.
>>>>> Routers based on Supermicro X11SSL-F (Intel® C232 chipset) work
>>>>> with no problems after that migration, but routers based on
>>>>> Supermicro X9SCL (Intel C202 PCH) and Supermicro X10SLL+-F (Intel
>>>>> C222 Express PCH) started behaving strangely, with high cpu load
>>>>> (0.5-0.8, while before it was around 0.0-0.1) and subscribers not
>>>>> being able to utilize their plans. I tried to strip down the
>>>>> problem and ended up with a clean system with no iptables or hfsc
>>>>> rules behaving the same (higher load) right after setting the 10G
>>>>> link up, even if no traffic is passing by.
>>>>>
>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>>> with no network attached. The problem can be observed on the
>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset) everything is working
>>>>>>> well.
>>>>>>>
>>>>>>> Tested environments:
>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with
>>>>>>> no problems: Supermicro X9SCL (Intel C202 PCH), Supermicro
>>>>>>> X10SLL+-F (Intel C222 Express PCH), Supermicro X11SSL-F (Intel®
>>>>>>> C232 chipset)]
>>>>>>
>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL
>>>>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>> behave problematic as described above | newer platform:
>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset) working well with no
>>>>>>> problems]
>>>>>>
>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where you
>>>>>> can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>
>>>>> I've already reported that to the Debian team at
>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so
>>>>> far nobody has taken care of this issue.
>>>>>
>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>
>>>>>> Thank you for checking that. Too bad it’s still present. To rule
>>>>>> out some user space problem, could you test Debian 9.7 with a
>>>>>> stable Linux release, currently 6.1.7?
>>>>>>
>>>>>> What does `sudo perf top --sort comm,dso` show, where the time is
>>>>>> spent?
>>>>>
>>>>> During my first test in a real environment with subscribers I gathered
>>>>> the following data with perf:
>>>>>
>>>>> 27.83% [kernel] [k] strncpy
>>>>> 14.80% [kernel] [k] nft_do_chain
>>>>> 7.61% [kernel] [k] memcmp
>>>>> 5.63% [kernel] [k] nft_meta_get_eval
>>>>> 3.14% [kernel] [k] nft_cmp_eval
>>>>> 2.79% [kernel] [k] asm_exc_nmi
>>>>> 1.07% [kernel] [k] module_get_kallsym
>>>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>>>> 0.85% [kernel] [k] ixgbe_poll
>>>>> 0.75% [kernel] [k] format_decode
>>>>> 0.61% [kernel] [k] number
>>>>> 0.56% [kernel] [k] menu_select
>>>>> 0.54% [kernel] [k] clflush_cache_range
>>>>> 0.52% [kernel] [k] cpuidle_enter_state
>>>>> 0.51% [kernel] [k] vsnprintf
>>>>> 0.50% [kernel] [k] u32_classify
>>>>> 0.49% [kernel] [k] fib_table_lookup
>>>>> 0.40% [kernel] [k] dma_pte_clear_level
>>>>> 0.39% [kernel] [k] domain_mapping
>>>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>>>
>>>>>
>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>>>>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
>>>
>>> […]
>>>
>>> Do you see different behavior in `/proc/interrupts`?
>>>
>> This is what it looks like for Debian 11.5 - Linux 5.10.0-19-amd64 #1
>> SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on Supermicro
>> X10SLL+-F (Intel C222 Express PCH):
>>
>> 1 root 20 0 163948 10288 7696 S 0.0 0.1 0:39.58 systemd
>
> […]
>
> The content of `/proc/interrupts` has a different format on my system.
>
> ```
> $ head -3 /proc/interrupts
> CPU0 CPU1 CPU2 CPU3
> 1: 55560 0 113 0 IR-IO-APIC 1-edge i8042
> 8: 0 0 0 0 IR-IO-APIC 8-edge rtc0
> ```
> […]
>
>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>
>> 31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92 kworker/7:0
>> 1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14 systemd
>
> […]
>>>>>>> Supermicro support suggested as follows:
>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which is
>>>>>>> a recent kernel it might not properly support the chipsets for
>>>>>>> X9 therefore i suggest to use RHEL or CentOS as they use much
>>>>>>> older kernel versions. I expect that with ubuntu 20.04 you see
>>>>>>> the same problem it uses kernel 5.4
>>>>>> Testing another GNU/Linux distribution for another data point
>>>>>> might be a good idea.
>>>>>>
>>>>>> As nobody has responded yet, bisecting the issue is probably the
>>>>>> fastest way to get to the bottom of this. Luckily the problem
>>>>>> seems reproducible and you seem to be able to build a Linux
>>>>>> kernel yourself, so that should work. (For testing purposes you
>>>>>> could also test with Ubuntu, as they provide Linux kernel builds
>>>>>> for (almost) all releases in their Linux kernel mainline PPA [2].)
>>>>>>
>>>>> Of course I can try Ubuntu and report how it is working.
>>>>>
>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>> generating higher load after executing "ip link set enp1s0 up".
>>>
>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>
>>> Anyway, I think you won’t get around bisecting. Another hint: make
>>> sure that you can build a 4.9 Linux kernel yourself that does not
>>> exhibit that issue.
>>>
>> That's right, it is 22.04. I don't have to build it. The standard
>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>> for the past 4 years.
>
> If none of the developers/maintainers steps up, you are on your own.
> Again, as you can reproduce this easily, the fastest way is to bisect
> the issue, which you can do on your own.
How can I investigate that further? I thought about changing some of
the parameters of the ixgbe driver and observing whether anything
changes, but when I try to run:
sudo modprobe ixgbe IntMode=0
I get the following error in the dmesg:
[ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
[ 2137.324848] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver
[ 2137.324848] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[ 2138.505751] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 4, Tx Queue count = 4 XDP Queue count = 0
[ 2138.506049] ixgbe 0000:02:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 2138.506134] ixgbe 0000:02:00.0: MAC: 2, PHY: 1, PBA No: 0210FF-0FF
[ 2138.506137] ixgbe 0000:02:00.0: ac:1f:6b:ab:fa:70
[ 2138.510537] ixgbe 0000:02:00.0 enp2s0: renamed from eth0
[ 2138.537452] ixgbe 0000:02:00.0: Intel(R) 10 Gigabit Network Connection
How should I use those parameters?
Best regards
Bartek Kois
>
> Kind regards,
>
> Paul
>
>
>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-23 18:38 ` Bartek Kois
@ 2023-01-23 18:53 ` Paul Menzel
2023-01-23 18:58 ` Bartek Kois
0 siblings, 1 reply; 14+ messages in thread
From: Paul Menzel @ 2023-01-23 18:53 UTC (permalink / raw)
To: Bartek Kois; +Cc: intel-wired-lan, regressions
Dear Bartek,
Am 23.01.23 um 19:38 schrieb Bartek Kois:
>
> W dniu 22.01.2023 o 21:28, Paul Menzel pisze:
>> Dear Bartek,
>>
>>
>> Am 19.01.23 um 18:17 schrieb Bartek Kois:
>>> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>>
>>>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>>>
>>>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>>>
>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>
>>>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>>>
>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load (even
>>>>>>>> if no traffic is passing through the adapter) and network
>>>>>>>> performance is low (when network is connected).
>>>>>>>
>>>>>>> How do you test the network performance? Please give exact
>>>>>>> numbers for comparison.
>>>>>>>
>>>>>> I am using this server as a router for my subscribers with
>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>> encountered this problem while migrating from Debian 9.7 to 11.5.
>>>>>> Routers based on Supermicro X11SSL-F (Intel® C232 chipset) work
>>>>>> with no problems after that migration, but routers based on
>>>>>> Supermicro X9SCL (Intel C202 PCH) and Supermicro X10SLL+-F (Intel
>>>>>> C222 Express PCH) started behaving strangely, with high cpu load
>>>>>> (0.5-0.8, while before it was around 0.0-0.1) and subscribers not
>>>>>> being able to utilize their plans. I tried to strip down the
>>>>>> problem and ended up with a clean system with no iptables or hfsc
>>>>>> rules behaving the same (higher load) right after setting the 10G
>>>>>> link up, even if no traffic is passing by.
>>>>>>
>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset) everything is working
>>>>>>>> well.
>>>>>>>>
>>>>>>>> Tested environments:
>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>>>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with
>>>>>>>> no problems: Supermicro X9SCL (Intel C202 PCH), Supermicro
>>>>>>>> X10SLL+-F (Intel C222 Express PCH), Supermicro X11SSL-F (Intel®
>>>>>>>> C232 chipset)]
>>>>>>>
>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL
>>>>>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>>> behave problematic as described above | newer platform:
>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset) working well with no
>>>>>>>> problems]
>>>>>>>
>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where you
>>>>>>> can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>
>>>>>> I've already reported that to the Debian team at
>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so
>>>>>> far nobody has taken care of this issue.
>>>>>>
>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>
>>>>>>> Thank you for checking that. Too bad it’s still present. To rule
>>>>>>> out some user space problem, could you test Debian 9.7 with a
>>>>>>> stable Linux release, currently 6.1.7?
>>>>>>>
>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time is
>>>>>>> spent?
>>>>>>
>>>>>> During my first test in a real environment with subscribers I gathered
>>>>>> the following data with perf:
>>>>>>
>>>>>> 27.83% [kernel] [k] strncpy
>>>>>> 14.80% [kernel] [k] nft_do_chain
>>>>>> 7.61% [kernel] [k] memcmp
>>>>>> 5.63% [kernel] [k] nft_meta_get_eval
>>>>>> 3.14% [kernel] [k] nft_cmp_eval
>>>>>> 2.79% [kernel] [k] asm_exc_nmi
>>>>>> 1.07% [kernel] [k] module_get_kallsym
>>>>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>>>>> 0.85% [kernel] [k] ixgbe_poll
>>>>>> 0.75% [kernel] [k] format_decode
>>>>>> 0.61% [kernel] [k] number
>>>>>> 0.56% [kernel] [k] menu_select
>>>>>> 0.54% [kernel] [k] clflush_cache_range
>>>>>> 0.52% [kernel] [k] cpuidle_enter_state
>>>>>> 0.51% [kernel] [k] vsnprintf
>>>>>> 0.50% [kernel] [k] u32_classify
>>>>>> 0.49% [kernel] [k] fib_table_lookup
>>>>>> 0.40% [kernel] [k] dma_pte_clear_level
>>>>>> 0.39% [kernel] [k] domain_mapping
>>>>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>>>>
>>>>>>
>>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>>>>>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
>>>>
>>>> […]
>>>>
>>>> Do you see different behavior in `/proc/interrupts`?
>>>>
>>> This is what it looks like for Debian 11.5 - Linux 5.10.0-19-amd64 #1
>>> SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on Supermicro
>>> X10SLL+-F (Intel C222 Express PCH):
>>>
>>> 1 root 20 0 163948 10288 7696 S 0.0 0.1 0:39.58 systemd
>>
>> […]
>>
>> The content of `/proc/interrupts` has a different format on my system.
>>
>> ```
>> $ head -3 /proc/interrupts
>> CPU0 CPU1 CPU2 CPU3
>> 1: 55560 0 113 0 IR-IO-APIC 1-edge i8042
>> 8: 0 0 0 0 IR-IO-APIC 8-edge rtc0
>> ```
>> […]
>>
>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>
>>> 31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92 kworker/7:0
>>> 1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14 systemd
>>
>> […]
>>>>>>>> Supermicro support suggested as follows:
>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which is
>>>>>>>> a recent kernel it might not properly support the chipsets for
>>>>>>>> X9 therefore i suggest to use RHEL or CentOS as they use much
>>>>>>>> older kernel versions. I expect that with ubuntu 20.04 you see
>>>>>>>> the same problem it uses kernel 5.4
>>>>>>> Testing another GNU/Linux distribution for another data point
>>>>>>> might be a good idea.
>>>>>>>
>>>>>>> As nobody has responded yet, bisecting the issue is probably the
>>>>>>> fastest way to get to the bottom of this. Luckily the problem
>>>>>>> seems reproducible and you seem to be able to build a Linux
>>>>>>> kernel yourself, so that should work. (For testing purposes you
>>>>>>> could also test with Ubuntu, as they provide Linux kernel builds
>>>>>>> for (almost) all releases in their Linux kernel mainline PPA [2].)
>>>>>>>
>>>>>> Of course I can try Ubuntu and report how it is working.
>>>>>>
>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>
>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>
>>>> Anyway, I think, you won’t come around bisecting. Another hint, make
>>>> sure that you can build a 4.9 Linux kernel yourself, that does not
>>>> exhibit that issue.
>>>>
>>> That`s right, it is 22.04. I don`t have to build it. Standard kernel
>>> Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems for past
>>> 4 years.
>>
>> If nobody of the developers/maintainers is going to step up, you are
>> on your own. Again, as you can reproduce this easily, the fastest way
>> is to bisect the issue, which you can do on your own.
>
> How can I investigate that further?
I repeat myself, please bisect the issue. It’s the fastest way.
> I thought about trying to change some
> of the parameters related to ixgbe driver and observe if anything is
> changing, but when I am trying to do:
>
> sudo modprobe ixgbe IntMode=0
>
> I get the following error in the dmesg:
>
> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
[…]
`modinfo ixgbe` shows the supported parameters.
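As a rough sketch of that check: the helper name `param_supported` below is made up for illustration, and it assumes that `modinfo -p` prints one `name:description (type)` line per parameter. With that, a parameter can be tested before it is passed to `modprobe`:

```sh
# Hypothetical helper: succeed if the parameter named in $1 appears in
# `modinfo -p` output read from stdin (one "name:description (type)" per line).
param_supported() {
    cut -d: -f1 | grep -qxF -- "$1"
}

# Intended use on the affected machine (not run here):
#   modinfo -p ixgbe | param_supported IntMode \
#       || echo "IntMode is not a parameter of this ixgbe build"
```

The dmesg line quoted above suggests that this check would report `IntMode` as missing for the driver build that is currently loaded.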
Kind regards,
Paul
PS: If you need help bisecting, please ask. Otherwise, I am out of this
thread.
>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-23 18:53 ` Paul Menzel
@ 2023-01-23 18:58 ` Bartek Kois
2023-01-23 19:03 ` Paul Menzel
0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-23 18:58 UTC (permalink / raw)
To: Paul Menzel; +Cc: intel-wired-lan, regressions
W dniu 23.01.2023 o 19:53, Paul Menzel pisze:
> Dear Bartek,
>
>
> Am 23.01.23 um 19:38 schrieb Bartek Kois:
>>
>> W dniu 22.01.2023 o 21:28, Paul Menzel pisze:
>>> Dear Bartek,
>>>
>>>
>>> Am 19.01.23 um 18:17 schrieb Bartek Kois:
>>>> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>>>
>>>>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>>>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>>>>
>>>>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>>>>
>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>
>>>>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>>>>
>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>
>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>> numbers for comparison.
>>>>>>>>
>>>>>>> I am using this server as a router for my subscribers with
>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). First I
>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>> chipset) works with no problems after that migration, but
>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) starts behaving
>>>>>>> strangely with high cpu load (0.5-0.8 while before it was around
>>>>>>> 0.0-0.1) and subscribers not being able to utilize their plans.
>>>>>>> I tried to strip down the problem and ends up with clean system
>>>>>>> with no iptables or hfsc rules behaving the same (higher load)
>>>>>>> right after setting the 10G link up even if no traffic is passing
>>>>>>> by.
>>>>>>>
>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>>> Supermicro
>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>
>>>>>>>>> Tested environments:
>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>>>>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with
>>>>>>>>> no problems: Supermicro X9SCL (Intel C202 PCH), Supermicro
>>>>>>>>> X10SLL+-F (Intel C222 Express PCH), Supermicro X11SSL-F
>>>>>>>>> (Intel® C232 chipset)]
>>>>>>>>
>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>> Express PCH) behave problematic as described above | newer
>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>> well with no problems]
>>>>>>>>
>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>
>>>>>>> I've already reported that to the Debian team at
>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>> nobody has taken care of this issue so far.
>>>>>>>
>>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>
>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>
>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>> is spent?
>>>>>>>
>>>>>>> During my first test in a real environment with subscribers I
>>>>>>> gathered the following data with perf:
>>>>>>>
>>>>>>> 27.83% [kernel] [k] strncpy
>>>>>>> 14.80% [kernel] [k] nft_do_chain
>>>>>>> 7.61% [kernel] [k] memcmp
>>>>>>> 5.63% [kernel] [k] nft_meta_get_eval
>>>>>>> 3.14% [kernel] [k] nft_cmp_eval
>>>>>>> 2.79% [kernel] [k] asm_exc_nmi
>>>>>>> 1.07% [kernel] [k] module_get_kallsym
>>>>>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>>>>>> 0.85% [kernel] [k] ixgbe_poll
>>>>>>> 0.75% [kernel] [k] format_decode
>>>>>>> 0.61% [kernel] [k] number
>>>>>>> 0.56% [kernel] [k] menu_select
>>>>>>> 0.54% [kernel] [k] clflush_cache_range
>>>>>>> 0.52% [kernel] [k] cpuidle_enter_state
>>>>>>> 0.51% [kernel] [k] vsnprintf
>>>>>>> 0.50% [kernel] [k] u32_classify
>>>>>>> 0.49% [kernel] [k] fib_table_lookup
>>>>>>> 0.40% [kernel] [k] dma_pte_clear_level
>>>>>>> 0.39% [kernel] [k] domain_mapping
>>>>>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>>>>>
>>>>>>>
>>>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>>>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>>>>>>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
>>>>>
>>>>> […]
>>>>>
>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>
>>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>
>>>> 1 root 20 0 163948 10288 7696 S 0.0 0.1 0:39.58
>>>> systemd
>>>
>>> […]
>>>
>>> The content of `/proc/interrupts` has a different format on my system.
>>>
>>> ```
>>> $ head -3 /proc/interrupts
>>> CPU0 CPU1 CPU2 CPU3
>>> 1: 55560 0 113 0 IR-IO-APIC 1-edge
>>> i8042
>>> 8: 0 0 0 0 IR-IO-APIC 8-edge
>>> rtc0
>>> ```
>>> […]
>>>
>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>
>>>> 31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92
>>>> kworker/7:0
>>>> 1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14
>>>> systemd
>>>
>>> […]
>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which
>>>>>>>>> is a recent kernel it might not properly support the chipsets
>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use
>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04
>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>> point, might be a good idea.
>>>>>>>>
>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>> purposes you could also test with Ubuntu, as they provide Linux
>>>>>>>> kernel builds for (almost) all releases in their Linux kernel
>>>>>>>> mainline PPA [2].)
>>>>>>>>
>>>>>>> Of course I can try Ubuntu and report how it is working.
>>>>>>>
>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>
>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>
>>>>> Anyway, I think, you won’t come around bisecting. Another hint,
>>>>> make sure that you can build a 4.9 Linux kernel yourself, that
>>>>> does not exhibit that issue.
>>>>>
>>>> That`s right, it is 22.04. I don`t have to build it. Standard
>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>>>> for past 4 years.
>>>
>>> If nobody of the developers/maintainers is going to step up, you are
>>> on your own. Again, as you can reproduce this easily, the fastest
>>> way is to bisect the issue, which you can do on your own.
>>
>> How can I investigate that further?
>
> I repeat myself, please bisect the issue. It’s the fastest way.
>
>> I thought about trying to change some of the parameters related to
>> ixgbe driver and observe if anything is changing, but when I am
>> trying to do:
>>
>> sudo modprobe ixgbe IntMode=0
>>
>> I get the following error in the dmesg:
>>
>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>
> […]
>
> `modinfo ixgbe` shows the supported parameters.
>
>
> Kind regards,
>
> Paul
>
>
> PS: If you need help bisecting, please ask. Otherwise, I am out of
> this thread.
OK, how exactly can I bisect this issue?
Best regards
Bartek Kois
>
>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-23 18:58 ` Bartek Kois
@ 2023-01-23 19:03 ` Paul Menzel
2023-01-24 9:33 ` Linux kernel regression tracking (Thorsten Leemhuis)
0 siblings, 1 reply; 14+ messages in thread
From: Paul Menzel @ 2023-01-23 19:03 UTC (permalink / raw)
To: Bartek Kois; +Cc: intel-wired-lan, regressions
Dear Bartek,
Am 23.01.23 um 19:58 schrieb Bartek Kois:
> W dniu 23.01.2023 o 19:53, Paul Menzel pisze:
>> Am 23.01.23 um 19:38 schrieb Bartek Kois:
>>>
>>> W dniu 22.01.2023 o 21:28, Paul Menzel pisze:
>>>> Dear Bartek,
>>>>
>>>>
>>>> Am 19.01.23 um 18:17 schrieb Bartek Kois:
>>>>> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>>>>
>>>>>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>>>>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>>>>>
>>>>>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>>>>>
>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>
>>>>>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>>>>>
>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>
>>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>>> numbers for comparison.
>>>>>>>>>
>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). First I
>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>> chipset) works with no problems after that migration, but
>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) starts behaving
>>>>>>>> strangely with high cpu load (0.5-0.8 while before it was around
>>>>>>>> 0.0-0.1) and subscribers not being able to utilize their plans.
>>>>>>>> I tried to strip down the problem and ends up with clean system
>>>>>>>> with no iptables or hfsc rules behaving the same (higher load)
>>>>>>>> right after setting the 10G link up even if no traffic is passing
>>>>>>>> by.
>>>>>>>>
>>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>>>> Supermicro
>>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>>
>>>>>>>>>> Tested environments:
>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1
>>>>>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with
>>>>>>>>>> no problems: Supermicro X9SCL (Intel C202 PCH), Supermicro
>>>>>>>>>> X10SLL+-F (Intel C222 Express PCH), Supermicro X11SSL-F
>>>>>>>>>> (Intel® C232 chipset)]
>>>>>>>>>
>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>>> Express PCH) behave problematic as described above | newer
>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>>> well with no problems]
>>>>>>>>>
>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>
>>>>>>>> I've already reported that to the Debian team at
>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>>> nobody has taken care of this issue so far.
>>>>>>>>
>>>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>>
>>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>
>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>>> is spent?
>>>>>>>>
>>>>>>>> During my first test in a real environment with subscribers I
>>>>>>>> gathered the following data with perf:
>>>>>>>>
>>>>>>>> 27.83% [kernel] [k] strncpy
>>>>>>>> 14.80% [kernel] [k] nft_do_chain
>>>>>>>> 7.61% [kernel] [k] memcmp
>>>>>>>> 5.63% [kernel] [k] nft_meta_get_eval
>>>>>>>> 3.14% [kernel] [k] nft_cmp_eval
>>>>>>>> 2.79% [kernel] [k] asm_exc_nmi
>>>>>>>> 1.07% [kernel] [k] module_get_kallsym
>>>>>>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>>>>>>> 0.85% [kernel] [k] ixgbe_poll
>>>>>>>> 0.75% [kernel] [k] format_decode
>>>>>>>> 0.61% [kernel] [k] number
>>>>>>>> 0.56% [kernel] [k] menu_select
>>>>>>>> 0.54% [kernel] [k] clflush_cache_range
>>>>>>>> 0.52% [kernel] [k] cpuidle_enter_state
>>>>>>>> 0.51% [kernel] [k] vsnprintf
>>>>>>>> 0.50% [kernel] [k] u32_classify
>>>>>>>> 0.49% [kernel] [k] fib_table_lookup
>>>>>>>> 0.40% [kernel] [k] dma_pte_clear_level
>>>>>>>> 0.39% [kernel] [k] domain_mapping
>>>>>>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>>>>>>
>>>>>>>>
>>>>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>>>>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>>>>>>>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
>>>>>>
>>>>>> […]
>>>>>>
>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>
>>>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>
>>>>> 1 root 20 0 163948 10288 7696 S 0.0 0.1 0:39.58
>>>>> systemd
>>>>
>>>> […]
>>>>
>>>> The content of `/proc/interrupts` has a different format on my system.
>>>>
>>>> ```
>>>> $ head -3 /proc/interrupts
>>>> CPU0 CPU1 CPU2 CPU3
>>>> 1: 55560 0 113 0 IR-IO-APIC 1-edge
>>>> i8042
>>>> 8: 0 0 0 0 IR-IO-APIC 8-edge
>>>> rtc0
>>>> ```
>>>> […]
>>>>
>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>
>>>>> 31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92
>>>>> kworker/7:0
>>>>> 1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14
>>>>> systemd
>>>>
>>>> […]
>>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which
>>>>>>>>>> is a recent kernel it might not properly support the chipsets
>>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use
>>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04
>>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>> point, might be a good idea.
>>>>>>>>>
>>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>>> purposes you could also test with Ubuntu, as they provide Linux
>>>>>>>>> kernel builds for (almost) all releases in their Linux kernel
>>>>>>>>> mainline PPA [2].)
>>>>>>>>>
>>>>>>>> Of course I can try Ubuntu and report how it is working.
>>>>>>>>
>>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>
>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>
>>>>>> Anyway, I think, you won’t come around bisecting. Another hint,
>>>>>> make sure that you can build a 4.9 Linux kernel yourself, that
>>>>>> does not exhibit that issue.
>>>>>>
>>>>> That`s right, it is 22.04. I don`t have to build it. Standard
>>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>>>>> for past 4 years.
>>>>
>>>> If nobody of the developers/maintainers is going to step up, you are
>>>> on your own. Again, as you can reproduce this easily, the fastest
>>>> way is to bisect the issue, which you can do on your own.
>>>
>>> How can I investigate that further?
>>
>> I repeat myself, please bisect the issue. It’s the fastest way.
>>
>>> I thought about trying to change some of the parameters related to
>>> ixgbe driver and observe if anything is changing, but when I am
>>> trying to do:
>>>
>>> sudo modprobe ixgbe IntMode=0
>>>
>>> I get the following error in the dmesg:
>>>
>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>
>> […]
>>
>> `modinfo ixgbe` shows the supported parameters.
>> PS: If you need help bisecting, please ask. Otherwise, I am out of
>> this thread.
>
> Ok, how exactly I can bisect this issue?
What have you tried so far? As I wrote earlier, I’d first try more
distributions, for example older Ubuntu versions. Once you have narrowed
the range, I’d use the kernels from the Ubuntu mainline PPA [2], then the
release candidates in between, and only then start `git bisect` as
described in the kernel documentation [3].
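The narrowing with prebuilt kernels is itself a bisection done by hand. A minimal sketch, assuming an ordered list of versions whose first entry is known good and whose last entry is known bad; the function name and the version list in the note below are illustrative and are not test results from this thread:

```sh
# Hypothetical helper: given versions ordered oldest to newest, with the first
# assumed good and the last assumed bad, print the one to test next.
next_candidate() {
    if [ "$#" -lt 3 ]; then
        echo "no untested version left between good and bad" >&2
        return 1
    fi
    # Pick the middle entry; shifting avoids needing arrays in POSIX sh.
    shift $(( ($# - 1) / 2 ))
    printf '%s\n' "$1"
}
```

For example, `next_candidate 4.9 4.15 5.4 5.10` prints `4.15`; if that kernel behaves well, the next call would be `next_candidate 4.15 5.4 5.10`. Once only two adjacent releases remain, `git bisect start`, `git bisect good <tag>` and `git bisect bad <tag>` continue the same search at commit level, as described in [3].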
Kind regards,
Paul
>>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
[3]: https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-23 19:03 ` Paul Menzel
@ 2023-01-24 9:33 ` Linux kernel regression tracking (Thorsten Leemhuis)
2023-01-24 9:40 ` Bartek Kois
0 siblings, 1 reply; 14+ messages in thread
From: Linux kernel regression tracking (Thorsten Leemhuis) @ 2023-01-24 9:33 UTC (permalink / raw)
To: Paul Menzel, Bartek Kois; +Cc: intel-wired-lan, regressions
On 23.01.23 20:03, Paul Menzel wrote:
> Am 23.01.23 um 19:58 schrieb Bartek Kois:
>
>> W dniu 23.01.2023 o 19:53, Paul Menzel pisze:
>
>>> Am 23.01.23 um 19:38 schrieb Bartek Kois:
>>>>
>>>> W dniu 22.01.2023 o 21:28, Paul Menzel pisze:
>>>>> Dear Bartek,
>>>>>
>>>>>
>>>>> Am 19.01.23 um 18:17 schrieb Bartek Kois:
>>>>>> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>>>>>
>>>>>>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>>>>>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>>>>>>
>>>>>>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>>>>>>
>>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>>
>>>>>>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>>>>>>
>>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>>
>>>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>>>> numbers for comparison.
>>>>>>>>>>
>>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). First I
>>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>>> chipset) works with no problems after that migration, but
>>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) starts behaving
>>>>>>>>> strangely with high cpu load (0.5-0.8 while before it was
>>>>>>>>> around 0.0-0.1) and subscribers not being able to utilize their
>>>>>>>>> plans. I tried to strip down the problem and ends up with clean
>>>>>>>>> system with no iptables or hfsc rules behaving the same (higher
>>>>>>>>> load) right after setting the 10G link up even if no traffic is
>>>>>>>>> passing by.
>>>>>>>>>
>>>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla
>>>>>>>>>>> system
>>>>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>>>>> Supermicro
>>>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>>>
>>>>>>>>>>> Tested environments:
>>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>>>>> 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux [all platforms
>>>>>>>>>>> working well with no problems: Supermicro X9SCL (Intel C202
>>>>>>>>>>> PCH), Supermicro X10SLL+-F (Intel C222 Express PCH),
>>>>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>>>>>>
>>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>>>> Express PCH) behave problematic as described above | newer
>>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>>>> well with no problems]
>>>>>>>>>>
>>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>>
>>>>>>>>> I've already reported that to the Debian team at
>>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>>>> nobody has taken care of this issue so far.
>>>>>>>>>
>>>>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>>>
>>>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>>
>>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>>>> is spent?
>>>>>>>>>
>>>>>>>>> During my first test in a real environment with subscribers I
>>>>>>>>> gathered the following data with perf:
>>>>>>>>>
>>>>>>>>> 27.83% [kernel] [k] strncpy
>>>>>>>>> 14.80% [kernel] [k] nft_do_chain
>>>>>>>>> 7.61% [kernel] [k] memcmp
>>>>>>>>> 5.63% [kernel] [k] nft_meta_get_eval
>>>>>>>>> 3.14% [kernel] [k] nft_cmp_eval
>>>>>>>>> 2.79% [kernel] [k] asm_exc_nmi
>>>>>>>>> 1.07% [kernel] [k] module_get_kallsym
>>>>>>>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>>>>>>>> 0.85% [kernel] [k] ixgbe_poll
>>>>>>>>> 0.75% [kernel] [k] format_decode
>>>>>>>>> 0.61% [kernel] [k] number
>>>>>>>>> 0.56% [kernel] [k] menu_select
>>>>>>>>> 0.54% [kernel] [k] clflush_cache_range
>>>>>>>>> 0.52% [kernel] [k] cpuidle_enter_state
>>>>>>>>> 0.51% [kernel] [k] vsnprintf
>>>>>>>>> 0.50% [kernel] [k] u32_classify
>>>>>>>>> 0.49% [kernel] [k] fib_table_lookup
>>>>>>>>> 0.40% [kernel] [k] dma_pte_clear_level
>>>>>>>>> 0.39% [kernel] [k] domain_mapping
>>>>>>>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>>>>>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>>>>>>>>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
>>>>>>>
>>>>>>> […]
>>>>>>>
>>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>>
>>>>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>>
>>>>>> 1 root 20 0 163948 10288 7696 S 0.0 0.1
>>>>>> 0:39.58 systemd
>>>>>
>>>>> […]
>>>>>
>>>>> The content of `/proc/interrupts` has a different format on my system.
>>>>>
>>>>> ```
>>>>> $ head -3 /proc/interrupts
>>>>> CPU0 CPU1 CPU2 CPU3
>>>>> 1: 55560 0 113 0 IR-IO-APIC 1-edge
>>>>> i8042
>>>>> 8: 0 0 0 0 IR-IO-APIC 8-edge
>>>>> rtc0
>>>>> ```
>>>>> […]
>>>>>
>>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>
>>>>>> 31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92
>>>>>> kworker/7:0
>>>>>> 1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14
>>>>>> systemd
>>>>>
>>>>> […]
>>>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which
>>>>>>>>>>> is a recent kernel it might not properly support the chipsets
>>>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use
>>>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04
>>>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>>> point, might be a good idea.
>>>>>>>>>>
>>>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>>>> purposes you could also test with Ubuntu, as they provide
>>>>>>>>>> Linux kernel builds for (almost) all releases in their Linux
>>>>>>>>>> kernel mainline PPA [2].)
>>>>>>>>>>
>>>>>>>>> Of course I can try Ubuntu and report how it is working.
>>>>>>>>>
>>>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>>
>>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>>
>>>>>>> Anyway, I think, you won’t come around bisecting. Another hint,
>>>>>>> make sure that you can build a 4.9 Linux kernel yourself, that
>>>>>>> does not exhibit that issue.
>>>>>>>
>>>>>> That`s right, it is 22.04. I don`t have to build it. Standard
>>>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>>>>>> for past 4 years.
>>>>>
>>>>> If nobody of the developers/maintainers is going to step up, you
>>>>> are on your own. Again, as you can reproduce this easily, the
>>>>> fastest way is to bisect the issue, which you can do on your own.
>>>>
>>>> How can I investigate that further?
>>>
>>> I repeat myself, please bisect the issue. It’s the fastest way.
>>>
>>>> I thought about trying to change some of the parameters related to
>>>> ixgbe driver and observe if anything is changing, but when I am
>>>> trying to do:
>>>>
>>>> sudo modprobe ixgbe IntMode=0
>>>>
>>>> I get the following error in the dmesg:
>>>>
>>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>>
>>> […]
>>>
>>> `modinfo ixgbe` shows the supported parameters.
>
>>> PS: If you need help bisecting, please ask. Otherwise, I am out of
>>> this thread.
>>
>> Ok, how exactly I can bisect this issue?
>
> What have you tried so far? As written in the past, I’d first try more
> distributions, for example, older Ubuntu versions. Then, if you have
> some range, I’d use the Ubuntu PPA, and then between the release
> candidate versions, only then start doing `git bisect` as documented in
> the documentation [3].
Hmmm. I'm not an expert in that area, but if you follow Paul's advice
keep in mind that a deliberate config change by the distro might have an
impact here. Hence it might be a good idea to rule that out first by
taking a config from a working kernel and using it (with the help of
"make olddefconfig") to build your own kernel from the version that is
known to fail. But over such a wide range of versions this can be
tricky. :-/
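To see whether the distro config changed in a relevant way before rebuilding, the two config files can be compared. A minimal sketch, assuming both configs are available as plain files (on Debian they are typically found under /boot, which is an assumption about the installation); the function name is made up, and the kernel tree's own `scripts/diffconfig` does a more thorough job:

```sh
# Hypothetical helper: list CONFIG_ options that differ between two kernel
# config files. "-" marks lines only in the first file, "+" only in the second.
# Options that are "not set" are comments in the file and so are not listed.
config_changes() {
    old=$(mktemp) || return 1
    new=$(mktemp) || { rm -f "$old"; return 1; }
    grep '^CONFIG_' "$1" | LC_ALL=C sort > "$old"
    grep '^CONFIG_' "$2" | LC_ALL=C sort > "$new"
    LC_ALL=C comm -23 "$old" "$new" | sed 's/^/-/'
    LC_ALL=C comm -13 "$old" "$new" | sed 's/^/+/'
    rm -f "$old" "$new"
}

# Reusing the old config for a newer source tree (paths are assumptions):
#   cp /boot/config-4.9.0-6-amd64 .config
#   make olddefconfig
```

`make olddefconfig` keeps the values from the copied `.config` and sets options that are new in the tree to their defaults, so the resulting kernel stays as close as possible to the working configuration.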
But apart from that Paul is right afaics: nobody has had an idea yet what
might cause this regression, hence we need a bisection to pinpoint the
problem.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
>>>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
> [3]: https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
>
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-24 9:33 ` Linux kernel regression tracking (Thorsten Leemhuis)
@ 2023-01-24 9:40 ` Bartek Kois
2023-03-23 13:46 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-24 9:40 UTC (permalink / raw)
To: Linux regressions mailing list, Paul Menzel; +Cc: intel-wired-lan
W dniu 24.01.2023 o 10:33, Linux kernel regression tracking (Thorsten
Leemhuis) pisze:
> On 23.01.23 20:03, Paul Menzel wrote:
>> On 23.01.23 at 19:58, Bartek Kois wrote:
>>
>>> On 23.01.2023 at 19:53, Paul Menzel wrote:
>>>> On 23.01.23 at 19:38, Bartek Kois wrote:
>>>>> On 22.01.2023 at 21:28, Paul Menzel wrote:
>>>>>> Dear Bartek,
>>>>>>
>>>>>>
>>>>>> On 19.01.23 at 18:17, Bartek Kois wrote:
>>>>>>> On 19.01.2023 at 18:09, Paul Menzel wrote:
>>>>>>>> On 19.01.23 at 17:58, Bartek Kois wrote:
>>>>>>>>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>>>>>>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>>>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>>>>>>>>
>>>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>>>>> numbers for comparison.
>>>>>>>>>>>
>>>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>>>> chipset) work with no problems after that migration, but
>>>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>>>>>>>>>> strangely, with high cpu load (0.5-0.8, while before it was
>>>>>>>>>> around 0.0-0.1) and subscribers not being able to utilize their
>>>>>>>>>> plans. I tried to strip down the problem and ended up with a clean
>>>>>>>>>> system with no iptables or hfsc rules behaving the same (higher
>>>>>>>>>> load) right after setting the 10G link up, even if no traffic is
>>>>>>>>>> passing by.
>>>>>>>>>>
>>>>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla
>>>>>>>>>>>> system
>>>>>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>>>>>> Supermicro
>>>>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>>>>
>>>>>>>>>>>> Tested environments:
>>>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>>>>>> 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux [all platforms
>>>>>>>>>>>> working well with no problems: Supermicro X9SCL (Intel C202
>>>>>>>>>>>> PCH), Supermicro X10SLL+-F (Intel C222 Express PCH),
>>>>>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>>>>> Express PCH) behave problematic as described above | newer
>>>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>>>>> well with no problems]
>>>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>>>
>>>>>>>>>> I've already reported that to the Debian team at
>>>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>>>>> nobody has taken care of this issue so far.
>>>>>>>>>>
>>>>>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>>>
>>>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>>>>> is spent?
>>>>>>>>>> During my first test in a real environment with subscribers I
>>>>>>>>>> gathered the following data through perf:
>>>>>>>>>>
>>>>>>>>>> 27.83% [kernel] [k] strncpy
>>>>>>>>>> 14.80% [kernel] [k] nft_do_chain
>>>>>>>>>> 7.61% [kernel] [k] memcmp
>>>>>>>>>> 5.63% [kernel] [k] nft_meta_get_eval
>>>>>>>>>> 3.14% [kernel] [k] nft_cmp_eval
>>>>>>>>>> 2.79% [kernel] [k] asm_exc_nmi
>>>>>>>>>> 1.07% [kernel] [k] module_get_kallsym
>>>>>>>>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>>>>>>>>> 0.85% [kernel] [k] ixgbe_poll
>>>>>>>>>> 0.75% [kernel] [k] format_decode
>>>>>>>>>> 0.61% [kernel] [k] number
>>>>>>>>>> 0.56% [kernel] [k] menu_select
>>>>>>>>>> 0.54% [kernel] [k] clflush_cache_range
>>>>>>>>>> 0.52% [kernel] [k] cpuidle_enter_state
>>>>>>>>>> 0.51% [kernel] [k] vsnprintf
>>>>>>>>>> 0.50% [kernel] [k] u32_classify
>>>>>>>>>> 0.49% [kernel] [k] fib_table_lookup
>>>>>>>>>> 0.40% [kernel] [k] dma_pte_clear_level
>>>>>>>>>> 0.39% [kernel] [k] domain_mapping
>>>>>>>>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>>>>>>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>>>>>>>>>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
>>>>>>>> […]
>>>>>>>>
>>>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>>>
>>>>>>> This is what it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>>>
>>>>>>> 1 root 20 0 163948 10288 7696 S 0.0 0.1 0:39.58 systemd
>>>>>> […]
>>>>>>
>>>>>> The content of `/proc/interrupts` has a different format on my system.
>>>>>>
>>>>>> ```
>>>>>> $ head -3 /proc/interrupts
>>>>>> CPU0 CPU1 CPU2 CPU3
>>>>>> 1: 55560 0 113 0 IR-IO-APIC 1-edge i8042
>>>>>> 8: 0 0 0 0 IR-IO-APIC 8-edge rtc0
>>>>>> ```
>>>>>> […]
>>>>>>
>>>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>>
>>>>>>> 31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92 kworker/7:0
>>>>>>> 1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14 systemd
>>>>>> […]
>>>>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which
>>>>>>>>>>>> is a recent kernel it might not properly support the chipsets
>>>>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use
>>>>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04
>>>>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>>>> point, might be a good idea.
>>>>>>>>>>>
>>>>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>>>>> purposes you could also test with Ubuntu, as they provide
>>>>>>>>>>> Linux kernel builds for (almost) all releases in their Linux
>>>>>>>>>>> kernel mainline PPA [2].)
>>>>>>>>>>>
>>>>>>>>>> Of course I can try Ubuntu and report how it is working.
>>>>>>>>>>
>>>>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>>>
>>>>>>>> Anyway, I think you won’t get around bisecting. Another hint:
>>>>>>>> make sure that you can build a 4.9 Linux kernel yourself that
>>>>>>>> does not exhibit that issue.
>>>>>>>>
>>>>>>> That's right, it is 22.04. I don't have to build it. The standard
>>>>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>>>>>>> for the past 4 years.
>>>>>> If none of the developers/maintainers is going to step up, you
>>>>>> are on your own. Again, as you can reproduce this easily, the
>>>>>> fastest way is to bisect the issue, which you can do on your own.
>>>>> How can I investigate that further?
>>>> I repeat myself, please bisect the issue. It’s the fastest way.
>>>>
>>>>> I thought about trying to change some of the parameters related to
>>>>> ixgbe driver and observe if anything is changing, but when I am
>>>>> trying to do:
>>>>>
>>>>> sudo modprobe ixgbe IntMode=0
>>>>>
>>>>> I get the following error in the dmesg:
>>>>>
>>>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>>> […]
>>>>
>>>> `modinfo ixgbe` shows the supported parameters.
>>>> PS: If you need help bisecting, please ask. Otherwise, I am out of
>>>> this thread.
>>> Ok, how exactly can I bisect this issue?
>> What have you tried so far? As written in the past, I’d first try more
>> distributions, for example, older Ubuntu versions. Then, if you have
>> some range, I’d use the Ubuntu PPA, and then between the release
>> candidate versions, only then start doing `git bisect` as documented in
>> the documentation [3].
> Hmmm. I'm not an expert in that area, but if you follow Paul's advice
> keep in mind that a deliberate config change by the distro might have an
> impact here. Hence it might be a good idea to rule that out first by
> taking a config from a working kernel and using it (with the help of
> "make olddefconfig") to build your own kernel from the version that is
> known to fail. But over such a wide range of versions this can be
> tricky. :-/
>
> But apart from that Paul is right afaics: nobody has had an idea yet what
> might cause this regression, hence we need a bisection to pinpoint the
> problem.
Thanks for the advice. I'll try my best to find out which commit caused
the problem, but it will take me some time, as I have never done
bisecting, especially on that scale. What puzzles me most is that
nobody has reported this issue so far, given that these platforms
together with Debian and the Intel 82599EN NIC are, I think, quite a
common configuration.
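As a rough guide to the scale: each bisection step halves the remaining range, so the number of kernels to build and test grows only with the logarithm of the commit count. A small sketch follows; `bisect_steps` is a hypothetical helper, and the commit count in the usage note is a placeholder rather than a measured figure.

```shell
#!/bin/sh
# Estimate how many build-and-test rounds a bisection over $1 commits needs,
# by halving the range (rounding up) until a single commit is left.
bisect_steps() {
    n=$1
    steps=0
    while [ "$n" -gt 1 ]; do
        n=$(( (n + 1) / 2 ))
        steps=$(( steps + 1 ))
    done
    echo "$steps"
}
```

In a real tree, `bisect_steps "$(git rev-list --count v4.9..v5.10)"` would print the estimate for that range; a range of one million commits, for instance, needs about 20 steps.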
Best regards
Bartek Kois
> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
>
>
>>>>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
>> [3]: https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
>>
>>
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
2023-01-24 9:40 ` Bartek Kois
@ 2023-03-23 13:46 ` Linux regression tracking (Thorsten Leemhuis)
0 siblings, 0 replies; 14+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-03-23 13:46 UTC (permalink / raw)
To: Bartek Kois, Linux regressions mailing list, Paul Menzel; +Cc: intel-wired-lan
On 24.01.23 10:40, Bartek Kois wrote:
> On 24.01.2023 at 10:33, Linux kernel regression tracking (Thorsten
> Leemhuis) wrote:
>> On 23.01.23 20:03, Paul Menzel wrote:
>>> On 23.01.23 at 19:58, Bartek Kois wrote:
>>>> On 23.01.2023 at 19:53, Paul Menzel wrote:
>>>>> On 23.01.23 at 19:38, Bartek Kois wrote:
>>>>>> On 22.01.2023 at 21:28, Paul Menzel wrote:
>>>>>>> On 19.01.23 at 18:17, Bartek Kois wrote:
>>>>>>>> On 19.01.2023 at 18:09, Paul Menzel wrote:
>>>>>>>>> On 19.01.23 at 17:58, Bartek Kois wrote:
>>>>>>>>>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>>>>>>>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>>>>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>>>>>> numbers for comparison.
>>>>>>>>>>>>
>>>>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>>>>> chipset) work with no problems after that migration, but
>>>>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>>>>>>>>>>> strangely, with high cpu load (0.5-0.8, while before it was
>>>>>>>>>>> around 0.0-0.1) and subscribers not being able to utilize their
>>>>>>>>>>> plans. I tried to strip down the problem and ended up with a clean
>>>>>>>>>>> system with no iptables or hfsc rules behaving the same (higher
>>>>>>>>>>> load) right after setting the 10G link up, even if no traffic is
>>>>>>>>>>> passing by.
>>>>>>>>>>>
>>>>>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla
>>>>>>>>>>>>> system
>>>>>>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>>>>>>> Supermicro
>>>>>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tested environments:
>>>>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>>>>>>> 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux [all platforms
>>>>>>>>>>>>> working well with no problems: Supermicro X9SCL (Intel C202
>>>>>>>>>>>>> PCH), Supermicro X10SLL+-F (Intel C222 Express PCH),
>>>>>>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>>>>>> Express PCH) behave problematic as described above | newer
>>>>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>>>>>> well with no problems]
>>>>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>>>>
>>>>>>>>>>> I've already reported that to the Debian team at
>>>>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>>>>>> nobody has taken care of this issue so far.
>>>>>>>>>>>
>>>>>>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>>>>
>>>>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>>>>>> is spent?
>>>>>>>>>>> During my first test in a real environment with subscribers I
>>>>>>>>>>> gathered the following data through perf:
>>>>>>>>>>>
>>>>>>>>>>> 27.83% [kernel] [k] strncpy
>>>>>>>>>>> 14.80% [kernel] [k] nft_do_chain
>>>>>>>>>>> 7.61% [kernel] [k] memcmp
>>>>>>>>>>> 5.63% [kernel] [k] nft_meta_get_eval
>>>>>>>>>>> 3.14% [kernel] [k] nft_cmp_eval
>>>>>>>>>>> 2.79% [kernel] [k] asm_exc_nmi
>>>>>>>>>>> 1.07% [kernel] [k] module_get_kallsym
>>>>>>>>>>> 0.92% [kernel] [k] kallsyms_expand_symbol.constprop.0
>>>>>>>>>>> 0.85% [kernel] [k] ixgbe_poll
>>>>>>>>>>> 0.75% [kernel] [k] format_decode
>>>>>>>>>>> 0.61% [kernel] [k] number
>>>>>>>>>>> 0.56% [kernel] [k] menu_select
>>>>>>>>>>> 0.54% [kernel] [k] clflush_cache_range
>>>>>>>>>>> 0.52% [kernel] [k] cpuidle_enter_state
>>>>>>>>>>> 0.51% [kernel] [k] vsnprintf
>>>>>>>>>>> 0.50% [kernel] [k] u32_classify
>>>>>>>>>>> 0.49% [kernel] [k] fib_table_lookup
>>>>>>>>>>> 0.40% [kernel] [k] dma_pte_clear_level
>>>>>>>>>>> 0.39% [kernel] [k] domain_mapping
>>>>>>>>>>> 0.36% [kernel] [k] ixgbe_xmit_fram
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
>>>>>>>>>>> 18 root 20 0 0 0 0 S 28.2 0.0 7:06.27 ksoftirqd/1
>>>>>>>>>>> 12 root 20 0 0 0 0 R 12.0 0.0 4:10.88 ksoftirqd/0
>>>>>>>>> […]
>>>>>>>>>
>>>>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>>>>
>>>>>>>> This is what it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>>>>
>>>>>>>> 1 root 20 0 163948 10288 7696 S 0.0 0.1 0:39.58 systemd
>>>>>>> […]
>>>>>>>
>>>>>>> The content of `/proc/interrupts` has a different format on my
>>>>>>> system.
>>>>>>>
>>>>>>> ```
>>>>>>> $ head -3 /proc/interrupts
>>>>>>> CPU0 CPU1 CPU2 CPU3
>>>>>>> 1: 55560 0 113 0 IR-IO-APIC 1-edge i8042
>>>>>>> 8: 0 0 0 0 IR-IO-APIC 8-edge rtc0
>>>>>>> ```
>>>>>>> […]
>>>>>>>
>>>>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>>>
>>>>>>>> 31659 root 20 0 0 0 0 S 0.3 0.0 0:00.92 kworker/7:0
>>>>>>>> 1 root 20 0 57032 6736 5256 S 0.0 0.1 2:28.14 systemd
>>>>>>> […]
>>>>>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which
>>>>>>>>>>>>> is a recent kernel it might not properly support the chipsets
>>>>>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use
>>>>>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04
>>>>>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>>>>> point, might be a good idea.
>>>>>>>>>>>>
>>>>>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>>>>>> purposes you could also test with Ubuntu, as they provide
>>>>>>>>>>>> Linux kernel builds for (almost) all releases in their Linux
>>>>>>>>>>>> kernel mainline PPA [2].)
>>>>>>>>>>>>
>>>>>>>>>>> Of course I can try Ubuntu and report how it is working.
>>>>>>>>>>>
>>>>>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>>>>
>>>>>>>>> Anyway, I think you won’t get around bisecting. Another hint:
>>>>>>>>> make sure that you can build a 4.9 Linux kernel yourself that
>>>>>>>>> does not exhibit that issue.
>>>>>>>>>
>>>>>>>> That's right, it is 22.04. I don't have to build it. The standard
>>>>>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>>>>>>>> for the past 4 years.
>>>>>>> If none of the developers/maintainers is going to step up, you
>>>>>>> are on your own. Again, as you can reproduce this easily, the
>>>>>>> fastest way is to bisect the issue, which you can do on your own.
>>>>>> How can I investigate that further?
>>>>> I repeat myself, please bisect the issue. It’s the fastest way.
>>>>>
>>>>>> I thought about trying to change some of the parameters related to
>>>>>> ixgbe driver and observe if anything is changing, but when I am
>>>>>> trying to do:
>>>>>>
>>>>>> sudo modprobe ixgbe IntMode=0
>>>>>>
>>>>>> I get the following error in the dmesg:
>>>>>>
>>>>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>>>> […]
>>>>>
>>>>> `modinfo ixgbe` shows the supported parameters.
>>>>> PS: If you need help bisecting, please ask. Otherwise, I am out of
>>>>> this thread.
>>>> Ok, how exactly can I bisect this issue?
>>> What have you tried so far? As written in the past, I’d first try more
>>> distributions, for example, older Ubuntu versions. Then, if you have
>>> some range, I’d use the Ubuntu PPA, and then between the release
>>> candidate versions, only then start doing `git bisect` as documented in
>>> the documentation [3].
>> Hmmm. I'm not an expert in that area, but if you follow Paul's advice
>> keep in mind that a deliberate config change by the distro might have an
>> impact here. Hence it might be a good idea to rule that out first by
>> taking a config from a working kernel and using it (with the help of
>> "make olddefconfig") to build your own kernel from the version that is
>> known to fail. But over such a wide range of versions this can be
>> tricky. :-/
>>
>> But apart from that Paul is right afaics: nobody has had an idea yet what
>> might cause this regression, hence we need a bisection to pinpoint the
>> problem.
>
> Thanks for the advice. I'll try my best to find out which commit caused
> the problem, but it will take me some time, as I have never done
> bisecting, especially on that scale.
Did you ever get closer to the root of the problem?
> What puzzles me most is that
> nobody has reported this issue so far, given that these platforms
> together with Debian and the Intel 82599EN NIC are, I think, quite a
> common configuration.
I guess the answer is the usual: the problem only shows up in some
environments using that NIC -- for example if the firmware of the
motherboard or the configuration somehow directly or indirectly triggers
the problem.
Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.
P.S.:
#regzbot backburner: need bisection that will take some time to get done
#regzbot poke
end of thread, other threads:[~2023-03-23 13:46 UTC | newest]
Thread overview: 14+ messages
[not found] <d1530cba-1a72-cae8-6a04-ed8ec0f82e6e@gmail.com>
2023-01-19 10:17 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5 Paul Menzel
2023-01-19 10:22 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance " Paul Menzel
2023-01-19 12:24 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
2023-01-19 16:58 ` Bartek Kois
2023-01-19 17:09 ` Paul Menzel
2023-01-19 17:17 ` Bartek Kois
2023-01-22 20:28 ` Paul Menzel
2023-01-23 18:38 ` Bartek Kois
2023-01-23 18:53 ` Paul Menzel
2023-01-23 18:58 ` Bartek Kois
2023-01-23 19:03 ` Paul Menzel
2023-01-24 9:33 ` Linux kernel regression tracking (Thorsten Leemhuis)
2023-01-24 9:40 ` Bartek Kois
2023-03-23 13:46 ` Linux regression tracking (Thorsten Leemhuis)