regressions.lists.linux.dev archive mirror
* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
       [not found] <d1530cba-1a72-cae8-6a04-ed8ec0f82e6e@gmail.com>
@ 2023-01-19 10:17 ` Paul Menzel
  2023-01-19 10:22   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance " Paul Menzel
  2023-01-19 12:24   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
  0 siblings, 2 replies; 14+ messages in thread
From: Paul Menzel @ 2023-01-19 10:17 UTC (permalink / raw)
  To: Bartek Kois; +Cc: intel-wired-lan, regressions


#regzbot ^introduced: 4.9.88..5.10.149

Dear Bartek,


On 14.01.23 at 11:23, Bartek Kois wrote:

> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link set 
> enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN based 10G 
> adapter) I am experiencing high cpu load (even if no traffic is passing 
> through the adapter) and network performance is low (when network is 
> connected).

How do you test the network performance? Please give exact numbers for 
comparison.

> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
> with no network attached. The problem can be observed on the 
> following platforms: Supermicro X9SCL (Intel C202 PCH) and
> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
> X11SSL-F (Intel® C232 chipset) everything is working well.
> 
> Tested environments:
> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no 
> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel 
> C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]

> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
> (2022-10-21) x86_64 GNU/Linux  [older platforms: Supermicro X9SCL (Intel 
> C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) behave 
> problematic as described above | newer platform: Supermicro X11SSL-F 
> (Intel® C232 chipset) working well with no problems]

Maybe create a bug at the Linux kernel bug tracker [1], where you can 
attach all the logs (`dmesg`, `lspci -nnk -s …`, …).

> So far to solve the problem I was trying to upgrade system to the newest 
> stable version, upgrade kernel to version 6.x, upgrade ixgbe driver to 
> the newest version but with no luck.

Thank you for checking that. Too bad it’s still present. To rule out 
some user space problem, could you test Debian 9.7 with a stable Linux 
release, currently 6.1.7?

What does `sudo perf top --sort comm,dso` show, where the time is spent?

> Supermicro support suggested as follows:
> it might be kernel related debian 11.5 has kernel 5.10 which is a 
> recent kernel it might not properly support the chipsets for X9 
> therefore i suggest to use RHEL or CentOS as they use much older kernel 
> versions. I expect that with ubuntu 20.04 you see the same problem it 
> uses kernel 5.4

Testing another GNU/Linux distribution, to get another data point, might
be a good idea.

As nobody has responded yet, bisecting the issue is probably the fastest 
way to get to the bottom of this. Luckily the problem seems reproducible 
and you seem to be able to build a Linux kernel yourself, so that should 
work. (For testing purposes you could also test with Ubuntu, as they 
provide Linux kernel builds for (almost) all releases in their Linux 
kernel mainline PPA [2].)
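
As a rough sketch of how such a bisection could be driven (an illustration only, not a tested procedure): it assumes a clone of the mainline tree, that the tags v4.9 and v5.10 approximate the good Debian 4.9.88 and bad Debian 5.10.149 kernels, and that an idle 1-minute load average above 0.10 marks a bad kernel. The helper name `classify_load` and the threshold are made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: classify a 1-minute load average as "good" or "bad".
# The 0.10 default threshold is an assumption based on the numbers reported
# in this thread (about 0.0-0.1 before, 0.1-0.3 idle afterwards).
classify_load() {
    load=$1
    threshold=${2:-0.10}
    # awk does the floating-point comparison; plain sh cannot.
    awk -v l="$load" -v t="$threshold" \
        'BEGIN { if (l + 0 > t + 0) print "bad"; else print "good" }'
}

# Manual bisect loop, run inside a clone of the mainline Linux tree:
#   git bisect start
#   git bisect bad v5.10
#   git bisect good v4.9
#   ... build, install and boot the kernel that git checked out ...
#   ip link set enp1s0 up; sleep 300
#   git bisect "$(classify_load "$(cut -d' ' -f1 /proc/loadavg)")"
# Repeat from the build step until git names the first bad commit.
```

The load average needs a few minutes to settle after boot, so the sleep before reading `/proc/loadavg` matters; a commit that fails to build or boot can be skipped with `git bisect skip`.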


Kind regards,

Paul


[1]: https://bugzilla.kernel.org/
[2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/


* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance after moving to Debian 11.5
  2023-01-19 10:17 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5 Paul Menzel
@ 2023-01-19 10:22   ` Paul Menzel
  2023-01-19 12:24   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
  1 sibling, 0 replies; 14+ messages in thread
From: Paul Menzel @ 2023-01-19 10:22 UTC (permalink / raw)
  To: Bartek Kois; +Cc: intel-wired-lan, regressions

Dear Bartek,


On 19.01.23 at 11:17, Paul Menzel wrote:
> #regzbot ^introduced: 4.9.88..5.10.149

> On 14.01.23 at 11:23, Bartek Kois wrote:
> 
>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link set 
>> enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN based 10G 
>> adapter) I am experiencing high cpu load (even if no traffic is 
>> passing through the adapter) and network performance is low (when 
>> network is connected).
> 
> How do you test the network performance? Please give exact numbers for 
> comparison.
> 
>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>> with no network attached. The problem can be observed on the following 
>> platforms: Supermicro X9SCL (Intel C202 PCH) and
>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>
>> Tested environments:
>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no 
>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F 
>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
> 
>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>> (2022-10-21) x86_64 GNU/Linux  [older platforms: Supermicro X9SCL 
>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) behave 
>> problematic as described above | newer platform: Supermicro X11SSL-F 
>> (Intel® C232 chipset) working well with no problems]
> 
> Maybe create a bug at the Linux kernel bug tracker [1], where you can 
> attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
> 
>> So far to solve the problem I was trying to upgrade system to the 
>> newest stable version, upgrade kernel to version 6.x, upgrade ixgbe 
>> driver to the newest version but with no luck.
> 
> Thank you for checking that. Too bad it’s still present. To rule out 
> some user space problem, could you test Debian 9.7 with a stable Linux 
> release, currently 6.1.7?
> 
> What does `sudo perf top --sort comm,dso` show, where the time is spent?
> 
>> Supermicro support suggested as follows:
>> it might be kernel related debian 11.5 has kernel 5.10 which is a 
>> recent kernel it might not properly support the chipsets for X9 
>> therefore i suggest to use RHEL or CentOS as they use much older 
>> kernel versions. I expect that with ubuntu 20.04 you see the same 
>> problem it uses kernel 5.4
> 
> Testing another GNU/Linux distribution for another data point, might be 
> a good idea.
> 
> As nobody has responded yet, bisecting the issue is probably the fastest 
> way to get to the bottom of this. Luckily the problem seems reproducible 
> and you seem to be able to build a Linux kernel yourself, so that should 
> work. (For testing purposes you could also test with Ubuntu, as they 
> provide Linux kernel builds for (almost) all releases in their Linux 
> kernel mainline PPA [2].)

You could also try to do that in a virtual machine by passing through 
the network device to the VM. If that reproduces the issue, that’s quite 
a fast setup for bisecting a regression, as start times are really fast. 
(For example, you can pass the Linux kernel directly to a QEMU VM with 
the `-kernel` switch.)
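
A minimal sketch of such a QEMU invocation follows. It is only an illustration under several assumptions: the adapter sits at PCI address 0000:01:00.0 (guessed from the interface name enp1s0, please check with `lspci`), it is already bound to `vfio-pci` with the IOMMU enabled on the host, and the disk image contains a root file system without a partition table. The function name `qemu_bisect_cmd` and the file names are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: print a QEMU command line that boots a freshly built
# kernel directly and passes the 10G adapter through to the guest via VFIO.
# Arguments (all placeholders): kernel image, root disk image, host PCI address.
qemu_bisect_cmd() {
    kernel=$1
    disk=$2
    pci=$3
    # printf repeats the '%s ' format for every argument, so the command is
    # printed on one line; it is only printed here, not executed.
    printf '%s ' \
        qemu-system-x86_64 -enable-kvm -m 2G -nographic \
        -kernel "$kernel" \
        -append "'root=/dev/vda console=ttyS0'" \
        -drive "file=$disk,if=virtio,format=raw" \
        -device "vfio-pci,host=$pci"
    printf '\n'
}

# Example: qemu_bisect_cmd arch/x86/boot/bzImage debian.img 0000:01:00.0
```

Printing the command instead of running it makes it easy to review the device address before handing real hardware to the guest.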


Kind regards,

Paul


> [1]: https://bugzilla.kernel.org/
> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/


* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-19 10:17 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5 Paul Menzel
  2023-01-19 10:22   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance " Paul Menzel
@ 2023-01-19 12:24   ` Bartek Kois
  2023-01-19 16:58     ` Bartek Kois
  1 sibling, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-19 12:24 UTC (permalink / raw)
  To: Paul Menzel; +Cc: intel-wired-lan, regressions


On 19.01.2023 at 11:17, Paul Menzel wrote:
>
> #regzbot ^introduced: 4.9.88..5.10.149
>
> Dear Bartek,
>
>
> On 14.01.23 at 11:23, Bartek Kois wrote:
>
>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link 
>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN based 
>> 10G adapter) I am experiencing high cpu load (even if no traffic is 
>> passing through the adapter) and network performance is low (when 
>> network is connected).
>
> How do you test the network performance? Please give exact numbers for 
> comparison.
>
I am using this server as a router for my subscribers with iptables (for
NAT and firewall) and hfsc (for QoS). I first encountered this problem
while migrating from Debian 9.7 to 11.5. Routers based on Supermicro
X11SSL-F (Intel® C232 chipset) work with no problems after that
migration, but routers based on Supermicro X9SCL (Intel C202 PCH) and
Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving strangely,
with high CPU load (0.5-0.8, while before it was around 0.0-0.1) and
subscribers not being able to utilize their plans. I tried to strip down
the problem and ended up with a clean system with no iptables or hfsc
rules behaving the same way (higher load) right after setting the 10G
link up, even if no traffic is passing through.

>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>> with no network attached. The problem can be observed on the 
>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>
>> Tested environments:
>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no 
>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F 
>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>
>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>> (2022-10-21) x86_64 GNU/Linux  [older platforms: Supermicro X9SCL 
>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) 
>> behave problematic as described above | newer platform: Supermicro 
>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>
> Maybe create a bug at the Linux kernel bug tracker [1], where you can 
> attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>
I've already reported that to the Debian team at
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so far
nobody has taken care of this issue.

>> So far to solve the problem I was trying to upgrade system to the 
>> newest stable version, upgrade kernel to version 6.x, upgrade ixgbe 
>> driver to the newest version but with no luck.
>
> Thank you for checking that. Too bad it’s still present. To rule out 
> some user space problem, could you test Debian 9.7 with a stable Linux 
> release, currently 6.1.7?
>
> What does `sudo perf top --sort comm,dso` show, where the time is spent?

During my first test in a real environment with subscribers, I gathered
the following data with perf:

   27.83%  [kernel]                   [k] strncpy
   14.80%  [kernel]                   [k] nft_do_chain
    7.61%  [kernel]                   [k] memcmp
    5.63%  [kernel]                   [k] nft_meta_get_eval
    3.14%  [kernel]                   [k] nft_cmp_eval
    2.79%  [kernel]                   [k] asm_exc_nmi
    1.07%  [kernel]                   [k] module_get_kallsym
    0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
    0.85%  [kernel]                   [k] ixgbe_poll
    0.75%  [kernel]                   [k] format_decode
    0.61%  [kernel]                   [k] number
    0.56%  [kernel]                   [k] menu_select
    0.54%  [kernel]                   [k] clflush_cache_range
    0.52%  [kernel]                   [k] cpuidle_enter_state
    0.51%  [kernel]                   [k] vsnprintf
    0.50%  [kernel]                   [k] u32_classify
    0.49%  [kernel]                   [k] fib_table_lookup
    0.40%  [kernel]                   [k] dma_pte_clear_level
    0.39%  [kernel]                   [k] domain_mapping
    0.36%  [kernel]                   [k] ixgbe_xmit_fram


     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
      18 root      20   0       0      0      0 S  28.2   0.0 7:06.27 ksoftirqd/1
      12 root      20   0       0      0      0 R  12.0   0.0 4:10.88 ksoftirqd/0
      23 root      20   0       0      0      0 S   6.0   0.0 4:36.08 ksoftirqd/2
      28 root      20   0       0      0      0 S   5.3   0.0 6:46.47 ksoftirqd/3
  846449 root      20   0       0      0      0 I   1.0   0.0 0:01.61 kworker/0:0-events_power_efficient
      13 root      20   0       0      0      0 I   0.3   0.0 0:13.50 rcu_sched
    8264 root      20   0  101536   6944   4824 S   0.3   0.2 0:07.77 dhcpd
       1 root      20   0  164048  10184   7672 S   0.0   0.3 0:04.52 systemd
       2 root      20   0       0      0      0 S   0.0   0.0 0:00.00 kthreadd
       3 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 rcu_gp
       4 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 rcu_par_gp
       6 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/0:0H-events_highpri
       9 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 mm_percpu_wq
      10 root      20   0       0      0      0 S   0.0   0.0 0:00.00 rcu_tasks_rude_
      11 root      20   0       0      0      0 S   0.0   0.0 0:00.00 rcu_tasks_trace
      14 root      rt   0       0      0      0 S   0.0   0.0 0:00.26 migration/0

>
>> Supermicro support suggested as follows:
>> it might be kernel related debian 11.5 has kernel 5.10 which is a 
>> recent kernel it might not properly support the chipsets for X9 
>> therefore i suggest to use RHEL or CentOS as they use much older 
>> kernel versions. I expect that with ubuntu 20.04 you see the same 
>> problem it uses kernel 5.4
> Testing another GNU/Linux distribution for another data point, might 
> be a good idea.
>
> As nobody has responded yet, bisecting the issue is probably the 
> fastest way to get to the bottom of this. Luckily the problem seems 
> reproducible and you seem to be able to build a Linux kernel yourself, 
> so that should work. (For testing purposes you could also test with 
> Ubuntu, as they provide Linux kernel builds for (almost) all releases 
> in their Linux kernel mainline PPA [2].)
>
Of course, I can try Ubuntu and report how it works.

Best regards

Bartek Kois

>
> Kind regards,
>
> Paul
>
>
> [1]: https://bugzilla.kernel.org/
> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/


* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-19 12:24   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
@ 2023-01-19 16:58     ` Bartek Kois
  2023-01-19 17:09       ` Paul Menzel
  0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-19 16:58 UTC (permalink / raw)
  To: Paul Menzel; +Cc: intel-wired-lan, regressions

On 19.01.2023 at 13:24, Bartek Kois wrote:
>
> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>
>> #regzbot ^introduced: 4.9.88..5.10.149
>>
>> Dear Bartek,
>>
>>
>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>
>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link 
>>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN 
>>> based 10G adapter) I am experiencing high cpu load (even if no 
>>> traffic is passing through the adapter) and network performance is 
>>> low (when network is connected).
>>
>> How do you test the network performance? Please give exact numbers 
>> for comparison.
>>
> I am using this server as a router for my subscribers with iptables
> (for NAT and firewall) and hfsc (for QoS). I first encountered this
> problem while migrating from Debian 9.7 to 11.5. Routers based on
> Supermicro X11SSL-F (Intel® C232 chipset) work with no problems after
> that migration, but routers based on Supermicro X9SCL (Intel C202 PCH)
> and Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
> strangely, with high CPU load (0.5-0.8, while before it was around
> 0.0-0.1) and subscribers not being able to utilize their plans. I
> tried to strip down the problem and ended up with a clean system with
> no iptables or hfsc rules behaving the same way (higher load) right
> after setting the 10G link up, even if no traffic is passing through.
>
>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>> with no network attached. The problem can be observed on the 
>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>
>>> Tested environments:
>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no 
>>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F 
>>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>>
>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>>> (2022-10-21) x86_64 GNU/Linux  [older platforms: Supermicro X9SCL 
>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) 
>>> behave problematic as described above | newer platform: Supermicro 
>>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>>
>> Maybe create a bug at the Linux kernel bug tracker [1], where you can 
>> attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>
> I've already reported that to the Debian team at
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so far
> nobody has taken care of this issue.
>
>>> So far to solve the problem I was trying to upgrade system to the 
>>> newest stable version, upgrade kernel to version 6.x, upgrade ixgbe 
>>> driver to the newest version but with no luck.
>>
>> Thank you for checking that. Too bad it’s still present. To rule out 
>> some user space problem, could you test Debian 9.7 with a stable 
>> Linux release, currently 6.1.7?
>>
>> What does `sudo perf top --sort comm,dso` show, where the time is spent?
>
> During my first test in a real environment with subscribers, I gathered
> the following data with perf:
>
>   27.83%  [kernel]                   [k] strncpy
>   14.80%  [kernel]                   [k] nft_do_chain
>    7.61%  [kernel]                   [k] memcmp
>    5.63%  [kernel]                   [k] nft_meta_get_eval
>    3.14%  [kernel]                   [k] nft_cmp_eval
>    2.79%  [kernel]                   [k] asm_exc_nmi
>    1.07%  [kernel]                   [k] module_get_kallsym
>    0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>    0.85%  [kernel]                   [k] ixgbe_poll
>    0.75%  [kernel]                   [k] format_decode
>    0.61%  [kernel]                   [k] number
>    0.56%  [kernel]                   [k] menu_select
>    0.54%  [kernel]                   [k] clflush_cache_range
>    0.52%  [kernel]                   [k] cpuidle_enter_state
>    0.51%  [kernel]                   [k] vsnprintf
>    0.50%  [kernel]                   [k] u32_classify
>    0.49%  [kernel]                   [k] fib_table_lookup
>    0.40%  [kernel]                   [k] dma_pte_clear_level
>    0.39%  [kernel]                   [k] domain_mapping
>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>
>
>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
>      18 root      20   0       0      0      0 S  28.2   0.0 7:06.27 ksoftirqd/1
>      12 root      20   0       0      0      0 R  12.0   0.0 4:10.88 ksoftirqd/0
>      23 root      20   0       0      0      0 S   6.0   0.0 4:36.08 ksoftirqd/2
>      28 root      20   0       0      0      0 S   5.3   0.0 6:46.47 ksoftirqd/3
>  846449 root      20   0       0      0      0 I   1.0   0.0 0:01.61 kworker/0:0-events_power_efficient
>      13 root      20   0       0      0      0 I   0.3   0.0 0:13.50 rcu_sched
>    8264 root      20   0  101536   6944   4824 S   0.3   0.2 0:07.77 dhcpd
>       1 root      20   0  164048  10184   7672 S   0.0   0.3 0:04.52 systemd
>       2 root      20   0       0      0      0 S   0.0   0.0 0:00.00 kthreadd
>       3 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 rcu_gp
>       4 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 rcu_par_gp
>       6 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/0:0H-events_highpri
>       9 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 mm_percpu_wq
>      10 root      20   0       0      0      0 S   0.0   0.0 0:00.00 rcu_tasks_rude_
>      11 root      20   0       0      0      0 S   0.0   0.0 0:00.00 rcu_tasks_trace
>      14 root      rt   0       0      0      0 S   0.0   0.0 0:00.26 migration/0
>
>>
>>> Supermicro support suggested as follows:
>>> it might be kernel related debian 11.5 has kernel 5.10 which is a 
>>> recent kernel it might not properly support the chipsets for X9 
>>> therefore i suggest to use RHEL or CentOS as they use much older 
>>> kernel versions. I expect that with ubuntu 20.04 you see the same 
>>> problem it uses kernel 5.4
>> Testing another GNU/Linux distribution for another data point, might 
>> be a good idea.
>>
>> As nobody has responded yet, bisecting the issue is probably the 
>> fastest way to get to the bottom of this. Luckily the problem seems 
>> reproducible and you seem to be able to build a Linux kernel 
>> yourself, so that should work. (For testing purposes you could also 
>> test with Ubuntu, as they provide Linux kernel builds for (almost) 
>> all releases in their Linux kernel mainline PPA [2].)
>>
> Of course, I can try Ubuntu and report how it works.
>
Ubuntu (5.15.0-43-generic) seems to behave the same way, generating
higher load after executing "ip link set enp1s0 up".

Best regards

Bartek Kois

> Best regards
>
> Bartek Kois
>
>>
>> Kind regards,
>>
>> Paul
>>
>>
>> [1]: https://bugzilla.kernel.org/
>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/


* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-19 16:58     ` Bartek Kois
@ 2023-01-19 17:09       ` Paul Menzel
  2023-01-19 17:17         ` Bartek Kois
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Menzel @ 2023-01-19 17:09 UTC (permalink / raw)
  To: Bartek Kois; +Cc: intel-wired-lan, regressions

Dear Bartek,


On 19.01.23 at 17:58, Bartek Kois wrote:
> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>
>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>
>>> #regzbot ^introduced: 4.9.88..5.10.149

>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>
>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link 
>>>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN 
>>>> based 10G adapter) I am experiencing high cpu load (even if no 
>>>> traffic is passing through the adapter) and network performance is 
>>>> low (when network is connected).
>>>
>>> How do you test the network performance? Please give exact numbers 
>>> for comparison.
>>>
>> I am using this server as a router for my subscribers with iptables
>> (for NAT and firewall) and hfsc (for QoS). I first encountered this
>> problem while migrating from Debian 9.7 to 11.5. Routers based on
>> Supermicro X11SSL-F (Intel® C232 chipset) work with no problems after
>> that migration, but routers based on Supermicro X9SCL (Intel C202 PCH)
>> and Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>> strangely, with high CPU load (0.5-0.8, while before it was around
>> 0.0-0.1) and subscribers not being able to utilize their plans. I
>> tried to strip down the problem and ended up with a clean system with
>> no iptables or hfsc rules behaving the same way (higher load) right
>> after setting the 10G link up, even if no traffic is passing through.
>>
>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>> with no network attached. The problem can be observed on the 
>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>
>>>> Tested environments:
>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no 
>>>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F 
>>>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>>>
>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>>>> (2022-10-21) x86_64 GNU/Linux  [older platforms: Supermicro X9SCL 
>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) 
>>>> behave problematic as described above | newer platform: Supermicro 
>>>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>>>
>>> Maybe create a bug at the Linux kernel bug tracker [1], where you can 
>>> attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>
>> I've already reported that to the Debian team at
>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so far
>> nobody has taken care of this issue.
>>
>>>> So far to solve the problem I was trying to upgrade system to the 
>>>> newest stable version, upgrade kernel to version 6.x, upgrade ixgbe 
>>>> driver to the newest version but with no luck.
>>>
>>> Thank you for checking that. Too bad it’s still present. To rule out 
>>> some user space problem, could you test Debian 9.7 with a stable 
>>> Linux release, currently 6.1.7?
>>>
>>> What does `sudo perf top --sort comm,dso` show, where the time is spent?
>>
>> During my first test in a real environment with subscribers, I gathered
>> the following data with perf:
>>
>>   27.83%  [kernel]                   [k] strncpy
>>   14.80%  [kernel]                   [k] nft_do_chain
>>    7.61%  [kernel]                   [k] memcmp
>>    5.63%  [kernel]                   [k] nft_meta_get_eval
>>    3.14%  [kernel]                   [k] nft_cmp_eval
>>    2.79%  [kernel]                   [k] asm_exc_nmi
>>    1.07%  [kernel]                   [k] module_get_kallsym
>>    0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>>    0.85%  [kernel]                   [k] ixgbe_poll
>>    0.75%  [kernel]                   [k] format_decode
>>    0.61%  [kernel]                   [k] number
>>    0.56%  [kernel]                   [k] menu_select
>>    0.54%  [kernel]                   [k] clflush_cache_range
>>    0.52%  [kernel]                   [k] cpuidle_enter_state
>>    0.51%  [kernel]                   [k] vsnprintf
>>    0.50%  [kernel]                   [k] u32_classify
>>    0.49%  [kernel]                   [k] fib_table_lookup
>>    0.40%  [kernel]                   [k] dma_pte_clear_level
>>    0.39%  [kernel]                   [k] domain_mapping
>>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>
>>
>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
>>      18 root      20   0       0      0      0 S  28.2   0.0 7:06.27 ksoftirqd/1
>>      12 root      20   0       0      0      0 R  12.0   0.0 4:10.88 ksoftirqd/0

[…]

Do you see different behavior in `/proc/interrupts`?
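
One way to compare this between the two kernels, as a minimal sketch: sum the interrupt counters of the adapter's lines in `/proc/interrupts` at two points in time and look at the difference. The helper name `sum_irqs` is made up for this example, and it assumes the interface name (enp1s0 in this thread) appears in the description column of the relevant lines.

```shell
#!/bin/sh
# Hypothetical helper: sum the per-CPU interrupt counters of every line in
# /proc/interrupts-style input (read from stdin) that mentions the interface.
sum_irqs() {
    iface=$1
    awk -v iface="$iface" '
        index($0, iface) {
            # Field 1 is the IRQ label ("45:"); the counters follow, one per
            # CPU, up to the first non-numeric field (the controller name).
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) total += $i
        }
        END { print total + 0 }
    '
}

# Usage on the router:
#   before=$(sum_irqs enp1s0 < /proc/interrupts); sleep 10
#   after=$(sum_irqs enp1s0 < /proc/interrupts)
#   echo "enp1s0 interrupts in 10 s: $((after - before))"
```

Running the same measurement on the 4.9 and the 5.10 kernel with the link up and no traffic would show whether the idle interrupt rate differs.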

>>>> Supermicro support suggested as follows:
>>>> it might be kernel related debian 11.5 has kernel 5.10 which is a 
>>>> recent kernel it might not properly support the chipsets for X9 
>>>> therefore i suggest to use RHEL or CentOS as they use much older 
>>>> kernel versions. I expect that with ubuntu 20.04 you see the same 
>>>> problem it uses kernel 5.4
>>> Testing another GNU/Linux distribution for another data point, might
>>> be a good idea.
>>>
>>> As nobody has responded yet, bisecting the issue is probably the 
>>> fastest way to get to the bottom of this. Luckily the problem seems 
>>> reproducible and you seem to be able to build a Linux kernel 
>>> yourself, so that should work. (For testing purposes you could also 
>>> test with Ubuntu, as they provide Linux kernel builds for (almost) 
>>> all releases in their Linux kernel mainline PPA [2].)
>>>
>> Of course, I can try Ubuntu and report how it works.
>>
> Ubuntu (5.15.0-43-generic) seems to behave the same way, generating
> higher load after executing "ip link set enp1s0 up".

That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 20.04 
with Linux 5.4, and Ubuntu 18.04 with 4.15?

Anyway, I think you won’t get around bisecting. Another hint: first make
sure that you can build a 4.9 Linux kernel yourself that does not
exhibit the issue.


Kind regards,

Paul


>>> [1]: https://bugzilla.kernel.org/
>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/


* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-19 17:09       ` Paul Menzel
@ 2023-01-19 17:17         ` Bartek Kois
  2023-01-22 20:28           ` Paul Menzel
  0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-19 17:17 UTC (permalink / raw)
  To: Paul Menzel; +Cc: intel-wired-lan, regressions

On 19.01.2023 at 18:09, Paul Menzel wrote:
> Dear Bartek,
>
>
> On 19.01.23 at 17:58, Bartek Kois wrote:
>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>
>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>
>>>> #regzbot ^introduced: 4.9.88..5.10.149
>
>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>
>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link 
>>>>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN 
>>>>> based 10G adapter) I am experiencing high cpu load (even if no 
>>>>> traffic is passing through the adapter) and network performance is 
>>>>> low (when network is connected).
>>>>
>>>> How do you test the network performance? Please give exact numbers 
>>>> for comparison.
>>>>
>>> I am using this server as a router for my subscribers with iptables
>>> (for NAT and firewall) and hfsc (for QoS). I first encountered this
>>> problem while migrating from Debian 9.7 to 11.5. Routers based on
>>> Supermicro X11SSL-F (Intel® C232 chipset) work with no problems
>>> after that migration, but routers based on Supermicro X9SCL (Intel
>>> C202 PCH) and Supermicro X10SLL+-F (Intel C222 Express PCH) start
>>> behaving strangely, with high CPU load (0.5-0.8, while before it was
>>> around 0.0-0.1) and subscribers not being able to utilize their
>>> plans. I tried to strip down the problem and ended up with a clean
>>> system with no iptables or hfsc rules behaving the same way (higher
>>> load) right after setting the 10G link up, even if no traffic is
>>> passing through.
>>>
>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>> with no network attached. The problem can be observed on the 
>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>
>>>>> Tested environments:
>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no 
>>>>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F 
>>>>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>
>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL 
>>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) 
>>>>> behave problematic as described above | newer platform: Supermicro 
>>>>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>>>>
>>>> Maybe create a bug at the Linux kernel bug tracker [1], where you 
>>>> can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>
>>> I've already reported that to the Debian team
>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so far
>>> nobody has taken care of this issue.
>>>
>>>>> So far to solve the problem I was trying to upgrade system to the 
>>>>> newest stable version, upgrade kernel to version 6.x, upgrade 
>>>>> ixgbe driver to the newest version but with no luck.
>>>>
>>>> Thank you for checking that. Too bad it’s still present. To rule 
>>>> out some user space problem, could you test Debian 9.7 with a 
>>>> stable Linux release, currently 6.1.7?
>>>>
>>>> What does `sudo perf top --sort comm,dso` show, where the time is 
>>>> spent?
>>>
>>> During my first test in a real environment with subscribers I gathered
>>> the following data with perf:
>>>
>>>   27.83%  [kernel]                   [k] strncpy
>>>   14.80%  [kernel]                   [k] nft_do_chain
>>>    7.61%  [kernel]                   [k] memcmp
>>>    5.63%  [kernel]                   [k] nft_meta_get_eval
>>>    3.14%  [kernel]                   [k] nft_cmp_eval
>>>    2.79%  [kernel]                   [k] asm_exc_nmi
>>>    1.07%  [kernel]                   [k] module_get_kallsym
>>>    0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>>>    0.85%  [kernel]                   [k] ixgbe_poll
>>>    0.75%  [kernel]                   [k] format_decode
>>>    0.61%  [kernel]                   [k] number
>>>    0.56%  [kernel]                   [k] menu_select
>>>    0.54%  [kernel]                   [k] clflush_cache_range
>>>    0.52%  [kernel]                   [k] cpuidle_enter_state
>>>    0.51%  [kernel]                   [k] vsnprintf
>>>    0.50%  [kernel]                   [k] u32_classify
>>>    0.49%  [kernel]                   [k] fib_table_lookup
>>>    0.40%  [kernel]                   [k] dma_pte_clear_level
>>>    0.39%  [kernel]                   [k] domain_mapping
>>>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>
>>>
>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
>>>      18 root      20   0       0      0      0 S  28.2   0.0 7:06.27 ksoftirqd/1
>>>      12 root      20   0       0      0      0 R  12.0   0.0 4:10.88 ksoftirqd/0
>
> […]
>
> Do you see different behavior in `/proc/interrupts`?
>
This is what it looks like on Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP
Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on Supermicro X10SLL+-F
(Intel C222 Express PCH):

       1 root      20   0  163948  10288   7696 S   0.0   0.1 0:39.58 systemd
       2 root      20   0       0      0      0 S   0.0   0.0 0:00.17 kthreadd
       3 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 rcu_gp
       4 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 rcu_par_gp
       6 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/0:0H-kblockd
       9 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 mm_percpu_wq
      10 root      20   0       0      0      0 S   0.0   0.0 0:00.00 rcu_tasks_rude_
      11 root      20   0       0      0      0 S   0.0   0.0 0:00.00 rcu_tasks_trace
      12 root      20   0       0      0      0 S   0.0   0.0 6:07.13 ksoftirqd/0
      13 root      20   0       0      0      0 I   0.0   0.0 4:15.28 rcu_sched
      14 root      rt   0       0      0      0 S   0.0   0.0 0:03.20 migration/0
      15 root      20   0       0      0      0 S   0.0   0.0 0:00.00 cpuhp/0
      16 root      20   0       0      0      0 S   0.0   0.0 0:00.00 cpuhp/1
      17 root      rt   0       0      0      0 S   0.0   0.0 0:02.75 migration/1
      18 root      20   0       0      0      0 S   0.0   0.0 4:35.84 ksoftirqd/1
      20 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/1:0H-events_highpri
      21 root      20   0       0      0      0 S   0.0   0.0 0:00.00 cpuhp/2
      22 root      rt   0       0      0      0 S   0.0   0.0 0:01.37 migration/2
      23 root      20   0       0      0      0 S   0.0   0.0 8:18.23 ksoftirqd/2
      25 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/2:0H-events_highpri
      26 root      20   0       0      0      0 S   0.0   0.0 0:00.00 cpuhp/3
      27 root      rt   0       0      0      0 S   0.0   0.0 0:01.76 migration/3
      28 root      20   0       0      0      0 S   0.0   0.0 8:45.46 ksoftirqd/3
      30 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/3:0H-events_highpri
      31 root      20   0       0      0      0 S   0.0   0.0 0:00.00 cpuhp/4
      32 root      rt   0       0      0      0 S   0.0   0.0 0:04.39 migration/4
      33 root      20   0       0      0      0 S   0.0   0.0 3:44.08 ksoftirqd/4
      35 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/4:0H-events_highpri
      36 root      20   0       0      0      0 S   0.0   0.0 0:00.00 cpuhp/5
      37 root      rt   0       0      0      0 S   0.0   0.0 0:02.44 migration/5
      38 root      20   0       0      0      0 S   0.0   0.0 4:04.34 ksoftirqd/5
      40 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/5:0H-events_highpri
      41 root      20   0       0      0      0 S   0.0   0.0 0:00.00 cpuhp/6
      42 root      rt   0       0      0      0 S   0.0   0.0 0:01.95 migration/6
      43 root      20   0       0      0      0 S   0.0   0.0 3:35.38 ksoftirqd/6
      45 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/6:0H-kblockd
      46 root      20   0       0      0      0 S   0.0   0.0 0:00.00 cpuhp/7
      47 root      rt   0       0      0      0 S   0.0   0.0 0:01.07 migration/7
      48 root      20   0       0      0      0 S   0.0   0.0 0:00.16 ksoftirqd/7
      50 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/7:0H-kblockd
      59 root      20   0       0      0      0 S   0.0   0.0 0:00.00 kdevtmpfs
      60 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 netns
      61 root      20   0       0      0      0 S   0.0   0.0 0:00.00 kauditd
      62 root      20   0       0      0      0 S   0.0   0.0 0:00.09 khungtaskd
      63 root      20   0       0      0      0 S   0.0   0.0 0:00.00 oom_reaper
      64 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 writeback
      65 root      20   0       0      0      0 S   0.0   0.0 0:07.72 kcompactd0
      66 root      25   5       0      0      0 S   0.0   0.0 0:00.00 ksmd
      67 root      39  19       0      0      0 S   0.0   0.0 0:01.19 khugepaged
      85 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kintegrityd
      86 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kblockd
      87 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 blkcg_punt_bio
      88 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 edac-poller
      89 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 devfreq_wq
      91 root       0 -20       0      0      0 I   0.0   0.0 0:02.57 kworker/1:1H-kblockd
      92 root      20   0       0      0      0 S   0.0   0.0 0:00.00 kswapd0
      93 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kthrotld
      94 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 acpi_thermal_pm
      96 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 ipv6_addrconf
     104 root       0 -20       0      0      0 I   0.0   0.0 0:00.68 kworker/2:1H-kblockd
     109 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kstrp
     112 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 zswap-shrink
     113 root       0 -20       0      0      0 I   0.0   0.0 0:00.00 kworker/u17:0

and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
on Supermicro X10SLL+-F (Intel C222 Express PCH)

31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92 kworker/7:0
     1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14 systemd
     2 root      20   0       0      0      0 S   0.0  0.0 0:00.19 kthreadd
     3 root      20   0       0      0      0 S   0.0  0.0 0:35.42 ksoftirqd/0
     5 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kworker/0:0H
     7 root      20   0       0      0      0 S   0.0  0.0 2:36.16 rcu_sched
     8 root      20   0       0      0      0 S   0.0  0.0 0:00.00 rcu_bh
     9 root      rt   0       0      0      0 S   0.0  0.0 0:00.28 migration/0
    10 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 lru-add-drain
    11 root      rt   0       0      0      0 S   0.0  0.0 0:00.25 watchdog/0
    12 root      20   0       0      0      0 S   0.0  0.0 0:00.00 cpuhp/0
    13 root      20   0       0      0      0 S   0.0  0.0 0:00.00 cpuhp/1
    14 root      rt   0       0      0      0 S   0.0  0.0 0:00.31 watchdog/1
    15 root      rt   0       0      0      0 S   0.0  0.0 0:25.69 migration/1
    16 root      20   0       0      0      0 S   0.0  0.0 1:10.62 ksoftirqd/1
    18 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kworker/1:0H
    19 root      20   0       0      0      0 S   0.0  0.0 0:00.00 cpuhp/2
    20 root      rt   0       0      0      0 S   0.0  0.0 0:00.26 watchdog/2
    21 root      rt   0       0      0      0 S   0.0  0.0 0:10.18 migration/2
    22 root      20   0       0      0      0 S   0.0  0.0 0:51.08 ksoftirqd/2
    24 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kworker/2:0H
    25 root      20   0       0      0      0 S   0.0  0.0 0:00.00 cpuhp/3
    26 root      rt   0       0      0      0 S   0.0  0.0 0:00.23 watchdog/3
    27 root      rt   0       0      0      0 S   0.0  0.0 0:00.32 migration/3
    28 root      20   0       0      0      0 S   0.0  0.0 0:48.46 ksoftirqd/3
    30 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kworker/3:0H
    31 root      20   0       0      0      0 S   0.0  0.0 0:00.00 cpuhp/4
    32 root      rt   0       0      0      0 S   0.0  0.0 0:00.21 watchdog/4
    33 root      rt   0       0      0      0 S   0.0  0.0 0:00.25 migration/4
    34 root      20   0       0      0      0 S   0.0  0.0 0:36.35 ksoftirqd/4
    36 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kworker/4:0H
    37 root      20   0       0      0      0 S   0.0  0.0 0:00.00 cpuhp/5
    38 root      rt   0       0      0      0 S   0.0  0.0 0:00.22 watchdog/5
    39 root      rt   0       0      0      0 S   0.0  0.0 0:04.02 migration/5
    40 root      20   0       0      0      0 S   0.0  0.0 0:41.43 ksoftirqd/5
    42 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kworker/5:0H
    43 root      20   0       0      0      0 S   0.0  0.0 0:00.00 cpuhp/6
    44 root      rt   0       0      0      0 S   0.0  0.0 0:00.22 watchdog/6
    45 root      rt   0       0      0      0 S   0.0  0.0 0:01.53 migration/6
    46 root      20   0       0      0      0 S   0.0  0.0 0:41.66 ksoftirqd/6
    48 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kworker/6:0H
    49 root      20   0       0      0      0 S   0.0  0.0 0:00.00 cpuhp/7
    50 root      rt   0       0      0      0 S   0.0  0.0 0:00.24 watchdog/7
    51 root      rt   0       0      0      0 S   0.0  0.0 0:00.27 migration/7
    52 root      20   0       0      0      0 S   0.0  0.0 0:46.13 ksoftirqd/7
    54 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kworker/7:0H
    55 root      20   0       0      0      0 S   0.0  0.0 0:00.00 kdevtmpfs
    56 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 netns
    57 root      20   0       0      0      0 S   0.0  0.0 0:00.07 khungtaskd
    58 root      20   0       0      0      0 S   0.0  0.0 0:00.00 oom_reaper
    59 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 writeback
    60 root      20   0       0      0      0 S   0.0  0.0 0:00.00 kcompactd0
    62 root      25   5       0      0      0 S   0.0  0.0 0:00.00 ksmd
    63 root      39  19       0      0      0 S   0.0  0.0 0:00.00 khugepaged
    64 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 crypto
    65 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kintegrityd
    66 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 bioset
    67 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kblockd
    75 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 devfreq_wq
    76 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 watchdogd
    77 root      20   0       0      0      0 S   0.0  0.0 0:00.00 kswapd0
    78 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 vmstat
    90 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 kthrotld
    91 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 ipv6_addrconf
   121 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 acpi_thermal_pm
   130 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 ata_sff
   139 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 ixgbe
   166 root      20   0       0      0      0 S   0.0  0.0 0:00.00 scsi_eh_0
   167 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 scsi_tmf_0
   168 root      20   0       0      0      0 S   0.0  0.0 0:00.00 scsi_eh_1
   169 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 scsi_tmf_1
   170 root      20   0       0      0      0 S   0.0  0.0 0:00.00 scsi_eh_2
   171 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 scsi_tmf_2
   172 root      20   0       0      0      0 S   0.0  0.0 0:00.00 scsi_eh_3
   173 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 scsi_tmf_3
   174 root      20   0       0      0      0 S   0.0  0.0 0:00.00 scsi_eh_4
   175 root       0 -20       0      0      0 S   0.0  0.0 0:00.00 scsi_tmf_4
   176 root      20   0       0      0      0 S   0.0  0.0 0:00.00 scsi_eh_5

>>>>> Supermicro support suggested as follows:
>>>>> it might be kernel related debian 11.5 has kernel 5.10 which is a 
>>>>> recent kernel it might not properly support the chipsets for X9 
>>>>> therefore i suggest to use RHEL or CentOS as they use much older 
>>>>> kernel versions. I expect that with ubuntu 20.04 you see the same 
>>>>> problem it uses kernel 5.4
>>>> Testing another GNU/Linux distribution for another data point might
>>>> be a good idea.
>>>>
>>>> As nobody has responded yet, bisecting the issue is probably the 
>>>> fastest way to get to the bottom of this. Luckily the problem seems 
>>>> reproducible and you seem to be able to build a Linux kernel 
>>>> yourself, so that should work. (For testing purposes you could also 
>>>> test with Ubuntu, as they provide Linux kernel builds for (almost) 
>>>> all releases in their Linux kernel mainline PPA [2].)
>>>>
>>> Of course  I can try Ubuntu and report how it is working.
>>>
>> Ubuntu (5.15.0-43-generic) seems to be working in the same way 
>> generating higher load after executing "ip link set enp1s0 up".
>
> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 20.04 
> with Linux 5.4, and Ubuntu 18.04 with 4.15?
>
> Anyway, I think you won’t get around bisecting. Another hint: first
> make sure that you can build a 4.9 Linux kernel yourself that does
> not exhibit the issue.
>
That's right, it is 22.04. I don't have to build it. The standard kernel
Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems for the past 4
years.

Best regards

Bartek Kois

>
> Kind regards,
>
> Paul
>
>
>>>> [1]: https://bugzilla.kernel.org/
>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-19 17:17         ` Bartek Kois
@ 2023-01-22 20:28           ` Paul Menzel
  2023-01-23 18:38             ` Bartek Kois
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Menzel @ 2023-01-22 20:28 UTC (permalink / raw)
  To: Bartek Kois; +Cc: intel-wired-lan, regressions

Dear Bartek,


On 19.01.23 at 18:17, Bartek Kois wrote:
> On 19.01.2023 at 18:09, Paul Menzel wrote:

>> On 19.01.23 at 17:58, Bartek Kois wrote:
>>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>>
>>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>>
>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>
>>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>>
>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip link 
>>>>>> set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 82599EN 
>>>>>> based 10G adapter) I am experiencing high cpu load (even if no 
>>>>>> traffic is passing through the adapter) and network performance is 
>>>>>> low (when network is connected).
>>>>>
>>>>> How do you test the network performance? Please give exact numbers 
>>>>> for comparison.
>>>>>
>>>> I am using this server as a router for my subscribers with iptables
>>>> (for NAT and firewall) and hfsc (for QoS). I first encountered this
>>>> problem while migrating from Debian 9.7 to 11.5. Routers based on
>>>> Supermicro X11SSL-F (Intel® C232 chipset) work with no problems
>>>> after that migration, but routers based on Supermicro X9SCL (Intel
>>>> C202 PCH) and Supermicro X10SLL+-F (Intel C222 Express PCH) start
>>>> behaving strangely, with high CPU load (0.5-0.8, where before it was
>>>> around 0.0-0.1) and subscribers unable to fully utilize their
>>>> plans. I tried to narrow down the problem and ended up with a clean
>>>> system with no iptables or hfsc rules that behaves the same way
>>>> (higher load) right after setting the 10G link up, even if no
>>>> traffic is passing through.
>>>>
>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>> with no network attached. The problem can be observed on the 
>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the Supermicro
>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>
>>>>>> Tested environments:
>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with no 
>>>>>> problems: Supermicro X9SCL (Intel C202 PCH), Supermicro X10SLL+-F 
>>>>>> (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>
>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL 
>>>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) 
>>>>>> behave problematic as described above | newer platform: Supermicro 
>>>>>> X11SSL-F (Intel® C232 chipset) working well with no problems]
>>>>>
>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where you 
>>>>> can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>
>>>> I've already reported that to the Debian team
>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so far
>>>> nobody has taken care of this issue.
>>>>
>>>>>> So far to solve the problem I was trying to upgrade system to the 
>>>>>> newest stable version, upgrade kernel to version 6.x, upgrade 
>>>>>> ixgbe driver to the newest version but with no luck.
>>>>>
>>>>> Thank you for checking that. Too bad it’s still present. To rule 
>>>>> out some user space problem, could you test Debian 9.7 with a 
>>>>> stable Linux release, currently 6.1.7?
>>>>>
>>>>> What does `sudo perf top --sort comm,dso` show, where the time is 
>>>>> spent?
>>>>
>>>> During my first test in a real environment with subscribers I gathered
>>>> the following data with perf:
>>>>
>>>>   27.83%  [kernel]                   [k] strncpy
>>>>   14.80%  [kernel]                   [k] nft_do_chain
>>>>    7.61%  [kernel]                   [k] memcmp
>>>>    5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>    3.14%  [kernel]                   [k] nft_cmp_eval
>>>>    2.79%  [kernel]                   [k] asm_exc_nmi
>>>>    1.07%  [kernel]                   [k] module_get_kallsym
>>>>    0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>>>>    0.85%  [kernel]                   [k] ixgbe_poll
>>>>    0.75%  [kernel]                   [k] format_decode
>>>>    0.61%  [kernel]                   [k] number
>>>>    0.56%  [kernel]                   [k] menu_select
>>>>    0.54%  [kernel]                   [k] clflush_cache_range
>>>>    0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>    0.51%  [kernel]                   [k] vsnprintf
>>>>    0.50%  [kernel]                   [k] u32_classify
>>>>    0.49%  [kernel]                   [k] fib_table_lookup
>>>>    0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>    0.39%  [kernel]                   [k] domain_mapping
>>>>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>
>>>>
>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
>>>>      18 root      20   0       0      0      0 S  28.2   0.0 7:06.27 ksoftirqd/1
>>>>      12 root      20   0       0      0      0 R  12.0   0.0 4:10.88 ksoftirqd/0
>>
>> […]
>>
>> Do you see different behavior in `/proc/interrupts`?
>>
> This is what it looks like on Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP
> Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on Supermicro X10SLL+-F
> (Intel C222 Express PCH):
> 
>        1 root      20   0  163948  10288   7696 S   0.0   0.1 0:39.58 systemd

[…]

The content of `/proc/interrupts` has a different format on my system.

```
$ head -3 /proc/interrupts
            CPU0       CPU1       CPU2       CPU3
   1:      55560          0        113          0  IR-IO-APIC   1-edge      i8042
   8:          0          0          0          0  IR-IO-APIC   8-edge      rtc0
```
[…]
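
To compare interrupt activity between the two kernels, the per-CPU counters of the adapter's rows can be summed and sampled twice. The following is only a minimal sketch: it assumes the adapter's rows in `/proc/interrupts` contain the interface name (`enp1s0` in this thread), and the function name `irq_totals` is made up for illustration.

```shell
#!/bin/sh
# Sum the per-CPU counters of every row in a /proc/interrupts-style file
# whose text matches a pattern. Prints: IRQ label, total count, last column.
# Usage: irq_totals PATTERN [FILE]   (FILE defaults to /proc/interrupts)
irq_totals() {
    pattern=$1
    file=${2:-/proc/interrupts}
    awk -v pat="$pattern" '
        NR == 1 { ncpu = NF; next }   # header row: one column per CPU
        $0 ~ pat {
            total = 0
            for (i = 2; i <= ncpu + 1; i++) total += $i
            printf "%s %d %s\n", $1, total, $NF
        }
    ' "$file"
}
```

Running `irq_totals enp1s0; sleep 10; irq_totals enp1s0` on both kernels, with the link up and no traffic, would show whether the counters grow much faster on 5.10 than on 4.9.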

> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
> on Supermicro X10SLL+-F (Intel C222 Express PCH)
> 
> 31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92 kworker/7:0
>      1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14 systemd

[…]
>>>>>> Supermicro support suggested as follows:
>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which is a 
>>>>>> recent kernel it might not properly support the chipsets for X9 
>>>>>> therefore i suggest to use RHEL or CentOS as they use much older 
>>>>>> kernel versions. I expect that with ubuntu 20.04 you see the same 
>>>>>> problem it uses kernel 5.4
>>>>> Testing another GNU/Linux distribution for another data point might
>>>>> be a good idea.
>>>>>
>>>>> As nobody has responded yet, bisecting the issue is probably the 
>>>>> fastest way to get to the bottom of this. Luckily the problem seems 
>>>>> reproducible and you seem to be able to build a Linux kernel 
>>>>> yourself, so that should work. (For testing purposes you could also 
>>>>> test with Ubuntu, as they provide Linux kernel builds for (almost) 
>>>>> all releases in their Linux kernel mainline PPA [2].)
>>>>>
>>>> Of course  I can try Ubuntu and report how it is working.
>>>>
>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way 
>>> generating higher load after executing "ip link set enp1s0 up".
>>
>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 20.04 
>> with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>
>> Anyway, I think you won’t get around bisecting. Another hint: first
>> make sure that you can build a 4.9 Linux kernel yourself that does
>> not exhibit the issue.
>>
> That's right, it is 22.04. I don't have to build it. The standard kernel
> Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems for the past 4
> years.

If none of the developers or maintainers steps up, you are on your own.
Again, since you can reproduce this easily, the fastest way forward is to
bisect the issue, which you can do yourself.
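
A bisection run could be started roughly as follows. This is a sketch under assumptions, not a tested recipe for this machine: it assumes a clone of the mainline Linux tree, the tags `v4.9` (good) and `v5.10` (bad) are only the mainline releases nearest to the two Debian kernels named in this thread and should be confirmed as good and bad first, and `start_bisect` is a hypothetical helper name.

```shell
#!/bin/sh
# Start a bisection between a known-good and a known-bad revision.
# The defaults are assumptions: v4.9 (good) and v5.10 (bad) are the
# mainline tags nearest to the Debian kernels discussed in this thread.
# Usage: start_bisect [GOOD] [BAD]   (run inside the kernel source tree)
start_bisect() {
    good=${1:-v4.9}
    bad=${2:-v5.10}
    git bisect start &&
        git bisect bad "$bad" &&
        git bisect good "$good"
}

# Typical loop after start_bisect:
#   ... build, install and boot the revision git checked out ...
#   ip link set enp1s0 up     # then watch the load for a while
#   git bisect good           # if the load stays low
#   git bisect bad            # if the load rises
# Repeat until git reports the first bad commit, then save: git bisect log
```

Each step roughly halves the remaining range of commits, so even several years of kernel history should need only a few dozen build-and-boot cycles.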


Kind regards,

Paul


>>>>> [1]: https://bugzilla.kernel.org/
>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-22 20:28           ` Paul Menzel
@ 2023-01-23 18:38             ` Bartek Kois
  2023-01-23 18:53               ` Paul Menzel
  0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-23 18:38 UTC (permalink / raw)
  To: Paul Menzel; +Cc: intel-wired-lan, regressions


On 22.01.2023 at 21:28, Paul Menzel wrote:
> Dear Bartek,
>
>
> On 19.01.23 at 18:17, Bartek Kois wrote:
>> On 19.01.2023 at 18:09, Paul Menzel wrote:
>
>>> On 19.01.23 at 17:58, Bartek Kois wrote:
>>>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>>>
>>>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>>>
>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>
>>>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>>>
>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip 
>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 
>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load (even 
>>>>>>> if no traffic is passing through the adapter) and network 
>>>>>>> performance is low (when network is connected).
>>>>>>
>>>>>> How do you test the network performance? Please give exact 
>>>>>> numbers for comparison.
>>>>>>
>>>>> I am using this server as a router for my subscribers with
>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>> encountered this problem while migrating from Debian 9.7 to 11.5.
>>>>> Routers based on Supermicro X11SSL-F (Intel® C232 chipset) work
>>>>> with no problems after that migration, but routers based on
>>>>> Supermicro X9SCL (Intel C202 PCH) and Supermicro X10SLL+-F (Intel
>>>>> C222 Express PCH) start behaving strangely, with high CPU load
>>>>> (0.5-0.8, where before it was around 0.0-0.1) and subscribers
>>>>> unable to fully utilize their plans. I tried to narrow down the
>>>>> problem and ended up with a clean system with no iptables or hfsc
>>>>> rules that behaves the same way (higher load) right after setting
>>>>> the 10G link up, even if no traffic is passing through.
>>>>>
>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>>> with no network attached. The problem can be observed on the 
>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the 
>>>>>>> Supermicro
>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>
>>>>>>> Tested environments:
>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with 
>>>>>>> no problems: Supermicro X9SCL (Intel C202 PCH), Supermicro 
>>>>>>> X10SLL+-F (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® 
>>>>>>> C232 chipset)]
>>>>>>
>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL 
>>>>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) 
>>>>>>> behave problematic as described above | newer platform: 
>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset) working well with no 
>>>>>>> problems]
>>>>>>
>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where you 
>>>>>> can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>
>>>>> I've already reported that to the Debian team
>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so
>>>>> far nobody has taken care of this issue.
>>>>>
>>>>>>> So far to solve the problem I was trying to upgrade system to 
>>>>>>> the newest stable version, upgrade kernel to version 6.x, 
>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>
>>>>>> Thank you for checking that. Too bad it’s still present. To rule 
>>>>>> out some user space problem, could you test Debian 9.7 with a 
>>>>>> stable Linux release, currently 6.1.7?
>>>>>>
>>>>>> What does `sudo perf top --sort comm,dso` show, where the time is 
>>>>>> spent?
>>>>>
>>>>> During my first test in a real environment with subscribers I gathered
>>>>> the following data with perf:
>>>>>
>>>>>   27.83%  [kernel]                   [k] strncpy
>>>>>   14.80%  [kernel]                   [k] nft_do_chain
>>>>>    7.61%  [kernel]                   [k] memcmp
>>>>>    5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>>    3.14%  [kernel]                   [k] nft_cmp_eval
>>>>>    2.79%  [kernel]                   [k] asm_exc_nmi
>>>>>    1.07%  [kernel]                   [k] module_get_kallsym
>>>>>    0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>>>>>    0.85%  [kernel]                   [k] ixgbe_poll
>>>>>    0.75%  [kernel]                   [k] format_decode
>>>>>    0.61%  [kernel]                   [k] number
>>>>>    0.56%  [kernel]                   [k] menu_select
>>>>>    0.54%  [kernel]                   [k] clflush_cache_range
>>>>>    0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>>    0.51%  [kernel]                   [k] vsnprintf
>>>>>    0.50%  [kernel]                   [k] u32_classify
>>>>>    0.49%  [kernel]                   [k] fib_table_lookup
>>>>>    0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>>    0.39%  [kernel]                   [k] domain_mapping
>>>>>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>>
>>>>>
>>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM TIME+ COMMAND
>>>>>      18 root      20   0       0      0      0 S  28.2   0.0 7:06.27 ksoftirqd/1
>>>>>      12 root      20   0       0      0      0 R  12.0   0.0 4:10.88 ksoftirqd/0
>>>
>>> […]
>>>
>>> Do you see different behavior in `/proc/interrupts`?
>>>
>> This is what it looks like on Debian 11.5 - Linux 5.10.0-19-amd64 #1
>> SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on Supermicro
>> X10SLL+-F (Intel C222 Express PCH):
>>
>>        1 root      20   0  163948  10288   7696 S   0.0   0.1 0:39.58 systemd
>
> […]
>
> The content of `/proc/interrupts` has a different format on my system.
>
> ```
> $ head -3 /proc/interrupts
>            CPU0       CPU1       CPU2       CPU3
>   1:      55560          0        113          0  IR-IO-APIC   1-edge      i8042
>   8:          0          0          0          0  IR-IO-APIC   8-edge      rtc0
> ```
> […]
>
>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 
>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>
>> 31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92 kworker/7:0
>>      1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14 systemd
>
> […]
>>>>>>> Supermicro support suggested as follows:
>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which is 
>>>>>>> a recent kernel it might not properly support the chipsets for 
>>>>>>> X9 therefore i suggest to use RHEL or CentOS as they use much 
>>>>>>> older kernel versions. I expect that with ubuntu 20.04 you see 
>>>>>>> the same problem it uses kernel 5.4
>>>>>> Testing another GNU/Linux distribution for another data point, might
>>>>>> be a good idea.
>>>>>>
>>>>>> As nobody has responded yet, bisecting the issue is probably the 
>>>>>> fastest way to get to the bottom of this. Luckily the problem 
>>>>>> seems reproducible and you seem to be able to build a Linux 
>>>>>> kernel yourself, so that should work. (For testing purposes you 
>>>>>> could also test with Ubuntu, as they provide Linux kernel builds 
>>>>>> for (almost) all releases in their Linux kernel mainline PPA [2].)
>>>>>>
>>>>> Of course  I can try Ubuntu and report how it is working.
>>>>>
>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way 
>>>> generating higher load after executing "ip link set enp1s0 up".
>>>
>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 
>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>
>>> Anyway, I think, you won’t come around bisecting. Another hint, make 
>>> sure that you can build a 4.9 Linux kernel yourself, that does not 
>>> exhibit that issue.
>>>
>> That`s right, it is 22.04. I don`t have to build it. Standard kernel 
>> Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems for past 
>> 4 years.
>
> If nobody of the developers/maintainers is going to step up, you are 
> on your own. Again, as you can reproduce this easily, the fastest way 
> is to bisect the issue, which you can do on your own.

How can I investigate that further? I thought about trying to change some
of the parameters related to the ixgbe driver and observing whether anything
changes, but when I try to run:

sudo modprobe ixgbe IntMode=0

I get the following error in the dmesg:

[ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
[ 2137.324848] ixgbe: Intel(R) 10 Gigabit PCI Express Network Driver
[ 2137.324848] ixgbe: Copyright (c) 1999-2016 Intel Corporation.
[ 2138.505751] ixgbe 0000:02:00.0: Multiqueue Enabled: Rx Queue count = 4, Tx Queue count = 4 XDP Queue count = 0
[ 2138.506049] ixgbe 0000:02:00.0: 32.000 Gb/s available PCIe bandwidth (5.0 GT/s PCIe x8 link)
[ 2138.506134] ixgbe 0000:02:00.0: MAC: 2, PHY: 1, PBA No: 0210FF-0FF
[ 2138.506137] ixgbe 0000:02:00.0: ac:1f:6b:ab:fa:70
[ 2138.510537] ixgbe 0000:02:00.0 enp2s0: renamed from eth0
[ 2138.537452] ixgbe 0000:02:00.0: Intel(R) 10 Gigabit Network Connection

How should I use those parameters?

Best regards

Bartek Kois

>
> Kind regards,
>
> Paul
>
>
>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-23 18:38             ` Bartek Kois
@ 2023-01-23 18:53               ` Paul Menzel
  2023-01-23 18:58                 ` Bartek Kois
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Menzel @ 2023-01-23 18:53 UTC (permalink / raw)
  To: Bartek Kois; +Cc: intel-wired-lan, regressions

Dear Bartek,


On 23.01.23 at 19:38, Bartek Kois wrote:
> 
> On 22.01.2023 at 21:28, Paul Menzel wrote:
>> Dear Bartek,
>>
>>
>> On 19.01.23 at 18:17, Bartek Kois wrote:
>>> On 19.01.2023 at 18:09, Paul Menzel wrote:
>>
>>>> On 19.01.23 at 17:58, Bartek Kois wrote:
>>>>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>>>>
>>>>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>>>>
>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>
>>>>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>>>>
>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip 
>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 
>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load (even 
>>>>>>>> if no traffic is passing through the adapter) and network 
>>>>>>>> performance is low (when network is connected).
>>>>>>>
>>>>>>> How do you test the network performance? Please give exact 
>>>>>>> numbers for comparison.
>>>>>>>
>>>>>> I am using this server as a router for my subscribers with
>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>> encountered this problem while migrating from Debian 9.7 to 11.5.
>>>>>> Routers based on Supermicro X11SSL-F (Intel® C232 chipset) work
>>>>>> with no problems after that migration, but routers based on
>>>>>> Supermicro X9SCL (Intel C202 PCH) and Supermicro X10SLL+-F (Intel
>>>>>> C222 Express PCH) start behaving strangely, with high CPU load
>>>>>> (0.5-0.8, while before it was around 0.0-0.1) and subscribers not
>>>>>> being able to utilize their plans. I tried to strip down the
>>>>>> problem and ended up with a clean system with no iptables or hfsc
>>>>>> rules behaving the same way (higher load) right after setting the
>>>>>> 10G link up, even if no traffic is passing through.
>>>>>>
>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>>>> with no network attached. The problem can be observed on the 
>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the 
>>>>>>>> Supermicro
>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>
>>>>>>>> Tested environments:
>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>>>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with 
>>>>>>>> no problems: Supermicro X9SCL (Intel C202 PCH), Supermicro 
>>>>>>>> X10SLL+-F (Intel C222 Express PCH), Supermicro X11SSL-F (Intel® 
>>>>>>>> C232 chipset)]
>>>>>>>
>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro X9SCL 
>>>>>>>> (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 Express PCH) 
>>>>>>>> behave problematic as described above | newer platform: 
>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset) working well with no 
>>>>>>>> problems]
>>>>>>>
>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where you 
>>>>>>> can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>
>>>>>> I've already reported that to the Debian team at
>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so
>>>>>> far nobody has taken care of this issue.
>>>>>>
>>>>>>>> So far to solve the problem I was trying to upgrade system to 
>>>>>>>> the newest stable version, upgrade kernel to version 6.x, 
>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>
>>>>>>> Thank you for checking that. Too bad it’s still present. To rule 
>>>>>>> out some user space problem, could you test Debian 9.7 with a 
>>>>>>> stable Linux release, currently 6.1.7?
>>>>>>>
>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time is 
>>>>>>> spent?
>>>>>>
>>>>>> During my first test in a real environment with subscribers I gathered
>>>>>> the following data with perf:
>>>>>>
>>>>>>   27.83%  [kernel]                   [k] strncpy
>>>>>>   14.80%  [kernel]                   [k] nft_do_chain
>>>>>>    7.61%  [kernel]                   [k] memcmp
>>>>>>    5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>>>    3.14%  [kernel]                   [k] nft_cmp_eval
>>>>>>    2.79%  [kernel]                   [k] asm_exc_nmi
>>>>>>    1.07%  [kernel]                   [k] module_get_kallsym
>>>>>>    0.92%  [kernel]                   [k] 
>>>>>> kallsyms_expand_symbol.constprop.0
>>>>>>    0.85%  [kernel]                   [k] ixgbe_poll
>>>>>>    0.75%  [kernel]                   [k] format_decode
>>>>>>    0.61%  [kernel]                   [k] number
>>>>>>    0.56%  [kernel]                   [k] menu_select
>>>>>>    0.54%  [kernel]                   [k] clflush_cache_range
>>>>>>    0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>>>    0.51%  [kernel]                   [k] vsnprintf
>>>>>>    0.50%  [kernel]                   [k] u32_classify
>>>>>>    0.49%  [kernel]                   [k] fib_table_lookup
>>>>>>    0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>>>    0.39%  [kernel]                   [k] domain_mapping
>>>>>>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>>>
>>>>>>
>>>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM TIME+ COMMAND
>>>>>>      18 root      20   0       0      0      0 S  28.2 0.0 7:06.27 ksoftirqd/1
>>>>>>      12 root      20   0       0      0      0 R  12.0 0.0 4:10.88 ksoftirqd/0
>>>>
>>>> […]
>>>>
>>>> Do you see different behavior in `/proc/interrupts`?
>>>>
>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64 #1 
>>> SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on Supermicro 
>>> X10SLL+-F (Intel C222 Express PCH):
>>>
>>>        1 root      20   0  163948  10288   7696 S   0.0   0.1 0:39.58 systemd
>>
>> […]
>>
>> The content of `/proc/interrupts` has a different format on my system.
>>
>> ```
>> $ head -3 /proc/interrupts
>>            CPU0       CPU1       CPU2       CPU3
>>   1:      55560          0        113          0  IR-IO-APIC 1-edge i8042
>>   8:          0          0          0          0  IR-IO-APIC 8-edge rtc0
>> ```
>> […]
>>
>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 
>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>
>>> 31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92  kworker/7:0
>>>      1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14 systemd
>>
>> […]
>>>>>>>> Supermicro support suggested as follows:
>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which is 
>>>>>>>> a recent kernel it might not properly support the chipsets for 
>>>>>>>> X9 therefore i suggest to use RHEL or CentOS as they use much 
>>>>>>>> older kernel versions. I expect that with ubuntu 20.04 you see 
>>>>>>>> the same problem it uses kernel 5.4
>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>> point, might be a good idea.
>>>>>>>
>>>>>>> As nobody has responded yet, bisecting the issue is probably the 
>>>>>>> fastest way to get to the bottom of this. Luckily the problem 
>>>>>>> seems reproducible and you seem to be able to build a Linux 
>>>>>>> kernel yourself, so that should work. (For testing purposes you 
>>>>>>> could also test with Ubuntu, as they provide Linux kernel builds 
>>>>>>> for (almost) all releases in their Linux kernel mainline PPA [2].)
>>>>>>>
>>>>>> Of course  I can try Ubuntu and report how it is working.
>>>>>>
>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way 
>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>
>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 
>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>
>>>> Anyway, I think, you won’t come around bisecting. Another hint, make 
>>>> sure that you can build a 4.9 Linux kernel yourself, that does not 
>>>> exhibit that issue.
>>>>
>>> That`s right, it is 22.04. I don`t have to build it. Standard kernel 
>>> Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems for past 
>>> 4 years.
>>
>> If nobody of the developers/maintainers is going to step up, you are 
>> on your own. Again, as you can reproduce this easily, the fastest way 
>> is to bisect the issue, which you can do on your own.
> 
> How can I investigate that further?

I repeat myself, please bisect the issue. It’s the fastest way.

> I thought about trying to change some 
> of the parameters related to ixgbe driver and observe if anything is 
> changing, but when I am trying to do:
> 
> sudo modprobe ixgbe IntMode=0
> 
> I get the following error in the dmesg:
> 
> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<

[…]

`modinfo ixgbe` shows the supported parameters.
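
A minimal sketch of how one might check a single option, assuming the module is installed for the running kernel. `has_param` is a hypothetical helper name, not part of any tool; it relies only on `modinfo -p` printing one `name:description (type)` line per parameter:

```shell
# Report whether a parameter name appears in `modinfo -p`-style output on stdin.
# Exit status 0 means the name is listed, non-zero means it is not.
has_param() {
    # Keep only the text before the first colon, then match the whole name literally.
    cut -d: -f1 | grep -qxF -- "$1"
}

# Usage on the affected machine (output depends on the installed driver):
#   modinfo -p ixgbe
#   modinfo -p ixgbe | has_param IntMode && echo supported || echo "not supported"
#   ls /sys/module/ixgbe/parameters/    # parameters of the currently loaded module
```

Given the "unknown parameter 'IntMode' ignored" line in the quoted dmesg output, I would expect the check to report `IntMode` as not supported by the driver loaded there.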


Kind regards,

Paul


PS: If you need help bisecting, please ask. Otherwise, I am out of this 
thread.


>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-23 18:53               ` Paul Menzel
@ 2023-01-23 18:58                 ` Bartek Kois
  2023-01-23 19:03                   ` Paul Menzel
  0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-23 18:58 UTC (permalink / raw)
  To: Paul Menzel; +Cc: intel-wired-lan, regressions


On 23.01.2023 at 19:53, Paul Menzel wrote:
> Dear Bartek,
>
>
> On 23.01.23 at 19:38, Bartek Kois wrote:
>>
>> On 22.01.2023 at 21:28, Paul Menzel wrote:
>>> Dear Bartek,
>>>
>>>
>>> On 19.01.23 at 18:17, Bartek Kois wrote:
>>>> On 19.01.2023 at 18:09, Paul Menzel wrote:
>>>
>>>>> On 19.01.23 at 17:58, Bartek Kois wrote:
>>>>>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>>>>>
>>>>>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>>>>>
>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>
>>>>>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>>>>>
>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip 
>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 
>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load 
>>>>>>>>> (even if no traffic is passing through the adapter) and 
>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>
>>>>>>>> How do you test the network performance? Please give exact 
>>>>>>>> numbers for comparison.
>>>>>>>>
>>>>>>> I am using this server as a router for my subscribers with
>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>> chipset) work with no problems after that migration, but
>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>>>>>>> strangely, with high CPU load (0.5-0.8, while before it was
>>>>>>> around 0.0-0.1) and subscribers not being able to utilize their
>>>>>>> plans. I tried to strip down the problem and ended up with a
>>>>>>> clean system with no iptables or hfsc rules behaving the same
>>>>>>> way (higher load) right after setting the 10G link up, even if
>>>>>>> no traffic is passing through.
>>>>>>>
>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>>>>> with no network attached. The problem can be observed on the 
>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the 
>>>>>>>>> Supermicro
>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>
>>>>>>>>> Tested environments:
>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>>>>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with 
>>>>>>>>> no problems: Supermicro X9SCL (Intel C202 PCH), Supermicro 
>>>>>>>>> X10SLL+-F (Intel C222 Express PCH), Supermicro X11SSL-F 
>>>>>>>>> (Intel® C232 chipset)]
>>>>>>>>
>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro 
>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 
>>>>>>>>> Express PCH) behave problematic as described above | newer 
>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working 
>>>>>>>>> well with no problems]
>>>>>>>>
>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where 
>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>
>>>>>>> I've already reported that to the Debian team at
>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so
>>>>>>> far nobody has taken care of this issue.
>>>>>>>
>>>>>>>>> So far to solve the problem I was trying to upgrade system to 
>>>>>>>>> the newest stable version, upgrade kernel to version 6.x, 
>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>
>>>>>>>> Thank you for checking that. Too bad it’s still present. To 
>>>>>>>> rule out some user space problem, could you test Debian 9.7 
>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>
>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time 
>>>>>>>> is spent?
>>>>>>>
>>>>>>> During my first test in a real environment with subscribers I
>>>>>>> gathered the following data with perf:
>>>>>>>
>>>>>>>   27.83%  [kernel]                   [k] strncpy
>>>>>>>   14.80%  [kernel]                   [k] nft_do_chain
>>>>>>>    7.61%  [kernel]                   [k] memcmp
>>>>>>>    5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>>>>    3.14%  [kernel]                   [k] nft_cmp_eval
>>>>>>>    2.79%  [kernel]                   [k] asm_exc_nmi
>>>>>>>    1.07%  [kernel]                   [k] module_get_kallsym
>>>>>>>    0.92%  [kernel]                   [k] 
>>>>>>> kallsyms_expand_symbol.constprop.0
>>>>>>>    0.85%  [kernel]                   [k] ixgbe_poll
>>>>>>>    0.75%  [kernel]                   [k] format_decode
>>>>>>>    0.61%  [kernel]                   [k] number
>>>>>>>    0.56%  [kernel]                   [k] menu_select
>>>>>>>    0.54%  [kernel]                   [k] clflush_cache_range
>>>>>>>    0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>>>>    0.51%  [kernel]                   [k] vsnprintf
>>>>>>>    0.50%  [kernel]                   [k] u32_classify
>>>>>>>    0.49%  [kernel]                   [k] fib_table_lookup
>>>>>>>    0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>>>>    0.39%  [kernel]                   [k] domain_mapping
>>>>>>>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>>>>
>>>>>>>
>>>>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM 
>>>>>>> TIME+ COMMAND
>>>>>>>      18 root      20   0       0      0      0 S  28.2 0.0 
>>>>>>> 7:06.27 ksoftirqd/1
>>>>>>>      12 root      20   0       0      0      0 R  12.0 0.0 
>>>>>>> 4:10.88 ksoftirqd/0
>>>>>
>>>>> […]
>>>>>
>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>
>>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64 
>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on 
>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>
>>>>        1 root      20   0  163948  10288   7696 S   0.0 0.1 0:39.58 
>>>> systemd
>>>
>>> […]
>>>
>>> The content of `/proc/interrupts` has a different format on my system.
>>>
>>> ```
>>> $ head -3 /proc/interrupts
>>>            CPU0       CPU1       CPU2       CPU3
>>>   1:      55560          0        113          0  IR-IO-APIC 1-edge 
>>> i8042
>>>   8:          0          0          0          0  IR-IO-APIC 8-edge 
>>> rtc0
>>> ```
>>> […]
>>>
>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 
>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>
>>>> 31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92  
>>>> kworker/7:0
>>>>      1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14 
>>>> systemd
>>>
>>> […]
>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which 
>>>>>>>>> is a recent kernel it might not properly support the chipsets 
>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use 
>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04 
>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>> point, might be a good idea.
>>>>>>>>
>>>>>>>> As nobody has responded yet, bisecting the issue is probably 
>>>>>>>> the fastest way to get to the bottom of this. Luckily the 
>>>>>>>> problem seems reproducible and you seem to be able to build a 
>>>>>>>> Linux kernel yourself, so that should work. (For testing 
>>>>>>>> purposes you could also test with Ubuntu, as they provide Linux 
>>>>>>>> kernel builds for (almost) all releases in their Linux kernel 
>>>>>>>> mainline PPA [2].)
>>>>>>>>
>>>>>>> Of course  I can try Ubuntu and report how it is working.
>>>>>>>
>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way 
>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>
>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 
>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>
>>>>> Anyway, I think, you won’t come around bisecting. Another hint, 
>>>>> make sure that you can build a 4.9 Linux kernel yourself, that 
>>>>> does not exhibit that issue.
>>>>>
>>>> That`s right, it is 22.04. I don`t have to build it. Standard 
>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems 
>>>> for past 4 years.
>>>
>>> If nobody of the developers/maintainers is going to step up, you are 
>>> on your own. Again, as you can reproduce this easily, the fastest 
>>> way is to bisect the issue, which you can do on your own.
>>
>> How can I investigate that further?
>
> I repeat myself, please bisect the issue. It’s the fastest way.
>
>> I thought about trying to change some of the parameters related to 
>> ixgbe driver and observe if anything is changing, but when I am 
>> trying to do:
>>
>> sudo modprobe ixgbe IntMode=0
>>
>> I get the following error in the dmesg:
>>
>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>
> […]
>
> `modinfo ixgbe` shows the supported parameters.
>
>
> Kind regards,
>
> Paul
>
>
> PS: If you need help bisecting, please ask. Otherwise, I am out of 
> this thread.

OK, how exactly can I bisect this issue?


Best regards

Bartek Kois

>
>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-23 18:58                 ` Bartek Kois
@ 2023-01-23 19:03                   ` Paul Menzel
  2023-01-24  9:33                     ` Linux kernel regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 14+ messages in thread
From: Paul Menzel @ 2023-01-23 19:03 UTC (permalink / raw)
  To: Bartek Kois; +Cc: intel-wired-lan, regressions

Dear Bartek,


On 23.01.23 at 19:58, Bartek Kois wrote:

> On 23.01.2023 at 19:53, Paul Menzel wrote:

>> On 23.01.23 at 19:38, Bartek Kois wrote:
>>>
>>> On 22.01.2023 at 21:28, Paul Menzel wrote:
>>>> Dear Bartek,
>>>>
>>>>
>>>> On 19.01.23 at 18:17, Bartek Kois wrote:
>>>>> On 19.01.2023 at 18:09, Paul Menzel wrote:
>>>>
>>>>>> On 19.01.23 at 17:58, Bartek Kois wrote:
>>>>>>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>>>>>>
>>>>>>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>>>>>>
>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>
>>>>>>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>>>>>>
>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip 
>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel 
>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load 
>>>>>>>>>> (even if no traffic is passing through the adapter) and 
>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>
>>>>>>>>> How do you test the network performance? Please give exact 
>>>>>>>>> numbers for comparison.
>>>>>>>>>
>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>> chipset) work with no problems after that migration, but
>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>>>>>>>> strangely, with high CPU load (0.5-0.8, while before it was
>>>>>>>> around 0.0-0.1) and subscribers not being able to utilize their
>>>>>>>> plans. I tried to strip down the problem and ended up with a
>>>>>>>> clean system with no iptables or hfsc rules behaving the same
>>>>>>>> way (higher load) right after setting the 10G link up, even if
>>>>>>>> no traffic is passing through.
>>>>>>>>
>>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla system
>>>>>>>>>> with no network attached. The problem can be observed on the 
>>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the 
>>>>>>>>>> Supermicro
>>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>>
>>>>>>>>>> Tested environments:
>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 4.9.88-1+deb9u1 
>>>>>>>>>> (2018-05-07) x86_64 GNU/Linux [all platforms working well with 
>>>>>>>>>> no problems: Supermicro X9SCL (Intel C202 PCH), Supermicro 
>>>>>>>>>> X10SLL+-F (Intel C222 Express PCH), Supermicro X11SSL-F 
>>>>>>>>>> (Intel® C232 chipset)]
>>>>>>>>>
>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2 
>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro 
>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222 
>>>>>>>>>> Express PCH) behave problematic as described above | newer 
>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working 
>>>>>>>>>> well with no problems]
>>>>>>>>>
>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where 
>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>
>>>>>>>> I've already reported that to the Debian team at
>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but so
>>>>>>>> far nobody has taken care of this issue.
>>>>>>>>
>>>>>>>>>> So far to solve the problem I was trying to upgrade system to 
>>>>>>>>>> the newest stable version, upgrade kernel to version 6.x, 
>>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>>
>>>>>>>>> Thank you for checking that. Too bad it’s still present. To 
>>>>>>>>> rule out some user space problem, could you test Debian 9.7 
>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>
>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time 
>>>>>>>>> is spent?
>>>>>>>>
>>>>>>>> During my first test in a real environment with subscribers I
>>>>>>>> gathered the following data with perf:
>>>>>>>>
>>>>>>>>   27.83%  [kernel]                   [k] strncpy
>>>>>>>>   14.80%  [kernel]                   [k] nft_do_chain
>>>>>>>>    7.61%  [kernel]                   [k] memcmp
>>>>>>>>    5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>>>>>    3.14%  [kernel]                   [k] nft_cmp_eval
>>>>>>>>    2.79%  [kernel]                   [k] asm_exc_nmi
>>>>>>>>    1.07%  [kernel]                   [k] module_get_kallsym
>>>>>>>>    0.92%  [kernel]                   [k] 
>>>>>>>> kallsyms_expand_symbol.constprop.0
>>>>>>>>    0.85%  [kernel]                   [k] ixgbe_poll
>>>>>>>>    0.75%  [kernel]                   [k] format_decode
>>>>>>>>    0.61%  [kernel]                   [k] number
>>>>>>>>    0.56%  [kernel]                   [k] menu_select
>>>>>>>>    0.54%  [kernel]                   [k] clflush_cache_range
>>>>>>>>    0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>>>>>    0.51%  [kernel]                   [k] vsnprintf
>>>>>>>>    0.50%  [kernel]                   [k] u32_classify
>>>>>>>>    0.49%  [kernel]                   [k] fib_table_lookup
>>>>>>>>    0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>>>>>    0.39%  [kernel]                   [k] domain_mapping
>>>>>>>>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>>>>>
>>>>>>>>
>>>>>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM 
>>>>>>>> TIME+ COMMAND
>>>>>>>>      18 root      20   0       0      0      0 S  28.2 0.0 
>>>>>>>> 7:06.27 ksoftirqd/1
>>>>>>>>      12 root      20   0       0      0      0 R  12.0 0.0 
>>>>>>>> 4:10.88 ksoftirqd/0
>>>>>>
>>>>>> […]
>>>>>>
>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>
>>>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64 
>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on 
>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>
>>>>>        1 root      20   0  163948  10288   7696 S   0.0 0.1 0:39.58 
>>>>> systemd
>>>>
>>>> […]
>>>>
>>>> The content of `/proc/interrupts` has a different format on my system.
>>>>
>>>> ```
>>>> $ head -3 /proc/interrupts
>>>>            CPU0       CPU1       CPU2       CPU3
>>>>   1:      55560          0        113          0  IR-IO-APIC 1-edge 
>>>> i8042
>>>>   8:          0          0          0          0  IR-IO-APIC 8-edge 
>>>> rtc0
>>>> ```
>>>> […]
>>>>
>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian 
>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>
>>>>> 31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92 
>>>>> kworker/7:0
>>>>>      1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14 
>>>>> systemd
>>>>
>>>> […]
>>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which 
>>>>>>>>>> is a recent kernel it might not properly support the chipsets 
>>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use 
>>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04 
>>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>> point, might be a good idea.
>>>>>>>>>
>>>>>>>>> As nobody has responded yet, bisecting the issue is probably 
>>>>>>>>> the fastest way to get to the bottom of this. Luckily the 
>>>>>>>>> problem seems reproducible and you seem to be able to build a 
>>>>>>>>> Linux kernel yourself, so that should work. (For testing 
>>>>>>>>> purposes you could also test with Ubuntu, as they provide Linux 
>>>>>>>>> kernel builds for (almost) all releases in their Linux kernel 
>>>>>>>>> mainline PPA [2].)
>>>>>>>>>
>>>>>>>> Of course  I can try Ubuntu and report how it is working.
>>>>>>>>
>>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way 
>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>
>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu 
>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>
>>>>>> Anyway, I think, you won’t come around bisecting. Another hint, 
>>>>>> make sure that you can build a 4.9 Linux kernel yourself, that 
>>>>>> does not exhibit that issue.
>>>>>>
>>>>> That`s right, it is 22.04. I don`t have to build it. Standard 
>>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems 
>>>>> for past 4 years.
>>>>
>>>> If nobody of the developers/maintainers is going to step up, you are 
>>>> on your own. Again, as you can reproduce this easily, the fastest 
>>>> way is to bisect the issue, which you can do on your own.
>>>
>>> How can I investigate that further?
>>
>> I repeat myself, please bisect the issue. It’s the fastest way.
>>
>>> I thought about trying to change some of the parameters related to 
>>> ixgbe driver and observe if anything is changing, but when I am 
>>> trying to do:
>>>
>>> sudo modprobe ixgbe IntMode=0
>>>
>>> I get the following error in the dmesg:
>>>
>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>
>> […]
>>
>> `modinfo ixgbe` shows the supported parameters.

>> PS: If you need help bisecting, please ask. Otherwise, I am out of 
>> this thread.
> 
> Ok, how exactly I can bisect this issue?

What have you tried so far? As written before, I’d first try more
distributions, for example, older Ubuntu versions. Once you have a
rough range, I’d narrow it down with the builds from the Ubuntu PPA,
then with the release candidate versions in between, and only then
start doing `git bisect` as described in the documentation [3].
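
Each stage of that narrowing (distribution kernels, PPA builds, release
candidates, finally `git bisect`) is the same binary search between a
known-good and a known-bad version. A minimal sketch in Python, assuming
a hypothetical `is_bad` callback that stands for "boot that kernel, run
`ip link set enp1s0 up`, and check the load"; the version list and the
pretended regression point are invented purely for illustration:

```python
from typing import Callable, Sequence


def first_bad(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the first version for which is_bad() is true.

    Assumes versions are ordered oldest to newest, versions[0] is known
    good, versions[-1] is known bad, and the problem never disappears
    again once it has appeared.
    """
    good, bad = 0, len(versions) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(versions[mid]):
            bad = mid    # regression is at mid or earlier
        else:
            good = mid   # regression was introduced after mid
    return versions[bad]


if __name__ == "__main__":
    # A hypothetical selection of mainline releases between 4.9 and 5.10.
    releases = ["4.9", "4.14", "4.15", "4.19", "5.4", "5.10"]
    # Invented for illustration only: pretend everything from 4.19 on is bad.
    pretend_bad = {"4.19", "5.4", "5.10"}
    tested = []

    def is_bad(version: str) -> bool:
        tested.append(version)  # each call stands for one boot-and-test cycle
        return version in pretend_bad

    print(first_bad(releases, is_bad), "found after testing", tested)
```

The number of kernels to boot grows only with the logarithm of the
range, which is why a coarse range from distribution kernels is worth
establishing before building anything.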


Kind regards,

Paul


>>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
[3]: https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html


* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-23 19:03                   ` Paul Menzel
@ 2023-01-24  9:33                     ` Linux kernel regression tracking (Thorsten Leemhuis)
  2023-01-24  9:40                       ` Bartek Kois
  0 siblings, 1 reply; 14+ messages in thread
From: Linux kernel regression tracking (Thorsten Leemhuis) @ 2023-01-24  9:33 UTC (permalink / raw)
  To: Paul Menzel, Bartek Kois; +Cc: intel-wired-lan, regressions

On 23.01.23 20:03, Paul Menzel wrote:
> Am 23.01.23 um 19:58 schrieb Bartek Kois:
> 
>> W dniu 23.01.2023 o 19:53, Paul Menzel pisze:
> 
>>> Am 23.01.23 um 19:38 schrieb Bartek Kois:
>>>>
>>>> W dniu 22.01.2023 o 21:28, Paul Menzel pisze:
>>>>> Dear Bartek,
>>>>>
>>>>>
>>>>> Am 19.01.23 um 18:17 schrieb Bartek Kois:
>>>>>> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>>>>>
>>>>>>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>>>>>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>>>>>>
>>>>>>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>>>>>>
>>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>>
>>>>>>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>>>>>>
>>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>>
>>>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>>>> numbers for comparison.
>>>>>>>>>>
>>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>>> chipset) work with no problems after that migration, but
>>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>>>>>>>>> strangely, with high cpu load (0.5-0.8, while before it was
>>>>>>>>> around 0.0-0.1) and subscribers not being able to utilize
>>>>>>>>> their plans. I tried to strip down the problem and ended up
>>>>>>>>> with a clean system with no iptables or hfsc rules that
>>>>>>>>> behaves the same (higher load) right after setting the 10G
>>>>>>>>> link up, even if no traffic is passing through.
>>>>>>>>>
>>>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla
>>>>>>>>>>> system
>>>>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>>>>> Supermicro
>>>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>>>
>>>>>>>>>>> Tested environments:
>>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>>>>> 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux [all platforms
>>>>>>>>>>> working well with no problems: Supermicro X9SCL (Intel C202
>>>>>>>>>>> PCH), Supermicro X10SLL+-F (Intel C222 Express PCH),
>>>>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>>>>>>
>>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>>>> Express PCH) behave problematic as described above | newer
>>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>>>> well with no problems]
>>>>>>>>>>
>>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>>
>>>>>>>>> I've already reported that to the Debian team at
>>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>>>> nobody has taken care of this issue so far.
>>>>>>>>>
>>>>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>>>
>>>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>>
>>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>>>> is spent?
>>>>>>>>>
>>>>>>>>> During my first test in a real environment with subscribers, I
>>>>>>>>> gathered the following data with perf:
>>>>>>>>>
>>>>>>>>>   27.83%  [kernel]                   [k] strncpy
>>>>>>>>>   14.80%  [kernel]                   [k] nft_do_chain
>>>>>>>>>    7.61%  [kernel]                   [k] memcmp
>>>>>>>>>    5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>>>>>>    3.14%  [kernel]                   [k] nft_cmp_eval
>>>>>>>>>    2.79%  [kernel]                   [k] asm_exc_nmi
>>>>>>>>>    1.07%  [kernel]                   [k] module_get_kallsym
>>>>>>>>>    0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>>>>>>>>>    0.85%  [kernel]                   [k] ixgbe_poll
>>>>>>>>>    0.75%  [kernel]                   [k] format_decode
>>>>>>>>>    0.61%  [kernel]                   [k] number
>>>>>>>>>    0.56%  [kernel]                   [k] menu_select
>>>>>>>>>    0.54%  [kernel]                   [k] clflush_cache_range
>>>>>>>>>    0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>>>>>>    0.51%  [kernel]                   [k] vsnprintf
>>>>>>>>>    0.50%  [kernel]                   [k] u32_classify
>>>>>>>>>    0.49%  [kernel]                   [k] fib_table_lookup
>>>>>>>>>    0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>>>>>>    0.39%  [kernel]                   [k] domain_mapping
>>>>>>>>>    0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>     PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>>>>>>      18 root      20   0       0      0      0 S  28.2   0.0   7:06.27 ksoftirqd/1
>>>>>>>>>      12 root      20   0       0      0      0 R  12.0   0.0   4:10.88 ksoftirqd/0
>>>>>>>
>>>>>>> […]
>>>>>>>
>>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>>
>>>>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>>
>>>>>>        1 root      20   0  163948  10288   7696 S   0.0 0.1
>>>>>> 0:39.58 systemd
>>>>>
>>>>> […]
>>>>>
>>>>> The content of `/proc/interrupts` has a different format on my system.
>>>>>
>>>>> ```
>>>>> $ head -3 /proc/interrupts
>>>>>            CPU0       CPU1       CPU2       CPU3
>>>>>   1:      55560          0        113          0  IR-IO-APIC 1-edge
>>>>> i8042
>>>>>   8:          0          0          0          0  IR-IO-APIC 8-edge
>>>>> rtc0
>>>>> ```
>>>>> […]
>>>>>
>>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>
>>>>>> 31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92
>>>>>> kworker/7:0
>>>>>>      1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14
>>>>>> systemd
>>>>>
>>>>> […]
>>>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which
>>>>>>>>>>> is a recent kernel it might not properly support the chipsets
>>>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use
>>>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04
>>>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>>> point, might be a good idea.
>>>>>>>>>>
>>>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>>>> purposes you could also test with Ubuntu, as they provide
>>>>>>>>>> Linux kernel builds for (almost) all releases in their Linux
>>>>>>>>>> kernel mainline PPA [2].)
>>>>>>>>>>
>>>>>>>>> Of course  I can try Ubuntu and report how it is working.
>>>>>>>>>
>>>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>>
>>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>>
>>>>>>> Anyway, I think, you won’t come around bisecting. Another hint,
>>>>>>> make sure that you can build a 4.9 Linux kernel yourself, that
>>>>>>> does not exhibit that issue.
>>>>>>>
>>>>>> That`s right, it is 22.04. I don`t have to build it. Standard
>>>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>>>>>> for past 4 years.
>>>>>
>>>>> If nobody of the developers/maintainers is going to step up, you
>>>>> are on your own. Again, as you can reproduce this easily, the
>>>>> fastest way is to bisect the issue, which you can do on your own.
>>>>
>>>> How can I investigate that further?
>>>
>>> I repeat myself, please bisect the issue. It’s the fastest way.
>>>
>>>> I thought about trying to change some of the parameters related to
>>>> ixgbe driver and observe if anything is changing, but when I am
>>>> trying to do:
>>>>
>>>> sudo modprobe ixgbe IntMode=0
>>>>
>>>> I get the following error in the dmesg:
>>>>
>>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>>
>>> […]
>>>
>>> `modinfo ixgbe` shows the supported parameters.
> 
>>> PS: If you need help bisecting, please ask. Otherwise, I am out of
>>> this thread.
>>
>> Ok, how exactly I can bisect this issue?
> 
> What have you tried so far? As written in the past, I’d first try more
> distributions, for example, older Ubuntu versions. Then, if you have
> some range, I’d use the Ubuntu PPA, and then between the release
> candidate versions, only then start doing `git bisect` as documented in
> the documentation [3].

Hmmm. I'm not an expert in that area, but if you follow Paul's advice
keep in mind that a deliberate config change by the distro might have an
impact here. Hence it might be a good idea to rule that out first by
taking a config from a working kernel and using it (with the help of
"make olddefconfig") to build your own kernel from the version that is
known to fail. But over such a wide range of versions this can be
tricky. :-/
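
Before building anything, it may help to see which options actually
differ between the config of the working kernel and that of the failing
one. A minimal sketch in Python; the function names are made up for
this example, and the sample option names in the demo are invented, not
taken from any real Debian config:

```python
def parse_config(text: str) -> dict:
    """Map CONFIG_* option names to their values from kernel .config text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            # Disabled options appear as "# CONFIG_FOO is not set".
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_diff(old_text: str, new_text: str) -> dict:
    """Return {option: (old_value, new_value)} for options that differ.

    Options missing from one of the two files are reported as None.
    """
    old, new = parse_config(old_text), parse_config(new_text)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(old.keys() | new.keys())
        if old.get(name) != new.get(name)
    }


if __name__ == "__main__":
    # Invented sample snippets standing in for the two real config files.
    working = "CONFIG_EXAMPLE_A=y\n# CONFIG_EXAMPLE_B is not set\nCONFIG_HZ=250\n"
    failing = "CONFIG_EXAMPLE_A=m\nCONFIG_EXAMPLE_B=y\nCONFIG_HZ=250\n"
    for name, (old, new) in config_diff(working, failing).items():
        print(f"{name}: {old} -> {new}")
```

On Debian the config of an installed kernel is normally found under
/boot/config-<version>, so the two files could be read from there. The
kernel tree also ships scripts/diffconfig for the same purpose. Many of
the differences over such a wide range will simply be options that did
not exist in 4.9, which is the tricky part mentioned above.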

But apart from that Paul is right afaics: nobody yet had an idea what
might cause this regression, hence we need a bisection to pin-point the
problem.

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.



>>>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
> [3]: https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
> 
> 


* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-24  9:33                     ` Linux kernel regression tracking (Thorsten Leemhuis)
@ 2023-01-24  9:40                       ` Bartek Kois
  2023-03-23 13:46                         ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 1 reply; 14+ messages in thread
From: Bartek Kois @ 2023-01-24  9:40 UTC (permalink / raw)
  To: Linux regressions mailing list, Paul Menzel; +Cc: intel-wired-lan


W dniu 24.01.2023 o 10:33, Linux kernel regression tracking (Thorsten 
Leemhuis) pisze:
> On 23.01.23 20:03, Paul Menzel wrote:
>> Am 23.01.23 um 19:58 schrieb Bartek Kois:
>>
>>> W dniu 23.01.2023 o 19:53, Paul Menzel pisze:
>>>> Am 23.01.23 um 19:38 schrieb Bartek Kois:
>>>>> W dniu 22.01.2023 o 21:28, Paul Menzel pisze:
>>>>>> Dear Bartek,
>>>>>>
>>>>>>
>>>>>> Am 19.01.23 um 18:17 schrieb Bartek Kois:
>>>>>>> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>>>>>>>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>>>>>>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>>>>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>>>>>>>
>>>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>>>>> numbers for comparison.
>>>>>>>>>>>
>>>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>>>> chipset) work with no problems after that migration, but
>>>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>>>>>>>>>> strangely, with high cpu load (0.5-0.8, while before it was
>>>>>>>>>> around 0.0-0.1) and subscribers not being able to utilize
>>>>>>>>>> their plans. I tried to strip down the problem and ended up
>>>>>>>>>> with a clean system with no iptables or hfsc rules that
>>>>>>>>>> behaves the same (higher load) right after setting the 10G
>>>>>>>>>> link up, even if no traffic is passing through.
>>>>>>>>>>
>>>>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla
>>>>>>>>>>>> system
>>>>>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>>>>>> Supermicro
>>>>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>>>>
>>>>>>>>>>>> Tested environments:
>>>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>>>>>> 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux [all platforms
>>>>>>>>>>>> working well with no problems: Supermicro X9SCL (Intel C202
>>>>>>>>>>>> PCH), Supermicro X10SLL+-F (Intel C222 Express PCH),
>>>>>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>>>>> Express PCH) behave problematic as described above | newer
>>>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>>>>> well with no problems]
>>>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>>>
>>>>>>>>>> I've already reported that to the Debian team at
>>>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>>>>> nobody has taken care of this issue so far.
>>>>>>>>>>
>>>>>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>>>
>>>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>>>>> is spent?
>>>>>>>>>> During my first test in a real environment with subscribers, I
>>>>>>>>>> gathered the following data with perf:
>>>>>>>>>>
>>>>>>>>>>    27.83%  [kernel]                   [k] strncpy
>>>>>>>>>>    14.80%  [kernel]                   [k] nft_do_chain
>>>>>>>>>>     7.61%  [kernel]                   [k] memcmp
>>>>>>>>>>     5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>>>>>>>     3.14%  [kernel]                   [k] nft_cmp_eval
>>>>>>>>>>     2.79%  [kernel]                   [k] asm_exc_nmi
>>>>>>>>>>     1.07%  [kernel]                   [k] module_get_kallsym
>>>>>>>>>>     0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>>>>>>>>>>     0.85%  [kernel]                   [k] ixgbe_poll
>>>>>>>>>>     0.75%  [kernel]                   [k] format_decode
>>>>>>>>>>     0.61%  [kernel]                   [k] number
>>>>>>>>>>     0.56%  [kernel]                   [k] menu_select
>>>>>>>>>>     0.54%  [kernel]                   [k] clflush_cache_range
>>>>>>>>>>     0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>>>>>>>     0.51%  [kernel]                   [k] vsnprintf
>>>>>>>>>>     0.50%  [kernel]                   [k] u32_classify
>>>>>>>>>>     0.49%  [kernel]                   [k] fib_table_lookup
>>>>>>>>>>     0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>>>>>>>     0.39%  [kernel]                   [k] domain_mapping
>>>>>>>>>>     0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>>>>>>>       18 root      20   0       0      0      0 S  28.2   0.0   7:06.27 ksoftirqd/1
>>>>>>>>>>       12 root      20   0       0      0      0 R  12.0   0.0   4:10.88 ksoftirqd/0
>>>>>>>> […]
>>>>>>>>
>>>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>>>
>>>>>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>>>
>>>>>>>         1 root      20   0  163948  10288   7696 S   0.0 0.1
>>>>>>> 0:39.58 systemd
>>>>>> […]
>>>>>>
>>>>>> The content of `/proc/interrupts` has a different format on my system.
>>>>>>
>>>>>> ```
>>>>>> $ head -3 /proc/interrupts
>>>>>>             CPU0       CPU1       CPU2       CPU3
>>>>>>    1:      55560          0        113          0  IR-IO-APIC 1-edge
>>>>>> i8042
>>>>>>    8:          0          0          0          0  IR-IO-APIC 8-edge
>>>>>> rtc0
>>>>>> ```
>>>>>> […]
>>>>>>
>>>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>>
>>>>>>> 31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92
>>>>>>> kworker/7:0
>>>>>>>       1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14
>>>>>>> systemd
>>>>>> […]
>>>>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>>>>> it might be kernel related debian 11.5 has kernel 5.10 which
>>>>>>>>>>>> is a recent kernel it might not properly support the chipsets
>>>>>>>>>>>> for X9 therefore i suggest to use RHEL or CentOS as they use
>>>>>>>>>>>> much older kernel versions. I expect that with ubuntu 20.04
>>>>>>>>>>>> you see the same problem it uses kernel 5.4
>>>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>>>> point, might be a good idea.
>>>>>>>>>>>
>>>>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>>>>> purposes you could also test with Ubuntu, as they provide
>>>>>>>>>>> Linux kernel builds for (almost) all releases in their Linux
>>>>>>>>>>> kernel mainline PPA [2].)
>>>>>>>>>>>
>>>>>>>>>> Of course  I can try Ubuntu and report how it is working.
>>>>>>>>>>
>>>>>>>>> Ubuntu (5.15.0-43-generic) seems to be working in the same way
>>>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>>>
>>>>>>>> Anyway, I think, you won’t come around bisecting. Another hint,
>>>>>>>> make sure that you can build a 4.9 Linux kernel yourself, that
>>>>>>>> does not exhibit that issue.
>>>>>>>>
>>>>>>> That`s right, it is 22.04. I don`t have to build it. Standard
>>>>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>>>>>>> for past 4 years.
>>>>>> If nobody of the developers/maintainers is going to step up, you
>>>>>> are on your own. Again, as you can reproduce this easily, the
>>>>>> fastest way is to bisect the issue, which you can do on your own.
>>>>> How can I investigate that further?
>>>> I repeat myself, please bisect the issue. It’s the fastest way.
>>>>
>>>>> I thought about trying to change some of the parameters related to
>>>>> ixgbe driver and observe if anything is changing, but when I am
>>>>> trying to do:
>>>>>
>>>>> sudo modprobe ixgbe IntMode=0
>>>>>
>>>>> I get the following error in the dmesg:
>>>>>
>>>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>>> […]
>>>>
>>>> `modinfo ixgbe` shows the supported parameters.
>>>> PS: If you need help bisecting, please ask. Otherwise, I am out of
>>>> this thread.
>>> Ok, how exactly I can bisect this issue?
>> What have you tried so far? As written in the past, I’d first try more
>> distributions, for example, older Ubuntu versions. Then, if you have
>> some range, I’d use the Ubuntu PPA, and then between the release
>> candidate versions, only then start doing `git bisect` as documented in
>> the documentation [3].
> Hmmm. I'm not an expert in that area, but if you follow Paul's advice
> keep in mind that a deliberate config change by the distro might have an
> impact here. Hence it might be a good idea to rule that out first by
> taking a config from a working kernel and using it (with the help of
> "make olddefconfig") to build your own kernel from the version that is
> known to fail. But over such a wide range of versions this can be
> tricky. :-/
>
> But apart from that Paul is right afaics: nobody yet had an idea what
> might cause this regression, hence we need a bisection to pin-point the
> problem.

Thanks for the advice. I'll try my best to find out which commit caused
the problem, but it will take me some time, as I have never bisected
before, especially on that scale. What surprises me most is that nobody
has reported this issue so far, considering that these platforms
combined with Debian and the Intel 82599EN NIC are, I think, quite a
common configuration.

Best regards

Bartek Kois

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
>
>
>>>>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
>> [3]: https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
>>
>>


* Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
  2023-01-24  9:40                       ` Bartek Kois
@ 2023-03-23 13:46                         ` Linux regression tracking (Thorsten Leemhuis)
  0 siblings, 0 replies; 14+ messages in thread
From: Linux regression tracking (Thorsten Leemhuis) @ 2023-03-23 13:46 UTC (permalink / raw)
  To: Bartek Kois, Linux regressions mailing list, Paul Menzel; +Cc: intel-wired-lan

On 24.01.23 10:40, Bartek Kois wrote:
> W dniu 24.01.2023 o 10:33, Linux kernel regression tracking (Thorsten
> Leemhuis) pisze:
>> On 23.01.23 20:03, Paul Menzel wrote:
>>> Am 23.01.23 um 19:58 schrieb Bartek Kois:
>>>> W dniu 23.01.2023 o 19:53, Paul Menzel pisze:
>>>>> Am 23.01.23 um 19:38 schrieb Bartek Kois:
>>>>>> W dniu 22.01.2023 o 21:28, Paul Menzel pisze:
>>>>>>> Am 19.01.23 um 18:17 schrieb Bartek Kois:
>>>>>>>> W dniu 19.01.2023 o 18:09, Paul Menzel pisze:
>>>>>>>>> Am 19.01.23 um 17:58 schrieb Bartek Kois:
>>>>>>>>>> W dniu 19.01.2023 o 13:24, Bartek Kois pisze:
>>>>>>>>>>> W dniu 19.01.2023 o 11:17, Paul Menzel pisze:
>>>>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>>>>>>> Am 14.01.23 um 11:23 schrieb Bartek Kois:
>>>>>>>>>>>>
>>>>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>>>>>> numbers for comparison.
>>>>>>>>>>>>
>>>>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>>>>> chipset) work with no problems after that migration, but
>>>>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>>>>>>>>>>> strangely, with high cpu load (0.5-0.8, while before it was
>>>>>>>>>>> around 0.0-0.1) and subscribers not being able to utilize
>>>>>>>>>>> their plans. I tried to strip down the problem and ended up
>>>>>>>>>>> with a clean system with no iptables or hfsc rules that
>>>>>>>>>>> behaves the same (higher load) right after setting the 10G
>>>>>>>>>>> link up, even if no traffic is passing through.
>>>>>>>>>>>
>>>>>>>>>>>>> The cpu load is oscillating between 0.1 and 0.3 on vanilla
>>>>>>>>>>>>> system
>>>>>>>>>>>>> with no network attached. The problem can be observed on the
>>>>>>>>>>>>> following platforms: Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH), but for the
>>>>>>>>>>>>> Supermicro
>>>>>>>>>>>>> X11SSL-F (Intel® C232 chipset) everything is working well.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Tested environments:
>>>>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>>>>>>> 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux [all platforms
>>>>>>>>>>>>> working well with no problems: Supermicro X9SCL (Intel C202
>>>>>>>>>>>>> PCH), Supermicro X10SLL+-F (Intel C222 Express PCH),
>>>>>>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>>>>>> Express PCH) behave problematic as described above | newer
>>>>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>>>>>> well with no problems]
>>>>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>>>>
>>>>>>>>>>> I've already reported that to the Debian team at
>>>>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>>>>>> nobody has taken care of this issue so far.
>>>>>>>>>>>
>>>>>>>>>>>>> So far to solve the problem I was trying to upgrade system to
>>>>>>>>>>>>> the newest stable version, upgrade kernel to version 6.x,
>>>>>>>>>>>>> upgrade ixgbe driver to the newest version but with no luck.
>>>>>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>>>>
>>>>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>>>>>> is spent?
>>>>>>>>>>> During my first test in a real environment with subscribers, I
>>>>>>>>>>> gathered the following data with perf:
>>>>>>>>>>>
>>>>>>>>>>>    27.83%  [kernel]                   [k] strncpy
>>>>>>>>>>>    14.80%  [kernel]                   [k] nft_do_chain
>>>>>>>>>>>     7.61%  [kernel]                   [k] memcmp
>>>>>>>>>>>     5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>>>>>>>>     3.14%  [kernel]                   [k] nft_cmp_eval
>>>>>>>>>>>     2.79%  [kernel]                   [k] asm_exc_nmi
>>>>>>>>>>>     1.07%  [kernel]                   [k] module_get_kallsym
>>>>>>>>>>>     0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>>>>>>>>>>>     0.85%  [kernel]                   [k] ixgbe_poll
>>>>>>>>>>>     0.75%  [kernel]                   [k] format_decode
>>>>>>>>>>>     0.61%  [kernel]                   [k] number
>>>>>>>>>>>     0.56%  [kernel]                   [k] menu_select
>>>>>>>>>>>     0.54%  [kernel]                   [k] clflush_cache_range
>>>>>>>>>>>     0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>>>>>>>>     0.51%  [kernel]                   [k] vsnprintf
>>>>>>>>>>>     0.50%  [kernel]                   [k] u32_classify
>>>>>>>>>>>     0.49%  [kernel]                   [k] fib_table_lookup
>>>>>>>>>>>     0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>>>>>>>>     0.39%  [kernel]                   [k] domain_mapping
>>>>>>>>>>>     0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
>>>>>>>>>>>       18 root      20   0       0      0      0 S  28.2   0.0   7:06.27 ksoftirqd/1
>>>>>>>>>>>       12 root      20   0       0      0      0 R  12.0   0.0   4:10.88 ksoftirqd/0
>>>>>>>>> […]
>>>>>>>>>
>>>>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>>>>
>>>>>>>> This is how it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>>>>
>>>>>>>>         1 root      20   0  163948  10288   7696 S   0.0 0.1
>>>>>>>> 0:39.58 systemd
>>>>>>> […]
>>>>>>>
>>>>>>> The content of `/proc/interrupts` has a different format on my
>>>>>>> system.
>>>>>>>
>>>>>>> ```
>>>>>>> $ head -3 /proc/interrupts
>>>>>>>             CPU0       CPU1       CPU2       CPU3
>>>>>>>    1:      55560          0        113          0  IR-IO-APIC 1-edge
>>>>>>> i8042
>>>>>>>    8:          0          0          0          0  IR-IO-APIC 8-edge
>>>>>>> rtc0
>>>>>>> ```
>>>>>>> […]
>>>>>>>
>>>>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>>>
>>>>>>>> 31659 root      20   0       0      0      0 S   0.3  0.0   0:00.92 kworker/7:0
>>>>>>>>     1 root      20   0   57032   6736   5256 S   0.0  0.1   2:28.14 systemd
>>>>>>> […]
>>>>>>>>>>>>> Supermicro support suggested as follows:
>>>>>>>>>>>>> It might be kernel related. Debian 11.5 has kernel 5.10, which
>>>>>>>>>>>>> is a recent kernel; it might not properly support the chipsets
>>>>>>>>>>>>> for X9. Therefore I suggest using RHEL or CentOS, as they use
>>>>>>>>>>>>> much older kernel versions. I expect that with Ubuntu 20.04
>>>>>>>>>>>>> you will see the same problem, as it uses kernel 5.4.
>>>>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>>>>> point, might be a good idea.
>>>>>>>>>>>>
>>>>>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>>>>>> purposes you could also test with Ubuntu, as they provide
>>>>>>>>>>>> Linux kernel builds for (almost) all releases in their Linux
>>>>>>>>>>>> kernel mainline PPA [2].)
>>>>>>>>>>>>
>>>>>>>>>>> Of course I can try Ubuntu and report how it is working.
>>>>>>>>>>>
>>>>>>>>>> Ubuntu (5.15.0-43-generic) seems to behave the same way,
>>>>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>>>>
>>>>>>>>> Anyway, I think you won’t get around bisecting. Another hint:
>>>>>>>>> make sure that you can build a 4.9 Linux kernel yourself that
>>>>>>>>> does not exhibit the issue.
>>>>>>>>>
>>>>>>>> That's right, it is 22.04. I don't have to build it. The standard
>>>>>>>> Linux 4.9.0-6-amd64 kernel from Debian 9.7 has worked without
>>>>>>>> problems for the past 4 years.
>>>>>>> If none of the developers/maintainers is going to step up, you
>>>>>>> are on your own. Again, as you can reproduce this easily, the
>>>>>>> fastest way is to bisect the issue, which you can do on your own.
>>>>>> How can I investigate that further?
>>>>> I repeat myself, please bisect the issue. It’s the fastest way.
>>>>>
>>>>>> I thought about changing some of the parameters of the ixgbe
>>>>>> driver to see whether anything changes, but when I try:
>>>>>>
>>>>>> sudo modprobe ixgbe IntMode=0
>>>>>>
>>>>>> I get the following error in the dmesg:
>>>>>>
>>>>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>>>> […]
>>>>>
>>>>> `modinfo ixgbe` shows the supported parameters.
>>>>> PS: If you need help bisecting, please ask. Otherwise, I am out of
>>>>> this thread.
>>>> OK, how exactly can I bisect this issue?
>>> What have you tried so far? As written before, I’d first try more
>>> distributions, for example, older Ubuntu versions. Then, once you have
>>> a range, I’d use the Ubuntu PPA to narrow it down, then test the
>>> release candidate versions in between, and only then start doing
>>> `git bisect` as described in the documentation [3].
>> Hmmm. I'm not an expert in that area, but if you follow Paul's advice
>> keep in mind that a deliberate config change by the distro might have an
>> impact here. Hence it might be a good idea to rule that out first by
>> taking a config from a working kernel and using it (with the help of
>> "make olddefconfig") to build your own kernel from the version that is
>> known to fail. But over such a wide range of versions this can be
>> tricky. :-/
>>
>> But apart from that Paul is right afaics: nobody yet had an idea what
>> might cause this regression, hence we need a bisection to pin-point the
>> problem.
> 
> Thanks for the advice. I'll try my best to find out which commit caused
> the problem, but it will take me some time, as I have never bisected
> anything before, especially on that scale.

Did you ever get closer to the root of the problem?
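
In case the scale is what holds you back: a bisection needs roughly
log2(N) build-and-test rounds for a range of N commits, so even a very
wide range stays manageable. Below is a minimal sketch, not a tested
recipe. The tags v4.9 and v5.10, the config path and the helper name
bisect_steps are my assumptions for illustration; the commented git
and make commands need to be run inside a clone of the mainline tree.

```shell
#!/bin/sh
# Rough upper bound on the number of kernels to build and test when
# bisecting a range of N commits: every round halves the range.
bisect_steps() {
    n=$1
    steps=0
    span=1
    while [ "$span" -lt "$n" ]; do
        span=$((span * 2))
        steps=$((steps + 1))
    done
    echo "$steps"
}

# Assumed workflow (v4.9 and v5.10 stand in for the Debian 4.9.88 and
# 5.10.149 kernels; the config path is the usual Debian location):
#   git rev-list --count v4.9..v5.10      # size of the range, N
#   git bisect start
#   git bisect bad v5.10
#   git bisect good v4.9
#   cp /boot/config-4.9.0-6-amd64 .config # config of the working kernel
#   make olddefconfig                     # adapt it to the checked-out tree
#   ... build, boot, run "ip link set enp1s0 up", check the load ...
#   git bisect good                       # or: git bisect bad

bisect_steps 1000      # prints 10
```

So a range of a thousand commits takes about ten rounds, and every
tenfold increase of the range adds only three or four more.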

> What puzzles me the most is that nobody has reported this issue so
> far, given that these platforms together with Debian and the Intel
> 82599EN NIC are, I think, quite a common configuration.

I guess the answer is the usual: the problem only shows up in some
environments using that NIC -- for example if the firmware of the
motherboard or the configuration somehow directly or indirectly triggers
the problem.
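
One data point that can be compared between a working and a failing
board is how fast the interrupt counters of the NIC grow while the
link is up but idle. A minimal sketch, assuming the ixgbe lines in
/proc/interrupts mention the interface name (enp1s0 here); irq_total
is a hypothetical helper name, and the sample input is the excerpt
Paul posted earlier in this thread:

```shell
#!/bin/sh
# Sum the per-CPU counters of all /proc/interrupts lines matching a
# pattern. Reads the table on stdin so it also works on saved copies.
irq_total() {
    awk -v pat="$1" '
        NR == 1  { ncpu = NF; next }      # header row: CPU0 CPU1 ...
        $0 ~ pat { for (i = 2; i <= ncpu + 1; i++) total += $i }
        END      { printf "%.0f\n", total }
    '
}

# Sample input in the format shown earlier in this thread.
irq_total i8042 <<'EOF'
            CPU0       CPU1       CPU2       CPU3
   1:      55560          0        113          0  IR-IO-APIC 1-edge i8042
   8:          0          0          0          0  IR-IO-APIC 8-edge rtc0
EOF
# prints 55673

# On the affected machine (interface name assumed):
#   irq_total enp1s0 < /proc/interrupts; sleep 10
#   irq_total enp1s0 < /proc/interrupts
```

The difference between the two totals, taken on both kernels without
any traffic, would show whether the higher load comes with a higher
interrupt rate.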

Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
--
Everything you wanna know about Linux kernel regression tracking:
https://linux-regtracking.leemhuis.info/about/#tldr
If I did something stupid, please tell me, as explained on that page.

P.S.:

#regzbot backburner: need bisection that will take some time to get done
#regzbot poke


end of thread, other threads:[~2023-03-23 13:46 UTC | newest]

Thread overview: 14+ messages
     [not found] <d1530cba-1a72-cae8-6a04-ed8ec0f82e6e@gmail.com>
2023-01-19 10:17 ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5 Paul Menzel
2023-01-19 10:22   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance " Paul Menzel
2023-01-19 12:24   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
2023-01-19 16:58     ` Bartek Kois
2023-01-19 17:09       ` Paul Menzel
2023-01-19 17:17         ` Bartek Kois
2023-01-22 20:28           ` Paul Menzel
2023-01-23 18:38             ` Bartek Kois
2023-01-23 18:53               ` Paul Menzel
2023-01-23 18:58                 ` Bartek Kois
2023-01-23 19:03                   ` Paul Menzel
2023-01-24  9:33                     ` Linux kernel regression tracking (Thorsten Leemhuis)
2023-01-24  9:40                       ` Bartek Kois
2023-03-23 13:46                         ` Linux regression tracking (Thorsten Leemhuis)
