intel-wired-lan.lists.osuosl.org archive mirror
From: Bartek Kois <bartek.kois@gmail.com>
To: Linux regressions mailing list <regressions@lists.linux.dev>,
	Paul Menzel <pmenzel@molgen.mpg.de>
Cc: intel-wired-lan@osuosl.org
Subject: Re: [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5
Date: Tue, 24 Jan 2023 10:40:55 +0100	[thread overview]
Message-ID: <ead1e9eb-944d-fa9b-e8ff-c087f8718c47@gmail.com> (raw)
In-Reply-To: <7d1347f4-4cf0-e8a8-000e-9128933181b9@leemhuis.info>


On 24.01.2023 at 10:33, Linux kernel regression tracking (Thorsten 
Leemhuis) wrote:
> On 23.01.23 20:03, Paul Menzel wrote:
>> On 23.01.23 at 19:58, Bartek Kois wrote:
>>
>>> On 23.01.2023 at 19:53, Paul Menzel wrote:
>>>> On 23.01.23 at 19:38, Bartek Kois wrote:
>>>>> On 22.01.2023 at 21:28, Paul Menzel wrote:
>>>>>> Dear Bartek,
>>>>>>
>>>>>>
>>>>>> On 19.01.23 at 18:17, Bartek Kois wrote:
>>>>>>> On 19.01.2023 at 18:09, Paul Menzel wrote:
>>>>>>>> On 19.01.23 at 17:58, Bartek Kois wrote:
>>>>>>>>> On 19.01.2023 at 13:24, Bartek Kois wrote:
>>>>>>>>>> On 19.01.2023 at 11:17, Paul Menzel wrote:
>>>>>>>>>>> #regzbot ^introduced: 4.9.88..5.10.149
>>>>>>>>>>> On 14.01.23 at 11:23, Bartek Kois wrote:
>>>>>>>>>>>
>>>>>>>>>>>> After moving from Debian 9.7 to 11.5 as soon as I perform "ip
>>>>>>>>>>>> link set enp1s0 up" for my 10G adapter (AOC-STGN-I1S - Intel
>>>>>>>>>>>> 82599EN based 10G adapter) I am experiencing high cpu load
>>>>>>>>>>>> (even if no traffic is passing through the adapter) and
>>>>>>>>>>>> network performance is low (when network is connected).
>>>>>>>>>>> How do you test the network performance? Please give exact
>>>>>>>>>>> numbers for comparison.
>>>>>>>>>>>
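
One way to put exact numbers on the idle load is sketched below. It reads the 1-minute load average and the NET_RX softirq total before and after a short interval. The helper names `load1` and `net_rx_total` and the `INTERVAL` variable are made up for this illustration; only the `/proc/loadavg` and `/proc/softirqs` files are standard.

```shell
#!/bin/sh
# Print the 1-minute load average from a loadavg-style file.
load1() {
    cut -d' ' -f1 "${1:-/proc/loadavg}"
}

# Print the NET_RX softirq count summed over all CPUs from a softirqs-style file.
net_rx_total() {
    awk '$1 == "NET_RX:" { for (i = 2; i <= NF; i++) s += $i }
         END { print s + 0 }' "${1:-/proc/softirqs}"
}

# Sample once, wait INTERVAL seconds (hypothetical knob, default 1), sample again.
if [ -r /proc/loadavg ] && [ -r /proc/softirqs ]; then
    before=$(net_rx_total)
    sleep "${INTERVAL:-1}"
    after=$(net_rx_total)
    echo "load1=$(load1) net_rx_delta=$((after - before))"
fi
```

Running this once with the link down and once after `ip link set enp1s0 up`, on both kernels, would give directly comparable figures.
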
>>>>>>>>>> I am using this server as a router for my subscribers with
>>>>>>>>>> iptables (for NAT and firewall) and hfsc (for QoS). I first
>>>>>>>>>> encountered this problem while migrating from Debian 9.7 to
>>>>>>>>>> 11.5. Routers based on Supermicro X11SSL-F (Intel® C232
>>>>>>>>>> chipset) work with no problems after that migration, but
>>>>>>>>>> routers based on Supermicro X9SCL (Intel C202 PCH) and
>>>>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH) start behaving
>>>>>>>>>> strangely, with high CPU load (0.5-0.8, where before it was
>>>>>>>>>> around 0.0-0.1) and subscribers not being able to utilize their
>>>>>>>>>> plans. I tried to strip down the problem and ended up with a
>>>>>>>>>> clean system with no iptables or hfsc rules that behaves the
>>>>>>>>>> same (higher load) right after setting the 10G link up, even
>>>>>>>>>> if no traffic is passing through.
>>>>>>>>>>
>>>>>>>>>>>> The CPU load oscillates between 0.1 and 0.3 on a vanilla
>>>>>>>>>>>> system with no network attached. The problem can be observed
>>>>>>>>>>>> on the following platforms: Supermicro X9SCL (Intel C202 PCH)
>>>>>>>>>>>> and Supermicro X10SLL+-F (Intel C222 Express PCH), but on the
>>>>>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset) everything is
>>>>>>>>>>>> working well.
>>>>>>>>>>>>
>>>>>>>>>>>> Tested environments:
>>>>>>>>>>>> Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>>>>>>> 4.9.88-1+deb9u1 (2018-05-07) x86_64 GNU/Linux [all platforms
>>>>>>>>>>>> working well with no problems: Supermicro X9SCL (Intel C202
>>>>>>>>>>>> PCH), Supermicro X10SLL+-F (Intel C222 Express PCH),
>>>>>>>>>>>> Supermicro X11SSL-F (Intel® C232 chipset)]
>>>>>>>>>>>> Debian 11.5 - Linux 5.10.0-19-amd64 #1 SMP Debian 5.10.149-2
>>>>>>>>>>>> (2022-10-21) x86_64 GNU/Linux [older platforms: Supermicro
>>>>>>>>>>>> X9SCL (Intel C202 PCH), Supermicro X10SLL+-F (Intel C222
>>>>>>>>>>>> Express PCH) show the problem described above | newer
>>>>>>>>>>>> platform: Supermicro X11SSL-F (Intel® C232 chipset) working
>>>>>>>>>>>> well with no problems]
>>>>>>>>>>> Maybe create a bug at the Linux kernel bug tracker [1], where
>>>>>>>>>>> you can attach all the logs (`dmesg`, `lspci -nnk -s …`, …).
>>>>>>>>>>>
>>>>>>>>>> I've already reported that to the Debian team at
>>>>>>>>>> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1024763, but
>>>>>>>>>> so far nobody has taken care of this issue.
>>>>>>>>>>
>>>>>>>>>>>> So far, to solve the problem, I have tried upgrading the
>>>>>>>>>>>> system to the newest stable version, upgrading the kernel to
>>>>>>>>>>>> version 6.x, and upgrading the ixgbe driver to the newest
>>>>>>>>>>>> version, but with no luck.
>>>>>>>>>>> Thank you for checking that. Too bad it’s still present. To
>>>>>>>>>>> rule out some user space problem, could you test Debian 9.7
>>>>>>>>>>> with a stable Linux release, currently 6.1.7?
>>>>>>>>>>>
>>>>>>>>>>> What does `sudo perf top --sort comm,dso` show, where the time
>>>>>>>>>>> is spent?
>>>>>>>>>> During my first test in a real environment with subscribers, I
>>>>>>>>>> gathered the following data with perf:
>>>>>>>>>>
>>>>>>>>>>    27.83%  [kernel]                   [k] strncpy
>>>>>>>>>>    14.80%  [kernel]                   [k] nft_do_chain
>>>>>>>>>>     7.61%  [kernel]                   [k] memcmp
>>>>>>>>>>     5.63%  [kernel]                   [k] nft_meta_get_eval
>>>>>>>>>>     3.14%  [kernel]                   [k] nft_cmp_eval
>>>>>>>>>>     2.79%  [kernel]                   [k] asm_exc_nmi
>>>>>>>>>>     1.07%  [kernel]                   [k] module_get_kallsym
>>>>>>>>>>     0.92%  [kernel]                   [k] kallsyms_expand_symbol.constprop.0
>>>>>>>>>>     0.85%  [kernel]                   [k] ixgbe_poll
>>>>>>>>>>     0.75%  [kernel]                   [k] format_decode
>>>>>>>>>>     0.61%  [kernel]                   [k] number
>>>>>>>>>>     0.56%  [kernel]                   [k] menu_select
>>>>>>>>>>     0.54%  [kernel]                   [k] clflush_cache_range
>>>>>>>>>>     0.52%  [kernel]                   [k] cpuidle_enter_state
>>>>>>>>>>     0.51%  [kernel]                   [k] vsnprintf
>>>>>>>>>>     0.50%  [kernel]                   [k] u32_classify
>>>>>>>>>>     0.49%  [kernel]                   [k] fib_table_lookup
>>>>>>>>>>     0.40%  [kernel]                   [k] dma_pte_clear_level
>>>>>>>>>>     0.39%  [kernel]                   [k] domain_mapping
>>>>>>>>>>     0.36%  [kernel]                   [k] ixgbe_xmit_fram
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>      PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
>>>>>>>>>>       18 root      20   0       0      0      0 S  28.2  0.0   7:06.27 ksoftirqd/1
>>>>>>>>>>       12 root      20   0       0      0      0 R  12.0  0.0   4:10.88 ksoftirqd/0
>>>>>>>> […]
>>>>>>>>
>>>>>>>> Do you see different behavior in `/proc/interrupts`?
>>>>>>>>
>>>>>>> This is what it looks like for Debian 11.5 - Linux 5.10.0-19-amd64
>>>>>>> #1 SMP Debian 5.10.149-2 (2022-10-21) x86_64 GNU/Linux on
>>>>>>> Supermicro X10SLL+-F (Intel C222 Express PCH):
>>>>>>>
>>>>>>>         1 root      20   0  163948  10288   7696 S   0.0  0.1 0:39.58 systemd
>>>>>> […]
>>>>>>
>>>>>> The content of `/proc/interrupts` has a different format on my system.
>>>>>>
>>>>>> ```
>>>>>> $ head -3 /proc/interrupts
>>>>>>             CPU0       CPU1       CPU2       CPU3
>>>>>>    1:      55560          0        113          0  IR-IO-APIC 1-edge i8042
>>>>>>    8:          0          0          0          0  IR-IO-APIC 8-edge rtc0
>>>>>> ```
>>>>>> […]
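
To compare interrupt behavior between the two kernels, the per-CPU counters in `/proc/interrupts` can be summed for one device. The following is a minimal sketch, not a definitive tool: the function name `irq_total` is made up here, and the demo input is just the two lines quoted above.

```shell
#!/bin/sh
# Sum the per-CPU counters of every /proc/interrupts line whose last field
# matches the given pattern (for example an interface name such as enp1s0).
irq_total() {
    awk -v pat="$1" '
        $NF ~ pat {
            # counters follow the "NN:" label; stop at the first non-numeric field
            for (i = 2; i <= NF && $i ~ /^[0-9]+$/; i++) sum += $i
        }
        END { print sum + 0 }
    ' "${2:-/proc/interrupts}"
}

# Demo on the two lines quoted above, written to a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
            CPU0       CPU1       CPU2       CPU3
   1:      55560          0        113          0  IR-IO-APIC 1-edge i8042
   8:          0          0          0          0  IR-IO-APIC 8-edge rtc0
EOF
irq_total i8042 "$tmp"    # prints 55673
rm -f "$tmp"
```

On the affected machine, calling `irq_total enp1s0` twice a few seconds apart and subtracting the results would give an interrupt rate that can be compared between the 4.9 and 5.10 kernels.
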
>>>>>>
>>>>>>> and for Debian 9.7 - Linux 4.9.0-6-amd64 #1 SMP Debian
>>>>>>> 4.9.88-1+deb9u1 on Supermicro X10SLL+-F (Intel C222 Express PCH)
>>>>>>>
>>>>>>> 31659 root      20   0       0      0      0 S   0.3  0.0 0:00.92 kworker/7:0
>>>>>>>       1 root      20   0   57032   6736   5256 S   0.0  0.1 2:28.14 systemd
>>>>>> […]
>>>>>>>>>>>> Supermicro support suggested the following:
>>>>>>>>>>>> it might be kernel related. Debian 11.5 has kernel 5.10, which
>>>>>>>>>>>> is a recent kernel; it might not properly support the chipsets
>>>>>>>>>>>> for X9, therefore I suggest using RHEL or CentOS as they use
>>>>>>>>>>>> much older kernel versions. I expect that with Ubuntu 20.04
>>>>>>>>>>>> you see the same problem; it uses kernel 5.4.
>>>>>>>>>>> Testing another GNU/Linux distribution for another data
>>>>>>>>>>> point might be a good idea.
>>>>>>>>>>>
>>>>>>>>>>> As nobody has responded yet, bisecting the issue is probably
>>>>>>>>>>> the fastest way to get to the bottom of this. Luckily the
>>>>>>>>>>> problem seems reproducible and you seem to be able to build a
>>>>>>>>>>> Linux kernel yourself, so that should work. (For testing
>>>>>>>>>>> purposes you could also test with Ubuntu, as they provide
>>>>>>>>>>> Linux kernel builds for (almost) all releases in their Linux
>>>>>>>>>>> kernel mainline PPA [2].)
>>>>>>>>>>>
>>>>>>>>>> Of course, I can try Ubuntu and report how it works.
>>>>>>>>>>
>>>>>>>>> Ubuntu (5.15.0-43-generic) seems to behave the same way,
>>>>>>>>> generating higher load after executing "ip link set enp1s0 up".
>>>>>>>> That is good to know. (Is this Ubuntu 22.04?) What about Ubuntu
>>>>>>>> 20.04 with Linux 5.4, and Ubuntu 18.04 with 4.15?
>>>>>>>>
>>>>>>>> Anyway, I think you won’t get around bisecting. Another hint:
>>>>>>>> make sure that you can build a 4.9 Linux kernel yourself that
>>>>>>>> does not exhibit that issue.
>>>>>>>>
>>>>>>> That's right, it is 22.04. I don't have to build it. The standard
>>>>>>> kernel Linux 4.9.0-6-amd64 from Debian 9.7 worked without problems
>>>>>>> for the past 4 years.
>>>>>> If none of the developers/maintainers is going to step up, you
>>>>>> are on your own. Again, as you can reproduce this easily, the
>>>>>> fastest way is to bisect the issue, which you can do on your own.
>>>>> How can I investigate that further?
>>>> I repeat myself, please bisect the issue. It’s the fastest way.
>>>>
>>>>> I thought about changing some of the parameters related to the
>>>>> ixgbe driver and observing whether anything changes, but when I
>>>>> try to run:
>>>>>
>>>>> sudo modprobe ixgbe IntMode=0
>>>>>
>>>>> I get the following error in the dmesg:
>>>>>
>>>>> [ 2137.324772] ixgbe: unknown parameter 'IntMode' ignored <<<<<<<<<
>>>> […]
>>>>
>>>> `modinfo ixgbe` shows the supported parameters.
>>>> PS: If you need help bisecting, please ask. Otherwise, I am out of
>>>> this thread.
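
Before passing a parameter to `modprobe`, it can be checked against what the installed module actually accepts. This is a minimal sketch, assuming the `name:description (type)` line format printed by `modinfo -p`. The helper names `param_names` and `has_param` are made up, and the sample line is invented for the demo rather than taken from real `modinfo` output.

```shell
#!/bin/sh
# Extract parameter names from `modinfo -p`-style lines ("name:description (type)").
param_names() {
    cut -d: -f1
}

# Succeed if the parameter named in $1 appears in the lines read from stdin.
has_param() {
    param_names | grep -qx -- "$1"
}

# Real use (needs the module to be installed):
#   modinfo -p ixgbe | has_param IntMode || echo "IntMode is not supported"

# Self-contained demo with an invented sample line (not real modinfo output).
sample='example_param:Hypothetical parameter for illustration (uint)'
if printf '%s\n' "$sample" | has_param IntMode; then
    echo "IntMode: supported"
else
    echo "IntMode: not supported"
fi
```

A parameter that fails this check would be reported by the kernel as unknown and ignored, as in the dmesg line above.
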
>>> OK, how exactly can I bisect this issue?
>> What have you tried so far? As written in the past, I’d first try more
>> distributions, for example, older Ubuntu versions. Then, if you have
>> some range, I’d use the Ubuntu PPA, and then between the release
>> candidate versions, only then start doing `git bisect` as documented in
>> the documentation [3].
> Hmmm. I'm not an expert in that area, but if you follow Paul's advice
> keep in mind that a deliberate config change by the distro might have an
> impact here. Hence it might be a good idea to rule that out first by
> taking a config from a working kernel and using it (with the help of
> "make olddefconfig") to build your own kernel from the version that is
> known to fail. But over such a wide range of versions this can be
> tricky. :-/
>
> But apart from that Paul is right afaics: nobody has had an idea yet
> what might cause this regression, hence we need a bisection to pinpoint
> the problem.
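
The advice above (reuse a known-good config via "make olddefconfig", then bisect) can be sketched as a dry-run script. Everything specific in it is an assumption for illustration: the tags `v4.9` and `v5.10` stand in for the working and failing kernels, the config path is a guess at where the working Debian kernel's config lives, and the `run`, `bisect_start` and `build_step` helpers are made up. By default it only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of one bisection round. By default it only prints the
# commands; set DRY_RUN=0 inside a Linux kernel git tree to execute them.
DRY_RUN="${DRY_RUN:-1}"
GOOD="${GOOD:-v4.9}"    # assumption: tag closest to the working 4.9.88 kernel
BAD="${BAD:-v5.10}"     # assumption: tag closest to the failing 5.10.149 kernel
OLD_CONFIG="${OLD_CONFIG:-/boot/config-4.9.0-6-amd64}"  # assumed config path

# Print a command, and execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "$DRY_RUN" = "0" ]; then
        "$@" || exit 1
    fi
}

# Tell git which end of the range is bad and which is good.
bisect_start() {
    run git bisect start
    run git bisect bad "$BAD"
    run git bisect good "$GOOD"
}

# Build the commit git checked out, starting from the known-good config.
build_step() {
    run cp "$OLD_CONFIG" .config
    run make olddefconfig
    run make -j"$(getconf _NPROCESSORS_ONLN)" bindeb-pkg
}

bisect_start
build_step
echo "# install the package, reboot, run: ip link set enp1s0 up"
echo "# then mark the result: git bisect good   (or: git bisect bad)"
```

After each mark, git checks out the next commit to test, and only the build step and the manual test are repeated until git names the first bad commit.
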

Thanks for the advice. I'll try my best to find out which commit caused 
the problem, but it will take me some time, as I have never done 
bisecting, especially on that scale. What puzzles me the most is that 
nobody has reported this issue so far, considering that these 
platforms together with Debian and the Intel 82599EN NIC are quite a 
common configuration, I think.

Best regards

Bartek Kois

> Ciao, Thorsten (wearing his 'the Linux kernel's regression tracker' hat)
> --
> Everything you wanna know about Linux kernel regression tracking:
> https://linux-regtracking.leemhuis.info/about/#tldr
> If I did something stupid, please tell me, as explained on that page.
>
>
>
>>>>>>>>>>> [1]: https://bugzilla.kernel.org/
>>>>>>>>>>> [2]: https://kernel.ubuntu.com/~kernel-ppa/mainline/
>> [3]: https://www.kernel.org/doc/html/latest/admin-guide/bug-bisect.html
>>
>>

Thread overview: 17+ messages
2023-01-14 10:23 [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance after moving to Debian 11.5 Bartek Kois
2023-01-19  9:59 ` Bartek Kois
2023-01-19 10:17 ` Paul Menzel
2023-01-19 10:22   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network performance " Paul Menzel
2023-01-19 12:24   ` [Intel-wired-lan] Supermicro AOC-STGN-I1S (Intel 82599EN based 10G adapter) - poor network perfomance " Bartek Kois
2023-01-19 16:58     ` Bartek Kois
2023-01-19 17:09       ` Paul Menzel
2023-01-19 17:17         ` Bartek Kois
2023-01-22 20:28           ` Paul Menzel
2023-01-23 18:38             ` Bartek Kois
2023-01-23 18:53               ` Paul Menzel
2023-01-23 18:58                 ` Bartek Kois
2023-01-23 19:03                   ` Paul Menzel
2023-01-24  9:33                     ` Linux kernel regression tracking (Thorsten Leemhuis)
2023-01-24  9:40                       ` Bartek Kois [this message]
2023-03-23 13:46                         ` Linux regression tracking (Thorsten Leemhuis)
  -- strict thread matches above, loose matches on Subject: below --
2023-01-04  8:39 Bartek Kois
