Re: high number of dropped packets/rx_missed_errors from 4.17 kernel

From: Andrei Popa <andreipopad@gmail.com>
To: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	linux-kernel@vger.kernel.org, peterz@infradead.org,
	Linux PM <linux-pm@vger.kernel.org>
Subject: Re: high number of dropped packets/rx_missed_errors from 4.17 kernel
Date: Fri, 4 Dec 2020 08:45:52 +0200
Message-ID: <5F92AF26-B9EB-4456-B135-39BF92396F6E@gmail.com>
In-Reply-To: <2E1DF9B2-0CE3-4C4E-8803-0DC145BFE530@gmail.com>

Hi,

I’ve applied your patch on kernel 4.17.0. Dropped packets and rx_missed_errors are still present, though they are increasing at a much lower rate.

root@shaper:~# ./test
     rx_missed_errors: 2135
        RX errors 0  dropped 2155  overruns 0  frame 0
sleeping 60 seconds
     rx_missed_errors: 2433
        RX errors 0  dropped 2459  overruns 0  frame 0
sleeping 60 seconds
     rx_missed_errors: 2433
        RX errors 0  dropped 2465  overruns 0  frame 0
sleeping 60 seconds
     rx_missed_errors: 2526
        RX errors 0  dropped 2564  overruns 0  frame 0
sleeping 60 seconds
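
To put a number on "lower rate", here is a minimal sketch that prints the increase between consecutive counter readings. The helper name counter_deltas is hypothetical and is not part of the test script; the readings are simply the rx_missed_errors values copied from the output above.

```shell
#!/bin/bash
# Hypothetical helper (not part of ./test): print the difference
# between each counter reading and the one before it.
counter_deltas() {
    local prev=$1
    shift
    local cur
    for cur in "$@"; do
        echo $((cur - prev))
        prev=$cur
    done
}

# rx_missed_errors readings from the output above, 60 seconds apart
counter_deltas 2135 2433 2433 2526
```

This prints 298, 0 and 93, i.e. at most a few hundred misses per 60-second interval, compared with roughly 4,000 to 7,000 per one-second sample in the original 4.17 output quoted below.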

> On 3 Dec 2020, at 21:43, Andrei Popa <andreipopad@gmail.com> wrote:
> 
> Hi,
> 
> On what kernel version should I try the patch? I tried on 5.9 and it doesn't build.
> 
>> On 18 Nov 2020, at 20:47, Rafael J. Wysocki <rjw@rjwysocki.net> wrote:
>> 
>> On Tuesday, November 17, 2020 7:31:29 PM CET Rafael J. Wysocki wrote:
>>> On 11/16/2020 8:11 AM, Andrei Popa wrote:
>>>> Hello,
>>>> 
>>>> After an update from vmlinuz-4.15.0-106-generic to vmlinuz-5.4.0-37-generic we experience, on a number of servers, a very high number of rx_missed_errors and dropped packets only on the uplink 10G interface. We have another 10G downlink interface with no problems.
>>>> 
>>>> The affected servers have the following mainboards:
>>>> S5520HC ver E26045-455
>>>> S5520UR ver E22554-751
>>>> S5520UR ver E22554-753
>>>> S5000VSA
>>>> 
>>>> On 30 other servers with similar mainboards and/or configs there are no dropped packets with vmlinuz-5.4.0-37-generic.
>>>> 
>>>> We’ve installed vanilla 4.16 and there were no dropped packets.
>>>> Vanilla 4.17 had a very high number of dropped packets like the following:
>>>> 
>>>> root@shaper:~# cat test
>>>> #!/bin/bash
>>>> while true
>>>> do
>>>> ethtool -S ens6f1|grep "missed_errors"
>>>> ifconfig ens6f1|grep RX|grep dropped
>>>> sleep 1
>>>> done
>>>> 
>>>> root@shaper:~# ./test
>>>>     rx_missed_errors: 2418845
>>>>        RX errors 0  dropped 2418888  overruns 0  frame 0
>>>>     rx_missed_errors: 2426175
>>>>        RX errors 0  dropped 2426218  overruns 0  frame 0
>>>>     rx_missed_errors: 2431910
>>>>        RX errors 0  dropped 2431953  overruns 0  frame 0
>>>>     rx_missed_errors: 2437266
>>>>        RX errors 0  dropped 2437309  overruns 0  frame 0
>>>>     rx_missed_errors: 2443305
>>>>        RX errors 0  dropped 2443348  overruns 0  frame 0
>>>>     rx_missed_errors: 2448357
>>>>        RX errors 0  dropped 2448400  overruns 0  frame 0
>>>>     rx_missed_errors: 2452539
>>>>        RX errors 0  dropped 2452582  overruns 0  frame 0
>>>> 
>>>> We did a git bisect and we’ve found that the following commit generates the high number of dropped packets:
>>>> 
>>>> Author: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>> Date:   Thu Apr 5 19:12:43 2018 +0200
>>>>    cpuidle: menu: Avoid selecting shallow states with stopped tick
>>>>    If the scheduler tick has been stopped already and the governor
>>>>    selects a shallow idle state, the CPU can spend a long time in that
>>>>    state if the selection is based on an inaccurate prediction of idle
>>>>    time.  That effect turns out to be relevant, so it needs to be
>>>>    mitigated.
>>>>    To that end, modify the menu governor to discard the result of the
>>>>    idle time prediction if the tick is stopped and the predicted idle
>>>>    time is less than the tick period length, unless the tick timer is
>>>>    going to expire soon.
>>>>    Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
>>>>    Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
>>>> diff --git a/drivers/cpuidle/governors/menu.c b/drivers/cpuidle/governors/menu.c
>>>> index 267982e471e0..1bfe03ceb236 100644
>>>> --- a/drivers/cpuidle/governors/menu.c
>>>> +++ b/drivers/cpuidle/governors/menu.c
>>>> @@ -352,13 +352,28 @@ static int menu_select(struct cpuidle_driver *drv, struct cpuidle_device *dev,
>>>>         */
>>>>        data->predicted_us = min(data->predicted_us, expected_interval);
>>>> -       /*
>>>> -        * Use the performance multiplier and the user-configurable
>>>> -        * latency_req to determine the maximum exit latency.
>>>> -        */
>>>> -       interactivity_req = data->predicted_us / performance_multiplier(nr_iowaiters, cpu_load);
>>>> -       if (latency_req > interactivity_req)
>>>> -               latency_req = interactivity_req;
>>> 
>>> The tick_nohz_tick_stopped() check may be done after the above and it 
>>> may be reworked a bit.
>>> 
>>> I'll send a test patch to you shortly.
>> 
>> The patch is appended, but please note that it has been rebased by hand and
>> not tested.
>> 
>> Please let me know if it makes any difference.
>> 
>> And in the future please avoid pasting the entire kernel config to your
>> reports, that's problematic.
>> 
>> ---
>> drivers/cpuidle/governors/menu.c |   23 ++++++++++++-----------
>> 1 file changed, 12 insertions(+), 11 deletions(-)
>> 
>> Index: linux-pm/drivers/cpuidle/governors/menu.c
>> ===================================================================
>> --- linux-pm.orig/drivers/cpuidle/governors/menu.c
>> +++ linux-pm/drivers/cpuidle/governors/menu.c
>> @@ -308,18 +308,18 @@ static int menu_select(struct cpuidle_dr
>> 				get_typical_interval(data, predicted_us)) *
>> 				NSEC_PER_USEC;
>> 
>> -	if (tick_nohz_tick_stopped()) {
>> -		/*
>> -		 * If the tick is already stopped, the cost of possible short
>> -		 * idle duration misprediction is much higher, because the CPU
>> -		 * may be stuck in a shallow idle state for a long time as a
>> -		 * result of it.  In that case say we might mispredict and use
>> -		 * the known time till the closest timer event for the idle
>> -		 * state selection.
>> -		 */
>> -		if (data->predicted_us < TICK_USEC)
>> -			data->predicted_us = min_t(unsigned int, TICK_USEC,
>> -						   ktime_to_us(delta_next));
>> +	/*
>> +	 * If the tick is already stopped, the cost of possible short idle
>> +	 * duration misprediction is much higher, because the CPU may be stuck
>> +	 * in a shallow idle state for a long time as a result of it.  In that
>> +	 * case, say we might mispredict and use the known time till the closest
>> +	 * timer event for the idle state selection, unless that event is going
>> +	 * to occur within the tick time frame (in which case the CPU will be
>> +	 * woken up from whatever idle state it gets into soon enough anyway).
>> +	 */
>> +	if (tick_nohz_tick_stopped() && data->predicted_us < TICK_USEC &&
>> +	    delta_next >= TICK_NSEC) {
>> +		data->predicted_us = ktime_to_us(delta_next);
>> 	} else {
>> 		/*
>> 		 * Use the performance multiplier and the user-configurable
>