From: Mel Gorman <mgorman@techsingularity.net>
To: Peter Zijlstra <peterz@infradead.org>
Cc: kernel test robot <oliver.sang@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	x86@kernel.org, lkp@lists.01.org, lkp@intel.com,
	ying.huang@intel.com, feng.tang@intel.com,
	zhengjun.xing@linux.intel.com, aubrey.li@linux.intel.com,
	yu.c.chen@intel.com
Subject: Re: [sched/fair] 56498cfb04: netperf.Throughput_tps -5.4% regression
Date: Wed, 22 Sep 2021 14:42:47 +0100
Message-ID: <20210922134247.GY3959@techsingularity.net>
In-Reply-To: <20210922124400.GQ4323@worktop.programming.kicks-ass.net>

On Wed, Sep 22, 2021 at 02:44:00PM +0200, Peter Zijlstra wrote:
> On Sun, Sep 12, 2021 at 11:34:47PM +0800, kernel test robot wrote:
> > 
> > 
> > Greeting,
> > 
> > FYI, we noticed a -5.4% regression of netperf.Throughput_tps due to commit:
> > 
> > 
> > commit: 56498cfb045d7147cdcba33795d19429afcd1d00 ("sched/fair: Avoid a second scan of target in select_idle_cpu")
> > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> Mel, was this in line with your own benchmarks?

UDP_RR was not run, but I could add it for future reference as a
socket-based ping-pong test. However, it would not be equivalent to lkp
as I only run one client/server.

For UDP_STREAM with a single client, the significant differences
reported were:

machine1:	+1.07% to +1.54% depending on packet size
machine2:	-1.4%  to +0.9%
machine3:	+1.5%  to -2.46%
machine4:	+1.16% to +1.64%
machine5:	-1.59% to +1.23%
machine6:	-2.10% to +1.83%

So it was a mix of small gains and some regressions, with more gains
than losses. As netperf is running on localhost, it can be a bit
unreliable, and other workloads showed more gains than losses. On
machine 2, total system CPU usage went from 1195.21 seconds to 1197.52
seconds, but activities like context switches and interrupt deliveries
were broadly similar. There were differences in the total number of slab
pages used, but the trends were roughly similar and probably reflect the
system starting state more than anything else.

On balance, I concluded that rescanning target is wasteful and that while
there might be slight variances, they would be difficult to consistently
reproduce. The largest concern is that skipping target means that one
additional new rq is potentially examined. That would incur a small
penalty if it was a wasteful search.

For the LKP test, nr_threads is 50%, so I expect that with two sockets
the machine is fully loaded and would be vulnerable to load-balancing
artifacts as client and server threads move around. Hence, I ended up
thinking that this result was likely a false positive.

-- 
Mel Gorman
SUSE Labs
