Linux-XFS Archive on lore.kernel.org
From: Dave Chinner <david@fromorbit.com>
To: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Phil Auld <pauld@redhat.com>, Ming Lei <ming.lei@redhat.com>,
	linux-block@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-xfs@vger.kernel.org, linux-kernel@vger.kernel.org,
	Jeff Moyer <jmoyer@redhat.com>,
	Dave Chinner <dchinner@redhat.com>,
	Eric Sandeen <sandeen@redhat.com>, Christoph Hellwig <hch@lst.de>,
	Jens Axboe <axboe@kernel.dk>, Ingo Molnar <mingo@redhat.com>,
	Tejun Heo <tj@kernel.org>,
	Vincent Guittot <vincent.guittot@linaro.org>
Subject: Re: [PATCH v4] sched/core: Preempt current task in favour of bound kthread
Date: Thu, 12 Dec 2019 09:46:17 +1100
Message-ID: <20191211224617.GE19256@dread.disaster.area>
In-Reply-To: <20191211173829.GB21797@linux.vnet.ibm.com>

On Wed, Dec 11, 2019 at 11:08:29PM +0530, Srikar Dronamraju wrote:
> A running task can wake-up a per CPU bound kthread on the same CPU.
> If the current running task doesn't yield the CPU before the next load
> balance operation, the scheduler would detect load imbalance and try to
> balance the load. However this load balance would fail as the waiting
> task is CPU bound, while the running task cannot be moved by the regular
> load balancer. Finally the active load balancer would kick in and move
> the task to a different CPU/Core. Moving the task to a different
> CPU/core can lead to loss in cache affinity leading to poor performance.
> 
> This is more prone to happen if the current running task is CPU
> intensive and the sched_wake_up_granularity is set to larger value.
> When the sched_wake_up_granularity was relatively small, it was observed
> that the bound thread would complete before the load balancer would have
> chosen to move the cache hot task to a different CPU.
> 
> To deal with this situation, the current running task would yield to a
> per CPU bound kthread, provided kthread is not CPU intensive.

So a question for you here: when does the workqueue worker pre-empt
the currently running task? Is it immediately? Or when a time-slice
of the currently running task runs out?

We don't want queued work immediately pre-empting the task that
queued the work - the queued work is *deferred* work that should be
run _soon_ but we want the currently running task to finish what it
is doing first if possible. i.e. these are not synchronous wakeups,
and so we shouldn't schedule kworker threads as though they are sync
wakeups. That will affect batch processing efficiency and reduce
throughput because it will greatly increase the number of
unnecessary context switches during IO completion processing....
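
To illustrate the batching concern, here is a toy single-CPU model
(a sketch, not kernel code). It assumes a task queues a fixed number
of work items to a per-cpu kworker in every timeslice; the function
name and its parameters are made up for this example and do not come
from the patch.

```python
def context_switches(slices, items_per_slice, preempt_on_queue):
    """Count context switches in a toy single-CPU model.

    Assumption (not from the patch): the running task queues
    `items_per_slice` work items to a per-cpu kworker in each of
    `slices` timeslices, and nothing else runs on the CPU.
    """
    switches = 0
    for _ in range(slices):
        if preempt_on_queue:
            # Sync-wakeup-like behaviour: every queued item switches to
            # the kworker and then back to the task that queued it.
            switches += 2 * items_per_slice
        elif items_per_slice:
            # Deferred behaviour: the kworker runs once, after the task
            # has finished its slice, and processes the whole batch.
            switches += 2
    return switches


if __name__ == "__main__":
    print("deferred :", context_switches(1000, 8, preempt_on_queue=False))
    print("immediate:", context_switches(1000, 8, preempt_on_queue=True))
```

In this model the number of switches scales with the number of work
items when the kworker pre-empts immediately, and only with the number
of timeslices when the work is deferred.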

> /pboffline/hwcct_prg_old/lib/fsperf -t overwrite --noclean -f 5g -b 4k /pboffline
> 
> (With sched_wake_up_granularity set to 15ms)
> 
> Performance counter stats for 'system wide' (5 runs):
> event                                     v5.4                     v5.4 + patch(v3)
> probe:active_load_balance_cpu_stop       1,919  ( +-  2.89% )                     4  ( +- 20.48% )
> sched:sched_waking                     441,535  ( +-  0.17% )               914,630  ( +-  0.18% )
> sched:sched_wakeup                     441,533  ( +-  0.17% )               914,630  ( +-  0.18% )
> sched:sched_wakeup_new                   2,436  ( +-  8.08% )                   545  ( +-  4.02% )
> sched:sched_switch                     797,007  ( +-  0.26% )             1,490,261  ( +-  0.10% )
> sched:sched_migrate_task                20,998  ( +-  1.04% )                 2,492  ( +- 11.56% )

As we see here, we've nearly doubled the number of context switches
(an increase of about 700,000) just to avoid about 18,500 incorrect
load balancer task migrations.

That seems like we now make 700,000 incorrect decisions instead of
just 20,000. The difference is that the consequence of each of these
many incorrect pre-emption decisions is vastly smaller than the
consequence of a wrong migration decision.
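
The deltas can be checked directly from the counters in the quoted
table. The script below is only a convenience sketch: the values are
copied from that table, and the helper names are invented here.

```python
# Mean counter values copied from the quoted perf table:
# (v5.4, v5.4 + patch(v3))
COUNTERS = {
    "sched:sched_switch": (797_007, 1_490_261),
    "sched:sched_migrate_task": (20_998, 2_492),
    "probe:active_load_balance_cpu_stop": (1_919, 4),
}


def delta(event):
    """Patched value minus baseline value for one event."""
    before, after = COUNTERS[event]
    return after - before


def ratio(event):
    """Patched value divided by baseline value for one event."""
    before, after = COUNTERS[event]
    return after / before


extra_switches = delta("sched:sched_switch")
avoided_migrations = -delta("sched:sched_migrate_task")

if __name__ == "__main__":
    print(f"extra context switches : {extra_switches:,}")
    print(f"avoided migrations     : {avoided_migrations:,}")
    print(f"switch ratio           : {ratio('sched:sched_switch'):.2f}x")
    print(f"switches per migration : {extra_switches / avoided_migrations:.1f}")
```

The last figure is the number of additional context switches paid for
each task migration that the patch avoids in this run.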

It seems to me that we should be checking this is_per_cpu_kthread()
state for tasks queued on the runqueue during active load balancing,
rather than at wakeup time.  i.e. in these cases we don't migrate
the running task, we just let it run out its timeslice, and the
local per-cpu kthreads then run appropriately.

AFAICT this would have the same effect of avoiding unnecessary task
migrations in this workload, but without causing a global change to
the way workqueue kworkers are scheduled that has the potential to
cause regressions in other workqueue intensive workloads....
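
A toy model of that idea, to make the proposed decision concrete. This
is a sketch under stated assumptions, not scheduler code: the Task and
RunQueue classes and active_balance_wanted() are hypothetical, and
is_per_cpu_kthread() here only mimics the kernel helper of the same
name (a kernel thread allowed on exactly one CPU).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Task:
    name: str
    is_kthread: bool = False   # stands in for the PF_KTHREAD flag
    nr_cpus_allowed: int = 8   # number of CPUs the task may run on


def is_per_cpu_kthread(task: Task) -> bool:
    # A kernel thread that is bound to exactly one CPU.
    return task.is_kthread and task.nr_cpus_allowed == 1


@dataclass
class RunQueue:
    current: Task
    queued: List[Task] = field(default_factory=list)


def active_balance_wanted(rq: RunQueue) -> bool:
    """Hypothetical policy check made at active load balance time.

    If every waiting task is a per-cpu kthread, do not migrate the
    running task: the kthreads cannot move anyway and will run once
    the current timeslice ends. Otherwise leave the decision to the
    existing logic (modelled here as simply returning True).
    """
    if not rq.queued:
        return False  # nothing is waiting, so there is nothing to balance
    if all(is_per_cpu_kthread(t) for t in rq.queued):
        return False  # only bound kthreads are waiting; keep current here
    return True


if __name__ == "__main__":
    io_task = Task("fio")
    kworker = Task("kworker/0:1", is_kthread=True, nr_cpus_allowed=1)
    print(active_balance_wanted(RunQueue(io_task, [kworker])))
```

Note that nothing changes at wakeup time in this model; the check only
affects whether the running task gets pushed to another CPU.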

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
