Linux-XFS Archive on lore.kernel.org
 help / color / Atom feed
From: Vincent Guittot <vincent.guittot@linaro.org>
To: Phil Auld <pauld@redhat.com>
Cc: Dave Chinner <david@fromorbit.com>,
	Ming Lei <ming.lei@redhat.com>, Hillf Danton <hdanton@sina.com>,
	linux-block <linux-block@vger.kernel.org>,
	linux-fs <linux-fsdevel@vger.kernel.org>,
	linux-xfs <linux-xfs@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Christoph Hellwig <hch@lst.de>, Jens Axboe <axboe@kernel.dk>,
	Peter Zijlstra <peterz@infradead.org>,
	Rong Chen <rong.a.chen@intel.com>, Tejun Heo <tj@kernel.org>
Subject: Re: single aio thread is migrated crazily by scheduler
Date: Tue, 3 Dec 2019 10:45:38 +0100
Message-ID: <CAKfTPtC7uycC3b2ngOFUqOh9-Fcz7h-151aaYJbLJFXrNq-gkw@mail.gmail.com> (raw)
In-Reply-To: <20191202212210.GA32767@lorien.usersys.redhat.com>

On Mon, 2 Dec 2019 at 22:22, Phil Auld <pauld@redhat.com> wrote:
>
> Hi Vincent,
>
> On Mon, Dec 02, 2019 at 02:45:42PM +0100 Vincent Guittot wrote:
> > On Mon, 2 Dec 2019 at 05:02, Dave Chinner <david@fromorbit.com> wrote:
>
> ...
>
> > > So, we can fiddle with workqueues, but it doesn't address the
> > > underlying issue that the scheduler appears to be migrating
> > > non-bound tasks off a busy CPU too easily....
> >
> > The root cause of the problem is that the sched_wakeup_granularity_ns
> > is in the same range or higher than load balance period. As Peter
> > explained, This make the kworker waiting for the CPU for several load
> > period and a transient unbalanced state becomes a stable one that the
> > scheduler to fix. With default value, the scheduler doesn't try to
> > migrate any task.
>
> There are actually two issues here.   With the high wakeup granularity
> we get the user task actively migrated. This causes the significant
> performance hit Ming was showing. With the fast wakeup_granularity
> (or smaller IOs - 512 instead of 4k) we get, instead, the user task
> migrated at wakeup to a new CPU for every IO completion.

Ok, I haven't noticed that this one was a problem too. Do we have perf
regression ?

>
> This is the 11k migrations per sec doing 11k iops.  In this test it
> is not by itself causing the measured performance issue. It generally
> flips back and forth between 2 cpus for large periods. I think it is
> crossing cache boundaries at times (but I have not looked closely
> at the traces compared to the topology, yet).

At task wake up, scheduler compares local and previous CPU to decide
where to place the task and will then try to find an idle one which
shares cache so I don't expect that it will cross cache boundary as
local and previous are in your case.

>
> The active balances are what really hurts in thie case but I agree
> that seems to be a tuning problem.
>
>
> Cheers,
> Phil
>
>
> >
> > Then, I agree that having an ack close to the request makes sense but
> > forcing it on the exact same CPU is too restrictive IMO. Being able to
> > use another CPU on the same core should not harm the performance and
> > may even improve it. And that may still be the case while CPUs share
> > their cache.
> >
> > >
> > > -Dave.
> > >
> > > [*] Pay attention to the WQ_POWER_EFFICIENT definition for a work
> > > queue: it's designed for interrupt routines that defer work via work
> > > queues to avoid doing work on otherwise idle CPUs. It does this by
> > > turning the per-cpu wq into an unbound wq so that work gets
> > > scheduled on a non-idle CPUs in preference to the local idle CPU
> > > which can then remain in low power states.
> > >
> > > That's the exact opposite of what using WQ_UNBOUND ends up doing in
> > > this IO completion context: it pushes the work out over idle CPUs
> > > rather than keeping them confined on the already busy CPUs where CPU
> > > affinity allows the work to be done quickly. So while WQ_UNBOUND
> > > avoids the user task being migrated frequently, it results in the
> > > work being spread around many more CPUs and we burn more power to do
> > > the same work.
> > >
> > > --
> > > Dave Chinner
> > > david@fromorbit.com
> >
>
> --
>

  reply index

Thread overview: 64+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-14 11:31 Ming Lei
2019-11-14 13:14 ` Peter Zijlstra
2019-11-15  0:09   ` Ming Lei
2019-11-15 14:16     ` Ming Lei
2019-11-14 23:54 ` Dave Chinner
2019-11-15  1:08   ` Ming Lei
2019-11-15  4:56     ` Dave Chinner
2019-11-15  7:08       ` Ming Lei
2019-11-15 23:40         ` Dave Chinner
2019-11-16  6:31           ` Ming Lei
2019-11-18  9:21           ` Peter Zijlstra
2019-11-18 14:54             ` Vincent Guittot
2019-11-18 20:40             ` Dave Chinner
2019-11-20 19:16               ` Peter Zijlstra
2019-11-20 22:03                 ` Phil Auld
2019-11-21  4:12                   ` Ming Lei
2019-11-21 14:12                     ` Phil Auld
2019-11-21 15:02                       ` Boaz Harrosh
2019-11-21 16:19                         ` Jens Axboe
2019-12-09 16:58                           ` Srikar Dronamraju
2019-11-21 22:10                       ` Dave Chinner
2019-11-21 13:29                   ` Peter Zijlstra
2019-11-21 14:21                     ` Phil Auld
2019-12-09 16:51                     ` Srikar Dronamraju
2019-12-09 23:17                       ` Dave Chinner
2019-12-10  3:27                         ` Srikar Dronamraju
2019-12-10  5:43                         ` [PATCH v2] sched/core: Preempt current task in favour of bound kthread Srikar Dronamraju
2019-12-10  9:26                           ` Peter Zijlstra
2019-12-10  9:33                             ` Peter Zijlstra
2019-12-10 10:18                               ` Srikar Dronamraju
2019-12-10 10:16                             ` Srikar Dronamraju
2019-12-10  9:43                           ` Vincent Guittot
2019-12-10 10:11                             ` Srikar Dronamraju
2019-12-10 11:02                               ` Vincent Guittot
2019-12-10 17:23                           ` [PATCH v3] " Srikar Dronamraju
2019-12-11 17:38                             ` [PATCH v4] " Srikar Dronamraju
2019-12-11 22:46                               ` Dave Chinner
2019-12-12 10:10                                 ` Peter Zijlstra
2019-12-12 10:14                                   ` Peter Zijlstra
2019-12-12 10:23                                     ` Peter Zijlstra
2019-12-12 11:20                                       ` Vincent Guittot
2019-12-12 13:12                                         ` Peter Zijlstra
2019-12-12 15:07                                   ` Srikar Dronamraju
2019-12-12 15:15                                     ` Peter Zijlstra
2019-12-13  5:32                                   ` Srikar Dronamraju
2019-11-18 16:26           ` single aio thread is migrated crazily by scheduler Srikar Dronamraju
2019-11-18 21:18             ` Dave Chinner
2019-11-19  8:54               ` Ming Lei
     [not found]         ` <20191128094003.752-1-hdanton@sina.com>
2019-11-28  9:53           ` Vincent Guittot
2019-12-02  2:46             ` Ming Lei
2019-12-02  4:02               ` Dave Chinner
2019-12-02  4:22                 ` Ming Lei
2019-12-02 13:45                 ` Vincent Guittot
2019-12-02 21:22                   ` Phil Auld
2019-12-03  9:45                     ` Vincent Guittot [this message]
2019-12-04 13:50                       ` Ming Lei
2019-12-02 23:53                   ` Dave Chinner
2019-12-03  0:18                     ` Ming Lei
2019-12-03 13:34                     ` Vincent Guittot
2019-12-02  7:39               ` Juri Lelli
2019-12-02  3:08           ` Dave Chinner
     [not found]           ` <20191202090158.15016-1-hdanton@sina.com>
2019-12-02 23:06             ` Dave Chinner
     [not found]             ` <20191203131514.5176-1-hdanton@sina.com>
2019-12-03 22:29               ` Dave Chinner
     [not found]               ` <20191204102903.896-1-hdanton@sina.com>
2019-12-04 22:59                 ` Dave Chinner

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAKfTPtC7uycC3b2ngOFUqOh9-Fcz7h-151aaYJbLJFXrNq-gkw@mail.gmail.com \
    --to=vincent.guittot@linaro.org \
    --cc=axboe@kernel.dk \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=hdanton@sina.com \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=ming.lei@redhat.com \
    --cc=pauld@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rong.a.chen@intel.com \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-XFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-xfs/0 linux-xfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-xfs linux-xfs/ https://lore.kernel.org/linux-xfs \
		linux-xfs@vger.kernel.org
	public-inbox-index linux-xfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-xfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git