All of lore.kernel.org
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@linux-foundation.org>
To: Dave Chinner <david@fromorbit.com>
Cc: Chris Mason <clm@fb.com>, Jan Kara <jack@suse.cz>,
	Josef Bacik <jbacik@fb.com>, LKML <linux-kernel@vger.kernel.org>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Neil Brown <neilb@suse.de>, Christoph Hellwig <hch@lst.de>,
	Tejun Heo <tj@kernel.org>
Subject: Re: [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug()
Date: Thu, 17 Sep 2015 18:50:29 -0700	[thread overview]
Message-ID: <CA+55aFzXW7t+1v3tmW2sxn-BLpvZ1_Ye6epiPWBeq70FoaSmFQ@mail.gmail.com> (raw)
In-Reply-To: <20150918003735.GR3902@dastard>

[-- Attachment #1: Type: text/plain, Size: 2457 bytes --]

On Thu, Sep 17, 2015 at 5:37 PM, Dave Chinner <david@fromorbit.com> wrote:
>> >
>> > I'm not seeing why that should be an issue. Sure, there's some CPU
>> > overhead to context switching, but I don't see that it should be that
>> > big of a deal.
>
> It may well change the dispatch order of enough IOs for it to be
> significant on an IO bound device.

Hmm. Maybe. We obviously try to order the IO's a bit by inode, and I
could see the use of a workqueue maybe changing that sufficiently. But
it still sounds a bit unlikely.

And in fact, I think I have a better explanation.

> In outright performance on my test machine, the difference in
> files/s is noise. However, the consistency looks to be substantially
> improved and the context switch rate is now running at under
> 3,000/sec.

Hmm. I don't recall seeing you mention how many context switches per
second you had before. What is it down from?

However, I think I may have found something more interesting here.

The fact is that a *normal* schedule will trigger that whole
blk_schedule_flush_plug(), but a cond_sched() or a cond_sched_lock()
doesn't actually do a normal schedule at all. Those trigger a
*preemption* schedule.

And a preemption schedule does not trigger that unplugging at all.
Why? A kernel "preemption" very much tries to avoid touching thread
state, because the whole point is that normally we may be preempting
threads in random places, so we don't run things like
sched_submit_work(), because the thread may be in the middle of
*creating* that work, and we don't want to introduce races. The
preemption scheduling can also be done with "task->state" set to
sleeping, and it won't actually sleep.

Now, for the explicit schedules like "cond_resched()" and
"cond_resched_lock()", those races with obviously don't exist, but
they happen to share the same preemption scheduling logic.

So it turns out that as far as I can see, the whole "cond_resched()"
will not start any IO at all, and it will just be left on the thread
plug until we schedule back to the thread.

So I don't think this has anything to do with kblockd_workqueue. I
don't think it even gets to that point.

I may be missing something, but just to humor me, can you test the
attached patch *without* Chris's patch to do explicit plugging? This
should make cond_resched() and cond_resched_lock() run the unplugging.

It may be entirely broken, I haven't thought this entirely through.

                   Linus

[-- Attachment #2: patch.diff --]
[-- Type: text/plain, Size: 633 bytes --]

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 97d276ff1edb..388ea9e7ab8a 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -4548,6 +4548,7 @@ SYSCALL_DEFINE0(sched_yield)
 int __sched _cond_resched(void)
 {
 	if (should_resched(0)) {
+		sched_submit_work(current);
 		preempt_schedule_common();
 		return 1;
 	}
@@ -4572,9 +4573,10 @@ int __cond_resched_lock(spinlock_t *lock)
 
 	if (spin_needbreak(lock) || resched) {
 		spin_unlock(lock);
-		if (resched)
+		if (resched) {
+			sched_submit_work(current);
 			preempt_schedule_common();
-		else
+		} else
 			cpu_relax();
 		ret = 1;
 		spin_lock(lock);

  reply	other threads:[~2015-09-18  1:50 UTC|newest]

Thread overview: 58+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-09-11 19:37 [PATCH] fs-writeback: drop wb->list_lock during blk_finish_plug() Chris Mason
2015-09-11 20:02 ` Linus Torvalds
2015-09-11 20:37   ` Linus Torvalds
2015-09-11 20:40     ` Josef Bacik
2015-09-11 21:04       ` Linus Torvalds
2015-09-11 22:06         ` Linus Torvalds
2015-09-11 23:16           ` Chris Mason
2015-09-11 23:36             ` Linus Torvalds
2015-09-12  0:52               ` Linus Torvalds
2015-09-12  2:15                 ` Chris Mason
2015-09-12  2:27                   ` Linus Torvalds
2015-09-12 23:00               ` Chris Mason
2015-09-12 23:29                 ` Linus Torvalds
2015-09-12 23:46                   ` Chris Mason
2015-09-13 13:12                     ` Chris Mason
2015-09-13 22:56                       ` Dave Chinner
2015-09-13 23:12                 ` Dave Chinner
2015-09-14 20:06                   ` Linus Torvalds
2015-09-16 15:16                     ` Chris Mason
2015-09-16 19:58                       ` Jan Kara
2015-09-16 20:00                         ` Chris Mason
2015-09-16 22:07                           ` Dave Chinner
2015-09-17  0:37                             ` Dave Chinner
2015-09-17  1:12                               ` Linus Torvalds
2015-09-17  2:14                                 ` Dave Chinner
2015-09-17 19:39                                   ` Linus Torvalds
2015-09-17 22:42                                     ` Chris Mason
2015-09-17 23:08                                       ` Linus Torvalds
2015-09-17 23:56                                         ` Chris Mason
2015-09-18  0:37                                           ` Dave Chinner
2015-09-18  1:50                                             ` Linus Torvalds [this message]
2015-09-18  5:40                                               ` Dave Chinner
2015-09-18  6:04                                                 ` Linus Torvalds
2015-09-18  6:06                                                   ` Linus Torvalds
2015-09-18 14:21                                                     ` Jens Axboe
2015-09-18 13:16                                                   ` Chris Mason
2015-09-18 14:23                                                     ` Jens Axboe
2015-09-18 15:32                                                       ` Linus Torvalds
2015-09-18 15:59                                                         ` Peter Zijlstra
2015-09-18 16:02                                                           ` Peter Zijlstra
2015-09-18 16:12                                                           ` Linus Torvalds
2015-09-28 14:47                                                             ` Peter Zijlstra
2015-09-28 16:08                                                               ` Linus Torvalds
2015-09-29  7:55                                                                 ` Ingo Molnar
2015-09-18 22:17                                                   ` Dave Chinner
2015-09-21  9:24                                                     ` Jan Kara
2015-09-21  9:24                                                       ` Jan Kara
2015-09-21 20:21                                                       ` Andrew Morton
2015-09-21 20:21                                                         ` Andrew Morton
2015-09-17 23:03                                   ` Dave Chinner
2015-09-17 23:13                                     ` Linus Torvalds
2015-09-17  3:48                               ` Chris Mason
2015-09-17  4:30                                 ` Dave Chinner
2015-09-17 12:13                                   ` Chris Mason
2015-09-11 23:06         ` Chris Mason
2015-09-11 23:13           ` Linus Torvalds
  -- strict thread matches above, loose matches on Subject: below --
2015-09-09 15:23 Chris Mason
2015-09-11 18:49 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+55aFzXW7t+1v3tmW2sxn-BLpvZ1_Ye6epiPWBeq70FoaSmFQ@mail.gmail.com \
    --to=torvalds@linux-foundation.org \
    --cc=clm@fb.com \
    --cc=david@fromorbit.com \
    --cc=hch@lst.de \
    --cc=jack@suse.cz \
    --cc=jbacik@fb.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=neilb@suse.de \
    --cc=tj@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.