From: Mel Gorman <mgorman@techsingularity.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>,
Andreas Gruenbacher <agruenba@redhat.com>,
Peter Zijlstra <peterz@infradead.org>,
Andy Lutomirski <luto@kernel.org>,
LKML <linux-kernel@vger.kernel.org>,
Bob Peterson <rpeterso@redhat.com>,
Steven Whitehouse <swhiteho@redhat.com>,
linux-mm <linux-mm@kvack.org>
Subject: Re: CONFIG_VMAP_STACK, on-stack struct, and wake_up_bit
Date: Wed, 26 Oct 2016 23:03:39 +0100 [thread overview]
Message-ID: <20161026220339.GE2699@techsingularity.net> (raw)
In-Reply-To: <CA+55aFy21NqcYTeLVVz4x4kfQ7A+o4HEv7srone6ppKAjCwn7g@mail.gmail.com>
On Wed, Oct 26, 2016 at 02:26:57PM -0700, Linus Torvalds wrote:
> On Wed, Oct 26, 2016 at 1:31 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > IO wait activity is not all that matters. We hit the lock/unlock paths
> > during a lot of operations like reclaim.
>
> I doubt we do.
>
> Yes, we hit the lock/unlock itself, but do we hit the *contention*?
>
> The current code is nasty, and always ends up touching the wait-queue
> regardless of whether it needs to or not, but we have a fix for that.
>
To be clear, are you referring to PeterZ's patch that avoids the lookup? If
so, I see your point.
> With that fixed, do we actually get contention on a per-page basis?
Reclaim would have to running parallel to migrations, faults, clearing
write-protect etc. I can't think of a situation where a normal workload
would hit it regularly and/or for long durations.
> Because without contention, we'd never actually look up the wait-queue
> at all.
>
> I suspect that without IO, it's really really hard to actually get
> that contention, because things like reclaim end up looking at the LRU
> queue etc wioth their own locking, so it should look at various
> individual pages one at a time, not have multiple queues look at the
> same page.
>
Except many direct reclaimers on small LRUs while a system is thrashing --
not a case that really matters, you've already lost.
> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index 7f2ae99e5daf..0f088f3a2fed 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -440,33 +440,7 @@ struct zone {
> >> + int initialized;
> >>
> >> /* Write-intensive fields used from the page allocator */
> >> ZONE_PADDING(_pad1_)
> >
> > zone_is_initialized is mostly the domain of hotplug. A potential cleanup
> > is to use a page flag and shrink the size of zone slightly. Nothing to
> > panic over.
>
> I really did that to make it very obvious that there was no semantic
> change. I just set the "initialized" flag in the same place where it
> used to initialize the wait_table, so that this:
>
> >> static inline bool zone_is_initialized(struct zone *zone)
> >> {
> >> - return !!zone->wait_table;
> >> + return zone->initialized;
> >> }
>
> ends up being obviously equivalent.
>
No problem with that.
> >> +#define WAIT_TABLE_BITS 8
> >> +#define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)
> >> +static wait_queue_head_t bit_wait_table[WAIT_TABLE_SIZE] __cacheline_aligned;
> >> +
> >> +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> >> +{
> >> + const int shift = BITS_PER_LONG == 32 ? 5 : 6;
> >> + unsigned long val = (unsigned long)word << shift | bit;
> >> +
> >> + return bit_wait_table + hash_long(val, WAIT_TABLE_BITS);
> >> +}
> >> +EXPORT_SYMBOL(bit_waitqueue);
> >> +
> >
> > Minor nit that it's unfortunate this moved to the scheduler core. It
> > wouldn't have been a complete disaster to add a page_waitqueue_init() or
> > something similar after sched_init.
>
> I considered that, but decided that "minimal patch" was better. Plus,
> with that bit_waitqueue() actually also being used for the page
> locking queues (which act _kind of_ but not quite, like a bitlock),
> the bit_wait_table is actually more core than just the bit-wait code.
>
> In fact, I considered just renaming it to "hashed_wait_queue", because
> that's effectively how we use it now, rather than being particularly
> specific to the bit-waiting. But again, that would have made the patch
> bigger, which I wanted to avoid since this is a post-rc2 thing due to
> the gfs2 breakage.
>
No objection. Shuffling it around does not make it obviously better in
any way.
In the meantime, a machine freed up. FWIW, it survived booting on a 2-socket
and about 20 minutes of bashing on reclaim paths from multiple processes
to beat on lock/unlock. I didn't do a performance comparison or gather
profile data but I wouldn't expect anything interesting from profiles
other than some cycles saved.
--
Mel Gorman
SUSE Labs
next prev parent reply other threads:[~2016-10-26 22:03 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2016-10-26 12:51 CONFIG_VMAP_STACK, on-stack struct, and wake_up_bit Andreas Gruenbacher
2016-10-26 15:51 ` Andy Lutomirski
2016-10-26 16:32 ` Linus Torvalds
2016-10-26 17:15 ` Linus Torvalds
2016-10-26 17:58 ` Linus Torvalds
2016-10-26 18:04 ` Bob Peterson
2016-10-26 18:10 ` Linus Torvalds
2016-10-26 19:11 ` Bob Peterson
2016-10-26 21:01 ` Bob Peterson
2016-10-26 21:30 ` Linus Torvalds
2016-10-26 22:45 ` Borislav Petkov
2016-10-26 23:13 ` Borislav Petkov
2016-10-27 0:37 ` Bob Peterson
2016-10-27 12:36 ` Borislav Petkov
2016-10-27 18:51 ` Bob Peterson
2016-10-27 19:19 ` Borislav Petkov
2016-10-27 21:03 ` Bob Peterson
2016-10-27 21:19 ` Borislav Petkov
2016-10-28 8:37 ` [tip:x86/urgent] x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y tip-bot for Borislav Petkov
2016-10-26 20:31 ` CONFIG_VMAP_STACK, on-stack struct, and wake_up_bit Mel Gorman
2016-10-26 21:26 ` Linus Torvalds
2016-10-26 22:03 ` Mel Gorman [this message]
2016-10-26 22:09 ` Linus Torvalds
2016-10-26 23:07 ` Mel Gorman
2016-10-27 8:08 ` Peter Zijlstra
2016-10-27 9:07 ` Mel Gorman
2016-10-27 9:44 ` Peter Zijlstra
2016-10-27 9:59 ` Mel Gorman
2016-10-27 11:56 ` Nicholas Piggin
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20161026220339.GE2699@techsingularity.net \
--to=mgorman@techsingularity.net \
--cc=agruenba@redhat.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=luto@amacapital.net \
--cc=luto@kernel.org \
--cc=peterz@infradead.org \
--cc=rpeterso@redhat.com \
--cc=swhiteho@redhat.com \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).