linux-kernel.vger.kernel.org archive mirror
From: Mel Gorman <mgorman@techsingularity.net>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andy Lutomirski <luto@amacapital.net>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Andy Lutomirski <luto@kernel.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Bob Peterson <rpeterso@redhat.com>,
	Steven Whitehouse <swhiteho@redhat.com>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: CONFIG_VMAP_STACK, on-stack struct, and wake_up_bit
Date: Wed, 26 Oct 2016 23:03:39 +0100	[thread overview]
Message-ID: <20161026220339.GE2699@techsingularity.net> (raw)
In-Reply-To: <CA+55aFy21NqcYTeLVVz4x4kfQ7A+o4HEv7srone6ppKAjCwn7g@mail.gmail.com>

On Wed, Oct 26, 2016 at 02:26:57PM -0700, Linus Torvalds wrote:
> On Wed, Oct 26, 2016 at 1:31 PM, Mel Gorman <mgorman@techsingularity.net> wrote:
> >
> > IO wait activity is not all that matters. We hit the lock/unlock paths
> > during a lot of operations like reclaim.
> 
> I doubt we do.
> 
> Yes, we hit the lock/unlock itself, but do we hit the *contention*?
> 
> The current code is nasty, and always ends up touching the wait-queue
> regardless of whether it needs to or not, but we have a fix for that.
> 

To be clear, are you referring to PeterZ's patch that avoids the lookup? If
so, I see your point.
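
For illustration only, the general shape of such a fix is to track whether
anyone is actually sleeping on the page, so that the unlock fast path can
skip the hashed wait queue entirely. A rough sketch, assuming a hypothetical
PG_waiters_sketch flag rather than whatever the real patch does:

    /*
     * Sketch only, not the real patch: a hypothetical "waiters" flag
     * means the wake-up side only touches the hashed wait queue when
     * somebody is actually sleeping on the page.
     */
    static void unlock_page_sketch(struct page *page)
    {
            clear_bit_unlock(PG_locked, &page->flags);
            /* pairs with the barrier in the sleeper before it rechecks PG_locked */
            smp_mb__after_atomic();
            if (test_bit(PG_waiters_sketch, &page->flags))
                    wake_up_page(page, PG_locked);  /* slow path: hash lookup */
    }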

> With that fixed, do we actually get contention on a per-page basis?

Reclaim would have to be running in parallel with migrations, faults,
clearing write-protect etc. I can't think of a situation where a normal
workload would hit it regularly and/or for long durations.

> Because without contention, we'd never actually look up the wait-queue
> at all.
> 
> I suspect that without IO, it's really really hard to actually get
> that contention, because things like reclaim end up looking at the LRU
> queue etc with their own locking, so it should look at various
> individual pages one at a time, not have multiple queues look at the
> same page.
> 

Except for many direct reclaimers hitting small LRUs while a system is
thrashing -- and that is not a case that really matters, because at that
point you've already lost.
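
Even then, the hashed queue is only consulted on the slow path. Roughly,
and simplified (a sketch of the locking side, not the exact mainline code):

    /*
     * Simplified sketch: the fast path is a plain trylock, and only a
     * lost race falls through to __lock_page(), which is where the
     * hashed wait queue comes into play.
     */
    static inline void lock_page_sketch(struct page *page)
    {
            might_sleep();
            if (!trylock_page(page))
                    __lock_page(page);
    }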

> >> diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
> >> index 7f2ae99e5daf..0f088f3a2fed 100644
> >> --- a/include/linux/mmzone.h
> >> +++ b/include/linux/mmzone.h
> >> @@ -440,33 +440,7 @@ struct zone {
> >> +     int initialized;
> >>
> >>       /* Write-intensive fields used from the page allocator */
> >>       ZONE_PADDING(_pad1_)
> >
> > zone_is_initialized is mostly the domain of hotplug. A potential cleanup
> > is to use a page flag and shrink the size of zone slightly. Nothing to
> > panic over.
> 
> I really did that to make it very obvious that there was no semantic
> change. I just set the "initialized" flag in the same place where it
> used to initialize the wait_table, so that this:
> 
> >>  static inline bool zone_is_initialized(struct zone *zone)
> >>  {
> >> -     return !!zone->wait_table;
> >> +     return zone->initialized;
> >>  }
> 
> ends up being obviously equivalent.
> 

No problem with that.

> >> +#define WAIT_TABLE_BITS 8
> >> +#define WAIT_TABLE_SIZE (1 << WAIT_TABLE_BITS)
> >> +static wait_queue_head_t bit_wait_table[WAIT_TABLE_SIZE] __cacheline_aligned;
> >> +
> >> +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> >> +{
> >> +     const int shift = BITS_PER_LONG == 32 ? 5 : 6;
> >> +     unsigned long val = (unsigned long)word << shift | bit;
> >> +
> >> +     return bit_wait_table + hash_long(val, WAIT_TABLE_BITS);
> >> +}
> >> +EXPORT_SYMBOL(bit_waitqueue);
> >> +
> >
> > Minor nit that it's unfortunate this moved to the scheduler core. It
> > wouldn't have been a complete disaster to add a page_waitqueue_init() or
> > something similar after sched_init.
> 
> I considered that, but decided that "minimal patch" was better. Plus,
> with that bit_waitqueue() actually also being used for the page
> locking queues (which act _kind of_ but not quite, like a bitlock),
> the bit_wait_table is actually more core than just the bit-wait code.
> 
> In fact, I considered just renaming it to "hashed_wait_queue", because
> that's effectively how we use it now, rather than being particularly
> specific to the bit-waiting. But again, that would have made the patch
> bigger, which I wanted to avoid since this is a post-rc2 thing due to
> the gfs2 breakage.
> 

No objection. Shuffling it around does not make it obviously better in
any way.
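
For anyone following along, the page-lock side ends up sharing the same
table via something roughly like the following; this is a sketch based on
the patch under discussion, so the exact details may differ:

    /*
     * Sketch: with the per-zone wait_table gone, a page's wait queue
     * is simply one of the 256 shared, cacheline-aligned entries in
     * bit_wait_table, selected by hashing the page pointer.
     */
    static inline wait_queue_head_t *page_waitqueue_sketch(struct page *page)
    {
            return bit_waitqueue(page, 0);
    }

So the page-lock waiters and the generic bit-wait users all land in the
same hashed table.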

In the meantime, a machine freed up. FWIW, the patch survived booting on a
2-socket machine and about 20 minutes of bashing on the reclaim paths from
multiple processes to beat on lock/unlock. I didn't do a performance
comparison or gather profile data, but I wouldn't expect anything
interesting from profiles other than some cycles saved.

-- 
Mel Gorman
SUSE Labs

Thread overview: 29+ messages
2016-10-26 12:51 CONFIG_VMAP_STACK, on-stack struct, and wake_up_bit Andreas Gruenbacher
2016-10-26 15:51 ` Andy Lutomirski
2016-10-26 16:32   ` Linus Torvalds
2016-10-26 17:15     ` Linus Torvalds
2016-10-26 17:58       ` Linus Torvalds
2016-10-26 18:04         ` Bob Peterson
2016-10-26 18:10           ` Linus Torvalds
2016-10-26 19:11             ` Bob Peterson
2016-10-26 21:01             ` Bob Peterson
2016-10-26 21:30               ` Linus Torvalds
2016-10-26 22:45                 ` Borislav Petkov
2016-10-26 23:13               ` Borislav Petkov
2016-10-27  0:37                 ` Bob Peterson
2016-10-27 12:36                   ` Borislav Petkov
2016-10-27 18:51                     ` Bob Peterson
2016-10-27 19:19                       ` Borislav Petkov
2016-10-27 21:03                         ` Bob Peterson
2016-10-27 21:19                           ` Borislav Petkov
2016-10-28  8:37                     ` [tip:x86/urgent] x86/microcode/AMD: Fix more fallout from CONFIG_RANDOMIZE_MEMORY=y tip-bot for Borislav Petkov
2016-10-26 20:31       ` CONFIG_VMAP_STACK, on-stack struct, and wake_up_bit Mel Gorman
2016-10-26 21:26         ` Linus Torvalds
2016-10-26 22:03           ` Mel Gorman [this message]
2016-10-26 22:09             ` Linus Torvalds
2016-10-26 23:07               ` Mel Gorman
2016-10-27  8:08                 ` Peter Zijlstra
2016-10-27  9:07                   ` Mel Gorman
2016-10-27  9:44                     ` Peter Zijlstra
2016-10-27  9:59                       ` Mel Gorman
2016-10-27 11:56                   ` Nicholas Piggin
