From: Nicholas Piggin <npiggin@gmail.com>
To: Mel Gorman <mgorman@techsingularity.net>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>,
Bob Peterson <rpeterso@redhat.com>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
swhiteho@redhat.com, luto@kernel.org, agruenba@redhat.com,
peterz@infradead.org, linux-mm@kvack.org
Subject: Re: [RFC][PATCH] make global bitlock waitqueues per-node
Date: Tue, 20 Dec 2016 23:21:22 +1000
Message-ID: <20161220232122.62c8196e@roar.ozlabs.ibm.com>
In-Reply-To: <20161220125825.hfwyzy2mzc4lna7x@techsingularity.net>
On Tue, 20 Dec 2016 12:58:25 +0000
Mel Gorman <mgorman@techsingularity.net> wrote:
> On Tue, Dec 20, 2016 at 12:31:13PM +1000, Nicholas Piggin wrote:
> > On Mon, 19 Dec 2016 16:20:05 -0800
> > Dave Hansen <dave.hansen@linux.intel.com> wrote:
> >
> > > On 12/19/2016 03:07 PM, Linus Torvalds wrote:
> > > > +wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > > +{
> > > > + const int __maybe_unused nid = page_to_nid(virt_to_page(word));
> > > > +
> > > > + return __bit_waitqueue(word, bit, nid);
> > > >
> > > > No can do. Part of the problem with the old code was that it did that
> > > > virt_to_page() crud. That doesn't work with the virtually mapped stack.
> > >
> > > Ahhh, got it.
> > >
> > > So, what did you have in mind? Just redirect bit_waitqueue() to the
> > > "first_online_node" waitqueues?
> > >
> > > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > {
> > > return __bit_waitqueue(word, bit, first_online_node);
> > > }
> > >
> > > We could do some fancy stuff like only do virt_to_page() for things in
> > > the linear map, but I'm not sure we'll see much of a gain for it. None
> > > of the other waitqueue users look as pathological as the 'struct page'
> > > ones. Maybe:
> > >
> > > wait_queue_head_t *bit_waitqueue(void *word, int bit)
> > > {
> > > int nid;
> > > if ((unsigned long)word >= VMALLOC_START) /* all addrs not in linear map */
> > > nid = first_online_node;
> > > else
> > > nid = page_to_nid(virt_to_page(word));
> > > return __bit_waitqueue(word, bit, nid);
> > > }
> >
> > I think he meant just make the page_waitqueue do the per-node thing
> > and leave bit_waitqueue as the global bit.
> >
>
> I'm pressed for time but at a glance, that might require a separate
> structure of wait_queues for page waitqueue. Most users of bit_waitqueue
> are not operating with pages. The first user is based on a word inside
> a block_device for example. All non-page users could assume node-0.
Yes it would require something or other like that. Trivial to keep things
balanced (if not local) over nodes by taking a simple hash of the virtual
address to spread over the nodes. Or just keep using this separate global
table for the bit_waitqueue...
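
To make the hashing idea concrete, here is a rough userspace sketch, not a kernel patch. Everything in it is an assumption made for illustration: NR_NODES, WAIT_TABLE_BITS, struct waitq (a stand-in for wait_queue_head_t), and the helpers hash_u64(), vaddr_to_node() and sketch_bit_waitqueue() are hypothetical names, and the multiplicative hash is only one plausible choice. It never calls virt_to_page(), so it does not care whether the word lives in the linear map or on a virtually mapped stack.

```c
#include <stdint.h>

#define NR_NODES        4                     /* hypothetical node count */
#define WAIT_TABLE_BITS 8                     /* hypothetical: 256 queues per node */
#define WAIT_TABLE_SIZE (1u << WAIT_TABLE_BITS)

/* Stand-in for wait_queue_head_t; only its address matters here. */
struct waitq {
	int placeholder;
};

/* One table per node, as in the per-node proposal. */
static struct waitq node_tables[NR_NODES][WAIT_TABLE_SIZE];

/* Multiplicative hash: multiply by a 64-bit golden-ratio constant
 * and keep the top `bits` bits (bits must be in 1..63). */
static unsigned int hash_u64(uint64_t val, unsigned int bits)
{
	return (unsigned int)((val * 0x9E3779B97F4A7C15ull) >> (64 - bits));
}

/* Pick a node purely from the virtual address. This balances load
 * across nodes but makes no attempt at NUMA locality. */
static unsigned int vaddr_to_node(const void *word)
{
	return hash_u64((uint64_t)(uintptr_t)word, 16) % NR_NODES;
}

/* Look up the wait queue for (word, bit) inside the chosen node's table. */
static struct waitq *sketch_bit_waitqueue(const void *word, int bit)
{
	unsigned int nid = vaddr_to_node(word);
	/* Fold the bit number into the key so that different bits of the
	 * same word can land on different queues. */
	uint64_t key = ((uint64_t)(uintptr_t)word << 6) | ((unsigned int)bit & 63);

	return &node_tables[nid][hash_u64(key, WAIT_TABLE_BITS)];
}
```

A caller would use sketch_bit_waitqueue(&some_word, 3) for both the waiter and the waker; the only property that matters is that the same (word, bit) pair always maps to the same queue.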
But before Linus grumps at me again, let's try to do the waitqueue
avoidance bit first before we worry about that :)
> It
> shrinks the available hash table space but as before, maybe collisions
> are not common enough to actually matter. That would be worth checking
> out. Alternatively, careful auditing to pick a node when it's known it's
> safe to call virt_to_page may work but it would be fragile.
>
> Unfortunately I won't be able to review or test any patches until January
> 3rd after I'm back online properly. Right now, I have intermittent internet
> access at best. During the next 4 days, I know I definitely will not have
> any internet access.
>
> The last time around, there were three patch sets to avoid the overhead for
> pages in particular. One was dropped (mine, based on Nick's old work) as
> it was too complicated. Peter had some patches but after enough hammering
> it failed due to a missed wakeup that I didn't pin down before having to
> travel to a conference.
>
> I hadn't tested Nick's prototype although it looked fine because others
> reviewed it before I looked and I was waiting for another version to
> appear. If one appears, I'll take a closer look and bash it across a few
> machines to see if it has any lost wakeup problems.
>
Sure, I'll respin it this week.
Thanks,
Nick