Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for a page bit

From: Linus Torvalds <torvalds@linux-foundation.org>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>,
	Bob Peterson <rpeterso@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Steven Whitehouse <swhiteho@redhat.com>,
	Andrew Lutomirski <luto@kernel.org>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-mm <linux-mm@kvack.org>,
	Mel Gorman <mgorman@techsingularity.net>
Subject: Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for a page bit
Date: Tue, 27 Dec 2016 12:17:34 -0800
Message-ID: <CA+55aFwjcEmtWjNXhugX3GfH0zvypLVi0r90PWL3DCD-jA4v5Q@mail.gmail.com>
In-Reply-To: <CA+55aFyXXKdjbidzVC=waiaAaUJpwqZQZv-kKoZfaiWtYy3z=A@mail.gmail.com>

On Tue, Dec 27, 2016 at 11:40 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> This patch at least might have a chance in hell of working. Let's see..

Ok, with that fixed, things do indeed seem to work.

And things also look fairly good on my "lots of nasty little
shortlived scripts" benchmark ("make -j32 test" for git, in case
people care).

That benchmark used to have "unlock_page()" and "__wake_up_bit()"
together using about 3% of all CPU time.

Now __wake_up_bit() doesn't show up at all (ok, it's something like
0.02%, so it's technically still there, but..) and "unlock_page()" is
at 0.66% of CPU time. So it's about a quarter of where it used to be.
And now it's about the same cost as the "trylock_page()" that is
inlined into filemap_map_pages(). It used to be that unlocking the
page was much more expensive than locking it because of all the
unnecessary waitqueue games.
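To illustrate why unlock can now cost about the same as trylock, here is a simplified user-space model of the idea in the patch subject: a would-be sleeper sets a "waiters" bit before sleeping, and unlock clears the lock bit with a single atomic op that also returns the old flags, so it only enters the slow wake path when that bit was set. This is a sketch under stated assumptions, not the kernel's code: the struct, the bit positions and every model_* function are hypothetical, and the real implementation differs in detail (hashed waitqueues, memory-ordering primitives, arch-specific tricks).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical flag layout for this sketch only. */
#define PG_LOCKED  (1u << 0)
#define PG_WAITERS (1u << 7)

/* Hypothetical stand-in for struct page: just the flags word. */
struct model_page {
	atomic_uint flags;
	unsigned int wakeups;	/* counts entries into the slow wake path */
};

/* Fast path: one atomic op; succeeds only if the lock bit was clear. */
static bool model_trylock_page(struct model_page *p)
{
	unsigned int old = atomic_fetch_or(&p->flags, PG_LOCKED);
	return !(old & PG_LOCKED);
}

/* A task about to sleep on the page announces itself first. */
static void model_mark_waiter(struct model_page *p)
{
	atomic_fetch_or(&p->flags, PG_WAITERS);
}

/*
 * Slow path. In the kernel this would walk a waitqueue; here we just
 * count the call and, assuming every waiter was woken, clear the bit.
 */
static void model_wake_up_page(struct model_page *p)
{
	p->wakeups++;
	atomic_fetch_and(&p->flags, ~PG_WAITERS);
}

/*
 * Clearing the lock bit and learning whether anyone is waiting is a
 * single atomic op, so the uncontended unlock never touches a waitqueue.
 */
static void model_unlock_page(struct model_page *p)
{
	unsigned int old = atomic_fetch_and(&p->flags, ~PG_LOCKED);
	assert(old & PG_LOCKED);	/* unlocking an unlocked page is a bug */
	if (old & PG_WAITERS)
		model_wake_up_page(p);
}
```

In the uncontended case (lock, unlock, nobody waiting) model_wake_up_page() is never called, which is the effect the profile above shows for __wake_up_bit(); the waitqueue cost is only paid when a waiter has actually set PG_WAITERS.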

So the benchmark still does a ton of page lock/unlock action, but it
doesn't stand out in the profiles as some kind of WTF thing any more.
And the profiles really show that the cost is the atomic op itself
rather than bad effects from bad code generation, which is what you
want to see.

Would I love to fix this all by not taking the page lock at all? Yes I
would. I suspect we should be able to do something clever and lockless
at least in theory.

But in the meantime, I'm happy with where our page locking overhead
is. And while I haven't seen the NUMA numbers from Dave Hansen with
this all, the early testing from Dave was that the original patch from
Nick already fixed the regression and was the fastest one anyway. And
this optimization will only have improved on things further, although
it might not be as noticeable on NUMA as it is on just a regular
single socket system.

                   Linus