Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for a page bit

From: Mel Gorman <mgorman@techsingularity.net>
To: Nicholas Piggin <npiggin@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Bob Peterson <rpeterso@redhat.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Steven Whitehouse <swhiteho@redhat.com>,
	Andrew Lutomirski <luto@kernel.org>,
	Andreas Gruenbacher <agruenba@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	linux-mm <linux-mm@kvack.org>
Subject: Re: [PATCH 2/2] mm: add PageWaiters indicating tasks are waiting for a page bit
Date: Tue, 3 Jan 2017 17:18:27 +0000	[thread overview]
Message-ID: <20170103171827.mj6bpclerz2xwesx@techsingularity.net> (raw)
In-Reply-To: <20170103222958.4a2ce0e6@roar.ozlabs.ibm.com>

On Tue, Jan 03, 2017 at 10:29:58PM +1000, Nicholas Piggin wrote:
> > kernel building showed nothing unusual on any machine
> > 
> > git checkout in a loop showed;
> > 	o minor gains with Nick's patch
> > 	o no impact from Linus's patch
> > 	o flat performance from PeterZ's
> > 
> > git test suite showed
> > 	o close to flat performance on all patches
> > 	o Linus' patch on top showed increased variability but not serious
> 
> I'd be really surprised if Linus's patch is actually adding variability
> unless it is just some random cache or branch predictor or similar change
> due to changed code sizes. Testing on skylake CPU showed the old sequence
> takes a big stall with the load-after-lock;op hazard.
> 
> So I wouldn't worry about it too much, but maybe something interesting to
> look at for someone who knows x86 microarchitectures well.
> 

Agreed, it looked like a testing artifact. Later in the day it was obvious
that different results were obtained between boots and minor changes
in timing.

It's compounded by the fact that this particular test is based on /tmp so
different people will get different results depending on the filesystem. In
my case, that was btrfs.

> > 
> > will-it-scale pagefault tests
> > 	o page_fault1 and page_fault2 showed no differences in processes
> > 
> > 	o page_fault3 using processes did show some large losses at some
> > 	  process counts on all patches. The losses were not consistent on
> > 	  each run. There also was no consistently at loss with increasing
> > 	  process counts. It did appear that Peter's patch had fewer
> > 	  problems with only one thread count showing problems so it
> > 	  *may* be more resistent to the problem but not completely and
> > 	  it's not obvious why it might be so it could be a testing
> > 	  anomaly
> 
> Okay. page_fault3 has each process doing repeated page faults on their
> own 128MB file in /tmp. Unless they fill memory and start to reclaim,
> (which I believe must be happening in Dave's case) there should be no
> contention on page lock. After the patch, the uncontended case should
> be strictly faster when there is no contention.
> 

It's possible in the test that I setup that it's getting screwed by the
filesystem used. It's writing that array and on a COW filesystem there
is a whole bucket of variables such as new block allocations, writeback
timing etc. The writeback timing alone means that this test is questionable
because the results will depend on when background writeback triggers. I'm
not sure how Dave is controlling for these factors.

As an aside, will-it-scale page_fault was one of the tests I had dropped way
down on my list of priorities to watch a long time ago.  It's a short-lived
filesystem-based test vunerable to timing issues and highly variable that
didn't seem worth controlling for at the time. I queued the test for a look
on the rough offchance your patches were falling over something obvious. If
it is, I haven't spotted it yet. Profiles are very similar. Separate
runs with lock stat show detectable but negligable contentions on
page_wait_table. The bulk of the contention is within the filesystem.

> When there is contention, there is an added cost of setting and clearing
> page waiters bit. Maybe there is some other issue there... are you seeing
> the losses in uncontended case, contended, or both?
> 

I couldn't determine from the profile whether it was uncontended or not.
The fact that there are boot-to-boot variations makes it difficult
to determine if a profile vs !profile run is equivalent without using
debugging patches to gather stats. lock_stat does show large numbers of
contentions all right but the vast bulk of them are internal to btrfs.

-- 
Mel Gorman
SUSE Labs