linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Huang\, Ying" <ying.huang@intel.com>
To: Matthew Wilcox <willy@infradead.org>
Cc: "Chen\, Rong A" <rong.a.chen@intel.com>,
	"lkp\@01.org" <lkp@01.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	LKML <linux-kernel@vger.kernel.org>,
	Andi Kleen <ak@linux.intel.com>,
	Dave Hansen <dave.hansen@intel.com>,
	Waiman Long <longman@redhat.com>,
	Tim C Chen <tim.c.chen@intel.com>
Subject: Re: [LKP] [page cache] eb797a8ee0: vm-scalability.throughput -16.5% regression
Date: Tue, 26 Feb 2019 16:17:18 +0800	[thread overview]
Message-ID: <875zt7t14h.fsf@yhuang-dev.intel.com> (raw)
In-Reply-To: <20181114141713.GA25731@bombadil.infradead.org> (Matthew Wilcox's message of "Wed, 14 Nov 2018 22:17:13 +0800")

Matthew Wilcox <willy@infradead.org> writes:

> On Wed, Nov 14, 2018 at 05:22:42PM +0800, kernel test robot wrote:
>> FYI, we noticed a -16.5% regression of vm-scalability.throughput due to commit:
>> commit: eb797a8ee0ab4cd03df556980ce7bf167cadaa50 ("page cache: Rearrange address_space")
>> 
>> in testcase: vm-scalability
>> on test machine: 80 threads Skylake with 64G memory
>> with following parameters:
>> 
>> 	runtime: 300s
>> 	test: small-allocs
>> 	cpufreq_governor: performance
>
> Thanks for the report.  I am surprised by it, because I can't see what
> could have caused this.  On a 64-bit system (which your test is using),
> here's the effect of that patch:
>
> gfp_mask moves from being with private_lock up into a hole adjacent
> to i_mmap_writable.
> wb_err moves from the end of the array up to be with private_lock.
> address_space shrinks by 8 bytes.
>
> Hmm.  Might the shrinking be the problem?  Something in struct inode is
> now split across two cachelines, or is no longer in the same cacheline
> as something else?

It appears that the shrinking causes the regression.  To verify that, I
write a debug patch as follow.  And it recovers the regression.

"
modified   include/linux/fs.h
@@ -437,6 +437,7 @@ struct address_space {
 	spinlock_t		private_lock;
 	struct list_head	private_list;
 	void			*private_data;
+	int			pad0;
 } __attribute__((aligned(sizeof(long)))) __randomize_layout;
 	/*
 	 * On most architectures that alignment is already the case; but
"

Which recovers the size of struct address_space.  And the following
patch by itself recovers the size of struct inode, and it recovers the
regression too.

"
modified   include/linux/fs.h
@@ -704,6 +703,7 @@ struct inode {
 #endif
 
 	void			*i_private; /* fs or device private pointer */
+	long			i_dbg_pad;
 } __randomize_layout;
 
 static inline unsigned int i_blocksize(const struct inode *node)
"

That is, the regression is related with the size of struct inode.  Take
a look at how struct inode is allocated, I found it is allocated via
slab and without any alignment requirement.  After the first bad commit,
the sizeof(struct inode) is changed from 592 to 584, so the cache line
alignment of some inode are changed, which causes the regression.  To
further dig which causes the regression, I make the struct inode/struct
address_space allocation aligned with the cache line, add some pads to
struct address_space, and check the performance difference.  It is found
that if the last 8 or 16 bytes of address_space->i_mmap_rwsem are put in
a separate cache with other fields of address_space->i_mmap_rwsem, the
regression could be recovered.  The layout of struct rw_semphore is as
follow.

struct rw_semaphore {
	atomic_long_t              count;                /*     0     8 */
	struct list_head           wait_list;            /*     8    16 */
	raw_spinlock_t             wait_lock;            /*    24     4 */
	struct optimistic_spin_queue osq;                /*    28     4 */
	struct task_struct *       owner;                /*    32     8 */

	/* size: 40, cachelines: 1, members: 5 */
	/* last cacheline: 40 bytes */
};

This means, we need to put rw_semphore->owner at a different cache line
of other rw_semphore fields to recover the regression.

Back to the regression report.  Most of CPU time is spent for
rw_semphore spinning during the test,

     89.08            +0.7       89.79        perf-profile.self.cycles-pp.osq_lock
      1.02            -0.1        0.89        perf-profile.self.cycles-pp.rwsem_spin_on_owner

So the cache line layout of struct rw_semphore could have big influence
for the test.  And during spinning, the lock holder will update
address_space->i_mmap and address_space->i_mmap_rwsem.count, while the
spinning CPU will keep reading address_space->i_mmap_rwsem.owner, this
will cause cache ping-pong, then cause the regression.  The parent
commit happens to have the right cache line layout, while the first bad
commit doesn't.

As for fixing.  Should we care about the cache line alignment of struct
inode?  Or its size is considered more important because there may be a
huge number of struct inode in the system?

Best Regards,
Huang, Ying

> I'm at Plumbers this week, so I don't have much time to investigate,
> but this regression is very important to me and I shall dig into this
> when I can.  It's OK to revert this commit in the meantime; nothing
> depends on it yet.
>
>> In addition to that, the commit also has significant impact on the following tests:
>> 
>> | testcase: change | unixbench: unixbench.score 20.9% improvement                         |
>
> Huh.  If we had to choose, would a 20.9% improvement in unixbench be more
> important than a 16.5% penalty to will-it-scale?

  reply	other threads:[~2019-02-26  8:23 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-11-14  9:22 [LKP] [page cache] eb797a8ee0: vm-scalability.throughput -16.5% regression kernel test robot
2018-11-14 14:17 ` Matthew Wilcox
2019-02-26  8:17   ` Huang, Ying [this message]
2019-02-26 17:30     ` Linus Torvalds
2019-02-26 20:29       ` Waiman Long
2019-02-28  1:18         ` Huang, Ying
2019-02-28  1:32           ` Linus Torvalds
2019-03-02  8:26             ` Huang, Ying
2019-02-28  2:37           ` Waiman Long
2019-02-28  3:26             ` Huang, Ying

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875zt7t14h.fsf@yhuang-dev.intel.com \
    --to=ying.huang@intel.com \
    --cc=ak@linux.intel.com \
    --cc=dave.hansen@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lkp@01.org \
    --cc=longman@redhat.com \
    --cc=rong.a.chen@intel.com \
    --cc=tim.c.chen@intel.com \
    --cc=torvalds@linux-foundation.org \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).