All of lore.kernel.org
 help / color / mirror / Atom feed
From: Sasha Levin <sashal@kernel.org>
To: linux-kernel@vger.kernel.org, stable@vger.kernel.org
Cc: Feng Tang <feng.tang@intel.com>,
	kernel test robot <oliver.sang@intel.com>,
	John Hubbard <jhubbard@nvidia.com>,
	Jason Gunthorpe <jgg@nvidia.com>, Peter Xu <peterx@redhat.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Sasha Levin <sashal@kernel.org>
Subject: [PATCH AUTOSEL 5.12 33/33] mm: relocate 'write_protect_seq' in struct mm_struct
Date: Tue, 15 Jun 2021 11:48:24 -0400	[thread overview]
Message-ID: <20210615154824.62044-33-sashal@kernel.org> (raw)
In-Reply-To: <20210615154824.62044-1-sashal@kernel.org>

From: Feng Tang <feng.tang@intel.com>

[ Upstream commit 2e3025434a6ba090c85871a1d4080ff784109e1f ]

0day robot reported a 9.2% regression for will-it-scale mmap1 test
case[1], caused by commit 57efa1fe5957 ("mm/gup: prevent gup_fast from
racing with COW during fork").

Further debug shows the regression is due to that commit changes the
offset of hot fields 'mmap_lock' inside structure 'mm_struct', thus some
cache alignment changes.

From the perf data, the contention for 'mmap_lock' is very severe and
takes around 95% cpu cycles, and it is a rw_semaphore

        struct rw_semaphore {
                atomic_long_t count;	/* 8 bytes */
                atomic_long_t owner;	/* 8 bytes */
                struct optimistic_spin_queue osq; /* spinner MCS lock */
                ...

Before commit 57efa1fe5957 adds the 'write_protect_seq', it happens to
have a very optimal cache alignment layout, as Linus explained:

 "and before the addition of the 'write_protect_seq' field, the
  mmap_sem was at offset 120 in 'struct mm_struct'.

  Which meant that count and owner were in two different cachelines,
  and then when you have contention and spend time in
  rwsem_down_write_slowpath(), this is probably *exactly* the kind
  of layout you want.

  Because first the rwsem_write_trylock() will do a cmpxchg on the
  first cacheline (for the optimistic fast-path), and then in the
  case of contention, rwsem_down_write_slowpath() will just access
  the second cacheline.

  Which is probably just optimal for a load that spends a lot of
  time contended - new waiters touch that first cacheline, and then
  they queue themselves up on the second cacheline."

After the commit, the rw_semaphore is at offset 128, which means the
'count' and 'owner' fields are now in the same cacheline, and causes
more cache bouncing.

Currently there are 3 "#ifdef CONFIG_XXX" before 'mmap_lock' which will
affect its offset:

  CONFIG_MMU
  CONFIG_MEMBARRIER
  CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES

The layout above is on 64 bits system with 0day's default kernel config
(similar to RHEL-8.3's config), in which all these 3 options are 'y'.
And the layout can vary with different kernel configs.

Relayouting a structure is usually a double-edged sword, as sometimes it
can helps one case, but hurt other cases.  For this case, one solution
is, as the newly added 'write_protect_seq' is a 4 bytes long seqcount_t
(when CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an existing 4 bytes
hole in 'mm_struct' will not change other fields' alignment, while
restoring the regression.

Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1]
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Feng Tang <feng.tang@intel.com>
Reviewed-by: John Hubbard <jhubbard@nvidia.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Cc: Peter Xu <peterx@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
---
 include/linux/mm_types.h | 27 ++++++++++++++++++++-------
 1 file changed, 20 insertions(+), 7 deletions(-)

diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5aacc1c10a45..8f0fb62e8975 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -445,13 +445,6 @@ struct mm_struct {
 		 */
 		atomic_t has_pinned;
 
-		/**
-		 * @write_protect_seq: Locked when any thread is write
-		 * protecting pages mapped by this mm to enforce a later COW,
-		 * for instance during page table copying for fork().
-		 */
-		seqcount_t write_protect_seq;
-
 #ifdef CONFIG_MMU
 		atomic_long_t pgtables_bytes;	/* PTE page table pages */
 #endif
@@ -460,6 +453,18 @@ struct mm_struct {
 		spinlock_t page_table_lock; /* Protects page tables and some
 					     * counters
 					     */
+		/*
+		 * With some kernel config, the current mmap_lock's offset
+		 * inside 'mm_struct' is at 0x120, which is very optimal, as
+		 * its two hot fields 'count' and 'owner' sit in 2 different
+		 * cachelines,  and when mmap_lock is highly contended, both
+		 * of the 2 fields will be accessed frequently, current layout
+		 * will help to reduce cache bouncing.
+		 *
+		 * So please be careful with adding new fields before
+		 * mmap_lock, which can easily push the 2 fields into one
+		 * cacheline.
+		 */
 		struct rw_semaphore mmap_lock;
 
 		struct list_head mmlist; /* List of maybe swapped mm's.	These
@@ -480,7 +485,15 @@ struct mm_struct {
 		unsigned long stack_vm;	   /* VM_STACK */
 		unsigned long def_flags;
 
+		/**
+		 * @write_protect_seq: Locked when any thread is write
+		 * protecting pages mapped by this mm to enforce a later COW,
+		 * for instance during page table copying for fork().
+		 */
+		seqcount_t write_protect_seq;
+
 		spinlock_t arg_lock; /* protect the below fields */
+
 		unsigned long start_code, end_code, start_data, end_data;
 		unsigned long start_brk, brk, start_stack;
 		unsigned long arg_start, arg_end, env_start, env_end;
-- 
2.30.2


      parent reply	other threads:[~2021-06-15 15:50 UTC|newest]

Thread overview: 60+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-06-15 15:47 [PATCH AUTOSEL 5.12 01/33] regulator: cros-ec: Fix error code in dev_err message Sasha Levin
2021-06-15 15:47 ` [PATCH AUTOSEL 5.12 02/33] regulator: max77620: Silence deferred probe error Sasha Levin
2021-06-15 15:54   ` Mark Brown
2021-06-20 12:55     ` Sasha Levin
2021-06-21 10:40       ` Mark Brown
2021-06-21 18:03         ` Sasha Levin
2021-06-15 15:47 ` [PATCH AUTOSEL 5.12 03/33] regulator: bd70528: Fix off-by-one for buck123 .n_voltages setting Sasha Levin
2021-06-15 15:47 ` [PATCH AUTOSEL 5.12 04/33] platform/x86: thinkpad_acpi: Add X1 Carbon Gen 9 second fan support Sasha Levin
2021-06-15 15:47 ` [PATCH AUTOSEL 5.12 05/33] ASoC: rt5659: Fix the lost powers for the HDA header Sasha Levin
2021-06-15 15:47   ` Sasha Levin
2021-06-15 15:47 ` [PATCH AUTOSEL 5.12 06/33] phy: phy-mtk-tphy: Fix some resource leaks in mtk_phy_init() Sasha Levin
2021-06-15 15:47   ` Sasha Levin
2021-06-15 15:47   ` Sasha Levin
2021-06-15 15:47   ` Sasha Levin
2021-06-15 15:47 ` [PATCH AUTOSEL 5.12 07/33] ASoC: fsl-asoc-card: Set .owner attribute when registering card Sasha Levin
2021-06-15 15:47   ` Sasha Levin
2021-06-15 15:47 ` [PATCH AUTOSEL 5.12 08/33] regulator: mt6315: Fix function prototype for mt6315_map_mode Sasha Levin
2021-06-15 15:47   ` Sasha Levin
2021-06-15 15:47   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 09/33] regulator: rtmv20: Fix to make regcache value first reading back from HW Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 10/33] NFSv4: nfs4_proc_set_acl needs to restore NFS_CAP_UIDGID_NOMAP on error Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 11/33] ASoC: AMD Renoir - add DMI entry for Lenovo 2020 AMD platforms Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:56   ` Mark Brown
2021-06-15 15:56     ` Mark Brown
2021-06-20 12:56     ` Sasha Levin
2021-06-20 12:56       ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 12/33] spi: spi-zynq-qspi: Fix some wrong goto jumps & missing error code Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 13/33] sched/pelt: Ensure that *_sum is always synced with *_avg Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 14/33] ASoC: AMD Renoir: Remove fix for DMI entry on Lenovo 2020 platforms Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 15/33] ASoC: tas2562: Fix TDM_CFG0_SAMPRATE values Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 16/33] regulator: hi6421v600: Fix .vsel_mask setting Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 17/33] spi: stm32-qspi: Always wait BUSY bit to be cleared in stm32_qspi_wait_cmd() Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 18/33] NFSv4: Fix second deadlock in nfs4_evict_inode() Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 19/33] regulator: rt4801: Fix NULL pointer dereference if priv->enable_gpios is NULL Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 20/33] ASoC: rt5682: Fix the fast discharge for headset unplugging in soundwire mode Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 21/33] pinctrl: ralink: rt2880: avoid to error in calls is pin is already enabled Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 22/33] drm/sun4i: dw-hdmi: Make HDMI PHY into a platform device Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 23/33] ASoC: qcom: lpass-cpu: Fix pop noise during audio capture begin Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 24/33] scsi: core: Fix error handling of scsi_host_alloc() Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 25/33] scsi: core: Fix failure handling of scsi_add_host_with_dma() Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 26/33] scsi: core: Put .shost_dev in failure path if host state changes to RUNNING Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 27/33] scsi: core: Only put parent device if host state differs from SHOST_CREATED Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 28/33] radeon: use memcpy_to/fromio for UVD fw upload Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 29/33] kvm: avoid speculation-based attacks from out-of-range memslot accesses Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 30/33] kvm: fix previous commit for 32-bit builds Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 31/33] hwmon: (scpi-hwmon) shows the negative temperature properly Sasha Levin
2021-06-15 15:48 ` [PATCH AUTOSEL 5.12 32/33] riscv: code patching only works on !XIP_KERNEL Sasha Levin
2021-06-15 15:48   ` Sasha Levin
2021-06-15 15:48 ` Sasha Levin [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210615154824.62044-33-sashal@kernel.org \
    --to=sashal@kernel.org \
    --cc=feng.tang@intel.com \
    --cc=jgg@nvidia.com \
    --cc=jhubbard@nvidia.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=oliver.sang@intel.com \
    --cc=peterx@redhat.com \
    --cc=stable@vger.kernel.org \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.