linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
@ 2023-06-28 10:56 Zhu, Lipeng
  2023-07-02 21:11 ` Andrew Morton
  0 siblings, 1 reply; 5+ messages in thread
From: Zhu, Lipeng @ 2023-06-28 10:56 UTC (permalink / raw)
  To: akpm, viro, brauner
  Cc: linux-fsdevel, linux-kernel, linux-mm, pan.deng, yu.ma,
	tianyou.li, tim.c.chen, Zhu, Lipeng

When running UnixBench/Shell Scripts, we observed high false sharing
for accessing i_mmap against i_mmap_rwsem.

UnixBench/Shell Scripts are typical load/execute command test scenarios,
the i_mmap will be accessed frequently to insert/remove vma_interval_tree.
Meanwhile, the i_mmap_rwsem is frequently loaded. Unfortunately, they are
in the same cacheline.

The patch places the i_mmap and i_mmap_rwsem in separate cache lines to avoid
this false sharing problem.

With this patch, on Intel Sapphire Rapids 2 sockets 112c/224t platform, based
on kernel v6.4-rc4, the 224 parallel score is improved ~2.5% for
UnixBench/Shell Scripts case. And perf c2c tool shows the false sharing is
resolved as expected, the symbol vma_interval_tree_remove disappeared in
cache line 0 after this change.

Baseline:
=================================================
      Shared Cache Line Distribution Pareto
=================================================
  -------------------------------------------------------------
      0    13642    19392     9012       63  0xff1ddd3f0c8a3b00
  -------------------------------------------------------------
    9.22%    7.37%    0.00%    0.00%    0x0     0       1  0xffffffffab344052       518       334       354     5490       160  [k] vma_interval_tree_remove    [kernel.kallsyms]  vma_interval_tree_remove+18      0  1
    0.71%    0.73%    0.00%    0.00%    0x8     0       1  0xffffffffabb9a21f       574       338       458     1991       160  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
    0.52%    0.71%    5.34%    6.35%    0x8     0       1  0xffffffffabb9a236      1080       597       390     4848       160  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
    0.56%    0.47%   26.39%    6.35%    0x8     0       1  0xffffffffabb9a5ec      1327      1037       587     8537       160  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
    0.11%    0.08%   15.72%    1.59%    0x8     0       1  0xffffffffab17082b      1618      1077       735     7303       160  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
    0.01%    0.02%    0.08%    0.00%    0x8     0       1  0xffffffffabb9a27d      1594       593       512       53        43  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+749    0  1
    0.00%    0.01%    0.00%    0.00%    0x8     0       1  0xffffffffabb9a0c4         0       323       518       97        74  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
   44.74%   49.78%    0.00%    0.00%   0x10     0       1  0xffffffffab170995       609       344       430    26841       160  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
   26.62%   22.39%    0.00%    0.00%   0x10     0       1  0xffffffffab170965       514       347       437    13364       160  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1

With this change:
  -------------------------------------------------------------
      0    12726    18554     9039       49  0xff157a0f25b90c40
  -------------------------------------------------------------
    0.90%    0.72%    0.00%    0.00%    0x0     1       1  0xffffffffa5f9a21f       532       353       461     2200       160  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
    0.53%    0.70%    5.16%    6.12%    0x0     1       1  0xffffffffa5f9a236      1196       670       403     4774       160  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
    0.68%    0.51%   25.91%    6.12%    0x0     1       1  0xffffffffa5f9a5ec      1049       807       540     8552       160  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
    0.09%    0.06%   16.50%    2.04%    0x0     1       1  0xffffffffa557082b      1693      1351       758     7317       160  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
    0.01%    0.00%    0.00%    0.00%    0x0     1       1  0xffffffffa5f9a0c4       543         0       491       89        68  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
    0.00%    0.01%    0.02%    0.00%    0x0     1       1  0xffffffffa5f9a27d         0       597       742       45        40  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+749    0  1
   49.29%   53.01%    0.00%    0.00%    0x8     1       1  0xffffffffa5570995       580       310       413    27106       160  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
   28.60%   24.12%    0.00%    0.00%    0x8     1       1  0xffffffffa5570965       490       321       419    13244       160  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
---
 include/linux/fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index c85916e9f7db..d3dd8dcc9b8b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -434,7 +434,7 @@ struct address_space {
 	atomic_t		nr_thps;
 #endif
 	struct rb_root_cached	i_mmap;
-	struct rw_semaphore	i_mmap_rwsem;
+	struct rw_semaphore	i_mmap_rwsem ____cacheline_aligned_in_smp;
 	unsigned long		nrpages;
 	pgoff_t			writeback_index;
 	const struct address_space_operations *a_ops;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
  2023-06-28 10:56 [PATCH] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing Zhu, Lipeng
@ 2023-07-02 21:11 ` Andrew Morton
  2023-07-16 14:48   ` Zhu, Lipeng
  0 siblings, 1 reply; 5+ messages in thread
From: Andrew Morton @ 2023-07-02 21:11 UTC (permalink / raw)
  To: Zhu, Lipeng
  Cc: viro, brauner, linux-fsdevel, linux-kernel, linux-mm, pan.deng,
	yu.ma, tianyou.li, tim.c.chen

On Wed, 28 Jun 2023 18:56:25 +0800 "Zhu, Lipeng" <lipeng.zhu@intel.com> wrote:

> When running UnixBench/Shell Scripts, we observed high false sharing
> for accessing i_mmap against i_mmap_rwsem.
> 
> UnixBench/Shell Scripts are typical load/execute command test scenarios,
> the i_mmap will be accessed frequently to insert/remove vma_interval_tree.
> Meanwhile, the i_mmap_rwsem is frequently loaded. Unfortunately, they are
> in the same cacheline.

That sounds odd.  One would expect these two fields to be used in
close conjunction, so any sharing might even be beneficial.  Can you
identify in more detail what's actually going on in there?

> The patch places the i_mmap and i_mmap_rwsem in separate cache lines to avoid
> this false sharing problem.
> 
> With this patch, on Intel Sapphire Rapids 2 sockets 112c/224t platform, based
> on kernel v6.4-rc4, the 224 parallel score is improved ~2.5% for
> UnixBench/Shell Scripts case. And perf c2c tool shows the false sharing is
> resolved as expected, the symbol vma_interval_tree_remove disappeared in
> cache line 0 after this change.

There can be many address_spaces in memory, so a size increase is a
concern.  Is there anything we can do to minimize the cost of this?


^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
  2023-07-02 21:11 ` Andrew Morton
@ 2023-07-16 14:48   ` Zhu, Lipeng
  2023-07-16 14:54     ` [PATCH v2] " Zhu, Lipeng
  2023-07-16 14:56     ` Zhu, Lipeng
  0 siblings, 2 replies; 5+ messages in thread
From: Zhu, Lipeng @ 2023-07-16 14:48 UTC (permalink / raw)
  To: Andrew Morton
  Cc: viro, brauner, linux-fsdevel, linux-kernel, linux-mm, Deng, Pan,
	Ma, Yu, Li, Tianyou, tim.c.chen

Hi Andrew,

> -----Original Message-----
> From: Andrew Morton <akpm@linux-foundation.org>
> Sent: Monday, July 3, 2023 5:11 AM
> To: Zhu, Lipeng <lipeng.zhu@intel.com>
> Cc: viro@zeniv.linux.org.uk; brauner@kernel.org; linux-
> fsdevel@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> Deng, Pan <pan.deng@intel.com>; Ma, Yu <yu.ma@intel.com>; Li, Tianyou
> <tianyou.li@intel.com>; tim.c.chen@linux.intel.com
> Subject: Re: [PATCH] fs/address_space: add alignment padding for i_map and
> i_mmap_rwsem to mitigate a false sharing.
> 
> On Wed, 28 Jun 2023 18:56:25 +0800 "Zhu, Lipeng" <lipeng.zhu@intel.com>
> wrote:
> 
> > When running UnixBench/Shell Scripts, we observed high false sharing
> > for accessing i_mmap against i_mmap_rwsem.
> >
> > UnixBench/Shell Scripts are typical load/execute command test
> > scenarios, the i_mmap will be accessed frequently to insert/remove
> vma_interval_tree.
> > Meanwhile, the i_mmap_rwsem is frequently loaded. Unfortunately, they
> > are in the same cacheline.
> 
> That sounds odd.  One would expect these two fields to be used in close
> conjunction, so any sharing might even be beneficial.  Can you identify in more
> detail what's actually going on in there?
> 

Yes, I'm running UnixBench/Shell Script which concurrently 
launch->execute->exit a lot of shell commands.

During the workload running:
1: A lot of processes invoke vma_interval_tree_remove which touch 
"i_mmap", the call stack:
----vma_interval_tree_remove
    |----unlink_file_vma
    |    free_pgtables
    |    |----exit_mmap
    |    |    mmput
    |    |    |----begin_new_exec
    |    |    |    load_elf_binary
    |    |    |    bprm_execve

2: Also, there are a lot of processes touch 'i_mmap_rwsem' to acquire 
the semaphore in order to access 'i_mmap'.
In existing 'address_space' layout, 'i_mmap' and 'i_mmap_rwsem' are in 
the same cacheline.
struct address_space {
    struct inode *             host;                 /*     0     8 */
    struct xarray              i_pages;              /*     8    16 */
    struct rw_semaphore        invalidate_lock;      /*    24    40 */
    /* --- cacheline 1 boundary (64 bytes) --- */
    gfp_t                      gfp_mask;             /*    64     4 */
    atomic_t                   i_mmap_writable;      /*    68     4 */
    struct rb_root_cached      i_mmap;               /*    72    16 */
    struct rw_semaphore        i_mmap_rwsem;         /*    88    40 */
    /* --- cacheline 2 boundary (128 bytes) --- */
    long unsigned int          nrpages;              /*   128     8 */
    long unsigned int          writeback_index;      /*   136     8 */
    const struct address_space_operations  * a_ops;  /*   144     8 */
    long unsigned int          flags;                /*   152     8 */
    errseq_t                   wb_err;               /*   160     4 */
    spinlock_t                 private_lock;         /*   164     4 */
    struct list_head           private_list;         /*   168    16 */
    void *                     private_data;         /*   184     8 */

    /* size: 192, cachelines: 3, members: 15 */ };

Following perf c2c result shows heavy c2c bounce due to false sharing.
=================================================
      Shared Cache Line Distribution Pareto
=================================================
-------------------------------------------------------------
    0    3729     5791        0        0  0xff19b3818445c740
-------------------------------------------------------------
   3.27%    3.02%    0.00%    0.00%   0x18     0       1  0xffffffffa194403b       604       483       389      692       203  [k] vma_interval_tree_insert    [kernel.kallsyms]  vma_interval_tree_insert+75      0  1
   4.13%    3.63%    0.00%    0.00%   0x20     0       1  0xffffffffa19440a2       553       413       415      962       215  [k] vma_interval_tree_remove    [kernel.kallsyms]  vma_interval_tree_remove+18      0  1
   2.04%    1.35%    0.00%    0.00%   0x28     0       1  0xffffffffa219a1d6      1210       855       460     1229       222  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
   0.62%    1.85%    0.00%    0.00%   0x28     0       1  0xffffffffa219a1bf       762       329       577      527       198  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
   0.48%    0.31%    0.00%    0.00%   0x28     0       1  0xffffffffa219a58c      1677      1476       733     1544       224  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
   0.05%    0.07%    0.00%    0.00%   0x28     0       1  0xffffffffa219a21d      1040       819       689       33        27  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+749    0  1
   0.00%    0.05%    0.00%    0.00%   0x28     0       1  0xffffffffa17707db         0      1005       786     1373       223  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
   0.00%    0.02%    0.00%    0.00%   0x28     0       1  0xffffffffa219a064         0       233       778       32        30  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
  33.82%   34.10%    0.00%    0.00%   0x30     0       1  0xffffffffa1770945       779       495       534     6011       224  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
  17.06%   15.28%    0.00%    0.00%   0x30     0       1  0xffffffffa1770915       593       438       468     2715       224  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1
   3.54%    3.52%    0.00%    0.00%   0x30     0       1  0xffffffffa2199f84       881       601       583     1421       223  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+84     0  1

> > The patch places the i_mmap and i_mmap_rwsem in separate cache lines
> > to avoid this false sharing problem.
> >
> > With this patch, on Intel Sapphire Rapids 2 sockets 112c/224t
> > platform, based on kernel v6.4-rc4, the 224 parallel score is improved
> > ~2.5% for UnixBench/Shell Scripts case. And perf c2c tool shows the
> > false sharing is resolved as expected, the symbol
> > vma_interval_tree_remove disappeared in cache line 0 after this change.
> 
> There can be many address_spaces in memory, so a size increase is a concern.
> Is there anything we can do to minimize the cost of this?

Thanks for your reminder of the memory size increased. After the padding, the
struct "address_space" need 4 cachelines to fill in, the memory size increased is
around ~33%.
Then I tried another approach by moving 'i_mmap_rwsem' under the field
'flags', no memory size increased for "address_space" and based on v6.4.0, on
Intel Sapphire Rapids 112C/224T platform, the score improves by ~5.3%. From
the perf c2c record data, the false sharing has been fixed for 'i_mmap' and
'i_mmap_rwsem'.

=================================================
      Shared Cache Line Distribution Pareto
=================================================
-------------------------------------------------------------
   0      556      838        0        0  0xff2780d7965d2780
-------------------------------------------------------------
    0.18%    0.60%    0.00%    0.00%    0x8     0       1  0xffffffffafff27b8       503       453       569       14        13  [k] do_dentry_open              [kernel.kallsyms]  do_dentry_open+456               0  1
    0.54%    0.12%    0.00%    0.00%    0x8     0       1  0xffffffffaffc51ac       510       199       428       15        12  [k] hugepage_vma_check          [kernel.kallsyms]  hugepage_vma_check+252           0  1
    1.80%    2.15%    0.00%    0.00%   0x18     0       1  0xffffffffb079a1d6      1778       799       343      215       136  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
    0.54%    1.31%    0.00%    0.00%   0x18     0       1  0xffffffffb079a1bf       547       296       528       91        71  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
    0.72%    0.72%    0.00%    0.00%   0x18     0       1  0xffffffffb079a58c      1479      1534       676      288       163  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
    0.00%    0.12%    0.00%    0.00%   0x18     0       1  0xffffffffafd707db         0      2381       744      282       158  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
    0.00%    0.12%    0.00%    0.00%   0x18     0       1  0xffffffffb079a064         0       239       518        6         6  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
   46.58%   47.02%    0.00%    0.00%   0x20     0       1  0xffffffffafd70945       704       403       499     1137       219  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
   23.92%   25.78%    0.00%    0.00%   0x20     0       1  0xffffffffafd70915       558       413       500      542       185  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1
 
 I will send another patch out and update the commit log.

Lipeng Zhu,
Best Regards


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH v2] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
  2023-07-16 14:48   ` Zhu, Lipeng
@ 2023-07-16 14:54     ` Zhu, Lipeng
  2023-07-16 14:56     ` Zhu, Lipeng
  1 sibling, 0 replies; 5+ messages in thread
From: Zhu, Lipeng @ 2023-07-16 14:54 UTC (permalink / raw)
  To: lipeng.zhu
  Cc: akpm, brauner, linux-fsdevel, linux-kernel, linux-mm, pan.deng,
	tianyou.li, tim.c.chen, viro, yu.ma

When running UnixBench/Shell Scripts, we observed high false sharing
for accessing i_mmap against i_mmap_rwsem.

UnixBench/Shell Scripts are typical load/execute command test scenarios,
which concurrently launch->execute->exit a lot of shell commands.
A lot of processes invoke vma_interval_tree_remove which touch
"i_mmap", the call stack:
----vma_interval_tree_remove
    |----unlink_file_vma
    |    free_pgtables
    |    |----exit_mmap
    |    |    mmput
    |    |    |----begin_new_exec
    |    |    |    load_elf_binary
    |    |    |    bprm_execve

Meanwhile, there are a lot of processes touch 'i_mmap_rwsem' to acquire
the semaphore in order to access 'i_mmap'.
In existing 'address_space' layout, 'i_mmap' and 'i_mmap_rwsem' are in
the same cacheline.

The patch places the i_mmap and i_mmap_rwsem in separate cache lines
to avoid this false sharing problem.

With this patch, based on kernel v6.4.0, on Intel Sapphire Rapids
112C/224T platform, the score improves by ~5.3%. And perf c2c tool
shows the false sharing is resolved as expected, the symbol
vma_interval_tree_remove disappeared in cache line 0 after this change.

Baseline:
=================================================
      Shared Cache Line Distribution Pareto
=================================================
-------------------------------------------------------------
    0    3729     5791        0        0  0xff19b3818445c740
-------------------------------------------------------------
   3.27%    3.02%    0.00%    0.00%   0x18     0       1  0xffffffffa194403b       604       483       389      692       203  [k] vma_interval_tree_insert    [kernel.kallsyms]  vma_interval_tree_insert+75      0  1
   4.13%    3.63%    0.00%    0.00%   0x20     0       1  0xffffffffa19440a2       553       413       415      962       215  [k] vma_interval_tree_remove    [kernel.kallsyms]  vma_interval_tree_remove+18      0  1
   2.04%    1.35%    0.00%    0.00%   0x28     0       1  0xffffffffa219a1d6      1210       855       460     1229       222  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
   0.62%    1.85%    0.00%    0.00%   0x28     0       1  0xffffffffa219a1bf       762       329       577      527       198  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
   0.48%    0.31%    0.00%    0.00%   0x28     0       1  0xffffffffa219a58c      1677      1476       733     1544       224  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
   0.05%    0.07%    0.00%    0.00%   0x28     0       1  0xffffffffa219a21d      1040       819       689       33        27  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+749    0  1
   0.00%    0.05%    0.00%    0.00%   0x28     0       1  0xffffffffa17707db         0      1005       786     1373       223  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
   0.00%    0.02%    0.00%    0.00%   0x28     0       1  0xffffffffa219a064         0       233       778       32        30  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
  33.82%   34.10%    0.00%    0.00%   0x30     0       1  0xffffffffa1770945       779       495       534     6011       224  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
  17.06%   15.28%    0.00%    0.00%   0x30     0       1  0xffffffffa1770915       593       438       468     2715       224  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1
   3.54%    3.52%    0.00%    0.00%   0x30     0       1  0xffffffffa2199f84       881       601       583     1421       223  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+84     0  1

With this change:
-------------------------------------------------------------
   0      556      838        0        0  0xff2780d7965d2780
-------------------------------------------------------------
    0.18%    0.60%    0.00%    0.00%    0x8     0       1  0xffffffffafff27b8       503       453       569       14        13  [k] do_dentry_open              [kernel.kallsyms]  do_dentry_open+456               0  1
    0.54%    0.12%    0.00%    0.00%    0x8     0       1  0xffffffffaffc51ac       510       199       428       15        12  [k] hugepage_vma_check          [kernel.kallsyms]  hugepage_vma_check+252           0  1
    1.80%    2.15%    0.00%    0.00%   0x18     0       1  0xffffffffb079a1d6      1778       799       343      215       136  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
    0.54%    1.31%    0.00%    0.00%   0x18     0       1  0xffffffffb079a1bf       547       296       528       91        71  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
    0.72%    0.72%    0.00%    0.00%   0x18     0       1  0xffffffffb079a58c      1479      1534       676      288       163  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
    0.00%    0.12%    0.00%    0.00%   0x18     0       1  0xffffffffafd707db         0      2381       744      282       158  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
    0.00%    0.12%    0.00%    0.00%   0x18     0       1  0xffffffffb079a064         0       239       518        6         6  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
   46.58%   47.02%    0.00%    0.00%   0x20     0       1  0xffffffffafd70945       704       403       499     1137       219  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
   23.92%   25.78%    0.00%    0.00%   0x20     0       1  0xffffffffafd70915       558       413       500      542       185  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1

v1->v2: change padding to exchange fields.

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
---
 include/linux/fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb24..4a525ed17eab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -451,11 +451,11 @@ struct address_space {
 	atomic_t		nr_thps;
 #endif
 	struct rb_root_cached	i_mmap;
-	struct rw_semaphore	i_mmap_rwsem;
 	unsigned long		nrpages;
 	pgoff_t			writeback_index;
 	const struct address_space_operations *a_ops;
 	unsigned long		flags;
+	struct rw_semaphore	i_mmap_rwsem;
 	errseq_t		wb_err;
 	spinlock_t		private_lock;
 	struct list_head	private_list;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* [PATCH v2] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
  2023-07-16 14:48   ` Zhu, Lipeng
  2023-07-16 14:54     ` [PATCH v2] " Zhu, Lipeng
@ 2023-07-16 14:56     ` Zhu, Lipeng
  1 sibling, 0 replies; 5+ messages in thread
From: Zhu, Lipeng @ 2023-07-16 14:56 UTC (permalink / raw)
  To: akpm
  Cc: lipeng.zhu, brauner, linux-fsdevel, linux-kernel, linux-mm,
	pan.deng, tianyou.li, tim.c.chen, viro, yu.ma, Zhu, Lipeng

When running UnixBench/Shell Scripts, we observed high false sharing
for accessing i_mmap against i_mmap_rwsem.

UnixBench/Shell Scripts are typical load/execute command test scenarios,
which concurrently launch->execute->exit a lot of shell commands.
A lot of processes invoke vma_interval_tree_remove which touch
"i_mmap", the call stack:
----vma_interval_tree_remove
    |----unlink_file_vma
    |    free_pgtables
    |    |----exit_mmap
    |    |    mmput
    |    |    |----begin_new_exec
    |    |    |    load_elf_binary
    |    |    |    bprm_execve

Meanwhile, there are a lot of processes touch 'i_mmap_rwsem' to acquire
the semaphore in order to access 'i_mmap'.
In existing 'address_space' layout, 'i_mmap' and 'i_mmap_rwsem' are in
the same cacheline.

The patch places the i_mmap and i_mmap_rwsem in separate cache lines
to avoid this false sharing problem.

With this patch, based on kernel v6.4.0, on Intel Sapphire Rapids
112C/224T platform, the score improves by ~5.3%. And perf c2c tool
shows the false sharing is resolved as expected, the symbol
vma_interval_tree_remove disappeared in cache line 0 after this change.

Baseline:
=================================================
      Shared Cache Line Distribution Pareto
=================================================
-------------------------------------------------------------
    0    3729     5791        0        0  0xff19b3818445c740
-------------------------------------------------------------
   3.27%    3.02%    0.00%    0.00%   0x18     0       1  0xffffffffa194403b       604       483       389      692       203  [k] vma_interval_tree_insert    [kernel.kallsyms]  vma_interval_tree_insert+75      0  1
   4.13%    3.63%    0.00%    0.00%   0x20     0       1  0xffffffffa19440a2       553       413       415      962       215  [k] vma_interval_tree_remove    [kernel.kallsyms]  vma_interval_tree_remove+18      0  1
   2.04%    1.35%    0.00%    0.00%   0x28     0       1  0xffffffffa219a1d6      1210       855       460     1229       222  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
   0.62%    1.85%    0.00%    0.00%   0x28     0       1  0xffffffffa219a1bf       762       329       577      527       198  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
   0.48%    0.31%    0.00%    0.00%   0x28     0       1  0xffffffffa219a58c      1677      1476       733     1544       224  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
   0.05%    0.07%    0.00%    0.00%   0x28     0       1  0xffffffffa219a21d      1040       819       689       33        27  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+749    0  1
   0.00%    0.05%    0.00%    0.00%   0x28     0       1  0xffffffffa17707db         0      1005       786     1373       223  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
   0.00%    0.02%    0.00%    0.00%   0x28     0       1  0xffffffffa219a064         0       233       778       32        30  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
  33.82%   34.10%    0.00%    0.00%   0x30     0       1  0xffffffffa1770945       779       495       534     6011       224  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
  17.06%   15.28%    0.00%    0.00%   0x30     0       1  0xffffffffa1770915       593       438       468     2715       224  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1
   3.54%    3.52%    0.00%    0.00%   0x30     0       1  0xffffffffa2199f84       881       601       583     1421       223  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+84     0  1

With this change:
-------------------------------------------------------------
   0      556      838        0        0  0xff2780d7965d2780
-------------------------------------------------------------
    0.18%    0.60%    0.00%    0.00%    0x8     0       1  0xffffffffafff27b8       503       453       569       14        13  [k] do_dentry_open              [kernel.kallsyms]  do_dentry_open+456               0  1
    0.54%    0.12%    0.00%    0.00%    0x8     0       1  0xffffffffaffc51ac       510       199       428       15        12  [k] hugepage_vma_check          [kernel.kallsyms]  hugepage_vma_check+252           0  1
    1.80%    2.15%    0.00%    0.00%   0x18     0       1  0xffffffffb079a1d6      1778       799       343      215       136  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+678    0  1
    0.54%    1.31%    0.00%    0.00%   0x18     0       1  0xffffffffb079a1bf       547       296       528       91        71  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+655    0  1
    0.72%    0.72%    0.00%    0.00%   0x18     0       1  0xffffffffb079a58c      1479      1534       676      288       163  [k] down_write                  [kernel.kallsyms]  down_write+28                    0  1
    0.00%    0.12%    0.00%    0.00%   0x18     0       1  0xffffffffafd707db         0      2381       744      282       158  [k] up_write                    [kernel.kallsyms]  up_write+27                      0  1
    0.00%    0.12%    0.00%    0.00%   0x18     0       1  0xffffffffb079a064         0       239       518        6         6  [k] rwsem_down_write_slowpath   [kernel.kallsyms]  rwsem_down_write_slowpath+308    0  1
   46.58%   47.02%    0.00%    0.00%   0x20     0       1  0xffffffffafd70945       704       403       499     1137       219  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+53           0  1
   23.92%   25.78%    0.00%    0.00%   0x20     0       1  0xffffffffafd70915       558       413       500      542       185  [k] rwsem_spin_on_owner         [kernel.kallsyms]  rwsem_spin_on_owner+5            0  1

v1->v2: change padding to exchange fields.

Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
---
 include/linux/fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb24..4a525ed17eab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -451,11 +451,11 @@ struct address_space {
 	atomic_t		nr_thps;
 #endif
 	struct rb_root_cached	i_mmap;
-	struct rw_semaphore	i_mmap_rwsem;
 	unsigned long		nrpages;
 	pgoff_t			writeback_index;
 	const struct address_space_operations *a_ops;
 	unsigned long		flags;
+	struct rw_semaphore	i_mmap_rwsem;
 	errseq_t		wb_err;
 	spinlock_t		private_lock;
 	struct list_head	private_list;
-- 
2.39.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-07-16 14:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-28 10:56 [PATCH] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing Zhu, Lipeng
2023-07-02 21:11 ` Andrew Morton
2023-07-16 14:48   ` Zhu, Lipeng
2023-07-16 14:54     ` [PATCH v2] " Zhu, Lipeng
2023-07-16 14:56     ` Zhu, Lipeng

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).