* [PATCH] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
From: Zhu, Lipeng @ 2023-06-28 10:56 UTC (permalink / raw)
To: akpm, viro, brauner
Cc: linux-fsdevel, linux-kernel, linux-mm, pan.deng, yu.ma,
tianyou.li, tim.c.chen, Zhu, Lipeng
When running UnixBench/Shell Scripts, we observed heavy false sharing
between accesses to i_mmap and i_mmap_rwsem.

UnixBench/Shell Scripts is a typical load/execute command scenario:
i_mmap is frequently accessed to insert into and remove from the
vma_interval_tree, while i_mmap_rwsem is frequently loaded.
Unfortunately, the two fields sit in the same cacheline.

This patch places i_mmap and i_mmap_rwsem in separate cache lines to
avoid the false sharing.
With this patch, on a 2-socket Intel Sapphire Rapids platform
(112 cores / 224 threads), based on kernel v6.4-rc4, the 224-parallel
score improves by ~2.5% for the UnixBench/Shell Scripts case. The perf
c2c tool confirms the false sharing is resolved: the symbol
vma_interval_tree_remove disappears from cache line 0 after this change.
Baseline:
=================================================
Shared Cache Line Distribution Pareto
=================================================
-------------------------------------------------------------
0 13642 19392 9012 63 0xff1ddd3f0c8a3b00
-------------------------------------------------------------
9.22% 7.37% 0.00% 0.00% 0x0 0 1 0xffffffffab344052 518 334 354 5490 160 [k] vma_interval_tree_remove [kernel.kallsyms] vma_interval_tree_remove+18 0 1
0.71% 0.73% 0.00% 0.00% 0x8 0 1 0xffffffffabb9a21f 574 338 458 1991 160 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+655 0 1
0.52% 0.71% 5.34% 6.35% 0x8 0 1 0xffffffffabb9a236 1080 597 390 4848 160 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+678 0 1
0.56% 0.47% 26.39% 6.35% 0x8 0 1 0xffffffffabb9a5ec 1327 1037 587 8537 160 [k] down_write [kernel.kallsyms] down_write+28 0 1
0.11% 0.08% 15.72% 1.59% 0x8 0 1 0xffffffffab17082b 1618 1077 735 7303 160 [k] up_write [kernel.kallsyms] up_write+27 0 1
0.01% 0.02% 0.08% 0.00% 0x8 0 1 0xffffffffabb9a27d 1594 593 512 53 43 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+749 0 1
0.00% 0.01% 0.00% 0.00% 0x8 0 1 0xffffffffabb9a0c4 0 323 518 97 74 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+308 0 1
44.74% 49.78% 0.00% 0.00% 0x10 0 1 0xffffffffab170995 609 344 430 26841 160 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+53 0 1
26.62% 22.39% 0.00% 0.00% 0x10 0 1 0xffffffffab170965 514 347 437 13364 160 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+5 0 1
With this change:
-------------------------------------------------------------
0 12726 18554 9039 49 0xff157a0f25b90c40
-------------------------------------------------------------
0.90% 0.72% 0.00% 0.00% 0x0 1 1 0xffffffffa5f9a21f 532 353 461 2200 160 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+655 0 1
0.53% 0.70% 5.16% 6.12% 0x0 1 1 0xffffffffa5f9a236 1196 670 403 4774 160 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+678 0 1
0.68% 0.51% 25.91% 6.12% 0x0 1 1 0xffffffffa5f9a5ec 1049 807 540 8552 160 [k] down_write [kernel.kallsyms] down_write+28 0 1
0.09% 0.06% 16.50% 2.04% 0x0 1 1 0xffffffffa557082b 1693 1351 758 7317 160 [k] up_write [kernel.kallsyms] up_write+27 0 1
0.01% 0.00% 0.00% 0.00% 0x0 1 1 0xffffffffa5f9a0c4 543 0 491 89 68 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+308 0 1
0.00% 0.01% 0.02% 0.00% 0x0 1 1 0xffffffffa5f9a27d 0 597 742 45 40 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+749 0 1
49.29% 53.01% 0.00% 0.00% 0x8 1 1 0xffffffffa5570995 580 310 413 27106 160 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+53 0 1
28.60% 24.12% 0.00% 0.00% 0x8 1 1 0xffffffffa5570965 490 321 419 13244 160 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+5 0 1
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
---
include/linux/fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c85916e9f7db..d3dd8dcc9b8b 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -434,7 +434,7 @@ struct address_space {
atomic_t nr_thps;
#endif
struct rb_root_cached i_mmap;
- struct rw_semaphore i_mmap_rwsem;
+ struct rw_semaphore i_mmap_rwsem ____cacheline_aligned_in_smp;
unsigned long nrpages;
pgoff_t writeback_index;
const struct address_space_operations *a_ops;
--
2.39.1
* Re: [PATCH] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
From: Andrew Morton @ 2023-07-02 21:11 UTC (permalink / raw)
To: Zhu, Lipeng
Cc: viro, brauner, linux-fsdevel, linux-kernel, linux-mm, pan.deng,
yu.ma, tianyou.li, tim.c.chen
On Wed, 28 Jun 2023 18:56:25 +0800 "Zhu, Lipeng" <lipeng.zhu@intel.com> wrote:
> When running UnixBench/Shell Scripts, we observed heavy false sharing
> between accesses to i_mmap and i_mmap_rwsem.
>
> UnixBench/Shell Scripts is a typical load/execute command scenario:
> i_mmap is frequently accessed to insert into and remove from the
> vma_interval_tree, while i_mmap_rwsem is frequently loaded.
> Unfortunately, the two fields sit in the same cacheline.
That sounds odd. One would expect these two fields to be used in
close conjunction, so any sharing might even be beneficial. Can you
identify in more detail what's actually going on in there?
> This patch places i_mmap and i_mmap_rwsem in separate cache lines to
> avoid the false sharing.
>
> With this patch, on a 2-socket Intel Sapphire Rapids platform
> (112 cores / 224 threads), based on kernel v6.4-rc4, the 224-parallel
> score improves by ~2.5% for the UnixBench/Shell Scripts case. The perf
> c2c tool confirms the false sharing is resolved: the symbol
> vma_interval_tree_remove disappears from cache line 0 after this change.
There can be many address_spaces in memory, so a size increase is a
concern. Is there anything we can do to minimize the cost of this?
* RE: [PATCH] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
From: Zhu, Lipeng @ 2023-07-16 14:48 UTC (permalink / raw)
To: Andrew Morton
Cc: viro, brauner, linux-fsdevel, linux-kernel, linux-mm, Deng, Pan,
Ma, Yu, Li, Tianyou, tim.c.chen
Hi Andrew,
> -----Original Message-----
> From: Andrew Morton <akpm@linux-foundation.org>
> Sent: Monday, July 3, 2023 5:11 AM
> To: Zhu, Lipeng <lipeng.zhu@intel.com>
> Cc: viro@zeniv.linux.org.uk; brauner@kernel.org; linux-
> fsdevel@vger.kernel.org; linux-kernel@vger.kernel.org; linux-mm@kvack.org;
> Deng, Pan <pan.deng@intel.com>; Ma, Yu <yu.ma@intel.com>; Li, Tianyou
> <tianyou.li@intel.com>; tim.c.chen@linux.intel.com
> Subject: Re: [PATCH] fs/address_space: add alignment padding for i_map and
> i_mmap_rwsem to mitigate a false sharing.
>
> On Wed, 28 Jun 2023 18:56:25 +0800 "Zhu, Lipeng" <lipeng.zhu@intel.com>
> wrote:
>
> > When running UnixBench/Shell Scripts, we observed heavy false
> > sharing between accesses to i_mmap and i_mmap_rwsem.
> >
> > UnixBench/Shell Scripts is a typical load/execute command scenario:
> > i_mmap is frequently accessed to insert into and remove from the
> > vma_interval_tree, while i_mmap_rwsem is frequently loaded.
> > Unfortunately, the two fields sit in the same cacheline.
>
> That sounds odd. One would expect these two fields to be used in close
> conjunction, so any sharing might even be beneficial. Can you identify in more
> detail what's actually going on in there?
>
Yes. I'm running UnixBench/Shell Scripts, which concurrently
launch->execute->exit a lot of shell commands.

During the workload run:

1: Many processes invoke vma_interval_tree_remove, which touches
"i_mmap"; the call stack:
----vma_interval_tree_remove
|----unlink_file_vma
| free_pgtables
| |----exit_mmap
| | mmput
| | |----begin_new_exec
| | | load_elf_binary
| | | bprm_execve
2: Meanwhile, many processes touch 'i_mmap_rwsem' to acquire the
semaphore in order to access 'i_mmap'.

In the existing 'address_space' layout, 'i_mmap' and 'i_mmap_rwsem' are
in the same cacheline:
struct address_space {
struct inode * host; /* 0 8 */
struct xarray i_pages; /* 8 16 */
struct rw_semaphore invalidate_lock; /* 24 40 */
/* --- cacheline 1 boundary (64 bytes) --- */
gfp_t gfp_mask; /* 64 4 */
atomic_t i_mmap_writable; /* 68 4 */
struct rb_root_cached i_mmap; /* 72 16 */
struct rw_semaphore i_mmap_rwsem; /* 88 40 */
/* --- cacheline 2 boundary (128 bytes) --- */
long unsigned int nrpages; /* 128 8 */
long unsigned int writeback_index; /* 136 8 */
const struct address_space_operations * a_ops; /* 144 8 */
long unsigned int flags; /* 152 8 */
errseq_t wb_err; /* 160 4 */
spinlock_t private_lock; /* 164 4 */
struct list_head private_list; /* 168 16 */
void * private_data; /* 184 8 */
/* size: 192, cachelines: 3, members: 15 */ };
The following perf c2c result shows heavy cache-to-cache bouncing
caused by the false sharing:
=================================================
Shared Cache Line Distribution Pareto
=================================================
-------------------------------------------------------------
0 3729 5791 0 0 0xff19b3818445c740
-------------------------------------------------------------
3.27% 3.02% 0.00% 0.00% 0x18 0 1 0xffffffffa194403b 604 483 389 692 203 [k] vma_interval_tree_insert [kernel.kallsyms] vma_interval_tree_insert+75 0 1
4.13% 3.63% 0.00% 0.00% 0x20 0 1 0xffffffffa19440a2 553 413 415 962 215 [k] vma_interval_tree_remove [kernel.kallsyms] vma_interval_tree_remove+18 0 1
2.04% 1.35% 0.00% 0.00% 0x28 0 1 0xffffffffa219a1d6 1210 855 460 1229 222 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+678 0 1
0.62% 1.85% 0.00% 0.00% 0x28 0 1 0xffffffffa219a1bf 762 329 577 527 198 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+655 0 1
0.48% 0.31% 0.00% 0.00% 0x28 0 1 0xffffffffa219a58c 1677 1476 733 1544 224 [k] down_write [kernel.kallsyms] down_write+28 0 1
0.05% 0.07% 0.00% 0.00% 0x28 0 1 0xffffffffa219a21d 1040 819 689 33 27 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+749 0 1
0.00% 0.05% 0.00% 0.00% 0x28 0 1 0xffffffffa17707db 0 1005 786 1373 223 [k] up_write [kernel.kallsyms] up_write+27 0 1
0.00% 0.02% 0.00% 0.00% 0x28 0 1 0xffffffffa219a064 0 233 778 32 30 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+308 0 1
33.82% 34.10% 0.00% 0.00% 0x30 0 1 0xffffffffa1770945 779 495 534 6011 224 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+53 0 1
17.06% 15.28% 0.00% 0.00% 0x30 0 1 0xffffffffa1770915 593 438 468 2715 224 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+5 0 1
3.54% 3.52% 0.00% 0.00% 0x30 0 1 0xffffffffa2199f84 881 601 583 1421 223 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+84 0 1
> > This patch places i_mmap and i_mmap_rwsem in separate cache lines
> > to avoid the false sharing.
> >
> > With this patch, on a 2-socket Intel Sapphire Rapids platform
> > (112 cores / 224 threads), based on kernel v6.4-rc4, the
> > 224-parallel score improves by ~2.5% for the UnixBench/Shell Scripts
> > case. The perf c2c tool confirms the false sharing is resolved: the
> > symbol vma_interval_tree_remove disappears from cache line 0 after
> > this change.
>
> There can be many address_spaces in memory, so a size increase is a concern.
> Is there anything we can do to minimize the cost of this?
Thanks for the reminder about the memory size increase. With the
padding, struct "address_space" grows from 3 to 4 cachelines, i.e. its
size increases by ~33%.

I then tried another approach: moving 'i_mmap_rwsem' below the 'flags'
field. This does not increase the size of "address_space" at all, and
based on v6.4.0, on the Intel Sapphire Rapids 112C/224T platform, the
score improves by ~5.3%. The perf c2c record data shows the false
sharing between 'i_mmap' and 'i_mmap_rwsem' is fixed.
=================================================
Shared Cache Line Distribution Pareto
=================================================
-------------------------------------------------------------
0 556 838 0 0 0xff2780d7965d2780
-------------------------------------------------------------
0.18% 0.60% 0.00% 0.00% 0x8 0 1 0xffffffffafff27b8 503 453 569 14 13 [k] do_dentry_open [kernel.kallsyms] do_dentry_open+456 0 1
0.54% 0.12% 0.00% 0.00% 0x8 0 1 0xffffffffaffc51ac 510 199 428 15 12 [k] hugepage_vma_check [kernel.kallsyms] hugepage_vma_check+252 0 1
1.80% 2.15% 0.00% 0.00% 0x18 0 1 0xffffffffb079a1d6 1778 799 343 215 136 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+678 0 1
0.54% 1.31% 0.00% 0.00% 0x18 0 1 0xffffffffb079a1bf 547 296 528 91 71 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+655 0 1
0.72% 0.72% 0.00% 0.00% 0x18 0 1 0xffffffffb079a58c 1479 1534 676 288 163 [k] down_write [kernel.kallsyms] down_write+28 0 1
0.00% 0.12% 0.00% 0.00% 0x18 0 1 0xffffffffafd707db 0 2381 744 282 158 [k] up_write [kernel.kallsyms] up_write+27 0 1
0.00% 0.12% 0.00% 0.00% 0x18 0 1 0xffffffffb079a064 0 239 518 6 6 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+308 0 1
46.58% 47.02% 0.00% 0.00% 0x20 0 1 0xffffffffafd70945 704 403 499 1137 219 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+53 0 1
23.92% 25.78% 0.00% 0.00% 0x20 0 1 0xffffffffafd70915 558 413 500 542 185 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+5 0 1
I will send another patch out and update the commit log.
Best Regards,
Lipeng Zhu
* [PATCH v2] fs/address_space: add alignment padding for i_map and i_mmap_rwsem to mitigate a false sharing.
From: Zhu, Lipeng @ 2023-07-16 14:54 UTC (permalink / raw)
To: lipeng.zhu
Cc: akpm, brauner, linux-fsdevel, linux-kernel, linux-mm, pan.deng,
tianyou.li, tim.c.chen, viro, yu.ma
When running UnixBench/Shell Scripts, we observed heavy false sharing
between accesses to i_mmap and i_mmap_rwsem.

UnixBench/Shell Scripts is a typical load/execute command scenario that
concurrently launches->executes->exits a lot of shell commands. Many
processes invoke vma_interval_tree_remove, which touches "i_mmap"; the
call stack:
----vma_interval_tree_remove
|----unlink_file_vma
| free_pgtables
| |----exit_mmap
| | mmput
| | |----begin_new_exec
| | | load_elf_binary
| | | bprm_execve
Meanwhile, many processes touch 'i_mmap_rwsem' to acquire the semaphore
in order to access 'i_mmap'. In the existing 'address_space' layout,
'i_mmap' and 'i_mmap_rwsem' are in the same cacheline.

This patch places i_mmap and i_mmap_rwsem in separate cache lines to
avoid the false sharing.

With this patch, based on kernel v6.4.0, on the Intel Sapphire Rapids
112C/224T platform, the score improves by ~5.3%, and the perf c2c tool
confirms the false sharing is resolved: the symbol
vma_interval_tree_remove disappears from cache line 0 after this change.
Baseline:
=================================================
Shared Cache Line Distribution Pareto
=================================================
-------------------------------------------------------------
0 3729 5791 0 0 0xff19b3818445c740
-------------------------------------------------------------
3.27% 3.02% 0.00% 0.00% 0x18 0 1 0xffffffffa194403b 604 483 389 692 203 [k] vma_interval_tree_insert [kernel.kallsyms] vma_interval_tree_insert+75 0 1
4.13% 3.63% 0.00% 0.00% 0x20 0 1 0xffffffffa19440a2 553 413 415 962 215 [k] vma_interval_tree_remove [kernel.kallsyms] vma_interval_tree_remove+18 0 1
2.04% 1.35% 0.00% 0.00% 0x28 0 1 0xffffffffa219a1d6 1210 855 460 1229 222 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+678 0 1
0.62% 1.85% 0.00% 0.00% 0x28 0 1 0xffffffffa219a1bf 762 329 577 527 198 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+655 0 1
0.48% 0.31% 0.00% 0.00% 0x28 0 1 0xffffffffa219a58c 1677 1476 733 1544 224 [k] down_write [kernel.kallsyms] down_write+28 0 1
0.05% 0.07% 0.00% 0.00% 0x28 0 1 0xffffffffa219a21d 1040 819 689 33 27 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+749 0 1
0.00% 0.05% 0.00% 0.00% 0x28 0 1 0xffffffffa17707db 0 1005 786 1373 223 [k] up_write [kernel.kallsyms] up_write+27 0 1
0.00% 0.02% 0.00% 0.00% 0x28 0 1 0xffffffffa219a064 0 233 778 32 30 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+308 0 1
33.82% 34.10% 0.00% 0.00% 0x30 0 1 0xffffffffa1770945 779 495 534 6011 224 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+53 0 1
17.06% 15.28% 0.00% 0.00% 0x30 0 1 0xffffffffa1770915 593 438 468 2715 224 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+5 0 1
3.54% 3.52% 0.00% 0.00% 0x30 0 1 0xffffffffa2199f84 881 601 583 1421 223 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+84 0 1
With this change:
-------------------------------------------------------------
0 556 838 0 0 0xff2780d7965d2780
-------------------------------------------------------------
0.18% 0.60% 0.00% 0.00% 0x8 0 1 0xffffffffafff27b8 503 453 569 14 13 [k] do_dentry_open [kernel.kallsyms] do_dentry_open+456 0 1
0.54% 0.12% 0.00% 0.00% 0x8 0 1 0xffffffffaffc51ac 510 199 428 15 12 [k] hugepage_vma_check [kernel.kallsyms] hugepage_vma_check+252 0 1
1.80% 2.15% 0.00% 0.00% 0x18 0 1 0xffffffffb079a1d6 1778 799 343 215 136 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+678 0 1
0.54% 1.31% 0.00% 0.00% 0x18 0 1 0xffffffffb079a1bf 547 296 528 91 71 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+655 0 1
0.72% 0.72% 0.00% 0.00% 0x18 0 1 0xffffffffb079a58c 1479 1534 676 288 163 [k] down_write [kernel.kallsyms] down_write+28 0 1
0.00% 0.12% 0.00% 0.00% 0x18 0 1 0xffffffffafd707db 0 2381 744 282 158 [k] up_write [kernel.kallsyms] up_write+27 0 1
0.00% 0.12% 0.00% 0.00% 0x18 0 1 0xffffffffb079a064 0 239 518 6 6 [k] rwsem_down_write_slowpath [kernel.kallsyms] rwsem_down_write_slowpath+308 0 1
46.58% 47.02% 0.00% 0.00% 0x20 0 1 0xffffffffafd70945 704 403 499 1137 219 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+53 0 1
23.92% 25.78% 0.00% 0.00% 0x20 0 1 0xffffffffafd70915 558 413 500 542 185 [k] rwsem_spin_on_owner [kernel.kallsyms] rwsem_spin_on_owner+5 0 1
v1->v2: replace the cacheline padding with field reordering, so the
struct size does not grow.
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Signed-off-by: Lipeng Zhu <lipeng.zhu@intel.com>
---
include/linux/fs.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 133f0640fb24..4a525ed17eab 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -451,11 +451,11 @@ struct address_space {
atomic_t nr_thps;
#endif
struct rb_root_cached i_mmap;
- struct rw_semaphore i_mmap_rwsem;
unsigned long nrpages;
pgoff_t writeback_index;
const struct address_space_operations *a_ops;
unsigned long flags;
+ struct rw_semaphore i_mmap_rwsem;
errseq_t wb_err;
spinlock_t private_lock;
struct list_head private_list;
--
2.39.1