linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Shaokun Zhang <zhangshaokun@hisilicon.com>
To: <linux-fsdevel@vger.kernel.org>, <linux-kernel@vger.kernel.org>
Cc: Shaokun Zhang <zhangshaokun@hisilicon.com>,
	Will Deacon <will@kernel.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Boqun Feng <boqun.feng@gmail.com>, Yuqi Jin <jinyuqi@huawei.com>
Subject: [PATCH RESEND] fs: Move @f_count to different cacheline with @f_mode
Date: Wed, 24 Jun 2020 16:32:28 +0800	[thread overview]
Message-ID: <1592987548-8653-1-git-send-email-zhangshaokun@hisilicon.com> (raw)

get_file_rcu_many, which is called by __fget_files, has used
atomic_try_cmpxchg now and it can reduce the access number of the global
variable to improve the performance of atomic instruction compared with
atomic_cmpxchg. 

__fget_files does check the @f_mode with mask variable and will do some
atomic operations on @f_count, but both are on the same cacheline.
Many CPU cores do file access and it will cause much conflicts on @f_count. 
If we could make the two members into different cachelines, it shall relax
the siutations.

We have tested this on ARM64 and X86, the result is as follows:
Syscall of unixbench has been run on Huawei Kunpeng920 with this patch:
24 x System Call Overhead  1

System Call Overhead                    3160841.4 lps   (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
System Call Overhead                          15000.0    3160841.4   2107.2
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2107.2

Without this patch:
24 x System Call Overhead  1

System Call Overhead                    2222456.0 lps   (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
System Call Overhead                          15000.0    2222456.0   1481.6
                                                                   ========
System Benchmarks Index Score (Partial Only)                         1481.6

And on Intel 6248 platform with this patch:
40 CPUs in system; running 24 parallel copies of tests

System Call Overhead                        4288509.1 lps   (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
System Call Overhead                          15000.0    4288509.1   2859.0
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2859.0

Without this patch:
40 CPUs in system; running 24 parallel copies of tests

System Call Overhead                        3666313.0 lps   (10.0 s, 1 samples)

System Benchmarks Partial Index              BASELINE       RESULT    INDEX
System Call Overhead                          15000.0    3666313.0   2444.2
                                                                   ========
System Benchmarks Index Score (Partial Only)                         2444.2

Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Boqun Feng <boqun.feng@gmail.com>
Signed-off-by: Yuqi Jin <jinyuqi@huawei.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
 include/linux/fs.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 3f881a892ea7..0faeab5622fb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -955,7 +955,6 @@ struct file {
 	 */
 	spinlock_t		f_lock;
 	enum rw_hint		f_write_hint;
-	atomic_long_t		f_count;
 	unsigned int 		f_flags;
 	fmode_t			f_mode;
 	struct mutex		f_pos_lock;
@@ -979,6 +978,7 @@ struct file {
 	struct address_space	*f_mapping;
 	errseq_t		f_wb_err;
 	errseq_t		f_sb_err; /* for syncfs */
+	atomic_long_t		f_count;
 } __randomize_layout
   __attribute__((aligned(4)));	/* lest something weird decides that 2 is OK */
 
-- 
2.7.4


             reply	other threads:[~2020-06-24  8:34 UTC|newest]

Thread overview: 8+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-06-24  8:32 Shaokun Zhang [this message]
2020-07-08  7:23 ` [fs] 936e92b615: unixbench.score 32.3% improvement kernel test robot
2020-07-13  8:49   ` Shaokun Zhang
2020-08-21 16:02 ` [PATCH RESEND] fs: Move @f_count to different cacheline with @f_mode Will Deacon
2020-08-26  7:24   ` Shaokun Zhang
2020-08-26  8:24     ` Aleksa Sarai
2020-08-27  8:27       ` Shaokun Zhang
2020-09-08 11:37 ` Jan Kara

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1592987548-8653-1-git-send-email-zhangshaokun@hisilicon.com \
    --to=zhangshaokun@hisilicon.com \
    --cc=boqun.feng@gmail.com \
    --cc=jinyuqi@huawei.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=peterz@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).