From: Waiman Long <Waiman.Long@hpe.com>
To: Tejun Heo <tj@kernel.org>,
	Christoph Lameter <cl@linux-foundation.org>,
	Dave Chinner <dchinner@redhat.com>
Cc: xfs@oss.sgi.com, linux-kernel@vger.kernel.org,
	Ingo Molnar <mingo@redhat.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Scott J Norton <scott.norton@hp.com>,
	Douglas Hatch <doug.hatch@hp.com>,
	Waiman Long <Waiman.Long@hpe.com>
Subject: [RFC PATCH 2/2] xfs: Allow degeneration of m_fdblocks/m_ifree to global counters
Date: Fri,  4 Mar 2016 21:51:39 -0500
Message-ID: <1457146299-1601-3-git-send-email-Waiman.Long@hpe.com>
In-Reply-To: <1457146299-1601-1-git-send-email-Waiman.Long@hpe.com>

Small XFS filesystems on systems with a large number of CPUs can incur
significant overhead from excessive calls to percpu_counter_sum(),
which has to walk through a large number of different cachelines.

This patch uses the newly added percpu_counter_set_limit() API to
potentially switch the m_fdblocks and m_ifree per-cpu counters to
global counters with locks at filesystem mount time if the filesystem
is small relative to the number of CPUs available.

A possible use case is an NVDIMM used as an application scratch storage
area for log files and other small files. Current battery-backed
NVDIMMs are fairly small, e.g. 8G per DIMM, so large filesystems cannot
be created on top of them.

On a 4-socket 80-thread system running a 4.5-rc6 kernel, this patch can
improve the throughput of the AIM7 XFS disk workload by 25%. Before
the patch, the perf profile was:

  18.68%   0.08%  reaim  [k] __percpu_counter_compare
  18.05%   9.11%  reaim  [k] __percpu_counter_sum
   0.37%   0.36%  reaim  [k] __percpu_counter_add

After the patch, the perf profile was:

   0.73%   0.36%  reaim  [k] __percpu_counter_add
   0.27%   0.27%  reaim  [k] __percpu_counter_compare

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
---
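For readers less familiar with the slow path named in the changelog, here
is a minimal user-space sketch of how a batched per-CPU counter comparison
decides between the cheap approximate read and the full per-CPU sum. It
follows my understanding of __percpu_counter_compare() in
lib/percpu_counter.c, but it is a model, not kernel code: struct
model_counter, model_sum(), model_compare(), MODEL_NCPUS and MODEL_BATCH
are all hypothetical names, and the fixed CPU count stands in for
num_online_cpus().

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative values: 80 CPUs and a batch of 1024, as in the changelog. */
#define MODEL_NCPUS 80
#define MODEL_BATCH 1024

/* Hypothetical user-space stand-in for struct percpu_counter. */
struct model_counter {
	long long count;              /* shared, approximate value */
	long long pcpu[MODEL_NCPUS];  /* per-CPU deltas not yet folded in */
};

/* Precise value: has to visit every per-CPU slot. */
static long long model_sum(const struct model_counter *c)
{
	long long sum = c->count;

	for (int cpu = 0; cpu < MODEL_NCPUS; cpu++)
		sum += c->pcpu[cpu];
	return sum;
}

/*
 * Returns 1, 0 or -1 if the counter is above, equal to or below rhs.
 * *took_slow_path is set to 1 when the full sum was needed.
 */
static int model_compare(const struct model_counter *c, long long rhs,
			 int *took_slow_path)
{
	long long rough, precise;

	assert(c && took_slow_path);

	/* Fast path: the rough count is far enough from rhs to be trusted. */
	rough = c->count;
	if (llabs(rough - rhs) > (long long)MODEL_BATCH * MODEL_NCPUS) {
		*took_slow_path = 0;
		return rough > rhs ? 1 : -1;
	}

	/* Slow path: too close to call, so sum all per-CPU deltas. */
	*took_slow_path = 1;
	precise = model_sum(c);
	if (precise > rhs)
		return 1;
	if (precise < rhs)
		return -1;
	return 0;
}
```

With these values the slow-path window is 1024 * 80 = 81920 counts on
either side of the threshold. In this model, a counter that rarely leaves
that window takes the summing path on most comparisons, which is the
situation the changelog describes for small filesystems.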
 fs/xfs/xfs_mount.c |    1 -
 fs/xfs/xfs_mount.h |    5 +++++
 fs/xfs/xfs_super.c |    6 ++++++
 3 files changed, 11 insertions(+), 1 deletions(-)

diff --git a/fs/xfs/xfs_mount.c b/fs/xfs/xfs_mount.c
index bb753b3..fe74b91 100644
--- a/fs/xfs/xfs_mount.c
+++ b/fs/xfs/xfs_mount.c
@@ -1163,7 +1163,6 @@ xfs_mod_ifree(
  * a large batch count (1024) to minimise global counter updates except when
  * we get near to ENOSPC and we have to be very accurate with our updates.
  */
-#define XFS_FDBLOCKS_BATCH	1024
 int
 xfs_mod_fdblocks(
 	struct xfs_mount	*mp,
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index b570984..d9520f4 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -206,6 +206,11 @@ typedef struct xfs_mount {
 #define	XFS_WSYNC_WRITEIO_LOG	14	/* 16k */
 
 /*
+ * FD blocks batch size for per-cpu compare
+ */
+#define XFS_FDBLOCKS_BATCH	1024
+
+/*
  * Allow large block sizes to be reported to userspace programs if the
  * "largeio" mount option is used.
  *
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 59c9b7b..c0b4f79 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1412,6 +1412,12 @@ xfs_reinit_percpu_counters(
 	percpu_counter_set(&mp->m_icount, mp->m_sb.sb_icount);
 	percpu_counter_set(&mp->m_ifree, mp->m_sb.sb_ifree);
 	percpu_counter_set(&mp->m_fdblocks, mp->m_sb.sb_fdblocks);
+
+	/*
+	 * Use default batch size for m_ifree
+	 */
+	percpu_counter_set_limit(&mp->m_ifree, 0);
+	percpu_counter_set_limit(&mp->m_fdblocks, 4 * XFS_FDBLOCKS_BATCH);
 }
 
 static void
-- 
1.7.1
