From: Ritesh Harjani <riteshh@linux.ibm.com>
To: linux-ext4@vger.kernel.org
Cc: "Theodore Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.com>,
	Andreas Dilger <adilger.kernel@dilger.ca>,
	Harshad Shirwadkar <harshadshirwadkar@gmail.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	Ritesh Harjani <riteshh@linux.ibm.com>
Subject: [RFC 0/1] ext4: Performance scalability improvement with fast_commit
Date: Mon, 14 Feb 2022 09:27:42 +0530
Message-ID: <cover.1644809996.git.riteshh@linux.ibm.com>

Hello,

I have recently started doing some filesystem performance scalability testing,
mainly on ext4 for now. This patch concerns the fast_commit feature.

While running fs_mark (with -s0 -S5) for scalability runs with fast_commit enabled,
I noticed some heavy contention in ext4_fc_commit() -> ext4_fc_commit_dentry_updates().

Analysis
===========
This is because:
1. To commit all the dentry updates using FC, we first loop over each dentry
   entry in sbi->s_fc_dentry_q.
2. Then, within that loop, for each of the above fc_dentry nodes, we loop again
   over each inode in sbi->s_fc_q to find the inode entry corresponding to
   fc_dentry->fcd_ino.
The second loop exists to find the corresponding inode so that, before
committing a dentry update into the FC area, we first write the inode data, then
the inode, and then the dentry. This gives the whole ext4_fc_commit() path
quadratic time complexity.

This is fine as long as a multi-threaded application updates only a limited number
of open files and then issues fsync on each/any of them.
But as the number of open files (tracked in the FC list) increases, performance
drops significantly (see the table below for details).

This RFC patch thus improves the performance of ext4_fc_commit() path by making
it linear time for doing dentry updates (ext4_fc_commit_dentry_updates()).
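
To make the complexity argument concrete, here is a minimal user-space sketch.
It is only a model under stated assumptions, not the kernel code and not
necessarily how the patch is implemented: the struct and function names are
hypothetical, and the `inode` back-pointer stored in each dentry update is an
assumed way of avoiding the inner scan. Both functions return the number of
nodes visited, so the two approaches can be compared directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-ins; these are NOT the kernel structures. */
struct model_inode {
	unsigned long ino;
	struct model_inode *next;		/* models the sbi->s_fc_q list */
};

struct model_dentry_update {
	unsigned long fcd_ino;			/* inode number the update refers to */
	struct model_inode *inode;		/* assumed back-pointer, set when tracked */
	struct model_dentry_update *next;	/* models the sbi->s_fc_dentry_q list */
};

/*
 * Pattern described in the analysis: for every dentry update, scan the
 * inode queue until the matching inode is found.
 * Returns the number of inode-queue nodes visited.
 */
static size_t commit_nested_scan(const struct model_dentry_update *dq,
				 const struct model_inode *iq)
{
	size_t visited = 0;

	for (const struct model_dentry_update *d = dq; d; d = d->next) {
		for (const struct model_inode *i = iq; i; i = i->next) {
			visited++;
			if (i->ino == d->fcd_ino)
				break;	/* real code would now write inode data, inode, dentry */
		}
	}
	return visited;
}

/*
 * Linear alternative (assumption): each dentry update already knows its
 * inode, so no inner scan is needed.
 * Returns the number of dentry updates visited.
 */
static size_t commit_direct(const struct model_dentry_update *dq)
{
	size_t visited = 0;

	for (const struct model_dentry_update *d = dq; d; d = d->next) {
		assert(d->inode && d->inode->ino == d->fcd_ino);
		visited++;
	}
	return visited;
}
```

With N dentry updates whose inodes sit at positions 1..N of the inode queue, the
nested scan visits N*(N+1)/2 nodes, while the direct walk visits N.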


Observations on perf table results
===================================
In the table below, performance problems start at row 6, where the without-patch
number actually decreases compared to the previous row (row 5).
From row 7 onwards the numbers are drastically lower. In fact, I observed
fs_mark getting completely stuck for quite some time and
progressing very slowly (with the params of row 7 onwards).


Observations on perf profile
===============================
Similar observations can be seen in the perf profile below, which was taken with the
params of row 8. Almost 87% of the time is wasted in that O(N^2) loop just to find
the inode corresponding to fc_dentry->fcd_ino.

[Table]: Absolute numbers in avg file creates per sec (from fs_mark, in units of 1K)
=======================================================================
Row   Order,dir&files   without-patch(K)   with-patch(K)   Diff(%)
1     1                 16.90              17.51           +3.60
2     2,2               32.08              31.80           -0.87
3     3,3               53.97              55.01           +1.92
4     4,4               78.94              76.90           -2.58
5     5,5               95.82              95.37           -0.46
6     6,6               87.92              103.38          +17.58
7     6,10              0.73               126.13          +17178.08
8     6,14              2.33               143.19          +6045.49
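
For reference, the Diff(%) values are consistent with the usual relative change
of the with-patch number against the without-patch number. A minimal sketch of
that arithmetic follows; the function name `diff_percent` is hypothetical and
only for illustration.

```c
#include <assert.h>

/*
 * Relative change of `after` versus `before`, in percent.
 * E.g. row 7: (126.13 - 0.73) / 0.73 * 100.
 */
static double diff_percent(double before, double after)
{
	assert(before > 0.0);	/* a zero baseline has no relative change */
	return (after - before) / before * 100.0;
}
```

Since the baseline in rows 7 and 8 is very small (0.73K and 2.33K), the
percentages there are dominated by the collapse without the patch rather than by
the with-patch number alone.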

Scalability run plots with different directory ways (/ threads) and no. of dirs/file
(w/o patches)
================================================================================

(Avg files/sec x1000) 				'fc_perf.txt' using 3:xtic(2)
  100 +--------------------------------------------------------------------+
      |       +      +       +       +      *       +       +      +       |
   90 |-+                                   *       *                    +-|
      |                                     *       *                      |
   80 |-+                            *      *       *                    +-|
      |                              *      *       *                      |
   70 |-+                            *      *       *                    +-|
      |                              *      *       *                      |
   60 |-+                            *      *       *                    +-|
      |                      *       *      *       *                      |
   50 |-+                    *       *      *       *                    +-|
      |                      *       *      *       *                      |
   40 |-+                    *       *      *       *                    +-|
      |                      *       *      *       *                      |
   30 |-+            *       *       *      *       *                    +-|
      |              *       *       *      *       *                      |
   20 |-+            *       *       *      *       *                    +-|
      |       *      *       *       *      *       *                      |
   10 |-+     *      *       *       *      *       *                    +-|
      |       *      *       *       *      *       *       +      +       |
    0 +--------------------------------------------------------------------+
             1,1     2,2     3,3     4,4    5,5     6,6    6,10   6,14 (order,dir & files)

	^^^^ extremely poor numbers at higher X values (w/o patch)

X-axis: 2^order dir ways, then 2^dir dirs & 2^files files.
	E.g. an x coordinate of 6,10 means 2^6 == 64 dir ways and 2^10 == 1024 dirs & files:
	echo /run/riteshh/mnt/{1..64} |sed -E 's/[[:space:]]+/ -d /g' | xargs -I {} bash -c "sudo fs_mark -L 100 -D 1024 -n 1024 -s0 -S5 -d {}"

Y-axis: Avg files per sec (x1000).
	E.g. a y coordinate of 100 represents 100K avg file creates per sec with fs_mark.
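
The x-axis encoding can be decoded with two shifts. The sketch below is only
illustrative; the struct and function names are hypothetical and are not part
of fs_mark or of the patch.

```c
#include <assert.h>

/* Decoded form of one x-axis label such as "6,10" (hypothetical type). */
struct fsmark_params {
	unsigned long dir_ways;		/* number of -d directories: 2^order */
	unsigned long dirs_and_files;	/* value used for both -D and -n */
};

/* Turn the two exponents of an x-axis label into concrete counts. */
static struct fsmark_params decode_x(unsigned int order, unsigned int dirfiles_log2)
{
	struct fsmark_params p;

	assert(order < 31 && dirfiles_log2 < 31);	/* keep the shifts in range */
	p.dir_ways = 1UL << order;
	p.dirs_and_files = 1UL << dirfiles_log2;
	return p;
}
```

For the label 6,10 this gives 64 dir ways and 1024 for -D/-n, matching the
fs_mark command line shown above.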


Perf profile
(w/o patches)
=============================
87.15%  [kernel]  [k] ext4_fc_commit 			--> Heavy contention/bottleneck
 1.98%  [kernel]  [k] perf_event_interrupt
 0.96%  [kernel]  [k] power_pmu_enable
 0.91%  [kernel]  [k] update_sd_lb_stats.constprop.0
 0.67%  [kernel]  [k] ktime_get


Scalability run plots with different directory ways (/ threads) and no. of dirs/file
(with patch)
================================================================================
(Avg files/sec x1000)
  160 +--------------------------------------------------------------------+
      |       +      +       +       +      +       +       +      +       |
  140 |-+                            'fc_perf.txt' using 4:xtic(2) *     +-|
      |                                                            *       |
      |                                                     *      *       |
  120 |-+                                                   *      *     +-|
      |                                                     *      *       |
  100 |-+                                           *       *      *     +-|
      |                                     *       *       *      *       |
      |                                     *       *       *      *       |
   80 |-+                            *      *       *       *      *     +-|
      |                              *      *       *       *      *       |
   60 |-+                            *      *       *       *      *     +-|
      |                      *       *      *       *       *      *       |
      |                      *       *      *       *       *      *       |
   40 |-+                    *       *      *       *       *      *     +-|
      |              *       *       *      *       *       *      *       |
   20 |-+            *       *       *      *       *       *      *     +-|
      |       *      *       *       *      *       *       *      *       |
      |       *      *       *       *      *       *       *      *       |
    0 +--------------------------------------------------------------------+
            1,1     2,2     3,3     4,4    5,5     6,6    6,10   6,14 (order, dir & files)

	^^^^ Shows linear scaling with this patch ;)

Perf profile
(with patch)
===========================
21.41%  [kernel]     [k] snooze_loop
18.67%  [kernel]     [k] _raw_spin_lock
12.34%  [kernel]     [k] _raw_spin_lock_irq
 5.02%  [kernel]     [k] update_sd_lb_stats.constprop.0
 1.91%  libc-2.31.so [.] __random
 1.85%  [kernel]     [k] _find_next_bit


xfstests results
==================
This has survived my fstests run with the -g log,metadata,auto groups
(CONFIG_KASAN disabled). I haven't found any regression due to this patch in my testing.

But in case I have missed any slippery corner cases of the fast_commit feature, a careful
review would really help, as always :)


Ritesh Harjani (1):
  ext4: Improve fast_commit performance and scalability

 fs/ext4/ext4.h        |  2 ++
 fs/ext4/fast_commit.c | 64 +++++++++++++++++++++++++++++++------------
 fs/ext4/fast_commit.h |  1 +
 3 files changed, 50 insertions(+), 17 deletions(-)

--
2.31.1

