linux-bcachefs.vger.kernel.org archive mirror
From: Dave Chinner <david@fromorbit.com>
To: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org,
	linux-bcachefs@vger.kernel.org,
	Dave Chinner <dchinner@redhat.com>,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	Christian Brauner <brauner@kernel.org>
Subject: Re: [PATCH 22/32] vfs: inode cache conversion to hash-bl
Date: Wed, 10 May 2023 14:45:57 +1000
Message-ID: <20230510044557.GF2651828@dread.disaster.area>
In-Reply-To: <20230509165657.1735798-23-kent.overstreet@linux.dev>

On Tue, May 09, 2023 at 12:56:47PM -0400, Kent Overstreet wrote:
> From: Dave Chinner <dchinner@redhat.com>
> 
> Because scalability of the global inode_hash_lock really, really
> sucks.
> 
> 32-way concurrent create on a couple of different filesystems
> before:
> 
> -   52.13%     0.04%  [kernel]            [k] ext4_create
>    - 52.09% ext4_create
>       - 41.03% __ext4_new_inode
>          - 29.92% insert_inode_locked
>             - 25.35% _raw_spin_lock
>                - do_raw_spin_lock
>                   - 24.97% __pv_queued_spin_lock_slowpath
> 
> -   72.33%     0.02%  [kernel]            [k] do_filp_open
>    - 72.31% do_filp_open
>       - 72.28% path_openat
>          - 57.03% bch2_create
>             - 56.46% __bch2_create
>                - 40.43% inode_insert5
>                   - 36.07% _raw_spin_lock
>                      - do_raw_spin_lock
>                           35.86% __pv_queued_spin_lock_slowpath
>                     4.02% find_inode
> 
> Convert the inode hash table to a RCU-aware hash-bl table just like
> the dentry cache. Note that we need to store a pointer to the
> hlist_bl_head the inode has been added to in the inode so that when
> it comes to unhash the inode we know what list to lock. We need to
> do this because the hash value that is used to hash the inode is
> generated from the inode itself - filesystems can provide this
> themselves so we have to either store the hash or the head pointer
> in the inode to be able to find the right list head for removal...
> 
> Same workload after:
> 
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Christian Brauner <brauner@kernel.org>
> Cc: linux-fsdevel@vger.kernel.org
> Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>

I have been keeping this patchset up to date in my own local trees
and the code in this patch looks the same. The commit message above,
however, has been mangled. The full commit message should be:

vfs: inode cache conversion to hash-bl

Because scalability of the global inode_hash_lock really, really
sucks and prevents me from doing scalability characterisation and
analysis of bcachefs algorithms.

Profiles of a 32-way concurrent create of 51.2m inodes with fsmark
on a couple of different filesystems on a 5.10 kernel:

-   52.13%     0.04%  [kernel]            [k] ext4_create
   - 52.09% ext4_create
      - 41.03% __ext4_new_inode
         - 29.92% insert_inode_locked
            - 25.35% _raw_spin_lock
               - do_raw_spin_lock
                  - 24.97% __pv_queued_spin_lock_slowpath


-   72.33%     0.02%  [kernel]            [k] do_filp_open
   - 72.31% do_filp_open
      - 72.28% path_openat
         - 57.03% bch2_create
            - 56.46% __bch2_create
               - 40.43% inode_insert5
                  - 36.07% _raw_spin_lock
                     - do_raw_spin_lock
                          35.86% __pv_queued_spin_lock_slowpath
                    4.02% find_inode

btrfs was tested but it is limited by internal lock contention at
>=2 threads on this workload, so never hammers the inode cache lock
hard enough for this change to matter to its performance.

However, both bcachefs and ext4 demonstrate poor scaling at >=8
threads on concurrent lookup or create workloads.

Hence convert the inode hash table to an RCU-aware hash-bl table just
like the dentry cache. Note that we need to store a pointer to the
hlist_bl_head the inode has been added to in the inode so that when
it comes to unhash the inode we know what list to lock. We need to
do this because, unlike the dentry cache, the hash value that is
used to hash the inode is not generated from the inode itself. i.e.
filesystems can provide this themselves so we have to either store
the hashval or the hlist head pointer in the inode to be able to
find the right list head for removal...
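
As a rough userspace illustration of the head-pointer idea described
above (this is not the code from the patch): each bucket keeps its lock
in bit 0 of the list head word, and each inode remembers which bucket it
was hashed on, so removal can lock the right list without being told the
hash value again. All names here (toy_inode, bucket, toy_hash,
toy_lookup, toy_unhash, NBUCKETS) are made up for the sketch. C11 atomics
stand in for the kernel's bit spinlock, there is no RCU, lookups simply
take the bucket lock, and the caller is assumed to serialise hash/unhash
of the same inode.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NBUCKETS 64u
#define LOCK_BIT ((uintptr_t)1)

/* Bit 0 of 'first' is the bucket lock; the other bits point at the first inode. */
struct bucket {
	_Atomic uintptr_t first;
};

struct toy_inode {
	unsigned long ino;
	struct toy_inode *next;
	struct bucket *head;	/* bucket we are hashed on, NULL if unhashed */
};

static struct bucket table[NBUCKETS];

static void bucket_lock(struct bucket *b)
{
	/* Spin until we are the thread that flipped the lock bit from 0 to 1. */
	while (atomic_fetch_or_explicit(&b->first, LOCK_BIT,
					memory_order_acquire) & LOCK_BIT)
		;
}

static void bucket_unlock(struct bucket *b)
{
	atomic_fetch_and_explicit(&b->first, ~LOCK_BIT, memory_order_release);
}

/* Both helpers below must be called with the bucket lock held. */
static struct toy_inode *bucket_first(struct bucket *b)
{
	uintptr_t v = atomic_load_explicit(&b->first, memory_order_relaxed);

	return (struct toy_inode *)(v & ~LOCK_BIT);
}

static void bucket_set_first(struct bucket *b, struct toy_inode *inode)
{
	assert(((uintptr_t)inode & LOCK_BIT) == 0);	/* bit 0 must be free */
	atomic_store_explicit(&b->first, (uintptr_t)inode | LOCK_BIT,
			      memory_order_relaxed);
}

/* hashval comes from the caller (the "filesystem"), not from the inode. */
static void toy_hash(struct toy_inode *inode, unsigned long hashval)
{
	struct bucket *b = &table[hashval % NBUCKETS];

	bucket_lock(b);
	inode->next = bucket_first(b);
	inode->head = b;	/* remember the list for later removal */
	bucket_set_first(b, inode);
	bucket_unlock(b);
}

static struct toy_inode *toy_lookup(unsigned long hashval, unsigned long ino)
{
	struct bucket *b = &table[hashval % NBUCKETS];
	struct toy_inode *cur;

	bucket_lock(b);
	for (cur = bucket_first(b); cur; cur = cur->next)
		if (cur->ino == ino)
			break;
	bucket_unlock(b);
	return cur;
}

/* No hashval argument: the stored head pointer tells us which list to lock. */
static void toy_unhash(struct toy_inode *inode)
{
	struct bucket *b = inode->head;
	struct toy_inode *cur;

	if (!b)
		return;
	bucket_lock(b);
	cur = bucket_first(b);
	if (cur == inode) {
		bucket_set_first(b, inode->next);
	} else {
		while (cur && cur->next != inode)
			cur = cur->next;
		if (cur)
			cur->next = inode->next;
	}
	inode->next = NULL;
	inode->head = NULL;
	bucket_unlock(b);
}
```

The alternative mentioned above, storing the hashval instead of the head
pointer, would replace the 'head' field with the hash value and
recompute the bucket in toy_unhash(). Either way the inode has to carry
something, because the hash cannot be rederived from the inode alone.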

Concurrent create with varying thread count (files/s):

                ext4                    bcachefs
threads         vanilla  patched        vanilla patched
2               117k     112k            80k     85k
4               185k     190k           133k    145k
8               303k     346k           185k    255k
16              389k     465k           190k    420k
32              360k     437k           142k    481k

CPU usage for both bcachefs and ext4 at 16 and 32 threads has been
halved on the patched kernel, while performance has increased
marginally on ext4 and massively on bcachefs. Internal filesystem
algorithms now limit performance on these workloads, not the global
inode_hash_lock.
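
To put numbers on "marginally" and "massively", the ratio of patched to
vanilla throughput can be read straight off the table above. The small
helper below is only a convenience sketch; the struct and function names
are invented for it, and the figures are the files/s values from the
table, in thousands.

```c
#include <stddef.h>

/* Throughput in thousands of files/s, copied from the table above. */
struct result {
	unsigned threads;
	double ext4_vanilla, ext4_patched;
	double bcachefs_vanilla, bcachefs_patched;
};

static const struct result results[] = {
	{  2, 117, 112,  80,  85 },
	{  4, 185, 190, 133, 145 },
	{  8, 303, 346, 185, 255 },
	{ 16, 389, 465, 190, 420 },
	{ 32, 360, 437, 142, 481 },
};

static const size_t nr_results = sizeof(results) / sizeof(results[0]);

/* Ratio of patched to vanilla throughput; > 1.0 means the patch helped. */
static double speedup(double vanilla, double patched)
{
	return patched / vanilla;
}
```

For the 32-thread row, speedup() gives roughly 1.2 for ext4 and roughly
3.4 for bcachefs. The 2-thread ext4 row is the only one that comes out
below 1.0.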

Profile of the workloads on the patched kernels:

-   35.94%     0.07%  [kernel]                  [k] ext4_create
   - 35.87% ext4_create
      - 20.45% __ext4_new_inode
...
           3.36% insert_inode_locked

   - 78.43% do_filp_open
      - 78.36% path_openat
         - 53.95% bch2_create
            - 47.99% __bch2_create
....
              - 7.57% inode_insert5
                    6.94% find_inode

Spinlock contention is largely gone from the inode hash operations
and the filesystems are limited by contention in their internal
algorithms.

Signed-off-by: Dave Chinner <dchinner@redhat.com>
---

Other than that, the diffstat is the same and I don't see any obvious
differences in the code compared to what I've been running locally.

-Dave.
-- 
Dave Chinner
david@fromorbit.com
