Linux-ext4 Archive on lore.kernel.org
 help / color / Atom feed
From: Dave Chinner <david@fromorbit.com>
To: ira.weiny@intel.com
Cc: linux-kernel@vger.kernel.org,
	Alexander Viro <viro@zeniv.linux.org.uk>,
	"Darrick J. Wong" <darrick.wong@oracle.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Christoph Hellwig <hch@lst.de>,
	"Theodore Y. Ts'o" <tytso@mit.edu>, Jan Kara <jack@suse.cz>,
	linux-ext4@vger.kernel.org, linux-xfs@vger.kernel.org,
	linux-fsdevel@vger.kernel.org
Subject: Re: [PATCH v3 07/12] fs: Add locking for a dynamic DAX state
Date: Tue, 11 Feb 2020 19:00:35 +1100
Message-ID: <20200211080035.GI10776@dread.disaster.area> (raw)
In-Reply-To: <20200208193445.27421-8-ira.weiny@intel.com>

On Sat, Feb 08, 2020 at 11:34:40AM -0800, ira.weiny@intel.com wrote:
> From: Ira Weiny <ira.weiny@intel.com>
> 
> DAX requires special address space operations but many other functions
> check the IS_DAX() state.
> 
> While DAX is a property of the inode we perfer a lock at the super block
> level because of the overhead of a rwsem within the inode.
> 
> Define a vfs per superblock percpu rs semaphore to lock the DAX state

????

> while performing various VFS layer operations.  Write lock calls are
> provided here but are used in subsequent patches by the file systems
> themselves.
> 
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> 
> ---
> Changes from V2
> 
> 	Rebase on linux-next-08-02-2020
> 
> 	Fix locking order
> 	Change all references from mode to state where appropriate
> 	add CONFIG_FS_DAX requirement for state change
> 	Use a static branch to enable locking only when a dax capable
> 		device has been seen.
> 
> 	Move the lock to a global vfs lock
> 
> 		this does a few things
> 			1) preps us better for ext4 support
> 			2) removes funky callbacks from inode ops
> 			3) remove complexity from XFS and probably from
> 			   ext4 later
> 
> 		We can do this because
> 			1) the locking order is required to be at the
> 			   highest level anyway, so why complicate xfs
> 			2) We had to move the sem to the super_block
> 			   because it is too heavy for the inode.
> 			3) After internal discussions with Dan we
> 			   decided that this would be easier, just as
> 			   performant, and with slightly less overhead
> 			   than in the VFS SB.
> 
> 		We also change the functions names to up/down;
> 		read/write as appropriate.  Previous names were over
> 		simplified.

This, IMO, is a bit of a train wreck.

This patch has nothing to do with "DAX state", it's about
serialising access to the aops vector. There should be zero
references to DAX in this patch at all, except maybe to say
"switching DAX on dynamically requires atomic switching of address
space ops".

Big problems I see here:

1. static key to turn it on globally.
  - just a gross hack around doing it properly with a per-sb
    mechanism and enbaling it only on filesystems that are on DAX
    capable block devices.
  - you're already passing in an inode to all these functions. It's
    trivial to do:

	if (!inode->i_sb->s_flags & S_DYNAMIC_AOPS)
		return
	/* do sb->s_aops_lock manipulation */

2. global lock
  - OMG!
  - global lock will cause entire system IO/page fault stalls
    when someone does recursive/bulk DAX flag modification
    operations. Per-cpu rwsem Contention on  large systems will be
    utterly awful.
  - ext4's use case almost never hits the exclusive lock side of the
    percpu-rwsem - only when changing the journal mode flag on the
    inode. And it only affects writeback in progress, so it's not
    going to have massive concurrency on it like a VFS level global
    lock has. -> Bad model to follow.
  - per-sb lock is trivial - see #1 - which limits scope to single
    filesystem
  - per-inode rwsem would make this problem go away entirely.

3. If we can use a global per-cpu rwsem, why can't we just use a
   per-inode rwsem?
  - locking context rules are the same
  - rwsem scales pretty damn well for shared ops
  - no "global" contention problems
  - small enough that we can put another rwsem in the inode.

4. "inode_dax_state_up_read"
  - Eye bleeds.
  - this is about the aops structure serialisation, not dax.
  - The name makes no sense in the places that it has been added.

5. We already have code that does almost exactly what we need: the
   superblock freezing infrastructure.
  - freezing implements a "hold operations on this superblock until
    further notice" model we can easily follow.
  - sb_start_write/sb_end_write provides a simple API model and a
    clean, clear and concise naming convention we can use, too.


Really, I'm struggling to understand how we got to "global locking
that stops the world" from "need to change per-inode state
atomically". Can someone please explain to me why this patch isn't
just a simple set of largely self-explanitory functions like this:

XX_start_aop()
XX_end_aop()

XX_lock_aops()
XX_switch_aops(aops)
XX_unlock_aops()

where "XX" is "sb" or "inode" depending on what locking granularity
is used...

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com

  reply index

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-02-08 19:34 [PATCH v3 00/12] Enable per-file/directory DAX operations V3 ira.weiny
2020-02-08 19:34 ` [PATCH v3 01/12] fs/stat: Define DAX statx attribute ira.weiny
2020-02-08 19:34 ` [PATCH v3 02/12] fs/xfs: Isolate the physical DAX flag from effective ira.weiny
2020-02-08 19:34 ` [PATCH v3 03/12] fs/xfs: Separate functionality of xfs_inode_supports_dax() ira.weiny
2020-02-11  5:47   ` Dave Chinner
2020-02-11 16:13     ` Ira Weiny
2020-02-08 19:34 ` [PATCH v3 04/12] fs/xfs: Clean up DAX support check ira.weiny
2020-02-11  5:57   ` Dave Chinner
2020-02-11 16:28     ` Ira Weiny
2020-02-11 20:38       ` Dave Chinner
2020-02-08 19:34 ` [PATCH v3 05/12] fs: remove unneeded IS_DAX() check ira.weiny
2020-02-11  5:34   ` Dave Chinner
2020-02-11 16:38     ` Ira Weiny
2020-02-11 20:41       ` Dave Chinner
2020-02-12 16:04         ` Ira Weiny
2020-02-08 19:34 ` [PATCH v3 06/12] fs/xfs: Check if the inode supports DAX under lock ira.weiny
2020-02-11  6:16   ` Dave Chinner
2020-02-11 17:55     ` Ira Weiny
2020-02-11 20:42       ` Dave Chinner
2020-02-12 16:10         ` Ira Weiny
2020-02-08 19:34 ` [PATCH v3 07/12] fs: Add locking for a dynamic DAX state ira.weiny
2020-02-11  8:00   ` Dave Chinner [this message]
2020-02-11 20:14     ` Ira Weiny
2020-02-11 20:59       ` Dan Williams
2020-02-11 21:49       ` Dave Chinner
2020-02-12  6:31         ` Darrick J. Wong
2020-02-08 19:34 ` [PATCH v3 08/12] fs/xfs: Clarify lockdep dependency for xfs_isilocked() ira.weiny
2020-02-08 19:34 ` [PATCH v3 09/12] fs/xfs: Add write DAX lock to xfs layer ira.weiny
2020-02-08 19:34 ` [PATCH v3 10/12] fs: Prevent DAX state change if file is mmap'ed ira.weiny
2020-02-08 19:34 ` [PATCH v3 11/12] fs/xfs: Clean up locking in dax invalidate ira.weiny
2020-02-08 19:34 ` [PATCH v3 12/12] fs/xfs: Allow toggle of effective DAX flag ira.weiny
2020-02-10 15:15 ` [PATCH v3 00/12] Enable per-file/directory DAX operations V3 Jeff Moyer
2020-02-11 20:17   ` Ira Weiny
2020-02-12 19:49     ` Jeff Moyer
2020-02-13 19:01       ` Ira Weiny
2020-02-13 19:05         ` Ira Weiny
2020-02-13 19:58           ` Darrick J. Wong
2020-02-13 23:29             ` Ira Weiny
2020-02-14  0:16               ` Dan Williams
2020-02-14 20:06                 ` Ira Weiny
2020-02-14 21:23                   ` Jeff Moyer
2020-02-14 21:58                     ` Ira Weiny
2020-02-14 22:06                       ` Jeff Moyer
2020-02-14 22:58                         ` Jeff Moyer
2020-02-14 23:03                           ` Jeff Moyer
2020-02-18  2:35                           ` Ira Weiny
2020-02-18 14:22                             ` Jeff Moyer
2020-02-18 23:54                               ` Ira Weiny
2020-02-20 16:20                                 ` Ira Weiny
2020-02-20 16:30                                   ` Darrick J. Wong
2020-02-20 16:49                                     ` Ira Weiny
2020-02-20 17:00                                       ` Darrick J. Wong

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200211080035.GI10776@dread.disaster.area \
    --to=david@fromorbit.com \
    --cc=dan.j.williams@intel.com \
    --cc=darrick.wong@oracle.com \
    --cc=hch@lst.de \
    --cc=ira.weiny@intel.com \
    --cc=jack@suse.cz \
    --cc=linux-ext4@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-xfs@vger.kernel.org \
    --cc=tytso@mit.edu \
    --cc=viro@zeniv.linux.org.uk \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-ext4 Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-ext4/0 linux-ext4/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-ext4 linux-ext4/ https://lore.kernel.org/linux-ext4 \
		linux-ext4@vger.kernel.org
	public-inbox-index linux-ext4

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-ext4


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git