KVM Archive on lore.kernel.org
 help / color / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-nvdimm@lists.01.org
Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com,
	dgilbert@redhat.com, swhiteho@redhat.com
Subject: [PATCH v2 24/30] fuse, dax: Take ->i_mmap_sem lock during dax page fault
Date: Wed, 15 May 2019 15:27:09 -0400
Message-ID: <20190515192715.18000-25-vgoyal@redhat.com> (raw)
In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com>

We need some kind of locking mechanism here. Normal file systems like
ext4 and xfs seems to take their own semaphore to protect agains
truncate while fault is going on.

We have additional requirement to protect against fuse dax memory range
reclaim. When a range has been selected for reclaim, we need to make sure
no other read/write/fault can try to access that memory range while
reclaim is in progress. Once reclaim is complete, lock will be released
and read/write/fault will trigger allocation of fresh dax range.

Taking inode_lock() is not an option in fault path as lockdep complains
about circular dependencies. So define a new fuse_inode->i_mmap_sem.

Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
---
 fs/fuse/dir.c    |  2 ++
 fs/fuse/file.c   | 17 +++++++++++++----
 fs/fuse/fuse_i.h |  7 +++++++
 fs/fuse/inode.c  |  1 +
 4 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c
index fd8636e67ae9..84c0b638affb 100644
--- a/fs/fuse/dir.c
+++ b/fs/fuse/dir.c
@@ -1559,8 +1559,10 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr,
 	 */
 	if ((is_truncate || !is_wb) &&
 	    S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) {
+		down_write(&fi->i_mmap_sem);
 		truncate_pagecache(inode, outarg.attr.size);
 		invalidate_inode_pages2(inode->i_mapping);
+		up_write(&fi->i_mmap_sem);
 	}
 
 	clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state);
diff --git a/fs/fuse/file.c b/fs/fuse/file.c
index 2777355bc245..e536a04aaa06 100644
--- a/fs/fuse/file.c
+++ b/fs/fuse/file.c
@@ -2638,13 +2638,20 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size,
 	if (write)
 		sb_start_pagefault(sb);
 
-	/* TODO inode semaphore to protect faults vs truncate */
-
+	/*
+	 * We need to serialize against not only truncate but also against
+	 * fuse dax memory range reclaim. While a range is being reclaimed,
+	 * we do not want any read/write/mmap to make progress and try
+	 * to populate page cache or access memory we are trying to free.
+	 */
+	down_read(&get_fuse_inode(inode)->i_mmap_sem);
 	ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops);
 
 	if (ret & VM_FAULT_NEEDDSYNC)
 		ret = dax_finish_sync_fault(vmf, pe_size, pfn);
 
+	up_read(&get_fuse_inode(inode)->i_mmap_sem);
+
 	if (write)
 		sb_end_pagefault(sb);
 
@@ -3593,9 +3600,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset,
 			file_update_time(file);
 	}
 
-	if (mode & FALLOC_FL_PUNCH_HOLE)
+	if (mode & FALLOC_FL_PUNCH_HOLE) {
+		down_write(&fi->i_mmap_sem);
 		truncate_pagecache_range(inode, offset, offset + length - 1);
-
+		up_write(&fi->i_mmap_sem);
+	}
 	fuse_invalidate_attr(inode);
 
 out:
diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h
index f1ae549eff98..a234cf30538d 100644
--- a/fs/fuse/fuse_i.h
+++ b/fs/fuse/fuse_i.h
@@ -212,6 +212,13 @@ struct fuse_inode {
 	 */
 	struct rw_semaphore i_dmap_sem;
 
+	/**
+	 * Can't take inode lock in fault path (leads to circular dependency).
+	 * So take this in fuse dax fault path to make sure truncate and
+	 * punch hole etc. can't make progress in parallel.
+	 */
+	struct rw_semaphore i_mmap_sem;
+
 	/** Sorted rb tree of struct fuse_dax_mapping elements */
 	struct rb_root_cached dmap_tree;
 	unsigned long nr_dmaps;
diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c
index ad66a353554b..713c5f32ab35 100644
--- a/fs/fuse/inode.c
+++ b/fs/fuse/inode.c
@@ -85,6 +85,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb)
 	fi->state = 0;
 	fi->nr_dmaps = 0;
 	mutex_init(&fi->mutex);
+	init_rwsem(&fi->i_mmap_sem);
 	init_rwsem(&fi->i_dmap_sem);
 	spin_lock_init(&fi->lock);
 	fi->forget = fuse_alloc_forget();
-- 
2.20.1


  parent reply index

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-15 19:26 [PATCH v2 00/30] [RFC] virtio-fs: shared file system for virtual machines Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 01/30] fuse: delete dentry if timeout is zero Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 02/30] fuse: Clear setuid bit even in cache=never path Vivek Goyal
2019-05-20 14:41   ` Miklos Szeredi
2019-05-20 14:44     ` Miklos Szeredi
2019-05-20 20:25       ` Nikolaus Rath
2019-05-21 15:01     ` Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 03/30] fuse: Use default_file_splice_read for direct IO Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 04/30] fuse: export fuse_end_request() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 05/30] fuse: export fuse_len_args() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 06/30] fuse: Export fuse_send_init_request() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 07/30] fuse: export fuse_get_unique() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 08/30] fuse: extract fuse_fill_super_common() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 09/30] fuse: add fuse_iqueue_ops callbacks Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 10/30] fuse: Separate fuse device allocation and installation in fuse_conn Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 11/30] virtio_fs: add skeleton virtio_fs.ko module Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 12/30] dax: remove block device dependencies Vivek Goyal
2019-05-16  0:21   ` Dan Williams
2019-05-16 10:07     ` Stefan Hajnoczi
2019-05-16 14:23     ` Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 13/30] dax: Pass dax_dev to dax_writeback_mapping_range() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 14/30] virtio: Add get_shm_region method Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 15/30] virtio: Implement get_shm_region for PCI transport Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 16/30] virtio: Implement get_shm_region for MMIO transport Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 17/30] fuse, dax: add fuse_conn->dax_dev field Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device Vivek Goyal
2019-07-17 17:27   ` Halil Pasic
2019-07-18  9:04     ` Cornelia Huck
2019-07-18 11:20       ` Halil Pasic
2019-07-18 14:47         ` Cornelia Huck
2019-07-18 13:15     ` Vivek Goyal
2019-07-18 14:30       ` Dan Williams
2019-07-22 10:51         ` Christian Borntraeger
2019-07-22 10:56           ` Dr. David Alan Gilbert
2019-07-22 11:20             ` Christian Borntraeger
2019-07-22 11:43               ` Cornelia Huck
2019-07-22 12:00                 ` Christian Borntraeger
2019-07-22 12:08                   ` David Hildenbrand
2019-07-29 13:20                     ` Stefan Hajnoczi
2019-05-15 19:27 ` [PATCH v2 19/30] fuse: Keep a list of free dax memory ranges Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 20/30] fuse: Introduce setupmapping/removemapping commands Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 21/30] fuse, dax: Implement dax read/write operations Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 22/30] fuse, dax: add DAX mmap support Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 23/30] fuse: Define dax address space operations Vivek Goyal
2019-05-15 19:27 ` Vivek Goyal [this message]
2019-05-15 19:27 ` [PATCH v2 25/30] fuse: Maintain a list of busy elements Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 26/30] fuse: Add logic to free up a memory range Vivek Goyal
     [not found]   ` <CAN+Pk99SNKSf+GjSQUUWt_eu1fSjTy_ByUOEQUXHi8zNqXY1zA@mail.gmail.com>
2019-05-20 12:53     ` Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 27/30] fuse: Release file in process context Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 28/30] fuse: Reschedule dax free work if too many EAGAIN attempts Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 29/30] fuse: Take inode lock for dax inode truncation Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 30/30] virtio-fs: Do not provide abort interface in fusectl Vivek Goyal

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20190515192715.18000-25-vgoyal@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=miklos@szeredi.hu \
    --cc=stefanha@redhat.com \
    --cc=swhiteho@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

KVM Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/kvm/0 kvm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 kvm kvm/ https://lore.kernel.org/kvm \
		kvm@vger.kernel.org kvm@archiver.kernel.org
	public-inbox-index kvm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.kvm


AGPL code for this site: git clone https://public-inbox.org/ public-inbox