From: Vivek Goyal <vgoyal@redhat.com> To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: swhiteho@redhat.com, dgilbert@redhat.com, stefanha@redhat.com, miklos@szeredi.hu Subject: [PATCH v2 24/30] fuse, dax: Take ->i_mmap_sem lock during dax page fault Date: Wed, 15 May 2019 15:27:09 -0400 [thread overview] Message-ID: <20190515192715.18000-25-vgoyal@redhat.com> (raw) In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> We need some kind of locking mechanism here. Normal file systems like ext4 and xfs seems to take their own semaphore to protect agains truncate while fault is going on. We have additional requirement to protect against fuse dax memory range reclaim. When a range has been selected for reclaim, we need to make sure no other read/write/fault can try to access that memory range while reclaim is in progress. Once reclaim is complete, lock will be released and read/write/fault will trigger allocation of fresh dax range. Taking inode_lock() is not an option in fault path as lockdep complains about circular dependencies. So define a new fuse_inode->i_mmap_sem. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> --- fs/fuse/dir.c | 2 ++ fs/fuse/file.c | 17 +++++++++++++---- fs/fuse/fuse_i.h | 7 +++++++ fs/fuse/inode.c | 1 + 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index fd8636e67ae9..84c0b638affb 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1559,8 +1559,10 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, */ if ((is_truncate || !is_wb) && S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { + down_write(&fi->i_mmap_sem); truncate_pagecache(inode, outarg.attr.size); invalidate_inode_pages2(inode->i_mapping); + up_write(&fi->i_mmap_sem); } clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 2777355bc245..e536a04aaa06 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2638,13 +2638,20 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, if (write) sb_start_pagefault(sb); - /* TODO inode semaphore to protect faults vs truncate */ - + /* + * We need to serialize against not only truncate but also against + * fuse dax memory range reclaim. While a range is being reclaimed, + * we do not want any read/write/mmap to make progress and try + * to populate page cache or access memory we are trying to free. + */ + down_read(&get_fuse_inode(inode)->i_mmap_sem); ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); + up_read(&get_fuse_inode(inode)->i_mmap_sem); + if (write) sb_end_pagefault(sb); @@ -3593,9 +3600,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, file_update_time(file); } - if (mode & FALLOC_FL_PUNCH_HOLE) + if (mode & FALLOC_FL_PUNCH_HOLE) { + down_write(&fi->i_mmap_sem); truncate_pagecache_range(inode, offset, offset + length - 1); - + up_write(&fi->i_mmap_sem); + } fuse_invalidate_attr(inode); out: diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f1ae549eff98..a234cf30538d 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -212,6 +212,13 @@ struct fuse_inode { */ struct rw_semaphore i_dmap_sem; + /** + * Can't take inode lock in fault path (leads to circular dependency). + * So take this in fuse dax fault path to make sure truncate and + * punch hole etc. can't make progress in parallel. + */ + struct rw_semaphore i_mmap_sem; + /** Sorted rb tree of struct fuse_dax_mapping elements */ struct rb_root_cached dmap_tree; unsigned long nr_dmaps; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index ad66a353554b..713c5f32ab35 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -85,6 +85,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->state = 0; fi->nr_dmaps = 0; mutex_init(&fi->mutex); + init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_dmap_sem); spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); -- 2.20.1 _______________________________________________ Linux-nvdimm mailing list Linux-nvdimm@lists.01.org https://lists.01.org/mailman/listinfo/linux-nvdimm
WARNING: multiple messages have this Message-ID (diff)
From: Vivek Goyal <vgoyal@redhat.com> To: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, linux-nvdimm@lists.01.org Cc: vgoyal@redhat.com, miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com, swhiteho@redhat.com Subject: [PATCH v2 24/30] fuse, dax: Take ->i_mmap_sem lock during dax page fault Date: Wed, 15 May 2019 15:27:09 -0400 [thread overview] Message-ID: <20190515192715.18000-25-vgoyal@redhat.com> (raw) In-Reply-To: <20190515192715.18000-1-vgoyal@redhat.com> We need some kind of locking mechanism here. Normal file systems like ext4 and xfs seems to take their own semaphore to protect agains truncate while fault is going on. We have additional requirement to protect against fuse dax memory range reclaim. When a range has been selected for reclaim, we need to make sure no other read/write/fault can try to access that memory range while reclaim is in progress. Once reclaim is complete, lock will be released and read/write/fault will trigger allocation of fresh dax range. Taking inode_lock() is not an option in fault path as lockdep complains about circular dependencies. So define a new fuse_inode->i_mmap_sem. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> --- fs/fuse/dir.c | 2 ++ fs/fuse/file.c | 17 +++++++++++++---- fs/fuse/fuse_i.h | 7 +++++++ fs/fuse/inode.c | 1 + 4 files changed, 23 insertions(+), 4 deletions(-) diff --git a/fs/fuse/dir.c b/fs/fuse/dir.c index fd8636e67ae9..84c0b638affb 100644 --- a/fs/fuse/dir.c +++ b/fs/fuse/dir.c @@ -1559,8 +1559,10 @@ int fuse_do_setattr(struct dentry *dentry, struct iattr *attr, */ if ((is_truncate || !is_wb) && S_ISREG(inode->i_mode) && oldsize != outarg.attr.size) { + down_write(&fi->i_mmap_sem); truncate_pagecache(inode, outarg.attr.size); invalidate_inode_pages2(inode->i_mapping); + up_write(&fi->i_mmap_sem); } clear_bit(FUSE_I_SIZE_UNSTABLE, &fi->state); diff --git a/fs/fuse/file.c b/fs/fuse/file.c index 2777355bc245..e536a04aaa06 100644 --- a/fs/fuse/file.c +++ b/fs/fuse/file.c @@ -2638,13 +2638,20 @@ static int __fuse_dax_fault(struct vm_fault *vmf, enum page_entry_size pe_size, if (write) sb_start_pagefault(sb); - /* TODO inode semaphore to protect faults vs truncate */ - + /* + * We need to serialize against not only truncate but also against + * fuse dax memory range reclaim. While a range is being reclaimed, + * we do not want any read/write/mmap to make progress and try + * to populate page cache or access memory we are trying to free. + */ + down_read(&get_fuse_inode(inode)->i_mmap_sem); ret = dax_iomap_fault(vmf, pe_size, &pfn, NULL, &fuse_iomap_ops); if (ret & VM_FAULT_NEEDDSYNC) ret = dax_finish_sync_fault(vmf, pe_size, pfn); + up_read(&get_fuse_inode(inode)->i_mmap_sem); + if (write) sb_end_pagefault(sb); @@ -3593,9 +3600,11 @@ static long fuse_file_fallocate(struct file *file, int mode, loff_t offset, file_update_time(file); } - if (mode & FALLOC_FL_PUNCH_HOLE) + if (mode & FALLOC_FL_PUNCH_HOLE) { + down_write(&fi->i_mmap_sem); truncate_pagecache_range(inode, offset, offset + length - 1); - + up_write(&fi->i_mmap_sem); + } fuse_invalidate_attr(inode); out: diff --git a/fs/fuse/fuse_i.h b/fs/fuse/fuse_i.h index f1ae549eff98..a234cf30538d 100644 --- a/fs/fuse/fuse_i.h +++ b/fs/fuse/fuse_i.h @@ -212,6 +212,13 @@ struct fuse_inode { */ struct rw_semaphore i_dmap_sem; + /** + * Can't take inode lock in fault path (leads to circular dependency). + * So take this in fuse dax fault path to make sure truncate and + * punch hole etc. can't make progress in parallel. + */ + struct rw_semaphore i_mmap_sem; + /** Sorted rb tree of struct fuse_dax_mapping elements */ struct rb_root_cached dmap_tree; unsigned long nr_dmaps; diff --git a/fs/fuse/inode.c b/fs/fuse/inode.c index ad66a353554b..713c5f32ab35 100644 --- a/fs/fuse/inode.c +++ b/fs/fuse/inode.c @@ -85,6 +85,7 @@ static struct inode *fuse_alloc_inode(struct super_block *sb) fi->state = 0; fi->nr_dmaps = 0; mutex_init(&fi->mutex); + init_rwsem(&fi->i_mmap_sem); init_rwsem(&fi->i_dmap_sem); spin_lock_init(&fi->lock); fi->forget = fuse_alloc_forget(); -- 2.20.1
next prev parent reply other threads:[~2019-05-15 19:27 UTC|newest] Thread overview: 103+ messages / expand[flat|nested] mbox.gz Atom feed top 2019-05-15 19:26 [PATCH v2 00/30] [RFC] virtio-fs: shared file system for virtual machines Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 01/30] fuse: delete dentry if timeout is zero Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 02/30] fuse: Clear setuid bit even in cache=never path Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-20 14:41 ` Miklos Szeredi 2019-05-20 14:41 ` Miklos Szeredi 2019-05-20 14:44 ` Miklos Szeredi 2019-05-20 14:44 ` Miklos Szeredi 2019-05-20 20:25 ` Nikolaus Rath 2019-05-20 20:25 ` Nikolaus Rath 2019-05-21 15:01 ` Vivek Goyal 2019-05-21 15:01 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 03/30] fuse: Use default_file_splice_read for direct IO Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 04/30] fuse: export fuse_end_request() Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 05/30] fuse: export fuse_len_args() Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 06/30] fuse: Export fuse_send_init_request() Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 07/30] fuse: export fuse_get_unique() Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 08/30] fuse: extract fuse_fill_super_common() Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 09/30] fuse: add fuse_iqueue_ops callbacks Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 10/30] fuse: Separate fuse device allocation and installation in fuse_conn Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 11/30] virtio_fs: add skeleton virtio_fs.ko module Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 12/30] dax: remove block device dependencies Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-16 0:21 ` Dan Williams 2019-05-16 0:21 ` Dan Williams 2019-05-16 10:07 ` Stefan Hajnoczi 2019-05-16 14:23 ` Vivek Goyal 2019-05-16 14:23 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 13/30] dax: Pass dax_dev to dax_writeback_mapping_range() Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:26 ` [PATCH v2 14/30] virtio: Add get_shm_region method Vivek Goyal 2019-05-15 19:26 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 15/30] virtio: Implement get_shm_region for PCI transport Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 16/30] virtio: Implement get_shm_region for MMIO transport Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 17/30] fuse, dax: add fuse_conn->dax_dev field Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-07-17 17:27 ` Halil Pasic 2019-07-17 17:27 ` Halil Pasic 2019-07-18 9:04 ` Cornelia Huck 2019-07-18 9:04 ` Cornelia Huck 2019-07-18 11:20 ` Halil Pasic 2019-07-18 11:20 ` Halil Pasic 2019-07-18 14:47 ` Cornelia Huck 2019-07-18 14:47 ` Cornelia Huck 2019-07-18 13:15 ` Vivek Goyal 2019-07-18 13:15 ` Vivek Goyal 2019-07-18 14:30 ` Dan Williams 2019-07-18 14:30 ` Dan Williams 2019-07-22 10:51 ` Christian Borntraeger 2019-07-22 10:51 ` Christian Borntraeger 2019-07-22 10:56 ` Dr. David Alan Gilbert 2019-07-22 10:56 ` Dr. David Alan Gilbert 2019-07-22 11:20 ` Christian Borntraeger 2019-07-22 11:20 ` Christian Borntraeger 2019-07-22 11:43 ` Cornelia Huck 2019-07-22 11:43 ` Cornelia Huck 2019-07-22 12:00 ` Christian Borntraeger 2019-07-22 12:00 ` Christian Borntraeger 2019-07-22 12:08 ` David Hildenbrand 2019-07-22 12:08 ` David Hildenbrand 2019-07-29 13:20 ` Stefan Hajnoczi 2019-05-15 19:27 ` [PATCH v2 19/30] fuse: Keep a list of free dax memory ranges Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 20/30] fuse: Introduce setupmapping/removemapping commands Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 21/30] fuse, dax: Implement dax read/write operations Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 22/30] fuse, dax: add DAX mmap support Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 23/30] fuse: Define dax address space operations Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal [this message] 2019-05-15 19:27 ` [PATCH v2 24/30] fuse, dax: Take ->i_mmap_sem lock during dax page fault Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 25/30] fuse: Maintain a list of busy elements Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 26/30] fuse: Add logic to free up a memory range Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-19 7:48 ` Eric Ren 2019-05-20 12:53 ` Vivek Goyal 2019-05-20 12:53 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 27/30] fuse: Release file in process context Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 28/30] fuse: Reschedule dax free work if too many EAGAIN attempts Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 29/30] fuse: Take inode lock for dax inode truncation Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal 2019-05-15 19:27 ` [PATCH v2 30/30] virtio-fs: Do not provide abort interface in fusectl Vivek Goyal 2019-05-15 19:27 ` Vivek Goyal
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20190515192715.18000-25-vgoyal@redhat.com \ --to=vgoyal@redhat.com \ --cc=dgilbert@redhat.com \ --cc=kvm@vger.kernel.org \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-nvdimm@lists.01.org \ --cc=miklos@szeredi.hu \ --cc=stefanha@redhat.com \ --cc=swhiteho@redhat.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.