linux-nvdimm.lists.01.org archive mirror
From: Vivek Goyal <vgoyal@redhat.com>
To: Liu Bo <bo.liu@linux.alibaba.com>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvdimm@lists.01.org, virtio-fs@redhat.com,
	miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com,
	mst@redhat.com
Subject: Re: [PATCH 20/20] fuse,virtiofs: Add logic to free up a memory range
Date: Tue, 14 Apr 2020 15:30:45 -0400
Message-ID: <20200414193045.GB210453@redhat.com>
In-Reply-To: <20200327220606.GA119028@rsjd01523.et2sqa>

On Sat, Mar 28, 2020 at 06:06:06AM +0800, Liu Bo wrote:
> On Fri, Mar 27, 2020 at 10:01:14AM -0400, Vivek Goyal wrote:
> > On Thu, Mar 26, 2020 at 08:09:05AM +0800, Liu Bo wrote:
> > 
> > [..]
> > > > +/*
> > > > + * Find first mapping in the tree and free it and return it. Do not add
> > > > + * it back to free pool. If fault == true, this function should be called
> > > > + * with fi->i_mmap_sem held.
> > > > + */
> > > > +static struct fuse_dax_mapping *inode_reclaim_one_dmap(struct fuse_conn *fc,
> > > > +							 struct inode *inode,
> > > > +							 bool fault)
> > > > +{
> > > > +	struct fuse_inode *fi = get_fuse_inode(inode);
> > > > +	struct fuse_dax_mapping *dmap;
> > > > +	int ret;
> > > > +
> > > > +	if (!fault)
> > > > +		down_write(&fi->i_mmap_sem);
> > > > +
> > > > +	/*
> > > > +	 * Make sure there are no references to inode pages using
> > > > +	 * get_user_pages()
> > > > +	 */
> > > > +	ret = fuse_break_dax_layouts(inode, 0, 0);
> > > 
> > > Hi Vivek,
> > > 
> > > This patch is enabling inline reclaim for fault path, but fault path
> > > has already holds a locked exceptional entry which I believe the above
> > > fuse_break_dax_layouts() needs to wait for, can you please elaborate
> > > on how this can be avoided?
> > > 
> > 
> > Hi Liubo,
> > 
> > Can you please point to the exact lock you are referring to. I will
> > check it out. Once we got rid of needing to take inode lock in
> > reclaim path, that opended the door to do inline reclaim in fault
> > path as well. But I was not aware of this exceptional entry lock.
> 
> Hi Vivek,
> 
> dax_iomap_{pte,pmd}_fault has called grab_mapping_entry to get a
> locked entry, when this fault gets into inline reclaim, would
> fuse_break_dax_layouts wait for the locked exceptional entry which is
> locked in dax_iomap_{pte,pmd}_fault?

Hi Liu Bo,

This is a good point. Indeed, it can deadlock the way the code is
currently written.

Currently we call fuse_break_dax_layouts() on the whole file in the
inline memory reclaim path. I am thinking of changing that. Instead,
find a mapped memory range and its file offset, and call
fuse_break_dax_layouts() only on that range (2MB). This should ensure
that we don't try to break the dax layout in the range where we are
holding the exceptional entry lock, and so avoid the deadlock.
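
To illustrate the overlap argument, here is a rough userspace sketch
(not kernel code). It only models which file offsets a break request
covers. The names byte_range, break_range_covers() and dmap_range_of()
are made up for this illustration, and treating (0, 0) as "whole file"
is an assumption based on how the current code calls
fuse_break_dax_layouts(inode, 0, 0).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* 2MB memory range size, as mentioned in the text above. */
#define DMAP_RANGE_SZ (2ULL * 1024 * 1024)

/* Inclusive byte range within a file (hypothetical helper type). */
struct byte_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Assumption for this sketch: start == 0 && end == 0 selects the whole
 * file, mirroring the fuse_break_dax_layouts(inode, 0, 0) call.
 */
static struct byte_range normalize(uint64_t start, uint64_t end)
{
	struct byte_range r = { start, end };

	if (start == 0 && end == 0)
		r.end = UINT64_MAX;
	assert(r.start <= r.end);
	return r;
}

/*
 * Would a break request on [start, end] cover an entry that the fault
 * path holds locked at file offset fault_off? If so, it could end up
 * waiting on that entry.
 */
static bool break_range_covers(uint64_t start, uint64_t end,
			       uint64_t fault_off)
{
	struct byte_range r = normalize(start, end);

	return fault_off >= r.start && fault_off <= r.end;
}

/* Inclusive 2MB-aligned range that contains file offset off. */
static struct byte_range dmap_range_of(uint64_t off)
{
	struct byte_range r;

	r.start = off - (off % DMAP_RANGE_SZ);
	r.end = r.start + DMAP_RANGE_SZ - 1;
	return r;
}
```

With a fault at offset 4096, a whole-file request always covers the
faulting offset, while a request limited to a different 2MB range (for
example the one returned by dmap_range_of(5 * DMAP_RANGE_SZ)) does not.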

This also has the added benefit that we don't have to unmap the whole
file in an attempt to reclaim one memory range. We will unmap only
a portion of the file, which should be good from a performance point
of view.

Here is a proof of concept patch which applies on top of my internal
tree.

---
 fs/fuse/file.c |   72 +++++++++++++++++++++++++++++++++++++++------------------
 1 file changed, 50 insertions(+), 22 deletions(-)

Index: redhat-linux/fs/fuse/file.c
===================================================================
--- redhat-linux.orig/fs/fuse/file.c	2020-04-14 13:47:19.493780528 -0400
+++ redhat-linux/fs/fuse/file.c	2020-04-14 14:58:26.814079643 -0400
@@ -4297,13 +4297,13 @@ static int fuse_break_dax_layouts(struct
         return ret;
 }
 
-/* Find first mapping in the tree and free it. */
-static struct fuse_dax_mapping *
-inode_reclaim_one_dmap_locked(struct fuse_conn *fc, struct inode *inode)
+/* Find first mapped dmap for an inode and return it. Caller needs
+ * to hold fi->i_dmap_sem lock either shared or exclusive. */
+static struct fuse_dax_mapping *inode_lookup_first_dmap(struct fuse_conn *fc,
+							struct inode *inode)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_dax_mapping *dmap;
-	int ret;
 
 	for (dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, 0, -1);
 	     dmap;
@@ -4312,18 +4312,6 @@ inode_reclaim_one_dmap_locked(struct fus
 		if (refcount_read(&dmap->refcnt) > 1)
 			continue;
 
-		ret = reclaim_one_dmap_locked(fc, inode, dmap);
-		if (ret < 0)
-			return ERR_PTR(ret);
-
-		/* Clean up dmap. Do not add back to free list */
-		dmap_remove_busy_list(fc, dmap);
-		dmap->inode = NULL;
-		dmap->start = dmap->end = 0;
-
-		pr_debug("fuse: %s: reclaimed memory range. inode=%px,"
-			 " window_offset=0x%llx, length=0x%llx\n", __func__,
-			 inode, dmap->window_offset, dmap->length);
 		return dmap;
 	}
 
@@ -4335,30 +4323,70 @@ inode_reclaim_one_dmap_locked(struct fus
  * it back to free pool. If fault == true, this function should be called
  * with fi->i_mmap_sem held.
  */
-static struct fuse_dax_mapping *inode_reclaim_one_dmap(struct fuse_conn *fc,
-							 struct inode *inode,
-							 bool fault)
+static struct fuse_dax_mapping *
+inode_inline_reclaim_one_dmap(struct fuse_conn *fc, struct inode *inode,
+			      bool fault)
 {
 	struct fuse_inode *fi = get_fuse_inode(inode);
 	struct fuse_dax_mapping *dmap;
+	u64 dmap_start, dmap_end;
 	int ret;
 
 	if (!fault)
 		down_write(&fi->i_mmap_sem);
 
+	/* Lookup a dmap and corresponding file offset to reclaim. */
+	down_read(&fi->i_dmap_sem);
+	dmap = inode_lookup_first_dmap(fc, inode);
+	if (dmap) {
+		dmap_start = dmap->start;
+		dmap_end = dmap->end;
+	}
+	up_read(&fi->i_dmap_sem);
+
+	if (!dmap)
+		goto out_mmap_sem;
 	/*
 	 * Make sure there are no references to inode pages using
 	 * get_user_pages()
 	 */
-	ret = fuse_break_dax_layouts(inode, 0, 0);
+	ret = fuse_break_dax_layouts(inode, dmap_start, dmap_end);
 	if (ret) {
 		printk("virtio_fs: fuse_break_dax_layouts() failed. err=%d\n",
 		       ret);
 		dmap = ERR_PTR(ret);
 		goto out_mmap_sem;
 	}
+
 	down_write(&fi->i_dmap_sem);
-	dmap = inode_reclaim_one_dmap_locked(fc, inode);
+	dmap = fuse_dax_interval_tree_iter_first(&fi->dmap_tree, dmap_start,
+						 dmap_start);
+	/* Range already got reclaimed by somebody else */
+	if (!dmap)
+		goto out_write_dmap_sem;
+
+	/* still in use. */
+	if (refcount_read(&dmap->refcnt) > 1) {
+		dmap = NULL;
+		goto out_write_dmap_sem;
+	}
+
+	ret = reclaim_one_dmap_locked(fc, inode, dmap);
+	if (ret < 0) {
+		dmap = NULL;
+		goto out_write_dmap_sem;
+	}
+
+	/* Clean up dmap. Do not add back to free list */
+	dmap_remove_busy_list(fc, dmap);
+	dmap->inode = NULL;
+	dmap->start = dmap->end = 0;
+
+	pr_debug("fuse: %s: inline reclaimed memory range. inode=%px,"
+		 " window_offset=0x%llx, length=0x%llx\n", __func__,
+		 inode, dmap->window_offset, dmap->length);
+
+out_write_dmap_sem:
 	up_write(&fi->i_dmap_sem);
 out_mmap_sem:
 	if (!fault)
@@ -4379,7 +4407,7 @@ static struct fuse_dax_mapping *alloc_da
 			return dmap;
 
 		if (fi->nr_dmaps) {
-			dmap = inode_reclaim_one_dmap(fc, inode, fault);
+			dmap = inode_inline_reclaim_one_dmap(fc, inode, fault);
 			if (dmap)
 				return dmap;
 			/* If we could not reclaim a mapping because it
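
The patch above picks a candidate under the read lock, drops the lock
for the blocking step, and then looks the range up again under the
write lock before reclaiming it. Below is a simplified, single-threaded
userspace sketch of that re-lookup and revalidate step. Locks appear
only as comments, and every name in it (range_entry, reclaim_one(), the
racer_* callbacks) is hypothetical and exists only for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct fuse_dax_mapping. */
struct range_entry {
	uint64_t start;	/* file offset this range maps */
	int in_tree;	/* nonzero while the range is still mapped */
	int refcnt;	/* 1 means idle, >1 means in use */
};

enum reclaim_result { RECLAIMED, GONE, BUSY, NONE_FOUND };

/* First mapped entry that nobody else holds a reference on. */
static struct range_entry *lookup_first_idle(struct range_entry *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].in_tree && tbl[i].refcnt == 1)
			return &tbl[i];
	return NULL;
}

/* Mapped entry starting at the given file offset, if any. */
static struct range_entry *lookup_by_start(struct range_entry *tbl, size_t n,
					   uint64_t start)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].in_tree && tbl[i].start == start)
			return &tbl[i];
	return NULL;
}

/* Simulated concurrent actor: removes the first mapped entry. */
static void racer_removes_first(struct range_entry *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].in_tree) {
			tbl[i].in_tree = 0;
			return;
		}
}

/* Simulated concurrent actor: takes a reference on the first mapped entry. */
static void racer_takes_ref(struct range_entry *tbl, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (tbl[i].in_tree) {
			tbl[i].refcnt++;
			return;
		}
}

/*
 * unlocked_work models the blocking step (breaking layouts) during
 * which other contexts may change the table. It may be NULL.
 */
static enum reclaim_result
reclaim_one(struct range_entry *tbl, size_t n,
	    void (*unlocked_work)(struct range_entry *, size_t))
{
	/* Step 1 (read lock held): pick a candidate, remember only its key. */
	struct range_entry *e = lookup_first_idle(tbl, n);

	if (!e)
		return NONE_FOUND;
	uint64_t key = e->start;

	/* Step 2 (no lock held): blocking work, the table may change. */
	if (unlocked_work)
		unlocked_work(tbl, n);

	/* Step 3 (write lock held): look up by key again, do not reuse e. */
	e = lookup_by_start(tbl, n, key);
	if (!e)
		return GONE;	/* somebody else already reclaimed it */
	if (e->refcnt > 1)
		return BUSY;	/* still in use */

	e->in_tree = 0;
	return RECLAIMED;
}
```

The point of step 3 is that only the saved file offset survives the
unlocked window. The pointer from step 1 is not trusted afterwards.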


