linux-nvdimm.lists.01.org archive mirror
 help / color / mirror / Atom feed
From: Vivek Goyal <vgoyal@redhat.com>
To: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	virtio-fs@redhat.com, Stefan Hajnoczi <stefanha@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	Miklos Szeredi <mszeredi@redhat.com>,
	Liu Bo <bo.liu@linux.alibaba.com>,
	Peng Tao <tao.peng@linux.alibaba.com>
Subject: Re: [PATCH 13/20] fuse, dax: Implement dax read/write operations
Date: Fri, 13 Mar 2020 09:41:55 -0400	[thread overview]
Message-ID: <20200313134155.GA156804@redhat.com> (raw)
In-Reply-To: <CAJfpegtuCCRfKfctUyQBimAOpnOTvW5zodLAy307Mr_1h0+e7g@mail.gmail.com>

On Fri, Mar 13, 2020 at 11:18:15AM +0100, Miklos Szeredi wrote:

[..]
> > > > +/* offset passed in should be aligned to FUSE_DAX_MEM_RANGE_SZ */
> > > > +static int fuse_setup_one_mapping(struct inode *inode, loff_t offset,
> > > > +                                 struct fuse_dax_mapping *dmap, bool writable,
> > > > +                                 bool upgrade)
> > > > +{
> > > > +       struct fuse_conn *fc = get_fuse_conn(inode);
> > > > +       struct fuse_inode *fi = get_fuse_inode(inode);
> > > > +       struct fuse_setupmapping_in inarg;
> > > > +       FUSE_ARGS(args);
> > > > +       ssize_t err;
> > > > +
> > > > +       WARN_ON(offset % FUSE_DAX_MEM_RANGE_SZ);
> > > > +       WARN_ON(fc->nr_free_ranges < 0);
> > > > +
> > > > +       /* Ask fuse daemon to setup mapping */
> > > > +       memset(&inarg, 0, sizeof(inarg));
> > > > +       inarg.foffset = offset;
> > > > +       inarg.fh = -1;
> > > > +       inarg.moffset = dmap->window_offset;
> > > > +       inarg.len = FUSE_DAX_MEM_RANGE_SZ;
> > > > +       inarg.flags |= FUSE_SETUPMAPPING_FLAG_READ;
> > > > +       if (writable)
> > > > +               inarg.flags |= FUSE_SETUPMAPPING_FLAG_WRITE;
> > > > +       args.opcode = FUSE_SETUPMAPPING;
> > > > +       args.nodeid = fi->nodeid;
> > > > +       args.in_numargs = 1;
> > > > +       args.in_args[0].size = sizeof(inarg);
> > > > +       args.in_args[0].value = &inarg;
> > >
> > > args.force = true?
> >
> > I can do that but I am not sure what exactly does args.force do and
> > why do we need it in this case.
> 
> Hm, it prevents interrupts.  Looking closely, however it will only
> prevent SIGKILL from immediately interrupting the request, otherwise
> it will send an INTERRUPT request and the filesystem can ignore that.
> Might make sense to have a args.nonint flag to prevent the sending of
> INTERRUPT...

Hi Miklos,

virtiofs does not support interrupt requests yet. Its fiq interrupt
handler just does not do anything.

static void virtio_fs_wake_interrupt_and_unlock(struct fuse_iqueue *fiq)
__releases(fiq->lock)
{
        /*
         * TODO interrupts.
         *
         * Normal fs operations on a local filesystems aren't interruptible.
         * Exceptions are blocking lock operations; for example fcntl(F_SETLKW)
         * with shared lock between host and guest.
         */
        spin_unlock(&fiq->lock);
}

So as of now setting force or not will not make any difference. We will
still end up waiting for request to finish.

Infact, I think there is no mechanism to set fc->no_interrupt in
virtio_fs. If I am reading request_wait_answer(), correctly, it will
see fc->no_interrupt is not set. That means filesystem supports
interrupt requests and it will do wait_event_interruptible() and
not even check for FR_FORCE bit. 

Right now fc->no_interrupt is set in response to INTERRUPT request
reply. Will it make sense to also be able to set it as part of
connection negotation protocol and filesystem can tell in the
beginning itself that it does not support interrupt and virtiofs
can make use of that.

So force flag is only useful if filesystem does not support interrupt
and in that case we do wait_event_killable() and upon receiving
SIGKILL, cancel request if it is still in pending queue. For virtiofs,
we take request out of fiq->pending queue in submission path itself
and if it can't be dispatched it waits on virtiofs speicfic queue
with FR_PENDING cleared. That means, setting FR_FORCE for virtiofs
does not mean anything as caller will end up waiting for
request to finish anyway.

IOW, setting FR_FORCE will make sense when we have mechanism to
detect that request is still queued in virtiofs queues and have
mechanism to cancel it. We don't have it. In fact, given we are
a push model, we dispatch request immediately to filesystem,
until and unless virtqueue is full. So probability of a request
still in virtiofs queue is low.

So may be we can start setting force at some point of time later
when we have mechanism to cancel detect and cancel pending requests
in virtiofs.

> 
> > First thing it does is that request is allocated with flag __GFP_NOFAIL.
> > Second thing it does is that caller is forced to wait for request
> > completion and its not an interruptible sleep.
> >
> > I am wondering what makes FUSE_SETUPMAPING/FUSE_REMOVEMAPPING requests
> > special that we need to set force flag.
> 
> Maybe not for SETUPMAPPING (I was confused by the error log).
> 
> However if REMOVEMAPPING fails for some reason, than that dax mapping
> will be leaked for the lifetime of the filesystem.   Or am I
> misunderstanding it?

FUSE_REMVOEMAPPING is not must. If we send another FUSE_SETUPMAPPING, then
it will create the new mapping and free up resources associated with
the previous mapping, IIUC.

So at one point of time we were thinking that what's the point of
sending FUSE_REMOVEMAPPING. It helps a bit with freeing up filesystem
resources earlier. So if cache size is big, then there will not be
much reclaim activity going and if we don't send FUSE_REMOVEMAPPING,
all these filesystem resources will remain busy on host for a long
time.

> 
> > > > +       ret = fuse_setup_one_mapping(inode,
> > > > +                                    ALIGN_DOWN(pos, FUSE_DAX_MEM_RANGE_SZ),
> > > > +                                    dmap, true, true);
> > > > +       if (ret < 0) {
> > > > +               printk("fuse_setup_one_mapping() failed. err=%d pos=0x%llx\n",
> > > > +                      ret, pos);
> > >
> > > Again.
> >
> > Will remove. How about converting some of them to pr_debug() instead? It
> > can help with debugging if something is not working.
> 
> Okay, and please move it to fuse_setup_one_mapping() where there's
> already a pr_debug() for the success case.

Will do.

> 
>  > > +
> > > > +       /* Do not use dax for file extending writes as its an mmap and
> > > > +        * trying to write beyong end of existing page will generate
> > > > +        * SIGBUS.
> > >
> > > Ah, here it is.  So what happens in case of a race?  Does that
> > > currently crash KVM?
> >
> > In case of race, yes, KVM hangs. So no shared directory operation yet
> > till we have designed proper error handling in kvm path.
> 
> I think before this is merged we have to fix the KVM crash; that's not
> acceptable even if we explicitly say that shared directory is not
> supported for the time being.

Ok, I will look into it. I had done some work in the past and realized
its not trivial to fix kvm error paths. There are no users and propagating
signals back into qemu instances and finding the right process is going to be
tricky.

Given the complexity of that work, I thought that for now we say that
shared directory is not supported and once basic dax patches get merged,
focus on kvm work.

Thanks
Vivek
_______________________________________________
Linux-nvdimm mailing list -- linux-nvdimm@lists.01.org
To unsubscribe send an email to linux-nvdimm-leave@lists.01.org

  reply	other threads:[~2020-03-13 13:42 UTC|newest]

Thread overview: 67+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-03-04 16:58 [PATCH 00/20] virtiofs: Add DAX support Vivek Goyal
2020-03-04 16:58 ` [PATCH 01/20] dax: Modify bdev_dax_pgoff() to handle NULL bdev Vivek Goyal
2020-03-04 16:58 ` [PATCH 02/20] dax: Create a range version of dax_layout_busy_page() Vivek Goyal
2020-03-10 15:19   ` Ira Weiny
2020-03-10 20:29     ` Vivek Goyal
2020-03-04 16:58 ` [PATCH 03/20] virtio: Add get_shm_region method Vivek Goyal
2020-03-10 10:53   ` Stefan Hajnoczi
2020-03-04 16:58 ` [PATCH 04/20] virtio: Implement get_shm_region for PCI transport Vivek Goyal
2020-03-10 11:04   ` Stefan Hajnoczi
2020-03-10 18:19     ` Vivek Goyal
2020-03-11 17:34       ` Stefan Hajnoczi
2020-03-11 19:29         ` Vivek Goyal
2020-03-10 11:12   ` Michael S. Tsirkin
2020-03-10 18:47     ` Vivek Goyal
2020-03-10 21:27       ` Michael S. Tsirkin
2020-03-04 16:58 ` [PATCH 05/20] virtio: Implement get_shm_region for MMIO transport Vivek Goyal
2020-03-10 11:06   ` Stefan Hajnoczi
2020-03-04 16:58 ` [PATCH 06/20] virtiofs: Provide a helper function for virtqueue initialization Vivek Goyal
2020-03-10 14:10   ` Miklos Szeredi
2020-03-04 16:58 ` [PATCH 07/20] fuse: Get rid of no_mount_options Vivek Goyal
2020-03-10 14:12   ` Miklos Szeredi
2020-03-04 16:58 ` [PATCH 08/20] fuse,virtiofs: Add a mount option to enable dax Vivek Goyal
2020-03-10 14:16   ` Miklos Szeredi
2020-03-04 16:58 ` [PATCH 09/20] virtio_fs, dax: Set up virtio_fs dax_device Vivek Goyal
2020-03-04 16:58 ` [PATCH 10/20] fuse,virtiofs: Keep a list of free dax memory ranges Vivek Goyal
2020-03-10 19:29   ` Miklos Szeredi
2020-03-04 16:58 ` [PATCH 11/20] fuse: implement FUSE_INIT map_alignment field Vivek Goyal
2020-03-10 19:31   ` Miklos Szeredi
2020-03-04 16:58 ` [PATCH 12/20] fuse: Introduce setupmapping/removemapping commands Vivek Goyal
2020-03-10 19:49   ` Miklos Szeredi
2020-03-10 20:33     ` Vivek Goyal
2020-03-11  7:03       ` Amir Goldstein
2020-03-11 14:19         ` Miklos Szeredi
2020-03-11 14:41           ` Vivek Goyal
2020-03-11 15:12             ` Miklos Szeredi
2020-03-04 16:58 ` [PATCH 13/20] fuse, dax: Implement dax read/write operations Vivek Goyal
2020-03-12  9:43   ` Miklos Szeredi
2020-03-12 16:02     ` Vivek Goyal
2020-03-13 10:18       ` Miklos Szeredi
2020-03-13 13:41         ` Vivek Goyal [this message]
2020-04-04  0:25   ` Liu Bo
2020-04-14 12:54     ` Vivek Goyal
2020-03-04 16:58 ` [PATCH 14/20] fuse,dax: add DAX mmap support Vivek Goyal
2020-03-04 16:58 ` [PATCH 15/20] fuse, dax: Take ->i_mmap_sem lock during dax page fault Vivek Goyal
2020-03-04 16:58 ` [PATCH 16/20] fuse,virtiofs: Define dax address space operations Vivek Goyal
2020-03-04 16:58 ` [PATCH 17/20] fuse,virtiofs: Maintain a list of busy elements Vivek Goyal
2020-03-04 16:58 ` [PATCH 18/20] fuse: Release file in process context Vivek Goyal
2020-03-04 16:58 ` [PATCH 19/20] fuse: Take inode lock for dax inode truncation Vivek Goyal
2020-03-04 16:58 ` [PATCH 20/20] fuse,virtiofs: Add logic to free up a memory range Vivek Goyal
2020-03-11  5:16   ` Liu Bo
2020-03-11 12:59     ` Vivek Goyal
2020-03-11 17:24       ` Liu Bo
2020-03-26  0:09   ` Liu Bo
2020-03-27 14:01     ` Vivek Goyal
2020-03-27 22:06       ` Liu Bo
2020-04-14 19:30         ` Vivek Goyal
2020-04-15 17:22           ` Liu Bo
2020-04-16 19:05             ` Vivek Goyal
2020-04-17 18:05               ` Liu Bo
2020-03-11  5:22 ` [PATCH 00/20] virtiofs: Add DAX support Amir Goldstein
2020-03-11 13:09   ` Vivek Goyal
2020-03-11 18:48   ` Vivek Goyal
2020-03-11 19:32     ` Amir Goldstein
2020-03-11 19:39       ` Vivek Goyal
2020-03-11 13:38 ` Patrick Ohly
2020-03-16 13:02   ` Vivek Goyal
2020-03-17  8:28     ` Patrick Ohly

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200313134155.GA156804@redhat.com \
    --to=vgoyal@redhat.com \
    --cc=bo.liu@linux.alibaba.com \
    --cc=dgilbert@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=miklos@szeredi.hu \
    --cc=mst@redhat.com \
    --cc=mszeredi@redhat.com \
    --cc=stefanha@redhat.com \
    --cc=tao.peng@linux.alibaba.com \
    --cc=virtio-fs@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).