LKML Archive on lore.kernel.org
From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: Stefan Hajnoczi <stefanha@redhat.com>,
	David Hildenbrand <david@redhat.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, miklos@szeredi.hu, sweil@redhat.com,
	swhiteho@redhat.com
Subject: Re: [PATCH 18/52] virtio-fs: Map cache using the values from the capabilities
Date: Fri, 14 Dec 2018 14:06:47 +0000
Message-ID: <20181214140646.GG2454@work-vm>
In-Reply-To: <20181214145058.6071bdac.cohuck@redhat.com>

* Cornelia Huck (cohuck@redhat.com) wrote:
> On Fri, 14 Dec 2018 13:44:34 +0000
> Stefan Hajnoczi <stefanha@redhat.com> wrote:
> 
> > On Thu, Dec 13, 2018 at 01:38:23PM +0100, Cornelia Huck wrote:
> > > On Thu, 13 Dec 2018 13:24:31 +0100
> > > David Hildenbrand <david@redhat.com> wrote:
> > >   
> > > > On 13.12.18 13:15, Dr. David Alan Gilbert wrote:  
> > > > > * David Hildenbrand (david@redhat.com) wrote:    
> > > > >> On 13.12.18 11:00, Dr. David Alan Gilbert wrote:    
> > > > >>> * David Hildenbrand (david@redhat.com) wrote:    
> > > > >>>> On 13.12.18 10:13, Dr. David Alan Gilbert wrote:    
> > > > >>>>> * David Hildenbrand (david@redhat.com) wrote:    
> > > > >>>>>> On 10.12.18 18:12, Vivek Goyal wrote:    
> > > > >>>>>>> Instead of assuming we had the fixed bar for the cache, use the
> > > > >>>>>>> value from the capabilities.
> > > > >>>>>>>
> > > > >>>>>>> Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > > >>>>>>> ---
> > > > >>>>>>>  fs/fuse/virtio_fs.c | 32 +++++++++++++++++---------------
> > > > >>>>>>>  1 file changed, 17 insertions(+), 15 deletions(-)
> > > > >>>>>>>
> > > > >>>>>>> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c
> > > > >>>>>>> index 60d496c16841..55bac1465536 100644
> > > > >>>>>>> --- a/fs/fuse/virtio_fs.c
> > > > >>>>>>> +++ b/fs/fuse/virtio_fs.c
> > > > >>>>>>> @@ -14,11 +14,6 @@
> > > > >>>>>>>  #include <uapi/linux/virtio_pci.h>
> > > > >>>>>>>  #include "fuse_i.h"
> > > > >>>>>>>  
> > > > >>>>>>> -enum {
> > > > >>>>>>> -	/* PCI BAR number of the virtio-fs DAX window */
> > > > >>>>>>> -	VIRTIO_FS_WINDOW_BAR = 2,
> > > > >>>>>>> -};
> > > > >>>>>>> -
> > > > >>>>>>>  /* List of virtio-fs device instances and a lock for the list */
> > > > >>>>>>>  static DEFINE_MUTEX(virtio_fs_mutex);
> > > > >>>>>>>  static LIST_HEAD(virtio_fs_instances);
> > > > >>>>>>> @@ -518,7 +513,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> > > > >>>>>>>  	struct dev_pagemap *pgmap;
> > > > >>>>>>>  	struct pci_dev *pci_dev;
> > > > >>>>>>>  	phys_addr_t phys_addr;
> > > > >>>>>>> -	size_t len;
> > > > >>>>>>> +	size_t bar_len;
> > > > >>>>>>>  	int ret;
> > > > >>>>>>>  	u8 have_cache, cache_bar;
> > > > >>>>>>>  	u64 cache_offset, cache_len;
> > > > >>>>>>> @@ -551,17 +546,13 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> > > > >>>>>>>          }
> > > > >>>>>>>  
> > > > >>>>>>>  	/* TODO handle case where device doesn't expose BAR? */    
> > > > >>>>>>
> > > > >>>>>> For virtio-pmem we decided to not go via BARs as this would effectively
> > > > >>>>>> make it only usable for virtio-pci implementers. Instead, we are going
> > > > >>>>>> to export the applicable physical device region directly (e.g.
> > > > >>>>>> phys_start, phys_size in virtio config), so it is decoupled from PCI
> > > > > >>>>>> details. Doing the same for virtio-fs would also allow e.g. virtio-ccw
> > > > > >>>>>> to eventually make use of this.
> > > > >>>>>
> > > > >>>>> That makes it a very odd looking PCI device;  I can see that with
> > > > > >>>>> virtio-pmem it makes some sense, given that its job is to expose
> > > > >>>>> arbitrary chunks of memory.
> > > > >>>>>
> > > > >>>>> Dave    
> > > > >>>>
> > > > > >>>> Well, the fact that you are
> > > > >>>>
> > > > >>>> - including <uapi/linux/virtio_pci.h>
> > > > >>>> - adding pci related code
> > > > >>>>
> > > > >>>> in/to fs/fuse/virtio_fs.c
> > > > >>>>
> > > > >>>> tells me that these properties might be better communicated on the
> > > > >>>> virtio layer, not on the PCI layer.
> > > > >>>>
> > > > >>>> Or do you really want to glue virtio-fs to virtio-pci for all eternity?    
> > > > >>>
> > > > >>> No, these need cleaning up; and the split within the bar
> > > > >>> is probably going to change to be communicated via virtio layer
> > > > >>> rather than pci capabilities.  However, I don't want to make our PCI
> > > > > >>> device look odd just for the sake of portability to non-PCI devices - so it's
> > > > >>> right to make the split appropriately, but still to use PCI bars
> > > > >>> for what they were designed for.
> > > > >>>
> > > > >>> Dave    
> > > > >>
> > > > >> Let's discuss after the cleanup. In general I am not convinced this is
> > > > >> the right thing to do. Using virtio-pci for anything else than pure
> > > > >> transport smells like bad design to me (well, I am no virtio expert
> > > > >> after all ;) ). No matter what PCI bars were designed for. If we can't
> > > > >> get the same running with e.g. virtio-ccw or virtio-whatever, it is
> > > > >> broken by design (or an addon that is tightly glued to virtio-pci, if
> > > > >> that is the general idea).    
> > > > > 
> > > > > I'm sure we can find alternatives for virtio-*, so I wouldn't expect
> > > > > it to be glued to virtio-pci.
> > > > > 
> > > > > Dave    
> > > > 
> > > > As s390x does not have the concept of memory mapped io (RAM is RAM,
> > > > nothing else), this is not architected. virtio-ccw therefore cannot
> > > > define anything like that. However, in virtual environments we
> > > > can do whatever we want on top of the pure transport (e.g. on the virtio
> > > > layer).
> > > > 
> > > > Conny can correct me if I am wrong.  
> > > 
> > > I don't think you're wrong, but I haven't read the code yet and I'm
> > > therefore not aware of the purpose of this BAR.
> > > 
> > > Generally, if there is a memory location shared between host and guest,
> > > we need a way to communicate its location, which will likely differ
> > > between transports. For ccw, I could imagine a new channel command
> > > dedicated to exchanging configuration information (similar to what
> > > exists today to communicate the locations of virtqueues), but I'd
> > > rather not go down this path.
> > > 
> > > Without reading the code/design further, can we use one of the
> > > following instead of a BAR:
> > > - a virtqueue;
> > > - something in config space?
> > > That would be implementable by any virtio transport.  
> > 
> > The way I think about this is that we wish to extend the VIRTIO device
> > model with the concept of shared memory.  virtio-fs, virtio-gpu, and
> > virtio-vhost-user all have requirements for shared memory.
> > 
> > This seems like a transport-level issue to me.  PCI supports
> > memory-mapped I/O and that's the right place to do it.  If you try to
> > put it into config space or the virtqueue, you'll end up with something
> > that cannot be realized as a PCI device because it bypasses PCI bus
> > address translation.
> > 
> > If CCW needs a side-channel, that's fine.  But that side-channel is a
> > CCW-specific mechanism and probably doesn't apply to all other
> > transports.
> 
> But virtio-gpu works with ccw right now (I haven't checked what it
> uses); can virtio-fs use an equivalent method?
> 
> If there's a more generic case to be made for extending virtio devices
> with a way to handle shared memory, a ccw for that would be fine. I
> just want to avoid adding new ccws for everything as the namespace is
> not infinite.

In our case we have anywhere from 0 to 3 ranges of memory, and I was
specifying them as PCI capabilities; however, Gerd's suggestion was that
it would be better to use just one BAR and then have something as part
of virtio or the like to split it into the individual ranges.
If we do that, then we could have something of the form
  (index, base, length)

for each of the regions, where in the PCI case 'index' means the BAR and
in CCW it means something else. (For mmio the index is probably
irrelevant and the base is probably a physical address.)
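
To make the (index, base, length) idea a bit more concrete, here is a
rough, self-contained sketch in plain C. Nothing in it comes from the
patches: the names shm_transport, shm_region, shm_region_fits() and
shm_region_resolve() are all hypothetical, and treating 'base' as an
offset into the BAR in the PCI case is my assumption. CCW is left
unresolved because what 'index' would mean there has not been decided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport tags; not taken from the patches. */
enum shm_transport {
	SHM_TRANSPORT_PCI,
	SHM_TRANSPORT_CCW,
	SHM_TRANSPORT_MMIO,
};

/* One shared-memory region as an (index, base, length) tuple. */
struct shm_region {
	uint8_t index;   /* PCI: BAR number; CCW: transport-defined; mmio: ignored */
	uint64_t base;   /* PCI: offset into the BAR (assumed); mmio: physical address */
	uint64_t length; /* region size in bytes */
};

/* True if the region is non-empty and lies within container_len bytes. */
static bool shm_region_fits(const struct shm_region *r, uint64_t container_len)
{
	assert(r != NULL);
	/* Compare against the remaining space so base + length cannot overflow. */
	return r->length != 0 && r->base <= container_len &&
	       r->length <= container_len - r->base;
}

/*
 * Turn a region into a physical address for the given transport.
 * bar_start[] (nr_bars entries) is only consulted for PCI.
 * Returns false when the region cannot be resolved.
 */
static bool shm_region_resolve(enum shm_transport t, const struct shm_region *r,
			       const uint64_t *bar_start, size_t nr_bars,
			       uint64_t *phys)
{
	assert(r != NULL && phys != NULL);

	switch (t) {
	case SHM_TRANSPORT_PCI:
		if (bar_start == NULL || r->index >= nr_bars)
			return false;
		if (r->base > UINT64_MAX - bar_start[r->index])
			return false; /* address would wrap */
		*phys = bar_start[r->index] + r->base;
		return true;
	case SHM_TRANSPORT_MMIO:
		*phys = r->base; /* index is irrelevant here */
		return true;
	case SHM_TRANSPORT_CCW:
	default:
		return false; /* meaning of 'index' for CCW is not defined yet */
	}
}
```

With something like this, the driver would only ever see the tuple, and
the per-transport interpretation of 'index' would live in one place; for
PCI one would also want to check the region against the BAR length, which
is what shm_region_fits() is meant to illustrate.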

Dave

--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
