From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mx1.redhat.com ([209.132.183.28]:36680 "EHLO mx1.redhat.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726657AbeLQKxw (ORCPT ); Mon, 17 Dec 2018 05:53:52 -0500 Subject: Re: [PATCH 18/52] virtio-fs: Map cache using the values from the capabilities To: Stefan Hajnoczi , Cornelia Huck Cc: "Dr. David Alan Gilbert" , Vivek Goyal , linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org, kvm@vger.kernel.org, miklos@szeredi.hu, sweil@redhat.com, swhiteho@redhat.com References: <20181210171318.16998-1-vgoyal@redhat.com> <20181210171318.16998-19-vgoyal@redhat.com> <20181213091320.GA2313@work-vm> <20181213100058.GC2313@work-vm> <96d9ea85-ddf3-3331-77ce-124475b26da4@redhat.com> <20181213121548.GN2313@work-vm> <0f1b43f6-57e3-c6d2-7ffe-cf783e125a7b@redhat.com> <20181213133823.2272736b.cohuck@redhat.com> <20181214134434.GA3882@stefanha-x1.localdomain> From: David Hildenbrand Message-ID: Date: Mon, 17 Dec 2018 11:53:46 +0100 MIME-Version: 1.0 In-Reply-To: <20181214134434.GA3882@stefanha-x1.localdomain> Content-Type: text/plain; charset=windows-1252 Content-Language: en-US Content-Transfer-Encoding: 8bit Sender: linux-fsdevel-owner@vger.kernel.org List-ID: On 14.12.18 14:44, Stefan Hajnoczi wrote: > On Thu, Dec 13, 2018 at 01:38:23PM +0100, Cornelia Huck wrote: >> On Thu, 13 Dec 2018 13:24:31 +0100 >> David Hildenbrand wrote: >> >>> On 13.12.18 13:15, Dr. David Alan Gilbert wrote: >>>> * David Hildenbrand (david@redhat.com) wrote: >>>>> On 13.12.18 11:00, Dr. David Alan Gilbert wrote: >>>>>> * David Hildenbrand (david@redhat.com) wrote: >>>>>>> On 13.12.18 10:13, Dr. David Alan Gilbert wrote: >>>>>>>> * David Hildenbrand (david@redhat.com) wrote: >>>>>>>>> On 10.12.18 18:12, Vivek Goyal wrote: >>>>>>>>>> Instead of assuming we had the fixed bar for the cache, use the >>>>>>>>>> value from the capabilities. >>>>>>>>>> >>>>>>>>>> Signed-off-by: Dr. David Alan Gilbert >>>>>>>>>> --- >>>>>>>>>> fs/fuse/virtio_fs.c | 32 +++++++++++++++++--------------- >>>>>>>>>> 1 file changed, 17 insertions(+), 15 deletions(-) >>>>>>>>>> >>>>>>>>>> diff --git a/fs/fuse/virtio_fs.c b/fs/fuse/virtio_fs.c >>>>>>>>>> index 60d496c16841..55bac1465536 100644 >>>>>>>>>> --- a/fs/fuse/virtio_fs.c >>>>>>>>>> +++ b/fs/fuse/virtio_fs.c >>>>>>>>>> @@ -14,11 +14,6 @@ >>>>>>>>>> #include >>>>>>>>>> #include "fuse_i.h" >>>>>>>>>> >>>>>>>>>> -enum { >>>>>>>>>> - /* PCI BAR number of the virtio-fs DAX window */ >>>>>>>>>> - VIRTIO_FS_WINDOW_BAR = 2, >>>>>>>>>> -}; >>>>>>>>>> - >>>>>>>>>> /* List of virtio-fs device instances and a lock for the list */ >>>>>>>>>> static DEFINE_MUTEX(virtio_fs_mutex); >>>>>>>>>> static LIST_HEAD(virtio_fs_instances); >>>>>>>>>> @@ -518,7 +513,7 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) >>>>>>>>>> struct dev_pagemap *pgmap; >>>>>>>>>> struct pci_dev *pci_dev; >>>>>>>>>> phys_addr_t phys_addr; >>>>>>>>>> - size_t len; >>>>>>>>>> + size_t bar_len; >>>>>>>>>> int ret; >>>>>>>>>> u8 have_cache, cache_bar; >>>>>>>>>> u64 cache_offset, cache_len; >>>>>>>>>> @@ -551,17 +546,13 @@ static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs) >>>>>>>>>> } >>>>>>>>>> >>>>>>>>>> /* TODO handle case where device doesn't expose BAR? */ >>>>>>>>> >>>>>>>>> For virtio-pmem we decided to not go via BARs as this would effectively >>>>>>>>> make it only usable for virtio-pci implementers. Instead, we are going >>>>>>>>> to export the applicable physical device region directly (e.g. >>>>>>>>> phys_start, phys_size in virtio config), so it is decoupled from PCI >>>>>>>>> details. Doing the same for virtio-fs would allow e.g. also virtio-ccw >>>>>>>>> to make eventually use of this. >>>>>>>> >>>>>>>> That makes it a very odd looking PCI device; I can see that with >>>>>>>> virtio-pmem it makes some sense, given that it's job is to expose >>>>>>>> arbitrary chunks of memory. >>>>>>>> >>>>>>>> Dave >>>>>>> >>>>>>> Well, the fact that your are >>>>>>> >>>>>>> - including >>>>>>> - adding pci related code >>>>>>> >>>>>>> in/to fs/fuse/virtio_fs.c >>>>>>> >>>>>>> tells me that these properties might be better communicated on the >>>>>>> virtio layer, not on the PCI layer. >>>>>>> >>>>>>> Or do you really want to glue virtio-fs to virtio-pci for all eternity? >>>>>> >>>>>> No, these need cleaning up; and the split within the bar >>>>>> is probably going to change to be communicated via virtio layer >>>>>> rather than pci capabilities. However, I don't want to make our PCI >>>>>> device look odd, just to make portability to non-PCI devices - so it's >>>>>> right to make the split appropriately, but still to use PCI bars >>>>>> for what they were designed for. >>>>>> >>>>>> Dave >>>>> >>>>> Let's discuss after the cleanup. In general I am not convinced this is >>>>> the right thing to do. Using virtio-pci for anything else than pure >>>>> transport smells like bad design to me (well, I am no virtio expert >>>>> after all ;) ). No matter what PCI bars were designed for. If we can't >>>>> get the same running with e.g. virtio-ccw or virtio-whatever, it is >>>>> broken by design (or an addon that is tightly glued to virtio-pci, if >>>>> that is the general idea). >>>> >>>> I'm sure we can find alternatives for virtio-*, so I wouldn't expect >>>> it to be glued to virtio-pci. >>>> >>>> Dave >>> >>> As s390x does not have the concept of memory mapped io (RAM is RAM, >>> nothing else), this is not architectured. vitio-ccw can therefore not >>> define anything similar like that. However, in virtual environments we >>> can do whatever we want on top of the pure transport (e.g. on the virtio >>> layer). >>> >>> Conny can correct me if I am wrong. >> >> I don't think you're wrong, but I haven't read the code yet and I'm >> therefore not aware of the purpose of this BAR. >> >> Generally, if there is a memory location shared between host and guest, >> we need a way to communicate its location, which will likely differ >> between transports. For ccw, I could imagine a new channel command >> dedicated to exchanging configuration information (similar to what >> exists today to communicate the locations of virtqueues), but I'd >> rather not go down this path. >> >> Without reading the code/design further, can we use one of the >> following instead of a BAR: >> - a virtqueue; >> - something in config space? >> That would be implementable by any virtio transport. > > The way I think about this is that we wish to extend the VIRTIO device > model with the concept of shared memory. virtio-fs, virtio-gpu, and > virtio-vhost-user all have requirements for shared memory. > > This seems like a transport-level issue to me. PCI supports > memory-mapped I/O and that's the right place to do it. If you try to > put it into config space or the virtqueue, you'll end up with something > that cannot be realized as a PCI device because it bypasses PCI bus > address translation. > > If CCW needs a side-channel, that's fine. But that side-channel is a > CCW-specific mechanism and probably doesn't apply to all other > transports. > > Stefan > I think the problem is more fundamental. There is no iommu. Whatever shared region you want to indicate, you want it to be assigned a memory region in guest physical memory. Like a DIMM/NVDIMM. And this should be different to the concept of a BAR. Or am I missing something? I am ok with using whatever other channel to transport such information. But I believe this is different to a typical BAR. (I wish I knew more about PCI internals ;) ). I would also like to know how shared memory works as of now for e.g. virtio-gpu. -- Thanks, David / dhildenb