From: "Dr. David Alan Gilbert" <dgilbert@redhat.com>
To: Christian Borntraeger <borntraeger@de.ibm.com>
Cc: Dan Williams <dan.j.williams@intel.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	Collin Walling <walling@linux.ibm.com>,
	Cornelia Huck <cohuck@redhat.com>,
	Sebastian Ott <sebott@linux.ibm.com>,
	KVM list <kvm@vger.kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Steven Whitehouse <swhiteho@redhat.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device
Date: Mon, 22 Jul 2019 11:56:30 +0100
Message-ID: <20190722105630.GC3035@work-vm>
In-Reply-To: <c519011e-1df3-3f35-8582-2cb58367ff8a@de.ibm.com>

* Christian Borntraeger (borntraeger@de.ibm.com) wrote:
>
>
> On 18.07.19 16:30, Dan Williams wrote:
> > On Thu, Jul 18, 2019 at 6:15 AM Vivek Goyal <vgoyal@redhat.com> wrote:
> >>
> >> On Wed, Jul 17, 2019 at 07:27:25PM +0200, Halil Pasic wrote:
> >>> On Wed, 15 May 2019 15:27:03 -0400
> >>> Vivek Goyal <vgoyal@redhat.com> wrote:
> >>>
> >>>> From: Stefan Hajnoczi <stefanha@redhat.com>
> >>>>
> >>>> Setup a dax device.
> >>>>
> >>>> Use the shm capability to find the cache entry and map it.
> >>>>
> >>>> The DAX window is accessed by the fs/dax.c infrastructure and must have
> >>>> struct pages (at least on x86).  Use devm_memremap_pages() to map the
> >>>> DAX window PCI BAR and allocate struct page.
> >>>>
> >>>
> >>> Sorry for being this late. I don't see any more recent version so I will
> >>> comment here.
> >>>
> >>> I'm trying to figure out how this is supposed to work on s390. My concern
> >>> is that on s390 PCI memory needs to be accessed by special
> >>> instructions. This is taken care of by the stuff defined in
> >>> arch/s390/include/asm/io.h. E.g. we 'override' __raw_writew so it uses
> >>> the appropriate s390 instruction. However if the code does not use the
> >>> linux abstractions for accessing PCI memory, but assumes it can be
> >>> accessed like RAM, we have a problem.
> >>>
> >>> Looking at this patch, it seems to me that we might end up with exactly
> >>> the case described. For example AFAICT copy_to_iter() (3) resolves to
> >>> the function in lib/iov_iter.c which does not seem to cater for s390
> >>> oddities.
> >>>
> >>> I didn't have the time to investigate this properly, and since virtio-fs
> >>> is virtual, we may be able to get around what is otherwise a
> >>> limitation on s390. My understanding of these areas is admittedly
> >>> shallow, and since I'm not sure I'll have much more time to
> >>> invest in the near future I decided to raise concern.
> >>>
> >>> Any opinions?
> >>
> >> Hi Halil,
> >>
> >> I don't understand s390 and how PCI works there as well. Is there any
> >> other transport we can use there to map IO memory directly and access
> >> using DAX?
> >>
> >> BTW, is DAX supported for s390?
> >>
> >> I am also hoping somebody who knows better can chip in. Till that time,
> >> we could still use virtio-fs on s390 without DAX.
> >
> > s390 has so-called "limited" dax support, see CONFIG_FS_DAX_LIMITED.
> > In practice that means that support for PTE_DEVMAP is missing which
> > means no get_user_pages() support for dax mappings. Effectively it's
> > only useful for execute-in-place as operations like fork() and ptrace
> > of dax mappings will fail.
>
>
> This is only true for the dcssblk device driver (drivers/s390/block/dcssblk.c
> and arch/s390/mm/extmem.c).
>
> For what it's worth, the dcssblk looks to Linux like normal memory (just above the
> previously detected memory) that can be used like normal memory. In previous times
> we even had struct pages for this memory - this was removed long ago (when it was
> still xip) to reduce the memory footprint for large dcss blocks and small memory
> guests.
> Can the CONFIG_FS_DAX_LIMITED go away if we have struct pages for that memory?
>
> Now some observations:
> - dcssblk is z/VM only (not KVM)
> - Setting CONFIG_FS_DAX_LIMITED globally as a Kconfig option depending on whether
>   a device driver is compiled in or not seems not flexible enough in case you
>   have a device driver that does have struct pages and another one that doesn't
> - I do not see a reason why we should not be able to map anything from QEMU
>   into the guest real memory via an additional KVM memory slot.
>   We would need to handle that in the guest somehow (and not as a PCI bar),
>   register this with struct pages etc.
> - we must then look how we can create the link between the guest memory and the
>   virtio-fs driver. For virtio-ccw we might be able to add a new ccw command or
>   whatever. Maybe we could also piggy-back on some memory hotplug work from David
>   Hildenbrand (add cc).
>
> Regarding limitations on the platform:
> - while we do have PCI, the virtio devices are usually plugged via the ccw bus.
>   That implies no PCI bars. I assume you use those PCI bars only to implicitly
>   have the location of the shared memory
>   Correct?

Right.

> - no real memory mapped I/O. Instead there are instructions that work on the mmio.
>   As I understand things, this is of no concern regarding virtio-fs as you do not
>   need mmio in the sense that a memory access of the guest to such an address
>   triggers an exit. You just need the shared memory as a means to have the data
>   inside the guest. Any notification is done via normal virtqueue mechanisms
>   Correct?

Yep.

>
> Adding Heiko, maybe he remembers some details of the dcssblk/xip history.
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK