From: Christian Borntraeger <firstname.lastname@example.org>
To: "Dr. David Alan Gilbert" <email@example.com>
Cc: Dan Williams <firstname.lastname@example.org>,
	Vivek Goyal <email@example.com>,
	Halil Pasic <firstname.lastname@example.org>,
	Collin Walling <email@example.com>,
	Cornelia Huck <firstname.lastname@example.org>,
	Sebastian Ott <email@example.com>,
	KVM list <firstname.lastname@example.org>,
	Miklos Szeredi <email@example.com>,
	linux-nvdimm <firstname.lastname@example.org>,
	Linux Kernel Mailing List <email@example.com>,
	Stefan Hajnoczi <firstname.lastname@example.org>,
	linux-fsdevel <email@example.com>,
	Steven Whitehouse <firstname.lastname@example.org>,
	Heiko Carstens <email@example.com>,
	David Hildenbrand <firstname.lastname@example.org>
Subject: Re: [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device
Date: Mon, 22 Jul 2019 13:20:18 +0200
Message-ID: <email@example.com>
In-Reply-To: <20190722105630.GC3035@work-vm>

On 22.07.19 12:56, Dr. David Alan Gilbert wrote:
> * Christian Borntraeger (firstname.lastname@example.org) wrote:
>>
>>
>> On 18.07.19 16:30, Dan Williams wrote:
>>> On Thu, Jul 18, 2019 at 6:15 AM Vivek Goyal <email@example.com> wrote:
>>>>
>>>> On Wed, Jul 17, 2019 at 07:27:25PM +0200, Halil Pasic wrote:
>>>>> On Wed, 15 May 2019 15:27:03 -0400
>>>>> Vivek Goyal <firstname.lastname@example.org> wrote:
>>>>>
>>>>>> From: Stefan Hajnoczi <email@example.com>
>>>>>>
>>>>>> Setup a dax device.
>>>>>>
>>>>>> Use the shm capability to find the cache entry and map it.
>>>>>>
>>>>>> The DAX window is accessed by the fs/dax.c infrastructure and must have
>>>>>> struct pages (at least on x86). Use devm_memremap_pages() to map the
>>>>>> DAX window PCI BAR and allocate struct page.
>>>>>>
>>>>>
>>>>> Sorry for being this late. I don't see any more recent version so I will
>>>>> comment here.
>>>>>
>>>>> I'm trying to figure out how is this supposed to work on s390. My concern
>>>>> is, that on s390 PCI memory needs to be accessed by special
>>>>> instructions. This is taken care of by the stuff defined in
>>>>> arch/s390/include/asm/io.h. E.g. we 'override' __raw_writew so it uses
>>>>> the appropriate s390 instruction. However if the code does not use the
>>>>> linux abstractions for accessing PCI memory, but assumes it can be
>>>>> accessed like RAM, we have a problem.
>>>>>
>>>>> Looking at this patch, it seems to me, that we might end up with exactly
>>>>> the case described. For example AFAICT copy_to_iter() (3) resolves to
>>>>> the function in lib/iov_iter.c which does not seem to cater for s390
>>>>> oddities.
>>>>>
>>>>> I didn't have the time to investigate this properly, and since virtio-fs
>>>>> is virtual, we may be able to get around what is otherwise a
>>>>> limitation on s390. My understanding of these areas is admittedly
>>>>> shallow, and since I'm not sure I'll have much more time to
>>>>> invest in the near future I decided to raise concern.
>>>>>
>>>>> Any opinions?
>>>>
>>>> Hi Halil,
>>>>
>>>> I don't understand s390 and how PCI works there as well. Is there any
>>>> other transport we can use there to map IO memory directly and access
>>>> using DAX?
>>>>
>>>> BTW, is DAX supported for s390.
>>>>
>>>> I am also hoping somebody who knows better can chip in. Till that time,
>>>> we could still use virtio-fs on s390 without DAX.
>>>
>>> s390 has so-called "limited" dax support, see CONFIG_FS_DAX_LIMITED.
>>> In practice that means that support for PTE_DEVMAP is missing which
>>> means no get_user_pages() support for dax mappings. Effectively it's
>>> only useful for execute-in-place as operations like fork() and ptrace
>>> of dax mappings will fail.
>>
>>
>> This is only true for the dcssblk device driver (drivers/s390/block/dcssblk.c
>> and arch/s390/mm/extmem.c).
>>
>> For what its worth, the dcssblk looks to Linux like normal memory (just above the
>> previously detected memory) that can be used like normal memory. In previous time
>> we even had struct pages for this memory - this was removed long ago (when it was
>> still xip) to reduce the memory footprint for large dcss blocks and small memory
>> guests.
>> Can the CONFIG_FS_DAX_LIMITED go away if we have struct pages for that memory?
>>
>> Now some observations:
>> - dcssblk is z/VM only (not KVM)
>> - Setting CONFIG_FS_DAX_LIMITED globally as a Kconfig option depending on wether
>>   a device driver is compiled in or not seems not flexible enough in case if you
>>   have device driver that does have struct pages and another one that doesn't
>> - I do not see a reason why we should not be able to map anything from QEMU
>>   into the guest real memory via an additional KVM memory slot.
>>   We would need to handle that in the guest somehow (and not as a PCI bar),
>>   register this with struct pages etc.
>> - we must then look how we can create the link between the guest memory and the
>>   virtio-fs driver. For virtio-ccw we might be able to add a new ccw command or
>>   whatever. Maybe we could also piggy-back on some memory hotplug work from David
>>   Hildenbrand (add cc).
>>
>> Regarding limitations on the platform:
>> - while we do have PCI, the virtio devices are usually plugged via the ccw bus.
>>   That implies no PCI bars. I assume you use those PCI bars only to implicitely
>>   have the location of the shared memory
>>   Correct?
>
> Right.

So in essence we just have to provide a vm_get_shm_region callback in the
virtio-ccw guest code?

How many regions do we have to support? One region per device? Or many?
Even if we need more, this should be possible with 2 new CCWs, e.g.
READ_SHM_BASE(id) and READ_SHM_SIZE(id).

>
>> - no real memory mapped I/O. Instead there are instructions that work on the mmio.
>>   As I understand things, this is of no concern regarding virtio-fs as you do not
>>   need mmio in the sense that a memory access of the guest to such an address
>>   triggers an exit. You just need the shared memory as a mean to have the data
>>   inside the guest. Any notification is done via normal virtqueue mechanisms
>>   Correct?
>
> Yep.
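
To make the two-CCW idea above concrete, here is a minimal user-space sketch in C of how such a lookup could be structured. Everything in it is an assumption for illustration: READ_SHM_BASE and READ_SHM_SIZE are only the names floated in this mail, the command code values are invented, fake_ccw_io() is a hypothetical stand-in for real channel I/O, and struct shm_region and ccw_get_shm_region() are not the kernel's types or the virtio-ccw driver's API.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Guest-physical base address and length of one shared memory region. */
struct shm_region {
	uint64_t addr;
	uint64_t len;
};

/* Hypothetical CCW command codes; names from the mail, values invented. */
enum {
	CCW_CMD_READ_SHM_BASE = 0x01,
	CCW_CMD_READ_SHM_SIZE = 0x02,
};

/* Stand-in for what the host (e.g. QEMU) would report per region id. */
static const struct shm_region host_regions[] = {
	{ 0x100000000ULL, 0x40000000ULL },	/* id 0: 1 GiB window above 4 GiB */
};

/* Hypothetical channel I/O: returns 0 on success, -1 on error. */
static int fake_ccw_io(int cmd, uint8_t id, uint64_t *out)
{
	if (id >= sizeof(host_regions) / sizeof(host_regions[0]))
		return -1;

	switch (cmd) {
	case CCW_CMD_READ_SHM_BASE:
		*out = host_regions[id].addr;
		return 0;
	case CCW_CMD_READ_SHM_SIZE:
		*out = host_regions[id].len;
		return 0;
	default:
		return -1;
	}
}

/*
 * Sketch of a get_shm_region-style callback for a ccw transport:
 * ask for base and size by region id, report false if the region
 * does not exist or is empty.
 */
static bool ccw_get_shm_region(struct shm_region *region, uint8_t id)
{
	uint64_t base, size;

	if (fake_ccw_io(CCW_CMD_READ_SHM_BASE, id, &base) != 0)
		return false;
	if (fake_ccw_io(CCW_CMD_READ_SHM_SIZE, id, &size) != 0)
		return false;
	if (size == 0)
		return false;

	region->addr = base;
	region->len = size;
	return true;
}
```

Passing the region id with each command is what would let the same pair of CCWs cover one region per device or many; a caller would probe ids until the lookup fails.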