kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Christian Borntraeger <borntraeger@de.ibm.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: "Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Dan Williams <dan.j.williams@intel.com>,
	Vivek Goyal <vgoyal@redhat.com>,
	Halil Pasic <pasic@linux.ibm.com>,
	Collin Walling <walling@linux.ibm.com>,
	Sebastian Ott <sebott@linux.ibm.com>,
	KVM list <kvm@vger.kernel.org>,
	Miklos Szeredi <miklos@szeredi.hu>,
	linux-nvdimm <linux-nvdimm@lists.01.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Steven Whitehouse <swhiteho@redhat.com>,
	Heiko Carstens <heiko.carstens@de.ibm.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device
Date: Mon, 22 Jul 2019 14:00:28 +0200	[thread overview]
Message-ID: <b8239073-4c40-0ce6-2576-9d71ca0b1c18@de.ibm.com> (raw)
In-Reply-To: <20190722134317.39b148ce.cohuck@redhat.com>



On 22.07.19 13:43, Cornelia Huck wrote:
> On Mon, 22 Jul 2019 13:20:18 +0200
> Christian Borntraeger <borntraeger@de.ibm.com> wrote:
> 
>> On 22.07.19 12:56, Dr. David Alan Gilbert wrote:
>>> * Christian Borntraeger (borntraeger@de.ibm.com) wrote:  
>>>>
>>>>
>>>> On 18.07.19 16:30, Dan Williams wrote:  
>>>>> On Thu, Jul 18, 2019 at 6:15 AM Vivek Goyal <vgoyal@redhat.com> wrote:  
>>>>>>
>>>>>> On Wed, Jul 17, 2019 at 07:27:25PM +0200, Halil Pasic wrote:  
>>>>>>> On Wed, 15 May 2019 15:27:03 -0400
>>>>>>> Vivek Goyal <vgoyal@redhat.com> wrote:
>>>>>>>  
>>>>>>>> From: Stefan Hajnoczi <stefanha@redhat.com>
>>>>>>>>
>>>>>>>> Setup a dax device.
>>>>>>>>
>>>>>>>> Use the shm capability to find the cache entry and map it.
>>>>>>>>
>>>>>>>> The DAX window is accessed by the fs/dax.c infrastructure and must have
>>>>>>>> struct pages (at least on x86).  Use devm_memremap_pages() to map the
>>>>>>>> DAX window PCI BAR and allocate struct page.
>>>>>>>>  
>>>>>>>
>>>>>>> Sorry for being this late. I don't see any more recent version so I will
>>>>>>> comment here.
>>>>>>>
>>>>>>> I'm trying to figure out how is this supposed to work on s390. My concern
>>>>>>> is, that on s390 PCI memory needs to be accessed by special
>>>>>>> instructions. This is taken care of by the stuff defined in
>>>>>>> arch/s390/include/asm/io.h. E.g. we 'override' __raw_writew so it uses
>>>>>>> the appropriate s390 instruction. However if the code does not use the
>>>>>>> linux abstractions for accessing PCI memory, but assumes it can be
>>>>>>> accessed like RAM, we have a problem.
>>>>>>>
>>>>>>> Looking at this patch, it seems to me, that we might end up with exactly
>>>>>>> the case described. For example AFAICT copy_to_iter() (3) resolves to
>>>>>>> the function in lib/iov_iter.c which does not seem to cater for s390
>>>>>>> oddities.
>>>>>>>
>>>>>>> I didn't have the time to investigate this properly, and since virtio-fs
>>>>>>> is virtual, we may be able to get around what is otherwise a
>>>>>>> limitation on s390. My understanding of these areas is admittedly
>>>>>>> shallow, and since I'm not sure I'll have much more time to
>>>>>>> invest in the near future I decided to raise concern.
>>>>>>>
>>>>>>> Any opinions?  
>>>>>>
>>>>>> Hi Halil,
>>>>>>
>>>>>> I don't understand s390 and how PCI works there as well. Is there any
>>>>>> other transport we can use there to map IO memory directly and access
>>>>>> using DAX?
>>>>>>
>>>>>> BTW, is DAX supported for s390.
>>>>>>
>>>>>> I am also hoping somebody who knows better can chip in. Till that time,
>>>>>> we could still use virtio-fs on s390 without DAX.  
>>>>>
>>>>> s390 has so-called "limited" dax support, see CONFIG_FS_DAX_LIMITED.
>>>>> In practice that means that support for PTE_DEVMAP is missing which
>>>>> means no get_user_pages() support for dax mappings. Effectively it's
>>>>> only useful for execute-in-place as operations like fork() and ptrace
>>>>> of dax mappings will fail.  
>>>>
>>>>
>>>> This is only true for the dcssblk device driver (drivers/s390/block/dcssblk.c
>>>> and arch/s390/mm/extmem.c). 
>>>>
>>>> For what its worth, the dcssblk looks to Linux like normal memory (just above the
>>>> previously detected memory) that can be used like normal memory. In previous time
>>>> we even had struct pages for this memory - this was removed long ago (when it was
>>>> still xip) to reduce the memory footprint for large dcss blocks and small memory
>>>> guests.
>>>> Can the CONFIG_FS_DAX_LIMITED go away if we have struct pages for that memory?
>>>>
>>>> Now some observations: 
>>>> - dcssblk is z/VM only (not KVM)
>>>> - Setting CONFIG_FS_DAX_LIMITED globally as a Kconfig option depending on wether
>>>>   a device driver is compiled in or not seems not flexible enough in case if you
>>>>   have device driver that does have struct pages and another one that doesn't
>>>> - I do not see a reason why we should not be able to map anything from QEMU
>>>>   into the guest real memory via an additional KVM memory slot. 
>>>>   We would need to handle that in the guest somehow (and not as a PCI bar),
>>>>   register this with struct pages etc.
> 
> You mean for ccw, right? I don't think we want pci to behave
> differently than everywhere else.

Yes for virtio-ccw. We would need to have a look at how virtio-ccw can create a memory
mapping with struct pages, so that DAX will work.(Dan, it is just struct pages that 
you need, correct?)


> 
>>>> - we must then look how we can create the link between the guest memory and the
>>>>   virtio-fs driver. For virtio-ccw we might be able to add a new ccw command or
>>>>   whatever. Maybe we could also piggy-back on some memory hotplug work from David
>>>>   Hildenbrand (add cc).
>>>>
>>>> Regarding limitations on the platform:
>>>> - while we do have PCI, the virtio devices are usually plugged via the ccw bus.
>>>>   That implies no PCI bars. I assume you use those PCI bars only to implicitely 
>>>>   have the location of the shared memory
>>>>   Correct?  
>>>
>>> Right.  
>>
>> So in essence we just have to provide a vm_get_shm_region callback in the virtio-ccw
>> guest code?
>>
>> How many regions do we have to support? One region per device? Or many?
>> Even if we need more, this should be possible with a 2 new CCWs, e.g READ_SHM_BASE(id)
>> and READ_SHM_SIZE(id)
> 
> I'd just add a single CCW with a control block containing id and size.
> 
> The main issue is where we put those regions, and what happens if we
> use both virtio-pci and virtio-ccw on the same machine.

Then these 2 devices should get independent memory regions that are added in an
independent (but still exclusive) way.
> 
>>
>>
>>>   
>>>> - no real memory mapped I/O. Instead there are instructions that work on the mmio.
>>>>   As I understand things, this is of no concern regarding virtio-fs as you do not
>>>>   need mmio in the sense that a memory access of the guest to such an address 
>>>>   triggers an exit. You just need the shared memory as a mean to have the data
>>>>   inside the guest. Any notification is done via normal virtqueue mechanisms
>>>>   Correct?  
>>>
>>> Yep.  
>>
> 


  reply	other threads:[~2019-07-22 12:00 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-05-15 19:26 [PATCH v2 00/30] [RFC] virtio-fs: shared file system for virtual machines Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 01/30] fuse: delete dentry if timeout is zero Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 02/30] fuse: Clear setuid bit even in cache=never path Vivek Goyal
2019-05-20 14:41   ` Miklos Szeredi
2019-05-20 14:44     ` Miklos Szeredi
2019-05-20 20:25       ` Nikolaus Rath
2019-05-21 15:01     ` Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 03/30] fuse: Use default_file_splice_read for direct IO Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 04/30] fuse: export fuse_end_request() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 05/30] fuse: export fuse_len_args() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 06/30] fuse: Export fuse_send_init_request() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 07/30] fuse: export fuse_get_unique() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 08/30] fuse: extract fuse_fill_super_common() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 09/30] fuse: add fuse_iqueue_ops callbacks Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 10/30] fuse: Separate fuse device allocation and installation in fuse_conn Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 11/30] virtio_fs: add skeleton virtio_fs.ko module Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 12/30] dax: remove block device dependencies Vivek Goyal
2019-05-16  0:21   ` Dan Williams
2019-05-16 10:07     ` Stefan Hajnoczi
2019-05-16 14:23     ` Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 13/30] dax: Pass dax_dev to dax_writeback_mapping_range() Vivek Goyal
2019-05-15 19:26 ` [PATCH v2 14/30] virtio: Add get_shm_region method Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 15/30] virtio: Implement get_shm_region for PCI transport Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 16/30] virtio: Implement get_shm_region for MMIO transport Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 17/30] fuse, dax: add fuse_conn->dax_dev field Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device Vivek Goyal
2019-07-17 17:27   ` Halil Pasic
2019-07-18  9:04     ` Cornelia Huck
2019-07-18 11:20       ` Halil Pasic
2019-07-18 14:47         ` Cornelia Huck
2019-07-18 13:15     ` Vivek Goyal
2019-07-18 14:30       ` Dan Williams
2019-07-22 10:51         ` Christian Borntraeger
2019-07-22 10:56           ` Dr. David Alan Gilbert
2019-07-22 11:20             ` Christian Borntraeger
2019-07-22 11:43               ` Cornelia Huck
2019-07-22 12:00                 ` Christian Borntraeger [this message]
2019-07-22 12:08                   ` David Hildenbrand
2019-07-29 13:20                     ` Stefan Hajnoczi
2019-05-15 19:27 ` [PATCH v2 19/30] fuse: Keep a list of free dax memory ranges Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 20/30] fuse: Introduce setupmapping/removemapping commands Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 21/30] fuse, dax: Implement dax read/write operations Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 22/30] fuse, dax: add DAX mmap support Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 23/30] fuse: Define dax address space operations Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 24/30] fuse, dax: Take ->i_mmap_sem lock during dax page fault Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 25/30] fuse: Maintain a list of busy elements Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 26/30] fuse: Add logic to free up a memory range Vivek Goyal
     [not found]   ` <CAN+Pk99SNKSf+GjSQUUWt_eu1fSjTy_ByUOEQUXHi8zNqXY1zA@mail.gmail.com>
2019-05-20 12:53     ` Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 27/30] fuse: Release file in process context Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 28/30] fuse: Reschedule dax free work if too many EAGAIN attempts Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 29/30] fuse: Take inode lock for dax inode truncation Vivek Goyal
2019-05-15 19:27 ` [PATCH v2 30/30] virtio-fs: Do not provide abort interface in fusectl Vivek Goyal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=b8239073-4c40-0ce6-2576-9d71ca0b1c18@de.ibm.com \
    --to=borntraeger@de.ibm.com \
    --cc=cohuck@redhat.com \
    --cc=dan.j.williams@intel.com \
    --cc=david@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=heiko.carstens@de.ibm.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-nvdimm@lists.01.org \
    --cc=miklos@szeredi.hu \
    --cc=pasic@linux.ibm.com \
    --cc=sebott@linux.ibm.com \
    --cc=stefanha@redhat.com \
    --cc=swhiteho@redhat.com \
    --cc=vgoyal@redhat.com \
    --cc=walling@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).