KVM Archive on lore.kernel.org
From: Halil Pasic <pasic@linux.ibm.com>
To: Cornelia Huck <cohuck@redhat.com>
Cc: Vivek Goyal <vgoyal@redhat.com>,
	linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm@vger.kernel.org, linux-nvdimm@lists.01.org,
	miklos@szeredi.hu, stefanha@redhat.com, dgilbert@redhat.com,
	swhiteho@redhat.com, Sebastian Ott <sebott@linux.ibm.com>,
	Christian Borntraeger <borntraeger@de.ibm.com>,
	Collin Walling <walling@linux.ibm.com>,
	David Hildenbrand <david@redhat.com>
Subject: Re: [PATCH v2 18/30] virtio_fs, dax: Set up virtio_fs dax_device
Date: Thu, 18 Jul 2019 13:20:49 +0200
Message-ID: <20190718132049.37bea675.pasic@linux.ibm.com>
In-Reply-To: <20190718110417.561f6475.cohuck@redhat.com>

On Thu, 18 Jul 2019 11:04:17 +0200
Cornelia Huck <cohuck@redhat.com> wrote:

> On Wed, 17 Jul 2019 19:27:25 +0200
> Halil Pasic <pasic@linux.ibm.com> wrote:
> 
> > On Wed, 15 May 2019 15:27:03 -0400
> > Vivek Goyal <vgoyal@redhat.com> wrote:
> > 
> > > From: Stefan Hajnoczi <stefanha@redhat.com>
> > > 
> > > Setup a dax device.
> > > 
> > > Use the shm capability to find the cache entry and map it.
> > > 
> > > The DAX window is accessed by the fs/dax.c infrastructure and must have
> > > struct pages (at least on x86).  Use devm_memremap_pages() to map the
> > > DAX window PCI BAR and allocate struct page.
> > >  
> > 
> > Sorry for being this late. I don't see any more recent version so I will
> > comment here.
> 
> [Yeah, this one has been sitting in my to-review queue far too long as
> well :(]
> 
> > 
> > I'm trying to figure out how this is supposed to work on s390. My concern
> > is that on s390 PCI memory needs to be accessed by special
> > instructions. This is taken care of by the stuff defined in
> > arch/s390/include/asm/io.h. E.g. we 'override' __raw_writew so it uses
> > the appropriate s390 instruction. However if the code does not use the
> > linux abstractions for accessing PCI memory, but assumes it can be
> > accessed like RAM, we have a problem.
> > 
> > Looking at this patch, it seems to me, that we might end up with exactly
> > the case described. For example AFAICT copy_to_iter() (3) resolves to
> > the function in lib/iov_iter.c which does not seem to cater for s390
> > oddities.
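
To make the concern above concrete, here is a minimal userspace sketch (not
kernel code). It assumes a hypothetical `struct dev_window` whose contents
may only be touched through an accessor, `window_readb()`, which stands in
for the role the arch/s390/include/asm/io.h helpers play. All names are
made up for illustration. A copy that treats the window like RAM returns the
same bytes in this model, but it never goes through the accessor, which is
exactly the situation that would be a problem on s390.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a device window. On a platform like the one
 * described above, every access is supposed to go through an accessor. */
struct dev_window {
	uint8_t backing[64];	/* stands in for the PCI BAR contents */
	size_t accessor_calls;	/* number of accesses made via the accessor */
};

/* Stand-in for an arch-provided MMIO read helper (assumption: the real
 * helper would issue the special instruction instead of a plain load). */
static uint8_t window_readb(struct dev_window *w, size_t off)
{
	assert(off < sizeof(w->backing));
	w->accessor_calls++;
	return w->backing[off];
}

/* Copy out of the window using only the accessor. */
static void copy_from_window(struct dev_window *w, size_t off,
			     void *dst, size_t len)
{
	uint8_t *d = dst;
	size_t i;

	for (i = 0; i < len; i++)
		d[i] = window_readb(w, off + i);
}

/* Copy that treats the window like ordinary RAM. It bypasses the
 * accessor entirely, so accessor_calls does not change. */
static void copy_like_ram(struct dev_window *w, size_t off,
			  void *dst, size_t len)
{
	assert(off <= sizeof(w->backing) && len <= sizeof(w->backing) - off);
	memcpy(dst, w->backing + off, len);
}
```

Calling copy_from_window() for four bytes bumps accessor_calls by four,
while copy_like_ram() leaves the counter untouched. In the model both paths
read the same data; on hardware that requires special instructions, only the
first path would be valid.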
> 
> What about the new pci instructions recently introduced? Not sure how
> they differ from the old ones (which are currently the only ones
> supported in QEMU...), but I'm pretty sure they are supposed to solve
> an issue :)
> 

I'm struggling to find the connection between this topic and the new pci
instructions. Can you please explain in more detail?

> > 
> > I didn't have the time to investigate this properly, and since virtio-fs
> > is virtual, we may be able to get around what is otherwise a
> > limitation on s390. My understanding of these areas is admittedly
> > shallow, and since I'm not sure I'll have much more time to
> > invest in the near future I decided to raise concern.
> > 
> > Any opinions?
> 
> Let me point to the thread starting at
> https://marc.info/?l=linux-s390&m=155048406205221&w=2 as well. That
> memory region stuff is still unsolved for ccw, and I'm not sure if we
> need to do something for zpci as well.
> 

Right, virtio-ccw is another problem, but at least there we don't have the
need to limit ourselves to a very specific set of instructions (for
accessing memory).

zPCI, i.e. virtio-pci on z, should require much less dedicated love, if any
at all. Unfortunately I'm not very knowledgeable on either PCI in general
or its s390 variant.

> Does s390 work with DAX at all? ISTR that DAX evolved from XIP, so I
> thought it did?
> 

Documentation/filesystems/dax.txt even mentions dcssblk: s390 dcss block
device driver as a source of inspiration. So I suppose it does work.

Regards,
Halil

> > 
> > [CCing some s390 people who are probably more knowledgeable than me
> > on these matters.]
> > 
> > Regards,
> > Halil
> > 
> > 
> > > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > > Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Signed-off-by: Vivek Goyal <vgoyal@redhat.com>
> > > Signed-off-by: Sebastien Boeuf <sebastien.boeuf@intel.com>
> > > Signed-off-by: Liu Bo <bo.liu@linux.alibaba.com>
> > > ---  
> > 
> > [..]
> >   
> > > +/* Map a window offset to a page frame number.  The window offset will have
> > > + * been produced by .iomap_begin(), which maps a file offset to a window
> > > + * offset.
> > > + */
> > > +static long virtio_fs_direct_access(struct dax_device *dax_dev, pgoff_t pgoff,
> > > +				    long nr_pages, void **kaddr, pfn_t *pfn)
> > > +{
> > > +	struct virtio_fs *fs = dax_get_private(dax_dev);
> > > +	phys_addr_t offset = PFN_PHYS(pgoff);
> > > +	size_t max_nr_pages = fs->window_len/PAGE_SIZE - pgoff;
> > > +
> > > +	if (kaddr)
> > > +		*kaddr = fs->window_kaddr + offset;  
> > 
> > (2) Here we use fs->window_kaddr, basically directing the access to
> > the virtio shared memory region.
> > 
> > > +	if (pfn)
> > > +		*pfn = phys_to_pfn_t(fs->window_phys_addr + offset,
> > > +					PFN_DEV | PFN_MAP);
> > > +	return nr_pages > max_nr_pages ? max_nr_pages : nr_pages;
> > > +}
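
The offset arithmetic in virtio_fs_direct_access() can be sketched in plain
userspace C. This is only a model under stated assumptions: a 4 KiB page
size, a hypothetical `struct model_window`, and a hypothetical function name.
The explicit range check on pgoff is an addition of this sketch and is not
part of the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT 12	/* assumption: 4 KiB pages */
#define MODEL_PAGE_SIZE  ((uint64_t)1 << MODEL_PAGE_SHIFT)

/* Hypothetical stand-in for the window fields kept in struct virtio_fs. */
struct model_window {
	uint64_t phys_addr;	/* start of the DAX window */
	uint64_t len;		/* window length in bytes */
};

/* Translate a page offset into the window to a physical address and
 * clamp the requested page count to what is left in the window.
 * Returns the number of usable pages, or -1 if the request is out of
 * range (that check is this sketch's own addition). */
static long model_direct_access(const struct model_window *w, uint64_t pgoff,
				long nr_pages, uint64_t *phys)
{
	uint64_t total_pages;
	uint64_t max_nr_pages;

	assert(w != NULL);
	total_pages = w->len / MODEL_PAGE_SIZE;
	if (nr_pages < 0 || pgoff >= total_pages)
		return -1;

	max_nr_pages = total_pages - pgoff;
	if (phys)
		*phys = w->phys_addr + (pgoff << MODEL_PAGE_SHIFT);

	return (uint64_t)nr_pages > max_nr_pages ? (long)max_nr_pages
						 : nr_pages;
}
```

For a 16-page window, asking for 8 pages at page offset 14 yields 2 pages,
because only two pages remain before the end of the window.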
> > > +
> > > +static size_t virtio_fs_copy_from_iter(struct dax_device *dax_dev,
> > > +				       pgoff_t pgoff, void *addr,
> > > +				       size_t bytes, struct iov_iter *i)
> > > +{
> > > +	return copy_from_iter(addr, bytes, i);
> > > +}
> > > +
> > > +static size_t virtio_fs_copy_to_iter(struct dax_device *dax_dev,
> > > +				       pgoff_t pgoff, void *addr,
> > > +				       size_t bytes, struct iov_iter *i)
> > > +{
> > > +	return copy_to_iter(addr, bytes, i);  
> > 
> > (3) And this should be the access to it, which does not seem to use the
> > Linux abstractions for accessing PCI memory.
> > 
> > > +}
> > > +
> > > +static const struct dax_operations virtio_fs_dax_ops = {
> > > +	.direct_access = virtio_fs_direct_access,
> > > +	.copy_from_iter = virtio_fs_copy_from_iter,
> > > +	.copy_to_iter = virtio_fs_copy_to_iter,
> > > +};
> > > +
> > > +static void virtio_fs_percpu_release(struct percpu_ref *ref)
> > > +{
> > > +	struct virtio_fs_memremap_info *mi =
> > > +		container_of(ref, struct virtio_fs_memremap_info, ref);
> > > +
> > > +	complete(&mi->completion);
> > > +}
> > > +
> > > +static void virtio_fs_percpu_exit(void *data)
> > > +{
> > > +	struct virtio_fs_memremap_info *mi = data;
> > > +
> > > +	wait_for_completion(&mi->completion);
> > > +	percpu_ref_exit(&mi->ref);
> > > +}
> > > +
> > > +static void virtio_fs_percpu_kill(struct percpu_ref *ref)
> > > +{
> > > +	percpu_ref_kill(ref);
> > > +}
> > > +
> > > +static void virtio_fs_cleanup_dax(void *data)
> > > +{
> > > +	struct virtio_fs *fs = data;
> > > +
> > > +	kill_dax(fs->dax_dev);
> > > +	put_dax(fs->dax_dev);
> > > +}
> > > +
> > > +static int virtio_fs_setup_dax(struct virtio_device *vdev, struct virtio_fs *fs)
> > > +{
> > > +	struct virtio_shm_region cache_reg;
> > > +	struct virtio_fs_memremap_info *mi;
> > > +	struct dev_pagemap *pgmap;
> > > +	bool have_cache;
> > > +	int ret;
> > > +
> > > +	if (!IS_ENABLED(CONFIG_DAX_DRIVER))
> > > +		return 0;
> > > +
> > > +	/* Get cache region */
> > > +	have_cache = virtio_get_shm_region(vdev,
> > > +					   &cache_reg,
> > > +					   (u8)VIRTIO_FS_SHMCAP_ID_CACHE);
> > > +	if (!have_cache) {
> > > +		dev_err(&vdev->dev, "%s: No cache capability\n", __func__);
> > > +		return -ENXIO;
> > > +	} else {
> > > +		dev_notice(&vdev->dev, "Cache len: 0x%llx @ 0x%llx\n",
> > > +			   cache_reg.len, cache_reg.addr);
> > > +	}
> > > +
> > > +	mi = devm_kzalloc(&vdev->dev, sizeof(*mi), GFP_KERNEL);
> > > +	if (!mi)
> > > +		return -ENOMEM;
> > > +
> > > +	init_completion(&mi->completion);
> > > +	ret = percpu_ref_init(&mi->ref, virtio_fs_percpu_release, 0,
> > > +			      GFP_KERNEL);
> > > +	if (ret < 0) {
> > > +		dev_err(&vdev->dev, "%s: percpu_ref_init failed (%d)\n",
> > > +			__func__, ret);
> > > +		return ret;
> > > +	}
> > > +
> > > +	ret = devm_add_action(&vdev->dev, virtio_fs_percpu_exit, mi);
> > > +	if (ret < 0) {
> > > +		percpu_ref_exit(&mi->ref);
> > > +		return ret;
> > > +	}
> > > +
> > > +	pgmap = &mi->pgmap;
> > > +	pgmap->altmap_valid = false;
> > > +	pgmap->ref = &mi->ref;
> > > +	pgmap->kill = virtio_fs_percpu_kill;
> > > +	pgmap->type = MEMORY_DEVICE_FS_DAX;
> > > +
> > > +	/* Ideally we would directly use the PCI BAR resource but
> > > +	 * devm_memremap_pages() wants its own copy in pgmap.  So
> > > +	 * initialize a struct resource from scratch (only the start
> > > +	 * and end fields will be used).
> > > +	 */
> > > +	pgmap->res = (struct resource){
> > > +		.name = "virtio-fs dax window",
> > > +		.start = (phys_addr_t) cache_reg.addr,
> > > +		.end = (phys_addr_t) cache_reg.addr + cache_reg.len - 1,
> > > +	};
> > > +
> > > +	fs->window_kaddr = devm_memremap_pages(&vdev->dev, pgmap);
> > 
> > (1) Here we assign fs->window_kaddr basically from the virtio shm
> > region.
> > 
> > > +	if (IS_ERR(fs->window_kaddr))
> > > +		return PTR_ERR(fs->window_kaddr);
> > > +
> > > +	fs->window_phys_addr = (phys_addr_t) cache_reg.addr;
> > > +	fs->window_len = (phys_addr_t) cache_reg.len;
> > > +
> > > +	dev_dbg(&vdev->dev, "%s: window kaddr 0x%px phys_addr 0x%llx"
> > > +		" len 0x%llx\n", __func__, fs->window_kaddr, cache_reg.addr,
> > > +		cache_reg.len);
> > > +
> > > +	fs->dax_dev = alloc_dax(fs, NULL, &virtio_fs_dax_ops);
> > > +	if (!fs->dax_dev)
> > > +		return -ENOMEM;
> > > +
> > > +	return devm_add_action_or_reset(&vdev->dev, virtio_fs_cleanup_dax, fs);
> > > +}
> > > +  
> > 
> > [..]
> > 
> 

