All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alex Williamson <alex.williamson@redhat.com>
To: Yongji Xie <xyjxie@linux.vnet.ibm.com>
Cc: kvm@vger.kernel.org, linux-kernel@vger.kernel.org,
	bhelgaas@google.com, aik@ozlabs.ru, benh@kernel.crashing.org,
	paulus@samba.org, mpe@ellerman.id.au, warrier@linux.vnet.ibm.com,
	zhong@linux.vnet.ibm.com, nikunj@linux.vnet.ibm.com,
	gwshan@linux.vnet.ibm.com, kevin.tian@intel.com
Subject: Re: [PATCH v4] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive
Date: Thu, 23 Jun 2016 21:37:00 -0600	[thread overview]
Message-ID: <20160623213700.2ccfe903@t450s.home> (raw)
In-Reply-To: <d0099613-a4c4-7440-25f9-f265f464901a@linux.vnet.ibm.com>

On Fri, 24 Jun 2016 10:52:58 +0800
Yongji Xie <xyjxie@linux.vnet.ibm.com> wrote:
> On 2016/6/24 0:12, Alex Williamson wrote:
> > On Mon, 30 May 2016 21:06:37 +0800
> > Yongji Xie <xyjxie@linux.vnet.ibm.com> wrote:
> >> +static void vfio_pci_probe_mmaps(struct vfio_pci_device *vdev)
> >> +{
> >> +	struct resource *res;
> >> +	int bar;
> >> +	struct vfio_pci_dummy_resource *dummy_res;
> >> +
> >> +	INIT_LIST_HEAD(&vdev->dummy_resources_list);
> >> +
> >> +	for (bar = PCI_STD_RESOURCES; bar <= PCI_STD_RESOURCE_END; bar++) {
> >> +		res = vdev->pdev->resource + bar;
> >> +
> >> +		if (!IS_ENABLED(CONFIG_VFIO_PCI_MMAP))
> >> +			goto no_mmap;
> >> +
> >> +		if (!(res->flags & IORESOURCE_MEM))
> >> +			goto no_mmap;
> >> +
> >> +		/*
> >> +		 * The PCI core shouldn't set up a resource with a
> >> +		 * type but zero size. But there may be bugs that
> >> +		 * cause us to do that.
> >> +		 */
> >> +		if (!resource_size(res))
> >> +			goto no_mmap;
> >> +
> >> +		if (resource_size(res) >= PAGE_SIZE) {
> >> +			vdev->bar_mmap_supported[bar] = true;
> >> +			continue;
> >> +		}
> >> +
> >> +		if (!(res->start & ~PAGE_MASK)) {
> >> +			/*
> >> +			 * Add a dummy resource to reserve the remainder
> >> +			 * of the exclusive page in case that hot-add
> >> +			 * device's bar is assigned into it.
> >> +			 */
> >> +			dummy_res = kzalloc(sizeof(*dummy_res), GFP_KERNEL);
> >> +			if (dummy_res == NULL)
> >> +				goto no_mmap;
> >> +
> >> +			dummy_res->resource.start = res->end + 1;
> >> +			dummy_res->resource.end = res->start + PAGE_SIZE - 1;
> >> +			dummy_res->resource.flags = res->flags;
> >> +			if (request_resource(res->parent,
> >> +						&dummy_res->resource)) {
> >> +				kfree(dummy_res);
> >> +				goto no_mmap;
> >> +			}  
> > Isn't it true that request_resource() only tells us that at a given
> > point in time, no other drivers have reserved that resource?  It seems
> > like it does not guarantee that the resource isn't routed to another
> > device or that another driver won't at some point attempt to request
> > that same resource.  So for example if a user constructs their initrd
> > to bind vfio-pci to devices before other modules load, this
> > request_resource() may succeed, at the expense of drivers loaded later
> > now failing.  The behavior will depend on driver load order and we're
> > not actually insuring that the overflow resource is unused, just that
> > we got it first.  Can we do better?  Am I missing something that
> > prevents this?  Thanks,
> >
> > Alex  
> 
> Couldn't PCI resources allocator prevent this, which will find a
> empty slot in the resource tree firstly, then try to request that
> resource in allocate_resource() when a PCI device is probed.
> And I'd like to know why a PCI device driver would attempt to
> call request_resource()? Should this be done in PCI enumeration?

Hi Yongji,

Looks like most pci drivers call pci_request_regions().  From there the
call path is:

pci_request_selected_regions
  __pci_request_selected_regions
    __pci_request_region
      __request_mem_region
        __request_region
          __request_resource

We see this driver ordering issue sometimes with users attempting to
blacklist native pci drivers, trying to leave a device free for use by
vfio-pci.  If the device is a graphics card, the generic vesa or uefi
driver can request device resources causing a failure when vfio-pci
tries to request those same resources.  I expect that unless it's a
boot device, like vga in my example, the resources are not enabled
until the driver opens the device, therefore the request_resource() call
doesn't occur until that point.

For another trivial example, look at /proc/iomem as you load and unload
a driver, on my laptop with e1000e unloaded I see:

  e1200000-e121ffff : 0000:00:19.0
  e123e000-e123efff : 0000:00:19.0

When e1000e is loaded, each of these becomes claimed by the e1000e
driver:

  e1200000-e121ffff : 0000:00:19.0
    e1200000-e121ffff : e1000e
  e123e000-e123efff : 0000:00:19.0
    e123e000-e123efff : e1000e

Clearly pci core knows the resource is associated with the device, but
I don't think we're tapping into that with request_resource(), we're
just potentially stealing resources that another driver might have
claimed otherwise as I described above.  That's my suspicion at
least, feel free to show otherwise if it's incorrect.  Thanks,

Alex

  reply	other threads:[~2016-06-24  3:37 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <201605301311.u4UD99he028186@mx0a-001b2d01.pphosted.com>
2016-06-22 22:04 ` [PATCH v4] vfio-pci: Allow to mmap sub-page MMIO BARs if the mmio page is exclusive Alex Williamson
2016-06-23  2:39   ` Yongji Xie
2016-06-23  2:54     ` Alex Williamson
2016-06-23 16:12 ` Alex Williamson
2016-06-24  2:52   ` Yongji Xie
2016-06-24  3:37     ` Alex Williamson [this message]
2016-06-24  4:06       ` Tian, Kevin
2016-06-24 15:37       ` Yongji Xie
2016-06-24 16:43         ` Alex Williamson
2016-06-28 10:09           ` Yongji Xie
2016-06-28 19:47             ` Alex Williamson
2016-06-29 20:03               ` Alex Williamson
2016-06-30  2:40                 ` Yongji Xie
2016-06-30  2:53                   ` Alex Williamson
2016-06-30  3:29                     ` Yongji Xie
2016-05-30 13:06 Yongji Xie
  -- strict thread matches above, loose matches on Subject: below --
2016-05-30 13:06 Yongji Xie

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160623213700.2ccfe903@t450s.home \
    --to=alex.williamson@redhat.com \
    --cc=aik@ozlabs.ru \
    --cc=benh@kernel.crashing.org \
    --cc=bhelgaas@google.com \
    --cc=gwshan@linux.vnet.ibm.com \
    --cc=kevin.tian@intel.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mpe@ellerman.id.au \
    --cc=nikunj@linux.vnet.ibm.com \
    --cc=paulus@samba.org \
    --cc=warrier@linux.vnet.ibm.com \
    --cc=xyjxie@linux.vnet.ibm.com \
    --cc=zhong@linux.vnet.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.