All of lore.kernel.org
 help / color / mirror / Atom feed
From: Anthony Liguori <aliguori@us.ibm.com>
To: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Peter Maydell <peter.maydell@linaro.org>,
	kvm@vger.kernel.org, virtualization@lists.linux-foundation.org,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	KONRAD Frederic <fred.konrad@greensocs.com>
Subject: Re: [PATCH RFC] virtio-pci: new config layout: using memory BAR
Date: Wed, 05 Jun 2013 15:42:57 -0500	[thread overview]
Message-ID: <87k3m8qofi.fsf@codemonkey.ws> (raw)
In-Reply-To: <20130605194317.GA30923@redhat.com>

"Michael S. Tsirkin" <mst@redhat.com> writes:

> On Wed, Jun 05, 2013 at 01:57:16PM -0500, Anthony Liguori wrote:
>> "Michael S. Tsirkin" <mst@redhat.com> writes:
>> 
>> > On Wed, Jun 05, 2013 at 10:46:15AM -0500, Anthony Liguori wrote:
>> >> Look, it's very simple.
>> > We only need to do it if we do a change that breaks guests.
>> >
>> > Please find a guest that is broken by the patches. You won't find any.
>> 
>> I think the problem in this whole discussion is that we're talking past
>> each other.
>> 
>> Here is my understanding:
>> 
>> 1) PCI-e says that you must be able to disable IO bars and still have a
>> functioning device.
>> 
>> 2) It says (1) because you must size IO bars to 4096 which means that
>> practically speaking, once you enable a dozen or so PIO bars, you run
>> out of PIO space (16 * 4k == 64k and not all that space can be used).
>
>
> Let me add 3 other issues which I mentioned and you seem to miss:
>
> 3) architectures which don't have fast access to IO ports, exist
>    virtio does not work there ATM

Which architectures have PCI but no IO ports?

> 4) setups with many PCI bridges exist and have the same issue
>    as PCI express. virtio does not work there ATM

This is not virtio specific.  This is true for all devices that use IO.

> 5) On x86, even with nested page tables, firmware only decodes
>    the page address on an invalid PTE, not the data. You need to
>    emulate the guest to get at the data. Without
>    nested page tables, we have to do page table walk and emulate
>    to get both address and data. Since this is how MMIO
>    is implemented in kvm on x86, MMIO is much slower than PIO
>    (with nested page tables by a factor of >2, did not test without).

Am well aware of this, this is why we use PIO.

I fully agree with you that when we do MMIO, we should switch the
notification mechanism to avoid encoding anything meaningful as data.

>> virtio-pci uses a IO bars exclusively today.  Existing guest drivers
>> assume that there is an IO bar that contains the virtio-pci registers.
>> So let's consider the following scenarios:
>> 
>> QEMU of today:
>> 
>> 1) qemu -drive file=ubuntu-13.04.img,if=virtio
>> 
>> This works today.  Does adding an MMIO bar at BAR1 break this?
>> Certainly not if the device is behind a PCI bus...
>> 
>> But are we going to put devices behind a PCI-e bus by default?  Are we
>> going to ask the user to choose whether devices are put behind a legacy
>> bus or the express bus?
>> 
>> What happens if we put the device behind a PCI-e bus by default?  Well,
>> it can still work.  That is, until we do something like this:
>> 
>> 2) qemu -drive file=ubuntu-13.04.img,if=virtio -device virtio-rng
>>         -device virtio-balloon..
>> 
>> Such that we have more than a dozen or so devices.  This works
>> perfectly fine today.  It works fine because we've designed virtio to
>> make sure it works fine.  Quoting the spec:
>> 
>> "Configuration space is generally used for rarely-changing or
>>  initialization-time parameters. But it is a limited resource, so it
>>  might be better to use a virtqueue to update configuration information
>>  (the network device does this for filtering, otherwise the table in the
>>  config space could potentially be very large)."
>> 
>> In fact, we can have 100s of PCI devices today without running out of IO
>> space because we're so careful about this.
>> 
>> So if we switch to using PCI-e by default *and* we keep virtio-pci
>> without modifying the device IDs, then very frequently we are going to
>> break existing guests because the drivers they already have no longer
>> work.
>> 
>> A few virtio-serial channels, a few block devices, a couple of network
>> adapters, the balloon and RNG driver, and we hit the IO space limit
>> pretty damn quickly so this is not a contrived scenario at all.  I would
>> expect that we frequently run into this if we don't address this problem.
>> 
>> So we have a few options:
>> 1) Punt all of this complexity to libvirt et al and watch people make
>>    the wrong decisions about when to use PCI-e.  This will become yet
>>    another example of KVM being too hard to configure.
>> 
>> 2) Enable PCI-e by default and just force people to upgrade their
>>    drivers.
>> 
>> 3) Don't use PCI-e by default but still add BAR1 to virtio-pci
>> 
>> 4) Do virtio-pcie, make it PCI-e friendly (drop the IO BAR completely),
>
> We can't do this - it will hurt performance.

Can you explain?  I thought the whole trick with separating out the
virtqueue notification register was to regain the performance?

>>    give
>>    it a new device/vendor ID.   Continue to use virtio-pci for existing
>>    devices potentially adding virtio-{net,blk,...}-pcie variants for
>>    people that care to use them.
>> 
>> I think 1 == 2 == 3 and I view 2 as an ABI breaker.
>
> Why do you think 2 == 3? 2 changes default behaviour. 3 does not.

It doesn't change the default behavior but then we're pushing the
decision of when to use pci-e to the user.  They have to understand that
there can be subtle breakages because the virtio-pci driver may not work
if they are using an old guest.

>> libvirt does like
>> policy so they're going to make a simple decision and always use the
>> same bus by default.  I suspect if we made PCI the default, they might
>> just always set the PCI-e flag just because.
>
> This sounds very strange. But let's assume you are right for
> the sake of the argument ...
>
>> There are hundreds of thousands if not millions of guests with existing
>> virtio-pci drivers.  Forcing them to upgrade better have an extremely
>> good justification.
>> 
>> I think 4 is the best path forward.  It's better for users (guests
>> continue to work as they always have).  There's less confusion about
>> enabling PCI-e support--you must ask for the virtio-pcie variant and you
>> must have a virtio-pcie driver.  It's easy to explain.
>
> I don't think how this changes the situation. libvirt still need
> to set policy and decide which device to use.

But virtio-pcie never exhausts the IO configuration space.  That's the
difference.

And virtio-pcie is a separate driver so presumably libvirt will make
that visible in the XML.  In fact, it should.

>> It also maps to what regular hardware does.  I highly doubt that there
>> are any real PCI cards that made the shift from PCI to PCI-e without
>> bumping at least a revision ID.
>
> Only because the chance it's 100% compatible on the software level is 0.
> It always has some hardware specific quirks.
> No such excuse here.
>
>> It also means we don't need to play games about sometimes enabling IO
>> bars and sometimes not.
>
> This last paragraph is wrong, it ignores the issues 3) to 5) 
> I added above.
>
> If you do take them into account:
> 	- there are reasons to add MMIO BAR to PCI,
> 	  even without PCI express

So far, the only reason you've provided is "it doesn't work on some
architectures."  Which architectures?

> 	- we won't be able to drop IO BAR from virtio

An IO BAR is useless if it means we can't have more than 12 devices.

>
>> Regards,
>> 
>> Anthony Liguori
>> 
>> >
>> >
>> > -- 
>> > MST
>> > --
>> > To unsubscribe from this list: send the line "unsubscribe kvm" in
>> > the body of a message to majordomo@vger.kernel.org
>> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

  parent reply	other threads:[~2013-06-05 20:42 UTC|newest]

Thread overview: 109+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-05-28 16:03 [PATCH RFC] virtio-pci: new config layout: using memory BAR Michael S. Tsirkin
2013-05-28 17:15 ` Anthony Liguori
2013-05-28 17:15 ` Anthony Liguori
2013-05-28 17:32   ` Michael S. Tsirkin
2013-05-28 17:43     ` Paolo Bonzini
2013-05-29  2:02       ` Laszlo Ersek
2013-05-29  2:02         ` Laszlo Ersek
2013-05-29  4:33       ` Rusty Russell
2013-05-29  4:33       ` Rusty Russell
2013-05-29  7:27         ` Paolo Bonzini
2013-05-29  8:05           ` Michael S. Tsirkin
2013-05-29 10:07           ` Laszlo Ersek
2013-05-28 18:53     ` Anthony Liguori
2013-05-28 19:27       ` Michael S. Tsirkin
2013-05-29  4:31   ` Rusty Russell
2013-05-29  4:31     ` Rusty Russell
2013-05-29  8:24     ` Michael S. Tsirkin
2013-05-29  8:52       ` Paolo Bonzini
2013-05-29  9:00       ` Peter Maydell
2013-05-29 10:08         ` Michael S. Tsirkin
2013-05-29 10:53           ` Peter Maydell
2013-05-29 12:16             ` Michael S. Tsirkin
2013-05-29 12:28               ` Paolo Bonzini
2013-05-29 12:37                 ` Michael S. Tsirkin
2013-05-29 12:52     ` Anthony Liguori
2013-05-29 12:52       ` Anthony Liguori
2013-05-29 13:24       ` Michael S. Tsirkin
2013-05-29 13:35         ` Peter Maydell
2013-05-29 13:35         ` Peter Maydell
2013-05-29 13:41         ` Paolo Bonzini
2013-05-29 14:02           ` Michael S. Tsirkin
2013-05-29 14:18           ` Anthony Liguori
2013-05-29 14:18           ` Anthony Liguori
2013-05-30  7:43           ` Michael S. Tsirkin
2013-05-29 14:16         ` Anthony Liguori
2013-05-29 14:16         ` Anthony Liguori
2013-05-29 14:30           ` Michael S. Tsirkin
2013-05-29 14:32             ` Paolo Bonzini
2013-05-29 14:52               ` Michael S. Tsirkin
2013-05-29 14:55             ` Anthony Liguori
2013-05-29 16:12               ` Michael S. Tsirkin
2013-05-29 18:16               ` Michael S. Tsirkin
2013-05-29 14:55             ` Anthony Liguori
2013-05-30  3:58       ` Rusty Russell
2013-05-30  3:58         ` Rusty Russell
2013-05-30  5:55         ` Michael S. Tsirkin
2013-05-30  7:55         ` Michael S. Tsirkin
2013-06-03  0:17           ` Rusty Russell
2013-05-30 13:53         ` Anthony Liguori
2013-05-30 13:53           ` Anthony Liguori
2013-05-30 14:01           ` Michael S. Tsirkin
2013-06-03  0:26             ` Rusty Russell
2013-06-03 10:11               ` Michael S. Tsirkin
2013-06-04  5:31                 ` Rusty Russell
2013-06-04  6:42                   ` Michael S. Tsirkin
2013-06-05  7:19                     ` Rusty Russell
2013-06-05 10:22                       ` Michael S. Tsirkin
2013-06-05 12:59                     ` Anthony Liguori
2013-06-05 14:09                       ` Michael S. Tsirkin
2013-06-05 15:08                         ` Anthony Liguori
2013-06-05 15:19                           ` Michael S. Tsirkin
2013-06-05 15:46                             ` Anthony Liguori
2013-06-05 15:46                             ` Anthony Liguori
2013-06-05 16:20                               ` Michael S. Tsirkin
2013-06-05 18:57                                 ` Anthony Liguori
2013-06-05 19:43                                   ` Michael S. Tsirkin
2013-06-05 19:52                                     ` Michael S. Tsirkin
2013-06-05 20:45                                       ` Anthony Liguori
2013-06-05 21:15                                         ` H. Peter Anvin
2013-06-05 21:15                                         ` Michael S. Tsirkin
2013-06-05 20:42                                     ` Anthony Liguori [this message]
2013-06-05 21:14                                       ` Michael S. Tsirkin
2013-06-05 21:53                                         ` Anthony Liguori
2013-06-05 21:53                                         ` Anthony Liguori
2013-06-05 22:19                                           ` Benjamin Herrenschmidt
2013-06-05 22:53                                             ` Anthony Liguori
2013-06-05 23:27                                               ` Benjamin Herrenschmidt
2013-06-05 19:54                                   ` Michael S. Tsirkin
2013-06-06  3:42                                   ` Rusty Russell
2013-06-06 14:59                                     ` Anthony Liguori
2013-06-07  1:58                                       ` Rusty Russell
2013-06-07  8:25                                       ` Peter Maydell
2013-06-05 21:10                                 ` H. Peter Anvin
2013-06-05 21:17                                   ` Michael S. Tsirkin
2013-06-05 21:50                                   ` Anthony Liguori
2013-06-05 21:55                                     ` H. Peter Anvin
2013-06-05 22:08                                       ` Anthony Liguori
2013-06-05 23:07                                         ` H. Peter Anvin
2013-06-06  0:41                                           ` Anthony Liguori
2013-06-06  6:34                                             ` Gleb Natapov
2013-06-06 13:53                                               ` H. Peter Anvin
2013-06-06 15:02                                               ` Anthony Liguori
2013-06-07 11:30                                                 ` Gleb Natapov
2013-06-11  7:10                                                 ` Michael S. Tsirkin
2013-06-11  7:53                                                   ` Gleb Natapov
2013-06-11  8:02                                                     ` Michael S. Tsirkin
2013-06-11  8:03                                                       ` Gleb Natapov
2013-06-11  8:19                                                         ` Michael S. Tsirkin
2013-06-11  8:22                                                           ` Gleb Natapov
2013-06-11  8:30                                                             ` Michael S. Tsirkin
2013-06-11  8:32                                                               ` Gleb Natapov
2013-06-11  8:04                                                       ` Michael S. Tsirkin
2013-06-06 15:02                                               ` Anthony Liguori
2013-06-06 15:06                                               ` Gerd Hoffmann
2013-06-06 15:10                                                 ` Gleb Natapov
2013-06-06 15:19                                                   ` H. Peter Anvin
2013-06-06 15:22                                                   ` Gerd Hoffmann
2013-07-08  4:25                                                   ` Kevin O'Connor
2013-06-06  8:02                   ` Michael S. Tsirkin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87k3m8qofi.fsf@codemonkey.ws \
    --to=aliguori@us.ibm.com \
    --cc=fred.konrad@greensocs.com \
    --cc=kvm@vger.kernel.org \
    --cc=mst@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peter.maydell@linaro.org \
    --cc=stefanha@redhat.com \
    --cc=virtualization@lists.linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.