kvm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jason Wang <jasowang@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Elena Afanasova <eafanasova@gmail.com>,
	kvm@vger.kernel.org, jag.raman@oracle.com,
	elena.ufimtseva@oracle.com
Subject: Re: [RFC 1/2] KVM: add initial support for KVM_SET_IOREGION
Date: Wed, 13 Jan 2021 10:38:29 +0800	[thread overview]
Message-ID: <e22eaf2b-15f6-5b41-75a8-0e9b24e84e16@redhat.com> (raw)
In-Reply-To: <20210107175311.GA168426@stefanha-x1.localdomain>


On 2021/1/8 上午1:53, Stefan Hajnoczi wrote:
> On Thu, Jan 07, 2021 at 11:30:47AM +0800, Jason Wang wrote:
>> On 2021/1/6 下午11:05, Stefan Hajnoczi wrote:
>>> On Wed, Jan 06, 2021 at 01:21:43PM +0800, Jason Wang wrote:
>>>> On 2021/1/5 下午6:25, Stefan Hajnoczi wrote:
>>>>> On Tue, Jan 05, 2021 at 11:53:01AM +0800, Jason Wang wrote:
>>>>>> On 2021/1/5 上午8:02, Elena Afanasova wrote:
>>>>>>> On Mon, 2021-01-04 at 13:34 +0800, Jason Wang wrote:
>>>>>>>> On 2021/1/4 上午4:32, Elena Afanasova wrote:
>>>>>>>>> On Thu, 2020-12-31 at 11:45 +0800, Jason Wang wrote:
>>>>>>>>>> On 2020/12/29 下午6:02, Elena Afanasova wrote:
>>>>>>>>>>> This vm ioctl adds or removes an ioregionfd MMIO/PIO region.
>>>>>>>>>> How about FAST_MMIO?
>>>>>>>>>>
>>>>>>>>> I’ll add KVM_IOREGION_FAST_MMIO flag support. So this may be
>>>>>>>>> suitable
>>>>>>>>> for triggers which could use posted writes. The struct
>>>>>>>>> ioregionfd_cmd
>>>>>>>>> size bits and the data field will be unused in this case.
>>>>>>>> Note that eventfd checks for length and have datamatch support. Do
>>>>>>>> we
>>>>>>>> need to do something similar.
>>>>>>>>
>>>>>>> Do you think datamatch support is necessary for ioregionfd?
>>>>>> I'm not sure. But if we don't have this support, it probably means we can't
>>>>>> use eventfd for ioregionfd.
>>>>> This is an interesting question because ioregionfd and ioeventfd have
>>>>> different semantics. While it would be great to support all ioeventfd
>>>>> features in ioregionfd, I'm not sure that is possible. I think ioeventfd
>>>>> will remain useful for devices that only need a doorbell write register.
>>>>>
>>>>> The differences:
>>>>>
>>>>> 1. ioeventfd has datamatch. This could be implemented in ioregionfd so
>>>>>       that a datamatch failure results in the classic ioctl(KVM_RETURN)
>>>>>       MMIO/PIO exit reason and the VMM can handle the access.
>>>>>
>>>>>       I'm not sure if this feature is useful though. Most of the time
>>>>>       ioregionfd users want to handle all accesses to the region and the
>>>>>       VMM may not even know how to handle register accesses because they
>>>>>       can only be handled in a dedicated thread or an out-of-process
>>>>>       device.
>>>> It's about whether or not the current semantic of ioregion is sufficient for
>>>> implementing doorbell.
>>>>
>>>> E.g in the case of virtio, the virtqueue index is encoded in the write to
>>>> the doorbell. And if a single MMIO area is used for all virtqueues,
>>>> datamatch is probably a must in this case.
>>> struct ioregionfd_cmd contains not just the register offset, but also
>>> the value written by the guest. Therefore datamatch is not necessary.
>>>
>>> Datamatch would only be useful as some kind of more complex optimization
>>> where different values writtent to the same register dispatch to
>>> different fds.
>>
>> That's exactly the case of virtio. Consider queue 1,2 shares the MMIO
>> register. We need use datamatch to dispatch the notification to different
>> eventfds.
> I can see two options without datamatch:
>
> 1. If both virtqueues are handled by the same userspace thread then only
>     1 fd is needed. ioregionfd sends the value written to the register,
>     so userspace is able to distinguish between the virtqueues.


Right.


>
> 2. If separate userspace threads process the virtqueues, then set up the
>     virtio-pci capabilities so the virtqueues have separate notification
>     registers:
>     https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-1150004


Right. But this works only when PCI transport is used and queue index 
could be deduced from the register address (separated doorbell).

If we use MMIO or sharing the doorbell registers among all the 
virtqueues (multiplexer is zero in the above case) , it can't work 
without datamatch.


>
> With ioeventfd 2 fds are needed in case #1 because the data value
> written to the register is not communicated to userspace. But ioregionfd
> does not have this limitation, so I'm not sure whether datamatch is
> really needed in ioregionfd?
>
> Or is there a use case that I missed?
>
>>>>>>>> I guess the idea is to have a generic interface to let eventfd work
>>>>>>>> for
>>>>>>>> ioregion as well.
>>>>>>>>
>>>>>>> It seems that posted writes is the only "fast" case in ioregionfd. So I
>>>>>>> was thinking about using FAST_MMIO for this case only. Maybe in some
>>>>>>> cases it will be better to just use ioeventfd. But I'm not sure.
>>>>>> To be a generic infrastructure, it's better to have this, but we can listen
>>>>>> from the opinion of others.
>>>>> I think we want both FAST_MMIO and regular MMIO options for posted
>>>>> writes:
>>>>>
>>>>> 1. FAST_MMIO - ioregionfd_cmd size and data fields are zero and do not
>>>>>       contain information about the nature of the guest access. This is
>>>>>       fine for ioeventfd doorbell style registers because we don't need
>>>>>       that information.
>>>> Is FAST_MMIO always for doorbell? If not, we probably need the size and
>>>> data.
>>> My understanding is that FAST_MMIO only provides the guest physical
>>> address and no additional information. In fact, I'm not even sure if we
>>> know whether the access is a read or a write.
>>>
>>> So there is extremely limited information to work with and it's
>>> basically only useful for doorbell writes.
>>>
>>>>> 2. Regular MMIO - ioregionfd_cmd size and data fields contain valid data
>>>>>       about the nature of the guest access. This is needed when the device
>>>>>       register is more than a simple "kick" doorbell. For example, if the
>>>>>       device needs to know the value that the guest wrote.
>>>>>
>>>>> I suggest defining an additional KVM_SET_IOREGION flag called
>>>>> KVM_IOREGION_FAST_MMIO that can be set together with
>>>>> KVM_IOREGION_POSTED_WRITES.
>>>> If we need to expose FAST_MMIO to userspace, we probably need to define its
>>>> semantics which is probably not easy since it's an architecture
>>>> optimization.
>>> Maybe the name KVM_IOREGION_FAST_MMIO name should be changed to
>>> something more specific like KVM_IOREGION_OFFSET_ONLY, meaning that only
>>> the offset field is valid.
>>
>> Or we can do like what eventfd did, implies FAST_MMIO when memory_size is
>> zero (kvm_assign_ioeventfd()):
>>
>>      if (!args->len && bus_idx == KVM_MMIO_BUS) {
>>          ret = kvm_assign_ioeventfd_idx(kvm, KVM_FAST_MMIO_BUS, args);
>>          if (ret < 0)
>>              goto fast_fail;
>>      }
> Yes!
>
>>> I haven't checked if and how other architectures implement FAST_MMIO,
>>> but they will at least be able to provide the offset :).
>>>
>>>>> KVM_IOREGION_PIO cannot be used together with KVM_IOREGION_FAST_MMIO.
>>>>>
>>>>> In theory KVM_IOREGION_POSTED_WRITES doesn't need to be set with
>>>>> KVM_IOREGION_FAST_MMIO. Userspace would have to send back a struct
>>>>> ioregionfd_resp to acknowledge that the write has been handled.
>>>> Right, and it also depends on whether or not the hardware support (e.g
>>>> whether or not it can decode the instructions).
>>> The KVM_IOREGION_FAST_MMIO flag should be documented as an optimization
>>> hint. If hardware doesn't support FAST_MMIO then struct ioregionfd_cmd
>>> will contain all fields. Userspace will be able to process the cmd
>>> either way.
>>
>> You mean always have a fallback to MMIO for FAST_MMIO? That should be fine
>> but looks less optimal than the implying FAST_MMIO for zero length. I still
>> think introducing "FAST_MMIO" may bring confusion for users ...
> Regarding the fallback, my understanding is that ioeventfds are always
> placed on both the MMIO and FAST_MMIO bus when len is zero. That way
> architectures that don't support FAST_MMIO will still dispatch those
> ioeventfds. In virt/kvm/eventfd.c:kvm_assign_ioeventfd():
>
>    ret = kvm_assign_ioeventfd_idx(kvm, bus_idx, args);
>    ...
>    if (!args->len && bus_idx == KVM_MMIO_BUS) {
>        ret = kvm_assign_ioeventfd_idx(kvm, KVM_FAST_MMIO_BUS, args);
>
> So ioeventfd is already doing this fallback thing.
>
> Let's follow ioeventfd:
> 1. len=0 means the size/data fields are not needed. Userspace cannot
>     rely on these fields being valid.
> 2. There is an automatic fallback to the slow MMIO bus so that slow path
>     accesses are still detected by the ioregion.
>
> The explicit KVM_IOREGION_FAST_MMIO flag I mentioned is not needed.


Agreed.

Thanks


>
> Stefan


  reply	other threads:[~2021-01-13  2:40 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-12-29 10:02 [RFC 0/2] Introduce MMIO/PIO dispatch file descriptors (ioregionfd) Elena Afanasova
2020-12-29 10:02 ` [RFC 1/2] KVM: add initial support for KVM_SET_IOREGION Elena Afanasova
2020-12-29 11:36   ` Stefan Hajnoczi
2020-12-30 12:14     ` Elena Afanasova
2020-12-31  3:45   ` Jason Wang
2021-01-03 20:32     ` Elena Afanasova
2021-01-04  5:34       ` Jason Wang
2021-01-05  0:02         ` Elena Afanasova
2021-01-05  3:53           ` Jason Wang
2021-01-05 10:25             ` Stefan Hajnoczi
2021-01-06  5:21               ` Jason Wang
2021-01-06 15:05                 ` Stefan Hajnoczi
2021-01-07  3:30                   ` Jason Wang
2021-01-07 17:53                     ` Stefan Hajnoczi
2021-01-13  2:38                       ` Jason Wang [this message]
2021-01-13 15:52                         ` Stefan Hajnoczi
2021-01-14  4:05                           ` Jason Wang
2021-01-14 16:16                             ` Stefan Hajnoczi
2021-01-15  3:41                               ` Jason Wang
2020-12-29 10:02 ` [RFC 2/2] KVM: add initial support for ioregionfd blocking read/write operations Elena Afanasova
2020-12-29 12:00   ` Stefan Hajnoczi
2020-12-30 12:24     ` Elena Afanasova
2020-12-31  3:46   ` Jason Wang
2021-01-03 20:37     ` Elena Afanasova
2021-01-04  5:37       ` Jason Wang
2021-01-05  0:06         ` Elena Afanasova
2020-12-29 12:06 ` [RFC 0/2] Introduce MMIO/PIO dispatch file descriptors (ioregionfd) Stefan Hajnoczi
2020-12-30 17:56   ` Elena Afanasova

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=e22eaf2b-15f6-5b41-75a8-0e9b24e84e16@redhat.com \
    --to=jasowang@redhat.com \
    --cc=eafanasova@gmail.com \
    --cc=elena.ufimtseva@oracle.com \
    --cc=jag.raman@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).