From: Jan Kiszka <jan.kiszka@siemens.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: kvm <kvm@vger.kernel.org>
Subject: Re: kvm device assignment and MSI-X masking
Date: Tue, 14 Aug 2012 17:47:04 +0200
Message-ID: <502A72F8.4000502@siemens.com>
In-Reply-To: <1344958466.4683.260.camel@ul30vt.home>

On 2012-08-14 17:34, Alex Williamson wrote:
> On Tue, 2012-08-14 at 17:15 +0200, Jan Kiszka wrote:
>> On 2012-08-14 16:31, Alex Williamson wrote:
>>> On Tue, 2012-08-14 at 16:10 +0200, Jan Kiszka wrote:
>>>> On 2012-08-14 16:05, Alex Williamson wrote:
>>>>> On Tue, 2012-08-14 at 15:48 +0200, Jan Kiszka wrote:
>>>>>> Hi Alex,
>>>>>>
>>>>>> you once wrote this comment in device-assignment.c, msix_mmio_write:
>>>>>>
>>>>>>     if (!msix_masked(&orig) && msix_masked(entry)) {
>>>>>>         /*
>>>>>>          * Vector masked, disable it
>>>>>>          *
>>>>>>          * XXX It's not clear if we can or should actually attempt
>>>>>>          * to mask or disable the interrupt.  KVM doesn't have
>>>>>>          * support for pending bits and kvm_assign_set_msix_entry
>>>>>>          * doesn't modify the device hardware mask.  Interrupts
>>>>>>          * while masked are simply not injected to the guest, so
>>>>>>          * are lost.  Can we get away with always injecting an
>>>>>>          * interrupt on unmask?
>>>>>>          */
>>>>>>
>>>>>> I'm wondering what made you think that we won't inject if the vector is
>>>>>> masked like this (i.e. in the shadow MSI-X table). Can you recall the
>>>>>> details?
>>>>>>
>>>>>> I'm trying to refactor this code so that the KVM interface encapsulates
>>>>>> the kernel interface details a bit better, not to fix anything. Still,
>>>>>> I would also like to avoid introducing regressions.
>>>>>
>>>>> Yeah, I didn't leave a very good comment there.  I'm sure it made more
>>>>> sense to me at the time.  I think I was trying to say that not only do
>>>>> we not have a way to mask the physical hardware, but if we did, we don't
>>>>> have a way to retrieve the pending bits, so any pending interrupts while
>>>>> masked would be lost.  We might be able to deal with that by posting a
>>>>> spurious interrupt on unmask, but for now we do nothing as masking is
>>>>> usually done just to update the vector.  Thanks,
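
(Sketch, for illustration, of the "inject on unmask" idea described above;
this is not what device-assignment.c does, and the choice of
kvm_irqchip_send_msi() as the injection path is an assumption:)

    } else if (msix_masked(&orig) && !msix_masked(entry)) {
        /* Vector unmasked: keep the existing routing update, then... */

        /*
         * We cannot read the physical pending bits for the time the vector
         * was masked, so conservatively deliver one interrupt now.  A
         * spurious MSI is harmless for a well-behaved driver; a lost one
         * is not.
         */
        MSIMessage msg = {
            .address = ((uint64_t)entry->addr_hi << 32) | entry->addr_lo,
            .data    = entry->data,
        };
        kvm_irqchip_send_msi(kvm_state, msg);
    }
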
>>>>
>>>> Ok, thanks for the clarification.
>>>>
>>>> As we are at it, do you also recall if this
>>>>
>>>> --- a/hw/device-assignment.c
>>>> +++ b/hw/device-assignment.c
>>>> @@ -1573,28 +1573,7 @@ static void msix_mmio_write(void *opaque, target_phys_addr_t addr,
>>>>               */
>>>>          } else if (msix_masked(&orig) && !msix_masked(entry)) {
>>>>              /* Vector unmasked */
>>>> -            if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
>>>> -                /* Previously unassigned vector, start from scratch */
>>>> -                assigned_dev_update_msix(pdev);
>>>> -                return;
>>>> -            } else {
>>>> -                /* Update an existing, previously masked vector */
>>>> -                struct kvm_irq_routing_entry orig = adev->entry[i];
>>>> -                int ret;
>>>> -
>>>> -                adev->entry[i].u.msi.address_lo = entry->addr_lo;
>>>> -                adev->entry[i].u.msi.address_hi = entry->addr_hi;
>>>> -                adev->entry[i].u.msi.data = entry->data;
>>>> -
>>>> -                ret = kvm_update_routing_entry(&orig, &adev->entry[i]);
>>>> -                if (ret) {
>>>> -                    fprintf(stderr,
>>>> -                            "Error updating irq routing entry (%d)\n", ret);
>>>> -                    return;
>>>> -                }
>>>> -
>>>> -                kvm_irqchip_commit_routes(kvm_state);
>>>> -            }
>>>> +            assigned_dev_update_msix(pdev);
>>>>          }
>>>>      }
>>>>  }
>>>>
>>>> would make a relevant difference for known workloads? I'm trying to get
>>>> rid of direct routing table manipulations, but I would also like to
>>>> avoid introducing things like kvm_irqchip_update_msi_route unless really
>>>> necessary. Or could VFIO make use of that as well?
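
(Sketch of how that per-vector path could look with such a helper; the
kvm_irqchip_update_msi_route(KVMState *, int virq, MSIMessage) signature and
the question of who commits the routing table are assumptions here:)

    } else if (msix_masked(&orig) && !msix_masked(entry)) {
        /* Vector unmasked */
        if (i >= adev->irq_entries_nr || !adev->entry[i].type) {
            /* Previously unassigned vector, start from scratch */
            assigned_dev_update_msix(pdev);
            return;
        }
        /* Update the existing, previously masked vector in place */
        MSIMessage msg = {
            .address = ((uint64_t)entry->addr_hi << 32) | entry->addr_lo,
            .data    = entry->data,
        };
        if (kvm_irqchip_update_msi_route(kvm_state,
                                         adev->entry[i].gsi, msg) < 0) {
            fprintf(stderr, "Error updating MSI-X route for vector %d\n", i);
            return;
        }
        /*
         * Depending on whether the helper commits internally, a
         * kvm_irqchip_commit_routes(kvm_state) may still be needed here.
         */
    }

This would keep the cheap single-vector path while hiding the
kvm_irq_routing_entry details behind the helper.
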
>>>
>>> It makes me a little nervous, but I don't know that it won't work.
>>> There's a lot more latency in turning off MSI-X and completely
>>> rebuilding it than there is in updating the routing of a single vector.
>>> You can imagine that irqbalance could be triggering this path pretty
>>> regularly.  Increasing vectors beyond what was previously set up is more
>>> of an init-time event, so the latency doesn't bother me as much.  We'd
>>> probably have to send some spurious interrupts for anything we might
>>> have missed if we take the high latency path.
>>
>> Yeah, good points.
>>
>>>
>>> VFIO is already a little more abstracted, making use of the msix vector
>>> use and release interface, but we do still make use of the kvm_irqchip
>>> irqfd/virq interfaces.
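
(For reference, roughly how that wiring looks on the VFIO side; the
registration below reflects the qemu-msix notifier interface as I remember it
and may not match the tree exactly:)

    /* Hand per-vector lifecycle over to qemu-msix: it calls back when the
     * guest unmasks (use) or masks/disables (release) a vector. */
    ret = msix_set_vector_notifiers(&vdev->pdev,
                                    vfio_msix_vector_use,
                                    vfio_msix_vector_release);
    if (ret) {
        error_report("vfio: msix_set_vector_notifiers failed");
    }
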
>>
>> Hmm, but due to the nature of the callbacks, we always disable/re-enable
>> on mask/unmask. So VFIO will be slower than current device assignment in
>> this regard.
> 
> It's a bit awkward; I'm not thrilled with those msix callbacks, but they
> seem to work.  I have a similar comment in vfio_msix_vector_release()
> suggesting that maybe we should just disable direct injection on mask so
> that qemu-msix can do the masking and fill in the PBA.
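
(Sketching what that could look like; the field and helper names below are
approximations of the vfio code of that era, not a tested patch:)

    static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr)
    {
        VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
        VFIOMSIVector *vector = &vdev->msi_vectors[nr];

        /*
         * Instead of tearing the vector down, detach the irqfd so the
         * interrupt no longer bypasses QEMU.  Bounced through a
         * QEMU-handled eventfd, msix_notify() observes the guest's mask
         * bit and sets the emulated PBA bit rather than injecting.
         */
        if (vector->virq >= 0) {
            kvm_irqchip_remove_irqfd_notifier(kvm_state, &vector->interrupt,
                                              vector->virq);
            qemu_set_fd_handler(event_notifier_get_fd(&vector->interrupt),
                                vfio_msi_interrupt, NULL, vector);
        }
    }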

That will require an enhancement of the callback mechanism. So far it does
not allow us to tell per-vector masking apart from general disabling. When
the latter happens, we still want to release resources, I suppose.

With such an enhancement in place, we could even consider keeping the VIRQ
and MSI route active (and providing a route update service) to avoid the
tear-down/recreate overhead of fast mask/unmask cycles, e.g. during IRQ
migration.
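
To make that concrete, one hypothetical shape for such an enhancement (none
of this exists in qemu-msix, it is purely an illustration of the idea):

    /*
     * Hypothetical: tell the device model *why* a vector is released, so it
     * can keep its virq and MSI route across a plain mask/unmask cycle and
     * only do a full teardown when MSI-X is really disabled.
     */
    typedef enum {
        MSIX_VECTOR_MASKED,    /* per-vector mask bit was set */
        MSIX_VECTOR_DISABLED,  /* MSI-X function disabled or vector unused */
    } MSIXVectorReleaseReason;

    typedef void (*MSIVectorReleaseNotifier)(PCIDevice *dev,
                                             unsigned int vector,
                                             MSIXVectorReleaseReason reason);

    static void vfio_msix_vector_release(PCIDevice *pdev, unsigned int nr,
                                         MSIXVectorReleaseReason reason)
    {
        if (reason == MSIX_VECTOR_MASKED) {
            /* Keep virq and route; just stop bypassing QEMU so the emulated
             * mask/PBA logic takes over until the vector is unmasked. */
        } else {
            /* Full teardown: release irqfd, virq and routing entry. */
        }
    }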

> 
>> BTW, how do you handle the device's PBA? Pass it through to the guest?
> 
> We could, but I'm trying to use the qemu-msix infrastructure, which
> handles the PBA.  We've been working happily w/o good PBA support for so
> long that I haven't bothered to work on a channel to get to the physical
> PBA yet.

I think bouncing should be OK performance-wise - until some strange
guest pops up that actually polls the PBA in high-load scenarios.
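
(If such a channel ever becomes necessary, one possible shape, assuming a
per-device PBA poll hook were added to qemu-msix and msix_set_pending() were
exported; read_hw_pba_bit() is a placeholder for however the physical PBA
would be reached, e.g. a mapping of the device's PBA region:)

    static void vfio_msix_pba_poll(PCIDevice *pdev, unsigned int vector_start,
                                   unsigned int count)
    {
        VFIODevice *vdev = DO_UPCAST(VFIODevice, pdev, pdev);
        unsigned int i;

        /* Called on a guest PBA read: copy set hardware pending bits into
         * the emulated PBA so that a polling guest sees fresh state.
         * read_hw_pba_bit() is a placeholder, not an existing helper. */
        for (i = vector_start; i < vector_start + count; i++) {
            if (read_hw_pba_bit(vdev, i)) {
                msix_set_pending(pdev, i);
            }
        }
    }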

Jan

-- 
Siemens AG, Corporate Technology, CT RTC ITP SDP-DE
Corporate Competence Center Embedded Linux
