From: Auger Eric <eric.auger@redhat.com>
To: Peter Xu <peterx@redhat.com>, qemu-devel@nongnu.org
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	Alex Williamson <alex.williamson@redhat.com>,
	Cornelia Huck <cohuck@redhat.com>
Subject: Re: [PATCH v1.1 4/5] KVM: Kick resamplefd for split kernel irqchip
Date: Thu, 27 Feb 2020 18:42:09 +0100
Message-ID: <5e69c385-b27c-61d4-5856-21bcf5e9c8f5@redhat.com>
In-Reply-To: <20200227170048.236599-1-peterx@redhat.com>

Hi Peter,

On 2/27/20 6:00 PM, Peter Xu wrote:
> This mainly concerns x86, because that is the only architecture that
> supports split irqchip for now.
> 
> When the irqchip is split, we face a dilemma: KVM irqfd will be
> enabled, but the slow irqchip is still running in userspace.
> This means that the resamplefd in the kernel irqfds won't take any
> effect, and INTx interrupts can fail to be acked on EOI.
Won't it always fail to ack INTx? As written, the sentence above
suggests it can work sometimes.
> 
> One example is split irqchip with VFIO INTx, which will break if we
> use the VFIO INTx fast path.
> 
> This patch can potentially support the VFIO fast path again for INTx:
> IRQ delivery will still use the fast path, and we don't
> need to trap MMIOs in QEMU for the device to emulate the EOIs (see the
> callers of vfio_eoi() hook).  However, the EOI of the INTx will still
> need to be done from userspace, by caching all the resamplefds in
> QEMU and kicking them properly on IOAPIC EOI broadcast.
If I understand correctly, this is a one-way fast path? The fast path
covers only the trigger side (VFIO -> KVM), not the deactivation side,
which is trapped by the userspace IOAPIC, where you directly notify the
UNMASK eventfd from userspace. Is that correct?
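
To make sure we mean the same thing by "notify from userspace", here is
a minimal sketch, assuming Linux eventfd(2) semantics, of what kicking
and draining an eventfd amounts to. The helper names kick_eventfd() and
drain_eventfd() are hypothetical and are not QEMU functions; in QEMU the
kick would presumably go through event_notifier_set(), as in the patch.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/*
 * Kick an eventfd from userspace by adding 1 to its counter.
 * Returns 0 on success, -1 on failure.
 */
static int kick_eventfd(int fd)
{
    uint64_t one = 1;

    assert(fd >= 0);
    return write(fd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/*
 * Drain a non-blocking eventfd. Returns the accumulated counter value,
 * or 0 if nothing was pending (the read fails with EAGAIN in that case).
 */
static uint64_t drain_eventfd(int fd)
{
    uint64_t count = 0;

    assert(fd >= 0);
    if (read(fd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
        return 0;
    }
    return count;
}
```

On the deactivation side the consumer of the fd would be VFIO rather
than a read() in the same process; the drain helper is only there to
show that several kicks accumulate in the counter until it is read.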
> 
> When userspace is responsible for kicking the resamplefd, don't
> register it on the kvm_irqfd anymore, because on newer kernels (after
> commit 654f1f13ea56, 5.2+) KVM_IRQFD will fail when given both split
> irqchip and resamplefd.  This makes sure that the fast path will
> work for all supported kernels.
> 
> https://patchwork.kernel.org/patch/10738541/#22609933
> 
> Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> v1.1 changelog:
> - when resamplefd is going to be kicked from userspace, don't register
>   it again in KVM_IRQFD.  Tested against upstream kernel.
> 
>  accel/kvm/kvm-all.c    | 74 ++++++++++++++++++++++++++++++++++++++++--
>  accel/kvm/trace-events |  1 +
>  hw/intc/ioapic.c       | 11 +++++--
>  include/sysemu/kvm.h   |  4 +++
>  4 files changed, 86 insertions(+), 4 deletions(-)
> 
> diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
> index d49b74512a..b766b6e93c 100644
> --- a/accel/kvm/kvm-all.c
> +++ b/accel/kvm/kvm-all.c
> @@ -159,9 +159,62 @@ static const KVMCapabilityInfo kvm_required_capabilites[] = {
>  static NotifierList kvm_irqchip_change_notifiers =
>      NOTIFIER_LIST_INITIALIZER(kvm_irqchip_change_notifiers);
>  
> +struct KVMResampleFd {
> +    int gsi;
> +    EventNotifier *resample_event;
> +    QLIST_ENTRY(KVMResampleFd) node;
> +};
> +typedef struct KVMResampleFd KVMResampleFd;
> +
> +/*
> + * Only used with split irqchip where we need to do the resample fd
> + * kick for the kernel from userspace.
> + */
> +static QLIST_HEAD(, KVMResampleFd) kvm_resample_fd_list =
> +    QLIST_HEAD_INITIALIZER(kvm_resample_fd_list);
> +
>  #define kvm_slots_lock(kml)      qemu_mutex_lock(&(kml)->slots_lock)
>  #define kvm_slots_unlock(kml)    qemu_mutex_unlock(&(kml)->slots_lock)
>  
> +static inline void kvm_resample_fd_remove(int gsi)
> +{
> +    KVMResampleFd *rfd;
> +
> +    QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> +        if (rfd->gsi == gsi) {
> +            QLIST_REMOVE(rfd, node);
> +            break;
> +        }
> +    }
> +}
> +
> +static inline void kvm_resample_fd_insert(int gsi, EventNotifier *event)
> +{
> +    KVMResampleFd *rfd = g_new0(KVMResampleFd, 1);
> +
> +    rfd->gsi = gsi;
> +    rfd->resample_event = event;
> +
> +    QLIST_INSERT_HEAD(&kvm_resample_fd_list, rfd, node);
> +}
> +
> +void kvm_resample_fd_notify(int gsi)
> +{
> +    KVMResampleFd *rfd;
> +
> +    if (!kvm_irqchip_is_split()) {
> +        return;
> +    }
> +
> +    QLIST_FOREACH(rfd, &kvm_resample_fd_list, node) {
> +        if (rfd->gsi == gsi) {
> +            event_notifier_set(rfd->resample_event);
> +            trace_kvm_resample_fd_notify(gsi);
> +            break;
> +        }
> +    }
> +}
> +
>  int kvm_get_max_memslots(void)
>  {
>      KVMState *s = KVM_STATE(current_accel());
> @@ -1642,8 +1695,25 @@ static int kvm_irqchip_assign_irqfd(KVMState *s, EventNotifier *event,
>      };
>  
>      if (rfd != -1) {
> -        irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
> -        irqfd.resamplefd = rfd;
> +        assert(assign);
> +        if (kvm_irqchip_is_split()) {
> +            /*
> +             * When the slow irqchip (e.g. IOAPIC) is in the
> +             * userspace, resamplefd will not work because the EOI of
> +             * the interrupt will be delivered to userspace instead,
s/delivered to userspace/handled in userspace
> +             * the KVM resample fd kick is skipped.  The userspace
> +             * needs to remember the resamplefd and kick it when we
> +             * receive EOI of this IRQ.
In practice we are now talking about a classical VFIO ACTION_UNMASK
eventfd. As such, isn't it a bit weird to handle those normal UNMASK
eventfds in the KVM code?
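
For reference, a minimal sketch of the bookkeeping in question, assuming
a plain singly linked list keyed by GSI and raw file descriptors instead
of EventNotifier. The names resample_insert(), resample_remove() and
resample_notify() are hypothetical; they are not the patch's functions
nor any QEMU API. Nothing in it depends on KVM, which is the point of
the question above. Note that the sketch frees the entry on removal,
which the quoted kvm_resample_fd_remove() does not appear to do.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* One cached resample fd, keyed by GSI (hypothetical layout). */
struct resample_entry {
    int gsi;
    int fd;                        /* fd to kick when the EOI is seen */
    struct resample_entry *next;
};

static struct resample_entry *resample_list;

/* Remember fd for gsi. Returns 0 on success, -1 on allocation failure. */
static int resample_insert(int gsi, int fd)
{
    struct resample_entry *e = calloc(1, sizeof(*e));

    if (!e) {
        return -1;
    }
    e->gsi = gsi;
    e->fd = fd;
    e->next = resample_list;
    resample_list = e;
    return 0;
}

/* Forget the entry for gsi and free it. Returns 1 if found, 0 otherwise. */
static int resample_remove(int gsi)
{
    struct resample_entry **pp;

    for (pp = &resample_list; *pp; pp = &(*pp)->next) {
        if ((*pp)->gsi == gsi) {
            struct resample_entry *e = *pp;

            *pp = e->next;
            free(e);
            return 1;
        }
    }
    return 0;
}

/*
 * Kick the fd cached for gsi by writing a 64-bit 1 to it.
 * Returns 1 if kicked, 0 if no entry exists, -1 if the write failed.
 */
static int resample_notify(int gsi)
{
    struct resample_entry *e;
    uint64_t one = 1;

    for (e = resample_list; e; e = e->next) {
        if (e->gsi == gsi) {
            ssize_t n = write(e->fd, &one, sizeof(one));

            return n == (ssize_t)sizeof(one) ? 1 : -1;
        }
    }
    return 0;
}
```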


> +             */
> +            kvm_resample_fd_insert(virq, resample);
> +        } else {
> +            irqfd.flags |= KVM_IRQFD_FLAG_RESAMPLE;
> +            irqfd.resamplefd = rfd;
> +        }
> +    } else if (!assign) {
> +        if (kvm_irqchip_is_split()) {
> +            kvm_resample_fd_remove(virq);
> +        }
>      }
>  
>      if (!kvm_irqfds_enabled()) {
> diff --git a/accel/kvm/trace-events b/accel/kvm/trace-events
> index 4fb6e59d19..a68eb66534 100644
> --- a/accel/kvm/trace-events
> +++ b/accel/kvm/trace-events
> @@ -16,4 +16,5 @@ kvm_set_ioeventfd_mmio(int fd, uint64_t addr, uint32_t val, bool assign, uint32_
>  kvm_set_ioeventfd_pio(int fd, uint16_t addr, uint32_t val, bool assign, uint32_t size, bool datamatch) "fd: %d @0x%x val=0x%x assign: %d size: %d match: %d"
>  kvm_set_user_memory(uint32_t slot, uint32_t flags, uint64_t guest_phys_addr, uint64_t memory_size, uint64_t userspace_addr, int ret) "Slot#%d flags=0x%x gpa=0x%"PRIx64 " size=0x%"PRIx64 " ua=0x%"PRIx64 " ret=%d"
>  kvm_clear_dirty_log(uint32_t slot, uint64_t start, uint32_t size) "slot#%"PRId32" start 0x%"PRIx64" size 0x%"PRIx32
> +kvm_resample_fd_notify(int gsi) "gsi %d"
>  
> diff --git a/hw/intc/ioapic.c b/hw/intc/ioapic.c
> index 15747fe2c2..8c75465c62 100644
> --- a/hw/intc/ioapic.c
> +++ b/hw/intc/ioapic.c
> @@ -236,8 +236,15 @@ void ioapic_eoi_broadcast(int vector)
>          for (n = 0; n < IOAPIC_NUM_PINS; n++) {
>              entry = s->ioredtbl[n];
>  
> -            if ((entry & IOAPIC_VECTOR_MASK) != vector ||
> -                ((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1) != IOAPIC_TRIGGER_LEVEL) {
> +            if ((entry & IOAPIC_VECTOR_MASK) != vector) {
> +                continue;
> +            }
> +
> +            /* Kick resamplefd if KVM is bypassed */
> +            kvm_resample_fd_notify(n);
KVM is bypassed on the deactivation path, yet we still call
kvm_resample_fd_notify().
> +
> +            if (((entry >> IOAPIC_LVT_TRIGGER_MODE_SHIFT) & 1) !=
> +                IOAPIC_TRIGGER_LEVEL) {
>                  continue;
>              }
>  
> diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
> index 141342de98..b67552c047 100644
> --- a/include/sysemu/kvm.h
> +++ b/include/sysemu/kvm.h
> @@ -555,4 +555,8 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void *source);
>  int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target);
>  struct ppc_radix_page_info *kvm_get_radix_page_info(void);
>  int kvm_get_max_memslots(void);
> +
> +/* Notify resamplefd for EOI of specific level triggered interrupts */
> +void kvm_resample_fd_notify(int gsi);
> +
>  #endif
> 
Thanks

Eric


