From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1753290AbdBIPMu (ORCPT <rfc822;w@1wt.eu>);
        Thu, 9 Feb 2017 10:12:50 -0500
Received: from mx1.redhat.com ([209.132.183.28]:52474 "EHLO mx1.redhat.com"
        rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
        id S1751836AbdBIPMs (ORCPT <rfc822;linux-kernel@vger.kernel.org>);
        Thu, 9 Feb 2017 10:12:48 -0500
Date: Thu, 9 Feb 2017 16:11:46 +0100
From: Radim =?utf-8?B?S3LEjW3DocWZ?= <rkrcmar@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org,
        "Wu, Feng" <feng.wu@intel.com>
Subject: Re: [PATCH 6/6] kvm: x86: do not use KVM_REQ_EVENT for APICv
 interrupt injection
Message-ID: <20170209151145.GK31091@potion>
References: <1482164232-130035-1-git-send-email-pbonzini@redhat.com>
 <1482164232-130035-7-git-send-email-pbonzini@redhat.com>
 <20170207195804.GA1473@potion>
 <d74d36b4-3376-2577-f81e-cb819e676fb4@redhat.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit
In-Reply-To: <d74d36b4-3376-2577-f81e-cb819e676fb4@redhat.com>
X-Greylist: Sender IP whitelisted, not delayed by milter-greylist-4.5.16 (mx1.redhat.com [10.5.110.28]); Thu, 09 Feb 2017 15:11:49 +0000 (UTC)
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

2017-02-08 17:23+0100, Paolo Bonzini:
> On 07/02/2017 20:58, Radim Krčmář wrote:
>>> -	local_irq_disable();
>>> +	if (kvm_lapic_enabled(vcpu)) {
>>> +		/*
>>> +		 * This handles the case where a posted interrupt was
>>> +		 * notified with kvm_vcpu_kick.
>>> +		 */
>>> +		if (kvm_x86_ops->sync_pir_to_irr)
>>> +			kvm_x86_ops->sync_pir_to_irr(vcpu);
>> Hm, this is not working well when nesting while L1 has assigned devices:
>> if the posted interrupt arrives just before local_irq_disable(), then
>> we'll just enter L2 instead of doing a nested VM exit (in case we have
>> interrupt exiting).
>> 
>> And after reading the code a bit, I think we allow posted interrupts in
>> L2 while L1 has assigned devices that use posted interrupts, and that it
>> doesn't work.
> 
> So you mean the interrupt is delivered to L2?  The fix would be to wrap
> L2 entry and exit with some subset of pi_pre_block/pi_post_block.

I hope not, as their PI structures are separate, so we'd be just
delaying the interrupt injection to L1.  The CPU running L2 guest will
notice a posted notification, but its PIR.ON will/might not be set.
L1's PIR.ON will be set, but no-one is going to care until the next VM
exit.

I'll add some unit tests to check that I understood the bug correctly.

Changing the notification vector for L2 would be an ok solution.
We'd reserve a new vector in L0 and check L1's interrupts.  If it were
targeting a VCPU that is currently in L2 with a notification vector
configured for L2, we'd translate that vector into the notification
vector we set for L2 -- L1 could then post interrupts to L2 without a VM
exit.  And "posted" interrupts for L1 while in L2 would trigger a VM
exit, because the notification vector would be different.
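
A rough sketch of that translation, in standalone C rather than real KVM
code.  Everything named here is hypothetical and only illustrates the idea:
the struct, the sketch_* helpers, and both vector numbers are assumptions,
not existing kernel symbols or values.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical vector numbers; L0 would pick the real ones. */
#define L0_PI_VECTOR_L1 0xf2 /* assumed: notification vector used while in L1 */
#define L0_PI_VECTOR_L2 0xf3 /* assumed: newly reserved vector used while in L2 */

/* Hypothetical, minimal view of the target VCPU's nested state. */
struct sketch_vcpu {
	bool in_l2;           /* VCPU is currently running the L2 guest */
	bool l2_pi_enabled;   /* L1 enabled posted interrupts for L2 */
	uint8_t l1_nv_for_l2; /* notification vector L1 configured for L2 */
};

/*
 * Pick the vector L0 sends when L1 sends an interrupt with vector 'vec'
 * to this VCPU.  Only L1's notification vector for L2 is translated, and
 * only while the target is in L2.
 */
static uint8_t sketch_translate_vector(const struct sketch_vcpu *v, uint8_t vec)
{
	if (v->in_l2 && v->l2_pi_enabled && vec == v->l1_nv_for_l2)
		return L0_PI_VECTOR_L2;
	return vec;
}

/*
 * A notification is processed without a VM exit only when its vector
 * matches the notification vector active for the running guest; any
 * other external interrupt exits (with external-interrupt exiting on).
 */
static bool sketch_notification_exits(const struct sketch_vcpu *v, uint8_t hw_vec)
{
	uint8_t active = v->in_l2 ? L0_PI_VECTOR_L2 : L0_PI_VECTOR_L1;

	return hw_vec != active;
}
```

With this split, a notification sent with L0_PI_VECTOR_L1 to a CPU that is
running L2 no longer matches the active vector, so it causes a VM exit and
L0 gets a chance to look at L1's PIR.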