* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-06 10:57 ` [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts Paolo Bonzini
@ 2017-06-06 12:30 ` Longpeng (Mike)
2017-06-06 12:35 ` Paolo Bonzini
2017-06-06 21:49 ` kbuild test robot
` (2 subsequent siblings)
3 siblings, 1 reply; 28+ messages in thread
From: Longpeng (Mike) @ 2017-06-06 12:30 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On 2017/6/6 18:57, Paolo Bonzini wrote:
> In some cases, for example involving hot-unplug of assigned
> devices, pi_post_block can forget to remove the vCPU from the
> blocked_vcpu_list. When this happens, the next call to
> pi_pre_block corrupts the list.
>
> Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
> and WARN instead of adding the element twice in the list. Second,
> always do the list removal in pi_post_block if vcpu->pre_pcpu is
> set (not -1).
>
> The new code keeps interrupts disabled for the whole duration of
> pi_pre_block/pi_post_block. This is not strictly necessary, but
> easier to follow. For the same reason, PI.ON is checked only
> after the cmpxchg, and to handle it we just call the post-block
> code. This removes duplication of the list removal code.
>
> Cc: Longpeng (Mike) <longpeng2@huawei.com>
> Cc: Huangweidong <weidong.huang@huawei.com>
> Cc: Gonglei <arei.gonglei@huawei.com>
> Cc: wangxin <wangxinxin.wang@huawei.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++--------------------------------
> 1 file changed, 25 insertions(+), 37 deletions(-)
>
[...]
> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> } while (cmpxchg(&pi_desc->control, old.control,
> new.control) != old.control);
>
> - if(vcpu->pre_pcpu != -1) {
> - spin_lock_irqsave(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> list_del(&vcpu->blocked_vcpu_list);
> - spin_unlock_irqrestore(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
Hi Paolo,
spin_lock_irqsave() will disable kernel preempt, but spin_lock() won't. is there
some potential problems ?
Regards,
Longpeng(Mike)
> vcpu->pre_pcpu = -1;
> }
> }
> @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> */
> static int pi_pre_block(struct kvm_vcpu *vcpu)
> {
> - unsigned long flags;
> unsigned int dest;
> struct pi_desc old, new;
> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> !kvm_vcpu_apicv_active(vcpu))
> return 0;
>
> - vcpu->pre_pcpu = vcpu->cpu;
> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - list_add_tail(&vcpu->blocked_vcpu_list,
> - &per_cpu(blocked_vcpu_on_cpu,
> - vcpu->pre_pcpu));
> - spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + WARN_ON(irqs_disabled());
> + local_irq_disable();
> + if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> + vcpu->pre_pcpu = vcpu->cpu;
> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> + list_add_tail(&vcpu->blocked_vcpu_list,
> + &per_cpu(blocked_vcpu_on_cpu,
> + vcpu->pre_pcpu));
> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> + }
>
> do {
> old.control = new.control = pi_desc->control;
>
> - /*
> - * We should not block the vCPU if
> - * an interrupt is posted for it.
> - */
> - if (pi_test_on(pi_desc) == 1) {
> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - list_del(&vcpu->blocked_vcpu_list);
> - spin_unlock_irqrestore(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - vcpu->pre_pcpu = -1;
> -
> - return 1;
> - }
> -
> WARN((pi_desc->sn == 1),
> "Warning: SN field of posted-interrupts "
> "is set before blocking\n");
> @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> } while (cmpxchg(&pi_desc->control, old.control,
> new.control) != old.control);
>
> - return 0;
> + /* We should not block the vCPU if an interrupt is posted for it. */
> + if (pi_test_on(pi_desc) == 1)
> + __pi_post_block(vcpu);
> +
> + local_irq_enable();
> + return (vcpu->pre_pcpu == -1);
> }
>
> static int vmx_pre_block(struct kvm_vcpu *vcpu)
> @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
>
> static void pi_post_block(struct kvm_vcpu *vcpu)
> {
> - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> - !irq_remapping_cap(IRQ_POSTING_CAP) ||
> - !kvm_vcpu_apicv_active(vcpu))
> + if (vcpu->pre_pcpu == -1)
> return;
>
> + WARN_ON(irqs_disabled());
> + local_irq_disable();
> __pi_post_block(vcpu);
> + local_irq_enable();
> }
>
> static void vmx_post_block(struct kvm_vcpu *vcpu)
--
Regards,
Longpeng(Mike)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-06 12:30 ` Longpeng (Mike)
@ 2017-06-06 12:35 ` Paolo Bonzini
2017-06-06 12:45 ` Longpeng (Mike)
0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-06-06 12:35 UTC (permalink / raw)
To: Longpeng (Mike)
Cc: linux-kernel, kvm, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On 06/06/2017 14:30, Longpeng (Mike) wrote:
>
>
> On 2017/6/6 18:57, Paolo Bonzini wrote:
>
>> In some cases, for example involving hot-unplug of assigned
>> devices, pi_post_block can forget to remove the vCPU from the
>> blocked_vcpu_list. When this happens, the next call to
>> pi_pre_block corrupts the list.
>>
>> Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
>> and WARN instead of adding the element twice in the list. Second,
>> always do the list removal in pi_post_block if vcpu->pre_pcpu is
>> set (not -1).
>>
>> The new code keeps interrupts disabled for the whole duration of
>> pi_pre_block/pi_post_block. This is not strictly necessary, but
>> easier to follow. For the same reason, PI.ON is checked only
>> after the cmpxchg, and to handle it we just call the post-block
>> code. This removes duplication of the list removal code.
>>
>> Cc: Longpeng (Mike) <longpeng2@huawei.com>
>> Cc: Huangweidong <weidong.huang@huawei.com>
>> Cc: Gonglei <arei.gonglei@huawei.com>
>> Cc: wangxin <wangxinxin.wang@huawei.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++--------------------------------
>> 1 file changed, 25 insertions(+), 37 deletions(-)
>>
>
>
> [...]
>
>
>> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>> } while (cmpxchg(&pi_desc->control, old.control,
>> new.control) != old.control);
>>
>> - if(vcpu->pre_pcpu != -1) {
>> - spin_lock_irqsave(
>> - &per_cpu(blocked_vcpu_on_cpu_lock,
>> - vcpu->pre_pcpu), flags);
>> + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
>> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
>> list_del(&vcpu->blocked_vcpu_list);
>> - spin_unlock_irqrestore(
>> - &per_cpu(blocked_vcpu_on_cpu_lock,
>> - vcpu->pre_pcpu), flags);
>> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
>
>
> Hi Paolo,
>
> spin_lock_irqsave() will disable kernel preempt, but spin_lock() won't. is there
> some potential problems ?
Hi,
This function (and pi_pre_block too's part where it takes the spin lock)
runs with interrupts disabled now.
Thanks,
Paolo
> Regards,
> Longpeng(Mike)
>
>> vcpu->pre_pcpu = -1;
>> }
>> }
>> @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>> */
>> static int pi_pre_block(struct kvm_vcpu *vcpu)
>> {
>> - unsigned long flags;
>> unsigned int dest;
>> struct pi_desc old, new;
>> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>> @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
>> !kvm_vcpu_apicv_active(vcpu))
>> return 0;
>>
>> - vcpu->pre_pcpu = vcpu->cpu;
>> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
>> - vcpu->pre_pcpu), flags);
>> - list_add_tail(&vcpu->blocked_vcpu_list,
>> - &per_cpu(blocked_vcpu_on_cpu,
>> - vcpu->pre_pcpu));
>> - spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
>> - vcpu->pre_pcpu), flags);
>> + WARN_ON(irqs_disabled());
>> + local_irq_disable();
>> + if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
>> + vcpu->pre_pcpu = vcpu->cpu;
>> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
>> + list_add_tail(&vcpu->blocked_vcpu_list,
>> + &per_cpu(blocked_vcpu_on_cpu,
>> + vcpu->pre_pcpu));
>> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
>> + }
>>
>> do {
>> old.control = new.control = pi_desc->control;
>>
>> - /*
>> - * We should not block the vCPU if
>> - * an interrupt is posted for it.
>> - */
>> - if (pi_test_on(pi_desc) == 1) {
>> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
>> - vcpu->pre_pcpu), flags);
>> - list_del(&vcpu->blocked_vcpu_list);
>> - spin_unlock_irqrestore(
>> - &per_cpu(blocked_vcpu_on_cpu_lock,
>> - vcpu->pre_pcpu), flags);
>> - vcpu->pre_pcpu = -1;
>> -
>> - return 1;
>> - }
>> -
>> WARN((pi_desc->sn == 1),
>> "Warning: SN field of posted-interrupts "
>> "is set before blocking\n");
>> @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
>> } while (cmpxchg(&pi_desc->control, old.control,
>> new.control) != old.control);
>>
>> - return 0;
>> + /* We should not block the vCPU if an interrupt is posted for it. */
>> + if (pi_test_on(pi_desc) == 1)
>> + __pi_post_block(vcpu);
>> +
>> + local_irq_enable();
>> + return (vcpu->pre_pcpu == -1);
>> }
>>
>> static int vmx_pre_block(struct kvm_vcpu *vcpu)
>> @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
>>
>> static void pi_post_block(struct kvm_vcpu *vcpu)
>> {
>> - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
>> - !irq_remapping_cap(IRQ_POSTING_CAP) ||
>> - !kvm_vcpu_apicv_active(vcpu))
>> + if (vcpu->pre_pcpu == -1)
>> return;
>>
>> + WARN_ON(irqs_disabled());
>> + local_irq_disable();
>> __pi_post_block(vcpu);
>> + local_irq_enable();
>> }
>>
>> static void vmx_post_block(struct kvm_vcpu *vcpu)
>
>
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-06 12:35 ` Paolo Bonzini
@ 2017-06-06 12:45 ` Longpeng (Mike)
0 siblings, 0 replies; 28+ messages in thread
From: Longpeng (Mike) @ 2017-06-06 12:45 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On 2017/6/6 20:35, Paolo Bonzini wrote:
>
>
> On 06/06/2017 14:30, Longpeng (Mike) wrote:
>>
>>
>> On 2017/6/6 18:57, Paolo Bonzini wrote:
>>
>>> In some cases, for example involving hot-unplug of assigned
>>> devices, pi_post_block can forget to remove the vCPU from the
>>> blocked_vcpu_list. When this happens, the next call to
>>> pi_pre_block corrupts the list.
>>>
>>> Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
>>> and WARN instead of adding the element twice in the list. Second,
>>> always do the list removal in pi_post_block if vcpu->pre_pcpu is
>>> set (not -1).
>>>
>>> The new code keeps interrupts disabled for the whole duration of
>>> pi_pre_block/pi_post_block. This is not strictly necessary, but
>>> easier to follow. For the same reason, PI.ON is checked only
>>> after the cmpxchg, and to handle it we just call the post-block
>>> code. This removes duplication of the list removal code.
>>>
>>> Cc: Longpeng (Mike) <longpeng2@huawei.com>
>>> Cc: Huangweidong <weidong.huang@huawei.com>
>>> Cc: Gonglei <arei.gonglei@huawei.com>
>>> Cc: wangxin <wangxinxin.wang@huawei.com>
>>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>> arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++--------------------------------
>>> 1 file changed, 25 insertions(+), 37 deletions(-)
>>>
>>
>>
>> [...]
>>
>>
>>> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>>> } while (cmpxchg(&pi_desc->control, old.control,
>>> new.control) != old.control);
>>>
>>> - if(vcpu->pre_pcpu != -1) {
>>> - spin_lock_irqsave(
>>> - &per_cpu(blocked_vcpu_on_cpu_lock,
>>> - vcpu->pre_pcpu), flags);
>>> + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
>>> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
>>> list_del(&vcpu->blocked_vcpu_list);
>>> - spin_unlock_irqrestore(
>>> - &per_cpu(blocked_vcpu_on_cpu_lock,
>>> - vcpu->pre_pcpu), flags);
>>> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
>>
>>
>> Hi Paolo,
>>
>> spin_lock_irqsave() will disable kernel preempt, but spin_lock() won't. is there
>> some potential problems ?
>
> Hi,
>
> This function (and pi_pre_block too's part where it takes the spin lock)
> runs with interrupts disabled now.
>
Oh, yes, please forgive my foolish.
We'll continue to find why the list is corrupt when repeat poweron/shutdown
Thanks.
> Thanks,
>
> Paolo
>
>> Regards,
>> Longpeng(Mike)
>>
>>> vcpu->pre_pcpu = -1;
>>> }
>>> }
>>> @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>>> */
>>> static int pi_pre_block(struct kvm_vcpu *vcpu)
>>> {
>>> - unsigned long flags;
>>> unsigned int dest;
>>> struct pi_desc old, new;
>>> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>>> @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
>>> !kvm_vcpu_apicv_active(vcpu))
>>> return 0;
>>>
>>> - vcpu->pre_pcpu = vcpu->cpu;
>>> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
>>> - vcpu->pre_pcpu), flags);
>>> - list_add_tail(&vcpu->blocked_vcpu_list,
>>> - &per_cpu(blocked_vcpu_on_cpu,
>>> - vcpu->pre_pcpu));
>>> - spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
>>> - vcpu->pre_pcpu), flags);
>>> + WARN_ON(irqs_disabled());
>>> + local_irq_disable();
>>> + if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
>>> + vcpu->pre_pcpu = vcpu->cpu;
>>> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
>>> + list_add_tail(&vcpu->blocked_vcpu_list,
>>> + &per_cpu(blocked_vcpu_on_cpu,
>>> + vcpu->pre_pcpu));
>>> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
>>> + }
>>>
>>> do {
>>> old.control = new.control = pi_desc->control;
>>>
>>> - /*
>>> - * We should not block the vCPU if
>>> - * an interrupt is posted for it.
>>> - */
>>> - if (pi_test_on(pi_desc) == 1) {
>>> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
>>> - vcpu->pre_pcpu), flags);
>>> - list_del(&vcpu->blocked_vcpu_list);
>>> - spin_unlock_irqrestore(
>>> - &per_cpu(blocked_vcpu_on_cpu_lock,
>>> - vcpu->pre_pcpu), flags);
>>> - vcpu->pre_pcpu = -1;
>>> -
>>> - return 1;
>>> - }
>>> -
>>> WARN((pi_desc->sn == 1),
>>> "Warning: SN field of posted-interrupts "
>>> "is set before blocking\n");
>>> @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
>>> } while (cmpxchg(&pi_desc->control, old.control,
>>> new.control) != old.control);
>>>
>>> - return 0;
>>> + /* We should not block the vCPU if an interrupt is posted for it. */
>>> + if (pi_test_on(pi_desc) == 1)
>>> + __pi_post_block(vcpu);
>>> +
>>> + local_irq_enable();
>>> + return (vcpu->pre_pcpu == -1);
>>> }
>>>
>>> static int vmx_pre_block(struct kvm_vcpu *vcpu)
>>> @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
>>>
>>> static void pi_post_block(struct kvm_vcpu *vcpu)
>>> {
>>> - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
>>> - !irq_remapping_cap(IRQ_POSTING_CAP) ||
>>> - !kvm_vcpu_apicv_active(vcpu))
>>> + if (vcpu->pre_pcpu == -1)
>>> return;
>>>
>>> + WARN_ON(irqs_disabled());
>>> + local_irq_disable();
>>> __pi_post_block(vcpu);
>>> + local_irq_enable();
>>> }
>>>
>>> static void vmx_post_block(struct kvm_vcpu *vcpu)
>>
>>
>
> .
>
--
Regards,
Longpeng(Mike)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-06 10:57 ` [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts Paolo Bonzini
2017-06-06 12:30 ` Longpeng (Mike)
@ 2017-06-06 21:49 ` kbuild test robot
2017-06-08 6:50 ` Peter Xu
2017-07-28 2:31 ` Longpeng (Mike)
3 siblings, 0 replies; 28+ messages in thread
From: kbuild test robot @ 2017-06-06 21:49 UTC (permalink / raw)
To: Paolo Bonzini
Cc: kbuild-all, linux-kernel, kvm, Longpeng, Huangweidong, Gonglei,
wangxin, Radim Krčmář
[-- Attachment #1: Type: text/plain, Size: 8147 bytes --]
Hi Paolo,
[auto build test ERROR on kvm/linux-next]
[also build test ERROR on v4.12-rc4 next-20170606]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
url: https://github.com/0day-ci/linux/commits/Paolo-Bonzini/VT-d-PI-fixes/20170607-042637
base: https://git.kernel.org/pub/scm/virt/kvm/kvm.git linux-next
config: i386-randconfig-x012-201723 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
# save the attached .config to linux build tree
make ARCH=i386
All errors (new ones prefixed by >>):
In file included from arch/x86/include/asm/atomic.h:7:0,
from include/linux/atomic.h:4,
from include/linux/mm_types_task.h:12,
from include/linux/mm_types.h:4,
from arch/x86//kvm/irq.h:25,
from arch/x86//kvm/vmx.c:19:
arch/x86//kvm/vmx.c: In function '__pi_post_block':
arch/x86/include/asm/cmpxchg.h:129:2: warning: '__ret' is used uninitialized in this function [-Wuninitialized]
__ret; \
^~~~~
arch/x86/include/asm/cmpxchg.h:86:21: note: '__ret' was declared here
__typeof__(*(ptr)) __ret; \
^
arch/x86/include/asm/cmpxchg.h:133:2: note: in expansion of macro '__raw_cmpxchg'
__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
^~~~~~~~~~~~~
arch/x86/include/asm/cmpxchg.h:148:2: note: in expansion of macro '__cmpxchg'
__cmpxchg(ptr, old, new, sizeof(*(ptr)))
^~~~~~~~~
arch/x86//kvm/vmx.c:11243:11: note: in expansion of macro 'cmpxchg'
} while (cmpxchg(&pi_desc->control, old.control,
^~~~~~~
In function '__pi_post_block',
inlined from 'pi_post_block' at arch/x86//kvm/vmx.c:11342:2,
inlined from 'vmx_post_block' at arch/x86//kvm/vmx.c:11351:2:
>> arch/x86/include/asm/cmpxchg.h:127:3: error: call to '__cmpxchg_wrong_size' declared with attribute error: Bad argument size for cmpxchg
__cmpxchg_wrong_size(); \
^~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/cmpxchg.h:133:2: note: in expansion of macro '__raw_cmpxchg'
__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
^~~~~~~~~~~~~
arch/x86/include/asm/cmpxchg.h:148:2: note: in expansion of macro '__cmpxchg'
__cmpxchg(ptr, old, new, sizeof(*(ptr)))
^~~~~~~~~
arch/x86//kvm/vmx.c:11243:11: note: in expansion of macro 'cmpxchg'
} while (cmpxchg(&pi_desc->control, old.control,
^~~~~~~
--
In file included from arch/x86/include/asm/atomic.h:7:0,
from include/linux/atomic.h:4,
from include/linux/mm_types_task.h:12,
from include/linux/mm_types.h:4,
from arch/x86/kvm/irq.h:25,
from arch/x86/kvm/vmx.c:19:
arch/x86/kvm/vmx.c: In function '__pi_post_block':
arch/x86/include/asm/cmpxchg.h:129:2: warning: '__ret' is used uninitialized in this function [-Wuninitialized]
__ret; \
^~~~~
arch/x86/include/asm/cmpxchg.h:86:21: note: '__ret' was declared here
__typeof__(*(ptr)) __ret; \
^
arch/x86/include/asm/cmpxchg.h:133:2: note: in expansion of macro '__raw_cmpxchg'
__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
^~~~~~~~~~~~~
arch/x86/include/asm/cmpxchg.h:148:2: note: in expansion of macro '__cmpxchg'
__cmpxchg(ptr, old, new, sizeof(*(ptr)))
^~~~~~~~~
arch/x86/kvm/vmx.c:11243:11: note: in expansion of macro 'cmpxchg'
} while (cmpxchg(&pi_desc->control, old.control,
^~~~~~~
In function '__pi_post_block',
inlined from 'pi_post_block' at arch/x86/kvm/vmx.c:11342:2,
inlined from 'vmx_post_block' at arch/x86/kvm/vmx.c:11351:2:
>> arch/x86/include/asm/cmpxchg.h:127:3: error: call to '__cmpxchg_wrong_size' declared with attribute error: Bad argument size for cmpxchg
__cmpxchg_wrong_size(); \
^~~~~~~~~~~~~~~~~~~~~~
arch/x86/include/asm/cmpxchg.h:133:2: note: in expansion of macro '__raw_cmpxchg'
__raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
^~~~~~~~~~~~~
arch/x86/include/asm/cmpxchg.h:148:2: note: in expansion of macro '__cmpxchg'
__cmpxchg(ptr, old, new, sizeof(*(ptr)))
^~~~~~~~~
arch/x86/kvm/vmx.c:11243:11: note: in expansion of macro 'cmpxchg'
} while (cmpxchg(&pi_desc->control, old.control,
^~~~~~~
vim +/__cmpxchg_wrong_size +127 arch/x86/include/asm/cmpxchg.h
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 121 : "=a" (__ret), "+m" (*__ptr) \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 122 : "r" (__new), "0" (__old) \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 123 : "memory"); \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 124 break; \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 125 } \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 126 default: \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 @127 __cmpxchg_wrong_size(); \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 128 } \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 129 __ret; \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 130 })
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 131
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 132 #define __cmpxchg(ptr, old, new, size) \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 133 __raw_cmpxchg((ptr), (old), (new), (size), LOCK_PREFIX)
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 134
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 135 #define __sync_cmpxchg(ptr, old, new, size) \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 136 __raw_cmpxchg((ptr), (old), (new), (size), "lock; ")
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 137
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 138 #define __cmpxchg_local(ptr, old, new, size) \
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 139 __raw_cmpxchg((ptr), (old), (new), (size), "")
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 140
96a388de include/asm-x86/cmpxchg.h Thomas Gleixner 2007-10-11 141 #ifdef CONFIG_X86_32
a1ce3928 arch/x86/include/asm/cmpxchg.h David Howells 2012-10-02 142 # include <asm/cmpxchg_32.h>
96a388de include/asm-x86/cmpxchg.h Thomas Gleixner 2007-10-11 143 #else
a1ce3928 arch/x86/include/asm/cmpxchg.h David Howells 2012-10-02 144 # include <asm/cmpxchg_64.h>
96a388de include/asm-x86/cmpxchg.h Thomas Gleixner 2007-10-11 145 #endif
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 146
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 147 #define cmpxchg(ptr, old, new) \
fc395b92 arch/x86/include/asm/cmpxchg.h Jan Beulich 2012-01-26 @148 __cmpxchg(ptr, old, new, sizeof(*(ptr)))
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 149
e9826380 arch/x86/include/asm/cmpxchg.h Jeremy Fitzhardinge 2011-08-18 150 #define sync_cmpxchg(ptr, old, new) \
fc395b92 arch/x86/include/asm/cmpxchg.h Jan Beulich 2012-01-26 151 __sync_cmpxchg(ptr, old, new, sizeof(*(ptr)))
:::::: The code at line 127 was first introduced by commit
:::::: e9826380d83d1bda3ee5663bf3fa4667a6fbe60a x86, cmpxchg: Unify cmpxchg into cmpxchg.h
:::::: TO: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
:::::: CC: H. Peter Anvin <hpa@linux.intel.com>
---
0-DAY kernel test infrastructure Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all Intel Corporation
[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 29547 bytes --]
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-06 10:57 ` [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts Paolo Bonzini
2017-06-06 12:30 ` Longpeng (Mike)
2017-06-06 21:49 ` kbuild test robot
@ 2017-06-08 6:50 ` Peter Xu
2017-06-08 6:53 ` Peter Xu
2017-06-08 7:00 ` Paolo Bonzini
2017-07-28 2:31 ` Longpeng (Mike)
3 siblings, 2 replies; 28+ messages in thread
From: Peter Xu @ 2017-06-08 6:50 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Longpeng, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On Tue, Jun 06, 2017 at 12:57:05PM +0200, Paolo Bonzini wrote:
> In some cases, for example involving hot-unplug of assigned
> devices, pi_post_block can forget to remove the vCPU from the
> blocked_vcpu_list. When this happens, the next call to
> pi_pre_block corrupts the list.
>
> Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
> and WARN instead of adding the element twice in the list. Second,
> always do the list removal in pi_post_block if vcpu->pre_pcpu is
> set (not -1).
>
> The new code keeps interrupts disabled for the whole duration of
> pi_pre_block/pi_post_block. This is not strictly necessary, but
> easier to follow. For the same reason, PI.ON is checked only
> after the cmpxchg, and to handle it we just call the post-block
> code. This removes duplication of the list removal code.
>
> Cc: Longpeng (Mike) <longpeng2@huawei.com>
> Cc: Huangweidong <weidong.huang@huawei.com>
> Cc: Gonglei <arei.gonglei@huawei.com>
> Cc: wangxin <wangxinxin.wang@huawei.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++--------------------------------
> 1 file changed, 25 insertions(+), 37 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 747d16525b45..0f4714fe4908 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> struct pi_desc old, new;
> unsigned int dest;
> - unsigned long flags;
>
> do {
> old.control = new.control = pi_desc->control;
> + WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
> + "Wakeup handler not enabled while the VCPU is blocked\n");
>
> dest = cpu_physical_id(vcpu->cpu);
>
> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> } while (cmpxchg(&pi_desc->control, old.control,
> new.control) != old.control);
>
> - if(vcpu->pre_pcpu != -1) {
> - spin_lock_irqsave(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> list_del(&vcpu->blocked_vcpu_list);
> - spin_unlock_irqrestore(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> vcpu->pre_pcpu = -1;
> }
> }
> @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> */
> static int pi_pre_block(struct kvm_vcpu *vcpu)
> {
> - unsigned long flags;
> unsigned int dest;
> struct pi_desc old, new;
> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> !kvm_vcpu_apicv_active(vcpu))
> return 0;
>
> - vcpu->pre_pcpu = vcpu->cpu;
> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - list_add_tail(&vcpu->blocked_vcpu_list,
> - &per_cpu(blocked_vcpu_on_cpu,
> - vcpu->pre_pcpu));
> - spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + WARN_ON(irqs_disabled());
> + local_irq_disable();
> + if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> + vcpu->pre_pcpu = vcpu->cpu;
> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> + list_add_tail(&vcpu->blocked_vcpu_list,
> + &per_cpu(blocked_vcpu_on_cpu,
> + vcpu->pre_pcpu));
> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> + }
>
> do {
> old.control = new.control = pi_desc->control;
>
> - /*
> - * We should not block the vCPU if
> - * an interrupt is posted for it.
> - */
> - if (pi_test_on(pi_desc) == 1) {
> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - list_del(&vcpu->blocked_vcpu_list);
> - spin_unlock_irqrestore(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - vcpu->pre_pcpu = -1;
> -
> - return 1;
[1]
> - }
> -
> WARN((pi_desc->sn == 1),
> "Warning: SN field of posted-interrupts "
> "is set before blocking\n");
> @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> } while (cmpxchg(&pi_desc->control, old.control,
> new.control) != old.control);
>
> - return 0;
> + /* We should not block the vCPU if an interrupt is posted for it. */
> + if (pi_test_on(pi_desc) == 1)
> + __pi_post_block(vcpu);
A question on when pi_test_on() is set:
The old code will return 1 if detected (ses [1]), while the new code
does not. Would that matter? (IIUC that decides whether the vcpu will
continue to run?)
> +
> + local_irq_enable();
> + return (vcpu->pre_pcpu == -1);
Above we have:
if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
vcpu->pre_pcpu = vcpu->cpu;
...
}
Then can here vcpu->pre_pcpu really be -1?
> }
>
> static int vmx_pre_block(struct kvm_vcpu *vcpu)
> @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
>
> static void pi_post_block(struct kvm_vcpu *vcpu)
> {
> - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> - !irq_remapping_cap(IRQ_POSTING_CAP) ||
> - !kvm_vcpu_apicv_active(vcpu))
> + if (vcpu->pre_pcpu == -1)
> return;
>
> + WARN_ON(irqs_disabled());
> + local_irq_disable();
> __pi_post_block(vcpu);
> + local_irq_enable();
> }
>
> static void vmx_post_block(struct kvm_vcpu *vcpu)
> --
> 2.13.0
>
>
A general question to pre_block/post_block handling for PI:
I see that we are handling PI logic mostly in four places:
vmx_vcpu_pi_{load|put}
pi_{pre_post}_block
But do we really need the pre_block/post_block handling? Here's how I
understand when vcpu blocked:
- vcpu_block
- ->pre_block
- kvm_vcpu_block [1]
- schedule()
- kvm_sched_out
- vmx_vcpu_pi_put [3]
- (another process working) ...
- kvm_sched_in
- vmx_vcpu_pi_load [4]
- ->post_block [2]
If so, [1] & [2] will definitely be paired with [3] & [4], then why we
need [3] & [4] at all?
(Though [3] & [4] will also be used when preemption happens, so they
are required)
Please kindly figure out if I missed anything important...
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-08 6:50 ` Peter Xu
@ 2017-06-08 6:53 ` Peter Xu
2017-06-08 7:00 ` Paolo Bonzini
1 sibling, 0 replies; 28+ messages in thread
From: Peter Xu @ 2017-06-08 6:53 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Longpeng, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On Thu, Jun 08, 2017 at 02:50:57PM +0800, Peter Xu wrote:
> On Tue, Jun 06, 2017 at 12:57:05PM +0200, Paolo Bonzini wrote:
> > In some cases, for example involving hot-unplug of assigned
> > devices, pi_post_block can forget to remove the vCPU from the
> > blocked_vcpu_list. When this happens, the next call to
> > pi_pre_block corrupts the list.
> >
> > Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
> > and WARN instead of adding the element twice in the list. Second,
> > always do the list removal in pi_post_block if vcpu->pre_pcpu is
> > set (not -1).
> >
> > The new code keeps interrupts disabled for the whole duration of
> > pi_pre_block/pi_post_block. This is not strictly necessary, but
> > easier to follow. For the same reason, PI.ON is checked only
> > after the cmpxchg, and to handle it we just call the post-block
> > code. This removes duplication of the list removal code.
> >
> > Cc: Longpeng (Mike) <longpeng2@huawei.com>
> > Cc: Huangweidong <weidong.huang@huawei.com>
> > Cc: Gonglei <arei.gonglei@huawei.com>
> > Cc: wangxin <wangxinxin.wang@huawei.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++--------------------------------
> > 1 file changed, 25 insertions(+), 37 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 747d16525b45..0f4714fe4908 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> > struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > struct pi_desc old, new;
> > unsigned int dest;
> > - unsigned long flags;
> >
> > do {
> > old.control = new.control = pi_desc->control;
> > + WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
> > + "Wakeup handler not enabled while the VCPU is blocked\n");
> >
> > dest = cpu_physical_id(vcpu->cpu);
> >
> > @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> > } while (cmpxchg(&pi_desc->control, old.control,
> > new.control) != old.control);
> >
> > - if(vcpu->pre_pcpu != -1) {
> > - spin_lock_irqsave(
> > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> > + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > list_del(&vcpu->blocked_vcpu_list);
> > - spin_unlock_irqrestore(
> > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > vcpu->pre_pcpu = -1;
> > }
> > }
> > @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> > */
> > static int pi_pre_block(struct kvm_vcpu *vcpu)
> > {
> > - unsigned long flags;
> > unsigned int dest;
> > struct pi_desc old, new;
> > struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> > !kvm_vcpu_apicv_active(vcpu))
> > return 0;
> >
> > - vcpu->pre_pcpu = vcpu->cpu;
> > - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > - list_add_tail(&vcpu->blocked_vcpu_list,
> > - &per_cpu(blocked_vcpu_on_cpu,
> > - vcpu->pre_pcpu));
> > - spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > + WARN_ON(irqs_disabled());
> > + local_irq_disable();
> > + if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > + vcpu->pre_pcpu = vcpu->cpu;
> > + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > + list_add_tail(&vcpu->blocked_vcpu_list,
> > + &per_cpu(blocked_vcpu_on_cpu,
> > + vcpu->pre_pcpu));
> > + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > + }
> >
> > do {
> > old.control = new.control = pi_desc->control;
> >
> > - /*
> > - * We should not block the vCPU if
> > - * an interrupt is posted for it.
> > - */
> > - if (pi_test_on(pi_desc) == 1) {
> > - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > - list_del(&vcpu->blocked_vcpu_list);
> > - spin_unlock_irqrestore(
> > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > - vcpu->pre_pcpu = -1;
> > -
> > - return 1;
>
> [1]
>
> > - }
> > -
> > WARN((pi_desc->sn == 1),
> > "Warning: SN field of posted-interrupts "
> > "is set before blocking\n");
> > @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> > } while (cmpxchg(&pi_desc->control, old.control,
> > new.control) != old.control);
> >
> > - return 0;
> > + /* We should not block the vCPU if an interrupt is posted for it. */
> > + if (pi_test_on(pi_desc) == 1)
> > + __pi_post_block(vcpu);
>
> A question on when pi_test_on() is set:
>
> The old code will return 1 if detected (ses [1]), while the new code
> does not. Would that matter? (IIUC that decides whether the vcpu will
> continue to run?)
>
> > +
> > + local_irq_enable();
> > + return (vcpu->pre_pcpu == -1);
>
> Above we have:
>
> if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> vcpu->pre_pcpu = vcpu->cpu;
> ...
> }
>
> Then can here vcpu->pre_pcpu really be -1?
>
> > }
> >
> > static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
> >
> > static void pi_post_block(struct kvm_vcpu *vcpu)
> > {
> > - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > - !irq_remapping_cap(IRQ_POSTING_CAP) ||
> > - !kvm_vcpu_apicv_active(vcpu))
> > + if (vcpu->pre_pcpu == -1)
> > return;
> >
> > + WARN_ON(irqs_disabled());
> > + local_irq_disable();
> > __pi_post_block(vcpu);
> > + local_irq_enable();
> > }
> >
> > static void vmx_post_block(struct kvm_vcpu *vcpu)
> > --
> > 2.13.0
> >
> >
>
> A general question to pre_block/post_block handling for PI:
>
> I see that we are handling PI logic mostly in four places:
>
> vmx_vcpu_pi_{load|put}
> pi_{pre_post}_block
>
> But do we really need the pre_block/post_block handling? Here's how I
> understand when vcpu blocked:
>
> - vcpu_block
> - ->pre_block
> - kvm_vcpu_block [1]
> - schedule()
> - kvm_sched_out
> - vmx_vcpu_pi_put [3]
> - (another process working) ...
> - kvm_sched_in
> - vmx_vcpu_pi_load [4]
> - ->post_block [2]
>
> If so, [1] & [2] will definitely be paired with [3] & [4], then why we
> need [3] & [4] at all?
^^^^^^^^^ Here I meant [1] & [2]. Sorry.
>
> (Though [3] & [4] will also be used when preemption happens, so they
> are required)
>
> Please kindly figure out if I missed anything important...
>
> Thanks,
>
> --
> Peter Xu
--
Peter Xu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-08 6:50 ` Peter Xu
2017-06-08 6:53 ` Peter Xu
@ 2017-06-08 7:00 ` Paolo Bonzini
2017-06-08 9:16 ` Peter Xu
1 sibling, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-06-08 7:00 UTC (permalink / raw)
To: Peter Xu
Cc: linux-kernel, kvm, Longpeng, Huangweidong, Gonglei, wangxin,
Radim Krčmář
----- Original Message -----
> From: "Peter Xu" <peterx@redhat.com>
> To: "Paolo Bonzini" <pbonzini@redhat.com>
> Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, "Longpeng" <longpeng2@huawei.com>, "Huangweidong"
> <weidong.huang@huawei.com>, "Gonglei" <arei.gonglei@huawei.com>, "wangxin" <wangxinxin.wang@huawei.com>, "Radim
> Krčmář" <rkrcmar@redhat.com>
> Sent: Thursday, June 8, 2017 8:50:57 AM
> Subject: Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
>
> On Tue, Jun 06, 2017 at 12:57:05PM +0200, Paolo Bonzini wrote:
> > In some cases, for example involving hot-unplug of assigned
> > devices, pi_post_block can forget to remove the vCPU from the
> > blocked_vcpu_list. When this happens, the next call to
> > pi_pre_block corrupts the list.
> >
> > Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
> > and WARN instead of adding the element twice in the list. Second,
> > always do the list removal in pi_post_block if vcpu->pre_pcpu is
> > set (not -1).
> >
> > The new code keeps interrupts disabled for the whole duration of
> > pi_pre_block/pi_post_block. This is not strictly necessary, but
> > easier to follow. For the same reason, PI.ON is checked only
> > after the cmpxchg, and to handle it we just call the post-block
> > code. This removes duplication of the list removal code.
> >
> > Cc: Longpeng (Mike) <longpeng2@huawei.com>
> > Cc: Huangweidong <weidong.huang@huawei.com>
> > Cc: Gonglei <arei.gonglei@huawei.com>
> > Cc: wangxin <wangxinxin.wang@huawei.com>
> > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > arch/x86/kvm/vmx.c | 62
> > ++++++++++++++++++++++--------------------------------
> > 1 file changed, 25 insertions(+), 37 deletions(-)
> >
> > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > index 747d16525b45..0f4714fe4908 100644
> > --- a/arch/x86/kvm/vmx.c
> > +++ b/arch/x86/kvm/vmx.c
> > @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu
> > *vcpu)
> > struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > struct pi_desc old, new;
> > unsigned int dest;
> > - unsigned long flags;
> >
> > do {
> > old.control = new.control = pi_desc->control;
> > + WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
> > + "Wakeup handler not enabled while the VCPU is blocked\n");
> >
> > dest = cpu_physical_id(vcpu->cpu);
> >
> > @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu
> > *vcpu)
> > } while (cmpxchg(&pi_desc->control, old.control,
> > new.control) != old.control);
> >
> > - if(vcpu->pre_pcpu != -1) {
> > - spin_lock_irqsave(
> > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> > + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > list_del(&vcpu->blocked_vcpu_list);
> > - spin_unlock_irqrestore(
> > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > vcpu->pre_pcpu = -1;
> > }
> > }
> > @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> > */
> > static int pi_pre_block(struct kvm_vcpu *vcpu)
> > {
> > - unsigned long flags;
> > unsigned int dest;
> > struct pi_desc old, new;
> > struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> > !kvm_vcpu_apicv_active(vcpu))
> > return 0;
> >
> > - vcpu->pre_pcpu = vcpu->cpu;
> > - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > - list_add_tail(&vcpu->blocked_vcpu_list,
> > - &per_cpu(blocked_vcpu_on_cpu,
> > - vcpu->pre_pcpu));
> > - spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > + WARN_ON(irqs_disabled());
> > + local_irq_disable();
> > + if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > + vcpu->pre_pcpu = vcpu->cpu;
> > + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > + list_add_tail(&vcpu->blocked_vcpu_list,
> > + &per_cpu(blocked_vcpu_on_cpu,
> > + vcpu->pre_pcpu));
> > + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > + }
> >
> > do {
> > old.control = new.control = pi_desc->control;
> >
> > - /*
> > - * We should not block the vCPU if
> > - * an interrupt is posted for it.
> > - */
> > - if (pi_test_on(pi_desc) == 1) {
> > - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > - list_del(&vcpu->blocked_vcpu_list);
> > - spin_unlock_irqrestore(
> > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > - vcpu->pre_pcpu), flags);
> > - vcpu->pre_pcpu = -1;
> > -
> > - return 1;
>
> [1]
>
> > - }
> > -
> > WARN((pi_desc->sn == 1),
> > "Warning: SN field of posted-interrupts "
> > "is set before blocking\n");
> > @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> > } while (cmpxchg(&pi_desc->control, old.control,
> > new.control) != old.control);
> >
> > - return 0;
> > + /* We should not block the vCPU if an interrupt is posted for it. */
> > + if (pi_test_on(pi_desc) == 1)
> > + __pi_post_block(vcpu);
>
> A question on when pi_test_on() is set:
>
> The old code will return 1 if detected (ses [1]), while the new code
> does not. Would that matter? (IIUC that decides whether the vcpu will
> continue to run?)
The new code does, because __pi_post_block resets vcpu->pre_pcpu to -1.
> > +
> > + local_irq_enable();
> > + return (vcpu->pre_pcpu == -1);
>
> Above we have:
>
> if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> vcpu->pre_pcpu = vcpu->cpu;
> ...
> }
>
> Then can here vcpu->pre_pcpu really be -1?
See above. :)
> > }
> >
> > static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
> >
> > static void pi_post_block(struct kvm_vcpu *vcpu)
> > {
> > - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > - !irq_remapping_cap(IRQ_POSTING_CAP) ||
> > - !kvm_vcpu_apicv_active(vcpu))
> > + if (vcpu->pre_pcpu == -1)
> > return;
> >
> > + WARN_ON(irqs_disabled());
> > + local_irq_disable();
> > __pi_post_block(vcpu);
> > + local_irq_enable();
> > }
> >
> > static void vmx_post_block(struct kvm_vcpu *vcpu)
> > --
> > 2.13.0
> >
> >
>
> A general question to pre_block/post_block handling for PI:
>
> I see that we are handling PI logic mostly in four places:
>
> vmx_vcpu_pi_{load|put}
> pi_{pre_post}_block
>
> But do we really need the pre_block/post_block handling? Here's how I
> understand when vcpu blocked:
>
> - vcpu_block
> - ->pre_block
> - kvm_vcpu_block [1]
> - schedule()
> - kvm_sched_out
> - vmx_vcpu_pi_put [3]
> - (another process working) ...
> - kvm_sched_in
> - vmx_vcpu_pi_load [4]
> - ->post_block [2]
>
> If so, [1] & [2] will definitely be paired with [3] & [4], then why we
> need [3] & [4] at all?
>
> (Though [3] & [4] will also be used when preemption happens, so they
> are required)
>
> Please kindly figure out if I missed anything important...
Oh, I see what you mean: set up the wakeup handler in vmx_vcpu_pi_put
and rely on PI.ON to wake up the sleeping process immediately. That
should be feasible, but overall I like the current pre_block/post_block
structure, and I think it's simpler. The only thing to be careful about
is leaving the IRTE unmodified when scheduling out a blocked VCPU, which
is cleaned up and simplified in patch 3.
So I understand that the state may seem a bit too complicated as
of this patch, but hopefully the next two make it clearer.
Paolo
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-08 7:00 ` Paolo Bonzini
@ 2017-06-08 9:16 ` Peter Xu
2017-06-08 11:24 ` Paolo Bonzini
0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2017-06-08 9:16 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Longpeng, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On Thu, Jun 08, 2017 at 03:00:39AM -0400, Paolo Bonzini wrote:
>
>
> ----- Original Message -----
> > From: "Peter Xu" <peterx@redhat.com>
> > To: "Paolo Bonzini" <pbonzini@redhat.com>
> > Cc: linux-kernel@vger.kernel.org, kvm@vger.kernel.org, "Longpeng" <longpeng2@huawei.com>, "Huangweidong"
> > <weidong.huang@huawei.com>, "Gonglei" <arei.gonglei@huawei.com>, "wangxin" <wangxinxin.wang@huawei.com>, "Radim
> > Krčmář" <rkrcmar@redhat.com>
> > Sent: Thursday, June 8, 2017 8:50:57 AM
> > Subject: Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
> >
> > On Tue, Jun 06, 2017 at 12:57:05PM +0200, Paolo Bonzini wrote:
> > > In some cases, for example involving hot-unplug of assigned
> > > devices, pi_post_block can forget to remove the vCPU from the
> > > blocked_vcpu_list. When this happens, the next call to
> > > pi_pre_block corrupts the list.
> > >
> > > Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
> > > and WARN instead of adding the element twice in the list. Second,
> > > always do the list removal in pi_post_block if vcpu->pre_pcpu is
> > > set (not -1).
> > >
> > > The new code keeps interrupts disabled for the whole duration of
> > > pi_pre_block/pi_post_block. This is not strictly necessary, but
> > > easier to follow. For the same reason, PI.ON is checked only
> > > after the cmpxchg, and to handle it we just call the post-block
> > > code. This removes duplication of the list removal code.
> > >
> > > Cc: Longpeng (Mike) <longpeng2@huawei.com>
> > > Cc: Huangweidong <weidong.huang@huawei.com>
> > > Cc: Gonglei <arei.gonglei@huawei.com>
> > > Cc: wangxin <wangxinxin.wang@huawei.com>
> > > Cc: Radim Krčmář <rkrcmar@redhat.com>
> > > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > > ---
> > > arch/x86/kvm/vmx.c | 62
> > > ++++++++++++++++++++++--------------------------------
> > > 1 file changed, 25 insertions(+), 37 deletions(-)
> > >
> > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> > > index 747d16525b45..0f4714fe4908 100644
> > > --- a/arch/x86/kvm/vmx.c
> > > +++ b/arch/x86/kvm/vmx.c
> > > @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu
> > > *vcpu)
> > > struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > struct pi_desc old, new;
> > > unsigned int dest;
> > > - unsigned long flags;
> > >
> > > do {
> > > old.control = new.control = pi_desc->control;
> > > + WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
> > > + "Wakeup handler not enabled while the VCPU is blocked\n");
> > >
> > > dest = cpu_physical_id(vcpu->cpu);
> > >
> > > @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu
> > > *vcpu)
> > > } while (cmpxchg(&pi_desc->control, old.control,
> > > new.control) != old.control);
> > >
> > > - if(vcpu->pre_pcpu != -1) {
> > > - spin_lock_irqsave(
> > > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > > - vcpu->pre_pcpu), flags);
> > > + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
> > > + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > > list_del(&vcpu->blocked_vcpu_list);
> > > - spin_unlock_irqrestore(
> > > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > > - vcpu->pre_pcpu), flags);
> > > + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > > vcpu->pre_pcpu = -1;
> > > }
> > > }
> > > @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> > > */
> > > static int pi_pre_block(struct kvm_vcpu *vcpu)
> > > {
> > > - unsigned long flags;
> > > unsigned int dest;
> > > struct pi_desc old, new;
> > > struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> > > @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> > > !kvm_vcpu_apicv_active(vcpu))
> > > return 0;
> > >
> > > - vcpu->pre_pcpu = vcpu->cpu;
> > > - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > - vcpu->pre_pcpu), flags);
> > > - list_add_tail(&vcpu->blocked_vcpu_list,
> > > - &per_cpu(blocked_vcpu_on_cpu,
> > > - vcpu->pre_pcpu));
> > > - spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > - vcpu->pre_pcpu), flags);
> > > + WARN_ON(irqs_disabled());
> > > + local_irq_disable();
> > > + if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > > + vcpu->pre_pcpu = vcpu->cpu;
> > > + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > > + list_add_tail(&vcpu->blocked_vcpu_list,
> > > + &per_cpu(blocked_vcpu_on_cpu,
> > > + vcpu->pre_pcpu));
> > > + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> > > + }
> > >
> > > do {
> > > old.control = new.control = pi_desc->control;
> > >
> > > - /*
> > > - * We should not block the vCPU if
> > > - * an interrupt is posted for it.
> > > - */
> > > - if (pi_test_on(pi_desc) == 1) {
> > > - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> > > - vcpu->pre_pcpu), flags);
> > > - list_del(&vcpu->blocked_vcpu_list);
> > > - spin_unlock_irqrestore(
> > > - &per_cpu(blocked_vcpu_on_cpu_lock,
> > > - vcpu->pre_pcpu), flags);
> > > - vcpu->pre_pcpu = -1;
> > > -
> > > - return 1;
> >
> > [1]
> >
> > > - }
> > > -
> > > WARN((pi_desc->sn == 1),
> > > "Warning: SN field of posted-interrupts "
> > > "is set before blocking\n");
> > > @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> > > } while (cmpxchg(&pi_desc->control, old.control,
> > > new.control) != old.control);
> > >
> > > - return 0;
> > > + /* We should not block the vCPU if an interrupt is posted for it. */
> > > + if (pi_test_on(pi_desc) == 1)
> > > + __pi_post_block(vcpu);
> >
> > A question on when pi_test_on() is set:
> >
> > The old code will return 1 if detected (ses [1]), while the new code
> > does not. Would that matter? (IIUC that decides whether the vcpu will
> > continue to run?)
>
> The new code does, because __pi_post_block resets vcpu->pre_pcpu to -1.
Sorry I overlook that. :-)
>
> > > +
> > > + local_irq_enable();
> > > + return (vcpu->pre_pcpu == -1);
> >
> > Above we have:
> >
> > if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> > vcpu->pre_pcpu = vcpu->cpu;
> > ...
> > }
> >
> > Then can here vcpu->pre_pcpu really be -1?
>
> See above. :)
Yes. Then there's no problem.
>
> > > }
> > >
> > > static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > > @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
> > >
> > > static void pi_post_block(struct kvm_vcpu *vcpu)
> > > {
> > > - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> > > - !irq_remapping_cap(IRQ_POSTING_CAP) ||
> > > - !kvm_vcpu_apicv_active(vcpu))
> > > + if (vcpu->pre_pcpu == -1)
> > > return;
> > >
> > > + WARN_ON(irqs_disabled());
> > > + local_irq_disable();
> > > __pi_post_block(vcpu);
> > > + local_irq_enable();
> > > }
> > >
> > > static void vmx_post_block(struct kvm_vcpu *vcpu)
> > > --
> > > 2.13.0
> > >
> > >
> >
> > A general question to pre_block/post_block handling for PI:
> >
> > I see that we are handling PI logic mostly in four places:
> >
> > vmx_vcpu_pi_{load|put}
> > pi_{pre_post}_block
> >
> > But do we really need the pre_block/post_block handling? Here's how I
> > understand when vcpu blocked:
> >
> > - vcpu_block
> > - ->pre_block
> > - kvm_vcpu_block [1]
> > - schedule()
> > - kvm_sched_out
> > - vmx_vcpu_pi_put [3]
> > - (another process working) ...
> > - kvm_sched_in
> > - vmx_vcpu_pi_load [4]
> > - ->post_block [2]
> >
> > If so, [1] & [2] will definitely be paired with [3] & [4], then why we
> > need [3] & [4] at all?
> >
> > (Though [3] & [4] will also be used when preemption happens, so they
> > are required)
> >
> > Please kindly figure out if I missed anything important...
>
> Oh, I see what you mean: set up the wakeup handler in vmx_vcpu_pi_put
> and rely on PI.ON to wake up the sleeping process immediately. That
> should be feasible, but overall I like the current pre_block/post_block
> structure, and I think it's simpler. The only thing to be careful about
> is leaving the IRTE unmodified when scheduling out a blocked VCPU, which
> is cleaned up and simplified in patch 3.
>
> So I understand that the state may seem a bit too complicated as
> of this patch, but hopefully the next two make it clearer.
After re-read the codes and patches I got the point. Indeed current
way should be clearer since pre/post_block are mostly handling NV/DST
while pi_load/put are for SN bit. Thanks!
--
Peter Xu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-08 9:16 ` Peter Xu
@ 2017-06-08 11:24 ` Paolo Bonzini
2017-06-09 2:50 ` Peter Xu
0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-06-08 11:24 UTC (permalink / raw)
To: Peter Xu
Cc: linux-kernel, kvm, Longpeng, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On 08/06/2017 11:16, Peter Xu wrote:
>> Oh, I see what you mean: set up the wakeup handler in vmx_vcpu_pi_put
>> and rely on PI.ON to wake up the sleeping process immediately. That
>> should be feasible, but overall I like the current pre_block/post_block
>> structure, and I think it's simpler. The only thing to be careful about
>> is leaving the IRTE unmodified when scheduling out a blocked VCPU, which
>> is cleaned up and simplified in patch 3.
>>
>> So I understand that the state may seem a bit too complicated as
>> of this patch, but hopefully the next two make it clearer.
> After re-read the codes and patches I got the point. Indeed current
> way should be clearer since pre/post_block are mostly handling NV/DST
> while pi_load/put are for SN bit. Thanks!
Almost: pi_load handles NDST too. However, I think with patch 3 it's
clearer how pi_load handles the nesting inside pre_block...post_block.
Thanks,
Paolo
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-08 11:24 ` Paolo Bonzini
@ 2017-06-09 2:50 ` Peter Xu
2017-06-09 7:29 ` Paolo Bonzini
0 siblings, 1 reply; 28+ messages in thread
From: Peter Xu @ 2017-06-09 2:50 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Longpeng, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On Thu, Jun 08, 2017 at 01:24:44PM +0200, Paolo Bonzini wrote:
>
>
> On 08/06/2017 11:16, Peter Xu wrote:
> >> Oh, I see what you mean: set up the wakeup handler in vmx_vcpu_pi_put
> >> and rely on PI.ON to wake up the sleeping process immediately. That
> >> should be feasible, but overall I like the current pre_block/post_block
> >> structure, and I think it's simpler. The only thing to be careful about
> >> is leaving the IRTE unmodified when scheduling out a blocked VCPU, which
> >> is cleaned up and simplified in patch 3.
> >>
> >> So I understand that the state may seem a bit too complicated as
> >> of this patch, but hopefully the next two make it clearer.
> > After re-read the codes and patches I got the point. Indeed current
> > way should be clearer since pre/post_block are mostly handling NV/DST
> > while pi_load/put are for SN bit. Thanks!
>
> Almost: pi_load handles NDST too. However, I think with patch 3 it's
> clearer how pi_load handles the nesting inside pre_block...post_block.
Yes. The old codes & comments for vmx_vcpu_pi_load() are not very easy
to understand for me.
For patch 3, not sure whether moving clear_sn() upper would be
clearer:
pi_load()
{
if (!pi_test_bit() && vcpu->cpu == cpu)
return;
/* Unconditionally clear SN */
pi_clear_sn();
/*
* If cpu not changed, no need to switch PDST; if WAKEUP, post_block
* will do it for us
*/
if (vcpu->cpu == cpu || nv == WAKEUP)
return;
/*
* Update PDST. Possibly the vcpu thread switched from one cpu to
* another.
*/
do {
...
} while (...)
}
Even, I'm thinking whether we can unconditionally setup PDST only in
pi_load(), then post_block() only needs to handle the NV bit.
(PS. since I'm at here... could I ask why in pi_pre_block we need to
udpate PDST as well? I guess that decides who will run the
wakeup_handler code to kick the vcpu thread, but would that really
matter?)
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-09 2:50 ` Peter Xu
@ 2017-06-09 7:29 ` Paolo Bonzini
2017-06-09 7:41 ` Peter Xu
0 siblings, 1 reply; 28+ messages in thread
From: Paolo Bonzini @ 2017-06-09 7:29 UTC (permalink / raw)
To: Peter Xu
Cc: linux-kernel, kvm, Longpeng, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On 09/06/2017 04:50, Peter Xu wrote:
> Even, I'm thinking whether we can unconditionally setup PDST only in
> pi_load(), then post_block() only needs to handle the NV bit.
No, you can't do that without fiddling with the blocked_vcpu lists in
pi_load.
> (PS. since I'm at here... could I ask why in pi_pre_block we need to
> udpate PDST as well? I guess that decides who will run the
> wakeup_handler code to kick the vcpu thread, but would that really
> matter?)
For this one it's a yes. :) I think it's not needed anymore indeed
after these patches; see this comment:
/*
* The wakeup_handler expects the VCPU to be on the
* blocked_vcpu_list that matches ndst. Interrupts
* are disabled so no preemption should happen, but
* err on the side of safety.
*/
So we could add a WARN.
Paolo
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-09 7:29 ` Paolo Bonzini
@ 2017-06-09 7:41 ` Peter Xu
0 siblings, 0 replies; 28+ messages in thread
From: Peter Xu @ 2017-06-09 7:41 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Longpeng, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On Fri, Jun 09, 2017 at 09:29:45AM +0200, Paolo Bonzini wrote:
>
>
> On 09/06/2017 04:50, Peter Xu wrote:
> > Even, I'm thinking whether we can unconditionally setup PDST only in
> > pi_load(), then post_block() only needs to handle the NV bit.
>
> No, you can't do that without fiddling with the blocked_vcpu lists in
> pi_load.
Then how about we keep the blocked_vcpu list maintainance in
post_block(), but only let pi_load() handle the PDST?
(I really feel like they are two things - the blocked_vcpu list helps
for the kick when wakeup happens; while PDST makes sure the PI is
always pointing to the correct cpu)
>
> > (PS. since I'm at here... could I ask why in pi_pre_block we need to
> > udpate PDST as well? I guess that decides who will run the
> > wakeup_handler code to kick the vcpu thread, but would that really
> > matter?)
>
> For this one it's a yes. :) I think it's not needed anymore indeed
> after these patches; see this comment:
>
> /*
> * The wakeup_handler expects the VCPU to be on the
> * blocked_vcpu_list that matches ndst.
Actually I was always unclear on what this sentense means: iiuc
blocked_vcpu_list is only a list of vcpus that may need a kick, so why
it has anything to do with PDST after all?
(or say, no matter what PDST is, we just kick the vcpu thread without
doing anything else, do we?)
> Interrupts
> * are disabled so no preemption should happen, but
> * err on the side of safety.
> */
>
> So we could add a WARN.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-06-06 10:57 ` [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts Paolo Bonzini
` (2 preceding siblings ...)
2017-06-08 6:50 ` Peter Xu
@ 2017-07-28 2:31 ` Longpeng (Mike)
2017-07-28 6:28 ` Paolo Bonzini
3 siblings, 1 reply; 28+ messages in thread
From: Longpeng (Mike) @ 2017-07-28 2:31 UTC (permalink / raw)
To: Paolo Bonzini
Cc: linux-kernel, kvm, Huangweidong, Gonglei, wangxin,
Radim Krčmář
Hi Paolo,
On 2017/6/6 18:57, Paolo Bonzini wrote:
> In some cases, for example involving hot-unplug of assigned
> devices, pi_post_block can forget to remove the vCPU from the
> blocked_vcpu_list. When this happens, the next call to
> pi_pre_block corrupts the list.
>
> Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
> and WARN instead of adding the element twice in the list. Second,
> always do the list removal in pi_post_block if vcpu->pre_pcpu is
> set (not -1).
>
> The new code keeps interrupts disabled for the whole duration of
> pi_pre_block/pi_post_block. This is not strictly necessary, but
> easier to follow. For the same reason, PI.ON is checked only
> after the cmpxchg, and to handle it we just call the post-block
> code. This removes duplication of the list removal code.
>
> Cc: Longpeng (Mike) <longpeng2@huawei.com>
> Cc: Huangweidong <weidong.huang@huawei.com>
> Cc: Gonglei <arei.gonglei@huawei.com>
> Cc: wangxin <wangxinxin.wang@huawei.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++--------------------------------
> 1 file changed, 25 insertions(+), 37 deletions(-)
>
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 747d16525b45..0f4714fe4908 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> struct pi_desc old, new;
> unsigned int dest;
> - unsigned long flags;
>
> do {
> old.control = new.control = pi_desc->control;
> + WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
> + "Wakeup handler not enabled while the VCPU is blocked\n");
>
> dest = cpu_physical_id(vcpu->cpu);
>
> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> } while (cmpxchg(&pi_desc->control, old.control,
> new.control) != old.control);
>
> - if(vcpu->pre_pcpu != -1) {
> - spin_lock_irqsave(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
__pi_post_block is only called by pi_post_block/pi_pre_block now, it seems that
both of them would make sure "vcpu->pre_pcpu != -1" before __pi_post_block is
called, so maybe the above check is useless, right?
> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> list_del(&vcpu->blocked_vcpu_list);
> - spin_unlock_irqrestore(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> vcpu->pre_pcpu = -1;
> }
> }
> @@ -11283,7 +11280,6 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
> */
> static int pi_pre_block(struct kvm_vcpu *vcpu)
> {
> - unsigned long flags;
> unsigned int dest;
> struct pi_desc old, new;
> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
> @@ -11293,34 +11289,20 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> !kvm_vcpu_apicv_active(vcpu))
> return 0;
>
> - vcpu->pre_pcpu = vcpu->cpu;
> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - list_add_tail(&vcpu->blocked_vcpu_list,
> - &per_cpu(blocked_vcpu_on_cpu,
> - vcpu->pre_pcpu));
> - spin_unlock_irqrestore(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> + WARN_ON(irqs_disabled());
> + local_irq_disable();
> + if (!WARN_ON_ONCE(vcpu->pre_pcpu != -1)) {
> + vcpu->pre_pcpu = vcpu->cpu;
> + spin_lock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> + list_add_tail(&vcpu->blocked_vcpu_list,
> + &per_cpu(blocked_vcpu_on_cpu,
> + vcpu->pre_pcpu));
> + spin_unlock(&per_cpu(blocked_vcpu_on_cpu_lock, vcpu->pre_pcpu));
> + }
>
> do {
> old.control = new.control = pi_desc->control;
>
> - /*
> - * We should not block the vCPU if
> - * an interrupt is posted for it.
> - */
> - if (pi_test_on(pi_desc) == 1) {
> - spin_lock_irqsave(&per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - list_del(&vcpu->blocked_vcpu_list);
> - spin_unlock_irqrestore(
> - &per_cpu(blocked_vcpu_on_cpu_lock,
> - vcpu->pre_pcpu), flags);
> - vcpu->pre_pcpu = -1;
> -
> - return 1;
> - }
> -
> WARN((pi_desc->sn == 1),
> "Warning: SN field of posted-interrupts "
> "is set before blocking\n");
> @@ -11345,7 +11327,12 @@ static int pi_pre_block(struct kvm_vcpu *vcpu)
> } while (cmpxchg(&pi_desc->control, old.control,
> new.control) != old.control);
>
> - return 0;
> + /* We should not block the vCPU if an interrupt is posted for it. */
> + if (pi_test_on(pi_desc) == 1)
> + __pi_post_block(vcpu);
> +
> + local_irq_enable();
> + return (vcpu->pre_pcpu == -1);
> }
>
> static int vmx_pre_block(struct kvm_vcpu *vcpu)
> @@ -11361,12 +11348,13 @@ static int vmx_pre_block(struct kvm_vcpu *vcpu)
>
> static void pi_post_block(struct kvm_vcpu *vcpu)
> {
> - if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
> - !irq_remapping_cap(IRQ_POSTING_CAP) ||
> - !kvm_vcpu_apicv_active(vcpu))
> + if (vcpu->pre_pcpu == -1)
> return;
>
> + WARN_ON(irqs_disabled());
> + local_irq_disable();
> __pi_post_block(vcpu);
> + local_irq_enable();
> }
>
> static void vmx_post_block(struct kvm_vcpu *vcpu)
--
Regards,
Longpeng(Mike)
^ permalink raw reply [flat|nested] 28+ messages in thread
* Re: [PATCH 2/4] KVM: VMX: avoid double list add with VT-d posted interrupts
2017-07-28 2:31 ` Longpeng (Mike)
@ 2017-07-28 6:28 ` Paolo Bonzini
0 siblings, 0 replies; 28+ messages in thread
From: Paolo Bonzini @ 2017-07-28 6:28 UTC (permalink / raw)
To: Longpeng (Mike)
Cc: linux-kernel, kvm, Huangweidong, Gonglei, wangxin,
Radim Krčmář
On 28/07/2017 04:31, Longpeng (Mike) wrote:
> Hi Paolo,
>
> On 2017/6/6 18:57, Paolo Bonzini wrote:
>
>> In some cases, for example involving hot-unplug of assigned
>> devices, pi_post_block can forget to remove the vCPU from the
>> blocked_vcpu_list. When this happens, the next call to
>> pi_pre_block corrupts the list.
>>
>> Fix this in two ways. First, check vcpu->pre_pcpu in pi_pre_block
>> and WARN instead of adding the element twice in the list. Second,
>> always do the list removal in pi_post_block if vcpu->pre_pcpu is
>> set (not -1).
>>
>> The new code keeps interrupts disabled for the whole duration of
>> pi_pre_block/pi_post_block. This is not strictly necessary, but
>> easier to follow. For the same reason, PI.ON is checked only
>> after the cmpxchg, and to handle it we just call the post-block
>> code. This removes duplication of the list removal code.
>>
>> Cc: Longpeng (Mike) <longpeng2@huawei.com>
>> Cc: Huangweidong <weidong.huang@huawei.com>
>> Cc: Gonglei <arei.gonglei@huawei.com>
>> Cc: wangxin <wangxinxin.wang@huawei.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>> arch/x86/kvm/vmx.c | 62 ++++++++++++++++++++++--------------------------------
>> 1 file changed, 25 insertions(+), 37 deletions(-)
>>
>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>> index 747d16525b45..0f4714fe4908 100644
>> --- a/arch/x86/kvm/vmx.c
>> +++ b/arch/x86/kvm/vmx.c
>> @@ -11236,10 +11236,11 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>> struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
>> struct pi_desc old, new;
>> unsigned int dest;
>> - unsigned long flags;
>>
>> do {
>> old.control = new.control = pi_desc->control;
>> + WARN(old.nv != POSTED_INTR_WAKEUP_VECTOR,
>> + "Wakeup handler not enabled while the VCPU is blocked\n");
>>
>> dest = cpu_physical_id(vcpu->cpu);
>>
>> @@ -11256,14 +11257,10 @@ static void __pi_post_block(struct kvm_vcpu *vcpu)
>> } while (cmpxchg(&pi_desc->control, old.control,
>> new.control) != old.control);
>>
>> - if(vcpu->pre_pcpu != -1) {
>> - spin_lock_irqsave(
>> - &per_cpu(blocked_vcpu_on_cpu_lock,
>> - vcpu->pre_pcpu), flags);
>> + if (!WARN_ON_ONCE(vcpu->pre_pcpu == -1)) {
>
>
> __pi_post_block is only called by pi_post_block/pi_pre_block now, it seems that
> both of them would make sure "vcpu->pre_pcpu != -1" before __pi_post_block is
> called, so maybe the above check is useless, right?
It's because a WARN is better than a double-add. And even if the caller
broke the invariant you'd have to do the cmpxchg loop above to make
things not break too much.
Paolo
^ permalink raw reply [flat|nested] 28+ messages in thread