Re: [PATCH v2] KVM: PPC: Defer vtime accounting 'til after IRQ handling

From: Laurent Vivier <lvivier@redhat.com>
To: Nicholas Piggin <npiggin@gmail.com>,
	kvm-ppc@vger.kernel.org, Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Greg Kurz <groug@kaod.org>,
	linux-kernel@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	Paul Mackerras <paulus@ozlabs.org>,
	stable@vger.kernel.org
Subject: Re: [PATCH v2] KVM: PPC: Defer vtime accounting 'til after IRQ handling
Date: Wed, 20 Oct 2021 08:29:06 +0200
Message-ID: <2a13119c-ccec-1dd5-8cf6-da07a9d8fe6f@redhat.com>
In-Reply-To: <1634263564.zfj0ajf8eh.astroid@bobo.none>

On 15/10/2021 04:23, Nicholas Piggin wrote:
> Excerpts from Laurent Vivier's message of October 13, 2021 7:30 pm:
>> On 13/10/2021 01:18, Michael Ellerman wrote:
>>> Laurent Vivier <lvivier@redhat.com> writes:
>>>> Commit 112665286d08 moved guest_exit() in the interrupt protected
>>>> area to avoid wrong context warning (or worse), but the tick counter
>>>> cannot be updated and the guest time is accounted to the system time.
>>>>
>>>> To fix the problem port to POWER the x86 fix
>>>> 160457140187 ("Defer vtime accounting 'til after IRQ handling"):
>>>>
>>>> "Defer the call to account guest time until after servicing any IRQ(s)
>>>>    that happened in the guest or immediately after VM-Exit.  Tick-based
>>>>    accounting of vCPU time relies on PF_VCPU being set when the tick IRQ
>>>>    handler runs, and IRQs are blocked throughout the main sequence of
>>>>    vcpu_enter_guest(), including the call into vendor code to actually
>>>>    enter and exit the guest."
>>>>
>>>> Fixes: 112665286d08 ("KVM: PPC: Book3S HV: Context tracking exit guest context before enabling irqs")
>>>> Cc: npiggin@gmail.com
>>>> Cc: <stable@vger.kernel.org> # 5.12
>>>> Signed-off-by: Laurent Vivier <lvivier@redhat.com>
>>>> ---
>>>>
>>>> Notes:
>>>>       v2: remove reference to commit 61bd0f66ff92
>>>>           cc stable 5.12
>>>>           add the same comment in the code as for x86
>>>>
>>>>    arch/powerpc/kvm/book3s_hv.c | 24 ++++++++++++++++++++----
>>>>    1 file changed, 20 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>>>> index 2acb1c96cfaf..a694d1a8f6ce 100644
>>>> --- a/arch/powerpc/kvm/book3s_hv.c
>>>> +++ b/arch/powerpc/kvm/book3s_hv.c
>>> ...
>>>> @@ -4506,13 +4514,21 @@ int kvmhv_run_single_vcpu(struct kvm_vcpu *vcpu, u64 time_limit,
>>>>    
>>>>    	srcu_read_unlock(&kvm->srcu, srcu_idx);
>>>>    
>>>> +	context_tracking_guest_exit();
>>>> +
>>>>    	set_irq_happened(trap);
>>>>    
>>>>    	kvmppc_set_host_core(pcpu);
>>>>    
>>>> -	guest_exit_irqoff();
>>>> -
>>>>    	local_irq_enable();
>>>> +	/*
>>>> +	 * Wait until after servicing IRQs to account guest time so that any
>>>> +	 * ticks that occurred while running the guest are properly accounted
>>>> +	 * to the guest.  Waiting until IRQs are enabled degrades the accuracy
>>>> +	 * of accounting via context tracking, but the loss of accuracy is
>>>> +	 * acceptable for all known use cases.
>>>> +	 */
>>>> +	vtime_account_guest_exit();
>>>
>>> This pops a warning for me, running guest(s) on Power8:
>>>    
>>>     [  270.745303][T16661] ------------[ cut here ]------------
>>>     [  270.745374][T16661] WARNING: CPU: 72 PID: 16661 at arch/powerpc/kernel/time.c:311 vtime_account_kernel+0xe0/0xf0
>>
>> Thank you, I missed that...
>>
>> My patch is wrong, I have to add vtime_account_guest_exit() before the local_irq_enable().
> 
> I thought so because if we take an interrupt after exiting the guest that
> should be accounted to kernel not guest.
> 
>>
>> arch/powerpc/kernel/time.c
>>
>>    305 static unsigned long vtime_delta(struct cpu_accounting_data *acct,
>>    306                                  unsigned long *stime_scaled,
>>    307                                  unsigned long *steal_time)
>>    308 {
>>    309         unsigned long now, stime;
>>    310
>>    311         WARN_ON_ONCE(!irqs_disabled());
>> ...
>>
>> But I don't understand how ticks can be accounted now if irqs are still disabled.
>>
>> Not sure it is as simple as expected...
> 
> I don't know all the timer stuff too well. The
> !CONFIG_VIRT_CPU_ACCOUNTING case is relying on PF_VCPU to be set when
> the host timer interrupt runs irqtime_account_process_tick so it
> can accumulate that tick to the guest?
> 
> That probably makes sense then, but it seems like we need that in a
> different place. Timer interrupts are not guaranteed to be the first one
> to occur when interrupts are enabled.
> 
> Maybe a new tick_account_guest_exit() and move PF_VCPU clearing to that
> for tick based accounting. Call it after local_irq_enable and call the
> vtime accounting before it. Would that work?

Hi Nick,

I don't think I will have the time to look into fixing this.

Could you try?
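
In case it helps as a starting point, here is the ordering you describe
written as a userspace toy model in C. It is not kernel code: struct
cpu_model, the model_*() helpers and the ORDER_* names are all made up for
illustration, tick_account_guest_exit() is only the name you proposed, and
the model assumes the pending timer tick is the first interrupt taken once
IRQs are enabled, which as you point out is not guaranteed.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy userspace model of the guest-exit ordering; NOT kernel code. */
struct cpu_model {
	bool irqs_disabled;
	bool pf_vcpu;			/* stands in for PF_VCPU */
	bool tick_pending;		/* tick raised while in the guest */
	unsigned long guest_ticks;	/* ticks charged to the guest */
	unsigned long system_ticks;	/* ticks charged to the host */
	int warnings;			/* models WARN_ON_ONCE(!irqs_disabled()) */
};

enum exit_order {
	ORDER_5_12,		/* all accounting before local_irq_enable() */
	ORDER_PATCH_V2,		/* all accounting after local_irq_enable() */
	ORDER_SUGGESTED,	/* vtime before, PF_VCPU clearing after */
};

/* Tick-based accounting: charge the tick according to PF_VCPU. */
static void model_timer_tick(struct cpu_model *c)
{
	if (c->pf_vcpu)
		c->guest_ticks++;
	else
		c->system_ticks++;
}

/* Assumption: the pending tick is delivered as soon as IRQs are enabled. */
static void model_local_irq_enable(struct cpu_model *c)
{
	c->irqs_disabled = false;
	if (c->tick_pending) {
		c->tick_pending = false;
		model_timer_tick(c);
	}
}

/* Precise (vtime) accounting: must run with IRQs disabled. */
static void model_vtime_account(struct cpu_model *c)
{
	if (!c->irqs_disabled)
		c->warnings++;
}

/* Hypothetical helper from the suggestion: only clears PF_VCPU. */
static void model_tick_account_guest_exit(struct cpu_model *c)
{
	c->pf_vcpu = false;
}

/* Replay one guest exit with a tick pending, using the given ordering. */
static struct cpu_model run_guest_exit(enum exit_order order)
{
	struct cpu_model c = {
		.irqs_disabled = true,
		.pf_vcpu = true,
		.tick_pending = true,
	};

	switch (order) {
	case ORDER_5_12:
		model_vtime_account(&c);
		model_tick_account_guest_exit(&c);
		model_local_irq_enable(&c);
		break;
	case ORDER_PATCH_V2:
		model_local_irq_enable(&c);
		model_vtime_account(&c);
		model_tick_account_guest_exit(&c);
		break;
	case ORDER_SUGGESTED:
		model_vtime_account(&c);
		model_local_irq_enable(&c);
		model_tick_account_guest_exit(&c);
		break;
	}
	return c;
}

/* Self-check: only the suggested ordering avoids both problems. */
static void check_orderings(void)
{
	struct cpu_model c = run_guest_exit(ORDER_SUGGESTED);

	assert(c.guest_ticks == 1 && c.system_ticks == 0 && c.warnings == 0);
}
```

Calling run_guest_exit() with each ordering shows the three cases from this
thread in the model: ORDER_5_12 charges the tick to the system,
ORDER_PATCH_V2 charges it to the guest but trips the IRQs-disabled check,
and ORDER_SUGGESTED charges it to the guest without the warning.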

Thanks,
Laurent