* Async page fault delivered while irq are disabled?
@ 2019-12-19 15:28 Frederic Weisbecker
2019-12-19 15:57 ` Sean Christopherson
0 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2019-12-19 15:28 UTC (permalink / raw)
To: Paolo Bonzini, Radim Krčmář
Cc: Sean Christopherson, Vitaly Kuznetsov, Wanpeng Li, Jim Mattson,
Joerg Roedel, kvm
Hi,
While checking the x86 async page fault code, I can't
find anything that prevents KVM_PV_REASON_PAGE_READY to be injected
while the guest has interrupts disabled. If that page fault happens
to trap in an interrupt disabled section, there may be a deadlock due to the
call to wake_up_process() which locks the rq->lock (among others).
Given how long that code is there, I guess such an issue would
have been reported for a while already. But I just would like to
be sure we are checking that.
Can someone enlighten me?
Thanks.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled?
2019-12-19 15:28 Async page fault delivered while irq are disabled? Frederic Weisbecker
@ 2019-12-19 15:57 ` Sean Christopherson
2019-12-19 16:15 ` Frederic Weisbecker
0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2019-12-19 15:57 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Paolo Bonzini, Radim Krčmář,
Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm
On Thu, Dec 19, 2019 at 04:28:15PM +0100, Frederic Weisbecker wrote:
> Hi,
>
> While checking the x86 async page fault code, I can't
> find anything that prevents KVM_PV_REASON_PAGE_READY to be injected
> while the guest has interrupts disabled. If that page fault happens
> to trap in an interrupt disabled section, there may be a deadlock due to the
> call to wake_up_process() which locks the rq->lock (among others).
>
> Given how long that code is there, I guess such an issue would
> have been reported for a while already. But I just would like to
> be sure we are checking that.
>
> Can someone enlighten me?
The check is triggered from the caller of kvm_async_page_present().
kvm_check_async_pf_completion()
|
|-> kvm_arch_can_inject_async_page_present()
|
|-> kvm_can_do_async_pf()
|
|-> kvm_x86_ops->interrupt_allowed()
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled?
2019-12-19 15:57 ` Sean Christopherson
@ 2019-12-19 16:15 ` Frederic Weisbecker
2019-12-19 19:00 ` Sean Christopherson
0 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2019-12-19 16:15 UTC (permalink / raw)
To: Sean Christopherson
Cc: Paolo Bonzini, Radim Krčmář,
Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm
On Thu, Dec 19, 2019 at 07:57:46AM -0800, Sean Christopherson wrote:
> On Thu, Dec 19, 2019 at 04:28:15PM +0100, Frederic Weisbecker wrote:
> > Hi,
> >
> > While checking the x86 async page fault code, I can't
> > find anything that prevents KVM_PV_REASON_PAGE_READY to be injected
> > while the guest has interrupts disabled. If that page fault happens
> > to trap in an interrupt disabled section, there may be a deadlock due to the
> > call to wake_up_process() which locks the rq->lock (among others).
> >
> > Given how long that code is there, I guess such an issue would
> > have been reported for a while already. But I just would like to
> > be sure we are checking that.
> >
> > Can someone enlighten me?
>
> The check is triggered from the caller of kvm_async_page_present().
>
> kvm_check_async_pf_completion()
> |
> |-> kvm_arch_can_inject_async_page_present()
> |
> |-> kvm_can_do_async_pf()
> |
> |-> kvm_x86_ops->interrupt_allowed()
Ah thanks, I missed that one. And what about
kvm_async_page_present_sync()? I don't see a similar check
there.
And one last silly question, what about that line in
kvm_arch_can_inject_async_page_present:
if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
return true;
That looks weird, also it shortcuts the irqs_allowed() check.
Thanks!
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled?
2019-12-19 16:15 ` Frederic Weisbecker
@ 2019-12-19 19:00 ` Sean Christopherson
2019-12-20 9:34 ` Paolo Bonzini
0 siblings, 1 reply; 8+ messages in thread
From: Sean Christopherson @ 2019-12-19 19:00 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Paolo Bonzini, Radim Krčmář,
Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm
On Thu, Dec 19, 2019 at 05:15:25PM +0100, Frederic Weisbecker wrote:
> On Thu, Dec 19, 2019 at 07:57:46AM -0800, Sean Christopherson wrote:
> > On Thu, Dec 19, 2019 at 04:28:15PM +0100, Frederic Weisbecker wrote:
> > > Hi,
> > >
> > > While checking the x86 async page fault code, I can't
> > > find anything that prevents KVM_PV_REASON_PAGE_READY to be injected
> > > while the guest has interrupts disabled. If that page fault happens
> > > to trap in an interrupt disabled section, there may be a deadlock due to the
> > > call to wake_up_process() which locks the rq->lock (among others).
> > >
> > > Given how long that code is there, I guess such an issue would
> > > have been reported for a while already. But I just would like to
> > > be sure we are checking that.
> > >
> > > Can someone enlighten me?
> >
> > The check is triggered from the caller of kvm_async_page_present().
> >
> > kvm_check_async_pf_completion()
> > |
> > |-> kvm_arch_can_inject_async_page_present()
> > |
> > |-> kvm_can_do_async_pf()
> > |
> > |-> kvm_x86_ops->interrupt_allowed()
>
> Ah thanks, I missed that one. And what about
> kvm_async_page_present_sync()? I don't see a similar check
> there.
CONFIG_KVM_ASYNC_PF_SYNC is selected only by s390, it can't be turned on
for x86.
> And one last silly question, what about that line in
> kvm_arch_can_inject_async_page_present:
>
> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
> return true;
>
> That looks weird, also it shortcuts the irqs_allowed() check.
I wondered about that code as well :-). Definitely odd, but it would
require the guest to disable async #PF after an async #PF is queued. Best
guess is the idea is that it's the guest's problem if it disables async #PF
on the fly.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled?
2019-12-19 19:00 ` Sean Christopherson
@ 2019-12-20 9:34 ` Paolo Bonzini
2019-12-23 2:17 ` Frederic Weisbecker
0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2019-12-20 9:34 UTC (permalink / raw)
To: Sean Christopherson, Frederic Weisbecker
Cc: Radim Krčmář,
Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm
On 19/12/19 20:00, Sean Christopherson wrote:
>> And one last silly question, what about that line in
>> kvm_arch_can_inject_async_page_present:
>>
>> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
>> return true;
>>
>> That looks weird, also it shortcuts the irqs_allowed() check.
>
> I wondered about that code as well :-). Definitely odd, but it would
> require the guest to disable async #PF after an async #PF is queued. Best
> guess is the idea is that it's the guest's problem if it disables async #PF
> on the fly.
>
When the guest disables async #PF all outstanding page faults are
cancelled by kvm_clear_async_pf_completion_queue. However, in case they
complete while in cancel_work_sync. you need to inject them even if
interrupts are disabled.
Paolo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled?
2019-12-20 9:34 ` Paolo Bonzini
@ 2019-12-23 2:17 ` Frederic Weisbecker
2019-12-23 8:38 ` Paolo Bonzini
0 siblings, 1 reply; 8+ messages in thread
From: Frederic Weisbecker @ 2019-12-23 2:17 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Sean Christopherson, Radim Krčmář,
Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm
On Fri, Dec 20, 2019 at 10:34:20AM +0100, Paolo Bonzini wrote:
> On 19/12/19 20:00, Sean Christopherson wrote:
> >> And one last silly question, what about that line in
> >> kvm_arch_can_inject_async_page_present:
> >>
> >> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
> >> return true;
> >>
> >> That looks weird, also it shortcuts the irqs_allowed() check.
> >
> > I wondered about that code as well :-). Definitely odd, but it would
> > require the guest to disable async #PF after an async #PF is queued. Best
> > guess is the idea is that it's the guest's problem if it disables async #PF
> > on the fly.
> >
>
> When the guest disables async #PF all outstanding page faults are
> cancelled by kvm_clear_async_pf_completion_queue. However, in case they
> complete while in cancel_work_sync. you need to inject them even if
> interrupts are disabled.
Hmm, shouldn't the guest wait for the whole pending waitqueue in kvm_async_pf_task_wait()
to be serviced and woken up before actually allowing to disable async #PF ?
Because you can't really afford to inject those #PF while IRQs are disabled,
that's a big rq deadlock risk.
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled?
2019-12-23 2:17 ` Frederic Weisbecker
@ 2019-12-23 8:38 ` Paolo Bonzini
2019-12-26 17:28 ` Frederic Weisbecker
0 siblings, 1 reply; 8+ messages in thread
From: Paolo Bonzini @ 2019-12-23 8:38 UTC (permalink / raw)
To: Frederic Weisbecker
Cc: Sean Christopherson, Radim Krčmář,
Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm
On 23/12/19 03:17, Frederic Weisbecker wrote:
> On Fri, Dec 20, 2019 at 10:34:20AM +0100, Paolo Bonzini wrote:
>> On 19/12/19 20:00, Sean Christopherson wrote:
>>>> And one last silly question, what about that line in
>>>> kvm_arch_can_inject_async_page_present:
>>>>
>>>> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
>>>> return true;
>>>>
>>>> That looks weird, also it shortcuts the irqs_allowed() check.
>>>
>>> I wondered about that code as well :-). Definitely odd, but it would
>>> require the guest to disable async #PF after an async #PF is queued. Best
>>> guess is the idea is that it's the guest's problem if it disables async #PF
>>> on the fly.
>>>
>>
>> When the guest disables async #PF all outstanding page faults are
>> cancelled by kvm_clear_async_pf_completion_queue. However, in case they
>> complete while in cancel_work_sync. you need to inject them even if
>> interrupts are disabled.
>
> Hmm, shouldn't the guest wait for the whole pending waitqueue in kvm_async_pf_task_wait()
> to be serviced and woken up before actually allowing to disable async #PF ?
> Because you can't really afford to inject those #PF while IRQs are disabled,
> that's a big rq deadlock risk.
That's just how Linux works, and Linux doesn't ever disable async page
faults with disabled IRQ (reboot_notifier_list is a blocking notifier).
Paolo
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Async page fault delivered while irq are disabled?
2019-12-23 8:38 ` Paolo Bonzini
@ 2019-12-26 17:28 ` Frederic Weisbecker
0 siblings, 0 replies; 8+ messages in thread
From: Frederic Weisbecker @ 2019-12-26 17:28 UTC (permalink / raw)
To: Paolo Bonzini
Cc: Sean Christopherson, Radim Krčmář,
Vitaly Kuznetsov, Wanpeng Li, Jim Mattson, Joerg Roedel, kvm
On Mon, Dec 23, 2019 at 09:38:18AM +0100, Paolo Bonzini wrote:
> On 23/12/19 03:17, Frederic Weisbecker wrote:
> > On Fri, Dec 20, 2019 at 10:34:20AM +0100, Paolo Bonzini wrote:
> >> On 19/12/19 20:00, Sean Christopherson wrote:
> >>>> And one last silly question, what about that line in
> >>>> kvm_arch_can_inject_async_page_present:
> >>>>
> >>>> if (!(vcpu->arch.apf.msr_val & KVM_ASYNC_PF_ENABLED))
> >>>> return true;
> >>>>
> >>>> That looks weird, also it shortcuts the irqs_allowed() check.
> >>>
> >>> I wondered about that code as well :-). Definitely odd, but it would
> >>> require the guest to disable async #PF after an async #PF is queued. Best
> >>> guess is the idea is that it's the guest's problem if it disables async #PF
> >>> on the fly.
> >>>
> >>
> >> When the guest disables async #PF all outstanding page faults are
> >> cancelled by kvm_clear_async_pf_completion_queue. However, in case they
> >> complete while in cancel_work_sync. you need to inject them even if
> >> interrupts are disabled.
> >
> > Hmm, shouldn't the guest wait for the whole pending waitqueue in kvm_async_pf_task_wait()
> > to be serviced and woken up before actually allowing to disable async #PF ?
> > Because you can't really afford to inject those #PF while IRQs are disabled,
> > that's a big rq deadlock risk.
>
> That's just how Linux works, and Linux doesn't ever disable async page
> faults with disabled IRQ (reboot_notifier_list is a blocking notifier).
So when I talk about IRQs enabled requirement, this is to prevent the page fault from
interrupting code that may hold a lock.
Now in those case I think we are good, as kvm_pv_guest_cpu_reboot() is called from
a generic IPI (rq and others shouldn't be held at that time) and kvm_guest_cpu_offline()
is called from a thread with interrupts disabled.
Anyway those semantics and expectations are very obscure. Probably those async page
faults should be considered as IRQs from lockdep POV.
^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2019-12-26 17:28 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-12-19 15:28 Async page fault delivered while irq are disabled? Frederic Weisbecker
2019-12-19 15:57 ` Sean Christopherson
2019-12-19 16:15 ` Frederic Weisbecker
2019-12-19 19:00 ` Sean Christopherson
2019-12-20 9:34 ` Paolo Bonzini
2019-12-23 2:17 ` Frederic Weisbecker
2019-12-23 8:38 ` Paolo Bonzini
2019-12-26 17:28 ` Frederic Weisbecker
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).