Re: [PATCH v3 5/7] vpci: fix execution of long running operations

From: "Roger Pau Monné" <roger.pau@citrix.com>
To: Jan Beulich <JBeulich@suse.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>,
	Wei Liu <wei.liu2@citrix.com>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	George Dunlap <George.Dunlap@eu.citrix.com>,
	Andrew Cooper <andrew.cooper3@citrix.com>,
	Ian Jackson <Ian.Jackson@eu.citrix.com>, Tim Deegan <tim@xen.org>,
	Julien Grall <julien.grall@arm.com>,
	Paul Durrant <paul.durrant@citrix.com>,
	xen-devel <xen-devel@lists.xenproject.org>
Subject: Re: [PATCH v3 5/7] vpci: fix execution of long running operations
Date: Thu, 8 Nov 2018 12:29:59 +0100	[thread overview]
Message-ID: <20181108112959.7oc4guik3icb7nls@mac.citrite.net> (raw)
In-Reply-To: <5BE4082C02000078001F97FE@prv1-mh.provo.novell.com>

On Thu, Nov 08, 2018 at 02:55:56AM -0700, Jan Beulich wrote:
> >>> On 07.11.18 at 18:15, <roger.pau@citrix.com> wrote:
> > On Wed, Nov 07, 2018 at 08:06:00AM -0700, Jan Beulich wrote:
> >> >>> On 07.11.18 at 12:11, <roger.pau@citrix.com> wrote:
> >> > On Mon, Nov 05, 2018 at 09:56:13AM -0700, Jan Beulich wrote:
> >> >> >>> On 30.10.18 at 16:41, <roger.pau@citrix.com> wrote:
> >> >> > BAR map/unmap is a long running operation that needs to be preempted
> >> >> > in order to avoid overrunning the assigned vCPU time (or even
> >> >> > triggering the watchdog).
> >> >> > 
> >> >> > Current logic for this preemption is wrong, and won't work at all for
> >> >> > AMD since only Intel makes use of hvm_io_pending (and even in that
> >> >> > case the current code is wrong).
> >> >> 
> >> >> I'm having trouble understanding this, both for the AMD aspect
> >> >> (it is only vvmx.c which has a function call not mirrored on the
> >> >> AMD side) and for the supposed general brokenness. Without
> >> >> some clarification I can't judge whether re-implementing via
> >> >> tasklet is actually the best approach.
> >> > 
> >> > hvm_io_pending itself cannot block the vCPU from executing, it's used
> >> > by nvmx_switch_guest in order to prevent changing the nested VMCS if
> >> > there's pending IO emulation work AFAICT.
> >> > 
> >> > The only way I could find to actually prevent a vCPU from running
> >> > while doing some work on it's behalf in a preemptive way is by
> >> > blocking it and using a tasklet. What's done with IOREQs is not
> >> > suitable here since Xen needs to do some work instead of just wait on
> >> > an external event (an event channel from the IOREQ).
> >> 
> >> No, there is a second means, I've just confused the functions. The
> >> question is whether your vpci_process_pending() invocation
> >> perhaps sits in the wrong function. handle_hvm_io_completion() is
> >> what hvm_do_resume() calls, and what can prevent a guest from
> >> resuming execution. The hvm_io_pending() invocation just sits on
> >> a special case path down from there (through handle_pio()).
> > 
> > Yes, handle_hvm_io_completion is the function that actually blocks the
> > vCPU and waits for an event channel from the ioreq. This is however
> > not suitable because it uses the following code (simplified):
> > 
> > set_bit(_VPF_blocked_in_xen, &current->pause_flags);
> > raise_softirq(SCHEDULE_SOFTIRQ);
> > do_softirq();
> > 
> > In the vPCI case Xen cannot set the vCPU as blocked_in_xen, Xen needs
> > the scheduler to schedule the vCPU so the pending work can be
> > processed.
> 
> Right, and I didn't say you should set the vCPU to blocked. What
> I've pointed out is that the mere fact of handle_hvm_io_completion()
> returning false makes hvm_do_resume() bail, resulting in another
> round through softirq processing (from entry.S code) as long as
> _some_ softirq is pending (here: the scheduler one).

No, hvm_do_resume bailing doesn't prevent the VM from being scheduled.
Both vmx_do_resume and svm_do_resume (the callers of hvm_do_resume)
will run the guest regardless of whether hvm_do_resume has bailed
early.

Note that at the point where hvm_do_resume is called there's no
turning back, both {vmx/svm}_do_resume functions are annotated as
noreturn. The only way to prevent guest execution at that point is to
raise a scheduler softirq and process it, like it's done in
wait_on_xen_event_channel.

> 
>  Then if the blocked bit is not set the call to do_softirq
> > would be recurred, thus probably leading to a stack overflow if
> > there's enough pending work. ie:
> > 
> > <process work>
> > 	<do_softirq>
> > 		<process work>
> > 			<do_softirq>
> > 				<...>
> 
> Why would that be? The do_softirq() invocation sits on the exit-
> to-guest path, explicitly avoiding any such nesting unless there
> was a do_softirq() invocation somewhere in a softirq handler.

It sits on an exit-to-guest path, but the following chunk:

raise_softirq(SCHEDULE_SOFTIRQ);
do_softirq();

Would prevent the path from ever reaching the exit-to-guest and
nesting on itself, unless the vCPU is marked as blocked, which
prevents it from being scheduled thus avoiding this recursion.

Thanks, Roger.

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xenproject.org
https://lists.xenproject.org/mailman/listinfo/xen-devel