From: Cosmin Marin <cosmin.marin@nutanix.com>
To: Peter Xu <peterx@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	Richard Henderson <rth@twiddle.net>
Subject: Re: [Qemu-devel] [PATCH] migration: Improve accuracy of vCPU throttling with per-vCPU timers
Date: Wed, 19 Jun 2019 15:23:28 +0000
Message-ID: <2D8771C7-51F3-4C9A-9E89-2E2A75686BF9@nutanix.com>
In-Reply-To: <20190619013534.GA8761@xz-x1>



On 19/06/2019, 02:35, "Peter Xu" <peterx@redhat.com> wrote:

    On Tue, Jun 18, 2019 at 04:52:09PM +0000, Cosmin Marin wrote:
    > 
    > 
    > On 18/06/2019, 15:51, "Peter Xu" <peterx@redhat.com> wrote:
    > 
    >     On Tue, Jun 18, 2019 at 12:25:43PM +0000, Cosmin Marin wrote:
    >     > 	Hi Peter,
    >     > 
    >     > 	thanks for reviewing the patch. Indeed, I agree that it's almost impossible to determine which solution is better from a scalability perspective. However, I feel that using per-vCPU timers is the only way to ensure the correctness of the throttling ratio.
    >     
    >     The thing is that your patch actually contains two changes:
    >     
    >     1. use N timers instead of one.
    >     
    >     2. remove throttle_thread_scheduled check, so we do the throttle
    >        always
    >     
    >     What I'm worried about here is that _maybe_ the 2nd item is the
    >     one that really helped.
    >     
    > 	C: The removal of *throttle_thread_scheduled* is purely a consequence of the per-vCPU model. In this model, each vCPU schedules work only for itself (as part of the timer's firing callback) - there is no global point of control - so the variable is no longer useful for scheduling.
    > 
    >     Note that there is a side effect: we might queue more than one
    >     work item on one specific cpu if we queue too fast, but that does
    >     not block us from trying it out to identify which item (1 or 2 or
    >     both) really helped here.  Then, if we think that (queuing too
    >     much) is an issue, we can discuss how to fix it, since the current
    >     patch will have this problem as well.
    >     
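	C: For context, the guard in question sits in the pre-patch, single-timer tick, which looks roughly like this (simplified from cpus.c of that time, comments added):

	    static void cpu_throttle_timer_tick(void *opaque)
	    {
	        CPUState *cpu;
	        double pct;

	        /* Stop the timer if throttling has been turned off. */
	        if (!cpu_throttle_get_percentage()) {
	            return;
	        }

	        CPU_FOREACH(cpu) {
	            /* Skip vCPUs whose previous throttle work is still pending. */
	            if (!atomic_xchg(&cpu->throttle_thread_scheduled, 1)) {
	                async_run_on_cpu(cpu, cpu_throttle_thread, RUN_ON_CPU_NULL);
	            }
	        }

	        /* Re-arm the single global timer one throttling period ahead. */
	        pct = (double)cpu_throttle_get_percentage() / 100;
	        timer_mod(throttle_timer,
	                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
	                  CPU_THROTTLE_TIMESLICE_NS / (1 - pct));
	    }

	The atomic_xchg() on *throttle_thread_scheduled* is the check the patch removes, since the per-vCPU callback no longer loops over all vCPUs.
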
    > 	C: I believe that in the per-vCPU timer implementation we cannot queue more than one piece of work because here the vCPU queues work for itself, and that happens only when the timer fires - so the two "states", scheduling and sleeping, are mutually exclusive, running from the same thread context.
    
    I think this is the point I'm questioning - I don't think they run in
    the same context.  IMO the timer will always run in the main thread,
    whether you use per-vCPU timers or not, whereas the sleeping part runs
    per-vCPU.
    
    A simple way to verify it would be: break at cpu_throttle_timer_tick()
    to see which thread it is running in.
    
	You're absolutely right - that was indeed a confusion on my part (I also ran a test printing the thread IDs to confirm it). However, I believe there are two contributing factors that prevent scheduling more than one piece of work:
		- the timer's firing period is always greater than the vCPU's sleep interval, so the timer won't fire while a vCPU is sleeping and no additional work gets scheduled (as long as the start of the sleep does not drift "to the right", towards the timer's next firing; see the arithmetic after this list)
		- the timer's callback schedules work for one vCPU only (simple & fast), which prevents extra delays between work executions on different vCPUs and a potential overlap of a timer firing with a vCPU sleep
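
	Concretely - assuming the patch keeps the upstream formulas - the tick re-arms the timer CPU_THROTTLE_TIMESLICE_NS / (1 - pct) ns ahead, while the queued work sleeps for pct / (1 - pct) * CPU_THROTTLE_TIMESLICE_NS ns, so:

	    period - sleep = (1 - pct) / (1 - pct) * CPU_THROTTLE_TIMESLICE_NS
	                   = CPU_THROTTLE_TIMESLICE_NS

	For any pct < 1 the firing period therefore exceeds the sleep interval by exactly one timeslice (10 ms upstream), which is the slack the first bullet relies on.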

    >     > 
    >     > 	It's a bit unclear to me how the throttling ratio inconsistency can be fixed by using a single timer, even when avoiding the conditional timer re-arming. Could you provide more details about the use of a single timer?
    > 
    > 	C: I feel that in this case it would sleep too much, running into a problem similar to the one solved by commit 90bb0c0: under heavy throttling, more than one work item may be scheduled.
    
    Right.  So I feel like we need a solution that will avoid this problem
    but at the same time keep the proper accuracy of the throttling.
    
	IMO the patch achieves both goals without putting too much pressure on the main thread when running the callbacks; we saw no problems related to the main thread or the callbacks in any of the tests we ran.
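
	To make that concrete, the per-vCPU model boils down to something like the sketch below (simplified; the per-CPUState timer field is named illustratively here and may differ in the actual patch):

	    /* One QEMU_CLOCK_VIRTUAL_RT timer per vCPU: the callback queues
	     * the sleep for its own vCPU only and then re-arms its own timer,
	     * so no cross-vCPU guard is needed. */
	    static void cpu_throttle_timer_tick(void *opaque)
	    {
	        CPUState *cpu = opaque;
	        double pct;

	        /* Throttling disabled: let this vCPU's timer lapse. */
	        if (!cpu_throttle_get_percentage()) {
	            return;
	        }

	        /* Queue the sleep on this vCPU only. */
	        async_run_on_cpu(cpu, cpu_throttle_thread, RUN_ON_CPU_NULL);

	        /* Re-arm this vCPU's timer one throttling period ahead;
	         * cpu->throttle_timer is the per-vCPU timer assumed to be
	         * added by the patch. */
	        pct = (double)cpu_throttle_get_percentage() / 100;
	        timer_mod(cpu->throttle_timer,
	                  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL_RT) +
	                  CPU_THROTTLE_TIMESLICE_NS / (1 - pct));
	    }

	As discussed above, the callback still runs in the main thread; only the sleep in cpu_throttle_thread() runs in the vCPU's own thread.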

    Thanks,
    
    -- 
    Peter Xu
    
Thanks,
Cosmin

