All of lore.kernel.org
 help / color / mirror / Atom feed
From: Marcelo Tosatti <mtosatti@redhat.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>,
	mingo@redhat.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Thomas Gleixner <tglx@linutronix.de>
Subject: Re: [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\
Date: Wed, 27 Sep 2017 21:44:54 -0300	[thread overview]
Message-ID: <20170928004452.GA30040@amt.cnet> (raw)
In-Reply-To: <6f4afefd-8726-13ff-371e-0d3896b4cf6a@redhat.com>

On Wed, Sep 27, 2017 at 11:37:48AM +0200, Paolo Bonzini wrote:
> On 27/09/2017 00:49, Marcelo Tosatti wrote:
> > On Mon, Sep 25, 2017 at 05:12:42PM +0200, Paolo Bonzini wrote:
> >> On 25/09/2017 11:13, Peter Zijlstra wrote:
> >>> On Sun, Sep 24, 2017 at 11:57:53PM -0300, Marcelo Tosatti wrote:
> >>>> I think you are missing the following point:
> >>>>
> >>>> "vcpu0 can be interrupted when its not in a spinlock protected section, 
> >>>> otherwise it can't."
> >>
> >> Who says that?  Certainly a driver can dedicate a single VCPU to
> >> periodic polling of the device, in such a way that the polling does not
> >> require a spinlock.
> > 
> > This sequence:
> > 
> > 
> > VCPU-0					VCPU-1 (running realtime workload)
> > 
> > takes spinlock A
> > scheduled out				
> > 					spinlock(A) (busy spins until
> > 						     VCPU-0 is scheduled
> > 						     back in)
> > scheduled in
> > finishes execution of 
> > code under protected section
> > releases spinlock(A)			
> > 
> > 					takes spinlock(A)
> 
> Yes, but then you have
> 
>                                         busy waits for flag to be set
>   polls device
>   scheduled out
>   device receives data
>                                         ...
>   scheduled in
>   set flag
> 
> or
> 
>   check for work to do
>   scheduled out
>                                         submit work
>                                         busy waits for work to complete
>   scheduled in
>   do work
> 
> None of which have anything to do with spinlocks.  They're just
> different ways to get priority inversion, and this...
> 
> >>>> So this point of "everything needs to be RT and the priorities must be
> >>>> designed carefully", is this: 
> >>>>
> >>>> 	WHEN in spinlock protected section (more specifically, when 
> >>>> 	spinlock protected section _shared with realtime vcpus_),
> >>>>
> >>>> 	priority of vcpu0 > priority of emulator thread
> >>>>
> >>>> 	OTHERWISE
> >>>>
> >>>> 	priority of vcpu0 < priority of emulator thread.
> >>
> >> This is _not_ designed carefully, this is messy.
> > 
> > This is very precise to me. What is "messy" about it? (its clearly
> > defined).
> 
> ... it's not how you design RT systems to avoid priority inversion.
> It's just solving an instance of the issue.
> 
> >> The emulator thread can interrupt the VCPU thread, so it has to be at
> >> higher RT priority (+ priority inheritance of mutexes).  
> > 
> > It can only do that _when_ the VCPU thread is not running a critical
> > section which a higher priority task depends on.
> 
> All VCPUs must have the same priority.  There's no such thing as a
> housekeeping VCPU.
> 
> There could be one sacrificial VCPU that you place on the same physical
> CPU as the emulator thread, but that's it.  The whole system must run
> smoothly even if you place those on the same physical CPU, without any
> PV hacks.
> 
> > So when you say "The emulator thread can interrupt the VCPU thread", 
> > you're saying that it has to be modified to interrupt for a maximum
> > amount of time (say 15us).
> > 
> > Is that what you are suggesting?
> 
> This is correct.  But I think you are missing a fundamental point, that
> is not specific to virt.  If a thread 1 can trigger an event in thread
> 2, and thread 2 runs at priority N, thread 1 must be placed at priority >N.
> 
> In this case, the emulator thread can signal I/O completion to the VCPU:
> it doesn't matter if the VCPU is polling or using interrupts, the
> emulator thread must be placed at higher priority than the VCPU.  This
> is the root cause of the SeaBIOS issue that you were seeing.  Yes, it
> was me who suggested moving VCPU0 and the emulator thread to
> SCHED_NORMAL, but that was just a bandaid until the real fix was done
> (which is to set up SCHED_FIFO priorities correctly and use PI mutexes).
> 
> In fact, this is not specific to the emulator thread.  It applies just
> as well to vhost threads, to QEMU iothreads, even to the KVM PIT
> emulation thread if the guest were using it.
> 
> > Paolo, you don't control how many interruptions of the emulator thread
> > happen per second.
> 
> Indeed these non-VCPU threads should indeed run rarely and for a small
> amount of time only to achieve bounded latencies in the VCPUs.  But if
> they don't, those are simply bugs and we fix them.  In fact, sources of
> frequent interruptions have all been fixed or moved outside the main
> thread; for example, disks can use separate iothreads.  Configuration
> problems are also bugs, just in a different place.


> 
> For a very basic VM that I've just tried (whatever QEMU does by default,
> only tweak was not using the QEMU GUI), there is exactly one
> interruption per second when the VM is idle.  That interruption in turn
> is caused by udev periodically probing the guest CDROM (!).  So that's a
> configuration problem: remove the CDROM, and it does one interruption
> every 10-30 seconds, again all of them guest triggered.
> 
> Again: if you have many interruptions, it's not a flaw in KVM or QEMU's
> design, it's just that someone is doing something stupid.  It could be
> the guest (e.g. unnecessary devices or daemons as in the example above),
> QEMU (e.g. the RTC emulation used to trigger QEMU timers twice a second
> just to increment the clock), or the management (e.g. polling "is the VM
> running" 50 times per second).  But it can and must be fixed.

No, i mean you can run anything in VCPU-0 (it is valid to do that).
And that "anything" can generate 1 interrupt per second, 1000 or 10.000
interrupts per second. Which are all valid things to be done.

"I can't run a kernel compilation on VCPU-0 because that will impact
latency on the realtime VCPU-1" is not acceptable.

> The design should be the basic one: attribute the right priority to each
> thread and things just work.
> 
> Paolo
> 
> > So if you let the emulator thread interrupt the
> > emulator thread at all times, without some kind of bounding 
> > of these interruptions per time unit, you have a similar
> > problem as (*) (where the realtime task is scheduled).
> > 

  reply	other threads:[~2017-09-28  0:51 UTC|newest]

Thread overview: 52+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-09-21 11:38 [patch 0/3] KVM KVM_HC_RT_PRIO hypercall support Marcelo Tosatti
2017-09-21 11:38 ` [patch 1/3] KVM: x86: add per-vcpu option to set guest vcpu -RT priority Marcelo Tosatti
2017-09-21 11:38 ` [patch 2/3] KVM: x86: KVM_HC_RT_PRIO hypercall (host-side) Marcelo Tosatti
2017-09-21 13:32   ` Konrad Rzeszutek Wilk
2017-09-21 13:49     ` Paolo Bonzini
2017-09-22  1:08       ` Marcelo Tosatti
2017-09-22  7:23         ` Paolo Bonzini
2017-09-22 12:24           ` Marcelo Tosatti
2017-09-21 11:38 ` [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall Marcelo Tosatti
2017-09-21 13:36   ` Konrad Rzeszutek Wilk
2017-09-21 14:06     ` Peter Zijlstra
2017-09-22  1:10       ` Marcelo Tosatti
2017-09-22 10:00         ` Peter Zijlstra
2017-09-22 10:56           ` Peter Zijlstra
2017-09-22 12:33             ` Marcelo Tosatti
2017-09-22 12:55               ` Peter Zijlstra
2017-09-23 10:56                 ` Paolo Bonzini
2017-09-23 13:41                   ` Peter Zijlstra
2017-09-24 13:05                     ` Paolo Bonzini
2017-09-25  2:57                       ` Marcelo Tosatti
2017-09-25  9:13                         ` Peter Zijlstra
2017-09-25 15:12                           ` Paolo Bonzini
2017-09-26 22:49                             ` [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\ Marcelo Tosatti
2017-09-27  9:37                               ` Paolo Bonzini
2017-09-28  0:44                                 ` Marcelo Tosatti [this message]
2017-09-28  7:22                                   ` Paolo Bonzini
2017-09-28 21:35                                     ` Marcelo Tosatti
2017-09-28 21:41                                       ` Marcelo Tosatti
2017-09-29  8:18                                       ` Paolo Bonzini
2017-09-29 16:40                                         ` Marcelo Tosatti
2017-09-29 17:05                                           ` Paolo Bonzini
2017-09-29 20:17                                             ` Marcelo Tosatti
2017-10-02 12:30                                               ` Paolo Bonzini
2017-10-02 12:48                                                 ` Peter Zijlstra
2017-09-26 23:22                           ` [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall Marcelo Tosatti
2017-09-25 16:20                         ` Konrad Rzeszutek Wilk
2017-09-22 12:16           ` Marcelo Tosatti
2017-09-22 12:31             ` Peter Zijlstra
2017-09-22 12:36               ` Marcelo Tosatti
2017-09-22 12:59                 ` Peter Zijlstra
2017-09-25  1:52                   ` Marcelo Tosatti
2017-09-25  8:35                     ` Peter Zijlstra
2017-09-22 12:40               ` [patch 3/3] x86: kvm guest side support for KVM_HC_RT_PRIO hypercall\ Marcelo Tosatti
2017-09-22 13:01                 ` Peter Zijlstra
2017-09-25  2:22                   ` Marcelo Tosatti
2017-09-25  8:58                     ` Peter Zijlstra
2017-09-25 10:41                     ` Thomas Gleixner
2017-09-25 18:28                       ` Jan Kiszka
2017-09-21 17:45 ` [patch 0/3] KVM KVM_HC_RT_PRIO hypercall support Jan Kiszka
2017-09-22  1:19   ` Marcelo Tosatti
2017-09-22  6:23     ` Jan Kiszka
2017-09-26 23:59       ` Marcelo Tosatti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170928004452.GA30040@amt.cnet \
    --to=mtosatti@redhat.com \
    --cc=konrad.wilk@oracle.com \
    --cc=kvm@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.