From mboxrd@z Thu Jan 1 00:00:00 1970
From: Paul Mackerras
Subject: Re: [PATCH 13/13] kvm/powerpc: Allow book3s_hv guests to use SMT processor modes
Date: Tue, 17 May 2011 20:44:22 +1000
Message-ID: <20110517104422.GB7924@brick.ozlabs.ibm.com>
References: <20110511103443.GA2837@brick.ozlabs.ibm.com> <20110511104656.GN2837@brick.ozlabs.ibm.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: linuxppc-dev@ozlabs.org, kvm@vger.kernel.org
To: Alexander Graf
Return-path:
Received: from ozlabs.org ([203.10.76.45]:59674 "EHLO ozlabs.org" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1753760Ab1EQLQA (ORCPT ); Tue, 17 May 2011 07:16:00 -0400
Content-Disposition: inline
In-Reply-To:
Sender: kvm-owner@vger.kernel.org
List-ID:

On Tue, May 17, 2011 at 10:21:56AM +0200, Alexander Graf wrote:
>
> On 11.05.2011, at 12:46, Paul Mackerras wrote:
>
> > -#define KVM_MAX_VCPUS	1
> > +#define KVM_MAX_VCPUS	NR_CPUS
> > +#define KVM_THREADS_PER_CORE	4
>
> So what if POWER8 (or whatever it will be called) comes along with 8
> threads per core? Would that change the userspace interface?

The idea is that userspace queries the KVM_CAP_PPC_SMT capability and
the value it gets back is the number of vcpus per vcore.  It then
allocates vcpu numbers based on that.  If a CPU came along with more
than 4 threads per core then we'd have to change that define in the
kernel, but that won't affect the userspace API.

> > +	/* wait for secondary threads to get back to nap mode */
> > +	spin_lock(&vc->lock);
> > +	if (vc->nap_count < vc->n_woken)
> > +		kvmppc_wait_for_nap(vc);
>
> So you're taking the vcore wide lock and wait for other CPUs to set
> themselves to nap? Not sure I fully understand this. Why would
> another thread want to go to nap mode when it's 100% busy?

It's more about waiting for the other hardware threads to have
finished writing their vcpu state to memory.
Currently those threads then go to nap mode, but they could in fact
poll instead for a bit, so that name is possibly a bit misleading, I
agree.

> > +	cmpwi	r12,0x980
> > +	beq	40f
> > +	cmpwi	r3,0x100
>
> good old use define comment :)

Yep, OK. :)

> Maybe I also missed the point here, but how does this correlate with
> Linux threads? Is each vcpu running in its own Linux thread? How
> does the scheduling happen? IIUC the host only sees a single thread
> per core and then distributes the vcpus to the respective host
> threads.

Each vcpu has its own Linux thread, but while the vcore is running,
all but one of them are sleeping.  The thing is that since the host is
running with each core single-threaded, one Linux thread is enough to
run 4 vcpus.  So when we decide we can run the vcore, the vcpu thread
that discovered that we can now run the vcore takes the responsibility
to run it.  That involves sending an IPI to the other hardware threads
to wake them up and get them to each run a vcpu.  Then the vcpu thread
that is running the vcore dives into the guest switch code itself.  It
synchronizes with the other threads and does the partition switch, and
then they all enter the guest.

We thought about various schemes to cope with the hardware restriction
that all hardware threads in a core have to be in the same partition
(at least whenever the MMU is on).  This is the least messy scheme we
could come up with.  I'd be happy to discuss the alternatives if you
like.

Paul.
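
To make the KVM_CAP_PPC_SMT point above a bit more concrete, here is a
minimal sketch of how userspace might number vcpus once it knows the
number of vcpus per vcore.  The helper names and the contiguous
numbering (vcore * threads_per_core + thread) are assumptions made for
illustration, not code from the patch.  In a real program
threads_per_core would be the value obtained by querying
KVM_CAP_PPC_SMT (for instance with the KVM_CHECK_EXTENSION ioctl)
rather than something hard-coded.

```c
#include <assert.h>

/*
 * Hypothetical helpers, not taken from the patch.  threads_per_core
 * stands for the value userspace gets back from KVM_CAP_PPC_SMT.
 */

/* vcpu id for thread `thread` of virtual core `vcore`, or -1 if invalid. */
static int vcpu_id_for(int vcore, int thread, int threads_per_core)
{
	if (threads_per_core <= 0 || vcore < 0)
		return -1;
	if (thread < 0 || thread >= threads_per_core)
		return -1;
	/* Assumed layout: the vcpus of one vcore get consecutive ids. */
	return vcore * threads_per_core + thread;
}

/* Inverse mapping under the same assumption: the vcore a vcpu id is in. */
static int vcore_of(int vcpu_id, int threads_per_core)
{
	assert(threads_per_core > 0 && vcpu_id >= 0);
	return vcpu_id / threads_per_core;
}
```

With 4 vcpus per vcore, thread 3 of vcore 2 would get id 11 under this
scheme; if the kernel ever reported 8, the same call would give 19
without any change to the userspace code, which is the point of going
through the capability.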