From mboxrd@z Thu Jan 1 00:00:00 1970
From: Scott Wood
Date: Thu, 19 May 2011 18:26:22 +0000
Subject: Re: [PATCH 11/13] KVM: PPC: e500: Add shadow PID support
Message-Id: <20110519132622.09e04a3e@schlenkerla.am.freescale.net>
List-Id:
References: <20110517234242.GI3580@schlenkerla.am.freescale.net>
In-Reply-To: <20110517234242.GI3580@schlenkerla.am.freescale.net>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
To: kvm-ppc@vger.kernel.org

On Thu, 19 May 2011 15:18:30 +0200
Alexander Graf wrote:

> On 05/18/2011 01:42 AM, Scott Wood wrote:
> > +static inline int local_sid_setup_one(struct id *entry)
> > +{
> > +	unsigned long sid;
> > +	int ret = -1;
> > +
> > +	sid = ++(__get_cpu_var(pcpu_last_used_sid));
> > +	if (sid < 256) {
> > +		__get_cpu_var(pcpu_sids).entry[sid] = entry;
> > +		entry->val = sid;
> > +		entry->pentry = &__get_cpu_var(pcpu_sids).entry[sid];
> > +		ret = sid;
> > +	}
> > +
> > +	WARN_ON(sid > 256);
>
> Why > 256?

If sid == 256, we've run out of sids: this function will return -1, all
the sids will be dumped, and pcpu_last_used_sid will be reset to zero.
sid > 256 indicates a race.

> Since the MMU code expects PID1 and PID2 to be zero (which breaks nested
> KVM!), please make sure to properly abort in the SPR emulation code if
> anyone tries to write to either of them.

Yeah, I figured someone would bring up nested KVM. :-)

It should be possible to handle guest PID1/PID2 by dynamically assigning
shadow PIDs to hardware PID registers, rather than the fixed PID0/PID1
usage.

> > +static void kvmppc_e500_stlbe_invalidate(struct kvmppc_vcpu_e500 *vcpu_e500,
> > +					 int tlbsel, int esel)
> > +{
> > +	struct tlbe *gtlbe = &vcpu_e500->gtlb_arch[tlbsel][esel];
> > +	struct vcpu_id_table *idt = vcpu_e500->idt;
> > +	unsigned int pr, tid, ts, pid;
> > +	u32 val, eaddr;
> > +	unsigned long flags;
> > +
> > +	ts = get_tlb_ts(gtlbe);
> > +	tid = get_tlb_tid(gtlbe);
> > +
> > +	/* One guest ID may be mapped to two shadow IDs */
> > +	for (pr = 0; pr < 2; pr++) {
> > +		/*
> > +		 * If we don't find a valid ID mapping, maybe means
> > +		 * it doesn't have a valid ID mapping on local core,
> > +		 * and has valid ID mapping on other cores. We should
> > +		 * clear this ID mapping to disable related shadow TLBe.
> > +		 *
> > +		 * And we always clear corresponding ID mapping
> > +		 * if guest is invalidating a TLB1 entry.
>
> Where?
>
> > +		 */
> > +		if (tlbsel == 1 ||
> > +		    (pid = local_sid_lookup(&idt->id[ts][tid][pr])) <= 0) {
> > +			kvmppc_e500_id_table_reset_one(vcpu_e500, ts, tid, pr);
> > +			continue;
> > +		}

Here.  We only try to shoot down a single entry if it is TLB0 (and thus
we know it's backed by just one host TLB entry), and the shadow PID is
active on this host cpu (which means it is not active on others).
Otherwise we kill the whole shadow PID.  I'll reword the comment to be
clearer.

> > +		/* Guest is invalidating a TLB0 entry
> > +		 * which has a valid ID mapping.
> > +		 * We search host TLB0 to invalidate it's shadow TLBe */
> > +		val = (pid << 16) | MAS6_SAS;
>
> Is there no define for the shift? If not, please create one.

There is, not sure why it wasn't used.

> > +		eaddr = get_tlb_eaddr(gtlbe);
> > +
> > +		local_irq_save(flags);
> > +
> > +		mtspr(SPRN_MAS6, val);
> > +		asm volatile("tlbsx 0, %[eaddr]" : : [eaddr] "r" (eaddr));
> > +		val = mfspr(SPRN_MAS1);
> > +		if (val & MAS1_VALID) {
> > +			mtspr(SPRN_MAS1, val & ~MAS1_VALID);
> > +			asm volatile("tlbwe");
> > +		}
>
> Is all this faster than tlbivax?
> Sure, you might flush other guest's and PID's addresses as well, but save
> here.  Might be worth measuring.

tlbivax broadcasts, and requires tlbsync, which also broadcasts and needs
a global lock due to an erratum.  That tlbsx/tlbwe sequence is what Linux
does for its own invalidations in the absence of tlbilx; see __tlbil_va()
in arch/powerpc/mm/tlb_nohash_low.S.

> > @@ -564,8 +780,8 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
> >
> >  	gtlbe = &vcpu_e500->gtlb_arch[tlbsel][esel];
> >
> > -	if (get_tlb_v(gtlbe) && tlbsel == 1)
> > -		kvmppc_e500_tlb1_invalidate(vcpu_e500, esel);
> > +	if (get_tlb_v(gtlbe))
> > +		kvmppc_e500_stlbe_invalidate(vcpu_e500, tlbsel, esel);
> >
> >  	gtlbe->mas1 = vcpu_e500->mas1;
> >  	gtlbe->mas2 = vcpu_e500->mas2;
> > @@ -582,6 +798,7 @@ int kvmppc_e500_emul_tlbwe(struct kvm_vcpu *vcpu)
> >  	u64 eaddr;
> >  	u64 raddr;
> >
> > +	preempt_disable();
>
> Are those preempt disables related or just happen to land in this patch?
> I'd assume the actual tlb modification functions to care about preemption?

It's related.  We want to keep preemption off from the time we look up a
shadow ID to the time we write the TLB entry into hardware.  After that,
if the shadow ID is invalidated, the TLB entry will be as well.
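In other words, the pattern we need is roughly the following (an untested
sketch of the idea, not code from the patch -- map_one_shadow_entry(),
lookup_shadow_sid() and write_host_tlbe() are made-up placeholder names):

static void map_one_shadow_entry(struct kvmppc_vcpu_e500 *vcpu_e500,
				 struct tlbe *stlbe, int as, int gid, int pr)
{
	unsigned int sid;

	preempt_disable();

	/* Look up (or allocate) this cpu's shadow sid for the guest id. */
	sid = lookup_shadow_sid(vcpu_e500, as, gid, pr);

	/* Tag the shadow entry with the per-cpu sid (TID field of MAS1). */
	stlbe->mas1 = (stlbe->mas1 & ~(0x3fffU << 16)) | (sid << 16);

	/* Write the entry into the hardware TLB on this same cpu. */
	write_host_tlbe(vcpu_e500, stlbe);

	/*
	 * Once the entry is in hardware, migration no longer matters:
	 * invalidating this cpu's sid mapping implicitly kills any TLB
	 * entries tagged with that sid.
	 */
	preempt_enable();
}

If we were preempted between the lookup and the TLB write, the sid we
looked up could be stale on whatever cpu we end up on, since the sid
tables are per-cpu.

-Scott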