Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.

From: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, maz@kernel.org,
	darren@os.amperecomputing.com,
	d.scott.phillips@amperecomputing.com,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.
Date: Wed, 6 Mar 2024 11:01:09 +0530
Message-ID: <8e2ee8dd-4412-4133-8b08-75d64ab79649@os.amperecomputing.com>
In-Reply-To: <Zebb9CyihqC4JqnK@linux.dev>

On 05-03-2024 02:16 pm, Oliver Upton wrote:
> -cc old kvmarm list
> +cc new kvmarm list, reviewers
> 
> Please run scripts/get_maintainer.pl next time around so we get the
> right people looking at a patch.
> 

Of course I know this script :-)
I didn't cc everyone because I wanted to avoid unnecessarily overloading
people's inboxes. I don't think anyone (even ARM) other than Marc and
me/Ampere is interested in this feature; otherwise it would have been
merged upstream by now.
BTW, NV feature development started way back in 2016/17.

> On Mon, Mar 04, 2024 at 09:46:06PM -0800, Ganapatrao Kulkarni wrote:
>> @@ -216,6 +223,13 @@ struct kvm_s2_mmu {
>>   	 * >0: Somebody is actively using this.
>>   	 */
>>   	atomic_t refcnt;
>> +
>> +	/*
>> +	 * For a Canonical IPA to Shadow IPA mapping.
>> +	 */
>> +	struct rb_root nested_mapipa_root;
> 
> There isn't any benefit to tracking the canonical IPA -> shadow IPA(s)
> mapping on a per-S2 basis, as there already exists a one-to-many problem
> (more below). Maintaining a per-VM data structure (since this is keyed
> by canonical IPA) makes a bit more sense.
> 
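
For illustration, here is a minimal userspace sketch of the per-VM layout
suggested above. It is keyed by canonical IPA and hangs a list of every
(shadow S2-MMU, shadow IPA) pair off each key, so aliases within a single
MMU are kept as well. All names (struct vm_rmap, rmap_add(), rmap_count(),
the integer mmu_id) are hypothetical. A plain linked list stands in for
whatever kernel structure would actually be used. This is not kernel code.

```c
#include <stdint.h>
#include <stdlib.h>

typedef uint64_t phys_addr_t;	/* stand-in for the kernel type */

/* One shadow mapping of a canonical IPA: which shadow S2 and where. */
struct shadow_ref {
	int mmu_id;			/* hypothetical shadow S2-MMU index */
	phys_addr_t shadow_ipa;
	struct shadow_ref *next;
};

/* Per-VM entry keyed by canonical IPA; holds every alias of that IPA. */
struct rmap_entry {
	phys_addr_t ipa;
	struct shadow_ref *refs;
	struct rmap_entry *next;
};

/* Per-VM reverse map (a linked list only to keep the sketch short). */
struct vm_rmap {
	struct rmap_entry *head;
};

static struct rmap_entry *rmap_find(struct vm_rmap *rm, phys_addr_t ipa)
{
	for (struct rmap_entry *e = rm->head; e; e = e->next)
		if (e->ipa == ipa)
			return e;
	return NULL;
}

/*
 * Record that canonical @ipa is mapped at @shadow_ipa in shadow S2 @mmu_id.
 * An existing key is extended rather than rejected, so aliases in the same
 * MMU or in different MMUs are all retained.
 */
static int rmap_add(struct vm_rmap *rm, phys_addr_t ipa, int mmu_id,
		    phys_addr_t shadow_ipa)
{
	struct rmap_entry *e = rmap_find(rm, ipa);
	struct shadow_ref *r;

	if (!e) {
		e = calloc(1, sizeof(*e));
		if (!e)
			return -1;
		e->ipa = ipa;
		e->next = rm->head;
		rm->head = e;
	}

	r = calloc(1, sizeof(*r));
	if (!r)
		return -1;
	r->mmu_id = mmu_id;
	r->shadow_ipa = shadow_ipa;
	r->next = e->refs;
	e->refs = r;
	return 0;
}

/* Count the shadow mappings of @ipa; an unmap of @ipa would visit each. */
static int rmap_count(struct vm_rmap *rm, phys_addr_t ipa)
{
	struct rmap_entry *e = rmap_find(rm, ipa);
	int n = 0;

	for (struct shadow_ref *r = e ? e->refs : NULL; r; r = r->next)
		n++;
	return n;
}
```

An unmap of one canonical IPA then walks a single per-VM entry, whichever
shadow S2s it appears in.
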
>> +	rwlock_t mmu_lock;
>> +
> 
> Err, is there any reason the existing mmu_lock is insufficient here?
> Surely taking a new reference on a canonical IPA for a shadow S2 must be
> done behind the MMU lock for it to be safe against MMU notifiers...
> 
> Also, reusing the exact same name for it is sure to produce some lock
> imbalance funnies.
> 
>>   };
>>   
>>   static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> index da7ebd2f6e24..c31a59a1fdc6 100644
>> --- a/arch/arm64/include/asm/kvm_nested.h
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -65,6 +65,9 @@ extern void kvm_init_nested(struct kvm *kvm);
>>   extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
>>   extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
>>   extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
>> +extern void add_shadow_ipa_map_node(
>> +		struct kvm_s2_mmu *mmu,
>> +		phys_addr_t ipa, phys_addr_t shadow_ipa, long size);
> 
> style nitpick: no newline between the open bracket and first parameter.
> Wrap as needed at 80 (or a bit more) columns.
> 
>> +/*
>> + * Create a node and add to lookup table, when a page is mapped to
>> + * Canonical IPA and also mapped to Shadow IPA.
>> + */
>> +void add_shadow_ipa_map_node(struct kvm_s2_mmu *mmu,
>> +			phys_addr_t ipa,
>> +			phys_addr_t shadow_ipa, long size)
>> +{
>> +	struct rb_root *ipa_root = &(mmu->nested_mapipa_root);
>> +	struct rb_node **node = &(ipa_root->rb_node), *parent = NULL;
>> +	struct mapipa_node *new;
>> +
>> +	new = kzalloc(sizeof(struct mapipa_node), GFP_KERNEL);
>> +	if (!new)
>> +		return;
> 
> Should be GFP_KERNEL_ACCOUNT, you want to charge this to the user.
> 
>> +
>> +	new->shadow_ipa = shadow_ipa;
>> +	new->ipa = ipa;
>> +	new->size = size;
> 
> What about aliasing? You could have multiple shadow IPAs that point to
> the same canonical IPA, even within a single MMU.
> 
>> +	write_lock(&mmu->mmu_lock);
>> +
>> +	while (*node) {
>> +		struct mapipa_node *tmp;
>> +
>> +		tmp = container_of(*node, struct mapipa_node, node);
>> +		parent = *node;
>> +		if (new->ipa < tmp->ipa) {
>> +			node = &(*node)->rb_left;
>> +		} else if (new->ipa > tmp->ipa) {
>> +			node = &(*node)->rb_right;
>> +		} else {
>> +			write_unlock(&mmu->mmu_lock);
>> +			kfree(new);
>> +			return;
>> +		}
>> +	}
>> +
>> +	rb_link_node(&new->node, parent, node);
>> +	rb_insert_color(&new->node, ipa_root);
>> +	write_unlock(&mmu->mmu_lock);
> 
> Meh, one of the annoying things with rbtree is you have to build your
> own search functions...
> 
> It would appear that the rbtree intends to express intervals (i.e. GPA +
> size), but the search implementation treats GPA as an index. So I don't
> think this works as intended.
> 
> Have you considered other abstract data types (e.g. xarray, maple tree)
> and how they might apply here?
> 

Thanks for suggesting a maple tree based lookup; I will try it in the next
version.
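
Below is a minimal sketch of an interval-aware search, assuming the stored
ranges never overlap. A node matches when the address lies in
[ipa, ipa + size), not only when it equals the start address. A plain
unbalanced binary search tree stands in for the kernel rbtree. The names
(struct range_node, range_insert(), range_lookup()) are hypothetical. A maple
tree stores ranges natively, which would make a hand-rolled search like this
unnecessary.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* stand-in for the kernel type */

/* A mapped range [ipa, ipa + size); ranges are assumed not to overlap. */
struct range_node {
	phys_addr_t ipa;
	phys_addr_t shadow_ipa;
	uint64_t size;
	struct range_node *left, *right;
};

/* Insert keyed by start address; returns -1 if @n overlaps a stored range. */
static int range_insert(struct range_node **root, struct range_node *n)
{
	while (*root) {
		struct range_node *cur = *root;

		if (n->ipa + n->size <= cur->ipa)
			root = &cur->left;	/* entirely below cur */
		else if (n->ipa >= cur->ipa + cur->size)
			root = &cur->right;	/* entirely above cur */
		else
			return -1;		/* overlaps cur */
	}
	n->left = n->right = NULL;
	*root = n;
	return 0;
}

/* Find the range containing @addr, not just one starting exactly at it. */
static struct range_node *range_lookup(struct range_node *root,
				       phys_addr_t addr)
{
	while (root) {
		if (addr < root->ipa)
			root = root->left;
		else if (addr >= root->ipa + root->size)
			root = root->right;
		else
			return root;
	}
	return NULL;
}
```

An exact-match search like the one in the patch would miss an address in
the middle of a block mapping. The containment check above finds it.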

>> +bool get_shadow_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa, phys_addr_t *shadow_ipa, long *size)
>> +{
>> +	struct rb_node *node;
>> +	struct mapipa_node *tmp = NULL;
>> +
>> +	read_lock(&mmu->mmu_lock);
>> +	node = mmu->nested_mapipa_root.rb_node;
>> +
>> +	while (node) {
>> +		tmp = container_of(node, struct mapipa_node, node);
>> +
>> +		if (tmp->ipa == ipa)
>> +			break;
>> +		else if (ipa > tmp->ipa)
>> +			node = node->rb_right;
>> +		else
>> +			node = node->rb_left;
>> +	}
>> +
>> +	read_unlock(&mmu->mmu_lock);
>> +
>> +	if (tmp && tmp->ipa == ipa) {
>> +		*shadow_ipa = tmp->shadow_ipa;
>> +		*size = tmp->size;
>> +		write_lock(&mmu->mmu_lock);
>> +		rb_erase(&tmp->node, &mmu->nested_mapipa_root);
>> +		write_unlock(&mmu->mmu_lock);
>> +		kfree(tmp);
>> +		return true;
>> +	}
> 
> Implicitly evicting the entry isn't going to work if we want to use it
> for updates to a stage-2 that do not evict the mapping, like write
> protection or access flag updates.
> 
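
Below is a minimal sketch of splitting the lookup from the eviction.
Write-protection or access-flag paths can query the mapping without dropping
it, and the unmap path removes it explicitly. The fixed-size array,
MAX_ENTRIES, and the names shadow_lookup() and shadow_remove() are
hypothetical. They are chosen only to keep the example self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;	/* stand-in for the kernel type */

#define MAX_ENTRIES 16		/* arbitrary capacity for the sketch */

struct shadow_entry {
	bool valid;
	phys_addr_t ipa;
	phys_addr_t shadow_ipa;
	long size;
};

struct shadow_map {
	struct shadow_entry e[MAX_ENTRIES];
};

/*
 * Pure lookup: reports the mapping but leaves it in place, so permission
 * or access-flag updates can consult it any number of times.
 */
static bool shadow_lookup(const struct shadow_map *m, phys_addr_t ipa,
			  phys_addr_t *shadow_ipa, long *size)
{
	for (int i = 0; i < MAX_ENTRIES; i++) {
		if (m->e[i].valid && m->e[i].ipa == ipa) {
			*shadow_ipa = m->e[i].shadow_ipa;
			*size = m->e[i].size;
			return true;
		}
	}
	return false;
}

/* Explicit removal, called only when the shadow mapping is torn down. */
static bool shadow_remove(struct shadow_map *m, phys_addr_t ipa)
{
	for (int i = 0; i < MAX_ENTRIES; i++) {
		if (m->e[i].valid && m->e[i].ipa == ipa) {
			m->e[i].valid = false;
			return true;
		}
	}
	return false;
}
```
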

Thanks,
Ganapat