All of lore.kernel.org
 help / color / mirror / Atom feed
From: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, maz@kernel.org,
	darren@os.amperecomputing.com,
	d.scott.phillips@amperecomputing.com,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.
Date: Wed, 6 Mar 2024 11:01:09 +0530	[thread overview]
Message-ID: <8e2ee8dd-4412-4133-8b08-75d64ab79649@os.amperecomputing.com> (raw)
In-Reply-To: <Zebb9CyihqC4JqnK@linux.dev>



On 05-03-2024 02:16 pm, Oliver Upton wrote:
> -cc old kvmarm list
> +cc new kvmarm list, reviewers
> 
> Please run scripts/get_maintainer.pl next time around so we get the
> right people looking at a patch.
> 

Of course I know this script -:)
I didn't cc since I felt to avoid unnecessary overloading someone's 
inbox. I don't think anyone(even ARM) is interested in this feature 
other than Marc and me/Ampere. Otherwise this would have merged upstream 
by now.
BTW, NV feature development started way back in 2016/17.

> On Mon, Mar 04, 2024 at 09:46:06PM -0800, Ganapatrao Kulkarni wrote:
>> @@ -216,6 +223,13 @@ struct kvm_s2_mmu {
>>   	 * >0: Somebody is actively using this.
>>   	 */
>>   	atomic_t refcnt;
>> +
>> +	/*
>> +	 * For a Canonical IPA to Shadow IPA mapping.
>> +	 */
>> +	struct rb_root nested_mapipa_root;
> 
> There isn't any benefit to tracking the canonical IPA -> shadow IPA(s)
> mapping on a per-S2 basis, as there already exists a one-to-many problem
> (more below). Maintaining a per-VM data structure (since this is keyed
> by canonical IPA) makes a bit more sense.
> 
>> +	rwlock_t mmu_lock;
>> +
> 
> Err, is there any reason the existing mmu_lock is insufficient here?
> Surely taking a new reference on a canonical IPA for a shadow S2 must be
> done behind the MMU lock for it to be safe against MMU notifiers...
> 
> Also, Reusing the exact same name for it is sure to produce some lock
> imbalance funnies.
> 
>>   };
>>   
>>   static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> index da7ebd2f6e24..c31a59a1fdc6 100644
>> --- a/arch/arm64/include/asm/kvm_nested.h
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -65,6 +65,9 @@ extern void kvm_init_nested(struct kvm *kvm);
>>   extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
>>   extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
>>   extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
>> +extern void add_shadow_ipa_map_node(
>> +		struct kvm_s2_mmu *mmu,
>> +		phys_addr_t ipa, phys_addr_t shadow_ipa, long size);
> 
> style nitpick: no newline between the open bracket and first parameter.
> Wrap as needed at 80 (or a bit more) columns.
> 
>> +/*
>> + * Create a node and add to lookup table, when a page is mapped to
>> + * Canonical IPA and also mapped to Shadow IPA.
>> + */
>> +void add_shadow_ipa_map_node(struct kvm_s2_mmu *mmu,
>> +			phys_addr_t ipa,
>> +			phys_addr_t shadow_ipa, long size)
>> +{
>> +	struct rb_root *ipa_root = &(mmu->nested_mapipa_root);
>> +	struct rb_node **node = &(ipa_root->rb_node), *parent = NULL;
>> +	struct mapipa_node *new;
>> +
>> +	new = kzalloc(sizeof(struct mapipa_node), GFP_KERNEL);
>> +	if (!new)
>> +		return;
> 
> Should be GFP_KERNEL_ACCOUNT, you want to charge this to the user.
> 
>> +
>> +	new->shadow_ipa = shadow_ipa;
>> +	new->ipa = ipa;
>> +	new->size = size;
> 
> What about aliasing? You could have multiple shadow IPAs that point to
> the same canonical IPA, even within a single MMU.
> 
>> +	write_lock(&mmu->mmu_lock);
>> +
>> +	while (*node) {
>> +		struct mapipa_node *tmp;
>> +
>> +		tmp = container_of(*node, struct mapipa_node, node);
>> +		parent = *node;
>> +		if (new->ipa < tmp->ipa) {
>> +			node = &(*node)->rb_left;
>> +		} else if (new->ipa > tmp->ipa) {
>> +			node = &(*node)->rb_right;
>> +		} else {
>> +			write_unlock(&mmu->mmu_lock);
>> +			kfree(new);
>> +			return;
>> +		}
>> +	}
>> +
>> +	rb_link_node(&new->node, parent, node);
>> +	rb_insert_color(&new->node, ipa_root);
>> +	write_unlock(&mmu->mmu_lock);
> 
> Meh, one of the annoying things with rbtree is you have to build your
> own search functions...
> 
> It would appear that the rbtree intends to express intervals (i.e. GPA +
> size), but the search implementation treats GPA as an index. So I don't
> think this works as intended.
> 
> Have you considered other abstract data types (e.g. xarray, maple tree)
> and how they might apply here?
> 

Thanks for suggesting the maple tree based lookup, I will try it in next 
version.

>> +bool get_shadow_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa, phys_addr_t *shadow_ipa, long *size)
>> +{
>> +	struct rb_node *node;
>> +	struct mapipa_node *tmp = NULL;
>> +
>> +	read_lock(&mmu->mmu_lock);
>> +	node = mmu->nested_mapipa_root.rb_node;
>> +
>> +	while (node) {
>> +		tmp = container_of(node, struct mapipa_node, node);
>> +
>> +		if (tmp->ipa == ipa)
>> +			break;
>> +		else if (ipa > tmp->ipa)
>> +			node = node->rb_right;
>> +		else
>> +			node = node->rb_left;
>> +	}
>> +
>> +	read_unlock(&mmu->mmu_lock);
>> +
>> +	if (tmp && tmp->ipa == ipa) {
>> +		*shadow_ipa = tmp->shadow_ipa;
>> +		*size = tmp->size;
>> +		write_lock(&mmu->mmu_lock);
>> +		rb_erase(&tmp->node, &mmu->nested_mapipa_root);
>> +		write_unlock(&mmu->mmu_lock);
>> +		kfree(tmp);
>> +		return true;
>> +	}
> 
> Implicitly evicting the entry isn't going to work if we want to use it
> for updates to a stage-2 that do not evict the mapping, like write
> protection or access flag updates.
> 

Thanks,
Ganapat

WARNING: multiple messages have this Message-ID (diff)
From: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
To: Oliver Upton <oliver.upton@linux.dev>
Cc: kvmarm@lists.linux.dev, kvm@vger.kernel.org,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org, maz@kernel.org,
	darren@os.amperecomputing.com,
	d.scott.phillips@amperecomputing.com,
	James Morse <james.morse@arm.com>,
	Suzuki K Poulose <suzuki.poulose@arm.com>,
	Zenghui Yu <yuzenghui@huawei.com>
Subject: Re: [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables.
Date: Wed, 6 Mar 2024 11:01:09 +0530	[thread overview]
Message-ID: <8e2ee8dd-4412-4133-8b08-75d64ab79649@os.amperecomputing.com> (raw)
In-Reply-To: <Zebb9CyihqC4JqnK@linux.dev>



On 05-03-2024 02:16 pm, Oliver Upton wrote:
> -cc old kvmarm list
> +cc new kvmarm list, reviewers
> 
> Please run scripts/get_maintainer.pl next time around so we get the
> right people looking at a patch.
> 

Of course I know this script -:)
I didn't cc since I felt to avoid unnecessary overloading someone's 
inbox. I don't think anyone(even ARM) is interested in this feature 
other than Marc and me/Ampere. Otherwise this would have merged upstream 
by now.
BTW, NV feature development started way back in 2016/17.

> On Mon, Mar 04, 2024 at 09:46:06PM -0800, Ganapatrao Kulkarni wrote:
>> @@ -216,6 +223,13 @@ struct kvm_s2_mmu {
>>   	 * >0: Somebody is actively using this.
>>   	 */
>>   	atomic_t refcnt;
>> +
>> +	/*
>> +	 * For a Canonical IPA to Shadow IPA mapping.
>> +	 */
>> +	struct rb_root nested_mapipa_root;
> 
> There isn't any benefit to tracking the canonical IPA -> shadow IPA(s)
> mapping on a per-S2 basis, as there already exists a one-to-many problem
> (more below). Maintaining a per-VM data structure (since this is keyed
> by canonical IPA) makes a bit more sense.
> 
>> +	rwlock_t mmu_lock;
>> +
> 
> Err, is there any reason the existing mmu_lock is insufficient here?
> Surely taking a new reference on a canonical IPA for a shadow S2 must be
> done behind the MMU lock for it to be safe against MMU notifiers...
> 
> Also, Reusing the exact same name for it is sure to produce some lock
> imbalance funnies.
> 
>>   };
>>   
>>   static inline bool kvm_s2_mmu_valid(struct kvm_s2_mmu *mmu)
>> diff --git a/arch/arm64/include/asm/kvm_nested.h b/arch/arm64/include/asm/kvm_nested.h
>> index da7ebd2f6e24..c31a59a1fdc6 100644
>> --- a/arch/arm64/include/asm/kvm_nested.h
>> +++ b/arch/arm64/include/asm/kvm_nested.h
>> @@ -65,6 +65,9 @@ extern void kvm_init_nested(struct kvm *kvm);
>>   extern int kvm_vcpu_init_nested(struct kvm_vcpu *vcpu);
>>   extern void kvm_init_nested_s2_mmu(struct kvm_s2_mmu *mmu);
>>   extern struct kvm_s2_mmu *lookup_s2_mmu(struct kvm_vcpu *vcpu);
>> +extern void add_shadow_ipa_map_node(
>> +		struct kvm_s2_mmu *mmu,
>> +		phys_addr_t ipa, phys_addr_t shadow_ipa, long size);
> 
> style nitpick: no newline between the open bracket and first parameter.
> Wrap as needed at 80 (or a bit more) columns.
> 
>> +/*
>> + * Create a node and add to lookup table, when a page is mapped to
>> + * Canonical IPA and also mapped to Shadow IPA.
>> + */
>> +void add_shadow_ipa_map_node(struct kvm_s2_mmu *mmu,
>> +			phys_addr_t ipa,
>> +			phys_addr_t shadow_ipa, long size)
>> +{
>> +	struct rb_root *ipa_root = &(mmu->nested_mapipa_root);
>> +	struct rb_node **node = &(ipa_root->rb_node), *parent = NULL;
>> +	struct mapipa_node *new;
>> +
>> +	new = kzalloc(sizeof(struct mapipa_node), GFP_KERNEL);
>> +	if (!new)
>> +		return;
> 
> Should be GFP_KERNEL_ACCOUNT, you want to charge this to the user.
> 
>> +
>> +	new->shadow_ipa = shadow_ipa;
>> +	new->ipa = ipa;
>> +	new->size = size;
> 
> What about aliasing? You could have multiple shadow IPAs that point to
> the same canonical IPA, even within a single MMU.
> 
>> +	write_lock(&mmu->mmu_lock);
>> +
>> +	while (*node) {
>> +		struct mapipa_node *tmp;
>> +
>> +		tmp = container_of(*node, struct mapipa_node, node);
>> +		parent = *node;
>> +		if (new->ipa < tmp->ipa) {
>> +			node = &(*node)->rb_left;
>> +		} else if (new->ipa > tmp->ipa) {
>> +			node = &(*node)->rb_right;
>> +		} else {
>> +			write_unlock(&mmu->mmu_lock);
>> +			kfree(new);
>> +			return;
>> +		}
>> +	}
>> +
>> +	rb_link_node(&new->node, parent, node);
>> +	rb_insert_color(&new->node, ipa_root);
>> +	write_unlock(&mmu->mmu_lock);
> 
> Meh, one of the annoying things with rbtree is you have to build your
> own search functions...
> 
> It would appear that the rbtree intends to express intervals (i.e. GPA +
> size), but the search implementation treats GPA as an index. So I don't
> think this works as intended.
> 
> Have you considered other abstract data types (e.g. xarray, maple tree)
> and how they might apply here?
> 

Thanks for suggesting the maple tree based lookup, I will try it in next 
version.

>> +bool get_shadow_ipa(struct kvm_s2_mmu *mmu, phys_addr_t ipa, phys_addr_t *shadow_ipa, long *size)
>> +{
>> +	struct rb_node *node;
>> +	struct mapipa_node *tmp = NULL;
>> +
>> +	read_lock(&mmu->mmu_lock);
>> +	node = mmu->nested_mapipa_root.rb_node;
>> +
>> +	while (node) {
>> +		tmp = container_of(node, struct mapipa_node, node);
>> +
>> +		if (tmp->ipa == ipa)
>> +			break;
>> +		else if (ipa > tmp->ipa)
>> +			node = node->rb_right;
>> +		else
>> +			node = node->rb_left;
>> +	}
>> +
>> +	read_unlock(&mmu->mmu_lock);
>> +
>> +	if (tmp && tmp->ipa == ipa) {
>> +		*shadow_ipa = tmp->shadow_ipa;
>> +		*size = tmp->size;
>> +		write_lock(&mmu->mmu_lock);
>> +		rb_erase(&tmp->node, &mmu->nested_mapipa_root);
>> +		write_unlock(&mmu->mmu_lock);
>> +		kfree(tmp);
>> +		return true;
>> +	}
> 
> Implicitly evicting the entry isn't going to work if we want to use it
> for updates to a stage-2 that do not evict the mapping, like write
> protection or access flag updates.
> 

Thanks,
Ganapat

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2024-03-06  5:31 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-03-05  5:46 [RFC PATCH] kvm: nv: Optimize the unmapping of shadow S2-MMU tables Ganapatrao Kulkarni
2024-03-05  5:46 ` Ganapatrao Kulkarni
2024-03-05  8:46 ` Oliver Upton
2024-03-05  8:46   ` Oliver Upton
2024-03-06  5:31   ` Ganapatrao Kulkarni [this message]
2024-03-06  5:31     ` Ganapatrao Kulkarni
2024-03-06  8:39     ` Oliver Upton
2024-03-06  8:39       ` Oliver Upton
2024-03-06 13:33     ` Marc Zyngier
2024-03-06 13:33       ` Marc Zyngier
2024-03-06 14:57       ` Ganapatrao Kulkarni
2024-03-06 14:57         ` Ganapatrao Kulkarni
2024-03-05 11:08 ` Marc Zyngier
2024-03-05 11:08   ` Marc Zyngier
2024-03-05 11:13 ` Marc Zyngier
2024-03-05 11:13   ` Marc Zyngier
2024-03-05 13:29   ` Ganapatrao Kulkarni
2024-03-05 13:29     ` Ganapatrao Kulkarni
2024-03-05 15:03     ` Marc Zyngier
2024-03-05 15:03       ` Marc Zyngier
2024-03-05 18:33       ` Ganapatrao Kulkarni
2024-03-05 18:33         ` Ganapatrao Kulkarni
2024-03-06 10:23         ` Marc Zyngier
2024-03-06 10:23           ` Marc Zyngier
2024-03-26 11:33       ` Ganapatrao Kulkarni
2024-03-26 11:33         ` Ganapatrao Kulkarni
2024-03-27 12:12         ` Marc Zyngier
2024-03-27 12:12           ` Marc Zyngier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8e2ee8dd-4412-4133-8b08-75d64ab79649@os.amperecomputing.com \
    --to=gankulkarni@os.amperecomputing.com \
    --cc=d.scott.phillips@amperecomputing.com \
    --cc=darren@os.amperecomputing.com \
    --cc=james.morse@arm.com \
    --cc=kvm@vger.kernel.org \
    --cc=kvmarm@lists.linux.dev \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=maz@kernel.org \
    --cc=oliver.upton@linux.dev \
    --cc=suzuki.poulose@arm.com \
    --cc=yuzenghui@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.