From: Robin Murphy <robin.murphy@arm.com>
To: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>,
Will Deacon <will@kernel.org>, Joerg Roedel <joro@8bytes.org>
Cc: Thierry Reding <treding@nvidia.com>,
linux-arm-msm@vger.kernel.org,
Douglas Anderson <dianders@chromium.org>,
linux-kernel@vger.kernel.org, iommu@lists.linux-foundation.org,
linux-arm-kernel@lists.infradead.org
Subject: Re: [PATCHv2 1/3] iommu/io-pgtable: Add a quirk to use tlb_flush_all() for partial walk flush
Date: Mon, 21 Jun 2021 16:45:50 +0100
Message-ID: <904f283c-f8b1-ba84-d010-eacc87bb53c5@arm.com>
In-Reply-To: <b099af10926b34249f4a30262db37f50491bebe7.1623981933.git.saiprakash.ranjan@codeaurora.org>
On 2021-06-18 03:51, Sai Prakash Ranjan wrote:
> Add a quirk IO_PGTABLE_QUIRK_TLB_INV_ALL to invalidate entire context
> with tlb_flush_all() callback in partial walk flush to improve unmap
> performance on select few platforms where the cost of over-invalidation
> is less than the unmap latency.
I still think this doesn't belong anywhere near io-pgtable at all. It's
a driver-internal decision how exactly it implements a non-leaf
invalidation, and that may be more complex than a predetermined boolean
decision. For example, I've just realised for SMMUv3 we can't invalidate
multiple levels of table at once with a range command, since if we
assume the whole thing is mapped at worst-case page granularity we may
fail to invalidate any parts which are mapped as intermediate-level
blocks. If invalidating a 1GB region (with 4KB granule) means having to
fall back to 256K non-range commands, we may not want to invalidate by
VA then, even though doing so for a 2MB region is still optimal.
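
To make the arithmetic above concrete, here is a minimal sketch of the command count. It assumes one non-range command per granule-sized page; the helper name `inv_cmds_needed` is hypothetical and does not exist in any driver.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: number of per-page
 * (non-range) invalidation commands needed to cover `size` bytes,
 * assuming one command per `granule`-sized page.
 */
static inline uint64_t inv_cmds_needed(uint64_t size, uint64_t granule)
{
	/* Round up so that a partial trailing page still gets a command. */
	return (size + granule - 1) / granule;
}
```

With a 4KB granule this gives 262144 (256K) commands for a 1GB region and 512 for a 2MB region, matching the figures quoted above.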
It's also quite feasible that drivers might want to do this for leaf
invalidations too - if you don't like issuing 512 commands to invalidate
2MB, do you like issuing 511 commands to invalidate 2044KB? - and at
that point the logic really has to be in the driver anyway.
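
As a rough sketch of what such driver-internal logic might look like (this is not code from any existing driver; the enum, the function name and the idea of a per-driver command budget are all assumptions made for illustration):

```c
#include <stdint.h>

/* Hypothetical invalidation strategies a driver could choose between. */
enum hyp_inv_method {
	HYP_INV_BY_VA,	/* issue one command per page in the range */
	HYP_INV_ALL,	/* invalidate the whole context/ASID instead */
};

/*
 * Hypothetical policy: fall back to a full invalidation once the number
 * of per-page commands would exceed a driver-chosen budget `max_cmds`.
 * The same check applies equally to leaf and non-leaf invalidations.
 */
static enum hyp_inv_method hyp_pick_inv(uint64_t size, uint64_t granule,
					uint64_t max_cmds)
{
	uint64_t cmds = size / granule + (size % granule != 0);

	return cmds > max_cmds ? HYP_INV_ALL : HYP_INV_BY_VA;
}
```

With an assumed budget of 64 commands and a 4KB granule, both a 2MB and a 2044KB range (512 and 511 commands) would choose the full invalidation, while a 64KB range (16 commands) would still invalidate by VA. The right budget, if any, would depend on the hardware, which is why the decision sits more naturally in the driver than in a page-table quirk.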
Robin.
> Signed-off-by: Sai Prakash Ranjan <saiprakash.ranjan@codeaurora.org>
> ---
> drivers/iommu/io-pgtable-arm.c | 3 ++-
> include/linux/io-pgtable.h | 5 +++++
> 2 files changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/iommu/io-pgtable-arm.c b/drivers/iommu/io-pgtable-arm.c
> index 87def58e79b5..5d362f2214bd 100644
> --- a/drivers/iommu/io-pgtable-arm.c
> +++ b/drivers/iommu/io-pgtable-arm.c
> @@ -768,7 +768,8 @@ arm_64_lpae_alloc_pgtable_s1(struct io_pgtable_cfg *cfg, void *cookie)
> if (cfg->quirks & ~(IO_PGTABLE_QUIRK_ARM_NS |
> IO_PGTABLE_QUIRK_NON_STRICT |
> IO_PGTABLE_QUIRK_ARM_TTBR1 |
> - IO_PGTABLE_QUIRK_ARM_OUTER_WBWA))
> + IO_PGTABLE_QUIRK_ARM_OUTER_WBWA |
> + IO_PGTABLE_QUIRK_TLB_INV_ALL))
> return NULL;
>
> data = arm_lpae_alloc_pgtable(cfg);
> diff --git a/include/linux/io-pgtable.h b/include/linux/io-pgtable.h
> index 4d40dfa75b55..45441592a0e6 100644
> --- a/include/linux/io-pgtable.h
> +++ b/include/linux/io-pgtable.h
> @@ -82,6 +82,10 @@ struct io_pgtable_cfg {
> *
> * IO_PGTABLE_QUIRK_ARM_OUTER_WBWA: Override the outer-cacheability
> * attributes set in the TCR for a non-coherent page-table walker.
> + *
> + * IO_PGTABLE_QUIRK_TLB_INV_ALL: Use TLBIALL/TLBIASID to invalidate
> + * entire context for partial walk flush to increase unmap
> + * performance on select few platforms.
> */
> #define IO_PGTABLE_QUIRK_ARM_NS BIT(0)
> #define IO_PGTABLE_QUIRK_NO_PERMS BIT(1)
> @@ -89,6 +93,7 @@ struct io_pgtable_cfg {
> #define IO_PGTABLE_QUIRK_NON_STRICT BIT(4)
> #define IO_PGTABLE_QUIRK_ARM_TTBR1 BIT(5)
> #define IO_PGTABLE_QUIRK_ARM_OUTER_WBWA BIT(6)
> + #define IO_PGTABLE_QUIRK_TLB_INV_ALL BIT(7)
> unsigned long quirks;
> unsigned long pgsize_bitmap;
> unsigned int ias;
>