[v3,3/4] iommu/arm-smmu: Add support to handle Qcom's wait-for-safe logic
diff mbox series

Message ID 20190612071554.13573-4-vivek.gautam@codeaurora.org
State New
Headers show
Series
  • Qcom smmu-500 wait-for-safe handling for sdm845
Related show

Commit Message

Vivek Gautam June 12, 2019, 7:15 a.m. UTC
Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
to address under-performance issues in real-time clients, such as
Display, and Camera.
On receiving an invalidation requests, the SMMU forwards SAFE request
to these clients and waits for SAFE ack signal from real-time clients.
The SAFE signal from such clients is used to qualify the start of
invalidation.
This logic is controlled by chicken bits, one for each - MDP (display),
IFE0, and IFE1 (camera), that can be accessed only from secure software
on sdm845.

This configuration, however, degrades the performance of non-real time
clients, such as USB, and UFS etc. This happens because, with wait-for-safe
logic enabled the hardware tries to throttle non-real time clients while
waiting for SAFE ack signals from real-time clients.

On MTP sdm845 devices, with wait-for-safe logic enabled at the boot time
by the bootloaders we see degraded performance of USB and UFS when kernel
enables the smmu stage-1 translations for these clients.
Turn off this wait-for-safe logic from the kernel gets us back the perf
of USB and UFS devices until we re-visit this when we start seeing perf
issues on display/camera on upstream supported SDM845 platforms.

Now, different bootloaders with their access control policies allow this
register access differently through secure monitor calls -
1) With one we can issue io-read/write secure monitor call (qcom-scm)
   to update the register, while,
2) With other, such as one on MTP sdm845 we should use the specific
   qcom-scm command to send request to do the complete register
   configuration.
Adding a separate device tree flag for arm-smmu to identify which
firmware configuration of the two mentioned above we use.
Not adding code change to allow type-(1) bootloaders to toggle the
safe using io-read/write qcom-scm call.

This change is inspired by the downstream change from Patrick Daly
to address performance issues with display and camera by handling
this wait-for-safe within separte io-pagetable ops to do TLB
maintenance. So a big thanks to him for the change.

Without this change the UFS reads are pretty slow:
$ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
10+0 records in
10+0 records out
10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
real    0m 22.39s
user    0m 0.00s
sys     0m 0.01s

With this change they are back to rock!
$ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
300+0 records in
300+0 records out
314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
real    0m 1.03s
user    0m 0.00s
sys     0m 0.54s

Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
---
 drivers/iommu/arm-smmu.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

Comments

Bjorn Andersson June 14, 2019, 4:05 a.m. UTC | #1
On Wed 12 Jun 00:15 PDT 2019, Vivek Gautam wrote:

> Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
> to address under-performance issues in real-time clients, such as
> Display, and Camera.
> On receiving an invalidation requests, the SMMU forwards SAFE request
> to these clients and waits for SAFE ack signal from real-time clients.
> The SAFE signal from such clients is used to qualify the start of
> invalidation.
> This logic is controlled by chicken bits, one for each - MDP (display),
> IFE0, and IFE1 (camera), that can be accessed only from secure software
> on sdm845.
> 
> This configuration, however, degrades the performance of non-real time
> clients, such as USB, and UFS etc. This happens because, with wait-for-safe
> logic enabled the hardware tries to throttle non-real time clients while
> waiting for SAFE ack signals from real-time clients.
> 
> On MTP sdm845 devices, with wait-for-safe logic enabled at the boot time
> by the bootloaders we see degraded performance of USB and UFS when kernel
> enables the smmu stage-1 translations for these clients.
> Turn off this wait-for-safe logic from the kernel gets us back the perf
> of USB and UFS devices until we re-visit this when we start seeing perf
> issues on display/camera on upstream supported SDM845 platforms.
> 
> Now, different bootloaders with their access control policies allow this
> register access differently through secure monitor calls -
> 1) With one we can issue io-read/write secure monitor call (qcom-scm)
>    to update the register, while,
> 2) With other, such as one on MTP sdm845 we should use the specific
>    qcom-scm command to send request to do the complete register
>    configuration.
> Adding a separate device tree flag for arm-smmu to identify which
> firmware configuration of the two mentioned above we use.
> Not adding code change to allow type-(1) bootloaders to toggle the
> safe using io-read/write qcom-scm call.
> 
> This change is inspired by the downstream change from Patrick Daly
> to address performance issues with display and camera by handling
> this wait-for-safe within separte io-pagetable ops to do TLB
> maintenance. So a big thanks to him for the change.
> 
> Without this change the UFS reads are pretty slow:
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
> 10+0 records in
> 10+0 records out
> 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
> real    0m 22.39s
> user    0m 0.00s
> sys     0m 0.01s
> 
> With this change they are back to rock!
> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
> 300+0 records in
> 300+0 records out
> 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
> real    0m 1.03s
> user    0m 0.00s
> sys     0m 0.54s
> 
> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> ---
>  drivers/iommu/arm-smmu.c | 16 ++++++++++++++++
>  1 file changed, 16 insertions(+)
> 
> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> index 0ad086da399c..3c3ad43eda97 100644
> --- a/drivers/iommu/arm-smmu.c
> +++ b/drivers/iommu/arm-smmu.c
> @@ -39,6 +39,7 @@
>  #include <linux/pci.h>
>  #include <linux/platform_device.h>
>  #include <linux/pm_runtime.h>
> +#include <linux/qcom_scm.h>
>  #include <linux/slab.h>
>  #include <linux/spinlock.h>
>  
> @@ -177,6 +178,7 @@ struct arm_smmu_device {
>  	u32				features;
>  
>  #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
> +#define ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA (1 << 1)
>  	u32				options;
>  	enum arm_smmu_arch_version	version;
>  	enum arm_smmu_implementation	model;
> @@ -262,6 +264,7 @@ static bool using_legacy_binding, using_generic_binding;
>  
>  static struct arm_smmu_option_prop arm_smmu_options[] = {
>  	{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
> +	{ ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA, "qcom,smmu-500-fw-impl-safe-errata" },

This should be added to the DT binding as well.

>  	{ 0, NULL},
>  };
>  
> @@ -2292,6 +2295,19 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>  	arm_smmu_device_reset(smmu);
>  	arm_smmu_test_smr_masks(smmu);
>  
> +	/*
> +	 * To address performance degradation in non-real time clients,
> +	 * such as USB and UFS, turn off wait-for-safe on sdm845 platforms,
> +	 * such as MTP, whose firmwares implement corresponding secure monitor
> +	 * call handlers.
> +	 */
> +	if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500") &&
> +	    smmu->options & ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA) {
> +		err = qcom_scm_qsmmu500_wait_safe_toggle(0);
> +		if (err)
> +			dev_warn(dev, "Failed to turn off SAFE logic\n");
> +	}
> +

This looks good, I presume at some point we can profile things and
review if it's worth toggling this on the fly, but given that this is
conditioned on smmu->options that should be an implementation detail..

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

>  	/*
>  	 * We want to avoid touching dev->power.lock in fastpaths unless
>  	 * it's really going to do something useful - pm_runtime_enabled()
> -- 
> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
> of Code Aurora Forum, hosted by The Linux Foundation
>
Vivek Gautam June 14, 2019, 9:18 a.m. UTC | #2
On 6/14/2019 9:35 AM, Bjorn Andersson wrote:
> On Wed 12 Jun 00:15 PDT 2019, Vivek Gautam wrote:
>
>> Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
>> to address under-performance issues in real-time clients, such as
>> Display, and Camera.
>> On receiving an invalidation requests, the SMMU forwards SAFE request
>> to these clients and waits for SAFE ack signal from real-time clients.
>> The SAFE signal from such clients is used to qualify the start of
>> invalidation.
>> This logic is controlled by chicken bits, one for each - MDP (display),
>> IFE0, and IFE1 (camera), that can be accessed only from secure software
>> on sdm845.
>>
>> This configuration, however, degrades the performance of non-real time
>> clients, such as USB, and UFS etc. This happens because, with wait-for-safe
>> logic enabled the hardware tries to throttle non-real time clients while
>> waiting for SAFE ack signals from real-time clients.
>>
>> On MTP sdm845 devices, with wait-for-safe logic enabled at the boot time
>> by the bootloaders we see degraded performance of USB and UFS when kernel
>> enables the smmu stage-1 translations for these clients.
>> Turn off this wait-for-safe logic from the kernel gets us back the perf
>> of USB and UFS devices until we re-visit this when we start seeing perf
>> issues on display/camera on upstream supported SDM845 platforms.
>>
>> Now, different bootloaders with their access control policies allow this
>> register access differently through secure monitor calls -
>> 1) With one we can issue io-read/write secure monitor call (qcom-scm)
>>     to update the register, while,
>> 2) With other, such as one on MTP sdm845 we should use the specific
>>     qcom-scm command to send request to do the complete register
>>     configuration.
>> Adding a separate device tree flag for arm-smmu to identify which
>> firmware configuration of the two mentioned above we use.
>> Not adding code change to allow type-(1) bootloaders to toggle the
>> safe using io-read/write qcom-scm call.
>>
>> This change is inspired by the downstream change from Patrick Daly
>> to address performance issues with display and camera by handling
>> this wait-for-safe within separte io-pagetable ops to do TLB
>> maintenance. So a big thanks to him for the change.
>>
>> Without this change the UFS reads are pretty slow:
>> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
>> 10+0 records in
>> 10+0 records out
>> 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
>> real    0m 22.39s
>> user    0m 0.00s
>> sys     0m 0.01s
>>
>> With this change they are back to rock!
>> $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
>> 300+0 records in
>> 300+0 records out
>> 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
>> real    0m 1.03s
>> user    0m 0.00s
>> sys     0m 0.54s
>>
>> Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
>> ---
>>   drivers/iommu/arm-smmu.c | 16 ++++++++++++++++
>>   1 file changed, 16 insertions(+)
>>
>> diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
>> index 0ad086da399c..3c3ad43eda97 100644
>> --- a/drivers/iommu/arm-smmu.c
>> +++ b/drivers/iommu/arm-smmu.c
>> @@ -39,6 +39,7 @@
>>   #include <linux/pci.h>
>>   #include <linux/platform_device.h>
>>   #include <linux/pm_runtime.h>
>> +#include <linux/qcom_scm.h>
>>   #include <linux/slab.h>
>>   #include <linux/spinlock.h>
>>   
>> @@ -177,6 +178,7 @@ struct arm_smmu_device {
>>   	u32				features;
>>   
>>   #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
>> +#define ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA (1 << 1)
>>   	u32				options;
>>   	enum arm_smmu_arch_version	version;
>>   	enum arm_smmu_implementation	model;
>> @@ -262,6 +264,7 @@ static bool using_legacy_binding, using_generic_binding;
>>   
>>   static struct arm_smmu_option_prop arm_smmu_options[] = {
>>   	{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
>> +	{ ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA, "qcom,smmu-500-fw-impl-safe-errata" },
> This should be added to the DT binding as well.

Ah right. I missed that. Will add this and respin unless Robin and Will 
have concerns with this change.

>
>>   	{ 0, NULL},
>>   };
>>   
>> @@ -2292,6 +2295,19 @@ static int arm_smmu_device_probe(struct platform_device *pdev)
>>   	arm_smmu_device_reset(smmu);
>>   	arm_smmu_test_smr_masks(smmu);
>>   
>> +	/*
>> +	 * To address performance degradation in non-real time clients,
>> +	 * such as USB and UFS, turn off wait-for-safe on sdm845 platforms,
>> +	 * such as MTP, whose firmwares implement corresponding secure monitor
>> +	 * call handlers.
>> +	 */
>> +	if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500") &&
>> +	    smmu->options & ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA) {
>> +		err = qcom_scm_qsmmu500_wait_safe_toggle(0);
>> +		if (err)
>> +			dev_warn(dev, "Failed to turn off SAFE logic\n");
>> +	}
>> +
> This looks good, I presume at some point we can profile things and
> review if it's worth toggling this on the fly, but given that this is
> conditioned on smmu->options that should be an implementation detail..
>
> Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Thanks Bjorn.

Best regards
Vivek

> Regards,
> Bjorn
>
>>   	/*
>>   	 * We want to avoid touching dev->power.lock in fastpaths unless
>>   	 * it's really going to do something useful - pm_runtime_enabled()
>> -- 
>> QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
>> of Code Aurora Forum, hosted by The Linux Foundation
>>
Will Deacon June 18, 2019, 5:52 p.m. UTC | #3
Hi Vivek,

On Fri, Jun 14, 2019 at 02:48:07PM +0530, Vivek Gautam wrote:
> On 6/14/2019 9:35 AM, Bjorn Andersson wrote:
> > On Wed 12 Jun 00:15 PDT 2019, Vivek Gautam wrote:
> > 
> > > Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
> > > to address under-performance issues in real-time clients, such as
> > > Display, and Camera.
> > > On receiving an invalidation requests, the SMMU forwards SAFE request
> > > to these clients and waits for SAFE ack signal from real-time clients.
> > > The SAFE signal from such clients is used to qualify the start of
> > > invalidation.
> > > This logic is controlled by chicken bits, one for each - MDP (display),
> > > IFE0, and IFE1 (camera), that can be accessed only from secure software
> > > on sdm845.
> > > 
> > > This configuration, however, degrades the performance of non-real time
> > > clients, such as USB, and UFS etc. This happens because, with wait-for-safe
> > > logic enabled the hardware tries to throttle non-real time clients while
> > > waiting for SAFE ack signals from real-time clients.
> > > 
> > > On MTP sdm845 devices, with wait-for-safe logic enabled at the boot time
> > > by the bootloaders we see degraded performance of USB and UFS when kernel
> > > enables the smmu stage-1 translations for these clients.
> > > Turn off this wait-for-safe logic from the kernel gets us back the perf
> > > of USB and UFS devices until we re-visit this when we start seeing perf
> > > issues on display/camera on upstream supported SDM845 platforms.

Re-visit what exactly, and how?

> > > Now, different bootloaders with their access control policies allow this
> > > register access differently through secure monitor calls -
> > > 1) With one we can issue io-read/write secure monitor call (qcom-scm)
> > >     to update the register, while,
> > > 2) With other, such as one on MTP sdm845 we should use the specific
> > >     qcom-scm command to send request to do the complete register
> > >     configuration.
> > > Adding a separate device tree flag for arm-smmu to identify which
> > > firmware configuration of the two mentioned above we use.
> > > Not adding code change to allow type-(1) bootloaders to toggle the
> > > safe using io-read/write qcom-scm call.
> > > 
> > > This change is inspired by the downstream change from Patrick Daly
> > > to address performance issues with display and camera by handling
> > > this wait-for-safe within separte io-pagetable ops to do TLB
> > > maintenance. So a big thanks to him for the change.
> > > 
> > > Without this change the UFS reads are pretty slow:
> > > $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
> > > 10+0 records in
> > > 10+0 records out
> > > 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
> > > real    0m 22.39s
> > > user    0m 0.00s
> > > sys     0m 0.01s
> > > 
> > > With this change they are back to rock!
> > > $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
> > > 300+0 records in
> > > 300+0 records out
> > > 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
> > > real    0m 1.03s
> > > user    0m 0.00s
> > > sys     0m 0.54s
> > > 
> > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > > ---
> > >   drivers/iommu/arm-smmu.c | 16 ++++++++++++++++
> > >   1 file changed, 16 insertions(+)
> > > 
> > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > index 0ad086da399c..3c3ad43eda97 100644
> > > --- a/drivers/iommu/arm-smmu.c
> > > +++ b/drivers/iommu/arm-smmu.c
> > > @@ -39,6 +39,7 @@
> > >   #include <linux/pci.h>
> > >   #include <linux/platform_device.h>
> > >   #include <linux/pm_runtime.h>
> > > +#include <linux/qcom_scm.h>
> > >   #include <linux/slab.h>
> > >   #include <linux/spinlock.h>
> > > @@ -177,6 +178,7 @@ struct arm_smmu_device {
> > >   	u32				features;
> > >   #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
> > > +#define ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA (1 << 1)
> > >   	u32				options;
> > >   	enum arm_smmu_arch_version	version;
> > >   	enum arm_smmu_implementation	model;
> > > @@ -262,6 +264,7 @@ static bool using_legacy_binding, using_generic_binding;
> > >   static struct arm_smmu_option_prop arm_smmu_options[] = {
> > >   	{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
> > > +	{ ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA, "qcom,smmu-500-fw-impl-safe-errata" },
> > This should be added to the DT binding as well.
> 
> Ah right. I missed that. Will add this and respin unless Robin and Will have
> concerns with this change.

My only concern really is whether it's safe for us to turn this off. It's
clear that somebody went to a lot of effort to add this extra goodness to
the IP, but your benchmarks suggest they never actually tried it out after
they finished building it.

Is there some downside I'm not seeing from disabling this stuff?

We probably also need some co-ordination with Andy if we're going to
merge this, since he maintains the firmware calling code.

Will
Vivek Gautam June 24, 2019, 10:28 a.m. UTC | #4
Hi Will,

On Tue, Jun 18, 2019 at 11:22 PM Will Deacon <will.deacon@arm.com> wrote:
>
> Hi Vivek,
>
> On Fri, Jun 14, 2019 at 02:48:07PM +0530, Vivek Gautam wrote:
> > On 6/14/2019 9:35 AM, Bjorn Andersson wrote:
> > > On Wed 12 Jun 00:15 PDT 2019, Vivek Gautam wrote:
> > >
> > > > Qcom's implementation of arm,mmu-500 adds a WAIT-FOR-SAFE logic
> > > > to address under-performance issues in real-time clients, such as
> > > > Display, and Camera.
> > > > On receiving an invalidation requests, the SMMU forwards SAFE request
> > > > to these clients and waits for SAFE ack signal from real-time clients.
> > > > The SAFE signal from such clients is used to qualify the start of
> > > > invalidation.
> > > > This logic is controlled by chicken bits, one for each - MDP (display),
> > > > IFE0, and IFE1 (camera), that can be accessed only from secure software
> > > > on sdm845.
> > > >
> > > > This configuration, however, degrades the performance of non-real time
> > > > clients, such as USB, and UFS etc. This happens because, with wait-for-safe
> > > > logic enabled the hardware tries to throttle non-real time clients while
> > > > waiting for SAFE ack signals from real-time clients.
> > > >
> > > > On MTP sdm845 devices, with wait-for-safe logic enabled at the boot time
> > > > by the bootloaders we see degraded performance of USB and UFS when kernel
> > > > enables the smmu stage-1 translations for these clients.
> > > > Turn off this wait-for-safe logic from the kernel gets us back the perf
> > > > of USB and UFS devices until we re-visit this when we start seeing perf
> > > > issues on display/camera on upstream supported SDM845 platforms.
>
> Re-visit what exactly, and how?
>
> > > > Now, different bootloaders with their access control policies allow this
> > > > register access differently through secure monitor calls -
> > > > 1) With one we can issue io-read/write secure monitor call (qcom-scm)
> > > >     to update the register, while,
> > > > 2) With other, such as one on MTP sdm845 we should use the specific
> > > >     qcom-scm command to send request to do the complete register
> > > >     configuration.
> > > > Adding a separate device tree flag for arm-smmu to identify which
> > > > firmware configuration of the two mentioned above we use.
> > > > Not adding code change to allow type-(1) bootloaders to toggle the
> > > > safe using io-read/write qcom-scm call.
> > > >
> > > > This change is inspired by the downstream change from Patrick Daly
> > > > to address performance issues with display and camera by handling
> > > > this wait-for-safe within separte io-pagetable ops to do TLB
> > > > maintenance. So a big thanks to him for the change.
> > > >
> > > > Without this change the UFS reads are pretty slow:
> > > > $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=10 conv=sync
> > > > 10+0 records in
> > > > 10+0 records out
> > > > 10485760 bytes (10.0MB) copied, 22.394903 seconds, 457.2KB/s
> > > > real    0m 22.39s
> > > > user    0m 0.00s
> > > > sys     0m 0.01s
> > > >
> > > > With this change they are back to rock!
> > > > $ time dd if=/dev/sda of=/dev/zero bs=1048576 count=300 conv=sync
> > > > 300+0 records in
> > > > 300+0 records out
> > > > 314572800 bytes (300.0MB) copied, 1.030541 seconds, 291.1MB/s
> > > > real    0m 1.03s
> > > > user    0m 0.00s
> > > > sys     0m 0.54s
> > > >
> > > > Signed-off-by: Vivek Gautam <vivek.gautam@codeaurora.org>
> > > > ---
> > > >   drivers/iommu/arm-smmu.c | 16 ++++++++++++++++
> > > >   1 file changed, 16 insertions(+)
> > > >
> > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > > index 0ad086da399c..3c3ad43eda97 100644
> > > > --- a/drivers/iommu/arm-smmu.c
> > > > +++ b/drivers/iommu/arm-smmu.c
> > > > @@ -39,6 +39,7 @@
> > > >   #include <linux/pci.h>
> > > >   #include <linux/platform_device.h>
> > > >   #include <linux/pm_runtime.h>
> > > > +#include <linux/qcom_scm.h>
> > > >   #include <linux/slab.h>
> > > >   #include <linux/spinlock.h>
> > > > @@ -177,6 +178,7 @@ struct arm_smmu_device {
> > > >           u32                             features;
> > > >   #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
> > > > +#define ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA (1 << 1)
> > > >           u32                             options;
> > > >           enum arm_smmu_arch_version      version;
> > > >           enum arm_smmu_implementation    model;
> > > > @@ -262,6 +264,7 @@ static bool using_legacy_binding, using_generic_binding;
> > > >   static struct arm_smmu_option_prop arm_smmu_options[] = {
> > > >           { ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
> > > > + { ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA, "qcom,smmu-500-fw-impl-safe-errata" },
> > > This should be added to the DT binding as well.
> >
> > Ah right. I missed that. Will add this and respin unless Robin and Will have
> > concerns with this change.
>
> My only concern really is whether it's safe for us to turn this off. It's
> clear that somebody went to a lot of effort to add this extra goodness to
> the IP, but your benchmarks suggest they never actually tried it out after
> they finished building it.
>
> Is there some downside I'm not seeing from disabling this stuff?

This wait-for-safe is a TLB invalidation enhancement to help display
and camera devices.
The SMMU hardware throttles the invalidations so that clients such as
display and camera can indicate when to start the invalidation.
So the SMMU essentially reduces the rate at which invalidations are
serviced from its queue. This also throttles the invalidations from
other masters too.

On sdm845, the software is expected to serialize the invalidation
command loading into SMMU invalidation FIFO using hardware locks
(downstream code [2]), and is also expected to throttle non-real time
clients while waiting for SAFE==1 (downstream code[2]). We don't do
any of these yet, and as per my understanding as this wait-for-safe is
enabled by the bootloader in a one time config, this logic reduces
performance of devices such as usb and ufs.

There's isn't any downside from disabling this logic until we have all
the pieces together from downstream in upstream kernels, and until we
have sdm845 devices that are running with full display/gfx stack
running. That's when we plan to revisit this and enable all the pieces
to get display and USB/UFS working with their optimum performance.

[1] https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/iommu/arm-smmu.c?h=msm-4.9#n5172
[2] https://source.codeaurora.org/quic/la/kernel/msm-4.9/tree/drivers/iommu/arm-smmu.c?h=msm-4.9#n5135

Thanks
Vivek

>
> We probably also need some co-ordination with Andy if we're going to
> merge this, since he maintains the firmware calling code.
>
> Will
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation
Will Deacon June 24, 2019, 5:03 p.m. UTC | #5
[+Krishna]

Hi Vivek,

On Mon, Jun 24, 2019 at 03:58:32PM +0530, Vivek Gautam wrote:
> On Tue, Jun 18, 2019 at 11:22 PM Will Deacon <will.deacon@arm.com> wrote:
> > On Fri, Jun 14, 2019 at 02:48:07PM +0530, Vivek Gautam wrote:
> > > On 6/14/2019 9:35 AM, Bjorn Andersson wrote:
> > > > On Wed 12 Jun 00:15 PDT 2019, Vivek Gautam wrote:
> > > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > > > index 0ad086da399c..3c3ad43eda97 100644
> > > > > --- a/drivers/iommu/arm-smmu.c
> > > > > +++ b/drivers/iommu/arm-smmu.c
> > > > > @@ -39,6 +39,7 @@
> > > > >   #include <linux/pci.h>
> > > > >   #include <linux/platform_device.h>
> > > > >   #include <linux/pm_runtime.h>
> > > > > +#include <linux/qcom_scm.h>
> > > > >   #include <linux/slab.h>
> > > > >   #include <linux/spinlock.h>
> > > > > @@ -177,6 +178,7 @@ struct arm_smmu_device {
> > > > >           u32                             features;
> > > > >   #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
> > > > > +#define ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA (1 << 1)
> > > > >           u32                             options;
> > > > >           enum arm_smmu_arch_version      version;
> > > > >           enum arm_smmu_implementation    model;
> > > > > @@ -262,6 +264,7 @@ static bool using_legacy_binding, using_generic_binding;
> > > > >   static struct arm_smmu_option_prop arm_smmu_options[] = {
> > > > >           { ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
> > > > > + { ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA, "qcom,smmu-500-fw-impl-safe-errata" },
> > > > This should be added to the DT binding as well.
> > >
> > > Ah right. I missed that. Will add this and respin unless Robin and Will have
> > > concerns with this change.
> >
> > My only concern really is whether it's safe for us to turn this off. It's
> > clear that somebody went to a lot of effort to add this extra goodness to
> > the IP, but your benchmarks suggest they never actually tried it out after
> > they finished building it.
> >
> > Is there some downside I'm not seeing from disabling this stuff?
> 
> This wait-for-safe is a TLB invalidation enhancement to help display
> and camera devices.
> The SMMU hardware throttles the invalidations so that clients such as
> display and camera can indicate when to start the invalidation.
> So the SMMU essentially reduces the rate at which invalidations are
> serviced from its queue. This also throttles the invalidations from
> other masters too.
> 
> On sdm845, the software is expected to serialize the invalidation
> command loading into SMMU invalidation FIFO using hardware locks
> (downstream code [2]), and is also expected to throttle non-real time
> clients while waiting for SAFE==1 (downstream code[2]). We don't do
> any of these yet, and as per my understanding as this wait-for-safe is
> enabled by the bootloader in a one time config, this logic reduces
> performance of devices such as usb and ufs.
> 
> There's isn't any downside from disabling this logic until we have all
> the pieces together from downstream in upstream kernels, and until we
> have sdm845 devices that are running with full display/gfx stack
> running. That's when we plan to revisit this and enable all the pieces
> to get display and USB/UFS working with their optimum performance.

Generally, I'd agree that approaching this incrementally makes sense, but
in this case you're adding new device-tree properties
("qcom,smmu-500-fw-impl-safe-errata") in order to do so, which seems
questionable if they're only going to be used in the short-term and will
be obsolete once Linux knows how to drive the device properly.

Instead, I think this needs to be part of a separate file that is maintained
by you, which follows on from the work that Krishna is doing for nvidia
built on top of Robin's prototype patches:

http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/smmu-impl

Once we have that, you can key this behaviour off the compatible string
rather than having to add quirk properties to reflect the transient needs of
Linux.

Krishna -- how have you been getting on with the branch above?

Will
Vivek Gautam June 25, 2019, 7:04 a.m. UTC | #6
Hi Will,


On Mon, Jun 24, 2019 at 10:33 PM Will Deacon <will@kernel.org> wrote:
>
> [+Krishna]
>
> Hi Vivek,
>
> On Mon, Jun 24, 2019 at 03:58:32PM +0530, Vivek Gautam wrote:
> > On Tue, Jun 18, 2019 at 11:22 PM Will Deacon <will.deacon@arm.com> wrote:
> > > On Fri, Jun 14, 2019 at 02:48:07PM +0530, Vivek Gautam wrote:
> > > > On 6/14/2019 9:35 AM, Bjorn Andersson wrote:
> > > > > On Wed 12 Jun 00:15 PDT 2019, Vivek Gautam wrote:
> > > > > > diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
> > > > > > index 0ad086da399c..3c3ad43eda97 100644
> > > > > > --- a/drivers/iommu/arm-smmu.c
> > > > > > +++ b/drivers/iommu/arm-smmu.c
> > > > > > @@ -39,6 +39,7 @@
> > > > > >   #include <linux/pci.h>
> > > > > >   #include <linux/platform_device.h>
> > > > > >   #include <linux/pm_runtime.h>
> > > > > > +#include <linux/qcom_scm.h>
> > > > > >   #include <linux/slab.h>
> > > > > >   #include <linux/spinlock.h>
> > > > > > @@ -177,6 +178,7 @@ struct arm_smmu_device {
> > > > > >           u32                             features;
> > > > > >   #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
> > > > > > +#define ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA (1 << 1)
> > > > > >           u32                             options;
> > > > > >           enum arm_smmu_arch_version      version;
> > > > > >           enum arm_smmu_implementation    model;
> > > > > > @@ -262,6 +264,7 @@ static bool using_legacy_binding, using_generic_binding;
> > > > > >   static struct arm_smmu_option_prop arm_smmu_options[] = {
> > > > > >           { ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
> > > > > > + { ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA, "qcom,smmu-500-fw-impl-safe-errata" },
> > > > > This should be added to the DT binding as well.
> > > >
> > > > Ah right. I missed that. Will add this and respin unless Robin and Will have
> > > > concerns with this change.
> > >
> > > My only concern really is whether it's safe for us to turn this off. It's
> > > clear that somebody went to a lot of effort to add this extra goodness to
> > > the IP, but your benchmarks suggest they never actually tried it out after
> > > they finished building it.
> > >
> > > Is there some downside I'm not seeing from disabling this stuff?
> >
> > This wait-for-safe is a TLB invalidation enhancement to help display
> > and camera devices.
> > The SMMU hardware throttles the invalidations so that clients such as
> > display and camera can indicate when to start the invalidation.
> > So the SMMU essentially reduces the rate at which invalidations are
> > serviced from its queue. This also throttles the invalidations from
> > other masters too.
> >
> > On sdm845, the software is expected to serialize the invalidation
> > command loading into SMMU invalidation FIFO using hardware locks
> > (downstream code [2]), and is also expected to throttle non-real time
> > clients while waiting for SAFE==1 (downstream code[2]). We don't do
> > any of these yet, and as per my understanding as this wait-for-safe is
> > enabled by the bootloader in a one time config, this logic reduces
> > performance of devices such as usb and ufs.
> >
> > There's isn't any downside from disabling this logic until we have all
> > the pieces together from downstream in upstream kernels, and until we
> > have sdm845 devices that are running with full display/gfx stack
> > running. That's when we plan to revisit this and enable all the pieces
> > to get display and USB/UFS working with their optimum performance.
>
> Generally, I'd agree that approaching this incrementally makes sense, but
> in this case you're adding new device-tree properties
> ("qcom,smmu-500-fw-impl-safe-errata") in order to do so, which seems
> questionable if they're only going to be used in the short-term and will
> be obsolete once Linux knows how to drive the device properly.

This device tree property will still be valid when we handle the wait-for-safe
properly for sdm845.
("qcom,smmu-500-fw-impl-safe-errata") property represents just that the
firmware has handles to do the entire sequence -
* read the secure register
* set/reset the bits in the register to enable/disable wait-for-safe for certain
  devices.
And this is valid when firmware masks access to this register from any other
EE.
So we don't have to do anything that, for example, we were doing for sdm845
based cheza's firmware [1] that implements simple scm handlers to read/write
secure registers.

And fyi, some of the newer SoCs too have this logic, and kernel can have that
extra bit of page-table-ops to handle wait-safe toggling.

[1] https://lore.kernel.org/patchwork/patch/983917/
>
> Instead, I think this needs to be part of a separate file that is maintained
> by you, which follows on from the work that Krishna is doing for nvidia
> built on top of Robin's prototype patches:
>
> http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/smmu-impl

Looking at this branch quickly, it seem there can be separate implementation
level configuration file that can be added.
But will this also handle separate page table ops when required in future.

Best regards
Vivek

>
> Once we have that, you can key this behaviour off the compatible string
> rather than having to add quirk properties to reflect the transient needs of
> Linux.
>
> Krishna -- how have you been getting on with the branch above?
>
> Will
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
Will Deacon June 25, 2019, 1:39 p.m. UTC | #7
On Tue, Jun 25, 2019 at 12:34:56PM +0530, Vivek Gautam wrote:
> On Mon, Jun 24, 2019 at 10:33 PM Will Deacon <will@kernel.org> wrote:
> > Instead, I think this needs to be part of a separate file that is maintained
> > by you, which follows on from the work that Krishna is doing for nvidia
> > built on top of Robin's prototype patches:
> >
> > http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/smmu-impl
> 
> Looking at this branch quickly, it seem there can be separate implementation
> level configuration file that can be added.
> But will this also handle separate page table ops when required in future.

Nothing's set in stone, but having the implementation-specific code
constrain the page-table format (especially wrt quirks) sounds reasonable to
me. I'm currently waiting for Krishna to respin the nvidia changes [1] on
top of this so that we can see how well the abstractions are holding up.

I certainly won't merge the stuff until we have a user.

Will

[1] https://lkml.kernel.org/r/1543887414-18209-1-git-send-email-vdumpa@nvidia.com
Vivek Gautam June 26, 2019, 6:33 a.m. UTC | #8
On Tue, Jun 25, 2019 at 7:09 PM Will Deacon <will@kernel.org> wrote:
>
> On Tue, Jun 25, 2019 at 12:34:56PM +0530, Vivek Gautam wrote:
> > On Mon, Jun 24, 2019 at 10:33 PM Will Deacon <will@kernel.org> wrote:
> > > Instead, I think this needs to be part of a separate file that is maintained
> > > by you, which follows on from the work that Krishna is doing for nvidia
> > > built on top of Robin's prototype patches:
> > >
> > > http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/smmu-impl
> >
> > Looking at this branch quickly, it seem there can be separate implementation
> > level configuration file that can be added.
> > But will this also handle separate page table ops when required in future.
>
> Nothing's set in stone, but having the implementation-specific code
> constrain the page-table format (especially wrt quirks) sounds reasonable to
> me. I'm currently waiting for Krishna to respin the nvidia changes [1] on
> top of this so that we can see how well the abstractions are holding up.

Sure. Would you want me to try Robin's branch and take out the qualcomm
related stuff to its own implementation? Or, would you like me to respin this
series so that you can take it in to enable SDM845 boards such as, MTP
and dragonboard to have a sane build - debian, etc. so people benefit
out of it.
Qualcomm stuff is lying in qcom-smmu and arm-smmu and may take some
time to stub out the implementation related details.
Let me know your take.

Thanks & regards
Vivek

>
> I certainly won't merge the stuff until we have a user.
>
> Will
>
> [1] https://lkml.kernel.org/r/1543887414-18209-1-git-send-email-vdumpa@nvidia.com
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu
Will Deacon June 26, 2019, 2:48 p.m. UTC | #9
On Wed, Jun 26, 2019 at 12:03:02PM +0530, Vivek Gautam wrote:
> On Tue, Jun 25, 2019 at 7:09 PM Will Deacon <will@kernel.org> wrote:
> >
> > On Tue, Jun 25, 2019 at 12:34:56PM +0530, Vivek Gautam wrote:
> > > On Mon, Jun 24, 2019 at 10:33 PM Will Deacon <will@kernel.org> wrote:
> > > > Instead, I think this needs to be part of a separate file that is maintained
> > > > by you, which follows on from the work that Krishna is doing for nvidia
> > > > built on top of Robin's prototype patches:
> > > >
> > > > http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/smmu-impl
> > >
> > > Looking at this branch quickly, it seem there can be separate implementation
> > > level configuration file that can be added.
> > > But will this also handle separate page table ops when required in future.
> >
> > Nothing's set in stone, but having the implementation-specific code
> > constrain the page-table format (especially wrt quirks) sounds reasonable to
> > me. I'm currently waiting for Krishna to respin the nvidia changes [1] on
> > top of this so that we can see how well the abstractions are holding up.
> 
> Sure. Would you want me to try Robin's branch and take out the qualcomm
> related stuff to its own implementation? Or, would you like me to respin this
> series so that you can take it in to enable SDM845 boards such as, MTP
> and dragonboard to have a sane build - debian, etc. so people benefit
> out of it.

I can't take this series without Acks on the firmware calling changes, and I
plan to send my 5.3 patches to Joerg at the end of the week so they get some
time in -next. In which case, I think it may be worth you having a play with
the branch above so we can get a better idea of any additional smmu_impl hooks
you may need.

> Qualcomm stuff is lying in qcom-smmu and arm-smmu and may take some
> time to stub out the implementation related details.

Not sure I follow you here. Are you talking about qcom_iommu.c?

Will
Vivek Gautam June 27, 2019, 7:05 a.m. UTC | #10
On Wed, Jun 26, 2019 at 8:18 PM Will Deacon <will@kernel.org> wrote:
>
> On Wed, Jun 26, 2019 at 12:03:02PM +0530, Vivek Gautam wrote:
> > On Tue, Jun 25, 2019 at 7:09 PM Will Deacon <will@kernel.org> wrote:
> > >
> > > On Tue, Jun 25, 2019 at 12:34:56PM +0530, Vivek Gautam wrote:
> > > > On Mon, Jun 24, 2019 at 10:33 PM Will Deacon <will@kernel.org> wrote:
> > > > > Instead, I think this needs to be part of a separate file that is maintained
> > > > > by you, which follows on from the work that Krishna is doing for nvidia
> > > > > built on top of Robin's prototype patches:
> > > > >
> > > > > http://linux-arm.org/git?p=linux-rm.git;a=shortlog;h=refs/heads/iommu/smmu-impl
> > > >
> > > > Looking at this branch quickly, it seem there can be separate implementation
> > > > level configuration file that can be added.
> > > > But will this also handle separate page table ops when required in future.
> > >
> > > Nothing's set in stone, but having the implementation-specific code
> > > constrain the page-table format (especially wrt quirks) sounds reasonable to
> > > me. I'm currently waiting for Krishna to respin the nvidia changes [1] on
> > > top of this so that we can see how well the abstractions are holding up.
> >
> > Sure. Would you want me to try Robin's branch and take out the qualcomm
> > related stuff to its own implementation? Or, would you like me to respin this
> > series so that you can take it in to enable SDM845 boards such as, MTP
> > and dragonboard to have a sane build - debian, etc. so people benefit
> > out of it.
>
> I can't take this series without Acks on the firmware calling changes, and I
> plan to send my 5.3 patches to Joerg at the end of the week so they get some
> time in -next. In which case, I think it may be worth you having a play with
> the branch above so we can get a better idea of any additional smmu_impl hooks
> you may need.

Cool. I will play around with it and get something tangible and meaningful.

>
> > Qualcomm stuff is lying in qcom-smmu and arm-smmu and may take some
> > time to stub out the implementation related details.
>
> Not sure I follow you here. Are you talking about qcom_iommu.c?

That's right. The qcom_iommu.c solved a different issue of secure context bank
allocations, when Rob forked out this driver and reused some of the
arm-smmu.c stuff.

We will take a look at that once we start adding the qcom implementation.

Thanks
Vivek

>
> Will
> _______________________________________________
> iommu mailing list
> iommu@lists.linux-foundation.org
> https://lists.linuxfoundation.org/mailman/listinfo/iommu

Patch
diff mbox series

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 0ad086da399c..3c3ad43eda97 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -39,6 +39,7 @@ 
 #include <linux/pci.h>
 #include <linux/platform_device.h>
 #include <linux/pm_runtime.h>
+#include <linux/qcom_scm.h>
 #include <linux/slab.h>
 #include <linux/spinlock.h>
 
@@ -177,6 +178,7 @@  struct arm_smmu_device {
 	u32				features;
 
 #define ARM_SMMU_OPT_SECURE_CFG_ACCESS (1 << 0)
+#define ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA (1 << 1)
 	u32				options;
 	enum arm_smmu_arch_version	version;
 	enum arm_smmu_implementation	model;
@@ -262,6 +264,7 @@  static bool using_legacy_binding, using_generic_binding;
 
 static struct arm_smmu_option_prop arm_smmu_options[] = {
 	{ ARM_SMMU_OPT_SECURE_CFG_ACCESS, "calxeda,smmu-secure-config-access" },
+	{ ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA, "qcom,smmu-500-fw-impl-safe-errata" },
 	{ 0, NULL},
 };
 
@@ -2292,6 +2295,19 @@  static int arm_smmu_device_probe(struct platform_device *pdev)
 	arm_smmu_device_reset(smmu);
 	arm_smmu_test_smr_masks(smmu);
 
+	/*
+	 * To address performance degradation in non-real time clients,
+	 * such as USB and UFS, turn off wait-for-safe on sdm845 platforms,
+	 * such as MTP, whose firmwares implement corresponding secure monitor
+	 * call handlers.
+	 */
+	if (of_device_is_compatible(smmu->dev->of_node, "qcom,sdm845-smmu-500") &&
+	    smmu->options & ARM_SMMU_OPT_QCOM_FW_IMPL_SAFE_ERRATA) {
+		err = qcom_scm_qsmmu500_wait_safe_toggle(0);
+		if (err)
+			dev_warn(dev, "Failed to turn off SAFE logic\n");
+	}
+
 	/*
 	 * We want to avoid touching dev->power.lock in fastpaths unless
 	 * it's really going to do something useful - pm_runtime_enabled()