* [PATCH v2 0/2] optee: fix OOM seen due to tee_shm_free()
@ 2021-02-25  9:06 ` Allen Pais
  0 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-02-25  9:06 UTC (permalink / raw)
  To: jens.wiklander, zajec5
  Cc: bcm-kernel-feedback-list, linux-arm-kernel, linux-kernel, op-tee,
	Allen Pais

From: Allen Pais <apais@linux.microsoft.com>

The following out of memory errors are seen on kexec reboot
from the optee core.
    
[    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
[    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
    
tee_shm_release() is not invoked on dma shm buffer.
    
Implement .shutdown() in optee core as well as bnxt firmware driver
to handle the release of the buffers correctly.
    
More info:
https://github.com/OP-TEE/optee_os/issues/3637

v2:
  keep the .shutdown() method simple. [Jens Wiklander]

Allen Pais (2):
  optee: fix tee out of memory failure seen during kexec reboot
  firmware: tee_bnxt: implement shutdown method to handle kexec reboots

 drivers/firmware/broadcom/tee_bnxt_fw.c |  9 +++++++++
 drivers/tee/optee/core.c                | 20 ++++++++++++++++++++
 2 files changed, 29 insertions(+)

-- 
2.25.1
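
A minimal userspace sketch of the failure mode described in the cover
letter, assuming a buffer pool whose state outlives the rebooting kernel.
All names (persistent_pool, toy_probe, toy_shutdown, probe_after_kexec)
are hypothetical and are not kernel or OP-TEE APIs; the real bookkeeping
lives in the TEE subsystem and in OP-TEE itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: one slot stands in for shared memory whose bookkeeping
 * survives a kexec. Nothing here is a kernel or OP-TEE interface. */
#define POOL_SLOTS 1

struct persistent_pool {
	bool in_use[POOL_SLOTS];
};

struct toy_driver {
	int slot;	/* slot held by this driver instance, or -1 */
};

/* Returns a slot index, or -1 when the pool is exhausted. */
static int pool_alloc(struct persistent_pool *pool)
{
	for (int i = 0; i < POOL_SLOTS; i++) {
		if (!pool->in_use[i]) {
			pool->in_use[i] = true;
			return i;
		}
	}
	return -1;
}

/* Releasing slot -1 (nothing held) is a harmless no-op. */
static void pool_free(struct persistent_pool *pool, int slot)
{
	assert(slot < POOL_SLOTS);
	if (slot >= 0)
		pool->in_use[slot] = false;
}

/* Probe: claim a buffer; fails if an earlier instance never released it. */
static int toy_probe(struct toy_driver *drv, struct persistent_pool *pool)
{
	drv->slot = pool_alloc(pool);
	return drv->slot < 0 ? -1 : 0;
}

/* Shutdown: release the buffer before the next kernel starts. */
static void toy_shutdown(struct toy_driver *drv, struct persistent_pool *pool)
{
	pool_free(pool, drv->slot);
	drv->slot = -1;
}

/* Simulate boot -> kexec -> boot and return the second probe's result. */
static int probe_after_kexec(bool call_shutdown)
{
	struct persistent_pool pool = { { false } };
	struct toy_driver first = { -1 }, second = { -1 };

	if (toy_probe(&first, &pool) != 0)
		return -2;
	if (call_shutdown)
		toy_shutdown(&first, &pool);
	/* "kexec": the driver state is gone, the pool state is not. */
	return toy_probe(&second, &pool);
}
```

In this model, probe_after_kexec(false) fails on the second probe because
the slot was never released, while probe_after_kexec(true) succeeds.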


* [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-02-25  9:06 ` Allen Pais
@ 2021-02-25  9:06   ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-02-25  9:06 UTC (permalink / raw)
  To: jens.wiklander, zajec5
  Cc: bcm-kernel-feedback-list, linux-arm-kernel, linux-kernel, op-tee,
	Allen Pais

From: Allen Pais <apais@linux.microsoft.com>

The following out of memory errors are seen on kexec reboot
from the optee core.

[    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
[    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22

tee_shm_release() is not invoked on dma shm buffer.

Implement .shutdown() method to handle the release of the buffers
correctly.

More info:
https://github.com/OP-TEE/optee_os/issues/3637

Signed-off-by: Allen Pais <apais@linux.microsoft.com>
---
 drivers/tee/optee/core.c | 20 ++++++++++++++++++++
 1 file changed, 20 insertions(+)

diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
index cf4718c6d35d..80e2774b5e2a 100644
--- a/drivers/tee/optee/core.c
+++ b/drivers/tee/optee/core.c
@@ -582,6 +582,13 @@ static optee_invoke_fn *get_invoke_func(struct device *dev)
 	return ERR_PTR(-EINVAL);
 }
 
+/* optee_remove - Device Removal Routine
+ * @pdev: platform device information struct
+ *
+ * optee_remove is called by platform subsystem to alter the driver
+ * that it should release the device
+ */
+
 static int optee_remove(struct platform_device *pdev)
 {
 	struct optee *optee = platform_get_drvdata(pdev);
@@ -612,6 +619,18 @@ static int optee_remove(struct platform_device *pdev)
 	return 0;
 }
 
+/* optee_shutdown - Device Removal Routine
+ * @pdev: platform device information struct
+ *
+ * platform_shutdown is called by the platform subsystem to alter
+ * the driver that a shutdown/reboot(or kexec) is happening and
+ * device must be disabled.
+ */
+static void optee_shutdown(struct platform_device *pdev)
+{
+	optee_disable_shm_cache(platform_get_drvdata(pdev));
+}
+
 static int optee_probe(struct platform_device *pdev)
 {
 	optee_invoke_fn *invoke_fn;
@@ -738,6 +757,7 @@ MODULE_DEVICE_TABLE(of, optee_dt_match);
 static struct platform_driver optee_driver = {
 	.probe  = optee_probe,
 	.remove = optee_remove,
+	.shutdown = optee_shutdown,
 	.driver = {
 		.name = "optee",
 		.of_match_table = optee_dt_match,
-- 
2.25.1
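
A rough userspace sketch of why registering .shutdown matters, assuming
a core that walks its devices at shutdown and calls the callback only
where a driver provides one. The types and functions below (demo_device,
demo_driver_ops, demo_shutdown_all, demo_run) are hypothetical stand-ins
and not the kernel's platform driver API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for a device and a driver callback table. */
struct demo_device {
	int cache_enabled;	/* 1 while the device still holds cached buffers */
};

struct demo_driver_ops {
	int  (*probe)(struct demo_device *dev);
	void (*shutdown)(struct demo_device *dev);	/* optional, may be NULL */
};

static int demo_probe(struct demo_device *dev)
{
	dev->cache_enabled = 1;
	return 0;
}

static void demo_shutdown(struct demo_device *dev)
{
	dev->cache_enabled = 0;
}

/* Mimics a shutdown pass: call .shutdown only where one is provided.
 * Returns the number of devices left with their cache still enabled. */
static size_t demo_shutdown_all(const struct demo_driver_ops **ops,
				struct demo_device *devs, size_t n)
{
	size_t still_enabled = 0;

	for (size_t i = 0; i < n; i++) {
		assert(ops[i] != NULL);
		if (ops[i]->shutdown)
			ops[i]->shutdown(&devs[i]);
		still_enabled += devs[i].cache_enabled ? 1 : 0;
	}
	return still_enabled;
}

static const struct demo_driver_ops ops_without_shutdown = {
	.probe = demo_probe,
};

static const struct demo_driver_ops ops_with_shutdown = {
	.probe = demo_probe,
	.shutdown = demo_shutdown,
};

/* Probe two devices, run the shutdown pass, report how many stayed enabled. */
static size_t demo_run(void)
{
	const struct demo_driver_ops *ops[] = {
		&ops_without_shutdown, &ops_with_shutdown,
	};
	struct demo_device devs[2] = { { 0 }, { 0 } };

	for (size_t i = 0; i < 2; i++)
		(void)ops[i]->probe(&devs[i]);
	return demo_shutdown_all(ops, devs, 2);
}
```

Only the device whose driver sets .shutdown is cleaned up; the other one
keeps its state into the next boot, which is the gap the one-line
.shutdown = optee_shutdown hunk closes for the optee driver.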


* [PATCH v2 2/2] firmware: tee_bnxt: implement shutdown method to handle kexec reboots
  2021-02-25  9:06 ` Allen Pais
@ 2021-02-25  9:06   ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-02-25  9:06 UTC (permalink / raw)
  To: jens.wiklander, zajec5
  Cc: bcm-kernel-feedback-list, linux-arm-kernel, linux-kernel, op-tee,
	Allen Pais

From: Allen Pais <apais@linux.microsoft.com>

On kexec reboot, the firmware driver fails to deallocate shm memory,
leading to a memory leak. Implement a .shutdown() method to handle
kexec reboots and to release the shm buffers correctly.

Signed-off-by: Allen Pais <apais@linux.microsoft.com>
---
 drivers/firmware/broadcom/tee_bnxt_fw.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/drivers/firmware/broadcom/tee_bnxt_fw.c b/drivers/firmware/broadcom/tee_bnxt_fw.c
index ed10da5313e8..4c62e044a99f 100644
--- a/drivers/firmware/broadcom/tee_bnxt_fw.c
+++ b/drivers/firmware/broadcom/tee_bnxt_fw.c
@@ -242,6 +242,14 @@ static int tee_bnxt_fw_remove(struct device *dev)
 	return 0;
 }
 
+static void tee_bnxt_fw_shutdown(struct device *dev)
+{
+	tee_shm_free(pvt_data.fw_shm_pool);
+	tee_client_close_session(pvt_data.ctx, pvt_data.session_id);
+	tee_client_close_context(pvt_data.ctx);
+	pvt_data.ctx = NULL;
+}
+
 static const struct tee_client_device_id tee_bnxt_fw_id_table[] = {
 	{UUID_INIT(0x6272636D, 0x2019, 0x0716,
 		    0x42, 0x43, 0x4D, 0x5F, 0x53, 0x43, 0x48, 0x49)},
@@ -257,6 +265,7 @@ static struct tee_client_driver tee_bnxt_fw_driver = {
 		.bus		= &tee_bus_type,
 		.probe		= tee_bnxt_fw_probe,
 		.remove		= tee_bnxt_fw_remove,
+		.shutdown	= tee_bnxt_fw_shutdown,
 	},
 };
 
-- 
2.25.1
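
A small userspace sketch of the teardown order used by
tee_bnxt_fw_shutdown() above: buffer first, then session, then context.
The dependency rule encoded here (a session and a buffer must not outlive
their context) is an assumption chosen for illustration, and every name
(mock_client, mock_shm_free, mock_run, ...) is hypothetical rather than a
TEE subsystem API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the three resources the shutdown path releases. */
struct mock_client {
	bool ctx_open;
	bool session_open;
	bool shm_allocated;
	int order_errors;	/* number of releases done out of order */
};

static void mock_setup(struct mock_client *c)
{
	assert(c != NULL);
	c->ctx_open = true;
	c->session_open = true;
	c->shm_allocated = true;
	c->order_errors = 0;
}

static void mock_shm_free(struct mock_client *c)
{
	if (!c->ctx_open)
		c->order_errors++;	/* buffer outlived its context */
	c->shm_allocated = false;
}

static void mock_close_session(struct mock_client *c)
{
	if (!c->ctx_open)
		c->order_errors++;	/* session outlived its context */
	c->session_open = false;
}

static void mock_close_context(struct mock_client *c)
{
	if (c->session_open || c->shm_allocated)
		c->order_errors++;	/* context closed under live users */
	c->ctx_open = false;
}

/* Runs the teardown in the patch's order or in reverse and returns the
 * number of ordering violations the model detected. */
static int mock_run(bool patch_order)
{
	struct mock_client c;

	mock_setup(&c);
	if (patch_order) {
		mock_shm_free(&c);
		mock_close_session(&c);
		mock_close_context(&c);
	} else {
		mock_close_context(&c);
		mock_close_session(&c);
		mock_shm_free(&c);
	}
	return c.order_errors;
}
```

Under this model the patch's order produces no violations, while the
reversed order trips every check.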


* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-02-25  9:06   ` Allen Pais
@ 2021-03-01 14:35     ` Jens Wiklander
  -1 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-03-01 14:35 UTC (permalink / raw)
  To: Allen Pais
  Cc: zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, op-tee, Allen Pais

On Thu, Feb 25, 2021 at 10:06 AM Allen Pais <allen.lkml@gmail.com> wrote:
>
> From: Allen Pais <apais@linux.microsoft.com>
>
> The following out of memory errors are seen on kexec reboot
> from the optee core.
>
> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
>
> tee_shm_release() is not invoked on dma shm buffer.
>
> Implement .shutdown() method to handle the release of the buffers
> correctly.
>
> More info:
> https://github.com/OP-TEE/optee_os/issues/3637
>
> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> ---
>  drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)

This looks good to me. Do you have a practical way of testing this on
QEMU for instance?

Thanks,
Jens

>
> diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
> index cf4718c6d35d..80e2774b5e2a 100644
> --- a/drivers/tee/optee/core.c
> +++ b/drivers/tee/optee/core.c
> @@ -582,6 +582,13 @@ static optee_invoke_fn *get_invoke_func(struct device *dev)
>         return ERR_PTR(-EINVAL);
>  }
>
> +/* optee_remove - Device Removal Routine
> + * @pdev: platform device information struct
> + *
> + * optee_remove is called by platform subsystem to alter the driver
> + * that it should release the device
> + */
> +
>  static int optee_remove(struct platform_device *pdev)
>  {
>         struct optee *optee = platform_get_drvdata(pdev);
> @@ -612,6 +619,18 @@ static int optee_remove(struct platform_device *pdev)
>         return 0;
>  }
>
> +/* optee_shutdown - Device Removal Routine
> + * @pdev: platform device information struct
> + *
> + * platform_shutdown is called by the platform subsystem to alter
> + * the driver that a shutdown/reboot(or kexec) is happening and
> + * device must be disabled.
> + */
> +static void optee_shutdown(struct platform_device *pdev)
> +{
> +       optee_disable_shm_cache(platform_get_drvdata(pdev));
> +}
> +
>  static int optee_probe(struct platform_device *pdev)
>  {
>         optee_invoke_fn *invoke_fn;
> @@ -738,6 +757,7 @@ MODULE_DEVICE_TABLE(of, optee_dt_match);
>  static struct platform_driver optee_driver = {
>         .probe  = optee_probe,
>         .remove = optee_remove,
> +       .shutdown = optee_shutdown,
>         .driver = {
>                 .name = "optee",
>                 .of_match_table = optee_dt_match,
> --
> 2.25.1
>

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-03-01 14:35     ` Jens Wiklander
@ 2021-03-02  5:51       ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-03-02  5:51 UTC (permalink / raw)
  To: Jens Wiklander, Allen Pais
  Cc: zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, op-tee

>>
>> From: Allen Pais <apais@linux.microsoft.com>
>>
>> The following out of memory errors are seen on kexec reboot
>> from the optee core.
>>
>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
>>
>> tee_shm_release() is not invoked on dma shm buffer.
>>
>> Implement .shutdown() method to handle the release of the buffers
>> correctly.
>>
>> More info:
>> https://github.com/OP-TEE/optee_os/issues/3637
>>
>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
>> ---
>>   drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
> 
> This looks good to me. Do you have a practical way of testing this on
> QEMU for instance?

   I have not tried this on QEMU. I will give it a go today.

Thanks.

> 
> Thanks,
> Jens
> 
>>
>> diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
>> index cf4718c6d35d..80e2774b5e2a 100644
>> --- a/drivers/tee/optee/core.c
>> +++ b/drivers/tee/optee/core.c
>> @@ -582,6 +582,13 @@ static optee_invoke_fn *get_invoke_func(struct device *dev)
>>          return ERR_PTR(-EINVAL);
>>   }
>>
>> +/* optee_remove - Device Removal Routine
>> + * @pdev: platform device information struct
>> + *
>> + * optee_remove is called by platform subsystem to alter the driver
>> + * that it should release the device
>> + */
>> +
>>   static int optee_remove(struct platform_device *pdev)
>>   {
>>          struct optee *optee = platform_get_drvdata(pdev);
>> @@ -612,6 +619,18 @@ static int optee_remove(struct platform_device *pdev)
>>          return 0;
>>   }
>>
>> +/* optee_shutdown - Device Removal Routine
>> + * @pdev: platform device information struct
>> + *
>> + * platform_shutdown is called by the platform subsystem to alter
>> + * the driver that a shutdown/reboot(or kexec) is happening and
>> + * device must be disabled.
>> + */
>> +static void optee_shutdown(struct platform_device *pdev)
>> +{
>> +       optee_disable_shm_cache(platform_get_drvdata(pdev));
>> +}
>> +
>>   static int optee_probe(struct platform_device *pdev)
>>   {
>>          optee_invoke_fn *invoke_fn;
>> @@ -738,6 +757,7 @@ MODULE_DEVICE_TABLE(of, optee_dt_match);
>>   static struct platform_driver optee_driver = {
>>          .probe  = optee_probe,
>>          .remove = optee_remove,
>> +       .shutdown = optee_shutdown,
>>          .driver = {
>>                  .name = "optee",
>>                  .of_match_table = optee_dt_match,
>> --
>> 2.25.1
>>

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-03-01 14:35     ` Jens Wiklander
@ 2021-03-16 13:21       ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-03-16 13:21 UTC (permalink / raw)
  To: Jens Wiklander, Allen Pais
  Cc: zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, op-tee



>>
>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
>>
>> tee_shm_release() is not invoked on dma shm buffer.
>>
>> Implement .shutdown() method to handle the release of the buffers
>> correctly.
>>
>> More info:
>> https://github.com/OP-TEE/optee_os/issues/3637
>>
>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
>> ---
>>   drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>>   1 file changed, 20 insertions(+)
> 
> This looks good to me. Do you have a practical way of testing this on
> QEMU for instance?
> 

Jens,

   I could neither reproduce the issue nor create a test setup using
QEMU; I could only do it on real hardware.

   I have tested the fix extensively and I don't see any issues.

Thanks.

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-02-25  9:06   ` Allen Pais
@ 2021-03-18 20:51     ` Tyler Hicks
  -1 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-03-18 20:51 UTC (permalink / raw)
  To: Allen Pais
  Cc: jens.wiklander, zajec5, bcm-kernel-feedback-list,
	linux-arm-kernel, linux-kernel, op-tee, Allen Pais

On 2021-02-25 14:36:09, Allen Pais wrote:
> From: Allen Pais <apais@linux.microsoft.com>
> 
> The following out of memory errors are seen on kexec reboot
> from the optee core.
> 
> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
> 
> tee_shm_release() is not invoked on dma shm buffer.
> 
> Implement .shutdown() method to handle the release of the buffers
> correctly.
> 
> More info:
> https://github.com/OP-TEE/optee_os/issues/3637
> 
> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> ---
>  drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>  1 file changed, 20 insertions(+)
> 
> diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
> index cf4718c6d35d..80e2774b5e2a 100644
> --- a/drivers/tee/optee/core.c
> +++ b/drivers/tee/optee/core.c
> @@ -582,6 +582,13 @@ static optee_invoke_fn *get_invoke_func(struct device *dev)
>  	return ERR_PTR(-EINVAL);
>  }
>  
> +/* optee_remove - Device Removal Routine
> + * @pdev: platform device information struct
> + *
> + * optee_remove is called by platform subsystem to alter the driver
                                                      ^ alert?

> + * that it should release the device
> + */
> +
>  static int optee_remove(struct platform_device *pdev)
>  {
>  	struct optee *optee = platform_get_drvdata(pdev);
> @@ -612,6 +619,18 @@ static int optee_remove(struct platform_device *pdev)
>  	return 0;
>  }
>  
> +/* optee_shutdown - Device Removal Routine
> + * @pdev: platform device information struct
> + *
> + * platform_shutdown is called by the platform subsystem to alter
                                                               ^ alert

With those two changes,

Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>

Tyler

> + * the driver that a shutdown/reboot(or kexec) is happening and
> + * device must be disabled.
> + */
> +static void optee_shutdown(struct platform_device *pdev)
> +{
> +	optee_disable_shm_cache(platform_get_drvdata(pdev));
> +}
> +
>  static int optee_probe(struct platform_device *pdev)
>  {
>  	optee_invoke_fn *invoke_fn;
> @@ -738,6 +757,7 @@ MODULE_DEVICE_TABLE(of, optee_dt_match);
>  static struct platform_driver optee_driver = {
>  	.probe  = optee_probe,
>  	.remove = optee_remove,
> +	.shutdown = optee_shutdown,
>  	.driver = {
>  		.name = "optee",
>  		.of_match_table = optee_dt_match,
> -- 
> 2.25.1
> 

* Re: [PATCH v2 2/2] firmware: tee_bnxt: implement shutdown method to handle kexec reboots
  2021-02-25  9:06   ` Allen Pais
@ 2021-03-18 20:55     ` Tyler Hicks
  -1 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-03-18 20:55 UTC (permalink / raw)
  To: Allen Pais
  Cc: jens.wiklander, zajec5, bcm-kernel-feedback-list,
	linux-arm-kernel, linux-kernel, op-tee, Allen Pais

On 2021-02-25 14:36:10, Allen Pais wrote:
> From: Allen Pais <apais@linux.microsoft.com>
> 
>  On kexec reboot the firmware driver fails to deallocate
> shm memory leading to a memory leak. Implement .shutdown()
> method to handle kexec reboots and to release shm buffers
> correctly.
> 
> Signed-off-by: Allen Pais <apais@linux.microsoft.com>

Reviewed-by: Tyler Hicks <tyhicks@linux.microsoft.com>

Tyler

> ---
>  drivers/firmware/broadcom/tee_bnxt_fw.c | 9 +++++++++
>  1 file changed, 9 insertions(+)
> 
> diff --git a/drivers/firmware/broadcom/tee_bnxt_fw.c b/drivers/firmware/broadcom/tee_bnxt_fw.c
> index ed10da5313e8..4c62e044a99f 100644
> --- a/drivers/firmware/broadcom/tee_bnxt_fw.c
> +++ b/drivers/firmware/broadcom/tee_bnxt_fw.c
> @@ -242,6 +242,14 @@ static int tee_bnxt_fw_remove(struct device *dev)
>  	return 0;
>  }
>  
> +static void tee_bnxt_fw_shutdown(struct device *dev)
> +{
> +	tee_shm_free(pvt_data.fw_shm_pool);
> +	tee_client_close_session(pvt_data.ctx, pvt_data.session_id);
> +	tee_client_close_context(pvt_data.ctx);
> +	pvt_data.ctx = NULL;
> +}
> +
>  static const struct tee_client_device_id tee_bnxt_fw_id_table[] = {
>  	{UUID_INIT(0x6272636D, 0x2019, 0x0716,
>  		    0x42, 0x43, 0x4D, 0x5F, 0x53, 0x43, 0x48, 0x49)},
> @@ -257,6 +265,7 @@ static struct tee_client_driver tee_bnxt_fw_driver = {
>  		.bus		= &tee_bus_type,
>  		.probe		= tee_bnxt_fw_probe,
>  		.remove		= tee_bnxt_fw_remove,
> +		.shutdown	= tee_bnxt_fw_shutdown,
>  	},
>  };
>  
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-03-16 13:21       ` Allen Pais
@ 2021-03-19  7:00         ` Jens Wiklander
  -1 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-03-19  7:00 UTC (permalink / raw)
  To: Allen Pais
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware

On Tue, Mar 16, 2021 at 2:21 PM Allen Pais <apais@linux.microsoft.com> wrote:
>
>
>
> >>
> >> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
> >> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
> >>
> >> tee_shm_release() is not invoked on dma shm buffer.
> >>
> >> Implement .shutdown() method to handle the release of the buffers
> >> correctly.
> >>
> >> More info:
> >> https://github.com/OP-TEE/optee_os/issues/3637
> >>
> >> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> >> ---
> >>   drivers/tee/optee/core.c | 20 ++++++++++++++++++++
> >>   1 file changed, 20 insertions(+)
> >
> > This looks good to me. Do you have a practical way of testing this on
> > QEMU for instance?
> >
>
> Jens,
>
>    I could not reproduce nor create a setup using QEMU, I could only
> do it on a real h/w.
>
>    I have extensively tested the fix and I don't see any issues.

I did a few test runs too, seems OK.

Thanks,
Jens

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-03-19  7:00         ` Jens Wiklander
@ 2021-03-22  7:59           ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-03-22  7:59 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware


>>>>
>>>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
>>>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
>>>>
>>>> tee_shm_release() is not invoked on dma shm buffer.
>>>>
>>>> Implement .shutdown() method to handle the release of the buffers
>>>> correctly.
>>>>
>>>> More info:
>>>> https://github.com/OP-TEE/optee_os/issues/3637
>>>>
>>>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
>>>> ---
>>>>    drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>>>>    1 file changed, 20 insertions(+)
>>>
>>> This looks good to me. Do you have a practical way of testing this on
>>> QEMU for instance?
>>>
>>
>> Jens,
>>
>>     I could not reproduce nor create a setup using QEMU, I could only
>> do it on a real h/w.
>>
>>     I have extensively tested the fix and I don't see any issues.
> 
> I did a few test runs too, seems OK.

Thank you very much.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-03-19  7:00         ` Jens Wiklander
@ 2021-05-05 13:45           ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-05-05 13:45 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware

Jens, 

>>>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
>>>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
>>>> 
>>>> tee_shm_release() is not invoked on dma shm buffer.
>>>> 
>>>> Implement .shutdown() method to handle the release of the buffers
>>>> correctly.
>>>> 
>>>> More info:
>>>> https://github.com/OP-TEE/optee_os/issues/3637
>>>> 
>>>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
>>>> ---
>>>>  drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>>>>  1 file changed, 20 insertions(+)
>>> 
>>> This looks good to me. Do you have a practical way of testing this on
>>> QEMU for instance?
>>> 
>> 
>> Jens,
>> 
>>   I could not reproduce nor create a setup using QEMU, I could only
>> do it on a real h/w.
>> 
>>   I have extensively tested the fix and I don't see any issues.
> 
> I did a few test runs too, seems OK.

I carried these changes and have not run into any issues with kexec so far.
Last week, while trying out kdump, we ran into a crash (this is when the
kdump kernel reboots).

$ echo c > /proc/sysrq-trigger

Leads to:

[   18.004831] Unable to handle kernel paging request at virtual address ffff0008dcef6758
[   18.013002] Mem abort info:
[   18.015885]   ESR = 0x96000005
[   18.019034]   EC = 0x25: DABT (current EL), IL = 32 bits
[   18.024516]   SET = 0, FnV = 0
[   18.027667]   EA = 0, S1PTW = 0
[   18.030905] Data abort info:
[   18.033877]   ISV = 0, ISS = 0x00000005
[   18.037835]   CM = 0, WnR = 0
[   18.040896] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970a78000
[   18.047811] [ffff0008dcef6758] pgd=000000097fbf9003, pud=0000000000000000
[   18.054819] Internal error: Oops: 96000005 [#1] SMP
[   18.059850] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
[   18.067395] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.4.83-microsoft-standard #1
[   18.077174] Hardware name: Overlake (DT)
[   18.081219] pstate: 80400005 (Nzcv daif +PAN -UAO)
[   18.086170] pc : tee_shm_free+0x18/0x48
[   18.090126] lr : optee_disable_shm_cache+0xa4/0xf0
[   18.095066] sp : ffff80001005bb90
[   18.098484] x29: ffff80001005bb90 x28: ffff000037e20000 
[   18.103962] x27: 0000000000000000 x26: ffff00003ed10490 
[   18.109440] x25: ffffca760e975f90 x24: 0000000000000000 
[   18.114918] x23: ffffca760ed79808 x22: ffff00003ec66e18 
[   18.120396] x21: ffff80001005bc08 x20: 00000000b200000a 
[   18.125874] x19: ffff0008dcef6700 x18: 0000000000000010 
[   18.131352] x17: 0000000000000000 x16: 0000000000000000 
[   18.136829] x15: ffffffffffffffff x14: ffffca760ed79808 
[   18.142307] x13: ffff80009005b897 x12: ffff80001005b89f 
[   18.147786] x11: ffffca760eda4000 x10: ffff80001005b820 
[   18.153264] x9 : 00000000ffffffd0 x8 : ffffca760e59b2c0 
[   18.158742] x7 : 0000000000000000 x6 : 0000000000000000 
[   18.164220] x5 : 0000000000000000 x4 : 0000000000000000 
[   18.169698] x3 : 0000000000000000 x2 : ffff0008dcef6700 
[   18.175175] x1 : 00000000ffff0008 x0 : ffffca760e59ca04 
[   18.180654] Call trace:
[   18.183176]  tee_shm_free+0x18/0x48
[   18.186773]  optee_disable_shm_cache+0xa4/0xf0
[   18.191356]  optee_shutdown+0x20/0x30
[   18.195135]  platform_drv_shutdown+0x2c/0x38
[   18.199538]  device_shutdown+0x180/0x298
[   18.203586]  kernel_restart_prepare+0x44/0x50
[   18.208078]  kernel_restart+0x20/0x68
[   18.211853]  __do_sys_reboot+0x104/0x258
[   18.215899]  __arm64_sys_reboot+0x2c/0x38
[   18.220035]  el0_svc_handler+0x90/0x138
[   18.223991]  el0_svc+0x8/0x208
[   18.227143] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60) 
[   18.233435] ---[ end trace 835d756cd66aa959 ]---
[   18.238621] Kernel panic - not syncing: Fatal exception
[   18.244014] Kernel Offset: 0x4a75fde00000 from 0xffff800010000000
[   18.250299] PHYS_OFFSET: 0xffff99c680000000
[   18.254613] CPU features: 0x0002,21806008
[   18.258747] Memory Limit: none
[   18.262310] ---[ end Kernel panic - not syncing: Fatal exception ]---

I see that before secure world returns OPTEE_SMC_RETURN_ENOTAVAIL (which
should disable and clear all the cache), we run into the crash trying to free shm.
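
For reference, the shutdown path in the trace above can be pictured as a
drain loop: the driver keeps asking secure world for cached shm buffers and
frees each one until OPTEE_SMC_RETURN_ENOTAVAIL comes back. What follows is
only a user-space model of that idea, not the driver code. Every mock_* name
is hypothetical, and the real loop also has to wait and retry when secure
world is busy, which is left out here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two status codes that matter here. */
enum mock_status { MOCK_RETURN_OK, MOCK_RETURN_ENOTAVAIL };

/* Hypothetical model of secure world's cache of shm cookies. */
struct mock_secure_world {
	void *cached[4];
	size_t count;
};

/* Counts how many buffers the model "freed". */
size_t mock_freed;

/* Stand-in for the free routine: a real one would dereference shm. */
void mock_count_free(void *shm)
{
	(void)shm;
	mock_freed++;
}

/*
 * Model of one "disable shm cache" call: hand back one cached cookie,
 * or report ENOTAVAIL once the cache is empty.
 */
enum mock_status mock_disable_shm_cache(struct mock_secure_world *sw,
					void **shm)
{
	if (sw->count == 0)
		return MOCK_RETURN_ENOTAVAIL;
	*shm = sw->cached[--sw->count];
	return MOCK_RETURN_OK;
}

/* Drain the cache, freeing every cookie; returns how many were freed. */
size_t mock_drain_shm_cache(struct mock_secure_world *sw,
			    void (*free_fn)(void *))
{
	size_t n = 0;
	void *shm;

	while (mock_disable_shm_cache(sw, &shm) == MOCK_RETURN_OK) {
		free_fn(shm);	/* a stale cookie would fault here */
		n++;
	}
	return n;
}
```

In this model, a cookie that is not a valid pointer in the running kernel is
handed to the free callback before the ENOTAVAIL case is ever reached, which
is consistent with the trace above.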

Thoughts?

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-05-05 13:45           ` Allen Pais
@ 2021-05-06  7:02             ` Jens Wiklander
  -1 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-05-06  7:02 UTC (permalink / raw)
  To: Allen Pais
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware

On Wed, May 5, 2021 at 3:45 PM Allen Pais <apais@linux.microsoft.com> wrote:
>
> Jens,
>
> >>>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
> >>>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
> >>>>
> >>>> tee_shm_release() is not invoked on dma shm buffer.
> >>>>
> >>>> Implement .shutdown() method to handle the release of the buffers
> >>>> correctly.
> >>>>
> >>>> More info:
> >>>> https://github.com/OP-TEE/optee_os/issues/3637
> >>>>
> >>>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> >>>> ---
> >>>>  drivers/tee/optee/core.c | 20 ++++++++++++++++++++
> >>>>  1 file changed, 20 insertions(+)
> >>>
> >>> This looks good to me. Do you have a practical way of testing this on
> >>> QEMU for instance?
> >>>
> >>
> >> Jens,
> >>
> >>   I could not reproduce nor create a setup using QEMU, I could only
> >> do it on a real h/w.
> >>
> >>   I have extensively tested the fix and I don't see any issues.
> >
> > I did a few test runs too, seems OK.
>
>  I carried these changes and have not run into any issues with Kexec so far.
> Last week, while trying out kdump, we ran into a crash(this is when the
> Kdump kernel reboots).
>
> $echo c > /proc/sysrq-trigger
>
> Leads to:
>
> [   18.004831] Unable to handle kernel paging request at virtual address ffff0008dcef6758
> [   18.013002] Mem abort info:
> [   18.015885]   ESR = 0x96000005
> [   18.019034]   EC = 0x25: DABT (current EL), IL = 32 bits
> [   18.024516]   SET = 0, FnV = 0
> [   18.027667]   EA = 0, S1PTW = 0
> [   18.030905] Data abort info:
> [   18.033877]   ISV = 0, ISS = 0x00000005
> [   18.037835]   CM = 0, WnR = 0
> [   18.040896] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970a78000
> [   18.047811] [ffff0008dcef6758] pgd=000000097fbf9003, pud=0000000000000000
> [   18.054819] Internal error: Oops: 96000005 [#1] SMP
> [   18.059850] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> [   18.067395] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.4.83-microsoft-standard #1
> [   18.077174] Hardware name: Overlake (DT)
> [   18.081219] pstate: 80400005 (Nzcv daif +PAN -UAO)
> [   18.086170] pc : tee_shm_free+0x18/0x48
> [   18.090126] lr : optee_disable_shm_cache+0xa4/0xf0
> [   18.095066] sp : ffff80001005bb90
> [   18.098484] x29: ffff80001005bb90 x28: ffff000037e20000
> [   18.103962] x27: 0000000000000000 x26: ffff00003ed10490
> [   18.109440] x25: ffffca760e975f90 x24: 0000000000000000
> [   18.114918] x23: ffffca760ed79808 x22: ffff00003ec66e18
> [   18.120396] x21: ffff80001005bc08 x20: 00000000b200000a
> [   18.125874] x19: ffff0008dcef6700 x18: 0000000000000010
> [   18.131352] x17: 0000000000000000 x16: 0000000000000000
> [   18.136829] x15: ffffffffffffffff x14: ffffca760ed79808
> [   18.142307] x13: ffff80009005b897 x12: ffff80001005b89f
> [   18.147786] x11: ffffca760eda4000 x10: ffff80001005b820
> [   18.153264] x9 : 00000000ffffffd0 x8 : ffffca760e59b2c0
> [   18.158742] x7 : 0000000000000000 x6 : 0000000000000000
> [   18.164220] x5 : 0000000000000000 x4 : 0000000000000000
> [   18.169698] x3 : 0000000000000000 x2 : ffff0008dcef6700
> [   18.175175] x1 : 00000000ffff0008 x0 : ffffca760e59ca04
> [   18.180654] Call trace:
> [   18.183176]  tee_shm_free+0x18/0x48
> [   18.186773]  optee_disable_shm_cache+0xa4/0xf0
> [   18.191356]  optee_shutdown+0x20/0x30
> [   18.195135]  platform_drv_shutdown+0x2c/0x38
> [   18.199538]  device_shutdown+0x180/0x298
> [   18.203586]  kernel_restart_prepare+0x44/0x50
> [   18.208078]  kernel_restart+0x20/0x68
> [   18.211853]  __do_sys_reboot+0x104/0x258
> [   18.215899]  __arm64_sys_reboot+0x2c/0x38
> [   18.220035]  el0_svc_handler+0x90/0x138
> [   18.223991]  el0_svc+0x8/0x208
> [   18.227143] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60)
> [   18.233435] ---[ end trace 835d756cd66aa959 ]---
> [   18.238621] Kernel panic - not syncing: Fatal exception
> [   18.244014] Kernel Offset: 0x4a75fde00000 from 0xffff800010000000
> [   18.250299] PHYS_OFFSET: 0xffff99c680000000
> [   18.254613] CPU features: 0x0002,21806008
> [   18.258747] Memory Limit: none
> [   18.262310] ---[ end Kernel panic - not syncing: Fatal exception ]---
>
> I see that before secure world returns OPTEE_SMC_RETURN_ENOTAVAIL(which
> Should disable and clear all the cache) we run into the crash trying to free shm.
>
> Thoughts?

It seems that the pointer is invalid, but the pointer doesn't look
like garbage. Could the kernel have unmapped the memory area covering
that address?
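
One way to see which pointer faulted is to decode the instruction shown in
parentheses on the "Code:" line of the oops. The sketch below is a
stand-alone helper, not kernel code, and its struct and function names are
made up for illustration. It decodes the AArch64 "LDR Wt, [Xn, #imm]"
unsigned-offset encoding and adds the offset to the base register value
taken from the register dump.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an AArch64 "LDR Wt, [Xn, #imm]" (32-bit, unsigned offset). */
struct ldr_w_uimm {
	unsigned rt;		/* destination register number */
	unsigned rn;		/* base register number */
	unsigned offset;	/* byte offset: imm12 scaled by 4 */
};

/* Returns 1 and fills *out if insn has that encoding, 0 otherwise. */
int decode_ldr_w_uimm(uint32_t insn, struct ldr_w_uimm *out)
{
	/* Bits 31..22 are fixed at 0b1011100101 for this encoding. */
	if ((insn & 0xffc00000u) != 0xb9400000u)
		return 0;
	out->rt = insn & 0x1f;
	out->rn = (insn >> 5) & 0x1f;
	out->offset = ((insn >> 10) & 0xfff) * 4;
	return 1;
}

/* Address the load touches, given the base register's value. */
uint64_t fault_address(uint64_t base, const struct ldr_w_uimm *d)
{
	return base + d->offset;
}
```

With 0xb9405a60 this decodes as ldr w0, [x19, #0x58], and x19
(ffff0008dcef6700) plus 0x58 equals the reported fault address
ffff0008dcef6758. The earlier word in the Code: line, aa0003f3, is
mov x19, x0, so x19 holds the first argument of tee_shm_free() and the
faulting access is a field read through that pointer.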

Cheers,
Jens

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-05-06  7:02             ` Jens Wiklander
@ 2021-05-06  7:10               ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-05-06  7:10 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware


>>>>>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
>>>>>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
>>>>>> 
>>>>>> tee_shm_release() is not invoked on dma shm buffer.
>>>>>> 
>>>>>> Implement .shutdown() method to handle the release of the buffers
>>>>>> correctly.
>>>>>> 
>>>>>> More info:
>>>>>> https://github.com/OP-TEE/optee_os/issues/3637
>>>>>> 
>>>>>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
>>>>>> ---
>>>>>> drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>>>>>> 1 file changed, 20 insertions(+)
>>>>> 
>>>>> This looks good to me. Do you have a practical way of testing this on
>>>>> QEMU for instance?
>>>>> 
>>>> 
>>>> Jens,
>>>> 
>>>>  I could not reproduce nor create a setup using QEMU, I could only
>>>> do it on a real h/w.
>>>> 
>>>>  I have extensively tested the fix and I don't see any issues.
>>> 
>>> I did a few test runs too, seems OK.
>> 
>> I carried these changes and have not run into any issues with Kexec so far.
>> Last week, while trying out kdump, we ran into a crash(this is when the
>> Kdump kernel reboots).
>> 
>> $echo c > /proc/sysrq-trigger
>> 
>> Leads to:
>> 
>> [   18.004831] Unable to handle kernel paging request at virtual address ffff0008dcef6758
>> [   18.013002] Mem abort info:
>> [   18.015885]   ESR = 0x96000005
>> [   18.019034]   EC = 0x25: DABT (current EL), IL = 32 bits
>> [   18.024516]   SET = 0, FnV = 0
>> [   18.027667]   EA = 0, S1PTW = 0
>> [   18.030905] Data abort info:
>> [   18.033877]   ISV = 0, ISS = 0x00000005
>> [   18.037835]   CM = 0, WnR = 0
>> [   18.040896] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970a78000
>> [   18.047811] [ffff0008dcef6758] pgd=000000097fbf9003, pud=0000000000000000
>> [   18.054819] Internal error: Oops: 96000005 [#1] SMP
>> [   18.059850] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
>> [   18.067395] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.4.83-microsoft-standard #1
>> [   18.077174] Hardware name: Overlake (DT)
>> [   18.081219] pstate: 80400005 (Nzcv daif +PAN -UAO)
>> [   18.086170] pc : tee_shm_free+0x18/0x48
>> [   18.090126] lr : optee_disable_shm_cache+0xa4/0xf0
>> [   18.095066] sp : ffff80001005bb90
>> [   18.098484] x29: ffff80001005bb90 x28: ffff000037e20000
>> [   18.103962] x27: 0000000000000000 x26: ffff00003ed10490
>> [   18.109440] x25: ffffca760e975f90 x24: 0000000000000000
>> [   18.114918] x23: ffffca760ed79808 x22: ffff00003ec66e18
>> [   18.120396] x21: ffff80001005bc08 x20: 00000000b200000a
>> [   18.125874] x19: ffff0008dcef6700 x18: 0000000000000010
>> [   18.131352] x17: 0000000000000000 x16: 0000000000000000
>> [   18.136829] x15: ffffffffffffffff x14: ffffca760ed79808
>> [   18.142307] x13: ffff80009005b897 x12: ffff80001005b89f
>> [   18.147786] x11: ffffca760eda4000 x10: ffff80001005b820
>> [   18.153264] x9 : 00000000ffffffd0 x8 : ffffca760e59b2c0
>> [   18.158742] x7 : 0000000000000000 x6 : 0000000000000000
>> [   18.164220] x5 : 0000000000000000 x4 : 0000000000000000
>> [   18.169698] x3 : 0000000000000000 x2 : ffff0008dcef6700
>> [   18.175175] x1 : 00000000ffff0008 x0 : ffffca760e59ca04
>> [   18.180654] Call trace:
>> [   18.183176]  tee_shm_free+0x18/0x48
>> [   18.186773]  optee_disable_shm_cache+0xa4/0xf0
>> [   18.191356]  optee_shutdown+0x20/0x30
>> [   18.195135]  platform_drv_shutdown+0x2c/0x38
>> [   18.199538]  device_shutdown+0x180/0x298
>> [   18.203586]  kernel_restart_prepare+0x44/0x50
>> [   18.208078]  kernel_restart+0x20/0x68
>> [   18.211853]  __do_sys_reboot+0x104/0x258
>> [   18.215899]  __arm64_sys_reboot+0x2c/0x38
>> [   18.220035]  el0_svc_handler+0x90/0x138
>> [   18.223991]  el0_svc+0x8/0x208
>> [   18.227143] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60)
>> [   18.233435] ---[ end trace 835d756cd66aa959 ]---
>> [   18.238621] Kernel panic - not syncing: Fatal exception
>> [   18.244014] Kernel Offset: 0x4a75fde00000 from 0xffff800010000000
>> [   18.250299] PHYS_OFFSET: 0xffff99c680000000
>> [   18.254613] CPU features: 0x0002,21806008
>> [   18.258747] Memory Limit: none
>> [   18.262310] ---[ end Kernel panic - not syncing: Fatal exception ]---
>> 
>> I see that before secure world returns OPTEE_SMC_RETURN_ENOTAVAIL(which
>> Should disable and clear all the cache) we run into the crash trying to free shm.
>> 
>> Thoughts?
> 
> It seems that the pointer is invalid, but the pointer doesn't look
> like garbage. Could the kernel have unmapped the memory area covering
> that address?
> 

 Yes, but I am not entirely sure whether the kernel had time to unmap the memory.
Right after triggering the crash, the kdump kernel boots and I see the following:

[ 2.050145] optee: probing for conduit method. 
[ 2.054743] optee: revision 3.6 (f84427aa) 
[ 2.054821] optee: dynamic shared memory is enabled 
[ 2.066186] optee: initialized driver 

Could this be previous un-released maps causing corruption?

Thanks.

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-05-06  7:10               ` Allen Pais
@ 2021-05-06  7:19                 ` Jens Wiklander
  -1 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-05-06  7:19 UTC (permalink / raw)
  To: Allen Pais
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware

On Thu, May 6, 2021 at 9:10 AM Allen Pais <apais@linux.microsoft.com> wrote:
>
>
> >>>>>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
> >>>>>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
> >>>>>>
> >>>>>> tee_shm_release() is not invoked on dma shm buffer.
> >>>>>>
> >>>>>> Implement .shutdown() method to handle the release of the buffers
> >>>>>> correctly.
> >>>>>>
> >>>>>> More info:
> >>>>>> https://github.com/OP-TEE/optee_os/issues/3637
> >>>>>>
> >>>>>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> >>>>>> ---
> >>>>>> drivers/tee/optee/core.c | 20 ++++++++++++++++++++
> >>>>>> 1 file changed, 20 insertions(+)
> >>>>>
> >>>>> This looks good to me. Do you have a practical way of testing this on
> >>>>> QEMU for instance?
> >>>>>
> >>>>
> >>>> Jens,
> >>>>
> >>>>  I could not reproduce nor create a setup using QEMU, I could only
> >>>> do it on a real h/w.
> >>>>
> >>>>  I have extensively tested the fix and I don't see any issues.
> >>>
> >>> I did a few test runs too, seems OK.
> >>
> >> I carried these changes and have not run into any issues with Kexec so far.
> >> Last week, while trying out kdump, we ran into a crash(this is when the
> >> Kdump kernel reboots).
> >>
> >> $echo c > /proc/sysrq-trigger
> >>
> >> Leads to:
> >>
> >> [   18.004831] Unable to handle kernel paging request at virtual address ffff0008dcef6758
> >> [   18.013002] Mem abort info:
> >> [   18.015885]   ESR = 0x96000005
> >> [   18.019034]   EC = 0x25: DABT (current EL), IL = 32 bits
> >> [   18.024516]   SET = 0, FnV = 0
> >> [   18.027667]   EA = 0, S1PTW = 0
> >> [   18.030905] Data abort info:
> >> [   18.033877]   ISV = 0, ISS = 0x00000005
> >> [   18.037835]   CM = 0, WnR = 0
> >> [   18.040896] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970a78000
> >> [   18.047811] [ffff0008dcef6758] pgd=000000097fbf9003, pud=0000000000000000
> >> [   18.054819] Internal error: Oops: 96000005 [#1] SMP
> >> [   18.059850] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> >> [   18.067395] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.4.83-microsoft-standard #1
> >> [   18.077174] Hardware name: Overlake (DT)
> >> [   18.081219] pstate: 80400005 (Nzcv daif +PAN -UAO)
> >> [   18.086170] pc : tee_shm_free+0x18/0x48
> >> [   18.090126] lr : optee_disable_shm_cache+0xa4/0xf0
> >> [   18.095066] sp : ffff80001005bb90
> >> [   18.098484] x29: ffff80001005bb90 x28: ffff000037e20000
> >> [   18.103962] x27: 0000000000000000 x26: ffff00003ed10490
> >> [   18.109440] x25: ffffca760e975f90 x24: 0000000000000000
> >> [   18.114918] x23: ffffca760ed79808 x22: ffff00003ec66e18
> >> [   18.120396] x21: ffff80001005bc08 x20: 00000000b200000a
> >> [   18.125874] x19: ffff0008dcef6700 x18: 0000000000000010
> >> [   18.131352] x17: 0000000000000000 x16: 0000000000000000
> >> [   18.136829] x15: ffffffffffffffff x14: ffffca760ed79808
> >> [   18.142307] x13: ffff80009005b897 x12: ffff80001005b89f
> >> [   18.147786] x11: ffffca760eda4000 x10: ffff80001005b820
> >> [   18.153264] x9 : 00000000ffffffd0 x8 : ffffca760e59b2c0
> >> [   18.158742] x7 : 0000000000000000 x6 : 0000000000000000
> >> [   18.164220] x5 : 0000000000000000 x4 : 0000000000000000
> >> [   18.169698] x3 : 0000000000000000 x2 : ffff0008dcef6700
> >> [   18.175175] x1 : 00000000ffff0008 x0 : ffffca760e59ca04
> >> [   18.180654] Call trace:
> >> [   18.183176]  tee_shm_free+0x18/0x48
> >> [   18.186773]  optee_disable_shm_cache+0xa4/0xf0
> >> [   18.191356]  optee_shutdown+0x20/0x30
> >> [   18.195135]  platform_drv_shutdown+0x2c/0x38
> >> [   18.199538]  device_shutdown+0x180/0x298
> >> [   18.203586]  kernel_restart_prepare+0x44/0x50
> >> [   18.208078]  kernel_restart+0x20/0x68
> >> [   18.211853]  __do_sys_reboot+0x104/0x258
> >> [   18.215899]  __arm64_sys_reboot+0x2c/0x38
> >> [   18.220035]  el0_svc_handler+0x90/0x138
> >> [   18.223991]  el0_svc+0x8/0x208
> >> [   18.227143] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60)
> >> [   18.233435] ---[ end trace 835d756cd66aa959 ]---
> >> [   18.238621] Kernel panic - not syncing: Fatal exception
> >> [   18.244014] Kernel Offset: 0x4a75fde00000 from 0xffff800010000000
> >> [   18.250299] PHYS_OFFSET: 0xffff99c680000000
> >> [   18.254613] CPU features: 0x0002,21806008
> >> [   18.258747] Memory Limit: none
> >> [   18.262310] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >>
> >> I see that before secure world returns OPTEE_SMC_RETURN_ENOTAVAIL(which
> >> Should disable and clear all the cache) we run into the crash trying to free shm.
> >>
> >> Thoughts?
> >
> > It seems that the pointer is invalid, but the pointer doesn't look
> > like garbage. Could the kernel have unmapped the memory area covering
> > that address?
> >
>
>  Yes, I am not entirely sure if the kernel had the time to unmap the memory.
> Right after triggering the crash the kdump kernel is booted and I see the following
>
> [ 2.050145] optee: probing for conduit method.
> [ 2.054743] optee: revision 3.6 (f84427aa)
> [ 2.054821] optee: dynamic shared memory is enabled
> [ 2.066186] optee: initialized driver
>
> Could this be previous un-released maps causing corruption?

Aha, yes, that could be it.

Cheers,
Jens

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-05-06  7:19                 ` Jens Wiklander
@ 2021-05-06  7:29                   ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-05-06  7:29 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware

>> 
>>>>>>>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
>>>>>>>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
>>>>>>>> 
>>>>>>>> tee_shm_release() is not invoked on dma shm buffer.
>>>>>>>> 
>>>>>>>> Implement .shutdown() method to handle the release of the buffers
>>>>>>>> correctly.
>>>>>>>> 
>>>>>>>> More info:
>>>>>>>> https://github.com/OP-TEE/optee_os/issues/3637
>>>>>>>> 
>>>>>>>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
>>>>>>>> ---
>>>>>>>> drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>>>>>>>> 1 file changed, 20 insertions(+)
>>>>>>> 
>>>>>>> This looks good to me. Do you have a practical way of testing this on
>>>>>>> QEMU for instance?
>>>>>>> 
>>>>>> 
>>>>>> Jens,
>>>>>> 
>>>>>> I could not reproduce nor create a setup using QEMU, I could only
>>>>>> do it on a real h/w.
>>>>>> 
>>>>>> I have extensively tested the fix and I don't see any issues.
>>>>> 
>>>>> I did a few test runs too, seems OK.
>>>> 
>>>> I carried these changes and have not run into any issues with Kexec so far.
>>>> Last week, while trying out kdump, we ran into a crash(this is when the
>>>> Kdump kernel reboots).
>>>> 
>>>> $echo c > /proc/sysrq-trigger
>>>> 
>>>> Leads to:
>>>> 
>>>> [   18.004831] Unable to handle kernel paging request at virtual address ffff0008dcef6758
>>>> [   18.013002] Mem abort info:
>>>> [   18.015885]   ESR = 0x96000005
>>>> [   18.019034]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>> [   18.024516]   SET = 0, FnV = 0
>>>> [   18.027667]   EA = 0, S1PTW = 0
>>>> [   18.030905] Data abort info:
>>>> [   18.033877]   ISV = 0, ISS = 0x00000005
>>>> [   18.037835]   CM = 0, WnR = 0
>>>> [   18.040896] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970a78000
>>>> [   18.047811] [ffff0008dcef6758] pgd=000000097fbf9003, pud=0000000000000000
>>>> [   18.054819] Internal error: Oops: 96000005 [#1] SMP
>>>> [   18.059850] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
>>>> [   18.067395] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.4.83-microsoft-standard #1
>>>> [   18.077174] Hardware name: Overlake (DT)
>>>> [   18.081219] pstate: 80400005 (Nzcv daif +PAN -UAO)
>>>> [   18.086170] pc : tee_shm_free+0x18/0x48
>>>> [   18.090126] lr : optee_disable_shm_cache+0xa4/0xf0
>>>> [   18.095066] sp : ffff80001005bb90
>>>> [   18.098484] x29: ffff80001005bb90 x28: ffff000037e20000
>>>> [   18.103962] x27: 0000000000000000 x26: ffff00003ed10490
>>>> [   18.109440] x25: ffffca760e975f90 x24: 0000000000000000
>>>> [   18.114918] x23: ffffca760ed79808 x22: ffff00003ec66e18
>>>> [   18.120396] x21: ffff80001005bc08 x20: 00000000b200000a
>>>> [   18.125874] x19: ffff0008dcef6700 x18: 0000000000000010
>>>> [   18.131352] x17: 0000000000000000 x16: 0000000000000000
>>>> [   18.136829] x15: ffffffffffffffff x14: ffffca760ed79808
>>>> [   18.142307] x13: ffff80009005b897 x12: ffff80001005b89f
>>>> [   18.147786] x11: ffffca760eda4000 x10: ffff80001005b820
>>>> [   18.153264] x9 : 00000000ffffffd0 x8 : ffffca760e59b2c0
>>>> [   18.158742] x7 : 0000000000000000 x6 : 0000000000000000
>>>> [   18.164220] x5 : 0000000000000000 x4 : 0000000000000000
>>>> [   18.169698] x3 : 0000000000000000 x2 : ffff0008dcef6700
>>>> [   18.175175] x1 : 00000000ffff0008 x0 : ffffca760e59ca04
>>>> [   18.180654] Call trace:
>>>> [   18.183176]  tee_shm_free+0x18/0x48
>>>> [   18.186773]  optee_disable_shm_cache+0xa4/0xf0
>>>> [   18.191356]  optee_shutdown+0x20/0x30
>>>> [   18.195135]  platform_drv_shutdown+0x2c/0x38
>>>> [   18.199538]  device_shutdown+0x180/0x298
>>>> [   18.203586]  kernel_restart_prepare+0x44/0x50
>>>> [   18.208078]  kernel_restart+0x20/0x68
>>>> [   18.211853]  __do_sys_reboot+0x104/0x258
>>>> [   18.215899]  __arm64_sys_reboot+0x2c/0x38
>>>> [   18.220035]  el0_svc_handler+0x90/0x138
>>>> [   18.223991]  el0_svc+0x8/0x208
>>>> [   18.227143] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60)
>>>> [   18.233435] ---[ end trace 835d756cd66aa959 ]---
>>>> [   18.238621] Kernel panic - not syncing: Fatal exception
>>>> [   18.244014] Kernel Offset: 0x4a75fde00000 from 0xffff800010000000
>>>> [   18.250299] PHYS_OFFSET: 0xffff99c680000000
>>>> [   18.254613] CPU features: 0x0002,21806008
>>>> [   18.258747] Memory Limit: none
>>>> [   18.262310] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>> 
>>>> I see that before secure world returns OPTEE_SMC_RETURN_ENOTAVAIL(which
>>>> Should disable and clear all the cache) we run into the crash trying to free shm.
>>>> 
>>>> Thoughts?
>>> 
>>> It seems that the pointer is invalid, but the pointer doesn't look
>>> like garbage. Could the kernel have unmapped the memory area covering
>>> that address?
>>> 
>> 
>> Yes, I am not entirely sure if the kernel had the time to unmap the memory.
>> Right after triggering the crash the kdump kernel is booted and I see the following
>> 
>> [ 2.050145] optee: probing for conduit method.
>> [ 2.054743] optee: revision 3.6 (f84427aa)
>> [ 2.054821] optee: dynamic shared memory is enabled
>> [ 2.066186] optee: initialized driver
>> 
>> Could this be previous un-released maps causing corruption?
> 
> Aha, yes, that could be it.
> 

How about checking for the ptr?

diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
index aadedec3bfe7..8dc4fe9a1588 100644
--- a/drivers/tee/optee/call.c
+++ b/drivers/tee/optee/call.c
@@ -426,10 +426,12 @@ void optee_disable_shm_cache(struct optee *optee)
                if (res.result.status == OPTEE_SMC_RETURN_ENOTAVAIL)
                        break; /* All shm's freed */
                if (res.result.status == OPTEE_SMC_RETURN_OK) {
-                       struct tee_shm *shm;
+                       struct tee_shm *shm = NULL;
 
                        shm = reg_pair_to_ptr(res.result.shm_upper32,
                                              res.result.shm_lower32);
+                       if (IS_ERR(shm))
+                               return PTR_ERR(shm);
                        tee_shm_free(shm);

Thanks.

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-05-06  7:29                   ` Allen Pais
@ 2021-05-06  8:15                     ` Jens Wiklander
  -1 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-05-06  8:15 UTC (permalink / raw)
  To: Allen Pais
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware

On Thu, May 6, 2021 at 9:29 AM Allen Pais <apais@linux.microsoft.com> wrote:
>
> >>
> >>>>>>>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
> >>>>>>>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
> >>>>>>>>
> >>>>>>>> tee_shm_release() is not invoked on dma shm buffer.
> >>>>>>>>
> >>>>>>>> Implement .shutdown() method to handle the release of the buffers
> >>>>>>>> correctly.
> >>>>>>>>
> >>>>>>>> More info:
> >>>>>>>> https://github.com/OP-TEE/optee_os/issues/3637
> >>>>>>>>
> >>>>>>>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
> >>>>>>>> ---
> >>>>>>>> drivers/tee/optee/core.c | 20 ++++++++++++++++++++
> >>>>>>>> 1 file changed, 20 insertions(+)
> >>>>>>>
> >>>>>>> This looks good to me. Do you have a practical way of testing this on
> >>>>>>> QEMU for instance?
> >>>>>>>
> >>>>>>
> >>>>>> Jens,
> >>>>>>
> >>>>>> I could not reproduce nor create a setup using QEMU, I could only
> >>>>>> do it on a real h/w.
> >>>>>>
> >>>>>> I have extensively tested the fix and I don't see any issues.
> >>>>>
> >>>>> I did a few test runs too, seems OK.
> >>>>
> >>>> I carried these changes and have not run into any issues with Kexec so far.
> >>>> Last week, while trying out kdump, we ran into a crash(this is when the
> >>>> Kdump kernel reboots).
> >>>>
> >>>> $echo c > /proc/sysrq-trigger
> >>>>
> >>>> Leads to:
> >>>>
> >>>> [   18.004831] Unable to handle kernel paging request at virtual address ffff0008dcef6758
> >>>> [   18.013002] Mem abort info:
> >>>> [   18.015885]   ESR = 0x96000005
> >>>> [   18.019034]   EC = 0x25: DABT (current EL), IL = 32 bits
> >>>> [   18.024516]   SET = 0, FnV = 0
> >>>> [   18.027667]   EA = 0, S1PTW = 0
> >>>> [   18.030905] Data abort info:
> >>>> [   18.033877]   ISV = 0, ISS = 0x00000005
> >>>> [   18.037835]   CM = 0, WnR = 0
> >>>> [   18.040896] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970a78000
> >>>> [   18.047811] [ffff0008dcef6758] pgd=000000097fbf9003, pud=0000000000000000
> >>>> [   18.054819] Internal error: Oops: 96000005 [#1] SMP
> >>>> [   18.059850] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> >>>> [   18.067395] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.4.83-microsoft-standard #1
> >>>> [   18.077174] Hardware name: Overlake (DT)
> >>>> [   18.081219] pstate: 80400005 (Nzcv daif +PAN -UAO)
> >>>> [   18.086170] pc : tee_shm_free+0x18/0x48
> >>>> [   18.090126] lr : optee_disable_shm_cache+0xa4/0xf0
> >>>> [   18.095066] sp : ffff80001005bb90
> >>>> [   18.098484] x29: ffff80001005bb90 x28: ffff000037e20000
> >>>> [   18.103962] x27: 0000000000000000 x26: ffff00003ed10490
> >>>> [   18.109440] x25: ffffca760e975f90 x24: 0000000000000000
> >>>> [   18.114918] x23: ffffca760ed79808 x22: ffff00003ec66e18
> >>>> [   18.120396] x21: ffff80001005bc08 x20: 00000000b200000a
> >>>> [   18.125874] x19: ffff0008dcef6700 x18: 0000000000000010
> >>>> [   18.131352] x17: 0000000000000000 x16: 0000000000000000
> >>>> [   18.136829] x15: ffffffffffffffff x14: ffffca760ed79808
> >>>> [   18.142307] x13: ffff80009005b897 x12: ffff80001005b89f
> >>>> [   18.147786] x11: ffffca760eda4000 x10: ffff80001005b820
> >>>> [   18.153264] x9 : 00000000ffffffd0 x8 : ffffca760e59b2c0
> >>>> [   18.158742] x7 : 0000000000000000 x6 : 0000000000000000
> >>>> [   18.164220] x5 : 0000000000000000 x4 : 0000000000000000
> >>>> [   18.169698] x3 : 0000000000000000 x2 : ffff0008dcef6700
> >>>> [   18.175175] x1 : 00000000ffff0008 x0 : ffffca760e59ca04
> >>>> [   18.180654] Call trace:
> >>>> [   18.183176]  tee_shm_free+0x18/0x48
> >>>> [   18.186773]  optee_disable_shm_cache+0xa4/0xf0
> >>>> [   18.191356]  optee_shutdown+0x20/0x30
> >>>> [   18.195135]  platform_drv_shutdown+0x2c/0x38
> >>>> [   18.199538]  device_shutdown+0x180/0x298
> >>>> [   18.203586]  kernel_restart_prepare+0x44/0x50
> >>>> [   18.208078]  kernel_restart+0x20/0x68
> >>>> [   18.211853]  __do_sys_reboot+0x104/0x258
> >>>> [   18.215899]  __arm64_sys_reboot+0x2c/0x38
> >>>> [   18.220035]  el0_svc_handler+0x90/0x138
> >>>> [   18.223991]  el0_svc+0x8/0x208
> >>>> [   18.227143] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60)
> >>>> [   18.233435] ---[ end trace 835d756cd66aa959 ]---
> >>>> [   18.238621] Kernel panic - not syncing: Fatal exception
> >>>> [   18.244014] Kernel Offset: 0x4a75fde00000 from 0xffff800010000000
> >>>> [   18.250299] PHYS_OFFSET: 0xffff99c680000000
> >>>> [   18.254613] CPU features: 0x0002,21806008
> >>>> [   18.258747] Memory Limit: none
> >>>> [   18.262310] ---[ end Kernel panic - not syncing: Fatal exception ]---
> >>>>
> >>>> I see that before secure world returns OPTEE_SMC_RETURN_ENOTAVAIL (which
> >>>> should disable and clear the entire cache) we run into the crash trying to free shm.
> >>>>
> >>>> Thoughts?
> >>>
> >>> It seems that the pointer is invalid, but the pointer doesn't look
> >>> like garbage. Could the kernel have unmapped the memory area covering
> >>> that address?
> >>>
> >>
> >> Yes, I am not entirely sure if the kernel had the time to unmap the memory.
> >> Right after triggering the crash the kdump kernel is booted and I see the following
> >>
> >> [ 2.050145] optee: probing for conduit method.
> >> [ 2.054743] optee: revision 3.6 (f84427aa)
> >> [ 2.054821] optee: dynamic shared memory is enabled
> >> [ 2.066186] optee: initialized driver
> >>
> >> Could this be previous un-released maps causing corruption?
> >
> > Aha, yes, that could be it.
> >
>
> How about checking for the ptr?
>
> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> index aadedec3bfe7..8dc4fe9a1588 100644
> --- a/drivers/tee/optee/call.c
> +++ b/drivers/tee/optee/call.c
> @@ -426,10 +426,12 @@ void optee_disable_shm_cache(struct optee *optee)
>                 if (res.result.status == OPTEE_SMC_RETURN_ENOTAVAIL)
>                         break; /* All shm's freed */
>                 if (res.result.status == OPTEE_SMC_RETURN_OK) {
> -                       struct tee_shm *shm;
> +                       struct tee_shm *shm = NULL;
>
>                         shm = reg_pair_to_ptr(res.result.shm_upper32,
>                                               res.result.shm_lower32);
> +                       if (IS_ERR(shm))
> +                               return PTR_ERR(shm);
>                         tee_shm_free(shm);

I don't think that will help. If your theory is correct then that
pointer is from an older incarnation of the kernel. It could be worth
calling optee_disable_shm_cache() just before the call to
optee_enable_shm_cache() in optee_probe(), but skipping the calls to
tee_shm_free() in that case. Since the kernel has restarted, these
returned pointers are not valid any more and there's nothing to free;
we just need to make sure that secure world stops using them too.
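
To illustrate why an IS_ERR() test would not catch such a pointer, here
is a minimal user-space sketch. It assumes the handle is rebuilt from two
32-bit halves, as reg_pair_to_ptr() does, and that IS_ERR() only matches
the top MAX_ERRNO (4095) values of the address space. The names
reg_pair_to_u64() and is_err_value() are stand-ins invented for this
example, not kernel functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Same limit the kernel uses for encoded error pointers. */
#define MAX_ERRNO 4095

/* Stand-in for reg_pair_to_ptr(): join two 32-bit register values. */
static uint64_t reg_pair_to_u64(uint32_t upper, uint32_t lower)
{
	return ((uint64_t)upper << 32) | lower;
}

/* Stand-in for IS_ERR(): true only for values that encode -errno,
 * i.e. the last MAX_ERRNO values before the address space wraps. */
static bool is_err_value(uint64_t addr)
{
	return addr >= (uint64_t)-MAX_ERRNO;
}
```

For the tee_shm address in the oops (x19 = ffff0008dcef6700),
reg_pair_to_u64(0xffff0008, 0xdcef6700) yields 0xffff0008dcef6700 and
is_err_value() is false for it, so the check would fall through to
tee_shm_free() exactly as before.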

Cheers,
Jens

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-05-06  8:15                     ` Jens Wiklander
@ 2021-05-06  8:35                       ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-05-06  8:35 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware



>> 
>>>> 
>>>>>>>>>> [    0.368428] tee_bnxt_fw optee-clnt0: tee_shm_alloc failed
>>>>>>>>>> [    0.368461] tee_bnxt_fw: probe of optee-clnt0 failed with error -22
>>>>>>>>>> 
>>>>>>>>>> tee_shm_release() is not invoked on dma shm buffer.
>>>>>>>>>> 
>>>>>>>>>> Implement .shutdown() method to handle the release of the buffers
>>>>>>>>>> correctly.
>>>>>>>>>> 
>>>>>>>>>> More info:
>>>>>>>>>> https://github.com/OP-TEE/optee_os/issues/3637
>>>>>>>>>> 
>>>>>>>>>> Signed-off-by: Allen Pais <apais@linux.microsoft.com>
>>>>>>>>>> ---
>>>>>>>>>> drivers/tee/optee/core.c | 20 ++++++++++++++++++++
>>>>>>>>>> 1 file changed, 20 insertions(+)
>>>>>>>>> 
>>>>>>>>> This looks good to me. Do you have a practical way of testing this on
>>>>>>>>> QEMU for instance?
>>>>>>>>> 
>>>>>>>> 
>>>>>>>> Jens,
>>>>>>>> 
>>>>>>>> I could not reproduce nor create a setup using QEMU, I could only
>>>>>>>> do it on a real h/w.
>>>>>>>> 
>>>>>>>> I have extensively tested the fix and I don't see any issues.
>>>>>>> 
>>>>>>> I did a few test runs too, seems OK.
>>>>>> 
>>>>>> I carried these changes and have not run into any issues with Kexec so far.
>>>>>> Last week, while trying out kdump, we ran into a crash(this is when the
>>>>>> Kdump kernel reboots).
>>>>>> 
>>>>>> $echo c > /proc/sysrq-trigger
>>>>>> 
>>>>>> Leads to:
>>>>>> 
>>>>>> [   18.004831] Unable to handle kernel paging request at virtual address ffff0008dcef6758
>>>>>> [   18.013002] Mem abort info:
>>>>>> [   18.015885]   ESR = 0x96000005
>>>>>> [   18.019034]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>> [   18.024516]   SET = 0, FnV = 0
>>>>>> [   18.027667]   EA = 0, S1PTW = 0
>>>>>> [   18.030905] Data abort info:
>>>>>> [   18.033877]   ISV = 0, ISS = 0x00000005
>>>>>> [   18.037835]   CM = 0, WnR = 0
>>>>>> [   18.040896] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970a78000
>>>>>> [   18.047811] [ffff0008dcef6758] pgd=000000097fbf9003, pud=0000000000000000
>>>>>> [   18.054819] Internal error: Oops: 96000005 [#1] SMP
>>>>>> [   18.059850] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
>>>>>> [   18.067395] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.4.83-microsoft-standard #1
>>>>>> [   18.077174] Hardware name: Overlake (DT)
>>>>>> [   18.081219] pstate: 80400005 (Nzcv daif +PAN -UAO)
>>>>>> [   18.086170] pc : tee_shm_free+0x18/0x48
>>>>>> [   18.090126] lr : optee_disable_shm_cache+0xa4/0xf0
>>>>>> [   18.095066] sp : ffff80001005bb90
>>>>>> [   18.098484] x29: ffff80001005bb90 x28: ffff000037e20000
>>>>>> [   18.103962] x27: 0000000000000000 x26: ffff00003ed10490
>>>>>> [   18.109440] x25: ffffca760e975f90 x24: 0000000000000000
>>>>>> [   18.114918] x23: ffffca760ed79808 x22: ffff00003ec66e18
>>>>>> [   18.120396] x21: ffff80001005bc08 x20: 00000000b200000a
>>>>>> [   18.125874] x19: ffff0008dcef6700 x18: 0000000000000010
>>>>>> [   18.131352] x17: 0000000000000000 x16: 0000000000000000
>>>>>> [   18.136829] x15: ffffffffffffffff x14: ffffca760ed79808
>>>>>> [   18.142307] x13: ffff80009005b897 x12: ffff80001005b89f
>>>>>> [   18.147786] x11: ffffca760eda4000 x10: ffff80001005b820
>>>>>> [   18.153264] x9 : 00000000ffffffd0 x8 : ffffca760e59b2c0
>>>>>> [   18.158742] x7 : 0000000000000000 x6 : 0000000000000000
>>>>>> [   18.164220] x5 : 0000000000000000 x4 : 0000000000000000
>>>>>> [   18.169698] x3 : 0000000000000000 x2 : ffff0008dcef6700
>>>>>> [   18.175175] x1 : 00000000ffff0008 x0 : ffffca760e59ca04
>>>>>> [   18.180654] Call trace:
>>>>>> [   18.183176]  tee_shm_free+0x18/0x48
>>>>>> [   18.186773]  optee_disable_shm_cache+0xa4/0xf0
>>>>>> [   18.191356]  optee_shutdown+0x20/0x30
>>>>>> [   18.195135]  platform_drv_shutdown+0x2c/0x38
>>>>>> [   18.199538]  device_shutdown+0x180/0x298
>>>>>> [   18.203586]  kernel_restart_prepare+0x44/0x50
>>>>>> [   18.208078]  kernel_restart+0x20/0x68
>>>>>> [   18.211853]  __do_sys_reboot+0x104/0x258
>>>>>> [   18.215899]  __arm64_sys_reboot+0x2c/0x38
>>>>>> [   18.220035]  el0_svc_handler+0x90/0x138
>>>>>> [   18.223991]  el0_svc+0x8/0x208
>>>>>> [   18.227143] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60)
>>>>>> [   18.233435] ---[ end trace 835d756cd66aa959 ]---
>>>>>> [   18.238621] Kernel panic - not syncing: Fatal exception
>>>>>> [   18.244014] Kernel Offset: 0x4a75fde00000 from 0xffff800010000000
>>>>>> [   18.250299] PHYS_OFFSET: 0xffff99c680000000
>>>>>> [   18.254613] CPU features: 0x0002,21806008
>>>>>> [   18.258747] Memory Limit: none
>>>>>> [   18.262310] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>>>> 
>>>>>> I see that before secure world returns OPTEE_SMC_RETURN_ENOTAVAIL (which
>>>>>> should disable and clear the entire cache) we run into the crash trying to free shm.
>>>>>> 
>>>>>> Thoughts?
>>>>> 
>>>>> It seems that the pointer is invalid, but the pointer doesn't look
>>>>> like garbage. Could the kernel have unmapped the memory area covering
>>>>> that address?
>>>>> 
>>>> 
>>>> Yes, I am not entirely sure if the kernel had the time to unmap the memory.
>>>> Right after triggering the crash the kdump kernel is booted and I see the following
>>>> 
>>>> [ 2.050145] optee: probing for conduit method.
>>>> [ 2.054743] optee: revision 3.6 (f84427aa)
>>>> [ 2.054821] optee: dynamic shared memory is enabled
>>>> [ 2.066186] optee: initialized driver
>>>> 
>>>> Could this be previous un-released maps causing corruption?
>>> 
>>> Aha, yes, that could be it.
>>> 
>> 
>> How about checking for the ptr?
>> 
>> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
>> index aadedec3bfe7..8dc4fe9a1588 100644
>> --- a/drivers/tee/optee/call.c
>> +++ b/drivers/tee/optee/call.c
>> @@ -426,10 +426,12 @@ void optee_disable_shm_cache(struct optee *optee)
>>                if (res.result.status == OPTEE_SMC_RETURN_ENOTAVAIL)
>>                        break; /* All shm's freed */
>>                if (res.result.status == OPTEE_SMC_RETURN_OK) {
>> -                       struct tee_shm *shm;
>> +                       struct tee_shm *shm = NULL;
>> 
>>                        shm = reg_pair_to_ptr(res.result.shm_upper32,
>>                                              res.result.shm_lower32);
>> +                       if (IS_ERR(shm))
>> +                               return PTR_ERR(shm);
>>                        tee_shm_free(shm);
> 
> I don't think that will help. If your theory is correct then that
> pointer is from an older incarnation of the kernel. It could be worth
> trying calling this function just before the call to
> optee_enable_shm_cache() in optee_probe() but skipping the calls to
> `tee_shm_free()` in that case. Since the kernel has restarted these
> returned pointers are not valid any more and there's nothing to free,
> we just need to make sure that secure world stops using those too.
> 

I thought about it too, but was not very sure.

Calling optee_disable_shm_cache() before the enable call should ensure
that we have dropped all references to the secure world and that we
start off fresh. Let me try that out.
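
A rough user-space model of that idea, not driver code: the secure world
is faked with a small array, fake_disable_shm_cache() stands in for the
OPTEE_SMC_DISABLE_SHM_CACHE call, and the is_mapped flag is a
hypothetical parameter introduced here to say whether the returned
handles belong to the running kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for OPTEE_SMC_RETURN_OK / _ENOTAVAIL. */
enum smc_status { SMC_OK, SMC_ENOTAVAIL };

/* Toy secure world: handles cached on behalf of some kernel. */
struct fake_secure_world {
	uint64_t cached[4];
	size_t count;
};

/* Models OPTEE_SMC_DISABLE_SHM_CACHE: returns one cached handle per
 * call and reports ENOTAVAIL once the cache is empty. */
static enum smc_status fake_disable_shm_cache(struct fake_secure_world *sw,
					      uint64_t *handle)
{
	if (sw->count == 0)
		return SMC_ENOTAVAIL;
	*handle = sw->cached[--sw->count];
	return SMC_OK;
}

/* Drains the cache and returns how many handles were "freed".
 * With is_mapped == false the handles belong to a previous kernel,
 * so they are discarded without ever being dereferenced. */
static size_t drain_shm_cache(struct fake_secure_world *sw, bool is_mapped)
{
	uint64_t handle;
	size_t freed = 0;

	assert(sw != NULL);
	while (fake_disable_shm_cache(sw, &handle) == SMC_OK) {
		if (is_mapped)
			freed++;	/* the driver would call tee_shm_free() here */
	}
	return freed;
}
```

At probe time the call would be drain_shm_cache(&sw, false): the secure
world ends up with an empty cache and nothing stale is touched. The
shutdown path would keep passing true.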

Thanks.


^ permalink raw reply	[flat|nested] 56+ messages in thread

* [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-02-25  9:06 ` Allen Pais
@ 2021-05-07  3:58   ` Tyler Hicks
  -1 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-05-07  3:58 UTC (permalink / raw)
  To: jens.wiklander, zajec5, Allen Pais
  Cc: bcm-kernel-feedback-list, linux-arm-kernel, linux-kernel, op-tee,
	Allen Pais

The .shutdown hook is not called after a kernel crash when a kdump
kernel is pre-loaded. A kexec into the kdump kernel takes place as
quickly as possible without allowing drivers to clean up.

That means that the OP-TEE shared memory cache, which was initialized by
the kernel that crashed, is still in place when the kdump kernel is
booted. As the kdump kernel is shut down, the .shutdown hook is called,
which calls optee_disable_shm_cache(), and OP-TEE's
OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
mapped for the kdump kernel since the cache was set up by the previous
kernel. Trying to dereference the tee_shm pointer or otherwise translate
the address results in a fault that cannot be handled:

 Unable to handle kernel paging request at virtual address ffff4317b9c09744
 Mem abort info:
   ESR = 0x96000004
   EC = 0x25: DABT (current EL), IL = 32 bits
   SET = 0, FnV = 0
   EA = 0, S1PTW = 0
 Data abort info:
   ISV = 0, ISS = 0x00000004
   CM = 0, WnR = 0
 swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
 [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
 Internal error: Oops: 96000004 [#1] SMP
 Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
 CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
 Hardware name: Redacted (DT)
 pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
 pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
 lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
 sp : ffff80001005bb70
 x29: ffff80001005bb70 x28: ffff608e74648e00
 x27: ffff80001005bb98 x26: dead000000000100
 x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
 x23: ffff608e74cf8818 x22: ffff608e738be600
 x21: ffff80001005bbc8 x20: ffff608e738be638
 x19: ffff4317b9c09700 x18: ffffffffffffffff
 x17: 0000000000000041 x16: ffffba61b5171764
 x15: 0000000000000004 x14: 0000000000000fff
 x13: ffffba61b5c9dfc8 x12: 0000000000000003
 x11: 0000000000000000 x10: 0000000000000000
 x9 : ffffba61b5413824 x8 : 00000000ffff4317
 x7 : 0000000000000000 x6 : 0000000000000000
 x5 : 0000000000000000 x4 : 0000000000000000
 x3 : 0000000000000000 x2 : ffff4317b9c09700
 x1 : 00000000ffff4317 x0 : ffff4317b9c09700
 Call trace:
 tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
 optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
 optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
 platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
 device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
 kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
 __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
 do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
 el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
 el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
 el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
 Code: aa0003f3 b5000060 12800003 14000002 (b9404663)

When booting the kdump kernel, drain the shared memory cache while being
careful to not translate the addresses returned from
OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
and the cache is disabled, proceed with re-enabling the cache so that we
aren't dealing with invalid addresses while shutting down the kdump
kernel.

Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
---

This patch fixes a crash introduced by "optee: fix tee out of memory
failure seen during kexec reboot"[1]. However, I don't think that the
original two patch series[2] plus this patch is the full solution to
properly handling OP-TEE shared memory across kexec.
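
As a cross-check of the oops above, the faulting instruction (the word in
parentheses on the "Code:" line, b9404663) can be decoded by hand. The
sketch below is a user-space illustration only and is not part of the
patch; the struct and helper names are made up for this note. It decodes
the AArch64 "LDR Wt, [Xn, #imm]" (32-bit, unsigned offset) encoding and
adds the offset to the base register value taken from the register dump.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an AArch64 "LDR Wt, [Xn, #imm]" (32-bit, unsigned offset). */
struct ldr_w_imm {
	unsigned int rt;	/* destination register number */
	unsigned int rn;	/* base register number */
	uint32_t offset;	/* byte offset: imm12 scaled by the 4-byte access size */
};

/*
 * Hypothetical helper: returns 1 and fills *out if insn is the 32-bit
 * LDR (immediate, unsigned offset) encoding, 0 otherwise.
 */
static int decode_ldr_w_imm(uint32_t insn, struct ldr_w_imm *out)
{
	/* Bits [31:22] == 0b1011100101 select this encoding. */
	if ((insn >> 22) != 0x2e5)
		return 0;

	out->offset = ((insn >> 10) & 0xfff) * 4;
	out->rn = (insn >> 5) & 0x1f;
	out->rt = insn & 0x1f;
	return 1;
}

/* Address touched by the load, given the value held in the base register. */
static uint64_t ldr_target(const struct ldr_w_imm *insn, uint64_t base)
{
	return base + insn->offset;
}
```

Decoding b9404663 this way gives a load into w3 from [x19, #0x44]. In the
register dump x19 holds ffff4317b9c09700, the same value as x0, so the
access lands on ffff4317b9c09744, which is the address reported in the
first line of the oops.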

While testing this fix, I did about 10 kexec reboots and then triggered
a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
became unresponsive during boot while steadily streaming the following
errors to the serial console:

 arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
 arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000

I suspect that this is related to the problems of OP-TEE shared memory
handling across kexec. My current hunch is that while we've disabled the
shared memory cache with this patch, we haven't unregistered all of the
addresses that the previous kernel (which crashed) had registered with
OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
addresses?
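
For reference, the two SMMU log lines above can be read as follows. This
is a small user-space sketch, not driver code; the macro and function
names are invented here. It assumes the layout used by the Linux
arm-smmu driver, where bit 1 of GFSR flags an unidentified stream fault
and the driver prints the low 16 bits of GFSYNR1 as the stream ID.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed to mirror ARM_SMMU_sGFSR_USF (bit 1) from the Linux driver. */
#define SKETCH_GFSR_USF (UINT32_C(1) << 1)

/* True if the global fault status reports an unidentified stream fault. */
static bool gfsr_is_unidentified_stream(uint32_t gfsr)
{
	return (gfsr & SKETCH_GFSR_USF) != 0;
}

/* Stream ID as printed in the log: the low 16 bits of GFSYNR1. */
static uint16_t gfsynr1_stream_id(uint32_t gfsynr1)
{
	return (uint16_t)gfsynr1;
}
```

With GFSR 0x00000002 and GFSYNR1 0x00002000 this matches the "Blocked
unknown Stream ID 0x2000" text: a transaction arrived with a stream ID
that the kdump kernel had not configured.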

I'm still pretty early in investigating that assumption and
I'm learning about OP-TEE as I go but I wanted to get this initial
fix-of-the-fix out so that it was clear that the v2 of the series[2] is
not complete.

[1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
[2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
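
To make the intent of the is_mapped parameter easier to follow, here is a
user-space model of the drain loop. It is only a sketch under stated
assumptions: the mock_* names, the fixed-size cache and the step function
are hypothetical stand-ins for the secure world and for
OPTEE_SMC_DISABLE_SHM_CACHE, and nothing here is kernel code. The point
it illustrates is that every cached entry is still pulled out of the
secure world, but stale handles are never handed to the free routine.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_CACHE_SLOTS 4

/* Hypothetical stand-in for the secure world's shared memory cache. */
struct mock_secure_world {
	uintptr_t cached[MOCK_CACHE_SLOTS];	/* handles registered by some kernel */
	size_t count;
	bool cache_enabled;
};

/*
 * Models one "disable shm cache" call: hands back one cached handle and
 * returns true, or marks the cache disabled and returns false when empty.
 */
static bool mock_disable_step(struct mock_secure_world *sw, uintptr_t *handle)
{
	if (sw->count == 0) {
		sw->cache_enabled = false;
		return false;
	}
	*handle = sw->cached[--sw->count];
	return true;
}

/* Counts frees instead of dereferencing anything. */
static size_t mock_freed;

static void mock_shm_free(uintptr_t handle)
{
	(void)handle;
	mock_freed++;
}

/*
 * Drain the cache. When is_mapped is false the handles came from a
 * previous kernel, so they are dropped without being translated or freed.
 * Returns the number of entries taken out of the cache.
 */
static size_t drain_shm_cache(struct mock_secure_world *sw, bool is_mapped)
{
	uintptr_t handle;
	size_t drained = 0;

	while (mock_disable_step(sw, &handle)) {
		drained++;
		if (!is_mapped)
			continue;
		mock_shm_free(handle);
	}
	return drained;
}
```

In this model a kdump-style probe would call drain_shm_cache(sw, false)
before re-enabling the cache, while the remove and shutdown paths would
pass true.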

 drivers/tee/optee/call.c          | 11 ++++++++++-
 drivers/tee/optee/core.c          | 13 +++++++++++--
 drivers/tee/optee/optee_private.h |  2 +-
 3 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
index 6132cc8d014c..799e84bec63d 100644
--- a/drivers/tee/optee/call.c
+++ b/drivers/tee/optee/call.c
@@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
  * optee_disable_shm_cache() - Disables caching of some shared memory allocation
  *			      in OP-TEE
  * @optee:	main service struct
+ * @is_mapped:	true if the cached shared memory addresses were mapped by this
+ *		kernel, are safe to dereference, and should be freed
  */
-void optee_disable_shm_cache(struct optee *optee)
+void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
 {
 	struct optee_call_waiter w;
 
@@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
 		if (res.result.status == OPTEE_SMC_RETURN_OK) {
 			struct tee_shm *shm;
 
+			/*
+			 * Shared memory references that were not mapped by
+			 * this kernel must be ignored to prevent a crash.
+			 */
+			if (!is_mapped)
+				continue;
+
 			shm = reg_pair_to_ptr(res.result.shm_upper32,
 					      res.result.shm_lower32);
 			tee_shm_free(shm);
diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
index 69d1f698907c..9985c671bd1f 100644
--- a/drivers/tee/optee/core.c
+++ b/drivers/tee/optee/core.c
@@ -6,6 +6,7 @@
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
 #include <linux/arm-smccc.h>
+#include <linux/crash_dump.h>
 #include <linux/errno.h>
 #include <linux/io.h>
 #include <linux/module.h>
@@ -588,7 +589,7 @@ static int optee_remove(struct platform_device *pdev)
 	 * reference counters and also avoid wild pointers in secure world
 	 * into the old shared memory range.
 	 */
-	optee_disable_shm_cache(optee);
+	optee_disable_shm_cache(optee, true);
 
 	/*
 	 * The two devices have to be unregistered before we can free the
@@ -618,7 +619,7 @@ static int optee_remove(struct platform_device *pdev)
  */
 static void optee_shutdown(struct platform_device *pdev)
 {
-	optee_disable_shm_cache(platform_get_drvdata(pdev));
+	optee_disable_shm_cache(platform_get_drvdata(pdev), true);
 }
 
 static int optee_probe(struct platform_device *pdev)
@@ -705,6 +706,14 @@ static int optee_probe(struct platform_device *pdev)
 	optee->memremaped_shm = memremaped_shm;
 	optee->pool = pool;
 
+	/*
+	 * The kexec into the crash kernel did not call our .shutdown hook. The
+	 * shm cache objects registered with OP-TEE are not valid for the crash
+	 * kernel.
+	 */
+	if (is_kdump_kernel())
+		optee_disable_shm_cache(optee, false);
+
 	optee_enable_shm_cache(optee);
 
 	if (optee->sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM)
diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
index e25b216a14ef..16d8c82213e7 100644
--- a/drivers/tee/optee/optee_private.h
+++ b/drivers/tee/optee/optee_private.h
@@ -158,7 +158,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg,
 int optee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session);
 
 void optee_enable_shm_cache(struct optee *optee);
-void optee_disable_shm_cache(struct optee *optee);
+void optee_disable_shm_cache(struct optee *optee, bool is_mapped);
 
 int optee_shm_register(struct tee_context *ctx, struct tee_shm *shm,
 		       struct page **pages, size_t num_pages,
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-07  3:58   ` Tyler Hicks
@ 2021-05-07  7:00     ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-05-07  7:00 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: jens.wiklander, zajec5, Allen Pais, bcm-kernel-feedback-list,
	linux-arm-kernel, linux-kernel, op-tee



> On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> 
> The .shutdown hook is not called after a kernel crash when a kdump
> kernel is pre-loaded. A kexec into the kdump kernel takes place as
> quickly as possible without allowing drivers to clean up.
> 
> That means that the OP-TEE shared memory cache, which was initialized by
> the kernel that crashed, is still in place when the kdump kernel is
> booted. As the kdump kernel is shut down, the .shutdown hook is called,
> which calls optee_disable_shm_cache(), and OP-TEE's
> OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> mapped for the kdump kernel since the cache was set up by the previous
> kernel. Trying to dereference the tee_shm pointer or otherwise translate
> the address results in a fault that cannot be handled:
> 
> Unable to handle kernel paging request at virtual address ffff4317b9c09744
> Mem abort info:
>   ESR = 0x96000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000004
>   CM = 0, WnR = 0
> swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 96000004 [#1] SMP
> Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> Hardware name: Redacted (DT)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> sp : ffff80001005bb70
> x29: ffff80001005bb70 x28: ffff608e74648e00
> x27: ffff80001005bb98 x26: dead000000000100
> x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> x23: ffff608e74cf8818 x22: ffff608e738be600
> x21: ffff80001005bbc8 x20: ffff608e738be638
> x19: ffff4317b9c09700 x18: ffffffffffffffff
> x17: 0000000000000041 x16: ffffba61b5171764
> x15: 0000000000000004 x14: 0000000000000fff
> x13: ffffba61b5c9dfc8 x12: 0000000000000003
> x11: 0000000000000000 x10: 0000000000000000
> x9 : ffffba61b5413824 x8 : 00000000ffff4317
> x7 : 0000000000000000 x6 : 0000000000000000
> x5 : 0000000000000000 x4 : 0000000000000000
> x3 : 0000000000000000 x2 : ffff4317b9c09700
> x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> Call trace:
> tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> 
> When booting the kdump kernel, drain the shared memory cache while being
> careful to not translate the addresses returned from
> OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> and the cache is disabled, proceed with re-enabling the cache so that we
> aren't dealing with invalid addresses while shutting down the kdump
> kernel.
> 
> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> ---
> 
> This patch fixes a crash introduced by "optee: fix tee out of memory
> failure seen during kexec reboot"[1]. However, I don't think that the
> original two patch series[2] plus this patch is the full solution to
> properly handling OP-TEE shared memory across kexec.
> 
> While testing this fix, I did about 10 kexec reboots and then triggered
> a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> became unresponsive during boot while steadily streaming the following
> errors to the serial console:
> 
> arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> 
> I suspect that this is related to the problems of OP-TEE shared memory
> handling across kexec. My current hunch is that while we've disabled the
> shared memory cache with this patch, we haven't unregistered all of the
> addresses that the previous kernel (which crashed) had registered with
> OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> addresses?
> 
> I'm still pretty early in investigating that assumption and
> I'm learning about OP-TEE as I go but I wanted to get this initial
> fix-of-the-fix out so that it was clear that the v2 of the series[2] is
> not complete.
> 
> [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
> [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
> 
> drivers/tee/optee/call.c          | 11 ++++++++++-
> drivers/tee/optee/core.c          | 13 +++++++++++--
> drivers/tee/optee/optee_private.h |  2 +-
> 3 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> index 6132cc8d014c..799e84bec63d 100644
> --- a/drivers/tee/optee/call.c
> +++ b/drivers/tee/optee/call.c
> @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
>  * optee_disable_shm_cache() - Disables caching of some shared memory allocation
>  *			      in OP-TEE
>  * @optee:	main service struct
> + * @is_mapped:	true if the cached shared memory addresses were mapped by this
> + *		kernel, are safe to dereference, and should be freed
>  */
> -void optee_disable_shm_cache(struct optee *optee)
> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
> {
> 	struct optee_call_waiter w;
> 
> @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
> 		if (res.result.status == OPTEE_SMC_RETURN_OK) {
> 			struct tee_shm *shm;
> 

Thanks, Tyler.
From what I understand from my email exchange with Jens, I don’t
think we want to touch optee_disable_shm_cache(). I could be wrong, though.
@Jens, comments?

> +			/*
> +			 * Shared memory references that were not mapped by
> +			 * this kernel must be ignored to prevent a crash.
> +			 */
> +			if (!is_mapped)
> +				continue;
> +
> 			shm = reg_pair_to_ptr(res.result.shm_upper32,
> 					      res.result.shm_lower32);
> 			tee_shm_free(shm);
> diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
> index 69d1f698907c..9985c671bd1f 100644
> --- a/drivers/tee/optee/core.c
> +++ b/drivers/tee/optee/core.c
> @@ -6,6 +6,7 @@
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> 
> #include <linux/arm-smccc.h>
> +#include <linux/crash_dump.h>
> #include <linux/errno.h>
> #include <linux/io.h>
> #include <linux/module.h>
> @@ -588,7 +589,7 @@ static int optee_remove(struct platform_device *pdev)
> 	 * reference counters and also avoid wild pointers in secure world
> 	 * into the old shared memory range.
> 	 */
> -	optee_disable_shm_cache(optee);
> +	optee_disable_shm_cache(optee, true);
> 
> 	/*
> 	 * The two devices have to be unregistered before we can free the
> @@ -618,7 +619,7 @@ static int optee_remove(struct platform_device *pdev)
>  */
> static void optee_shutdown(struct platform_device *pdev)
> {
> -	optee_disable_shm_cache(platform_get_drvdata(pdev));
> +	optee_disable_shm_cache(platform_get_drvdata(pdev), true);
> }
> 
> static int optee_probe(struct platform_device *pdev)
> @@ -705,6 +706,14 @@ static int optee_probe(struct platform_device *pdev)
> 	optee->memremaped_shm = memremaped_shm;
> 	optee->pool = pool;
> 
> +	/*
> +	 * The kexec into the crash kernel did not call our .shutdown hook. The
> +	 * shm cache objects registered with OP-TEE are not valid for the crash
> +	 * kernel.
> +	 */
> +	if (is_kdump_kernel())
> +		optee_disable_shm_cache(optee, false);
> +

I am glad this solves the kdump crash that we have been seeing.

- Allen

> 	optee_enable_shm_cache(optee);
> 
> 	if (optee->sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM)
> diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
> index e25b216a14ef..16d8c82213e7 100644
> --- a/drivers/tee/optee/optee_private.h
> +++ b/drivers/tee/optee/optee_private.h
> @@ -158,7 +158,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg,
> int optee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session);
> 
> void optee_enable_shm_cache(struct optee *optee);
> -void optee_disable_shm_cache(struct optee *optee);
> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped);
> 
> int optee_shm_register(struct tee_context *ctx, struct tee_shm *shm,
> 		       struct page **pages, size_t num_pages,
> -- 
> 2.25.1


^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
@ 2021-05-07  7:00     ` Allen Pais
  0 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-05-07  7:00 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: jens.wiklander, zajec5, Allen Pais, bcm-kernel-feedback-list,
	linux-arm-kernel, linux-kernel, op-tee



> On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> 
> The .shutdown hook is not called after a kernel crash when a kdump
> kernel is pre-loaded. A kexec into the kdump kernel takes place as
> quickly as possible without allowing drivers to clean up.
> 
> That means that the OP-TEE shared memory cache, which was initialized by
> the kernel that crashed, is still in place when the kdump kernel is
> booted. As the kdump kernel is shut down, the .shutdown hook is called,
> which calls optee_disable_shm_cache(), and OP-TEE's
> OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> mapped for the kdump kernel since the cache was set up by the previous
> kernel. Trying to dereference the tee_shm pointer or otherwise translate
> the address results in a fault that cannot be handled:
> 
> Unable to handle kernel paging request at virtual address ffff4317b9c09744
> Mem abort info:
>   ESR = 0x96000004
>   EC = 0x25: DABT (current EL), IL = 32 bits
>   SET = 0, FnV = 0
>   EA = 0, S1PTW = 0
> Data abort info:
>   ISV = 0, ISS = 0x00000004
>   CM = 0, WnR = 0
> swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> Internal error: Oops: 96000004 [#1] SMP
> Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> Hardware name: Redacted (DT)
> pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> sp : ffff80001005bb70
> x29: ffff80001005bb70 x28: ffff608e74648e00
> x27: ffff80001005bb98 x26: dead000000000100
> x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> x23: ffff608e74cf8818 x22: ffff608e738be600
> x21: ffff80001005bbc8 x20: ffff608e738be638
> x19: ffff4317b9c09700 x18: ffffffffffffffff
> x17: 0000000000000041 x16: ffffba61b5171764
> x15: 0000000000000004 x14: 0000000000000fff
> x13: ffffba61b5c9dfc8 x12: 0000000000000003
> x11: 0000000000000000 x10: 0000000000000000
> x9 : ffffba61b5413824 x8 : 00000000ffff4317
> x7 : 0000000000000000 x6 : 0000000000000000
> x5 : 0000000000000000 x4 : 0000000000000000
> x3 : 0000000000000000 x2 : ffff4317b9c09700
> x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> Call trace:
> tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> 
> When booting the kdump kernel, drain the shared memory cache while being
> careful to not translate the addresses returned from
> OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> and the cache is disabled, proceed with re-enabling the cache so that we
> aren't dealing with invalid addresses while shutting down the kdump
> kernel.
> 
> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> ---
> 
> This patch fixes a crash introduced by "optee: fix tee out of memory
> failure seen during kexec reboot"[1]. However, I don't think that the
> original two patch series[2] plus this patch is the full solution to
> properly handling OP-TEE shared memory across kexec.
> 
> While testing this fix, I did about 10 kexec reboots and then triggered
> a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> became unresponsive during boot while steadily streaming the following
> errors to the serial console:
> 
> arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> 
> I suspect that this is related to the problems of OP-TEE shared memory
> handling across kexec. My current hunch is that while we've disabled the
> shared memory cache with this patch, we haven't unregistered all of the
> addresses that the previous kernel (which crashed) had registered with
> OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> addresses?
> 
> I'm still pretty early in investigating that assumption and
> I'm learning about OP-TEE as I go but I wanted to get this initial
> fix-of-the-fix out so that it was clear that the v2 of the series[2] is
> not complete.
> 
> [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
> [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
> 
> drivers/tee/optee/call.c          | 11 ++++++++++-
> drivers/tee/optee/core.c          | 13 +++++++++++--
> drivers/tee/optee/optee_private.h |  2 +-
> 3 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> index 6132cc8d014c..799e84bec63d 100644
> --- a/drivers/tee/optee/call.c
> +++ b/drivers/tee/optee/call.c
> @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
>  * optee_disable_shm_cache() - Disables caching of some shared memory allocation
>  *			      in OP-TEE
>  * @optee:	main service struct
> + * @is_mapped:	true if the cached shared memory addresses were mapped by this
> + *		kernel, are safe to dereference, and should be freed
>  */
> -void optee_disable_shm_cache(struct optee *optee)
> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
> {
> 	struct optee_call_waiter w;
> 
> @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
> 		if (res.result.status == OPTEE_SMC_RETURN_OK) {
> 			struct tee_shm *shm;
> 

 Thanks, Tyler.
 From what I understand from my email exchange with Jens, I don’t
think we want to touch optee_disable_shm_cache(). I could be wrong, though.
@Jens, comments?

> +			/*
> +			 * Shared memory references that were not mapped by
> +			 * this kernel must be ignored to prevent a crash.
> +			 */
> +			if (!is_mapped)
> +				continue;
> +
> 			shm = reg_pair_to_ptr(res.result.shm_upper32,
> 					      res.result.shm_lower32);
> 			tee_shm_free(shm);
> diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
> index 69d1f698907c..9985c671bd1f 100644
> --- a/drivers/tee/optee/core.c
> +++ b/drivers/tee/optee/core.c
> @@ -6,6 +6,7 @@
> #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> 
> #include <linux/arm-smccc.h>
> +#include <linux/crash_dump.h>
> #include <linux/errno.h>
> #include <linux/io.h>
> #include <linux/module.h>
> @@ -588,7 +589,7 @@ static int optee_remove(struct platform_device *pdev)
> 	 * reference counters and also avoid wild pointers in secure world
> 	 * into the old shared memory range.
> 	 */
> -	optee_disable_shm_cache(optee);
> +	optee_disable_shm_cache(optee, true);
> 
> 	/*
> 	 * The two devices have to be unregistered before we can free the
> @@ -618,7 +619,7 @@ static int optee_remove(struct platform_device *pdev)
>  */
> static void optee_shutdown(struct platform_device *pdev)
> {
> -	optee_disable_shm_cache(platform_get_drvdata(pdev));
> +	optee_disable_shm_cache(platform_get_drvdata(pdev), true);
> }
> 
> static int optee_probe(struct platform_device *pdev)
> @@ -705,6 +706,14 @@ static int optee_probe(struct platform_device *pdev)
> 	optee->memremaped_shm = memremaped_shm;
> 	optee->pool = pool;
> 
> +	/*
> +	 * The kexec into the crash kernel did not call our .shutdown hook. The
> +	 * shm cache objects registered with OP-TEE are not valid for the crash
> +	 * kernel.
> +	 */
> +	if (is_kdump_kernel())
> +		optee_disable_shm_cache(optee, false);
> +

 I am glad this solves the kdump crash that we have been seeing.

- Allen

> 	optee_enable_shm_cache(optee);
> 
> 	if (optee->sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM)
> diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
> index e25b216a14ef..16d8c82213e7 100644
> --- a/drivers/tee/optee/optee_private.h
> +++ b/drivers/tee/optee/optee_private.h
> @@ -158,7 +158,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg,
> int optee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session);
> 
> void optee_enable_shm_cache(struct optee *optee);
> -void optee_disable_shm_cache(struct optee *optee);
> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped);
> 
> int optee_shm_register(struct tee_context *ctx, struct tee_shm *shm,
> 		       struct page **pages, size_t num_pages,
> -- 
> 2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH v2 1/2] optee: fix tee out of memory failure seen during kexec reboot
  2021-05-06  8:15                     ` Jens Wiklander
@ 2021-05-07  7:03                       ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-05-07  7:03 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, bcm-kernel-feedback-list, Linux ARM,
	Linux Kernel Mailing List, OP-TEE TrustedFirmware

>>>>>>>> 
>>>>>>>> I could not reproduce nor create a setup using QEMU, I could only
>>>>>>>> do it on a real h/w.
>>>>>>>> 
>>>>>>>> I have extensively tested the fix and I don't see any issues.
>>>>>>> 
>>>>>>> I did a few test runs too, seems OK.
>>>>>> 
>>>>>> I carried these changes and have not run into any issues with Kexec so far.
>>>>>> Last week, while trying out kdump, we ran into a crash (this is when the
>>>>>> kdump kernel reboots).
>>>>>> 
>>>>>> $echo c > /proc/sysrq-trigger
>>>>>> 
>>>>>> Leads to:
>>>>>> 
>>>>>> [   18.004831] Unable to handle kernel paging request at virtual address ffff0008dcef6758
>>>>>> [   18.013002] Mem abort info:
>>>>>> [   18.015885]   ESR = 0x96000005
>>>>>> [   18.019034]   EC = 0x25: DABT (current EL), IL = 32 bits
>>>>>> [   18.024516]   SET = 0, FnV = 0
>>>>>> [   18.027667]   EA = 0, S1PTW = 0
>>>>>> [   18.030905] Data abort info:
>>>>>> [   18.033877]   ISV = 0, ISS = 0x00000005
>>>>>> [   18.037835]   CM = 0, WnR = 0
>>>>>> [   18.040896] swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970a78000
>>>>>> [   18.047811] [ffff0008dcef6758] pgd=000000097fbf9003, pud=0000000000000000
>>>>>> [   18.054819] Internal error: Oops: 96000005 [#1] SMP
>>>>>> [   18.059850] Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
>>>>>> [   18.067395] CPU: 3 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.4.83-microsoft-standard #1
>>>>>> [   18.077174] Hardware name: Overlake (DT)
>>>>>> [   18.081219] pstate: 80400005 (Nzcv daif +PAN -UAO)
>>>>>> [   18.086170] pc : tee_shm_free+0x18/0x48
>>>>>> [   18.090126] lr : optee_disable_shm_cache+0xa4/0xf0
>>>>>> [   18.095066] sp : ffff80001005bb90
>>>>>> [   18.098484] x29: ffff80001005bb90 x28: ffff000037e20000
>>>>>> [   18.103962] x27: 0000000000000000 x26: ffff00003ed10490
>>>>>> [   18.109440] x25: ffffca760e975f90 x24: 0000000000000000
>>>>>> [   18.114918] x23: ffffca760ed79808 x22: ffff00003ec66e18
>>>>>> [   18.120396] x21: ffff80001005bc08 x20: 00000000b200000a
>>>>>> [   18.125874] x19: ffff0008dcef6700 x18: 0000000000000010
>>>>>> [   18.131352] x17: 0000000000000000 x16: 0000000000000000
>>>>>> [   18.136829] x15: ffffffffffffffff x14: ffffca760ed79808
>>>>>> [   18.142307] x13: ffff80009005b897 x12: ffff80001005b89f
>>>>>> [   18.147786] x11: ffffca760eda4000 x10: ffff80001005b820
>>>>>> [   18.153264] x9 : 00000000ffffffd0 x8 : ffffca760e59b2c0
>>>>>> [   18.158742] x7 : 0000000000000000 x6 : 0000000000000000
>>>>>> [   18.164220] x5 : 0000000000000000 x4 : 0000000000000000
>>>>>> [   18.169698] x3 : 0000000000000000 x2 : ffff0008dcef6700
>>>>>> [   18.175175] x1 : 00000000ffff0008 x0 : ffffca760e59ca04
>>>>>> [   18.180654] Call trace:
>>>>>> [   18.183176]  tee_shm_free+0x18/0x48
>>>>>> [   18.186773]  optee_disable_shm_cache+0xa4/0xf0
>>>>>> [   18.191356]  optee_shutdown+0x20/0x30
>>>>>> [   18.195135]  platform_drv_shutdown+0x2c/0x38
>>>>>> [   18.199538]  device_shutdown+0x180/0x298
>>>>>> [   18.203586]  kernel_restart_prepare+0x44/0x50
>>>>>> [   18.208078]  kernel_restart+0x20/0x68
>>>>>> [   18.211853]  __do_sys_reboot+0x104/0x258
>>>>>> [   18.215899]  __arm64_sys_reboot+0x2c/0x38
>>>>>> [   18.220035]  el0_svc_handler+0x90/0x138
>>>>>> [   18.223991]  el0_svc+0x8/0x208
>>>>>> [   18.227143] Code: f9000bf3 aa0003f3 aa1e03e0 d503201f (b9405a60)
>>>>>> [   18.233435] ---[ end trace 835d756cd66aa959 ]---
>>>>>> [   18.238621] Kernel panic - not syncing: Fatal exception
>>>>>> [   18.244014] Kernel Offset: 0x4a75fde00000 from 0xffff800010000000
>>>>>> [   18.250299] PHYS_OFFSET: 0xffff99c680000000
>>>>>> [   18.254613] CPU features: 0x0002,21806008
>>>>>> [   18.258747] Memory Limit: none
>>>>>> [   18.262310] ---[ end Kernel panic - not syncing: Fatal exception ]---
>>>>>> 
>>>>>> I see that before secure world returns OPTEE_SMC_RETURN_ENOTAVAIL (which
>>>>>> should disable and clear all the cache), we run into the crash trying to free shm.
>>>>>> 
>>>>>> Thoughts?
>>>>> 
>>>>> It seems that the pointer is invalid, but the pointer doesn't look
>>>>> like garbage. Could the kernel have unmapped the memory area covering
>>>>> that address?
>>>>> 
>>>> 
>>>> Yes, I am not entirely sure if the kernel had the time to unmap the memory.
>>>> Right after triggering the crash the kdump kernel is booted and I see the following
>>>> 
>>>> [ 2.050145] optee: probing for conduit method.
>>>> [ 2.054743] optee: revision 3.6 (f84427aa)
>>>> [ 2.054821] optee: dynamic shared memory is enabled
>>>> [ 2.066186] optee: initialized driver
>>>> 
>>>> Could this be previous un-released maps causing corruption?
>>> 
>>> Aha, yes, that could be it.
>>> 
>> 
>> How about checking for the ptr?
>> 
>> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
>> index aadedec3bfe7..8dc4fe9a1588 100644
>> --- a/drivers/tee/optee/call.c
>> +++ b/drivers/tee/optee/call.c
>> @@ -426,10 +426,12 @@ void optee_disable_shm_cache(struct optee *optee)
>>                if (res.result.status == OPTEE_SMC_RETURN_ENOTAVAIL)
>>                        break; /* All shm's freed */
>>                if (res.result.status == OPTEE_SMC_RETURN_OK) {
>> -                       struct tee_shm *shm;
>> +                       struct tee_shm *shm = NULL;
>> 
>>                        shm = reg_pair_to_ptr(res.result.shm_upper32,
>>                                              res.result.shm_lower32);
>> +                       if (IS_ERR(shm))
>> +                               return PTR_ERR(shm);
>>                        tee_shm_free(shm);
> 
> I don't think that will help. If your theory is correct then that
> pointer is from an older incarnation of the kernel. It could be worth
> trying calling this function just before the call to
> optee_enable_shm_cache() in optee_probe() but skipping the calls to
> `tee_shm_free()` in that case. Since the kernel has restarted these
> returned pointers are not valid any more and there's nothing to free,
> we just need to make sure that secure world stops using those too.
> 
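To make the suggestion above concrete, here is a minimal user-space
sketch of that drain loop. It is not the driver code: the names
mock_secure_world, mock_disable_shm_cache() and drain_shm_cache() are
hypothetical stand-ins for OP-TEE and for optee_disable_shm_cache(), and
the status values are illustrative rather than the real OPTEE_SMC_*
constants. With is_mapped set to false, each returned address is dropped
without being dereferenced, which is what a freshly booted kernel would
need.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative status codes, NOT the real OPTEE_SMC_RETURN_* values. */
enum mock_status { MOCK_RETURN_OK, MOCK_RETURN_ENOTAVAIL };

struct mock_result {
	enum mock_status status;
	uintptr_t cookie;	/* address registered by some kernel */
};

/* Hypothetical model of the secure world's shared memory cache. */
struct mock_secure_world {
	uintptr_t cached[4];
	size_t count;
	bool cache_enabled;
};

/*
 * Each call hands back one cached object. Once the cache is empty, the
 * cache is marked disabled and ENOTAVAIL is reported.
 */
struct mock_result mock_disable_shm_cache(struct mock_secure_world *sw)
{
	struct mock_result res = { MOCK_RETURN_ENOTAVAIL, 0 };

	if (sw->count > 0) {
		res.status = MOCK_RETURN_OK;
		res.cookie = sw->cached[--sw->count];
	} else {
		sw->cache_enabled = false;
	}
	return res;
}

/*
 * Drain the cache. When is_mapped is false the returned addresses belong
 * to a previous kernel, so they are counted in *skipped and never
 * dereferenced. Returns the number of objects that would be freed.
 */
size_t drain_shm_cache(struct mock_secure_world *sw, bool is_mapped,
		       size_t *skipped)
{
	size_t freed = 0;

	assert(sw != NULL && skipped != NULL);
	*skipped = 0;

	for (;;) {
		struct mock_result res = mock_disable_shm_cache(sw);

		if (res.status == MOCK_RETURN_ENOTAVAIL)
			break;	/* cache is empty and disabled */
		if (!is_mapped) {
			(*skipped)++;	/* stale address: ignore it */
			continue;
		}
		freed++;	/* the real driver would free the object here */
	}
	return freed;
}
```

In this model a probe-time call would pass is_mapped as false and then
re-enable the cache, while remove and shutdown paths would pass true.
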

Jens,

  I suppose you saw the email from @Tyler. We have the crash fixed, but we
ran into many "arm-smmu 64000000.mmu: xxx" messages being printed, and the
system became unstable and stopped responding.

 I am debugging this further; any input would be really helpful.

Thanks.


^ permalink raw reply	[flat|nested] 56+ messages in thread


* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-07  7:00     ` Allen Pais
@ 2021-05-07  9:23       ` Jens Wiklander
  -1 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-05-07  9:23 UTC (permalink / raw)
  To: Allen Pais
  Cc: Tyler Hicks, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
>
>
>
> > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> >
> > The .shutdown hook is not called after a kernel crash when a kdump
> > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > quickly as possible without allowing drivers to clean up.
> >
> > That means that the OP-TEE shared memory cache, which was initialized by
> > the kernel that crashed, is still in place when the kdump kernel is
> > booted. As the kdump kernel is shut down, the .shutdown hook is called,
> > which calls optee_disable_shm_cache(), and OP-TEE's
> > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > mapped for the kdump kernel since the cache was set up by the previous
> > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > the address results in a fault that cannot be handled:
> >
> > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > Mem abort info:
> >   ESR = 0x96000004
> >   EC = 0x25: DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> > Data abort info:
> >   ISV = 0, ISS = 0x00000004
> >   CM = 0, WnR = 0
> > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > Internal error: Oops: 96000004 [#1] SMP
> > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > Hardware name: Redacted (DT)
> > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > sp : ffff80001005bb70
> > x29: ffff80001005bb70 x28: ffff608e74648e00
> > x27: ffff80001005bb98 x26: dead000000000100
> > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > x23: ffff608e74cf8818 x22: ffff608e738be600
> > x21: ffff80001005bbc8 x20: ffff608e738be638
> > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > x17: 0000000000000041 x16: ffffba61b5171764
> > x15: 0000000000000004 x14: 0000000000000fff
> > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > x11: 0000000000000000 x10: 0000000000000000
> > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > x7 : 0000000000000000 x6 : 0000000000000000
> > x5 : 0000000000000000 x4 : 0000000000000000
> > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > Call trace:
> > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> >
> > When booting the kdump kernel, drain the shared memory cache while being
> > careful to not translate the addresses returned from
> > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > and the cache is disabled, proceed with re-enabling the cache so that we
> > aren't dealing with invalid addresses while shutting down the kdump
> > kernel.
> >
> > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > ---
> >
> > This patch fixes a crash introduced by "optee: fix tee out of memory
> > failure seen during kexec reboot"[1]. However, I don't think that the
> > original two patch series[2] plus this patch is the full solution to
> > properly handling OP-TEE shared memory across kexec.
> >
> > While testing this fix, I did about 10 kexec reboots and then triggered
> > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > became unresponsive during boot while steadily streaming the following
> > errors to the serial console:
> >
> > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> >
> > I suspect that this is related to the problems of OP-TEE shared memory
> > handling across kexec. My current hunch is that while we've disabled the
> > shared memory cache with this patch, we haven't unregistered all of the
> > addresses that the previous kernel (which crashed) had registered with
> > OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> > addresses?
> >
> > I'm still pretty early in investigating that assumption and
> > I'm learning about OP-TEE as I go but I wanted to get this initial
> > fix-of-the-fix out so that it was clear that the v2 of the series[2] is
> > not complete.
> >
> > [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
> > [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
> >
> > drivers/tee/optee/call.c          | 11 ++++++++++-
> > drivers/tee/optee/core.c          | 13 +++++++++++--
> > drivers/tee/optee/optee_private.h |  2 +-
> > 3 files changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> > index 6132cc8d014c..799e84bec63d 100644
> > --- a/drivers/tee/optee/call.c
> > +++ b/drivers/tee/optee/call.c
> > @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
> >  * optee_disable_shm_cache() - Disables caching of some shared memory allocation
> >  *                          in OP-TEE
> >  * @optee:    main service struct
> > + * @is_mapped:       true if the cached shared memory addresses were mapped by this
> > + *           kernel, are safe to dereference, and should be freed
> >  */
> > -void optee_disable_shm_cache(struct optee *optee)
> > +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
> > {
> >       struct optee_call_waiter w;
> >
> > @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
> >               if (res.result.status == OPTEE_SMC_RETURN_OK) {
> >                       struct tee_shm *shm;
> >
>
>  Thanks Tyler.
>  From what I understand from my email exchange with Jens, I don’t
> think we want to touch optee_disable_shm_cache(). I could be wrong, though.
> @Jens, comments?

Changing optee_disable_shm_cache() is fine. Bear in mind that there
are other times where we can't recover from a kernel crash. For
instance if a thread is executing in OP-TEE in secure world.

Cheers,
Jens

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
@ 2021-05-07  9:23       ` Jens Wiklander
  0 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-05-07  9:23 UTC (permalink / raw)
  To: Allen Pais
  Cc: Tyler Hicks, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
>
>
>
> > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> >
> > The .shutdown hook is not called after a kernel crash when a kdump
> > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > quickly as possible without allowing drivers to clean up.
> >
> > That means that the OP-TEE shared memory cache, which was initialized by
> > the kernel that crashed, is still in place when the kdump kernel is
> > booted. As the kdump kernel is shutdown, the .shutdown hook is called,
> > which calls optee_disable_shm_cache(), and OP-TEE's
> > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > mapped for the kdump kernel since the cache was set up by the previous
> > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > the address results in a fault that cannot be handled:
> >
> > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > Mem abort info:
> >   ESR = 0x96000004
> >   EC = 0x25: DABT (current EL), IL = 32 bits
> >   SET = 0, FnV = 0
> >   EA = 0, S1PTW = 0
> > Data abort info:
> >   ISV = 0, ISS = 0x00000004
> >   CM = 0, WnR = 0
> > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > Internal error: Oops: 96000004 [#1] SMP
> > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > Hardware name: Redacted (DT)
> > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > sp : ffff80001005bb70
> > x29: ffff80001005bb70 x28: ffff608e74648e00
> > x27: ffff80001005bb98 x26: dead000000000100
> > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > x23: ffff608e74cf8818 x22: ffff608e738be600
> > x21: ffff80001005bbc8 x20: ffff608e738be638
> > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > x17: 0000000000000041 x16: ffffba61b5171764
> > x15: 0000000000000004 x14: 0000000000000fff
> > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > x11: 0000000000000000 x10: 0000000000000000
> > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > x7 : 0000000000000000 x6 : 0000000000000000
> > x5 : 0000000000000000 x4 : 0000000000000000
> > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > Call trace:
> > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> >
> > When booting the kdump kernel, drain the shared memory cache while being
> > careful to not translate the addresses returned from
> > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > and the cache is disabled, proceed with re-enabling the cache so that we
> > aren't dealing with invalid addresses while shutting down the kdump
> > kernel.
> >
> > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > ---
> >
> > This patch fixes a crash introduced by "optee: fix tee out of memory
> > failure seen during kexec reboot"[1]. However, I don't think that the
> > original two patch series[2] plus this patch is the full solution to
> > properly handling OP-TEE shared memory across kexec.
> >
> > While testing this fix, I did about 10 kexec reboots and then triggered
> > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > became unresponsive during boot while steadily streaming the following
> > errors to the serial console:
> >
> > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> >
> > I suspect that this is related to the problems of OP-TEE shared memory
> > handling across kexec. My current hunch is that while we've disabled the
> > shared memory cache with this patch, we haven't unregistered all of the
> > addresses that the previous kernel (which crashed) had registered with
> > OP-TEE and that perhaps OP-TEE OS is still trying to make use those
> > addresses?
> >
> > I'm still pretty early in investigating that assumption and
> > I'm learning about OP-TEE as I go but I wanted to get this initial
> > fix-of-the-fix out so that it was clear that the v2 of the series[2] is
> > not complete.
> >
> > [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
> > [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
> >
> > drivers/tee/optee/call.c          | 11 ++++++++++-
> > drivers/tee/optee/core.c          | 13 +++++++++++--
> > drivers/tee/optee/optee_private.h |  2 +-
> > 3 files changed, 22 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> > index 6132cc8d014c..799e84bec63d 100644
> > --- a/drivers/tee/optee/call.c
> > +++ b/drivers/tee/optee/call.c
> > @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
> >  * optee_disable_shm_cache() - Disables caching of some shared memory allocation
> >  *                          in OP-TEE
> >  * @optee:    main service struct
> > + * @is_mapped:       true if the cached shared memory addresses were mapped by this
> > + *           kernel, are safe to dereference, and should be freed
> >  */
> > -void optee_disable_shm_cache(struct optee *optee)
> > +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
> > {
> >       struct optee_call_waiter w;
> >
> > @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
> >               if (res.result.status == OPTEE_SMC_RETURN_OK) {
> >                       struct tee_shm *shm;
> >
>
>  Thanks Tyler.
>  From what I understand from my email exchange with Jens, I don’t
> think we want to touch optee_disable_shm_cache(). I could be wrong, though.
> @Jens, comments?

Changing optee_disable_shm_cache() is fine. Bear in mind that there
are other times where we can't recover from a kernel crash. For
instance if a thread is executing in OP-TEE in secure world.

Cheers,
Jens

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-07  9:23       ` Jens Wiklander
@ 2021-05-07  9:32         ` Allen Pais
  -1 siblings, 0 replies; 56+ messages in thread
From: Allen Pais @ 2021-05-07  9:32 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Tyler Hicks, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

>> 
>> 
>>> On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
>>> 
>>> The .shutdown hook is not called after a kernel crash when a kdump
>>> kernel is pre-loaded. A kexec into the kdump kernel takes place as
>>> quickly as possible without allowing drivers to clean up.
>>> 
>>> That means that the OP-TEE shared memory cache, which was initialized by
>>> the kernel that crashed, is still in place when the kdump kernel is
>>> booted. As the kdump kernel is shut down, the .shutdown hook is called,
>>> which calls optee_disable_shm_cache(), and OP-TEE's
>>> OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
>>> mapped for the kdump kernel since the cache was set up by the previous
>>> kernel. Trying to dereference the tee_shm pointer or otherwise translate
>>> the address results in a fault that cannot be handled:
>>> 
>>> Unable to handle kernel paging request at virtual address ffff4317b9c09744
>>> Mem abort info:
>>>  ESR = 0x96000004
>>>  EC = 0x25: DABT (current EL), IL = 32 bits
>>>  SET = 0, FnV = 0
>>>  EA = 0, S1PTW = 0
>>> Data abort info:
>>>  ISV = 0, ISS = 0x00000004
>>>  CM = 0, WnR = 0
>>> swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
>>> [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
>>> Internal error: Oops: 96000004 [#1] SMP
>>> Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
>>> CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
>>> Hardware name: Redacted (DT)
>>> pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
>>> pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
>>> lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
>>> sp : ffff80001005bb70
>>> x29: ffff80001005bb70 x28: ffff608e74648e00
>>> x27: ffff80001005bb98 x26: dead000000000100
>>> x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
>>> x23: ffff608e74cf8818 x22: ffff608e738be600
>>> x21: ffff80001005bbc8 x20: ffff608e738be638
>>> x19: ffff4317b9c09700 x18: ffffffffffffffff
>>> x17: 0000000000000041 x16: ffffba61b5171764
>>> x15: 0000000000000004 x14: 0000000000000fff
>>> x13: ffffba61b5c9dfc8 x12: 0000000000000003
>>> x11: 0000000000000000 x10: 0000000000000000
>>> x9 : ffffba61b5413824 x8 : 00000000ffff4317
>>> x7 : 0000000000000000 x6 : 0000000000000000
>>> x5 : 0000000000000000 x4 : 0000000000000000
>>> x3 : 0000000000000000 x2 : ffff4317b9c09700
>>> x1 : 00000000ffff4317 x0 : ffff4317b9c09700
>>> Call trace:
>>> tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
>>> optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
>>> optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
>>> platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
>>> device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
>>> kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
>>> __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
>>> do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
>>> el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
>>> el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
>>> el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
>>> Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
>>> 
>>> When booting the kdump kernel, drain the shared memory cache while being
>>> careful to not translate the addresses returned from
>>> OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
>>> and the cache is disabled, proceed with re-enabling the cache so that we
>>> aren't dealing with invalid addresses while shutting down the kdump
>>> kernel.
>>> 
>>> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
>>> ---
>>> 
>>> This patch fixes a crash introduced by "optee: fix tee out of memory
>>> failure seen during kexec reboot"[1]. However, I don't think that the
>>> original two patch series[2] plus this patch is the full solution to
>>> properly handling OP-TEE shared memory across kexec.
>>> 
>>> While testing this fix, I did about 10 kexec reboots and then triggered
>>> a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
>>> became unresponsive during boot while steadily streaming the following
>>> errors to the serial console:
>>> 
>>> arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
>>> arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
>>> 
>>> I suspect that this is related to the problems of OP-TEE shared memory
>>> handling across kexec. My current hunch is that while we've disabled the
>>> shared memory cache with this patch, we haven't unregistered all of the
>>> addresses that the previous kernel (which crashed) had registered with
>>> OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
>>> addresses?
>>> 
>>> I'm still pretty early in investigating that assumption and
>>> I'm learning about OP-TEE as I go but I wanted to get this initial
>>> fix-of-the-fix out so that it was clear that the v2 of the series[2] is
>>> not complete.
>>> 
>>> [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
>>> [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
>>> 
>>> drivers/tee/optee/call.c          | 11 ++++++++++-
>>> drivers/tee/optee/core.c          | 13 +++++++++++--
>>> drivers/tee/optee/optee_private.h |  2 +-
>>> 3 files changed, 22 insertions(+), 4 deletions(-)
>>> 
>>> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
>>> index 6132cc8d014c..799e84bec63d 100644
>>> --- a/drivers/tee/optee/call.c
>>> +++ b/drivers/tee/optee/call.c
>>> @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
>>> * optee_disable_shm_cache() - Disables caching of some shared memory allocation
>>> *                          in OP-TEE
>>> * @optee:    main service struct
>>> + * @is_mapped:       true if the cached shared memory addresses were mapped by this
>>> + *           kernel, are safe to dereference, and should be freed
>>> */
>>> -void optee_disable_shm_cache(struct optee *optee)
>>> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
>>> {
>>>      struct optee_call_waiter w;
>>> 
>>> @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
>>>              if (res.result.status == OPTEE_SMC_RETURN_OK) {
>>>                      struct tee_shm *shm;
>>> 
>> 
>> Thanks Tyler.
>> From what I understand from my email exchange with Jens, I don’t
>> think we want to touch optee_disable_shm_cache(). I could be wrong, though.
>> @Jens, comments?
> 
> Changing optee_disable_shm_cache() is fine. Bear in mind that there
> are other times where we can't recover from a kernel crash. For
> instance if a thread is executing in OP-TEE in secure world.

I agree. My bad, I meant, “we don’t want to touch optee_disable_shm_cache()”.
And precisely for the reason you have mentioned above.

Thanks.


* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-07  9:23       ` Jens Wiklander
@ 2021-05-07 13:17         ` Tyler Hicks
  -1 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-05-07 13:17 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On 2021-05-07 11:23:17, Jens Wiklander wrote:
> On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> >
> >
> >
> > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > >
> > > The .shutdown hook is not called after a kernel crash when a kdump
> > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > quickly as possible without allowing drivers to clean up.
> > >
> > > That means that the OP-TEE shared memory cache, which was initialized by
> > > the kernel that crashed, is still in place when the kdump kernel is
> > > booted. As the kdump kernel is shut down, the .shutdown hook is called,
> > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > mapped for the kdump kernel since the cache was set up by the previous
> > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > the address results in a fault that cannot be handled:
> > >
> > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > Mem abort info:
> > >   ESR = 0x96000004
> > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > >   SET = 0, FnV = 0
> > >   EA = 0, S1PTW = 0
> > > Data abort info:
> > >   ISV = 0, ISS = 0x00000004
> > >   CM = 0, WnR = 0
> > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > Internal error: Oops: 96000004 [#1] SMP
> > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > Hardware name: Redacted (DT)
> > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > sp : ffff80001005bb70
> > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > x27: ffff80001005bb98 x26: dead000000000100
> > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > x17: 0000000000000041 x16: ffffba61b5171764
> > > x15: 0000000000000004 x14: 0000000000000fff
> > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > x11: 0000000000000000 x10: 0000000000000000
> > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > x7 : 0000000000000000 x6 : 0000000000000000
> > > x5 : 0000000000000000 x4 : 0000000000000000
> > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > Call trace:
> > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > >
> > > When booting the kdump kernel, drain the shared memory cache while being
> > > careful to not translate the addresses returned from
> > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > aren't dealing with invalid addresses while shutting down the kdump
> > > kernel.
> > >
> > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > ---
> > >
> > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > original two patch series[2] plus this patch is the full solution to
> > > properly handling OP-TEE shared memory across kexec.
> > >
> > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > became unresponsive during boot while steadily streaming the following
> > > errors to the serial console:
> > >
> > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > >
> > > I suspect that this is related to the problems of OP-TEE shared memory
> > > handling across kexec. My current hunch is that while we've disabled the
> > > shared memory cache with this patch, we haven't unregistered all of the
> > > addresses that the previous kernel (which crashed) had registered with
> > > OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> > > addresses?

@Jens did you have any thoughts on what could be happening here with the
arm-smmu errors? Do I need to try to unregister the cached shared memory
addresses when booting the kdump kernel, rather than just disabling the
caches?

Tyler

> > >
> > > I'm still pretty early in investigating that assumption and
> > > I'm learning about OP-TEE as I go but I wanted to get this initial
> > > fix-of-the-fix out so that it was clear that the v2 of the series[2] is
> > > not complete.
> > >
> > > [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
> > > [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
> > >
> > > drivers/tee/optee/call.c          | 11 ++++++++++-
> > > drivers/tee/optee/core.c          | 13 +++++++++++--
> > > drivers/tee/optee/optee_private.h |  2 +-
> > > 3 files changed, 22 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> > > index 6132cc8d014c..799e84bec63d 100644
> > > --- a/drivers/tee/optee/call.c
> > > +++ b/drivers/tee/optee/call.c
> > > @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
> > >  * optee_disable_shm_cache() - Disables caching of some shared memory allocation
> > >  *                          in OP-TEE
> > >  * @optee:    main service struct
> > > + * @is_mapped:       true if the cached shared memory addresses were mapped by this
> > > + *           kernel, are safe to dereference, and should be freed
> > >  */
> > > -void optee_disable_shm_cache(struct optee *optee)
> > > +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
> > > {
> > >       struct optee_call_waiter w;
> > >
> > > @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
> > >               if (res.result.status == OPTEE_SMC_RETURN_OK) {
> > >                       struct tee_shm *shm;
> > >
> >
> >  Thanks Tyler.
> >  From what I understand from my email exchange with Jens, I don’t
> > think we want to touch optee_disable_shm_cache(). I could be wrong, though.
> > @Jens, comments?
> 
> Changing optee_disable_shm_cache() is fine. Bear in mind that there
> are other times where we can't recover from a kernel crash. For
> instance if a thread is executing in OP-TEE in secure world.
> 
> Cheers,
> Jens
> 


* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
@ 2021-05-07 13:17         ` Tyler Hicks
  0 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-05-07 13:17 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On 2021-05-07 11:23:17, Jens Wiklander wrote:
> On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> >
> >
> >
> > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > >
> > > The .shutdown hook is not called after a kernel crash when a kdump
> > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > quickly as possible without allowing drivers to clean up.
> > >
> > > That means that the OP-TEE shared memory cache, which was initialized by
> > > the kernel that crashed, is still in place when the kdump kernel is
> > > booted. As the kdump kernel is shutdown, the .shutdown hook is called,
> > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > mapped for the kdump kernel since the cache was set up by the previous
> > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > the address results in a fault that cannot be handled:
> > >
> > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > Mem abort info:
> > >   ESR = 0x96000004
> > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > >   SET = 0, FnV = 0
> > >   EA = 0, S1PTW = 0
> > > Data abort info:
> > >   ISV = 0, ISS = 0x00000004
> > >   CM = 0, WnR = 0
> > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > Internal error: Oops: 96000004 [#1] SMP
> > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > Hardware name: Redacted (DT)
> > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > sp : ffff80001005bb70
> > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > x27: ffff80001005bb98 x26: dead000000000100
> > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > x17: 0000000000000041 x16: ffffba61b5171764
> > > x15: 0000000000000004 x14: 0000000000000fff
> > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > x11: 0000000000000000 x10: 0000000000000000
> > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > x7 : 0000000000000000 x6 : 0000000000000000
> > > x5 : 0000000000000000 x4 : 0000000000000000
> > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > Call trace:
> > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > >
> > > When booting the kdump kernel, drain the shared memory cache while being
> > > careful to not translate the addresses returned from
> > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > aren't dealing with invalid addresses while shutting down the kdump
> > > kernel.
> > >
> > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > ---
> > >
> > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > original two patch series[2] plus this patch is the full solution to
> > > properly handling OP-TEE shared memory across kexec.
> > >
> > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > became unresponsive during boot while steadily streaming the following
> > > errors to the serial console:
> > >
> > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > >
> > > I suspect that this is related to the problems of OP-TEE shared memory
> > > handling across kexec. My current hunch is that while we've disabled the
> > > shared memory cache with this patch, we haven't unregistered all of the
> > > addresses that the previous kernel (which crashed) had registered with
> > > OP-TEE and that perhaps OP-TEE OS is still trying to make use those
> > > addresses?

@Jens did you have any thoughts on what could be happening here with the
arm-smmu errors? Do I need to try to unregister the cached shared memory
addresses when booting the kdump kernel, rather than just disabling the
caches?

Tyler

> > >
> > > I'm still pretty early in investigating that assumption and
> > > I'm learning about OP-TEE as I go but I wanted to get this initial
> > > fix-of-the-fix out so that it was clear that the v2 of the series[2] is
> > > not complete.
> > >
> > > [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
> > > [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
> > >
> > > drivers/tee/optee/call.c          | 11 ++++++++++-
> > > drivers/tee/optee/core.c          | 13 +++++++++++--
> > > drivers/tee/optee/optee_private.h |  2 +-
> > > 3 files changed, 22 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> > > index 6132cc8d014c..799e84bec63d 100644
> > > --- a/drivers/tee/optee/call.c
> > > +++ b/drivers/tee/optee/call.c
> > > @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
> > >  * optee_disable_shm_cache() - Disables caching of some shared memory allocation
> > >  *                          in OP-TEE
> > >  * @optee:    main service struct
> > > + * @is_mapped:       true if the cached shared memory addresses were mapped by this
> > > + *           kernel, are safe to dereference, and should be freed
> > >  */
> > > -void optee_disable_shm_cache(struct optee *optee)
> > > +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
> > > {
> > >       struct optee_call_waiter w;
> > >
> > > @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
> > >               if (res.result.status == OPTEE_SMC_RETURN_OK) {
> > >                       struct tee_shm *shm;
> > >
> >
> >  Thanks Tyler.
> >  From what I understand from my email exchange with Jens, I don’t
> > think we want to touch optee_disable_shm_cache(). I could be wrong, though.
> > @Jens, comments?
> 
> Changing optee_disable_shm_cache() is fine. Bear in mind that there
> are other times where we can't recover from a kernel crash. For
> instance if a thread is executing in OP-TEE in secure world.
> 
> Cheers,
> Jens
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-07 13:17         ` Tyler Hicks
@ 2021-05-10  7:31           ` Jens Wiklander
  -1 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-05-10  7:31 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: Allen Pais, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On Fri, May 7, 2021 at 3:17 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
>
> On 2021-05-07 11:23:17, Jens Wiklander wrote:
> > On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> > >
> > >
> > >
> > > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > > >
> > > > The .shutdown hook is not called after a kernel crash when a kdump
> > > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > > quickly as possible without allowing drivers to clean up.
> > > >
> > > > That means that the OP-TEE shared memory cache, which was initialized by
> > > > the kernel that crashed, is still in place when the kdump kernel is
> > > > booted. As the kdump kernel is shut down, the .shutdown hook is called,
> > > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > > mapped for the kdump kernel since the cache was set up by the previous
> > > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > > the address results in a fault that cannot be handled:
> > > >
> > > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > > Mem abort info:
> > > >   ESR = 0x96000004
> > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > >   SET = 0, FnV = 0
> > > >   EA = 0, S1PTW = 0
> > > > Data abort info:
> > > >   ISV = 0, ISS = 0x00000004
> > > >   CM = 0, WnR = 0
> > > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > > Internal error: Oops: 96000004 [#1] SMP
> > > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > > Hardware name: Redacted (DT)
> > > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > sp : ffff80001005bb70
> > > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > > x27: ffff80001005bb98 x26: dead000000000100
> > > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > > x17: 0000000000000041 x16: ffffba61b5171764
> > > > x15: 0000000000000004 x14: 0000000000000fff
> > > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > > x11: 0000000000000000 x10: 0000000000000000
> > > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > > x7 : 0000000000000000 x6 : 0000000000000000
> > > > x5 : 0000000000000000 x4 : 0000000000000000
> > > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > > Call trace:
> > > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > > >
> > > > When booting the kdump kernel, drain the shared memory cache while being
> > > > careful to not translate the addresses returned from
> > > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > > aren't dealing with invalid addresses while shutting down the kdump
> > > > kernel.
> > > >
> > > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > > ---
> > > >
> > > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > > original two patch series[2] plus this patch is the full solution to
> > > > properly handling OP-TEE shared memory across kexec.
> > > >
> > > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > > became unresponsive during boot while steadily streaming the following
> > > > errors to the serial console:
> > > >
> > > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > > >
> > > > I suspect that this is related to the problems of OP-TEE shared memory
> > > > handling across kexec. My current hunch is that while we've disabled the
> > > > shared memory cache with this patch, we haven't unregistered all of the
> > > > addresses that the previous kernel (which crashed) had registered with
> > > > OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> > > > addresses?
>
> @Jens did you have any thoughts on what could be happening here with the
> arm-smmu errors? Do I need to try to unregister the cached shared memory
> addresses when booting the kdump kernel, rather than just disabling the
> caches?

No idea. There's no support for SMMU in upstream OP-TEE. Just
disabling the caches should be good enough. You could try to never
enable the cache to see if it makes any difference.

Cheers,
Jens
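
The drain-without-dereference idea from the quoted patch description can
be sketched in user space as follows. This is only a model under stated
assumptions, not the driver code: every name prefixed with fake_ is
hypothetical, and the only behavior taken from the thread is that each
OPTEE_SMC_DISABLE_SHM_CACHE call hands back one cached object until the
cache is empty, and that the returned address must not be touched unless
this kernel mapped it (the is_mapped flag).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the OPTEE_SMC_RETURN_* status values. */
enum fake_status { FAKE_RETURN_OK, FAKE_RETURN_ENOTAVAIL };

/* Hypothetical stand-in for struct tee_shm. */
struct fake_shm { int id; };

#define FAKE_CACHE_MAX 8

/* Simulated secure-world cache: cookies are handed back one per call. */
struct fake_tee {
	struct fake_shm *cached[FAKE_CACHE_MAX];
	size_t count;
};

/*
 * Models a single OPTEE_SMC_DISABLE_SHM_CACHE call: return one cached
 * cookie, or report that the cache is empty (and therefore disabled).
 */
static enum fake_status fake_disable_shm_cache_call(struct fake_tee *tee,
						     struct fake_shm **cookie)
{
	if (tee->count == 0)
		return FAKE_RETURN_ENOTAVAIL;
	*cookie = tee->cached[--tee->count];
	return FAKE_RETURN_OK;
}

/*
 * Drain the cache. When is_mapped is false (the kdump case), the cookie
 * was created by the previous kernel, so it is dropped without being
 * dereferenced or freed. Returns the number of objects freed.
 */
static size_t drain_shm_cache(struct fake_tee *tee, bool is_mapped)
{
	struct fake_shm *shm = NULL;
	size_t freed = 0;

	assert(tee != NULL);
	while (fake_disable_shm_cache_call(tee, &shm) == FAKE_RETURN_OK) {
		if (is_mapped) {
			free(shm);
			freed++;
		}
	}
	return freed;
}
```

Calling drain_shm_cache(&tee, false) early in the kdump kernel's probe
empties the stale cache without touching any address from the crashed
kernel; a later call with true, at shutdown, then only ever sees
objects that this kernel allocated itself.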

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-10  7:31           ` Jens Wiklander
@ 2021-05-12  0:23             ` Tyler Hicks
  -1 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-05-12  0:23 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On 2021-05-10 09:31:51, Jens Wiklander wrote:
> On Fri, May 7, 2021 at 3:17 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> >
> > On 2021-05-07 11:23:17, Jens Wiklander wrote:
> > > On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> > > >
> > > >
> > > >
> > > > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > > > >
> > > > > The .shutdown hook is not called after a kernel crash when a kdump
> > > > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > > > quickly as possible without allowing drivers to clean up.
> > > > >
> > > > > That means that the OP-TEE shared memory cache, which was initialized by
> > > > > the kernel that crashed, is still in place when the kdump kernel is
> > > > > booted. As the kdump kernel is shut down, the .shutdown hook is called,
> > > > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > > > mapped for the kdump kernel since the cache was set up by the previous
> > > > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > > > the address results in a fault that cannot be handled:
> > > > >
> > > > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > > > Mem abort info:
> > > > >   ESR = 0x96000004
> > > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > >   SET = 0, FnV = 0
> > > > >   EA = 0, S1PTW = 0
> > > > > Data abort info:
> > > > >   ISV = 0, ISS = 0x00000004
> > > > >   CM = 0, WnR = 0
> > > > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > > > Internal error: Oops: 96000004 [#1] SMP
> > > > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > > > Hardware name: Redacted (DT)
> > > > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > sp : ffff80001005bb70
> > > > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > > > x27: ffff80001005bb98 x26: dead000000000100
> > > > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > > > x17: 0000000000000041 x16: ffffba61b5171764
> > > > > x15: 0000000000000004 x14: 0000000000000fff
> > > > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > > > x11: 0000000000000000 x10: 0000000000000000
> > > > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > > > x7 : 0000000000000000 x6 : 0000000000000000
> > > > > x5 : 0000000000000000 x4 : 0000000000000000
> > > > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > > > Call trace:
> > > > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > > > >
> > > > > When booting the kdump kernel, drain the shared memory cache while being
> > > > > careful to not translate the addresses returned from
> > > > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > > > aren't dealing with invalid addresses while shutting down the kdump
> > > > > kernel.
> > > > >
> > > > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > > > ---
> > > > >
> > > > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > > > original two patch series[2] plus this patch is the full solution to
> > > > > properly handling OP-TEE shared memory across kexec.
> > > > >
> > > > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > > > became unresponsive during boot while steadily streaming the following
> > > > > errors to the serial console:
> > > > >
> > > > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > > > >
> > > > > I suspect that this is related to the problems of OP-TEE shared memory
> > > > > handling across kexec. My current hunch is that while we've disabled the
> > > > > shared memory cache with this patch, we haven't unregistered all of the
> > > > > addresses that the previous kernel (which crashed) had registered with
> > > > > OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> > > > > addresses?
> >
> > @Jens did you have any thoughts on what could be happening here with the
> > arm-smmu errors? Do I need to try to unregister the cached shared memory
> > addresses when booting the kdump kernel, rather than just disabling the
> > caches?
> 
> No idea. There's no support for SMMU in upstream OP-TEE. Just
> disabling the caches should be good enough. You could try to never
> enable the cache to see if it makes any difference.

I think this is unrelated to OP-TEE and has more to do with DMA activity
that is still ongoing after the kernel has crashed and we've done an
emergency kexec into the kdump kernel, which didn't shut down the SMMU.
The SoC I'm using
has a v2 SMMU and I think something similar to commit 3f54c447df34
("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel") is needed for
the v1/v2 SMMU driver. I've prototyped a patch that disables the SMMU
interrupts (GFIE and GCFGFIE) in the kdump kernel and testing has looked
good so far. I'll send that out as a separate patch after a little more
testing.
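
A minimal sketch of the kind of masking described above, not the actual
prototype patch. It only shows the bit manipulation on a copy of the
sCR0 value. The bit positions (GFIE as bit 2, GCFGFIE as bit 5) are an
assumption based on the SMMU v1/v2 register layout and should be
checked against the driver header; scr0_for_reset() and its
kdump_kernel parameter are hypothetical names. In the kernel, the
condition would presumably come from is_kdump_kernel().

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed sCR0 bit positions; verify against the arm-smmu driver header. */
#define SCR0_GFIE	(UINT32_C(1) << 2)	/* global fault interrupt enable */
#define SCR0_GCFGFIE	(UINT32_C(1) << 5)	/* global config fault interrupt enable */

/*
 * Hypothetical helper: given the sCR0 value the driver is about to
 * write during reset, mask the two global fault interrupt enables when
 * running as the kdump kernel, so that faults caused by DMA left over
 * from the crashed kernel do not flood the console.
 */
static uint32_t scr0_for_reset(uint32_t scr0, bool kdump_kernel)
{
	if (kdump_kernel)
		scr0 &= ~(SCR0_GFIE | SCR0_GCFGFIE);
	return scr0;
}
```

All other sCR0 bits pass through unchanged, so fault reporting in the
status registers is unaffected; only interrupt delivery is suppressed.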

However, with that change and my earlier change to disable the shm cache
during boot, I'm periodically seeing a different issue while the kdump
kernel is coming up. I'm pretty certain it was already there before but
I wasn't seeing it as often since the SMMU warnings were so "loud".

The kernel waits indefinitely for a secure world thread and boot hangs
completely:

[  243.359489] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  243.366141]       Not tainted 5.4.83-microsoft-standard #1
[  243.371802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.379882] swapper/0       D    0     1      0 0x00000028
[  243.385543] Call trace:
[  243.388080]  __switch_to+0xc8/0x118
[  243.391683]  __schedule+0x2e0/0x700
[  243.395280]  schedule+0x38/0xb8
[  243.398522]  schedule_timeout+0x258/0x388
[  243.402659]  wait_for_completion+0x16c/0x4b8
[  243.407067]  optee_cq_wait_for_completion+0x28/0xa8
[  243.412100]  optee_disable_shm_cache+0xb8/0xf8
[  243.416685]  optee_probe+0x560/0x61c
[  243.420375]  platform_drv_probe+0x58/0xa8
[  243.424512]  really_probe+0xe0/0x338
[  243.428202]  driver_probe_device+0x5c/0xf0
[  243.432427]  device_driver_attach+0x74/0x80
[  243.436744]  __driver_attach+0x64/0xe0
[  243.440611]  bus_for_each_dev+0x84/0xd8
[  243.444570]  driver_attach+0x30/0x40
[  243.448258]  bus_add_driver+0x188/0x1e8
[  243.452215]  driver_register+0x64/0x110
[  243.456172]  __platform_driver_register+0x54/0x60
[  243.461027]  optee_driver_init+0x20/0x28
[  243.465075]  do_one_initcall+0x54/0x24c
[  243.469034]  kernel_init_freeable+0x1e8/0x2c0
[  243.473529]  kernel_init+0x18/0x118
[  243.477128]  ret_from_fork+0x10/0x18

I'm unable to trigger a sysrq over the serial console of this remote
machine so I don't yet know what the other threads on the system are
doing during this time. I'll hack something together tomorrow to get a
better idea.

The blocked task warning reminded me of when you said this earlier:

> Bear in mind that there are other times where we can't recover from a
> kernel crash. For instance if a thread is executing in OP-TEE in
> secure world. 

I suspect that it is related to what I'm seeing with this blocked task. Can you
expand on why we can't recover from a kernel crash if a thread is
executing in the secure world?

I appreciate your help!

Tyler

> 
> Cheers,
> Jens
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
@ 2021-05-12  0:23             ` Tyler Hicks
  0 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-05-12  0:23 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On 2021-05-10 09:31:51, Jens Wiklander wrote:
> On Fri, May 7, 2021 at 3:17 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> >
> > On 2021-05-07 11:23:17, Jens Wiklander wrote:
> > > On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> > > >
> > > >
> > > >
> > > > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > > > >
> > > > > The .shutdown hook is not called after a kernel crash when a kdump
> > > > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > > > quickly as possible without allowing drivers to clean up.
> > > > >
> > > > > That means that the OP-TEE shared memory cache, which was initialized by
> > > > > the kernel that crashed, is still in place when the kdump kernel is
> > > > > booted. As the kdump kernel is shut down, the .shutdown hook is called,
> > > > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > > > mapped for the kdump kernel since the cache was set up by the previous
> > > > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > > > the address results in a fault that cannot be handled:
> > > > >
> > > > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > > > Mem abort info:
> > > > >   ESR = 0x96000004
> > > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > >   SET = 0, FnV = 0
> > > > >   EA = 0, S1PTW = 0
> > > > > Data abort info:
> > > > >   ISV = 0, ISS = 0x00000004
> > > > >   CM = 0, WnR = 0
> > > > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > > > Internal error: Oops: 96000004 [#1] SMP
> > > > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > > > Hardware name: Redacted (DT)
> > > > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > sp : ffff80001005bb70
> > > > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > > > x27: ffff80001005bb98 x26: dead000000000100
> > > > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > > > x17: 0000000000000041 x16: ffffba61b5171764
> > > > > x15: 0000000000000004 x14: 0000000000000fff
> > > > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > > > x11: 0000000000000000 x10: 0000000000000000
> > > > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > > > x7 : 0000000000000000 x6 : 0000000000000000
> > > > > x5 : 0000000000000000 x4 : 0000000000000000
> > > > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > > > Call trace:
> > > > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > > > >
> > > > > When booting the kdump kernel, drain the shared memory cache while being
> > > > > careful to not translate the addresses returned from
> > > > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > > > aren't dealing with invalid addresses while shutting down the kdump
> > > > > kernel.
> > > > >
> > > > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > > > ---
> > > > >
> > > > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > > > original two patch series[2] plus this patch is the full solution to
> > > > > properly handling OP-TEE shared memory across kexec.
> > > > >
> > > > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > > > became unresponsive during boot while steadily streaming the following
> > > > > errors to the serial console:
> > > > >
> > > > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > > > >
> > > > > I suspect that this is related to the problems of OP-TEE shared memory
> > > > > handling across kexec. My current hunch is that while we've disabled the
> > > > > shared memory cache with this patch, we haven't unregistered all of the
> > > > > addresses that the previous kernel (which crashed) had registered with
> > > > > OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> > > > > addresses?
> >
> > @Jens did you have any thoughts on what could be happening here with the
> > arm-smmu errors? Do I need to try to unregister the cached shared memory
> > addresses when booting the kdump kernel, rather than just disabling the
> > caches?
> 
> No idea. There's no support for SMMU in upstream OP-TEE. Just
> disabling the caches should be good enough. You could try to never
> enable the cache to see if it makes any difference.

I think this is unrelated to OP-TEE and more to do with ongoing DMA
activity when the kernel has crashed and we've done an emergency kexec
into the kdump kernel which didn't shut down the SMMU. The SoC I'm using
has a v2 SMMU and I think something similar to commit 3f54c447df34
("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel") is needed for
the v1/v2 SMMU driver. I've prototyped a patch that disables the SMMU
interrupts (GFIE and GCFGFIE) in the kdump kernel and testing has looked
good so far. I'll send that out as a separate patch after a little more
testing.
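
For illustration only, a minimal user-space sketch of the kind of masking
described above. It is not the prototype patch. The helper name
scr0_for_kdump() is hypothetical, and the bit positions are assumptions
based on the sCR0 layout used by the v1/v2 arm-smmu driver; verify them
against the driver header before relying on them.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Assumed sCR0 bit positions (check against the arm-smmu driver header):
 * GFIE    = global fault interrupt enable
 * GCFGFIE = global configuration fault interrupt enable
 */
#define SCR0_GFIE	(UINT32_C(1) << 2)
#define SCR0_GCFGFIE	(UINT32_C(1) << 5)

/*
 * Hypothetical helper: given the sCR0 value the driver would normally
 * program, return the value to use instead. In a kdump kernel the two
 * global fault interrupt enables are cleared so that faults caused by
 * DMA left running by the crashed kernel do not flood the console.
 * All other bits are left untouched.
 */
static uint32_t scr0_for_kdump(uint32_t scr0, bool is_kdump)
{
	if (is_kdump)
		scr0 &= ~(SCR0_GFIE | SCR0_GCFGFIE);
	return scr0;
}
```

With an input of 0x36 (bits 1, 2, 4 and 5 set), the normal case returns
0x36 unchanged and the kdump case returns 0x12. In a real driver the
is_kdump argument would come from something like is_kdump_kernel().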

However, with that change and my earlier change to disable the shm cache
during boot, I'm periodically seeing a different issue while the kdump
kernel is coming up. I'm pretty certain it was already there before but
I wasn't seeing it as often since the SMMU warnings were so "loud".

The kernel waits indefinitely for a secure world thread and boot hangs
completely:

[  243.359489] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  243.366141]       Not tainted 5.4.83-microsoft-standard #1
[  243.371802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  243.379882] swapper/0       D    0     1      0 0x00000028
[  243.385543] Call trace:
[  243.388080]  __switch_to+0xc8/0x118
[  243.391683]  __schedule+0x2e0/0x700
[  243.395280]  schedule+0x38/0xb8
[  243.398522]  schedule_timeout+0x258/0x388
[  243.402659]  wait_for_completion+0x16c/0x4b8
[  243.407067]  optee_cq_wait_for_completion+0x28/0xa8
[  243.412100]  optee_disable_shm_cache+0xb8/0xf8
[  243.416685]  optee_probe+0x560/0x61c
[  243.420375]  platform_drv_probe+0x58/0xa8
[  243.424512]  really_probe+0xe0/0x338
[  243.428202]  driver_probe_device+0x5c/0xf0
[  243.432427]  device_driver_attach+0x74/0x80
[  243.436744]  __driver_attach+0x64/0xe0
[  243.440611]  bus_for_each_dev+0x84/0xd8
[  243.444570]  driver_attach+0x30/0x40
[  243.448258]  bus_add_driver+0x188/0x1e8
[  243.452215]  driver_register+0x64/0x110
[  243.456172]  __platform_driver_register+0x54/0x60
[  243.461027]  optee_driver_init+0x20/0x28
[  243.465075]  do_one_initcall+0x54/0x24c
[  243.469034]  kernel_init_freeable+0x1e8/0x2c0
[  243.473529]  kernel_init+0x18/0x118
[  243.477128]  ret_from_fork+0x10/0x18

I'm unable to trigger a sysrq over the serial console of this remote
machine so I don't yet know what the other threads on the system are
doing during this time. I'll hack something together tomorrow to get a
better idea.

The blocked task warning reminded me of when you said this earlier:

> Bear in mind that there are other times where we can't recover from a
> kernel crash. For instance if a thread is executing in OP-TEE in
> secure world. 

I suspect that it is related to what I'm seeing with this blocked task. Can you
expand on why we can't recover from a kernel crash if a thread is
executing in the secure world?

I appreciate your help!

Tyler

> 
> Cheers,
> Jens
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-12  0:23             ` Tyler Hicks
@ 2021-05-12  5:50               ` Jens Wiklander
  -1 siblings, 0 replies; 56+ messages in thread
From: Jens Wiklander @ 2021-05-12  5:50 UTC (permalink / raw)
  To: Tyler Hicks
  Cc: Allen Pais, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On Wed, May 12, 2021 at 2:23 AM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
>
> On 2021-05-10 09:31:51, Jens Wiklander wrote:
> > On Fri, May 7, 2021 at 3:17 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > >
> > > On 2021-05-07 11:23:17, Jens Wiklander wrote:
> > > > On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > > > > >
> > > > > > The .shutdown hook is not called after a kernel crash when a kdump
> > > > > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > > > > quickly as possible without allowing drivers to clean up.
> > > > > >
> > > > > > That means that the OP-TEE shared memory cache, which was initialized by
> > > > > > the kernel that crashed, is still in place when the kdump kernel is
> > > > > > booted. As the kdump kernel is shut down, the .shutdown hook is called,
> > > > > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > > > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > > > > mapped for the kdump kernel since the cache was set up by the previous
> > > > > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > > > > the address results in a fault that cannot be handled:
> > > > > >
> > > > > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > > > > Mem abort info:
> > > > > >   ESR = 0x96000004
> > > > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > > >   SET = 0, FnV = 0
> > > > > >   EA = 0, S1PTW = 0
> > > > > > Data abort info:
> > > > > >   ISV = 0, ISS = 0x00000004
> > > > > >   CM = 0, WnR = 0
> > > > > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > > > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > > > > Internal error: Oops: 96000004 [#1] SMP
> > > > > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > > > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > > > > Hardware name: Redacted (DT)
> > > > > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > > > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > > sp : ffff80001005bb70
> > > > > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > > > > x27: ffff80001005bb98 x26: dead000000000100
> > > > > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > > > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > > > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > > > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > > > > x17: 0000000000000041 x16: ffffba61b5171764
> > > > > > x15: 0000000000000004 x14: 0000000000000fff
> > > > > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > > > > x11: 0000000000000000 x10: 0000000000000000
> > > > > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > > > > x7 : 0000000000000000 x6 : 0000000000000000
> > > > > > x5 : 0000000000000000 x4 : 0000000000000000
> > > > > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > > > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > > > > Call trace:
> > > > > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > > > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > > > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > > > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > > > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > > > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > > > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > > > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > > > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > > > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > > > > >
> > > > > > When booting the kdump kernel, drain the shared memory cache while being
> > > > > > careful to not translate the addresses returned from
> > > > > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > > > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > > > > aren't dealing with invalid addresses while shutting down the kdump
> > > > > > kernel.
> > > > > >
> > > > > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > > > > ---
> > > > > >
> > > > > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > > > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > > > > original two patch series[2] plus this patch is the full solution to
> > > > > > properly handling OP-TEE shared memory across kexec.
> > > > > >
> > > > > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > > > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > > > > became unresponsive during boot while steadily streaming the following
> > > > > > errors to the serial console:
> > > > > >
> > > > > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > > > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > > > > >
> > > > > > I suspect that this is related to the problems of OP-TEE shared memory
> > > > > > handling across kexec. My current hunch is that while we've disabled the
> > > > > > shared memory cache with this patch, we haven't unregistered all of the
> > > > > > addresses that the previous kernel (which crashed) had registered with
> > > > > > OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> > > > > > addresses?
> > >
> > > @Jens did you have any thoughts on what could be happening here with the
> > > arm-smmu errors? Do I need to try to unregister the cached shared memory
> > > addresses when booting the kdump kernel, rather than just disabling the
> > > caches?
> >
> > No idea. There's no support for SMMU in upstream OP-TEE. Just
> > disabling the caches should be good enough. You could try to never
> > enable the cache to see if it makes any difference.
>
> I think this is unrelated to OP-TEE and more to do with ongoing DMA
> activity when the kernel has crashed and we've done an emergency kexec
> into the kdump kernel which didn't shut down the SMMU. The SoC I'm using
> has a v2 SMMU and I think something similar to commit 3f54c447df34
> ("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel") is needed for
> the v1/v2 SMMU driver. I've prototyped a patch that disables the SMMU
> interrupts (GFIE and GCFGFIE) in the kdump kernel and testing has looked
> good so far. I'll send that out as a separate patch after a little more
> testing.
>
> However, with that change and my earlier change to disable the shm cache
> during boot, I'm periodically seeing a different issue while the kdump
> kernel is coming up. I'm pretty certain it was already there before but
> I wasn't seeing it as often since the SMMU warnings were so "loud".
>
> The kernel waits indefinitely for a secure world thread and boot hangs
> completely:
>
> [  243.359489] INFO: task swapper/0:1 blocked for more than 120 seconds.
> [  243.366141]       Not tainted 5.4.83-microsoft-standard #1
> [  243.371802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [  243.379882] swapper/0       D    0     1      0 0x00000028
> [  243.385543] Call trace:
> [  243.388080]  __switch_to+0xc8/0x118
> [  243.391683]  __schedule+0x2e0/0x700
> [  243.395280]  schedule+0x38/0xb8
> [  243.398522]  schedule_timeout+0x258/0x388
> [  243.402659]  wait_for_completion+0x16c/0x4b8
> [  243.407067]  optee_cq_wait_for_completion+0x28/0xa8
> [  243.412100]  optee_disable_shm_cache+0xb8/0xf8
> [  243.416685]  optee_probe+0x560/0x61c
> [  243.420375]  platform_drv_probe+0x58/0xa8
> [  243.424512]  really_probe+0xe0/0x338
> [  243.428202]  driver_probe_device+0x5c/0xf0
> [  243.432427]  device_driver_attach+0x74/0x80
> [  243.436744]  __driver_attach+0x64/0xe0
> [  243.440611]  bus_for_each_dev+0x84/0xd8
> [  243.444570]  driver_attach+0x30/0x40
> [  243.448258]  bus_add_driver+0x188/0x1e8
> [  243.452215]  driver_register+0x64/0x110
> [  243.456172]  __platform_driver_register+0x54/0x60
> [  243.461027]  optee_driver_init+0x20/0x28
> [  243.465075]  do_one_initcall+0x54/0x24c
> [  243.469034]  kernel_init_freeable+0x1e8/0x2c0
> [  243.473529]  kernel_init+0x18/0x118
> [  243.477128]  ret_from_fork+0x10/0x18
>
> I'm unable to trigger a sysrq over the serial console of this remote
> machine so I don't yet know what the other threads on the system are
> doing during this time. I'll hack something together tomorrow to get a
> better idea.
>
> The blocked task warning reminded me of when you said this earlier:
>
> > Bear in mind that there are other times where we can't recover from a
> > kernel crash. For instance if a thread is executing in OP-TEE in
> > secure world.
>
> I suspect that it is related to what I'm seeing with this blocked task. Can you
> expand on why we can't recover from a kernel crash if a thread is
> executing in the secure world?

Threads in OP-TEE are scheduled by Linux so if a thread is executing
it may be preempted. In OP-TEE that's a suspended thread waiting to be
resumed. If the kernel restarts at this moment that thread will be
lost in a suspended state. It may actually explain what you're seeing
above. optee_disable_shm_cache() is supposed to keep trying until all
threads in OP-TEE are free, which means no suspended threads either.

These suspended threads are a bit dangerous to a restarted kernel in
case they are resumed as they may very well be using some old shared
memory objects where the physical memory now is used for some other
purpose. Cleaning out those threads might be tricky since we can't
just reset the secure world state. Instead, I believe that they will
need to be given enough CPU time to eventually complete. However, this
is a case which we haven't tested in OP-TEE so there's a risk of
running into some not so well tested error paths.
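
The behaviour described above can be illustrated with a small user-space
model. This is only a sketch under stated assumptions, not OP-TEE or
driver code: the thread states, the table size and every sw_* name are
hypothetical. It shows why a thread left suspended by the previous kernel
keeps the disable call from ever succeeding until that thread is resumed
and allowed to finish.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SW_NUM_THREADS 4	/* hypothetical size of the thread table */

enum sw_thread_state { SW_FREE, SW_SUSPENDED };

/* Toy model of the secure world thread table. */
struct sw_model {
	enum sw_thread_state threads[SW_NUM_THREADS];
};

/* The cache can only be disabled when every thread is free. */
static bool sw_try_disable_shm_cache(const struct sw_model *sw)
{
	for (size_t i = 0; i < SW_NUM_THREADS; i++)
		if (sw->threads[i] != SW_FREE)
			return false;
	return true;
}

/*
 * Retry a bounded number of times instead of waiting forever.
 * Returns the number of attempts used, or -1 if the limit was hit.
 */
static int sw_disable_bounded(const struct sw_model *sw, int max_tries)
{
	for (int attempt = 1; attempt <= max_tries; attempt++)
		if (sw_try_disable_shm_cache(sw))
			return attempt;
	return -1;
}

/* Model giving a suspended thread enough CPU time to complete. */
static void sw_resume_to_completion(struct sw_model *sw, size_t idx)
{
	assert(idx < SW_NUM_THREADS);
	sw->threads[idx] = SW_FREE;
}
```

In this model, a table with one suspended entry makes
sw_disable_bounded() return -1 no matter how many attempts are allowed,
because nothing in the new kernel knows about the orphaned thread. Only
after sw_resume_to_completion() is called for that entry does the disable
succeed on the first attempt.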

Cheers,
Jens

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-12  5:50               ` Jens Wiklander
@ 2021-05-17 20:24                 ` Tyler Hicks
  -1 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-05-17 20:24 UTC (permalink / raw)
  To: Jens Wiklander
  Cc: Allen Pais, zajec5, Allen Pais, bcm-kernel-feedback-list,
	Linux ARM, Linux Kernel Mailing List, OP-TEE TrustedFirmware

On 2021-05-12 07:50:30, Jens Wiklander wrote:
> On Wed, May 12, 2021 at 2:23 AM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> >
> > On 2021-05-10 09:31:51, Jens Wiklander wrote:
> > > On Fri, May 7, 2021 at 3:17 PM Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > > >
> > > > On 2021-05-07 11:23:17, Jens Wiklander wrote:
> > > > > On Fri, May 7, 2021 at 9:00 AM Allen Pais <apais@linux.microsoft.com> wrote:
> > > > > >
> > > > > >
> > > > > >
> > > > > > > On 07-May-2021, at 9:28 AM, Tyler Hicks <tyhicks@linux.microsoft.com> wrote:
> > > > > > >
> > > > > > > The .shutdown hook is not called after a kernel crash when a kdump
> > > > > > > kernel is pre-loaded. A kexec into the kdump kernel takes place as
> > > > > > > quickly as possible without allowing drivers to clean up.
> > > > > > >
> > > > > > > That means that the OP-TEE shared memory cache, which was initialized by
> > > > > > > the kernel that crashed, is still in place when the kdump kernel is
> > > > > > > booted. As the kdump kernel is shut down, the .shutdown hook is called,
> > > > > > > which calls optee_disable_shm_cache(), and OP-TEE's
> > > > > > > OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> > > > > > > mapped for the kdump kernel since the cache was set up by the previous
> > > > > > > kernel. Trying to dereference the tee_shm pointer or otherwise translate
> > > > > > > the address results in a fault that cannot be handled:
> > > > > > >
> > > > > > > Unable to handle kernel paging request at virtual address ffff4317b9c09744
> > > > > > > Mem abort info:
> > > > > > >   ESR = 0x96000004
> > > > > > >   EC = 0x25: DABT (current EL), IL = 32 bits
> > > > > > >   SET = 0, FnV = 0
> > > > > > >   EA = 0, S1PTW = 0
> > > > > > > Data abort info:
> > > > > > >   ISV = 0, ISS = 0x00000004
> > > > > > >   CM = 0, WnR = 0
> > > > > > > swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
> > > > > > > [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
> > > > > > > Internal error: Oops: 96000004 [#1] SMP
> > > > > > > Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
> > > > > > > CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
> > > > > > > Hardware name: Redacted (DT)
> > > > > > > pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
> > > > > > > pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > > > lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > > > sp : ffff80001005bb70
> > > > > > > x29: ffff80001005bb70 x28: ffff608e74648e00
> > > > > > > x27: ffff80001005bb98 x26: dead000000000100
> > > > > > > x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
> > > > > > > x23: ffff608e74cf8818 x22: ffff608e738be600
> > > > > > > x21: ffff80001005bbc8 x20: ffff608e738be638
> > > > > > > x19: ffff4317b9c09700 x18: ffffffffffffffff
> > > > > > > x17: 0000000000000041 x16: ffffba61b5171764
> > > > > > > x15: 0000000000000004 x14: 0000000000000fff
> > > > > > > x13: ffffba61b5c9dfc8 x12: 0000000000000003
> > > > > > > x11: 0000000000000000 x10: 0000000000000000
> > > > > > > x9 : ffffba61b5413824 x8 : 00000000ffff4317
> > > > > > > x7 : 0000000000000000 x6 : 0000000000000000
> > > > > > > x5 : 0000000000000000 x4 : 0000000000000000
> > > > > > > x3 : 0000000000000000 x2 : ffff4317b9c09700
> > > > > > > x1 : 00000000ffff4317 x0 : ffff4317b9c09700
> > > > > > > Call trace:
> > > > > > > tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
> > > > > > > optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
> > > > > > > optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
> > > > > > > platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
> > > > > > > device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
> > > > > > > kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
> > > > > > > __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
> > > > > > > do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
> > > > > > > el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
> > > > > > > el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
> > > > > > > el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
> > > > > > > Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> > > > > > >
> > > > > > > When booting the kdump kernel, drain the shared memory cache while being
> > > > > > > careful to not translate the addresses returned from
> > > > > > > OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> > > > > > > and the cache is disabled, proceed with re-enabling the cache so that we
> > > > > > > aren't dealing with invalid addresses while shutting down the kdump
> > > > > > > kernel.
> > > > > > >
> > > > > > > Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> > > > > > > ---
> > > > > > >
> > > > > > > This patch fixes a crash introduced by "optee: fix tee out of memory
> > > > > > > failure seen during kexec reboot"[1]. However, I don't think that the
> > > > > > > original two patch series[2] plus this patch is the full solution to
> > > > > > > properly handling OP-TEE shared memory across kexec.
> > > > > > >
> > > > > > > While testing this fix, I did about 10 kexec reboots and then triggered
> > > > > > > a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> > > > > > > became unresponsive during boot while steadily streaming the following
> > > > > > > errors to the serial console:
> > > > > > >
> > > > > > > arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
> > > > > > > arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> > > > > > >
> > > > > > > I suspect that this is related to the problems of OP-TEE shared memory
> > > > > > > handling across kexec. My current hunch is that while we've disabled the
> > > > > > > shared memory cache with this patch, we haven't unregistered all of the
> > > > > > > addresses that the previous kernel (which crashed) had registered with
> > > > > > > OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> > > > > > > addresses?
> > > >
> > > > @Jens did you have any thoughts on what could be happening here with the
> > > > arm-smmu errors? Do I need to try to unregister the cached shared memory
> > > > addresses when booting the kdump kernel, rather than just disabling the
> > > > caches?
> > >
> > > No idea. There's no support for SMMU in upstream OP-TEE. Just
> > > disabling the caches should be good enough. You could try to never
> > > > enable the cache to see if it makes any difference.
> >
> > I think this is unrelated to OP-TEE and more to do with ongoing DMA
> > activity when the kernel has crashed and we've done an emergency kexec
> > into the kdump kernel which didn't shut down the SMMU. The SoC I'm using
> > has a v2 SMMU and I think something similar to commit 3f54c447df34
> > ("iommu/arm-smmu-v3: Don't disable SMMU in kdump kernel") is needed for
> > the v1/v2 SMMU driver. I've prototyped a patch that disables the SMMU
> > interrupts (GFIE and GCFGFIE) in the kdump kernel and testing has looked
> > good so far. I'll send that out as a separate patch after a little more
> > testing.
> >
> > However, with that change and my earlier change to disable the shm cache
> > during boot, I'm periodically seeing a different issue while the kdump
> > kernel is coming up. I'm pretty certain it was already there before but
> > I wasn't seeing it as often since the SMMU warnings were so "loud".
> >
> > The kernel waits indefinitely for a secure world thread and boot hangs
> > completely:
> >
> > [  243.359489] INFO: task swapper/0:1 blocked for more than 120 seconds.
> > [  243.366141]       Not tainted 5.4.83-microsoft-standard #1
> > [  243.371802] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> > [  243.379882] swapper/0       D    0     1      0 0x00000028
> > [  243.385543] Call trace:
> > [  243.388080]  __switch_to+0xc8/0x118
> > [  243.391683]  __schedule+0x2e0/0x700
> > [  243.395280]  schedule+0x38/0xb8
> > [  243.398522]  schedule_timeout+0x258/0x388
> > [  243.402659]  wait_for_completion+0x16c/0x4b8
> > [  243.407067]  optee_cq_wait_for_completion+0x28/0xa8
> > [  243.412100]  optee_disable_shm_cache+0xb8/0xf8
> > [  243.416685]  optee_probe+0x560/0x61c
> > [  243.420375]  platform_drv_probe+0x58/0xa8
> > [  243.424512]  really_probe+0xe0/0x338
> > [  243.428202]  driver_probe_device+0x5c/0xf0
> > [  243.432427]  device_driver_attach+0x74/0x80
> > [  243.436744]  __driver_attach+0x64/0xe0
> > [  243.440611]  bus_for_each_dev+0x84/0xd8
> > [  243.444570]  driver_attach+0x30/0x40
> > [  243.448258]  bus_add_driver+0x188/0x1e8
> > [  243.452215]  driver_register+0x64/0x110
> > [  243.456172]  __platform_driver_register+0x54/0x60
> > [  243.461027]  optee_driver_init+0x20/0x28
> > [  243.465075]  do_one_initcall+0x54/0x24c
> > [  243.469034]  kernel_init_freeable+0x1e8/0x2c0
> > [  243.473529]  kernel_init+0x18/0x118
> > [  243.477128]  ret_from_fork+0x10/0x18
> >
> > I'm unable to trigger a sysrq over the serial console of this remote
> > machine so I don't yet know what the other threads on the system are
> > doing during this time. I'll hack something together tomorrow to get a
> > better idea.
> >
> > The blocked task warning reminded me of when you said this earlier:
> >
> > > Bear in mind that there are other times where we can't recover from a
> > > kernel crash. For instance if a thread is executing in OP-TEE in
> > > secure world.
> >
> > I suspect that it is related to what I'm seeing with this blocked task. Can you
> > expand on why we can't recover from a kernel crash if a thread is
> > executing in the secure world?
> 
> Threads in OP-TEE are scheduled by Linux so if a thread is executing
> it may be preempted. In OP-TEE that's a suspended thread waiting to be
> resumed. If the kernel restarts at this moment that thread will be
> lost in a suspended state. It may actually explain what you're seeing
> above. optee_disable_shm_cache() is supposed to try until all threads
> in OP-TEE are free, that means no suspended threads either.

I think everything is alright when the shutdown path is able to call
optee_disable_shm_cache() because we know that there are no suspended
threads hanging around. This is the case on the normal reboot and
shutdown paths but not the case after a panic with an emergency reboot
into the kdump kernel. I verified that I'm seeing
OPTEE_SMC_RETURN_ETHREAD_LIMIT returned from the secure world during
these hangs.
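
To make the hang easier to reason about, here is a rough user-space model
of it. This is only a sketch: the pool size, the struct, and every model_*
name are made up for illustration and are not driver or OP-TEE code. It
assumes every secure world thread slot is still held by a thread that the
crashed kernel left suspended, so each new call gets the thread-limit
status back.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical size of the secure-world thread pool. */
#define MODEL_NUM_THREADS 4

/* Stand-ins for OPTEE_SMC_RETURN_OK and OPTEE_SMC_RETURN_ETHREAD_LIMIT. */
enum model_status { MODEL_RETURN_OK, MODEL_RETURN_ETHREAD_LIMIT };

/* Secure-world state as seen by a freshly booted kernel. */
struct model_tee {
	int suspended; /* threads the crashed kernel left suspended */
};

/* A new call needs a free thread slot; suspended threads still hold theirs. */
static enum model_status model_smc_call(const struct model_tee *tee)
{
	assert(tee->suspended >= 0 && tee->suspended <= MODEL_NUM_THREADS);
	if (tee->suspended == MODEL_NUM_THREADS)
		return MODEL_RETURN_ETHREAD_LIMIT;
	return MODEL_RETURN_OK;
}

/*
 * Retry loop, bounded here so that the model terminates. The driver instead
 * sleeps in optee_cq_wait_for_completion() until another call finishes,
 * which never happens when no other call is in flight.
 */
static bool model_call_with_retry(const struct model_tee *tee, int max_tries)
{
	for (int i = 0; i < max_tries; i++) {
		if (model_smc_call(tee) == MODEL_RETURN_OK)
			return true;
	}
	return false;
}
```

With zero suspended threads the first try succeeds. With the pool full of
suspended threads no number of retries helps, because nothing in the new
kernel can resume them.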

> These suspended threads are a bit dangerous to a restarted kernel in
> case they are resumed as they may very well be using some old shared
> memory objects where the physical memory now is used for some other
> purpose. Cleaning out those threads might be tricky since we can't
> just reset the secure world state, instead I believe that they will
> need to be given enough CPU time to eventually complete. However, this
> is a case which we haven't tested in OP-TEE so there's a risk of
> running into some not so well tested error paths.

The kdump kernel runs from a pre-reserved area of memory. Therefore, I
don't think that there's a chance of the secure world touching physical
memory that's being used by the kdump kernel. The problem is that the
kdump kernel doesn't have access to the optee_wait_queue of the kernel
that crashed. If I understand the RPC scheduling logic correctly, that
means that the kdump kernel cannot schedule those suspended threads
during boot. I think the only safe option is going to be to bail out of
optee_probe(), with -ENODEV, if is_kdump_kernel() returns true.
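
Roughly what I have in mind, written as a standalone sketch rather than a
patch. The sketch_* names and the struct are hypothetical; the kdump flag
stands in for is_kdump_kernel() and the counter stands in for any call
that would enter the secure world. The only point is the ordering: the
check has to come before the first secure world call.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical boot environment; kdump stands in for is_kdump_kernel(). */
struct sketch_env {
	bool kdump;
	int secure_world_calls; /* counts calls that would enter OP-TEE */
};

static int sketch_optee_probe(struct sketch_env *env)
{
	assert(env != NULL);

	/* Bail out before anything can block on a secure-world thread. */
	if (env->kdump)
		return -ENODEV;

	/* Normal boot: cache setup and later session calls enter OP-TEE. */
	env->secure_world_calls++;
	return 0;
}
```

In the kdump case the probe fails with -ENODEV and the secure world is
never entered, so there is nothing left to wait on.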

I tried to skip setting up the shm cache when booting the kdump kernel
but saw the same hang in an optee_open_session() -> optee_do_call_with_arg()
calling sequence.

Tyler

> 
> Cheers,
> Jens
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
  2021-05-07  3:58   ` Tyler Hicks
@ 2021-05-17 20:31     ` Tyler Hicks
  -1 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-05-17 20:31 UTC (permalink / raw)
  To: jens.wiklander, zajec5, Allen Pais
  Cc: bcm-kernel-feedback-list, linux-arm-kernel, linux-kernel, op-tee,
	Allen Pais

On 2021-05-06 22:58:16, Tyler Hicks wrote:
> The .shutdown hook is not called after a kernel crash when a kdump
> kernel is pre-loaded. A kexec into the kdump kernel takes place as
> quickly as possible without allowing drivers to clean up.
> 
> That means that the OP-TEE shared memory cache, which was initialized by
> the kernel that crashed, is still in place when the kdump kernel is
> booted. As the kdump kernel is shut down, the .shutdown hook is called,
> which calls optee_disable_shm_cache(), and OP-TEE's
> OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> mapped for the kdump kernel since the cache was set up by the previous
> kernel. Trying to dereference the tee_shm pointer or otherwise translate
> the address results in a fault that cannot be handled:
> 
>  Unable to handle kernel paging request at virtual address ffff4317b9c09744
>  Mem abort info:
>    ESR = 0x96000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>  Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
>  swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
>  [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
>  Internal error: Oops: 96000004 [#1] SMP
>  Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
>  CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
>  Hardware name: Redacted (DT)
>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
>  pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
>  lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
>  sp : ffff80001005bb70
>  x29: ffff80001005bb70 x28: ffff608e74648e00
>  x27: ffff80001005bb98 x26: dead000000000100
>  x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
>  x23: ffff608e74cf8818 x22: ffff608e738be600
>  x21: ffff80001005bbc8 x20: ffff608e738be638
>  x19: ffff4317b9c09700 x18: ffffffffffffffff
>  x17: 0000000000000041 x16: ffffba61b5171764
>  x15: 0000000000000004 x14: 0000000000000fff
>  x13: ffffba61b5c9dfc8 x12: 0000000000000003
>  x11: 0000000000000000 x10: 0000000000000000
>  x9 : ffffba61b5413824 x8 : 00000000ffff4317
>  x7 : 0000000000000000 x6 : 0000000000000000
>  x5 : 0000000000000000 x4 : 0000000000000000
>  x3 : 0000000000000000 x2 : ffff4317b9c09700
>  x1 : 00000000ffff4317 x0 : ffff4317b9c09700
>  Call trace:
>  tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
>  optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
>  optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
>  platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
>  device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
>  kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
>  __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
>  do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
>  el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
>  el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
>  el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
>  Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> 
> When booting the kdump kernel, drain the shared memory cache while being
> careful to not translate the addresses returned from
> OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> and the cache is disabled, proceed with re-enabling the cache so that we
> aren't dealing with invalid addresses while shutting down the kdump
> kernel.
> 
> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> ---
> 
> This patch fixes a crash introduced by "optee: fix tee out of memory
> failure seen during kexec reboot"[1]. However, I don't think that the
> original two patch series[2] plus this patch is the full solution to
> properly handling OP-TEE shared memory across kexec.
> 
> While testing this fix, I did about 10 kexec reboots and then triggered
> a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> became unresponsive during boot while steadily streaming the following
> errors to the serial console:
> 
>  arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
>  arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> 
> I suspect that this is related to the problems of OP-TEE shared memory
> handling across kexec. My current hunch is that while we've disabled the
> shared memory cache with this patch, we haven't unregistered all of the
> addresses that the previous kernel (which crashed) had registered with
> OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> addresses?
> 
> I'm still pretty early in investigating that assumption and
> I'm learning about OP-TEE as I go but I wanted to get this initial
> fix-of-the-fix out so that it was clear that the v2 of the series[2] is
> not complete.
> 
> [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
> [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
> 
>  drivers/tee/optee/call.c          | 11 ++++++++++-
>  drivers/tee/optee/core.c          | 13 +++++++++++--
>  drivers/tee/optee/optee_private.h |  2 +-
>  3 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> index 6132cc8d014c..799e84bec63d 100644
> --- a/drivers/tee/optee/call.c
> +++ b/drivers/tee/optee/call.c
> @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
>   * optee_disable_shm_cache() - Disables caching of some shared memory allocation
>   *			      in OP-TEE
>   * @optee:	main service struct
> + * @is_mapped:	true if the cached shared memory addresses were mapped by this
> + *		kernel, are safe to dereference, and should be freed
>   */
> -void optee_disable_shm_cache(struct optee *optee)
> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
>  {
>  	struct optee_call_waiter w;
>  
> @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
>  		if (res.result.status == OPTEE_SMC_RETURN_OK) {
>  			struct tee_shm *shm;
>  
> +			/*
> +			 * Shared memory references that were not mapped by
> +			 * this kernel must be ignored to prevent a crash.
> +			 */
> +			if (!is_mapped)
> +				continue;
> +
>  			shm = reg_pair_to_ptr(res.result.shm_upper32,
>  					      res.result.shm_lower32);
>  			tee_shm_free(shm);
> diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
> index 69d1f698907c..9985c671bd1f 100644
> --- a/drivers/tee/optee/core.c
> +++ b/drivers/tee/optee/core.c
> @@ -6,6 +6,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <linux/arm-smccc.h>
> +#include <linux/crash_dump.h>
>  #include <linux/errno.h>
>  #include <linux/io.h>
>  #include <linux/module.h>
> @@ -588,7 +589,7 @@ static int optee_remove(struct platform_device *pdev)
>  	 * reference counters and also avoid wild pointers in secure world
>  	 * into the old shared memory range.
>  	 */
> -	optee_disable_shm_cache(optee);
> +	optee_disable_shm_cache(optee, true);
>  
>  	/*
>  	 * The two devices have to be unregistered before we can free the
> @@ -618,7 +619,7 @@ static int optee_remove(struct platform_device *pdev)
>   */
>  static void optee_shutdown(struct platform_device *pdev)
>  {
> -	optee_disable_shm_cache(platform_get_drvdata(pdev));
> +	optee_disable_shm_cache(platform_get_drvdata(pdev), true);
>  }
>  
>  static int optee_probe(struct platform_device *pdev)
> @@ -705,6 +706,14 @@ static int optee_probe(struct platform_device *pdev)
>  	optee->memremaped_shm = memremaped_shm;
>  	optee->pool = pool;
>  
> +	/*
> +	 * The kexec into the crash kernel did not call our .shutdown hook. The
> +	 * shm cache objects registered with OP-TEE are not valid for the crash
> +	 * kernel.
> +	 */
> +	if (is_kdump_kernel())
> +		optee_disable_shm_cache(optee, false);

Additional testing showed that only clearing the shm cache when booting
the kdump kernel isn't quite enough. A kexec from an old kernel, without
Allen's fix ("optee: fix OOM seen due to tee_shm_free()"), to a new
kernel that contains the fix can still result in stale/invalid shm cache
addresses hanging around in the secure world. When the fixed kernel is
shut down, it can still experience a crash and/or memory corruption
because the secure world returns bad addresses from
OPTEE_SMC_DISABLE_SHM_CACHE that are not valid for the current kernel.

In order to safely support kexec within the OP-TEE driver, I think the
best option is to always call optee_disable_shm_cache() prior to calling
optee_enable_shm_cache() in optee_probe().
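
As a rough illustration of that ordering, here is a small user-space
simulation. It is a sketch under assumptions, not the v3 patch: every
sim_* name is invented, and the simulated secure world simply hands back
one cached address per disable call until it reports that nothing is
left, which is how I understand OPTEE_SMC_DISABLE_SHM_CACHE to behave.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SIM_CACHE_SLOTS 8

/* Stand-ins for OPTEE_SMC_RETURN_OK and OPTEE_SMC_RETURN_ENOTAVAIL. */
enum sim_status { SIM_RETURN_OK, SIM_RETURN_ENOTAVAIL };

/* Simulated secure-world shm cache holding addresses from some kernel. */
struct sim_secure_world {
	uint64_t cached[SIM_CACHE_SLOTS];
	size_t count;
	bool cache_enabled;
};

/* Hands back one cached address per call; ENOTAVAIL once empty and disabled. */
static enum sim_status sim_disable_shm_cache(struct sim_secure_world *sw,
					     uint64_t *addr)
{
	if (sw->count == 0) {
		sw->cache_enabled = false;
		return SIM_RETURN_ENOTAVAIL;
	}
	*addr = sw->cached[--sw->count];
	return SIM_RETURN_OK;
}

/*
 * Drain the cache. Addresses are only "freed" when this kernel mapped them;
 * otherwise they are dropped untouched. Returns the number freed.
 */
static size_t sim_drain(struct sim_secure_world *sw, bool is_mapped)
{
	size_t freed = 0;
	uint64_t addr;

	assert(sw->count <= SIM_CACHE_SLOTS);
	while (sim_disable_shm_cache(sw, &addr) == SIM_RETURN_OK) {
		if (is_mapped)
			freed++; /* the driver would call tee_shm_free() here */
	}
	return freed;
}

/* Probe order proposed above: always drain first, then enable. */
static void sim_probe(struct sim_secure_world *sw)
{
	(void)sim_drain(sw, false);
	sw->cache_enabled = true;
}
```

Draining unconditionally costs one extra call on a clean boot, and it
means the later shutdown path only ever sees addresses that the running
kernel registered itself.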

This series is in need of a v3 with all the new knowledge/fixes after
testing kexec/kdump more with OP-TEE. I'll try to get a v3 out in the
coming days.

Tyler

> +
>  	optee_enable_shm_cache(optee);
>  
>  	if (optee->sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM)
> diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
> index e25b216a14ef..16d8c82213e7 100644
> --- a/drivers/tee/optee/optee_private.h
> +++ b/drivers/tee/optee/optee_private.h
> @@ -158,7 +158,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg,
>  int optee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session);
>  
>  void optee_enable_shm_cache(struct optee *optee);
> -void optee_disable_shm_cache(struct optee *optee);
> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped);
>  
>  int optee_shm_register(struct tee_context *ctx, struct tee_shm *shm,
>  		       struct page **pages, size_t num_pages,
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [PATCH] optee: Disable shm cache when booting the crash kernel
@ 2021-05-17 20:31     ` Tyler Hicks
  0 siblings, 0 replies; 56+ messages in thread
From: Tyler Hicks @ 2021-05-17 20:31 UTC (permalink / raw)
  To: jens.wiklander, zajec5, Allen Pais
  Cc: bcm-kernel-feedback-list, linux-arm-kernel, linux-kernel, op-tee,
	Allen Pais

On 2021-05-06 22:58:16, Tyler Hicks wrote:
> The .shutdown hook is not called after a kernel crash when a kdump
> kernel is pre-loaded. A kexec into the kdump kernel takes place as
> quickly as possible without allowing drivers to clean up.
> 
> That means that the OP-TEE shared memory cache, which was initialized by
> the kernel that crashed, is still in place when the kdump kernel is
> booted. As the kdump kernel is shut down, the .shutdown hook is called,
> which calls optee_disable_shm_cache(), and OP-TEE's
> OPTEE_SMC_DISABLE_SHM_CACHE API returns virtual addresses that are not
> mapped for the kdump kernel since the cache was set up by the previous
> kernel. Trying to dereference the tee_shm pointer or otherwise translate
> the address results in a fault that cannot be handled:
> 
>  Unable to handle kernel paging request at virtual address ffff4317b9c09744
>  Mem abort info:
>    ESR = 0x96000004
>    EC = 0x25: DABT (current EL), IL = 32 bits
>    SET = 0, FnV = 0
>    EA = 0, S1PTW = 0
>  Data abort info:
>    ISV = 0, ISS = 0x00000004
>    CM = 0, WnR = 0
>  swapper pgtable: 4k pages, 48-bit VAs, pgdp=0000000970b1e000
>  [ffff4317b9c09744] pgd=0000000000000000, p4d=0000000000000000
>  Internal error: Oops: 96000004 [#1] SMP
>  Modules linked in: bnxt_en pcie_iproc_platform pcie_iproc diagbe(O)
>  CPU: 4 PID: 1 Comm: systemd-shutdow Tainted: G           O      5.10.19.8 #1
>  Hardware name: Redacted (DT)
>  pstate: 60400005 (nZCv daif +PAN -UAO -TCO BTYPE=--)
>  pc : tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
>  lr : optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
>  sp : ffff80001005bb70
>  x29: ffff80001005bb70 x28: ffff608e74648e00
>  x27: ffff80001005bb98 x26: dead000000000100
>  x25: ffff80001005bbb8 x24: aaaaaaaaaaaaaaaa
>  x23: ffff608e74cf8818 x22: ffff608e738be600
>  x21: ffff80001005bbc8 x20: ffff608e738be638
>  x19: ffff4317b9c09700 x18: ffffffffffffffff
>  x17: 0000000000000041 x16: ffffba61b5171764
>  x15: 0000000000000004 x14: 0000000000000fff
>  x13: ffffba61b5c9dfc8 x12: 0000000000000003
>  x11: 0000000000000000 x10: 0000000000000000
>  x9 : ffffba61b5413824 x8 : 00000000ffff4317
>  x7 : 0000000000000000 x6 : 0000000000000000
>  x5 : 0000000000000000 x4 : 0000000000000000
>  x3 : 0000000000000000 x2 : ffff4317b9c09700
>  x1 : 00000000ffff4317 x0 : ffff4317b9c09700
>  Call trace:
>  tee_shm_free (/usr/src/kernel/drivers/tee/tee_shm.c:363)
>  optee_disable_shm_cache (/usr/src/kernel/drivers/tee/optee/call.c:441)
>  optee_shutdown (/usr/src/kernel/drivers/tee/optee/core.c:636)
>  platform_drv_shutdown (/usr/src/kernel/drivers/base/platform.c:800)
>  device_shutdown (/usr/src/kernel/include/linux/device.h:758 /usr/src/kernel/drivers/base/core.c:4078)
>  kernel_restart (/usr/src/kernel/kernel/reboot.c:221 /usr/src/kernel/kernel/reboot.c:248)
>  __arm64_sys_reboot (/usr/src/kernel/kernel/reboot.c:349 /usr/src/kernel/kernel/reboot.c:312 /usr/src/kernel/kernel/reboot.c:312)
>  do_el0_svc (/usr/src/kernel/arch/arm64/kernel/syscall.c:56 /usr/src/kernel/arch/arm64/kernel/syscall.c:158 /usr/src/kernel/arch/arm64/kernel/syscall.c:197)
>  el0_svc (/usr/src/kernel/arch/arm64/kernel/entry-common.c:368)
>  el0_sync_handler (/usr/src/kernel/arch/arm64/kernel/entry-common.c:428)
>  el0_sync (/usr/src/kernel/arch/arm64/kernel/entry.S:671)
>  Code: aa0003f3 b5000060 12800003 14000002 (b9404663)
> 
> When booting the kdump kernel, drain the shared memory cache while being
> careful to not translate the addresses returned from
> OPTEE_SMC_DISABLE_SHM_CACHE. Once the invalid cache objects are drained
> and the cache is disabled, proceed with re-enabling the cache so that we
> aren't dealing with invalid addresses while shutting down the kdump
> kernel.
> 
> Signed-off-by: Tyler Hicks <tyhicks@linux.microsoft.com>
> ---
> 
> This patch fixes a crash introduced by "optee: fix tee out of memory
> failure seen during kexec reboot"[1]. However, I don't think that the
> original two patch series[2] plus this patch is the full solution to
> properly handling OP-TEE shared memory across kexec.
> 
> While testing this fix, I did about 10 kexec reboots and then triggered
> a kernel crash by writing 'c' to /proc/sysrq-trigger. The kdump kernel
> became unresponsive during boot while steadily streaming the following
> errors to the serial console:
> 
>  arm-smmu 64000000.mmu: Blocked unknown Stream ID 0x2000; boot with "arm-smmu.disable_bypass=0" to allow, but this may have security implications
>  arm-smmu 64000000.mmu:     GFSR 0x00000002, GFSYNR0 0x00000002, GFSYNR1 0x00002000, GFSYNR2 0x00000000
> 
> I suspect that this is related to the problems of OP-TEE shared memory
> handling across kexec. My current hunch is that while we've disabled the
> shared memory cache with this patch, we haven't unregistered all of the
> addresses that the previous kernel (which crashed) had registered with
> OP-TEE and that perhaps OP-TEE OS is still trying to make use of those
> addresses?
> 
> I'm still pretty early in investigating that assumption and
> I'm learning about OP-TEE as I go but I wanted to get this initial
> fix-of-the-fix out so that it was clear that the v2 of the series[2] is
> not complete.
> 
> [1] https://lore.kernel.org/lkml/20210225090610.242623-2-allen.lkml@gmail.com/
> [2] https://lore.kernel.org/lkml/20210225090610.242623-1-allen.lkml@gmail.com/#t
> 
>  drivers/tee/optee/call.c          | 11 ++++++++++-
>  drivers/tee/optee/core.c          | 13 +++++++++++--
>  drivers/tee/optee/optee_private.h |  2 +-
>  3 files changed, 22 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/tee/optee/call.c b/drivers/tee/optee/call.c
> index 6132cc8d014c..799e84bec63d 100644
> --- a/drivers/tee/optee/call.c
> +++ b/drivers/tee/optee/call.c
> @@ -417,8 +417,10 @@ void optee_enable_shm_cache(struct optee *optee)
>   * optee_disable_shm_cache() - Disables caching of some shared memory allocation
>   *			      in OP-TEE
>   * @optee:	main service struct
> + * @is_mapped:	true if the cached shared memory addresses were mapped by this
> + *		kernel, are safe to dereference, and should be freed
>   */
> -void optee_disable_shm_cache(struct optee *optee)
> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped)
>  {
>  	struct optee_call_waiter w;
>  
> @@ -437,6 +439,13 @@ void optee_disable_shm_cache(struct optee *optee)
>  		if (res.result.status == OPTEE_SMC_RETURN_OK) {
>  			struct tee_shm *shm;
>  
> +			/*
> +			 * Shared memory references that were not mapped by
> +			 * this kernel must be ignored to prevent a crash.
> +			 */
> +			if (!is_mapped)
> +				continue;
> +
>  			shm = reg_pair_to_ptr(res.result.shm_upper32,
>  					      res.result.shm_lower32);
>  			tee_shm_free(shm);
> diff --git a/drivers/tee/optee/core.c b/drivers/tee/optee/core.c
> index 69d1f698907c..9985c671bd1f 100644
> --- a/drivers/tee/optee/core.c
> +++ b/drivers/tee/optee/core.c
> @@ -6,6 +6,7 @@
>  #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>  
>  #include <linux/arm-smccc.h>
> +#include <linux/crash_dump.h>
>  #include <linux/errno.h>
>  #include <linux/io.h>
>  #include <linux/module.h>
> @@ -588,7 +589,7 @@ static int optee_remove(struct platform_device *pdev)
>  	 * reference counters and also avoid wild pointers in secure world
>  	 * into the old shared memory range.
>  	 */
> -	optee_disable_shm_cache(optee);
> +	optee_disable_shm_cache(optee, true);
>  
>  	/*
>  	 * The two devices have to be unregistered before we can free the
> @@ -618,7 +619,7 @@ static int optee_remove(struct platform_device *pdev)
>   */
>  static void optee_shutdown(struct platform_device *pdev)
>  {
> -	optee_disable_shm_cache(platform_get_drvdata(pdev));
> +	optee_disable_shm_cache(platform_get_drvdata(pdev), true);
>  }
>  
>  static int optee_probe(struct platform_device *pdev)
> @@ -705,6 +706,14 @@ static int optee_probe(struct platform_device *pdev)
>  	optee->memremaped_shm = memremaped_shm;
>  	optee->pool = pool;
>  
> +	/*
> +	 * The kexec into the crash kernel did not call our .shutdown hook. The
> +	 * shm cache objects registered with OP-TEE are not valid for the crash
> +	 * kernel.
> +	 */
> +	if (is_kdump_kernel())
> +		optee_disable_shm_cache(optee, false);

Additional testing showed that clearing the shm cache only when booting
the kdump kernel isn't quite enough. A kexec from an old kernel, without
Allen's fix ("optee: fix OOM seen due to tee_shm_free()"), to a new
kernel that contains the fix can still leave stale/invalid shm cache
addresses in the secure world. When the fixed kernel is shut down, it
can still crash and/or corrupt memory because the secure world returns
addresses from OPTEE_SMC_DISABLE_SHM_CACHE that are not valid for the
current kernel.

In order to safely support kexec within the OP-TEE driver, I think the
best option is to always call optee_disable_shm_cache() prior to calling
optee_enable_shm_cache() in optee_probe().

This series needs a v3 that incorporates everything learned from
further kexec/kdump testing with OP-TEE. I'll try to get a v3 out in
the coming days.

Tyler

> +
>  	optee_enable_shm_cache(optee);
>  
>  	if (optee->sec_caps & OPTEE_SMC_SEC_CAP_DYNAMIC_SHM)
> diff --git a/drivers/tee/optee/optee_private.h b/drivers/tee/optee/optee_private.h
> index e25b216a14ef..16d8c82213e7 100644
> --- a/drivers/tee/optee/optee_private.h
> +++ b/drivers/tee/optee/optee_private.h
> @@ -158,7 +158,7 @@ int optee_invoke_func(struct tee_context *ctx, struct tee_ioctl_invoke_arg *arg,
>  int optee_cancel_req(struct tee_context *ctx, u32 cancel_id, u32 session);
>  
>  void optee_enable_shm_cache(struct optee *optee);
> -void optee_disable_shm_cache(struct optee *optee);
> +void optee_disable_shm_cache(struct optee *optee, bool is_mapped);
>  
>  int optee_shm_register(struct tee_context *ctx, struct tee_shm *shm,
>  		       struct page **pages, size_t num_pages,
> -- 
> 2.25.1
> 


