linux-arm-kernel.lists.infradead.org archive mirror
* [PATCH 0/2] fix issue when acpi smmuv3 device alloc offline node memory
@ 2019-03-15  2:19 Kefeng Wang
  2019-03-15  2:19 ` [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device Kefeng Wang
  2019-03-15  2:19 ` [PATCH 2/2] ACPI: NUMA: show match info about PXM ID and offline/online node Kefeng Wang
  0 siblings, 2 replies; 18+ messages in thread
From: Kefeng Wang @ 2019-03-15  2:19 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel
  Cc: Kefeng Wang

If an ACPI SMMUv3 device is assigned to an offline node, parsed from the
proximity domain in the SMMUv3 IORT table, memory allocation for the device
crashes. Fix this by using acpi_map_pxm_to_online_node() to find an online
node and assigning it to the SMMUv3 device. Also show how the PXM ID, the
offline node, and the chosen online node match up.

Kefeng Wang (2):
  ACPI/IORT: set online numa node for smmuv3 device
  ACPI: NUMA: show match info about PXM ID and offline/online node

 drivers/acpi/arm64/iort.c | 8 ++++----
 drivers/acpi/numa.c       | 3 +++
 2 files changed, 7 insertions(+), 4 deletions(-)

-- 
2.20.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


* [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device
  2019-03-15  2:19 [PATCH 0/2] fix issue when acpi smmuv3 device alloc offline node memory Kefeng Wang
@ 2019-03-15  2:19 ` Kefeng Wang
  2019-03-20 11:41   ` Robin Murphy
  2019-03-15  2:19 ` [PATCH 2/2] ACPI: NUMA: show match info about PXM ID and offline/online node Kefeng Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Kefeng Wang @ 2019-03-15  2:19 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel
  Cc: Kefeng Wang

If the system has only node 0 but the SMMUv3 device is assigned to offline
node 1, as parsed from the proximity domain in the SMMUv3 IORT table, the
following crash occurs:

[   47.492451] Unable to handle kernel paging request at virtual address 0000000000001388
[   47.500361] Mem abort info:
[   47.503143]   ESR = 0x96000004
[   47.506189]   Exception class = DABT (current EL), IL = 32 bits
[   47.512099]   SET = 0, FnV = 0
[   47.515140]   EA = 0, S1PTW = 0
[   47.518272] Data abort info:
[   47.521144]   ISV = 0, ISS = 0x00000004
[   47.524970]   CM = 0, WnR = 0
[   47.527929] [0000000000001388] user address but active_mm is swapper
[   47.534285] Internal error: Oops: 96000004 [#1] SMP
[   47.539151] Modules linked in:
[   47.542194] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
[   47.549490] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[   47.554272] pc : __alloc_pages_nodemask+0x13c/0x1068
[   47.559224] lr : __alloc_pages_nodemask+0xdc/0x1068
...
[   47.646873] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[   47.653560] Call trace:
[   47.655994]  __alloc_pages_nodemask+0x13c/0x1068
[   47.660600]  new_slab+0xec/0x570
[   47.663816]  ___slab_alloc+0x3e0/0x4f8
[   47.667553]  __slab_alloc+0x60/0x80
[   47.671029]  __kmalloc_node_track_caller+0x10c/0x478
[   47.675984]  devm_kmalloc+0x44/0xb0
[   47.679460]  pinctrl_bind_pins+0x4c/0x188
[   47.683457]  really_probe+0x78/0x2b8
[   47.687019]  driver_probe_device+0x64/0x110
[   47.691189]  device_driver_attach+0x74/0x98
[   47.695360]  __driver_attach+0x9c/0xe8
[   47.699095]  bus_for_each_dev+0x84/0xd8
[   47.702919]  driver_attach+0x30/0x40
[   47.706481]  bus_add_driver+0x170/0x218
[   47.710304]  driver_register+0x64/0x118
[   47.714128]  __platform_driver_register+0x54/0x60
[   47.718820]  arm_smmu_driver_init+0x24/0x2c
[   47.722991]  do_one_initcall+0xbc/0x328
[   47.726816]  kernel_init_freeable+0x304/0x3ac
[   47.731162]  kernel_init+0x18/0x110
[   47.734638]  ret_from_fork+0x10/0x1c
[   47.738202] Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
[   47.744307] ---[ end trace dfeaed4c373a32da ]--

Use acpi_map_pxm_to_online_node() to get an online node, which fixes it.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 drivers/acpi/arm64/iort.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index e48894e002ba..a2ce836ec103 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1239,10 +1239,10 @@ static void  __init arm_smmu_v3_set_proximity(struct device *dev,
 
 	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
 	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
-		set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
-		pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
-			smmu->base_address,
-			smmu->pxm);
+		int node = acpi_map_pxm_to_online_node(smmu->pxm);
+		set_dev_node(dev, node);
+		pr_info("SMMU-v3[%llx] -> PXM %d -> Node %d\n",
+			smmu->base_address, smmu->pxm, node);
 	}
 }
 #else
-- 
2.20.1



* [PATCH 2/2] ACPI: NUMA: show match info about PXM ID and offline/online node
  2019-03-15  2:19 [PATCH 0/2] fix issue when acpi smmuv3 device alloc offline node memory Kefeng Wang
  2019-03-15  2:19 ` [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device Kefeng Wang
@ 2019-03-15  2:19 ` Kefeng Wang
  2019-03-15  8:34   ` Kefeng Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Kefeng Wang @ 2019-03-15  2:19 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel
  Cc: Kefeng Wang

An ACPI device driver may run into problems when it allocates memory
from an offline node, so callers can use acpi_map_pxm_to_online_node()
instead to find an online node. Show information about the proximity ID,
the offline node it maps to, and the nearest online node that is returned.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 drivers/acpi/numa.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
index 7bbbf8256a41..064c771a7338 100644
--- a/drivers/acpi/numa.c
+++ b/drivers/acpi/numa.c
@@ -121,6 +121,9 @@ int acpi_map_pxm_to_online_node(int pxm)
 			}
 		}
 	}
+	if (min_node != node)
+		pr_warn("IORT: PXM %d Mapped to offline node %d, choose nearest online node %d\n",
+			pxm, node, min_node);
 
 	return min_node;
 }
-- 
2.20.1
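
The fallback that the patch above reports on can be illustrated with a
small user-space sketch. It is not the kernel implementation: the name
nearest_online_node(), the MAX_NODES bound, the online-node table and the
distance table are all hypothetical and exist only to show the idea of
"keep the node if it is online, otherwise pick the closest online one".

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4	/* hypothetical bound for this sketch */

/* Hypothetical platform state: only nodes 0 and 2 are online. */
static const bool node_is_online[MAX_NODES] = { true, false, true, false };

/* Hypothetical SLIT-style distance table; 10 means "local". */
static const int node_dist[MAX_NODES][MAX_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 15, 30 },
	{ 30, 15, 10, 20 },
	{ 40, 30, 20, 10 },
};

/*
 * Return @node if it is online, otherwise the online node with the
 * smallest distance to it. Ties resolve to the lowest-numbered node.
 * If no node is online at all, @node is returned unchanged.
 */
static int nearest_online_node(int node)
{
	int min_node = node;
	int min_dist = INT_MAX;
	int n;

	assert(node >= 0 && node < MAX_NODES);

	if (node_is_online[node])
		return node;

	for (n = 0; n < MAX_NODES; n++) {
		if (!node_is_online[n])
			continue;
		if (node_dist[node][n] < min_dist) {
			min_dist = node_dist[node][n];
			min_node = n;
		}
	}

	return min_node;
}
```

With these tables, nearest_online_node(1) returns 2, because node 2
(distance 15) is closer to node 1 than node 0 (distance 20). A caller
can compare the result with its input to decide whether a warning such
as the one added by the patch is worth printing.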



* Re: [PATCH 2/2] ACPI: NUMA: show match info about PXM ID and offline/online node
  2019-03-15  2:19 ` [PATCH 2/2] ACPI: NUMA: show match info about PXM ID and offline/online node Kefeng Wang
@ 2019-03-15  8:34   ` Kefeng Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Kefeng Wang @ 2019-03-15  8:34 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel


On 2019/3/15 10:19, Kefeng Wang wrote:
> It maybe trigger some issue when the acpi device driver allocs memory
> from an offline node, so use alternative acpi_map_pxm_to_online_node()
> to find an online node, let's show infomation about proximity ID, mapped
> offline node and the nearest online node returned.
>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  drivers/acpi/numa.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/acpi/numa.c b/drivers/acpi/numa.c
> index 7bbbf8256a41..064c771a7338 100644
> --- a/drivers/acpi/numa.c
> +++ b/drivers/acpi/numa.c
> @@ -121,6 +121,9 @@ int acpi_map_pxm_to_online_node(int pxm)
>  			}
>  		}
>  	}
> +	if (min_node != node)
> +		pr_warn("IORT: PXM %d Mapped to offline node %d, choose nearest online node %d",
> +			pxm, node, min_node);
The IORT prefix should be removed, since other tables such as NFIT and DMAR also have a PXM field. Waiting for review.
>  
>  	return min_node;
>  }



* Re: [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device
  2019-03-15  2:19 ` [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device Kefeng Wang
@ 2019-03-20 11:41   ` Robin Murphy
  2019-03-20 14:00     ` Lorenzo Pieralisi
  0 siblings, 1 reply; 18+ messages in thread
From: Robin Murphy @ 2019-03-20 11:41 UTC (permalink / raw)
  To: Kefeng Wang, Lorenzo Pieralisi, Hanjun Guo, Sudeep Holla, rjw,
	linux-acpi, linux-arm-kernel

On 15/03/2019 02:19, Kefeng Wang wrote:
> If there is only node 0 in system, but smmuv3 device is set to offline
> node 1, parsed from proximity domain in SMMUv3 IORT table, it will lead
> to following crash,

Surely that's just a firmware bug? If node 1 doesn't exist in the system 
then AFAICS if we're presented with a device claiming to be on that node 
we can only assume the whole thing is bogus. Thus if we're going to work 
around it at all, it seems to me like we should reject the entire device 
rather than just bodging it to some other node.

Robin.



* Re: [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device
  2019-03-20 11:41   ` Robin Murphy
@ 2019-03-20 14:00     ` Lorenzo Pieralisi
  2019-03-21  6:08       ` Kefeng Wang
  0 siblings, 1 reply; 18+ messages in thread
From: Lorenzo Pieralisi @ 2019-03-20 14:00 UTC (permalink / raw)
  To: Robin Murphy
  Cc: Kefeng Wang, rjw, linux-acpi, Hanjun Guo, Sudeep Holla, linux-arm-kernel

On Wed, Mar 20, 2019 at 11:41:18AM +0000, Robin Murphy wrote:
> On 15/03/2019 02:19, Kefeng Wang wrote:
> >If there is only node 0 in system, but smmuv3 device is set to offline
> >node 1, parsed from proximity domain in SMMUv3 IORT table, it will lead
> >to following crash,
> 
> Surely that's just a firmware bug? If node 1 doesn't exist in the system
> then AFAICS if we're presented with a device claiming to be on that node we
> can only assume the whole thing is bogus. Thus if we're going to work around
> it at all, it seems to me like we should reject the entire device rather
> than just bodging it to some other node.

I suspect that's the same issue this thread addressed:

https://lore.kernel.org/linux-pci/CAErSpo6S0qtR42tjGZrFu4aMFFyThx1hkHTSowTt6t3XerpHnA@mail.gmail.com/

Lorenzo



* Re: [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device
  2019-03-20 14:00     ` Lorenzo Pieralisi
@ 2019-03-21  6:08       ` Kefeng Wang
  2019-03-27 14:24         ` Kefeng Wang
  2019-03-28 11:32         ` Lorenzo Pieralisi
  0 siblings, 2 replies; 18+ messages in thread
From: Kefeng Wang @ 2019-03-21  6:08 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Robin Murphy
  Cc: linux-acpi, rjw, Hanjun Guo, linux-arm-kernel, Sudeep Holla


On 2019/3/20 22:00, Lorenzo Pieralisi wrote:
> On Wed, Mar 20, 2019 at 11:41:18AM +0000, Robin Murphy wrote:
>> On 15/03/2019 02:19, Kefeng Wang wrote:
>>> If there is only node 0 in system, but smmuv3 device is set to offline
>>> node 1, parsed from proximity domain in SMMUv3 IORT table, it will lead
>>> to following crash,
>> Surely that's just a firmware bug? If node 1 doesn't exist in the system
>> then AFAICS if we're presented with a device claiming to be on that node we
>> can only assume the whole thing is bogus. Thus if we're going to work around
>> it at all, it seems to me like we should reject the entire device rather
>> than just bodging it to some other node.

Yes, I hit this oops with a wrong IORT configuration,

> I suspect that's the same issue this thread addressed:
>
> https://lore.kernel.org/linux-pci/CAErSpo6S0qtR42tjGZrFu4aMFFyThx1hkHTSowTt6t3XerpHnA@mail.gmail.com/

and the situation mentioned above should trigger this issue too.

If the node is offline, we can just return from arm_smmu_v3_set_proximity(). Is there a better way to fix this?





* Re: [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device
  2019-03-21  6:08       ` Kefeng Wang
@ 2019-03-27 14:24         ` Kefeng Wang
  2019-03-28 11:32         ` Lorenzo Pieralisi
  1 sibling, 0 replies; 18+ messages in thread
From: Kefeng Wang @ 2019-03-27 14:24 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Robin Murphy
  Cc: linux-acpi, rjw, Hanjun Guo, linux-arm-kernel, Sudeep Holla

Kindly ping, thanks.

On 2019/3/21 14:08, Kefeng Wang wrote:
> On 2019/3/20 22:00, Lorenzo Pieralisi wrote:
>> On Wed, Mar 20, 2019 at 11:41:18AM +0000, Robin Murphy wrote:
>>> On 15/03/2019 02:19, Kefeng Wang wrote:
>>>> If there is only node 0 in system, but smmuv3 device is set to offline
>>>> node 1, parsed from proximity domain in SMMUv3 IORT table, it will lead
>>>> to following crash,
>>> Surely that's just a firmware bug? If node 1 doesn't exist in the system
>>> then AFAICS if we're presented with a device claiming to be on that node we
>>> can only assume the whole thing is bogus. Thus if we're going to work around
>>> it at all, it seems to me like we should reject the entire device rather
>>> than just bodging it to some other node.
> Yes, I met this oops with a wrong IORT configuration,
>
>> I suspect that's the same issue this thread addressed:
>>
>> https://lore.kernel.org/linux-pci/CAErSpo6S0qtR42tjGZrFu4aMFFyThx1hkHTSowTt6t3XerpHnA@mail.gmail.com/
> and the situation mentioned above should will trigger this issue too.
>
> If the node is offline, we can just return from arm_smmu_v3_set_proximity(),  any better way to fix this?
>
>



* Re: [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device
  2019-03-21  6:08       ` Kefeng Wang
  2019-03-27 14:24         ` Kefeng Wang
@ 2019-03-28 11:32         ` Lorenzo Pieralisi
  2019-03-28 14:00           ` [PATCH v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node Kefeng Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Lorenzo Pieralisi @ 2019-03-28 11:32 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: rjw, linux-acpi, Hanjun Guo, Sudeep Holla, Robin Murphy,
	linux-arm-kernel

On Thu, Mar 21, 2019 at 02:08:47PM +0800, Kefeng Wang wrote:
> 
> On 2019/3/20 22:00, Lorenzo Pieralisi wrote:
> > On Wed, Mar 20, 2019 at 11:41:18AM +0000, Robin Murphy wrote:
> >> On 15/03/2019 02:19, Kefeng Wang wrote:
> >>> If there is only node 0 in system, but smmuv3 device is set to offline
> >>> node 1, parsed from proximity domain in SMMUv3 IORT table, it will lead
> >>> to following crash,
> >> Surely that's just a firmware bug? If node 1 doesn't exist in the system
> >> then AFAICS if we're presented with a device claiming to be on that node we
> >> can only assume the whole thing is bogus. Thus if we're going to work around
> >> it at all, it seems to me like we should reject the entire device rather
> >> than just bodging it to some other node.
> 
> Yes, I met this oops with a wrong IORT configuration,
> 
> > I suspect that's the same issue this thread addressed:
> >
> > https://lore.kernel.org/linux-pci/CAErSpo6S0qtR42tjGZrFu4aMFFyThx1hkHTSowTt6t3XerpHnA@mail.gmail.com/
> 
> and the situation mentioned above should will trigger this issue too.
> 
> If the node is offline, we can just return from
> arm_smmu_v3_set_proximity(),  any better way to fix this?

Add a return value to the set_proximity() callback and return failure on
hitting the issue above, thereby terminating device creation.

Thanks,
Lorenzo
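
The suggestion above can be sketched in user space as follows. This is
only an illustration of the control flow, not the IORT code: struct
fake_dev, struct fake_dev_ops, demo_set_proximity(), demo_add_device()
and the two-entry online table are hypothetical names made up for the
example. The hook returns 0 or a negative errno, and the caller stops
creating the device as soon as the hook fails.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_NODES 2	/* hypothetical: node 0 online, node 1 offline */

static const bool demo_node_online[DEMO_NR_NODES] = { true, false };

struct fake_dev {
	int numa_node;		/* -1 means "no node assigned" */
	bool registered;	/* true once creation has completed */
};

struct fake_dev_ops {
	/* Optional hook; returns 0 on success or a negative errno. */
	int (*set_proximity)(struct fake_dev *dev, int fw_node);
};

/* Refuse a firmware-provided node that is out of range or offline. */
static int demo_set_proximity(struct fake_dev *dev, int fw_node)
{
	if (fw_node < 0 || fw_node >= DEMO_NR_NODES ||
	    !demo_node_online[fw_node])
		return -EINVAL;

	dev->numa_node = fw_node;
	return 0;
}

/* Create the device, bailing out if the proximity hook reports failure. */
static int demo_add_device(struct fake_dev *dev,
			   const struct fake_dev_ops *ops, int fw_node)
{
	int ret;

	assert(dev != NULL && ops != NULL);

	dev->numa_node = -1;
	dev->registered = false;

	if (ops->set_proximity) {
		ret = ops->set_proximity(dev, fw_node);
		if (ret)
			return ret;	/* device creation is terminated */
	}

	dev->registered = true;
	return 0;
}
```

Calling demo_add_device() with firmware node 1 returns -EINVAL and
leaves the device unregistered, so no later allocation can be steered
to the offline node.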


* Re: [PATCH v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node
  2019-03-28 14:00           ` [PATCH v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node Kefeng Wang
@ 2019-03-28 13:59             ` Robin Murphy
  2019-03-28 14:29               ` Kefeng Wang
  2019-03-29  3:17             ` [PATCH RESEND " Kefeng Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Robin Murphy @ 2019-03-28 13:59 UTC (permalink / raw)
  To: Kefeng Wang, Lorenzo Pieralisi, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel, hanjun.guo

On 28/03/2019 14:00, Kefeng Wang wrote:
> If there is only node 0 in system, but smmuv3 device is set to offline
> node 1, parsed from proximity domain in SMMUv3 IORT table, it will lead
> to following crash,
> 
> [   47.492451] Unable to handle kernel paging request at virtual address 0000000000001388
> [   47.500361] Mem abort info:
> [   47.503143]   ESR = 0x96000004
> [   47.506189]   Exception class = DABT (current EL), IL = 32 bits
> [   47.512099]   SET = 0, FnV = 0
> [   47.515140]   EA = 0, S1PTW = 0
> [   47.518272] Data abort info:
> [   47.521144]   ISV = 0, ISS = 0x00000004
> [   47.524970]   CM = 0, WnR = 0
> [   47.527929] [0000000000001388] user address but active_mm is swapper
> [   47.534285] Internal error: Oops: 96000004 [#1] SMP
> [   47.539151] Modules linked in:
> [   47.542194] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
> [   47.549490] pstate: 80c00009 (Nzcv daif +PAN +UAO)
> [   47.554272] pc : __alloc_pages_nodemask+0x13c/0x1068
> [   47.559224] lr : __alloc_pages_nodemask+0xdc/0x1068
> ...
> [   47.646873] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
> [   47.653560] Call trace:
> [   47.655994]  __alloc_pages_nodemask+0x13c/0x1068
> [   47.660600]  new_slab+0xec/0x570
> [   47.663816]  ___slab_alloc+0x3e0/0x4f8
> [   47.667553]  __slab_alloc+0x60/0x80
> [   47.671029]  __kmalloc_node_track_caller+0x10c/0x478
> [   47.675984]  devm_kmalloc+0x44/0xb0
> [   47.679460]  pinctrl_bind_pins+0x4c/0x188
> [   47.683457]  really_probe+0x78/0x2b8
> [   47.687019]  driver_probe_device+0x64/0x110
> [   47.691189]  device_driver_attach+0x74/0x98
> [   47.695360]  __driver_attach+0x9c/0xe8
> [   47.699095]  bus_for_each_dev+0x84/0xd8
> [   47.702919]  driver_attach+0x30/0x40
> [   47.706481]  bus_add_driver+0x170/0x218
> [   47.710304]  driver_register+0x64/0x118
> [   47.714128]  __platform_driver_register+0x54/0x60
> [   47.718820]  arm_smmu_driver_init+0x24/0x2c
> [   47.722991]  do_one_initcall+0xbc/0x328
> [   47.726816]  kernel_init_freeable+0x304/0x3ac
> [   47.731162]  kernel_init+0x18/0x110
> [   47.734638]  ret_from_fork+0x10/0x1c
> [   47.738202] Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
> [   47.744307] ---[ end trace dfeaed4c373a32da ]--
> 
> This could be triggered by a firmware bug with a bad IORT configuration,
> or by a NUMA node that has no memory attached to it, as well as with
> NR_CPUS less than the number of CPUs presented in the MADT.
> 
> Make dev_set_proximity() return a value, and terminate device creation
> if it returns failure.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>   drivers/acpi/arm64/iort.c | 24 ++++++++++++++++++------
>   1 file changed, 18 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index e48894e002ba..c294c3490e66 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1232,21 +1232,30 @@ static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
>   /*
>    * set numa proximity domain for smmuv3 device
>    */
> -static void  __init arm_smmu_v3_set_proximity(struct device *dev,
> +static int  __init arm_smmu_v3_set_proximity(struct device *dev,
>   					      struct acpi_iort_node *node)
>   {
>   	struct acpi_iort_smmu_v3 *smmu;
>   
>   	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
>   	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
> -		set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
> +		int node = acpi_map_pxm_to_node(smmu->pxm);
> +		if (node != NUMA_NO_NODE && !node_online(node))
> +			return -EINVAL;
> +
> +		set_dev_node(dev, node);
>   		pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
>   			smmu->base_address,
>   			smmu->pxm);
>   	}
> +	return 0;
>   }
>   #else
> -#define arm_smmu_v3_set_proximity NULL
> +static int  __init arm_smmu_v3_set_proximity(struct device *dev,
> +					      struct acpi_iort_node *node)
> +{
> +	return 0;
> +}

Doesn't this end up having the same effect as just leaving the callback 
assigned with NULL? Not sure why that would need to change :/

Robin.

>   #endif
>   
>   static int __init arm_smmu_count_resources(struct acpi_iort_node *node)
> @@ -1318,7 +1327,7 @@ struct iort_dev_config {
>   	int (*dev_count_resources)(struct acpi_iort_node *node);
>   	void (*dev_init_resources)(struct resource *res,
>   				     struct acpi_iort_node *node);
> -	void (*dev_set_proximity)(struct device *dev,
> +	int (*dev_set_proximity)(struct device *dev,
>   				    struct acpi_iort_node *node);
>   };
>   
> @@ -1369,8 +1378,11 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
>   	if (!pdev)
>   		return -ENOMEM;
>   
> -	if (ops->dev_set_proximity)
> -		ops->dev_set_proximity(&pdev->dev, node);
> +	if (ops->dev_set_proximity) {
> +		ret = ops->dev_set_proximity(&pdev->dev, node);
> +		if (ret)
> +			goto dev_put;
> +	}
>   
>   	count = ops->dev_count_resources(node);
>   
> 
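
The point raised in the review comment above, that a stub returning 0
behaves like a NULL callback when the caller already checks the pointer,
can be checked with a minimal user-space sketch. The names struct
demo_ops, demo_stub() and demo_run_hook() are hypothetical and only
mirror the shape of the code under discussion.

```c
#include <assert.h>
#include <stddef.h>

struct demo_ops {
	int (*set_proximity)(int fw_node);	/* optional: may be NULL */
};

/* No-op stub, as one might write for a build without NUMA support. */
static int demo_stub(int fw_node)
{
	(void)fw_node;
	return 0;
}

/* Caller pattern: skip a NULL hook, otherwise propagate its result. */
static int demo_run_hook(const struct demo_ops *ops, int fw_node)
{
	assert(ops != NULL);

	if (!ops->set_proximity)
		return 0;

	return ops->set_proximity(fw_node);
}
```

For any input, demo_run_hook() returns 0 both when the hook is NULL and
when it points at demo_stub(), so from the caller's side the two
variants are indistinguishable.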


* [PATCH v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node
  2019-03-28 11:32         ` Lorenzo Pieralisi
@ 2019-03-28 14:00           ` Kefeng Wang
  2019-03-28 13:59             ` Robin Murphy
  2019-03-29  3:17             ` [PATCH RESEND " Kefeng Wang
  0 siblings, 2 replies; 18+ messages in thread
From: Kefeng Wang @ 2019-03-28 14:00 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Robin Murphy, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel, hanjun.guo
  Cc: Kefeng Wang

If the system has only node 0, but the SMMUv3 device is assigned to
offline node 1, as parsed from the proximity domain in the SMMUv3 IORT
table, the following crash results:

[   47.492451] Unable to handle kernel paging request at virtual address 0000000000001388
[   47.500361] Mem abort info:
[   47.503143]   ESR = 0x96000004
[   47.506189]   Exception class = DABT (current EL), IL = 32 bits
[   47.512099]   SET = 0, FnV = 0
[   47.515140]   EA = 0, S1PTW = 0
[   47.518272] Data abort info:
[   47.521144]   ISV = 0, ISS = 0x00000004
[   47.524970]   CM = 0, WnR = 0
[   47.527929] [0000000000001388] user address but active_mm is swapper
[   47.534285] Internal error: Oops: 96000004 [#1] SMP
[   47.539151] Modules linked in:
[   47.542194] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
[   47.549490] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[   47.554272] pc : __alloc_pages_nodemask+0x13c/0x1068
[   47.559224] lr : __alloc_pages_nodemask+0xdc/0x1068
...
[   47.646873] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[   47.653560] Call trace:
[   47.655994]  __alloc_pages_nodemask+0x13c/0x1068
[   47.660600]  new_slab+0xec/0x570
[   47.663816]  ___slab_alloc+0x3e0/0x4f8
[   47.667553]  __slab_alloc+0x60/0x80
[   47.671029]  __kmalloc_node_track_caller+0x10c/0x478
[   47.675984]  devm_kmalloc+0x44/0xb0
[   47.679460]  pinctrl_bind_pins+0x4c/0x188
[   47.683457]  really_probe+0x78/0x2b8
[   47.687019]  driver_probe_device+0x64/0x110
[   47.691189]  device_driver_attach+0x74/0x98
[   47.695360]  __driver_attach+0x9c/0xe8
[   47.699095]  bus_for_each_dev+0x84/0xd8
[   47.702919]  driver_attach+0x30/0x40
[   47.706481]  bus_add_driver+0x170/0x218
[   47.710304]  driver_register+0x64/0x118
[   47.714128]  __platform_driver_register+0x54/0x60
[   47.718820]  arm_smmu_driver_init+0x24/0x2c
[   47.722991]  do_one_initcall+0xbc/0x328
[   47.726816]  kernel_init_freeable+0x304/0x3ac
[   47.731162]  kernel_init+0x18/0x110
[   47.734638]  ret_from_fork+0x10/0x1c
[   47.738202] Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
[   47.744307] ---[ end trace dfeaed4c373a32da ]--

This can be triggered by a firmware bug (a bad IORT configuration), or by
a NUMA node that has no memory attached to it together with NR_CPUS being
smaller than the number of CPUs presented in the MADT.

Make dev_set_proximity() return a value, and terminate device creation
if it returns failure.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 drivers/acpi/arm64/iort.c | 24 ++++++++++++++++++------
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index e48894e002ba..c294c3490e66 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1232,21 +1232,30 @@ static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
 /*
  * set numa proximity domain for smmuv3 device
  */
-static void  __init arm_smmu_v3_set_proximity(struct device *dev,
+static int  __init arm_smmu_v3_set_proximity(struct device *dev,
 					      struct acpi_iort_node *node)
 {
 	struct acpi_iort_smmu_v3 *smmu;
 
 	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
 	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
-		set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
+		int node = acpi_map_pxm_to_node(smmu->pxm);
+		if (node != NUMA_NO_NODE && !node_online(node))
+			return -EINVAL;
+
+		set_dev_node(dev, node);
 		pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
 			smmu->base_address,
 			smmu->pxm);
 	}
+	return 0;
 }
 #else
-#define arm_smmu_v3_set_proximity NULL
+static int  __init arm_smmu_v3_set_proximity(struct device *dev,
+					      struct acpi_iort_node *node)
+{
+	return 0;
+}
 #endif
 
 static int __init arm_smmu_count_resources(struct acpi_iort_node *node)
@@ -1318,7 +1327,7 @@ struct iort_dev_config {
 	int (*dev_count_resources)(struct acpi_iort_node *node);
 	void (*dev_init_resources)(struct resource *res,
 				     struct acpi_iort_node *node);
-	void (*dev_set_proximity)(struct device *dev,
+	int (*dev_set_proximity)(struct device *dev,
 				    struct acpi_iort_node *node);
 };
 
@@ -1369,8 +1378,11 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
 	if (!pdev)
 		return -ENOMEM;
 
-	if (ops->dev_set_proximity)
-		ops->dev_set_proximity(&pdev->dev, node);
+	if (ops->dev_set_proximity) {
+		ret = ops->dev_set_proximity(&pdev->dev, node);
+		if (ret)
+			goto dev_put;
+	}
 
 	count = ops->dev_count_resources(node);
 
-- 
2.20.1
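
The error path added to iort_add_platform_device() in the last hunk can be modeled outside the kernel. The following is a minimal sketch under stated assumptions: `struct toy_pdev`, `toy_pdev_alloc()`, `toy_pdev_put()`, the two example hooks and the `live_devices` counter are all hypothetical, and the sketch assumes that the existing `dev_put` label releases the device that was just allocated.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical miniature of a platform device. */
struct toy_pdev {
	int numa_node;
};

static int live_devices;	/* allocated but not yet released */

static struct toy_pdev *toy_pdev_alloc(void)
{
	struct toy_pdev *pdev = calloc(1, sizeof(*pdev));

	if (pdev)
		live_devices++;
	return pdev;
}

static void toy_pdev_put(struct toy_pdev *pdev)
{
	assert(pdev != NULL);
	free(pdev);
	live_devices--;
}

/* Two example hooks: one accepts the device, one reports an offline node. */
static int hook_online(struct toy_pdev *pdev)
{
	pdev->numa_node = 0;
	return 0;
}

static int hook_offline(struct toy_pdev *pdev)
{
	(void)pdev;
	return -EINVAL;
}

/* Same shape as the patched caller: a failing hook jumps to the cleanup label. */
static int toy_add_device(int (*set_proximity)(struct toy_pdev *))
{
	struct toy_pdev *pdev = toy_pdev_alloc();
	int ret = 0;

	if (!pdev)
		return -ENOMEM;

	if (set_proximity) {
		ret = set_proximity(pdev);
		if (ret)
			goto dev_put;
	}

	/* Resource setup and registration would follow; the device stays live. */
	return 0;

dev_put:
	toy_pdev_put(pdev);
	return ret;
}
```

Calling `toy_add_device(hook_offline)` returns -EINVAL and leaves `live_devices` at 0, so the rejected device is not leaked. The successful paths keep the device allocated on purpose, since the model stops before registration.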



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node
  2019-03-28 13:59             ` Robin Murphy
@ 2019-03-28 14:29               ` Kefeng Wang
  0 siblings, 0 replies; 18+ messages in thread
From: Kefeng Wang @ 2019-03-28 14:29 UTC (permalink / raw)
  To: Robin Murphy, Lorenzo Pieralisi, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel, hanjun.guo


On 2019/3/28 21:59, Robin Murphy wrote:
> On 28/03/2019 14:00, Kefeng Wang wrote:
>> If the system has only node 0, but the SMMUv3 device is assigned to
>> offline node 1, as parsed from the proximity domain in the SMMUv3 IORT
>> table, the following crash results:
>>
>> [   47.492451] Unable to handle kernel paging request at virtual address 0000000000001388
>> [   47.500361] Mem abort info:
>> [   47.503143]   ESR = 0x96000004
>> [   47.506189]   Exception class = DABT (current EL), IL = 32 bits
>> [   47.512099]   SET = 0, FnV = 0
>> [   47.515140]   EA = 0, S1PTW = 0
>> [   47.518272] Data abort info:
>> [   47.521144]   ISV = 0, ISS = 0x00000004
>> [   47.524970]   CM = 0, WnR = 0
>> [   47.527929] [0000000000001388] user address but active_mm is swapper
>> [   47.534285] Internal error: Oops: 96000004 [#1] SMP
>> [   47.539151] Modules linked in:
>> [   47.542194] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
>> [   47.549490] pstate: 80c00009 (Nzcv daif +PAN +UAO)
>> [   47.554272] pc : __alloc_pages_nodemask+0x13c/0x1068
>> [   47.559224] lr : __alloc_pages_nodemask+0xdc/0x1068
>> ...
>> [   47.646873] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
>> [   47.653560] Call trace:
>> [   47.655994]  __alloc_pages_nodemask+0x13c/0x1068
>> [   47.660600]  new_slab+0xec/0x570
>> [   47.663816]  ___slab_alloc+0x3e0/0x4f8
>> [   47.667553]  __slab_alloc+0x60/0x80
>> [   47.671029]  __kmalloc_node_track_caller+0x10c/0x478
>> [   47.675984]  devm_kmalloc+0x44/0xb0
>> [   47.679460]  pinctrl_bind_pins+0x4c/0x188
>> [   47.683457]  really_probe+0x78/0x2b8
>> [   47.687019]  driver_probe_device+0x64/0x110
>> [   47.691189]  device_driver_attach+0x74/0x98
>> [   47.695360]  __driver_attach+0x9c/0xe8
>> [   47.699095]  bus_for_each_dev+0x84/0xd8
>> [   47.702919]  driver_attach+0x30/0x40
>> [   47.706481]  bus_add_driver+0x170/0x218
>> [   47.710304]  driver_register+0x64/0x118
>> [   47.714128]  __platform_driver_register+0x54/0x60
>> [   47.718820]  arm_smmu_driver_init+0x24/0x2c
>> [   47.722991]  do_one_initcall+0xbc/0x328
>> [   47.726816]  kernel_init_freeable+0x304/0x3ac
>> [   47.731162]  kernel_init+0x18/0x110
>> [   47.734638]  ret_from_fork+0x10/0x1c
>> [   47.738202] Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
>> [   47.744307] ---[ end trace dfeaed4c373a32da ]--
>>
>> This can be triggered by a firmware bug (a bad IORT configuration), or by
>> a NUMA node that has no memory attached to it together with NR_CPUS being
>> smaller than the number of CPUs presented in the MADT.
>>
>> Make dev_set_proximity() return a value, and terminate device creation
>> if it returns failure.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
>>   drivers/acpi/arm64/iort.c | 24 ++++++++++++++++++------
>>   1 file changed, 18 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
>> index e48894e002ba..c294c3490e66 100644
>> --- a/drivers/acpi/arm64/iort.c
>> +++ b/drivers/acpi/arm64/iort.c
>> @@ -1232,21 +1232,30 @@ static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
>>   /*
>>    * set numa proximity domain for smmuv3 device
>>    */
>> -static void  __init arm_smmu_v3_set_proximity(struct device *dev,
>> +static int  __init arm_smmu_v3_set_proximity(struct device *dev,
>>                             struct acpi_iort_node *node)
>>   {
>>       struct acpi_iort_smmu_v3 *smmu;
>>         smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
>>       if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
>> -        set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
>> +        int node = acpi_map_pxm_to_node(smmu->pxm);
>> +        if (node != NUMA_NO_NODE && !node_online(node))
>> +            return -EINVAL;
>> +
>> +        set_dev_node(dev, node);
>>           pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
>>               smmu->base_address,
>>               smmu->pxm);
>>       }
>> +    return 0;
>>   }
>>   #else
>> -#define arm_smmu_v3_set_proximity NULL
>> +static int  __init arm_smmu_v3_set_proximity(struct device *dev,
>> +                          struct acpi_iort_node *node)
>> +{
>> +    return 0;
>> +}
>
> Doesn't this end up having the same effect as just leaving the callback assigned with NULL? Not sure why that would need to change :/

Oops, I should not have changed this part ;(

If there are no other issues, I will resend.

Thanks.


>
> Robin.
>



^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH RESEND v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node
  2019-03-28 14:00           ` [PATCH v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node Kefeng Wang
  2019-03-28 13:59             ` Robin Murphy
@ 2019-03-29  3:17             ` Kefeng Wang
  2019-04-08 10:42               ` Lorenzo Pieralisi
  2019-04-08 10:46               ` Lorenzo Pieralisi
  1 sibling, 2 replies; 18+ messages in thread
From: Kefeng Wang @ 2019-03-29  3:17 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Robin Murphy, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel, hanjun.guo
  Cc: Kefeng Wang

If the system has only node 0, but the SMMUv3 device is assigned to
offline node 1, as parsed from the proximity domain in the SMMUv3 IORT
table, the following crash results:

[   47.492451] Unable to handle kernel paging request at virtual address 0000000000001388
[   47.500361] Mem abort info:
[   47.503143]   ESR = 0x96000004
[   47.506189]   Exception class = DABT (current EL), IL = 32 bits
[   47.512099]   SET = 0, FnV = 0
[   47.515140]   EA = 0, S1PTW = 0
[   47.518272] Data abort info:
[   47.521144]   ISV = 0, ISS = 0x00000004
[   47.524970]   CM = 0, WnR = 0
[   47.527929] [0000000000001388] user address but active_mm is swapper
[   47.534285] Internal error: Oops: 96000004 [#1] SMP
[   47.539151] Modules linked in:
[   47.542194] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
[   47.549490] pstate: 80c00009 (Nzcv daif +PAN +UAO)
[   47.554272] pc : __alloc_pages_nodemask+0x13c/0x1068
[   47.559224] lr : __alloc_pages_nodemask+0xdc/0x1068
...
[   47.646873] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[   47.653560] Call trace:
[   47.655994]  __alloc_pages_nodemask+0x13c/0x1068
[   47.660600]  new_slab+0xec/0x570
[   47.663816]  ___slab_alloc+0x3e0/0x4f8
[   47.667553]  __slab_alloc+0x60/0x80
[   47.671029]  __kmalloc_node_track_caller+0x10c/0x478
[   47.675984]  devm_kmalloc+0x44/0xb0
[   47.679460]  pinctrl_bind_pins+0x4c/0x188
[   47.683457]  really_probe+0x78/0x2b8
[   47.687019]  driver_probe_device+0x64/0x110
[   47.691189]  device_driver_attach+0x74/0x98
[   47.695360]  __driver_attach+0x9c/0xe8
[   47.699095]  bus_for_each_dev+0x84/0xd8
[   47.702919]  driver_attach+0x30/0x40
[   47.706481]  bus_add_driver+0x170/0x218
[   47.710304]  driver_register+0x64/0x118
[   47.714128]  __platform_driver_register+0x54/0x60
[   47.718820]  arm_smmu_driver_init+0x24/0x2c
[   47.722991]  do_one_initcall+0xbc/0x328
[   47.726816]  kernel_init_freeable+0x304/0x3ac
[   47.731162]  kernel_init+0x18/0x110
[   47.734638]  ret_from_fork+0x10/0x1c
[   47.738202] Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
[   47.744307] ---[ end trace dfeaed4c373a32da ]--

This can be triggered by a firmware bug (a bad IORT configuration), or by
a NUMA node that has no memory attached to it together with NR_CPUS being
smaller than the number of CPUs presented in the MADT.

Make dev_set_proximity() return a value, and terminate device creation
if it returns failure.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 drivers/acpi/arm64/iort.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index e48894e002ba..1fc1851b078e 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1232,18 +1232,23 @@ static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
 /*
  * set numa proximity domain for smmuv3 device
  */
-static void  __init arm_smmu_v3_set_proximity(struct device *dev,
+static int  __init arm_smmu_v3_set_proximity(struct device *dev,
 					      struct acpi_iort_node *node)
 {
 	struct acpi_iort_smmu_v3 *smmu;
 
 	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
 	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
-		set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
+		int node = acpi_map_pxm_to_node(smmu->pxm);
+		if (node != NUMA_NO_NODE && !node_online(node))
+			return -EINVAL;
+
+		set_dev_node(dev, node);
 		pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
 			smmu->base_address,
 			smmu->pxm);
 	}
+	return 0;
 }
 #else
 #define arm_smmu_v3_set_proximity NULL
@@ -1318,7 +1323,7 @@ struct iort_dev_config {
 	int (*dev_count_resources)(struct acpi_iort_node *node);
 	void (*dev_init_resources)(struct resource *res,
 				     struct acpi_iort_node *node);
-	void (*dev_set_proximity)(struct device *dev,
+	int (*dev_set_proximity)(struct device *dev,
 				    struct acpi_iort_node *node);
 };
 
@@ -1369,8 +1374,11 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
 	if (!pdev)
 		return -ENOMEM;
 
-	if (ops->dev_set_proximity)
-		ops->dev_set_proximity(&pdev->dev, node);
+	if (ops->dev_set_proximity) {
+		ret = ops->dev_set_proximity(&pdev->dev, node);
+		if (ret)
+			goto dev_put;
+	}
 
 	count = ops->dev_count_resources(node);
 
-- 
2.20.1



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node
  2019-03-29  3:17             ` [PATCH RESEND " Kefeng Wang
@ 2019-04-08 10:42               ` Lorenzo Pieralisi
  2019-04-08 10:46               ` Lorenzo Pieralisi
  1 sibling, 0 replies; 18+ messages in thread
From: Lorenzo Pieralisi @ 2019-04-08 10:42 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: rjw, linux-acpi, hanjun.guo, Sudeep Holla, Robin Murphy,
	linux-arm-kernel

On Fri, Mar 29, 2019 at 11:17:51AM +0800, Kefeng Wang wrote:
> If the system has only node 0, but the SMMUv3 device is assigned to
> offline node 1, as parsed from the proximity domain in the SMMUv3 IORT
> table, the following crash results:

"In a system where, through IORT firmware mappings, the SMMU device is
mapped to a NUMA node that is not online, the kernel bootstrap results
in the following crash:"

> [   47.492451] Unable to handle kernel paging request at virtual address 0000000000001388
> [   47.500361] Mem abort info:
> [   47.503143]   ESR = 0x96000004
> [   47.506189]   Exception class = DABT (current EL), IL = 32 bits
> [   47.512099]   SET = 0, FnV = 0
> [   47.515140]   EA = 0, S1PTW = 0
> [   47.518272] Data abort info:
> [   47.521144]   ISV = 0, ISS = 0x00000004
> [   47.524970]   CM = 0, WnR = 0
> [   47.527929] [0000000000001388] user address but active_mm is swapper
> [   47.534285] Internal error: Oops: 96000004 [#1] SMP
> [   47.539151] Modules linked in:
> [   47.542194] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
> [   47.549490] pstate: 80c00009 (Nzcv daif +PAN +UAO)
> [   47.554272] pc : __alloc_pages_nodemask+0x13c/0x1068
> [   47.559224] lr : __alloc_pages_nodemask+0xdc/0x1068
> ...
> [   47.646873] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
> [   47.653560] Call trace:
> [   47.655994]  __alloc_pages_nodemask+0x13c/0x1068
> [   47.660600]  new_slab+0xec/0x570
> [   47.663816]  ___slab_alloc+0x3e0/0x4f8
> [   47.667553]  __slab_alloc+0x60/0x80
> [   47.671029]  __kmalloc_node_track_caller+0x10c/0x478
> [   47.675984]  devm_kmalloc+0x44/0xb0
> [   47.679460]  pinctrl_bind_pins+0x4c/0x188
> [   47.683457]  really_probe+0x78/0x2b8
> [   47.687019]  driver_probe_device+0x64/0x110
> [   47.691189]  device_driver_attach+0x74/0x98
> [   47.695360]  __driver_attach+0x9c/0xe8
> [   47.699095]  bus_for_each_dev+0x84/0xd8
> [   47.702919]  driver_attach+0x30/0x40
> [   47.706481]  bus_add_driver+0x170/0x218
> [   47.710304]  driver_register+0x64/0x118
> [   47.714128]  __platform_driver_register+0x54/0x60
> [   47.718820]  arm_smmu_driver_init+0x24/0x2c
> [   47.722991]  do_one_initcall+0xbc/0x328
> [   47.726816]  kernel_init_freeable+0x304/0x3ac
> [   47.731162]  kernel_init+0x18/0x110
> [   47.734638]  ret_from_fork+0x10/0x1c
> [   47.738202] Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
> [   47.744307] ---[ end trace dfeaed4c373a32da ]--

Nit: timestamps are not useful information; remove them and indent
the log with two spaces to quote it.
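
The clean-up asked for here is mechanical, so it can be scripted. A minimal sketch in C, assuming each log line starts with a single "[ seconds.micros] " prefix; `quote_log_line()` is a hypothetical helper, not an existing tool:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes two spaces followed by the log line, minus its leading
 * "[ ... ] " timestamp, into out. Lines that do not start with '['
 * are only indented.
 */
static void quote_log_line(const char *line, char *out, size_t outlen)
{
	const char *p = line;

	assert(line != NULL && out != NULL && outlen > 0);

	if (*p == '[') {
		const char *end = strchr(p, ']');

		if (end) {
			p = end + 1;
			if (*p == ' ')
				p++;	/* drop the single space after ']' */
		}
	}
	snprintf(out, outlen, "  %s", p);
}
```

The helper removes only the first bracketed field, so it should be run once on the raw log. Running it again on already stripped output would eat the bracketed fault address on the "user address but active_mm is swapper" line.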

> This can be triggered by a firmware bug (a bad IORT configuration), or by
> a NUMA node that has no memory attached to it together with NR_CPUS being
> smaller than the number of CPUs presented in the MADT.

Either explain this properly or remove this paragraph; I would
remove it.

Actually I would add a Link: tag to point at the lore archives where the
related discussions took place.

> Make dev_set_proximity() return a value, and terminate device creation
> if it returns failure.

"Change the dev_set_proximity() hook prototype so that it returns a
value and make it return failure if the PXM->NUMA-node mapping
corresponds to an offline node, fixing the crash".

> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  drivers/acpi/arm64/iort.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)

With the commit log changes above:

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>

> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index e48894e002ba..1fc1851b078e 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1232,18 +1232,23 @@ static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
>  /*
>   * set numa proximity domain for smmuv3 device
>   */
> -static void  __init arm_smmu_v3_set_proximity(struct device *dev,
> +static int  __init arm_smmu_v3_set_proximity(struct device *dev,
>  					      struct acpi_iort_node *node)
>  {
>  	struct acpi_iort_smmu_v3 *smmu;
>  
>  	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
>  	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
> -		set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
> +		int node = acpi_map_pxm_to_node(smmu->pxm);
> +		if (node != NUMA_NO_NODE && !node_online(node))
> +			return -EINVAL;
> +
> +		set_dev_node(dev, node);
>  		pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
>  			smmu->base_address,
>  			smmu->pxm);
>  	}
> +	return 0;
>  }
>  #else
>  #define arm_smmu_v3_set_proximity NULL
> @@ -1318,7 +1323,7 @@ struct iort_dev_config {
>  	int (*dev_count_resources)(struct acpi_iort_node *node);
>  	void (*dev_init_resources)(struct resource *res,
>  				     struct acpi_iort_node *node);
> -	void (*dev_set_proximity)(struct device *dev,
> +	int (*dev_set_proximity)(struct device *dev,
>  				    struct acpi_iort_node *node);
>  };
>  
> @@ -1369,8 +1374,11 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
>  	if (!pdev)
>  		return -ENOMEM;
>  
> -	if (ops->dev_set_proximity)
> -		ops->dev_set_proximity(&pdev->dev, node);
> +	if (ops->dev_set_proximity) {
> +		ret = ops->dev_set_proximity(&pdev->dev, node);
> +		if (ret)
> +			goto dev_put;
> +	}
>  
>  	count = ops->dev_count_resources(node);
>  
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* Re: [PATCH RESEND v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node
  2019-03-29  3:17             ` [PATCH RESEND " Kefeng Wang
  2019-04-08 10:42               ` Lorenzo Pieralisi
@ 2019-04-08 10:46               ` Lorenzo Pieralisi
  2019-04-08 15:21                 ` [PATCH v3] ACPI/IORT: Reject platform device creation on NUMA node mapping failure Kefeng Wang
  1 sibling, 1 reply; 18+ messages in thread
From: Lorenzo Pieralisi @ 2019-04-08 10:46 UTC (permalink / raw)
  To: Kefeng Wang
  Cc: rjw, linux-acpi, hanjun.guo, Sudeep Holla, Robin Murphy,
	linux-arm-kernel

Also, in the $SUBJECT, s/numa/NUMA because that's an acronym not
an English word.

Here:

"ACPI/IORT: Reject platform device creation on NUMA node mapping failure"

Thanks,
Lorenzo

On Fri, Mar 29, 2019 at 11:17:51AM +0800, Kefeng Wang wrote:
> If the system has only node 0, but the SMMUv3 device is assigned to
> offline node 1, as parsed from the proximity domain in the SMMUv3 IORT
> table, the following crash results:
> 
> [   47.492451] Unable to handle kernel paging request at virtual address 0000000000001388
> [   47.500361] Mem abort info:
> [   47.503143]   ESR = 0x96000004
> [   47.506189]   Exception class = DABT (current EL), IL = 32 bits
> [   47.512099]   SET = 0, FnV = 0
> [   47.515140]   EA = 0, S1PTW = 0
> [   47.518272] Data abort info:
> [   47.521144]   ISV = 0, ISS = 0x00000004
> [   47.524970]   CM = 0, WnR = 0
> [   47.527929] [0000000000001388] user address but active_mm is swapper
> [   47.534285] Internal error: Oops: 96000004 [#1] SMP
> [   47.539151] Modules linked in:
> [   47.542194] CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
> [   47.549490] pstate: 80c00009 (Nzcv daif +PAN +UAO)
> [   47.554272] pc : __alloc_pages_nodemask+0x13c/0x1068
> [   47.559224] lr : __alloc_pages_nodemask+0xdc/0x1068
> ...
> [   47.646873] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
> [   47.653560] Call trace:
> [   47.655994]  __alloc_pages_nodemask+0x13c/0x1068
> [   47.660600]  new_slab+0xec/0x570
> [   47.663816]  ___slab_alloc+0x3e0/0x4f8
> [   47.667553]  __slab_alloc+0x60/0x80
> [   47.671029]  __kmalloc_node_track_caller+0x10c/0x478
> [   47.675984]  devm_kmalloc+0x44/0xb0
> [   47.679460]  pinctrl_bind_pins+0x4c/0x188
> [   47.683457]  really_probe+0x78/0x2b8
> [   47.687019]  driver_probe_device+0x64/0x110
> [   47.691189]  device_driver_attach+0x74/0x98
> [   47.695360]  __driver_attach+0x9c/0xe8
> [   47.699095]  bus_for_each_dev+0x84/0xd8
> [   47.702919]  driver_attach+0x30/0x40
> [   47.706481]  bus_add_driver+0x170/0x218
> [   47.710304]  driver_register+0x64/0x118
> [   47.714128]  __platform_driver_register+0x54/0x60
> [   47.718820]  arm_smmu_driver_init+0x24/0x2c
> [   47.722991]  do_one_initcall+0xbc/0x328
> [   47.726816]  kernel_init_freeable+0x304/0x3ac
> [   47.731162]  kernel_init+0x18/0x110
> [   47.734638]  ret_from_fork+0x10/0x1c
> [   47.738202] Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
> [   47.744307] ---[ end trace dfeaed4c373a32da ]--
> 
> This can be triggered by a firmware bug (a bad IORT configuration), or by
> a NUMA node that has no memory attached to it together with NR_CPUS being
> smaller than the number of CPUs presented in the MADT.
> 
> Make dev_set_proximity() return a value, and terminate device creation
> if it returns failure.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  drivers/acpi/arm64/iort.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index e48894e002ba..1fc1851b078e 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1232,18 +1232,23 @@ static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
>  /*
>   * set numa proximity domain for smmuv3 device
>   */
> -static void  __init arm_smmu_v3_set_proximity(struct device *dev,
> +static int  __init arm_smmu_v3_set_proximity(struct device *dev,
>  					      struct acpi_iort_node *node)
>  {
>  	struct acpi_iort_smmu_v3 *smmu;
>  
>  	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
>  	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
> -		set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
> +		int node = acpi_map_pxm_to_node(smmu->pxm);
> +		if (node != NUMA_NO_NODE && !node_online(node))
> +			return -EINVAL;
> +
> +		set_dev_node(dev, node);
>  		pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
>  			smmu->base_address,
>  			smmu->pxm);
>  	}
> +	return 0;
>  }
>  #else
>  #define arm_smmu_v3_set_proximity NULL
> @@ -1318,7 +1323,7 @@ struct iort_dev_config {
>  	int (*dev_count_resources)(struct acpi_iort_node *node);
>  	void (*dev_init_resources)(struct resource *res,
>  				     struct acpi_iort_node *node);
> -	void (*dev_set_proximity)(struct device *dev,
> +	int (*dev_set_proximity)(struct device *dev,
>  				    struct acpi_iort_node *node);
>  };
>  
> @@ -1369,8 +1374,11 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
>  	if (!pdev)
>  		return -ENOMEM;
>  
> -	if (ops->dev_set_proximity)
> -		ops->dev_set_proximity(&pdev->dev, node);
> +	if (ops->dev_set_proximity) {
> +		ret = ops->dev_set_proximity(&pdev->dev, node);
> +		if (ret)
> +			goto dev_put;
> +	}
>  
>  	count = ops->dev_count_resources(node);
>  
> -- 
> 2.20.1
> 


^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH v3] ACPI/IORT: Reject platform device creation on NUMA node mapping failure
  2019-04-08 10:46               ` Lorenzo Pieralisi
@ 2019-04-08 15:21                 ` Kefeng Wang
  2019-04-16 17:02                   ` Lorenzo Pieralisi
  0 siblings, 1 reply; 18+ messages in thread
From: Kefeng Wang @ 2019-04-08 15:21 UTC (permalink / raw)
  To: Lorenzo Pieralisi, Robin Murphy, Sudeep Holla, rjw, linux-acpi,
	linux-arm-kernel
  Cc: Kefeng Wang

In a system where, through IORT firmware mappings, the SMMU device is
mapped to a NUMA node that is not online, the kernel bootstrap results
in the following crash:

  Unable to handle kernel paging request at virtual address 0000000000001388
  Mem abort info:
    ESR = 0x96000004
    Exception class = DABT (current EL), IL = 32 bits
    SET = 0, FnV = 0
    EA = 0, S1PTW = 0
  Data abort info:
    ISV = 0, ISS = 0x00000004
    CM = 0, WnR = 0
  [0000000000001388] user address but active_mm is swapper
  Internal error: Oops: 96000004 [#1] SMP
  Modules linked in:
  CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
  pstate: 80c00009 (Nzcv daif +PAN +UAO)
  pc : __alloc_pages_nodemask+0x13c/0x1068
  lr : __alloc_pages_nodemask+0xdc/0x1068
  ...
  Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
  Call trace:
   __alloc_pages_nodemask+0x13c/0x1068
   new_slab+0xec/0x570
   ___slab_alloc+0x3e0/0x4f8
   __slab_alloc+0x60/0x80
   __kmalloc_node_track_caller+0x10c/0x478
   devm_kmalloc+0x44/0xb0
   pinctrl_bind_pins+0x4c/0x188
   really_probe+0x78/0x2b8
   driver_probe_device+0x64/0x110
   device_driver_attach+0x74/0x98
   __driver_attach+0x9c/0xe8
   bus_for_each_dev+0x84/0xd8
   driver_attach+0x30/0x40
   bus_add_driver+0x170/0x218
   driver_register+0x64/0x118
   __platform_driver_register+0x54/0x60
   arm_smmu_driver_init+0x24/0x2c
   do_one_initcall+0xbc/0x328
   kernel_init_freeable+0x304/0x3ac
   kernel_init+0x18/0x110
   ret_from_fork+0x10/0x1c
  Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
  ---[ end trace dfeaed4c373a32da ]--

Change the dev_set_proximity() hook prototype so that it returns a
value and make it return failure if the PXM->NUMA-node mapping
corresponds to an offline node, fixing the crash.

Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Link: https://lore.kernel.org/linux-arm-kernel/20190315021940.86905-1-wangkefeng.wang@huawei.com/
---
v2->v3:
- Update the changelog according to Lorenzo Pieralisi's comments and add Acked-by.

 drivers/acpi/arm64/iort.c | 19 ++++++++++++++-----
 1 file changed, 14 insertions(+), 5 deletions(-)

diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
index e48894e002ba..a46c2c162c03 100644
--- a/drivers/acpi/arm64/iort.c
+++ b/drivers/acpi/arm64/iort.c
@@ -1232,18 +1232,24 @@ static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
 /*
  * set numa proximity domain for smmuv3 device
  */
-static void  __init arm_smmu_v3_set_proximity(struct device *dev,
+static int  __init arm_smmu_v3_set_proximity(struct device *dev,
 					      struct acpi_iort_node *node)
 {
 	struct acpi_iort_smmu_v3 *smmu;
 
 	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
 	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
-		set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
+		int node = acpi_map_pxm_to_node(smmu->pxm);
+
+		if (node != NUMA_NO_NODE && !node_online(node))
+			return -EINVAL;
+
+		set_dev_node(dev, node);
 		pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
 			smmu->base_address,
 			smmu->pxm);
 	}
+	return 0;
 }
 #else
 #define arm_smmu_v3_set_proximity NULL
@@ -1318,7 +1324,7 @@ struct iort_dev_config {
 	int (*dev_count_resources)(struct acpi_iort_node *node);
 	void (*dev_init_resources)(struct resource *res,
 				     struct acpi_iort_node *node);
-	void (*dev_set_proximity)(struct device *dev,
+	int (*dev_set_proximity)(struct device *dev,
 				    struct acpi_iort_node *node);
 };
 
@@ -1369,8 +1375,11 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
 	if (!pdev)
 		return -ENOMEM;
 
-	if (ops->dev_set_proximity)
-		ops->dev_set_proximity(&pdev->dev, node);
+	if (ops->dev_set_proximity) {
+		ret = ops->dev_set_proximity(&pdev->dev, node);
+		if (ret)
+			goto dev_put;
+	}
 
 	count = ops->dev_count_resources(node);
 
-- 
2.20.1
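
One detail in the first hunk: the new local `int node` has the same name as the `struct acpi_iort_node *node` parameter, so the parameter is shadowed inside the `if` block. That is legal C and the hunk does not use the parameter there, but builds with -Wshadow would flag it. A sketch of the same check with a distinct local name follows. Everything in it is a simplified assumption for illustration: `struct toy_iort_node`, `struct toy_device`, `toy_pxm_to_node()`, `toy_node_online()` and the name `dev_node` are not taken from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE (-1)	/* same value the kernel uses */

/* Simplified, hypothetical stand-ins for the kernel structures. */
struct toy_iort_node {
	int pxm;
	bool pxm_valid;
};

struct toy_device {
	int numa_node;
};

/* Toy mapping for this sketch only: PXM n maps to node n, only node 0 is online. */
static int toy_pxm_to_node(int pxm)
{
	return pxm;
}

static bool toy_node_online(int node)
{
	return node == 0;
}

static int toy_set_proximity(struct toy_device *dev,
			     const struct toy_iort_node *node)
{
	assert(dev != NULL && node != NULL);

	if (node->pxm_valid) {
		/* dev_node, not node: the parameter stays visible in this block. */
		int dev_node = toy_pxm_to_node(node->pxm);

		if (dev_node != NUMA_NO_NODE && !toy_node_online(dev_node))
			return -EINVAL;

		dev->numa_node = dev_node;
	}
	return 0;
}
```

With this toy mapping, a valid PXM of 1 is rejected with -EINVAL and the device keeps its previous node, while a valid PXM of 0 is accepted.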



^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH v3] ACPI/IORT: Reject platform device creation on NUMA node mapping failure
  2019-04-08 15:21                 ` [PATCH v3] ACPI/IORT: Reject platform device creation on NUMA node mapping failure Kefeng Wang
@ 2019-04-16 17:02                   ` Lorenzo Pieralisi
  2019-04-16 17:05                     ` Will Deacon
  0 siblings, 1 reply; 18+ messages in thread
From: Lorenzo Pieralisi @ 2019-04-16 17:02 UTC (permalink / raw)
  To: Kefeng Wang, will.deacon
  Cc: rjw, Robin Murphy, linux-acpi, linux-arm-kernel, Sudeep Holla

[+Will]

Hi Will,

There is not enough material for an IORT pull request this cycle, but
this patch should be merged, and IORT code usually goes via arm64. Can
you pick it up? If you prefer, I can resend it on LAKML and CC you, if
that makes it any simpler.

Thanks,
Lorenzo

On Mon, Apr 08, 2019 at 11:21:12PM +0800, Kefeng Wang wrote:
> In a system where, through IORT firmware mappings, the SMMU device is
> mapped to a NUMA node that is not online, the kernel bootstrap results
> in the following crash:
> 
>   Unable to handle kernel paging request at virtual address 0000000000001388
>   Mem abort info:
>     ESR = 0x96000004
>     Exception class = DABT (current EL), IL = 32 bits
>     SET = 0, FnV = 0
>     EA = 0, S1PTW = 0
>   Data abort info:
>     ISV = 0, ISS = 0x00000004
>     CM = 0, WnR = 0
>   [0000000000001388] user address but active_mm is swapper
>   Internal error: Oops: 96000004 [#1] SMP
>   Modules linked in:
>   CPU: 5 PID: 1 Comm: swapper/0 Not tainted 5.0.0 #15
>   pstate: 80c00009 (Nzcv daif +PAN +UAO)
>   pc : __alloc_pages_nodemask+0x13c/0x1068
>   lr : __alloc_pages_nodemask+0xdc/0x1068
>   ...
>   Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
>   Call trace:
>    __alloc_pages_nodemask+0x13c/0x1068
>    new_slab+0xec/0x570
>    ___slab_alloc+0x3e0/0x4f8
>    __slab_alloc+0x60/0x80
>    __kmalloc_node_track_caller+0x10c/0x478
>    devm_kmalloc+0x44/0xb0
>    pinctrl_bind_pins+0x4c/0x188
>    really_probe+0x78/0x2b8
>    driver_probe_device+0x64/0x110
>    device_driver_attach+0x74/0x98
>    __driver_attach+0x9c/0xe8
>    bus_for_each_dev+0x84/0xd8
>    driver_attach+0x30/0x40
>    bus_add_driver+0x170/0x218
>    driver_register+0x64/0x118
>    __platform_driver_register+0x54/0x60
>    arm_smmu_driver_init+0x24/0x2c
>    do_one_initcall+0xbc/0x328
>    kernel_init_freeable+0x304/0x3ac
>    kernel_init+0x18/0x110
>    ret_from_fork+0x10/0x1c
>   Code: f90013b5 b9410fa1 1a9f0694 b50014c2 (b9400804)
>   ---[ end trace dfeaed4c373a32da ]--
> 
> Change the dev_set_proximity() hook prototype so that it returns a
> value and make it return failure if the PXM->NUMA-node mapping
> corresponds to an offline node, fixing the crash.
> 
> Acked-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> Link: https://lore.kernel.org/linux-arm-kernel/20190315021940.86905-1-wangkefeng.wang@huawei.com/
> ---
> v2->v3:
> -Update changelog according to Lorenzo Pieralisi's comment and add acked-by.
> 
>  drivers/acpi/arm64/iort.c | 19 ++++++++++++++-----
>  1 file changed, 14 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/acpi/arm64/iort.c b/drivers/acpi/arm64/iort.c
> index e48894e002ba..a46c2c162c03 100644
> --- a/drivers/acpi/arm64/iort.c
> +++ b/drivers/acpi/arm64/iort.c
> @@ -1232,18 +1232,24 @@ static bool __init arm_smmu_v3_is_coherent(struct acpi_iort_node *node)
>  /*
>   * set numa proximity domain for smmuv3 device
>   */
> -static void  __init arm_smmu_v3_set_proximity(struct device *dev,
> +static int  __init arm_smmu_v3_set_proximity(struct device *dev,
>  					      struct acpi_iort_node *node)
>  {
>  	struct acpi_iort_smmu_v3 *smmu;
>  
>  	smmu = (struct acpi_iort_smmu_v3 *)node->node_data;
>  	if (smmu->flags & ACPI_IORT_SMMU_V3_PXM_VALID) {
> -		set_dev_node(dev, acpi_map_pxm_to_node(smmu->pxm));
> +		int node = acpi_map_pxm_to_node(smmu->pxm);
> +
> +		if (node != NUMA_NO_NODE && !node_online(node))
> +			return -EINVAL;
> +
> +		set_dev_node(dev, node);
>  		pr_info("SMMU-v3[%llx] Mapped to Proximity domain %d\n",
>  			smmu->base_address,
>  			smmu->pxm);
>  	}
> +	return 0;
>  }
>  #else
>  #define arm_smmu_v3_set_proximity NULL
> @@ -1318,7 +1324,7 @@ struct iort_dev_config {
>  	int (*dev_count_resources)(struct acpi_iort_node *node);
>  	void (*dev_init_resources)(struct resource *res,
>  				     struct acpi_iort_node *node);
> -	void (*dev_set_proximity)(struct device *dev,
> +	int (*dev_set_proximity)(struct device *dev,
>  				    struct acpi_iort_node *node);
>  };
>  
> @@ -1369,8 +1375,11 @@ static int __init iort_add_platform_device(struct acpi_iort_node *node,
>  	if (!pdev)
>  		return -ENOMEM;
>  
> -	if (ops->dev_set_proximity)
> -		ops->dev_set_proximity(&pdev->dev, node);
> +	if (ops->dev_set_proximity) {
> +		ret = ops->dev_set_proximity(&pdev->dev, node);
> +		if (ret)
> +			goto dev_put;
> +	}
>  
>  	count = ops->dev_count_resources(node);
>  
> -- 
> 2.20.1
> 
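
For readers following the archive, the pattern in the quoted patch (a hook
that returns an error code, and a caller that aborts device creation when
the hook fails) can be sketched in user space roughly as below. This is a
minimal illustration, not the kernel code: every fake_* name, the
node_is_online table and the identity PXM-to-node mapping are assumptions
made up for the example. The local is called dev_node only so that it does
not reuse the name of the hook's node parameter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define NUMA_NO_NODE	(-1)
#define FAKE_MAX_NODES	4

/* Hypothetical stand-in for struct device; only the NUMA node is modelled. */
struct fake_device {
	int numa_node;
};

/* Assumed topology for the example: only node 0 is online. */
static const bool node_is_online[FAKE_MAX_NODES] = { true, false, false, false };

/* Hypothetical PXM lookup: identity mapping, NUMA_NO_NODE when out of range. */
static int fake_map_pxm_to_node(int pxm)
{
	if (pxm < 0 || pxm >= FAKE_MAX_NODES)
		return NUMA_NO_NODE;
	return pxm;
}

static bool fake_node_online(int node)
{
	return node >= 0 && node < FAKE_MAX_NODES && node_is_online[node];
}

/* Hook: refuses a mapped-but-offline node instead of silently assigning it. */
static int fake_set_proximity(struct fake_device *dev, int pxm)
{
	int dev_node;

	assert(dev != NULL);

	dev_node = fake_map_pxm_to_node(pxm);
	if (dev_node != NUMA_NO_NODE && !fake_node_online(dev_node))
		return -EINVAL;

	dev->numa_node = dev_node;
	return 0;
}

typedef int (*fake_set_proximity_fn)(struct fake_device *dev, int pxm);

/* Caller: the hook is optional, and its failure aborts device creation. */
static int fake_add_device(struct fake_device *dev,
			   fake_set_proximity_fn hook, int pxm)
{
	int ret;

	dev->numa_node = NUMA_NO_NODE;

	if (hook) {
		ret = hook(dev, pxm);
		if (ret)
			return ret;	/* the patch jumps to dev_put here */
	}

	return 0;
}
```

In this sketch a PXM that maps to no node at all is still accepted, and the
device is left at NUMA_NO_NODE, which mirrors the "node != NUMA_NO_NODE"
test in the patch. Only a mapped but offline node is rejected.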


* Re: [PATCH v3] ACPI/IORT: Reject platform device creation on NUMA node mapping failure
  2019-04-16 17:02                   ` Lorenzo Pieralisi
@ 2019-04-16 17:05                     ` Will Deacon
  0 siblings, 0 replies; 18+ messages in thread
From: Will Deacon @ 2019-04-16 17:05 UTC (permalink / raw)
  To: Lorenzo Pieralisi
  Cc: Kefeng Wang, rjw, linux-acpi, Sudeep Holla, Robin Murphy,
	linux-arm-kernel

On Tue, Apr 16, 2019 at 06:02:20PM +0100, Lorenzo Pieralisi wrote:
> [+Will]
> 
> There is not enough material for an IORT pull request this cycle, but
> this patch should be merged, and IORT code usually goes via arm64. Can
> you pick it up? If you prefer, I can resend it on LAKML and CC you, if
> that makes it any simpler.
> 
> On Mon, Apr 08, 2019 at 11:21:12PM +0800, Kefeng Wang wrote:
> > In a system where, through IORT firmware mappings, the SMMU device is
> > mapped to a NUMA node that is not online, the kernel bootstrap results
> > in the following crash:

Queued for 5.2. Thanks for the heads-up.

Will


Thread overview: 18+ messages
2019-03-15  2:19 [PATCH 0/2] fix issue when acpi smmuv3 device alloc offline node memory Kefeng Wang
2019-03-15  2:19 ` [PATCH 1/2] ACPI/IORT: set online numa node for smmuv3 device Kefeng Wang
2019-03-20 11:41   ` Robin Murphy
2019-03-20 14:00     ` Lorenzo Pieralisi
2019-03-21  6:08       ` Kefeng Wang
2019-03-27 14:24         ` Kefeng Wang
2019-03-28 11:32         ` Lorenzo Pieralisi
2019-03-28 14:00           ` [PATCH v2] ACPI/IORT: Reject platform dev creation when dev set to wrong numa node Kefeng Wang
2019-03-28 13:59             ` Robin Murphy
2019-03-28 14:29               ` Kefeng Wang
2019-03-29  3:17             ` [PATCH RESEND " Kefeng Wang
2019-04-08 10:42               ` Lorenzo Pieralisi
2019-04-08 10:46               ` Lorenzo Pieralisi
2019-04-08 15:21                 ` [PATCH v3] ACPI/IORT: Reject platform device creation on NUMA node mapping failure Kefeng Wang
2019-04-16 17:02                   ` Lorenzo Pieralisi
2019-04-16 17:05                     ` Will Deacon
2019-03-15  2:19 ` [PATCH 2/2] ACPI: NUMA: show match info about PXM ID and offline/online node Kefeng Wang
2019-03-15  8:34   ` Kefeng Wang