* [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables
@ 2018-12-13 10:59 ` Shameer Kolothum
  0 siblings, 0 replies; 12+ messages in thread
From: Shameer Kolothum @ 2018-12-13 10:59 UTC (permalink / raw)
  To: marc.zyngier, linux-kernel
  Cc: shankerd, ganapatrao.kulkarni, Robert.Richter, guohanjun,
	john.garry, linux-arm-kernel, linuxarm

From: Shanker Donthineni <shankerd@codeaurora.org>

The NUMA node information is visible to the ITS driver, but it is not
used for anything other than handling hardware errata. ITS/GICR
hardware accesses to the local NUMA node are usually quicker than
accesses to a remote NUMA node; how much slower the remote accesses
are depends on the implementation.

This patch allocates the memory for the ITS management tables and
command queue from the corresponding NUMA node, using the appropriate
NUMA-aware functions. This improves ITS table read latency on systems
that have more than one ITS block and slower inter-node accesses.

Apache web server benchmarking using the ab tool on a HiSilicon D06
board with multiple NUMA memory nodes shows Time per request and
Transfer rate improvements of ~3.6% with this patch.

Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
---

This is to revive the patch originally sent by Shanker[1] and 
to back it up with a benchmark test. Any further testing of
this is most welcome.

v2-->v3
 -Addressed comments to use page_address().
 -Added Benchmark results to commit log.
 -Removed T-by from Ganapatrao for now.

v1-->v2
 -Edited commit text.
 -Added Ganapatrao's tested-by.

Benchmark test details:
--------------------------------
Test Setup:
-D06 with DIMMs on node 0 (Sock#0) and node 3 (Sock#1).
-ITS belongs to NUMA node 0.
-Filesystem mounted on a PCIe NVMe based disk.
-Apache server installed on D06.
-Running the ab benchmark in concurrency mode from a remote machine
 connected to D06 via an hns3 (PCIe) network port.
 "ab -k -c 750 -n 2000000 http://10.202.225.188/"

Test results are avg. of 15 runs.
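(In the ab invocation above, -k enables HTTP keep-alive, -c 750 sets
the concurrency level, and -n 2000000 sets the total request count.)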

For 4.20-rc1  Kernel,
----------------------------
Time per request(mean, concurrent)  = 0.02753[ms]  
Transfer Rate = 416501[Kbytes/sec]

For 4.20-rc1 +  this patch,
----------------------------------
Time per request(mean, concurrent)  = 0.02653[ms]  
Transfer Rate = 431954[Kbytes/sec]

% improvement ~3.6%
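(Worked out from the figures above: (0.02753 - 0.02653) / 0.02753 is a
~3.6% reduction in time per request, and (431954 - 416501) / 416501 is
a ~3.7% gain in transfer rate.)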

vmstat shows around 170K-200K interrupts per second.

~# vmstat 1 -w
procs ----------------memory---------------- --system--
 r  b         swpd         free            in
 5  0            0     30166724          102794 
 9  0            0     30141828          171148 
 5  0            0     30150160          207185 
13  0            0     30145924          175691 
15  0            0     30140792          145250 
13  0            0     30135556          201879 
13  0            0     30134864          192391 
10  0            0     30133632          168880 
....

[1] https://patchwork.kernel.org/patch/9833339/

 drivers/irqchip/irq-gic-v3-its.c | 20 ++++++++++++--------
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
index db20e99..ab01061 100644
--- a/drivers/irqchip/irq-gic-v3-its.c
+++ b/drivers/irqchip/irq-gic-v3-its.c
@@ -1749,7 +1749,8 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser,
 		order = get_order(GITS_BASER_PAGES_MAX * psz);
 	}
 
-	base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
+	base = (void *)page_address(alloc_pages_node(its->numa_node,
+				    GFP_KERNEL | __GFP_ZERO, order));
 	if (!base)
 		return -ENOMEM;
 
@@ -2236,7 +2237,8 @@ static struct its_baser *its_get_baser(struct its_node *its, u32 type)
 	return NULL;
 }
 
-static bool its_alloc_table_entry(struct its_baser *baser, u32 id)
+static bool its_alloc_table_entry(struct its_node *its,
+				  struct its_baser *baser, u32 id)
 {
 	struct page *page;
 	u32 esz, idx;
@@ -2256,7 +2258,8 @@ static bool its_alloc_table_entry(struct its_baser *baser, u32 id)
 
 	/* Allocate memory for 2nd level table */
 	if (!table[idx]) {
-		page = alloc_pages(GFP_KERNEL | __GFP_ZERO, get_order(baser->psz));
+		page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO,
+					get_order(baser->psz));
 		if (!page)
 			return false;
 
@@ -2287,7 +2290,7 @@ static bool its_alloc_device_table(struct its_node *its, u32 dev_id)
 	if (!baser)
 		return (ilog2(dev_id) < its->device_ids);
 
-	return its_alloc_table_entry(baser, dev_id);
+	return its_alloc_table_entry(its, baser, dev_id);
 }
 
 static bool its_alloc_vpe_table(u32 vpe_id)
@@ -2311,7 +2314,7 @@ static bool its_alloc_vpe_table(u32 vpe_id)
 		if (!baser)
 			return false;
 
-		if (!its_alloc_table_entry(baser, vpe_id))
+		if (!its_alloc_table_entry(its, baser, vpe_id))
 			return false;
 	}
 
@@ -2345,7 +2348,7 @@ static struct its_device *its_create_device(struct its_node *its, u32 dev_id,
 	nr_ites = max(2, nvecs);
 	sz = nr_ites * its->ite_size;
 	sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1;
-	itt = kzalloc(sz, GFP_KERNEL);
+	itt = kzalloc_node(sz, GFP_KERNEL, its->numa_node);
 	if (alloc_lpis) {
 		lpi_map = its_lpi_alloc(nvecs, &lpi_base, &nr_lpis);
 		if (lpi_map)
@@ -3541,8 +3544,9 @@ static int __init its_probe_one(struct resource *res,
 
 	its->numa_node = numa_node;
 
-	its->cmd_base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
-						get_order(ITS_CMD_QUEUE_SZ));
+	its->cmd_base = (void *)page_address(alloc_pages_node(its->numa_node,
+					     GFP_KERNEL | __GFP_ZERO,
+					     get_order(ITS_CMD_QUEUE_SZ)));
 	if (!its->cmd_base) {
 		err = -ENOMEM;
 		goto out_free_its;
-- 
2.7.4




* Re: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables
  2018-12-13 10:59 ` Shameer Kolothum
@ 2018-12-13 11:54   ` Marc Zyngier
  -1 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2018-12-13 11:54 UTC (permalink / raw)
  To: Shameer Kolothum, linux-kernel
  Cc: shankerd, ganapatrao.kulkarni, Robert.Richter, guohanjun,
	john.garry, linux-arm-kernel, linuxarm

On 13/12/2018 10:59, Shameer Kolothum wrote:
> From: Shanker Donthineni <shankerd@codeaurora.org>
> 
> The NUMA node information is visible to the ITS driver, but it is not
> used for anything other than handling hardware errata. ITS/GICR
> hardware accesses to the local NUMA node are usually quicker than
> accesses to a remote NUMA node; how much slower the remote accesses
> are depends on the implementation.
> 
> This patch allocates the memory for the ITS management tables and
> command queue from the corresponding NUMA node, using the appropriate
> NUMA-aware functions. This improves ITS table read latency on systems
> that have more than one ITS block and slower inter-node accesses.
> 
> Apache web server benchmarking using the ab tool on a HiSilicon D06
> board with multiple NUMA memory nodes shows Time per request and
> Transfer rate improvements of ~3.6% with this patch.
> 
> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> 
> This is to revive the patch originally sent by Shanker[1] and 
> to back it up with a benchmark test. Any further testing of
> this is most welcome.
> 
> v2-->v3
>  -Addressed comments to use page_address().
>  -Added Benchmark results to commit log.
>  -Removed T-by from Ganapatrao for now.
> 
> v1-->v2
>  -Edited commit text.
>  -Added Ganapatrao's tested-by.
> 
> Benchmark test details:
> --------------------------------
> Test Setup:
> -D06 with DIMMs on node 0 (Sock#0) and node 3 (Sock#1).
> -ITS belongs to NUMA node 0.
> -Filesystem mounted on a PCIe NVMe based disk.
> -Apache server installed on D06.
> -Running the ab benchmark in concurrency mode from a remote machine
>  connected to D06 via an hns3 (PCIe) network port.
>  "ab -k -c 750 -n 2000000 http://10.202.225.188/"
> 
> Test results are avg. of 15 runs.
> 
> For 4.20-rc1  Kernel,
> ----------------------------
> Time per request(mean, concurrent)  = 0.02753[ms]  
> Transfer Rate = 416501[Kbytes/sec]
> 
> For 4.20-rc1 +  this patch,
> ----------------------------------
> Time per request(mean, concurrent)  = 0.02653[ms]  
> Transfer Rate = 431954[Kbytes/sec]
> 
> % improvement ~3.6%
> 
> vmstat shows around 170K-200K interrupts per second.
> 
> ~# vmstat 1 -w
> procs ----------------memory---------------- --system--
>  r  b         swpd         free            in
>  5  0            0     30166724          102794 
>  9  0            0     30141828          171148 
>  5  0            0     30150160          207185 
> 13  0            0     30145924          175691 
> 15  0            0     30140792          145250 
> 13  0            0     30135556          201879 
> 13  0            0     30134864          192391 
> 10  0            0     30133632          168880 
> ....
> 
> [1] https://patchwork.kernel.org/patch/9833339/

The figures certainly look convincing. I'd need someone from Cavium to
benchmark it on their hardware and come back with results so that we can
make a decision on this.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables
  2018-12-13 11:54   ` Marc Zyngier
@ 2019-01-11  3:53     ` Ganapatrao Kulkarni
  -1 siblings, 0 replies; 12+ messages in thread
From: Ganapatrao Kulkarni @ 2019-01-11  3:53 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Shameer Kolothum, LKML, shankerd, Ganapatrao Kulkarni,
	Robert Richter, Hanjun Guo, John Garry, linux-arm-kernel,
	Linuxarm, Nair, Jayachandran, gkulkarni

Hi Shameer,

Patch looks OK to me, please feel free to add,
Reviewed-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>

On Thu, Dec 13, 2018 at 5:25 PM Marc Zyngier <marc.zyngier@arm.com> wrote:
>
> On 13/12/2018 10:59, Shameer Kolothum wrote:
> > From: Shanker Donthineni <shankerd@codeaurora.org>
> >
> > The NUMA node information is visible to the ITS driver, but it is not
> > used for anything other than handling hardware errata. ITS/GICR
> > hardware accesses to the local NUMA node are usually quicker than
> > accesses to a remote NUMA node; how much slower the remote accesses
> > are depends on the implementation.
> >
> > This patch allocates the memory for the ITS management tables and
> > command queue from the corresponding NUMA node, using the appropriate
> > NUMA-aware functions. This improves ITS table read latency on systems
> > that have more than one ITS block and slower inter-node accesses.
> >
> > Apache web server benchmarking using the ab tool on a HiSilicon D06
> > board with multiple NUMA memory nodes shows Time per request and
> > Transfer rate improvements of ~3.6% with this patch.
> >
> > Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> > Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
> > Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> > ---
> >
> > This is to revive the patch originally sent by Shanker[1] and
> > to back it up with a benchmark test. Any further testing of
> > this is most welcome.
> >
> > v2-->v3
> >  -Addressed comments to use page_address().
> >  -Added Benchmark results to commit log.
> >  -Removed T-by from Ganapatrao for now.
> >
> > v1-->v2
> >  -Edited commit text.
> >  -Added Ganapatrao's tested-by.
> >
> > Benchmark test details:
> > --------------------------------
> > Test Setup:
> > -D06 with DIMMs on node 0 (Sock#0) and node 3 (Sock#1).
> > -ITS belongs to NUMA node 0.
> > -Filesystem mounted on a PCIe NVMe based disk.
> > -Apache server installed on D06.
> > -Running the ab benchmark in concurrency mode from a remote machine
> >  connected to D06 via an hns3 (PCIe) network port.
> >  "ab -k -c 750 -n 2000000 http://10.202.225.188/"
> >
> > Test results are avg. of 15 runs.
> >
> > For 4.20-rc1  Kernel,
> > ----------------------------
> > Time per request(mean, concurrent)  = 0.02753[ms]
> > Transfer Rate = 416501[Kbytes/sec]
> >
> > For 4.20-rc1 +  this patch,
> > ----------------------------------
> > Time per request(mean, concurrent)  = 0.02653[ms]
> > Transfer Rate = 431954[Kbytes/sec]
> >
> > % improvement ~3.6%
> >
> > vmstat shows around 170K-200K interrupts per second.
> >
> > ~# vmstat 1 -w
> > procs ----------------memory---------------- --system--
> >  r  b         swpd         free            in
> >  5  0            0     30166724          102794
> >  9  0            0     30141828          171148
> >  5  0            0     30150160          207185
> > 13  0            0     30145924          175691
> > 15  0            0     30140792          145250
> > 13  0            0     30135556          201879
> > 13  0            0     30134864          192391
> > 10  0            0     30133632          168880
> > ....
> >
> > [1] https://patchwork.kernel.org/patch/9833339/
>
> The figures certainly look convincing. I'd need someone from Cavium to
> benchmark it on their hardware and come back with results so that we can
> make a decision on this.

Hi Marc,
My setup got altered during the lab migration from Cavium to the
Marvell office. I don't think I will have the same setup anytime soon.

>
> Thanks,
>
>         M.
> --
> Jazz is not dead. It just smells funny...

Thanks,
Ganapat


* Re: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables
  2019-01-11  3:53     ` Ganapatrao Kulkarni
@ 2019-01-11  9:01       ` Marc Zyngier
  -1 siblings, 0 replies; 12+ messages in thread
From: Marc Zyngier @ 2019-01-11  9:01 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Shameer Kolothum, LKML, shankerd, Ganapatrao Kulkarni,
	Robert Richter, Hanjun Guo, John Garry, linux-arm-kernel,
	Linuxarm, Nair, Jayachandran, gkulkarni

On 11/01/2019 03:53, Ganapatrao Kulkarni wrote:
> Hi Shameer,
> 
> Patch looks OK to me, please feel free to add,
> Reviewed-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
> 
> On Thu, Dec 13, 2018 at 5:25 PM Marc Zyngier <marc.zyngier@arm.com> wrote:
>>
>> On 13/12/2018 10:59, Shameer Kolothum wrote:
>>> From: Shanker Donthineni <shankerd@codeaurora.org>
>>>
>>> The NUMA node information is visible to the ITS driver, but it is not
>>> used for anything other than handling hardware errata. ITS/GICR
>>> hardware accesses to the local NUMA node are usually quicker than
>>> accesses to a remote NUMA node; how much slower the remote accesses
>>> are depends on the implementation.
>>>
>>> This patch allocates the memory for the ITS management tables and
>>> command queue from the corresponding NUMA node, using the appropriate
>>> NUMA-aware functions. This improves ITS table read latency on systems
>>> that have more than one ITS block and slower inter-node accesses.
>>>
>>> Apache web server benchmarking using the ab tool on a HiSilicon D06
>>> board with multiple NUMA memory nodes shows Time per request and
>>> Transfer rate improvements of ~3.6% with this patch.
>>>
>>> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
>>> Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
>>> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
>>> ---
>>>
>>> This is to revive the patch originally sent by Shanker[1] and
>>> to back it up with a benchmark test. Any further testing of
>>> this is most welcome.
>>>
>>> v2-->v3
>>>  -Addressed comments to use page_address().
>>>  -Added Benchmark results to commit log.
>>>  -Removed T-by from Ganapatrao for now.
>>>
>>> v1-->v2
>>>  -Edited commit text.
>>>  -Added Ganapatrao's tested-by.
>>>
>>> Benchmark test details:
>>> --------------------------------
>>> Test Setup:
>>> -D06 with DIMMs on node 0 (Sock#0) and node 3 (Sock#1).
>>> -ITS belongs to NUMA node 0.
>>> -Filesystem mounted on a PCIe NVMe based disk.
>>> -Apache server installed on D06.
>>> -Running the ab benchmark in concurrency mode from a remote machine
>>>  connected to D06 via an hns3 (PCIe) network port.
>>>  "ab -k -c 750 -n 2000000 http://10.202.225.188/"
>>>
>>> Test results are avg. of 15 runs.
>>>
>>> For 4.20-rc1  Kernel,
>>> ----------------------------
>>> Time per request(mean, concurrent)  = 0.02753[ms]
>>> Transfer Rate = 416501[Kbytes/sec]
>>>
>>> For 4.20-rc1 +  this patch,
>>> ----------------------------------
>>> Time per request(mean, concurrent)  = 0.02653[ms]
>>> Transfer Rate = 431954[Kbytes/sec]
>>>
>>> % improvement ~3.6%
>>>
>>> vmstat shows around 170K-200K interrupts per second.
>>>
>>> ~# vmstat 1 -w
>>> procs ----------------memory---------------- --system--
>>>  r  b         swpd         free            in
>>>  5  0            0     30166724          102794
>>>  9  0            0     30141828          171148
>>>  5  0            0     30150160          207185
>>> 13  0            0     30145924          175691
>>> 15  0            0     30140792          145250
>>> 13  0            0     30135556          201879
>>> 13  0            0     30134864          192391
>>> 10  0            0     30133632          168880
>>> ....
>>>
>>> [1] https://patchwork.kernel.org/patch/9833339/
>>
>> The figures certainly look convincing. I'd need someone from Cavium to
>> benchmark it on their hardware and come back with results so that we can
>> make a decision on this.
> 
> Hi Marc,
> My setup got altered during the lab migration from Cavium to the
> Marvell office. I don't think I will have the same setup anytime soon.

Fair enough. If nobody objects, I'll take it.

Shameer, please repost this on top of 5.0-rc1, together with
Ganapatrao's RB, and we'll take it from there.

Thanks,

	M.
-- 
Jazz is not dead. It just smells funny...


* Re: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables
  2018-12-13 10:59 ` Shameer Kolothum
@ 2019-01-11  9:42   ` Suzuki K Poulose
  -1 siblings, 0 replies; 12+ messages in thread
From: Suzuki K Poulose @ 2019-01-11  9:42 UTC (permalink / raw)
  To: shameerali.kolothum.thodi, marc.zyngier, linux-kernel
  Cc: shankerd, ganapatrao.kulkarni, Robert.Richter, guohanjun,
	john.garry, linux-arm-kernel, linuxarm

Hi Shameer,

On 13/12/2018 10:59, Shameer Kolothum wrote:
> From: Shanker Donthineni <shankerd@codeaurora.org>
> 
> The NUMA node information is visible to the ITS driver, but it is not
> used for anything other than handling hardware errata. ITS/GICR
> hardware accesses to the local NUMA node are usually quicker than
> accesses to a remote NUMA node; how much slower the remote accesses
> are depends on the implementation.
> 
> This patch allocates the memory for the ITS management tables and
> command queue from the corresponding NUMA node, using the appropriate
> NUMA-aware functions. This improves ITS table read latency on systems
> that have more than one ITS block and slower inter-node accesses.
> 
> Apache web server benchmarking using the ab tool on a HiSilicon D06
> board with multiple NUMA memory nodes shows Time per request and
> Transfer rate improvements of ~3.6% with this patch.
> 
> Signed-off-by: Shanker Donthineni <shankerd@codeaurora.org>
> Signed-off-by: Hanjun Guo <guohanjun@huawei.com>
> Signed-off-by: Shameer Kolothum <shameerali.kolothum.thodi@huawei.com>
> ---
> 
> This is to revive the patch originally sent by Shanker[1] and
> to back it up with a benchmark test. Any further testing of
> this is most welcome.
> 
> v2-->v3
>   -Addressed comments to use page_address().
>   -Added Benchmark results to commit log.
>   -Removed T-by from Ganapatrao for now.
> 
> v1-->v2
>   -Edited commit text.
>   -Added Ganapatrao's tested-by.
> 
> Benchmark test details:
> --------------------------------
> Test Setup:
> -D06 with DIMMs on node 0 (Sock#0) and node 3 (Sock#1).
> -ITS belongs to NUMA node 0.
> -Filesystem mounted on a PCIe NVMe based disk.
> -Apache server installed on D06.
> -Running the ab benchmark in concurrency mode from a remote machine
>   connected to D06 via an hns3 (PCIe) network port.
>   "ab -k -c 750 -n 2000000 http://10.202.225.188/"
> 
> Test results are avg. of 15 runs.
> 
> For 4.20-rc1  Kernel,
> ----------------------------
> Time per request(mean, concurrent)  = 0.02753[ms]
> Transfer Rate = 416501[Kbytes/sec]
> 
> For 4.20-rc1 +  this patch,
> ----------------------------------
> Time per request(mean, concurrent)  = 0.02653[ms]
> Transfer Rate = 431954[Kbytes/sec]
> 
> % improvement ~3.6%
> 
> vmstat shows around 170K-200K interrupts per second.
> 
> ~# vmstat 1 -w
> procs ----------------memory---------------- --system--
>   r  b         swpd         free            in
>   5  0            0     30166724          102794
>   9  0            0     30141828          171148
>   5  0            0     30150160          207185
> 13  0            0     30145924          175691
> 15  0            0     30140792          145250
> 13  0            0     30135556          201879
> 13  0            0     30134864          192391
> 10  0            0     30133632          168880
> ....
> 
> [1] https://patchwork.kernel.org/patch/9833339/
> 
>   drivers/irqchip/irq-gic-v3-its.c | 20 ++++++++++++--------
>   1 file changed, 12 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> index db20e99..ab01061 100644
> --- a/drivers/irqchip/irq-gic-v3-its.c
> +++ b/drivers/irqchip/irq-gic-v3-its.c
> @@ -1749,7 +1749,8 @@ static int its_setup_baser(struct its_node *its, struct its_baser *baser,
>   		order = get_order(GITS_BASER_PAGES_MAX * psz);
>   	}
>   
> -	base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> +	base = (void *)page_address(alloc_pages_node(its->numa_node,
> +				    GFP_KERNEL | __GFP_ZERO, order));

If alloc_pages_node() fails and returns NULL, page_address() could crash the system.

> -	its->cmd_base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> -						get_order(ITS_CMD_QUEUE_SZ));
> +	its->cmd_base = (void *)page_address(alloc_pages_node(its->numa_node,
> +					     GFP_KERNEL | __GFP_ZERO,
> +					     get_order(ITS_CMD_QUEUE_SZ)));

Similarly here. We may want to handle it properly.
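
A minimal sketch of the kind of NULL check being suggested here,
assuming the page-based allocation is kept (the actual follow-up fix
may look different):

	struct page *page;
	void *base;

	/* Allocate from the ITS's local node, but fail gracefully */
	page = alloc_pages_node(its->numa_node, GFP_KERNEL | __GFP_ZERO,
				order);
	if (!page)
		return -ENOMEM;
	base = page_address(page);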

Cheers
Suzuki


* RE: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables
  2019-01-11  9:42   ` Suzuki K Poulose
@ 2019-01-11 10:34     ` Shameerali Kolothum Thodi
  -1 siblings, 0 replies; 12+ messages in thread
From: Shameerali Kolothum Thodi @ 2019-01-11 10:34 UTC (permalink / raw)
  To: Suzuki K Poulose, marc.zyngier, linux-kernel
  Cc: shankerd, ganapatrao.kulkarni, Robert.Richter,
	Guohanjun (Hanjun Guo),
	John Garry, linux-arm-kernel, Linuxarm

Hi Suzuki,

> -----Original Message-----
> From: Suzuki K Poulose [mailto:suzuki.poulose@arm.com]
> Sent: 11 January 2019 09:42
> To: Shameerali Kolothum Thodi <shameerali.kolothum.thodi@huawei.com>;
> marc.zyngier@arm.com; linux-kernel@vger.kernel.org
> Cc: shankerd@codeaurora.org; ganapatrao.kulkarni@cavium.com;
> Robert.Richter@cavium.com; Guohanjun (Hanjun Guo)
> <guohanjun@huawei.com>; John Garry <john.garry@huawei.com>;
> linux-arm-kernel@lists.infradead.org; Linuxarm <linuxarm@huawei.com>
> Subject: Re: [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation
> for ITS tables
> 

[...]

> >   drivers/irqchip/irq-gic-v3-its.c | 20 ++++++++++++--------
> >   1 file changed, 12 insertions(+), 8 deletions(-)
> >
> > diff --git a/drivers/irqchip/irq-gic-v3-its.c b/drivers/irqchip/irq-gic-v3-its.c
> > index db20e99..ab01061 100644
> > --- a/drivers/irqchip/irq-gic-v3-its.c
> > +++ b/drivers/irqchip/irq-gic-v3-its.c
> > @@ -1749,7 +1749,8 @@ static int its_setup_baser(struct its_node *its,
> struct its_baser *baser,
> >   		order = get_order(GITS_BASER_PAGES_MAX * psz);
> >   	}
> >
> > -	base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO, order);
> > +	base = (void *)page_address(alloc_pages_node(its->numa_node,
> > +				    GFP_KERNEL | __GFP_ZERO, order));
> 
> If alloc_pages_node() fails, the page_address() could crash the system.
> 
> > -	its->cmd_base = (void *)__get_free_pages(GFP_KERNEL | __GFP_ZERO,
> > -						get_order(ITS_CMD_QUEUE_SZ));
> > +	its->cmd_base = (void
> *)page_address(alloc_pages_node(its->numa_node,
> > +					     GFP_KERNEL | __GFP_ZERO,
> > +					     get_order(ITS_CMD_QUEUE_SZ)));
> 
> Similarly here. We may want to handle it properly.

Ah..good catch. I will change it and rebase on top of 5.0-rc1 as suggested by Marc.

Thanks,
Shameer


Thread overview: 6 messages

2018-12-13 10:59 [PATCH v3] irqchip: gicv3-its: Use NUMA aware memory allocation for ITS tables Shameer Kolothum
2018-12-13 11:54 ` Marc Zyngier
2019-01-11  3:53   ` Ganapatrao Kulkarni
2019-01-11  9:01     ` Marc Zyngier
2019-01-11  9:42 ` Suzuki K Poulose
2019-01-11 10:34   ` Shameerali Kolothum Thodi