All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] Performance optimization of ACL build process
@ 2016-08-16 14:01 Vladyslav Buslov
  2016-08-16 14:01 ` [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov
  2016-08-31 15:24 ` [PATCH] Performance optimization of ACL build process Stephen Hemminger
  0 siblings, 2 replies; 7+ messages in thread
From: Vladyslav Buslov @ 2016-08-16 14:01 UTC (permalink / raw)
  To: konstantin.ananyev; +Cc: dev

Hello,

In our application we need to be able to allocate tens of thousands of ACLs at runtime.
Testing revealed significant performance problems. We were able to track them to memset in calloc function which caused multiple page walks per invocation.
Modifying tb_mem to use huge page memory resulted ~2x performance gain for that operation.

Regards,
Vladyslav

Vladyslav Buslov (1):
  acl: use rte_calloc for temporary memory allocation

 lib/librte_acl/tb_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

-- 
2.8.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [PATCH] acl: use rte_calloc for temporary memory allocation
  2016-08-16 14:01 [PATCH] Performance optimization of ACL build process Vladyslav Buslov
@ 2016-08-16 14:01 ` Vladyslav Buslov
  2016-08-31  1:27   ` Ananyev, Konstantin
  2016-08-31 15:24 ` [PATCH] Performance optimization of ACL build process Stephen Hemminger
  1 sibling, 1 reply; 7+ messages in thread
From: Vladyslav Buslov @ 2016-08-16 14:01 UTC (permalink / raw)
  To: konstantin.ananyev; +Cc: dev

Acl build process uses significant amount of memory
which degrades performance by causing page walks when memory
is allocated on regular heap using libc calloc.

This commit changes tb_mem to allocate temporary memory on huge pages
with rte_calloc.

Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
---
 lib/librte_acl/tb_mem.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c
index 157e608..c373673 100644
--- a/lib/librte_acl/tb_mem.c
+++ b/lib/librte_acl/tb_mem.c
@@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz)
 	size_t size;
 
 	size = sz + pool->alignment - 1;
-	block = calloc(1, size + sizeof(*pool->block));
+	block = rte_calloc("ACL_TBMEM_BLOCK", 1, size + sizeof(*pool->block), 0);
 	if (block == NULL) {
 		RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated "
 			"by pool: %zu bytes\n", __func__, sz, pool->alloc);
-- 
2.8.3

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] acl: use rte_calloc for temporary memory allocation
  2016-08-16 14:01 ` [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov
@ 2016-08-31  1:27   ` Ananyev, Konstantin
  2016-08-31  8:38     ` Vladyslav Buslov
  0 siblings, 1 reply; 7+ messages in thread
From: Ananyev, Konstantin @ 2016-08-31  1:27 UTC (permalink / raw)
  To: Vladyslav Buslov; +Cc: dev

Hi Vladyslav,

> -----Original Message-----
> From: Vladyslav Buslov [mailto:vladyslav.buslov@harmonicinc.com]
> Sent: Tuesday, August 16, 2016 3:01 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org
> Subject: [PATCH] acl: use rte_calloc for temporary memory allocation
> 
> Acl build process uses significant amount of memory which degrades performance by causing page walks when memory is allocated on
> regular heap using libc calloc.
> 
> This commit changes tb_mem to allocate temporary memory on huge pages with rte_calloc.

We deliberately used standard system memory allocation routines (calloc/free) here.
With current design build phase was never considered to be an 'RT' phase operation.
It is pretty cpu and memory expensive.
So if we'll use RTE memory for build phase it could easily consume all (or most)
of it, and might cause DPDK process failure or degradation.
If you really feel that you (and other users) would benefit from using
rte_calloc here (I personally still in doubt), then at least it should be a new
field inside rte_acl_config, that would allow user to control that behavior.
Though, as I said above, librte_acl was never designed to ' to allocate tens of thousands of ACLs at runtime'.
To add ability to add/delete rules at runtime while keeping lookup time reasonably low
some new approach need to be introduced.  
Konstantin

> 
> Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
> ---
>  lib/librte_acl/tb_mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c index 157e608..c373673 100644
> --- a/lib/librte_acl/tb_mem.c
> +++ b/lib/librte_acl/tb_mem.c
> @@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz)
>  	size_t size;
> 
>  	size = sz + pool->alignment - 1;
> -	block = calloc(1, size + sizeof(*pool->block));
> +	block = rte_calloc("ACL_TBMEM_BLOCK", 1, size + sizeof(*pool->block),
> +0);
>  	if (block == NULL) {
>  		RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated "
>  			"by pool: %zu bytes\n", __func__, sz, pool->alloc);
> --
> 2.8.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] acl: use rte_calloc for temporary memory allocation
  2016-08-31  1:27   ` Ananyev, Konstantin
@ 2016-08-31  8:38     ` Vladyslav Buslov
  2016-08-31  9:33       ` Thomas Monjalon
  2016-08-31  9:59       ` Ananyev, Konstantin
  0 siblings, 2 replies; 7+ messages in thread
From: Vladyslav Buslov @ 2016-08-31  8:38 UTC (permalink / raw)
  To: Ananyev, Konstantin; +Cc: dev

Hi Konstantin,

Thanks for your feedback.

Would you accept this change as config file compile-time parameter with libc calloc as default?
It is one line change only so it is easy to ifdef.

Regards,
Vlad

-----Original Message-----
From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com] 
Sent: Wednesday, August 31, 2016 4:28 AM
To: Vladyslav Buslov
Cc: dev@dpdk.org
Subject: RE: [PATCH] acl: use rte_calloc for temporary memory allocation

Hi Vladyslav,

> -----Original Message-----
> From: Vladyslav Buslov [mailto:vladyslav.buslov@harmonicinc.com]
> Sent: Tuesday, August 16, 2016 3:01 PM
> To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> Cc: dev@dpdk.org
> Subject: [PATCH] acl: use rte_calloc for temporary memory allocation
> 
> Acl build process uses significant amount of memory which degrades 
> performance by causing page walks when memory is allocated on regular heap using libc calloc.
> 
> This commit changes tb_mem to allocate temporary memory on huge pages with rte_calloc.

We deliberately used standard system memory allocation routines (calloc/free) here.
With current design build phase was never considered to be an 'RT' phase operation.
It is pretty cpu and memory expensive.
So if we'll use RTE memory for build phase it could easily consume all (or most) of it, and might cause DPDK process failure or degradation.
If you really feel that you (and other users) would benefit from using rte_calloc here (I personally still in doubt), then at least it should be a new field inside rte_acl_config, that would allow user to control that behavior.
Though, as I said above, librte_acl was never designed to ' to allocate tens of thousands of ACLs at runtime'.
To add ability to add/delete rules at runtime while keeping lookup time reasonably low some new approach need to be introduced.  
Konstantin

> 
> Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
> ---
>  lib/librte_acl/tb_mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c index 
> 157e608..c373673 100644
> --- a/lib/librte_acl/tb_mem.c
> +++ b/lib/librte_acl/tb_mem.c
> @@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz)
>  	size_t size;
> 
>  	size = sz + pool->alignment - 1;
> -	block = calloc(1, size + sizeof(*pool->block));
> +	block = rte_calloc("ACL_TBMEM_BLOCK", 1, size + 
> +sizeof(*pool->block), 0);
>  	if (block == NULL) {
>  		RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated "
>  			"by pool: %zu bytes\n", __func__, sz, pool->alloc);
> --
> 2.8.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] acl: use rte_calloc for temporary memory allocation
  2016-08-31  8:38     ` Vladyslav Buslov
@ 2016-08-31  9:33       ` Thomas Monjalon
  2016-08-31  9:59       ` Ananyev, Konstantin
  1 sibling, 0 replies; 7+ messages in thread
From: Thomas Monjalon @ 2016-08-31  9:33 UTC (permalink / raw)
  To: Vladyslav Buslov; +Cc: dev, Ananyev, Konstantin

2016-08-31 08:38, Vladyslav Buslov:
> Would you accept this change as config file compile-time parameter with libc calloc as default?
> It is one line change only so it is easy to ifdef.

The configuration should not be compile-time.
Please think about a runtime configuration via an API.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] acl: use rte_calloc for temporary memory allocation
  2016-08-31  8:38     ` Vladyslav Buslov
  2016-08-31  9:33       ` Thomas Monjalon
@ 2016-08-31  9:59       ` Ananyev, Konstantin
  1 sibling, 0 replies; 7+ messages in thread
From: Ananyev, Konstantin @ 2016-08-31  9:59 UTC (permalink / raw)
  To: Vladyslav Buslov; +Cc: dev

Hi Vlad, 

> 
> Hi Konstantin,
> 
> Thanks for your feedback.
> 
> Would you accept this change as config file compile-time parameter with libc calloc as default?
> It is one line change only so it is easy to ifdef.

It is an option, but the main requirements from the community
is to minimize number of build time config options, and instead provide
ability to the user to configure things at runtime.
That's why I thought about something like:

+ /* use EAL hugepages for temporary memory allocations,
+  * might improve build time, build increases hugepages
+  * demand significantly.
+  */
#define	RTE_ACL_CFG_FLAG_HMEM	1

struct rte_acl_config {
	uint32_t num_categories; /**< Number of categories to build with. */
	uint32_t num_fields;     /**< Number of field definitions. */
	struct rte_acl_field_def defs[RTE_ACL_MAX_FIELDS];
	/**< array of field definitions. */
	size_t max_size;
	/**< max memory limit for internal run-time structures. */
+	uint64_t flags;
}; 

And then change tb_pool() to use either calloc() or rte_alloc() based on the flags value.
Another, probably even better and more flexible way is to allow user to specify his own alloc routine:

struct rte_acl_config {
	uint32_t num_categories; /**< Number of categories to build with. */
	uint32_t num_fields;     /**< Number of field definitions. */
	struct rte_acl_field_def defs[RTE_ACL_MAX_FIELDS];
	/**< array of field definitions. */
	size_t max_size;
	/**< max memory limit for internal run-time structures. */
+	void *(tballoc)( (size_t, size_t);
+	void (*tbfree)(void *);
};

In that case user can provide his own alloc/free based on rte_calloc/rte_free
or even on something else.
The only drawback I see with that approaches, that in that case, you'll need to follow
ABI compliance rules, which probably means that your change might not make 16.11.

Konstantin 

> 
> Regards,
> Vlad
> 
> -----Original Message-----
> From: Ananyev, Konstantin [mailto:konstantin.ananyev@intel.com]
> Sent: Wednesday, August 31, 2016 4:28 AM
> To: Vladyslav Buslov
> Cc: dev@dpdk.org
> Subject: RE: [PATCH] acl: use rte_calloc for temporary memory allocation
> 
> Hi Vladyslav,
> 
> > -----Original Message-----
> > From: Vladyslav Buslov [mailto:vladyslav.buslov@harmonicinc.com]
> > Sent: Tuesday, August 16, 2016 3:01 PM
> > To: Ananyev, Konstantin <konstantin.ananyev@intel.com>
> > Cc: dev@dpdk.org
> > Subject: [PATCH] acl: use rte_calloc for temporary memory allocation
> >
> > Acl build process uses significant amount of memory which degrades
> > performance by causing page walks when memory is allocated on regular heap using libc calloc.
> >
> > This commit changes tb_mem to allocate temporary memory on huge pages with rte_calloc.
> 
> We deliberately used standard system memory allocation routines (calloc/free) here.
> With current design build phase was never considered to be an 'RT' phase operation.
> It is pretty cpu and memory expensive.
> So if we'll use RTE memory for build phase it could easily consume all (or most) of it, and might cause DPDK process failure or degradation.
> If you really feel that you (and other users) would benefit from using rte_calloc here (I personally still in doubt), then at least it should be a
> new field inside rte_acl_config, that would allow user to control that behavior.
> Though, as I said above, librte_acl was never designed to ' to allocate tens of thousands of ACLs at runtime'.
> To add ability to add/delete rules at runtime while keeping lookup time reasonably low some new approach need to be introduced.
> Konstantin
> 
> >
> > Signed-off-by: Vladyslav Buslov <vladyslav.buslov@harmonicinc.com>
> > ---
> >  lib/librte_acl/tb_mem.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/lib/librte_acl/tb_mem.c b/lib/librte_acl/tb_mem.c index
> > 157e608..c373673 100644
> > --- a/lib/librte_acl/tb_mem.c
> > +++ b/lib/librte_acl/tb_mem.c
> > @@ -52,7 +52,7 @@ tb_pool(struct tb_mem_pool *pool, size_t sz)
> >  	size_t size;
> >
> >  	size = sz + pool->alignment - 1;
> > -	block = calloc(1, size + sizeof(*pool->block));
> > +	block = rte_calloc("ACL_TBMEM_BLOCK", 1, size +
> > +sizeof(*pool->block), 0);
> >  	if (block == NULL) {
> >  		RTE_LOG(ERR, MALLOC, "%s(%zu)\n failed, currently allocated "
> >  			"by pool: %zu bytes\n", __func__, sz, pool->alloc);
> > --
> > 2.8.3

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Performance optimization of ACL build process
  2016-08-16 14:01 [PATCH] Performance optimization of ACL build process Vladyslav Buslov
  2016-08-16 14:01 ` [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov
@ 2016-08-31 15:24 ` Stephen Hemminger
  1 sibling, 0 replies; 7+ messages in thread
From: Stephen Hemminger @ 2016-08-31 15:24 UTC (permalink / raw)
  To: Vladyslav Buslov; +Cc: konstantin.ananyev, dev

On Tue, 16 Aug 2016 17:01:27 +0300
Vladyslav Buslov <vladyslav.buslov@harmonicinc.com> wrote:

> Hello,
> 
> In our application we need to be able to allocate tens of thousands of ACLs at runtime.
> Testing revealed significant performance problems. We were able to track them to memset in calloc function which caused multiple page walks per invocation.
> Modifying tb_mem to use huge page memory resulted ~2x performance gain for that operation.
> 
> Regards,
> Vladyslav
> 
> Vladyslav Buslov (1):
>   acl: use rte_calloc for temporary memory allocation
> 
>  lib/librte_acl/tb_mem.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 

What about making sure that the ACL code does it's own initialization of all fields
and just use malloc rather than calloc?

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2016-08-31 15:24 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-16 14:01 [PATCH] Performance optimization of ACL build process Vladyslav Buslov
2016-08-16 14:01 ` [PATCH] acl: use rte_calloc for temporary memory allocation Vladyslav Buslov
2016-08-31  1:27   ` Ananyev, Konstantin
2016-08-31  8:38     ` Vladyslav Buslov
2016-08-31  9:33       ` Thomas Monjalon
2016-08-31  9:59       ` Ananyev, Konstantin
2016-08-31 15:24 ` [PATCH] Performance optimization of ACL build process Stephen Hemminger

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.