* long initialization of rte_eal_hugepage_init
@ 2017-09-06  3:24 王志克
  2017-09-06  4:24 ` Stephen Hemminger
                   ` (2 more replies)
  0 siblings, 3 replies; 10+ messages in thread
From: 王志克 @ 2017-09-06  3:24 UTC (permalink / raw)
  To: users, dev

Hi All,

I observed that rte_eal_hugepage_init() takes quite a long time if there are lots of huge pages. For example, I have 500 1G huge pages, and it takes about 2 minutes. That is too long, especially for the application restart case.

If the application only needs a limited number of huge pages while the host has lots of them, the algorithm is not very efficient. For example, we only need 1G of memory from each socket.
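
For reference, roughly the setup described above (commands are illustrative; the 1G pages are reserved at boot):

    # kernel cmdline: default_hugepagesz=1G hugepagesz=1G hugepages=500
    cat /sys/kernel/mm/hugepages/hugepages-1048576kB/nr_hugepages   # -> 500
    mount -t hugetlbfs -o pagesize=1G none /mnt/huge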

What is the proposal from the DPDK community? Any solution?

Note that I tried DPDK version 16.11.

Br,
Wang Zhike

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  3:24 long initialization of rte_eal_hugepage_init 王志克
@ 2017-09-06  4:24 ` Stephen Hemminger
  2017-09-06  6:45   ` 王志克
  2017-09-06  4:35 ` Pavan Nikhilesh Bhagavatula
  2017-09-06  4:36 ` Tan, Jianfeng
  2 siblings, 1 reply; 10+ messages in thread
From: Stephen Hemminger @ 2017-09-06  4:24 UTC (permalink / raw)
  To: 王志克; +Cc: dev, users

Linux zeros huge pages by default. There was a fix for this in later releases.

On Sep 5, 2017 8:24 PM, "王志克" <wangzhike@jd.com> wrote:

> Hi All,
>
> I observed that rte_eal_hugepage_init() takes quite a long time if there
> are lots of huge pages. For example, I have 500 1G huge pages, and it
> takes about 2 minutes. That is too long, especially for the application
> restart case.
>
> If the application only needs a limited number of huge pages while the
> host has lots of them, the algorithm is not very efficient. For example,
> we only need 1G of memory from each socket.
>
> What is the proposal from the DPDK community? Any solution?
>
> Note that I tried DPDK version 16.11.
>
> Br,
> Wang Zhike
>

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  3:24 long initialization of rte_eal_hugepage_init 王志克
  2017-09-06  4:24 ` Stephen Hemminger
@ 2017-09-06  4:35 ` Pavan Nikhilesh Bhagavatula
  2017-09-06  7:37   ` Sergio Gonzalez Monroy
  2017-09-06  4:36 ` Tan, Jianfeng
  2 siblings, 1 reply; 10+ messages in thread
From: Pavan Nikhilesh Bhagavatula @ 2017-09-06  4:35 UTC (permalink / raw)
  To: 王志克; +Cc: dev

On Wed, Sep 06, 2017 at 03:24:52AM +0000, 王志克 wrote:
> Hi All,
>
> I observed that rte_eal_hugepage_init() takes quite a long time if there are lots of huge pages. For example, I have 500 1G huge pages, and it takes about 2 minutes. That is too long, especially for the application restart case.
>
> If the application only needs a limited number of huge pages while the host has lots of them, the algorithm is not very efficient. For example, we only need 1G of memory from each socket.
>

There is an EAL option, --socket-mem, which can be used to limit the memory
acquired from each socket.
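
For example, on a two-socket host (values are per-socket megabytes; the application name and core list are illustrative):

    # request 1G from socket 0 and 1G from socket 1 at EAL init
    ./your_dpdk_app -l 0-3 -n 4 --socket-mem 1024,1024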

> What is the proposal from the DPDK community? Any solution?
>
> Note that I tried DPDK version 16.11.
>
> Br,
> Wang Zhike

-Pavan

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  3:24 long initialization of rte_eal_hugepage_init 王志克
  2017-09-06  4:24 ` Stephen Hemminger
  2017-09-06  4:35 ` Pavan Nikhilesh Bhagavatula
@ 2017-09-06  4:36 ` Tan, Jianfeng
  2017-09-06  6:02   ` 王志克
  2 siblings, 1 reply; 10+ messages in thread
From: Tan, Jianfeng @ 2017-09-06  4:36 UTC (permalink / raw)
  To: wangzhike, users, dev



> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of 王志克
> Sent: Wednesday, September 6, 2017 11:25 AM
> To: users@dpdk.org; dev@dpdk.org
> Subject: [dpdk-users] long initialization of rte_eal_hugepage_init
> 
> Hi All,
> 
> I observed that rte_eal_hugepage_init() takes quite a long time if there
> are lots of huge pages. For example, I have 500 1G huge pages, and it
> takes about 2 minutes. That is too long, especially for the application
> restart case.
> 
> If the application only needs a limited number of huge pages while the
> host has lots of them, the algorithm is not very efficient. For example,
> we only need 1G of memory from each socket.
> 
> What is the proposal from the DPDK community? Any solution?

You can mount hugetlbfs with the "size" option and use the "--socket-mem" option in DPDK to restrict the memory used.
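
An untested sketch (mount point and sizes are illustrative):

    # dedicated hugetlbfs mount capped at 2G of 1G pages
    mkdir -p /mnt/huge_dpdk
    mount -t hugetlbfs -o pagesize=1G,size=2G none /mnt/huge_dpdk
    # then limit EAL's per-socket allocation as well
    ./your_dpdk_app --socket-mem 1024,1024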

Thanks,
Jianfeng

> 
> Note that I tried DPDK version 16.11.
> 
> Br,
> Wang Zhike

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  4:36 ` Tan, Jianfeng
@ 2017-09-06  6:02   ` 王志克
  2017-09-06  7:17     ` Tan, Jianfeng
  0 siblings, 1 reply; 10+ messages in thread
From: 王志克 @ 2017-09-06  6:02 UTC (permalink / raw)
  To: Tan, Jianfeng, users, dev

Do you mean "pagesize" when you say the "size" option? I have specified the pagesize as 1G.
Also, I already use "--socket-mem" to specify that the application only needs 1G per NUMA node.

The problem is that map_all_hugepages() maps all free huge pages and then selects the proper ones. If I have 500 free huge pages (1G each) and the application only needs 1G per NUMA socket, such a mapping is unreasonable.

My use case is OVS+DPDK. OVS+DPDK itself only needs 2G, and other applications (QEMU/VMs) use the remaining huge pages.

Br,
Wang Zhike


-----Original Message-----
From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com] 
Sent: Wednesday, September 06, 2017 12:36 PM
To: 王志克; users@dpdk.org; dev@dpdk.org
Subject: RE: long initialization of rte_eal_hugepage_init



> -----Original Message-----
> From: users [mailto:users-bounces@dpdk.org] On Behalf Of 王志克
> Sent: Wednesday, September 6, 2017 11:25 AM
> To: users@dpdk.org; dev@dpdk.org
> Subject: [dpdk-users] long initialization of rte_eal_hugepage_init
> 
> Hi All,
> 
> I observed that rte_eal_hugepage_init() takes quite a long time if there
> are lots of huge pages. For example, I have 500 1G huge pages, and it
> takes about 2 minutes. That is too long, especially for the application
> restart case.
> 
> If the application only needs a limited number of huge pages while the
> host has lots of them, the algorithm is not very efficient. For example,
> we only need 1G of memory from each socket.
> 
> What is the proposal from the DPDK community? Any solution?

You can mount hugetlbfs with the "size" option and use the "--socket-mem" option in DPDK to restrict the memory used.

Thanks,
Jianfeng

> 
> Note that I tried DPDK version 16.11.
> 
> Br,
> Wang Zhike

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  4:24 ` Stephen Hemminger
@ 2017-09-06  6:45   ` 王志克
  0 siblings, 0 replies; 10+ messages in thread
From: 王志克 @ 2017-09-06  6:45 UTC (permalink / raw)
  To: Stephen Hemminger, zhihong.wang; +Cc: dev, users

Hi Stephen,

Do you mean that "disabling huge page zeroing" would improve the performance? How can the memory then be guaranteed to be allocated? Would it introduce functional issues?

I checked the commit below, and I guess it at least implies that zeroing the huge pages is needed.

commit 5ce3ace1de458e2ded1b408acfe59c15cf9863f1
Author: Zhihong Wang <zhihong.wang@intel.com>
Date:   Sun Nov 22 14:13:35 2015 -0500

    eal: remove unnecessary hugepage zero-filling

    The kernel fills new allocated (huge) pages with zeros.
    DPDK just has to populate page tables to trigger the allocation.

    Signed-off-by: Zhihong Wang <zhihong.wang@intel.com>
    Acked-by: Stephen Hemminger <stephen@networkplumber.org>

From: Stephen Hemminger [mailto:stephen@networkplumber.org]
Sent: Wednesday, September 06, 2017 12:24 PM
To: 王志克
Cc: dev@dpdk.org; users@dpdk.org
Subject: Re: [dpdk-dev] long initialization of rte_eal_hugepage_init

Linux zeros huge pages by default. There was a fix for this in later releases.

On Sep 5, 2017 8:24 PM, "王志克" <wangzhike@jd.com> wrote:
Hi All,

I observed that rte_eal_hugepage_init() takes quite a long time if there are lots of huge pages. For example, I have 500 1G huge pages, and it takes about 2 minutes. That is too long, especially for the application restart case.

If the application only needs a limited number of huge pages while the host has lots of them, the algorithm is not very efficient. For example, we only need 1G of memory from each socket.

What is the proposal from the DPDK community? Any solution?

Note that I tried DPDK version 16.11.

Br,
Wang Zhike

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  6:02   ` 王志克
@ 2017-09-06  7:17     ` Tan, Jianfeng
  2017-09-06  8:58       ` 王志克
  0 siblings, 1 reply; 10+ messages in thread
From: Tan, Jianfeng @ 2017-09-06  7:17 UTC (permalink / raw)
  To: wangzhike, users, dev



> -----Original Message-----
> From: 王志克 [mailto:wangzhike@jd.com]
> Sent: Wednesday, September 6, 2017 2:03 PM
> To: Tan, Jianfeng; users@dpdk.org; dev@dpdk.org
> Subject: RE: long initialization of rte_eal_hugepage_init
> 
> Do you mean "pagesize" when you say the "size" option? I have specified
> the pagesize as 1G.

No, I mean "size": add another hugetlbfs mount whose total size equals what your app needs. Together with the DPDK option "--huge-dir", we can avoid mapping all free hugepages.

If you want to allocate memory on different sockets, e.g., --socket-mem 1024,1024, you need a newer DPDK with the commit below by Ilya Maximets:
commit 1b72605d241 ("mem: balanced allocation of hugepages").
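
An untested sketch of the whole recipe (paths and sizes are illustrative):

    # capped mount, so only 2 x 1G pages are visible to the app
    mkdir -p /mnt/huge_ovs
    mount -t hugetlbfs -o pagesize=1G,size=2G none /mnt/huge_ovs
    # point EAL at the capped mount and split the memory across sockets
    ./your_dpdk_app --huge-dir /mnt/huge_ovs --socket-mem 1024,1024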

Thanks,
Jianfeng

> Also, I already use "--socket-mem" to specify that the application only
> needs 1G per NUMA node.
> 
> The problem is that map_all_hugepages() maps all free huge pages and then
> selects the proper ones. If I have 500 free huge pages (1G each) and the
> application only needs 1G per NUMA socket, such a mapping is unreasonable.
> 
> My use case is OVS+DPDK. OVS+DPDK itself only needs 2G, and other
> applications (QEMU/VMs) use the remaining huge pages.
> 
> Br,
> Wang Zhike
> 
> 
> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: Wednesday, September 06, 2017 12:36 PM
> To: 王志克; users@dpdk.org; dev@dpdk.org
> Subject: RE: long initialization of rte_eal_hugepage_init
> 
> 
> 
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of 王志克
> > Sent: Wednesday, September 6, 2017 11:25 AM
> > To: users@dpdk.org; dev@dpdk.org
> > Subject: [dpdk-users] long initialization of rte_eal_hugepage_init
> >
> > Hi All,
> >
> > I observed that rte_eal_hugepage_init() takes quite a long time if there
> > are lots of huge pages. For example, I have 500 1G huge pages, and it
> > takes about 2 minutes. That is too long, especially for the application
> > restart case.
> >
> > If the application only needs a limited number of huge pages while the
> > host has lots of them, the algorithm is not very efficient. For example,
> > we only need 1G of memory from each socket.
> >
> > What is the proposal from the DPDK community? Any solution?
> 
> You can mount hugetlbfs with the "size" option and use the "--socket-mem"
> option in DPDK to restrict the memory used.
> 
> Thanks,
> Jianfeng
> 
> >
> > Note that I tried DPDK version 16.11.
> >
> > Br,
> > Wang Zhike

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  4:35 ` Pavan Nikhilesh Bhagavatula
@ 2017-09-06  7:37   ` Sergio Gonzalez Monroy
  2017-09-06  8:59     ` 王志克
  0 siblings, 1 reply; 10+ messages in thread
From: Sergio Gonzalez Monroy @ 2017-09-06  7:37 UTC (permalink / raw)
  To: Pavan Nikhilesh Bhagavatula, 王志克; +Cc: dev

On 06/09/2017 05:35, Pavan Nikhilesh Bhagavatula wrote:
> On Wed, Sep 06, 2017 at 03:24:52AM +0000, 王志克 wrote:
>> Hi All,
>>
>> I observed that rte_eal_hugepage_init() takes quite a long time if there are lots of huge pages. For example, I have 500 1G huge pages, and it takes about 2 minutes. That is too long, especially for the application restart case.
>>
>> If the application only needs a limited number of huge pages while the host has lots of them, the algorithm is not very efficient. For example, we only need 1G of memory from each socket.
>>
> There is an EAL option, --socket-mem, which can be used to limit the memory
> acquired from each socket.
>
>> What is the proposal from the DPDK community? Any solution?
>>
>> Note that I tried DPDK version 16.11.
>>
>> Br,
>> Wang Zhike
> -Pavan

Since DPDK 17.08 we use libnuma to first get the number of pages we
need from each socket, then as many more as we can.
So you can set up your huge page mount point or cgroups to limit the
number of pages the process can get.

So basically:
1. set up a mount quota or cgroup limit
2. use the --socket-mem option to limit the amount per socket
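
For example, with the hugetlb cgroup (cgroup v1 layout assumed; names and sizes are illustrative):

    # cap the group at two 1G huge pages
    mkdir /sys/fs/cgroup/hugetlb/dpdk
    echo 2G > /sys/fs/cgroup/hugetlb/dpdk/hugetlb.1GB.limit_in_bytes
    # run the app inside the group with a per-socket limit
    echo $$ > /sys/fs/cgroup/hugetlb/dpdk/tasks
    ./your_dpdk_app --socket-mem 1024,1024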

Note that pre-17.08 we did not have libnuma support, so chances are that
if you have a low quota/limit and need memory from both sockets, the
allocation request would likely fail.

Thanks,
Sergio

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  7:17     ` Tan, Jianfeng
@ 2017-09-06  8:58       ` 王志克
  0 siblings, 0 replies; 10+ messages in thread
From: 王志克 @ 2017-09-06  8:58 UTC (permalink / raw)
  To: Tan, Jianfeng, users, dev

Thanks Jianfeng for your suggestion. I get the point.

Br,
Wang Zhike

-----Original Message-----
From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com] 
Sent: Wednesday, September 06, 2017 3:18 PM
To: 王志克; users@dpdk.org; dev@dpdk.org
Subject: RE: long initialization of rte_eal_hugepage_init



> -----Original Message-----
> From: 王志克 [mailto:wangzhike@jd.com]
> Sent: Wednesday, September 6, 2017 2:03 PM
> To: Tan, Jianfeng; users@dpdk.org; dev@dpdk.org
> Subject: RE: long initialization of rte_eal_hugepage_init
> 
> Do you mean "pagesize" when you say the "size" option? I have specified
> the pagesize as 1G.

No, I mean "size": add another hugetlbfs mount whose total size equals what your app needs. Together with the DPDK option "--huge-dir", we can avoid mapping all free hugepages.

If you want to allocate memory on different sockets, e.g., --socket-mem 1024,1024, you need a newer DPDK with the commit below by Ilya Maximets:
commit 1b72605d241 ("mem: balanced allocation of hugepages").

Thanks,
Jianfeng

> Also, I already use "--socket-mem" to specify that the application only
> needs 1G per NUMA node.
> 
> The problem is that map_all_hugepages() maps all free huge pages and then
> selects the proper ones. If I have 500 free huge pages (1G each) and the
> application only needs 1G per NUMA socket, such a mapping is unreasonable.
> 
> My use case is OVS+DPDK. OVS+DPDK itself only needs 2G, and other
> applications (QEMU/VMs) use the remaining huge pages.
> 
> Br,
> Wang Zhike
> 
> 
> -----Original Message-----
> From: Tan, Jianfeng [mailto:jianfeng.tan@intel.com]
> Sent: Wednesday, September 06, 2017 12:36 PM
> To: 王志克; users@dpdk.org; dev@dpdk.org
> Subject: RE: long initialization of rte_eal_hugepage_init
> 
> 
> 
> > -----Original Message-----
> > From: users [mailto:users-bounces@dpdk.org] On Behalf Of 王志克
> > Sent: Wednesday, September 6, 2017 11:25 AM
> > To: users@dpdk.org; dev@dpdk.org
> > Subject: [dpdk-users] long initialization of rte_eal_hugepage_init
> >
> > Hi All,
> >
> > I observed that rte_eal_hugepage_init() takes quite a long time if there
> > are lots of huge pages. For example, I have 500 1G huge pages, and it
> > takes about 2 minutes. That is too long, especially for the application
> > restart case.
> >
> > If the application only needs a limited number of huge pages while the
> > host has lots of them, the algorithm is not very efficient. For example,
> > we only need 1G of memory from each socket.
> >
> > What is the proposal from the DPDK community? Any solution?
> 
> You can mount hugetlbfs with the "size" option and use the "--socket-mem"
> option in DPDK to restrict the memory used.
> 
> Thanks,
> Jianfeng
> 
> >
> > Note that I tried DPDK version 16.11.
> >
> > Br,
> > Wang Zhike

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: long initialization of rte_eal_hugepage_init
  2017-09-06  7:37   ` Sergio Gonzalez Monroy
@ 2017-09-06  8:59     ` 王志克
  0 siblings, 0 replies; 10+ messages in thread
From: 王志克 @ 2017-09-06  8:59 UTC (permalink / raw)
  To: Sergio Gonzalez Monroy, Pavan Nikhilesh Bhagavatula; +Cc: dev

Thanks Sergio. It really helps.

Br,
Wang Zhike

-----Original Message-----
From: Sergio Gonzalez Monroy [mailto:sergio.gonzalez.monroy@intel.com] 
Sent: Wednesday, September 06, 2017 3:37 PM
To: Pavan Nikhilesh Bhagavatula; 王志克
Cc: dev@dpdk.org
Subject: Re: [dpdk-dev] long initialization of rte_eal_hugepage_init

On 06/09/2017 05:35, Pavan Nikhilesh Bhagavatula wrote:
> On Wed, Sep 06, 2017 at 03:24:52AM +0000, 王志克 wrote:
>> Hi All,
>>
>> I observed that rte_eal_hugepage_init() takes quite a long time if there are lots of huge pages. For example, I have 500 1G huge pages, and it takes about 2 minutes. That is too long, especially for the application restart case.
>>
>> If the application only needs a limited number of huge pages while the host has lots of them, the algorithm is not very efficient. For example, we only need 1G of memory from each socket.
>>
> There is an EAL option, --socket-mem, which can be used to limit the memory
> acquired from each socket.
>
>> What is the proposal from the DPDK community? Any solution?
>>
>> Note that I tried DPDK version 16.11.
>>
>> Br,
>> Wang Zhike
> -Pavan

Since DPDK 17.08 we use libnuma to first get the number of pages we
need from each socket, then as many more as we can.
So you can set up your huge page mount point or cgroups to limit the
number of pages the process can get.

So basically:
1. set up a mount quota or cgroup limit
2. use the --socket-mem option to limit the amount per socket

Note that pre-17.08 we did not have libnuma support, so chances are that
if you have a low quota/limit and need memory from both sockets, the
allocation request would likely fail.

Thanks,
Sergio

^ permalink raw reply	[flat|nested] 10+ messages in thread
