netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
@ 2019-02-13  1:58 Eric Dumazet
  2019-02-13  2:04 ` David Ahern
                   ` (2 more replies)
  0 siblings, 3 replies; 17+ messages in thread
From: Eric Dumazet @ 2019-02-13  1:58 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger
  Cc: netdev, Eric Dumazet, Eric Dumazet, Hangbin Liu, Phil Sutter

In the past, we tried to increase the buffer size up to 32 KB in order
to reduce number of syscalls per dump.

Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
brought the size back to 4KB because the kernel can not know the application
is ready to receive bigger requests.

See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
for more details.

Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Hangbin Liu <liuhangbin@gmail.com>
Cc: Phil Sutter <phil@nwl.cc>
---
 lib/libnetlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/libnetlink.c b/lib/libnetlink.c
index 1892a02ab5d0d73776c9882ffc77edcd2c663d01..0d48a3d43cf03065dacbd419578ab10af56431a4 100644
--- a/lib/libnetlink.c
+++ b/lib/libnetlink.c
@@ -718,6 +718,8 @@ static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
 	if (len < 0)
 		return len;
 
+	if (len < 32768)
+		len = 32768;
 	buf = malloc(len);
 	if (!buf) {
 		fprintf(stderr, "malloc error: not enough buffer\n");
-- 
2.21.0.rc0.258.g878e2cd30e-goog


^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13  1:58 [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg() Eric Dumazet
@ 2019-02-13  2:04 ` David Ahern
  2019-02-13  2:08   ` Eric Dumazet
                     ` (2 more replies)
  2019-02-13 17:46 ` Phil Sutter
  2019-02-13 21:57 ` Stephen Hemminger
  2 siblings, 3 replies; 17+ messages in thread
From: David Ahern @ 2019-02-13  2:04 UTC (permalink / raw)
  To: Eric Dumazet, Stephen Hemminger
  Cc: netdev, Eric Dumazet, Hangbin Liu, Phil Sutter

On 2/12/19 6:58 PM, Eric Dumazet wrote:
> In the past, we tried to increase the buffer size up to 32 KB in order
> to reduce number of syscalls per dump.
> 
> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> brought the size back to 4KB because the kernel can not know the application
> is ready to receive bigger requests.
> 
> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> for more details.
> 
> Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Hangbin Liu <liuhangbin@gmail.com>
> Cc: Phil Sutter <phil@nwl.cc>
> ---
>  lib/libnetlink.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> index 1892a02ab5d0d73776c9882ffc77edcd2c663d01..0d48a3d43cf03065dacbd419578ab10af56431a4 100644
> --- a/lib/libnetlink.c
> +++ b/lib/libnetlink.c
> @@ -718,6 +718,8 @@ static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
>  	if (len < 0)
>  		return len;
>  
> +	if (len < 32768)
> +		len = 32768;
>  	buf = malloc(len);
>  	if (!buf) {
>  		fprintf(stderr, "malloc error: not enough buffer\n");
> 

I believe that negates the whole point of 2d34851cd341 - which I have no
problem with. 2 recvmsg calls per message is overkill.

Do we know of any single message sizes > 32k? 2d34851cd341 cites
increasing VF's but at some point there is a limit. If not, the whole
PEEK thing should go away and we just malloc 32k (or 64k) buffers for
each recvmsg.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13  2:04 ` David Ahern
@ 2019-02-13  2:08   ` Eric Dumazet
  2019-02-13  6:20     ` Hangbin Liu
  2019-02-14 13:49   ` Michal Kubecek
  2023-05-29 12:29   ` Gal Pressman
  2 siblings, 1 reply; 17+ messages in thread
From: Eric Dumazet @ 2019-02-13  2:08 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, Stephen Hemminger
  Cc: netdev, Hangbin Liu, Phil Sutter



On 02/12/2019 06:04 PM, David Ahern wrote:
> On 2/12/19 6:58 PM, Eric Dumazet wrote:
>> In the past, we tried to increase the buffer size up to 32 KB in order
>> to reduce number of syscalls per dump.
>>
>> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
>> brought the size back to 4KB because the kernel can not know the application
>> is ready to receive bigger requests.
>>
>> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
>> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
>> for more details.
>>
>> Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Hangbin Liu <liuhangbin@gmail.com>
>> Cc: Phil Sutter <phil@nwl.cc>
>> ---
>>  lib/libnetlink.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
>> index 1892a02ab5d0d73776c9882ffc77edcd2c663d01..0d48a3d43cf03065dacbd419578ab10af56431a4 100644
>> --- a/lib/libnetlink.c
>> +++ b/lib/libnetlink.c
>> @@ -718,6 +718,8 @@ static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
>>  	if (len < 0)
>>  		return len;
>>  
>> +	if (len < 32768)
>> +		len = 32768;
>>  	buf = malloc(len);
>>  	if (!buf) {
>>  		fprintf(stderr, "malloc error: not enough buffer\n");
>>
> 
> I believe that negates the whole point of 2d34851cd341 - which I have no
> problem with. 2 recvmsg calls per message is overkill.
> 

It does not negates the point at all.

The main point was to eventually be able to allocate more than 32KB.

We need to have a minimum size of 32KB so that the kernel can cook reasonably sized skbs

Because trying to allocate 4KB only in 2019 is kind of stupid...

( Especially considering ss currently buffers the whole thing before calling render() !!! )

> Do we know of any single message sizes > 32k? 2d34851cd341 cites
> increasing VF's but at some point there is a limit. If not, the whole
> PEEK thing should go away and we just malloc 32k (or 64k) buffers for
> each recvmsg.
> 


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13  2:08   ` Eric Dumazet
@ 2019-02-13  6:20     ` Hangbin Liu
  2019-02-14 13:51       ` Michal Kubecek
  0 siblings, 1 reply; 17+ messages in thread
From: Hangbin Liu @ 2019-02-13  6:20 UTC (permalink / raw)
  To: David Ahern; +Cc: Eric Dumazet, Stephen Hemminger, netdev, Phil Sutter

Hi David,
On Tue, Feb 12, 2019 at 06:08:13PM -0800, Eric Dumazet wrote:
> >> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> >> index 1892a02ab5d0d73776c9882ffc77edcd2c663d01..0d48a3d43cf03065dacbd419578ab10af56431a4 100644
> >> --- a/lib/libnetlink.c
> >> +++ b/lib/libnetlink.c
> >> @@ -718,6 +718,8 @@ static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
> >>  	if (len < 0)
> >>  		return len;
> >>  
> >> +	if (len < 32768)
> >> +		len = 32768;
> >>  	buf = malloc(len);
> >>  	if (!buf) {
> >>  		fprintf(stderr, "malloc error: not enough buffer\n");
> >>
> > 
> > I believe that negates the whole point of 2d34851cd341 - which I have no
> > problem with. 2 recvmsg calls per message is overkill.

It should not affects ip cmd too much. But for ss, as Eric pointed, it
cause performance issue.

> > 
> 
> It does not negates the point at all.
> 
> The main point was to eventually be able to allocate more than 32KB.
> 
> We need to have a minimum size of 32KB so that the kernel can cook reasonably sized skbs
> 
> Because trying to allocate 4KB only in 2019 is kind of stupid...
> 
> ( Especially considering ss currently buffers the whole thing before calling render() !!! )

This makes sense to me.
 
> > Do we know of any single message sizes > 32k? 2d34851cd341 cites
> > increasing VF's but at some point there is a limit. If not, the whole
> > PEEK thing should go away and we just malloc 32k (or 64k) buffers for
> > each recvmsg.

Apart from the 200 VFs example, I think there will be more and more virtual
interfaces be used in cloud environment, like openstack/OVS, so useing 32K
or 64K is still not safe.

What do you think.

Thanks
Hangbin

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13  1:58 [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg() Eric Dumazet
  2019-02-13  2:04 ` David Ahern
@ 2019-02-13 17:46 ` Phil Sutter
  2019-02-13 17:58   ` Eric Dumazet
  2019-02-13 21:57 ` Stephen Hemminger
  2 siblings, 1 reply; 17+ messages in thread
From: Phil Sutter @ 2019-02-13 17:46 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: David Ahern, Stephen Hemminger, netdev, Eric Dumazet, Hangbin Liu

Hi Eric,

On Tue, Feb 12, 2019 at 05:58:41PM -0800, Eric Dumazet wrote:
> In the past, we tried to increase the buffer size up to 32 KB in order
> to reduce number of syscalls per dump.
> 
> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> brought the size back to 4KB because the kernel can not know the application
> is ready to receive bigger requests.
> 
> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> for more details.

Wouldn't it be better if the kernel recognized MSG_TRUNC and allocated a
buffer large enough to hold the full message in that case? I have no
idea how hard that would be to implement, but calling recvmsg() with
MSG_TRUNC set and not getting the full message length in return is not
quite what one expects after reading recvmsg(2).

Cheers, Phil

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13 17:46 ` Phil Sutter
@ 2019-02-13 17:58   ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2019-02-13 17:58 UTC (permalink / raw)
  To: Phil Sutter, Eric Dumazet, David Ahern, Stephen Hemminger,
	netdev, Hangbin Liu



On 02/13/2019 09:46 AM, Phil Sutter wrote:
> Hi Eric,
> 
> On Tue, Feb 12, 2019 at 05:58:41PM -0800, Eric Dumazet wrote:
>> In the past, we tried to increase the buffer size up to 32 KB in order
>> to reduce number of syscalls per dump.
>>
>> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
>> brought the size back to 4KB because the kernel can not know the application
>> is ready to receive bigger requests.
>>
>> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
>> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
>> for more details.
> 
> Wouldn't it be better if the kernel recognized MSG_TRUNC and allocated a
> buffer large enough to hold the full message in that case? I have no
> idea how hard that would be to implement, but calling recvmsg() with
> MSG_TRUNC set and not getting the full message length in return is not
> quite what one expects after reading recvmsg(2).
> 

I am not sure what you are suggesting.

The buffer is already in the kernel, this is the skb in the receive queue.
Its size is unknown... We only can guess.

We can not change old kernels, and new iproute2/ss commands need to work with old kernels.

We can not change MSG_TRUNC semantic.

Adding another MSG_be_gentle_do_not_consume_skb_if_user_size_is_too_small wont solve the issue for old kernels.


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13  1:58 [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg() Eric Dumazet
  2019-02-13  2:04 ` David Ahern
  2019-02-13 17:46 ` Phil Sutter
@ 2019-02-13 21:57 ` Stephen Hemminger
  2019-02-13 21:59   ` Eric Dumazet
  2 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2019-02-13 21:57 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: David Ahern, netdev, Eric Dumazet, Hangbin Liu, Phil Sutter

On Tue, 12 Feb 2019 17:58:41 -0800
Eric Dumazet <edumazet@google.com> wrote:

> In the past, we tried to increase the buffer size up to 32 KB in order
> to reduce number of syscalls per dump.
> 
> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> brought the size back to 4KB because the kernel can not know the application
> is ready to receive bigger requests.
> 
> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> for more details.
> 
> Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Hangbin Liu <liuhangbin@gmail.com>
> Cc: Phil Sutter <phil@nwl.cc>

Applied, although maybe we should bump it to 64K or bigger?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13 21:57 ` Stephen Hemminger
@ 2019-02-13 21:59   ` Eric Dumazet
  0 siblings, 0 replies; 17+ messages in thread
From: Eric Dumazet @ 2019-02-13 21:59 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Ahern, netdev, Eric Dumazet, Hangbin Liu, Phil Sutter

On Wed, Feb 13, 2019 at 1:57 PM Stephen Hemminger
<stephen@networkplumber.org> wrote:
>
> On Tue, 12 Feb 2019 17:58:41 -0800
> Eric Dumazet <edumazet@google.com> wrote:
>
> > In the past, we tried to increase the buffer size up to 32 KB in order
> > to reduce number of syscalls per dump.
> >
> > Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> > brought the size back to 4KB because the kernel can not know the application
> > is ready to receive bigger requests.
> >
> > See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> > d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> > for more details.
> >
> > Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Hangbin Liu <liuhangbin@gmail.com>
> > Cc: Phil Sutter <phil@nwl.cc>
>
> Applied, although maybe we should bump it to 64K or bigger?

Note the kernel does not yet try 64KB allocations, so I do not see an
urgent need for that :)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13  2:04 ` David Ahern
  2019-02-13  2:08   ` Eric Dumazet
@ 2019-02-14 13:49   ` Michal Kubecek
  2019-02-14 17:34     ` David Ahern
  2023-05-29 12:29   ` Gal Pressman
  2 siblings, 1 reply; 17+ messages in thread
From: Michal Kubecek @ 2019-02-14 13:49 UTC (permalink / raw)
  To: netdev
  Cc: David Ahern, Eric Dumazet, Stephen Hemminger, Eric Dumazet,
	Hangbin Liu, Phil Sutter

On Tue, Feb 12, 2019 at 07:04:17PM -0700, David Ahern wrote:
> 
> Do we know of any single message sizes > 32k? 2d34851cd341 cites
> increasing VF's but at some point there is a limit. If not, the whole
> PEEK thing should go away and we just malloc 32k (or 64k) buffers for
> each recvmsg.

IFLA_VF_LIST is by far the biggest thing I have seen so far. I don't
remember exact numbers but the issue with 32KB buffer (for the whole
RTM_NELINK message) was encountered by some of our customers with NICs
having 120 or 128 VFs.

There is a bigger issue with IFLA_VFINFO_LIST, though, as it's an
attribute so that netlink limits its size to 64 KB. IIRC with current
size of IFLA_VF_INFO this would be reached with 270-280 VFs (I'm sure
the number was higer than 256 but not too much higher.)

This would mean unless we let something else grow too much, the whole
message shouldn't get much bigger than 64 KB. And if we can find some
other solution (e.g. passing VF information in separate messages if
client declares support), even 32 KB would be more than enough.

Michal Kubecek

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13  6:20     ` Hangbin Liu
@ 2019-02-14 13:51       ` Michal Kubecek
  0 siblings, 0 replies; 17+ messages in thread
From: Michal Kubecek @ 2019-02-14 13:51 UTC (permalink / raw)
  To: netdev
  Cc: Hangbin Liu, David Ahern, Eric Dumazet, Stephen Hemminger, Phil Sutter

On Wed, Feb 13, 2019 at 02:20:12PM +0800, Hangbin Liu wrote:
>  
> > > Do we know of any single message sizes > 32k? 2d34851cd341 cites
> > > increasing VF's but at some point there is a limit. If not, the whole
> > > PEEK thing should go away and we just malloc 32k (or 64k) buffers for
> > > each recvmsg.
> 
> Apart from the 200 VFs example, I think there will be more and more virtual
> interfaces be used in cloud environment, like openstack/OVS, so useing 32K
> or 64K is still not safe.

Many intefraces are not a problem. Those will have separate messages so
that they don't need to fit into one buffer.

Michal Kubecek

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-14 13:49   ` Michal Kubecek
@ 2019-02-14 17:34     ` David Ahern
  2019-02-14 17:47       ` Phil Sutter
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2019-02-14 17:34 UTC (permalink / raw)
  To: Michal Kubecek, netdev
  Cc: Eric Dumazet, Stephen Hemminger, Eric Dumazet, Hangbin Liu, Phil Sutter

On 2/14/19 6:49 AM, Michal Kubecek wrote:
> On Tue, Feb 12, 2019 at 07:04:17PM -0700, David Ahern wrote:
>>
>> Do we know of any single message sizes > 32k? 2d34851cd341 cites
>> increasing VF's but at some point there is a limit. If not, the whole
>> PEEK thing should go away and we just malloc 32k (or 64k) buffers for
>> each recvmsg.
> 
> IFLA_VF_LIST is by far the biggest thing I have seen so far. I don't
> remember exact numbers but the issue with 32KB buffer (for the whole
> RTM_NELINK message) was encountered by some of our customers with NICs
> having 120 or 128 VFs.
> 
> There is a bigger issue with IFLA_VFINFO_LIST, though, as it's an
> attribute so that netlink limits its size to 64 KB. IIRC with current
> size of IFLA_VF_INFO this would be reached with 270-280 VFs (I'm sure
> the number was higer than 256 but not too much higher.)
> 
> This would mean unless we let something else grow too much, the whole
> message shouldn't get much bigger than 64 KB. And if we can find some
> other solution (e.g. passing VF information in separate messages if
> client declares support), even 32 KB would be more than enough.

That's what I was asking, thanks. So 32kB today is sufficient, 64kB has
future buffer. So this whole PEEK and allocate the message size is
overkill. It could just as easily been bumped from 32kB to 64kB in the
original patch and been good for a while.

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-14 17:34     ` David Ahern
@ 2019-02-14 17:47       ` Phil Sutter
  0 siblings, 0 replies; 17+ messages in thread
From: Phil Sutter @ 2019-02-14 17:47 UTC (permalink / raw)
  To: David Ahern
  Cc: Michal Kubecek, netdev, Eric Dumazet, Stephen Hemminger,
	Eric Dumazet, Hangbin Liu

On Thu, Feb 14, 2019 at 10:34:06AM -0700, David Ahern wrote:
> On 2/14/19 6:49 AM, Michal Kubecek wrote:
> > On Tue, Feb 12, 2019 at 07:04:17PM -0700, David Ahern wrote:
> >>
> >> Do we know of any single message sizes > 32k? 2d34851cd341 cites
> >> increasing VF's but at some point there is a limit. If not, the whole
> >> PEEK thing should go away and we just malloc 32k (or 64k) buffers for
> >> each recvmsg.
> > 
> > IFLA_VF_LIST is by far the biggest thing I have seen so far. I don't
> > remember exact numbers but the issue with 32KB buffer (for the whole
> > RTM_NELINK message) was encountered by some of our customers with NICs
> > having 120 or 128 VFs.
> > 
> > There is a bigger issue with IFLA_VFINFO_LIST, though, as it's an
> > attribute so that netlink limits its size to 64 KB. IIRC with current
> > size of IFLA_VF_INFO this would be reached with 270-280 VFs (I'm sure
> > the number was higer than 256 but not too much higher.)

Using netdevsim, 'ip link show' becomes unusable after enabling more
than 244 VFs. I guess it depends on how much info per VF is available.

> > This would mean unless we let something else grow too much, the whole
> > message shouldn't get much bigger than 64 KB. And if we can find some
> > other solution (e.g. passing VF information in separate messages if
> > client declares support), even 32 KB would be more than enough.
> 
> That's what I was asking, thanks. So 32kB today is sufficient, 64kB has
> future buffer. So this whole PEEK and allocate the message size is
> overkill. It could just as easily been bumped from 32kB to 64kB in the
> original patch and been good for a while.

Yes, I think the real problem is how VF-related messages are structured
currently.

Cheers, Phil

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2019-02-13  2:04 ` David Ahern
  2019-02-13  2:08   ` Eric Dumazet
  2019-02-14 13:49   ` Michal Kubecek
@ 2023-05-29 12:29   ` Gal Pressman
  2023-05-31 21:51     ` Stephen Hemminger
  2 siblings, 1 reply; 17+ messages in thread
From: Gal Pressman @ 2023-05-29 12:29 UTC (permalink / raw)
  To: David Ahern, Eric Dumazet, Stephen Hemminger
  Cc: netdev, Eric Dumazet, Hangbin Liu, Phil Sutter, Jakub Kicinski,
	Michal Kubecek

On 13/02/2019 4:04, David Ahern wrote:
> On 2/12/19 6:58 PM, Eric Dumazet wrote:
>> In the past, we tried to increase the buffer size up to 32 KB in order
>> to reduce number of syscalls per dump.
>>
>> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
>> brought the size back to 4KB because the kernel can not know the application
>> is ready to receive bigger requests.
>>
>> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
>> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
>> for more details.
>>
>> Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>> Cc: Hangbin Liu <liuhangbin@gmail.com>
>> Cc: Phil Sutter <phil@nwl.cc>
>> ---
>>  lib/libnetlink.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
>> index 1892a02ab5d0d73776c9882ffc77edcd2c663d01..0d48a3d43cf03065dacbd419578ab10af56431a4 100644
>> --- a/lib/libnetlink.c
>> +++ b/lib/libnetlink.c
>> @@ -718,6 +718,8 @@ static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
>>  	if (len < 0)
>>  		return len;
>>  
>> +	if (len < 32768)
>> +		len = 32768;
>>  	buf = malloc(len);
>>  	if (!buf) {
>>  		fprintf(stderr, "malloc error: not enough buffer\n");
>>
> 
> I believe that negates the whole point of 2d34851cd341 - which I have no
> problem with. 2 recvmsg calls per message is overkill.
> 
> Do we know of any single message sizes > 32k? 2d34851cd341 cites
> increasing VF's but at some point there is a limit. If not, the whole
> PEEK thing should go away and we just malloc 32k (or 64k) buffers for
> each recvmsg.
> 

Hey,

Sorry for reviving this old thread, but I see this topic was already
discussed here :).
I have a system where the large number of VFs result in a message
greater than 32k, which makes a simple 'ip link' command return an error.

Should we change the kernel's 'max_recvmsg_len' to 64k? Any other (more
robust) ideas to resolve this issue?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2023-05-29 12:29   ` Gal Pressman
@ 2023-05-31 21:51     ` Stephen Hemminger
  2023-06-04 13:33       ` Gal Pressman
  0 siblings, 1 reply; 17+ messages in thread
From: Stephen Hemminger @ 2023-05-31 21:51 UTC (permalink / raw)
  To: Gal Pressman
  Cc: David Ahern, Eric Dumazet, netdev, Eric Dumazet, Hangbin Liu,
	Phil Sutter, Jakub Kicinski, Michal Kubecek

On Mon, 29 May 2023 15:29:51 +0300
Gal Pressman <gal@nvidia.com> wrote:

> On 13/02/2019 4:04, David Ahern wrote:
> > On 2/12/19 6:58 PM, Eric Dumazet wrote:  
> >> In the past, we tried to increase the buffer size up to 32 KB in order
> >> to reduce number of syscalls per dump.
> >>
> >> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> >> brought the size back to 4KB because the kernel can not know the application
> >> is ready to receive bigger requests.
> >>
> >> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
> >> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
> >> for more details.
> >>
> >> Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
> >> Signed-off-by: Eric Dumazet <edumazet@google.com>
> >> Cc: Hangbin Liu <liuhangbin@gmail.com>
> >> Cc: Phil Sutter <phil@nwl.cc>
> >> ---
> >>  lib/libnetlink.c | 2 ++
> >>  1 file changed, 2 insertions(+)
> >>
> >> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
> >> index 1892a02ab5d0d73776c9882ffc77edcd2c663d01..0d48a3d43cf03065dacbd419578ab10af56431a4 100644
> >> --- a/lib/libnetlink.c
> >> +++ b/lib/libnetlink.c
> >> @@ -718,6 +718,8 @@ static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
> >>  	if (len < 0)
> >>  		return len;
> >>  
> >> +	if (len < 32768)
> >> +		len = 32768;
> >>  	buf = malloc(len);
> >>  	if (!buf) {
> >>  		fprintf(stderr, "malloc error: not enough buffer\n");
> >>  
> > 
> > I believe that negates the whole point of 2d34851cd341 - which I have no
> > problem with. 2 recvmsg calls per message is overkill.
> > 
> > Do we know of any single message sizes > 32k? 2d34851cd341 cites
> > increasing VF's but at some point there is a limit. If not, the whole
> > PEEK thing should go away and we just malloc 32k (or 64k) buffers for
> > each recvmsg.
> >   
> 
> Hey,
> 
> Sorry for reviving this old thread, but I see this topic was already
> discussed here :).
> I have a system where the large number of VFs result in a message
> greater than 32k, which makes a simple 'ip link' command return an error.
> 
> Should we change the kernel's 'max_recvmsg_len' to 64k? Any other (more
> robust) ideas to resolve this issue?

No matter what the size, someone will always have too many VF's to fit
in the response. There is no way to get a stable solution without doing
some API changes.

It is possible to dump millions of routes, so it is not directly a netlink
issue more that the current API is slamming all the VF's as info blocks
under a single message response.

That would mean replacing IFLA_VFINFO_LIST with a separate query

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2023-05-31 21:51     ` Stephen Hemminger
@ 2023-06-04 13:33       ` Gal Pressman
  2023-06-04 16:03         ` David Ahern
  0 siblings, 1 reply; 17+ messages in thread
From: Gal Pressman @ 2023-06-04 13:33 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: David Ahern, Eric Dumazet, netdev, Eric Dumazet, Hangbin Liu,
	Phil Sutter, Jakub Kicinski, Michal Kubecek

On 01/06/2023 0:51, Stephen Hemminger wrote:
> On Mon, 29 May 2023 15:29:51 +0300
> Gal Pressman <gal@nvidia.com> wrote:
> 
>> On 13/02/2019 4:04, David Ahern wrote:
>>> On 2/12/19 6:58 PM, Eric Dumazet wrote:  
>>>> In the past, we tried to increase the buffer size up to 32 KB in order
>>>> to reduce number of syscalls per dump.
>>>>
>>>> Commit 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
>>>> brought the size back to 4KB because the kernel can not know the application
>>>> is ready to receive bigger requests.
>>>>
>>>> See kernel commits 9063e21fb026 ("netlink: autosize skb lengthes") and
>>>> d35c99ff77ec ("netlink: do not enter direct reclaim from netlink_dump()")
>>>> for more details.
>>>>
>>>> Fixes: 2d34851cd341 ("lib/libnetlink: re malloc buff if size is not enough")
>>>> Signed-off-by: Eric Dumazet <edumazet@google.com>
>>>> Cc: Hangbin Liu <liuhangbin@gmail.com>
>>>> Cc: Phil Sutter <phil@nwl.cc>
>>>> ---
>>>>  lib/libnetlink.c | 2 ++
>>>>  1 file changed, 2 insertions(+)
>>>>
>>>> diff --git a/lib/libnetlink.c b/lib/libnetlink.c
>>>> index 1892a02ab5d0d73776c9882ffc77edcd2c663d01..0d48a3d43cf03065dacbd419578ab10af56431a4 100644
>>>> --- a/lib/libnetlink.c
>>>> +++ b/lib/libnetlink.c
>>>> @@ -718,6 +718,8 @@ static int rtnl_recvmsg(int fd, struct msghdr *msg, char **answer)
>>>>  	if (len < 0)
>>>>  		return len;
>>>>  
>>>> +	if (len < 32768)
>>>> +		len = 32768;
>>>>  	buf = malloc(len);
>>>>  	if (!buf) {
>>>>  		fprintf(stderr, "malloc error: not enough buffer\n");
>>>>  
>>>
>>> I believe that negates the whole point of 2d34851cd341 - which I have no
>>> problem with. 2 recvmsg calls per message is overkill.
>>>
>>> Do we know of any single message sizes > 32k? 2d34851cd341 cites
>>> increasing VF's but at some point there is a limit. If not, the whole
>>> PEEK thing should go away and we just malloc 32k (or 64k) buffers for
>>> each recvmsg.
>>>   
>>
>> Hey,
>>
>> Sorry for reviving this old thread, but I see this topic was already
>> discussed here :).
>> I have a system where the large number of VFs result in a message
>> greater than 32k, which makes a simple 'ip link' command return an error.
>>
>> Should we change the kernel's 'max_recvmsg_len' to 64k? Any other (more
>> robust) ideas to resolve this issue?
> 
> No matter what the size, someone will always have too many VF's to fit
> in the response. There is no way to get a stable solution without doing
> some API changes.
> 
> It is possible to dump millions of routes, so it is not directly a netlink
> issue more that the current API is slamming all the VF's as info blocks
> under a single message response.
> 
> That would mean replacing IFLA_VFINFO_LIST with a separate query

Thanks Stephen!
How would you imagine it? Changing the userspace to split each (PF, VF)
to a separate netlink call instead of a single call for each PF?

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2023-06-04 13:33       ` Gal Pressman
@ 2023-06-04 16:03         ` David Ahern
  2023-06-05  7:24           ` Gal Pressman
  0 siblings, 1 reply; 17+ messages in thread
From: David Ahern @ 2023-06-04 16:03 UTC (permalink / raw)
  To: Gal Pressman, Stephen Hemminger
  Cc: Eric Dumazet, netdev, Eric Dumazet, Hangbin Liu, Phil Sutter,
	Jakub Kicinski, Michal Kubecek

On 6/4/23 7:33 AM, Gal Pressman wrote:
>> It is possible to dump millions of routes, so it is not directly a netlink
>> issue more that the current API is slamming all the VF's as info blocks
>> under a single message response.
>>
>> That would mean replacing IFLA_VFINFO_LIST with a separate query
> 
> Thanks Stephen!
> How would you imagine it? Changing the userspace to split each (PF, VF)
> to a separate netlink call instead of a single call for each PF?


This is the last attempt that I recall to address this issue:

https://lore.kernel.org/netdev/20210123045321.2797360-1-edwin.peer@broadcom.com/


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg()
  2023-06-04 16:03         ` David Ahern
@ 2023-06-05  7:24           ` Gal Pressman
  0 siblings, 0 replies; 17+ messages in thread
From: Gal Pressman @ 2023-06-05  7:24 UTC (permalink / raw)
  To: David Ahern, Stephen Hemminger
  Cc: Eric Dumazet, netdev, Eric Dumazet, Hangbin Liu, Phil Sutter,
	Jakub Kicinski, Michal Kubecek

On 04/06/2023 19:03, David Ahern wrote:
> On 6/4/23 7:33 AM, Gal Pressman wrote:
>>> It is possible to dump millions of routes, so it is not directly a netlink
>>> issue more that the current API is slamming all the VF's as info blocks
>>> under a single message response.
>>>
>>> That would mean replacing IFLA_VFINFO_LIST with a separate query
>>
>> Thanks Stephen!
>> How would you imagine it? Changing the userspace to split each (PF, VF)
>> to a separate netlink call instead of a single call for each PF?
> 
> 
> This is the last attempt that I recall to address this issue:
> 
> https://lore.kernel.org/netdev/20210123045321.2797360-1-edwin.peer@broadcom.com/

Thanks for the pointer David, this is helpful.

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-06-05  7:24 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-13  1:58 [PATCH iproute2] lib/libnetlink: ensure a minimum of 32KB for the buffer used in rtnl_recvmsg() Eric Dumazet
2019-02-13  2:04 ` David Ahern
2019-02-13  2:08   ` Eric Dumazet
2019-02-13  6:20     ` Hangbin Liu
2019-02-14 13:51       ` Michal Kubecek
2019-02-14 13:49   ` Michal Kubecek
2019-02-14 17:34     ` David Ahern
2019-02-14 17:47       ` Phil Sutter
2023-05-29 12:29   ` Gal Pressman
2023-05-31 21:51     ` Stephen Hemminger
2023-06-04 13:33       ` Gal Pressman
2023-06-04 16:03         ` David Ahern
2023-06-05  7:24           ` Gal Pressman
2019-02-13 17:46 ` Phil Sutter
2019-02-13 17:58   ` Eric Dumazet
2019-02-13 21:57 ` Stephen Hemminger
2019-02-13 21:59   ` Eric Dumazet

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).