linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
@ 2017-11-07 11:29 Jason A. Donenfeld
  2017-11-08  6:03 ` Jason A. Donenfeld
  2017-11-08  6:16 ` Johannes Berg
  0 siblings, 2 replies; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-07 11:29 UTC (permalink / raw)
  To: johannes, davem, Netdev, linux-kernel; +Cc: Jason A. Donenfeld

The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus reserves and restores the required length for
NLMSG_DONE during the call to the dump function.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
See you all at netdevconf tomorrow!

 net/netlink/af_netlink.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..b2d0a2fb1939 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2183,7 +2183,9 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
+	skb->end -= nlmsg_total_size(sizeof(len));
 	len = cb->dump(skb, cb);
+	skb->end += nlmsg_total_size(sizeof(len));
 
 	if (len > 0) {
 		mutex_unlock(nlk->cb_mutex);
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
  2017-11-07 11:29 [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE Jason A. Donenfeld
@ 2017-11-08  6:03 ` Jason A. Donenfeld
  2017-11-08  6:16 ` Johannes Berg
  1 sibling, 0 replies; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-08  6:03 UTC (permalink / raw)
  To: Johannes Berg, David Miller, Netdev, LKML; +Cc: Jason A. Donenfeld

By the way, in case you're curious, here's the {up,down,cross}stream
WireGuard commit that works around it via its compat layer (a rat's
nest of hideous backports for all the weird kernels people want
WireGuard to run on, which I cannot wait to remove):

https://git.zx2c4.com/WireGuard/commit/?id=f689ea7acc23dc8e0968699d964ee382b04fbbe4

Particularly relavent here is the last chunk of that, which is part of
the automated test suite, which reproduces the issue by finding the
tightest possible packing.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
  2017-11-07 11:29 [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE Jason A. Donenfeld
  2017-11-08  6:03 ` Jason A. Donenfeld
@ 2017-11-08  6:16 ` Johannes Berg
  2017-11-08  6:35   ` Jason A. Donenfeld
  2017-11-08  7:21   ` [PATCH v2] " Jason A. Donenfeld
  1 sibling, 2 replies; 19+ messages in thread
From: Johannes Berg @ 2017-11-08  6:16 UTC (permalink / raw)
  To: Jason A. Donenfeld, davem, Netdev, linux-kernel

On Tue, 2017-11-07 at 20:29 +0900, Jason A. Donenfeld wrote:
> 
> This patch thus reserves and restores the required length for
> NLMSG_DONE during the call to the dump function.
> 

That basically removes that space though, even when the dump isn't
complete... wouldn't it be better to do something like this?

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index f34750691c5c..fccf83598dab 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2183,9 +2183,15 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	/* if the dump is already done, just complete */
+	if (nlk->dump_done)
+		len = 0;
+	else
+		len = cb->dump(skb, cb);
+
+	nlk->dump_done = len == 0;
 
-	if (len > 0) {
+	if (len > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(len))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2196,7 +2202,7 @@ static int netlink_dump(struct sock *sk)
 	}
 
 	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
@@ -2273,6 +2279,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done = false;
 
 	mutex_unlock(nlk->cb_mutex);
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 3490f2430532..91a3652d384f 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -33,6 +33,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	bool			dump_done;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;


https://p.sipsolutions.net/90574c3c0116d68a.txt

(untested)

johannes

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
  2017-11-08  6:16 ` Johannes Berg
@ 2017-11-08  6:35   ` Jason A. Donenfeld
  2017-11-08  7:06     ` Jason A. Donenfeld
  2017-11-08  7:21   ` [PATCH v2] " Jason A. Donenfeld
  1 sibling, 1 reply; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-08  6:35 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David Miller, Netdev, LKML

Hi Johannes,

Yes indeed. It sacrifices 24 bytes for making things much less
complex. However, if you prefer increasing the complexity of the state
machine a bit instead, I suppose we could roll with this approach
instead...

Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE
  2017-11-08  6:35   ` Jason A. Donenfeld
@ 2017-11-08  7:06     ` Jason A. Donenfeld
  0 siblings, 0 replies; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-08  7:06 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David Miller, Netdev, LKML

Erf, your patch doesn't handle what happens if len comes back
negative, but I'll fix it up and send a v2 using this approach.

I think I really prefer v1 though.

Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v2] af_netlink: give correct bounds to dump skb for NLMSG_DONE
  2017-11-08  6:16 ` Johannes Berg
  2017-11-08  6:35   ` Jason A. Donenfeld
@ 2017-11-08  7:21   ` Jason A. Donenfeld
  2017-11-09  1:42     ` [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps Jason A. Donenfeld
  1 sibling, 1 reply; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-08  7:21 UTC (permalink / raw)
  To: Johannes Berg, davem, Netdev, linux-kernel; +Cc: Jason A. Donenfeld

The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Changes v1->v2:
  - This implements things without having to twiddle with the skb, at
    the expense of potentially having an extra back-and-forth and quite
    a bit of added complexity.


 net/netlink/af_netlink.c | 14 ++++++++------
 net/netlink/af_netlink.h |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..7020689e643e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2183,9 +2183,10 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2195,13 +2196,13 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2273,6 +2274,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-08  7:21   ` [PATCH v2] " Jason A. Donenfeld
@ 2017-11-09  1:42     ` Jason A. Donenfeld
  2017-11-09  2:02       ` Johannes Berg
  2017-11-09  4:04       ` [PATCH v4] " Jason A. Donenfeld
  0 siblings, 2 replies; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-09  1:42 UTC (permalink / raw)
  To: Johannes Berg, davem, Netdev, linux-kernel; +Cc: Jason A. Donenfeld

The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Can we get this into 4.14? Is there still time? It should also be queued
up for stable.

Changes v2->v3:
  - Johannes didn't like the subject line of the patch, so the only
    thing that's changed in this version is the new subject line.

 net/netlink/af_netlink.c | 14 ++++++++------
 net/netlink/af_netlink.h |  1 +
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..7020689e643e 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2183,9 +2183,10 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2195,13 +2196,13 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2273,6 +2274,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-09  1:42     ` [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps Jason A. Donenfeld
@ 2017-11-09  2:02       ` Johannes Berg
  2017-11-09  2:57         ` Jason A. Donenfeld
  2017-11-09  4:04       ` [PATCH v4] " Jason A. Donenfeld
  1 sibling, 1 reply; 19+ messages in thread
From: Johannes Berg @ 2017-11-09  2:02 UTC (permalink / raw)
  To: Jason A. Donenfeld, davem, Netdev, linux-kernel

On Thu, 2017-11-09 at 10:42 +0900, Jason A. Donenfeld wrote:
> +++ b/net/netlink/af_netlink.c
> @@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
>  	struct sk_buff *skb = NULL;
>  	struct nlmsghdr *nlh;
>  	struct module *module;
> -	int len, err = -ENOBUFS;
> +	int err = -ENOBUFS;
>  	int alloc_min_size;
>  	int alloc_size;
>  
> @@ -2183,9 +2183,10 @@ static int netlink_dump(struct sock *sk)
>  	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
>  	netlink_skb_set_owner_r(skb, sk);
>  
> -	len = cb->dump(skb, cb);
> +	if (nlk->dump_done_errno > 0)
> +		nlk->dump_done_errno = cb->dump(skb, cb);
>  
> -	if (len > 0) {
> +	if (nlk->dump_done_errno > 0 || skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
>  		mutex_unlock(nlk->cb_mutex);

nit: I think your line got a little long here :)
 
> -	memcpy(nlmsg_data(nlh), &len, sizeof(len));
> +	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno, sizeof(nlk->dump_done_errno));

and here

> +	nlk->dump_done_errno = INT_MAX;

I guess positive values aren't really returned from dump?

johannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-09  2:02       ` Johannes Berg
@ 2017-11-09  2:57         ` Jason A. Donenfeld
  0 siblings, 0 replies; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-09  2:57 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David Miller, Netdev, LKML

On Thu, Nov 9, 2017 at 11:02 AM, Johannes Berg
<johannes@sipsolutions.net> wrote:
> nit: I think your line got a little long here :)

Ack. For v4.

> and here

Ack. For v4.

>
>> +     nlk->dump_done_errno = INT_MAX;
>
> I guess positive values aren't really returned from dump?

When a positive value is returned, the API uses this to signify that
the dump is not done, and it should be called again. When 0 or
negative is returned, it means the dump is complete and the return
value should be returned in DONE. We therefore initialize
dump_done_errno to some positive value, so that we can make calling
->dump() conditional on it being positive. When it becomes zero or
negative, it's time to return. This mirrors the semantics of the
actual ->dump() function precisely.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-09  1:42     ` [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps Jason A. Donenfeld
  2017-11-09  2:02       ` Johannes Berg
@ 2017-11-09  4:04       ` Jason A. Donenfeld
  2017-11-11  2:26         ` Jason A. Donenfeld
  2017-11-11 14:09         ` David Miller
  1 sibling, 2 replies; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-09  4:04 UTC (permalink / raw)
  To: Johannes Berg, davem, Netdev, linux-kernel; +Cc: Jason A. Donenfeld

The way people generally use netlink_dump is that they fill in the skb
as much as possible, breaking when nla_put returns an error. Then, they
get called again and start filling out the next skb, and again, and so
forth. The mechanism at work here is the ability for the iterative
dumping function to detect when the skb is filled up and not fill it
past the brim, waiting for a fresh skb for the rest of the data.

However, if the attributes are small and nicely packed, it is possible
that a dump callback function successfully fills in attributes until the
skb is of size 4080 (libmnl's default page-sized receive buffer size).
The dump function completes, satisfied, and then, if it happens to be
that this is actually the last skb, and no further ones are to be sent,
then netlink_dump will add on the NLMSG_DONE part:

  nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);

It is very important that netlink_dump does this, of course. However, in
this example, that call to nlmsg_put_answer will fail, because the
previous filling by the dump function did not leave it enough room. And
how could it possibly have done so? All of the nla_put variety of
functions simply check to see if the skb has enough tailroom,
independent of the context it is in.

In order to keep the important assumptions of all netlink dump users, it
is therefore important to give them an skb that has this end part of the
tail already reserved, so that the call to nlmsg_put_answer does not
fail. Otherwise, library authors are forced to find some bizarre sized
receive buffer that has a large modulo relative to the common sizes of
messages received, which is ugly and buggy.

This patch thus saves the NLMSG_DONE for an additional message, for the
case that things are dangerously close to the brim. This requires
keeping track of the errno from ->dump() across calls.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
---
Can we get this into 4.14? Is there still time? It should also be queued
up for stable.

Changes v3->v4:
 - I like long lines. The kernel does not. Wrapped at 80 now.

 net/netlink/af_netlink.c | 17 +++++++++++------
 net/netlink/af_netlink.h |  1 +
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index b93148e8e9fb..15c99dfa3d72 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
 	struct sk_buff *skb = NULL;
 	struct nlmsghdr *nlh;
 	struct module *module;
-	int len, err = -ENOBUFS;
+	int err = -ENOBUFS;
 	int alloc_min_size;
 	int alloc_size;
 
@@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk)
 	skb_reserve(skb, skb_tailroom(skb) - alloc_size);
 	netlink_skb_set_owner_r(skb, sk);
 
-	len = cb->dump(skb, cb);
+	if (nlk->dump_done_errno > 0)
+		nlk->dump_done_errno = cb->dump(skb, cb);
 
-	if (len > 0) {
+	if (nlk->dump_done_errno > 0 ||
+	    skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
 		mutex_unlock(nlk->cb_mutex);
 
 		if (sk_filter(sk, skb))
@@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
 		return 0;
 	}
 
-	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-	if (!nlh)
+	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+	if (WARN_ON(!nlh))
 		goto errout_skb;
 
 	nl_dump_check_consistent(cb, nlh);
 
-	memcpy(nlmsg_data(nlh), &len, sizeof(len));
+	memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
+	       sizeof(nlk->dump_done_errno));
 
 	if (sk_filter(sk, skb))
 		kfree_skb(skb);
@@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
 	}
 
 	nlk->cb_running = true;
+	nlk->dump_done_errno = INT_MAX;
 
 	mutex_unlock(nlk->cb_mutex);
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 028188597eaa..962de7b3c023 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -34,6 +34,7 @@ struct netlink_sock {
 	wait_queue_head_t	wait;
 	bool			bound;
 	bool			cb_running;
+	int			dump_done_errno;
 	struct netlink_callback	cb;
 	struct mutex		*cb_mutex;
 	struct mutex		cb_def_mutex;
-- 
2.15.0

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-09  4:04       ` [PATCH v4] " Jason A. Donenfeld
@ 2017-11-11  2:26         ` Jason A. Donenfeld
  2017-11-11  2:37           ` David Miller
  2017-11-11 14:09         ` David Miller
  1 sibling, 1 reply; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-11  2:26 UTC (permalink / raw)
  To: Johannes Berg, David Miller, Netdev, LKML

IIRC, 4.14 comes tomorrow-ish? If possible, it would be nice to get
this in 4.14 before then, so it doesn't have to take time to trickle
down through stable.

Jason

On Thu, Nov 9, 2017 at 1:04 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> The way people generally use netlink_dump is that they fill in the skb
> as much as possible, breaking when nla_put returns an error. Then, they
> get called again and start filling out the next skb, and again, and so
> forth. The mechanism at work here is the ability for the iterative
> dumping function to detect when the skb is filled up and not fill it
> past the brim, waiting for a fresh skb for the rest of the data.
>
> However, if the attributes are small and nicely packed, it is possible
> that a dump callback function successfully fills in attributes until the
> skb is of size 4080 (libmnl's default page-sized receive buffer size).
> The dump function completes, satisfied, and then, if it happens to be
> that this is actually the last skb, and no further ones are to be sent,
> then netlink_dump will add on the NLMSG_DONE part:
>
>   nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
>
> It is very important that netlink_dump does this, of course. However, in
> this example, that call to nlmsg_put_answer will fail, because the
> previous filling by the dump function did not leave it enough room. And
> how could it possibly have done so? All of the nla_put variety of
> functions simply check to see if the skb has enough tailroom,
> independent of the context it is in.
>
> In order to keep the important assumptions of all netlink dump users, it
> is therefore important to give them an skb that has this end part of the
> tail already reserved, so that the call to nlmsg_put_answer does not
> fail. Otherwise, library authors are forced to find some bizarre sized
> receive buffer that has a large modulo relative to the common sizes of
> messages received, which is ugly and buggy.
>
> This patch thus saves the NLMSG_DONE for an additional message, for the
> case that things are dangerously close to the brim. This requires
> keeping track of the errno from ->dump() across calls.
>
> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
> ---
> Can we get this into 4.14? Is there still time? It should also be queued
> up for stable.
>
> Changes v3->v4:
>  - I like long lines. The kernel does not. Wrapped at 80 now.
>
>  net/netlink/af_netlink.c | 17 +++++++++++------
>  net/netlink/af_netlink.h |  1 +
>  2 files changed, 12 insertions(+), 6 deletions(-)
>
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index b93148e8e9fb..15c99dfa3d72 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -2136,7 +2136,7 @@ static int netlink_dump(struct sock *sk)
>         struct sk_buff *skb = NULL;
>         struct nlmsghdr *nlh;
>         struct module *module;
> -       int len, err = -ENOBUFS;
> +       int err = -ENOBUFS;
>         int alloc_min_size;
>         int alloc_size;
>
> @@ -2183,9 +2183,11 @@ static int netlink_dump(struct sock *sk)
>         skb_reserve(skb, skb_tailroom(skb) - alloc_size);
>         netlink_skb_set_owner_r(skb, sk);
>
> -       len = cb->dump(skb, cb);
> +       if (nlk->dump_done_errno > 0)
> +               nlk->dump_done_errno = cb->dump(skb, cb);
>
> -       if (len > 0) {
> +       if (nlk->dump_done_errno > 0 ||
> +           skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
>                 mutex_unlock(nlk->cb_mutex);
>
>                 if (sk_filter(sk, skb))
> @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
>                 return 0;
>         }
>
> -       nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> -       if (!nlh)
> +       nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> +                              sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> +       if (WARN_ON(!nlh))
>                 goto errout_skb;
>
>         nl_dump_check_consistent(cb, nlh);
>
> -       memcpy(nlmsg_data(nlh), &len, sizeof(len));
> +       memcpy(nlmsg_data(nlh), &nlk->dump_done_errno,
> +              sizeof(nlk->dump_done_errno));
>
>         if (sk_filter(sk, skb))
>                 kfree_skb(skb);
> @@ -2273,6 +2277,7 @@ int __netlink_dump_start(struct sock *ssk, struct sk_buff *skb,
>         }
>
>         nlk->cb_running = true;
> +       nlk->dump_done_errno = INT_MAX;
>
>         mutex_unlock(nlk->cb_mutex);
>
> diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
> index 028188597eaa..962de7b3c023 100644
> --- a/net/netlink/af_netlink.h
> +++ b/net/netlink/af_netlink.h
> @@ -34,6 +34,7 @@ struct netlink_sock {
>         wait_queue_head_t       wait;
>         bool                    bound;
>         bool                    cb_running;
> +       int                     dump_done_errno;
>         struct netlink_callback cb;
>         struct mutex            *cb_mutex;
>         struct mutex            cb_def_mutex;
> --
> 2.15.0
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-11  2:26         ` Jason A. Donenfeld
@ 2017-11-11  2:37           ` David Miller
  2017-11-11  2:47             ` Jason A. Donenfeld
  0 siblings, 1 reply; 19+ messages in thread
From: David Miller @ 2017-11-11  2:37 UTC (permalink / raw)
  To: Jason; +Cc: johannes, netdev, linux-kernel

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Sat, 11 Nov 2017 11:26:12 +0900

> IIRC, 4.14 comes tomorrow-ish? If possible, it would be nice to get
> this in 4.14 before then, so it doesn't have to take time to trickle
> down through stable.

Jason I'm already pushing my luck as-is with the pull request I made
yesterday.

I've seen your original requst to get this in, you don't have to say
it multiple times.

We can get this into the merge window and submit it for -stable, so
please relax.

Thank you.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-11  2:37           ` David Miller
@ 2017-11-11  2:47             ` Jason A. Donenfeld
  0 siblings, 0 replies; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-11  2:47 UTC (permalink / raw)
  To: David Miller; +Cc: Johannes Berg, Netdev, LKML

On Sat, Nov 11, 2017 at 11:37 AM, David Miller <davem@davemloft.net> wrote:
> Jason I'm already pushing my luck as-is with the pull request I made
> yesterday.
>
> I've seen your original requst to get this in, you don't have to say
> it multiple times.
>
> We can get this into the merge window and submit it for -stable, so
> please relax.

Whoops, sorry! Okay, no problem.

Jason

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-09  4:04       ` [PATCH v4] " Jason A. Donenfeld
  2017-11-11  2:26         ` Jason A. Donenfeld
@ 2017-11-11 14:09         ` David Miller
  2017-11-11 14:15           ` Johannes Berg
  1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2017-11-11 14:09 UTC (permalink / raw)
  To: Jason; +Cc: johannes, netdev, linux-kernel

From: "Jason A. Donenfeld" <Jason@zx2c4.com>
Date: Thu,  9 Nov 2017 13:04:44 +0900

> @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
>  		return 0;
>  	}
>  
> -	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> -	if (!nlh)
> +	nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> +			       sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> +	if (WARN_ON(!nlh))
>  		goto errout_skb;

If you're handling this by forcing another read() to procude the
NLMSG_DONE, then you have no reason to WARN_ON() here.

In fact you are adding a WARN_ON() which is trivially triggerable by
any user.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-11 14:09         ` David Miller
@ 2017-11-11 14:15           ` Johannes Berg
  2017-11-11 14:18             ` Johannes Berg
  2017-11-11 14:21             ` David Miller
  0 siblings, 2 replies; 19+ messages in thread
From: Johannes Berg @ 2017-11-11 14:15 UTC (permalink / raw)
  To: David Miller, Jason; +Cc: netdev, linux-kernel

On Sat, 2017-11-11 at 23:09 +0900, David Miller wrote:
> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
> Date: Thu,  9 Nov 2017 13:04:44 +0900
> 
> > @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
> >               return 0;
> >       }
> >  
> > -     nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
> > -     if (!nlh)
> > +     nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
> > +                            sizeof(nlk->dump_done_errno), NLM_F_MULTI);
> > +     if (WARN_ON(!nlh))
> >               goto errout_skb;
> 
> If you're handling this by forcing another read() to procude the
> NLMSG_DONE, then you have no reason to WARN_ON() here.
> 
> In fact you are adding a WARN_ON() which is trivially triggerable by
> any user.

I added this in my suggestion for how this could work, but I don't
think you're right, since we previously check if there's enough space.
The patch is missing the full context, but this is:


+       if (nlk->dump_done_errno > 0 ||
+           skb_tailroom(skb) < nlmsg_total_size(sizeof(nlk->dump_done_errno))) {
                mutex_unlock(nlk->cb_mutex);
 
                if (sk_filter(sk, skb))
                        kfree_skb(skb);
                else
                        __netlink_sendskb(sk, skb);
                return 0;
        }

-       nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
-       if (!nlh)
+       nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
+                              sizeof(nlk->dump_done_errno), NLM_F_MULTI);
+       if (WARN_ON(!nlh))


So unless the nlmsg_total_size() vs. nlmsg_put_answer() suddenly gets a
different idea of how much space is needed, nlh shouldn't ever be NULL
once we get here.

johannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-11 14:15           ` Johannes Berg
@ 2017-11-11 14:18             ` Johannes Berg
  2017-11-11 15:18               ` Jason A. Donenfeld
  2017-11-11 14:21             ` David Miller
  1 sibling, 1 reply; 19+ messages in thread
From: Johannes Berg @ 2017-11-11 14:18 UTC (permalink / raw)
  To: David Miller, Jason; +Cc: netdev, linux-kernel


> > If you're handling this by forcing another read() to procude the
> > NLMSG_DONE, then you have no reason to WARN_ON() here.
> > 
> > In fact you are adding a WARN_ON() which is trivially triggerable by
> > any user.
> 
> I added this in my suggestion for how this could work, but I don't
> think you're right, since we previously check if there's enough space.

Or perhaps I should say this differently:

Forcing another read happens through the

	skb_tailroom(skb) < nlmsg_total_size(...)

check, so the nlmsg_put_answer() can't really fail.


Handling nlmsg_put_answer() failures by forcing another read would have
required jumping to the existing if code with a goto, or restructuring
the whole thing completely somehow, and I didn't see how to do that.

johannes

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-11 14:15           ` Johannes Berg
  2017-11-11 14:18             ` Johannes Berg
@ 2017-11-11 14:21             ` David Miller
  2017-11-13  1:18               ` David Miller
  1 sibling, 1 reply; 19+ messages in thread
From: David Miller @ 2017-11-11 14:21 UTC (permalink / raw)
  To: johannes; +Cc: Jason, netdev, linux-kernel

From: Johannes Berg <johannes@sipsolutions.net>
Date: Sat, 11 Nov 2017 15:15:21 +0100

> On Sat, 2017-11-11 at 23:09 +0900, David Miller wrote:
>> From: "Jason A. Donenfeld" <Jason@zx2c4.com>
>> Date: Thu,  9 Nov 2017 13:04:44 +0900
>> 
>> > @@ -2195,13 +2197,15 @@ static int netlink_dump(struct sock *sk)
>> >               return 0;
>> >       }
>> >  
>> > -     nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE, sizeof(len), NLM_F_MULTI);
>> > -     if (!nlh)
>> > +     nlh = nlmsg_put_answer(skb, cb, NLMSG_DONE,
>> > +                            sizeof(nlk->dump_done_errno), NLM_F_MULTI);
>> > +     if (WARN_ON(!nlh))
>> >               goto errout_skb;
>> 
>> If you're handling this by forcing another read() to procude the
>> NLMSG_DONE, then you have no reason to WARN_ON() here.
>> 
>> In fact you are adding a WARN_ON() which is trivially triggerable by
>> any user.
> 
> I added this in my suggestion for how this could work, but I don't
> think you're right, since we previously check if there's enough space.
> The patch is missing the full context, but this is:
 ...
> So unless the nlmsg_total_size() vs. nlmsg_put_answer() suddenly gets a
> different idea of how much space is needed, nlh shouldn't ever be NULL
> once we get here.

Aha, that's what I missed.  Indeed, it cannot happen.

My bad.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-11 14:18             ` Johannes Berg
@ 2017-11-11 15:18               ` Jason A. Donenfeld
  0 siblings, 0 replies; 19+ messages in thread
From: Jason A. Donenfeld @ 2017-11-11 15:18 UTC (permalink / raw)
  To: Johannes Berg; +Cc: David Miller, Netdev, LKML

On Sat, Nov 11, 2017 at 11:18 PM, Johannes Berg
<johannes@sipsolutions.net> wrote:
>
>> > If you're handling this by forcing another read() to procude the
>> > NLMSG_DONE, then you have no reason to WARN_ON() here.
>> >
>> > In fact you are adding a WARN_ON() which is trivially triggerable by
>> > any user.
>>
>> I added this in my suggestion for how this could work, but I don't
>> think you're right, since we previously check if there's enough space.
>
> Or perhaps I should say this differently:
>
> Forcing another read happens through the
>
>         skb_tailroom(skb) < nlmsg_total_size(...)
>
> check, so the nlmsg_put_answer() can't really fail.
>
>
> Handling nlmsg_put_answer() failures by forcing another read would have
> required jumping to the existing if code with a goto, or restructuring
> the whole thing completely somehow, and I didn't see how to do that.

Exactly. And if something _does_ go wrong in our logic, and we can't
add NLMSG_DONE, we really do want people to report this to us, since
dumps must always end that way. We'd probably have caught this a
number of years ago when userspace developers were twiddling with
their receive buffers if we had had the WARN_ON. Nice suggestion from
Johannes.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH v4] af_netlink: ensure that NLMSG_DONE never fails in dumps
  2017-11-11 14:21             ` David Miller
@ 2017-11-13  1:18               ` David Miller
  0 siblings, 0 replies; 19+ messages in thread
From: David Miller @ 2017-11-13  1:18 UTC (permalink / raw)
  To: johannes; +Cc: Jason, netdev, linux-kernel

From: David Miller <davem@davemloft.net>
Date: Sat, 11 Nov 2017 23:21:01 +0900 (KST)

> Aha, that's what I missed.  Indeed, it cannot happen.

Applied and queued up for -stable.

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-11-13  1:18 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-07 11:29 [PATCH] af_netlink: give correct bounds to dump skb for NLMSG_DONE Jason A. Donenfeld
2017-11-08  6:03 ` Jason A. Donenfeld
2017-11-08  6:16 ` Johannes Berg
2017-11-08  6:35   ` Jason A. Donenfeld
2017-11-08  7:06     ` Jason A. Donenfeld
2017-11-08  7:21   ` [PATCH v2] " Jason A. Donenfeld
2017-11-09  1:42     ` [PATCH v3] af_netlink: ensure that NLMSG_DONE never fails in dumps Jason A. Donenfeld
2017-11-09  2:02       ` Johannes Berg
2017-11-09  2:57         ` Jason A. Donenfeld
2017-11-09  4:04       ` [PATCH v4] " Jason A. Donenfeld
2017-11-11  2:26         ` Jason A. Donenfeld
2017-11-11  2:37           ` David Miller
2017-11-11  2:47             ` Jason A. Donenfeld
2017-11-11 14:09         ` David Miller
2017-11-11 14:15           ` Johannes Berg
2017-11-11 14:18             ` Johannes Berg
2017-11-11 15:18               ` Jason A. Donenfeld
2017-11-11 14:21             ` David Miller
2017-11-13  1:18               ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).