All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
@ 2023-04-07 22:21 Dave Pifke
  2023-04-08 18:23 ` Pablo Neira Ayuso
  2023-04-08 23:09 ` Pablo Neira Ayuso
  0 siblings, 2 replies; 7+ messages in thread
From: Dave Pifke @ 2023-04-07 22:21 UTC (permalink / raw)
  To: netfilter-devel

Prior to this patch, nft inside a systemd-nspawn container was failing
to install my ruleset (which includes a large-ish map), with the error

netlink: Error: Could not process rule: Message too long

strace reveals:

setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted)

This is despite the nspawn process supposedly having CAP_NET_ADMIN,
and despite /proc/sys/net/core/wmem_max (in the main host namespace)
being set larger than the requested size:

net.core.wmem_max = 16777216

A web search reveals at least one other user having the same issue:

https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/

After this patch, nft succeeds.
---
 src/mnl.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/src/mnl.c b/src/mnl.c
index 26f943db..ab6750c8 100644
--- a/src/mnl.c
+++ b/src/mnl.c
@@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
 		return;
 
 	/* Rise sender buffer length to avoid hitting -EMSGSIZE */
+	if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
+		       &newbuffsiz, sizeof(socklen_t)) == 0)
+		return;
+
+	/* If the above fails (probably because it exceeds
+	 * /proc/sys/net/core/wmem_max), try again with SO_SNDBUFFORCE.
+	 * This requires CAP_NET_ADMIN. */
 	if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
 		       &newbuffsiz, sizeof(socklen_t)) < 0)
 		return;
-- 
2.20.1


^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
  2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke
@ 2023-04-08 18:23 ` Pablo Neira Ayuso
  2023-04-08 18:34   ` Dave Pifke
  2023-04-08 23:09 ` Pablo Neira Ayuso
  1 sibling, 1 reply; 7+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-08 18:23 UTC (permalink / raw)
  To: Dave Pifke; +Cc: netfilter-devel

On Fri, Apr 07, 2023 at 04:21:57PM -0600, Dave Pifke wrote:
> Prior to this patch, nft inside a systemd-nspawn container was failing
> to install my ruleset (which includes a large-ish map), with the error
> 
> netlink: Error: Could not process rule: Message too long
> 
> strace reveals:
> 
> setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted)
> 
> This is despite the nspawn process supposedly having CAP_NET_ADMIN,
> and despite /proc/sys/net/core/wmem_max (in the main host namespace)
> being set larger than the requested size:
> 
> net.core.wmem_max = 16777216
> 
> A web search reveals at least one other user having the same issue:
> 
> https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/
> 
> After this patch, nft succeeds.

Patch LGTM.

May I add your Signed-off-by: tag to this patch before applying it?

Thanks.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
  2023-04-08 18:23 ` Pablo Neira Ayuso
@ 2023-04-08 18:34   ` Dave Pifke
  0 siblings, 0 replies; 7+ messages in thread
From: Dave Pifke @ 2023-04-08 18:34 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

Pablo Neira Ayuso <pablo@netfilter.org> writes:

> May I add your Signed-off-by: tag to this patch before applying it?

Yes.  Thank you!


-- 
Dave Pifke, dave@pifke.org

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
  2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke
  2023-04-08 18:23 ` Pablo Neira Ayuso
@ 2023-04-08 23:09 ` Pablo Neira Ayuso
  2023-04-10  9:04   ` Pablo Neira Ayuso
  2023-04-10 18:03   ` Dave Pifke
  1 sibling, 2 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-08 23:09 UTC (permalink / raw)
  To: Dave Pifke; +Cc: netfilter-devel

Hi again,

Let me revisit this.

On Fri, Apr 07, 2023 at 04:21:57PM -0600, Dave Pifke wrote:
> Prior to this patch, nft inside a systemd-nspawn container was failing
> to install my ruleset (which includes a large-ish map), with the error
> 
> netlink: Error: Could not process rule: Message too long
> 
> strace reveals:
> 
> setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted)
>
> This is despite the nspawn process supposedly having CAP_NET_ADMIN,
> and despite /proc/sys/net/core/wmem_max (in the main host namespace)
> being set larger than the requested size:
> 
> net.core.wmem_max = 16777216

OK, so you indeed increased net.core.wmem_max on the host namespace.

> A web search reveals at least one other user having the same issue:
> 
> https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/
> 
> After this patch, nft succeeds.
> ---
>  src/mnl.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/src/mnl.c b/src/mnl.c
> index 26f943db..ab6750c8 100644
> --- a/src/mnl.c
> +++ b/src/mnl.c
> @@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
>  		return;
>  
>  	/* Rise sender buffer length to avoid hitting -EMSGSIZE */
> +	if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
> +		       &newbuffsiz, sizeof(socklen_t)) == 0)
> +		return;

setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is
specified by net.core.wmem_max

This needs to call:

	setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
		   &newbuffsiz, sizeof(socklen_t));

without checking the return value. Otherwise, SO_SNDBUFFORCE is never
going to be called after this patch. This needs a v2.

On top of this patch, you still needed to increase net.core.wmem_max
in your host container for this to work.

> +	/* If the above fails (probably because it exceeds
> +	 * /proc/sys/net/core/wmem_max), try again with SO_SNDBUFFORCE.
> +	 * This requires CAP_NET_ADMIN. */
>  	if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
>  		       &newbuffsiz, sizeof(socklen_t)) < 0)
>  		return;
> -- 
> 2.20.1
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
  2023-04-08 23:09 ` Pablo Neira Ayuso
@ 2023-04-10  9:04   ` Pablo Neira Ayuso
  2023-04-10 18:03   ` Dave Pifke
  1 sibling, 0 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-10  9:04 UTC (permalink / raw)
  To: Dave Pifke; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 897 bytes --]

Hi,

On Sun, Apr 09, 2023 at 01:09:07AM +0200, Pablo Neira Ayuso wrote:
> > diff --git a/src/mnl.c b/src/mnl.c
> > index 26f943db..ab6750c8 100644
> > --- a/src/mnl.c
> > +++ b/src/mnl.c
> > @@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
> >  		return;
> >  
> >  	/* Rise sender buffer length to avoid hitting -EMSGSIZE */
> > +	if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
> > +		       &newbuffsiz, sizeof(socklen_t)) == 0)
> > +		return;
> 
> setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is
> specified by net.core.wmem_max
> 
> This needs to call:
> 
> 	setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
> 		   &newbuffsiz, sizeof(socklen_t));
> 
> without checking the return value. Otherwise, SO_SNDBUFFORCE is never
> going to be called after this patch. This needs a v2.

I think this patch should be fine.

[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 678 bytes --]

diff --git a/src/mnl.c b/src/mnl.c
index 26f943dbb4c8..ee62c0c9c2a0 100644
--- a/src/mnl.c
+++ b/src/mnl.c
@@ -261,8 +261,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
 
 	/* Rise sender buffer length to avoid hitting -EMSGSIZE */
 	if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
-		       &newbuffsiz, sizeof(socklen_t)) < 0)
-		return;
+		       &newbuffsiz, sizeof(socklen_t)) < 0) {
+		/* Fall back to SO_SNDBUF, this never fails, kernel trims down
+		 * the size to net.core.wmem_max.
+		 */
+		setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
+			   &newbuffsiz, sizeof(socklen_t));
+	}
 }
 
 static unsigned int nlsndbufsiz;

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
  2023-04-08 23:09 ` Pablo Neira Ayuso
  2023-04-10  9:04   ` Pablo Neira Ayuso
@ 2023-04-10 18:03   ` Dave Pifke
  2023-04-18 10:10     ` Pablo Neira Ayuso
  1 sibling, 1 reply; 7+ messages in thread
From: Dave Pifke @ 2023-04-10 18:03 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

[-- Attachment #1: Type: text/plain, Size: 1121 bytes --]

Pablo Neira Ayuso <pablo@netfilter.org> writes:

> setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is
> specified by net.core.wmem_max

Oh, good catch!  Your revised patch LGTM, and is closer to what was
being done in the immediately proceeding function, mnl_set_rcvbuffer.

However, after thinking about it, I feel we should be checking the
receiver value after setsockopt returns.  If someone is running
e.g. AppArmor, it seems better to me to attempt the non-privileged
operation first, to avoid adding noise in the logs.

Also, I don't think there are any current situations where
SO_SNDBUFFORCE might also trim down the value, but after re-reading the
man page, I'm not sure the contract precludes that in the future.

Attached is a V3 patch for consideration, which also changes the code to
attempt the non-privileged SO_RCVBUF before SO_RCVBUFFORCE.  I defer to
your judgment on which version is actually better; I tested both and
they both work a) in a container where SO_SNDBUFFORCE fails, and b)
outside a container with wmem_max set to a small-ish value where
SO_SNDBUFFORCE is required.

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: so-sndbuf-v3.patch --]
[-- Type: text/x-diff, Size: 1945 bytes --]

diff --git a/src/mnl.c b/src/mnl.c
index 26f943db..dcc22b82 100644
--- a/src/mnl.c
+++ b/src/mnl.c
@@ -259,10 +259,19 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
 	if (newbuffsiz <= sndnlbuffsiz)
 		return;
 
-	/* Rise sender buffer length to avoid hitting -EMSGSIZE */
-	if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
-		       &newbuffsiz, sizeof(socklen_t)) < 0)
-		return;
+	/* Raise sender buffer length to avoid hitting -EMSGSIZE.  The kernel may
+	 * reduce this to /proc/sys/net/core/wmem_max, see socket(7).
+	 */
+	sndnlbuffsiz = newbuffsiz;
+	if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
+		   &sndnlbuffsiz, sizeof(socklen_t)) < 0 || sndnlbuffsiz < newbuffsiz) {
+		/* If SO_SNDBUF failed or the resulting size is still too small, try
+		 * again with SO_SNDBUFFORCE.  This requires CAP_NET_ADMIN.
+		 */
+		sndnlbuffsiz = newbuffsiz;
+		setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
+		   &sndnlbuffsiz, sizeof(socklen_t));
+	}
 }
 
 static unsigned int nlsndbufsiz;
@@ -280,14 +289,16 @@ static int mnl_set_rcvbuffer(const struct mnl_socket *nl, socklen_t bufsiz)
 	if (nlsndbufsiz >= bufsiz)
 		return 0;
 
-	ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUFFORCE,
-			 &bufsiz, sizeof(socklen_t));
-	if (ret < 0) {
-		/* If this doesn't work, try to reach the system wide maximum
-		 * (or whatever the user requested).
+	nlsndbufsiz = bufsiz;
+	ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUF,
+			 &nlsndbufsiz, sizeof(socklen_t));
+	if (ret < 0 || nlsndbufsiz < bufsiz) {
+		/* If this doesn't work, try again with SO_RCVBUFFORCE.  This requires
+		 * CAP_NET_ADMIN.
 		 */
-		ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUF,
-				 &bufsiz, sizeof(socklen_t));
+		nlsndbufsiz = bufsiz;
+		ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUFFORCE,
+				 &nlsndbufsiz, sizeof(socklen_t));
 	}
 
 	return ret;

[-- Attachment #3: Type: text/plain, Size: 33 bytes --]



-- 
Dave Pifke, dave@pifke.org

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
  2023-04-10 18:03   ` Dave Pifke
@ 2023-04-18 10:10     ` Pablo Neira Ayuso
  0 siblings, 0 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-18 10:10 UTC (permalink / raw)
  To: Dave Pifke; +Cc: netfilter-devel

On Mon, Apr 10, 2023 at 12:03:34PM -0600, Dave Pifke wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> writes:
> 
> > setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is
> > specified by net.core.wmem_max
> 
> Oh, good catch!  Your revised patch LGTM, and is closer to what was
> being done in the immediately proceeding function, mnl_set_rcvbuffer.
> 
> However, after thinking about it, I feel we should be checking the
> receiver value after setsockopt returns.  If someone is running
> e.g. AppArmor, it seems better to me to attempt the non-privileged
> operation first, to avoid adding noise in the logs.
> 
> Also, I don't think there are any current situations where
> SO_SNDBUFFORCE might also trim down the value, but after re-reading the
> man page, I'm not sure the contract precludes that in the future.
> 
> Attached is a V3 patch for consideration, which also changes the code to
> attempt the non-privileged SO_RCVBUF before SO_RCVBUFFORCE.  I defer to
> your judgment on which version is actually better; I tested both and
> they both work a) in a container where SO_SNDBUFFORCE fails, and b)
> outside a container with wmem_max set to a small-ish value where
> SO_SNDBUFFORCE is required.

Thanks for your patch.

setsockopt() does not update the &sndnlbuffsiz that is passed as
argument in Linux.

I have posted this patch:

https://patchwork.ozlabs.org/project/netfilter-devel/patch/20230418100223.158964-1-pablo@netfilter.org/

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2023-04-18 10:11 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke
2023-04-08 18:23 ` Pablo Neira Ayuso
2023-04-08 18:34   ` Dave Pifke
2023-04-08 23:09 ` Pablo Neira Ayuso
2023-04-10  9:04   ` Pablo Neira Ayuso
2023-04-10 18:03   ` Dave Pifke
2023-04-18 10:10     ` Pablo Neira Ayuso

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.