* [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
@ 2023-04-07 22:21 Dave Pifke
2023-04-08 18:23 ` Pablo Neira Ayuso
2023-04-08 23:09 ` Pablo Neira Ayuso
0 siblings, 2 replies; 7+ messages in thread
From: Dave Pifke @ 2023-04-07 22:21 UTC (permalink / raw)
To: netfilter-devel
Prior to this patch, nft inside a systemd-nspawn container was failing
to install my ruleset (which includes a large-ish map), with the error
netlink: Error: Could not process rule: Message too long
strace reveals:
setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted)
This is despite the nspawn process supposedly having CAP_NET_ADMIN,
and despite /proc/sys/net/core/wmem_max (in the main host namespace)
being set larger than the requested size:
net.core.wmem_max = 16777216
A web search reveals at least one other user having the same issue:
https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/
After this patch, nft succeeds.
---
src/mnl.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/src/mnl.c b/src/mnl.c
index 26f943db..ab6750c8 100644
--- a/src/mnl.c
+++ b/src/mnl.c
@@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
return;
/* Rise sender buffer length to avoid hitting -EMSGSIZE */
+ if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
+ &newbuffsiz, sizeof(socklen_t)) == 0)
+ return;
+
+ /* If the above fails (probably because it exceeds
+ * /proc/sys/net/core/wmem_max), try again with SO_SNDBUFFORCE.
+ * This requires CAP_NET_ADMIN. */
if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
&newbuffsiz, sizeof(socklen_t)) < 0)
return;
--
2.20.1
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke
@ 2023-04-08 18:23 ` Pablo Neira Ayuso
2023-04-08 18:34 ` Dave Pifke
2023-04-08 23:09 ` Pablo Neira Ayuso
1 sibling, 1 reply; 7+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-08 18:23 UTC (permalink / raw)
To: Dave Pifke; +Cc: netfilter-devel
On Fri, Apr 07, 2023 at 04:21:57PM -0600, Dave Pifke wrote:
> Prior to this patch, nft inside a systemd-nspawn container was failing
> to install my ruleset (which includes a large-ish map), with the error
>
> netlink: Error: Could not process rule: Message too long
>
> strace reveals:
>
> setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted)
>
> This is despite the nspawn process supposedly having CAP_NET_ADMIN,
> and despite /proc/sys/net/core/wmem_max (in the main host namespace)
> being set larger than the requested size:
>
> net.core.wmem_max = 16777216
>
> A web search reveals at least one other user having the same issue:
>
> https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/
>
> After this patch, nft succeeds.
Patch LGTM.
May I add your Signed-off-by: tag to this patch before applying it?
Thanks.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
2023-04-08 18:23 ` Pablo Neira Ayuso
@ 2023-04-08 18:34 ` Dave Pifke
0 siblings, 0 replies; 7+ messages in thread
From: Dave Pifke @ 2023-04-08 18:34 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel
Pablo Neira Ayuso <pablo@netfilter.org> writes:
> May I add your Signed-off-by: tag to this patch before applying it?
Yes. Thank you!
--
Dave Pifke, dave@pifke.org
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke
2023-04-08 18:23 ` Pablo Neira Ayuso
@ 2023-04-08 23:09 ` Pablo Neira Ayuso
2023-04-10 9:04 ` Pablo Neira Ayuso
2023-04-10 18:03 ` Dave Pifke
1 sibling, 2 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-08 23:09 UTC (permalink / raw)
To: Dave Pifke; +Cc: netfilter-devel
Hi again,
Let me revisit this.
On Fri, Apr 07, 2023 at 04:21:57PM -0600, Dave Pifke wrote:
> Prior to this patch, nft inside a systemd-nspawn container was failing
> to install my ruleset (which includes a large-ish map), with the error
>
> netlink: Error: Could not process rule: Message too long
>
> strace reveals:
>
> setsockopt(3, SOL_SOCKET, SO_SNDBUFFORCE, [524288], 4) = -1 EPERM (Operation not permitted)
>
> This is despite the nspawn process supposedly having CAP_NET_ADMIN,
> and despite /proc/sys/net/core/wmem_max (in the main host namespace)
> being set larger than the requested size:
>
> net.core.wmem_max = 16777216
OK, so you indeed increased net.core.wmem_max on the host namespace.
> A web search reveals at least one other user having the same issue:
>
> https://old.reddit.com/r/Proxmox/comments/scnoav/lxc_container_debian_11_nftables_geoblocking/
>
> After this patch, nft succeeds.
> ---
> src/mnl.c | 7 +++++++
> 1 file changed, 7 insertions(+)
>
> diff --git a/src/mnl.c b/src/mnl.c
> index 26f943db..ab6750c8 100644
> --- a/src/mnl.c
> +++ b/src/mnl.c
> @@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
> return;
>
> /* Rise sender buffer length to avoid hitting -EMSGSIZE */
> + if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
> + &newbuffsiz, sizeof(socklen_t)) == 0)
> + return;
setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is
specified by net.core.wmem_max
This needs to call:
setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
&newbuffsiz, sizeof(socklen_t));
without checking the return value. Otherwise, SO_SNDBUFFORCE is never
going to be called after this patch. This needs a v2.
On top of this patch, you still needed to increase net.core.wmem_max
in your host container for this to work.
> + /* If the above fails (probably because it exceeds
> + * /proc/sys/net/core/wmem_max), try again with SO_SNDBUFFORCE.
> + * This requires CAP_NET_ADMIN. */
> if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
> &newbuffsiz, sizeof(socklen_t)) < 0)
> return;
> --
> 2.20.1
>
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
2023-04-08 23:09 ` Pablo Neira Ayuso
@ 2023-04-10 9:04 ` Pablo Neira Ayuso
2023-04-10 18:03 ` Dave Pifke
1 sibling, 0 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-10 9:04 UTC (permalink / raw)
To: Dave Pifke; +Cc: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 897 bytes --]
Hi,
On Sun, Apr 09, 2023 at 01:09:07AM +0200, Pablo Neira Ayuso wrote:
> > diff --git a/src/mnl.c b/src/mnl.c
> > index 26f943db..ab6750c8 100644
> > --- a/src/mnl.c
> > +++ b/src/mnl.c
> > @@ -260,6 +260,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
> > return;
> >
> > /* Rise sender buffer length to avoid hitting -EMSGSIZE */
> > + if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
> > + &newbuffsiz, sizeof(socklen_t)) == 0)
> > + return;
>
> setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is
> specified by net.core.wmem_max
>
> This needs to call:
>
> setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
> &newbuffsiz, sizeof(socklen_t));
>
> without checking the return value. Otherwise, SO_SNDBUFFORCE is never
> going to be called after this patch. This needs a v2.
I think this patch should be fine.
[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 678 bytes --]
diff --git a/src/mnl.c b/src/mnl.c
index 26f943dbb4c8..ee62c0c9c2a0 100644
--- a/src/mnl.c
+++ b/src/mnl.c
@@ -261,8 +261,13 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
/* Rise sender buffer length to avoid hitting -EMSGSIZE */
if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
- &newbuffsiz, sizeof(socklen_t)) < 0)
- return;
+ &newbuffsiz, sizeof(socklen_t)) < 0) {
+ /* Fall back to SO_SNDBUF, this never fails, kernel trims down
+ * the size to net.core.wmem_max.
+ */
+ setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
+ &newbuffsiz, sizeof(socklen_t));
+ }
}
static unsigned int nlsndbufsiz;
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
2023-04-08 23:09 ` Pablo Neira Ayuso
2023-04-10 9:04 ` Pablo Neira Ayuso
@ 2023-04-10 18:03 ` Dave Pifke
2023-04-18 10:10 ` Pablo Neira Ayuso
1 sibling, 1 reply; 7+ messages in thread
From: Dave Pifke @ 2023-04-10 18:03 UTC (permalink / raw)
To: Pablo Neira Ayuso; +Cc: netfilter-devel
[-- Attachment #1: Type: text/plain, Size: 1121 bytes --]
Pablo Neira Ayuso <pablo@netfilter.org> writes:
> setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is
> specified by net.core.wmem_max
Oh, good catch! Your revised patch LGTM, and is closer to what was
being done in the immediately proceeding function, mnl_set_rcvbuffer.
However, after thinking about it, I feel we should be checking the
receiver value after setsockopt returns. If someone is running
e.g. AppArmor, it seems better to me to attempt the non-privileged
operation first, to avoid adding noise in the logs.
Also, I don't think there are any current situations where
SO_SNDBUFFORCE might also trim down the value, but after re-reading the
man page, I'm not sure the contract precludes that in the future.
Attached is a V3 patch for consideration, which also changes the code to
attempt the non-privileged SO_RCVBUF before SO_RCVBUFFORCE. I defer to
your judgment on which version is actually better; I tested both and
they both work a) in a container where SO_SNDBUFFORCE fails, and b)
outside a container with wmem_max set to a small-ish value where
SO_SNDBUFFORCE is required.
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: so-sndbuf-v3.patch --]
[-- Type: text/x-diff, Size: 1945 bytes --]
diff --git a/src/mnl.c b/src/mnl.c
index 26f943db..dcc22b82 100644
--- a/src/mnl.c
+++ b/src/mnl.c
@@ -259,10 +259,19 @@ static void mnl_set_sndbuffer(const struct mnl_socket *nl,
if (newbuffsiz <= sndnlbuffsiz)
return;
- /* Rise sender buffer length to avoid hitting -EMSGSIZE */
- if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
- &newbuffsiz, sizeof(socklen_t)) < 0)
- return;
+ /* Raise sender buffer length to avoid hitting -EMSGSIZE. The kernel may
+ * reduce this to /proc/sys/net/core/wmem_max, see socket(7).
+ */
+ sndnlbuffsiz = newbuffsiz;
+ if (setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUF,
+ &sndnlbuffsiz, sizeof(socklen_t)) < 0 || sndnlbuffsiz < newbuffsiz) {
+ /* If SO_SNDBUF failed or the resulting size is still too small, try
+ * again with SO_SNDBUFFORCE. This requires CAP_NET_ADMIN.
+ */
+ sndnlbuffsiz = newbuffsiz;
+ setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_SNDBUFFORCE,
+ &sndnlbuffsiz, sizeof(socklen_t));
+ }
}
static unsigned int nlsndbufsiz;
@@ -280,14 +289,16 @@ static int mnl_set_rcvbuffer(const struct mnl_socket *nl, socklen_t bufsiz)
if (nlsndbufsiz >= bufsiz)
return 0;
- ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUFFORCE,
- &bufsiz, sizeof(socklen_t));
- if (ret < 0) {
- /* If this doesn't work, try to reach the system wide maximum
- * (or whatever the user requested).
+ nlsndbufsiz = bufsiz;
+ ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUF,
+ &nlsndbufsiz, sizeof(socklen_t));
+ if (ret < 0 || nlsndbufsiz < bufsiz) {
+ /* If this doesn't work, try again with SO_RCVBUFFORCE. This requires
+ * CAP_NET_ADMIN.
*/
- ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUF,
- &bufsiz, sizeof(socklen_t));
+ nlsndbufsiz = bufsiz;
+ ret = setsockopt(mnl_socket_get_fd(nl), SOL_SOCKET, SO_RCVBUFFORCE,
+ &nlsndbufsiz, sizeof(socklen_t));
}
return ret;
[-- Attachment #3: Type: text/plain, Size: 33 bytes --]
--
Dave Pifke, dave@pifke.org
^ permalink raw reply related [flat|nested] 7+ messages in thread
* Re: [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE
2023-04-10 18:03 ` Dave Pifke
@ 2023-04-18 10:10 ` Pablo Neira Ayuso
0 siblings, 0 replies; 7+ messages in thread
From: Pablo Neira Ayuso @ 2023-04-18 10:10 UTC (permalink / raw)
To: Dave Pifke; +Cc: netfilter-devel
On Mon, Apr 10, 2023 at 12:03:34PM -0600, Dave Pifke wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> writes:
>
> > setsockopt() with SO_SNDBUF never fails: it trims the newbuffsiz that is
> > specified by net.core.wmem_max
>
> Oh, good catch! Your revised patch LGTM, and is closer to what was
> being done in the immediately proceeding function, mnl_set_rcvbuffer.
>
> However, after thinking about it, I feel we should be checking the
> receiver value after setsockopt returns. If someone is running
> e.g. AppArmor, it seems better to me to attempt the non-privileged
> operation first, to avoid adding noise in the logs.
>
> Also, I don't think there are any current situations where
> SO_SNDBUFFORCE might also trim down the value, but after re-reading the
> man page, I'm not sure the contract precludes that in the future.
>
> Attached is a V3 patch for consideration, which also changes the code to
> attempt the non-privileged SO_RCVBUF before SO_RCVBUFFORCE. I defer to
> your judgment on which version is actually better; I tested both and
> they both work a) in a container where SO_SNDBUFFORCE fails, and b)
> outside a container with wmem_max set to a small-ish value where
> SO_SNDBUFFORCE is required.
Thanks for your patch.
setsockopt() does not update the &sndnlbuffsiz that is passed as
argument in Linux.
I have posted this patch:
https://patchwork.ozlabs.org/project/netfilter-devel/patch/20230418100223.158964-1-pablo@netfilter.org/
^ permalink raw reply [flat|nested] 7+ messages in thread
end of thread, other threads:[~2023-04-18 10:11 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-04-07 22:21 [PATCH] src: try SO_SNDBUF before SO_SNDBUFFORCE Dave Pifke
2023-04-08 18:23 ` Pablo Neira Ayuso
2023-04-08 18:34 ` Dave Pifke
2023-04-08 23:09 ` Pablo Neira Ayuso
2023-04-10 9:04 ` Pablo Neira Ayuso
2023-04-10 18:03 ` Dave Pifke
2023-04-18 10:10 ` Pablo Neira Ayuso
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.