netfilter-devel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH nft] src: mnl: fix --echo buffer size -- again
@ 2019-09-09 22:19 Florian Westphal
  2019-09-10  8:50 ` Pablo Neira Ayuso
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2019-09-09 22:19 UTC (permalink / raw)
  To: netfilter-devel; +Cc: Florian Westphal, Eric Garver

Eric Garver reports:
   If this restart is triggered it causes rules to be duplicated. We send
   the same batch again.

... and indeed, if the batch isn't doing a full replace, we cannot resend.

Therefore, remove the restart logic again.

1. If user passed --echo, use a 4mb buffer.
2. assume each element in the batch will result in a 1k
notification and further increase limits if thats not enough.

This still passes on s390x (the platform that did not work with
the former, more conservative estimate).

Next option (aside from increasing the guess again ...) is to add a
commandline switch to nftables to allow userspace to override the
buffer size.

Fixes: 877baf9538f66f8f238 ("src: mnl: retry when we hit -ENOBUFS")
Reported-by: Eric Garver <eric@garver.life>
Signed-off-by: Florian Westphal <fw@strlen.de>
---
 src/mnl.c | 21 ++++++++++-----------
 1 file changed, 10 insertions(+), 11 deletions(-)

diff --git a/src/mnl.c b/src/mnl.c
index 9c1f5356c9b9..d664564e16af 100644
--- a/src/mnl.c
+++ b/src/mnl.c
@@ -311,8 +311,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
 	int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl);
 	uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch);
 	char rcv_buf[MNL_SOCKET_BUFFER_SIZE];
-	unsigned int enobuf_restarts = 0;
-	size_t avg_msg_size, batch_size;
 	const struct sockaddr_nl snl = {
 		.nl_family = AF_NETLINK
 	};
@@ -321,17 +319,22 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
 		.tv_usec	= 0
 	};
 	struct iovec iov[iov_len];
-	unsigned int scale = 4;
 	struct msghdr msg = {};
 	fd_set readfds;
 
 	mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch);
 
-	batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
-	avg_msg_size = div_round_up(batch_size, num_cmds);
+	mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
 
-restart:
-	mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale);
+	if (nft_output_echo(&ctx->nft->output)) {
+		size_t buffer_size = MNL_SOCKET_BUFFER_SIZE * 1024;
+		size_t new_buffer_size = num_cmds * 1024;
+
+		if (new_buffer_size > buffer_size)
+			buffer_size = new_buffer_size;
+
+		mnl_set_rcvbuffer(ctx->nft->nf_sock, buffer_size);
+	}
 
 	ret = mnl_nft_socket_sendmsg(ctx, &msg);
 	if (ret == -1)
@@ -351,10 +354,6 @@ restart:
 
 		ret = mnl_socket_recvfrom(nl, rcv_buf, sizeof(rcv_buf));
 		if (ret == -1) {
-			if (errno == ENOBUFS && enobuf_restarts++ < 3) {
-				scale *= 2;
-				goto restart;
-			}
 			return -1;
 		}
 
-- 
2.21.0


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again
  2019-09-09 22:19 [PATCH nft] src: mnl: fix --echo buffer size -- again Florian Westphal
@ 2019-09-10  8:50 ` Pablo Neira Ayuso
  2019-09-10 10:52   ` Florian Westphal
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-10  8:50 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Eric Garver

On Tue, Sep 10, 2019 at 12:19:18AM +0200, Florian Westphal wrote:
[...]
> diff --git a/src/mnl.c b/src/mnl.c
> index 9c1f5356c9b9..d664564e16af 100644
> --- a/src/mnl.c
> +++ b/src/mnl.c
> @@ -311,8 +311,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
>  	int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl);
>  	uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch);
>  	char rcv_buf[MNL_SOCKET_BUFFER_SIZE];
> -	unsigned int enobuf_restarts = 0;
> -	size_t avg_msg_size, batch_size;
>  	const struct sockaddr_nl snl = {
>  		.nl_family = AF_NETLINK
>  	};
> @@ -321,17 +319,22 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
>  		.tv_usec	= 0
>  	};
>  	struct iovec iov[iov_len];
> -	unsigned int scale = 4;
>  	struct msghdr msg = {};
>  	fd_set readfds;
>  
>  	mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch);
>  
> -	batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
> -	avg_msg_size = div_round_up(batch_size, num_cmds);
> +	mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
>  
> -restart:
> -	mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale);
> +	if (nft_output_echo(&ctx->nft->output)) {
> +		size_t buffer_size = MNL_SOCKET_BUFFER_SIZE * 1024;
> +		size_t new_buffer_size = num_cmds * 1024;

Probably all simplify this to?

		mnl_set_rcvbuffer(ctx->nft->nf_sock, (1 << 10) * num_cmds);

Upper limit for us is NLMSG_GOODSIZE, which is not exposed to
userspace, although we have control over that upper limit from
nf_tables. In practise, I think we go over the 1 Kbytes per message
boundary.

> +
> +		if (new_buffer_size > buffer_size)
> +			buffer_size = new_buffer_size;
> +
> +		mnl_set_rcvbuffer(ctx->nft->nf_sock, buffer_size);
> +	}
>  
>  	ret = mnl_nft_socket_sendmsg(ctx, &msg);
>  	if (ret == -1)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again
  2019-09-10  8:50 ` Pablo Neira Ayuso
@ 2019-09-10 10:52   ` Florian Westphal
  2019-09-10 11:22     ` Pablo Neira Ayuso
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2019-09-10 10:52 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, Eric Garver

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> On Tue, Sep 10, 2019 at 12:19:18AM +0200, Florian Westphal wrote:
> [...]
> > diff --git a/src/mnl.c b/src/mnl.c
> > index 9c1f5356c9b9..d664564e16af 100644
> > --- a/src/mnl.c
> > +++ b/src/mnl.c
> > @@ -311,8 +311,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
> >  	int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl);
> >  	uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch);
> >  	char rcv_buf[MNL_SOCKET_BUFFER_SIZE];
> > -	unsigned int enobuf_restarts = 0;
> > -	size_t avg_msg_size, batch_size;
> >  	const struct sockaddr_nl snl = {
> >  		.nl_family = AF_NETLINK
> >  	};
> > @@ -321,17 +319,22 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
> >  		.tv_usec	= 0
> >  	};
> >  	struct iovec iov[iov_len];
> > -	unsigned int scale = 4;
> >  	struct msghdr msg = {};
> >  	fd_set readfds;
> >  
> >  	mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch);
> >  
> > -	batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
> > -	avg_msg_size = div_round_up(batch_size, num_cmds);
> > +	mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
> >  
> > -restart:
> > -	mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale);
> > +	if (nft_output_echo(&ctx->nft->output)) {
> > +		size_t buffer_size = MNL_SOCKET_BUFFER_SIZE * 1024;
> > +		size_t new_buffer_size = num_cmds * 1024;
> 
> Probably all simplify this to?
> 
> 		mnl_set_rcvbuffer(ctx->nft->nf_sock, (1 << 10) * num_cmds);

Reason for above patch was to avoid any risk for normal operations by
restricting the recvbuffer tuning to echo-mode and also adding a
lower thresh.

For some reason I don't like the idea of setting only 1k recvbuf by
default in the extreme case.

That said, it does seem to work.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again
  2019-09-10 10:52   ` Florian Westphal
@ 2019-09-10 11:22     ` Pablo Neira Ayuso
  2019-09-10 11:44       ` Florian Westphal
  0 siblings, 1 reply; 6+ messages in thread
From: Pablo Neira Ayuso @ 2019-09-10 11:22 UTC (permalink / raw)
  To: Florian Westphal; +Cc: netfilter-devel, Eric Garver

[-- Attachment #1: Type: text/plain, Size: 2167 bytes --]

On Tue, Sep 10, 2019 at 12:52:42PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > On Tue, Sep 10, 2019 at 12:19:18AM +0200, Florian Westphal wrote:
> > [...]
> > > diff --git a/src/mnl.c b/src/mnl.c
> > > index 9c1f5356c9b9..d664564e16af 100644
> > > --- a/src/mnl.c
> > > +++ b/src/mnl.c
> > > @@ -311,8 +311,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
> > >  	int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl);
> > >  	uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch);
> > >  	char rcv_buf[MNL_SOCKET_BUFFER_SIZE];
> > > -	unsigned int enobuf_restarts = 0;
> > > -	size_t avg_msg_size, batch_size;
> > >  	const struct sockaddr_nl snl = {
> > >  		.nl_family = AF_NETLINK
> > >  	};
> > > @@ -321,17 +319,22 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
> > >  		.tv_usec	= 0
> > >  	};
> > >  	struct iovec iov[iov_len];
> > > -	unsigned int scale = 4;
> > >  	struct msghdr msg = {};
> > >  	fd_set readfds;
> > >  
> > >  	mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch);
> > >  
> > > -	batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
> > > -	avg_msg_size = div_round_up(batch_size, num_cmds);
> > > +	mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
> > >  
> > > -restart:
> > > -	mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale);
> > > +	if (nft_output_echo(&ctx->nft->output)) {
> > > +		size_t buffer_size = MNL_SOCKET_BUFFER_SIZE * 1024;
> > > +		size_t new_buffer_size = num_cmds * 1024;
> > 
> > Probably all simplify this to?
> > 
> > 		mnl_set_rcvbuffer(ctx->nft->nf_sock, (1 << 10) * num_cmds);
> 
> Reason for above patch was to avoid any risk for normal operations by
> restricting the recvbuffer tuning to echo-mode and also adding a
> lower thresh.
> 
> For some reason I don't like the idea of setting only 1k recvbuf by
> default in the extreme case.

I'd still like to keep setting the receive buffer for the non-echo
case, a ruleset with lots of acknowledments (lots of errors) might hit
ENOBUFS, I remember that was reproducible.

Probably this? it's based on your patch.

[-- Attachment #2: x.patch --]
[-- Type: text/x-diff, Size: 2121 bytes --]

diff --git a/src/mnl.c b/src/mnl.c
index 9c1f5356c9b9..8031bd6add80 100644
--- a/src/mnl.c
+++ b/src/mnl.c
@@ -304,6 +304,8 @@ static ssize_t mnl_nft_socket_sendmsg(struct netlink_ctx *ctx,
 	return sendmsg(mnl_socket_get_fd(ctx->nft->nf_sock), msg, 0);
 }
 
+#define NFT_MNL_ECHO_RCVBUFF_DEFAULT	(MNL_SOCKET_BUFFER_SIZE * 1024)
+
 int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
 		   uint32_t num_cmds)
 {
@@ -311,8 +313,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
 	int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl);
 	uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch);
 	char rcv_buf[MNL_SOCKET_BUFFER_SIZE];
-	unsigned int enobuf_restarts = 0;
-	size_t avg_msg_size, batch_size;
 	const struct sockaddr_nl snl = {
 		.nl_family = AF_NETLINK
 	};
@@ -321,17 +321,24 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list,
 		.tv_usec	= 0
 	};
 	struct iovec iov[iov_len];
-	unsigned int scale = 4;
 	struct msghdr msg = {};
+	unsigned int rcvbufsiz;
+	size_t batch_size;
 	fd_set readfds;
 
 	mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch);
 
 	batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len);
-	avg_msg_size = div_round_up(batch_size, num_cmds);
 
-restart:
-	mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale);
+	if (nft_output_echo(&ctx->nft->output)) {
+		rcvbufsiz = num_cmds * 1024;
+		if (rcvbufsiz < NFT_MNL_ECHO_RCVBUFF_DEFAULT)
+			rcvbufsiz = NFT_MNL_ECHO_RCVBUFF_DEFAULT;
+	} else {
+		rcvbufsiz = num_cmds * div_round_up(batch_size, num_cmds) * 4;
+	}
+
+	mnl_set_rcvbuffer(ctx->nft->nf_sock, rcvbufsiz);
 
 	ret = mnl_nft_socket_sendmsg(ctx, &msg);
 	if (ret == -1)
@@ -350,13 +357,8 @@ restart:
 			break;
 
 		ret = mnl_socket_recvfrom(nl, rcv_buf, sizeof(rcv_buf));
-		if (ret == -1) {
-			if (errno == ENOBUFS && enobuf_restarts++ < 3) {
-				scale *= 2;
-				goto restart;
-			}
+		if (ret == -1)
 			return -1;
-		}
 
 		ret = mnl_cb_run(rcv_buf, ret, 0, portid, &netlink_echo_callback, ctx);
 		/* Continue on error, make sure we get all acknowledgments */

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again
  2019-09-10 11:22     ` Pablo Neira Ayuso
@ 2019-09-10 11:44       ` Florian Westphal
  2019-09-10 13:08         ` Eric Garver
  0 siblings, 1 reply; 6+ messages in thread
From: Florian Westphal @ 2019-09-10 11:44 UTC (permalink / raw)
  To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, Eric Garver

Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> I'd still like to keep setting the receive buffer for the non-echo
> case, a ruleset with lots of acknowledments (lots of errors) might hit
> ENOBUFS, I remember that was reproducible.
> 
> Probably this? it's based on your patch.

LGTM, feel free to apply this.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again
  2019-09-10 11:44       ` Florian Westphal
@ 2019-09-10 13:08         ` Eric Garver
  0 siblings, 0 replies; 6+ messages in thread
From: Eric Garver @ 2019-09-10 13:08 UTC (permalink / raw)
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel

On Tue, Sep 10, 2019 at 01:44:12PM +0200, Florian Westphal wrote:
> Pablo Neira Ayuso <pablo@netfilter.org> wrote:
> > I'd still like to keep setting the receive buffer for the non-echo
> > case, a ruleset with lots of acknowledments (lots of errors) might hit
> > ENOBUFS, I remember that was reproducible.
> > 
> > Probably this? it's based on your patch.
> 
> LGTM, feel free to apply this.

Pablo's version passes all my tests. But so does Florian's version.
Feel free to add my tested-by tag.

Tested-by: Eric Garver <eric@garver.life>

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2019-09-10 13:08 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-09-09 22:19 [PATCH nft] src: mnl: fix --echo buffer size -- again Florian Westphal
2019-09-10  8:50 ` Pablo Neira Ayuso
2019-09-10 10:52   ` Florian Westphal
2019-09-10 11:22     ` Pablo Neira Ayuso
2019-09-10 11:44       ` Florian Westphal
2019-09-10 13:08         ` Eric Garver

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).