* [PATCH nft] src: mnl: fix --echo buffer size -- again @ 2019-09-09 22:19 Florian Westphal 2019-09-10 8:50 ` Pablo Neira Ayuso 0 siblings, 1 reply; 6+ messages in thread From: Florian Westphal @ 2019-09-09 22:19 UTC (permalink / raw) To: netfilter-devel; +Cc: Florian Westphal, Eric Garver Eric Garver reports: If this restart is triggered it causes rules to be duplicated. We send the same batch again. ... and indeed, if the batch isn't doing a full replace, we cannot resend. Therefore, remove the restart logic again. 1. If user passed --echo, use a 4mb buffer. 2. assume each element in the batch will result in a 1k notification and further increase limits if thats not enough. This still passes on s390x (the platform that did not work with the former, more conservative estimate). Next option (aside from increasing the guess again ...) is to add a commandline switch to nftables to allow userspace to override the buffer size. Fixes: 877baf9538f66f8f238 ("src: mnl: retry when we hit -ENOBUFS") Reported-by: Eric Garver <eric@garver.life> Signed-off-by: Florian Westphal <fw@strlen.de> --- src/mnl.c | 21 ++++++++++----------- 1 file changed, 10 insertions(+), 11 deletions(-) diff --git a/src/mnl.c b/src/mnl.c index 9c1f5356c9b9..d664564e16af 100644 --- a/src/mnl.c +++ b/src/mnl.c @@ -311,8 +311,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl); uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch); char rcv_buf[MNL_SOCKET_BUFFER_SIZE]; - unsigned int enobuf_restarts = 0; - size_t avg_msg_size, batch_size; const struct sockaddr_nl snl = { .nl_family = AF_NETLINK }; @@ -321,17 +319,22 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, .tv_usec = 0 }; struct iovec iov[iov_len]; - unsigned int scale = 4; struct msghdr msg = {}; fd_set readfds; mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch); - batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); - avg_msg_size = div_round_up(batch_size, num_cmds); + mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); -restart: - mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale); + if (nft_output_echo(&ctx->nft->output)) { + size_t buffer_size = MNL_SOCKET_BUFFER_SIZE * 1024; + size_t new_buffer_size = num_cmds * 1024; + + if (new_buffer_size > buffer_size) + buffer_size = new_buffer_size; + + mnl_set_rcvbuffer(ctx->nft->nf_sock, buffer_size); + } ret = mnl_nft_socket_sendmsg(ctx, &msg); if (ret == -1) @@ -351,10 +354,6 @@ restart: ret = mnl_socket_recvfrom(nl, rcv_buf, sizeof(rcv_buf)); if (ret == -1) { - if (errno == ENOBUFS && enobuf_restarts++ < 3) { - scale *= 2; - goto restart; - } return -1; } -- 2.21.0 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again 2019-09-09 22:19 [PATCH nft] src: mnl: fix --echo buffer size -- again Florian Westphal @ 2019-09-10 8:50 ` Pablo Neira Ayuso 2019-09-10 10:52 ` Florian Westphal 0 siblings, 1 reply; 6+ messages in thread From: Pablo Neira Ayuso @ 2019-09-10 8:50 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, Eric Garver On Tue, Sep 10, 2019 at 12:19:18AM +0200, Florian Westphal wrote: [...] > diff --git a/src/mnl.c b/src/mnl.c > index 9c1f5356c9b9..d664564e16af 100644 > --- a/src/mnl.c > +++ b/src/mnl.c > @@ -311,8 +311,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, > int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl); > uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch); > char rcv_buf[MNL_SOCKET_BUFFER_SIZE]; > - unsigned int enobuf_restarts = 0; > - size_t avg_msg_size, batch_size; > const struct sockaddr_nl snl = { > .nl_family = AF_NETLINK > }; > @@ -321,17 +319,22 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, > .tv_usec = 0 > }; > struct iovec iov[iov_len]; > - unsigned int scale = 4; > struct msghdr msg = {}; > fd_set readfds; > > mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch); > > - batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); > - avg_msg_size = div_round_up(batch_size, num_cmds); > + mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); > > -restart: > - mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale); > + if (nft_output_echo(&ctx->nft->output)) { > + size_t buffer_size = MNL_SOCKET_BUFFER_SIZE * 1024; > + size_t new_buffer_size = num_cmds * 1024; Probably all simplify this to? mnl_set_rcvbuffer(ctx->nft->nf_sock, (1 << 10) * num_cmds); Upper limit for us is NLMSG_GOODSIZE, which is not exposed to userspace, although we have control over that upper limit from nf_tables. In practise, I think we go over the 1 Kbytes per message boundary. > + > + if (new_buffer_size > buffer_size) > + buffer_size = new_buffer_size; > + > + mnl_set_rcvbuffer(ctx->nft->nf_sock, buffer_size); > + } > > ret = mnl_nft_socket_sendmsg(ctx, &msg); > if (ret == -1) ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again 2019-09-10 8:50 ` Pablo Neira Ayuso @ 2019-09-10 10:52 ` Florian Westphal 2019-09-10 11:22 ` Pablo Neira Ayuso 0 siblings, 1 reply; 6+ messages in thread From: Florian Westphal @ 2019-09-10 10:52 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, Eric Garver Pablo Neira Ayuso <pablo@netfilter.org> wrote: > On Tue, Sep 10, 2019 at 12:19:18AM +0200, Florian Westphal wrote: > [...] > > diff --git a/src/mnl.c b/src/mnl.c > > index 9c1f5356c9b9..d664564e16af 100644 > > --- a/src/mnl.c > > +++ b/src/mnl.c > > @@ -311,8 +311,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, > > int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl); > > uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch); > > char rcv_buf[MNL_SOCKET_BUFFER_SIZE]; > > - unsigned int enobuf_restarts = 0; > > - size_t avg_msg_size, batch_size; > > const struct sockaddr_nl snl = { > > .nl_family = AF_NETLINK > > }; > > @@ -321,17 +319,22 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, > > .tv_usec = 0 > > }; > > struct iovec iov[iov_len]; > > - unsigned int scale = 4; > > struct msghdr msg = {}; > > fd_set readfds; > > > > mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch); > > > > - batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); > > - avg_msg_size = div_round_up(batch_size, num_cmds); > > + mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); > > > > -restart: > > - mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale); > > + if (nft_output_echo(&ctx->nft->output)) { > > + size_t buffer_size = MNL_SOCKET_BUFFER_SIZE * 1024; > > + size_t new_buffer_size = num_cmds * 1024; > > Probably all simplify this to? > > mnl_set_rcvbuffer(ctx->nft->nf_sock, (1 << 10) * num_cmds); Reason for above patch was to avoid any risk for normal operations by restricting the recvbuffer tuning to echo-mode and also adding a lower thresh. For some reason I don't like the idea of setting only 1k recvbuf by default in the extreme case. That said, it does seem to work. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again 2019-09-10 10:52 ` Florian Westphal @ 2019-09-10 11:22 ` Pablo Neira Ayuso 2019-09-10 11:44 ` Florian Westphal 0 siblings, 1 reply; 6+ messages in thread From: Pablo Neira Ayuso @ 2019-09-10 11:22 UTC (permalink / raw) To: Florian Westphal; +Cc: netfilter-devel, Eric Garver [-- Attachment #1: Type: text/plain, Size: 2167 bytes --] On Tue, Sep 10, 2019 at 12:52:42PM +0200, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > On Tue, Sep 10, 2019 at 12:19:18AM +0200, Florian Westphal wrote: > > [...] > > > diff --git a/src/mnl.c b/src/mnl.c > > > index 9c1f5356c9b9..d664564e16af 100644 > > > --- a/src/mnl.c > > > +++ b/src/mnl.c > > > @@ -311,8 +311,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, > > > int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl); > > > uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch); > > > char rcv_buf[MNL_SOCKET_BUFFER_SIZE]; > > > - unsigned int enobuf_restarts = 0; > > > - size_t avg_msg_size, batch_size; > > > const struct sockaddr_nl snl = { > > > .nl_family = AF_NETLINK > > > }; > > > @@ -321,17 +319,22 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, > > > .tv_usec = 0 > > > }; > > > struct iovec iov[iov_len]; > > > - unsigned int scale = 4; > > > struct msghdr msg = {}; > > > fd_set readfds; > > > > > > mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch); > > > > > > - batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); > > > - avg_msg_size = div_round_up(batch_size, num_cmds); > > > + mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); > > > > > > -restart: > > > - mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale); > > > + if (nft_output_echo(&ctx->nft->output)) { > > > + size_t buffer_size = MNL_SOCKET_BUFFER_SIZE * 1024; > > > + size_t new_buffer_size = num_cmds * 1024; > > > > Probably all simplify this to? > > > > mnl_set_rcvbuffer(ctx->nft->nf_sock, (1 << 10) * num_cmds); > > Reason for above patch was to avoid any risk for normal operations by > restricting the recvbuffer tuning to echo-mode and also adding a > lower thresh. > > For some reason I don't like the idea of setting only 1k recvbuf by > default in the extreme case. I'd still like to keep setting the receive buffer for the non-echo case, a ruleset with lots of acknowledments (lots of errors) might hit ENOBUFS, I remember that was reproducible. Probably this? it's based on your patch. [-- Attachment #2: x.patch --] [-- Type: text/x-diff, Size: 2121 bytes --] diff --git a/src/mnl.c b/src/mnl.c index 9c1f5356c9b9..8031bd6add80 100644 --- a/src/mnl.c +++ b/src/mnl.c @@ -304,6 +304,8 @@ static ssize_t mnl_nft_socket_sendmsg(struct netlink_ctx *ctx, return sendmsg(mnl_socket_get_fd(ctx->nft->nf_sock), msg, 0); } +#define NFT_MNL_ECHO_RCVBUFF_DEFAULT (MNL_SOCKET_BUFFER_SIZE * 1024) + int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, uint32_t num_cmds) { @@ -311,8 +313,6 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, int ret, fd = mnl_socket_get_fd(nl), portid = mnl_socket_get_portid(nl); uint32_t iov_len = nftnl_batch_iovec_len(ctx->batch); char rcv_buf[MNL_SOCKET_BUFFER_SIZE]; - unsigned int enobuf_restarts = 0; - size_t avg_msg_size, batch_size; const struct sockaddr_nl snl = { .nl_family = AF_NETLINK }; @@ -321,17 +321,24 @@ int mnl_batch_talk(struct netlink_ctx *ctx, struct list_head *err_list, .tv_usec = 0 }; struct iovec iov[iov_len]; - unsigned int scale = 4; struct msghdr msg = {}; + unsigned int rcvbufsiz; + size_t batch_size; fd_set readfds; mnl_set_sndbuffer(ctx->nft->nf_sock, ctx->batch); batch_size = mnl_nft_batch_to_msg(ctx, &msg, &snl, iov, iov_len); - avg_msg_size = div_round_up(batch_size, num_cmds); -restart: - mnl_set_rcvbuffer(ctx->nft->nf_sock, num_cmds * avg_msg_size * scale); + if (nft_output_echo(&ctx->nft->output)) { + rcvbufsiz = num_cmds * 1024; + if (rcvbufsiz < NFT_MNL_ECHO_RCVBUFF_DEFAULT) + rcvbufsiz = NFT_MNL_ECHO_RCVBUFF_DEFAULT; + } else { + rcvbufsiz = num_cmds * div_round_up(batch_size, num_cmds) * 4; + } + + mnl_set_rcvbuffer(ctx->nft->nf_sock, rcvbufsiz); ret = mnl_nft_socket_sendmsg(ctx, &msg); if (ret == -1) @@ -350,13 +357,8 @@ restart: break; ret = mnl_socket_recvfrom(nl, rcv_buf, sizeof(rcv_buf)); - if (ret == -1) { - if (errno == ENOBUFS && enobuf_restarts++ < 3) { - scale *= 2; - goto restart; - } + if (ret == -1) return -1; - } ret = mnl_cb_run(rcv_buf, ret, 0, portid, &netlink_echo_callback, ctx); /* Continue on error, make sure we get all acknowledgments */ ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again 2019-09-10 11:22 ` Pablo Neira Ayuso @ 2019-09-10 11:44 ` Florian Westphal 2019-09-10 13:08 ` Eric Garver 0 siblings, 1 reply; 6+ messages in thread From: Florian Westphal @ 2019-09-10 11:44 UTC (permalink / raw) To: Pablo Neira Ayuso; +Cc: Florian Westphal, netfilter-devel, Eric Garver Pablo Neira Ayuso <pablo@netfilter.org> wrote: > I'd still like to keep setting the receive buffer for the non-echo > case, a ruleset with lots of acknowledments (lots of errors) might hit > ENOBUFS, I remember that was reproducible. > > Probably this? it's based on your patch. LGTM, feel free to apply this. ^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [PATCH nft] src: mnl: fix --echo buffer size -- again 2019-09-10 11:44 ` Florian Westphal @ 2019-09-10 13:08 ` Eric Garver 0 siblings, 0 replies; 6+ messages in thread From: Eric Garver @ 2019-09-10 13:08 UTC (permalink / raw) To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel On Tue, Sep 10, 2019 at 01:44:12PM +0200, Florian Westphal wrote: > Pablo Neira Ayuso <pablo@netfilter.org> wrote: > > I'd still like to keep setting the receive buffer for the non-echo > > case, a ruleset with lots of acknowledments (lots of errors) might hit > > ENOBUFS, I remember that was reproducible. > > > > Probably this? it's based on your patch. > > LGTM, feel free to apply this. Pablo's version passes all my tests. But so does Florian's version. Feel free to add my tested-by tag. Tested-by: Eric Garver <eric@garver.life> ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2019-09-10 13:08 UTC | newest] Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2019-09-09 22:19 [PATCH nft] src: mnl: fix --echo buffer size -- again Florian Westphal 2019-09-10 8:50 ` Pablo Neira Ayuso 2019-09-10 10:52 ` Florian Westphal 2019-09-10 11:22 ` Pablo Neira Ayuso 2019-09-10 11:44 ` Florian Westphal 2019-09-10 13:08 ` Eric Garver
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).