* [iptables PATCH] nft: Increase BATCH_PAGE_SIZE to support huge rulesets
From: Phil Sutter @ 2021-04-01 14:53 UTC
  To: Pablo Neira Ayuso; +Cc: netfilter-devel

In order to support the same ruleset sizes as legacy iptables, the
kernel's limit of 1024 iovecs has to be overcome. Therefore increase
each iovec's size from 256KB to 4MB.

While at it, add a log message for a failing sendmsg() call. This is
not supposed to happen, even if the transaction fails. Yet if it does,
users are left with only a "line XXX failed" message (with the line
number being that of the COMMIT line).

Signed-off-by: Phil Sutter <phil@nwl.cc>
---
 iptables/nft.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/iptables/nft.c b/iptables/nft.c
index bd840e75f83f4..e19c88ece6c2a 100644
--- a/iptables/nft.c
+++ b/iptables/nft.c
@@ -88,11 +88,11 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
 
 #define NFT_NLMSG_MAXSIZE (UINT16_MAX + getpagesize())
 
-/* selected batch page is 256 Kbytes long to load ruleset of
- * half a million rules without hitting -EMSGSIZE due to large
- * iovec.
+/* Selected batch page is 4 Mbytes long to support loading a ruleset of 3.5M
+ * rules matching on source and destination address as well as input and output
+ * interfaces. This is what legacy iptables supports.
  */
-#define BATCH_PAGE_SIZE getpagesize() * 32
+#define BATCH_PAGE_SIZE getpagesize() * 512
 
 static struct nftnl_batch *mnl_batch_init(void)
 {
@@ -220,8 +220,10 @@ static int mnl_batch_talk(struct nft_handle *h, int numcmds)
 	int err = 0;
 
 	ret = mnl_nft_socket_sendmsg(h, numcmds);
-	if (ret == -1)
+	if (ret == -1) {
+		fprintf(stderr, "sendmsg() failed: %s\n", strerror(errno));
 		return -1;
+	}
 
 	FD_ZERO(&readfds);
 	FD_SET(fd, &readfds);
-- 
2.31.0
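
The arithmetic behind the commit message can be sketched as follows. The
1024-iovec limit and the 256KB and 4MB figures are taken from the message
above; the helper name max_batch_bytes() is made up for illustration and
does not exist in iptables.

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-side cap on iovecs per sendmsg(), as quoted in the commit message. */
#define MAX_IOVECS 1024

/* Hypothetical helper: upper bound on the size of one batch if every
 * iovec carries a full batch page of 'batch_page_size' bytes. */
static uint64_t max_batch_bytes(uint64_t batch_page_size)
{
	/* Guard against overflow of the 64-bit product. */
	assert(batch_page_size <= UINT64_MAX / MAX_IOVECS);
	return (uint64_t)MAX_IOVECS * batch_page_size;
}
```

Under these assumptions, 256KB batch pages cap a transaction at 256MB,
while 4MB batch pages raise the cap to 4GB.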


* Re: [iptables PATCH] nft: Increase BATCH_PAGE_SIZE to support huge rulesets
From: Florian Westphal @ 2021-04-02  5:38 UTC
  To: Phil Sutter; +Cc: Pablo Neira Ayuso, netfilter-devel

Phil Sutter <phil@nwl.cc> wrote:
> @@ -88,11 +88,11 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
>  
>  #define NFT_NLMSG_MAXSIZE (UINT16_MAX + getpagesize())
>  
> -/* selected batch page is 256 Kbytes long to load ruleset of
> - * half a million rules without hitting -EMSGSIZE due to large
> - * iovec.
> +/* Selected batch page is 4 Mbytes long to support loading a ruleset of 3.5M
> + * rules matching on source and destination address as well as input and output
> + * interfaces. This is what legacy iptables supports.
>   */
> -#define BATCH_PAGE_SIZE getpagesize() * 32
> +#define BATCH_PAGE_SIZE getpagesize() * 512

Why not remove getpagesize() altogether?

The comment assumes getpagesize returns 4096 so might as well just use
"#define BATCH_PAGE_SIZE  (4 * 1024 * 1024)" or similar?

On my system getpagesize() * 512 yields 2097152 ...
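
The numbers can be checked with a short sketch. It assumes a 4096-byte
page, as on the system described above; batch_page_bytes() is an
illustrative helper and not part of iptables. The macro shows the fixed,
parenthesized form along the lines of the suggestion above.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-size variant: independent of the page size, and parenthesized
 * so that it expands safely inside larger expressions. */
#define BATCH_PAGE_SIZE_FIXED (4 * 1024 * 1024)

/* Illustrative helper: bytes covered by 'pages' pages of 'page_size' bytes. */
static size_t batch_page_bytes(size_t page_size, size_t pages)
{
	assert(page_size > 0);
	return page_size * pages;
}
```

With 4k pages, batch_page_bytes(4096, 32) is 131072 (128KB) and
batch_page_bytes(4096, 512) is 2097152 (2MB), so neither the old 256KB
comment nor the new 4MB one matches. Separately, an unparenthesized
"getpagesize() * 512" would misparse in a hypothetical expression such as
"x / BATCH_PAGE_SIZE", which expands to "(x / getpagesize()) * 512".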

>  static struct nftnl_batch *mnl_batch_init(void)
>  {
> @@ -220,8 +220,10 @@ static int mnl_batch_talk(struct nft_handle *h, int numcmds)
>  	int err = 0;
>  
>  	ret = mnl_nft_socket_sendmsg(h, numcmds);
> -	if (ret == -1)
> +	if (ret == -1) {
> +		fprintf(stderr, "sendmsg() failed: %s\n", strerror(errno));
>  		return -1;
> +	}

Isn't that library code?  At the very least this should use
nft_print().
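
One way to keep such a message away from a hard-coded stderr is to format
it into a caller-supplied buffer and let the caller decide where it goes.
The sketch below is hypothetical: format_send_error() is not an iptables
or nftables function and does not reproduce nft_print(); it only reuses
the message text from the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format the patch's error message into 'buf'.
 * 'err' is the errno value saved right after the failing sendmsg().
 * Returns what snprintf() returns: the length the full message needs. */
static int format_send_error(char *buf, size_t len, int err)
{
	assert(buf != NULL && len > 0);
	return snprintf(buf, len, "sendmsg() failed: %s", strerror(err));
}
```

A caller could then pass the buffer to whatever output routine fits its
context, for example a library's own print hook or plain fprintf().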

* Re: [iptables PATCH] nft: Increase BATCH_PAGE_SIZE to support huge rulesets
From: Pablo Neira Ayuso @ 2021-04-02  6:47 UTC
  To: Florian Westphal; +Cc: Phil Sutter, netfilter-devel

On Fri, Apr 02, 2021 at 07:38:10AM +0200, Florian Westphal wrote:
> Phil Sutter <phil@nwl.cc> wrote:
> > @@ -88,11 +88,11 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
> >  
> >  #define NFT_NLMSG_MAXSIZE (UINT16_MAX + getpagesize())
> >  
> > -/* selected batch page is 256 Kbytes long to load ruleset of
> > - * half a million rules without hitting -EMSGSIZE due to large
> > - * iovec.
> > +/* Selected batch page is 4 Mbytes long to support loading a ruleset of 3.5M
> > + * rules matching on source and destination address as well as input and output
> > + * interfaces. This is what legacy iptables supports.
> >   */
> > -#define BATCH_PAGE_SIZE getpagesize() * 32
> > +#define BATCH_PAGE_SIZE getpagesize() * 512
> 
> Why not remove getpagesize() altogether?
> 
> The comment assumes getpagesize returns 4096 so might as well just use
> "#define BATCH_PAGE_SIZE  (4 * 1024 * 1024)" or similar?

Agreed.

> On my system getpagesize() * 512 yields 2097152 ...
>
> >  static struct nftnl_batch *mnl_batch_init(void)
> >  {
> > @@ -220,8 +220,10 @@ static int mnl_batch_talk(struct nft_handle *h, int numcmds)
> >  	int err = 0;
> >  
> >  	ret = mnl_nft_socket_sendmsg(h, numcmds);
> > -	if (ret == -1)
> > +	if (ret == -1) {
> > +		fprintf(stderr, "sendmsg() failed: %s\n", strerror(errno));
> >  		return -1;
> > +	}
> 
> Isn't that library code?  At the very least this should use
> nft_print().

I'm not sure this update is required. EMSGSIZE should only come from
sendmsg() in the mnl_batch_talk() path.

* Re: [iptables PATCH] nft: Increase BATCH_PAGE_SIZE to support huge rulesets
From: Phil Sutter @ 2021-04-03  8:49 UTC
  To: Florian Westphal; +Cc: Pablo Neira Ayuso, netfilter-devel

Hi,

On Fri, Apr 02, 2021 at 07:38:10AM +0200, Florian Westphal wrote:
> Phil Sutter <phil@nwl.cc> wrote:
> > @@ -88,11 +88,11 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
> >  
> >  #define NFT_NLMSG_MAXSIZE (UINT16_MAX + getpagesize())
> >  
> > -/* selected batch page is 256 Kbytes long to load ruleset of
> > - * half a million rules without hitting -EMSGSIZE due to large
> > - * iovec.
> > +/* Selected batch page is 4 Mbytes long to support loading a ruleset of 3.5M
> > + * rules matching on source and destination address as well as input and output
> > + * interfaces. This is what legacy iptables supports.
> >   */
> > -#define BATCH_PAGE_SIZE getpagesize() * 32
> > +#define BATCH_PAGE_SIZE getpagesize() * 512
> 
> Why not remove getpagesize() altogether?

Yes, why not. At least I couldn't find a reason in git log why it's
there in the first place.

> The comment assumes getpagesize returns 4096 so might as well just use
> "#define BATCH_PAGE_SIZE  (4 * 1024 * 1024)" or similar?
> 
> On my system getpagesize() * 512 yields 2097152 ...

Thanks for digging deeper, my comment was wrong. I believed the old
comment and assumed getpagesize() would return 256k / 32 = 8k but indeed
it returns 4k.

> >  static struct nftnl_batch *mnl_batch_init(void)
> >  {
> > @@ -220,8 +220,10 @@ static int mnl_batch_talk(struct nft_handle *h, int numcmds)
> >  	int err = 0;
> >  
> >  	ret = mnl_nft_socket_sendmsg(h, numcmds);
> > -	if (ret == -1)
> > +	if (ret == -1) {
> > +		fprintf(stderr, "sendmsg() failed: %s\n", strerror(errno));
> >  		return -1;
> > +	}
> 
> Isn't that library code?  At the very least this should use
> nft_print().

Good point, but for the upcoming identical change to nftables! ;)
There I'm still undecided about the best way to handle it. For iptables,
I guess this minimal error reporting to stderr for a case that shouldn't
happen is fine. ACK?

Thanks, Phil

* Re: [iptables PATCH] nft: Increase BATCH_PAGE_SIZE to support huge rulesets
From: Pablo Neira Ayuso @ 2021-04-03 10:23 UTC
  To: Phil Sutter, Florian Westphal, netfilter-devel

On Sat, Apr 03, 2021 at 10:49:40AM +0200, Phil Sutter wrote:
> Hi,
> 
> On Fri, Apr 02, 2021 at 07:38:10AM +0200, Florian Westphal wrote:
> > Phil Sutter <phil@nwl.cc> wrote:
> > > @@ -88,11 +88,11 @@ int mnl_talk(struct nft_handle *h, struct nlmsghdr *nlh,
> > >  
> > >  #define NFT_NLMSG_MAXSIZE (UINT16_MAX + getpagesize())
> > >  
> > > -/* selected batch page is 256 Kbytes long to load ruleset of
> > > - * half a million rules without hitting -EMSGSIZE due to large
> > > - * iovec.
> > > +/* Selected batch page is 4 Mbytes long to support loading a ruleset of 3.5M
> > > + * rules matching on source and destination address as well as input and output
> > > + * interfaces. This is what legacy iptables supports.
> > >   */
> > > -#define BATCH_PAGE_SIZE getpagesize() * 32
> > > +#define BATCH_PAGE_SIZE getpagesize() * 512
> > 
> > Why not remove getpagesize() altogether?
> 
> Yes, why not. At least I couldn't find a reason in git log why it's
> there in the first place.
> 
> > The comment assumes getpagesize returns 4096 so might as well just use
> > "#define BATCH_PAGE_SIZE  (4 * 1024 * 1024)" or similar?
> > 
> > On my system getpagesize() * 512 yields 2097152 ...
> 
> Thanks for digging deeper, my comment was wrong. I believed the old
> comment and assumed getpagesize() would return 256k / 32 = 8k but indeed
> it returns 4k.

NLMSG_GOODSIZE is not exposed to userspace, I was using it as
reference. NLMSG_GOODSIZE is PAGE_SIZE.

* Re: [iptables PATCH] nft: Increase BATCH_PAGE_SIZE to support huge rulesets
From: Florian Westphal @ 2021-04-03 11:31 UTC
  To: Phil Sutter, Florian Westphal, Pablo Neira Ayuso, netfilter-devel

Phil Sutter <phil@nwl.cc> wrote:
> > Isn't that library code?  At the very least this should use
> > nft_print().
> 
> Good point, but for the upcoming identical change to nftables! ;)
> There I'm still undecided about the best way to handle it. For iptables,
> I guess this minimal error reporting to stderr for a case that shouldn't
> happen is fine. ACK?

Oh right.
