From: Christian Brauner <christian@brauner.io>
To: Pablo Neira Ayuso <pablo@netfilter.org>
Cc: davem@davemloft.net, netdev@vger.kernel.org,
	linux-kernel@vger.kernel.org, netfilter-devel@vger.kernel.org,
	coreteam@netfilter.org, bridge@lists.linux-foundation.org,
	tyhicks@canonical.com, kadlec@blackhole.kfki.hu, fw@strlen.de,
	roopa@cumulusnetworks.com, nikolay@cumulusnetworks.com
Subject: Re: [PATCH net-next 1/2] br_netfilter: add struct netns_brnf
Date: Thu, 13 Dec 2018 12:43:39 +0100
Message-ID: <20181213114338.j3thzzu7cvrpz72e@brauner.io>
In-Reply-To: <20181127082349.ummq2perajt6olhh@salvia>

On Tue, Nov 27, 2018 at 09:23:49AM +0100, Pablo Neira Ayuso wrote:
> On Tue, Nov 27, 2018 at 03:20:45AM +0100, Christian Brauner wrote:
> > On Tue, Nov 27, 2018 at 01:20:47AM +0100, Pablo Neira Ayuso wrote:
> > > Hi,
> > > 
> > > On Wed, Nov 07, 2018 at 02:48:58PM +0100, Christian Brauner wrote:
> > > [...]
> > > > diff --git a/include/net/netns/netfilter.h b/include/net/netns/netfilter.h
> > > > index ca043342c0eb..eedbd1ac940e 100644
> > > > --- a/include/net/netns/netfilter.h
> > > > +++ b/include/net/netns/netfilter.h
> > > > @@ -35,4 +35,20 @@ struct netns_nf {
> > > >  	bool			defrag_ipv6;
> > > >  #endif
> > > >  };
> > > > +
> > > > +struct netns_brnf {
> > > > +#ifdef CONFIG_SYSCTL
> > > > +	struct ctl_table_header *ctl_hdr;
> > > > +#endif
> > > > +
> > > > +	/* default value is 1 */
> > > > +	int call_iptables;
> > > > +	int call_ip6tables;
> > > > +	int call_arptables;
> > > > +
> > > > +	/* default value is 0 */
> > > > +	int filter_vlan_tagged;
> > > > +	int filter_pppoe_tagged;
> > > > +	int pass_vlan_indev;
> > > > +};
> > > 
> > > I have spun on this several times, wondering if there's a way to avoid
> > > scratching these many bytes per netns to expose these sysctl entries
> > > that are plain on/off toggles... You said this:
> > > 
> > > >Currently, the /proc/sys/net/bridge folder is only created in the
> > > >initial network namespace
> > > 
> > > I think we can add one single sysctl to expose these as flags from net
> > > namespaces. Idea is to keep the existing (legacy) sysctl entries for
> > > init_net only, and add a single new one that exposes these as flags
> > > (should be also available for consistency in init_net I'd suggest).
> > > Flags could be mapped in this way, e.g.:
> > > 
> > >         0x1     call_iptables
> > >         0x2     call_ip6tables
> > >         0x4     call_arptables
> > >         0x8     filter_vlan_tagged
> > >         ...
> > > 
> > > Also documentation would be good to have for this.
> > > 
> > > Would this idea fly for you? Thanks.
> > 
> > My suggestion is to keep these files per network namespace but have a
> > single flag argument in struct netns_brnf:
> > +struct netns_brnf {
> > +#ifdef CONFIG_SYSCTL
> > +        struct ctl_table_header *ctl_hdr;
> > +#endif
> > +
> > +       /* default value is 1 */
> > +       unsigned int filter_flags;
> > +};
> > 
> > #define BRNF_CALL_IPTABLES    0x1
> > #define BRNF_CALL_IP6TABLES   0x2
> > #define BRNF_CALL_ARPTABLES   0x4
> > #define BRNF_CALL_VLAN_TAGGED 0x8
> > 
> > a write to the corresponding file would then cause the flag to be set or
> > unset in filter_flags.
> > This way we are a) space-efficient internally not bloating struct net
> > while b) not breaking running tools in non-initial network namespaces
> > that expect the files to be there. b) is really the important bit here. :)
> 
> OK, please, go explore this space-efficient approach. Thanks.

Sorry for the wait. Other patches came up. :)
So, I looked into this approach and it is annoying to do:
- the sysctl proc parsing infrastructure is not equipped to deal with
  flags at all and extending it to do so would take a lot of code
- we would need either an atomic type or locking for filter_flags in the
  netns_brnf struct if multiple proc sysctl handlers try to raise or
  lower bits in filter_flags via different files at the same time

So I feel that this is not a feasible solution. We could make netns_brnf
a pointer in struct net and allocate it on new network namespace
creation if we care about space, but then we take the performance hit of
k*alloc().
What I stressed before: for userspace it's important that we don't
change the semantics of how br netfilter is configured in a non-initial
network namespace, so that existing tools in such environments don't
break.
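
To make the second point above concrete, here is a rough userspace
sketch of what per-file writes into one shared flags word would need.
It is illustration only, not kernel code: C11 atomics stand in for the
kernel's atomic bit helpers, struct brnf_flags_sketch and the
brnf_sketch_*() helpers are hypothetical names, and the initial value
assumes the three call_* toggles default to on, as in the comment in the
original patch. It does not address the first point (the sysctl parsing
side) at all.

```c
#include <stdatomic.h>

#define BRNF_CALL_IPTABLES    0x1u
#define BRNF_CALL_IP6TABLES   0x2u
#define BRNF_CALL_ARPTABLES   0x4u
#define BRNF_CALL_VLAN_TAGGED 0x8u

/* Hypothetical stand-in for a netns_brnf that stores one flags word. */
struct brnf_flags_sketch {
	atomic_uint filter_flags;
};

/* Assumed defaults: the three call_* toggles on, everything else off. */
static void brnf_sketch_init(struct brnf_flags_sketch *b)
{
	atomic_init(&b->filter_flags,
		    BRNF_CALL_IPTABLES | BRNF_CALL_IP6TABLES |
		    BRNF_CALL_ARPTABLES);
}

/*
 * Emulates a write of 0 or non-zero to one per-flag file. The
 * read-modify-write is a single atomic operation, so two writers
 * touching different bits cannot lose each other's update.
 */
static void brnf_sketch_write(struct brnf_flags_sketch *b,
			      unsigned int bit, int val)
{
	if (val)
		atomic_fetch_or(&b->filter_flags, bit);
	else
		atomic_fetch_and(&b->filter_flags, ~bit);
}

/* Emulates a read of one per-flag file: returns 0 or 1. */
static int brnf_sketch_read(struct brnf_flags_sketch *b, unsigned int bit)
{
	return (atomic_load(&b->filter_flags) & bit) != 0;
}
```

With a plain unsigned int and "flags |= bit" instead, two handlers
running at the same time could each read the old value and one update
would be lost. That is the atomic-or-locking requirement I mean.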

Christian
