* Re: [PATCH 2/6] vxlan: Group Policy extension
@ 2015-01-07 17:32 Alexei Starovoitov
       [not found] ` <CAADnVQJErdNJrXOOSqEqkbC8524VCH2E9vYL-WdTb_6SGsTwvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2015-01-07 17:32 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Stephen Hemminger, David S. Miller

On Wed, Jan 7, 2015 at 3:10 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/06/15 at 07:37pm, Alexei Starovoitov wrote:
>> Even it works ok, I think this struct layout is ugly.
>> imo would be much easier to read if you replace
>> the whole vxlanhdr with vxlanhdr_gbp
>> or split vxlanhdr into two 32-bit structs.
>> then __packed hacks won't be needed.
>
> The main reason why I merged it into vxlanhdr is for documentation
> purposes and to avoid duplicating the generic VXLAN header for every
> extension. The RCO and GPE extensions would need to duplicate this
> over and over. It gets messy in particular when multiple extensions
> can be used in combination (such as GBP and RCO) which then each
> have their own conflicting header definitions. This way, it is clear
> which extensions are compatible by just looking at the definition
> of the structure.

I'm afraid the 'union' style, with the first u8 of flags acting as a
selector, won't work for the case you're describing. But since
                      md.gbp = ntohs(vxh->gbp.policy_id);
    2652:       41 0f b7 55 0a          movzwl 0xa(%r13),%edx
compiles to a single load, it's ok from the performance side, at least on x86.
So this __packed stuff is fine, though not pretty.
It's an internal header, so we can improve it later.
_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
       [not found] ` <CAADnVQJErdNJrXOOSqEqkbC8524VCH2E9vYL-WdTb_6SGsTwvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-01-07 23:27   ` Thomas Graf
  2015-01-07 23:39     ` Alexei Starovoitov
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Graf @ 2015-01-07 23:27 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Stephen Hemminger, David S. Miller

On 01/07/15 at 09:32am, Alexei Starovoitov wrote:
> I'm afraid 'union' style with first u8 flags working as selector
> won't work for the case you're describing, but since
>                       md.gbp = ntohs(vxh->gbp.policy_id);
>     2652:       41 0f b7 55 0a          movzwl 0xa(%r13),%edx
> then at least from performance side it's ok at least on x86.
> So this _packed stuff is fine, though not pretty.
> It's internal header, so we can improve it later.

I'm not sure I understand your first sentence but I'm not
married to the code as-is.

Would you like something like this?

 struct vxlanhdr_gbp {
 	__u8 vx_flags;
+#ifdef __LITTLE_ENDIAN_BITFIELD
+       __u8    reserved_flags1:3,
+               policy_applied:1,
+               reserved_flags2:2,
+               dont_learn:1,
+               reserved_flags3:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+       __u8    reserved_flags1:1,
+               dont_learn:1,
+               reserved_flags2:2,
+               policy_applied:1,
+               reserved_flags3:3;
+#else
+#error "Please fix <asm/byteorder.h>"
+#endif
+       __be16 policy_id;
+       __be32  vx_vni;
 };

 struct vxlanhdr {
+       union {
+               struct {
+#ifdef __LITTLE_ENDIAN_BITFIELD
+                       __u8    reserved_flags1:3,
+                               vni_present:1,
+                               reserved_flags2:3,
+                               gbp_present:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+                       __u8    gbp_present:1,
+                               reserved_flags2:3,
+                               vni_present:1,
+                               reserved_flags1:3;
+#else
+#error "Please fix <asm/byteorder.h>"
+#endif
+                       __u8    vx_reserved1;
+                       __be16  vx_reserved2;
+               };
+               __be32 vx_flags;
+       };
+       __be32  vx_vni;
 };





^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07 23:27   ` Thomas Graf
@ 2015-01-07 23:39     ` Alexei Starovoitov
  0 siblings, 0 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-01-07 23:39 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	netdev, dev

On Wed, Jan 7, 2015 at 3:27 PM, Thomas Graf <tgraf@suug.ch> wrote:
>
> Would you like something like this?

yes. imo this version is much easier to read
and reason about different bits in protocol.

Maybe even use a flag mask on '__be32 vx_flags'
instead of calling out 'gbp_present' as explicit bitfield.
Then different vxlan extensions proposals don't
have to fight over positions in the first byte of
single 'struct vxlanhdr'...
but to me the below two structs look good as-is.

>  struct vxlanhdr_gbp {
>         __u8 vx_flags;
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> +       __u8    reserved_flags1:3,
> +               policy_applied:1,
> +               reserved_flags2:2,
> +               dont_learn:1,
> +               reserved_flags3:1;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> +       __u8    reserved_flags1:1,
> +               dont_learn:1,
> +               reserved_flags2:2,
> +               policy_applied:1,
> +               reserved_flags3:3;
> +#else
> +#error "Please fix <asm/byteorder.h>"
> +#endif
> +       __be16 policy_id;
> +       __be32  vx_vni;
>  };
>
>  struct vxlanhdr {
> +       union {
> +               struct {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> +                       __u8    reserved_flags1:3,
> +                               vni_present:1,
> +                               reserved_flags2:3,
> +                               gbp_present:1;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> +                       __u8    gbp_present:1,
> +                               reserved_flags2:3,
> +                               vni_present:1,
> +                               reserved_flags1:3;
> +#else
> +#error "Please fix <asm/byteorder.h>"
> +#endif
> +                       __u8    vx_reserved1;
> +                       __be16  vx_reserved2;
> +               };
> +               __be32 vx_flags;
> +       };
> +       __be32  vx_vni;
>  };
>
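
For illustration, the flag-mask alternative mentioned above might look
roughly like the sketch below. The mask names and the helper are
hypothetical and not taken from the patch; the bit positions are derived
from the bitfield layout quoted above (vni_present is bit 3 and
gbp_present is bit 7 of the first header byte).

```c
#include <arpa/inet.h>  /* htonl() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mask names, expressed in host byte order. The first header
 * byte on the wire is the most significant byte of the 32-bit word. */
#define VXLAN_HF_VNI 0x08000000u  /* VNI present */
#define VXLAN_HF_GBP 0x80000000u  /* Group Policy extension present */

/* Stand-in for the 8-byte VXLAN header; both fields are in network order. */
struct vxlanhdr_sketch {
	uint32_t vx_flags;
	uint32_t vx_vni;   /* VNI in the upper 24 bits, low 8 bits reserved */
};

/* Test one flag (or a set of flags) without naming individual bitfields. */
static bool vxlan_hdr_has(const struct vxlanhdr_sketch *h, uint32_t mask)
{
	assert(h != NULL);
	return (h->vx_flags & htonl(mask)) != 0;
}
```

With this shape, a new extension only needs a new mask constant, so the
base header definition does not have to change for each proposal.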

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-13 11:32         ` Thomas Graf
@ 2015-01-13 16:16           ` Tom Herbert
  0 siblings, 0 replies; 30+ messages in thread
From: Tom Herbert @ 2015-01-13 16:16 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Alexei Starovoitov, Linux Netdev List, dev

On Tue, Jan 13, 2015 at 3:32 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/12/15 at 06:28pm, Tom Herbert wrote:
>> On Mon, Jan 12, 2015 at 5:03 PM, Thomas Graf <tgraf@suug.ch> wrote:
>> >>
>> >> Creating a level of indirection for extensions seems overly
>> >> complicated to me. Why not just define IFLA_VXLAN_GBP as just another
>> >> enum above?
>> >
>> > I think it's cleaner to group them in a nested attribute.
>> > It clearly separates the optional extensions from the base
>> > attributes. RCO, GPE, GBP can all live in there.
>>
>> This is inconsistent with similar things in GRE and GUE. For instance,
>> GRE keyid is set as its own attribute. It just seems like this is adding
>> more code to the driver than is necessary for the functionality
>> needed.
>
> The major difference here is that we have to consider backwards
> compatibility specifically for VXLAN. Your initial feedback on GPE
> actually led me to how I implemented GBP.
>
> I think the axioms we want to establish are as follows:
>  1. Extensions need to be explicitly enabled by the user. A previously
>     dropped frame should only be processed if the user explicitly asks
>     for it.
>  2. As a consequence: only share a VXLAN UDP port if the enabled
>     extensions match (vxlan_sock_add), e.g. user A might want RCO
>     but user B might be unaware. They cannot share the same UDP port.
>
> The 2nd led me to introduce the 'exts' member to vxlan_sock so we can
> compare it in vxlan_find_sock() and only share a UDP port if the
> enabled extensions match.
>
RCO is represented in the socket in VXLAN flags (VXLAN_F_*). My patch
also adds a flags field to vxlan_sock which contains the VXLAN flags. For
a shared port, I suspect all the receive features must match, including
receive checksum settings for instance, but we don't care about
transmit side. To facilitate this, I would suggest splitting flags
into o_flags and i_flags like ip_tunnel does, and then compare i_flags
in vxlan_find_sock.

Regardless of the internal implementation, I still don't see much
value in exposing these distinctions in netlink.

Tom

> Your patch currently implements (1) but not (2).
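
As a rough sketch of the i_flags/o_flags split suggested above: only the
receive-side flags take part in the port-sharing decision, and the
transmit-side flags are ignored. The struct, the flag names and the helper
are hypothetical and only illustrate the comparison; they are not the
driver's actual definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
#define SK_F_RX_CSUM 0x1u  /* receive checksum handling */
#define SK_F_RX_RCO  0x2u  /* remote checksum offload on receive */
#define SK_F_TX_CSUM 0x1u  /* transmit checksum handling */

struct tunnel_sock_sketch {
	uint16_t port;
	uint32_t i_flags;  /* receive-side features */
	uint32_t o_flags;  /* transmit-side features */
};

/* Two users may share a UDP port only if their receive features match;
 * differing transmit features do not prevent sharing. */
static bool can_share_port(const struct tunnel_sock_sketch *a,
			   const struct tunnel_sock_sketch *b)
{
	assert(a != NULL && b != NULL);
	return a->port == b->port && a->i_flags == b->i_flags;
}
```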

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-13  2:28       ` Tom Herbert
@ 2015-01-13 11:32         ` Thomas Graf
  2015-01-13 16:16           ` Tom Herbert
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Graf @ 2015-01-13 11:32 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Alexei Starovoitov, Linux Netdev List, dev

On 01/12/15 at 06:28pm, Tom Herbert wrote:
> On Mon, Jan 12, 2015 at 5:03 PM, Thomas Graf <tgraf@suug.ch> wrote:
> >>
> >> Creating a level of indirection for extensions seems overly
> >> complicated to me. Why not just define IFLA_VXLAN_GBP as just another
> >> enum above?
> >
> > I think it's cleaner to group them in a nested attribute.
> > It clearly separates the optional extensions from the base
> > attributes. RCO, GPE, GBP can all live in there.
> 
> This is inconsistent with similar things in GRE and GUE. For instance,
> GRE keyid is set as its own attribute. It just seems like this is adding
> more code to the driver than is necessary for the functionality
> needed.

The major difference here is that we have to consider backwards
compatibility specifically for VXLAN. Your initial feedback on GPE
actually led me to how I implemented GBP.

I think the axioms we want to establish are as follows:
 1. Extensions need to be explicitly enabled by the user. A previously
    dropped frame should only be processed if the user explicitly asks
    for it.
 2. As a consequence: only share a VXLAN UDP port if the enabled
    extensions match (vxlan_sock_add), e.g. user A might want RCO
    but user B might be unaware. They cannot share the same UDP port.

The 2nd led me to introduce the 'exts' member to vxlan_sock so we can
compare it in vxlan_find_sock() and only share a UDP port if the
enabled extensions match.

Your patch currently implements (1) but not (2).
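
To make axiom (2) concrete, here is a minimal sketch of a lookup that only
returns an existing socket when both the port and the enabled extensions
match. It is not the actual vxlan_find_sock(); the struct, the function
name and the extension bit values are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VXLAN_EXT_GBP 0x1u  /* value assumed for illustration */
#define VXLAN_EXT_RCO 0x2u  /* hypothetical */

struct vxlan_sock_sketch {
	uint16_t port;
	uint32_t exts;  /* enabled extensions */
};

/* Return a socket that can be shared, or NULL if the caller has to create
 * a new one (which then needs a different UDP port). */
static struct vxlan_sock_sketch *
find_sock(struct vxlan_sock_sketch *socks, size_t n, uint16_t port,
	  uint32_t exts)
{
	assert(socks != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (socks[i].port == port && socks[i].exts == exts)
			return &socks[i];
	}
	return NULL;
}
```

A user without extensions asking for a port that is already bound by a GBP
user gets NULL back, so an unaware user never ends up processing frames it
would previously have dropped.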

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 17:59     ` David Miller
@ 2015-01-13  8:29       ` Nicolas Dichtel
  0 siblings, 0 replies; 30+ messages in thread
From: Nicolas Dichtel @ 2015-01-13  8:29 UTC (permalink / raw)
  To: David Miller
  Cc: tgraf, jesse, stephen, pshelar, therbert, alexei.starovoitov,
	netdev, dev

On 12/01/2015 18:59, David Miller wrote:
>
> Can you PLEASE, PLEASE, not quote an entire full patch just to add two
> lines of commentary.
>
> Quote _only_ the _RELEVANT_ portions of the email you are replying to.
>
Will do, sorry for that.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-13  1:03     ` Thomas Graf
@ 2015-01-13  2:28       ` Tom Herbert
  2015-01-13 11:32         ` Thomas Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Tom Herbert @ 2015-01-13  2:28 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Alexei Starovoitov, Linux Netdev List, dev

On Mon, Jan 12, 2015 at 5:03 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/12/15 at 10:14am, Tom Herbert wrote:
>> > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
>> > index f7d0d2d..9f07bf5 100644
>> > --- a/include/uapi/linux/if_link.h
>> > +++ b/include/uapi/linux/if_link.h
>> > @@ -370,10 +370,18 @@ enum {
>> >         IFLA_VXLAN_UDP_CSUM,
>> >         IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
>> >         IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
>> > +       IFLA_VXLAN_EXTENSION,
>> >         __IFLA_VXLAN_MAX
>> >  };
>> >  #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
>> >
>> > +enum {
>> > +       IFLA_VXLAN_EXT_UNSPEC,
>> > +       IFLA_VXLAN_EXT_GBP,
>> > +       __IFLA_VXLAN_EXT_MAX,
>> > +};
>> > +#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
>> > +
>>
>> Creating a level of indirection for extensions seems overly
>> complicated to me. Why not just define IFLA_VXLAN_GBP as just another
>> enum above?
>
> I think it's cleaner to group them in a nested attribute.
> It clearly separates the optional extensions from the base
> attributes. RCO, GPE, GBP can all live in there.

This is inconsistent with similar things in GRE and GUE. For instance,
GRE keyid is set as its own attribute. It just seems like this is adding
more code to the driver than is necessary for the functionality
needed.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 17:37   ` Nicolas Dichtel
  2015-01-12 17:59     ` David Miller
@ 2015-01-13  1:04     ` Thomas Graf
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Graf @ 2015-01-13  1:04 UTC (permalink / raw)
  To: Nicolas Dichtel
  Cc: davem, jesse, stephen, pshelar, therbert, alexei.starovoitov,
	netdev, dev

On 01/12/15 at 06:37pm, Nicolas Dichtel wrote:
> >+	if (data[IFLA_VXLAN_EXTENSION])
> >+		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
> >+
> Can you also update vxlan_fill_info() so that these new attributes can be
> dumped via netlink?

Sure, will do.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 18:14   ` Tom Herbert
@ 2015-01-13  1:03     ` Thomas Graf
  2015-01-13  2:28       ` Tom Herbert
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Graf @ 2015-01-13  1:03 UTC (permalink / raw)
  To: Tom Herbert
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Alexei Starovoitov, Linux Netdev List, dev

On 01/12/15 at 10:14am, Tom Herbert wrote:
> > diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> > index f7d0d2d..9f07bf5 100644
> > --- a/include/uapi/linux/if_link.h
> > +++ b/include/uapi/linux/if_link.h
> > @@ -370,10 +370,18 @@ enum {
> >         IFLA_VXLAN_UDP_CSUM,
> >         IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
> >         IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
> > +       IFLA_VXLAN_EXTENSION,
> >         __IFLA_VXLAN_MAX
> >  };
> >  #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
> >
> > +enum {
> > +       IFLA_VXLAN_EXT_UNSPEC,
> > +       IFLA_VXLAN_EXT_GBP,
> > +       __IFLA_VXLAN_EXT_MAX,
> > +};
> > +#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
> > +
> 
> Creating a level of indirection for extensions seems overly
> complicated to me. Why not just define IFLA_VXLAN_GBP as just another
> enum above?

I think it's cleaner to group them in a nested attribute.
It clearly separates the optional extensions from the base
attributes. RCO, GPE, GBP can all live in there.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 22:59           ` Thomas Graf
@ 2015-01-12 23:19             ` Jesse Gross
  0 siblings, 0 replies; 30+ messages in thread
From: Jesse Gross @ 2015-01-12 23:19 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, Tom Herbert,
	Alexei Starovoitov, dev, netdev

On Mon, Jan 12, 2015 at 2:59 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/12/15 at 02:50pm, Jesse Gross wrote:
>> On Mon, Jan 12, 2015 at 2:47 PM, Thomas Graf <tgraf@suug.ch> wrote:
>> > On 01/12/15 at 11:23am, Jesse Gross wrote:
>> >> On Mon, Jan 12, 2015 at 4:26 AM, Thomas Graf <tgraf@suug.ch> wrote:
>> >> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> >> > index 4d52aa9..b148739 100644
>> >> > --- a/drivers/net/vxlan.c
>> >> > +++ b/drivers/net/vxlan.c
>> >> > @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>> >> >                         continue;
>> >> >
>> >> >                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
>> >> > -               if (vh->vx_vni != vh2->vx_vni) {
>> >> > +               if (vh->vx_flags != vh2->vx_flags ||
>> >> > +                   vh->vx_vni != vh2->vx_vni) {
>> >>
>> >> It's probably better to do a memcmp over the entire header. There's no
>> >> guarantee that new fields will be entirely represented by flags.
>> >
>> > vx_flags covers the entire first 32 bit of vxlanhdr so it's
>> > equivalent to a memcmp() already. I can change it to memcmp() if
>> > you think that's more readable.
>>
>> I was actually referring to the reserved 8 bit chunk after the VNI.
>> This could potentially be used for something in the future.
>
> Shouldn't that be covered by vh->vx_vni != vh2->vx_vni? I may
> still misunderstand, sorry.

Ah, sorry. I see that vx_vni is 4 bytes instead of 3 bytes of the
actual VNI. I agree that the whole header is covered for GRO purposes.
The definition of the VNI field is a little confusing but I guess it's
more efficient than the alternative.
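
A small sketch of the layout discussed here, assuming an 8-byte header made
of two 32-bit words in network byte order: the 24-bit VNI sits in the upper
three bytes of vx_vni and the low byte is reserved. The struct and helper
names are hypothetical; the shift mirrors the `ntohl(...) >> 8` used in the
patch.

```c
#include <arpa/inet.h>  /* htonl(), ntohl() */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct vxlan_hdr8 {
	uint32_t vx_flags;  /* network order: flags plus reserved bits */
	uint32_t vx_vni;    /* network order: 24-bit VNI, 8 reserved bits */
};

/* Extract the 24-bit VNI, dropping the reserved low byte. */
static uint32_t vxlan_vni(const struct vxlan_hdr8 *h)
{
	assert(h != NULL);
	return ntohl(h->vx_vni) >> 8;
}

/* Comparing both words covers all 8 header bytes, including the reserved
 * byte after the VNI, so it matches a memcmp() over the whole header. */
static bool vxlan_same_flow(const struct vxlan_hdr8 *a,
			    const struct vxlan_hdr8 *b)
{
	return a->vx_flags == b->vx_flags && a->vx_vni == b->vx_vni;
}
```

Two headers with the same VNI but a different reserved byte are treated as
different flows here, which is the behavior agreed on for GRO above.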

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 22:50         ` Jesse Gross
@ 2015-01-12 22:59           ` Thomas Graf
  2015-01-12 23:19             ` Jesse Gross
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Graf @ 2015-01-12 22:59 UTC (permalink / raw)
  To: Jesse Gross
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, Tom Herbert,
	Alexei Starovoitov, dev, netdev

On 01/12/15 at 02:50pm, Jesse Gross wrote:
> On Mon, Jan 12, 2015 at 2:47 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > On 01/12/15 at 11:23am, Jesse Gross wrote:
> >> On Mon, Jan 12, 2015 at 4:26 AM, Thomas Graf <tgraf@suug.ch> wrote:
> >> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> >> > index 4d52aa9..b148739 100644
> >> > --- a/drivers/net/vxlan.c
> >> > +++ b/drivers/net/vxlan.c
> >> > @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
> >> >                         continue;
> >> >
> >> >                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
> >> > -               if (vh->vx_vni != vh2->vx_vni) {
> >> > +               if (vh->vx_flags != vh2->vx_flags ||
> >> > +                   vh->vx_vni != vh2->vx_vni) {
> >>
> >> It's probably better to do a memcmp over the entire header. There's no
> >> guarantee that new fields will be entirely represented by flags.
> >
> > vx_flags covers the entire first 32 bit of vxlanhdr so it's
> > equivalent to a memcmp() already. I can change it to memcmp() if
> > you think that's more readable.
> 
> I was actually referring to the reserved 8 bit chunk after the VNI.
> This could potentially be used for something in the future.

Shouldn't that be covered by vh->vx_vni != vh2->vx_vni? I may
still misunderstand, sorry.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 22:47       ` Thomas Graf
@ 2015-01-12 22:50         ` Jesse Gross
  2015-01-12 22:59           ` Thomas Graf
  0 siblings, 1 reply; 30+ messages in thread
From: Jesse Gross @ 2015-01-12 22:50 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, Tom Herbert,
	Alexei Starovoitov, dev, netdev

On Mon, Jan 12, 2015 at 2:47 PM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/12/15 at 11:23am, Jesse Gross wrote:
>> On Mon, Jan 12, 2015 at 4:26 AM, Thomas Graf <tgraf@suug.ch> wrote:
>> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
>> > index 4d52aa9..b148739 100644
>> > --- a/drivers/net/vxlan.c
>> > +++ b/drivers/net/vxlan.c
>> > @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>> >                         continue;
>> >
>> >                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
>> > -               if (vh->vx_vni != vh2->vx_vni) {
>> > +               if (vh->vx_flags != vh2->vx_flags ||
>> > +                   vh->vx_vni != vh2->vx_vni) {
>>
>> It's probably better to do a memcmp over the entire header. There's no
>> guarantee that new fields will be entirely represented by flags.
>
> vx_flags covers the entire first 32 bit of vxlanhdr so it's
> equivalent to a memcmp() already. I can change it to memcmp() if
> you think that's more readable.

I was actually referring to the reserved 8 bit chunk after the VNI.
This could potentially be used for something in the future.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
       [not found]     ` <CAEP_g=8TqGnftZa_scKODa2ra7gsV6ov_5J+Lbfq+4bFDZjiBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-01-12 22:47       ` Thomas Graf
  2015-01-12 22:50         ` Jesse Gross
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Graf @ 2015-01-12 22:47 UTC (permalink / raw)
  To: Jesse Gross
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev, David Miller,
	Stephen Hemminger, Alexei Starovoitov, Tom Herbert

On 01/12/15 at 11:23am, Jesse Gross wrote:
> On Mon, Jan 12, 2015 at 4:26 AM, Thomas Graf <tgraf@suug.ch> wrote:
> > diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> > index 4d52aa9..b148739 100644
> > --- a/drivers/net/vxlan.c
> > +++ b/drivers/net/vxlan.c
> > @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
> >                         continue;
> >
> >                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
> > -               if (vh->vx_vni != vh2->vx_vni) {
> > +               if (vh->vx_flags != vh2->vx_flags ||
> > +                   vh->vx_vni != vh2->vx_vni) {
> 
> It's probably better to do a memcmp over the entire header. There's no
> guarantee that new fields will be entirely represented by flags.

vx_flags covers the entire first 32 bit of vxlanhdr so it's
equivalent to a memcmp() already. I can change it to memcmp() if
you think that's more readable.

> 
> > diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
> > index d7c46b3..dd68c97 100644
> > --- a/net/openvswitch/vport-vxlan.c
> > +++ b/net/openvswitch/vport-vxlan.c
> > @@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
> >         struct vxlan_port *vxlan_port = vxlan_vport(vport);
> >         __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
> >         struct ovs_key_ipv4_tunnel *tun_key;
> > +       struct vxlan_metadata md;
> 
> It might be a good idea to zero out 'md', even if not strictly required.

Sure

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 12:26 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
@ 2015-01-12 19:23   ` Jesse Gross
       [not found]     ` <CAEP_g=8TqGnftZa_scKODa2ra7gsV6ov_5J+Lbfq+4bFDZjiBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Jesse Gross @ 2015-01-12 19:23 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Stephen Hemminger, Pravin Shelar, Tom Herbert,
	Alexei Starovoitov, dev, netdev

On Mon, Jan 12, 2015 at 4:26 AM, Thomas Graf <tgraf@suug.ch> wrote:
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4d52aa9..b148739 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>                         continue;
>
>                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
> -               if (vh->vx_vni != vh2->vx_vni) {
> +               if (vh->vx_flags != vh2->vx_flags ||
> +                   vh->vx_vni != vh2->vx_vni) {

It's probably better to do a memcmp over the entire header. There's no
guarantee that new fields will be entirely represented by flags.

> diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
> index d7c46b3..dd68c97 100644
> --- a/net/openvswitch/vport-vxlan.c
> +++ b/net/openvswitch/vport-vxlan.c
> @@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
>         struct vxlan_port *vxlan_port = vxlan_vport(vport);
>         __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
>         struct ovs_key_ipv4_tunnel *tun_key;
> +       struct vxlan_metadata md;

It might be a good idea to zero out 'md', even if not strictly required.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-08 22:47 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
  2015-01-09 17:37   ` Alexei Starovoitov
  2015-01-12 17:37   ` Nicolas Dichtel
@ 2015-01-12 18:14   ` Tom Herbert
  2015-01-13  1:03     ` Thomas Graf
  2 siblings, 1 reply; 30+ messages in thread
From: Tom Herbert @ 2015-01-12 18:14 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Alexei Starovoitov, Linux Netdev List, dev

> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..9f07bf5 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -370,10 +370,18 @@ enum {
>         IFLA_VXLAN_UDP_CSUM,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
> +       IFLA_VXLAN_EXTENSION,
>         __IFLA_VXLAN_MAX
>  };
>  #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
>
> +enum {
> +       IFLA_VXLAN_EXT_UNSPEC,
> +       IFLA_VXLAN_EXT_GBP,
> +       __IFLA_VXLAN_EXT_MAX,
> +};
> +#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
> +

Creating a level of indirection for extensions seems overly
complicated to me. Why not just define IFLA_VXLAN_GBP as just another
enum above?

Tom

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 17:37   ` Nicolas Dichtel
@ 2015-01-12 17:59     ` David Miller
  2015-01-13  8:29       ` Nicolas Dichtel
  2015-01-13  1:04     ` Thomas Graf
  1 sibling, 1 reply; 30+ messages in thread
From: David Miller @ 2015-01-12 17:59 UTC (permalink / raw)
  To: nicolas.dichtel
  Cc: tgraf, jesse, stephen, pshelar, therbert, alexei.starovoitov,
	netdev, dev


Can you PLEASE, PLEASE, not quote an entire full patch just to add two
lines of commentary.

Quote _only_ the _RELEVANT_ portions of the email you are replying to.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-08 22:47 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
  2015-01-09 17:37   ` Alexei Starovoitov
@ 2015-01-12 17:37   ` Nicolas Dichtel
  2015-01-12 17:59     ` David Miller
  2015-01-13  1:04     ` Thomas Graf
  2015-01-12 18:14   ` Tom Herbert
  2 siblings, 2 replies; 30+ messages in thread
From: Nicolas Dichtel @ 2015-01-12 17:37 UTC (permalink / raw)
  To: Thomas Graf, davem, jesse, stephen, pshelar, therbert,
	alexei.starovoitov
  Cc: netdev, dev

On 08/01/2015 23:47, Thomas Graf wrote:
> Implements support for the Group Policy VXLAN extension [0] to provide
> a lightweight and simple security label mechanism across network peers
> based on VXLAN. The security context and associated metadata is mapped
> to/from skb->mark. This allows further mapping to a SELinux context
> using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
> tc, etc.
>
> The group membership is defined by the lower 16 bits of skb->mark, the
> upper 16 bits are used for flags.
>
> SELinux allows managing labels to secure local resources. However,
> distributed applications require ACLs to be implemented across hosts. This
> is typically achieved by matching on L2-L4 fields to identify the
> original sending host and process on the receiver. On top of that,
> netlabel and specifically CIPSO [1] allow mapping security contexts to
> universal labels.  However, netlabel and CIPSO are relatively complex.
> This patch provides a lightweight alternative for overlay network
> environments with a trusted underlay. No additional control protocol
> is required.
>
>             Host 1:                       Host 2:
>
>        Group A        Group B        Group B     Group A
>        +-----+   +-------------+    +-------+   +-----+
>        | lxc |   | SELinux CTX |    | httpd |   | VM  |
>        +--+--+   +--+----------+    +---+---+   +--+--+
> 	  \---+---/                     \----+---/
> 	      |                              |
> 	  +---+---+                      +---+---+
> 	  | vxlan |                      | vxlan |
> 	  +---+---+                      +---+---+
> 	      +------------------------------+
>
> Backwards compatibility:
> A VXLAN-GBP socket can receive standard VXLAN frames and will assign
> the default group 0x0000 to such frames. A Linux VXLAN socket will
> drop VXLAN-GBP  frames. The extension is therefore disabled by default
> and needs to be specifically enabled:
>
>     ip link add [...] type vxlan [...] gbp
>
> In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
> must run on a separate port number.
>
> Examples:
>   iptables:
>    host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
>    host2# iptables -I INPUT -m mark --mark 0x200 -j DROP
>
>   OVS:
>    # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
>    # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'
>
> [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
> [1] http://lwn.net/Articles/204905/
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
> v2:
>   - split GBP header definition into separate struct vxlanhdr_gbp as requested
>     by Alexei
>
>   drivers/net/vxlan.c           | 161 ++++++++++++++++++++++++++++++------------
>   include/net/vxlan.h           |  73 +++++++++++++++++--
>   include/uapi/linux/if_link.h  |   8 +++
>   net/openvswitch/vport-vxlan.c |   9 ++-
>   4 files changed, 198 insertions(+), 53 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4d52aa9..b148739 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -132,6 +132,7 @@ struct vxlan_dev {
>   	__u8		  tos;		/* TOS override */
>   	__u8		  ttl;
>   	u32		  flags;	/* VXLAN_F_* in vxlan.h */
> +	u32		  exts;		/* Enabled extensions */
>
>   	struct work_struct sock_work;
>   	struct work_struct igmp_join;
> @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>   			continue;
>
>   		vh2 = (struct vxlanhdr *)(p->data + off_vx);
> -		if (vh->vx_vni != vh2->vx_vni) {
> +		if (vh->vx_flags != vh2->vx_flags ||
> +		    vh->vx_vni != vh2->vx_vni) {
>   			NAPI_GRO_CB(p)->same_flow = 0;
>   			continue;
>   		}
> @@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   {
>   	struct vxlan_sock *vs;
>   	struct vxlanhdr *vxh;
> +	struct vxlan_metadata md = {0};
>
>   	/* Need Vxlan and inner Ethernet header to be present */
>   	if (!pskb_may_pull(skb, VXLAN_HLEN))
> @@ -1113,6 +1116,22 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	if (vs->exts) {
>   		if (!vxh->vni_present)
>   			goto error_invalid_header;
> +
> +		if (vxh->gbp_present) {
> +			struct vxlanhdr_gbp *gbp;
> +
> +			if (!(vs->exts & VXLAN_EXT_GBP))
> +				goto error_invalid_header;
> +
> +			gbp = (struct vxlanhdr_gbp *)vxh;
> +			md.gbp = ntohs(gbp->policy_id);
> +
> +			if (gbp->dont_learn)
> +				md.gbp |= VXLAN_GBP_DONT_LEARN;
> +
> +			if (gbp->policy_applied)
> +				md.gbp |= VXLAN_GBP_POLICY_APPLIED;
> +		}
>   	} else {
>   		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
>   		    (vxh->vx_vni & htonl(0xff)))
> @@ -1122,7 +1141,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>   	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
>   		goto drop;
>
> -	vs->rcv(vs, skb, vxh->vx_vni);
> +	md.vni = vxh->vx_vni;
> +	vs->rcv(vs, skb, &md);
>   	return 0;
>
>   drop:
> @@ -1138,8 +1158,8 @@ error:
>   	return 1;
>   }
>
> -static void vxlan_rcv(struct vxlan_sock *vs,
> -		      struct sk_buff *skb, __be32 vx_vni)
> +static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
> +		      struct vxlan_metadata *md)
>   {
>   	struct iphdr *oip = NULL;
>   	struct ipv6hdr *oip6 = NULL;
> @@ -1150,7 +1170,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>   	int err = 0;
>   	union vxlan_addr *remote_ip;
>
> -	vni = ntohl(vx_vni) >> 8;
> +	vni = ntohl(md->vni) >> 8;
>   	/* Is this VNI defined? */
>   	vxlan = vxlan_vs_find_vni(vs, vni);
>   	if (!vxlan)
> @@ -1184,6 +1204,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>   		goto drop;
>
>   	skb_reset_network_header(skb);
> +	skb->mark = md->gbp;
>
>   	if (oip6)
>   		err = IP6_ECN_decapsulate(oip6, skb);
> @@ -1533,15 +1554,57 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
>   	return false;
>   }
>
> +static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
> +			   int min_headroom, struct vxlan_metadata *md)
> +{
> +	struct vxlanhdr *vxh;
> +	int err;
> +
> +	/* Need space for new headers (invalidates iph ptr) */
> +	err = skb_cow_head(skb, min_headroom);
> +	if (unlikely(err)) {
> +		kfree_skb(skb);
> +		return err;
> +	}
> +
> +	skb = vlan_hwaccel_push_inside(skb);
> +	if (WARN_ON(!skb))
> +		return -ENOMEM;
> +
> +	vxh = (struct vxlanhdr *)__skb_push(skb, sizeof(*vxh));
> +	vxh->vx_flags = htonl(VXLAN_FLAGS);
> +	vxh->vx_vni = md->vni;
> +
> +	if (vs->exts)  {
> +		if (vs->exts & VXLAN_EXT_GBP) {
> +			struct vxlanhdr_gbp *gbp;
> +
> +			gbp = (struct vxlanhdr_gbp *)vxh;
> +			vxh->gbp_present = 1;
> +
> +			if (md->gbp & VXLAN_GBP_DONT_LEARN)
> +				gbp->dont_learn = 1;
> +
> +			if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
> +				gbp->policy_applied = 1;
> +
> +			gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
> +		}
> +	}
> +
> +	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
> +
> +	return 0;
> +}
> +
>   #if IS_ENABLED(CONFIG_IPV6)
>   static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>   			   struct dst_entry *dst, struct sk_buff *skb,
>   			   struct net_device *dev, struct in6_addr *saddr,
>   			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
> -			   __be16 src_port, __be16 dst_port, __be32 vni,
> -			   bool xnet)
> +			   __be16 src_port, __be16 dst_port,
> +			   struct vxlan_metadata *md, bool xnet)
>   {
> -	struct vxlanhdr *vxh;
>   	int min_headroom;
>   	int err;
>   	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
> @@ -1558,24 +1621,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>   			+ VXLAN_HLEN + sizeof(struct ipv6hdr)
>   			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -	/* Need space for new headers (invalidates iph ptr) */
> -	err = skb_cow_head(skb, min_headroom);
> -	if (unlikely(err)) {
> -		kfree_skb(skb);
> -		goto err;
> -	}
> -
> -	skb = vlan_hwaccel_push_inside(skb);
> -	if (WARN_ON(!skb)) {
> -		err = -ENOMEM;
> +	err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +	if (err)
>   		goto err;
> -	}
> -
> -	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -	vxh->vx_flags = htonl(VXLAN_FLAGS);
> -	vxh->vx_vni = vni;
> -
> -	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>   	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
>   			     ttl, src_port, dst_port);
> @@ -1589,9 +1637,9 @@ err:
>   int vxlan_xmit_skb(struct vxlan_sock *vs,
>   		   struct rtable *rt, struct sk_buff *skb,
>   		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
> -		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
> +		   __be16 src_port, __be16 dst_port,
> +		   struct vxlan_metadata *md, bool xnet)
>   {
> -	struct vxlanhdr *vxh;
>   	int min_headroom;
>   	int err;
>   	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
> @@ -1604,22 +1652,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
>   			+ VXLAN_HLEN + sizeof(struct iphdr)
>   			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -	/* Need space for new headers (invalidates iph ptr) */
> -	err = skb_cow_head(skb, min_headroom);
> -	if (unlikely(err)) {
> -		kfree_skb(skb);
> +	err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +	if (err)
>   		return err;
> -	}
> -
> -	skb = vlan_hwaccel_push_inside(skb);
> -	if (WARN_ON(!skb))
> -		return -ENOMEM;
> -
> -	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -	vxh->vx_flags = htonl(VXLAN_FLAGS);
> -	vxh->vx_vni = vni;
> -
> -	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>   	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
>   				   ttl, df, src_port, dst_port, xnet);
> @@ -1679,6 +1714,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>   	const struct iphdr *old_iph;
>   	struct flowi4 fl4;
>   	union vxlan_addr *dst;
> +	struct vxlan_metadata md;
>   	__be16 src_port = 0, dst_port;
>   	u32 vni;
>   	__be16 df = 0;
> @@ -1749,11 +1785,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>
>   		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
>   		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
> +		md.vni = htonl(vni << 8);
> +		md.gbp = skb->mark;
>
>   		err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
>   				     fl4.saddr, dst->sin.sin_addr.s_addr,
> -				     tos, ttl, df, src_port, dst_port,
> -				     htonl(vni << 8),
> +				     tos, ttl, df, src_port, dst_port, &md,
>   				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
>   		if (err < 0) {
>   			/* skb is already freed. */
> @@ -1806,10 +1843,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>   		}
>
>   		ttl = ttl ? : ip6_dst_hoplimit(ndst);
> +		md.vni = htonl(vni << 8);
> +		md.gbp = skb->mark;
>
>   		err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
>   				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
> -				      src_port, dst_port, htonl(vni << 8),
> +				      src_port, dst_port, &md,
>   				      !net_eq(vxlan->net, dev_net(vxlan->dev)));
>   #endif
>   	}
> @@ -2210,6 +2249,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
>   	[IFLA_VXLAN_UDP_CSUM]	= { .type = NLA_U8 },
>   	[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
>   	[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
> +	[IFLA_VXLAN_EXTENSION]	= { .type = NLA_NESTED },
> +};
> +
> +static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
> +	[IFLA_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
>   };
>
>   static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
> @@ -2246,6 +2290,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>   		}
>   	}
>
> +	if (data[IFLA_VXLAN_EXTENSION]) {
> +		int err;
> +
> +		err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
> +					  IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
> +		if (err < 0) {
> +			pr_debug("invalid VXLAN extension configuration: %d\n",
> +				 err);
> +			return -EINVAL;
> +		}
> +	}
> +
>   	return 0;
>   }
>
> @@ -2400,6 +2456,18 @@ static void vxlan_sock_work(struct work_struct *work)
>   	dev_put(vxlan->dev);
>   }
>
> +static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
> +{
> +	struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
> +
> +	/* Validated in vxlan_validate() */
> +	if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
> +		BUG();
> +
> +	if (exts[IFLA_VXLAN_EXT_GBP])
> +		vxlan->exts |= VXLAN_EXT_GBP;
> +}
> +
>   static int vxlan_newlink(struct net *net, struct net_device *dev,
>   			 struct nlattr *tb[], struct nlattr *data[])
>   {
> @@ -2525,6 +2593,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
>   	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
>   		vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
>
> +	if (data[IFLA_VXLAN_EXTENSION])
> +		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
> +
Can you also update vxlan_fill_info() so that these new attributes can be dumped 
via netlink?


Thank you,
Nicolas

* [PATCH 2/6] vxlan: Group Policy extension
  2015-01-12 12:26 [PATCH 0/6 net-next v3] VXLAN Group Policy Extension Thomas Graf
@ 2015-01-12 12:26 ` Thomas Graf
  2015-01-12 19:23   ` Jesse Gross
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Graf @ 2015-01-12 12:26 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar, therbert, alexei.starovoitov; +Cc: dev, netdev

Implements support for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata are mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.
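
As an illustration of the skb->mark layout described above, here is a
minimal userspace sketch. The three constants mirror the values in the
include/net/vxlan.h hunk of this patch; struct gbp_info and the
gbp_mark_pack()/gbp_mark_unpack() helpers are hypothetical names used
only for this example and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values mirror the include/net/vxlan.h hunk of this patch. */
#define GBP_BIT(n)               (1u << (n))
#define VXLAN_GBP_DONT_LEARN     (GBP_BIT(6) << 16)
#define VXLAN_GBP_POLICY_APPLIED (GBP_BIT(3) << 16)
#define VXLAN_GBP_ID_MASK        0xFFFFu

/* Hypothetical helper type for this example, not part of the patch. */
struct gbp_info {
	uint16_t policy_id;      /* lower 16 bits of the mark */
	bool     dont_learn;     /* D flag */
	bool     policy_applied; /* A flag */
};

/* Build a 32-bit mark: group ID in the low half, flags in the high half. */
static inline uint32_t gbp_mark_pack(const struct gbp_info *in)
{
	uint32_t mark;

	assert(in != NULL);
	mark = in->policy_id & VXLAN_GBP_ID_MASK;
	if (in->dont_learn)
		mark |= VXLAN_GBP_DONT_LEARN;
	if (in->policy_applied)
		mark |= VXLAN_GBP_POLICY_APPLIED;
	return mark;
}

/* Split a mark back into group ID and flags. */
static inline struct gbp_info gbp_mark_unpack(uint32_t mark)
{
	struct gbp_info out;

	out.policy_id = (uint16_t)(mark & VXLAN_GBP_ID_MASK);
	out.dont_learn = (mark & VXLAN_GBP_DONT_LEARN) != 0;
	out.policy_applied = (mark & VXLAN_GBP_POLICY_APPLIED) != 0;
	return out;
}
```

With these values, group 0x200 with the D flag set packs to 0x00400200,
so a rule that matches the mark exactly (as in the iptables example
further down) would have to account for any flag bits that are set.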

SELinux allows managing labels to secure local resources. However,
distributed applications require ACLs to be implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow mapping security contexts to
universal labels. However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

           Host 1:                       Host 2:

      Group A        Group B        Group B     Group A
      +-----+   +-------------+    +-------+   +-----+
      | lxc |   | SELinux CTX |    | httpd |   | VM  |
      +--+--+   +--+----------+    +---+---+   +--+--+
	  \---+---/                     \----+---/
	      |                              |
	  +---+---+                      +---+---+
	  | vxlan |                      | vxlan |
	  +---+---+                      +---+---+
	      +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP frames. The extension is therefore disabled by default
and needs to be specifically enabled:

   ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.
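
The compatibility rules above can be sketched as a byte-wise userspace
check that roughly follows the receive-path logic in
vxlan_udp_encap_recv(). This is a simplified illustration under
assumptions, not the kernel code: the function name, the result struct
and the VXLAN_FLAG_*/VXLAN_GBP_* byte masks are hypothetical, derived
from the header diagrams in this patch, and GBP is assumed to be the
only extension.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VXLAN_HDR_LEN 8
#define VXLAN_FLAG_G  0x80u /* byte 0: Group Policy present */
#define VXLAN_FLAG_I  0x08u /* byte 0: VNI present */
#define VXLAN_GBP_D   0x40u /* byte 1: Don't Learn */
#define VXLAN_GBP_A   0x08u /* byte 1: Policy Applied */

/* Hypothetical result type for this example. */
struct vxlan_rx_result {
	uint32_t vni;  /* 24-bit VNI */
	uint32_t mark; /* group ID in low 16 bits, D/A flags above */
};

/* Returns true if the frame would be accepted, false if dropped. */
static inline bool vxlan_rx_check(const uint8_t hdr[VXLAN_HDR_LEN],
				  bool gbp_enabled,
				  struct vxlan_rx_result *out)
{
	assert(hdr != NULL && out != NULL);

	out->vni = ((uint32_t)hdr[4] << 16) | ((uint32_t)hdr[5] << 8) | hdr[6];
	out->mark = 0; /* default group 0x0000 */

	if (!gbp_enabled) {
		/* Plain socket: the first word must be exactly the I flag
		 * and the reserved low byte must be zero, so a frame with
		 * G set or a policy ID present is dropped. */
		return hdr[0] == VXLAN_FLAG_I && hdr[1] == 0 &&
		       hdr[2] == 0 && hdr[3] == 0 && hdr[7] == 0;
	}

	/* GBP socket: the VNI must be present; G is optional. */
	if (!(hdr[0] & VXLAN_FLAG_I))
		return false;

	if (hdr[0] & VXLAN_FLAG_G) {
		out->mark = ((uint32_t)hdr[2] << 8) | hdr[3];
		if (hdr[1] & VXLAN_GBP_D)
			out->mark |= (uint32_t)VXLAN_GBP_D << 16;
		if (hdr[1] & VXLAN_GBP_A)
			out->mark |= (uint32_t)VXLAN_GBP_A << 16;
	}
	return true;
}
```

A header starting with 0x88 (G and I set) is accepted only when
gbp_enabled is true, while a header starting with 0x08 is accepted in
both modes and yields mark 0, which is why the two socket types cannot
share a port in a mixed environment.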

Examples:
 iptables:
  host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
  host2# iptables -I INPUT -m mark --mark 0x200 -j DROP

 OVS:
  # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
  # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
v2->v3:
 - Removed empty struct vxlan_gbp as spotted by Alexei
v1->v2:
 - split GBP header definition into separate struct vxlanhdr_gbp as requested
   by Alexei

 drivers/net/vxlan.c           | 161 ++++++++++++++++++++++++++++++------------
 include/net/vxlan.h           |  70 ++++++++++++++++--
 include/uapi/linux/if_link.h  |   8 +++
 net/openvswitch/vport-vxlan.c |   9 ++-
 4 files changed, 195 insertions(+), 53 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4d52aa9..b148739 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -132,6 +132,7 @@ struct vxlan_dev {
 	__u8		  tos;		/* TOS override */
 	__u8		  ttl;
 	u32		  flags;	/* VXLAN_F_* in vxlan.h */
+	u32		  exts;		/* Enabled extensions */
 
 	struct work_struct sock_work;
 	struct work_struct igmp_join;
@@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
 			continue;
 
 		vh2 = (struct vxlanhdr *)(p->data + off_vx);
-		if (vh->vx_vni != vh2->vx_vni) {
+		if (vh->vx_flags != vh2->vx_flags ||
+		    vh->vx_vni != vh2->vx_vni) {
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
@@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct vxlan_sock *vs;
 	struct vxlanhdr *vxh;
+	struct vxlan_metadata md = {0};
 
 	/* Need Vxlan and inner Ethernet header to be present */
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1113,6 +1116,22 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (vs->exts) {
 		if (!vxh->vni_present)
 			goto error_invalid_header;
+
+		if (vxh->gbp_present) {
+			struct vxlanhdr_gbp *gbp;
+
+			if (!(vs->exts & VXLAN_EXT_GBP))
+				goto error_invalid_header;
+
+			gbp = (struct vxlanhdr_gbp *)vxh;
+			md.gbp = ntohs(gbp->policy_id);
+
+			if (gbp->dont_learn)
+				md.gbp |= VXLAN_GBP_DONT_LEARN;
+
+			if (gbp->policy_applied)
+				md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+		}
 	} else {
 		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
 		    (vxh->vx_vni & htonl(0xff)))
@@ -1122,7 +1141,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
 		goto drop;
 
-	vs->rcv(vs, skb, vxh->vx_vni);
+	md.vni = vxh->vx_vni;
+	vs->rcv(vs, skb, &md);
 	return 0;
 
 drop:
@@ -1138,8 +1158,8 @@ error:
 	return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs,
-		      struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct iphdr *oip = NULL;
 	struct ipv6hdr *oip6 = NULL;
@@ -1150,7 +1170,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 	int err = 0;
 	union vxlan_addr *remote_ip;
 
-	vni = ntohl(vx_vni) >> 8;
+	vni = ntohl(md->vni) >> 8;
 	/* Is this VNI defined? */
 	vxlan = vxlan_vs_find_vni(vs, vni);
 	if (!vxlan)
@@ -1184,6 +1204,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 		goto drop;
 
 	skb_reset_network_header(skb);
+	skb->mark = md->gbp;
 
 	if (oip6)
 		err = IP6_ECN_decapsulate(oip6, skb);
@@ -1533,15 +1554,57 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
+static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
+			   int min_headroom, struct vxlan_metadata *md)
+{
+	struct vxlanhdr *vxh;
+	int err;
+
+	/* Need space for new headers (invalidates iph ptr) */
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		return err;
+	}
+
+	skb = vlan_hwaccel_push_inside(skb);
+	if (WARN_ON(!skb))
+		return -ENOMEM;
+
+	vxh = (struct vxlanhdr *)__skb_push(skb, sizeof(*vxh));
+	vxh->vx_flags = htonl(VXLAN_FLAGS);
+	vxh->vx_vni = md->vni;
+
+	if (vs->exts)  {
+		if (vs->exts & VXLAN_EXT_GBP) {
+			struct vxlanhdr_gbp *gbp;
+
+			gbp = (struct vxlanhdr_gbp *)vxh;
+			vxh->gbp_present = 1;
+
+			if (md->gbp & VXLAN_GBP_DONT_LEARN)
+				gbp->dont_learn = 1;
+
+			if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
+				gbp->policy_applied = 1;
+
+			gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
+		}
+	}
+
+	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			   struct dst_entry *dst, struct sk_buff *skb,
 			   struct net_device *dev, struct in6_addr *saddr,
 			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
-			   __be16 src_port, __be16 dst_port, __be32 vni,
-			   bool xnet)
+			   __be16 src_port, __be16 dst_port,
+			   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
@@ -1558,24 +1621,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct ipv6hdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
-		goto err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb)) {
-		err = -ENOMEM;
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		goto err;
-	}
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
 			     ttl, src_port, dst_port);
@@ -1589,9 +1637,9 @@ err:
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
+		   __be16 src_port, __be16 dst_port,
+		   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
@@ -1604,22 +1652,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct iphdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		return err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb))
-		return -ENOMEM;
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
 				   ttl, df, src_port, dst_port, xnet);
@@ -1679,6 +1714,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	const struct iphdr *old_iph;
 	struct flowi4 fl4;
 	union vxlan_addr *dst;
+	struct vxlan_metadata md;
 	__be16 src_port = 0, dst_port;
 	u32 vni;
 	__be16 df = 0;
@@ -1749,11 +1785,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
 		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
 				     fl4.saddr, dst->sin.sin_addr.s_addr,
-				     tos, ttl, df, src_port, dst_port,
-				     htonl(vni << 8),
+				     tos, ttl, df, src_port, dst_port, &md,
 				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
 		if (err < 0) {
 			/* skb is already freed. */
@@ -1806,10 +1843,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		ttl = ttl ? : ip6_dst_hoplimit(ndst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
 				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
-				      src_port, dst_port, htonl(vni << 8),
+				      src_port, dst_port, &md,
 				      !net_eq(vxlan->net, dev_net(vxlan->dev)));
 #endif
 	}
@@ -2210,6 +2249,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_UDP_CSUM]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_EXTENSION]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
+	[IFLA_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -2246,6 +2290,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 		}
 	}
 
+	if (data[IFLA_VXLAN_EXTENSION]) {
+		int err;
+
+		err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
+					  IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
+		if (err < 0) {
+			pr_debug("invalid VXLAN extension configuration: %d\n",
+				 err);
+			return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
@@ -2400,6 +2456,18 @@ static void vxlan_sock_work(struct work_struct *work)
 	dev_put(vxlan->dev);
 }
 
+static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
+{
+	struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
+
+	/* Validated in vxlan_validate() */
+	if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
+		BUG();
+
+	if (exts[IFLA_VXLAN_EXT_GBP])
+		vxlan->exts |= VXLAN_EXT_GBP;
+}
+
 static int vxlan_newlink(struct net *net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[])
 {
@@ -2525,6 +2593,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
 		vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
 
+	if (data[IFLA_VXLAN_EXTENSION])
+		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
+
 	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
 			   vxlan->dst_port)) {
 		pr_info("duplicate VNI %u\n", vni);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 3e98d31..66ec53c 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -11,13 +11,62 @@
 #define VNI_HASH_BITS	10
 #define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
 
+/*
+ * VXLAN Group Based Policy Extension:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |1|-|-|-|1|-|-|-|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                VXLAN Network Identifier (VNI) |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * D = Don't Learn bit. When set, this bit indicates that the egress
+ *     VTEP MUST NOT learn the source address of the encapsulated frame.
+ *
+ * A = Indicates that the group policy has already been applied to
+ *     this packet. Policies MUST NOT be applied by devices when the
+ *     A bit is set.
+ *
+ * [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
+ */
+struct vxlanhdr_gbp {
+	__u8	vx_flags;
+#ifdef __LITTLE_ENDIAN_BITFIELD
+	__u8	reserved_flags1:3,
+		policy_applied:1,
+		reserved_flags2:2,
+		dont_learn:1,
+		reserved_flags3:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+	__u8	reserved_flags1:1,
+		dont_learn:1,
+		reserved_flags2:2,
+		policy_applied:1,
+		reserved_flags3:3;
+#else
+#error	"Please fix <asm/byteorder.h>"
+#endif
+	__be16	policy_id;
+	__be32	vx_vni;
+};
+
+/* skb->mark mapping
+ *
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|R|R|R|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+#define VXLAN_GBP_DONT_LEARN		(BIT(6) << 16)
+#define VXLAN_GBP_POLICY_APPLIED	(BIT(3) << 16)
+#define VXLAN_GBP_ID_MASK		(0xFFFF)
+
 /* VXLAN protocol header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |R|R|R|R|I|R|R|R|               Reserved                        |
+ * |G|R|R|R|I|R|R|R|               Reserved                        |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |                VXLAN Network Identifier (VNI) |   Reserved    |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
+ * G = 1	Group Policy (VXLAN-GBP)
  * I = 1	VXLAN Network Identifier (VNI) present
  */
 struct vxlanhdr {
@@ -26,9 +75,11 @@ struct vxlanhdr {
 #ifdef __LITTLE_ENDIAN_BITFIELD
 			__u8	reserved_flags1:3,
 				vni_present:1,
-				reserved_flags2:4;
+				reserved_flags2:3,
+				gbp_present:1;
 #elif defined(__BIG_ENDIAN_BITFIELD)
-			__u8	reserved_flags2:4,
+			__u8	gbp_present:1,
+				reserved_flags2:3,
 				vni_present:1,
 				reserved_flags1:3;
 #else
@@ -42,8 +93,16 @@ struct vxlanhdr {
 	__be32	vx_vni;
 };
 
+struct vxlan_metadata {
+	__be32		vni;
+	u32		gbp;
+};
+
 struct vxlan_sock;
-typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb, __be32 key);
+typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
+			   struct vxlan_metadata *md);
+
+#define VXLAN_EXT_GBP			BIT(0)
 
 /* per UDP socket information */
 struct vxlan_sock {
@@ -78,7 +137,8 @@ void vxlan_sock_release(struct vxlan_sock *vs);
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
+		   __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
+		   bool xnet);
 
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 						     netdev_features_t features)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..9f07bf5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -370,10 +370,18 @@ enum {
 	IFLA_VXLAN_UDP_CSUM,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+	IFLA_VXLAN_EXTENSION,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
 
+enum {
+	IFLA_VXLAN_EXT_UNSPEC,
+	IFLA_VXLAN_EXT_GBP,
+	__IFLA_VXLAN_EXT_MAX,
+};
+#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
+
 struct ifla_vxlan_port_range {
 	__be16	low;
 	__be16	high;
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index d7c46b3..dd68c97 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -59,7 +59,8 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
 }
 
 /* Called with rcu_read_lock and BH disabled. */
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct ovs_tunnel_info tun_info;
 	struct vport *vport = vs->data;
@@ -68,7 +69,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
 
 	/* Save outer tunnel values */
 	iph = ip_hdr(skb);
-	key = cpu_to_be64(ntohl(vx_vni) >> 8);
+	key = cpu_to_be64(ntohl(md->vni) >> 8);
 	ovs_flow_tun_info_init(&tun_info, iph,
 			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
 			       key, TUNNEL_KEY, NULL, 0);
@@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct vxlan_port *vxlan_port = vxlan_vport(vport);
 	__be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
 	struct ovs_key_ipv4_tunnel *tun_key;
+	struct vxlan_metadata md;
 	struct rtable *rt;
 	struct flowi4 fl;
 	__be16 src_port;
@@ -178,12 +180,13 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	skb->ignore_df = 1;
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
+	md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
 
 	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
 			     src_port, dst_port,
-			     htonl(be64_to_cpu(tun_key->tun_id) << 8),
+			     &md,
 			     false);
 	if (err < 0)
 		ip_rt_put(rt);
-- 
1.9.3

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-09 17:37   ` Alexei Starovoitov
@ 2015-01-09 22:10     ` Thomas Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Graf @ 2015-01-09 22:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	Tom Herbert, netdev, dev

On 01/09/15 at 09:37am, Alexei Starovoitov wrote:
> On Thu, Jan 8, 2015 at 2:47 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > +
> > +struct vxlan_gbp {
> > +} __packed;
> 
> empty struct ? seems unused.
> looks good to me otherwise.

Poor leftover, must feel all lonely there. Thanks for the reviews.
Will wait a little bit longer for more feedback and send out v3.

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-08 22:47 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
@ 2015-01-09 17:37   ` Alexei Starovoitov
  2015-01-09 22:10     ` Thomas Graf
  2015-01-12 17:37   ` Nicolas Dichtel
  2015-01-12 18:14   ` Tom Herbert
  2 siblings, 1 reply; 30+ messages in thread
From: Alexei Starovoitov @ 2015-01-09 17:37 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	Tom Herbert, netdev, dev

On Thu, Jan 8, 2015 at 2:47 PM, Thomas Graf <tgraf@suug.ch> wrote:
> +
> +struct vxlan_gbp {
> +} __packed;

empty struct ? seems unused.
looks good to me otherwise.

* [PATCH 2/6] vxlan: Group Policy extension
  2015-01-08 22:47 [PATCH 0/6 net-next v2] VXLAN Group Policy Extension Thomas Graf
@ 2015-01-08 22:47 ` Thomas Graf
  2015-01-09 17:37   ` Alexei Starovoitov
                     ` (2 more replies)
  0 siblings, 3 replies; 30+ messages in thread
From: Thomas Graf @ 2015-01-08 22:47 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar, therbert, alexei.starovoitov; +Cc: netdev, dev

Implements support for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata are mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
tc, etc.

The group membership is defined by the lower 16 bits of skb->mark, the
upper 16 bits are used for flags.

SELinux allows managing labels to secure local resources. However,
distributed applications require ACLs to be implemented across hosts. This
is typically achieved by matching on L2-L4 fields to identify the
original sending host and process on the receiver. On top of that,
netlabel and specifically CIPSO [1] allow mapping security contexts to
universal labels. However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

           Host 1:                       Host 2:

      Group A        Group B        Group B     Group A
      +-----+   +-------------+    +-------+   +-----+
      | lxc |   | SELinux CTX |    | httpd |   | VM  |
      +--+--+   +--+----------+    +---+---+   +--+--+
	  \---+---/                     \----+---/
	      |                              |
	  +---+---+                      +---+---+
	  | vxlan |                      | vxlan |
	  +---+---+                      +---+---+
	      +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A Linux VXLAN socket will
drop VXLAN-GBP frames. The extension is therefore disabled by default
and needs to be specifically enabled:

   ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.

Examples:
 iptables:
  host1# iptables -I OUTPUT -m owner --uid-owner 101 -j MARK --set-mark 0x200
  host2# iptables -I INPUT -m mark --mark 0x200 -j DROP

 OVS:
  # ovs-ofctl add-flow br0 'in_port=1,actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL'
  # ovs-ofctl add-flow br0 'in_port=2,tun_gbp_id=0x200,actions=drop'

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
v2:
 - split GBP header definition into separate struct vxlanhdr_gbp as requested
   by Alexei

 drivers/net/vxlan.c           | 161 ++++++++++++++++++++++++++++++------------
 include/net/vxlan.h           |  73 +++++++++++++++++--
 include/uapi/linux/if_link.h  |   8 +++
 net/openvswitch/vport-vxlan.c |   9 ++-
 4 files changed, 198 insertions(+), 53 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4d52aa9..b148739 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -132,6 +132,7 @@ struct vxlan_dev {
 	__u8		  tos;		/* TOS override */
 	__u8		  ttl;
 	u32		  flags;	/* VXLAN_F_* in vxlan.h */
+	u32		  exts;		/* Enabled extensions */
 
 	struct work_struct sock_work;
 	struct work_struct igmp_join;
@@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
 			continue;
 
 		vh2 = (struct vxlanhdr *)(p->data + off_vx);
-		if (vh->vx_vni != vh2->vx_vni) {
+		if (vh->vx_flags != vh2->vx_flags ||
+		    vh->vx_vni != vh2->vx_vni) {
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
@@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct vxlan_sock *vs;
 	struct vxlanhdr *vxh;
+	struct vxlan_metadata md = {0};
 
 	/* Need Vxlan and inner Ethernet header to be present */
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1113,6 +1116,22 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (vs->exts) {
 		if (!vxh->vni_present)
 			goto error_invalid_header;
+
+		if (vxh->gbp_present) {
+			struct vxlanhdr_gbp *gbp;
+
+			if (!(vs->exts & VXLAN_EXT_GBP))
+				goto error_invalid_header;
+
+			gbp = (struct vxlanhdr_gbp *)vxh;
+			md.gbp = ntohs(gbp->policy_id);
+
+			if (gbp->dont_learn)
+				md.gbp |= VXLAN_GBP_DONT_LEARN;
+
+			if (gbp->policy_applied)
+				md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+		}
 	} else {
 		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
 		    (vxh->vx_vni & htonl(0xff)))
@@ -1122,7 +1141,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
 		goto drop;
 
-	vs->rcv(vs, skb, vxh->vx_vni);
+	md.vni = vxh->vx_vni;
+	vs->rcv(vs, skb, &md);
 	return 0;
 
 drop:
@@ -1138,8 +1158,8 @@ error:
 	return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs,
-		      struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct iphdr *oip = NULL;
 	struct ipv6hdr *oip6 = NULL;
@@ -1150,7 +1170,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 	int err = 0;
 	union vxlan_addr *remote_ip;
 
-	vni = ntohl(vx_vni) >> 8;
+	vni = ntohl(md->vni) >> 8;
 	/* Is this VNI defined? */
 	vxlan = vxlan_vs_find_vni(vs, vni);
 	if (!vxlan)
@@ -1184,6 +1204,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 		goto drop;
 
 	skb_reset_network_header(skb);
+	skb->mark = md->gbp;
 
 	if (oip6)
 		err = IP6_ECN_decapsulate(oip6, skb);
@@ -1533,15 +1554,57 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
+static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
+			   int min_headroom, struct vxlan_metadata *md)
+{
+	struct vxlanhdr *vxh;
+	int err;
+
+	/* Need space for new headers (invalidates iph ptr) */
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		return err;
+	}
+
+	skb = vlan_hwaccel_push_inside(skb);
+	if (WARN_ON(!skb))
+		return -ENOMEM;
+
+	vxh = (struct vxlanhdr *)__skb_push(skb, sizeof(*vxh));
+	vxh->vx_flags = htonl(VXLAN_FLAGS);
+	vxh->vx_vni = md->vni;
+
+	if (vs->exts)  {
+		if (vs->exts & VXLAN_EXT_GBP) {
+			struct vxlanhdr_gbp *gbp;
+
+			gbp = (struct vxlanhdr_gbp *)vxh;
+			vxh->gbp_present = 1;
+
+			if (md->gbp & VXLAN_GBP_DONT_LEARN)
+				gbp->dont_learn = 1;
+
+			if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
+				gbp->policy_applied = 1;
+
+			gbp->policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
+		}
+	}
+
+	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			   struct dst_entry *dst, struct sk_buff *skb,
 			   struct net_device *dev, struct in6_addr *saddr,
 			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
-			   __be16 src_port, __be16 dst_port, __be32 vni,
-			   bool xnet)
+			   __be16 src_port, __be16 dst_port,
+			   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
@@ -1558,24 +1621,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct ipv6hdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
-		goto err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb)) {
-		err = -ENOMEM;
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		goto err;
-	}
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
 			     ttl, src_port, dst_port);
@@ -1589,9 +1637,9 @@ err:
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
+		   __be16 src_port, __be16 dst_port,
+		   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
@@ -1604,22 +1652,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct iphdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		return err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb))
-		return -ENOMEM;
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
 				   ttl, df, src_port, dst_port, xnet);
@@ -1679,6 +1714,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	const struct iphdr *old_iph;
 	struct flowi4 fl4;
 	union vxlan_addr *dst;
+	struct vxlan_metadata md;
 	__be16 src_port = 0, dst_port;
 	u32 vni;
 	__be16 df = 0;
@@ -1749,11 +1785,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
 		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
 				     fl4.saddr, dst->sin.sin_addr.s_addr,
-				     tos, ttl, df, src_port, dst_port,
-				     htonl(vni << 8),
+				     tos, ttl, df, src_port, dst_port, &md,
 				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
 		if (err < 0) {
 			/* skb is already freed. */
@@ -1806,10 +1843,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		ttl = ttl ? : ip6_dst_hoplimit(ndst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
 				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
-				      src_port, dst_port, htonl(vni << 8),
+				      src_port, dst_port, &md,
 				      !net_eq(vxlan->net, dev_net(vxlan->dev)));
 #endif
 	}
@@ -2210,6 +2249,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_UDP_CSUM]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_EXTENSION]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
+	[IFLA_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -2246,6 +2290,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 		}
 	}
 
+	if (data[IFLA_VXLAN_EXTENSION]) {
+		int err;
+
+		err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
+					  IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
+		if (err < 0) {
+			pr_debug("invalid VXLAN extension configuration: %d\n",
+				 err);
+			return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
@@ -2400,6 +2456,18 @@ static void vxlan_sock_work(struct work_struct *work)
 	dev_put(vxlan->dev);
 }
 
+static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
+{
+	struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
+
+	/* Validated in vxlan_validate() */
+	if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
+		BUG();
+
+	if (exts[IFLA_VXLAN_EXT_GBP])
+		vxlan->exts |= VXLAN_EXT_GBP;
+}
+
 static int vxlan_newlink(struct net *net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[])
 {
@@ -2525,6 +2593,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
 		vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
 
+	if (data[IFLA_VXLAN_EXTENSION])
+		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
+
 	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
 			   vxlan->dst_port)) {
 		pr_info("duplicate VNI %u\n", vni);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 3e98d31..af0526b 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -11,13 +11,65 @@
 #define VNI_HASH_BITS	10
 #define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
 
+/*
+ * VXLAN Group Based Policy Extension:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |1|-|-|-|1|-|-|-|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                VXLAN Network Identifier (VNI) |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * D = Don't Learn bit. When set, this bit indicates that the egress
+ *     VTEP MUST NOT learn the source address of the encapsulated frame.
+ *
+ * A = Indicates that the group policy has already been applied to
+ *     this packet. Policies MUST NOT be applied by devices when the
+ *     A bit is set.
+ *
+ * [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
+ */
+struct vxlanhdr_gbp {
+	__u8	vx_flags;
+#ifdef __LITTLE_ENDIAN_BITFIELD
+	__u8	reserved_flags1:3,
+		policy_applied:1,
+		reserved_flags2:2,
+		dont_learn:1,
+		reserved_flags3:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+	__u8	reserved_flags1:1,
+		dont_learn:1,
+		reserved_flags2:2,
+		policy_applied:1,
+		reserved_flags3:3;
+#else
+#error	"Please fix <asm/byteorder.h>"
+#endif
+	__be16	policy_id;
+	__be32	vx_vni;
+};
+
+struct vxlan_gbp {
+} __packed;
+
+/* skb->mark mapping
+ *
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|R|R|R|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+#define VXLAN_GBP_DONT_LEARN		(BIT(6) << 16)
+#define VXLAN_GBP_POLICY_APPLIED	(BIT(3) << 16)
+#define VXLAN_GBP_ID_MASK		(0xFFFF)
+
 /* VXLAN protocol header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |R|R|R|R|I|R|R|R|               Reserved                        |
+ * |G|R|R|R|I|R|R|R|               Reserved                        |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |                VXLAN Network Identifier (VNI) |   Reserved    |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
+ * G = 1	Group Policy (VXLAN-GBP)
  * I = 1	VXLAN Network Identifier (VNI) present
  */
 struct vxlanhdr {
@@ -26,9 +78,11 @@ struct vxlanhdr {
 #ifdef __LITTLE_ENDIAN_BITFIELD
 			__u8	reserved_flags1:3,
 				vni_present:1,
-				reserved_flags2:4;
+				reserved_flags2:3,
+				gbp_present:1;
 #elif defined(__BIG_ENDIAN_BITFIELD)
-			__u8	reserved_flags2:4,
+			__u8	gbp_present:1,
+				reserved_flags2:3,
 				vni_present:1,
 				reserved_flags1:3;
 #else
@@ -42,8 +96,16 @@ struct vxlanhdr {
 	__be32	vx_vni;
 };
 
+struct vxlan_metadata {
+	__be32		vni;
+	u32		gbp;
+};
+
 struct vxlan_sock;
-typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb, __be32 key);
+typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
+			   struct vxlan_metadata *md);
+
+#define VXLAN_EXT_GBP			BIT(0)
 
 /* per UDP socket information */
 struct vxlan_sock {
@@ -78,7 +140,8 @@ void vxlan_sock_release(struct vxlan_sock *vs);
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
+		   __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
+		   bool xnet);
 
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 						     netdev_features_t features)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..9f07bf5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -370,10 +370,18 @@ enum {
 	IFLA_VXLAN_UDP_CSUM,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+	IFLA_VXLAN_EXTENSION,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
 
+enum {
+	IFLA_VXLAN_EXT_UNSPEC,
+	IFLA_VXLAN_EXT_GBP,
+	__IFLA_VXLAN_EXT_MAX,
+};
+#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
+
 struct ifla_vxlan_port_range {
 	__be16	low;
 	__be16	high;
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index d7c46b3..dd68c97 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -59,7 +59,8 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
 }
 
 /* Called with rcu_read_lock and BH disabled. */
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct ovs_tunnel_info tun_info;
 	struct vport *vport = vs->data;
@@ -68,7 +69,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
 
 	/* Save outer tunnel values */
 	iph = ip_hdr(skb);
-	key = cpu_to_be64(ntohl(vx_vni) >> 8);
+	key = cpu_to_be64(ntohl(md->vni) >> 8);
 	ovs_flow_tun_info_init(&tun_info, iph,
 			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
 			       key, TUNNEL_KEY, NULL, 0);
@@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct vxlan_port *vxlan_port = vxlan_vport(vport);
 	__be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
 	struct ovs_key_ipv4_tunnel *tun_key;
+	struct vxlan_metadata md;
 	struct rtable *rt;
 	struct flowi4 fl;
 	__be16 src_port;
@@ -178,12 +180,13 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	skb->ignore_df = 1;
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
+	md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
 
 	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
 			     src_port, dst_port,
-			     htonl(be64_to_cpu(tun_key->tun_id) << 8),
+			     &md,
 			     false);
 	if (err < 0)
 		ip_rt_put(rt);
-- 
1.9.3


* Re: [PATCH 2/6] vxlan: Group Policy extension
       [not found]           ` <CA+mtBx_A_M3+irq7w4nNCyPZBgM7ja+wfJT4w4Q0Yo6GMGYVgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-01-07 17:21             ` Thomas Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Graf @ 2015-01-07 17:21 UTC (permalink / raw)
  To: Tom Herbert
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Linux Netdev List, Stephen Hemminger,
	David Miller

On 01/07/15 at 08:56am, Tom Herbert wrote:
> On Wed, Jan 7, 2015 at 8:21 AM, Thomas Graf <tgraf@suug.ch> wrote:
> > If the VNI is not already used for another purpose, yes. The solution
> > as proposed can be integrated into existing VXLAN overlays separated by
> > VNI. It is also compatible with hardware VXLAN VTEPs which ignore the
> > reserved bits while continuing to maintain VNI separation.
> 
> It seems like it should be relatively easy to group VNIs together to
> have the same mark with the current use of VNI. This works up to the
> point that all packets corresponding to a single VNI get the same
> mark.

This really depends on the network architecture and assumes that you
can remap the VNIs in the entire network. You might want to run L3
with group definitions across multiple L2 VNI segments. A second issue
is that many hardware VXLAN VTEPs do VNI based learning and will run
into capacity limits.

I'm not saying it's impossible but it's very tricky to integrate if
you can't start from scratch. The whole point of this is to come up
with something that is painfully easy to use and integrate without
requiring much change.

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07 16:21       ` Thomas Graf
@ 2015-01-07 16:56         ` Tom Herbert
       [not found]           ` <CA+mtBx_A_M3+irq7w4nNCyPZBgM7ja+wfJT4w4Q0Yo6GMGYVgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Tom Herbert @ 2015-01-07 16:56 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev

On Wed, Jan 7, 2015 at 8:21 AM, Thomas Graf <tgraf@suug.ch> wrote:
> On 01/07/15 at 08:05am, Tom Herbert wrote:
>> Associating a sixteen bit field with security is worrisome, especially
>> considering that VXLAN provides no verification for any header fields
>> and doesn't even advocate use of outer UDP checksum so the field is
>> susceptible to an undetected single bit flip. The concept of a
>> "trusted underlay" is weak justification and hardly universal, so the
>> only way to actually secure this is through IPsec (this is mentioned
> >> in the VXLAN-GBP draft).
>
> As you state correctly, this work requires a trusted underlay which can
> be achieved with IPsec, OpenVPN, SSH, ...
>
This can't be enforced. There's already a lot of deployment of VXLAN
in non-trusted networks, and there's nothing to prevent someone from
using this feature in those environments. Maybe there's an argument
that VXLAN already fundamentally lacks security and verification, so
adding this field might not make things worse :-/.

>> But if we have the security state of IPsec then why would we need
>> this field anyway?
>
> It's a separation of concerns: the security label mechanism of the
> overlay should not depend on an eventual encryption layer in the
> underlay as not all of them provide a mechanism to label packets.
>
>> Could this same functionality be achieved if we just match the VNI to
>> a mark in IP tables?
>
> If the VNI is not already used for another purpose, yes. The solution
> as proposed can be integrated into existing VXLAN overlays separated by
> VNI. It is also compatible with hardware VXLAN VTEPs which ignore the
> reserved bits while continuing to maintain VNI separation.

It seems like it should be relatively easy to group VNIs together to
have the same mark with the current use of VNI. This works up to the
point that all packets corresponding to a single VNI get the same
mark.

Tom
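
A minimal sketch of the VNI-based alternative discussed above: a static
table that maps several VNIs to a shared mark. The table contents, struct
vni_mark and mark_for_vni are all hypothetical and only show the shape of
the idea; they are not taken from any existing implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mapping entry: one VNI, one mark. */
struct vni_mark {
	uint32_t vni;
	uint32_t mark;
};

/* Example values only: VNIs 100 and 101 share the mark 0x200. */
static const struct vni_mark vni_marks[] = {
	{ 100, 0x200 },
	{ 101, 0x200 },
	{ 200, 0x300 },
};

/* Return the mark configured for a VNI, or 0 if none is configured. */
static uint32_t mark_for_vni(uint32_t vni)
{
	size_t i;

	for (i = 0; i < sizeof(vni_marks) / sizeof(vni_marks[0]); i++) {
		if (vni_marks[i].vni == vni)
			return vni_marks[i].mark;
	}
	return 0;
}
```

The lookup depends only on the VNI, so every packet within one VNI ends up
with the same mark, which is the limit noted in the paragraph above.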


* Re: [PATCH 2/6] vxlan: Group Policy extension
       [not found]     ` <CA+mtBx_Jj-tUM1nbHd2fHb0-=QpK3tcQgA=smWmg=cB-fupdGg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2015-01-07 16:21       ` Thomas Graf
  2015-01-07 16:56         ` Tom Herbert
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Graf @ 2015-01-07 16:21 UTC (permalink / raw)
  To: Tom Herbert
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, Linux Netdev List, Stephen Hemminger,
	David Miller

On 01/07/15 at 08:05am, Tom Herbert wrote:
> Associating a sixteen bit field with security is worrisome, especially
> considering that VXLAN provides no verification for any header fields
> and doesn't even advocate use of outer UDP checksum so the field is
> susceptible to an undetected single bit flip. The concept of a
> "trusted underlay" is weak justification and hardly universal, so the
> only way to actually secure this is through IPsec (this is mentioned
> in the VXLAN-GBP draft).

As you state correctly, this work requires a trusted underlay which can
be achieved with IPsec, OpenVPN, SSH, ...

> But if we have the security state of IPsec then why would we need
> this field anyway?

It's a separation of concerns: the security label mechanism of the
overlay should not depend on an eventual encryption layer in the
underlay as not all of them provide a mechanism to label packets.
 
> Could this same functionality be achieved if we just match the VNI to
> a mark in IP tables?

If the VNI is not already used for another purpose, yes. The solution
as proposed can be integrated into existing VXLAN overlays separated by
VNI. It is also compatible with hardware VXLAN VTEPs which ignore the
reserved bits while continuing to maintain VNI separation.

* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07  2:05 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
@ 2015-01-07 16:05   ` Tom Herbert
       [not found]     ` <CA+mtBx_Jj-tUM1nbHd2fHb0-=QpK3tcQgA=smWmg=cB-fupdGg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 30+ messages in thread
From: Tom Herbert @ 2015-01-07 16:05 UTC (permalink / raw)
  To: Thomas Graf
  Cc: David Miller, Jesse Gross, Stephen Hemminger, Pravin B Shelar,
	Linux Netdev List, dev

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> Implements support for the Group Policy VXLAN extension [0] to provide
> a lightweight and simple security label mechanism across network peers
> based on VXLAN. The security context and associated metadata are mapped
> to/from skb->mark. This allows further mapping to a SELinux context
> using SECMARK, to implement ACLs directly with nftables, iptables, OVS,
> tc, etc.
>
> The group membership is defined by the lower 16 bits of skb->mark, the
> upper 16 bits are used for flags.
>
> SELinux allows managing labels to secure local resources. However,
> distributed applications require ACLs to be implemented across hosts. This
> is typically achieved by matching on L2-L4 fields to identify the
> original sending host and process on the receiver. On top of that,
> netlabel and specifically CIPSO [1] allow mapping security contexts to
> universal labels. However, netlabel and CIPSO are relatively complex.
> This patch provides a lightweight alternative for overlay network
> environments with a trusted underlay. No additional control protocol
> is required.
>
Associating a sixteen bit field with security is worrisome, especially
considering that VXLAN provides no verification for any header fields
and doesn't even advocate use of outer UDP checksum so the field is
susceptible to an undetected single bit flip. The concept of a
"trusted underlay" is weak justification and hardly universal, so the
only way to actually secure this is through IPsec (this is mentioned
in the VXLAN-GBP draft). But if we have the security state of IPsec
then why would we need this field anyway?
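
To make the bit-flip concern concrete, here is a small sketch; it is not
kernel code. It treats the 8-byte VXLAN header as four 16-bit words, flips
one bit in the group ID and compares a plain 16-bit ones' complement sum
before and after. A real UDP checksum also covers the pseudo-header, the
UDP header and the payload. The helper names and the example header values
(group 0x200, VNI 42) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit ones' complement checksum over n words (Internet checksum style). */
static uint16_t csum16(const uint16_t *words, size_t n)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < n; i++)
		sum += words[i];

	/* Fold the carries back into the low 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xFFFFu) + (sum >> 16);

	return (uint16_t)~sum;
}

/* Flip a single bit (0..15) in one 16-bit header word. */
static void flip_bit(uint16_t *word, unsigned int bit)
{
	*word ^= (uint16_t)(1u << bit);
}
```

For the header words { 0x8800, 0x0200, 0x0000, 0x2A00 }, flipping bit 0 of
the second word turns group 0x200 into 0x201. Without a checksum the
receiver has no way to notice; with one, the sum over the header no longer
matches the value computed by the sender.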

Could this same functionality be achieved if we just match the VNI to
a mark in IP tables?

Tom

>            Host 1:                       Host 2:
>
>       Group A        Group B        Group B     Group A
>       +-----+   +-------------+    +-------+   +-----+
>       | lxc |   | SELinux CTX |    | httpd |   | VM  |
>       +--+--+   +--+----------+    +---+---+   +--+--+
>           \---+---/                     \----+---/
>               |                              |
>           +---+---+                      +---+---+
>           | vxlan |                      | vxlan |
>           +---+---+                      +---+---+
>               +------------------------------+
>
> Backwards compatibility:
> A VXLAN-GBP socket can receive standard VXLAN frames and will assign
> the default group 0x0000 to such frames. A Linux VXLAN socket will
> drop VXLAN-GBP frames. The extension is therefore disabled by default
> and needs to be specifically enabled:
>
>    ip link add [...] type vxlan [...] gbp
>
> In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
> must run on a separate port number.
>
> Examples:
>   iptables:
>   $ iptables -I OUTPUT -p icmp -j MARK --set-mark 0x200
>   $ iptables -I INPUT -i br0 -m mark --mark 0x200 -j ACCEPT
>
>   OVS (patches provided separately):
>   in_port=1, actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL
>
> [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
> [1] http://lwn.net/Articles/204905/
>
> Signed-off-by: Thomas Graf <tgraf@suug.ch>
> ---
>  drivers/net/vxlan.c           | 155 ++++++++++++++++++++++++++++++------------
>  include/net/vxlan.h           |  80 ++++++++++++++++++++--
>  include/uapi/linux/if_link.h  |   8 +++
>  net/openvswitch/vport-vxlan.c |   9 ++-
>  4 files changed, 197 insertions(+), 55 deletions(-)
>
> diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
> index 4d52aa9..30b7b59 100644
> --- a/drivers/net/vxlan.c
> +++ b/drivers/net/vxlan.c
> @@ -132,6 +132,7 @@ struct vxlan_dev {
>         __u8              tos;          /* TOS override */
>         __u8              ttl;
>         u32               flags;        /* VXLAN_F_* in vxlan.h */
> +       u32               exts;         /* Enabled extensions */
>
>         struct work_struct sock_work;
>         struct work_struct igmp_join;
> @@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
>                         continue;
>
>                 vh2 = (struct vxlanhdr *)(p->data + off_vx);
> -               if (vh->vx_vni != vh2->vx_vni) {
> +               if (vh->vx_flags != vh2->vx_flags ||
> +                   vh->vx_vni != vh2->vx_vni) {
>                         NAPI_GRO_CB(p)->same_flow = 0;
>                         continue;
>                 }
> @@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>  {
>         struct vxlan_sock *vs;
>         struct vxlanhdr *vxh;
> +       struct vxlan_metadata md = {0};
>
>         /* Need Vxlan and inner Ethernet header to be present */
>         if (!pskb_may_pull(skb, VXLAN_HLEN))
> @@ -1113,6 +1116,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>         if (vs->exts) {
>                 if (!vxh->vni_present)
>                         goto error_invalid_header;
> +
> +               if (vxh->gbp_present) {
> +                       if (!(vs->exts & VXLAN_EXT_GBP))
> +                               goto error_invalid_header;
> +
> +                       md.gbp = ntohs(vxh->gbp.policy_id);
> +
> +                       if (vxh->gbp.dont_learn)
> +                               md.gbp |= VXLAN_GBP_DONT_LEARN;
> +
> +                       if (vxh->gbp.policy_applied)
> +                               md.gbp |= VXLAN_GBP_POLICY_APPLIED;
> +               }
>         } else {
>                 if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
>                     (vxh->vx_vni & htonl(0xff)))
> @@ -1122,7 +1138,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
>         if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
>                 goto drop;
>
> -       vs->rcv(vs, skb, vxh->vx_vni);
> +       md.vni = vxh->vx_vni;
> +       vs->rcv(vs, skb, &md);
>         return 0;
>
>  drop:
> @@ -1138,8 +1155,8 @@ error:
>         return 1;
>  }
>
> -static void vxlan_rcv(struct vxlan_sock *vs,
> -                     struct sk_buff *skb, __be32 vx_vni)
> +static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
> +                     struct vxlan_metadata *md)
>  {
>         struct iphdr *oip = NULL;
>         struct ipv6hdr *oip6 = NULL;
> @@ -1150,7 +1167,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>         int err = 0;
>         union vxlan_addr *remote_ip;
>
> -       vni = ntohl(vx_vni) >> 8;
> +       vni = ntohl(md->vni) >> 8;
>         /* Is this VNI defined? */
>         vxlan = vxlan_vs_find_vni(vs, vni);
>         if (!vxlan)
> @@ -1184,6 +1201,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
>                 goto drop;
>
>         skb_reset_network_header(skb);
> +       skb->mark = md->gbp;
>
>         if (oip6)
>                 err = IP6_ECN_decapsulate(oip6, skb);
> @@ -1533,15 +1551,54 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
>         return false;
>  }
>
> +static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
> +                          int min_headroom, struct vxlan_metadata *md)
> +{
> +       struct vxlanhdr *vxh;
> +       int err;
> +
> +       /* Need space for new headers (invalidates iph ptr) */
> +       err = skb_cow_head(skb, min_headroom);
> +       if (unlikely(err)) {
> +               kfree_skb(skb);
> +               return err;
> +       }
> +
> +       skb = vlan_hwaccel_push_inside(skb);
> +       if (WARN_ON(!skb))
> +               return -ENOMEM;
> +
> +       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> +       vxh->vx_flags = htonl(VXLAN_FLAGS);
> +       vxh->vx_vni = md->vni;
> +
> +       if (vs->exts)  {
> +               if (vs->exts & VXLAN_EXT_GBP) {
> +                       vxh->gbp_present = 1;
> +
> +                       if (md->gbp & VXLAN_GBP_DONT_LEARN)
> +                               vxh->gbp.dont_learn = 1;
> +
> +                       if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
> +                               vxh->gbp.policy_applied = 1;
> +
> +                       vxh->gbp.policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
> +               }
> +       }
> +
> +       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
> +
> +       return 0;
> +}
> +
>  #if IS_ENABLED(CONFIG_IPV6)
>  static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>                            struct dst_entry *dst, struct sk_buff *skb,
>                            struct net_device *dev, struct in6_addr *saddr,
>                            struct in6_addr *daddr, __u8 prio, __u8 ttl,
> -                          __be16 src_port, __be16 dst_port, __be32 vni,
> -                          bool xnet)
> +                          __be16 src_port, __be16 dst_port,
> +                          struct vxlan_metadata *md, bool xnet)
>  {
> -       struct vxlanhdr *vxh;
>         int min_headroom;
>         int err;
>         bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
> @@ -1558,24 +1615,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
>                         + VXLAN_HLEN + sizeof(struct ipv6hdr)
>                         + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -       /* Need space for new headers (invalidates iph ptr) */
> -       err = skb_cow_head(skb, min_headroom);
> -       if (unlikely(err)) {
> -               kfree_skb(skb);
> -               goto err;
> -       }
> -
> -       skb = vlan_hwaccel_push_inside(skb);
> -       if (WARN_ON(!skb)) {
> -               err = -ENOMEM;
> +       err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +       if (err)
>                 goto err;
> -       }
> -
> -       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -       vxh->vx_flags = htonl(VXLAN_FLAGS);
> -       vxh->vx_vni = vni;
> -
> -       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>         udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
>                              ttl, src_port, dst_port);
> @@ -1589,9 +1631,9 @@ err:
>  int vxlan_xmit_skb(struct vxlan_sock *vs,
>                    struct rtable *rt, struct sk_buff *skb,
>                    __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
> -                  __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
> +                  __be16 src_port, __be16 dst_port,
> +                  struct vxlan_metadata *md, bool xnet)
>  {
> -       struct vxlanhdr *vxh;
>         int min_headroom;
>         int err;
>         bool udp_sum = !vs->sock->sk->sk_no_check_tx;
> @@ -1604,22 +1646,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
>                         + VXLAN_HLEN + sizeof(struct iphdr)
>                         + (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
>
> -       /* Need space for new headers (invalidates iph ptr) */
> -       err = skb_cow_head(skb, min_headroom);
> -       if (unlikely(err)) {
> -               kfree_skb(skb);
> +       err = vxlan_build_hdr(skb, vs, min_headroom, md);
> +       if (err)
>                 return err;
> -       }
> -
> -       skb = vlan_hwaccel_push_inside(skb);
> -       if (WARN_ON(!skb))
> -               return -ENOMEM;
> -
> -       vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
> -       vxh->vx_flags = htonl(VXLAN_FLAGS);
> -       vxh->vx_vni = vni;
> -
> -       skb_set_inner_protocol(skb, htons(ETH_P_TEB));
>
>         return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
>                                    ttl, df, src_port, dst_port, xnet);
> @@ -1679,6 +1708,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>         const struct iphdr *old_iph;
>         struct flowi4 fl4;
>         union vxlan_addr *dst;
> +       struct vxlan_metadata md;
>         __be16 src_port = 0, dst_port;
>         u32 vni;
>         __be16 df = 0;
> @@ -1749,11 +1779,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>
>                 tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
>                 ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
> +               md.vni = htonl(vni << 8);
> +               md.gbp = skb->mark;
>
>                 err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
>                                      fl4.saddr, dst->sin.sin_addr.s_addr,
> -                                    tos, ttl, df, src_port, dst_port,
> -                                    htonl(vni << 8),
> +                                    tos, ttl, df, src_port, dst_port, &md,
>                                      !net_eq(vxlan->net, dev_net(vxlan->dev)));
>                 if (err < 0) {
>                         /* skb is already freed. */
> @@ -1806,10 +1837,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
>                 }
>
>                 ttl = ttl ? : ip6_dst_hoplimit(ndst);
> +               md.vni = htonl(vni << 8);
> +               md.gbp = skb->mark;
>
>                 err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
>                                       dev, &fl6.saddr, &fl6.daddr, 0, ttl,
> -                                     src_port, dst_port, htonl(vni << 8),
> +                                     src_port, dst_port, &md,
>                                       !net_eq(vxlan->net, dev_net(vxlan->dev)));
>  #endif
>         }
> @@ -2210,6 +2243,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
>         [IFLA_VXLAN_UDP_CSUM]   = { .type = NLA_U8 },
>         [IFLA_VXLAN_UDP_ZERO_CSUM6_TX]  = { .type = NLA_U8 },
>         [IFLA_VXLAN_UDP_ZERO_CSUM6_RX]  = { .type = NLA_U8 },
> +       [IFLA_VXLAN_EXTENSION]  = { .type = NLA_NESTED },
> +};
> +
> +static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
> +       [IFLA_VXLAN_EXT_GBP]    = { .type = NLA_FLAG, },
>  };
>
>  static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
> @@ -2246,6 +2284,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
>                 }
>         }
>
> +       if (data[IFLA_VXLAN_EXTENSION]) {
> +               int err;
> +
> +               err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
> +                                         IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
> +               if (err < 0) {
> +                       pr_debug("invalid VXLAN extension configuration: %d\n",
> +                                err);
> +                       return -EINVAL;
> +               }
> +       }
> +
>         return 0;
>  }
>
> @@ -2400,6 +2450,18 @@ static void vxlan_sock_work(struct work_struct *work)
>         dev_put(vxlan->dev);
>  }
>
> +static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
> +{
> +       struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
> +
> +       /* Validated in vxlan_validate() */
> +       if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
> +               BUG();
> +
> +       if (exts[IFLA_VXLAN_EXT_GBP])
> +               vxlan->exts |= VXLAN_EXT_GBP;
> +}
> +
>  static int vxlan_newlink(struct net *net, struct net_device *dev,
>                          struct nlattr *tb[], struct nlattr *data[])
>  {
> @@ -2525,6 +2587,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
>             nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
>                 vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
>
> +       if (data[IFLA_VXLAN_EXTENSION])
> +               configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
> +
>         if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
>                            vxlan->dst_port)) {
>                 pr_info("duplicate VNI %u\n", vni);
> diff --git a/include/net/vxlan.h b/include/net/vxlan.h
> index 3e98d31..66000d0 100644
> --- a/include/net/vxlan.h
> +++ b/include/net/vxlan.h
> @@ -11,13 +11,60 @@
>  #define VNI_HASH_BITS  10
>  #define VNI_HASH_SIZE  (1<<VNI_HASH_BITS)
>
> +/*
> + * VXLAN Group Based Policy Extension:
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |1|-|-|-|1|-|-|-|R|D|R|R|A|R|R|R|        Group Policy ID        |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |                VXLAN Network Identifier (VNI) |   Reserved    |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + *
> + * D = Don't Learn bit. When set, this bit indicates that the egress
> + *     VTEP MUST NOT learn the source address of the encapsulated frame.
> + *
> + * A = Indicates that the group policy has already been applied to
> + *     this packet. Policies MUST NOT be applied by devices when the
> + *     A bit is set.
> + *
> + * [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
> + */
> +struct vxlan_gbp {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> +       __u8    reserved_flags1:3,
> +               policy_applied:1,
> +               reserved_flags2:2,
> +               dont_learn:1,
> +               reserved_flags3:1;
> +#elif defined(__BIG_ENDIAN_BITFIELD)
> +       __u8    reserved_flags1:1,
> +               dont_learn:1,
> +               reserved_flags2:2,
> +               policy_applied:1,
> +               reserved_flags3:3;
> +#else
> +#error "Please fix <asm/byteorder.h>"
> +#endif
> +       __be16 policy_id;
> +} __packed;
> +
> +/* skb->mark mapping
> + *
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + * |R|R|R|R|R|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
> + * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> + */
> +#define VXLAN_GBP_DONT_LEARN           (BIT(6) << 16)
> +#define VXLAN_GBP_POLICY_APPLIED       (BIT(3) << 16)
> +#define VXLAN_GBP_ID_MASK              (0xFFFF)
> +
>  /* VXLAN protocol header:
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
> - * |R|R|R|R|I|R|R|R|               Reserved                        |
> + * |G|R|R|R|I|R|R|R|               Reserved                        |
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   * |                VXLAN Network Identifier (VNI) |   Reserved    |
>   * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
>   *
> + * G = 1       Group Policy (VXLAN-GBP)
>   * I = 1       VXLAN Network Identifier (VNI) present
>   */
>  struct vxlanhdr {
> @@ -26,24 +73,42 @@ struct vxlanhdr {
>  #ifdef __LITTLE_ENDIAN_BITFIELD
>                         __u8    reserved_flags1:3,
>                                 vni_present:1,
> -                               reserved_flags2:4;
> +                               reserved_flags2:3,
> +                               gbp_present:1;
>  #elif defined(__BIG_ENDIAN_BITFIELD)
> -                       __u8    reserved_flags2:4,
> +                       __u8    gbp_present:1,
> +                               reserved_flags2:3,
>                                 vni_present:1,
>                                 reserved_flags1:3;
>  #else
>  #error "Please fix <asm/byteorder.h>"
>  #endif
> -                       __u8    vx_reserved1;
> -                       __be16  vx_reserved2;
> +                       union {
> +                               /* NOTE: Offset 0 will be 1 byte aligned, so
> +                                * all member structs must be marked packed.
> +                                */
> +                               struct vxlan_gbp gbp;
> +                               struct {
> +                                       __u8    vx_reserved1;
> +                                       __be16  vx_reserved2;
> +                               } __packed;
> +                       };
>                 };
>                 __be32 vx_flags;
>         };
>         __be32  vx_vni;
>  };
>
> +struct vxlan_metadata {
> +       __be32          vni;
> +       u32             gbp;
> +};
> +
>  struct vxlan_sock;
> -typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb, __be32 key);
> +typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
> +                          struct vxlan_metadata *md);
> +
> +#define VXLAN_EXT_GBP                  BIT(0)
>
>  /* per UDP socket information */
>  struct vxlan_sock {
> @@ -78,7 +143,8 @@ void vxlan_sock_release(struct vxlan_sock *vs);
>  int vxlan_xmit_skb(struct vxlan_sock *vs,
>                    struct rtable *rt, struct sk_buff *skb,
>                    __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
> -                  __be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
> +                  __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
> +                  bool xnet);
>
>  static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
>                                                      netdev_features_t features)
> diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
> index f7d0d2d..9f07bf5 100644
> --- a/include/uapi/linux/if_link.h
> +++ b/include/uapi/linux/if_link.h
> @@ -370,10 +370,18 @@ enum {
>         IFLA_VXLAN_UDP_CSUM,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
>         IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
> +       IFLA_VXLAN_EXTENSION,
>         __IFLA_VXLAN_MAX
>  };
>  #define IFLA_VXLAN_MAX (__IFLA_VXLAN_MAX - 1)
>
> +enum {
> +       IFLA_VXLAN_EXT_UNSPEC,
> +       IFLA_VXLAN_EXT_GBP,
> +       __IFLA_VXLAN_EXT_MAX,
> +};
> +#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
> +
>  struct ifla_vxlan_port_range {
>         __be16  low;
>         __be16  high;
> diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
> index d7c46b3..dd68c97 100644
> --- a/net/openvswitch/vport-vxlan.c
> +++ b/net/openvswitch/vport-vxlan.c
> @@ -59,7 +59,8 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
>  }
>
>  /* Called with rcu_read_lock and BH disabled. */
> -static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
> +static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
> +                     struct vxlan_metadata *md)
>  {
>         struct ovs_tunnel_info tun_info;
>         struct vport *vport = vs->data;
> @@ -68,7 +69,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
>
>         /* Save outer tunnel values */
>         iph = ip_hdr(skb);
> -       key = cpu_to_be64(ntohl(vx_vni) >> 8);
> +       key = cpu_to_be64(ntohl(md->vni) >> 8);
>         ovs_flow_tun_info_init(&tun_info, iph,
>                                udp_hdr(skb)->source, udp_hdr(skb)->dest,
>                                key, TUNNEL_KEY, NULL, 0);
> @@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
>         struct vxlan_port *vxlan_port = vxlan_vport(vport);
>         __be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
>         struct ovs_key_ipv4_tunnel *tun_key;
> +       struct vxlan_metadata md;
>         struct rtable *rt;
>         struct flowi4 fl;
>         __be16 src_port;
> @@ -178,12 +180,13 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
>         skb->ignore_df = 1;
>
>         src_port = udp_flow_src_port(net, skb, 0, 0, true);
> +       md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
>
>         err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
>                              fl.saddr, tun_key->ipv4_dst,
>                              tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
>                              src_port, dst_port,
> -                            htonl(be64_to_cpu(tun_key->tun_id) << 8),
> +                            &md,
>                              false);
>         if (err < 0)
>                 ip_rt_put(rt);
> --
> 1.9.3
>


* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07  3:37 Alexei Starovoitov
  2015-01-07 10:03 ` David Laight
@ 2015-01-07 11:10 ` Thomas Graf
  1 sibling, 0 replies; 30+ messages in thread
From: Thomas Graf @ 2015-01-07 11:10 UTC (permalink / raw)
  To: Alexei Starovoitov
  Cc: David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	netdev, dev

On 01/06/15 at 07:37pm, Alexei Starovoitov wrote:
> Even it works ok, I think this struct layout is ugly.
> imo would be much easier to read if you replace
> the whole vxlanhdr with vxlanhdr_gbp
> or split vxlanhdr into two 32-bit structs.
> then __packed hacks won't be needed.

The main reason why I merged it into vxlanhdr is for documentation
purposes and to avoid duplicating the generic VXLAN header for every
extension. The RCO and GPE extensions would need to duplicate this
over and over. It gets messy in particular when multiple extensions
can be used in combination (such as GBP and RCO) which then each
have their own conflicting header definitions. This way, it is clear
which extensions are compatible by just looking at the definition
of the structure.

That said, I'm not married to this idea.


* Re: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07 10:03 ` David Laight
@ 2015-01-07 11:01   ` Thomas Graf
  0 siblings, 0 replies; 30+ messages in thread
From: Thomas Graf @ 2015-01-07 11:01 UTC (permalink / raw)
  To: David Laight
  Cc: 'Alexei Starovoitov',
	David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	netdev, dev

On 01/07/15 at 10:03am, David Laight wrote:
> From: Alexei Starovoitov
> > On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > > +struct vxlan_gbp {
> > > +#ifdef __LITTLE_ENDIAN_BITFIELD
> > > +       __u8    reserved_flags1:3,
> > ...
> > > +       __be16 policy_id;
> > > +} __packed;
> > 
> > are you sure that compiler will be smart enough
> > to do single short load when you pack
> > u8 + struct vxlan_gbp inside struct vxlanhdr ?
> > I suspect compiler will use two byte loads
> > with shifts and ors every time to access policy_id.
> > Even it works ok, I think this struct layout is ugly.
> > imo would be much easier to read if you replace
> > the whole vxlanhdr with vxlanhdr_gbp
> > or split vxlanhdr into two 32-bit structs.
> > then __packed hacks won't be needed.

If I read objdump output correctly, gcc seems fine with it:

        /* For backwards compatibility, only allow reserved fields to be
         * used by VXLAN extensions if explicitly requested.
         */
        if (vs->exts) {
                if (!vxh->vni_present)
    2640:       41 0f b6 55 08          movzbl 0x8(%r13),%edx
    2645:       f6 c2 08                test   $0x8,%dl
    2648:       74 c2                   je     260c <vxlan_udp_encap_recv+0x9c>
    [...]
                        md.gbp = ntohs(vxh->gbp.policy_id);
    2652:       41 0f b7 55 0a          movzwl 0xa(%r13),%edx

Let me know what I have to do/provide to validate this properly.

> Also, if you are writing the values then you need to write
> all the members of the bitfield in order to get a single write.
> 
> Basically you are much better off using explicit masks.

I went back and forth on this and chose to use individual bit fields
and map them to a static bit definition which is exported via Netlink.
That way the user space Netlink interface remains stable should the
wire protocol ever change. Yes, this implies some branching which could
be avoided right now as long as user and wire protocol are identical. I
did not observe any performance differences in benchmarks though.
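
For comparison, the explicit-mask variant suggested above could look roughly
like the sketch below. It is userspace C rather than kernel code, and the
names (GBP_WIRE_*, gbp_flags_word) are made up for illustration. It also
assumes that the skb->mark layout and the wire layout stay identical, which
holds for this patch but is exactly the coupling the bitfield mapping avoids.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* First 32-bit word of the VXLAN-GBP header, in host byte order.
 * Bit positions follow the header diagram in include/net/vxlan.h.
 * All names here are hypothetical.
 */
#define GBP_WIRE_G		UINT32_C(0x80000000)	/* Group Policy present */
#define GBP_WIRE_I		UINT32_C(0x08000000)	/* VNI present */
#define GBP_WIRE_DONT_LEARN	UINT32_C(0x00400000)	/* D bit */
#define GBP_WIRE_POLICY_APPLIED	UINT32_C(0x00080000)	/* A bit */
#define GBP_WIRE_ID_MASK	UINT32_C(0x0000ffff)	/* Group Policy ID */

/* Build the flags word from a mark with one masked OR and no branches.
 * The result is in network byte order, ready for a single 32-bit store.
 */
static uint32_t gbp_flags_word(uint32_t mark)
{
	uint32_t keep = GBP_WIRE_DONT_LEARN | GBP_WIRE_POLICY_APPLIED |
			GBP_WIRE_ID_MASK;

	return htonl(GBP_WIRE_G | GBP_WIRE_I | (mark & keep));
}
```

The mask drops any mark bits that have no meaning on the wire, so stray
upper bits in skb->mark never reach the reserved header fields.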


* RE: [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07  3:37 Alexei Starovoitov
@ 2015-01-07 10:03 ` David Laight
  2015-01-07 11:01   ` Thomas Graf
  2015-01-07 11:10 ` Thomas Graf
  1 sibling, 1 reply; 30+ messages in thread
From: David Laight @ 2015-01-07 10:03 UTC (permalink / raw)
  To: 'Alexei Starovoitov', Thomas Graf
  Cc: David S. Miller, Jesse Gross, Stephen Hemminger, Pravin Shelar,
	netdev, dev

From: Alexei Starovoitov
> On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> > +struct vxlan_gbp {
> > +#ifdef __LITTLE_ENDIAN_BITFIELD
> > +       __u8    reserved_flags1:3,
> ...
> > +       __be16 policy_id;
> > +} __packed;
> 
> are you sure that compiler will be smart enough
> to do single short load when you pack
> u8 + struct vxlan_gbp inside struct vxlanhdr ?
> I suspect compiler will use two byte loads
> with shifts and ors every time to access policy_id.
> Even it works ok, I think this struct layout is ugly.
> imo would be much easier to read if you replace
> the whole vxlanhdr with vxlanhdr_gbp
> or split vxlanhdr into two 32-bit structs.
> then __packed hacks won't be needed.

Also, if you are writing the values then you need to write
all the members of the bitfield in order to get a single write.

Basically you are much better off using explicit masks.

	David



* Re: [PATCH 2/6] vxlan: Group Policy extension
@ 2015-01-07  3:37 Alexei Starovoitov
  2015-01-07 10:03 ` David Laight
  2015-01-07 11:10 ` Thomas Graf
  0 siblings, 2 replies; 30+ messages in thread
From: Alexei Starovoitov @ 2015-01-07  3:37 UTC (permalink / raw)
  To: Thomas Graf
  Cc: dev-yBygre7rU0TnMu66kgdUjQ, netdev-u79uwXL29TY76Z2rM5mHXA,
	Stephen Hemminger, David S. Miller

On Tue, Jan 6, 2015 at 6:05 PM, Thomas Graf <tgraf@suug.ch> wrote:
> +struct vxlan_gbp {
> +#ifdef __LITTLE_ENDIAN_BITFIELD
> +       __u8    reserved_flags1:3,
...
> +       __be16 policy_id;
> +} __packed;

Are you sure the compiler will be smart enough
to do a single short load when you pack
u8 + struct vxlan_gbp inside struct vxlanhdr?
I suspect the compiler will use two byte loads
with shifts and ors every time to access policy_id.
Even if it works ok, I think this struct layout is ugly.
imo it would be much easier to read if you replace
the whole vxlanhdr with vxlanhdr_gbp
or split vxlanhdr into two 32-bit structs.
Then the __packed hacks won't be needed.
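
One way to look at the layout question outside the kernel is a small
userspace replica of the proposed header, sketched below. The struct and
function names are hypothetical and only the flag bytes are modelled as
plain u8 values; the point is just to check where policy_id ends up. Since
__packed lowers the assumed alignment of the member to 1, a compiler for a
strict-alignment target may still choose byte loads, so this checks the
offset, not the generated code.

```c
#include <arpa/inet.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for struct vxlan_gbp: one flag byte plus the ID. */
struct gbp_sketch {
	uint8_t  flags;		/* D and A bits */
	uint16_t policy_id;	/* network byte order */
} __attribute__((packed));

/* Hypothetical stand-in for the proposed struct vxlanhdr. */
struct vxlanhdr_sketch {
	union {
		struct {
			uint8_t first_flags;	/* G and I bits */
			union {
				struct gbp_sketch gbp;
				struct {
					uint8_t  reserved1;
					uint16_t reserved2;
				} __attribute__((packed));
			};
		};
		uint32_t vx_flags;
	};
	uint32_t vx_vni;
};

/* policy_id lands on offset 2, the header stays 8 bytes long. */
_Static_assert(offsetof(struct vxlanhdr_sketch, gbp.policy_id) == 2,
	       "policy_id is expected at byte offset 2");
_Static_assert(sizeof(struct vxlanhdr_sketch) == 8,
	       "header is expected to be 8 bytes");

/* Copy 8 wire bytes into the replica and return the group policy ID. */
static uint16_t sketch_policy_id(const uint8_t wire[8])
{
	struct vxlanhdr_sketch h;

	memcpy(&h, wire, sizeof(h));
	return ntohs(h.gbp.policy_id);
}
```

Compiling this with the target compiler and inspecting the code emitted for
sketch_policy_id() shows whether the access is a single 16-bit load there.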


* [PATCH 2/6] vxlan: Group Policy extension
  2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
@ 2015-01-07  2:05 ` Thomas Graf
  2015-01-07 16:05   ` Tom Herbert
  0 siblings, 1 reply; 30+ messages in thread
From: Thomas Graf @ 2015-01-07  2:05 UTC (permalink / raw)
  To: davem, jesse, stephen, pshelar; +Cc: netdev, dev

Implements support for the Group Policy VXLAN extension [0] to provide
a lightweight and simple security label mechanism across network peers
based on VXLAN. The security context and associated metadata are mapped
to/from skb->mark. This allows further mapping to a SELinux context
using SECMARK and implementing ACLs directly with nftables, iptables,
OVS, tc, etc.

The group membership is defined by the lower 16 bits of skb->mark; the
upper 16 bits are used for flags.
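
As an illustration of that split, a mark could be composed and taken apart
as in the userspace sketch below. The constant values mirror the
VXLAN_GBP_* defines this patch adds to include/net/vxlan.h; the GBP_*
spellings and the two helper functions are hypothetical and not part of the
patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values mirror the VXLAN_GBP_* defines in include/net/vxlan.h. */
#define GBP_DONT_LEARN		(UINT32_C(1) << 22)	/* BIT(6) << 16 */
#define GBP_POLICY_APPLIED	(UINT32_C(1) << 19)	/* BIT(3) << 16 */
#define GBP_ID_MASK		UINT32_C(0xffff)

/* Compose a mark: group ID in the lower 16 bits, flags in the upper 16. */
static uint32_t gbp_mark_make(uint16_t group_id, bool dont_learn,
			      bool policy_applied)
{
	uint32_t mark = group_id;

	if (dont_learn)
		mark |= GBP_DONT_LEARN;
	if (policy_applied)
		mark |= GBP_POLICY_APPLIED;

	return mark;
}

/* Extract the group ID, ignoring all flag bits. */
static uint16_t gbp_mark_group(uint32_t mark)
{
	return (uint16_t)(mark & GBP_ID_MASK);
}
```

With these helpers, group 0x200 without flags is the mark 0x200, and the
same group with the Don't Learn bit set is 0x400200.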

SELinux allows managing labels to secure local resources. However,
distributed applications require ACLs to be implemented across hosts.
This is typically achieved by matching on L2-L4 fields on the receiver
to identify the original sending host and process. On top of that,
netlabel and specifically CIPSO [1] allow mapping security contexts to
universal labels. However, netlabel and CIPSO are relatively complex.
This patch provides a lightweight alternative for overlay network
environments with a trusted underlay. No additional control protocol
is required.

           Host 1:                       Host 2:

      Group A        Group B        Group B     Group A
      +-----+   +-------------+    +-------+   +-----+
      | lxc |   | SELinux CTX |    | httpd |   | VM  |
      +--+--+   +--+----------+    +---+---+   +--+--+
	  \---+---/                     \----+---/
	      |                              |
	  +---+---+                      +---+---+
	  | vxlan |                      | vxlan |
	  +---+---+                      +---+---+
	      +------------------------------+

Backwards compatibility:
A VXLAN-GBP socket can receive standard VXLAN frames and will assign
the default group 0x0000 to such frames. A standard Linux VXLAN socket
will drop VXLAN-GBP frames. The extension is therefore disabled by
default and needs to be specifically enabled:

   ip link add [...] type vxlan [...] gbp

In a mixed environment with VXLAN and VXLAN-GBP sockets, the GBP socket
must run on a separate port number.
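
The receive-side rule behind this can be sketched in userspace C as below.
The function and macro names are hypothetical, and both header words are
taken in host byte order for readability. The logic follows the check in
vxlan_udp_encap_recv(): a socket without extensions insists that every
reserved bit is zero, while a socket with extensions only requires the I
bit and rejects a G bit it was not configured for.

```c
#include <stdbool.h>
#include <stdint.h>

#define HDR_G	UINT32_C(0x80000000)	/* Group Policy present */
#define HDR_I	UINT32_C(0x08000000)	/* VNI present, equals VXLAN_FLAGS */
#define EXT_GBP	UINT32_C(1)		/* mirrors VXLAN_EXT_GBP, BIT(0) */

/* Decide whether a received header is acceptable for a socket whose
 * enabled extensions are given in exts.
 */
static bool vxlan_hdr_accepted(uint32_t flags_word, uint32_t vni_word,
			       uint32_t exts)
{
	if (exts) {
		if (!(flags_word & HDR_I))
			return false;
		if ((flags_word & HDR_G) && !(exts & EXT_GBP))
			return false;
		return true;
	}

	/* No extensions: only the I bit may be set, reserved bits are 0. */
	return flags_word == HDR_I && (vni_word & 0xff) == 0;
}
```

A frame with the G bit set therefore fails the strict comparison on a plain
socket, which is why mixed deployments need separate ports.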

Examples:
  iptables:
  $ iptables -I OUTPUT -p icmp -j MARK --set-mark 0x200
  $ iptables -I INPUT -i br0 -m mark --mark 0x200 -j ACCEPT

  OVS (patches provided separately):
  in_port=1, actions=load:0x200->NXM_NX_TUN_GBP_ID[],NORMAL

[0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
[1] http://lwn.net/Articles/204905/

Signed-off-by: Thomas Graf <tgraf@suug.ch>
---
 drivers/net/vxlan.c           | 155 ++++++++++++++++++++++++++++++------------
 include/net/vxlan.h           |  80 ++++++++++++++++++++--
 include/uapi/linux/if_link.h  |   8 +++
 net/openvswitch/vport-vxlan.c |   9 ++-
 4 files changed, 197 insertions(+), 55 deletions(-)

diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 4d52aa9..30b7b59 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -132,6 +132,7 @@ struct vxlan_dev {
 	__u8		  tos;		/* TOS override */
 	__u8		  ttl;
 	u32		  flags;	/* VXLAN_F_* in vxlan.h */
+	u32		  exts;		/* Enabled extensions */
 
 	struct work_struct sock_work;
 	struct work_struct igmp_join;
@@ -568,7 +569,8 @@ static struct sk_buff **vxlan_gro_receive(struct sk_buff **head, struct sk_buff
 			continue;
 
 		vh2 = (struct vxlanhdr *)(p->data + off_vx);
-		if (vh->vx_vni != vh2->vx_vni) {
+		if (vh->vx_flags != vh2->vx_flags ||
+		    vh->vx_vni != vh2->vx_vni) {
 			NAPI_GRO_CB(p)->same_flow = 0;
 			continue;
 		}
@@ -1095,6 +1097,7 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 {
 	struct vxlan_sock *vs;
 	struct vxlanhdr *vxh;
+	struct vxlan_metadata md = {0};
 
 	/* Need Vxlan and inner Ethernet header to be present */
 	if (!pskb_may_pull(skb, VXLAN_HLEN))
@@ -1113,6 +1116,19 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (vs->exts) {
 		if (!vxh->vni_present)
 			goto error_invalid_header;
+
+		if (vxh->gbp_present) {
+			if (!(vs->exts & VXLAN_EXT_GBP))
+				goto error_invalid_header;
+
+			md.gbp = ntohs(vxh->gbp.policy_id);
+
+			if (vxh->gbp.dont_learn)
+				md.gbp |= VXLAN_GBP_DONT_LEARN;
+
+			if (vxh->gbp.policy_applied)
+				md.gbp |= VXLAN_GBP_POLICY_APPLIED;
+		}
 	} else {
 		if (vxh->vx_flags != htonl(VXLAN_FLAGS) ||
 		    (vxh->vx_vni & htonl(0xff)))
@@ -1122,7 +1138,8 @@ static int vxlan_udp_encap_recv(struct sock *sk, struct sk_buff *skb)
 	if (iptunnel_pull_header(skb, VXLAN_HLEN, htons(ETH_P_TEB)))
 		goto drop;
 
-	vs->rcv(vs, skb, vxh->vx_vni);
+	md.vni = vxh->vx_vni;
+	vs->rcv(vs, skb, &md);
 	return 0;
 
 drop:
@@ -1138,8 +1155,8 @@ error:
 	return 1;
 }
 
-static void vxlan_rcv(struct vxlan_sock *vs,
-		      struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct iphdr *oip = NULL;
 	struct ipv6hdr *oip6 = NULL;
@@ -1150,7 +1167,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 	int err = 0;
 	union vxlan_addr *remote_ip;
 
-	vni = ntohl(vx_vni) >> 8;
+	vni = ntohl(md->vni) >> 8;
 	/* Is this VNI defined? */
 	vxlan = vxlan_vs_find_vni(vs, vni);
 	if (!vxlan)
@@ -1184,6 +1201,7 @@ static void vxlan_rcv(struct vxlan_sock *vs,
 		goto drop;
 
 	skb_reset_network_header(skb);
+	skb->mark = md->gbp;
 
 	if (oip6)
 		err = IP6_ECN_decapsulate(oip6, skb);
@@ -1533,15 +1551,54 @@ static bool route_shortcircuit(struct net_device *dev, struct sk_buff *skb)
 	return false;
 }
 
+static int vxlan_build_hdr(struct sk_buff *skb, struct vxlan_sock *vs,
+			   int min_headroom, struct vxlan_metadata *md)
+{
+	struct vxlanhdr *vxh;
+	int err;
+
+	/* Need space for new headers (invalidates iph ptr) */
+	err = skb_cow_head(skb, min_headroom);
+	if (unlikely(err)) {
+		kfree_skb(skb);
+		return err;
+	}
+
+	skb = vlan_hwaccel_push_inside(skb);
+	if (WARN_ON(!skb))
+		return -ENOMEM;
+
+	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
+	vxh->vx_flags = htonl(VXLAN_FLAGS);
+	vxh->vx_vni = md->vni;
+
+	if (vs->exts)  {
+		if (vs->exts & VXLAN_EXT_GBP) {
+			vxh->gbp_present = 1;
+
+			if (md->gbp & VXLAN_GBP_DONT_LEARN)
+				vxh->gbp.dont_learn = 1;
+
+			if (md->gbp & VXLAN_GBP_POLICY_APPLIED)
+				vxh->gbp.policy_applied = 1;
+
+			vxh->gbp.policy_id = htons(md->gbp & VXLAN_GBP_ID_MASK);
+		}
+	}
+
+	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
+
+	return 0;
+}
+
 #if IS_ENABLED(CONFIG_IPV6)
 static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			   struct dst_entry *dst, struct sk_buff *skb,
 			   struct net_device *dev, struct in6_addr *saddr,
 			   struct in6_addr *daddr, __u8 prio, __u8 ttl,
-			   __be16 src_port, __be16 dst_port, __be32 vni,
-			   bool xnet)
+			   __be16 src_port, __be16 dst_port,
+			   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !udp_get_no_check6_tx(vs->sock->sk);
@@ -1558,24 +1615,9 @@ static int vxlan6_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct ipv6hdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
-		goto err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb)) {
-		err = -ENOMEM;
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		goto err;
-	}
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	udp_tunnel6_xmit_skb(vs->sock, dst, skb, dev, saddr, daddr, prio,
 			     ttl, src_port, dst_port);
@@ -1589,9 +1631,9 @@ err:
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet)
+		   __be16 src_port, __be16 dst_port,
+		   struct vxlan_metadata *md, bool xnet)
 {
-	struct vxlanhdr *vxh;
 	int min_headroom;
 	int err;
 	bool udp_sum = !vs->sock->sk->sk_no_check_tx;
@@ -1604,22 +1646,9 @@ int vxlan_xmit_skb(struct vxlan_sock *vs,
 			+ VXLAN_HLEN + sizeof(struct iphdr)
 			+ (vlan_tx_tag_present(skb) ? VLAN_HLEN : 0);
 
-	/* Need space for new headers (invalidates iph ptr) */
-	err = skb_cow_head(skb, min_headroom);
-	if (unlikely(err)) {
-		kfree_skb(skb);
+	err = vxlan_build_hdr(skb, vs, min_headroom, md);
+	if (err)
 		return err;
-	}
-
-	skb = vlan_hwaccel_push_inside(skb);
-	if (WARN_ON(!skb))
-		return -ENOMEM;
-
-	vxh = (struct vxlanhdr *) __skb_push(skb, sizeof(*vxh));
-	vxh->vx_flags = htonl(VXLAN_FLAGS);
-	vxh->vx_vni = vni;
-
-	skb_set_inner_protocol(skb, htons(ETH_P_TEB));
 
 	return udp_tunnel_xmit_skb(vs->sock, rt, skb, src, dst, tos,
 				   ttl, df, src_port, dst_port, xnet);
@@ -1679,6 +1708,7 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 	const struct iphdr *old_iph;
 	struct flowi4 fl4;
 	union vxlan_addr *dst;
+	struct vxlan_metadata md;
 	__be16 src_port = 0, dst_port;
 	u32 vni;
 	__be16 df = 0;
@@ -1749,11 +1779,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 
 		tos = ip_tunnel_ecn_encap(tos, old_iph, skb);
 		ttl = ttl ? : ip4_dst_hoplimit(&rt->dst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan_xmit_skb(vxlan->vn_sock, rt, skb,
 				     fl4.saddr, dst->sin.sin_addr.s_addr,
-				     tos, ttl, df, src_port, dst_port,
-				     htonl(vni << 8),
+				     tos, ttl, df, src_port, dst_port, &md,
 				     !net_eq(vxlan->net, dev_net(vxlan->dev)));
 		if (err < 0) {
 			/* skb is already freed. */
@@ -1806,10 +1837,12 @@ static void vxlan_xmit_one(struct sk_buff *skb, struct net_device *dev,
 		}
 
 		ttl = ttl ? : ip6_dst_hoplimit(ndst);
+		md.vni = htonl(vni << 8);
+		md.gbp = skb->mark;
 
 		err = vxlan6_xmit_skb(vxlan->vn_sock, ndst, skb,
 				      dev, &fl6.saddr, &fl6.daddr, 0, ttl,
-				      src_port, dst_port, htonl(vni << 8),
+				      src_port, dst_port, &md,
 				      !net_eq(vxlan->net, dev_net(vxlan->dev)));
 #endif
 	}
@@ -2210,6 +2243,11 @@ static const struct nla_policy vxlan_policy[IFLA_VXLAN_MAX + 1] = {
 	[IFLA_VXLAN_UDP_CSUM]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_TX]	= { .type = NLA_U8 },
 	[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]	= { .type = NLA_U8 },
+	[IFLA_VXLAN_EXTENSION]	= { .type = NLA_NESTED },
+};
+
+static const struct nla_policy vxlan_ext_policy[IFLA_VXLAN_EXT_MAX + 1] = {
+	[IFLA_VXLAN_EXT_GBP]	= { .type = NLA_FLAG, },
 };
 
 static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
@@ -2246,6 +2284,18 @@ static int vxlan_validate(struct nlattr *tb[], struct nlattr *data[])
 		}
 	}
 
+	if (data[IFLA_VXLAN_EXTENSION]) {
+		int err;
+
+		err = nla_validate_nested(data[IFLA_VXLAN_EXTENSION],
+					  IFLA_VXLAN_EXT_MAX, vxlan_ext_policy);
+		if (err < 0) {
+			pr_debug("invalid VXLAN extension configuration: %d\n",
+				 err);
+			return -EINVAL;
+		}
+	}
+
 	return 0;
 }
 
@@ -2400,6 +2450,18 @@ static void vxlan_sock_work(struct work_struct *work)
 	dev_put(vxlan->dev);
 }
 
+static void configure_vxlan_exts(struct vxlan_dev *vxlan, struct nlattr *attr)
+{
+	struct nlattr *exts[IFLA_VXLAN_EXT_MAX+1];
+
+	/* Validated in vxlan_validate() */
+	if (nla_parse_nested(exts, IFLA_VXLAN_EXT_MAX, attr, NULL) < 0)
+		BUG();
+
+	if (exts[IFLA_VXLAN_EXT_GBP])
+		vxlan->exts |= VXLAN_EXT_GBP;
+}
+
 static int vxlan_newlink(struct net *net, struct net_device *dev,
 			 struct nlattr *tb[], struct nlattr *data[])
 {
@@ -2525,6 +2587,9 @@ static int vxlan_newlink(struct net *net, struct net_device *dev,
 	    nla_get_u8(data[IFLA_VXLAN_UDP_ZERO_CSUM6_RX]))
 		vxlan->flags |= VXLAN_F_UDP_ZERO_CSUM6_RX;
 
+	if (data[IFLA_VXLAN_EXTENSION])
+		configure_vxlan_exts(vxlan, data[IFLA_VXLAN_EXTENSION]);
+
 	if (vxlan_find_vni(net, vni, use_ipv6 ? AF_INET6 : AF_INET,
 			   vxlan->dst_port)) {
 		pr_info("duplicate VNI %u\n", vni);
diff --git a/include/net/vxlan.h b/include/net/vxlan.h
index 3e98d31..66000d0 100644
--- a/include/net/vxlan.h
+++ b/include/net/vxlan.h
@@ -11,13 +11,60 @@
 #define VNI_HASH_BITS	10
 #define VNI_HASH_SIZE	(1<<VNI_HASH_BITS)
 
+/*
+ * VXLAN Group Based Policy Extension:
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |1|-|-|-|1|-|-|-|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |                VXLAN Network Identifier (VNI) |   Reserved    |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ *
+ * D = Don't Learn bit. When set, this bit indicates that the egress
+ *     VTEP MUST NOT learn the source address of the encapsulated frame.
+ *
+ * A = Indicates that the group policy has already been applied to
+ *     this packet. Policies MUST NOT be applied by devices when the
+ *     A bit is set.
+ *
+ * [0] https://tools.ietf.org/html/draft-smith-vxlan-group-policy
+ */
+struct vxlan_gbp {
+#ifdef __LITTLE_ENDIAN_BITFIELD
+	__u8	reserved_flags1:3,
+		policy_applied:1,
+		reserved_flags2:2,
+		dont_learn:1,
+		reserved_flags3:1;
+#elif defined(__BIG_ENDIAN_BITFIELD)
+	__u8	reserved_flags1:1,
+		dont_learn:1,
+		reserved_flags2:2,
+		policy_applied:1,
+		reserved_flags3:3;
+#else
+#error	"Please fix <asm/byteorder.h>"
+#endif
+	__be16 policy_id;
+} __packed;
+
+/* skb->mark mapping
+ *
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ * |R|R|R|R|R|R|R|R|R|D|R|R|A|R|R|R|        Group Policy ID        |
+ * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
+ */
+#define VXLAN_GBP_DONT_LEARN		(BIT(6) << 16)
+#define VXLAN_GBP_POLICY_APPLIED	(BIT(3) << 16)
+#define VXLAN_GBP_ID_MASK		(0xFFFF)
+
 /* VXLAN protocol header:
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
- * |R|R|R|R|I|R|R|R|               Reserved                        |
+ * |G|R|R|R|I|R|R|R|               Reserved                        |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  * |                VXLAN Network Identifier (VNI) |   Reserved    |
  * +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  *
+ * G = 1	Group Policy (VXLAN-GBP)
  * I = 1	VXLAN Network Identifier (VNI) present
  */
 struct vxlanhdr {
@@ -26,24 +73,42 @@ struct vxlanhdr {
 #ifdef __LITTLE_ENDIAN_BITFIELD
 			__u8	reserved_flags1:3,
 				vni_present:1,
-				reserved_flags2:4;
+				reserved_flags2:3,
+				gbp_present:1;
 #elif defined(__BIG_ENDIAN_BITFIELD)
-			__u8	reserved_flags2:4,
+			__u8	gbp_present:1,
+				reserved_flags2:3,
 				vni_present:1,
 				reserved_flags1:3;
 #else
 #error	"Please fix <asm/byteorder.h>"
 #endif
-			__u8	vx_reserved1;
-			__be16	vx_reserved2;
+			union {
+				/* NOTE: Offset 0 will be 1 byte aligned, so
+				 * all member structs must be marked packed.
+				 */
+				struct vxlan_gbp gbp;
+				struct {
+					__u8	vx_reserved1;
+					__be16	vx_reserved2;
+				} __packed;
+			};
 		};
 		__be32 vx_flags;
 	};
 	__be32	vx_vni;
 };
 
+struct vxlan_metadata {
+	__be32		vni;
+	u32		gbp;
+};
+
 struct vxlan_sock;
-typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb, __be32 key);
+typedef void (vxlan_rcv_t)(struct vxlan_sock *vh, struct sk_buff *skb,
+			   struct vxlan_metadata *md);
+
+#define VXLAN_EXT_GBP			BIT(0)
 
 /* per UDP socket information */
 struct vxlan_sock {
@@ -78,7 +143,8 @@ void vxlan_sock_release(struct vxlan_sock *vs);
 int vxlan_xmit_skb(struct vxlan_sock *vs,
 		   struct rtable *rt, struct sk_buff *skb,
 		   __be32 src, __be32 dst, __u8 tos, __u8 ttl, __be16 df,
-		   __be16 src_port, __be16 dst_port, __be32 vni, bool xnet);
+		   __be16 src_port, __be16 dst_port, struct vxlan_metadata *md,
+		   bool xnet);
 
 static inline netdev_features_t vxlan_features_check(struct sk_buff *skb,
 						     netdev_features_t features)
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index f7d0d2d..9f07bf5 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -370,10 +370,18 @@ enum {
 	IFLA_VXLAN_UDP_CSUM,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_TX,
 	IFLA_VXLAN_UDP_ZERO_CSUM6_RX,
+	IFLA_VXLAN_EXTENSION,
 	__IFLA_VXLAN_MAX
 };
 #define IFLA_VXLAN_MAX	(__IFLA_VXLAN_MAX - 1)
 
+enum {
+	IFLA_VXLAN_EXT_UNSPEC,
+	IFLA_VXLAN_EXT_GBP,
+	__IFLA_VXLAN_EXT_MAX,
+};
+#define IFLA_VXLAN_EXT_MAX (__IFLA_VXLAN_EXT_MAX - 1)
+
 struct ifla_vxlan_port_range {
 	__be16	low;
 	__be16	high;
diff --git a/net/openvswitch/vport-vxlan.c b/net/openvswitch/vport-vxlan.c
index d7c46b3..dd68c97 100644
--- a/net/openvswitch/vport-vxlan.c
+++ b/net/openvswitch/vport-vxlan.c
@@ -59,7 +59,8 @@ static inline struct vxlan_port *vxlan_vport(const struct vport *vport)
 }
 
 /* Called with rcu_read_lock and BH disabled. */
-static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
+static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb,
+		      struct vxlan_metadata *md)
 {
 	struct ovs_tunnel_info tun_info;
 	struct vport *vport = vs->data;
@@ -68,7 +69,7 @@ static void vxlan_rcv(struct vxlan_sock *vs, struct sk_buff *skb, __be32 vx_vni)
 
 	/* Save outer tunnel values */
 	iph = ip_hdr(skb);
-	key = cpu_to_be64(ntohl(vx_vni) >> 8);
+	key = cpu_to_be64(ntohl(md->vni) >> 8);
 	ovs_flow_tun_info_init(&tun_info, iph,
 			       udp_hdr(skb)->source, udp_hdr(skb)->dest,
 			       key, TUNNEL_KEY, NULL, 0);
@@ -146,6 +147,7 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	struct vxlan_port *vxlan_port = vxlan_vport(vport);
 	__be16 dst_port = inet_sk(vxlan_port->vs->sock->sk)->inet_sport;
 	struct ovs_key_ipv4_tunnel *tun_key;
+	struct vxlan_metadata md;
 	struct rtable *rt;
 	struct flowi4 fl;
 	__be16 src_port;
@@ -178,12 +180,13 @@ static int vxlan_tnl_send(struct vport *vport, struct sk_buff *skb)
 	skb->ignore_df = 1;
 
 	src_port = udp_flow_src_port(net, skb, 0, 0, true);
+	md.vni = htonl(be64_to_cpu(tun_key->tun_id) << 8);
 
 	err = vxlan_xmit_skb(vxlan_port->vs, rt, skb,
 			     fl.saddr, tun_key->ipv4_dst,
 			     tun_key->ipv4_tos, tun_key->ipv4_ttl, df,
 			     src_port, dst_port,
-			     htonl(be64_to_cpu(tun_key->tun_id) << 8),
+			     &md,
 			     false);
 	if (err < 0)
 		ip_rt_put(rt);
-- 
1.9.3
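
For readers following the skb->mark mapping defined in include/net/vxlan.h
above, here is a minimal userspace sketch, not part of the patch, of how the
flag bits and the Group Policy ID could be packed into and read back from a
32-bit mark. The three VXLAN_GBP_* macros are copied from the patch; BIT() is
redefined locally because this is not kernel code, and the helper names
gbp_mark_pack(), gbp_mark_id(), gbp_mark_dont_learn() and
gbp_mark_policy_applied() are hypothetical and exist only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local stand-in for the kernel's BIT() macro (assumption: userspace build). */
#define BIT(n)				(1U << (n))

/* Copied from the patch: flags live in bits 16..23, the ID in bits 0..15. */
#define VXLAN_GBP_DONT_LEARN		(BIT(6) << 16)
#define VXLAN_GBP_POLICY_APPLIED	(BIT(3) << 16)
#define VXLAN_GBP_ID_MASK		(0xFFFF)

/* Hypothetical helper: build a mark from a policy ID and the D/A flags. */
static inline uint32_t gbp_mark_pack(uint16_t policy_id, bool dont_learn,
				     bool policy_applied)
{
	uint32_t mark = policy_id & VXLAN_GBP_ID_MASK;

	if (dont_learn)
		mark |= VXLAN_GBP_DONT_LEARN;
	if (policy_applied)
		mark |= VXLAN_GBP_POLICY_APPLIED;

	return mark;
}

/* Hypothetical helper: extract the 16-bit Group Policy ID from a mark. */
static inline uint16_t gbp_mark_id(uint32_t mark)
{
	return (uint16_t)(mark & VXLAN_GBP_ID_MASK);
}

/* Hypothetical helper: true if the D (Don't Learn) bit is set. */
static inline bool gbp_mark_dont_learn(uint32_t mark)
{
	return (mark & VXLAN_GBP_DONT_LEARN) != 0;
}

/* Hypothetical helper: true if the A (Policy Applied) bit is set. */
static inline bool gbp_mark_policy_applied(uint32_t mark)
{
	return (mark & VXLAN_GBP_POLICY_APPLIED) != 0;
}
```

Under these definitions, gbp_mark_pack(0x1234, true, false) yields 0x00401234:
the policy ID occupies the low 16 bits and the D flag lands at bit 22, which
matches the position of D in the skb->mark diagram when read most significant
bit first.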

Thread overview: 30+ messages
2015-01-07 17:32 [PATCH 2/6] vxlan: Group Policy extension Alexei Starovoitov
     [not found] ` <CAADnVQJErdNJrXOOSqEqkbC8524VCH2E9vYL-WdTb_6SGsTwvw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-07 23:27   ` Thomas Graf
2015-01-07 23:39     ` Alexei Starovoitov
  -- strict thread matches above, loose matches on Subject: below --
2015-01-12 12:26 [PATCH 0/6 net-next v3] VXLAN Group Policy Extension Thomas Graf
2015-01-12 12:26 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
2015-01-12 19:23   ` Jesse Gross
     [not found]     ` <CAEP_g=8TqGnftZa_scKODa2ra7gsV6ov_5J+Lbfq+4bFDZjiBw-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-12 22:47       ` Thomas Graf
2015-01-12 22:50         ` Jesse Gross
2015-01-12 22:59           ` Thomas Graf
2015-01-12 23:19             ` Jesse Gross
2015-01-08 22:47 [PATCH 0/6 net-next v2] VXLAN Group Policy Extension Thomas Graf
2015-01-08 22:47 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
2015-01-09 17:37   ` Alexei Starovoitov
2015-01-09 22:10     ` Thomas Graf
2015-01-12 17:37   ` Nicolas Dichtel
2015-01-12 17:59     ` David Miller
2015-01-13  8:29       ` Nicolas Dichtel
2015-01-13  1:04     ` Thomas Graf
2015-01-12 18:14   ` Tom Herbert
2015-01-13  1:03     ` Thomas Graf
2015-01-13  2:28       ` Tom Herbert
2015-01-13 11:32         ` Thomas Graf
2015-01-13 16:16           ` Tom Herbert
2015-01-07  3:37 Alexei Starovoitov
2015-01-07 10:03 ` David Laight
2015-01-07 11:01   ` Thomas Graf
2015-01-07 11:10 ` Thomas Graf
2015-01-07  2:05 [PATCH 0/6 net-next] VXLAN Group Policy Extension Thomas Graf
2015-01-07  2:05 ` [PATCH 2/6] vxlan: Group Policy extension Thomas Graf
2015-01-07 16:05   ` Tom Herbert
     [not found]     ` <CA+mtBx_Jj-tUM1nbHd2fHb0-=QpK3tcQgA=smWmg=cB-fupdGg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-07 16:21       ` Thomas Graf
2015-01-07 16:56         ` Tom Herbert
     [not found]           ` <CA+mtBx_A_M3+irq7w4nNCyPZBgM7ja+wfJT4w4Q0Yo6GMGYVgA-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2015-01-07 17:21             ` Thomas Graf
