All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
@ 2015-03-26 22:41 Dan Williams
  2015-03-26 22:43 ` [PATCH 2/2] ipvlan: always allow the broadcast MAC address Dan Williams
                   ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Dan Williams @ 2015-03-26 22:41 UTC (permalink / raw)
  To: netdev; +Cc: Mahesh Bandewar, jbenc

The broadcast MAC is supposed to be allowed whenever the device
has an IPv4 address, otherwise ARP requests get dropped on the
floor.  If ndo_set_rx_mode (and thus
ipvlan_set_multicast_mac_filter()) gets called after the address
was added, it blows away the broadcast MAC address in
mac_filters that was added at IPv4 address addition.  Fix that.

Signed-off-by: Dan Williams <dcbw@redhat.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 4f4099d..d34f580 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -241,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
 
 		bitmap_copy(ipvlan->mac_filters, mc_filters,
 			    IPVLAN_MAC_FILTER_SIZE);
+
+		if (ipvlan->ipv4cnt)
+			ipvlan_set_broadcast_mac_filter(ipvlan, true);
 	}
 	dev_uc_sync(ipvlan->phy_dev, dev);
 	dev_mc_sync(ipvlan->phy_dev, dev);
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-26 22:41 [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters Dan Williams
@ 2015-03-26 22:43 ` Dan Williams
  2015-03-27 17:46   ` Jiri Benc
  2015-03-28  0:52   ` Mahesh Bandewar
  2015-03-27 17:45 ` [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters Jiri Benc
  2015-03-30 20:28 ` Mahesh Bandewar
  2 siblings, 2 replies; 37+ messages in thread
From: Dan Williams @ 2015-03-26 22:43 UTC (permalink / raw)
  To: netdev; +Cc: Mahesh Bandewar, jbenc

ipvlan currently fails DHCP addressing for two reasons:

1) DHCP offers are typically unicast back to the device's MAC
address, and at the IP layer have a destination IP address
matching the new lease address.  In ipvlan unicast packets
are checked against existing addresses assigned to the ipvlan
interface, so clearly this fails hard because the ipvlan
interface has no IP addresses yet.  Workaround: request
that the server broadcast replies (-B for dhclient), which
don't get checked against the IP address list.

2) Even when that's done, mac_filters only allows the
broadcast MAC address when the interface has >= 1 IPv4
addresses, so double-fail, and the incoming DHCP offer
gets dropped on the floor again.

Instead of doing ugly stuff like watching for outgoing DHCP
requests and adding the broadcast MAC to mac_filters for
a period of time, just always allow the broadcast MAC.  This
lets the ipvlan interface be configured with DHCP in
Layer2 mode as long as as broadcast replies are used.

Signed-off-by: Dan Williams <dcbw@redhat.com>
---
 drivers/net/ipvlan/ipvlan_main.c | 21 ++++-----------------
 1 file changed, 4 insertions(+), 17 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index d34f580..787d7fb 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -214,17 +214,6 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
 		dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
 }
 
-static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
-{
-	struct net_device *dev = ipvlan->dev;
-	unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
-
-	if (set && !test_bit(hashbit, ipvlan->mac_filters))
-		__set_bit(hashbit, ipvlan->mac_filters);
-	else if (!set && test_bit(hashbit, ipvlan->mac_filters))
-		__clear_bit(hashbit, ipvlan->mac_filters);
-}
-
 static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
 {
 	struct ipvl_dev *ipvlan = netdev_priv(dev);
@@ -239,11 +228,11 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
 		netdev_for_each_mc_addr(ha, dev)
 			__set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
 
+		/* always allow broadcast frames */
+		__set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
+
 		bitmap_copy(ipvlan->mac_filters, mc_filters,
 			    IPVLAN_MAC_FILTER_SIZE);
-
-		if (ipvlan->ipv4cnt)
-			ipvlan_set_broadcast_mac_filter(ipvlan, true);
 	}
 	dev_uc_sync(ipvlan->phy_dev, dev);
 	dev_mc_sync(ipvlan->phy_dev, dev);
@@ -470,6 +459,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
 	INIT_LIST_HEAD(&ipvlan->addrs);
 	ipvlan->ipv4cnt = 0;
 	ipvlan->ipv6cnt = 0;
+	__set_bit(ipvlan_mac_hash(dev->broadcast), ipvlan->mac_filters);
 
 	/* TODO Probably put random address here to be presented to the
 	 * world but keep using the physical-dev address for the outgoing
@@ -694,7 +684,6 @@ static int ipvlan_add_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
 	list_add_tail_rcu(&addr->anode, &ipvlan->addrs);
 	ipvlan->ipv4cnt++;
 	ipvlan_ht_addr_add(ipvlan, addr);
-	ipvlan_set_broadcast_mac_filter(ipvlan, true);
 
 	return 0;
 }
@@ -711,8 +700,6 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
 	list_del_rcu(&addr->anode);
 	ipvlan->ipv4cnt--;
 	WARN_ON(ipvlan->ipv4cnt < 0);
-	if (!ipvlan->ipv4cnt)
-	    ipvlan_set_broadcast_mac_filter(ipvlan, false);
 	kfree_rcu(addr, rcu);
 
 	return;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
  2015-03-26 22:41 [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters Dan Williams
  2015-03-26 22:43 ` [PATCH 2/2] ipvlan: always allow the broadcast MAC address Dan Williams
@ 2015-03-27 17:45 ` Jiri Benc
  2015-03-30 20:28 ` Mahesh Bandewar
  2 siblings, 0 replies; 37+ messages in thread
From: Jiri Benc @ 2015-03-27 17:45 UTC (permalink / raw)
  To: Dan Williams; +Cc: netdev, Mahesh Bandewar

On Thu, 26 Mar 2015 17:41:38 -0500, Dan Williams wrote:
> The broadcast MAC is supposed to be allowed whenever the device
> has an IPv4 address, otherwise ARP requests get dropped on the
> floor.  If ndo_set_rx_mode (and thus
> ipvlan_set_multicast_mac_filter()) gets called after the address
> was added, it blows away the broadcast MAC address in
> mac_filters that was added at IPv4 address addition.  Fix that.
> 
> Signed-off-by: Dan Williams <dcbw@redhat.com>
> ---
>  drivers/net/ipvlan/ipvlan_main.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 4f4099d..d34f580 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -241,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>  
>  		bitmap_copy(ipvlan->mac_filters, mc_filters,
>  			    IPVLAN_MAC_FILTER_SIZE);
> +
> +		if (ipvlan->ipv4cnt)
> +			ipvlan_set_broadcast_mac_filter(ipvlan, true);

It would be probably better to set the bit in mc_filters before copying
over to ipvlan->mac_filters. But it's not a big deal as bitmap_copy is
not atomic and it is changed by the second patch anyway.

I see not much value in having the two patches separate and would
squash them but it's not a big deal, either.

Reviewed-by: Jiri Benc <jbenc@redhat.com>

-- 
Jiri Benc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-26 22:43 ` [PATCH 2/2] ipvlan: always allow the broadcast MAC address Dan Williams
@ 2015-03-27 17:46   ` Jiri Benc
  2015-03-28  0:52   ` Mahesh Bandewar
  1 sibling, 0 replies; 37+ messages in thread
From: Jiri Benc @ 2015-03-27 17:46 UTC (permalink / raw)
  To: Dan Williams; +Cc: netdev, Mahesh Bandewar

On Thu, 26 Mar 2015 17:43:42 -0500, Dan Williams wrote:
> ipvlan currently fails DHCP addressing for two reasons:
> 
> 1) DHCP offers are typically unicast back to the device's MAC
> address, and at the IP layer have a destination IP address
> matching the new lease address.  In ipvlan unicast packets
> are checked against existing addresses assigned to the ipvlan
> interface, so clearly this fails hard because the ipvlan
> interface has no IP addresses yet.  Workaround: request
> that the server broadcast replies (-B for dhclient), which
> don't get checked against the IP address list.
> 
> 2) Even when that's done, mac_filters only allows the
> broadcast MAC address when the interface has >= 1 IPv4
> addresses, so double-fail, and the incoming DHCP offer
> gets dropped on the floor again.
> 
> Instead of doing ugly stuff like watching for outgoing DHCP
> requests and adding the broadcast MAC to mac_filters for
> a period of time, just always allow the broadcast MAC.  This
> lets the ipvlan interface be configured with DHCP in
> Layer2 mode as long as as broadcast replies are used.
> 
> Signed-off-by: Dan Williams <dcbw@redhat.com>

I don't see any better option, either.

Reviewed-by: Jiri Benc <jbenc@redhat.com>

-- 
Jiri Benc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-26 22:43 ` [PATCH 2/2] ipvlan: always allow the broadcast MAC address Dan Williams
  2015-03-27 17:46   ` Jiri Benc
@ 2015-03-28  0:52   ` Mahesh Bandewar
  2015-03-28  5:56     ` Mahesh Bandewar
  1 sibling, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-03-28  0:52 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

On Thu, Mar 26, 2015 at 3:43 PM, Dan Williams <dcbw@redhat.com> wrote:
> ipvlan currently fails DHCP addressing for two reasons:
>
> 1) DHCP offers are typically unicast back to the device's MAC
> address, and at the IP layer have a destination IP address
> matching the new lease address.  In ipvlan unicast packets
> are checked against existing addresses assigned to the ipvlan
> interface, so clearly this fails hard because the ipvlan
> interface has no IP addresses yet.  Workaround: request
> that the server broadcast replies (-B for dhclient), which
> don't get checked against the IP address list.
>
> 2) Even when that's done, mac_filters only allows the
> broadcast MAC address when the interface has >= 1 IPv4
> addresses, so double-fail, and the incoming DHCP offer
> gets dropped on the floor again.
>
> Instead of doing ugly stuff like watching for outgoing DHCP
> requests and adding the broadcast MAC to mac_filters for
> a period of time, just always allow the broadcast MAC.  This
> lets the ipvlan interface be configured with DHCP in
> Layer2 mode as long as as broadcast replies are used.
>
Guys, stop thinking that we live in IPv4-only world! Think about a
hybrid environment where all your IPvlan slaves are IPv6 only while
your network has IPv4 (with crap-load of broadcast traffic) and IPv6
traffic. In that situation IPvlan demuxer has to forward all that
broadcast-traffic to it's slaves which they don't care at all.

I agree that the current driver falls short of handling DHCPv4 and
needs a fix. But for couple of initial config packets, it's not wise
to loose the performance forever. I don't like this patch.


> Signed-off-by: Dan Williams <dcbw@redhat.com>
> ---
>  drivers/net/ipvlan/ipvlan_main.c | 21 ++++-----------------
>  1 file changed, 4 insertions(+), 17 deletions(-)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index d34f580..787d7fb 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -214,17 +214,6 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
>                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
>  }
>
> -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> -{
> -       struct net_device *dev = ipvlan->dev;
> -       unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> -
> -       if (set && !test_bit(hashbit, ipvlan->mac_filters))
> -               __set_bit(hashbit, ipvlan->mac_filters);
> -       else if (!set && test_bit(hashbit, ipvlan->mac_filters))
> -               __clear_bit(hashbit, ipvlan->mac_filters);
> -}
> -
>  static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>  {
>         struct ipvl_dev *ipvlan = netdev_priv(dev);
> @@ -239,11 +228,11 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>                 netdev_for_each_mc_addr(ha, dev)
>                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
>
> +               /* always allow broadcast frames */
> +               __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> +
>                 bitmap_copy(ipvlan->mac_filters, mc_filters,
>                             IPVLAN_MAC_FILTER_SIZE);
> -
> -               if (ipvlan->ipv4cnt)
> -                       ipvlan_set_broadcast_mac_filter(ipvlan, true);
>         }
>         dev_uc_sync(ipvlan->phy_dev, dev);
>         dev_mc_sync(ipvlan->phy_dev, dev);
> @@ -470,6 +459,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>         INIT_LIST_HEAD(&ipvlan->addrs);
>         ipvlan->ipv4cnt = 0;
>         ipvlan->ipv6cnt = 0;
> +       __set_bit(ipvlan_mac_hash(dev->broadcast), ipvlan->mac_filters);
>
>         /* TODO Probably put random address here to be presented to the
>          * world but keep using the physical-dev address for the outgoing
> @@ -694,7 +684,6 @@ static int ipvlan_add_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>         list_add_tail_rcu(&addr->anode, &ipvlan->addrs);
>         ipvlan->ipv4cnt++;
>         ipvlan_ht_addr_add(ipvlan, addr);
> -       ipvlan_set_broadcast_mac_filter(ipvlan, true);
>
>         return 0;
>  }
> @@ -711,8 +700,6 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>         list_del_rcu(&addr->anode);
>         ipvlan->ipv4cnt--;
>         WARN_ON(ipvlan->ipv4cnt < 0);
> -       if (!ipvlan->ipv4cnt)
> -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
>         kfree_rcu(addr, rcu);
>
>         return;
> --
> 2.1.0
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-28  0:52   ` Mahesh Bandewar
@ 2015-03-28  5:56     ` Mahesh Bandewar
  2015-03-28 18:32       ` Jiri Benc
  0 siblings, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-03-28  5:56 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

On Fri, Mar 27, 2015 at 5:52 PM, Mahesh Bandewar <maheshb@google.com> wrote:
> On Thu, Mar 26, 2015 at 3:43 PM, Dan Williams <dcbw@redhat.com> wrote:
>> ipvlan currently fails DHCP addressing for two reasons:
>>
>> 1) DHCP offers are typically unicast back to the device's MAC
>> address, and at the IP layer have a destination IP address
>> matching the new lease address.  In ipvlan unicast packets
>> are checked against existing addresses assigned to the ipvlan
>> interface, so clearly this fails hard because the ipvlan
>> interface has no IP addresses yet.  Workaround: request
>> that the server broadcast replies (-B for dhclient), which
>> don't get checked against the IP address list.
>>
>> 2) Even when that's done, mac_filters only allows the
>> broadcast MAC address when the interface has >= 1 IPv4
>> addresses, so double-fail, and the incoming DHCP offer
>> gets dropped on the floor again.
>>
>> Instead of doing ugly stuff like watching for outgoing DHCP
>> requests and adding the broadcast MAC to mac_filters for
>> a period of time, just always allow the broadcast MAC.  This
>> lets the ipvlan interface be configured with DHCP in
>> Layer2 mode as long as as broadcast replies are used.
>>
> Guys, stop thinking that we live in IPv4-only world! Think about a
> hybrid environment where all your IPvlan slaves are IPv6 only while
> your network has IPv4 (with crap-load of broadcast traffic) and IPv6
> traffic. In that situation IPvlan demuxer has to forward all that
> broadcast-traffic to it's slaves which they don't care at all.
>
> I agree that the current driver falls short of handling DHCPv4 and
> needs a fix. But for couple of initial config packets, it's not wise
> to loose the performance forever. I don't like this patch.
>
I think the easiest way to fix this would be to flip the logic -

The current logic disables broadcast by default and enables only when
an IPv4 address is added. If this is inverted and -
enables broadcast by default but disables it when only IPv6
address(es) is / are added. These links can have multiple addresses
and hence have to be careful if any one of those is IPv4 then
broadcast bit has to be set.
>
>> Signed-off-by: Dan Williams <dcbw@redhat.com>
>> ---
>>  drivers/net/ipvlan/ipvlan_main.c | 21 ++++-----------------
>>  1 file changed, 4 insertions(+), 17 deletions(-)
>>
>> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
>> index d34f580..787d7fb 100644
>> --- a/drivers/net/ipvlan/ipvlan_main.c
>> +++ b/drivers/net/ipvlan/ipvlan_main.c
>> @@ -214,17 +214,6 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
>>                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
>>  }
>>
>> -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>> -{
>> -       struct net_device *dev = ipvlan->dev;
>> -       unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
>> -
>> -       if (set && !test_bit(hashbit, ipvlan->mac_filters))
>> -               __set_bit(hashbit, ipvlan->mac_filters);
>> -       else if (!set && test_bit(hashbit, ipvlan->mac_filters))
>> -               __clear_bit(hashbit, ipvlan->mac_filters);
>> -}
>> -
>>  static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>>  {
>>         struct ipvl_dev *ipvlan = netdev_priv(dev);
>> @@ -239,11 +228,11 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>>                 netdev_for_each_mc_addr(ha, dev)
>>                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
>>
>> +               /* always allow broadcast frames */
>> +               __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
>> +
>>                 bitmap_copy(ipvlan->mac_filters, mc_filters,
>>                             IPVLAN_MAC_FILTER_SIZE);
>> -
>> -               if (ipvlan->ipv4cnt)
>> -                       ipvlan_set_broadcast_mac_filter(ipvlan, true);
>>         }
>>         dev_uc_sync(ipvlan->phy_dev, dev);
>>         dev_mc_sync(ipvlan->phy_dev, dev);
>> @@ -470,6 +459,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>>         INIT_LIST_HEAD(&ipvlan->addrs);
>>         ipvlan->ipv4cnt = 0;
>>         ipvlan->ipv6cnt = 0;
>> +       __set_bit(ipvlan_mac_hash(dev->broadcast), ipvlan->mac_filters);
>>
>>         /* TODO Probably put random address here to be presented to the
>>          * world but keep using the physical-dev address for the outgoing
>> @@ -694,7 +684,6 @@ static int ipvlan_add_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>>         list_add_tail_rcu(&addr->anode, &ipvlan->addrs);
>>         ipvlan->ipv4cnt++;
>>         ipvlan_ht_addr_add(ipvlan, addr);
>> -       ipvlan_set_broadcast_mac_filter(ipvlan, true);
>>
>>         return 0;
>>  }
>> @@ -711,8 +700,6 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>>         list_del_rcu(&addr->anode);
>>         ipvlan->ipv4cnt--;
>>         WARN_ON(ipvlan->ipv4cnt < 0);
>> -       if (!ipvlan->ipv4cnt)
>> -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
>>         kfree_rcu(addr, rcu);
>>
>>         return;
>> --
>> 2.1.0
>>
>>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-28  5:56     ` Mahesh Bandewar
@ 2015-03-28 18:32       ` Jiri Benc
  2015-03-30 14:37         ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Jiri Benc @ 2015-03-28 18:32 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: Dan Williams, linux-netdev

On Fri, 27 Mar 2015 22:56:15 -0700, Mahesh Bandewar wrote:
> The current logic disables broadcast by default and enables only when
> an IPv4 address is added. If this is inverted and -
> enables broadcast by default but disables it when only IPv6
> address(es) is / are added. These links can have multiple addresses
> and hence have to be careful if any one of those is IPv4 then
> broadcast bit has to be set.

You'd have to be careful and ignore IPv6 link local addresses.
Those are added automatically whenever IPv6 is enabled and their
presence does not mean the network is not IPv4 only.

But I don't like such magic behavior. It would lead to DHCP sometimes
working and sometimes not in mixed v4/v6 environment depending on
whether DHCPv4 or SLAAC was faster.

Could we perhaps add a flag when creating ipvlan interface stating
whether IPv4 broadcast should be always enabled? Or, rather, the other
way round - whether it should be disabled by default. Call it "nodhcp"
or so.

Btw, speaking about IPv6 link local addresses, these actually do not
work with ipvlan correctly. I'm getting DAD failures if I have more
than one ipvlan interface, which is no wonder. This means that ipvlan
cannot work with IPv6 reliably by default (unless you take care of ll
address assignment and ensure all ipvlan interfaces get a different
one).

 Jiri

-- 
Jiri Benc

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-28 18:32       ` Jiri Benc
@ 2015-03-30 14:37         ` Dan Williams
  2015-03-30 16:54           ` Mahesh Bandewar
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-03-30 14:37 UTC (permalink / raw)
  To: Jiri Benc; +Cc: Mahesh Bandewar, linux-netdev

On Sat, 2015-03-28 at 19:32 +0100, Jiri Benc wrote:
> On Fri, 27 Mar 2015 22:56:15 -0700, Mahesh Bandewar wrote:
> > The current logic disables broadcast by default and enables only when
> > an IPv4 address is added. If this is inverted and -
> > enables broadcast by default but disables it when only IPv6
> > address(es) is / are added. These links can have multiple addresses
> > and hence have to be careful if any one of those is IPv4 then
> > broadcast bit has to be set.
> 
> You'd have to be careful and ignore IPv6 link local addresses.
> Those are added automatically whenever IPv6 is enabled and their
> presence does not mean the network is not IPv4 only.
> 
> But I don't like such magic behavior. It would lead to DHCP sometimes
> working and sometimes not in mixed v4/v6 environment depending on
> whether DHCPv4 or SLAAC was faster.
> 
> Could we perhaps add a flag when creating ipvlan interface stating
> whether IPv4 broadcast should be always enabled? Or, rather, the other
> way round - whether it should be disabled by default. Call it "nodhcp"
> or so.
> 
> Btw, speaking about IPv6 link local addresses, these actually do not
> work with ipvlan correctly. I'm getting DAD failures if I have more
> than one ipvlan interface, which is no wonder. This means that ipvlan
> cannot work with IPv6 reliably by default (unless you take care of ll
> address assignment and ensure all ipvlan interfaces get a different
> one).

ipvlan doesn't set dev_id.  Once dev_id is set the kernel's IPv6LL
address generation code will assign a different LL address to each
ipvlan interface created from the same physical interface, despite that
they have the same MAC address.

But of course you'd have to be careful to assign a *different* dev_id
than any of that physical interface's non-ipvlan children too, and I
have no idea how that would work since dev_id is currently done
per-driver.  eg, if you have a physical interface with dev_id=1 which
you then create an ipvlan from, that ipvlan must not use dev_id=1 or it
will be assigned the same IPv6LL address as the parent.

Dan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-30 14:37         ` Dan Williams
@ 2015-03-30 16:54           ` Mahesh Bandewar
  2015-03-30 17:44             ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-03-30 16:54 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jiri Benc, linux-netdev

On Mon, Mar 30, 2015 at 7:37 AM, Dan Williams <dcbw@redhat.com> wrote:
> On Sat, 2015-03-28 at 19:32 +0100, Jiri Benc wrote:
>> On Fri, 27 Mar 2015 22:56:15 -0700, Mahesh Bandewar wrote:
>> > The current logic disables broadcast by default and enables only when
>> > an IPv4 address is added. If this is inverted and -
>> > enables broadcast by default but disables it when only IPv6
>> > address(es) is / are added. These links can have multiple addresses
>> > and hence have to be careful if any one of those is IPv4 then
>> > broadcast bit has to be set.
>>
>> You'd have to be careful and ignore IPv6 link local addresses.
>> Those are added automatically whenever IPv6 is enabled and their
>> presence does not mean the network is not IPv4 only.
>>
>> But I don't like such magic behavior. It would lead to DHCP sometimes
>> working and sometimes not in mixed v4/v6 environment depending on
>> whether DHCPv4 or SLAAC was faster.
>>
>> Could we perhaps add a flag when creating ipvlan interface stating
>> whether IPv4 broadcast should be always enabled? Or, rather, the other
>> way round - whether it should be disabled by default. Call it "nodhcp"
>> or so.
>>
>> Btw, speaking about IPv6 link local addresses, these actually do not
>> work with ipvlan correctly. I'm getting DAD failures if I have more
>> than one ipvlan interface, which is no wonder. This means that ipvlan
>> cannot work with IPv6 reliably by default (unless you take care of ll
>> address assignment and ensure all ipvlan interfaces get a different
>> one).
>
> ipvlan doesn't set dev_id.  Once dev_id is set the kernel's IPv6LL
> address generation code will assign a different LL address to each
> ipvlan interface created from the same physical interface, despite that
> they have the same MAC address.
>
Yes, that was what my plan was but never got around fixing that

> But of course you'd have to be careful to assign a *different* dev_id
> than any of that physical interface's non-ipvlan children too, and I
> have no idea how that would work since dev_id is currently done
> per-driver.  eg, if you have a physical interface with dev_id=1 which
> you then create an ipvlan from, that ipvlan must not use dev_id=1 or it
> will be assigned the same IPv6LL address as the parent.
>
The description is very clear for dev_id (in netdevice.h). So the idea
of using the subsequent numbers after master's id should be possible.
After all these logical devices are going to share the same link. Most
physical drivers don't assign dev-id so the beginning is 0x0 (for the
physical driver) and from 0x1 can be assigned to the logical links.
The definition is not clear in terms of what is the beginning (0x0 or
0x1) but from the code that generates the IPv6LL it's common that it's
0x0 hence logical links on top of these links can use 0x1 onward.
However a check to see if the master-link has dev-id and staying clear
of that should be sufficient.

--mahesh..


>
> Dan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-30 16:54           ` Mahesh Bandewar
@ 2015-03-30 17:44             ` Dan Williams
  2015-03-30 17:56               ` Mahesh Bandewar
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-03-30 17:44 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: Jiri Benc, linux-netdev

On Mon, 2015-03-30 at 09:54 -0700, Mahesh Bandewar wrote:
> On Mon, Mar 30, 2015 at 7:37 AM, Dan Williams <dcbw@redhat.com> wrote:
> > On Sat, 2015-03-28 at 19:32 +0100, Jiri Benc wrote:
> >> On Fri, 27 Mar 2015 22:56:15 -0700, Mahesh Bandewar wrote:
> >> > The current logic disables broadcast by default and enables only when
> >> > an IPv4 address is added. If this is inverted and -
> >> > enables broadcast by default but disables it when only IPv6
> >> > address(es) is / are added. These links can have multiple addresses
> >> > and hence have to be careful if any one of those is IPv4 then
> >> > broadcast bit has to be set.
> >>
> >> You'd have to be careful and ignore IPv6 link local addresses.
> >> Those are added automatically whenever IPv6 is enabled and their
> >> presence does not mean the network is not IPv4 only.
> >>
> >> But I don't like such magic behavior. It would lead to DHCP sometimes
> >> working and sometimes not in mixed v4/v6 environment depending on
> >> whether DHCPv4 or SLAAC was faster.
> >>
> >> Could we perhaps add a flag when creating ipvlan interface stating
> >> whether IPv4 broadcast should be always enabled? Or, rather, the other
> >> way round - whether it should be disabled by default. Call it "nodhcp"
> >> or so.
> >>
> >> Btw, speaking about IPv6 link local addresses, these actually do not
> >> work with ipvlan correctly. I'm getting DAD failures if I have more
> >> than one ipvlan interface, which is no wonder. This means that ipvlan
> >> cannot work with IPv6 reliably by default (unless you take care of ll
> >> address assignment and ensure all ipvlan interfaces get a different
> >> one).
> >
> > ipvlan doesn't set dev_id.  Once dev_id is set the kernel's IPv6LL
> > address generation code will assign a different LL address to each
> > ipvlan interface created from the same physical interface, despite that
> > they have the same MAC address.
> >
> Yes, that was what my plan was but never got around fixing that
> 
> > But of course you'd have to be careful to assign a *different* dev_id
> > than any of that physical interface's non-ipvlan children too, and I
> > have no idea how that would work since dev_id is currently done
> > per-driver.  eg, if you have a physical interface with dev_id=1 which
> > you then create an ipvlan from, that ipvlan must not use dev_id=1 or it
> > will be assigned the same IPv6LL address as the parent.
> >
> The description is very clear for dev_id (in netdevice.h). So the idea
> of using the subsequent numbers after master's id should be possible.
> After all these logical devices are going to share the same link. Most
> physical drivers don't assign dev-id so the beginning is 0x0 (for the
> physical driver) and from 0x1 can be assigned to the logical links.
> The definition is not clear in terms of what is the beginning (0x0 or
> 0x1) but from the code that generates the IPv6LL it's common that it's
> 0x0 hence logical links on top of these links can use 0x1 onward.
> However a check to see if the master-link has dev-id and staying clear
> of that should be sufficient.

My point was that if you have a parent with a non-zero dev_id, there can
be other siblings of the parent that have a different dev_id and share
the same MAC address.  So creating an ipvlan with parent->dev_id + 1
doesn't work, because the parent may have a sibling with parent->dev_id
+ 1 and the same MAC address already.

Dan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-30 17:44             ` Dan Williams
@ 2015-03-30 17:56               ` Mahesh Bandewar
  2015-03-30 18:13                 ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-03-30 17:56 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jiri Benc, linux-netdev

On Mon, Mar 30, 2015 at 10:44 AM, Dan Williams <dcbw@redhat.com> wrote:
> On Mon, 2015-03-30 at 09:54 -0700, Mahesh Bandewar wrote:
>> On Mon, Mar 30, 2015 at 7:37 AM, Dan Williams <dcbw@redhat.com> wrote:
>> > On Sat, 2015-03-28 at 19:32 +0100, Jiri Benc wrote:
>> >> On Fri, 27 Mar 2015 22:56:15 -0700, Mahesh Bandewar wrote:
>> >> > The current logic disables broadcast by default and enables only when
>> >> > an IPv4 address is added. If this is inverted and -
>> >> > enables broadcast by default but disables it when only IPv6
>> >> > address(es) is / are added. These links can have multiple addresses
>> >> > and hence have to be careful if any one of those is IPv4 then
>> >> > broadcast bit has to be set.
>> >>
>> >> You'd have to be careful and ignore IPv6 link local addresses.
>> >> Those are added automatically whenever IPv6 is enabled and their
>> >> presence does not mean the network is not IPv4 only.
>> >>
>> >> But I don't like such magic behavior. It would lead to DHCP sometimes
>> >> working and sometimes not in mixed v4/v6 environment depending on
>> >> whether DHCPv4 or SLAAC was faster.
>> >>
>> >> Could we perhaps add a flag when creating ipvlan interface stating
>> >> whether IPv4 broadcast should be always enabled? Or, rather, the other
>> >> way round - whether it should be disabled by default. Call it "nodhcp"
>> >> or so.
>> >>
>> >> Btw, speaking about IPv6 link local addresses, these actually do not
>> >> work with ipvlan correctly. I'm getting DAD failures if I have more
>> >> than one ipvlan interface, which is no wonder. This means that ipvlan
>> >> cannot work with IPv6 reliably by default (unless you take care of ll
>> >> address assignment and ensure all ipvlan interfaces get a different
>> >> one).
>> >
>> > ipvlan doesn't set dev_id.  Once dev_id is set the kernel's IPv6LL
>> > address generation code will assign a different LL address to each
>> > ipvlan interface created from the same physical interface, despite that
>> > they have the same MAC address.
>> >
>> Yes, that was what my plan was but never got around fixing that
>>
>> > But of course you'd have to be careful to assign a *different* dev_id
>> > than any of that physical interface's non-ipvlan children too, and I
>> > have no idea how that would work since dev_id is currently done
>> > per-driver.  eg, if you have a physical interface with dev_id=1 which
>> > you then create an ipvlan from, that ipvlan must not use dev_id=1 or it
>> > will be assigned the same IPv6LL address as the parent.
>> >
>> The description is very clear for dev_id (in netdevice.h). So the idea
>> of using the subsequent numbers after master's id should be possible.
>> After all these logical devices are going to share the same link. Most
>> physical drivers don't assign dev-id so the beginning is 0x0 (for the
>> physical driver) and from 0x1 can be assigned to the logical links.
>> The definition is not clear in terms of what is the beginning (0x0 or
>> 0x1) but from the code that generates the IPv6LL it's common that it's
>> 0x0 hence logical links on top of these links can use 0x1 onward.
>> However a check to see if the master-link has dev-id and staying clear
>> of that should be sufficient.
>
> My point was that if you have a parent with a non-zero dev_id, there can
> be other siblings of the parent that have a different dev_id and share
> the same MAC address.  So creating an ipvlan with parent->dev_id + 1
> doesn't work, because the parent may have a sibling with parent->dev_id
> + 1 and the same MAC address already.
>
May be I'm missing something but is there a scenario where sibling
(physical / port) will be sharing the same LL-address? The definition
/ description in netdevice.h is  -

               *  @dev_id:        Used to differentiate devices that share
               *                          the same link layer address

So I's assuming the layered / stacked devices (children) rather than
ports etc (siblings). What am I missing?



> Dan
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-30 17:56               ` Mahesh Bandewar
@ 2015-03-30 18:13                 ` Dan Williams
  2015-03-30 18:32                   ` Mahesh Bandewar
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-03-30 18:13 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: Jiri Benc, linux-netdev

On Mon, 2015-03-30 at 10:56 -0700, Mahesh Bandewar wrote:
> On Mon, Mar 30, 2015 at 10:44 AM, Dan Williams <dcbw@redhat.com> wrote:
> > On Mon, 2015-03-30 at 09:54 -0700, Mahesh Bandewar wrote:
> >> On Mon, Mar 30, 2015 at 7:37 AM, Dan Williams <dcbw@redhat.com> wrote:
> >> > On Sat, 2015-03-28 at 19:32 +0100, Jiri Benc wrote:
> >> >> On Fri, 27 Mar 2015 22:56:15 -0700, Mahesh Bandewar wrote:
> >> >> > The current logic disables broadcast by default and enables only when
> >> >> > an IPv4 address is added. If this is inverted and -
> >> >> > enables broadcast by default but disables it when only IPv6
> >> >> > address(es) is / are added. These links can have multiple addresses
> >> >> > and hence have to be careful if any one of those is IPv4 then
> >> >> > broadcast bit has to be set.
> >> >>
> >> >> You'd have to be careful and ignore IPv6 link local addresses.
> >> >> Those are added automatically whenever IPv6 is enabled and their
> >> >> presence does not mean the network is not IPv4 only.
> >> >>
> >> >> But I don't like such magic behavior. It would lead to DHCP sometimes
> >> >> working and sometimes not in mixed v4/v6 environment depending on
> >> >> whether DHCPv4 or SLAAC was faster.
> >> >>
> >> >> Could we perhaps add a flag when creating ipvlan interface stating
> >> >> whether IPv4 broadcast should be always enabled? Or, rather, the other
> >> >> way round - whether it should be disabled by default. Call it "nodhcp"
> >> >> or so.
> >> >>
> >> >> Btw, speaking about IPv6 link local addresses, these actually do not
> >> >> work with ipvlan correctly. I'm getting DAD failures if I have more
> >> >> than one ipvlan interface, which is no wonder. This means that ipvlan
> >> >> cannot work with IPv6 reliably by default (unless you take care of ll
> >> >> address assignment and ensure all ipvlan interfaces get a different
> >> >> one).
> >> >
> >> > ipvlan doesn't set dev_id.  Once dev_id is set the kernel's IPv6LL
> >> > address generation code will assign a different LL address to each
> >> > ipvlan interface created from the same physical interface, despite that
> >> > they have the same MAC address.
> >> >
> >> Yes, that was what my plan was but never got around fixing that
> >>
> >> > But of course you'd have to be careful to assign a *different* dev_id
> >> > than any of that physical interface's non-ipvlan children too, and I
> >> > have no idea how that would work since dev_id is currently done
> >> > per-driver.  eg, if you have a physical interface with dev_id=1 which
> >> > you then create an ipvlan from, that ipvlan must not use dev_id=1 or it
> >> > will be assigned the same IPv6LL address as the parent.
> >> >
> >> The description is very clear for dev_id (in netdevice.h). So the idea
> >> of using the subsequent numbers after master's id should be possible.
> >> After all these logical devices are going to share the same link. Most
> >> physical drivers don't assign dev-id so the beginning is 0x0 (for the
> >> physical driver) and from 0x1 can be assigned to the logical links.
> >> The definition is not clear in terms of what is the beginning (0x0 or
> >> 0x1) but from the code that generates the IPv6LL it's common that it's
> >> 0x0 hence logical links on top of these links can use 0x1 onward.
> >> However a check to see if the master-link has dev-id and staying clear
> >> of that should be sufficient.
> >
> > My point was that if you have a parent with a non-zero dev_id, there can
> > be other siblings of the parent that have a different dev_id and share
> > the same MAC address.  So creating an ipvlan with parent->dev_id + 1
> > doesn't work, because the parent may have a sibling with parent->dev_id
> > + 1 and the same MAC address already.
> >
> May be I'm missing something but is there a scenario where sibling
> (physical / port) will be sharing the same LL-address? The definition
> / description in netdevice.h is  -
> 
>                *  @dev_id:        Used to differentiate devices that share
>                *                          the same link layer address
> 
> So I's assuming the layered / stacked devices (children) rather than
> ports etc (siblings). What am I missing?

I don't think that distinction matters since you can create an ipvlan
interface on top of any other interface except a macvlan.  So any driver
that sets dev_id could be the parent of an ipvlan interface.  That
appears to include some CAN devices and s390's qeth driver at the
moment.

Dan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 2/2] ipvlan: always allow the broadcast MAC address
  2015-03-30 18:13                 ` Dan Williams
@ 2015-03-30 18:32                   ` Mahesh Bandewar
  0 siblings, 0 replies; 37+ messages in thread
From: Mahesh Bandewar @ 2015-03-30 18:32 UTC (permalink / raw)
  To: Dan Williams; +Cc: Jiri Benc, linux-netdev

On Mon, Mar 30, 2015 at 11:13 AM, Dan Williams <dcbw@redhat.com> wrote:
> On Mon, 2015-03-30 at 10:56 -0700, Mahesh Bandewar wrote:
>> On Mon, Mar 30, 2015 at 10:44 AM, Dan Williams <dcbw@redhat.com> wrote:
>> > On Mon, 2015-03-30 at 09:54 -0700, Mahesh Bandewar wrote:
>> >> On Mon, Mar 30, 2015 at 7:37 AM, Dan Williams <dcbw@redhat.com> wrote:
>> >> > On Sat, 2015-03-28 at 19:32 +0100, Jiri Benc wrote:
>> >> >> On Fri, 27 Mar 2015 22:56:15 -0700, Mahesh Bandewar wrote:
>> >> >> > The current logic disables broadcast by default and enables only when
>> >> >> > an IPv4 address is added. If this is inverted and -
>> >> >> > enables broadcast by default but disables it when only IPv6
>> >> >> > address(es) is / are added. These links can have multiple addresses
>> >> >> > and hence have to be careful if any one of those is IPv4 then
>> >> >> > broadcast bit has to be set.
>> >> >>
>> >> >> You'd have to be careful and ignore IPv6 link local addresses.
>> >> >> Those are added automatically whenever IPv6 is enabled and their
>> >> >> presence does not mean the network is not IPv4 only.
>> >> >>
>> >> >> But I don't like such magic behavior. It would lead to DHCP sometimes
>> >> >> working and sometimes not in mixed v4/v6 environment depending on
>> >> >> whether DHCPv4 or SLAAC was faster.
>> >> >>
>> >> >> Could we perhaps add a flag when creating ipvlan interface stating
>> >> >> whether IPv4 broadcast should be always enabled? Or, rather, the other
>> >> >> way round - whether it should be disabled by default. Call it "nodhcp"
>> >> >> or so.
>> >> >>
>> >> >> Btw, speaking about IPv6 link local addresses, these actually do not
>> >> >> work with ipvlan correctly. I'm getting DAD failures if I have more
>> >> >> than one ipvlan interface, which is no wonder. This means that ipvlan
>> >> >> cannot work with IPv6 reliably by default (unless you take care of ll
>> >> >> address assignment and ensure all ipvlan interfaces get a different
>> >> >> one).
>> >> >
>> >> > ipvlan doesn't set dev_id.  Once dev_id is set the kernel's IPv6LL
>> >> > address generation code will assign a different LL address to each
>> >> > ipvlan interface created from the same physical interface, despite that
>> >> > they have the same MAC address.
>> >> >
>> >> Yes, that was what my plan was but never got around fixing that
>> >>
>> >> > But of course you'd have to be careful to assign a *different* dev_id
>> >> > than any of that physical interface's non-ipvlan children too, and I
>> >> > have no idea how that would work since dev_id is currently done
>> >> > per-driver.  eg, if you have a physical interface with dev_id=1 which
>> >> > you then create an ipvlan from, that ipvlan must not use dev_id=1 or it
>> >> > will be assigned the same IPv6LL address as the parent.
>> >> >
>> >> The description is very clear for dev_id (in netdevice.h). So the idea
>> >> of using the subsequent numbers after master's id should be possible.
>> >> After all these logical devices are going to share the same link. Most
>> >> physical drivers don't assign dev-id so the beginning is 0x0 (for the
>> >> physical driver) and from 0x1 can be assigned to the logical links.
>> >> The definition is not clear in terms of what is the beginning (0x0 or
>> >> 0x1) but from the code that generates the IPv6LL it's common that it's
>> >> 0x0 hence logical links on top of these links can use 0x1 onward.
>> >> However a check to see if the master-link has dev-id and staying clear
>> >> of that should be sufficient.
>> >
>> > My point was that if you have a parent with a non-zero dev_id, there can
>> > be other siblings of the parent that have a different dev_id and share
>> > the same MAC address.  So creating an ipvlan with parent->dev_id + 1
>> > doesn't work, because the parent may have a sibling with parent->dev_id
>> > + 1 and the same MAC address already.
>> >
>> May be I'm missing something but is there a scenario where sibling
>> (physical / port) will be sharing the same LL-address? The definition
>> / description in netdevice.h is  -
>>
>>                *  @dev_id:        Used to differentiate devices that share
>>                *                          the same link layer address
>>
>> So I's assuming the layered / stacked devices (children) rather than
>> ports etc (siblings). What am I missing?
>
> I don't think that distinction matters since you can create an ipvlan
> interface on top of any other interface except a macvlan.  So any driver
> that sets dev_id could be the parent of an ipvlan interface.  That
> appears to include some CAN devices and s390's qeth driver at the
> moment.
>
Yes, that is true (forgot about it!), which means the dev-id has to be
unique in the physical drivers (not just masters') hierarchy.

I think it becomes little complicated. When IPvlan get instantiated,
traverse the dev-layers
using the adj_list linkage and find the lowest device and then
traverse the hierarchy to find a base-dev-id. Then use that base for
all logical link creation. Would this work?

> Dan
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
  2015-03-26 22:41 [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters Dan Williams
  2015-03-26 22:43 ` [PATCH 2/2] ipvlan: always allow the broadcast MAC address Dan Williams
  2015-03-27 17:45 ` [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters Jiri Benc
@ 2015-03-30 20:28 ` Mahesh Bandewar
  2015-03-30 21:01   ` Dan Williams
  2 siblings, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-03-30 20:28 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

On Thu, Mar 26, 2015 at 3:41 PM, Dan Williams <dcbw@redhat.com> wrote:
> The broadcast MAC is supposed to be allowed whenever the device
> has an IPv4 address, otherwise ARP requests get dropped on the
> floor.  If ndo_set_rx_mode (and thus
> ipvlan_set_multicast_mac_filter()) gets called after the address
> was added, it blows away the broadcast MAC address in
> mac_filters that was added at IPv4 address addition.  Fix that.
>
> Signed-off-by: Dan Williams <dcbw@redhat.com>
Acked-by: Mahesh Bandewar <maheshb@google.com>
> ---
>  drivers/net/ipvlan/ipvlan_main.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 4f4099d..d34f580 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -241,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>
>                 bitmap_copy(ipvlan->mac_filters, mc_filters,
>                             IPVLAN_MAC_FILTER_SIZE);
> +
> +               if (ipvlan->ipv4cnt)
> +                       ipvlan_set_broadcast_mac_filter(ipvlan, true);
>         }
>         dev_uc_sync(ipvlan->phy_dev, dev);
>         dev_mc_sync(ipvlan->phy_dev, dev);
> --
> 2.1.0
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
  2015-03-30 20:28 ` Mahesh Bandewar
@ 2015-03-30 21:01   ` Dan Williams
  2015-03-30 21:11     ` Mahesh Bandewar
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-03-30 21:01 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Mon, 2015-03-30 at 13:28 -0700, Mahesh Bandewar wrote:
> On Thu, Mar 26, 2015 at 3:41 PM, Dan Williams <dcbw@redhat.com> wrote:
> > The broadcast MAC is supposed to be allowed whenever the device
> > has an IPv4 address, otherwise ARP requests get dropped on the
> > floor.  If ndo_set_rx_mode (and thus
> > ipvlan_set_multicast_mac_filter()) gets called after the address
> > was added, it blows away the broadcast MAC address in
> > mac_filters that was added at IPv4 address addition.  Fix that.
> >
> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> Acked-by: Mahesh Bandewar <maheshb@google.com>

I'm actually going to send another patch that supercedes this one and
handles the DHCP issue in a slightly different way.

Dan

> > ---
> >  drivers/net/ipvlan/ipvlan_main.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> > index 4f4099d..d34f580 100644
> > --- a/drivers/net/ipvlan/ipvlan_main.c
> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> > @@ -241,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> >
> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
> >                             IPVLAN_MAC_FILTER_SIZE);
> > +
> > +               if (ipvlan->ipv4cnt)
> > +                       ipvlan_set_broadcast_mac_filter(ipvlan, true);
> >         }
> >         dev_uc_sync(ipvlan->phy_dev, dev);
> >         dev_mc_sync(ipvlan->phy_dev, dev);
> > --
> > 2.1.0
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
  2015-03-30 21:01   ` Dan Williams
@ 2015-03-30 21:11     ` Mahesh Bandewar
  2015-03-31  3:05       ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-03-30 21:11 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

On Mon, Mar 30, 2015 at 2:01 PM, Dan Williams <dcbw@redhat.com> wrote:
> On Mon, 2015-03-30 at 13:28 -0700, Mahesh Bandewar wrote:
>> On Thu, Mar 26, 2015 at 3:41 PM, Dan Williams <dcbw@redhat.com> wrote:
>> > The broadcast MAC is supposed to be allowed whenever the device
>> > has an IPv4 address, otherwise ARP requests get dropped on the
>> > floor.  If ndo_set_rx_mode (and thus
>> > ipvlan_set_multicast_mac_filter()) gets called after the address
>> > was added, it blows away the broadcast MAC address in
>> > mac_filters that was added at IPv4 address addition.  Fix that.
>> >
>> > Signed-off-by: Dan Williams <dcbw@redhat.com>
>> Acked-by: Mahesh Bandewar <maheshb@google.com>
>
> I'm actually going to send another patch that supercedes this one and
> handles the DHCP issue in a slightly different way.
>
Sure, I'll take a look at that but irrespective of how DHCP is
handled, this patch fixes the problem that you have described since
ndo_set_rxmode will wipe the broadcast bit if set and will end up
breaking the IPv4. So I feel that this is still required in some-form.

> Dan
>
>> > ---
>> >  drivers/net/ipvlan/ipvlan_main.c | 3 +++
>> >  1 file changed, 3 insertions(+)
>> >
>> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
>> > index 4f4099d..d34f580 100644
>> > --- a/drivers/net/ipvlan/ipvlan_main.c
>> > +++ b/drivers/net/ipvlan/ipvlan_main.c
>> > @@ -241,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>> >
>> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
>> >                             IPVLAN_MAC_FILTER_SIZE);
>> > +
>> > +               if (ipvlan->ipv4cnt)
>> > +                       ipvlan_set_broadcast_mac_filter(ipvlan, true);
>> >         }
>> >         dev_uc_sync(ipvlan->phy_dev, dev);
>> >         dev_mc_sync(ipvlan->phy_dev, dev);
>> > --
>> > 2.1.0
>> >
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
  2015-03-30 21:11     ` Mahesh Bandewar
@ 2015-03-31  3:05       ` Dan Williams
  2015-03-31  4:22         ` [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-03-31  3:05 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Mon, 2015-03-30 at 14:11 -0700, Mahesh Bandewar wrote:
> On Mon, Mar 30, 2015 at 2:01 PM, Dan Williams <dcbw@redhat.com> wrote:
> > On Mon, 2015-03-30 at 13:28 -0700, Mahesh Bandewar wrote:
> >> On Thu, Mar 26, 2015 at 3:41 PM, Dan Williams <dcbw@redhat.com> wrote:
> >> > The broadcast MAC is supposed to be allowed whenever the device
> >> > has an IPv4 address, otherwise ARP requests get dropped on the
> >> > floor.  If ndo_set_rx_mode (and thus
> >> > ipvlan_set_multicast_mac_filter()) gets called after the address
> >> > was added, it blows away the broadcast MAC address in
> >> > mac_filters that was added at IPv4 address addition.  Fix that.
> >> >
> >> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> >> Acked-by: Mahesh Bandewar <maheshb@google.com>
> >
> > I'm actually going to send another patch that supercedes this one and
> > handles the DHCP issue in a slightly different way.
> >
> Sure, I'll take a look at that but irrespective of how DHCP is
> handled, this patch fixes the problem that you have described since
> ndo_set_rxmode will wipe the broadcast bit if set and will end up
> breaking the IPv4. So I feel that this is still required in some-form.

That's true, which is why I originally split the patches up.  So I'm
fine if this is applied.  But the DHCP bits will change this part too.

Dan

> > Dan
> >
> >> > ---
> >> >  drivers/net/ipvlan/ipvlan_main.c | 3 +++
> >> >  1 file changed, 3 insertions(+)
> >> >
> >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> >> > index 4f4099d..d34f580 100644
> >> > --- a/drivers/net/ipvlan/ipvlan_main.c
> >> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> >> > @@ -241,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> >> >
> >> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
> >> >                             IPVLAN_MAC_FILTER_SIZE);
> >> > +
> >> > +               if (ipvlan->ipv4cnt)
> >> > +                       ipvlan_set_broadcast_mac_filter(ipvlan, true);
> >> >         }
> >> >         dev_uc_sync(ipvlan->phy_dev, dev);
> >> >         dev_mc_sync(ipvlan->phy_dev, dev);
> >> > --
> >> > 2.1.0
> >> >
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-03-31  3:05       ` Dan Williams
@ 2015-03-31  4:22         ` Dan Williams
  2015-04-01 20:07           ` Dan Williams
                             ` (2 more replies)
  0 siblings, 3 replies; 37+ messages in thread
From: Dan Williams @ 2015-03-31  4:22 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

The broadcast MAC is supposed to be allowed whenever the device
has an IPv4 address, otherwise ARP requests get dropped on the
floor.  If ndo_set_rx_mode (and thus
ipvlan_set_multicast_mac_filter()) got called after the address
was added, it would blow away the broadcast MAC address in
mac_filters that was added at IPv4 address addition.

Next, ipvlan currently fails DHCP addressing for two reasons:

1) DHCP offers are typically unicast back to the device's MAC
address, and at the IP layer have a destination IP address
matching the new lease address.  In ipvlan unicast packets
are checked against existing addresses assigned to the ipvlan
interface, so clearly this fails hard because the ipvlan
interface has no IP addresses yet.  Workaround: request
that the server broadcast replies (-B for dhclient) which
don't get checked against the address list.

2) Even when that's done, mac_filters only allows the
broadcast MAC address when the interface has >= 1 IPv4
addresses, so double-fail, and the incoming DHCP offer
gets dropped on the floor again.

To fix these issues, Watch for outgoing DHCP requests and
enable the broadcast MAC address indefinitely when one is seen.

Signed-off-by: Dan Williams <dcbw@redhat.com>
---
NOTE: this patch supercedes my previous two ipvlan patches:
[PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
[PATCH 2/2] ipvlan: always allow the broadcast MAC address

 drivers/net/ipvlan/ipvlan.h      |  2 ++
 drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
 drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
 3 files changed, 45 insertions(+), 5 deletions(-)

diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
index 924ea98..7e0b8ff 100644
--- a/drivers/net/ipvlan/ipvlan.h
+++ b/drivers/net/ipvlan/ipvlan.h
@@ -67,6 +67,7 @@ struct ipvl_dev {
 	struct list_head	addrs;
 	int			ipv4cnt;
 	int			ipv6cnt;
+	bool			dhcp4_seen;
 	struct ipvl_pcpu_stats	__percpu *pcpu_stats;
 	DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
 	netdev_features_t	sfeatures;
@@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
 struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
 					const void *iaddr, bool is_v6);
 void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
+void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
 #endif /* __IPVLAN_H */
diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
index 2a17500..6dd992b 100644
--- a/drivers/net/ipvlan/ipvlan_core.c
+++ b/drivers/net/ipvlan/ipvlan_core.c
@@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
 		lyr3h = arph;
 		break;
 	}
+	case htons(ETH_P_ALL): {
+		/* Raw sockets */
+		if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
+			break;
+		/* Fall through */
+	}
 	case htons(ETH_P_IP): {
 		u32 pktlen;
 		struct iphdr *ip4h;
@@ -454,16 +460,42 @@ out:
 	return ipvlan_process_outbound(skb, ipvlan);
 }
 
+#define DHCP_PACKET_MIN_LEN 282
+
+static bool is_dhcp(struct sk_buff *skb)
+{
+	struct iphdr *ip4h = ip_hdr(skb);
+	struct udphdr *udph;
+
+	if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
+		|| ip_is_fragment(ip4h))
+		return false;
+
+	/* In the case of RAW sockets the transport header is not set by
+	 * the IP stack so we must set it ourselves.
+	 */
+	if (skb->transport_header == skb->network_header)
+		skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
+	udph = udp_hdr(skb);
+	return udph && udph->dest == htons(67) && udph->source == htons(68);
+}
+
 static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
 {
-	const struct ipvl_dev *ipvlan = netdev_priv(dev);
+	struct ipvl_dev *ipvlan = netdev_priv(dev);
 	struct ethhdr *eth = eth_hdr(skb);
 	struct ipvl_addr *addr;
 	void *lyr3h;
 	int addr_type;
 
+	lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
+	if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
+	    && is_dhcp(skb)) {
+		ipvlan->dhcp4_seen = true;
+		ipvlan_set_broadcast_mac_filter(ipvlan, true);
+	}
+
 	if (ether_addr_equal(eth->h_dest, eth->h_source)) {
-		lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
 		if (lyr3h) {
 			addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
 			if (addr)
diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
index 4f4099d..a497fb9 100644
--- a/drivers/net/ipvlan/ipvlan_main.c
+++ b/drivers/net/ipvlan/ipvlan_main.c
@@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
 
 	dev_uc_del(phy_dev, phy_dev->dev_addr);
 
+	ipvlan->dhcp4_seen = false;
+
 	if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
 		list_for_each_entry(addr, &ipvlan->addrs, anode)
 			ipvlan_ht_addr_del(addr, !dev->dismantle);
@@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
 		dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
 }
 
-static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
+void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
 {
 	struct net_device *dev = ipvlan->dev;
 	unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
@@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
 		netdev_for_each_mc_addr(ha, dev)
 			__set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
 
+		if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
+			__set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
+
 		bitmap_copy(ipvlan->mac_filters, mc_filters,
 			    IPVLAN_MAC_FILTER_SIZE);
 	}
@@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
 	INIT_LIST_HEAD(&ipvlan->addrs);
 	ipvlan->ipv4cnt = 0;
 	ipvlan->ipv6cnt = 0;
+	ipvlan->dhcp4_seen = false;
 
 	/* TODO Probably put random address here to be presented to the
 	 * world but keep using the physical-dev address for the outgoing
@@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
 	list_del_rcu(&addr->anode);
 	ipvlan->ipv4cnt--;
 	WARN_ON(ipvlan->ipv4cnt < 0);
-	if (!ipvlan->ipv4cnt)
-	    ipvlan_set_broadcast_mac_filter(ipvlan, false);
+	if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
+		ipvlan_set_broadcast_mac_filter(ipvlan, false);
 	kfree_rcu(addr, rcu);
 
 	return;
-- 
2.1.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-03-31  4:22         ` [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP Dan Williams
@ 2015-04-01 20:07           ` Dan Williams
  2015-04-01 20:24           ` Mahesh Bandewar
  2015-04-02 18:16           ` David Miller
  2 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2015-04-01 20:07 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Mon, 2015-03-30 at 23:22 -0500, Dan Williams wrote:
> The broadcast MAC is supposed to be allowed whenever the device
> has an IPv4 address, otherwise ARP requests get dropped on the
> floor.  If ndo_set_rx_mode (and thus
> ipvlan_set_multicast_mac_filter()) got called after the address
> was added, it would blow away the broadcast MAC address in
> mac_filters that was added at IPv4 address addition.
> 
> Next, ipvlan currently fails DHCP addressing for two reasons:
> 
> 1) DHCP offers are typically unicast back to the device's MAC
> address, and at the IP layer have a destination IP address
> matching the new lease address.  In ipvlan unicast packets
> are checked against existing addresses assigned to the ipvlan
> interface, so clearly this fails hard because the ipvlan
> interface has no IP addresses yet.  Workaround: request
> that the server broadcast replies (-B for dhclient) which
> don't get checked against the address list.
> 
> 2) Even when that's done, mac_filters only allows the
> broadcast MAC address when the interface has >= 1 IPv4
> addresses, so double-fail, and the incoming DHCP offer
> gets dropped on the floor again.
> 
> To fix these issues, Watch for outgoing DHCP requests and
> enable the broadcast MAC address indefinitely when one is seen.

Any thoughts on this approach Mahesh?

Dan

> Signed-off-by: Dan Williams <dcbw@redhat.com>
> ---
> NOTE: this patch supercedes my previous two ipvlan patches:
> [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
> [PATCH 2/2] ipvlan: always allow the broadcast MAC address
> 
>  drivers/net/ipvlan/ipvlan.h      |  2 ++
>  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
>  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
>  3 files changed, 45 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> index 924ea98..7e0b8ff 100644
> --- a/drivers/net/ipvlan/ipvlan.h
> +++ b/drivers/net/ipvlan/ipvlan.h
> @@ -67,6 +67,7 @@ struct ipvl_dev {
>  	struct list_head	addrs;
>  	int			ipv4cnt;
>  	int			ipv6cnt;
> +	bool			dhcp4_seen;
>  	struct ipvl_pcpu_stats	__percpu *pcpu_stats;
>  	DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
>  	netdev_features_t	sfeatures;
> @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
>  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
>  					const void *iaddr, bool is_v6);
>  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
> +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
>  #endif /* __IPVLAN_H */
> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> index 2a17500..6dd992b 100644
> --- a/drivers/net/ipvlan/ipvlan_core.c
> +++ b/drivers/net/ipvlan/ipvlan_core.c
> @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
>  		lyr3h = arph;
>  		break;
>  	}
> +	case htons(ETH_P_ALL): {
> +		/* Raw sockets */
> +		if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> +			break;
> +		/* Fall through */
> +	}
>  	case htons(ETH_P_IP): {
>  		u32 pktlen;
>  		struct iphdr *ip4h;
> @@ -454,16 +460,42 @@ out:
>  	return ipvlan_process_outbound(skb, ipvlan);
>  }
>  
> +#define DHCP_PACKET_MIN_LEN 282
> +
> +static bool is_dhcp(struct sk_buff *skb)
> +{
> +	struct iphdr *ip4h = ip_hdr(skb);
> +	struct udphdr *udph;
> +
> +	if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> +		|| ip_is_fragment(ip4h))
> +		return false;
> +
> +	/* In the case of RAW sockets the transport header is not set by
> +	 * the IP stack so we must set it ourselves.
> +	 */
> +	if (skb->transport_header == skb->network_header)
> +		skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
> +	udph = udp_hdr(skb);
> +	return udph && udph->dest == htons(67) && udph->source == htons(68);
> +}
> +
>  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
>  {
> -	const struct ipvl_dev *ipvlan = netdev_priv(dev);
> +	struct ipvl_dev *ipvlan = netdev_priv(dev);
>  	struct ethhdr *eth = eth_hdr(skb);
>  	struct ipvl_addr *addr;
>  	void *lyr3h;
>  	int addr_type;
>  
> +	lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> +	if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> +	    && is_dhcp(skb)) {
> +		ipvlan->dhcp4_seen = true;
> +		ipvlan_set_broadcast_mac_filter(ipvlan, true);
> +	}
> +
>  	if (ether_addr_equal(eth->h_dest, eth->h_source)) {
> -		lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
>  		if (lyr3h) {
>  			addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
>  			if (addr)
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 4f4099d..a497fb9 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
>  
>  	dev_uc_del(phy_dev, phy_dev->dev_addr);
>  
> +	ipvlan->dhcp4_seen = false;
> +
>  	if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
>  		list_for_each_entry(addr, &ipvlan->addrs, anode)
>  			ipvlan_ht_addr_del(addr, !dev->dismantle);
> @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
>  		dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
>  }
>  
> -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>  {
>  	struct net_device *dev = ipvlan->dev;
>  	unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>  		netdev_for_each_mc_addr(ha, dev)
>  			__set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
>  
> +		if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
> +			__set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> +
>  		bitmap_copy(ipvlan->mac_filters, mc_filters,
>  			    IPVLAN_MAC_FILTER_SIZE);
>  	}
> @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>  	INIT_LIST_HEAD(&ipvlan->addrs);
>  	ipvlan->ipv4cnt = 0;
>  	ipvlan->ipv6cnt = 0;
> +	ipvlan->dhcp4_seen = false;
>  
>  	/* TODO Probably put random address here to be presented to the
>  	 * world but keep using the physical-dev address for the outgoing
> @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>  	list_del_rcu(&addr->anode);
>  	ipvlan->ipv4cnt--;
>  	WARN_ON(ipvlan->ipv4cnt < 0);
> -	if (!ipvlan->ipv4cnt)
> -	    ipvlan_set_broadcast_mac_filter(ipvlan, false);
> +	if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
> +		ipvlan_set_broadcast_mac_filter(ipvlan, false);
>  	kfree_rcu(addr, rcu);
>  
>  	return;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-03-31  4:22         ` [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP Dan Williams
  2015-04-01 20:07           ` Dan Williams
@ 2015-04-01 20:24           ` Mahesh Bandewar
  2015-04-01 20:55             ` Dan Williams
  2015-04-02 18:16           ` David Miller
  2 siblings, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-04-01 20:24 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
> The broadcast MAC is supposed to be allowed whenever the device
> has an IPv4 address, otherwise ARP requests get dropped on the
> floor.  If ndo_set_rx_mode (and thus
> ipvlan_set_multicast_mac_filter()) got called after the address
> was added, it would blow away the broadcast MAC address in
> mac_filters that was added at IPv4 address addition.
>
> Next, ipvlan currently fails DHCP addressing for two reasons:
>
I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
which does not use broadcast.

> 1) DHCP offers are typically unicast back to the device's MAC
> address, and at the IP layer have a destination IP address
> matching the new lease address.  In ipvlan unicast packets
> are checked against existing addresses assigned to the ipvlan
> interface, so clearly this fails hard because the ipvlan
> interface has no IP addresses yet.  Workaround: request
> that the server broadcast replies (-B for dhclient) which
> don't get checked against the address list.
>
> 2) Even when that's done, mac_filters only allows the
> broadcast MAC address when the interface has >= 1 IPv4
> addresses, so double-fail, and the incoming DHCP offer
> gets dropped on the floor again.
>
> To fix these issues, Watch for outgoing DHCP requests and
> enable the broadcast MAC address indefinitely when one is seen.
>
This approach will work but adds a performance drag for all UDP
traffic whether the link needs DHCPv4 or not (especially when it does
not need). Logic works well only when the link (always) needs DHCPv4.
When the link does not care about DHCPv4 the packet-inspection always
happens.

Do we really need to special case DHCPv4? Do you see problems in
inverting the current logic of adding broadcast bit (i.e. from
enable-if-IPv4 to disable-if-IPv6-only)?

> Signed-off-by: Dan Williams <dcbw@redhat.com>
> ---
> NOTE: this patch supercedes my previous two ipvlan patches:
> [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
> [PATCH 2/2] ipvlan: always allow the broadcast MAC address
>
>  drivers/net/ipvlan/ipvlan.h      |  2 ++
>  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
>  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
>  3 files changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> index 924ea98..7e0b8ff 100644
> --- a/drivers/net/ipvlan/ipvlan.h
> +++ b/drivers/net/ipvlan/ipvlan.h
> @@ -67,6 +67,7 @@ struct ipvl_dev {
>         struct list_head        addrs;
>         int                     ipv4cnt;
>         int                     ipv6cnt;
> +       bool                    dhcp4_seen;
>         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
>         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
>         netdev_features_t       sfeatures;
> @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
>  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
>                                         const void *iaddr, bool is_v6);
>  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
> +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
>  #endif /* __IPVLAN_H */
> diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> index 2a17500..6dd992b 100644
> --- a/drivers/net/ipvlan/ipvlan_core.c
> +++ b/drivers/net/ipvlan/ipvlan_core.c
> @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
>                 lyr3h = arph;
>                 break;
>         }
> +       case htons(ETH_P_ALL): {
> +               /* Raw sockets */
> +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> +                       break;
> +               /* Fall through */
> +       }
>         case htons(ETH_P_IP): {
>                 u32 pktlen;
>                 struct iphdr *ip4h;
> @@ -454,16 +460,42 @@ out:
>         return ipvlan_process_outbound(skb, ipvlan);
>  }
>
> +#define DHCP_PACKET_MIN_LEN 282
> +
> +static bool is_dhcp(struct sk_buff *skb)
> +{
> +       struct iphdr *ip4h = ip_hdr(skb);
> +       struct udphdr *udph;
> +
> +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> +               || ip_is_fragment(ip4h))
> +               return false;
> +
> +       /* In the case of RAW sockets the transport header is not set by
> +        * the IP stack so we must set it ourselves.
> +        */
> +       if (skb->transport_header == skb->network_header)
> +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
> +       udph = udp_hdr(skb);
> +       return udph && udph->dest == htons(67) && udph->source == htons(68);
> +}
> +
>  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
>  {
> -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
> +       struct ipvl_dev *ipvlan = netdev_priv(dev);
>         struct ethhdr *eth = eth_hdr(skb);
>         struct ipvl_addr *addr;
>         void *lyr3h;
>         int addr_type;
>
> +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> +           && is_dhcp(skb)) {
> +               ipvlan->dhcp4_seen = true;
> +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
> +       }
> +
>         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
> -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
>                 if (lyr3h) {
>                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
>                         if (addr)
> diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> index 4f4099d..a497fb9 100644
> --- a/drivers/net/ipvlan/ipvlan_main.c
> +++ b/drivers/net/ipvlan/ipvlan_main.c
> @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
>
>         dev_uc_del(phy_dev, phy_dev->dev_addr);
>
> +       ipvlan->dhcp4_seen = false;
> +
>         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
>                 list_for_each_entry(addr, &ipvlan->addrs, anode)
>                         ipvlan_ht_addr_del(addr, !dev->dismantle);
> @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
>                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
>  }
>
> -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>  {
>         struct net_device *dev = ipvlan->dev;
>         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>                 netdev_for_each_mc_addr(ha, dev)
>                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
>
> +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
> +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> +
>                 bitmap_copy(ipvlan->mac_filters, mc_filters,
>                             IPVLAN_MAC_FILTER_SIZE);
>         }
> @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>         INIT_LIST_HEAD(&ipvlan->addrs);
>         ipvlan->ipv4cnt = 0;
>         ipvlan->ipv6cnt = 0;
> +       ipvlan->dhcp4_seen = false;
>
>         /* TODO Probably put random address here to be presented to the
>          * world but keep using the physical-dev address for the outgoing
> @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>         list_del_rcu(&addr->anode);
>         ipvlan->ipv4cnt--;
>         WARN_ON(ipvlan->ipv4cnt < 0);
> -       if (!ipvlan->ipv4cnt)
> -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
> +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
> +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
>         kfree_rcu(addr, rcu);
>
>         return;
> --
> 2.1.0
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-01 20:24           ` Mahesh Bandewar
@ 2015-04-01 20:55             ` Dan Williams
  2015-04-02  1:30               ` Mahesh Bandewar
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-04-01 20:55 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
> > The broadcast MAC is supposed to be allowed whenever the device
> > has an IPv4 address, otherwise ARP requests get dropped on the
> > floor.  If ndo_set_rx_mode (and thus
> > ipvlan_set_multicast_mac_filter()) got called after the address
> > was added, it would blow away the broadcast MAC address in
> > mac_filters that was added at IPv4 address addition.
> >
> > Next, ipvlan currently fails DHCP addressing for two reasons:
> >
> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
> which does not use broadcast.
> 
> > 1) DHCP offers are typically unicast back to the device's MAC
> > address, and at the IP layer have a destination IP address
> > matching the new lease address.  In ipvlan unicast packets
> > are checked against existing addresses assigned to the ipvlan
> > interface, so clearly this fails hard because the ipvlan
> > interface has no IP addresses yet.  Workaround: request
> > that the server broadcast replies (-B for dhclient) which
> > don't get checked against the address list.
> >
> > 2) Even when that's done, mac_filters only allows the
> > broadcast MAC address when the interface has >= 1 IPv4
> > addresses, so double-fail, and the incoming DHCP offer
> > gets dropped on the floor again.
> >
> > To fix these issues, Watch for outgoing DHCP requests and
> > enable the broadcast MAC address indefinitely when one is seen.
> >
> This approach will work but adds a performance drag for all UDP
> traffic whether the link needs DHCPv4 or not (especially when it does
> not need). Logic works well only when the link (always) needs DHCPv4.
> When the link does not care about DHCPv4 the packet-inspection always
> happens.

If DHCPv4 is detected on a link, doesn't that mean the link needs the
broadcast filter enabled?  If you start DHCPv4 on the link you obviously
expect it to work, unless there's no DHCPv4 server available.  The patch
should also reset the broadcast filter on close, so network management
could simply start DHCPv4, watch it fail, close/open and set static IP.

Alternatively we could have the broadcast filter on a 2 minute timer
that starts when DHCPv4 is seen.

> Do we really need to special case DHCPv4? Do you see problems in
> inverting the current logic of adding broadcast bit (i.e. from
> enable-if-IPv4 to disable-if-IPv6-only)?

The problem with this is dual-stack configuration.  Typically both
DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
earlier.  So now the interface only has an IPv6 address but DHCPv4 is
still ongoing; as I understand your proposal the filter would now block
broadcast packets that DHCPv4 is listening for.

Dan

> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> > ---
> > NOTE: this patch supercedes my previous two ipvlan patches:
> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
> >
> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
> >  3 files changed, 45 insertions(+), 5 deletions(-)
> >
> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> > index 924ea98..7e0b8ff 100644
> > --- a/drivers/net/ipvlan/ipvlan.h
> > +++ b/drivers/net/ipvlan/ipvlan.h
> > @@ -67,6 +67,7 @@ struct ipvl_dev {
> >         struct list_head        addrs;
> >         int                     ipv4cnt;
> >         int                     ipv6cnt;
> > +       bool                    dhcp4_seen;
> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
> >         netdev_features_t       sfeatures;
> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
> >                                         const void *iaddr, bool is_v6);
> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
> >  #endif /* __IPVLAN_H */
> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> > index 2a17500..6dd992b 100644
> > --- a/drivers/net/ipvlan/ipvlan_core.c
> > +++ b/drivers/net/ipvlan/ipvlan_core.c
> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> >                 lyr3h = arph;
> >                 break;
> >         }
> > +       case htons(ETH_P_ALL): {
> > +               /* Raw sockets */
> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> > +                       break;
> > +               /* Fall through */
> > +       }
> >         case htons(ETH_P_IP): {
> >                 u32 pktlen;
> >                 struct iphdr *ip4h;
> > @@ -454,16 +460,42 @@ out:
> >         return ipvlan_process_outbound(skb, ipvlan);
> >  }
> >
> > +#define DHCP_PACKET_MIN_LEN 282
> > +
> > +static bool is_dhcp(struct sk_buff *skb)
> > +{
> > +       struct iphdr *ip4h = ip_hdr(skb);
> > +       struct udphdr *udph;
> > +
> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> > +               || ip_is_fragment(ip4h))
> > +               return false;
> > +
> > +       /* In the case of RAW sockets the transport header is not set by
> > +        * the IP stack so we must set it ourselves.
> > +        */
> > +       if (skb->transport_header == skb->network_header)
> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
> > +       udph = udp_hdr(skb);
> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
> > +}
> > +
> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
> >  {
> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
> >         struct ethhdr *eth = eth_hdr(skb);
> >         struct ipvl_addr *addr;
> >         void *lyr3h;
> >         int addr_type;
> >
> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> > +           && is_dhcp(skb)) {
> > +               ipvlan->dhcp4_seen = true;
> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
> > +       }
> > +
> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >                 if (lyr3h) {
> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
> >                         if (addr)
> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> > index 4f4099d..a497fb9 100644
> > --- a/drivers/net/ipvlan/ipvlan_main.c
> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
> >
> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
> >
> > +       ipvlan->dhcp4_seen = false;
> > +
> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
> >  }
> >
> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >  {
> >         struct net_device *dev = ipvlan->dev;
> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> >                 netdev_for_each_mc_addr(ha, dev)
> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
> >
> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> > +
> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
> >                             IPVLAN_MAC_FILTER_SIZE);
> >         }
> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> >         INIT_LIST_HEAD(&ipvlan->addrs);
> >         ipvlan->ipv4cnt = 0;
> >         ipvlan->ipv6cnt = 0;
> > +       ipvlan->dhcp4_seen = false;
> >
> >         /* TODO Probably put random address here to be presented to the
> >          * world but keep using the physical-dev address for the outgoing
> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
> >         list_del_rcu(&addr->anode);
> >         ipvlan->ipv4cnt--;
> >         WARN_ON(ipvlan->ipv4cnt < 0);
> > -       if (!ipvlan->ipv4cnt)
> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >         kfree_rcu(addr, rcu);
> >
> >         return;
> > --
> > 2.1.0
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-01 20:55             ` Dan Williams
@ 2015-04-02  1:30               ` Mahesh Bandewar
  2015-04-02 14:40                 ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-04-02  1:30 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

On Wed, Apr 1, 2015 at 1:55 PM, Dan Williams <dcbw@redhat.com> wrote:
> On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
>> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
>> > The broadcast MAC is supposed to be allowed whenever the device
>> > has an IPv4 address, otherwise ARP requests get dropped on the
>> > floor.  If ndo_set_rx_mode (and thus
>> > ipvlan_set_multicast_mac_filter()) got called after the address
>> > was added, it would blow away the broadcast MAC address in
>> > mac_filters that was added at IPv4 address addition.
>> >
>> > Next, ipvlan currently fails DHCP addressing for two reasons:
>> >
>> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
>> which does not use broadcast.
>>
>> > 1) DHCP offers are typically unicast back to the device's MAC
>> > address, and at the IP layer have a destination IP address
>> > matching the new lease address.  In ipvlan unicast packets
>> > are checked against existing addresses assigned to the ipvlan
>> > interface, so clearly this fails hard because the ipvlan
>> > interface has no IP addresses yet.  Workaround: request
>> > that the server broadcast replies (-B for dhclient) which
>> > don't get checked against the address list.
>> >
>> > 2) Even when that's done, mac_filters only allows the
>> > broadcast MAC address when the interface has >= 1 IPv4
>> > addresses, so double-fail, and the incoming DHCP offer
>> > gets dropped on the floor again.
>> >
>> > To fix these issues, Watch for outgoing DHCP requests and
>> > enable the broadcast MAC address indefinitely when one is seen.
>> >
>> This approach will work but adds a performance drag for all UDP
>> traffic whether the link needs DHCPv4 or not (especially when it does
>> not need). Logic works well only when the link (always) needs DHCPv4.
>> When the link does not care about DHCPv4 the packet-inspection always
>> happens.
>
> If DHCPv4 is detected on a link, doesn't that mean the link needs the
> broadcast filter enabled?
Yes and it should stay enabled as it's going to have IPv4. Also once
broadcast bit is set there should not be any additional jugglery /
packet inspection which hurts performance.

> If you start DHCPv4 on the link you obviously
> expect it to work, unless there's no DHCPv4 server available.  The patch
> should also reset the broadcast filter on close, so network management
> could simply start DHCPv4, watch it fail, close/open and set static IP.
>
> Alternatively we could have the broadcast filter on a 2 minute timer
> that starts when DHCPv4 is seen.
>
Well, but this would be error-prone. I don't remember but DHCP timeout
is not just 2 minutes.

>> Do we really need to special case DHCPv4? Do you see problems in
>> inverting the current logic of adding broadcast bit (i.e. from
>> enable-if-IPv4 to disable-if-IPv6-only)?
>
> The problem with this is dual-stack configuration.  Typically both
> DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
> earlier.  So now the interface only has an IPv6 address but DHCPv4 is
> still ongoing; as I understand your proposal the filter would now block
> broadcast packets that DHCPv4 is listening for.
>
That was a proposal and we can tweak it if necessary. My primary
concern is that the solution should not make us pay per packet penalty
once (and even if not) the DHCPv4 handshake is complete, which is not
the case as it stands now.

> Dan
>
>> > Signed-off-by: Dan Williams <dcbw@redhat.com>
>> > ---
>> > NOTE: this patch supercedes my previous two ipvlan patches:
>> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
>> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
>> >
>> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
>> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
>> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
>> >  3 files changed, 45 insertions(+), 5 deletions(-)
>> >
>> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
>> > index 924ea98..7e0b8ff 100644
>> > --- a/drivers/net/ipvlan/ipvlan.h
>> > +++ b/drivers/net/ipvlan/ipvlan.h
>> > @@ -67,6 +67,7 @@ struct ipvl_dev {
>> >         struct list_head        addrs;
>> >         int                     ipv4cnt;
>> >         int                     ipv6cnt;
>> > +       bool                    dhcp4_seen;
>> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
>> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
>> >         netdev_features_t       sfeatures;
>> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
>> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
>> >                                         const void *iaddr, bool is_v6);
>> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
>> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
>> >  #endif /* __IPVLAN_H */
>> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
>> > index 2a17500..6dd992b 100644
>> > --- a/drivers/net/ipvlan/ipvlan_core.c
>> > +++ b/drivers/net/ipvlan/ipvlan_core.c
>> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
>> >                 lyr3h = arph;
>> >                 break;
>> >         }
>> > +       case htons(ETH_P_ALL): {
>> > +               /* Raw sockets */
>> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
>> > +                       break;
>> > +               /* Fall through */
>> > +       }
>> >         case htons(ETH_P_IP): {
>> >                 u32 pktlen;
>> >                 struct iphdr *ip4h;
>> > @@ -454,16 +460,42 @@ out:
>> >         return ipvlan_process_outbound(skb, ipvlan);
>> >  }
>> >
>> > +#define DHCP_PACKET_MIN_LEN 282
>> > +
>> > +static bool is_dhcp(struct sk_buff *skb)
>> > +{
>> > +       struct iphdr *ip4h = ip_hdr(skb);
>> > +       struct udphdr *udph;
>> > +
>> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
>> > +               || ip_is_fragment(ip4h))
>> > +               return false;
>> > +
>> > +       /* In the case of RAW sockets the transport header is not set by
>> > +        * the IP stack so we must set it ourselves.
>> > +        */
>> > +       if (skb->transport_header == skb->network_header)
>> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
>> > +       udph = udp_hdr(skb);
>> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
>> > +}
>> > +
>> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
>> >  {
>> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
>> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
>> >         struct ethhdr *eth = eth_hdr(skb);
>> >         struct ipvl_addr *addr;
>> >         void *lyr3h;
>> >         int addr_type;
>> >
>> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
>> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
>> > +           && is_dhcp(skb)) {
>> > +               ipvlan->dhcp4_seen = true;
>> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
>> > +       }
>> > +
>> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
>> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
>> >                 if (lyr3h) {
>> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
>> >                         if (addr)
>> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
>> > index 4f4099d..a497fb9 100644
>> > --- a/drivers/net/ipvlan/ipvlan_main.c
>> > +++ b/drivers/net/ipvlan/ipvlan_main.c
>> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
>> >
>> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
>> >
>> > +       ipvlan->dhcp4_seen = false;
>> > +
>> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
>> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
>> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
>> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
>> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
>> >  }
>> >
>> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>> >  {
>> >         struct net_device *dev = ipvlan->dev;
>> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
>> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>> >                 netdev_for_each_mc_addr(ha, dev)
>> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
>> >
>> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
>> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
>> > +
>> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
>> >                             IPVLAN_MAC_FILTER_SIZE);
>> >         }
>> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>> >         INIT_LIST_HEAD(&ipvlan->addrs);
>> >         ipvlan->ipv4cnt = 0;
>> >         ipvlan->ipv6cnt = 0;
>> > +       ipvlan->dhcp4_seen = false;
>> >
>> >         /* TODO Probably put random address here to be presented to the
>> >          * world but keep using the physical-dev address for the outgoing
>> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>> >         list_del_rcu(&addr->anode);
>> >         ipvlan->ipv4cnt--;
>> >         WARN_ON(ipvlan->ipv4cnt < 0);
>> > -       if (!ipvlan->ipv4cnt)
>> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
>> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
>> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
>> >         kfree_rcu(addr, rcu);
>> >
>> >         return;
>> > --
>> > 2.1.0
>> >
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-02  1:30               ` Mahesh Bandewar
@ 2015-04-02 14:40                 ` Dan Williams
  2015-04-03  1:39                   ` Mahesh Bandewar
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-04-02 14:40 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Wed, 2015-04-01 at 18:30 -0700, Mahesh Bandewar wrote:
> On Wed, Apr 1, 2015 at 1:55 PM, Dan Williams <dcbw@redhat.com> wrote:
> > On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
> >> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
> >> > The broadcast MAC is supposed to be allowed whenever the device
> >> > has an IPv4 address, otherwise ARP requests get dropped on the
> >> > floor.  If ndo_set_rx_mode (and thus
> >> > ipvlan_set_multicast_mac_filter()) got called after the address
> >> > was added, it would blow away the broadcast MAC address in
> >> > mac_filters that was added at IPv4 address addition.
> >> >
> >> > Next, ipvlan currently fails DHCP addressing for two reasons:
> >> >
> >> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
> >> which does not use broadcast.
> >>
> >> > 1) DHCP offers are typically unicast back to the device's MAC
> >> > address, and at the IP layer have a destination IP address
> >> > matching the new lease address.  In ipvlan unicast packets
> >> > are checked against existing addresses assigned to the ipvlan
> >> > interface, so clearly this fails hard because the ipvlan
> >> > interface has no IP addresses yet.  Workaround: request
> >> > that the server broadcast replies (-B for dhclient) which
> >> > don't get checked against the address list.
> >> >
> >> > 2) Even when that's done, mac_filters only allows the
> >> > broadcast MAC address when the interface has >= 1 IPv4
> >> > addresses, so double-fail, and the incoming DHCP offer
> >> > gets dropped on the floor again.
> >> >
> >> > To fix these issues, Watch for outgoing DHCP requests and
> >> > enable the broadcast MAC address indefinitely when one is seen.
> >> >
> >> This approach will work but adds a performance drag for all UDP
> >> traffic whether the link needs DHCPv4 or not (especially when it does
> >> not need). Logic works well only when the link (always) needs DHCPv4.
> >> When the link does not care about DHCPv4 the packet-inspection always
> >> happens.
> >
> > If DHCPv4 is detected on a link, doesn't that mean the link needs the
> > broadcast filter enabled?
> Yes and it should stay enabled as it's going to have IPv4. Also once
> broadcast bit is set there should not be any additional jugglery /
> packet inspection which hurts performance.

Yes, which is why my patch includes the check for dhcp4_seen and skips
inspecting the packet.  I could modify it to skip the L3 header check
too in that case.

> > If you start DHCPv4 on the link you obviously
> > expect it to work, unless there's no DHCPv4 server available.  The patch
> > should also reset the broadcast filter on close, so network management
> > could simply start DHCPv4, watch it fail, close/open and set static IP.
> >
> > Alternatively we could have the broadcast filter on a 2 minute timer
> > that starts when DHCPv4 is seen.
> >
> Well, but this would be error-prone. I don't remember but DHCP timeout
> is not just 2 minutes.

2m is IIRC an RFC recommended timeout for initial DHCPv4 requests.  If
you don't get a reply to your requests within 2m, you aren't likely to
get one ever on this network and you can stop trying for a while.  You
either get a reply in which case you get an IPv4 address and broadcast
is always enabled, or you don't, in which case you don't really need
broadcast.  Then when userspace next tries DHCP, the timer would allow
broadcast for another 2 minutes.

> >> Do we really need to special case DHCPv4? Do you see problems in
> >> inverting the current logic of adding broadcast bit (i.e. from
> >> enable-if-IPv4 to disable-if-IPv6-only)?
> >
> > The problem with this is dual-stack configuration.  Typically both
> > DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
> > earlier.  So now the interface only has an IPv6 address but DHCPv4 is
> > still ongoing; as I understand your proposal the filter would now block
> > broadcast packets that DHCPv4 is listening for.
> >
> That was a proposal and we can tweak it if necessary. My primary
> concern is that the solution should not make us pay per packet penalty
> once (and even if not) the DHCPv4 handshake is complete, which is not
> the case as it stands now.

I'm not sure what you mean by "not make us pay a per packet penalty once
DHCPv4 is complete".  Once an IPv4 address is assigned, the broadcast
filter must allow broadcast packets.  Before an IPv4 address is
assigned, the driver must watch for DHCPv4 and allow broadcast packets
so that the DHCP reply isn't missed.

If DHCPv4 gets no response, then there are two options:

a) down the interface (which resets the broadcast filter to block
broadcast packets) and re-up it, and don't do DHCPv4 again

b) have a timer in the driver that waits a period of time for an IPv4
address after a DHCPv4 request, and blocks broadcast when the timer
expires

Dan

> > Dan
> >
> >> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> >> > ---
> >> > NOTE: this patch supercedes my previous two ipvlan patches:
> >> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
> >> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
> >> >
> >> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
> >> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
> >> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
> >> >  3 files changed, 45 insertions(+), 5 deletions(-)
> >> >
> >> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> >> > index 924ea98..7e0b8ff 100644
> >> > --- a/drivers/net/ipvlan/ipvlan.h
> >> > +++ b/drivers/net/ipvlan/ipvlan.h
> >> > @@ -67,6 +67,7 @@ struct ipvl_dev {
> >> >         struct list_head        addrs;
> >> >         int                     ipv4cnt;
> >> >         int                     ipv6cnt;
> >> > +       bool                    dhcp4_seen;
> >> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
> >> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
> >> >         netdev_features_t       sfeatures;
> >> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
> >> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
> >> >                                         const void *iaddr, bool is_v6);
> >> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
> >> >  #endif /* __IPVLAN_H */
> >> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> >> > index 2a17500..6dd992b 100644
> >> > --- a/drivers/net/ipvlan/ipvlan_core.c
> >> > +++ b/drivers/net/ipvlan/ipvlan_core.c
> >> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> >> >                 lyr3h = arph;
> >> >                 break;
> >> >         }
> >> > +       case htons(ETH_P_ALL): {
> >> > +               /* Raw sockets */
> >> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> >> > +                       break;
> >> > +               /* Fall through */
> >> > +       }
> >> >         case htons(ETH_P_IP): {
> >> >                 u32 pktlen;
> >> >                 struct iphdr *ip4h;
> >> > @@ -454,16 +460,42 @@ out:
> >> >         return ipvlan_process_outbound(skb, ipvlan);
> >> >  }
> >> >
> >> > +#define DHCP_PACKET_MIN_LEN 282
> >> > +
> >> > +static bool is_dhcp(struct sk_buff *skb)
> >> > +{
> >> > +       struct iphdr *ip4h = ip_hdr(skb);
> >> > +       struct udphdr *udph;
> >> > +
> >> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> >> > +               || ip_is_fragment(ip4h))
> >> > +               return false;
> >> > +
> >> > +       /* In the case of RAW sockets the transport header is not set by
> >> > +        * the IP stack so we must set it ourselves.
> >> > +        */
> >> > +       if (skb->transport_header == skb->network_header)
> >> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
> >> > +       udph = udp_hdr(skb);
> >> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
> >> > +}
> >> > +
> >> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
> >> >  {
> >> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> >         struct ethhdr *eth = eth_hdr(skb);
> >> >         struct ipvl_addr *addr;
> >> >         void *lyr3h;
> >> >         int addr_type;
> >> >
> >> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> >> > +           && is_dhcp(skb)) {
> >> > +               ipvlan->dhcp4_seen = true;
> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
> >> > +       }
> >> > +
> >> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
> >> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >> >                 if (lyr3h) {
> >> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
> >> >                         if (addr)
> >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> >> > index 4f4099d..a497fb9 100644
> >> > --- a/drivers/net/ipvlan/ipvlan_main.c
> >> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> >> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
> >> >
> >> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
> >> >
> >> > +       ipvlan->dhcp4_seen = false;
> >> > +
> >> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
> >> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
> >> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
> >> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
> >> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
> >> >  }
> >> >
> >> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >> >  {
> >> >         struct net_device *dev = ipvlan->dev;
> >> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> >> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> >> >                 netdev_for_each_mc_addr(ha, dev)
> >> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
> >> >
> >> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
> >> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> >> > +
> >> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
> >> >                             IPVLAN_MAC_FILTER_SIZE);
> >> >         }
> >> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> >> >         INIT_LIST_HEAD(&ipvlan->addrs);
> >> >         ipvlan->ipv4cnt = 0;
> >> >         ipvlan->ipv6cnt = 0;
> >> > +       ipvlan->dhcp4_seen = false;
> >> >
> >> >         /* TODO Probably put random address here to be presented to the
> >> >          * world but keep using the physical-dev address for the outgoing
> >> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
> >> >         list_del_rcu(&addr->anode);
> >> >         ipvlan->ipv4cnt--;
> >> >         WARN_ON(ipvlan->ipv4cnt < 0);
> >> > -       if (!ipvlan->ipv4cnt)
> >> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >> >         kfree_rcu(addr, rcu);
> >> >
> >> >         return;
> >> > --
> >> > 2.1.0
> >> >
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-03-31  4:22         ` [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP Dan Williams
  2015-04-01 20:07           ` Dan Williams
  2015-04-01 20:24           ` Mahesh Bandewar
@ 2015-04-02 18:16           ` David Miller
  2015-04-02 18:39             ` Dan Williams
  2 siblings, 1 reply; 37+ messages in thread
From: David Miller @ 2015-04-02 18:16 UTC (permalink / raw)
  To: dcbw; +Cc: maheshb, netdev, jbenc

From: Dan Williams <dcbw@redhat.com>
Date: Mon, 30 Mar 2015 23:22:50 -0500

> +	case htons(ETH_P_ALL): {
> +		/* Raw sockets */
> +		if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> +			break;
> +		/* Fall through */
> +	}

You are not declaring any local variables in this switch block, therefore
there is no reason to surround it with curly braces, remove them.

> +	if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> +		|| ip_is_fragment(ip4h))
> +		return false;

This is not formatted properly.

Lines of conditional statements _END_ with operators, they should never
begin with one.

Also, the second and subsequent lines of a multi-line if() statement should
begin precisely at the first column after the openning parenthesis of the
first line.  You must use the appropriate number of TAB then SPACE characters
to achieve this.

If you are purely using TAB characters to indent these things, it is almost
certain that you are doing it wrong.

And yes, I am super sick of typing this in several times a day, especially
when this is not only documented, but checked for by checkpatch and other
automated tools.

> +	lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> +	if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> +	    && is_dhcp(skb)) {

Likewise.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-02 18:16           ` David Miller
@ 2015-04-02 18:39             ` Dan Williams
  2015-04-02 18:46               ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-04-02 18:39 UTC (permalink / raw)
  To: David Miller; +Cc: maheshb, netdev, jbenc

On Thu, 2015-04-02 at 14:16 -0400, David Miller wrote:
> From: Dan Williams <dcbw@redhat.com>
> Date: Mon, 30 Mar 2015 23:22:50 -0500
> 
> > +	case htons(ETH_P_ALL): {
> > +		/* Raw sockets */
> > +		if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> > +			break;
> > +		/* Fall through */
> > +	}
> 
> You are not declaring any local variables in this switch block, therefore
> there is no reason to surround it with curly braces, remove them.
> 
> > +	if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> > +		|| ip_is_fragment(ip4h))
> > +		return false;
> 
> This is not formatted properly.
> 
> Lines of conditional statements _END_ with operators, they should never
> begin with one.
> 
> Also, the second and subsequent lines of a multi-line if() statement should
> begin precisely at the first column after the openning parenthesis of the
> first line.  You must use the appropriate number of TAB then SPACE characters
> to achieve this.
> 
> If you are purely using TAB characters to indent these things, it is almost
> certain that you are doing it wrong.
> 
> And yes, I am super sick of typing this in several times a day, especially
> when this is not only documented, but checked for by checkpatch and other
> automated tools.
> 
> > +	lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> > +	if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> > +	    && is_dhcp(skb)) {
> 
> Likewise.

I'm sorry about that, and I will fix style/indent and re-post once
Mahesh and I have agreed on the exact behavior here.

Dan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-02 18:39             ` Dan Williams
@ 2015-04-02 18:46               ` Dan Williams
  0 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2015-04-02 18:46 UTC (permalink / raw)
  To: David Miller; +Cc: maheshb, netdev, jbenc

On Thu, 2015-04-02 at 13:39 -0500, Dan Williams wrote:
> On Thu, 2015-04-02 at 14:16 -0400, David Miller wrote:
> > From: Dan Williams <dcbw@redhat.com>
> > Date: Mon, 30 Mar 2015 23:22:50 -0500
> > 
> > > +	case htons(ETH_P_ALL): {
> > > +		/* Raw sockets */
> > > +		if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> > > +			break;
> > > +		/* Fall through */
> > > +	}
> > 
> > You are not declaring any local variables in this switch block, therefore
> > there is no reason to surround it with curly braces, remove them.
> > 
> > > +	if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> > > +		|| ip_is_fragment(ip4h))
> > > +		return false;
> > 
> > This is not formatted properly.
> > 
> > Lines of conditional statements _END_ with operators, they should never
> > begin with one.
> > 
> > Also, the second and subsequent lines of a multi-line if() statement should
> > begin precisely at the first column after the openning parenthesis of the
> > first line.  You must use the appropriate number of TAB then SPACE characters
> > to achieve this.
> > 
> > If you are purely using TAB characters to indent these things, it is almost
> > certain that you are doing it wrong.
> > 
> > And yes, I am super sick of typing this in several times a day, especially
> > when this is not only documented, but checked for by checkpatch and other
> > automated tools.
> > 
> > > +	lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> > > +	if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> > > +	    && is_dhcp(skb)) {
> > 
> > Likewise.
> 
> I'm sorry about that, and I will fix style/indent and re-post once
> Mahesh and I have agreed on the exact behavior here.

It also conflicts with Jiri Benc's patches that you applied, so I need
to fix that up too.

Dan

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-02 14:40                 ` Dan Williams
@ 2015-04-03  1:39                   ` Mahesh Bandewar
  2015-04-06 17:17                     ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-04-03  1:39 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

On Thu, Apr 2, 2015 at 7:40 AM, Dan Williams <dcbw@redhat.com> wrote:
> On Wed, 2015-04-01 at 18:30 -0700, Mahesh Bandewar wrote:
>> On Wed, Apr 1, 2015 at 1:55 PM, Dan Williams <dcbw@redhat.com> wrote:
>> > On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
>> >> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
>> >> > The broadcast MAC is supposed to be allowed whenever the device
>> >> > has an IPv4 address, otherwise ARP requests get dropped on the
>> >> > floor.  If ndo_set_rx_mode (and thus
>> >> > ipvlan_set_multicast_mac_filter()) got called after the address
>> >> > was added, it would blow away the broadcast MAC address in
>> >> > mac_filters that was added at IPv4 address addition.
>> >> >
>> >> > Next, ipvlan currently fails DHCP addressing for two reasons:
>> >> >
>> >> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
>> >> which does not use broadcast.
>> >>
>> >> > 1) DHCP offers are typically unicast back to the device's MAC
>> >> > address, and at the IP layer have a destination IP address
>> >> > matching the new lease address.  In ipvlan unicast packets
>> >> > are checked against existing addresses assigned to the ipvlan
>> >> > interface, so clearly this fails hard because the ipvlan
>> >> > interface has no IP addresses yet.  Workaround: request
>> >> > that the server broadcast replies (-B for dhclient) which
>> >> > don't get checked against the address list.
>> >> >
>> >> > 2) Even when that's done, mac_filters only allows the
>> >> > broadcast MAC address when the interface has >= 1 IPv4
>> >> > addresses, so double-fail, and the incoming DHCP offer
>> >> > gets dropped on the floor again.
>> >> >
>> >> > To fix these issues, Watch for outgoing DHCP requests and
>> >> > enable the broadcast MAC address indefinitely when one is seen.
>> >> >
>> >> This approach will work but adds a performance drag for all UDP
>> >> traffic whether the link needs DHCPv4 or not (especially when it does
>> >> not need). Logic works well only when the link (always) needs DHCPv4.
>> >> When the link does not care about DHCPv4 the packet-inspection always
>> >> happens.
>> >
>> > If DHCPv4 is detected on a link, doesn't that mean the link needs the
>> > broadcast filter enabled?
>> Yes and it should stay enabled as it's going to have IPv4. Also once
>> broadcast bit is set there should not be any additional jugglery /
>> packet inspection which hurts performance.
>
> Yes, which is why my patch includes the check for dhcp4_seen and skips
> inspecting the packet.  I could modify it to skip the L3 header check
> too in that case.
>
dhcp4_seen works correctly when there is dhcpv4 happening, but
snooping continues if autoconf is not used.

>> > If you start DHCPv4 on the link you obviously
>> > expect it to work, unless there's no DHCPv4 server available.  The patch
>> > should also reset the broadcast filter on close, so network management
>> > could simply start DHCPv4, watch it fail, close/open and set static IP.
>> >
>> > Alternatively we could have the broadcast filter on a 2 minute timer
>> > that starts when DHCPv4 is seen.
>> >
>> Well, but this would be error-prone. I don't remember but DHCP timeout
>> is not just 2 minutes.
>
> 2m is IIRC an RFC recommended timeout for initial DHCPv4 requests.  If
> you don't get a reply to your requests within 2m, you aren't likely to
> get one ever on this network and you can stop trying for a while.  You
> either get a reply in which case you get an IPv4 address and broadcast
> is always enabled, or you don't, in which case you don't really need
> broadcast.  Then when userspace next tries DHCP, the timer would allow
> broadcast for another 2 minutes.
>
>> >> Do we really need to special case DHCPv4? Do you see problems in
>> >> inverting the current logic of adding broadcast bit (i.e. from
>> >> enable-if-IPv4 to disable-if-IPv6-only)?
>> >
>> > The problem with this is dual-stack configuration.  Typically both
>> > DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
>> > earlier.  So now the interface only has an IPv6 address but DHCPv4 is
>> > still ongoing; as I understand your proposal the filter would now block
>> > broadcast packets that DHCPv4 is listening for.
>> >
>> That was a proposal and we can tweak it if necessary. My primary
>> concern is that the solution should not make us pay per packet penalty
>> once (and even if not) the DHCPv4 handshake is complete, which is not
>> the case as it stands now.
>
> I'm not sure what you mean by "not make us pay a per packet penalty once
> DHCPv4 is complete".  Once an IPv4 address is assigned, the broadcast
> filter must allow broadcast packets.  Before an IPv4 address is
> assigned, the driver must watch for DHCPv4 and allow broadcast packets
> so that the DHCP reply isn't missed.
>
If the link is not using DHCPv4 to configure, the snooping continues
and that is per packet cost the needs to be paid. Probably this can be
avoided by setting dhcp4_seen = true in add_addr4 handler.

> If DHCPv4 gets no response, then there are two options:
>
> a) down the interface (which resets the broadcast filter to block
> broadcast packets) and re-up it, and don't do DHCPv4 again
>
> b) have a timer in the driver that waits a period of time for an IPv4
> address after a DHCPv4 request, and blocks broadcast when the timer
> expires
>
Probably something similar where turn on the broadcast bit and wait
for the interface to get configured (2 min timer) and at the expiry
decide what is to be done about braodcast bit. If the addresses
configured are all IPv6, then eliminate it and if any of them are IPv4
then don't change it. In this case no special casing nor snooping is
required and this should work for dual-stack scenario as well.

> Dan
>
>> > Dan
>> >
>> >> > Signed-off-by: Dan Williams <dcbw@redhat.com>
>> >> > ---
>> >> > NOTE: this patch supercedes my previous two ipvlan patches:
>> >> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
>> >> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
>> >> >
>> >> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
>> >> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
>> >> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
>> >> >  3 files changed, 45 insertions(+), 5 deletions(-)
>> >> >
>> >> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
>> >> > index 924ea98..7e0b8ff 100644
>> >> > --- a/drivers/net/ipvlan/ipvlan.h
>> >> > +++ b/drivers/net/ipvlan/ipvlan.h
>> >> > @@ -67,6 +67,7 @@ struct ipvl_dev {
>> >> >         struct list_head        addrs;
>> >> >         int                     ipv4cnt;
>> >> >         int                     ipv6cnt;
>> >> > +       bool                    dhcp4_seen;
>> >> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
>> >> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
>> >> >         netdev_features_t       sfeatures;
>> >> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
>> >> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
>> >> >                                         const void *iaddr, bool is_v6);
>> >> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
>> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
>> >> >  #endif /* __IPVLAN_H */
>> >> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
>> >> > index 2a17500..6dd992b 100644
>> >> > --- a/drivers/net/ipvlan/ipvlan_core.c
>> >> > +++ b/drivers/net/ipvlan/ipvlan_core.c
>> >> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
>> >> >                 lyr3h = arph;
>> >> >                 break;
>> >> >         }
>> >> > +       case htons(ETH_P_ALL): {
>> >> > +               /* Raw sockets */
>> >> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
>> >> > +                       break;
>> >> > +               /* Fall through */
>> >> > +       }
>> >> >         case htons(ETH_P_IP): {
>> >> >                 u32 pktlen;
>> >> >                 struct iphdr *ip4h;
>> >> > @@ -454,16 +460,42 @@ out:
>> >> >         return ipvlan_process_outbound(skb, ipvlan);
>> >> >  }
>> >> >
>> >> > +#define DHCP_PACKET_MIN_LEN 282
>> >> > +
>> >> > +static bool is_dhcp(struct sk_buff *skb)
>> >> > +{
>> >> > +       struct iphdr *ip4h = ip_hdr(skb);
>> >> > +       struct udphdr *udph;
>> >> > +
>> >> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
>> >> > +               || ip_is_fragment(ip4h))
>> >> > +               return false;
>> >> > +
>> >> > +       /* In the case of RAW sockets the transport header is not set by
>> >> > +        * the IP stack so we must set it ourselves.
>> >> > +        */
>> >> > +       if (skb->transport_header == skb->network_header)
>> >> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
>> >> > +       udph = udp_hdr(skb);
>> >> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
>> >> > +}
>> >> > +
>> >> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
>> >> >  {
>> >> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
>> >> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
>> >> >         struct ethhdr *eth = eth_hdr(skb);
>> >> >         struct ipvl_addr *addr;
>> >> >         void *lyr3h;
>> >> >         int addr_type;
>> >> >
>> >> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
>> >> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
>> >> > +           && is_dhcp(skb)) {
>> >> > +               ipvlan->dhcp4_seen = true;
>> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
>> >> > +       }
>> >> > +
>> >> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
>> >> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
>> >> >                 if (lyr3h) {
>> >> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
>> >> >                         if (addr)
>> >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
>> >> > index 4f4099d..a497fb9 100644
>> >> > --- a/drivers/net/ipvlan/ipvlan_main.c
>> >> > +++ b/drivers/net/ipvlan/ipvlan_main.c
>> >> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
>> >> >
>> >> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
>> >> >
>> >> > +       ipvlan->dhcp4_seen = false;
>> >> > +
>> >> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
>> >> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
>> >> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
>> >> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
>> >> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
>> >> >  }
>> >> >
>> >> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>> >> >  {
>> >> >         struct net_device *dev = ipvlan->dev;
>> >> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
>> >> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>> >> >                 netdev_for_each_mc_addr(ha, dev)
>> >> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
>> >> >
>> >> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
>> >> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
>> >> > +
>> >> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
>> >> >                             IPVLAN_MAC_FILTER_SIZE);
>> >> >         }
>> >> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>> >> >         INIT_LIST_HEAD(&ipvlan->addrs);
>> >> >         ipvlan->ipv4cnt = 0;
>> >> >         ipvlan->ipv6cnt = 0;
>> >> > +       ipvlan->dhcp4_seen = false;
>> >> >
>> >> >         /* TODO Probably put random address here to be presented to the
>> >> >          * world but keep using the physical-dev address for the outgoing
>> >> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>> >> >         list_del_rcu(&addr->anode);
>> >> >         ipvlan->ipv4cnt--;
>> >> >         WARN_ON(ipvlan->ipv4cnt < 0);
>> >> > -       if (!ipvlan->ipv4cnt)
>> >> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
>> >> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
>> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
>> >> >         kfree_rcu(addr, rcu);
>> >> >
>> >> >         return;
>> >> > --
>> >> > 2.1.0
>> >> >
>> >> >
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-03  1:39                   ` Mahesh Bandewar
@ 2015-04-06 17:17                     ` Dan Williams
  2015-04-07 18:32                       ` Mahesh Bandewar
  2015-04-08  9:37                       ` David Laight
  0 siblings, 2 replies; 37+ messages in thread
From: Dan Williams @ 2015-04-06 17:17 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Thu, 2015-04-02 at 18:39 -0700, Mahesh Bandewar wrote:
> On Thu, Apr 2, 2015 at 7:40 AM, Dan Williams <dcbw@redhat.com> wrote:
> > On Wed, 2015-04-01 at 18:30 -0700, Mahesh Bandewar wrote:
> >> On Wed, Apr 1, 2015 at 1:55 PM, Dan Williams <dcbw@redhat.com> wrote:
> >> > On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
> >> >> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
> >> >> > The broadcast MAC is supposed to be allowed whenever the device
> >> >> > has an IPv4 address, otherwise ARP requests get dropped on the
> >> >> > floor.  If ndo_set_rx_mode (and thus
> >> >> > ipvlan_set_multicast_mac_filter()) got called after the address
> >> >> > was added, it would blow away the broadcast MAC address in
> >> >> > mac_filters that was added at IPv4 address addition.
> >> >> >
> >> >> > Next, ipvlan currently fails DHCP addressing for two reasons:
> >> >> >
> >> >> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
> >> >> which does not use broadcast.
> >> >>
> >> >> > 1) DHCP offers are typically unicast back to the device's MAC
> >> >> > address, and at the IP layer have a destination IP address
> >> >> > matching the new lease address.  In ipvlan unicast packets
> >> >> > are checked against existing addresses assigned to the ipvlan
> >> >> > interface, so clearly this fails hard because the ipvlan
> >> >> > interface has no IP addresses yet.  Workaround: request
> >> >> > that the server broadcast replies (-B for dhclient) which
> >> >> > don't get checked against the address list.
> >> >> >
> >> >> > 2) Even when that's done, mac_filters only allows the
> >> >> > broadcast MAC address when the interface has >= 1 IPv4
> >> >> > addresses, so double-fail, and the incoming DHCP offer
> >> >> > gets dropped on the floor again.
> >> >> >
> >> >> > To fix these issues, Watch for outgoing DHCP requests and
> >> >> > enable the broadcast MAC address indefinitely when one is seen.
> >> >> >
> >> >> This approach will work but adds a performance drag for all UDP
> >> >> traffic whether the link needs DHCPv4 or not (especially when it does
> >> >> not need). Logic works well only when the link (always) needs DHCPv4.
> >> >> When the link does not care about DHCPv4 the packet-inspection always
> >> >> happens.
> >> >
> >> > If DHCPv4 is detected on a link, doesn't that mean the link needs the
> >> > broadcast filter enabled?
> >> Yes and it should stay enabled as it's going to have IPv4. Also once
> >> broadcast bit is set there should not be any additional jugglery /
> >> packet inspection which hurts performance.
> >
> > Yes, which is why my patch includes the check for dhcp4_seen and skips
> > inspecting the packet.  I could modify it to skip the L3 header check
> > too in that case.
> >
> dhcp4_seen works correctly when there is dhcpv4 happening, but
> snooping continues if autoconf is not used.
> 
> >> > If you start DHCPv4 on the link you obviously
> >> > expect it to work, unless there's no DHCPv4 server available.  The patch
> >> > should also reset the broadcast filter on close, so network management
> >> > could simply start DHCPv4, watch it fail, close/open and set static IP.
> >> >
> >> > Alternatively we could have the broadcast filter on a 2 minute timer
> >> > that starts when DHCPv4 is seen.
> >> >
> >> Well, but this would be error-prone. I don't remember but DHCP timeout
> >> is not just 2 minutes.
> >
> > 2m is IIRC an RFC recommended timeout for initial DHCPv4 requests.  If
> > you don't get a reply to your requests within 2m, you aren't likely to
> > get one ever on this network and you can stop trying for a while.  You
> > either get a reply in which case you get an IPv4 address and broadcast
> > is always enabled, or you don't, in which case you don't really need
> > broadcast.  Then when userspace next tries DHCP, the timer would allow
> > broadcast for another 2 minutes.
> >
> >> >> Do we really need to special case DHCPv4? Do you see problems in
> >> >> inverting the current logic of adding broadcast bit (i.e. from
> >> >> enable-if-IPv4 to disable-if-IPv6-only)?
> >> >
> >> > The problem with this is dual-stack configuration.  Typically both
> >> > DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
> >> > earlier.  So now the interface only has an IPv6 address but DHCPv4 is
> >> > still ongoing; as I understand your proposal the filter would now block
> >> > broadcast packets that DHCPv4 is listening for.
> >> >
> >> That was a proposal and we can tweak it if necessary. My primary
> >> concern is that the solution should not make us pay per packet penalty
> >> once (and even if not) the DHCPv4 handshake is complete, which is not
> >> the case as it stands now.
> >
> > I'm not sure what you mean by "not make us pay a per packet penalty once
> > DHCPv4 is complete".  Once an IPv4 address is assigned, the broadcast
> > filter must allow broadcast packets.  Before an IPv4 address is
> > assigned, the driver must watch for DHCPv4 and allow broadcast packets
> > so that the DHCP reply isn't missed.
> >
> If the link is not using DHCPv4 to configure, the snooping continues
> and that is per packet cost the needs to be paid. Probably this can be
> avoided by setting dhcp4_seen = true in add_addr4 handler.
> 
> > If DHCPv4 gets no response, then there are two options:
> >
> > a) down the interface (which resets the broadcast filter to block
> > broadcast packets) and re-up it, and don't do DHCPv4 again
> >
> > b) have a timer in the driver that waits a period of time for an IPv4
> > address after a DHCPv4 request, and blocks broadcast when the timer
> > expires
> >
> Probably something similar where turn on the broadcast bit and wait
> for the interface to get configured (2 min timer) and at the expiry
> decide what is to be done about braodcast bit. If the addresses
> configured are all IPv6, then eliminate it and if any of them are IPv4
> then don't change it. In this case no special casing nor snooping is
> required and this should work for dual-stack scenario as well.

This does not work for the case where, after configuring, the DHCPv4
address lease expires and the IPv4 address is removed, and then DHCPv4
is started again.  Possibly because the DHCP server was gone for a short
time.  The only way to handle that is to snoop again.

Reaching for maximum performance is great, but if that is done by
ignoring/breaking a whole class of normal use-cases, I don't think
that's reasonable.  It's like saying "gee, I'd love UDP to be faster, so
I'll remove anything TCP-related in the kernel".

Also, I don't think the snooping is as bad for performance as you may
think.  The only relevant issue here is L2 + IPv6-only, and in that case
it's 4 extra compares (dhcp4_seen, ipv4cnt, lyr3h, and addr_type) for
the external case.  How much is that really going to slow things down,
versus breaking a huge part of IPv4 address configuration?

Dan

> > Dan
> >
> >> > Dan
> >> >
> >> >> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> >> >> > ---
> >> >> > NOTE: this patch supercedes my previous two ipvlan patches:
> >> >> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
> >> >> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
> >> >> >
> >> >> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
> >> >> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
> >> >> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
> >> >> >  3 files changed, 45 insertions(+), 5 deletions(-)
> >> >> >
> >> >> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> >> >> > index 924ea98..7e0b8ff 100644
> >> >> > --- a/drivers/net/ipvlan/ipvlan.h
> >> >> > +++ b/drivers/net/ipvlan/ipvlan.h
> >> >> > @@ -67,6 +67,7 @@ struct ipvl_dev {
> >> >> >         struct list_head        addrs;
> >> >> >         int                     ipv4cnt;
> >> >> >         int                     ipv6cnt;
> >> >> > +       bool                    dhcp4_seen;
> >> >> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
> >> >> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
> >> >> >         netdev_features_t       sfeatures;
> >> >> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
> >> >> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
> >> >> >                                         const void *iaddr, bool is_v6);
> >> >> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
> >> >> >  #endif /* __IPVLAN_H */
> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> >> >> > index 2a17500..6dd992b 100644
> >> >> > --- a/drivers/net/ipvlan/ipvlan_core.c
> >> >> > +++ b/drivers/net/ipvlan/ipvlan_core.c
> >> >> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> >> >> >                 lyr3h = arph;
> >> >> >                 break;
> >> >> >         }
> >> >> > +       case htons(ETH_P_ALL): {
> >> >> > +               /* Raw sockets */
> >> >> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> >> >> > +                       break;
> >> >> > +               /* Fall through */
> >> >> > +       }
> >> >> >         case htons(ETH_P_IP): {
> >> >> >                 u32 pktlen;
> >> >> >                 struct iphdr *ip4h;
> >> >> > @@ -454,16 +460,42 @@ out:
> >> >> >         return ipvlan_process_outbound(skb, ipvlan);
> >> >> >  }
> >> >> >
> >> >> > +#define DHCP_PACKET_MIN_LEN 282
> >> >> > +
> >> >> > +static bool is_dhcp(struct sk_buff *skb)
> >> >> > +{
> >> >> > +       struct iphdr *ip4h = ip_hdr(skb);
> >> >> > +       struct udphdr *udph;
> >> >> > +
> >> >> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> >> >> > +               || ip_is_fragment(ip4h))
> >> >> > +               return false;
> >> >> > +
> >> >> > +       /* In the case of RAW sockets the transport header is not set by
> >> >> > +        * the IP stack so we must set it ourselves.
> >> >> > +        */
> >> >> > +       if (skb->transport_header == skb->network_header)
> >> >> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
> >> >> > +       udph = udp_hdr(skb);
> >> >> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
> >> >> > +}
> >> >> > +
> >> >> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
> >> >> >  {
> >> >> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> >> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> >> >         struct ethhdr *eth = eth_hdr(skb);
> >> >> >         struct ipvl_addr *addr;
> >> >> >         void *lyr3h;
> >> >> >         int addr_type;
> >> >> >
> >> >> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >> >> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> >> >> > +           && is_dhcp(skb)) {
> >> >> > +               ipvlan->dhcp4_seen = true;
> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
> >> >> > +       }
> >> >> > +
> >> >> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
> >> >> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >> >> >                 if (lyr3h) {
> >> >> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
> >> >> >                         if (addr)
> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> >> >> > index 4f4099d..a497fb9 100644
> >> >> > --- a/drivers/net/ipvlan/ipvlan_main.c
> >> >> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> >> >> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
> >> >> >
> >> >> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
> >> >> >
> >> >> > +       ipvlan->dhcp4_seen = false;
> >> >> > +
> >> >> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
> >> >> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
> >> >> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
> >> >> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
> >> >> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
> >> >> >  }
> >> >> >
> >> >> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >> >> >  {
> >> >> >         struct net_device *dev = ipvlan->dev;
> >> >> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> >> >> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> >> >> >                 netdev_for_each_mc_addr(ha, dev)
> >> >> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
> >> >> >
> >> >> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
> >> >> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> >> >> > +
> >> >> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
> >> >> >                             IPVLAN_MAC_FILTER_SIZE);
> >> >> >         }
> >> >> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> >> >> >         INIT_LIST_HEAD(&ipvlan->addrs);
> >> >> >         ipvlan->ipv4cnt = 0;
> >> >> >         ipvlan->ipv6cnt = 0;
> >> >> > +       ipvlan->dhcp4_seen = false;
> >> >> >
> >> >> >         /* TODO Probably put random address here to be presented to the
> >> >> >          * world but keep using the physical-dev address for the outgoing
> >> >> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
> >> >> >         list_del_rcu(&addr->anode);
> >> >> >         ipvlan->ipv4cnt--;
> >> >> >         WARN_ON(ipvlan->ipv4cnt < 0);
> >> >> > -       if (!ipvlan->ipv4cnt)
> >> >> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >> >> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >> >> >         kfree_rcu(addr, rcu);
> >> >> >
> >> >> >         return;
> >> >> > --
> >> >> > 2.1.0
> >> >> >
> >> >> >
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-06 17:17                     ` Dan Williams
@ 2015-04-07 18:32                       ` Mahesh Bandewar
  2015-04-07 19:45                         ` Dan Williams
  2015-04-08  9:37                       ` David Laight
  1 sibling, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-04-07 18:32 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

On Mon, Apr 6, 2015 at 10:17 AM, Dan Williams <dcbw@redhat.com> wrote:
> On Thu, 2015-04-02 at 18:39 -0700, Mahesh Bandewar wrote:
>> On Thu, Apr 2, 2015 at 7:40 AM, Dan Williams <dcbw@redhat.com> wrote:
>> > On Wed, 2015-04-01 at 18:30 -0700, Mahesh Bandewar wrote:
>> >> On Wed, Apr 1, 2015 at 1:55 PM, Dan Williams <dcbw@redhat.com> wrote:
>> >> > On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
>> >> >> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
>> >> >> > The broadcast MAC is supposed to be allowed whenever the device
>> >> >> > has an IPv4 address, otherwise ARP requests get dropped on the
>> >> >> > floor.  If ndo_set_rx_mode (and thus
>> >> >> > ipvlan_set_multicast_mac_filter()) got called after the address
>> >> >> > was added, it would blow away the broadcast MAC address in
>> >> >> > mac_filters that was added at IPv4 address addition.
>> >> >> >
>> >> >> > Next, ipvlan currently fails DHCP addressing for two reasons:
>> >> >> >
>> >> >> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
>> >> >> which does not use broadcast.
>> >> >>
>> >> >> > 1) DHCP offers are typically unicast back to the device's MAC
>> >> >> > address, and at the IP layer have a destination IP address
>> >> >> > matching the new lease address.  In ipvlan unicast packets
>> >> >> > are checked against existing addresses assigned to the ipvlan
>> >> >> > interface, so clearly this fails hard because the ipvlan
>> >> >> > interface has no IP addresses yet.  Workaround: request
>> >> >> > that the server broadcast replies (-B for dhclient) which
>> >> >> > don't get checked against the address list.
>> >> >> >
>> >> >> > 2) Even when that's done, mac_filters only allows the
>> >> >> > broadcast MAC address when the interface has >= 1 IPv4
>> >> >> > addresses, so double-fail, and the incoming DHCP offer
>> >> >> > gets dropped on the floor again.
>> >> >> >
>> >> >> > To fix these issues, Watch for outgoing DHCP requests and
>> >> >> > enable the broadcast MAC address indefinitely when one is seen.
>> >> >> >
>> >> >> This approach will work but adds a performance drag for all UDP
>> >> >> traffic whether the link needs DHCPv4 or not (especially when it does
>> >> >> not need). Logic works well only when the link (always) needs DHCPv4.
>> >> >> When the link does not care about DHCPv4 the packet-inspection always
>> >> >> happens.
>> >> >
>> >> > If DHCPv4 is detected on a link, doesn't that mean the link needs the
>> >> > broadcast filter enabled?
>> >> Yes and it should stay enabled as it's going to have IPv4. Also once
>> >> broadcast bit is set there should not be any additional jugglery /
>> >> packet inspection which hurts performance.
>> >
>> > Yes, which is why my patch includes the check for dhcp4_seen and skips
>> > inspecting the packet.  I could modify it to skip the L3 header check
>> > too in that case.
>> >
>> dhcp4_seen works correctly when there is dhcpv4 happening, but
>> snooping continues if autoconf is not used.
>>
>> >> > If you start DHCPv4 on the link you obviously
>> >> > expect it to work, unless there's no DHCPv4 server available.  The patch
>> >> > should also reset the broadcast filter on close, so network management
>> >> > could simply start DHCPv4, watch it fail, close/open and set static IP.
>> >> >
>> >> > Alternatively we could have the broadcast filter on a 2 minute timer
>> >> > that starts when DHCPv4 is seen.
>> >> >
>> >> Well, but this would be error-prone. I don't remember but DHCP timeout
>> >> is not just 2 minutes.
>> >
>> > 2m is IIRC an RFC recommended timeout for initial DHCPv4 requests.  If
>> > you don't get a reply to your requests within 2m, you aren't likely to
>> > get one ever on this network and you can stop trying for a while.  You
>> > either get a reply in which case you get an IPv4 address and broadcast
>> > is always enabled, or you don't, in which case you don't really need
>> > broadcast.  Then when userspace next tries DHCP, the timer would allow
>> > broadcast for another 2 minutes.
>> >
>> >> >> Do we really need to special case DHCPv4? Do you see problems in
>> >> >> inverting the current logic of adding broadcast bit (i.e. from
>> >> >> enable-if-IPv4 to disable-if-IPv6-only)?
>> >> >
>> >> > The problem with this is dual-stack configuration.  Typically both
>> >> > DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
>> >> > earlier.  So now the interface only has an IPv6 address but DHCPv4 is
>> >> > still ongoing; as I understand your proposal the filter would now block
>> >> > broadcast packets that DHCPv4 is listening for.
>> >> >
>> >> That was a proposal and we can tweak it if necessary. My primary
>> >> concern is that the solution should not make us pay per packet penalty
>> >> once (and even if not) the DHCPv4 handshake is complete, which is not
>> >> the case as it stands now.
>> >
>> > I'm not sure what you mean by "not make us pay a per packet penalty once
>> > DHCPv4 is complete".  Once an IPv4 address is assigned, the broadcast
>> > filter must allow broadcast packets.  Before an IPv4 address is
>> > assigned, the driver must watch for DHCPv4 and allow broadcast packets
>> > so that the DHCP reply isn't missed.
>> >
>> If the link is not using DHCPv4 to configure, the snooping continues
>> and that is per packet cost the needs to be paid. Probably this can be
>> avoided by setting dhcp4_seen = true in add_addr4 handler.
>>
>> > If DHCPv4 gets no response, then there are two options:
>> >
>> > a) down the interface (which resets the broadcast filter to block
>> > broadcast packets) and re-up it, and don't do DHCPv4 again
>> >
>> > b) have a timer in the driver that waits a period of time for an IPv4
>> > address after a DHCPv4 request, and blocks broadcast when the timer
>> > expires
>> >
>> Probably something similar where turn on the broadcast bit and wait
>> for the interface to get configured (2 min timer) and at the expiry
>> decide what is to be done about braodcast bit. If the addresses
>> configured are all IPv6, then eliminate it and if any of them are IPv4
>> then don't change it. In this case no special casing nor snooping is
>> required and this should work for dual-stack scenario as well.
>
> This does not work for the case where, after configuring, the DHCPv4
> address lease expires and the IPv4 address is removed, and then DHCPv4
> is started again.  Possibly because the DHCP server was gone for a short
> time.  The only way to handle that is to snoop again.
>
If the DHCPv4 lease is expired, then why DHCPv4-renew wont work (also
it's unicast)? Also if the address applied is  IPv4 then the broadcast
bit will stay. In this case I don't understand why this wont work.

If the link is stripped off of address(es), then it should again go
back into the config-mode where it would turn on the broadcast bit and
enable timer to get it configured. May be I'm missing something.


> Reaching for maximum performance is great, but if that is done by
> ignoring/breaking a whole class of normal use-cases, I don't think
> that's reasonable.  It's like saying "gee, I'd love UDP to be faster, so
> I'll remove anything TCP-related in the kernel".
>
What is the amount of traffic that a link will be sending for config
(DHCPv4 in this case)? I would guarantee you that it's so negligible
to the extent that it's would be non-existent. But all the suggestions
include snooping of every packet that goes through the link. However
I'm not suggesting that this is non-important or can stay broken. But
would rather have a solution that would not affect 99.9% of the
traffic (per packet penalty that I mentioned earlier).

> Also, I don't think the snooping is as bad for performance as you may
> think.  The only relevant issue here is L2 + IPv6-only,
Also IPv4 case when the link is not using autoconf.

> and in that case
> it's 4 extra compares (dhcp4_seen, ipv4cnt, lyr3h, and addr_type) for
> the external case.  How much is that really going to slow things down,
> versus breaking a huge part of IPv4 address configuration?
>
> Dan
>
>> > Dan
>> >
>> >> > Dan
>> >> >
>> >> >> > Signed-off-by: Dan Williams <dcbw@redhat.com>
>> >> >> > ---
>> >> >> > NOTE: this patch supercedes my previous two ipvlan patches:
>> >> >> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
>> >> >> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
>> >> >> >
>> >> >> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
>> >> >> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
>> >> >> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
>> >> >> >  3 files changed, 45 insertions(+), 5 deletions(-)
>> >> >> >
>> >> >> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
>> >> >> > index 924ea98..7e0b8ff 100644
>> >> >> > --- a/drivers/net/ipvlan/ipvlan.h
>> >> >> > +++ b/drivers/net/ipvlan/ipvlan.h
>> >> >> > @@ -67,6 +67,7 @@ struct ipvl_dev {
>> >> >> >         struct list_head        addrs;
>> >> >> >         int                     ipv4cnt;
>> >> >> >         int                     ipv6cnt;
>> >> >> > +       bool                    dhcp4_seen;
>> >> >> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
>> >> >> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
>> >> >> >         netdev_features_t       sfeatures;
>> >> >> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
>> >> >> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
>> >> >> >                                         const void *iaddr, bool is_v6);
>> >> >> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
>> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
>> >> >> >  #endif /* __IPVLAN_H */
>> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
>> >> >> > index 2a17500..6dd992b 100644
>> >> >> > --- a/drivers/net/ipvlan/ipvlan_core.c
>> >> >> > +++ b/drivers/net/ipvlan/ipvlan_core.c
>> >> >> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
>> >> >> >                 lyr3h = arph;
>> >> >> >                 break;
>> >> >> >         }
>> >> >> > +       case htons(ETH_P_ALL): {
>> >> >> > +               /* Raw sockets */
>> >> >> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
>> >> >> > +                       break;
>> >> >> > +               /* Fall through */
>> >> >> > +       }
>> >> >> >         case htons(ETH_P_IP): {
>> >> >> >                 u32 pktlen;
>> >> >> >                 struct iphdr *ip4h;
>> >> >> > @@ -454,16 +460,42 @@ out:
>> >> >> >         return ipvlan_process_outbound(skb, ipvlan);
>> >> >> >  }
>> >> >> >
>> >> >> > +#define DHCP_PACKET_MIN_LEN 282
>> >> >> > +
>> >> >> > +static bool is_dhcp(struct sk_buff *skb)
>> >> >> > +{
>> >> >> > +       struct iphdr *ip4h = ip_hdr(skb);
>> >> >> > +       struct udphdr *udph;
>> >> >> > +
>> >> >> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
>> >> >> > +               || ip_is_fragment(ip4h))
>> >> >> > +               return false;
>> >> >> > +
>> >> >> > +       /* In the case of RAW sockets the transport header is not set by
>> >> >> > +        * the IP stack so we must set it ourselves.
>> >> >> > +        */
>> >> >> > +       if (skb->transport_header == skb->network_header)
>> >> >> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
>> >> >> > +       udph = udp_hdr(skb);
>> >> >> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
>> >> >> > +}
>> >> >> > +
>> >> >> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
>> >> >> >  {
>> >> >> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
>> >> >> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
>> >> >> >         struct ethhdr *eth = eth_hdr(skb);
>> >> >> >         struct ipvl_addr *addr;
>> >> >> >         void *lyr3h;
>> >> >> >         int addr_type;
>> >> >> >
>> >> >> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
>> >> >> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
>> >> >> > +           && is_dhcp(skb)) {
>> >> >> > +               ipvlan->dhcp4_seen = true;
>> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
>> >> >> > +       }
>> >> >> > +
>> >> >> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
>> >> >> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
>> >> >> >                 if (lyr3h) {
>> >> >> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
>> >> >> >                         if (addr)
>> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
>> >> >> > index 4f4099d..a497fb9 100644
>> >> >> > --- a/drivers/net/ipvlan/ipvlan_main.c
>> >> >> > +++ b/drivers/net/ipvlan/ipvlan_main.c
>> >> >> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
>> >> >> >
>> >> >> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
>> >> >> >
>> >> >> > +       ipvlan->dhcp4_seen = false;
>> >> >> > +
>> >> >> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
>> >> >> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
>> >> >> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
>> >> >> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
>> >> >> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
>> >> >> >  }
>> >> >> >
>> >> >> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
>> >> >> >  {
>> >> >> >         struct net_device *dev = ipvlan->dev;
>> >> >> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
>> >> >> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
>> >> >> >                 netdev_for_each_mc_addr(ha, dev)
>> >> >> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
>> >> >> >
>> >> >> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
>> >> >> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
>> >> >> > +
>> >> >> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
>> >> >> >                             IPVLAN_MAC_FILTER_SIZE);
>> >> >> >         }
>> >> >> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
>> >> >> >         INIT_LIST_HEAD(&ipvlan->addrs);
>> >> >> >         ipvlan->ipv4cnt = 0;
>> >> >> >         ipvlan->ipv6cnt = 0;
>> >> >> > +       ipvlan->dhcp4_seen = false;
>> >> >> >
>> >> >> >         /* TODO Probably put random address here to be presented to the
>> >> >> >          * world but keep using the physical-dev address for the outgoing
>> >> >> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
>> >> >> >         list_del_rcu(&addr->anode);
>> >> >> >         ipvlan->ipv4cnt--;
>> >> >> >         WARN_ON(ipvlan->ipv4cnt < 0);
>> >> >> > -       if (!ipvlan->ipv4cnt)
>> >> >> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
>> >> >> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
>> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
>> >> >> >         kfree_rcu(addr, rcu);
>> >> >> >
>> >> >> >         return;
>> >> >> > --
>> >> >> > 2.1.0
>> >> >> >
>> >> >> >
>> >> >> --
>> >> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> >> >> the body of a message to majordomo@vger.kernel.org
>> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >> >
>> >> >
>> >> --
>> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> >> the body of a message to majordomo@vger.kernel.org
>> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>> >
>> >
>> --
>> To unsubscribe from this list: send the line "unsubscribe netdev" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-07 18:32                       ` Mahesh Bandewar
@ 2015-04-07 19:45                         ` Dan Williams
  2015-04-09 15:51                           ` Dan Williams
  0 siblings, 1 reply; 37+ messages in thread
From: Dan Williams @ 2015-04-07 19:45 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Tue, 2015-04-07 at 11:32 -0700, Mahesh Bandewar wrote:
> On Mon, Apr 6, 2015 at 10:17 AM, Dan Williams <dcbw@redhat.com> wrote:
> > On Thu, 2015-04-02 at 18:39 -0700, Mahesh Bandewar wrote:
> >> On Thu, Apr 2, 2015 at 7:40 AM, Dan Williams <dcbw@redhat.com> wrote:
> >> > On Wed, 2015-04-01 at 18:30 -0700, Mahesh Bandewar wrote:
> >> >> On Wed, Apr 1, 2015 at 1:55 PM, Dan Williams <dcbw@redhat.com> wrote:
> >> >> > On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
> >> >> >> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
> >> >> >> > The broadcast MAC is supposed to be allowed whenever the device
> >> >> >> > has an IPv4 address, otherwise ARP requests get dropped on the
> >> >> >> > floor.  If ndo_set_rx_mode (and thus
> >> >> >> > ipvlan_set_multicast_mac_filter()) got called after the address
> >> >> >> > was added, it would blow away the broadcast MAC address in
> >> >> >> > mac_filters that was added at IPv4 address addition.
> >> >> >> >
> >> >> >> > Next, ipvlan currently fails DHCP addressing for two reasons:
> >> >> >> >
> >> >> >> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
> >> >> >> which does not use broadcast.
> >> >> >>
> >> >> >> > 1) DHCP offers are typically unicast back to the device's MAC
> >> >> >> > address, and at the IP layer have a destination IP address
> >> >> >> > matching the new lease address.  In ipvlan unicast packets
> >> >> >> > are checked against existing addresses assigned to the ipvlan
> >> >> >> > interface, so clearly this fails hard because the ipvlan
> >> >> >> > interface has no IP addresses yet.  Workaround: request
> >> >> >> > that the server broadcast replies (-B for dhclient) which
> >> >> >> > don't get checked against the address list.
> >> >> >> >
> >> >> >> > 2) Even when that's done, mac_filters only allows the
> >> >> >> > broadcast MAC address when the interface has >= 1 IPv4
> >> >> >> > addresses, so double-fail, and the incoming DHCP offer
> >> >> >> > gets dropped on the floor again.
> >> >> >> >
> >> >> >> > To fix these issues, Watch for outgoing DHCP requests and
> >> >> >> > enable the broadcast MAC address indefinitely when one is seen.
> >> >> >> >
> >> >> >> This approach will work but adds a performance drag for all UDP
> >> >> >> traffic whether the link needs DHCPv4 or not (especially when it does
> >> >> >> not need). Logic works well only when the link (always) needs DHCPv4.
> >> >> >> When the link does not care about DHCPv4 the packet-inspection always
> >> >> >> happens.
> >> >> >
> >> >> > If DHCPv4 is detected on a link, doesn't that mean the link needs the
> >> >> > broadcast filter enabled?
> >> >> Yes and it should stay enabled as it's going to have IPv4. Also once
> >> >> broadcast bit is set there should not be any additional jugglery /
> >> >> packet inspection which hurts performance.
> >> >
> >> > Yes, which is why my patch includes the check for dhcp4_seen and skips
> >> > inspecting the packet.  I could modify it to skip the L3 header check
> >> > too in that case.
> >> >
> >> dhcp4_seen works correctly when there is dhcpv4 happening, but
> >> snooping continues if autoconf is not used.
> >>
> >> >> > If you start DHCPv4 on the link you obviously
> >> >> > expect it to work, unless there's no DHCPv4 server available.  The patch
> >> >> > should also reset the broadcast filter on close, so network management
> >> >> > could simply start DHCPv4, watch it fail, close/open and set static IP.
> >> >> >
> >> >> > Alternatively we could have the broadcast filter on a 2 minute timer
> >> >> > that starts when DHCPv4 is seen.
> >> >> >
> >> >> Well, but this would be error-prone. I don't remember but DHCP timeout
> >> >> is not just 2 minutes.
> >> >
> >> > 2m is IIRC an RFC recommended timeout for initial DHCPv4 requests.  If
> >> > you don't get a reply to your requests within 2m, you aren't likely to
> >> > get one ever on this network and you can stop trying for a while.  You
> >> > either get a reply in which case you get an IPv4 address and broadcast
> >> > is always enabled, or you don't, in which case you don't really need
> >> > broadcast.  Then when userspace next tries DHCP, the timer would allow
> >> > broadcast for another 2 minutes.
> >> >
> >> >> >> Do we really need to special case DHCPv4? Do you see problems in
> >> >> >> inverting the current logic of adding broadcast bit (i.e. from
> >> >> >> enable-if-IPv4 to disable-if-IPv6-only)?
> >> >> >
> >> >> > The problem with this is dual-stack configuration.  Typically both
> >> >> > DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
> >> >> > earlier.  So now the interface only has an IPv6 address but DHCPv4 is
> >> >> > still ongoing; as I understand your proposal the filter would now block
> >> >> > broadcast packets that DHCPv4 is listening for.
> >> >> >
> >> >> That was a proposal and we can tweak it if necessary. My primary
> >> >> concern is that the solution should not make us pay per packet penalty
> >> >> once (and even if not) the DHCPv4 handshake is complete, which is not
> >> >> the case as it stands now.
> >> >
> >> > I'm not sure what you mean by "not make us pay a per packet penalty once
> >> > DHCPv4 is complete".  Once an IPv4 address is assigned, the broadcast
> >> > filter must allow broadcast packets.  Before an IPv4 address is
> >> > assigned, the driver must watch for DHCPv4 and allow broadcast packets
> >> > so that the DHCP reply isn't missed.
> >> >
> >> If the link is not using DHCPv4 to configure, the snooping continues
> >> and that is per packet cost the needs to be paid. Probably this can be
> >> avoided by setting dhcp4_seen = true in add_addr4 handler.
> >>
> >> > If DHCPv4 gets no response, then there are two options:
> >> >
> >> > a) down the interface (which resets the broadcast filter to block
> >> > broadcast packets) and re-up it, and don't do DHCPv4 again
> >> >
> >> > b) have a timer in the driver that waits a period of time for an IPv4
> >> > address after a DHCPv4 request, and blocks broadcast when the timer
> >> > expires
> >> >
> >> Probably something similar where turn on the broadcast bit and wait
> >> for the interface to get configured (2 min timer) and at the expiry
> >> decide what is to be done about braodcast bit. If the addresses
> >> configured are all IPv6, then eliminate it and if any of them are IPv4
> >> then don't change it. In this case no special casing nor snooping is
> >> required and this should work for dual-stack scenario as well.
> >
> > This does not work for the case where, after configuring, the DHCPv4
> > address lease expires and the IPv4 address is removed, and then DHCPv4
> > is started again.  Possibly because the DHCP server was gone for a short
> > time.  The only way to handle that is to snoop again.
> >
> If the DHCPv4 lease is expired, then why DHCPv4-renew wont work (also
> it's unicast)? Also if the address applied is  IPv4 then the broadcast
> bit will stay. In this case I don't understand why this wont work.

You're right.  I mis-read your proposal above.

But having re-read it, are you proposing a 2m timer on interface up?  If
so (and ignore this if not) I don't think that works well either,
because there's no guarantee that interface configuration will happen
only close to interface up.  Maybe userspace adds IPv6 addresses
initially, and then later tries to do DHCP for some reason.  We simply
cannot rely on specific ordering of operations that userspace might want
to do.

If you don't mean a 2m timer from interface up, then ignore that, and
then what kind of time do you mean? :)

> If the link is stripped off of address(es), then it should again go
> back into the config-mode where it would turn on the broadcast bit and
> enable timer to get it configured. May be I'm missing something.

As above, I was wrong about the DHCPv4 lease expiration thing, but this
may still run afoul of userspace operation ordering if you are talking
about a 2m timer from interface up.

Just had another thought though; what if instead of snooping for all the
DHCP stuff, the code just snooped outgoing IPv4 packets for a broadcast
destination address?  Then turn on the broadcast bit filter for 2m.
That would look something like this:

static bool is_bcast4(struct sk_buff *skb)
{
	struct iphdr *ip4h;

	switch (skb->protocol) {
	case htons(ETH_P_ALL):
		/* Raw sockets */
		if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
			break;
		/* Fall through */
	case htons(ETH_P_IP):
		if (unlikely(!pskb_may_pull(skb, sizeof(*ip4h))))
			return NULL;
		ip4h = ip_hdr(skb);
		if (ip4h->ihl < 5 || ip4h->version != 4)
			return NULL;
		return ip4h->daddr == INADDR_BROADCAST;
	}
	return false;
}

static int ipvlan_xmit_mode_l2(...)
{
	if (!ipvlan->ipv4cnt && !ipvlan->bcast4_seen && is_bcast4(skb)) {
		ipvlan->bcast4_seen = true;
		ipvlan_set_broadcast_mac_filter(ipvlan, true);
	}

Yes, it's still snooping for all packets, but it's a lot fewer compares
than looking for DHCP specifically.

> > Reaching for maximum performance is great, but if that is done by
> > ignoring/breaking a whole class of normal use-cases, I don't think
> > that's reasonable.  It's like saying "gee, I'd love UDP to be faster, so
> > I'll remove anything TCP-related in the kernel".
> >
> What is the amount of traffic that a link will be sending for config
> (DHCPv4 in this case)? I would guarantee you that it's so negligible
> to the extent that it's would be non-existent. But all the suggestions
> include snooping of every packet that goes through the link. However
> I'm not suggesting that this is non-important or can stay broken. But
> would rather have a solution that would not affect 99.9% of the
> traffic (per packet penalty that I mentioned earlier).

I agree, and it's great that we're brainstorming solutions.
Unfortunately as a kernel driver, it cannot rely on specific ordering or
timing of address addition/deletion since that's not something you can
predict at all.

> > Also, I don't think the snooping is as bad for performance as you may
> > think.  The only relevant issue here is L2 + IPv6-only,
> Also IPv4 case when the link is not using autoconf.

Yes, that's true.  But a kernel driver simply doesn't know what its
setup will be so it cannot assume much of anything about the addressing
setup despite whatever performance gains there might be...

Dan

> > and in that case
> > it's 4 extra compares (dhcp4_seen, ipv4cnt, lyr3h, and addr_type) for
> > the external case.  How much is that really going to slow things down,
> > versus breaking a huge part of IPv4 address configuration?
> >
> > Dan
> >
> >> > Dan
> >> >
> >> >> > Dan
> >> >> >
> >> >> >> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> >> >> >> > ---
> >> >> >> > NOTE: this patch supercedes my previous two ipvlan patches:
> >> >> >> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
> >> >> >> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
> >> >> >> >
> >> >> >> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
> >> >> >> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
> >> >> >> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
> >> >> >> >  3 files changed, 45 insertions(+), 5 deletions(-)
> >> >> >> >
> >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> >> >> >> > index 924ea98..7e0b8ff 100644
> >> >> >> > --- a/drivers/net/ipvlan/ipvlan.h
> >> >> >> > +++ b/drivers/net/ipvlan/ipvlan.h
> >> >> >> > @@ -67,6 +67,7 @@ struct ipvl_dev {
> >> >> >> >         struct list_head        addrs;
> >> >> >> >         int                     ipv4cnt;
> >> >> >> >         int                     ipv6cnt;
> >> >> >> > +       bool                    dhcp4_seen;
> >> >> >> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
> >> >> >> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
> >> >> >> >         netdev_features_t       sfeatures;
> >> >> >> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
> >> >> >> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
> >> >> >> >                                         const void *iaddr, bool is_v6);
> >> >> >> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
> >> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
> >> >> >> >  #endif /* __IPVLAN_H */
> >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> >> >> >> > index 2a17500..6dd992b 100644
> >> >> >> > --- a/drivers/net/ipvlan/ipvlan_core.c
> >> >> >> > +++ b/drivers/net/ipvlan/ipvlan_core.c
> >> >> >> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> >> >> >> >                 lyr3h = arph;
> >> >> >> >                 break;
> >> >> >> >         }
> >> >> >> > +       case htons(ETH_P_ALL): {
> >> >> >> > +               /* Raw sockets */
> >> >> >> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> >> >> >> > +                       break;
> >> >> >> > +               /* Fall through */
> >> >> >> > +       }
> >> >> >> >         case htons(ETH_P_IP): {
> >> >> >> >                 u32 pktlen;
> >> >> >> >                 struct iphdr *ip4h;
> >> >> >> > @@ -454,16 +460,42 @@ out:
> >> >> >> >         return ipvlan_process_outbound(skb, ipvlan);
> >> >> >> >  }
> >> >> >> >
> >> >> >> > +#define DHCP_PACKET_MIN_LEN 282
> >> >> >> > +
> >> >> >> > +static bool is_dhcp(struct sk_buff *skb)
> >> >> >> > +{
> >> >> >> > +       struct iphdr *ip4h = ip_hdr(skb);
> >> >> >> > +       struct udphdr *udph;
> >> >> >> > +
> >> >> >> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> >> >> >> > +               || ip_is_fragment(ip4h))
> >> >> >> > +               return false;
> >> >> >> > +
> >> >> >> > +       /* In the case of RAW sockets the transport header is not set by
> >> >> >> > +        * the IP stack so we must set it ourselves.
> >> >> >> > +        */
> >> >> >> > +       if (skb->transport_header == skb->network_header)
> >> >> >> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
> >> >> >> > +       udph = udp_hdr(skb);
> >> >> >> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
> >> >> >> > +}
> >> >> >> > +
> >> >> >> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
> >> >> >> >  {
> >> >> >> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> >> >> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
> >> >> >> >         struct ethhdr *eth = eth_hdr(skb);
> >> >> >> >         struct ipvl_addr *addr;
> >> >> >> >         void *lyr3h;
> >> >> >> >         int addr_type;
> >> >> >> >
> >> >> >> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >> >> >> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> >> >> >> > +           && is_dhcp(skb)) {
> >> >> >> > +               ipvlan->dhcp4_seen = true;
> >> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
> >> >> >> > +       }
> >> >> >> > +
> >> >> >> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
> >> >> >> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> >> >> >> >                 if (lyr3h) {
> >> >> >> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
> >> >> >> >                         if (addr)
> >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> >> >> >> > index 4f4099d..a497fb9 100644
> >> >> >> > --- a/drivers/net/ipvlan/ipvlan_main.c
> >> >> >> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> >> >> >> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
> >> >> >> >
> >> >> >> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
> >> >> >> >
> >> >> >> > +       ipvlan->dhcp4_seen = false;
> >> >> >> > +
> >> >> >> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
> >> >> >> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
> >> >> >> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
> >> >> >> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
> >> >> >> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
> >> >> >> >  }
> >> >> >> >
> >> >> >> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> >> >> >> >  {
> >> >> >> >         struct net_device *dev = ipvlan->dev;
> >> >> >> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> >> >> >> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> >> >> >> >                 netdev_for_each_mc_addr(ha, dev)
> >> >> >> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
> >> >> >> >
> >> >> >> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
> >> >> >> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> >> >> >> > +
> >> >> >> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
> >> >> >> >                             IPVLAN_MAC_FILTER_SIZE);
> >> >> >> >         }
> >> >> >> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> >> >> >> >         INIT_LIST_HEAD(&ipvlan->addrs);
> >> >> >> >         ipvlan->ipv4cnt = 0;
> >> >> >> >         ipvlan->ipv6cnt = 0;
> >> >> >> > +       ipvlan->dhcp4_seen = false;
> >> >> >> >
> >> >> >> >         /* TODO Probably put random address here to be presented to the
> >> >> >> >          * world but keep using the physical-dev address for the outgoing
> >> >> >> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
> >> >> >> >         list_del_rcu(&addr->anode);
> >> >> >> >         ipvlan->ipv4cnt--;
> >> >> >> >         WARN_ON(ipvlan->ipv4cnt < 0);
> >> >> >> > -       if (!ipvlan->ipv4cnt)
> >> >> >> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >> >> >> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
> >> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
> >> >> >> >         kfree_rcu(addr, rcu);
> >> >> >> >
> >> >> >> >         return;
> >> >> >> > --
> >> >> >> > 2.1.0
> >> >> >> >
> >> >> >> >
> >> >> >> --
> >> >> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >> >
> >> >> >
> >> >> --
> >> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> >> the body of a message to majordomo@vger.kernel.org
> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >> >
> >> >
> >> --
> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> >> the body of a message to majordomo@vger.kernel.org
> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> >
> >
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* RE: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-06 17:17                     ` Dan Williams
  2015-04-07 18:32                       ` Mahesh Bandewar
@ 2015-04-08  9:37                       ` David Laight
  2015-04-08 14:12                         ` Dan Williams
  2015-04-09  1:08                         ` Mahesh Bandewar
  1 sibling, 2 replies; 37+ messages in thread
From: David Laight @ 2015-04-08  9:37 UTC (permalink / raw)
  To: 'Dan Williams', Mahesh Bandewar; +Cc: linux-netdev, jbenc

From: Dan Williams
...
> > Probably something similar where turn on the broadcast bit and wait
> > for the interface to get configured (2 min timer) and at the expiry
> > decide what is to be done about braodcast bit. If the addresses
> > configured are all IPv6, then eliminate it and if any of them are IPv4
> > then don't change it. In this case no special casing nor snooping is
> > required and this should work for dual-stack scenario as well.
> 
> This does not work for the case where, after configuring, the DHCPv4
> address lease expires and the IPv4 address is removed, and then DHCPv4
> is started again.  Possibly because the DHCP server was gone for a short
> time.  The only way to handle that is to snoop again.
> 
> Reaching for maximum performance is great, but if that is done by
> ignoring/breaking a whole class of normal use-cases, I don't think
> that's reasonable.  It's like saying "gee, I'd love UDP to be faster, so
> I'll remove anything TCP-related in the kernel".
> 
> Also, I don't think the snooping is as bad for performance as you may
> think.  The only relevant issue here is L2 + IPv6-only, and in that case
> it's 4 extra compares (dhcp4_seen, ipv4cnt, lyr3h, and addr_type) for
> the external case.  How much is that really going to slow things down,
> versus breaking a huge part of IPv4 address configuration?

How much performance do you really gain from disabling broadcasts in hardware?
(Unless your network has massive broadcast storms - which are a different problem).
I can imagine it helping a low power device stay idle.

For DHCP you have bigger issues.
All the versions of the dhcp client I've seen use the bpf interface for
receive traffic (even once started) so you get the cost of a clone
of every received packet.

	David


^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-08  9:37                       ` David Laight
@ 2015-04-08 14:12                         ` Dan Williams
  2015-04-09  1:08                         ` Mahesh Bandewar
  1 sibling, 0 replies; 37+ messages in thread
From: Dan Williams @ 2015-04-08 14:12 UTC (permalink / raw)
  To: David Laight; +Cc: Mahesh Bandewar, linux-netdev, jbenc

On Wed, 2015-04-08 at 09:37 +0000, David Laight wrote:
> From: Dan Williams
> ...
> > > Probably something similar where turn on the broadcast bit and wait
> > > for the interface to get configured (2 min timer) and at the expiry
> > > decide what is to be done about braodcast bit. If the addresses
> > > configured are all IPv6, then eliminate it and if any of them are IPv4
> > > then don't change it. In this case no special casing nor snooping is
> > > required and this should work for dual-stack scenario as well.
> > 
> > This does not work for the case where, after configuring, the DHCPv4
> > address lease expires and the IPv4 address is removed, and then DHCPv4
> > is started again.  Possibly because the DHCP server was gone for a short
> > time.  The only way to handle that is to snoop again.
> > 
> > Reaching for maximum performance is great, but if that is done by
> > ignoring/breaking a whole class of normal use-cases, I don't think
> > that's reasonable.  It's like saying "gee, I'd love UDP to be faster, so
> > I'll remove anything TCP-related in the kernel".
> > 
> > Also, I don't think the snooping is as bad for performance as you may
> > think.  The only relevant issue here is L2 + IPv6-only, and in that case
> > it's 4 extra compares (dhcp4_seen, ipv4cnt, lyr3h, and addr_type) for
> > the external case.  How much is that really going to slow things down,
> > versus breaking a huge part of IPv4 address configuration?
> 
> How much performance do you really gain from disabling broadcasts in hardware?
> (Unless your network has massive broadcast storms - which are a different problem).
> I can imagine it helping a low power device stay idle.

I'm not sure what metrics the decision to have a broadcast filter in
ipvlan was based on since I'm just trying to fix some bugs, but I assume
Mahesh has the answer to that?

Dan

> For DHCP you have bigger issues.
> All the versions of the dhcp client I've seen use the bpf interface for
> receive traffic (even once started) so you get the cost of a clone
> of every received packet.
> 
> 	David
> 
> NrybXǧv^)޺{.n+z^)w*\x1fjg\x1eݢj/zޖ2ޙ&)ߡa\x7f\x1eGh\x0fj:+vw٥

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-08  9:37                       ` David Laight
  2015-04-08 14:12                         ` Dan Williams
@ 2015-04-09  1:08                         ` Mahesh Bandewar
  1 sibling, 0 replies; 37+ messages in thread
From: Mahesh Bandewar @ 2015-04-09  1:08 UTC (permalink / raw)
  To: David Laight; +Cc: Dan Williams, linux-netdev, jbenc

On Wed, Apr 8, 2015 at 2:37 AM, David Laight <David.Laight@aculab.com> wrote:
> From: Dan Williams
> ...
>> > Probably something similar where turn on the broadcast bit and wait
>> > for the interface to get configured (2 min timer) and at the expiry
>> > decide what is to be done about braodcast bit. If the addresses
>> > configured are all IPv6, then eliminate it and if any of them are IPv4
>> > then don't change it. In this case no special casing nor snooping is
>> > required and this should work for dual-stack scenario as well.
>>
>> This does not work for the case where, after configuring, the DHCPv4
>> address lease expires and the IPv4 address is removed, and then DHCPv4
>> is started again.  Possibly because the DHCP server was gone for a short
>> time.  The only way to handle that is to snoop again.
>>
>> Reaching for maximum performance is great, but if that is done by
>> ignoring/breaking a whole class of normal use-cases, I don't think
>> that's reasonable.  It's like saying "gee, I'd love UDP to be faster, so
>> I'll remove anything TCP-related in the kernel".
>>
>> Also, I don't think the snooping is as bad for performance as you may
>> think.  The only relevant issue here is L2 + IPv6-only, and in that case
>> it's 4 extra compares (dhcp4_seen, ipv4cnt, lyr3h, and addr_type) for
>> the external case.  How much is that really going to slow things down,
>> versus breaking a huge part of IPv4 address configuration?
>
> How much performance do you really gain from disabling broadcasts in hardware?
> (Unless your network has massive broadcast storms - which are a different problem).
> I can imagine it helping a low power device stay idle.
>
IPvlan is a virtual driver and is enabling / disabling broadcast on
these virtual links. So N virtual links mean N*clone and that's the
cost.

> For DHCP you have bigger issues.
> All the versions of the dhcp client I've seen use the bpf interface for
> receive traffic (even once started) so you get the cost of a clone
> of every received packet.
>
>         David
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-07 19:45                         ` Dan Williams
@ 2015-04-09 15:51                           ` Dan Williams
  2015-04-09 16:01                             ` Eric Dumazet
  2015-04-09 16:33                             ` Mahesh Bandewar
  0 siblings, 2 replies; 37+ messages in thread
From: Dan Williams @ 2015-04-09 15:51 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Tue, 2015-04-07 at 14:45 -0500, Dan Williams wrote:
> On Tue, 2015-04-07 at 11:32 -0700, Mahesh Bandewar wrote:
> > On Mon, Apr 6, 2015 at 10:17 AM, Dan Williams <dcbw@redhat.com> wrote:
> > > On Thu, 2015-04-02 at 18:39 -0700, Mahesh Bandewar wrote:
> > >> On Thu, Apr 2, 2015 at 7:40 AM, Dan Williams <dcbw@redhat.com> wrote:
> > >> > On Wed, 2015-04-01 at 18:30 -0700, Mahesh Bandewar wrote:
> > >> >> On Wed, Apr 1, 2015 at 1:55 PM, Dan Williams <dcbw@redhat.com> wrote:
> > >> >> > On Wed, 2015-04-01 at 13:24 -0700, Mahesh Bandewar wrote:
> > >> >> >> On Mon, Mar 30, 2015 at 9:22 PM, Dan Williams <dcbw@redhat.com> wrote:
> > >> >> >> > The broadcast MAC is supposed to be allowed whenever the device
> > >> >> >> > has an IPv4 address, otherwise ARP requests get dropped on the
> > >> >> >> > floor.  If ndo_set_rx_mode (and thus
> > >> >> >> > ipvlan_set_multicast_mac_filter()) got called after the address
> > >> >> >> > was added, it would blow away the broadcast MAC address in
> > >> >> >> > mac_filters that was added at IPv4 address addition.
> > >> >> >> >
> > >> >> >> > Next, ipvlan currently fails DHCP addressing for two reasons:
> > >> >> >> >
> > >> >> >> I would prefer DHCPv4 instead of DHCP since it could also mean DHCPv6
> > >> >> >> which does not use broadcast.
> > >> >> >>
> > >> >> >> > 1) DHCP offers are typically unicast back to the device's MAC
> > >> >> >> > address, and at the IP layer have a destination IP address
> > >> >> >> > matching the new lease address.  In ipvlan unicast packets
> > >> >> >> > are checked against existing addresses assigned to the ipvlan
> > >> >> >> > interface, so clearly this fails hard because the ipvlan
> > >> >> >> > interface has no IP addresses yet.  Workaround: request
> > >> >> >> > that the server broadcast replies (-B for dhclient) which
> > >> >> >> > don't get checked against the address list.
> > >> >> >> >
> > >> >> >> > 2) Even when that's done, mac_filters only allows the
> > >> >> >> > broadcast MAC address when the interface has >= 1 IPv4
> > >> >> >> > addresses, so double-fail, and the incoming DHCP offer
> > >> >> >> > gets dropped on the floor again.
> > >> >> >> >
> > >> >> >> > To fix these issues, Watch for outgoing DHCP requests and
> > >> >> >> > enable the broadcast MAC address indefinitely when one is seen.
> > >> >> >> >
> > >> >> >> This approach will work but adds a performance drag for all UDP
> > >> >> >> traffic whether the link needs DHCPv4 or not (especially when it does
> > >> >> >> not need). Logic works well only when the link (always) needs DHCPv4.
> > >> >> >> When the link does not care about DHCPv4 the packet-inspection always
> > >> >> >> happens.
> > >> >> >
> > >> >> > If DHCPv4 is detected on a link, doesn't that mean the link needs the
> > >> >> > broadcast filter enabled?
> > >> >> Yes and it should stay enabled as it's going to have IPv4. Also once
> > >> >> broadcast bit is set there should not be any additional jugglery /
> > >> >> packet inspection which hurts performance.
> > >> >
> > >> > Yes, which is why my patch includes the check for dhcp4_seen and skips
> > >> > inspecting the packet.  I could modify it to skip the L3 header check
> > >> > too in that case.
> > >> >
> > >> dhcp4_seen works correctly when there is dhcpv4 happening, but
> > >> snooping continues if autoconf is not used.
> > >>
> > >> >> > If you start DHCPv4 on the link you obviously
> > >> >> > expect it to work, unless there's no DHCPv4 server available.  The patch
> > >> >> > should also reset the broadcast filter on close, so network management
> > >> >> > could simply start DHCPv4, watch it fail, close/open and set static IP.
> > >> >> >
> > >> >> > Alternatively we could have the broadcast filter on a 2 minute timer
> > >> >> > that starts when DHCPv4 is seen.
> > >> >> >
> > >> >> Well, but this would be error-prone. I don't remember but DHCP timeout
> > >> >> is not just 2 minutes.
> > >> >
> > >> > 2m is IIRC an RFC recommended timeout for initial DHCPv4 requests.  If
> > >> > you don't get a reply to your requests within 2m, you aren't likely to
> > >> > get one ever on this network and you can stop trying for a while.  You
> > >> > either get a reply in which case you get an IPv4 address and broadcast
> > >> > is always enabled, or you don't, in which case you don't really need
> > >> > broadcast.  Then when userspace next tries DHCP, the timer would allow
> > >> > broadcast for another 2 minutes.
> > >> >
> > >> >> >> Do we really need to special case DHCPv4? Do you see problems in
> > >> >> >> inverting the current logic of adding broadcast bit (i.e. from
> > >> >> >> enable-if-IPv4 to disable-if-IPv6-only)?
> > >> >> >
> > >> >> > The problem with this is dual-stack configuration.  Typically both
> > >> >> > DHCPv4 and SLAAC are started in this case, and often SLAAC will complete
> > >> >> > earlier.  So now the interface only has an IPv6 address but DHCPv4 is
> > >> >> > still ongoing; as I understand your proposal the filter would now block
> > >> >> > broadcast packets that DHCPv4 is listening for.
> > >> >> >
> > >> >> That was a proposal and we can tweak it if necessary. My primary
> > >> >> concern is that the solution should not make us pay per packet penalty
> > >> >> once (and even if not) the DHCPv4 handshake is complete, which is not
> > >> >> the case as it stands now.
> > >> >
> > >> > I'm not sure what you mean by "not make us pay a per packet penalty once
> > >> > DHCPv4 is complete".  Once an IPv4 address is assigned, the broadcast
> > >> > filter must allow broadcast packets.  Before an IPv4 address is
> > >> > assigned, the driver must watch for DHCPv4 and allow broadcast packets
> > >> > so that the DHCP reply isn't missed.
> > >> >
> > >> If the link is not using DHCPv4 to configure, the snooping continues
> > >> and that is per packet cost the needs to be paid. Probably this can be
> > >> avoided by setting dhcp4_seen = true in add_addr4 handler.
> > >>
> > >> > If DHCPv4 gets no response, then there are two options:
> > >> >
> > >> > a) down the interface (which resets the broadcast filter to block
> > >> > broadcast packets) and re-up it, and don't do DHCPv4 again
> > >> >
> > >> > b) have a timer in the driver that waits a period of time for an IPv4
> > >> > address after a DHCPv4 request, and blocks broadcast when the timer
> > >> > expires
> > >> >
> > >> Probably something similar where turn on the broadcast bit and wait
> > >> for the interface to get configured (2 min timer) and at the expiry
> > >> decide what is to be done about braodcast bit. If the addresses
> > >> configured are all IPv6, then eliminate it and if any of them are IPv4
> > >> then don't change it. In this case no special casing nor snooping is
> > >> required and this should work for dual-stack scenario as well.
> > >
> > > This does not work for the case where, after configuring, the DHCPv4
> > > address lease expires and the IPv4 address is removed, and then DHCPv4
> > > is started again.  Possibly because the DHCP server was gone for a short
> > > time.  The only way to handle that is to snoop again.
> > >
> > If the DHCPv4 lease is expired, then why DHCPv4-renew wont work (also
> > it's unicast)? Also if the address applied is  IPv4 then the broadcast
> > bit will stay. In this case I don't understand why this wont work.
> 
> You're right.  I mis-read your proposal above.
> 
> But having re-read it, are you proposing a 2m timer on interface up?  If
> so (and ignore this if not) I don't think that works well either,
> because there's no guarantee that interface configuration will happen
> only close to interface up.  Maybe userspace adds IPv6 addresses
> initially, and then later tries to do DHCP for some reason.  We simply
> cannot rely on specific ordering of operations that userspace might want
> to do.
> 
> If you don't mean a 2m timer from interface up, then ignore that, and
> then what kind of time do you mean? :)
> 
> > If the link is stripped off of address(es), then it should again go
> > back into the config-mode where it would turn on the broadcast bit and
> > enable timer to get it configured. May be I'm missing something.
> 
> As above, I was wrong about the DHCPv4 lease expiration thing, but this
> may still run afoul of userspace operation ordering if you are talking
> about a 2m timer from interface up.
> 
> Just had another thought though; what if instead of snooping for all the
> DHCP stuff, the code just snooped outgoing IPv4 packets for a broadcast
> destination address?  Then turn on the broadcast bit filter for 2m.
> That would look something like this:
> 
> static bool is_bcast4(struct sk_buff *skb)
> {
> 	struct iphdr *ip4h;
> 
> 	switch (skb->protocol) {
> 	case htons(ETH_P_ALL):
> 		/* Raw sockets */
> 		if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> 			break;
> 		/* Fall through */
> 	case htons(ETH_P_IP):
> 		if (unlikely(!pskb_may_pull(skb, sizeof(*ip4h))))
> 			return NULL;
> 		ip4h = ip_hdr(skb);
> 		if (ip4h->ihl < 5 || ip4h->version != 4)
> 			return NULL;
> 		return ip4h->daddr == INADDR_BROADCAST;
> 	}
> 	return false;
> }
> 
> static int ipvlan_xmit_mode_l2(...)
> {
> 	if (!ipvlan->ipv4cnt && !ipvlan->bcast4_seen && is_bcast4(skb)) {
> 		ipvlan->bcast4_seen = true;
> 		ipvlan_set_broadcast_mac_filter(ipvlan, true);
> 	}
> 
> Yes, it's still snooping for all packets, but it's a lot fewer compares
> than looking for DHCP specifically.

Any thoughts on this Mahesh?  Is this non-DHCP approach more to your
liking?  If so I'll generate an actual patch and do some testing.

Dan

> > > Reaching for maximum performance is great, but if that is done by
> > > ignoring/breaking a whole class of normal use-cases, I don't think
> > > that's reasonable.  It's like saying "gee, I'd love UDP to be faster, so
> > > I'll remove anything TCP-related in the kernel".
> > >
> > What is the amount of traffic that a link will be sending for config
> > (DHCPv4 in this case)? I would guarantee you that it's so negligible
> > to the extent that it's would be non-existent. But all the suggestions
> > include snooping of every packet that goes through the link. However
> > I'm not suggesting that this is non-important or can stay broken. But
> > would rather have a solution that would not affect 99.9% of the
> > traffic (per packet penalty that I mentioned earlier).
> 
> I agree, and it's great that we're brainstorming solutions.
> Unfortunately as a kernel driver, it cannot rely on specific ordering or
> timing of address addition/deletion since that's not something you can
> predict at all.
> 
> > > Also, I don't think the snooping is as bad for performance as you may
> > > think.  The only relevant issue here is L2 + IPv6-only,
> > Also IPv4 case when the link is not using autoconf.
> 
> Yes, that's true.  But a kernel driver simply doesn't know what its
> setup will be so it cannot assume much of anything about the addressing
> setup despite whatever performance gains there might be...
> 
> Dan
> 
> > > and in that case
> > > it's 4 extra compares (dhcp4_seen, ipv4cnt, lyr3h, and addr_type) for
> > > the external case.  How much is that really going to slow things down,
> > > versus breaking a huge part of IPv4 address configuration?
> > >
> > > Dan
> > >
> > >> > Dan
> > >> >
> > >> >> > Dan
> > >> >> >
> > >> >> >> > Signed-off-by: Dan Williams <dcbw@redhat.com>
> > >> >> >> > ---
> > >> >> >> > NOTE: this patch supercedes my previous two ipvlan patches:
> > >> >> >> > [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters
> > >> >> >> > [PATCH 2/2] ipvlan: always allow the broadcast MAC address
> > >> >> >> >
> > >> >> >> >  drivers/net/ipvlan/ipvlan.h      |  2 ++
> > >> >> >> >  drivers/net/ipvlan/ipvlan_core.c | 36 ++++++++++++++++++++++++++++++++++--
> > >> >> >> >  drivers/net/ipvlan/ipvlan_main.c | 12 +++++++++---
> > >> >> >> >  3 files changed, 45 insertions(+), 5 deletions(-)
> > >> >> >> >
> > >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan.h b/drivers/net/ipvlan/ipvlan.h
> > >> >> >> > index 924ea98..7e0b8ff 100644
> > >> >> >> > --- a/drivers/net/ipvlan/ipvlan.h
> > >> >> >> > +++ b/drivers/net/ipvlan/ipvlan.h
> > >> >> >> > @@ -67,6 +67,7 @@ struct ipvl_dev {
> > >> >> >> >         struct list_head        addrs;
> > >> >> >> >         int                     ipv4cnt;
> > >> >> >> >         int                     ipv6cnt;
> > >> >> >> > +       bool                    dhcp4_seen;
> > >> >> >> >         struct ipvl_pcpu_stats  __percpu *pcpu_stats;
> > >> >> >> >         DECLARE_BITMAP(mac_filters, IPVLAN_MAC_FILTER_SIZE);
> > >> >> >> >         netdev_features_t       sfeatures;
> > >> >> >> > @@ -118,4 +119,5 @@ bool ipvlan_addr_busy(struct ipvl_dev *ipvlan, void *iaddr, bool is_v6);
> > >> >> >> >  struct ipvl_addr *ipvlan_ht_addr_lookup(const struct ipvl_port *port,
> > >> >> >> >                                         const void *iaddr, bool is_v6);
> > >> >> >> >  void ipvlan_ht_addr_del(struct ipvl_addr *addr, bool sync);
> > >> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set);
> > >> >> >> >  #endif /* __IPVLAN_H */
> > >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_core.c b/drivers/net/ipvlan/ipvlan_core.c
> > >> >> >> > index 2a17500..6dd992b 100644
> > >> >> >> > --- a/drivers/net/ipvlan/ipvlan_core.c
> > >> >> >> > +++ b/drivers/net/ipvlan/ipvlan_core.c
> > >> >> >> > @@ -126,6 +126,12 @@ static void *ipvlan_get_L3_hdr(struct sk_buff *skb, int *type)
> > >> >> >> >                 lyr3h = arph;
> > >> >> >> >                 break;
> > >> >> >> >         }
> > >> >> >> > +       case htons(ETH_P_ALL): {
> > >> >> >> > +               /* Raw sockets */
> > >> >> >> > +               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
> > >> >> >> > +                       break;
> > >> >> >> > +               /* Fall through */
> > >> >> >> > +       }
> > >> >> >> >         case htons(ETH_P_IP): {
> > >> >> >> >                 u32 pktlen;
> > >> >> >> >                 struct iphdr *ip4h;
> > >> >> >> > @@ -454,16 +460,42 @@ out:
> > >> >> >> >         return ipvlan_process_outbound(skb, ipvlan);
> > >> >> >> >  }
> > >> >> >> >
> > >> >> >> > +#define DHCP_PACKET_MIN_LEN 282
> > >> >> >> > +
> > >> >> >> > +static bool is_dhcp(struct sk_buff *skb)
> > >> >> >> > +{
> > >> >> >> > +       struct iphdr *ip4h = ip_hdr(skb);
> > >> >> >> > +       struct udphdr *udph;
> > >> >> >> > +
> > >> >> >> > +       if (skb->len < DHCP_PACKET_MIN_LEN || ip4h->protocol != IPPROTO_UDP
> > >> >> >> > +               || ip_is_fragment(ip4h))
> > >> >> >> > +               return false;
> > >> >> >> > +
> > >> >> >> > +       /* In the case of RAW sockets the transport header is not set by
> > >> >> >> > +        * the IP stack so we must set it ourselves.
> > >> >> >> > +        */
> > >> >> >> > +       if (skb->transport_header == skb->network_header)
> > >> >> >> > +               skb_set_transport_header(skb, ETH_HLEN + sizeof(struct iphdr));
> > >> >> >> > +       udph = udp_hdr(skb);
> > >> >> >> > +       return udph && udph->dest == htons(67) && udph->source == htons(68);
> > >> >> >> > +}
> > >> >> >> > +
> > >> >> >> >  static int ipvlan_xmit_mode_l2(struct sk_buff *skb, struct net_device *dev)
> > >> >> >> >  {
> > >> >> >> > -       const struct ipvl_dev *ipvlan = netdev_priv(dev);
> > >> >> >> > +       struct ipvl_dev *ipvlan = netdev_priv(dev);
> > >> >> >> >         struct ethhdr *eth = eth_hdr(skb);
> > >> >> >> >         struct ipvl_addr *addr;
> > >> >> >> >         void *lyr3h;
> > >> >> >> >         int addr_type;
> > >> >> >> >
> > >> >> >> > +       lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> > >> >> >> > +       if (!ipvlan->dhcp4_seen && lyr3h && addr_type == IPVL_IPV4
> > >> >> >> > +           && is_dhcp(skb)) {
> > >> >> >> > +               ipvlan->dhcp4_seen = true;
> > >> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, true);
> > >> >> >> > +       }
> > >> >> >> > +
> > >> >> >> >         if (ether_addr_equal(eth->h_dest, eth->h_source)) {
> > >> >> >> > -               lyr3h = ipvlan_get_L3_hdr(skb, &addr_type);
> > >> >> >> >                 if (lyr3h) {
> > >> >> >> >                         addr = ipvlan_addr_lookup(ipvlan->port, lyr3h, addr_type, true);
> > >> >> >> >                         if (addr)
> > >> >> >> > diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c
> > >> >> >> > index 4f4099d..a497fb9 100644
> > >> >> >> > --- a/drivers/net/ipvlan/ipvlan_main.c
> > >> >> >> > +++ b/drivers/net/ipvlan/ipvlan_main.c
> > >> >> >> > @@ -167,6 +167,8 @@ static int ipvlan_stop(struct net_device *dev)
> > >> >> >> >
> > >> >> >> >         dev_uc_del(phy_dev, phy_dev->dev_addr);
> > >> >> >> >
> > >> >> >> > +       ipvlan->dhcp4_seen = false;
> > >> >> >> > +
> > >> >> >> >         if (ipvlan->ipv6cnt > 0 || ipvlan->ipv4cnt > 0) {
> > >> >> >> >                 list_for_each_entry(addr, &ipvlan->addrs, anode)
> > >> >> >> >                         ipvlan_ht_addr_del(addr, !dev->dismantle);
> > >> >> >> > @@ -214,7 +216,7 @@ static void ipvlan_change_rx_flags(struct net_device *dev, int change)
> > >> >> >> >                 dev_set_allmulti(phy_dev, dev->flags & IFF_ALLMULTI? 1 : -1);
> > >> >> >> >  }
> > >> >> >> >
> > >> >> >> > -static void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> > >> >> >> > +void ipvlan_set_broadcast_mac_filter(struct ipvl_dev *ipvlan, bool set)
> > >> >> >> >  {
> > >> >> >> >         struct net_device *dev = ipvlan->dev;
> > >> >> >> >         unsigned int hashbit = ipvlan_mac_hash(dev->broadcast);
> > >> >> >> > @@ -239,6 +241,9 @@ static void ipvlan_set_multicast_mac_filter(struct net_device *dev)
> > >> >> >> >                 netdev_for_each_mc_addr(ha, dev)
> > >> >> >> >                         __set_bit(ipvlan_mac_hash(ha->addr), mc_filters);
> > >> >> >> >
> > >> >> >> > +               if (ipvlan->ipv4cnt || ipvlan->dhcp4_seen)
> > >> >> >> > +                       __set_bit(ipvlan_mac_hash(dev->broadcast), mc_filters);
> > >> >> >> > +
> > >> >> >> >                 bitmap_copy(ipvlan->mac_filters, mc_filters,
> > >> >> >> >                             IPVLAN_MAC_FILTER_SIZE);
> > >> >> >> >         }
> > >> >> >> > @@ -467,6 +472,7 @@ static int ipvlan_link_new(struct net *src_net, struct net_device *dev,
> > >> >> >> >         INIT_LIST_HEAD(&ipvlan->addrs);
> > >> >> >> >         ipvlan->ipv4cnt = 0;
> > >> >> >> >         ipvlan->ipv6cnt = 0;
> > >> >> >> > +       ipvlan->dhcp4_seen = false;
> > >> >> >> >
> > >> >> >> >         /* TODO Probably put random address here to be presented to the
> > >> >> >> >          * world but keep using the physical-dev address for the outgoing
> > >> >> >> > @@ -708,8 +714,8 @@ static void ipvlan_del_addr4(struct ipvl_dev *ipvlan, struct in_addr *ip4_addr)
> > >> >> >> >         list_del_rcu(&addr->anode);
> > >> >> >> >         ipvlan->ipv4cnt--;
> > >> >> >> >         WARN_ON(ipvlan->ipv4cnt < 0);
> > >> >> >> > -       if (!ipvlan->ipv4cnt)
> > >> >> >> > -           ipvlan_set_broadcast_mac_filter(ipvlan, false);
> > >> >> >> > +       if (!ipvlan->ipv4cnt && !ipvlan->dhcp4_seen)
> > >> >> >> > +               ipvlan_set_broadcast_mac_filter(ipvlan, false);
> > >> >> >> >         kfree_rcu(addr, rcu);
> > >> >> >> >
> > >> >> >> >         return;
> > >> >> >> > --
> > >> >> >> > 2.1.0
> > >> >> >> >
> > >> >> >> >
> > >> >> >> --
> > >> >> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> > >> >> >> the body of a message to majordomo@vger.kernel.org
> > >> >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >> >> >
> > >> >> >
> > >> >> --
> > >> >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> > >> >> the body of a message to majordomo@vger.kernel.org
> > >> >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >> >
> > >> >
> > >> --
> > >> To unsubscribe from this list: send the line "unsubscribe netdev" in
> > >> the body of a message to majordomo@vger.kernel.org
> > >> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > >
> > >
> > --
> > To unsubscribe from this list: send the line "unsubscribe netdev" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe netdev" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-09 15:51                           ` Dan Williams
@ 2015-04-09 16:01                             ` Eric Dumazet
  2015-04-09 16:33                             ` Mahesh Bandewar
  1 sibling, 0 replies; 37+ messages in thread
From: Eric Dumazet @ 2015-04-09 16:01 UTC (permalink / raw)
  To: Dan Williams; +Cc: Mahesh Bandewar, linux-netdev, jbenc

On Thu, 2015-04-09 at 10:51 -0500, Dan Williams wrote:

> Any thoughts on this Mahesh?  Is this non-DHCP approach more to your
> liking?  If so I'll generate an actual patch and do some testing.

Pleas guys trim messages when replying. Otherwise people are bored and
do not even try to read them anymore.

Thanks.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-09 15:51                           ` Dan Williams
  2015-04-09 16:01                             ` Eric Dumazet
@ 2015-04-09 16:33                             ` Mahesh Bandewar
  2015-04-09 22:18                               ` Dan Williams
  1 sibling, 1 reply; 37+ messages in thread
From: Mahesh Bandewar @ 2015-04-09 16:33 UTC (permalink / raw)
  To: Dan Williams; +Cc: linux-netdev, jbenc

>> You're right.  I mis-read your proposal above.
>>
>> But having re-read it, are you proposing a 2m timer on interface up?  If
>> so (and ignore this if not) I don't think that works well either,
>> because there's no guarantee that interface configuration will happen
>> only close to interface up.  Maybe userspace adds IPv6 addresses
>> initially, and then later tries to do DHCP for some reason.  We simply
>> cannot rely on specific ordering of operations that userspace might want
>> to do.
>>
>> If you don't mean a 2m timer from interface up, then ignore that, and
>> then what kind of time do you mean? :)
>>
>> > If the link is stripped off of address(es), then it should again go
>> > back into the config-mode where it would turn on the broadcast bit and
>> > enable timer to get it configured. May be I'm missing something.
>>
>> As above, I was wrong about the DHCPv4 lease expiration thing, but this
>> may still run afoul of userspace operation ordering if you are talking
>> about a 2m timer from interface up.
>>
>> Just had another thought though; what if instead of snooping for all the
>> DHCP stuff, the code just snooped outgoing IPv4 packets for a broadcast
>> destination address?  Then turn on the broadcast bit filter for 2m.
>> That would look something like this:
>>
>> static bool is_bcast4(struct sk_buff *skb)
>> {
>>       struct iphdr *ip4h;
>>
>>       switch (skb->protocol) {
>>       case htons(ETH_P_ALL):
>>               /* Raw sockets */
>>               if (eth_hdr(skb)->h_proto != htons(ETH_P_IP))
>>                       break;
>>               /* Fall through */
>>       case htons(ETH_P_IP):
>>               if (unlikely(!pskb_may_pull(skb, sizeof(*ip4h))))
>>                       return NULL;
>>               ip4h = ip_hdr(skb);
>>               if (ip4h->ihl < 5 || ip4h->version != 4)
>>                       return NULL;
>>               return ip4h->daddr == INADDR_BROADCAST;
>>       }
>>       return false;
>> }
>>
>> static int ipvlan_xmit_mode_l2(...)
>> {
>>       if (!ipvlan->ipv4cnt && !ipvlan->bcast4_seen && is_bcast4(skb)) {
>>               ipvlan->bcast4_seen = true;
>>               ipvlan_set_broadcast_mac_filter(ipvlan, true);
>>       }
>>
>> Yes, it's still snooping for all packets, but it's a lot fewer compares
>> than looking for DHCP specifically.
>
> Any thoughts on this Mahesh?  Is this non-DHCP approach more to your
> liking?  If so I'll generate an actual patch and do some testing.
>
Sorry about the late reply but some how I missed that.

Yes, this looks better, but I'm thinking of solving this issue with a
different approach which does not involve snooping. Or if it does, it
wont be in a fast path! It is sort of falling back to your original
patch which eliminates setting / resetting the broadcast bit but after
deferring the broadcast / multicast processing to a work-queue. This
will keep the fast path clean and we can do all sort of jugglery in
work-queue (if needed) without affecting the performance of the device
(fast path) whether it's IPv6 or IPv4 traffic.

Also Eric pointed out how multicast is broken in IPvlan, that needs a
fix too. I have cooked something, but needs some testing, I'll push
out those patches as soon as I'm happy with it's testing.

Thanks,
--mahesh..


> Dan
>

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP
  2015-04-09 16:33                             ` Mahesh Bandewar
@ 2015-04-09 22:18                               ` Dan Williams
  0 siblings, 0 replies; 37+ messages in thread
From: Dan Williams @ 2015-04-09 22:18 UTC (permalink / raw)
  To: Mahesh Bandewar; +Cc: linux-netdev, jbenc

On Thu, 2015-04-09 at 09:33 -0700, Mahesh Bandewar wrote:
> >> Yes, it's still snooping for all packets, but it's a lot fewer compares
> >> than looking for DHCP specifically.
> >
> > Any thoughts on this Mahesh?  Is this non-DHCP approach more to your
> > liking?  If so I'll generate an actual patch and do some testing.
> >
> Sorry about the late reply but some how I missed that.
> 
> Yes, this looks better, but I'm thinking of solving this issue with a
> different approach which does not involve snooping. Or if it does, it
> wont be in a fast path! It is sort of falling back to your original
> patch which eliminates setting / resetting the broadcast bit but after
> deferring the broadcast / multicast processing to a work-queue. This
> will keep the fast path clean and we can do all sort of jugglery in
> work-queue (if needed) without affecting the performance of the device
> (fast path) whether it's IPv6 or IPv4 traffic.
> 
> Also Eric pointed out how multicast is broken in IPvlan, that needs a
> fix too. I have cooked something, but needs some testing, I'll push
> out those patches as soon as I'm happy with it's testing.

I'm quite happy to test those patches when you post them too.

Dan

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2015-04-09 22:18 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-26 22:41 [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters Dan Williams
2015-03-26 22:43 ` [PATCH 2/2] ipvlan: always allow the broadcast MAC address Dan Williams
2015-03-27 17:46   ` Jiri Benc
2015-03-28  0:52   ` Mahesh Bandewar
2015-03-28  5:56     ` Mahesh Bandewar
2015-03-28 18:32       ` Jiri Benc
2015-03-30 14:37         ` Dan Williams
2015-03-30 16:54           ` Mahesh Bandewar
2015-03-30 17:44             ` Dan Williams
2015-03-30 17:56               ` Mahesh Bandewar
2015-03-30 18:13                 ` Dan Williams
2015-03-30 18:32                   ` Mahesh Bandewar
2015-03-27 17:45 ` [PATCH 1/2] ipvlan: don't loose broadcast MAC when setting MAC filters Jiri Benc
2015-03-30 20:28 ` Mahesh Bandewar
2015-03-30 21:01   ` Dan Williams
2015-03-30 21:11     ` Mahesh Bandewar
2015-03-31  3:05       ` Dan Williams
2015-03-31  4:22         ` [PATCH] ipvlan: fix up broadcast MAC filtering for ARP and DHCP Dan Williams
2015-04-01 20:07           ` Dan Williams
2015-04-01 20:24           ` Mahesh Bandewar
2015-04-01 20:55             ` Dan Williams
2015-04-02  1:30               ` Mahesh Bandewar
2015-04-02 14:40                 ` Dan Williams
2015-04-03  1:39                   ` Mahesh Bandewar
2015-04-06 17:17                     ` Dan Williams
2015-04-07 18:32                       ` Mahesh Bandewar
2015-04-07 19:45                         ` Dan Williams
2015-04-09 15:51                           ` Dan Williams
2015-04-09 16:01                             ` Eric Dumazet
2015-04-09 16:33                             ` Mahesh Bandewar
2015-04-09 22:18                               ` Dan Williams
2015-04-08  9:37                       ` David Laight
2015-04-08 14:12                         ` Dan Williams
2015-04-09  1:08                         ` Mahesh Bandewar
2015-04-02 18:16           ` David Miller
2015-04-02 18:39             ` Dan Williams
2015-04-02 18:46               ` Dan Williams

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.