From: David Ahern <dsa@cumulusnetworks.com>
To: Daniel Borkmann <daniel@iogearbox.net>, davem@davemloft.net
Cc: netdev@vger.kernel.org, Mahesh Bandewar <maheshb@google.com>,
	Florian Westphal <fw@strlen.de>, Martynas Pumputis <m@lambda.lt>
Subject: Re: [PATCH net] ipvlan, l3mdev: fix broken l3s mode wrt local routes
Date: Wed, 30 Jan 2019 15:24:17 -0700
Message-ID: <2c0c0ea4-274f-b19e-8c4e-71940243bba9@cumulusnetworks.com>
In-Reply-To: <20190130114948.24227-1-daniel@iogearbox.net>

On 1/30/19 4:49 AM, Daniel Borkmann wrote:
> While implementing ipvlan l3 and l3s mode for a Kubernetes CNI plugin,
> I ran into the issue that while l3 mode is working fine, l3s mode
> does not have any connectivity to kube-apiserver and hence all pods
> end up in Error state as well. The ipvlan master device sits on
> top of a bond device and hostns traffic to kube-apiserver (also running
> in hostns) is DNATed from 10.152.183.1:443 to 139.178.29.207:37573
> where the latter is the address of the bond0. While in l3 mode, a
> curl to https://10.152.183.1:443 or to https://139.178.29.207:37573
> works fine from hostns, neither of them does in case of l3s. In the
> latter, only a curl to https://127.0.0.1:37573 appeared to work, whereas
> for local addresses of bond0 I saw the kernel suddenly starting to emit
> ARP requests to query the HW address of bond0, which remained unanswered
> and left neighbor entries in INCOMPLETE state. These ARP requests only
> happen while in l3s.
> 
> Debugging this further, I found the issue is that l3s mode is piggy-
> backing on l3 master device, and in this case local routes are using
> l3mdev_master_dev_rcu(dev) instead of net->loopback_dev as per commit
> f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev
> if relevant") and 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be
> a loopback"). I found that reverting them to use
> net->loopback_dev again fixed ipvlan l3s connectivity and got everything
> working for the CNI.
> 
> Now judging from 4fbae7d83c98 ("ipvlan: Introduce l3s mode") and the
> l3mdev paper in [0], the sole reason why ipvlan l3s is relying
> on l3 master device is to get the l3mdev_ip_rcv() receive hook for
> setting the dst entry of the input route without adding its own
> ipvlan specific hacks into the receive path. However, any l3 domain
> semantics beyond just that are breaking l3s operation. Note that
> ipvlan also has the ability to dynamically switch its internal
> operation from l3 to l3s for all ports via ipvlan_set_port_mode()
> at runtime. In any case, l3 vs l3s solely distinguishes itself by
> 'de-confusing' netfilter through switching skb->dev to ipvlan slave
> device late in NF_INET_LOCAL_IN before handing the skb to L4.
> 
> Minimal fix taken here is to add an IFF_L3MDEV_RX_HANDLER flag which,
> if set from ipvlan setup, gets us only the wanted l3mdev_l3_rcv() hook
> without any additional l3mdev semantics on top. This should also have
> minimal impact since dev->priv_flags is already hot in cache. With
> this set, l3s mode is working fine and I also get things like
> masquerading pod traffic on the ipvlan master properly working.
> 
>   [0] https://netdevconf.org/1.2/papers/ahern-what-is-l3mdev-paper.pdf
> 
> Fixes: f5a0aab84b74 ("net: ipv4: dst for local input routes should use l3mdev if relevant")
> Fixes: 5f02ce24c269 ("net: l3mdev: Allow the l3mdev to be a loopback")
> Fixes: 4fbae7d83c98 ("ipvlan: Introduce l3s mode")
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Mahesh Bandewar <maheshb@google.com>
> Cc: David Ahern <dsa@cumulusnetworks.com>
> Cc: Florian Westphal <fw@strlen.de>
> Cc: Martynas Pumputis <m@lambda.lt>
> ---
>  drivers/net/ipvlan/ipvlan_main.c | 6 +++---
>  include/linux/netdevice.h        | 8 ++++++++
>  include/net/l3mdev.h             | 3 ++-
>  3 files changed, 13 insertions(+), 4 deletions(-)
> 
I am not surprised that ipvlan needs a finer grained selection of the
l3mdev hooks.
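
For anyone following along, the finer grained selection can be modelled
in userspace roughly as below. This is only a sketch under stated
assumptions, not the patch itself: the struct, the flag values and the
helper functions are simplified stand-ins written for illustration, and
only the IFF_L3MDEV_RX_HANDLER name comes from the commit message. It
shows the two decisions being separated: which device's receive hook
sees the packet, and which device a local input route is bound to.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag bits; the values are placeholders, not the kernel's. */
enum {
	IFF_L3MDEV_MASTER     = 1u << 0,
	IFF_L3MDEV_SLAVE      = 1u << 1,
	IFF_L3MDEV_RX_HANDLER = 1u << 2, /* receive hook only, no l3 domain */
};

/* Simplified stand-in for the kernel's struct net_device. */
struct net_device {
	unsigned int priv_flags;
	struct net_device *master; /* upper l3 master if enslaved, else NULL */
};

static bool is_l3_master(const struct net_device *dev)
{
	return dev->priv_flags & IFF_L3MDEV_MASTER;
}

static bool is_l3_slave(const struct net_device *dev)
{
	return dev->priv_flags & IFF_L3MDEV_SLAVE;
}

static bool has_l3_rx_handler(const struct net_device *dev)
{
	return dev->priv_flags & IFF_L3MDEV_RX_HANDLER;
}

/* Device whose l3 receive hook should run for a packet arriving on dev. */
static const struct net_device *l3_rcv_hook_dev(const struct net_device *dev)
{
	if (is_l3_slave(dev))
		return dev->master;
	if (is_l3_master(dev) || has_l3_rx_handler(dev))
		return dev;
	return NULL; /* no hook */
}

/*
 * Device a local input route is bound to. Only real l3 master/slave
 * devices override the loopback device; the rx-handler-only flag is
 * deliberately ignored here.
 */
static const struct net_device *
local_route_dev(const struct net_device *dev, const struct net_device *loopback)
{
	if (is_l3_slave(dev))
		return dev->master;
	if (is_l3_master(dev))
		return dev;
	return loopback;
}
```

In this model a device carrying only IFF_L3MDEV_RX_HANDLER still gets
its receive hook called, while its local routes stay on the loopback
device, which is the behaviour the commit message describes for l3s.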

Acked-by: David Ahern <dsa@cumulusnetworks.com>
