netdev.vger.kernel.org archive mirror
From: Alexander Sverdlin <alexander.sverdlin@nokia.com>
To: netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	Vlad Yasevich <vyasevich@gmail.com>,
	Jiri Pirko <jpirko@redhat.com>, Arnd Bergmann <arnd@arndb.de>,
	Matija Glavinic Pecotic <matija.glavinic-pecotic.ext@nokia.com>
Subject: Multicast from underlying MACVLAN interface towards MACVLAN
Date: Tue, 12 May 2020 16:23:11 +0200
Message-ID: <8e6e5260-9359-eddd-c928-dba487f1319b@nokia.com>

Dear Network Core developers!

I've been debugging an issue with multicast replies from the underlying
interface of a MACVLAN towards the MACVLAN. These SKBs never contain a MAC
header and therefore cannot be properly processed by MACVLAN.

The use case is the following:
eth1 <-- eth1.212 <-- macvlan@eth1.212 (in bridge mode)

As far as I understand the problem, the intermediate VLAN interface actually plays no role.
The problem is that when macvlan@eth1.212 sends a Router Solicitation, these SKBs are received
on eth1.212, but the corresponding multicast Router Advertisements are not received on macvlan@eth1.212.

I've tracked the problem down to the following incompatibility between MACVLAN code and IP code...

On the one hand, MACVLAN always expects an Ethernet header:

static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)
{
        struct macvlan_port *port;
        struct sk_buff *skb = *pskb;
        const struct ethhdr *eth = eth_hdr(skb);
        ...

        port = macvlan_port_get_rcu(skb->dev);
        if (is_multicast_ether_addr(eth->h_dest)) {

On the other hand, IP doesn't populate the Ethernet header for multicast loopback transmission:

int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)
{
        skb_reset_mac_header(skb);
        __skb_pull(skb, skb_network_offset(skb));
        skb->pkt_type = PACKET_LOOPBACK;
        skb->ip_summed = CHECKSUM_UNNECESSARY;
        WARN_ON(!skb_dst(skb));
        skb_dst_force(skb);
        netif_rx_ni(skb);

Unicast, however, works fine, because neigh_connected_output() builds the link-layer header
before transmission:

int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb)
{
        struct net_device *dev = neigh->dev;
        unsigned int seq;
        int err;

        do {
                __skb_pull(skb, skb_network_offset(skb));
                seq = read_seqbegin(&neigh->ha_lock);
                err = dev_hard_header(skb, dev, ntohs(skb->protocol),
                                      neigh->ha, NULL, skb->len);
        } while (read_seqretry(&neigh->ha_lock, seq));

        if (err >= 0)
                err = dev_queue_xmit(skb);
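
The difference between the two paths can be modelled with a toy userspace
sketch. This is not kernel code: struct toy_skb and all toy_* helpers are
made up for illustration and only track offsets, and the assumption is that
the Ethernet header is added by a plain 14-byte push, as eth_header() does.
In the loopback path the MAC header offset ends up equal to the network
header offset; in the unicast path 14 bytes are taken out of the headroom.

```c
#include <stddef.h>

/* Toy stand-in for the few sk_buff offsets that matter here.
 * Hypothetical type, not the kernel's struct sk_buff. */
struct toy_skb {
        size_t data;            /* offset of data start from buffer head (headroom) */
        size_t len;             /* number of data bytes */
        size_t mac_header;      /* offset of the MAC header from buffer head */
        size_t network_header;  /* offset of the network header from buffer head */
};

#define TOY_ETH_HLEN 14         /* size of an untagged Ethernet header */

/* Like skb_reset_mac_header(): the MAC header is declared to start at data. */
static void toy_reset_mac_header(struct toy_skb *skb)
{
        skb->mac_header = skb->data;
}

/* Like skb_network_offset(): distance from data to the network header. */
static size_t toy_network_offset(const struct toy_skb *skb)
{
        return skb->network_header - skb->data;
}

/* Like __skb_pull(): drop n bytes from the front of the data. */
static void toy_pull(struct toy_skb *skb, size_t n)
{
        skb->data += n;
        skb->len -= n;
}

/* Like skb_push(): prepend n bytes taken from the headroom. */
static void toy_push(struct toy_skb *skb, size_t n)
{
        skb->data -= n;
        skb->len += n;
}

/* Loopback path, following the order of calls in dev_loopback_xmit(). */
static void toy_loopback(struct toy_skb *skb)
{
        toy_reset_mac_header(skb);
        toy_pull(skb, toy_network_offset(skb));
}

/* Unicast path: pull to the network header, then push the L2 header. */
static void toy_unicast(struct toy_skb *skb)
{
        toy_pull(skb, toy_network_offset(skb));
        toy_push(skb, TOY_ETH_HLEN);
}
```

Starting from data=40, len=96, network_header=40 (the values seen in the
dumps below), toy_loopback() leaves mac_header == network_header == 40 with
no L2 bytes in front of the IPv6 header, while toy_unicast() yields
headroom 26 and len 110.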

I've also collected some stack traces and SKB dumps to illustrate the problem
(I've instrumented macvlan_handle_frame() and eth_header() to understand when
the ethernet header has been generated):

macvlan_handle_frame() receives the Router Advertisement, but cannot forward
it without an Ethernet header:

skb len=96 headroom=40 headlen=96 tailroom=56
mac=(40,0) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=5 iif=24
dev name=etha01.212 feat=0x0x0000000040005000
skb headroom: 00000000: 00 28 b3 4d 84 88 ff ff b2 72 b9 5e 00 00 00 00
skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb headroom: 00000020: 08 0f 00 00 00 00 00 00
skb linear:   00000000: 60 09 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00
skb linear:   00000010: 00 40 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00
skb linear:   00000020: 00 00 00 00 00 00 00 01 86 00 61 00 40 00 00 2d
skb linear:   00000030: 00 00 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c
skb linear:   00000040: 00 00 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81
skb linear:   00000050: 00 00 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: 00 f0 01 00 00 00 00 00 a4 73 00 00 00 00 00 00
skb tailroom: 00000010: a4 73 00 00 00 00 00 00 00 10 00 00 00 00 00 00
skb tailroom: 00000020: 01 00 00 00 06 00 00 00 40 66 02 00 00 00 00 00
skb tailroom: 00000030: 40 76 02 00 00 00 00 00

Call Trace:
 <IRQ>
 dump_stack+0x69/0x9b
 macvlan_handle_frame+0x321/0x425 [macvlan]
 ? macvlan_forward_source+0x110/0x110 [macvlan]
 __netif_receive_skb_core+0x545/0xda0
 ? ip6_mc_input+0x103/0x250 [ipv6]
 ? ipv6_rcv+0xe1/0xf0 [ipv6]
 ? __netif_receive_skb_one_core+0x36/0x70
 __netif_receive_skb_one_core+0x36/0x70
 process_backlog+0x97/0x140
 net_rx_action+0x1eb/0x350
 __do_softirq+0xe3/0x383
 do_softirq_own_stack+0x2a/0x40
 </IRQ>
 do_softirq.part.4+0x4e/0x50
 netif_rx_ni+0x60/0xd0
 dev_loopback_xmit+0x83/0xf0
 ip6_finish_output2+0x575/0x590 [ipv6]
 ? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
 ? __ip6_make_skb+0x38d/0x680 [ipv6]
 ? ip6_output+0x6c/0x140 [ipv6]
 ip6_output+0x6c/0x140 [ipv6]
 ip6_send_skb+0x1e/0x60 [ipv6]
 rawv6_sendmsg+0xc4b/0xe10 [ipv6]
 ? proc_put_long+0xd0/0xd0
 ? rw_copy_check_uvector+0x4e/0x110
 ? sock_sendmsg+0x36/0x40
 sock_sendmsg+0x36/0x40
 ___sys_sendmsg+0x2b6/0x2d0
 ? proc_dointvec+0x23/0x30
 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
 ? dev_forward_change+0x130/0x130 [ipv6]
 ? _raw_spin_unlock+0x12/0x30
 ? proc_sys_call_handler.isra.14+0x9f/0x110
 ? __call_rcu+0x213/0x510
 ? get_max_files+0x10/0x10
 ? trace_hardirqs_on+0x2c/0xe0
 ? __sys_sendmsg+0x63/0xa0
 __sys_sendmsg+0x63/0xa0
 do_syscall_64+0x6c/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
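
One way to read the dump above, as a userspace sketch rather than kernel
code: mac=(40,0) and net=(40,40) say that the MAC header offset equals the
network header offset, so eth_hdr() would point at the IPv6 header itself.
The six bytes then taken for h_dest are 60 09 88 bd 00 38 (IPv6
version/traffic class/flow label plus payload length). The toy_* names and
arrays below are made up for illustration; the group-bit test is an
assumption about what is_multicast_ether_addr() boils down to.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Userspace stand-in for the kernel's is_multicast_ether_addr():
 * an Ethernet address is a group address if the lowest bit of the
 * first octet is set. */
static bool toy_is_multicast_ether_addr(const uint8_t addr[6])
{
        return (addr[0] & 0x01) != 0;
}

/* Format six octets as aa:bb:cc:dd:ee:ff (out needs 18 bytes incl. NUL). */
static void toy_format_mac(const uint8_t addr[6], char out[18])
{
        snprintf(out, 18, "%02x:%02x:%02x:%02x:%02x:%02x",
                 addr[0], addr[1], addr[2], addr[3], addr[4], addr[5]);
}

/* First 14 bytes of "skb linear" in the looped-back RA dump. This is the
 * start of the IPv6 header, but it is what an Ethernet header read at
 * the MAC header offset would see. */
static const uint8_t toy_looped_back_ra[14] = {
        0x60, 0x09, 0x88, 0xbd, 0x00, 0x38,     /* misread as h_dest   */
        0x3a, 0xff, 0xfe, 0x80, 0x00, 0x00,     /* misread as h_source */
        0x00, 0x00,                             /* misread as h_proto  */
};

/* Destination MAC that eth_header() generates later for the same RA
 * (taken from the second dump in this mail). */
static const uint8_t toy_real_dest[6] = {
        0x33, 0x33, 0x00, 0x00, 0x00, 0x01,
};
```

With these bytes, toy_is_multicast_ether_addr(toy_looped_back_ra) is false
because 0x60 has the group bit clear, so the multicast branch in
macvlan_handle_frame() would not be taken, whereas the real destination
33:33:00:00:00:01 has the bit set.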

Later, when the same RA is transmitted via neigh_connected_output(), the Ethernet header is
generated for this packet for the first time, but this is towards the "world", not
the internal MACVLAN bridge:

skb len=110 headroom=26 headlen=110 tailroom=56
mac=(-1,-1) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=0 iif=0
dev name=etha01.212 feat=0x0x0000000040005000
sk family=10 type=3 proto=58
skb headroom: 00000000: 00 28 b3 4d 84 88 ff ff b2 72 b9 5e 00 00 00 00
skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00
skb linear:   00000000: 33 33 00 00 00 01 02 40 43 80 00 00 86 dd 60 09
skb linear:   00000010: 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00 00 40
skb linear:   00000020: 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00 00 00
skb linear:   00000030: 00 00 00 00 00 01 86 00 61 00 40 00 00 2d 00 00
skb linear:   00000040: 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c 00 00
skb linear:   00000050: 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81 00 00
skb linear:   00000060: 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: 00 f0 01 00 00 00 00 00 a4 73 00 00 00 00 00 00
skb tailroom: 00000010: a4 73 00 00 00 00 00 00 00 10 00 00 00 00 00 00
skb tailroom: 00000020: 01 00 00 00 06 00 00 00 40 66 02 00 00 00 00 00
skb tailroom: 00000030: 40 76 02 00 00 00 00 00

Call Trace:
 dump_stack+0x69/0x9b
 debug_hdr+0x4c/0x60
 eth_header+0x71/0xe0
 vlan_dev_hard_header+0x58/0x140 [8021q]
 neigh_connected_output+0xa9/0x100
 ip6_finish_output2+0x24a/0x590 [ipv6]
 ? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
 ? __ip6_make_skb+0x38d/0x680 [ipv6]
 ? ip6_output+0x6c/0x140 [ipv6]
 ip6_output+0x6c/0x140 [ipv6]
 ip6_send_skb+0x1e/0x60 [ipv6]
 rawv6_sendmsg+0xc4b/0xe10 [ipv6]
 ? proc_put_long+0xd0/0xd0
 ? rw_copy_check_uvector+0x4e/0x110
 ? sock_sendmsg+0x36/0x40
 sock_sendmsg+0x36/0x40
 ___sys_sendmsg+0x2b6/0x2d0
 ? proc_dointvec+0x23/0x30
 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
 ? dev_forward_change+0x130/0x130 [ipv6]
 ? _raw_spin_unlock+0x12/0x30
 ? proc_sys_call_handler.isra.14+0x9f/0x110
 ? __call_rcu+0x213/0x510
 ? get_max_files+0x10/0x10
 ? trace_hardirqs_on+0x2c/0xe0
 ? __sys_sendmsg+0x63/0xa0
 __sys_sendmsg+0x63/0xa0
 do_syscall_64+0x6c/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
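
For reference, the destination MAC 33:33:00:00:00:01 in this second dump
is what the standard IPv6-over-Ethernet multicast mapping (RFC 2464,
section 7) gives for the all-nodes address ff02::1: the prefix 33:33
followed by the last four octets of the IPv6 address. A minimal userspace
sketch of that mapping follows; toy_ipv6_eth_mc_map() is a made-up name
for illustration, not a claim about the kernel helper's exact signature.

```c
#include <stdint.h>
#include <string.h>

/* Map an IPv6 multicast address to its Ethernet group address as
 * described in RFC 2464, section 7: 33:33 followed by the low-order
 * 32 bits of the IPv6 address. Hypothetical helper for illustration. */
static void toy_ipv6_eth_mc_map(const uint8_t ip6[16], uint8_t mac[6])
{
        mac[0] = 0x33;
        mac[1] = 0x33;
        memcpy(mac + 2, ip6 + 12, 4);
}
```

Feeding it ff02::1 (the IPv6 destination visible in both dumps) produces
33:33:00:00:00:01, i.e. the header the looped-back copy would have needed
for MACVLAN to recognise it as multicast.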

I would appreciate any hints on how to approach this problem! I can try to come up with a patch,
but as this is such a central thing in the IP protocol, I'd like to hear some opinions first...

-- 
Best regards,
Alexander Sverdlin.
