netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Multicast from underlying MACVLAN interface towards MACVLAN
@ 2020-05-12 14:23 Alexander Sverdlin
  0 siblings, 0 replies; only message in thread
From: Alexander Sverdlin @ 2020-05-12 14:23 UTC (permalink / raw)
  To: netdev
  Cc: David S. Miller, Eric Dumazet, linux-kernel, Vlad Yasevich,
	Jiri Pirko, Arnd Bergmann, Matija Glavinic Pecotic

Dear Network Core developers!

I've been debugging an issue with Multicast replies from underlying
interface of MACVLAN towards MACVLAN. These SKBs never contain a MAC header
and therefore cannot be properly processed by MACVLAN.

The usecase is following:
eth1 <-- eth1.212 <-- macvlan@eth1.212 (in bridge mode)

As I understand the problem, it actually plays no role, that there is an intermediate VLAN interface.
The problem is, if macvlan@eth1.212 sends Router Solicitation these SKBs are received on eth1.212,
but the corresponding multicast Router Advertisements are not received on macvlan@eth1.212.

I've tracked the problem down to the following incompatibility between MACVLAN code and IP code...

One the one hand, MACVLAN always expects ethernet header:

static rx_handler_result_t macvlan_handle_frame(struct sk_buff **pskb)                                                                                                          
{                                                                                                                                                                               
        struct macvlan_port *port;                                                                                                                                              
        struct sk_buff *skb = *pskb;                                                                                                                                            
        const struct ethhdr *eth = eth_hdr(skb);                                                                                                                                
        ...
                                                                                                                                                                                
        port = macvlan_port_get_rcu(skb->dev);                                                                                                                                  
        if (is_multicast_ether_addr(eth->h_dest)) {                                                                                                                             

One the other hand, IP doesn't populate ethernet header for multicast loopback transmission:

int dev_loopback_xmit(struct net *net, struct sock *sk, struct sk_buff *skb)                                                                                                    
{                                                                                                                                                                               
        skb_reset_mac_header(skb);                                                                                                                                              
        __skb_pull(skb, skb_network_offset(skb));                                                                                                                               
        skb->pkt_type = PACKET_LOOPBACK;                                                                                                                                        
        skb->ip_summed = CHECKSUM_UNNECESSARY;                                                                                                                                  
        WARN_ON(!skb_dst(skb));                                                                                                                                                 
        skb_dst_force(skb);                                                                                                                                                     
        netif_rx_ni(skb);                                                                                                                                                       

Unicast however works fine, because of:

int neigh_connected_output(struct neighbour *neigh, struct sk_buff *skb)                                                                                                        
{                                                                                                                                                                               
        struct net_device *dev = neigh->dev;                                                                                                                                    
        unsigned int seq;                                                                                                                                                       
        int err;                                                                                                                                                                
                                                                                                                                                                                
        do {                                                                                                                                                                    
                __skb_pull(skb, skb_network_offset(skb));                                                                                                                       
                seq = read_seqbegin(&neigh->ha_lock);                                                                                                                           
                err = dev_hard_header(skb, dev, ntohs(skb->protocol),                                                                                                           
                                      neigh->ha, NULL, skb->len);                                                                                                               
        } while (read_seqretry(&neigh->ha_lock, seq));                                                                                                                          
                                                                                                                                                                                
        if (err >= 0)                                                                                                                                                           
                err = dev_queue_xmit(skb);                                                                                                                                      

I've also collected some stack traces and SKB dumps to illustrate the problem
(I've instrumented macvlan_handle_frame() and eth_header() to understand when
the ethernet header has been generated):

macvlan_handle_frame() receives Router Advertisement, but cannot forward
without Ethernet header:

skb len=96 headroom=40 headlen=96 tailroom=56
mac=(40,0) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=1 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=5 iif=24
dev name=etha01.212 feat=0x0x0000000040005000
skb headroom: 00000000: 00 28 b3 4d 84 88 ff ff b2 72 b9 5e 00 00 00 00
skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
skb headroom: 00000020: 08 0f 00 00 00 00 00 00
skb linear:   00000000: 60 09 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00
skb linear:   00000010: 00 40 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00
skb linear:   00000020: 00 00 00 00 00 00 00 01 86 00 61 00 40 00 00 2d
skb linear:   00000030: 00 00 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c
skb linear:   00000040: 00 00 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81
skb linear:   00000050: 00 00 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: 00 f0 01 00 00 00 00 00 a4 73 00 00 00 00 00 00
skb tailroom: 00000010: a4 73 00 00 00 00 00 00 00 10 00 00 00 00 00 00
skb tailroom: 00000020: 01 00 00 00 06 00 00 00 40 66 02 00 00 00 00 00
skb tailroom: 00000030: 40 76 02 00 00 00 00 00

Call Trace:
 <IRQ>
 dump_stack+0x69/0x9b
 macvlan_handle_frame+0x321/0x425 [macvlan]
 ? macvlan_forward_source+0x110/0x110 [macvlan]
 __netif_receive_skb_core+0x545/0xda0
 ? ip6_mc_input+0x103/0x250 [ipv6]
 ? ipv6_rcv+0xe1/0xf0 [ipv6]
 ? __netif_receive_skb_one_core+0x36/0x70
 __netif_receive_skb_one_core+0x36/0x70
 process_backlog+0x97/0x140
 net_rx_action+0x1eb/0x350
 __do_softirq+0xe3/0x383
 do_softirq_own_stack+0x2a/0x40
 </IRQ>
 do_softirq.part.4+0x4e/0x50
 netif_rx_ni+0x60/0xd0
 dev_loopback_xmit+0x83/0xf0
 ip6_finish_output2+0x575/0x590 [ipv6]
 ? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
 ? __ip6_make_skb+0x38d/0x680 [ipv6]
 ? ip6_output+0x6c/0x140 [ipv6]
 ip6_output+0x6c/0x140 [ipv6]
 ip6_send_skb+0x1e/0x60 [ipv6]
 rawv6_sendmsg+0xc4b/0xe10 [ipv6]
 ? proc_put_long+0xd0/0xd0
 ? rw_copy_check_uvector+0x4e/0x110
 ? sock_sendmsg+0x36/0x40
 sock_sendmsg+0x36/0x40
 ___sys_sendmsg+0x2b6/0x2d0
 ? proc_dointvec+0x23/0x30
 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
 ? dev_forward_change+0x130/0x130 [ipv6]
 ? _raw_spin_unlock+0x12/0x30
 ? proc_sys_call_handler.isra.14+0x9f/0x110
 ? __call_rcu+0x213/0x510
 ? get_max_files+0x10/0x10
 ? trace_hardirqs_on+0x2c/0xe0
 ? __sys_sendmsg+0x63/0xa0
 __sys_sendmsg+0x63/0xa0
 do_syscall_64+0x6c/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

Later when the same RA is being transmitted neigh_connected_output(), this is the first
time Ethernet header is being generated for this packet, but this is towards "world", not
the internal MACVLAN bridge:

skb len=110 headroom=26 headlen=110 tailroom=56
mac=(-1,-1) net=(40,40) trans=80
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0xae2e9a2f ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0xc97ebd88 sw=1 l4=1) proto=0x86dd pkttype=0 iif=0
dev name=etha01.212 feat=0x0x0000000040005000
sk family=10 type=3 proto=58
skb headroom: 00000000: 00 28 b3 4d 84 88 ff ff b2 72 b9 5e 00 00 00 00
skb headroom: 00000010: 00 00 00 00 00 00 00 00 00 00
skb linear:   00000000: 33 33 00 00 00 01 02 40 43 80 00 00 86 dd 60 09
skb linear:   00000010: 88 bd 00 38 3a ff fe 80 00 00 00 00 00 00 00 40
skb linear:   00000020: 43 ff fe 80 00 00 ff 02 00 00 00 00 00 00 00 00
skb linear:   00000030: 00 00 00 00 00 01 86 00 61 00 40 00 00 2d 00 00
skb linear:   00000040: 00 00 00 00 00 00 03 04 40 e0 00 00 01 2c 00 00
skb linear:   00000050: 00 78 00 00 00 00 fd 5f 42 68 23 87 a8 81 00 00
skb linear:   00000060: 00 00 00 00 00 00 01 01 02 40 43 80 00 00
skb tailroom: 00000000: 00 f0 01 00 00 00 00 00 a4 73 00 00 00 00 00 00
skb tailroom: 00000010: a4 73 00 00 00 00 00 00 00 10 00 00 00 00 00 00
skb tailroom: 00000020: 01 00 00 00 06 00 00 00 40 66 02 00 00 00 00 00
skb tailroom: 00000030: 40 76 02 00 00 00 00 00

Call Trace:
 dump_stack+0x69/0x9b
 debug_hdr+0x4c/0x60
 eth_header+0x71/0xe0
 vlan_dev_hard_header+0x58/0x140 [8021q]
 neigh_connected_output+0xa9/0x100
 ip6_finish_output2+0x24a/0x590 [ipv6]
 ? ip6_cork_release.isra.1+0x64/0x90 [ipv6]
 ? __ip6_make_skb+0x38d/0x680 [ipv6]
 ? ip6_output+0x6c/0x140 [ipv6]
 ip6_output+0x6c/0x140 [ipv6]
 ip6_send_skb+0x1e/0x60 [ipv6]
 rawv6_sendmsg+0xc4b/0xe10 [ipv6]
 ? proc_put_long+0xd0/0xd0
 ? rw_copy_check_uvector+0x4e/0x110
 ? sock_sendmsg+0x36/0x40
 sock_sendmsg+0x36/0x40
 ___sys_sendmsg+0x2b6/0x2d0
 ? proc_dointvec+0x23/0x30
 ? addrconf_sysctl_forward+0x8d/0x250 [ipv6]
 ? dev_forward_change+0x130/0x130 [ipv6]
 ? _raw_spin_unlock+0x12/0x30
 ? proc_sys_call_handler.isra.14+0x9f/0x110
 ? __call_rcu+0x213/0x510
 ? get_max_files+0x10/0x10
 ? trace_hardirqs_on+0x2c/0xe0
 ? __sys_sendmsg+0x63/0xa0
 __sys_sendmsg+0x63/0xa0
 do_syscall_64+0x6c/0x1e0
 entry_SYSCALL_64_after_hwframe+0x49/0xbe

I would appreciate any hint, how to approach this problem! I can try to come up with a patch,
but as this is so central thing in the IP protocol, I'd like to hear some opinions first...

-- 
Best regards,
Alexander Sverdlin.

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2020-05-12 14:23 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-05-12 14:23 Multicast from underlying MACVLAN interface towards MACVLAN Alexander Sverdlin

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).