All of lore.kernel.org
 help / color / mirror / Atom feed
* ipv6: handle -EFAULT from skb_copy_bits
@ 2016-12-14 15:47 Dave Jones
  2016-12-17 15:41 ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Jones @ 2016-12-14 15:47 UTC (permalink / raw)
  To: netdev

It seems to be possible to craft a packet for sendmsg that triggers
the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like:

RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
 [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25

Handle this in rawv6_push_pending_frames and jump to the failure path.

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 291ebc260e70..35aa82faa052 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -591,7 +591,9 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 	}
 
 	offset += skb_transport_offset(skb);
-	BUG_ON(skb_copy_bits(skb, offset, &csum, 2));
+	err = skb_copy_bits(skb, offset, &csum, 2);
+	if (err < 0)
+		goto out;
 
 	/* in case cksum was not initialized */
 	if (unlikely(csum))

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-14 15:47 ipv6: handle -EFAULT from skb_copy_bits Dave Jones
@ 2016-12-17 15:41 ` David Miller
  2016-12-17 16:43   ` Dave Jones
  2016-12-19 17:03   ` Dave Jones
  0 siblings, 2 replies; 22+ messages in thread
From: David Miller @ 2016-12-17 15:41 UTC (permalink / raw)
  To: davej; +Cc: netdev

From: Dave Jones <davej@codemonkey.org.uk>
Date: Wed, 14 Dec 2016 10:47:29 -0500

> It seems to be possible to craft a packet for sendmsg that triggers
> the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like:
> 
> RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
> RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
> RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
> RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
> RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
> R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
> R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
> 
> Call Trace:
>  [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
>  [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
>  [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
>  [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
>  [<ffffffff816da27e>] SyS_sendto+0xe/0x10
>  [<ffffffff81002910>] do_syscall_64+0x50/0xa0
>  [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
> 
> Handle this in rawv6_push_pending_frames and jump to the failure path.
> 
> Signed-off-by: Dave Jones <davej@codemonkey.org.uk>

Hmmm, that's interesting.  Becaue the code in __ip6_append_data(), which
sets up the ->cork.base.length value, seems to be defensively trying to
avoid this possibility.

For example, it checks things like:

	if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
	    (sk->sk_protocol == IPPROTO_UDP ||
	     sk->sk_protocol == IPPROTO_RAW)) {

This is why the transport offset plus the length should never exceed
the total length for that skb_copy_bits() call.

Perhaps this protocol check in the code above is incomplete?  Do you
know what the sk->sk_protocol value was when that BUG triggered?  That
might shine some light on what is really happening here.

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-17 15:41 ` David Miller
@ 2016-12-17 16:43   ` Dave Jones
  2016-12-19 17:03   ` Dave Jones
  1 sibling, 0 replies; 22+ messages in thread
From: Dave Jones @ 2016-12-17 16:43 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Sat, Dec 17, 2016 at 10:41:20AM -0500, David Miller wrote:
 > From: Dave Jones <davej@codemonkey.org.uk>
 > Date: Wed, 14 Dec 2016 10:47:29 -0500
 > 
 > > It seems to be possible to craft a packet for sendmsg that triggers
 > > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like:
 > > 
 > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
 > > RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
 > > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
 > > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
 > > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
 > > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
 > > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
 > > 
 > > Call Trace:
 > >  [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 > >  [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 > >  [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 > >  [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 > >  [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 > >  [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 > >  [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
 > > 
 > > Handle this in rawv6_push_pending_frames and jump to the failure path.
 > > 
 > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
 > 
 > Hmmm, that's interesting.  Becaue the code in __ip6_append_data(), which
 > sets up the ->cork.base.length value, seems to be defensively trying to
 > avoid this possibility.
 > 
 > For example, it checks things like:
 > 
 > 	if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
 > 	    (sk->sk_protocol == IPPROTO_UDP ||
 > 	     sk->sk_protocol == IPPROTO_RAW)) {
 > 
 > This is why the transport offset plus the length should never exceed
 > the total length for that skb_copy_bits() call.
 > 
 > Perhaps this protocol check in the code above is incomplete?  Do you
 > know what the sk->sk_protocol value was when that BUG triggered?  That
 > might shine some light on what is really happening here.

I'll see if I can craft up a reproducer next week.
For some reason I've not hit this on my test setup at home, but it
reproduces daily in our test setup at facebook.  The only thing
I can think of is that those fb boxes are ipv6 only, so I might be
exercising v4 more at home.

	Dave

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-17 15:41 ` David Miller
  2016-12-17 16:43   ` Dave Jones
@ 2016-12-19 17:03   ` Dave Jones
  2016-12-19 19:48     ` David Miller
  1 sibling, 1 reply; 22+ messages in thread
From: Dave Jones @ 2016-12-19 17:03 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Sat, Dec 17, 2016 at 10:41:20AM -0500, David Miller wrote:

 > > It seems to be possible to craft a packet for sendmsg that triggers
 > > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like:
 > > 
 > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
 > > RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
 > > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
 > > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
 > > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
 > > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
 > > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
 > > 
 > > Call Trace:
 > >  [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 > >  [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 > >  [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 > >  [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 > >  [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 > >  [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 > >  [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
 > > 
 > > Handle this in rawv6_push_pending_frames and jump to the failure path.
 > > 
 > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
 > 
 > Hmmm, that's interesting.  Becaue the code in __ip6_append_data(), which
 > sets up the ->cork.base.length value, seems to be defensively trying to
 > avoid this possibility.
 > 
 > For example, it checks things like:
 > 
 > 	if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
 > 	    (sk->sk_protocol == IPPROTO_UDP ||
 > 	     sk->sk_protocol == IPPROTO_RAW)) {
 > 
 > This is why the transport offset plus the length should never exceed
 > the total length for that skb_copy_bits() call.
 > 
 > Perhaps this protocol check in the code above is incomplete?  Do you
 > know what the sk->sk_protocol value was when that BUG triggered?  That
 > might shine some light on what is really happening here.

Hm.
  sk_protocol = 7, 






struct sock {
  __sk_common = {
    {
      skc_addrpair = 0, 
      {
        skc_daddr = 0, 
        skc_rcv_saddr = 0
      }
    }, 
    {
      skc_hash = 0, 
      skc_u16hashes = {0, 0}
    }, 
    {
      skc_portpair = 458752, 
      {
        skc_dport = 0, 
        skc_num = 7
      }
    }, 
    skc_family = 10, 
    skc_state = 7 '\a', 
    skc_reuse = 1 '\001', 
    skc_reuseport = 0 '\000', 
    skc_ipv6only = 0 '\000', 
    skc_net_refcnt = 1 '\001', 
    skc_bound_dev_if = 0, 
    {
      skc_bind_node = {
        next = 0x0, 
        pprev = 0x0
      }, 
      skc_portaddr_node = {
        next = 0x0, 
        pprev = 0x0
      }
    }, 
    skc_prot = 0xffffffff81cf3bc0 <rawv6_prot>, 
    skc_net = {
      net = 0xffffffff81ce78c0 <init_net>
    }, 
    skc_v6_daddr = {
      in6_u = {
        u6_addr8 = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
        u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, 
        u6_addr32 = {0, 0, 0, 0}
      }
    }, 
    }, 
    skc_v6_rcv_saddr = {
      in6_u = {
        u6_addr8 = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000", 
        u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, 
        u6_addr32 = {0, 0, 0, 0}
      }
    }, 
    skc_cookie = {
      counter = 0
    }, 
    {
      skc_flags = 256, 
      skc_listener = 0x100, 
      skc_tw_dr = 0x100
    }, 
    skc_dontcopy_begin = 0xffff881fd1ce9b68, 
    {
      skc_node = {
        next = 0x0, 
        pprev = 0x0
      }, 
      skc_nulls_node = {
        next = 0x0, 
        pprev = 0x0
      }
    }, 
    skc_tx_queue_mapping = -1, 
    {
      skc_incoming_cpu = -1, 
      skc_rcv_wnd = 4294967295, 
      skc_tw_rcv_nxt = 4294967295
    }, 
    skc_refcnt = {
      counter = 1
    }, 
    skc_dontcopy_end = 0xffff881fd1ce9b84, 
    {
      skc_rxhash = 0, 
      skc_window_clamp = 0, 
      skc_tw_snd_nxt = 0
    }
  }, 
  sk_lock = {
    slock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    }, 
    owned = 1, 
    wq = {
      lock = {
        {
          rlock = {
            raw_lock = {
              val = {
                counter = 0
              }
            }
          }
        }
      }, 
      task_list = {
        next = 0xffff881fd1ce9b98, 
        prev = 0xffff881fd1ce9b98
      }
    }
  }, 
  sk_receive_queue = {
    next = 0xffff881fd1ce9ba8, 
    prev = 0xffff881fd1ce9ba8, 
    qlen = 0, 
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    }
  }, 
  sk_backlog = {
    rmem_alloc = {
      counter = 0
    }, 
    len = 0, 
    head = 0x0, 
    tail = 0x0
  }, 
  sk_forward_alloc = 0, 
  sk_txhash = 0, 
  sk_napi_id = 0, 
  sk_ll_usec = 0, 
  sk_drops = {
    counter = 0
  }, 
  sk_rcvbuf = 1024000, 
  sk_filter = 0x0, 
  {
    sk_wq = 0xffff881fc4b3ab00, 
    sk_wq_raw = 0xffff881fc4b3ab00
  }, 
  sk_policy = {0x0, 0x0}, 
  sk_rx_dst = 0x0, 
  sk_dst_cache = 0x0, 
  sk_wmem_alloc = {
    counter = 769
  }, 
  sk_omem_alloc = {
    counter = 640
  }, 
  sk_sndbuf = 212992, 
  sk_write_queue = {
    next = 0xffff881f6fc60b00, 
    prev = 0xffff881f6fc60b00, 
    qlen = 1, 
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    }
  }, 
  sk_shutdown = 0, 
  sk_no_check_tx = 0, 
  sk_no_check_rx = 0, 
  sk_userlocks = 0, 
  sk_protocol = 7, 
  sk_type = 3, 
  sk_wmem_queued = 0, 
  sk_allocation = 37748928, 
  sk_pacing_rate = 4294967295, 
  sk_max_pacing_rate = 4294967295, 
  sk_route_caps = 0, 
  sk_route_nocaps = 0, 
  sk_gso_type = 0, 
  sk_gso_max_size = 0, 
  sk_gso_max_segs = 0, 
  sk_rcvlowat = 1, 
  sk_lingertime = 0, 
  sk_error_queue = {
    next = 0xffff881fd1ce9c88, 
    prev = 0xffff881fd1ce9c88, 
    qlen = 0, 
    lock = {
      {
        rlock = {
          raw_lock = {
            val = {
              counter = 0
            }
          }
        }
      }
    }
  }, 
  sk_prot_creator = 0xffffffff81cf3bc0 <rawv6_prot>, 
  sk_callback_lock = {
    raw_lock = {
      cnts = {
        counter = 0
      }, 
      wait_lock = {
        val = {
          counter = 0
        }
      }
    }
  }, 
  sk_err = 0, 
  sk_err_soft = 0, 
  sk_ack_backlog = 0, 
  sk_max_ack_backlog = 0, 
  sk_priority = 0, 
  sk_mark = 0, 
  sk_peer_pid = 0x0, 
  sk_peer_cred = 0x0, 
  sk_rcvtimeo = 9223372036854775807, 
  sk_sndtimeo = 9223372036854775807, 
  sk_timer = {
    entry = {
      next = 0x0, 
      pprev = 0x0
    }, 
    expires = 0, 
    function = 0x0, 
    data = 0, 
    flags = 4, 
    slack = -1, 
    start_pid = -1, 
    start_site = 0x0, 
    start_comm = "\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000"
  }, 
  sk_stamp = {
    tv64 = 1481510503238183349
  }, 
  sk_tsflags = 0, 
  sk_tskey = 0, 
  sk_socket = 0xffff881fd0d24b00, 
  sk_user_data = 0x0, 
  sk_frag = {
    page = 0x0, 
    offset = 0, 
    size = 0
  }, 
  sk_send_head = 0x0, 
  sk_peek_off = -1, 
  sk_write_pending = 0, 
  sk_security = 0x0, 
  sk_cgrp_data = {
    {
      {
        is_data = 48 '0', 
        padding = 109 'm', 
        prioidx = 33292, 
        classid = 4294967295
      }, 
      val = 18446744071596436784
    }
  }, 
  sk_memcg = 0x0, 
  sk_state_change = 0xffffffff816dbed0 <sock_def_wakeup>, 
  sk_data_ready = 0xffffffff816dcd20 <sock_def_readable>, 
  sk_write_space = 0xffffffff816dcc90 <sock_def_write_space>, 
  sk_error_report = 0xffffffff816dcc30 <sock_def_error_report>, 
  sk_backlog_rcv = 0xffffffff817c5690 <rawv6_rcv_skb>, 
  sk_destruct = 0xffffffff81772460 <inet_sock_destruct>, 
  sk_reuseport_cb = 0x0
}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-19 17:03   ` Dave Jones
@ 2016-12-19 19:48     ` David Miller
  2016-12-20  0:31       ` Dave Jones
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2016-12-19 19:48 UTC (permalink / raw)
  To: davej; +Cc: netdev

From: Dave Jones <davej@codemonkey.org.uk>
Date: Mon, 19 Dec 2016 12:03:20 -0500

> On Sat, Dec 17, 2016 at 10:41:20AM -0500, David Miller wrote:
> 
>  > > It seems to be possible to craft a packet for sendmsg that triggers
>  > > the -EFAULT path in skb_copy_bits resulting in a BUG_ON that looks like:
>  > > 
>  > > RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
>  > > RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
>  > > RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
>  > > RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
>  > > RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
>  > > R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
>  > > R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80
>  > > 
>  > > Call Trace:
>  > >  [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
>  > >  [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
>  > >  [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
>  > >  [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
>  > >  [<ffffffff816da27e>] SyS_sendto+0xe/0x10
>  > >  [<ffffffff81002910>] do_syscall_64+0x50/0xa0
>  > >  [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25
>  > > 
>  > > Handle this in rawv6_push_pending_frames and jump to the failure path.
>  > > 
>  > > Signed-off-by: Dave Jones <davej@codemonkey.org.uk>
>  > 
>  > Hmmm, that's interesting.  Becaue the code in __ip6_append_data(), which
>  > sets up the ->cork.base.length value, seems to be defensively trying to
>  > avoid this possibility.
>  > 
>  > For example, it checks things like:
>  > 
>  > 	if (cork->length + length > mtu - headersize && ipc6->dontfrag &&
>  > 	    (sk->sk_protocol == IPPROTO_UDP ||
>  > 	     sk->sk_protocol == IPPROTO_RAW)) {
>  > 
>  > This is why the transport offset plus the length should never exceed
>  > the total length for that skb_copy_bits() call.
>  > 
>  > Perhaps this protocol check in the code above is incomplete?  Do you
>  > know what the sk->sk_protocol value was when that BUG triggered?  That
>  > might shine some light on what is really happening here.
> 
> Hm.
>   sk_protocol = 7, 

So, some arbitrary value.

Obviously, the test above intends to handle RAW sockets and that is exactly
what we have here.

A raw socket gets whatever the user specified as 'protocol' at create
time as the sk->sk_protocol value.

One thing that's interesting is that if the user picks "IPPROTO_RAW"
as the value of 'protocol' we set inet->hdrincl to 1.

The user can also set inet->hdrincl to 1 or 0 via setsockopt().

I think this is part of the problem.  The test above means to check
for "RAW socket with hdrincl set" and is trying to do this more simply.
But because setsockopt() can set this arbitrarily, testing sk_protocol
alone isn't enough.

So changing:

	sk->sk_protocol == IPPROTO_RAW

into something like:

	(sk->sk_socket->type == SOCK_RAW && inet_sk(sk)->hdrincl)

should correct the test.

That is because this hdrincl thing is what controls which code path we
take in rawv6_sendmsg():

	if (inet->hdrincl)
		err = rawv6_send_hdrinc(sk, msg, len, &fl6, &dst, msg->msg_flags);
	else {
		ipc6.opt = opt;
		lock_sock(sk);
		err = ip6_append_data(sk, raw6_getfrag, &rfv,
			len, 0, &ipc6, &fl6, (struct rt6_info *)dst,
			msg->msg_flags, &sockc);

		if (err)
			ip6_flush_pending_frames(sk);
		else if (!(msg->msg_flags & MSG_MORE))
			err = rawv6_push_pending_frames(sk, &fl6, rp);
		release_sock(sk);
	}

You can test if the change I suggest above works.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-19 19:48     ` David Miller
@ 2016-12-20  0:31       ` Dave Jones
  2016-12-20  0:40         ` Dave Jones
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Jones @ 2016-12-20  0:31 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Mon, Dec 19, 2016 at 02:48:48PM -0500, David Miller wrote:

 > One thing that's interesting is that if the user picks "IPPROTO_RAW"
 > as the value of 'protocol' we set inet->hdrincl to 1.
 > 
 > The user can also set inet->hdrincl to 1 or 0 via setsockopt().
 > 
 > I think this is part of the problem.  The test above means to check
 > for "RAW socket with hdrincl set" and is trying to do this more simply.
 > But because setsockopt() can set this arbitrarily, testing sk_protocol
 > alone isn't enough.
 > 
 > So changing:
 > 
 > 	sk->sk_protocol == IPPROTO_RAW
 > 
 > into something like:
 > 
 > 	(sk->sk_socket->type == SOCK_RAW && inet_sk(sk)->hdrincl)
 > 
 > should correct the test.
 >  ..
 > 
 > You can test if the change I suggest above works.

Unfortunately, this made no difference.  I spent some time today trying
to make a better reproducer, but failed. I'll revisit again tomorrow.

Maybe I need >1 process/thread to trigger this.  That would explain why
I can trigger it with Trinity.

	Dave

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-20  0:31       ` Dave Jones
@ 2016-12-20  0:40         ` Dave Jones
  2016-12-20  1:36           ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Jones @ 2016-12-20  0:40 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:

 > Unfortunately, this made no difference.  I spent some time today trying
 > to make a better reproducer, but failed. I'll revisit again tomorrow.
 > 
 > Maybe I need >1 process/thread to trigger this.  That would explain why
 > I can trigger it with Trinity.

scratch that last part, I finally just repro'd it with a single process.

	Dave

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-20  0:40         ` Dave Jones
@ 2016-12-20  1:36           ` David Miller
  2016-12-20 18:17             ` Dave Jones
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2016-12-20  1:36 UTC (permalink / raw)
  To: davej; +Cc: netdev

From: Dave Jones <davej@codemonkey.org.uk>
Date: Mon, 19 Dec 2016 19:40:13 -0500

> On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
> 
>  > Unfortunately, this made no difference.  I spent some time today trying
>  > to make a better reproducer, but failed. I'll revisit again tomorrow.
>  > 
>  > Maybe I need >1 process/thread to trigger this.  That would explain why
>  > I can trigger it with Trinity.
> 
> scratch that last part, I finally just repro'd it with a single process.

Thanks for the info, I'll try to think about this some more.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-20  1:36           ` David Miller
@ 2016-12-20 18:17             ` Dave Jones
  2016-12-20 18:28               ` David Miller
  2016-12-20 19:31               ` Cong Wang
  0 siblings, 2 replies; 22+ messages in thread
From: Dave Jones @ 2016-12-20 18:17 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
 > From: Dave Jones <davej@codemonkey.org.uk>
 > Date: Mon, 19 Dec 2016 19:40:13 -0500
 > 
 > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
 > > 
 > >  > Unfortunately, this made no difference.  I spent some time today trying
 > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
 > >  > 
 > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
 > >  > I can trigger it with Trinity.
 > > 
 > > scratch that last part, I finally just repro'd it with a single process.
 > 
 > Thanks for the info, I'll try to think about this some more.

I threw in some debug printks right before that BUG_ON.
it's always this:

skb->len=31 skb->data_len=0 offset:30 total_len:9

Shouldn't we have kicked out data_len=0 skb's somewhere before we got this far ?

	Dave

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-20 18:17             ` Dave Jones
@ 2016-12-20 18:28               ` David Miller
  2016-12-20 19:34                 ` Dave Jones
  2016-12-20 19:31               ` Cong Wang
  1 sibling, 1 reply; 22+ messages in thread
From: David Miller @ 2016-12-20 18:28 UTC (permalink / raw)
  To: davej; +Cc: netdev

From: Dave Jones <davej@codemonkey.org.uk>
Date: Tue, 20 Dec 2016 13:17:28 -0500

> On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
>  > From: Dave Jones <davej@codemonkey.org.uk>
>  > Date: Mon, 19 Dec 2016 19:40:13 -0500
>  > 
>  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
>  > > 
>  > >  > Unfortunately, this made no difference.  I spent some time today trying
>  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
>  > >  > 
>  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
>  > >  > I can trigger it with Trinity.
>  > > 
>  > > scratch that last part, I finally just repro'd it with a single process.
>  > 
>  > Thanks for the info, I'll try to think about this some more.
> 
> I threw in some debug printks right before that BUG_ON.
> it's always this:
> 
> skb->len=31 skb->data_len=0 offset:30 total_len:9
> 
> Shouldn't we have kicked out data_len=0 skb's somewhere before we got this far ?

skb->data_len is just the length of any non-linear data in the SKB.

This has to do with the SKB buffer layout and geometry, not whether
the packet is "fragmented" in the protocol sense.

So no, this isn't a criteria for packets being filtered out by this
point.

Can you try to capture what sk->sk_socket->type and
inet_sk(sk)->hdrincl are set to at the time of the crash?

Thanks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-20 18:17             ` Dave Jones
  2016-12-20 18:28               ` David Miller
@ 2016-12-20 19:31               ` Cong Wang
  2016-12-20 22:12                 ` Dave Jones
  1 sibling, 1 reply; 22+ messages in thread
From: Cong Wang @ 2016-12-20 19:31 UTC (permalink / raw)
  To: Dave Jones; +Cc: David Miller, Linux Kernel Network Developers

On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
> On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
>  > From: Dave Jones <davej@codemonkey.org.uk>
>  > Date: Mon, 19 Dec 2016 19:40:13 -0500
>  >
>  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
>  > >
>  > >  > Unfortunately, this made no difference.  I spent some time today trying
>  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
>  > >  >
>  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
>  > >  > I can trigger it with Trinity.
>  > >
>  > > scratch that last part, I finally just repro'd it with a single process.
>  >
>  > Thanks for the info, I'll try to think about this some more.
>
> I threw in some debug printks right before that BUG_ON.
> it's always this:
>
> skb->len=31 skb->data_len=0 offset:30 total_len:9

Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here,
off-by-one?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-20 18:28               ` David Miller
@ 2016-12-20 19:34                 ` Dave Jones
  0 siblings, 0 replies; 22+ messages in thread
From: Dave Jones @ 2016-12-20 19:34 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Tue, Dec 20, 2016 at 01:28:13PM -0500, David Miller wrote:
 
 > This has to do with the SKB buffer layout and geometry, not whether
 > the packet is "fragmented" in the protocol sense.
 > 
 > So no, this isn't a criteria for packets being filtered out by this
 > point.
 > 
 > Can you try to capture what sk->sk_socket->type and
 > inet_sk(sk)->hdrincl are set to at the time of the crash?
 > 

type:3 hdrincl:0

	Dave

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-20 19:31               ` Cong Wang
@ 2016-12-20 22:12                 ` Dave Jones
  2016-12-21  6:09                   ` Cong Wang
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Jones @ 2016-12-20 22:12 UTC (permalink / raw)
  To: Cong Wang; +Cc: David Miller, Linux Kernel Network Developers

On Tue, Dec 20, 2016 at 11:31:38AM -0800, Cong Wang wrote:
 > On Tue, Dec 20, 2016 at 10:17 AM, Dave Jones <davej@codemonkey.org.uk> wrote:
 > > On Mon, Dec 19, 2016 at 08:36:23PM -0500, David Miller wrote:
 > >  > From: Dave Jones <davej@codemonkey.org.uk>
 > >  > Date: Mon, 19 Dec 2016 19:40:13 -0500
 > >  >
 > >  > > On Mon, Dec 19, 2016 at 07:31:44PM -0500, Dave Jones wrote:
 > >  > >
 > >  > >  > Unfortunately, this made no difference.  I spent some time today trying
 > >  > >  > to make a better reproducer, but failed. I'll revisit again tomorrow.
 > >  > >  >
 > >  > >  > Maybe I need >1 process/thread to trigger this.  That would explain why
 > >  > >  > I can trigger it with Trinity.
 > >  > >
 > >  > > scratch that last part, I finally just repro'd it with a single process.
 > >  >
 > >  > Thanks for the info, I'll try to think about this some more.
 > >
 > > I threw in some debug printks right before that BUG_ON.
 > > it's always this:
 > >
 > > skb->len=31 skb->data_len=0 offset:30 total_len:9
 > 
 > Clearly we fail because 30 > 31 - 2, seems 'offset' is not correct here,
 > off-by-one?

Ok, I finally made a messy, albeit good enough reproducer.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
	int fd;
	int zero = 0;
	char buf[LEN];

	memset(buf, 0, LEN);

	fd = socket(AF_INET6, SOCK_RAW, 7);

	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-20 22:12                 ` Dave Jones
@ 2016-12-21  6:09                   ` Cong Wang
  2016-12-21 12:27                     ` Hannes Frederic Sowa
  0 siblings, 1 reply; 22+ messages in thread
From: Cong Wang @ 2016-12-21  6:09 UTC (permalink / raw)
  To: Dave Jones; +Cc: David Miller, Linux Kernel Network Developers

On Tue, Dec 20, 2016 at 2:12 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
>         fd = socket(AF_INET6, SOCK_RAW, 7);
>
>         setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
>         setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
>

Interesting, you set the checksum offset to be 0, but the packet size
is actually 49, transport header is located at offset 48, so apparently
the packet doesn't have room for a 16bit checksum after network header.

Your original patch seems reasonable to me, unless there is some
check in __ip6_append_data() which is supposed to catch this, but
CHECKSUM is specific to raw socket only.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-21  6:09                   ` Cong Wang
@ 2016-12-21 12:27                     ` Hannes Frederic Sowa
  2016-12-21 12:41                       ` Hannes Frederic Sowa
  0 siblings, 1 reply; 22+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-21 12:27 UTC (permalink / raw)
  To: Cong Wang, Dave Jones; +Cc: David Miller, Linux Kernel Network Developers

On Tue, 2016-12-20 at 22:09 -0800, Cong Wang wrote:
> On Tue, Dec 20, 2016 at 2:12 PM, Dave Jones <davej@codemonkey.org.uk> wrote:
> >         fd = socket(AF_INET6, SOCK_RAW, 7);
> > 
> >         setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
> >         setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);
> > 
> 
> Interesting, you set the checksum offset to be 0, but the packet size
> is actually 49, transport header is located at offset 48, so apparently
> the packet doesn't have room for a 16bit checksum after network header.
> 
> Your original patch seems reasonable to me, unless there is some
> check in __ip6_append_data() which is supposed to catch this, but
> CHECKSUM is specific to raw socket only.

The calculation of total_len is wrong here:

	total_len = inet_sk(sk)->cork.base.length;
	if (offset >= total_len - 1) {
		err = -EINVAL;
		ip6_flush_pending_frames(sk);
		goto out;
	}


At least for this bug to fix we need to subtract the extension header
length after the fragmentation header, so:

--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -536,6 +536,17 @@ static int rawv6_recvmsg(struct sock *sk, struct msghdr *msg, size_t len,
        goto out;
 }
 
+static unsigned int raw6_corked_transport_len(const struct sock *sk)
+{
+       unsigned int len = inet_sk(sk)->cork.base.length;
+       struct ipv6_txoptions *opt = inet6_sk(sk)->cork.opt;
+
+       if (likely(!opt))
+               return len;
+
+       return len - opt->opt_flen;
+}
+
 static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
                                     struct raw6_sock *rp)
 {
@@ -543,7 +554,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
        int err = 0;
        int offset;
        int len;
-       int total_len;
+       int transport_len;
        __wsum tmp_csum;
        __sum16 csum;
 
@@ -555,8 +566,8 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
                goto out;
 
        offset = rp->offset;
-       total_len = inet_sk(sk)->cork.base.length;
-       if (offset >= total_len - 1) {
+       transport_len = raw6_corked_transport_len(sk);
+       if (offset >= transport_len - 1) {
                err = -EINVAL;
                ip6_flush_pending_frames(sk);
                goto out;
@@ -598,7 +609,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
                tmp_csum = csum_sub(tmp_csum, csum_unfold(csum));
 
        csum = csum_ipv6_magic(&fl6->saddr, &fl6->daddr,
-                              total_len, fl6->flowi6_proto, tmp_csum);
+                              transport_len, fl6->flowi6_proto, tmp_csum);
 
        if (csum == 0 && fl6->flowi6_proto == IPPROTO_UDP)
                csum = CSUM_MANGLED_0;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-21 12:27                     ` Hannes Frederic Sowa
@ 2016-12-21 12:41                       ` Hannes Frederic Sowa
  2016-12-21 19:04                         ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-21 12:41 UTC (permalink / raw)
  To: Cong Wang, Dave Jones; +Cc: David Miller, Linux Kernel Network Developers

On Wed, 2016-12-21 at 13:27 +0100, Hannes Frederic Sowa wrote: 
> @@ -555,8 +566,8 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
>                 goto out;
>  
>         offset = rp->offset;
> -       total_len = inet_sk(sk)->cork.base.length;
> -       if (offset >= total_len - 1) {
> +       transport_len = raw6_corked_transport_len(sk);
> +       if (offset >= transport_len - 1) {
>                 err = -EINVAL;
>                 ip6_flush_pending_frames(sk);
>                 goto out;
> @@ -598,7 +609,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
>                 tmp_csum = csum_sub(tmp_csum, csum_unfold(csum));
>  
>         csum = csum_ipv6_magic(&fl6->saddr, &fl6->daddr,
> -                              total_len, fl6->flowi6_proto, tmp_csum);
> +                              transport_len, fl6->flowi6_proto, tmp_csum);
>  
> 

Ops, here we need actually the total_len plus the opt->opt_nflen to
always calculate the correct checksum.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-21 12:41                       ` Hannes Frederic Sowa
@ 2016-12-21 19:04                         ` David Miller
  2016-12-21 21:33                           ` Hannes Frederic Sowa
  0 siblings, 1 reply; 22+ messages in thread
From: David Miller @ 2016-12-21 19:04 UTC (permalink / raw)
  To: hannes; +Cc: xiyou.wangcong, davej, netdev

From: Hannes Frederic Sowa <hannes@stressinduktion.org>
Date: Wed, 21 Dec 2016 13:41:13 +0100

> On Wed, 2016-12-21 at 13:27 +0100, Hannes Frederic Sowa wrote: 
>> @@ -555,8 +566,8 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
>>                 goto out;
>>  
>>         offset = rp->offset;
>> -       total_len = inet_sk(sk)->cork.base.length;
>> -       if (offset >= total_len - 1) {
>> +       transport_len = raw6_corked_transport_len(sk);
>> +       if (offset >= transport_len - 1) {
>>                 err = -EINVAL;
>>                 ip6_flush_pending_frames(sk);
>>                 goto out;
>> @@ -598,7 +609,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
>>                 tmp_csum = csum_sub(tmp_csum, csum_unfold(csum));
>>  
>>         csum = csum_ipv6_magic(&fl6->saddr, &fl6->daddr,
>> -                              total_len, fl6->flowi6_proto, tmp_csum);
>> +                              transport_len, fl6->flowi6_proto, tmp_csum);
>>  
>> 
> 
> Ops, here we need actually the total_len plus the opt->opt_nflen to
> always calculate the correct checksum.

It's a real shame we can't just use skb_transport_offset().  This value
has essentially been calculated for us already.

Also, if we iterate over multiple SKBs in the write queue, don't you have
to redo this calculation for every SKB we iterate over?

Furthermore, what if the user queued up some SKBs in the raw socket
with MSG_MORE, and then changed some of the options for a subsequent
sendmsg() call?

Given all of this, I think the best thing to do is validate the offset
after the queue walks, which is pretty much what Dave Jones's original
patch was doing.

Sigh... well, at least we now understand what's going on here.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-21 19:04                         ` David Miller
@ 2016-12-21 21:33                           ` Hannes Frederic Sowa
  2016-12-22  1:40                             ` Dave Jones
  0 siblings, 1 reply; 22+ messages in thread
From: Hannes Frederic Sowa @ 2016-12-21 21:33 UTC (permalink / raw)
  To: David Miller; +Cc: xiyou.wangcong, davej, netdev

On Wed, 2016-12-21 at 14:04 -0500, David Miller wrote:
> From: Hannes Frederic Sowa <hannes@stressinduktion.org>
> Date: Wed, 21 Dec 2016 13:41:13 +0100
> 
> > On Wed, 2016-12-21 at 13:27 +0100, Hannes Frederic Sowa wrote: 
> >> @@ -555,8 +566,8 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
> >>                 goto out;
> >>  
> >>         offset = rp->offset;
> >> -       total_len = inet_sk(sk)->cork.base.length;
> >> -       if (offset >= total_len - 1) {
> >> +       transport_len = raw6_corked_transport_len(sk);
> >> +       if (offset >= transport_len - 1) {
> >>                 err = -EINVAL;
> >>                 ip6_flush_pending_frames(sk);
> >>                 goto out;
> >> @@ -598,7 +609,7 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
> >>                 tmp_csum = csum_sub(tmp_csum, csum_unfold(csum));
> >>  
> >>         csum = csum_ipv6_magic(&fl6->saddr, &fl6->daddr,
> >> -                              total_len, fl6->flowi6_proto, tmp_csum);
> >> +                              transport_len, fl6->flowi6_proto, tmp_csum);
> >>  
> >> 
> > 
> > Ops, here we need actually the total_len plus the opt->opt_nflen to
> > always calculate the correct checksum.
> 
> It's a real shame we can't just use skb_transport_offset().  This value
> has essentially been calculated for us already.

This code path is called when we want to send out the socket write
queue. Because of MSG_MORE we might have multiple skbs in there, but
only the first one actually carries the true header, the others are
optimized for the ipv6 fragmentation fast path, thus we need to use
skb_transport_offset to find the correct offset but we must not
touch/read the data area before, as it is undefined data at that point.

Basically total_len so far accumulated all the payload length written
to the socket from the syscall argument from the user, but
unfortunately it also accounts for the first sendmsg's synthesized
extension header length in the write queue. We need to reverse this
calculation at this point in time.

So total_len is independent from the loop we do below and the length
should always reflect the length of all skbs stored in the write queue.

The loop adjusts the offset of the checksum in case the checksum offset
points to data in one of the later skbs, but we know before if that can
happen or not because of the cork length.

> Also, if we iterate over multiple SKBs in the write queue, don't you have
> to redo this calculation for every SKB we iterate over?

All the skb's payload length are summed up in the cork's length field,
so we don't need to sum it up again but just can use that value as is
minus the extension header adjustment.

> Furthermore, what if the user queued up some SKBs in the raw socket
> with MSG_MORE, and then changed some of the options for a subsequent
> sendmsg() call?

We only refresh the extension headers the next time we prepare an ipv6
header, which is the first sendmsg after one without MSG_MORE. We are
protected in this situation and don't change header lengths later
during further MSG_MORE sendmsgs.

> Given all of this, I think the best thing to do is validate the offset
> after the queue walks, which is pretty much what Dave Jones's original
> patch was doing.

I think both approaches protect against the bug reasonably well, but
Dave's patch has a bug: we must either call ip6_flush_pending_frames to
clear the socket write queue with the buggy send request.

> Sigh... well, at least we now understand what's going on here.

Yep, the code is more than a bit complex. :/

Bye,
Hannes

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-21 21:33                           ` Hannes Frederic Sowa
@ 2016-12-22  1:40                             ` Dave Jones
  2016-12-22  3:29                               ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Jones @ 2016-12-22  1:40 UTC (permalink / raw)
  To: Hannes Frederic Sowa; +Cc: David Miller, xiyou.wangcong, netdev

On Wed, Dec 21, 2016 at 10:33:20PM +0100, Hannes Frederic Sowa wrote:

 > > Given all of this, I think the best thing to do is validate the offset
 > > after the queue walks, which is pretty much what Dave Jones's original
 > > patch was doing.
 > 
 > I think both approaches protect against the bug reasonably well, but
 > Dave's patch has a bug: we must either call ip6_flush_pending_frames to
 > clear the socket write queue with the buggy send request.

I can fix that up and resubmit, or we can go with your approach.
DaveM ?

	Dave

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-22  1:40                             ` Dave Jones
@ 2016-12-22  3:29                               ` David Miller
  0 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2016-12-22  3:29 UTC (permalink / raw)
  To: davej; +Cc: hannes, xiyou.wangcong, netdev

From: Dave Jones <davej@codemonkey.org.uk>
Date: Wed, 21 Dec 2016 20:40:19 -0500

> On Wed, Dec 21, 2016 at 10:33:20PM +0100, Hannes Frederic Sowa wrote:
> 
>  > > Given all of this, I think the best thing to do is validate the offset
>  > > after the queue walks, which is pretty much what Dave Jones's original
>  > > patch was doing.
>  > 
>  > I think both approaches protect against the bug reasonably well, but
>  > Dave's patch has a bug: we must either call ip6_flush_pending_frames to
>  > clear the socket write queue with the buggy send request.
> 
> I can fix that up and resubmit, or we can go with your approach.
> DaveM ?

Please respin your patch with the fix Dave.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: ipv6: handle -EFAULT from skb_copy_bits
  2016-12-22 16:16 Dave Jones
@ 2016-12-23 17:21 ` David Miller
  0 siblings, 0 replies; 22+ messages in thread
From: David Miller @ 2016-12-23 17:21 UTC (permalink / raw)
  To: davej; +Cc: netdev, hannes

From: Dave Jones <davej@codemonkey.org.uk>
Date: Thu, 22 Dec 2016 11:16:22 -0500

> By setting certain socket options on ipv6 raw sockets, we can confuse the
> length calculation in rawv6_push_pending_frames triggering a BUG_ON.
 ...
> Signed-off-by: Dave Jones <davej@codemonkey.org.uk>

Applied and queued up for -stable, thanks Dave.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* ipv6: handle -EFAULT from skb_copy_bits
@ 2016-12-22 16:16 Dave Jones
  2016-12-23 17:21 ` David Miller
  0 siblings, 1 reply; 22+ messages in thread
From: Dave Jones @ 2016-12-22 16:16 UTC (permalink / raw)
  To: netdev; +Cc: Hannes Frederic Sowa

By setting certain socket options on ipv6 raw sockets, we can confuse the
length calculation in rawv6_push_pending_frames triggering a BUG_ON.

RIP: 0010:[<ffffffff817c6390>] [<ffffffff817c6390>] rawv6_sendmsg+0xc30/0xc40
RSP: 0018:ffff881f6c4a7c18  EFLAGS: 00010282
RAX: 00000000fffffff2 RBX: ffff881f6c681680 RCX: 0000000000000002
RDX: ffff881f6c4a7cf8 RSI: 0000000000000030 RDI: ffff881fed0f6a00
RBP: ffff881f6c4a7da8 R08: 0000000000000000 R09: 0000000000000009
R10: ffff881fed0f6a00 R11: 0000000000000009 R12: 0000000000000030
R13: ffff881fed0f6a00 R14: ffff881fee39ba00 R15: ffff881fefa93a80

Call Trace:
 [<ffffffff8118ba23>] ? unmap_page_range+0x693/0x830
 [<ffffffff81772697>] inet_sendmsg+0x67/0xa0
 [<ffffffff816d93f8>] sock_sendmsg+0x38/0x50
 [<ffffffff816d982f>] SYSC_sendto+0xef/0x170
 [<ffffffff816da27e>] SyS_sendto+0xe/0x10
 [<ffffffff81002910>] do_syscall_64+0x50/0xa0
 [<ffffffff817f7cbc>] entry_SYSCALL64_slow_path+0x25/0x25

Handle by jumping to the failure path if skb_copy_bits gets an EFAULT.

Reproducer:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

#define LEN 504

int main(int argc, char* argv[])
{
	int fd;
	int zero = 0;
	char buf[LEN];

	memset(buf, 0, LEN);

	fd = socket(AF_INET6, SOCK_RAW, 7);

	setsockopt(fd, SOL_IPV6, IPV6_CHECKSUM, &zero, 4);
	setsockopt(fd, SOL_IPV6, IPV6_DSTOPTS, &buf, LEN);

	sendto(fd, buf, 1, 0, (struct sockaddr *) buf, 110);
}

Signed-off-by: Dave Jones <davej@codemonkey.org.uk>

diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 291ebc260e70..ea89073c8247 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -591,7 +591,11 @@ static int rawv6_push_pending_frames(struct sock *sk, struct flowi6 *fl6,
 	}
 
 	offset += skb_transport_offset(skb);
-	BUG_ON(skb_copy_bits(skb, offset, &csum, 2));
+	err = skb_copy_bits(skb, offset, &csum, 2);
+	if (err < 0) {
+		ip6_flush_pending_frames(sk);
+		goto out;
+	}
 
 	/* in case cksum was not initialized */
 	if (unlikely(csum))

^ permalink raw reply related	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2016-12-23 17:21 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-12-14 15:47 ipv6: handle -EFAULT from skb_copy_bits Dave Jones
2016-12-17 15:41 ` David Miller
2016-12-17 16:43   ` Dave Jones
2016-12-19 17:03   ` Dave Jones
2016-12-19 19:48     ` David Miller
2016-12-20  0:31       ` Dave Jones
2016-12-20  0:40         ` Dave Jones
2016-12-20  1:36           ` David Miller
2016-12-20 18:17             ` Dave Jones
2016-12-20 18:28               ` David Miller
2016-12-20 19:34                 ` Dave Jones
2016-12-20 19:31               ` Cong Wang
2016-12-20 22:12                 ` Dave Jones
2016-12-21  6:09                   ` Cong Wang
2016-12-21 12:27                     ` Hannes Frederic Sowa
2016-12-21 12:41                       ` Hannes Frederic Sowa
2016-12-21 19:04                         ` David Miller
2016-12-21 21:33                           ` Hannes Frederic Sowa
2016-12-22  1:40                             ` Dave Jones
2016-12-22  3:29                               ` David Miller
2016-12-22 16:16 Dave Jones
2016-12-23 17:21 ` David Miller

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.