netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* possible stack corruption in icmp_send (__stack_chk_fail)
@ 2021-02-17 18:12 Jason A. Donenfeld
  2021-02-17 22:27 ` Willem de Bruijn
  0 siblings, 1 reply; 6+ messages in thread
From: Jason A. Donenfeld @ 2021-02-17 18:12 UTC (permalink / raw)
  To: Netdev, Willem de Bruijn; +Cc: LKML

Hi Netdev & Willem,

I've received a report of stack corruption -- via the stack protector
check -- in icmp_send. I was sent a vmcore, and was able to extract
the OOPS from there. However, I've been unable to produce the bug and
I don't see where it'd be in the code. That might point to a more
sinister problem, or I'm simply just not seeing it. Apparently the
reporter reproduces it every 40 or so minutes, and has seen it happen
since at least ~5.10. Willem - I'm emailing you because it seems like
you were making a lot of changes to the icmp code around then, and
perhaps you have an intuition. For example, some of the error handling
code takes a pointer to a stack buffer (_objh and such), and maybe
that's problematic? I'm not quite sure. The vmcore, along with the
various kernel binaries I hunted down are here:
https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
. The extracted dmesg follows below, in case you or anyone has a
pointer. I've been staring at this for a while and don't see it.

Jason

Kernel panic - not syncing: stack-protector: Kernel stack is corrupted
in: __icmp_send+0x5bd/0x5c0
CPU: 0 PID: 959 Comm: kworker/0:2 Kdump: loaded Not tainted
5.11.0-051100-lowlatency #202102142330
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
rel-1.13.0-48-gd9c812dda519-prebuilt.qemu.org 04/01/2014
Workqueue: wg-crypt-wg0 wg_packet_decrypt_worker [wireguard]
Call Trace:
 <IRQ>
 show_stack+0x52/0x58
 dump_stack+0x70/0x8b
 panic+0x108/0x2ea
 ? ip_push_pending_frames+0x42/0x90
 ? __icmp_send+0x5bd/0x5c0
 __stack_chk_fail+0x14/0x20
 __icmp_send+0x5bd/0x5c0
 icmp_ndo_send+0x148/0x160
 wg_xmit+0x359/0x450 [wireguard]
 ? harmonize_features+0x19/0x80
 xmit_one.constprop.0+0x9f/0x190
 dev_hard_start_xmit+0x43/0x90
 sch_direct_xmit+0x11d/0x340
 __qdisc_run+0x66/0xc0
 __dev_xmit_skb+0xd5/0x340
 __dev_queue_xmit+0x32b/0x4d0
 ? nf_conntrack_double_lock.constprop.0+0x97/0x140 [nf_conntrack]
 dev_queue_xmit+0x10/0x20
 neigh_connected_output+0xcb/0xf0
 ip_finish_output2+0x17f/0x470
 __ip_finish_output+0x9b/0x140
 ? ipv4_confirm+0x4a/0x80 [nf_conntrack]
 ip_finish_output+0x2d/0xb0
 ip_output+0x78/0x110
 ? __ip_finish_output+0x140/0x140
 ip_forward_finish+0x58/0x90
 ip_forward+0x40a/0x4d0
 ? ip4_key_hashfn+0xb0/0xb0
 ip_sublist_rcv_finish+0x3d/0x50
 ip_list_rcv_finish.constprop.0+0x163/0x190
 ip_sublist_rcv+0x37/0xb0
 ? ip_rcv_finish_core.constprop.0+0x310/0x310
 ip_list_rcv+0xf5/0x120
 __netif_receive_skb_list_core+0x228/0x250
 __netif_receive_skb_list+0x102/0x170
 ? dev_gro_receive+0x1b5/0x370
 netif_receive_skb_list_internal+0xca/0x190
 napi_complete_done+0x7a/0x1a0
 wg_packet_rx_poll+0x384/0x400 [wireguard]
 napi_poll+0x92/0x200
 net_rx_action+0xb8/0x1c0
 __do_softirq+0xce/0x2b3
 asm_call_irq_on_stack+0x12/0x20
 </IRQ>
 do_softirq_own_stack+0x3d/0x50
 do_softirq+0x66/0x80
 __local_bh_enable_ip+0x62/0x70
 _raw_spin_unlock_bh+0x1e/0x20
 wg_packet_decrypt_worker+0xf6/0x190 [wireguard]
 process_one_work+0x217/0x3e0
 worker_thread+0x4d/0x350
 ? rescuer_thread+0x390/0x390
 kthread+0x145/0x170
 ? __kthread_bind_mask+0x70/0x70
 ret_from_fork+0x22/0x30
Kernel Offset: 0x2000000 from 0xffffffff81000000 (relocation range:
0xffffffff80000000-0xffffffffbfffffff)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: possible stack corruption in icmp_send (__stack_chk_fail)
  2021-02-17 18:12 possible stack corruption in icmp_send (__stack_chk_fail) Jason A. Donenfeld
@ 2021-02-17 22:27 ` Willem de Bruijn
  2021-02-17 22:55   ` Jason A. Donenfeld
  0 siblings, 1 reply; 6+ messages in thread
From: Willem de Bruijn @ 2021-02-17 22:27 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev, LKML, Willem de Bruijn

On Wed, Feb 17, 2021 at 1:12 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Netdev & Willem,
>
> I've received a report of stack corruption -- via the stack protector
> check -- in icmp_send. I was sent a vmcore, and was able to extract
> the OOPS from there. However, I've been unable to produce the bug and
> I don't see where it'd be in the code. That might point to a more
> sinister problem, or I'm simply just not seeing it. Apparently the
> reporter reproduces it every 40 or so minutes, and has seen it happen
> since at least ~5.10. Willem - I'm emailing you because it seems like
> you were making a lot of changes to the icmp code around then, and
> perhaps you have an intuition. For example, some of the error handling
> code takes a pointer to a stack buffer (_objh and such), and maybe
> that's problematic? I'm not quite sure. The vmcore, along with the
> various kernel binaries I hunted down are here:
> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
> . The extracted dmesg follows below, in case you or anyone has a
> pointer. I've been staring at this for a while and don't see it.
>
> Jason

Sorry, I also don't immediately see a cause.

The _objh is a fairly standard approach to accessing skb data with
skb_header_pointer. More importantly, that codepath is in the icmp
receive path and then guarded by a socket option
(inet_sk(sk)->recverr_rfc4884), so unlikely to be exercised here.

This is an icmp send in response to a forwarded packet (assuming
__qdisc_run dequeued the packet that triggered it). The icmp send code
is quite robust against, e.g., undersized packets. But could it be
that the forwarded packet is not sensible IPv4? The skb->protocol is
inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.

As for on-stack variable overflow, __ip_options_echo parses the
(untrusted) input to write into stack allocated icmp_param. But that
is fairly well tested, rarely touched, code by now. Perhaps relevant,
though, the opt passed is in skb->cb[], which at should probably not
be interpreted as inet_skb_parm (IPCB).

   static inline void icmp_send(struct sk_buff *skb_in, int type, int
code, __be32 info)
   {
        __icmp_send(skb_in, type, code, info, &IPCB(skb_in)->opt);
   }


A vmlinux image might help. I couldn't find one for this kernel.

Or if the kernel can be modified and this path is rarely taken,
logging the packet, e.g., with skb_dump.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: possible stack corruption in icmp_send (__stack_chk_fail)
  2021-02-17 22:27 ` Willem de Bruijn
@ 2021-02-17 22:55   ` Jason A. Donenfeld
  2021-02-17 23:05     ` Willem de Bruijn
  0 siblings, 1 reply; 6+ messages in thread
From: Jason A. Donenfeld @ 2021-02-17 22:55 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Netdev, LKML, Willem de Bruijn

Hi Willem,

On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
> A vmlinux image might help. I couldn't find one for this kernel.

https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
has .debs with vmlinuz in there, which you can extract to vmlinux, as
well as my own vmlinux elf construction with the symbols added back in
by extracting them from kallsyms. That's the best I've been able to
do, as all of this is coming from somebody random emailing me.

> But could it be
> that the forwarded packet is not sensible IPv4? The skb->protocol is
> inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.

The wg calls to icmp_ndo_send are gated by checking skb->protocol:

        if (skb->protocol == htons(ETH_P_IP))
               icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
       else if (skb->protocol == htons(ETH_P_IPV6))
               icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
ICMPV6_ADDR_UNREACH, 0);

On the other hand, that code is hit on an error path when
wg_check_packet_protocol returns false:

static inline bool wg_check_packet_protocol(struct sk_buff *skb)
{
       __be16 real_protocol = ip_tunnel_parse_protocol(skb);
       return real_protocol && skb->protocol == real_protocol;
}

So that means, at least in theory, icmp_ndo_send could be called with
skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
that. But... is it actually a problem?

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: possible stack corruption in icmp_send (__stack_chk_fail)
  2021-02-17 22:55   ` Jason A. Donenfeld
@ 2021-02-17 23:05     ` Willem de Bruijn
  2021-02-17 23:18       ` Jason A. Donenfeld
  0 siblings, 1 reply; 6+ messages in thread
From: Willem de Bruijn @ 2021-02-17 23:05 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev, LKML, Willem de Bruijn

On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> Hi Willem,
>
> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
> <willemdebruijn.kernel@gmail.com> wrote:
> > A vmlinux image might help. I couldn't find one for this kernel.
>
> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
> has .debs with vmlinuz in there, which you can extract to vmlinux, as
> well as my own vmlinux elf construction with the symbols added back in
> by extracting them from kallsyms. That's the best I've been able to
> do, as all of this is coming from somebody random emailing me.
>
> > But could it be
> > that the forwarded packet is not sensible IPv4? The skb->protocol is
> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.
>
> The wg calls to icmp_ndo_send are gated by checking skb->protocol:
>
>         if (skb->protocol == htons(ETH_P_IP))
>                icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH, 0);
>        else if (skb->protocol == htons(ETH_P_IPV6))
>                icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
> ICMPV6_ADDR_UNREACH, 0);
>
> On the other hand, that code is hit on an error path when
> wg_check_packet_protocol returns false:
>
> static inline bool wg_check_packet_protocol(struct sk_buff *skb)
> {
>        __be16 real_protocol = ip_tunnel_parse_protocol(skb);
>        return real_protocol && skb->protocol == real_protocol;
> }
>
> So that means, at least in theory, icmp_ndo_send could be called with
> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
> that. But... is it actually a problem?

For this forwarded packet that arrived on a wireguard tunnel,
skb->protocol was originally also set by ip_tunnel_parse_protocol.
So likely not.

The other issue seems more like a real bug. wg_xmit calling
icmp_ndo_send without clearing IPCB first.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: possible stack corruption in icmp_send (__stack_chk_fail)
  2021-02-17 23:05     ` Willem de Bruijn
@ 2021-02-17 23:18       ` Jason A. Donenfeld
  2021-02-17 23:37         ` Willem de Bruijn
  0 siblings, 1 reply; 6+ messages in thread
From: Jason A. Donenfeld @ 2021-02-17 23:18 UTC (permalink / raw)
  To: Willem de Bruijn; +Cc: Netdev, LKML, Willem de Bruijn

On 2/18/21, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> Hi Willem,
>>
>> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
>> <willemdebruijn.kernel@gmail.com> wrote:
>> > A vmlinux image might help. I couldn't find one for this kernel.
>>
>> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
>> has .debs with vmlinuz in there, which you can extract to vmlinux, as
>> well as my own vmlinux elf construction with the symbols added back in
>> by extracting them from kallsyms. That's the best I've been able to
>> do, as all of this is coming from somebody random emailing me.
>>
>> > But could it be
>> > that the forwarded packet is not sensible IPv4? The skb->protocol is
>> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.
>>
>> The wg calls to icmp_ndo_send are gated by checking skb->protocol:
>>
>>         if (skb->protocol == htons(ETH_P_IP))
>>                icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH,
>> 0);
>>        else if (skb->protocol == htons(ETH_P_IPV6))
>>                icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
>> ICMPV6_ADDR_UNREACH, 0);
>>
>> On the other hand, that code is hit on an error path when
>> wg_check_packet_protocol returns false:
>>
>> static inline bool wg_check_packet_protocol(struct sk_buff *skb)
>> {
>>        __be16 real_protocol = ip_tunnel_parse_protocol(skb);
>>        return real_protocol && skb->protocol == real_protocol;
>> }
>>
>> So that means, at least in theory, icmp_ndo_send could be called with
>> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
>> that. But... is it actually a problem?
>
> For this forwarded packet that arrived on a wireguard tunnel,
> skb->protocol was originally also set by ip_tunnel_parse_protocol.
> So likely not.
>
> The other issue seems more like a real bug. wg_xmit calling
> icmp_ndo_send without clearing IPCB first.
>

Bingo! Nice eye! I confirmed the crash by just memsetting 0x41 to cb
before the call. Clearly this should be zeroed by icmp_ndo_xmit. Will
send a patch for icmp_ndo_xmit momentarily and will CC you.

Jason

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: possible stack corruption in icmp_send (__stack_chk_fail)
  2021-02-17 23:18       ` Jason A. Donenfeld
@ 2021-02-17 23:37         ` Willem de Bruijn
  0 siblings, 0 replies; 6+ messages in thread
From: Willem de Bruijn @ 2021-02-17 23:37 UTC (permalink / raw)
  To: Jason A. Donenfeld; +Cc: Netdev, LKML, Willem de Bruijn

On Wed, Feb 17, 2021 at 6:18 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>
> On 2/18/21, Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote:
> > On Wed, Feb 17, 2021 at 5:56 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> >>
> >> Hi Willem,
> >>
> >> On Wed, Feb 17, 2021 at 11:27 PM Willem de Bruijn
> >> <willemdebruijn.kernel@gmail.com> wrote:
> >> > A vmlinux image might help. I couldn't find one for this kernel.
> >>
> >> https://data.zx2c4.com/icmp_send-crash-e03b4a42-706a-43bf-bc40-1f15966b3216.tar.xz
> >> has .debs with vmlinuz in there, which you can extract to vmlinux, as
> >> well as my own vmlinux elf construction with the symbols added back in
> >> by extracting them from kallsyms. That's the best I've been able to
> >> do, as all of this is coming from somebody random emailing me.
> >>
> >> > But could it be
> >> > that the forwarded packet is not sensible IPv4? The skb->protocol is
> >> > inferred in wg_packet_consume_data_done->ip_tunnel_parse_protocol.
> >>
> >> The wg calls to icmp_ndo_send are gated by checking skb->protocol:
> >>
> >>         if (skb->protocol == htons(ETH_P_IP))
> >>                icmp_ndo_send(skb, ICMP_DEST_UNREACH, ICMP_HOST_UNREACH,
> >> 0);
> >>        else if (skb->protocol == htons(ETH_P_IPV6))
> >>                icmpv6_ndo_send(skb, ICMPV6_DEST_UNREACH,
> >> ICMPV6_ADDR_UNREACH, 0);
> >>
> >> On the other hand, that code is hit on an error path when
> >> wg_check_packet_protocol returns false:
> >>
> >> static inline bool wg_check_packet_protocol(struct sk_buff *skb)
> >> {
> >>        __be16 real_protocol = ip_tunnel_parse_protocol(skb);
> >>        return real_protocol && skb->protocol == real_protocol;
> >> }
> >>
> >> So that means, at least in theory, icmp_ndo_send could be called with
> >> skb->protocol != ip_tunnel_parse_protocol(skb). I guess I can address
> >> that. But... is it actually a problem?
> >
> > For this forwarded packet that arrived on a wireguard tunnel,
> > skb->protocol was originally also set by ip_tunnel_parse_protocol.
> > So likely not.
> >
> > The other issue seems more like a real bug. wg_xmit calling
> > icmp_ndo_send without clearing IPCB first.
> >
>
> Bingo! Nice eye! I confirmed the crash by just memsetting 0x41 to cb
> before the call. Clearly this should be zeroed by icmp_ndo_xmit. Will
> send a patch for icmp_ndo_xmit momentarily and will CC you.

Great, let's hope that's it.

gtp_build_skb_ip4 zeroes before calling. The fix will be most
obviously correct if wg_xmit does the same.

But it is quite likely that the other callers, xfrmi_xmit2 and
sunvnet_start_xmit_common should zero, too. If so, then icmp_ndo_xmit
is the more robust location to do this. Then the Fixes tag will likely
go quite a bit farther back, too.

Whichever variant of the patch you prefer.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2021-02-17 23:39 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-17 18:12 possible stack corruption in icmp_send (__stack_chk_fail) Jason A. Donenfeld
2021-02-17 22:27 ` Willem de Bruijn
2021-02-17 22:55   ` Jason A. Donenfeld
2021-02-17 23:05     ` Willem de Bruijn
2021-02-17 23:18       ` Jason A. Donenfeld
2021-02-17 23:37         ` Willem de Bruijn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).