netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Possible race in ipv4 routing
@ 2021-02-01  9:20 Schmid, Carsten
  2021-02-01 15:35 ` David Ahern
  0 siblings, 1 reply; 5+ messages in thread
From: Schmid, Carsten @ 2021-02-01  9:20 UTC (permalink / raw)
  To: davem, kuznet, yoshfuji; +Cc: netdev

Hi,

on kernel 4.14(.147) i have seen something weird. The stack trace:

[65064.457920] BUG: unable to handle kernel NULL pointer dereference at 0000000000000604
[65064.466677] IP: ip_route_output_key_hash_rcu+0x755/0x850
[65064.472599] PGD 0 P4D 0
[65064.475422] Oops: 0000 [#1] PREEMPT SMP NOPTI
[65064.480277] Modules linked in: bcmdhd(O) nls_cp437 nls_utf8 ebt_ip6 ebt_ip ebtable_filter ebtables squashfs zlib_inflate xz_dec veth lzo lzo_compress lzo_decompress nls_iso8859_1 nls_cp850 vfat fat cfq_iosched sd_mod ah4 esp4 xfrm4_mode_transport tntfs(PO) texfat(PO) usb_storage xfrm_user xfrm_algo cls_u32 cdc_acm sch_htb intel_tfm_governor snd_soc_apl_mgu_hu ecryptfs intel_xhci_usb_role_switch roles dwc3 udc_core intel_ipu4_psys intel_ipu4_psys_csslib adv728x coretemp snd_soc_skl intel_ipu4_isys videobuf2_dma_contig videobuf2_memops sdw_cnl ipu4_acpi snd_soc_acpi_intel_match intel_ipu4_isys_csslib videobuf2_v4l2 snd_soc_acpi videobuf2_core sbi_apl i2c_i801 snd_soc_core snd_compress snd_soc_skl_ipc sdw_bus xhci_pci crc8 ahci snd_soc_sst_ipc xhci_hcd libahci snd_soc_sst_dsp cfg80211 snd_hda_ext_core
[65064.559290]  libata snd_hda_core intel_ipu4_mmu usbcore rfkill usb_common snd_pcm scsi_mod dwc3_pci snd_timer mei_me snd intel_ipu4 soundcore mei iova nfsd auth_rpcgss lockd grace sunrpc zram zsmalloc loop fuse 8021q bridge stp llc inap560t(O) i915 video backlight intel_gtt i2c_algo_bit drm_kms_helper drm firmware_class igb_avb(O) ptp hwmon pps_core spi_pxa2xx_platform [last unloaded: bcmdhd]
[65064.598123] CPU: 0 PID: 249 Comm: 6310_io01 Tainted: P     U     O    4.14.147-apl #1
[65064.606860] task: ffff96e9f19a3200 task.stack: ffffb3cb80450000
[65064.613465] RIP: 0010:ip_route_output_key_hash_rcu+0x755/0x850
[65064.619859] Blocking unknown connection: IN= OUT=wlan_bridge SRC=172.16.222.97 DST=224.0.0.22 LEN=40 TOS=0x00 PREC=0xC0 TTL=1 ID=0 DF PROTO=2 MARK=0x1000d4
[65064.619972] RSP: 0018:ffffb3cb80453940 EFLAGS: 00010246
[65064.641436] RAX: ffff96e866eebf00 RBX: ffff96e800e5a238 RCX: 0000000000000000
[65064.649397] RDX: 0000000000000001 RSI: 0000000026c730a0 RDI: 0000000000000000
[65064.657363] RBP: ffffb3cb80453990 R08: 0000000000000000 R09: ffff96e9f5bdf000
[65064.665326] R10: 0000000000000000 R11: ffff96e875740540 R12: ffff96e866eebf00
[65064.673290] R13: ffffb3cb804539a0 R14: ffff96e9f1ba7000 R15: 0000000000000000
[65064.681252] FS:  00007efc327fc700(0000) GS:ffff96e9fdc00000(0000) knlGS:0000000000000000
[65064.690284] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[65064.696697] CR2: 0000000000000604 CR3: 00000001f4d44000 CR4: 00000000003406b0
[65064.704653] Call Trace:
[65064.707384]  ip_route_output_key_hash+0x82/0xb0
[65064.712438]  ip_route_output_flow+0x19/0x50
[65064.717104]  ip_queue_xmit+0x389/0x3c0
[65064.721286]  __tcp_transmit_skb+0x598/0x9f0
[65064.725956]  tcp_write_xmit+0x1ab/0xf40
[65064.730234]  __tcp_push_pending_frames+0x30/0xd0
[65064.735387]  tcp_push+0xe7/0x110
[65064.738985]  tcp_sendmsg_locked+0x9a3/0xe40
[65064.743652]  tcp_sendmsg+0x27/0x40
[65064.747444]  inet_sendmsg+0x2f/0xf0
[65064.751332]  sock_sendmsg+0x31/0x40
[65064.755221]  ___sys_sendmsg+0x28d/0x2a0
[65064.759500]  ? do_iter_readv_writev+0x103/0x160
[65064.764557]  ? try_to_wake_up+0x25c/0x460
[65064.769024]  ? default_wake_function+0xd/0x10
[65064.773893]  ? __wake_up_common+0x6e/0x120
[65064.778460]  ? __fget+0x71/0xa0
[65064.781962]  __sys_sendmsg+0x4f/0x90
[65064.785946]  ? __sys_sendmsg+0x4f/0x90
[65064.790125]  SyS_sendmsg+0x9/0x10
[65064.793818]  do_syscall_64+0x79/0x350
[65064.797900]  ? schedule+0x2e/0x90
[65064.801594]  ? exit_to_usermode_loop+0x5a/0x90
[65064.806549]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[65064.812183] RIP: 0033:0x7efc3c549807
[65064.816167] RSP: 002b:00007efc327f9500 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[65064.824616] RAX: ffffffffffffffda RBX: 0000000000000233 RCX: 00007efc3c549807
[65064.832575] RDX: 0000000000004000 RSI: 00007efc327f9570 RDI: 0000000000000233
[65064.840543] RBP: 00007efc327f9570 R08: 0000000000000000 R09: 0000000000000001
[65064.848500] R10: 00007efc2c087548 R11: 0000000000000293 R12: 0000000000004000
[65064.856459] R13: 0000000000000233 R14: 00007efc327fc5c0 R15: 00007efc24bf7810
[65064.864419] Code: b1 72 e9 bc fe ff ff 4c 89 4d c8 4c 89 45 d0 e8 a2 08 be ff 4c 8b 45 d0 4c 8b 4d c8 e9 88 fa ff ff 48 8b 08 48 8b 89 98 04 00 00 <8b> 89 04 06 00 00 39 88 a0 00 00 00 0f 85 9c fe ff ff 8b 80 80
[65064.885531] RIP: ip_route_output_key_hash_rcu+0x755/0x850 RSP: ffffb3cb80453940
[65064.893689] CR2: 0000000000000604

Fortunately i have a core dump, and analyzed that.
Finally in the analysis i came to that point:
(net/ipv4/route.c around line 1520):

static bool rt_cache_valid(const struct rtable *rt)
{
returnrt &&
rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
!rt_is_expired(rt); <<<< crash here, see below
}

The code was executing to the check !rt_is_expired(rt); and the disassembly looks like
ffffffff814f1ba0:48 85 c0             test   rax,rax // (1127, check for rt not NULL)
ffffffff814f1ba3:74 0e                je     ffffffff814f1bb3 <ip_route_output_key_hash_rcu+0x603>
/usr/src/kernel/net/ipv4/route.c:1524
ffffffff814f1ba5:66 83 78 64 ff       cmp    WORD PTR [rax+0x64],0xffff // rt->dst.obsolete accessed
__mkroute_output():

/usr/src/kernel/net/ipv4/route.c:2276
rth = rcu_dereference(*prth);
ffffffff814f1baa:49 89 c4             mov    r12,rax // r12= rt
rt_cache_valid():
/usr/src/kernel/net/ipv4/route.c:1524
ffffffff814f1bad:0f 84 48 01 00 00    je     ffffffff814f1cfb <ip_route_output_key_hash_rcu+0x74b> goes to ***1***

the only reference reaching this point is after the DST_OBSOLETE_FORCE_CHK above:
read_pnet():
/usr/src/kernel/include/net/net_namespace.h:282
ffffffff814f1cfb:48 8b 08             mov    rcx,QWORD PTR [rax]       <<<<<<***1*** need to come in here because previous instruction is a "jmp"
fetches net_device *dev into rcx, value is ffff96e866eeb000

ffffffff814f1cfe:48 8b 89 98 04 00 00 mov    rcx,QWORD PTR [rcx+0x498]
    crash> rd 0xffff96e866eeb490 4
    ffff96e866eeb490:  0000000000000000 0000000000000000   ................ that's a NULL pointer here
    ffff96e866eeb4a0:  8000000000025287 0000000000010002   .R..............
crash>__read_once_size():
/usr/src/kernel/include/linux/compiler.h:183
ffffffff814f1d05:8b 89 04 06 00 00    mov    ecx,DWORD PTR [rcx+0x604] <<<<<<***** crash here. rcx = NULL, offset 0x604

Looking at the data i can see that dst.obsolete=0x0002 (i ordered the memdump into the struct), so why did we reach that point?
R12 = rt = ffff96e866eebf00 = struct rtable
SIZE: 216
struct rtable {
   [0x0] struct dst_entry dst;
        struct dst_entry {
           [0x0] struct net_device *dev;                        ffff96e866eeb000
           [0x8] struct callback_head callback_head;
                struct callback_head {
                    [0x0] struct callback_head *next;           0000000000000000
                    [0x8] void (*func)(struct callback_head *); 0101003800000500
        }
        SIZE: 0x10
          [0x18] struct dst_entry *child;                       0000000000000000
          [0x20] struct dst_ops *ops;                           ffffffff8dcb28c0
          [0x28] unsigned long _metrics;                        ffffffff8da66d41
          [0x30] unsigned long expires;                         0000000000000000
          [0x38] struct dst_entry *path;                        ffff96e866eebf00
          [0x40] struct dst_entry *from;                        0000000000000000
          [0x48] struct xfrm_state *xfrm;                       0000000000000000
          [0x50] int (*input)(struct sk_buff *);                ffffffff8d49f640
          [0x58] int (*output)(struct net *, struct sock *, struct sk_buff *);ffffffff8d49f6d0
          [0x60] unsigned short flags;                          0000
          [0x62] short error;                                   0000
          [0x64] short obsolete;                                0002 <<<< we have not ffff here, but 0x02
          [0x66] unsigned short header_len;                     0000
          [0x68] unsigned short trailer_len;                    0000
          [0x6a] unsigned short __pad3;                         0000
          [0x6c] __u32 __pad2;                                  00000000
          [0x70] long __pad_to_align_refcnt[2];                 0000000000000000 0000000000000000
          [0x80] atomic_t __refcnt;                             000000000
          [0x84] int __use;                                     000000000
          [0x88] unsigned long lastuse;                         0000000103dc0d11
          [0x90] struct lwtunnel_state *lwtstate;               0000000000000000
                 union {
          [0x98]     struct dst_entry *next;                    0000000000000000
          [0x98]     struct rtable *rt_next;
          [0x98]     struct rt6_info *rt6_next;
          [0x98]     struct dn_route *dn_next;
                 };
        }
        SIZE: 0xa0
--- snip ---

crash> rd 0xffff96e866eebf00 32 (=r12=rt)
ffff96e866eebf00:  ffff96e866eeb000 0000000000000000   ...f............
ffff96e866eebf10:  0101003800000500 0000000000000000   ....8...........
ffff96e866eebf20:  ffffffff8dcb28c0 ffffffff8da66d41   .(......Am......
ffff96e866eebf30:  0000000000000000 ffff96e866eebf00   ...........f....
ffff96e866eebf40:  0000000000000000 0000000000000000   ................
ffff96e866eebf50:  ffffffff8d49f640 ffffffff8d49f6d0   @.I.......I.....
ffff96e866eebf60:  0000000200000000 0000000000000000   ................ <-- here is the "obsolete"
ffff96e866eebf70:  0000000000000000 0000000000000000   ................
ffff96e866eebf80:  0000000000000000 0000000103dc0d11   ................
ffff96e866eebf90:  0000000000000000 0000000000000000   ................
ffff96e866eebfa0:  0000000000025287 0000000000000001   .R..............
ffff96e866eebfb0:  0000000000000000 00000000000000fe   ................
ffff96e866eebfc0:  ffff96e866eebfc0 ffff96e866eebfc0   ...f.......f....
ffff96e866eebfd0:  ffff96e8388dfc00 ffff96e8388dfc82   ...8.......8....

Looks like a race somewhere, but i am not familiar with the TCPIP code.
What i have found is a patch that was submitted for 4.14 and could have somethig to do with that:
https://www.spinics.net/lists/stable-commits/msg133055.html
But this patch didn't make it into 4.14.

Can someone check this race condition?

Best regards
Carsten
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race in ipv4 routing
  2021-02-01  9:20 Possible race in ipv4 routing Schmid, Carsten
@ 2021-02-01 15:35 ` David Ahern
  2021-02-01 18:52   ` Wei Wang
  0 siblings, 1 reply; 5+ messages in thread
From: David Ahern @ 2021-02-01 15:35 UTC (permalink / raw)
  To: Schmid, Carsten, davem, kuznet, yoshfuji, Wei Wang; +Cc: netdev

On 2/1/21 2:20 AM, Schmid, Carsten wrote:
> Hi,
> 
> on kernel 4.14(.147) i have seen something weird. The stack trace:
> 
> [65064.457920] BUG: unable to handle kernel NULL pointer dereference at 0000000000000604
> [65064.466677] IP: ip_route_output_key_hash_rcu+0x755/0x850
> [65064.472599] PGD 0 P4D 0
> [65064.475422] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [65064.480277] Modules linked in: bcmdhd(O) nls_cp437 nls_utf8 ebt_ip6 ebt_ip ebtable_filter ebtables squashfs zlib_inflate xz_dec veth lzo lzo_compress lzo_decompress nls_iso8859_1 nls_cp850 vfat fat cfq_iosched sd_mod ah4 esp4 xfrm4_mode_transport tntfs(PO) texfat(PO) usb_storage xfrm_user xfrm_algo cls_u32 cdc_acm sch_htb intel_tfm_governor snd_soc_apl_mgu_hu ecryptfs intel_xhci_usb_role_switch roles dwc3 udc_core intel_ipu4_psys intel_ipu4_psys_csslib adv728x coretemp snd_soc_skl intel_ipu4_isys videobuf2_dma_contig videobuf2_memops sdw_cnl ipu4_acpi snd_soc_acpi_intel_match intel_ipu4_isys_csslib videobuf2_v4l2 snd_soc_acpi videobuf2_core sbi_apl i2c_i801 snd_soc_core snd_compress snd_soc_skl_ipc sdw_bus xhci_pci crc8 ahci snd_soc_sst_ipc xhci_hcd libahci snd_soc_sst_dsp cfg80211 snd_hda_ext_core
> [65064.559290]  libata snd_hda_core intel_ipu4_mmu usbcore rfkill usb_common snd_pcm scsi_mod dwc3_pci snd_timer mei_me snd intel_ipu4 soundcore mei iova nfsd auth_rpcgss lockd grace sunrpc zram zsmalloc loop fuse 8021q bridge stp llc inap560t(O) i915 video backlight intel_gtt i2c_algo_bit drm_kms_helper drm firmware_class igb_avb(O) ptp hwmon pps_core spi_pxa2xx_platform [last unloaded: bcmdhd]
> [65064.598123] CPU: 0 PID: 249 Comm: 6310_io01 Tainted: P     U     O    4.14.147-apl #1
> [65064.606860] task: ffff96e9f19a3200 task.stack: ffffb3cb80450000
> [65064.613465] RIP: 0010:ip_route_output_key_hash_rcu+0x755/0x850
> [65064.619859] Blocking unknown connection: IN= OUT=wlan_bridge SRC=172.16.222.97 DST=224.0.0.22 LEN=40 TOS=0x00 PREC=0xC0 TTL=1 ID=0 DF PROTO=2 MARK=0x1000d4
> [65064.619972] RSP: 0018:ffffb3cb80453940 EFLAGS: 00010246
> [65064.641436] RAX: ffff96e866eebf00 RBX: ffff96e800e5a238 RCX: 0000000000000000
> [65064.649397] RDX: 0000000000000001 RSI: 0000000026c730a0 RDI: 0000000000000000
> [65064.657363] RBP: ffffb3cb80453990 R08: 0000000000000000 R09: ffff96e9f5bdf000
> [65064.665326] R10: 0000000000000000 R11: ffff96e875740540 R12: ffff96e866eebf00
> [65064.673290] R13: ffffb3cb804539a0 R14: ffff96e9f1ba7000 R15: 0000000000000000
> [65064.681252] FS:  00007efc327fc700(0000) GS:ffff96e9fdc00000(0000) knlGS:0000000000000000
> [65064.690284] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [65064.696697] CR2: 0000000000000604 CR3: 00000001f4d44000 CR4: 00000000003406b0
> [65064.704653] Call Trace:
> [65064.707384]  ip_route_output_key_hash+0x82/0xb0
> [65064.712438]  ip_route_output_flow+0x19/0x50
> [65064.717104]  ip_queue_xmit+0x389/0x3c0
> [65064.721286]  __tcp_transmit_skb+0x598/0x9f0
> [65064.725956]  tcp_write_xmit+0x1ab/0xf40
> [65064.730234]  __tcp_push_pending_frames+0x30/0xd0
> [65064.735387]  tcp_push+0xe7/0x110
> [65064.738985]  tcp_sendmsg_locked+0x9a3/0xe40
> [65064.743652]  tcp_sendmsg+0x27/0x40
> [65064.747444]  inet_sendmsg+0x2f/0xf0
> [65064.751332]  sock_sendmsg+0x31/0x40
> [65064.755221]  ___sys_sendmsg+0x28d/0x2a0
> [65064.759500]  ? do_iter_readv_writev+0x103/0x160
> [65064.764557]  ? try_to_wake_up+0x25c/0x460
> [65064.769024]  ? default_wake_function+0xd/0x10
> [65064.773893]  ? __wake_up_common+0x6e/0x120
> [65064.778460]  ? __fget+0x71/0xa0
> [65064.781962]  __sys_sendmsg+0x4f/0x90
> [65064.785946]  ? __sys_sendmsg+0x4f/0x90
> [65064.790125]  SyS_sendmsg+0x9/0x10
> [65064.793818]  do_syscall_64+0x79/0x350
> [65064.797900]  ? schedule+0x2e/0x90
> [65064.801594]  ? exit_to_usermode_loop+0x5a/0x90
> [65064.806549]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> [65064.812183] RIP: 0033:0x7efc3c549807
> [65064.816167] RSP: 002b:00007efc327f9500 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> [65064.824616] RAX: ffffffffffffffda RBX: 0000000000000233 RCX: 00007efc3c549807
> [65064.832575] RDX: 0000000000004000 RSI: 00007efc327f9570 RDI: 0000000000000233
> [65064.840543] RBP: 00007efc327f9570 R08: 0000000000000000 R09: 0000000000000001
> [65064.848500] R10: 00007efc2c087548 R11: 0000000000000293 R12: 0000000000004000
> [65064.856459] R13: 0000000000000233 R14: 00007efc327fc5c0 R15: 00007efc24bf7810
> [65064.864419] Code: b1 72 e9 bc fe ff ff 4c 89 4d c8 4c 89 45 d0 e8 a2 08 be ff 4c 8b 45 d0 4c 8b 4d c8 e9 88 fa ff ff 48 8b 08 48 8b 89 98 04 00 00 <8b> 89 04 06 00 00 39 88 a0 00 00 00 0f 85 9c fe ff ff 8b 80 80
> [65064.885531] RIP: ip_route_output_key_hash_rcu+0x755/0x850 RSP: ffffb3cb80453940
> [65064.893689] CR2: 0000000000000604
> 
> Fortunately i have a core dump, and analyzed that.
> Finally in the analysis i came to that point:
> (net/ipv4/route.c around line 1520):
> 
> static bool rt_cache_valid(const struct rtable *rt)
> {
> returnrt &&
> rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
> !rt_is_expired(rt); <<<< crash here, see below
> }
> 
> The code was executing to the check !rt_is_expired(rt); and the disassembly looks like
> ffffffff814f1ba0:48 85 c0             test   rax,rax // (1127, check for rt not NULL)
> ffffffff814f1ba3:74 0e                je     ffffffff814f1bb3 <ip_route_output_key_hash_rcu+0x603>
> /usr/src/kernel/net/ipv4/route.c:1524
> ffffffff814f1ba5:66 83 78 64 ff       cmp    WORD PTR [rax+0x64],0xffff // rt->dst.obsolete accessed
> __mkroute_output():
> 
> /usr/src/kernel/net/ipv4/route.c:2276
> rth = rcu_dereference(*prth);
> ffffffff814f1baa:49 89 c4             mov    r12,rax // r12= rt
> rt_cache_valid():
> /usr/src/kernel/net/ipv4/route.c:1524
> ffffffff814f1bad:0f 84 48 01 00 00    je     ffffffff814f1cfb <ip_route_output_key_hash_rcu+0x74b> goes to ***1***
> 
> the only reference reaching this point is after the DST_OBSOLETE_FORCE_CHK above:
> read_pnet():
> /usr/src/kernel/include/net/net_namespace.h:282
> ffffffff814f1cfb:48 8b 08             mov    rcx,QWORD PTR [rax]       <<<<<<***1*** need to come in here because previous instruction is a "jmp"
> fetches net_device *dev into rcx, value is ffff96e866eeb000
> 
> ffffffff814f1cfe:48 8b 89 98 04 00 00 mov    rcx,QWORD PTR [rcx+0x498]
>     crash> rd 0xffff96e866eeb490 4
>     ffff96e866eeb490:  0000000000000000 0000000000000000   ................ that's a NULL pointer here
>     ffff96e866eeb4a0:  8000000000025287 0000000000010002   .R..............
> crash>__read_once_size():
> /usr/src/kernel/include/linux/compiler.h:183
> ffffffff814f1d05:8b 89 04 06 00 00    mov    ecx,DWORD PTR [rcx+0x604] <<<<<<***** crash here. rcx = NULL, offset 0x604
> 
> Looking at the data i can see that dst.obsolete=0x0002 (i ordered the memdump into the struct), so why did we reach that point?
> R12 = rt = ffff96e866eebf00 = struct rtable
> SIZE: 216
> struct rtable {
>    [0x0] struct dst_entry dst;
>         struct dst_entry {
>            [0x0] struct net_device *dev;                        ffff96e866eeb000
>            [0x8] struct callback_head callback_head;
>                 struct callback_head {
>                     [0x0] struct callback_head *next;           0000000000000000
>                     [0x8] void (*func)(struct callback_head *); 0101003800000500
>         }
>         SIZE: 0x10
>           [0x18] struct dst_entry *child;                       0000000000000000
>           [0x20] struct dst_ops *ops;                           ffffffff8dcb28c0
>           [0x28] unsigned long _metrics;                        ffffffff8da66d41
>           [0x30] unsigned long expires;                         0000000000000000
>           [0x38] struct dst_entry *path;                        ffff96e866eebf00
>           [0x40] struct dst_entry *from;                        0000000000000000
>           [0x48] struct xfrm_state *xfrm;                       0000000000000000
>           [0x50] int (*input)(struct sk_buff *);                ffffffff8d49f640
>           [0x58] int (*output)(struct net *, struct sock *, struct sk_buff *);ffffffff8d49f6d0
>           [0x60] unsigned short flags;                          0000
>           [0x62] short error;                                   0000
>           [0x64] short obsolete;                                0002 <<<< we have not ffff here, but 0x02
>           [0x66] unsigned short header_len;                     0000
>           [0x68] unsigned short trailer_len;                    0000
>           [0x6a] unsigned short __pad3;                         0000
>           [0x6c] __u32 __pad2;                                  00000000
>           [0x70] long __pad_to_align_refcnt[2];                 0000000000000000 0000000000000000
>           [0x80] atomic_t __refcnt;                             000000000
>           [0x84] int __use;                                     000000000
>           [0x88] unsigned long lastuse;                         0000000103dc0d11
>           [0x90] struct lwtunnel_state *lwtstate;               0000000000000000
>                  union {
>           [0x98]     struct dst_entry *next;                    0000000000000000
>           [0x98]     struct rtable *rt_next;
>           [0x98]     struct rt6_info *rt6_next;
>           [0x98]     struct dn_route *dn_next;
>                  };
>         }
>         SIZE: 0xa0
> --- snip ---
> 
> crash> rd 0xffff96e866eebf00 32 (=r12=rt)
> ffff96e866eebf00:  ffff96e866eeb000 0000000000000000   ...f............
> ffff96e866eebf10:  0101003800000500 0000000000000000   ....8...........
> ffff96e866eebf20:  ffffffff8dcb28c0 ffffffff8da66d41   .(......Am......
> ffff96e866eebf30:  0000000000000000 ffff96e866eebf00   ...........f....
> ffff96e866eebf40:  0000000000000000 0000000000000000   ................
> ffff96e866eebf50:  ffffffff8d49f640 ffffffff8d49f6d0   @.I.......I.....
> ffff96e866eebf60:  0000000200000000 0000000000000000   ................ <-- here is the "obsolete"
> ffff96e866eebf70:  0000000000000000 0000000000000000   ................
> ffff96e866eebf80:  0000000000000000 0000000103dc0d11   ................
> ffff96e866eebf90:  0000000000000000 0000000000000000   ................
> ffff96e866eebfa0:  0000000000025287 0000000000000001   .R..............
> ffff96e866eebfb0:  0000000000000000 00000000000000fe   ................
> ffff96e866eebfc0:  ffff96e866eebfc0 ffff96e866eebfc0   ...f.......f....
> ffff96e866eebfd0:  ffff96e8388dfc00 ffff96e8388dfc82   ...8.......8....
> 
> Looks like a race somewhere, but i am not familiar with the TCPIP code.
> What i have found is a patch that was submitted for 4.14 and could have somethig to do with that:
> https://www.spinics.net/lists/stable-commits/msg133055.html
> But this patch didn't make it into 4.14.
> 
> Can someone check this race condition?
> 

dst->dev is NULL. Adding author of the patch for thoughts.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: Possible race in ipv4 routing
  2021-02-01 15:35 ` David Ahern
@ 2021-02-01 18:52   ` Wei Wang
  2021-04-14  9:18     ` AW: " Schmid, Carsten
  2021-04-16 12:29     ` Schmid, Carsten
  0 siblings, 2 replies; 5+ messages in thread
From: Wei Wang @ 2021-02-01 18:52 UTC (permalink / raw)
  To: David Ahern; +Cc: Schmid, Carsten, davem, kuznet, yoshfuji, netdev

On Mon, Feb 1, 2021 at 7:35 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 2/1/21 2:20 AM, Schmid, Carsten wrote:
> > Hi,
> >
> > on kernel 4.14(.147) i have seen something weird. The stack trace:
> >
> > [65064.457920] BUG: unable to handle kernel NULL pointer dereference at 0000000000000604
> > [65064.466677] IP: ip_route_output_key_hash_rcu+0x755/0x850
> > [65064.472599] PGD 0 P4D 0
> > [65064.475422] Oops: 0000 [#1] PREEMPT SMP NOPTI
> > [65064.480277] Modules linked in: bcmdhd(O) nls_cp437 nls_utf8 ebt_ip6 ebt_ip ebtable_filter ebtables squashfs zlib_inflate xz_dec veth lzo lzo_compress lzo_decompress nls_iso8859_1 nls_cp850 vfat fat cfq_iosched sd_mod ah4 esp4 xfrm4_mode_transport tntfs(PO) texfat(PO) usb_storage xfrm_user xfrm_algo cls_u32 cdc_acm sch_htb intel_tfm_governor snd_soc_apl_mgu_hu ecryptfs intel_xhci_usb_role_switch roles dwc3 udc_core intel_ipu4_psys intel_ipu4_psys_csslib adv728x coretemp snd_soc_skl intel_ipu4_isys videobuf2_dma_contig videobuf2_memops sdw_cnl ipu4_acpi snd_soc_acpi_intel_match intel_ipu4_isys_csslib videobuf2_v4l2 snd_soc_acpi videobuf2_core sbi_apl i2c_i801 snd_soc_core snd_compress snd_soc_skl_ipc sdw_bus xhci_pci crc8 ahci snd_soc_sst_ipc xhci_hcd libahci snd_soc_sst_dsp cfg80211 snd_hda_ext_core
> > [65064.559290]  libata snd_hda_core intel_ipu4_mmu usbcore rfkill usb_common snd_pcm scsi_mod dwc3_pci snd_timer mei_me snd intel_ipu4 soundcore mei iova nfsd auth_rpcgss lockd grace sunrpc zram zsmalloc loop fuse 8021q bridge stp llc inap560t(O) i915 video backlight intel_gtt i2c_algo_bit drm_kms_helper drm firmware_class igb_avb(O) ptp hwmon pps_core spi_pxa2xx_platform [last unloaded: bcmdhd]
> > [65064.598123] CPU: 0 PID: 249 Comm: 6310_io01 Tainted: P     U     O    4.14.147-apl #1
> > [65064.606860] task: ffff96e9f19a3200 task.stack: ffffb3cb80450000
> > [65064.613465] RIP: 0010:ip_route_output_key_hash_rcu+0x755/0x850
> > [65064.619859] Blocking unknown connection: IN= OUT=wlan_bridge SRC=172.16.222.97 DST=224.0.0.22 LEN=40 TOS=0x00 PREC=0xC0 TTL=1 ID=0 DF PROTO=2 MARK=0x1000d4
> > [65064.619972] RSP: 0018:ffffb3cb80453940 EFLAGS: 00010246
> > [65064.641436] RAX: ffff96e866eebf00 RBX: ffff96e800e5a238 RCX: 0000000000000000
> > [65064.649397] RDX: 0000000000000001 RSI: 0000000026c730a0 RDI: 0000000000000000
> > [65064.657363] RBP: ffffb3cb80453990 R08: 0000000000000000 R09: ffff96e9f5bdf000
> > [65064.665326] R10: 0000000000000000 R11: ffff96e875740540 R12: ffff96e866eebf00
> > [65064.673290] R13: ffffb3cb804539a0 R14: ffff96e9f1ba7000 R15: 0000000000000000
> > [65064.681252] FS:  00007efc327fc700(0000) GS:ffff96e9fdc00000(0000) knlGS:0000000000000000
> > [65064.690284] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [65064.696697] CR2: 0000000000000604 CR3: 00000001f4d44000 CR4: 00000000003406b0
> > [65064.704653] Call Trace:
> > [65064.707384]  ip_route_output_key_hash+0x82/0xb0
> > [65064.712438]  ip_route_output_flow+0x19/0x50
> > [65064.717104]  ip_queue_xmit+0x389/0x3c0
> > [65064.721286]  __tcp_transmit_skb+0x598/0x9f0
> > [65064.725956]  tcp_write_xmit+0x1ab/0xf40
> > [65064.730234]  __tcp_push_pending_frames+0x30/0xd0
> > [65064.735387]  tcp_push+0xe7/0x110
> > [65064.738985]  tcp_sendmsg_locked+0x9a3/0xe40
> > [65064.743652]  tcp_sendmsg+0x27/0x40
> > [65064.747444]  inet_sendmsg+0x2f/0xf0
> > [65064.751332]  sock_sendmsg+0x31/0x40
> > [65064.755221]  ___sys_sendmsg+0x28d/0x2a0
> > [65064.759500]  ? do_iter_readv_writev+0x103/0x160
> > [65064.764557]  ? try_to_wake_up+0x25c/0x460
> > [65064.769024]  ? default_wake_function+0xd/0x10
> > [65064.773893]  ? __wake_up_common+0x6e/0x120
> > [65064.778460]  ? __fget+0x71/0xa0
> > [65064.781962]  __sys_sendmsg+0x4f/0x90
> > [65064.785946]  ? __sys_sendmsg+0x4f/0x90
> > [65064.790125]  SyS_sendmsg+0x9/0x10
> > [65064.793818]  do_syscall_64+0x79/0x350
> > [65064.797900]  ? schedule+0x2e/0x90
> > [65064.801594]  ? exit_to_usermode_loop+0x5a/0x90
> > [65064.806549]  entry_SYSCALL_64_after_hwframe+0x3d/0xa2
> > [65064.812183] RIP: 0033:0x7efc3c549807
> > [65064.816167] RSP: 002b:00007efc327f9500 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
> > [65064.824616] RAX: ffffffffffffffda RBX: 0000000000000233 RCX: 00007efc3c549807
> > [65064.832575] RDX: 0000000000004000 RSI: 00007efc327f9570 RDI: 0000000000000233
> > [65064.840543] RBP: 00007efc327f9570 R08: 0000000000000000 R09: 0000000000000001
> > [65064.848500] R10: 00007efc2c087548 R11: 0000000000000293 R12: 0000000000004000
> > [65064.856459] R13: 0000000000000233 R14: 00007efc327fc5c0 R15: 00007efc24bf7810
> > [65064.864419] Code: b1 72 e9 bc fe ff ff 4c 89 4d c8 4c 89 45 d0 e8 a2 08 be ff 4c 8b 45 d0 4c 8b 4d c8 e9 88 fa ff ff 48 8b 08 48 8b 89 98 04 00 00 <8b> 89 04 06 00 00 39 88 a0 00 00 00 0f 85 9c fe ff ff 8b 80 80
> > [65064.885531] RIP: ip_route_output_key_hash_rcu+0x755/0x850 RSP: ffffb3cb80453940
> > [65064.893689] CR2: 0000000000000604
> >
> > Fortunately i have a core dump, and analyzed that.
> > Finally in the analysis i came to that point:
> > (net/ipv4/route.c around line 1520):
> >
> > static bool rt_cache_valid(const struct rtable *rt)
> > {
> > returnrt &&
> > rt->dst.obsolete == DST_OBSOLETE_FORCE_CHK &&
> > !rt_is_expired(rt); <<<< crash here, see below
> > }
> >
> > The code was executing to the check !rt_is_expired(rt); and the disassembly looks like
> > ffffffff814f1ba0:48 85 c0             test   rax,rax // (1127, check for rt not NULL)
> > ffffffff814f1ba3:74 0e                je     ffffffff814f1bb3 <ip_route_output_key_hash_rcu+0x603>
> > /usr/src/kernel/net/ipv4/route.c:1524
> > ffffffff814f1ba5:66 83 78 64 ff       cmp    WORD PTR [rax+0x64],0xffff // rt->dst.obsolete accessed
> > __mkroute_output():
> >
> > /usr/src/kernel/net/ipv4/route.c:2276
> > rth = rcu_dereference(*prth);
> > ffffffff814f1baa:49 89 c4             mov    r12,rax // r12= rt
> > rt_cache_valid():
> > /usr/src/kernel/net/ipv4/route.c:1524
> > ffffffff814f1bad:0f 84 48 01 00 00    je     ffffffff814f1cfb <ip_route_output_key_hash_rcu+0x74b> goes to ***1***
> >
> > the only reference reaching this point is after the DST_OBSOLETE_FORCE_CHK above:
> > read_pnet():
> > /usr/src/kernel/include/net/net_namespace.h:282
> > ffffffff814f1cfb:48 8b 08             mov    rcx,QWORD PTR [rax]       <<<<<<***1*** need to come in here because previous instruction is a "jmp"
> > fetches net_device *dev into rcx, value is ffff96e866eeb000
> >
> > ffffffff814f1cfe:48 8b 89 98 04 00 00 mov    rcx,QWORD PTR [rcx+0x498]
> >     crash> rd 0xffff96e866eeb490 4
> >     ffff96e866eeb490:  0000000000000000 0000000000000000   ................ that's a NULL pointer here
> >     ffff96e866eeb4a0:  8000000000025287 0000000000010002   .R..............
> > crash>__read_once_size():
> > /usr/src/kernel/include/linux/compiler.h:183
> > ffffffff814f1d05:8b 89 04 06 00 00    mov    ecx,DWORD PTR [rcx+0x604] <<<<<<***** crash here. rcx = NULL, offset 0x604
> >
> > Looking at the data i can see that dst.obsolete=0x0002 (i ordered the memdump into the struct), so why did we reach that point?
> > R12 = rt = ffff96e866eebf00 = struct rtable
> > SIZE: 216
> > struct rtable {
> >    [0x0] struct dst_entry dst;
> >         struct dst_entry {
> >            [0x0] struct net_device *dev;                        ffff96e866eeb000
> >            [0x8] struct callback_head callback_head;
> >                 struct callback_head {
> >                     [0x0] struct callback_head *next;           0000000000000000
> >                     [0x8] void (*func)(struct callback_head *); 0101003800000500
> >         }
> >         SIZE: 0x10
> >           [0x18] struct dst_entry *child;                       0000000000000000
> >           [0x20] struct dst_ops *ops;                           ffffffff8dcb28c0
> >           [0x28] unsigned long _metrics;                        ffffffff8da66d41
> >           [0x30] unsigned long expires;                         0000000000000000
> >           [0x38] struct dst_entry *path;                        ffff96e866eebf00
> >           [0x40] struct dst_entry *from;                        0000000000000000
> >           [0x48] struct xfrm_state *xfrm;                       0000000000000000
> >           [0x50] int (*input)(struct sk_buff *);                ffffffff8d49f640
> >           [0x58] int (*output)(struct net *, struct sock *, struct sk_buff *);ffffffff8d49f6d0
> >           [0x60] unsigned short flags;                          0000
> >           [0x62] short error;                                   0000
> >           [0x64] short obsolete;                                0002 <<<< we have not ffff here, but 0x02
> >           [0x66] unsigned short header_len;                     0000
> >           [0x68] unsigned short trailer_len;                    0000
> >           [0x6a] unsigned short __pad3;                         0000
> >           [0x6c] __u32 __pad2;                                  00000000
> >           [0x70] long __pad_to_align_refcnt[2];                 0000000000000000 0000000000000000
> >           [0x80] atomic_t __refcnt;                             000000000
> >           [0x84] int __use;                                     000000000
> >           [0x88] unsigned long lastuse;                         0000000103dc0d11
> >           [0x90] struct lwtunnel_state *lwtstate;               0000000000000000
> >                  union {
> >           [0x98]     struct dst_entry *next;                    0000000000000000
> >           [0x98]     struct rtable *rt_next;
> >           [0x98]     struct rt6_info *rt6_next;
> >           [0x98]     struct dn_route *dn_next;
> >                  };
> >         }
> >         SIZE: 0xa0
> > --- snip ---
> >
> > crash> rd 0xffff96e866eebf00 32 (=r12=rt)
> > ffff96e866eebf00:  ffff96e866eeb000 0000000000000000   ...f............
> > ffff96e866eebf10:  0101003800000500 0000000000000000   ....8...........
> > ffff96e866eebf20:  ffffffff8dcb28c0 ffffffff8da66d41   .(......Am......
> > ffff96e866eebf30:  0000000000000000 ffff96e866eebf00   ...........f....
> > ffff96e866eebf40:  0000000000000000 0000000000000000   ................
> > ffff96e866eebf50:  ffffffff8d49f640 ffffffff8d49f6d0   @.I.......I.....
> > ffff96e866eebf60:  0000000200000000 0000000000000000   ................ <-- here is the "obsolete"
> > ffff96e866eebf70:  0000000000000000 0000000000000000   ................
> > ffff96e866eebf80:  0000000000000000 0000000103dc0d11   ................
> > ffff96e866eebf90:  0000000000000000 0000000000000000   ................
> > ffff96e866eebfa0:  0000000000025287 0000000000000001   .R..............
> > ffff96e866eebfb0:  0000000000000000 00000000000000fe   ................
> > ffff96e866eebfc0:  ffff96e866eebfc0 ffff96e866eebfc0   ...f.......f....
> > ffff96e866eebfd0:  ffff96e8388dfc00 ffff96e8388dfc82   ...8.......8....
> >
> > Looks like a race somewhere, but i am not familiar with the TCPIP code.
> > What i have found is a patch that was submitted for 4.14 and could have somethig to do with that:
> > https://www.spinics.net/lists/stable-commits/msg133055.html
> > But this patch didn't make it into 4.14.
> >
> > Can someone check this race condition?
> >
>
> dst->dev is NULL. Adding author of the patch for thoughts.

It definitely looks like the race described in
https://www.spinics.net/lists/stable-commits/msg133055.html, from all
the evidence above.
However, I am not very sure, how rt_is_expired() could crash with NULL
ptr. I don't think dst->dev could be NULL even after calling
dst_dev_put(), cause we assign loopback_dev to dst->dev there.
Also, Carsten mentioned the memdump shows 'dst.obsolete=0x0002'. In
rt_is_expired(), if 'dst.obsolete=0x0002', we should not call
rt_is_expired(). So there might be some memory barrier that is
required in dst_dev_put()?
But anyway, since the fix above replaced dst_dev_put() with
rt_add_uncached_list(), I believe this crash should also be fixed.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* AW: Possible race in ipv4 routing
  2021-02-01 18:52   ` Wei Wang
@ 2021-04-14  9:18     ` Schmid, Carsten
  2021-04-16 12:29     ` Schmid, Carsten
  1 sibling, 0 replies; 5+ messages in thread
From: Schmid, Carsten @ 2021-04-14  9:18 UTC (permalink / raw)
  To: Wei Wang, David Ahern; +Cc: davem, kuznet, yoshfuji, netdev

Hi again,

> -----Ursprüngliche Nachricht-----
> Von: Wei Wang [mailto:weiwan@google.com]
> Gesendet: Montag, 1. Februar 2021 19:53
> An: David Ahern <dsahern@gmail.com>
> Cc: Schmid, Carsten <Carsten_Schmid@mentor.com>;
> davem@davemloft.net; kuznet@ms2.inr.ac.ru; yoshfuji@linux-ipv6.org;
> netdev@vger.kernel.org
> Betreff: Re: Possible race in ipv4 routing
>
> On Mon, Feb 1, 2021 at 7:35 AM David Ahern <dsahern@gmail.com> wrote:
> >
> > On 2/1/21 2:20 AM, Schmid, Carsten wrote:
> > > Hi,
> > >
> > > on kernel 4.14(.147) i have seen something weird. The stack trace:
> > >
> > > [65064.457920] BUG: unable to handle kernel NULL pointer dereference
> at 0000000000000604
> > > [65064.466677] IP: ip_route_output_key_hash_rcu+0x755/0x850
-------8<-------------
...
-------8<-------------

> > > Fortunately i have a core dump, and analyzed that.
i again have a core dump and ...
... the pointer is not NULL this time:
(see RCX content, which has the pointer's content and is accessed)

[44087.587354] general protection fault: 0000 [#1] PREEMPT SMP NOPTI
[44087.594172] Modules linked in: bcmdhd(O) ebt_ip6 ebt_ip ebtable_filter ebtables squashfs zlib_inflate xz_dec veth lzo lzo_compress lzo_decompress esp4 ah4 xfrm4_mode_transport xfrm_user xfrm_algo cls_u32 sch_htb cdc_acm intel_tfm_governor intel_ipu4_psys intel_ipu4_psys_csslib ecryptfs snd_soc_apl_mgu_hu intel_xhci_usb_role_switch roles dwc3 adv728x udc_core intel_ipu4_isys videobuf2_dma_contig videobuf2_memops ipu4_acpi intel_ipu4_isys_csslib coretemp snd_soc_skl videobuf2_v4l2 videobuf2_core sdw_cnl snd_soc_acpi_intel_match snd_soc_acpi snd_soc_core sbi_apl snd_compress i2c_i801 snd_soc_skl_ipc sdw_bus crc8 intel_ipu4_mmu snd_soc_sst_ipc snd_soc_sst_dsp ahci snd_hda_ext_core libahci xhci_pci snd_hda_core mei_me xhci_hcd snd_pcm libata cfg80211 mei snd_timer usbcore dwc3_pci snd scsi_mod usb_common
[44087.607643] skl_ipc_process_reply: 65 callbacks suppressed
[44087.679547]  soundcore rfkill intel_ipu4 iova nfsd auth_rpcgss lockd grace sunrpc zram zsmalloc loop fuse 8021q bridge stp llc inap560t(O) i915 video backlight intel_gtt i2c_algo_bit drm_kms_helper drm firmware_class igb_avb(O) ptp hwmon spi_pxa2xx_platform pps_core [last unloaded: bcmdhd]
[44087.708253] CPU: 1 PID: 248 Comm: 6310_io03 Tainted: G     U     O    4.14.198-apl #1
[44087.717004] task: ffffa0d273b75780 task.stack: ffffa28cc0714000
[44087.723623] RIP: 0010:ip_route_output_key_hash_rcu+0x763/0x860
[44087.730138] RSP: 0018:ffffa28cc0717940 EFLAGS: 00010206
[44087.735976] RAX: ffffa0d0d1f40b00 RBX: ffffa0d26e6d6038 RCX: 000499f24000000a
[44087.743961] RDX: 0000000000000001 RSI: 0000000060c730a0 RDI: 0000000000000000
[44087.751942] RBP: ffffa28cc0717990 R08: 0000000000000000 R09: ffffa0d272e29200
[44087.759919] R10: 0000000000000000 R11: ffffa0d0c60ec300 R12: ffffa0d0d1f40b00
[44087.767897] R13: ffffa28cc07179a0 R14: ffffa0d272e31000 R15: 0000000000000000
[44087.775870] FS:  00007f85affff700(0000) GS:ffffa0d27fc80000(0000) knlGS:0000000000000000
[44087.784917] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[44087.791338] CR2: 00007f3837ec1000 CR3: 00000001f70d4000 CR4: 00000000003406a0
[44087.799314] Call Trace:
[44087.802048]  ip_route_output_key_hash+0x82/0xb0
[44087.807119]  ip_route_output_flow+0x19/0x50
[44087.811798]  ip_queue_xmit+0x389/0x3c0
[44087.815986]  __tcp_transmit_skb+0x598/0x9f0
[44087.820658]  tcp_write_xmit+0x1b7/0xf50
[44087.824942]  __tcp_push_pending_frames+0x30/0xd0
[44087.830109]  tcp_push+0xe7/0x110
[44087.833712]  tcp_sendmsg_locked+0x9ac/0xe50
[44087.838386]  ? __switch_to_asm+0x41/0x70
[44087.842770]  tcp_sendmsg+0x27/0x40
[44087.846574]  inet_sendmsg+0x2f/0xf0
[44087.850468]  sock_sendmsg+0x31/0x40
[44087.854363]  ___sys_sendmsg+0x28d/0x2a0
[44087.858646]  ? wake_up_q+0x54/0x80
[44087.862444]  ? futex_wake+0x8a/0x180
[44087.866437]  ? do_futex+0xc0/0xc60
[44087.870234]  ? tick_program_event+0x3f/0x70
[44087.874912]  ? __fget+0x71/0xa0
[44087.878422]  __sys_sendmsg+0x4f/0x90
[44087.882412]  ? __sys_sendmsg+0x4f/0x90
[44087.886591]  SyS_sendmsg+0x9/0x10
[44087.890290]  do_syscall_64+0x79/0x350
[44087.894380]  ? schedule+0x2e/0x90
[44087.898079]  ? exit_to_usermode_loop+0x5a/0x90
[44087.903040]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[44087.908680] RIP: 0033:0x7f85bb79e807
[44087.912667] RSP: 002b:00007f85afffceb0 EFLAGS: 00000293 ORIG_RAX: 000000000000002e
[44087.921131] RAX: ffffffffffffffda RBX: 0000000000000132 RCX: 00007f85bb79e807
[44087.929116] RDX: 0000000000004000 RSI: 00007f85afffcf20 RDI: 0000000000000132
[44087.937083] RBP: 00007f85afffcf20 R08: 0000000000000000 R09: 0000000000000000
[44087.945057] R10: 000000000536591c R11: 0000000000000293 R12: 0000000000004000
[44087.953022] R13: 0000560a8a4d5208 R14: 00007f85affff5c0 R15: 0000560a8a8a7040
[44087.960994] Code: 4d c8 4c 89 45 d0 e8 dd d2 bd ff 4c 8b 45 d0 4c 8b 4d c8 e9 83 fa ff ff 48 8b 08 48 8b 89 98 04 00 00 48 85 c9 0f 84 a5 fe ff ff <8b> 89 04 06 00 00 39 88 a0 00 00 00 0f 85 93 fe ff ff 8b 80 80
[44087.982134] RIP: ip_route_output_key_hash_rcu+0x763/0x860 RSP: ffffa28cc0717940

> > > https://www.spinics.net/lists/stable-commits/msg133055.html
> > > But this patch didn't make it into 4.14.
> > >
> > > Can someone check this race condition?
> > >
> >
> > dst->dev is NULL. Adding author of the patch for thoughts.
>
> It definitely looks like the race described in
> https://www.spinics.net/lists/stable-commits/msg133055.html, from all
> the evidence above.
> However, I am not very sure, how rt_is_expired() could crash with NULL
> ptr. I don't think dst->dev could be NULL even after calling
> dst_dev_put(), cause we assign loopback_dev to dst->dev there.
> Also, Carsten mentioned the memdump shows 'dst.obsolete=0x0002'. In
> rt_is_expired(), if 'dst.obsolete=0x0002', we should not call
> rt_is_expired(). So there might be some memory barrier that is
> required in dst_dev_put()?

Do we have this part locked, between checking dst.obsolete=0x0002 and
calling rt_is_expired?

> But anyway, since the fix above replaced dst_dev_put() with
> rt_add_uncached_list(), I believe this crash should also be fixed.

The crash that is seen here has the fix applied, it was a trivial
"move the code a bit" for adaption into 4.14. (and it is meanwhile merged
into 4.14 stable).
Additionally, in my kernel i added a NULL pointer check which obviously doesn't
really help here (this is not upstream):
 static inline bool rt_is_expired(const struct rtable *rth)
 {
-return rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev));
+return (dev_net(rth->dst.dev) == NULL) ||
+   (rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev)));
 }

Any thoughts what could cause this?
For me it looks like a UAF and/or a race.
What i have also found is:
https://syzkaller.appspot.com/bug?id=1157f1b5ae3cf62f60d725868ae9fc53e9050aa0


Best regards
Carsten
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

^ permalink raw reply	[flat|nested] 5+ messages in thread

* AW: Possible race in ipv4 routing
  2021-02-01 18:52   ` Wei Wang
  2021-04-14  9:18     ` AW: " Schmid, Carsten
@ 2021-04-16 12:29     ` Schmid, Carsten
  1 sibling, 0 replies; 5+ messages in thread
From: Schmid, Carsten @ 2021-04-16 12:29 UTC (permalink / raw)
  To: Wei Wang, David Ahern; +Cc: davem, kuznet, yoshfuji, netdev

Hi again,

> > > > on kernel 4.14(.147) i have seen something weird. The stack trace:
> > > >
> > > > [65064.457920] BUG: unable to handle kernel NULL pointer dereference
> > at 0000000000000604
> > > > [65064.466677] IP: ip_route_output_key_hash_rcu+0x755/0x850

I have another crash with a very different stack, but also ending up in
rt_is_expired but this time from an interrupt context:
[14330.764616] BUG: unable to handle kernel NULL pointer dereference at 0000000000000608
[14330.773382] IP: ipv4_dst_check+0x1c/0x40
[14330.777761] PGD 0 P4D 0
[14330.780586] Oops: 0000 [#1] PREEMPT SMP NOPTI
[14330.785451] Modules linked in: bcmdhd(O) squashfs zlib_inflate xz_dec ebt_ip6 ebt_ip ebtable_filter ebtables veth lzo lzo_compress lzo_decompress ah4 xfrm4_mode_transport nls_iso8859_1 nls_cp850 vfat fat cfq_iosched sd_mod xfrm_user xfrm_algo tntfs(PO) texfat(PO) usb_storage cls_u32 sch_htb intel_tfm_governor ecryptfs snd_soc_apl_mgu_hu intel_ipu4_psys intel_xhci_usb_role_switch dwc3 roles intel_ipu4_psys_csslib udc_core adv728x coretemp intel_ipu4_isys videobuf2_dma_contig i2c_i801 videobuf2_memops snd_soc_skl ipu4_acpi intel_ipu4_isys_csslib sdw_cnl videobuf2_v4l2 snd_soc_acpi_intel_match videobuf2_core snd_soc_acpi snd_soc_core sbi_apl snd_compress snd_soc_skl_ipc sdw_bus crc8 snd_soc_sst_ipc snd_soc_sst_dsp snd_hda_ext_core intel_ipu4_mmu snd_hda_core ahci snd_pcm xhci_pci xhci_hcd libahci snd_timer
[14330.865056]  cfg80211 snd libata soundcore intel_ipu4 usbcore mei_me rfkill dwc3_pci usb_common scsi_mod iova mei nfsd auth_rpcgss lockd grace sunrpc zram zsmalloc loop fuse 8021q bridge stp llc inap560t(O) i915 video backlight intel_gtt i2c_algo_bit igb_avb(O) drm_kms_helper drm ptp pps_core hwmon spi_pxa2xx_platform firmware_class [last unloaded: bcmdhd]
[14330.900358] CPU: 0 PID: 251 Comm: 6310_io03 Tainted: P     U     O    4.14.198-apl #1
[14330.909105] task: ffff93fd77bd4b00 task.stack: ffff9a0ec0538000
[14330.915712] RIP: 0010:ipv4_dst_check+0x1c/0x40
[14330.920671] RSP: 0018:ffff93fd7fc03ca0 EFLAGS: 00010202
[14330.926504] RAX: 0000000000000000 RBX: ffff93fd6b0def00 RCX: 0000000000000001
[14330.934474] RDX: 0000000000000004 RSI: 0000000000000000 RDI: ffff93fba73b1e00
[14330.942444] RBP: ffff93fd7fc03ca0 R08: 0000000079044900 R09: 0000000000005ba0
[14330.950414] R10: 63c730a037c730a0 R11: 0000000000000001 R12: ffff93fd2b634d80
[14330.958381] R13: ffffffffa26ac800 R14: ffff93fd76c08000 R15: 0000000000000008
[14330.966352] FS:  00007f1b37fff700(0000) GS:ffff93fd7fc00000(0000) knlGS:0000000000000000
[14330.975391] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[14330.981807] CR2: 0000000000000608 CR3: 00000001f6884000 CR4: 00000000003406b0
[14330.989777] Call Trace:
[14330.992505]  <IRQ>
[14330.994750]  tcp_v4_early_demux+0xf1/0x150
[14330.999326]  ip_rcv_finish+0x152/0x370
[14331.003509]  ip_rcv+0x25d/0x370
[14331.007004]  ? ip_local_deliver_finish+0x210/0x210
[14331.012356]  __netif_receive_skb_core+0x202/0x8d0
[14331.017609]  __netif_receive_skb+0x13/0x60
[14331.022181]  netif_receive_skb_internal+0x3b/0x110
[14331.027521]  napi_gro_receive+0xe9/0x110
[14331.031910]  igb_poll+0x6a4/0x1300 [igb_avb]
[14331.036682]  net_rx_action+0xdb/0x320
[14331.040771]  __do_softirq+0xd0/0x308
[14331.044765]  irq_exit+0xba/0xc0
[14331.048268]  do_IRQ+0x81/0xd0
[14331.051572]  common_interrupt+0x8d/0x8d
[14331.055853]  </IRQ>
[14331.058186] RIP: 0010:unix_write_space+0x1/0xa0
[14331.063243] RSP: 0018:ffff9a0ec053bab0 EFLAGS: 00000212 ORIG_RAX: ffffffffffffff4d
[14331.071702] RAX: ffffffffa1f5ed80 RBX: ffff93fb87832800 RCX: ffff93fb878320c8
[14331.079673] RDX: 0000000000000000 RSI: 0000000000000246 RDI: ffff93fb87832800
[14331.087644] RBP: ffff9a0ec053bac8 R08: ffff9a0ec053c000 R09: ffff93fd2787ea31
[14331.095615] R10: ffff9a0ec053bcf8 R11: 0000000000000000 R12: ffff93fd4a4aa300
[14331.103587] R13: ffff93fb878320c8 R14: 0000000000000041 R15: 0000000000000000
[14331.111562]  ? unix_seqpacket_sendmsg+0x50/0x50
[14331.116624]  ? sock_wfree+0x39/0x60
[14331.120516]  unix_destruct_scm+0x7e/0xa0
[14331.124897]  skb_release_head_state+0x5b/0xb0
[14331.129760]  skb_release_all+0xd/0x30
[14331.133847]  consume_skb+0x2b/0xb0
[14331.137643]  unix_stream_read_generic+0x7e0/0x8d0
[14331.142900]  ? import_iovec+0x3a/0xe0
[14331.146987]  unix_stream_recvmsg+0x4c/0x70
[14331.151562]  ? unix_state_double_unlock+0x30/0x30
[14331.156816]  sock_recvmsg+0x3b/0x50
[14331.160710]  ___sys_recvmsg+0xd4/0x180
[14331.164899]  ? ktime_get+0x3e/0xa0
[14331.168694]  ? clockevents_program_min_delta+0x53/0x100
[14331.174530]  ? clockevents_program_event+0xee/0x110
[14331.179978]  ? tick_program_event+0x3f/0x70
[14331.184650]  ? __fget+0x71/0xa0
[14331.188157]  __sys_recvmsg+0x4c/0x90
[14331.192147]  ? __sys_recvmsg+0x4c/0x90
[14331.196334]  SyS_recvmsg+0x9/0x10
[14331.200033]  do_syscall_64+0x79/0x350
[14331.204120]  ? schedule+0x2e/0x90
[14331.207818]  ? exit_to_usermode_loop+0x5a/0x90
[14331.212772]  entry_SYSCALL_64_after_hwframe+0x41/0xa6
[14331.218413] RIP: 0033:0x7f1b439a8767
[14331.222401] RSP: 002b:00007f1b37ffdb80 EFLAGS: 00000293 ORIG_RAX: 000000000000002f
[14331.230857] RAX: ffffffffffffffda RBX: 00000000000000be RCX: 00007f1b439a8767
[14331.238827] RDX: 0000000000000000 RSI: 00007f1b37ffdbf0 RDI: 00000000000000be
[14331.246796] RBP: 00007f1b37ffdbf0 R08: 0000000000000000 R09: 0000000000000000
[14331.254766] R10: 0000000000000076 R11: 0000000000000293 R12: 0000000000000000
[14331.262735] R13: 00007f1b37fff5c0 R14: 00007f1b3815d470 R15: 0000000200000001
[14331.270705] Code: 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 66 83 7f 64 ff 55 48 89 e5 75 23 48 8b 07 48 8b 90 98 04 00 00 31 c0 48 85 d2 74 10 <8b> 92 04 06 00 00 39 97 a0 00 00 00 48 0f 44 c7 5d c3 31 c0 5d
[14331.291831] RIP: ipv4_dst_check+0x1c/0x40 RSP: ffff93fd7fc03ca0
[14331.298444] CR2: 0000000000000608

INDIRECT_CALLABLE_SCOPE struct dst_entry *ipv4_dst_check(struct dst_entry *dst,
 u32 cookie)
{
struct rtable *rt = (struct rtable *) dst;

/* All IPV4 dsts are created with ->obsolete set to the value
 * DST_OBSOLETE_FORCE_CHK which forces validation calls down
 * into this function always.
 *
 * When a PMTU/redirect information update invalidates a route,
 * this is indicated by setting obsolete to DST_OBSOLETE_KILL or
 * DST_OBSOLETE_DEAD.
 */
if (dst->obsolete != DST_OBSOLETE_FORCE_CHK || rt_is_expired(rt))
return NULL;
return dst;
}
EXPORT_INDIRECT_CALLABLE(ipv4_dst_check);

static inline bool rt_is_expired(const struct rtable *rth)
{
return (dev_net(rth->dst.dev) == NULL) ||
   (rth->rt_genid != rt_genid_ipv4(dev_net(rth->dst.dev))); <--- crashes here

Any ideas?
This damn smells like an UAF due to a race.

Unfortunately i don't have the coredump. Only the stack.
But no reproducer yet.

Best regards
Carsten
-----------------
Mentor Graphics (Deutschland) GmbH, Arnulfstrasse 201, 80634 München Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Frank Thürauf

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2021-04-16 12:36 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-01  9:20 Possible race in ipv4 routing Schmid, Carsten
2021-02-01 15:35 ` David Ahern
2021-02-01 18:52   ` Wei Wang
2021-04-14  9:18     ` AW: " Schmid, Carsten
2021-04-16 12:29     ` Schmid, Carsten

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).