From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S932408AbdCFSwC (ORCPT ); Mon, 6 Mar 2017 13:52:02 -0500
Received: from mail-ua0-f177.google.com ([209.85.217.177]:33373 "EHLO
	mail-ua0-f177.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S932266AbdCFSvq (ORCPT ); Mon, 6 Mar 2017 13:51:46 -0500
MIME-Version: 1.0
In-Reply-To: <4e4d7d51-82de-b21e-cb5d-d804f7b88999@cumulusnetworks.com>
References: <1488658514.9415.356.camel@edumazet-glaptop3.roam.corp.google.com>
	<4e4d7d51-82de-b21e-cb5d-d804f7b88999@cumulusnetworks.com>
From: Dmitry Vyukov
Date: Mon, 6 Mar 2017 19:51:24 +0100
Message-ID:
Subject: Re: net: heap out-of-bounds in fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone
To: David Ahern
Cc: Eric Dumazet, Mahesh Bandewar, Eric Dumazet, David Miller,
	Alexey Kuznetsov, James Morris, Hideaki YOSHIFUJI, Patrick McHardy,
	netdev, LKML, Cong Wang, syzkaller
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Mar 6, 2017 at 6:31 PM, David Ahern wrote:
> On 3/4/17 1:15 PM, Eric Dumazet wrote:
>> On Sat, 2017-03-04 at 19:57 +0100, Dmitry Vyukov wrote:
>>> On Fri, Mar 3, 2017 at 8:12 PM, David Ahern wrote:
>>>> On 3/3/17 6:39 AM, Dmitry Vyukov wrote:
>>>>> I am getting heap out-of-bounds reports in
>>>>> fib6_clean_node/rt6_fill_node/fib6_age/fib6_prune_clone while running
>>>>> syzkaller fuzzer on 86292b33d4b79ee03e2f43ea0381ef85f077c760. They all
>>>>> follow the same pattern: an object of size 216 is allocated from
>>>>> ip_dst_cache slab, and then accessed at offset 272/276 within
>>>>> fib6_walk. Looks like type confusion. Unfortunately this is not
>>>>> reproducible.
>>>>
>>>> I'll take a look this weekend or Monday at the latest.
>>>
>>>
>>> I've got some additional useful info on this. I think this is
>>> use-after-free rather than out-of-bounds. I've collected stack where
>>> the route was disposed with call_rcu, see the last "Disposed" stack.
>>> The crash happens when cmpxchg in rt_cache_route replaces an existing
>>> route. And that route seems to have some existing pointers to it
>>> (rt->dst.rt6_next) which fib6_walk uses to get to it after its
>>> deletion.
>>
>> rt_cache_route() deals with IPv4 routes.
>>
>> We somehow mix IPv4 and IPv6 dsts in IPv6 tree.
>>
>> We need to add type safety at IPV6 route insertions to catch the
>> offender.
>>
>
> I've seen something like this before -- a rt was on the gc list but
> still linked in the tables because of some reference.
>
> Dmitry: you seem to have reproduced this a few times. Can you share how
> to run whatever tests you are using?

We hit it several thousand times, but we get only several dozen
crashes per day on ~80 VMs. So if you try to reproduce it on a single
machine, it can take days to get a single crash.

If you are ready to go that route, here are some instructions on
setting up syzkaller:
https://github.com/google/syzkaller
You also need a kernel built with CONFIG_KASAN. I am ready to help
with resolving any issues.

Another possible route is for you to give me a patch with some
additional WARNINGs. Then I can deploy it to the bots and collect stacks.
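
To put rough numbers on the single-machine estimate above, here is a minimal back-of-the-envelope sketch. It is an illustration, not a measurement: the fleet rate of 36 crashes per day is a hypothetical stand-in for "several dozen", the 80 VMs come from the message, and the model assumes crashes arrive independently at a constant rate (Poisson) and that one machine behaves like one VM. The function names are made up for this example.

```python
import math


def expected_days_per_crash(fleet_crashes_per_day, num_vms):
    """Mean waiting time (in days) for one crash on a single machine.

    Assumes every VM contributes equally to the fleet-wide crash rate.
    """
    per_vm_rate = fleet_crashes_per_day / num_vms  # crashes per VM per day
    return 1.0 / per_vm_rate


def prob_crash_within(days, fleet_crashes_per_day, num_vms):
    """Probability of at least one crash on a single machine within `days`.

    Assumes a Poisson process with the constant per-VM rate derived above.
    """
    per_vm_rate = fleet_crashes_per_day / num_vms
    return 1.0 - math.exp(-per_vm_rate * days)


if __name__ == "__main__":
    # Hypothetical value standing in for "several dozen crashes per day".
    FLEET_CRASHES_PER_DAY = 36
    NUM_VMS = 80  # "~80 VMs" from the message

    mean_days = expected_days_per_crash(FLEET_CRASHES_PER_DAY, NUM_VMS)
    print(f"mean days per crash on one machine: {mean_days:.1f}")
    for d in (1, 3, 7):
        p = prob_crash_within(d, FLEET_CRASHES_PER_DAY, NUM_VMS)
        print(f"P(at least one crash within {d} day(s)): {p:.2f}")
```

Under these assumptions a single machine waits a couple of days on average for one crash, which is consistent with the "it can take days" estimate; a lower real crash rate would stretch that further.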