In-Reply-To: <20171128123555.mo4ikj2ru6mkibwo@lakrids.cambridge.arm.com>
From: Dmitry Vyukov
Date: Tue, 28 Nov 2017 13:57:49 +0100
Subject: Re: kasan: false use-after-scope warnings with KCOV
To: Mark Rutland
Cc: LKML, linux-arm-kernel@lists.infradead.org, Andrey Ryabinin, Alexander Potapenko, kasan-dev

On Tue, Nov 28, 2017 at 1:35 PM, Mark Rutland wrote:
> Hi,
>
> As a heads-up, I'm seeing a number of what appear to be false-positive
> use-after-scope warnings when I enable both KCOV and KASAN (inline or outline),
> when using the Linaro 17.08 GCC7.1.1 for arm64. So far I haven't spotted these
> without KCOV selected, and I'm only seeing these for sanitize-use-after-scope.
>
> The reports vary depending on configuration even with the same trigger. I'm not
> sure if it's the reporting that's misleading, or whether the detection is going
> wrong.
>
> For example, with v4.15-rc1, defconfig + KCOV + KASAN_OUTLINE, I can trigger a
> splat:
>
> $ perf record true
>
> [ 37.577497] ==================================================================
> [ 37.584702] BUG: KASAN: use-after-scope in __alloc_pages_nodemask+0x104/0x1608
> [ 37.591883] Write of size 24 at addr ffff80092d65f160 by task perf/2430
> [ 37.598452]
> [ 37.599944] CPU: 1 PID: 2430 Comm: perf Not tainted 4.15.0-rc1-00001-gaf82bf81ebae #1
> [ 37.607725] Hardware name: ARM Juno development board (r1) (DT)
> [ 37.613605] Call trace:
> [ 37.616051]  dump_backtrace+0x0/0x320
> [ 37.619700]  show_stack+0x20/0x30
> [ 37.623005]  dump_stack+0x108/0x174
> [ 37.626481]  print_address_description+0x60/0x270
> [ 37.631162]  kasan_report+0x210/0x2f0
> [ 37.634811]  check_memory_region+0x148/0x198
> [ 37.639063]  __asan_storeN+0x14/0x20
> [ 37.642624]  __alloc_pages_nodemask+0x104/0x1608
> [ 37.647221]  alloc_pages_vma+0xa0/0x2d8
> [ 37.651042]  wp_page_copy+0x15c/0xee0
> [ 37.654689]  do_wp_page+0x404/0xa70
> [ 37.658165]  __handle_mm_fault+0xb28/0x13e0
> [ 37.662331]  handle_mm_fault+0x290/0x390
> [ 37.666237]  do_page_fault+0x32c/0x5c0
> [ 37.669969]  do_mem_abort+0xa8/0x1e0
> [ 37.673528]  el0_da+0x20/0x24
> [ 37.676477]
> [ 37.677961] The buggy address belongs to the page:
> [ 37.682730] page:ffff7e0024b597c0 count:0 mapcount:0 mapping: (null) index:0x0
> [ 37.690692] flags: 0x1fffc00000000000()
> [ 37.694518] raw: 1fffc00000000000 0000000000000000 0000000000000000 00000000ffffffff
> [ 37.702225] raw: 0000000000000000 ffff7e0024b597e0 0000000000000000 0000000000000000
> [ 37.709922] page dumped because: kasan: bad access detected
> [ 37.715457]
> [ 37.716941] Memory state around the buggy address:
> [ 37.721709]  ffff80092d65f000: f2 f2 04 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
> [ 37.728893]  ffff80092d65f080: f2 f2 00 f2 f2 f2 f2 f2 f2 f2 00 f2 f2 f2 f2 f2
> [ 37.736078] >ffff80092d65f100: f2 f2 00 f2 f2 f2 f2 f2 f2 f2 f8 f8 f8 f8 00 f2
> [ 37.743257]                                                        ^
> [ 37.749576]  ffff80092d65f180: f2 f2 f2 f2 f2 f2 00 00 00 00 00 00 00 f2 f3 f3
> [ 37.756761]  ffff80092d65f200: f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [ 37.763939] ==================================================================
> [ 37.771117] Disabling lock debugging due to kernel taint
>
> $ ./scripts/faddr2line vmlinux __alloc_pages_nodemask+0x104/0x1608
> __alloc_pages_nodemask+0x104/0x1608:
> __alloc_pages_nodemask at mm/page_alloc.c:4215
>
> ... which is the declaration+initialisation of a local variable in
> __alloc_pages_nodemask:
>
> 4208 struct page *
> 4209 __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, int preferred_nid,
> 4210                                                         nodemask_t *nodemask)
> 4211 {
> 4212         struct page *page;
> 4213         unsigned int alloc_flags = ALLOC_WMARK_LOW;
> 4214         gfp_t alloc_mask; /* The gfp_t that was actually used for allocation */
> 4215         struct alloc_context ac = { };
>
> ... which is clearly not a use-after-scope bug.
>
> If I separate the declaration and assignment, I get a splat corresponding to the
> assignment to ac.
>
> I wondered if we were missing some shadow initialisation, so I hacked a call to
> kasan_unpoison_task_stack() into dup_task_struct(), but this had no effect. I
> also wondered if this was the result of an overflow caused by instrumentation
> bloating the stack, but doubling my stack size (from 32K to 64K) also had no
> effect.

Hi Mark,

Has anything changed in your environment? Kernel? Compiler? Configs?
The last stack-related false positive that I debugged was due to an
incorrect DTLB flush after KASAN shadow initialization. But that was on
x86 and due to a missed backport to 4.4.

Please post the disassembly of the function. The instrumentation should
have cleared the shadow for ac in the prologue.
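
For anyone reading the shadow dump in the quoted report by hand, here is a minimal C sketch (not kernel code) that maps the reported address to its shadow byte in the row marked with '>'. It assumes generic KASAN's granularity of 8 bytes of memory per shadow byte and the marker values defined in mm/kasan/kasan.h at the time (0xf1/0xf2/0xf3 for stack redzones, 0xf8 for use-after-scope). The helper names shadow_index() and describe_shadow() are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: generic KASAN, one shadow byte describes 8 bytes of memory. */
#define SHADOW_SCALE 8u
/* Each "Memory state" row in the report shows 16 shadow bytes (128 bytes of memory). */
#define ROW_BYTES 16u

/* Hypothetical helper: index of the shadow byte for addr within the row starting at row_base. */
static size_t shadow_index(uint64_t row_base, uint64_t addr)
{
	assert(addr >= row_base);
	size_t idx = (size_t)((addr - row_base) / SHADOW_SCALE);
	assert(idx < ROW_BYTES);
	return idx;
}

/* Hypothetical helper: human-readable meaning of a shadow byte value. */
static const char *describe_shadow(uint8_t v)
{
	switch (v) {
	case 0x00: return "accessible";
	case 0xf1: return "stack left redzone";
	case 0xf2: return "stack mid redzone";
	case 0xf3: return "stack right redzone";
	case 0xf8: return "use-after-scope";
	default:
		/* 1..7 means only the first v bytes of the 8-byte granule are valid. */
		return v < SHADOW_SCALE ? "partially accessible" : "other poison";
	}
}

/* Shadow bytes copied from the row at ffff80092d65f100 in the report. */
static const uint8_t marked_row[ROW_BYTES] = {
	0xf2, 0xf2, 0x00, 0xf2, 0xf2, 0xf2, 0xf2, 0xf2,
	0xf2, 0xf2, 0xf8, 0xf8, 0xf8, 0xf8, 0x00, 0xf2,
};
```

With the values from the report, shadow_index(0xffff80092d65f100, 0xffff80092d65f160) evaluates to 12, and marked_row[12] is 0xf8. The 24-byte write therefore covers shadow bytes 12 to 14 (f8 f8 00), so it starts on granules still marked use-after-scope. That fits the expectation that the prologue should have unpoisoned them, though the sketch says nothing about why they were not.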