Subject: Re: mainline/master boot bisection: v4.20-rc5-79-gabb8d6ecbd8f on jetson-tk1
From: Guillaume Tucker
To: Steven Rostedt, Ravi Bangoria
Cc: Srikar Dronamraju, tomeu.vizoso@collabora.com, Oleg Nesterov,
    broonie@kernel.org, matthew.hart@linaro.org, khilman@baylibre.com,
    enric.balletbo@collabora.com, Namhyung Kim, Peter Zijlstra,
    linux-kernel@vger.kernel.org, Ingo Molnar, Jiri Olsa,
    Alexander Shishkin, Arnaldo Carvalho de Melo
Date: Mon, 10 Dec 2018 18:47:30 +0000
Message-ID: <20e8dbdc-a49e-8c2a-b4dd-fcbc3bbb9440@collabora.com>
In-Reply-To: <20181210131933.53e3ae8a@gandalf.local.home>
References: <5c09f05a.1c69fb81.95568.35c2@mx.google.com> <20181210131933.53e3ae8a@gandalf.local.home>
X-Mailing-List: linux-kernel@vger.kernel.org

On 10/12/2018 18:19, Steven Rostedt wrote:
> On Mon, 10 Dec 2018 16:23:19 +0530
> Ravi Bangoria wrote:
>
>> Hi,
>>
>> Can you please provide more details. I don't understand how this patch
>> can cause boot failure.
>>
>> From the log found at
>> https://storage.kernelci.org/mainline/master/v4.20-rc5-79-gabb8d6ecbd8f/arm/multi_v7_defconfig+CONFIG_EFI=y+CONFIG_ARM_LPAE=y/lab-baylibre/boot-tegra124-jetson-tk1.html
>>
>> 23:21:06.680269 [ 7.500733] Unable to handle kernel NULL pointer dereference at virtual address 00000064
>> 23:21:06.680455 [ 7.508893] pgd = (ptrval)
>> 23:21:06.721940 [ 7.511591] [00000064] *pgd=ad7d8003, *pmd=f9d5d003
>> 23:21:06.722241 [ 7.516500] Internal error: Oops: 207 [#1] SMP ARM
>> ...
>> 23:21:06.722724 [ 7.546706] CPU: 0 PID: 122 Comm: udevd Not tainted 4.20.0-rc5 #1
>> 23:21:06.722911 [ 7.552785] Hardware name: NVIDIA Tegra SoC (Flattened Device Tree)
>> 23:21:06.765203 [ 7.559045] PC is at drm_plane_register_all+0x18/0x50
>> 23:21:06.765493 [ 7.564094] LR is at drm_modeset_register_all+0xc/0x6c
>> 23:21:06.765698 [ 7.569217] pc : [] lr : [] psr: a0000013
>> 23:21:06.765882 [ 7.575470] sp : c3451c70 ip : 2d827000 fp : c1804c48
>> 23:21:06.766053 [ 7.580680] r10: 00000000 r9 : ec9cc300 r8 : 00000000
>> 23:21:06.766229 [ 7.585893] r7 : bf193c80 r6 : 00000000 r5 : c3694224 r4 : fffffffc
>> 23:21:06.766403 [ 7.592404] r3 : 00002000 r2 : 0002f000 r1 : eef92cf0 r0 : c3694000
>> ...
>> 23:21:07.068237 [ 7.880215] [] (drm_plane_register_all) from [] (drm_modeset_register_all+0xc/0x6c)
>> 23:21:07.068493 [ 7.889603] [] (drm_modeset_register_all) from [] (drm_dev_register+0x16c/0x1c4)
>> 23:21:07.109960 [ 7.898915] [] (drm_dev_register) from [] (nouveau_platform_probe+0x54/0x8c [nouveau])
>> 23:21:07.110285 [ 7.908750] [] (nouveau_platform_probe [nouveau]) from [] (platform_drv_probe+0x48/0x98)
>> 23:21:07.110515 [ 7.918572] [] (platform_drv_probe) from [] (really_probe+0x228/0x2d0)
>> 23:21:07.110706 [ 7.926832] [] (really_probe) from [] (driver_probe_device+0x60/0x174)
>> 23:21:07.110893 [ 7.935093] [] (driver_probe_device) from [] (__driver_attach+0xd0/0xd4)
>> 23:21:07.153794 [ 7.943528] [] (__driver_attach) from [] (bus_for_each_dev+0x74/0xb4)
>> 23:21:07.154133 [ 7.951688] [] (bus_for_each_dev) from [] (bus_add_driver+0x18c/0x210)
>> 23:21:07.154352 [ 7.959946] [] (bus_add_driver) from [] (driver_register+0x74/0x108)
>> 23:21:07.154544 [ 7.968212] [] (driver_register) from [] (nouveau_drm_init+0x170/0x1000 [nouveau])
>> 23:21:07.154739 [ 7.977692] [] (nouveau_drm_init [nouveau]) from [] (do_one_initcall+0x54/0x1fc)
>> 23:21:07.197008 [ 7.986820] [] (do_one_initcall) from [] (do_init_module+0x64/0x1f4)
>> 23:21:07.197344 [ 7.994906] [] (do_init_module) from [] (load_module+0x1ee8/0x23c8)
>> 23:21:07.197553 [ 8.002907] [] (load_module) from [] (sys_finit_module+0xac/0xd8)
>> 23:21:07.197751 [ 8.010722] [] (sys_finit_module) from [] (ret_fast_syscall+0x0/0x4c)
>> 23:21:07.197935 [ 8.018884] Exception stack(0xc3451fa8 to 0xc3451ff0)
>>
>>
>> Both PC and LR are pointing to drm_* code. I don't see this anyway related to
>> uprobes. Did I miss anything?
>>
>
> The bot sometimes gets confused during the bisect. This looks to be one
> of those times. I'd simply ignore it because the code path of the
> commit it points out is obviously never hit.
>
> The bug may be a race condition that will cause havoc with automated
> bisects.

Update: It turns out this was in fact the result of a network
infrastructure issue in the test lab. There are checks at the end of
the bisection to verify that the "breaking" revision fails to boot
3 times in a row and then boots successfully 3 times in a row once the
change is reverted. As unlikely as it sounds, downloading the kernel
binary failed 3 times for the "bad" checks and succeeded 3 times for
the "good" checks (probably caused by caching).

All the logs can be found here:

  http://lava.baylibre.com:10080/scheduler/alljobs?length=25&search=lava-bisect-11491#table

There's a fix coming to discard lab infrastructure errors and avoid
this issue in the future.

Sorry for the noise.

Guillaume
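
For illustration only, the final verification step described above, together with the idea of discarding lab infrastructure errors, could be sketched as follows in Python. This is not the actual KernelCI or LAVA code: the names `Outcome`, `run_checks` and `bisection_verified`, and the `max_attempts` limit, are hypothetical and chosen just for this sketch.

```python
from enum import Enum
from typing import Callable, List


class Outcome(Enum):
    PASS = "pass"          # the kernel was loaded and booted
    FAIL = "fail"          # the kernel was loaded but did not boot
    INFRA_ERROR = "infra"  # lab problem, e.g. the kernel download failed


def run_checks(boot: Callable[[], Outcome], wanted: int = 3,
               max_attempts: int = 10) -> List[Outcome]:
    """Collect up to `wanted` real boot results, skipping infrastructure errors."""
    results: List[Outcome] = []
    for _ in range(max_attempts):
        outcome = boot()
        if outcome is Outcome.INFRA_ERROR:
            # Says nothing about the kernel itself, so do not count it.
            continue
        results.append(outcome)
        if len(results) == wanted:
            break
    return results


def bisection_verified(boot_bad: Callable[[], Outcome],
                       boot_reverted: Callable[[], Outcome],
                       wanted: int = 3) -> bool:
    """True only if the suspect revision fails `wanted` times in a row
    and the same tree with the change reverted boots `wanted` times in a row."""
    bad = run_checks(boot_bad, wanted)
    good = run_checks(boot_reverted, wanted)
    return (len(bad) == wanted and all(o is Outcome.FAIL for o in bad)
            and len(good) == wanted and all(o is Outcome.PASS for o in good))
```

With a check of this shape, three failed downloads on the "bad" side would be retried rather than counted as boot failures, so the bisection result would not be reported as verified unless the kernel itself failed to boot.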