Date: Tue, 4 Jun 2019 18:43:27 +0800
From: Ming Lei
To: Rong Chen
Cc: Jens Axboe, Bart Van Assche, Christoph Hellwig, LKML, Linus Torvalds, lkp@01.org
Subject: Re: [block] 47cdee29ef: BUG:kernel_NULL_pointer_dereference,address
Message-ID: <20190604104326.GA22492@ming.t460p>
References: <20190604020956.GC6576@shao2-debian> <20190604040343.GB7208@ming.t460p>
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Jun 04, 2019 at 05:06:44PM +0800, Rong Chen wrote:
> Hi,
>
> On 6/4/19 12:03 PM, Ming Lei wrote:
> > Hi Rong Chen,
> >
> > Thanks for your test & report!
> >
> > On Tue, Jun 04, 2019 at 10:09:56AM +0800, kernel test robot wrote:
> > > FYI, we noticed the following commit (built with gcc-7):
> > >
> > > commit: 47cdee29ef9d94e485eb08f962c74943023a5271 ("block: move blk_exit_queue into __blk_release_queue")
> > > https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master
> > >
> > > in testcase: trinity
> > > with following parameters:
> > >
> > > 	runtime: 300s
> > >
> > > test-description: Trinity is a linux system call fuzz tester.
> > > test-url: http://codemonkey.org.uk/projects/trinity/
> > >
> > >
> > > on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 2G
> > >
> > > caused below changes (please refer to attached dmesg/kmsg for entire log/backtrace):
> > >
> > >
> > > +-------------------------------------------------+------------+------------+
> > > |                                                 | 31cb1d64da | 47cdee29ef |
> > > +-------------------------------------------------+------------+------------+
> > > | boot_successes                                  | 3          | 0          |
> > > | boot_failures                                   | 13         | 8          |
> > > | BUG:kernel_reboot-without-warning_in_test_stage | 13         |            |
> > > | BUG:kernel_NULL_pointer_dereference,address     | 0          | 8          |
> > > | Oops:#[##]                                      | 0          | 8          |
> > > | RIP:blk_mq_free_rqs                             | 0          | 8          |
> > > | Kernel_panic-not_syncing:Fatal_exception        | 0          | 8          |
> > > +-------------------------------------------------+------------+------------+
> > >
> > >
> > > If you fix the issue, kindly add following tag
> > > Reported-by: kernel test robot
> > >
> > >
> > > [    6.560544] BUG: kernel NULL pointer dereference, address: 0000000000000020
> > > [    6.561658] #PF: supervisor read access in kernel mode
> > > [    6.562495] #PF: error_code(0x0000) - not-present page
> > > [    6.563277] PGD 0 P4D 0
> > > [    6.563277] Oops: 0000 [#1] PTI
> > > [    6.563277] CPU: 0 PID: 147 Comm: kworker/0:2 Tainted: G                T 5.2.0-rc1-00387-g47cdee29 #1
> > > [    6.563277] Workqueue: events __blk_release_queue
> > > [    6.563277] RIP: 0010:blk_mq_free_rqs+0x2c/0xaf
> >
> > Looks there is race between removing queue and switching elevator, and
> > which should be done by Trinity.
> >
> > I guess that commit 47cdee29ef9d94e485eb08f962c74943023a5271 just
> > changes the timing and makes it easy to trigger.
> >
> > Please test the following patch and see if difference can be made.
> > If the patch can't fix the issue, please enable KASAN and reproduce,
> > then more useful log may be got.
>
> The patch doesn't work, Attached please find the dmesg file with KASAN
> enabled.

Thanks for testing.

I think I understand the issue now: blk_mq_free_rqs() needs the tag_set,
but the tag_set may already have been freed by the time it runs.

In theory, we don't need the tag_set to free scheduler tags, which are
per-request-queue, unlike driver tags. The real trouble is that
.exit_request() needs the tag_set, and this is a generic issue, not
limited to ide.

Give me a little time; I will investigate and see whether a good
solution can be found. Otherwise, we may have to revert that commit.
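
To make the ordering problem described above concrete, here is a minimal
user-space sketch in C. It is only a model, not kernel code and not a
proposed fix: tag_set_model, queue_model, driver_free_tag_set(),
free_sched_rqs() and run_model() are hypothetical stand-ins, and clearing
the queue's pointer when the set is freed is an assumption made so that
the stale access can be detected safely instead of being reproduced.

```c
/* User-space model only: simplified stand-ins, not the kernel's structures. */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

struct tag_set_model;

/* Stand-in for a driver's per-request teardown hook (.exit_request). */
typedef void (*exit_request_fn)(struct tag_set_model *set, int rq_index);

struct tag_set_model {
	exit_request_fn exit_request;	/* reached only through the set */
	int exited;			/* number of hooks run so far */
};

struct queue_model {
	struct tag_set_model *set;	/* borrowed; the driver owns the set */
	int nr_sched_rqs;		/* scheduler requests owned by the queue */
};

static void count_exit(struct tag_set_model *set, int rq_index)
{
	(void)rq_index;
	set->exited++;
}

/*
 * Driver teardown. Assumption of this model: the borrowed pointer is
 * cleared, so a later user sees NULL rather than freed memory.
 */
static void driver_free_tag_set(struct queue_model *q)
{
	free(q->set);
	q->set = NULL;
}

/*
 * Model of freeing the queue's scheduler requests. The requests belong
 * to the queue, but the exit hook is reached via the set, so the set
 * must still be alive. Returns the number of hooks run, or -1 if the
 * set is already gone.
 */
static int free_sched_rqs(struct queue_model *q)
{
	int i, n;

	if (q->set == NULL)
		return -1;

	for (i = 0; i < q->nr_sched_rqs; i++)
		q->set->exit_request(q->set, i);

	n = q->nr_sched_rqs;
	q->nr_sched_rqs = 0;
	return n;
}

/*
 * free_set_first == 0: scheduler requests are freed while the set lives.
 * free_set_first != 0: the set is freed first, as when the queue's
 * release work runs late. Returns -2 if the model cannot allocate.
 */
int run_model(int free_set_first)
{
	struct queue_model q;
	int ran;

	q.set = calloc(1, sizeof(*q.set));
	if (q.set == NULL)
		return -2;
	q.set->exit_request = count_exit;
	q.nr_sched_rqs = 2;

	if (free_set_first) {
		driver_free_tag_set(&q);
		return free_sched_rqs(&q);
	}

	ran = free_sched_rqs(&q);
	driver_free_tag_set(&q);
	return ran;
}
```

In this model run_model(0) returns 2, because both exit hooks run while
the set is still alive, and run_model(1) returns -1, because the set was
freed before the queue's requests were torn down.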

Thanks,
Ming