Date: Thu, 13 Feb 2020 10:28:18 -0400
From: Jason Gunthorpe
To: Leon Romanovsky
Cc: Doug Ledford, Leon Romanovsky, RDMA mailing list, Daniel Jurgens,
    Erez Shitrit, Maor Gottlieb, Michael Guralnik, Moni Shoua,
    Parav Pandit, Sean Hefty, Valentine Fatiev, Yishai Hadas,
    Yonatan Cohen, Zhu Yanjun
Subject: Re: [PATCH rdma-rc 8/9] IB/umad: Fix kernel crash while unloading ib_umad
Message-ID: <20200213142818.GA16120@ziepe.ca>
References: <20200212072635.682689-1-leon@kernel.org> <20200212072635.682689-9-leon@kernel.org>
In-Reply-To: <20200212072635.682689-9-leon@kernel.org>
X-Mailing-List: linux-rdma@vger.kernel.org

On Wed, Feb 12, 2020 at 09:26:34AM +0200, Leon Romanovsky wrote:
> From: Yonatan Cohen
>
> When unloading ib_umad, remove ibdev sys file 1st before
> port removal to prevent kernel oops.
>
> ib_mad's method ibdev_show() might access a umad port
> whoes ibdev field has already been NULLed when rmmod ib_umad
> was issued from another shell.
>
> Consider this scenario
>     shell-1                 shell-2
>     rmmod ib_mod            cat /sys/devices/../ibdev
>     |                       |
>     ib_umad_kill_port()     ibdev_show()
>     port->ib_dev = NULL     dev_name(port->ib_dev)
>
> kernel stack
> PF: error_code(0x0000) - not-present page
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC PTI
> RIP: 0010:ibdev_show+0x18/0x50 [ib_umad]
> RSP: 0018:ffffc9000097fe40 EFLAGS: 00010282
> RAX: 0000000000000000 RBX: ffffffffa0441120 RCX: ffff8881df514000
> RDX: ffff8881df514000 RSI: ffffffffa0441120 RDI: ffff8881df1e8870
> RBP: ffffffff81caf000 R08: ffff8881df1e8870 R09: 0000000000000000
> R10: 0000000000001000 R11: 0000000000000003 R12: ffff88822f550b40
> R13: 0000000000000001 R14: ffffc9000097ff08 R15: ffff8882238bad58
> FS:  00007f1437ff3740(0000) GS:ffff888236940000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00000000000004e8 CR3: 00000001e0dfc001 CR4: 00000000001606e0
> Call Trace:
>  dev_attr_show+0x15/0x50
>  sysfs_kf_seq_show+0xb8/0x1a0
>  seq_read+0x12d/0x350
>  vfs_read+0x89/0x140
>  ksys_read+0x55/0xd0
>  do_syscall_64+0x55/0x1b0
>  entry_SYSCALL_64_after_hwframe+0x44/0xa9:
>
> Fixes: e9dd5daf884c ("IB/umad: Refactor code to use cdev_device_add()")

This is the wrong Fixes line; this ordering change was actually done
deliberately:

commit cf7ad3030271c55a7119a8c2162563e3f6e93879
Author: Parav Pandit
Date:   Fri Dec 21 16:19:24 2018 +0200

    IB/umad: Avoid destroying device while it is accessed

    ib_umad_reg_agent2() and ib_umad_reg_agent() access the device name in
    dev_notice(), while concurrently, ib_umad_kill_port()
    can destroy the device using device_destroy().

            cpu-0                               cpu-1
            -----                               -----
        ib_umad_ioctl()
            [...]                            ib_umad_kill_port()
                                                  device_destroy(dev)

            ib_umad_reg_agent()
                dev_notice(dev)

The mistake in the above was to move the device_destroy() down, not
split it into device_del() above and put_device() below.

Now that it is already split, we are OK.

Jason
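
The del-above/put-below split can be illustrated with a small userspace
model. This is only a sketch under stated assumptions, not the ib_umad
code: every toy_* name is hypothetical, toy_del() stands in for
device_del() (unpublish the entry), and toy_put() stands in for
put_device() (drop a reference, free on the last one). It is
single-threaded, so it shows the ordering and lifetime rules rather than
a real race.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for a kernel device; not a real kernel type. */
struct toy_dev {
	int refcount;   /* memory is released when this reaches zero */
	bool visible;   /* true while the "sysfs" entry can be read */
	char name[16];
};

static int toy_released;   /* counts how many devices were freed */

static struct toy_dev *toy_dev_create(const char *name)
{
	struct toy_dev *d = calloc(1, sizeof(*d));

	if (!d)
		return NULL;
	d->refcount = 1;   /* the owner's reference */
	d->visible = true;
	snprintf(d->name, sizeof(d->name), "%s", name);
	return d;
}

static void toy_get(struct toy_dev *d)
{
	d->refcount++;
}

/* Models device_del(): unpublish the entry but keep the memory alive. */
static void toy_del(struct toy_dev *d)
{
	d->visible = false;
}

/* Models put_device(): drop one reference, free on the last one. */
static void toy_put(struct toy_dev *d)
{
	assert(d->refcount > 0);
	if (--d->refcount == 0) {
		toy_released++;
		free(d);
	}
}

/* Models a sysfs show(): fails cleanly once the entry is unpublished. */
static int toy_show(const struct toy_dev *d, char *buf, size_t len)
{
	if (!d->visible)
		return -1;
	snprintf(buf, len, "%s", d->name);
	return 0;
}

/* Teardown in the order described above: del first, put last. */
static void toy_teardown(struct toy_dev *d)
{
	toy_del(d);   /* no new reader can reach the device */
	/* ... per-port state would be cleared here ... */
	toy_put(d);   /* drop the owner's reference */
}
```

In this model, a user that took a reference with toy_get() before
toy_teardown() can still read d->name afterwards, while toy_show() already
fails; the memory goes away only when that last user calls toy_put().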