Date: Sat, 23 Apr 2022 12:15:27 +0200
From: Ivan Vecera
To: "Ertman, David M"
Cc: "netdev@vger.kernel.org", poros, mschmidt, Leon Romanovsky,
 "Brandeburg, Jesse", "Nguyen, Anthony L", "David S. Miller",
 Jakub Kicinski, Paolo Abeni, "Saleem, Shiraz",
 "moderated list:INTEL ETHERNET DRIVERS", open list
Subject: Re: [PATCH net v3] ice: Fix race during aux device (un)plugging
Message-ID: <20220423121527.79fa5efb@ceranb>
References: <20220421060906.1902576-1-ivecera@redhat.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Fri, 22 Apr 2022 20:55:10 +0000
"Ertman, David M" wrote:

> > -----Original Message-----
> > From: Ertman, David M
> > Sent: Friday, April 22, 2022 10:42 AM
> > To: Ivan Vecera; netdev@vger.kernel.org
> > Cc: poros; mschmidt; Leon Romanovsky; Brandeburg, Jesse;
> > Nguyen, Anthony L; David S. Miller; Jakub Kicinski; Paolo Abeni;
> > Saleem, Shiraz; moderated list:INTEL ETHERNET DRIVERS; open list
> > Subject: RE: [PATCH net v3] ice: Fix race during aux device (un)plugging
> >
> > > -----Original Message-----
> > > From: Ivan Vecera
> > > Sent: Wednesday, April 20, 2022 11:09 PM
> > > To: netdev@vger.kernel.org
> > > Cc: poros; mschmidt; Leon Romanovsky; Brandeburg, Jesse;
> > > Nguyen, Anthony L; David S. Miller; Jakub Kicinski; Paolo Abeni;
> > > Ertman, David M; Saleem, Shiraz;
> > > moderated list:INTEL ETHERNET DRIVERS; open list
> > > Subject: [PATCH net v3] ice: Fix race during aux device (un)plugging
> > >
> > > Function ice_plug_aux_dev() assigns pf->adev field too early prior
> > > aux device initialization and on other side ice_unplug_aux_dev()
> > > starts aux device deinit and at the end assigns NULL to pf->adev.
> > > This is wrong because pf->adev should always be non-NULL only when
> > > aux device is fully initialized and ready. This wrong order causes
> > > a crash when ice_send_event_to_aux() call occurs because that function
> > > depends on non-NULL value of pf->adev and does not assume that
> > > aux device is half-initialized or half-destroyed.
> > > After order correction the race window is tiny but it is still there,
> > > as Leon mentioned and manipulation with pf->adev needs to be protected
> > > by mutex.
> > >
> > > Fix (un-)plugging functions so pf->adev field is set after aux device
> > > init and prior aux device destroy and protect pf->adev assignment by
> > > new mutex. This mutex is also held during ice_send_event_to_aux()
> > > call to ensure that aux device is valid during that call. Device
> > > lock used ice_send_event_to_aux() to avoid its concurrent run can
> > > be removed as this is secured by that mutex.
> > >
> > > Reproducer:
> > > cycle=1
> > > while :;do
> > >         echo "#### Cycle: $cycle"
> > >
> > >         ip link set ens7f0 mtu 9000
> > >         ip link add bond0 type bond mode 1 miimon 100
> > >         ip link set bond0 up
> > >         ifenslave bond0 ens7f0
> > >         ip link set bond0 mtu 9000
> > >         ethtool -L ens7f0 combined 1
> > >         ip link del bond0
> > >         ip link set ens7f0 mtu 1500
> > >         sleep 1
> > >
> > >         let cycle++
> > > done
> > >
> > > In short when the device is added/removed to/from bond the aux device
> > > is unplugged/plugged.
> > > When MTU of the device is changed an event is
> > > sent to aux device asynchronously. This can race with (un)plugging
> > > operation and because pf->adev is set too early (plug) or too late
> > > (unplug) the function ice_send_event_to_aux() can touch uninitialized
> > > or destroyed fields. In the case of crash below pf->adev->dev.mutex.
> > >
> > > Crash:
> > > [ 53.372066] bond0: (slave ens7f0): making interface the new active one
> > > [ 53.378622] bond0: (slave ens7f0): Enslaving as an active interface with an up link
> > > [ 53.386294] IPv6: ADDRCONF(NETDEV_CHANGE): bond0: link becomes ready
> > > [ 53.549104] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
> > > [ 54.118906] ice 0000:ca:00.0 ens7f0: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
> > > [ 54.233374] ice 0000:ca:00.1 ens7f1: Number of in use tx queues changed invalidating tc mappings. Priority traffic classification disabled!
> > > [ 54.248204] bond0: (slave ens7f0): Releasing backup interface
> > > [ 54.253955] bond0: (slave ens7f1): making interface the new active one
> > > [ 54.274875] bond0: (slave ens7f1): Releasing backup interface
> > > [ 54.289153] bond0 (unregistering): Released all slaves
> > > [ 55.383179] MII link monitoring set to 100 ms
> > > [ 55.398696] bond0: (slave ens7f0): making interface the new active one
> > > [ 55.405241] BUG: kernel NULL pointer dereference, address: 0000000000000080
> > > [ 55.405289] bond0: (slave ens7f0): Enslaving as an active interface with an up link
> > > [ 55.412198] #PF: supervisor write access in kernel mode
> > > [ 55.412200] #PF: error_code(0x0002) - not-present page
> > > [ 55.412201] PGD 25d2ad067 P4D 0
> > > [ 55.412204] Oops: 0002 [#1] PREEMPT SMP NOPTI
> > > [ 55.412207] CPU: 0 PID: 403 Comm: kworker/0:2 Kdump: loaded Tainted: G S 5.17.0-13579-g57f2d6540f03 #1
> > > [ 55.429094] bond0: (slave ens7f1): Enslaving as a backup interface with an up link
> > > [ 55.430224] Hardware name: Dell Inc. PowerEdge R750/06V45N, BIOS 1.4.4 10/07/2021
> > > [ 55.430226] Workqueue: ice ice_service_task [ice]
> > > [ 55.468169] RIP: 0010:mutex_unlock+0x10/0x20
> > > [ 55.472439] Code: 0f b1 13 74 96 eb e0 4c 89 ee eb d8 e8 79 54 ff ff 66 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 65 48 8b 04 25 40 ef 01 00 31 d2 48 0f b1 17 75 01 c3 e9 e3 fe ff ff 0f 1f 00 0f 1f 44 00 00 48
> > > [ 55.491186] RSP: 0018:ff4454230d7d7e28 EFLAGS: 00010246
> > > [ 55.496413] RAX: ff1a79b208b08000 RBX: ff1a79b2182e8880 RCX: 0000000000000001
> > > [ 55.503545] RDX: 0000000000000000 RSI: ff4454230d7d7db0 RDI: 0000000000000080
> > > [ 55.510678] RBP: ff1a79d1c7e48b68 R08: ff4454230d7d7db0 R09: 0000000000000041
> > > [ 55.517812] R10: 00000000000000a5 R11: 00000000000006e6 R12: ff1a79d1c7e48bc0
> > > [ 55.524945] R13: 0000000000000000 R14: ff1a79d0ffc305c0 R15: 0000000000000000
> > > [ 55.532076] FS: 0000000000000000(0000) GS:ff1a79d0ffc00000(0000) knlGS:0000000000000000
> > > [ 55.540163] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [ 55.545908] CR2: 0000000000000080 CR3: 00000003487ae003 CR4: 0000000000771ef0
> > > [ 55.553041] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > > [ 55.560173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > > [ 55.567305] PKRU: 55555554
> > > [ 55.570018] Call Trace:
> > > [ 55.572474] <TASK>
> > > [ 55.574579] ice_service_task+0xaab/0xef0 [ice]
> > > [ 55.579130] process_one_work+0x1c5/0x390
> > > [ 55.583141] ? process_one_work+0x390/0x390
> > > [ 55.587326] worker_thread+0x30/0x360
> > > [ 55.590994] ? process_one_work+0x390/0x390
> > > [ 55.595180] kthread+0xe6/0x110
> > > [ 55.598325] ? kthread_complete_and_exit+0x20/0x20
> > > [ 55.603116] ret_from_fork+0x1f/0x30
> > > [ 55.606698] </TASK>
> > >
> > > Fixes: f9f5301e7e2d ("ice: Register auxiliary device to provide RDMA")
> > > Cc: Leon Romanovsky
> > > Signed-off-by: Ivan Vecera
> >
> > Sorry for previous mis-reply - hit the wrong button.
> >
> > LGTM
> > Acked-by: Dave Ertman
>
> After thinking about this for a bit longer, I did think of one issue.
>
> With the removal of the device_lock in ice_send_event_to_aux(), there is no guarantee that the
> function pointer will not become NULL by the auxiliary_driver unloading. It is a very small window,
> but it could happen.
>
> I think the device_lock should probably stay also.
>
> DaveE
>

The function pointer can't become NULL but adev->dev.driver can. So yeah,
you are right the device lock needs to be held as well.

Will submit v4.

Thx,
Ivan