From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-4.0 required=3.0 tests=DKIMWL_WL_MED,DKIM_SIGNED, DKIM_VALID,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH,MAILING_LIST_MULTI, SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 27E81C282C4 for ; Sat, 9 Feb 2019 08:31:59 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C18B920869 for ; Sat, 9 Feb 2019 08:31:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=aruba.it header.i=@aruba.it header.b="EePoa2gS" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726852AbfBIIb5 (ORCPT ); Sat, 9 Feb 2019 03:31:57 -0500 Received: from smtpcmd15176.aruba.it ([62.149.156.176]:49179 "EHLO smtpcmd15176.aruba.it" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1726047AbfBIIb5 (ORCPT ); Sat, 9 Feb 2019 03:31:57 -0500 X-Greylist: delayed 420 seconds by postgrey-1.27 at vger.kernel.org; Sat, 09 Feb 2019 03:31:55 EST Received: from [192.168.1.32] ([93.146.66.165]) by smtpcmd15.ad.aruba.it with bizsmtp id aYQo1z00P3Zw7e501YQoVg; Sat, 09 Feb 2019 09:24:54 +0100 Subject: Re: Possible bug into DSA2 code. From: Rodolfo Giometti To: Florian Fainelli , Andrew Lunn , Vivien Didelot Cc: "David S. Miller" , netdev@vger.kernel.org References: Message-ID: <528f797d-445b-a314-d8ef-db15a3b6a8ce@enneenne.com> Date: Sat, 9 Feb 2019 09:24:48 +0100 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.4.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Language: en-US Content-Transfer-Encoding: 7bit DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=aruba.it; s=a1; t=1549700694; bh=/WOIHIBOdFlIvhueyXVm9iKve+dBpdk8faHl6MvEDpI=; h=Subject:From:To:Date:MIME-Version:Content-Type; b=EePoa2gSQWKazL8rAjcW5C+Ei3St3zB77OY0J7PhHNfbgSK+UXngXSUCJkv0xxP8F EgjjW7V/ZEMtSvLe/2TcDKe2ELCa31UGHrEPFE3+vF12zipaJtbNgEiOSq1wRjyaUQ MXBG9dLoFS6JrBNqUYzxckZ9CM+W/6+pl2xuV0MSrOoya57vXQO0qob8sENB0nsPjb +1j5kwKez0eXKLQQNqD4D/NVlKkli3qd3Ct+juMwswFfvej1irGiGS5lfoL70g61k4 frq4rCrTjQ8Un21v6YiJc36AvJuuy4INGGoXkVGBtZqG87jKZNihJMlCmfCAcNxUvd iREEJ9+qJRgdA== Sender: netdev-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: netdev@vger.kernel.org Hello, I'm working with EPRESSObin and DSA2 where I added the ability to dynamically load and unload switch configurations by using DT-overlay (a patchwork from here https://lore.kernel.org/patchwork/patch/468129/). During my tests I notice that when I remove the overlay in order to disable the switch I got the following BUG message: [ 24.862079] ------------[ cut here ]------------ [ 24.866767] kernel BUG at drivers/net/phy/mdio_bus.c:448! [ 24.872328] Internal error: Oops - BUG: 0 [#1] PREEMPT SMP [ 24.877967] Modules linked in: [ 24.881109] CPU: 0 PID: 2189 Comm: rmdir Not tainted 4.19.0-sw3720tsn1-038561 [ 24.890509] Hardware name: Kbact sw3720tsn1 smart switch (DT) [ 24.896426] pstate: 20000005 (nzCv daif -PAN -UAO) [ 24.901365] pc : mdiobus_unregister+0x90/0x98 [ 24.905838] lr : mv88e6xxx_mdios_unregister+0x64/0x88 [ 24.911028] sp : ffff80006aea7730 [ 24.914434] x29: ffff80006aea7730 x28: ffff80006adc27c0 [ 24.919898] x27: ffff0000091e3000 x26: ffff0000090d9000 [ 24.925365] x25: ffff80006c9420a0 x24: ffff80006c3e1c10 [ 24.930828] x23: 0000000000000060 x22: ffff80006c9e6018 [ 24.936294] x21: ffff80006c9e6110 x20: ffff80006c942800 [ 24.941758] x19: ffff80006c942d40 x18: ffffffffffffffff [ 24.947225] x17: 0000000000000000 x16: ffff80006adc27c0 [ 24.952690] x15: ffff0000090d96c8 x14: ffff000089198737 [ 24.958156] x13: ffff000009198745 x12: ffff0000090d9940 [ 24.963621] x11: ffff0000086be4b0 x10: 0000000000000040 [ 24.969087] x9 : ffff0000090f4710 x8 : 0000000040000000 [ 24.974553] x7 : ffff0000090d96c8 x6 : ffff80006a97a921 [ 24.980018] x5 : ffff80006a97a920 x4 : 0000000000000fff [ 24.985483] x3 : 0000000000000000 x2 : 0000000000000000 [ 24.990948] x1 : 0000000000000003 x0 : ffff80006c942800 [ 24.996416] Process rmdir (pid: 2189, stack limit = 0x(____ptrval____)) [ 25.003225] Call trace: [ 25.005737] mdiobus_unregister+0x90/0x98 [ 25.009858] mv88e6xxx_mdios_unregister+0x64/0x88 [ 25.014696] mv88e6xxx_remove+0x2c/0x88 [ 25.018637] mdio_remove+0x20/0x48 [ 25.022135] device_release_driver_internal+0x1a8/0x240 [ 25.027509] device_release_driver+0x14/0x20 [ 25.031899] bus_remove_device+0x110/0x128 [ 25.036109] device_del+0x124/0x340 [ 25.039693] mdio_device_remove+0x14/0x28 [ 25.043815] mdiobus_unregister+0x50/0x98 [ 25.047940] orion_mdio_remove+0x34/0xb0 [ 25.051970] platform_drv_remove+0x24/0x50 [ 25.056181] device_release_driver_internal+0x1a8/0x240 [ 25.061557] device_release_driver+0x14/0x20 [ 25.065947] bus_remove_device+0x110/0x128 [ 25.070158] device_del+0x124/0x340 [ 25.073742] platform_device_del.part.3+0x24/0x90 [ 25.078580] platform_device_unregister+0x18/0x30 [ 25.083422] of_platform_device_destroy+0xb4/0xb8 [ 25.088257] of_platform_notify+0xa8/0x170 [ 25.092471] notifier_call_chain+0x54/0x98 [ 25.096679] blocking_notifier_call_chain+0x48/0x70 [ 25.101697] of_property_notify+0x60/0xa0 [ 25.105819] __of_changeset_entry_notify+0x54/0x100 [ 25.110836] __of_changeset_revert_notify+0x3c/0x70 [ 25.115857] of_overlay_remove+0x2ac/0x378 [ 25.120066] cfs_overlay_release+0x28/0x50 [ 25.124278] config_item_put.part.0+0x70/0xb0 [ 25.128757] config_item_put+0x10/0x20 [ 25.132609] configfs_rmdir+0x1ec/0x2e0 [ 25.136554] vfs_rmdir+0x7c/0x170 [ 25.139956] do_rmdir+0x17c/0x1d0 [ 25.143361] __arm64_sys_unlinkat+0x4c/0x60 [ 25.147664] el0_svc_common+0x60/0xe8 [ 25.151426] el0_svc_handler+0x2c/0x80 [ 25.155279] el0_svc+0x8/0xc [ 25.158236] Code: a94153f3 a9425bf5 a8c37bfd d65f03c0 (d4210000) [ 25.164509] ---[ end trace 5138591d8b9c9222 ]--- After looking into the kernel code I discovered that this depends to the commit 1eb59443e72c69edbb836626f9f7f7e82427eeac which modifications I report below: diff --git a/net/dsa/dsa2.c b/net/dsa/dsa2.c index 921a36fd139d..4e0f3c268103 100644 --- a/net/dsa/dsa2.c +++ b/net/dsa/dsa2.c @@ -312,6 +312,18 @@ static int dsa_ds_apply(struct dsa_switch_tree *dst, struct dsa_switch *ds) if (err < 0) return err; + if (!ds->slave_mii_bus && ds->drv->phy_read) { + ds->slave_mii_bus = devm_mdiobus_alloc(ds->dev); + if (!ds->slave_mii_bus) + return -ENOMEM; + + dsa_slave_mii_bus_init(ds); + + err = mdiobus_register(ds->slave_mii_bus); + if (err < 0) + return err; + } + for (index = 0; index < DSA_MAX_PORTS; index++) { port = ds->ports[index].dn; if (!port) @@ -361,6 +373,9 @@ static void dsa_ds_unapply(struct dsa_switch_tree *dst, struct dsa_switch *ds) dsa_user_port_unapply(port, index, ds); } + + if (ds->slave_mii_bus && ds->drv->phy_read) + mdiobus_unregister(ds->slave_mii_bus); } static int dsa_dst_apply(struct dsa_switch_tree *dst) This patch looks buggy to me because if this patch has the target to catch drivers that call dsa_ds_apply() having ds->slave_mii_bus set to NULL with a defined ds->ops->phy_read, then it should take into account also those drivers that have both ds->slave_mii_bus and ds->ops->phy_read already defined and then DO NOT call mdiobus_unregister() during dsa_ds_unapply()! This because DSA should NOT undo an operation it never did. So we I see two possible solutions: 1) having both ds->slave_mii_bus and ds->ops->phy_read already defined is an error, then it must be signaled to the calling code, or 2) we have to use a flag to signal dsa_ds_unapply() what to do. I don't know DSA too much to provide the rigth-thing(TM) so I'm waiting for a reply before proposing a patch. :-) Ciao, Rodolfo -- GNU/Linux Solutions e-mail: giometti@enneenne.com Linux Device Driver giometti@linux.it Embedded Systems phone: +39 349 2432127 UNIX programming skype: rodolfo.giometti