From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.3 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5F95FC433DF for ; Sat, 16 May 2020 06:45:18 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 31D022065C for ; Sat, 16 May 2020 06:45:18 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725867AbgEPGpR (ORCPT ); Sat, 16 May 2020 02:45:17 -0400 Received: from bmailout3.hostsharing.net ([176.9.242.62]:41237 "EHLO bmailout3.hostsharing.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725807AbgEPGpR (ORCPT ); Sat, 16 May 2020 02:45:17 -0400 Received: from h08.hostsharing.net (h08.hostsharing.net [IPv6:2a01:37:1000::53df:5f1c:0]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (Client CN "*.hostsharing.net", Issuer "COMODO RSA Domain Validation Secure Server CA" (not verified)) by bmailout3.hostsharing.net (Postfix) with ESMTPS id A8593100D9402; Sat, 16 May 2020 08:45:12 +0200 (CEST) Received: by h08.hostsharing.net (Postfix, from userid 100393) id 1F45AAF8FC; Sat, 16 May 2020 08:45:12 +0200 (CEST) Date: Sat, 16 May 2020 08:45:12 +0200 From: Lukas Wunner To: Andy Shevchenko Cc: Mark Brown , Nicolas Saenz Julienne , Martin Sperl , linux-spi , linux-rpi-kernel , Linus Walleij Subject: Re: [PATCH 1/5] spi: Fix controller unregister order Message-ID: <20200516064512.cwslwlkozff3mycf@wunner.de> References: <8aaf9d44c153fe233b17bc2dec4eb679898d7e7b.1589557526.git.lukas@wunner.de> <20200515162725.GG5066@sirena.org.uk> <20200515163147.3u4xjqdxci2neup7@wunner.de> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: User-Agent: NeoMutt/20170113 (1.7.2) Sender: linux-spi-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-spi@vger.kernel.org On Sat, May 16, 2020 at 12:37:17AM +0300, Andy Shevchenko wrote: > On Fri, May 15, 2020 at 7:41 PM Lukas Wunner wrote: > > On Fri, May 15, 2020 at 05:27:25PM +0100, Mark Brown wrote: > > > On Fri, May 15, 2020 at 05:58:01PM +0200, Lukas Wunner wrote: > > > > However since commit ffbbdd21329f ("spi: create a message queueing > > > > infrastructure"), spi_destroy_queue() is executed before unbinding the > > > > slaves. It sets ctlr->running = false, thereby preventing SPI bus > > > > access and causing unbinding of slave devices to fail. > > > > > > Devices should basically never fail an unbind operation - if something > > > went seriously wrong there's basically nothing that can be done in terms > > > of error handling and keeping the device around isn't going to help. > > > > I guess the word "fail" in the commit message invites misinterpretations. > > The driver does unbind from the slave device, but the physical device is > > not left in a proper state. E.g. interrupts may still be generated by > > the device because writing a register to disable them failed. > > I didn't check a patch, but I see a bug on kernel bugzilla against > spi-pxa2xx because of this. It requires quite untrivial ->remove() in > order to quiescent the DMA and other activities. Yes from a quick look at spi-pxa2xx.c it's immediately obvious that the use of devm_spi_register_controller() is likewise completely wrong. The crucial thing to understand is that the SPI driver's ->remove() hook is executed *before* any device-managed resources are released. pxa2xx_spi_remove() disables the clock, frees the IRQ, releases DMA, so the SPI controller is no longer usable even though it's still registered! Somehow this incorrect order got cargo-culted to dozens of drivers over the years. We use SPI-attached Ethernet chips and when the SPI driver's module is unloaded, the Ethernet driver's ->ndo_stop() hook is executed to bring down the interface. For this it needs to communicate with the Ethernet chip, but it can't because the ->remove() hook has already been executed and unbinding of the SPI slave happens afterwards, when the SPI controller is unregistered via devres_release_all(). There's another issue in spi-pxa2xx.c: It acquires a runtime PM ref even though the driver core already does that. Do you have a link to the spi-pxa2xx.c bugzilla? Are you able to test patches? I can submit a patch but I can only compile-test it, I don't have that hardware. Thanks, Lukas