From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-1.1 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_PASS, T_DKIMWL_WL_HIGH,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B9793C4321D for ; Fri, 17 Aug 2018 10:10:58 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 676AA218F9 for ; Fri, 17 Aug 2018 10:10:58 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=broadcom.com header.i=@broadcom.com header.b="ax4bJvfb" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 676AA218F9 Authentication-Results: mail.kernel.org; dmarc=fail (p=quarantine dis=none) header.from=broadcom.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726738AbeHQNNp (ORCPT ); Fri, 17 Aug 2018 09:13:45 -0400 Received: from mail-oi0-f68.google.com ([209.85.218.68]:42451 "EHLO mail-oi0-f68.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725992AbeHQNNp (ORCPT ); Fri, 17 Aug 2018 09:13:45 -0400 Received: by mail-oi0-f68.google.com with SMTP id b16-v6so13099924oic.9 for ; Fri, 17 Aug 2018 03:10:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=broadcom.com; s=google; h=mime-version:in-reply-to:references:from:date:message-id:subject:to :cc; bh=JwBzlheMQQqyxVp7cbqdv6MmfPMwCOru/J69KQywaG8=; b=ax4bJvfbd20HnfRbkJGZjrl6CVysJw1VdU3Ni/HBCLc6Y/bvx2j0aRdOyna/jjdkJ5 B5fXio3FxbJ0KEJC7CRwcAg1UQodgMdHJeETpPylh1YgoYWlQY3uzpRe84DDs0G+Oncr 00UJod1ebgb2mgTrhImhAaYX0OG0Enus1V9AU= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:in-reply-to:references:from:date :message-id:subject:to:cc; bh=JwBzlheMQQqyxVp7cbqdv6MmfPMwCOru/J69KQywaG8=; b=ee0CUAVC3gyxfs56ogcheVaVMs/7/TgA+s7BdeM/5xvVOhxIwkS0aEcAd6BfBucpTZ Vl19VNNSWs8nP8FV5ola2xj7Ex+BsuKKKHY/97zxL7cjxyxIW0EsTQBTb2qOwcDemfsD k7m+/5X8FPgdP6WoiVZavSLJBsL7bZQNlPVXvAOmrDFP8BXulrkI6tuQ9IxwKNaRbVz/ 6hc1djNW6ZgYKCn7X2bww766XicnFvUtebATaBZ9Z+KMeUYtgT3rNo0zNIWnBLiq30SO b1kJOE7WSsK0TJHjKPs4lnsvs3fj0xGnwIxWHhM1e1WPEmxJPXMAHjxvKtxdZNKxYMRu te+A== X-Gm-Message-State: AOUpUlEvHixVYG1oYJpm1JYgRT401YqDk+dQsro6mDDLzxON6QOuXijy N5yGbX6bEKRX81k1XJLTxuFrMGSdcUEqO5nhhuixxA== X-Google-Smtp-Source: AA+uWPzAfy9OWrTJyV++5VMDdMuJkJ46rnxsQQf0JhLV30QNcPM3ujP9guDDh7seJreC/6EzwjhfFrdJe+LGZPaugek= X-Received: by 2002:aca:4914:: with SMTP id w20-v6mr1948497oia.5.1534500654293; Fri, 17 Aug 2018 03:10:54 -0700 (PDT) MIME-Version: 1.0 Received: by 2002:a4a:244d:0:0:0:0:0 with HTTP; Fri, 17 Aug 2018 03:10:53 -0700 (PDT) In-Reply-To: References: <20180817044902.31420-1-benh@kernel.crashing.org> <20180817044902.31420-6-benh@kernel.crashing.org> <66577784.13494160.1534493390262.JavaMail.zimbra@kalray.eu> <1ffafe01887e3b760f84488b06339aba64f586e5.camel@kernel.crashing.org> From: Hari Vyas Date: Fri, 17 Aug 2018 15:40:53 +0530 Message-ID: Subject: Re: [RFC PATCH 5/6] pci: Protect the enable/disable state of pci_dev using the state mutex To: Benjamin Herrenschmidt Cc: Marta Rybczynska , Bjorn Helgaas , linux-pci@vger.kernel.org, Ray Jui , Srinath Mannam , Guenter Roeck , Jens Axboe , Lukas Wunner , Konstantin Khlebnikov , Pierre-Yves Kerbrat , linux-kernel@vger.kernel.org Content-Type: text/plain; charset="UTF-8" Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Fri, Aug 17, 2018 at 3:09 PM, Benjamin Herrenschmidt wrote: > On Fri, 2018-08-17 at 14:30 +0530, Hari Vyas wrote: >> So many mail threads for common issues going so just trying to >> summarize concern from my side. >> 1) HW level >> PCI Config data overwritten (ex: PCI_COMMAND bits etc) was happening >> at lower layer so from my >> perspective, it is best to handle concern at lower level with locking >> mechanism and that is what >> I proposed in my upcoming patch. Without that, I guess, we may end up >> in issues later with some weird scenario >> which may not be known today and we will again be fine tuning stuff >> here and there. > > Maybe... at this point I'm not trying to address that specific issue. > That being said, the PCI_COMMAND register should be under control of > the driver at runtime and thus shouldn't be exposed to races like that > except in the usual case of races in configuring bridges. Those races > definitely need to be handled at the higher level. > > So I'm rather dubious of adding a whole new layer of "modify" callbacks > to config space accessors for that, especially since they won't do any > good on architectures with lockless config accesses such as ... x86 > modify() is optional. host/controller layers may override it. It was just a proposal to avoid race issues which happens in SMP environment and solves other issues. I agree, nothing great anyhow can be done in lockless config >> 2) SW level(internal data structure): >> About is_added,is_busmaster: These all are bit fields so infact I too >> suggested to remove those bit fields and >> make separate variables in pci_dev structure. >> This will avoid all data-overwritten,concurrency issues but ofcourse >> at the level of space cost. > > It's unnecessary to do blanket changes without first understanding the > details of what's going on. A lot of these things are never touched > outside of the overall single threaded environment of > discovery/assignment or under driver control, in which case it's > expectd that the driver doesn't call them in racy ways > > That said, I'm happy to move some of them under my proposed > dev_state_lock. > > For is_added specifically, the field is simply set at the wrong time as > you can see in my previous patch. > Issue needs to be addressed and that is our goal. Some times simple mistakes need lot of debugging which happened in my case and my suggestion is to just avoid. SMP related issues are popping up now so we just need to be careful. >> Other possibility will be either to use atomic > > Atomic is almost always wrong. It doesn't synchronize anything other > than the field, so either it's a big waste of it gives a false sense of > security for something that's almost always still racy. > > I'd rather keep the indication that a bunch of those fields *are* > unprotected and rely on the outer layers being single threaded. > >> or locking mechanism >> for these. My point here is also same, better >> avoid fine tuning later stage. >> Moving is_added up and down doesn't look like good as we are just >> shifting issue up and down. > > No, you are wrong. I also originally covered is_added with my new mutex > but in hindsight it's the wrong thing to do, and went back to the > correct fix at this point which is precisely to move it up. > > is_added is essentially owned by the part of the PCI probe code that > runs single threaded before the device gets exposed to the device > model. > > Pretty much all of the code, including all the corresponding fields > (struct resources etc...) in pci_dev are unprotected and rely on being > accessed in a single threaded manner. is_added is no exception. > > It was simply a mistake to set it after device_attach(). > > Thus moving it back up fixes it *correctly* by making it follow the > same rules as all the related fields. > > That said, if we want to start adding protection to all those other > fields, then this is a different matter alltogether. But at this point, > this is the best fix for the problem at hand. > >> Please check and decide accordingly. I will hold my to-be-submitted >> modify() patch about handling hw level >> over-written case with locking around read-write operation. > > Can you remind us in this thread which specific cases of RMW races of > config space you were trying to address ? > Same pci bridge master, memory bit setting concern only (which my colleague Srinath figured out after lot of effort some time back) where only one bit in PCI_COMMAND was getting set. (Bug 200793 - PCI: read-write config operation doesn't look like SMP safe) My approach is to handle with modify operations at lower level so bits are not over-written or lost. As stated earlier, issue should be just resolved in better way. No issue in going with majority Regards, hari > Cheers, > Ben. > >