Date: Wed, 15 Jan 2020 16:10:08 -0600
From: Bjorn Helgaas
To: Alexandru Gagniuc, Alexandru Gagniuc, Keith Busch
Cc: Jan Vesely, Lukas Wunner, Alex Williamson, Austin Bolen, Shyam Iyer, Sinan Kaya, linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
Message-ID: <20200115221008.GA191037@google.com>

I think we have a problem with link bandwidth change notifications
(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).

Here's a recent bug report where Jan reported "_tons_" of these
notifications on an nvme device:
https://bugzilla.kernel.org/show_bug.cgi?id=206197

There was similar discussion involving GPU drivers at
https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org

The current solution is the CONFIG_PCIE_BW config option, which
disables the messages completely.  That option defaults to "off" (no
messages), but even so, I think it's a little problematic.

Users are not really in a position to figure out whether it's safe to
enable.  All they can do is experiment and see whether it works with
their current mix of devices and drivers.

I don't think it's currently useful for distros because it's a
compile-time switch, and distros cannot predict what system configs
will be used, so I don't think they can enable it.

Does anybody have proposals for making it smarter about distinguishing
real problems from intentional power management, or maybe interfaces
drivers could use to tell us when we should ignore bandwidth changes?

Bjorn