linux-kernel.vger.kernel.org archive mirror
From: Bjorn Helgaas <helgaas@kernel.org>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: mr.nuke.me@gmail.com, linux-pci@vger.kernel.org,
	austin_bolen@dell.com, alex_gagniuc@dellteam.com,
	keith.busch@intel.com, Shyam_Iyer@Dell.com, lukas@wunner.de,
	okaya@kernel.org, torvalds@linux-foundation.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] PCI: Add link_change error handler and vfio-pci user
Date: Mon, 29 Apr 2019 12:43:25 -0500
Message-ID: <20190429174325.GA105836@google.com>
In-Reply-To: <20190429085104.728aee75@x1.home>

On Mon, Apr 29, 2019 at 08:51:04AM -0600, Alex Williamson wrote:
> On Wed, 24 Apr 2019 12:57:58 -0500
> Bjorn Helgaas <helgaas@kernel.org> wrote:
> 
> > On Tue, Apr 23, 2019 at 04:42:28PM -0600, Alex Williamson wrote:
> > > The PCIe bandwidth notification service generates logging any time a
> > > link changes speed or width to a state that is considered downgraded.
> > > Unfortunately, it cannot differentiate signal integrity related link
> > > changes from those intentionally initiated by an endpoint driver,
> > > including drivers that may live in userspace or VMs when making use
> > > of vfio-pci.  Therefore, allow the driver to have a say in whether
> > > the link is indeed downgraded and worth noting in the log, or if the
> > > change is perhaps intentional.
> > > 
> > > For vfio-pci, we don't know the intentions of the user/guest driver
> > > either, but we do know that GPU drivers in guests actively manage
> > > the link state and therefore trigger the bandwidth notification for
> > > what appear to be entirely intentional link changes.
> > > 
> > > Fixes: e8303bb7a75c PCI/LINK: Report degraded links via link bandwidth notification
> > > Link: https://lore.kernel.org/linux-pci/155597243666.19387.1205950870601742062.stgit@gimli.home/T/#u
> > > Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
> > > ---
> > > 
> > > Changing to pci_dbg() logging is not super usable, so let's try the
> > > previous idea of letting the driver handle link change events as they
> > > see fit.  Ideally this might be two patches, but for easier handling,
> > > folding the pci and vfio-pci bits together.  Comments?  Thanks,  
> > 
> > I'm a little uneasy about the bandwidth notification logging as a
> > whole.  Messages in dmesg don't seem like a solid base for building
> > management tools.
> > 
> > I assume the eventual goal would be some sort of digested notification
> > along the lines of "hey mr/ms administrator, the link to device X
> > unexpectedly became slower, you might want to check that out."
> > 
> > If I were building that, I don't think I would use dmesg.  I might
> > write a daemon that polls /sys/.../current_link_{speed,width}, or
> > maybe use some sort of netlink event.  Maybe it would be useful to
> > have the admin designate devices of interest.
> > 
> > I'm hesitant about adding a .link_change() handler.  If there's
> > something useful a driver could do with it, that's one thing.  But
> > using it merely to suppress a message doesn't really seem worth the
> > trouble, and it seems unfriendly to ask drivers to add it when they
> > didn't ask for it and get no benefit from it.
> 
> So where do we go from here?  I agree that dmesg is not necessarily a
> great choice for these sorts of events and if they went somewhere else,
> maybe I wouldn't have the same concerns about them generating user
> confusion or contributing to DoS vectors from userspace drivers.  As it
> is though, we have known cases where benign events generate confusing
> logging messages, which seems like a regression.  Drivers didn't ask
> for a link_change handler, but nor did they ask that the link state to
> their device be monitored so closely.  Maybe this not only needs some
> sort of change to the logging mechanism, but also an opt-in by the
> driver if they don't expect runtime link changes.  Thanks,

I think it's really too late in the cycle to rework this and get
changes merged before the v5.1 release (probably on May 5), so I'll
queue up a revert and we can iron out the wrinkles for v5.2.

Bjorn
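
For reference, the polling idea raised earlier in the thread (a daemon
that reads current_link_speed and current_link_width from sysfs for
devices the admin has designated) could look roughly like the sketch
below.  This is only an illustration, not anything proposed or merged
in this thread.  The function names, the polling interval, and the
report callback are all made up for the example.  It assumes the speed
attribute reads like "8.0 GT/s PCIe" (or an "Unknown" string) and the
width attribute is a plain integer; the exact strings vary between
kernel versions.

```python
import time
from pathlib import Path


def read_attr(dev_dir, name):
    """Return the stripped contents of one sysfs attribute, or None if unreadable."""
    try:
        return (Path(dev_dir) / name).read_text().strip()
    except OSError:
        return None


def parse_speed(text):
    """Parse a speed string such as '8.0 GT/s PCIe' into GT/s, or None."""
    if not text:
        return None
    try:
        return float(text.split()[0])
    except ValueError:
        return None  # e.g. an "Unknown" speed string


def parse_width(text):
    """Parse a link width such as '16' into an int, or None."""
    try:
        return int(text)
    except (TypeError, ValueError):
        return None


def link_state(dev_dir):
    """Sample (speed, width) for one device directory."""
    return (parse_speed(read_attr(dev_dir, "current_link_speed")),
            parse_width(read_attr(dev_dir, "current_link_width")))


def is_slower(old, new):
    """True if speed or width dropped between two (speed, width) samples."""
    return any(o is not None and n is not None and n < o
               for o, n in zip(old, new))


def watch(dev_dirs, interval=5.0, report=print):
    """Poll the designated devices forever and report any drop (hypothetical loop)."""
    last = {d: link_state(d) for d in dev_dirs}
    while True:
        time.sleep(interval)
        for d in dev_dirs:
            cur = link_state(d)
            if is_slower(last[d], cur):
                report(f"{d}: link went from {last[d]} to {cur}")
            last[d] = cur
```

An admin would call watch() with a list of device directories such as
/sys/bus/pci/devices/<BDF>.  Note that comparing consecutive samples
has the same blind spot discussed above: it cannot tell an intentional
link change by the driver from a signal integrity problem.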

