linux-kernel.vger.kernel.org archive mirror
From: Leon Romanovsky <leon@kernel.org>
To: Amey Narkhede <ameynarkhede03@gmail.com>
Cc: alex.williamson@redhat.com, raphael.norwitz@nutanix.com,
	linux-pci@vger.kernel.org, bhelgaas@google.com,
	linux-kernel@vger.kernel.org, alay.shah@nutanix.com,
	suresh.gumpula@nutanix.com, shyam.rajendran@nutanix.com,
	felipe@nutanix.com
Subject: Re: [PATCH 4/4] PCI/sysfs: Allow userspace to query and set device reset mechanism
Date: Thu, 18 Mar 2021 16:57:48 +0200
Message-ID: <YFNqbJZo3wqhMc1S@unreal>
In-Reply-To: <20210318142252.fqi3das3mtct4yje@archlinux>

On Thu, Mar 18, 2021 at 07:52:52PM +0530, Amey Narkhede wrote:
> On 21/03/18 11:09AM, Leon Romanovsky wrote:
> > On Wed, Mar 17, 2021 at 11:31:40AM -0600, Alex Williamson wrote:
> > > On Wed, 17 Mar 2021 15:58:40 +0200
> > > Leon Romanovsky <leon@kernel.org> wrote:
> > >
> > > > On Wed, Mar 17, 2021 at 06:47:18PM +0530, Amey Narkhede wrote:
> > > > > On 21/03/17 01:47PM, Leon Romanovsky wrote:
> > > > > > On Wed, Mar 17, 2021 at 04:53:09PM +0530, Amey Narkhede wrote:
> > > > > > > On 21/03/17 01:02PM, Leon Romanovsky wrote:
> > > > > > > > On Wed, Mar 17, 2021 at 03:54:47PM +0530, Amey Narkhede wrote:
> > > > > > > > > On 21/03/17 06:20AM, Leon Romanovsky wrote:
> > > > > > > > > > On Mon, Mar 15, 2021 at 06:32:32PM +0000, Raphael Norwitz wrote:
> > > > > > > > > > > On Mon, Mar 15, 2021 at 10:29:50AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > On Mon, 15 Mar 2021 21:03:41 +0530
> > > > > > > > > > > > Amey Narkhede <ameynarkhede03@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > > On 21/03/15 05:07PM, Leon Romanovsky wrote:
> > > > > > > > > > > > > > On Mon, Mar 15, 2021 at 08:34:09AM -0600, Alex Williamson wrote:
> > > > > > > > > > > > > > > On Mon, 15 Mar 2021 14:52:26 +0100
> > > > > > > > > > > > > > > Pali Rohár <pali@kernel.org> wrote:
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > On Monday 15 March 2021 19:13:23 Amey Narkhede wrote:
> > > > > > > > > > > > > > > > > slot reset (pci_dev_reset_slot_function) and secondary bus
> > > > > > > > > > > > > > > > > reset(pci_parent_bus_reset) which I think are hot reset and
> > > > > > > > > > > > > > > > > warm reset respectively.
> > > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > > No. PCI secondary bus reset = PCIe Hot Reset. Slot reset is just another
> > > > > > > > > > > > > > > > type of reset, which is currently implemented only for PCIe hot plug
> > > > > > > > > > > > > > > > bridges and for PowerPC PowerNV platform and it just calls PCI secondary
> > > > > > > > > > > > > > > > bus reset with some other hook. PCIe Warm Reset does not have API in
> > > > > > > > > > > > > > > > kernel and therefore drivers do not export this type of reset via any
> > > > > > > > > > > > > > > > kernel function (yet).
> > > > > > > > > > > > > > >
> > > > > > > > > > > > > > > Warm reset is beyond the scope of this series, but could be implemented
> > > > > > > > > > > > > > > in a compatible way to fit within the pci_reset_fn_methods[] array
> > > > > > > > > > > > > > > defined here.  Note that with this series the resets available through
> > > > > > > > > > > > > > > pci_reset_function() and the per-device reset attribute in sysfs remain
> > > > > > > > > > > > > > > exactly the same as they are currently.  The bus and slot reset
> > > > > > > > > > > > > > > methods used here are limited to devices where only a single function is
> > > > > > > > > > > > > > > affected by the reset, therefore it is not like the patch you proposed
> > > > > > > > > > > > > > > which performed a reset irrespective of the downstream devices.  This
> > > > > > > > > > > > > > > series only enables selection of the existing methods.  Thanks,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Alex,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I asked the patch author here [1], but didn't get any response, maybe
> > > > > > > > > > > > > > you can answer me. What is the use case scenario for this functionality?
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > [1] https://lore.kernel.org/lkml/YE389lAqjJSeTolM@unreal/
> > > > > > > > > > > > > >
> > > > > > > > > > > > > Sorry for not responding immediately. There were some buggy wifi cards
> > > > > > > > > > > > > which needed FLR explicitly; I'm not sure if that behavior is fixed in
> > > > > > > > > > > > > drivers. Also there is a use case at Nutanix, but the engineer who
> > > > > > > > > > > > > is involved is on PTO, which is why I did not respond immediately, as
> > > > > > > > > > > > > I don't know the details yet.
> > > > > > > > > > > >
> > > > > > > > > > > > And more generally, devices continue to have reset issues and we
> > > > > > > > > > > > impose a fixed priority in our ordering.  We can and probably should
> > > > > > > > > > > > continue to quirk devices when we find broken resets so that we have
> > > > > > > > > > > > the best default behavior, but it's currently not easy for an end user
> > > > > > > > > > > > to experiment, ie. this reset works, that one doesn't.  We might also
> > > > > > > > > > > > have platform issues where a given reset works better on a certain
> > > > > > > > > > > > platform.  Exposing a way to test these things might lead to better
> > > > > > > > > > > > quirks.  In the case I think Pali was looking for, they wanted a
> > > > > > > > > > > > mechanism to force a bus reset, if this was in reference to a single
> > > > > > > > > > > > function device, this could be accomplished by setting a priority for
> > > > > > > > > > > > that mechanism, which would translate to not only the sysfs reset
> > > > > > > > > > > > attribute, but also the reset mechanism used by vfio-pci.  Thanks,
> > > > > > > > > > > >
> > > > > > > > > > > > Alex
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > To confirm from our end - we have seen many such instances where default
> > > > > > > > > > > reset methods have not worked well on our platform. Debugging these
> > > > > > > > > > > issues is painful in practice, and this interface would make it far
> > > > > > > > > > > easier.
> > > > > > > > > > >
> > > > > > > > > > > Having an interface like this would also help us better communicate the
> > > > > > > > > > > issues we find with upstream. Allowing others to more easily test our
> > > > > > > > > > > (or other entities') findings should give better visibility into
> > > > > > > > > > > which issues apply to the device in general and which are platform
> > > > > > > > > > > specific. In disambiguating the former from the latter, we should be
> > > > > > > > > > > able to better quirk devices for everyone, and in the latter cases, this
> > > > > > > > > > > interface allows for a safer and more elegant solution than any of the
> > > > > > > > > > > current alternatives.
> > > > > > > > > >
> > > > > > > > > > So to summarize, we are talking about a test and debug interface to
> > > > > > > > > > overcome HW bugs, am I right?
> > > > > > > > > >
> > > > > > > > > > My personal experience shows that once an easy workaround exists
> > > > > > > > > > (and a write to a generally available sysfs file is very simple), the vendors'
> > > > > > > > > > and users' desire for a proper fix decreases drastically. IMHO, we will
> > > > > > > > > > see an increase in copy/paste from SO and blog posts, but a reduction in quirks.
> > > > > > > > > >
> > > > > > > > > > My 2-cents.
> > > > > > > > > >
> > > > > > > > > I agree with your point, but at least it gives userspace the ability
> > > > > > > > > to use a broken device until the bug is fixed upstream.
> > > > > > > >
> > > > > > > > As I said, I don't expect many fixes once "userspace" is able to
> > > > > > > > use a cheap workaround. There is no incentive to fix it.
> > >
> > > We can increase the annoyance factor of using a modified set of reset
> > > methods, but ultimately we can only control what goes into our kernel,
> > > other kernels might take v1 of this series and incorporate it
> > > regardless of what happens here.
> > >
> > > > > > > > > This is also applicable for obscure devices without upstream
> > > > > > > > > drivers for example custom FPGA based devices.
> > > > > > > >
> > > > > > > > This is not relevant to upstream kernel. Those vendors ship everything
> > > > > > > > custom, they don't need upstream, we don't need them :)
> > > > > > > >
> > > > > > > By custom I meant hobbyists who could tinker with their custom FPGA.
> > > > > >
> > > > > > I invite such hobbyists to send patches and include their FPGA in
> > > > > > upstream kernel.
> > >
> > > This is potentially another good use case, how receptive are we going
> > > to be to an FPGA design that botches a reset.  Do they have a valid
> > > device ID for us to base a quirk on, are they just squatting on one, or
> > > using the default from a library.  Maybe the next bitstream will
> > > resolve it, maybe without any external indication.  IOW, what would the
> > > quality level be for that quirk versus using this as a workaround,
> > > where the user probably wouldn't mind a kernel nag?
> >
> > It is worth solving when the need arises.
> >
> > >
> > > > > > > > > Another main application which I forgot to mention is virtualization
> > > > > > > > > where vmm wants to reset the device when the guest is reset,
> > > > > > > > > to emulate machine reboot as closely as possible.
> > > > > > > >
> > > > > > > > It can work only in a very narrow case, because the reset will cause the device
> > > > > > > > to reprobe, and most likely the driver will be different from the one that
> > > > > > > > started the reset. I can imagine that net devices will lose their state and
> > > > > > > > config after such a reset too.
> > > > > > > >
> > > > > > > Not sure if I got that 100% right. The pci_reset_function() function
> > > > > > > saves and restores device state over the reset.
> > > > > >
> > > > > > I'm talking about netdev state, but whatever given the existence of
> > > > > > sysfs reset knob.
> > > > > >
> > > > > > >
> > > > > > > > IMHO, it will be saner for everyone if virtualization doesn't try such resets.
> > >
> > > That would cause a massive regression in device assignment support.  As
> > > with other sysfs attributes, triggering them alongside a running driver
> > > is probably not going to end well.  However, pci_reset_function() is
> > > extremely useful for stopping devices and returning them to a default
> > > state, when either rebooting a VM or returning the device to the host.
> > > The device is not removed and re-probed when this occurs, vfio-pci is
> > > able to hold onto the device across these actions.  Sure, don't reset a
> > > netdev device when it's in use, that's not what these are used for.
> > >
> > > > > > > The existing reset sysfs attribute was added for exactly this case
> > > > > > > though.
> > > > > >
> > > > > > I didn't know the rationale behind that file till you said and I
> > > > > > googled libvirt discussion, so ok. Do you propose that libvirt
> > > > > > will manage database of devices and their working reset types?
> > > > > >
> > > > > I don't have much idea about internals of libvirt but why would
> > > > > it need to manage database of working reset types? It could just
> > > > > read new reset_methods attribute to get the list of supported reset
> > > > > methods.
> > > >
> > > > Because the idea of this patch is to read all supported reset types and
> > > > allow the user to choose the working one. The user will do it with
> > > > help from StackOverflow, but libvirt will need to have some sort of
> > > > database, otherwise it won't be different from simple "echo 1 > reset"
> > > > which will iterate over all supported resets anyway.
> > >
> > > AFAIK, libvirt no longer attempts to do resets itself, or is at least
> > > moving in that direction.  vfio-pci will reset devices when they're
> > > opened by a user (when available) or triggered via the API.
> >
> > <...>
> >
> > > > The difference here is that this is a workaround to solve bugs that
> > > > should be fixed in the kernel.
> > >
> > > If we want to discourage using this as a primary means to resolve reset
> > > issues on a device then we can create log warnings any time it's used.
> > > Downstreams that really want this functionality are going to take this
> > > patch from the list whether we accept it or not.  As above, it seems
> > > there are valid use cases.  Even with mainstream vfio in QEMU, I go
> > > through some hoops trying to determine if I can do a secondary bus
> > > reset rather than a PM reset because it's not specified anywhere what a
> > > "soft reset" means for any given device.  This sort of interface could
> > > make it easier to apply a system policy that a pci_reset_function()
> > > should always perform a secondary bus reset if the only other option is
> > > a PM reset.  Maybe that policy mostly makes sense for a VM use case, so
> > > we'd want one policy by default and another when the device is used for
> > > this functionality.  How could we accomplish that with a quirk?  Thanks,
> >
> > I'm lost here, does vfio-pci use the sysfs interface or an in-kernel API?
> >
> > If it is the latter then we don't really need sysfs, if not, we still need
> > some sort of DB to create second policy, because "supported != working".
> > What am I missing?
> >
> > Thanks
> >
> Can you explain a bit more about why supported != working?

It is written in the commit message of this patch.
https://lore.kernel.org/lkml/20210312173452.3855-1-ameynarkhede03@gmail.com/
"This feature aims to allow greater control of a device for use cases
as device assignment, where specific device or platform issues may
interact poorly with a given reset method, and for which device specific
quirks have not been developed."

You wrote it and also repeated it a couple of times during the discussion.

If a device could tell that a specific reset doesn't work, it wouldn't
perform it in the first place.
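
To illustrate the "supported != working" point, here is a minimal Python
sketch. It is a simplified model, not kernel code: ResetMethod, pick_reset()
and the example device are hypothetical names made up for illustration. The
only behavior taken from this thread is that the default path follows a fixed
priority order over the supported methods, while the proposed interface lets
the user name one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ResetMethod:
    name: str
    supported: bool  # what a capability probe can report
    works: bool      # what only testing on the actual platform reveals


def pick_reset(methods: List[ResetMethod],
               preferred: Optional[str] = None) -> Optional[ResetMethod]:
    """Return the method that would be attempted, or None if there is none.

    Without `preferred`, walk the fixed priority order and take the first
    supported method. With `preferred`, take that method if it is supported.
    """
    for method in methods:
        if not method.supported:
            continue
        if preferred is None or method.name == preferred:
            return method
    return None


# Hypothetical device: FLR is advertised but broken on this platform.
methods = [
    ResetMethod("flr", supported=True, works=False),
    ResetMethod("pm", supported=False, works=False),
    ResetMethod("bus", supported=True, works=True),
]

default = pick_reset(methods)
override = pick_reset(methods, preferred="bus")
print(default.name, default.works)    # flr False
print(override.name, override.works)  # bus True
```

Nothing in the list of supported methods says which one works; the "works"
field exists only in this model. Somebody still has to know to pick "bus".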

Thanks
