On 6/14/2021 8:38 PM, Christoph Hellwig wrote: > really_probe tries to special case errors from ->probe, but due to all > other initialization added to the function over time now a lot of > internal errors hit that code path as well. Untangle that by adding > a new probe_err local variable and apply the special casing only to > that. > > Signed-off-by: Christoph Hellwig <hch@lst.de> > --- > drivers/base/dd.c | 72 +++++++++++++++++++++++++++-------------------- > 1 file changed, 41 insertions(+), 31 deletions(-) > > diff --git a/drivers/base/dd.c b/drivers/base/dd.c > index 7477d3322b3a..999bc737a8f0 100644 > --- a/drivers/base/dd.c > +++ b/drivers/base/dd.c > @@ -513,12 +513,42 @@ static ssize_t state_synced_show(struct device *dev, > } > static DEVICE_ATTR_RO(state_synced); > > + > +static int call_driver_probe(struct device *dev, struct device_driver *drv) > +{ > + int ret = 0; > + > + if (dev->bus->probe) > + ret = dev->bus->probe(dev); > + else if (drv->probe) > + ret = drv->probe(dev); > + > + switch (ret) { > + case -EPROBE_DEFER: > + /* Driver requested deferred probing */ > + dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name); > + break; > + case -ENODEV: > + case -ENXIO: > + pr_debug("%s: probe of %s rejects match %d\n", > + drv->name, dev_name(dev), ret); > + break; > + default: > + /* driver matched but the probe failed */ > + pr_warn("%s: probe of %s failed with error %d\n", > + drv->name, dev_name(dev), ret); There should be case 0, that is, success case before default case as below: + case 0: + /* Driver returned success */ + break; Otherwise even in case of success, above warning would mislead that probe has failed. Thanks, Kirti > + break; > + } > + > + return ret; > +} > + > static int really_probe(struct device *dev, struct device_driver *drv) > { > - int ret = -EPROBE_DEFER; > int local_trigger_count = atomic_read(&deferred_trigger_count); > bool test_remove = IS_ENABLED(CONFIG_DEBUG_TEST_DRIVER_REMOVE) && > !drv->suppress_bind_attrs; > + int ret = -EPROBE_DEFER, probe_ret = 0; > > if (defer_all_probes) { > /* > @@ -572,15 +602,15 @@ static int really_probe(struct device *dev, struct device_driver *drv) > goto probe_failed; > } > > - if (dev->bus->probe) { > - ret = dev->bus->probe(dev); > - if (ret) > - goto probe_failed; > - } else if (drv->probe) { > - ret = drv->probe(dev); > - if (ret) > - goto probe_failed; > - } > + probe_ret = call_driver_probe(dev, drv); > + if (probe_ret) { > + /* > + * Ignore errors returned by ->probe so that the next driver can > + * try its luck. > + */ > + ret = 0; > + goto probe_failed; > + } > > if (device_add_groups(dev, drv->dev_groups)) { > dev_err(dev, "device_add_groups() failed\n"); > @@ -650,28 +680,8 @@ static int really_probe(struct device *dev, struct device_driver *drv) > dev->pm_domain->dismiss(dev); > pm_runtime_reinit(dev); > dev_pm_set_driver_flags(dev, 0); > - > - switch (ret) { > - case -EPROBE_DEFER: > - /* Driver requested deferred probing */ > - dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name); > + if (probe_ret == -EPROBE_DEFER) > driver_deferred_probe_add_trigger(dev, local_trigger_count); > - break; > - case -ENODEV: > - case -ENXIO: > - pr_debug("%s: probe of %s rejects match %d\n", > - drv->name, dev_name(dev), ret); > - break; > - default: > - /* driver matched but the probe failed */ > - pr_warn("%s: probe of %s failed with error %d\n", > - drv->name, dev_name(dev), ret); > - } > - /* > - * Ignore errors returned by ->probe so that the next driver can try > - * its luck. > - */ > - ret = 0; > done: > atomic_dec(&probe_count); > wake_up_all(&probe_waitqueue); >
On Mon, Jun 14, 2021 at 05:08:40PM +0200, Christoph Hellwig wrote: > @@ -679,8 +666,6 @@ static int really_probe(struct device *dev, struct device_driver *drv) > dev->pm_domain->dismiss(dev); > pm_runtime_reinit(dev); > dev_pm_set_driver_flags(dev, 0); > - if (probe_ret == -EPROBE_DEFER) > - driver_deferred_probe_add_trigger(dev, local_trigger_count); > done: I like the new arrangement - however I'm looking at the ordering relative to this: > atomic_dec(&probe_count); > wake_up_all(&probe_waitqueue); And wondering if the idea is that driver_deferred_probe_add_trigger() is supposed to be enclosed by the atomic, so that the device_block_probing() / wait_for_device_probe() sequence is actually a fence against queuing new work? Which is suggesting that the other driver_deferred_probe_add_trigger() at the top of really_probe is already ordered wrong? Although, if that is the idea the wait_for_device_probe() doesn't look entirely sequenced right.. It looks easy enough to fix by moving the probe_count up: > +static int driver_probe_device(struct device_driver *drv, struct device *dev) > +{ > + int trigger_count = atomic_read(&deferred_trigger_count); > + int ret; > + > + ret = __driver_probe_device(drv, dev); > + if (ret == -EPROBE_DEFER || ret == EPROBE_DEFER) { > + driver_deferred_probe_add(dev); > + > + /* > + * Did a trigger occur while probing? Need to re-trigger if yes > + */ > + if (trigger_count != atomic_read(&deferred_trigger_count) && > + !defer_all_probes) > + driver_deferred_probe_trigger(); > + } into here? I didn't see a reason why it couldn't enclose the pm stuff too.. Jason
On Mon, Jun 14, 2021 at 05:08:37PM +0200, Christoph Hellwig wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> Checking if the dev is dead or if the dev is already bound is a required
> precondition to invoking driver_probe_device(). All the call chains
> leading here duplicate these checks.
>
> Add it directly to driver_probe_device() so the precondition is clear and
> remove the checks from device_driver_attach() and
> __driver_attach_async_helper().
>
> The other call chain going through __device_attach_driver() does have
> these same checks but they are inlined into logic higher up the call stack
> and can't be removed.
>
> The sysfs uAPI call chain starting at bind_store() is a bit confused
> because it reads dev->driver unlocked and returns -ENODEV if it is !NULL,
> otherwise it reads it again under lock and returns 0 if it is !NULL. Fix
> this to always return -EBUSY and always read dev->driver under its lock.
>
> Done in preparation for the next patches which will add additional
> callers to driver_probe_device() and will need these checks as well.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> [hch: drop the extra checks in device_driver_attach and bind_store]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Mon, Jun 14, 2021 at 05:08:38PM +0200, Christoph Hellwig wrote:
> really_probe tries to special case errors from ->probe, but due to all
> other initialization added to the function over time now a lot of
> internal errors hit that code path as well. Untangle that by adding
> a new probe_err local variable and apply the special casing only to
> that.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/base/dd.c | 72 +++++++++++++++++++++++++++--------------------
> 1 file changed, 41 insertions(+), 31 deletions(-)
>
> diff --git a/drivers/base/dd.c b/drivers/base/dd.c
> index 7477d3322b3a..999bc737a8f0 100644
> --- a/drivers/base/dd.c
> +++ b/drivers/base/dd.c
> @@ -513,12 +513,42 @@ static ssize_t state_synced_show(struct device *dev,
> }
> static DEVICE_ATTR_RO(state_synced);
>
> +
> +static int call_driver_probe(struct device *dev, struct device_driver *drv)
> +{
> + int ret = 0;
> +
> + if (dev->bus->probe)
> + ret = dev->bus->probe(dev);
> + else if (drv->probe)
> + ret = drv->probe(dev);
> +
> + switch (ret) {
> + case -EPROBE_DEFER:
> + /* Driver requested deferred probing */
> + dev_dbg(dev, "Driver %s requests probe deferral\n", drv->name);
> + break;
> + case -ENODEV:
> + case -ENXIO:
> + pr_debug("%s: probe of %s rejects match %d\n",
> + drv->name, dev_name(dev), ret);
> + break;
> + default:
> + /* driver matched but the probe failed */
> + pr_warn("%s: probe of %s failed with error %d\n",
> + drv->name, dev_name(dev), ret);
> + break;
> + }
Like Kirti said, 0 needs to be handled here. Did this not spew a lot of
warnings in the logs?
And we can fix up the pr_* calls to use dev_* in the future, shows the
evolution of this code...
thanks,
greg k-h
On Mon, Jun 14, 2021 at 05:08:39PM +0200, Christoph Hellwig wrote:
> Currently really_probe() returns 1 on success and 0 if the probe() call
> fails. This return code arrangement is designed to be useful for
> __device_attach_driver() which is walking the device list and trying every
> driver. 0 means to keep trying.
>
> However, it is not useful for the other places that call through to
> really_probe() that do actually want to see the probe() return code.
>
> For instance bind_store() would be better to return the actual error code
> from the driver's probe method, not discarding it and returning -ENODEV.
>
> Reorganize things so that really_probe() returns the error code from
> ->probe as a (inverted) positive number, and 0 for successful attach.
>
> With this, __device_attach_driver can ignore the (positive) probe errors,
> return 1 to exit the loop for a successful binding and pass on the
> other negative errors, while device_driver_attach simplify inverts the
> positive errors and returns all errors to the sysfs code.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/base/bus.c | 6 +-----
> drivers/base/dd.c | 29 ++++++++++++++++++++---------
> 2 files changed, 21 insertions(+), 14 deletions(-)
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Mon, Jun 14, 2021 at 05:08:41PM +0200, Christoph Hellwig wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> This is intended as a replacement API for device_bind_driver(). It has at
> least the following benefits:
>
> - Internal locking. Few of the users of device_bind_driver() follow the
> locking rules
>
> - Calls device driver probe() internally. Notably this means that devm
> support for probe works correctly as probe() error will call
> devres_release_all()
>
> - struct device_driver -> dev_groups is supported
>
> - Simplified calling convention, no need to manually call probe().
>
> The general usage is for situations that already know what driver to bind
> and need to ensure the bind is synchronized with other logic. Call
> device_driver_attach() after device_add().
>
> If probe() returns a failure then this will be preserved up through to the
> error return of device_driver_attach().
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
On Mon, Jun 14, 2021 at 05:08:36PM +0200, Christoph Hellwig wrote:
> This is my alternative take on this series from Jason:
>
> https://lore.kernel.org/dri-devel/87czsszi9i.fsf@redhat.com/T/
>
> The mdev/vfio parts are exactly the same, but this solves the driver core
> changes for the direct probing without the in/out flag that Greg hated,
> which cause a little more work, but probably make the result better.
>
> Original decription from Jason below:
>
> The mdev bus's core part for managing the lifecycle of devices is mostly
> as one would expect for a driver core bus subsystem.
>
> However instead of having a normal 'struct device_driver' and binding the
> actual mdev drivers through the standard driver core mechanisms it open
> codes this with the struct mdev_parent_ops and provides a single driver
> that shims between the VFIO core's struct vfio_device and the actual
> device driver.
>
> Instead, allow mdev drivers implement an actual struct mdev_driver and
> directly call vfio_register_group_dev() in the probe() function for the
> mdev. Arrange to bind the created mdev_device to the mdev_driver that is
> provided by the end driver.
>
> The actual execution flow doesn't change much, eg what was
> parent_ops->create is now device_driver->probe and it is called at almost
> the exact same time - except under the normal control of the driver core.
>
> Ultimately converting all the drivers unlocks a fair number of additional
> VFIO simplifications and cleanups.
This looks much better as far as the driver core changes go, thank you
for doing this.
I'm guessing there will be at least one more revision of this. Do you
want this to go through my driver core tree or is there a mdev tree it
should go through? Either is fine for me.
thanks,
greg k-h
On Mon, Jun 14 2021, Christoph Hellwig <hch@lst.de> wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> Checking if the dev is dead or if the dev is already bound is a required
> precondition to invoking driver_probe_device(). All the call chains
> leading here duplicate these checks.
>
> Add it directly to driver_probe_device() so the precondition is clear and
> remove the checks from device_driver_attach() and
> __driver_attach_async_helper().
>
> The other call chain going through __device_attach_driver() does have
> these same checks but they are inlined into logic higher up the call stack
> and can't be removed.
>
> The sysfs uAPI call chain starting at bind_store() is a bit confused
> because it reads dev->driver unlocked and returns -ENODEV if it is !NULL,
> otherwise it reads it again under lock and returns 0 if it is !NULL. Fix
> this to always return -EBUSY and always read dev->driver under its lock.
>
> Done in preparation for the next patches which will add additional
> callers to driver_probe_device() and will need these checks as well.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> [hch: drop the extra checks in device_driver_attach and bind_store]
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/base/bus.c | 2 +-
> drivers/base/dd.c | 32 ++++++++++----------------------
> 2 files changed, 11 insertions(+), 23 deletions(-)
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
On Mon, Jun 14 2021, Christoph Hellwig <hch@lst.de> wrote:
> Currently really_probe() returns 1 on success and 0 if the probe() call
> fails. This return code arrangement is designed to be useful for
> __device_attach_driver() which is walking the device list and trying every
> driver. 0 means to keep trying.
>
> However, it is not useful for the other places that call through to
> really_probe() that do actually want to see the probe() return code.
>
> For instance bind_store() would be better to return the actual error code
> from the driver's probe method, not discarding it and returning -ENODEV.
>
> Reorganize things so that really_probe() returns the error code from
> ->probe as a (inverted) positive number, and 0 for successful attach.
>
> With this, __device_attach_driver can ignore the (positive) probe errors,
> return 1 to exit the loop for a successful binding and pass on the
> other negative errors, while device_driver_attach simplify inverts the
> positive errors and returns all errors to the sysfs code.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/base/bus.c | 6 +-----
> drivers/base/dd.c | 29 ++++++++++++++++++++---------
> 2 files changed, 21 insertions(+), 14 deletions(-)
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
On Mon, Jun 14 2021, Christoph Hellwig <hch@lst.de> wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> This is intended as a replacement API for device_bind_driver(). It has at
> least the following benefits:
>
> - Internal locking. Few of the users of device_bind_driver() follow the
> locking rules
>
> - Calls device driver probe() internally. Notably this means that devm
> support for probe works correctly as probe() error will call
> devres_release_all()
>
> - struct device_driver -> dev_groups is supported
>
> - Simplified calling convention, no need to manually call probe().
>
> The general usage is for situations that already know what driver to bind
> and need to ensure the bind is synchronized with other logic. Call
> device_driver_attach() after device_add().
>
> If probe() returns a failure then this will be preserved up through to the
> error return of device_driver_attach().
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/base/base.h | 1 -
> drivers/base/dd.c | 3 +++
> include/linux/device.h | 2 ++
> 3 files changed, 5 insertions(+), 1 deletion(-)
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
On Mon, Jun 14 2021, Christoph Hellwig <hch@lst.de> wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> For some reason the vfio_mdev shim mdev_driver has its own module and
> kconfig. As the next patch requires access to it from mdev.ko merge the
> two modules together and remove VFIO_MDEV_DEVICE.
>
> A later patch deletes this driver entirely.
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> Documentation/s390/vfio-ap.rst | 1 -
> arch/s390/Kconfig | 2 +-
> drivers/gpu/drm/i915/Kconfig | 2 +-
> drivers/vfio/mdev/Kconfig | 7 -------
> drivers/vfio/mdev/Makefile | 3 +--
> drivers/vfio/mdev/mdev_core.c | 16 ++++++++++++++--
> drivers/vfio/mdev/mdev_private.h | 2 ++
> drivers/vfio/mdev/vfio_mdev.c | 24 +-----------------------
> samples/Kconfig | 6 +++---
> 9 files changed, 23 insertions(+), 40 deletions(-)
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
On Mon, Jun 14 2021, Christoph Hellwig <hch@lst.de> wrote:
> From: Jason Gunthorpe <jgg@nvidia.com>
>
> This allows a mdev driver to opt out of using vfio_mdev.c, instead the
> driver will provide a 'struct mdev_driver' and register directly with the
> driver core.
>
> Much of mdev_parent_ops becomes unused in this mode:
> - create()/remove() are done via the mdev_driver probe()/remove()
> - mdev_attr_groups becomes mdev_driver driver.dev_groups
> - Wrapper function callbacks are replaced with the same ones from
> struct vfio_device_ops
>
> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
> drivers/vfio/mdev/mdev_core.c | 30 ++++++++++++++++++++++--------
> drivers/vfio/mdev/mdev_driver.c | 10 ++++++++++
> include/linux/mdev.h | 2 ++
> 3 files changed, 34 insertions(+), 8 deletions(-)
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
On Tue, Jun 15, 2021 at 07:50:21AM +0200, Christoph Hellwig wrote:
> On Tue, Jun 15, 2021 at 07:21:57AM +0200, Greg Kroah-Hartman wrote:
> > This looks much better as far as the driver core changes go, thank you
> > for doing this.
> >
> > I'm guessing there will be at least one more revision of this.
>
> Yes.
>
> > Do you
> > want this to go through my driver core tree or is there a mdev tree it
> > should go through? Either is fine for me.
>
> Either way is fine with me. Alex, do you have a preference?
I would prefer to see it go to Alex's tree since there are more vfio
patches following this that might get to this merge window.
If you want the driver core part on a branch you can pull/etc I can
organize that.
Jason