openbmc.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/5] driver core, of: support for reserved devices
@ 2021-10-22  2:00 Zev Weiss
  2021-10-22  2:00 ` [PATCH 1/5] of: base: add function to check for status = "reserved" Zev Weiss
                   ` (6 more replies)
  0 siblings, 7 replies; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  2:00 UTC (permalink / raw)
  To: Frank Rowand, Greg Kroah-Hartman, Rob Herring
  Cc: devicetree, Zev Weiss, Andrew Jeffery, openbmc, linux-kernel,
	Jeremy Kerr

Hello all,

This series is another incarnation of a couple other patchsets I've
posted recently [0, 1], but again different enough in overall
structure that I'm not sure it's exactly a v2 (or v3).

As compared to [1], it abandons the writable binary sysfs files and at
Frank's suggestion returns to an approach more akin to [0], though
without any driver-specific (aspeed-smc) changes, which I figure might
as well be done later in a separate series once appropriate
infrastructure is in place.

The basic idea is to implement support for a status property value
that's documented in the DT spec [2], but thus far not used at all in
the kernel (or anywhere else I'm aware of): "reserved".  According to
the spec (section 2.3.4, Table 2.4), this status:

  Indicates that the device is operational, but should not be used.
  Typically this is used for devices that are controlled by another
  software component, such as platform firmware.

With these changes, devices marked as reserved are (at least in some
cases, more on this later) instantiated, but will not have drivers
bound to them unless and until userspace explicitly requests it by
writing the device's name to the driver's sysfs 'bind' file.  This
enables appropriate handling of hardware arrangements that can arise
in contexts like OpenBMC, where a device may be shared with another
external controller not under the kernel's control (for example, the
flash chip storing the host CPU's firmware, shared by the BMC and the
host CPU and exclusively under the control of the latter by default).
Such a device can be marked as reserved so that the kernel refrains
from touching it until appropriate preparatory steps have been taken
(e.g. BMC userspace coordinating with the host CPU to arbitrate which
processor has control of the firmware flash).

Patches 1-3 provide some basic plumbing for checking the "reserved"
status of a device, patch 4 is the main driver-core change, and patch
5 tweaks the OF platform code to not skip reserved devices so that
they can actually be instantiated.

One shortcoming of this series is that it doesn't automatically apply
universally across all busses and drivers -- patch 5 enables support
for platform devices, but similar changes would be required for
support in other busses (e.g. in of_register_spi_devices(),
of_i2c_register_devices(), etc.) and drivers that instantiate DT
devices.  Since at present a "reserved" status is treated as
equivalent to "disabled" and this series preserves that status quo in
those cases I'd hope this wouldn't be considered a deal-breaker, but
a thing to be aware of at least.

Greg: I know on [1] you had commented nack-ing the addition of boolean
function parameters; patch 4 adds a flags mask instead in an analogous
situation.  I'm not certain how much of an improvement you'd consider
that (hopefully at least slightly better, in that the arguments passed
at the call site are more self-explanatory); if that's still
unsatisfactory I'd welcome any suggested alternatives.


Thanks,
Zev

[0] https://lore.kernel.org/openbmc/20210929115409.21254-1-zev@bewilderbeest.net/
[1] https://lore.kernel.org/openbmc/20211007000954.30621-1-zev@bewilderbeest.net/
[2] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf

Zev Weiss (5):
  of: base: add function to check for status = "reserved"
  device property: add fwnode_device_is_reserved()
  of: property: add support for fwnode_device_is_reserved()
  driver core: inhibit automatic driver binding on reserved devices
  of: platform: instantiate reserved devices

 drivers/base/bus.c            |  2 +-
 drivers/base/dd.c             | 13 ++++++----
 drivers/base/property.c       | 16 +++++++++++++
 drivers/dma/idxd/compat.c     |  3 +--
 drivers/of/base.c             | 56 ++++++++++++++++++++++++++++++++++++-------
 drivers/of/platform.c         |  2 +-
 drivers/of/property.c         |  6 +++++
 drivers/vfio/mdev/mdev_core.c |  2 +-
 include/linux/device.h        | 14 ++++++++++-
 include/linux/fwnode.h        |  2 ++
 include/linux/of.h            |  6 +++++
 include/linux/property.h      |  1 +
 12 files changed, 104 insertions(+), 19 deletions(-)

-- 
2.33.1


^ permalink raw reply	[flat|nested] 34+ messages in thread

* [PATCH 1/5] of: base: add function to check for status = "reserved"
  2021-10-22  2:00 [PATCH 0/5] driver core, of: support for reserved devices Zev Weiss
@ 2021-10-22  2:00 ` Zev Weiss
  2021-10-22  6:43   ` Greg Kroah-Hartman
  2021-10-22  2:00 ` [PATCH 2/5] device property: add fwnode_device_is_reserved() Zev Weiss
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  2:00 UTC (permalink / raw)
  To: Frank Rowand, Greg Kroah-Hartman, Rob Herring
  Cc: devicetree, Zev Weiss, Andrew Jeffery, openbmc, linux-kernel,
	Jeremy Kerr

Per v0.3 of the Devicetree Specification [0]:

  Indicates that the device is operational, but should not be used.
  Typically this is used for devices that are controlled by another
  software component, such as platform firmware.

One use-case for this is in OpenBMC, where certain devices (such as a
BIOS flash chip) may be shared by the host and the BMC, but cannot be
accessed by the BMC during its usual boot-time device probing, because
they require additional (potentially elaborate) coordination with the
host to arbitrate which processor is controlling the device.

Devices marked with this status should thus be instantiated, but not
have a driver bound to them or be otherwise touched.

[0] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf

Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
---
 drivers/of/base.c  | 56 +++++++++++++++++++++++++++++++++++++++-------
 include/linux/of.h |  6 +++++
 2 files changed, 54 insertions(+), 8 deletions(-)

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 0ac17256258d..3bd7c5b8a2cc 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -580,14 +580,16 @@ int of_machine_is_compatible(const char *compat)
 EXPORT_SYMBOL(of_machine_is_compatible);
 
 /**
- *  __of_device_is_available - check if a device is available for use
+ * __of_device_check_status - check if a device's status matches a particular string
  *
- *  @device: Node to check for availability, with locks already held
+ * @device: Node to check status of, with locks already held
+ * @val: Status string to check for, or NULL for "okay"/"ok"
  *
- *  Return: True if the status property is absent or set to "okay" or "ok",
- *  false otherwise
+ * Return: True if status property exists and matches @val, or either "okay"
+ * or "ok" if @val is NULL, or if status property is absent and @val is
+ * "okay", "ok", or NULL.  False otherwise.
  */
-static bool __of_device_is_available(const struct device_node *device)
+static bool __of_device_check_status(const struct device_node *device, const char *val)
 {
 	const char *status;
 	int statlen;
@@ -596,17 +598,35 @@ static bool __of_device_is_available(const struct device_node *device)
 		return false;
 
 	status = __of_get_property(device, "status", &statlen);
-	if (status == NULL)
-		return true;
+	if (!status) {
+		/* a missing status property is treated as "okay" */
+		status = "okay";
+		statlen = strlen(status) + 1; /* property lengths include the NUL terminator */
+	}
 
 	if (statlen > 0) {
-		if (!strcmp(status, "okay") || !strcmp(status, "ok"))
+		if (!val && (!strcmp(status, "okay") || !strcmp(status, "ok")))
+			return true;
+		else if (val && !strcmp(status, val))
 			return true;
 	}
 
 	return false;
 }
 
+/**
+ * __of_device_is_available - check if a device is available for use
+ *
+ * @device: Node to check for availability, with locks already held
+ *
+ * Return: True if the status property is absent or set to "okay" or "ok",
+ * false otherwise
+ */
+static bool __of_device_is_available(const struct device_node *device)
+{
+	return __of_device_check_status(device, NULL);
+}
+
 /**
  *  of_device_is_available - check if a device is available for use
  *
@@ -628,6 +648,26 @@ bool of_device_is_available(const struct device_node *device)
 }
 EXPORT_SYMBOL(of_device_is_available);
 
+/**
+ * of_device_is_reserved - check if a device is marked as reserved
+ *
+ * @device: Node to check for reservation
+ *
+ * Return: True if the status property is set to "reserved", false otherwise
+ */
+bool of_device_is_reserved(const struct device_node *device)
+{
+	unsigned long flags;
+	bool res;
+
+	raw_spin_lock_irqsave(&devtree_lock, flags);
+	res = __of_device_check_status(device, "reserved");
+	raw_spin_unlock_irqrestore(&devtree_lock, flags);
+
+	return res;
+}
+EXPORT_SYMBOL(of_device_is_reserved);
+
 /**
  *  of_device_is_big_endian - check if a device has BE registers
  *
diff --git a/include/linux/of.h b/include/linux/of.h
index 6f1c41f109bb..aa9762da5e7c 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -345,6 +345,7 @@ extern int of_device_is_compatible(const struct device_node *device,
 extern int of_device_compatible_match(struct device_node *device,
 				      const char *const *compat);
 extern bool of_device_is_available(const struct device_node *device);
+extern bool of_device_is_reserved(const struct device_node *device);
 extern bool of_device_is_big_endian(const struct device_node *device);
 extern const void *of_get_property(const struct device_node *node,
 				const char *name,
@@ -707,6 +708,11 @@ static inline bool of_device_is_available(const struct device_node *device)
 	return false;
 }
 
+static inline bool of_device_is_reserved(const struct device_node *device)
+{
+	return false;
+}
+
 static inline bool of_device_is_big_endian(const struct device_node *device)
 {
 	return false;
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 2/5] device property: add fwnode_device_is_reserved()
  2021-10-22  2:00 [PATCH 0/5] driver core, of: support for reserved devices Zev Weiss
  2021-10-22  2:00 ` [PATCH 1/5] of: base: add function to check for status = "reserved" Zev Weiss
@ 2021-10-22  2:00 ` Zev Weiss
  2021-10-22  2:00 ` [PATCH 3/5] of: property: add support for fwnode_device_is_reserved() Zev Weiss
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  2:00 UTC (permalink / raw)
  To: Frank Rowand, Greg Kroah-Hartman, Rob Herring
  Cc: devicetree, Heikki Krogerus, Saravana Kannan, Zev Weiss,
	Rafael J. Wysocki, Andrew Jeffery, openbmc, linux-kernel,
	Daniel Scally, linux-acpi, Jeremy Kerr, Andy Shevchenko,
	Len Brown

This is the fwnode API corresponding to of_device_is_reserved(), which
indicates that a device exists but may not be immediately accessible.

Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
---
 drivers/base/property.c  | 16 ++++++++++++++++
 include/linux/fwnode.h   |  2 ++
 include/linux/property.h |  1 +
 3 files changed, 19 insertions(+)

diff --git a/drivers/base/property.c b/drivers/base/property.c
index 453918eb7390..7b70cb04531c 100644
--- a/drivers/base/property.c
+++ b/drivers/base/property.c
@@ -845,6 +845,22 @@ bool fwnode_device_is_available(const struct fwnode_handle *fwnode)
 }
 EXPORT_SYMBOL_GPL(fwnode_device_is_available);
 
+/**
+ * fwnode_device_is_reserved - check if a device is reserved for use
+ * @fwnode: Pointer to the fwnode of the device.
+ *
+ * For fwnode node types that don't implement the .device_is_reserved()
+ * operation, this function returns false.
+ */
+bool fwnode_device_is_reserved(const struct fwnode_handle *fwnode)
+{
+	if (!fwnode_has_op(fwnode, device_is_reserved))
+		return false;
+
+	return fwnode_call_bool_op(fwnode, device_is_reserved);
+}
+EXPORT_SYMBOL_GPL(fwnode_device_is_reserved);
+
 /**
  * device_get_child_node_count - return the number of child nodes for device
  * @dev: Device to cound the child nodes for
diff --git a/include/linux/fwnode.h b/include/linux/fwnode.h
index 9f4ad719bfe3..1a9aefbe12f1 100644
--- a/include/linux/fwnode.h
+++ b/include/linux/fwnode.h
@@ -86,6 +86,7 @@ struct fwnode_reference_args {
  * @get: Get a reference to an fwnode.
  * @put: Put a reference to an fwnode.
  * @device_is_available: Return true if the device is available.
+ * @device_is_reserved: Return true if the device is reserved.
  * @device_get_match_data: Return the device driver match data.
  * @property_present: Return true if a property is present.
  * @property_read_int_array: Read an array of integer properties. Return zero on
@@ -110,6 +111,7 @@ struct fwnode_operations {
 	struct fwnode_handle *(*get)(struct fwnode_handle *fwnode);
 	void (*put)(struct fwnode_handle *fwnode);
 	bool (*device_is_available)(const struct fwnode_handle *fwnode);
+	bool (*device_is_reserved)(const struct fwnode_handle *fwnode);
 	const void *(*device_get_match_data)(const struct fwnode_handle *fwnode,
 					     const struct device *dev);
 	bool (*property_present)(const struct fwnode_handle *fwnode,
diff --git a/include/linux/property.h b/include/linux/property.h
index 357513a977e5..455b022fa612 100644
--- a/include/linux/property.h
+++ b/include/linux/property.h
@@ -50,6 +50,7 @@ int device_property_match_string(struct device *dev,
 				 const char *propname, const char *string);
 
 bool fwnode_device_is_available(const struct fwnode_handle *fwnode);
+bool fwnode_device_is_reserved(const struct fwnode_handle *fwnode);
 bool fwnode_property_present(const struct fwnode_handle *fwnode,
 			     const char *propname);
 int fwnode_property_read_u8_array(const struct fwnode_handle *fwnode,
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 3/5] of: property: add support for fwnode_device_is_reserved()
  2021-10-22  2:00 [PATCH 0/5] driver core, of: support for reserved devices Zev Weiss
  2021-10-22  2:00 ` [PATCH 1/5] of: base: add function to check for status = "reserved" Zev Weiss
  2021-10-22  2:00 ` [PATCH 2/5] device property: add fwnode_device_is_reserved() Zev Weiss
@ 2021-10-22  2:00 ` Zev Weiss
  2021-10-22  2:00 ` [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices Zev Weiss
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  2:00 UTC (permalink / raw)
  To: Frank Rowand, Greg Kroah-Hartman, Rob Herring
  Cc: devicetree, Zev Weiss, Andrew Jeffery, openbmc, linux-kernel,
	Jeremy Kerr

This is just a simple pass-through to of_device_is_reserved().

Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
---
 drivers/of/property.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/of/property.c b/drivers/of/property.c
index a3483484a5a2..67a8cb7ac462 100644
--- a/drivers/of/property.c
+++ b/drivers/of/property.c
@@ -872,6 +872,11 @@ static bool of_fwnode_device_is_available(const struct fwnode_handle *fwnode)
 	return of_device_is_available(to_of_node(fwnode));
 }
 
+static bool of_fwnode_device_is_reserved(const struct fwnode_handle *fwnode)
+{
+	return of_device_is_reserved(to_of_node(fwnode));
+}
+
 static bool of_fwnode_property_present(const struct fwnode_handle *fwnode,
 				       const char *propname)
 {
@@ -1458,6 +1463,7 @@ const struct fwnode_operations of_fwnode_ops = {
 	.get = of_fwnode_get,
 	.put = of_fwnode_put,
 	.device_is_available = of_fwnode_device_is_available,
+	.device_is_reserved = of_fwnode_device_is_reserved,
 	.device_get_match_data = of_fwnode_device_get_match_data,
 	.property_present = of_fwnode_property_present,
 	.property_read_int_array = of_fwnode_property_read_int_array,
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-22  2:00 [PATCH 0/5] driver core, of: support for reserved devices Zev Weiss
                   ` (2 preceding siblings ...)
  2021-10-22  2:00 ` [PATCH 3/5] of: property: add support for fwnode_device_is_reserved() Zev Weiss
@ 2021-10-22  2:00 ` Zev Weiss
  2021-10-22  6:46   ` Greg Kroah-Hartman
  2021-10-22  2:00 ` [PATCH 5/5] of: platform: instantiate " Zev Weiss
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  2:00 UTC (permalink / raw)
  To: Frank Rowand, Greg Kroah-Hartman, Rob Herring
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Jianxiong Gao, Dave Jiang, kvm, Saravana Kannan,
	Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Bhaskar Chowdhury,
	Thomas Gleixner, Andy Shevchenko, Andrew Jeffery, Cornelia Huck,
	linux-kernel, Vinod Koul, dmaengine

Devices whose fwnodes are marked as reserved are instantiated, but
will not have a driver bound to them unless userspace explicitly
requests it by writing to a 'bind' sysfs file.  This is to enable
devices that may require special (userspace-mediated) preparation
before a driver can safely probe them.

Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
---
 drivers/base/bus.c            |  2 +-
 drivers/base/dd.c             | 13 ++++++++-----
 drivers/dma/idxd/compat.c     |  3 +--
 drivers/vfio/mdev/mdev_core.c |  2 +-
 include/linux/device.h        | 14 +++++++++++++-
 5 files changed, 24 insertions(+), 10 deletions(-)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index bdc98c5713d5..8a30f9a02de0 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -211,7 +211,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf,
 
 	dev = bus_find_device_by_name(bus, NULL, buf);
 	if (dev && driver_match_device(drv, dev)) {
-		err = device_driver_attach(drv, dev);
+		err = device_driver_attach(drv, dev, DRV_BIND_EXPLICIT);
 		if (!err) {
 			/* success */
 			err = count;
diff --git a/drivers/base/dd.c b/drivers/base/dd.c
index 68ea1f949daa..ee64740a63d9 100644
--- a/drivers/base/dd.c
+++ b/drivers/base/dd.c
@@ -22,6 +22,7 @@
 #include <linux/dma-map-ops.h>
 #include <linux/init.h>
 #include <linux/module.h>
+#include <linux/property.h>
 #include <linux/kthread.h>
 #include <linux/wait.h>
 #include <linux/async.h>
@@ -727,13 +728,14 @@ void wait_for_device_probe(void)
 }
 EXPORT_SYMBOL_GPL(wait_for_device_probe);
 
-static int __driver_probe_device(struct device_driver *drv, struct device *dev)
+static int __driver_probe_device(struct device_driver *drv, struct device *dev, u32 flags)
 {
 	int ret = 0;
 
 	if (dev->p->dead || !device_is_registered(dev))
 		return -ENODEV;
-	if (dev->driver)
+	if (dev->driver ||
+	    (fwnode_device_is_reserved(dev->fwnode) && !(flags & DRV_BIND_EXPLICIT)))
 		return -EBUSY;
 
 	dev->can_match = true;
@@ -778,7 +780,7 @@ static int driver_probe_device(struct device_driver *drv, struct device *dev)
 	int ret;
 
 	atomic_inc(&probe_count);
-	ret = __driver_probe_device(drv, dev);
+	ret = __driver_probe_device(drv, dev, DRV_BIND_DEFAULT);
 	if (ret == -EPROBE_DEFER || ret == EPROBE_DEFER) {
 		driver_deferred_probe_add(dev);
 
@@ -1052,16 +1054,17 @@ static void __device_driver_unlock(struct device *dev, struct device *parent)
  * device_driver_attach - attach a specific driver to a specific device
  * @drv: Driver to attach
  * @dev: Device to attach it to
+ * @flags: Bitmask of DRV_BIND_* flags
  *
  * Manually attach driver to a device. Will acquire both @dev lock and
  * @dev->parent lock if needed. Returns 0 on success, -ERR on failure.
  */
-int device_driver_attach(struct device_driver *drv, struct device *dev)
+int device_driver_attach(struct device_driver *drv, struct device *dev, u32 flags)
 {
 	int ret;
 
 	__device_driver_lock(dev, dev->parent);
-	ret = __driver_probe_device(drv, dev);
+	ret = __driver_probe_device(drv, dev, flags);
 	__device_driver_unlock(dev, dev->parent);
 
 	/* also return probe errors as normal negative errnos */
diff --git a/drivers/dma/idxd/compat.c b/drivers/dma/idxd/compat.c
index 3df21615f888..51df38dea15a 100644
--- a/drivers/dma/idxd/compat.c
+++ b/drivers/dma/idxd/compat.c
@@ -7,7 +7,6 @@
 #include <linux/device/bus.h>
 #include "idxd.h"
 
-extern int device_driver_attach(struct device_driver *drv, struct device *dev);
 extern void device_driver_detach(struct device *dev);
 
 #define DRIVER_ATTR_IGNORE_LOCKDEP(_name, _mode, _show, _store)	\
@@ -56,7 +55,7 @@ static ssize_t bind_store(struct device_driver *drv, const char *buf, size_t cou
 	if (!alt_drv)
 		return -ENODEV;
 
-	rc = device_driver_attach(alt_drv, dev);
+	rc = device_driver_attach(alt_drv, dev, DRV_BIND_EXPLICIT);
 	if (rc < 0)
 		return rc;
 
diff --git a/drivers/vfio/mdev/mdev_core.c b/drivers/vfio/mdev/mdev_core.c
index b314101237fe..f42c6ec543c8 100644
--- a/drivers/vfio/mdev/mdev_core.c
+++ b/drivers/vfio/mdev/mdev_core.c
@@ -309,7 +309,7 @@ int mdev_device_create(struct mdev_type *type, const guid_t *uuid)
 
 	if (!drv)
 		drv = &vfio_mdev_driver;
-	ret = device_driver_attach(&drv->driver, &mdev->dev);
+	ret = device_driver_attach(&drv->driver, &mdev->dev, DRV_BIND_DEFAULT);
 	if (ret)
 		goto out_del;
 
diff --git a/include/linux/device.h b/include/linux/device.h
index e270cb740b9e..1ada1093799b 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -876,12 +876,24 @@ static inline void *dev_get_platdata(const struct device *dev)
 	return dev->platform_data;
 }
 
+/*
+ * Driver-binding flags (for passing to device_driver_attach())
+ *
+ * DRV_BIND_DEFAULT: a default, automatic bind, e.g. as a result of a device
+ *                   being added for which we already have a driver, or vice
+ *                   versa.
+ * DRV_BIND_EXPLICIT: an explicit, userspace-requested driver bind, e.g. as a
+ *                    result of a write to /sys/bus/.../drivers/.../bind
+ */
+#define DRV_BIND_DEFAULT	0
+#define DRV_BIND_EXPLICIT	BIT(0)
+
 /*
  * Manual binding of a device to driver. See drivers/base/bus.c
  * for information on use.
  */
 int __must_check device_driver_attach(struct device_driver *drv,
-				      struct device *dev);
+				      struct device *dev, u32 flags);
 int __must_check device_bind_driver(struct device *dev);
 void device_release_driver(struct device *dev);
 int  __must_check device_attach(struct device *dev);
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* [PATCH 5/5] of: platform: instantiate reserved devices
  2021-10-22  2:00 [PATCH 0/5] driver core, of: support for reserved devices Zev Weiss
                   ` (3 preceding siblings ...)
  2021-10-22  2:00 ` [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices Zev Weiss
@ 2021-10-22  2:00 ` Zev Weiss
  2021-10-22  2:58 ` [PATCH 0/5] driver core, of: support for " Rob Herring
  2021-10-22  6:50 ` Greg Kroah-Hartman
  6 siblings, 0 replies; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  2:00 UTC (permalink / raw)
  To: Frank Rowand, Greg Kroah-Hartman, Rob Herring
  Cc: devicetree, Zev Weiss, Andrew Jeffery, openbmc, linux-kernel,
	Jeremy Kerr

Devices with a "reserved" status will now be passed through to
of_device_add() along with "okay" ones so that the driver core is
aware of their existence and drivers can be explicitly bound to them
when appropriate.

Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
---
 drivers/of/platform.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/of/platform.c b/drivers/of/platform.c
index 74afbb7a4f5e..060e1e9058c2 100644
--- a/drivers/of/platform.c
+++ b/drivers/of/platform.c
@@ -170,7 +170,7 @@ static struct platform_device *of_platform_device_create_pdata(
 {
 	struct platform_device *dev;
 
-	if (!of_device_is_available(np) ||
+	if ((!of_device_is_available(np) && !of_device_is_reserved(np)) ||
 	    of_node_test_and_set_flag(np, OF_POPULATED))
 		return NULL;
 
-- 
2.33.1


^ permalink raw reply related	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/5] driver core, of: support for reserved devices
  2021-10-22  2:00 [PATCH 0/5] driver core, of: support for reserved devices Zev Weiss
                   ` (4 preceding siblings ...)
  2021-10-22  2:00 ` [PATCH 5/5] of: platform: instantiate " Zev Weiss
@ 2021-10-22  2:58 ` Rob Herring
  2021-10-22  3:13   ` Zev Weiss
  2021-10-22  6:50   ` Greg Kroah-Hartman
  2021-10-22  6:50 ` Greg Kroah-Hartman
  6 siblings, 2 replies; 34+ messages in thread
From: Rob Herring @ 2021-10-22  2:58 UTC (permalink / raw)
  To: Zev Weiss
  Cc: devicetree, Andrew Jeffery, Greg Kroah-Hartman, OpenBMC Maillist,
	linux-kernel, Jeremy Kerr, Frank Rowand

On Thu, Oct 21, 2021 at 9:00 PM Zev Weiss <zev@bewilderbeest.net> wrote:
>
> Hello all,
>
> This series is another incarnation of a couple other patchsets I've
> posted recently [0, 1], but again different enough in overall
> structure that I'm not sure it's exactly a v2 (or v3).
>
> As compared to [1], it abandons the writable binary sysfs files and at
> Frank's suggestion returns to an approach more akin to [0], though
> without any driver-specific (aspeed-smc) changes, which I figure might
> as well be done later in a separate series once appropriate
> infrastructure is in place.

I skimmed this, and overall I like the approach.

> The basic idea is to implement support for a status property value
> that's documented in the DT spec [2], but thus far not used at all in
> the kernel (or anywhere else I'm aware of): "reserved".  According to
> the spec (section 2.3.4, Table 2.4), this status:
>
>   Indicates that the device is operational, but should not be used.
>   Typically this is used for devices that are controlled by another
>   software component, such as platform firmware.
>
> With these changes, devices marked as reserved are (at least in some
> cases, more on this later) instantiated, but will not have drivers
> bound to them unless and until userspace explicitly requests it by
> writing the device's name to the driver's sysfs 'bind' file.  This
> enables appropriate handling of hardware arrangements that can arise
> in contexts like OpenBMC, where a device may be shared with another
> external controller not under the kernel's control (for example, the
> flash chip storing the host CPU's firmware, shared by the BMC and the
> host CPU and exclusively under the control of the latter by default).
> Such a device can be marked as reserved so that the kernel refrains
> from touching it until appropriate preparatory steps have been taken
> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
> processor has control of the firmware flash).
>
> Patches 1-3 provide some basic plumbing for checking the "reserved"
> status of a device, patch 4 is the main driver-core change, and patch
> 5 tweaks the OF platform code to not skip reserved devices so that
> they can actually be instantiated.
>
> One shortcoming of this series is that it doesn't automatically apply
> universally across all busses and drivers -- patch 5 enables support
> for platform devices, but similar changes would be required for
> support in other busses (e.g. in of_register_spi_devices(),
> of_i2c_register_devices(), etc.) and drivers that instantiate DT
> devices.  Since at present a "reserved" status is treated as
> equivalent to "disabled" and this series preserves that status quo in
> those cases I'd hope this wouldn't be considered a deal-breaker, but
> a thing to be aware of at least.
>
> Greg: I know on [1] you had commented nack-ing the addition of boolean
> function parameters; patch 4 adds a flags mask instead in an analogous
> situation.  I'm not certain how much of an improvement you'd consider
> that (hopefully at least slightly better, in that the arguments passed
> at the call site are more self-explanatory); if that's still
> unsatisfactory I'd welcome any suggested alternatives.

Can't we add a flag bit in struct device to reflect manual binding?
bind will set it and unbind clears it.

Rob

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/5] driver core, of: support for reserved devices
  2021-10-22  2:58 ` [PATCH 0/5] driver core, of: support for " Rob Herring
@ 2021-10-22  3:13   ` Zev Weiss
  2021-10-22  6:50   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  3:13 UTC (permalink / raw)
  To: Rob Herring
  Cc: devicetree, Andrew Jeffery, Greg Kroah-Hartman, OpenBMC Maillist,
	linux-kernel, Jeremy Kerr, Frank Rowand

On Thu, Oct 21, 2021 at 07:58:46PM PDT, Rob Herring wrote:
>On Thu, Oct 21, 2021 at 9:00 PM Zev Weiss <zev@bewilderbeest.net> wrote:
>>
>> Hello all,
>>
>> This series is another incarnation of a couple other patchsets I've
>> posted recently [0, 1], but again different enough in overall
>> structure that I'm not sure it's exactly a v2 (or v3).
>>
>> As compared to [1], it abandons the writable binary sysfs files and at
>> Frank's suggestion returns to an approach more akin to [0], though
>> without any driver-specific (aspeed-smc) changes, which I figure might
>> as well be done later in a separate series once appropriate
>> infrastructure is in place.
>
>I skimmed this, and overall I like the approach.
>
>> The basic idea is to implement support for a status property value
>> that's documented in the DT spec [2], but thus far not used at all in
>> the kernel (or anywhere else I'm aware of): "reserved".  According to
>> the spec (section 2.3.4, Table 2.4), this status:
>>
>>   Indicates that the device is operational, but should not be used.
>>   Typically this is used for devices that are controlled by another
>>   software component, such as platform firmware.
>>
>> With these changes, devices marked as reserved are (at least in some
>> cases, more on this later) instantiated, but will not have drivers
>> bound to them unless and until userspace explicitly requests it by
>> writing the device's name to the driver's sysfs 'bind' file.  This
>> enables appropriate handling of hardware arrangements that can arise
>> in contexts like OpenBMC, where a device may be shared with another
>> external controller not under the kernel's control (for example, the
>> flash chip storing the host CPU's firmware, shared by the BMC and the
>> host CPU and exclusively under the control of the latter by default).
>> Such a device can be marked as reserved so that the kernel refrains
>> from touching it until appropriate preparatory steps have been taken
>> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
>> processor has control of the firmware flash).
>>
>> Patches 1-3 provide some basic plumbing for checking the "reserved"
>> status of a device, patch 4 is the main driver-core change, and patch
>> 5 tweaks the OF platform code to not skip reserved devices so that
>> they can actually be instantiated.
>>
>> One shortcoming of this series is that it doesn't automatically apply
>> universally across all busses and drivers -- patch 5 enables support
>> for platform devices, but similar changes would be required for
>> support in other busses (e.g. in of_register_spi_devices(),
>> of_i2c_register_devices(), etc.) and drivers that instantiate DT
>> devices.  Since at present a "reserved" status is treated as
>> equivalent to "disabled" and this series preserves that status quo in
>> those cases I'd hope this wouldn't be considered a deal-breaker, but
>> a thing to be aware of at least.
>>
>> Greg: I know on [1] you had commented nack-ing the addition of boolean
>> function parameters; patch 4 adds a flags mask instead in an analogous
>> situation.  I'm not certain how much of an improvement you'd consider
>> that (hopefully at least slightly better, in that the arguments passed
>> at the call site are more self-explanatory); if that's still
>> unsatisfactory I'd welcome any suggested alternatives.
>
>Can't we add a flag bit in struct device to reflect manual binding?
>bind will set it and unbind clears it.
>

I considered this (and actually drafted up a version that did exactly 
that), but it seemed like turning a parameter-passing problem into a 
state-maintenance problem (finding all the places that flag would need 
to be cleared and ensuring newly-added ones don't get missed, which 
unlike a function parameter the compiler can't really check for us).  
Given that the live range (definition to use) of that value is entirely 
within the codepath of a single call-chain (bind_store() -> 
device_driver_attach() -> __driver_probe_device()), continuing to 
maintain that state beyond that call chain seemed like unnecessary 
complexity to me, but if there's a consensus that that would actually be 
preferable I can certainly do it that way instead.


Zev


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] of: base: add function to check for status = "reserved"
  2021-10-22  2:00 ` [PATCH 1/5] of: base: add function to check for status = "reserved" Zev Weiss
@ 2021-10-22  6:43   ` Greg Kroah-Hartman
  2021-10-22  7:38     ` Zev Weiss
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-22  6:43 UTC (permalink / raw)
  To: Zev Weiss
  Cc: devicetree, Andrew Jeffery, openbmc, linux-kernel, Rob Herring,
	Jeremy Kerr, Frank Rowand

On Thu, Oct 21, 2021 at 07:00:28PM -0700, Zev Weiss wrote:
> Per v0.3 of the Devicetree Specification [0]:
> 
>   Indicates that the device is operational, but should not be used.
>   Typically this is used for devices that are controlled by another
>   software component, such as platform firmware.
> 
> One use-case for this is in OpenBMC, where certain devices (such as a
> BIOS flash chip) may be shared by the host and the BMC, but cannot be
> accessed by the BMC during its usual boot-time device probing, because
> they require additional (potentially elaborate) coordination with the
> host to arbitrate which processor is controlling the device.
> 
> Devices marked with this status should thus be instantiated, but not
> have a driver bound to them or be otherwise touched.
> 
> [0] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf
> 
> Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
> ---
>  drivers/of/base.c  | 56 +++++++++++++++++++++++++++++++++++++++-------
>  include/linux/of.h |  6 +++++
>  2 files changed, 54 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/of/base.c b/drivers/of/base.c
> index 0ac17256258d..3bd7c5b8a2cc 100644
> --- a/drivers/of/base.c
> +++ b/drivers/of/base.c
> @@ -580,14 +580,16 @@ int of_machine_is_compatible(const char *compat)
>  EXPORT_SYMBOL(of_machine_is_compatible);
>  
>  /**
> - *  __of_device_is_available - check if a device is available for use
> + * __of_device_check_status - check if a device's status matches a particular string
>   *
> - *  @device: Node to check for availability, with locks already held
> + * @device: Node to check status of, with locks already held
> + * @val: Status string to check for, or NULL for "okay"/"ok"
>   *
> - *  Return: True if the status property is absent or set to "okay" or "ok",
> - *  false otherwise
> + * Return: True if status property exists and matches @val, or either "okay"
> + * or "ok" if @val is NULL, or if status property is absent and @val is
> + * "okay", "ok", or NULL.  False otherwise.
>   */
> -static bool __of_device_is_available(const struct device_node *device)
> +static bool __of_device_check_status(const struct device_node *device, const char *val)
>  {
>  	const char *status;
>  	int statlen;
> @@ -596,17 +598,35 @@ static bool __of_device_is_available(const struct device_node *device)
>  		return false;
>  
>  	status = __of_get_property(device, "status", &statlen);
> -	if (status == NULL)
> -		return true;
> +	if (!status) {
> +		/* a missing status property is treated as "okay" */
> +		status = "okay";
> +		statlen = strlen(status) + 1; /* property lengths include the NUL terminator */
> +	}
>  
>  	if (statlen > 0) {
> -		if (!strcmp(status, "okay") || !strcmp(status, "ok"))
> +		if (!val && (!strcmp(status, "okay") || !strcmp(status, "ok")))
> +			return true;
> +		else if (val && !strcmp(status, val))


Ick, where is this string coming from?  The kernel or userspace or a
device tree?  This feels very wrong, why is the kernel doing parsing
like this of different options that all mean the same thing?


>  			return true;
>  	}
>  
>  	return false;
>  }
>  
> +/**
> + * __of_device_is_available - check if a device is available for use
> + *
> + * @device: Node to check for availability, with locks already held
> + *
> + * Return: True if the status property is absent or set to "okay" or "ok",
> + * false otherwise
> + */
> +static bool __of_device_is_available(const struct device_node *device)
> +{
> +	return __of_device_check_status(device, NULL);
> +}
> +
>  /**
>   *  of_device_is_available - check if a device is available for use
>   *
> @@ -628,6 +648,26 @@ bool of_device_is_available(const struct device_node *device)
>  }
>  EXPORT_SYMBOL(of_device_is_available);
>  
> +/**
> + * of_device_is_reserved - check if a device is marked as reserved
> + *
> + * @device: Node to check for reservation
> + *
> + * Return: True if the status property is set to "reserved", false otherwise
> + */
> +bool of_device_is_reserved(const struct device_node *device)
> +{
> +	unsigned long flags;
> +	bool res;
> +
> +	raw_spin_lock_irqsave(&devtree_lock, flags);
> +	res = __of_device_check_status(device, "reserved");
> +	raw_spin_unlock_irqrestore(&devtree_lock, flags);

Why is this a "raw" spinlock?

Where is this status coming from?

> +
> +	return res;
> +}
> +EXPORT_SYMBOL(of_device_is_reserved);

EXPORT_SYMBOL_GPL()?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-22  2:00 ` [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices Zev Weiss
@ 2021-10-22  6:46   ` Greg Kroah-Hartman
  2021-10-22  8:32     ` Zev Weiss
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-22  6:46 UTC (permalink / raw)
  To: Zev Weiss
  Cc: kvm, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr, Rajat Jain,
	Frank Rowand, Jianxiong Gao, Dave Jiang, Saravana Kannan,
	Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> Devices whose fwnodes are marked as reserved are instantiated, but
> will not have a driver bound to them unless userspace explicitly
> requests it by writing to a 'bind' sysfs file.  This is to enable
> devices that may require special (userspace-mediated) preparation
> before a driver can safely probe them.
> 
> Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
> ---
>  drivers/base/bus.c            |  2 +-
>  drivers/base/dd.c             | 13 ++++++++-----
>  drivers/dma/idxd/compat.c     |  3 +--
>  drivers/vfio/mdev/mdev_core.c |  2 +-
>  include/linux/device.h        | 14 +++++++++++++-
>  5 files changed, 24 insertions(+), 10 deletions(-)

Ugh, no, I don't really want to add yet-another-state to the driver core
like this.  Why are these devices even in the kernel with a driver that
wants to bind to them registered if the driver somehow should NOT be
bound to it?  Shouldn't all of that logic be in the crazy driver itself
as that is a very rare and odd thing to do that the driver core should
not care about at all.

And why does a device need userspace interaction at all?  Again, why
would the driver not know about this and handle it all directly?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/5] driver core, of: support for reserved devices
  2021-10-22  2:00 [PATCH 0/5] driver core, of: support for reserved devices Zev Weiss
                   ` (5 preceding siblings ...)
  2021-10-22  2:58 ` [PATCH 0/5] driver core, of: support for " Rob Herring
@ 2021-10-22  6:50 ` Greg Kroah-Hartman
  2021-10-22  9:00   ` Zev Weiss
  6 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-22  6:50 UTC (permalink / raw)
  To: Zev Weiss
  Cc: devicetree, Andrew Jeffery, openbmc, linux-kernel, Rob Herring,
	Jeremy Kerr, Frank Rowand

On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
> Hello all,
> 
> This series is another incarnation of a couple other patchsets I've
> posted recently [0, 1], but again different enough in overall
> structure that I'm not sure it's exactly a v2 (or v3).
> 
> As compared to [1], it abandons the writable binary sysfs files and at
> Frank's suggestion returns to an approach more akin to [0], though
> without any driver-specific (aspeed-smc) changes, which I figure might
> as well be done later in a separate series once appropriate
> infrastructure is in place.
> 
> The basic idea is to implement support for a status property value
> that's documented in the DT spec [2], but thus far not used at all in
> the kernel (or anywhere else I'm aware of): "reserved".  According to
> the spec (section 2.3.4, Table 2.4), this status:
> 
>   Indicates that the device is operational, but should not be used.
>   Typically this is used for devices that are controlled by another
>   software component, such as platform firmware.
> 
> With these changes, devices marked as reserved are (at least in some
> cases, more on this later) instantiated, but will not have drivers
> bound to them unless and until userspace explicitly requests it by
> writing the device's name to the driver's sysfs 'bind' file.  This
> enables appropriate handling of hardware arrangements that can arise
> in contexts like OpenBMC, where a device may be shared with another
> external controller not under the kernel's control (for example, the
> flash chip storing the host CPU's firmware, shared by the BMC and the
> host CPU and exclusively under the control of the latter by default).
> Such a device can be marked as reserved so that the kernel refrains
> from touching it until appropriate preparatory steps have been taken
> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
> processor has control of the firmware flash).
> 
> Patches 1-3 provide some basic plumbing for checking the "reserved"
> status of a device, patch 4 is the main driver-core change, and patch
> 5 tweaks the OF platform code to not skip reserved devices so that
> they can actually be instantiated.

Again, the driver core should not care about this, that is up to the bus
that wants to read these "reserved" values and do something with them or
not (remember the bus is the thing that does the binding, not the driver
core).

But are you sure you are using the "reserved" field properly?  You are
wanting to do "something" to the device to later on be able to then have
the kernel touch the device, while it seems that the reason for this
field is for the kernel to NEVER touch the device at all.  What will
break if you change this logic?

> One shortcoming of this series is that it doesn't automatically apply
> universally across all busses and drivers -- patch 5 enables support
> for platform devices, but similar changes would be required for
> support in other busses (e.g. in of_register_spi_devices(),
> of_i2c_register_devices(), etc.) and drivers that instantiate DT
> devices.  Since at present a "reserved" status is treated as
> equivalent to "disabled" and this series preserves that status quo in
> those cases I'd hope this wouldn't be considered a deal-breaker, but
> a thing to be aware of at least.
> 
> Greg: I know on [1] you had commented nack-ing the addition of boolean
> function parameters; patch 4 adds a flags mask instead in an analogous
> situation.  I'm not certain how much of an improvement you'd consider
> that (hopefully at least slightly better, in that the arguments passed
> at the call site are more self-explanatory); if that's still
> unsatisfactory I'd welcome any suggested alternatives.

Flags are a bit better, yes, but still I do not think this is the right
way to go here, see my comments on the patches...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/5] driver core, of: support for reserved devices
  2021-10-22  2:58 ` [PATCH 0/5] driver core, of: support for " Rob Herring
  2021-10-22  3:13   ` Zev Weiss
@ 2021-10-22  6:50   ` Greg Kroah-Hartman
  1 sibling, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-22  6:50 UTC (permalink / raw)
  To: Rob Herring
  Cc: devicetree, Zev Weiss, Andrew Jeffery, OpenBMC Maillist,
	linux-kernel, Jeremy Kerr, Frank Rowand

On Thu, Oct 21, 2021 at 09:58:46PM -0500, Rob Herring wrote:
> On Thu, Oct 21, 2021 at 9:00 PM Zev Weiss <zev@bewilderbeest.net> wrote:
> >
> > Hello all,
> >
> > This series is another incarnation of a couple other patchsets I've
> > posted recently [0, 1], but again different enough in overall
> > structure that I'm not sure it's exactly a v2 (or v3).
> >
> > As compared to [1], it abandons the writable binary sysfs files and at
> > Frank's suggestion returns to an approach more akin to [0], though
> > without any driver-specific (aspeed-smc) changes, which I figure might
> > as well be done later in a separate series once appropriate
> > infrastructure is in place.
> 
> I skimmed this, and overall I like the approach.
> 
> > The basic idea is to implement support for a status property value
> > that's documented in the DT spec [2], but thus far not used at all in
> > the kernel (or anywhere else I'm aware of): "reserved".  According to
> > the spec (section 2.3.4, Table 2.4), this status:
> >
> >   Indicates that the device is operational, but should not be used.
> >   Typically this is used for devices that are controlled by another
> >   software component, such as platform firmware.
> >
> > With these changes, devices marked as reserved are (at least in some
> > cases, more on this later) instantiated, but will not have drivers
> > bound to them unless and until userspace explicitly requests it by
> > writing the device's name to the driver's sysfs 'bind' file.  This
> > enables appropriate handling of hardware arrangements that can arise
> > in contexts like OpenBMC, where a device may be shared with another
> > external controller not under the kernel's control (for example, the
> > flash chip storing the host CPU's firmware, shared by the BMC and the
> > host CPU and exclusively under the control of the latter by default).
> > Such a device can be marked as reserved so that the kernel refrains
> > from touching it until appropriate preparatory steps have been taken
> > (e.g. BMC userspace coordinating with the host CPU to arbitrate which
> > processor has control of the firmware flash).
> >
> > Patches 1-3 provide some basic plumbing for checking the "reserved"
> > status of a device, patch 4 is the main driver-core change, and patch
> > 5 tweaks the OF platform code to not skip reserved devices so that
> > they can actually be instantiated.
> >
> > One shortcoming of this series is that it doesn't automatically apply
> > universally across all busses and drivers -- patch 5 enables support
> > for platform devices, but similar changes would be required for
> > support in other busses (e.g. in of_register_spi_devices(),
> > of_i2c_register_devices(), etc.) and drivers that instantiate DT
> > devices.  Since at present a "reserved" status is treated as
> > equivalent to "disabled" and this series preserves that status quo in
> > those cases I'd hope this wouldn't be considered a deal-breaker, but
> > a thing to be aware of at least.
> >
> > Greg: I know on [1] you had commented nack-ing the addition of boolean
> > function parameters; patch 4 adds a flags mask instead in an analogous
> > situation.  I'm not certain how much of an improvement you'd consider
> > that (hopefully at least slightly better, in that the arguments passed
> > at the call site are more self-explanatory); if that's still
> > unsatisfactory I'd welcome any suggested alternatives.
> 
> Can't we add a flag bit in struct device to reflect manual binding?
> bind will set it and unbind clears it.

The driver core does not "know" the difference betwen manual and not
manual binding, nor should it.  That's up to the bus.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] of: base: add function to check for status = "reserved"
  2021-10-22  6:43   ` Greg Kroah-Hartman
@ 2021-10-22  7:38     ` Zev Weiss
  2021-10-22  7:45       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  7:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: devicetree, Andrew Jeffery, openbmc, linux-kernel, Rob Herring,
	Jeremy Kerr, Frank Rowand

On Thu, Oct 21, 2021 at 11:43:23PM PDT, Greg Kroah-Hartman wrote:
>On Thu, Oct 21, 2021 at 07:00:28PM -0700, Zev Weiss wrote:
>> Per v0.3 of the Devicetree Specification [0]:
>>
>>   Indicates that the device is operational, but should not be used.
>>   Typically this is used for devices that are controlled by another
>>   software component, such as platform firmware.
>>
>> One use-case for this is in OpenBMC, where certain devices (such as a
>> BIOS flash chip) may be shared by the host and the BMC, but cannot be
>> accessed by the BMC during its usual boot-time device probing, because
>> they require additional (potentially elaborate) coordination with the
>> host to arbitrate which processor is controlling the device.
>>
>> Devices marked with this status should thus be instantiated, but not
>> have a driver bound to them or be otherwise touched.
>>
>> [0] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf
>>
>> Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
>> ---
>>  drivers/of/base.c  | 56 +++++++++++++++++++++++++++++++++++++++-------
>>  include/linux/of.h |  6 +++++
>>  2 files changed, 54 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/of/base.c b/drivers/of/base.c
>> index 0ac17256258d..3bd7c5b8a2cc 100644
>> --- a/drivers/of/base.c
>> +++ b/drivers/of/base.c
>> @@ -580,14 +580,16 @@ int of_machine_is_compatible(const char *compat)
>>  EXPORT_SYMBOL(of_machine_is_compatible);
>>
>>  /**
>> - *  __of_device_is_available - check if a device is available for use
>> + * __of_device_check_status - check if a device's status matches a particular string
>>   *
>> - *  @device: Node to check for availability, with locks already held
>> + * @device: Node to check status of, with locks already held
>> + * @val: Status string to check for, or NULL for "okay"/"ok"
>>   *
>> - *  Return: True if the status property is absent or set to "okay" or "ok",
>> - *  false otherwise
>> + * Return: True if status property exists and matches @val, or either "okay"
>> + * or "ok" if @val is NULL, or if status property is absent and @val is
>> + * "okay", "ok", or NULL.  False otherwise.
>>   */
>> -static bool __of_device_is_available(const struct device_node *device)
>> +static bool __of_device_check_status(const struct device_node *device, const char *val)
>>  {
>>  	const char *status;
>>  	int statlen;
>> @@ -596,17 +598,35 @@ static bool __of_device_is_available(const struct device_node *device)
>>  		return false;
>>
>>  	status = __of_get_property(device, "status", &statlen);
>> -	if (status == NULL)
>> -		return true;
>> +	if (!status) {
>> +		/* a missing status property is treated as "okay" */
>> +		status = "okay";
>> +		statlen = strlen(status) + 1; /* property lengths include the NUL terminator */
>> +	}
>>
>>  	if (statlen > 0) {
>> -		if (!strcmp(status, "okay") || !strcmp(status, "ok"))
>> +		if (!val && (!strcmp(status, "okay") || !strcmp(status, "ok")))
>> +			return true;
>> +		else if (val && !strcmp(status, val))
>
>
>Ick, where is this string coming from?  The kernel or userspace or a
>device tree?  This feels very wrong, why is the kernel doing parsing
>like this of different options that all mean the same thing?
>

Which string do you mean by "this string"?  'status' comes from a 
property of the device tree node; 'val' will be one of a small set of 
string constants passed by the caller.  Accepting either spelling of 
"okay"/"ok" has been in place since 2008 (commit 834d97d45220, 
"[POWERPC] Add of_device_is_available function"); using NULL as a 
shorthand for those two strings was a suggestion that came up in review 
feedback on a previous incarnation of these patches 
(https://lore.kernel.org/openbmc/CAL_Jsq+rKGv39zHTxNx0A7=X4K48nXRLqWonecG5SobdJq3yKw@mail.gmail.com/T/#u).

>
>>  			return true;
>>  	}
>>
>>  	return false;
>>  }
>>
>> +/**
>> + * __of_device_is_available - check if a device is available for use
>> + *
>> + * @device: Node to check for availability, with locks already held
>> + *
>> + * Return: True if the status property is absent or set to "okay" or "ok",
>> + * false otherwise
>> + */
>> +static bool __of_device_is_available(const struct device_node *device)
>> +{
>> +	return __of_device_check_status(device, NULL);
>> +}
>> +
>>  /**
>>   *  of_device_is_available - check if a device is available for use
>>   *
>> @@ -628,6 +648,26 @@ bool of_device_is_available(const struct device_node *device)
>>  }
>>  EXPORT_SYMBOL(of_device_is_available);
>>
>> +/**
>> + * of_device_is_reserved - check if a device is marked as reserved
>> + *
>> + * @device: Node to check for reservation
>> + *
>> + * Return: True if the status property is set to "reserved", false otherwise
>> + */
>> +bool of_device_is_reserved(const struct device_node *device)
>> +{
>> +	unsigned long flags;
>> +	bool res;
>> +
>> +	raw_spin_lock_irqsave(&devtree_lock, flags);
>> +	res = __of_device_check_status(device, "reserved");
>> +	raw_spin_unlock_irqrestore(&devtree_lock, flags);
>
>Why is this a "raw" spinlock?
>

devtree_lock being a raw_spinlock_t appears to date from commit 
d6d3c4e65651 ("OF: convert devtree lock from rw_lock to raw spinlock"); 
"required for preempt-rt", according to Thomas Gleixner's commit 
message.

>Where is this status coming from?
>

This would be specified in a DT node, e.g. via something like:

   &somedev {
     compatible = "foobar";
     status = "reserved";
     /* ... */
   };

>> +
>> +	return res;
>> +}
>> +EXPORT_SYMBOL(of_device_is_reserved);
>
>EXPORT_SYMBOL_GPL()?
>

Its closest existing sibling, of_device_is_available(), is a plain 
EXPORT_SYMBOL(); if we want to convert things more broadly that'd be 
fine with me, but having one be GPL-only and the other not seems like 
it'd be a bit confusing and inconsistent?


Thanks,
Zev


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 1/5] of: base: add function to check for status = "reserved"
  2021-10-22  7:38     ` Zev Weiss
@ 2021-10-22  7:45       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-22  7:45 UTC (permalink / raw)
  To: Zev Weiss
  Cc: devicetree, Andrew Jeffery, openbmc, linux-kernel, Rob Herring,
	Jeremy Kerr, Frank Rowand

On Fri, Oct 22, 2021 at 12:38:40AM -0700, Zev Weiss wrote:
> On Thu, Oct 21, 2021 at 11:43:23PM PDT, Greg Kroah-Hartman wrote:
> > On Thu, Oct 21, 2021 at 07:00:28PM -0700, Zev Weiss wrote:
> > > Per v0.3 of the Devicetree Specification [0]:
> > > 
> > >   Indicates that the device is operational, but should not be used.
> > >   Typically this is used for devices that are controlled by another
> > >   software component, such as platform firmware.
> > > 
> > > One use-case for this is in OpenBMC, where certain devices (such as a
> > > BIOS flash chip) may be shared by the host and the BMC, but cannot be
> > > accessed by the BMC during its usual boot-time device probing, because
> > > they require additional (potentially elaborate) coordination with the
> > > host to arbitrate which processor is controlling the device.
> > > 
> > > Devices marked with this status should thus be instantiated, but not
> > > have a driver bound to them or be otherwise touched.
> > > 
> > > [0] https://github.com/devicetree-org/devicetree-specification/releases/download/v0.3/devicetree-specification-v0.3.pdf
> > > 
> > > Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
> > > ---
> > >  drivers/of/base.c  | 56 +++++++++++++++++++++++++++++++++++++++-------
> > >  include/linux/of.h |  6 +++++
> > >  2 files changed, 54 insertions(+), 8 deletions(-)
> > > 
> > > diff --git a/drivers/of/base.c b/drivers/of/base.c
> > > index 0ac17256258d..3bd7c5b8a2cc 100644
> > > --- a/drivers/of/base.c
> > > +++ b/drivers/of/base.c
> > > @@ -580,14 +580,16 @@ int of_machine_is_compatible(const char *compat)
> > >  EXPORT_SYMBOL(of_machine_is_compatible);
> > > 
> > >  /**
> > > - *  __of_device_is_available - check if a device is available for use
> > > + * __of_device_check_status - check if a device's status matches a particular string
> > >   *
> > > - *  @device: Node to check for availability, with locks already held
> > > + * @device: Node to check status of, with locks already held
> > > + * @val: Status string to check for, or NULL for "okay"/"ok"
> > >   *
> > > - *  Return: True if the status property is absent or set to "okay" or "ok",
> > > - *  false otherwise
> > > + * Return: True if status property exists and matches @val, or either "okay"
> > > + * or "ok" if @val is NULL, or if status property is absent and @val is
> > > + * "okay", "ok", or NULL.  False otherwise.
> > >   */
> > > -static bool __of_device_is_available(const struct device_node *device)
> > > +static bool __of_device_check_status(const struct device_node *device, const char *val)
> > >  {
> > >  	const char *status;
> > >  	int statlen;
> > > @@ -596,17 +598,35 @@ static bool __of_device_is_available(const struct device_node *device)
> > >  		return false;
> > > 
> > >  	status = __of_get_property(device, "status", &statlen);
> > > -	if (status == NULL)
> > > -		return true;
> > > +	if (!status) {
> > > +		/* a missing status property is treated as "okay" */
> > > +		status = "okay";
> > > +		statlen = strlen(status) + 1; /* property lengths include the NUL terminator */
> > > +	}
> > > 
> > >  	if (statlen > 0) {
> > > -		if (!strcmp(status, "okay") || !strcmp(status, "ok"))
> > > +		if (!val && (!strcmp(status, "okay") || !strcmp(status, "ok")))
> > > +			return true;
> > > +		else if (val && !strcmp(status, val))
> > 
> > 
> > Ick, where is this string coming from?  The kernel or userspace or a
> > device tree?  This feels very wrong, why is the kernel doing parsing
> > like this of different options that all mean the same thing?
> > 
> 
> Which string do you mean by "this string"?  'status' comes from a property
> of the device tree node; 'val' will be one of a small set of string
> constants passed by the caller.  Accepting either spelling of "okay"/"ok"
> has been in place since 2008 (commit 834d97d45220, "[POWERPC] Add
> of_device_is_available function"); using NULL as a shorthand for those two
> strings was a suggestion that came up in review feedback on a previous
> incarnation of these patches (https://lore.kernel.org/openbmc/CAL_Jsq+rKGv39zHTxNx0A7=X4K48nXRLqWonecG5SobdJq3yKw@mail.gmail.com/T/#u).

I was referring to "okay".  And if this really is a "we take either"
type of thing, shouldn't there be a single function to call for this
type of test, much like we have some of the sysfs helpers?

And what about using match_string() as well?

> > >  			return true;
> > >  	}
> > > 
> > >  	return false;
> > >  }
> > > 
> > > +/**
> > > + * __of_device_is_available - check if a device is available for use
> > > + *
> > > + * @device: Node to check for availability, with locks already held
> > > + *
> > > + * Return: True if the status property is absent or set to "okay" or "ok",
> > > + * false otherwise
> > > + */
> > > +static bool __of_device_is_available(const struct device_node *device)
> > > +{
> > > +	return __of_device_check_status(device, NULL);
> > > +}
> > > +
> > >  /**
> > >   *  of_device_is_available - check if a device is available for use
> > >   *
> > > @@ -628,6 +648,26 @@ bool of_device_is_available(const struct device_node *device)
> > >  }
> > >  EXPORT_SYMBOL(of_device_is_available);
> > > 
> > > +/**
> > > + * of_device_is_reserved - check if a device is marked as reserved
> > > + *
> > > + * @device: Node to check for reservation
> > > + *
> > > + * Return: True if the status property is set to "reserved", false otherwise
> > > + */
> > > +bool of_device_is_reserved(const struct device_node *device)
> > > +{
> > > +	unsigned long flags;
> > > +	bool res;
> > > +
> > > +	raw_spin_lock_irqsave(&devtree_lock, flags);
> > > +	res = __of_device_check_status(device, "reserved");
> > > +	raw_spin_unlock_irqrestore(&devtree_lock, flags);
> > 
> > Why is this a "raw" spinlock?
> > 
> 
> devtree_lock being a raw_spinlock_t appears to date from commit d6d3c4e65651
> ("OF: convert devtree lock from rw_lock to raw spinlock"); "required for
> preempt-rt", according to Thomas Gleixner's commit message.
> 
> > Where is this status coming from?
> > 
> 
> This would be specified in a DT node, e.g. via something like:
> 
>   &somedev {
>     compatible = "foobar";
>     status = "reserved";
>     /* ... */
>   };
> 
> > > +
> > > +	return res;
> > > +}
> > > +EXPORT_SYMBOL(of_device_is_reserved);
> > 
> > EXPORT_SYMBOL_GPL()?
> > 
> 
> Its closest existing sibling, of_device_is_available(), is a plain
> EXPORT_SYMBOL(); if we want to convert things more broadly that'd be fine
> with me, but having one be GPL-only and the other not seems like it'd be a
> bit confusing and inconsistent?

Ah, ok, you are following the rest of this file for this, and the
locking stuff, sorry, I was not familiar with it.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-22  6:46   ` Greg Kroah-Hartman
@ 2021-10-22  8:32     ` Zev Weiss
  2021-10-22  8:57       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  8:32 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: kvm, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr, Rajat Jain,
	Frank Rowand, Jianxiong Gao, Dave Jiang, Saravana Kannan,
	Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
>On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
>> Devices whose fwnodes are marked as reserved are instantiated, but
>> will not have a driver bound to them unless userspace explicitly
>> requests it by writing to a 'bind' sysfs file.  This is to enable
>> devices that may require special (userspace-mediated) preparation
>> before a driver can safely probe them.
>>
>> Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
>> ---
>>  drivers/base/bus.c            |  2 +-
>>  drivers/base/dd.c             | 13 ++++++++-----
>>  drivers/dma/idxd/compat.c     |  3 +--
>>  drivers/vfio/mdev/mdev_core.c |  2 +-
>>  include/linux/device.h        | 14 +++++++++++++-
>>  5 files changed, 24 insertions(+), 10 deletions(-)
>
>Ugh, no, I don't really want to add yet-another-state to the driver core
>like this.  Why are these devices even in the kernel with a driver that
>wants to bind to them registered if the driver somehow should NOT be
>bound to it?  Shouldn't all of that logic be in the crazy driver itself
>as that is a very rare and odd thing to do that the driver core should
>not care about at all.
>
>And why does a device need userspace interaction at all?  Again, why
>would the driver not know about this and handle it all directly?
>

Let me expand a bit more on the details of the specific situation I'm 
dealing with...

On a server motherboard we've got a host CPU (Xeon, Epyc, POWER, etc.) 
and a baseboard management controller, or BMC (typically an ARM SoC, an 
ASPEED AST2500 in my case).  The host CPU's firmware (BIOS/UEFI, ME 
firmware, etc.) lives in a SPI flash chip.  Because it's the host's 
firmware, that flash chip is connected to and generally (by default) 
under the control of the host CPU.  

But we also want the BMC to be able to perform out-of-band updates to 
the host's firmware, so the flash is *also* connected to the BMC.  
There's an external mux (controlled by a GPIO output driven by the BMC) 
that switches which processor (host or BMC) is actually driving the SPI 
signals to the flash chip, but there's a bunch of other stuff that's 
also required before the BMC can flip that switch and take control of 
the SPI interface:

  - the BMC needs to track (and potentially alter) the host's power state 
    to ensure it's not running (in OpenBMC the existing logic for this is 
    an entire non-trivial userspace daemon unto itself)

  - it needs to twiddle some other GPIOs to put the ME into recovery mode

  - it needs to exchange some IPMI messages with the ME to confirm it got 
    into recovery mode

(Some of the details here are specific to the particular motherboard I'm 
working with, but I'd guess other systems probably have broadly similar 
requirements.)

The firmware flash (or at least the BMC's side of the mux in front of 
it) is attached to a spi-nor controller that's well supported by an 
existing MTD driver (aspeed-smc), but that driver can't safely probe the 
chip until all the stuff described above has been done.  In particular, 
this means we can't reasonably bind the driver to that device during the 
normal device-discovery/driver-binding done in the BMC's boot process 
(nor do we want to, as that would pull the rug out from under the 
running host).  We basically only ever want to touch that SPI interface 
when a user (sysadmin using the BMC, let's say) has explicitly initiated 
an out-of-band firmware update.

So we want the kernel to be aware of the device's existence (so that we 
*can* bind a driver to it when needed), but we don't want it touching 
the device unless we really ask for it.

Does that help clarify the motivation for wanting this functionality?


Thanks,
Zev


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-22  8:32     ` Zev Weiss
@ 2021-10-22  8:57       ` Greg Kroah-Hartman
  2021-10-22 15:18         ` Patrick Williams
  2021-10-22 16:27         ` Zev Weiss
  0 siblings, 2 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-22  8:57 UTC (permalink / raw)
  To: Zev Weiss
  Cc: kvm, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr, Rajat Jain,
	Frank Rowand, Jianxiong Gao, Dave Jiang, Saravana Kannan,
	Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> > > Devices whose fwnodes are marked as reserved are instantiated, but
> > > will not have a driver bound to them unless userspace explicitly
> > > requests it by writing to a 'bind' sysfs file.  This is to enable
> > > devices that may require special (userspace-mediated) preparation
> > > before a driver can safely probe them.
> > > 
> > > Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
> > > ---
> > >  drivers/base/bus.c            |  2 +-
> > >  drivers/base/dd.c             | 13 ++++++++-----
> > >  drivers/dma/idxd/compat.c     |  3 +--
> > >  drivers/vfio/mdev/mdev_core.c |  2 +-
> > >  include/linux/device.h        | 14 +++++++++++++-
> > >  5 files changed, 24 insertions(+), 10 deletions(-)
> > 
> > Ugh, no, I don't really want to add yet-another-state to the driver core
> > like this.  Why are these devices even in the kernel with a driver that
> > wants to bind to them registered if the driver somehow should NOT be
> > bound to it?  Shouldn't all of that logic be in the crazy driver itself
> > as that is a very rare and odd thing to do that the driver core should
> > not care about at all.
> > 
> > And why does a device need userspace interaction at all?  Again, why
> > would the driver not know about this and handle it all directly?
> > 
> 
> Let me expand a bit more on the details of the specific situation I'm
> dealing with...
> 
> On a server motherboard we've got a host CPU (Xeon, Epyc, POWER, etc.) and a
> baseboard management controller, or BMC (typically an ARM SoC, an ASPEED
> AST2500 in my case).  The host CPU's firmware (BIOS/UEFI, ME firmware, etc.)
> lives in a SPI flash chip.  Because it's the host's firmware, that flash
> chip is connected to and generally (by default) under the control of the
> host CPU.
> 
> But we also want the BMC to be able to perform out-of-band updates to the
> host's firmware, so the flash is *also* connected to the BMC.  There's an
> external mux (controlled by a GPIO output driven by the BMC) that switches
> which processor (host or BMC) is actually driving the SPI signals to the
> flash chip, but there's a bunch of other stuff that's also required before
> the BMC can flip that switch and take control of the SPI interface:
> 
>  - the BMC needs to track (and potentially alter) the host's power state
> to ensure it's not running (in OpenBMC the existing logic for this is    an
> entire non-trivial userspace daemon unto itself)
> 
>  - it needs to twiddle some other GPIOs to put the ME into recovery mode
> 
>  - it needs to exchange some IPMI messages with the ME to confirm it got
> into recovery mode
> 
> (Some of the details here are specific to the particular motherboard I'm
> working with, but I'd guess other systems probably have broadly similar
> requirements.)
> 
> The firmware flash (or at least the BMC's side of the mux in front of it) is
> attached to a spi-nor controller that's well supported by an existing MTD
> driver (aspeed-smc), but that driver can't safely probe the chip until all
> the stuff described above has been done.  In particular, this means we can't
> reasonably bind the driver to that device during the normal
> device-discovery/driver-binding done in the BMC's boot process (nor do we
> want to, as that would pull the rug out from under the running host).  We
> basically only ever want to touch that SPI interface when a user (sysadmin
> using the BMC, let's say) has explicitly initiated an out-of-band firmware
> update.
> 
> So we want the kernel to be aware of the device's existence (so that we
> *can* bind a driver to it when needed), but we don't want it touching the
> device unless we really ask for it.
> 
> Does that help clarify the motivation for wanting this functionality?

Sure, then just do this type of thing in the driver itself.  Do not have
any matching "ids" for this hardware it so that the bus will never call
the probe function for this hardware _until_ a manual write happens to
the driver's "bind" sysfs file.

Then when userspace is done, do a "unbind" write.

No driver core changes should be needed at all here.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/5] driver core, of: support for reserved devices
  2021-10-22  6:50 ` Greg Kroah-Hartman
@ 2021-10-22  9:00   ` Zev Weiss
  2021-10-22  9:22     ` Greg Kroah-Hartman
  2021-10-25  5:53     ` Frank Rowand
  0 siblings, 2 replies; 34+ messages in thread
From: Zev Weiss @ 2021-10-22  9:00 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: devicetree, Andrew Jeffery, openbmc, linux-kernel, Rob Herring,
	Jeremy Kerr, Frank Rowand

On Thu, Oct 21, 2021 at 11:50:07PM PDT, Greg Kroah-Hartman wrote:
>On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
>> Hello all,
>>
>> This series is another incarnation of a couple other patchsets I've
>> posted recently [0, 1], but again different enough in overall
>> structure that I'm not sure it's exactly a v2 (or v3).
>>
>> As compared to [1], it abandons the writable binary sysfs files and at
>> Frank's suggestion returns to an approach more akin to [0], though
>> without any driver-specific (aspeed-smc) changes, which I figure might
>> as well be done later in a separate series once appropriate
>> infrastructure is in place.
>>
>> The basic idea is to implement support for a status property value
>> that's documented in the DT spec [2], but thus far not used at all in
>> the kernel (or anywhere else I'm aware of): "reserved".  According to
>> the spec (section 2.3.4, Table 2.4), this status:
>>
>>   Indicates that the device is operational, but should not be used.
>>   Typically this is used for devices that are controlled by another
>>   software component, such as platform firmware.
>>
>> With these changes, devices marked as reserved are (at least in some
>> cases, more on this later) instantiated, but will not have drivers
>> bound to them unless and until userspace explicitly requests it by
>> writing the device's name to the driver's sysfs 'bind' file.  This
>> enables appropriate handling of hardware arrangements that can arise
>> in contexts like OpenBMC, where a device may be shared with another
>> external controller not under the kernel's control (for example, the
>> flash chip storing the host CPU's firmware, shared by the BMC and the
>> host CPU and exclusively under the control of the latter by default).
>> Such a device can be marked as reserved so that the kernel refrains
>> from touching it until appropriate preparatory steps have been taken
>> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
>> processor has control of the firmware flash).
>>
>> Patches 1-3 provide some basic plumbing for checking the "reserved"
>> status of a device, patch 4 is the main driver-core change, and patch
>> 5 tweaks the OF platform code to not skip reserved devices so that
>> they can actually be instantiated.
>
>Again, the driver core should not care about this, that is up to the bus
>that wants to read these "reserved" values and do something with them or
>not (remember the bus is the thing that does the binding, not the driver
>core).
>
>But are you sure you are using the "reserved" field properly?

Well, thus far both Rob Herring and Oliver O'Halloran (originator of the 
"reserved" status in the DT spec, whom I probably should have CCed 
earlier, sorry) have seemed receptive to this interpretation of it, 
which I'd hope would lend it some credence.

>You are
>wanting to do "something" to the device to later on be able to then have
>the kernel touch the device, while it seems that the reason for this
>field is for the kernel to NEVER touch the device at all.  What will
>break if you change this logic?

Given that there's no existing usage of or support for this status value 
anywhere I can see in the kernel, and that Oliver has indicated that it 
should be compatible with usage in OpenPower platform firmware, my 
expectation would certainly be that nothing would break, but if there 
are examples of things that could I'd be interested to see them.


Thanks,
Zev


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/5] driver core, of: support for reserved devices
  2021-10-22  9:00   ` Zev Weiss
@ 2021-10-22  9:22     ` Greg Kroah-Hartman
  2021-10-25  5:53     ` Frank Rowand
  1 sibling, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-22  9:22 UTC (permalink / raw)
  To: Zev Weiss
  Cc: devicetree, Andrew Jeffery, openbmc, linux-kernel, Rob Herring,
	Jeremy Kerr, Frank Rowand

On Fri, Oct 22, 2021 at 02:00:57AM -0700, Zev Weiss wrote:
> On Thu, Oct 21, 2021 at 11:50:07PM PDT, Greg Kroah-Hartman wrote:
> > On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
> > > Hello all,
> > > 
> > > This series is another incarnation of a couple other patchsets I've
> > > posted recently [0, 1], but again different enough in overall
> > > structure that I'm not sure it's exactly a v2 (or v3).
> > > 
> > > As compared to [1], it abandons the writable binary sysfs files and at
> > > Frank's suggestion returns to an approach more akin to [0], though
> > > without any driver-specific (aspeed-smc) changes, which I figure might
> > > as well be done later in a separate series once appropriate
> > > infrastructure is in place.
> > > 
> > > The basic idea is to implement support for a status property value
> > > that's documented in the DT spec [2], but thus far not used at all in
> > > the kernel (or anywhere else I'm aware of): "reserved".  According to
> > > the spec (section 2.3.4, Table 2.4), this status:
> > > 
> > >   Indicates that the device is operational, but should not be used.
> > >   Typically this is used for devices that are controlled by another
> > >   software component, such as platform firmware.
> > > 
> > > With these changes, devices marked as reserved are (at least in some
> > > cases, more on this later) instantiated, but will not have drivers
> > > bound to them unless and until userspace explicitly requests it by
> > > writing the device's name to the driver's sysfs 'bind' file.  This
> > > enables appropriate handling of hardware arrangements that can arise
> > > in contexts like OpenBMC, where a device may be shared with another
> > > external controller not under the kernel's control (for example, the
> > > flash chip storing the host CPU's firmware, shared by the BMC and the
> > > host CPU and exclusively under the control of the latter by default).
> > > Such a device can be marked as reserved so that the kernel refrains
> > > from touching it until appropriate preparatory steps have been taken
> > > (e.g. BMC userspace coordinating with the host CPU to arbitrate which
> > > processor has control of the firmware flash).
> > > 
> > > Patches 1-3 provide some basic plumbing for checking the "reserved"
> > > status of a device, patch 4 is the main driver-core change, and patch
> > > 5 tweaks the OF platform code to not skip reserved devices so that
> > > they can actually be instantiated.
> > 
> > Again, the driver core should not care about this, that is up to the bus
> > that wants to read these "reserved" values and do something with them or
> > not (remember the bus is the thing that does the binding, not the driver
> > core).
> > 
> > But are you sure you are using the "reserved" field properly?
> 
> Well, thus far both Rob Herring and Oliver O'Halloran (originator of the
> "reserved" status in the DT spec, whom I probably should have CCed earlier,
> sorry) have seemed receptive to this interpretation of it, which I'd hope
> would lend it some credence.

Ok, that's up to the DT people, I'll let you all fight it out with the
platform creators :)

Good luck!

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-22  8:57       ` Greg Kroah-Hartman
@ 2021-10-22 15:18         ` Patrick Williams
  2021-10-23  8:56           ` Greg Kroah-Hartman
  2021-10-22 16:27         ` Zev Weiss
  1 sibling, 1 reply; 34+ messages in thread
From: Patrick Williams @ 2021-10-22 15:18 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

[-- Attachment #1: Type: text/plain, Size: 1956 bytes --]

Hi Greg,

On Fri, Oct 22, 2021 at 10:57:21AM +0200, Greg Kroah-Hartman wrote:
> On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> > On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> > > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:

> > So we want the kernel to be aware of the device's existence (so that we
> > *can* bind a driver to it when needed), but we don't want it touching the
> > device unless we really ask for it.
> > 
> > Does that help clarify the motivation for wanting this functionality?
> 
> Sure, then just do this type of thing in the driver itself.  Do not have
> any matching "ids" for this hardware it so that the bus will never call
> the probe function for this hardware _until_ a manual write happens to
> the driver's "bind" sysfs file.

It sounds like you're suggesting a change to one particular driver to satisfy
this one particular case (and maybe I'm just not understanding your suggestion).
For a BMC, this is a pretty regular situation and not just as one-off as Zev's
example.

Another good example is where a system can have optional riser cards with a
whole tree of devices that might be on that riser card (and there might be
different variants of a riser card that could go in the same slot).  Usually
there is an EEPROM of some sort at a well-known address that can be parsed to
identify which kind of riser card it is and then the appropriate sub-devices can
be enumerated.  That EEPROM parsing is something that is currently done in
userspace due to the complexity and often vendor-specific nature of it.

Many of these devices require quite a bit more configuration information than
can be passed along a `bind` call.  I believe it has been suggested previously
that this riser-card scenario could also be solved with dynamic loading of DT
snippets, but that support seems simple pretty far from being merged.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-22  8:57       ` Greg Kroah-Hartman
  2021-10-22 15:18         ` Patrick Williams
@ 2021-10-22 16:27         ` Zev Weiss
  2021-10-23  8:55           ` Greg Kroah-Hartman
  1 sibling, 1 reply; 34+ messages in thread
From: Zev Weiss @ 2021-10-22 16:27 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: kvm, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr, Rajat Jain,
	Frank Rowand, Jianxiong Gao, Dave Jiang, Saravana Kannan,
	Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Fri, Oct 22, 2021 at 01:57:21AM PDT, Greg Kroah-Hartman wrote:
>On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
>> On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
>> > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
>> > > Devices whose fwnodes are marked as reserved are instantiated, but
>> > > will not have a driver bound to them unless userspace explicitly
>> > > requests it by writing to a 'bind' sysfs file.  This is to enable
>> > > devices that may require special (userspace-mediated) preparation
>> > > before a driver can safely probe them.
>> > >
>> > > Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
>> > > ---
>> > >  drivers/base/bus.c            |  2 +-
>> > >  drivers/base/dd.c             | 13 ++++++++-----
>> > >  drivers/dma/idxd/compat.c     |  3 +--
>> > >  drivers/vfio/mdev/mdev_core.c |  2 +-
>> > >  include/linux/device.h        | 14 +++++++++++++-
>> > >  5 files changed, 24 insertions(+), 10 deletions(-)
>> >
>> > Ugh, no, I don't really want to add yet-another-state to the driver core
>> > like this.  Why are these devices even in the kernel with a driver that
>> > wants to bind to them registered if the driver somehow should NOT be
>> > bound to it?  Shouldn't all of that logic be in the crazy driver itself
>> > as that is a very rare and odd thing to do that the driver core should
>> > not care about at all.
>> >
>> > And why does a device need userspace interaction at all?  Again, why
>> > would the driver not know about this and handle it all directly?
>> >
>>
>> Let me expand a bit more on the details of the specific situation I'm
>> dealing with...
>>
>> On a server motherboard we've got a host CPU (Xeon, Epyc, POWER, etc.) and a
>> baseboard management controller, or BMC (typically an ARM SoC, an ASPEED
>> AST2500 in my case).  The host CPU's firmware (BIOS/UEFI, ME firmware, etc.)
>> lives in a SPI flash chip.  Because it's the host's firmware, that flash
>> chip is connected to and generally (by default) under the control of the
>> host CPU.
>>
>> But we also want the BMC to be able to perform out-of-band updates to the
>> host's firmware, so the flash is *also* connected to the BMC.  There's an
>> external mux (controlled by a GPIO output driven by the BMC) that switches
>> which processor (host or BMC) is actually driving the SPI signals to the
>> flash chip, but there's a bunch of other stuff that's also required before
>> the BMC can flip that switch and take control of the SPI interface:
>>
>>  - the BMC needs to track (and potentially alter) the host's power state
>> to ensure it's not running (in OpenBMC the existing logic for this is    an
>> entire non-trivial userspace daemon unto itself)
>>
>>  - it needs to twiddle some other GPIOs to put the ME into recovery mode
>>
>>  - it needs to exchange some IPMI messages with the ME to confirm it got
>> into recovery mode
>>
>> (Some of the details here are specific to the particular motherboard I'm
>> working with, but I'd guess other systems probably have broadly similar
>> requirements.)
>>
>> The firmware flash (or at least the BMC's side of the mux in front of it) is
>> attached to a spi-nor controller that's well supported by an existing MTD
>> driver (aspeed-smc), but that driver can't safely probe the chip until all
>> the stuff described above has been done.  In particular, this means we can't
>> reasonably bind the driver to that device during the normal
>> device-discovery/driver-binding done in the BMC's boot process (nor do we
>> want to, as that would pull the rug out from under the running host).  We
>> basically only ever want to touch that SPI interface when a user (sysadmin
>> using the BMC, let's say) has explicitly initiated an out-of-band firmware
>> update.
>>
>> So we want the kernel to be aware of the device's existence (so that we
>> *can* bind a driver to it when needed), but we don't want it touching the
>> device unless we really ask for it.
>>
>> Does that help clarify the motivation for wanting this functionality?
>
>Sure, then just do this type of thing in the driver itself.  Do not have
>any matching "ids" for this hardware it so that the bus will never call
>the probe function for this hardware _until_ a manual write happens to
>the driver's "bind" sysfs file.
>

Perhaps I'm misunderstanding what you're suggesting, but if I just 
change the DT "compatible" string so that the device doesn't match the 
driver and then try to manually bind it, the driver_match_device() check 
in bind_store() prevents that manual bind from actually happening.


Thanks,
Zev


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-22 16:27         ` Zev Weiss
@ 2021-10-23  8:55           ` Greg Kroah-Hartman
  0 siblings, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-23  8:55 UTC (permalink / raw)
  To: Zev Weiss
  Cc: kvm, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr, Rajat Jain,
	Frank Rowand, Jianxiong Gao, Dave Jiang, Saravana Kannan,
	Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Fri, Oct 22, 2021 at 09:27:41AM -0700, Zev Weiss wrote:
> On Fri, Oct 22, 2021 at 01:57:21AM PDT, Greg Kroah-Hartman wrote:
> > On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> > > On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> > > > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> > > > > Devices whose fwnodes are marked as reserved are instantiated, but
> > > > > will not have a driver bound to them unless userspace explicitly
> > > > > requests it by writing to a 'bind' sysfs file.  This is to enable
> > > > > devices that may require special (userspace-mediated) preparation
> > > > > before a driver can safely probe them.
> > > > >
> > > > > Signed-off-by: Zev Weiss <zev@bewilderbeest.net>
> > > > > ---
> > > > >  drivers/base/bus.c            |  2 +-
> > > > >  drivers/base/dd.c             | 13 ++++++++-----
> > > > >  drivers/dma/idxd/compat.c     |  3 +--
> > > > >  drivers/vfio/mdev/mdev_core.c |  2 +-
> > > > >  include/linux/device.h        | 14 +++++++++++++-
> > > > >  5 files changed, 24 insertions(+), 10 deletions(-)
> > > >
> > > > Ugh, no, I don't really want to add yet-another-state to the driver core
> > > > like this.  Why are these devices even in the kernel with a driver that
> > > > wants to bind to them registered if the driver somehow should NOT be
> > > > bound to it?  Shouldn't all of that logic be in the crazy driver itself
> > > > as that is a very rare and odd thing to do that the driver core should
> > > > not care about at all.
> > > >
> > > > And why does a device need userspace interaction at all?  Again, why
> > > > would the driver not know about this and handle it all directly?
> > > >
> > > 
> > > Let me expand a bit more on the details of the specific situation I'm
> > > dealing with...
> > > 
> > > On a server motherboard we've got a host CPU (Xeon, Epyc, POWER, etc.) and a
> > > baseboard management controller, or BMC (typically an ARM SoC, an ASPEED
> > > AST2500 in my case).  The host CPU's firmware (BIOS/UEFI, ME firmware, etc.)
> > > lives in a SPI flash chip.  Because it's the host's firmware, that flash
> > > chip is connected to and generally (by default) under the control of the
> > > host CPU.
> > > 
> > > But we also want the BMC to be able to perform out-of-band updates to the
> > > host's firmware, so the flash is *also* connected to the BMC.  There's an
> > > external mux (controlled by a GPIO output driven by the BMC) that switches
> > > which processor (host or BMC) is actually driving the SPI signals to the
> > > flash chip, but there's a bunch of other stuff that's also required before
> > > the BMC can flip that switch and take control of the SPI interface:
> > > 
> > >  - the BMC needs to track (and potentially alter) the host's power state
> > > to ensure it's not running (in OpenBMC the existing logic for this is    an
> > > entire non-trivial userspace daemon unto itself)
> > > 
> > >  - it needs to twiddle some other GPIOs to put the ME into recovery mode
> > > 
> > >  - it needs to exchange some IPMI messages with the ME to confirm it got
> > > into recovery mode
> > > 
> > > (Some of the details here are specific to the particular motherboard I'm
> > > working with, but I'd guess other systems probably have broadly similar
> > > requirements.)
> > > 
> > > The firmware flash (or at least the BMC's side of the mux in front of it) is
> > > attached to a spi-nor controller that's well supported by an existing MTD
> > > driver (aspeed-smc), but that driver can't safely probe the chip until all
> > > the stuff described above has been done.  In particular, this means we can't
> > > reasonably bind the driver to that device during the normal
> > > device-discovery/driver-binding done in the BMC's boot process (nor do we
> > > want to, as that would pull the rug out from under the running host).  We
> > > basically only ever want to touch that SPI interface when a user (sysadmin
> > > using the BMC, let's say) has explicitly initiated an out-of-band firmware
> > > update.
> > > 
> > > So we want the kernel to be aware of the device's existence (so that we
> > > *can* bind a driver to it when needed), but we don't want it touching the
> > > device unless we really ask for it.
> > > 
> > > Does that help clarify the motivation for wanting this functionality?
> > 
> > Sure, then just do this type of thing in the driver itself.  Do not have
> > any matching "ids" for this hardware it so that the bus will never call
> > the probe function for this hardware _until_ a manual write happens to
> > the driver's "bind" sysfs file.
> > 
> 
> Perhaps I'm misunderstanding what you're suggesting, but if I just change
> the DT "compatible" string so that the device doesn't match the driver and
> then try to manually bind it, the driver_match_device() check in
> bind_store() prevents that manual bind from actually happening.

Hm, I thought the bus had the ability to 'override' this somehow.  The
bus does get the callback in driver_match_device() so maybe do the logic
in there?  Somehow this works for other devices and busses, so there
must be a way it happens...

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-22 15:18         ` Patrick Williams
@ 2021-10-23  8:56           ` Greg Kroah-Hartman
  2021-10-25  5:38             ` Frank Rowand
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-23  8:56 UTC (permalink / raw)
  To: Patrick Williams
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Fri, Oct 22, 2021 at 10:18:11AM -0500, Patrick Williams wrote:
> Hi Greg,
> 
> On Fri, Oct 22, 2021 at 10:57:21AM +0200, Greg Kroah-Hartman wrote:
> > On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> > > On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> > > > On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> 
> > > So we want the kernel to be aware of the device's existence (so that we
> > > *can* bind a driver to it when needed), but we don't want it touching the
> > > device unless we really ask for it.
> > > 
> > > Does that help clarify the motivation for wanting this functionality?
> > 
> > Sure, then just do this type of thing in the driver itself.  Do not have
> > any matching "ids" for this hardware it so that the bus will never call
> > the probe function for this hardware _until_ a manual write happens to
> > the driver's "bind" sysfs file.
> 
> It sounds like you're suggesting a change to one particular driver to satisfy
> this one particular case (and maybe I'm just not understanding your suggestion).
> For a BMC, this is a pretty regular situation and not just as one-off as Zev's
> example.
> 
> Another good example is where a system can have optional riser cards with a
> whole tree of devices that might be on that riser card (and there might be
> different variants of a riser card that could go in the same slot).  Usually
> there is an EEPROM of some sort at a well-known address that can be parsed to
> identify which kind of riser card it is and then the appropriate sub-devices can
> be enumerated.  That EEPROM parsing is something that is currently done in
> userspace due to the complexity and often vendor-specific nature of it.
> 
> Many of these devices require quite a bit more configuration information than
> can be passed along a `bind` call.  I believe it has been suggested previously
> that this riser-card scenario could also be solved with dynamic loading of DT
> snippets, but that support seems simple pretty far from being merged.

Then work to get the DT code merged!  Do not try to create
yet-another-way of doing things here if DT overlays is the correct
solution here (and it seems like it is.)

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-23  8:56           ` Greg Kroah-Hartman
@ 2021-10-25  5:38             ` Frank Rowand
  2021-10-25  6:15               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Frank Rowand @ 2021-10-25  5:38 UTC (permalink / raw)
  To: Greg Kroah-Hartman, Patrick Williams
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Jianxiong Gao, Dave Jiang, kvm, Saravana Kannan,
	Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> On Fri, Oct 22, 2021 at 10:18:11AM -0500, Patrick Williams wrote:
>> Hi Greg,
>>
>> On Fri, Oct 22, 2021 at 10:57:21AM +0200, Greg Kroah-Hartman wrote:
>>> On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
>>>> On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
>>>>> On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
>>
>>>> So we want the kernel to be aware of the device's existence (so that we
>>>> *can* bind a driver to it when needed), but we don't want it touching the
>>>> device unless we really ask for it.
>>>>
>>>> Does that help clarify the motivation for wanting this functionality?
>>>
>>> Sure, then just do this type of thing in the driver itself.  Do not have
>>> any matching "ids" for this hardware it so that the bus will never call
>>> the probe function for this hardware _until_ a manual write happens to
>>> the driver's "bind" sysfs file.
>>
>> It sounds like you're suggesting a change to one particular driver to satisfy
>> this one particular case (and maybe I'm just not understanding your suggestion).
>> For a BMC, this is a pretty regular situation and not just as one-off as Zev's
>> example.
>>
>> Another good example is where a system can have optional riser cards with a
>> whole tree of devices that might be on that riser card (and there might be
>> different variants of a riser card that could go in the same slot).  Usually
>> there is an EEPROM of some sort at a well-known address that can be parsed to
>> identify which kind of riser card it is and then the appropriate sub-devices can
>> be enumerated.  That EEPROM parsing is something that is currently done in
>> userspace due to the complexity and often vendor-specific nature of it.
>>
>> Many of these devices require quite a bit more configuration information than
>> can be passed along a `bind` call.  I believe it has been suggested previously
>> that this riser-card scenario could also be solved with dynamic loading of DT
>> snippets, but that support seems simple pretty far from being merged.
> 
> Then work to get the DT code merged!  Do not try to create
> yet-another-way of doing things here if DT overlays is the correct
> solution here (and it seems like it is.)

I don't think this is a case that fits the overlay model.

We know what the description of the device is (which is what devicetree
is all about), but the device is to be shared between the Linux kernel
and some other entity, such as the firmware or another OS.  The issue
to be resolved is how to describe that the device is to be shared (in
this case exclusively by the kernel _or_ by the other entity at any
given moment), and how to move ownership of the device between the
Linux kernel and the other entity.

In the scenario presented by Zev, it is suggested that a user space
agent will be involved in deciding which entity owns the device and
to tell the Linux kernel when to take ownership of the device (and
presumably when to relinquish ownership, although we haven't seen
the implementation of relinquishing ownership yet).  One could
imagine direct communication between the driver and the other
entity to mediate ownership.  That seems like a driver specific
defined choice to me, though if there are enough different drivers
facing this situation then eventually a shared framework would
make sense.

So to step back and think architecture, it seems to me that the
devicetree needs to specify in the node describing the shared
device that the device must be (1) owned exclusively by either
the Linux kernel or some other entity, with a signalling method
between the Linux kernel and the other entity being defined
(possibly by information in the node or possibly by the definition
of the driver) or (2) actively shared by both the Linux
kernel and the other entity.  Actively shared may or may not be
possible, based on the specific hardware.  For example, if a single
contains some bits controlled by the Linux kernel and other bits
controlled by the other entity, then it can be difficult for one
of the two entities to atomically modify the register in coordination
with the other entity.  Obviously case 1 is much simpler than case 2,
I'm just trying to create a picture of the potential cases.  In a
simpler version of case 2, each entity would have control of their
own set of registers.

Diverging away from the overlay question, to comment on the
"status" property mentioned elsewhere in this thread, I do not
think that a status value of "reserved" is an adequate method
of conveying all of the information needed by the above range
of scenarios.

-Frank

> 
> thanks,
> 
> greg k-h
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/5] driver core, of: support for reserved devices
  2021-10-22  9:00   ` Zev Weiss
  2021-10-22  9:22     ` Greg Kroah-Hartman
@ 2021-10-25  5:53     ` Frank Rowand
  2021-10-25 13:57       ` Frank Rowand
  1 sibling, 1 reply; 34+ messages in thread
From: Frank Rowand @ 2021-10-25  5:53 UTC (permalink / raw)
  To: Zev Weiss, Greg Kroah-Hartman
  Cc: devicetree, Andrew Jeffery, openbmc, linux-kernel, Rob Herring,
	Jeremy Kerr

On 10/22/21 4:00 AM, Zev Weiss wrote:
> On Thu, Oct 21, 2021 at 11:50:07PM PDT, Greg Kroah-Hartman wrote:
>> On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
>>> Hello all,
>>>
>>> This series is another incarnation of a couple other patchsets I've
>>> posted recently [0, 1], but again different enough in overall
>>> structure that I'm not sure it's exactly a v2 (or v3).
>>>
>>> As compared to [1], it abandons the writable binary sysfs files and at
>>> Frank's suggestion returns to an approach more akin to [0], though
>>> without any driver-specific (aspeed-smc) changes, which I figure might
>>> as well be done later in a separate series once appropriate
>>> infrastructure is in place.
>>>
>>> The basic idea is to implement support for a status property value
>>> that's documented in the DT spec [2], but thus far not used at all in
>>> the kernel (or anywhere else I'm aware of): "reserved".  According to
>>> the spec (section 2.3.4, Table 2.4), this status:
>>>
>>>   Indicates that the device is operational, but should not be used.
>>>   Typically this is used for devices that are controlled by another
>>>   software component, such as platform firmware.
>>>
>>> With these changes, devices marked as reserved are (at least in some
>>> cases, more on this later) instantiated, but will not have drivers
>>> bound to them unless and until userspace explicitly requests it by
>>> writing the device's name to the driver's sysfs 'bind' file.  This
>>> enables appropriate handling of hardware arrangements that can arise
>>> in contexts like OpenBMC, where a device may be shared with another
>>> external controller not under the kernel's control (for example, the
>>> flash chip storing the host CPU's firmware, shared by the BMC and the
>>> host CPU and exclusively under the control of the latter by default).
>>> Such a device can be marked as reserved so that the kernel refrains
>>> from touching it until appropriate preparatory steps have been taken
>>> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
>>> processor has control of the firmware flash).
>>>
>>> Patches 1-3 provide some basic plumbing for checking the "reserved"
>>> status of a device, patch 4 is the main driver-core change, and patch
>>> 5 tweaks the OF platform code to not skip reserved devices so that
>>> they can actually be instantiated.
>>
>> Again, the driver core should not care about this, that is up to the bus
>> that wants to read these "reserved" values and do something with them or
>> not (remember the bus is the thing that does the binding, not the driver
>> core).
>>
>> But are you sure you are using the "reserved" field properly?
> 
> Well, thus far both Rob Herring and Oliver O'Halloran (originator of the "reserved" status in the DT spec, whom I probably should have CCed earlier, sorry) have seemed receptive to this interpretation of it, which I'd hope would lend it some credence.

I am not on board with this interpretation.  To me, if the value of
status is "reserved", then the Linux kernel should _never_ use the
device described by the node.

If a "reserved" node is usable by the Linux kernel, then the specification
should be updated to allow this.  And the specification should probably
be expanded to either discuss how to describe the coordination between
multiple entities or state that the coordination is outside of the
specification and will be implemention dependent.

I am wary of the complexity of the operating system treating a node as
reserved at initial boot, then at some point via coordination with
some other entity starting to use the node.  It is not too complex if
the node is a leaf node with no links (phandles) to or from any other node,
but as soon as links to or from other nodes exist, then other drivers or
subsystems may need to be aware of when the node is available to the
operating system or given back to the other entity then any part of the
operating system has to coordinate in that state transition.  This is
driving a lot of my caution that we be careful to create architecture
and not an ad hoc hack.

-Frank

> 
>> You are
>> wanting to do "something" to the device to later on be able to then have
>> the kernel touch the device, while it seems that the reason for this
>> field is for the kernel to NEVER touch the device at all.  What will
>> break if you change this logic?
> 
> Given that there's no existing usage of or support for this status value anywhere I can see in the kernel, and that Oliver has indicated that it should be compatible with usage in OpenPower platform firmware, my expectation would certainly be that nothing would break, but if there are examples of things that could I'd be interested to see them.
> 
> 
> Thanks,
> Zev
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25  5:38             ` Frank Rowand
@ 2021-10-25  6:15               ` Greg Kroah-Hartman
  2021-10-25 11:44                 ` Patrick Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-25  6:15 UTC (permalink / raw)
  To: Frank Rowand
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Jianxiong Gao, Dave Jiang, kvm, Saravana Kannan,
	Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> > On Fri, Oct 22, 2021 at 10:18:11AM -0500, Patrick Williams wrote:
> >> Hi Greg,
> >>
> >> On Fri, Oct 22, 2021 at 10:57:21AM +0200, Greg Kroah-Hartman wrote:
> >>> On Fri, Oct 22, 2021 at 01:32:32AM -0700, Zev Weiss wrote:
> >>>> On Thu, Oct 21, 2021 at 11:46:56PM PDT, Greg Kroah-Hartman wrote:
> >>>>> On Thu, Oct 21, 2021 at 07:00:31PM -0700, Zev Weiss wrote:
> >>
> >>>> So we want the kernel to be aware of the device's existence (so that we
> >>>> *can* bind a driver to it when needed), but we don't want it touching the
> >>>> device unless we really ask for it.
> >>>>
> >>>> Does that help clarify the motivation for wanting this functionality?
> >>>
> >>> Sure, then just do this type of thing in the driver itself.  Do not have
> >>> any matching "ids" for this hardware it so that the bus will never call
> >>> the probe function for this hardware _until_ a manual write happens to
> >>> the driver's "bind" sysfs file.
> >>
> >> It sounds like you're suggesting a change to one particular driver to satisfy
> >> this one particular case (and maybe I'm just not understanding your suggestion).
> >> For a BMC, this is a pretty regular situation and not just as one-off as Zev's
> >> example.
> >>
> >> Another good example is where a system can have optional riser cards with a
> >> whole tree of devices that might be on that riser card (and there might be
> >> different variants of a riser card that could go in the same slot).  Usually
> >> there is an EEPROM of some sort at a well-known address that can be parsed to
> >> identify which kind of riser card it is and then the appropriate sub-devices can
> >> be enumerated.  That EEPROM parsing is something that is currently done in
> >> userspace due to the complexity and often vendor-specific nature of it.
> >>
> >> Many of these devices require quite a bit more configuration information than
> >> can be passed along a `bind` call.  I believe it has been suggested previously
> >> that this riser-card scenario could also be solved with dynamic loading of DT
> >> snippets, but that support seems simple pretty far from being merged.
> > 
> > Then work to get the DT code merged!  Do not try to create
> > yet-another-way of doing things here if DT overlays is the correct
> > solution here (and it seems like it is.)
> 
> I don't think this is a case that fits the overlay model.
> 
> We know what the description of the device is (which is what devicetree
> is all about), but the device is to be shared between the Linux kernel
> and some other entity, such as the firmware or another OS.  The issue
> to be resolved is how to describe that the device is to be shared (in
> this case exclusively by the kernel _or_ by the other entity at any
> given moment), and how to move ownership of the device between the
> Linux kernel and the other entity.
> 
> In the scenario presented by Zev, it is suggested that a user space
> agent will be involved in deciding which entity owns the device and
> to tell the Linux kernel when to take ownership of the device (and
> presumably when to relinquish ownership, although we haven't seen
> the implementation of relinquishing ownership yet).  One could
> imagine direct communication between the driver and the other
> entity to mediate ownership.  That seems like a driver specific
> defined choice to me, though if there are enough different drivers
> facing this situation then eventually a shared framework would
> make sense.

We have the bind/unbind ability today, from userspace, that can control
this.  Why not just have Linux grab the device when it boots, and then
when userspace wants to "give the device up", it writes to "unbind" in
sysfs, and then when all is done, it writes to the "bind" file and then
Linux takes back over.

Unless for some reason Linux should _not_ grab the device when booting,
then things get messier, as we have seen in this thread.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25  6:15               ` Greg Kroah-Hartman
@ 2021-10-25 11:44                 ` Patrick Williams
  2021-10-25 12:58                   ` Andy Shevchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Patrick Williams @ 2021-10-25 11:44 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

[-- Attachment #1: Type: text/plain, Size: 1123 bytes --]

On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
 
> We have the bind/unbind ability today, from userspace, that can control
> this.  Why not just have Linux grab the device when it boots, and then
> when userspace wants to "give the device up", it writes to "unbind" in
> sysfs, and then when all is done, it writes to the "bind" file and then
> Linux takes back over.
> 
> Unless for some reason Linux should _not_ grab the device when booting,
> then things get messier, as we have seen in this thread.

This is probably more typical on a BMC than atypical.  The systems often require
the BMC (running Linux) to be able to reboot independently from the managed host
(running anything).  In the example Zev gave, the BMC rebooting would rip away
the BIOS chip from the running host.

The BMC almost always needs to come up in a "I don't know what could possibly be
going on in the system" state and re-discover where the system was left off.

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25 11:44                 ` Patrick Williams
@ 2021-10-25 12:58                   ` Andy Shevchenko
  2021-10-25 13:20                     ` Patrick Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Andy Shevchenko @ 2021-10-25 12:58 UTC (permalink / raw)
  To: Patrick Williams
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andrew Jeffery,
	Greg Kroah-Hartman, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
>  
> > We have the bind/unbind ability today, from userspace, that can control
> > this.  Why not just have Linux grab the device when it boots, and then
> > when userspace wants to "give the device up", it writes to "unbind" in
> > sysfs, and then when all is done, it writes to the "bind" file and then
> > Linux takes back over.
> > 
> > Unless for some reason Linux should _not_ grab the device when booting,
> > then things get messier, as we have seen in this thread.
> 
> This is probably more typical on a BMC than atypical.  The systems often require
> the BMC (running Linux) to be able to reboot independently from the managed host
> (running anything).  In the example Zev gave, the BMC rebooting would rip away
> the BIOS chip from the running host.
> 
> The BMC almost always needs to come up in a "I don't know what could possibly be
> going on in the system" state and re-discover where the system was left off.

Isn't it an architectural issue then?

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25 12:58                   ` Andy Shevchenko
@ 2021-10-25 13:20                     ` Patrick Williams
  2021-10-25 13:34                       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Patrick Williams @ 2021-10-25 13:20 UTC (permalink / raw)
  To: Andy Shevchenko
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andrew Jeffery,
	Greg Kroah-Hartman, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

[-- Attachment #1: Type: text/plain, Size: 1601 bytes --]

On Mon, Oct 25, 2021 at 03:58:25PM +0300, Andy Shevchenko wrote:
> On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> > On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> >  
> > > We have the bind/unbind ability today, from userspace, that can control
> > > this.  Why not just have Linux grab the device when it boots, and then
> > > when userspace wants to "give the device up", it writes to "unbind" in
> > > sysfs, and then when all is done, it writes to the "bind" file and then
> > > Linux takes back over.
> > > 
> > > Unless for some reason Linux should _not_ grab the device when booting,
> > > then things get messier, as we have seen in this thread.
> > 
> > This is probably more typical on a BMC than atypical.  The systems often require
> > the BMC (running Linux) to be able to reboot independently from the managed host
> > (running anything).  In the example Zev gave, the BMC rebooting would rip away
> > the BIOS chip from the running host.
> > 
> > The BMC almost always needs to come up in a "I don't know what could possibly be
> > going on in the system" state and re-discover where the system was left off.
> 
> Isn't it an architectural issue then?

I'm not sure what "it" you are referring to here.

I was trying to explain why starting in "bind" state is not a good idea for a
BMC in most of these cases where we want to be able to dynamically add a device.


-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25 13:20                     ` Patrick Williams
@ 2021-10-25 13:34                       ` Greg Kroah-Hartman
  2021-10-25 14:02                         ` Patrick Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-25 13:34 UTC (permalink / raw)
  To: Patrick Williams
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> On Mon, Oct 25, 2021 at 03:58:25PM +0300, Andy Shevchenko wrote:
> > On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> > > On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > > > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> > >  
> > > > We have the bind/unbind ability today, from userspace, that can control
> > > > this.  Why not just have Linux grab the device when it boots, and then
> > > > when userspace wants to "give the device up", it writes to "unbind" in
> > > > sysfs, and then when all is done, it writes to the "bind" file and then
> > > > Linux takes back over.
> > > > 
> > > > Unless for some reason Linux should _not_ grab the device when booting,
> > > > then things get messier, as we have seen in this thread.
> > > 
> > > This is probably more typical on a BMC than atypical.  The systems often require
> > > the BMC (running Linux) to be able to reboot independently from the managed host
> > > (running anything).  In the example Zev gave, the BMC rebooting would rip away
> > > the BIOS chip from the running host.
> > > 
> > > The BMC almost always needs to come up in a "I don't know what could possibly be
> > > going on in the system" state and re-discover where the system was left off.
> > 
> > Isn't it an architectural issue then?
> 
> I'm not sure what "it" you are referring to here.
> 
> I was trying to explain why starting in "bind" state is not a good idea for a
> BMC in most of these cases where we want to be able to dynamically add a device.

I think "it" is "something needs to be the moderator between the two
operating systems".  What is the external entity that handles the
switching between the two?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 0/5] driver core, of: support for reserved devices
  2021-10-25  5:53     ` Frank Rowand
@ 2021-10-25 13:57       ` Frank Rowand
  0 siblings, 0 replies; 34+ messages in thread
From: Frank Rowand @ 2021-10-25 13:57 UTC (permalink / raw)
  To: Zev Weiss, Greg Kroah-Hartman
  Cc: devicetree, Andrew Jeffery, openbmc, linux-kernel, Rob Herring,
	Jeremy Kerr

On 10/25/21 12:53 AM, Frank Rowand wrote:
> On 10/22/21 4:00 AM, Zev Weiss wrote:
>> On Thu, Oct 21, 2021 at 11:50:07PM PDT, Greg Kroah-Hartman wrote:
>>> On Thu, Oct 21, 2021 at 07:00:27PM -0700, Zev Weiss wrote:
>>>> Hello all,
>>>>
>>>> This series is another incarnation of a couple other patchsets I've
>>>> posted recently [0, 1], but again different enough in overall
>>>> structure that I'm not sure it's exactly a v2 (or v3).
>>>>
>>>> As compared to [1], it abandons the writable binary sysfs files and at
>>>> Frank's suggestion returns to an approach more akin to [0], though
>>>> without any driver-specific (aspeed-smc) changes, which I figure might
>>>> as well be done later in a separate series once appropriate
>>>> infrastructure is in place.
>>>>
>>>> The basic idea is to implement support for a status property value
>>>> that's documented in the DT spec [2], but thus far not used at all in
>>>> the kernel (or anywhere else I'm aware of): "reserved".  According to
>>>> the spec (section 2.3.4, Table 2.4), this status:
>>>>
>>>>   Indicates that the device is operational, but should not be used.
>>>>   Typically this is used for devices that are controlled by another
>>>>   software component, such as platform firmware.
>>>>
>>>> With these changes, devices marked as reserved are (at least in some
>>>> cases, more on this later) instantiated, but will not have drivers
>>>> bound to them unless and until userspace explicitly requests it by
>>>> writing the device's name to the driver's sysfs 'bind' file.  This
>>>> enables appropriate handling of hardware arrangements that can arise
>>>> in contexts like OpenBMC, where a device may be shared with another
>>>> external controller not under the kernel's control (for example, the
>>>> flash chip storing the host CPU's firmware, shared by the BMC and the
>>>> host CPU and exclusively under the control of the latter by default).
>>>> Such a device can be marked as reserved so that the kernel refrains
>>>> from touching it until appropriate preparatory steps have been taken
>>>> (e.g. BMC userspace coordinating with the host CPU to arbitrate which
>>>> processor has control of the firmware flash).
>>>>
>>>> Patches 1-3 provide some basic plumbing for checking the "reserved"
>>>> status of a device, patch 4 is the main driver-core change, and patch
>>>> 5 tweaks the OF platform code to not skip reserved devices so that
>>>> they can actually be instantiated.
>>>
>>> Again, the driver core should not care about this, that is up to the bus
>>> that wants to read these "reserved" values and do something with them or
>>> not (remember the bus is the thing that does the binding, not the driver
>>> core).
>>>
>>> But are you sure you are using the "reserved" field properly?
>>
>> Well, thus far both Rob Herring and Oliver O'Halloran (originator of the "reserved" status in the DT spec, whom I probably should have CCed earlier, sorry) have seemed receptive to this interpretation of it, which I'd hope would lend it some credence.
> 
> I am not on board with this interpretation.  To me, if the value of
> status is "reserved", then the Linux kernel should _never_ use the
> device described by the node.
> 
> If a "reserved" node is usable by the Linux kernel, then the specification
> should be updated to allow this.  And the specification should probably
> be expanded to either discuss how to describe the coordination between
> multiple entities or state that the coordination is outside of the
> specification and will be implemention dependent.

Maybe a value of "reserved-sharable" should be added to the standard.
This would indicate that the node is operational and controlled by
another software component, but is available to the operating system
after requesting permission or being granted permission from the other
software component.

The exact method of requesting permission or being granted permission
could either be driver specific, or the driver binding could
include one or more additional properties to describe the method.

One thing that I am wary of is the possibility of a proliferation of
status checks changing from "okay" to "okay" || ("reserved" and the
state of the driver is that permission has been granted).

From a simplicity of coding view, it is really tempting to dynamically
change the value of the status property from "reserved-sharable"
to "okay" when the other software component grants permission to
use the device, so that status checks will magically allow use
instead of blocking use.  I do not like changing the value of the
status property dynamically because the devicetree is supposed to
describe hardware (and communicate information from the firmware
to the operating system), not to actively maintain state.

-Frank

> 
> I am wary of the complexity of the operating system treating a node as
> reserved at initial boot, then at some point via coordination with
> some other entity starting to use the node.  It is not too complex if
> the node is a leaf node with no links (phandles) to or from any other node,
> but as soon as links to or from other nodes exist, then other drivers or
> subsystems may need to be aware of when the node is available to the
> operating system or given back to the other entity then any part of the
> operating system has to coordinate in that state transition.  This is
> driving a lot of my caution that we be careful to create architecture
> and not an ad hoc hack.
> 
> -Frank
> 
>>
>>> You are
>>> wanting to do "something" to the device to later on be able to then have
>>> the kernel touch the device, while it seems that the reason for this
>>> field is for the kernel to NEVER touch the device at all.  What will
>>> break if you change this logic?
>>
>> Given that there's no existing usage of or support for this status value anywhere I can see in the kernel, and that Oliver has indicated that it should be compatible with usage in OpenPower platform firmware, my expectation would certainly be that nothing would break, but if there are examples of things that could I'd be interested to see them.
>>
>>
>> Thanks,
>> Zev
>>
> 


^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25 13:34                       ` Greg Kroah-Hartman
@ 2021-10-25 14:02                         ` Patrick Williams
  2021-10-25 14:09                           ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Patrick Williams @ 2021-10-25 14:02 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

[-- Attachment #1: Type: text/plain, Size: 2802 bytes --]

On Mon, Oct 25, 2021 at 03:34:05PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> > On Mon, Oct 25, 2021 at 03:58:25PM +0300, Andy Shevchenko wrote:
> > > On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> > > > On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > > > > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > > > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> > > >  
> > > > > We have the bind/unbind ability today, from userspace, that can control
> > > > > this.  Why not just have Linux grab the device when it boots, and then
> > > > > when userspace wants to "give the device up", it writes to "unbind" in
> > > > > sysfs, and then when all is done, it writes to the "bind" file and then
> > > > > Linux takes back over.
> > > > > 
> > > > > Unless for some reason Linux should _not_ grab the device when booting,
> > > > > then things get messier, as we have seen in this thread.
> > > > 
> > > > This is probably more typical on a BMC than atypical.  The systems often require
> > > > the BMC (running Linux) to be able to reboot independently from the managed host
> > > > (running anything).  In the example Zev gave, the BMC rebooting would rip away
> > > > the BIOS chip from the running host.
> > > > 
> > > > The BMC almost always needs to come up in a "I don't know what could possibly be
> > > > going on in the system" state and re-discover where the system was left off.
> > > 
> > > Isn't it an architectural issue then?
> > 
> > I'm not sure what "it" you are referring to here.
> > 
> > I was trying to explain why starting in "bind" state is not a good idea for a
> > BMC in most of these cases where we want to be able to dynamically add a device.
> 
> I think "it" is "something needs to be the moderator between the two
> operating systems".  What is the external entity that handles the
> switching between the two?

Ah, ok.

Those usually end up being system / device specific.  In the case of the BIOS
flash, most designs I've seen use a SPI mux between the BMC and the host
processor or IO hub (PCH on Xeons).  The BMC has a GPIO to control the mux.

As far as state, the BMC on start-up will go through a set of discovery code to
figure out where it left the system prior to getting reset.  That involves
looking at the power subsystem and usually doing some kind of query to the host
to see if it is alive.  These queries are mostly system / host-processor design
specific.  I've seen anything from an IPMI/IPMB message alert from the BMC to
the BIOS to ask "are you alive" to reading host processor state over JTAG to
figure out if the processors are "making progress".

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25 14:02                         ` Patrick Williams
@ 2021-10-25 14:09                           ` Greg Kroah-Hartman
  2021-10-25 15:54                             ` Patrick Williams
  0 siblings, 1 reply; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-25 14:09 UTC (permalink / raw)
  To: Patrick Williams
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Mon, Oct 25, 2021 at 09:02:40AM -0500, Patrick Williams wrote:
> On Mon, Oct 25, 2021 at 03:34:05PM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> > > On Mon, Oct 25, 2021 at 03:58:25PM +0300, Andy Shevchenko wrote:
> > > > On Mon, Oct 25, 2021 at 06:44:26AM -0500, Patrick Williams wrote:
> > > > > On Mon, Oct 25, 2021 at 08:15:41AM +0200, Greg Kroah-Hartman wrote:
> > > > > > On Mon, Oct 25, 2021 at 12:38:08AM -0500, Frank Rowand wrote:
> > > > > > > On 10/23/21 3:56 AM, Greg Kroah-Hartman wrote:
> > > > >  
> > > > > > We have the bind/unbind ability today, from userspace, that can control
> > > > > > this.  Why not just have Linux grab the device when it boots, and then
> > > > > > when userspace wants to "give the device up", it writes to "unbind" in
> > > > > > sysfs, and then when all is done, it writes to the "bind" file and then
> > > > > > Linux takes back over.
> > > > > > 
> > > > > > Unless for some reason Linux should _not_ grab the device when booting,
> > > > > > then things get messier, as we have seen in this thread.
> > > > > 
> > > > > This is probably more typical on a BMC than atypical.  The systems often require
> > > > > the BMC (running Linux) to be able to reboot independently from the managed host
> > > > > (running anything).  In the example Zev gave, the BMC rebooting would rip away
> > > > > the BIOS chip from the running host.
> > > > > 
> > > > > The BMC almost always needs to come up in a "I don't know what could possibly be
> > > > > going on in the system" state and re-discover where the system was left off.
> > > > 
> > > > Isn't it an architectural issue then?
> > > 
> > > I'm not sure what "it" you are referring to here.
> > > 
> > > I was trying to explain why starting in "bind" state is not a good idea for a
> > > BMC in most of these cases where we want to be able to dynamically add a device.
> > 
> > I think "it" is "something needs to be the moderator between the two
> > operating systems".  What is the external entity that handles the
> > switching between the two?
> 
> Ah, ok.
> 
> Those usually end up being system / device specific.  In the case of the BIOS
> flash, most designs I've seen use a SPI mux between the BMC and the host
> processor or IO hub (PCH on Xeons).  The BMC has a GPIO to control the mux.
> 
> As far as state, the BMC on start-up will go through a set of discovery code to
> figure out where it left the system prior to getting reset.  That involves
> looking at the power subsystem and usually doing some kind of query to the host
> to see if it is alive.  These queries are mostly system / host-processor design
> specific.  I've seen anything from an IPMI/IPMB message alert from the BMC to
> the BIOS to ask "are you alive" to reading host processor state over JTAG to
> figure out if the processors are "making progress".

But which processor is "in control" here over the hardware?  What method
is used to pass the device from one CPU to another from a logical point
of view?  Sounds like it is another driver that needs to handle all of
this, so why not have that be the one that adds/removes the devices
under control here?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25 14:09                           ` Greg Kroah-Hartman
@ 2021-10-25 15:54                             ` Patrick Williams
  2021-10-25 18:36                               ` Greg Kroah-Hartman
  0 siblings, 1 reply; 34+ messages in thread
From: Patrick Williams @ 2021-10-25 15:54 UTC (permalink / raw)
  To: Greg Kroah-Hartman
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

[-- Attachment #1: Type: text/plain, Size: 2894 bytes --]

On Mon, Oct 25, 2021 at 04:09:59PM +0200, Greg Kroah-Hartman wrote:
> On Mon, Oct 25, 2021 at 09:02:40AM -0500, Patrick Williams wrote:
> > On Mon, Oct 25, 2021 at 03:34:05PM +0200, Greg Kroah-Hartman wrote:
> > > On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> > > I think "it" is "something needs to be the moderator between the two
> > > operating systems".  What is the external entity that handles the
> > > switching between the two?
> > 
> > Ah, ok.
> > 
> > Those usually end up being system / device specific.  In the case of the BIOS
> > flash, most designs I've seen use a SPI mux between the BMC and the host
> > processor or IO hub (PCH on Xeons).  The BMC has a GPIO to control the mux.
> > 
> > As far as state, the BMC on start-up will go through a set of discovery code to
> > figure out where it left the system prior to getting reset.  That involves
> > looking at the power subsystem and usually doing some kind of query to the host
> > to see if it is alive.  These queries are mostly system / host-processor design
> > specific.  I've seen anything from an IPMI/IPMB message alert from the BMC to
> > the BIOS to ask "are you alive" to reading host processor state over JTAG to
> > figure out if the processors are "making progress".
> 
> But which processor is "in control" here over the hardware?  

The BMC.  It owns the GPIO that controls the SPI mux.  

But, the BMC is responsible for doing all operations in a way that doesn't mess
up the running host processor(s).  Pulling away the SPI flash containing the
BIOS code at an incorrect time might do that.

> What method
> is used to pass the device from one CPU to another from a logical point
> of view?  

The state of the server as a whole is determined and maintained by the BMC.  I'm
simplifying here a bit but the operation "turn on the host processors" implies
"the host processors will access the BIOS" so the BMC must ensure "SPI mux is
switched towards the host" before "turn on the host processors".

> Sounds like it is another driver that needs to handle all of
> this, so why not have that be the one that adds/removes the devices
> under control here?

If what you're describing is moving all of the state control logic into the
kernel, I don't think that is feasible.  For some systems it would mean moving
yet another entire IPMI stack into the kernel tree.  On others it might be
somewhat simpler, but it is still a good amount of code.  We could probably
write up more details on the scope of this.

If what you're describing is a small driver, similar to the board support
drivers that were used before the device tree, that instantiates subordinate
devices it doesn't seem like an unreasonable alternative to DT overlays to me
(for whatever my limited kernel contribution experience counts for).

-- 
Patrick Williams

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 833 bytes --]

^ permalink raw reply	[flat|nested] 34+ messages in thread

* Re: [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices
  2021-10-25 15:54                             ` Patrick Williams
@ 2021-10-25 18:36                               ` Greg Kroah-Hartman
  0 siblings, 0 replies; 34+ messages in thread
From: Greg Kroah-Hartman @ 2021-10-25 18:36 UTC (permalink / raw)
  To: Patrick Williams
  Cc: Zev Weiss, Rafael J. Wysocki, Kirti Wankhede, Jeremy Kerr,
	Rajat Jain, Frank Rowand, Jianxiong Gao, Dave Jiang, kvm,
	Saravana Kannan, Mauro Carvalho Chehab, openbmc, devicetree,
	Konrad Rzeszutek Wilk, Alex Williamson, Rob Herring,
	Bhaskar Chowdhury, Thomas Gleixner, Andy Shevchenko,
	Andrew Jeffery, Cornelia Huck, linux-kernel, Vinod Koul,
	dmaengine

On Mon, Oct 25, 2021 at 10:54:21AM -0500, Patrick Williams wrote:
> On Mon, Oct 25, 2021 at 04:09:59PM +0200, Greg Kroah-Hartman wrote:
> > On Mon, Oct 25, 2021 at 09:02:40AM -0500, Patrick Williams wrote:
> > > On Mon, Oct 25, 2021 at 03:34:05PM +0200, Greg Kroah-Hartman wrote:
> > > > On Mon, Oct 25, 2021 at 08:20:05AM -0500, Patrick Williams wrote:
> > > > I think "it" is "something needs to be the moderator between the two
> > > > operating systems".  What is the external entity that handles the
> > > > switching between the two?
> > > 
> > > Ah, ok.
> > > 
> > > Those usually end up being system / device specific.  In the case of the BIOS
> > > flash, most designs I've seen use a SPI mux between the BMC and the host
> > > processor or IO hub (PCH on Xeons).  The BMC has a GPIO to control the mux.
> > > 
> > > As far as state, the BMC on start-up will go through a set of discovery code to
> > > figure out where it left the system prior to getting reset.  That involves
> > > looking at the power subsystem and usually doing some kind of query to the host
> > > to see if it is alive.  These queries are mostly system / host-processor design
> > > specific.  I've seen anything from an IPMI/IPMB message alert from the BMC to
> > > the BIOS to ask "are you alive" to reading host processor state over JTAG to
> > > figure out if the processors are "making progress".
> > 
> > But which processor is "in control" here over the hardware?  
> 
> The BMC.  It owns the GPIO that controls the SPI mux.  
> 
> But, the BMC is responsible for doing all operations in a way that doesn't mess
> up the running host processor(s).  Pulling away the SPI flash containing the
> BIOS code at an incorrect time might do that.
> 
> > What method
> > is used to pass the device from one CPU to another from a logical point
> > of view?  
> 
> The state of the server as a whole is determined and maintained by the BMC.  I'm
> simplifying here a bit but the operation "turn on the host processors" implies
> "the host processors will access the BIOS" so the BMC must ensure "SPI mux is
> switched towards the host" before "turn on the host processors".
> 
> > Sounds like it is another driver that needs to handle all of
> > this, so why not have that be the one that adds/removes the devices
> > under control here?
> 
> If what you're describing is moving all of the state control logic into the
> kernel, I don't think that is feasible.  For some systems it would mean moving
> yet another entire IPMI stack into the kernel tree.  On others it might be
> somewhat simpler, but it is still a good amount of code.  We could probably
> write up more details on the scope of this.
> 
> If what you're describing is a small driver, similar to the board support
> drivers that were used before the device tree, that instantiates subordinate
> devices it doesn't seem like an unreasonable alternative to DT overlays to me
> (for whatever my limited kernel contribution experience counts for).
> 

Something has to be here doing the mediation between the two processors
and keeping things straight as to what processor is handling the
hardware when.  I suggest you focus on that first...

Good luck!

greg k-h

^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2021-10-25 18:37 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-10-22  2:00 [PATCH 0/5] driver core, of: support for reserved devices Zev Weiss
2021-10-22  2:00 ` [PATCH 1/5] of: base: add function to check for status = "reserved" Zev Weiss
2021-10-22  6:43   ` Greg Kroah-Hartman
2021-10-22  7:38     ` Zev Weiss
2021-10-22  7:45       ` Greg Kroah-Hartman
2021-10-22  2:00 ` [PATCH 2/5] device property: add fwnode_device_is_reserved() Zev Weiss
2021-10-22  2:00 ` [PATCH 3/5] of: property: add support for fwnode_device_is_reserved() Zev Weiss
2021-10-22  2:00 ` [PATCH 4/5] driver core: inhibit automatic driver binding on reserved devices Zev Weiss
2021-10-22  6:46   ` Greg Kroah-Hartman
2021-10-22  8:32     ` Zev Weiss
2021-10-22  8:57       ` Greg Kroah-Hartman
2021-10-22 15:18         ` Patrick Williams
2021-10-23  8:56           ` Greg Kroah-Hartman
2021-10-25  5:38             ` Frank Rowand
2021-10-25  6:15               ` Greg Kroah-Hartman
2021-10-25 11:44                 ` Patrick Williams
2021-10-25 12:58                   ` Andy Shevchenko
2021-10-25 13:20                     ` Patrick Williams
2021-10-25 13:34                       ` Greg Kroah-Hartman
2021-10-25 14:02                         ` Patrick Williams
2021-10-25 14:09                           ` Greg Kroah-Hartman
2021-10-25 15:54                             ` Patrick Williams
2021-10-25 18:36                               ` Greg Kroah-Hartman
2021-10-22 16:27         ` Zev Weiss
2021-10-23  8:55           ` Greg Kroah-Hartman
2021-10-22  2:00 ` [PATCH 5/5] of: platform: instantiate " Zev Weiss
2021-10-22  2:58 ` [PATCH 0/5] driver core, of: support for " Rob Herring
2021-10-22  3:13   ` Zev Weiss
2021-10-22  6:50   ` Greg Kroah-Hartman
2021-10-22  6:50 ` Greg Kroah-Hartman
2021-10-22  9:00   ` Zev Weiss
2021-10-22  9:22     ` Greg Kroah-Hartman
2021-10-25  5:53     ` Frank Rowand
2021-10-25 13:57       ` Frank Rowand

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).