dri-devel.lists.freedesktop.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Convert to persistent DRM devices
@ 2023-11-17 17:43 Jeffrey Hugo
  2023-11-17 17:43 ` [PATCH 1/2] accel/qaic: Increase number of in_reset states Jeffrey Hugo
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Jeffrey Hugo @ 2023-11-17 17:43 UTC (permalink / raw)
  To: quic_carlv, quic_pkanojiy, stanislaw.gruszka, ogabbay
  Cc: linux-arm-msm, Jeffrey Hugo, dri-devel

The qaic driver currently creates and destroys the DRM devices when the
qaic device is in an operational state for userspace. This does not match
what other DRM drivers do, and leads to a few race conditions that need
to be handled.

Instead, create the DRM device when the underlying PCIe device is detected
and destroy the DRM device when the underlying device disappears.

Use KOBJ_ONLINE/OFFLINE udev events to signal to userspace when the
underlying device is ready to accept requests, or has entered a reset
state.

Carl Vanderlip (2):
  accel/qaic: Increase number of in_reset states
  accel/qaic: Expand DRM device lifecycle

 Documentation/accel/qaic/qaic.rst   |  9 +++++-
 drivers/accel/qaic/mhi_controller.c |  2 +-
 drivers/accel/qaic/qaic.h           | 15 +++++++--
 drivers/accel/qaic/qaic_control.c   |  5 +--
 drivers/accel/qaic/qaic_data.c      | 16 ++++-----
 drivers/accel/qaic/qaic_drv.c       | 50 ++++++++++++-----------------
 6 files changed, 52 insertions(+), 45 deletions(-)

-- 
2.40.1


^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] accel/qaic: Increase number of in_reset states
  2023-11-17 17:43 [PATCH 0/2] Convert to persistent DRM devices Jeffrey Hugo
@ 2023-11-17 17:43 ` Jeffrey Hugo
  2023-11-20  8:18   ` Jacek Lawrynowicz
  2023-11-17 17:43 ` [PATCH 2/2] accel/qaic: Expand DRM device lifecycle Jeffrey Hugo
  2023-12-01 17:38 ` [PATCH 0/2] Convert to persistent DRM devices Jeffrey Hugo
  2 siblings, 1 reply; 6+ messages in thread
From: Jeffrey Hugo @ 2023-11-17 17:43 UTC (permalink / raw)
  To: quic_carlv, quic_pkanojiy, stanislaw.gruszka, ogabbay
  Cc: linux-arm-msm, Jeffrey Hugo, dri-devel

From: Carl Vanderlip <quic_carlv@quicinc.com>

'in_reset' holds the state of the device. As part of bringup, the device
needs to be queried to check if it's in a valid state. Add a new state
that indicates that the device is coming up, but not ready for users
yet. Rename to 'reset_state' to better describe the variable.

Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
---
 drivers/accel/qaic/qaic.h         | 13 +++++++++++--
 drivers/accel/qaic/qaic_control.c |  5 +++--
 drivers/accel/qaic/qaic_data.c    | 16 ++++++++--------
 drivers/accel/qaic/qaic_drv.c     | 12 ++++++------
 4 files changed, 28 insertions(+), 18 deletions(-)

diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
index bc40d52dc010..bd0c884e6bf7 100644
--- a/drivers/accel/qaic/qaic.h
+++ b/drivers/accel/qaic/qaic.h
@@ -31,6 +31,15 @@
 #define to_drm(qddev) (&(qddev)->drm)
 #define to_accel_kdev(qddev) (to_drm(qddev)->accel->kdev) /* Return Linux device of accel node */
 
+enum __packed reset_states {
+	/* Device is offline or will be very soon */
+	QAIC_OFFLINE,
+	/* Device is booting, not clear if it's in a usable state */
+	QAIC_BOOT,
+	/* Device is fully operational */
+	QAIC_ONLINE,
+};
+
 extern bool datapath_polling;
 
 struct qaic_user {
@@ -121,8 +130,8 @@ struct qaic_device {
 	struct workqueue_struct	*cntl_wq;
 	/* Synchronizes all the users of device during cleanup */
 	struct srcu_struct	dev_lock;
-	/* true: Device under reset; false: Device not under reset */
-	bool			in_reset;
+	/* Track the state of the device during resets */
+	enum reset_states	reset_state;
 	/* true: single MSI is used to operate device */
 	bool			single_msi;
 	/*
diff --git a/drivers/accel/qaic/qaic_control.c b/drivers/accel/qaic/qaic_control.c
index 84915824be54..0701d2deee08 100644
--- a/drivers/accel/qaic/qaic_control.c
+++ b/drivers/accel/qaic/qaic_control.c
@@ -1022,7 +1022,8 @@ static void *msg_xfer(struct qaic_device *qdev, struct wrapper_list *wrappers, u
 	int xfer_count = 0;
 	int retry_count;
 
-	if (qdev->in_reset) {
+	/* Allow QAIC_BOOT state since we need to check control protocol version */
+	if (qdev->reset_state == QAIC_OFFLINE) {
 		mutex_unlock(&qdev->cntl_mutex);
 		return ERR_PTR(-ENODEV);
 	}
@@ -1306,7 +1307,7 @@ int qaic_manage_ioctl(struct drm_device *dev, void *data, struct drm_file *file_
 	qdev = usr->qddev->qdev;
 
 	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		srcu_read_unlock(&qdev->dev_lock, qdev_rcu_id);
 		srcu_read_unlock(&usr->qddev_lock, usr_rcu_id);
 		return -ENODEV;
diff --git a/drivers/accel/qaic/qaic_data.c b/drivers/accel/qaic/qaic_data.c
index 8da81768f2ab..28643a47c405 100644
--- a/drivers/accel/qaic/qaic_data.c
+++ b/drivers/accel/qaic/qaic_data.c
@@ -690,7 +690,7 @@ int qaic_create_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *fi
 
 	qdev = usr->qddev->qdev;
 	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		ret = -ENODEV;
 		goto unlock_dev_srcu;
 	}
@@ -749,7 +749,7 @@ int qaic_mmap_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file
 
 	qdev = usr->qddev->qdev;
 	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		ret = -ENODEV;
 		goto unlock_dev_srcu;
 	}
@@ -970,7 +970,7 @@ int qaic_attach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
 
 	qdev = usr->qddev->qdev;
 	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		ret = -ENODEV;
 		goto unlock_dev_srcu;
 	}
@@ -1341,7 +1341,7 @@ static int __qaic_execute_bo_ioctl(struct drm_device *dev, void *data, struct dr
 
 	qdev = usr->qddev->qdev;
 	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		ret = -ENODEV;
 		goto unlock_dev_srcu;
 	}
@@ -1497,7 +1497,7 @@ void irq_polling_work(struct work_struct *work)
 	rcu_id = srcu_read_lock(&dbc->ch_lock);
 
 	while (1) {
-		if (dbc->qdev->in_reset) {
+		if (dbc->qdev->reset_state != QAIC_ONLINE) {
 			srcu_read_unlock(&dbc->ch_lock, rcu_id);
 			return;
 		}
@@ -1687,7 +1687,7 @@ int qaic_wait_bo_ioctl(struct drm_device *dev, void *data, struct drm_file *file
 
 	qdev = usr->qddev->qdev;
 	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		ret = -ENODEV;
 		goto unlock_dev_srcu;
 	}
@@ -1756,7 +1756,7 @@ int qaic_perf_stats_bo_ioctl(struct drm_device *dev, void *data, struct drm_file
 
 	qdev = usr->qddev->qdev;
 	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		ret = -ENODEV;
 		goto unlock_dev_srcu;
 	}
@@ -1847,7 +1847,7 @@ int qaic_detach_slice_bo_ioctl(struct drm_device *dev, void *data, struct drm_fi
 
 	qdev = usr->qddev->qdev;
 	qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		ret = -ENODEV;
 		goto unlock_dev_srcu;
 	}
diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
index b12226385003..02fe23248da4 100644
--- a/drivers/accel/qaic/qaic_drv.c
+++ b/drivers/accel/qaic/qaic_drv.c
@@ -64,7 +64,7 @@ static int qaic_open(struct drm_device *dev, struct drm_file *file)
 	int ret;
 
 	rcu_id = srcu_read_lock(&qdev->dev_lock);
-	if (qdev->in_reset) {
+	if (qdev->reset_state != QAIC_ONLINE) {
 		ret = -ENODEV;
 		goto dev_unlock;
 	}
@@ -121,7 +121,7 @@ static void qaic_postclose(struct drm_device *dev, struct drm_file *file)
 	if (qddev) {
 		qdev = qddev->qdev;
 		qdev_rcu_id = srcu_read_lock(&qdev->dev_lock);
-		if (!qdev->in_reset) {
+		if (qdev->reset_state == QAIC_ONLINE) {
 			qaic_release_usr(qdev, usr);
 			for (i = 0; i < qdev->num_dbc; ++i)
 				if (qdev->dbc[i].usr && qdev->dbc[i].usr->handle == usr->handle)
@@ -254,7 +254,7 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id
 
 	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
 
-	qdev->in_reset = false;
+	qdev->reset_state = QAIC_ONLINE;
 
 	dev_set_drvdata(&mhi_dev->dev, qdev);
 	qdev->cntl_ch = mhi_dev;
@@ -291,7 +291,7 @@ static void qaic_notify_reset(struct qaic_device *qdev)
 {
 	int i;
 
-	qdev->in_reset = true;
+	qdev->reset_state = QAIC_OFFLINE;
 	/* wake up any waiters to avoid waiting for timeouts at sync */
 	wake_all_cntl(qdev);
 	for (i = 0; i < qdev->num_dbc; ++i)
@@ -313,7 +313,7 @@ void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
 		release_dbc(qdev, i);
 
 	if (exit_reset)
-		qdev->in_reset = false;
+		qdev->reset_state = QAIC_ONLINE;
 }
 
 static void cleanup_qdev(struct qaic_device *qdev)
@@ -550,7 +550,7 @@ static void qaic_pci_reset_done(struct pci_dev *pdev)
 {
 	struct qaic_device *qdev = pci_get_drvdata(pdev);
 
-	qdev->in_reset = false;
+	qdev->reset_state = QAIC_ONLINE;
 	qaic_mhi_reset_done(qdev->mhi_cntrl);
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] accel/qaic: Expand DRM device lifecycle
  2023-11-17 17:43 [PATCH 0/2] Convert to persistent DRM devices Jeffrey Hugo
  2023-11-17 17:43 ` [PATCH 1/2] accel/qaic: Increase number of in_reset states Jeffrey Hugo
@ 2023-11-17 17:43 ` Jeffrey Hugo
  2023-11-21  8:17   ` Jacek Lawrynowicz
  2023-12-01 17:38 ` [PATCH 0/2] Convert to persistent DRM devices Jeffrey Hugo
  2 siblings, 1 reply; 6+ messages in thread
From: Jeffrey Hugo @ 2023-11-17 17:43 UTC (permalink / raw)
  To: quic_carlv, quic_pkanojiy, stanislaw.gruszka, ogabbay
  Cc: linux-arm-msm, Jeffrey Hugo, dri-devel

From: Carl Vanderlip <quic_carlv@quicinc.com>

Currently the QAIC DRM device registers itself when the MHI QAIC_CONTROL
channel becomes available. This is when the device is able to process
workloads. However, the DRM driver also provides the debugfs interface
bootlog for the device. If the device fails to boot to the QSM (which
brings up the MHI QAIC_CONTROL channel), the bootlog won't be available for
debugging why it failed to boot.

Change when the DRM device registers itself from when QAIC_CONTROL is
available to when the card is first probed on the PCI bus. Additionally,
make the DRM driver persist through reset/error cases so the driver
doesn't have to be reloaded to access the card again. Send
KOBJ_ONLINE/OFFLINE uevents so userspace can know when DRM device is
ready to handle requests.

Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
---
 Documentation/accel/qaic/qaic.rst   |  9 +++++-
 drivers/accel/qaic/mhi_controller.c |  2 +-
 drivers/accel/qaic/qaic.h           |  2 +-
 drivers/accel/qaic/qaic_drv.c       | 44 +++++++++++------------------
 4 files changed, 27 insertions(+), 30 deletions(-)

diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
index f81020736ebf..efb7771273bb 100644
--- a/Documentation/accel/qaic/qaic.rst
+++ b/Documentation/accel/qaic/qaic.rst
@@ -93,8 +93,15 @@ commands (does not impact QAIC).
 uAPI
 ====
 
+QAIC creates an accel device per phsyical PCIe device. This accel device exists
+for as long as the PCIe device is known to Linux.
+
+The PCIe device may not be in the state to accept requests from userspace at
+all times. QAIC will trigger KOBJ_ONLINE/OFFLINE uevents to advertise when the
+device can accept requests (ONLINE) and when the device is no longer accepting
+requests (OFFLINE) because of a reset or other state transition.
+
 QAIC defines a number of driver specific IOCTLs as part of the userspace API.
-This section describes those APIs.
 
 DRM_IOCTL_QAIC_MANAGE
   This IOCTL allows userspace to send a NNC request to the QSM. The call will
diff --git a/drivers/accel/qaic/mhi_controller.c b/drivers/accel/qaic/mhi_controller.c
index 5d3cc30009cc..832464f2833a 100644
--- a/drivers/accel/qaic/mhi_controller.c
+++ b/drivers/accel/qaic/mhi_controller.c
@@ -469,7 +469,7 @@ static void mhi_status_cb(struct mhi_controller *mhi_cntrl, enum mhi_callback re
 		pci_err(qdev->pdev, "Fatal error received from device. Attempting to recover\n");
 	/* this event occurs in non-atomic context */
 	if (reason == MHI_CB_SYS_ERROR)
-		qaic_dev_reset_clean_local_state(qdev, true);
+		qaic_dev_reset_clean_local_state(qdev);
 }
 
 static int mhi_reset_and_async_power_up(struct mhi_controller *mhi_cntrl)
diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
index bd0c884e6bf7..66f4abf6c4c4 100644
--- a/drivers/accel/qaic/qaic.h
+++ b/drivers/accel/qaic/qaic.h
@@ -283,7 +283,7 @@ void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
 void release_dbc(struct qaic_device *qdev, u32 dbc_id);
 
 void wake_all_cntl(struct qaic_device *qdev);
-void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset);
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev);
 
 struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf);
 
diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
index 02fe23248da4..c19bc83b249c 100644
--- a/drivers/accel/qaic/qaic_drv.c
+++ b/drivers/accel/qaic/qaic_drv.c
@@ -8,6 +8,7 @@
 #include <linux/idr.h>
 #include <linux/interrupt.h>
 #include <linux/list.h>
+#include <linux/kobject.h>
 #include <linux/kref.h>
 #include <linux/mhi.h>
 #include <linux/module.h>
@@ -43,9 +44,6 @@ MODULE_PARM_DESC(datapath_polling, "Operate the datapath in polling mode");
 static bool link_up;
 static DEFINE_IDA(qaic_usrs);
 
-static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id);
-static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id);
-
 static void free_usr(struct kref *kref)
 {
 	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
@@ -183,13 +181,6 @@ static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id)
 
 	qddev->partition_id = partition_id;
 
-	/*
-	 * drm_dev_unregister() sets the driver data to NULL and
-	 * drm_dev_register() does not update the driver data. During a SOC
-	 * reset drm dev is unregistered and registered again leaving the
-	 * driver data to NULL.
-	 */
-	dev_set_drvdata(to_accel_kdev(qddev), drm->accel);
 	ret = drm_dev_register(drm, 0);
 	if (ret)
 		pci_dbg(qdev->pdev, "drm_dev_register failed %d\n", ret);
@@ -203,7 +194,6 @@ static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id)
 	struct drm_device *drm = to_drm(qddev);
 	struct qaic_user *usr;
 
-	drm_dev_get(drm);
 	drm_dev_unregister(drm);
 	qddev->partition_id = 0;
 	/*
@@ -232,7 +222,6 @@ static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id)
 		mutex_lock(&qddev->users_mutex);
 	}
 	mutex_unlock(&qddev->users_mutex);
-	drm_dev_put(drm);
 }
 
 static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id *id)
@@ -254,8 +243,6 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id
 
 	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
 
-	qdev->reset_state = QAIC_ONLINE;
-
 	dev_set_drvdata(&mhi_dev->dev, qdev);
 	qdev->cntl_ch = mhi_dev;
 
@@ -265,6 +252,7 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id
 		return ret;
 	}
 
+	qdev->reset_state = QAIC_BOOT;
 	ret = get_cntl_version(qdev, NULL, &major, &minor);
 	if (ret || major != CNTL_MAJOR || minor > CNTL_MINOR) {
 		pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) not supported. Supported version is (%d.%d). Ret: %d\n",
@@ -272,8 +260,8 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id
 		ret = -EINVAL;
 		goto close_control;
 	}
-
-	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION);
+	qdev->reset_state = QAIC_ONLINE;
+	kobject_uevent(&(to_accel_kdev(qdev->qddev))->kobj, KOBJ_ONLINE);
 
 	return ret;
 
@@ -291,6 +279,7 @@ static void qaic_notify_reset(struct qaic_device *qdev)
 {
 	int i;
 
+	kobject_uevent(&(to_accel_kdev(qdev->qddev))->kobj, KOBJ_OFFLINE);
 	qdev->reset_state = QAIC_OFFLINE;
 	/* wake up any waiters to avoid waiting for timeouts at sync */
 	wake_all_cntl(qdev);
@@ -299,21 +288,15 @@ static void qaic_notify_reset(struct qaic_device *qdev)
 	synchronize_srcu(&qdev->dev_lock);
 }
 
-void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
+void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
 {
 	int i;
 
 	qaic_notify_reset(qdev);
 
-	/* remove drmdevs to prevent new users from coming in */
-	qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION);
-
 	/* start tearing things down */
 	for (i = 0; i < qdev->num_dbc; ++i)
 		release_dbc(qdev, i);
-
-	if (exit_reset)
-		qdev->reset_state = QAIC_ONLINE;
 }
 
 static void cleanup_qdev(struct qaic_device *qdev)
@@ -338,6 +321,7 @@ static struct qaic_device *create_qdev(struct pci_dev *pdev, const struct pci_de
 	if (!qdev)
 		return NULL;
 
+	qdev->reset_state = QAIC_OFFLINE;
 	if (id->device == PCI_DEV_AIC100) {
 		qdev->num_dbc = 16;
 		qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, sizeof(*qdev->dbc), GFP_KERNEL);
@@ -499,15 +483,21 @@ static int qaic_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto cleanup_qdev;
 	}
 
+	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION);
+	if (ret)
+		goto cleanup_qdev;
+
 	qdev->mhi_cntrl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq,
 						       qdev->single_msi);
 	if (IS_ERR(qdev->mhi_cntrl)) {
 		ret = PTR_ERR(qdev->mhi_cntrl);
-		goto cleanup_qdev;
+		goto cleanup_drm_dev;
 	}
 
 	return 0;
 
+cleanup_drm_dev:
+	qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION);
 cleanup_qdev:
 	cleanup_qdev(qdev);
 	return ret;
@@ -520,7 +510,8 @@ static void qaic_pci_remove(struct pci_dev *pdev)
 	if (!qdev)
 		return;
 
-	qaic_dev_reset_clean_local_state(qdev, false);
+	qaic_dev_reset_clean_local_state(qdev);
+	qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION);
 	qaic_mhi_free_controller(qdev->mhi_cntrl, link_up);
 	cleanup_qdev(qdev);
 }
@@ -543,14 +534,13 @@ static void qaic_pci_reset_prepare(struct pci_dev *pdev)
 
 	qaic_notify_reset(qdev);
 	qaic_mhi_start_reset(qdev->mhi_cntrl);
-	qaic_dev_reset_clean_local_state(qdev, false);
+	qaic_dev_reset_clean_local_state(qdev);
 }
 
 static void qaic_pci_reset_done(struct pci_dev *pdev)
 {
 	struct qaic_device *qdev = pci_get_drvdata(pdev);
 
-	qdev->reset_state = QAIC_ONLINE;
 	qaic_mhi_reset_done(qdev->mhi_cntrl);
 }
 
-- 
2.40.1


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 1/2] accel/qaic: Increase number of in_reset states
  2023-11-17 17:43 ` [PATCH 1/2] accel/qaic: Increase number of in_reset states Jeffrey Hugo
@ 2023-11-20  8:18   ` Jacek Lawrynowicz
  0 siblings, 0 replies; 6+ messages in thread
From: Jacek Lawrynowicz @ 2023-11-20  8:18 UTC (permalink / raw)
  To: dri-devel

Hi,

On 17.11.2023 18:43, Jeffrey Hugo wrote:
> From: Carl Vanderlip <quic_carlv@quicinc.com>
> 
> 'in_reset' holds the state of the device. As part of bringup, the device
> needs to be queried to check if it's in a valid state. Add a new state
> that indicates that the device is coming up, but not ready for users
> yet. Rename to 'reset_state' to better describe the variable.
> 
> Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
> Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> ---
>  drivers/accel/qaic/qaic.h         | 13 +++++++++++--
>  drivers/accel/qaic/qaic_control.c |  5 +++--
>  drivers/accel/qaic/qaic_data.c    | 16 ++++++++--------
>  drivers/accel/qaic/qaic_drv.c     | 12 ++++++------
>  4 files changed, 28 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
> index bc40d52dc010..bd0c884e6bf7 100644
> --- a/drivers/accel/qaic/qaic.h
> +++ b/drivers/accel/qaic/qaic.h
> @@ -31,6 +31,15 @@
>  #define to_drm(qddev) (&(qddev)->drm)
>  #define to_accel_kdev(qddev) (to_drm(qddev)->accel->kdev) /* Return Linux device of accel node */
>  
> +enum __packed reset_states {
> +	/* Device is offline or will be very soon */
> +	QAIC_OFFLINE,
> +	/* Device is booting, not clear if it's in a usable state */
> +	QAIC_BOOT,
> +	/* Device is fully operational */
> +	QAIC_ONLINE,
> +};
> +
>  extern bool datapath_polling;
>  
>  struct qaic_user {
> @@ -121,8 +130,8 @@ struct qaic_device {
>  	struct workqueue_struct	*cntl_wq;
>  	/* Synchronizes all the users of device during cleanup */
>  	struct srcu_struct	dev_lock;
> -	/* true: Device under reset; false: Device not under reset */
> -	bool			in_reset;
> +	/* Track the state of the device during resets */
> +	enum reset_states	reset_state;
I would rename this to dev_state but otherwise:

Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] accel/qaic: Expand DRM device lifecycle
  2023-11-17 17:43 ` [PATCH 2/2] accel/qaic: Expand DRM device lifecycle Jeffrey Hugo
@ 2023-11-21  8:17   ` Jacek Lawrynowicz
  0 siblings, 0 replies; 6+ messages in thread
From: Jacek Lawrynowicz @ 2023-11-21  8:17 UTC (permalink / raw)
  To: dri-devel

Reviewed-by: Jacek Lawrynowicz <jacek.lawrynowicz@linux.intel.com>

On 17.11.2023 18:43, Jeffrey Hugo wrote:
> From: Carl Vanderlip <quic_carlv@quicinc.com>
> 
> Currently the QAIC DRM device registers itself when the MHI QAIC_CONTROL
> channel becomes available. This is when the device is able to process
> workloads. However, the DRM driver also provides the debugfs interface
> bootlog for the device. If the device fails to boot to the QSM (which
> brings up the MHI QAIC_CONTROL channel), the bootlog won't be available for
> debugging why it failed to boot.
> 
> Change when the DRM device registers itself from when QAIC_CONTROL is
> available to when the card is first probed on the PCI bus. Additionally,
> make the DRM driver persist through reset/error cases so the driver
> doesn't have to be reloaded to access the card again. Send
> KOBJ_ONLINE/OFFLINE uevents so userspace can know when DRM device is
> ready to handle requests.
> 
> Signed-off-by: Carl Vanderlip <quic_carlv@quicinc.com>
> Reviewed-by: Pranjal Ramajor Asha Kanojiya <quic_pkanojiy@quicinc.com>
> Reviewed-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> Signed-off-by: Jeffrey Hugo <quic_jhugo@quicinc.com>
> ---
>  Documentation/accel/qaic/qaic.rst   |  9 +++++-
>  drivers/accel/qaic/mhi_controller.c |  2 +-
>  drivers/accel/qaic/qaic.h           |  2 +-
>  drivers/accel/qaic/qaic_drv.c       | 44 +++++++++++------------------
>  4 files changed, 27 insertions(+), 30 deletions(-)
> 
> diff --git a/Documentation/accel/qaic/qaic.rst b/Documentation/accel/qaic/qaic.rst
> index f81020736ebf..efb7771273bb 100644
> --- a/Documentation/accel/qaic/qaic.rst
> +++ b/Documentation/accel/qaic/qaic.rst
> @@ -93,8 +93,15 @@ commands (does not impact QAIC).
>  uAPI
>  ====
>  
> +QAIC creates an accel device per phsyical PCIe device. This accel device exists
> +for as long as the PCIe device is known to Linux.
> +
> +The PCIe device may not be in the state to accept requests from userspace at
> +all times. QAIC will trigger KOBJ_ONLINE/OFFLINE uevents to advertise when the
> +device can accept requests (ONLINE) and when the device is no longer accepting
> +requests (OFFLINE) because of a reset or other state transition.
> +
>  QAIC defines a number of driver specific IOCTLs as part of the userspace API.
> -This section describes those APIs.
>  
>  DRM_IOCTL_QAIC_MANAGE
>    This IOCTL allows userspace to send a NNC request to the QSM. The call will
> diff --git a/drivers/accel/qaic/mhi_controller.c b/drivers/accel/qaic/mhi_controller.c
> index 5d3cc30009cc..832464f2833a 100644
> --- a/drivers/accel/qaic/mhi_controller.c
> +++ b/drivers/accel/qaic/mhi_controller.c
> @@ -469,7 +469,7 @@ static void mhi_status_cb(struct mhi_controller *mhi_cntrl, enum mhi_callback re
>  		pci_err(qdev->pdev, "Fatal error received from device. Attempting to recover\n");
>  	/* this event occurs in non-atomic context */
>  	if (reason == MHI_CB_SYS_ERROR)
> -		qaic_dev_reset_clean_local_state(qdev, true);
> +		qaic_dev_reset_clean_local_state(qdev);
>  }
>  
>  static int mhi_reset_and_async_power_up(struct mhi_controller *mhi_cntrl)
> diff --git a/drivers/accel/qaic/qaic.h b/drivers/accel/qaic/qaic.h
> index bd0c884e6bf7..66f4abf6c4c4 100644
> --- a/drivers/accel/qaic/qaic.h
> +++ b/drivers/accel/qaic/qaic.h
> @@ -283,7 +283,7 @@ void wakeup_dbc(struct qaic_device *qdev, u32 dbc_id);
>  void release_dbc(struct qaic_device *qdev, u32 dbc_id);
>  
>  void wake_all_cntl(struct qaic_device *qdev);
> -void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset);
> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev);
>  
>  struct drm_gem_object *qaic_gem_prime_import(struct drm_device *dev, struct dma_buf *dma_buf);
>  
> diff --git a/drivers/accel/qaic/qaic_drv.c b/drivers/accel/qaic/qaic_drv.c
> index 02fe23248da4..c19bc83b249c 100644
> --- a/drivers/accel/qaic/qaic_drv.c
> +++ b/drivers/accel/qaic/qaic_drv.c
> @@ -8,6 +8,7 @@
>  #include <linux/idr.h>
>  #include <linux/interrupt.h>
>  #include <linux/list.h>
> +#include <linux/kobject.h>
>  #include <linux/kref.h>
>  #include <linux/mhi.h>
>  #include <linux/module.h>
> @@ -43,9 +44,6 @@ MODULE_PARM_DESC(datapath_polling, "Operate the datapath in polling mode");
>  static bool link_up;
>  static DEFINE_IDA(qaic_usrs);
>  
> -static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id);
> -static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id);
> -
>  static void free_usr(struct kref *kref)
>  {
>  	struct qaic_user *usr = container_of(kref, struct qaic_user, ref_count);
> @@ -183,13 +181,6 @@ static int qaic_create_drm_device(struct qaic_device *qdev, s32 partition_id)
>  
>  	qddev->partition_id = partition_id;
>  
> -	/*
> -	 * drm_dev_unregister() sets the driver data to NULL and
> -	 * drm_dev_register() does not update the driver data. During a SOC
> -	 * reset drm dev is unregistered and registered again leaving the
> -	 * driver data to NULL.
> -	 */
> -	dev_set_drvdata(to_accel_kdev(qddev), drm->accel);
>  	ret = drm_dev_register(drm, 0);
>  	if (ret)
>  		pci_dbg(qdev->pdev, "drm_dev_register failed %d\n", ret);
> @@ -203,7 +194,6 @@ static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id)
>  	struct drm_device *drm = to_drm(qddev);
>  	struct qaic_user *usr;
>  
> -	drm_dev_get(drm);
>  	drm_dev_unregister(drm);
>  	qddev->partition_id = 0;
>  	/*
> @@ -232,7 +222,6 @@ static void qaic_destroy_drm_device(struct qaic_device *qdev, s32 partition_id)
>  		mutex_lock(&qddev->users_mutex);
>  	}
>  	mutex_unlock(&qddev->users_mutex);
> -	drm_dev_put(drm);
>  }
>  
>  static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id *id)
> @@ -254,8 +243,6 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id
>  
>  	qdev = pci_get_drvdata(to_pci_dev(mhi_dev->mhi_cntrl->cntrl_dev));
>  
> -	qdev->reset_state = QAIC_ONLINE;
> -
>  	dev_set_drvdata(&mhi_dev->dev, qdev);
>  	qdev->cntl_ch = mhi_dev;
>  
> @@ -265,6 +252,7 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id
>  		return ret;
>  	}
>  
> +	qdev->reset_state = QAIC_BOOT;
>  	ret = get_cntl_version(qdev, NULL, &major, &minor);
>  	if (ret || major != CNTL_MAJOR || minor > CNTL_MINOR) {
>  		pci_err(qdev->pdev, "%s: Control protocol version (%d.%d) not supported. Supported version is (%d.%d). Ret: %d\n",
> @@ -272,8 +260,8 @@ static int qaic_mhi_probe(struct mhi_device *mhi_dev, const struct mhi_device_id
>  		ret = -EINVAL;
>  		goto close_control;
>  	}
> -
> -	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION);
> +	qdev->reset_state = QAIC_ONLINE;
> +	kobject_uevent(&(to_accel_kdev(qdev->qddev))->kobj, KOBJ_ONLINE);
>  
>  	return ret;
>  
> @@ -291,6 +279,7 @@ static void qaic_notify_reset(struct qaic_device *qdev)
>  {
>  	int i;
>  
> +	kobject_uevent(&(to_accel_kdev(qdev->qddev))->kobj, KOBJ_OFFLINE);
>  	qdev->reset_state = QAIC_OFFLINE;
>  	/* wake up any waiters to avoid waiting for timeouts at sync */
>  	wake_all_cntl(qdev);
> @@ -299,21 +288,15 @@ static void qaic_notify_reset(struct qaic_device *qdev)
>  	synchronize_srcu(&qdev->dev_lock);
>  }
>  
> -void qaic_dev_reset_clean_local_state(struct qaic_device *qdev, bool exit_reset)
> +void qaic_dev_reset_clean_local_state(struct qaic_device *qdev)
>  {
>  	int i;
>  
>  	qaic_notify_reset(qdev);
>  
> -	/* remove drmdevs to prevent new users from coming in */
> -	qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION);
> -
>  	/* start tearing things down */
>  	for (i = 0; i < qdev->num_dbc; ++i)
>  		release_dbc(qdev, i);
> -
> -	if (exit_reset)
> -		qdev->reset_state = QAIC_ONLINE;
>  }
>  
>  static void cleanup_qdev(struct qaic_device *qdev)
> @@ -338,6 +321,7 @@ static struct qaic_device *create_qdev(struct pci_dev *pdev, const struct pci_de
>  	if (!qdev)
>  		return NULL;
>  
> +	qdev->reset_state = QAIC_OFFLINE;
>  	if (id->device == PCI_DEV_AIC100) {
>  		qdev->num_dbc = 16;
>  		qdev->dbc = devm_kcalloc(&pdev->dev, qdev->num_dbc, sizeof(*qdev->dbc), GFP_KERNEL);
> @@ -499,15 +483,21 @@ static int qaic_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>  		goto cleanup_qdev;
>  	}
>  
> +	ret = qaic_create_drm_device(qdev, QAIC_NO_PARTITION);
> +	if (ret)
> +		goto cleanup_qdev;
> +
>  	qdev->mhi_cntrl = qaic_mhi_register_controller(pdev, qdev->bar_0, mhi_irq,
>  						       qdev->single_msi);
>  	if (IS_ERR(qdev->mhi_cntrl)) {
>  		ret = PTR_ERR(qdev->mhi_cntrl);
> -		goto cleanup_qdev;
> +		goto cleanup_drm_dev;
>  	}
>  
>  	return 0;
>  
> +cleanup_drm_dev:
> +	qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION);
>  cleanup_qdev:
>  	cleanup_qdev(qdev);
>  	return ret;
> @@ -520,7 +510,8 @@ static void qaic_pci_remove(struct pci_dev *pdev)
>  	if (!qdev)
>  		return;
>  
> -	qaic_dev_reset_clean_local_state(qdev, false);
> +	qaic_dev_reset_clean_local_state(qdev);
> +	qaic_destroy_drm_device(qdev, QAIC_NO_PARTITION);
>  	qaic_mhi_free_controller(qdev->mhi_cntrl, link_up);
>  	cleanup_qdev(qdev);
>  }
> @@ -543,14 +534,13 @@ static void qaic_pci_reset_prepare(struct pci_dev *pdev)
>  
>  	qaic_notify_reset(qdev);
>  	qaic_mhi_start_reset(qdev->mhi_cntrl);
> -	qaic_dev_reset_clean_local_state(qdev, false);
> +	qaic_dev_reset_clean_local_state(qdev);
>  }
>  
>  static void qaic_pci_reset_done(struct pci_dev *pdev)
>  {
>  	struct qaic_device *qdev = pci_get_drvdata(pdev);
>  
> -	qdev->reset_state = QAIC_ONLINE;
>  	qaic_mhi_reset_done(qdev->mhi_cntrl);
>  }
>  

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH 0/2] Convert to persistent DRM devices
  2023-11-17 17:43 [PATCH 0/2] Convert to persistent DRM devices Jeffrey Hugo
  2023-11-17 17:43 ` [PATCH 1/2] accel/qaic: Increase number of in_reset states Jeffrey Hugo
  2023-11-17 17:43 ` [PATCH 2/2] accel/qaic: Expand DRM device lifecycle Jeffrey Hugo
@ 2023-12-01 17:38 ` Jeffrey Hugo
  2 siblings, 0 replies; 6+ messages in thread
From: Jeffrey Hugo @ 2023-12-01 17:38 UTC (permalink / raw)
  To: quic_carlv, quic_pkanojiy, stanislaw.gruszka, ogabbay
  Cc: linux-arm-msm, dri-devel

On 11/17/2023 10:43 AM, Jeffrey Hugo wrote:
> The qaic driver currently creates and destroys the DRM devices when the
> qaic device is in an operational state for userspace. This does not match
> what other DRM drivers do, and leads to a few race conditions that need
> to be handled.
> 
> Instead, create the DRM device when the underlying PCIe device is detected
> and destroy the DRM device when the underlying device disappears.
> 
> Use KOBJ_ONLINE/OFFLINE udev events to signal to userspace when the
> underlying device is ready to accept requests, or has entered a reset
> state.
> 
> Carl Vanderlip (2):
>    accel/qaic: Increase number of in_reset states
>    accel/qaic: Expand DRM device lifecycle
> 
>   Documentation/accel/qaic/qaic.rst   |  9 +++++-
>   drivers/accel/qaic/mhi_controller.c |  2 +-
>   drivers/accel/qaic/qaic.h           | 15 +++++++--
>   drivers/accel/qaic/qaic_control.c   |  5 +--
>   drivers/accel/qaic/qaic_data.c      | 16 ++++-----
>   drivers/accel/qaic/qaic_drv.c       | 50 ++++++++++++-----------------
>   6 files changed, 52 insertions(+), 45 deletions(-)
> 

Pushed to drm-misc-next

-Jeff

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2023-12-01 17:38 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-11-17 17:43 [PATCH 0/2] Convert to persistent DRM devices Jeffrey Hugo
2023-11-17 17:43 ` [PATCH 1/2] accel/qaic: Increase number of in_reset states Jeffrey Hugo
2023-11-20  8:18   ` Jacek Lawrynowicz
2023-11-17 17:43 ` [PATCH 2/2] accel/qaic: Expand DRM device lifecycle Jeffrey Hugo
2023-11-21  8:17   ` Jacek Lawrynowicz
2023-12-01 17:38 ` [PATCH 0/2] Convert to persistent DRM devices Jeffrey Hugo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).