linux-arm-msm.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] mhi: pci_generic: Misc improvements
@ 2020-11-13 14:59 Loic Poulain
  2020-11-13 14:59 ` [PATCH 1/8] mhi: pci-generic: Increase number of hardware events Loic Poulain
                   ` (7 more replies)
  0 siblings, 8 replies; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 14:59 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

This series adjust some configuration values to ensure stability and
robustness of mhi pci devices (timeout, number of events, burst mode).

It also includes support for system sleep as well as a recovery procedure
that can be triggered when a PCI error is reported, either by PCI AER or by
the new health-check mechanism.

All these changes have been tested with Telit FN980m module.

Loic Poulain (8):
  mhi: pci-generic: Increase number of hardware events
  mhi: pci-generic: Perform hard reset on remove
  mhi: pci_generic: Enable burst mode for hardware channels
  mhi: pci_generic: Add support for reset
  mhi: pci_generic: Add suspend/resume/recovery procedure
  mhi: pci_generic: Add PCI error handlers
  mhi: pci_generic: Add health-check
  mhi: pci_generic: Increase controller timeout value

 drivers/bus/mhi/pci_generic.c | 352 ++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 335 insertions(+), 17 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 27+ messages in thread

* [PATCH 1/8] mhi: pci-generic: Increase number of hardware events
  2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
@ 2020-11-13 14:59 ` Loic Poulain
  2020-11-13 16:56   ` Bhaumik Bhatt
  2020-11-18  3:54   ` Hemant Kumar
  2020-11-13 14:59 ` [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove Loic Poulain
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 14:59 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

If the IPA (IP hardware accelerator) is starved of event ring elements,
the modem is crashing (SDX55). That can be prevented by setting a
larger number of events (i.e 2 x number of channel ring elements).

Tested with FN980m module.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/pci_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index e3df838..13a7e4f 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -91,7 +91,7 @@ struct mhi_pci_dev_info {
 
 #define MHI_EVENT_CONFIG_HW_DATA(ev_ring, ch_num) \
 	{					\
-		.num_elements = 128,		\
+		.num_elements = 256,		\
 		.irq_moderation_ms = 5,		\
 		.irq = (ev_ring) + 1,		\
 		.priority = 1,			\
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove
  2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
  2020-11-13 14:59 ` [PATCH 1/8] mhi: pci-generic: Increase number of hardware events Loic Poulain
@ 2020-11-13 14:59 ` Loic Poulain
  2020-11-19  1:46   ` Hemant Kumar
  2020-11-13 14:59 ` [PATCH 3/8] mhi: pci_generic: Enable burst mode for hardware channels Loic Poulain
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 14:59 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

Ensure that the device is hard-reset on remove to restore its initial
state and avoid further issues on subsequent probe.

This has been tested with Telit FN980m module.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/pci_generic.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 13a7e4f..09c6b26 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -15,6 +15,8 @@
 
 #define MHI_PCI_DEFAULT_BAR_NUM 0
 
+#define DEV_RESET_REG (0xB0)
+
 /**
  * struct mhi_pci_dev_info - MHI PCI device specific information
  * @config: MHI controller configuration
@@ -166,6 +168,11 @@ static void mhi_pci_status_cb(struct mhi_controller *mhi_cntrl,
 	/* Nothing to do for now */
 }
 
+static inline void mhi_pci_reset(struct mhi_controller *mhi_cntrl)
+{
+	writel(1, mhi_cntrl->regs + DEV_RESET_REG);
+}
+
 static int mhi_pci_claim(struct mhi_controller *mhi_cntrl,
 			 unsigned int bar_num, u64 dma_mask)
 {
@@ -329,6 +336,10 @@ static void mhi_pci_remove(struct pci_dev *pdev)
 	mhi_power_down(mhi_cntrl, true);
 	mhi_unprepare_after_power_down(mhi_cntrl);
 	mhi_unregister_controller(mhi_cntrl);
+
+	/* MHI-layer reset could not be enough, always hard-reset the device */
+	mhi_pci_reset(mhi_cntrl);
+
 	mhi_free_controller(mhi_cntrl);
 }
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 3/8] mhi: pci_generic: Enable burst mode for hardware channels
  2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
  2020-11-13 14:59 ` [PATCH 1/8] mhi: pci-generic: Increase number of hardware events Loic Poulain
  2020-11-13 14:59 ` [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove Loic Poulain
@ 2020-11-13 14:59 ` Loic Poulain
  2020-11-13 17:06   ` Bhaumik Bhatt
  2020-11-13 14:59 ` [PATCH 4/8] mhi: pci_generic: Add support for reset Loic Poulain
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 14:59 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

Hardware channels have a feature called burst mode that allows to
queue transfer ring element(s) (TRE) to a channel without ringing
the device doorbell. In that mode, the device is polling the channel
context for new elements. This reduces the frequency of host initiated
doorbells and increase throughput.

Create a new dedicated macro for hardware channels with burst enabled.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/pci_generic.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 09c6b26..0c07cf5 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -78,6 +78,36 @@ struct mhi_pci_dev_info {
 		.offload_channel = false,	\
 	}
 
+#define MHI_CHANNEL_CONFIG_HW_UL(ch_num, ch_name, el_count, ev_ring) \
+	{						\
+		.num = ch_num,				\
+		.name = ch_name,			\
+		.num_elements = el_count,		\
+		.event_ring = ev_ring,			\
+		.dir = DMA_TO_DEVICE,			\
+		.ee_mask = BIT(MHI_EE_AMSS),		\
+		.pollcfg = 0,				\
+		.doorbell = MHI_DB_BRST_ENABLE,	\
+		.lpm_notify = false,			\
+		.offload_channel = false,		\
+		.doorbell_mode_switch = false,		\
+	}						\
+
+#define MHI_CHANNEL_CONFIG_HW_DL(ch_num, ch_name, el_count, ev_ring) \
+	{						\
+		.num = ch_num,				\
+		.name = ch_name,			\
+		.num_elements = el_count,		\
+		.event_ring = ev_ring,			\
+		.dir = DMA_FROM_DEVICE,			\
+		.ee_mask = BIT(MHI_EE_AMSS),		\
+		.pollcfg = 0,				\
+		.doorbell = MHI_DB_BRST_ENABLE,	\
+		.lpm_notify = false,			\
+		.offload_channel = false,		\
+		.doorbell_mode_switch = false,		\
+	}
+
 #define MHI_EVENT_CONFIG_DATA(ev_ring)		\
 	{					\
 		.num_elements = 128,		\
@@ -112,8 +142,8 @@ static const struct mhi_channel_config modem_qcom_v1_mhi_channels[] = {
 	MHI_CHANNEL_CONFIG_DL(15, "QMI", 4, 0),
 	MHI_CHANNEL_CONFIG_UL(20, "IPCR", 8, 0),
 	MHI_CHANNEL_CONFIG_DL(21, "IPCR", 8, 0),
-	MHI_CHANNEL_CONFIG_UL(100, "IP_HW0", 128, 1),
-	MHI_CHANNEL_CONFIG_DL(101, "IP_HW0", 128, 2),
+	MHI_CHANNEL_CONFIG_HW_UL(100, "IP_HW0", 128, 1),
+	MHI_CHANNEL_CONFIG_HW_DL(101, "IP_HW0", 128, 2),
 };
 
 static const struct mhi_event_config modem_qcom_v1_mhi_events[] = {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 4/8] mhi: pci_generic: Add support for reset
  2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
                   ` (2 preceding siblings ...)
  2020-11-13 14:59 ` [PATCH 3/8] mhi: pci_generic: Enable burst mode for hardware channels Loic Poulain
@ 2020-11-13 14:59 ` Loic Poulain
  2020-11-13 17:12   ` Bhaumik Bhatt
  2020-11-13 14:59 ` [PATCH 5/8] mhi: pci_generic: Add suspend/resume/recovery procedure Loic Poulain
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 14:59 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

Add support for resetting the device, reset can be triggered in case
of error or manually via sysfs (/sys/bus/pci/devices/*/reset).

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/pci_generic.c | 117 +++++++++++++++++++++++++++++++++++++-----
 1 file changed, 104 insertions(+), 13 deletions(-)

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 0c07cf5..b48c382 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -8,6 +8,7 @@
  * Copyright (C) 2020 Linaro Ltd <loic.poulain@linaro.org>
  */
 
+#include <linux/delay.h>
 #include <linux/device.h>
 #include <linux/mhi.h>
 #include <linux/module.h>
@@ -179,6 +180,16 @@ static const struct pci_device_id mhi_pci_id_table[] = {
 };
 MODULE_DEVICE_TABLE(pci, mhi_pci_id_table);
 
+enum mhi_pci_device_status {
+	MHI_PCI_DEV_STARTED,
+};
+
+struct mhi_pci_device {
+	struct mhi_controller mhi_cntrl;
+	struct pci_saved_state *pci_state;
+	unsigned long status;
+};
+
 static int mhi_pci_read_reg(struct mhi_controller *mhi_cntrl,
 			    void __iomem *addr, u32 *out)
 {
@@ -203,6 +214,20 @@ static inline void mhi_pci_reset(struct mhi_controller *mhi_cntrl)
 	writel(1, mhi_cntrl->regs + DEV_RESET_REG);
 }
 
+static bool mhi_pci_is_alive(struct mhi_controller *mhi_cntrl)
+{
+	struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
+	u16 vendor = 0;
+
+	if (pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor))
+		return false;
+
+	if (vendor == (u16) ~0 || vendor == 0)
+		return false;
+
+	return true;
+}
+
 static int mhi_pci_claim(struct mhi_controller *mhi_cntrl,
 			 unsigned int bar_num, u64 dma_mask)
 {
@@ -298,16 +323,18 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *) id->driver_data;
 	const struct mhi_controller_config *mhi_cntrl_config;
+	struct mhi_pci_device *mhi_pdev;
 	struct mhi_controller *mhi_cntrl;
 	int err;
 
 	dev_dbg(&pdev->dev, "MHI PCI device found: %s\n", info->name);
 
-	mhi_cntrl = mhi_alloc_controller();
-	if (!mhi_cntrl)
+	mhi_pdev = devm_kzalloc(&pdev->dev, sizeof(*mhi_pdev), GFP_KERNEL);
+	if (!mhi_pdev)
 		return -ENOMEM;
 
 	mhi_cntrl_config = info->config;
+	mhi_cntrl = &mhi_pdev->mhi_cntrl;
 	mhi_cntrl->cntrl_dev = &pdev->dev;
 	mhi_cntrl->iova_start = 0;
 	mhi_cntrl->iova_stop = DMA_BIT_MASK(info->dma_data_width);
@@ -322,17 +349,21 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	err = mhi_pci_claim(mhi_cntrl, info->bar_num, DMA_BIT_MASK(info->dma_data_width));
 	if (err)
-		goto err_release;
+		return err;
 
 	err = mhi_pci_get_irqs(mhi_cntrl, mhi_cntrl_config);
 	if (err)
-		goto err_release;
+		return err;
+
+	pci_set_drvdata(pdev, mhi_pdev);
 
-	pci_set_drvdata(pdev, mhi_cntrl);
+	/* Have stored pci confspace at hand for restore in sudden PCI error */
+	pci_save_state(pdev);
+	mhi_pdev->pci_state = pci_store_saved_state(pdev);
 
 	err = mhi_register_controller(mhi_cntrl, mhi_cntrl_config);
 	if (err)
-		goto err_release;
+		return err;
 
 	/* MHI bus does not power up the controller by default */
 	err = mhi_prepare_for_power_up(mhi_cntrl);
@@ -347,37 +378,97 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		goto err_unprepare;
 	}
 
+	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
+
 	return 0;
 
 err_unprepare:
 	mhi_unprepare_after_power_down(mhi_cntrl);
 err_unregister:
 	mhi_unregister_controller(mhi_cntrl);
-err_release:
-	mhi_free_controller(mhi_cntrl);
 
 	return err;
 }
 
 static void mhi_pci_remove(struct pci_dev *pdev)
 {
-	struct mhi_controller *mhi_cntrl = pci_get_drvdata(pdev);
+	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
+	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+
+	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
+		mhi_power_down(mhi_cntrl, true);
+		mhi_unprepare_after_power_down(mhi_cntrl);
+	}
 
-	mhi_power_down(mhi_cntrl, true);
-	mhi_unprepare_after_power_down(mhi_cntrl);
 	mhi_unregister_controller(mhi_cntrl);
 
 	/* MHI-layer reset could not be enough, always hard-reset the device */
 	mhi_pci_reset(mhi_cntrl);
+}
+
+void mhi_pci_reset_prepare(struct pci_dev *pdev)
+{
+	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
+	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+
+	dev_info(&pdev->dev, "reset\n");
 
-	mhi_free_controller(mhi_cntrl);
+	/* Clean up MHI state */
+	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
+		mhi_power_down(mhi_cntrl, false);
+		mhi_unprepare_after_power_down(mhi_cntrl);
+	}
+
+	/* cause internal device reset */
+	mhi_pci_reset(mhi_cntrl);
+
+	/* Be sure device reset has been executed */
+	msleep(500);
 }
 
+void mhi_pci_reset_done(struct pci_dev *pdev)
+{
+	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
+	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+	int err;
+
+	/* Restore initial known working PCI state */
+	pci_load_saved_state(pdev, mhi_pdev->pci_state);
+	pci_restore_state(pdev);
+
+	/* Is device status available ? */
+	if (!mhi_pci_is_alive(mhi_cntrl)) {
+		dev_err(&pdev->dev, "reset failed\n");
+		return;
+	}
+
+	err = mhi_prepare_for_power_up(mhi_cntrl);
+	if (err) {
+		dev_err(&pdev->dev, "failed to prepare MHI controller\n");
+		return;
+	}
+
+	err = mhi_sync_power_up(mhi_cntrl);
+	if (err) {
+		dev_err(&pdev->dev, "failed to power up MHI controller\n");
+		mhi_unprepare_after_power_down(mhi_cntrl);
+		return;
+	}
+
+	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
+}
+
+static const struct pci_error_handlers mhi_pci_err_handler = {
+	.reset_prepare = mhi_pci_reset_prepare,
+	.reset_done = mhi_pci_reset_done,
+};
+
 static struct pci_driver mhi_pci_driver = {
 	.name		= "mhi-pci-generic",
 	.id_table	= mhi_pci_id_table,
 	.probe		= mhi_pci_probe,
-	.remove		= mhi_pci_remove
+	.remove		= mhi_pci_remove,
+	.err_handler	= &mhi_pci_err_handler,
 };
 module_pci_driver(mhi_pci_driver);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 5/8] mhi: pci_generic: Add suspend/resume/recovery procedure
  2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
                   ` (3 preceding siblings ...)
  2020-11-13 14:59 ` [PATCH 4/8] mhi: pci_generic: Add support for reset Loic Poulain
@ 2020-11-13 14:59 ` Loic Poulain
  2020-11-19  2:21   ` Hemant Kumar
  2020-11-13 15:00 ` [PATCH 6/8] mhi: pci_generic: Add PCI error handlers Loic Poulain
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 14:59 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

Add support for system wide suspend/resume. During suspend, MHI
device controller must be put in M3 state and PCI bus in D3 state.

Add a recovery procedure allowing to reinitialize the device in case
of error during resume steps, which can happen if device loses power
(and so its context) while system suspend.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/pci_generic.c | 100 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 100 insertions(+)

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index b48c382..75f565b 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -13,6 +13,7 @@
 #include <linux/mhi.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/workqueue.h>
 
 #define MHI_PCI_DEFAULT_BAR_NUM 0
 
@@ -187,6 +188,7 @@ enum mhi_pci_device_status {
 struct mhi_pci_device {
 	struct mhi_controller mhi_cntrl;
 	struct pci_saved_state *pci_state;
+	struct work_struct recovery_work;
 	unsigned long status;
 };
 
@@ -319,6 +321,48 @@ static void mhi_pci_runtime_put(struct mhi_controller *mhi_cntrl)
 	/* no PM for now */
 }
 
+static void mhi_pci_recovery_work(struct work_struct *work)
+{
+	struct mhi_pci_device *mhi_pdev = container_of(work, struct mhi_pci_device,
+						       recovery_work);
+	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+	struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
+	int err;
+
+	dev_warn(&pdev->dev, "device recovery started\n");
+
+	/* Clean up MHI state */
+	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
+		mhi_power_down(mhi_cntrl, false);
+		mhi_unprepare_after_power_down(mhi_cntrl);
+	}
+
+	/* Check if we can recover without full reset */
+	pci_set_power_state(pdev, PCI_D0);
+	pci_load_saved_state(pdev, mhi_pdev->pci_state);
+	pci_restore_state(pdev);
+
+	if (!mhi_pci_is_alive(mhi_cntrl))
+		goto err_try_reset;
+
+	err = mhi_prepare_for_power_up(mhi_cntrl);
+	if (err)
+		goto err_try_reset;
+
+	err = mhi_sync_power_up(mhi_cntrl);
+	if (err)
+		goto err_unprepare;
+
+	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
+	return;
+
+err_unprepare:
+	mhi_unprepare_after_power_down(mhi_cntrl);
+err_try_reset:
+	if (pci_reset_function(pdev))
+		dev_err(&pdev->dev, "Recovery failed\n");
+}
+
 static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *) id->driver_data;
@@ -333,6 +377,8 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	if (!mhi_pdev)
 		return -ENOMEM;
 
+	INIT_WORK(&mhi_pdev->recovery_work, mhi_pci_recovery_work);
+
 	mhi_cntrl_config = info->config;
 	mhi_cntrl = &mhi_pdev->mhi_cntrl;
 	mhi_cntrl->cntrl_dev = &pdev->dev;
@@ -395,6 +441,8 @@ static void mhi_pci_remove(struct pci_dev *pdev)
 	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
 	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
 
+	cancel_work_sync(&mhi_pdev->recovery_work);
+
 	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
 		mhi_power_down(mhi_cntrl, true);
 		mhi_unprepare_after_power_down(mhi_cntrl);
@@ -463,12 +511,64 @@ static const struct pci_error_handlers mhi_pci_err_handler = {
 	.reset_done = mhi_pci_reset_done,
 };
 
+int  __maybe_unused mhi_pci_suspend(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
+	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+
+	/* Transition to M3 state */
+	mhi_pm_suspend(mhi_cntrl);
+
+	pci_save_state(pdev);
+	pci_disable_device(pdev);
+	pci_wake_from_d3(pdev, true);
+	pci_set_power_state(pdev, PCI_D3hot);
+
+	return 0;
+}
+
+static int __maybe_unused mhi_pci_resume(struct device *dev)
+{
+	struct pci_dev *pdev = to_pci_dev(dev);
+	struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
+	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+	int err;
+
+	pci_set_power_state(pdev, PCI_D0);
+	pci_restore_state(pdev);
+	pci_set_master(pdev);
+
+	err = pci_enable_device(pdev);
+	if (err)
+		goto err_recovery;
+
+	/* Exit M3, transition to M0 state */
+	err = mhi_pm_resume(mhi_cntrl);
+	if (err) {
+		dev_err(&pdev->dev, "failed to resume device: %d\n", err);
+		goto err_recovery;
+	}
+
+	return 0;
+
+err_recovery:
+	/* The device may have loose power or crashed, try recovering it */
+	queue_work(system_long_wq, &mhi_pdev->recovery_work);
+	return 0;
+}
+
+static const struct dev_pm_ops mhi_pci_pm_ops = {
+	SET_SYSTEM_SLEEP_PM_OPS(mhi_pci_suspend, mhi_pci_resume)
+};
+
 static struct pci_driver mhi_pci_driver = {
 	.name		= "mhi-pci-generic",
 	.id_table	= mhi_pci_id_table,
 	.probe		= mhi_pci_probe,
 	.remove		= mhi_pci_remove,
 	.err_handler	= &mhi_pci_err_handler,
+	.driver.pm	= &mhi_pci_pm_ops
 };
 module_pci_driver(mhi_pci_driver);
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 6/8] mhi: pci_generic: Add PCI error handlers
  2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
                   ` (4 preceding siblings ...)
  2020-11-13 14:59 ` [PATCH 5/8] mhi: pci_generic: Add suspend/resume/recovery procedure Loic Poulain
@ 2020-11-13 15:00 ` Loic Poulain
  2020-11-13 15:00 ` [PATCH 7/8] mhi: pci_generic: Add health-check Loic Poulain
  2020-11-13 15:00 ` [PATCH 8/8] mhi: pci_generic: Increase controller timeout value Loic Poulain
  7 siblings, 0 replies; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 15:00 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

In AER capable root complex, errors are reported to the host which
can then act accordingly and perform PCI recovering procedure.

This patch enables error reporting and implements error_detected,
slot_reset and resume callbacks.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/pci_generic.c | 50 +++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 50 insertions(+)

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 75f565b..a7fbba5 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -8,6 +8,7 @@
  * Copyright (C) 2020 Linaro Ltd <loic.poulain@linaro.org>
  */
 
+#include <linux/aer.h>
 #include <linux/delay.h>
 #include <linux/device.h>
 #include <linux/mhi.h>
@@ -407,6 +408,8 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 	pci_save_state(pdev);
 	mhi_pdev->pci_state = pci_store_saved_state(pdev);
 
+	pci_enable_pcie_error_reporting(pdev);
+
 	err = mhi_register_controller(mhi_cntrl, mhi_cntrl_config);
 	if (err)
 		return err;
@@ -506,7 +509,54 @@ void mhi_pci_reset_done(struct pci_dev *pdev)
 	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
 }
 
+static pci_ers_result_t mhi_pci_error_detected(struct pci_dev *pdev,
+					       pci_channel_state_t state)
+{
+	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
+	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+
+	dev_err(&pdev->dev, "PCI error detected, state = %u\n", state);
+
+	if (state == pci_channel_io_perm_failure)
+		return PCI_ERS_RESULT_DISCONNECT;
+
+	/* Clean up MHI state */
+	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
+		mhi_power_down(mhi_cntrl, false);
+		mhi_unprepare_after_power_down(mhi_cntrl);
+	} else {
+		/* Nothing to do */
+		return PCI_ERS_RESULT_RECOVERED;
+	}
+
+	pci_disable_device(pdev);
+
+	return PCI_ERS_RESULT_NEED_RESET;
+}
+
+static pci_ers_result_t mhi_pci_slot_reset(struct pci_dev *pdev)
+{
+	if (pci_enable_device(pdev)) {
+		dev_err(&pdev->dev, "Cannot re-enable PCI device after reset.\n");
+		return PCI_ERS_RESULT_DISCONNECT;
+	}
+
+	return PCI_ERS_RESULT_RECOVERED;
+}
+
+static void mhi_pci_io_resume(struct pci_dev *pdev)
+{
+	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
+
+	dev_err(&pdev->dev, "PCI slot reset done\n");
+
+	queue_work(system_long_wq, &mhi_pdev->recovery_work);
+}
+
 static const struct pci_error_handlers mhi_pci_err_handler = {
+	.error_detected = mhi_pci_error_detected,
+	.slot_reset = mhi_pci_slot_reset,
+	.resume = mhi_pci_io_resume,
 	.reset_prepare = mhi_pci_reset_prepare,
 	.reset_done = mhi_pci_reset_done,
 };
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 7/8] mhi: pci_generic: Add health-check
  2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
                   ` (5 preceding siblings ...)
  2020-11-13 15:00 ` [PATCH 6/8] mhi: pci_generic: Add PCI error handlers Loic Poulain
@ 2020-11-13 15:00 ` Loic Poulain
  2020-11-13 15:00 ` [PATCH 8/8] mhi: pci_generic: Increase controller timeout value Loic Poulain
  7 siblings, 0 replies; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 15:00 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

If the modem crashes for any reason, we may not be able to detect
it at MHI level (MHI registers not reachable anymore).

This patch implements a health-check mechanism to check regularly
that device is alive (MHI layer can communicate with). If device
is not alive (because a crash or unexpected reset), the recovery
procedure is triggered.

Tested successfully with Telit FN980m module.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/pci_generic.c | 36 ++++++++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index a7fbba5..6cc7bb9 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -14,12 +14,15 @@
 #include <linux/mhi.h>
 #include <linux/module.h>
 #include <linux/pci.h>
+#include <linux/timer.h>
 #include <linux/workqueue.h>
 
 #define MHI_PCI_DEFAULT_BAR_NUM 0
 
 #define DEV_RESET_REG (0xB0)
 
+#define HEALTH_CHECK_PERIOD (HZ * 5)
+
 /**
  * struct mhi_pci_dev_info - MHI PCI device specific information
  * @config: MHI controller configuration
@@ -190,6 +193,7 @@ struct mhi_pci_device {
 	struct mhi_controller mhi_cntrl;
 	struct pci_saved_state *pci_state;
 	struct work_struct recovery_work;
+	struct timer_list health_check_timer;
 	unsigned long status;
 };
 
@@ -332,6 +336,8 @@ static void mhi_pci_recovery_work(struct work_struct *work)
 
 	dev_warn(&pdev->dev, "device recovery started\n");
 
+	del_timer(&mhi_pdev->health_check_timer);
+
 	/* Clean up MHI state */
 	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
 		mhi_power_down(mhi_cntrl, false);
@@ -355,6 +361,7 @@ static void mhi_pci_recovery_work(struct work_struct *work)
 		goto err_unprepare;
 
 	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
+	mod_timer(&mhi_pdev->health_check_timer, jiffies + HEALTH_CHECK_PERIOD);
 	return;
 
 err_unprepare:
@@ -364,6 +371,21 @@ static void mhi_pci_recovery_work(struct work_struct *work)
 		dev_err(&pdev->dev, "Recovery failed\n");
 }
 
+static void health_check(struct timer_list *t)
+{
+	struct mhi_pci_device *mhi_pdev = from_timer(mhi_pdev, t, health_check_timer);
+	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
+
+	if (!mhi_pci_is_alive(mhi_cntrl)) {
+		dev_err(mhi_cntrl->cntrl_dev, "Device died\n");
+		queue_work(system_long_wq, &mhi_pdev->recovery_work);
+		return;
+	}
+
+	/* reschedule in two seconds */
+	mod_timer(&mhi_pdev->health_check_timer, jiffies + HEALTH_CHECK_PERIOD);
+}
+
 static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 {
 	const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *) id->driver_data;
@@ -379,6 +401,7 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 		return -ENOMEM;
 
 	INIT_WORK(&mhi_pdev->recovery_work, mhi_pci_recovery_work);
+	timer_setup(&mhi_pdev->health_check_timer, health_check, 0);
 
 	mhi_cntrl_config = info->config;
 	mhi_cntrl = &mhi_pdev->mhi_cntrl;
@@ -429,6 +452,9 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
 
 	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
 
+	/* start health check */
+	mod_timer(&mhi_pdev->health_check_timer, jiffies + HEALTH_CHECK_PERIOD);
+
 	return 0;
 
 err_unprepare:
@@ -444,6 +470,7 @@ static void mhi_pci_remove(struct pci_dev *pdev)
 	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
 	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
 
+	del_timer(&mhi_pdev->health_check_timer);
 	cancel_work_sync(&mhi_pdev->recovery_work);
 
 	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
@@ -464,6 +491,8 @@ void mhi_pci_reset_prepare(struct pci_dev *pdev)
 
 	dev_info(&pdev->dev, "reset\n");
 
+	del_timer(&mhi_pdev->health_check_timer);
+
 	/* Clean up MHI state */
 	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
 		mhi_power_down(mhi_cntrl, false);
@@ -507,6 +536,7 @@ void mhi_pci_reset_done(struct pci_dev *pdev)
 	}
 
 	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
+	mod_timer(&mhi_pdev->health_check_timer, jiffies + HEALTH_CHECK_PERIOD);
 }
 
 static pci_ers_result_t mhi_pci_error_detected(struct pci_dev *pdev,
@@ -567,6 +597,9 @@ int  __maybe_unused mhi_pci_suspend(struct device *dev)
 	struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
 	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
 
+	/* Suspend health check */
+	del_timer(&mhi_pdev->health_check_timer);
+
 	/* Transition to M3 state */
 	mhi_pm_suspend(mhi_cntrl);
 
@@ -600,6 +633,9 @@ static int __maybe_unused mhi_pci_resume(struct device *dev)
 		goto err_recovery;
 	}
 
+	/* Resume health check */
+	mod_timer(&mhi_pdev->health_check_timer, jiffies + HEALTH_CHECK_PERIOD);
+
 	return 0;
 
 err_recovery:
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* [PATCH 8/8] mhi: pci_generic: Increase controller timeout value
  2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
                   ` (6 preceding siblings ...)
  2020-11-13 15:00 ` [PATCH 7/8] mhi: pci_generic: Add health-check Loic Poulain
@ 2020-11-13 15:00 ` Loic Poulain
  2020-11-13 17:08   ` Bhaumik Bhatt
  2020-11-25 20:23   ` Hemant Kumar
  7 siblings, 2 replies; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 15:00 UTC (permalink / raw)
  To: manivannan.sadhasivam, hemantk; +Cc: linux-arm-msm, Loic Poulain

On cold boot, device can take slightly more than 5 seconds to start.
Increase the timeout to prevent MHI power-up issues.

Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
---
 drivers/bus/mhi/pci_generic.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
index 6cc7bb9..7f0068c 100644
--- a/drivers/bus/mhi/pci_generic.c
+++ b/drivers/bus/mhi/pci_generic.c
@@ -162,7 +162,7 @@ static const struct mhi_event_config modem_qcom_v1_mhi_events[] = {
 
 static const struct mhi_controller_config modem_qcom_v1_mhiv_config = {
 	.max_channels = 128,
-	.timeout_ms = 5000,
+	.timeout_ms = 8000,
 	.num_channels = ARRAY_SIZE(modem_qcom_v1_mhi_channels),
 	.ch_cfg = modem_qcom_v1_mhi_channels,
 	.num_events = ARRAY_SIZE(modem_qcom_v1_mhi_events),
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] mhi: pci-generic: Increase number of hardware events
  2020-11-13 14:59 ` [PATCH 1/8] mhi: pci-generic: Increase number of hardware events Loic Poulain
@ 2020-11-13 16:56   ` Bhaumik Bhatt
  2020-11-18  3:54   ` Hemant Kumar
  1 sibling, 0 replies; 27+ messages in thread
From: Bhaumik Bhatt @ 2020-11-13 16:56 UTC (permalink / raw)
  To: Loic Poulain; +Cc: manivannan.sadhasivam, hemantk, linux-arm-msm

On 2020-11-13 06:59, Loic Poulain wrote:
> If the IPA (IP hardware accelerator) is starved of event ring elements,
> the modem is crashing (SDX55). That can be prevented by setting a
> larger number of events (i.e 2 x number of channel ring elements).
> 
> Tested with FN980m module.
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
> ---
>  drivers/bus/mhi/pci_generic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/mhi/pci_generic.c 
> b/drivers/bus/mhi/pci_generic.c
> index e3df838..13a7e4f 100644
> --- a/drivers/bus/mhi/pci_generic.c
> +++ b/drivers/bus/mhi/pci_generic.c
> @@ -91,7 +91,7 @@ struct mhi_pci_dev_info {
> 
>  #define MHI_EVENT_CONFIG_HW_DATA(ev_ring, ch_num) \
>  	{					\
> -		.num_elements = 128,		\
> +		.num_elements = 256,		\
>  		.irq_moderation_ms = 5,		\
>  		.irq = (ev_ring) + 1,		\
>  		.priority = 1,			\

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 3/8] mhi: pci_generic: Enable burst mode for hardware channels
  2020-11-13 14:59 ` [PATCH 3/8] mhi: pci_generic: Enable burst mode for hardware channels Loic Poulain
@ 2020-11-13 17:06   ` Bhaumik Bhatt
  0 siblings, 0 replies; 27+ messages in thread
From: Bhaumik Bhatt @ 2020-11-13 17:06 UTC (permalink / raw)
  To: Loic Poulain; +Cc: manivannan.sadhasivam, hemantk, linux-arm-msm

Hi Loic,
On 2020-11-13 06:59, Loic Poulain wrote:
> Hardware channels have a feature called burst mode that allows to
> queue transfer ring element(s) (TRE) to a channel without ringing
> the device doorbell. In that mode, the device is polling the channel
> context for new elements. This reduces the frequency of host initiated
> doorbells and increase throughput.
> 
> Create a new dedicated macro for hardware channels with burst enabled.
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> ---
>  drivers/bus/mhi/pci_generic.c | 34 ++++++++++++++++++++++++++++++++--
>  1 file changed, 32 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/bus/mhi/pci_generic.c 
> b/drivers/bus/mhi/pci_generic.c
> index 09c6b26..0c07cf5 100644
> --- a/drivers/bus/mhi/pci_generic.c
> +++ b/drivers/bus/mhi/pci_generic.c
> @@ -78,6 +78,36 @@ struct mhi_pci_dev_info {
>  		.offload_channel = false,	\
>  	}
> 
> +#define MHI_CHANNEL_CONFIG_HW_UL(ch_num, ch_name, el_count, ev_ring) \
> +	{						\
> +		.num = ch_num,				\
> +		.name = ch_name,			\
> +		.num_elements = el_count,		\
> +		.event_ring = ev_ring,			\
> +		.dir = DMA_TO_DEVICE,			\
> +		.ee_mask = BIT(MHI_EE_AMSS),		\
> +		.pollcfg = 0,				\
> +		.doorbell = MHI_DB_BRST_ENABLE,	\
> +		.lpm_notify = false,			\
> +		.offload_channel = false,		\
> +		.doorbell_mode_switch = false,		\
doorbell_mode_switch should be true for channel 100 (UL). Any reason why 
the boolean
fields are not yet configurable from the parameters to this macro? That 
will be
needed going forward so it should be changed now.

doorbell also may not be "enabled" by default. Basically, let's have 
something that
can be easily configured for different kinds of HW channels.
> +	}						\
> +
> +#define MHI_CHANNEL_CONFIG_HW_DL(ch_num, ch_name, el_count, ev_ring) \
> +	{						\
> +		.num = ch_num,				\
> +		.name = ch_name,			\
> +		.num_elements = el_count,		\
> +		.event_ring = ev_ring,			\
> +		.dir = DMA_FROM_DEVICE,			\
> +		.ee_mask = BIT(MHI_EE_AMSS),		\
> +		.pollcfg = 0,				\
> +		.doorbell = MHI_DB_BRST_ENABLE,	\
> +		.lpm_notify = false,			\
> +		.offload_channel = false,		\
> +		.doorbell_mode_switch = false,		\
> +	}
> +
>  #define MHI_EVENT_CONFIG_DATA(ev_ring)		\
>  	{					\
>  		.num_elements = 128,		\
> @@ -112,8 +142,8 @@ static const struct mhi_channel_config
> modem_qcom_v1_mhi_channels[] = {
>  	MHI_CHANNEL_CONFIG_DL(15, "QMI", 4, 0),
>  	MHI_CHANNEL_CONFIG_UL(20, "IPCR", 8, 0),
>  	MHI_CHANNEL_CONFIG_DL(21, "IPCR", 8, 0),
> -	MHI_CHANNEL_CONFIG_UL(100, "IP_HW0", 128, 1),
> -	MHI_CHANNEL_CONFIG_DL(101, "IP_HW0", 128, 2),
> +	MHI_CHANNEL_CONFIG_HW_UL(100, "IP_HW0", 128, 1),
> +	MHI_CHANNEL_CONFIG_HW_DL(101, "IP_HW0", 128, 2),
>  };
> 
>  static const struct mhi_event_config modem_qcom_v1_mhi_events[] = {
Thanks,
Bhaumik
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 8/8] mhi: pci_generic: Increase controller timeout value
  2020-11-13 15:00 ` [PATCH 8/8] mhi: pci_generic: Increase controller timeout value Loic Poulain
@ 2020-11-13 17:08   ` Bhaumik Bhatt
  2020-11-25 20:23   ` Hemant Kumar
  1 sibling, 0 replies; 27+ messages in thread
From: Bhaumik Bhatt @ 2020-11-13 17:08 UTC (permalink / raw)
  To: Loic Poulain; +Cc: manivannan.sadhasivam, hemantk, linux-arm-msm

On 2020-11-13 07:00, Loic Poulain wrote:
> On cold boot, device can take slightly more than 5 seconds to start.
> Increase the timeout to prevent MHI power-up issues.
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Bhaumik Bhatt <bbhatt@codeaurora.org>
> ---
>  drivers/bus/mhi/pci_generic.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/mhi/pci_generic.c 
> b/drivers/bus/mhi/pci_generic.c
> index 6cc7bb9..7f0068c 100644
> --- a/drivers/bus/mhi/pci_generic.c
> +++ b/drivers/bus/mhi/pci_generic.c
> @@ -162,7 +162,7 @@ static const struct mhi_event_config
> modem_qcom_v1_mhi_events[] = {
> 
>  static const struct mhi_controller_config modem_qcom_v1_mhiv_config = 
> {
>  	.max_channels = 128,
> -	.timeout_ms = 5000,
> +	.timeout_ms = 8000,
>  	.num_channels = ARRAY_SIZE(modem_qcom_v1_mhi_channels),
>  	.ch_cfg = modem_qcom_v1_mhi_channels,
>  	.num_events = ARRAY_SIZE(modem_qcom_v1_mhi_events),

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/8] mhi: pci_generic: Add support for reset
  2020-11-13 14:59 ` [PATCH 4/8] mhi: pci_generic: Add support for reset Loic Poulain
@ 2020-11-13 17:12   ` Bhaumik Bhatt
  2020-11-13 17:42     ` Loic Poulain
  0 siblings, 1 reply; 27+ messages in thread
From: Bhaumik Bhatt @ 2020-11-13 17:12 UTC (permalink / raw)
  To: Loic Poulain; +Cc: manivannan.sadhasivam, hemantk, linux-arm-msm

On 2020-11-13 06:59, Loic Poulain wrote:
> Add support for resetting the device, reset can be triggered in case
> of error or manually via sysfs (/sys/bus/pci/devices/*/reset).
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> ---
>  drivers/bus/mhi/pci_generic.c | 117 
> +++++++++++++++++++++++++++++++++++++-----
>  1 file changed, 104 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/bus/mhi/pci_generic.c 
> b/drivers/bus/mhi/pci_generic.c
> index 0c07cf5..b48c382 100644
> --- a/drivers/bus/mhi/pci_generic.c
> +++ b/drivers/bus/mhi/pci_generic.c
> @@ -8,6 +8,7 @@
>   * Copyright (C) 2020 Linaro Ltd <loic.poulain@linaro.org>
>   */
> 
> +#include <linux/delay.h>
>  #include <linux/device.h>
>  #include <linux/mhi.h>
>  #include <linux/module.h>
> @@ -179,6 +180,16 @@ static const struct pci_device_id 
> mhi_pci_id_table[] = {
>  };
>  MODULE_DEVICE_TABLE(pci, mhi_pci_id_table);
> 
> +enum mhi_pci_device_status {
> +	MHI_PCI_DEV_STARTED,
> +};
> +
> +struct mhi_pci_device {
> +	struct mhi_controller mhi_cntrl;
> +	struct pci_saved_state *pci_state;
> +	unsigned long status;
> +};
> +
>  static int mhi_pci_read_reg(struct mhi_controller *mhi_cntrl,
>  			    void __iomem *addr, u32 *out)
>  {
> @@ -203,6 +214,20 @@ static inline void mhi_pci_reset(struct
> mhi_controller *mhi_cntrl)
>  	writel(1, mhi_cntrl->regs + DEV_RESET_REG);
>  }
> 
> +static bool mhi_pci_is_alive(struct mhi_controller *mhi_cntrl)
> +{
> +	struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
> +	u16 vendor = 0;
> +
> +	if (pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor))
> +		return false;
> +
> +	if (vendor == (u16) ~0 || vendor == 0)
> +		return false;
> +
> +	return true;
> +}
> +
>  static int mhi_pci_claim(struct mhi_controller *mhi_cntrl,
>  			 unsigned int bar_num, u64 dma_mask)
>  {
> @@ -298,16 +323,18 @@ static int mhi_pci_probe(struct pci_dev *pdev,
> const struct pci_device_id *id)
>  {
>  	const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *)
> id->driver_data;
>  	const struct mhi_controller_config *mhi_cntrl_config;
> +	struct mhi_pci_device *mhi_pdev;
>  	struct mhi_controller *mhi_cntrl;
>  	int err;
> 
>  	dev_dbg(&pdev->dev, "MHI PCI device found: %s\n", info->name);
> 
> -	mhi_cntrl = mhi_alloc_controller();
> -	if (!mhi_cntrl)
Not recommended.
> +	mhi_pdev = devm_kzalloc(&pdev->dev, sizeof(*mhi_pdev), GFP_KERNEL);
> +	if (!mhi_pdev)
>  		return -ENOMEM;
Why move away from the mhi_alloc_controller/mhi_free_controller pair we
recommended for use?
> 
>  	mhi_cntrl_config = info->config;
> +	mhi_cntrl = &mhi_pdev->mhi_cntrl;
>  	mhi_cntrl->cntrl_dev = &pdev->dev;
>  	mhi_cntrl->iova_start = 0;
>  	mhi_cntrl->iova_stop = DMA_BIT_MASK(info->dma_data_width);
> @@ -322,17 +349,21 @@ static int mhi_pci_probe(struct pci_dev *pdev,
> const struct pci_device_id *id)
> 
>  	err = mhi_pci_claim(mhi_cntrl, info->bar_num,
> DMA_BIT_MASK(info->dma_data_width));
>  	if (err)
> -		goto err_release;
> +		return err;
> 
>  	err = mhi_pci_get_irqs(mhi_cntrl, mhi_cntrl_config);
>  	if (err)
> -		goto err_release;
> +		return err;
> +
> +	pci_set_drvdata(pdev, mhi_pdev);
> 
> -	pci_set_drvdata(pdev, mhi_cntrl);
> +	/* Have stored pci confspace at hand for restore in sudden PCI error 
> */
> +	pci_save_state(pdev);
> +	mhi_pdev->pci_state = pci_store_saved_state(pdev);
> 
>  	err = mhi_register_controller(mhi_cntrl, mhi_cntrl_config);
>  	if (err)
> -		goto err_release;
> +		return err;
> 
>  	/* MHI bus does not power up the controller by default */
>  	err = mhi_prepare_for_power_up(mhi_cntrl);
> @@ -347,37 +378,97 @@ static int mhi_pci_probe(struct pci_dev *pdev,
> const struct pci_device_id *id)
>  		goto err_unprepare;
>  	}
> 
> +	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
> +
>  	return 0;
> 
>  err_unprepare:
>  	mhi_unprepare_after_power_down(mhi_cntrl);
>  err_unregister:
>  	mhi_unregister_controller(mhi_cntrl);
> -err_release:
> -	mhi_free_controller(mhi_cntrl);
Not recommended.
> 
>  	return err;
>  }
> 
>  static void mhi_pci_remove(struct pci_dev *pdev)
>  {
> -	struct mhi_controller *mhi_cntrl = pci_get_drvdata(pdev);
> +	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
> +	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> +
> +	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
> +		mhi_power_down(mhi_cntrl, true);
> +		mhi_unprepare_after_power_down(mhi_cntrl);
> +	}
> 
> -	mhi_power_down(mhi_cntrl, true);
> -	mhi_unprepare_after_power_down(mhi_cntrl);
>  	mhi_unregister_controller(mhi_cntrl);
> 
>  	/* MHI-layer reset could not be enough, always hard-reset the device 
> */
>  	mhi_pci_reset(mhi_cntrl);
> +}
> +
> +void mhi_pci_reset_prepare(struct pci_dev *pdev)
> +{
> +	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
> +	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> +
> +	dev_info(&pdev->dev, "reset\n");
> 
> -	mhi_free_controller(mhi_cntrl);
> +	/* Clean up MHI state */
> +	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
> +		mhi_power_down(mhi_cntrl, false);
> +		mhi_unprepare_after_power_down(mhi_cntrl);
> +	}
> +
> +	/* cause internal device reset */
> +	mhi_pci_reset(mhi_cntrl);
> +
> +	/* Be sure device reset has been executed */
> +	msleep(500);
>  }
> 
> +void mhi_pci_reset_done(struct pci_dev *pdev)
> +{
> +	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
> +	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> +	int err;
> +
> +	/* Restore initial known working PCI state */
> +	pci_load_saved_state(pdev, mhi_pdev->pci_state);
> +	pci_restore_state(pdev);
> +
> +	/* Is device status available ? */
> +	if (!mhi_pci_is_alive(mhi_cntrl)) {
> +		dev_err(&pdev->dev, "reset failed\n");
> +		return;
> +	}
> +
> +	err = mhi_prepare_for_power_up(mhi_cntrl);
> +	if (err) {
> +		dev_err(&pdev->dev, "failed to prepare MHI controller\n");
> +		return;
> +	}
> +
> +	err = mhi_sync_power_up(mhi_cntrl);
> +	if (err) {
> +		dev_err(&pdev->dev, "failed to power up MHI controller\n");
> +		mhi_unprepare_after_power_down(mhi_cntrl);
> +		return;
> +	}
> +
> +	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
> +}
> +
> +static const struct pci_error_handlers mhi_pci_err_handler = {
> +	.reset_prepare = mhi_pci_reset_prepare,
> +	.reset_done = mhi_pci_reset_done,
> +};
> +
>  static struct pci_driver mhi_pci_driver = {
>  	.name		= "mhi-pci-generic",
>  	.id_table	= mhi_pci_id_table,
>  	.probe		= mhi_pci_probe,
> -	.remove		= mhi_pci_remove
> +	.remove		= mhi_pci_remove,
> +	.err_handler	= &mhi_pci_err_handler,
>  };
>  module_pci_driver(mhi_pci_driver);

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/8] mhi: pci_generic: Add support for reset
  2020-11-13 17:12   ` Bhaumik Bhatt
@ 2020-11-13 17:42     ` Loic Poulain
  2020-11-13 17:44       ` Bhaumik Bhatt
  0 siblings, 1 reply; 27+ messages in thread
From: Loic Poulain @ 2020-11-13 17:42 UTC (permalink / raw)
  To: Bhaumik Bhatt; +Cc: Manivannan Sadhasivam, Hemant Kumar, linux-arm-msm

Hi Bhaumik,

On Fri, 13 Nov 2020 at 18:12, Bhaumik Bhatt <bbhatt@codeaurora.org> wrote:
>
> On 2020-11-13 06:59, Loic Poulain wrote:
> > Add support for resetting the device, reset can be triggered in case
> > of error or manually via sysfs (/sys/bus/pci/devices/*/reset).
> >
> > Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> > ---
> >  drivers/bus/mhi/pci_generic.c | 117
> > +++++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 104 insertions(+), 13 deletions(-)
> >
> > diff --git a/drivers/bus/mhi/pci_generic.c
> > b/drivers/bus/mhi/pci_generic.c
> > index 0c07cf5..b48c382 100644
> > --- a/drivers/bus/mhi/pci_generic.c
> > +++ b/drivers/bus/mhi/pci_generic.c
> > @@ -8,6 +8,7 @@
> >   * Copyright (C) 2020 Linaro Ltd <loic.poulain@linaro.org>
> >   */
> >
> > +#include <linux/delay.h>
> >  #include <linux/device.h>
> >  #include <linux/mhi.h>
> >  #include <linux/module.h>
> > @@ -179,6 +180,16 @@ static const struct pci_device_id
> > mhi_pci_id_table[] = {
> >  };
> >  MODULE_DEVICE_TABLE(pci, mhi_pci_id_table);
> >
> > +enum mhi_pci_device_status {
> > +     MHI_PCI_DEV_STARTED,
> > +};
> > +
> > +struct mhi_pci_device {
> > +     struct mhi_controller mhi_cntrl;
> > +     struct pci_saved_state *pci_state;
> > +     unsigned long status;
> > +};
> > +
> >  static int mhi_pci_read_reg(struct mhi_controller *mhi_cntrl,
> >                           void __iomem *addr, u32 *out)
> >  {
> > @@ -203,6 +214,20 @@ static inline void mhi_pci_reset(struct
> > mhi_controller *mhi_cntrl)
> >       writel(1, mhi_cntrl->regs + DEV_RESET_REG);
> >  }
> >
> > +static bool mhi_pci_is_alive(struct mhi_controller *mhi_cntrl)
> > +{
> > +     struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
> > +     u16 vendor = 0;
> > +
> > +     if (pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor))
> > +             return false;
> > +
> > +     if (vendor == (u16) ~0 || vendor == 0)
> > +             return false;
> > +
> > +     return true;
> > +}
> > +
> >  static int mhi_pci_claim(struct mhi_controller *mhi_cntrl,
> >                        unsigned int bar_num, u64 dma_mask)
> >  {
> > @@ -298,16 +323,18 @@ static int mhi_pci_probe(struct pci_dev *pdev,
> > const struct pci_device_id *id)
> >  {
> >       const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *)
> > id->driver_data;
> >       const struct mhi_controller_config *mhi_cntrl_config;
> > +     struct mhi_pci_device *mhi_pdev;
> >       struct mhi_controller *mhi_cntrl;
> >       int err;
> >
> >       dev_dbg(&pdev->dev, "MHI PCI device found: %s\n", info->name);
> >
> > -     mhi_cntrl = mhi_alloc_controller();
> > -     if (!mhi_cntrl)
> Not recommended.
> > +     mhi_pdev = devm_kzalloc(&pdev->dev, sizeof(*mhi_pdev), GFP_KERNEL);
> > +     if (!mhi_pdev)
> >               return -ENOMEM;
> Why move away from the mhi_alloc_controller/mhi_free_controller pair we
> recommended for use?

Because now I don't allocate mhi controller separately, it's part of
the allocated mhi_pci_device that inherit from it. If necessary we
would need something like mhi_init_controller() to initialize the
field, but AFAIU everything needs to be zeroed.

Regards,
Loic

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 4/8] mhi: pci_generic: Add support for reset
  2020-11-13 17:42     ` Loic Poulain
@ 2020-11-13 17:44       ` Bhaumik Bhatt
  0 siblings, 0 replies; 27+ messages in thread
From: Bhaumik Bhatt @ 2020-11-13 17:44 UTC (permalink / raw)
  To: Loic Poulain; +Cc: Manivannan Sadhasivam, Hemant Kumar, linux-arm-msm

On 2020-11-13 09:42, Loic Poulain wrote:
> Hi Bhaumik,
> 
> On Fri, 13 Nov 2020 at 18:12, Bhaumik Bhatt <bbhatt@codeaurora.org> 
> wrote:
>> 
>> On 2020-11-13 06:59, Loic Poulain wrote:
>> > Add support for resetting the device, reset can be triggered in case
>> > of error or manually via sysfs (/sys/bus/pci/devices/*/reset).
>> >
>> > Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
>> > ---
>> >  drivers/bus/mhi/pci_generic.c | 117
>> > +++++++++++++++++++++++++++++++++++++-----
>> >  1 file changed, 104 insertions(+), 13 deletions(-)
>> >
>> > diff --git a/drivers/bus/mhi/pci_generic.c
>> > b/drivers/bus/mhi/pci_generic.c
>> > index 0c07cf5..b48c382 100644
>> > --- a/drivers/bus/mhi/pci_generic.c
>> > +++ b/drivers/bus/mhi/pci_generic.c
>> > @@ -8,6 +8,7 @@
>> >   * Copyright (C) 2020 Linaro Ltd <loic.poulain@linaro.org>
>> >   */
>> >
>> > +#include <linux/delay.h>
>> >  #include <linux/device.h>
>> >  #include <linux/mhi.h>
>> >  #include <linux/module.h>
>> > @@ -179,6 +180,16 @@ static const struct pci_device_id
>> > mhi_pci_id_table[] = {
>> >  };
>> >  MODULE_DEVICE_TABLE(pci, mhi_pci_id_table);
>> >
>> > +enum mhi_pci_device_status {
>> > +     MHI_PCI_DEV_STARTED,
>> > +};
>> > +
>> > +struct mhi_pci_device {
>> > +     struct mhi_controller mhi_cntrl;
>> > +     struct pci_saved_state *pci_state;
>> > +     unsigned long status;
>> > +};
>> > +
>> >  static int mhi_pci_read_reg(struct mhi_controller *mhi_cntrl,
>> >                           void __iomem *addr, u32 *out)
>> >  {
>> > @@ -203,6 +214,20 @@ static inline void mhi_pci_reset(struct
>> > mhi_controller *mhi_cntrl)
>> >       writel(1, mhi_cntrl->regs + DEV_RESET_REG);
>> >  }
>> >
>> > +static bool mhi_pci_is_alive(struct mhi_controller *mhi_cntrl)
>> > +{
>> > +     struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
>> > +     u16 vendor = 0;
>> > +
>> > +     if (pci_read_config_word(pdev, PCI_VENDOR_ID, &vendor))
>> > +             return false;
>> > +
>> > +     if (vendor == (u16) ~0 || vendor == 0)
>> > +             return false;
>> > +
>> > +     return true;
>> > +}
>> > +
>> >  static int mhi_pci_claim(struct mhi_controller *mhi_cntrl,
>> >                        unsigned int bar_num, u64 dma_mask)
>> >  {
>> > @@ -298,16 +323,18 @@ static int mhi_pci_probe(struct pci_dev *pdev,
>> > const struct pci_device_id *id)
>> >  {
>> >       const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *)
>> > id->driver_data;
>> >       const struct mhi_controller_config *mhi_cntrl_config;
>> > +     struct mhi_pci_device *mhi_pdev;
>> >       struct mhi_controller *mhi_cntrl;
>> >       int err;
>> >
>> >       dev_dbg(&pdev->dev, "MHI PCI device found: %s\n", info->name);
>> >
>> > -     mhi_cntrl = mhi_alloc_controller();
>> > -     if (!mhi_cntrl)
>> Not recommended.
>> > +     mhi_pdev = devm_kzalloc(&pdev->dev, sizeof(*mhi_pdev), GFP_KERNEL);
>> > +     if (!mhi_pdev)
>> >               return -ENOMEM;
>> Why move away from the mhi_alloc_controller/mhi_free_controller pair 
>> we
>> recommended for use?
> 
> Because now I don't allocate mhi controller separately, it's part of
> the allocated mhi_pci_device that inherit from it. If necessary we
> would need something like mhi_init_controller() to initialize the
> field, but AFAIU everything needs to be zeroed.
Let's make sure we use the mhi_alloc/free_controller APIs because it 
will
help us have better control in case we need track of some allocations.
> 
> Regards,
> Loic

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] mhi: pci-generic: Increase number of hardware events
  2020-11-13 14:59 ` [PATCH 1/8] mhi: pci-generic: Increase number of hardware events Loic Poulain
  2020-11-13 16:56   ` Bhaumik Bhatt
@ 2020-11-18  3:54   ` Hemant Kumar
  2020-11-18  8:26     ` Loic Poulain
  1 sibling, 1 reply; 27+ messages in thread
From: Hemant Kumar @ 2020-11-18  3:54 UTC (permalink / raw)
  To: Loic Poulain, manivannan.sadhasivam; +Cc: linux-arm-msm

Hi Loic,

On 11/13/20 6:59 AM, Loic Poulain wrote:
> If the IPA (IP hardware accelerator) is starved of event ring elements,
> the modem is crashing (SDX55). That can be prevented by setting a
it is because of event processing is getting starved ? Event ring 
elements = 2 x xfer ring is good for HW channels. Do you think NAPI 
polling would help when you start verifying higher data rates?

> larger number of events (i.e 2 x number of channel ring elements).
> 
> Tested with FN980m module.
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
> ---
>   drivers/bus/mhi/pci_generic.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
> index e3df838..13a7e4f 100644
> --- a/drivers/bus/mhi/pci_generic.c
> +++ b/drivers/bus/mhi/pci_generic.c
> @@ -91,7 +91,7 @@ struct mhi_pci_dev_info {
>   
>   #define MHI_EVENT_CONFIG_HW_DATA(ev_ring, ch_num) \
>   	{					\
> -		.num_elements = 128,		\
> +		.num_elements = 256,		\
>   		.irq_moderation_ms = 5,		\
>   		.irq = (ev_ring) + 1,		\
>   		.priority = 1,			\
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] mhi: pci-generic: Increase number of hardware events
  2020-11-18  3:54   ` Hemant Kumar
@ 2020-11-18  8:26     ` Loic Poulain
  2020-11-19  1:50       ` Hemant Kumar
  0 siblings, 1 reply; 27+ messages in thread
From: Loic Poulain @ 2020-11-18  8:26 UTC (permalink / raw)
  To: Hemant Kumar; +Cc: Manivannan Sadhasivam, linux-arm-msm

On Wed, 18 Nov 2020 at 04:54, Hemant Kumar <hemantk@codeaurora.org> wrote:
>
> Hi Loic,
>
> On 11/13/20 6:59 AM, Loic Poulain wrote:
> > If the IPA (IP hardware accelerator) is starved of event ring elements,
> > the modem is crashing (SDX55). That can be prevented by setting a
> it is because of event processing is getting starved ?

Yes, and hardware does not like that.

> Event ring elements = 2 x xfer ring is good for HW channels. Do you think NAPI
> polling would help when you start verifying higher data rates?

That's a good question, I'll certainly test that. But we still need to
ensure this will
never ever happen with this higher event count.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove
  2020-11-13 14:59 ` [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove Loic Poulain
@ 2020-11-19  1:46   ` Hemant Kumar
  2020-11-19  9:21     ` Loic Poulain
  0 siblings, 1 reply; 27+ messages in thread
From: Hemant Kumar @ 2020-11-19  1:46 UTC (permalink / raw)
  To: Loic Poulain, manivannan.sadhasivam; +Cc: linux-arm-msm

Hi Loic,

On 11/13/20 6:59 AM, Loic Poulain wrote:
> Ensure that the device is hard-reset on remove to restore its initial
> state and avoid further issues on subsequent probe.
> 
> This has been tested with Telit FN980m module.
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> ---
>   drivers/bus/mhi/pci_generic.c | 11 +++++++++++
>   1 file changed, 11 insertions(+)
> 
> diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
> index 13a7e4f..09c6b26 100644
> --- a/drivers/bus/mhi/pci_generic.c
> +++ b/drivers/bus/mhi/pci_generic.c
> @@ -15,6 +15,8 @@
>   
>   #define MHI_PCI_DEFAULT_BAR_NUM 0
>   
> +#define DEV_RESET_REG (0xB0)
> +
>   /**
>    * struct mhi_pci_dev_info - MHI PCI device specific information
>    * @config: MHI controller configuration
> @@ -166,6 +168,11 @@ static void mhi_pci_status_cb(struct mhi_controller *mhi_cntrl,
>   	/* Nothing to do for now */
>   }
>   
> +static inline void mhi_pci_reset(struct mhi_controller *mhi_cntrl)
> +{
> +	writel(1, mhi_cntrl->regs + DEV_RESET_REG);
> +}
> +
>   static int mhi_pci_claim(struct mhi_controller *mhi_cntrl,
>   			 unsigned int bar_num, u64 dma_mask)
>   {
> @@ -329,6 +336,10 @@ static void mhi_pci_remove(struct pci_dev *pdev)
>   	mhi_power_down(mhi_cntrl, true);
>   	mhi_unprepare_after_power_down(mhi_cntrl);
>   	mhi_unregister_controller(mhi_cntrl);
> +
> +	/* MHI-layer reset could not be enough, always hard-reset the device */
> +	mhi_pci_reset(mhi_cntrl);

Referring to MHI spec:
Hosts writes this register to trigger a reset. This can be used when the 
host detects a timeout in the MHI protocol and can use the reset as a 
last resort to recover the device. Host should first attempt an MHI 
Reset procedure before resetting the entire device.

What issue are you facing which requires you to do full device reset ? 
Based on the spec recommendation, looks like this should be a last resort.

Thanks,
Hemant
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] mhi: pci-generic: Increase number of hardware events
  2020-11-18  8:26     ` Loic Poulain
@ 2020-11-19  1:50       ` Hemant Kumar
  2020-11-19  9:42         ` Loic Poulain
  0 siblings, 1 reply; 27+ messages in thread
From: Hemant Kumar @ 2020-11-19  1:50 UTC (permalink / raw)
  To: Loic Poulain; +Cc: Manivannan Sadhasivam, linux-arm-msm

Hi Loic,

On 11/18/20 12:26 AM, Loic Poulain wrote:
> On Wed, 18 Nov 2020 at 04:54, Hemant Kumar <hemantk@codeaurora.org> wrote:
>>
>> Hi Loic,
>>
>> On 11/13/20 6:59 AM, Loic Poulain wrote:
>>> If the IPA (IP hardware accelerator) is starved of event ring elements,
>>> the modem is crashing (SDX55). That can be prevented by setting a
>> it is because of event processing is getting starved ?
> 
> Yes, and hardware does not like that.
> 
>> Event ring elements = 2 x xfer ring is good for HW channels. Do you think NAPI
>> polling would help when you start verifying higher data rates?
> 
> That's a good question, I'll certainly test that. But we still need to
> ensure this will
> never ever happen with this higher event count.
Once you move to burst mode, if device is running out of credit, it 
would send OOB and move to doorbell mode, and stop posting events until
MHI Host rings DB with a TRE.
> 

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/8] mhi: pci_generic: Add suspend/resume/recovery procedure
  2020-11-13 14:59 ` [PATCH 5/8] mhi: pci_generic: Add suspend/resume/recovery procedure Loic Poulain
@ 2020-11-19  2:21   ` Hemant Kumar
  2020-11-19  9:54     ` Loic Poulain
  0 siblings, 1 reply; 27+ messages in thread
From: Hemant Kumar @ 2020-11-19  2:21 UTC (permalink / raw)
  To: Loic Poulain, manivannan.sadhasivam; +Cc: linux-arm-msm

Hi Loic,

On 11/13/20 6:59 AM, Loic Poulain wrote:
> Add support for system wide suspend/resume. During suspend, MHI
> device controller must be put in M3 state and PCI bus in D3 state.
> 
> Add a recovery procedure allowing to reinitialize the device in case
> of error during resume steps, which can happen if device loses power
> (and so its context) while system suspend.
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> ---
[..]
>   
> +static void mhi_pci_recovery_work(struct work_struct *work)
> +{
> +	struct mhi_pci_device *mhi_pdev = container_of(work, struct mhi_pci_device,
> +						       recovery_work);
> +	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> +	struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
> +	int err;
> +
> +	dev_warn(&pdev->dev, "device recovery started\n");
> +
> +	/* Clean up MHI state */
> +	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
> +		mhi_power_down(mhi_cntrl, false);
> +		mhi_unprepare_after_power_down(mhi_cntrl);
> +	}
> +
> +	/* Check if we can recover without full reset */
> +	pci_set_power_state(pdev, PCI_D0);
> +	pci_load_saved_state(pdev, mhi_pdev->pci_state);
> +	pci_restore_state(pdev);
> +
> +	if (!mhi_pci_is_alive(mhi_cntrl))
> +		goto err_try_reset;
> +
> +	err = mhi_prepare_for_power_up(mhi_cntrl);
> +	if (err)
> +		goto err_try_reset;
> +
> +	err = mhi_sync_power_up(mhi_cntrl);
> +	if (err)
> +		goto err_unprepare;
> +
> +	set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
> +	return;
> +
> +err_unprepare:
> +	mhi_unprepare_after_power_down(mhi_cntrl);
> +err_try_reset:
> +	if (pci_reset_function(pdev))
> +		dev_err(&pdev->dev, "Recovery failed\n");
> +}
> +
>   static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   {
>   	const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *) id->driver_data;
> @@ -333,6 +377,8 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
>   	if (!mhi_pdev)
>   		return -ENOMEM;
>   
> +	INIT_WORK(&mhi_pdev->recovery_work, mhi_pci_recovery_work);
> +
>   	mhi_cntrl_config = info->config;
>   	mhi_cntrl = &mhi_pdev->mhi_cntrl;
>   	mhi_cntrl->cntrl_dev = &pdev->dev;
> @@ -395,6 +441,8 @@ static void mhi_pci_remove(struct pci_dev *pdev)
>   	struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
>   	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
>   
> +	cancel_work_sync(&mhi_pdev->recovery_work);
> +
>   	if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
>   		mhi_power_down(mhi_cntrl, true);
>   		mhi_unprepare_after_power_down(mhi_cntrl);
> @@ -463,12 +511,64 @@ static const struct pci_error_handlers mhi_pci_err_handler = {
>   	.reset_done = mhi_pci_reset_done,
>   };
>   
> +int  __maybe_unused mhi_pci_suspend(struct device *dev)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
> +	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> +
is it possible we come here and work still did not get a chance to run ? 
If so we can flush it before going through suspend.
> +	/* Transition to M3 state */
> +	mhi_pm_suspend(mhi_cntrl);
> +
> +	pci_save_state(pdev);
> +	pci_disable_device(pdev);
> +	pci_wake_from_d3(pdev, true);
> +	pci_set_power_state(pdev, PCI_D3hot);
are you planning to park the device in D3hot all the time ?
Is there a plan to move to D3Cold in your use case ?
> +
> +	return 0;
> +}
> +
> +static int __maybe_unused mhi_pci_resume(struct device *dev)
> +{
> +	struct pci_dev *pdev = to_pci_dev(dev);
> +	struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
> +	struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> +	int err;
> +
> +	pci_set_power_state(pdev, PCI_D0);
> +	pci_restore_state(pdev);
> +	pci_set_master(pdev);
> +
> +	err = pci_enable_device(pdev);
> +	if (err)
> +		goto err_recovery;
> +
> +	/* Exit M3, transition to M0 state */
> +	err = mhi_pm_resume(mhi_cntrl);
> +	if (err) {
> +		dev_err(&pdev->dev, "failed to resume device: %d\n", err);
> +		goto err_recovery;
> +	}
> +
> +	return 0;
> +
> +err_recovery:
> +	/* The device may have loose power or crashed, try recovering it */
> +	queue_work(system_long_wq, &mhi_pdev->recovery_work);
> +	return 0;
> +}
> +
[..]

Thanks,
Hemant

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove
  2020-11-19  1:46   ` Hemant Kumar
@ 2020-11-19  9:21     ` Loic Poulain
  2020-11-25 17:41       ` Jeffrey Hugo
  0 siblings, 1 reply; 27+ messages in thread
From: Loic Poulain @ 2020-11-19  9:21 UTC (permalink / raw)
  To: Hemant Kumar; +Cc: Manivannan Sadhasivam, linux-arm-msm

Hi Hemant,

On Thu, 19 Nov 2020 at 02:46, Hemant Kumar <hemantk@codeaurora.org> wrote:
>
> Hi Loic,
>
> On 11/13/20 6:59 AM, Loic Poulain wrote:
> > Ensure that the device is hard-reset on remove to restore its initial
> > state and avoid further issues on subsequent probe.
> >
> > This has been tested with Telit FN980m module.
> >
> > Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> > ---
> >   drivers/bus/mhi/pci_generic.c | 11 +++++++++++
> >   1 file changed, 11 insertions(+)
> >
> > diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
> > index 13a7e4f..09c6b26 100644
> > --- a/drivers/bus/mhi/pci_generic.c
> > +++ b/drivers/bus/mhi/pci_generic.c
> > @@ -15,6 +15,8 @@
> >
> >   #define MHI_PCI_DEFAULT_BAR_NUM 0
> >
> > +#define DEV_RESET_REG (0xB0)
> > +
> >   /**
> >    * struct mhi_pci_dev_info - MHI PCI device specific information
> >    * @config: MHI controller configuration
> > @@ -166,6 +168,11 @@ static void mhi_pci_status_cb(struct mhi_controller *mhi_cntrl,
> >       /* Nothing to do for now */
> >   }
> >
> > +static inline void mhi_pci_reset(struct mhi_controller *mhi_cntrl)
> > +{
> > +     writel(1, mhi_cntrl->regs + DEV_RESET_REG);
> > +}
> > +
> >   static int mhi_pci_claim(struct mhi_controller *mhi_cntrl,
> >                        unsigned int bar_num, u64 dma_mask)
> >   {
> > @@ -329,6 +336,10 @@ static void mhi_pci_remove(struct pci_dev *pdev)
> >       mhi_power_down(mhi_cntrl, true);
> >       mhi_unprepare_after_power_down(mhi_cntrl);
> >       mhi_unregister_controller(mhi_cntrl);
> > +
> > +     /* MHI-layer reset could not be enough, always hard-reset the device */
> > +     mhi_pci_reset(mhi_cntrl);
>
> Referring to MHI spec:
> Hosts writes this register to trigger a reset. This can be used when the
> host detects a timeout in the MHI protocol and can use the reset as a
> last resort to recover the device. Host should first attempt an MHI
> Reset procedure before resetting the entire device.
>
> What issue are you facing which requires you to do full device reset ?
> Based on the spec recommendation, looks like this should be a last resort.

On module unload/reload, the device does not go through cold reset and
can be in error state on further reload, causing mhi power up to fail.
This patch simply resets the device in remove so that it is in the
exact same condition as before probing (not only MHI layer, but all
the device context), I think it makes sense and prevents any
unexpected state on further reloading. Note also that unloading the
module (or unbinding the device) is not something that usually
happens, except when the user does it explicitly for any reason.

Regards,
Loic

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 1/8] mhi: pci-generic: Increase number of hardware events
  2020-11-19  1:50       ` Hemant Kumar
@ 2020-11-19  9:42         ` Loic Poulain
  0 siblings, 0 replies; 27+ messages in thread
From: Loic Poulain @ 2020-11-19  9:42 UTC (permalink / raw)
  To: Hemant Kumar; +Cc: Manivannan Sadhasivam, linux-arm-msm

Hi Hemant,

On Thu, 19 Nov 2020 at 02:50, Hemant Kumar <hemantk@codeaurora.org> wrote:
>
> Hi Loic,
>
> On 11/18/20 12:26 AM, Loic Poulain wrote:
> > On Wed, 18 Nov 2020 at 04:54, Hemant Kumar <hemantk@codeaurora.org> wrote:
> >>
> >> Hi Loic,
> >>
> >> On 11/13/20 6:59 AM, Loic Poulain wrote:
> >>> If the IPA (IP hardware accelerator) is starved of event ring elements,
> >>> the modem is crashing (SDX55). That can be prevented by setting a
> >> it is because of event processing is getting starved ?
> >
> > Yes, and hardware does not like that.
> >
> >> Event ring elements = 2 x xfer ring is good for HW channels. Do you think NAPI
> >> polling would help when you start verifying higher data rates?
> >
> > That's a good question, I'll certainly test that. But we still need to
> > ensure this will
> > never ever happen with this higher event count.
> Once you move to burst mode, if device is running out of credit, it
> would send OOB and move to doorbell mode, and stop posting events until
> MHI Host rings DB with a TRE.

That not something I observe with my device/hardware, and I end with
the modem crashing:

[   47.607340] ipa_mhi_client ipa_mhi_gsi_ev_err_cb:1121 channel
id=100 client=42 state=2
[   47.608337] ipa_mhi_client ipa_mhi_gsi_ev_err_cb:1133 Received
GSI_EVT_EVT_RING_EMPTY_ERR
[   47.609310] ipa_mhi_client ipa_mhi_gsi_ev_err_cb:1138 err_desc=0x5000
[   47.610339] IPA: unrecoverable error has occurred, asserting

It's triggered from here:
https://source.codeaurora.org/quic/la/kernel/msm-4.14/tree/drivers/platform/msm/ipa/ipa_clients/ipa_mhi_client.c?h=APSS.FSM.7.6#n1134

Regards,
Loic

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 5/8] mhi: pci_generic: Add suspend/resume/recovery procedure
  2020-11-19  2:21   ` Hemant Kumar
@ 2020-11-19  9:54     ` Loic Poulain
  0 siblings, 0 replies; 27+ messages in thread
From: Loic Poulain @ 2020-11-19  9:54 UTC (permalink / raw)
  To: Hemant Kumar; +Cc: Manivannan Sadhasivam, linux-arm-msm

Hi Hemant,

On Thu, 19 Nov 2020 at 03:21, Hemant Kumar <hemantk@codeaurora.org> wrote:
>
> Hi Loic,
>
> On 11/13/20 6:59 AM, Loic Poulain wrote:
> > Add support for system wide suspend/resume. During suspend, MHI
> > device controller must be put in M3 state and PCI bus in D3 state.
> >
> > Add a recovery procedure allowing to reinitialize the device in case
> > of error during resume steps, which can happen if device loses power
> > (and so its context) while system suspend.
> >
> > Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> > ---
> [..]
> >
> > +static void mhi_pci_recovery_work(struct work_struct *work)
> > +{
> > +     struct mhi_pci_device *mhi_pdev = container_of(work, struct mhi_pci_device,
> > +                                                    recovery_work);
> > +     struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> > +     struct pci_dev *pdev = to_pci_dev(mhi_cntrl->cntrl_dev);
> > +     int err;
> > +
> > +     dev_warn(&pdev->dev, "device recovery started\n");
> > +
> > +     /* Clean up MHI state */
> > +     if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
> > +             mhi_power_down(mhi_cntrl, false);
> > +             mhi_unprepare_after_power_down(mhi_cntrl);
> > +     }
> > +
> > +     /* Check if we can recover without full reset */
> > +     pci_set_power_state(pdev, PCI_D0);
> > +     pci_load_saved_state(pdev, mhi_pdev->pci_state);
> > +     pci_restore_state(pdev);
> > +
> > +     if (!mhi_pci_is_alive(mhi_cntrl))
> > +             goto err_try_reset;
> > +
> > +     err = mhi_prepare_for_power_up(mhi_cntrl);
> > +     if (err)
> > +             goto err_try_reset;
> > +
> > +     err = mhi_sync_power_up(mhi_cntrl);
> > +     if (err)
> > +             goto err_unprepare;
> > +
> > +     set_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status);
> > +     return;
> > +
> > +err_unprepare:
> > +     mhi_unprepare_after_power_down(mhi_cntrl);
> > +err_try_reset:
> > +     if (pci_reset_function(pdev))
> > +             dev_err(&pdev->dev, "Recovery failed\n");
> > +}
> > +
> >   static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >   {
> >       const struct mhi_pci_dev_info *info = (struct mhi_pci_dev_info *) id->driver_data;
> > @@ -333,6 +377,8 @@ static int mhi_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id)
> >       if (!mhi_pdev)
> >               return -ENOMEM;
> >
> > +     INIT_WORK(&mhi_pdev->recovery_work, mhi_pci_recovery_work);
> > +
> >       mhi_cntrl_config = info->config;
> >       mhi_cntrl = &mhi_pdev->mhi_cntrl;
> >       mhi_cntrl->cntrl_dev = &pdev->dev;
> > @@ -395,6 +441,8 @@ static void mhi_pci_remove(struct pci_dev *pdev)
> >       struct mhi_pci_device *mhi_pdev = pci_get_drvdata(pdev);
> >       struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> >
> > +     cancel_work_sync(&mhi_pdev->recovery_work);
> > +
> >       if (test_and_clear_bit(MHI_PCI_DEV_STARTED, &mhi_pdev->status)) {
> >               mhi_power_down(mhi_cntrl, true);
> >               mhi_unprepare_after_power_down(mhi_cntrl);
> > @@ -463,12 +511,64 @@ static const struct pci_error_handlers mhi_pci_err_handler = {
> >       .reset_done = mhi_pci_reset_done,
> >   };
> >
> > +int  __maybe_unused mhi_pci_suspend(struct device *dev)
> > +{
> > +     struct pci_dev *pdev = to_pci_dev(dev);
> > +     struct mhi_pci_device *mhi_pdev = dev_get_drvdata(dev);
> > +     struct mhi_controller *mhi_cntrl = &mhi_pdev->mhi_cntrl;
> > +
> is it possible we come here and work still did not get a chance to run ?
> If so we can flush it before going through suspend.

Yes, would be good to cancel that work, will fix that.

> > +     /* Transition to M3 state */
> > +     mhi_pm_suspend(mhi_cntrl);
> > +
> > +     pci_save_state(pdev);
> > +     pci_disable_device(pdev);
> > +     pci_wake_from_d3(pdev, true);
> > +     pci_set_power_state(pdev, PCI_D3hot);
> are you planning to park the device in D3hot all the time ?
> Is there a plan to move to D3Cold in your use case ?

D3Cold is not a state that is controllable in software.

But here we support both scenarios:

Either the device is kept powered during platform suspend, because
e.g. it has auxiliary power. In that case, context is maintained and
the device can be safely resumed from M3. (e.g The M2 spec specifies
that 3.3V power should be provided even during system suspend).

Or the device has its power cut off during platform suspend, that
actually the case with my desktop, PCI express ports are disabled, and
the modem is powered off, this is D3Cold. In that case, we need to
reinitialize the modem, that what is done in the recovery procedure.

Regards,
Loic

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove
  2020-11-19  9:21     ` Loic Poulain
@ 2020-11-25 17:41       ` Jeffrey Hugo
  2020-11-27 16:21         ` Loic Poulain
  0 siblings, 1 reply; 27+ messages in thread
From: Jeffrey Hugo @ 2020-11-25 17:41 UTC (permalink / raw)
  To: Loic Poulain, Hemant Kumar; +Cc: Manivannan Sadhasivam, linux-arm-msm

On 11/19/2020 2:21 AM, Loic Poulain wrote:
> Hi Hemant,
> 
> On Thu, 19 Nov 2020 at 02:46, Hemant Kumar <hemantk@codeaurora.org> wrote:
>>
>> Hi Loic,
>>
>> On 11/13/20 6:59 AM, Loic Poulain wrote:
>>> Ensure that the device is hard-reset on remove to restore its initial
>>> state and avoid further issues on subsequent probe.
>>>
>>> This has been tested with Telit FN980m module.
>>>
>>> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
>>> ---
>>>    drivers/bus/mhi/pci_generic.c | 11 +++++++++++
>>>    1 file changed, 11 insertions(+)
>>>
>>> diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
>>> index 13a7e4f..09c6b26 100644
>>> --- a/drivers/bus/mhi/pci_generic.c
>>> +++ b/drivers/bus/mhi/pci_generic.c
>>> @@ -15,6 +15,8 @@
>>>
>>>    #define MHI_PCI_DEFAULT_BAR_NUM 0
>>>
>>> +#define DEV_RESET_REG (0xB0)
>>> +
>>>    /**
>>>     * struct mhi_pci_dev_info - MHI PCI device specific information
>>>     * @config: MHI controller configuration
>>> @@ -166,6 +168,11 @@ static void mhi_pci_status_cb(struct mhi_controller *mhi_cntrl,
>>>        /* Nothing to do for now */
>>>    }
>>>
>>> +static inline void mhi_pci_reset(struct mhi_controller *mhi_cntrl)
>>> +{
>>> +     writel(1, mhi_cntrl->regs + DEV_RESET_REG);
>>> +}
>>> +
>>>    static int mhi_pci_claim(struct mhi_controller *mhi_cntrl,
>>>                         unsigned int bar_num, u64 dma_mask)
>>>    {
>>> @@ -329,6 +336,10 @@ static void mhi_pci_remove(struct pci_dev *pdev)
>>>        mhi_power_down(mhi_cntrl, true);
>>>        mhi_unprepare_after_power_down(mhi_cntrl);
>>>        mhi_unregister_controller(mhi_cntrl);
>>> +
>>> +     /* MHI-layer reset could not be enough, always hard-reset the device */
>>> +     mhi_pci_reset(mhi_cntrl);
>>
>> Referring to MHI spec:
>> Hosts writes this register to trigger a reset. This can be used when the
>> host detects a timeout in the MHI protocol and can use the reset as a
>> last resort to recover the device. Host should first attempt an MHI
>> Reset procedure before resetting the entire device.
>>
>> What issue are you facing which requires you to do full device reset ?
>> Based on the spec recommendation, looks like this should be a last resort.
> 
> On module unload/reload, the device does not go through cold reset and
> can be in error state on further reload, causing mhi power up to fail.
> This patch simply resets the device in remove so that it is in the
> exact same condition as before probing (not only MHI layer, but all
> the device context), I think it makes sense and prevents any
> unexpected state on further reloading. Note also that unloading the
> module (or unbinding the device) is not something that usually
> happens, except when the user does it explicitly for any reason.

This seems unnecessary to me.  Qaic has the same usecase, and the MHI 
state machine reset is sufficient.  Perhaps you have a unique edge case 
though.

However, you are implementing the soc_reset functionality in your 
driver, when its a common MHI functionality, and is something I would 
like to use.  If you dig back, I proposed a sysfs extension to expose 
that to userspace, but I have a desire to use it from my driver, same as 
you.

Would you please make MHI core changes to expose the soc_reset 
functionality out so that multiple drivers can use a common implementation?


-- 
Jeffrey Hugo
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 8/8] mhi: pci_generic: Increase controller timeout value
  2020-11-13 15:00 ` [PATCH 8/8] mhi: pci_generic: Increase controller timeout value Loic Poulain
  2020-11-13 17:08   ` Bhaumik Bhatt
@ 2020-11-25 20:23   ` Hemant Kumar
  1 sibling, 0 replies; 27+ messages in thread
From: Hemant Kumar @ 2020-11-25 20:23 UTC (permalink / raw)
  To: Loic Poulain, manivannan.sadhasivam; +Cc: linux-arm-msm



On 11/13/20 7:00 AM, Loic Poulain wrote:
> On cold boot, device can take slightly more than 5 seconds to start.
> Increase the timeout to prevent MHI power-up issues.
> 
> Signed-off-by: Loic Poulain <loic.poulain@linaro.org>
> ---
>   drivers/bus/mhi/pci_generic.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/bus/mhi/pci_generic.c b/drivers/bus/mhi/pci_generic.c
> index 6cc7bb9..7f0068c 100644
> --- a/drivers/bus/mhi/pci_generic.c
> +++ b/drivers/bus/mhi/pci_generic.c
> @@ -162,7 +162,7 @@ static const struct mhi_event_config modem_qcom_v1_mhi_events[] = {
>   
>   static const struct mhi_controller_config modem_qcom_v1_mhiv_config = {
>   	.max_channels = 128,
> -	.timeout_ms = 5000,
> +	.timeout_ms = 8000,
>   	.num_channels = ARRAY_SIZE(modem_qcom_v1_mhi_channels),
>   	.ch_cfg = modem_qcom_v1_mhi_channels,
>   	.num_events = ARRAY_SIZE(modem_qcom_v1_mhi_events),
> 

Reviewed-by: Hemant Kumar <hemantk@codeaurora.org>
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove
  2020-11-25 17:41       ` Jeffrey Hugo
@ 2020-11-27 16:21         ` Loic Poulain
  2020-12-02  1:54           ` Bhaumik Bhatt
  0 siblings, 1 reply; 27+ messages in thread
From: Loic Poulain @ 2020-11-27 16:21 UTC (permalink / raw)
  To: Jeffrey Hugo; +Cc: Hemant Kumar, Manivannan Sadhasivam, linux-arm-msm

Hi Jeffrey,

On Wed, 25 Nov 2020 at 18:41, Jeffrey Hugo <jhugo@codeaurora.org> wrote:
> >>> @@ -329,6 +336,10 @@ static void mhi_pci_remove(struct pci_dev *pdev)
> >>>        mhi_power_down(mhi_cntrl, true);
> >>>        mhi_unprepare_after_power_down(mhi_cntrl);
> >>>        mhi_unregister_controller(mhi_cntrl);
> >>> +
> >>> +     /* MHI-layer reset could not be enough, always hard-reset the device */
> >>> +     mhi_pci_reset(mhi_cntrl);
> >>
> >> Referring to MHI spec:
> >> Hosts writes this register to trigger a reset. This can be used when the
> >> host detects a timeout in the MHI protocol and can use the reset as a
> >> last resort to recover the device. Host should first attempt an MHI
> >> Reset procedure before resetting the entire device.
> >>
> >> What issue are you facing which requires you to do full device reset ?
> >> Based on the spec recommendation, looks like this should be a last resort.
> >
> > On module unload/reload, the device does not go through cold reset and
> > can be in error state on further reload, causing mhi power up to fail.
> > This patch simply resets the device in remove so that it is in the
> > exact same condition as before probing (not only MHI layer, but all
> > the device context), I think it makes sense and prevents any
> > unexpected state on further reloading. Note also that unloading the
> > module (or unbinding the device) is not something that usually
> > happens, except when the user does it explicitly for any reason.
>
> This seems unnecessary to me.  Qaic has the same usecase, and the MHI
> state machine reset is sufficient.  Perhaps you have a unique edge case
> though.
>
> However, you are implementing the soc_reset functionality in your
> driver, when its a common MHI functionality, and is something I would
> like to use.  If you dig back, I proposed a sysfs extension to expose
> that to userspace, but I have a desire to use it from my driver, same as
> you.
>
> Would you please make MHI core changes to expose the soc_reset
> functionality out so that multiple drivers can use a common implementation?

I overlooked this reply, going to move that into MHI core, as you suggested.

Thanks,
Loic

^ permalink raw reply	[flat|nested] 27+ messages in thread

* Re: [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove
  2020-11-27 16:21         ` Loic Poulain
@ 2020-12-02  1:54           ` Bhaumik Bhatt
  0 siblings, 0 replies; 27+ messages in thread
From: Bhaumik Bhatt @ 2020-12-02  1:54 UTC (permalink / raw)
  To: Loic Poulain
  Cc: Jeffrey Hugo, Hemant Kumar, Manivannan Sadhasivam, linux-arm-msm

On 2020-11-27 08:21 AM, Loic Poulain wrote:
> Hi Jeffrey,
> 
> On Wed, 25 Nov 2020 at 18:41, Jeffrey Hugo <jhugo@codeaurora.org> 
> wrote:
>> >>> @@ -329,6 +336,10 @@ static void mhi_pci_remove(struct pci_dev *pdev)
>> >>>        mhi_power_down(mhi_cntrl, true);
>> >>>        mhi_unprepare_after_power_down(mhi_cntrl);
>> >>>        mhi_unregister_controller(mhi_cntrl);
>> >>> +
>> >>> +     /* MHI-layer reset could not be enough, always hard-reset the device */
>> >>> +     mhi_pci_reset(mhi_cntrl);
>> >>
>> >> Referring to MHI spec:
>> >> Hosts writes this register to trigger a reset. This can be used when the
>> >> host detects a timeout in the MHI protocol and can use the reset as a
>> >> last resort to recover the device. Host should first attempt an MHI
>> >> Reset procedure before resetting the entire device.
>> >>
>> >> What issue are you facing which requires you to do full device reset ?
>> >> Based on the spec recommendation, looks like this should be a last resort.
>> >
>> > On module unload/reload, the device does not go through cold reset and
>> > can be in error state on further reload, causing mhi power up to fail.
>> > This patch simply resets the device in remove so that it is in the
>> > exact same condition as before probing (not only MHI layer, but all
>> > the device context), I think it makes sense and prevents any
>> > unexpected state on further reloading. Note also that unloading the
>> > module (or unbinding the device) is not something that usually
>> > happens, except when the user does it explicitly for any reason.
>> 
>> This seems unnecessary to me.  Qaic has the same usecase, and the MHI
>> state machine reset is sufficient.  Perhaps you have a unique edge 
>> case
>> though.
>> 
>> However, you are implementing the soc_reset functionality in your
>> driver, when its a common MHI functionality, and is something I would
>> like to use.  If you dig back, I proposed a sysfs extension to expose
>> that to userspace, but I have a desire to use it from my driver, same 
>> as
>> you.
>> 
>> Would you please make MHI core changes to expose the soc_reset
>> functionality out so that multiple drivers can use a common 
>> implementation?
> 
> I overlooked this reply, going to move that into MHI core, as you 
> suggested.
> 
> Thanks,
> Loic
Yes, this makes sense to do as Jeff suggested.
Thanks,
Bhaumik
---
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora 
Forum,
a Linux Foundation Collaborative Project

^ permalink raw reply	[flat|nested] 27+ messages in thread

end of thread, other threads:[~2020-12-02  1:55 UTC | newest]

Thread overview: 27+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-13 14:59 [PATCH 0/8] mhi: pci_generic: Misc improvements Loic Poulain
2020-11-13 14:59 ` [PATCH 1/8] mhi: pci-generic: Increase number of hardware events Loic Poulain
2020-11-13 16:56   ` Bhaumik Bhatt
2020-11-18  3:54   ` Hemant Kumar
2020-11-18  8:26     ` Loic Poulain
2020-11-19  1:50       ` Hemant Kumar
2020-11-19  9:42         ` Loic Poulain
2020-11-13 14:59 ` [PATCH 2/8] mhi: pci-generic: Perform hard reset on remove Loic Poulain
2020-11-19  1:46   ` Hemant Kumar
2020-11-19  9:21     ` Loic Poulain
2020-11-25 17:41       ` Jeffrey Hugo
2020-11-27 16:21         ` Loic Poulain
2020-12-02  1:54           ` Bhaumik Bhatt
2020-11-13 14:59 ` [PATCH 3/8] mhi: pci_generic: Enable burst mode for hardware channels Loic Poulain
2020-11-13 17:06   ` Bhaumik Bhatt
2020-11-13 14:59 ` [PATCH 4/8] mhi: pci_generic: Add support for reset Loic Poulain
2020-11-13 17:12   ` Bhaumik Bhatt
2020-11-13 17:42     ` Loic Poulain
2020-11-13 17:44       ` Bhaumik Bhatt
2020-11-13 14:59 ` [PATCH 5/8] mhi: pci_generic: Add suspend/resume/recovery procedure Loic Poulain
2020-11-19  2:21   ` Hemant Kumar
2020-11-19  9:54     ` Loic Poulain
2020-11-13 15:00 ` [PATCH 6/8] mhi: pci_generic: Add PCI error handlers Loic Poulain
2020-11-13 15:00 ` [PATCH 7/8] mhi: pci_generic: Add health-check Loic Poulain
2020-11-13 15:00 ` [PATCH 8/8] mhi: pci_generic: Increase controller timeout value Loic Poulain
2020-11-13 17:08   ` Bhaumik Bhatt
2020-11-25 20:23   ` Hemant Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).