All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2] Bluetooth: Add ncmd=0 recovery handling
@ 2021-04-08  2:36 Manish Mandlik
  2021-04-08  3:31 ` [v2] " bluez.test.bot
  2021-04-08  8:09 ` [PATCH v2] " Marcel Holtmann
  0 siblings, 2 replies; 3+ messages in thread
From: Manish Mandlik @ 2021-04-08  2:36 UTC (permalink / raw)
  To: marcel, luiz.dentz
  Cc: linux-bluetooth, Alain Michaud, chromeos-bluetooth-upstreaming,
	Manish Mandlik, Abhishek Pandit-Subedi, David S. Miller,
	Jakub Kicinski, Johan Hedberg, linux-kernel, netdev

During command status or command complete event, the controller may set
ncmd=0 indicating that it is not accepting any more commands. In such a
case, host holds off sending any more commands to the controller. If the
controller doesn't recover from such condition, host will wait forever,
until the user decides that the Bluetooth is broken and may power cycles
the Bluetooth.

This patch triggers the hardware error to reset the controller and
driver when it gets into such state as there is no other wat out.

Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
Signed-off-by: Manish Mandlik <mmandlik@google.com>
---

Changes in v2:
- Emit the hardware error when ncmd=0 occurs

 include/net/bluetooth/hci.h      |  1 +
 include/net/bluetooth/hci_core.h |  1 +
 net/bluetooth/hci_core.c         | 15 +++++++++++++++
 net/bluetooth/hci_event.c        | 10 ++++++++++
 4 files changed, 27 insertions(+)

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index ea4ae551c426..c4b0650fb9ae 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -339,6 +339,7 @@ enum {
 #define HCI_PAIRING_TIMEOUT	msecs_to_jiffies(60000)	/* 60 seconds */
 #define HCI_INIT_TIMEOUT	msecs_to_jiffies(10000)	/* 10 seconds */
 #define HCI_CMD_TIMEOUT		msecs_to_jiffies(2000)	/* 2 seconds */
+#define HCI_NCMD_TIMEOUT	msecs_to_jiffies(4000)	/* 4 seconds */
 #define HCI_ACL_TX_TIMEOUT	msecs_to_jiffies(45000)	/* 45 seconds */
 #define HCI_AUTO_OFF_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
 #define HCI_POWER_OFF_TIMEOUT	msecs_to_jiffies(5000)	/* 5 seconds */
diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index ebdd4afe30d2..f14692b39fd5 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -470,6 +470,7 @@ struct hci_dev {
 	struct delayed_work	service_cache;
 
 	struct delayed_work	cmd_timer;
+	struct delayed_work	ncmd_timer;
 
 	struct work_struct	rx_work;
 	struct work_struct	cmd_work;
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index b0d9c36acc03..c102a8763cb5 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -2769,6 +2769,20 @@ static void hci_cmd_timeout(struct work_struct *work)
 	queue_work(hdev->workqueue, &hdev->cmd_work);
 }
 
+/* HCI ncmd timer function */
+static void hci_ncmd_timeout(struct work_struct *work)
+{
+	struct hci_dev *hdev = container_of(work, struct hci_dev,
+					    ncmd_timer.work);
+
+	bt_dev_err(hdev, "Controller not accepting commands anymore: ncmd = 0");
+
+	/* This is an irrecoverable state. Inject hw error event to reset
+	 * the device and driver.
+	 */
+	hci_reset_dev(hdev);
+}
+
 struct oob_data *hci_find_remote_oob_data(struct hci_dev *hdev,
 					  bdaddr_t *bdaddr, u8 bdaddr_type)
 {
@@ -3831,6 +3845,7 @@ struct hci_dev *hci_alloc_dev(void)
 	init_waitqueue_head(&hdev->suspend_wait_q);
 
 	INIT_DELAYED_WORK(&hdev->cmd_timer, hci_cmd_timeout);
+	INIT_DELAYED_WORK(&hdev->ncmd_timer, hci_ncmd_timeout);
 
 	hci_request_setup(hdev);
 
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index cf2f4a0abdbd..114a9170d809 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -3635,6 +3635,11 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb,
 	if (*opcode != HCI_OP_NOP)
 		cancel_delayed_work(&hdev->cmd_timer);
 
+	if (!ev->ncmd && !test_bit(HCI_RESET, &hdev->flags))
+		schedule_delayed_work(&hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
+	else
+		cancel_delayed_work(&hdev->ncmd_timer);
+
 	if (ev->ncmd && !test_bit(HCI_RESET, &hdev->flags))
 		atomic_set(&hdev->cmd_cnt, 1);
 
@@ -3740,6 +3745,11 @@ static void hci_cmd_status_evt(struct hci_dev *hdev, struct sk_buff *skb,
 	if (*opcode != HCI_OP_NOP)
 		cancel_delayed_work(&hdev->cmd_timer);
 
+	if (!ev->ncmd && !test_bit(HCI_RESET, &hdev->flags))
+		schedule_delayed_work(&hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
+	else
+		cancel_delayed_work(&hdev->ncmd_timer);
+
 	if (ev->ncmd && !test_bit(HCI_RESET, &hdev->flags))
 		atomic_set(&hdev->cmd_cnt, 1);
 
-- 
2.31.0.208.g409f899ff0-goog


^ permalink raw reply related	[flat|nested] 3+ messages in thread

* RE: [v2] Bluetooth: Add ncmd=0 recovery handling
  2021-04-08  2:36 [PATCH v2] Bluetooth: Add ncmd=0 recovery handling Manish Mandlik
@ 2021-04-08  3:31 ` bluez.test.bot
  2021-04-08  8:09 ` [PATCH v2] " Marcel Holtmann
  1 sibling, 0 replies; 3+ messages in thread
From: bluez.test.bot @ 2021-04-08  3:31 UTC (permalink / raw)
  To: linux-bluetooth, mmandlik

[-- Attachment #1: Type: text/plain, Size: 1966 bytes --]

This is automated email and please do not reply to this email!

Dear submitter,

Thank you for submitting the patches to the linux bluetooth mailing list.
This is a CI test results with your patch series:
PW Link:https://patchwork.kernel.org/project/bluetooth/list/?series=462901

---Test result---

##############################
Test: CheckPatch - PASS


##############################
Test: CheckGitLint - PASS


##############################
Test: CheckBuildK - PASS


##############################
Test: CheckTestRunner: Setup - PASS


##############################
Test: CheckTestRunner: l2cap-tester - PASS
Total: 40, Passed: 34 (85.0%), Failed: 0, Not Run: 6

##############################
Test: CheckTestRunner: bnep-tester - PASS
Total: 1, Passed: 1 (100.0%), Failed: 0, Not Run: 0

##############################
Test: CheckTestRunner: mgmt-tester - FAIL
Total: 416, Passed: 396 (95.2%), Failed: 6, Not Run: 14

Failed Test Cases
Set connectable off (LE) - Success 2                 Failed       0.024 seconds
Set connectable off (LE) - Success 3                 Failed       0.024 seconds
Set connectable off (LE) - Success 4                 Failed       0.024 seconds
Add Advertising - Success 13 (ADV_SCAN_IND)          Failed       0.028 seconds
Add Advertising - Success 14 (ADV_NONCONN_IND)       Failed       0.024 seconds
Add Advertising - Success 17 (Connectable -> off)    Failed       0.024 seconds

##############################
Test: CheckTestRunner: rfcomm-tester - PASS
Total: 9, Passed: 9 (100.0%), Failed: 0, Not Run: 0

##############################
Test: CheckTestRunner: sco-tester - PASS
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: CheckTestRunner: smp-tester - PASS
Total: 8, Passed: 8 (100.0%), Failed: 0, Not Run: 0

##############################
Test: CheckTestRunner: userchan-tester - PASS
Total: 3, Passed: 3 (100.0%), Failed: 0, Not Run: 0



---
Regards,
Linux Bluetooth


[-- Attachment #2: l2cap-tester.log --]
[-- Type: application/octet-stream, Size: 43340 bytes --]

[-- Attachment #3: bnep-tester.log --]
[-- Type: application/octet-stream, Size: 3532 bytes --]

[-- Attachment #4: mgmt-tester.log --]
[-- Type: application/octet-stream, Size: 547567 bytes --]

[-- Attachment #5: rfcomm-tester.log --]
[-- Type: application/octet-stream, Size: 11651 bytes --]

[-- Attachment #6: sco-tester.log --]
[-- Type: application/octet-stream, Size: 9887 bytes --]

[-- Attachment #7: smp-tester.log --]
[-- Type: application/octet-stream, Size: 11798 bytes --]

[-- Attachment #8: userchan-tester.log --]
[-- Type: application/octet-stream, Size: 5426 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: [PATCH v2] Bluetooth: Add ncmd=0 recovery handling
  2021-04-08  2:36 [PATCH v2] Bluetooth: Add ncmd=0 recovery handling Manish Mandlik
  2021-04-08  3:31 ` [v2] " bluez.test.bot
@ 2021-04-08  8:09 ` Marcel Holtmann
  1 sibling, 0 replies; 3+ messages in thread
From: Marcel Holtmann @ 2021-04-08  8:09 UTC (permalink / raw)
  To: Manish Mandlik
  Cc: Luiz Augusto von Dentz, linux-bluetooth, Alain Michaud,
	CrosBT Upstreaming, Abhishek Pandit-Subedi, David S. Miller,
	Jakub Kicinski, Johan Hedberg, LKML, netdev

Hi Manish,

> During command status or command complete event, the controller may set
> ncmd=0 indicating that it is not accepting any more commands. In such a
> case, host holds off sending any more commands to the controller. If the
> controller doesn't recover from such condition, host will wait forever,
> until the user decides that the Bluetooth is broken and may power cycles
> the Bluetooth.
> 
> This patch triggers the hardware error to reset the controller and
> driver when it gets into such state as there is no other wat out.
> 
> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> Signed-off-by: Manish Mandlik <mmandlik@google.com>
> ---
> 
> Changes in v2:
> - Emit the hardware error when ncmd=0 occurs
> 
> include/net/bluetooth/hci.h      |  1 +
> include/net/bluetooth/hci_core.h |  1 +
> net/bluetooth/hci_core.c         | 15 +++++++++++++++
> net/bluetooth/hci_event.c        | 10 ++++++++++
> 4 files changed, 27 insertions(+)
> 
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index ea4ae551c426..c4b0650fb9ae 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -339,6 +339,7 @@ enum {
> #define HCI_PAIRING_TIMEOUT	msecs_to_jiffies(60000)	/* 60 seconds */
> #define HCI_INIT_TIMEOUT	msecs_to_jiffies(10000)	/* 10 seconds */
> #define HCI_CMD_TIMEOUT		msecs_to_jiffies(2000)	/* 2 seconds */
> +#define HCI_NCMD_TIMEOUT	msecs_to_jiffies(4000)	/* 4 seconds */
> #define HCI_ACL_TX_TIMEOUT	msecs_to_jiffies(45000)	/* 45 seconds */
> #define HCI_AUTO_OFF_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
> #define HCI_POWER_OFF_TIMEOUT	msecs_to_jiffies(5000)	/* 5 seconds */
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index ebdd4afe30d2..f14692b39fd5 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -470,6 +470,7 @@ struct hci_dev {
> 	struct delayed_work	service_cache;
> 
> 	struct delayed_work	cmd_timer;
> +	struct delayed_work	ncmd_timer;
> 
> 	struct work_struct	rx_work;
> 	struct work_struct	cmd_work;
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index b0d9c36acc03..c102a8763cb5 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -2769,6 +2769,20 @@ static void hci_cmd_timeout(struct work_struct *work)
> 	queue_work(hdev->workqueue, &hdev->cmd_work);
> }
> 
> +/* HCI ncmd timer function */
> +static void hci_ncmd_timeout(struct work_struct *work)
> +{
> +	struct hci_dev *hdev = container_of(work, struct hci_dev,
> +					    ncmd_timer.work);
> +
> +	bt_dev_err(hdev, "Controller not accepting commands anymore: ncmd = 0");
> +
> +	/* This is an irrecoverable state. Inject hw error event to reset
> +	 * the device and driver.
> +	 */
> +	hci_reset_dev(hdev);

	/* This is an irrecoverable state, inject hardware error event */
	hci_reset_dev(hdev);

Since you will not be resetting the driver here. You just tell the core stack to reset itself and with HCI_Reset hopefully bring the hardware back to life. Or if the ncmd=0 is a hardware bug, just start sending a new command.

> +}
> +
> struct oob_data *hci_find_remote_oob_data(struct hci_dev *hdev,
> 					  bdaddr_t *bdaddr, u8 bdaddr_type)
> {
> @@ -3831,6 +3845,7 @@ struct hci_dev *hci_alloc_dev(void)
> 	init_waitqueue_head(&hdev->suspend_wait_q);
> 
> 	INIT_DELAYED_WORK(&hdev->cmd_timer, hci_cmd_timeout);
> +	INIT_DELAYED_WORK(&hdev->ncmd_timer, hci_ncmd_timeout);
> 
> 	hci_request_setup(hdev);
> 
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index cf2f4a0abdbd..114a9170d809 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -3635,6 +3635,11 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb,
> 	if (*opcode != HCI_OP_NOP)
> 		cancel_delayed_work(&hdev->cmd_timer);
> 
> +	if (!ev->ncmd &&!test_bit(HCI_RESET, &hdev->flags))
> +		schedule_delayed_work(&hdev->ncmd_timer, HCI_NCMD_TIMEOUT);
> +	else
> +		cancel_delayed_work(&hdev->ncmd_timer);
> +
> 	if (ev->ncmd && !test_bit(HCI_RESET, &hdev->flags))
> 		atomic_set(&hdev->cmd_cnt, 1);
> 

	if (!test_bit(HCI_RESET, &hdev->flags)) {
		if (ev->ncmd) {
			cancel_delayed_work(&hdev->ncmd_timer);
			atomic_set(&hdev->cmd_cnt, 1);
		} else {
			schedule_delayed_work(&hdev->ncmd_timer,
					      HCI_NCMD_TIMEOUT);
		}
	}

I think doing it this way is a bit cleaner and avoid the check of !ncmd and !HCI_RESET twice.

And I wonder if there isn’t a cancel_delayed_work missing in hci_dev_do_close or some related location when we are shutting down.

What do we do when this happens during HCI_INIT. I think if ncmd_timer triggers during HCI_INIT, then hci_up needs to be aborted and no hardware error event to be injected.

In addition since you are now calling hci_reset_dev also from the core stack (perviously, it was just up to the drivers to do that), I would add an extra error.

diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index fd12f1652bdf..1c9ef5608930 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -4073,6 +4073,8 @@ int hci_reset_dev(struct hci_dev *hdev)
        hci_skb_pkt_type(skb) = HCI_EVENT_PKT;
        skb_put_data(skb, hw_err, 3);
 
+       bt_dev_err(hdev, "Injecting HCI hardware error event");
+
        /* Send Hardware Error to upper stack */
        return hci_recv_frame(hdev, skb);
 }

This has the advantage that if you take a btmon trace, you know this event is injected. Or more precisely eventually will be able to know since we haven’t merged my patches yet that will redirect bt_dev_{err,warn,..} into btmon as well.

Regards

Marcel


^ permalink raw reply related	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2021-04-08  8:10 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-08  2:36 [PATCH v2] Bluetooth: Add ncmd=0 recovery handling Manish Mandlik
2021-04-08  3:31 ` [v2] " bluez.test.bot
2021-04-08  8:09 ` [PATCH v2] " Marcel Holtmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.