linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout
@ 2014-05-14 14:00 Alexander Holler
  2014-05-14 14:00 ` [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s Alexander Holler
  2014-05-31  5:28 ` [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout Marcel Holtmann
  0 siblings, 2 replies; 13+ messages in thread
From: Alexander Holler @ 2014-05-14 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-bluetooth, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	Alexander Holler

I assume the timeout for processing HCI commands was originally intended to
detect hung bluetooth devices and should not include the time needed locally
to handle the response to an HCI command. That is important because the time
needed locally (by the kernel or even userland) to process responses to HCI
commands varies a lot between systems and HCI commands. That's even more true
since many actions to HCI command responses are handled inside works which
might be delayed quiet some time, depending on the actual system load.

So stop the timeout as soon as a response to an HCI command was received.

This fixes various problems which resulted in HCI command timeouts and an
afterwards non-working bluetooth stack, especially on slower systems like
some ARM devices.

Drawback is that in-kernel problems like deadlocks aren't detected by HCI
command timeouts anymore, but such problems should be detected and handled
by other means and not by a timeout where it is hard to specify a value
reasonable for all possible systems (-configurations, -loads).

Furthermore, if the timeout includes local processing of HCI command
responses, in-kernel errors like hung tasks might be masked by the
timeout, because the hung task would be killed by the timeout before
the hung task would be detected (by other means).

Signed-off-by: Alexander Holler <holler@ahsoftware.de>
---
 net/bluetooth/hci_event.c | 12 ++++++------
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 15010a2..94c2dc0 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2338,6 +2338,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
 
 	opcode = __le16_to_cpu(ev->opcode);
 
+	if (opcode != HCI_OP_NOP)
+		del_timer(&hdev->cmd_timer);
+
 	switch (opcode) {
 	case HCI_OP_INQUIRY_CANCEL:
 		hci_cc_inquiry_cancel(hdev, skb);
@@ -2584,9 +2587,6 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
 		break;
 	}
 
-	if (opcode != HCI_OP_NOP)
-		del_timer(&hdev->cmd_timer);
-
 	hci_req_cmd_complete(hdev, opcode, status);
 
 	if (ev->ncmd && !test_bit(HCI_RESET, &hdev->flags)) {
@@ -2605,6 +2605,9 @@ static void hci_cmd_status_evt(struct hci_dev *hdev, struct sk_buff *skb)
 
 	opcode = __le16_to_cpu(ev->opcode);
 
+	if (opcode != HCI_OP_NOP)
+		del_timer(&hdev->cmd_timer);
+
 	switch (opcode) {
 	case HCI_OP_INQUIRY:
 		hci_cs_inquiry(hdev, ev->status);
@@ -2675,9 +2678,6 @@ static void hci_cmd_status_evt(struct hci_dev *hdev, struct sk_buff *skb)
 		break;
 	}
 
-	if (opcode != HCI_OP_NOP)
-		del_timer(&hdev->cmd_timer);
-
 	if (ev->status ||
 	    (hdev->sent_cmd && !bt_cb(hdev->sent_cmd)->req.event))
 		hci_req_cmd_complete(hdev, opcode, ev->status);
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-14 14:00 [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout Alexander Holler
@ 2014-05-14 14:00 ` Alexander Holler
  2014-05-15 12:54   ` Luiz Augusto von Dentz
  2014-05-31  5:32   ` Marcel Holtmann
  2014-05-31  5:28 ` [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout Marcel Holtmann
  1 sibling, 2 replies; 13+ messages in thread
From: Alexander Holler @ 2014-05-14 14:00 UTC (permalink / raw)
  To: linux-kernel
  Cc: linux-bluetooth, Marcel Holtmann, Gustavo Padovan, Johan Hedberg,
	Alexander Holler

The reasoning to do this is the following:

- If a timeout occurs, the HCI-communication is broken afterwards and the
  dongle isn't usable anymore.
- If it works after e.g. waiting 4s everyone is still happy but if it
  just breaks after only waiting 2s nothing is gained.
- Having to wait some more seconds until an error occurs doesn't change
  anything.

So there is no disadvantage in rasing the timeout but a great advantage
in case the dongle needs more than 2s to process an HCI command.
E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
was successfull started). I assume the reason might be the USB-probing which
happend before through the bootloader, which might have confused the dongle
such that it needs a bit more time, but I'm not sure.

Together with the patch which limits the timeout only to the actual time the
dongle needs to process an HCI command (and doesn't include the time the
kernel needs to process the answer to an HCI command), my problems were gone.

Signed-off-by: Alexander Holler <holler@ahsoftware.de>
---
 include/net/bluetooth/hci.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
index be150cf..d50fd34 100644
--- a/include/net/bluetooth/hci.h
+++ b/include/net/bluetooth/hci.h
@@ -180,7 +180,7 @@ enum {
 #define HCI_DISCONN_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
 #define HCI_PAIRING_TIMEOUT	msecs_to_jiffies(60000)	/* 60 seconds */
 #define HCI_INIT_TIMEOUT	msecs_to_jiffies(10000)	/* 10 seconds */
-#define HCI_CMD_TIMEOUT		msecs_to_jiffies(2000)	/* 2 seconds */
+#define HCI_CMD_TIMEOUT		msecs_to_jiffies(8000)	/* 8 seconds */
 #define HCI_ACL_TX_TIMEOUT	msecs_to_jiffies(45000)	/* 45 seconds */
 #define HCI_AUTO_OFF_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
 #define HCI_POWER_OFF_TIMEOUT	msecs_to_jiffies(5000)	/* 5 seconds */
-- 
1.8.3.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-14 14:00 ` [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s Alexander Holler
@ 2014-05-15 12:54   ` Luiz Augusto von Dentz
  2014-05-15 14:50     ` Alexander Holler
  2014-05-31  5:32   ` Marcel Holtmann
  1 sibling, 1 reply; 13+ messages in thread
From: Luiz Augusto von Dentz @ 2014-05-15 12:54 UTC (permalink / raw)
  To: Alexander Holler
  Cc: Linux Kernel Mailing List, linux-bluetooth, Marcel Holtmann,
	Gustavo Padovan, Johan Hedberg

Hi Alexander,

On Wed, May 14, 2014 at 5:00 PM, Alexander Holler <holler@ahsoftware.de> wrote:
> The reasoning to do this is the following:
>
> - If a timeout occurs, the HCI-communication is broken afterwards and the
>   dongle isn't usable anymore.
> - If it works after e.g. waiting 4s everyone is still happy but if it
>   just breaks after only waiting 2s nothing is gained.
> - Having to wait some more seconds until an error occurs doesn't change
>   anything.
>
> So there is no disadvantage in rasing the timeout but a great advantage
> in case the dongle needs more than 2s to process an HCI command.
> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
> was successfull started). I assume the reason might be the USB-probing which
> happend before through the bootloader, which might have confused the dongle
> such that it needs a bit more time, but I'm not sure.
>
> Together with the patch which limits the timeout only to the actual time the
> dongle needs to process an HCI command (and doesn't include the time the
> kernel needs to process the answer to an HCI command), my problems were gone.
>
> Signed-off-by: Alexander Holler <holler@ahsoftware.de>
> ---
>  include/net/bluetooth/hci.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index be150cf..d50fd34 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -180,7 +180,7 @@ enum {
>  #define HCI_DISCONN_TIMEOUT    msecs_to_jiffies(2000)  /* 2 seconds */
>  #define HCI_PAIRING_TIMEOUT    msecs_to_jiffies(60000) /* 60 seconds */
>  #define HCI_INIT_TIMEOUT       msecs_to_jiffies(10000) /* 10 seconds */
> -#define HCI_CMD_TIMEOUT                msecs_to_jiffies(2000)  /* 2 seconds */
> +#define HCI_CMD_TIMEOUT                msecs_to_jiffies(8000)  /* 8 seconds */
>  #define HCI_ACL_TX_TIMEOUT     msecs_to_jiffies(45000) /* 45 seconds */
>  #define HCI_AUTO_OFF_TIMEOUT   msecs_to_jiffies(2000)  /* 2 seconds */
>  #define HCI_POWER_OFF_TIMEOUT  msecs_to_jiffies(5000)  /* 5 seconds */
> --
> 1.8.3.1

This timeout seems arbitrary so I suppose we can increase it if we
feel it is necessary but we used already different timeout for
different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
identify which command is more likely to timeout.

We could perhaps auto reset if a command timeout if there is really no
other way to recover.


-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-15 12:54   ` Luiz Augusto von Dentz
@ 2014-05-15 14:50     ` Alexander Holler
  2014-05-15 15:19       ` Alexander Holler
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Holler @ 2014-05-15 14:50 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Linux Kernel Mailing List, linux-bluetooth, Marcel Holtmann,
	Gustavo Padovan, Johan Hedberg

Am 15.05.2014 14:54, schrieb Luiz Augusto von Dentz:
> Hi Alexander,
> 
> On Wed, May 14, 2014 at 5:00 PM, Alexander Holler <holler@ahsoftware.de> wrote:
>> The reasoning to do this is the following:
>>
>> - If a timeout occurs, the HCI-communication is broken afterwards and the
>>   dongle isn't usable anymore.
>> - If it works after e.g. waiting 4s everyone is still happy but if it
>>   just breaks after only waiting 2s nothing is gained.
>> - Having to wait some more seconds until an error occurs doesn't change
>>   anything.
>>
>> So there is no disadvantage in rasing the timeout but a great advantage
>> in case the dongle needs more than 2s to process an HCI command.
>> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
>> was successfull started). I assume the reason might be the USB-probing which
>> happend before through the bootloader, which might have confused the dongle
>> such that it needs a bit more time, but I'm not sure.
>>
>> Together with the patch which limits the timeout only to the actual time the
>> dongle needs to process an HCI command (and doesn't include the time the
>> kernel needs to process the answer to an HCI command), my problems were gone.
>>
>> Signed-off-by: Alexander Holler <holler@ahsoftware.de>
>> ---
>>  include/net/bluetooth/hci.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
>> index be150cf..d50fd34 100644
>> --- a/include/net/bluetooth/hci.h
>> +++ b/include/net/bluetooth/hci.h
>> @@ -180,7 +180,7 @@ enum {
>>  #define HCI_DISCONN_TIMEOUT    msecs_to_jiffies(2000)  /* 2 seconds */
>>  #define HCI_PAIRING_TIMEOUT    msecs_to_jiffies(60000) /* 60 seconds */
>>  #define HCI_INIT_TIMEOUT       msecs_to_jiffies(10000) /* 10 seconds */
>> -#define HCI_CMD_TIMEOUT                msecs_to_jiffies(2000)  /* 2 seconds */
>> +#define HCI_CMD_TIMEOUT                msecs_to_jiffies(8000)  /* 8 seconds */
>>  #define HCI_ACL_TX_TIMEOUT     msecs_to_jiffies(45000) /* 45 seconds */
>>  #define HCI_AUTO_OFF_TIMEOUT   msecs_to_jiffies(2000)  /* 2 seconds */
>>  #define HCI_POWER_OFF_TIMEOUT  msecs_to_jiffies(5000)  /* 5 seconds */
>> --
>> 1.8.3.1
> 
> This timeout seems arbitrary so I suppose we can increase it if we
> feel it is necessary but we used already different timeout for
> different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
> identify which command is more likely to timeout.
> 
> We could perhaps auto reset if a command timeout if there is really no
> other way to recover.

It is arbitrary but 2s is not enough here. And as I've written in the
description, there is absolutely no reason to keep this timeout
unnecessarily short. No one cares if an error message appears after 2s
or 8s if the communication with the dongle is in both cases broken
afterwards.

One of the commands I experieced the problem with was e.g.
HCI_OP_DELETE_STORED_LINK_KEY or HCI_OP_WRITE_SSP_MODE.

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-15 14:50     ` Alexander Holler
@ 2014-05-15 15:19       ` Alexander Holler
  2014-05-16  5:35         ` Alexander Holler
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Holler @ 2014-05-15 15:19 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Linux Kernel Mailing List, linux-bluetooth, Marcel Holtmann,
	Gustavo Padovan, Johan Hedberg

Am 15.05.2014 16:50, schrieb Alexander Holler:
> Am 15.05.2014 14:54, schrieb Luiz Augusto von Dentz:

>> This timeout seems arbitrary so I suppose we can increase it if we
>> feel it is necessary but we used already different timeout for
>> different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
>> identify which command is more likely to timeout.
>>
>> We could perhaps auto reset if a command timeout if there is really no
>> other way to recover.
>
> It is arbitrary but 2s is not enough here. And as I've written in the
> description, there is absolutely no reason to keep this timeout
> unnecessarily short. No one cares if an error message appears after 2s
> or 8s if the communication with the dongle is in both cases broken
> afterwards.
>
> One of the commands I experieced the problem with was e.g.
> HCI_OP_DELETE_STORED_LINK_KEY or HCI_OP_WRITE_SSP_MODE.

The problem is that you can never be sure what the origin of a timeouted 
command was. It might have been e.g. the USB-subsystem through wich the 
command and the response has to travel (in case of USB dongles) and not 
the dongle itself.

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-15 15:19       ` Alexander Holler
@ 2014-05-16  5:35         ` Alexander Holler
  2014-05-27 15:19           ` Alexander Holler
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Holler @ 2014-05-16  5:35 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Linux Kernel Mailing List, linux-bluetooth, Marcel Holtmann,
	Gustavo Padovan, Johan Hedberg

Am 15.05.2014 17:19, schrieb Alexander Holler:
> Am 15.05.2014 16:50, schrieb Alexander Holler:
>> Am 15.05.2014 14:54, schrieb Luiz Augusto von Dentz:
>
>>> This timeout seems arbitrary so I suppose we can increase it if we
>>> feel it is necessary but we used already different timeout for
>>> different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
>>> identify which command is more likely to timeout.
>>>
>>> We could perhaps auto reset if a command timeout if there is really no
>>> other way to recover.
>>
>> It is arbitrary but 2s is not enough here. And as I've written in the
>> description, there is absolutely no reason to keep this timeout
>> unnecessarily short. No one cares if an error message appears after 2s
>> or 8s if the communication with the dongle is in both cases broken
>> afterwards.
>>
>> One of the commands I experieced the problem with was e.g.
>> HCI_OP_DELETE_STORED_LINK_KEY or HCI_OP_WRITE_SSP_MODE.
>
> The problem is that you can never be sure what the origin of a timeouted
> command was. It might have been e.g. the USB-subsystem through wich the
> command and the response has to travel (in case of USB dongles) and not
> the dongle itself.

To explain a bit more, the box I'm experiencing these problems boots 
from USB2.0 HD. So it's likely that there is quiet some action on the 
bus and that shouldn't affect the operation of the BT-stack (besides 
slowing it maybe a bit down).

Regards,

Alexander Holler


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-16  5:35         ` Alexander Holler
@ 2014-05-27 15:19           ` Alexander Holler
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Holler @ 2014-05-27 15:19 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Linux Kernel Mailing List, linux-bluetooth, Marcel Holtmann,
	Gustavo Padovan, Johan Hedberg

Am 16.05.2014 07:35, schrieb Alexander Holler:
> Am 15.05.2014 17:19, schrieb Alexander Holler:
>> Am 15.05.2014 16:50, schrieb Alexander Holler:
>>> Am 15.05.2014 14:54, schrieb Luiz Augusto von Dentz:
>>
>>>> This timeout seems arbitrary so I suppose we can increase it if we
>>>> feel it is necessary but we used already different timeout for
>>>> different commands like HCI_POWER_OFF_TIMEOUT, so perhaps if we can
>>>> identify which command is more likely to timeout.
>>>>
>>>> We could perhaps auto reset if a command timeout if there is really no
>>>> other way to recover.
>>>
>>> It is arbitrary but 2s is not enough here. And as I've written in the
>>> description, there is absolutely no reason to keep this timeout
>>> unnecessarily short. No one cares if an error message appears after 2s
>>> or 8s if the communication with the dongle is in both cases broken
>>> afterwards.
>>>
>>> One of the commands I experieced the problem with was e.g.
>>> HCI_OP_DELETE_STORED_LINK_KEY or HCI_OP_WRITE_SSP_MODE.
>>
>> The problem is that you can never be sure what the origin of a timeouted
>> command was. It might have been e.g. the USB-subsystem through wich the
>> command and the response has to travel (in case of USB dongles) and not
>> the dongle itself.
>
> To explain a bit more, the box I'm experiencing these problems boots
> from USB2.0 HD. So it's likely that there is quiet some action on the
> bus and that shouldn't affect the operation of the BT-stack (besides
> slowing it maybe a bit down).

Anything wrong with my conclusion?

I don't know what's the origin of the current timeout of 2s, but I don't 
think it takes the used transport into account.

So if the source of these 2s is the spec for BT-devices, the 2s would 
make sense as a timeout to test BT-devices, but it already becomes an 
arbitrary value whenever the transport isn't realtime and/or exclusive 
for the device but e.g. an I/O scheduler is inbetween (which is the case 
for USB).

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout
  2014-05-14 14:00 [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout Alexander Holler
  2014-05-14 14:00 ` [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s Alexander Holler
@ 2014-05-31  5:28 ` Marcel Holtmann
  2014-05-31  6:45   ` Alexander Holler
  1 sibling, 1 reply; 13+ messages in thread
From: Marcel Holtmann @ 2014-05-31  5:28 UTC (permalink / raw)
  To: Alexander Holler; +Cc: LKML, linux-bluetooth, Gustavo F. Padovan, Johan Hedberg

Hi Alexander,

> I assume the timeout for processing HCI commands was originally intended to
> detect hung bluetooth devices and should not include the time needed locally
> to handle the response to an HCI command. That is important because the time
> needed locally (by the kernel or even userland) to process responses to HCI
> commands varies a lot between systems and HCI commands. That's even more true
> since many actions to HCI command responses are handled inside works which
> might be delayed quiet some time, depending on the actual system load.
> 
> So stop the timeout as soon as a response to an HCI command was received.
> 
> This fixes various problems which resulted in HCI command timeouts and an
> afterwards non-working bluetooth stack, especially on slower systems like
> some ARM devices.
> 
> Drawback is that in-kernel problems like deadlocks aren't detected by HCI
> command timeouts anymore, but such problems should be detected and handled
> by other means and not by a timeout where it is hard to specify a value
> reasonable for all possible systems (-configurations, -loads).
> 
> Furthermore, if the timeout includes local processing of HCI command
> responses, in-kernel errors like hung tasks might be masked by the
> timeout, because the hung task would be killed by the timeout before
> the hung task would be detected (by other means).
> 
> Signed-off-by: Alexander Holler <holler@ahsoftware.de>
> ---
> net/bluetooth/hci_event.c | 12 ++++++------
> 1 file changed, 6 insertions(+), 6 deletions(-)
> 
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 15010a2..94c2dc0 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -2338,6 +2338,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
> 
> 	opcode = __le16_to_cpu(ev->opcode);
> 
> +	if (opcode != HCI_OP_NOP)
> +		del_timer(&hdev->cmd_timer);
> +

so I actually wonder if we should move away from timer and move to a delayed work item to handle the timeout and if that would actually fix this issue.

Regards

Marcel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-14 14:00 ` [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s Alexander Holler
  2014-05-15 12:54   ` Luiz Augusto von Dentz
@ 2014-05-31  5:32   ` Marcel Holtmann
  2014-05-31  6:36     ` Alexander Holler
  1 sibling, 1 reply; 13+ messages in thread
From: Marcel Holtmann @ 2014-05-31  5:32 UTC (permalink / raw)
  To: Alexander Holler
  Cc: linux-kernel, linux-bluetooth, Gustavo F. Padovan, Johan Hedberg

Hi Alexander,

> The reasoning to do this is the following:
> 
> - If a timeout occurs, the HCI-communication is broken afterwards and the
>  dongle isn't usable anymore.
> - If it works after e.g. waiting 4s everyone is still happy but if it
>  just breaks after only waiting 2s nothing is gained.
> - Having to wait some more seconds until an error occurs doesn't change
>  anything.
> 
> So there is no disadvantage in rasing the timeout but a great advantage
> in case the dongle needs more than 2s to process an HCI command.
> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
> was successfull started). I assume the reason might be the USB-probing which
> happend before through the bootloader, which might have confused the dongle
> such that it needs a bit more time, but I'm not sure.
> 
> Together with the patch which limits the timeout only to the actual time the
> dongle needs to process an HCI command (and doesn't include the time the
> kernel needs to process the answer to an HCI command), my problems were gone.
> 
> Signed-off-by: Alexander Holler <holler@ahsoftware.de>
> ---
> include/net/bluetooth/hci.h | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
> index be150cf..d50fd34 100644
> --- a/include/net/bluetooth/hci.h
> +++ b/include/net/bluetooth/hci.h
> @@ -180,7 +180,7 @@ enum {
> #define HCI_DISCONN_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
> #define HCI_PAIRING_TIMEOUT	msecs_to_jiffies(60000)	/* 60 seconds */
> #define HCI_INIT_TIMEOUT	msecs_to_jiffies(10000)	/* 10 seconds */
> -#define HCI_CMD_TIMEOUT		msecs_to_jiffies(2000)	/* 2 seconds */
> +#define HCI_CMD_TIMEOUT		msecs_to_jiffies(8000)	/* 8 seconds */
> #define HCI_ACL_TX_TIMEOUT	msecs_to_jiffies(45000)	/* 45 seconds */
> #define HCI_AUTO_OFF_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
> #define HCI_POWER_OFF_TIMEOUT	msecs_to_jiffies(5000)	/* 5 seconds */

I think moving the command timeout handling into a delayed work struct might actually solve this problem nicely and does not force us to increase the timeout. A chip that does not respond for 8 seconds is a pretty bad chip.

Regards

Marcel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-31  5:32   ` Marcel Holtmann
@ 2014-05-31  6:36     ` Alexander Holler
  2014-06-01  1:42       ` Marcel Holtmann
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Holler @ 2014-05-31  6:36 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: linux-kernel, linux-bluetooth, Gustavo F. Padovan, Johan Hedberg

Am 31.05.2014 07:32, schrieb Marcel Holtmann:
> Hi Alexander,
> 
>> The reasoning to do this is the following:
>>
>> - If a timeout occurs, the HCI-communication is broken afterwards and the
>>  dongle isn't usable anymore.
>> - If it works after e.g. waiting 4s everyone is still happy but if it
>>  just breaks after only waiting 2s nothing is gained.
>> - Having to wait some more seconds until an error occurs doesn't change
>>  anything.
>>
>> So there is no disadvantage in rasing the timeout but a great advantage
>> in case the dongle needs more than 2s to process an HCI command.
>> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
>> was successfull started). I assume the reason might be the USB-probing which
>> happend before through the bootloader, which might have confused the dongle
>> such that it needs a bit more time, but I'm not sure.
>>
>> Together with the patch which limits the timeout only to the actual time the
>> dongle needs to process an HCI command (and doesn't include the time the
>> kernel needs to process the answer to an HCI command), my problems were gone.
>>
>> Signed-off-by: Alexander Holler <holler@ahsoftware.de>
>> ---
>> include/net/bluetooth/hci.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
>> index be150cf..d50fd34 100644
>> --- a/include/net/bluetooth/hci.h
>> +++ b/include/net/bluetooth/hci.h
>> @@ -180,7 +180,7 @@ enum {
>> #define HCI_DISCONN_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
>> #define HCI_PAIRING_TIMEOUT	msecs_to_jiffies(60000)	/* 60 seconds */
>> #define HCI_INIT_TIMEOUT	msecs_to_jiffies(10000)	/* 10 seconds */
>> -#define HCI_CMD_TIMEOUT		msecs_to_jiffies(2000)	/* 2 seconds */
>> +#define HCI_CMD_TIMEOUT		msecs_to_jiffies(8000)	/* 8 seconds */
>> #define HCI_ACL_TX_TIMEOUT	msecs_to_jiffies(45000)	/* 45 seconds */
>> #define HCI_AUTO_OFF_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
>> #define HCI_POWER_OFF_TIMEOUT	msecs_to_jiffies(5000)	/* 5 seconds */
> 
> I think moving the command timeout handling into a delayed work struct might actually solve this problem nicely and does not force us to increase the timeout. A chip that does not respond for 8 seconds is a pretty bad chip.

As I said in another mail, I don't think it is the chip. On the system
where I'm experiencing these timeouts there's still the USB-subsysten
inbetween. And this system boots from USB too, which means there's a lot
of other traffic on the USB-bus besides the one for the USB-BT-dongle.
And I don't know how the USB-stack (and hw) schedules the traffic, if he
is able to schedule that at all.

Regards,

Alexander

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout
  2014-05-31  5:28 ` [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout Marcel Holtmann
@ 2014-05-31  6:45   ` Alexander Holler
  2014-05-31  7:09     ` Alexander Holler
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Holler @ 2014-05-31  6:45 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: LKML, linux-bluetooth, Gustavo F. Padovan, Johan Hedberg

Am 31.05.2014 07:28, schrieb Marcel Holtmann:
> Hi Alexander,
> 
>> I assume the timeout for processing HCI commands was originally intended to
>> detect hung bluetooth devices and should not include the time needed locally
>> to handle the response to an HCI command. That is important because the time
>> needed locally (by the kernel or even userland) to process responses to HCI
>> commands varies a lot between systems and HCI commands. That's even more true
>> since many actions to HCI command responses are handled inside works which
>> might be delayed quiet some time, depending on the actual system load.
>>
>> So stop the timeout as soon as a response to an HCI command was received.
>>
>> This fixes various problems which resulted in HCI command timeouts and an
>> afterwards non-working bluetooth stack, especially on slower systems like
>> some ARM devices.
>>
>> Drawback is that in-kernel problems like deadlocks aren't detected by HCI
>> command timeouts anymore, but such problems should be detected and handled
>> by other means and not by a timeout where it is hard to specify a value
>> reasonable for all possible systems (-configurations, -loads).
>>
>> Furthermore, if the timeout includes local processing of HCI command
>> responses, in-kernel errors like hung tasks might be masked by the
>> timeout, because the hung task would be killed by the timeout before
>> the hung task would be detected (by other means).
>>
>> Signed-off-by: Alexander Holler <holler@ahsoftware.de>
>> ---
>> net/bluetooth/hci_event.c | 12 ++++++------
>> 1 file changed, 6 insertions(+), 6 deletions(-)
>>
>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>> index 15010a2..94c2dc0 100644
>> --- a/net/bluetooth/hci_event.c
>> +++ b/net/bluetooth/hci_event.c
>> @@ -2338,6 +2338,9 @@ static void hci_cmd_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
>>
>> 	opcode = __le16_to_cpu(ev->opcode);
>>
>> +	if (opcode != HCI_OP_NOP)
>> +		del_timer(&hdev->cmd_timer);
>> +
> 
> so I actually wonder if we should move away from timer and move to a delayed work item to handle the timeout and if that would actually fix this issue.

The problem is that I have absolutely no clue where these timeouts do
come from. They appear for different commands and almost always only at
boot. If the machine did come up without hci command timeouts, there
never was one afterwards. So I digged in the dark and the above patch
was one of the results. But I still had to increase the command timeout.
And I can't do much testing as I use this box. I just experience this
problem almost always when I boot it (to do kernel updates) and since a
very long time (more than a year I think).

Regards,

Alexander Holler

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout
  2014-05-31  6:45   ` Alexander Holler
@ 2014-05-31  7:09     ` Alexander Holler
  0 siblings, 0 replies; 13+ messages in thread
From: Alexander Holler @ 2014-05-31  7:09 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: LKML, linux-bluetooth, Gustavo F. Padovan, Johan Hedberg

Am 31.05.2014 08:45, schrieb Alexander Holler:
> Am 31.05.2014 07:28, schrieb Marcel Holtmann:

>> so I actually wonder if we should move away from timer and move to a delayed work item to handle the timeout and if that would actually fix this issue.
>
> The problem is that I have absolutely no clue where these timeouts do
> come from. They appear for different commands and almost always only at
> boot. If the machine did come up without hci command timeouts, there
> never was one afterwards. So I digged in the dark and the above patch
> was one of the results. But I still had to increase the command timeout.
> And I can't do much testing as I use this box. I just experience this
> problem almost always when I boot it (to do kernel updates) and since a
> very long time (more than a year I think).

And I should mention that I had these timeouts with a different dongle 
too (I switched from a bt 3.x dongle to a bt 4.x dongle around a year 
ago). But as said, in the commit msg, the old behaviour might have 
masked different problems like deadlocks or similiar too, so I just 
might have experienced several different problems.

Regards,

Alexander Holler

.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s
  2014-05-31  6:36     ` Alexander Holler
@ 2014-06-01  1:42       ` Marcel Holtmann
  0 siblings, 0 replies; 13+ messages in thread
From: Marcel Holtmann @ 2014-06-01  1:42 UTC (permalink / raw)
  To: Alexander Holler; +Cc: LKML, linux-bluetooth, Gustavo F. Padovan, Johan Hedberg

Hi Alexander,

>>> The reasoning to do this is the following:
>>> 
>>> - If a timeout occurs, the HCI-communication is broken afterwards and the
>>> dongle isn't usable anymore.
>>> - If it works after e.g. waiting 4s everyone is still happy but if it
>>> just breaks after only waiting 2s nothing is gained.
>>> - Having to wait some more seconds until an error occurs doesn't change
>>> anything.
>>> 
>>> So there is no disadvantage in rasing the timeout but a great advantage
>>> in case the dongle needs more than 2s to process an HCI command.
>>> E.g. I had sometimes HCI command timeouts at boot (but never after the BT stack
>>> was successfull started). I assume the reason might be the USB-probing which
>>> happend before through the bootloader, which might have confused the dongle
>>> such that it needs a bit more time, but I'm not sure.
>>> 
>>> Together with the patch which limits the timeout only to the actual time the
>>> dongle needs to process an HCI command (and doesn't include the time the
>>> kernel needs to process the answer to an HCI command), my problems were gone.
>>> 
>>> Signed-off-by: Alexander Holler <holler@ahsoftware.de>
>>> ---
>>> include/net/bluetooth/hci.h | 2 +-
>>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>> 
>>> diff --git a/include/net/bluetooth/hci.h b/include/net/bluetooth/hci.h
>>> index be150cf..d50fd34 100644
>>> --- a/include/net/bluetooth/hci.h
>>> +++ b/include/net/bluetooth/hci.h
>>> @@ -180,7 +180,7 @@ enum {
>>> #define HCI_DISCONN_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
>>> #define HCI_PAIRING_TIMEOUT	msecs_to_jiffies(60000)	/* 60 seconds */
>>> #define HCI_INIT_TIMEOUT	msecs_to_jiffies(10000)	/* 10 seconds */
>>> -#define HCI_CMD_TIMEOUT		msecs_to_jiffies(2000)	/* 2 seconds */
>>> +#define HCI_CMD_TIMEOUT		msecs_to_jiffies(8000)	/* 8 seconds */
>>> #define HCI_ACL_TX_TIMEOUT	msecs_to_jiffies(45000)	/* 45 seconds */
>>> #define HCI_AUTO_OFF_TIMEOUT	msecs_to_jiffies(2000)	/* 2 seconds */
>>> #define HCI_POWER_OFF_TIMEOUT	msecs_to_jiffies(5000)	/* 5 seconds */
>> 
>> I think moving the command timeout handling into a delayed work struct might actually solve this problem nicely and does not force us to increase the timeout. A chip that does not respond for 8 seconds is a pretty bad chip.
> 
> As I said in another mail, I don't think it is the chip. On the system
> where I'm experiencing these timeouts there's still the USB-subsysten
> inbetween. And this system boots from USB too, which means there's a lot
> of other traffic on the USB-bus besides the one for the USB-BT-dongle.
> And I don't know how the USB-stack (and hw) schedules the traffic, if he
> is able to schedule that at all.

I posted an experimental patch for changing cmd_timer into a delayed_work item and this might solve your scheduling problem. However that patch is untested and might needs some additional thinking on which queue the delayed work should be scheduled on. Nevertheless, you see the direction that we should explore first.

The L2CAP layer is not using struct timer_list at all anymore and in general we should remove timer_list usage from the Bluetooth subsystem since we do all our command processing in a workqueue. At least for lower protocols like HCI, L2CAP, SMP and A2DP.

Regards

Marcel


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2014-06-01  1:43 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-05-14 14:00 [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout Alexander Holler
2014-05-14 14:00 ` [PATCH 2/2] bluetooth: raise HCI_CMD_TIMEOUT from 2s to 8s Alexander Holler
2014-05-15 12:54   ` Luiz Augusto von Dentz
2014-05-15 14:50     ` Alexander Holler
2014-05-15 15:19       ` Alexander Holler
2014-05-16  5:35         ` Alexander Holler
2014-05-27 15:19           ` Alexander Holler
2014-05-31  5:32   ` Marcel Holtmann
2014-05-31  6:36     ` Alexander Holler
2014-06-01  1:42       ` Marcel Holtmann
2014-05-31  5:28 ` [PATCH 1/2] bluetooth: don't include local processing of HCI commands in the command timeout Marcel Holtmann
2014-05-31  6:45   ` Alexander Holler
2014-05-31  7:09     ` Alexander Holler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).