All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes
@ 2017-03-28 17:50 Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work() Dean Jenkins
                   ` (16 more replies)
  0 siblings, 17 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

This is RFC patchset V1 which reorganises hci_uart_tty_close() to overcome a
design flaw. I would like some comments on the changes.

Design Flaw
===========

An example callstack is as follows

Have Bluetooth running using a BCSP based UART Bluetooth Radio Module.

Now kill the userland hciattach program by doing
killall hciattach

When hciattach terminates, it calls ioctl TIOCSETD to run:

tty_set_ldisc() takes a tty ref lock via tty_ldisc_lock
tty_ldisc_close()
hci_uart_tty_close()
hci_unregister_dev()
hci_dev_do_close()
__hci_req_sync() which tries to send a HCI RESET command which depends on
HCI_QUIRK_RESET_ON_CLOSE being enabled and that is the default condition.

The design flaw is that in order to send a HCI RESET command the BCSP Data Link
protocol layer must be able to send and receive BCSP frames from the UART port
that is physically connected to the BCSP based UART Bluetooth Radio Module. If
this data "pipe" is broken then BCSP will attempt to retransmit every 250ms
until the BCSP layer is closed. In addition, the HCI RESET command has a 2
second timeout which will block __hci_req_sync() for 2 seconds. Enabling
BT_DBG() shows BCSP attempting to retransmit during hci_uart_tty_close().

Meanwhile, tty_set_ldisc() has a tty ref lock which prevents flush_to_ldisc()
from passing the UART port data to the BCSP layer causing a loss of RX BCSP
frames. Similarly, the hci_uart_tty_wakeup() no longer runs. These commits do
not fix this tty ref lock issue.

This patchset removes hci_ldisc.c hci_uart_tty_close() breakages that prevent
__hci_req_sync() from being successful. There are also race conditions due to
the BCSP timeout handling being asynchronous to the thread running
hci_uart_tty_close() which can cause kernel crashes in particular when BCSP is
already retransmitting due to broken communications. Therefore, disabling
the sending of the HCI RESET command does not prevent some of the races.

Solution
========

Reorganise hci_uart_tty_close() so that hci_unregister_dev() can run with no
interference to the data pipe between the Data Link protocol layer such as BCSP
and the UART driver.

The patchset cleans up the HCI_UART_REGISTERED and HCI_UART_PROTO_READY flag
handling so that:
a) hdev is only valid when HCI_UART_REGISTERED is in the set state.
b) the proto Data Link functions pointers can only be used when
   HCI_UART_PROTO_READY is in the set state.

Note that flushing can corrupt any data being sent from BCSP to the UART driver
which is undesirable.

The most important patch is
"Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races"
which adds rwlocking around the clearing of the flag HCI_UART_PROTO_READY.
This is because the atomic flag HCI_UART_PROTO_READY cannot always prevent a
thread using a Data Link protocol layer function pointer during or after closure
of the Data Link protocol layer. This can result in a kernel crash. Remember
that the BCSP retransmission handling runs in a separate thread and tries to
send a new frame. Therefore there is a small window of a race condition for the
flag HCI_UART_PROTO_READY to be seen in the set state and then calls the
proto function pointer after the Data Link protocol layer has been closed,
resulting in a kernel crash.

Next steps
==========

I am still testing these changes but I would like some feedback on the proposed
changes.

I will reply to any feedback next week.

Thanks.


Dean Jenkins (16):
  Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
  Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
  Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY
  Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev()
    call
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
  Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close()
  Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in
    hci_uart_tty_close()
  Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg
  Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close()
  Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync()
  Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb
  Bluetooth: hci_ldisc: Simplify flushing
  Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races
  Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency

 drivers/bluetooth/hci_ldisc.c | 105 ++++++++++++++++++++++++++++++++----------
 drivers/bluetooth/hci_uart.h  |   3 ++
 2 files changed, 84 insertions(+), 24 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC V1 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 02/16] Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev Dean Jenkins
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

If hci_register_dev() returns an error in hci_uart_init_work()
then the HCI_UART_REGISTERED bit gets erroneously set due to
a missing return statement. Therefore, add the missing return
statement.

The consequence of the missing return is that the HCI UART is not
registered but HCI_UART_REGISTERED is set which allows the code
to think that hu->hdev is safe to access but hu->hdev has been
freed so could lead to a crash.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 9497c46..3a65414 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -186,6 +186,7 @@ static void hci_uart_init_work(struct work_struct *work)
 		hci_free_dev(hu->hdev);
 		hu->hdev = NULL;
 		hu->proto->close(hu);
+		return;
 	}
 
 	set_bit(HCI_UART_REGISTERED, &hu->flags);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 02/16] Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work() Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 03/16] Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY Dean Jenkins
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

When hci_register_dev() fails, hu->hdev should be set to NULL before
freeing hdev. This avoids potential use of hu->hdev after it has been
freed.

This commit sets hu->hdev to NULL before calling hci_free_dev() in error
handling scenarios in hci_uart_init_work() and hci_uart_register_dev().

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 3a65414..a351cc7 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -176,6 +176,7 @@ static void hci_uart_init_work(struct work_struct *work)
 {
 	struct hci_uart *hu = container_of(work, struct hci_uart, init_ready);
 	int err;
+	struct hci_dev *hdev;
 
 	if (!test_and_clear_bit(HCI_UART_INIT_PENDING, &hu->hdev_flags))
 		return;
@@ -183,8 +184,9 @@ static void hci_uart_init_work(struct work_struct *work)
 	err = hci_register_dev(hu->hdev);
 	if (err < 0) {
 		BT_ERR("Can't register HCI device");
-		hci_free_dev(hu->hdev);
+		hdev = hu->hdev;
 		hu->hdev = NULL;
+		hci_free_dev(hdev);
 		hu->proto->close(hu);
 		return;
 	}
@@ -617,6 +619,7 @@ static int hci_uart_register_dev(struct hci_uart *hu)
 
 	if (hci_register_dev(hdev) < 0) {
 		BT_ERR("Can't register HCI device");
+		hu->hdev = NULL;
 		hci_free_dev(hdev);
 		return -ENODEV;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 03/16] Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work() Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 02/16] Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call Dean Jenkins
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Ensure that HCI_UART_PROTO_READY is cleared before close(hu) is
called which closes the Data Link protocol layer.

Therefore, add the missing bit clear of HCI_UART_PROTO_READY to
hci_uart_init_work() so that the flag is cleared when
hci_register_dev fails.

Without the fix, the functions of the Data Link protocol layer could
potentially be accessed after that layer has been closed. This
could lead to a crash as memory would have been freed in that layer.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index a351cc7..1887ad4 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -187,6 +187,7 @@ static void hci_uart_init_work(struct work_struct *work)
 		hdev = hu->hdev;
 		hu->hdev = NULL;
 		hci_free_dev(hdev);
+		clear_bit(HCI_UART_PROTO_READY, &hu->flags);
 		hu->proto->close(hu);
 		return;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (2 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 03/16] Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-30 10:11   ` Marcel Holtmann
  2017-03-28 17:50 ` [RFC V1 05/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame() Dean Jenkins
                   ` (12 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

This commit relates to the HCI UART quirk HCI_QUIRK_RESET_ON_CLOSE
which is enabled by default and can be disabled by setting hdev_flags
flag HCI_UART_RESET_ON_INIT via ioctl HCIUARTSETFLAGS from userland.

The implementation of HCI_QUIRK_RESET_ON_CLOSE is broken for the BCSP
Data Link protocol layer because it can be seen that
hci_unregister_dev() takes 2 seconds to run due to BCSP communications
failure. Suspect that sending of the HCI RESET command is broken
for the other Data Link protocols as well.

Therefore, add a comment to inform that the call to
hci_unregister_dev() in hci_uart_tty_close() may send a HCI RESET
command. This transmission needs the HCI UART driver to remain
operational including having the Data Link protocol being bound to
the HCI UART driver. If the transmission fails, then
hci_unregister_dev() waits HCI_CMD_TIMEOUT (2) seconds for the
timeout to occur which is undesirable.

The current software has a call to hci_unregister_dev(hdev) in
hci_uart_tty_close() which can cause an attempt at sending the
command HCI RESET to occur through the following callstack:

hci_uart_tty_close()
hci_unregister_dev()
hci_dev_do_close()
__hci_req_sync() to queue up command HCI RESET
which adds a work-item to the hdev->workqueue and waits 2 seconds
for a response

Subsequently
hdev->workqueue runs and processes the work-item by calling
hci_cmd_work()
hci_send_frame()
hci_uart_send_frame() to enqueue into the Data Link protocol layer

No protocol response comes so hci_unregister_dev() takes 2 seconds to
run!

In fact, this hidden transmission of the HCI RESET command is most
likely the root cause of why hci_uart_tty_close() crashed in the past
and was "fixed" with multiple workarounds in the past.

The underlying design flaw is that hci_uart_tty_close() is interfering
with various aspects of transmitting and receiving HCI messages
whilst hci_unregister_dev() is trying to use the Data Link protocol
to send the HCI RESET command. Consequently, the design flaw
causes the Data Link protocol to retransmit as no reply comes from
the Bluetooth Radio module over the UART link.

Over the many years of "fixes", this caused the logic in
hci_uart_tty_close() to become over-complex.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 1887ad4..c78c9dc 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -499,6 +499,12 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 	if (test_and_clear_bit(HCI_UART_PROTO_READY, &hu->flags)) {
 		if (hdev) {
 			if (test_bit(HCI_UART_REGISTERED, &hu->flags))
+				/* Note hci_unregister_dev() may try to send
+				 * a HCI RESET command. If the transmission
+				 * fails then hci_unregister_dev() waits
+				 * HCI_CMD_TIMEOUT (2) seconds for the timeout
+				 * to occur.
+				 */
 				hci_unregister_dev(hdev);
 			hci_free_dev(hdev);
 		}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 05/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (3 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 06/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue() Dean Jenkins
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Before attempting to send a HCI message, check that the Data Link
protocol is still bound to the HCI UART driver. This makes the code
consistent with the usage of the other proto function pointers.

Therefore, add a check for HCI_UART_PROTO_READY into hci_uart_send_frame()
and return -EUNATCH if the Data Link protocol is not bound.

This also allows hci_send_frame() to report the error of an unbound
Data Link protocol layer. Therefore, it assists with diagnostics into
why HCI messages are being sent when the Data Link protocol is not
bound and avoids potential crashes.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index c78c9dc..eeea305 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -255,6 +255,9 @@ static int hci_uart_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
 	BT_DBG("%s: type %d len %d", hdev->name, hci_skb_pkt_type(skb),
 	       skb->len);
 
+	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
+		return -EUNATCH;
+
 	hu->proto->enqueue(hu, skb);
 
 	hci_uart_tx_wakeup(hu);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 06/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (4 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 05/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame() Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() Dean Jenkins
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Before attempting to dequeue a Data Link protocol encapsulated message,
check that the Data Link protocol is still bound to the HCI UART driver.
This makes the code consistent with the usage of the other proto
function pointers.

Therefore, add a check for HCI_UART_PROTO_READY into hci_uart_dequeue()
and return NULL if the Data Link protocol is not bound.

This is needed for robustness as there is a scheduling race condition.
hci_uart_write_work() is scheduled to run via work queue hu->write_work
from hci_uart_tx_wakeup(). Therefore, there is a delay between
scheduling hci_uart_write_work() to run and hci_uart_dequeue() running
whereby the Data Link protocol layer could become unbound during the
scheduling delay. In this case, without the check, the call to the
unbound Data Link protocol layer dequeue function can crash.

It is noted that hci_uart_tty_close() has a
"cancel_work_sync(&hu->write_work)" statement but this only reduces
the window of the race condition because it is possible for a new
work-item to be added to work queue hu->write_work after the call to
cancel_work_sync(). For example, Data Link layer retransmissions can
be added to the work queue after the cancel_work_sync() has finished.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index eeea305..c4b801b 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -113,10 +113,12 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
 {
 	struct sk_buff *skb = hu->tx_skb;
 
-	if (!skb)
-		skb = hu->proto->dequeue(hu);
-	else
+	if (!skb) {
+		if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
+			skb = hu->proto->dequeue(hu);
+	} else {
 		hu->tx_skb = NULL;
+	}
 
 	return skb;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (5 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 06/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue() Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-30 10:11   ` Marcel Holtmann
  2017-03-28 17:50 ` [RFC V1 08/16] Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close() Dean Jenkins
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Before attempting to schedule a work-item onto hu->write_work in
hci_uart_tx_wakeup(), check that the Data Link protocol layer is
still bound to the HCI UART driver.

Failure to perform this protocol check causes a race condition between
the work queue hu->write_work running hci_uart_write_work() and the
Data Link protocol layer being unbound (closed) in hci_uart_tty_close().

Note hci_uart_tty_close() does have a "cancel_work_sync(&hu->write_work)"
but it is ineffective because it cannot prevent work-items being added
to hu->write_work after cancel_work_sync() has run.

Therefore, add a check for HCI_UART_PROTO_READY into hci_uart_tx_wakeup()
which prevents scheduling of the work queue when HCI_UART_PROTO_READY
is in the clear state. However, note a small race condition remains
because the hci_uart_tx_wakeup() thread can run in parallel with the
hci_uart_tty_close() thread so it is possible that a schedule of
hu->write_work can occur when HCI_UART_PROTO_READY is cleared. A complete
solution needs locking of the threads which is implemented in a future
commit.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index c4b801b..a1bb4d64b9 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -125,15 +125,19 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
 
 int hci_uart_tx_wakeup(struct hci_uart *hu)
 {
+	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
+		goto no_schedule;
+
 	if (test_and_set_bit(HCI_UART_SENDING, &hu->tx_state)) {
 		set_bit(HCI_UART_TX_WAKEUP, &hu->tx_state);
-		return 0;
+		goto no_schedule;
 	}
 
 	BT_DBG("");
 
 	schedule_work(&hu->write_work);
 
+no_schedule:
 	return 0;
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 08/16] Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close()
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (6 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 09/16] Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED " Dean Jenkins
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

The treatment of flags HCI_UART_REGISTERED and HCI_UART_PROTO_READY
in hci_uart_tty_close() was combined in multiple if statements. This
is over-complex and can be simplified.

The simplification is to separate the processing of the flags
HCI_UART_REGISTERED and HCI_UART_PROTO_READY in hci_uart_tty_close()
because registration of the HCI UART device is actually independent of
the binding of the Data Link protocol layer. The Data Link protocol
layer is bound before the HCI UART device is registered and therefore
unregister the HCI UART device before unbinding the Data Link protocol
layer. However, the software should not break if the Data Link protocol
somehow becomes unbound whilst the HCI UART device is still registered.

One of the reasons the software got over complex was because
hci_uart_send_frame() did not check the status of the
HCI_UART_PROTO_READY flag which meant that the hdev->workqueue could
operate on an unbound Data Link protocol layer. The complexity of the
logic prevented hci_uart_send_frame() from crashing.

The commit
"Bluetooth: Add protocol check to hci_uart_send_frame()"
for adding the HCI_UART_PROTO_READY flag check to hci_uart_send_frame()
is necessary to allow this simplification to work. Otherwise,
hci_uart_send_frame() can attempt to write to the unbound Data Link
protocol layer resulting in a crash.

Note hci_uart_write_work() races with hci_uart_tty_close() so it is
important that freeing of hdev is moved to the end of
hci_uart_tty_close() to reduce the risk of use after free of hdev.
For completeness set hu->hdev to NULL before freeing hdev so that
hu no longer references the freed hdev.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 30 ++++++++++++++++++------------
 1 file changed, 18 insertions(+), 12 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index a1bb4d64b9..9e3604d 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -505,22 +505,28 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 	cancel_work_sync(&hu->write_work);
 
-	if (test_and_clear_bit(HCI_UART_PROTO_READY, &hu->flags)) {
-		if (hdev) {
-			if (test_bit(HCI_UART_REGISTERED, &hu->flags))
-				/* Note hci_unregister_dev() may try to send
-				 * a HCI RESET command. If the transmission
-				 * fails then hci_unregister_dev() waits
-				 * HCI_CMD_TIMEOUT (2) seconds for the timeout
-				 * to occur.
-				 */
-				hci_unregister_dev(hdev);
-			hci_free_dev(hdev);
-		}
+	if (hdev) {
+		if (test_bit(HCI_UART_REGISTERED, &hu->flags))
+			/* Note hci_unregister_dev() may try to send a
+			 * HCI RESET command. If the transmission fails then
+			 * hci_unregister_dev() waits HCI_CMD_TIMEOUT
+			 * (2) seconds for the timeout to occur.
+			 */
+			hci_unregister_dev(hdev);
+	}
+
+	if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
+		clear_bit(HCI_UART_PROTO_READY, &hu->flags);
 		hu->proto->close(hu);
 	}
+
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
+	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
+		hu->hdev = NULL;
+		hci_free_dev(hdev);
+	}
+
 	kfree(hu);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 09/16] Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in hci_uart_tty_close()
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (7 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 08/16] Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close() Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 10/16] Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg Dean Jenkins
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

The code in hci_uart_tty_close is over complex in handling flag
HCI_UART_REGISTERED as it is unnecessary to check that hdev is NULL.
This is because hdev is only valid when HCI_UART_REGISTERED is in
the set state.

Therefore, remove all "if (hdev)" checks and instead check for flag
HCI_UART_REGISTERED being in the set state.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 9e3604d..2d5c6f0 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -500,20 +500,17 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		return;
 
 	hdev = hu->hdev;
-	if (hdev)
+	if (test_bit(HCI_UART_REGISTERED, &hu->flags))
 		hci_uart_close(hdev);
 
 	cancel_work_sync(&hu->write_work);
 
-	if (hdev) {
-		if (test_bit(HCI_UART_REGISTERED, &hu->flags))
-			/* Note hci_unregister_dev() may try to send a
-			 * HCI RESET command. If the transmission fails then
-			 * hci_unregister_dev() waits HCI_CMD_TIMEOUT
-			 * (2) seconds for the timeout to occur.
-			 */
-			hci_unregister_dev(hdev);
-	}
+	if (test_bit(HCI_UART_REGISTERED, &hu->flags))
+		/* Note hci_unregister_dev() may try to send a HCI RESET
+		 * command. If the transmission fails then hci_unregister_dev()
+		 * waits HCI_CMD_TIMEOUT (2) seconds for the timeout to occur.
+		 */
+		hci_unregister_dev(hdev);
 
 	if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
 		clear_bit(HCI_UART_PROTO_READY, &hu->flags);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 10/16] Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (8 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 09/16] Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED " Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 11/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close() Dean Jenkins
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

hci_uart_tty_close() is detaching the tty prematurely which prevents
the Data Link protocol layer from using the tty serial port in the
call to hci_unregister_dev().

Consequently, when hci_unregister_dev() attempts to send a HCI RESET
command, the BCSP layer attempts to retransmit the message many times
and the command is timed-out after HCI_CMD_TIMEOUT (2) seconds.
Nothing was physically transmitted because the tty was detached too
early.

Therefore, move the detach statement "tty->disc_data = NULL" to after
hci_unregister_dev() so that the HCI RESET command maybe sent and
acknowledged.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 2d5c6f0..e7d1b8b 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -493,9 +493,6 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 	BT_DBG("tty %p", tty);
 
-	/* Detach from the tty */
-	tty->disc_data = NULL;
-
 	if (!hu)
 		return;
 
@@ -517,6 +514,9 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		hu->proto->close(hu);
 	}
 
+	/* Detach from the tty */
+	tty->disc_data = NULL;
+
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
 	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 11/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close()
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (9 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 10/16] Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 12/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync() Dean Jenkins
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

In hci_uart_tty_close() move hci_uart_close() to after the call to
hci_unregister_dev() to avoid the tty flush in hci_uart_close()
and clearance of HCI_RUNNING from corrupting the Data Link protocol
layer's communication stream.

In fact, perform hci_uart_close() at the of hci_uart_tty_close()
where the Data Link protocol layer has been closed so that no
more data can be written in the tty layer. This makes the flush
clear out any non-sent data from the tty layer although this
is probably unnecessary.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index e7d1b8b..acb08c6 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -497,8 +497,6 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		return;
 
 	hdev = hu->hdev;
-	if (test_bit(HCI_UART_REGISTERED, &hu->flags))
-		hci_uart_close(hdev);
 
 	cancel_work_sync(&hu->write_work);
 
@@ -520,6 +518,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
 	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
+		hci_uart_close(hdev);
 		hu->hdev = NULL;
 		hci_free_dev(hdev);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 12/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync()
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (10 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 11/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close() Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 13/16] Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb Dean Jenkins
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

The current location of cancel_work_sync() in hci_uart_tty_close()
before the call to hci_unregister_dev(hdev) may momentarily interfere
with the execution of hci_uart_tty_close() by waiting for an active call
to hci_uart_write_work() to finish. However, its current location is
ineffective at preventing a race condition described next.

cancel_work_sync() is needed to try to prevent hci_uart_write_work() from
running whilst the Data Link protocol layer is being unbound (closed).
However, it is weak as a race condition exists between
hci_uart_write_work() being scheduled via hci_uart_tx_wakeup() and
closure of the Data Link protocol layer. In particular, the Data Link
protocol layer may have a retransmission timer which will independently
call hci_uart_tx_wakeup().

To minimise the race condition, cancel_work_sync() needs to be as close
as possible to the code that closes down the Data Link protocol layer.
Therefore, move the cancel_work_sync() statement to be just before
the code which unbinds (closes) the Data Link protocol layer.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index acb08c6..61e03dd4 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -498,8 +498,6 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 	hdev = hu->hdev;
 
-	cancel_work_sync(&hu->write_work);
-
 	if (test_bit(HCI_UART_REGISTERED, &hu->flags))
 		/* Note hci_unregister_dev() may try to send a HCI RESET
 		 * command. If the transmission fails then hci_unregister_dev()
@@ -509,6 +507,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 	if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
 		clear_bit(HCI_UART_PROTO_READY, &hu->flags);
+		cancel_work_sync(&hu->write_work);
 		hu->proto->close(hu);
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 13/16] Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (11 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 12/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync() Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 14/16] Bluetooth: hci_ldisc: Simplify flushing Dean Jenkins
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

There is a race condition for accessing hu->tx_skb between
hci_uart_flush() and hci_uart_dequeue() which runs in
hci_uart_write_work() from the work queue hu->write_work. This race
condition exists because there is no locking between these 2 threads
to protect hu->tx_skb. Consequently a call to hci_uart_flush() might
be able to free hu->tx_skb whilst hci_uart_write_work() is using
hu->tx_skb which is undesirable as a crash could occur.

Performing any flushing in the transmission path between the Data Link
protocol layer and the UART port may corrupt the data. So freeing
hu->tx_skb or not freeing hu->tx_skb makes little difference to having
intact data or corrupted data. So it is OK not to free hu->tx_skb.

Instead, move the freeing of hu->tx_skb to the end of
hci_uart_tty_close() from hci_uart_flush(). This eliminates the race
condition because the Data Link protocol layer is in the unbound state
and ensures hu->tx_skb is freed before hu is freed. Also use a temporary
pointer to allow hu->tx_skb to be set to NULL before freeing hu->tx_skb.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 61e03dd4..ad2dec5 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -229,10 +229,6 @@ static int hci_uart_flush(struct hci_dev *hdev)
 
 	BT_DBG("hdev %p tty %p", hdev, tty);
 
-	if (hu->tx_skb) {
-		kfree_skb(hu->tx_skb); hu->tx_skb = NULL;
-	}
-
 	/* Flush any pending characters in the driver and discipline. */
 	tty_ldisc_flush(tty);
 	tty_driver_flush_buffer(tty);
@@ -490,6 +486,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 {
 	struct hci_uart *hu = tty->disc_data;
 	struct hci_dev *hdev;
+	struct sk_buff *temp_skb;
 
 	BT_DBG("tty %p", tty);
 
@@ -522,6 +519,12 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		hci_free_dev(hdev);
 	}
 
+	if (hu->tx_skb) {
+		temp_skb = hu->tx_skb;
+		hu->tx_skb = NULL;
+		kfree_skb(temp_skb);
+	}
+
 	kfree(hu);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 14/16] Bluetooth: hci_ldisc: Simplify flushing
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (12 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 13/16] Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 15/16] Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races Dean Jenkins
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

The flushing due to the hci_uart_close() call probably corrupts the
data flow between the Data Link protocol layer and the UART port
(transmission direction). In any case, the flushing activity is
asynchronous to the state of the Data Link protocol layer so is
undesirable.

Modify hci_uart_close() to not perform any flushing because
hci_uart_flush() is explicitly called in hci_dev_do_close() which
is synchronous to the closure of the Data Link protocol layer so
it is OK. In other words, there is no point in flushing multiple
times with the Data Link protocol layer being in the bound state.

In hci_uart_tty_close() call hci_uart_flush() instead of
hci_uart_close() before freeing hdev. This final flush is after the
Data Link protocol layer has been unbound so ensures that the
transmission path is empty.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index ad2dec5..0a10ee1 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -244,8 +244,7 @@ static int hci_uart_close(struct hci_dev *hdev)
 {
 	BT_DBG("hdev %p", hdev);
 
-	hci_uart_flush(hdev);
-	hdev->flush = NULL;
+	/* Nothing to do for UART driver */
 	return 0;
 }
 
@@ -514,7 +513,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
 	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
-		hci_uart_close(hdev);
+		hci_uart_flush(hdev);
 		hu->hdev = NULL;
 		hci_free_dev(hdev);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 15/16] Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (13 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 14/16] Bluetooth: hci_ldisc: Simplify flushing Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-28 17:50 ` [RFC V1 16/16] Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency Dean Jenkins
  2017-03-30 10:11 ` [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Marcel Holtmann
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

When HCI_UART_PROTO_READY is in the set state, the Data Link protocol
layer (proto) is bound to the HCI UART driver. This state allows the
registered proto function pointers to be used by the HCI UART driver.

When unbinding (closing) the Data Link protocol layer, the proto
function pointers much be prevented from being used immediately before
running the proto close function pointer. Otherwise, there is a risk
that a proto non-close function pointer is used during or after the
proto close function pointer is used. The consequences are likely to
be a kernel crash because the proto close function pointer will free
resources used in the Data Link protocol layer.

Therefore, add a reader writer lock (rwlock) solution to prevent the
close proto function pointer from running by using write_lock_irqsave()
whilst the other proto function pointers are protected using
read_lock(). This means HCI_UART_PROTO_READY can safely be cleared
in the knowledge that no proto function pointers are running.

When flag HCI_UART_PROTO_READY is put into the clear state,
proto close function pointer can safely be run. Note
flag HCI_UART_PROTO_SET being in the set state prevents the proto
open function pointer from being run so there is no race condition
between proto open and close function pointers.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 33 +++++++++++++++++++++++++++++++--
 drivers/bluetooth/hci_uart.h  |  1 +
 2 files changed, 32 insertions(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 0a10ee1..0a31629 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -114,8 +114,12 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
 	struct sk_buff *skb = hu->tx_skb;
 
 	if (!skb) {
+		read_lock(&hu->proto_lock);
+
 		if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
 			skb = hu->proto->dequeue(hu);
+
+		read_unlock(&hu->proto_lock);
 	} else {
 		hu->tx_skb = NULL;
 	}
@@ -125,6 +129,8 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
 
 int hci_uart_tx_wakeup(struct hci_uart *hu)
 {
+	read_lock(&hu->proto_lock);
+
 	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
 		goto no_schedule;
 
@@ -138,6 +144,8 @@ int hci_uart_tx_wakeup(struct hci_uart *hu)
 	schedule_work(&hu->write_work);
 
 no_schedule:
+	read_unlock(&hu->proto_lock);
+
 	return 0;
 }
 
@@ -233,9 +241,13 @@ static int hci_uart_flush(struct hci_dev *hdev)
 	tty_ldisc_flush(tty);
 	tty_driver_flush_buffer(tty);
 
+	read_lock(&hu->proto_lock);
+
 	if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
 		hu->proto->flush(hu);
 
+	read_unlock(&hu->proto_lock);
+
 	return 0;
 }
 
@@ -256,11 +268,17 @@ static int hci_uart_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
 	BT_DBG("%s: type %d len %d", hdev->name, hci_skb_pkt_type(skb),
 	       skb->len);
 
-	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
+	read_lock(&hu->proto_lock);
+
+	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
+		read_unlock(&hu->proto_lock);
 		return -EUNATCH;
+	}
 
 	hu->proto->enqueue(hu, skb);
 
+	read_unlock(&hu->proto_lock);
+
 	hci_uart_tx_wakeup(hu);
 
 	return 0;
@@ -470,6 +488,8 @@ static int hci_uart_tty_open(struct tty_struct *tty)
 	INIT_WORK(&hu->init_ready, hci_uart_init_work);
 	INIT_WORK(&hu->write_work, hci_uart_write_work);
 
+	rwlock_init(&hu->proto_lock);
+
 	/* Flush any pending characters in the driver */
 	tty_driver_flush_buffer(tty);
 
@@ -486,6 +506,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 	struct hci_uart *hu = tty->disc_data;
 	struct hci_dev *hdev;
 	struct sk_buff *temp_skb;
+	unsigned long flags;
 
 	BT_DBG("tty %p", tty);
 
@@ -502,7 +523,10 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		hci_unregister_dev(hdev);
 
 	if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
+		write_lock_irqsave(&hu->proto_lock, flags);
 		clear_bit(HCI_UART_PROTO_READY, &hu->flags);
+		write_unlock_irqrestore(&hu->proto_lock, flags);
+
 		cancel_work_sync(&hu->write_work);
 		hu->proto->close(hu);
 	}
@@ -573,13 +597,18 @@ static void hci_uart_tty_receive(struct tty_struct *tty, const u8 *data,
 	if (!hu || tty != hu->tty)
 		return;
 
-	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
+	read_lock(&hu->proto_lock);
+
+	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
+		read_unlock(&hu->proto_lock);
 		return;
+	}
 
 	/* It does not need a lock here as it is already protected by a mutex in
 	 * tty caller
 	 */
 	hu->proto->recv(hu, data, count);
+	read_unlock(&hu->proto_lock);
 
 	if (hu->hdev)
 		hu->hdev->stat.byte_rx += count;
diff --git a/drivers/bluetooth/hci_uart.h b/drivers/bluetooth/hci_uart.h
index 0701395..580ca98 100644
--- a/drivers/bluetooth/hci_uart.h
+++ b/drivers/bluetooth/hci_uart.h
@@ -85,6 +85,7 @@ struct hci_uart {
 	struct work_struct	write_work;
 
 	const struct hci_uart_proto *proto;
+	rwlock_t		proto_lock;	/* Stop work for proto close */
 	void			*priv;
 
 	struct sk_buff		*tx_skb;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [RFC V1 16/16] Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (14 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 15/16] Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races Dean Jenkins
@ 2017-03-28 17:50 ` Dean Jenkins
  2017-03-30 10:11 ` [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Marcel Holtmann
  16 siblings, 0 replies; 25+ messages in thread
From: Dean Jenkins @ 2017-03-28 17:50 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

There is a concurrency risk with hci_uart_tty_ioctl() running with
itself. Also there is a race condition between hci_uart_tty_close()
clearing the HCI_UART_PROTO_SET flag whilst hci_uart_tty_ioctl() runs.

Even though the HCI_UART_PROTO_SET flag is atomic, this is not
sufficient to prevent hci_uart_tty_ioctl() from potentially accessing
freed or stale variables when hci_uart_tty_close() completes. This is
because the hci_uart_tty_ioctl() thread can run in parallel with the
hci_uart_tty_close() thread. This means the HCI_UART_PROTO_SET flag
can change at any time during the execution of hci_uart_tty_ioctl()
which can lead to a malfunction.

Therefore, add a mutex lock called ioctl_mutex in hci_uart_tty_ioctl()
to prevent self concurrency of hci_uart_tty_ioctl(). In
hci_uart_tty_close() add ioctl_mutex around the clearing of the hu
value from tty->disc_data and clearing the HCI_UART_PROTO_SET flag.

This mutex lock ensures that tty->disc_data does not become NULL
and the HCI_UART_PROTO_SET flag cannot become clear whilst the main
body of hci_uart_tty_ioctl() executes.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 5 +++++
 drivers/bluetooth/hci_uart.h  | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 0a31629..307fe3b 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -489,6 +489,7 @@ static int hci_uart_tty_open(struct tty_struct *tty)
 	INIT_WORK(&hu->write_work, hci_uart_write_work);
 
 	rwlock_init(&hu->proto_lock);
+	mutex_init(&hu->ioctl_mutex);
 
 	/* Flush any pending characters in the driver */
 	tty_driver_flush_buffer(tty);
@@ -531,10 +532,12 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		hu->proto->close(hu);
 	}
 
+	mutex_lock(&hu->ioctl_mutex);
 	/* Detach from the tty */
 	tty->disc_data = NULL;
 
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
+	mutex_unlock(&hu->ioctl_mutex);
 
 	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
 		hci_uart_flush(hdev);
@@ -745,6 +748,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 	if (!hu)
 		return -EBADF;
 
+	mutex_lock(&hu->ioctl_mutex);
 	switch (cmd) {
 	case HCIUARTSETPROTO:
 		if (!test_and_set_bit(HCI_UART_PROTO_SET, &hu->flags)) {
@@ -784,6 +788,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 		err = n_tty_ioctl_helper(tty, file, cmd, arg);
 		break;
 	}
+	mutex_unlock(&hu->ioctl_mutex);
 
 	return err;
 }
diff --git a/drivers/bluetooth/hci_uart.h b/drivers/bluetooth/hci_uart.h
index 580ca98..25cd93b 100644
--- a/drivers/bluetooth/hci_uart.h
+++ b/drivers/bluetooth/hci_uart.h
@@ -91,6 +91,8 @@ struct hci_uart {
 	struct sk_buff		*tx_skb;
 	unsigned long		tx_state;
 
+	struct mutex		ioctl_mutex;
+
 	unsigned int init_speed;
 	unsigned int oper_speed;
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes
  2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (15 preceding siblings ...)
  2017-03-28 17:50 ` [RFC V1 16/16] Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency Dean Jenkins
@ 2017-03-30 10:11 ` Marcel Holtmann
  2017-04-03 15:09   ` Dean Jenkins
  16 siblings, 1 reply; 25+ messages in thread
From: Marcel Holtmann @ 2017-03-30 10:11 UTC (permalink / raw)
  To: Dean Jenkins; +Cc: Gustavo F. Padovan, Johan Hedberg, linux-bluetooth

Hi Dean,

> This is RFC patchset V1 which reorganises hci_uart_tty_close() to overcome a
> design flaw. I would like some comments on the changes.
> 
> Design Flaw
> ===========
> 
> An example callstack is as follows
> 
> Have Bluetooth running using a BCSP based UART Bluetooth Radio Module.
> 
> Now kill the userland hciattach program by doing
> killall hciattach

is there any chance we can convert BCSP support to run fully inside the kernel with the new parts we have put in. And with that then also use btattach. The split of some parts of BCSP in userspace seems never been a good idea.

I am a bit reluctant to change major hci_ldisc pieces because of just one broken protocol. Running BCSP fully in the kernel seems a better solution to deal with some of these issues.

Regards

Marcel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC V1 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
  2017-03-28 17:50 ` [RFC V1 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() Dean Jenkins
@ 2017-03-30 10:11   ` Marcel Holtmann
  0 siblings, 0 replies; 25+ messages in thread
From: Marcel Holtmann @ 2017-03-30 10:11 UTC (permalink / raw)
  To: Dean Jenkins; +Cc: Gustavo F. Padovan, Johan Hedberg, linux-bluetooth

Hi Dean,

> Before attempting to schedule a work-item onto hu->write_work in
> hci_uart_tx_wakeup(), check that the Data Link protocol layer is
> still bound to the HCI UART driver.
> 
> Failure to perform this protocol check causes a race condition between
> the work queue hu->write_work running hci_uart_write_work() and the
> Data Link protocol layer being unbound (closed) in hci_uart_tty_close().
> 
> Note hci_uart_tty_close() does have a "cancel_work_sync(&hu->write_work)"
> but it is ineffective because it cannot prevent work-items being added
> to hu->write_work after cancel_work_sync() has run.
> 
> Therefore, add a check for HCI_UART_PROTO_READY into hci_uart_tx_wakeup()
> which prevents scheduling of the work queue when HCI_UART_PROTO_READY
> is in the clear state. However, note a small race condition remains
> because the hci_uart_tx_wakeup() thread can run in parallel with the
> hci_uart_tty_close() thread so it is possible that a schedule of
> hu->write_work can occur when HCI_UART_PROTO_READY is cleared. A complete
> solution needs locking of the threads which is implemented in a future
> commit.
> 
> Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
> ---
> drivers/bluetooth/hci_ldisc.c | 6 +++++-
> 1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
> index c4b801b..a1bb4d64b9 100644
> --- a/drivers/bluetooth/hci_ldisc.c
> +++ b/drivers/bluetooth/hci_ldisc.c
> @@ -125,15 +125,19 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
> 
> int hci_uart_tx_wakeup(struct hci_uart *hu)
> {
> +	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
> +		goto no_schedule;
> +
> 	if (test_and_set_bit(HCI_UART_SENDING, &hu->tx_state)) {
> 		set_bit(HCI_UART_TX_WAKEUP, &hu->tx_state);
> -		return 0;
> +		goto no_schedule;
> 	}
> 
> 	BT_DBG("");
> 
> 	schedule_work(&hu->write_work);
> 
> +no_schedule:
> 	return 0;
> }

so instead of calling no_schedule label, just keep calling return 0 directly.

Regards

Marcel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC V1 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call
  2017-03-28 17:50 ` [RFC V1 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call Dean Jenkins
@ 2017-03-30 10:11   ` Marcel Holtmann
  0 siblings, 0 replies; 25+ messages in thread
From: Marcel Holtmann @ 2017-03-30 10:11 UTC (permalink / raw)
  To: Dean Jenkins; +Cc: Gustavo F. Padovan, Johan Hedberg, linux-bluetooth

Hi Dean,

> This commit relates to the HCI UART quirk HCI_QUIRK_RESET_ON_CLOSE
> which is enabled by default and can be disabled by setting hdev_flags
> flag HCI_UART_RESET_ON_INIT via ioctl HCIUARTSETFLAGS from userland.
> 
> The implementation of HCI_QUIRK_RESET_ON_CLOSE is broken for the BCSP
> Data Link protocol layer because it can be seen that
> hci_unregister_dev() takes 2 seconds to run due to BCSP communications
> failure. Suspect that sending of the HCI RESET command is broken
> for the other Data Link protocols as well.
> 
> Therefore, add a comment to inform that the call to
> hci_unregister_dev() in hci_uart_tty_close() may send a HCI RESET
> command. This transmission needs the HCI UART driver to remain
> operational including having the Data Link protocol being bound to
> the HCI UART driver. If the transmission fails, then
> hci_unregister_dev() waits HCI_CMD_TIMEOUT (2) seconds for the
> timeout to occur which is undesirable.
> 
> The current software has a call to hci_unregister_dev(hdev) in
> hci_uart_tty_close() which can cause an attempt at sending the
> command HCI RESET to occur through the following callstack:
> 
> hci_uart_tty_close()
> hci_unregister_dev()
> hci_dev_do_close()
> __hci_req_sync() to queue up command HCI RESET
> which adds a work-item to the hdev->workqueue and waits 2 seconds
> for a response
> 
> Subsequently
> hdev->workqueue runs and processes the work-item by calling
> hci_cmd_work()
> hci_send_frame()
> hci_uart_send_frame() to enqueue into the Data Link protocol layer
> 
> No protocol response comes so hci_unregister_dev() takes 2 seconds to
> run!
> 
> In fact, this hidden transmission of the HCI RESET command is most
> likely the root cause of why hci_uart_tty_close() crashed in the past
> and was "fixed" with multiple workarounds in the past.
> 
> The underlying design flaw is that hci_uart_tty_close() is interfering
> with various aspects of transmitting and receiving HCI messages
> whilst hci_unregister_dev() is trying to use the Data Link protocol
> to send the HCI RESET command. Consequently, the design flaw
> causes the Data Link protocol to retransmit as no reply comes from
> the Bluetooth Radio module over the UART link.
> 
> Over the many years of "fixes", this caused the logic in
> hci_uart_tty_close() to become over-complex.
> 
> Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
> ---
> drivers/bluetooth/hci_ldisc.c | 6 ++++++
> 1 file changed, 6 insertions(+)
> 
> diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
> index 1887ad4..c78c9dc 100644
> --- a/drivers/bluetooth/hci_ldisc.c
> +++ b/drivers/bluetooth/hci_ldisc.c
> @@ -499,6 +499,12 @@ static void hci_uart_tty_close(struct tty_struct *tty)
> 	if (test_and_clear_bit(HCI_UART_PROTO_READY, &hu->flags)) {
> 		if (hdev) {
> 			if (test_bit(HCI_UART_REGISTERED, &hu->flags))
> +				/* Note hci_unregister_dev() may try to send
> +				 * a HCI RESET command. If the transmission
> +				 * fails then hci_unregister_dev() waits
> +				 * HCI_CMD_TIMEOUT (2) seconds for the timeout
> +				 * to occur.
> +				 */
> 				hci_unregister_dev(hdev);

lets put { } around this if statement. I know it is no needed, but we generally do that for clarity.

Regards

Marcel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes
  2017-03-30 10:11 ` [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Marcel Holtmann
@ 2017-04-03 15:09   ` Dean Jenkins
  2017-04-03 15:51     ` Marcel Holtmann
  0 siblings, 1 reply; 25+ messages in thread
From: Dean Jenkins @ 2017-04-03 15:09 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: Gustavo F. Padovan, Johan Hedberg, linux-bluetooth

Hi Marcel,

On 30/03/17 11:11, Marcel Holtmann wrote:
> Hi Dean,
>
>> This is RFC patchset V1 which reorganises hci_uart_tty_close() to overcome a
>> design flaw. I would like some comments on the changes.
>>
>> Design Flaw
>> ===========
>>
>> An example callstack is as follows
>>
>> Have Bluetooth running using a BCSP based UART Bluetooth Radio Module.
>>
>> Now kill the userland hciattach program by doing
>> killall hciattach
> is there any chance we can convert BCSP support to run fully inside the kernel with the new parts we have put in. And with that then also use btattach. The split of some parts of BCSP in userspace seems never been a good idea.

I am not aware of "the new parts we [you] have put in" to the kernel 
because I am working with the older 3.14 kernel with userland components 
that are not Bluez based but the kernel issue is observable. Is there a 
web page where I can find out about your design changes for the new parts ?

My efforts are to improve the latest upstream kernel to eliminate this 
kernel design flaw in HCI UART LDISC (Note TTY LDISC is also broken but 
not fixed by my patchset).

I see that "btattach" is at 
https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/tools/btattach.c, 
however, I am unable to identify whether Linux distributions such as 
Ubuntu have a bluez package that contains "btattach". Is "btattach" a 
replacement for "hciattach" ?


>
> I am a bit reluctant to change major hci_ldisc pieces because of just one broken protocol. Running BCSP fully in the kernel seems a better solution to deal with some of these issues.

The kernel BCSP software in the kernel is not broken although it is not 
fully implemented as you already highlighted. The issue is that HCI UART 
LDISC (and TTY LDISC) has a broken procedure for closing down the HCI 
UART device via hci_uart_tty_close().

This means that I don't see how your suggestion helps to resolve the 
kernel design flaw which is related to closing down any of the Bluetooth 
Data Link protocol layers such as H4, H5, and BCSP (I use BCSP). This 
flaw seems to me to be a long standing Bluetooth kernel Data Link 
protocol layer closedown issue and is unrelated to how the Data Link 
protocol layer is established (connected). Therefore, having BCSP partly 
in userland is irrelevant to this kernel design flaw. Even with BCSP 
fully in the kernel, the protocol closedown issue will remain present I 
think.

I might try to build "btattach" and have a go to use it. If you look 
inside the source code of "btattach" and "hciattach" you can see the 
problem area in closing down an established Bluetooth Data Link protocol 
layer by the use of:

     if (ioctl(fd, TIOCSETD, &ldisc) < 0) {
         perror("Failed set serial line discipline");
         close(fd);
         return -1;
     }

This userland call is the problem area as this asynchronous ioctl 
TIOCSETD can cause hci_uart_tty_close() to run and I think it can cause 
trouble for ALL the Bluetooth Data Link protocol layers such as H4, H5 
and BCSP.

The design flaw is exposed after the Data Link protocol layer has been 
established (connected) and ioctl TIOCSETD is used from userland. In my 
example, I killed "hciattach" which is an abnormal scenario but it still 
needs good handling. I think I have strace evidence of TIOCSETD being 
used due to SIGTERM.

The design flaw is because TIOCSETD can trigger the sending of a HCI 
RESET command during closedown of HCI UART LDISC, TTY LDISC and the Data 
Link protocol layer. I only have experience of BCSP but I suspect H4 and 
H5 have retransmission procedures similar to BCSP so would also be 
susceptible to this issue of trying to send a HCI RESET command whilst 
closing down the needed data path to the UART driver which causes 
sending of the HCI RESET command to be unsuccessful.

I think the callstack is:

Userland ioctl TIOCSETD executes causing =>
Kernel ioctl system call which runs
tty_ioctl()
tiocsetd()
tty_set_ldisc()
tty_ldisc_close()
hci_uart_tty_close()
hci_unregister_dev()
hci_dev_do_close()
__hci_req_sync() which tries to send a HCI RESET command which depends on
HCI_QUIRK_RESET_ON_CLOSE being enabled and that is the default condition

I believe It will affect the closure of any of the Bluetooth Data Link 
protocol layers.

Note that not enabling HCI_QUIRK_RESET_ON_CLOSE does not fully help 
because if Data Link protocol layer retransmissions are occurring when 
hci_uart_tty_close() runs then the various race conditions are still 
present in hci_uart_tty_close().

I suspect evidence of the design flaw can be observed by measuring the 
execution time of the userland ioctl TIOCSETD calls. I predict that 
sometimes it will take 2 seconds for TIOCSETD to complete due to being 
blocked waiting for the unsuccessful attempt at sending the HCI RESET 
command because the HCI command time-out expires. I believe this will be 
independent of the underlying Bluetooth Data Link protocol layer.

Do you have any suggestions for moving forward in accepting my proposed 
changes ? I will try to provide more observable evidence of the issue on 
kernel v.4.10 on a Linux PC.

Thanks for your time in looking at my proposed patches.

Best regards,
Dean

-- 
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes
  2017-04-03 15:09   ` Dean Jenkins
@ 2017-04-03 15:51     ` Marcel Holtmann
  2017-04-04 20:36       ` Dean Jenkins
  0 siblings, 1 reply; 25+ messages in thread
From: Marcel Holtmann @ 2017-04-03 15:51 UTC (permalink / raw)
  To: Dean Jenkins; +Cc: Gustavo F. Padovan, Johan Hedberg, linux-bluetooth

Hi Dean,

>>> This is RFC patchset V1 which reorganises hci_uart_tty_close() to overcome a
>>> design flaw. I would like some comments on the changes.
>>> 
>>> Design Flaw
>>> ===========
>>> 
>>> An example callstack is as follows
>>> 
>>> Have Bluetooth running using a BCSP based UART Bluetooth Radio Module.
>>> 
>>> Now kill the userland hciattach program by doing
>>> killall hciattach
>> is there any chance we can convert BCSP support to run fully inside the kernel with the new parts we have put in. And with that then also use btattach. The split of some parts of BCSP in userspace seems never been a good idea.
> 
> I am not aware of "the new parts we [you] have put in" to the kernel because I am working with the older 3.14 kernel with userland components that are not Bluez based but the kernel issue is observable. Is there a web page where I can find out about your design changes for the new parts ?
> 
> My efforts are to improve the latest upstream kernel to eliminate this kernel design flaw in HCI UART LDISC (Note TTY LDISC is also broken but not fixed by my patchset).
> 
> I see that "btattach" is at https://git.kernel.org/pub/scm/bluetooth/bluez.git/tree/tools/btattach.c, however, I am unable to identify whether Linux distributions such as Ubuntu have a bluez package that contains "btattach". Is "btattach" a replacement for "hciattach” ?

yes, we want to move towards btattach that just assigned the line discipline and selects the UART protocol. Everything else including firmware download, speed changes, recover etc. should be done inside the kernel.

And later with serdev, we would not even need btattach anymore. UART based Bluetooth devices would be enumerated via DT and the TTY not even exposed to userspace. We are slowly getting to that point.

The latest kernel has UART drivers like hci_intel.c and hci_bcm.c which do a lot of things in the kernel. And btattach is just the process that keeps the line discipline open.

>> I am a bit reluctant to change major hci_ldisc pieces because of just one broken protocol. Running BCSP fully in the kernel seems a better solution to deal with some of these issues.
> 
> The kernel BCSP software in the kernel is not broken although it is not fully implemented as you already highlighted. The issue is that HCI UART LDISC (and TTY LDISC) has a broken procedure for closing down the HCI UART device via hci_uart_tty_close().
> 
> This means that I don't see how your suggestion helps to resolve the kernel design flaw which is related to closing down any of the Bluetooth Data Link protocol layers such as H4, H5, and BCSP (I use BCSP). This flaw seems to me to be a long standing Bluetooth kernel Data Link protocol layer closedown issue and is unrelated to how the Data Link protocol layer is established (connected). Therefore, having BCSP partly in userland is irrelevant to this kernel design flaw. Even with BCSP fully in the kernel, the protocol closedown issue will remain present I think.

If you think there are issues, then lets fix them for all protocols. I assumed this was BCSP specific.

> I might try to build "btattach" and have a go to use it. If you look inside the source code of "btattach" and "hciattach" you can see the problem area in closing down an established Bluetooth Data Link protocol layer by the use of:
> 
>    if (ioctl(fd, TIOCSETD, &ldisc) < 0) {
>        perror("Failed set serial line discipline");
>        close(fd);
>        return -1;
>    }
> 
> This userland call is the problem area as this asynchronous ioctl TIOCSETD can cause hci_uart_tty_close() to run and I think it can cause trouble for ALL the Bluetooth Data Link protocol layers such as H4, H5 and BCSP.
> 
> The design flaw is exposed after the Data Link protocol layer has been established (connected) and ioctl TIOCSETD is used from userland. In my example, I killed "hciattach" which is an abnormal scenario but it still needs good handling. I think I have strace evidence of TIOCSETD being used due to SIGTERM.
> 
> The design flaw is because TIOCSETD can trigger the sending of a HCI RESET command during closedown of HCI UART LDISC, TTY LDISC and the Data Link protocol layer. I only have experience of BCSP but I suspect H4 and H5 have retransmission procedures similar to BCSP so would also be susceptible to this issue of trying to send a HCI RESET command whilst closing down the needed data path to the UART driver which causes sending of the HCI RESET command to be unsuccessful.
> 
> I think the callstack is:
> 
> Userland ioctl TIOCSETD executes causing =>
> Kernel ioctl system call which runs
> tty_ioctl()
> tiocsetd()
> tty_set_ldisc()
> tty_ldisc_close()
> hci_uart_tty_close()
> hci_unregister_dev()
> hci_dev_do_close()
> __hci_req_sync() which tries to send a HCI RESET command which depends on
> HCI_QUIRK_RESET_ON_CLOSE being enabled and that is the default condition
> 
> I believe It will affect the closure of any of the Bluetooth Data Link protocol layers.
> 
> Note that not enabling HCI_QUIRK_RESET_ON_CLOSE does not fully help because if Data Link protocol layer retransmissions are occurring when hci_uart_tty_close() runs then the various race conditions are still present in hci_uart_tty_close().
> 
> I suspect evidence of the design flaw can be observed by measuring the execution time of the userland ioctl TIOCSETD calls. I predict that sometimes it will take 2 seconds for TIOCSETD to complete due to being blocked waiting for the unsuccessful attempt at sending the HCI RESET command because the HCI command time-out expires. I believe this will be independent of the underlying Bluetooth Data Link protocol layer.
> 
> Do you have any suggestions for moving forward in accepting my proposed changes ? I will try to provide more observable evidence of the issue on kernel v.4.10 on a Linux PC.

If this is an issue in 4.10, then lets get this fixed / hardened.

Regards

Marcel


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes
  2017-04-03 15:51     ` Marcel Holtmann
@ 2017-04-04 20:36       ` Dean Jenkins
  2017-04-05 15:28         ` Dean Jenkins
  0 siblings, 1 reply; 25+ messages in thread
From: Dean Jenkins @ 2017-04-04 20:36 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: Gustavo F. Padovan, Johan Hedberg, linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 7084 bytes --]

Hi Marcel,

On 03/04/17 16:51, Marcel Holtmann wrote:
>
> If this is an issue in 4.10, then lets get this fixed / hardened.
>

I have created a simple test script that leads to a kernel crash in HCI 
UART LDISC. Please note that this is a different scenario to the one 
documented in my previous E-mails but the resulting crash is similar.

I am using a BCSP supported Bluetooth Radio Module connected via a USB 
to serial interface to a 64-bit x86 Linux laptop.

I built kernel v4.10.5 although I messed up the .config settings so 
uname -r gave a strange string of4.10.54.10.5_debug+, sigh

Please note that "btattach" fails to establish a BCSP connection with 
the Bluetooth Radio Module but that is advantageous in exposing a fatal 
race condition in hci_uart_tty_close(). In this case, the BCSP Data Link 
protocol layer attempts to retransmit every 250ms and the BCSP layer 
gets closed down whilst still attempting to retransmit which causes a 
kernel crash.

My proposed patchset fixes this race condition plus some other issues.

Please refer to my attached tarball v4.10.5_bt_crash_info.tgz, the 
contents are:
v4.10.5_pure_testing/
v4.10.5_pure_testing/0001-Bluetooth-Debug-shows-how-hci_uart_tty_close-is-call.patch
v4.10.5_pure_testing/v4.10.5_pure_testing/btattach_loop.sh
v4.10.5_pure_testing/enable_hci_bt_logging.sh
v4.10.5_pure_testing/bt_hci_uart_ldisc_crash_snippet_log.txt

The patch file 
0001-Bluetooth-Debug-shows-how-hci_uart_tty_close-is-call.patch
is just adding debug so you can see that killing "btattach" causes 
do_group_exit() to run in the kernel which cleans-up by using 
tty_ldisc_kill() which calls tty_ldisc_close() which causes 
hci_uart_tty_close() to run.

enable_hci_bt_logging.sh
enables dynamic debug for dmesg although dynamic debug seems unreliable 
these days at least for me and I have to run enable_hci_bt_logging.sh 
multiple times to get all the files logging properly.

bt_hci_uart_ldisc_crash_snippet_log.txt
This shows my analysis of 2 example NULL pointer dereference kernel 
crashes which are easy to reproduce using my test script.

v4.10.5_pure_testing/btattach_loop.sh
This is the simple test script which causes a kernel crash within 10 
minutes of running, it is easily repeatable with my equipment. I think 
the kernel crash occurs before the loop counter reaches 100.

The contents of the script is:

#!/bin/bash

let i=1

while [ true ]
do
         echo "loop $i"
         ./btattach -A /dev/ttyUSB0 -P bcsp &
         sleep 5
         echo "now killing..."
         killall btattach
         i=$(($i + 1))
done

Example kernel crash backtrace:

[ 1561.709737] BUG: unable to handle kernel NULL pointer dereference at 
0000000000000044
[ 1561.709829] IP: _raw_spin_lock_irqsave+0x22/0x40
[ 1561.709862] PGD 0

[ 1561.709889] Oops: 0002 [#1] SMP
[ 1561.709911] Modules linked in: ftdi_sio rfcomm hci_uart btqca xt_set 
ip_set_hash_ip nf_log_ipv6 ip_set nf_log_ipv4 nf_log_common xt_LOG 
ip6table_nat nf_nat_ipv6 xt_recent iptable_nat nf_nat_ipv4 ipt_REJECT 
nf_reject_ipv4 iptable_mangle iptable_raw nf_conntrack_ipv4 
nf_defrag_ipv4 xt_comment ip6t_REJECT nf_reject_ipv6 xt_addrtype bridge 
stp llc xt_mark ip6table_mangle nf_nat_tftp nf_nat_snmp_basic 
nf_conntrack_snmp xt_tcpudp nf_nat_sip xt_CT nf_nat_pptp 
nf_nat_proto_gre nf_nat_irc ip6table_raw nf_nat_h323 xt_multiport 
nf_nat_ftp nf_nat_amanda nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack 
nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane 
nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink 
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc 
nf_conntrack_h323
[ 1561.710172]  nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack 
iptable_filter ip_tables ip6table_filter ip6_tables x_tables lockd grace 
ctr ccm af_packet bnep msr usbserial uvcvideo btusb btrtl arc4 brcmsmac 
btbcm videobuf2_vmalloc btintel cordic videobuf2_memops bluetooth 
intel_powerclamp brcmutil videobuf2_v4l2 videobuf2_core mac80211 
videodev coretemp kvm_intel kvm joydev media cfg80211 iTCO_wdt 
iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek psmouse 
snd_hda_codec_generic toshiba_acpi input_leds bcma snd_hda_intel 
irqbypass snd_hda_codec toshiba_bluetooth cpufreq_ondemand sparse_keymap 
crc32c_intel cpufreq_conservative industrialio snd_hda_core serio_raw 
wmi rfkill thermal cpufreq_powersave battery ac snd_hwdep snd_pcm 
acpi_cpufreq snd_timer snd toshiba_haps mei_me mei e1000e
[ 1561.718028]  fjes soundcore evdev ptp lpc_ich shpchp pps_core 
intel_ips tpm_tis tpm_tis_core nvram tpm sch_fq_codel sunrpc ipv6 
crc_ccitt autofs4 hid_generic usbhid hid sdhci_pci mmc_block sdhci 
sr_mod ehci_pci ehci_hcd mmc_core usbcore usb_common i915 video button 
i2c_algo_bit drm_kms_helper drm [last unloaded: ftdi_sio]
[ 1561.721317] CPU: 1 PID: 4473 Comm: kworker/1:0 Not tainted 
4.10.54.10.5_debug+ #12
[ 1561.722981] Hardware name: TOSHIBA Satellite R630/Portable PC, BIOS 
Version 1.40   07/20/2010
[ 1561.724660] Workqueue: events hci_uart_write_work [hci_uart]
[ 1561.726377] task: ffff98d9b4f15100 task.stack: ffffa868c1178000
[ 1561.728568] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
[ 1561.730355] RSP: 0000:ffffa868c117bda8 EFLAGS: 00010046
[ 1561.731993] RAX: 0000000000000000 RBX: 0000000000000296 RCX: 
0000000000079ad6
[ 1561.733646] RDX: 0000000000000001 RSI: ffff98da77c5c580 RDI: 
0000000000000044
[ 1561.735314] RBP: ffffa868c117bdb0 R08: 000000000001c5a0 R09: 
ffffffff9965a54a
[ 1561.736991] R10: fffff48441dc9a80 R11: ffff98da73403700 R12: 
0000000000000030
[ 1561.738641] R13: 0000000000000044 R14: 0000000000000030 R15: 
ffff98d9b4e50000
[ 1561.740288] FS:  0000000000000000(0000) GS:ffff98da77c40000(0000) 
knlGS:0000000000000000
[ 1561.741940] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1561.743585] CR2: 0000000000000044 CR3: 0000000128d1a000 CR4: 
00000000000006e0
[ 1561.745238] Call Trace:
[ 1561.746882]  skb_dequeue+0x1d/0x70
[ 1561.748512]  bcsp_dequeue+0x27/0x180 [hci_uart]
[ 1561.750130]  ? kfree_skbmem+0x5a/0x60
[ 1561.751742]  hci_uart_write_work+0xc4/0x100 [hci_uart]
[ 1561.753356]  process_one_work+0x151/0x410
[ 1561.754930]  worker_thread+0x69/0x4b0
[ 1561.756507]  kthread+0x114/0x150
[ 1561.758076]  ? rescuer_thread+0x340/0x340
[ 1561.759636]  ? kthread_park+0x90/0x90
[ 1561.761191]  ret_from_fork+0x2c/0x40
[ 1561.762693] Code: 89 e5 e8 82 96 98 ff 5d c3 66 66 66 66 90 55 48 89 
e5 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 
00 00 <f0> 0f b1 17 85 c0 75 06 48 89 d8 5b 5d c3 89 c6 e8 29 83 98 ff
[ 1561.766034] RIP: _raw_spin_lock_irqsave+0x22/0x40 RSP: ffffa868c117bda8
[ 1561.768033] CR2: 0000000000000044
[ 1561.774555] ---[ end trace 51cc1d3575c0e559 ]---

Please see my analysis in bt_hci_uart_ldisc_crash_snippet_log.txt

With my patchset applied, this crash appears to no longer occur.

If I manage to produce some more useful results then I will post them.

Thanks,

Best regards,
Dean

-- 
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.


[-- Attachment #2: v4.10.5_bt_crash_info.tgz --]
[-- Type: application/x-compressed-tar, Size: 8119 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes
  2017-04-04 20:36       ` Dean Jenkins
@ 2017-04-05 15:28         ` Dean Jenkins
  2017-04-06  7:23           ` Marcel Holtmann
  0 siblings, 1 reply; 25+ messages in thread
From: Dean Jenkins @ 2017-04-05 15:28 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: Gustavo F. Padovan, Johan Hedberg, linux-bluetooth

[-- Attachment #1: Type: text/plain, Size: 7367 bytes --]

Hi Marcel,

On 04/04/17 21:36, Dean Jenkins wrote:
>
>
> On 03/04/17 16:51, Marcel Holtmann wrote:
>>
>> If this is an issue in 4.10, then lets get this fixed / hardened.
>>
>
> If I manage to produce some more useful results then I will post them.
>

I have now managed to crash the h4 Data Link protocol layer via 
hci_uart_tty_close().

This confirms that there is a design flaw in hci_uart_tty_close() which 
is independent of the Bluetooth Data Link protocol layers.

I don't have a Bluetooth Radio Module that uses h4 protocol so I used my 
BCSP enabled Bluetooth Radio Module that has a USB to serial interface. 
I realise that this is a weird setup but it is OK for this testcase 
because we need the h4 protocol to be timing out for transmissions. Also 
the BCSP Bluetooth Radio Module may send BCSP frames which will exercise 
the h4 receive path although rejection of the frames should occur which 
is as expected.

Please see the attached tarball v4.10.5.h4_crash.tgz, the contents are:
v4.10.5.h4_crash/
v4.10.5.h4_crash/0001-btattach-Add-option-r-for-sending-HCI-RESET-on-close.patch
v4.10.5.h4_crash/0002-btattach-Fix-raw_device-flag-usage.patch
v4.10.5.h4_crash/0001-Bluetooth-Debug-shows-how-hci_uart_tty_close-is-call.patch
v4.10.5.h4_crash/enable_hci_bt_logging.sh
v4.10.5.h4_crash/btattach_loop_h4.sh
v4.10.5.h4_crash/bt_hci_uart_ldisc_h4_crash_snippet_log.txt


0001-btattach-Add-option-r-for-sending-HCI-RESET-on-close.patch
0002-btattach-Fix-raw_device-flag-usage.patch
These btattach patches add a command line option -r to override 
HCI_UART_RESET_ON_INIT so that the kernel HCI UART LDISC component can 
use the quirk HCI_QUIRK_RESET_ON_CLOSE. This means hci_uart_tty_close() 
is allowed to a send a HCI RESET command during closure of the HCI UART 
LDISC which increases the probability of a kernel crash occurring due to 
the design flaw in hci_uart_tty_close().


0001-Bluetooth-Debug-shows-how-hci_uart_tty_close-is-call.patch
is a kernel patch for v4.10.5 to add some debug to show when 
hci_uart_tty_close() gets called.

enable_hci_bt_logging.sh
Enabled dynamic debug of some Bluetooth files to allow dmesg to show 
when the Bluetooth functions run. However, this seems unreliable and the 
script needs to be run multiple times to get all the files to log properly.

btattach_loop_h4.sh
is a simple test script, and the content is:

#!/bin/bash

let i=1

while [ true ]
do
     echo "loop $i"
     ./btattach -A /dev/ttyUSB0 -P h4 -r &
     sleep 5
     echo "now killing..."
     killall btattach
     i=$(($i + 1))
done

Note that I used my -r option. I suspect I should of used -B option, sigh.

Here is the kernel crash backtrace from 
bt_hci_uart_ldisc_h4_crash_snippet_log.txt, it seems easy to reproduce:

[  563.702430] BUG: unable to handle kernel NULL pointer dereference at 
000000000000001c
[  563.702491] IP: _raw_spin_lock_irqsave+0x22/0x40
[  563.702520] PGD 13230e067
[  563.702521] PUD 1238d6067
[  563.702540] PMD 0

[  563.704521] Oops: 0002 [#1] SMP
[  563.706215] Modules linked in: hci_uart btqca nf_log_ipv6 
ip6table_nat nf_nat_ipv6 ip6t_REJECT nf_reject_ipv6 ip6table_mangle 
xt_set ip_set_hash_ip ip_set nf_log_ipv4 nf_log_common xt_LOG xt_recent 
iptable_nat nf_nat_ipv4 xt_comment ipt_REJECT nf_reject_ipv4 xt_addrtype 
bridge stp llc xt_mark iptable_mangle xt_tcpudp iptable_raw 
nf_conntrack_ipv4 nf_defrag_ipv4 xt_CT nf_nat_tftp nf_nat_snmp_basic 
nf_conntrack_snmp ip6table_raw nf_nat_sip xt_multiport nf_nat_pptp 
nf_nat_proto_gre nf_nat_irc nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_h323 
nf_nat_ftp xt_conntrack nf_nat_amanda nf_nat nf_conntrack_tftp 
nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp 
nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink 
nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc 
nf_conntrack_h323 nf_conntrack_ftp
[  563.715634]  ts_kmp nf_conntrack_amanda nf_conntrack iptable_filter 
ip_tables ip6table_filter ip6_tables x_tables lockd grace ctr ccm 
af_packet bnep msr arc4 brcmsmac uvcvideo cordic brcmutil ftdi_sio 
videobuf2_vmalloc usbserial intel_powerclamp mac80211 videobuf2_memops 
btusb videobuf2_v4l2 btrtl coretemp snd_hda_codec_hdmi btbcm 
videobuf2_core btintel kvm_intel snd_hda_codec_realtek bluetooth kvm 
videodev joydev cfg80211 snd_hda_codec_generic media snd_hda_intel 
snd_hda_codec psmouse snd_hda_core irqbypass e1000e snd_hwdep input_leds 
snd_pcm iTCO_wdt iTCO_vendor_support serio_raw crc32c_intel toshiba_acpi 
snd_timer ptp mei_me fjes bcma snd mei thermal sparse_keymap 
toshiba_haps industrialio lpc_ich soundcore battery ac pps_core shpchp 
intel_ips toshiba_bluetooth wmi rfkill tpm_tis tpm_tis_core
[  563.726159]  tpm cpufreq_ondemand cpufreq_conservative 
cpufreq_powersave acpi_cpufreq evdev nvram sunrpc sch_fq_codel ipv6 
crc_ccitt autofs4 hid_generic usbhid hid sdhci_pci mmc_block sdhci 
sr_mod ehci_pci ehci_hcd mmc_core usbcore usb_common i915 video button 
i2c_algo_bit drm_kms_helper drm
[  563.730623] CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 
4.10.54.10.5_debug+ #13
[  563.732862] Hardware name: TOSHIBA Satellite R630/Portable PC, BIOS 
Version 1.40   07/20/2010
[  563.735127] Workqueue: events hci_uart_write_work [hci_uart]
[  563.737376] task: ffffa03f32951b00 task.stack: ffffbc4ec0768000
[  563.739620] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
[  563.741839] RSP: 0018:ffffbc4ec076bdd8 EFLAGS: 00010046
[  563.744059] RAX: 0000000000000000 RBX: 0000000000000202 RCX: 
ffffa03f37c1cd60
[  563.746286] RDX: 0000000000000001 RSI: ffffa03f37c18ae0 RDI: 
000000000000001c
[  563.748493] RBP: ffffbc4ec076bde0 R08: ffffa03f32332380 R09: 
0000000000000001
[  563.750690] R10: ffffe9de427ce800 R11: ffffa03f33498600 R12: 
0000000000000008
[  563.752884] R13: 000000000000001c R14: ffffa03e79b1ad38 R15: 
ffffa03e79b1a240
[  563.755075] FS:  0000000000000000(0000) GS:ffffa03f37c00000(0000) 
knlGS:0000000000000000
[  563.757260] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  563.759426] CR2: 000000000000001c CR3: 000000011d7a3000 CR4: 
00000000000006f0
[  563.761610] Call Trace:
[  563.763798]  skb_dequeue+0x1d/0x70
[  563.765975]  h4_dequeue+0x16/0x20 [hci_uart]
[  563.768128]  hci_uart_write_work+0xc4/0x100 [hci_uart]
[  563.770262]  process_one_work+0x151/0x410
[  563.772368]  worker_thread+0x69/0x4b0
[  563.774466]  kthread+0x114/0x150
[  563.776519]  ? rescuer_thread+0x340/0x340
[  563.778546]  ? kthread_park+0x90/0x90
[  563.780566]  ret_from_fork+0x2c/0x40
[  563.782559] Code: 89 e5 e8 82 96 98 ff 5d c3 66 66 66 66 90 55 48 89 
e5 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 
00 00 <f0> 0f b1 17 85 c0 75 06 48 89 d8 5b 5d c3 89 c6 e8 29 83 98 ff
[  563.786800] RIP: _raw_spin_lock_irqsave+0x22/0x40 RSP: ffffbc4ec076bdd8
[  563.788940] CR2: 000000000000001c
[  563.798389] ---[ end trace 326429e7baa9f359 ]---

Please see my analysis in bt_hci_uart_ldisc_h4_crash_snippet_log.txt

With my patchset applied, this crash does not occur because locking is 
present which inhibits the Data Link protocol layer's dequeue function 
from running when the Data Link protocol layer is about to be closed down.

I will try to provide some "good" logs of my patchset with btattach 
using BCSP.

Thanks,

Best regards,
Dean

-- 
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.


[-- Attachment #2: v4.10.5.h4_crash.tgz --]
[-- Type: application/x-compressed-tar, Size: 7866 bytes --]

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes
  2017-04-05 15:28         ` Dean Jenkins
@ 2017-04-06  7:23           ` Marcel Holtmann
  0 siblings, 0 replies; 25+ messages in thread
From: Marcel Holtmann @ 2017-04-06  7:23 UTC (permalink / raw)
  To: Dean Jenkins; +Cc: Gustavo F. Padovan, Johan Hedberg, linux-bluetooth

Hi Dean,

>>> If this is an issue in 4.10, then lets get this fixed / hardened.
>>> 
>> 
>> If I manage to produce some more useful results then I will post them.
>> 
> 
> I have now managed to crash the h4 Data Link protocol layer via hci_uart_tty_close().
> 
> This confirms that there is a design flaw in hci_uart_tty_close() which is independent of the Bluetooth Data Link protocol layers.
> 
> I don't have a Bluetooth Radio Module that uses h4 protocol so I used my BCSP enabled Bluetooth Radio Module that has a USB to serial interface. I realise that this is a weird setup but it is OK for this testcase because we need the h4 protocol to be timing out for transmissions. Also the BCSP Bluetooth Radio Module may send BCSP frames which will exercise the h4 receive path although rejection of the frames should occur which is as expected.

can you send me a patch set with my minor comments addressed. Then I have another look at it.

Regards

Marcel


^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2017-04-06  7:23 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-03-28 17:50 [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
2017-03-28 17:50 ` [RFC V1 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work() Dean Jenkins
2017-03-28 17:50 ` [RFC V1 02/16] Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev Dean Jenkins
2017-03-28 17:50 ` [RFC V1 03/16] Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY Dean Jenkins
2017-03-28 17:50 ` [RFC V1 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call Dean Jenkins
2017-03-30 10:11   ` Marcel Holtmann
2017-03-28 17:50 ` [RFC V1 05/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame() Dean Jenkins
2017-03-28 17:50 ` [RFC V1 06/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue() Dean Jenkins
2017-03-28 17:50 ` [RFC V1 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() Dean Jenkins
2017-03-30 10:11   ` Marcel Holtmann
2017-03-28 17:50 ` [RFC V1 08/16] Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close() Dean Jenkins
2017-03-28 17:50 ` [RFC V1 09/16] Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED " Dean Jenkins
2017-03-28 17:50 ` [RFC V1 10/16] Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg Dean Jenkins
2017-03-28 17:50 ` [RFC V1 11/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close() Dean Jenkins
2017-03-28 17:50 ` [RFC V1 12/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync() Dean Jenkins
2017-03-28 17:50 ` [RFC V1 13/16] Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb Dean Jenkins
2017-03-28 17:50 ` [RFC V1 14/16] Bluetooth: hci_ldisc: Simplify flushing Dean Jenkins
2017-03-28 17:50 ` [RFC V1 15/16] Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races Dean Jenkins
2017-03-28 17:50 ` [RFC V1 16/16] Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency Dean Jenkins
2017-03-30 10:11 ` [RFC V1 00/16] hci_ldisc hci_uart_tty_close() fixes Marcel Holtmann
2017-04-03 15:09   ` Dean Jenkins
2017-04-03 15:51     ` Marcel Holtmann
2017-04-04 20:36       ` Dean Jenkins
2017-04-05 15:28         ` Dean Jenkins
2017-04-06  7:23           ` Marcel Holtmann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.