All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes
@ 2017-04-07 17:04 Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work() Dean Jenkins
                   ` (16 more replies)
  0 siblings, 17 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

This is patchset V2 which reorganises hci_uart_tty_close() to overcome a design
flaw related to the proper closure of the Data Link protocol layer. In the worst
case scenario a kernel crash occurs.


Previous Discussions
====================

Please refer to the discussion on my RFC patchset V1 that can be found here:
https://www.spinics.net/lists/linux-bluetooth/msg70077.html


Changes between V1 and V2
=========================

1. Implemented some minor code style changes that were suggested by Marcel
Holtmann.
2. Reordered some of the patches to better group together related changes.


Terms used
==========

Data Link protocol:

This refers to the serial link protocol used by the UART based Bluetooth Radio
Module. h4, BCSP and h5 are examples of Data Link protocols and there are other
protocols available.

Data Link protocol layer:

This refers to the "proto" protocols that implement the Data Link protocol on
the Linux side of the UART link.

These protocols are implemented with a function pointer interface and therefore
a specific Data Link protocol can be bound within the UART HCI driver.


Overview of Data Link protocol data flow
========================================

The Data Link protocol layer uses serial protocol frames to transport
information such as HCI messages to the Bluetooth Radio Module.

Internally within the Linux kernel, the Data Link protocol layer supplies
protocol frames via the HCI UART LDISC and TTY layers to the UART driver. Any
breakage in this data "pipe" will cause a disruption of the communications with
the Bluetooth Radio Module.


Overview of Design Flaw
=======================

There are a number of issues relating to the implementation of
hci_uart_tty_close().

The areas of concern are:

a) poor locking of hu->proto->close() with respect to the other "proto" function
   pointers. This means hu->proto->close() can asynchronously remove resources
   used by the other "proto" function pointers resulting in malfunctions and
   kernel crashes.

b) tearing down parts of the data "pipe" between the Data Link protocol layer
   and the UART driver despite attempting to send HCI commands and trying to
   handle retransmissions within hci_uart_tty_close(). This results in a loss of
   communications with the Bluetooth Radio Module. Note that a loss of
   communications can cause the Data Link protocol layer to independently
   attempt to retransmit such as in the case of BCSP.

c) suspect HCI_QUIRK_RESET_ON_CLOSE has been broken for many years because it
   is not possible to get the HCI RESET command successfully ACK'ed due to the
   data "pipe" being cut in multiple places during the execution of
   tty_ldisc_close().

Please note the issue is independent of the implementation of the Data Link
protocol layer. BCSP closedown scenarios have been used as an example. 


Test cases
==========

Test 1
------

Setup Bluetooth to use a BCSP based UART Bluetooth Radio Module. In this case
use the Bluez hciattach utility.

Then kill the userland hciattach program by doing
killall hciattach

When hciattach terminates, SIGTERM handling causes ioctl TIOCSETD to run
kernel function tty_set_ldisc().

Test 2
------

Setup Bluetooth to use a BCSP based UART Bluetooth Radio Module. In this case
use the Bluez btattach utility.

Note that btattach does not establish a BCSP connection due to missing some BCSP
protocol implementation. However, this is ideal in exposing the design flaw.

Here is an example test script that can trigger a Bluetooth kernel crash in
relation to hci_uart_tty_close().

#!/bin/bash

let i=1

while [ true ]
do
        echo "loop $i"
        ./btattach -A /dev/ttyUSB0 -P bcsp &
        sleep 5
        echo "now killing..."
        killall btattach
        i=$(($i + 1))
done

Kernel crashes within 10 minutes of starting the script.

h4 can also be shown to fail by using
	./btattach -A /dev/ttyUSB0 -P h4 &

Note that in my test run, h4 was communicating with a BCSP enabled Bluetooth
Radio Module. The probability of a crash is increased when poor communications
occur.

When btattach gets a SIGTERM, the program terminates with a group_exit() system
call which causes tty_set_ldisc() to run to perform clean-up.


Analysis of hci_uart_tty_close() callstack
==========================================

Crash 1 - Sending HCI RESET command due to HCI_QUIRK_RESET_ON_CLOSE
-------

>From tests 1 and 2 the callstack can be:

tty_set_ldisc() takes a tty ref lock via tty_ldisc_lock
tty_ldisc_close()
hci_uart_tty_close()
hci_unregister_dev()
hci_dev_do_close()
__hci_req_sync() which tries to send a HCI RESET command which depends on
HCI_QUIRK_RESET_ON_CLOSE being enabled and that is the default condition. This
is done by queueing up the HCI RESET command which adds a work-item to the
hdev->workqueue and waits 2 seconds for a response

Subsequently in a parallel thread to hci_uart_tty_close()
hdev->workqueue runs and processes the work-item by calling
hci_cmd_work()
hci_send_frame()
hci_uart_send_frame()
hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
 hu->write_work 

Subsequently in a parallel thread to hci_uart_tty_close()
hci_uart_write_work()
hci_uart_dequeue()
hu->proto->dequeue()
bcsp_dequeue()
tty->ops->write() to write the BCSP frame to the UART driver via TTY layer

Subsequently in a parallel thread to hci_uart_tty_close()
BCSP re-transmission timer expires after 250ms due to no ACK from the Bluetooth
Radio Module.
bcsp_timed_event()
hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
 hu->write_work 

Subsequently in a parallel thread to hci_uart_tty_close()
hci_uart_write_work()
hci_uart_dequeue()
hu->proto->dequeue()
bcsp_dequeue()
tty->ops->write() to write the BCSP frame to the UART driver via TTY layer

Subsequently in a parallel thread to hci_uart_tty_close()
BCSP re-transmission timer expires after 250ms due to no ACK from the Bluetooth
Radio Module.
bcsp_timed_event()
hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
 hu->write_work 

Subsequently in a parallel thread to hci_uart_tty_close()
hci_uart_write_work()
hci_uart_dequeue()
hu->proto->dequeue()
bcsp_dequeue()
tty->ops->write() to write the BCSP frame to the UART driver via TTY layer

etc. BCSP attempts to retransmit every 250ms during the execution of
hci_uart_tty_close() until bcsp_close() is called via hu->proto->close().

Here is an example kernel crash log for kernel 4.10.5 + debug
(sorry for the bad kernel version string 4.10.54.10.5_debug+)

[ 1561.709737] BUG: unable to handle kernel NULL pointer dereference at 0000000000000044
[ 1561.709829] IP: _raw_spin_lock_irqsave+0x22/0x40
[ 1561.709862] PGD 0
[ 1561.709889] Oops: 0002 1 SMP
[ 1561.709911] Modules linked in: ftdi_sio rfcomm hci_uart btqca xt_set ip_set_hash_ip nf_log_ipv6 ip_set nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_nat_ipv6 xt_recent iptable_nat nf_nat_ipv4 ipt_REJECT nf_reject_ipv4 iptable_mangle iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment ip6t_REJECT nf_reject_ipv6 xt_addrtype bridge stp llc xt_mark ip6table_mangle nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp xt_tcpudp nf_nat_sip xt_CT nf_nat_pptp nf_nat_proto_gre nf_nat_irc ip6table_raw nf_nat_h323 xt_multiport nf_nat_ftp nf_nat_amanda nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323
[ 1561.710172] nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables lockd grace ctr ccm af_packet bnep msr usbserial uvcvideo btusb btrtl arc4 brcmsmac btbcm videobuf2_vmalloc btintel cordic videobuf2_memops bluetooth intel_powerclamp brcmutil videobuf2_v4l2 videobuf2_core mac80211 videodev coretemp kvm_intel kvm joydev media cfg80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek psmouse snd_hda_codec_generic toshiba_acpi input_leds bcma snd_hda_intel irqbypass snd_hda_codec toshiba_bluetooth cpufreq_ondemand sparse_keymap crc32c_intel cpufreq_conservative industrialio snd_hda_core serio_raw wmi rfkill thermal cpufreq_powersave battery ac snd_hwdep snd_pcm acpi_cpufreq snd_timer snd toshiba_haps mei_me mei e1000e
[ 1561.718028] fjes soundcore evdev ptp lpc_ich shpchp pps_core intel_ips tpm_tis tpm_tis_core nvram tpm sch_fq_codel sunrpc ipv6 crc_ccitt autofs4 hid_generic usbhid hid sdhci_pci mmc_block sdhci sr_mod ehci_pci ehci_hcd mmc_core usbcore usb_common i915 video button i2c_algo_bit drm_kms_helper drm [last unloaded: ftdi_sio]
[ 1561.721317] CPU: 1 PID: 4473 Comm: kworker/1:0 Not tainted 4.10.54.10.5_debug+ #12
[ 1561.722981] Hardware name: TOSHIBA Satellite R630/Portable PC, BIOS Version 1.40 07/20/2010
[ 1561.724660] Workqueue: events hci_uart_write_work [hci_uart]
[ 1561.726377] task: ffff98d9b4f15100 task.stack: ffffa868c1178000
[ 1561.728568] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
[ 1561.730355] RSP: 0000:ffffa868c117bda8 EFLAGS: 00010046
[ 1561.731993] RAX: 0000000000000000 RBX: 0000000000000296 RCX: 0000000000079ad6
[ 1561.733646] RDX: 0000000000000001 RSI: ffff98da77c5c580 RDI: 0000000000000044
[ 1561.735314] RBP: ffffa868c117bdb0 R08: 000000000001c5a0 R09: ffffffff9965a54a
[ 1561.736991] R10: fffff48441dc9a80 R11: ffff98da73403700 R12: 0000000000000030
[ 1561.738641] R13: 0000000000000044 R14: 0000000000000030 R15: ffff98d9b4e50000
[ 1561.740288] FS: 0000000000000000(0000) GS:ffff98da77c40000(0000) knlGS:0000000000000000
[ 1561.741940] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1561.743585] CR2: 0000000000000044 CR3: 0000000128d1a000 CR4: 00000000000006e0
[ 1561.745238] Call Trace:
[ 1561.746882] skb_dequeue+0x1d/0x70
[ 1561.748512] bcsp_dequeue+0x27/0x180 [hci_uart]
[ 1561.750130] ? kfree_skbmem+0x5a/0x60
[ 1561.751742] hci_uart_write_work+0xc4/0x100 [hci_uart]
[ 1561.753356] process_one_work+0x151/0x410
[ 1561.754930] worker_thread+0x69/0x4b0
[ 1561.756507] kthread+0x114/0x150
[ 1561.758076] ? rescuer_thread+0x340/0x340
[ 1561.759636] ? kthread_park+0x90/0x90
[ 1561.761191] ret_from_fork+0x2c/0x40
[ 1561.762693] Code: 89 e5 e8 82 96 98 ff 5d c3 66 66 66 66 90 55 48 89 e5 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 06 48 89 d8 5b 5d c3 89 c6 e8 29 83 98 ff
[ 1561.766034] RIP: _raw_spin_lock_irqsave+0x22/0x40 RSP: ffffa868c117bda8
[ 1561.768033] CR2: 0000000000000044
[ 1561.774555] --[ end trace 51cc1d3575c0e559 ]--


Crash 2 - Sending HCI commands but get no ACKs
-------

Using test 2 and h4, a similar crash could be triggered for h4_dequeue().

In this case HCI is attempting to send multiple HCI commands.

tty_set_ldisc() takes a tty ref lock via tty_ldisc_lock
tty_ldisc_close()
hci_uart_tty_close()
hci_unregister_dev()
hci_dev_do_close()

HCI commands become queued up by adding work-items to the hdev->workqueue.

Subsequently in a parallel thread to hci_uart_tty_close()
hdev->workqueue runs and processes a work-item by calling
hci_cmd_work()
hci_send_frame()
hci_uart_send_frame()
hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
 hu->write_work 

Subsequently in a parallel thread to hci_uart_tty_close()
hci_uart_write_work()
hci_uart_dequeue()
hu->proto->dequeue()
tty->ops->write() to write the h4 frame to the UART driver via TTY layer

Subsequently in a parallel thread to hci_uart_tty_close()
hdev->workqueue runs and processes a work-item by calling
hci_cmd_work()
hci_send_frame()
hci_uart_send_frame()
hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
 hu->write_work 

Subsequently in a parallel thread to hci_uart_tty_close()
hci_uart_write_work()
hci_uart_dequeue()
hu->proto->dequeue()
tty->ops->write() to write the h4 frame to the UART driver via TTY layer

etc. Note that HCI TX timeouts occur.

Here is an example kernel crash log for kernel 4.10.5 + debug
(sorry for the bad kernel version string 4.10.54.10.5_debug+)

[ 563.702430] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
[ 563.702491] IP: _raw_spin_lock_irqsave+0x22/0x40
[ 563.702520] PGD 13230e067
[ 563.702521] PUD 1238d6067
[ 563.702540] PMD 0
[ 563.704521] Oops: 0002 1 SMP
[ 563.706215] Modules linked in: hci_uart btqca nf_log_ipv6 ip6table_nat nf_nat_ipv6 ip6t_REJECT nf_reject_ipv6 ip6table_mangle xt_set ip_set_hash_ip ip_set nf_log_ipv4 nf_log_common xt_LOG xt_recent iptable_nat nf_nat_ipv4 xt_comment ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark iptable_mangle xt_tcpudp iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_CT nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp ip6table_raw nf_nat_sip xt_multiport nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_h323 nf_nat_ftp xt_conntrack nf_nat_amanda nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp
[ 563.715634] ts_kmp nf_conntrack_amanda nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables lockd grace ctr ccm af_packet bnep msr arc4 brcmsmac uvcvideo cordic brcmutil ftdi_sio videobuf2_vmalloc usbserial intel_powerclamp mac80211 videobuf2_memops btusb videobuf2_v4l2 btrtl coretemp snd_hda_codec_hdmi btbcm videobuf2_core btintel kvm_intel snd_hda_codec_realtek bluetooth kvm videodev joydev cfg80211 snd_hda_codec_generic media snd_hda_intel snd_hda_codec psmouse snd_hda_core irqbypass e1000e snd_hwdep input_leds snd_pcm iTCO_wdt iTCO_vendor_support serio_raw crc32c_intel toshiba_acpi snd_timer ptp mei_me fjes bcma snd mei thermal sparse_keymap toshiba_haps industrialio lpc_ich soundcore battery ac pps_core shpchp intel_ips toshiba_bluetooth wmi rfkill tpm_tis tpm_tis_core
[ 563.726159] tpm cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq evdev nvram sunrpc sch_fq_codel ipv6 crc_ccitt autofs4 hid_generic usbhid hid sdhci_pci mmc_block sdhci sr_mod ehci_pci ehci_hcd mmc_core usbcore usb_common i915 video button i2c_algo_bit drm_kms_helper drm
[ 563.730623] CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.10.54.10.5_debug+ #13
[ 563.732862] Hardware name: TOSHIBA Satellite R630/Portable PC, BIOS Version 1.40 07/20/2010
[ 563.735127] Workqueue: events hci_uart_write_work [hci_uart]
[ 563.737376] task: ffffa03f32951b00 task.stack: ffffbc4ec0768000
[ 563.739620] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
[ 563.741839] RSP: 0018:ffffbc4ec076bdd8 EFLAGS: 00010046
[ 563.744059] RAX: 0000000000000000 RBX: 0000000000000202 RCX: ffffa03f37c1cd60
[ 563.746286] RDX: 0000000000000001 RSI: ffffa03f37c18ae0 RDI: 000000000000001c
[ 563.748493] RBP: ffffbc4ec076bde0 R08: ffffa03f32332380 R09: 0000000000000001
[ 563.750690] R10: ffffe9de427ce800 R11: ffffa03f33498600 R12: 0000000000000008
[ 563.752884] R13: 000000000000001c R14: ffffa03e79b1ad38 R15: ffffa03e79b1a240
[ 563.755075] FS: 0000000000000000(0000) GS:ffffa03f37c00000(0000) knlGS:0000000000000000
[ 563.757260] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 563.759426] CR2: 000000000000001c CR3: 000000011d7a3000 CR4: 00000000000006f0
[ 563.761610] Call Trace:
[ 563.763798] skb_dequeue+0x1d/0x70
[ 563.765975] h4_dequeue+0x16/0x20 [hci_uart]
[ 563.768128] hci_uart_write_work+0xc4/0x100 [hci_uart]
[ 563.770262] process_one_work+0x151/0x410
[ 563.772368] worker_thread+0x69/0x4b0
[ 563.774466] kthread+0x114/0x150
[ 563.776519] ? rescuer_thread+0x340/0x340
[ 563.778546] ? kthread_park+0x90/0x90
[ 563.780566] ret_from_fork+0x2c/0x40
[ 563.782559] Code: 89 e5 e8 82 96 98 ff 5d c3 66 66 66 66 90 55 48 89 e5 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 06 48 89 d8 5b 5d c3 89 c6 e8 29 83 98 ff
[ 563.786800] RIP: _raw_spin_lock_irqsave+0x22/0x40 RSP: ffffbc4ec076bdd8
[ 563.788940] CR2: 000000000000001c
[ 563.798389] --[ end trace 326429e7baa9f359 ]--


Detailed analysis
=================

The failure can be rare due to the following mitigations:

a) the Data Link protocol layer might be idle when hci_uart_tty_close() runs.
b) the system never shuts down Bluetooth so hci_uart_tty_close() does not run.
c) HCI_QUIRK_RESET_ON_CLOSE is disabled so the HCI RESET command is not sent.

Please note that disabling HCI_QUIRK_RESET_ON_CLOSE helps to reduce the
probability that a crash occurs. However, it is not a fix.

It can be seen that the following threads can be running concurrently with
hci_uart_tty_close():

a) hci_cmd_work() via the work queue hdev->workqueue
b) hci_uart_write_work() via the work queue hu->write_work
c) Data Link protocol layer retransmission timer such as bcsp_timed_event()

In order to send a HCI command the Data Link protocol layer must be able to send
and receive frames from the UART port that is physically connected to the UART
based Bluetooth Radio Module. If this data "pipe" is broken then the Data Link
protocol layer may perform retransmissions such as in the case of BCSP. In
addition, a loss of communications causes the HCI command to not be ACK'ed so
causes a HCI command timeout to occur.

In the case of HCI_QUIRK_RESET_ON_CLOSE, the sending of the HCI RESET command
has a 2 second timeout which will block __hci_req_sync() for up to 2 seconds.

The hu->proto->dequeue() kernel crash occurs because hu->proto->close() was
called which removes the resources needed by hu->proto->dequeue(). This causes
a NULL pointer dereference kernel crash to occur when hu->proto->dequeue()
executes.

The reason why hu->proto->close() can run before or during hu->proto->dequeue()
is because the call to cancel_work_sync(&hu->write_work) is ineffective. This is
because hci_cmd_work() and BCSP bcsp_timed_event() can schedule work-items
after the cancel_work_sync() completed.

Note that flushing can corrupt any protocol serial frames being sent from the
Data Link protocol layer to the UART driver and this is undesirable.


Solution
========

Reorganise hci_uart_tty_close() so that hci_unregister_dev() can run with no
interference to the data "pipe" between the Data Link protocol layer such as
BCSP and the UART driver.

The patchset cleans up the HCI_UART_REGISTERED and HCI_UART_PROTO_READY flag
handling so that:

a) hdev is only valid when HCI_UART_REGISTERED is in the set state.
b) the "proto" Data Link function pointers can only be used when
   HCI_UART_PROTO_READY is in the set state.

The most important patch is
"Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races"

This patch adds rwlocking around the clearing of the flag HCI_UART_PROTO_READY.
This is because the atomic flag HCI_UART_PROTO_READY cannot always prevent a
thread using a Data Link protocol layer function pointer during or after closure
of the Data Link protocol layer. This can result in a kernel crash. Remember
that the BCSP retransmission handling runs in a separate thread and tries to
send a new frame. Therefore there is a small window of a race condition for the
flag HCI_UART_PROTO_READY to be seen in the set state and then calls the
"proto" function pointer after the Data Link protocol layer has been closed,
resulting in a kernel crash.


Overview of patches
-------------------

The patches were rebased onto the bluetooth-next/master branch commit:
3eed895 Bluetooth: L2CAP: Don't return -EAGAIN if out of credits

Tidy-up patches (mainly for h5 binding error handling)
  Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
  Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
  Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY

Adds a comment to warn that hci_unregister_dev() may send a HCI RESET command
  Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev()
    call

Simplifies the relationship between HCI_UART_REGISTERED and HCI_UART_PROTO_READY
  Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close()

Prevents "proto" functions being used immediately before and after
hu->proto->close()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
  Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races

Prevents the "data" pipe being cut too early inside hci_uart_tty_close()
  Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in
    hci_uart_tty_close()
  Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close()
  Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg
  Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync()
  Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb
  Bluetooth: hci_ldisc: Simplify flushing

Prevent concurrency of hci_uart_tty_ioctl() with hci_uart_tty_close()
  Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency


Further work
============

This patchset tackles the issues of hci_uart_tty_close().

However, the TTY layer needs investigation because tty_set_ldisc() has a tty ref
lock during the execution of hci_uart_tty_close() which:

a) prevents flush_to_ldisc() from passing the received UART data to the Data
   Link protocol layer.
b) prevents hci_uart_tty_wakeup() from passing the Data Link protocol layer's
   transmission data to the UART driver.

Therefore, the TTY layer cuts the data "pipe" during the execution of
hci_uart_tty_close() causing a loss of communications with the Bluetooth Radio
Module.

This patchset does not attempt to fix the TTY layer. This means that the
HCI_QUIRK_RESET_ON_CLOSE will remain being unsuccessful in sending the HCI RESET
command. In fact, all attempts at sending an HCI command during
hci_uart_tty_close() will still fail.


Dean Jenkins (16):
  Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
  Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
  Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY
  Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev()
    call
  Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
  Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
  Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races
  Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in
    hci_uart_tty_close()
  Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close()
  Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg
  Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync()
  Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb
  Bluetooth: hci_ldisc: Simplify flushing
  Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency

 drivers/bluetooth/hci_ldisc.c | 105 ++++++++++++++++++++++++++++++++----------
 drivers/bluetooth/hci_uart.h  |   3 ++
 2 files changed, 84 insertions(+), 24 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 18+ messages in thread

* [PATCH V2 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 02/16] Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev Dean Jenkins
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

If hci_register_dev() returns an error in hci_uart_init_work()
then the HCI_UART_REGISTERED bit gets erroneously set due to
a missing return statement. Therefore, add the missing return
statement.

The consequence of the missing return is that the HCI UART is not
registered but HCI_UART_REGISTERED is set which allows the code
to think that hu->hdev is safe to access but hu->hdev has been
freed so could lead to a crash.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 9497c46..3a65414 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -186,6 +186,7 @@ static void hci_uart_init_work(struct work_struct *work)
 		hci_free_dev(hu->hdev);
 		hu->hdev = NULL;
 		hu->proto->close(hu);
+		return;
 	}
 
 	set_bit(HCI_UART_REGISTERED, &hu->flags);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 02/16] Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work() Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 03/16] Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY Dean Jenkins
                   ` (14 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

When hci_register_dev() fails, hu->hdev should be set to NULL before
freeing hdev. This avoids potential use of hu->hdev after it has been
freed.

This commit sets hu->hdev to NULL before calling hci_free_dev() in error
handling scenarios in hci_uart_init_work() and hci_uart_register_dev().

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 3a65414..a351cc7 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -176,6 +176,7 @@ static void hci_uart_init_work(struct work_struct *work)
 {
 	struct hci_uart *hu = container_of(work, struct hci_uart, init_ready);
 	int err;
+	struct hci_dev *hdev;
 
 	if (!test_and_clear_bit(HCI_UART_INIT_PENDING, &hu->hdev_flags))
 		return;
@@ -183,8 +184,9 @@ static void hci_uart_init_work(struct work_struct *work)
 	err = hci_register_dev(hu->hdev);
 	if (err < 0) {
 		BT_ERR("Can't register HCI device");
-		hci_free_dev(hu->hdev);
+		hdev = hu->hdev;
 		hu->hdev = NULL;
+		hci_free_dev(hdev);
 		hu->proto->close(hu);
 		return;
 	}
@@ -617,6 +619,7 @@ static int hci_uart_register_dev(struct hci_uart *hu)
 
 	if (hci_register_dev(hdev) < 0) {
 		BT_ERR("Can't register HCI device");
+		hu->hdev = NULL;
 		hci_free_dev(hdev);
 		return -ENODEV;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 03/16] Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work() Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 02/16] Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call Dean Jenkins
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Ensure that HCI_UART_PROTO_READY is cleared before close(hu) is
called which closes the Data Link protocol layer.

Therefore, add the missing bit clear of HCI_UART_PROTO_READY to
hci_uart_init_work() so that the flag is cleared when
hci_register_dev fails.

Without the fix, the functions of the Data Link protocol layer could
potentially be accessed after that layer has been closed. This
could lead to a crash as memory would have been freed in that layer.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index a351cc7..1887ad4 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -187,6 +187,7 @@ static void hci_uart_init_work(struct work_struct *work)
 		hdev = hu->hdev;
 		hu->hdev = NULL;
 		hci_free_dev(hdev);
+		clear_bit(HCI_UART_PROTO_READY, &hu->flags);
 		hu->proto->close(hu);
 		return;
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (2 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 03/16] Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 05/16] Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close() Dean Jenkins
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

This commit relates to the HCI UART quirk HCI_QUIRK_RESET_ON_CLOSE
which is enabled by default and can be disabled by setting hdev_flags
flag HCI_UART_RESET_ON_INIT via ioctl HCIUARTSETFLAGS from userland.

The implementation of HCI_QUIRK_RESET_ON_CLOSE is broken for the BCSP
Data Link protocol layer because it can be seen that
hci_unregister_dev() takes 2 seconds to run due to BCSP communications
failure. Suspect that sending of the HCI RESET command is broken
for the other Data Link protocols as well.

Therefore, add a comment to inform that the call to
hci_unregister_dev() in hci_uart_tty_close() may send a HCI RESET
command. This transmission needs the HCI UART driver to remain
operational including having the Data Link protocol being bound to
the HCI UART driver. If the transmission fails, then
hci_unregister_dev() waits HCI_CMD_TIMEOUT (2) seconds for the
timeout to occur which is undesirable.

The current software has a call to hci_unregister_dev(hdev) in
hci_uart_tty_close() which can cause an attempt at sending the
command HCI RESET to occur through the following callstack:

hci_uart_tty_close()
hci_unregister_dev()
hci_dev_do_close()
__hci_req_sync() to queue up command HCI RESET
which adds a work-item to the hdev->workqueue and waits 2 seconds
for a response

Subsequently
hdev->workqueue runs and processes the work-item by calling
hci_cmd_work()
hci_send_frame()
hci_uart_send_frame() to enqueue into the Data Link protocol layer

No protocol response comes so hci_unregister_dev() takes 2 seconds to
run!

In fact, this hidden transmission of the HCI RESET command is most
likely the root cause of why hci_uart_tty_close() crashed in the past
and was "fixed" with multiple workarounds in the past.

The underlying design flaw is that hci_uart_tty_close() is interfering
with various aspects of transmitting and receiving HCI messages
whilst hci_unregister_dev() is trying to use the Data Link protocol
to send the HCI RESET command. Consequently, the design flaw
causes the Data Link protocol to retransmit as no reply comes from
the Bluetooth Radio module over the UART link.

Over the many years of "fixes", this caused the logic in
hci_uart_tty_close() to become over-complex.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 1887ad4..9a96463 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -498,8 +498,15 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 	if (test_and_clear_bit(HCI_UART_PROTO_READY, &hu->flags)) {
 		if (hdev) {
-			if (test_bit(HCI_UART_REGISTERED, &hu->flags))
+			if (test_bit(HCI_UART_REGISTERED, &hu->flags)) {
+				/* Note hci_unregister_dev() may try to send
+				 * a HCI RESET command. If the transmission
+				 * fails then hci_unregister_dev() waits
+				 * HCI_CMD_TIMEOUT (2) seconds for the timeout
+				 * to occur.
+				 */
 				hci_unregister_dev(hdev);
+			}
 			hci_free_dev(hdev);
 		}
 		hu->proto->close(hu);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 05/16] Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close()
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (3 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 06/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame() Dean Jenkins
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

The treatment of flags HCI_UART_REGISTERED and HCI_UART_PROTO_READY
in hci_uart_tty_close() was combined in multiple if statements. This
is over-complex and can be simplified.

The simplification is to separate the processing of the flags
HCI_UART_REGISTERED and HCI_UART_PROTO_READY in hci_uart_tty_close()
because registration of the HCI UART device is actually independent of
the binding of the Data Link protocol layer. The Data Link protocol
layer is bound before the HCI UART device is registered and therefore
unregister the HCI UART device before unbinding the Data Link protocol
layer. However, the software should not break if the Data Link protocol
somehow becomes unbound whilst the HCI UART device is still registered.

One of the reasons the software got over complex was because
hci_uart_send_frame() did not check the status of the
HCI_UART_PROTO_READY flag which meant that the hdev->workqueue could
operate on an unbound Data Link protocol layer. The complexity of the
logic prevented hci_uart_send_frame() from crashing.

The commit
"Bluetooth: Add protocol check to hci_uart_send_frame()"
for adding the HCI_UART_PROTO_READY flag check to hci_uart_send_frame()
is necessary to allow this simplification to work. Otherwise,
hci_uart_send_frame() can attempt to write to the unbound Data Link
protocol layer resulting in a crash.

Note hci_uart_write_work() races with hci_uart_tty_close() so it is
important that freeing of hdev is moved to the end of
hci_uart_tty_close() to reduce the risk of use after free of hdev.
For completeness set hu->hdev to NULL before freeing hdev so that
hu no longer references the freed hdev.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 30 +++++++++++++++++-------------
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 9a96463..063ccaa 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -496,23 +496,27 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 	cancel_work_sync(&hu->write_work);
 
-	if (test_and_clear_bit(HCI_UART_PROTO_READY, &hu->flags)) {
-		if (hdev) {
-			if (test_bit(HCI_UART_REGISTERED, &hu->flags)) {
-				/* Note hci_unregister_dev() may try to send
-				 * a HCI RESET command. If the transmission
-				 * fails then hci_unregister_dev() waits
-				 * HCI_CMD_TIMEOUT (2) seconds for the timeout
-				 * to occur.
-				 */
-				hci_unregister_dev(hdev);
-			}
-			hci_free_dev(hdev);
+	if (hdev) {
+		if (test_bit(HCI_UART_REGISTERED, &hu->flags)) {
+			/* Note hci_unregister_dev() may try to send a
+			 * HCI RESET command. If the transmission fails then
+			 * hci_unregister_dev() waits HCI_CMD_TIMEOUT
+			 * (2) seconds for the timeout to occur.
+			 */
+			hci_unregister_dev(hdev);
 		}
-		hu->proto->close(hu);
 	}
+
+	if (test_and_clear_bit(HCI_UART_PROTO_READY, &hu->flags))
+		hu->proto->close(hu);
+
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
+	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
+		hu->hdev = NULL;
+		hci_free_dev(hdev);
+	}
+
 	kfree(hu);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 06/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (4 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 05/16] Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close() Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue() Dean Jenkins
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Before attempting to send a HCI message, check that the Data Link
protocol is still bound to the HCI UART driver. This makes the code
consistent with the usage of the other proto function pointers.

Therefore, add a check for HCI_UART_PROTO_READY into hci_uart_send_frame()
and return -EUNATCH if the Data Link protocol is not bound.

This also allows hci_send_frame() to report the error of an unbound
Data Link protocol layer. Therefore, it assists with diagnostics into
why HCI messages are being sent when the Data Link protocol is not
bound and avoids potential crashes.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 063ccaa..da9acc7 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -255,6 +255,9 @@ static int hci_uart_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
 	BT_DBG("%s: type %d len %d", hdev->name, hci_skb_pkt_type(skb),
 	       skb->len);
 
+	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
+		return -EUNATCH;
+
 	hu->proto->enqueue(hu, skb);
 
 	hci_uart_tx_wakeup(hu);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (5 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 06/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame() Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 08/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() Dean Jenkins
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Before attempting to dequeue a Data Link protocol encapsulated message,
check that the Data Link protocol is still bound to the HCI UART driver.
This makes the code consistent with the usage of the other proto
function pointers.

Therefore, add a check for HCI_UART_PROTO_READY into hci_uart_dequeue()
and return NULL if the Data Link protocol is not bound.

This is needed for robustness as there is a scheduling race condition.
hci_uart_write_work() is scheduled to run via work queue hu->write_work
from hci_uart_tx_wakeup(). Therefore, there is a delay between
scheduling hci_uart_write_work() to run and hci_uart_dequeue() running
whereby the Data Link protocol layer could become unbound during the
scheduling delay. In this case, without the check, the call to the
unbound Data Link protocol layer dequeue function can crash.

It is noted that hci_uart_tty_close() has a
"cancel_work_sync(&hu->write_work)" statement but this only reduces
the window of the race condition because it is possible for a new
work-item to be added to work queue hu->write_work after the call to
cancel_work_sync(). For example, Data Link layer retransmissions can
be added to the work queue after the cancel_work_sync() has finished.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index da9acc7..1d09964 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -113,10 +113,12 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
 {
 	struct sk_buff *skb = hu->tx_skb;
 
-	if (!skb)
-		skb = hu->proto->dequeue(hu);
-	else
+	if (!skb) {
+		if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
+			skb = hu->proto->dequeue(hu);
+	} else {
 		hu->tx_skb = NULL;
+	}
 
 	return skb;
 }
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 08/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (6 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue() Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 09/16] Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races Dean Jenkins
                   ` (8 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Before attempting to schedule a work-item onto hu->write_work in
hci_uart_tx_wakeup(), check that the Data Link protocol layer is
still bound to the HCI UART driver.

Failure to perform this protocol check causes a race condition between
the work queue hu->write_work running hci_uart_write_work() and the
Data Link protocol layer being unbound (closed) in hci_uart_tty_close().

Note hci_uart_tty_close() does have a "cancel_work_sync(&hu->write_work)"
but it is ineffective because it cannot prevent work-items being added
to hu->write_work after cancel_work_sync() has run.

Therefore, add a check for HCI_UART_PROTO_READY into hci_uart_tx_wakeup()
which prevents scheduling of the work queue when HCI_UART_PROTO_READY
is in the clear state. However, note a small race condition remains
because the hci_uart_tx_wakeup() thread can run in parallel with the
hci_uart_tty_close() thread so it is possible that a schedule of
hu->write_work can occur when HCI_UART_PROTO_READY is cleared. A complete
solution needs locking of the threads which is implemented in a future
commit.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 1d09964..48826c3 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -125,6 +125,9 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
 
 int hci_uart_tx_wakeup(struct hci_uart *hu)
 {
+	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
+		return 0;
+
 	if (test_and_set_bit(HCI_UART_SENDING, &hu->tx_state)) {
 		set_bit(HCI_UART_TX_WAKEUP, &hu->tx_state);
 		return 0;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 09/16] Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (7 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 08/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 10/16] Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in hci_uart_tty_close() Dean Jenkins
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

When HCI_UART_PROTO_READY is in the set state, the Data Link protocol
layer (proto) is bound to the HCI UART driver. This state allows the
registered proto function pointers to be used by the HCI UART driver.

When unbinding (closing) the Data Link protocol layer, the proto
function pointers much be prevented from being used immediately before
running the proto close function pointer. Otherwise, there is a risk
that a proto non-close function pointer is used during or after the
proto close function pointer is used. The consequences are likely to
be a kernel crash because the proto close function pointer will free
resources used in the Data Link protocol layer.

Therefore, add a reader writer lock (rwlock) solution to prevent the
close proto function pointer from running by using write_lock_irqsave()
whilst the other proto function pointers are protected using
read_lock(). This means HCI_UART_PROTO_READY can safely be cleared
in the knowledge that no proto function pointers are running.

When flag HCI_UART_PROTO_READY is put into the clear state,
proto close function pointer can safely be run. Note
flag HCI_UART_PROTO_SET being in the set state prevents the proto
open function pointer from being run so there is no race condition
between proto open and close function pointers.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 41 ++++++++++++++++++++++++++++++++++++-----
 drivers/bluetooth/hci_uart.h  |  1 +
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 48826c3..65bfc0f 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -114,8 +114,12 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
 	struct sk_buff *skb = hu->tx_skb;
 
 	if (!skb) {
+		read_lock(&hu->proto_lock);
+
 		if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
 			skb = hu->proto->dequeue(hu);
+
+		read_unlock(&hu->proto_lock);
 	} else {
 		hu->tx_skb = NULL;
 	}
@@ -125,18 +129,23 @@ static inline struct sk_buff *hci_uart_dequeue(struct hci_uart *hu)
 
 int hci_uart_tx_wakeup(struct hci_uart *hu)
 {
+	read_lock(&hu->proto_lock);
+
 	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
-		return 0;
+		goto no_schedule;
 
 	if (test_and_set_bit(HCI_UART_SENDING, &hu->tx_state)) {
 		set_bit(HCI_UART_TX_WAKEUP, &hu->tx_state);
-		return 0;
+		goto no_schedule;
 	}
 
 	BT_DBG("");
 
 	schedule_work(&hu->write_work);
 
+no_schedule:
+	read_unlock(&hu->proto_lock);
+
 	return 0;
 }
 
@@ -236,9 +245,13 @@ static int hci_uart_flush(struct hci_dev *hdev)
 	tty_ldisc_flush(tty);
 	tty_driver_flush_buffer(tty);
 
+	read_lock(&hu->proto_lock);
+
 	if (test_bit(HCI_UART_PROTO_READY, &hu->flags))
 		hu->proto->flush(hu);
 
+	read_unlock(&hu->proto_lock);
+
 	return 0;
 }
 
@@ -260,10 +273,15 @@ static int hci_uart_send_frame(struct hci_dev *hdev, struct sk_buff *skb)
 	BT_DBG("%s: type %d len %d", hdev->name, hci_skb_pkt_type(skb),
 	       skb->len);
 
-	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
+	read_lock(&hu->proto_lock);
+
+	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
+		read_unlock(&hu->proto_lock);
 		return -EUNATCH;
+	}
 
 	hu->proto->enqueue(hu, skb);
+	read_unlock(&hu->proto_lock);
 
 	hci_uart_tx_wakeup(hu);
 
@@ -474,6 +492,8 @@ static int hci_uart_tty_open(struct tty_struct *tty)
 	INIT_WORK(&hu->init_ready, hci_uart_init_work);
 	INIT_WORK(&hu->write_work, hci_uart_write_work);
 
+	rwlock_init(&hu->proto_lock);
+
 	/* Flush any pending characters in the driver */
 	tty_driver_flush_buffer(tty);
 
@@ -489,6 +509,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 {
 	struct hci_uart *hu = tty->disc_data;
 	struct hci_dev *hdev;
+	unsigned long flags;
 
 	BT_DBG("tty %p", tty);
 
@@ -515,8 +536,13 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		}
 	}
 
-	if (test_and_clear_bit(HCI_UART_PROTO_READY, &hu->flags))
+	if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
+		write_lock_irqsave(&hu->proto_lock, flags);
+		clear_bit(HCI_UART_PROTO_READY, &hu->flags);
+		write_unlock_irqrestore(&hu->proto_lock, flags);
+
 		hu->proto->close(hu);
+	}
 
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
@@ -574,13 +600,18 @@ static void hci_uart_tty_receive(struct tty_struct *tty, const u8 *data,
 	if (!hu || tty != hu->tty)
 		return;
 
-	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags))
+	read_lock(&hu->proto_lock);
+
+	if (!test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
+		read_unlock(&hu->proto_lock);
 		return;
+	}
 
 	/* It does not need a lock here as it is already protected by a mutex in
 	 * tty caller
 	 */
 	hu->proto->recv(hu, data, count);
+	read_unlock(&hu->proto_lock);
 
 	if (hu->hdev)
 		hu->hdev->stat.byte_rx += count;
diff --git a/drivers/bluetooth/hci_uart.h b/drivers/bluetooth/hci_uart.h
index 0701395..580ca98 100644
--- a/drivers/bluetooth/hci_uart.h
+++ b/drivers/bluetooth/hci_uart.h
@@ -85,6 +85,7 @@ struct hci_uart {
 	struct work_struct	write_work;
 
 	const struct hci_uart_proto *proto;
+	rwlock_t		proto_lock;	/* Stop work for proto close */
 	void			*priv;
 
 	struct sk_buff		*tx_skb;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 10/16] Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in hci_uart_tty_close()
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (8 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 09/16] Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 11/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close() Dean Jenkins
                   ` (6 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

The code in hci_uart_tty_close is over complex in handling flag
HCI_UART_REGISTERED as it is unnecessary to check that hdev is NULL.
This is because hdev is only valid when HCI_UART_REGISTERED is in
the set state.

Therefore, remove all "if (hdev)" checks and instead check for flag
HCI_UART_REGISTERED being in the set state.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 17 +++++++----------
 1 file changed, 7 insertions(+), 10 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 65bfc0f..4a32fd2 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -520,20 +520,17 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		return;
 
 	hdev = hu->hdev;
-	if (hdev)
+	if (test_bit(HCI_UART_REGISTERED, &hu->flags))
 		hci_uart_close(hdev);
 
 	cancel_work_sync(&hu->write_work);
 
-	if (hdev) {
-		if (test_bit(HCI_UART_REGISTERED, &hu->flags)) {
-			/* Note hci_unregister_dev() may try to send a
-			 * HCI RESET command. If the transmission fails then
-			 * hci_unregister_dev() waits HCI_CMD_TIMEOUT
-			 * (2) seconds for the timeout to occur.
-			 */
-			hci_unregister_dev(hdev);
-		}
+	if (test_bit(HCI_UART_REGISTERED, &hu->flags)) {
+		/* Note hci_unregister_dev() may try to send a HCI RESET
+		 * command. If the transmission fails then hci_unregister_dev()
+		 * waits HCI_CMD_TIMEOUT (2) seconds for the timeout to occur.
+		 */
+		hci_unregister_dev(hdev);
 	}
 
 	if (test_bit(HCI_UART_PROTO_READY, &hu->flags)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 11/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close()
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (9 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 10/16] Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in hci_uart_tty_close() Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 12/16] Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg Dean Jenkins
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

In hci_uart_tty_close() move hci_uart_close() to after the call to
hci_unregister_dev() to avoid the tty flush in hci_uart_close()
and clearance of HCI_RUNNING from corrupting the Data Link protocol
layer's communication stream.

In fact, perform hci_uart_close() at the end of hci_uart_tty_close()
where the Data Link protocol layer has been closed so that no
more data can be written in the tty layer. This makes the flush
clear out any non-sent data from the tty layer although this
is probably unnecessary.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 4a32fd2..b2b6515 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -520,8 +520,6 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		return;
 
 	hdev = hu->hdev;
-	if (test_bit(HCI_UART_REGISTERED, &hu->flags))
-		hci_uart_close(hdev);
 
 	cancel_work_sync(&hu->write_work);
 
@@ -544,6 +542,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
 	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
+		hci_uart_close(hdev);
 		hu->hdev = NULL;
 		hci_free_dev(hdev);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 12/16] Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (10 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 11/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close() Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 13/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync() Dean Jenkins
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

hci_uart_tty_close() is detaching the tty prematurely which prevents
the Data Link protocol layer from using the tty serial port in the
call to hci_unregister_dev().

Consequently, when hci_unregister_dev() attempts to send a HCI RESET
command, the BCSP layer attempts to retransmit the message many times
and the command is timed-out after HCI_CMD_TIMEOUT (2) seconds.
Nothing was physically transmitted because the tty was detached too
early.

Therefore, move the detach statement "tty->disc_data = NULL" to after
hci_unregister_dev() so that the HCI RESET command maybe sent and
acknowledged.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index b2b6515..761282f 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -513,9 +513,6 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 	BT_DBG("tty %p", tty);
 
-	/* Detach from the tty */
-	tty->disc_data = NULL;
-
 	if (!hu)
 		return;
 
@@ -539,6 +536,9 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		hu->proto->close(hu);
 	}
 
+	/* Detach from the tty */
+	tty->disc_data = NULL;
+
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
 	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 13/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync()
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (11 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 12/16] Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 14/16] Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb Dean Jenkins
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

The current location of cancel_work_sync() in hci_uart_tty_close()
before the call to hci_unregister_dev(hdev) may momentarily interfere
with the execution of hci_uart_tty_close() by waiting for an active call
to hci_uart_write_work() to finish. However, its current location is
ineffective at preventing a race condition described next.

cancel_work_sync() is needed to try to prevent hci_uart_write_work() from
running whilst the Data Link protocol layer is being unbound (closed).
However, it is weak as a race condition exists between
hci_uart_write_work() being scheduled via hci_uart_tx_wakeup() and
closure of the Data Link protocol layer. In particular, the Data Link
protocol layer may have a retransmission timer which will independently
call hci_uart_tx_wakeup().

To minimise the race condition, cancel_work_sync() needs to be as close
as possible to the code that closes down the Data Link protocol layer.
Therefore, move the cancel_work_sync() statement to be just before
the code which unbinds (closes) the Data Link protocol layer.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 761282f..236c8e7 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -518,8 +518,6 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 
 	hdev = hu->hdev;
 
-	cancel_work_sync(&hu->write_work);
-
 	if (test_bit(HCI_UART_REGISTERED, &hu->flags)) {
 		/* Note hci_unregister_dev() may try to send a HCI RESET
 		 * command. If the transmission fails then hci_unregister_dev()
@@ -533,6 +531,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		clear_bit(HCI_UART_PROTO_READY, &hu->flags);
 		write_unlock_irqrestore(&hu->proto_lock, flags);
 
+		cancel_work_sync(&hu->write_work);
 		hu->proto->close(hu);
 	}
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 14/16] Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (12 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 13/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync() Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 15/16] Bluetooth: hci_ldisc: Simplify flushing Dean Jenkins
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

There is a race condition for accessing hu->tx_skb between
hci_uart_flush() and hci_uart_dequeue() which runs in
hci_uart_write_work() from the work queue hu->write_work. This race
condition exists because there is no locking between these 2 threads
to protect hu->tx_skb. Consequently a call to hci_uart_flush() might
be able to free hu->tx_skb whilst hci_uart_write_work() is using
hu->tx_skb which is undesirable as a crash could occur.

Performing any flushing in the transmission path between the Data Link
protocol layer and the UART port may corrupt the data. So freeing
hu->tx_skb or not freeing hu->tx_skb makes little difference to having
intact data or corrupted data. So it is OK not to free hu->tx_skb.

Instead, move the freeing of hu->tx_skb to the end of
hci_uart_tty_close() from hci_uart_flush(). This eliminates the race
condition because the Data Link protocol layer is in the unbound state
and ensures hu->tx_skb is freed before hu is freed. Also use a temporary
pointer to allow hu->tx_skb to be set to NULL before freeing hu->tx_skb.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index 236c8e7..fbf830a 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -237,10 +237,6 @@ static int hci_uart_flush(struct hci_dev *hdev)
 
 	BT_DBG("hdev %p tty %p", hdev, tty);
 
-	if (hu->tx_skb) {
-		kfree_skb(hu->tx_skb); hu->tx_skb = NULL;
-	}
-
 	/* Flush any pending characters in the driver and discipline. */
 	tty_ldisc_flush(tty);
 	tty_driver_flush_buffer(tty);
@@ -510,6 +506,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 	struct hci_uart *hu = tty->disc_data;
 	struct hci_dev *hdev;
 	unsigned long flags;
+	struct sk_buff *temp_skb;
 
 	BT_DBG("tty %p", tty);
 
@@ -546,6 +543,12 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		hci_free_dev(hdev);
 	}
 
+	if (hu->tx_skb) {
+		temp_skb = hu->tx_skb;
+		hu->tx_skb = NULL;
+		kfree_skb(temp_skb);
+	}
+
 	kfree(hu);
 }
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 15/16] Bluetooth: hci_ldisc: Simplify flushing
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (13 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 14/16] Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-07 17:04 ` [PATCH V2 16/16] Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency Dean Jenkins
  2017-04-12  9:13 ` [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

The flushing due to the hci_uart_close() call probably corrupts the
data flow between the Data Link protocol layer and the UART port
(transmission direction). In any case, the flushing activity is
asynchronous to the state of the Data Link protocol layer so is
undesirable.

Modify hci_uart_close() to not perform any flushing because
hci_uart_flush() is explicitly called in hci_dev_do_close() which
is synchronous to the closure of the Data Link protocol layer so
it is OK. In other words, there is no point in flushing multiple
times with the Data Link protocol layer being in the bound state.

In hci_uart_tty_close() call hci_uart_flush() instead of
hci_uart_close() before freeing hdev. This final flush is after the
Data Link protocol layer has been unbound so ensures that the
transmission path is empty.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index fbf830a..ec9b96f 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -256,8 +256,7 @@ static int hci_uart_close(struct hci_dev *hdev)
 {
 	BT_DBG("hdev %p", hdev);
 
-	hci_uart_flush(hdev);
-	hdev->flush = NULL;
+	/* Nothing to do for UART driver */
 	return 0;
 }
 
@@ -538,7 +537,7 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
 
 	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
-		hci_uart_close(hdev);
+		hci_uart_flush(hdev);
 		hu->hdev = NULL;
 		hci_free_dev(hdev);
 	}
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* [PATCH V2 16/16] Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (14 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 15/16] Bluetooth: hci_ldisc: Simplify flushing Dean Jenkins
@ 2017-04-07 17:04 ` Dean Jenkins
  2017-04-12  9:13 ` [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-07 17:04 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Dean Jenkins, Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

There is a concurrency risk with hci_uart_tty_ioctl() running with
itself. Also there is a race condition between hci_uart_tty_close()
clearing the HCI_UART_PROTO_SET flag whilst hci_uart_tty_ioctl() runs.

Even though the HCI_UART_PROTO_SET flag is atomic, this is not
sufficient to prevent hci_uart_tty_ioctl() from potentially accessing
freed or stale variables when hci_uart_tty_close() completes. This is
because the hci_uart_tty_ioctl() thread can run in parallel with the
hci_uart_tty_close() thread. This means the HCI_UART_PROTO_SET flag
can change at any time during the execution of hci_uart_tty_ioctl()
which can lead to a malfunction.

Therefore, add a mutex lock called ioctl_mutex in hci_uart_tty_ioctl()
to prevent self concurrency of hci_uart_tty_ioctl(). In
hci_uart_tty_close() add ioctl_mutex around the clearing of the hu
value from tty->disc_data and clearing the HCI_UART_PROTO_SET flag.

This mutex lock ensures that tty->disc_data does not become NULL
and the HCI_UART_PROTO_SET flag cannot become clear whilst the main
body of hci_uart_tty_ioctl() executes.

Signed-off-by: Dean Jenkins <Dean_Jenkins@mentor.com>
---
 drivers/bluetooth/hci_ldisc.c | 5 +++++
 drivers/bluetooth/hci_uart.h  | 2 ++
 2 files changed, 7 insertions(+)

diff --git a/drivers/bluetooth/hci_ldisc.c b/drivers/bluetooth/hci_ldisc.c
index ec9b96f..dbf16835 100644
--- a/drivers/bluetooth/hci_ldisc.c
+++ b/drivers/bluetooth/hci_ldisc.c
@@ -488,6 +488,7 @@ static int hci_uart_tty_open(struct tty_struct *tty)
 	INIT_WORK(&hu->write_work, hci_uart_write_work);
 
 	rwlock_init(&hu->proto_lock);
+	mutex_init(&hu->ioctl_mutex);
 
 	/* Flush any pending characters in the driver */
 	tty_driver_flush_buffer(tty);
@@ -531,10 +532,12 @@ static void hci_uart_tty_close(struct tty_struct *tty)
 		hu->proto->close(hu);
 	}
 
+	mutex_lock(&hu->ioctl_mutex);
 	/* Detach from the tty */
 	tty->disc_data = NULL;
 
 	clear_bit(HCI_UART_PROTO_SET, &hu->flags);
+	mutex_unlock(&hu->ioctl_mutex);
 
 	if (test_and_clear_bit(HCI_UART_REGISTERED, &hu->flags)) {
 		hci_uart_flush(hdev);
@@ -745,6 +748,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 	if (!hu)
 		return -EBADF;
 
+	mutex_lock(&hu->ioctl_mutex);
 	switch (cmd) {
 	case HCIUARTSETPROTO:
 		if (!test_and_set_bit(HCI_UART_PROTO_SET, &hu->flags)) {
@@ -784,6 +788,7 @@ static int hci_uart_tty_ioctl(struct tty_struct *tty, struct file *file,
 		err = n_tty_ioctl_helper(tty, file, cmd, arg);
 		break;
 	}
+	mutex_unlock(&hu->ioctl_mutex);
 
 	return err;
 }
diff --git a/drivers/bluetooth/hci_uart.h b/drivers/bluetooth/hci_uart.h
index 580ca98..54c435d 100644
--- a/drivers/bluetooth/hci_uart.h
+++ b/drivers/bluetooth/hci_uart.h
@@ -91,6 +91,8 @@ struct hci_uart {
 	struct sk_buff		*tx_skb;
 	unsigned long		tx_state;
 
+	struct mutex		ioctl_mutex;	/* HCI_UART_PROTO_SET lock */
+
 	unsigned int init_speed;
 	unsigned int oper_speed;
 };
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 18+ messages in thread

* Re: [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes
  2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
                   ` (15 preceding siblings ...)
  2017-04-07 17:04 ` [PATCH V2 16/16] Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency Dean Jenkins
@ 2017-04-12  9:13 ` Dean Jenkins
  16 siblings, 0 replies; 18+ messages in thread
From: Dean Jenkins @ 2017-04-12  9:13 UTC (permalink / raw)
  To: Marcel Holtmann; +Cc: Gustavo F . Padovan, Johan Hedberg, linux-bluetooth

Hi Marcel,

On 07/04/17 18:04, Dean Jenkins wrote:
> This is patchset V2 which reorganises hci_uart_tty_close() to overcome a design
> flaw related to the proper closure of the Data Link protocol layer. In the worst
> case scenario a kernel crash occurs.
>
>
> Previous Discussions
> ====================
>
> Please refer to the discussion on my RFC patchset V1 that can be found here:
> https://www.spinics.net/lists/linux-bluetooth/msg70077.html
>
>
> Changes between V1 and V2
> =========================
>
> 1. Implemented some minor code style changes that were suggested by Marcel
> Holtmann.
> 2. Reordered some of the patches to better group together related changes.
>
>
> Terms used
> ==========
>
> Data Link protocol:
>
> This refers to the serial link protocol used by the UART based Bluetooth Radio
> Module. h4, BCSP and h5 are examples of Data Link protocols and there are other
> protocols available.
>
> Data Link protocol layer:
>
> This refers to the "proto" protocols that implement the Data Link protocol on
> the Linux side of the UART link.
>
> These protocols are implemented with a function pointer interface and therefore
> a specific Data Link protocol can be bound within the UART HCI driver.
>
>
> Overview of Data Link protocol data flow
> ========================================
>
> The Data Link protocol layer uses serial protocol frames to transport
> information such as HCI messages to the Bluetooth Radio Module.
>
> Internally within the Linux kernel, the Data Link protocol layer supplies
> protocol frames via the HCI UART LDISC and TTY layers to the UART driver. Any
> breakage in this data "pipe" will cause a disruption of the communications with
> the Bluetooth Radio Module.
>
>
> Overview of Design Flaw
> =======================
>
> There are a number of issues relating to the implementation of
> hci_uart_tty_close().
>
> The areas of concern are:
>
> a) poor locking of hu->proto->close() with respect to the other "proto" function
>     pointers. This means hu->proto->close() can asynchronously remove resources
>     used by the other "proto" function pointers resulting in malfunctions and
>     kernel crashes.
>
> b) tearing down parts of the data "pipe" between the Data Link protocol layer
>     and the UART driver despite attempting to send HCI commands and trying to
>     handle retransmissions within hci_uart_tty_close(). This results in a loss of
>     communications with the Bluetooth Radio Module. Note that a loss of
>     communications can cause the Data Link protocol layer to independently
>     attempt to retransmit such as in the case of BCSP.
>
> c) suspect HCI_QUIRK_RESET_ON_CLOSE has been broken for many years because it
>     is not possible to get the HCI RESET command successfully ACK'ed due to the
>     data "pipe" being cut in multiple places during the execution of
>     tty_ldisc_close().
>
> Please note the issue is independent of the implementation of the Data Link
> protocol layer. BCSP closedown scenarios have been used as an example.
>
>
> Test cases
> ==========
>
> Test 1
> ------
>
> Setup Bluetooth to use a BCSP based UART Bluetooth Radio Module. In this case
> use the Bluez hciattach utility.
>
> Then kill the userland hciattach program by doing
> killall hciattach
>
> When hciattach terminates, SIGTERM handling causes ioctl TIOCSETD to run
> kernel function tty_set_ldisc().
>
> Test 2
> ------
>
> Setup Bluetooth to use a BCSP based UART Bluetooth Radio Module. In this case
> use the Bluez btattach utility.
>
> Note that btattach does not establish a BCSP connection due to missing some BCSP
> protocol implementation. However, this is ideal in exposing the design flaw.
>
> Here is an example test script that can trigger a Bluetooth kernel crash in
> relation to hci_uart_tty_close().
>
> #!/bin/bash
>
> let i=1
>
> while [ true ]
> do
>          echo "loop $i"
>          ./btattach -A /dev/ttyUSB0 -P bcsp &
>          sleep 5
>          echo "now killing..."
>          killall btattach
>          i=$(($i + 1))
> done
>
> Kernel crashes within 10 minutes of starting the script.
>
> h4 can also be shown to fail by using
> 	./btattach -A /dev/ttyUSB0 -P h4 &
>
> Note that in my test run, h4 was communicating with a BCSP enabled Bluetooth
> Radio Module. The probability of a crash is increased when poor communications
> occur.
>
> When btattach gets a SIGTERM, the program terminates with a group_exit() system
> call which causes tty_set_ldisc() to run to perform clean-up.
>
>
> Analysis of hci_uart_tty_close() callstack
> ==========================================
>
> Crash 1 - Sending HCI RESET command due to HCI_QUIRK_RESET_ON_CLOSE
> -------
>
>  From tests 1 and 2 the callstack can be:
>
> tty_set_ldisc() takes a tty ref lock via tty_ldisc_lock
> tty_ldisc_close()
> hci_uart_tty_close()
> hci_unregister_dev()
> hci_dev_do_close()
> __hci_req_sync() which tries to send a HCI RESET command which depends on
> HCI_QUIRK_RESET_ON_CLOSE being enabled and that is the default condition. This
> is done by queueing up the HCI RESET command which adds a work-item to the
> hdev->workqueue and waits 2 seconds for a response
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> hdev->workqueue runs and processes the work-item by calling
> hci_cmd_work()
> hci_send_frame()
> hci_uart_send_frame()
> hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
>   hu->write_work
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> hci_uart_write_work()
> hci_uart_dequeue()
> hu->proto->dequeue()
> bcsp_dequeue()
> tty->ops->write() to write the BCSP frame to the UART driver via TTY layer
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> BCSP re-transmission timer expires after 250ms due to no ACK from the Bluetooth
> Radio Module.
> bcsp_timed_event()
> hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
>   hu->write_work
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> hci_uart_write_work()
> hci_uart_dequeue()
> hu->proto->dequeue()
> bcsp_dequeue()
> tty->ops->write() to write the BCSP frame to the UART driver via TTY layer
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> BCSP re-transmission timer expires after 250ms due to no ACK from the Bluetooth
> Radio Module.
> bcsp_timed_event()
> hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
>   hu->write_work
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> hci_uart_write_work()
> hci_uart_dequeue()
> hu->proto->dequeue()
> bcsp_dequeue()
> tty->ops->write() to write the BCSP frame to the UART driver via TTY layer
>
> etc. BCSP attempts to retransmit every 250ms during the execution of
> hci_uart_tty_close() until bcsp_close() is called via hu->proto->close().
>
> Here is an example kernel crash log for kernel 4.10.5 + debug
> (sorry for the bad kernel version string 4.10.54.10.5_debug+)
>
> [ 1561.709737] BUG: unable to handle kernel NULL pointer dereference at 0000000000000044
> [ 1561.709829] IP: _raw_spin_lock_irqsave+0x22/0x40
> [ 1561.709862] PGD 0
> [ 1561.709889] Oops: 0002 1 SMP
> [ 1561.709911] Modules linked in: ftdi_sio rfcomm hci_uart btqca xt_set ip_set_hash_ip nf_log_ipv6 ip_set nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_nat_ipv6 xt_recent iptable_nat nf_nat_ipv4 ipt_REJECT nf_reject_ipv4 iptable_mangle iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment ip6t_REJECT nf_reject_ipv6 xt_addrtype bridge stp llc xt_mark ip6table_mangle nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp xt_tcpudp nf_nat_sip xt_CT nf_nat_pptp nf_nat_proto_gre nf_nat_irc ip6table_raw nf_nat_h323 xt_multiport nf_nat_ftp nf_nat_amanda nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323
> [ 1561.710172] nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables lockd grace ctr ccm af_packet bnep msr usbserial uvcvideo btusb btrtl arc4 brcmsmac btbcm videobuf2_vmalloc btintel cordic videobuf2_memops bluetooth intel_powerclamp brcmutil videobuf2_v4l2 videobuf2_core mac80211 videodev coretemp kvm_intel kvm joydev media cfg80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek psmouse snd_hda_codec_generic toshiba_acpi input_leds bcma snd_hda_intel irqbypass snd_hda_codec toshiba_bluetooth cpufreq_ondemand sparse_keymap crc32c_intel cpufreq_conservative industrialio snd_hda_core serio_raw wmi rfkill thermal cpufreq_powersave battery ac snd_hwdep snd_pcm acpi_cpufreq snd_timer snd toshiba_haps mei_me mei e1000e
> [ 1561.718028] fjes soundcore evdev ptp lpc_ich shpchp pps_core intel_ips tpm_tis tpm_tis_core nvram tpm sch_fq_codel sunrpc ipv6 crc_ccitt autofs4 hid_generic usbhid hid sdhci_pci mmc_block sdhci sr_mod ehci_pci ehci_hcd mmc_core usbcore usb_common i915 video button i2c_algo_bit drm_kms_helper drm [last unloaded: ftdi_sio]
> [ 1561.721317] CPU: 1 PID: 4473 Comm: kworker/1:0 Not tainted 4.10.54.10.5_debug+ #12
> [ 1561.722981] Hardware name: TOSHIBA Satellite R630/Portable PC, BIOS Version 1.40 07/20/2010
> [ 1561.724660] Workqueue: events hci_uart_write_work [hci_uart]
> [ 1561.726377] task: ffff98d9b4f15100 task.stack: ffffa868c1178000
> [ 1561.728568] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
> [ 1561.730355] RSP: 0000:ffffa868c117bda8 EFLAGS: 00010046
> [ 1561.731993] RAX: 0000000000000000 RBX: 0000000000000296 RCX: 0000000000079ad6
> [ 1561.733646] RDX: 0000000000000001 RSI: ffff98da77c5c580 RDI: 0000000000000044
> [ 1561.735314] RBP: ffffa868c117bdb0 R08: 000000000001c5a0 R09: ffffffff9965a54a
> [ 1561.736991] R10: fffff48441dc9a80 R11: ffff98da73403700 R12: 0000000000000030
> [ 1561.738641] R13: 0000000000000044 R14: 0000000000000030 R15: ffff98d9b4e50000
> [ 1561.740288] FS: 0000000000000000(0000) GS:ffff98da77c40000(0000) knlGS:0000000000000000
> [ 1561.741940] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 1561.743585] CR2: 0000000000000044 CR3: 0000000128d1a000 CR4: 00000000000006e0
> [ 1561.745238] Call Trace:
> [ 1561.746882] skb_dequeue+0x1d/0x70
> [ 1561.748512] bcsp_dequeue+0x27/0x180 [hci_uart]
> [ 1561.750130] ? kfree_skbmem+0x5a/0x60
> [ 1561.751742] hci_uart_write_work+0xc4/0x100 [hci_uart]
> [ 1561.753356] process_one_work+0x151/0x410
> [ 1561.754930] worker_thread+0x69/0x4b0
> [ 1561.756507] kthread+0x114/0x150
> [ 1561.758076] ? rescuer_thread+0x340/0x340
> [ 1561.759636] ? kthread_park+0x90/0x90
> [ 1561.761191] ret_from_fork+0x2c/0x40
> [ 1561.762693] Code: 89 e5 e8 82 96 98 ff 5d c3 66 66 66 66 90 55 48 89 e5 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 06 48 89 d8 5b 5d c3 89 c6 e8 29 83 98 ff
> [ 1561.766034] RIP: _raw_spin_lock_irqsave+0x22/0x40 RSP: ffffa868c117bda8
> [ 1561.768033] CR2: 0000000000000044
> [ 1561.774555] --[ end trace 51cc1d3575c0e559 ]--
>
>
> Crash 2 - Sending HCI commands but get no ACKs
> -------
>
> Using test 2 and h4, a similar crash could be triggered for h4_dequeue().
>
> In this case HCI is attempting to send multiple HCI commands.
>
> tty_set_ldisc() takes a tty ref lock via tty_ldisc_lock
> tty_ldisc_close()
> hci_uart_tty_close()
> hci_unregister_dev()
> hci_dev_do_close()
>
> HCI commands become queued up by adding work-items to the hdev->workqueue.
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> hdev->workqueue runs and processes a work-item by calling
> hci_cmd_work()
> hci_send_frame()
> hci_uart_send_frame()
> hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
>   hu->write_work
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> hci_uart_write_work()
> hci_uart_dequeue()
> hu->proto->dequeue()
> tty->ops->write() to write the h4 frame to the UART driver via TTY layer
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> hdev->workqueue runs and processes a work-item by calling
> hci_cmd_work()
> hci_send_frame()
> hci_uart_send_frame()
> hci_uart_tx_wakeup() to schedule hci_uart_write_work() via the work queue
>   hu->write_work
>
> Subsequently in a parallel thread to hci_uart_tty_close()
> hci_uart_write_work()
> hci_uart_dequeue()
> hu->proto->dequeue()
> tty->ops->write() to write the h4 frame to the UART driver via TTY layer
>
> etc. Note that HCI TX timeouts occur.
>
> Here is an example kernel crash log for kernel 4.10.5 + debug
> (sorry for the bad kernel version string 4.10.54.10.5_debug+)
>
> [ 563.702430] BUG: unable to handle kernel NULL pointer dereference at 000000000000001c
> [ 563.702491] IP: _raw_spin_lock_irqsave+0x22/0x40
> [ 563.702520] PGD 13230e067
> [ 563.702521] PUD 1238d6067
> [ 563.702540] PMD 0
> [ 563.704521] Oops: 0002 1 SMP
> [ 563.706215] Modules linked in: hci_uart btqca nf_log_ipv6 ip6table_nat nf_nat_ipv6 ip6t_REJECT nf_reject_ipv6 ip6table_mangle xt_set ip_set_hash_ip ip_set nf_log_ipv4 nf_log_common xt_LOG xt_recent iptable_nat nf_nat_ipv4 xt_comment ipt_REJECT nf_reject_ipv4 xt_addrtype bridge stp llc xt_mark iptable_mangle xt_tcpudp iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_CT nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp ip6table_raw nf_nat_sip xt_multiport nf_nat_pptp nf_nat_proto_gre nf_nat_irc nf_conntrack_ipv6 nf_defrag_ipv6 nf_nat_h323 nf_nat_ftp xt_conntrack nf_nat_amanda nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 nf_conntrack_ftp
> [ 563.715634] ts_kmp nf_conntrack_amanda nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables lockd grace ctr ccm af_packet bnep msr arc4 brcmsmac uvcvideo cordic brcmutil ftdi_sio videobuf2_vmalloc usbserial intel_powerclamp mac80211 videobuf2_memops btusb videobuf2_v4l2 btrtl coretemp snd_hda_codec_hdmi btbcm videobuf2_core btintel kvm_intel snd_hda_codec_realtek bluetooth kvm videodev joydev cfg80211 snd_hda_codec_generic media snd_hda_intel snd_hda_codec psmouse snd_hda_core irqbypass e1000e snd_hwdep input_leds snd_pcm iTCO_wdt iTCO_vendor_support serio_raw crc32c_intel toshiba_acpi snd_timer ptp mei_me fjes bcma snd mei thermal sparse_keymap toshiba_haps industrialio lpc_ich soundcore battery ac pps_core shpchp intel_ips toshiba_bluetooth wmi rfkill tpm_tis tpm_tis_core
> [ 563.726159] tpm cpufreq_ondemand cpufreq_conservative cpufreq_powersave acpi_cpufreq evdev nvram sunrpc sch_fq_codel ipv6 crc_ccitt autofs4 hid_generic usbhid hid sdhci_pci mmc_block sdhci sr_mod ehci_pci ehci_hcd mmc_core usbcore usb_common i915 video button i2c_algo_bit drm_kms_helper drm
> [ 563.730623] CPU: 0 PID: 29 Comm: kworker/0:1 Not tainted 4.10.54.10.5_debug+ #13
> [ 563.732862] Hardware name: TOSHIBA Satellite R630/Portable PC, BIOS Version 1.40 07/20/2010
> [ 563.735127] Workqueue: events hci_uart_write_work [hci_uart]
> [ 563.737376] task: ffffa03f32951b00 task.stack: ffffbc4ec0768000
> [ 563.739620] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40
> [ 563.741839] RSP: 0018:ffffbc4ec076bdd8 EFLAGS: 00010046
> [ 563.744059] RAX: 0000000000000000 RBX: 0000000000000202 RCX: ffffa03f37c1cd60
> [ 563.746286] RDX: 0000000000000001 RSI: ffffa03f37c18ae0 RDI: 000000000000001c
> [ 563.748493] RBP: ffffbc4ec076bde0 R08: ffffa03f32332380 R09: 0000000000000001
> [ 563.750690] R10: ffffe9de427ce800 R11: ffffa03f33498600 R12: 0000000000000008
> [ 563.752884] R13: 000000000000001c R14: ffffa03e79b1ad38 R15: ffffa03e79b1a240
> [ 563.755075] FS: 0000000000000000(0000) GS:ffffa03f37c00000(0000) knlGS:0000000000000000
> [ 563.757260] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 563.759426] CR2: 000000000000001c CR3: 000000011d7a3000 CR4: 00000000000006f0
> [ 563.761610] Call Trace:
> [ 563.763798] skb_dequeue+0x1d/0x70
> [ 563.765975] h4_dequeue+0x16/0x20 [hci_uart]
> [ 563.768128] hci_uart_write_work+0xc4/0x100 [hci_uart]
> [ 563.770262] process_one_work+0x151/0x410
> [ 563.772368] worker_thread+0x69/0x4b0
> [ 563.774466] kthread+0x114/0x150
> [ 563.776519] ? rescuer_thread+0x340/0x340
> [ 563.778546] ? kthread_park+0x90/0x90
> [ 563.780566] ret_from_fork+0x2c/0x40
> [ 563.782559] Code: 89 e5 e8 82 96 98 ff 5d c3 66 66 66 66 90 55 48 89 e5 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 00 00 <f0> 0f b1 17 85 c0 75 06 48 89 d8 5b 5d c3 89 c6 e8 29 83 98 ff
> [ 563.786800] RIP: _raw_spin_lock_irqsave+0x22/0x40 RSP: ffffbc4ec076bdd8
> [ 563.788940] CR2: 000000000000001c
> [ 563.798389] --[ end trace 326429e7baa9f359 ]--
>
>
> Detailed analysis
> =================
>
> The failure can be rare due to the following mitigations:
>
> a) the Data Link protocol layer might be idle when hci_uart_tty_close() runs.
> b) the system never shuts down Bluetooth so hci_uart_tty_close() does not run.
> c) HCI_QUIRK_RESET_ON_CLOSE is disabled so the HCI RESET command is not sent.
>
> Please note that disabling HCI_QUIRK_RESET_ON_CLOSE helps to reduce the
> probability that a crash occurs. However, it is not a fix.
>
> It can be seen that the following threads can be running concurrently with
> hci_uart_tty_close():
>
> a) hci_cmd_work() via the work queue hdev->workqueue
> b) hci_uart_write_work() via the work queue hu->write_work
> c) Data Link protocol layer retransmission timer such as bcsp_timed_event()
>
> In order to send a HCI command the Data Link protocol layer must be able to send
> and receive frames from the UART port that is physically connected to the UART
> based Bluetooth Radio Module. If this data "pipe" is broken then the Data Link
> protocol layer may perform retransmissions such as in the case of BCSP. In
> addition, a loss of communications causes the HCI command to not be ACK'ed so
> causes a HCI command timeout to occur.
>
> In the case of HCI_QUIRK_RESET_ON_CLOSE, the sending of the HCI RESET command
> has a 2 second timeout which will block __hci_req_sync() for up to 2 seconds.
>
> The hu->proto->dequeue() kernel crash occurs because hu->proto->close() was
> called which removes the resources needed by hu->proto->dequeue(). This causes
> a NULL pointer dereference kernel crash to occur when hu->proto->dequeue()
> executes.
>
> The reason why hu->proto->close() can run before or during hu->proto->dequeue()
> is because the call to cancel_work_sync(&hu->write_work) is ineffective. This is
> because hci_cmd_work() and BCSP bcsp_timed_event() can schedule work-items
> after the cancel_work_sync() completed.
>
> Note that flushing can corrupt any protocol serial frames being sent from the
> Data Link protocol layer to the UART driver and this is undesirable.
>
>
> Solution
> ========
>
> Reorganise hci_uart_tty_close() so that hci_unregister_dev() can run with no
> interference to the data "pipe" between the Data Link protocol layer such as
> BCSP and the UART driver.
>
> The patchset cleans up the HCI_UART_REGISTERED and HCI_UART_PROTO_READY flag
> handling so that:
>
> a) hdev is only valid when HCI_UART_REGISTERED is in the set state.
> b) the "proto" Data Link function pointers can only be used when
>     HCI_UART_PROTO_READY is in the set state.
>
> The most important patch is
> "Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races"
>
> This patch adds rwlocking around the clearing of the flag HCI_UART_PROTO_READY.
> This is because the atomic flag HCI_UART_PROTO_READY cannot always prevent a
> thread using a Data Link protocol layer function pointer during or after closure
> of the Data Link protocol layer. This can result in a kernel crash. Remember
> that the BCSP retransmission handling runs in a separate thread and tries to
> send a new frame. Therefore there is a small window of a race condition for the
> flag HCI_UART_PROTO_READY to be seen in the set state and then calls the
> "proto" function pointer after the Data Link protocol layer has been closed,
> resulting in a kernel crash.
>
>
> Overview of patches
> -------------------
>
> The patches were rebased onto the bluetooth-next/master branch commit:
> 3eed895 Bluetooth: L2CAP: Don't return -EAGAIN if out of credits
>
> Tidy-up patches (mainly for h5 binding error handling)
>    Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
>    Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
>    Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY
>
> Adds a comment to warn that hci_unregister_dev() may send a HCI RESET command
>    Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev()
>      call
>
> Simplifies the relationship between HCI_UART_REGISTERED and HCI_UART_PROTO_READY
>    Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close()
>
> Prevents "proto" functions being used immediately before and after
> hu->proto->close()
>    Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
>    Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
>    Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
>    Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races
>
> Prevents the "data" pipe being cut too early inside hci_uart_tty_close()
>    Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in
>      hci_uart_tty_close()
>    Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close()
>    Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg
>    Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync()
>    Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb
>    Bluetooth: hci_ldisc: Simplify flushing
>
> Prevent concurrency of hci_uart_tty_ioctl() with hci_uart_tty_close()
>    Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency
>
>
> Further work
> ============
>
> This patchset tackles the issues of hci_uart_tty_close().
>
> However, the TTY layer needs investigation because tty_set_ldisc() has a tty ref
> lock during the execution of hci_uart_tty_close() which:
>
> a) prevents flush_to_ldisc() from passing the received UART data to the Data
>     Link protocol layer.
> b) prevents hci_uart_tty_wakeup() from passing the Data Link protocol layer's
>     transmission data to the UART driver.
>
> Therefore, the TTY layer cuts the data "pipe" during the execution of
> hci_uart_tty_close() causing a loss of communications with the Bluetooth Radio
> Module.
>
> This patchset does not attempt to fix the TTY layer. This means that the
> HCI_QUIRK_RESET_ON_CLOSE will remain being unsuccessful in sending the HCI RESET
> command. In fact, all attempts at sending an HCI command during
> hci_uart_tty_close() will still fail.
>
>
> Dean Jenkins (16):
>    Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work()
>    Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev
>    Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY
>    Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev()
>      call
>    Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close()
>    Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame()
>    Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue()
>    Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup()
>    Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races
>    Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in
>      hci_uart_tty_close()
>    Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close()
>    Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg
>    Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync()
>    Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb
>    Bluetooth: hci_ldisc: Simplify flushing
>    Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency
>
>   drivers/bluetooth/hci_ldisc.c | 105 ++++++++++++++++++++++++++++++++----------
>   drivers/bluetooth/hci_uart.h  |   3 ++
>   2 files changed, 84 insertions(+), 24 deletions(-)
>

If you have not already noticed, I have sent you my V2 HCI UART LDISC 
patchset for you to look at.

I look forward to receiving your comments and feedback.

Thank you,

Best regards,
Dean

-- 
Dean Jenkins
Embedded Software Engineer
Linux Transportation Solutions
Mentor Embedded Software Division
Mentor Graphics (UK) Ltd.

^ permalink raw reply	[flat|nested] 18+ messages in thread

end of thread, other threads:[~2017-04-12  9:13 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-04-07 17:04 [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 01/16] Bluetooth: hci_ldisc: Add missing return in hci_uart_init_work() Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 02/16] Bluetooth: hci_ldisc: Ensure hu->hdev set to NULL before freeing hdev Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 03/16] Bluetooth: hci_ldisc: Add missing clear HCI_UART_PROTO_READY Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 04/16] Bluetooth: hci_ldisc: Add HCI RESET comment to hci_unregister_dev() call Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 05/16] Bluetooth: hci_ldisc: Separate flag handling in hci_uart_tty_close() Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 06/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_send_frame() Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 07/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_dequeue() Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 08/16] Bluetooth: hci_ldisc: Add protocol check to hci_uart_tx_wakeup() Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 09/16] Bluetooth: hci_ldisc: Use rwlocking to avoid closing proto races Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 10/16] Bluetooth: hci_ldisc: Tidy-up HCI_UART_REGISTERED in hci_uart_tty_close() Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 11/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move hci_uart_close() Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 12/16] Bluetooth: hci_ldisc: hci_uart_tty_close() detach tty after last msg Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 13/16] Bluetooth: hci_ldisc: hci_uart_tty_close() move cancel_work_sync() Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 14/16] Bluetooth: hci_ldisc: hci_uart_tty_close() free hu->tx_skb Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 15/16] Bluetooth: hci_ldisc: Simplify flushing Dean Jenkins
2017-04-07 17:04 ` [PATCH V2 16/16] Bluetooth: hci_ldisc: Add ioctl_mutex avoiding concurrency Dean Jenkins
2017-04-12  9:13 ` [PATCH V2 00/16] hci_ldisc hci_uart_tty_close() fixes Dean Jenkins

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.