Hi Marcel, On 03/04/17 16:51, Marcel Holtmann wrote: > > If this is an issue in 4.10, then lets get this fixed / hardened. > I have created a simple test script that leads to a kernel crash in HCI UART LDISC. Please note that this is a different scenario to the one documented in my previous E-mails but the resulting crash is similar. I am using a BCSP supported Bluetooth Radio Module connected via a USB to serial interface to a 64-bit x86 Linux laptop. I built kernel v4.10.5 although I messed up the .config settings so uname -r gave a strange string of4.10.54.10.5_debug+, sigh Please note that "btattach" fails to establish a BCSP connection with the Bluetooth Radio Module but that is advantageous in exposing a fatal race condition in hci_uart_tty_close(). In this case, the BCSP Data Link protocol layer attempts to retransmit every 250ms and the BCSP layer gets closed down whilst still attempting to retransmit which causes a kernel crash. My proposed patchset fixes this race condition plus some other issues. Please refer to my attached tarball v4.10.5_bt_crash_info.tgz, the contents are: v4.10.5_pure_testing/ v4.10.5_pure_testing/0001-Bluetooth-Debug-shows-how-hci_uart_tty_close-is-call.patch v4.10.5_pure_testing/v4.10.5_pure_testing/btattach_loop.sh v4.10.5_pure_testing/enable_hci_bt_logging.sh v4.10.5_pure_testing/bt_hci_uart_ldisc_crash_snippet_log.txt The patch file 0001-Bluetooth-Debug-shows-how-hci_uart_tty_close-is-call.patch is just adding debug so you can see that killing "btattach" causes do_group_exit() to run in the kernel which cleans-up by using tty_ldisc_kill() which calls tty_ldisc_close() which causes hci_uart_tty_close() to run. enable_hci_bt_logging.sh enables dynamic debug for dmesg although dynamic debug seems unreliable these days at least for me and I have to run enable_hci_bt_logging.sh multiple times to get all the files logging properly. bt_hci_uart_ldisc_crash_snippet_log.txt This shows my analysis of 2 example NULL pointer dereference kernel crashes which are easy to reproduce using my test script. v4.10.5_pure_testing/btattach_loop.sh This is the simple test script which causes a kernel crash within 10 minutes of running, it is easily repeatable with my equipment. I think the kernel crash occurs before the loop counter reaches 100. The contents of the script is: #!/bin/bash let i=1 while [ true ] do echo "loop $i" ./btattach -A /dev/ttyUSB0 -P bcsp & sleep 5 echo "now killing..." killall btattach i=$(($i + 1)) done Example kernel crash backtrace: [ 1561.709737] BUG: unable to handle kernel NULL pointer dereference at 0000000000000044 [ 1561.709829] IP: _raw_spin_lock_irqsave+0x22/0x40 [ 1561.709862] PGD 0 [ 1561.709889] Oops: 0002 [#1] SMP [ 1561.709911] Modules linked in: ftdi_sio rfcomm hci_uart btqca xt_set ip_set_hash_ip nf_log_ipv6 ip_set nf_log_ipv4 nf_log_common xt_LOG ip6table_nat nf_nat_ipv6 xt_recent iptable_nat nf_nat_ipv4 ipt_REJECT nf_reject_ipv4 iptable_mangle iptable_raw nf_conntrack_ipv4 nf_defrag_ipv4 xt_comment ip6t_REJECT nf_reject_ipv6 xt_addrtype bridge stp llc xt_mark ip6table_mangle nf_nat_tftp nf_nat_snmp_basic nf_conntrack_snmp xt_tcpudp nf_nat_sip xt_CT nf_nat_pptp nf_nat_proto_gre nf_nat_irc ip6table_raw nf_nat_h323 xt_multiport nf_nat_ftp nf_nat_amanda nf_conntrack_ipv6 nf_defrag_ipv6 xt_conntrack nf_nat nf_conntrack_tftp nf_conntrack_sip nf_conntrack_sane nf_conntrack_pptp nf_conntrack_proto_gre nf_conntrack_netlink nfnetlink nf_conntrack_netbios_ns nf_conntrack_broadcast nf_conntrack_irc nf_conntrack_h323 [ 1561.710172] nf_conntrack_ftp ts_kmp nf_conntrack_amanda nf_conntrack iptable_filter ip_tables ip6table_filter ip6_tables x_tables lockd grace ctr ccm af_packet bnep msr usbserial uvcvideo btusb btrtl arc4 brcmsmac btbcm videobuf2_vmalloc btintel cordic videobuf2_memops bluetooth intel_powerclamp brcmutil videobuf2_v4l2 videobuf2_core mac80211 videodev coretemp kvm_intel kvm joydev media cfg80211 iTCO_wdt iTCO_vendor_support snd_hda_codec_hdmi snd_hda_codec_realtek psmouse snd_hda_codec_generic toshiba_acpi input_leds bcma snd_hda_intel irqbypass snd_hda_codec toshiba_bluetooth cpufreq_ondemand sparse_keymap crc32c_intel cpufreq_conservative industrialio snd_hda_core serio_raw wmi rfkill thermal cpufreq_powersave battery ac snd_hwdep snd_pcm acpi_cpufreq snd_timer snd toshiba_haps mei_me mei e1000e [ 1561.718028] fjes soundcore evdev ptp lpc_ich shpchp pps_core intel_ips tpm_tis tpm_tis_core nvram tpm sch_fq_codel sunrpc ipv6 crc_ccitt autofs4 hid_generic usbhid hid sdhci_pci mmc_block sdhci sr_mod ehci_pci ehci_hcd mmc_core usbcore usb_common i915 video button i2c_algo_bit drm_kms_helper drm [last unloaded: ftdi_sio] [ 1561.721317] CPU: 1 PID: 4473 Comm: kworker/1:0 Not tainted 4.10.54.10.5_debug+ #12 [ 1561.722981] Hardware name: TOSHIBA Satellite R630/Portable PC, BIOS Version 1.40 07/20/2010 [ 1561.724660] Workqueue: events hci_uart_write_work [hci_uart] [ 1561.726377] task: ffff98d9b4f15100 task.stack: ffffa868c1178000 [ 1561.728568] RIP: 0010:_raw_spin_lock_irqsave+0x22/0x40 [ 1561.730355] RSP: 0000:ffffa868c117bda8 EFLAGS: 00010046 [ 1561.731993] RAX: 0000000000000000 RBX: 0000000000000296 RCX: 0000000000079ad6 [ 1561.733646] RDX: 0000000000000001 RSI: ffff98da77c5c580 RDI: 0000000000000044 [ 1561.735314] RBP: ffffa868c117bdb0 R08: 000000000001c5a0 R09: ffffffff9965a54a [ 1561.736991] R10: fffff48441dc9a80 R11: ffff98da73403700 R12: 0000000000000030 [ 1561.738641] R13: 0000000000000044 R14: 0000000000000030 R15: ffff98d9b4e50000 [ 1561.740288] FS: 0000000000000000(0000) GS:ffff98da77c40000(0000) knlGS:0000000000000000 [ 1561.741940] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1561.743585] CR2: 0000000000000044 CR3: 0000000128d1a000 CR4: 00000000000006e0 [ 1561.745238] Call Trace: [ 1561.746882] skb_dequeue+0x1d/0x70 [ 1561.748512] bcsp_dequeue+0x27/0x180 [hci_uart] [ 1561.750130] ? kfree_skbmem+0x5a/0x60 [ 1561.751742] hci_uart_write_work+0xc4/0x100 [hci_uart] [ 1561.753356] process_one_work+0x151/0x410 [ 1561.754930] worker_thread+0x69/0x4b0 [ 1561.756507] kthread+0x114/0x150 [ 1561.758076] ? rescuer_thread+0x340/0x340 [ 1561.759636] ? kthread_park+0x90/0x90 [ 1561.761191] ret_from_fork+0x2c/0x40 [ 1561.762693] Code: 89 e5 e8 82 96 98 ff 5d c3 66 66 66 66 90 55 48 89 e5 53 9c 58 66 66 90 66 90 48 89 c3 fa 66 66 90 66 66 90 31 c0 ba 01 00 00 00 0f b1 17 85 c0 75 06 48 89 d8 5b 5d c3 89 c6 e8 29 83 98 ff [ 1561.766034] RIP: _raw_spin_lock_irqsave+0x22/0x40 RSP: ffffa868c117bda8 [ 1561.768033] CR2: 0000000000000044 [ 1561.774555] ---[ end trace 51cc1d3575c0e559 ]--- Please see my analysis in bt_hci_uart_ldisc_crash_snippet_log.txt With my patchset applied, this crash appears to no longer occur. If I manage to produce some more useful results then I will post them. Thanks, Best regards, Dean -- Dean Jenkins Embedded Software Engineer Linux Transportation Solutions Mentor Embedded Software Division Mentor Graphics (UK) Ltd.