linux-bluetooth.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH v1 0/2] handle USB endpoint race condition
@ 2020-06-27 10:54 Archie Pusaka
  2020-06-27 10:54 ` [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found Archie Pusaka
  2020-06-27 10:54 ` [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed Archie Pusaka
  0 siblings, 2 replies; 15+ messages in thread
From: Archie Pusaka @ 2020-06-27 10:54 UTC (permalink / raw)
  To: linux-bluetooth, Marcel Holtmann
  Cc: chromeos-bluetooth-upstreaming, Archie Pusaka, David S. Miller,
	Jakub Kicinski, Johan Hedberg, linux-kernel, netdev

From: Archie Pusaka <apusaka@chromium.org>

On platforms with USB transport, events and data are transferred via
different endpoints. This could potentially cause ordering problems,
where the order of processed information is different than the actual
order.

One such a case is we receive a ACL packet before receiving the
connect complete event. This could be frequently observed on ChromeOS
auto test setup.

> ACL Data RX: Handle 256 flags 0x02 dlen 12       #6 [hci0] 9.314240
      L2CAP: Connection Request (0x02) ident 1 len 4
        PSM: 1 (0x0001)
        Source CID: 1536
> HCI Event: Connect Complete (0x03) plen 11       #7 [hci0] 9.333236
        Status: Success (0x00)
        Handle: 256
        Address: 70:BF:92:CF:95:B7 (OUI 70-BF-92)
        Link type: ACL (0x01)
        Encryption: Disabled (0x00)

If this happens, we will simply discard the ACL packet, since no
corresponding handle is registered yet. The situation where the ACL
packet is not replied by us is handled differently by each peripheral:
some will resend the ACL packet (which will be successful, since the
handle can be found now), but some will get upset and disconnect.

To prevent this from happening, we propose to introduce a queue. So,
instead of discarding those packets without handle, we put them in a
queue. Then, we will process this queue when we receive a connect
completion event for the matching handle. The queued packets will be
cleared if after 2 seconds we still don't receive the corresponding
connect complete event.

Similarly, there is a chance that the incoming L2CAP connection
request comes before HCI encryption change event. When this happens,
the incoming connection will be refused.
> ACL Data RX: Handle 256 flags 0x02 dlen 12      #46 [hci0] 9.275297
      L2CAP: Connection Request (0x02) ident 4 len 4
        PSM: 3 (0x0003)
        Source CID: 256
< ACL Data TX: Handle 256 flags 0x00 dlen 16      #47 [hci0] 9.275393
      L2CAP: Connection Response (0x03) ident 4 len 8
        Destination CID: 0
        Source CID: 256
        Result: Connection refused - security block (0x0003)
        Status: No further information available (0x0000)
> HCI Event: Encryption Change (0x08) plen 4      #51 [hci0] 9.314109
        Status: Success (0x00)
        Handle: 256
        Encryption: Enabled with E0 (0x01)

We propose to handle this case with a similar solution: queuing the
L2CAP connection request until we receive HCI encryption change.

The proposed patch is tested with a special Intel firmware which
allows us to purposely delay the "connect complete" event.

We plan to merge this change to ChromeOS auto test setup to see
whether this could fix the problem there, but we also would like
to have your input on this in the meantime.

Thanks,
Archie


Archie Pusaka (2):
  Bluetooth: queue ACL packets if no handle is found
  Bluetooth: queue L2CAP conn req if encryption is needed

 include/net/bluetooth/bluetooth.h |  6 +++
 include/net/bluetooth/hci_core.h  |  8 +++
 include/net/bluetooth/l2cap.h     |  6 +++
 net/bluetooth/hci_core.c          | 84 ++++++++++++++++++++++++++---
 net/bluetooth/hci_event.c         |  5 ++
 net/bluetooth/l2cap_core.c        | 87 +++++++++++++++++++++++++++----
 6 files changed, 179 insertions(+), 17 deletions(-)

-- 
2.27.0.212.ge8ba1cc988-goog


^ permalink raw reply	[flat|nested] 15+ messages in thread

* [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found
  2020-06-27 10:54 [RFC PATCH v1 0/2] handle USB endpoint race condition Archie Pusaka
@ 2020-06-27 10:54 ` Archie Pusaka
  2020-06-29  6:40   ` Marcel Holtmann
  2020-06-27 10:54 ` [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed Archie Pusaka
  1 sibling, 1 reply; 15+ messages in thread
From: Archie Pusaka @ 2020-06-27 10:54 UTC (permalink / raw)
  To: linux-bluetooth, Marcel Holtmann
  Cc: chromeos-bluetooth-upstreaming, Archie Pusaka,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, linux-kernel, netdev

From: Archie Pusaka <apusaka@chromium.org>

There is a possibility that an ACL packet is received before we
receive the HCI connect event for the corresponding handle. If this
happens, we discard the ACL packet.

Rather than just ignoring them, this patch provides a queue for
incoming ACL packet without a handle. The queue is processed when
receiving a HCI connection event. If 2 seconds elapsed without
receiving the HCI connection event, assume something bad happened
and discard the queued packet.

Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>

---

 include/net/bluetooth/hci_core.h |  8 +++
 net/bluetooth/hci_core.c         | 84 +++++++++++++++++++++++++++++---
 net/bluetooth/hci_event.c        |  2 +
 3 files changed, 88 insertions(+), 6 deletions(-)

diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
index 836dc997ff94..b69ecdd0d15a 100644
--- a/include/net/bluetooth/hci_core.h
+++ b/include/net/bluetooth/hci_core.h
@@ -270,6 +270,9 @@ struct adv_monitor {
 /* Default authenticated payload timeout 30s */
 #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
 
+/* Time to keep ACL packets without a corresponding handle queued (2s) */
+#define PENDING_ACL_TIMEOUT		msecs_to_jiffies(2000)
+
 struct amp_assoc {
 	__u16	len;
 	__u16	offset;
@@ -538,6 +541,9 @@ struct hci_dev {
 	struct delayed_work	rpa_expired;
 	bdaddr_t		rpa;
 
+	struct delayed_work	remove_pending_acl;
+	struct sk_buff_head	pending_acl_q;
+
 #if IS_ENABLED(CONFIG_BT_LEDS)
 	struct led_trigger	*power_led;
 #endif
@@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 ediv, __le64 rand,
 void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
 			       u8 *bdaddr_type);
 
+void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
+
 #define SCO_AIRMODE_MASK       0x0003
 #define SCO_AIRMODE_CVSD       0x0000
 #define SCO_AIRMODE_TRANSP     0x0003
diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
index 7959b851cc63..30780242c267 100644
--- a/net/bluetooth/hci_core.c
+++ b/net/bluetooth/hci_core.c
@@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
 	skb_queue_purge(&hdev->rx_q);
 	skb_queue_purge(&hdev->cmd_q);
 	skb_queue_purge(&hdev->raw_q);
+	skb_queue_purge(&hdev->pending_acl_q);
 
 	/* Drop last sent command */
 	if (hdev->sent_cmd) {
@@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct notifier_block *nb, unsigned long action,
 	return NOTIFY_STOP;
 }
 
+static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
+{
+	skb_queue_tail(&hdev->pending_acl_q, skb);
+
+	queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
+			   PENDING_ACL_TIMEOUT);
+}
+
+void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
+{
+	struct sk_buff *skb, *tmp;
+	struct hci_acl_hdr *hdr;
+	u16 handle, flags;
+	bool reset_timer = false;
+
+	skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
+		hdr = (struct hci_acl_hdr *)skb->data;
+		handle = __le16_to_cpu(hdr->handle);
+		flags  = hci_flags(handle);
+		handle = hci_handle(handle);
+
+		if (handle != conn->handle)
+			continue;
+
+		__skb_unlink(skb, &hdev->pending_acl_q);
+		skb_pull(skb, HCI_ACL_HDR_SIZE);
+
+		l2cap_recv_acldata(conn, skb, flags);
+		reset_timer = true;
+	}
+
+	if (reset_timer)
+		mod_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
+				 PENDING_ACL_TIMEOUT);
+}
+
+/* Remove the oldest pending ACL, and all pending ACLs with the same handle */
+static void hci_remove_pending_acl(struct work_struct *work)
+{
+	struct hci_dev *hdev;
+	struct sk_buff *skb, *tmp;
+	struct hci_acl_hdr *hdr;
+	u16 handle, oldest_handle;
+
+	hdev = container_of(work, struct hci_dev, remove_pending_acl.work);
+	skb = skb_dequeue(&hdev->pending_acl_q);
+
+	if (!skb)
+		return;
+
+	hdr = (struct hci_acl_hdr *)skb->data;
+	oldest_handle = hci_handle(__le16_to_cpu(hdr->handle));
+	kfree_skb(skb);
+
+	bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
+		   oldest_handle);
+
+	skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
+		hdr = (struct hci_acl_hdr *)skb->data;
+		handle = hci_handle(__le16_to_cpu(hdr->handle));
+
+		if (handle == oldest_handle) {
+			__skb_unlink(skb, &hdev->pending_acl_q);
+			kfree_skb(skb);
+		}
+	}
+
+	if (!skb_queue_empty(&hdev->pending_acl_q))
+		queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
+				   PENDING_ACL_TIMEOUT);
+}
+
 /* Alloc HCI device */
 struct hci_dev *hci_alloc_dev(void)
 {
@@ -3610,10 +3683,12 @@ struct hci_dev *hci_alloc_dev(void)
 	INIT_WORK(&hdev->suspend_prepare, hci_prepare_suspend);
 
 	INIT_DELAYED_WORK(&hdev->power_off, hci_power_off);
+	INIT_DELAYED_WORK(&hdev->remove_pending_acl, hci_remove_pending_acl);
 
 	skb_queue_head_init(&hdev->rx_q);
 	skb_queue_head_init(&hdev->cmd_q);
 	skb_queue_head_init(&hdev->raw_q);
+	skb_queue_head_init(&hdev->pending_acl_q);
 
 	init_waitqueue_head(&hdev->req_wait_q);
 	init_waitqueue_head(&hdev->suspend_wait_q);
@@ -4662,8 +4737,6 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb)
 	struct hci_conn *conn;
 	__u16 handle, flags;
 
-	skb_pull(skb, HCI_ACL_HDR_SIZE);
-
 	handle = __le16_to_cpu(hdr->handle);
 	flags  = hci_flags(handle);
 	handle = hci_handle(handle);
@@ -4678,17 +4751,16 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb)
 	hci_dev_unlock(hdev);
 
 	if (conn) {
+		skb_pull(skb, HCI_ACL_HDR_SIZE);
+
 		hci_conn_enter_active_mode(conn, BT_POWER_FORCE_ACTIVE_OFF);
 
 		/* Send to upper protocol */
 		l2cap_recv_acldata(conn, skb, flags);
 		return;
 	} else {
-		bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
-			   handle);
+		hci_add_pending_acl(hdev, skb);
 	}
-
-	kfree_skb(skb);
 }
 
 /* SCO data packet */
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index e060fc9ebb18..108c6c102a6a 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -2627,6 +2627,8 @@ static void hci_conn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
 			hci_send_cmd(hdev, HCI_OP_CHANGE_CONN_PTYPE, sizeof(cp),
 				     &cp);
 		}
+
+		hci_process_pending_acl(hdev, conn);
 	} else {
 		conn->state = BT_CLOSED;
 		if (conn->type == ACL_LINK)
-- 
2.27.0.212.ge8ba1cc988-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed
  2020-06-27 10:54 [RFC PATCH v1 0/2] handle USB endpoint race condition Archie Pusaka
  2020-06-27 10:54 ` [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found Archie Pusaka
@ 2020-06-27 10:54 ` Archie Pusaka
  2020-06-29  6:49   ` Marcel Holtmann
  1 sibling, 1 reply; 15+ messages in thread
From: Archie Pusaka @ 2020-06-27 10:54 UTC (permalink / raw)
  To: linux-bluetooth, Marcel Holtmann
  Cc: chromeos-bluetooth-upstreaming, Archie Pusaka,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, linux-kernel, netdev

From: Archie Pusaka <apusaka@chromium.org>

It is possible to receive an L2CAP conn req for an encrypted
connection, before actually receiving the HCI change encryption
event. If this happened, the received L2CAP packet will be ignored.

This patch queues the L2CAP packet and process them after the
expected HCI event is received. If after 2 seconds we still don't
receive it, then we assume something bad happened and discard the
queued packets.

Signed-off-by: Archie Pusaka <apusaka@chromium.org>
Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>

---

 include/net/bluetooth/bluetooth.h |  6 +++
 include/net/bluetooth/l2cap.h     |  6 +++
 net/bluetooth/hci_event.c         |  3 ++
 net/bluetooth/l2cap_core.c        | 87 +++++++++++++++++++++++++++----
 4 files changed, 91 insertions(+), 11 deletions(-)

diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
index 7ee8041af803..e64278401084 100644
--- a/include/net/bluetooth/bluetooth.h
+++ b/include/net/bluetooth/bluetooth.h
@@ -335,7 +335,11 @@ struct l2cap_ctrl {
 	u16	reqseq;
 	u16	txseq;
 	u8	retries;
+	u8	rsp_code;
+	u8	amp_id;
+	__u8	ident;
 	__le16  psm;
+	__le16	scid;
 	bdaddr_t bdaddr;
 	struct l2cap_chan *chan;
 };
@@ -374,6 +378,8 @@ struct bt_skb_cb {
 		struct hci_ctrl hci;
 	};
 };
+static_assert(sizeof(struct bt_skb_cb) <= sizeof(((struct sk_buff *)0)->cb));
+
 #define bt_cb(skb) ((struct bt_skb_cb *)((skb)->cb))
 
 #define hci_skb_pkt_type(skb) bt_cb((skb))->pkt_type
diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
index 8f1e6a7a2df8..f8f6dec96f12 100644
--- a/include/net/bluetooth/l2cap.h
+++ b/include/net/bluetooth/l2cap.h
@@ -58,6 +58,7 @@
 #define L2CAP_MOVE_ERTX_TIMEOUT		msecs_to_jiffies(60000)
 #define L2CAP_WAIT_ACK_POLL_PERIOD	msecs_to_jiffies(200)
 #define L2CAP_WAIT_ACK_TIMEOUT		msecs_to_jiffies(10000)
+#define L2CAP_PEND_ENC_CONN_TIMEOUT	msecs_to_jiffies(2000)
 
 #define L2CAP_A2MP_DEFAULT_MTU		670
 
@@ -700,6 +701,9 @@ struct l2cap_conn {
 	struct mutex		chan_lock;
 	struct kref		ref;
 	struct list_head	users;
+
+	struct delayed_work	remove_pending_encrypt_conn;
+	struct sk_buff_head	pending_conn_q;
 };
 
 struct l2cap_user {
@@ -1001,4 +1005,6 @@ void l2cap_conn_put(struct l2cap_conn *conn);
 int l2cap_register_user(struct l2cap_conn *conn, struct l2cap_user *user);
 void l2cap_unregister_user(struct l2cap_conn *conn, struct l2cap_user *user);
 
+void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon);
+
 #endif /* __L2CAP_H */
diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
index 108c6c102a6a..8cefc51a5ca4 100644
--- a/net/bluetooth/hci_event.c
+++ b/net/bluetooth/hci_event.c
@@ -3136,6 +3136,9 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, struct sk_buff *skb)
 
 unlock:
 	hci_dev_unlock(hdev);
+
+	if (conn && !ev->status && ev->encrypt)
+		l2cap_process_pending_encrypt_conn(conn);
 }
 
 static void hci_change_link_key_complete_evt(struct hci_dev *hdev,
diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
index 35d2bc569a2d..fc6fe2c80c46 100644
--- a/net/bluetooth/l2cap_core.c
+++ b/net/bluetooth/l2cap_core.c
@@ -62,6 +62,10 @@ static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err);
 static void l2cap_tx(struct l2cap_chan *chan, struct l2cap_ctrl *control,
 		     struct sk_buff_head *skbs, u8 event);
 
+static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
+					u8 ident, u8 *data, u8 rsp_code,
+					u8 amp_id, bool queue_if_fail);
+
 static inline u8 bdaddr_type(u8 link_type, u8 bdaddr_type)
 {
 	if (link_type == LE_LINK) {
@@ -1902,6 +1906,8 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err)
 	if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)
 		cancel_delayed_work_sync(&conn->info_timer);
 
+	cancel_delayed_work_sync(&conn->remove_pending_encrypt_conn);
+
 	hcon->l2cap_data = NULL;
 	conn->hchan = NULL;
 	l2cap_conn_put(conn);
@@ -2023,6 +2029,55 @@ static void l2cap_retrans_timeout(struct work_struct *work)
 	l2cap_chan_put(chan);
 }
 
+static void l2cap_add_pending_encrypt_conn(struct l2cap_conn *conn,
+					   struct l2cap_conn_req *req,
+					   u8 ident, u8 rsp_code, u8 amp_id)
+{
+	struct sk_buff *skb = bt_skb_alloc(0, GFP_KERNEL);
+
+	bt_cb(skb)->l2cap.psm = req->psm;
+	bt_cb(skb)->l2cap.scid = req->scid;
+	bt_cb(skb)->l2cap.ident = ident;
+	bt_cb(skb)->l2cap.rsp_code = rsp_code;
+	bt_cb(skb)->l2cap.amp_id = amp_id;
+
+	skb_queue_tail(&conn->pending_conn_q, skb);
+	queue_delayed_work(conn->hcon->hdev->workqueue,
+			   &conn->remove_pending_encrypt_conn,
+			   L2CAP_PEND_ENC_CONN_TIMEOUT);
+}
+
+void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon)
+{
+	struct sk_buff *skb;
+	struct l2cap_conn *conn = hcon->l2cap_data;
+
+	if (!conn)
+		return;
+
+	while ((skb = skb_dequeue(&conn->pending_conn_q))) {
+		struct l2cap_conn_req req;
+		u8 ident, rsp_code, amp_id;
+
+		req.psm = bt_cb(skb)->l2cap.psm;
+		req.scid = bt_cb(skb)->l2cap.scid;
+		ident = bt_cb(skb)->l2cap.ident;
+		rsp_code = bt_cb(skb)->l2cap.rsp_code;
+		amp_id = bt_cb(skb)->l2cap.amp_id;
+
+		l2cap_connect(conn, ident, (u8 *)&req, rsp_code, amp_id, false);
+		kfree_skb(skb);
+	}
+}
+
+static void l2cap_remove_pending_encrypt_conn(struct work_struct *work)
+{
+	struct l2cap_conn *conn = container_of(work, struct l2cap_conn,
+					    remove_pending_encrypt_conn.work);
+
+	l2cap_process_pending_encrypt_conn(conn->hcon);
+}
+
 static void l2cap_streaming_send(struct l2cap_chan *chan,
 				 struct sk_buff_head *skbs)
 {
@@ -4076,8 +4131,8 @@ static inline int l2cap_command_rej(struct l2cap_conn *conn,
 }
 
 static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
-					struct l2cap_cmd_hdr *cmd,
-					u8 *data, u8 rsp_code, u8 amp_id)
+					u8 ident, u8 *data, u8 rsp_code,
+					u8 amp_id, bool queue_if_fail)
 {
 	struct l2cap_conn_req *req = (struct l2cap_conn_req *) data;
 	struct l2cap_conn_rsp rsp;
@@ -4103,8 +4158,15 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
 	/* Check if the ACL is secure enough (if not SDP) */
 	if (psm != cpu_to_le16(L2CAP_PSM_SDP) &&
 	    !hci_conn_check_link_mode(conn->hcon)) {
-		conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
-		result = L2CAP_CR_SEC_BLOCK;
+		if (!queue_if_fail) {
+			conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
+			result = L2CAP_CR_SEC_BLOCK;
+			goto response;
+		}
+
+		l2cap_add_pending_encrypt_conn(conn, req, ident, rsp_code,
+					       amp_id);
+		result = L2CAP_CR_PEND;
 		goto response;
 	}
 
@@ -4147,7 +4209,7 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
 
 	__set_chan_timer(chan, chan->ops->get_sndtimeo(chan));
 
-	chan->ident = cmd->ident;
+	chan->ident = ident;
 
 	if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_DONE) {
 		if (l2cap_chan_check_security(chan, false)) {
@@ -4191,7 +4253,7 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
 	rsp.dcid   = cpu_to_le16(dcid);
 	rsp.result = cpu_to_le16(result);
 	rsp.status = cpu_to_le16(status);
-	l2cap_send_cmd(conn, cmd->ident, rsp_code, sizeof(rsp), &rsp);
+	l2cap_send_cmd(conn, ident, rsp_code, sizeof(rsp), &rsp);
 
 	if (result == L2CAP_CR_PEND && status == L2CAP_CS_NO_INFO) {
 		struct l2cap_info_req info;
@@ -4233,7 +4295,7 @@ static int l2cap_connect_req(struct l2cap_conn *conn,
 		mgmt_device_connected(hdev, hcon, 0, NULL, 0);
 	hci_dev_unlock(hdev);
 
-	l2cap_connect(conn, cmd, data, L2CAP_CONN_RSP, 0);
+	l2cap_connect(conn, cmd->ident, data, L2CAP_CONN_RSP, 0, true);
 	return 0;
 }
 
@@ -4802,8 +4864,8 @@ static int l2cap_create_channel_req(struct l2cap_conn *conn,
 
 	/* For controller id 0 make BR/EDR connection */
 	if (req->amp_id == AMP_ID_BREDR) {
-		l2cap_connect(conn, cmd, data, L2CAP_CREATE_CHAN_RSP,
-			      req->amp_id);
+		l2cap_connect(conn, cmd->ident, data, L2CAP_CREATE_CHAN_RSP,
+			      req->amp_id, true);
 		return 0;
 	}
 
@@ -4817,8 +4879,8 @@ static int l2cap_create_channel_req(struct l2cap_conn *conn,
 		goto error;
 	}
 
-	chan = l2cap_connect(conn, cmd, data, L2CAP_CREATE_CHAN_RSP,
-			     req->amp_id);
+	chan = l2cap_connect(conn, cmd->ident, data, L2CAP_CREATE_CHAN_RSP,
+			     req->amp_id, true);
 	if (chan) {
 		struct amp_mgr *mgr = conn->hcon->amp_mgr;
 		struct hci_conn *hs_hcon;
@@ -7745,8 +7807,11 @@ static struct l2cap_conn *l2cap_conn_add(struct hci_conn *hcon)
 	INIT_LIST_HEAD(&conn->users);
 
 	INIT_DELAYED_WORK(&conn->info_timer, l2cap_info_timeout);
+	INIT_DELAYED_WORK(&conn->remove_pending_encrypt_conn,
+			  l2cap_remove_pending_encrypt_conn);
 
 	skb_queue_head_init(&conn->pending_rx);
+	skb_queue_head_init(&conn->pending_conn_q);
 	INIT_WORK(&conn->pending_rx_work, process_pending_rx);
 	INIT_WORK(&conn->id_addr_update_work, l2cap_conn_update_id_addr);
 
-- 
2.27.0.212.ge8ba1cc988-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found
  2020-06-27 10:54 ` [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found Archie Pusaka
@ 2020-06-29  6:40   ` Marcel Holtmann
  2020-06-30  6:48     ` Archie Pusaka
  0 siblings, 1 reply; 15+ messages in thread
From: Marcel Holtmann @ 2020-06-29  6:40 UTC (permalink / raw)
  To: Archie Pusaka
  Cc: linux-bluetooth, chromeos-bluetooth-upstreaming, Archie Pusaka,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, kernel list, netdev

Hi Archie,

> There is a possibility that an ACL packet is received before we
> receive the HCI connect event for the corresponding handle. If this
> happens, we discard the ACL packet.
> 
> Rather than just ignoring them, this patch provides a queue for
> incoming ACL packet without a handle. The queue is processed when
> receiving a HCI connection event. If 2 seconds elapsed without
> receiving the HCI connection event, assume something bad happened
> and discard the queued packet.
> 
> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>

so two things up front. I want to hide this behind a HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly if this kind of out-of-order happens on UART or SDIO transports, then something is obviously going wrong. I have no plan to fix up after a fully serialized transport.

Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this off by default. You can enable it via an experimental setting. The reason here is that we have to make it really hard and fail as often as possible so that hardware manufactures and spec writers realize that something is fundamentally broken here.

I have no problem in running the code and complaining loudly in case the quirk has been set. Just injecting the packets can only happen if bluetoothd explicitly enabled it.

> 
> ---
> 
> include/net/bluetooth/hci_core.h |  8 +++
> net/bluetooth/hci_core.c         | 84 +++++++++++++++++++++++++++++---
> net/bluetooth/hci_event.c        |  2 +
> 3 files changed, 88 insertions(+), 6 deletions(-)
> 
> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> index 836dc997ff94..b69ecdd0d15a 100644
> --- a/include/net/bluetooth/hci_core.h
> +++ b/include/net/bluetooth/hci_core.h
> @@ -270,6 +270,9 @@ struct adv_monitor {
> /* Default authenticated payload timeout 30s */
> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
> 
> +/* Time to keep ACL packets without a corresponding handle queued (2s) */
> +#define PENDING_ACL_TIMEOUT		msecs_to_jiffies(2000)
> +

Do we have some btmon traces with timestamps. Isn’t a second enough? Actually 2 seconds is an awful long time.

> struct amp_assoc {
> 	__u16	len;
> 	__u16	offset;
> @@ -538,6 +541,9 @@ struct hci_dev {
> 	struct delayed_work	rpa_expired;
> 	bdaddr_t		rpa;
> 
> +	struct delayed_work	remove_pending_acl;
> +	struct sk_buff_head	pending_acl_q;
> +

can we name this ooo_q and move it to the other queues in this struct. Unless we want to add a Kconfig option around it, we don’t need to keep it here.

> #if IS_ENABLED(CONFIG_BT_LEDS)
> 	struct led_trigger	*power_led;
> #endif
> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 ediv, __le64 rand,
> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
> 			       u8 *bdaddr_type);
> 
> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
> +
> #define SCO_AIRMODE_MASK       0x0003
> #define SCO_AIRMODE_CVSD       0x0000
> #define SCO_AIRMODE_TRANSP     0x0003
> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> index 7959b851cc63..30780242c267 100644
> --- a/net/bluetooth/hci_core.c
> +++ b/net/bluetooth/hci_core.c
> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
> 	skb_queue_purge(&hdev->rx_q);
> 	skb_queue_purge(&hdev->cmd_q);
> 	skb_queue_purge(&hdev->raw_q);
> +	skb_queue_purge(&hdev->pending_acl_q);
> 
> 	/* Drop last sent command */
> 	if (hdev->sent_cmd) {
> @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct notifier_block *nb, unsigned long action,
> 	return NOTIFY_STOP;
> }
> 
> +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
> +{
> +	skb_queue_tail(&hdev->pending_acl_q, skb);
> +
> +	queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> +			   PENDING_ACL_TIMEOUT);
> +}
> +
> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
> +{
> +	struct sk_buff *skb, *tmp;
> +	struct hci_acl_hdr *hdr;
> +	u16 handle, flags;
> +	bool reset_timer = false;
> +
> +	skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
> +		hdr = (struct hci_acl_hdr *)skb->data;
> +		handle = __le16_to_cpu(hdr->handle);
> +		flags  = hci_flags(handle);
> +		handle = hci_handle(handle);
> +
> +		if (handle != conn->handle)
> +			continue;
> +
> +		__skb_unlink(skb, &hdev->pending_acl_q);
> +		skb_pull(skb, HCI_ACL_HDR_SIZE);
> +
> +		l2cap_recv_acldata(conn, skb, flags);
> +		reset_timer = true;
> +	}
> +
> +	if (reset_timer)
> +		mod_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> +				 PENDING_ACL_TIMEOUT);
> +}
> +
> +/* Remove the oldest pending ACL, and all pending ACLs with the same handle */
> +static void hci_remove_pending_acl(struct work_struct *work)
> +{
> +	struct hci_dev *hdev;
> +	struct sk_buff *skb, *tmp;
> +	struct hci_acl_hdr *hdr;
> +	u16 handle, oldest_handle;
> +
> +	hdev = container_of(work, struct hci_dev, remove_pending_acl.work);
> +	skb = skb_dequeue(&hdev->pending_acl_q);
> +
> +	if (!skb)
> +		return;
> +
> +	hdr = (struct hci_acl_hdr *)skb->data;
> +	oldest_handle = hci_handle(__le16_to_cpu(hdr->handle));
> +	kfree_skb(skb);
> +
> +	bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
> +		   oldest_handle);
> +
> +	skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
> +		hdr = (struct hci_acl_hdr *)skb->data;
> +		handle = hci_handle(__le16_to_cpu(hdr->handle));
> +
> +		if (handle == oldest_handle) {
> +			__skb_unlink(skb, &hdev->pending_acl_q);
> +			kfree_skb(skb);
> +		}
> +	}
> +
> +	if (!skb_queue_empty(&hdev->pending_acl_q))
> +		queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> +				   PENDING_ACL_TIMEOUT);
> +}
> +

So I am wondering if we make this too complicated. Since generally speaking we can only have a single HCI connect complete anyway at a time. No matter if the controller serializes it for us or we do it for the controller. So hci_conn_add could just process the queue for packets with its handle and then flush it. And it can flush it no matter what since whatever other packets are in the queue, they can not be valid.

That said, we wouldn’t even need to check the packet handles at all. We just needed to flag them as already out-of-order queued once and hand them back into the rx_q at the top. Then the would be processed as usual. Already ooo packets would cause the same error as before if it is for a non-existing handle and others would end up being processed.

For me this means we just need another queue to park the packets until hci_conn_add gets called. I might have missed something, but I am looking for the least invasive option for this and least code duplication.

> /* Alloc HCI device */
> struct hci_dev *hci_alloc_dev(void)
> {
> @@ -3610,10 +3683,12 @@ struct hci_dev *hci_alloc_dev(void)
> 	INIT_WORK(&hdev->suspend_prepare, hci_prepare_suspend);
> 
> 	INIT_DELAYED_WORK(&hdev->power_off, hci_power_off);
> +	INIT_DELAYED_WORK(&hdev->remove_pending_acl, hci_remove_pending_acl);
> 
> 	skb_queue_head_init(&hdev->rx_q);
> 	skb_queue_head_init(&hdev->cmd_q);
> 	skb_queue_head_init(&hdev->raw_q);
> +	skb_queue_head_init(&hdev->pending_acl_q);
> 
> 	init_waitqueue_head(&hdev->req_wait_q);
> 	init_waitqueue_head(&hdev->suspend_wait_q);
> @@ -4662,8 +4737,6 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb)
> 	struct hci_conn *conn;
> 	__u16 handle, flags;
> 
> -	skb_pull(skb, HCI_ACL_HDR_SIZE);
> -
> 	handle = __le16_to_cpu(hdr->handle);
> 	flags  = hci_flags(handle);
> 	handle = hci_handle(handle);
> @@ -4678,17 +4751,16 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb)
> 	hci_dev_unlock(hdev);
> 
> 	if (conn) {
> +		skb_pull(skb, HCI_ACL_HDR_SIZE);
> +
> 		hci_conn_enter_active_mode(conn, BT_POWER_FORCE_ACTIVE_OFF);
> 
> 		/* Send to upper protocol */
> 		l2cap_recv_acldata(conn, skb, flags);
> 		return;
> 	} else {
> -		bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
> -			   handle);
> +		hci_add_pending_acl(hdev, skb);

So here I want to keep being verbose. If no quirk is set, then this has to stay as an error. In case the quirk is set, then this should still warn that we are queuing up a packet. It is not an expected behavior.

> 	}
> -
> -	kfree_skb(skb);
> }
> 
> /* SCO data packet */
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index e060fc9ebb18..108c6c102a6a 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -2627,6 +2627,8 @@ static void hci_conn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
> 			hci_send_cmd(hdev, HCI_OP_CHANGE_CONN_PTYPE, sizeof(cp),
> 				     &cp);
> 		}
> +
> +		hci_process_pending_acl(hdev, conn);

Can we just do this in hci_conn_add() when we create the connection object?
 
> 	} else {
> 		conn->state = BT_CLOSED;
> 		if (conn->type == ACL_LINK)
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 

Regards

Marcel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed
  2020-06-27 10:54 ` [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed Archie Pusaka
@ 2020-06-29  6:49   ` Marcel Holtmann
  2020-06-29 19:42     ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 15+ messages in thread
From: Marcel Holtmann @ 2020-06-29  6:49 UTC (permalink / raw)
  To: Archie Pusaka
  Cc: linux-bluetooth, chromeos-bluetooth-upstreaming, Archie Pusaka,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, linux-kernel, netdev

Hi Archie,

> It is possible to receive an L2CAP conn req for an encrypted
> connection, before actually receiving the HCI change encryption
> event. If this happened, the received L2CAP packet will be ignored.
> 
> This patch queues the L2CAP packet and process them after the
> expected HCI event is received. If after 2 seconds we still don't
> receive it, then we assume something bad happened and discard the
> queued packets.

as with the other patch, this should be behind the same quirk and experimental setting for exactly the same reasons.

> 
> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> 
> ---
> 
> include/net/bluetooth/bluetooth.h |  6 +++
> include/net/bluetooth/l2cap.h     |  6 +++
> net/bluetooth/hci_event.c         |  3 ++
> net/bluetooth/l2cap_core.c        | 87 +++++++++++++++++++++++++++----
> 4 files changed, 91 insertions(+), 11 deletions(-)
> 
> diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
> index 7ee8041af803..e64278401084 100644
> --- a/include/net/bluetooth/bluetooth.h
> +++ b/include/net/bluetooth/bluetooth.h
> @@ -335,7 +335,11 @@ struct l2cap_ctrl {
> 	u16	reqseq;
> 	u16	txseq;
> 	u8	retries;
> +	u8	rsp_code;
> +	u8	amp_id;
> +	__u8	ident;
> 	__le16  psm;
> +	__le16	scid;
> 	bdaddr_t bdaddr;
> 	struct l2cap_chan *chan;
> };

I would not bother trying to make this work with CREATE_CHAN_REQ. That is if you want to setup a L2CAP channel that can be moved between BR/EDR and AMP controllers and in that case you have to read the L2CAP information and features first. Meaning there will have been unencrypted ACL packets. This problem only exists if the remote side doesn’t request any version information first.

> @@ -374,6 +378,8 @@ struct bt_skb_cb {
> 		struct hci_ctrl hci;
> 	};
> };
> +static_assert(sizeof(struct bt_skb_cb) <= sizeof(((struct sk_buff *)0)->cb));
> +
> #define bt_cb(skb) ((struct bt_skb_cb *)((skb)->cb))
> 
> #define hci_skb_pkt_type(skb) bt_cb((skb))->pkt_type
> diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
> index 8f1e6a7a2df8..f8f6dec96f12 100644
> --- a/include/net/bluetooth/l2cap.h
> +++ b/include/net/bluetooth/l2cap.h
> @@ -58,6 +58,7 @@
> #define L2CAP_MOVE_ERTX_TIMEOUT		msecs_to_jiffies(60000)
> #define L2CAP_WAIT_ACK_POLL_PERIOD	msecs_to_jiffies(200)
> #define L2CAP_WAIT_ACK_TIMEOUT		msecs_to_jiffies(10000)
> +#define L2CAP_PEND_ENC_CONN_TIMEOUT	msecs_to_jiffies(2000)
> 
> #define L2CAP_A2MP_DEFAULT_MTU		670
> 
> @@ -700,6 +701,9 @@ struct l2cap_conn {
> 	struct mutex		chan_lock;
> 	struct kref		ref;
> 	struct list_head	users;
> +
> +	struct delayed_work	remove_pending_encrypt_conn;
> +	struct sk_buff_head	pending_conn_q;
> };
> 
> struct l2cap_user {
> @@ -1001,4 +1005,6 @@ void l2cap_conn_put(struct l2cap_conn *conn);
> int l2cap_register_user(struct l2cap_conn *conn, struct l2cap_user *user);
> void l2cap_unregister_user(struct l2cap_conn *conn, struct l2cap_user *user);
> 
> +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon);
> +
> #endif /* __L2CAP_H */
> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> index 108c6c102a6a..8cefc51a5ca4 100644
> --- a/net/bluetooth/hci_event.c
> +++ b/net/bluetooth/hci_event.c
> @@ -3136,6 +3136,9 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, struct sk_buff *skb)
> 
> unlock:
> 	hci_dev_unlock(hdev);
> +
> +	if (conn && !ev->status && ev->encrypt)
> +		l2cap_process_pending_encrypt_conn(conn);
> }
> 
> static void hci_change_link_key_complete_evt(struct hci_dev *hdev,
> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> index 35d2bc569a2d..fc6fe2c80c46 100644
> --- a/net/bluetooth/l2cap_core.c
> +++ b/net/bluetooth/l2cap_core.c
> @@ -62,6 +62,10 @@ static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err);
> static void l2cap_tx(struct l2cap_chan *chan, struct l2cap_ctrl *control,
> 		     struct sk_buff_head *skbs, u8 event);
> 
> +static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> +					u8 ident, u8 *data, u8 rsp_code,
> +					u8 amp_id, bool queue_if_fail);
> +
> static inline u8 bdaddr_type(u8 link_type, u8 bdaddr_type)
> {
> 	if (link_type == LE_LINK) {
> @@ -1902,6 +1906,8 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err)
> 	if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)
> 		cancel_delayed_work_sync(&conn->info_timer);
> 
> +	cancel_delayed_work_sync(&conn->remove_pending_encrypt_conn);
> +
> 	hcon->l2cap_data = NULL;
> 	conn->hchan = NULL;
> 	l2cap_conn_put(conn);
> @@ -2023,6 +2029,55 @@ static void l2cap_retrans_timeout(struct work_struct *work)
> 	l2cap_chan_put(chan);
> }
> 
> +static void l2cap_add_pending_encrypt_conn(struct l2cap_conn *conn,
> +					   struct l2cap_conn_req *req,
> +					   u8 ident, u8 rsp_code, u8 amp_id)
> +{
> +	struct sk_buff *skb = bt_skb_alloc(0, GFP_KERNEL);
> +
> +	bt_cb(skb)->l2cap.psm = req->psm;
> +	bt_cb(skb)->l2cap.scid = req->scid;
> +	bt_cb(skb)->l2cap.ident = ident;
> +	bt_cb(skb)->l2cap.rsp_code = rsp_code;
> +	bt_cb(skb)->l2cap.amp_id = amp_id;
> +
> +	skb_queue_tail(&conn->pending_conn_q, skb);
> +	queue_delayed_work(conn->hcon->hdev->workqueue,
> +			   &conn->remove_pending_encrypt_conn,
> +			   L2CAP_PEND_ENC_CONN_TIMEOUT);
> +}
> +
> +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon)
> +{
> +	struct sk_buff *skb;
> +	struct l2cap_conn *conn = hcon->l2cap_data;
> +
> +	if (!conn)
> +		return;
> +
> +	while ((skb = skb_dequeue(&conn->pending_conn_q))) {
> +		struct l2cap_conn_req req;
> +		u8 ident, rsp_code, amp_id;
> +
> +		req.psm = bt_cb(skb)->l2cap.psm;
> +		req.scid = bt_cb(skb)->l2cap.scid;
> +		ident = bt_cb(skb)->l2cap.ident;
> +		rsp_code = bt_cb(skb)->l2cap.rsp_code;
> +		amp_id = bt_cb(skb)->l2cap.amp_id;
> +
> +		l2cap_connect(conn, ident, (u8 *)&req, rsp_code, amp_id, false);
> +		kfree_skb(skb);
> +	}
> +}
> +
> +static void l2cap_remove_pending_encrypt_conn(struct work_struct *work)
> +{
> +	struct l2cap_conn *conn = container_of(work, struct l2cap_conn,
> +					    remove_pending_encrypt_conn.work);
> +
> +	l2cap_process_pending_encrypt_conn(conn->hcon);
> +}
> +
> static void l2cap_streaming_send(struct l2cap_chan *chan,
> 				 struct sk_buff_head *skbs)
> {
> @@ -4076,8 +4131,8 @@ static inline int l2cap_command_rej(struct l2cap_conn *conn,
> }
> 
> static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> -					struct l2cap_cmd_hdr *cmd,
> -					u8 *data, u8 rsp_code, u8 amp_id)
> +					u8 ident, u8 *data, u8 rsp_code,
> +					u8 amp_id, bool queue_if_fail)
> {
> 	struct l2cap_conn_req *req = (struct l2cap_conn_req *) data;
> 	struct l2cap_conn_rsp rsp;
> @@ -4103,8 +4158,15 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> 	/* Check if the ACL is secure enough (if not SDP) */
> 	if (psm != cpu_to_le16(L2CAP_PSM_SDP) &&
> 	    !hci_conn_check_link_mode(conn->hcon)) {
> -		conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
> -		result = L2CAP_CR_SEC_BLOCK;
> +		if (!queue_if_fail) {
> +			conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
> +			result = L2CAP_CR_SEC_BLOCK;
> +			goto response;
> +		}
> +
> +		l2cap_add_pending_encrypt_conn(conn, req, ident, rsp_code,
> +					       amp_id);
> +		result = L2CAP_CR_PEND;
> 		goto response;
> 	}

So I am actually wondering if the approach is not better to send back a pending to the connect request like we do for everything else. And then proceed with getting our remote L2CAP information. If these come back in encrypted, then we can assume that we actually had encryption enabled and proceed with a L2CAP connect response saying that all is fine.

That also means no queuing of packets is required.

Regards

Marcel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed
  2020-06-29  6:49   ` Marcel Holtmann
@ 2020-06-29 19:42     ` Luiz Augusto von Dentz
  2020-06-29 20:20       ` Marcel Holtmann
  0 siblings, 1 reply; 15+ messages in thread
From: Luiz Augusto von Dentz @ 2020-06-29 19:42 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Archie Pusaka, linux-bluetooth, chromeos-bluetooth-upstreaming,
	Archie Pusaka, Abhishek Pandit-Subedi, David S. Miller,
	Jakub Kicinski, Johan Hedberg, Linux Kernel Mailing List,
	open list:NETWORKING [GENERAL]

Hi Marcel, Archie,

On Mon, Jun 29, 2020 at 12:00 PM Marcel Holtmann <marcel@holtmann.org> wrote:
>
> Hi Archie,
>
> > It is possible to receive an L2CAP conn req for an encrypted
> > connection, before actually receiving the HCI change encryption
> > event. If this happened, the received L2CAP packet will be ignored.

How is this possible? Or you are referring to a race between the ACL
and Event endpoint where the Encryption Change is actually pending to
be processed but we end up processing the ACL data first.

> > This patch queues the L2CAP packet and process them after the
> > expected HCI event is received. If after 2 seconds we still don't
> > receive it, then we assume something bad happened and discard the
> > queued packets.
>
> as with the other patch, this should be behind the same quirk and experimental setting for exactly the same reasons.
>
> >
> > Signed-off-by: Archie Pusaka <apusaka@chromium.org>
> > Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> >
> > ---
> >
> > include/net/bluetooth/bluetooth.h |  6 +++
> > include/net/bluetooth/l2cap.h     |  6 +++
> > net/bluetooth/hci_event.c         |  3 ++
> > net/bluetooth/l2cap_core.c        | 87 +++++++++++++++++++++++++++----
> > 4 files changed, 91 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
> > index 7ee8041af803..e64278401084 100644
> > --- a/include/net/bluetooth/bluetooth.h
> > +++ b/include/net/bluetooth/bluetooth.h
> > @@ -335,7 +335,11 @@ struct l2cap_ctrl {
> >       u16     reqseq;
> >       u16     txseq;
> >       u8      retries;
> > +     u8      rsp_code;
> > +     u8      amp_id;
> > +     __u8    ident;
> >       __le16  psm;
> > +     __le16  scid;
> >       bdaddr_t bdaddr;
> >       struct l2cap_chan *chan;
> > };
>
> I would not bother trying to make this work with CREATE_CHAN_REQ. That is if you want to setup a L2CAP channel that can be moved between BR/EDR and AMP controllers and in that case you have to read the L2CAP information and features first. Meaning there will have been unencrypted ACL packets. This problem only exists if the remote side doesn’t request any version information first.
>
> > @@ -374,6 +378,8 @@ struct bt_skb_cb {
> >               struct hci_ctrl hci;
> >       };
> > };
> > +static_assert(sizeof(struct bt_skb_cb) <= sizeof(((struct sk_buff *)0)->cb));
> > +
> > #define bt_cb(skb) ((struct bt_skb_cb *)((skb)->cb))
> >
> > #define hci_skb_pkt_type(skb) bt_cb((skb))->pkt_type
> > diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
> > index 8f1e6a7a2df8..f8f6dec96f12 100644
> > --- a/include/net/bluetooth/l2cap.h
> > +++ b/include/net/bluetooth/l2cap.h
> > @@ -58,6 +58,7 @@
> > #define L2CAP_MOVE_ERTX_TIMEOUT               msecs_to_jiffies(60000)
> > #define L2CAP_WAIT_ACK_POLL_PERIOD    msecs_to_jiffies(200)
> > #define L2CAP_WAIT_ACK_TIMEOUT                msecs_to_jiffies(10000)
> > +#define L2CAP_PEND_ENC_CONN_TIMEOUT  msecs_to_jiffies(2000)
> >
> > #define L2CAP_A2MP_DEFAULT_MTU                670
> >
> > @@ -700,6 +701,9 @@ struct l2cap_conn {
> >       struct mutex            chan_lock;
> >       struct kref             ref;
> >       struct list_head        users;
> > +
> > +     struct delayed_work     remove_pending_encrypt_conn;
> > +     struct sk_buff_head     pending_conn_q;
> > };
> >
> > struct l2cap_user {
> > @@ -1001,4 +1005,6 @@ void l2cap_conn_put(struct l2cap_conn *conn);
> > int l2cap_register_user(struct l2cap_conn *conn, struct l2cap_user *user);
> > void l2cap_unregister_user(struct l2cap_conn *conn, struct l2cap_user *user);
> >
> > +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon);
> > +
> > #endif /* __L2CAP_H */
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index 108c6c102a6a..8cefc51a5ca4 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -3136,6 +3136,9 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, struct sk_buff *skb)
> >
> > unlock:
> >       hci_dev_unlock(hdev);
> > +
> > +     if (conn && !ev->status && ev->encrypt)
> > +             l2cap_process_pending_encrypt_conn(conn);
> > }
> >
> > static void hci_change_link_key_complete_evt(struct hci_dev *hdev,
> > diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> > index 35d2bc569a2d..fc6fe2c80c46 100644
> > --- a/net/bluetooth/l2cap_core.c
> > +++ b/net/bluetooth/l2cap_core.c
> > @@ -62,6 +62,10 @@ static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err);
> > static void l2cap_tx(struct l2cap_chan *chan, struct l2cap_ctrl *control,
> >                    struct sk_buff_head *skbs, u8 event);
> >
> > +static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> > +                                     u8 ident, u8 *data, u8 rsp_code,
> > +                                     u8 amp_id, bool queue_if_fail);
> > +
> > static inline u8 bdaddr_type(u8 link_type, u8 bdaddr_type)
> > {
> >       if (link_type == LE_LINK) {
> > @@ -1902,6 +1906,8 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err)
> >       if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)
> >               cancel_delayed_work_sync(&conn->info_timer);
> >
> > +     cancel_delayed_work_sync(&conn->remove_pending_encrypt_conn);
> > +
> >       hcon->l2cap_data = NULL;
> >       conn->hchan = NULL;
> >       l2cap_conn_put(conn);
> > @@ -2023,6 +2029,55 @@ static void l2cap_retrans_timeout(struct work_struct *work)
> >       l2cap_chan_put(chan);
> > }
> >
> > +static void l2cap_add_pending_encrypt_conn(struct l2cap_conn *conn,
> > +                                        struct l2cap_conn_req *req,
> > +                                        u8 ident, u8 rsp_code, u8 amp_id)
> > +{
> > +     struct sk_buff *skb = bt_skb_alloc(0, GFP_KERNEL);
> > +
> > +     bt_cb(skb)->l2cap.psm = req->psm;
> > +     bt_cb(skb)->l2cap.scid = req->scid;
> > +     bt_cb(skb)->l2cap.ident = ident;
> > +     bt_cb(skb)->l2cap.rsp_code = rsp_code;
> > +     bt_cb(skb)->l2cap.amp_id = amp_id;
> > +
> > +     skb_queue_tail(&conn->pending_conn_q, skb);
> > +     queue_delayed_work(conn->hcon->hdev->workqueue,
> > +                        &conn->remove_pending_encrypt_conn,
> > +                        L2CAP_PEND_ENC_CONN_TIMEOUT);
> > +}
> > +
> > +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon)
> > +{
> > +     struct sk_buff *skb;
> > +     struct l2cap_conn *conn = hcon->l2cap_data;
> > +
> > +     if (!conn)
> > +             return;
> > +
> > +     while ((skb = skb_dequeue(&conn->pending_conn_q))) {
> > +             struct l2cap_conn_req req;
> > +             u8 ident, rsp_code, amp_id;
> > +
> > +             req.psm = bt_cb(skb)->l2cap.psm;
> > +             req.scid = bt_cb(skb)->l2cap.scid;
> > +             ident = bt_cb(skb)->l2cap.ident;
> > +             rsp_code = bt_cb(skb)->l2cap.rsp_code;
> > +             amp_id = bt_cb(skb)->l2cap.amp_id;
> > +
> > +             l2cap_connect(conn, ident, (u8 *)&req, rsp_code, amp_id, false);
> > +             kfree_skb(skb);
> > +     }
> > +}
> > +
> > +static void l2cap_remove_pending_encrypt_conn(struct work_struct *work)
> > +{
> > +     struct l2cap_conn *conn = container_of(work, struct l2cap_conn,
> > +                                         remove_pending_encrypt_conn.work);
> > +
> > +     l2cap_process_pending_encrypt_conn(conn->hcon);
> > +}
> > +
> > static void l2cap_streaming_send(struct l2cap_chan *chan,
> >                                struct sk_buff_head *skbs)
> > {
> > @@ -4076,8 +4131,8 @@ static inline int l2cap_command_rej(struct l2cap_conn *conn,
> > }
> >
> > static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> > -                                     struct l2cap_cmd_hdr *cmd,
> > -                                     u8 *data, u8 rsp_code, u8 amp_id)
> > +                                     u8 ident, u8 *data, u8 rsp_code,
> > +                                     u8 amp_id, bool queue_if_fail)
> > {
> >       struct l2cap_conn_req *req = (struct l2cap_conn_req *) data;
> >       struct l2cap_conn_rsp rsp;
> > @@ -4103,8 +4158,15 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> >       /* Check if the ACL is secure enough (if not SDP) */
> >       if (psm != cpu_to_le16(L2CAP_PSM_SDP) &&
> >           !hci_conn_check_link_mode(conn->hcon)) {
> > -             conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
> > -             result = L2CAP_CR_SEC_BLOCK;
> > +             if (!queue_if_fail) {
> > +                     conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
> > +                     result = L2CAP_CR_SEC_BLOCK;
> > +                     goto response;
> > +             }
> > +
> > +             l2cap_add_pending_encrypt_conn(conn, req, ident, rsp_code,
> > +                                            amp_id);
> > +             result = L2CAP_CR_PEND;
> >               goto response;
> >       }
>
> So I am actually wondering if the approach is not better to send back a pending to the connect request like we do for everything else. And then proceed with getting our remote L2CAP information. If these come back in encrypted, then we can assume that we actually had encryption enabled and proceed with a L2CAP connect response saying that all is fine.

I wonder if we should resolve this by having different queues in
hci_recv_frame (e.g. hdev->evt_rx), that way we can dequeue the HCI
events before ACL so we first update the HCI states before start
processing the L2CAP data, thoughts? Something like this:

https://gist.github.com/Vudentz/464fb0065a73e5c99bdb66cd2c5a1a2d

> That also means no queuing of packets is required.
>
> Regards
>
> Marcel
>


-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed
  2020-06-29 19:42     ` Luiz Augusto von Dentz
@ 2020-06-29 20:20       ` Marcel Holtmann
  2020-06-29 22:44         ` Luiz Augusto von Dentz
  0 siblings, 1 reply; 15+ messages in thread
From: Marcel Holtmann @ 2020-06-29 20:20 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Archie Pusaka, linux-bluetooth, chromeos-bluetooth-upstreaming,
	Archie Pusaka, Abhishek Pandit-Subedi, David S. Miller,
	Jakub Kicinski, Johan Hedberg, Linux Kernel Mailing List,
	open list:NETWORKING [GENERAL]

Hi Luiz,

>>> It is possible to receive an L2CAP conn req for an encrypted
>>> connection, before actually receiving the HCI change encryption
>>> event. If this happened, the received L2CAP packet will be ignored.
> 
> How is this possible? Or you are referring to a race between the ACL
> and Event endpoint where the Encryption Change is actually pending to
> be processed but we end up processing the ACL data first.

you get the ACL packet with the L2CAP_Connect_Req in it and then the HCI Encryption Change event. However over the air they go in the different order. It is specific to the USB transport and nothing is going to fix this. The USB transport design is borked. You can only do bandaids.

>>> This patch queues the L2CAP packet and process them after the
>>> expected HCI event is received. If after 2 seconds we still don't
>>> receive it, then we assume something bad happened and discard the
>>> queued packets.
>> 
>> as with the other patch, this should be behind the same quirk and experimental setting for exactly the same reasons.
>> 
>>> 
>>> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
>>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
>>> 
>>> ---
>>> 
>>> include/net/bluetooth/bluetooth.h |  6 +++
>>> include/net/bluetooth/l2cap.h     |  6 +++
>>> net/bluetooth/hci_event.c         |  3 ++
>>> net/bluetooth/l2cap_core.c        | 87 +++++++++++++++++++++++++++----
>>> 4 files changed, 91 insertions(+), 11 deletions(-)
>>> 
>>> diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
>>> index 7ee8041af803..e64278401084 100644
>>> --- a/include/net/bluetooth/bluetooth.h
>>> +++ b/include/net/bluetooth/bluetooth.h
>>> @@ -335,7 +335,11 @@ struct l2cap_ctrl {
>>>      u16     reqseq;
>>>      u16     txseq;
>>>      u8      retries;
>>> +     u8      rsp_code;
>>> +     u8      amp_id;
>>> +     __u8    ident;
>>>      __le16  psm;
>>> +     __le16  scid;
>>>      bdaddr_t bdaddr;
>>>      struct l2cap_chan *chan;
>>> };
>> 
>> I would not bother trying to make this work with CREATE_CHAN_REQ. That is if you want to setup a L2CAP channel that can be moved between BR/EDR and AMP controllers and in that case you have to read the L2CAP information and features first. Meaning there will have been unencrypted ACL packets. This problem only exists if the remote side doesn’t request any version information first.
>> 
>>> @@ -374,6 +378,8 @@ struct bt_skb_cb {
>>>              struct hci_ctrl hci;
>>>      };
>>> };
>>> +static_assert(sizeof(struct bt_skb_cb) <= sizeof(((struct sk_buff *)0)->cb));
>>> +
>>> #define bt_cb(skb) ((struct bt_skb_cb *)((skb)->cb))
>>> 
>>> #define hci_skb_pkt_type(skb) bt_cb((skb))->pkt_type
>>> diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
>>> index 8f1e6a7a2df8..f8f6dec96f12 100644
>>> --- a/include/net/bluetooth/l2cap.h
>>> +++ b/include/net/bluetooth/l2cap.h
>>> @@ -58,6 +58,7 @@
>>> #define L2CAP_MOVE_ERTX_TIMEOUT               msecs_to_jiffies(60000)
>>> #define L2CAP_WAIT_ACK_POLL_PERIOD    msecs_to_jiffies(200)
>>> #define L2CAP_WAIT_ACK_TIMEOUT                msecs_to_jiffies(10000)
>>> +#define L2CAP_PEND_ENC_CONN_TIMEOUT  msecs_to_jiffies(2000)
>>> 
>>> #define L2CAP_A2MP_DEFAULT_MTU                670
>>> 
>>> @@ -700,6 +701,9 @@ struct l2cap_conn {
>>>      struct mutex            chan_lock;
>>>      struct kref             ref;
>>>      struct list_head        users;
>>> +
>>> +     struct delayed_work     remove_pending_encrypt_conn;
>>> +     struct sk_buff_head     pending_conn_q;
>>> };
>>> 
>>> struct l2cap_user {
>>> @@ -1001,4 +1005,6 @@ void l2cap_conn_put(struct l2cap_conn *conn);
>>> int l2cap_register_user(struct l2cap_conn *conn, struct l2cap_user *user);
>>> void l2cap_unregister_user(struct l2cap_conn *conn, struct l2cap_user *user);
>>> 
>>> +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon);
>>> +
>>> #endif /* __L2CAP_H */
>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>>> index 108c6c102a6a..8cefc51a5ca4 100644
>>> --- a/net/bluetooth/hci_event.c
>>> +++ b/net/bluetooth/hci_event.c
>>> @@ -3136,6 +3136,9 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, struct sk_buff *skb)
>>> 
>>> unlock:
>>>      hci_dev_unlock(hdev);
>>> +
>>> +     if (conn && !ev->status && ev->encrypt)
>>> +             l2cap_process_pending_encrypt_conn(conn);
>>> }
>>> 
>>> static void hci_change_link_key_complete_evt(struct hci_dev *hdev,
>>> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
>>> index 35d2bc569a2d..fc6fe2c80c46 100644
>>> --- a/net/bluetooth/l2cap_core.c
>>> +++ b/net/bluetooth/l2cap_core.c
>>> @@ -62,6 +62,10 @@ static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err);
>>> static void l2cap_tx(struct l2cap_chan *chan, struct l2cap_ctrl *control,
>>>                   struct sk_buff_head *skbs, u8 event);
>>> 
>>> +static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
>>> +                                     u8 ident, u8 *data, u8 rsp_code,
>>> +                                     u8 amp_id, bool queue_if_fail);
>>> +
>>> static inline u8 bdaddr_type(u8 link_type, u8 bdaddr_type)
>>> {
>>>      if (link_type == LE_LINK) {
>>> @@ -1902,6 +1906,8 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err)
>>>      if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)
>>>              cancel_delayed_work_sync(&conn->info_timer);
>>> 
>>> +     cancel_delayed_work_sync(&conn->remove_pending_encrypt_conn);
>>> +
>>>      hcon->l2cap_data = NULL;
>>>      conn->hchan = NULL;
>>>      l2cap_conn_put(conn);
>>> @@ -2023,6 +2029,55 @@ static void l2cap_retrans_timeout(struct work_struct *work)
>>>      l2cap_chan_put(chan);
>>> }
>>> 
>>> +static void l2cap_add_pending_encrypt_conn(struct l2cap_conn *conn,
>>> +                                        struct l2cap_conn_req *req,
>>> +                                        u8 ident, u8 rsp_code, u8 amp_id)
>>> +{
>>> +     struct sk_buff *skb = bt_skb_alloc(0, GFP_KERNEL);
>>> +
>>> +     bt_cb(skb)->l2cap.psm = req->psm;
>>> +     bt_cb(skb)->l2cap.scid = req->scid;
>>> +     bt_cb(skb)->l2cap.ident = ident;
>>> +     bt_cb(skb)->l2cap.rsp_code = rsp_code;
>>> +     bt_cb(skb)->l2cap.amp_id = amp_id;
>>> +
>>> +     skb_queue_tail(&conn->pending_conn_q, skb);
>>> +     queue_delayed_work(conn->hcon->hdev->workqueue,
>>> +                        &conn->remove_pending_encrypt_conn,
>>> +                        L2CAP_PEND_ENC_CONN_TIMEOUT);
>>> +}
>>> +
>>> +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon)
>>> +{
>>> +     struct sk_buff *skb;
>>> +     struct l2cap_conn *conn = hcon->l2cap_data;
>>> +
>>> +     if (!conn)
>>> +             return;
>>> +
>>> +     while ((skb = skb_dequeue(&conn->pending_conn_q))) {
>>> +             struct l2cap_conn_req req;
>>> +             u8 ident, rsp_code, amp_id;
>>> +
>>> +             req.psm = bt_cb(skb)->l2cap.psm;
>>> +             req.scid = bt_cb(skb)->l2cap.scid;
>>> +             ident = bt_cb(skb)->l2cap.ident;
>>> +             rsp_code = bt_cb(skb)->l2cap.rsp_code;
>>> +             amp_id = bt_cb(skb)->l2cap.amp_id;
>>> +
>>> +             l2cap_connect(conn, ident, (u8 *)&req, rsp_code, amp_id, false);
>>> +             kfree_skb(skb);
>>> +     }
>>> +}
>>> +
>>> +static void l2cap_remove_pending_encrypt_conn(struct work_struct *work)
>>> +{
>>> +     struct l2cap_conn *conn = container_of(work, struct l2cap_conn,
>>> +                                         remove_pending_encrypt_conn.work);
>>> +
>>> +     l2cap_process_pending_encrypt_conn(conn->hcon);
>>> +}
>>> +
>>> static void l2cap_streaming_send(struct l2cap_chan *chan,
>>>                               struct sk_buff_head *skbs)
>>> {
>>> @@ -4076,8 +4131,8 @@ static inline int l2cap_command_rej(struct l2cap_conn *conn,
>>> }
>>> 
>>> static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
>>> -                                     struct l2cap_cmd_hdr *cmd,
>>> -                                     u8 *data, u8 rsp_code, u8 amp_id)
>>> +                                     u8 ident, u8 *data, u8 rsp_code,
>>> +                                     u8 amp_id, bool queue_if_fail)
>>> {
>>>      struct l2cap_conn_req *req = (struct l2cap_conn_req *) data;
>>>      struct l2cap_conn_rsp rsp;
>>> @@ -4103,8 +4158,15 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
>>>      /* Check if the ACL is secure enough (if not SDP) */
>>>      if (psm != cpu_to_le16(L2CAP_PSM_SDP) &&
>>>          !hci_conn_check_link_mode(conn->hcon)) {
>>> -             conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
>>> -             result = L2CAP_CR_SEC_BLOCK;
>>> +             if (!queue_if_fail) {
>>> +                     conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
>>> +                     result = L2CAP_CR_SEC_BLOCK;
>>> +                     goto response;
>>> +             }
>>> +
>>> +             l2cap_add_pending_encrypt_conn(conn, req, ident, rsp_code,
>>> +                                            amp_id);
>>> +             result = L2CAP_CR_PEND;
>>>              goto response;
>>>      }
>> 
>> So I am actually wondering if the approach is not better to send back a pending to the connect request like we do for everything else. And then proceed with getting our remote L2CAP information. If these come back in encrypted, then we can assume that we actually had encryption enabled and proceed with a L2CAP connect response saying that all is fine.
> 
> I wonder if we should resolve this by having different queues in
> hci_recv_frame (e.g. hdev->evt_rx), that way we can dequeue the HCI
> events before ACL so we first update the HCI states before start
> processing the L2CAP data, thoughts? Something like this:
> 
> https://gist.github.com/Vudentz/464fb0065a73e5c99bdb66cd2c5a1a2d

No. We need to keep things serialized. We actually have to reject unencrypted packets.

So whatever we do needs to be behind a quirk and an explicit opt-in.

Regards

Marcel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed
  2020-06-29 20:20       ` Marcel Holtmann
@ 2020-06-29 22:44         ` Luiz Augusto von Dentz
  2020-06-30  6:48           ` Marcel Holtmann
  0 siblings, 1 reply; 15+ messages in thread
From: Luiz Augusto von Dentz @ 2020-06-29 22:44 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Archie Pusaka, linux-bluetooth, chromeos-bluetooth-upstreaming,
	Archie Pusaka, Abhishek Pandit-Subedi, David S. Miller,
	Jakub Kicinski, Johan Hedberg, Linux Kernel Mailing List,
	open list:NETWORKING [GENERAL]

Hi Marcel,

On Mon, Jun 29, 2020 at 1:21 PM Marcel Holtmann <marcel@holtmann.org> wrote:
>
> Hi Luiz,
>
> >>> It is possible to receive an L2CAP conn req for an encrypted
> >>> connection, before actually receiving the HCI change encryption
> >>> event. If this happened, the received L2CAP packet will be ignored.
> >
> > How is this possible? Or you are referring to a race between the ACL
> > and Event endpoint where the Encryption Change is actually pending to
> > be processed but we end up processing the ACL data first.
>
> you get the ACL packet with the L2CAP_Connect_Req in it and then the HCI Encryption Change event. However over the air they go in the different order. It is specific to the USB transport and nothing is going to fix this. The USB transport design is borked. You can only do bandaids.
>
> >>> This patch queues the L2CAP packet and process them after the
> >>> expected HCI event is received. If after 2 seconds we still don't
> >>> receive it, then we assume something bad happened and discard the
> >>> queued packets.
> >>
> >> as with the other patch, this should be behind the same quirk and experimental setting for exactly the same reasons.
> >>
> >>>
> >>> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
> >>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> >>>
> >>> ---
> >>>
> >>> include/net/bluetooth/bluetooth.h |  6 +++
> >>> include/net/bluetooth/l2cap.h     |  6 +++
> >>> net/bluetooth/hci_event.c         |  3 ++
> >>> net/bluetooth/l2cap_core.c        | 87 +++++++++++++++++++++++++++----
> >>> 4 files changed, 91 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
> >>> index 7ee8041af803..e64278401084 100644
> >>> --- a/include/net/bluetooth/bluetooth.h
> >>> +++ b/include/net/bluetooth/bluetooth.h
> >>> @@ -335,7 +335,11 @@ struct l2cap_ctrl {
> >>>      u16     reqseq;
> >>>      u16     txseq;
> >>>      u8      retries;
> >>> +     u8      rsp_code;
> >>> +     u8      amp_id;
> >>> +     __u8    ident;
> >>>      __le16  psm;
> >>> +     __le16  scid;
> >>>      bdaddr_t bdaddr;
> >>>      struct l2cap_chan *chan;
> >>> };
> >>
> >> I would not bother trying to make this work with CREATE_CHAN_REQ. That is if you want to setup a L2CAP channel that can be moved between BR/EDR and AMP controllers and in that case you have to read the L2CAP information and features first. Meaning there will have been unencrypted ACL packets. This problem only exists if the remote side doesn’t request any version information first.
> >>
> >>> @@ -374,6 +378,8 @@ struct bt_skb_cb {
> >>>              struct hci_ctrl hci;
> >>>      };
> >>> };
> >>> +static_assert(sizeof(struct bt_skb_cb) <= sizeof(((struct sk_buff *)0)->cb));
> >>> +
> >>> #define bt_cb(skb) ((struct bt_skb_cb *)((skb)->cb))
> >>>
> >>> #define hci_skb_pkt_type(skb) bt_cb((skb))->pkt_type
> >>> diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
> >>> index 8f1e6a7a2df8..f8f6dec96f12 100644
> >>> --- a/include/net/bluetooth/l2cap.h
> >>> +++ b/include/net/bluetooth/l2cap.h
> >>> @@ -58,6 +58,7 @@
> >>> #define L2CAP_MOVE_ERTX_TIMEOUT               msecs_to_jiffies(60000)
> >>> #define L2CAP_WAIT_ACK_POLL_PERIOD    msecs_to_jiffies(200)
> >>> #define L2CAP_WAIT_ACK_TIMEOUT                msecs_to_jiffies(10000)
> >>> +#define L2CAP_PEND_ENC_CONN_TIMEOUT  msecs_to_jiffies(2000)
> >>>
> >>> #define L2CAP_A2MP_DEFAULT_MTU                670
> >>>
> >>> @@ -700,6 +701,9 @@ struct l2cap_conn {
> >>>      struct mutex            chan_lock;
> >>>      struct kref             ref;
> >>>      struct list_head        users;
> >>> +
> >>> +     struct delayed_work     remove_pending_encrypt_conn;
> >>> +     struct sk_buff_head     pending_conn_q;
> >>> };
> >>>
> >>> struct l2cap_user {
> >>> @@ -1001,4 +1005,6 @@ void l2cap_conn_put(struct l2cap_conn *conn);
> >>> int l2cap_register_user(struct l2cap_conn *conn, struct l2cap_user *user);
> >>> void l2cap_unregister_user(struct l2cap_conn *conn, struct l2cap_user *user);
> >>>
> >>> +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon);
> >>> +
> >>> #endif /* __L2CAP_H */
> >>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> >>> index 108c6c102a6a..8cefc51a5ca4 100644
> >>> --- a/net/bluetooth/hci_event.c
> >>> +++ b/net/bluetooth/hci_event.c
> >>> @@ -3136,6 +3136,9 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, struct sk_buff *skb)
> >>>
> >>> unlock:
> >>>      hci_dev_unlock(hdev);
> >>> +
> >>> +     if (conn && !ev->status && ev->encrypt)
> >>> +             l2cap_process_pending_encrypt_conn(conn);
> >>> }
> >>>
> >>> static void hci_change_link_key_complete_evt(struct hci_dev *hdev,
> >>> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
> >>> index 35d2bc569a2d..fc6fe2c80c46 100644
> >>> --- a/net/bluetooth/l2cap_core.c
> >>> +++ b/net/bluetooth/l2cap_core.c
> >>> @@ -62,6 +62,10 @@ static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err);
> >>> static void l2cap_tx(struct l2cap_chan *chan, struct l2cap_ctrl *control,
> >>>                   struct sk_buff_head *skbs, u8 event);
> >>>
> >>> +static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> >>> +                                     u8 ident, u8 *data, u8 rsp_code,
> >>> +                                     u8 amp_id, bool queue_if_fail);
> >>> +
> >>> static inline u8 bdaddr_type(u8 link_type, u8 bdaddr_type)
> >>> {
> >>>      if (link_type == LE_LINK) {
> >>> @@ -1902,6 +1906,8 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err)
> >>>      if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)
> >>>              cancel_delayed_work_sync(&conn->info_timer);
> >>>
> >>> +     cancel_delayed_work_sync(&conn->remove_pending_encrypt_conn);
> >>> +
> >>>      hcon->l2cap_data = NULL;
> >>>      conn->hchan = NULL;
> >>>      l2cap_conn_put(conn);
> >>> @@ -2023,6 +2029,55 @@ static void l2cap_retrans_timeout(struct work_struct *work)
> >>>      l2cap_chan_put(chan);
> >>> }
> >>>
> >>> +static void l2cap_add_pending_encrypt_conn(struct l2cap_conn *conn,
> >>> +                                        struct l2cap_conn_req *req,
> >>> +                                        u8 ident, u8 rsp_code, u8 amp_id)
> >>> +{
> >>> +     struct sk_buff *skb = bt_skb_alloc(0, GFP_KERNEL);
> >>> +
> >>> +     bt_cb(skb)->l2cap.psm = req->psm;
> >>> +     bt_cb(skb)->l2cap.scid = req->scid;
> >>> +     bt_cb(skb)->l2cap.ident = ident;
> >>> +     bt_cb(skb)->l2cap.rsp_code = rsp_code;
> >>> +     bt_cb(skb)->l2cap.amp_id = amp_id;
> >>> +
> >>> +     skb_queue_tail(&conn->pending_conn_q, skb);
> >>> +     queue_delayed_work(conn->hcon->hdev->workqueue,
> >>> +                        &conn->remove_pending_encrypt_conn,
> >>> +                        L2CAP_PEND_ENC_CONN_TIMEOUT);
> >>> +}
> >>> +
> >>> +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon)
> >>> +{
> >>> +     struct sk_buff *skb;
> >>> +     struct l2cap_conn *conn = hcon->l2cap_data;
> >>> +
> >>> +     if (!conn)
> >>> +             return;
> >>> +
> >>> +     while ((skb = skb_dequeue(&conn->pending_conn_q))) {
> >>> +             struct l2cap_conn_req req;
> >>> +             u8 ident, rsp_code, amp_id;
> >>> +
> >>> +             req.psm = bt_cb(skb)->l2cap.psm;
> >>> +             req.scid = bt_cb(skb)->l2cap.scid;
> >>> +             ident = bt_cb(skb)->l2cap.ident;
> >>> +             rsp_code = bt_cb(skb)->l2cap.rsp_code;
> >>> +             amp_id = bt_cb(skb)->l2cap.amp_id;
> >>> +
> >>> +             l2cap_connect(conn, ident, (u8 *)&req, rsp_code, amp_id, false);
> >>> +             kfree_skb(skb);
> >>> +     }
> >>> +}
> >>> +
> >>> +static void l2cap_remove_pending_encrypt_conn(struct work_struct *work)
> >>> +{
> >>> +     struct l2cap_conn *conn = container_of(work, struct l2cap_conn,
> >>> +                                         remove_pending_encrypt_conn.work);
> >>> +
> >>> +     l2cap_process_pending_encrypt_conn(conn->hcon);
> >>> +}
> >>> +
> >>> static void l2cap_streaming_send(struct l2cap_chan *chan,
> >>>                               struct sk_buff_head *skbs)
> >>> {
> >>> @@ -4076,8 +4131,8 @@ static inline int l2cap_command_rej(struct l2cap_conn *conn,
> >>> }
> >>>
> >>> static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> >>> -                                     struct l2cap_cmd_hdr *cmd,
> >>> -                                     u8 *data, u8 rsp_code, u8 amp_id)
> >>> +                                     u8 ident, u8 *data, u8 rsp_code,
> >>> +                                     u8 amp_id, bool queue_if_fail)
> >>> {
> >>>      struct l2cap_conn_req *req = (struct l2cap_conn_req *) data;
> >>>      struct l2cap_conn_rsp rsp;
> >>> @@ -4103,8 +4158,15 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
> >>>      /* Check if the ACL is secure enough (if not SDP) */
> >>>      if (psm != cpu_to_le16(L2CAP_PSM_SDP) &&
> >>>          !hci_conn_check_link_mode(conn->hcon)) {
> >>> -             conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
> >>> -             result = L2CAP_CR_SEC_BLOCK;
> >>> +             if (!queue_if_fail) {
> >>> +                     conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
> >>> +                     result = L2CAP_CR_SEC_BLOCK;
> >>> +                     goto response;
> >>> +             }
> >>> +
> >>> +             l2cap_add_pending_encrypt_conn(conn, req, ident, rsp_code,
> >>> +                                            amp_id);
> >>> +             result = L2CAP_CR_PEND;
> >>>              goto response;
> >>>      }
> >>
> >> So I am actually wondering if the approach is not better to send back a pending to the connect request like we do for everything else. And then proceed with getting our remote L2CAP information. If these come back in encrypted, then we can assume that we actually had encryption enabled and proceed with a L2CAP connect response saying that all is fine.
> >
> > I wonder if we should resolve this by having different queues in
> > hci_recv_frame (e.g. hdev->evt_rx), that way we can dequeue the HCI
> > events before ACL so we first update the HCI states before start
> > processing the L2CAP data, thoughts? Something like this:
> >
> > https://gist.github.com/Vudentz/464fb0065a73e5c99bdb66cd2c5a1a2d
>
> No. We need to keep things serialized. We actually have to reject unencrypted packets.
>
> So whatever we do needs to be behind a quirk and an explicit opt-in.

While I agree we are just working around the real issue, Id guess
processing the event before ACL would work (I haven't tested it yet)
much better than leaving this up to the L2CAP layer since that
requires a timer in order for us to e.g. accept/reject the connection
request, also since this problem is known to affect other events as
well (e.g. data for ATT coming before Connection Complete) I guess
using the time the kernel takes to schedule the rx_work as the window
where we would assume the packets arrived 'at same time' so we can
resolve the conflicts between endpoints. On top of this we could
perhaps consider using a delayed work for rx_work so the driver can
actually tune up what is the time window (perhaps for USB that should
be the polling interval) where we would consider events and data that
have arrived at same time.

Or are you saying that the conflict resolution I proposed would
actually break things? I could picture any event that if it were
processed before the data at such a short time window would, note here
I'm talking about miliseconds not seconds so it is not that this will
be doing much reordering and if we go with delayed work it should be
relatively simple to add a Kconfig option(build-time)/module(runtime)
parameter to btusb to configure the interface were we would do such
reordering which the default could be 0 in which case we can just keep
queuing everything on rx_q.

> Regards
>
> Marcel
>


-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found
  2020-06-29  6:40   ` Marcel Holtmann
@ 2020-06-30  6:48     ` Archie Pusaka
  2020-06-30  6:54       ` Marcel Holtmann
  0 siblings, 1 reply; 15+ messages in thread
From: Archie Pusaka @ 2020-06-30  6:48 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: linux-bluetooth, chromeos-bluetooth-upstreaming,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, kernel list, netdev

Hi Marcel,

On Mon, 29 Jun 2020 at 14:40, Marcel Holtmann <marcel@holtmann.org> wrote:
>
> Hi Archie,
>
> > There is a possibility that an ACL packet is received before we
> > receive the HCI connect event for the corresponding handle. If this
> > happens, we discard the ACL packet.
> >
> > Rather than just ignoring them, this patch provides a queue for
> > incoming ACL packet without a handle. The queue is processed when
> > receiving a HCI connection event. If 2 seconds elapsed without
> > receiving the HCI connection event, assume something bad happened
> > and discard the queued packet.
> >
> > Signed-off-by: Archie Pusaka <apusaka@chromium.org>
> > Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
>
> so two things up front. I want to hide this behind a HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly if this kind of out-of-order happens on UART or SDIO transports, then something is obviously going wrong. I have no plan to fix up after a fully serialized transport.
>
> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this off by default. You can enable it via an experimental setting. The reason here is that we have to make it really hard and fail as often as possible so that hardware manufactures and spec writers realize that something is fundamentally broken here.
>
> I have no problem in running the code and complaining loudly in case the quirk has been set. Just injecting the packets can only happen if bluetoothd explicitly enabled it.

Got it.

>
>
> >
> > ---
> >
> > include/net/bluetooth/hci_core.h |  8 +++
> > net/bluetooth/hci_core.c         | 84 +++++++++++++++++++++++++++++---
> > net/bluetooth/hci_event.c        |  2 +
> > 3 files changed, 88 insertions(+), 6 deletions(-)
> >
> > diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> > index 836dc997ff94..b69ecdd0d15a 100644
> > --- a/include/net/bluetooth/hci_core.h
> > +++ b/include/net/bluetooth/hci_core.h
> > @@ -270,6 +270,9 @@ struct adv_monitor {
> > /* Default authenticated payload timeout 30s */
> > #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
> >
> > +/* Time to keep ACL packets without a corresponding handle queued (2s) */
> > +#define PENDING_ACL_TIMEOUT          msecs_to_jiffies(2000)
> > +
>
> Do we have some btmon traces with timestamps. Isn’t a second enough? Actually 2 seconds is an awful long time.

When this happens in the test lab, the HCI connect event is about
0.002 second behind the first ACL packet. We can change this if
required.

>
> > struct amp_assoc {
> >       __u16   len;
> >       __u16   offset;
> > @@ -538,6 +541,9 @@ struct hci_dev {
> >       struct delayed_work     rpa_expired;
> >       bdaddr_t                rpa;
> >
> > +     struct delayed_work     remove_pending_acl;
> > +     struct sk_buff_head     pending_acl_q;
> > +
>
> can we name this ooo_q and move it to the other queues in this struct. Unless we want to add a Kconfig option around it, we don’t need to keep it here.

Ack.

>
> > #if IS_ENABLED(CONFIG_BT_LEDS)
> >       struct led_trigger      *power_led;
> > #endif
> > @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 ediv, __le64 rand,
> > void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
> >                              u8 *bdaddr_type);
> >
> > +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
> > +
> > #define SCO_AIRMODE_MASK       0x0003
> > #define SCO_AIRMODE_CVSD       0x0000
> > #define SCO_AIRMODE_TRANSP     0x0003
> > diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> > index 7959b851cc63..30780242c267 100644
> > --- a/net/bluetooth/hci_core.c
> > +++ b/net/bluetooth/hci_core.c
> > @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >       skb_queue_purge(&hdev->rx_q);
> >       skb_queue_purge(&hdev->cmd_q);
> >       skb_queue_purge(&hdev->raw_q);
> > +     skb_queue_purge(&hdev->pending_acl_q);
> >
> >       /* Drop last sent command */
> >       if (hdev->sent_cmd) {
> > @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct notifier_block *nb, unsigned long action,
> >       return NOTIFY_STOP;
> > }
> >
> > +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
> > +{
> > +     skb_queue_tail(&hdev->pending_acl_q, skb);
> > +
> > +     queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> > +                        PENDING_ACL_TIMEOUT);
> > +}
> > +
> > +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
> > +{
> > +     struct sk_buff *skb, *tmp;
> > +     struct hci_acl_hdr *hdr;
> > +     u16 handle, flags;
> > +     bool reset_timer = false;
> > +
> > +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
> > +             hdr = (struct hci_acl_hdr *)skb->data;
> > +             handle = __le16_to_cpu(hdr->handle);
> > +             flags  = hci_flags(handle);
> > +             handle = hci_handle(handle);
> > +
> > +             if (handle != conn->handle)
> > +                     continue;
> > +
> > +             __skb_unlink(skb, &hdev->pending_acl_q);
> > +             skb_pull(skb, HCI_ACL_HDR_SIZE);
> > +
> > +             l2cap_recv_acldata(conn, skb, flags);
> > +             reset_timer = true;
> > +     }
> > +
> > +     if (reset_timer)
> > +             mod_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> > +                              PENDING_ACL_TIMEOUT);
> > +}
> > +
> > +/* Remove the oldest pending ACL, and all pending ACLs with the same handle */
> > +static void hci_remove_pending_acl(struct work_struct *work)
> > +{
> > +     struct hci_dev *hdev;
> > +     struct sk_buff *skb, *tmp;
> > +     struct hci_acl_hdr *hdr;
> > +     u16 handle, oldest_handle;
> > +
> > +     hdev = container_of(work, struct hci_dev, remove_pending_acl.work);
> > +     skb = skb_dequeue(&hdev->pending_acl_q);
> > +
> > +     if (!skb)
> > +             return;
> > +
> > +     hdr = (struct hci_acl_hdr *)skb->data;
> > +     oldest_handle = hci_handle(__le16_to_cpu(hdr->handle));
> > +     kfree_skb(skb);
> > +
> > +     bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
> > +                oldest_handle);
> > +
> > +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
> > +             hdr = (struct hci_acl_hdr *)skb->data;
> > +             handle = hci_handle(__le16_to_cpu(hdr->handle));
> > +
> > +             if (handle == oldest_handle) {
> > +                     __skb_unlink(skb, &hdev->pending_acl_q);
> > +                     kfree_skb(skb);
> > +             }
> > +     }
> > +
> > +     if (!skb_queue_empty(&hdev->pending_acl_q))
> > +             queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> > +                                PENDING_ACL_TIMEOUT);
> > +}
> > +
>
> So I am wondering if we make this too complicated. Since generally speaking we can only have a single HCI connect complete anyway at a time. No matter if the controller serializes it for us or we do it for the controller. So hci_conn_add could just process the queue for packets with its handle and then flush it. And it can flush it no matter what since whatever other packets are in the queue, they can not be valid.
>
> That said, we wouldn’t even need to check the packet handles at all. We just needed to flag them as already out-of-order queued once and hand them back into the rx_q at the top. Then the would be processed as usual. Already ooo packets would cause the same error as before if it is for a non-existing handle and others would end up being processed.
>
> For me this means we just need another queue to park the packets until hci_conn_add gets called. I might have missed something, but I am looking for the least invasive option for this and least code duplication.

I'm not aware of the fact that we can only have a single HCI connect
complete event at any time. Is this also true even if two / more
peripherals connect at the same time?
I was under the impression that if we have device A and B both are
connecting to us at the same time, we might receive the packets in
this order:
(1) ACL A
(2) ACL B
(3) HCI conn evt B
(4) HCI conn evt A
Hence the queue and the handle check.

>
> > /* Alloc HCI device */
> > struct hci_dev *hci_alloc_dev(void)
> > {
> > @@ -3610,10 +3683,12 @@ struct hci_dev *hci_alloc_dev(void)
> >       INIT_WORK(&hdev->suspend_prepare, hci_prepare_suspend);
> >
> >       INIT_DELAYED_WORK(&hdev->power_off, hci_power_off);
> > +     INIT_DELAYED_WORK(&hdev->remove_pending_acl, hci_remove_pending_acl);
> >
> >       skb_queue_head_init(&hdev->rx_q);
> >       skb_queue_head_init(&hdev->cmd_q);
> >       skb_queue_head_init(&hdev->raw_q);
> > +     skb_queue_head_init(&hdev->pending_acl_q);
> >
> >       init_waitqueue_head(&hdev->req_wait_q);
> >       init_waitqueue_head(&hdev->suspend_wait_q);
> > @@ -4662,8 +4737,6 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb)
> >       struct hci_conn *conn;
> >       __u16 handle, flags;
> >
> > -     skb_pull(skb, HCI_ACL_HDR_SIZE);
> > -
> >       handle = __le16_to_cpu(hdr->handle);
> >       flags  = hci_flags(handle);
> >       handle = hci_handle(handle);
> > @@ -4678,17 +4751,16 @@ static void hci_acldata_packet(struct hci_dev *hdev, struct sk_buff *skb)
> >       hci_dev_unlock(hdev);
> >
> >       if (conn) {
> > +             skb_pull(skb, HCI_ACL_HDR_SIZE);
> > +
> >               hci_conn_enter_active_mode(conn, BT_POWER_FORCE_ACTIVE_OFF);
> >
> >               /* Send to upper protocol */
> >               l2cap_recv_acldata(conn, skb, flags);
> >               return;
> >       } else {
> > -             bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
> > -                        handle);
> > +             hci_add_pending_acl(hdev, skb);
>
> So here I want to keep being verbose. If no quirk is set, then this has to stay as an error. In case the quirk is set, then this should still warn that we are queuing up a packet. It is not an expected behavior.

Ack.

>
> >       }
> > -
> > -     kfree_skb(skb);
> > }
> >
> > /* SCO data packet */
> > diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
> > index e060fc9ebb18..108c6c102a6a 100644
> > --- a/net/bluetooth/hci_event.c
> > +++ b/net/bluetooth/hci_event.c
> > @@ -2627,6 +2627,8 @@ static void hci_conn_complete_evt(struct hci_dev *hdev, struct sk_buff *skb)
> >                       hci_send_cmd(hdev, HCI_OP_CHANGE_CONN_PTYPE, sizeof(cp),
> >                                    &cp);
> >               }
> > +
> > +             hci_process_pending_acl(hdev, conn);
>
> Can we just do this in hci_conn_add() when we create the connection object?

Yes, we can.

>
> >       } else {
> >               conn->state = BT_CLOSED;
> >               if (conn->type == ACL_LINK)
> > --
> > 2.27.0.212.ge8ba1cc988-goog
> >
>
> Regards
>
> Marcel
>

Thanks,
Archie

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed
  2020-06-29 22:44         ` Luiz Augusto von Dentz
@ 2020-06-30  6:48           ` Marcel Holtmann
  0 siblings, 0 replies; 15+ messages in thread
From: Marcel Holtmann @ 2020-06-30  6:48 UTC (permalink / raw)
  To: Luiz Augusto von Dentz
  Cc: Archie Pusaka, linux-bluetooth, chromeos-bluetooth-upstreaming,
	Archie Pusaka, Abhishek Pandit-Subedi, David S. Miller,
	Jakub Kicinski, Johan Hedberg, Linux Kernel Mailing List,
	open list:NETWORKING [GENERAL]

Hi Luiz,

>>>>> It is possible to receive an L2CAP conn req for an encrypted
>>>>> connection, before actually receiving the HCI change encryption
>>>>> event. If this happened, the received L2CAP packet will be ignored.
>>> 
>>> How is this possible? Or you are referring to a race between the ACL
>>> and Event endpoint where the Encryption Change is actually pending to
>>> be processed but we end up processing the ACL data first.
>> 
>> you get the ACL packet with the L2CAP_Connect_Req in it and then the HCI Encryption Change event. However over the air they go in the different order. It is specific to the USB transport and nothing is going to fix this. The USB transport design is borked. You can only do bandaids.
>> 
>>>>> This patch queues the L2CAP packet and process them after the
>>>>> expected HCI event is received. If after 2 seconds we still don't
>>>>> receive it, then we assume something bad happened and discard the
>>>>> queued packets.
>>>> 
>>>> as with the other patch, this should be behind the same quirk and experimental setting for exactly the same reasons.
>>>> 
>>>>> 
>>>>> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
>>>>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
>>>>> 
>>>>> ---
>>>>> 
>>>>> include/net/bluetooth/bluetooth.h |  6 +++
>>>>> include/net/bluetooth/l2cap.h     |  6 +++
>>>>> net/bluetooth/hci_event.c         |  3 ++
>>>>> net/bluetooth/l2cap_core.c        | 87 +++++++++++++++++++++++++++----
>>>>> 4 files changed, 91 insertions(+), 11 deletions(-)
>>>>> 
>>>>> diff --git a/include/net/bluetooth/bluetooth.h b/include/net/bluetooth/bluetooth.h
>>>>> index 7ee8041af803..e64278401084 100644
>>>>> --- a/include/net/bluetooth/bluetooth.h
>>>>> +++ b/include/net/bluetooth/bluetooth.h
>>>>> @@ -335,7 +335,11 @@ struct l2cap_ctrl {
>>>>>     u16     reqseq;
>>>>>     u16     txseq;
>>>>>     u8      retries;
>>>>> +     u8      rsp_code;
>>>>> +     u8      amp_id;
>>>>> +     __u8    ident;
>>>>>     __le16  psm;
>>>>> +     __le16  scid;
>>>>>     bdaddr_t bdaddr;
>>>>>     struct l2cap_chan *chan;
>>>>> };
>>>> 
>>>> I would not bother trying to make this work with CREATE_CHAN_REQ. That is if you want to setup a L2CAP channel that can be moved between BR/EDR and AMP controllers and in that case you have to read the L2CAP information and features first. Meaning there will have been unencrypted ACL packets. This problem only exists if the remote side doesn’t request any version information first.
>>>> 
>>>>> @@ -374,6 +378,8 @@ struct bt_skb_cb {
>>>>>             struct hci_ctrl hci;
>>>>>     };
>>>>> };
>>>>> +static_assert(sizeof(struct bt_skb_cb) <= sizeof(((struct sk_buff *)0)->cb));
>>>>> +
>>>>> #define bt_cb(skb) ((struct bt_skb_cb *)((skb)->cb))
>>>>> 
>>>>> #define hci_skb_pkt_type(skb) bt_cb((skb))->pkt_type
>>>>> diff --git a/include/net/bluetooth/l2cap.h b/include/net/bluetooth/l2cap.h
>>>>> index 8f1e6a7a2df8..f8f6dec96f12 100644
>>>>> --- a/include/net/bluetooth/l2cap.h
>>>>> +++ b/include/net/bluetooth/l2cap.h
>>>>> @@ -58,6 +58,7 @@
>>>>> #define L2CAP_MOVE_ERTX_TIMEOUT               msecs_to_jiffies(60000)
>>>>> #define L2CAP_WAIT_ACK_POLL_PERIOD    msecs_to_jiffies(200)
>>>>> #define L2CAP_WAIT_ACK_TIMEOUT                msecs_to_jiffies(10000)
>>>>> +#define L2CAP_PEND_ENC_CONN_TIMEOUT  msecs_to_jiffies(2000)
>>>>> 
>>>>> #define L2CAP_A2MP_DEFAULT_MTU                670
>>>>> 
>>>>> @@ -700,6 +701,9 @@ struct l2cap_conn {
>>>>>     struct mutex            chan_lock;
>>>>>     struct kref             ref;
>>>>>     struct list_head        users;
>>>>> +
>>>>> +     struct delayed_work     remove_pending_encrypt_conn;
>>>>> +     struct sk_buff_head     pending_conn_q;
>>>>> };
>>>>> 
>>>>> struct l2cap_user {
>>>>> @@ -1001,4 +1005,6 @@ void l2cap_conn_put(struct l2cap_conn *conn);
>>>>> int l2cap_register_user(struct l2cap_conn *conn, struct l2cap_user *user);
>>>>> void l2cap_unregister_user(struct l2cap_conn *conn, struct l2cap_user *user);
>>>>> 
>>>>> +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon);
>>>>> +
>>>>> #endif /* __L2CAP_H */
>>>>> diff --git a/net/bluetooth/hci_event.c b/net/bluetooth/hci_event.c
>>>>> index 108c6c102a6a..8cefc51a5ca4 100644
>>>>> --- a/net/bluetooth/hci_event.c
>>>>> +++ b/net/bluetooth/hci_event.c
>>>>> @@ -3136,6 +3136,9 @@ static void hci_encrypt_change_evt(struct hci_dev *hdev, struct sk_buff *skb)
>>>>> 
>>>>> unlock:
>>>>>     hci_dev_unlock(hdev);
>>>>> +
>>>>> +     if (conn && !ev->status && ev->encrypt)
>>>>> +             l2cap_process_pending_encrypt_conn(conn);
>>>>> }
>>>>> 
>>>>> static void hci_change_link_key_complete_evt(struct hci_dev *hdev,
>>>>> diff --git a/net/bluetooth/l2cap_core.c b/net/bluetooth/l2cap_core.c
>>>>> index 35d2bc569a2d..fc6fe2c80c46 100644
>>>>> --- a/net/bluetooth/l2cap_core.c
>>>>> +++ b/net/bluetooth/l2cap_core.c
>>>>> @@ -62,6 +62,10 @@ static void l2cap_send_disconn_req(struct l2cap_chan *chan, int err);
>>>>> static void l2cap_tx(struct l2cap_chan *chan, struct l2cap_ctrl *control,
>>>>>                  struct sk_buff_head *skbs, u8 event);
>>>>> 
>>>>> +static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
>>>>> +                                     u8 ident, u8 *data, u8 rsp_code,
>>>>> +                                     u8 amp_id, bool queue_if_fail);
>>>>> +
>>>>> static inline u8 bdaddr_type(u8 link_type, u8 bdaddr_type)
>>>>> {
>>>>>     if (link_type == LE_LINK) {
>>>>> @@ -1902,6 +1906,8 @@ static void l2cap_conn_del(struct hci_conn *hcon, int err)
>>>>>     if (conn->info_state & L2CAP_INFO_FEAT_MASK_REQ_SENT)
>>>>>             cancel_delayed_work_sync(&conn->info_timer);
>>>>> 
>>>>> +     cancel_delayed_work_sync(&conn->remove_pending_encrypt_conn);
>>>>> +
>>>>>     hcon->l2cap_data = NULL;
>>>>>     conn->hchan = NULL;
>>>>>     l2cap_conn_put(conn);
>>>>> @@ -2023,6 +2029,55 @@ static void l2cap_retrans_timeout(struct work_struct *work)
>>>>>     l2cap_chan_put(chan);
>>>>> }
>>>>> 
>>>>> +static void l2cap_add_pending_encrypt_conn(struct l2cap_conn *conn,
>>>>> +                                        struct l2cap_conn_req *req,
>>>>> +                                        u8 ident, u8 rsp_code, u8 amp_id)
>>>>> +{
>>>>> +     struct sk_buff *skb = bt_skb_alloc(0, GFP_KERNEL);
>>>>> +
>>>>> +     bt_cb(skb)->l2cap.psm = req->psm;
>>>>> +     bt_cb(skb)->l2cap.scid = req->scid;
>>>>> +     bt_cb(skb)->l2cap.ident = ident;
>>>>> +     bt_cb(skb)->l2cap.rsp_code = rsp_code;
>>>>> +     bt_cb(skb)->l2cap.amp_id = amp_id;
>>>>> +
>>>>> +     skb_queue_tail(&conn->pending_conn_q, skb);
>>>>> +     queue_delayed_work(conn->hcon->hdev->workqueue,
>>>>> +                        &conn->remove_pending_encrypt_conn,
>>>>> +                        L2CAP_PEND_ENC_CONN_TIMEOUT);
>>>>> +}
>>>>> +
>>>>> +void l2cap_process_pending_encrypt_conn(struct hci_conn *hcon)
>>>>> +{
>>>>> +     struct sk_buff *skb;
>>>>> +     struct l2cap_conn *conn = hcon->l2cap_data;
>>>>> +
>>>>> +     if (!conn)
>>>>> +             return;
>>>>> +
>>>>> +     while ((skb = skb_dequeue(&conn->pending_conn_q))) {
>>>>> +             struct l2cap_conn_req req;
>>>>> +             u8 ident, rsp_code, amp_id;
>>>>> +
>>>>> +             req.psm = bt_cb(skb)->l2cap.psm;
>>>>> +             req.scid = bt_cb(skb)->l2cap.scid;
>>>>> +             ident = bt_cb(skb)->l2cap.ident;
>>>>> +             rsp_code = bt_cb(skb)->l2cap.rsp_code;
>>>>> +             amp_id = bt_cb(skb)->l2cap.amp_id;
>>>>> +
>>>>> +             l2cap_connect(conn, ident, (u8 *)&req, rsp_code, amp_id, false);
>>>>> +             kfree_skb(skb);
>>>>> +     }
>>>>> +}
>>>>> +
>>>>> +static void l2cap_remove_pending_encrypt_conn(struct work_struct *work)
>>>>> +{
>>>>> +     struct l2cap_conn *conn = container_of(work, struct l2cap_conn,
>>>>> +                                         remove_pending_encrypt_conn.work);
>>>>> +
>>>>> +     l2cap_process_pending_encrypt_conn(conn->hcon);
>>>>> +}
>>>>> +
>>>>> static void l2cap_streaming_send(struct l2cap_chan *chan,
>>>>>                              struct sk_buff_head *skbs)
>>>>> {
>>>>> @@ -4076,8 +4131,8 @@ static inline int l2cap_command_rej(struct l2cap_conn *conn,
>>>>> }
>>>>> 
>>>>> static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
>>>>> -                                     struct l2cap_cmd_hdr *cmd,
>>>>> -                                     u8 *data, u8 rsp_code, u8 amp_id)
>>>>> +                                     u8 ident, u8 *data, u8 rsp_code,
>>>>> +                                     u8 amp_id, bool queue_if_fail)
>>>>> {
>>>>>     struct l2cap_conn_req *req = (struct l2cap_conn_req *) data;
>>>>>     struct l2cap_conn_rsp rsp;
>>>>> @@ -4103,8 +4158,15 @@ static struct l2cap_chan *l2cap_connect(struct l2cap_conn *conn,
>>>>>     /* Check if the ACL is secure enough (if not SDP) */
>>>>>     if (psm != cpu_to_le16(L2CAP_PSM_SDP) &&
>>>>>         !hci_conn_check_link_mode(conn->hcon)) {
>>>>> -             conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
>>>>> -             result = L2CAP_CR_SEC_BLOCK;
>>>>> +             if (!queue_if_fail) {
>>>>> +                     conn->disc_reason = HCI_ERROR_AUTH_FAILURE;
>>>>> +                     result = L2CAP_CR_SEC_BLOCK;
>>>>> +                     goto response;
>>>>> +             }
>>>>> +
>>>>> +             l2cap_add_pending_encrypt_conn(conn, req, ident, rsp_code,
>>>>> +                                            amp_id);
>>>>> +             result = L2CAP_CR_PEND;
>>>>>             goto response;
>>>>>     }
>>>> 
>>>> So I am actually wondering if the approach is not better to send back a pending to the connect request like we do for everything else. And then proceed with getting our remote L2CAP information. If these come back in encrypted, then we can assume that we actually had encryption enabled and proceed with a L2CAP connect response saying that all is fine.
>>> 
>>> I wonder if we should resolve this by having different queues in
>>> hci_recv_frame (e.g. hdev->evt_rx), that way we can dequeue the HCI
>>> events before ACL so we first update the HCI states before start
>>> processing the L2CAP data, thoughts? Something like this:
>>> 
>>> https://gist.github.com/Vudentz/464fb0065a73e5c99bdb66cd2c5a1a2d
>> 
>> No. We need to keep things serialized. We actually have to reject unencrypted packets.
>> 
>> So whatever we do needs to be behind a quirk and an explicit opt-in.
> 
> While I agree we are just working around the real issue, Id guess
> processing the event before ACL would work (I haven't tested it yet)
> much better than leaving this up to the L2CAP layer since that
> requires a timer in order for us to e.g. accept/reject the connection
> request, also since this problem is known to affect other events as
> well (e.g. data for ATT coming before Connection Complete) I guess
> using the time the kernel takes to schedule the rx_work as the window
> where we would assume the packets arrived 'at same time' so we can
> resolve the conflicts between endpoints. On top of this we could
> perhaps consider using a delayed work for rx_work so the driver can
> actually tune up what is the time window (perhaps for USB that should
> be the polling interval) where we would consider events and data that
> have arrived at same time.

I am not following this argumentation. The drivers call hci_recv_frame and they get all put into the rx_q. The processing of that queue doesn’t change the order. So for the core, it is all serialized as it is suppose to be. That means the timing of when rx_work is irrelevant to this discussion.

But I agree that I want to fix this without having to use timers and I think that is possible. And of course it is in opt-in. Long term we need to fix the USB transport to proper serialize things. Please keep in mind that all other transports don’t have this problem.

> Or are you saying that the conflict resolution I proposed would
> actually break things? I could picture any event that if it were
> processed before the data at such a short time window would, note here
> I'm talking about miliseconds not seconds so it is not that this will
> be doing much reordering and if we go with delayed work it should be
> relatively simple to add a Kconfig option(build-time)/module(runtime)
> parameter to btusb to configure the interface were we would do such
> reordering which the default could be 0 in which case we can just keep
> queuing everything on rx_q.

The not-encrypted issues worries me the the most. You need to opt-in and accept the risk. If you are unlucky, you might fail qualification because of this.

Regards

Marcel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found
  2020-06-30  6:48     ` Archie Pusaka
@ 2020-06-30  6:54       ` Marcel Holtmann
       [not found]         ` <CALWDO_Vrn_pXMbkXifKFazha7BYPqLpCthqHOb9ZmVE3wDRMfA@mail.gmail.com>
  0 siblings, 1 reply; 15+ messages in thread
From: Marcel Holtmann @ 2020-06-30  6:54 UTC (permalink / raw)
  To: Archie Pusaka
  Cc: linux-bluetooth, chromeos-bluetooth-upstreaming,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, kernel list, netdev

Hi Archie,

>>> There is a possibility that an ACL packet is received before we
>>> receive the HCI connect event for the corresponding handle. If this
>>> happens, we discard the ACL packet.
>>> 
>>> Rather than just ignoring them, this patch provides a queue for
>>> incoming ACL packet without a handle. The queue is processed when
>>> receiving a HCI connection event. If 2 seconds elapsed without
>>> receiving the HCI connection event, assume something bad happened
>>> and discard the queued packet.
>>> 
>>> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
>>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
>> 
>> so two things up front. I want to hide this behind a HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly if this kind of out-of-order happens on UART or SDIO transports, then something is obviously going wrong. I have no plan to fix up after a fully serialized transport.
>> 
>> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this off by default. You can enable it via an experimental setting. The reason here is that we have to make it really hard and fail as often as possible so that hardware manufactures and spec writers realize that something is fundamentally broken here.
>> 
>> I have no problem in running the code and complaining loudly in case the quirk has been set. Just injecting the packets can only happen if bluetoothd explicitly enabled it.
> 
> Got it.
> 
>> 
>> 
>>> 
>>> ---
>>> 
>>> include/net/bluetooth/hci_core.h |  8 +++
>>> net/bluetooth/hci_core.c         | 84 +++++++++++++++++++++++++++++---
>>> net/bluetooth/hci_event.c        |  2 +
>>> 3 files changed, 88 insertions(+), 6 deletions(-)
>>> 
>>> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
>>> index 836dc997ff94..b69ecdd0d15a 100644
>>> --- a/include/net/bluetooth/hci_core.h
>>> +++ b/include/net/bluetooth/hci_core.h
>>> @@ -270,6 +270,9 @@ struct adv_monitor {
>>> /* Default authenticated payload timeout 30s */
>>> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
>>> 
>>> +/* Time to keep ACL packets without a corresponding handle queued (2s) */
>>> +#define PENDING_ACL_TIMEOUT          msecs_to_jiffies(2000)
>>> +
>> 
>> Do we have some btmon traces with timestamps. Isn’t a second enough? Actually 2 seconds is an awful long time.
> 
> When this happens in the test lab, the HCI connect event is about
> 0.002 second behind the first ACL packet. We can change this if
> required.
> 
>> 
>>> struct amp_assoc {
>>>      __u16   len;
>>>      __u16   offset;
>>> @@ -538,6 +541,9 @@ struct hci_dev {
>>>      struct delayed_work     rpa_expired;
>>>      bdaddr_t                rpa;
>>> 
>>> +     struct delayed_work     remove_pending_acl;
>>> +     struct sk_buff_head     pending_acl_q;
>>> +
>> 
>> can we name this ooo_q and move it to the other queues in this struct. Unless we want to add a Kconfig option around it, we don’t need to keep it here.
> 
> Ack.
> 
>> 
>>> #if IS_ENABLED(CONFIG_BT_LEDS)
>>>      struct led_trigger      *power_led;
>>> #endif
>>> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 ediv, __le64 rand,
>>> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
>>>                             u8 *bdaddr_type);
>>> 
>>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
>>> +
>>> #define SCO_AIRMODE_MASK       0x0003
>>> #define SCO_AIRMODE_CVSD       0x0000
>>> #define SCO_AIRMODE_TRANSP     0x0003
>>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>>> index 7959b851cc63..30780242c267 100644
>>> --- a/net/bluetooth/hci_core.c
>>> +++ b/net/bluetooth/hci_core.c
>>> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
>>>      skb_queue_purge(&hdev->rx_q);
>>>      skb_queue_purge(&hdev->cmd_q);
>>>      skb_queue_purge(&hdev->raw_q);
>>> +     skb_queue_purge(&hdev->pending_acl_q);
>>> 
>>>      /* Drop last sent command */
>>>      if (hdev->sent_cmd) {
>>> @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct notifier_block *nb, unsigned long action,
>>>      return NOTIFY_STOP;
>>> }
>>> 
>>> +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
>>> +{
>>> +     skb_queue_tail(&hdev->pending_acl_q, skb);
>>> +
>>> +     queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
>>> +                        PENDING_ACL_TIMEOUT);
>>> +}
>>> +
>>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
>>> +{
>>> +     struct sk_buff *skb, *tmp;
>>> +     struct hci_acl_hdr *hdr;
>>> +     u16 handle, flags;
>>> +     bool reset_timer = false;
>>> +
>>> +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
>>> +             hdr = (struct hci_acl_hdr *)skb->data;
>>> +             handle = __le16_to_cpu(hdr->handle);
>>> +             flags  = hci_flags(handle);
>>> +             handle = hci_handle(handle);
>>> +
>>> +             if (handle != conn->handle)
>>> +                     continue;
>>> +
>>> +             __skb_unlink(skb, &hdev->pending_acl_q);
>>> +             skb_pull(skb, HCI_ACL_HDR_SIZE);
>>> +
>>> +             l2cap_recv_acldata(conn, skb, flags);
>>> +             reset_timer = true;
>>> +     }
>>> +
>>> +     if (reset_timer)
>>> +             mod_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
>>> +                              PENDING_ACL_TIMEOUT);
>>> +}
>>> +
>>> +/* Remove the oldest pending ACL, and all pending ACLs with the same handle */
>>> +static void hci_remove_pending_acl(struct work_struct *work)
>>> +{
>>> +     struct hci_dev *hdev;
>>> +     struct sk_buff *skb, *tmp;
>>> +     struct hci_acl_hdr *hdr;
>>> +     u16 handle, oldest_handle;
>>> +
>>> +     hdev = container_of(work, struct hci_dev, remove_pending_acl.work);
>>> +     skb = skb_dequeue(&hdev->pending_acl_q);
>>> +
>>> +     if (!skb)
>>> +             return;
>>> +
>>> +     hdr = (struct hci_acl_hdr *)skb->data;
>>> +     oldest_handle = hci_handle(__le16_to_cpu(hdr->handle));
>>> +     kfree_skb(skb);
>>> +
>>> +     bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
>>> +                oldest_handle);
>>> +
>>> +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
>>> +             hdr = (struct hci_acl_hdr *)skb->data;
>>> +             handle = hci_handle(__le16_to_cpu(hdr->handle));
>>> +
>>> +             if (handle == oldest_handle) {
>>> +                     __skb_unlink(skb, &hdev->pending_acl_q);
>>> +                     kfree_skb(skb);
>>> +             }
>>> +     }
>>> +
>>> +     if (!skb_queue_empty(&hdev->pending_acl_q))
>>> +             queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
>>> +                                PENDING_ACL_TIMEOUT);
>>> +}
>>> +
>> 
>> So I am wondering if we make this too complicated. Since generally speaking we can only have a single HCI connect complete anyway at a time. No matter if the controller serializes it for us or we do it for the controller. So hci_conn_add could just process the queue for packets with its handle and then flush it. And it can flush it no matter what since whatever other packets are in the queue, they can not be valid.
>> 
>> That said, we wouldn’t even need to check the packet handles at all. We just needed to flag them as already out-of-order queued once and hand them back into the rx_q at the top. Then the would be processed as usual. Already ooo packets would cause the same error as before if it is for a non-existing handle and others would end up being processed.
>> 
>> For me this means we just need another queue to park the packets until hci_conn_add gets called. I might have missed something, but I am looking for the least invasive option for this and least code duplication.
> 
> I'm not aware of the fact that we can only have a single HCI connect
> complete event at any time. Is this also true even if two / more
> peripherals connect at the same time?
> I was under the impression that if we have device A and B both are
> connecting to us at the same time, we might receive the packets in
> this order:
> (1) ACL A
> (2) ACL B
> (3) HCI conn evt B
> (4) HCI conn evt A
> Hence the queue and the handle check.

my reading from the LL state machine is that once the first LL_Connect_Req is processes, the controller moves out of the advertising state. So no other LL_Connect_Req can be processed. So that means that connection attempts are serialized.

Now if you run AE and multiple instances, that might be different, but then again, these instances are also offset in time and so I don’t see how we can get more than one HCI_Connection_Complete event at a time (and with that a leading ACL packet).

Regards

Marcel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found
       [not found]         ` <CALWDO_Vrn_pXMbkXifKFazha7BYPqLpCthqHOb9ZmVE3wDRMfA@mail.gmail.com>
@ 2020-07-15 14:07           ` Alain Michaud
  2020-07-15 15:25             ` Luiz Augusto von Dentz
  2020-07-17  6:51           ` Marcel Holtmann
  1 sibling, 1 reply; 15+ messages in thread
From: Alain Michaud @ 2020-07-15 14:07 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Archie Pusaka, linux-bluetooth, chromeos-bluetooth-upstreaming,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, kernel list, netdev

Resending in plain text.


On Wed, Jul 15, 2020 at 9:56 AM Alain Michaud <alainmichaud@google.com> wrote:
>
> Hi Marcel,
>
> Sorry, just got around to this.
>
> On Tue, Jun 30, 2020 at 2:55 AM Marcel Holtmann <marcel@holtmann.org> wrote:
>>
>> Hi Archie,
>>
>> >>> There is a possibility that an ACL packet is received before we
>> >>> receive the HCI connect event for the corresponding handle. If this
>> >>> happens, we discard the ACL packet.
>> >>>
>> >>> Rather than just ignoring them, this patch provides a queue for
>> >>> incoming ACL packet without a handle. The queue is processed when
>> >>> receiving a HCI connection event. If 2 seconds elapsed without
>> >>> receiving the HCI connection event, assume something bad happened
>> >>> and discard the queued packet.
>> >>>
>> >>> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
>> >>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
>> >>
>> >> so two things up front. I want to hide this behind a HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly if this kind of out-of-order happens on UART or SDIO transports, then something is obviously going wrong. I have no plan to fix up after a fully serialized transport.
>> >>
>> >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this off by default. You can enable it via an experimental setting. The reason here is that we have to make it really hard and fail as often as possible so that hardware manufactures and spec writers realize that something is fundamentally broken here.
>
> I don't have any objection to making this explicit enable to non serialized transports.  However, I do wonder what the intention is around making this off by default.  We already know there is a race condition between the interupt and bulk endpoints over USB, so this can and does happen.  Hardware manufaturers can't relly do much about this other than trying to pull the interupt endpoint more often, but that's only a workaround, it can't avoid it all together.
>
> IMO, this seems like a legitimate fix at the host level and I don't see any obvious benefits to hide this fix under an experimental feature and make it more difficult for the customers and system integrators to discover.
>
>>
>> >>
>> >> I have no problem in running the code and complaining loudly in case the quirk has been set. Just injecting the packets can only happen if bluetoothd explicitly enabled it.
>> >
>> > Got it.
>> >
>> >>
>> >>
>> >>>
>> >>> ---
>> >>>
>> >>> include/net/bluetooth/hci_core.h |  8 +++
>> >>> net/bluetooth/hci_core.c         | 84 +++++++++++++++++++++++++++++---
>> >>> net/bluetooth/hci_event.c        |  2 +
>> >>> 3 files changed, 88 insertions(+), 6 deletions(-)
>> >>>
>> >>> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
>> >>> index 836dc997ff94..b69ecdd0d15a 100644
>> >>> --- a/include/net/bluetooth/hci_core.h
>> >>> +++ b/include/net/bluetooth/hci_core.h
>> >>> @@ -270,6 +270,9 @@ struct adv_monitor {
>> >>> /* Default authenticated payload timeout 30s */
>> >>> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
>> >>>
>> >>> +/* Time to keep ACL packets without a corresponding handle queued (2s) */
>> >>> +#define PENDING_ACL_TIMEOUT          msecs_to_jiffies(2000)
>> >>> +
>> >>
>> >> Do we have some btmon traces with timestamps. Isn’t a second enough? Actually 2 seconds is an awful long time.
>> >
>> > When this happens in the test lab, the HCI connect event is about
>> > 0.002 second behind the first ACL packet. We can change this if
>> > required.
>> >
>> >>
>> >>> struct amp_assoc {
>> >>>      __u16   len;
>> >>>      __u16   offset;
>> >>> @@ -538,6 +541,9 @@ struct hci_dev {
>> >>>      struct delayed_work     rpa_expired;
>> >>>      bdaddr_t                rpa;
>> >>>
>> >>> +     struct delayed_work     remove_pending_acl;
>> >>> +     struct sk_buff_head     pending_acl_q;
>> >>> +
>> >>
>> >> can we name this ooo_q and move it to the other queues in this struct. Unless we want to add a Kconfig option around it, we don’t need to keep it here.
>> >
>> > Ack.
>> >
>> >>
>> >>> #if IS_ENABLED(CONFIG_BT_LEDS)
>> >>>      struct led_trigger      *power_led;
>> >>> #endif
>> >>> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 ediv, __le64 rand,
>> >>> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
>> >>>                             u8 *bdaddr_type);
>> >>>
>> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
>> >>> +
>> >>> #define SCO_AIRMODE_MASK       0x0003
>> >>> #define SCO_AIRMODE_CVSD       0x0000
>> >>> #define SCO_AIRMODE_TRANSP     0x0003
>> >>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
>> >>> index 7959b851cc63..30780242c267 100644
>> >>> --- a/net/bluetooth/hci_core.c
>> >>> +++ b/net/bluetooth/hci_core.c
>> >>> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
>> >>>      skb_queue_purge(&hdev->rx_q);
>> >>>      skb_queue_purge(&hdev->cmd_q);
>> >>>      skb_queue_purge(&hdev->raw_q);
>> >>> +     skb_queue_purge(&hdev->pending_acl_q);
>> >>>
>> >>>      /* Drop last sent command */
>> >>>      if (hdev->sent_cmd) {
>> >>> @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct notifier_block *nb, unsigned long action,
>> >>>      return NOTIFY_STOP;
>> >>> }
>> >>>
>> >>> +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
>> >>> +{
>> >>> +     skb_queue_tail(&hdev->pending_acl_q, skb);
>> >>> +
>> >>> +     queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
>> >>> +                        PENDING_ACL_TIMEOUT);
>> >>> +}
>> >>> +
>> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
>> >>> +{
>> >>> +     struct sk_buff *skb, *tmp;
>> >>> +     struct hci_acl_hdr *hdr;
>> >>> +     u16 handle, flags;
>> >>> +     bool reset_timer = false;
>> >>> +
>> >>> +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
>> >>> +             hdr = (struct hci_acl_hdr *)skb->data;
>> >>> +             handle = __le16_to_cpu(hdr->handle);
>> >>> +             flags  = hci_flags(handle);
>> >>> +             handle = hci_handle(handle);
>> >>> +
>> >>> +             if (handle != conn->handle)
>> >>> +                     continue;
>> >>> +
>> >>> +             __skb_unlink(skb, &hdev->pending_acl_q);
>> >>> +             skb_pull(skb, HCI_ACL_HDR_SIZE);
>> >>> +
>> >>> +             l2cap_recv_acldata(conn, skb, flags);
>> >>> +             reset_timer = true;
>> >>> +     }
>> >>> +
>> >>> +     if (reset_timer)
>> >>> +             mod_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
>> >>> +                              PENDING_ACL_TIMEOUT);
>> >>> +}
>> >>> +
>> >>> +/* Remove the oldest pending ACL, and all pending ACLs with the same handle */
>> >>> +static void hci_remove_pending_acl(struct work_struct *work)
>> >>> +{
>> >>> +     struct hci_dev *hdev;
>> >>> +     struct sk_buff *skb, *tmp;
>> >>> +     struct hci_acl_hdr *hdr;
>> >>> +     u16 handle, oldest_handle;
>> >>> +
>> >>> +     hdev = container_of(work, struct hci_dev, remove_pending_acl.work);
>> >>> +     skb = skb_dequeue(&hdev->pending_acl_q);
>> >>> +
>> >>> +     if (!skb)
>> >>> +             return;
>> >>> +
>> >>> +     hdr = (struct hci_acl_hdr *)skb->data;
>> >>> +     oldest_handle = hci_handle(__le16_to_cpu(hdr->handle));
>> >>> +     kfree_skb(skb);
>> >>> +
>> >>> +     bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
>> >>> +                oldest_handle);
>> >>> +
>> >>> +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
>> >>> +             hdr = (struct hci_acl_hdr *)skb->data;
>> >>> +             handle = hci_handle(__le16_to_cpu(hdr->handle));
>> >>> +
>> >>> +             if (handle == oldest_handle) {
>> >>> +                     __skb_unlink(skb, &hdev->pending_acl_q);
>> >>> +                     kfree_skb(skb);
>> >>> +             }
>> >>> +     }
>> >>> +
>> >>> +     if (!skb_queue_empty(&hdev->pending_acl_q))
>> >>> +             queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
>> >>> +                                PENDING_ACL_TIMEOUT);
>> >>> +}
>> >>> +
>> >>
>> >> So I am wondering if we make this too complicated. Since generally speaking we can only have a single HCI connect complete anyway at a time. No matter if the controller serializes it for us or we do it for the controller. So hci_conn_add could just process the queue for packets with its handle and then flush it. And it can flush it no matter what since whatever other packets are in the queue, they can not be valid.
>> >>
>> >> That said, we wouldn’t even need to check the packet handles at all. We just needed to flag them as already out-of-order queued once and hand them back into the rx_q at the top. Then the would be processed as usual. Already ooo packets would cause the same error as before if it is for a non-existing handle and others would end up being processed.
>> >>
>> >> For me this means we just need another queue to park the packets until hci_conn_add gets called. I might have missed something, but I am looking for the least invasive option for this and least code duplication.
>> >
>> > I'm not aware of the fact that we can only have a single HCI connect
>> > complete event at any time. Is this also true even if two / more
>> > peripherals connect at the same time?
>> > I was under the impression that if we have device A and B both are
>> > connecting to us at the same time, we might receive the packets in
>> > this order:
>> > (1) ACL A
>> > (2) ACL B
>> > (3) HCI conn evt B
>> > (4) HCI conn evt A
>> > Hence the queue and the handle check.
>>
>> my reading from the LL state machine is that once the first LL_Connect_Req is processes, the controller moves out of the advertising state. So no other LL_Connect_Req can be processed. So that means that connection attempts are serialized.
>>
>> Now if you run AE and multiple instances, that might be different, but then again, these instances are also offset in time and so I don’t see how we can get more than one HCI_Connection_Complete event at a time (and with that a leading ACL packet).
>>
>> Regards
>>
>> Marcel
>>
>> --
>> You received this message because you are subscribed to the Google Groups "ChromeOS Bluetooth Upstreaming" group.
>> To unsubscribe from this group and stop receiving emails from it, send an email to chromeos-bluetooth-upstreaming+unsubscribe@chromium.org.
>> To post to this group, send email to chromeos-bluetooth-upstreaming@chromium.org.
>> To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromeos-bluetooth-upstreaming/7BBB55E0-FBD9-40C0-80D9-D5E7FC9F80D2%40holtmann.org.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found
  2020-07-15 14:07           ` Alain Michaud
@ 2020-07-15 15:25             ` Luiz Augusto von Dentz
  0 siblings, 0 replies; 15+ messages in thread
From: Luiz Augusto von Dentz @ 2020-07-15 15:25 UTC (permalink / raw)
  To: Alain Michaud
  Cc: Marcel Holtmann, Archie Pusaka, linux-bluetooth,
	chromeos-bluetooth-upstreaming, Abhishek Pandit-Subedi,
	David S. Miller, Jakub Kicinski, Johan Hedberg, kernel list,
	netdev

Hi Alain,

On Wed, Jul 15, 2020 at 8:10 AM Alain Michaud <alainmichaud@google.com> wrote:
>
> Resending in plain text.

I've sent a RFC to work out the ordering, that should work out for any
race where it process ACL before Events during a polling interval
(1ms) so I hope that is enough to catch all these races, if that is
not perhaps we could make the interval configurable.

>
> On Wed, Jul 15, 2020 at 9:56 AM Alain Michaud <alainmichaud@google.com> wrote:
> >
> > Hi Marcel,
> >
> > Sorry, just got around to this.
> >
> > On Tue, Jun 30, 2020 at 2:55 AM Marcel Holtmann <marcel@holtmann.org> wrote:
> >>
> >> Hi Archie,
> >>
> >> >>> There is a possibility that an ACL packet is received before we
> >> >>> receive the HCI connect event for the corresponding handle. If this
> >> >>> happens, we discard the ACL packet.
> >> >>>
> >> >>> Rather than just ignoring them, this patch provides a queue for
> >> >>> incoming ACL packet without a handle. The queue is processed when
> >> >>> receiving a HCI connection event. If 2 seconds elapsed without
> >> >>> receiving the HCI connection event, assume something bad happened
> >> >>> and discard the queued packet.
> >> >>>
> >> >>> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
> >> >>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> >> >>
> >> >> so two things up front. I want to hide this behind a HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly if this kind of out-of-order happens on UART or SDIO transports, then something is obviously going wrong. I have no plan to fix up after a fully serialized transport.
> >> >>
> >> >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this off by default. You can enable it via an experimental setting. The reason here is that we have to make it really hard and fail as often as possible so that hardware manufactures and spec writers realize that something is fundamentally broken here.
> >
> > I don't have any objection to making this explicit enable to non serialized transports.  However, I do wonder what the intention is around making this off by default.  We already know there is a race condition between the interupt and bulk endpoints over USB, so this can and does happen.  Hardware manufaturers can't relly do much about this other than trying to pull the interupt endpoint more often, but that's only a workaround, it can't avoid it all together.
> >
> > IMO, this seems like a legitimate fix at the host level and I don't see any obvious benefits to hide this fix under an experimental feature and make it more difficult for the customers and system integrators to discover.
> >
> >>
> >> >>
> >> >> I have no problem in running the code and complaining loudly in case the quirk has been set. Just injecting the packets can only happen if bluetoothd explicitly enabled it.
> >> >
> >> > Got it.
> >> >
> >> >>
> >> >>
> >> >>>
> >> >>> ---
> >> >>>
> >> >>> include/net/bluetooth/hci_core.h |  8 +++
> >> >>> net/bluetooth/hci_core.c         | 84 +++++++++++++++++++++++++++++---
> >> >>> net/bluetooth/hci_event.c        |  2 +
> >> >>> 3 files changed, 88 insertions(+), 6 deletions(-)
> >> >>>
> >> >>> diff --git a/include/net/bluetooth/hci_core.h b/include/net/bluetooth/hci_core.h
> >> >>> index 836dc997ff94..b69ecdd0d15a 100644
> >> >>> --- a/include/net/bluetooth/hci_core.h
> >> >>> +++ b/include/net/bluetooth/hci_core.h
> >> >>> @@ -270,6 +270,9 @@ struct adv_monitor {
> >> >>> /* Default authenticated payload timeout 30s */
> >> >>> #define DEFAULT_AUTH_PAYLOAD_TIMEOUT   0x0bb8
> >> >>>
> >> >>> +/* Time to keep ACL packets without a corresponding handle queued (2s) */
> >> >>> +#define PENDING_ACL_TIMEOUT          msecs_to_jiffies(2000)
> >> >>> +
> >> >>
> >> >> Do we have some btmon traces with timestamps. Isn’t a second enough? Actually 2 seconds is an awful long time.
> >> >
> >> > When this happens in the test lab, the HCI connect event is about
> >> > 0.002 second behind the first ACL packet. We can change this if
> >> > required.
> >> >
> >> >>
> >> >>> struct amp_assoc {
> >> >>>      __u16   len;
> >> >>>      __u16   offset;
> >> >>> @@ -538,6 +541,9 @@ struct hci_dev {
> >> >>>      struct delayed_work     rpa_expired;
> >> >>>      bdaddr_t                rpa;
> >> >>>
> >> >>> +     struct delayed_work     remove_pending_acl;
> >> >>> +     struct sk_buff_head     pending_acl_q;
> >> >>> +
> >> >>
> >> >> can we name this ooo_q and move it to the other queues in this struct. Unless we want to add a Kconfig option around it, we don’t need to keep it here.
> >> >
> >> > Ack.
> >> >
> >> >>
> >> >>> #if IS_ENABLED(CONFIG_BT_LEDS)
> >> >>>      struct led_trigger      *power_led;
> >> >>> #endif
> >> >>> @@ -1773,6 +1779,8 @@ void hci_le_start_enc(struct hci_conn *conn, __le16 ediv, __le64 rand,
> >> >>> void hci_copy_identity_address(struct hci_dev *hdev, bdaddr_t *bdaddr,
> >> >>>                             u8 *bdaddr_type);
> >> >>>
> >> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn);
> >> >>> +
> >> >>> #define SCO_AIRMODE_MASK       0x0003
> >> >>> #define SCO_AIRMODE_CVSD       0x0000
> >> >>> #define SCO_AIRMODE_TRANSP     0x0003
> >> >>> diff --git a/net/bluetooth/hci_core.c b/net/bluetooth/hci_core.c
> >> >>> index 7959b851cc63..30780242c267 100644
> >> >>> --- a/net/bluetooth/hci_core.c
> >> >>> +++ b/net/bluetooth/hci_core.c
> >> >>> @@ -1786,6 +1786,7 @@ int hci_dev_do_close(struct hci_dev *hdev)
> >> >>>      skb_queue_purge(&hdev->rx_q);
> >> >>>      skb_queue_purge(&hdev->cmd_q);
> >> >>>      skb_queue_purge(&hdev->raw_q);
> >> >>> +     skb_queue_purge(&hdev->pending_acl_q);
> >> >>>
> >> >>>      /* Drop last sent command */
> >> >>>      if (hdev->sent_cmd) {
> >> >>> @@ -3518,6 +3519,78 @@ static int hci_suspend_notifier(struct notifier_block *nb, unsigned long action,
> >> >>>      return NOTIFY_STOP;
> >> >>> }
> >> >>>
> >> >>> +static void hci_add_pending_acl(struct hci_dev *hdev, struct sk_buff *skb)
> >> >>> +{
> >> >>> +     skb_queue_tail(&hdev->pending_acl_q, skb);
> >> >>> +
> >> >>> +     queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> >> >>> +                        PENDING_ACL_TIMEOUT);
> >> >>> +}
> >> >>> +
> >> >>> +void hci_process_pending_acl(struct hci_dev *hdev, struct hci_conn *conn)
> >> >>> +{
> >> >>> +     struct sk_buff *skb, *tmp;
> >> >>> +     struct hci_acl_hdr *hdr;
> >> >>> +     u16 handle, flags;
> >> >>> +     bool reset_timer = false;
> >> >>> +
> >> >>> +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
> >> >>> +             hdr = (struct hci_acl_hdr *)skb->data;
> >> >>> +             handle = __le16_to_cpu(hdr->handle);
> >> >>> +             flags  = hci_flags(handle);
> >> >>> +             handle = hci_handle(handle);
> >> >>> +
> >> >>> +             if (handle != conn->handle)
> >> >>> +                     continue;
> >> >>> +
> >> >>> +             __skb_unlink(skb, &hdev->pending_acl_q);
> >> >>> +             skb_pull(skb, HCI_ACL_HDR_SIZE);
> >> >>> +
> >> >>> +             l2cap_recv_acldata(conn, skb, flags);
> >> >>> +             reset_timer = true;
> >> >>> +     }
> >> >>> +
> >> >>> +     if (reset_timer)
> >> >>> +             mod_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> >> >>> +                              PENDING_ACL_TIMEOUT);
> >> >>> +}
> >> >>> +
> >> >>> +/* Remove the oldest pending ACL, and all pending ACLs with the same handle */
> >> >>> +static void hci_remove_pending_acl(struct work_struct *work)
> >> >>> +{
> >> >>> +     struct hci_dev *hdev;
> >> >>> +     struct sk_buff *skb, *tmp;
> >> >>> +     struct hci_acl_hdr *hdr;
> >> >>> +     u16 handle, oldest_handle;
> >> >>> +
> >> >>> +     hdev = container_of(work, struct hci_dev, remove_pending_acl.work);
> >> >>> +     skb = skb_dequeue(&hdev->pending_acl_q);
> >> >>> +
> >> >>> +     if (!skb)
> >> >>> +             return;
> >> >>> +
> >> >>> +     hdr = (struct hci_acl_hdr *)skb->data;
> >> >>> +     oldest_handle = hci_handle(__le16_to_cpu(hdr->handle));
> >> >>> +     kfree_skb(skb);
> >> >>> +
> >> >>> +     bt_dev_err(hdev, "ACL packet for unknown connection handle %d",
> >> >>> +                oldest_handle);
> >> >>> +
> >> >>> +     skb_queue_walk_safe(&hdev->pending_acl_q, skb, tmp) {
> >> >>> +             hdr = (struct hci_acl_hdr *)skb->data;
> >> >>> +             handle = hci_handle(__le16_to_cpu(hdr->handle));
> >> >>> +
> >> >>> +             if (handle == oldest_handle) {
> >> >>> +                     __skb_unlink(skb, &hdev->pending_acl_q);
> >> >>> +                     kfree_skb(skb);
> >> >>> +             }
> >> >>> +     }
> >> >>> +
> >> >>> +     if (!skb_queue_empty(&hdev->pending_acl_q))
> >> >>> +             queue_delayed_work(hdev->workqueue, &hdev->remove_pending_acl,
> >> >>> +                                PENDING_ACL_TIMEOUT);
> >> >>> +}
> >> >>> +
> >> >>
> >> >> So I am wondering if we make this too complicated. Since generally speaking we can only have a single HCI connect complete anyway at a time. No matter if the controller serializes it for us or we do it for the controller. So hci_conn_add could just process the queue for packets with its handle and then flush it. And it can flush it no matter what since whatever other packets are in the queue, they can not be valid.
> >> >>
> >> >> That said, we wouldn’t even need to check the packet handles at all. We just needed to flag them as already out-of-order queued once and hand them back into the rx_q at the top. Then the would be processed as usual. Already ooo packets would cause the same error as before if it is for a non-existing handle and others would end up being processed.
> >> >>
> >> >> For me this means we just need another queue to park the packets until hci_conn_add gets called. I might have missed something, but I am looking for the least invasive option for this and least code duplication.
> >> >
> >> > I'm not aware of the fact that we can only have a single HCI connect
> >> > complete event at any time. Is this also true even if two / more
> >> > peripherals connect at the same time?
> >> > I was under the impression that if we have device A and B both are
> >> > connecting to us at the same time, we might receive the packets in
> >> > this order:
> >> > (1) ACL A
> >> > (2) ACL B
> >> > (3) HCI conn evt B
> >> > (4) HCI conn evt A
> >> > Hence the queue and the handle check.
> >>
> >> my reading from the LL state machine is that once the first LL_Connect_Req is processes, the controller moves out of the advertising state. So no other LL_Connect_Req can be processed. So that means that connection attempts are serialized.
> >>
> >> Now if you run AE and multiple instances, that might be different, but then again, these instances are also offset in time and so I don’t see how we can get more than one HCI_Connection_Complete event at a time (and with that a leading ACL packet).
> >>
> >> Regards
> >>
> >> Marcel
> >>
> >> --
> >> You received this message because you are subscribed to the Google Groups "ChromeOS Bluetooth Upstreaming" group.
> >> To unsubscribe from this group and stop receiving emails from it, send an email to chromeos-bluetooth-upstreaming+unsubscribe@chromium.org.
> >> To post to this group, send email to chromeos-bluetooth-upstreaming@chromium.org.
> >> To view this discussion on the web visit https://groups.google.com/a/chromium.org/d/msgid/chromeos-bluetooth-upstreaming/7BBB55E0-FBD9-40C0-80D9-D5E7FC9F80D2%40holtmann.org.



-- 
Luiz Augusto von Dentz

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found
       [not found]         ` <CALWDO_Vrn_pXMbkXifKFazha7BYPqLpCthqHOb9ZmVE3wDRMfA@mail.gmail.com>
  2020-07-15 14:07           ` Alain Michaud
@ 2020-07-17  6:51           ` Marcel Holtmann
  2020-07-17 13:17             ` Alain Michaud
  1 sibling, 1 reply; 15+ messages in thread
From: Marcel Holtmann @ 2020-07-17  6:51 UTC (permalink / raw)
  To: Alain Michaud
  Cc: Archie Pusaka, linux-bluetooth, chromeos-bluetooth-upstreaming,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, kernel list, netdev

Hi Alain,

> >>> There is a possibility that an ACL packet is received before we
> >>> receive the HCI connect event for the corresponding handle. If this
> >>> happens, we discard the ACL packet.
> >>> 
> >>> Rather than just ignoring them, this patch provides a queue for
> >>> incoming ACL packet without a handle. The queue is processed when
> >>> receiving a HCI connection event. If 2 seconds elapsed without
> >>> receiving the HCI connection event, assume something bad happened
> >>> and discard the queued packet.
> >>> 
> >>> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
> >>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> >> 
> >> so two things up front. I want to hide this behind a HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly if this kind of out-of-order happens on UART or SDIO transports, then something is obviously going wrong. I have no plan to fix up after a fully serialized transport.
> >> 
> >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this off by default. You can enable it via an experimental setting. The reason here is that we have to make it really hard and fail as often as possible so that hardware manufactures and spec writers realize that something is fundamentally broken here.
> I don't have any objection to making this explicit enable to non serialized transports.  However, I do wonder what the intention is around making this off by default.  We already know there is a race condition between the interupt and bulk endpoints over USB, so this can and does happen.  Hardware manufaturers can't relly do much about this other than trying to pull the interupt endpoint more often, but that's only a workaround, it can't avoid it all together.
> 
> IMO, this seems like a legitimate fix at the host level and I don't see any obvious benefits to hide this fix under an experimental feature and make it more difficult for the customers and system integrators to discover.

the problem is that this is not a fix. It is papering over a hole and at best a workaround with both eyes closed and hoping for the best. I am not looking forward for the first security researcher to figure out that they have a chance to inject an unencrypted packet since we are waiting 2 seconds for the USB transport to get its act together.

In addition, I think that Luiz attempt to align with the poll intervals inside the USB transport directly is a cleaner and more self-contained approach. It also reduces the window of opportunity for any attacker since we actually align the USB transport specific intervals with each other.

Regards

Marcel


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found
  2020-07-17  6:51           ` Marcel Holtmann
@ 2020-07-17 13:17             ` Alain Michaud
  0 siblings, 0 replies; 15+ messages in thread
From: Alain Michaud @ 2020-07-17 13:17 UTC (permalink / raw)
  To: Marcel Holtmann
  Cc: Archie Pusaka, linux-bluetooth, chromeos-bluetooth-upstreaming,
	Abhishek Pandit-Subedi, David S. Miller, Jakub Kicinski,
	Johan Hedberg, kernel list, netdev

Hi Marcel,

On Fri, Jul 17, 2020 at 2:51 AM Marcel Holtmann <marcel@holtmann.org> wrote:
>
> Hi Alain,
>
> > >>> There is a possibility that an ACL packet is received before we
> > >>> receive the HCI connect event for the corresponding handle. If this
> > >>> happens, we discard the ACL packet.
> > >>>
> > >>> Rather than just ignoring them, this patch provides a queue for
> > >>> incoming ACL packet without a handle. The queue is processed when
> > >>> receiving a HCI connection event. If 2 seconds elapsed without
> > >>> receiving the HCI connection event, assume something bad happened
> > >>> and discard the queued packet.
> > >>>
> > >>> Signed-off-by: Archie Pusaka <apusaka@chromium.org>
> > >>> Reviewed-by: Abhishek Pandit-Subedi <abhishekpandit@chromium.org>
> > >>
> > >> so two things up front. I want to hide this behind a HCI_QUIRK_OUT_OF_ORDER_ACL that a transport driver has to set first. Frankly if this kind of out-of-order happens on UART or SDIO transports, then something is obviously going wrong. I have no plan to fix up after a fully serialized transport.
> > >>
> > >> Secondly, if a transport sets HCI_QUIRK_OUT_OF_ORDER_ACL, then I want this off by default. You can enable it via an experimental setting. The reason here is that we have to make it really hard and fail as often as possible so that hardware manufactures and spec writers realize that something is fundamentally broken here.
> > I don't have any objection to making this explicit enable to non serialized transports.  However, I do wonder what the intention is around making this off by default.  We already know there is a race condition between the interupt and bulk endpoints over USB, so this can and does happen.  Hardware manufaturers can't relly do much about this other than trying to pull the interupt endpoint more often, but that's only a workaround, it can't avoid it all together.
> >
> > IMO, this seems like a legitimate fix at the host level and I don't see any obvious benefits to hide this fix under an experimental feature and make it more difficult for the customers and system integrators to discover.
>
> the problem is that this is not a fix. It is papering over a hole and at best a workaround with both eyes closed and hoping for the best. I am not looking forward for the first security researcher to figure out that they have a chance to inject an unencrypted packet since we are waiting 2 seconds for the USB transport to get its act together.
I don't think this is the right characterization but I agree, 2
seconds would be too long, it would ideally be no longer than the USB
polling interval diff.

>
> In addition, I think that Luiz attempt to align with the poll intervals inside the USB transport directly is a cleaner and more self-contained approach. It also reduces the window of opportunity for any attacker since we actually align the USB transport specific intervals with each other.
I'll have to look at Luiz's patch and think through if this really
eliminates the problem.  If may indeed be a more practical approach to
this problem.

>
> Regards
>
> Marcel
>

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-07-17 13:18 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-27 10:54 [RFC PATCH v1 0/2] handle USB endpoint race condition Archie Pusaka
2020-06-27 10:54 ` [RFC PATCH v1 1/2] Bluetooth: queue ACL packets if no handle is found Archie Pusaka
2020-06-29  6:40   ` Marcel Holtmann
2020-06-30  6:48     ` Archie Pusaka
2020-06-30  6:54       ` Marcel Holtmann
     [not found]         ` <CALWDO_Vrn_pXMbkXifKFazha7BYPqLpCthqHOb9ZmVE3wDRMfA@mail.gmail.com>
2020-07-15 14:07           ` Alain Michaud
2020-07-15 15:25             ` Luiz Augusto von Dentz
2020-07-17  6:51           ` Marcel Holtmann
2020-07-17 13:17             ` Alain Michaud
2020-06-27 10:54 ` [RFC PATCH v1 2/2] Bluetooth: queue L2CAP conn req if encryption is needed Archie Pusaka
2020-06-29  6:49   ` Marcel Holtmann
2020-06-29 19:42     ` Luiz Augusto von Dentz
2020-06-29 20:20       ` Marcel Holtmann
2020-06-29 22:44         ` Luiz Augusto von Dentz
2020-06-30  6:48           ` Marcel Holtmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).