All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/5] net: atlantic: more fuzzing fixes
@ 2022-04-18 23:17 Grant Grundler
  2022-04-18 23:17 ` [PATCH 1/5] net: atlantic: limit buff_ring index value Grant Grundler
                   ` (5 more replies)
  0 siblings, 6 replies; 15+ messages in thread
From: Grant Grundler @ 2022-04-18 23:17 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi, Grant Grundler

The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic driver
in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters
Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
    https://docs.google.com/document/d/e/2PACX-1vT4oCGNhhy_AuUqpu6NGnW0N9HF_jxf2kS7raOpOlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK/pub

It essentially describes four problems:
1) validate rxd_wb->next_desc_ptr before populating buff->next
2) "frag[0] not initialized" case in aq_ring_rx_clean()
3) limit iterations handling fragments in aq_ring_rx_clean()
4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()

I've added one "clean up" contribution:
    "net: atlantic: reduce scope of is_rsc_complete"

I tested the "original" patches using chromeos-v5.4 kernel branch:
    https://chromium-review.googlesource.com/q/hashtag:pcinet-atlantic-2022q1+(status:open%20OR%20status:merged)

The fuzzing team will retest using the chromeos-v5.4 patches and the b0 HW.

I've forward ported those patches to 5.18-rc2 and compiled them but am
currently unable to test them on 5.18-rc2 kernel (logistics problems).

I'm confident in all but the last patch:
   "net: atlantic: verify hw_head_ is reasonable"

Please verify I'm not confusing how ring->sw_head and ring->sw_tail
are used in hw_atl_b0_hw_ring_tx_head_update().

Credit largely goes to Chrome OS Fuzzing team members:
    Aashay Shringarpure, Yi Chou, Shervin Oloumi

cheers,
grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/5] net: atlantic: limit buff_ring index value
  2022-04-18 23:17 [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
@ 2022-04-18 23:17 ` Grant Grundler
  2022-04-18 23:17 ` [PATCH 2/5] net: atlantic: fix "frag[0] not initialized" Grant Grundler
                   ` (4 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2022-04-18 23:17 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi, Grant Grundler

buff->next is pulled from data DMA'd by the NIC and later used
to index into the buff_ring[] array. Verify the index is within
the size of the array.

Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index d875ce3ec759..e72b9d86f6ad 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -981,7 +981,9 @@ int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s *self, struct aq_ring_s *ring)
 
 			if (buff->is_lro) {
 				/* LRO */
-				buff->next = rxd_wb->next_desc_ptr;
+				buff->next =
+					(rxd_wb->next_desc_ptr < ring->size) ?
+					rxd_wb->next_desc_ptr : 0U;
 				++ring->stats.rx.lro_packets;
 			} else {
 				/* jumbo */
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/5] net: atlantic:  fix "frag[0] not initialized"
  2022-04-18 23:17 [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
  2022-04-18 23:17 ` [PATCH 1/5] net: atlantic: limit buff_ring index value Grant Grundler
@ 2022-04-18 23:17 ` Grant Grundler
  2022-04-18 23:17 ` [PATCH 3/5] net: atlantic: reduce scope of is_rsc_complete Grant Grundler
                   ` (3 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2022-04-18 23:17 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi, Grant Grundler

In aq_ring_rx_clean(), if buff->is_eop is not set AND
buff->len < AQ_CFG_RX_HDR_SIZE, then hdr_len remains equal to
buff->len and skb_add_rx_frag(xxx, *0*, ...) is not called.

The loop following this code starts calling skb_add_rx_frag() starting
with i=1 and thus frag[0] is never initialized. Since i is initialized
to zero at the top of the primary loop, we can just reference and
post-increment i instead of hardcoding the 0 when calling
skb_add_rx_frag() the first time.

Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 77e76c9efd32..440423b0e8ea 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -446,7 +446,7 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 		       ALIGN(hdr_len, sizeof(long)));
 
 		if (buff->len - hdr_len > 0) {
-			skb_add_rx_frag(skb, 0, buff->rxdata.page,
+			skb_add_rx_frag(skb, i++, buff->rxdata.page,
 					buff->rxdata.pg_off + hdr_len,
 					buff->len - hdr_len,
 					AQ_CFG_RX_FRAME_MAX);
@@ -455,7 +455,6 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 
 		if (!buff->is_eop) {
 			buff_ = buff;
-			i = 1U;
 			do {
 				next_ = buff_->next;
 				buff_ = &self->buff_ring[next_];
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 3/5] net: atlantic: reduce scope of is_rsc_complete
  2022-04-18 23:17 [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
  2022-04-18 23:17 ` [PATCH 1/5] net: atlantic: limit buff_ring index value Grant Grundler
  2022-04-18 23:17 ` [PATCH 2/5] net: atlantic: fix "frag[0] not initialized" Grant Grundler
@ 2022-04-18 23:17 ` Grant Grundler
  2022-04-18 23:17 ` [PATCH 4/5] net: atlantic: add check for MAX_SKB_FRAGS Grant Grundler
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2022-04-18 23:17 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi, Grant Grundler

Don't defer handling the err case outside the loop. That's pointless.

And since is_rsc_complete is only used inside this loop, declare
it inside the loop to reduce it's scope.

Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 13 ++++++-------
 1 file changed, 6 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index 440423b0e8ea..bc1952131799 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -346,7 +346,6 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 		     int budget)
 {
 	struct net_device *ndev = aq_nic_get_ndev(self->aq_nic);
-	bool is_rsc_completed = true;
 	int err = 0;
 
 	for (; (self->sw_head != self->hw_head) && budget;
@@ -366,6 +365,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 		if (!buff->is_eop) {
 			buff_ = buff;
 			do {
+				bool is_rsc_completed = true;
+
 				if (buff_->next >= self->size) {
 					err = -EIO;
 					goto err_exit;
@@ -377,18 +378,16 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 							    next_,
 							    self->hw_head);
 
-				if (unlikely(!is_rsc_completed))
-					break;
+				if (unlikely(!is_rsc_completed)) {
+					err = 0;
+					goto err_exit;
+				}
 
 				buff->is_error |= buff_->is_error;
 				buff->is_cso_err |= buff_->is_cso_err;
 
 			} while (!buff_->is_eop);
 
-			if (!is_rsc_completed) {
-				err = 0;
-				goto err_exit;
-			}
 			if (buff->is_error ||
 			    (buff->is_lro && buff->is_cso_err)) {
 				buff_ = buff;
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 4/5] net: atlantic: add check for MAX_SKB_FRAGS
  2022-04-18 23:17 [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
                   ` (2 preceding siblings ...)
  2022-04-18 23:17 ` [PATCH 3/5] net: atlantic: reduce scope of is_rsc_complete Grant Grundler
@ 2022-04-18 23:17 ` Grant Grundler
  2022-04-18 23:17 ` [PATCH 5/5] net: atlantic: verify hw_head_ is reasonable Grant Grundler
  2022-04-21 19:53 ` [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
  5 siblings, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2022-04-18 23:17 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi, Grant Grundler

Enforce that the CPU can not get stuck in an infinite loop.

Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 drivers/net/ethernet/aquantia/atlantic/aq_ring.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
index bc1952131799..8201ce7adb77 100644
--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
@@ -363,6 +363,7 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 			continue;
 
 		if (!buff->is_eop) {
+			unsigned int frag_cnt = 0U;
 			buff_ = buff;
 			do {
 				bool is_rsc_completed = true;
@@ -371,6 +372,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 					err = -EIO;
 					goto err_exit;
 				}
+
+				frag_cnt++;
 				next_ = buff_->next,
 				buff_ = &self->buff_ring[next_];
 				is_rsc_completed =
@@ -378,7 +381,8 @@ int aq_ring_rx_clean(struct aq_ring_s *self,
 							    next_,
 							    self->hw_head);
 
-				if (unlikely(!is_rsc_completed)) {
+				if (unlikely(!is_rsc_completed) ||
+				    frag_cnt > MAX_SKB_FRAGS) {
 					err = 0;
 					goto err_exit;
 				}
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/5] net: atlantic: verify hw_head_ is reasonable
  2022-04-18 23:17 [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
                   ` (3 preceding siblings ...)
  2022-04-18 23:17 ` [PATCH 4/5] net: atlantic: add check for MAX_SKB_FRAGS Grant Grundler
@ 2022-04-18 23:17 ` Grant Grundler
  2022-04-21 19:53 ` [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
  5 siblings, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2022-04-18 23:17 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi, Grant Grundler

Bounds check hw_head index to verify it lies within the TX buffer ring.

Unexpected values of hw_head may cause aq_ring_tx_clean to double
dev_kfree_skb_any already cleaned parts of the ring.

Reported-by: Aashay Shringarpure <aashay@google.com>
Reported-by: Yi Chou <yich@google.com>
Reported-by: Shervin Oloumi <enlightened@google.com>
Signed-off-by: Grant Grundler <grundler@chromium.org>
---
 .../aquantia/atlantic/hw_atl/hw_atl_b0.c      | 21 +++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
index e72b9d86f6ad..9b6b93bb3e86 100644
--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
@@ -889,6 +889,27 @@ int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self,
 		err = -ENXIO;
 		goto err_exit;
 	}
+
+	/* Validate that the new hw_head_ is reasonable. */
+	if (hw_head_ >= ring->size) {
+		err = -ENXIO;
+		goto err_exit;
+	}
+
+	if (ring->sw_head >= ring->sw_tail) {
+		/* Head index hasn't wrapped around to below tail index. */
+		if (hw_head_ < ring->sw_head && hw_head_ >= ring->sw_tail) {
+			err = -ENXIO;
+			goto err_exit;
+		}
+	} else {
+		/* Head index has wrapped around and is below tail index. */
+		if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {
+			err = -ENXIO;
+			goto err_exit;
+		}
+	}
+
 	ring->hw_head = hw_head_;
 	err = aq_hw_err_from_flags(self);
 
-- 
2.36.0.rc0.470.gd361397f0d-goog


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-04-18 23:17 [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
                   ` (4 preceding siblings ...)
  2022-04-18 23:17 ` [PATCH 5/5] net: atlantic: verify hw_head_ is reasonable Grant Grundler
@ 2022-04-21 19:53 ` Grant Grundler
  2022-04-26 16:00   ` [EXT] " Igor Russkikh
  5 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2022-04-21 19:53 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Igor Russkikh, Jakub Kicinski, Paolo Abeni, netdev,
	David S . Miller, LKML, Aashay Shringarpure, Yi Chou,
	Shervin Oloumi

Igor,
Will you have a chance to comment on this in the near future?
Should someone else review/integrate these patches?

I'm asking since I've seen no comments in the past three days.

cheers,
grant


On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler <grundler@chromium.org> wrote:
>
> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic driver
> in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters
> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
>     https://docs.google.com/document/d/e/2PACX-1vT4oCGNhhy_AuUqpu6NGnW0N9HF_jxf2kS7raOpOlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK/pub
>
> It essentially describes four problems:
> 1) validate rxd_wb->next_desc_ptr before populating buff->next
> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
> 3) limit iterations handling fragments in aq_ring_rx_clean()
> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
>
> I've added one "clean up" contribution:
>     "net: atlantic: reduce scope of is_rsc_complete"
>
> I tested the "original" patches using chromeos-v5.4 kernel branch:
>     https://chromium-review.googlesource.com/q/hashtag:pcinet-atlantic-2022q1+(status:open%20OR%20status:merged)
>
> The fuzzing team will retest using the chromeos-v5.4 patches and the b0 HW.
>
> I've forward ported those patches to 5.18-rc2 and compiled them but am
> currently unable to test them on 5.18-rc2 kernel (logistics problems).
>
> I'm confident in all but the last patch:
>    "net: atlantic: verify hw_head_ is reasonable"
>
> Please verify I'm not confusing how ring->sw_head and ring->sw_tail
> are used in hw_atl_b0_hw_ring_tx_head_update().
>
> Credit largely goes to Chrome OS Fuzzing team members:
>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
>
> cheers,
> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-04-21 19:53 ` [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
@ 2022-04-26 16:00   ` Igor Russkikh
  2022-04-26 17:20     ` Grant Grundler
  0 siblings, 1 reply; 15+ messages in thread
From: Igor Russkikh @ 2022-04-26 16:00 UTC (permalink / raw)
  To: Grant Grundler, Dmitry Bezrukov
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi

Hi Grant,

Sorry for the delay, I was on vacation.
Thanks for working on this.

I'm adding here Dmitrii, to help me review the patches.
Dmitrii, here is a full series:

https://patchwork.kernel.org/project/netdevbpf/cover/20220418231746.2464800-1-grundler@chromium.org/

Grant, I've reviewed and also quite OK with patches 1-4.

For patch 5 - why do you think we need these extra comparisons with software head/tail?
From what I see in logic, only the size limiting check is enough there..

Other extra checks are tricky and non intuitive..

Regards,
  Igor

On 4/21/2022 9:53 PM, Grant Grundler wrote:
> External Email
> 
> ----------------------------------------------------------------------
> Igor,
> Will you have a chance to comment on this in the near future?
> Should someone else review/integrate these patches?
> 
> I'm asking since I've seen no comments in the past three days.
> 
> cheers,
> grant
> 
> 
> On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler <grundler@chromium.org> 
> wrote:
>>
>> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic driver
>> in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters
>> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7raOpOlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=620jqeSvQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=
>>
>> It essentially describes four problems:
>> 1) validate rxd_wb->next_desc_ptr before populating buff->next
>> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
>> 3) limit iterations handling fragments in aq_ring_rx_clean()
>> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
>>
>> I've added one "clean up" contribution:
>>     "net: atlantic: reduce scope of is_rsc_complete"
>>
>> I tested the "original" patches using chromeos-v5.4 kernel branch:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__chromium-2Dreview.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28status-3Aopen-2520OR-2520status-3Amerged-29&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=1a1YwJqrY-be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=
>>
>> The fuzzing team will retest using the chromeos-v5.4 patches and the b0 
>> HW.
>>
>> I've forward ported those patches to 5.18-rc2 and compiled them but am
>> currently unable to test them on 5.18-rc2 kernel (logistics problems).
>>
>> I'm confident in all but the last patch:
>>    "net: atlantic: verify hw_head_ is reasonable"
>>
>> Please verify I'm not confusing how ring->sw_head and ring->sw_tail
>> are used in hw_atl_b0_hw_ring_tx_head_update().
>>
>> Credit largely goes to Chrome OS Fuzzing team members:
>>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
>>
>> cheers,
>> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-04-26 16:00   ` [EXT] " Igor Russkikh
@ 2022-04-26 17:20     ` Grant Grundler
  2022-05-03 11:14       ` Dmitrii Bezrukov
  0 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2022-04-26 17:20 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Grant Grundler, Dmitry Bezrukov, Jakub Kicinski, Paolo Abeni,
	netdev, David S . Miller, LKML, Aashay Shringarpure, Yi Chou,
	Shervin Oloumi

[reply-all again since I forgot to tell gmail to post this as "plain
text"...grrh... so much for AI figuring this stuff out.]


On Tue, Apr 26, 2022 at 9:00 AM Igor Russkikh <irusskikh@marvell.com> wrote:
>
> Hi Grant,
>
> Sorry for the delay, I was on vacation.
> Thanks for working on this.

Hi Igor!
Very welcome! And yes, I was starting to wonder... but I'm now glad
that you didn't review them before you got back. These patches are no
reason to ruin a perfectly good vacation. :)

> I'm adding here Dmitrii, to help me review the patches.
> Dmitrii, here is a full series:
>
> https://patchwork.kernel.org/project/netdevbpf/cover/20220418231746.2464800-1-grundler@chromium.org/
>
> Grant, I've reviewed and also quite OK with patches 1-4.

Excellent! \o/


> For patch 5 - why do you think we need these extra comparisons with software head/tail?

The ChromeOS security team (CC'd) believes the driver needs to verify
"expected behavior". In other words, the driver expects the device to
provide new values of tail index which are between [tail,head)
("available to fill").

Your question makes me chuckle because I asked exactly the same
question. :D Everyone agrees it is a minimum requirement to verify the
index was "in bounds". And I agree it's prudent to verify the device
is "well behaved" where we can. I haven't looked at the code enough to
know what could go wrong if, for example, the tail index is
decremented instead of incremented or a "next fragment" index falls in
the "available to fill" range.

However, I didn't run the fuzzer and, for now, I'm taking the ChromeOS
security team's word that this check is needed. If you (or Dmitrii)
feel strongly the driver can handle malicious or firmware bugs in
other ways, I'm not offended if you decline this patch. However, I
would be curious what those other mechanisms are.

> From what I see in logic, only the size limiting check is enough there..
>
> Other extra checks are tricky and non intuitive..

Yes, somewhat tricky in the code but conceptually simple: For the RX
buffer ring, IIUC, [head,tail) is "CPU to process" and [tail, head) is
"available to fill". New tail values should always be in the latter
range.

The trickiness comes in because this is a ring buffer and [tail, head)
it is equally likely that head =< tail  or head > tail numerically.

If you like, feel free to add comments explaining the ring behavior or
ask me to add such a comment (and repost #5). I'm a big fan of
documenting non-intuitive things in the code. That way the next person
to look at the code can verify the code and the IO device do what the
comment claims.

On the RX buffer ring, I'm also wondering if there is a race condition
such that the driver uses stale values of the tail pointer when
walking the RX fragment lists and validating index values. Aashay
assures me this race condition is not possible and I am convinced this
is true for the TX buffer ring where the driver is the "producer"
(tells the device what is in the TX ring). I still have to review the
RX buffer handling code more and will continue the conversation with
him until we agree.

cheers,
grant

>
> Regards,
>   Igor
>
> On 4/21/2022 9:53 PM, Grant Grundler wrote:
> > External Email
> >
> > ----------------------------------------------------------------------
> > Igor,
> > Will you have a chance to comment on this in the near future?
> > Should someone else review/integrate these patches?
> >
> > I'm asking since I've seen no comments in the past three days.
> >
> > cheers,
> > grant
> >
> >
> > On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler <grundler@chromium.org>
> > wrote:
> >>
> >> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic driver
> >> in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters
> >> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
> >>
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.com_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7raOpOlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=620jqeSvQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=
> >>
> >> It essentially describes four problems:
> >> 1) validate rxd_wb->next_desc_ptr before populating buff->next
> >> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
> >> 3) limit iterations handling fragments in aq_ring_rx_clean()
> >> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
> >>
> >> I've added one "clean up" contribution:
> >>     "net: atlantic: reduce scope of is_rsc_complete"
> >>
> >> I tested the "original" patches using chromeos-v5.4 kernel branch:
> >>
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__chromium-2Dreview.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28status-3Aopen-2520OR-2520status-3Amerged-29&d=DwIBaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=1a1YwJqrY-be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=
> >>
> >> The fuzzing team will retest using the chromeos-v5.4 patches and the b0
> >> HW.
> >>
> >> I've forward ported those patches to 5.18-rc2 and compiled them but am
> >> currently unable to test them on 5.18-rc2 kernel (logistics problems).
> >>
> >> I'm confident in all but the last patch:
> >>    "net: atlantic: verify hw_head_ is reasonable"
> >>
> >> Please verify I'm not confusing how ring->sw_head and ring->sw_tail
> >> are used in hw_atl_b0_hw_ring_tx_head_update().
> >>
> >> Credit largely goes to Chrome OS Fuzzing team members:
> >>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
> >>
> >> cheers,
> >> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-04-26 17:20     ` Grant Grundler
@ 2022-05-03 11:14       ` Dmitrii Bezrukov
  2022-05-03 18:07         ` Grant Grundler
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitrii Bezrukov @ 2022-05-03 11:14 UTC (permalink / raw)
  To: Grant Grundler, Igor Russkikh
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi

Hi Grants,

>[1/5] net: atlantic: limit buff_ring index value

>diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c 
>b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
>index d875ce3ec759..e72b9d86f6ad 100644
>--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
>+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
>@@ -981,7 +981,9 @@  int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s 
>*self, struct aq_ring_s *ring)
> 
> 			if (buff->is_lro) {
> 				/* LRO */
>-				buff->next = rxd_wb->next_desc_ptr;
>+				buff->next =
>+					(rxd_wb->next_desc_ptr < ring->size) ?
>+					rxd_wb->next_desc_ptr : 0U;
> 				++ring->stats.rx.lro_packets;
> 			} else {
> 				/* jumbo */

I don’t find this way correct. At least in this functions there is no access to buffers by this index "next".
Following this fix, if it happens then during assembling of LRO session it could be that this buffer (you suggesting to use buffer with index 0) becomes a part of current LRO session and will be also processed as a single packet or as a part of other LRO huge packet.
To be honest there are lot of possibilities depends on current values of head and tail which may cause either memory leak or double free or something else.
There is a code which calls this function aq_ring.c: aq_ring_rx_clean.
From my POV it's better to check that indexes don't point to outside of ring in code which collects LRO session.
There is expectation that "next" is in range "cleaned" descriptors, which means that HW put data there wrote descriptor and buffers are ready to be process by assembling code.
So in case if "next" points to something outside of ring then guess it would be better just to stop collecting these buffers to one huge packet and treat this LRO session as completed.
Then rest of buffers (which should be it that chain) will be collected again without beginning and just dropped by stack later.

> [4/5] net: atlantic: add check for MAX_SKB_FRAGS
>
>diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c 
>b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
>index bc1952131799..8201ce7adb77 100644
>--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
>+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
>@@ -363,6 +363,7 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> 			continue;
> 
> 		if (!buff->is_eop) {
>+			unsigned int frag_cnt = 0U;
> 			buff_ = buff;
> 			do {
> 				bool is_rsc_completed = true;
>@@ -371,6 +372,8 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> 					err = -EIO;
> 					goto err_exit;
> 				}
>+
>+				frag_cnt++;
> 				next_ = buff_->next,
> 				buff_ = &self->buff_ring[next_];
> 				is_rsc_completed =
>@@ -378,7 +381,8 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> 							    next_,
> 							    self->hw_head);
> 
>-				if (unlikely(!is_rsc_completed)) {
>+				if (unlikely(!is_rsc_completed) ||
>+				    frag_cnt > MAX_SKB_FRAGS) {
> 					err = 0;
> 					goto err_exit;
> 				}

Number of fragments are limited by HW configuration: hw_atl_b0_internal.h: #define HW_ATL_B0_LRO_RXD_MAX 16U.
Let's imagine if it happens: driver stucks at this point it will wait for session completion and session will not be completed due to too much fragments.
Guess in case if number of buffers exceeds this limit then it is required to close session and continue (submit packet to stack and finalize clearing if processed descriptors/buffers).

> [5/5] net: atlantic: verify hw_head_ is reasonable diff --git 
>a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c 
>b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
>index e72b9d86f6ad..9b6b93bb3e86 100644
>--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
>+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
>@@ -889,6 +889,27 @@  int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self,
> 		err = -ENXIO;
> 		goto err_exit;
> 	}
>+
>+	/* Validate that the new hw_head_ is reasonable. */
>+	if (hw_head_ >= ring->size) {
>+		err = -ENXIO;
>+		goto err_exit;
>+	}
>+
>+	if (ring->sw_head >= ring->sw_tail) {
>+		/* Head index hasn't wrapped around to below tail index. */
>+		if (hw_head_ < ring->sw_head && hw_head_ >= ring->sw_tail) {
>+			err = -ENXIO;
>+			goto err_exit;
>+		}
>+	} else {
>+		/* Head index has wrapped around and is below tail index. */
>+		if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {
>+			err = -ENXIO;
>+			goto err_exit;
>+		}
>+	}
>+
> 	ring->hw_head = hw_head_;
> 	err = aq_hw_err_from_flags(self);

Simple example. One packet with one buffer was sent. Sw_tail = 1, sw_head=0. From interrupt this function is called and hw_head_ is 1.
Code will follow "else" branch of second "if" that you add and condition "if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {" will be true which seems to be not expected.

Best regards,
Dmitrii Bezrukov

-----Original Message-----
From: Grant Grundler <grundler@chromium.org> 
Sent: Tuesday, April 26, 2022 7:21 PM
To: Igor Russkikh <irusskikh@marvell.com>
Cc: Grant Grundler <grundler@chromium.org>; Dmitrii Bezrukov <dbezrukov@marvell.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; netdev <netdev@vger.kernel.org>; David S . Miller <davem@davemloft.net>; LKML <linux-kernel@vger.kernel.org>; Aashay Shringarpure <aashay@google.com>; Yi Chou <yich@google.com>; Shervin Oloumi <enlightened@google.com>
Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes

[reply-all again since I forgot to tell gmail to post this as "plain text"...grrh... so much for AI figuring this stuff out.]


On Tue, Apr 26, 2022 at 9:00 AM Igor Russkikh <irusskikh@marvell.com> wrote:
>
> Hi Grant,
>
> Sorry for the delay, I was on vacation.
> Thanks for working on this.

Hi Igor!
Very welcome! And yes, I was starting to wonder... but I'm now glad that you didn't review them before you got back. These patches are no reason to ruin a perfectly good vacation. :)

> I'm adding here Dmitrii, to help me review the patches.
> Dmitrii, here is a full series:
>
> https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.
> org_project_netdevbpf_cover_20220418231746.2464800-2D1-2Dgrundler-40ch
> romium.org_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=AliKLBUTg9lFc5sIMTzJt8
> MdPiAgKbsC8IpLIHmdX9w&m=1LeNSCJMgZ7UkGBm56FcvL_Oza8VOX45LEtQf31qPE2KLQ
> cr5Q36aMIUR2DzLhi7&s=fVFxLPRO8K2DYFpGUOggf38nbDFaHKg8aATsjB1TuB0&e=
>
> Grant, I've reviewed and also quite OK with patches 1-4.

Excellent! \o/


> For patch 5 - why do you think we need these extra comparisons with software head/tail?

The ChromeOS security team (CC'd) believes the driver needs to verify "expected behavior". In other words, the driver expects the device to provide new values of tail index which are between [tail,head) ("available to fill").

Your question makes me chuckle because I asked exactly the same question. :D Everyone agrees it is a minimum requirement to verify the index was "in bounds". And I agree it's prudent to verify the device is "well behaved" where we can. I haven't looked at the code enough to know what could go wrong if, for example, the tail index is decremented instead of incremented or a "next fragment" index falls in the "available to fill" range.

However, I didn't run the fuzzer and, for now, I'm taking the ChromeOS security team's word that this check is needed. If you (or Dmitrii) feel strongly the driver can handle malicious or firmware bugs in other ways, I'm not offended if you decline this patch. However, I would be curious what those other mechanisms are.

> From what I see in logic, only the size limiting check is enough there..
>
> Other extra checks are tricky and non intuitive..

Yes, somewhat tricky in the code but conceptually simple: For the RX buffer ring, IIUC, [head,tail) is "CPU to process" and [tail, head) is "available to fill". New tail values should always be in the latter range.

The trickiness comes in because this is a ring buffer and [tail, head) it is equally likely that head =< tail  or head > tail numerically.

If you like, feel free to add comments explaining the ring behavior or ask me to add such a comment (and repost #5). I'm a big fan of documenting non-intuitive things in the code. That way the next person to look at the code can verify the code and the IO device do what the comment claims.

On the RX buffer ring, I'm also wondering if there is a race condition such that the driver uses stale values of the tail pointer when walking the RX fragment lists and validating index values. Aashay assures me this race condition is not possible and I am convinced this is true for the TX buffer ring where the driver is the "producer"
(tells the device what is in the TX ring). I still have to review the RX buffer handling code more and will continue the conversation with him until we agree.

cheers,
grant

>
> Regards,
>   Igor
>
> On 4/21/2022 9:53 PM, Grant Grundler wrote:
> > External Email
> >
> > --------------------------------------------------------------------
> > --
> > Igor,
> > Will you have a chance to comment on this in the near future?
> > Should someone else review/integrate these patches?
> >
> > I'm asking since I've seen no comments in the past three days.
> >
> > cheers,
> > grant
> >
> >
> > On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler 
> > <grundler@chromium.org>
> > wrote:
> >>
> >> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic 
> >> driver in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters 
> >> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
> >>
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.co
> >> m_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7raOp
> >> OlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=DwIBaQ&c=nKjWec2b6R0
> >> mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8Wo
> >> QQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=620jqeS
> >> vQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=
> >>
> >> It essentially describes four problems:
> >> 1) validate rxd_wb->next_desc_ptr before populating buff->next
> >> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
> >> 3) limit iterations handling fragments in aq_ring_rx_clean()
> >> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
> >>
> >> I've added one "clean up" contribution:
> >>     "net: atlantic: reduce scope of is_rsc_complete"
> >>
> >> I tested the "original" patches using chromeos-v5.4 kernel branch:
> >>
> >> https://urldefense.proofpoint.com/v2/url?u=https-3A__chromium-2Drev
> >> iew.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28st
> >> atus-3Aopen-2520OR-2520status-3Amerged-29&d=DwIBaQ&c=nKjWec2b6R0mOy
> >> Paz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-
> >> hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=1a1YwJqrY-
> >> be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=
> >>
> >> The fuzzing team will retest using the chromeos-v5.4 patches and 
> >> the b0 HW.
> >>
> >> I've forward ported those patches to 5.18-rc2 and compiled them but 
> >> am currently unable to test them on 5.18-rc2 kernel (logistics problems).
> >>
> >> I'm confident in all but the last patch:
> >>    "net: atlantic: verify hw_head_ is reasonable"
> >>
> >> Please verify I'm not confusing how ring->sw_head and ring->sw_tail 
> >> are used in hw_atl_b0_hw_ring_tx_head_update().
> >>
> >> Credit largely goes to Chrome OS Fuzzing team members:
> >>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
> >>
> >> cheers,
> >> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-05-03 11:14       ` Dmitrii Bezrukov
@ 2022-05-03 18:07         ` Grant Grundler
  2022-05-04 14:39           ` Dmitrii Bezrukov
  0 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2022-05-03 18:07 UTC (permalink / raw)
  To: Dmitrii Bezrukov
  Cc: Grant Grundler, Igor Russkikh, Jakub Kicinski, Paolo Abeni,
	netdev, David S . Miller, LKML, Aashay Shringarpure, Yi Chou,
	Shervin Oloumi

Hi Dmitrii!

On Tue, May 3, 2022 at 4:15 AM Dmitrii Bezrukov <dbezrukov@marvell.com> wrote:
>
> Hi Grants,
>
> >[1/5] net: atlantic: limit buff_ring index value
>
> >diff --git a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >index d875ce3ec759..e72b9d86f6ad 100644
> >--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >@@ -981,7 +981,9 @@  int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s
> >*self, struct aq_ring_s *ring)
> >
> >                       if (buff->is_lro) {
> >                               /* LRO */
> >-                              buff->next = rxd_wb->next_desc_ptr;
> >+                              buff->next =
> >+                                      (rxd_wb->next_desc_ptr < ring->size) ?
> >+                                      rxd_wb->next_desc_ptr : 0U;
> >                               ++ring->stats.rx.lro_packets;
> >                       } else {
> >                               /* jumbo */
>
> I don’t find this way correct. At least in this functions there is no access to buffers by this index "next".

Did I misunderstand what this code (line 378) is doing in aq_ring.c?
342 #define AQ_SKB_ALIGN SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
343 int aq_ring_rx_clean(struct aq_ring_s *self,
344                      struct napi_struct *napi,
345                      int *work_done,
346                      int budget)
347 {
...
371                                 if (buff_->next >= self->size) {
372                                         err = -EIO;
373                                         goto err_exit;
374                                 }
375
376                                 frag_cnt++;
377                                 next_ = buff_->next,
378                                 buff_ = &self->buff_ring[next_];

My change is redundant with Lines 371-374 - they essentially do the
same thing and were added on
2021-12-26 by 5f50153288452 (Zekun Shen)

The original fuzzing work was done on chromeos-v5.4 kernel and didn't
include this change. I'll backport 5f50153288452t to chromeos-v5.4 and
drop my proposed change.

> Following this fix, if it happens then during assembling of LRO session it could be that this buffer (you suggesting to use buffer with index 0) becomes a part of current LRO session and will be also processed as a single packet or as a part of other LRO huge packet.
> To be honest there are lot of possibilities depends on current values of head and tail which may cause either memory leak or double free or something else.

*nod*

> There is a code which calls this function aq_ring.c: aq_ring_rx_clean.

Exactly.

> From my POV it's better to check that indexes don't point to outside of ring in code which collects LRO session.

Sounds good to me. I don't have a preference. I'm ok with
dropping/ignoring [1/5] patch.

> There is expectation that "next" is in range "cleaned" descriptors, which means that HW put data there wrote descriptor and buffers are ready to be process by assembling code.
> So in case if "next" points to something outside of ring then guess it would be better just to stop collecting these buffers to one huge packet and treat this LRO session as completed.
> Then rest of buffers (which should be it that chain) will be collected again without beginning and just dropped by stack later.

That makes sense to me. And apologies for not noticing Zekun Shen's
2021-12-26 change earlier. I've been working on this off and on for
several months.

> > [4/5] net: atlantic: add check for MAX_SKB_FRAGS
> >
> >diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> >b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> >index bc1952131799..8201ce7adb77 100644
> >--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> >+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> >@@ -363,6 +363,7 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> >                       continue;
> >
> >               if (!buff->is_eop) {
> >+                      unsigned int frag_cnt = 0U;
> >                       buff_ = buff;
> >                       do {
> >                               bool is_rsc_completed = true;
> >@@ -371,6 +372,8 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> >                                       err = -EIO;
> >                                       goto err_exit;
> >                               }
> >+
> >+                              frag_cnt++;
> >                               next_ = buff_->next,
> >                               buff_ = &self->buff_ring[next_];
> >                               is_rsc_completed =
> >@@ -378,7 +381,8 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> >                                                           next_,
> >                                                           self->hw_head);
> >
> >-                              if (unlikely(!is_rsc_completed)) {
> >+                              if (unlikely(!is_rsc_completed) ||
> >+                                  frag_cnt > MAX_SKB_FRAGS) {
> >                                       err = 0;
> >                                       goto err_exit;
> >                               }
>
> Number of fragments are limited by HW configuration: hw_atl_b0_internal.h: #define HW_ATL_B0_LRO_RXD_MAX 16U.

Should this loop be checking against HW_ATL_B0_LRO_RXD_MAX instead?

> Let's imagine if it happens: driver stucks at this point it will wait for session completion and session will not be completed due to too much fragments.
> Guess in case if number of buffers exceeds this limit then it is required to close session and continue (submit packet to stack and finalize clearing if processed descriptors/buffers).

Sorry, I'm not understanding your conclusion. Is the "goto err_exit"
in this case doing what you described?
Does this patch have the right idea (even if it should test against a
different constant)?

My main concern is the CPU gets stuck in this loop for a very long
(infinite?) time.

>
> > [5/5] net: atlantic: verify hw_head_ is reasonable diff --git
> >a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >index e72b9d86f6ad..9b6b93bb3e86 100644
> >--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >@@ -889,6 +889,27 @@  int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self,
> >               err = -ENXIO;
> >               goto err_exit;
> >       }
> >+
> >+      /* Validate that the new hw_head_ is reasonable. */
> >+      if (hw_head_ >= ring->size) {
> >+              err = -ENXIO;
> >+              goto err_exit;
> >+      }
> >+
> >+      if (ring->sw_head >= ring->sw_tail) {
> >+              /* Head index hasn't wrapped around to below tail index. */
> >+              if (hw_head_ < ring->sw_head && hw_head_ >= ring->sw_tail) {
> >+                      err = -ENXIO;
> >+                      goto err_exit;
> >+              }
> >+      } else {
> >+              /* Head index has wrapped around and is below tail index. */
> >+              if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {
> >+                      err = -ENXIO;
> >+                      goto err_exit;
> >+              }
> >+      }
> >+
> >       ring->hw_head = hw_head_;
> >       err = aq_hw_err_from_flags(self);
>
> Simple example. One packet with one buffer was sent. Sw_tail = 1, sw_head=0. From interrupt this function is called and hw_head_ is 1.
> Code will follow "else" branch of second "if" that you add and condition "if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {" will be true which seems to be not expected.

Correct - I've got this wrong (head/tail swapped). Even if I had it
right, As Igor observed, debatable if it's necessary. Please
drop/ignore this patch as well. Aashay and I need to discuss this
more.

thank you again!

cheers,
grant


>
> Best regards,
> Dmitrii Bezrukov
>
> -----Original Message-----
> From: Grant Grundler <grundler@chromium.org>
> Sent: Tuesday, April 26, 2022 7:21 PM
> To: Igor Russkikh <irusskikh@marvell.com>
> Cc: Grant Grundler <grundler@chromium.org>; Dmitrii Bezrukov <dbezrukov@marvell.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; netdev <netdev@vger.kernel.org>; David S . Miller <davem@davemloft.net>; LKML <linux-kernel@vger.kernel.org>; Aashay Shringarpure <aashay@google.com>; Yi Chou <yich@google.com>; Shervin Oloumi <enlightened@google.com>
> Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
>
> [reply-all again since I forgot to tell gmail to post this as "plain text"...grrh... so much for AI figuring this stuff out.]
>
>
> On Tue, Apr 26, 2022 at 9:00 AM Igor Russkikh <irusskikh@marvell.com> wrote:
> >
> > Hi Grant,
> >
> > Sorry for the delay, I was on vacation.
> > Thanks for working on this.
>
> Hi Igor!
> Very welcome! And yes, I was starting to wonder... but I'm now glad that you didn't review them before you got back. These patches are no reason to ruin a perfectly good vacation. :)
>
> > I'm adding here Dmitrii, to help me review the patches.
> > Dmitrii, here is a full series:
> >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.
> > org_project_netdevbpf_cover_20220418231746.2464800-2D1-2Dgrundler-40ch
> > romium.org_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=AliKLBUTg9lFc5sIMTzJt8
> > MdPiAgKbsC8IpLIHmdX9w&m=1LeNSCJMgZ7UkGBm56FcvL_Oza8VOX45LEtQf31qPE2KLQ
> > cr5Q36aMIUR2DzLhi7&s=fVFxLPRO8K2DYFpGUOggf38nbDFaHKg8aATsjB1TuB0&e=
> >
> > Grant, I've reviewed and also quite OK with patches 1-4.
>
> Excellent! \o/
>
>
> > For patch 5 - why do you think we need these extra comparisons with software head/tail?
>
> The ChromeOS security team (CC'd) believes the driver needs to verify "expected behavior". In other words, the driver expects the device to provide new values of tail index which are between [tail,head) ("available to fill").
>
> Your question makes me chuckle because I asked exactly the same question. :D Everyone agrees it is a minimum requirement to verify the index was "in bounds". And I agree it's prudent to verify the device is "well behaved" where we can. I haven't looked at the code enough to know what could go wrong if, for example, the tail index is decremented instead of incremented or a "next fragment" index falls in the "available to fill" range.
>
> However, I didn't run the fuzzer and, for now, I'm taking the ChromeOS security team's word that this check is needed. If you (or Dmitrii) feel strongly the driver can handle malicious or firmware bugs in other ways, I'm not offended if you decline this patch. However, I would be curious what those other mechanisms are.
>
> > From what I see in logic, only the size limiting check is enough there..
> >
> > Other extra checks are tricky and non intuitive..
>
> Yes, somewhat tricky in the code but conceptually simple: For the RX buffer ring, IIUC, [head,tail) is "CPU to process" and [tail, head) is "available to fill". New tail values should always be in the latter range.
>
> The trickiness comes in because this is a ring buffer and [tail, head) it is equally likely that head =< tail  or head > tail numerically.
>
> If you like, feel free to add comments explaining the ring behavior or ask me to add such a comment (and repost #5). I'm a big fan of documenting non-intuitive things in the code. That way the next person to look at the code can verify the code and the IO device do what the comment claims.
>
> On the RX buffer ring, I'm also wondering if there is a race condition such that the driver uses stale values of the tail pointer when walking the RX fragment lists and validating index values. Aashay assures me this race condition is not possible and I am convinced this is true for the TX buffer ring where the driver is the "producer"
> (tells the device what is in the TX ring). I still have to review the RX buffer handling code more and will continue the conversation with him until we agree.
>
> cheers,
> grant
>
> >
> > Regards,
> >   Igor
> >
> > On 4/21/2022 9:53 PM, Grant Grundler wrote:
> > > External Email
> > >
> > > --------------------------------------------------------------------
> > > --
> > > Igor,
> > > Will you have a chance to comment on this in the near future?
> > > Should someone else review/integrate these patches?
> > >
> > > I'm asking since I've seen no comments in the past three days.
> > >
> > > cheers,
> > > grant
> > >
> > >
> > > On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler
> > > <grundler@chromium.org>
> > > wrote:
> > >>
> > >> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic
> > >> driver in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters
> > >> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
> > >>
> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.co
> > >> m_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7raOp
> > >> OlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=DwIBaQ&c=nKjWec2b6R0
> > >> mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8Wo
> > >> QQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=620jqeS
> > >> vQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=
> > >>
> > >> It essentially describes four problems:
> > >> 1) validate rxd_wb->next_desc_ptr before populating buff->next
> > >> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
> > >> 3) limit iterations handling fragments in aq_ring_rx_clean()
> > >> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
> > >>
> > >> I've added one "clean up" contribution:
> > >>     "net: atlantic: reduce scope of is_rsc_complete"
> > >>
> > >> I tested the "original" patches using chromeos-v5.4 kernel branch:
> > >>
> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__chromium-2Drev
> > >> iew.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28st
> > >> atus-3Aopen-2520OR-2520status-3Amerged-29&d=DwIBaQ&c=nKjWec2b6R0mOy
> > >> Paz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQQ-
> > >> hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=1a1YwJqrY-
> > >> be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=
> > >>
> > >> The fuzzing team will retest using the chromeos-v5.4 patches and
> > >> the b0 HW.
> > >>
> > >> I've forward ported those patches to 5.18-rc2 and compiled them but
> > >> am currently unable to test them on 5.18-rc2 kernel (logistics problems).
> > >>
> > >> I'm confident in all but the last patch:
> > >>    "net: atlantic: verify hw_head_ is reasonable"
> > >>
> > >> Please verify I'm not confusing how ring->sw_head and ring->sw_tail
> > >> are used in hw_atl_b0_hw_ring_tx_head_update().
> > >>
> > >> Credit largely goes to Chrome OS Fuzzing team members:
> > >>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
> > >>
> > >> cheers,
> > >> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-05-03 18:07         ` Grant Grundler
@ 2022-05-04 14:39           ` Dmitrii Bezrukov
  2022-05-04 20:11             ` Grant Grundler
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitrii Bezrukov @ 2022-05-04 14:39 UTC (permalink / raw)
  To: Grant Grundler
  Cc: Igor Russkikh, Jakub Kicinski, Paolo Abeni, netdev,
	David S . Miller, LKML, Aashay Shringarpure, Yi Chou,
	Shervin Oloumi

Hi Grant,

> Did I misunderstand what this code (line 378) is doing in aq_ring.c?
Yes it's done there. I meant that in 'hw_atl/hw_atl_b0.c" there is not access to buffer. 
And probably it's not a good place to put this your change there. Due to you are going to drop this 1/5 patch, it doesn’t make any sense anymore.

> *nod*
I'm sorry but I don’t understand what it means.

> Should this loop be checking against HW_ATL_B0_LRO_RXD_MAX instead?
No, that's OK to check with MAX_SKB_FRAGS as you do.

>Sorry, I'm not understanding your conclusion. Is the "goto err_exit"
>in this case doing what you described?
>Does this patch have the right idea (even if it should test against a different constant)?
>
>My main concern is the CPU gets stuck in this loop for a very long
>(infinite?) time.
Yes this is exactly that I meant, that in this case, the condition to push packet to stack will not be ever reached.
So to close session I guess need to set is_rsc_completed to true when number of frags is going to exceed value MAX_SKB_FRAGS, then packet will be built and submitted to stack.
But of course need to check that there will not be any other corner cases with this new change.

Best regards,
Dmitrii Bezrukov

-----Original Message-----
From: Grant Grundler <grundler@chromium.org> 
Sent: Tuesday, May 3, 2022 8:07 PM
To: Dmitrii Bezrukov <dbezrukov@marvell.com>
Cc: Grant Grundler <grundler@chromium.org>; Igor Russkikh <irusskikh@marvell.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; netdev <netdev@vger.kernel.org>; David S . Miller <davem@davemloft.net>; LKML <linux-kernel@vger.kernel.org>; Aashay Shringarpure <aashay@google.com>; Yi Chou <yich@google.com>; Shervin Oloumi <enlightened@google.com>
Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes

Hi Dmitrii!

On Tue, May 3, 2022 at 4:15 AM Dmitrii Bezrukov <dbezrukov@marvell.com> wrote:
>
> Hi Grants,
>
> >[1/5] net: atlantic: limit buff_ring index value
>
> >diff --git 
> >a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >index d875ce3ec759..e72b9d86f6ad 100644
> >--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >@@ -981,7 +981,9 @@  int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s 
> >*self, struct aq_ring_s *ring)
> >
> >                       if (buff->is_lro) {
> >                               /* LRO */
> >-                              buff->next = rxd_wb->next_desc_ptr;
> >+                              buff->next =
> >+                                      (rxd_wb->next_desc_ptr < ring->size) ?
> >+                                      rxd_wb->next_desc_ptr : 0U;
> >                               ++ring->stats.rx.lro_packets;
> >                       } else {
> >                               /* jumbo */
>
> I don’t find this way correct. At least in this functions there is no access to buffers by this index "next".

Did I misunderstand what this code (line 378) is doing in aq_ring.c?
342 #define AQ_SKB_ALIGN SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
343 int aq_ring_rx_clean(struct aq_ring_s *self,
344                      struct napi_struct *napi,
345                      int *work_done,
346                      int budget)
347 {
...
371                                 if (buff_->next >= self->size) {
372                                         err = -EIO;
373                                         goto err_exit;
374                                 }
375
376                                 frag_cnt++;
377                                 next_ = buff_->next,
378                                 buff_ = &self->buff_ring[next_];

My change is redundant with Lines 371-374 - they essentially do the same thing and were added on
2021-12-26 by 5f50153288452 (Zekun Shen)

The original fuzzing work was done on chromeos-v5.4 kernel and didn't include this change. I'll backport 5f50153288452t to chromeos-v5.4 and drop my proposed change.

> Following this fix, if it happens then during assembling of LRO session it could be that this buffer (you suggesting to use buffer with index 0) becomes a part of current LRO session and will be also processed as a single packet or as a part of other LRO huge packet.
> To be honest there are lot of possibilities depends on current values of head and tail which may cause either memory leak or double free or something else.

*nod*

> There is a code which calls this function aq_ring.c: aq_ring_rx_clean.

Exactly.

> From my POV it's better to check that indexes don't point to outside of ring in code which collects LRO session.

Sounds good to me. I don't have a preference. I'm ok with dropping/ignoring [1/5] patch.

> There is expectation that "next" is in range "cleaned" descriptors, which means that HW put data there wrote descriptor and buffers are ready to be process by assembling code.
> So in case if "next" points to something outside of ring then guess it would be better just to stop collecting these buffers to one huge packet and treat this LRO session as completed.
> Then rest of buffers (which should be it that chain) will be collected again without beginning and just dropped by stack later.

That makes sense to me. And apologies for not noticing Zekun Shen's
2021-12-26 change earlier. I've been working on this off and on for several months.

> > [4/5] net: atlantic: add check for MAX_SKB_FRAGS
> >
> >diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> >b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> >index bc1952131799..8201ce7adb77 100644
> >--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> >+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> >@@ -363,6 +363,7 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> >                       continue;
> >
> >               if (!buff->is_eop) {
> >+                      unsigned int frag_cnt = 0U;
> >                       buff_ = buff;
> >                       do {
> >                               bool is_rsc_completed = true; @@ 
> >-371,6 +372,8 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> >                                       err = -EIO;
> >                                       goto err_exit;
> >                               }
> >+
> >+                              frag_cnt++;
> >                               next_ = buff_->next,
> >                               buff_ = &self->buff_ring[next_];
> >                               is_rsc_completed = @@ -378,7 +381,8 @@  
> >int aq_ring_rx_clean(struct aq_ring_s *self,
> >                                                           next_,
> >                                                           
> >self->hw_head);
> >
> >-                              if (unlikely(!is_rsc_completed)) {
> >+                              if (unlikely(!is_rsc_completed) ||
> >+                                  frag_cnt > MAX_SKB_FRAGS) {
> >                                       err = 0;
> >                                       goto err_exit;
> >                               }
>
> Number of fragments are limited by HW configuration: hw_atl_b0_internal.h: #define HW_ATL_B0_LRO_RXD_MAX 16U.

Should this loop be checking against HW_ATL_B0_LRO_RXD_MAX instead?

> Let's imagine if it happens: driver stucks at this point it will wait for session completion and session will not be completed due to too much fragments.
> Guess in case if number of buffers exceeds this limit then it is required to close session and continue (submit packet to stack and finalize clearing if processed descriptors/buffers).

Sorry, I'm not understanding your conclusion. Is the "goto err_exit"
in this case doing what you described?
Does this patch have the right idea (even if it should test against a different constant)?

My main concern is the CPU gets stuck in this loop for a very long
(infinite?) time.

>
> > [5/5] net: atlantic: verify hw_head_ is reasonable diff --git 
> >a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >index e72b9d86f6ad..9b6b93bb3e86 100644
> >--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> >@@ -889,6 +889,27 @@  int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self,
> >               err = -ENXIO;
> >               goto err_exit;
> >       }
> >+
> >+      /* Validate that the new hw_head_ is reasonable. */
> >+      if (hw_head_ >= ring->size) {
> >+              err = -ENXIO;
> >+              goto err_exit;
> >+      }
> >+
> >+      if (ring->sw_head >= ring->sw_tail) {
> >+              /* Head index hasn't wrapped around to below tail index. */
> >+              if (hw_head_ < ring->sw_head && hw_head_ >= ring->sw_tail) {
> >+                      err = -ENXIO;
> >+                      goto err_exit;
> >+              }
> >+      } else {
> >+              /* Head index has wrapped around and is below tail index. */
> >+              if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {
> >+                      err = -ENXIO;
> >+                      goto err_exit;
> >+              }
> >+      }
> >+
> >       ring->hw_head = hw_head_;
> >       err = aq_hw_err_from_flags(self);
>
> Simple example. One packet with one buffer was sent. Sw_tail = 1, sw_head=0. From interrupt this function is called and hw_head_ is 1.
> Code will follow "else" branch of second "if" that you add and condition "if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {" will be true which seems to be not expected.

Correct - I've got this wrong (head/tail swapped). Even if I had it right, As Igor observed, debatable if it's necessary. Please drop/ignore this patch as well. Aashay and I need to discuss this more.

thank you again!

cheers,
grant


>
> Best regards,
> Dmitrii Bezrukov
>
> -----Original Message-----
> From: Grant Grundler <grundler@chromium.org>
> Sent: Tuesday, April 26, 2022 7:21 PM
> To: Igor Russkikh <irusskikh@marvell.com>
> Cc: Grant Grundler <grundler@chromium.org>; Dmitrii Bezrukov 
> <dbezrukov@marvell.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni 
> <pabeni@redhat.com>; netdev <netdev@vger.kernel.org>; David S . Miller 
> <davem@davemloft.net>; LKML <linux-kernel@vger.kernel.org>; Aashay 
> Shringarpure <aashay@google.com>; Yi Chou <yich@google.com>; Shervin 
> Oloumi <enlightened@google.com>
> Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
>
> [reply-all again since I forgot to tell gmail to post this as "plain 
> text"...grrh... so much for AI figuring this stuff out.]
>
>
> On Tue, Apr 26, 2022 at 9:00 AM Igor Russkikh <irusskikh@marvell.com> wrote:
> >
> > Hi Grant,
> >
> > Sorry for the delay, I was on vacation.
> > Thanks for working on this.
>
> Hi Igor!
> Very welcome! And yes, I was starting to wonder... but I'm now glad 
> that you didn't review them before you got back. These patches are no 
> reason to ruin a perfectly good vacation. :)
>
> > I'm adding here Dmitrii, to help me review the patches.
> > Dmitrii, here is a full series:
> >
> > https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.
> > org_project_netdevbpf_cover_20220418231746.2464800-2D1-2Dgrundler-40
> > ch
> > romium.org_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=AliKLBUTg9lFc5sIMTzJ
> > t8 
> > MdPiAgKbsC8IpLIHmdX9w&m=1LeNSCJMgZ7UkGBm56FcvL_Oza8VOX45LEtQf31qPE2K
> > LQ 
> > cr5Q36aMIUR2DzLhi7&s=fVFxLPRO8K2DYFpGUOggf38nbDFaHKg8aATsjB1TuB0&e=
> >
> > Grant, I've reviewed and also quite OK with patches 1-4.
>
> Excellent! \o/
>
>
> > For patch 5 - why do you think we need these extra comparisons with software head/tail?
>
> The ChromeOS security team (CC'd) believes the driver needs to verify "expected behavior". In other words, the driver expects the device to provide new values of tail index which are between [tail,head) ("available to fill").
>
> Your question makes me chuckle because I asked exactly the same question. :D Everyone agrees it is a minimum requirement to verify the index was "in bounds". And I agree it's prudent to verify the device is "well behaved" where we can. I haven't looked at the code enough to know what could go wrong if, for example, the tail index is decremented instead of incremented or a "next fragment" index falls in the "available to fill" range.
>
> However, I didn't run the fuzzer and, for now, I'm taking the ChromeOS security team's word that this check is needed. If you (or Dmitrii) feel strongly the driver can handle malicious or firmware bugs in other ways, I'm not offended if you decline this patch. However, I would be curious what those other mechanisms are.
>
> > From what I see in logic, only the size limiting check is enough there..
> >
> > Other extra checks are tricky and non intuitive..
>
> Yes, somewhat tricky in the code but conceptually simple: For the RX buffer ring, IIUC, [head,tail) is "CPU to process" and [tail, head) is "available to fill". New tail values should always be in the latter range.
>
> The trickiness comes in because this is a ring buffer and [tail, head) it is equally likely that head =< tail  or head > tail numerically.
>
> If you like, feel free to add comments explaining the ring behavior or ask me to add such a comment (and repost #5). I'm a big fan of documenting non-intuitive things in the code. That way the next person to look at the code can verify the code and the IO device do what the comment claims.
>
> On the RX buffer ring, I'm also wondering if there is a race condition such that the driver uses stale values of the tail pointer when walking the RX fragment lists and validating index values. Aashay assures me this race condition is not possible and I am convinced this is true for the TX buffer ring where the driver is the "producer"
> (tells the device what is in the TX ring). I still have to review the RX buffer handling code more and will continue the conversation with him until we agree.
>
> cheers,
> grant
>
> >
> > Regards,
> >   Igor
> >
> > On 4/21/2022 9:53 PM, Grant Grundler wrote:
> > > External Email
> > >
> > > ------------------------------------------------------------------
> > > --
> > > --
> > > Igor,
> > > Will you have a chance to comment on this in the near future?
> > > Should someone else review/integrate these patches?
> > >
> > > I'm asking since I've seen no comments in the past three days.
> > >
> > > cheers,
> > > grant
> > >
> > >
> > > On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler 
> > > <grundler@chromium.org>
> > > wrote:
> > >>
> > >> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic 
> > >> driver in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters 
> > >> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
> > >>
> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.
> > >> co 
> > >> m_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7ra
> > >> Op
> > >> OlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=DwIBaQ&c=nKjWec2b6
> > >> R0 
> > >> mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8
> > >> Wo 
> > >> QQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=620jq
> > >> eS vQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=
> > >>
> > >> It essentially describes four problems:
> > >> 1) validate rxd_wb->next_desc_ptr before populating buff->next
> > >> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
> > >> 3) limit iterations handling fragments in aq_ring_rx_clean()
> > >> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
> > >>
> > >> I've added one "clean up" contribution:
> > >>     "net: atlantic: reduce scope of is_rsc_complete"
> > >>
> > >> I tested the "original" patches using chromeos-v5.4 kernel branch:
> > >>
> > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__chromium-2Dr
> > >> ev 
> > >> iew.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28
> > >> st 
> > >> atus-3Aopen-2520OR-2520status-3Amerged-29&d=DwIBaQ&c=nKjWec2b6R0m
> > >> Oy
> > >> Paz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQ
> > >> Q-
> > >> hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=1a1YwJqr
> > >> Y- be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=
> > >>
> > >> The fuzzing team will retest using the chromeos-v5.4 patches and 
> > >> the b0 HW.
> > >>
> > >> I've forward ported those patches to 5.18-rc2 and compiled them 
> > >> but am currently unable to test them on 5.18-rc2 kernel (logistics problems).
> > >>
> > >> I'm confident in all but the last patch:
> > >>    "net: atlantic: verify hw_head_ is reasonable"
> > >>
> > >> Please verify I'm not confusing how ring->sw_head and 
> > >> ring->sw_tail are used in hw_atl_b0_hw_ring_tx_head_update().
> > >>
> > >> Credit largely goes to Chrome OS Fuzzing team members:
> > >>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
> > >>
> > >> cheers,
> > >> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-05-04 14:39           ` Dmitrii Bezrukov
@ 2022-05-04 20:11             ` Grant Grundler
  2022-05-05  7:11               ` Igor Russkikh
  0 siblings, 1 reply; 15+ messages in thread
From: Grant Grundler @ 2022-05-04 20:11 UTC (permalink / raw)
  To: Dmitrii Bezrukov
  Cc: Grant Grundler, Igor Russkikh, Jakub Kicinski, Paolo Abeni,
	netdev, David S . Miller, LKML, Aashay Shringarpure, Yi Chou,
	Shervin Oloumi

On Wed, May 4, 2022 at 7:39 AM Dmitrii Bezrukov <dbezrukov@marvell.com> wrote:
>
> Hi Grant,
>
> > Did I misunderstand what this code (line 378) is doing in aq_ring.c?
> Yes it's done there. I meant that in 'hw_atl/hw_atl_b0.c" there is not access to buffer.

Ah ok - understood.
> And probably it's not a good place to put this your change there. Due to you are going to drop this 1/5 patch, it doesn’t make any sense anymore.


>
> > *nod*
> I'm sorry but I don’t understand what it means.

:) It's an acknowledgement - You have to imagine me nodding my head in "yes". :)

>
> > Should this loop be checking against HW_ATL_B0_LRO_RXD_MAX instead?
> No, that's OK to check with MAX_SKB_FRAGS as you do.

OK :)

>
> >Sorry, I'm not understanding your conclusion. Is the "goto err_exit"
> >in this case doing what you described?
> >Does this patch have the right idea (even if it should test against a different constant)?
> >
> >My main concern is the CPU gets stuck in this loop for a very long
> >(infinite?) time.
> Yes this is exactly that I meant, that in this case, the condition to push packet to stack will not be ever reached.
> So to close session I guess need to set is_rsc_completed to true when number of frags is going to exceed value MAX_SKB_FRAGS, then packet will be built and submitted to stack.
> But of course need to check that there will not be any other corner cases with this new change.

Ok. Sounds like I should post a v2 then and just drop 1/5 and 5/5
patches.  Will post that tomorrow.

Thanks again!
grant

>
> Best regards,
> Dmitrii Bezrukov
>
> -----Original Message-----
> From: Grant Grundler <grundler@chromium.org>
> Sent: Tuesday, May 3, 2022 8:07 PM
> To: Dmitrii Bezrukov <dbezrukov@marvell.com>
> Cc: Grant Grundler <grundler@chromium.org>; Igor Russkikh <irusskikh@marvell.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni <pabeni@redhat.com>; netdev <netdev@vger.kernel.org>; David S . Miller <davem@davemloft.net>; LKML <linux-kernel@vger.kernel.org>; Aashay Shringarpure <aashay@google.com>; Yi Chou <yich@google.com>; Shervin Oloumi <enlightened@google.com>
> Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
>
> Hi Dmitrii!
>
> On Tue, May 3, 2022 at 4:15 AM Dmitrii Bezrukov <dbezrukov@marvell.com> wrote:
> >
> > Hi Grants,
> >
> > >[1/5] net: atlantic: limit buff_ring index value
> >
> > >diff --git
> > >a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> > >b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> > >index d875ce3ec759..e72b9d86f6ad 100644
> > >--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> > >+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> > >@@ -981,7 +981,9 @@  int hw_atl_b0_hw_ring_rx_receive(struct aq_hw_s
> > >*self, struct aq_ring_s *ring)
> > >
> > >                       if (buff->is_lro) {
> > >                               /* LRO */
> > >-                              buff->next = rxd_wb->next_desc_ptr;
> > >+                              buff->next =
> > >+                                      (rxd_wb->next_desc_ptr < ring->size) ?
> > >+                                      rxd_wb->next_desc_ptr : 0U;
> > >                               ++ring->stats.rx.lro_packets;
> > >                       } else {
> > >                               /* jumbo */
> >
> > I don’t find this way correct. At least in this functions there is no access to buffers by this index "next".
>
> Did I misunderstand what this code (line 378) is doing in aq_ring.c?
> 342 #define AQ_SKB_ALIGN SKB_DATA_ALIGN(sizeof(struct skb_shared_info))
> 343 int aq_ring_rx_clean(struct aq_ring_s *self,
> 344                      struct napi_struct *napi,
> 345                      int *work_done,
> 346                      int budget)
> 347 {
> ...
> 371                                 if (buff_->next >= self->size) {
> 372                                         err = -EIO;
> 373                                         goto err_exit;
> 374                                 }
> 375
> 376                                 frag_cnt++;
> 377                                 next_ = buff_->next,
> 378                                 buff_ = &self->buff_ring[next_];
>
> My change is redundant with Lines 371-374 - they essentially do the same thing and were added on
> 2021-12-26 by 5f50153288452 (Zekun Shen)
>
> The original fuzzing work was done on chromeos-v5.4 kernel and didn't include this change. I'll backport 5f50153288452t to chromeos-v5.4 and drop my proposed change.
>
> > Following this fix, if it happens then during assembling of LRO session it could be that this buffer (you suggesting to use buffer with index 0) becomes a part of current LRO session and will be also processed as a single packet or as a part of other LRO huge packet.
> > To be honest there are lot of possibilities depends on current values of head and tail which may cause either memory leak or double free or something else.
>
> *nod*
>
> > There is a code which calls this function aq_ring.c: aq_ring_rx_clean.
>
> Exactly.
>
> > From my POV it's better to check that indexes don't point to outside of ring in code which collects LRO session.
>
> Sounds good to me. I don't have a preference. I'm ok with dropping/ignoring [1/5] patch.
>
> > There is expectation that "next" is in range "cleaned" descriptors, which means that HW put data there wrote descriptor and buffers are ready to be process by assembling code.
> > So in case if "next" points to something outside of ring then guess it would be better just to stop collecting these buffers to one huge packet and treat this LRO session as completed.
> > Then rest of buffers (which should be it that chain) will be collected again without beginning and just dropped by stack later.
>
> That makes sense to me. And apologies for not noticing Zekun Shen's
> 2021-12-26 change earlier. I've been working on this off and on for several months.
>
> > > [4/5] net: atlantic: add check for MAX_SKB_FRAGS
> > >
> > >diff --git a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> > >b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> > >index bc1952131799..8201ce7adb77 100644
> > >--- a/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> > >+++ b/drivers/net/ethernet/aquantia/atlantic/aq_ring.c
> > >@@ -363,6 +363,7 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> > >                       continue;
> > >
> > >               if (!buff->is_eop) {
> > >+                      unsigned int frag_cnt = 0U;
> > >                       buff_ = buff;
> > >                       do {
> > >                               bool is_rsc_completed = true; @@
> > >-371,6 +372,8 @@  int aq_ring_rx_clean(struct aq_ring_s *self,
> > >                                       err = -EIO;
> > >                                       goto err_exit;
> > >                               }
> > >+
> > >+                              frag_cnt++;
> > >                               next_ = buff_->next,
> > >                               buff_ = &self->buff_ring[next_];
> > >                               is_rsc_completed = @@ -378,7 +381,8 @@
> > >int aq_ring_rx_clean(struct aq_ring_s *self,
> > >                                                           next_,
> > >
> > >self->hw_head);
> > >
> > >-                              if (unlikely(!is_rsc_completed)) {
> > >+                              if (unlikely(!is_rsc_completed) ||
> > >+                                  frag_cnt > MAX_SKB_FRAGS) {
> > >                                       err = 0;
> > >                                       goto err_exit;
> > >                               }
> >
> > Number of fragments are limited by HW configuration: hw_atl_b0_internal.h: #define HW_ATL_B0_LRO_RXD_MAX 16U.
>
> Should this loop be checking against HW_ATL_B0_LRO_RXD_MAX instead?
>
> > Let's imagine if it happens: driver stucks at this point it will wait for session completion and session will not be completed due to too much fragments.
> > Guess in case if number of buffers exceeds this limit then it is required to close session and continue (submit packet to stack and finalize clearing if processed descriptors/buffers).
>
> Sorry, I'm not understanding your conclusion. Is the "goto err_exit"
> in this case doing what you described?
> Does this patch have the right idea (even if it should test against a different constant)?
>
> My main concern is the CPU gets stuck in this loop for a very long
> (infinite?) time.
>
> >
> > > [5/5] net: atlantic: verify hw_head_ is reasonable diff --git
> > >a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> > >b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> > >index e72b9d86f6ad..9b6b93bb3e86 100644
> > >--- a/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> > >+++ b/drivers/net/ethernet/aquantia/atlantic/hw_atl/hw_atl_b0.c
> > >@@ -889,6 +889,27 @@  int hw_atl_b0_hw_ring_tx_head_update(struct aq_hw_s *self,
> > >               err = -ENXIO;
> > >               goto err_exit;
> > >       }
> > >+
> > >+      /* Validate that the new hw_head_ is reasonable. */
> > >+      if (hw_head_ >= ring->size) {
> > >+              err = -ENXIO;
> > >+              goto err_exit;
> > >+      }
> > >+
> > >+      if (ring->sw_head >= ring->sw_tail) {
> > >+              /* Head index hasn't wrapped around to below tail index. */
> > >+              if (hw_head_ < ring->sw_head && hw_head_ >= ring->sw_tail) {
> > >+                      err = -ENXIO;
> > >+                      goto err_exit;
> > >+              }
> > >+      } else {
> > >+              /* Head index has wrapped around and is below tail index. */
> > >+              if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {
> > >+                      err = -ENXIO;
> > >+                      goto err_exit;
> > >+              }
> > >+      }
> > >+
> > >       ring->hw_head = hw_head_;
> > >       err = aq_hw_err_from_flags(self);
> >
> > Simple example. One packet with one buffer was sent. Sw_tail = 1, sw_head=0. From interrupt this function is called and hw_head_ is 1.
> > Code will follow "else" branch of second "if" that you add and condition "if (hw_head_ < ring->sw_head || hw_head_ >= ring->sw_tail) {" will be true which seems to be not expected.
>
> Correct - I've got this wrong (head/tail swapped). Even if I had it right, As Igor observed, debatable if it's necessary. Please drop/ignore this patch as well. Aashay and I need to discuss this more.
>
> thank you again!
>
> cheers,
> grant
>
>
> >
> > Best regards,
> > Dmitrii Bezrukov
> >
> > -----Original Message-----
> > From: Grant Grundler <grundler@chromium.org>
> > Sent: Tuesday, April 26, 2022 7:21 PM
> > To: Igor Russkikh <irusskikh@marvell.com>
> > Cc: Grant Grundler <grundler@chromium.org>; Dmitrii Bezrukov
> > <dbezrukov@marvell.com>; Jakub Kicinski <kuba@kernel.org>; Paolo Abeni
> > <pabeni@redhat.com>; netdev <netdev@vger.kernel.org>; David S . Miller
> > <davem@davemloft.net>; LKML <linux-kernel@vger.kernel.org>; Aashay
> > Shringarpure <aashay@google.com>; Yi Chou <yich@google.com>; Shervin
> > Oloumi <enlightened@google.com>
> > Subject: Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
> >
> > [reply-all again since I forgot to tell gmail to post this as "plain
> > text"...grrh... so much for AI figuring this stuff out.]
> >
> >
> > On Tue, Apr 26, 2022 at 9:00 AM Igor Russkikh <irusskikh@marvell.com> wrote:
> > >
> > > Hi Grant,
> > >
> > > Sorry for the delay, I was on vacation.
> > > Thanks for working on this.
> >
> > Hi Igor!
> > Very welcome! And yes, I was starting to wonder... but I'm now glad
> > that you didn't review them before you got back. These patches are no
> > reason to ruin a perfectly good vacation. :)
> >
> > > I'm adding here Dmitrii, to help me review the patches.
> > > Dmitrii, here is a full series:
> > >
> > > https://urldefense.proofpoint.com/v2/url?u=https-3A__patchwork.kernel.
> > > org_project_netdevbpf_cover_20220418231746.2464800-2D1-2Dgrundler-40
> > > ch
> > > romium.org_&d=DwIFaQ&c=nKjWec2b6R0mOyPaz7xtfQ&r=AliKLBUTg9lFc5sIMTzJ
> > > t8
> > > MdPiAgKbsC8IpLIHmdX9w&m=1LeNSCJMgZ7UkGBm56FcvL_Oza8VOX45LEtQf31qPE2K
> > > LQ
> > > cr5Q36aMIUR2DzLhi7&s=fVFxLPRO8K2DYFpGUOggf38nbDFaHKg8aATsjB1TuB0&e=
> > >
> > > Grant, I've reviewed and also quite OK with patches 1-4.
> >
> > Excellent! \o/
> >
> >
> > > For patch 5 - why do you think we need these extra comparisons with software head/tail?
> >
> > The ChromeOS security team (CC'd) believes the driver needs to verify "expected behavior". In other words, the driver expects the device to provide new values of tail index which are between [tail,head) ("available to fill").
> >
> > Your question makes me chuckle because I asked exactly the same question. :D Everyone agrees it is a minimum requirement to verify the index was "in bounds". And I agree it's prudent to verify the device is "well behaved" where we can. I haven't looked at the code enough to know what could go wrong if, for example, the tail index is decremented instead of incremented or a "next fragment" index falls in the "available to fill" range.
> >
> > However, I didn't run the fuzzer and, for now, I'm taking the ChromeOS security team's word that this check is needed. If you (or Dmitrii) feel strongly the driver can handle malicious or firmware bugs in other ways, I'm not offended if you decline this patch. However, I would be curious what those other mechanisms are.
> >
> > > From what I see in logic, only the size limiting check is enough there..
> > >
> > > Other extra checks are tricky and non intuitive..
> >
> > Yes, somewhat tricky in the code but conceptually simple: For the RX buffer ring, IIUC, [head,tail) is "CPU to process" and [tail, head) is "available to fill". New tail values should always be in the latter range.
> >
> > The trickiness comes in because this is a ring buffer and [tail, head) it is equally likely that head =< tail  or head > tail numerically.
> >
> > If you like, feel free to add comments explaining the ring behavior or ask me to add such a comment (and repost #5). I'm a big fan of documenting non-intuitive things in the code. That way the next person to look at the code can verify the code and the IO device do what the comment claims.
> >
> > On the RX buffer ring, I'm also wondering if there is a race condition such that the driver uses stale values of the tail pointer when walking the RX fragment lists and validating index values. Aashay assures me this race condition is not possible and I am convinced this is true for the TX buffer ring where the driver is the "producer"
> > (tells the device what is in the TX ring). I still have to review the RX buffer handling code more and will continue the conversation with him until we agree.
> >
> > cheers,
> > grant
> >
> > >
> > > Regards,
> > >   Igor
> > >
> > > On 4/21/2022 9:53 PM, Grant Grundler wrote:
> > > > External Email
> > > >
> > > > ------------------------------------------------------------------
> > > > --
> > > > --
> > > > Igor,
> > > > Will you have a chance to comment on this in the near future?
> > > > Should someone else review/integrate these patches?
> > > >
> > > > I'm asking since I've seen no comments in the past three days.
> > > >
> > > > cheers,
> > > > grant
> > > >
> > > >
> > > > On Mon, Apr 18, 2022 at 4:17 PM Grant Grundler
> > > > <grundler@chromium.org>
> > > > wrote:
> > > >>
> > > >> The Chrome OS fuzzing team posted a "Fuzzing" report for atlantic
> > > >> driver in Q4 2021 using Chrome OS v5.4 kernel and "Cable Matters
> > > >> Thunderbolt 3 to 10 Gb Ethernet" (b0 version):
> > > >>
> > > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__docs.google.
> > > >> co
> > > >> m_document_d_e_2PACX-2D1vT4oCGNhhy-5FAuUqpu6NGnW0N9HF-5Fjxf2kS7ra
> > > >> Op
> > > >> OlNRqJNiTHAtjiHRthXYSeXIRTgfeVvsEt0qK9qK_pub&d=DwIBaQ&c=nKjWec2b6
> > > >> R0
> > > >> mOyPaz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8
> > > >> Wo
> > > >> QQ-hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=620jq
> > > >> eS vQrGg6aotI35cWwQpjaL94s7TFeFh2cYSyvA&e=
> > > >>
> > > >> It essentially describes four problems:
> > > >> 1) validate rxd_wb->next_desc_ptr before populating buff->next
> > > >> 2) "frag[0] not initialized" case in aq_ring_rx_clean()
> > > >> 3) limit iterations handling fragments in aq_ring_rx_clean()
> > > >> 4) validate hw_head_ in hw_atl_b0_hw_ring_tx_head_update()
> > > >>
> > > >> I've added one "clean up" contribution:
> > > >>     "net: atlantic: reduce scope of is_rsc_complete"
> > > >>
> > > >> I tested the "original" patches using chromeos-v5.4 kernel branch:
> > > >>
> > > >> https://urldefense.proofpoint.com/v2/url?u=https-3A__chromium-2Dr
> > > >> ev
> > > >> iew.googlesource.com_q_hashtag-3Apcinet-2Datlantic-2D2022q1-2B-28
> > > >> st
> > > >> atus-3Aopen-2520OR-2520status-3Amerged-29&d=DwIBaQ&c=nKjWec2b6R0m
> > > >> Oy
> > > >> Paz7xtfQ&r=3kUjVPjrPMvlbd3rzgP63W0eewvCq4D-kzQRqaXHOqU&m=QoxR8WoQ
> > > >> Q-
> > > >> hpWu_tThQydP3-6zkRWACvRmj_7aY1qo2FG6DdPdI86vAYrfKQFMHX&s=1a1YwJqr
> > > >> Y- be2oDgGAG5oOyZDnqIok_2p5G-N8djo2I&e=
> > > >>
> > > >> The fuzzing team will retest using the chromeos-v5.4 patches and
> > > >> the b0 HW.
> > > >>
> > > >> I've forward ported those patches to 5.18-rc2 and compiled them
> > > >> but am currently unable to test them on 5.18-rc2 kernel (logistics problems).
> > > >>
> > > >> I'm confident in all but the last patch:
> > > >>    "net: atlantic: verify hw_head_ is reasonable"
> > > >>
> > > >> Please verify I'm not confusing how ring->sw_head and
> > > >> ring->sw_tail are used in hw_atl_b0_hw_ring_tx_head_update().
> > > >>
> > > >> Credit largely goes to Chrome OS Fuzzing team members:
> > > >>     Aashay Shringarpure, Yi Chou, Shervin Oloumi
> > > >>
> > > >> cheers,
> > > >> grant

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-05-04 20:11             ` Grant Grundler
@ 2022-05-05  7:11               ` Igor Russkikh
  2022-05-05 20:57                 ` Grant Grundler
  0 siblings, 1 reply; 15+ messages in thread
From: Igor Russkikh @ 2022-05-05  7:11 UTC (permalink / raw)
  To: Grant Grundler, Dmitrii Bezrukov
  Cc: Jakub Kicinski, Paolo Abeni, netdev, David S . Miller, LKML,
	Aashay Shringarpure, Yi Chou, Shervin Oloumi


Hi Grant and Dmitrii,

>> So to close session I guess need to set is_rsc_completed to true when 
>> number of frags is going to exceed value MAX_SKB_FRAGS, then packet will 
>> be built and submitted to stack.
>> But of course need to check that there will not be any other corner cases 
>> with this new change.
> 
> Ok. Sounds like I should post a v2 then and just drop 1/5 and 5/5
> patches.  Will post that tomorrow.

I think the part with check `hw_head_ >= ring->size` still can be used safely (patch 5).

For patch 1 - I agree it may make things worse, so either drop or think on how to interpret invalid `next` and stop LRO session.

Thanks,
   Igor

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [EXT] Re: [PATCH 0/5] net: atlantic: more fuzzing fixes
  2022-05-05  7:11               ` Igor Russkikh
@ 2022-05-05 20:57                 ` Grant Grundler
  0 siblings, 0 replies; 15+ messages in thread
From: Grant Grundler @ 2022-05-05 20:57 UTC (permalink / raw)
  To: Igor Russkikh
  Cc: Grant Grundler, Dmitrii Bezrukov, Jakub Kicinski, Paolo Abeni,
	netdev, David S . Miller, LKML, Aashay Shringarpure, Yi Chou,
	Shervin Oloumi

On Thu, May 5, 2022 at 12:11 AM Igor Russkikh <irusskikh@marvell.com> wrote:
>
>
> Hi Grant and Dmitrii,
>
> >> So to close session I guess need to set is_rsc_completed to true when
> >> number of frags is going to exceed value MAX_SKB_FRAGS, then packet will
> >> be built and submitted to stack.
> >> But of course need to check that there will not be any other corner cases
> >> with this new change.
> >
> > Ok. Sounds like I should post a v2 then and just drop 1/5 and 5/5
> > patches.  Will post that tomorrow.
>
> I think the part with check `hw_head_ >= ring->size` still can be used safely (patch 5).

Ok - I'll rewrite 5/5 to only include this hunk.

> For patch 1 - I agree it may make things worse, so either drop or think on how to interpret invalid `next` and stop LRO session.

I'll drop the proposed patch for now and discuss with Aashay (ChromeOS
security) more.

cheers,
grant

>
> Thanks,
>    Igor

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-05-05 20:57 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-18 23:17 [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
2022-04-18 23:17 ` [PATCH 1/5] net: atlantic: limit buff_ring index value Grant Grundler
2022-04-18 23:17 ` [PATCH 2/5] net: atlantic: fix "frag[0] not initialized" Grant Grundler
2022-04-18 23:17 ` [PATCH 3/5] net: atlantic: reduce scope of is_rsc_complete Grant Grundler
2022-04-18 23:17 ` [PATCH 4/5] net: atlantic: add check for MAX_SKB_FRAGS Grant Grundler
2022-04-18 23:17 ` [PATCH 5/5] net: atlantic: verify hw_head_ is reasonable Grant Grundler
2022-04-21 19:53 ` [PATCH 0/5] net: atlantic: more fuzzing fixes Grant Grundler
2022-04-26 16:00   ` [EXT] " Igor Russkikh
2022-04-26 17:20     ` Grant Grundler
2022-05-03 11:14       ` Dmitrii Bezrukov
2022-05-03 18:07         ` Grant Grundler
2022-05-04 14:39           ` Dmitrii Bezrukov
2022-05-04 20:11             ` Grant Grundler
2022-05-05  7:11               ` Igor Russkikh
2022-05-05 20:57                 ` Grant Grundler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.