Re: [EXT] [PATCH] aquantia: Reserve space when allocating an SKB

From: Igor Russkikh <irusskikh@marvell.com>
To: "Ramsay, Lincoln" <Lincoln.Ramsay@digi.com>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>, <netdev@vger.kernel.org>,
	Dmitry Bogdanov <dbogdanov@marvell.com>
Subject: Re: [EXT] [PATCH] aquantia: Reserve space when allocating an SKB
Date: Wed, 18 Nov 2020 17:02:49 +0300
Message-ID: <2b392026-c077-2871-3492-eb5ddd582422@marvell.com>
In-Reply-To: <CY4PR1001MB23118EE23F7F5196817B8B2EE8E10@CY4PR1001MB2311.namprd10.prod.outlook.com>

Hi Ramsay,

> When performing IPv6 forwarding, there is an expectation that SKBs
> will have some headroom. When forwarding a packet from the aquantia
> driver, this does not always happen, triggering a kernel warning.
> 
> It was observed that napi_alloc_skb and other ethernet drivers
> reserve (NET_SKB_PAD + NET_IP_ALIGN) bytes in new SKBs. Do this
> when calling build_skb as well.

Thanks for the analysis, but I think the solution you propose is invalid.

> After much hunting and debugging, I think I have figured out the issue
> here.
> 
> aq_ring.c has this code (edited slightly for brevity):
> 
> if (buff->is_eop && buff->len <= AQ_CFG_RX_FRAME_MAX - AQ_SKB_ALIGN) {
>     skb = build_skb(aq_buf_vaddr(&buff->rxdata), AQ_CFG_RX_FRAME_MAX);
>     skb_put(skb, buff->len);
> } else {
>     skb = napi_alloc_skb(napi, AQ_CFG_RX_HDR_SIZE);
> 
> There is a significant difference between the SKB produced by these 2 code
> paths. When napi_alloc_skb creates an SKB, there is a certain amount of
> headroom reserved. The same pattern appears to be used in all of the other
> ethernet drivers I have looked at. However, this is not done in the
> build_skb codepath.

...

> -	rxpage->pg_off = 0;
> +	rxpage->pg_off = AQ_SKB_PAD;
> 
>  	return 0;
> 
> @@ -67,8 +69,8 @@ static int aq_get_rxpages(struct aq_ring_s *self, struct aq_ring_buff_s *rxbuf,
>  		/* One means ring is the only user and can reuse */
>  		if (page_ref_count(rxbuf->rxdata.page) > 1) {
>  			/* Try reuse buffer */
> -			rxbuf->rxdata.pg_off += AQ_CFG_RX_FRAME_MAX;
> -			if (rxbuf->rxdata.pg_off + AQ_CFG_RX_FRAME_MAX <=
> +			rxbuf->rxdata.pg_off += AQ_CFG_RX_FRAME_MAX + AQ_SKB_PAD;
> +			if (rxbuf->rxdata.pg_off + AQ_CFG_RX_FRAME_MAX + AQ_SKB_PAD <=
>  				(PAGE_SIZE << order)) {

Here I understand your intention. You are trying to "offset" the placement of
the packet data, and then restore it when constructing the SKB.

The problem, however, is that the hardware is programmed with a fixed
descriptor buffer size for placement, equal to AQ_CFG_RX_FRAME_MAX (2K by
default).

This means the HW will write up to 2K of packet data into a single
descriptor's buffer and then, if that is not enough, continue into the next
descriptor's buffer.

With your solution, packets with sizes from (AQ_CFG_RX_FRAME_MAX - AQ_SKB_PAD)
up to AQ_CFG_RX_FRAME_MAX will overrun the area of the page designated for
them. Ultimately, the HW will corrupt memory in the next page.
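
To make the arithmetic concrete, here is a rough userspace sketch, not driver
code. It assumes each buffer owns a slot of exactly AQ_CFG_RX_FRAME_MAX bytes
and that the HW starts writing at an offset of "pad" bytes into that slot.
SKB_PAD (64) is a hypothetical stand-in for AQ_SKB_PAD, and slot_overrun() is
a made-up helper name.

```c
#include <stdio.h>

#define RX_FRAME_MAX 2048u  /* AQ_CFG_RX_FRAME_MAX, 2K by default */
#define SKB_PAD      64u    /* hypothetical stand-in for AQ_SKB_PAD */

/*
 * Number of bytes written past the end of a RX_FRAME_MAX-sized slot when
 * the HW writes pkt_len bytes starting at offset `pad` inside that slot.
 * Returns 0 when the write stays inside the slot.
 */
static unsigned int slot_overrun(unsigned int pad, unsigned int pkt_len)
{
	unsigned int end = pad + pkt_len;

	return end > RX_FRAME_MAX ? end - RX_FRAME_MAX : 0;
}

/* Print the overrun for one packet length, with and without the pad. */
static void report(unsigned int pkt_len)
{
	printf("len %4u: overrun %2u with pad, %u without\n",
	       pkt_len,
	       slot_overrun(SKB_PAD, pkt_len),
	       slot_overrun(0, pkt_len));
}
```

Under these assumptions, calling report() for lengths 1500 and 2048 shows no
overrun for the first and an overrun of SKB_PAD bytes for the second, once
the write start is shifted by the pad.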

The limitation here is that we can't program the HW buffer size with a
granularity finer than 1K.

I think the only acceptable solution here would be to remove the optimized
build_skb path and keep only napi_alloc_skb. Alternatively, we could keep it
under some configuration condition (which is also not good).

So far I can't imagine any other good solution.

HW also supports header split, which could be used to keep the optimized
path, but that's not an easy thing to implement.

Regards,
  Igor