All of lore.kernel.org
 help / color / mirror / Atom feed
From: John Fastabend <john.fastabend@gmail.com>
To: Maciej Fijalkowski <maciej.fijalkowski@intel.com>,
	 bpf@vger.kernel.org,  ast@kernel.org,  daniel@iogearbox.net,
	 andrii@kernel.org
Cc: netdev@vger.kernel.org,  magnus.karlsson@intel.com,
	 bjorn@kernel.org,  maciej.fijalkowski@intel.com,
	 echaudro@redhat.com,  lorenzo@kernel.org,
	 tirthendu.sarkar@intel.com
Subject: RE: [PATCH v3 bpf 2/4] xsk: fix usage of multi-buffer BPF helpers for ZC XDP
Date: Wed, 03 Jan 2024 12:48:10 -0800	[thread overview]
Message-ID: <6595c80a83de0_256122088a@john.notmuch> (raw)
In-Reply-To: <20231221132656.384606-3-maciej.fijalkowski@intel.com>

Maciej Fijalkowski wrote:
> Currently when packet is shrunk via bpf_xdp_adjust_tail(), null ptr
> dereference happens:
> 
> [1136314.192256] BUG: kernel NULL pointer dereference, address:
> 0000000000000034
> [1136314.203943] #PF: supervisor read access in kernel mode
> [1136314.213768] #PF: error_code(0x0000) - not-present page
> [1136314.223550] PGD 0 P4D 0
> [1136314.230684] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [1136314.239621] CPU: 8 PID: 54203 Comm: xdpsock Not tainted 6.6.0+ #257
> [1136314.250469] Hardware name: Intel Corporation S2600WFT/S2600WFT,
> BIOS SE5C620.86B.02.01.0008.031920191559 03/19/2019
> [1136314.265615] RIP: 0010:__xdp_return+0x6c/0x210
> [1136314.274653] Code: ad 00 48 8b 47 08 49 89 f8 a8 01 0f 85 9b 01 00 00 0f 1f 44 00 00 f0 41 ff 48 34 75 32 4c 89 c7 e9 79 cd 80 ff 83 fe 03 75 17 <f6> 41 34 01 0f 85 02 01 00 00 48 89 cf e9 22 cc 1e 00 e9 3d d2 86
> [1136314.302907] RSP: 0018:ffffc900089f8db0 EFLAGS: 00010246
> [1136314.312967] RAX: ffffc9003168aed0 RBX: ffff8881c3300000 RCX:
> 0000000000000000
> [1136314.324953] RDX: 0000000000000000 RSI: 0000000000000003 RDI:
> ffffc9003168c000
> [1136314.336929] RBP: 0000000000000ae0 R08: 0000000000000002 R09:
> 0000000000010000
> [1136314.348844] R10: ffffc9000e495000 R11: 0000000000000040 R12:
> 0000000000000001
> [1136314.360706] R13: 0000000000000524 R14: ffffc9003168aec0 R15:
> 0000000000000001
> [1136314.373298] FS:  00007f8df8bbcb80(0000) GS:ffff8897e0e00000(0000)
> knlGS:0000000000000000
> [1136314.386105] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [1136314.396532] CR2: 0000000000000034 CR3: 00000001aa912002 CR4:
> 00000000007706f0
> [1136314.408377] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [1136314.420173] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [1136314.431890] PKRU: 55555554
> [1136314.439143] Call Trace:
> [1136314.446058]  <IRQ>
> [1136314.452465]  ? __die+0x20/0x70
> [1136314.459881]  ? page_fault_oops+0x15b/0x440
> [1136314.468305]  ? exc_page_fault+0x6a/0x150
> [1136314.476491]  ? asm_exc_page_fault+0x22/0x30
> [1136314.484927]  ? __xdp_return+0x6c/0x210
> [1136314.492863]  bpf_xdp_adjust_tail+0x155/0x1d0
> [1136314.501269]  bpf_prog_ccc47ae29d3b6570_xdp_sock_prog+0x15/0x60
> [1136314.511263]  ice_clean_rx_irq_zc+0x206/0xc60 [ice]
> [1136314.520222]  ? ice_xmit_zc+0x6e/0x150 [ice]
> [1136314.528506]  ice_napi_poll+0x467/0x670 [ice]
> [1136314.536858]  ? ttwu_do_activate.constprop.0+0x8f/0x1a0
> [1136314.546010]  __napi_poll+0x29/0x1b0
> [1136314.553462]  net_rx_action+0x133/0x270
> [1136314.561619]  __do_softirq+0xbe/0x28e
> [1136314.569303]  do_softirq+0x3f/0x60
> 
> This comes from __xdp_return() call with xdp_buff argument passed as
> NULL which is supposed to be consumed by xsk_buff_free() call.
> 
> To address this properly, in ZC case, a node that represents the frag
> being removed has to be pulled out of xskb_list. Introduce

hmm it looks like xsk_buff_free() called by __xdp_return would
pull the frag out of the xskb_list? Or am I wrong?

Then the issue is primarily the NULL handling?

> appriopriate xsk helpers to do such node operation and use them
> accordingly within bpf_xdp_adjust_tail().
> 
> Fixes: 24ea50127ecf ("xsk: support mbuf on ZC RX")
> Acked-by: Magnus Karlsson <magnus.karlsson@intel.com> # For the xsk header part
> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> ---
>  include/net/xdp_sock_drv.h | 26 +++++++++++++++++++++
>  net/core/filter.c          | 48 +++++++++++++++++++++++++++++++-------
>  2 files changed, 65 insertions(+), 9 deletions(-)
> 
> diff --git a/include/net/xdp_sock_drv.h b/include/net/xdp_sock_drv.h
> index b62bb8525a5f..3d35ac0f838b 100644
> --- a/include/net/xdp_sock_drv.h
> +++ b/include/net/xdp_sock_drv.h
> @@ -159,6 +159,23 @@ static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first)
>  	return ret;
>  }
>  
> +static inline void xsk_buff_del_tail(struct xdp_buff *tail)
> +{
> +	struct xdp_buff_xsk *xskb = container_of(tail, struct xdp_buff_xsk, xdp);
> +
> +	list_del(&xskb->xskb_list_node);
> +}
> +
> +static inline struct xdp_buff *xsk_buff_get_tail(struct xdp_buff *first)
> +{
> +	struct xdp_buff_xsk *xskb = container_of(first, struct xdp_buff_xsk, xdp);
> +	struct xdp_buff_xsk *frag;
> +
> +	frag = list_last_entry(&xskb->pool->xskb_list, struct xdp_buff_xsk,
> +			       xskb_list_node);
> +	return &frag->xdp;
> +}
> +
>  static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
>  {
>  	xdp->data = xdp->data_hard_start + XDP_PACKET_HEADROOM;
> @@ -350,6 +367,15 @@ static inline struct xdp_buff *xsk_buff_get_frag(struct xdp_buff *first)
>  	return NULL;
>  }
>  
> +static inline void xsk_buff_del_tail(struct xdp_buff *tail)
> +{
> +}
> +
> +static inline struct xdp_buff *xsk_buff_get_tail(struct xdp_buff *first)
> +{
> +	return NULL;
> +}
> +
>  static inline void xsk_buff_set_size(struct xdp_buff *xdp, u32 size)
>  {
>  }
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 24061f29c9dd..1e20196687fd 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -83,6 +83,7 @@
>  #include <net/netfilter/nf_conntrack_bpf.h>
>  #include <net/netkit.h>
>  #include <linux/un.h>
> +#include <net/xdp_sock_drv.h>
>  
>  #include "dev.h"
>  
> @@ -4096,6 +4097,42 @@ static int bpf_xdp_frags_increase_tail(struct xdp_buff *xdp, int offset)
>  	return 0;
>  }
>  
> +static void __shrink_data(struct xdp_buff *xdp, struct xdp_mem_info *mem_info,
> +			  skb_frag_t *frag, int shrink)
> +{
> +	if (mem_info->type == MEM_TYPE_XSK_BUFF_POOL) {
> +		struct xdp_buff *tail = xsk_buff_get_tail(xdp);
> +
> +		if (tail)
> +			tail->data_end -= shrink;
> +	}
> +	skb_frag_size_sub(frag, shrink);
> +}
> +
> +static bool shrink_data(struct xdp_buff *xdp, skb_frag_t *frag, int shrink)
> +{
> +	struct xdp_mem_info *mem_info = &xdp->rxq->mem;
> +
> +	if (skb_frag_size(frag) == shrink) {
> +		struct page *page = skb_frag_page(frag);
> +		struct xdp_buff *zc_frag = NULL;
> +
> +		if (mem_info->type == MEM_TYPE_XSK_BUFF_POOL) {
> +			zc_frag = xsk_buff_get_tail(xdp);
> +
> +			if (zc_frag) {
> +				xdp_buff_clear_frags_flag(zc_frag);
> +				xsk_buff_del_tail(zc_frag);
> +			}
> +		}

Should this be fixed in xdp_return instead of here? The xdp_return
is doing what xsk_buff_del_tail() does. If we also called clear_frags
there could this be simpler?

 if (skb_frag_size(frag) == shrink) {
	struct page *page = skb_frag_page(frag);

	__xdp_return(page_address(page), mem_info, false, xsk_buff_get_tail(xdp));
 } else {
   __shrink_data(xdp, mem_info, frag, shrink);
 }

the return will need to have an unlikely(!xdp) to guard the case it
might be NULL, but also not sure if we would ever expect a NULL
here if MEM_TYPE_XSK_BUFF_POOL so you might skip that unlikely
as well?

> +
> +		__xdp_return(page_address(page), mem_info, false, zc_frag);
> +		return true;
> +	}
> +	__shrink_data(xdp, mem_info, frag, shrink);
> +	return false;
> +}
> +
>  static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset)
>  {
>  	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
> @@ -4110,17 +4147,10 @@ static int bpf_xdp_frags_shrink_tail(struct xdp_buff *xdp, int offset)
>  
>  		len_free += shrink;
>  		offset -= shrink;
> -
> -		if (skb_frag_size(frag) == shrink) {
> -			struct page *page = skb_frag_page(frag);
> -

And then I likely would avoid the helper altogether? And code
example above just lands here?

> -			__xdp_return(page_address(page), &xdp->rxq->mem,
> -				     false, NULL);
> +		if (shrink_data(xdp, frag, shrink))
>  			n_frags_free++;
> -		} else {
> -			skb_frag_size_sub(frag, shrink);
> +		else
>  			break;
> -		}
>  	}

I think the fix can be more straight-forward if we just populate
the NULL field with the xdp_buff using the get_tail() helper
created above.

>  	sinfo->nr_frags -= n_frags_free;
>  	sinfo->xdp_frags_size -= len_free;
> -- 
> 2.34.1
> 
> 



  parent reply	other threads:[~2024-01-03 20:48 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-21 13:26 [PATCH v3 bpf 0/4] net: bpf_xdp_adjust_tail() fixes Maciej Fijalkowski
2023-12-21 13:26 ` [PATCH v3 bpf 1/4] xsk: recycle buffer in case Rx queue was full Maciej Fijalkowski
2023-12-21 13:26 ` [PATCH v3 bpf 2/4] xsk: fix usage of multi-buffer BPF helpers for ZC XDP Maciej Fijalkowski
2024-01-02 22:58   ` Martin KaFai Lau
2024-01-03 12:04     ` Maciej Fijalkowski
2024-01-03 22:53       ` Martin KaFai Lau
2024-01-04 20:23         ` Maciej Fijalkowski
2024-01-03 20:48   ` John Fastabend [this message]
2024-01-04 20:18     ` Maciej Fijalkowski
2023-12-21 13:26 ` [PATCH v3 bpf 3/4] ice: work on pre-XDP prog frag count Maciej Fijalkowski
2023-12-21 13:26 ` [PATCH v3 bpf 4/4] i40e: handle multi-buffer packets that are shrunk by xdp prog Maciej Fijalkowski

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6595c80a83de0_256122088a@john.notmuch \
    --to=john.fastabend@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bjorn@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=echaudro@redhat.com \
    --cc=lorenzo@kernel.org \
    --cc=maciej.fijalkowski@intel.com \
    --cc=magnus.karlsson@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=tirthendu.sarkar@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.