> >  static __always_inline void
> > @@ -116,7 +120,8 @@ struct xdp_frame {
> >  	u16 len;
> >  	u16 headroom;
> >  	u32 metasize:8;
> > -	u32 frame_sz:24;
> > +	u32 frame_sz:23;
> > +	u32 mb:1; /* xdp non-linear frame */
> >  	/* Lifetime of xdp_rxq_info is limited to NAPI/enqueue time,
> >  	 * while mem info is valid on remote CPU.
> >  	 */
>
> So, it seems that these bitfields are the root cause of the
> performance regression. Credit to Alexei, who wisely already pointed
> this out[1] in V2 ;-)
>
> [1] https://lore.kernel.org/netdev/20200904010705.jm6dnuyj3oq4cpjd@ast-mbp.dhcp.thefacebook.com/

yes, shame on me.. yesterday I recalled the email from Alexei debugging the
issue reported by Magnus.
In the approach I am currently testing (not posted upstream yet) I reduced
the size of xdp_mem_info, as proposed by Jesper in [0], and added a flags
field to xdp_frame/xdp_buff that we can use for multiple features (e.g.
multi-buff or hw csum hints); a rough sketch of this layout is appended
below, after the quoted hunks.
Doing so, running the xdp_rxq_info sample on an ixgbe 10Gbps NIC, I do not
see any performance regression for xdp_tx or xdp_drop. Magnus reported the
same results off-list on i40e (with a ~1% regression on the xdp_sock test,
iiuc). I will continue working on this.

Regards,
Lorenzo

[0] https://patchwork.kernel.org/project/netdevbpf/patch/20210409223801.104657-2-mcroce@linux.microsoft.com/

> >
> > @@ -179,6 +184,7 @@ void xdp_convert_frame_to_buff(struct xdp_frame *frame, struct xdp_buff *xdp)
> >  	xdp->data_end = frame->data + frame->len;
> >  	xdp->data_meta = frame->data - frame->metasize;
> >  	xdp->frame_sz = frame->frame_sz;
> > +	xdp->mb = frame->mb;
> >  }
> >
> >  static inline
> > @@ -205,6 +211,7 @@ int xdp_update_frame_from_buff(struct xdp_buff *xdp,
> >  	xdp_frame->headroom = headroom - sizeof(*xdp_frame);
> >  	xdp_frame->metasize = metasize;
> >  	xdp_frame->frame_sz = xdp->frame_sz;
> > +	xdp_frame->mb = xdp->mb;
> >
> >  	return 0;
> >  }
>
> --
> Best regards,
> Jesper Dangaard Brouer
> MSc.CS, Principal Kernel Engineer at Red Hat
> LinkedIn: http://www.linkedin.com/in/brouer
>
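
To make the flags-field idea above a bit more concrete, here is a minimal
sketch of what such a layout could look like. It is only an illustration of
what Lorenzo describes, not the patch he is testing: the flag and helper
names (XDP_FLAGS_MULTI_BUFF, XDP_FLAGS_HW_CSUM, xdp_buff_is_mb, ...) are
invented for this sketch, kernel-internal context (include/net/xdp.h) is
assumed, and only the fields touched by the change are spelled out.

#include <linux/types.h>

#define XDP_FLAGS_MULTI_BUFF	(1U << 0)	/* xdp non-linear frame */
#define XDP_FLAGS_HW_CSUM	(1U << 1)	/* hypothetical csum hint */

struct xdp_buff {
	u32 frame_sz;	/* unchanged */
	u32 flags;	/* new: XDP_FLAGS_* bits instead of a mb:1 bitfield */
	/* data, data_end, data_meta, rxq, ... as in include/net/xdp.h */
};

struct xdp_frame {
	u16 len;
	u16 headroom;
	u32 metasize:8;
	u32 frame_sz:24;	/* back to 24 bits, as before the quoted hunk */
	u32 flags;		/* same XDP_FLAGS_* bits; the extra 4 bytes are
				 * paid for by shrinking xdp_mem_info as in [0]
				 */
	/* data, mem, dev_rx, ... as in include/net/xdp.h */
};

static inline bool xdp_buff_is_mb(const struct xdp_buff *xdp)
{
	return !!(xdp->flags & XDP_FLAGS_MULTI_BUFF);
}

static inline void xdp_buff_set_mb(struct xdp_buff *xdp)
{
	xdp->flags |= XDP_FLAGS_MULTI_BUFF;
}

/* xdp_convert_frame_to_buff()/xdp_update_frame_from_buff() would then copy
 * the whole word instead of a single bit:
 *
 *	xdp->flags = frame->flags;
 *	...
 *	xdp_frame->flags = xdp->flags;
 */

The idea is simply that the multi-buffer bit no longer has to be packed into
the same word as frame_sz, and the spare flag bits leave room for later
hints such as hw checksum, which matches what Lorenzo describes testing.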