From mboxrd@z Thu Jan 1 00:00:00 1970 From: =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= Subject: Re: [RFC PATCH 01/14] packet: introduce AF_PACKET V4 userspace API Date: Thu, 2 Nov 2017 17:47:01 +0100 Message-ID: References: <20171031124145.9667-1-bjorn.topel@gmail.com> <20171031124145.9667-2-bjorn.topel@gmail.com> <34090015-d061-8f7a-18e3-c8c67ade800f@oracle.com> Mime-Version: 1.0 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Cc: Willem de Bruijn , "Karlsson, Magnus" , Alexander Duyck , Alexander Duyck , John Fastabend , Alexei Starovoitov , Jesper Dangaard Brouer , michael.lundkvist@ericsson.com, ravineet.singh@ericsson.com, Daniel Borkmann , Network Development , =?UTF-8?B?QmrDtnJuIFTDtnBlbA==?= , jesse.brandeburg@intel.com, anjali.singhai@intel.com, rami.rosen@intel.com, jeffrey.b.shaw@intel.com, ferruh.yigit@intel.com, qi.z.zhang@intel.com To: Tushar Dave Return-path: Received: from mail-qt0-f196.google.com ([209.85.216.196]:48313 "EHLO mail-qt0-f196.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751587AbdKBQrC (ORCPT ); Thu, 2 Nov 2017 12:47:02 -0400 Received: by mail-qt0-f196.google.com with SMTP id f8so185284qta.5 for ; Thu, 02 Nov 2017 09:47:02 -0700 (PDT) In-Reply-To: <34090015-d061-8f7a-18e3-c8c67ade800f@oracle.com> Sender: netdev-owner@vger.kernel.org List-ID: 2017-11-02 17:40 GMT+01:00 Tushar Dave : > > > On 11/02/2017 03:06 AM, Bj=C3=B6rn T=C3=B6pel wrote: >> >> On 2017-11-02 02:45, Willem de Bruijn wrote: >>> >>> On Tue, Oct 31, 2017 at 9:41 PM, Bj=C3=B6rn T=C3=B6pel >>> wrote: >>>> >>>> From: Bj=C3=B6rn T=C3=B6pel >>>> >>>> This patch adds the necessary AF_PACKET V4 structures for usage from >>>> userspace. AF_PACKET V4 is a new interface optimized for high >>>> performance packet processing. >>>> >>>> Signed-off-by: Bj=C3=B6rn T=C3=B6pel >>>> --- >>>> include/uapi/linux/if_packet.h | 65 >>>> +++++++++++++++++++++++++++++++++++++++++- >>>> 1 file changed, 64 insertions(+), 1 deletion(-) >>>> >>>> +struct tpacket4_queue { >>>> + struct tpacket4_desc *ring; >>>> + >>>> + unsigned int avail_idx; >>>> + unsigned int last_used_idx; >>>> + unsigned int num_free; >>>> + unsigned int ring_mask; >>>> +}; >>>> >>>> struct packet_mreq { >>>> @@ -294,6 +335,28 @@ struct packet_mreq { >>>> unsigned char mr_address[8]; >>>> }; >>>> >>>> +/* >>>> + * struct tpacket_memreg_req is used in conjunction with PACKET_MEMRE= G >>>> + * to register user memory which should be used to store the packet >>>> + * data. >>>> + * >>>> + * There are some constraints for the memory being registered: >>>> + * - The memory area has to be memory page size aligned. >>>> + * - The frame size has to be a power of 2. >>>> + * - The frame size cannot be smaller than 2048B. >>>> + * - The frame size cannot be larger than the memory page size. >>>> + * >>>> + * Corollary: The number of frames that can be stored is >>>> + * len / frame_size. >>>> + * >>>> + */ >>>> +struct tpacket_memreg_req { >>>> + unsigned long addr; /* Start of packet data area *= / >>>> + unsigned long len; /* Length of packet data area = */ >>>> + unsigned int frame_size; /* Frame size */ >>>> + unsigned int data_headroom; /* Frame head room */ >>>> +}; >>> >>> >>> Existing packet sockets take a tpacket_req, allocate memory and let the >>> user process mmap this. I understand that TPACKET_V4 distinguishes >>> the descriptor from packet pools, but could both use the existing struc= ts >>> and logic (packet_mmap)? That would avoid introducing a lot of new code >>> just for granting user pages to the kernel. >>> >> >> We could certainly pass the "tpacket_memreg_req" fields as part of >> descriptor ring setup ("tpacket_req4"), but we went with having the >> memory register as a new separate setsockopt. Having it separated, >> makes it easier to compare regions at the kernel side of things. "Is >> this the same umem as another one?" If we go the path of passing the >> range at descriptor ring setup, we need to handle all kind of >> overlapping ranges to determine when a copy is needed or not, in those >> cases where the packet buffer (i.e. umem) is shared between processes. > > > Is there a reason to use separate packet socket for umem? Looks like > userspace has to create separate packet socket for PACKET_MEMREG. > Let me clarify; You *can* use a separate socket for umem, but you can also use the same/existing AF_PACKET socket for that. Bj=C3=B6rn > > -Tushar> > >>> Also, use of unsigned long can cause problems on 32/64 bit compat >>> environments. Prefer fixed width types in uapi. Same for pointer in >>> tpacket4_queue. >> >> >> I agree; We'll change to a fixed width type in next version. Do you >> (and others on the list) prefer __u32/__u64 or unsigned int / unsigned >> long long? >> >> >> Thanks, >> Bj=C3=B6rn >> >