From: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
Cc: Michael Kerrisk-manpages
	<mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>,
	linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	netdev-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Subject: Re: [PATCH man-pages] man: packet.7: document fanout, ring and auxiliary options
Date: Fri, 06 Dec 2013 17:14:15 +0100
Message-ID: <52A1F7D7.6040305@redhat.com>
In-Reply-To: <CA+FuTSdCfH_yum57ZWV9tw5cd0=DkWWR-OvnaUEkUf5O7JCQYg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>

On 12/06/2013 05:11 PM, Willem de Bruijn wrote:
>>   [Very minor fixups. -dborkman]
>>
>> Signed-off-by: Willem de Bruijn <willemb-hpIqsD4AKlfQT0dZR+AlfA@public.gmane.org>
>> Acked-by: Daniel Borkmann <dborkman-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
>> ---
>>   Just a resend of something that got lost in March this year.
>
> Thanks for dusting this off, Daniel!
>
> I spotted a few small issues. We also introduced a few new flags since
> the last revision. If we have to make changes anyway, may as well
> describe those, too. Let me know if you will resubmit or prefer me to
> do it.
>
> I did not test the output of my changes yet, btw.

Feel free to take this over and resubmit.

I just didn't want this effort to get lost somewhere.

Thanks, Willem!

>> +.I tp_net
>> +stores the offset to the network layer.
>> +If the packet socket is of type
>> +.BR SOCK_DGRAM ,
>> +then
>> +.I tp_mac
>> +is the same.
>> +If it is of type
>> +.BR SOCK_RAW ,
>> +then that field stores the offset to the link layer frame.
>
> This only applies to the metadata when passed in a packet ring frame
> and has to be moved there. The ring metadata structure is very similar
> to tpacket_auxdata (as mentioned below), but they differ in this
> regard: with recvmsg/auxdata the mac always starts at offset 0 for
> obvious reasons.
>
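
To illustrate the ring-frame case described above, here is a minimal sketch of how an application might turn the two offsets into pointers. It assumes the TPACKET_V1 layout (struct tpacket_hdr from <linux/if_packet.h>) and that both offsets are relative to the start of the slot; the helper names frame_mac() and frame_net() are made up for this example.

```c
#include <stdint.h>
#include <linux/if_packet.h>

/* Hypothetical helper: start of the link-layer frame in one ring slot
 * (meaningful for SOCK_RAW sockets). */
static inline const uint8_t *frame_mac(const struct tpacket_hdr *hdr)
{
    return (const uint8_t *)hdr + hdr->tp_mac;
}

/* Hypothetical helper: start of the network-layer header in one ring slot. */
static inline const uint8_t *frame_net(const struct tpacket_hdr *hdr)
{
    return (const uint8_t *)hdr + hdr->tp_net;
}
```

On a SOCK_DGRAM socket the two helpers would return the same address, since tp_mac and tp_net are equal there.
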
>> +.TP
>> +.BR PACKET_FANOUT " (since Linux 3.1)"
>> +.\" commit dc99f600698dcac69b8f56dda9a8a00d645c5ffc
>> +To scale processing across threads, packet sockets can form a fanout
>> +group.
>> +In this mode, each matching packet is enqueued onto only one
>> +socket in the group.
>> +A socket joins a fanout group by calling
>> +.BR setsockopt (2)
>> +with level
>> +.B SOL_PACKET
>> +and option
>> +.BR PACKET_FANOUT .
>> +Each network namespace can have up to 65536 independent groups.
>> +A socket selects a group by encoding the ID in the first 16 bits of
>> +the integer option value.
>> +The first packet socket to join a group implicitly creates it.
>> +To successfully join an existing group, subsequent packet sockets
>> +must have the same protocol, device settings and fanout mode and
>> +flags (see below).
>> +Packet sockets can leave a fanout group only by closing the socket.
>> +The group is deleted when the last socket is closed.
>> +
>> +Fanout supports multiple algorithms to spread traffic between sockets.
>> +The default mode,
>> +.BR PACKET_FANOUT_HASH ,
>> +sends packets from the same flow to the same socket to maintain
>> +per-flow ordering.
>> +For each packet, it chooses a socket by taking the packet flow hash
>> +modulo the number of sockets in the group, where a flow hash is a hash
>> +over network layer address and optional transport layer port fields.
>> +The load balance mode
>> +.BR PACKET_FANOUT_LB
>> +implements a round-robin algorithm.
>> +.BR PACKET_FANOUT_CPU
>> +selects the socket based on the CPU that the packet arrived on.
>
> New options since the last patch:
>
> +.BR PACKET_FANOUT_ROLLOVER
> +processes all data on a single socket, moving to the next when one
> +becomes backlogged.
> +.BR PACKET_FANOUT_RND
> +selects the socket using a pseudo-random number generator.
>
>> +
>> +Fanout modes can take additional options.
>> +IP fragmentation causes packets from the same flow to have different
>> +flow hashes.
>> +The flag
>> +.BR PACKET_FANOUT_FLAG_DEFRAG ,
>> +if set, causes packets to be defragmented before fanout is applied, to
>> +preserve order even in this case.
>> +Fanout mode and options are communicated in the second 16 bits of the
>> +integer option value.
>
> +.BR PACKET_FANOUT_FLAG_ROLLOVER ,
> +if set, enables the rollover mechanism as a backup strategy.
> +If the original fanout algorithm selects a backlogged socket, roll over
> +to the next available one.
>
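
For reference, the option value layout discussed above can be sketched as follows: group ID in the low 16 bits, fanout mode and flags OR-ed together in the high 16 bits. The helper names fanout_arg() and join_fanout() are invented for this example; the constants come from <linux/if_packet.h>. Treat it as a sketch, not as the canonical way to do it.

```c
#include <stdint.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: pack the group ID (low 16 bits) and
 * mode | flags (high 16 bits) into the integer option value. */
static int fanout_arg(uint16_t group_id, uint16_t mode, uint16_t flags)
{
    uint32_t type_flags = (uint32_t)(mode | flags);

    return (int)((uint32_t)group_id | (type_flags << 16));
}

/* Hypothetical helper: join a fanout group, creating it if this is the
 * first socket. Returns 0 on success, -1 with errno set on failure. */
static int join_fanout(int fd, uint16_t group_id, uint16_t mode,
                       uint16_t flags)
{
    int arg = fanout_arg(group_id, mode, flags);

    return setsockopt(fd, SOL_PACKET, PACKET_FANOUT, &arg, sizeof(arg));
}
```

A worker thread would typically open its own packet socket, bind it, and then call something like join_fanout(fd, 42, PACKET_FANOUT_HASH, PACKET_FANOUT_FLAG_DEFRAG), with every thread using the same group ID, mode and flags.
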
>> +.TP
>> +.BR PACKET_LOSS " (with PACKET_TX_RING)"
>> +If set, do not silently drop a packet on transmission error, but
>> +return it with status set to
>> +.BR TP_STATUS_WRONG_FORMAT .
>> +.TP
>> +.BR PACKET_RESERVE " (with PACKET_RX_RING)"
>> +By default, a packet receive ring writes packets immediately following the
>> +metadata structure and alignment padding.
>> +This integer option reserves additional headroom.
>> +.TP
>> +.BR PACKET_RX_RING
>> +Create a memory mapped ring buffer for asynchronous packet reception.
>> +The packet socket reserves a contiguous region of application address
>> +space, lays it out into an array of packet slots and copies packets
>> +(up to
>> +.IR tp_snaplen )
>> +into subsequent slots.
>> +Each packet is preceded by a metadata structure similar to
>> +.IR tpacket_auxdata .
>
> This is where the mac discussion from above belongs.
>
>> +Packet socket and application communicate the head and tail of the ring
>> +through the
>> +.I tp_status
>> +field.
>> +The packet socket owns all slots with status
>> +.BR TP_STATUS_KERNEL .
>> +After filling a slot, it changes the status of the slot to transfer
>> +ownership to the application.
>> +During normal operation, the new status is
>> +.BR TP_STATUS_USER ,
>> +to signal that a correctly received packet has been stored.
>> +When the application has finished processing a packet, it transfers
>> +ownership of the slot back to the socket by setting the status to
>> +.BR TP_STATUS_KERNEL .
>> +Packet sockets implement multiple variants of the packet ring.
>> +The implementation details are described in
>> +.IR Documentation/networking/packet_mmap.txt
>> +in the Linux kernel source tree.
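
The ownership handshake described above could look roughly like this on the application side. This is only a sketch: it assumes the TPACKET_V1 layout with fixed-size slots, leaves out the setsockopt(2)/mmap(2) setup and the poll(2) wait, and uses the GCC/Clang builtin __sync_synchronize() as a full barrier. The name drain_ring() and its parameters are made up for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <linux/if_packet.h>

/*
 * Hypothetical helper: walk a ring of frame_nr slots of frame_size bytes
 * each, starting at slot *next. Every slot that has been handed to user
 * space is consumed and given back to the kernel.
 * Returns the number of slots consumed.
 */
static unsigned int drain_ring(uint8_t *ring, unsigned int frame_nr,
                               size_t frame_size, unsigned int *next)
{
    unsigned int consumed = 0;

    while (consumed < frame_nr) {
        volatile struct tpacket_hdr *hdr =
            (volatile struct tpacket_hdr *)(ring + (size_t)*next * frame_size);

        if (!(hdr->tp_status & TP_STATUS_USER))
            break;                      /* slot is still owned by the kernel */

        /* A real consumer would process the packet data here. */

        __sync_synchronize();           /* finish all reads before release */
        hdr->tp_status = TP_STATUS_KERNEL;
        *next = (*next + 1) % frame_nr;
        consumed++;
    }
    return consumed;
}
```

The loop stops at the first slot still marked TP_STATUS_KERNEL, so the caller can go back to poll(2) and resume from *next later.
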
>> +.TP
>> +.BR PACKET_STATISTICS
>> +Retrieve packet socket statistics in the form of a structure
>> +
>> +.in +4n
>> +.nf
>> +struct tpacket_stats {
>> +    __u32 tp_packets;  /* total packet count */
>> +    __u32 tp_drops;    /* dropped packet count */
>
> these should apparently be
>
> +    unsigned int tp_packets;  /* total packet count */
> +    unsigned int tp_drops;    /* dropped packet count */
>
>> +};
>> +.fi
>> +.in
>> +
>
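
For completeness, reading the counters might look like the sketch below. The wrapper name read_packet_stats() is invented for the example; it just calls getsockopt(2) with level SOL_PACKET on an already opened packet socket. As far as I know, reading the statistics also resets the kernel's internal counters, so callers that want running totals have to accumulate them themselves.

```c
#include <sys/socket.h>
#include <linux/if_packet.h>

/* Hypothetical helper: fetch the packet and drop counters for a packet
 * socket. Returns 0 on success, -1 with errno set on failure. */
static int read_packet_stats(int fd, struct tpacket_stats *stats)
{
    socklen_t len = sizeof(*stats);

    return getsockopt(fd, SOL_PACKET, PACKET_STATISTICS, stats, &len);
}
```
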
> All the rest looked fine.