* AF_PACKET mmap() v4...
@ 2015-11-05  5:04 David Miller
  2015-11-05  6:53 ` Richard Cochran
                   ` (2 more replies)
  0 siblings, 3 replies; 14+ messages in thread
From: David Miller @ 2015-11-05  5:04 UTC (permalink / raw)
  To: netdev; +Cc: arnd


As part of fixing y2038 problems, Arnd is going to have to make a new
version of the AF_PACKET mmap() tpacket descriptors in order to extend
the time values to 64-bit.

So I want everyone to think about whether there are any other changes
we might want to make given that we have to make a v4 anyways.

Particularly, I am rather certain that the buffer management could be
improved.  Some have complained that v3 is kinda awkward to use and/or
suboptimal in various ways.

Thanks.


* Re: AF_PACKET mmap() v4...
  2015-11-05  5:04 AF_PACKET mmap() v4 David Miller
@ 2015-11-05  6:53 ` Richard Cochran
  2015-11-05  8:14 ` Guy Harris
  2015-11-05  9:07 ` Arnd Bergmann
  2 siblings, 0 replies; 14+ messages in thread
From: Richard Cochran @ 2015-11-05  6:53 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, arnd

On Thu, Nov 05, 2015 at 12:04:14AM -0500, David Miller wrote:
> So I want everyone to think about whether there are any other changes
> we might want to make given that we have to make a v4 anyways.

One thing I would like to see is a field for a desired transmit time.
Time based scheduling is a new topic, never discussed on this list
before, afaict.  HW already supports this, for example, the Intel i210
card has a high priority queue where you can tell it a Tx time in
terms of the PTP clock.

This functionality is useful in industrial Ethernet protocols.  There
must be a dozen of these out there, and a new IEEE standard is in the
works by the Time Sensitive Networking (TSN) group.

I haven't thought too much about how to implement this, but the
eventual goal would be a generic time based scheduler that either uses
special HW features or does best effort in SW.  User space would have
a socket option for desired Tx time, and this should also be available
over the mmap interface.
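
One way to picture the best-effort software fallback is a ring whose slots each carry a desired transmit time, with the sender picking the earliest slot that is already due. This is only a sketch under that assumption: `struct tx_slot`, its `tx_time_ns` field, and `next_due()` are hypothetical names and are not part of any existing AF_PACKET descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical TX slot: only the fields needed for scheduling are shown. */
struct tx_slot {
	uint64_t tx_time_ns;	/* desired Tx time, in ns on some agreed clock */
	int      ready;		/* nonzero once user space has filled the slot */
};

/*
 * Return the index of the ready slot with the earliest Tx time that is
 * not later than now_ns, or -1 if nothing is due yet.
 */
static int next_due(const struct tx_slot *slots, size_t n, uint64_t now_ns)
{
	int best = -1;

	assert(slots != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		if (!slots[i].ready || slots[i].tx_time_ns > now_ns)
			continue;
		if (best < 0 || slots[i].tx_time_ns < slots[best].tx_time_ns)
			best = (int)i;
	}
	return best;
}
```

A driver with hardware support would presumably hand the time to the NIC instead of scanning like this; the linear scan only stands in for the "best effort in SW" case.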

Thanks,
Richard


* Re: AF_PACKET mmap() v4...
  2015-11-05  5:04 AF_PACKET mmap() v4 David Miller
  2015-11-05  6:53 ` Richard Cochran
@ 2015-11-05  8:14 ` Guy Harris
  2015-11-05 15:32   ` David Miller
  2015-11-05  9:07 ` Arnd Bergmann
  2 siblings, 1 reply; 14+ messages in thread
From: Guy Harris @ 2015-11-05  8:14 UTC (permalink / raw)
  To: David Miller; +Cc: netdev, arnd


On Nov 4, 2015, at 9:04 PM, David Miller <davem@davemloft.net> wrote:

> As part of fixing y2038 problems, Arnd is going to have to make a new
> version of the AF_PACKET mmap() tpacket descriptors in order to extend
> the time values to 64-bit.
> 
> So I want everyone to think about whether there are any other changes
> we might want to make given that we have to make a v4 anyways.
> 
> Particularly, I am rather certain that the buffer management could be
> improved.  Some have complained that v3 is kinda awkward to use and/or
> suboptimal in various ways.

As a core libpcap developer, I will at least state that it works a *lot* better than v1 and v2 did.  Having buffer slots that can hold only one packet is *really* suboptimal for packet capture; having buffer slots that can hold multiple packets works a lot better, as there's less wasted memory.

I'll see whether there are any changes that *would* make libpcap's life better.


* Re: AF_PACKET mmap() v4...
  2015-11-05  5:04 AF_PACKET mmap() v4 David Miller
  2015-11-05  6:53 ` Richard Cochran
  2015-11-05  8:14 ` Guy Harris
@ 2015-11-05  9:07 ` Arnd Bergmann
  2015-11-05  9:39   ` Daniel Borkmann
  2 siblings, 1 reply; 14+ messages in thread
From: Arnd Bergmann @ 2015-11-05  9:07 UTC (permalink / raw)
  To: David Miller; +Cc: netdev

On Thursday 05 November 2015 00:04:14 David Miller wrote:
> As part of fixing y2038 problems, Arnd is going to have to make a new
> version of the AF_PACKET mmap() tpacket descriptors in order to extend
> the time values to 64-bit.
> 
> So I want everyone to think about whether there are any other changes
> we might want to make given that we have to make a v4 anyways.
> 
> Particularly, I am rather certain that the buffer management could be
> improved.  Some have complained that v3 is kinda awkward to use and/or
> suboptimal in various ways.

I have taken a closer look at the actual timestamp data now, and noticed
that we use __u32 for both tp_sec and ts_sec in the user visible data.
This means that once we fix the internal implementation to use 64-bit
timestamps, we actually won't overflow until 2106 because the 2038 overflow
is only for signed 32-bit numbers as we have in 'struct timespec'.

So the good news is that we can keep the existing v1 through v3 formats
beyond 2038, but only as long as all user space that cares about the
value also interprets it as unsigned.
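
The difference between the two interpretations can be sketched in user space as follows. The helper names are made up for illustration; the only point is that the 32-bit seconds value has to be widened as unsigned, never routed through a signed 32-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* As unsigned, 32-bit seconds run up to 4294967295, which falls in 2106. */
static_assert((int64_t)UINT32_MAX == 4294967295LL,
	      "unsigned 32-bit seconds must widen without sign extension");

/* Correct: treat the 32-bit seconds field as unsigned before widening. */
static int64_t ring_sec_to_i64(uint32_t sec)
{
	return (int64_t)sec;
}

/*
 * Buggy: going through a signed 32-bit type turns every value above
 * INT32_MAX (i.e. past January 2038) into a negative number. The
 * conversion is implementation-defined in C; common compilers wrap.
 */
static int64_t ring_sec_to_i64_signed(uint32_t sec)
{
	return (int64_t)(int32_t)sec;
}
```

Any user space that copies the field into a `time_t` on a 32-bit system, or into an `int`, is effectively using the second helper.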

If we want to have a v4 format anyway, there are a few considerations
for the format of the timestamps: I generally recommend using __u64
nanoseconds rather than split second/nanosecond, as that simplifies
the code in most cases and makes it more efficient, unless you
actually need the seconds portion on a system that does not have
a 64-bit divide instruction (most 32-bit architectures).
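
A small sketch of the trade-off, using plain `stdint.h` types rather than the kernel's `__u64`; the function names are illustrative only. Differences between two stamps become a single subtraction, while getting the seconds back is where the 64-bit division shows up.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Fold a split second/nanosecond pair into a single 64-bit count. */
static uint64_t ts_to_ns(uint64_t sec, uint32_t nsec)
{
	return sec * NSEC_PER_SEC + nsec;
}

/* With flat nanoseconds, the gap between two stamps is one subtraction. */
static uint64_t ns_delta(uint64_t earlier, uint64_t later)
{
	assert(later >= earlier);
	return later - earlier;
}

/* Recovering the seconds portion costs a 64-bit division. */
static uint64_t ns_to_sec(uint64_t ns)
{
	return ns / NSEC_PER_SEC;
}
```

An unsigned 64-bit nanosecond count covers roughly 584 years, so range is not a concern for this representation.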

Also, most subsystems are moving to 'monotonic' (counting seconds
from boot, and not impacted by settimeofday(), leap seconds or
ntp jumps) timestamps, but it's not clear if that is the best choice
here, because it won't work for hardware timestamps that actually
use real time. If we do this, we probably also need a field to
store the clockid.
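
As a rough illustration of what carrying the clockid alongside the timestamp could look like, here is a user-space sketch. `struct ts_sketch` and `ts_sample()` are hypothetical and do not describe any proposed v4 layout; `clock_gettime()` merely stands in for whatever the kernel or the hardware would supply.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical stamp: flat nanoseconds plus the clock they were read from. */
struct ts_sketch {
	uint64_t ns;
	int32_t  clockid;	/* e.g. CLOCK_REALTIME or CLOCK_MONOTONIC */
	uint32_t pad;		/* explicit padding keeps the struct at 16 bytes */
};

/* Sample the given clock; returns 0 on success, -1 on failure. */
static int ts_sample(clockid_t id, struct ts_sketch *out)
{
	struct timespec ts;

	assert(out != NULL);
	if (clock_gettime(id, &ts) != 0)
		return -1;
	out->ns = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
	out->clockid = (int32_t)id;
	out->pad = 0;
	return 0;
}
```

A reader of the ring could then tell a software CLOCK_MONOTONIC stamp from a hardware stamp in real time without guessing.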

	Arnd


* Re: AF_PACKET mmap() v4...
  2015-11-05  9:07 ` Arnd Bergmann
@ 2015-11-05  9:39   ` Daniel Borkmann
  2015-11-05 11:38     ` Eric Dumazet
  2015-11-08  2:19     ` Alexei Starovoitov
  0 siblings, 2 replies; 14+ messages in thread
From: Daniel Borkmann @ 2015-11-05  9:39 UTC (permalink / raw)
  To: Arnd Bergmann, David Miller; +Cc: netdev

On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
> On Thursday 05 November 2015 00:04:14 David Miller wrote:
>> As part of fixing y2038 problems, Arnd is going to have to make a new
>> version of the AF_PACKET mmap() tpacket descriptors in order to extend
>> the time values to 64-bit.
>>
>> So I want everyone to think about whether there are any other changes
>> we might want to make given that we have to make a v4 anyways.
>>
>> Particularly, I am rather certain that the buffer management could be
>> improved.  Some have complained that v3 is kinda awkward to use and/or
>> suboptimal in various ways.
>
> I have taken a closer look at the actual timestamp data now, and noticed
> that we use __u32 for both tp_sec and ts_sec in the user visible data.
> This means that once we fix the internal implementation to use 64-bit
> timestamps, we actually won't overflow until 2106 because the 2038 overflow
> is only for signed 32-bit numbers as we have in 'struct timespec'.
>
> So the good news is that we can keep the existing v1 through v3 formats
> beyond 2038, but only as long as all user space that cares about the
> value also interprets it as unsigned.

Right, I was just about to ask that. So we could just make a union in
AF_PACKET's UAPI for a single 64-bit variable (as in ktime_t) to fix that.
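
One possible reading of the union idea, sketched with `stdint.h` types. The union and its member names are hypothetical, and the sketch does not claim to match the offsets or alignment of the timestamp fields in any existing tpacket header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical overlay: the same 8 bytes seen as two halves or one word. */
union ts_overlay {
	struct {
		uint32_t sec;
		uint32_t nsec;
	} split;
	uint64_t whole;		/* e.g. a ktime_t-style nanosecond count */
};

/* The overlay must not change the size of the space it replaces. */
static_assert(sizeof(union ts_overlay) == 8, "overlay must stay 8 bytes");
```

Two caveats apply to this sketch: which half lands in the low bits of `whole` depends on byte order, and a nanosecond count stored in `whole` is not the same bit pattern as a sec/nsec pair, so a reader would need to know which form the kernel wrote.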

Best,
Daniel


* Re: AF_PACKET mmap() v4...
  2015-11-05  9:39   ` Daniel Borkmann
@ 2015-11-05 11:38     ` Eric Dumazet
  2015-11-05 12:56       ` Daniel Borkmann
  2015-11-08  2:19     ` Alexei Starovoitov
  1 sibling, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2015-11-05 11:38 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Arnd Bergmann, David Miller, netdev

On Thu, 2015-11-05 at 10:39 +0100, Daniel Borkmann wrote:
> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
> > On Thursday 05 November 2015 00:04:14 David Miller wrote:
> >> As part of fixing y2038 problems, Arnd is going to have to make a new
> >> version of the AF_PACKET mmap() tpacket descriptors in order to extend
> >> the time values to 64-bit.
> >>
> >> So I want everyone to think about whether there are any other changes
> >> we might want to make given that we have to make a v4 anyways.
> >>
> >> Particularly, I am rather certain that the buffer management could be
> >> improved.  Some have complained that v3 is kinda awkward to use and/or
> >> suboptimal in various ways.
> >
> > I have taken a closer look at the actual timestamp data now, and noticed
> > that we use __u32 for both tp_sec and ts_sec in the user visible data.
> > This means that once we fix the internal implementation to use 64-bit
> > timestamps, we actually won't overflow until 2106 because the 2038 overflow
> > is only for signed 32-bit numbers as we have in 'struct timespec'.
> >
> > So the good news is that we can keep the existing v1 through v3 formats
> > beyond 2038, but only as long as all user space that cares about the
> > value also interprets it as unsigned.
> 
> Right, I was just about to ask that. So we could just make a union in
> AF_PACKET's UAPI for a single 64-bit variable (as in ktime_t) to fix that.

If I am not mistaken, af_packet also lacks the ability to properly set
skb->protocol.

I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
tests for the TCP listener rewrite.

The bonding hash function might use the flow dissector, but as this flow
dissection depends on skb->protocol, all the traffic is directed to a
single slave.

 


* Re: AF_PACKET mmap() v4...
  2015-11-05 11:38     ` Eric Dumazet
@ 2015-11-05 12:56       ` Daniel Borkmann
  2015-11-05 16:17         ` Eric Dumazet
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Borkmann @ 2015-11-05 12:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Arnd Bergmann, David Miller, netdev, tklauser

On 11/05/2015 12:38 PM, Eric Dumazet wrote:
> On Thu, 2015-11-05 at 10:39 +0100, Daniel Borkmann wrote:
>> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
>>> On Thursday 05 November 2015 00:04:14 David Miller wrote:
>>>> As part of fixing y2038 problems, Arnd is going to have to make a new
>>>> version of the AF_PACKET mmap() tpacket descriptors in order to extend
>>>> the time values to 64-bit.
>>>>
>>>> So I want everyone to think about whether there are any other changes
>>>> we might want to make given that we have to make a v4 anyways.
>>>>
>>>> Particularly, I am rather certain that the buffer management could be
>>>> improved.  Some have complained that v3 is kinda awkward to use and/or
>>>> suboptimal in various ways.
>>>
>>> I have taken a closer look at the actual timestamp data now, and noticed
>>> that we use __u32 for both tp_sec and ts_sec in the user visible data.
>>> This means that once we fix the internal implementation to use 64-bit
>>> timestamps, we actually won't overflow until 2106 because the 2038 overflow
>>> is only for signed 32-bit numbers as we have in 'struct timespec'.
>>>
>>> So the good news is that we can keep the existing v1 through v3 formats
>>> beyond 2038, but only as long as all user space that cares about the
>>> value also interprets it as unsigned.
>>
>> Right, I was just about to ask that. So we could just make a union in
>> AF_PACKET's UAPI for a single 64-bit variable (as in ktime_t) to fix that.
>
> If I am not mistaken, af_packet also lacks the ability to properly set
> skb->protocol
>
> I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
> tests for TCP listener rewrite.
>
> The bonding hash function might use the flow dissector, but as this flow
> dissection depends on skb->protocol, all the traffic is directed on a
> single slave.

Right, if I see this correctly, when you trigger the flushing of TX_RING
via sendmsg(), one can hand over a sockaddr_ll, where we infer sll_protocol
and tag every skb's skb->protocol with that in tpacket_fill_skb() for the
current flushing run. Otherwise, we use the po->num specified at socket
creation / bind time for everything (trafgen case).
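
From user space, the per-flush variant could look roughly like the sketch below. The helper names are made up; `sendto()` with an empty buffer is used to kick the ring and carries the address the same way `sendmsg()` with `msg_name` would. The descriptor is assumed to be an AF_PACKET socket with a TX ring already set up and filled, which is not shown.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Build the address that names the protocol for one flush of the TX ring. */
static void fill_flush_addr(struct sockaddr_ll *sll, int ifindex,
			    unsigned short ethertype)
{
	assert(sll != NULL);
	memset(sll, 0, sizeof(*sll));
	sll->sll_family = AF_PACKET;
	sll->sll_ifindex = ifindex;
	sll->sll_protocol = htons(ethertype);	/* network byte order */
}

/* Kick the TX ring, handing the kernel sll for this flushing run. */
static ssize_t flush_tx_ring(int fd, const struct sockaddr_ll *sll)
{
	return sendto(fd, NULL, 0, 0,
		      (const struct sockaddr *)sll, sizeof(*sll));
}
```

Because the protocol applies to the whole flushing run, an application mixing protocols would have to flush once per protocol with this approach.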

If needed on a per skb basis, perhaps we could map some tpacket_hdr{,2}
member that is not used from TX_RING side (perhaps union on tp_snaplen)?


* Re: AF_PACKET mmap() v4...
  2015-11-05  8:14 ` Guy Harris
@ 2015-11-05 15:32   ` David Miller
  0 siblings, 0 replies; 14+ messages in thread
From: David Miller @ 2015-11-05 15:32 UTC (permalink / raw)
  To: guy; +Cc: netdev, arnd

From: Guy Harris <guy@alum.mit.edu>
Date: Thu, 5 Nov 2015 00:14:51 -0800

> As a core libpcap developer, I will at least state that it works a
> *lot* better than v1 and v2 did.  Having buffer slots that can hold
> only one packet is *really* suboptimal for packet capture; having
> buffer slots that can hold multiple packets works a lot better, as
> there's less wasted memory.

No doubt.

> I'll see whether there are any changes that *would* make libpcap's life better.

Thanks.


* Re: AF_PACKET mmap() v4...
  2015-11-05 12:56       ` Daniel Borkmann
@ 2015-11-05 16:17         ` Eric Dumazet
  2015-11-05 22:56           ` Daniel Borkmann
  0 siblings, 1 reply; 14+ messages in thread
From: Eric Dumazet @ 2015-11-05 16:17 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Arnd Bergmann, David Miller, netdev, tklauser

On Thu, 2015-11-05 at 13:56 +0100, Daniel Borkmann wrote:
> On 11/05/2015 12:38 PM, Eric Dumazet wrote:

> > If I am not mistaken, af_packet also lacks the ability to properly set
> > skb->protocol
> >
> > I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
> > tests for TCP listener rewrite.
> >
> > The bonding hash function might use the flow dissector, but as this flow
> > dissection depends on skb->protocol, all the traffic is directed on a
> > single slave.
> 
> Right, if I see this correctly, when you trigger the flushing of TX_RING
> via sendmsg(), one can hand over a sockaddr_ll, where we infer sll_protocol
> and tag every skb's skb->protocol with that in tpacket_fill_skb() for the
> current flushing run. Otherwise, we use the po->num specified at socket
> creation / bind time for everything (trafgen case).
> 
> If needed on a per skb basis, perhaps we could map some tpacket_hdr{,2}
> member that is not used from TX_RING side (perhaps union on tp_snaplen)?


If po->num is 0 (as in trafgen case), we could also get the proto from
Ethernet header provided by the user.
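
A user-space model of that inference, for illustration only: read the EtherType straight out of the frame the user placed in the ring slot. `frame_ethertype()` is a hypothetical helper, and it returns the value in host byte order, whereas skb->protocol in the kernel is kept in network byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ETH_HDR_LEN 14	/* dst MAC (6) + src MAC (6) + EtherType (2) */

/*
 * Return the EtherType of a raw Ethernet frame in host byte order,
 * or 0 if the buffer is too short to contain an Ethernet header.
 */
static uint16_t frame_ethertype(const uint8_t *frame, size_t len)
{
	assert(frame != NULL || len == 0);
	if (len < ETH_HDR_LEN)
		return 0;
	return (uint16_t)((frame[12] << 8) | frame[13]);
}
```

With this, an IPv4 frame yields 0x0800 and an IPv6 frame yields 0x86DD from the same ring, without any per-socket protocol setting.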

The skb_probe_transport_header() call from tpacket_fill_skb() is useless
in the current kernel.

Let's say an application wants to mix IPv6 and IPv4 packets, using a
single TX ring....


* Re: AF_PACKET mmap() v4...
  2015-11-05 16:17         ` Eric Dumazet
@ 2015-11-05 22:56           ` Daniel Borkmann
  2015-11-06 11:34             ` Daniel Borkmann
  0 siblings, 1 reply; 14+ messages in thread
From: Daniel Borkmann @ 2015-11-05 22:56 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Arnd Bergmann, David Miller, netdev, tklauser

On 11/05/2015 05:17 PM, Eric Dumazet wrote:
> On Thu, 2015-11-05 at 13:56 +0100, Daniel Borkmann wrote:
>> On 11/05/2015 12:38 PM, Eric Dumazet wrote:
>
>>> If I am not mistaken, af_packet also lacks the ability to properly set
>>> skb->protocol
>>>
>>> I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
>>> tests for TCP listener rewrite.
>>>
>>> The bonding hash function might use the flow dissector, but as this flow
>>> dissection depends on skb->protocol, all the traffic is directed on a
>>> single slave.
>>
>> Right, if I see this correctly, when you trigger the flushing of TX_RING
>> via sendmsg(), one can hand over a sockaddr_ll, where we infer sll_protocol
>> and tag every skb's skb->protocol with that in tpacket_fill_skb() for the
>> current flushing run. Otherwise, we use the po->num specified at socket
>> creation / bind time for everything (trafgen case).
>>
>> If needed on a per skb basis, perhaps we could map some tpacket_hdr{,2}
>> member that is not used from TX_RING side (perhaps union on tp_snaplen)?
>
> If po->num is 0 (as in trafgen case), we could also get the proto from
> Ethernet header provided by the user.
>
> The skb_probe_transport_header() call from tpacket_fill_skb() is useless
> in the current kernel.
>
> Let's say an application wants to mix IPv6 and IPv4 packets, using a
> single TX ring....

Sorry for the late answer.

For the skb->protocol issue, perhaps something like this. Also noticed that
we should rather do the vlan check when we have the actual linear data from
the ring slot, the current way seems buggy if I see this correctly. Both
patches squashed below.

Thanks,
Daniel

  net/packet/af_packet.c | 27 ++++++++++++++-------------
  1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 691660b..8415ebd 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2399,8 +2399,22 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
  	} else {
  		data = ph.raw + po->tp_hdrlen - sizeof(struct sockaddr_ll);
  	}
+
  	to_write = tp_len;

+	/* If skb->protocol is still 0, try to infer it. */
+	if (!skb->protocol && tp_len >= sizeof(struct ethhdr))
+		skb->protocol = ((struct ethhdr *)data)->h_proto;
+	if (tp_len > dev->mtu + dev->hard_header_len) {
+		/* Earlier code assumed this would be a VLAN pkt,
+		 * double-check this now that we have the actual
+		 * (linear) packet data at hand.
+		 */
+		if (unlikely(((struct ethhdr *)data)->h_proto !=
+			     htons(ETH_P_8021Q)))
+			return -EMSGSIZE;
+	}
+
  	if (sock->type == SOCK_DGRAM) {
  		err = dev_hard_header(skb, dev, ntohs(proto), addr,
  				NULL, tp_len);
@@ -2524,19 +2538,6 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
  		}
  		tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
  					  addr, hlen);
-		if (likely(tp_len >= 0) &&
-		    tp_len > dev->mtu + dev->hard_header_len) {
-			struct ethhdr *ehdr;
-			/* Earlier code assumed this would be a VLAN pkt,
-			 * double-check this now that we have the actual
-			 * packet in hand.
-			 */
-
-			skb_reset_mac_header(skb);
-			ehdr = eth_hdr(skb);
-			if (ehdr->h_proto != htons(ETH_P_8021Q))
-				tp_len = -EMSGSIZE;
-		}
  		if (unlikely(tp_len < 0)) {
  			if (po->tp_loss) {
  				__packet_set_status(po, ph,
-- 
1.9.3


* Re: AF_PACKET mmap() v4...
  2015-11-05 22:56           ` Daniel Borkmann
@ 2015-11-06 11:34             ` Daniel Borkmann
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2015-11-06 11:34 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: Arnd Bergmann, David Miller, netdev, tklauser

On 11/05/2015 11:56 PM, Daniel Borkmann wrote:
> On 11/05/2015 05:17 PM, Eric Dumazet wrote:
>> On Thu, 2015-11-05 at 13:56 +0100, Daniel Borkmann wrote:
>>> On 11/05/2015 12:38 PM, Eric Dumazet wrote:
>>
>>>> If I am not mistaken, af_packet also lacks the ability to properly set
>>>> skb->protocol
>>>>
>>>> I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
>>>> tests for TCP listener rewrite.
>>>>
>>>> The bonding hash function might use the flow dissector, but as this flow
>>>> dissection depends on skb->protocol, all the traffic is directed on a
>>>> single slave.
>>>
>>> Right, if I see this correctly, when you trigger the flushing of TX_RING
>>> via sendmsg(), one can hand over a sockaddr_ll, where we infer sll_protocol
>>> and tag every skb's skb->protocol with that in tpacket_fill_skb() for the
>>> current flushing run. Otherwise, we use the po->num specified at socket
>>> creation / bind time for everything (trafgen case).
>>>
>>> If needed on a per skb basis, perhaps we could map some tpacket_hdr{,2}
>>> member that is not used from TX_RING side (perhaps union on tp_snaplen)?
>>
>> If po->num is 0 (as in trafgen case), we could also get the proto from
>> Ethernet header provided by the user.
>>
>> The skb_probe_transport_header() call from tpacket_fill_skb() is useless
>> in the current kernel.
>>
>> Let's say an application wants to mix IPv6 and IPv4 packets, using a
>> single TX ring....
>
> Sorry for the late answer.
>
> For the skb->protocol issue, perhaps something like this. Also noticed that
> we should rather do the vlan check when we have the actual linear data from
> the ring slot, the current way seems buggy if I see this correctly. Both
> patches squashed below.

Hmm, I believe there's another bug in TX_RING with SOCK_DGRAM, will do some
more experiments and post some patches later on.

Best,
Daniel


* Re: AF_PACKET mmap() v4...
  2015-11-05  9:39   ` Daniel Borkmann
  2015-11-05 11:38     ` Eric Dumazet
@ 2015-11-08  2:19     ` Alexei Starovoitov
  2015-11-08  4:27       ` John Fastabend
  1 sibling, 1 reply; 14+ messages in thread
From: Alexei Starovoitov @ 2015-11-08  2:19 UTC (permalink / raw)
  To: Daniel Borkmann; +Cc: Arnd Bergmann, David Miller, netdev

On Thu, Nov 05, 2015 at 10:39:15AM +0100, Daniel Borkmann wrote:
> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
> >On Thursday 05 November 2015 00:04:14 David Miller wrote:
> >>As part of fixing y2038 problems, Arnd is going to have to make a new
> >>version of the AF_PACKET mmap() tpacket descriptors in order to extend
> >>the time values to 64-bit.

It would also be quite useful to add the ability to attach metadata to a
packet from a bpf program.
Right now we can only trim the length. It would be great if a program could
compute something and pass it along with the packet as metadata.


* Re: AF_PACKET mmap() v4...
  2015-11-08  2:19     ` Alexei Starovoitov
@ 2015-11-08  4:27       ` John Fastabend
  2015-11-09 10:54         ` Daniel Borkmann
  0 siblings, 1 reply; 14+ messages in thread
From: John Fastabend @ 2015-11-08  4:27 UTC (permalink / raw)
  To: Alexei Starovoitov, Daniel Borkmann; +Cc: Arnd Bergmann, David Miller, netdev

On 15-11-07 06:19 PM, Alexei Starovoitov wrote:
> On Thu, Nov 05, 2015 at 10:39:15AM +0100, Daniel Borkmann wrote:
>> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
>>> On Thursday 05 November 2015 00:04:14 David Miller wrote:
>>>> As part of fixing y2038 problems, Arnd is going to have to make a new
>>>> version of the AF_PACKET mmap() tpacket descriptors in order to extend
>>>> the time values to 64-bit.
> 
> It would also be quite useful to add the ability to attach metadata to a
> packet from a bpf program.
> Right now we can only trim the length. It would be great if a program could
> compute something and pass it along with the packet as metadata.

Also, most modern NICs can generate metadata using packet filters;
it would be nice to allow these to populate any metadata fields as well.
Ethtool already has a flow classifier feature that could be easily
extended once the stack has support.

.John




* Re: AF_PACKET mmap() v4...
  2015-11-08  4:27       ` John Fastabend
@ 2015-11-09 10:54         ` Daniel Borkmann
  0 siblings, 0 replies; 14+ messages in thread
From: Daniel Borkmann @ 2015-11-09 10:54 UTC (permalink / raw)
  To: John Fastabend, Alexei Starovoitov
  Cc: Arnd Bergmann, David Miller, netdev, horms

On 11/08/2015 05:27 AM, John Fastabend wrote:
> On 15-11-07 06:19 PM, Alexei Starovoitov wrote:
>> On Thu, Nov 05, 2015 at 10:39:15AM +0100, Daniel Borkmann wrote:
>>> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
>>>> On Thursday 05 November 2015 00:04:14 David Miller wrote:
>>>>> As part of fixing y2038 problems, Arnd is going to have to make a new
>>>>> version of the AF_PACKET mmap() tpacket descriptors in order to extend
>>>>> the time values to 64-bit.
>>
>> It would also be quite useful to add the ability to attach metadata to a
>> packet from a bpf program.
>> Right now we can only trim the length. It would be great if a program could
>> compute something and pass it along with the packet as metadata.
>
> Also, most modern NICs can generate metadata using packet filters;
> it would be nice to allow these to populate any metadata fields as well.
> Ethtool already has a flow classifier feature that could be easily
> extended once the stack has support.

If I understand this correctly, that would be something independent from
packet sockets, right? Attaching metadata to the skb could be currently done
via mark, tc_index, tc_classid, priority, but I presume you mean something
else. ;)

Or do you mean to push metadata into skb->data, e.g. in front of the frame?
As in having some sort of a 'dynamic-sized', reserved scratch space or stack
at the head or tail of the skb (not visible to the network itself, but only
to the local NIC)?

It would be interesting if it could be used to interact with the NIC, as
John says, e.g. from incoming side to place a tag or additional meta data
there as a result of the NIC's flow classifier, which might then be read out
from an eBPF program to perform further actions on the skb.

If you even want to take this one step further for data center environments,
there was the idea floating around [1], where you encapsulate a restricted
set of instructions together with some scratch space between Ethernet header
and payload.

These "tiny packet programs" could then query switch meta data on the fly
that is being stored into the scratch space. Of course, this requires vendor
support, but this seems really powerful.

   [1] http://jvimal.github.io/tpp/

