* AF_PACKET mmap() v4...
From: David Miller @ 2015-11-05 5:04 UTC
To: netdev; +Cc: arnd

As part of fixing y2038 problems, Arnd is going to have to make a new
version of the AF_PACKET mmap() tpacket descriptors in order to extend
the time values to 64-bit.

So I want everyone to think about whether there are any other changes
we might want to make given that we have to make a v4 anyways.

Particularly, I am rather certain that the buffer management could be
improved.  Some have complained that v3 is kinda awkward to use and/or
suboptimal in various ways.

Thanks.

* Re: AF_PACKET mmap() v4...
From: Richard Cochran @ 2015-11-05 6:53 UTC
To: David Miller; +Cc: netdev, arnd

On Thu, Nov 05, 2015 at 12:04:14AM -0500, David Miller wrote:
> So I want everyone to think about whether there are any other changes
> we might want to make given that we have to make a v4 anyways.

One thing I would like to see is a field for a desired transmit time.

Time based scheduling is a new topic, never discussed on this list
before, afaict.  HW already supports this; for example, the Intel i210
card has a high priority queue where you can tell it a Tx time in
terms of the PTP clock.

This functionality is useful in industrial Ethernet protocols.  There
must be a dozen of these out there, and a new IEEE standard is in the
works by the Time Sensitive Networking (TSN) group.

I haven't thought too much about how to implement this, but the
eventual goal would be a generic time based scheduler that either uses
special HW features or does best effort in SW.  User space would have
a socket option for desired Tx time, and this should also be available
over the mmap interface.

Thanks,
Richard

* Re: AF_PACKET mmap() v4...
From: Guy Harris @ 2015-11-05 8:14 UTC
To: David Miller; +Cc: netdev, arnd

On Nov 4, 2015, at 9:04 PM, David Miller <davem@davemloft.net> wrote:

> As part of fixing y2038 problems, Arnd is going to have to make a new
> version of the AF_PACKET mmap() tpacket descriptors in order to extend
> the time values to 64-bit.
>
> So I want everyone to think about whether there are any other changes
> we might want to make given that we have to make a v4 anyways.
>
> Particularly, I am rather certain that the buffer management could be
> improved.  Some have complained that v3 is kinda awkward to use and/or
> suboptimal in various ways.

As a core libpcap developer, I will at least state that it works a
*lot* better than v1 and v2 did.  Having buffer slots that can hold
only one packet is *really* suboptimal for packet capture; having
buffer slots that can hold multiple packets works a lot better, as
there's less wasted memory.

I'll see whether there are any changes that *would* make libpcap's
life better.

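The multi-packet slots described above can be illustrated with a short C sketch of how user space might walk one TPACKET_V3 block. It assumes the v3 layout from <linux/if_packet.h> (struct tpacket_block_desc and struct tpacket3_hdr); the helper name block_captured_bytes() is invented for illustration and is not part of libpcap or the kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <linux/if_packet.h>

/* Hypothetical helper: sum the captured bytes (tp_snaplen) of every
 * packet stored in a single TPACKET_V3 block.  Packets are chained
 * inside the block through tp_next_offset, so one block can carry many
 * variable-sized packets instead of one packet per fixed slot.
 */
static inline uint64_t block_captured_bytes(const struct tpacket_block_desc *bd)
{
	assert(bd != NULL);

	const uint8_t *p = (const uint8_t *)bd + bd->hdr.bh1.offset_to_first_pkt;
	uint64_t total = 0;

	for (uint32_t i = 0; i < bd->hdr.bh1.num_pkts; i++) {
		const struct tpacket3_hdr *h = (const struct tpacket3_hdr *)p;

		total += h->tp_snaplen;
		p += h->tp_next_offset;	/* step to the next packet in this block */
	}
	return total;
}
```

A real reader would only call this once block_status has TP_STATUS_USER set, and would hand the block back to the kernel by writing TP_STATUS_KERNEL when done; both steps are left out here to keep the sketch short.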
* Re: AF_PACKET mmap() v4...
From: David Miller @ 2015-11-05 15:32 UTC
To: guy; +Cc: netdev, arnd

From: Guy Harris <guy@alum.mit.edu>
Date: Thu, 5 Nov 2015 00:14:51 -0800

> As a core libpcap developer, I will at least state that it works a
> *lot* better than v1 and v2 did.  Having buffer slots that can hold
> only one packet is *really* suboptimal for packet capture; having
> buffer slots that can hold multiple packets works a lot better, as
> there's less wasted memory.

No doubt.

> I'll see whether there are any changes that *would* make libpcap's life better.

Thanks.

* Re: AF_PACKET mmap() v4...
From: Arnd Bergmann @ 2015-11-05 9:07 UTC
To: David Miller; +Cc: netdev

On Thursday 05 November 2015 00:04:14 David Miller wrote:
> As part of fixing y2038 problems, Arnd is going to have to make a new
> version of the AF_PACKET mmap() tpacket descriptors in order to extend
> the time values to 64-bit.
>
> So I want everyone to think about whether there are any other changes
> we might want to make given that we have to make a v4 anyways.
>
> Particularly, I am rather certain that the buffer management could be
> improved.  Some have complained that v3 is kinda awkward to use and/or
> suboptimal in various ways.

I have taken a closer look at the actual timestamp data now, and noticed
that we use __u32 for both tp_sec and ts_sec in the user visible data.
This means that once we fix the internal implementation to use 64-bit
timestamps, we actually won't overflow until 2106 because the 2038 overflow
is only for signed 32-bit numbers as we have in 'struct timespec'.

So the good news is that we can keep the existing v1 through v3 formats
beyond 2038, but only as long as all user space that cares about the
value also interprets it as unsigned.

If we want to have a v4 format anyway, there are a few considerations for
the format of the timestamps:

I generally recommend using __u64 nanoseconds rather than split
second/nanosecond, as that simplifies the code in most cases and makes
it more efficient, unless you actually need the seconds portion on a
system that does not have a 64-bit divide instruction (most 32-bit
architectures).

Also, most subsystems are moving to 'monotonic' (counting seconds from
boot, and not impacted by settimeofday(), leap seconds or ntp jumps)
timestamps, but it's not clear if that is the best choice here, because
it won't work for hardware timestamps that actually use real time.
If we do this, we probably also need a field to store the clockid.

	Arnd

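The signed-versus-unsigned point and the __u64 nanosecond suggestion can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: the helper names (ring_ts_to_ns, ns_to_sec_nsec, sec_as_signed) are hypothetical and do not come from any kernel header, and the conversion to int32_t relies on the usual two's-complement wraparound that common compilers provide.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Hypothetical helper: combine an unsigned 32-bit seconds field (as
 * tp_sec is declared in the existing ring headers) and a nanoseconds
 * field into one 64-bit nanosecond count.
 */
static inline uint64_t ring_ts_to_ns(uint32_t sec, uint32_t nsec)
{
	assert(nsec < NSEC_PER_SEC);
	return (uint64_t)sec * NSEC_PER_SEC + nsec;
}

/* Going back to split seconds/nanoseconds needs a 64-bit division,
 * which is the cost mentioned above for most 32-bit architectures.
 */
static inline void ns_to_sec_nsec(uint64_t ns, uint64_t *sec, uint32_t *nsec)
{
	*sec = ns / NSEC_PER_SEC;
	*nsec = (uint32_t)(ns % NSEC_PER_SEC);
}

/* What the same bit pattern looks like to user space that wrongly
 * treats the seconds field as signed: values past 2038 go negative.
 * (Assumes two's-complement conversion, as on common compilers.)
 */
static inline int64_t sec_as_signed(uint32_t sec)
{
	return (int64_t)(int32_t)sec;
}
```

With these helpers, a seconds value of 0x80000000 (the first second after the signed 32-bit limit in 2038) stays a large positive number when read as unsigned, while sec_as_signed() turns it into INT32_MIN, which is the misinterpretation the existing formats only avoid if every consumer reads the field as unsigned.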
* Re: AF_PACKET mmap() v4...
From: Daniel Borkmann @ 2015-11-05 9:39 UTC
To: Arnd Bergmann, David Miller; +Cc: netdev

On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
> On Thursday 05 November 2015 00:04:14 David Miller wrote:
>> As part of fixing y2038 problems, Arnd is going to have to make a new
>> version of the AF_PACKET mmap() tpacket descriptors in order to extend
>> the time values to 64-bit.
>>
>> So I want everyone to think about whether there are any other changes
>> we might want to make given that we have to make a v4 anyways.
>>
>> Particularly, I am rather certain that the buffer management could be
>> improved.  Some have complained that v3 is kinda awkward to use and/or
>> suboptimal in various ways.
>
> I have taken a closer look at the actual timestamp data now, and noticed
> that we use __u32 for both tp_sec and ts_sec in the user visible data.
> This means that once we fix the internal implementation to use 64-bit
> timestamps, we actually won't overflow until 2106 because the 2038 overflow
> is only for signed 32-bit numbers as we have in 'struct timespec'.
>
> So the good news is that we can keep the existing v1 through v3 formats
> beyond 2038, but only as long as all user space that cares about the
> value also interprets it as unsigned.

Right, I was just about to ask that.  So we could just make a union in
AF_PACKET's UAPI for a single 64-bit variable (as in ktime_t) to fix that.

Best,
Daniel

* Re: AF_PACKET mmap() v4...
From: Eric Dumazet @ 2015-11-05 11:38 UTC
To: Daniel Borkmann; +Cc: Arnd Bergmann, David Miller, netdev

On Thu, 2015-11-05 at 10:39 +0100, Daniel Borkmann wrote:
> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
> > On Thursday 05 November 2015 00:04:14 David Miller wrote:
> >> As part of fixing y2038 problems, Arnd is going to have to make a new
> >> version of the AF_PACKET mmap() tpacket descriptors in order to extend
> >> the time values to 64-bit.
> >>
> >> So I want everyone to think about whether there are any other changes
> >> we might want to make given that we have to make a v4 anyways.
> >>
> >> Particularly, I am rather certain that the buffer management could be
> >> improved.  Some have complained that v3 is kinda awkward to use and/or
> >> suboptimal in various ways.
> >
> > I have taken a closer look at the actual timestamp data now, and noticed
> > that we use __u32 for both tp_sec and ts_sec in the user visible data.
> > This means that once we fix the internal implementation to use 64-bit
> > timestamps, we actually won't overflow until 2106 because the 2038 overflow
> > is only for signed 32-bit numbers as we have in 'struct timespec'.
> >
> > So the good news is that we can keep the existing v1 through v3 formats
> > beyond 2038, but only as long as all user space that cares about the
> > value also interprets it as unsigned.
>
> Right, I was just about to ask that.  So we could just make a union in
> AF_PACKET's UAPI for a single 64-bit variable (as in ktime_t) to fix that.

If I am not mistaken, af_packet also lacks the ability to properly set
skb->protocol.

I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
tests for the TCP listener rewrite.

The bonding hash function might use the flow dissector, but as this flow
dissection depends on skb->protocol, all the traffic is directed to a
single slave.

* Re: AF_PACKET mmap() v4...
From: Daniel Borkmann @ 2015-11-05 12:56 UTC
To: Eric Dumazet; +Cc: Arnd Bergmann, David Miller, netdev, tklauser

On 11/05/2015 12:38 PM, Eric Dumazet wrote:
> On Thu, 2015-11-05 at 10:39 +0100, Daniel Borkmann wrote:
>> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
>>> On Thursday 05 November 2015 00:04:14 David Miller wrote:
>>>> As part of fixing y2038 problems, Arnd is going to have to make a new
>>>> version of the AF_PACKET mmap() tpacket descriptors in order to extend
>>>> the time values to 64-bit.
>>>>
>>>> So I want everyone to think about whether there are any other changes
>>>> we might want to make given that we have to make a v4 anyways.
>>>>
>>>> Particularly, I am rather certain that the buffer management could be
>>>> improved.  Some have complained that v3 is kinda awkward to use and/or
>>>> suboptimal in various ways.
>>>
>>> I have taken a closer look at the actual timestamp data now, and noticed
>>> that we use __u32 for both tp_sec and ts_sec in the user visible data.
>>> This means that once we fix the internal implementation to use 64-bit
>>> timestamps, we actually won't overflow until 2106 because the 2038 overflow
>>> is only for signed 32-bit numbers as we have in 'struct timespec'.
>>>
>>> So the good news is that we can keep the existing v1 through v3 formats
>>> beyond 2038, but only as long as all user space that cares about the
>>> value also interprets it as unsigned.
>>
>> Right, I was just about to ask that.  So we could just make a union in
>> AF_PACKET's UAPI for a single 64-bit variable (as in ktime_t) to fix that.
>
> If I am not mistaken, af_packet also lacks the ability to properly set
> skb->protocol
>
> I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
> tests for TCP listener rewrite.
>
> The bonding hash function might use the flow dissector, but as this flow
> dissection depends on skb->protocol, all the traffic is directed to a
> single slave.

Right, if I see this correctly, when you trigger the flushing of TX_RING
via sendmsg(), one can hand over a sockaddr_ll, from which we take
sll_protocol and tag every skb's skb->protocol with it in tpacket_fill_skb()
for the current flushing run.  Otherwise, we use the po->num specified at
socket creation / bind time for everything (trafgen case).

If needed on a per skb basis, perhaps we could map some tpacket_hdr{,2}
member that is not used from TX_RING side (perhaps union on tp_snaplen)?

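The sendmsg()-with-sockaddr_ll path described above might look roughly like this from user space. This is a sketch under assumptions, not a recommended recipe: make_flush_addr() and flush_tx_ring() are hypothetical helper names, and using a zero-length sendto() to kick the TX ring is an assumption of this example rather than something spelled out in the thread. The protocol given here applies to the whole flush run, which is exactly the per-run granularity being discussed.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>

/* Hypothetical helper: build the link-layer address handed to the
 * flush call.  sll_protocol is in network byte order and is what ends
 * up tagged on every skb of this flush run.
 */
static inline struct sockaddr_ll make_flush_addr(int ifindex,
						 unsigned short ethertype)
{
	struct sockaddr_ll sll;

	assert(ifindex > 0);
	memset(&sll, 0, sizeof(sll));
	sll.sll_family = AF_PACKET;
	sll.sll_ifindex = ifindex;
	sll.sll_protocol = htons(ethertype);
	return sll;
}

/* Hypothetical helper: ask the kernel to transmit whatever frames are
 * queued in the TX ring, passing the address built above.  No payload
 * is sent through the call itself; the data lives in the ring.
 */
static inline ssize_t flush_tx_ring(int fd, const struct sockaddr_ll *sll)
{
	return sendto(fd, NULL, 0, 0,
		      (const struct sockaddr *)sll, sizeof(*sll));
}
```

An application mixing IPv4 and IPv6 frames in one ring cannot express that with a single sll_protocol per flush, which is why a per-slot field or inference from the Ethernet header comes up in the replies.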
* Re: AF_PACKET mmap() v4...
From: Eric Dumazet @ 2015-11-05 16:17 UTC
To: Daniel Borkmann; +Cc: Arnd Bergmann, David Miller, netdev, tklauser

On Thu, 2015-11-05 at 13:56 +0100, Daniel Borkmann wrote:
> On 11/05/2015 12:38 PM, Eric Dumazet wrote:

> > If I am not mistaken, af_packet also lacks the ability to properly set
> > skb->protocol
> >
> > I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
> > tests for TCP listener rewrite.
> >
> > The bonding hash function might use the flow dissector, but as this flow
> > dissection depends on skb->protocol, all the traffic is directed to a
> > single slave.
>
> Right, if I see this correctly, when you trigger the flushing of TX_RING
> via sendmsg(), one can hand over a sockaddr_ll, where we infer sll_protocol
> and tag every skb's skb->protocol with that in tpacket_fill_skb() for the
> current flushing run. Otherwise, we use the po->num specified at socket
> creation / bind time for everything (trafgen case).
>
> If needed on a per skb basis, perhaps we could map some tpacket_hdr{,2}
> member that is not used from TX_RING side (perhaps union on tp_snaplen)?

If po->num is 0 (as in the trafgen case), we could also get the proto from
the Ethernet header provided by the user.

The skb_probe_transport_header() call from tpacket_fill_skb() is useless
in the current kernel.

Let's say an application wants to mix IPv6 and IPv4 packets, using a
single TX ring....

* Re: AF_PACKET mmap() v4...
From: Daniel Borkmann @ 2015-11-05 22:56 UTC
To: Eric Dumazet; +Cc: Arnd Bergmann, David Miller, netdev, tklauser

On 11/05/2015 05:17 PM, Eric Dumazet wrote:
> On Thu, 2015-11-05 at 13:56 +0100, Daniel Borkmann wrote:
>> On 11/05/2015 12:38 PM, Eric Dumazet wrote:
>
>>> If I am not mistaken, af_packet also lacks the ability to properly set
>>> skb->protocol
>>>
>>> I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
>>> tests for TCP listener rewrite.
>>>
>>> The bonding hash function might uses flow dissector, but as this flow
>>> dissection depends on skb->protocol, all the traffic is directed on a
>>> single slave.
>>
>> Right, if I see this correctly, when you trigger the flushing of TX_RING
>> via sendmsg(), one can hand over a sockaddr_ll, where we infer sll_protocol
>> and tag every skb's skb->protocol with that in tpacket_fill_skb() for the
>> current flushing run. Otherwise, we use the po->num specified at socket
>> creation / bind time for everything (trafgen case).
>>
>> If needed on a per skb basis, perhaps we could map some tpacket_hdr{,2}
>> member that is not used from TX_RING side (perhaps union on tp_snaplen)?
>
> If po->num is 0 (as in trafgen case), we could also get the proto from
> Ethernet header provided by the user.
>
> The skb_probe_transport_header() call from tpacket_fill_skb() is useless
> in the current kernel.
>
> Let say an application wants to mix IPv6 and IPv4 packets, using a
> single TX ring....

Sorry for the late answer.

For the skb->protocol issue, perhaps something like this. Also noticed that
we should rather do the vlan check when we have the actual linear data from
the ring slot, the current way seems buggy if I see this correctly. Both
patches squashed below.

Thanks,
Daniel

 net/packet/af_packet.c | 27 ++++++++++++++-------------
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index 691660b..8415ebd 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2399,8 +2399,22 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	} else {
 		data = ph.raw + po->tp_hdrlen - sizeof(struct sockaddr_ll);
 	}
+
 	to_write = tp_len;
 
+	/* If skb->protocol is still 0, try to infer it. */
+	if (!skb->protocol && tp_len >= sizeof(struct ethhdr))
+		skb->protocol = ((struct ethhdr *)data)->h_proto;
+	if (tp_len > dev->mtu + dev->hard_header_len) {
+		/* Earlier code assumed this would be a VLAN pkt,
+		 * double-check this now that we have the actual
+		 * (linear) packet data at hand.
+		 */
+		if (unlikely(((struct ethhdr *)data)->h_proto !=
+			     htons(ETH_P_8021Q)))
+			return -EMSGSIZE;
+	}
+
 	if (sock->type == SOCK_DGRAM) {
 		err = dev_hard_header(skb, dev, ntohs(proto), addr,
 				NULL, tp_len);
@@ -2524,19 +2538,6 @@ static int tpacket_snd(struct packet_sock *po, struct msghdr *msg)
 		}
 		tp_len = tpacket_fill_skb(po, skb, ph, dev, size_max, proto,
 					  addr, hlen);
-		if (likely(tp_len >= 0) &&
-		    tp_len > dev->mtu + dev->hard_header_len) {
-			struct ethhdr *ehdr;
-			/* Earlier code assumed this would be a VLAN pkt,
-			 * double-check this now that we have the actual
-			 * packet in hand.
-			 */
-
-			skb_reset_mac_header(skb);
-			ehdr = eth_hdr(skb);
-			if (ehdr->h_proto != htons(ETH_P_8021Q))
-				tp_len = -EMSGSIZE;
-		}
 		if (unlikely(tp_len < 0)) {
 			if (po->tp_loss) {
 				__packet_set_status(po, ph,
-- 
1.9.3

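The two checks in the patch above (infer the protocol from the Ethernet header, and only accept an oversized frame if it is 802.1Q tagged) can be mirrored in a small user-space sketch. It is not kernel code: frame_ethertype() and frame_len_ok() are hypothetical names, the constants are defined locally, and the frame is assumed to start with a plain 14-byte Ethernet header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_ETH_HLEN   14		/* dst MAC + src MAC + EtherType */
#define SKETCH_ETH_8021Q  0x8100	/* VLAN tag protocol identifier */

/* Hypothetical helper: return the EtherType of a raw frame in host
 * byte order, or 0 if the frame is too short to carry an Ethernet
 * header.  This mirrors the "infer skb->protocol" step.
 */
static inline uint16_t frame_ethertype(const uint8_t *frame, size_t len)
{
	assert(frame != NULL);
	if (len < SKETCH_ETH_HLEN)
		return 0;
	return (uint16_t)((frame[12] << 8) | frame[13]);
}

/* Hypothetical helper: a frame longer than mtu + hard_header_len is
 * accepted only when it is VLAN tagged.  Returns 0 if acceptable and
 * -1 otherwise (the patch returns -EMSGSIZE in that case).  Only the
 * header bytes are read, so len describes the frame, not the buffer.
 */
static inline int frame_len_ok(const uint8_t *frame, size_t len,
			       size_t mtu, size_t hard_header_len)
{
	if (len <= mtu + hard_header_len)
		return 0;
	return frame_ethertype(frame, len) == SKETCH_ETH_8021Q ? 0 : -1;
}
```

Reading the type from the frame itself is what lets a single TX ring carry a mix of IPv4 (0x0800) and IPv6 (0x86DD) frames without the sender naming one protocol for the whole run.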
* Re: AF_PACKET mmap() v4...
From: Daniel Borkmann @ 2015-11-06 11:34 UTC
To: Eric Dumazet; +Cc: Arnd Bergmann, David Miller, netdev, tklauser

On 11/05/2015 11:56 PM, Daniel Borkmann wrote:
> On 11/05/2015 05:17 PM, Eric Dumazet wrote:
>> On Thu, 2015-11-05 at 13:56 +0100, Daniel Borkmann wrote:
>>> On 11/05/2015 12:38 PM, Eric Dumazet wrote:
>>
>>>> If I am not mistaken, af_packet also lacks the ability to properly set
>>>> skb->protocol
>>>>
>>>> I noticed this using trafgen on a bonding device, when I did my SYNFLOOD
>>>> tests for TCP listener rewrite.
>>>>
>>>> The bonding hash function might uses flow dissector, but as this flow
>>>> dissection depends on skb->protocol, all the traffic is directed on a
>>>> single slave.
>>>
>>> Right, if I see this correctly, when you trigger the flushing of TX_RING
>>> via sendmsg(), one can hand over a sockaddr_ll, where we infer sll_protocol
>>> and tag every skb's skb->protocol with that in tpacket_fill_skb() for the
>>> current flushing run. Otherwise, we use the po->num specified at socket
>>> creation / bind time for everything (trafgen case).
>>>
>>> If needed on a per skb basis, perhaps we could map some tpacket_hdr{,2}
>>> member that is not used from TX_RING side (perhaps union on tp_snaplen)?
>>
>> If po->num is 0 (as in trafgen case), we could also get the proto from
>> Ethernet header provided by the user.
>>
>> The skb_probe_transport_header() call from tpacket_fill_skb() is useless
>> in the current kernel.
>>
>> Let say an application wants to mix IPv6 and IPv4 packets, using a
>> single TX ring....
>
> Sorry for the late answer.
>
> For the skb->protocol issue, perhaps something like this. Also noticed that
> we should rather do the vlan check when we have the actual linear data from
> the ring slot, the current way seems buggy if I see this correctly. Both
> patches squashed below.

Hmm, I believe there's another bug in TX_RING with SOCK_DGRAM, will do some
more experiments and post some patches later on.

Best,
Daniel

* Re: AF_PACKET mmap() v4...
From: Alexei Starovoitov @ 2015-11-08 2:19 UTC
To: Daniel Borkmann; +Cc: Arnd Bergmann, David Miller, netdev

On Thu, Nov 05, 2015 at 10:39:15AM +0100, Daniel Borkmann wrote:
> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
> >On Thursday 05 November 2015 00:04:14 David Miller wrote:
> >>As part of fixing y2038 problems, Arnd is going to have to make a new
> >>version of the AF_PACKET mmap() tpacket descriptors in order to extend
> >>the time values to 64-bit.

It would also be quite useful to add the ability to attach metadata to a
packet from a bpf program.
Right now we can only trim the length.  It would be great if the program
could compute something and pass it along with the packet as metadata.

* Re: AF_PACKET mmap() v4...
From: John Fastabend @ 2015-11-08 4:27 UTC
To: Alexei Starovoitov, Daniel Borkmann; +Cc: Arnd Bergmann, David Miller, netdev

On 15-11-07 06:19 PM, Alexei Starovoitov wrote:
> On Thu, Nov 05, 2015 at 10:39:15AM +0100, Daniel Borkmann wrote:
>> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
>>> On Thursday 05 November 2015 00:04:14 David Miller wrote:
>>>> As part of fixing y2038 problems, Arnd is going to have to make a new
>>>> version of the AF_PACKET mmap() tpacket descriptors in order to extend
>>>> the time values to 64-bit.
>
> would also be quite useful to add ability to attach metadata to packet
> from bpf program.
> Right now we can only trim the length. Would be great if program could
> compute something and pass it along with packet as metadata.

Also, most modern NICs can generate metadata using packet filters; it
would be nice to allow these to populate any metadata fields as well.
Ethtool already has a flow classifier feature that could be easily
extended once the stack has support.

.John

* Re: AF_PACKET mmap() v4...
From: Daniel Borkmann @ 2015-11-09 10:54 UTC
To: John Fastabend, Alexei Starovoitov
Cc: Arnd Bergmann, David Miller, netdev, horms

On 11/08/2015 05:27 AM, John Fastabend wrote:
> On 15-11-07 06:19 PM, Alexei Starovoitov wrote:
>> On Thu, Nov 05, 2015 at 10:39:15AM +0100, Daniel Borkmann wrote:
>>> On 11/05/2015 10:07 AM, Arnd Bergmann wrote:
>>>> On Thursday 05 November 2015 00:04:14 David Miller wrote:
>>>>> As part of fixing y2038 problems, Arnd is going to have to make a new
>>>>> version of the AF_PACKET mmap() tpacket descriptors in order to extend
>>>>> the time values to 64-bit.
>>
>> would also be quite useful to add ability to attach metadata to packet
>> from bpf program.
>> Right now we can only trim the length. Would be great if program could
>> compute something and pass it along with packet as metadata.
>
> Also most modern NICs can generate metadata using packet filters
> it would be nice to allow these to populate any metadata fields as well.
> Ethtool already has a flow classifier feature that could be easily
> extended once the stack has support.

If I understand this correctly, that would be something independent from
packet sockets, right?

Attaching metadata to the skb can currently be done via mark, tc_index,
tc_classid, or priority, but I presume you mean something else. ;)

Or do you mean pushing metadata into skb->data, e.g. in front of the
frame?  As in having some sort of 'dynamic-sized', reserved scratch
space or stack at the head or tail of the skb (not visible to the network
itself, but only to the local NIC)?

It would be interesting if it could be used to interact with the NIC, as
John says, e.g. from the incoming side to place a tag or additional
metadata there as a result of the NIC's flow classifier, which might then
be read out from an eBPF program to perform further actions on the skb.

If you even want to take this one step further for data center environments,
there was the idea floating around [1], where you encapsulate a restricted
set of instructions together with some scratch space between the Ethernet
header and the payload.  These "tiny packet programs" could then query
switch metadata on the fly that is being stored into the scratch space.
Of course, this requires vendor support, but this seems really powerful.

  [1] http://jvimal.github.io/tpp/

end of thread, other threads:[~2015-11-09 10:54 UTC | newest]

Thread overview: 14+ messages
2015-11-05  5:04 AF_PACKET mmap() v4 David Miller
2015-11-05  6:53 ` Richard Cochran
2015-11-05  8:14 ` Guy Harris
2015-11-05 15:32   ` David Miller
2015-11-05  9:07 ` Arnd Bergmann
2015-11-05  9:39   ` Daniel Borkmann
2015-11-05 11:38     ` Eric Dumazet
2015-11-05 12:56       ` Daniel Borkmann
2015-11-05 16:17         ` Eric Dumazet
2015-11-05 22:56           ` Daniel Borkmann
2015-11-06 11:34             ` Daniel Borkmann
2015-11-08  2:19     ` Alexei Starovoitov
2015-11-08  4:27       ` John Fastabend
2015-11-09 10:54         ` Daniel Borkmann