* Why does my AF-XDP Socket lose packets whereas a generic linux socket doesn't?
@ 2020-03-15 15:36 Gaul, Maximilian
  2020-03-16  8:38 ` Jesper Dangaard Brouer
  0 siblings, 1 reply; 3+ messages in thread
From: Gaul, Maximilian @ 2020-03-15 15:36 UTC (permalink / raw)
  To: Xdp

I am comparing AF-XDP sockets to generic Linux sockets in terms of how many packets they can process without packet loss (packet loss being defined as: the RTP sequence number of the current packet is not equal to the RTP sequence number of the previous packet `+ 1`).

I noticed that my AF-XDP socket program (I can't determine whether the problem is in the kernel program or the user-space program) loses around `25` packets per second at a rate of around `390,000` packets per second, whereas an equivalent program using a generic Linux socket doesn't lose any packets.

I implemented a so-called `distributor` program which loads the XDP kernel program once, sets up a generic Linux socket and calls `setsockopt(IP_ADD_MEMBERSHIP)` on this socket for every multicast address passed to the program on the command line.
After this, the `distributor` obtains the file descriptor of a `BPF_MAP_TYPE_HASH` defined in the XDP kernel program and inserts routes for the traffic, in case a single AF-XDP socket needs to share its UMEM later on.
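For illustration, the route insertion in the `distributor` might look roughly like this (a non-runnable fragment; `map_fd`, the example multicast address and the queue index are assumptions, and the key layout mirrors the `pckt_idntfy_raw` struct shown below):

    struct pckt_idntfy_raw key = {
        .src_ip = 0,
        .dst_ip = inet_addr("239.0.0.1"),   /* example multicast group */
        .dst_port = htons(5004),            /* example destination port */
        .pad = 0
    };
    int queue_index = 0; /* index of the AF-XDP socket in the XSKMAP */

    /* map_fd: fd of the BPF_MAP_TYPE_HASH obtained from the loaded object */
    if (bpf_map_update_elem(map_fd, &key, &queue_index, BPF_ANY) < 0) {
        fprintf(stderr, "Failed to insert route: %s\n", strerror(errno));
    }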

The XDP kernel program then checks, for each IPv4/UDP packet, whether there is an entry in that hash map. This basically looks like this:

    const struct pckt_idntfy_raw raw = {
        .src_ip = 0, /* not used at the moment */
        .dst_ip = iph->daddr,
        .dst_port = udh->dest,
        .pad = 0
    };
    
    const int *idx = bpf_map_lookup_elem(&xdp_packet_mapping, &raw);

    if(idx != NULL) {
        if (bpf_map_lookup_elem(&xsks_map, idx)) {
            bpf_printk("Found socket @ index: %d!\n", *idx);
            return bpf_redirect_map(&xsks_map, *idx, 0);
        } else {
            bpf_printk("Didn't find connected socket for index %d!\n", *idx);
        }
    }

If `idx` exists, this means that there is a socket sitting behind that index in the `BPF_MAP_TYPE_XSKMAP`.
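For completeness, the two maps referenced above could be declared on the kernel side like this (a sketch in the legacy `bpf_map_def` style used by `samples/bpf` at the time; the `max_entries` values are illustrative):

    struct bpf_map_def SEC("maps") xdp_packet_mapping = {
        .type        = BPF_MAP_TYPE_HASH,
        .key_size    = sizeof(struct pckt_idntfy_raw),
        .value_size  = sizeof(int),          /* index into xsks_map */
        .max_entries = 64,
    };

    struct bpf_map_def SEC("maps") xsks_map = {
        .type        = BPF_MAP_TYPE_XSKMAP,
        .key_size    = sizeof(int),
        .value_size  = sizeof(int),          /* AF-XDP socket fd, filled by user space */
        .max_entries = 64,
    };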

After doing all that, the `distributor` spawns a new process via `fork()`, passing all multicast addresses (including destination ports) that should be processed by that process (one process handles one RX queue). In case there are not enough RX queues, some processes may receive multiple multicast addresses, which means they will use `SHARED UMEM`.

I basically based my AF-XDP user-space program on this example code: https://github.com/torvalds/linux/blob/master/samples/bpf/xdpsock_user.c

I am using the same `xsk_configure_umem`, `xsk_populate_fill_ring` and `xsk_configure_socket` functions.

Because I figured I don't need minimal latency for this application, I put the process to sleep for a specified time (around `1 - 2ms`), after which it loops through every AF-XDP socket (most of the time it is only one socket) and processes every packet received on that socket, verifying that no packets have been missed:

    while(!global_exit) {
        nanosleep(&spec, &remaining);

        for(int i = 0; i < cfg.ip_addrs_len; i++) {
            struct xsk_socket_info *socket = xsk_sockets[i];
            if(atomic_exchange(&socket->stats_sync.lock, 1) == 0) {
                handle_receive_packets(socket);
                atomic_fetch_xor(&socket->stats_sync.lock, 1); /* release socket-lock */
            }
        }
    }
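For reference, a hedged sketch of what `handle_receive_packets` might do when following the `xdpsock_user.c` pattern (the ring-accessor names come from libbpf's `xsk.h`; `struct xsk_socket_info`, its members and `verify_rtp_seq` are assumptions based on the sample code, not the actual implementation):

    static void handle_receive_packets(struct xsk_socket_info *xsk)
    {
        __u32 idx_rx = 0, idx_fq = 0;

        /* batch-dequeue up to 64 descriptors from the RX ring */
        unsigned int rcvd = xsk_ring_cons__peek(&xsk->rx, 64, &idx_rx);
        if (!rcvd)
            return;

        /* reserve fill-queue slots so the frames can be handed back to the kernel */
        while (xsk_ring_prod__reserve(&xsk->umem->fq, rcvd, &idx_fq) != rcvd)
            ; /* fill queue full - retry */

        for (unsigned int i = 0; i < rcvd; i++) {
            const struct xdp_desc *desc = xsk_ring_cons__rx_desc(&xsk->rx, idx_rx++);
            void *pkt = xsk_umem__get_data(xsk->umem->buffer, desc->addr);

            verify_rtp_seq(pkt, desc->len); /* hypothetical sequence-number check */

            /* recycle the frame into the fill queue */
            *xsk_ring_prod__fill_addr(&xsk->umem->fq, idx_fq++) = desc->addr;
        }

        xsk_ring_prod__submit(&xsk->umem->fq, rcvd);
        xsk_ring_cons__release(&xsk->rx, rcvd);
    }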

In my opinion there is nothing too fancy about this, but somehow I lose around `25` packets per second at around `390,000` pps even though my UMEM is close to 1 GB of RAM.

In comparison, my generic Linux socket program looks like this (in short):

    int fd = socket(AF_INET, SOCK_RAW, IPPROTO_UDP);

    /* setting some socket options */

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof(struct sockaddr_in));
    sin.sin_family = AF_INET;
    sin.sin_port = cfg->ip_addrs[0]->pckt.dst_port;
    inet_aton(cfg->ip_addrs[0]->pckt.dst_ip, &sin.sin_addr);
    
    if(bind(fd, (struct sockaddr*)&sin, sizeof(sin)) < 0) {
        fprintf(stderr, "Error on binding socket: %s\n", strerror(errno));
        return -1;
    }
    
    ioctl(fd, SIOCGIFADDR, &intf);

The `distributor` program creates a new process for every given multicast IP when generic Linux sockets are used (since plain sockets offer no mechanism comparable to `SHARED UMEM`, I don't bother with multiple multicast streams per process).
Later on I of course join the multicast group:

    struct ip_mreqn mreq;
    memset(&mreq, 0, sizeof(struct ip_mreqn));
    
    const char *multicast_ip = cfg->ip_addrs[0]->pckt.dst_ip;

    if(inet_pton(AF_INET, multicast_ip, &mreq.imr_multiaddr.s_addr)) {
        /* Local interface address */
        memcpy(&mreq.imr_address, &cfg->ifaddr, sizeof(struct in_addr));
        mreq.imr_ifindex = cfg->ifindex;
        
        if(setsockopt(igmp_socket_fd, IPPROTO_IP, IP_ADD_MEMBERSHIP, &mreq, sizeof(struct ip_mreqn)) < 0) {
            fprintf(stderr, "Failed to set `IP_ADD_MEMBERSHIP`: %s\n", strerror(errno));
            return;
        } else {
            printf("Successfully added Membership for IP: %s\n", multicast_ip);
        }
    }

and start processing packets (not sleeping, but in a busy-loop fashion):

    void read_packets_recvmsg_with_latency(struct config *cfg, struct statistic *st, void *buff, const int igmp_socket_fd) {
        char ctrl[CMSG_SPACE(sizeof(struct timeval))];
    
        struct msghdr msg;
        struct iovec iov;
        memset(&msg, 0, sizeof(msg)); /* start from a zeroed msghdr */
        msg.msg_control = (char*)ctrl;
        msg.msg_controllen = sizeof(ctrl);
        msg.msg_name = &cfg->ifaddr;
        msg.msg_namelen = sizeof(cfg->ifaddr);
    
        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        iov.iov_base = buff;
        iov.iov_len = BUFFER_SIZE;
        
        struct timeval time_user, time_kernel = {0};

        const int64_t read_bytes = recvmsg(igmp_socket_fd, &msg, 0);
        if(read_bytes == -1) {
            return;
        }

        gettimeofday(&time_user, NULL);

        /* walk the control messages via CMSG_FIRSTHDR instead of casting the buffer */
        struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
        if(cmsg != NULL && cmsg->cmsg_level == SOL_SOCKET && cmsg->cmsg_type == SCM_TIMESTAMP) {
            memcpy(&time_kernel, CMSG_DATA(cmsg), sizeof(struct timeval));
        }
        
        if(verify_rtp(cfg, st, read_bytes, buff)) {
            const double timediff = (time_user.tv_sec - time_kernel.tv_sec) * 1000000 + (time_user.tv_usec - time_kernel.tv_usec);
            if(timediff > st->stats.latency_us) {
                st->stats.latency_us = timediff;
            }
        }
    }



    int main(...) {
        ....
        while(!is_global_exit) {
            read_packets_recvmsg_with_latency(&cfg, &st, buffer, igmp_socket_fd);
        }
    }

That's pretty much it.

Please note that in the described case where I start to lose packets I don't use `SHARED UMEM`; it's just a single RX queue receiving a single multicast stream. When I process a smaller multicast stream of around `150,000` pps, the AF-XDP solution doesn't lose any packets. But it also works the other way around: at around `520,000` pps on the same RX queue (using `SHARED UMEM`) I lose about `12,000` pps.

Any ideas what I am missing?


* Re: Why does my AF-XDP Socket lose packets whereas a generic linux socket doesn't?
  2020-03-15 15:36 Why does my AF-XDP Socket lose packets whereas a generic linux socket doesn't? Gaul, Maximilian
@ 2020-03-16  8:38 ` Jesper Dangaard Brouer
  2020-03-19 14:15   ` AW: " Gaul, Maximilian
  0 siblings, 1 reply; 3+ messages in thread
From: Jesper Dangaard Brouer @ 2020-03-16  8:38 UTC (permalink / raw)
  To: Gaul, Maximilian; +Cc: brouer, Xdp, Björn Töpel

On Sun, 15 Mar 2020 15:36:13 +0000
"Gaul, Maximilian" <maximilian.gaul@hm.edu> wrote:

> I am comparing AF-XDP sockets vs Linux Sockets in terms of how many
> packets they can process without packet-loss (packet-loss is defined
> as the RTP-sequence number of the current packet is not equal to the
> RTP-sequence number of the previous packet `+ 1`).
> 
> I noticed that my AF-XDP socket program (I can't determine if this
> problem is related to the kernel program or the user-space program)
> is losing around `25` packets per second at around `390,000` packets
> per second whereas an equivalent program with generic linux sockets
> doesn't lose any packets.
> 
[...]

> Because I figured I don't need maximum latency for this application,
> I send the process to sleep for a specified time (around `1 - 2ms`)
> after which it loops through every AF-XDP socket (most of the time it
> is only one socket) and processes every received packet for that
> socket, verifying that no packets have been missed:
> 
> 	while(!global_exit) {
> 	    nanosleep(&spec, &remaining);
> 
> 		for(int i = 0; i < cfg.ip_addrs_len; i++) {
> 			struct xsk_socket_info *socket = xsk_sockets[i];
> 			if(atomic_exchange(&socket->stats_sync.lock, 1) == 0) {
> 				handle_receive_packets(socket);
> 				atomic_fetch_xor(&socket->stats_sync.lock, 1); /* release socket-lock */
> 			}
> 		}
> 	}

You say that you are sleeping for a specified time around 1 - 2ms.

Have you considered if in the time your programs sleeps, if the
RX-queue can be overflowed?

You say at 390,000 pps drops happen.  At this speed packets arrive
every 2.564 usec (1/390000*10^9 = 2564 ns = 2.564 usec).

What NIC hardware/driver are you using?
And what is the RX-queue size? (ethtool -g)
On Intel's XL710 driver i40e, the default RX-ring size is 512.

The "good-queue" effect is that a queue functions as a shock absorber,
to handle that the OS/CPU is busy doing something else.  If I have 512
RX-queue slots, and packets arriving every 2.564 usec, then I must
return and empty the queue (and re-fill slots) every 1.3 ms
(512 * 2.564 usec = 1312.768 usec = 1.3127 ms).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer


* AW: Why does my AF-XDP Socket lose packets whereas a generic linux socket doesn't?
  2020-03-16  8:38 ` Jesper Dangaard Brouer
@ 2020-03-19 14:15   ` Gaul, Maximilian
  0 siblings, 0 replies; 3+ messages in thread
From: Gaul, Maximilian @ 2020-03-19 14:15 UTC (permalink / raw)
  To: Jesper Dangaard Brouer; +Cc: Xdp, Björn Töpel

On Mon, 16 Mar 2020 09:38, <brouer@redhat.com> wrote:

>On Sun, 15 Mar 2020 15:36:13 +0000
>"Gaul, Maximilian" <maximilian.gaul@hm.edu> wrote:
>
>
>You say that you are sleeping for a specified time around 1 - 2ms.
>
>Have you considered if in the time your programs sleeps, if the
>RX-queue can be overflowed?
>
>You say at 390,000 pps drops happen.  At this speed packets arrive
>every 2.564 usec (1/390000*10^9 = 2564 ns = 2.564 usec).
>
>What NIC hardware/driver are you using?
>And what is the RX-queue size? (ethtool -g)
>On Intel's XL710 driver i40e, the default RX-ring size is 512.
>
>The "good-queue" effect is that a queue functions as a shock absorber,
>to handle that the OS/CPU is busy doing something else.  If I have 512
>RX-queue slots, and packets arriving every 2.564 usec, then I must
>return and empty the queue (and re-fill slots) every 1.3 ms
>(512 * 2.564 usec = 1312.768 usec = 1.3127 ms).
>

Thank you so much for your answer Jesper!

Regarding the size of the RX queue: it is 1024.
I am able to increase it up to 8192, but my tests show that the RX-queue size doesn't change the packet-loss rate unless it is lower than 512 (lost packets increase very slightly when set to 512 instead of 1024).
I also decreased the sleep time of the process from 1ms to 500µs - this didn't change anything either.
I am using a *Mellanox Technologies MT27800 Family [ConnectX-5]*. I did some further tests with the generic Linux socket and it worked fine without any packet loss (but of course I want to use the extended packet-processing capabilities of AF-XDP).
I am not sure, but is it possible that some "side traffic" makes it up to userspace (for example ping packets or IGMP queries), messing up my RTP sequence-number tracking? Even though I filter packets on all four criteria: IP, UDP, valid destination IP and valid destination port:

    const struct pckt_idntfy_raw raw = {
        .src_ip = 0, /* not used at the moment */
        .dst_ip = iph->daddr,
        .dst_port = udh->dest,
        .pad = 0
    };

    const int *idx = bpf_map_lookup_elem(&xdp_packet_mapping, &raw);

    if(idx != NULL) {
        if (bpf_map_lookup_elem(&xsks_map, idx)) {
            return bpf_redirect_map(&xsks_map, *idx, 0);
        }
    }

Best regards

Max

