rte_sched library performance question

* rte_sched library performance question
@ 2017-02-16 15:13 Zoltan Kiss
  2017-02-16 19:08 ` Dumitrescu, Cristian
  0 siblings, 1 reply; 3+ messages in thread
From: Zoltan Kiss @ 2017-02-16 15:13 UTC (permalink / raw)
  To: dev

Hi,

I'm experimenting a little bit with the scheduler library, and I got some
performance numbers which seems to be worse than what I've expected.
I'm sending 64 bytes packets on a 10G interface to a separate thread, and
my simple test program (based on the qos_sched example) does the following:

while (1) {
            uint16_t ret = rte_ring_sc_dequeue_burst(it.ring,
(void**)flushbatch, FLUSH_SIZE);
            rte_mbuf** t = flushbatch;

            if (!ret) {
                /* This call is necessary to make sure the TX completed
mbuf's
                 * are returned to the pool even if there is nothing to
                 * transmit */
                rte_eth_tx_burst(it.portid, lcore, t, 0);
                continue;
            }
            rte_sched_port_enqueue(it.port, flushbatch, ret);
            ret = rte_sched_port_dequeue(it.port, flushbatch, FLUSH_SIZE);
            while (ret) {
                uint16_t n = rte_eth_tx_burst(it.portid, lcore, t, ret);
                /* we cannot drop the packets, so re-send */
                /* update number of packets to be sent */
                ret -= n;
                t = &t[n];
            };
}

I run this on a separate thread, another one doing rx and feeding the
packets to the ring. When I comment out the enqueue and dequeue part in the
code (reducing it to simple l2fwd), I can forward the entire ~14 Mpps
traffic, whilst with the scheduler enabled I can only reach ~5.4 Mpps at
best. I've tried with a single pipe or with 4k (used rand() to randomly
distribute between pipe, everything else (class etc) was set to 0), didn't
make a difference. Is this expected? I'm running this on a Xeon E5-2630 0 @
2.30GHz

I've used the following configuration:

; port configuration [port]

[port]
frame overhead = 24
number of subports per port = 1
number of pipes per subport = 1024
queue sizes = 64 64 64 64

; Subport configuration

[subport 0]
tb rate = 1250000000; Bytes per second
tb size = 1000000000; Bytes
tc 0 rate = 1250000000;     Bytes per second
tc 1 rate = 1250000000;     Bytes per second
tc 2 rate = 1250000000;     Bytes per second
tc 3 rate = 1250000000;     Bytes per second
tc period = 10;             Milliseconds
tc oversubscription period = 1000;     Milliseconds

pipe 0-1024 = 0;        These pipes are configured with pipe profile 0

; Pipe configuration

[pipe profile 0]
tb rate = 1250000000; Bytes per second
tb size = 1000000000; Bytes

tc 0 rate = 1250000000; Bytes per second
tc 1 rate = 1250000000; Bytes per second
tc 2 rate = 1250000000; Bytes per second
tc 3 rate = 1250000000; Bytes per second
tc period = 10; Milliseconds

tc 0 oversubscription weight = 1
tc 1 oversubscription weight = 1
tc 2 oversubscription weight = 1
tc 3 oversubscription weight = 1

tc 0 wrr weights = 1 1 1 1
tc 1 wrr weights = 1 1 1 1
tc 2 wrr weights = 1 1 1 1
tc 3 wrr weights = 1 1 1 1

Regards,

Zoltan

^ permalink raw reply	[flat|nested] 3+ messages in thread