* [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
@ 2013-12-06 14:44 Vincenzo Maffione
From: Vincenzo Maffione @ 2013-12-06 14:44 UTC
  To: qemu-devel
  Cc: peter.maydell, mst, jasowang, mjt, v.maffione, lcapitulino,
	peter.crosthwaite, dmitry, kraxel, yan, edgar.iglesias, akong,
	quintela, agraf, aliguori, marcel.a, sw, stefanha, g.lettieri,
	rizzo, mark.langsdorf, owasserm, pbonzini, afaerber

This patch extends the frontend-backend interface with a new flag,
QEMU_NET_PACKET_FLAG_MORE, that a sender can pass along with a packet.
The flag is a hint to the receiving peer that more packets are about to
follow, so the receiver can accumulate a batch of packets before
forwarding them (to the host if the receiving peer is the backend, or to
the guest if the receiving peer is the frontend).
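
As an illustration of the receive-side contract, here is a minimal sketch
of a receiver that defers its expensive flush while the hint is set. Only
the flag name comes from this patch; toy_backend, toy_flush and
toy_receive are hypothetical stand-ins and the bit value is illustrative.

```c
#include <assert.h>
#include <stddef.h>

/* Flag name taken from the patch; the bit value here is illustrative. */
#define QEMU_NET_PACKET_FLAG_MORE (1u << 1)

/* Hypothetical receiver state, not a QEMU structure. */
struct toy_backend {
    unsigned pending;   /* packets queued since the last flush */
    unsigned flushes;   /* number of (expensive) flush operations */
    unsigned flushed;   /* total packets pushed out so far */
};

/* Stand-in for the costly operation (for netmap, the TXSYNC ioctl). */
static void toy_flush(struct toy_backend *b)
{
    b->flushed += b->pending;
    b->pending = 0;
    b->flushes++;
}

/* Queue one packet; flush only when the sender signals the end of a batch. */
static size_t toy_receive(struct toy_backend *b, size_t size, unsigned flags)
{
    assert(b != NULL);
    b->pending++;
    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
        toy_flush(b);
    }
    return size;
}
```

A receiver that ignores the flag stays correct, since it simply flushes
on every packet as before; the hint only allows it to flush less often.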

The patch also implements a batching mechanism for the netmap backend (on the
backend receive side) and for the e1000 and virtio frontends (on the frontend
transmit side).
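
On the transmit side, the frontend has to decide for each packet whether
another one is pending. The sketch below shows one way to do this for a
descriptor ring, loosely modelled on the e1000 change in this patch. The
helper toy_tx_flags is hypothetical, and it takes the ring length as a
descriptor count, whereas the real code derives it from TDLEN in bytes.

```c
#include <assert.h>
#include <stdint.h>

/* Flag name taken from the patch; the bit value here is illustrative. */
#define QEMU_NET_PACKET_FLAG_MORE (1u << 1)

/*
 * Hypothetical helper: return the flags for the packet at descriptor
 * `head` in a ring of `ring_len` descriptors whose tail is `tail`.
 * The packet is the last of the batch when advancing the head (with
 * wrap-around) reaches the tail.
 */
static unsigned toy_tx_flags(uint32_t head, uint32_t tail, uint32_t ring_len)
{
    uint32_t next;

    assert(ring_len > 0 && head < ring_len);
    next = head + 1;
    if (next >= ring_len) {
        next = 0;               /* wrap around the end of the ring */
    }
    return next == tail ? 0 : QEMU_NET_PACKET_FLAG_MORE;
}
```

With an 8-descriptor ring and the tail at 3, the packets at descriptors 0
and 1 carry the flag and the packet at descriptor 2 does not.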

Measured improvement of a guest-to-guest UDP_STREAM netperf test (64-byte
packets) with virtio-net frontends:
    820 Kpps ==> 1000 Kpps (+22%).

Measured improvement of a guest-to-guest UDP test (64-byte packets) with
e1000 frontends and netmap clients on the guests:
    1.8 Mpps ==> 3.1 Mpps (+72%).

Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
---
Experiment details:
    - Processor: Intel i7-3770K CPU @ 3.50GHz (4 cores, 8 threads)
    - Memory @ 1333 MHz
    - Host O.S.: Arch Linux with Linux 3.11
    - Guest O.S.: Arch Linux with Linux 3.11

QEMU command line for the virtio experiment:
    qemu-system-x86_64 archdisk.qcow -snapshot -enable-kvm -device virtio-net-pci,ioeventfd=on,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -vga std -m 3G

QEMU command line for the e1000 experiment:
    qemu-system-x86_64 archdisk.qcow -snapshot -enable-kvm -device e1000,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -vga std -m 3G

In the e1000 experiment the guests do not run netperf. They run netmap clients (pkt-gen)
directly on the e1000 adapter, bypassing the guest O.S. network stack.

Other notes:
    - This patch is against the net-next tree (https://github.com/stefanha/qemu.git)
      because the first netmap patch is not yet in QEMU master (AFAIK).
    - Batching could also be implemented on the backend transmit side and the
      frontend receive side. That is left for future work.

 hw/net/cadence_gem.c    |  3 ++-
 hw/net/dp8393x.c        |  5 +++--
 hw/net/e1000.c          | 21 ++++++++++++++++-----
 hw/net/eepro100.c       |  5 +++--
 hw/net/etraxfs_eth.c    |  5 +++--
 hw/net/lan9118.c        |  2 +-
 hw/net/mcf_fec.c        |  5 +++--
 hw/net/mipsnet.c        |  6 ++++--
 hw/net/ne2000.c         |  5 +++--
 hw/net/ne2000.h         |  3 ++-
 hw/net/opencores_eth.c  |  2 +-
 hw/net/pcnet.c          |  8 +++++---
 hw/net/pcnet.h          |  3 ++-
 hw/net/rtl8139.c        |  7 ++++---
 hw/net/smc91c111.c      |  5 +++--
 hw/net/spapr_llan.c     |  2 +-
 hw/net/stellaris_enet.c |  3 ++-
 hw/net/virtio-net.c     | 10 ++++++++--
 hw/net/vmxnet3.c        |  3 ++-
 hw/net/vmxnet_tx_pkt.c  |  4 ++--
 hw/net/xgmac.c          |  2 +-
 hw/net/xilinx_axienet.c |  2 +-
 hw/usb/dev-network.c    |  8 +++++---
 include/net/net.h       | 20 +++++++++++++-------
 include/net/queue.h     |  1 +
 net/dump.c              |  3 ++-
 net/hub.c               | 10 ++++++----
 net/net.c               | 39 +++++++++++++++++++++++----------------
 net/netmap.c            | 17 ++++++++++++-----
 net/slirp.c             |  5 +++--
 net/socket.c            | 10 ++++++----
 net/tap-win32.c         |  2 +-
 net/tap.c               | 12 +++++++-----
 net/vde.c               |  5 +++--
 savevm.c                |  2 +-
 35 files changed, 155 insertions(+), 90 deletions(-)

diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
index 4a355bb..432687a 100644
--- a/hw/net/cadence_gem.c
+++ b/hw/net/cadence_gem.c
@@ -583,7 +583,8 @@ static int gem_mac_address_filter(GemState *s, const uint8_t *packet)
  * gem_receive:
  * Fit a packet handed to us by QEMU into the receive descriptor ring.
  */
-static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size,
+                           unsigned flags)
 {
     unsigned    desc[2];
     hwaddr packet_desc_addr, last_desc_addr;
diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
index 789d385..d8c7da8 100644
--- a/hw/net/dp8393x.c
+++ b/hw/net/dp8393x.c
@@ -415,7 +415,7 @@ static void do_transmit_packets(dp8393xState *s)
             }
         } else {
             /* Transmit packet */
-            qemu_send_packet(nc, s->tx_buffer, tx_len);
+            qemu_send_packet(nc, s->tx_buffer, tx_len, 0);
         }
         s->regs[SONIC_TCR] |= SONIC_TCR_PTX;
 
@@ -723,7 +723,8 @@ static int receive_filter(dp8393xState *s, const uint8_t * buf, int size)
     return -1;
 }
 
-static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
+static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf,
+                           size_t size, unsigned flags)
 {
     dp8393xState *s = qemu_get_nic_opaque(nc);
     uint16_t data[10];
diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index ae63591..5294ec5 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -570,10 +570,19 @@ static void
 e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
 {
     NetClientState *nc = qemu_get_queue(s->nic);
+    uint32_t tdh = s->mac_reg[TDH];
+    unsigned flags = QEMU_NET_PACKET_FLAG_MORE;
+
     if (s->phy_reg[PHY_CTRL] & MII_CR_LOOPBACK) {
-        nc->info->receive(nc, buf, size);
+        nc->info->receive(nc, buf, size, 0);
     } else {
-        qemu_send_packet(nc, buf, size);
+        if (++tdh * sizeof(struct e1000_tx_desc) >= s->mac_reg[TDLEN]) {
+            tdh = 0;
+        }
+        if (tdh == s->mac_reg[TDT]) {
+            flags = 0;
+        }
+        qemu_send_packet(nc, buf, size, flags);
     }
 }
 
@@ -899,7 +908,8 @@ static uint64_t rx_desc_base(E1000State *s)
 }
 
 static ssize_t
-e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
+e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt,
+                  unsigned flags)
 {
     E1000State *s = qemu_get_nic_opaque(nc);
     PCIDevice *d = PCI_DEVICE(s);
@@ -1054,14 +1064,15 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 }
 
 static ssize_t
-e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size,
+              unsigned flags)
 {
     const struct iovec iov = {
         .iov_base = (uint8_t *)buf,
         .iov_len = size
     };
 
-    return e1000_receive_iov(nc, &iov, 1);
+    return e1000_receive_iov(nc, &iov, 1, flags);
 }
 
 static uint32_t
diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
index 3b891ca..9763904 100644
--- a/hw/net/eepro100.c
+++ b/hw/net/eepro100.c
@@ -828,7 +828,7 @@ static void tx_command(EEPRO100State *s)
         }
     }
     TRACE(RXTX, logout("%p sending frame, len=%d,%s\n", s, size, nic_dump(buf, size)));
-    qemu_send_packet(qemu_get_queue(s->nic), buf, size);
+    qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
     s->statistics.tx_good_frames++;
     /* Transmit with bad status would raise an CX/TNO interrupt.
      * (82557 only). Emulation never has bad status. */
@@ -1627,7 +1627,8 @@ static int nic_can_receive(NetClientState *nc)
 #endif
 }
 
-static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
+static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf,
+                           size_t size, unsigned flags)
 {
     /* TODO:
      * - Magic packets should set bit 30 in power management driver register.
diff --git a/hw/net/etraxfs_eth.c b/hw/net/etraxfs_eth.c
index 78ebbbc..6cba74e 100644
--- a/hw/net/etraxfs_eth.c
+++ b/hw/net/etraxfs_eth.c
@@ -525,7 +525,8 @@ static int eth_can_receive(NetClientState *nc)
     return 1;
 }
 
-static ssize_t eth_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t eth_receive(NetClientState *nc, const uint8_t *buf, size_t size,
+                           unsigned flags)
 {
     unsigned char sa_bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
     ETRAXFSEthState *eth = qemu_get_nic_opaque(nc);
@@ -560,7 +561,7 @@ static int eth_tx_push(void *opaque, unsigned char *buf, int len, bool eop)
     ETRAXFSEthState *eth = opaque;
 
     D(printf("%s buf=%p len=%d\n", __func__, buf, len));
-    qemu_send_packet(qemu_get_queue(eth->nic), buf, len);
+    qemu_send_packet(qemu_get_queue(eth->nic), buf, len, 0);
     return len;
 }
 
diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c
index 2315f99..55e06a9 100644
--- a/hw/net/lan9118.c
+++ b/hw/net/lan9118.c
@@ -664,7 +664,7 @@ static void do_tx_packet(lan9118_state *s)
         /* This assumes the receive routine doesn't touch the VLANClient.  */
         lan9118_receive(qemu_get_queue(s->nic), s->txp->data, s->txp->len);
     } else {
-        qemu_send_packet(qemu_get_queue(s->nic), s->txp->data, s->txp->len);
+        qemu_send_packet(qemu_get_queue(s->nic), s->txp->data, s->txp->len, 0);
     }
     s->txp->fifo_used = 0;
 
diff --git a/hw/net/mcf_fec.c b/hw/net/mcf_fec.c
index 4bff3de..14ed0dd 100644
--- a/hw/net/mcf_fec.c
+++ b/hw/net/mcf_fec.c
@@ -174,7 +174,7 @@ static void mcf_fec_do_tx(mcf_fec_state *s)
         if (bd.flags & FEC_BD_L) {
             /* Last buffer in frame.  */
             DPRINTF("Sending packet\n");
-            qemu_send_packet(qemu_get_queue(s->nic), frame, len);
+            qemu_send_packet(qemu_get_queue(s->nic), frame, len, 0);
             ptr = frame;
             frame_size = 0;
             s->eir |= FEC_INT_TXF;
@@ -357,7 +357,8 @@ static int mcf_fec_can_receive(NetClientState *nc)
     return s->rx_enabled;
 }
 
-static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf,
+                               size_t size, unsigned flags)
 {
     mcf_fec_state *s = qemu_get_nic_opaque(nc);
     mcf_fec_bd bd;
diff --git a/hw/net/mipsnet.c b/hw/net/mipsnet.c
index e421b86..7f5d4c4 100644
--- a/hw/net/mipsnet.c
+++ b/hw/net/mipsnet.c
@@ -74,7 +74,8 @@ static int mipsnet_can_receive(NetClientState *nc)
     return !mipsnet_buffer_full(s);
 }
 
-static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf,
+                               size_t size, unsigned flags)
 {
     MIPSnetState *s = qemu_get_nic_opaque(nc);
 
@@ -176,7 +177,8 @@ static void mipsnet_ioport_write(void *opaque, hwaddr addr,
         if (s->tx_written == s->tx_count) {
             /* Send buffer. */
             trace_mipsnet_send(s->tx_count);
-            qemu_send_packet(qemu_get_queue(s->nic), s->tx_buffer, s->tx_count);
+            qemu_send_packet(qemu_get_queue(s->nic), s->tx_buffer,
+                             s->tx_count, 0);
             s->tx_count = s->tx_written = 0;
             s->intctl |= MIPSNET_INTCTL_TXDONE;
             s->busy = 1;
diff --git a/hw/net/ne2000.c b/hw/net/ne2000.c
index 4c32e9e..52af46a 100644
--- a/hw/net/ne2000.c
+++ b/hw/net/ne2000.c
@@ -176,7 +176,8 @@ int ne2000_can_receive(NetClientState *nc)
 
 #define MIN_BUF_SIZE 60
 
-ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
+ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_,
+                       unsigned flags)
 {
     NE2000State *s = qemu_get_nic_opaque(nc);
     int size = size_;
@@ -301,7 +302,7 @@ static void ne2000_ioport_write(void *opaque, uint32_t addr, uint32_t val)
                 /* fail safe: check range on the transmitted length  */
                 if (index + s->tcnt <= NE2000_PMEM_END) {
                     qemu_send_packet(qemu_get_queue(s->nic), s->mem + index,
-                                     s->tcnt);
+                                     s->tcnt, 0);
                 }
                 /* signal end of transfer */
                 s->tsr = ENTSR_PTX;
diff --git a/hw/net/ne2000.h b/hw/net/ne2000.h
index e500306..b62a8f3 100644
--- a/hw/net/ne2000.h
+++ b/hw/net/ne2000.h
@@ -35,6 +35,7 @@ void ne2000_setup_io(NE2000State *s, DeviceState *dev, unsigned size);
 extern const VMStateDescription vmstate_ne2000;
 void ne2000_reset(NE2000State *s);
 int ne2000_can_receive(NetClientState *nc);
-ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_);
+ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_,
+                       unsigned flags);
 
 #endif
diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
index 4118d54..b4328ea 100644
--- a/hw/net/opencores_eth.c
+++ b/hw/net/opencores_eth.c
@@ -503,7 +503,7 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
     if (tx_len > len) {
         memset(buf + len, 0, tx_len - len);
     }
-    qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len);
+    qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len, 0);
 
     if (tx->len_flags & TXD_WR) {
         s->tx_desc = 0;
diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
index 7cb47b3..707ac92 100644
--- a/hw/net/pcnet.c
+++ b/hw/net/pcnet.c
@@ -1019,7 +1019,8 @@ int pcnet_can_receive(NetClientState *nc)
 
 #define MIN_BUF_SIZE 60
 
-ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
+ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_,
+                      unsigned flags)
 {
     PCNetState *s = qemu_get_nic_opaque(nc);
     int is_padr = 0, is_bcast = 0, is_ladr = 0;
@@ -1265,12 +1266,13 @@ static void pcnet_transmit(PCNetState *s)
                 if (BCR_SWSTYLE(s) == 1)
                     add_crc = !GET_FIELD(tmd.status, TMDS, NOFCS);
                 s->looptest = add_crc ? PCNET_LOOPTEST_CRC : PCNET_LOOPTEST_NOCRC;
-                pcnet_receive(qemu_get_queue(s->nic), s->buffer, s->xmit_pos);
+                pcnet_receive(qemu_get_queue(s->nic), s->buffer,
+                              s->xmit_pos, 0);
                 s->looptest = 0;
             } else
                 if (s->nic)
                     qemu_send_packet(qemu_get_queue(s->nic), s->buffer,
-                                     s->xmit_pos);
+                                     s->xmit_pos, 0);
 
             s->csr[0] &= ~0x0008;   /* clear TDMD */
             s->csr[4] |= 0x0004;    /* set TXSTRT */
diff --git a/hw/net/pcnet.h b/hw/net/pcnet.h
index 9dee6f3..a26aacd 100644
--- a/hw/net/pcnet.h
+++ b/hw/net/pcnet.h
@@ -61,7 +61,8 @@ void pcnet_ioport_writel(void *opaque, uint32_t addr, uint32_t val);
 uint32_t pcnet_ioport_readl(void *opaque, uint32_t addr);
 uint32_t pcnet_bcr_readw(PCNetState *s, uint32_t rap);
 int pcnet_can_receive(NetClientState *nc);
-ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_);
+ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_,
+                      unsigned flags);
 void pcnet_set_link_status(NetClientState *nc);
 void pcnet_common_cleanup(PCNetState *d);
 int pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info);
diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
index 7f2b4db..340331f 100644
--- a/hw/net/rtl8139.c
+++ b/hw/net/rtl8139.c
@@ -1195,7 +1195,8 @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
     return size_;
 }
 
-static ssize_t rtl8139_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t rtl8139_receive(NetClientState *nc, const uint8_t *buf,
+                               size_t size, unsigned flags)
 {
     return rtl8139_do_receive(nc, buf, size, 1);
 }
@@ -1814,9 +1815,9 @@ static void rtl8139_transfer_frame(RTL8139State *s, uint8_t *buf, int size,
     else
     {
         if (iov) {
-            qemu_sendv_packet(qemu_get_queue(s->nic), iov, 3);
+            qemu_sendv_packet(qemu_get_queue(s->nic), iov, 3, 0);
         } else {
-            qemu_send_packet(qemu_get_queue(s->nic), buf, size);
+            qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
         }
     }
 }
diff --git a/hw/net/smc91c111.c b/hw/net/smc91c111.c
index a8e29b3..82289aa 100644
--- a/hw/net/smc91c111.c
+++ b/hw/net/smc91c111.c
@@ -242,7 +242,7 @@ static void smc91c111_do_tx(smc91c111_state *s)
             smc91c111_release_packet(s, packetnum);
         else if (s->tx_fifo_done_len < NUM_PACKETS)
             s->tx_fifo_done[s->tx_fifo_done_len++] = packetnum;
-        qemu_send_packet(qemu_get_queue(s->nic), p, len);
+        qemu_send_packet(qemu_get_queue(s->nic), p, len, 0);
     }
     s->tx_fifo_len = 0;
     smc91c111_update(s);
@@ -647,7 +647,8 @@ static int smc91c111_can_receive(NetClientState *nc)
     return 1;
 }
 
-static ssize_t smc91c111_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t smc91c111_receive(NetClientState *nc, const uint8_t *buf,
+                                 size_t size, unsigned flags)
 {
     smc91c111_state *s = qemu_get_nic_opaque(nc);
     int status;
diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
index 1bd6f50..1a50bc1 100644
--- a/hw/net/spapr_llan.c
+++ b/hw/net/spapr_llan.c
@@ -476,7 +476,7 @@ static target_ulong h_send_logical_lan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
         p += VLAN_BD_LEN(bufs[i]);
     }
 
-    qemu_send_packet(qemu_get_queue(dev->nic), lbuf, total_len);
+    qemu_send_packet(qemu_get_queue(dev->nic), lbuf, total_len, 0);
 
     return H_SUCCESS;
 }
diff --git a/hw/net/stellaris_enet.c b/hw/net/stellaris_enet.c
index 9dd77f7..950b455 100644
--- a/hw/net/stellaris_enet.c
+++ b/hw/net/stellaris_enet.c
@@ -83,7 +83,8 @@ static void stellaris_enet_update(stellaris_enet_state *s)
 }
 
 /* TODO: Implement MAC address filtering.  */
-static ssize_t stellaris_enet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t stellaris_enet_receive(NetClientState *nc, const uint8_t *buf,
+                                      size_t size, unsigned flags)
 {
     stellaris_enet_state *s = qemu_get_nic_opaque(nc);
     int n;
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 513c168..b25bc4e 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -938,7 +938,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
     return 0;
 }
 
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf,
+                                  size_t size, unsigned flags)
 {
     VirtIONet *n = qemu_get_nic_opaque(nc);
     VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1079,6 +1080,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
         unsigned int out_num = elem.out_num;
         struct iovec *out_sg = &elem.out_sg[0];
         struct iovec sg[VIRTQUEUE_MAX_SIZE];
+        unsigned flags = QEMU_NET_PACKET_FLAG_MORE;
 
         if (out_num < 1) {
             error_report("virtio-net header not in first element");
@@ -1104,8 +1106,12 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
 
         len = n->guest_hdr_len;
 
+        if (num_packets + 1 >= n->tx_burst || virtio_queue_empty(q->tx_vq)) {
+            flags = 0;
+        }
         ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
-                                      out_sg, out_num, virtio_net_tx_complete);
+                                      out_sg, out_num, virtio_net_tx_complete,
+                                      flags);
         if (ret == 0) {
             virtio_queue_set_notification(q->tx_vq, 0);
             q->async_tx.elem = elem;
diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
index 19687aa..6bd59d0 100644
--- a/hw/net/vmxnet3.c
+++ b/hw/net/vmxnet3.c
@@ -1817,7 +1817,8 @@ vmxnet3_rx_filter_may_indicate(VMXNET3State *s, const void *data,
 }
 
 static ssize_t
-vmxnet3_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+vmxnet3_receive(NetClientState *nc, const uint8_t *buf, size_t size,
+                unsigned flags)
 {
     VMXNET3State *s = qemu_get_nic_opaque(nc);
     size_t bytes_indicated;
diff --git a/hw/net/vmxnet_tx_pkt.c b/hw/net/vmxnet_tx_pkt.c
index f7344c4..12d842e 100644
--- a/hw/net/vmxnet_tx_pkt.c
+++ b/hw/net/vmxnet_tx_pkt.c
@@ -526,7 +526,7 @@ static bool vmxnet_tx_pkt_do_sw_fragmentation(struct VmxnetTxPkt *pkt,
 
         eth_fix_ip4_checksum(l3_iov_base, l3_iov_len);
 
-        qemu_sendv_packet(nc, fragment, dst_idx);
+        qemu_sendv_packet(nc, fragment, dst_idx, 0);
 
         fragment_offset += fragment_len;
 
@@ -559,7 +559,7 @@ bool vmxnet_tx_pkt_send(struct VmxnetTxPkt *pkt, NetClientState *nc)
     if (pkt->has_virt_hdr ||
         pkt->virt_hdr.gso_type == VIRTIO_NET_HDR_GSO_NONE) {
         qemu_sendv_packet(nc, pkt->vec,
-            pkt->payload_frags + VMXNET_TX_PKT_PL_START_FRAG);
+            pkt->payload_frags + VMXNET_TX_PKT_PL_START_FRAG, 0);
         return true;
     }
 
diff --git a/hw/net/xgmac.c b/hw/net/xgmac.c
index 9384fa0..683a5ad 100644
--- a/hw/net/xgmac.c
+++ b/hw/net/xgmac.c
@@ -239,7 +239,7 @@ static void xgmac_enet_send(XgmacState *s)
         frame_size += len;
         if (bd.ctl_stat & 0x20000000) {
             /* Last buffer in frame.  */
-            qemu_send_packet(qemu_get_queue(s->nic), frame, len);
+            qemu_send_packet(qemu_get_queue(s->nic), frame, len, 0);
             ptr = frame;
             frame_size = 0;
             s->regs[DMA_STATUS] |= DMA_STATUS_TI | DMA_STATUS_NIS;
diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
index 3eb7715..9dd44bf 100644
--- a/hw/net/xilinx_axienet.c
+++ b/hw/net/xilinx_axienet.c
@@ -919,7 +919,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t *buf, size_t size)
         buf[write_off + 1] = csum & 0xff;
     }
 
-    qemu_send_packet(qemu_get_queue(s->nic), buf, size);
+    qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
 
     s->stats.tx_bytes += size;
     s->regs[R_IS] |= IS_TX_COMPLETE;
diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
index 4c532b7..253878c 100644
--- a/hw/usb/dev-network.c
+++ b/hw/usb/dev-network.c
@@ -1196,7 +1196,7 @@ static void usb_net_handle_dataout(USBNetState *s, USBPacket *p)
 
     if (!is_rndis(s)) {
         if (p->iov.size < 64) {
-            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf, s->out_ptr);
+            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf, s->out_ptr, 0);
             s->out_ptr = 0;
         }
         return;
@@ -1209,7 +1209,8 @@ static void usb_net_handle_dataout(USBNetState *s, USBPacket *p)
         uint32_t offs = 8 + le32_to_cpu(msg->DataOffset);
         uint32_t size = le32_to_cpu(msg->DataLength);
         if (offs + size <= len)
-            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf + offs, size);
+            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf + offs,
+                             size, 0);
     }
     s->out_ptr -= len;
     memmove(s->out_buf, &s->out_buf[len], s->out_ptr);
@@ -1259,7 +1260,8 @@ static void usb_net_handle_data(USBDevice *dev, USBPacket *p)
     }
 }
 
-static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf,
+                              size_t size, unsigned flags)
 {
     USBNetState *s = qemu_get_nic_opaque(nc);
     uint8_t *in_buf = s->in_buf;
diff --git a/include/net/net.h b/include/net/net.h
index 11e1468..d3f0ad6 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -44,8 +44,10 @@ typedef struct NICConf {
 
 typedef void (NetPoll)(NetClientState *, bool enable);
 typedef int (NetCanReceive)(NetClientState *);
-typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
-typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
+typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t,
+                             unsigned);
+typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int,
+                                unsigned);
 typedef void (NetCleanup) (NetClientState *);
 typedef void (LinkStatusChanged)(NetClientState *);
 typedef void (NetClientDestructor)(NetClientState *);
@@ -110,13 +112,17 @@ typedef void (*qemu_nic_foreach)(NICState *nic, void *opaque);
 void qemu_foreach_nic(qemu_nic_foreach func, void *opaque);
 int qemu_can_send_packet(NetClientState *nc);
 ssize_t qemu_sendv_packet(NetClientState *nc, const struct iovec *iov,
-                          int iovcnt);
+                          int iovcnt, unsigned flags);
 ssize_t qemu_sendv_packet_async(NetClientState *nc, const struct iovec *iov,
-                                int iovcnt, NetPacketSent *sent_cb);
-void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size);
-ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size);
+                                int iovcnt, NetPacketSent *sent_cb,
+                                unsigned flags);
+void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size,
+                      unsigned flags);
+ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size,
+                             unsigned flags);
 ssize_t qemu_send_packet_async(NetClientState *nc, const uint8_t *buf,
-                               int size, NetPacketSent *sent_cb);
+                               int size, NetPacketSent *sent_cb,
+                               unsigned flags);
 void qemu_purge_queued_packets(NetClientState *nc);
 void qemu_flush_queued_packets(NetClientState *nc);
 void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
diff --git a/include/net/queue.h b/include/net/queue.h
index fc02b33..1d136a6 100644
--- a/include/net/queue.h
+++ b/include/net/queue.h
@@ -33,6 +33,7 @@ typedef void (NetPacketSent) (NetClientState *sender, ssize_t ret);
 
 #define QEMU_NET_PACKET_FLAG_NONE  0
 #define QEMU_NET_PACKET_FLAG_RAW  (1<<0)
+#define QEMU_NET_PACKET_FLAG_MORE (1<<1)
 
 NetQueue *qemu_new_net_queue(void *opaque);
 
diff --git a/net/dump.c b/net/dump.c
index 9d3a09e..f718d5c 100644
--- a/net/dump.c
+++ b/net/dump.c
@@ -57,7 +57,8 @@ struct pcap_sf_pkthdr {
     uint32_t len;
 };
 
-static ssize_t dump_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t dump_receive(NetClientState *nc, const uint8_t *buf,
+                            size_t size, unsigned flags)
 {
     DumpState *s = DO_UPCAST(DumpState, nc, nc);
     struct pcap_sf_pkthdr hdr;
diff --git a/net/hub.c b/net/hub.c
index 33a99c9..7adca5d 100644
--- a/net/hub.c
+++ b/net/hub.c
@@ -52,7 +52,7 @@ static ssize_t net_hub_receive(NetHub *hub, NetHubPort *source_port,
             continue;
         }
 
-        qemu_send_packet(&port->nc, buf, len);
+        qemu_send_packet(&port->nc, buf, len, 0);
     }
     return len;
 }
@@ -68,7 +68,7 @@ static ssize_t net_hub_receive_iov(NetHub *hub, NetHubPort *source_port,
             continue;
         }
 
-        qemu_sendv_packet(&port->nc, iov, iovcnt);
+        qemu_sendv_packet(&port->nc, iov, iovcnt, 0);
     }
     return len;
 }
@@ -107,7 +107,8 @@ static int net_hub_port_can_receive(NetClientState *nc)
 }
 
 static ssize_t net_hub_port_receive(NetClientState *nc,
-                                    const uint8_t *buf, size_t len)
+                                    const uint8_t *buf, size_t len,
+                                    unsigned flags)
 {
     NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
 
@@ -115,7 +116,8 @@ static ssize_t net_hub_port_receive(NetClientState *nc,
 }
 
 static ssize_t net_hub_port_receive_iov(NetClientState *nc,
-                                        const struct iovec *iov, int iovcnt)
+                                        const struct iovec *iov, int iovcnt,
+                                        unsigned flags)
 {
     NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
 
diff --git a/net/net.c b/net/net.c
index 9db88cc..65cf5f1 100644
--- a/net/net.c
+++ b/net/net.c
@@ -414,9 +414,10 @@ ssize_t qemu_deliver_packet(NetClientState *sender,
     }
 
     if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
-        ret = nc->info->receive_raw(nc, data, size);
+        ret = nc->info->receive_raw(nc, data, size,
+                                    flags & ~QEMU_NET_PACKET_FLAG_RAW);
     } else {
-        ret = nc->info->receive(nc, data, size);
+        ret = nc->info->receive(nc, data, size, flags);
     }
 
     if (ret == 0) {
@@ -475,32 +476,36 @@ static ssize_t qemu_send_packet_async_with_flags(NetClientState *sender,
 
 ssize_t qemu_send_packet_async(NetClientState *sender,
                                const uint8_t *buf, int size,
-                               NetPacketSent *sent_cb)
+                               NetPacketSent *sent_cb, unsigned flags)
 {
-    return qemu_send_packet_async_with_flags(sender, QEMU_NET_PACKET_FLAG_NONE,
+    return qemu_send_packet_async_with_flags(sender,
+                                             flags | QEMU_NET_PACKET_FLAG_NONE,
                                              buf, size, sent_cb);
 }
 
-void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size)
+void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size,
+                      unsigned flags)
 {
-    qemu_send_packet_async(nc, buf, size, NULL);
+    qemu_send_packet_async(nc, buf, size, NULL, flags);
 }
 
-ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size)
+ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size,
+                             unsigned flags)
 {
-    return qemu_send_packet_async_with_flags(nc, QEMU_NET_PACKET_FLAG_RAW,
+    return qemu_send_packet_async_with_flags(nc,
+                                             QEMU_NET_PACKET_FLAG_RAW | flags,
                                              buf, size, NULL);
 }
 
 static ssize_t nc_sendv_compat(NetClientState *nc, const struct iovec *iov,
-                               int iovcnt)
+                               int iovcnt, unsigned flags)
 {
     uint8_t buffer[NET_BUFSIZE];
     size_t offset;
 
     offset = iov_to_buf(iov, iovcnt, 0, buffer, sizeof(buffer));
 
-    return nc->info->receive(nc, buffer, offset);
+    return nc->info->receive(nc, buffer, offset, flags);
 }
 
 ssize_t qemu_deliver_packet_iov(NetClientState *sender,
@@ -521,9 +526,9 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
     }
 
     if (nc->info->receive_iov) {
-        ret = nc->info->receive_iov(nc, iov, iovcnt);
+        ret = nc->info->receive_iov(nc, iov, iovcnt, flags);
     } else {
-        ret = nc_sendv_compat(nc, iov, iovcnt);
+        ret = nc_sendv_compat(nc, iov, iovcnt, flags);
     }
 
     if (ret == 0) {
@@ -535,7 +540,8 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
 
 ssize_t qemu_sendv_packet_async(NetClientState *sender,
                                 const struct iovec *iov, int iovcnt,
-                                NetPacketSent *sent_cb)
+                                NetPacketSent *sent_cb,
+                                unsigned flags)
 {
     NetQueue *queue;
 
@@ -546,14 +552,15 @@ ssize_t qemu_sendv_packet_async(NetClientState *sender,
     queue = sender->peer->incoming_queue;
 
     return qemu_net_queue_send_iov(queue, sender,
-                                   QEMU_NET_PACKET_FLAG_NONE,
+                                   flags | QEMU_NET_PACKET_FLAG_NONE,
                                    iov, iovcnt, sent_cb);
 }
 
 ssize_t
-qemu_sendv_packet(NetClientState *nc, const struct iovec *iov, int iovcnt)
+qemu_sendv_packet(NetClientState *nc, const struct iovec *iov, int iovcnt,
+                  unsigned flags)
 {
-    return qemu_sendv_packet_async(nc, iov, iovcnt, NULL);
+    return qemu_sendv_packet_async(nc, iov, iovcnt, NULL, flags);
 }
 
 NetClientState *qemu_find_netdev(const char *id)
diff --git a/net/netmap.c b/net/netmap.c
index 0ccc497..0b982a0 100644
--- a/net/netmap.c
+++ b/net/netmap.c
@@ -218,7 +218,8 @@ static void netmap_writable(void *opaque)
 }
 
 static ssize_t netmap_receive(NetClientState *nc,
-      const uint8_t *buf, size_t size)
+                              const uint8_t *buf,
+                              size_t size, unsigned flags)
 {
     NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
     struct netmap_ring *ring = s->me.tx;
@@ -252,13 +253,17 @@ static ssize_t netmap_receive(NetClientState *nc,
     pkt_copy(buf, dst, size);
     ring->cur = NETMAP_RING_NEXT(ring, i);
     ring->avail--;
-    ioctl(s->me.fd, NIOCTXSYNC, NULL);
+
+    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
+        ioctl(s->me.fd, NIOCTXSYNC, NULL);
+    }
 
     return size;
 }
 
 static ssize_t netmap_receive_iov(NetClientState *nc,
-                    const struct iovec *iov, int iovcnt)
+                    const struct iovec *iov, int iovcnt,
+                    unsigned flags)
 {
     NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
     struct netmap_ring *ring = s->me.tx;
@@ -322,7 +327,9 @@ static ssize_t netmap_receive_iov(NetClientState *nc,
     ring->cur = i;
     ring->avail = avail;
 
-    ioctl(s->me.fd, NIOCTXSYNC, NULL);
+    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
+        ioctl(s->me.fd, NIOCTXSYNC, NULL);
+    }
 
     return iov_size(iov, iovcnt);
 }
@@ -368,7 +375,7 @@ static void netmap_send(void *opaque)
         }
 
         iovsize = qemu_sendv_packet_async(&s->nc, s->iov, iovcnt,
-                                            netmap_send_completed);
+                                            netmap_send_completed, 0);
 
         if (iovsize == 0) {
             /* The peer does not receive anymore. Packet is queued, stop
diff --git a/net/slirp.c b/net/slirp.c
index 124e953..a801638 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -103,10 +103,11 @@ void slirp_output(void *opaque, const uint8_t *pkt, int pkt_len)
 {
     SlirpState *s = opaque;
 
-    qemu_send_packet(&s->nc, pkt, pkt_len);
+    qemu_send_packet(&s->nc, pkt, pkt_len, 0);
 }
 
-static ssize_t net_slirp_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t net_slirp_receive(NetClientState *nc, const uint8_t *buf,
+                                 size_t size, unsigned flags)
 {
     SlirpState *s = DO_UPCAST(SlirpState, nc, nc);
 
diff --git a/net/socket.c b/net/socket.c
index fb21e20..acc715a 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -89,7 +89,8 @@ static void net_socket_writable(void *opaque)
     qemu_flush_queued_packets(&s->nc);
 }
 
-static ssize_t net_socket_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t net_socket_receive(NetClientState *nc, const uint8_t *buf,
+                                  size_t size, unsigned flags)
 {
     NetSocketState *s = DO_UPCAST(NetSocketState, nc, nc);
     uint32_t len = htonl(size);
@@ -124,7 +125,8 @@ static ssize_t net_socket_receive(NetClientState *nc, const uint8_t *buf, size_t
     return size;
 }
 
-static ssize_t net_socket_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t net_socket_receive_dgram(NetClientState *nc, const uint8_t *buf,
+                                        size_t size, unsigned flags)
 {
     NetSocketState *s = DO_UPCAST(NetSocketState, nc, nc);
     ssize_t ret;
@@ -211,7 +213,7 @@ static void net_socket_send(void *opaque)
             buf += l;
             size -= l;
             if (s->index >= s->packet_len) {
-                qemu_send_packet(&s->nc, s->buf, s->packet_len);
+                qemu_send_packet(&s->nc, s->buf, s->packet_len, 0);
                 s->index = 0;
                 s->state = 0;
             }
@@ -234,7 +236,7 @@ static void net_socket_send_dgram(void *opaque)
         net_socket_write_poll(s, false);
         return;
     }
-    qemu_send_packet(&s->nc, s->buf, size);
+    qemu_send_packet(&s->nc, s->buf, size, 0);
 }
 
 static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr *localaddr)
diff --git a/net/tap-win32.c b/net/tap-win32.c
index 91e9e84..2d86122 100644
--- a/net/tap-win32.c
+++ b/net/tap-win32.c
@@ -664,7 +664,7 @@ static void tap_win32_send(void *opaque)
 
     size = tap_win32_read(s->handle, &buf, max_size);
     if (size > 0) {
-        qemu_send_packet(&s->nc, buf, size);
+        qemu_send_packet(&s->nc, buf, size, 0);
         tap_win32_free_buffer(s->handle, buf);
     }
 }
diff --git a/net/tap.c b/net/tap.c
index 39c1cda..6d7a02e 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -112,7 +112,7 @@ static ssize_t tap_write_packet(TAPState *s, const struct iovec *iov, int iovcnt
 }
 
 static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov,
-                               int iovcnt)
+                               int iovcnt, unsigned flags)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
     const struct iovec *iovp = iov;
@@ -130,7 +130,8 @@ static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov,
     return tap_write_packet(s, iovp, iovcnt);
 }
 
-static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf,
+                               size_t size, unsigned flags)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
     struct iovec iov[2];
@@ -150,13 +151,14 @@ static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t si
     return tap_write_packet(s, iov, iovcnt);
 }
 
-static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size,
+                           unsigned flags)
 {
     TAPState *s = DO_UPCAST(TAPState, nc, nc);
     struct iovec iov[1];
 
     if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
-        return tap_receive_raw(nc, buf, size);
+        return tap_receive_raw(nc, buf, size, flags);
     }
 
     iov[0].iov_base = (char *)buf;
@@ -203,7 +205,7 @@ static void tap_send(void *opaque)
             size -= s->host_vnet_hdr_len;
         }
 
-        size = qemu_send_packet_async(&s->nc, buf, size, tap_send_completed);
+        size = qemu_send_packet_async(&s->nc, buf, size, tap_send_completed, 0);
         if (size == 0) {
             tap_read_poll(s, false);
         }
diff --git a/net/vde.c b/net/vde.c
index 2a619fb..5629f58 100644
--- a/net/vde.c
+++ b/net/vde.c
@@ -44,11 +44,12 @@ static void vde_to_qemu(void *opaque)
 
     size = vde_recv(s->vde, (char *)buf, sizeof(buf), 0);
     if (size > 0) {
-        qemu_send_packet(&s->nc, buf, size);
+        qemu_send_packet(&s->nc, buf, size, 0);
     }
 }
 
-static ssize_t vde_receive(NetClientState *nc, const uint8_t *buf, size_t size)
+static ssize_t vde_receive(NetClientState *nc, const uint8_t *buf, size_t size,
+                           unsigned flags)
 {
     VDEState *s = DO_UPCAST(VDEState, nc, nc);
     ssize_t ret;
diff --git a/savevm.c b/savevm.c
index 3f912dd..a8d5373 100644
--- a/savevm.c
+++ b/savevm.c
@@ -84,7 +84,7 @@ static void qemu_announce_self_iter(NICState *nic, void *opaque)
 
     len = announce_self_create(buf, nic->conf->macaddr.a);
 
-    qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
+    qemu_send_packet_raw(qemu_get_queue(nic), buf, len, 0);
 }
 
 
-- 
1.8.4.2


* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-06 14:44 [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced Vincenzo Maffione
@ 2013-12-06 16:39 ` Stefan Weil
  2013-12-08 12:11 ` Michael S. Tsirkin
  2013-12-09 12:36 ` Stefan Hajnoczi
  2 siblings, 0 replies; 17+ messages in thread
From: Stefan Weil @ 2013-12-06 16:39 UTC (permalink / raw)
  To: Vincenzo Maffione, qemu-devel
  Cc: peter.maydell, mst, jasowang, mjt, lcapitulino,
	peter.crosthwaite, owasserm, kraxel, yan, edgar.iglesias, akong,
	quintela, agraf, aliguori, marcel.a, stefanha, g.lettieri, rizzo,
	dmitry, mark.langsdorf, pbonzini, afaerber

On 06.12.2013 15:44, Vincenzo Maffione wrote:
> This patch extends the frontend-backend interface so that it is possible
> to pass a new flag (QEMU_NET_PACKET_FLAG_MORE) when sending a packet to the
> other peer. The new flag acts as a hint for the receiving peer, which can
> accumulate a batch of packets before forwarding those packets (to the host
> if the receiving peer is the backend or to the guest if the receiving peer
> is the frontend).
>
> The patch also implements a batching mechanism for the netmap backend (on the
> backend receive side) and for the e1000 and virtio frontends (on the frontend
> transmit side).
>
> Measured improvement of a guest-to-guest UDP_STREAM netperf test (64 bytes
> packets) with virtio-net frontends:
>     820 Kpps ==> 1000 Kpps (+22%).
>
> Measured improvement of a guest-to-guest UDP test (64 bytes packets) with
> e1000 frontends and netmap clients on the guests:
>     1.8 Mpps ==> 3.1 Mpps (+72%).
>
> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
> ---
>

If this patch is wanted, I suggest replacing flag value 0 by
QEMU_NET_PACKET_FLAG_NONE in all function calls.

Instead of type 'unsigned' for flag values, I'd prefer an enum type (for
QEMU_NET_PACKET_FLAG_NONE, QEMU_NET_PACKET_FLAG_RAW, and
QEMU_NET_PACKET_FLAG_MORE). This enum can then be used in the function
prototypes.

I wonder why the define statement for QEMU_NET_PACKET_FLAG_MORE uses
2<<0 instead of 1<<1 or simply 2.
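
A minimal sketch of the enum suggestion above, not code from the patch.
The SKETCH_* names and the two helpers are hypothetical; the values
mirror NONE = 0, RAW = 1<<0 and MORE = 1<<1. The static assertions show
that 2<<0, 1<<1 and 2 are the same value, so the choice is purely one of
readability.

```c
#include <assert.h>

/* Hypothetical enum standing in for the QEMU_NET_PACKET_FLAG_* defines. */
typedef enum SketchPacketFlags {
    SKETCH_FLAG_NONE = 0,
    SKETCH_FLAG_RAW  = 1 << 0,
    SKETCH_FLAG_MORE = 1 << 1,   /* same bit as 2 << 0 */
} SketchPacketFlags;

/* All three spellings denote the same bit. */
static_assert((2 << 0) == (1 << 1), "2<<0 and 1<<1 are equal");
static_assert((1 << 1) == 2, "1<<1 is 2");

/* Combine RAW with caller-supplied flags, in the style of the patched
 * qemu_send_packet_raw(). The result is kept as 'unsigned' because an
 * OR of two enumerators need not be an enumerator itself. */
static unsigned sketch_raw_with(unsigned flags)
{
    return SKETCH_FLAG_RAW | flags;
}

/* Test whether the sender hinted that more packets follow. */
static int sketch_has_more(unsigned flags)
{
    return (flags & SKETCH_FLAG_MORE) != 0;
}
```

With an enum the call sites can pass SKETCH_FLAG_NONE instead of a bare
0, but combined values such as RAW | MORE still need an integer type.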

Regards,
Stefan Weil


* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-06 14:44 [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced Vincenzo Maffione
  2013-12-06 16:39 ` Stefan Weil
@ 2013-12-08 12:11 ` Michael S. Tsirkin
  2013-12-09 10:20   ` Vincenzo Maffione
  2013-12-09 12:36 ` Stefan Hajnoczi
  2 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-12-08 12:11 UTC (permalink / raw)
  To: Vincenzo Maffione
  Cc: peter.maydell, jasowang, mjt, qemu-devel, lcapitulino,
	peter.crosthwaite, dmitry, kraxel, yan, edgar.iglesias, akong,
	quintela, agraf, aliguori, marcel.a, sw, stefanha, g.lettieri,
	rizzo, mark.langsdorf, owasserm, pbonzini, afaerber

On Fri, Dec 06, 2013 at 03:44:33PM +0100, Vincenzo Maffione wrote:
> This patch extends the frontend-backend interface so that it is possible
> to pass a new flag (QEMU_NET_PACKET_FLAG_MORE) when sending a packet to the
> other peer. The new flag acts as a hint for the receiving peer, which can
> accumulate a batch of packets before forwarding those packets (to the host
> if the receiving peer is the backend or to the guest if the receiving peer
> is the frontend).
> 
> The patch also implements a batching mechanism for the netmap backend (on the
> backend receive side) and for the e1000 and virtio frontends (on the frontend
> transmit side).
> 
> Measured improvement of a guest-to-guest UDP_STREAM netperf test (64 bytes
> packets) with virtio-net frontends:
>     820 Kpps ==> 1000 Kpps (+22%).
> 
> Measured improvement of a guest-to-guest UDP test (64 bytes packets) with
> e1000 frontends and netmap clients on the guests:
>     1.8 Mpps ==> 3.1 Mpps (+72%).
> 
> Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>

So we are batching some more, and this helps throughput. However, I wonder
what this does to burstier traffic, such as several TCP streams running
in parallel.
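
For reference, the batching being discussed can be sketched as a small
stand-alone model. It is an illustration under stated assumptions, not
the patch's code: SketchBackend, sketch_receive() and
sketch_send_burst() are hypothetical names, and the counter "syncs"
stands in for the NIOCTXSYNC ioctl that netmap_receive() issues. The
sender sets the MORE hint on every packet except the last one of a
burst, and the receiver defers the sync while the hint is set.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_FLAG_MORE (1u << 1)   /* hypothetical stand-in for the new flag */

typedef struct SketchBackend {
    unsigned pending;    /* packets placed in the ring but not yet synced */
    unsigned syncs;      /* number of simulated NIOCTXSYNC calls */
    unsigned delivered;  /* packets made visible by a sync */
} SketchBackend;

/* Receiver side: queue the packet, sync only when no more are announced. */
static void sketch_receive(SketchBackend *b, unsigned flags)
{
    assert(b != NULL);
    b->pending++;
    if (!(flags & SKETCH_FLAG_MORE)) {
        b->syncs++;
        b->delivered += b->pending;
        b->pending = 0;
    }
}

/* Sender side: hint MORE on every packet of the burst except the last. */
static void sketch_send_burst(SketchBackend *b, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned flags = (i + 1 < n) ? SKETCH_FLAG_MORE : 0;
        sketch_receive(b, flags);
    }
}
```

In this model a burst of n packets costs one sync instead of n, and a
burst of one packet is synced immediately. It says nothing about how
real TCP traffic is affected; that would have to be measured.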


> ---
> Experiment details:
>     - Processor: Intel i7-3770K CPU @ 3.50GHz (8 cores)
>     - Memory @ 1333 MHz
>     - Host O.S.: Archlinux with Linux 3.11
>     - Guest O.S.: Archlinux with Linux 3.11
> 
> QEMU command line for the virtio experiment:
>     qemu-system-x86_64 archdisk.qcow -snapshot -enable-kvm -device virtio-net-pci,ioeventfd=on,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -vga std -m 3G
> 
> QEMU command line for the e1000 experiment:
>     qemu-system-x86_64 archdisk.qcow -snapshot -enable-kvm -device e1000,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -vga std -m 3G
> 
> With the e1000 experiments, we don't use netperf on the guests, but netmap clients (pkt-gen)
> that run directly on the e1000 adapter, bypassing the O.S. stack.
> 
> Other things:
>     - This patch is against the net-next tree (https://github.com/stefanha/qemu.git)
>       because the first netmap patch is not in the qemu master (AFAIK).
>     - The batching can also be implemented on the backend transmit side and frontend
>       receive side. We could do it in the future.
> 
>  hw/net/cadence_gem.c    |  3 ++-
>  hw/net/dp8393x.c        |  5 +++--
>  hw/net/e1000.c          | 21 ++++++++++++++++-----
>  hw/net/eepro100.c       |  5 +++--
>  hw/net/etraxfs_eth.c    |  5 +++--
>  hw/net/lan9118.c        |  2 +-
>  hw/net/mcf_fec.c        |  5 +++--
>  hw/net/mipsnet.c        |  6 ++++--
>  hw/net/ne2000.c         |  5 +++--
>  hw/net/ne2000.h         |  3 ++-
>  hw/net/opencores_eth.c  |  2 +-
>  hw/net/pcnet.c          |  8 +++++---
>  hw/net/pcnet.h          |  3 ++-
>  hw/net/rtl8139.c        |  7 ++++---
>  hw/net/smc91c111.c      |  5 +++--
>  hw/net/spapr_llan.c     |  2 +-
>  hw/net/stellaris_enet.c |  3 ++-
>  hw/net/virtio-net.c     | 10 ++++++++--
>  hw/net/vmxnet3.c        |  3 ++-
>  hw/net/vmxnet_tx_pkt.c  |  4 ++--
>  hw/net/xgmac.c          |  2 +-
>  hw/net/xilinx_axienet.c |  2 +-
>  hw/usb/dev-network.c    |  8 +++++---
>  include/net/net.h       | 20 +++++++++++++-------
>  include/net/queue.h     |  1 +
>  net/dump.c              |  3 ++-
>  net/hub.c               | 10 ++++++----
>  net/net.c               | 39 +++++++++++++++++++++++----------------
>  net/netmap.c            | 17 ++++++++++++-----
>  net/slirp.c             |  5 +++--
>  net/socket.c            | 10 ++++++----
>  net/tap-win32.c         |  2 +-
>  net/tap.c               | 12 +++++++-----
>  net/vde.c               |  5 +++--
>  savevm.c                |  2 +-
>  35 files changed, 155 insertions(+), 90 deletions(-)
> 
> diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
> index 4a355bb..432687a 100644
> --- a/hw/net/cadence_gem.c
> +++ b/hw/net/cadence_gem.c
> @@ -583,7 +583,8 @@ static int gem_mac_address_filter(GemState *s, const uint8_t *packet)
>   * gem_receive:
>   * Fit a packet handed to us by QEMU into the receive descriptor ring.
>   */
> -static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> +                           unsigned flags)
>  {
>      unsigned    desc[2];
>      hwaddr packet_desc_addr, last_desc_addr;
> diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
> index 789d385..d8c7da8 100644
> --- a/hw/net/dp8393x.c
> +++ b/hw/net/dp8393x.c
> @@ -415,7 +415,7 @@ static void do_transmit_packets(dp8393xState *s)
>              }
>          } else {
>              /* Transmit packet */
> -            qemu_send_packet(nc, s->tx_buffer, tx_len);
> +            qemu_send_packet(nc, s->tx_buffer, tx_len, 0);
>          }
>          s->regs[SONIC_TCR] |= SONIC_TCR_PTX;
>  
> @@ -723,7 +723,8 @@ static int receive_filter(dp8393xState *s, const uint8_t * buf, int size)
>      return -1;
>  }
>  
> -static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
> +static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf,
> +                           size_t size, unsigned flags)
>  {
>      dp8393xState *s = qemu_get_nic_opaque(nc);
>      uint16_t data[10];
> diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> index ae63591..5294ec5 100644
> --- a/hw/net/e1000.c
> +++ b/hw/net/e1000.c
> @@ -570,10 +570,19 @@ static void
>  e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
>  {
>      NetClientState *nc = qemu_get_queue(s->nic);
> +    uint32_t tdh = s->mac_reg[TDH];
> +    unsigned flags = QEMU_NET_PACKET_FLAG_MORE;
> +
>      if (s->phy_reg[PHY_CTRL] & MII_CR_LOOPBACK) {
> -        nc->info->receive(nc, buf, size);
> +        nc->info->receive(nc, buf, size, 0);
>      } else {
> -        qemu_send_packet(nc, buf, size);
> +        if (++tdh * sizeof(struct e1000_tx_desc) >= s->mac_reg[TDLEN]) {
> +            tdh = 0;
> +        }
> +        if (tdh == s->mac_reg[TDT]) {
> +            flags = 0;
> +        }
> +        qemu_send_packet(nc, buf, size, flags);
>      }
>  }
>  
> @@ -899,7 +908,8 @@ static uint64_t rx_desc_base(E1000State *s)
>  }
>  
>  static ssize_t
> -e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
> +e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt,
> +                  unsigned flags)
>  {
>      E1000State *s = qemu_get_nic_opaque(nc);
>      PCIDevice *d = PCI_DEVICE(s);
> @@ -1054,14 +1064,15 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
>  }
>  
>  static ssize_t
> -e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> +              unsigned flags)
>  {
>      const struct iovec iov = {
>          .iov_base = (uint8_t *)buf,
>          .iov_len = size
>      };
>  
> -    return e1000_receive_iov(nc, &iov, 1);
> +    return e1000_receive_iov(nc, &iov, 1, flags);
>  }
>  
>  static uint32_t
> diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
> index 3b891ca..9763904 100644
> --- a/hw/net/eepro100.c
> +++ b/hw/net/eepro100.c
> @@ -828,7 +828,7 @@ static void tx_command(EEPRO100State *s)
>          }
>      }
>      TRACE(RXTX, logout("%p sending frame, len=%d,%s\n", s, size, nic_dump(buf, size)));
> -    qemu_send_packet(qemu_get_queue(s->nic), buf, size);
> +    qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
>      s->statistics.tx_good_frames++;
>      /* Transmit with bad status would raise an CX/TNO interrupt.
>       * (82557 only). Emulation never has bad status. */
> @@ -1627,7 +1627,8 @@ static int nic_can_receive(NetClientState *nc)
>  #endif
>  }
>  
> -static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
> +static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf,
> +                           size_t size, unsigned flags)
>  {
>      /* TODO:
>       * - Magic packets should set bit 30 in power management driver register.
> diff --git a/hw/net/etraxfs_eth.c b/hw/net/etraxfs_eth.c
> index 78ebbbc..6cba74e 100644
> --- a/hw/net/etraxfs_eth.c
> +++ b/hw/net/etraxfs_eth.c
> @@ -525,7 +525,8 @@ static int eth_can_receive(NetClientState *nc)
>      return 1;
>  }
>  
> -static ssize_t eth_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t eth_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> +                           unsigned flags)
>  {
>      unsigned char sa_bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
>      ETRAXFSEthState *eth = qemu_get_nic_opaque(nc);
> @@ -560,7 +561,7 @@ static int eth_tx_push(void *opaque, unsigned char *buf, int len, bool eop)
>      ETRAXFSEthState *eth = opaque;
>  
>      D(printf("%s buf=%p len=%d\n", __func__, buf, len));
> -    qemu_send_packet(qemu_get_queue(eth->nic), buf, len);
> +    qemu_send_packet(qemu_get_queue(eth->nic), buf, len, 0);
>      return len;
>  }
>  
> diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c
> index 2315f99..55e06a9 100644
> --- a/hw/net/lan9118.c
> +++ b/hw/net/lan9118.c
> @@ -664,7 +664,7 @@ static void do_tx_packet(lan9118_state *s)
>          /* This assumes the receive routine doesn't touch the VLANClient.  */
>          lan9118_receive(qemu_get_queue(s->nic), s->txp->data, s->txp->len);
>      } else {
> -        qemu_send_packet(qemu_get_queue(s->nic), s->txp->data, s->txp->len);
> +        qemu_send_packet(qemu_get_queue(s->nic), s->txp->data, s->txp->len, 0);
>      }
>      s->txp->fifo_used = 0;
>  
> diff --git a/hw/net/mcf_fec.c b/hw/net/mcf_fec.c
> index 4bff3de..14ed0dd 100644
> --- a/hw/net/mcf_fec.c
> +++ b/hw/net/mcf_fec.c
> @@ -174,7 +174,7 @@ static void mcf_fec_do_tx(mcf_fec_state *s)
>          if (bd.flags & FEC_BD_L) {
>              /* Last buffer in frame.  */
>              DPRINTF("Sending packet\n");
> -            qemu_send_packet(qemu_get_queue(s->nic), frame, len);
> +            qemu_send_packet(qemu_get_queue(s->nic), frame, len, 0);
>              ptr = frame;
>              frame_size = 0;
>              s->eir |= FEC_INT_TXF;
> @@ -357,7 +357,8 @@ static int mcf_fec_can_receive(NetClientState *nc)
>      return s->rx_enabled;
>  }
>  
> -static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf,
> +                               size_t size, unsigned flags)
>  {
>      mcf_fec_state *s = qemu_get_nic_opaque(nc);
>      mcf_fec_bd bd;
> diff --git a/hw/net/mipsnet.c b/hw/net/mipsnet.c
> index e421b86..7f5d4c4 100644
> --- a/hw/net/mipsnet.c
> +++ b/hw/net/mipsnet.c
> @@ -74,7 +74,8 @@ static int mipsnet_can_receive(NetClientState *nc)
>      return !mipsnet_buffer_full(s);
>  }
>  
> -static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf,
> +                               size_t size, unsigned flags)
>  {
>      MIPSnetState *s = qemu_get_nic_opaque(nc);
>  
> @@ -176,7 +177,8 @@ static void mipsnet_ioport_write(void *opaque, hwaddr addr,
>          if (s->tx_written == s->tx_count) {
>              /* Send buffer. */
>              trace_mipsnet_send(s->tx_count);
> -            qemu_send_packet(qemu_get_queue(s->nic), s->tx_buffer, s->tx_count);
> +            qemu_send_packet(qemu_get_queue(s->nic), s->tx_buffer,
> +                             s->tx_count, 0);
>              s->tx_count = s->tx_written = 0;
>              s->intctl |= MIPSNET_INTCTL_TXDONE;
>              s->busy = 1;
> diff --git a/hw/net/ne2000.c b/hw/net/ne2000.c
> index 4c32e9e..52af46a 100644
> --- a/hw/net/ne2000.c
> +++ b/hw/net/ne2000.c
> @@ -176,7 +176,8 @@ int ne2000_can_receive(NetClientState *nc)
>  
>  #define MIN_BUF_SIZE 60
>  
> -ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
> +ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_,
> +                       unsigned flags)
>  {
>      NE2000State *s = qemu_get_nic_opaque(nc);
>      int size = size_;
> @@ -301,7 +302,7 @@ static void ne2000_ioport_write(void *opaque, uint32_t addr, uint32_t val)
>                  /* fail safe: check range on the transmitted length  */
>                  if (index + s->tcnt <= NE2000_PMEM_END) {
>                      qemu_send_packet(qemu_get_queue(s->nic), s->mem + index,
> -                                     s->tcnt);
> +                                     s->tcnt, 0);
>                  }
>                  /* signal end of transfer */
>                  s->tsr = ENTSR_PTX;
> diff --git a/hw/net/ne2000.h b/hw/net/ne2000.h
> index e500306..b62a8f3 100644
> --- a/hw/net/ne2000.h
> +++ b/hw/net/ne2000.h
> @@ -35,6 +35,7 @@ void ne2000_setup_io(NE2000State *s, DeviceState *dev, unsigned size);
>  extern const VMStateDescription vmstate_ne2000;
>  void ne2000_reset(NE2000State *s);
>  int ne2000_can_receive(NetClientState *nc);
> -ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_);
> +ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t size_,
> +                       unsigned flags);
>  
>  #endif
> diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
> index 4118d54..b4328ea 100644
> --- a/hw/net/opencores_eth.c
> +++ b/hw/net/opencores_eth.c
> @@ -503,7 +503,7 @@ static void open_eth_start_xmit(OpenEthState *s, desc *tx)
>      if (tx_len > len) {
>          memset(buf + len, 0, tx_len - len);
>      }
> -    qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len);
> +    qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len, 0);
>  
>      if (tx->len_flags & TXD_WR) {
>          s->tx_desc = 0;
> diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
> index 7cb47b3..707ac92 100644
> --- a/hw/net/pcnet.c
> +++ b/hw/net/pcnet.c
> @@ -1019,7 +1019,8 @@ int pcnet_can_receive(NetClientState *nc)
>  
>  #define MIN_BUF_SIZE 60
>  
> -ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_)
> +ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_,
> +                      unsigned flags)
>  {
>      PCNetState *s = qemu_get_nic_opaque(nc);
>      int is_padr = 0, is_bcast = 0, is_ladr = 0;
> @@ -1265,12 +1266,13 @@ static void pcnet_transmit(PCNetState *s)
>                  if (BCR_SWSTYLE(s) == 1)
>                      add_crc = !GET_FIELD(tmd.status, TMDS, NOFCS);
>                  s->looptest = add_crc ? PCNET_LOOPTEST_CRC : PCNET_LOOPTEST_NOCRC;
> -                pcnet_receive(qemu_get_queue(s->nic), s->buffer, s->xmit_pos);
> +                pcnet_receive(qemu_get_queue(s->nic), s->buffer,
> +                              s->xmit_pos, 0);
>                  s->looptest = 0;
>              } else
>                  if (s->nic)
>                      qemu_send_packet(qemu_get_queue(s->nic), s->buffer,
> -                                     s->xmit_pos);
> +                                     s->xmit_pos, 0);
>  
>              s->csr[0] &= ~0x0008;   /* clear TDMD */
>              s->csr[4] |= 0x0004;    /* set TXSTRT */
> diff --git a/hw/net/pcnet.h b/hw/net/pcnet.h
> index 9dee6f3..a26aacd 100644
> --- a/hw/net/pcnet.h
> +++ b/hw/net/pcnet.h
> @@ -61,7 +61,8 @@ void pcnet_ioport_writel(void *opaque, uint32_t addr, uint32_t val);
>  uint32_t pcnet_ioport_readl(void *opaque, uint32_t addr);
>  uint32_t pcnet_bcr_readw(PCNetState *s, uint32_t rap);
>  int pcnet_can_receive(NetClientState *nc);
> -ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_);
> +ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t size_,
> +                      unsigned flags);
>  void pcnet_set_link_status(NetClientState *nc);
>  void pcnet_common_cleanup(PCNetState *d);
>  int pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo *info);
> diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> index 7f2b4db..340331f 100644
> --- a/hw/net/rtl8139.c
> +++ b/hw/net/rtl8139.c
> @@ -1195,7 +1195,8 @@ static ssize_t rtl8139_do_receive(NetClientState *nc, const uint8_t *buf, size_t
>      return size_;
>  }
>  
> -static ssize_t rtl8139_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t rtl8139_receive(NetClientState *nc, const uint8_t *buf,
> +                               size_t size, unsigned flags)
>  {
>      return rtl8139_do_receive(nc, buf, size, 1);
>  }
> @@ -1814,9 +1815,9 @@ static void rtl8139_transfer_frame(RTL8139State *s, uint8_t *buf, int size,
>      else
>      {
>          if (iov) {
> -            qemu_sendv_packet(qemu_get_queue(s->nic), iov, 3);
> +            qemu_sendv_packet(qemu_get_queue(s->nic), iov, 3, 0);
>          } else {
> -            qemu_send_packet(qemu_get_queue(s->nic), buf, size);
> +            qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
>          }
>      }
>  }
> diff --git a/hw/net/smc91c111.c b/hw/net/smc91c111.c
> index a8e29b3..82289aa 100644
> --- a/hw/net/smc91c111.c
> +++ b/hw/net/smc91c111.c
> @@ -242,7 +242,7 @@ static void smc91c111_do_tx(smc91c111_state *s)
>              smc91c111_release_packet(s, packetnum);
>          else if (s->tx_fifo_done_len < NUM_PACKETS)
>              s->tx_fifo_done[s->tx_fifo_done_len++] = packetnum;
> -        qemu_send_packet(qemu_get_queue(s->nic), p, len);
> +        qemu_send_packet(qemu_get_queue(s->nic), p, len, 0);
>      }
>      s->tx_fifo_len = 0;
>      smc91c111_update(s);
> @@ -647,7 +647,8 @@ static int smc91c111_can_receive(NetClientState *nc)
>      return 1;
>  }
>  
> -static ssize_t smc91c111_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t smc91c111_receive(NetClientState *nc, const uint8_t *buf,
> +                                 size_t size, unsigned flags)
>  {
>      smc91c111_state *s = qemu_get_nic_opaque(nc);
>      int status;
> diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
> index 1bd6f50..1a50bc1 100644
> --- a/hw/net/spapr_llan.c
> +++ b/hw/net/spapr_llan.c
> @@ -476,7 +476,7 @@ static target_ulong h_send_logical_lan(PowerPCCPU *cpu, sPAPREnvironment *spapr,
>          p += VLAN_BD_LEN(bufs[i]);
>      }
>  
> -    qemu_send_packet(qemu_get_queue(dev->nic), lbuf, total_len);
> +    qemu_send_packet(qemu_get_queue(dev->nic), lbuf, total_len, 0);
>  
>      return H_SUCCESS;
>  }
> diff --git a/hw/net/stellaris_enet.c b/hw/net/stellaris_enet.c
> index 9dd77f7..950b455 100644
> --- a/hw/net/stellaris_enet.c
> +++ b/hw/net/stellaris_enet.c
> @@ -83,7 +83,8 @@ static void stellaris_enet_update(stellaris_enet_state *s)
>  }
>  
>  /* TODO: Implement MAC address filtering.  */
> -static ssize_t stellaris_enet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t stellaris_enet_receive(NetClientState *nc, const uint8_t *buf,
> +                                      size_t size, unsigned flags)
>  {
>      stellaris_enet_state *s = qemu_get_nic_opaque(nc);
>      int n;
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 513c168..b25bc4e 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -938,7 +938,8 @@ static int receive_filter(VirtIONet *n, const uint8_t *buf, int size)
>      return 0;
>  }
>  
> -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf,
> +                                  size_t size, unsigned flags)
>  {
>      VirtIONet *n = qemu_get_nic_opaque(nc);
>      VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> @@ -1079,6 +1080,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>          unsigned int out_num = elem.out_num;
>          struct iovec *out_sg = &elem.out_sg[0];
>          struct iovec sg[VIRTQUEUE_MAX_SIZE];
> +        unsigned flags = QEMU_NET_PACKET_FLAG_MORE;
>  
>          if (out_num < 1) {
>              error_report("virtio-net header not in first element");
> @@ -1104,8 +1106,12 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
>  
>          len = n->guest_hdr_len;
>  
> +        if (num_packets + 1 >= n->tx_burst || virtio_queue_empty(q->tx_vq)) {
> +                flags = 0;
> +        }
>          ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
> -                                      out_sg, out_num, virtio_net_tx_complete);
> +                                      out_sg, out_num, virtio_net_tx_complete,
> +                                      flags);
>          if (ret == 0) {
>              virtio_queue_set_notification(q->tx_vq, 0);
>              q->async_tx.elem = elem;
> diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> index 19687aa..6bd59d0 100644
> --- a/hw/net/vmxnet3.c
> +++ b/hw/net/vmxnet3.c
> @@ -1817,7 +1817,8 @@ vmxnet3_rx_filter_may_indicate(VMXNET3State *s, const void *data,
>  }
>  
>  static ssize_t
> -vmxnet3_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +vmxnet3_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> +                unsigned flags)
>  {
>      VMXNET3State *s = qemu_get_nic_opaque(nc);
>      size_t bytes_indicated;
> diff --git a/hw/net/vmxnet_tx_pkt.c b/hw/net/vmxnet_tx_pkt.c
> index f7344c4..12d842e 100644
> --- a/hw/net/vmxnet_tx_pkt.c
> +++ b/hw/net/vmxnet_tx_pkt.c
> @@ -526,7 +526,7 @@ static bool vmxnet_tx_pkt_do_sw_fragmentation(struct VmxnetTxPkt *pkt,
>  
>          eth_fix_ip4_checksum(l3_iov_base, l3_iov_len);
>  
> -        qemu_sendv_packet(nc, fragment, dst_idx);
> +        qemu_sendv_packet(nc, fragment, dst_idx, 0);
>  
>          fragment_offset += fragment_len;
>  
> @@ -559,7 +559,7 @@ bool vmxnet_tx_pkt_send(struct VmxnetTxPkt *pkt, NetClientState *nc)
>      if (pkt->has_virt_hdr ||
>          pkt->virt_hdr.gso_type == VIRTIO_NET_HDR_GSO_NONE) {
>          qemu_sendv_packet(nc, pkt->vec,
> -            pkt->payload_frags + VMXNET_TX_PKT_PL_START_FRAG);
> +            pkt->payload_frags + VMXNET_TX_PKT_PL_START_FRAG, 0);
>          return true;
>      }
>  
> diff --git a/hw/net/xgmac.c b/hw/net/xgmac.c
> index 9384fa0..683a5ad 100644
> --- a/hw/net/xgmac.c
> +++ b/hw/net/xgmac.c
> @@ -239,7 +239,7 @@ static void xgmac_enet_send(XgmacState *s)
>          frame_size += len;
>          if (bd.ctl_stat & 0x20000000) {
>              /* Last buffer in frame.  */
> -            qemu_send_packet(qemu_get_queue(s->nic), frame, len);
> +            qemu_send_packet(qemu_get_queue(s->nic), frame, len, 0);
>              ptr = frame;
>              frame_size = 0;
>              s->regs[DMA_STATUS] |= DMA_STATUS_TI | DMA_STATUS_NIS;
> diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
> index 3eb7715..9dd44bf 100644
> --- a/hw/net/xilinx_axienet.c
> +++ b/hw/net/xilinx_axienet.c
> @@ -919,7 +919,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj, uint8_t *buf, size_t size)
>          buf[write_off + 1] = csum & 0xff;
>      }
>  
> -    qemu_send_packet(qemu_get_queue(s->nic), buf, size);
> +    qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
>  
>      s->stats.tx_bytes += size;
>      s->regs[R_IS] |= IS_TX_COMPLETE;
> diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
> index 4c532b7..253878c 100644
> --- a/hw/usb/dev-network.c
> +++ b/hw/usb/dev-network.c
> @@ -1196,7 +1196,7 @@ static void usb_net_handle_dataout(USBNetState *s, USBPacket *p)
>  
>      if (!is_rndis(s)) {
>          if (p->iov.size < 64) {
> -            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf, s->out_ptr);
> +            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf, s->out_ptr, 0);
>              s->out_ptr = 0;
>          }
>          return;
> @@ -1209,7 +1209,8 @@ static void usb_net_handle_dataout(USBNetState *s, USBPacket *p)
>          uint32_t offs = 8 + le32_to_cpu(msg->DataOffset);
>          uint32_t size = le32_to_cpu(msg->DataLength);
>          if (offs + size <= len)
> -            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf + offs, size);
> +            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf + offs,
> +                             size, 0);
>      }
>      s->out_ptr -= len;
>      memmove(s->out_buf, &s->out_buf[len], s->out_ptr);
> @@ -1259,7 +1260,8 @@ static void usb_net_handle_data(USBDevice *dev, USBPacket *p)
>      }
>  }
>  
> -static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf,
> +                              size_t size, unsigned flags)
>  {
>      USBNetState *s = qemu_get_nic_opaque(nc);
>      uint8_t *in_buf = s->in_buf;
> diff --git a/include/net/net.h b/include/net/net.h
> index 11e1468..d3f0ad6 100644
> --- a/include/net/net.h
> +++ b/include/net/net.h
> @@ -44,8 +44,10 @@ typedef struct NICConf {
>  
>  typedef void (NetPoll)(NetClientState *, bool enable);
>  typedef int (NetCanReceive)(NetClientState *);
> -typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
> -typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
> +typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t,
> +                 unsigned);
> +typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int,
> +                                unsigned);
>  typedef void (NetCleanup) (NetClientState *);
>  typedef void (LinkStatusChanged)(NetClientState *);
>  typedef void (NetClientDestructor)(NetClientState *);
> @@ -110,13 +112,17 @@ typedef void (*qemu_nic_foreach)(NICState *nic, void *opaque);
>  void qemu_foreach_nic(qemu_nic_foreach func, void *opaque);
>  int qemu_can_send_packet(NetClientState *nc);
>  ssize_t qemu_sendv_packet(NetClientState *nc, const struct iovec *iov,
> -                          int iovcnt);
> +                          int iovcnt, unsigned flags);
>  ssize_t qemu_sendv_packet_async(NetClientState *nc, const struct iovec *iov,
> -                                int iovcnt, NetPacketSent *sent_cb);
> -void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size);
> -ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size);
> +                                int iovcnt, NetPacketSent *sent_cb,
> +                                unsigned flags);
> +void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size,
> +                      unsigned flags);
> +ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size,
> +                             unsigned flags);
>  ssize_t qemu_send_packet_async(NetClientState *nc, const uint8_t *buf,
> -                               int size, NetPacketSent *sent_cb);
> +                               int size, NetPacketSent *sent_cb,
> +                               unsigned flags);
>  void qemu_purge_queued_packets(NetClientState *nc);
>  void qemu_flush_queued_packets(NetClientState *nc);
>  void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
> diff --git a/include/net/queue.h b/include/net/queue.h
> index fc02b33..1d136a6 100644
> --- a/include/net/queue.h
> +++ b/include/net/queue.h
> @@ -33,6 +33,7 @@ typedef void (NetPacketSent) (NetClientState *sender, ssize_t ret);
>  
>  #define QEMU_NET_PACKET_FLAG_NONE  0
>  #define QEMU_NET_PACKET_FLAG_RAW  (1<<0)
> +#define QEMU_NET_PACKET_FLAG_MORE (2<<0)
>  
>  NetQueue *qemu_new_net_queue(void *opaque);
>  
> diff --git a/net/dump.c b/net/dump.c
> index 9d3a09e..f718d5c 100644
> --- a/net/dump.c
> +++ b/net/dump.c
> @@ -57,7 +57,8 @@ struct pcap_sf_pkthdr {
>      uint32_t len;
>  };
>  
> -static ssize_t dump_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t dump_receive(NetClientState *nc, const uint8_t *buf,
> +                            size_t size, unsigned flags)
>  {
>      DumpState *s = DO_UPCAST(DumpState, nc, nc);
>      struct pcap_sf_pkthdr hdr;
> diff --git a/net/hub.c b/net/hub.c
> index 33a99c9..7adca5d 100644
> --- a/net/hub.c
> +++ b/net/hub.c
> @@ -52,7 +52,7 @@ static ssize_t net_hub_receive(NetHub *hub, NetHubPort *source_port,
>              continue;
>          }
>  
> -        qemu_send_packet(&port->nc, buf, len);
> +        qemu_send_packet(&port->nc, buf, len, 0);
>      }
>      return len;
>  }
> @@ -68,7 +68,7 @@ static ssize_t net_hub_receive_iov(NetHub *hub, NetHubPort *source_port,
>              continue;
>          }
>  
> -        qemu_sendv_packet(&port->nc, iov, iovcnt);
> +        qemu_sendv_packet(&port->nc, iov, iovcnt, 0);
>      }
>      return len;
>  }
> @@ -107,7 +107,8 @@ static int net_hub_port_can_receive(NetClientState *nc)
>  }
>  
>  static ssize_t net_hub_port_receive(NetClientState *nc,
> -                                    const uint8_t *buf, size_t len)
> +                                    const uint8_t *buf, size_t len,
> +                                    unsigned flags)
>  {
>      NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
>  
> @@ -115,7 +116,8 @@ static ssize_t net_hub_port_receive(NetClientState *nc,
>  }
>  
>  static ssize_t net_hub_port_receive_iov(NetClientState *nc,
> -                                        const struct iovec *iov, int iovcnt)
> +                                        const struct iovec *iov, int iovcnt,
> +                                        unsigned flags)
>  {
>      NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
>  
> diff --git a/net/net.c b/net/net.c
> index 9db88cc..65cf5f1 100644
> --- a/net/net.c
> +++ b/net/net.c
> @@ -414,9 +414,10 @@ ssize_t qemu_deliver_packet(NetClientState *sender,
>      }
>  
>      if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
> -        ret = nc->info->receive_raw(nc, data, size);
> +        ret = nc->info->receive_raw(nc, data, size,
> +                                    flags & ~QEMU_NET_PACKET_FLAG_RAW);
>      } else {
> -        ret = nc->info->receive(nc, data, size);
> +        ret = nc->info->receive(nc, data, size, flags);
>      }
>  
>      if (ret == 0) {
> @@ -475,32 +476,36 @@ static ssize_t qemu_send_packet_async_with_flags(NetClientState *sender,
>  
>  ssize_t qemu_send_packet_async(NetClientState *sender,
>                                 const uint8_t *buf, int size,
> -                               NetPacketSent *sent_cb)
> +                               NetPacketSent *sent_cb, unsigned flags)
>  {
> -    return qemu_send_packet_async_with_flags(sender, QEMU_NET_PACKET_FLAG_NONE,
> +    return qemu_send_packet_async_with_flags(sender,
> +                                             flags | QEMU_NET_PACKET_FLAG_NONE,
>                                               buf, size, sent_cb);
>  }
>  
> -void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size)
> +void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size,
> +                      unsigned flags)
>  {
> -    qemu_send_packet_async(nc, buf, size, NULL);
> +    qemu_send_packet_async(nc, buf, size, NULL, flags);
>  }
>  
> -ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size)
> +ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size,
> +                             unsigned flags)
>  {
> -    return qemu_send_packet_async_with_flags(nc, QEMU_NET_PACKET_FLAG_RAW,
> +    return qemu_send_packet_async_with_flags(nc,
> +                                             QEMU_NET_PACKET_FLAG_RAW | flags,
>                                               buf, size, NULL);
>  }
>  
>  static ssize_t nc_sendv_compat(NetClientState *nc, const struct iovec *iov,
> -                               int iovcnt)
> +                               int iovcnt, unsigned flags)
>  {
>      uint8_t buffer[NET_BUFSIZE];
>      size_t offset;
>  
>      offset = iov_to_buf(iov, iovcnt, 0, buffer, sizeof(buffer));
>  
> -    return nc->info->receive(nc, buffer, offset);
> +    return nc->info->receive(nc, buffer, offset, flags);
>  }
>  
>  ssize_t qemu_deliver_packet_iov(NetClientState *sender,
> @@ -521,9 +526,9 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
>      }
>  
>      if (nc->info->receive_iov) {
> -        ret = nc->info->receive_iov(nc, iov, iovcnt);
> +        ret = nc->info->receive_iov(nc, iov, iovcnt, flags);
>      } else {
> -        ret = nc_sendv_compat(nc, iov, iovcnt);
> +        ret = nc_sendv_compat(nc, iov, iovcnt, flags);
>      }
>  
>      if (ret == 0) {
> @@ -535,7 +540,8 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
>  
>  ssize_t qemu_sendv_packet_async(NetClientState *sender,
>                                  const struct iovec *iov, int iovcnt,
> -                                NetPacketSent *sent_cb)
> +                                NetPacketSent *sent_cb,
> +                                unsigned flags)
>  {
>      NetQueue *queue;
>  
> @@ -546,14 +552,15 @@ ssize_t qemu_sendv_packet_async(NetClientState *sender,
>      queue = sender->peer->incoming_queue;
>  
>      return qemu_net_queue_send_iov(queue, sender,
> -                                   QEMU_NET_PACKET_FLAG_NONE,
> +                                   flags | QEMU_NET_PACKET_FLAG_NONE,
>                                     iov, iovcnt, sent_cb);
>  }
>  
>  ssize_t
> -qemu_sendv_packet(NetClientState *nc, const struct iovec *iov, int iovcnt)
> +qemu_sendv_packet(NetClientState *nc, const struct iovec *iov, int iovcnt,
> +                  unsigned flags)
>  {
> -    return qemu_sendv_packet_async(nc, iov, iovcnt, NULL);
> +    return qemu_sendv_packet_async(nc, iov, iovcnt, NULL, flags);
>  }
>  
>  NetClientState *qemu_find_netdev(const char *id)
> diff --git a/net/netmap.c b/net/netmap.c
> index 0ccc497..0b982a0 100644
> --- a/net/netmap.c
> +++ b/net/netmap.c
> @@ -218,7 +218,8 @@ static void netmap_writable(void *opaque)
>  }
>  
>  static ssize_t netmap_receive(NetClientState *nc,
> -      const uint8_t *buf, size_t size)
> +                              const uint8_t *buf,
> +                              size_t size, unsigned flags)
>  {
>      NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
>      struct netmap_ring *ring = s->me.tx;
> @@ -252,13 +253,17 @@ static ssize_t netmap_receive(NetClientState *nc,
>      pkt_copy(buf, dst, size);
>      ring->cur = NETMAP_RING_NEXT(ring, i);
>      ring->avail--;
> -    ioctl(s->me.fd, NIOCTXSYNC, NULL);
> +
> +    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
> +        ioctl(s->me.fd, NIOCTXSYNC, NULL);
> +    }
>  
>      return size;
>  }
>  
>  static ssize_t netmap_receive_iov(NetClientState *nc,
> -                    const struct iovec *iov, int iovcnt)
> +                    const struct iovec *iov, int iovcnt,
> +                    unsigned flags)
>  {
>      NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
>      struct netmap_ring *ring = s->me.tx;
> @@ -322,7 +327,9 @@ static ssize_t netmap_receive_iov(NetClientState *nc,
>      ring->cur = i;
>      ring->avail = avail;
>  
> -    ioctl(s->me.fd, NIOCTXSYNC, NULL);
> +    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
> +        ioctl(s->me.fd, NIOCTXSYNC, NULL);
> +    }
>  
>      return iov_size(iov, iovcnt);
>  }
> @@ -368,7 +375,7 @@ static void netmap_send(void *opaque)
>          }
>  
>          iovsize = qemu_sendv_packet_async(&s->nc, s->iov, iovcnt,
> -                                            netmap_send_completed);
> +                                            netmap_send_completed, 0);
>  
>          if (iovsize == 0) {
>              /* The peer does not receive anymore. Packet is queued, stop
> diff --git a/net/slirp.c b/net/slirp.c
> index 124e953..a801638 100644
> --- a/net/slirp.c
> +++ b/net/slirp.c
> @@ -103,10 +103,11 @@ void slirp_output(void *opaque, const uint8_t *pkt, int pkt_len)
>  {
>      SlirpState *s = opaque;
>  
> -    qemu_send_packet(&s->nc, pkt, pkt_len);
> +    qemu_send_packet(&s->nc, pkt, pkt_len, 0);
>  }
>  
> -static ssize_t net_slirp_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t net_slirp_receive(NetClientState *nc, const uint8_t *buf,
> +                                 size_t size, unsigned flags)
>  {
>      SlirpState *s = DO_UPCAST(SlirpState, nc, nc);
>  
> diff --git a/net/socket.c b/net/socket.c
> index fb21e20..acc715a 100644
> --- a/net/socket.c
> +++ b/net/socket.c
> @@ -89,7 +89,8 @@ static void net_socket_writable(void *opaque)
>      qemu_flush_queued_packets(&s->nc);
>  }
>  
> -static ssize_t net_socket_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t net_socket_receive(NetClientState *nc, const uint8_t *buf,
> +                                  size_t size, unsigned flags)
>  {
>      NetSocketState *s = DO_UPCAST(NetSocketState, nc, nc);
>      uint32_t len = htonl(size);
> @@ -124,7 +125,8 @@ static ssize_t net_socket_receive(NetClientState *nc, const uint8_t *buf, size_t
>      return size;
>  }
>  
> -static ssize_t net_socket_receive_dgram(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t net_socket_receive_dgram(NetClientState *nc, const uint8_t *buf,
> +                                        size_t size, unsigned flags)
>  {
>      NetSocketState *s = DO_UPCAST(NetSocketState, nc, nc);
>      ssize_t ret;
> @@ -211,7 +213,7 @@ static void net_socket_send(void *opaque)
>              buf += l;
>              size -= l;
>              if (s->index >= s->packet_len) {
> -                qemu_send_packet(&s->nc, s->buf, s->packet_len);
> +                qemu_send_packet(&s->nc, s->buf, s->packet_len, 0);
>                  s->index = 0;
>                  s->state = 0;
>              }
> @@ -234,7 +236,7 @@ static void net_socket_send_dgram(void *opaque)
>          net_socket_write_poll(s, false);
>          return;
>      }
> -    qemu_send_packet(&s->nc, s->buf, size);
> +    qemu_send_packet(&s->nc, s->buf, size, 0);
>  }
>  
>  static int net_socket_mcast_create(struct sockaddr_in *mcastaddr, struct in_addr *localaddr)
> diff --git a/net/tap-win32.c b/net/tap-win32.c
> index 91e9e84..2d86122 100644
> --- a/net/tap-win32.c
> +++ b/net/tap-win32.c
> @@ -664,7 +664,7 @@ static void tap_win32_send(void *opaque)
>  
>      size = tap_win32_read(s->handle, &buf, max_size);
>      if (size > 0) {
> -        qemu_send_packet(&s->nc, buf, size);
> +        qemu_send_packet(&s->nc, buf, size, 0);
>          tap_win32_free_buffer(s->handle, buf);
>      }
>  }
> diff --git a/net/tap.c b/net/tap.c
> index 39c1cda..6d7a02e 100644
> --- a/net/tap.c
> +++ b/net/tap.c
> @@ -112,7 +112,7 @@ static ssize_t tap_write_packet(TAPState *s, const struct iovec *iov, int iovcnt
>  }
>  
>  static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov,
> -                               int iovcnt)
> +                               int iovcnt, unsigned flags)
>  {
>      TAPState *s = DO_UPCAST(TAPState, nc, nc);
>      const struct iovec *iovp = iov;
> @@ -130,7 +130,8 @@ static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec *iov,
>      return tap_write_packet(s, iovp, iovcnt);
>  }
>  
> -static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf,
> +                               size_t size, unsigned flags)
>  {
>      TAPState *s = DO_UPCAST(TAPState, nc, nc);
>      struct iovec iov[2];
> @@ -150,13 +151,14 @@ static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t si
>      return tap_write_packet(s, iov, iovcnt);
>  }
>  
> -static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> +                           unsigned flags)
>  {
>      TAPState *s = DO_UPCAST(TAPState, nc, nc);
>      struct iovec iov[1];
>  
>      if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
> -        return tap_receive_raw(nc, buf, size);
> +        return tap_receive_raw(nc, buf, size, flags);
>      }
>  
>      iov[0].iov_base = (char *)buf;
> @@ -203,7 +205,7 @@ static void tap_send(void *opaque)
>              size -= s->host_vnet_hdr_len;
>          }
>  
> -        size = qemu_send_packet_async(&s->nc, buf, size, tap_send_completed);
> +        size = qemu_send_packet_async(&s->nc, buf, size, tap_send_completed, 0);
>          if (size == 0) {
>              tap_read_poll(s, false);
>          }
> diff --git a/net/vde.c b/net/vde.c
> index 2a619fb..5629f58 100644
> --- a/net/vde.c
> +++ b/net/vde.c
> @@ -44,11 +44,12 @@ static void vde_to_qemu(void *opaque)
>  
>      size = vde_recv(s->vde, (char *)buf, sizeof(buf), 0);
>      if (size > 0) {
> -        qemu_send_packet(&s->nc, buf, size);
> +        qemu_send_packet(&s->nc, buf, size, 0);
>      }
>  }
>  
> -static ssize_t vde_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> +static ssize_t vde_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> +                           unsigned flags)
>  {
>      VDEState *s = DO_UPCAST(VDEState, nc, nc);
>      ssize_t ret;
> diff --git a/savevm.c b/savevm.c
> index 3f912dd..a8d5373 100644
> --- a/savevm.c
> +++ b/savevm.c
> @@ -84,7 +84,7 @@ static void qemu_announce_self_iter(NICState *nic, void *opaque)
>  
>      len = announce_self_create(buf, nic->conf->macaddr.a);
>  
> -    qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
> +    qemu_send_packet_raw(qemu_get_queue(nic), buf, len, 0);
>  }
>  
>  
> -- 
> 1.8.4.2


* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-08 12:11 ` Michael S. Tsirkin
@ 2013-12-09 10:20   ` Vincenzo Maffione
  2013-12-09 10:30     ` Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Vincenzo Maffione @ 2013-12-09 10:20 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Jason Wang, mjt, qemu-devel, lcapitulino,
	peter.crosthwaite, Dmitry Fleytman, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, mark.langsdorf, owasserm,
	Paolo Bonzini, Andreas Färber

Hello,
   I've run some netperf TCP_STREAM and TCP_RR tests with virtio-net, using the
same configuration.
Here are the results:

########## netperf TCP_STREAM ###########
        NO BATCHING         BATCHING
1          5.5 Gbps         3.8 Gbps
2          5.4 Gbps         5.5 Gbps
3          5.2 Gbps         5.2 Gbps
4          5.1 Gbps         5.0 Gbps
10         5.4 Gbps         5.2 Gbps
20         5.4 Gbps         5.4 Gbps


############ netperf TCP_RR #############
        NO BATCHING         BATCHING
1         13.0 Ktts         12.8 Ktts
2         23.8 Ktts         23.0 Ktts
3         34.0 Ktts         32.5 Ktts
4         44.5 Ktts         41.0 Ktts
10        97.0 Ktts         93.0 Ktts
15       122.0 Ktts        120.0 Ktts
20       125.0 Ktts        128.0 Ktts
25       128.0 Ktts        130.0 Ktts
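
To put the two tables above in relative terms, here is a small sketch in C that computes the percentage change of the BATCHING column with respect to the NO BATCHING column. The helper names (`rel_change_pct`, `print_table`), the `struct row` layout and the `SKETCH_DEMO_MAIN` macro are only for illustration. The numbers are copied from the tables, and the first column (left unlabeled above) is carried along as an opaque row label.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* One table row: the unlabeled first column plus the two measurements. */
struct row {
    int label;
    double no_batching;
    double batching;
};

/* Values copied from the TCP_STREAM table (Gbps). */
static const struct row tcp_stream[] = {
    {1, 5.5, 3.8}, {2, 5.4, 5.5}, {3, 5.2, 5.2},
    {4, 5.1, 5.0}, {10, 5.4, 5.2}, {20, 5.4, 5.4},
};

/* Values copied from the TCP_RR table (Ktts). */
static const struct row tcp_rr[] = {
    {1, 13.0, 12.8}, {2, 23.8, 23.0}, {3, 34.0, 32.5}, {4, 44.5, 41.0},
    {10, 97.0, 93.0}, {15, 122.0, 120.0}, {20, 125.0, 128.0},
    {25, 128.0, 130.0},
};

/* Change of `batching` relative to `no_batching`, in percent. */
static double rel_change_pct(double no_batching, double batching)
{
    return (batching - no_batching) / no_batching * 100.0;
}

/* Print one signed percentage per row of a table. */
static void print_table(const char *name, const struct row *rows, size_t n)
{
    printf("%s\n", name);
    for (size_t i = 0; i < n; i++) {
        printf("%3d  %+6.1f%%\n", rows[i].label,
               rel_change_pct(rows[i].no_batching, rows[i].batching));
    }
}

#ifdef SKETCH_DEMO_MAIN
int main(void)
{
    print_table("TCP_STREAM", tcp_stream,
                sizeof(tcp_stream) / sizeof(tcp_stream[0]));
    print_table("TCP_RR", tcp_rr, sizeof(tcp_rr) / sizeof(tcp_rr[0]));
    return 0;
}
#endif
```

Built with `-DSKETCH_DEMO_MAIN`, this prints one line per row. The first TCP_STREAM row comes out as a loss of roughly 31%, while the last two TCP_RR rows show gains of about 2.4% and 1.6%.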


There are some negative effects introduced by batching, most visibly in the
first TCP_STREAM row (5.5 Gbps down to 3.8 Gbps).
Also consider that:
   - Since the TAP backend doesn't use the new flag, this patch doesn't change
     the performance when the TAP backend is used.
   - I have not yet submitted the patch for virtio_net_header support, so the
     TCP_STREAM performance with the NETMAP backend is not currently comparable
     to the performance with the TAP backend, because we are limited to 1.5KB
     packets.
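
As a rough illustration of the hint semantics (a toy model, not the actual QEMU code), the following C sketch mimics what the patch does on the netmap receive side: a packet is queued, and the flush (NIOCTXSYNC in the real backend) is issued only when QEMU_NET_PACKET_FLAG_MORE is not set. The flag values match the ones in include/net/queue.h from the patch. Everything prefixed with `toy_`, and the `SKETCH_DEMO_MAIN` macro, are hypothetical names made up for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Flag values as defined by the patch in include/net/queue.h. */
#define QEMU_NET_PACKET_FLAG_NONE 0
#define QEMU_NET_PACKET_FLAG_RAW  (1 << 0)
#define QEMU_NET_PACKET_FLAG_MORE (2 << 0)

/* Hypothetical stand-in for a backend; this is not a QEMU type. */
struct toy_backend {
    unsigned pending;   /* packets queued but not yet flushed */
    unsigned flushes;   /* number of simulated NIOCTXSYNC calls */
    unsigned delivered; /* packets made visible by a flush */
};

/*
 * Same shape as netmap_receive() in the patch: always queue the packet,
 * but flush only when the sender did not announce more packets.
 */
static size_t toy_receive(struct toy_backend *b, size_t size, unsigned flags)
{
    b->pending++;
    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
        b->flushes++;
        b->delivered += b->pending;
        b->pending = 0;
    }
    return size;
}

/*
 * Frontend side of the model: every packet of a burst carries the MORE
 * flag except the last one, which triggers the flush.
 */
static void toy_send_burst(struct toy_backend *b, unsigned n)
{
    for (unsigned i = 0; i < n; i++) {
        unsigned flags = (i + 1 < n) ? QEMU_NET_PACKET_FLAG_MORE : 0;
        toy_receive(b, 64, flags);
    }
}

#ifdef SKETCH_DEMO_MAIN
int main(void)
{
    struct toy_backend b = {0, 0, 0};

    toy_send_burst(&b, 8);
    printf("flushes=%u delivered=%u pending=%u\n",
           b.flushes, b.delivered, b.pending);
    return 0;
}
#endif
```

In this model a burst of 8 packets costs a single flush instead of 8. It also shows the other side of the trade-off: a packet sent with the flag set stays pending until a later packet arrives without it, so a sender that sets the flag has to follow up with a final unflagged packet.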

Cheers,
  Vincenzo


2013/12/8 Michael S. Tsirkin <mst@redhat.com>

> On Fri, Dec 06, 2013 at 03:44:33PM +0100, Vincenzo Maffione wrote:
> > This patch extends the frontend-backend interface so that it is possible
> > to pass a new flag (QEMU_NET_PACKET_FLAG_MORE) when sending a packet to the
> > other peer. The new flag acts as a hint for the receiving peer, which can
> > accumulate a batch of packets before forwarding those packets (to the host
> > if the receiving peer is the backend or to the guest if the receiving peer
> > is the frontend).
> >
> > The patch also implements a batching mechanism for the netmap backend (on the
> > backend receive side) and for the e1000 and virtio frontends (on the frontend
> > transmit side).
> >
> > Measured improvement of a guest-to-guest UDP_STREAM netperf test (64 bytes
> > packets) with virtio-net frontends:
> >     820 Kpps ==> 1000 Kpps (+22%).
> >
> > Measured improvement of a guest-to-guest UDP test (64 bytes packets) with
> > e1000 frontends and netmap clients on the guests:
> >     1.8 Mpps ==> 3.1 Mpps (+72%).
> >
> > Signed-off-by: Vincenzo Maffione <v.maffione@gmail.com>
>
> So we are batching some more and this helps throughput. However, I wonder
> what this does to more bursty traffic, such as
> several TCP streams running in parallel.
>
>
> > ---
> > Experiment details:
> >     - Processor: Intel i7-3770K CPU @ 3.50GHz (8 cores)
> >     - Memory @ 1333 MHz
> >     - Host O.S.: Archlinux with Linux 3.11
> >     - Guest O.S.: Archlinux with Linux 3.11
> >
> > QEMU command line for the virtio experiment:
> >     qemu-system-x86_64 archdisk.qcow -snapshot -enable-kvm -device virtio-net-pci,ioeventfd=on,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -vga std -m 3G
> >
> > QEMU command line for the e1000 experiment:
> >     qemu-system-x86_64 archdisk.qcow -snapshot -enable-kvm -device e1000,mitigation=off,mac=00:AA:BB:CC:DD:01,netdev=mynet -netdev netmap,ifname=vale0:01,id=mynet -smp 2 -vga std -m 3G
> >
> > With the e1000 experiments, we don't use netperf on the guests, but netmap clients (pkt-gen)
> > that run directly on the e1000 adapter, bypassing the O.S. stack.
> >
> > Other things:
> >     - This patch is against the net-next tree (https://github.com/stefanha/qemu.git)
> >       because the first netmap patch is not in the qemu master (AFAIK).
> >     - The batching can also be implemented on the backend transmit side and frontend
> >       receive side. We could do it in the future.
> >
> >  hw/net/cadence_gem.c    |  3 ++-
> >  hw/net/dp8393x.c        |  5 +++--
> >  hw/net/e1000.c          | 21 ++++++++++++++++-----
> >  hw/net/eepro100.c       |  5 +++--
> >  hw/net/etraxfs_eth.c    |  5 +++--
> >  hw/net/lan9118.c        |  2 +-
> >  hw/net/mcf_fec.c        |  5 +++--
> >  hw/net/mipsnet.c        |  6 ++++--
> >  hw/net/ne2000.c         |  5 +++--
> >  hw/net/ne2000.h         |  3 ++-
> >  hw/net/opencores_eth.c  |  2 +-
> >  hw/net/pcnet.c          |  8 +++++---
> >  hw/net/pcnet.h          |  3 ++-
> >  hw/net/rtl8139.c        |  7 ++++---
> >  hw/net/smc91c111.c      |  5 +++--
> >  hw/net/spapr_llan.c     |  2 +-
> >  hw/net/stellaris_enet.c |  3 ++-
> >  hw/net/virtio-net.c     | 10 ++++++++--
> >  hw/net/vmxnet3.c        |  3 ++-
> >  hw/net/vmxnet_tx_pkt.c  |  4 ++--
> >  hw/net/xgmac.c          |  2 +-
> >  hw/net/xilinx_axienet.c |  2 +-
> >  hw/usb/dev-network.c    |  8 +++++---
> >  include/net/net.h       | 20 +++++++++++++-------
> >  include/net/queue.h     |  1 +
> >  net/dump.c              |  3 ++-
> >  net/hub.c               | 10 ++++++----
> >  net/net.c               | 39 +++++++++++++++++++++++----------------
> >  net/netmap.c            | 17 ++++++++++++-----
> >  net/slirp.c             |  5 +++--
> >  net/socket.c            | 10 ++++++----
> >  net/tap-win32.c         |  2 +-
> >  net/tap.c               | 12 +++++++-----
> >  net/vde.c               |  5 +++--
> >  savevm.c                |  2 +-
> >  35 files changed, 155 insertions(+), 90 deletions(-)
> >
> > diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
> > index 4a355bb..432687a 100644
> > --- a/hw/net/cadence_gem.c
> > +++ b/hw/net/cadence_gem.c
> > @@ -583,7 +583,8 @@ static int gem_mac_address_filter(GemState *s, const uint8_t *packet)
> >   * gem_receive:
> >   * Fit a packet handed to us by QEMU into the receive descriptor ring.
> >   */
> > -static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> > +static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> > +                           unsigned flags)
> >  {
> >      unsigned    desc[2];
> >      hwaddr packet_desc_addr, last_desc_addr;
> > diff --git a/hw/net/dp8393x.c b/hw/net/dp8393x.c
> > index 789d385..d8c7da8 100644
> > --- a/hw/net/dp8393x.c
> > +++ b/hw/net/dp8393x.c
> > @@ -415,7 +415,7 @@ static void do_transmit_packets(dp8393xState *s)
> >              }
> >          } else {
> >              /* Transmit packet */
> > -            qemu_send_packet(nc, s->tx_buffer, tx_len);
> > +            qemu_send_packet(nc, s->tx_buffer, tx_len, 0);
> >          }
> >          s->regs[SONIC_TCR] |= SONIC_TCR_PTX;
> >
> > @@ -723,7 +723,8 @@ static int receive_filter(dp8393xState *s, const uint8_t * buf, int size)
> >      return -1;
> >  }
> >
> > -static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
> > +static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf,
> > +                           size_t size, unsigned flags)
> >  {
> >      dp8393xState *s = qemu_get_nic_opaque(nc);
> >      uint16_t data[10];
> > diff --git a/hw/net/e1000.c b/hw/net/e1000.c
> > index ae63591..5294ec5 100644
> > --- a/hw/net/e1000.c
> > +++ b/hw/net/e1000.c
> > @@ -570,10 +570,19 @@ static void
> >  e1000_send_packet(E1000State *s, const uint8_t *buf, int size)
> >  {
> >      NetClientState *nc = qemu_get_queue(s->nic);
> > +    uint32_t tdh = s->mac_reg[TDH];
> > +    unsigned flags = QEMU_NET_PACKET_FLAG_MORE;
> > +
> >      if (s->phy_reg[PHY_CTRL] & MII_CR_LOOPBACK) {
> > -        nc->info->receive(nc, buf, size);
> > +        nc->info->receive(nc, buf, size, 0);
> >      } else {
> > -        qemu_send_packet(nc, buf, size);
> > +        if (++tdh * sizeof(struct e1000_tx_desc) >= s->mac_reg[TDLEN]) {
> > +            tdh = 0;
> > +        }
> > +        if (tdh == s->mac_reg[TDT]) {
> > +            flags = 0;
> > +        }
> > +        qemu_send_packet(nc, buf, size, flags);
> >      }
> >  }
> >
> > @@ -899,7 +908,8 @@ static uint64_t rx_desc_base(E1000State *s)
> >  }
> >
> >  static ssize_t
> > -e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
> > +e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt,
> > +                  unsigned flags)
> >  {
> >      E1000State *s = qemu_get_nic_opaque(nc);
> >      PCIDevice *d = PCI_DEVICE(s);
> > @@ -1054,14 +1064,15 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
> >  }
> >
> >  static ssize_t
> > -e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> > +e1000_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> > +              unsigned flags)
> >  {
> >      const struct iovec iov = {
> >          .iov_base = (uint8_t *)buf,
> >          .iov_len = size
> >      };
> >
> > -    return e1000_receive_iov(nc, &iov, 1);
> > +    return e1000_receive_iov(nc, &iov, 1, flags);
> >  }
> >
> >  static uint32_t
> > diff --git a/hw/net/eepro100.c b/hw/net/eepro100.c
> > index 3b891ca..9763904 100644
> > --- a/hw/net/eepro100.c
> > +++ b/hw/net/eepro100.c
> > @@ -828,7 +828,7 @@ static void tx_command(EEPRO100State *s)
> >          }
> >      }
> >      TRACE(RXTX, logout("%p sending frame, len=%d,%s\n", s, size, nic_dump(buf, size)));
> > -    qemu_send_packet(qemu_get_queue(s->nic), buf, size);
> > +    qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
> >      s->statistics.tx_good_frames++;
> >      /* Transmit with bad status would raise an CX/TNO interrupt.
> >       * (82557 only). Emulation never has bad status. */
> > @@ -1627,7 +1627,8 @@ static int nic_can_receive(NetClientState *nc)
> >  #endif
> >  }
> >
> > -static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf, size_t size)
> > +static ssize_t nic_receive(NetClientState *nc, const uint8_t * buf,
> > +                           size_t size, unsigned flags)
> >  {
> >      /* TODO:
> >       * - Magic packets should set bit 30 in power management driver register.
> > diff --git a/hw/net/etraxfs_eth.c b/hw/net/etraxfs_eth.c
> > index 78ebbbc..6cba74e 100644
> > --- a/hw/net/etraxfs_eth.c
> > +++ b/hw/net/etraxfs_eth.c
> > @@ -525,7 +525,8 @@ static int eth_can_receive(NetClientState *nc)
> >      return 1;
> >  }
> >
> > -static ssize_t eth_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> > +static ssize_t eth_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> > +                           unsigned flags)
> >  {
> >      unsigned char sa_bcast[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff };
> >      ETRAXFSEthState *eth = qemu_get_nic_opaque(nc);
> > @@ -560,7 +561,7 @@ static int eth_tx_push(void *opaque, unsigned char *buf, int len, bool eop)
> >      ETRAXFSEthState *eth = opaque;
> >
> >      D(printf("%s buf=%p len=%d\n", __func__, buf, len));
> > -    qemu_send_packet(qemu_get_queue(eth->nic), buf, len);
> > +    qemu_send_packet(qemu_get_queue(eth->nic), buf, len, 0);
> >      return len;
> >  }
> >
> > diff --git a/hw/net/lan9118.c b/hw/net/lan9118.c
> > index 2315f99..55e06a9 100644
> > --- a/hw/net/lan9118.c
> > +++ b/hw/net/lan9118.c
> > @@ -664,7 +664,7 @@ static void do_tx_packet(lan9118_state *s)
> >          /* This assumes the receive routine doesn't touch the
> VLANClient.  */
> >          lan9118_receive(qemu_get_queue(s->nic), s->txp->data,
> s->txp->len);
> >      } else {
> > -        qemu_send_packet(qemu_get_queue(s->nic), s->txp->data,
> s->txp->len);
> > +        qemu_send_packet(qemu_get_queue(s->nic), s->txp->data,
> s->txp->len, 0);
> >      }
> >      s->txp->fifo_used = 0;
> >
> > diff --git a/hw/net/mcf_fec.c b/hw/net/mcf_fec.c
> > index 4bff3de..14ed0dd 100644
> > --- a/hw/net/mcf_fec.c
> > +++ b/hw/net/mcf_fec.c
> > @@ -174,7 +174,7 @@ static void mcf_fec_do_tx(mcf_fec_state *s)
> >          if (bd.flags & FEC_BD_L) {
> >              /* Last buffer in frame.  */
> >              DPRINTF("Sending packet\n");
> > -            qemu_send_packet(qemu_get_queue(s->nic), frame, len);
> > +            qemu_send_packet(qemu_get_queue(s->nic), frame, len, 0);
> >              ptr = frame;
> >              frame_size = 0;
> >              s->eir |= FEC_INT_TXF;
> > @@ -357,7 +357,8 @@ static int mcf_fec_can_receive(NetClientState *nc)
> >      return s->rx_enabled;
> >  }
> >
> > -static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf,
> size_t size)
> > +static ssize_t mcf_fec_receive(NetClientState *nc, const uint8_t *buf,
> > +                               size_t size, unsigned flags)
> >  {
> >      mcf_fec_state *s = qemu_get_nic_opaque(nc);
> >      mcf_fec_bd bd;
> > diff --git a/hw/net/mipsnet.c b/hw/net/mipsnet.c
> > index e421b86..7f5d4c4 100644
> > --- a/hw/net/mipsnet.c
> > +++ b/hw/net/mipsnet.c
> > @@ -74,7 +74,8 @@ static int mipsnet_can_receive(NetClientState *nc)
> >      return !mipsnet_buffer_full(s);
> >  }
> >
> > -static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf,
> size_t size)
> > +static ssize_t mipsnet_receive(NetClientState *nc, const uint8_t *buf,
> > +                               size_t size, unsigned flags)
> >  {
> >      MIPSnetState *s = qemu_get_nic_opaque(nc);
> >
> > @@ -176,7 +177,8 @@ static void mipsnet_ioport_write(void *opaque,
> hwaddr addr,
> >          if (s->tx_written == s->tx_count) {
> >              /* Send buffer. */
> >              trace_mipsnet_send(s->tx_count);
> > -            qemu_send_packet(qemu_get_queue(s->nic), s->tx_buffer,
> s->tx_count);
> > +            qemu_send_packet(qemu_get_queue(s->nic), s->tx_buffer,
> > +                             s->tx_count, 0);
> >              s->tx_count = s->tx_written = 0;
> >              s->intctl |= MIPSNET_INTCTL_TXDONE;
> >              s->busy = 1;
> > diff --git a/hw/net/ne2000.c b/hw/net/ne2000.c
> > index 4c32e9e..52af46a 100644
> > --- a/hw/net/ne2000.c
> > +++ b/hw/net/ne2000.c
> > @@ -176,7 +176,8 @@ int ne2000_can_receive(NetClientState *nc)
> >
> >  #define MIN_BUF_SIZE 60
> >
> > -ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t
> size_)
> > +ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t
> size_,
> > +                       unsigned flags)
> >  {
> >      NE2000State *s = qemu_get_nic_opaque(nc);
> >      int size = size_;
> > @@ -301,7 +302,7 @@ static void ne2000_ioport_write(void *opaque,
> uint32_t addr, uint32_t val)
> >                  /* fail safe: check range on the transmitted length  */
> >                  if (index + s->tcnt <= NE2000_PMEM_END) {
> >                      qemu_send_packet(qemu_get_queue(s->nic), s->mem +
> index,
> > -                                     s->tcnt);
> > +                                     s->tcnt, 0);
> >                  }
> >                  /* signal end of transfer */
> >                  s->tsr = ENTSR_PTX;
> > diff --git a/hw/net/ne2000.h b/hw/net/ne2000.h
> > index e500306..b62a8f3 100644
> > --- a/hw/net/ne2000.h
> > +++ b/hw/net/ne2000.h
> > @@ -35,6 +35,7 @@ void ne2000_setup_io(NE2000State *s, DeviceState *dev,
> unsigned size);
> >  extern const VMStateDescription vmstate_ne2000;
> >  void ne2000_reset(NE2000State *s);
> >  int ne2000_can_receive(NetClientState *nc);
> > -ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t
> size_);
> > +ssize_t ne2000_receive(NetClientState *nc, const uint8_t *buf, size_t
> size_,
> > +                       unsigned flags);
> >
> >  #endif
> > diff --git a/hw/net/opencores_eth.c b/hw/net/opencores_eth.c
> > index 4118d54..b4328ea 100644
> > --- a/hw/net/opencores_eth.c
> > +++ b/hw/net/opencores_eth.c
> > @@ -503,7 +503,7 @@ static void open_eth_start_xmit(OpenEthState *s,
> desc *tx)
> >      if (tx_len > len) {
> >          memset(buf + len, 0, tx_len - len);
> >      }
> > -    qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len);
> > +    qemu_send_packet(qemu_get_queue(s->nic), buf, tx_len, 0);
> >
> >      if (tx->len_flags & TXD_WR) {
> >          s->tx_desc = 0;
> > diff --git a/hw/net/pcnet.c b/hw/net/pcnet.c
> > index 7cb47b3..707ac92 100644
> > --- a/hw/net/pcnet.c
> > +++ b/hw/net/pcnet.c
> > @@ -1019,7 +1019,8 @@ int pcnet_can_receive(NetClientState *nc)
> >
> >  #define MIN_BUF_SIZE 60
> >
> > -ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t
> size_)
> > +ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t
> size_,
> > +                      unsigned flags)
> >  {
> >      PCNetState *s = qemu_get_nic_opaque(nc);
> >      int is_padr = 0, is_bcast = 0, is_ladr = 0;
> > @@ -1265,12 +1266,13 @@ static void pcnet_transmit(PCNetState *s)
> >                  if (BCR_SWSTYLE(s) == 1)
> >                      add_crc = !GET_FIELD(tmd.status, TMDS, NOFCS);
> >                  s->looptest = add_crc ? PCNET_LOOPTEST_CRC :
> PCNET_LOOPTEST_NOCRC;
> > -                pcnet_receive(qemu_get_queue(s->nic), s->buffer,
> s->xmit_pos);
> > +                pcnet_receive(qemu_get_queue(s->nic), s->buffer,
> > +                              s->xmit_pos, 0);
> >                  s->looptest = 0;
> >              } else
> >                  if (s->nic)
> >                      qemu_send_packet(qemu_get_queue(s->nic), s->buffer,
> > -                                     s->xmit_pos);
> > +                                     s->xmit_pos, 0);
> >
> >              s->csr[0] &= ~0x0008;   /* clear TDMD */
> >              s->csr[4] |= 0x0004;    /* set TXSTRT */
> > diff --git a/hw/net/pcnet.h b/hw/net/pcnet.h
> > index 9dee6f3..a26aacd 100644
> > --- a/hw/net/pcnet.h
> > +++ b/hw/net/pcnet.h
> > @@ -61,7 +61,8 @@ void pcnet_ioport_writel(void *opaque, uint32_t addr,
> uint32_t val);
> >  uint32_t pcnet_ioport_readl(void *opaque, uint32_t addr);
> >  uint32_t pcnet_bcr_readw(PCNetState *s, uint32_t rap);
> >  int pcnet_can_receive(NetClientState *nc);
> > -ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t
> size_);
> > +ssize_t pcnet_receive(NetClientState *nc, const uint8_t *buf, size_t
> size_,
> > +                      unsigned flags);
> >  void pcnet_set_link_status(NetClientState *nc);
> >  void pcnet_common_cleanup(PCNetState *d);
> >  int pcnet_common_init(DeviceState *dev, PCNetState *s, NetClientInfo
> *info);
> > diff --git a/hw/net/rtl8139.c b/hw/net/rtl8139.c
> > index 7f2b4db..340331f 100644
> > --- a/hw/net/rtl8139.c
> > +++ b/hw/net/rtl8139.c
> > @@ -1195,7 +1195,8 @@ static ssize_t rtl8139_do_receive(NetClientState
> *nc, const uint8_t *buf, size_t
> >      return size_;
> >  }
> >
> > -static ssize_t rtl8139_receive(NetClientState *nc, const uint8_t *buf,
> size_t size)
> > +static ssize_t rtl8139_receive(NetClientState *nc, const uint8_t *buf,
> > +                               size_t size, unsigned flags)
> >  {
> >      return rtl8139_do_receive(nc, buf, size, 1);
> >  }
> > @@ -1814,9 +1815,9 @@ static void rtl8139_transfer_frame(RTL8139State
> *s, uint8_t *buf, int size,
> >      else
> >      {
> >          if (iov) {
> > -            qemu_sendv_packet(qemu_get_queue(s->nic), iov, 3);
> > +            qemu_sendv_packet(qemu_get_queue(s->nic), iov, 3, 0);
> >          } else {
> > -            qemu_send_packet(qemu_get_queue(s->nic), buf, size);
> > +            qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
> >          }
> >      }
> >  }
> > diff --git a/hw/net/smc91c111.c b/hw/net/smc91c111.c
> > index a8e29b3..82289aa 100644
> > --- a/hw/net/smc91c111.c
> > +++ b/hw/net/smc91c111.c
> > @@ -242,7 +242,7 @@ static void smc91c111_do_tx(smc91c111_state *s)
> >              smc91c111_release_packet(s, packetnum);
> >          else if (s->tx_fifo_done_len < NUM_PACKETS)
> >              s->tx_fifo_done[s->tx_fifo_done_len++] = packetnum;
> > -        qemu_send_packet(qemu_get_queue(s->nic), p, len);
> > +        qemu_send_packet(qemu_get_queue(s->nic), p, len, 0);
> >      }
> >      s->tx_fifo_len = 0;
> >      smc91c111_update(s);
> > @@ -647,7 +647,8 @@ static int smc91c111_can_receive(NetClientState *nc)
> >      return 1;
> >  }
> >
> > -static ssize_t smc91c111_receive(NetClientState *nc, const uint8_t
> *buf, size_t size)
> > +static ssize_t smc91c111_receive(NetClientState *nc, const uint8_t *buf,
> > +                                 size_t size, unsigned flags)
> >  {
> >      smc91c111_state *s = qemu_get_nic_opaque(nc);
> >      int status;
> > diff --git a/hw/net/spapr_llan.c b/hw/net/spapr_llan.c
> > index 1bd6f50..1a50bc1 100644
> > --- a/hw/net/spapr_llan.c
> > +++ b/hw/net/spapr_llan.c
> > @@ -476,7 +476,7 @@ static target_ulong h_send_logical_lan(PowerPCCPU
> *cpu, sPAPREnvironment *spapr,
> >          p += VLAN_BD_LEN(bufs[i]);
> >      }
> >
> > -    qemu_send_packet(qemu_get_queue(dev->nic), lbuf, total_len);
> > +    qemu_send_packet(qemu_get_queue(dev->nic), lbuf, total_len, 0);
> >
> >      return H_SUCCESS;
> >  }
> > diff --git a/hw/net/stellaris_enet.c b/hw/net/stellaris_enet.c
> > index 9dd77f7..950b455 100644
> > --- a/hw/net/stellaris_enet.c
> > +++ b/hw/net/stellaris_enet.c
> > @@ -83,7 +83,8 @@ static void stellaris_enet_update(stellaris_enet_state
> *s)
> >  }
> >
> >  /* TODO: Implement MAC address filtering.  */
> > -static ssize_t stellaris_enet_receive(NetClientState *nc, const uint8_t
> *buf, size_t size)
> > +static ssize_t stellaris_enet_receive(NetClientState *nc, const uint8_t
> *buf,
> > +                                      size_t size, unsigned flags)
> >  {
> >      stellaris_enet_state *s = qemu_get_nic_opaque(nc);
> >      int n;
> > diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> > index 513c168..b25bc4e 100644
> > --- a/hw/net/virtio-net.c
> > +++ b/hw/net/virtio-net.c
> > @@ -938,7 +938,8 @@ static int receive_filter(VirtIONet *n, const
> uint8_t *buf, int size)
> >      return 0;
> >  }
> >
> > -static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t
> *buf, size_t size)
> > +static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t
> *buf,
> > +                                  size_t size, unsigned flags)
> >  {
> >      VirtIONet *n = qemu_get_nic_opaque(nc);
> >      VirtIONetQueue *q = virtio_net_get_subqueue(nc);
> > @@ -1079,6 +1080,7 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
> >          unsigned int out_num = elem.out_num;
> >          struct iovec *out_sg = &elem.out_sg[0];
> >          struct iovec sg[VIRTQUEUE_MAX_SIZE];
> > +        unsigned flags = QEMU_NET_PACKET_FLAG_MORE;
> >
> >          if (out_num < 1) {
> >              error_report("virtio-net header not in first element");
> > @@ -1104,8 +1106,12 @@ static int32_t virtio_net_flush_tx(VirtIONetQueue *q)
> >
> >          len = n->guest_hdr_len;
> >
> > +        if (num_packets + 1 >= n->tx_burst || virtio_queue_empty(q->tx_vq)) {
> > +                flags = 0;
> > +        }
> >          ret = qemu_sendv_packet_async(qemu_get_subqueue(n->nic, queue_index),
> > -                                      out_sg, out_num, virtio_net_tx_complete);
> > +                                      out_sg, out_num, virtio_net_tx_complete,
> > +                                      flags);
> >          if (ret == 0) {
> >              virtio_queue_set_notification(q->tx_vq, 0);
> >              q->async_tx.elem = elem;
> > diff --git a/hw/net/vmxnet3.c b/hw/net/vmxnet3.c
> > index 19687aa..6bd59d0 100644
> > --- a/hw/net/vmxnet3.c
> > +++ b/hw/net/vmxnet3.c
> > @@ -1817,7 +1817,8 @@ vmxnet3_rx_filter_may_indicate(VMXNET3State *s,
> const void *data,
> >  }
> >
> >  static ssize_t
> > -vmxnet3_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> > +vmxnet3_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> > +                unsigned flags)
> >  {
> >      VMXNET3State *s = qemu_get_nic_opaque(nc);
> >      size_t bytes_indicated;
> > diff --git a/hw/net/vmxnet_tx_pkt.c b/hw/net/vmxnet_tx_pkt.c
> > index f7344c4..12d842e 100644
> > --- a/hw/net/vmxnet_tx_pkt.c
> > +++ b/hw/net/vmxnet_tx_pkt.c
> > @@ -526,7 +526,7 @@ static bool vmxnet_tx_pkt_do_sw_fragmentation(struct
> VmxnetTxPkt *pkt,
> >
> >          eth_fix_ip4_checksum(l3_iov_base, l3_iov_len);
> >
> > -        qemu_sendv_packet(nc, fragment, dst_idx);
> > +        qemu_sendv_packet(nc, fragment, dst_idx, 0);
> >
> >          fragment_offset += fragment_len;
> >
> > @@ -559,7 +559,7 @@ bool vmxnet_tx_pkt_send(struct VmxnetTxPkt *pkt,
> NetClientState *nc)
> >      if (pkt->has_virt_hdr ||
> >          pkt->virt_hdr.gso_type == VIRTIO_NET_HDR_GSO_NONE) {
> >          qemu_sendv_packet(nc, pkt->vec,
> > -            pkt->payload_frags + VMXNET_TX_PKT_PL_START_FRAG);
> > +            pkt->payload_frags + VMXNET_TX_PKT_PL_START_FRAG, 0);
> >          return true;
> >      }
> >
> > diff --git a/hw/net/xgmac.c b/hw/net/xgmac.c
> > index 9384fa0..683a5ad 100644
> > --- a/hw/net/xgmac.c
> > +++ b/hw/net/xgmac.c
> > @@ -239,7 +239,7 @@ static void xgmac_enet_send(XgmacState *s)
> >          frame_size += len;
> >          if (bd.ctl_stat & 0x20000000) {
> >              /* Last buffer in frame.  */
> > -            qemu_send_packet(qemu_get_queue(s->nic), frame, len);
> > +            qemu_send_packet(qemu_get_queue(s->nic), frame, len, 0);
> >              ptr = frame;
> >              frame_size = 0;
> >              s->regs[DMA_STATUS] |= DMA_STATUS_TI | DMA_STATUS_NIS;
> > diff --git a/hw/net/xilinx_axienet.c b/hw/net/xilinx_axienet.c
> > index 3eb7715..9dd44bf 100644
> > --- a/hw/net/xilinx_axienet.c
> > +++ b/hw/net/xilinx_axienet.c
> > @@ -919,7 +919,7 @@ xilinx_axienet_data_stream_push(StreamSlave *obj,
> uint8_t *buf, size_t size)
> >          buf[write_off + 1] = csum & 0xff;
> >      }
> >
> > -    qemu_send_packet(qemu_get_queue(s->nic), buf, size);
> > +    qemu_send_packet(qemu_get_queue(s->nic), buf, size, 0);
> >
> >      s->stats.tx_bytes += size;
> >      s->regs[R_IS] |= IS_TX_COMPLETE;
> > diff --git a/hw/usb/dev-network.c b/hw/usb/dev-network.c
> > index 4c532b7..253878c 100644
> > --- a/hw/usb/dev-network.c
> > +++ b/hw/usb/dev-network.c
> > @@ -1196,7 +1196,7 @@ static void usb_net_handle_dataout(USBNetState *s,
> USBPacket *p)
> >
> >      if (!is_rndis(s)) {
> >          if (p->iov.size < 64) {
> > -            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf,
> s->out_ptr);
> > +            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf,
> s->out_ptr, 0);
> >              s->out_ptr = 0;
> >          }
> >          return;
> > @@ -1209,7 +1209,8 @@ static void usb_net_handle_dataout(USBNetState *s,
> USBPacket *p)
> >          uint32_t offs = 8 + le32_to_cpu(msg->DataOffset);
> >          uint32_t size = le32_to_cpu(msg->DataLength);
> >          if (offs + size <= len)
> > -            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf + offs,
> size);
> > +            qemu_send_packet(qemu_get_queue(s->nic), s->out_buf + offs,
> > +                             size, 0);
> >      }
> >      s->out_ptr -= len;
> >      memmove(s->out_buf, &s->out_buf[len], s->out_ptr);
> > @@ -1259,7 +1260,8 @@ static void usb_net_handle_data(USBDevice *dev,
> USBPacket *p)
> >      }
> >  }
> >
> > -static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf,
> size_t size)
> > +static ssize_t usbnet_receive(NetClientState *nc, const uint8_t *buf,
> > +                              size_t size, unsigned flags)
> >  {
> >      USBNetState *s = qemu_get_nic_opaque(nc);
> >      uint8_t *in_buf = s->in_buf;
> > diff --git a/include/net/net.h b/include/net/net.h
> > index 11e1468..d3f0ad6 100644
> > --- a/include/net/net.h
> > +++ b/include/net/net.h
> > @@ -44,8 +44,10 @@ typedef struct NICConf {
> >
> >  typedef void (NetPoll)(NetClientState *, bool enable);
> >  typedef int (NetCanReceive)(NetClientState *);
> > -typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t);
> > -typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int);
> > +typedef ssize_t (NetReceive)(NetClientState *, const uint8_t *, size_t,
> > +                 unsigned);
> > +typedef ssize_t (NetReceiveIOV)(NetClientState *, const struct iovec *, int,
> > +                                unsigned);
> >  typedef void (NetCleanup) (NetClientState *);
> >  typedef void (LinkStatusChanged)(NetClientState *);
> >  typedef void (NetClientDestructor)(NetClientState *);
> > @@ -110,13 +112,17 @@ typedef void (*qemu_nic_foreach)(NICState *nic, void *opaque);
> >  void qemu_foreach_nic(qemu_nic_foreach func, void *opaque);
> >  int qemu_can_send_packet(NetClientState *nc);
> >  ssize_t qemu_sendv_packet(NetClientState *nc, const struct iovec *iov,
> > -                          int iovcnt);
> > +                          int iovcnt, unsigned flags);
> >  ssize_t qemu_sendv_packet_async(NetClientState *nc, const struct iovec *iov,
> > -                                int iovcnt, NetPacketSent *sent_cb);
> > -void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size);
> > -ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size);
> > +                                int iovcnt, NetPacketSent *sent_cb,
> > +                                unsigned flags);
> > +void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size,
> > +                      unsigned flags);
> > +ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size,
> > +                             unsigned flags);
> >  ssize_t qemu_send_packet_async(NetClientState *nc, const uint8_t *buf,
> > -                               int size, NetPacketSent *sent_cb);
> > +                               int size, NetPacketSent *sent_cb,
> > +                               unsigned flags);
> >  void qemu_purge_queued_packets(NetClientState *nc);
> >  void qemu_flush_queued_packets(NetClientState *nc);
> >  void qemu_format_nic_info_str(NetClientState *nc, uint8_t macaddr[6]);
> > diff --git a/include/net/queue.h b/include/net/queue.h
> > index fc02b33..1d136a6 100644
> > --- a/include/net/queue.h
> > +++ b/include/net/queue.h
> > @@ -33,6 +33,7 @@ typedef void (NetPacketSent) (NetClientState *sender, ssize_t ret);
> >
> >  #define QEMU_NET_PACKET_FLAG_NONE  0
> >  #define QEMU_NET_PACKET_FLAG_RAW  (1<<0)
> > +#define QEMU_NET_PACKET_FLAG_MORE (2<<0)
> >
> >  NetQueue *qemu_new_net_queue(void *opaque);
> >
> > diff --git a/net/dump.c b/net/dump.c
> > index 9d3a09e..f718d5c 100644
> > --- a/net/dump.c
> > +++ b/net/dump.c
> > @@ -57,7 +57,8 @@ struct pcap_sf_pkthdr {
> >      uint32_t len;
> >  };
> >
> > -static ssize_t dump_receive(NetClientState *nc, const uint8_t *buf,
> size_t size)
> > +static ssize_t dump_receive(NetClientState *nc, const uint8_t *buf,
> > +                            size_t size, unsigned flags)
> >  {
> >      DumpState *s = DO_UPCAST(DumpState, nc, nc);
> >      struct pcap_sf_pkthdr hdr;
> > diff --git a/net/hub.c b/net/hub.c
> > index 33a99c9..7adca5d 100644
> > --- a/net/hub.c
> > +++ b/net/hub.c
> > @@ -52,7 +52,7 @@ static ssize_t net_hub_receive(NetHub *hub, NetHubPort
> *source_port,
> >              continue;
> >          }
> >
> > -        qemu_send_packet(&port->nc, buf, len);
> > +        qemu_send_packet(&port->nc, buf, len, 0);
> >      }
> >      return len;
> >  }
> > @@ -68,7 +68,7 @@ static ssize_t net_hub_receive_iov(NetHub *hub,
> NetHubPort *source_port,
> >              continue;
> >          }
> >
> > -        qemu_sendv_packet(&port->nc, iov, iovcnt);
> > +        qemu_sendv_packet(&port->nc, iov, iovcnt, 0);
> >      }
> >      return len;
> >  }
> > @@ -107,7 +107,8 @@ static int net_hub_port_can_receive(NetClientState
> *nc)
> >  }
> >
> >  static ssize_t net_hub_port_receive(NetClientState *nc,
> > -                                    const uint8_t *buf, size_t len)
> > +                                    const uint8_t *buf, size_t len,
> > +                                    unsigned flags)
> >  {
> >      NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
> >
> > @@ -115,7 +116,8 @@ static ssize_t net_hub_port_receive(NetClientState
> *nc,
> >  }
> >
> >  static ssize_t net_hub_port_receive_iov(NetClientState *nc,
> > -                                        const struct iovec *iov, int
> iovcnt)
> > +                                        const struct iovec *iov, int
> iovcnt,
> > +                                        unsigned flags)
> >  {
> >      NetHubPort *port = DO_UPCAST(NetHubPort, nc, nc);
> >
> > diff --git a/net/net.c b/net/net.c
> > index 9db88cc..65cf5f1 100644
> > --- a/net/net.c
> > +++ b/net/net.c
> > @@ -414,9 +414,10 @@ ssize_t qemu_deliver_packet(NetClientState *sender,
> >      }
> >
> >      if (flags & QEMU_NET_PACKET_FLAG_RAW && nc->info->receive_raw) {
> > -        ret = nc->info->receive_raw(nc, data, size);
> > +        ret = nc->info->receive_raw(nc, data, size,
> > +                                    flags & ~QEMU_NET_PACKET_FLAG_RAW);
> >      } else {
> > -        ret = nc->info->receive(nc, data, size);
> > +        ret = nc->info->receive(nc, data, size, flags);
> >      }
> >
> >      if (ret == 0) {
> > @@ -475,32 +476,36 @@ static ssize_t qemu_send_packet_async_with_flags(NetClientState *sender,
> >
> >  ssize_t qemu_send_packet_async(NetClientState *sender,
> >                                 const uint8_t *buf, int size,
> > -                               NetPacketSent *sent_cb)
> > +                               NetPacketSent *sent_cb, unsigned flags)
> >  {
> > -    return qemu_send_packet_async_with_flags(sender, QEMU_NET_PACKET_FLAG_NONE,
> > +    return qemu_send_packet_async_with_flags(sender,
> > +                                             flags | QEMU_NET_PACKET_FLAG_NONE,
> >                                               buf, size, sent_cb);
> >  }
> >
> > -void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size)
> > +void qemu_send_packet(NetClientState *nc, const uint8_t *buf, int size,
> > +                      unsigned flags)
> >  {
> > -    qemu_send_packet_async(nc, buf, size, NULL);
> > +    qemu_send_packet_async(nc, buf, size, NULL, flags);
> >  }
> >
> > -ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size)
> > +ssize_t qemu_send_packet_raw(NetClientState *nc, const uint8_t *buf, int size,
> > +                             unsigned flags)
> >  {
> > -    return qemu_send_packet_async_with_flags(nc, QEMU_NET_PACKET_FLAG_RAW,
> > +    return qemu_send_packet_async_with_flags(nc,
> > +                                             QEMU_NET_PACKET_FLAG_RAW | flags,
> >                                               buf, size, NULL);
> >  }
> >
> >  static ssize_t nc_sendv_compat(NetClientState *nc, const struct iovec *iov,
> > -                               int iovcnt)
> > +                               int iovcnt, unsigned flags)
> >  {
> >      uint8_t buffer[NET_BUFSIZE];
> >      size_t offset;
> >
> >      offset = iov_to_buf(iov, iovcnt, 0, buffer, sizeof(buffer));
> >
> > -    return nc->info->receive(nc, buffer, offset);
> > +    return nc->info->receive(nc, buffer, offset, flags);
> >  }
> >
> >  ssize_t qemu_deliver_packet_iov(NetClientState *sender,
> > @@ -521,9 +526,9 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
> >      }
> >
> >      if (nc->info->receive_iov) {
> > -        ret = nc->info->receive_iov(nc, iov, iovcnt);
> > +        ret = nc->info->receive_iov(nc, iov, iovcnt, flags);
> >      } else {
> > -        ret = nc_sendv_compat(nc, iov, iovcnt);
> > +        ret = nc_sendv_compat(nc, iov, iovcnt, flags);
> >      }
> >
> >      if (ret == 0) {
> > @@ -535,7 +540,8 @@ ssize_t qemu_deliver_packet_iov(NetClientState *sender,
> >
> >  ssize_t qemu_sendv_packet_async(NetClientState *sender,
> >                                  const struct iovec *iov, int iovcnt,
> > -                                NetPacketSent *sent_cb)
> > +                                NetPacketSent *sent_cb,
> > +                                unsigned flags)
> >  {
> >      NetQueue *queue;
> >
> > @@ -546,14 +552,15 @@ ssize_t qemu_sendv_packet_async(NetClientState *sender,
> >      queue = sender->peer->incoming_queue;
> >
> >      return qemu_net_queue_send_iov(queue, sender,
> > -                                   QEMU_NET_PACKET_FLAG_NONE,
> > +                                   flags | QEMU_NET_PACKET_FLAG_NONE,
> >                                     iov, iovcnt, sent_cb);
> >  }
> >
> >  ssize_t
> > -qemu_sendv_packet(NetClientState *nc, const struct iovec *iov, int iovcnt)
> > +qemu_sendv_packet(NetClientState *nc, const struct iovec *iov, int iovcnt,
> > +                  unsigned flags)
> >  {
> > -    return qemu_sendv_packet_async(nc, iov, iovcnt, NULL);
> > +    return qemu_sendv_packet_async(nc, iov, iovcnt, NULL, flags);
> >  }
> >
> >  NetClientState *qemu_find_netdev(const char *id)
> > diff --git a/net/netmap.c b/net/netmap.c
> > index 0ccc497..0b982a0 100644
> > --- a/net/netmap.c
> > +++ b/net/netmap.c
> > @@ -218,7 +218,8 @@ static void netmap_writable(void *opaque)
> >  }
> >
> >  static ssize_t netmap_receive(NetClientState *nc,
> > -      const uint8_t *buf, size_t size)
> > +                              const uint8_t *buf,
> > +                              size_t size, unsigned flags)
> >  {
> >      NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
> >      struct netmap_ring *ring = s->me.tx;
> > @@ -252,13 +253,17 @@ static ssize_t netmap_receive(NetClientState *nc,
> >      pkt_copy(buf, dst, size);
> >      ring->cur = NETMAP_RING_NEXT(ring, i);
> >      ring->avail--;
> > -    ioctl(s->me.fd, NIOCTXSYNC, NULL);
> > +
> > +    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
> > +        ioctl(s->me.fd, NIOCTXSYNC, NULL);
> > +    }
> >
> >      return size;
> >  }
> >
> >  static ssize_t netmap_receive_iov(NetClientState *nc,
> > -                    const struct iovec *iov, int iovcnt)
> > +                    const struct iovec *iov, int iovcnt,
> > +                    unsigned flags)
> >  {
> >      NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
> >      struct netmap_ring *ring = s->me.tx;
> > @@ -322,7 +327,9 @@ static ssize_t netmap_receive_iov(NetClientState *nc,
> >      ring->cur = i;
> >      ring->avail = avail;
> >
> > -    ioctl(s->me.fd, NIOCTXSYNC, NULL);
> > +    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
> > +        ioctl(s->me.fd, NIOCTXSYNC, NULL);
> > +    }
> >
> >      return iov_size(iov, iovcnt);
> >  }
> > @@ -368,7 +375,7 @@ static void netmap_send(void *opaque)
> >          }
> >
> >          iovsize = qemu_sendv_packet_async(&s->nc, s->iov, iovcnt,
> > -                                            netmap_send_completed);
> > +                                            netmap_send_completed, 0);
> >
> >          if (iovsize == 0) {
> >              /* The peer does not receive anymore. Packet is queued, stop
> > diff --git a/net/slirp.c b/net/slirp.c
> > index 124e953..a801638 100644
> > --- a/net/slirp.c
> > +++ b/net/slirp.c
> > @@ -103,10 +103,11 @@ void slirp_output(void *opaque, const uint8_t
> *pkt, int pkt_len)
> >  {
> >      SlirpState *s = opaque;
> >
> > -    qemu_send_packet(&s->nc, pkt, pkt_len);
> > +    qemu_send_packet(&s->nc, pkt, pkt_len, 0);
> >  }
> >
> > -static ssize_t net_slirp_receive(NetClientState *nc, const uint8_t
> *buf, size_t size)
> > +static ssize_t net_slirp_receive(NetClientState *nc, const uint8_t *buf,
> > +                                 size_t size, unsigned flags)
> >  {
> >      SlirpState *s = DO_UPCAST(SlirpState, nc, nc);
> >
> > diff --git a/net/socket.c b/net/socket.c
> > index fb21e20..acc715a 100644
> > --- a/net/socket.c
> > +++ b/net/socket.c
> > @@ -89,7 +89,8 @@ static void net_socket_writable(void *opaque)
> >      qemu_flush_queued_packets(&s->nc);
> >  }
> >
> > -static ssize_t net_socket_receive(NetClientState *nc, const uint8_t
> *buf, size_t size)
> > +static ssize_t net_socket_receive(NetClientState *nc, const uint8_t
> *buf,
> > +                                  size_t size, unsigned flags)
> >  {
> >      NetSocketState *s = DO_UPCAST(NetSocketState, nc, nc);
> >      uint32_t len = htonl(size);
> > @@ -124,7 +125,8 @@ static ssize_t net_socket_receive(NetClientState
> *nc, const uint8_t *buf, size_t
> >      return size;
> >  }
> >
> > -static ssize_t net_socket_receive_dgram(NetClientState *nc, const
> uint8_t *buf, size_t size)
> > +static ssize_t net_socket_receive_dgram(NetClientState *nc, const
> uint8_t *buf,
> > +                                        size_t size, unsigned flags)
> >  {
> >      NetSocketState *s = DO_UPCAST(NetSocketState, nc, nc);
> >      ssize_t ret;
> > @@ -211,7 +213,7 @@ static void net_socket_send(void *opaque)
> >              buf += l;
> >              size -= l;
> >              if (s->index >= s->packet_len) {
> > -                qemu_send_packet(&s->nc, s->buf, s->packet_len);
> > +                qemu_send_packet(&s->nc, s->buf, s->packet_len, 0);
> >                  s->index = 0;
> >                  s->state = 0;
> >              }
> > @@ -234,7 +236,7 @@ static void net_socket_send_dgram(void *opaque)
> >          net_socket_write_poll(s, false);
> >          return;
> >      }
> > -    qemu_send_packet(&s->nc, s->buf, size);
> > +    qemu_send_packet(&s->nc, s->buf, size, 0);
> >  }
> >
> >  static int net_socket_mcast_create(struct sockaddr_in *mcastaddr,
> struct in_addr *localaddr)
> > diff --git a/net/tap-win32.c b/net/tap-win32.c
> > index 91e9e84..2d86122 100644
> > --- a/net/tap-win32.c
> > +++ b/net/tap-win32.c
> > @@ -664,7 +664,7 @@ static void tap_win32_send(void *opaque)
> >
> >      size = tap_win32_read(s->handle, &buf, max_size);
> >      if (size > 0) {
> > -        qemu_send_packet(&s->nc, buf, size);
> > +        qemu_send_packet(&s->nc, buf, size, 0);
> >          tap_win32_free_buffer(s->handle, buf);
> >      }
> >  }
> > diff --git a/net/tap.c b/net/tap.c
> > index 39c1cda..6d7a02e 100644
> > --- a/net/tap.c
> > +++ b/net/tap.c
> > @@ -112,7 +112,7 @@ static ssize_t tap_write_packet(TAPState *s, const
> struct iovec *iov, int iovcnt
> >  }
> >
> >  static ssize_t tap_receive_iov(NetClientState *nc, const struct iovec
> *iov,
> > -                               int iovcnt)
> > +                               int iovcnt, unsigned flags)
> >  {
> >      TAPState *s = DO_UPCAST(TAPState, nc, nc);
> >      const struct iovec *iovp = iov;
> > @@ -130,7 +130,8 @@ static ssize_t tap_receive_iov(NetClientState *nc,
> const struct iovec *iov,
> >      return tap_write_packet(s, iovp, iovcnt);
> >  }
> >
> > -static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf,
> size_t size)
> > +static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf,
> > +                               size_t size, unsigned flags)
> >  {
> >      TAPState *s = DO_UPCAST(TAPState, nc, nc);
> >      struct iovec iov[2];
> > @@ -150,13 +151,14 @@ static ssize_t tap_receive_raw(NetClientState *nc, const uint8_t *buf, size_t si
> >      return tap_write_packet(s, iov, iovcnt);
> >  }
> >
> > -static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> > +static ssize_t tap_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> > +                           unsigned flags)
> >  {
> >      TAPState *s = DO_UPCAST(TAPState, nc, nc);
> >      struct iovec iov[1];
> >
> >      if (s->host_vnet_hdr_len && !s->using_vnet_hdr) {
> > -        return tap_receive_raw(nc, buf, size);
> > +        return tap_receive_raw(nc, buf, size, flags);
> >      }
> >
> >      iov[0].iov_base = (char *)buf;
> > @@ -203,7 +205,7 @@ static void tap_send(void *opaque)
> >              size -= s->host_vnet_hdr_len;
> >          }
> >
> > -        size = qemu_send_packet_async(&s->nc, buf, size, tap_send_completed);
> > +        size = qemu_send_packet_async(&s->nc, buf, size, tap_send_completed, 0);
> >          if (size == 0) {
> >              tap_read_poll(s, false);
> >          }
> > diff --git a/net/vde.c b/net/vde.c
> > index 2a619fb..5629f58 100644
> > --- a/net/vde.c
> > +++ b/net/vde.c
> > @@ -44,11 +44,12 @@ static void vde_to_qemu(void *opaque)
> >
> >      size = vde_recv(s->vde, (char *)buf, sizeof(buf), 0);
> >      if (size > 0) {
> > -        qemu_send_packet(&s->nc, buf, size);
> > +        qemu_send_packet(&s->nc, buf, size, 0);
> >      }
> >  }
> >
> > -static ssize_t vde_receive(NetClientState *nc, const uint8_t *buf, size_t size)
> > +static ssize_t vde_receive(NetClientState *nc, const uint8_t *buf, size_t size,
> > +                           unsigned flags)
> >  {
> >      VDEState *s = DO_UPCAST(VDEState, nc, nc);
> >      ssize_t ret;
> > diff --git a/savevm.c b/savevm.c
> > index 3f912dd..a8d5373 100644
> > --- a/savevm.c
> > +++ b/savevm.c
> > @@ -84,7 +84,7 @@ static void qemu_announce_self_iter(NICState *nic, void *opaque)
> >
> >      len = announce_self_create(buf, nic->conf->macaddr.a);
> >
> > -    qemu_send_packet_raw(qemu_get_queue(nic), buf, len);
> > +    qemu_send_packet_raw(qemu_get_queue(nic), buf, len, 0);
> >  }
> >
> >
> > --
> > 1.8.4.2
>



-- 
Vincenzo Maffione

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 10:20   ` Vincenzo Maffione
@ 2013-12-09 10:30     ` Michael S. Tsirkin
  2013-12-09 10:55       ` Vincenzo Maffione
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-12-09 10:30 UTC (permalink / raw)
  To: Vincenzo Maffione
  Cc: Peter Maydell, Jason Wang, mjt, qemu-devel, lcapitulino,
	peter.crosthwaite, Dmitry Fleytman, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, mark.langsdorf, owasserm,
	Paolo Bonzini, Andreas Färber

On Mon, Dec 09, 2013 at 11:20:29AM +0100, Vincenzo Maffione wrote:
> Hello,
>    I've done some netperf TCP_STREAM and TCP_RR virtio-net tests, using the
> same configuration.
> Here are the results
> 
> ########## netperf TCP_STREAM ###########
>         NO BATCHING         BATCHING
> 1          5.5 Gbps         3.8 Gbps
> 2          5.4 Gbps         5.5 Gbps
> 3          5.2 Gbps         5.2 Gbps
> 4          5.1 Gbps         5.0 Gbps
> 10         5.4 Gbps         5.2 Gbps
> 20         5.4 Gbps         5.4 Gbps
> 
> 
> ############ netperf TCP_RR #############
>         NO BATCHING         BATCHING
> 1         13.0 Ktts         12.8 Ktts
> 2         23.8 Ktts         23.0 Ktts
> 3         34.0 Ktts         32.5 Ktts
> 4         44.5 Ktts         41.0 Ktts
> 10        97.0 Ktts         93.0 Ktts
> 15       122.0 Ktts        120.0 Ktts
> 20       125.0 Ktts        128.0 Ktts
> 25       128.0 Ktts        130.0 Ktts
> 
> 
> There are some negative effects introduced by batching.
> Also consider that
>    - Since the TAP backend doesn't use the new flag, this patch doesn't change
> the performance when the TAP backend is used.
>    - I have not yet submitted the patch for virtio_net_header support, and
> therefore the TCP_STREAM performance with the NETMAP backend is currently not
>      comparable to the performance with the TAP backend, because we are limited to
> 1.5KB packets.
> 
> Cheers,
>   Vincenzo

Ah, so no GSO/UFO/checksum offload then?
In that case maybe it's a good idea to start with supporting
that in your backend. This does batching within the guest so
extra host side batching with all the tradeoffs it involves
might not be necessary.

Guest network stack behaviour with and without offloads is
different to such a degree that it's not clear optimizing
one is not pessimizing the other.

-- 
MST

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 10:30     ` Michael S. Tsirkin
@ 2013-12-09 10:55       ` Vincenzo Maffione
  2013-12-09 11:14         ` Michael S. Tsirkin
  0 siblings, 1 reply; 17+ messages in thread
From: Vincenzo Maffione @ 2013-12-09 10:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Jason Wang, mjt, qemu-devel, lcapitulino,
	peter.crosthwaite, Dmitry Fleytman, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, mark.langsdorf, owasserm,
	Paolo Bonzini, Andreas Färber

I totally agree with you, and we will propose a patch to make this possible.

However, none of the offloads you mentioned helps with packet rate
(checksum offload doesn't really help with short packets), which
is the main purpose of this patch. High packet rates (say 1-5 Mpps) are
interesting for people who want to use VMs as middleboxes. These packet
rates (and up to 20+ Mpps) are possible with netmap if proper batching is
supported.
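
To make the distinction between bit rate and packet rate concrete, here is
a rough sketch of the conversion. It is an illustration only, not code from
the patch: the function names are hypothetical, and header and framing
overhead are ignored.

```c
#include <assert.h>

/*
 * Packets per second needed to carry a given bit rate with a fixed packet
 * size. Header and framing overhead are ignored (simplifying assumption).
 */
static double packet_rate_for(double bit_rate, double packet_bytes)
{
    assert(packet_bytes > 0.0);
    return bit_rate / (packet_bytes * 8.0);
}

/* Bit rate carried by a given packet rate with a fixed packet size. */
static double bit_rate_for(double packet_rate, double packet_bytes)
{
    assert(packet_bytes > 0.0);
    return packet_rate * packet_bytes * 8.0;
}
```

Under these assumptions, 5.4 Gbps of 1500-byte packets is 450 Kpps, while
1 Mpps of 64-byte packets is only 512 Mbps. A workload can therefore be
packet-rate bound long before it is bandwidth bound.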

If you don't think adding the new flag support for virtio-net is a good
idea (though TAP performance is not affected in any case) we could also
make it optional.


Cheers
  Vincenzo


2013/12/9 Michael S. Tsirkin <mst@redhat.com>

> On Mon, Dec 09, 2013 at 11:20:29AM +0100, Vincenzo Maffione wrote:
> > Hello,
> >    I've done some netperf TCP_STREAM and TCP_RR virtio-net tests, using the
> > same configuration.
> > Here are the results
> >
> > ########## netperf TCP_STREAM ###########
> >         NO BATCHING         BATCHING
> > 1          5.5 Gbps         3.8 Gbps
> > 2          5.4 Gbps         5.5 Gbps
> > 3          5.2 Gbps         5.2 Gbps
> > 4          5.1 Gbps         5.0 Gbps
> > 10         5.4 Gbps         5.2 Gbps
> > 20         5.4 Gbps         5.4 Gbps
> >
> >
> > ############ netperf TCP_RR #############
> >         NO BATCHING         BATCHING
> > 1         13.0 Ktts         12.8 Ktts
> > 2         23.8 Ktts         23.0 Ktts
> > 3         34.0 Ktts         32.5 Ktts
> > 4         44.5 Ktts         41.0 Ktts
> > 10        97.0 Ktts         93.0 Ktts
> > 15       122.0 Ktts        120.0 Ktts
> > 20       125.0 Ktts        128.0 Ktts
> > 25       128.0 Ktts        130.0 Ktts
> >
> >
> > There are some negative effects introduced by batching.
> > Also consider that
> >    - Since the TAP backend doesn't use the new flag, this patch doesn't change
> > the performance when the TAP backend is used.
> >    - I have not yet submitted the patch for virtio_net_header support, and
> > therefore the TCP_STREAM performance with the NETMAP backend is currently not
> >      comparable to the performance with the TAP backend, because we are limited to
> > 1.5KB packets.
> >
> > Cheers,
> >   Vincenzo
>
> Ah, so no GSO/UFO/checksum offload then?
> In that case maybe it's a good idea to start with supporting
> that in your backend. This does batching within the guest so
> extra host side batching with all the tradeoffs it involves
> might not be necessary.
>
> Guest network stack behaviour with and without offloads is
> different to such a degree that it's not clear optimizing
> one is not pessimizing the other.
>
> --
> MST
>



-- 
Vincenzo Maffione

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 10:55       ` Vincenzo Maffione
@ 2013-12-09 11:14         ` Michael S. Tsirkin
  2013-12-09 12:42           ` Stefan Hajnoczi
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-12-09 11:14 UTC (permalink / raw)
  To: Vincenzo Maffione
  Cc: Peter Maydell, Jason Wang, mjt, qemu-devel, lcapitulino,
	peter.crosthwaite, Dmitry Fleytman, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, mark.langsdorf, owasserm,
	Paolo Bonzini, Andreas Färber

On Mon, Dec 09, 2013 at 11:55:57AM +0100, Vincenzo Maffione wrote:
> I totally agree with you, and we will propose a patch to make this possible.
> 
> However, none of the offloadings you mentioned helps with packet rate
> throughput (checksum offload doesn't really help with short packets), which is
> the main purpose of this patch. High packet rates (say 1-5 Mpps) are
> interesting for people who want to use VMs as middleboxes. These packet rates
> (and up to 20+ Mpps) are possible with netmap if proper batching is supported.

I don't see why host batching would be effective where guest batching
isn't. At least in theory, batching belongs at endpoints, not in the
network.

GSO makes packets bigger, and checksum offload helps there.

> If you don't think adding the new flag support for virtio-net is a good idea
> (though TAP performance is not affected in any case) we could also make it
> optional.
> 
> 
> Cheers
>   Vincenzo
> 

I think it's too early to say whether this patch is beneficial for
netmap, too.  It looks like something that trades off latency
for throughput, and this is a decision the endpoint (VM) should
make, not the network (host).
So you should measure with offloads on before you make conclusions about it.



> 2013/12/9 Michael S. Tsirkin <mst@redhat.com>
> 
>     On Mon, Dec 09, 2013 at 11:20:29AM +0100, Vincenzo Maffione wrote:
>     > Hello,
>     >    I've done some netperf TCP_STREAM and TCP_RR virtio-net tests, using the
>     > same configuration.
>     > Here are the results
>     >
>     > ########## netperf TCP_STREAM ###########
>     >         NO BATCHING         BATCHING
>     > 1          5.5 Gbps         3.8 Gbps
>     > 2          5.4 Gbps         5.5 Gbps
>     > 3          5.2 Gbps         5.2 Gbps
>     > 4          5.1 Gbps         5.0 Gbps
>     > 10         5.4 Gbps         5.2 Gbps
>     > 20         5.4 Gbps         5.4 Gbps
>     >
>     >
>     > ############ netperf TCP_RR #############
>     >         NO BATCHING         BATCHING
>     > 1         13.0 Ktts         12.8 Ktts
>     > 2         23.8 Ktts         23.0 Ktts
>     > 3         34.0 Ktts         32.5 Ktts
>     > 4         44.5 Ktts         41.0 Ktts
>     > 10        97.0 Ktts         93.0 Ktts
>     > 15       122.0 Ktts        120.0 Ktts
>     > 20       125.0 Ktts        128.0 Ktts
>     > 25       128.0 Ktts        130.0 Ktts
>     >
>     >
>     > There are some negative effects introduced by batching.
>     > Also consider that
>     >    - Since the TAP backend doesn't use the new flag, this patch doesn't change
>     > the performance when the TAP backend is used.
>     >    - I have not yet submitted the patch for virtio_net_header support, and
>     > therefore the TCP_STREAM performance with the NETMAP backend is currently not
>     >      comparable to the performance with the TAP backend, because we are limited to
>     > 1.5KB packets.
>     >
>     > Cheers,
>     >   Vincenzo
> 
>     Ah, so no GSO/UFO/checksum offload then?
>     In that case maybe it's a good idea to start with supporting
>     that in your backend. This does batching within the guest so
>     extra host side batching with all the tradeoffs it involves
>     might not be necessary.
> 
>     Guest network stack behaviour with and without offloads is
>     different to such a degree that it's not clear optimizing
>     one is not pessimizing the other.
>    
>     --
>     MST
> 
> 
> 
> 
> --
> Vincenzo Maffione

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-06 14:44 [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced Vincenzo Maffione
  2013-12-06 16:39 ` Stefan Weil
  2013-12-08 12:11 ` Michael S. Tsirkin
@ 2013-12-09 12:36 ` Stefan Hajnoczi
  2013-12-09 14:02   ` Michael S. Tsirkin
  2 siblings, 1 reply; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-12-09 12:36 UTC (permalink / raw)
  To: Vincenzo Maffione
  Cc: peter.maydell, mst, jasowang, mjt, qemu-devel, lcapitulino,
	peter.crosthwaite, owasserm, kraxel, yan, edgar.iglesias, akong,
	quintela, agraf, aliguori, marcel.a, sw, stefanha, g.lettieri,
	rizzo, dmitry, mark.langsdorf, pbonzini, afaerber

On Fri, Dec 06, 2013 at 03:44:33PM +0100, Vincenzo Maffione wrote:
>     - This patch is against the net-next tree (https://github.com/stefanha/qemu.git)
>       because the first netmap patch is not in the qemu master (AFAIK).

You are right.  I am sending a pull request now to get those patches
into qemu.git/master.

>  hw/net/cadence_gem.c    |  3 ++-
>  hw/net/dp8393x.c        |  5 +++--
>  hw/net/e1000.c          | 21 ++++++++++++++++-----
>  hw/net/eepro100.c       |  5 +++--
>  hw/net/etraxfs_eth.c    |  5 +++--
>  hw/net/lan9118.c        |  2 +-
>  hw/net/mcf_fec.c        |  5 +++--
>  hw/net/mipsnet.c        |  6 ++++--
>  hw/net/ne2000.c         |  5 +++--
>  hw/net/ne2000.h         |  3 ++-
>  hw/net/opencores_eth.c  |  2 +-
>  hw/net/pcnet.c          |  8 +++++---
>  hw/net/pcnet.h          |  3 ++-
>  hw/net/rtl8139.c        |  7 ++++---
>  hw/net/smc91c111.c      |  5 +++--
>  hw/net/spapr_llan.c     |  2 +-
>  hw/net/stellaris_enet.c |  3 ++-
>  hw/net/virtio-net.c     | 10 ++++++++--
>  hw/net/vmxnet3.c        |  3 ++-
>  hw/net/vmxnet_tx_pkt.c  |  4 ++--
>  hw/net/xgmac.c          |  2 +-
>  hw/net/xilinx_axienet.c |  2 +-
>  hw/usb/dev-network.c    |  8 +++++---
>  include/net/net.h       | 20 +++++++++++++-------
>  include/net/queue.h     |  1 +
>  net/dump.c              |  3 ++-
>  net/hub.c               | 10 ++++++----
>  net/net.c               | 39 +++++++++++++++++++++++----------------
>  net/netmap.c            | 17 ++++++++++++-----
>  net/slirp.c             |  5 +++--
>  net/socket.c            | 10 ++++++----
>  net/tap-win32.c         |  2 +-
>  net/tap.c               | 12 +++++++-----
>  net/vde.c               |  5 +++--
>  savevm.c                |  2 +-
>  35 files changed, 155 insertions(+), 90 deletions(-)

Please split this into multiple patches:

1. net subsystem API change that touches all files (if necessary)
2. e1000 MORE support
3. virtio-net MORE support
4. netmap MORE support

This makes it easier to review and bisect.

Thanks,
Stefan

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 11:14         ` Michael S. Tsirkin
@ 2013-12-09 12:42           ` Stefan Hajnoczi
  2013-12-09 13:25             ` Vincenzo Maffione
  2013-12-09 13:55             ` Michael S. Tsirkin
  0 siblings, 2 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-12-09 12:42 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Jason Wang, mjt, qemu-devel, Vincenzo Maffione,
	lcapitulino, peter.crosthwaite, owasserm, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, Dmitry Fleytman, mark.langsdorf,
	Paolo Bonzini, Andreas Färber

On Mon, Dec 09, 2013 at 01:14:31PM +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 09, 2013 at 11:55:57AM +0100, Vincenzo Maffione wrote:
> > > If you don't think adding the new flag support for virtio-net is a good idea
> > > (though TAP performance is not affected in any case) we could also make it
> > > optional.
> > 
> > 
> > Cheers
> >   Vincenzo
> > 
> 
> I think it's too early to say whether this patch is beneficial for
> netmap, too.  It looks like something that trades off latency
> for throughput, and this is a decision the endpoint (VM) should
> make, not the network (host).
> So you should measure with offloads on before you make conclusions about it.

Just to check my understanding, we're talking about the following kind
of batching:

  int num_packets = peek_available_packets(device);
  while (num_packets-- > 0) {
      int flags = MORE;
      if (num_packets == 0) {
          flags = NONE;
      }
      qemu_net_send_packet(..., flags);
  }

In other words, this only batches up a single burst of packets.  It
doesn't introduce timers or blocking calls.

So the effect of batching should be relatively small on latency.  In
fact, it's almost like sendmmsg(2)/recvmmsg(2) but using a
one-packet-at-a-time interface.
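
The loop above can be written as a small compilable sketch. This is an
illustration under assumptions, not the patch itself: the flag name comes
from the patch, but its numeric value, the Packet type, and the
send_packet() stand-in (which only records the flags it is given) are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Flag name taken from the patch; the numeric value is an assumption. */
#define QEMU_NET_PACKET_FLAG_MORE 0x1u

#define MAX_LOG 64

/* Hypothetical packet descriptor used only for this sketch. */
typedef struct {
    const void *buf;
    size_t len;
} Packet;

/* Hypothetical stand-in for the peer's receive path: it records flags. */
static unsigned flag_log[MAX_LOG];
static size_t flag_log_len;

static void send_packet(const void *buf, size_t len, unsigned flags)
{
    (void)buf;
    (void)len;
    if (flag_log_len < MAX_LOG) {
        flag_log[flag_log_len++] = flags;
    }
}

/* Send one burst: every packet except the last carries the MORE hint. */
static void send_burst(const Packet *pkts, size_t n)
{
    assert(pkts != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        unsigned flags = (i + 1 < n) ? QEMU_NET_PACKET_FLAG_MORE : 0;
        send_packet(pkts[i].buf, pkts[i].len, flags);
    }
}
```

The burst length is fixed before the loop starts, so the sender never waits
for further packets to arrive: the final packet always clears the hint.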

Does this sound right?

Stefan

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 12:42           ` Stefan Hajnoczi
@ 2013-12-09 13:25             ` Vincenzo Maffione
  2013-12-09 14:00               ` Michael S. Tsirkin
  2013-12-09 13:55             ` Michael S. Tsirkin
  1 sibling, 1 reply; 17+ messages in thread
From: Vincenzo Maffione @ 2013-12-09 13:25 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Michael S. Tsirkin, Jason Wang, mjt, qemu-devel,
	lcapitulino, peter.crosthwaite, owasserm, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, Dmitry Fleytman, mark.langsdorf,
	Paolo Bonzini, Andreas Färber

2013/12/9 Stefan Hajnoczi <stefanha@gmail.com>

> On Mon, Dec 09, 2013 at 01:14:31PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 09, 2013 at 11:55:57AM +0100, Vincenzo Maffione wrote:
> > > If you don't think adding the new flag support for virtio-net is a good idea
> > > (though TAP performance is not affected in any case) we could also make it
> > > optional.
> > >
> > >
> > > Cheers
> > >   Vincenzo
> > >
> >
> > I think it's too early to say whether this patch is beneficial for
> > netmap, too.  It looks like something that trades off latency
> > for throughput, and this is a decision the endpoint (VM) should
> > make, not the network (host).
> > So you should measure with offloads on before you make conclusions about it.
>
> Just to check my understanding, we're talking about the following kind
> of batching:
>
>   int num_packets = peek_available_packets(device);
>   while (num_packets-- > 0) {
>       int flags = MORE;
>       if (num_packets == 0) {
>           flags = NONE;
>       }
>       qemu_net_send_packet(..., flags);
>   }
>
> In other words, this only batches up a single burst of packets.  It
> doesn't introduce timers or blocking calls.
>
> So the effect of batching should be relatively small on latency.  In
> fact, it's almost like sendmmsg(2)/recvmmsg(2) but using a
> one-packet-at-a-time interface.
>
> Does this sound right?
>
> Stefan
>

Totally correct.

In reply to Michael:
   - what you say is right with netmap used as a backend with typical TCP
applications in the guests, and we already have an implementation that
supports those offloads

   - however, consider that the main use of netmap is fast packet
processing in middleboxes, where packet aggregation is not always possible.
Applications that use netmap **in the guest** typically use "packet
batching" (i.e. send multiple packets with one system call), so batches
originate in the guest. Without the MORE flag, those batches are split at
the frontend-backend interface. This is just a different workload.
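
A minimal sketch of the receiving side may help show what "split at the
frontend-backend interface" costs. Everything here is hypothetical except
the flag name: the Backend struct, backend_flush() (standing in for the
expensive step, such as a system call), and the flag's numeric value are
assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Flag name taken from the patch; the numeric value is an assumption. */
#define QEMU_NET_PACKET_FLAG_MORE 0x1u

/* Hypothetical backend state: queued packets and flush operations. */
typedef struct {
    size_t pending;   /* packets queued since the last flush */
    size_t flushes;   /* how many times the expensive flush ran */
    size_t delivered; /* packets handed to the host so far */
} Backend;

/* Stand-in for the costly step (a system call in a real backend). */
static void backend_flush(Backend *b)
{
    b->delivered += b->pending;
    b->pending = 0;
    b->flushes++;
}

/* Queue one packet; defer the flush while the sender signals MORE. */
static void backend_receive(Backend *b, size_t len, unsigned flags)
{
    assert(len > 0);
    b->pending++;
    if (!(flags & QEMU_NET_PACKET_FLAG_MORE)) {
        backend_flush(b);
    }
}

/* Push a burst of n packets and report how many flushes it cost. */
static size_t flushes_for_burst(size_t n, int use_more)
{
    Backend b = { 0, 0, 0 };
    for (size_t i = 0; i < n; i++) {
        unsigned flags =
            (use_more && i + 1 < n) ? QEMU_NET_PACKET_FLAG_MORE : 0;
        backend_receive(&b, 64, flags);
    }
    assert(b.delivered == n && b.pending == 0);
    return b.flushes;
}
```

In this model a 256-packet burst costs 256 flushes without the hint and a
single flush with it, which is where the packet-rate gain would come from.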


Regards,
-- 
Vincenzo Maffione

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 12:42           ` Stefan Hajnoczi
  2013-12-09 13:25             ` Vincenzo Maffione
@ 2013-12-09 13:55             ` Michael S. Tsirkin
  2013-12-10  9:16               ` Stefan Hajnoczi
  1 sibling, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-12-09 13:55 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Peter Maydell, Jason Wang, mjt, qemu-devel, Vincenzo Maffione,
	lcapitulino, peter.crosthwaite, owasserm, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, Dmitry Fleytman, mark.langsdorf,
	Paolo Bonzini, Andreas Färber

On Mon, Dec 09, 2013 at 01:42:30PM +0100, Stefan Hajnoczi wrote:
> On Mon, Dec 09, 2013 at 01:14:31PM +0200, Michael S. Tsirkin wrote:
> > On Mon, Dec 09, 2013 at 11:55:57AM +0100, Vincenzo Maffione wrote:
> > > If you don't think adding the new flag support for virtio-net is a good idea
> > > (though TAP performance is not affected in any case) we could also make it
> > > optional.
> > > 
> > > 
> > > Cheers
> > >   Vincenzo
> > > 
> > 
> > I think it's too early to say whether this patch is beneficial for
> > netmap, too.  It looks like something that trades off latency
> > for throughput, and this is a decision the endpoint (VM) should
> > make, not the network (host).
> > So you should measure with offloads on before you make conclusions about it.
> 
> Just to check my understanding, we're talking about the following kind
> of batching:
> 
>   int num_packets = peek_available_packets(device);
>   while (num_packets-- > 0) {
>       int flags = MORE;
>       if (num_packets == 0) {
>           flags = NONE;
>       }
>       qemu_net_send_packet(..., flags);
>   }
> 
> In other words, this only batches up a single burst of packets.  It
> doesn't introduce timers or blocking calls.

Yes.

> So the effect of batching should be relatively small on latency.  In
> fact, it's almost like sendmmsg(2)/recvmmsg(2) but using a
> one-packet-at-a-time interface.
> 
> Does this sound right?
> 
> Stefan

Why would it be small?  Consider a queue of 256 packets.
You are sending out a single short packet, followed
by a burst of 255 larger packets.
The single packet is not transmitted until QEMU completes
processing the 255 larger ones.
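
The scenario can be put into numbers with a toy cost model. This is a
sketch under assumptions, not a measurement: cost[i] is a hypothetical
per-packet processing time, flush_cost a hypothetical time to hand queued
packets to the host, and the units are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Per-packet flushing: packet idx leaves once it and every earlier packet
 * have each been processed and flushed individually.
 */
static unsigned long departure_unbatched(const unsigned long *cost,
                                         size_t idx,
                                         unsigned long flush_cost)
{
    unsigned long t = 0;
    for (size_t i = 0; i <= idx; i++) {
        t += cost[i] + flush_cost;
    }
    return t;
}

/*
 * Batched: nothing leaves until the whole burst of n packets has been
 * processed; a single flush then sends all of them at once.
 */
static unsigned long departure_batched(const unsigned long *cost,
                                       size_t n,
                                       unsigned long flush_cost)
{
    unsigned long t = 0;
    assert(n > 0);
    for (size_t i = 0; i < n; i++) {
        t += cost[i];
    }
    return t + flush_cost;
}
```

With one packet of cost 1 followed by 255 packets of cost 4 and a flush
cost of 10, the first packet leaves at time 11 unbatched but at 1031
batched, while the last packet leaves at 3581 unbatched and 1031 batched.
In this model batching helps the burst as a whole and hurts the packet at
its head.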

-- 
MST

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 13:25             ` Vincenzo Maffione
@ 2013-12-09 14:00               ` Michael S. Tsirkin
  2013-12-09 16:04                 ` Vincenzo Maffione
  0 siblings, 1 reply; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-12-09 14:00 UTC (permalink / raw)
  To: Vincenzo Maffione
  Cc: Peter Maydell, Stefan Hajnoczi, Jason Wang, mjt, qemu-devel,
	lcapitulino, peter.crosthwaite, owasserm, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, Dmitry Fleytman, mark.langsdorf,
	Paolo Bonzini, Andreas Färber

On Mon, Dec 09, 2013 at 02:25:46PM +0100, Vincenzo Maffione wrote:
> 
> 
> 
> 2013/12/9 Stefan Hajnoczi <stefanha@gmail.com>
> 
>     On Mon, Dec 09, 2013 at 01:14:31PM +0200, Michael S. Tsirkin wrote:
>     > On Mon, Dec 09, 2013 at 11:55:57AM +0100, Vincenzo Maffione wrote:
>     > > If you don't think adding the new flag support for virtio-net is a good idea
>     > > (though TAP performance is not affected in any case) we could also make it
>     > > optional.
>     > >
>     > >
>     > > Cheers
>     > >   Vincenzo
>     > >
>     >
>     > I think it's too early to say whether this patch is beneficial for
>     > netmap, too.  It looks like something that trades off latency
>     > for throughput, and this is a decision the endpoint (VM) should
>     > make, not the network (host).
>     > So you should measure with offloads on before you make conclusions about it.
> 
>     Just to check my understanding, we're talking about the following kind
>     of batching:
> 
>       int num_packets = peek_available_packets(device);
>       while (num_packets-- > 0) {
>           int flags = MORE;
>           if (num_packets == 0) {
>               flags = NONE;
>           }
>           qemu_net_send_packet(..., flags);
>       }
> 
>     In other words, this only batches up a single burst of packets.  It
>     doesn't introduce timers or blocking calls.
> 
>     So the effect of batching should be relatively small on latency.  In
>     fact, it's almost like sendmmsg(2)/recvmmsg(2) but using a
>     one-packet-at-a-time interface.
> 
>     Does this sound right?
>    
>     Stefan
> 
> 
> Totally correct.
> 
> In reply to Michael:
>    - what you say is right with netmap used as a backend with typical TCP
> applications in the guests, and we already have an implementation that supports
> those offloads
> 
>    - however, consider that the main use of netmap is fast packet processing in
> middleboxes, where packet aggregation is not always possible. Applications that
> use netmap **in the guest** typically use "packet batching" (i.e. send multiple
> packets with one system call), so batches originate in the guest. Without the
> MORE flag, those batches are split at the frontend-backend interface. This is
> just a different workload.
> 
> 
> Regards,
> --
> Vincenzo Maffione

Considering that you have measured a performance regression under
netperf, I don't understand why we keep arguing
about theory. Increasing latency is a problem, and if it can already be
seen with netperf it will only get worse with real-life workloads.

So my advice is: start by merging offload support for netmap, then check
whether this optimization adds enough performance to be worth it. If it
does, it needs more heuristics to avoid hurting latency.

-- 
MST

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 12:36 ` Stefan Hajnoczi
@ 2013-12-09 14:02   ` Michael S. Tsirkin
  2013-12-09 14:10     ` Luigi Rizzo
  2013-12-10  8:53     ` Stefan Hajnoczi
  0 siblings, 2 replies; 17+ messages in thread
From: Michael S. Tsirkin @ 2013-12-09 14:02 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: peter.maydell, jasowang, mjt, qemu-devel, Vincenzo Maffione,
	lcapitulino, peter.crosthwaite, owasserm, kraxel, yan,
	edgar.iglesias, akong, quintela, agraf, aliguori, marcel.a, sw,
	stefanha, g.lettieri, rizzo, dmitry, mark.langsdorf, pbonzini,
	afaerber

On Mon, Dec 09, 2013 at 01:36:54PM +0100, Stefan Hajnoczi wrote:
> On Fri, Dec 06, 2013 at 03:44:33PM +0100, Vincenzo Maffione wrote:
> >     - This patch is against the net-next tree (https://github.com/stefanha/qemu.git)
> >       because the first netmap patch is not in the qemu master (AFAIK).
> 
> You are right.  I am sending a pull request now to get those patches
> into qemu.git/master.

This only arrived over the weekend and affects all
net devices. What's the rush?
Why not give people a chance to review and discuss
properly?

> >  hw/net/cadence_gem.c    |  3 ++-
> >  hw/net/dp8393x.c        |  5 +++--
> >  hw/net/e1000.c          | 21 ++++++++++++++++-----
> >  hw/net/eepro100.c       |  5 +++--
> >  hw/net/etraxfs_eth.c    |  5 +++--
> >  hw/net/lan9118.c        |  2 +-
> >  hw/net/mcf_fec.c        |  5 +++--
> >  hw/net/mipsnet.c        |  6 ++++--
> >  hw/net/ne2000.c         |  5 +++--
> >  hw/net/ne2000.h         |  3 ++-
> >  hw/net/opencores_eth.c  |  2 +-
> >  hw/net/pcnet.c          |  8 +++++---
> >  hw/net/pcnet.h          |  3 ++-
> >  hw/net/rtl8139.c        |  7 ++++---
> >  hw/net/smc91c111.c      |  5 +++--
> >  hw/net/spapr_llan.c     |  2 +-
> >  hw/net/stellaris_enet.c |  3 ++-
> >  hw/net/virtio-net.c     | 10 ++++++++--
> >  hw/net/vmxnet3.c        |  3 ++-
> >  hw/net/vmxnet_tx_pkt.c  |  4 ++--
> >  hw/net/xgmac.c          |  2 +-
> >  hw/net/xilinx_axienet.c |  2 +-
> >  hw/usb/dev-network.c    |  8 +++++---
> >  include/net/net.h       | 20 +++++++++++++-------
> >  include/net/queue.h     |  1 +
> >  net/dump.c              |  3 ++-
> >  net/hub.c               | 10 ++++++----
> >  net/net.c               | 39 +++++++++++++++++++++++----------------
> >  net/netmap.c            | 17 ++++++++++++-----
> >  net/slirp.c             |  5 +++--
> >  net/socket.c            | 10 ++++++----
> >  net/tap-win32.c         |  2 +-
> >  net/tap.c               | 12 +++++++-----
> >  net/vde.c               |  5 +++--
> >  savevm.c                |  2 +-
> >  35 files changed, 155 insertions(+), 90 deletions(-)
> 
> Please split this into multiple patches:
> 
> 1. net subsystem API change that touches all files (if necessary)
> 2. e1000 MORE support
> 3. virtio-net MORE support
> 4. netmap MORE support
> 
> This makes it easier to review and bisect.
> 
> Thanks,
> Stefan

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 14:02   ` Michael S. Tsirkin
@ 2013-12-09 14:10     ` Luigi Rizzo
  2013-12-10  8:53     ` Stefan Hajnoczi
  1 sibling, 0 replies; 17+ messages in thread
From: Luigi Rizzo @ 2013-12-09 14:10 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Stefan Hajnoczi, Jason Wang, Michael Tokarev,
	qemu-devel, Vincenzo Maffione, Luiz Capitulino,
	peter.crosthwaite, owasserm, Gerd Hoffmann, yan, edgar.iglesias,
	akong, quintela, agraf, Anthony Liguori, marcel.a, sw,
	Stefan Hajnoczi, Giuseppe Lettieri, dmitry, mark.langsdorf,
	Paolo Bonzini, Andreas Färber

On Mon, Dec 9, 2013 at 3:02 PM, Michael S. Tsirkin <mst@redhat.com> wrote:

> On Mon, Dec 09, 2013 at 01:36:54PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Dec 06, 2013 at 03:44:33PM +0100, Vincenzo Maffione wrote:
> > >     - This patch is against the net-next tree (https://github.com/stefanha/qemu.git)
> > >       because the first netmap patch is not in the qemu master (AFAIK).
> >
> > You are right.  I am sending a pull request now to get those patches
> > into qemu.git/master.
>
> This only arrived over the weekend and affects all
> net devices. What's the rush?
> Why not give people a chance to review and discuss
> properly?
>

As I understand it, the pull request is for the netmap backend,
not for this batching patch.

cheers
luigi


>
> > >  hw/net/cadence_gem.c    |  3 ++-
> > >  hw/net/dp8393x.c        |  5 +++--
> > >  hw/net/e1000.c          | 21 ++++++++++++++++-----
> > >  hw/net/eepro100.c       |  5 +++--
> > >  hw/net/etraxfs_eth.c    |  5 +++--
> > >  hw/net/lan9118.c        |  2 +-
> > >  hw/net/mcf_fec.c        |  5 +++--
> > >  hw/net/mipsnet.c        |  6 ++++--
> > >  hw/net/ne2000.c         |  5 +++--
> > >  hw/net/ne2000.h         |  3 ++-
> > >  hw/net/opencores_eth.c  |  2 +-
> > >  hw/net/pcnet.c          |  8 +++++---
> > >  hw/net/pcnet.h          |  3 ++-
> > >  hw/net/rtl8139.c        |  7 ++++---
> > >  hw/net/smc91c111.c      |  5 +++--
> > >  hw/net/spapr_llan.c     |  2 +-
> > >  hw/net/stellaris_enet.c |  3 ++-
> > >  hw/net/virtio-net.c     | 10 ++++++++--
> > >  hw/net/vmxnet3.c        |  3 ++-
> > >  hw/net/vmxnet_tx_pkt.c  |  4 ++--
> > >  hw/net/xgmac.c          |  2 +-
> > >  hw/net/xilinx_axienet.c |  2 +-
> > >  hw/usb/dev-network.c    |  8 +++++---
> > >  include/net/net.h       | 20 +++++++++++++-------
> > >  include/net/queue.h     |  1 +
> > >  net/dump.c              |  3 ++-
> > >  net/hub.c               | 10 ++++++----
> > >  net/net.c               | 39 +++++++++++++++++++++++----------------
> > >  net/netmap.c            | 17 ++++++++++++-----
> > >  net/slirp.c             |  5 +++--
> > >  net/socket.c            | 10 ++++++----
> > >  net/tap-win32.c         |  2 +-
> > >  net/tap.c               | 12 +++++++-----
> > >  net/vde.c               |  5 +++--
> > >  savevm.c                |  2 +-
> > >  35 files changed, 155 insertions(+), 90 deletions(-)
> >
> > Please split this into multiple patches:
> >
> > 1. net subsystem API change that touches all files (if necessary)
> > 2. e1000 MORE support
> > 3. virtio-net MORE support
> > 4. netmap MORE support
> >
> > This makes it easier to review and bisect.
> >
> > Thanks,
> > Stefan
>



-- 
-----------------------------------------+-------------------------------
 Prof. Luigi RIZZO, rizzo@iet.unipi.it  . Dip. di Ing. dell'Informazione
 http://www.iet.unipi.it/~luigi/        . Universita` di Pisa
 TEL      +39-050-2211611               . via Diotisalvi 2
 Mobile   +39-338-6809875               . 56122 PISA (Italy)
-----------------------------------------+-------------------------------

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 14:00               ` Michael S. Tsirkin
@ 2013-12-09 16:04                 ` Vincenzo Maffione
  0 siblings, 0 replies; 17+ messages in thread
From: Vincenzo Maffione @ 2013-12-09 16:04 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Stefan Hajnoczi, Jason Wang, mjt, qemu-devel,
	lcapitulino, peter.crosthwaite, owasserm, Gerd Hoffmann,
	Yan Vugenfirer, Edgar E. Iglesias, akong, quintela,
	Alexander Graf, aliguori, marcel.a, sw, Stefan Hajnoczi,
	Giuseppe Lettieri, Luigi Rizzo, Dmitry Fleytman, mark.langsdorf,
	Paolo Bonzini, Andreas Färber

Ok,
   We will prepare support for the vnet header and offloads.

However, this requires some API extensions because AFAIK only the TAP
backend currently supports TSO/UFO/CSUM, and consequently virtio-net code
directly calls TAP-specific functions like tap_set_offload(),
tap_using_vnet_hdr(), tap_has_vnet_hdr(), .... These should become
something like qemu_peer_set_offload(), qemu_peer_using_vnet_hdr(), ...,
with the proper callbacks added to the NetClientInfo struct.
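
To make the idea concrete, here is a minimal sketch of what such peer-generic
wrappers could look like. It is only an illustration: the struct layouts are
simplified stand-ins for the real NetClientState/NetClientInfo, the callback
names and signatures are assumptions, and the "toy" backend exists only for
the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct NetClientState NetClientState;

/* Hypothetical subset of NetClientInfo: only the proposed callbacks.
 * A backend without offload support simply leaves them NULL. */
typedef struct NetClientInfo {
    bool (*has_vnet_hdr)(NetClientState *nc);
    void (*using_vnet_hdr)(NetClientState *nc, bool enable);
    void (*set_offload)(NetClientState *nc, int csum, int tso4, int tso6,
                        int ecn, int ufo);
} NetClientInfo;

struct NetClientState {
    const NetClientInfo *info;
    NetClientState *peer;
    /* Illustrative backend state, not fields of the real struct. */
    bool vnet_hdr_in_use;
    bool csum_offload;
};

/* Returns the peer's NetClientInfo, or NULL if there is no usable peer. */
static const NetClientInfo *peer_info(NetClientState *nc)
{
    assert(nc != NULL);
    return nc->peer ? nc->peer->info : NULL;
}

/* Generic wrappers: the frontend asks its peer, whatever backend that is. */
bool qemu_peer_has_vnet_hdr(NetClientState *nc)
{
    const NetClientInfo *info = peer_info(nc);
    return info && info->has_vnet_hdr && info->has_vnet_hdr(nc->peer);
}

void qemu_peer_using_vnet_hdr(NetClientState *nc, bool enable)
{
    const NetClientInfo *info = peer_info(nc);
    if (info && info->using_vnet_hdr) {
        info->using_vnet_hdr(nc->peer, enable);
    }
}

void qemu_peer_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
                           int ecn, int ufo)
{
    const NetClientInfo *info = peer_info(nc);
    if (info && info->set_offload) {
        info->set_offload(nc->peer, csum, tso4, tso6, ecn, ufo);
    }
}

/* Toy backend that supports the vnet header and checksum offload only. */
static bool toy_has_vnet_hdr(NetClientState *nc)
{
    (void)nc;
    return true;
}

static void toy_using_vnet_hdr(NetClientState *nc, bool enable)
{
    nc->vnet_hdr_in_use = enable;
}

static void toy_set_offload(NetClientState *nc, int csum, int tso4, int tso6,
                            int ecn, int ufo)
{
    (void)tso4; (void)tso6; (void)ecn; (void)ufo;
    nc->csum_offload = (csum != 0);
}

const NetClientInfo toy_offload_info = {
    toy_has_vnet_hdr, toy_using_vnet_hdr, toy_set_offload
};

/* Backend (or frontend) that implements none of the optional callbacks. */
const NetClientInfo plain_info = { NULL, NULL, NULL };
```

With this shape, virtio-net would call qemu_peer_has_vnet_hdr(nc) instead of
tap_has_vnet_hdr(), and a backend that does not fill in the callbacks is
treated as having no offload support.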

Does this sound good to you?


Regards,
  Vincenzo


2013/12/9 Michael S. Tsirkin <mst@redhat.com>

> On Mon, Dec 09, 2013 at 02:25:46PM +0100, Vincenzo Maffione wrote:
> >
> >
> >
> > 2013/12/9 Stefan Hajnoczi <stefanha@gmail.com>
> >
> >     On Mon, Dec 09, 2013 at 01:14:31PM +0200, Michael S. Tsirkin wrote:
> >     > On Mon, Dec 09, 2013 at 11:55:57AM +0100, Vincenzo Maffione wrote:
> >     > > If you don't think adding the new flag support for virtio-net is a good idea
> >     > > (though TAP performance is not affected in every case) we could also make it
> >     > > optional.
> >     > >
> >     > >
> >     > > Cheers
> >     > >   Vincenzo
> >     > >
> >     >
> >     > I think it's too early to say whether this patch is beneficial for
> >     > netmap, too.  It looks like something that trades off latency
> >     > for throughput, and this is a decision the endpoint (VM) should
> >     > make, not the network (host).
> >     > So you should measure with offloads on before you make conclusions about it.
> >
> >     Just to check my understanding, we're talking about the following kind
> >     of batching:
> >
> >       int num_packets = peek_available_packets(device);
> >       while (num_packets-- > 0) {
> >           int flags = MORE;
> >           if (num_packets == 0) {
> >               flags = NONE;
> >           }
> >           qemu_net_send_packet(..., flags);
> >       }
> >
> >     In other words, this only batches up a single burst of packets.  It
> >     doesn't introduce timers or blocking calls.
> >
> >     So the effect of batching should be relatively small on latency.  In
> >     fact, it's almost like sendmmsg(2)/recvmmsg(2) but using a
> >     one-packet-at-a-time interface.
> >
> >     Does this sound right?
> >
> >     Stefan
> >
> >
> > Totally correct.
> >
> > In reply to Michael:
> >    - what you say is right with netmap used as a backend with typical TCP
> > applications in the guests, and we already have an implementation that supports
> > those offloads
> >
> >    - however, consider that the main use of netmap is fast packet processing in
> > middleboxes, where packet aggregation is not always possible. Applications that
> > use netmap **in the guest** typically use "packet batching" (i.e. send multiple
> > packets with one system call), so batches originate in the guest. Without the
> > MORE flag, those batches are split at the frontend-backend interface. This is
> > just a different workload.
> >
> >
> > Regards,
> > --
> > Vincenzo Maffione
>
> Considering that you have measured a performance regression under
> netperf, I don't understand why we keep arguing
> about theory. Increasing latency is a problem, and if it can already be
> seen with netperf it will only get worse with real-life workloads.
>
> So my advice is: start by merging offload support for netmap, then check
> whether this optimization adds enough performance to be worth it. If it
> does, it needs more heuristics to avoid hurting latency.
>
> --
> MST
>



-- 
Vincenzo Maffione

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 14:02   ` Michael S. Tsirkin
  2013-12-09 14:10     ` Luigi Rizzo
@ 2013-12-10  8:53     ` Stefan Hajnoczi
  1 sibling, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-12-10  8:53 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: peter.maydell, Stefan Hajnoczi, jasowang, mjt, qemu-devel,
	Vincenzo Maffione, lcapitulino, peter.crosthwaite, owasserm,
	kraxel, yan, edgar.iglesias, akong, quintela, agraf, aliguori,
	marcel.a, sw, g.lettieri, rizzo, dmitry, mark.langsdorf,
	pbonzini, afaerber

On Mon, Dec 09, 2013 at 04:02:09PM +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 09, 2013 at 01:36:54PM +0100, Stefan Hajnoczi wrote:
> > On Fri, Dec 06, 2013 at 03:44:33PM +0100, Vincenzo Maffione wrote:
> > >     - This patch is against the net-next tree (https://github.com/stefanha/qemu.git)
> > >       because the first netmap patch is not in the qemu master (AFAIK).
> > 
> > You are right.  I am sending a pull request now to get those patches
> > into qemu.git/master.
> 
> This only arrived over the weekend and affects all
> net devices. What's the rush?
> Why not give people a chance to review and discuss
> properly?

I'm not merging this patch series yet.

I'm just draining the net-next queue which was reviewed and built up
during QEMU 1.7 hard-freeze:
https://github.com/stefanha/qemu/commits/net-next

Stefan

* Re: [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced
  2013-12-09 13:55             ` Michael S. Tsirkin
@ 2013-12-10  9:16               ` Stefan Hajnoczi
  0 siblings, 0 replies; 17+ messages in thread
From: Stefan Hajnoczi @ 2013-12-10  9:16 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Peter Maydell, Stefan Hajnoczi, Jason Wang, mjt, qemu-devel,
	Vincenzo Maffione, lcapitulino, peter.crosthwaite, owasserm,
	Gerd Hoffmann, Yan Vugenfirer, Edgar E. Iglesias, akong,
	quintela, Alexander Graf, aliguori, marcel.a, sw,
	Giuseppe Lettieri, Luigi Rizzo, Dmitry Fleytman, mark.langsdorf,
	Paolo Bonzini, Andreas Färber

On Mon, Dec 09, 2013 at 03:55:00PM +0200, Michael S. Tsirkin wrote:
> On Mon, Dec 09, 2013 at 01:42:30PM +0100, Stefan Hajnoczi wrote:
> > So the effect of batching should be relatively small on latency.  In
> > fact, it's almost like sendmmsg(2)/recvmmsg(2) but using a
> > one-packet-at-a-time interface.
> > 
> > Does this sound right?
> > 
> > Stefan
> 
> Why would it be small?  Consider a queue of 256 packets.
> You are sending out a single short packet, followed
> by a burst of 255 larger packets.
> The single packet is not transmitted until QEMU completes
> processing the 255 larger ones.

Seems like my intuition is wrong.  I figured QEMU processing those 255
packets is quick.  If zero-copy is possible then we just need to put the
address/length into a ring before we flush the packets to the host
kernel.
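
A rough sketch of the mechanism I have in mind, to make the discussion
concrete. This is not the code from the patch: PKT_FLAG_MORE stands in for
QEMU_NET_PACKET_FLAG_MORE, and the Backend struct, ring size and function
names are made up for illustration. The backend only records address/length
per packet and does the expensive flush once the sender stops setting MORE
(or the ring fills up).

```c
#include <assert.h>
#include <stddef.h>

#define PKT_FLAG_NONE 0u
#define PKT_FLAG_MORE 1u   /* stand-in for QEMU_NET_PACKET_FLAG_MORE */
#define RING_SLOTS    256  /* hypothetical ring size */

typedef struct {
    const void *addr;      /* zero-copy: only the pointer is stored */
    size_t len;
} Slot;

typedef struct {
    Slot slots[RING_SLOTS];
    size_t pending;        /* slots queued but not yet flushed */
    size_t flushes;        /* number of expensive flush operations */
    size_t sent;           /* total packets handed to the host */
} Backend;

/* Stands in for the costly step (e.g. a system call into the host kernel). */
static void backend_flush(Backend *b)
{
    if (b->pending == 0) {
        return;
    }
    b->sent += b->pending;
    b->pending = 0;
    b->flushes++;
}

/* Backend receive path: queue the packet, flush when the hint says so. */
void backend_receive(Backend *b, const void *buf, size_t len, unsigned flags)
{
    assert(b->pending < RING_SLOTS);
    b->slots[b->pending].addr = buf;
    b->slots[b->pending].len = len;
    b->pending++;

    /* No MORE hint, or no room left: push the whole batch out now. */
    if (!(flags & PKT_FLAG_MORE) || b->pending == RING_SLOTS) {
        backend_flush(b);
    }
}

/* Frontend transmit path: MORE on every packet of the burst but the last. */
void frontend_send_burst(Backend *b, const void *const *bufs,
                         const size_t *lens, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        unsigned flags = (i + 1 < n) ? PKT_FLAG_MORE : PKT_FLAG_NONE;
        backend_receive(b, bufs[i], lens[i], flags);
    }
}
```

Note that in this sketch the first packet of a burst stays pending until the
last one has been queued, which is exactly the latency cost Michael points
out above.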

The netperf results do show a regression so we need to understand that
better.

Stefan

Thread overview: 17+ messages
2013-12-06 14:44 [Qemu-devel] [PATCH] net: QEMU_NET_PACKET_FLAG_MORE introduced Vincenzo Maffione
2013-12-06 16:39 ` Stefan Weil
2013-12-08 12:11 ` Michael S. Tsirkin
2013-12-09 10:20   ` Vincenzo Maffione
2013-12-09 10:30     ` Michael S. Tsirkin
2013-12-09 10:55       ` Vincenzo Maffione
2013-12-09 11:14         ` Michael S. Tsirkin
2013-12-09 12:42           ` Stefan Hajnoczi
2013-12-09 13:25             ` Vincenzo Maffione
2013-12-09 14:00               ` Michael S. Tsirkin
2013-12-09 16:04                 ` Vincenzo Maffione
2013-12-09 13:55             ` Michael S. Tsirkin
2013-12-10  9:16               ` Stefan Hajnoczi
2013-12-09 12:36 ` Stefan Hajnoczi
2013-12-09 14:02   ` Michael S. Tsirkin
2013-12-09 14:10     ` Luigi Rizzo
2013-12-10  8:53     ` Stefan Hajnoczi
