qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PULL 0/7] Net patches
@ 2020-07-15 13:53 Jason Wang
  2020-07-15 13:53 ` [PULL 1/7] virtio-net: fix removal of failover device Jason Wang
                   ` (8 more replies)
  0 siblings, 9 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-15 13:53 UTC (permalink / raw)
  To: peter.maydell; +Cc: Jason Wang, qemu-devel

The following changes since commit 673205379fb499d2b72f2985b47ec7114282f5fe:

  Merge remote-tracking branch 'remotes/philmd-gitlab/tags/python-next-20200714' into staging (2020-07-15 13:04:27 +0100)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to a134321ef676723768973537bb9b49365ae2062e:

  ftgmac100: fix dblac write test (2020-07-15 21:00:13 +0800)

----------------------------------------------------------------

----------------------------------------------------------------
Andrew (1):
      hw/net: Added CSO for IPv6

Daniel P. Berrangé (1):
      net: detect errors from probing vnet hdr flag for TAP devices

Juan Quintela (1):
      virtio-net: fix removal of failover device

Laurent Vivier (1):
      net: check if the file descriptor is valid before using it

Zhang Chen (2):
      net/colo-compare.c: Expose compare "max_queue_size" to users
      qemu-options.hx: Clean up and fix typo for colo-compare

erik-smit (1):
      ftgmac100: fix dblac write test

 hw/net/ftgmac100.c     | 14 +++++++------
 hw/net/net_tx_pkt.c    | 15 ++++++++++---
 hw/net/virtio-net.c    |  1 +
 include/qemu/sockets.h |  1 +
 net/colo-compare.c     | 43 ++++++++++++++++++++++++++++++++++++-
 net/socket.c           |  9 ++++++--
 net/tap-bsd.c          |  2 +-
 net/tap-linux.c        |  8 ++++---
 net/tap-solaris.c      |  2 +-
 net/tap-stub.c         |  2 +-
 net/tap.c              | 50 +++++++++++++++++++++++++++++++++++--------
 net/tap_int.h          |  2 +-
 qemu-options.hx        | 33 +++++++++++++++--------------
 util/oslib-posix.c     | 26 ++++++++++++++++-------
 util/oslib-win32.c     | 57 ++++++++++++++++++++++++++++----------------------
 15 files changed, 188 insertions(+), 77 deletions(-)



^ permalink raw reply	[flat|nested] 11+ messages in thread

* [PULL 1/7] virtio-net: fix removal of failover device
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
@ 2020-07-15 13:53 ` Jason Wang
  2020-07-15 13:53 ` [PULL 2/7] hw/net: Added CSO for IPv6 Jason Wang
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-15 13:53 UTC (permalink / raw)
  To: peter.maydell; +Cc: Jason Wang, Michael S. Tsirkin, qemu-devel, Juan Quintela

From: Juan Quintela <quintela@redhat.com>

If you have a networking device and its virtio failover device, and
you remove them in this order:
- virtio device
- the real device

You get qemu crash.
See bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1820120

Bug exist on qemu 4.2 and 5.0.
But in 5.0 don't shows because commit
77b06bba62034a87cc61a9c8de1309ae3e527d97

somehow papers over it.

CC: Jason Wang <jasowang@redhat.com>
CC: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Juan Quintela <quintela@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/virtio-net.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 10cc958..4895af1 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -3416,6 +3416,7 @@ static void virtio_net_device_unrealize(DeviceState *dev)
     g_free(n->vlans);
 
     if (n->failover) {
+        device_listener_unregister(&n->primary_listener);
         g_free(n->primary_device_id);
         g_free(n->standby_id);
         qobject_unref(n->primary_device_dict);
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 2/7] hw/net: Added CSO for IPv6
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
  2020-07-15 13:53 ` [PULL 1/7] virtio-net: fix removal of failover device Jason Wang
@ 2020-07-15 13:53 ` Jason Wang
  2020-07-15 13:53 ` [PULL 3/7] net/colo-compare.c: Expose compare "max_queue_size" to users Jason Wang
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-15 13:53 UTC (permalink / raw)
  To: peter.maydell; +Cc: Jason Wang, Andrew, qemu-devel

From: Andrew <andrew@daynix.com>

Added fix for checksum offload for IPv6 if a backend doesn't
have a virtual header.
This patch is a part of IPv6 fragmentation.

Signed-off-by: Andrew Melnychenko <andrew@daynix.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/net_tx_pkt.c | 15 ++++++++++++---
 1 file changed, 12 insertions(+), 3 deletions(-)

diff --git a/hw/net/net_tx_pkt.c b/hw/net/net_tx_pkt.c
index 162f802..331c73c 100644
--- a/hw/net/net_tx_pkt.c
+++ b/hw/net/net_tx_pkt.c
@@ -468,8 +468,8 @@ static void net_tx_pkt_do_sw_csum(struct NetTxPkt *pkt)
     /* num of iovec without vhdr */
     uint32_t iov_len = pkt->payload_frags + NET_TX_PKT_PL_START_FRAG - 1;
     uint16_t csl;
-    struct ip_header *iphdr;
     size_t csum_offset = pkt->virt_hdr.csum_start + pkt->virt_hdr.csum_offset;
+    uint16_t l3_proto = eth_get_l3_proto(iov, 1, iov->iov_len);
 
     /* Put zero to checksum field */
     iov_from_buf(iov, iov_len, csum_offset, &csum, sizeof csum);
@@ -477,9 +477,18 @@ static void net_tx_pkt_do_sw_csum(struct NetTxPkt *pkt)
     /* Calculate L4 TCP/UDP checksum */
     csl = pkt->payload_len;
 
+    csum_cntr = 0;
+    cso = 0;
     /* add pseudo header to csum */
-    iphdr = pkt->vec[NET_TX_PKT_L3HDR_FRAG].iov_base;
-    csum_cntr = eth_calc_ip4_pseudo_hdr_csum(iphdr, csl, &cso);
+    if (l3_proto == ETH_P_IP) {
+        csum_cntr = eth_calc_ip4_pseudo_hdr_csum(
+                pkt->vec[NET_TX_PKT_L3HDR_FRAG].iov_base,
+                csl, &cso);
+    } else if (l3_proto == ETH_P_IPV6) {
+        csum_cntr = eth_calc_ip6_pseudo_hdr_csum(
+                pkt->vec[NET_TX_PKT_L3HDR_FRAG].iov_base,
+                csl, pkt->l4proto, &cso);
+    }
 
     /* data checksum */
     csum_cntr +=
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 3/7] net/colo-compare.c: Expose compare "max_queue_size" to users
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
  2020-07-15 13:53 ` [PULL 1/7] virtio-net: fix removal of failover device Jason Wang
  2020-07-15 13:53 ` [PULL 2/7] hw/net: Added CSO for IPv6 Jason Wang
@ 2020-07-15 13:53 ` Jason Wang
  2020-07-15 13:53 ` [PULL 4/7] qemu-options.hx: Clean up and fix typo for colo-compare Jason Wang
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-15 13:53 UTC (permalink / raw)
  To: peter.maydell; +Cc: Zhang Chen, Jason Wang, qemu-devel

From: Zhang Chen <chen.zhang@intel.com>

This patch allow users to set the "max_queue_size" according
to their environment.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/colo-compare.c | 43 ++++++++++++++++++++++++++++++++++++++++++-
 qemu-options.hx    |  5 +++--
 2 files changed, 45 insertions(+), 3 deletions(-)

diff --git a/net/colo-compare.c b/net/colo-compare.c
index 398b754..cc15f23 100644
--- a/net/colo-compare.c
+++ b/net/colo-compare.c
@@ -59,6 +59,7 @@ static bool colo_compare_active;
 static QemuMutex event_mtx;
 static QemuCond event_complete_cond;
 static int event_unhandled_count;
+static uint32_t max_queue_size;
 
 /*
  *  + CompareState ++
@@ -222,7 +223,7 @@ static void fill_pkt_tcp_info(void *data, uint32_t *max_ack)
  */
 static int colo_insert_packet(GQueue *queue, Packet *pkt, uint32_t *max_ack)
 {
-    if (g_queue_get_length(queue) <= MAX_QUEUE_SIZE) {
+    if (g_queue_get_length(queue) <= max_queue_size) {
         if (pkt->ip->ip_p == IPPROTO_TCP) {
             fill_pkt_tcp_info(pkt, max_ack);
             g_queue_insert_sorted(queue,
@@ -1134,6 +1135,37 @@ static void compare_set_expired_scan_cycle(Object *obj, Visitor *v,
     s->expired_scan_cycle = value;
 }
 
+static void get_max_queue_size(Object *obj, Visitor *v,
+                               const char *name, void *opaque,
+                               Error **errp)
+{
+    uint32_t value = max_queue_size;
+
+    visit_type_uint32(v, name, &value, errp);
+}
+
+static void set_max_queue_size(Object *obj, Visitor *v,
+                               const char *name, void *opaque,
+                               Error **errp)
+{
+    Error *local_err = NULL;
+    uint32_t value;
+
+    visit_type_uint32(v, name, &value, &local_err);
+    if (local_err) {
+        goto out;
+    }
+    if (!value) {
+        error_setg(&local_err, "Property '%s.%s' requires a positive value",
+                   object_get_typename(obj), name);
+        goto out;
+    }
+    max_queue_size = value;
+
+out:
+    error_propagate(errp, local_err);
+}
+
 static void compare_pri_rs_finalize(SocketReadState *pri_rs)
 {
     CompareState *s = container_of(pri_rs, CompareState, pri_rs);
@@ -1251,6 +1283,11 @@ static void colo_compare_complete(UserCreatable *uc, Error **errp)
         s->expired_scan_cycle = REGULAR_PACKET_CHECK_MS;
     }
 
+    if (!max_queue_size) {
+        /* Set default queue size to 1024 */
+        max_queue_size = MAX_QUEUE_SIZE;
+    }
+
     if (find_and_check_chardev(&chr, s->pri_indev, errp) ||
         !qemu_chr_fe_init(&s->chr_pri_in, chr, errp)) {
         return;
@@ -1370,6 +1407,10 @@ static void colo_compare_init(Object *obj)
                         compare_get_expired_scan_cycle,
                         compare_set_expired_scan_cycle, NULL, NULL);
 
+    object_property_add(obj, "max_queue_size", "uint32",
+                        get_max_queue_size,
+                        set_max_queue_size, NULL, NULL);
+
     s->vnet_hdr = false;
     object_property_add_bool(obj, "vnet_hdr_support", compare_get_vnet_hdr,
                              compare_set_vnet_hdr);
diff --git a/qemu-options.hx b/qemu-options.hx
index d2c1e95..310885c 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4695,7 +4695,7 @@ SRST
         stored. The file format is libpcap, so it can be analyzed with
         tools such as tcpdump or Wireshark.
 
-    ``-object colo-compare,id=id,primary_in=chardevid,secondary_in=chardevid,outdev=chardevid,iothread=id[,vnet_hdr_support][,notify_dev=id][,compare_timeout=@var{ms}][,expired_scan_cycle=@var{ms}``
+    ``-object colo-compare,id=id,primary_in=chardevid,secondary_in=chardevid,outdev=chardevid,iothread=id[,vnet_hdr_support][,notify_dev=id][,compare_timeout=@var{ms}][,expired_scan_cycle=@var{ms}][,max_queue_size=@var{size}]``
         Colo-compare gets packet from primary\_inchardevid and
         secondary\_inchardevid, than compare primary packet with
         secondary packet. If the packets are same, we will output
@@ -4707,7 +4707,8 @@ SRST
         vnet\_hdr\_len. Then compare\_timeout=@var{ms} determines the
         maximum delay colo-compare wait for the packet.
         The expired\_scan\_cycle=@var{ms} to set the period of scanning
-        expired primary node network packets.
+        expired primary node network packets. The max\_queue\_size=@var{size}
+        is to set the max compare queue size depend on user environment.
         If you want to use Xen COLO, will need the notify\_dev to
         notify Xen colo-frame to do checkpoint.
 
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 4/7] qemu-options.hx: Clean up and fix typo for colo-compare
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
                   ` (2 preceding siblings ...)
  2020-07-15 13:53 ` [PULL 3/7] net/colo-compare.c: Expose compare "max_queue_size" to users Jason Wang
@ 2020-07-15 13:53 ` Jason Wang
  2020-07-15 13:53 ` [PULL 5/7] net: check if the file descriptor is valid before using it Jason Wang
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-15 13:53 UTC (permalink / raw)
  To: peter.maydell; +Cc: Zhang Chen, Jason Wang, qemu-devel

From: Zhang Chen <chen.zhang@intel.com>

Fix some typo and optimized some descriptions.

Signed-off-by: Zhang Chen <chen.zhang@intel.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 qemu-options.hx | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/qemu-options.hx b/qemu-options.hx
index 310885c..65147ad 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4696,24 +4696,24 @@ SRST
         tools such as tcpdump or Wireshark.
 
     ``-object colo-compare,id=id,primary_in=chardevid,secondary_in=chardevid,outdev=chardevid,iothread=id[,vnet_hdr_support][,notify_dev=id][,compare_timeout=@var{ms}][,expired_scan_cycle=@var{ms}][,max_queue_size=@var{size}]``
-        Colo-compare gets packet from primary\_inchardevid and
-        secondary\_inchardevid, than compare primary packet with
-        secondary packet. If the packets are same, we will output
-        primary packet to outdevchardevid, else we will notify
-        colo-frame do checkpoint and send primary packet to
-        outdevchardevid. In order to improve efficiency, we need to put
-        the task of comparison in another thread. If it has the
-        vnet\_hdr\_support flag, colo compare will send/recv packet with
-        vnet\_hdr\_len. Then compare\_timeout=@var{ms} determines the
-        maximum delay colo-compare wait for the packet.
-        The expired\_scan\_cycle=@var{ms} to set the period of scanning
-        expired primary node network packets. The max\_queue\_size=@var{size}
-        is to set the max compare queue size depend on user environment.
-        If you want to use Xen COLO, will need the notify\_dev to
+        Colo-compare gets packet from primary\_in chardevid and
+        secondary\_in, then compare whether the payload of primary packet
+        and secondary packet are the same. If same, it will output
+        primary packet to out\_dev, else it will notify COLO-framework to do
+        checkpoint and send primary packet to out\_dev. In order to
+        improve efficiency, we need to put the task of comparison in
+        another iothread. If it has the vnet\_hdr\_support flag,
+        colo compare will send/recv packet with vnet\_hdr\_len.
+        The compare\_timeout=@var{ms} determines the maximum time of the
+        colo-compare hold the packet. The expired\_scan\_cycle=@var{ms}
+        is to set the period of scanning expired primary node network packets.
+        The max\_queue\_size=@var{size} is to set the max compare queue
+        size depend on user environment.
+        If user want to use Xen COLO, need to add the notify\_dev to
         notify Xen colo-frame to do checkpoint.
 
-        we must use it with the help of filter-mirror and
-        filter-redirector.
+        COLO-compare must be used with the help of filter-mirror,
+        filter-redirector and filter-rewriter.
 
         ::
 
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 5/7] net: check if the file descriptor is valid before using it
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
                   ` (3 preceding siblings ...)
  2020-07-15 13:53 ` [PULL 4/7] qemu-options.hx: Clean up and fix typo for colo-compare Jason Wang
@ 2020-07-15 13:53 ` Jason Wang
  2020-07-15 13:53 ` [PULL 6/7] net: detect errors from probing vnet hdr flag for TAP devices Jason Wang
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-15 13:53 UTC (permalink / raw)
  To: peter.maydell; +Cc: Laurent Vivier, Jason Wang, qemu-devel

From: Laurent Vivier <lvivier@redhat.com>

qemu_set_nonblock() checks that the file descriptor can be used and, if
not, crashes QEMU. An assert() is used for that. The use of assert() is
used to detect programming error and the coredump will allow to debug
the problem.

But in the case of the tap device, this assert() can be triggered by
a misconfiguration by the user. At startup, it's not a real problem, but it
can also happen during the hot-plug of a new device, and here it's a
problem because we can crash a perfectly healthy system.

For instance:
 # ip link add link virbr0 name macvtap0 type macvtap mode bridge
 # ip link set macvtap0 up
 # TAP=/dev/tap$(ip -o link show macvtap0 | cut -d: -f1)
 # qemu-system-x86_64 -machine q35 -device pcie-root-port,id=pcie-root-port-0 -monitor stdio 9<> $TAP
 (qemu) netdev_add type=tap,id=hostnet0,vhost=on,fd=9
 (qemu) device_add driver=virtio-net-pci,netdev=hostnet0,id=net0,bus=pcie-root-port-0
 (qemu) device_del net0
 (qemu) netdev_del hostnet0
 (qemu) netdev_add type=tap,id=hostnet1,vhost=on,fd=9
 qemu-system-x86_64: .../util/oslib-posix.c:247: qemu_set_nonblock: Assertion `f != -1' failed.
 Aborted (core dumped)

To avoid that, add a function, qemu_try_set_nonblock(), that allows to report the
problem without crashing.

In the same way, we also update the function for vhostfd in net_init_tap_one() and
for fd in net_init_socket() (both descriptors are provided by the user and can
be wrong).

Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 include/qemu/sockets.h |  1 +
 net/socket.c           |  9 ++++++--
 net/tap.c              | 25 ++++++++++++++++++----
 util/oslib-posix.c     | 26 ++++++++++++++++-------
 util/oslib-win32.c     | 57 ++++++++++++++++++++++++++++----------------------
 5 files changed, 79 insertions(+), 39 deletions(-)

diff --git a/include/qemu/sockets.h b/include/qemu/sockets.h
index 57cd049..7d1f813 100644
--- a/include/qemu/sockets.h
+++ b/include/qemu/sockets.h
@@ -18,6 +18,7 @@ int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 int socket_set_cork(int fd, int v);
 int socket_set_nodelay(int fd);
 void qemu_set_block(int fd);
+int qemu_try_set_nonblock(int fd);
 void qemu_set_nonblock(int fd);
 int socket_set_fast_reuse(int fd);
 
diff --git a/net/socket.c b/net/socket.c
index c923540..2d21fdd 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -725,13 +725,18 @@ int net_init_socket(const Netdev *netdev, const char *name,
     }
 
     if (sock->has_fd) {
-        int fd;
+        int fd, ret;
 
         fd = monitor_fd_param(cur_mon, sock->fd, errp);
         if (fd == -1) {
             return -1;
         }
-        qemu_set_nonblock(fd);
+        ret = qemu_try_set_nonblock(fd);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "%s: Can't use file descriptor %d",
+                             name, fd);
+            return -1;
+        }
         if (!net_socket_fd_init(peer, "socket", name, fd, 1, sock->mcast,
                                 errp)) {
             return -1;
diff --git a/net/tap.c b/net/tap.c
index f9dcc2e..32e4813 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -690,6 +690,8 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
         }
 
         if (vhostfdname) {
+            int ret;
+
             vhostfd = monitor_fd_param(cur_mon, vhostfdname, &err);
             if (vhostfd == -1) {
                 if (tap->has_vhostforce && tap->vhostforce) {
@@ -699,7 +701,12 @@ static void net_init_tap_one(const NetdevTapOptions *tap, NetClientState *peer,
                 }
                 return;
             }
-            qemu_set_nonblock(vhostfd);
+            ret = qemu_try_set_nonblock(vhostfd);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "%s: Can't use file descriptor %d",
+                                 name, fd);
+                return;
+            }
         } else {
             vhostfd = open("/dev/vhost-net", O_RDWR);
             if (vhostfd < 0) {
@@ -767,6 +774,7 @@ int net_init_tap(const Netdev *netdev, const char *name,
     Error *err = NULL;
     const char *vhostfdname;
     char ifname[128];
+    int ret = 0;
 
     assert(netdev->type == NET_CLIENT_DRIVER_TAP);
     tap = &netdev->u.tap;
@@ -795,7 +803,12 @@ int net_init_tap(const Netdev *netdev, const char *name,
             return -1;
         }
 
-        qemu_set_nonblock(fd);
+        ret = qemu_try_set_nonblock(fd);
+        if (ret < 0) {
+            error_setg_errno(errp, -ret, "%s: Can't use file descriptor %d",
+                             name, fd);
+            return -1;
+        }
 
         vnet_hdr = tap_probe_vnet_hdr(fd);
 
@@ -810,7 +823,6 @@ int net_init_tap(const Netdev *netdev, const char *name,
         char **fds;
         char **vhost_fds;
         int nfds = 0, nvhosts = 0;
-        int ret = 0;
 
         if (tap->has_ifname || tap->has_script || tap->has_downscript ||
             tap->has_vnet_hdr || tap->has_helper || tap->has_queues ||
@@ -842,7 +854,12 @@ int net_init_tap(const Netdev *netdev, const char *name,
                 goto free_fail;
             }
 
-            qemu_set_nonblock(fd);
+            ret = qemu_try_set_nonblock(fd);
+            if (ret < 0) {
+                error_setg_errno(errp, -ret, "%s: Can't use file descriptor %d",
+                                 name, fd);
+                goto free_fail;
+            }
 
             if (i == 0) {
                 vnet_hdr = tap_probe_vnet_hdr(fd);
diff --git a/util/oslib-posix.c b/util/oslib-posix.c
index e60aea8..36bf859 100644
--- a/util/oslib-posix.c
+++ b/util/oslib-posix.c
@@ -260,25 +260,35 @@ void qemu_set_block(int fd)
     assert(f != -1);
 }
 
-void qemu_set_nonblock(int fd)
+int qemu_try_set_nonblock(int fd)
 {
     int f;
     f = fcntl(fd, F_GETFL);
-    assert(f != -1);
-    f = fcntl(fd, F_SETFL, f | O_NONBLOCK);
-#ifdef __OpenBSD__
     if (f == -1) {
+        return -errno;
+    }
+    if (fcntl(fd, F_SETFL, f | O_NONBLOCK) == -1) {
+#ifdef __OpenBSD__
         /*
          * Previous to OpenBSD 6.3, fcntl(F_SETFL) is not permitted on
          * memory devices and sets errno to ENODEV.
          * It's OK if we fail to set O_NONBLOCK on devices like /dev/null,
          * because they will never block anyway.
          */
-        assert(errno == ENODEV);
-    }
-#else
-    assert(f != -1);
+        if (errno == ENODEV) {
+            return 0;
+        }
 #endif
+        return -errno;
+    }
+    return 0;
+}
+
+void qemu_set_nonblock(int fd)
+{
+    int f;
+    f = qemu_try_set_nonblock(fd);
+    assert(f == 0);
 }
 
 int socket_set_fast_reuse(int fd)
diff --git a/util/oslib-win32.c b/util/oslib-win32.c
index 3b49d27..7eedbe5 100644
--- a/util/oslib-win32.c
+++ b/util/oslib-win32.c
@@ -132,31 +132,6 @@ struct tm *localtime_r(const time_t *timep, struct tm *result)
 }
 #endif /* CONFIG_LOCALTIME_R */
 
-void qemu_set_block(int fd)
-{
-    unsigned long opt = 0;
-    WSAEventSelect(fd, NULL, 0);
-    ioctlsocket(fd, FIONBIO, &opt);
-}
-
-void qemu_set_nonblock(int fd)
-{
-    unsigned long opt = 1;
-    ioctlsocket(fd, FIONBIO, &opt);
-    qemu_fd_register(fd);
-}
-
-int socket_set_fast_reuse(int fd)
-{
-    /* Enabling the reuse of an endpoint that was used by a socket still in
-     * TIME_WAIT state is usually performed by setting SO_REUSEADDR. On Windows
-     * fast reuse is the default and SO_REUSEADDR does strange things. So we
-     * don't have to do anything here. More info can be found at:
-     * http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx */
-    return 0;
-}
-
-
 static int socket_error(void)
 {
     switch (WSAGetLastError()) {
@@ -233,6 +208,38 @@ static int socket_error(void)
     }
 }
 
+void qemu_set_block(int fd)
+{
+    unsigned long opt = 0;
+    WSAEventSelect(fd, NULL, 0);
+    ioctlsocket(fd, FIONBIO, &opt);
+}
+
+int qemu_try_set_nonblock(int fd)
+{
+    unsigned long opt = 1;
+    if (ioctlsocket(fd, FIONBIO, &opt) != NO_ERROR) {
+        return -socket_error();
+    }
+    qemu_fd_register(fd);
+    return 0;
+}
+
+void qemu_set_nonblock(int fd)
+{
+    (void)qemu_try_set_nonblock(fd);
+}
+
+int socket_set_fast_reuse(int fd)
+{
+    /* Enabling the reuse of an endpoint that was used by a socket still in
+     * TIME_WAIT state is usually performed by setting SO_REUSEADDR. On Windows
+     * fast reuse is the default and SO_REUSEADDR does strange things. So we
+     * don't have to do anything here. More info can be found at:
+     * http://msdn.microsoft.com/en-us/library/windows/desktop/ms740621.aspx */
+    return 0;
+}
+
 int inet_aton(const char *cp, struct in_addr *ia)
 {
     uint32_t addr = inet_addr(cp);
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 6/7] net: detect errors from probing vnet hdr flag for TAP devices
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
                   ` (4 preceding siblings ...)
  2020-07-15 13:53 ` [PULL 5/7] net: check if the file descriptor is valid before using it Jason Wang
@ 2020-07-15 13:53 ` Jason Wang
  2020-07-15 13:53 ` [PULL 7/7] ftgmac100: fix dblac write test Jason Wang
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-15 13:53 UTC (permalink / raw)
  To: peter.maydell; +Cc: Laurent Vivier, Jason Wang, Daniel P. Berrange, qemu-devel

From: "Daniel P. Berrange" <berrange@redhat.com>

When QEMU sets up a tap based network device backend, it mostly ignores errors
reported from various ioctl() calls it makes, assuming the TAP file descriptor
is valid. This assumption can easily be violated when the user is passing in a
pre-opened file descriptor. At best, the ioctls may fail with a -EBADF, but if
the user passes in a bogus FD number that happens to clash with a FD number that
QEMU has opened internally for another reason, a wide variety of errnos may
result, as the TUNGETIFF ioctl number may map to a completely different command
on a different type of file.

By ignoring all these errors, QEMU sets up a zombie network backend that will
never pass any data. Even worse, when QEMU shuts down, or that network backend
is hot-removed, it will close this bogus file descriptor, which could belong to
another QEMU device backend.

There's no obvious guaranteed reliable way to detect that a FD genuinely is a
TAP device, as opposed to a UNIX socket, or pipe, or something else. Checking
the errno from probing vnet hdr flag though, does catch the big common cases.
ie calling TUNGETIFF will return EBADF for an invalid FD, and ENOTTY when FD is
a UNIX socket, or pipe which catches accidental collisions with FDs used for
stdio, or monitor socket.

Previously the example below where bogus fd 9 collides with the FD used for the
chardev saw:

$ ./x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hostnet0,fd=9 \
  -chardev socket,id=charchannel0,path=/tmp/qga,server,nowait \
  -monitor stdio -vnc :0
qemu-system-x86_64: -netdev tap,id=hostnet0,fd=9: TUNGETIFF ioctl() failed: Inappropriate ioctl for device
TUNSETOFFLOAD ioctl() failed: Bad address
QEMU 2.9.1 monitor - type 'help' for more information
(qemu) Warning: netdev hostnet0 has no peer

which gives a running QEMU with a zombie network backend.

With this change applied we get an error message and QEMU immediately exits
before carrying on and making a bigger disaster:

$ ./x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hostnet0,fd=9 \
  -chardev socket,id=charchannel0,path=/tmp/qga,server,nowait \
  -monitor stdio -vnc :0
qemu-system-x86_64: -netdev tap,id=hostnet0,vhost=on,fd=9: Unable to query TUNGETIFF on FD 9: Inappropriate ioctl for device

Reported-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Tested-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-id: 20171027085548.3472-1-berrange@redhat.com
[lv: to simplify, don't check on EINVAL with TUNGETIFF as it exists since v2.6.27]
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 net/tap-bsd.c     |  2 +-
 net/tap-linux.c   |  8 +++++---
 net/tap-solaris.c |  2 +-
 net/tap-stub.c    |  2 +-
 net/tap.c         | 25 ++++++++++++++++++++-----
 net/tap_int.h     |  2 +-
 6 files changed, 29 insertions(+), 12 deletions(-)

diff --git a/net/tap-bsd.c b/net/tap-bsd.c
index a5c3707..77aaf67 100644
--- a/net/tap-bsd.c
+++ b/net/tap-bsd.c
@@ -211,7 +211,7 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
 {
 }
 
-int tap_probe_vnet_hdr(int fd)
+int tap_probe_vnet_hdr(int fd, Error **errp)
 {
     return 0;
 }
diff --git a/net/tap-linux.c b/net/tap-linux.c
index e0dd442..b0635e9 100644
--- a/net/tap-linux.c
+++ b/net/tap-linux.c
@@ -147,13 +147,15 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
     }
 }
 
-int tap_probe_vnet_hdr(int fd)
+int tap_probe_vnet_hdr(int fd, Error **errp)
 {
     struct ifreq ifr;
 
     if (ioctl(fd, TUNGETIFF, &ifr) != 0) {
-        error_report("TUNGETIFF ioctl() failed: %s", strerror(errno));
-        return 0;
+        /* TUNGETIFF is available since kernel v2.6.27 */
+        error_setg_errno(errp, errno,
+                         "Unable to query TUNGETIFF on FD %d", fd);
+        return -1;
     }
 
     return ifr.ifr_flags & IFF_VNET_HDR;
diff --git a/net/tap-solaris.c b/net/tap-solaris.c
index d03165c..0475a58 100644
--- a/net/tap-solaris.c
+++ b/net/tap-solaris.c
@@ -207,7 +207,7 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
 {
 }
 
-int tap_probe_vnet_hdr(int fd)
+int tap_probe_vnet_hdr(int fd, Error **errp)
 {
     return 0;
 }
diff --git a/net/tap-stub.c b/net/tap-stub.c
index a9ab8f8..de525a2 100644
--- a/net/tap-stub.c
+++ b/net/tap-stub.c
@@ -37,7 +37,7 @@ void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp)
 {
 }
 
-int tap_probe_vnet_hdr(int fd)
+int tap_probe_vnet_hdr(int fd, Error **errp)
 {
     return 0;
 }
diff --git a/net/tap.c b/net/tap.c
index 32e4813..14dc904 100644
--- a/net/tap.c
+++ b/net/tap.c
@@ -598,7 +598,11 @@ int net_init_bridge(const Netdev *netdev, const char *name,
     }
 
     qemu_set_nonblock(fd);
-    vnet_hdr = tap_probe_vnet_hdr(fd);
+    vnet_hdr = tap_probe_vnet_hdr(fd, errp);
+    if (vnet_hdr < 0) {
+        close(fd);
+        return -1;
+    }
     s = net_tap_fd_init(peer, "bridge", name, fd, vnet_hdr);
 
     snprintf(s->nc.info_str, sizeof(s->nc.info_str), "helper=%s,br=%s", helper,
@@ -810,7 +814,11 @@ int net_init_tap(const Netdev *netdev, const char *name,
             return -1;
         }
 
-        vnet_hdr = tap_probe_vnet_hdr(fd);
+        vnet_hdr = tap_probe_vnet_hdr(fd, errp);
+        if (vnet_hdr < 0) {
+            close(fd);
+            return -1;
+        }
 
         net_init_tap_one(tap, peer, "tap", name, NULL,
                          script, downscript,
@@ -862,8 +870,11 @@ int net_init_tap(const Netdev *netdev, const char *name,
             }
 
             if (i == 0) {
-                vnet_hdr = tap_probe_vnet_hdr(fd);
-            } else if (vnet_hdr != tap_probe_vnet_hdr(fd)) {
+                vnet_hdr = tap_probe_vnet_hdr(fd, errp);
+                if (vnet_hdr < 0) {
+                    goto free_fail;
+                }
+            } else if (vnet_hdr != tap_probe_vnet_hdr(fd, NULL)) {
                 error_setg(errp,
                            "vnet_hdr not consistent across given tap fds");
                 ret = -1;
@@ -908,7 +919,11 @@ free_fail:
         }
 
         qemu_set_nonblock(fd);
-        vnet_hdr = tap_probe_vnet_hdr(fd);
+        vnet_hdr = tap_probe_vnet_hdr(fd, errp);
+        if (vnet_hdr < 0) {
+            close(fd);
+            return -1;
+        }
 
         net_init_tap_one(tap, peer, "bridge", name, ifname,
                          script, downscript, vhostfdname,
diff --git a/net/tap_int.h b/net/tap_int.h
index e3194b2..225a49e 100644
--- a/net/tap_int.h
+++ b/net/tap_int.h
@@ -34,7 +34,7 @@ int tap_open(char *ifname, int ifname_size, int *vnet_hdr,
 ssize_t tap_read_packet(int tapfd, uint8_t *buf, int maxlen);
 
 void tap_set_sndbuf(int fd, const NetdevTapOptions *tap, Error **errp);
-int tap_probe_vnet_hdr(int fd);
+int tap_probe_vnet_hdr(int fd, Error **errp);
 int tap_probe_vnet_hdr_len(int fd, int len);
 int tap_probe_has_ufo(int fd);
 void tap_fd_set_offload(int fd, int csum, int tso4, int tso6, int ecn, int ufo);
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* [PULL 7/7] ftgmac100: fix dblac write test
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
                   ` (5 preceding siblings ...)
  2020-07-15 13:53 ` [PULL 6/7] net: detect errors from probing vnet hdr flag for TAP devices Jason Wang
@ 2020-07-15 13:53 ` Jason Wang
  2020-07-15 18:06 ` [PULL 0/7] Net patches Peter Maydell
  2020-07-16 13:46 ` Peter Maydell
  8 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-15 13:53 UTC (permalink / raw)
  To: peter.maydell; +Cc: Jason Wang, erik-smit, qemu-devel

From: erik-smit <erik.lucas.smit@gmail.com>

The test of the write of the dblac register was testing the old value
instead of the new value. This would accept the write of an invalid value
but subsequently refuse any following valid writes.

Signed-off-by: erik-smit <erik.lucas.smit@gmail.com>
Reviewed-by: Cédric Le Goater <clg@kaod.org>
Signed-off-by: Jason Wang <jasowang@redhat.com>
---
 hw/net/ftgmac100.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/hw/net/ftgmac100.c b/hw/net/ftgmac100.c
index 043ba61..5f4b26f 100644
--- a/hw/net/ftgmac100.c
+++ b/hw/net/ftgmac100.c
@@ -810,16 +810,18 @@ static void ftgmac100_write(void *opaque, hwaddr addr,
         s->phydata = value & 0xffff;
         break;
     case FTGMAC100_DBLAC: /* DMA Burst Length and Arbitration Control */
-        if (FTGMAC100_DBLAC_TXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+        if (FTGMAC100_DBLAC_TXDES_SIZE(value) < sizeof(FTGMAC100Desc)) {
             qemu_log_mask(LOG_GUEST_ERROR,
-                          "%s: transmit descriptor too small : %d bytes\n",
-                          __func__, FTGMAC100_DBLAC_TXDES_SIZE(s->dblac));
+                          "%s: transmit descriptor too small: %" PRIx64
+                          " bytes\n", __func__,
+                          FTGMAC100_DBLAC_TXDES_SIZE(value));
             break;
         }
-        if (FTGMAC100_DBLAC_RXDES_SIZE(s->dblac) < sizeof(FTGMAC100Desc)) {
+        if (FTGMAC100_DBLAC_RXDES_SIZE(value) < sizeof(FTGMAC100Desc)) {
             qemu_log_mask(LOG_GUEST_ERROR,
-                          "%s: receive descriptor too small : %d bytes\n",
-                          __func__, FTGMAC100_DBLAC_RXDES_SIZE(s->dblac));
+                          "%s: receive descriptor too small : %" PRIx64
+                          " bytes\n", __func__,
+                          FTGMAC100_DBLAC_RXDES_SIZE(value));
             break;
         }
         s->dblac = value;
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PULL 0/7] Net patches
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
                   ` (6 preceding siblings ...)
  2020-07-15 13:53 ` [PULL 7/7] ftgmac100: fix dblac write test Jason Wang
@ 2020-07-15 18:06 ` Peter Maydell
  2020-07-16  3:00   ` Jason Wang
  2020-07-16 13:46 ` Peter Maydell
  8 siblings, 1 reply; 11+ messages in thread
From: Peter Maydell @ 2020-07-15 18:06 UTC (permalink / raw)
  To: Jason Wang; +Cc: QEMU Developers

On Wed, 15 Jul 2020 at 14:53, Jason Wang <jasowang@redhat.com> wrote:
>
> The following changes since commit 673205379fb499d2b72f2985b47ec7114282f5fe:
>
>   Merge remote-tracking branch 'remotes/philmd-gitlab/tags/python-next-20200714' into staging (2020-07-15 13:04:27 +0100)
>
> are available in the git repository at:
>
>   https://github.com/jasowang/qemu.git tags/net-pull-request
>
> for you to fetch changes up to a134321ef676723768973537bb9b49365ae2062e:
>
>   ftgmac100: fix dblac write test (2020-07-15 21:00:13 +0800)
>
> ----------------------------------------------------------------
>
> ----------------------------------------------------------------

just a note that this has missed rc0 but will go into rc1.

thanks
-- PMM


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PULL 0/7] Net patches
  2020-07-15 18:06 ` [PULL 0/7] Net patches Peter Maydell
@ 2020-07-16  3:00   ` Jason Wang
  0 siblings, 0 replies; 11+ messages in thread
From: Jason Wang @ 2020-07-16  3:00 UTC (permalink / raw)
  To: Peter Maydell; +Cc: QEMU Developers


On 2020/7/16 上午2:06, Peter Maydell wrote:
> On Wed, 15 Jul 2020 at 14:53, Jason Wang <jasowang@redhat.com> wrote:
>> The following changes since commit 673205379fb499d2b72f2985b47ec7114282f5fe:
>>
>>    Merge remote-tracking branch 'remotes/philmd-gitlab/tags/python-next-20200714' into staging (2020-07-15 13:04:27 +0100)
>>
>> are available in the git repository at:
>>
>>    https://github.com/jasowang/qemu.git tags/net-pull-request
>>
>> for you to fetch changes up to a134321ef676723768973537bb9b49365ae2062e:
>>
>>    ftgmac100: fix dblac write test (2020-07-15 21:00:13 +0800)
>>
>> ----------------------------------------------------------------
>>
>> ----------------------------------------------------------------
> just a note that this has missed rc0 but will go into rc1.


Yes, I think it's fine.

Thanks


>
> thanks
> -- PMM
>



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PULL 0/7] Net patches
  2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
                   ` (7 preceding siblings ...)
  2020-07-15 18:06 ` [PULL 0/7] Net patches Peter Maydell
@ 2020-07-16 13:46 ` Peter Maydell
  8 siblings, 0 replies; 11+ messages in thread
From: Peter Maydell @ 2020-07-16 13:46 UTC (permalink / raw)
  To: Jason Wang; +Cc: QEMU Developers

On Wed, 15 Jul 2020 at 14:53, Jason Wang <jasowang@redhat.com> wrote:
>
> The following changes since commit 673205379fb499d2b72f2985b47ec7114282f5fe:
>
>   Merge remote-tracking branch 'remotes/philmd-gitlab/tags/python-next-20200714' into staging (2020-07-15 13:04:27 +0100)
>
> are available in the git repository at:
>
>   https://github.com/jasowang/qemu.git tags/net-pull-request
>
> for you to fetch changes up to a134321ef676723768973537bb9b49365ae2062e:
>
>   ftgmac100: fix dblac write test (2020-07-15 21:00:13 +0800)
>
> ----------------------------------------------------------------
>
> ----------------------------------------------------------------



Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/5.1
for any user-visible changes.

-- PMM


^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2020-07-16 13:47 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-07-15 13:53 [PULL 0/7] Net patches Jason Wang
2020-07-15 13:53 ` [PULL 1/7] virtio-net: fix removal of failover device Jason Wang
2020-07-15 13:53 ` [PULL 2/7] hw/net: Added CSO for IPv6 Jason Wang
2020-07-15 13:53 ` [PULL 3/7] net/colo-compare.c: Expose compare "max_queue_size" to users Jason Wang
2020-07-15 13:53 ` [PULL 4/7] qemu-options.hx: Clean up and fix typo for colo-compare Jason Wang
2020-07-15 13:53 ` [PULL 5/7] net: check if the file descriptor is valid before using it Jason Wang
2020-07-15 13:53 ` [PULL 6/7] net: detect errors from probing vnet hdr flag for TAP devices Jason Wang
2020-07-15 13:53 ` [PULL 7/7] ftgmac100: fix dblac write test Jason Wang
2020-07-15 18:06 ` [PULL 0/7] Net patches Peter Maydell
2020-07-16  3:00   ` Jason Wang
2020-07-16 13:46 ` Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).