[Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control

All of lore.kernel.org
 help / color / mirror / Atom feed

* [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control
@ 2010-12-01  9:54 Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 1/7] virtio-console: Factor out common init between console and generic ports Amit Shah
                   ` (6 more replies)
  0 siblings, 7 replies; 23+ messages in thread
From: Amit Shah @ 2010-12-01  9:54 UTC (permalink / raw)
  To: qemu list; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

Hello,

This refreshed series adds support for non-blocking chardev writes if
the caller asks for it.

This series does away with the special qemu_chr_write_nb() call that
did nonblocking writes.  Instead, now the writes are nonblocking if
the caller registers a callback function with the chardev that
indicates a driver became writable.

Individual drivers too have to be modified a bit to update their fd
handlers for nonblocking writes to work.

This series has non-blocking support for the unix and tcp drivers.
The virtio-console code is tweaked to use this facility.

This helps a fast guest slow down in case of a slow host reader.  The
worst-case condition was a guest writing data to a chardev that wasn't
being read from resulting in the guest getting unresponsive.

Amit Shah (7):
  virtio-console: Factor out common init between console and generic
    ports
  char: Add a QemuChrHandlers struct to initialise chardev handlers
  char: Introduce char_set/remove_fd_handlers()
  char: Add framework for a 'write unblocked' callback
  char: Update send_all() to handle nonblocking chardev write requests
  char: Equip the unix/tcp backend to handle nonblocking writes
  virtio-console: Enable port throttling when chardev is slow to
    consume data

 gdbstub.c            |    9 ++-
 hw/debugcon.c        |    2 +-
 hw/escc.c            |    9 ++-
 hw/etraxfs_ser.c     |   13 +++-
 hw/ivshmem.c         |   28 ++++++--
 hw/mcf_uart.c        |    9 ++-
 hw/pl011.c           |    9 ++-
 hw/pxa2xx.c          |   13 +++-
 hw/serial.c          |    9 ++-
 hw/sh_serial.c       |   12 +++-
 hw/syborg_serial.c   |    9 ++-
 hw/usb-serial.c      |    9 ++-
 hw/virtio-console.c  |   53 +++++++++-----
 hw/xen_console.c     |   16 +++--
 hw/xilinx_uartlite.c |   11 +++-
 monitor.c            |   19 ++++-
 net/slirp.c          |    8 ++-
 net/socket.c         |    4 +-
 qemu-char.c          |  190 ++++++++++++++++++++++++++++++++++++++++----------
 qemu-char.h          |   17 ++++-
 qemu_socket.h        |    2 +-
 21 files changed, 343 insertions(+), 108 deletions(-)

-- 
1.7.3.2

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH v8 1/7] virtio-console: Factor out common init between console and generic ports
  2010-12-01  9:54 [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control Amit Shah
@ 2010-12-01  9:54 ` Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 2/7] char: Add a QemuChrHandlers struct to initialise chardev handlers Amit Shah
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Amit Shah @ 2010-12-01  9:54 UTC (permalink / raw)
  To: qemu list; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

The initialisation for generic ports and console ports is similar.
Factor out the parts that are the same in a different function that can
be called from each of the initfns.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 hw/virtio-console.c |   31 ++++++++++++++-----------------
 1 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index caea11f..d7fe68b 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -58,24 +58,28 @@ static void chr_event(void *opaque, int event)
     }
 }
 
-/* Virtio Console Ports */
-static int virtconsole_initfn(VirtIOSerialDevice *dev)
+static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev)
 {
-    VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev);
-    VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
-
-    port->info = dev->info;
-
-    port->is_console = true;
+    vcon->port.info = dev->info;
 
     if (vcon->chr) {
         qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event,
                               vcon);
-        port->info->have_data = flush_buf;
+        vcon->port.info->have_data = flush_buf;
     }
     return 0;
 }
 
+/* Virtio Console Ports */
+static int virtconsole_initfn(VirtIOSerialDevice *dev)
+{
+    VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev);
+    VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
+
+    port->is_console = true;
+    return generic_port_init(vcon, dev);
+}
+
 static int virtconsole_exitfn(VirtIOSerialDevice *dev)
 {
     VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev);
@@ -115,14 +119,7 @@ static int virtserialport_initfn(VirtIOSerialDevice *dev)
     VirtIOSerialPort *port = DO_UPCAST(VirtIOSerialPort, dev, &dev->qdev);
     VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
 
-    port->info = dev->info;
-
-    if (vcon->chr) {
-        qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event,
-                              vcon);
-        port->info->have_data = flush_buf;
-    }
-    return 0;
+    return generic_port_init(vcon, dev);
 }
 
 static VirtIOSerialPortInfo virtserialport_info = {
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH v8 2/7] char: Add a QemuChrHandlers struct to initialise chardev handlers
  2010-12-01  9:54 [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 1/7] virtio-console: Factor out common init between console and generic ports Amit Shah
@ 2010-12-01  9:54 ` Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 3/7] char: Introduce char_set/remove_fd_handlers() Amit Shah
                   ` (4 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Amit Shah @ 2010-12-01  9:54 UTC (permalink / raw)
  To: qemu list; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

Instead of passing each handler in the qemu_add_handlers() function,
create a struct of handlers that can be passed to the function instead.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 gdbstub.c            |    9 +++++++--
 hw/debugcon.c        |    2 +-
 hw/escc.c            |    9 +++++++--
 hw/etraxfs_ser.c     |   13 +++++++++----
 hw/ivshmem.c         |   28 ++++++++++++++++++++++------
 hw/mcf_uart.c        |    9 +++++++--
 hw/pl011.c           |    9 +++++++--
 hw/pxa2xx.c          |   13 +++++++++----
 hw/serial.c          |    9 +++++++--
 hw/sh_serial.c       |   12 +++++++++---
 hw/syborg_serial.c   |    9 +++++++--
 hw/usb-serial.c      |    9 +++++++--
 hw/virtio-console.c  |    9 +++++++--
 hw/xen_console.c     |   16 +++++++++++-----
 hw/xilinx_uartlite.c |   11 +++++++++--
 monitor.c            |   19 +++++++++++++++----
 net/slirp.c          |    8 ++++++--
 qemu-char.c          |   27 ++++++++++++++++++---------
 qemu-char.h          |   12 ++++++++----
 19 files changed, 173 insertions(+), 60 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index 0aa081b..b5ba2ef 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -2627,6 +2627,12 @@ static void gdb_sigterm_handler(int signal)
 }
 #endif
 
+static QemuChrHandlers gdb_handlers = {
+    .fd_can_read = gdb_chr_can_receive,
+    .fd_read = gdb_chr_receive,
+    .fd_event = gdb_chr_event,
+};
+
 int gdbserver_start(const char *device)
 {
     GDBState *s;
@@ -2656,8 +2662,7 @@ int gdbserver_start(const char *device)
         if (!chr)
             return -1;
 
-        qemu_chr_add_handlers(chr, gdb_chr_can_receive, gdb_chr_receive,
-                              gdb_chr_event, NULL);
+        qemu_chr_add_handlers(chr, &gdb_handlers, NULL);
     }
 
     s = gdbserver_state;
diff --git a/hw/debugcon.c b/hw/debugcon.c
index 5ee6821..e79a595 100644
--- a/hw/debugcon.c
+++ b/hw/debugcon.c
@@ -73,7 +73,7 @@ static void debugcon_init_core(DebugconState *s)
         exit(1);
     }
 
-    qemu_chr_add_handlers(s->chr, NULL, NULL, NULL, s);
+    qemu_chr_add_handlers(s->chr, NULL, s);
 }
 
 static int debugcon_isa_initfn(ISADevice *dev)
diff --git a/hw/escc.c b/hw/escc.c
index 8714239..0cfc2e8 100644
--- a/hw/escc.c
+++ b/hw/escc.c
@@ -894,6 +894,12 @@ void slavio_serial_ms_kbd_init(target_phys_addr_t base, qemu_irq irq,
     sysbus_mmio_map(s, 0, base);
 }
 
+static QemuChrHandlers serial_handlers = {
+    .fd_can_read = serial_can_receive,
+    .fd_read = serial_receive1,
+    .fd_event = serial_event,
+};
+
 static int escc_init1(SysBusDevice *dev)
 {
     SerialState *s = FROM_SYSBUS(SerialState, dev);
@@ -907,8 +913,7 @@ static int escc_init1(SysBusDevice *dev)
         s->chn[i].chn = 1 - i;
         s->chn[i].clock = s->frequency / 2;
         if (s->chn[i].chr) {
-            qemu_chr_add_handlers(s->chn[i].chr, serial_can_receive,
-                                  serial_receive1, serial_event, &s->chn[i]);
+            qemu_chr_add_handlers(s->chn[i].chr, &serial_handlers, &s->chn[i]);
         }
     }
     s->chn[0].otherchn = &s->chn[1];
diff --git a/hw/etraxfs_ser.c b/hw/etraxfs_ser.c
index 336cc54..f38280f 100644
--- a/hw/etraxfs_ser.c
+++ b/hw/etraxfs_ser.c
@@ -190,6 +190,12 @@ static void serial_event(void *opaque, int event)
 
 }
 
+static QemuChrHandlers serial_handlers = {
+    .fd_can_read = serial_can_receive,
+    .fd_read = serial_receive,
+    .fd_event = serial_event,
+};
+
 static int etraxfs_ser_init(SysBusDevice *dev)
 {
     struct etrax_serial *s = FROM_SYSBUS(typeof (*s), dev);
@@ -203,10 +209,9 @@ static int etraxfs_ser_init(SysBusDevice *dev)
     ser_regs = cpu_register_io_memory(ser_read, ser_write, s);
     sysbus_init_mmio(dev, R_MAX * 4, ser_regs);
     s->chr = qdev_init_chardev(&dev->qdev);
-    if (s->chr)
-        qemu_chr_add_handlers(s->chr,
-                      serial_can_receive, serial_receive,
-                      serial_event, s);
+    if (s->chr) {
+        qemu_chr_add_handlers(s->chr, &serial_handlers, s);
+    }
     return 0;
 }
 
diff --git a/hw/ivshmem.c b/hw/ivshmem.c
index 06dce70..408ca2a 100644
--- a/hw/ivshmem.c
+++ b/hw/ivshmem.c
@@ -312,6 +312,18 @@ static void fake_irqfd(void *opaque, const uint8_t *buf, int size) {
     msix_notify(pdev, entry->vector);
 }
 
+static QemuChrHandlers ivshmem_msi_handlers = {
+    .fd_can_read = ivshmem_can_receive,
+    .fd_read = fake_irqfd,
+    .fd_event = ivshmem_event,
+};
+
+static QemuChrHandlers ivshmem_nomsi_handlers = {
+    .fd_can_read = ivshmem_can_receive,
+    .fd_read = ivshmem_receive,
+    .fd_event = ivshmem_event,
+};
+
 static CharDriverState* create_eventfd_chr_device(void * opaque, int eventfd,
                                                                     int vector)
 {
@@ -331,11 +343,10 @@ static CharDriverState* create_eventfd_chr_device(void * opaque, int eventfd,
         s->eventfd_table[vector].pdev = &s->dev;
         s->eventfd_table[vector].vector = vector;
 
-        qemu_chr_add_handlers(chr, ivshmem_can_receive, fake_irqfd,
-                      ivshmem_event, &s->eventfd_table[vector]);
+        qemu_chr_add_handlers(chr, &ivshmem_msi_handlers,
+                              &s->eventfd_table[vector]);
     } else {
-        qemu_chr_add_handlers(chr, ivshmem_can_receive, ivshmem_receive,
-                      ivshmem_event, s);
+        qemu_chr_add_handlers(chr, &ivshmem_nomsi_handlers, s);
     }
 
     return chr;
@@ -666,6 +677,12 @@ static int ivshmem_load(QEMUFile* f, void *opaque, int version_id)
     return 0;
 }
 
+static QemuChrHandlers ivshmem_handlers = {
+    .fd_can_read = ivshmem_can_receive,
+    .fd_read = ivshmem_read,
+    .fd_event = ivshmem_event,
+};
+
 static int pci_ivshmem_init(PCIDevice *dev)
 {
     IVShmemState *s = DO_UPCAST(IVShmemState, dev, dev);
@@ -754,8 +771,7 @@ static int pci_ivshmem_init(PCIDevice *dev)
 
         s->eventfd_chr = qemu_mallocz(s->vectors * sizeof(CharDriverState *));
 
-        qemu_chr_add_handlers(s->server_chr, ivshmem_can_receive, ivshmem_read,
-                     ivshmem_event, s);
+        qemu_chr_add_handlers(s->server_chr, &ivshmem_handlers, s);
     } else {
         /* just map the file immediately, we're not using a server */
         int fd;
diff --git a/hw/mcf_uart.c b/hw/mcf_uart.c
index d16bac7..301b901 100644
--- a/hw/mcf_uart.c
+++ b/hw/mcf_uart.c
@@ -268,6 +268,12 @@ static void mcf_uart_receive(void *opaque, const uint8_t *buf, int size)
     mcf_uart_push_byte(s, buf[0]);
 }
 
+static QemuChrHandlers mcf_uart_handlers = {
+    .fd_can_read = mcf_uart_can_receive,
+    .fd_read = mcf_uart_receive,
+    .fd_event = mcf_uart_event,
+};
+
 void *mcf_uart_init(qemu_irq irq, CharDriverState *chr)
 {
     mcf_uart_state *s;
@@ -276,8 +282,7 @@ void *mcf_uart_init(qemu_irq irq, CharDriverState *chr)
     s->chr = chr;
     s->irq = irq;
     if (chr) {
-        qemu_chr_add_handlers(chr, mcf_uart_can_receive, mcf_uart_receive,
-                              mcf_uart_event, s);
+        qemu_chr_add_handlers(chr, &mcf_uart_handlers, s);
     }
     mcf_uart_reset(s);
     return s;
diff --git a/hw/pl011.c b/hw/pl011.c
index 02cf84a..9b8e0dc 100644
--- a/hw/pl011.c
+++ b/hw/pl011.c
@@ -286,6 +286,12 @@ static int pl011_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
+static QemuChrHandlers pl011_handlers = {
+    .fd_can_read = pl011_can_receive,
+    .fd_read = pl011_receive,
+    .fd_event = pl011_event,
+};
+
 static int pl011_init(SysBusDevice *dev, const unsigned char *id)
 {
     int iomemtype;
@@ -303,8 +309,7 @@ static int pl011_init(SysBusDevice *dev, const unsigned char *id)
     s->cr = 0x300;
     s->flags = 0x90;
     if (s->chr) {
-        qemu_chr_add_handlers(s->chr, pl011_can_receive, pl011_receive,
-                              pl011_event, s);
+        qemu_chr_add_handlers(s->chr, &pl011_handlers, s);
     }
     register_savevm(&dev->qdev, "pl011_uart", -1, 1, pl011_save, pl011_load, s);
     return 0;
diff --git a/hw/pxa2xx.c b/hw/pxa2xx.c
index 6e04645..6a5242a 100644
--- a/hw/pxa2xx.c
+++ b/hw/pxa2xx.c
@@ -1994,6 +1994,12 @@ static int pxa2xx_fir_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
+static QemuChrHandlers pxa2xx_handlers = {
+    .fd_can_read = pxa2xx_fir_is_empty,
+    .fd_read = pxa2xx_fir_rx,
+    .fd_event = pxa2xx_fir_event,
+};
+
 static PXA2xxFIrState *pxa2xx_fir_init(target_phys_addr_t base,
                 qemu_irq irq, PXA2xxDMAState *dma,
                 CharDriverState *chr)
@@ -2012,10 +2018,9 @@ static PXA2xxFIrState *pxa2xx_fir_init(target_phys_addr_t base,
                     pxa2xx_fir_writefn, s);
     cpu_register_physical_memory(base, 0x1000, iomemtype);
 
-    if (chr)
-        qemu_chr_add_handlers(chr, pxa2xx_fir_is_empty,
-                        pxa2xx_fir_rx, pxa2xx_fir_event, s);
-
+    if (chr) {
+        qemu_chr_add_handlers(chr, &pxa2xx_handlers, s);
+    }
     register_savevm(NULL, "pxa2xx_fir", 0, 0, pxa2xx_fir_save,
                     pxa2xx_fir_load, s);
 
diff --git a/hw/serial.c b/hw/serial.c
index 9ebc452..611921b 100644
--- a/hw/serial.c
+++ b/hw/serial.c
@@ -727,6 +727,12 @@ static void serial_reset(void *opaque)
     qemu_irq_lower(s->irq);
 }
 
+static QemuChrHandlers serial_handlers = {
+    .fd_can_read = serial_can_receive1,
+    .fd_read = serial_receive1,
+    .fd_event = serial_event,
+};
+
 static void serial_init_core(SerialState *s)
 {
     if (!s->chr) {
@@ -741,8 +747,7 @@ static void serial_init_core(SerialState *s)
 
     qemu_register_reset(serial_reset, s);
 
-    qemu_chr_add_handlers(s->chr, serial_can_receive1, serial_receive1,
-                          serial_event, s);
+    qemu_chr_add_handlers(s->chr, &serial_handlers, s);
 }
 
 /* Change the main reference oscillator frequency. */
diff --git a/hw/sh_serial.c b/hw/sh_serial.c
index 93dc144..ee3f8c8 100644
--- a/hw/sh_serial.c
+++ b/hw/sh_serial.c
@@ -363,6 +363,12 @@ static CPUWriteMemoryFunc * const sh_serial_writefn[] = {
     &sh_serial_write,
 };
 
+static QemuChrHandlers sh_serial_handlers = {
+    .fd_can_read = sh_serial_can_receive1,
+    .fd_read = sh_serial_receive1,
+    .fd_event = sh_serial_event,
+};
+
 void sh_serial_init (target_phys_addr_t base, int feat,
 		     uint32_t freq, CharDriverState *chr,
 		     qemu_irq eri_source,
@@ -401,9 +407,9 @@ void sh_serial_init (target_phys_addr_t base, int feat,
 
     s->chr = chr;
 
-    if (chr)
-        qemu_chr_add_handlers(chr, sh_serial_can_receive1, sh_serial_receive1,
-			      sh_serial_event, s);
+    if (chr) {
+        qemu_chr_add_handlers(chr, &sh_serial_handlers, s);
+    }
 
     s->eri = eri_source;
     s->rxi = rxi_source;
diff --git a/hw/syborg_serial.c b/hw/syborg_serial.c
index 8c42956..42312b2 100644
--- a/hw/syborg_serial.c
+++ b/hw/syborg_serial.c
@@ -315,6 +315,12 @@ static int syborg_serial_load(QEMUFile *f, void *opaque, int version_id)
     return 0;
 }
 
+static QemuChrHandlers syborg_serial_handlers = {
+    .fd_can_read = syborg_serial_can_receive,
+    .fd_read = syborg_serial_receive,
+    .fd_event = syborg_serial_event,
+};
+
 static int syborg_serial_init(SysBusDevice *dev)
 {
     SyborgSerialState *s = FROM_SYSBUS(SyborgSerialState, dev);
@@ -326,8 +332,7 @@ static int syborg_serial_init(SysBusDevice *dev)
     sysbus_init_mmio(dev, 0x1000, iomemtype);
     s->chr = qdev_init_chardev(&dev->qdev);
     if (s->chr) {
-        qemu_chr_add_handlers(s->chr, syborg_serial_can_receive,
-                              syborg_serial_receive, syborg_serial_event, s);
+        qemu_chr_add_handlers(s->chr, &syborg_serial_handlers, s);
     }
     if (s->fifo_size <= 0) {
         fprintf(stderr, "syborg_serial: fifo too small\n");
diff --git a/hw/usb-serial.c b/hw/usb-serial.c
index c19580f..697cb89 100644
--- a/hw/usb-serial.c
+++ b/hw/usb-serial.c
@@ -540,6 +540,12 @@ static void usb_serial_event(void *opaque, int event)
     }
 }
 
+static QemuChrHandlers usb_serial_handlers = {
+    .fd_can_read = usb_serial_can_read,
+    .fd_read = usb_serial_read,
+    .fd_event = usb_serial_event,
+};
+
 static int usb_serial_initfn(USBDevice *dev)
 {
     USBSerialState *s = DO_UPCAST(USBSerialState, dev, dev);
@@ -550,8 +556,7 @@ static int usb_serial_initfn(USBDevice *dev)
         return -1;
     }
 
-    qemu_chr_add_handlers(s->cs, usb_serial_can_read, usb_serial_read,
-                          usb_serial_event, s);
+    qemu_chr_add_handlers(s->cs, &usb_serial_handlers, s);
     usb_serial_handle_reset(dev);
     return 0;
 }
diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index d7fe68b..749ed59 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -58,13 +58,18 @@ static void chr_event(void *opaque, int event)
     }
 }
 
+static QemuChrHandlers chr_handlers = {
+    .fd_can_read = chr_can_read,
+    .fd_read = chr_read,
+    .fd_event = chr_event,
+};
+
 static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev)
 {
     vcon->port.info = dev->info;
 
     if (vcon->chr) {
-        qemu_chr_add_handlers(vcon->chr, chr_can_read, chr_read, chr_event,
-                              vcon);
+        qemu_chr_add_handlers(vcon->chr, &chr_handlers, vcon);
         vcon->port.info->have_data = flush_buf;
     }
     return 0;
diff --git a/hw/xen_console.c b/hw/xen_console.c
index d2261f4..f490a5a 100644
--- a/hw/xen_console.c
+++ b/hw/xen_console.c
@@ -202,6 +202,11 @@ static int con_init(struct XenDevice *xendev)
     return 0;
 }
 
+static QemuChrHandlers xencons_handlers = {
+    .fd_can_read = xencons_can_receive,
+    .fd_read = xencons_receive,
+};
+
 static int con_connect(struct XenDevice *xendev)
 {
     struct XenConsole *con = container_of(xendev, struct XenConsole, xendev);
@@ -222,9 +227,9 @@ static int con_connect(struct XenDevice *xendev)
 	return -1;
 
     xen_be_bind_evtchn(&con->xendev);
-    if (con->chr)
-        qemu_chr_add_handlers(con->chr, xencons_can_receive, xencons_receive,
-                              NULL, con);
+    if (con->chr) {
+        qemu_chr_add_handlers(con->chr, &xencons_handlers, con);
+    }
 
     xen_be_printf(xendev, 1, "ring mfn %d, remote port %d, local port %d, limit %zd\n",
 		  con->ring_ref,
@@ -238,8 +243,9 @@ static void con_disconnect(struct XenDevice *xendev)
 {
     struct XenConsole *con = container_of(xendev, struct XenConsole, xendev);
 
-    if (con->chr)
-        qemu_chr_add_handlers(con->chr, NULL, NULL, NULL, NULL);
+    if (con->chr) {
+        qemu_chr_add_handlers(con->chr, NULL, NULL);
+    }
     xen_be_unbind_evtchn(&con->xendev);
 
     if (con->sring) {
diff --git a/hw/xilinx_uartlite.c b/hw/xilinx_uartlite.c
index adab759..376216e 100644
--- a/hw/xilinx_uartlite.c
+++ b/hw/xilinx_uartlite.c
@@ -193,6 +193,12 @@ static void uart_event(void *opaque, int event)
 
 }
 
+static QemuChrHandlers uart_handlers = {
+    .fd_can_read = uart_can_rx,
+    .fd_read = uart_rx,
+    .fd_event = uart_event,
+};
+
 static int xilinx_uartlite_init(SysBusDevice *dev)
 {
     struct xlx_uartlite *s = FROM_SYSBUS(typeof (*s), dev);
@@ -205,8 +211,9 @@ static int xilinx_uartlite_init(SysBusDevice *dev)
     sysbus_init_mmio(dev, R_MAX * 4, uart_regs);
 
     s->chr = qdev_init_chardev(&dev->qdev);
-    if (s->chr)
-        qemu_chr_add_handlers(s->chr, uart_can_rx, uart_rx, uart_event, s);
+    if (s->chr) {
+        qemu_chr_add_handlers(s->chr, &uart_handlers, s);
+    }
     return 0;
 }
 
diff --git a/monitor.c b/monitor.c
index 8cee35d..40c0723 100644
--- a/monitor.c
+++ b/monitor.c
@@ -4662,6 +4662,18 @@ static void monitor_event(void *opaque, int event)
  * End:
  */
 
+static QemuChrHandlers monitor_handlers = {
+    .fd_can_read = monitor_can_read,
+    .fd_read = monitor_read,
+    .fd_event = monitor_event,
+};
+
+static QemuChrHandlers monitor_control_handlers = {
+    .fd_can_read = monitor_can_read,
+    .fd_read = monitor_control_read,
+    .fd_event = monitor_control_event,
+};
+
 void monitor_init(CharDriverState *chr, int flags)
 {
     static int is_first_init = 1;
@@ -4683,12 +4695,11 @@ void monitor_init(CharDriverState *chr, int flags)
 
     if (monitor_ctrl_mode(mon)) {
         mon->mc = qemu_mallocz(sizeof(MonitorControl));
+
         /* Control mode requires special handlers */
-        qemu_chr_add_handlers(chr, monitor_can_read, monitor_control_read,
-                              monitor_control_event, mon);
+        qemu_chr_add_handlers(chr, &monitor_control_handlers, mon);
     } else {
-        qemu_chr_add_handlers(chr, monitor_can_read, monitor_read,
-                              monitor_event, mon);
+        qemu_chr_add_handlers(chr, &monitor_handlers, mon);
     }
 
     QLIST_INSERT_HEAD(&mon_list, mon, entry);
diff --git a/net/slirp.c b/net/slirp.c
index b41c60a..52e9a5c 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -577,6 +577,11 @@ static void guestfwd_read(void *opaque, const uint8_t *buf, int size)
     slirp_socket_recv(fwd->slirp, fwd->server, fwd->port, buf, size);
 }
 
+static QemuChrHandlers guestfwd_handlers = {
+    .fd_can_read = guestfwd_can_read,
+    .fd_read = guestfwd_read,
+};
+
 static int slirp_guestfwd(SlirpState *s, const char *config_str,
                           int legacy_format)
 {
@@ -633,8 +638,7 @@ static int slirp_guestfwd(SlirpState *s, const char *config_str,
     fwd->port = port;
     fwd->slirp = s->slirp;
 
-    qemu_chr_add_handlers(fwd->hd, guestfwd_can_read, guestfwd_read,
-                          NULL, fwd);
+    qemu_chr_add_handlers(fwd->hd, &guestfwd_handlers, fwd);
     return 0;
 
  fail_syntax:
diff --git a/qemu-char.c b/qemu-char.c
index 88997f9..d77530d 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -190,15 +190,19 @@ void qemu_chr_send_event(CharDriverState *s, int event)
         s->chr_send_event(s, event);
 }
 
+static QemuChrHandlers null_handlers = {
+    /* All handlers are initialised to NULL */
+};
+
 void qemu_chr_add_handlers(CharDriverState *s,
-                           IOCanReadHandler *fd_can_read,
-                           IOReadHandler *fd_read,
-                           IOEventHandler *fd_event,
-                           void *opaque)
-{
-    s->chr_can_read = fd_can_read;
-    s->chr_read = fd_read;
-    s->chr_event = fd_event;
+                           QemuChrHandlers *handlers, void *opaque)
+{
+    if (!handlers) {
+        handlers = &null_handlers;
+    }
+    s->chr_can_read = handlers->fd_can_read;
+    s->chr_read = handlers->fd_read;
+    s->chr_event = handlers->fd_event;
     s->handler_opaque = opaque;
     if (s->chr_update_read_handler)
         s->chr_update_read_handler(s);
@@ -436,6 +440,12 @@ static void mux_chr_event(void *opaque, int event)
         mux_chr_send_event(d, i, event);
 }
 
+static QemuChrHandlers mux_chr_handlers = {
+    .fd_can_read = mux_chr_can_read,
+    .fd_read = mux_chr_read,
+    .fd_event = mux_chr_event,
+};
+
 static void mux_chr_update_read_handler(CharDriverState *chr)
 {
     MuxDriver *d = chr->opaque;
@@ -450,8 +460,7 @@ static void mux_chr_update_read_handler(CharDriverState *chr)
     d->chr_event[d->mux_cnt] = chr->chr_event;
     /* Fix up the real driver with mux routines */
     if (d->mux_cnt == 0) {
-        qemu_chr_add_handlers(d->drv, mux_chr_can_read, mux_chr_read,
-                              mux_chr_event, chr);
+        qemu_chr_add_handlers(d->drv, &mux_chr_handlers, chr);
     }
     if (d->focus != -1) {
         mux_chr_send_event(d, d->focus, CHR_EVENT_MUX_OUT);
diff --git a/qemu-char.h b/qemu-char.h
index 18ad12b..62d395e 100644
--- a/qemu-char.h
+++ b/qemu-char.h
@@ -71,6 +71,13 @@ struct CharDriverState {
     QTAILQ_ENTRY(CharDriverState) next;
 };
 
+typedef struct QemuChrHandlers {
+    IOCanReadHandler *fd_can_read;
+    IOReadHandler *fd_read;
+    IOHandler *fd_write_unblocked;
+    IOEventHandler *fd_event;
+} QemuChrHandlers;
+
 QemuOpts *qemu_chr_parse_compat(const char *label, const char *filename);
 CharDriverState *qemu_chr_open_opts(QemuOpts *opts,
                                     void (*init)(struct CharDriverState *s));
@@ -80,10 +87,7 @@ void qemu_chr_printf(CharDriverState *s, const char *fmt, ...)
     GCC_FMT_ATTR(2, 3);
 int qemu_chr_write(CharDriverState *s, const uint8_t *buf, int len);
 void qemu_chr_send_event(CharDriverState *s, int event);
-void qemu_chr_add_handlers(CharDriverState *s,
-                           IOCanReadHandler *fd_can_read,
-                           IOReadHandler *fd_read,
-                           IOEventHandler *fd_event,
+void qemu_chr_add_handlers(CharDriverState *s, QemuChrHandlers *handlers,
                            void *opaque);
 int qemu_chr_ioctl(CharDriverState *s, int cmd, void *arg);
 void qemu_chr_generic_open(CharDriverState *s);
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH v8 3/7] char: Introduce char_set/remove_fd_handlers()
  2010-12-01  9:54 [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 1/7] virtio-console: Factor out common init between console and generic ports Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 2/7] char: Add a QemuChrHandlers struct to initialise chardev handlers Amit Shah
@ 2010-12-01  9:54 ` Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 4/7] char: Add framework for a 'write unblocked' callback Amit Shah
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Amit Shah @ 2010-12-01  9:54 UTC (permalink / raw)
  To: qemu list; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

Introduce a char-specific wrapper to qemu_set_fd_handler functions.
This wrapper is useful to add / remove a write handler easily.  Write
handlers are only used when the backend is blocked and cannot receive
any more input.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 qemu-char.c |   64 ++++++++++++++++++++++++++++++++++++++--------------------
 1 files changed, 42 insertions(+), 22 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index d77530d..483a5fd 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -214,6 +214,24 @@ void qemu_chr_add_handlers(CharDriverState *s,
     }
 }
 
+static int char_set_fd_handlers(int fd,
+                                IOCanReadHandler *fd_read_poll,
+                                IOHandler *fd_read, IOHandler *fd_write,
+                                void *opaque, bool install_write_handler)
+{
+    if (install_write_handler) {
+        assert(fd_write);
+    } else {
+        fd_write = NULL;
+    }
+    return qemu_set_fd_handler2(fd, fd_read_poll, fd_read, fd_write, opaque);
+}
+
+static int char_remove_fd_handlers(int fd)
+{
+    return qemu_set_fd_handler2(fd, NULL, NULL, NULL, NULL);
+}
+
 static int null_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
     return len;
@@ -579,7 +597,7 @@ static void fd_chr_read(void *opaque)
     size = read(s->fd_in, buf, len);
     if (size == 0) {
         /* FD has been closed. Remove it from the active list.  */
-        qemu_set_fd_handler2(s->fd_in, NULL, NULL, NULL, NULL);
+        char_remove_fd_handlers(s->fd_in);
         qemu_chr_event(chr, CHR_EVENT_CLOSED);
         return;
     }
@@ -595,8 +613,8 @@ static void fd_chr_update_read_handler(CharDriverState *chr)
     if (s->fd_in >= 0) {
         if (display_type == DT_NOGRAPHIC && s->fd_in == 0) {
         } else {
-            qemu_set_fd_handler2(s->fd_in, fd_chr_read_poll,
-                                 fd_chr_read, NULL, chr);
+            char_set_fd_handlers(s->fd_in, fd_chr_read_poll, fd_chr_read,
+                                 NULL, chr, false);
         }
     }
 }
@@ -608,7 +626,7 @@ static void fd_chr_close(struct CharDriverState *chr)
     if (s->fd_in >= 0) {
         if (display_type == DT_NOGRAPHIC && s->fd_in == 0) {
         } else {
-            qemu_set_fd_handler2(s->fd_in, NULL, NULL, NULL, NULL);
+            char_remove_fd_handlers(s->fd_in);
         }
     }
 
@@ -708,7 +726,7 @@ static void stdio_read(void *opaque)
     size = read(0, buf, 1);
     if (size == 0) {
         /* stdin has been closed. Remove it from the active list.  */
-        qemu_set_fd_handler2(0, NULL, NULL, NULL, NULL);
+        char_remove_fd_handlers(0);
         qemu_chr_event(chr, CHR_EVENT_CLOSED);
         return;
     }
@@ -764,7 +782,7 @@ static void qemu_chr_close_stdio(struct CharDriverState *chr)
 {
     term_exit();
     stdio_nb_clients--;
-    qemu_set_fd_handler2(0, NULL, NULL, NULL, NULL);
+    char_remove_fd_handlers(0);
     fd_chr_close(chr);
 }
 
@@ -776,7 +794,7 @@ static CharDriverState *qemu_chr_open_stdio(QemuOpts *opts)
         return NULL;
     chr = qemu_chr_open_fd(0, 1);
     chr->chr_close = qemu_chr_close_stdio;
-    qemu_set_fd_handler2(0, stdio_read_poll, stdio_read, NULL, chr);
+    char_set_fd_handlers(0, stdio_read_poll, stdio_read, NULL, chr, false);
     stdio_nb_clients++;
     term_init(opts);
 
@@ -904,8 +922,8 @@ static void pty_chr_update_read_handler(CharDriverState *chr)
 {
     PtyCharDriver *s = chr->opaque;
 
-    qemu_set_fd_handler2(s->fd, pty_chr_read_poll,
-                         pty_chr_read, NULL, chr);
+    char_set_fd_handlers(s->fd, pty_chr_read_poll, pty_chr_read, NULL,
+                         chr, false);
     s->polling = 1;
     /*
      * Short timeout here: just need wait long enougth that qemu makes
@@ -923,7 +941,7 @@ static void pty_chr_state(CharDriverState *chr, int connected)
     PtyCharDriver *s = chr->opaque;
 
     if (!connected) {
-        qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+        char_remove_fd_handlers(s->fd);
         s->connected = 0;
         s->polling = 0;
         /* (re-)connect poll interval for idle guests: once per second.
@@ -959,7 +977,7 @@ static void pty_chr_close(struct CharDriverState *chr)
 {
     PtyCharDriver *s = chr->opaque;
 
-    qemu_set_fd_handler2(s->fd, NULL, NULL, NULL, NULL);
+    char_remove_fd_handlers(s->fd);
     close(s->fd);
     qemu_del_timer(s->timer);
     qemu_free_timer(s->timer);
@@ -1860,8 +1878,8 @@ static void udp_chr_update_read_handler(CharDriverState *chr)
     NetCharDriver *s = chr->opaque;
 
     if (s->fd >= 0) {
-        qemu_set_fd_handler2(s->fd, udp_chr_read_poll,
-                             udp_chr_read, NULL, chr);
+        char_set_fd_handlers(s->fd, udp_chr_read_poll, udp_chr_read, NULL,
+                             chr, false);
     }
 }
 
@@ -1869,7 +1887,7 @@ static void udp_chr_close(CharDriverState *chr)
 {
     NetCharDriver *s = chr->opaque;
     if (s->fd >= 0) {
-        qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
+        char_remove_fd_handlers(s->fd);
         closesocket(s->fd);
     }
     qemu_free(s);
@@ -2078,9 +2096,10 @@ static void tcp_chr_read(void *opaque)
         /* connection closed */
         s->connected = 0;
         if (s->listen_fd >= 0) {
-            qemu_set_fd_handler(s->listen_fd, tcp_chr_accept, NULL, chr);
+            char_set_fd_handlers(s->listen_fd, NULL, tcp_chr_accept, NULL,
+                                 chr, false);
         }
-        qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
+        char_remove_fd_handlers(s->fd);
         closesocket(s->fd);
         s->fd = -1;
         qemu_chr_event(chr, CHR_EVENT_CLOSED);
@@ -2105,8 +2124,8 @@ static void tcp_chr_connect(void *opaque)
     TCPCharDriver *s = chr->opaque;
 
     s->connected = 1;
-    qemu_set_fd_handler2(s->fd, tcp_chr_read_poll,
-                         tcp_chr_read, NULL, chr);
+    char_set_fd_handlers(s->fd, tcp_chr_read_poll, tcp_chr_read, NULL,
+                         chr, false);
     qemu_chr_generic_open(chr);
 }
 
@@ -2167,7 +2186,7 @@ static void tcp_chr_accept(void *opaque)
     if (s->do_nodelay)
         socket_set_nodelay(fd);
     s->fd = fd;
-    qemu_set_fd_handler(s->listen_fd, NULL, NULL, NULL);
+    char_remove_fd_handlers(s->listen_fd);
     tcp_chr_connect(chr);
 }
 
@@ -2175,11 +2194,11 @@ static void tcp_chr_close(CharDriverState *chr)
 {
     TCPCharDriver *s = chr->opaque;
     if (s->fd >= 0) {
-        qemu_set_fd_handler(s->fd, NULL, NULL, NULL);
+        char_remove_fd_handlers(s->fd);
         closesocket(s->fd);
     }
     if (s->listen_fd >= 0) {
-        qemu_set_fd_handler(s->listen_fd, NULL, NULL, NULL);
+        char_remove_fd_handlers(s->listen_fd);
         closesocket(s->listen_fd);
     }
     qemu_free(s);
@@ -2241,7 +2260,8 @@ static CharDriverState *qemu_chr_open_socket(QemuOpts *opts)
 
     if (is_listen) {
         s->listen_fd = fd;
-        qemu_set_fd_handler(s->listen_fd, tcp_chr_accept, NULL, chr);
+        char_set_fd_handlers(s->listen_fd, NULL, tcp_chr_accept, NULL,
+                             chr, false);
         if (is_telnet)
             s->do_telnetopt = 1;
 
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH v8 4/7] char: Add framework for a 'write unblocked' callback
  2010-12-01  9:54 [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control Amit Shah
                   ` (2 preceding siblings ...)
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 3/7] char: Introduce char_set/remove_fd_handlers() Amit Shah
@ 2010-12-01  9:54 ` Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 5/7] char: Update send_all() to handle nonblocking chardev write requests Amit Shah
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 23+ messages in thread
From: Amit Shah @ 2010-12-01  9:54 UTC (permalink / raw)
  To: qemu list; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

The char layer can let users know that the driver will block on further
input.  For users interested in not blocking, they can assign a function
pointer that will be called back when the driver becomes writable.  This
patch just adds the function pointers to the CharDriverState structure,
future patches will enable the nonblocking and callback functionality.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 qemu-char.c |    3 +++
 qemu-char.h |    5 +++++
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index 483a5fd..c16c2d7 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -202,11 +202,14 @@ void qemu_chr_add_handlers(CharDriverState *s,
     }
     s->chr_can_read = handlers->fd_can_read;
     s->chr_read = handlers->fd_read;
+    s->chr_write_unblocked = handlers->fd_write_unblocked;
     s->chr_event = handlers->fd_event;
     s->handler_opaque = opaque;
     if (s->chr_update_read_handler)
         s->chr_update_read_handler(s);
 
+    s->write_blocked = false;
+
     /* We're connecting to an already opened device, so let's make sure we
        also get the open event */
     if (s->opened) {
diff --git a/qemu-char.h b/qemu-char.h
index 62d395e..f9cbcc6 100644
--- a/qemu-char.h
+++ b/qemu-char.h
@@ -59,6 +59,9 @@ struct CharDriverState {
     IOEventHandler *chr_event;
     IOCanReadHandler *chr_can_read;
     IOReadHandler *chr_read;
+    IOHandler *chr_write_unblocked;
+    void (*update_fd_handlers)(struct CharDriverState *chr,
+                               bool poll_out);
     void *handler_opaque;
     void (*chr_send_event)(struct CharDriverState *chr, int event);
     void (*chr_close)(struct CharDriverState *chr);
@@ -68,6 +71,8 @@ struct CharDriverState {
     char *label;
     char *filename;
     int opened;
+    /* Are we in a blocked state? */
+    bool write_blocked;
     QTAILQ_ENTRY(CharDriverState) next;
 };
 
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH v8 5/7] char: Update send_all() to handle nonblocking chardev write requests
  2010-12-01  9:54 [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control Amit Shah
                   ` (3 preceding siblings ...)
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 4/7] char: Add framework for a 'write unblocked' callback Amit Shah
@ 2010-12-01  9:54 ` Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 6/7] char: Equip the unix/tcp backend to handle nonblocking writes Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data Amit Shah
  6 siblings, 0 replies; 23+ messages in thread
From: Amit Shah @ 2010-12-01  9:54 UTC (permalink / raw)
  To: qemu list; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

The send_all function is modified to return to the caller in case the
driver cannot handle any more data.  It returns -EAGAIN or
WSAEWOULDBLOCK on non-Windows and Windows platforms respectively.  This
is only done when the caller sets a callback function handler indicating
it's not interested in blocking till the driver has written out all the
data.

Currently there's no driver or caller that supports this.  Future
commits will add such capability.
---
 net/socket.c  |    4 +-
 qemu-char.c   |   67 +++++++++++++++++++++++++++++++++++++++++++++++++++-----
 qemu_socket.h |    2 +-
 3 files changed, 64 insertions(+), 9 deletions(-)

diff --git a/net/socket.c b/net/socket.c
index 1c4e153..902529e 100644
--- a/net/socket.c
+++ b/net/socket.c
@@ -56,8 +56,8 @@ static ssize_t net_socket_receive(VLANClientState *nc, const uint8_t *buf, size_
     uint32_t len;
     len = htonl(size);
 
-    send_all(s->fd, (const uint8_t *)&len, sizeof(len));
-    return send_all(s->fd, buf, size);
+    send_all(NULL, s->fd, (const uint8_t *)&len, sizeof(len));
+    return send_all(NULL, s->fd, buf, size);
 }
 
 static ssize_t net_socket_receive_dgram(VLANClientState *nc, const uint8_t *buf, size_t size)
diff --git a/qemu-char.c b/qemu-char.c
index c16c2d7..a24c4c2 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -514,7 +514,7 @@ static CharDriverState *qemu_chr_open_mux(CharDriverState *drv)
 
 
 #ifdef _WIN32
-int send_all(int fd, const void *buf, int len1)
+static int do_send(int fd, const void *buf, int len1, bool nonblock)
 {
     int ret, len;
 
@@ -525,6 +525,8 @@ int send_all(int fd, const void *buf, int len1)
             errno = WSAGetLastError();
             if (errno != WSAEWOULDBLOCK) {
                 return -1;
+            } else if (errno == WSAEWOULDBLOCK && nonblock) {
+                return WSAEWOULDBLOCK;
             }
         } else if (ret == 0) {
             break;
@@ -538,7 +540,7 @@ int send_all(int fd, const void *buf, int len1)
 
 #else
 
-int send_all(int fd, const void *_buf, int len1)
+static int do_send(int fd, const void *_buf, int len1, bool nonblock)
 {
     int ret, len;
     const uint8_t *buf = _buf;
@@ -547,8 +549,12 @@ int send_all(int fd, const void *_buf, int len1)
     while (len > 0) {
         ret = write(fd, buf, len);
         if (ret < 0) {
-            if (errno != EINTR && errno != EAGAIN)
+            if (errno == EAGAIN && nonblock) {
+                return -EAGAIN;
+            }
+            if (errno != EINTR && errno != EAGAIN) {
                 return -1;
+            }
         } else if (ret == 0) {
             break;
         } else {
@@ -560,6 +566,55 @@ int send_all(int fd, const void *_buf, int len1)
 }
 #endif /* !_WIN32 */
 
+int send_all(CharDriverState *chr, int fd, const void *_buf, int len1)
+{
+    int ret, eagain_errno;
+    bool nonblock;
+
+    if (chr->write_blocked) {
+        /*
+         * We don't handle this situation: the caller should not send
+         * us data while we're blocked.
+         *
+         * We could buffer this data here but that'll only encourage
+         * bad behaviour on part of the callers.
+         *
+         * Also, the data already in fd's buffers isn't easily
+         * migratable.  If we want full migration support, all the
+         * data landing here needs to be buffered and on migration,
+         * anything that's unsent needs to be transferred to the
+         * dest. machine (which again isn't a very good way of solving
+         * the problem, as the src may become writable just during
+         * migration and the reader could receive some data twice,
+         * essentially corrupting the data).
+         */
+        return -1;
+    }
+
+    nonblock = false;
+    /*
+     * Ensure the char backend is able to receive and handle the
+     * 'write unblocked' event before we turn on nonblock support.
+     */
+    if (chr && chr->update_fd_handlers && chr->chr_write_unblocked) {
+        nonblock = true;
+    }
+    ret = do_send(fd, _buf, len1, nonblock);
+
+#ifdef _WIN32
+    eagain_errno = WSAEWOULDBLOCK;
+#else
+    eagain_errno = -EAGAIN;
+#endif
+
+    if (nonblock && ret == eagain_errno) {
+        /* Update fd handler to wake up when chr becomes writable */
+        chr->update_fd_handlers(chr, true);
+        chr->write_blocked = true;
+    }
+    return ret;
+}
+
 #ifndef _WIN32
 
 typedef struct {
@@ -573,7 +628,7 @@ static int stdio_nb_clients = 0;
 static int fd_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
     FDCharDriver *s = chr->opaque;
-    return send_all(s->fd_out, buf, len);
+    return send_all(chr, s->fd_out, buf, len);
 }
 
 static int fd_chr_read_poll(void *opaque)
@@ -885,7 +940,7 @@ static int pty_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
         pty_chr_update_read_handler(chr);
         return 0;
     }
-    return send_all(s->fd, buf, len);
+    return send_all(chr, s->fd, buf, len);
 }
 
 static int pty_chr_read_poll(void *opaque)
@@ -1950,7 +2005,7 @@ static int tcp_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
     TCPCharDriver *s = chr->opaque;
     if (s->connected) {
-        return send_all(s->fd, buf, len);
+        return send_all(chr, s->fd, buf, len);
     } else {
         /* XXX: indicate an error ? */
         return len;
diff --git a/qemu_socket.h b/qemu_socket.h
index 897a8ae..97dd24a 100644
--- a/qemu_socket.h
+++ b/qemu_socket.h
@@ -36,7 +36,7 @@ int inet_aton(const char *cp, struct in_addr *ia);
 int qemu_socket(int domain, int type, int protocol);
 int qemu_accept(int s, struct sockaddr *addr, socklen_t *addrlen);
 void socket_set_nonblock(int fd);
-int send_all(int fd, const void *buf, int len1);
+int send_all(CharDriverState *chr, int fd, const void *buf, int len1);
 
 /* New, ipv6-ready socket helper functions, see qemu-sockets.c */
 int inet_listen_opts(QemuOpts *opts, int port_offset);
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH v8 6/7] char: Equip the unix/tcp backend to handle nonblocking writes
  2010-12-01  9:54 [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control Amit Shah
                   ` (4 preceding siblings ...)
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 5/7] char: Update send_all() to handle nonblocking chardev write requests Amit Shah
@ 2010-12-01  9:54 ` Amit Shah
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data Amit Shah
  6 siblings, 0 replies; 23+ messages in thread
From: Amit Shah @ 2010-12-01  9:54 UTC (permalink / raw)
  To: qemu list; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

Now that the infrastructure is in place to return -EAGAIN to callers,
individual char drivers can set their update_fd_handlers() function to
set or remove an fd's write handler.  This handler checks if the driver
became writable.

A generic callback routine is used for unblocking writes and letting
users of chardevs know that a driver became writable again.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 qemu-char.c |   27 +++++++++++++++++++++++++++
 1 files changed, 27 insertions(+), 0 deletions(-)

diff --git a/qemu-char.c b/qemu-char.c
index a24c4c2..e624600 100644
--- a/qemu-char.c
+++ b/qemu-char.c
@@ -235,6 +235,19 @@ static int char_remove_fd_handlers(int fd)
     return qemu_set_fd_handler2(fd, NULL, NULL, NULL, NULL);
 }
 
+/*
+ * Generic routine that gets called when chardev becomes writable.
+ * Lets chardev user know it's OK to send more data.
+ */
+static void char_write_unblocked(void *opaque)
+{
+    CharDriverState *chr = opaque;
+
+    chr->write_blocked = false;
+    chr->update_fd_handlers(chr, false);
+    chr->chr_write_unblocked(chr->handler_opaque);
+}
+
 static int null_chr_write(CharDriverState *chr, const uint8_t *buf, int len)
 {
     return len;
@@ -2263,6 +2276,19 @@ static void tcp_chr_close(CharDriverState *chr)
     qemu_chr_event(chr, CHR_EVENT_CLOSED);
 }
 
+static void tcp_update_fd_handlers(CharDriverState *chr, bool poll_out)
+{
+    TCPCharDriver *s = chr->opaque;
+
+    /*
+     * This function is called only after tcp_chr_connect() is called
+     * (either in 'server' mode or client mode.  So we're sure of
+     * s->fd being initialised.
+     */
+    char_set_fd_handlers(s->fd, tcp_chr_read_poll, tcp_chr_read,
+                         char_write_unblocked, chr, poll_out);
+}
+
 static CharDriverState *qemu_chr_open_socket(QemuOpts *opts)
 {
     CharDriverState *chr = NULL;
@@ -2315,6 +2341,7 @@ static CharDriverState *qemu_chr_open_socket(QemuOpts *opts)
     chr->chr_write = tcp_chr_write;
     chr->chr_close = tcp_chr_close;
     chr->get_msgfd = tcp_get_msgfd;
+    chr->update_fd_handlers = tcp_update_fd_handlers;
 
     if (is_listen) {
         s->listen_fd = fd;
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-01  9:54 [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control Amit Shah
                   ` (5 preceding siblings ...)
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 6/7] char: Equip the unix/tcp backend to handle nonblocking writes Amit Shah
@ 2010-12-01  9:54 ` Amit Shah
  2010-12-01 11:30   ` Paul Brook
  6 siblings, 1 reply; 23+ messages in thread
From: Amit Shah @ 2010-12-01  9:54 UTC (permalink / raw)
  To: qemu list; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

When a chardev indicates it can't accept more data, we tell the
virtio-serial code to stop sending us any more data till we tell
otherwise.  This helps in guests continuing to run normally while the vq
keeps getting full and eventually the guest stops queueing more data.
As soon as the chardev indicates it can accept more data, start pushing!

Signed-off-by: Amit Shah <amit.shah@redhat.com>
---
 hw/virtio-console.c |   17 ++++++++++++++++-
 1 files changed, 16 insertions(+), 1 deletions(-)

diff --git a/hw/virtio-console.c b/hw/virtio-console.c
index 749ed59..ec85ad5 100644
--- a/hw/virtio-console.c
+++ b/hw/virtio-console.c
@@ -18,13 +18,27 @@ typedef struct VirtConsole {
     CharDriverState *chr;
 } VirtConsole;
 
+/*
+ * Callback function that's called from chardevs when backend becomes
+ * writable.
+ */
+static void chr_write_unblocked(void *opaque)
+{
+    VirtConsole *vcon = opaque;
+
+    virtio_serial_throttle_port(&vcon->port, false);
+}
 
 /* Callback function that's called when the guest sends us data */
 static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t len)
 {
     VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
+    int ret;
 
-    qemu_chr_write(vcon->chr, buf, len);
+    ret = qemu_chr_write(vcon->chr, buf, len);
+    if (ret == -EAGAIN) {
+        virtio_serial_throttle_port(port, true);
+    }
 }
 
 /* Readiness of the guest to accept data on a port */
@@ -62,6 +76,7 @@ static QemuChrHandlers chr_handlers = {
     .fd_can_read = chr_can_read,
     .fd_read = chr_read,
     .fd_event = chr_event,
+    .fd_write_unblocked = chr_write_unblocked,
 };
 
 static int generic_port_init(VirtConsole *vcon, VirtIOSerialDevice *dev)
-- 
1.7.3.2

^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data Amit Shah
@ 2010-12-01 11:30   ` Paul Brook
  2010-12-01 11:48     ` Amit Shah
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Brook @ 2010-12-01 11:30 UTC (permalink / raw)
  To: qemu-devel; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

>  /* Callback function that's called when the guest sends us data */
>  static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t
> len) {
>      VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
> +    int ret;
> 
> -    qemu_chr_write(vcon->chr, buf, len);
> +    ret = qemu_chr_write(vcon->chr, buf, len);
> +    if (ret == -EAGAIN) {
> +        virtio_serial_throttle_port(port, true);
> +    }
>  }

This looks wrong. It will loose data in the case of a partial write
(i.e. ret < len)

Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-01 11:30   ` Paul Brook
@ 2010-12-01 11:48     ` Amit Shah
  2010-12-01 11:59       ` Paul Brook
  0 siblings, 1 reply; 23+ messages in thread
From: Amit Shah @ 2010-12-01 11:48 UTC (permalink / raw)
  To: Paul Brook; +Cc: Juan Quintela, qemu-devel, Gerd Hoffmann

On (Wed) Dec 01 2010 [11:30:58], Paul Brook wrote:
> >  /* Callback function that's called when the guest sends us data */
> >  static void flush_buf(VirtIOSerialPort *port, const uint8_t *buf, size_t
> > len) {
> >      VirtConsole *vcon = DO_UPCAST(VirtConsole, port, port);
> > +    int ret;
> > 
> > -    qemu_chr_write(vcon->chr, buf, len);
> > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > +    if (ret == -EAGAIN) {
> > +        virtio_serial_throttle_port(port, true);
> > +    }
> >  }
> 
> This looks wrong. It will loose data in the case of a partial write
> (i.e. ret < len)

That doesn't happen currently (qemu_chr_write doesn't return a value > 0
but < len).

I had code in there to handle it, but that would change behaviour for
current users of qemu_chr_write(), which is a risk.

		Amit

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-01 11:48     ` Amit Shah
@ 2010-12-01 11:59       ` Paul Brook
  2010-12-01 12:12         ` Amit Shah
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Brook @ 2010-12-01 11:59 UTC (permalink / raw)
  To: Amit Shah; +Cc: Juan Quintela, qemu-devel, Gerd Hoffmann

> > > -    qemu_chr_write(vcon->chr, buf, len);
> > > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > > +    if (ret == -EAGAIN) {
> > > +        virtio_serial_throttle_port(port, true);
> > > +    }
> > > 
> > >  }
> > 
> > This looks wrong. It will loose data in the case of a partial write
> > (i.e. ret < len)
> 
> That doesn't happen currently (qemu_chr_write doesn't return a value > 0
> but < len).
> 
> I had code in there to handle it, but that would change behaviour for
> current users of qemu_chr_write(), which is a risk.

Doesn't that make the code almost completely pointless?

Allowing EAGAIN without allowing short writes means that the writes will still 
block for arbitrarily long periods of time.  You're relying on the kernel 
blocking at the same point the guest happens to split a DMA block. If you're 
transferring any significant quantities of data the chances of that happening 
seem pretty slim.

Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-01 11:59       ` Paul Brook
@ 2010-12-01 12:12         ` Amit Shah
  2010-12-01 13:08           ` Paul Brook
  0 siblings, 1 reply; 23+ messages in thread
From: Amit Shah @ 2010-12-01 12:12 UTC (permalink / raw)
  To: Paul Brook; +Cc: Juan Quintela, qemu-devel, Gerd Hoffmann

On (Wed) Dec 01 2010 [11:59:35], Paul Brook wrote:
> > > > -    qemu_chr_write(vcon->chr, buf, len);
> > > > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > > > +    if (ret == -EAGAIN) {
> > > > +        virtio_serial_throttle_port(port, true);
> > > > +    }
> > > > 
> > > >  }
> > > 
> > > This looks wrong. It will loose data in the case of a partial write
> > > (i.e. ret < len)
> > 
> > That doesn't happen currently (qemu_chr_write doesn't return a value > 0
> > but < len).
> > 
> > I had code in there to handle it, but that would change behaviour for
> > current users of qemu_chr_write(), which is a risk.
> 
> Doesn't that make the code almost completely pointless?

Not really -- I did have code for partial writes, but removed it before
this submission (had it in previous versions).

The (new) do_send loop:

    len = len1;
    while (len > 0) {
        ret = write(fd, buf, len);
        if (ret < 0) {
            if (errno == EAGAIN && nonblock) {
                return -EAGAIN;
            }
            if (errno != EINTR && errno != EAGAIN) {
                return -1;
            }
        } else if (ret == 0) {
            break;
        } else {
            buf += ret;
            len -= ret;
        }
    }

when there's a partial write, it tries to do a write again, which will
fail with -EAGAIN.

(However I might be completely missing the data that didn't get written
as a result of that partial write, so the output gets corrupted.  I
guess I still need to handle partial writes for that.)

		Amit

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-01 12:12         ` Amit Shah
@ 2010-12-01 13:08           ` Paul Brook
  2010-12-02  9:21             ` Amit Shah
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Brook @ 2010-12-01 13:08 UTC (permalink / raw)
  To: Amit Shah; +Cc: Juan Quintela, qemu-devel, Gerd Hoffmann

> On (Wed) Dec 01 2010 [11:59:35], Paul Brook wrote:
> > > > > -    qemu_chr_write(vcon->chr, buf, len);
> > > > > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > > > > +    if (ret == -EAGAIN) {
> > > > > +        virtio_serial_throttle_port(port, true);
> > > > > +    }
> > > > > 
> > > > >  }
> > > > 
> > > > This looks wrong. It will loose data in the case of a partial write
> > > > (i.e. ret < len)
> > > 
> > > That doesn't happen currently (qemu_chr_write doesn't return a value >
> > > 0 but < len).
> > > 
> > > I had code in there to handle it, but that would change behaviour for
> > > current users of qemu_chr_write(), which is a risk.
> > 
> > Doesn't that make the code almost completely pointless?
> 
> Not really -- I did have code for partial writes, but removed it before
> this submission (had it in previous versions).
> 
> The (new) do_send loop:
> 
>     len = len1;
>     while (len > 0) {
>         ret = write(fd, buf, len);
>         if (ret < 0) {
>             if (errno == EAGAIN && nonblock) {
>                 return -EAGAIN;
>             }
>             if (errno != EINTR && errno != EAGAIN) {
>                 return -1;
>             }
>         } else if (ret == 0) {
>             break;
>         } else {
>             buf += ret;
>             len -= ret;
>         }
>     }
> 
> when there's a partial write, it tries to do a write again, which will
> fail with -EAGAIN.

Doesn't that cause the first partial chunk to be incorrectly transmitted 
twice? You may only return EAGAIN if no data was transmitted.

Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-01 13:08           ` Paul Brook
@ 2010-12-02  9:21             ` Amit Shah
  2010-12-02 17:31               ` Paul Brook
  0 siblings, 1 reply; 23+ messages in thread
From: Amit Shah @ 2010-12-02  9:21 UTC (permalink / raw)
  To: Paul Brook; +Cc: Juan Quintela, qemu-devel, Gerd Hoffmann

On (Wed) Dec 01 2010 [13:08:26], Paul Brook wrote:
> > On (Wed) Dec 01 2010 [11:59:35], Paul Brook wrote:
> > > > > > -    qemu_chr_write(vcon->chr, buf, len);
> > > > > > +    ret = qemu_chr_write(vcon->chr, buf, len);
> > > > > > +    if (ret == -EAGAIN) {
> > > > > > +        virtio_serial_throttle_port(port, true);
> > > > > > +    }
> > > > > > 
> > > > > >  }
> > > > > 
> > > > > This looks wrong. It will loose data in the case of a partial write
> > > > > (i.e. ret < len)
> > > > 
> > > > That doesn't happen currently (qemu_chr_write doesn't return a value >
> > > > 0 but < len).
> > > > 
> > > > I had code in there to handle it, but that would change behaviour for
> > > > current users of qemu_chr_write(), which is a risk.
> > > 
> > > Doesn't that make the code almost completely pointless?
> > 
> > Not really -- I did have code for partial writes, but removed it before
> > this submission (had it in previous versions).
> > 
> > The (new) do_send loop:
> > 
> >     len = len1;
> >     while (len > 0) {
> >         ret = write(fd, buf, len);
> >         if (ret < 0) {
> >             if (errno == EAGAIN && nonblock) {
> >                 return -EAGAIN;
> >             }
> >             if (errno != EINTR && errno != EAGAIN) {
> >                 return -1;
> >             }
> >         } else if (ret == 0) {
> >             break;
> >         } else {
> >             buf += ret;
> >             len -= ret;
> >         }
> >     }
> > 
> > when there's a partial write, it tries to do a write again, which will
> > fail with -EAGAIN.
> 
> Doesn't that cause the first partial chunk to be incorrectly transmitted 
> twice? You may only return EAGAIN if no data was transmitted.

Except for the fact that no caller of qemu_chr_write() resubmits (or
even checks) partial writes.

I think the only way to solve this in a caller-neutral way while keeping
the current semantics is to buffer all the data that comes in for
qemu_chr_write() and re-submit partial writes in the unblock routine
(ie, what I did in virtio-console.c in v7 is to be done in qemu-char.c
now).

That OK?

		Amit

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-02  9:21             ` Amit Shah
@ 2010-12-02 17:31               ` Paul Brook
  2010-12-06  6:55                 ` Amit Shah
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Brook @ 2010-12-02 17:31 UTC (permalink / raw)
  To: qemu-devel; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

> > > when there's a partial write, it tries to do a write again, which will
> > > fail with -EAGAIN.
> > 
> > Doesn't that cause the first partial chunk to be incorrectly transmitted
> > twice? You may only return EAGAIN if no data was transmitted.
> 
> Except for the fact that no caller of qemu_chr_write() resubmits (or
> even checks) partial writes.

I don't buy this argument. The current implementation of qemu_chr_write never 
generates transient failures, so they don't need to.

If any data has been written then you must not return -EAGAIN.  Doing so will 
cause data corruption - the device will retry the transmit later (e.g. when 
the socket becomes unblocked) and duplicate data will be output.

Once data has been transmitted, we have three options:

a) Block until the write completes. This makes the whole patch fairly 
pointless as host and guest block boundaries are unlikely to align.

b) Store the data on the side somewhere. Tell the device all data has been 
sent, and arrange for this data to be flushed before accepting any more data.  
This is bad because it allows the guest to allocate arbitrarily large[1] 
buffers on the host. i.e. a fairly easily exploitable DoS attack.

c) Return a partial write to the guest. The guest already has to handle 
retries due to EAGAIN, and DMA capable devices already have to handle partial 
mappings, so this doesn't seem too onerous a requirement. This is not a new 
concept, it's the same as the unix write(2)/send(2) functions.

Paul

[1] At least as large as guest RAM, per port.

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-02 17:31               ` Paul Brook
@ 2010-12-06  6:55                 ` Amit Shah
  2010-12-06  9:35                   ` Paul Brook
  0 siblings, 1 reply; 23+ messages in thread
From: Amit Shah @ 2010-12-06  6:55 UTC (permalink / raw)
  To: Paul Brook; +Cc: Gerd Hoffmann, qemu-devel, Juan Quintela

On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote:
> > > > when there's a partial write, it tries to do a write again, which will
> > > > fail with -EAGAIN.
> > > 
> > > Doesn't that cause the first partial chunk to be incorrectly transmitted
> > > twice? You may only return EAGAIN if no data was transmitted.
> > 
> > Except for the fact that no caller of qemu_chr_write() resubmits (or
> > even checks) partial writes.
> 
> I don't buy this argument. The current implementation of qemu_chr_write never 
> generates transient failures, so they don't need to.

And applying this patch won't change the situation.

What I proposed in the earlier mail was to buffer only the data that has
to be re-submitted in case the caller is capable of stopping further
output till the char layer indicates it's free to start again.

> Once data has been transmitted, we have three options:
> 
> a) Block until the write completes. This makes the whole patch fairly 
> pointless as host and guest block boundaries are unlikely to align.

This is what currently happens and will remain so for callers of
qemu_chr_write() which don't have a .write_unblocked() pointer assigned
in the char dev struct.

> b) Store the data on the side somewhere. Tell the device all data has been 
> sent, and arrange for this data to be flushed before accepting any more data.  
> This is bad because it allows the guest to allocate arbitrarily large[1] 
> buffers on the host. i.e. a fairly easily exploitable DoS attack.

With virtio-serial, this is what's in use.  The buffer is limited to the
length of the vq (which is a compile-time constant) and there also is
the virtio_serial_throttle_port() call that tells the guest to not send
any more data to the host till the char layer indicates it's OK to send
more data.

> c) Return a partial write to the guest. The guest already has to handle 
> retries due to EAGAIN, and DMA capable devices already have to handle partial 
> mappings, so this doesn't seem too onerous a requirement. This is not a new 
> concept, it's the same as the unix write(2)/send(2) functions.

This isn't possible with the current vq design.

		Amit

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-06  6:55                 ` Amit Shah
@ 2010-12-06  9:35                   ` Paul Brook
  2010-12-06 10:11                     ` Amit Shah
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Brook @ 2010-12-06  9:35 UTC (permalink / raw)
  To: qemu-devel; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

> On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote:
> > > > > when there's a partial write, it tries to do a write again, which
> > > > > will fail with -EAGAIN.
> > > > 
> > > > Doesn't that cause the first partial chunk to be incorrectly
> > > > transmitted twice? You may only return EAGAIN if no data was
> > > > transmitted.
> > > 
> > > Except for the fact that no caller of qemu_chr_write() resubmits (or
> > > even checks) partial writes.
> > 
> > I don't buy this argument. The current implementation of qemu_chr_write
> > never generates transient failures, so they don't need to.
> 
> And applying this patch won't change the situation.

Sure it will. The whole point of the patch is to allow transient failures 
(i.e. avoid blocking) when writing to char backends.  You should expect to 
have to modify the device code to cope with this.

As with the DMA interface added a while ago, I believe it's important to get 
these APIs right.  Quick hacks to support limited use-cases just end up 
needing a complete rewrite (or even worse multiple concurrent 
APIs/implementations) once we actually start using them seriously.

I'm extremely reluctant to add a new layer of buffering that is not visible to 
ether the kernel or the device.  In practice we still need to be able to split 
oversize requests, so handling small split requests should be pretty much 
free.

> What I proposed in the earlier mail was to buffer only the data that has
> to be re-submitted in case the caller is capable of stopping further
> output till the char layer indicates it's free to start again.

That's case (b) below.

> > Once data has been transmitted, we have three options:
> > 
> > a) Block until the write completes. This makes the whole patch fairly
> > pointless as host and guest block boundaries are unlikely to align.
> 
> This is what currently happens and will remain so for callers of
> qemu_chr_write() which don't have a .write_unblocked() pointer assigned
> in the char dev struct.

Obviously if the device doesn't supply an unbocked() hook then the behavior is 
unchanged.  That's trivially uninteresting.  I'm talking about devices that do 
provide the unblocked() hook.

> > b) Store the data on the side somewhere. Tell the device all data has
> > been sent, and arrange for this data to be flushed before accepting any
> > more data. This is bad because it allows the guest to allocate
> > arbitrarily large[1] buffers on the host. i.e. a fairly easily
> > exploitable DoS attack.
> 
> With virtio-serial, this is what's in use.  The buffer is limited to the
> length of the vq (which is a compile-time constant) and there also is
> the virtio_serial_throttle_port() call that tells the guest to not send
> any more data to the host till the char layer indicates it's OK to send
> more data.

No.

Firstly you're assuming all users are virtio based. That may be all you care 
about, but is not acceptable if you want to get this code merged.

Secondly, the virtqueue only restricts the number of direct ring buffer 
entries. It does not restrict the quantity of data each ring entry points to.

As a side note, I notice that the virtio-serial-buf code is already allocating 
buffers and calling iov_to_buf on arbitrary sized requests. This is wrong for 
the same reason. Don't do it.

> > c) Return a partial write to the guest. The guest already has to handle
> > retries due to EAGAIN, and DMA capable devices already have to handle
> > partial mappings, so this doesn't seem too onerous a requirement. This
> > is not a new concept, it's the same as the unix write(2)/send(2)
> > functions.
> 
> This isn't possible with the current vq design.

You need to fix that then.  I'm fairly sure it must be possible as virtio-blk 
has to handle similar problems.

Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-06  9:35                   ` Paul Brook
@ 2010-12-06 10:11                     ` Amit Shah
  2010-12-06 13:23                       ` Paul Brook
  0 siblings, 1 reply; 23+ messages in thread
From: Amit Shah @ 2010-12-06 10:11 UTC (permalink / raw)
  To: Paul Brook; +Cc: Juan Quintela, qemu-devel, Gerd Hoffmann

On (Mon) Dec 06 2010 [09:35:10], Paul Brook wrote:
> > On (Thu) Dec 02 2010 [17:31:36], Paul Brook wrote:
> > > > > > when there's a partial write, it tries to do a write again, which
> > > > > > will fail with -EAGAIN.
> > > > > 
> > > > > Doesn't that cause the first partial chunk to be incorrectly
> > > > > transmitted twice? You may only return EAGAIN if no data was
> > > > > transmitted.
> > > > 
> > > > Except for the fact that no caller of qemu_chr_write() resubmits (or
> > > > even checks) partial writes.
> > > 
> > > I don't buy this argument. The current implementation of qemu_chr_write
> > > never generates transient failures, so they don't need to.
> > 
> > And applying this patch won't change the situation.
> 
> Sure it will. The whole point of the patch is to allow transient failures 
> (i.e. avoid blocking) when writing to char backends.  You should expect to 
> have to modify the device code to cope with this.

Looks like we're talking of two different cases.  I'm talking here of
current code that uses qemu chardevs and that it'll continue working
fine with this patchset (ie. changes required only to code that wants
-EAGAIN returns).

> As with the DMA interface added a while ago, I believe it's important to get 
> these APIs right.  Quick hacks to support limited use-cases just end up 
> needing a complete rewrite (or even worse multiple concurrent 
> APIs/implementations) once we actually start using them seriously.

Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
the backend can block, and the caller can handle it, it can get a
-EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
the chardev can call the ->writes_unblocked() callback for that caller.
Individual callers need not bother about re-submitting partial writes,
since the buffering can be done in common code in one place
(qemu-char.c).

My previous implementation for leaving out the buffering details to
individual users of qemu chardevs was OK'ed by you but not Anthony.

> I'm extremely reluctant to add a new layer of buffering that is not visible to 
> ether the kernel or the device.  In practice we still need to be able to split 
> oversize requests, so handling small split requests should be pretty much 
> free.

So do you propose to propagate this -EAGAIN error all the way to the
guest?  That won't work for older virtio guests (for virtio, host and
guest changes will be needed).

> > What I proposed in the earlier mail was to buffer only the data that has
> > to be re-submitted in case the caller is capable of stopping further
> > output till the char layer indicates it's free to start again.
> 
> That's case (b) below.
> 
> > > Once data has been transmitted, we have three options:
> > > 
> > > a) Block until the write completes. This makes the whole patch fairly
> > > pointless as host and guest block boundaries are unlikely to align.
> > 
> > This is what currently happens and will remain so for callers of
> > qemu_chr_write() which don't have a .write_unblocked() pointer assigned
> > in the char dev struct.
> 
> Obviously if the device doesn't supply an unbocked() hook then the behavior is 
> unchanged.  That's trivially uninteresting.  I'm talking about devices that do 
> provide the unblocked() hook.
> 
> > > b) Store the data on the side somewhere. Tell the device all data has
> > > been sent, and arrange for this data to be flushed before accepting any
> > > more data. This is bad because it allows the guest to allocate
> > > arbitrarily large[1] buffers on the host. i.e. a fairly easily
> > > exploitable DoS attack.
> > 
> > With virtio-serial, this is what's in use.  The buffer is limited to the
> > length of the vq (which is a compile-time constant) and there also is
> > the virtio_serial_throttle_port() call that tells the guest to not send
> > any more data to the host till the char layer indicates it's OK to send
> > more data.
> 
> No.
> 
> Firstly you're assuming all users are virtio based. That may be all you care 
> about, but is not acceptable if you want to get this code merged.

OK, but it's assumed that once a -EAGAIN is returned, the caller will
take appropriate actions to restrict the data sent.  Especially,
send_all has:

    if (chr->write_blocked) {
        /*
         * We don't handle this situation: the caller should not send
         * us data while we're blocked.
         *
         * We could buffer this data here but that'll only encourage
         * bad behaviour on part of the callers.
         *
         * Also, the data already in fd's buffers isn't easily
         * migratable.  If we want full migration support, all the
         * data landing here needs to be buffered and on migration,
         * anything that's unsent needs to be transferred to the
         * dest. machine (which again isn't a very good way of solving
         * the problem, as the src may become writable just during
         * migration and the reader could receive some data twice,
         * essentially corrupting the data).
         */
        return -1;
    }

> Secondly, the virtqueue only restricts the number of direct ring buffer 
> entries. It does not restrict the quantity of data each ring entry points to.

But that's entirely in guest memory, so it's limited to the amount of
RAM that has been allocated to the guest.

> As a side note, I notice that the virtio-serial-buf code is already allocating 
> buffers and calling iov_to_buf on arbitrary sized requests. This is wrong for 
> the same reason. Don't do it.

You mean the code that allocates one buffer from the iov?

> > > c) Return a partial write to the guest. The guest already has to handle
> > > retries due to EAGAIN, and DMA capable devices already have to handle
> > > partial mappings, so this doesn't seem too onerous a requirement. This
> > > is not a new concept, it's the same as the unix write(2)/send(2)
> > > functions.
> > 
> > This isn't possible with the current vq design.
> 
> You need to fix that then.  I'm fairly sure it must be possible as virtio-blk 
> has to handle similar problems.

This was one of the items that was debated during the lead-up to
virtio-serial merge:  whether a write() call in the guest means data has
been flushed out to the host chardev or if the guest has just passed it
on to the host to take care of it.  We merged the latter approach.

I'll check with what virtio-blk does, but I guess the block layer has
semantics for allowing such usage whereas the char layer doesn't.

		Amit

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-06 10:11                     ` Amit Shah
@ 2010-12-06 13:23                       ` Paul Brook
  2010-12-07  7:11                         ` Amit Shah
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Brook @ 2010-12-06 13:23 UTC (permalink / raw)
  To: Amit Shah; +Cc: Juan Quintela, qemu-devel, Gerd Hoffmann

> > As with the DMA interface added a while ago, I believe it's important to
> > get these APIs right.  Quick hacks to support limited use-cases just end
> > up needing a complete rewrite (or even worse multiple concurrent
> > APIs/implementations) once we actually start using them seriously.
> 
> Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
> the backend can block, and the caller can handle it, it can get a
> -EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
> the chardev can call the ->writes_unblocked() callback for that caller.
> Individual callers need not bother about re-submitting partial writes,
> since the buffering can be done in common code in one place
> (qemu-char.c).

That's only OK if you assume it's OK to buffer all but one byte of the 
transmit request.

> > I'm extremely reluctant to add a new layer of buffering that is not
> > visible to ether the kernel or the device.  In practice we still need to
> > be able to split oversize requests, so handling small split requests
> > should be pretty much free.
> 
> So do you propose to propagate this -EAGAIN error all the way to the
> guest?  That won't work for older virtio guests (for virtio, host and
> guest changes will be needed).

Huh? That doesn't make any sense. The guest is already using an asyncronous 
submission mechanism.  
With a virtio device the status of a buffer becomes indeterminate once it has 
been placed into the queue. Only when it is removed from the queue do we know 
that it has completed.  The device may transfer all or part of that buffer at 
any time in between.
 
> > > > b) Store the data on the side somewhere. Tell the device all data has
> > > > been sent, and arrange for this data to be flushed before accepting
> > > > any more data. This is bad because it allows the guest to allocate
> > > > arbitrarily large[1] buffers on the host. i.e. a fairly easily
> > > > exploitable DoS attack.
> > > 
> > > With virtio-serial, this is what's in use.  The buffer is limited to
> > > the length of the vq (which is a compile-time constant) and there also
> > > is the virtio_serial_throttle_port() call that tells the guest to not
> > > send any more data to the host till the char layer indicates it's OK
> > > to send more data.
> > 
> > No.
> > 
> > Firstly you're assuming all users are virtio based. That may be all you
> > care about, but is not acceptable if you want to get this code merged.
> 
> OK, but it's assumed that once a -EAGAIN is returned, the caller will
> take appropriate actions to restrict the data sent.  Especially,
> send_all has:
> 
>     if (chr->write_blocked) {
>         /*
>          * We don't handle this situation: the caller should not send
>          * us data while we're blocked.
>          *
>          * We could buffer this data here but that'll only encourage
>          * bad behaviour on part of the callers.

>          */
>         return -1;
>     }

If you're being draconian about this then do it properly and make this an 
abort. Otherwise return -EAGAIN. Returning a random error seems like the worst 
of both worlds.  Your code results in spurious guest errors (or lost data) 
with real indication that this is actually a qemu device emulation bug.
 
> > Secondly, the virtqueue only restricts the number of direct ring buffer
> > entries. It does not restrict the quantity of data each ring entry points
> > to.
> 
> But that's entirely in guest memory, so it's limited to the amount of
> RAM that has been allocated to the guest.

Exactly. The guest can cause ram_size * nr_ports of additional host memory to 
be allocated.  Not acceptable. 

> > > > c) Return a partial write to the guest. The guest already has to
> > > > handle retries due to EAGAIN, and DMA capable devices already have
> > > > to handle partial mappings, so this doesn't seem too onerous a
> > > > requirement. This is not a new concept, it's the same as the unix
> > > > write(2)/send(2) functions.
> > > 
> > > This isn't possible with the current vq design.
> > 
> > You need to fix that then.  I'm fairly sure it must be possible as
> > virtio-blk has to handle similar problems.
> 
> This was one of the items that was debated during the lead-up to
> virtio-serial merge:  whether a write() call in the guest means data has
> been flushed out to the host chardev or if the guest has just passed it
> on to the host to take care of it.  We merged the latter approach.

Ensuring that data has actually reached the endpoint (v.s. being in a queue 
for transmission at the next available point) is a different problem, and 
probably one better solved by higher level protocols.


Char devices are fundamentally stream based devices with no frame boundaries 
(or alternatively a fixed frame size of 1 byte).

Your device only reports progress to the guest in virtqueue-entry sized 
chunks. However that places no real restrictions on how the data is passed to 
the host. You can choose to pass a single queue entry to the host in several 
smaller chunks, or even pass several queue entries to the host in a single 
request.

Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-06 13:23                       ` Paul Brook
@ 2010-12-07  7:11                         ` Amit Shah
  2010-12-08 12:56                           ` Paul Brook
  0 siblings, 1 reply; 23+ messages in thread
From: Amit Shah @ 2010-12-07  7:11 UTC (permalink / raw)
  To: Paul Brook; +Cc: Juan Quintela, qemu-devel, Gerd Hoffmann

On (Mon) Dec 06 2010 [13:23:50], Paul Brook wrote:
> > Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
> > the backend can block, and the caller can handle it, it can get a
> > -EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
> > the chardev can call the ->writes_unblocked() callback for that caller.
> > Individual callers need not bother about re-submitting partial writes,
> > since the buffering can be done in common code in one place
> > (qemu-char.c).
> 
> That's only OK if you assume it's OK to buffer all but one byte of the 
> transmit request.

Is that a fair assumption to make?

> > > I'm extremely reluctant to add a new layer of buffering that is not
> > > visible to ether the kernel or the device.  In practice we still need to
> > > be able to split oversize requests, so handling small split requests
> > > should be pretty much free.
> > 
> > So do you propose to propagate this -EAGAIN error all the way to the
> > guest?  That won't work for older virtio guests (for virtio, host and
> > guest changes will be needed).
> 
> Huh? That doesn't make any sense. The guest is already using an asyncronous 
> submission mechanism.  
> With a virtio device the status of a buffer becomes indeterminate once it has 
> been placed into the queue. Only when it is removed from the queue do we know 
> that it has completed.  The device may transfer all or part of that buffer at 
> any time in between.

Yes, just making sure this isn't what you're talking about.

> > > > > b) Store the data on the side somewhere. Tell the device all data has
> > > > > been sent, and arrange for this data to be flushed before accepting
> > > > > any more data. This is bad because it allows the guest to allocate
> > > > > arbitrarily large[1] buffers on the host. i.e. a fairly easily
> > > > > exploitable DoS attack.
> > > > 
> > > > With virtio-serial, this is what's in use.  The buffer is limited to
> > > > the length of the vq (which is a compile-time constant) and there also
> > > > is the virtio_serial_throttle_port() call that tells the guest to not
> > > > send any more data to the host till the char layer indicates it's OK
> > > > to send more data.
> > > 
> > > No.
> > > 
> > > Firstly you're assuming all users are virtio based. That may be all you
> > > care about, but is not acceptable if you want to get this code merged.
> > 
> > OK, but it's assumed that once a -EAGAIN is returned, the caller will
> > take appropriate actions to restrict the data sent.  Especially,
> > send_all has:
> > 
> >     if (chr->write_blocked) {
> >         /*
> >          * We don't handle this situation: the caller should not send
> >          * us data while we're blocked.
> >          *
> >          * We could buffer this data here but that'll only encourage
> >          * bad behaviour on part of the callers.
> 
> >          */
> >         return -1;
> >     }
> 
> If you're being draconian about this then do it properly and make this an 
> abort. Otherwise return -EAGAIN. Returning a random error seems like the worst 
> of both worlds.  Your code results in spurious guest errors (or lost data) 
> with real indication that this is actually a qemu device emulation bug.

Yeah; abort() is a good idea.

(BTW the previous send_all() routine just returned -1 for anything other
than -EINTR and -EAGAIN, so I went for consistency.)

> > > Secondly, the virtqueue only restricts the number of direct ring buffer
> > > entries. It does not restrict the quantity of data each ring entry points
> > > to.
> > 
> > But that's entirely in guest memory, so it's limited to the amount of
> > RAM that has been allocated to the guest.
> 
> Exactly. The guest can cause ram_size * nr_ports of additional host memory to 
> be allocated.  Not acceptable. 

OK -- so this is how it adds up:

- guest vq
- virtio-serial-bus converts iov to buf
- qemu-char stores the buf in case it wasn't able to send

but then, since it's all async, we have:

- virtio-serial-bus frees the buf
- guest deletes the buf and removes it from the vq

So what's left is only the data in qemu-char's buf.  Now this can be
(buf_size - 1) * nr_ports in the worst case.

The alternative would be to keep using the guest buffer till all's done,
but then that depends on qemu getting async support - separating out the
qemu_chr_write() into a separate thread and allowing vcpu and chr io
operations to be run simultaneously.

		Amit

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-07  7:11                         ` Amit Shah
@ 2010-12-08 12:56                           ` Paul Brook
  2010-12-08 14:25                             ` Amit Shah
  0 siblings, 1 reply; 23+ messages in thread
From: Paul Brook @ 2010-12-08 12:56 UTC (permalink / raw)
  To: qemu-devel; +Cc: Amit Shah, Gerd Hoffmann, Juan Quintela

> > > Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
> > > the backend can block, and the caller can handle it, it can get a
> > > -EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
> > > the chardev can call the ->writes_unblocked() callback for that caller.
> > > Individual callers need not bother about re-submitting partial writes,
> > > since the buffering can be done in common code in one place
> > > (qemu-char.c).
> > 
> > That's only OK if you assume it's OK to buffer all but one byte of the
> > transmit request.
> 
> Is that a fair assumption to make?

No. See below.

> > > But that's entirely in guest memory, so it's limited to the amount of
> > > RAM that has been allocated to the guest.
> > 
> > Exactly. The guest can cause ram_size * nr_ports of additional host
> > memory to be allocated.  Not acceptable.
> 
> OK -- so this is how it adds up:
> 
> - guest vq
> - virtio-serial-bus converts iov to buf

This is an unbelievably lame piece of code. There's absolutely no reason to 
copy the data into a linear buffer. You should just be iterating over the 
elements of the sglist.

> - qemu-char stores the buf in case it wasn't able to send
>
> but then, since it's all async, we have:
> 
> - virtio-serial-bus frees the buf
> - guest deletes the buf and removes it from the vq
>
> So what's left is only the data in qemu-char's buf.  Now this can be
> (buf_size - 1) * nr_ports in the worst case.

Add at least another buf_size because you have to allocate the qemu-char 
buffer before you free the virtio-serial buffer. We can expect that
buf_size ~= guest ram size [1], so for practical purposes it may as well be 
unbounded.

Worst case the guest could submit a buffer consisting of many large 
overlapping regions, giving a buf_size many times greater then guest ram size.  
Technically such code isn't even doing anything wrong. We're only reading from 
guest RAM, so in principle the same memory can be submitted multiple times 
without causing a race condition.

> The alternative would be to keep using the guest buffer till all's done,

Yes.

> but then that depends on qemu getting async support - separating out the
> qemu_chr_write() into a separate thread and allowing vcpu and chr io
> operations to be run simultaneously.

You don't need any special async char API or threads.  Normal unix write 
semantics (i.e. short writes and EAGAIN) plus the unblock hook are sufficient.
As mentioned above, the virtio-serial code should be iterating over the 
sglist.  If the host won't accept all the data immediately then just remember 
how much has been sent, and resume iteration when the unblock hook is called.

Paul

[1] This kind of insanity does actually happen in the real world.  e.g. 
loading a 700Mb ramdisk image via the fw_cfg device, or a kernel panic handler 
that dumps the contents of RAM to a debug channel (i.e. serial port).

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-08 12:56                           ` Paul Brook
@ 2010-12-08 14:25                             ` Amit Shah
  2010-12-08 16:54                               ` Paul Brook
  0 siblings, 1 reply; 23+ messages in thread
From: Amit Shah @ 2010-12-08 14:25 UTC (permalink / raw)
  To: Paul Brook; +Cc: Gerd Hoffmann, qemu-devel, Juan Quintela

On (Wed) Dec 08 2010 [12:56:33], Paul Brook wrote:
> > > > Sure.  My proposal is for qemu_chr_write() to succeed all the time.  If
> > > > the backend can block, and the caller can handle it, it can get a
> > > > -EAGAIN (or WSAEWOULDBLOCK) return.  When the backend becomes writable,
> > > > the chardev can call the ->writes_unblocked() callback for that caller.
> > > > Individual callers need not bother about re-submitting partial writes,
> > > > since the buffering can be done in common code in one place
> > > > (qemu-char.c).
> > > 
> > > That's only OK if you assume it's OK to buffer all but one byte of the
> > > transmit request.
> > 
> > Is that a fair assumption to make?
> 
> No. See below.
> 
> > > > But that's entirely in guest memory, so it's limited to the amount of
> > > > RAM that has been allocated to the guest.
> > > 
> > > Exactly. The guest can cause ram_size * nr_ports of additional host
> > > memory to be allocated.  Not acceptable.
> > 
> > OK -- so this is how it adds up:
> > 
> > - guest vq
> > - virtio-serial-bus converts iov to buf
> 
> This is an unbelievably lame piece of code.

I doubt it's 'unbelievably lame' just because of the copy.  Care to
expand?

> There's absolutely no reason to 
> copy the data into a linear buffer. You should just be iterating over the 
> elements of the sglist.

OK, but that can be done in a separate patch series.

> > - qemu-char stores the buf in case it wasn't able to send
> >
> > but then, since it's all async, we have:
> > 
> > - virtio-serial-bus frees the buf
> > - guest deletes the buf and removes it from the vq
> >
> > So what's left is only the data in qemu-char's buf.  Now this can be
> > (buf_size - 1) * nr_ports in the worst case.
> 
> Add at least another buf_size because you have to allocate the qemu-char 
> buffer before you free the virtio-serial buffer. We can expect that
> buf_size ~= guest ram size [1], so for practical purposes it may as well be 
> unbounded.

Now this only happens when the host chardev is slow or isn't being read
from.  So it's not really a guest causing a host DoS, but a guest
causing itself some harm.  You're right that the allocations happen one
after the other, and the freeing happens later, so there is a time when
2 or 3 times the buf_size is needed.

However, once qemu_chr_write() returns, there could be just one copy
lying around, things are freed elsewhere.

> Worst case the guest could submit a buffer consisting of many large 
> overlapping regions, giving a buf_size many times greater then guest ram size.  
> Technically such code isn't even doing anything wrong. We're only reading from 
> guest RAM, so in principle the same memory can be submitted multiple times 
> without causing a race condition.

Interesting.

> > The alternative would be to keep using the guest buffer till all's done,
> 
> Yes.
> 
> > but then that depends on qemu getting async support - separating out the
> > qemu_chr_write() into a separate thread and allowing vcpu and chr io
> > operations to be run simultaneously.
> 
> You don't need any special async char API or threads.  Normal unix write 
> semantics (i.e. short writes and EAGAIN) plus the unblock hook are sufficient.
> As mentioned above, the virtio-serial code should be iterating over the 
> sglist.  If the host won't accept all the data immediately then just remember 
> how much has been sent, and resume iteration when the unblock hook is called.

Yes I've been thinking about this as well.  But the problem is some
kernel versions spin in the guest code till the buffer is placed back
in the vq (signalling it's done using it).  This is a problem for the
virtio-console (hvc) that does writes with spinlocks held, so allocating
new buffers, etc., isn't really -- possible easily.

		Amit

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data
  2010-12-08 14:25                             ` Amit Shah
@ 2010-12-08 16:54                               ` Paul Brook
  0 siblings, 0 replies; 23+ messages in thread
From: Paul Brook @ 2010-12-08 16:54 UTC (permalink / raw)
  To: Amit Shah; +Cc: Gerd Hoffmann, qemu-devel, Juan Quintela

> > > > > But that's entirely in guest memory, so it's limited to the amount
> > > > > of RAM that has been allocated to the guest.
> > > > 
> > > > Exactly. The guest can cause ram_size * nr_ports of additional host
> > > > memory to be allocated.  Not acceptable.
> > > 
> > > OK -- so this is how it adds up:
> > > 
> > > - guest vq
> > > - virtio-serial-bus converts iov to buf
> > 
> > This is an unbelievably lame piece of code.
> 
> I doubt it's 'unbelievably lame' just because of the copy.  Care to
> expand?

Specifically that we are allocating a host buffer of guest-specified size to 
hold that copy.
 
> > There's absolutely no reason to
> > copy the data into a linear buffer. You should just be iterating over the
> > elements of the sglist.
> 
> OK, but that can be done in a separate patch series.

I suspect you'll actually find it easier to fix that first. Otherwise you're 
going end up having to rewrite your own code.

> > > - qemu-char stores the buf in case it wasn't able to send
> > > 
> > > but then, since it's all async, we have:
> > > 
> > > - virtio-serial-bus frees the buf
> > > - guest deletes the buf and removes it from the vq
> > > 
> > > So what's left is only the data in qemu-char's buf.  Now this can be
> > > (buf_size - 1) * nr_ports in the worst case.
> > 
> > Add at least another buf_size because you have to allocate the qemu-char
> > buffer before you free the virtio-serial buffer. We can expect that
> > buf_size ~= guest ram size [1], so for practical purposes it may as well
> > be unbounded.
> 
> Now this only happens when the host chardev is slow or isn't being read
> from.  So it's not really a guest causing a host DoS, but a guest
> causing itself some harm.  

No. It causes qemu to allocate and use an arbitrarily large amount of 
additional ram on the host. This is likely to effect the whole host machine, 
not just the problematic guest.  You can hope the OOM killer happens to pick 
the right guest, but I wouldn't bet on it.

> You're right that the allocations happen one
> after the other, and the freeing happens later, so there is a time when
> 2 or 3 times the buf_size is needed.
> 
> However, once qemu_chr_write() returns, there could be just one copy
> lying around, things are freed elsewhere.

One copy (multiplied by the number of ports) is more than enough to cause 
serious problems.

> > > but then that depends on qemu getting async support - separating out
> > > the qemu_chr_write() into a separate thread and allowing vcpu and chr
> > > io operations to be run simultaneously.
> > 
> > You don't need any special async char API or threads.  Normal unix write
> > semantics (i.e. short writes and EAGAIN) plus the unblock hook are
> > sufficient. As mentioned above, the virtio-serial code should be
> > iterating over the sglist.  If the host won't accept all the data
> > immediately then just remember how much has been sent, and resume
> > iteration when the unblock hook is called.
> 
> Yes I've been thinking about this as well.  But the problem is some
> kernel versions spin in the guest code till the buffer is placed back
> in the vq (signalling it's done using it).  This is a problem for the
> virtio-console (hvc) that does writes with spinlocks held, so allocating
> new buffers, etc., isn't really -- possible easily.

That's a guest bug, plain and simple.
I'm pretty sure such guests will still loose after your patch. All you're 
doing is delaying the inevitable slightly. i.e. if a guest happens to submit 
another block before the first has been flushed then it will spin in exactly 
the same way.

Paul

^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2010-12-08 16:54 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-12-01  9:54 [Qemu-devel] [PATCH v8 0/7] char: non-blocking writes, virtio-console flow control Amit Shah
2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 1/7] virtio-console: Factor out common init between console and generic ports Amit Shah
2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 2/7] char: Add a QemuChrHandlers struct to initialise chardev handlers Amit Shah
2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 3/7] char: Introduce char_set/remove_fd_handlers() Amit Shah
2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 4/7] char: Add framework for a 'write unblocked' callback Amit Shah
2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 5/7] char: Update send_all() to handle nonblocking chardev write requests Amit Shah
2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 6/7] char: Equip the unix/tcp backend to handle nonblocking writes Amit Shah
2010-12-01  9:54 ` [Qemu-devel] [PATCH v8 7/7] virtio-console: Enable port throttling when chardev is slow to consume data Amit Shah
2010-12-01 11:30   ` Paul Brook
2010-12-01 11:48     ` Amit Shah
2010-12-01 11:59       ` Paul Brook
2010-12-01 12:12         ` Amit Shah
2010-12-01 13:08           ` Paul Brook
2010-12-02  9:21             ` Amit Shah
2010-12-02 17:31               ` Paul Brook
2010-12-06  6:55                 ` Amit Shah
2010-12-06  9:35                   ` Paul Brook
2010-12-06 10:11                     ` Amit Shah
2010-12-06 13:23                       ` Paul Brook
2010-12-07  7:11                         ` Amit Shah
2010-12-08 12:56                           ` Paul Brook
2010-12-08 14:25                             ` Amit Shah
2010-12-08 16:54                               ` Paul Brook

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.