All of lore.kernel.org
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH 0/3] nbd: fix issues when connection breaks
@ 2014-02-25 10:09 Stefan Hajnoczi
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 1/3] nbd: close socket if " Stefan Hajnoczi
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2014-02-25 10:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, bluewindow, nick

The first patch ensures the nbd_receive_reply() fd handler is unregistered when
the connection to the server breaks.  This avoids high CPU consumption and
flooding error messages.

The second patch introduces an NBD server fault injection script.  Using this
fake NBD server it is possible to exercise error handling code paths in the NBD
client.

The third patch adds qemu-iotests test case 080 to verify qemu-io exits with an
error at each point where the connection can break.

Stefan Hajnoczi (3):
  nbd: close socket if connection breaks
  tests: add nbd-fault-injector.py utility
  qemu-iotests: add 080 NBD client disconnect tests

 block/nbd-client.c                       |  33 +++--
 tests/qemu-iotests/080                   |  91 ++++++++++++
 tests/qemu-iotests/080.out               |  73 ++++++++++
 tests/qemu-iotests/group                 |   1 +
 tests/qemu-iotests/nbd-fault-injector.py | 231 +++++++++++++++++++++++++++++++
 5 files changed, 414 insertions(+), 15 deletions(-)
 create mode 100755 tests/qemu-iotests/080
 create mode 100644 tests/qemu-iotests/080.out
 create mode 100755 tests/qemu-iotests/nbd-fault-injector.py

-- 
1.8.5.3

^ permalink raw reply	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 1/3] nbd: close socket if connection breaks
  2014-02-25 10:09 [Qemu-devel] [PATCH 0/3] nbd: fix issues when connection breaks Stefan Hajnoczi
@ 2014-02-25 10:09 ` Stefan Hajnoczi
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 2/3] tests: add nbd-fault-injector.py utility Stefan Hajnoczi
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 3/3] qemu-iotests: add 080 NBD client disconnect tests Stefan Hajnoczi
  2 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2014-02-25 10:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, bluewindow, nick

nbd_receive_reply() is called by the event loop whenever data is
available or the socket has been closed by the remote side.

This patch closes the socket when an error occurs to prevent the
nbd_receive_reply() handler from being called indefinitely after the
connection has failed.

Note that we were already correctly returning EIO for pending requests
but leaving the nbd_receive_reply() handler registered resulted in high
CPU consumption and a flood of error messages.

Reuse nbd_teardown_connection() to close the socket.

Reported-by: Zhifeng Cai <bluewindow@h3c.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 block/nbd-client.c | 33 ++++++++++++++++++---------------
 1 file changed, 18 insertions(+), 15 deletions(-)

diff --git a/block/nbd-client.c b/block/nbd-client.c
index 0922b78..7d698cb 100644
--- a/block/nbd-client.c
+++ b/block/nbd-client.c
@@ -43,6 +43,17 @@ static void nbd_recv_coroutines_enter_all(NbdClientSession *s)
     }
 }
 
+static void nbd_teardown_connection(NbdClientSession *client)
+{
+    /* finish any pending coroutines */
+    shutdown(client->sock, 2);
+    nbd_recv_coroutines_enter_all(client);
+
+    qemu_aio_set_fd_handler(client->sock, NULL, NULL, NULL);
+    closesocket(client->sock);
+    client->sock = -1;
+}
+
 static void nbd_reply_ready(void *opaque)
 {
     NbdClientSession *s = opaque;
@@ -78,7 +89,7 @@ static void nbd_reply_ready(void *opaque)
     }
 
 fail:
-    nbd_recv_coroutines_enter_all(s);
+    nbd_teardown_connection(s);
 }
 
 static void nbd_restart_write(void *opaque)
@@ -324,7 +335,7 @@ int nbd_client_session_co_discard(NbdClientSession *client, int64_t sector_num,
 
 }
 
-static void nbd_teardown_connection(NbdClientSession *client)
+void nbd_client_session_close(NbdClientSession *client)
 {
     struct nbd_request request = {
         .type = NBD_CMD_DISC,
@@ -332,22 +343,14 @@ static void nbd_teardown_connection(NbdClientSession *client)
         .len = 0
     };
 
-    nbd_send_request(client->sock, &request);
-
-    /* finish any pending coroutines */
-    shutdown(client->sock, 2);
-    nbd_recv_coroutines_enter_all(client);
-
-    qemu_aio_set_fd_handler(client->sock, NULL, NULL, NULL);
-    closesocket(client->sock);
-    client->sock = -1;
-}
-
-void nbd_client_session_close(NbdClientSession *client)
-{
     if (!client->bs) {
         return;
     }
+    if (client->sock == -1) {
+        return;
+    }
+
+    nbd_send_request(client->sock, &request);
 
     nbd_teardown_connection(client);
     client->bs = NULL;
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 2/3] tests: add nbd-fault-injector.py utility
  2014-02-25 10:09 [Qemu-devel] [PATCH 0/3] nbd: fix issues when connection breaks Stefan Hajnoczi
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 1/3] nbd: close socket if " Stefan Hajnoczi
@ 2014-02-25 10:09 ` Stefan Hajnoczi
  2014-02-25 11:45   ` Nick Thomas
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 3/3] qemu-iotests: add 080 NBD client disconnect tests Stefan Hajnoczi
  2 siblings, 1 reply; 8+ messages in thread
From: Stefan Hajnoczi @ 2014-02-25 10:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, bluewindow, nick

The nbd-fault-injector.py script is a special kind of NBD server.  It
throws away all writes and produces zeroes for reads.  Given a list of
fault injection rules, it can simulate NBD protocol errors and is useful
for testing NBD client error handling code paths.

See the patch for documentation.  This scripts is modelled after Kevin
Wolf <kwolf@redhat.com>'s blkdebug block driver.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/nbd-fault-injector.py | 231 +++++++++++++++++++++++++++++++
 1 file changed, 231 insertions(+)
 create mode 100755 tests/qemu-iotests/nbd-fault-injector.py

diff --git a/tests/qemu-iotests/nbd-fault-injector.py b/tests/qemu-iotests/nbd-fault-injector.py
new file mode 100755
index 0000000..b4011f4
--- /dev/null
+++ b/tests/qemu-iotests/nbd-fault-injector.py
@@ -0,0 +1,231 @@
+#!/usr/bin/env python
+# NBD server - fault injection utility
+#
+# Configuration file syntax:
+#   [inject-error "disconnect-neg1"]
+#   event=neg1
+#   io=readwrite
+#   when=before
+#
+# Note that Python's ConfigParser squashes together all sections with the same
+# name, so give each [inject-error] a unique name.
+#
+# inject-error options:
+#   event - name of the trigger event
+#           "neg1" - first part of negotiation struct
+#           "export" - export struct
+#           "neg2" - second part of negotiation struct
+#           "request" - NBD request struct
+#           "reply" - NBD reply struct
+#           "data" - request/reply data
+#   io    - I/O direction that triggers this rule:
+#           "read", "write", or "readwrite"
+#           default: readwrite
+#   when  - when to inject the fault relative to the I/O operation
+#           "before" or "after"
+#           default: before
+#
+# Currently the only error injection action is to terminate the server process.
+# This resets the TCP connection and thus forces the client to handle
+# unexpected connection termination.
+#
+# Other error injection actions could be added in the future.
+#
+# Copyright Red Hat, Inc. 2014
+#
+# Authors:
+#   Stefan Hajnoczi <stefanha@redhat.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING file in the top-level directory.
+
+import sys
+import socket
+import struct
+import collections
+import ConfigParser
+
+# Protocol constants
+NBD_CMD_READ = 0
+NBD_CMD_WRITE = 1
+NBD_CMD_DISC = 2
+NBD_REQUEST_MAGIC = 0x25609513
+NBD_REPLY_MAGIC = 0x67446698
+NBD_PASSWD = 0x4e42444d41474943
+NBD_OPTS_MAGIC = 0x49484156454F5054
+NBD_OPT_EXPORT_NAME = 1 << 0
+
+# Protocol structs
+neg1_struct = struct.Struct('>QQH')
+export_tuple = collections.namedtuple('Export', 'reserved magic opt len')
+export_struct = struct.Struct('>IQII')
+neg2_struct = struct.Struct('>QH124x')
+request_tuple = collections.namedtuple('Request', 'magic type handle from_ len')
+request_struct = struct.Struct('>IIQQI')
+reply_struct = struct.Struct('>IIQ')
+
+def err(msg):
+    sys.stderr.write(msg + '\n')
+    sys.exit(1)
+
+def recvall(sock, bufsize):
+    received = 0
+    chunks = []
+    while received < bufsize:
+        chunk = sock.recv(bufsize - received)
+        if len(chunk) == 0:
+            raise Exception('unexpected disconnect')
+        chunks.append(chunk)
+        received += len(chunk)
+    return ''.join(chunks)
+
+class Rule(object):
+    def __init__(self, name, event, io, when):
+        self.name = name
+        self.event = event
+        self.io = io
+        self.when = when
+
+    def match(self, event, io, when):
+        if when != self.when:
+            return False
+        if event != self.event:
+            return False
+        if io != self.io and self.io != 'readwrite':
+            return False
+        return True
+
+class FaultInjectionSocket(object):
+    def __init__(self, sock, rules):
+        self.sock = sock
+        self.rules = rules
+
+    def check(self, event, io, when):
+        for rule in self.rules:
+            if rule.match(event, io, when):
+                print 'Closing connection on rule match %s' % rule.name
+                sys.exit(0)
+
+    def send(self, buf, event):
+        self.check(event, 'write', 'before')
+        self.sock.sendall(buf)
+        self.check(event, 'write', 'after')
+
+    def recv(self, bufsize, event):
+        self.check(event, 'read', 'before')
+        data = recvall(self.sock, bufsize)
+        self.check(event, 'read', 'after')
+        return data
+
+    def close(self):
+        self.sock.close()
+
+def negotiate(conn):
+    '''Negotiate export with client'''
+    # Send negotiation part 1
+    buf = neg1_struct.pack(NBD_PASSWD, NBD_OPTS_MAGIC, 0)
+    conn.send(buf, event='neg1')
+
+    # Receive export option
+    buf = conn.recv(export_struct.size, event='export')
+    export = export_tuple._make(export_struct.unpack(buf))
+    assert export.magic == NBD_OPTS_MAGIC
+    assert export.opt == NBD_OPT_EXPORT_NAME
+    name = conn.recv(export.len, event='export-name')
+
+    # Send negotiation part 2
+    buf = neg2_struct.pack(8 * 1024 * 1024 * 1024, 0) # 8 GB capacity
+    conn.send(buf, event='neg2')
+
+def read_request(conn):
+    '''Parse NBD request from client'''
+    buf = conn.recv(request_struct.size, event='request')
+    req = request_tuple._make(request_struct.unpack(buf))
+    assert req.magic == NBD_REQUEST_MAGIC
+    return req
+
+def write_reply(conn, error, handle):
+    buf = reply_struct.pack(NBD_REPLY_MAGIC, error, handle)
+    conn.send(buf, event='reply')
+
+def handle_connection(conn):
+    negotiate(conn)
+    while True:
+        req = read_request(conn)
+        if req.type == NBD_CMD_READ:
+            write_reply(conn, 0, req.handle)
+            conn.send('\0' * req.len, event='data')
+        elif req.type == NBD_CMD_WRITE:
+            _ = conn.recv(req.len, event='data')
+            write_reply(conn, 0, req.handle)
+        elif req.type == NBD_CMD_DISC:
+            break
+        else:
+            print 'unrecognized command type %#02x' % req.type
+            break
+    conn.close()
+
+def run_server(sock, rules):
+    while True:
+        conn, _ = sock.accept()
+        handle_connection(FaultInjectionSocket(conn, rules))
+
+def parse_inject_error(name, options):
+    if 'event' not in options:
+        err('missing \"event\" option in %s' % name)
+    event = options['event']
+    if event not in ('neg1', 'export', 'neg2', 'request', 'reply', 'data'):
+        err('invalid \"event\" option value \"%s\" in %s' % (event, name))
+    io = options.get('io', 'readwrite')
+    if io not in ('read', 'write', 'readwrite'):
+        err('invalid \"io\" option value \"%s\" in %s' % (io, name))
+    when = options.get('when', 'before')
+    if when not in ('before', 'after'):
+        err('invalid \"when\" option value \"%s\" in %s' % (when, name))
+    return Rule(name, event, io, when)
+
+def parse_config(config):
+    rules = []
+    for name in config.sections():
+        if name.startswith('inject-error'):
+            options = dict(config.items(name))
+            rules.append(parse_inject_error(name, options))
+        else:
+            err('invalid config section name: %s' % name)
+    return rules
+
+def load_rules(filename):
+    config = ConfigParser.RawConfigParser()
+    with open(filename, 'rt') as f:
+        config.readfp(f, filename)
+    return parse_config(config)
+
+def open_socket(path):
+    '''Open a TCP or UNIX domain listen socket'''
+    if ':' in path:
+        host, port = path.split(':', 1)
+        sock = socket.socket()
+        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
+        sock.bind((host, int(port)))
+    else:
+        sock = socket.socket(socket.AF_UNIX)
+        sock.bind(path)
+    sock.listen(0)
+    print 'Listening on %s' % path
+    return sock
+
+def usage(args):
+    sys.stderr.write('usage: %s <tcp-port>|<unix-path> <config-file>\n' % args[0])
+    sys.stderr.write('Run an fault injector NBD server with rules defined in a config file.\n')
+    sys.exit(1)
+
+def main(args):
+    if len(args) != 3:
+        usage(args)
+    sock = open_socket(args[1])
+    rules = load_rules(args[2])
+    run_server(sock, rules)
+    return 0
+
+if __name__ == '__main__':
+    sys.exit(main(sys.argv))
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* [Qemu-devel] [PATCH 3/3] qemu-iotests: add 080 NBD client disconnect tests
  2014-02-25 10:09 [Qemu-devel] [PATCH 0/3] nbd: fix issues when connection breaks Stefan Hajnoczi
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 1/3] nbd: close socket if " Stefan Hajnoczi
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 2/3] tests: add nbd-fault-injector.py utility Stefan Hajnoczi
@ 2014-02-25 10:09 ` Stefan Hajnoczi
  2014-02-25 10:16   ` Kevin Wolf
  2 siblings, 1 reply; 8+ messages in thread
From: Stefan Hajnoczi @ 2014-02-25 10:09 UTC (permalink / raw)
  To: qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, bluewindow, nick

This new test case uses nbd-fault-injector.py to simulate broken TCP
connections at each stage in the NBD protocol.  This way we can exercise
block/nbd-client.c's socket error handling code paths.

In particular, this serves as a regression test to make sure
nbd-client.c doesn't cause an infinite loop by leaving its
nbd_receive_reply() fd handler registered after the connection has been
closed.  This bug was fixed in an earlier patch.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
---
 tests/qemu-iotests/080     | 91 ++++++++++++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/080.out | 73 +++++++++++++++++++++++++++++++++++++
 tests/qemu-iotests/group   |  1 +
 3 files changed, 165 insertions(+)
 create mode 100755 tests/qemu-iotests/080
 create mode 100644 tests/qemu-iotests/080.out

diff --git a/tests/qemu-iotests/080 b/tests/qemu-iotests/080
new file mode 100755
index 0000000..21f599b
--- /dev/null
+++ b/tests/qemu-iotests/080
@@ -0,0 +1,91 @@
+#!/bin/bash
+#
+# Test NBD client unexpected disconnect
+#
+# Copyright Red Hat, Inc. 2014
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+#
+
+# creator
+owner=stefanha@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+tmp=/tmp/$$
+status=1	# failure is the default!
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+
+_supported_fmt generic
+_supported_proto nbd
+_supported_os Linux
+
+# Pick a TCP port based on our pid.  This way multiple instances of this test
+# can run in parallel without conflicting.
+choose_tcp_port() {
+	echo $((($$ % 31744) + 1024)) # 1024 <= port < 32768
+}
+
+wait_for_tcp_port() {
+	while ! (netstat --tcp --listening --numeric | \
+		 grep "$1.*0.0.0.0:\*.*LISTEN") 2>&1 >/dev/null; do
+		sleep 0.1
+	done
+}
+
+filter_nbd() {
+	# nbd.c error messages contain function names and line numbers that are prone
+	# to change.  Message ordering depends on timing between send and receive
+	# callbacks sometimes, making them unreliable.
+	#
+	# Filter out the TCP port number since this changes between runs.
+	sed -e 's#^nbd.c:.*##g' \
+	    -e 's#nbd:127.0.0.1:[^:]*:#nbd:127.0.0.1:PORT:#g'
+}
+
+check_disconnect() {
+	event=$1
+	when=$2
+	echo "=== Check disconnect $when $event ==="
+	echo
+
+	port=$(choose_tcp_port)
+
+	cat > "$TEST_DIR/nbd-fault-injector.conf" <<EOF
+[inject-error]
+event=$event
+when=$when
+EOF
+	./nbd-fault-injector.py "127.0.0.1:$port" "$TEST_DIR/nbd-fault-injector.conf" 2>&1 >/dev/null &
+	wait_for_tcp_port "127.0.0.1:$port"
+	$QEMU_IO -c "read 0 512" "nbd:127.0.0.1:$port:exportname=foo" 2>&1 | _filter_qemu_io | filter_nbd
+
+	echo
+}
+
+for event in neg1 "export" neg2 request reply data; do
+	for when in before after; do
+		check_disconnect "$event" "$when"
+	done
+done
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/080.out b/tests/qemu-iotests/080.out
new file mode 100644
index 0000000..9fc5761
--- /dev/null
+++ b/tests/qemu-iotests/080.out
@@ -0,0 +1,73 @@
+QA output created by 080
+=== Check disconnect before neg1 ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not open image: Invalid argument
+no file open, try 'help open'
+
+=== Check disconnect after neg1 ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not open image: Invalid argument
+no file open, try 'help open'
+
+=== Check disconnect before export ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not open image: Invalid argument
+no file open, try 'help open'
+
+=== Check disconnect after export ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not open image: Invalid argument
+no file open, try 'help open'
+
+=== Check disconnect before neg2 ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not open image: Invalid argument
+no file open, try 'help open'
+
+=== Check disconnect after neg2 ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not read image for determining its format: Input/output error
+no file open, try 'help open'
+
+=== Check disconnect before request ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not read image for determining its format: Input/output error
+no file open, try 'help open'
+
+=== Check disconnect after request ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not read image for determining its format: Input/output error
+no file open, try 'help open'
+
+=== Check disconnect before reply ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not read image for determining its format: Input/output error
+no file open, try 'help open'
+
+=== Check disconnect after reply ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not read image for determining its format: Input/output error
+no file open, try 'help open'
+
+=== Check disconnect before data ===
+
+
+qemu-io: can't open device nbd:127.0.0.1:PORT:exportname=foo: Could not read image for determining its format: Input/output error
+no file open, try 'help open'
+
+=== Check disconnect after data ===
+
+
+read failed: Input/output error
+
+*** done
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index d8be74a..2c2a3f3 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -83,3 +83,4 @@
 074 rw auto
 077 rw auto
 079 rw auto
+080 rw auto
-- 
1.8.5.3

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qemu-iotests: add 080 NBD client disconnect tests
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 3/3] qemu-iotests: add 080 NBD client disconnect tests Stefan Hajnoczi
@ 2014-02-25 10:16   ` Kevin Wolf
  2014-02-25 11:16     ` Stefan Hajnoczi
  0 siblings, 1 reply; 8+ messages in thread
From: Kevin Wolf @ 2014-02-25 10:16 UTC (permalink / raw)
  To: Stefan Hajnoczi; +Cc: Paolo Bonzini, nick, qemu-devel, bluewindow

Am 25.02.2014 um 11:09 hat Stefan Hajnoczi geschrieben:
> This new test case uses nbd-fault-injector.py to simulate broken TCP
> connections at each stage in the NBD protocol.  This way we can exercise
> block/nbd-client.c's socket error handling code paths.
> 
> In particular, this serves as a regression test to make sure
> nbd-client.c doesn't cause an infinite loop by leaving its
> nbd_receive_reply() fd handler registered after the connection has been
> closed.  This bug was fixed in an earlier patch.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tests/qemu-iotests/080     | 91 ++++++++++++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/080.out | 73 +++++++++++++++++++++++++++++++++++++
>  tests/qemu-iotests/group   |  1 +

083 is the next free one, afaik. (081 and 082 are used by the pull
request I sent on Friday, and I think 080 was reserved for some series,
but I don't remember who it was.)

Kevin

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 3/3] qemu-iotests: add 080 NBD client disconnect tests
  2014-02-25 10:16   ` Kevin Wolf
@ 2014-02-25 11:16     ` Stefan Hajnoczi
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2014-02-25 11:16 UTC (permalink / raw)
  To: Kevin Wolf; +Cc: Paolo Bonzini, nick, qemu-devel, bluewindow

On Tue, Feb 25, 2014 at 11:16:16AM +0100, Kevin Wolf wrote:
> Am 25.02.2014 um 11:09 hat Stefan Hajnoczi geschrieben:
> > This new test case uses nbd-fault-injector.py to simulate broken TCP
> > connections at each stage in the NBD protocol.  This way we can exercise
> > block/nbd-client.c's socket error handling code paths.
> > 
> > In particular, this serves as a regression test to make sure
> > nbd-client.c doesn't cause an infinite loop by leaving its
> > nbd_receive_reply() fd handler registered after the connection has been
> > closed.  This bug was fixed in an earlier patch.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> >  tests/qemu-iotests/080     | 91 ++++++++++++++++++++++++++++++++++++++++++++++
> >  tests/qemu-iotests/080.out | 73 +++++++++++++++++++++++++++++++++++++
> >  tests/qemu-iotests/group   |  1 +
> 
> 083 is the next free one, afaik. (081 and 082 are used by the pull
> request I sent on Friday, and I think 080 was reserved for some series,
> but I don't remember who it was.)

Thanks, will fix in v2.

Stefan

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] tests: add nbd-fault-injector.py utility
  2014-02-25 10:09 ` [Qemu-devel] [PATCH 2/3] tests: add nbd-fault-injector.py utility Stefan Hajnoczi
@ 2014-02-25 11:45   ` Nick Thomas
  2014-02-25 16:25     ` Stefan Hajnoczi
  0 siblings, 1 reply; 8+ messages in thread
From: Nick Thomas @ 2014-02-25 11:45 UTC (permalink / raw)
  To: Stefan Hajnoczi, qemu-devel; +Cc: Kevin Wolf, Paolo Bonzini, bluewindow

Hi,

On 25/02/14 10:09, Stefan Hajnoczi wrote:
> The nbd-fault-injector.py script is a special kind of NBD server.  It
> throws away all writes and produces zeroes for reads.  Given a list of
> fault injection rules, it can simulate NBD protocol errors and is useful
> for testing NBD client error handling code paths.
> 
> See the patch for documentation.  This scripts is modelled after Kevin
> Wolf <kwolf@redhat.com>'s blkdebug block driver.
> 
> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> ---
>  tests/qemu-iotests/nbd-fault-injector.py | 231 +++++++++++++++++++++++++++++++
>  1 file changed, 231 insertions(+)
>  create mode 100755 tests/qemu-iotests/nbd-fault-injector.py
> 
> diff --git a/tests/qemu-iotests/nbd-fault-injector.py b/tests/qemu-iotests/nbd-fault-injector.py
> new file mode 100755
> index 0000000..b4011f4
> --- /dev/null
> +++ b/tests/qemu-iotests/nbd-fault-injector.py
> @@ -0,0 +1,231 @@
> +#!/usr/bin/env python
> +# NBD server - fault injection utility
> +#
> +# Configuration file syntax:
> +#   [inject-error "disconnect-neg1"]
> +#   event=neg1
> +#   io=readwrite
> +#   when=before
> +#
> +# Note that Python's ConfigParser squashes together all sections with the same
> +# name, so give each [inject-error] a unique name.
> +#
> +# inject-error options:
> +#   event - name of the trigger event
> +#           "neg1" - first part of negotiation struct
> +#           "export" - export struct
> +#           "neg2" - second part of negotiation struct
> +#           "request" - NBD request struct
> +#           "reply" - NBD reply struct
> +#           "data" - request/reply data
> +#   io    - I/O direction that triggers this rule:
> +#           "read", "write", or "readwrite"
> +#           default: readwrite
> +#   when  - when to inject the fault relative to the I/O operation
> +#           "before" or "after"
> +#           default: before
> +#
> +# Currently the only error injection action is to terminate the server process.
> +# This resets the TCP connection and thus forces the client to handle
> +# unexpected connection termination.
> +#
> +# Other error injection actions could be added in the future.
> +#
> +# Copyright Red Hat, Inc. 2014
> +#
> +# Authors:
> +#   Stefan Hajnoczi <stefanha@redhat.com>
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or later.
> +# See the COPYING file in the top-level directory.
> +
> +import sys
> +import socket
> +import struct
> +import collections
> +import ConfigParser
> +
> +# Protocol constants
> +NBD_CMD_READ = 0
> +NBD_CMD_WRITE = 1
> +NBD_CMD_DISC = 2
> +NBD_REQUEST_MAGIC = 0x25609513
> +NBD_REPLY_MAGIC = 0x67446698
> +NBD_PASSWD = 0x4e42444d41474943
> +NBD_OPTS_MAGIC = 0x49484156454F5054
> +NBD_OPT_EXPORT_NAME = 1 << 0
> +
> +# Protocol structs
> +neg1_struct = struct.Struct('>QQH')
> +export_tuple = collections.namedtuple('Export', 'reserved magic opt len')
> +export_struct = struct.Struct('>IQII')
> +neg2_struct = struct.Struct('>QH124x')
> +request_tuple = collections.namedtuple('Request', 'magic type handle from_ len')
> +request_struct = struct.Struct('>IIQQI')
> +reply_struct = struct.Struct('>IIQ')
> +
> +def err(msg):
> +    sys.stderr.write(msg + '\n')
> +    sys.exit(1)
> +
> +def recvall(sock, bufsize):
> +    received = 0
> +    chunks = []
> +    while received < bufsize:
> +        chunk = sock.recv(bufsize - received)
> +        if len(chunk) == 0:
> +            raise Exception('unexpected disconnect')
> +        chunks.append(chunk)
> +        received += len(chunk)
> +    return ''.join(chunks)
> +
> +class Rule(object):
> +    def __init__(self, name, event, io, when):
> +        self.name = name
> +        self.event = event
> +        self.io = io
> +        self.when = when
> +
> +    def match(self, event, io, when):
> +        if when != self.when:
> +            return False
> +        if event != self.event:
> +            return False
> +        if io != self.io and self.io != 'readwrite':
> +            return False
> +        return True
> +
> +class FaultInjectionSocket(object):
> +    def __init__(self, sock, rules):
> +        self.sock = sock
> +        self.rules = rules
> +
> +    def check(self, event, io, when):
> +        for rule in self.rules:
> +            if rule.match(event, io, when):
> +                print 'Closing connection on rule match %s' % rule.name
> +                sys.exit(0)
> +
> +    def send(self, buf, event):
> +        self.check(event, 'write', 'before')
> +        self.sock.sendall(buf)
> +        self.check(event, 'write', 'after')
> +
> +    def recv(self, bufsize, event):
> +        self.check(event, 'read', 'before')
> +        data = recvall(self.sock, bufsize)
> +        self.check(event, 'read', 'after')
> +        return data

There's a class of error I recently encountered in our out-of-tree proxy
component that only shows up if a read or write is interrupted partway
through. Perhaps you could have a "during" event here that happens after
bufsize/2 bytes is written?

I've not looked at qemu's block/nbd code recently, so I don't know if
that exercises a particular failure path.

> +    def close(self):
> +        self.sock.close()
> +
> +def negotiate(conn):
> +    '''Negotiate export with client'''
> +    # Send negotiation part 1
> +    buf = neg1_struct.pack(NBD_PASSWD, NBD_OPTS_MAGIC, 0)
> +    conn.send(buf, event='neg1')
> +
> +    # Receive export option
> +    buf = conn.recv(export_struct.size, event='export')
> +    export = export_tuple._make(export_struct.unpack(buf))
> +    assert export.magic == NBD_OPTS_MAGIC
> +    assert export.opt == NBD_OPT_EXPORT_NAME
> +    name = conn.recv(export.len, event='export-name')
> +
> +    # Send negotiation part 2
> +    buf = neg2_struct.pack(8 * 1024 * 1024 * 1024, 0) # 8 GB capacity
> +    conn.send(buf, event='neg2')

Is it worth exercising the old-style negotiation too?

> +def read_request(conn):
> +    '''Parse NBD request from client'''
> +    buf = conn.recv(request_struct.size, event='request')
> +    req = request_tuple._make(request_struct.unpack(buf))
> +    assert req.magic == NBD_REQUEST_MAGIC
> +    return req
> +
> +def write_reply(conn, error, handle):
> +    buf = reply_struct.pack(NBD_REPLY_MAGIC, error, handle)
> +    conn.send(buf, event='reply')
> +
> +def handle_connection(conn):
> +    negotiate(conn)
> +    while True:
> +        req = read_request(conn)
> +        if req.type == NBD_CMD_READ:
> +            write_reply(conn, 0, req.handle)
> +            conn.send('\0' * req.len, event='data')
> +        elif req.type == NBD_CMD_WRITE:
> +            _ = conn.recv(req.len, event='data')
> +            write_reply(conn, 0, req.handle)
> +        elif req.type == NBD_CMD_DISC:
> +            break
> +        else:
> +            print 'unrecognized command type %#02x' % req.type
> +            break
> +    conn.close()
> +
> +def run_server(sock, rules):
> +    while True:
> +        conn, _ = sock.accept()
> +        handle_connection(FaultInjectionSocket(conn, rules))
> +
> +def parse_inject_error(name, options):
> +    if 'event' not in options:
> +        err('missing \"event\" option in %s' % name)
> +    event = options['event']
> +    if event not in ('neg1', 'export', 'neg2', 'request', 'reply', 'data'):
> +        err('invalid \"event\" option value \"%s\" in %s' % (event, name))
> +    io = options.get('io', 'readwrite')
> +    if io not in ('read', 'write', 'readwrite'):
> +        err('invalid \"io\" option value \"%s\" in %s' % (io, name))
> +    when = options.get('when', 'before')
> +    if when not in ('before', 'after'):
> +        err('invalid \"when\" option value \"%s\" in %s' % (when, name))
> +    return Rule(name, event, io, when)
> +
> +def parse_config(config):
> +    rules = []
> +    for name in config.sections():
> +        if name.startswith('inject-error'):
> +            options = dict(config.items(name))
> +            rules.append(parse_inject_error(name, options))
> +        else:
> +            err('invalid config section name: %s' % name)
> +    return rules
> +
> +def load_rules(filename):
> +    config = ConfigParser.RawConfigParser()
> +    with open(filename, 'rt') as f:
> +        config.readfp(f, filename)
> +    return parse_config(config)
> +
> +def open_socket(path):
> +    '''Open a TCP or UNIX domain listen socket'''
> +    if ':' in path:
> +        host, port = path.split(':', 1)
> +        sock = socket.socket()
> +        sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
> +        sock.bind((host, int(port)))
> +    else:
> +        sock = socket.socket(socket.AF_UNIX)
> +        sock.bind(path)
> +    sock.listen(0)
> +    print 'Listening on %s' % path
> +    return sock
> +
> +def usage(args):
> +    sys.stderr.write('usage: %s <tcp-port>|<unix-path> <config-file>\n' % args[0])
> +    sys.stderr.write('Run an fault injector NBD server with rules defined in a config file.\n')
> +    sys.exit(1)
> +
> +def main(args):
> +    if len(args) != 3:
> +        usage(args)
> +    sock = open_socket(args[1])
> +    rules = load_rules(args[2])
> +    run_server(sock, rules)
> +    return 0
> +
> +if __name__ == '__main__':
> +    sys.exit(main(sys.argv))
> 

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [Qemu-devel] [PATCH 2/3] tests: add nbd-fault-injector.py utility
  2014-02-25 11:45   ` Nick Thomas
@ 2014-02-25 16:25     ` Stefan Hajnoczi
  0 siblings, 0 replies; 8+ messages in thread
From: Stefan Hajnoczi @ 2014-02-25 16:25 UTC (permalink / raw)
  To: Nick Thomas
  Cc: Kevin Wolf, Paolo Bonzini, qemu-devel, Stefan Hajnoczi, bluewindow

On Tue, Feb 25, 2014 at 11:45:11AM +0000, Nick Thomas wrote:
> On 25/02/14 10:09, Stefan Hajnoczi wrote:
> > +    def send(self, buf, event):
> > +        self.check(event, 'write', 'before')
> > +        self.sock.sendall(buf)
> > +        self.check(event, 'write', 'after')
> > +
> > +    def recv(self, bufsize, event):
> > +        self.check(event, 'read', 'before')
> > +        data = recvall(self.sock, bufsize)
> > +        self.check(event, 'read', 'after')
> > +        return data
> 
> There's a class of error I recently encountered in our out-of-tree proxy
> component that only shows up if a read or write is interrupted partway
> through. Perhaps you could have a "during" event here that happens after
> bufsize/2 bytes is written?
> 
> I've not looked at qemu's block/nbd code recently, so I don't know if
> that exercises a particular failure path.

Yes, it can involve different code paths in the client.  For example,
the client may receive fields in separate recv(2) syscalls so there may
be a unique path for each field.

I think the easiest approach is to turn the 'when' option into a byte
count.  The connection will be terminated after the given number of
bytes.

Then 'before' becomes an alias for 0.  'after' becomes an alias for -1,
the magic value for the entire data length.

> > +    def close(self):
> > +        self.sock.close()
> > +
> > +def negotiate(conn):
> > +    '''Negotiate export with client'''
> > +    # Send negotiation part 1
> > +    buf = neg1_struct.pack(NBD_PASSWD, NBD_OPTS_MAGIC, 0)
> > +    conn.send(buf, event='neg1')
> > +
> > +    # Receive export option
> > +    buf = conn.recv(export_struct.size, event='export')
> > +    export = export_tuple._make(export_struct.unpack(buf))
> > +    assert export.magic == NBD_OPTS_MAGIC
> > +    assert export.opt == NBD_OPT_EXPORT_NAME
> > +    name = conn.recv(export.len, event='export-name')
> > +
> > +    # Send negotiation part 2
> > +    buf = neg2_struct.pack(8 * 1024 * 1024 * 1024, 0) # 8 GB capacity
> > +    conn.send(buf, event='neg2')
> 
> Is it worth exercising the old-style negotiation too?

Yes, probably a good idea.

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2014-02-25 16:26 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-02-25 10:09 [Qemu-devel] [PATCH 0/3] nbd: fix issues when connection breaks Stefan Hajnoczi
2014-02-25 10:09 ` [Qemu-devel] [PATCH 1/3] nbd: close socket if " Stefan Hajnoczi
2014-02-25 10:09 ` [Qemu-devel] [PATCH 2/3] tests: add nbd-fault-injector.py utility Stefan Hajnoczi
2014-02-25 11:45   ` Nick Thomas
2014-02-25 16:25     ` Stefan Hajnoczi
2014-02-25 10:09 ` [Qemu-devel] [PATCH 3/3] qemu-iotests: add 080 NBD client disconnect tests Stefan Hajnoczi
2014-02-25 10:16   ` Kevin Wolf
2014-02-25 11:16     ` Stefan Hajnoczi

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.