On 7/31/18 12:30 PM, Vladimir Sementsov-Ogievskiy wrote: > Reconnect will be implemented in the following commit, so for now, > in semantics below, disconnect itself is a "serious error". > > Signed-off-by: Vladimir Sementsov-Ogievskiy > --- > qapi/block-core.json | 12 +++++++++++- > block/nbd-client.h | 1 + > block/nbd-client.c | 1 + > block/nbd.c | 16 +++++++++++++++- > 4 files changed, 28 insertions(+), 2 deletions(-) > > diff --git a/qapi/block-core.json b/qapi/block-core.json > index 5b9084a394..cf03402ec5 100644 > --- a/qapi/block-core.json > +++ b/qapi/block-core.json > @@ -3511,13 +3511,23 @@ > # traditional "base:allocation" block status (see > # NBD_OPT_LIST_META_CONTEXT in the NBD protocol) (since 3.0) > # > +# @reconnect-delay: Reconnect delay. On disconnect, nbd client tries to connect Maybe 'On unexpected disconnect', since intentional disconnect is not unexpected. > +# again, until success or serious error. During first > +# @reconnect-delay seconds of reconnecting loop all requests > +# are paused and have a chance to rerun, if successful > +# connect occures during this time. After @reconnect-delay occurs > +# seconds all delayed requests are failed and all following > +# requests will be failed to (until successfull reconnect). successful > +# Default 300 seconds (Since 3.1) My delay in reviewing means this now has to be 4.0. I'm guessing that a delay of 0 means disable auto-reconnect. From a backwards-compatibility standpoint, no auto-reconnect is more in line with what we previously had - but from a usability standpoint, trying to reconnect can avoid turning transient network hiccups into permanent loss of a device to EIO errors, especially if the retry timeout is long enough to allow an administrator to reroute the network to an alternative server. So I'm probably okay with the default being non-zero - but it DOES mean that where you used to get instant EIO failures when a network connection was severed, you now have to wait for the reconnect delay to expire, and 5 minutes can be a long wait. Since the long delay is guest-observable, can we run into issues where a guest that is currently used to instant EIO and total loss of the device could instead get confused by not getting any response for up to 5 minutes, whether or not that response eventually turns out to be EIO or a successful recovery? > +++ b/block/nbd.c > @@ -360,6 +360,18 @@ static QemuOptsList nbd_runtime_opts = { > .help = "experimental: expose named dirty bitmap in place of " > "block status", > }, > + { > + .name = "reconnect-delay", > + .type = QEMU_OPT_NUMBER, > + .help = "Reconnect delay. On disconnect, nbd client tries to" > + "connect again, until success or serious error. During" > + "first @reconnect-delay seconds of reconnecting loop all" > + "requests are paused and have a chance to rerun, if" > + "successful connect occures during this time. After" > + "@reconnect-delay seconds all delayed requests are failed" > + "and all following requests will be failed to (until" > + "successfull reconnect). Default 300 seconds", Same typos as in qapi. The UI aspects look fine, now I need to review the patch series for code issues :) -- Eric Blake, Principal Software Engineer Red Hat, Inc. +1-919-301-3226 Virtualization: qemu.org | libvirt.org