* [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
From: Alexandre DERUMIER @ 2018-02-14 8:54 UTC (permalink / raw)
To: qemu-devel
Hi,
I currently have mirroring jobs to NBD failing when multiple jobs are running in parallel.
Steps to reproduce, with 2 disks:
1) Launch a mirroring job of the first disk to a remote NBD target (exported by the running target QEMU).
2) Wait until it reaches ready = 1; do not complete it.
3) Launch a mirroring job of the second disk to a remote NBD target (exported by the same running target QEMU).
-> The second disk's mirroring job is running (ready=0); the first disk's job is still at ready=1 and still mirroring incoming writes.
Then, after some time, mainly if no new writes are coming to the first disk (around 30-40 s), the first job fails with an input/output error.
Note that I don't have a network or disk problem; I'm able to mirror both disks individually.
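The reproduction steps can be sketched as the QMP traffic a management layer would send. This is a minimal sketch, not Proxmox's code: `drive_mirror_cmd` and `job_ready` are hypothetical helpers, and the device names, host, port and export names are placeholders. The `drive-mirror` command, its `device`/`target`/`format`/`sync`/`mode` arguments, and the `ready` field reported by `query-block-jobs` are standard QMP.

```python
import json

# Hypothetical helpers; only the QMP command and field names are real.
def drive_mirror_cmd(device: str, host: str, port: int, export: str) -> str:
    """Build a QMP drive-mirror command whose target is a remote NBD export."""
    return json.dumps({
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            # Legacy nbd:<host>:<port>:exportname=<name> target syntax.
            "target": f"nbd:{host}:{port}:exportname={export}",
            "format": "raw",
            "sync": "full",
            # The target QEMU already exports the disk, so don't create it.
            "mode": "existing",
        },
    })


def job_ready(query_block_jobs_reply: dict, device: str) -> bool:
    """True once the job for `device` in a query-block-jobs reply is ready."""
    for job in query_block_jobs_reply.get("return", []):
        if job.get("device") == device:
            return bool(job.get("ready", False))
    return False


# Steps 1 and 3: one job per disk, both aimed at the same target QEMU.
for disk in ("drive-scsi0", "drive-scsi1"):
    print(drive_mirror_cmd(disk, "10.0.0.2", 10809, disk))

# State after step 3: first job ready, second still copying.
reply = {"return": [{"device": "drive-scsi0", "ready": True},
                    {"device": "drive-scsi1", "ready": False}]}
print(job_ready(reply, "drive-scsi0"), job_ready(reply, "drive-scsi1"))
```

Step 2 corresponds to polling `query-block-jobs` until `job_ready` returns True for the first disk, without sending `block-job-complete`.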
There is another similar bug report on the Proxmox Bugzilla:
https://bugzilla.proxmox.com/show_bug.cgi?id=1664
Maybe it is related to this?
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03086.html
I don't remember having this problem with QEMU 2.7, but I'm able to reproduce it with QEMU 2.9 and QEMU 2.11.
Best Regards,
Alexandre
* Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
From: Alexandre DERUMIER @ 2018-02-14 9:45 UTC (permalink / raw)
To: qemu-devel
Sorry, I just found that the problem is in our Proxmox implementation:
we use a socat tunnel for the NBD mirroring, with a 30 s inactivity timeout.
So, not a QEMU bug.
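The failure can be modelled as a relay that closes its connection after a fixed period with no traffic (for socat, that is what the `-T` inactivity timeout does; that the Proxmox tunnel uses exactly that option is an assumption here). A mirror job already at ready=1 only sends data when the guest writes, so an idle first disk lets the tunnel expire, which fits the observed 30-40 s delay. `idle_teardown_time` is a hypothetical helper for illustration, not part of socat or QEMU.

```python
from typing import Iterable, Optional


def idle_teardown_time(traffic: Iterable[float], timeout: float,
                       end: float, start: float = 0.0) -> Optional[float]:
    """Time at which an inactivity-timeout relay would drop the tunnel.

    `traffic` lists the moments (in seconds) when data crosses the tunnel.
    A gap of at least `timeout` seconds without traffic closes it.
    Returns None if the tunnel is still open at time `end`.
    """
    last = start
    for t in sorted(traffic):
        if t - last >= timeout:
            return last + timeout  # relay closed before this data arrived
        last = t
    if end - last >= timeout:
        return last + timeout
    return None


# First disk at ready=1: a few guest writes, then silence -> closed at t=42.
print(idle_teardown_time([0, 5, 12], timeout=30, end=60))
# A disk with a write every 10 s keeps the tunnel open -> None.
print(idle_teardown_time(range(0, 61, 10), timeout=30, end=60))
```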
Regards,
Alexandre
----- Original Message -----
From: "aderumier" <aderumier@odiso.com>
To: "qemu-devel" <qemu-devel@nongnu.org>
Sent: Wednesday, 14 February 2018 09:54:21
Subject: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
* Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
From: Eric Blake @ 2018-02-14 15:11 UTC (permalink / raw)
To: Alexandre DERUMIER, qemu-devel, nbd list
[adding nbd list]
On 02/14/2018 03:45 AM, Alexandre DERUMIER wrote:
> Sorry, I just found that the problem is in our Proxmox implementation,
>
> as we use a socat tunnel for the NBD mirroring, with a 30 s inactivity timeout.
>
> So, not a QEMU bug.
Good to hear. Still, it makes me wonder if the NBD protocol itself
should have some sort of a keepalive mechanism, maybe a new NBD_CMD_PING
that can be used as a no-op command to keep the line alive if there is
no other command to send for a while? A client can always use a
throwaway NBD_CMD_READ to keep the line alive, but that has more
overhead; conversely, an extension is only useful if both client and
server can negotiate to use it, which means that clients still have to
be prepared for alternative fallbacks if they want to keep the line
alive. And we still don't have support for the server ever sending
unsolicited messages (other than perhaps a structured reply where the
server sends periodic reply chunks but never a final chunk, but even
then it is the client that initiates that sequence of server replies),
so while the client can keep the line to the server up, having the
server keep the line open to the client is a bit harder.
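The fallback of a throwaway NBD_CMD_READ can be sketched using the request layout from the NBD protocol specification: a 28-byte big-endian header carrying the request magic 0x25609513, 16-bit command flags, a 16-bit command type (NBD_CMD_READ is 0), a 64-bit handle, a 64-bit offset and a 32-bit length. NBD_CMD_PING is only a proposal here, so it is not encoded. The helper names, the 512-byte length and the idle interval are hypothetical choices; a real client would also have to read and discard the reply and its data, and keep the handle unique among in-flight requests.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513
NBD_CMD_READ = 0


def nbd_read_request(handle: int, offset: int = 0, length: int = 512) -> bytes:
    """Encode a transmission-phase NBD_CMD_READ request (hypothetical helper)."""
    # '>' = network byte order: magic(I), flags(H), type(H),
    # handle(Q), offset(Q), length(I) -> 4 + 2 + 2 + 8 + 8 + 4 = 28 bytes.
    return struct.pack(">IHHQQI", NBD_REQUEST_MAGIC, 0, NBD_CMD_READ,
                       handle, offset, length)


def keepalive_due(last_activity: float, now: float, interval: float) -> bool:
    """Hypothetical client policy: send a throwaway read after `interval` idle seconds.

    To survive a tunnel's inactivity timeout, `interval` must be below it.
    """
    return now - last_activity >= interval


req = nbd_read_request(handle=7)
print(len(req), req[:4].hex())
```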
This is more food for thought on whether it even makes sense for NBD to
worry about assisting in keepalive matters, or whether it would just be
bloating the protocol.
--
Eric Blake, Principal Software Engineer
Red Hat, Inc. +1-919-301-3266
Virtualization: qemu.org | libvirt.org
* Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
From: Wouter Verhelst @ 2018-02-15 9:42 UTC (permalink / raw)
To: Eric Blake; +Cc: Alexandre DERUMIER, qemu-devel, nbd list
Hi Eric,
On Wed, Feb 14, 2018 at 09:11:02AM -0600, Eric Blake wrote:
[NBD and keepalive]
> This is more food for thought on whether it even makes sense for NBD to
> worry about assisting in keepalive matters, or whether it would just be
> bloating the protocol.
I'm currently leaning towards the latter. I don't think it makes (much)
sense to run NBD over an unreliable transport. It uses TCP specifically
to not have to worry about that, under the expectation that it won't
break except in unusual circumstances; if you break that expectation, I
think it's not unfair to say "well, then you get to keep both pieces".
We already set the SO_KEEPALIVE socket option (at least nbd-server does;
don't know about qemu) to make the kernel send out TCP-level keepalive
probes. This happens only after two hours (by default), but it's
something you can configure on your system if you need it to be lower.
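Enabling SO_KEEPALIVE and lowering its timers per socket might look like this in Python. On Linux, the two-hour default corresponds to the `net.ipv4.tcp_keepalive_time` sysctl (7200 s), and `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` override the system-wide values for one socket. The helper name and the 20/5/3 values are hypothetical. One caveat: keepalive probes are generated by the kernel and carry no application data, so they would likely not count as activity for a relay whose inactivity timer watches data transfer, such as a socat tunnel with a timeout. They mainly detect dead peers and can keep NAT or firewall state from expiring.

```python
import socket


def enable_tcp_keepalive(sock: socket.socket, idle: int = 20,
                         interval: int = 5, count: int = 3) -> None:
    """Turn on TCP keepalive and, where supported, shorten its timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket overrides of the system-wide keepalive settings; skipped
    # on platforms whose socket module lacks these constants.
    if hasattr(socket, "TCP_KEEPIDLE"):    # idle seconds before first probe
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):   # seconds between probes
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):     # unanswered probes before reset
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
enable_tcp_keepalive(s)
print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
s.close()
```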
Having said that, I can always be convinced otherwise by good arguments
:-)
--
Could you people please use IRC like normal people?!?
-- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008 Hacklab
* Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
From: Alex Bligh @ 2018-02-16 10:19 UTC (permalink / raw)
To: Wouter Verhelst
Cc: Alex Bligh, Eric Blake, qemu-devel, Alexandre Derumier, nbd list
> On 15 Feb 2018, at 04:42, Wouter Verhelst <w@uter.be> wrote:
>
>
> We already set the SO_KEEPALIVE socket option (at least nbd-server does;
> don't know about qemu) to make the kernel send out TCP-level keepalive
> probes. This happens only after two hours (by default), but it's
> something you can configure on your system if you need it to be lower.
+1 for just using SO_KEEPALIVE. I think I even submitted some (untested
and thus unmerged) patches for this.
--
Alex Bligh