* [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
@ 2018-02-14  8:54 Alexandre DERUMIER
  2018-02-14  9:45 ` Alexandre DERUMIER
  0 siblings, 1 reply; 5+ messages in thread
From: Alexandre DERUMIER @ 2018-02-14  8:54 UTC (permalink / raw)
  To: qemu-devel

Hi,

I currently have failing mirroring jobs to nbd, when multiple jobs are running in parallel.


Steps to reproduce, with 2 disks:


1) Launch a mirroring job for the first disk to a remote NBD target (served by the running target qemu).
2) Wait until it reaches ready=1; do not complete it.
3) Launch a mirroring job for the second disk to a remote NBD target (served by the same target qemu).

-> The mirroring job for the second disk is now running (ready=0); the first disk is still at ready=1 and still mirroring incoming writes.
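
As a rough illustration of the steps above, the QMP messages could be built as in the sketch below. It is not the exact sequence used here: the device name, host, port and export name are hypothetical placeholders, and only the QMP commands drive-mirror and query-block-jobs (with its "ready" field) are taken as given.

```python
import json


def mirror_command(device, host, port, export):
    """Build a QMP drive-mirror command that targets a remote NBD export.

    All argument values are placeholders; adapt them to the actual setup.
    """
    return {
        "execute": "drive-mirror",
        "arguments": {
            "device": device,
            # Legacy qemu NBD target syntax: nbd:host:port:exportname=name
            "target": f"nbd:{host}:{port}:exportname={export}",
            "sync": "full",
            # The export already exists on the target qemu, so do not create it.
            "mode": "existing",
        },
    }


def is_ready(query_block_jobs_reply, device):
    """Return True if the job for `device` reports ready in a query-block-jobs reply."""
    for job in query_block_jobs_reply.get("return", []):
        if job.get("device") == device:
            return bool(job.get("ready"))
    return False


if __name__ == "__main__":
    # Step 1: first disk; step 3: second disk, same target qemu (hypothetical names).
    for dev in ("drive-scsi0", "drive-scsi1"):
        print(json.dumps(mirror_command(dev, "192.0.2.10", 10809, dev)))
```

In this sketch, step 2 corresponds to polling query-block-jobs until is_ready() returns True for the first device, without sending block-job-complete.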


Then, after some time (around 30-40s), mainly if no new writes are coming to the first disk, the first job fails with an input/output error.



Note that I don't have any network or disk problems; I'm able to mirror both disks individually.


Another similar bug report on proxmox bugzilla:

https://bugzilla.proxmox.com/show_bug.cgi?id=1664


Maybe related to this?
https://lists.gnu.org/archive/html/qemu-devel/2017-11/msg03086.html


I don't remember having this problem with qemu 2.7, but I'm able to reproduce it with qemu 2.9 and qemu 2.11.


Best Regards,

Alexandre


* Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
  2018-02-14  8:54 [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11) Alexandre DERUMIER
@ 2018-02-14  9:45 ` Alexandre DERUMIER
  2018-02-14 15:11   ` Eric Blake
  0 siblings, 1 reply; 5+ messages in thread
From: Alexandre DERUMIER @ 2018-02-14  9:45 UTC (permalink / raw)
  To: qemu-devel

Sorry, I just found that the problem is in our Proxmox implementation:

we use a socat tunnel for the NBD mirroring, with a timeout of 30s in case of inactivity.

So, not a qemu bug.

Regards,

Alexandre


* Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
  2018-02-14  9:45 ` Alexandre DERUMIER
@ 2018-02-14 15:11   ` Eric Blake
  2018-02-15  9:42     ` Wouter Verhelst
  0 siblings, 1 reply; 5+ messages in thread
From: Eric Blake @ 2018-02-14 15:11 UTC (permalink / raw)
  To: Alexandre DERUMIER, qemu-devel, nbd list

[adding nbd list]

On 02/14/2018 03:45 AM, Alexandre DERUMIER wrote:
> Sorry, I just found that the problem is in our Proxmox implementation:
> 
> we use a socat tunnel for the NBD mirroring, with a timeout of 30s in case of inactivity.
> 
> So, not a qemu bug.

Good to hear. Still, it makes me wonder if the NBD protocol itself 
should have some sort of a keepalive mechanism, maybe a new NBD_CMD_PING 
that can be used as a no-op command to keep the line alive if there is 
no other command to send for a while?  A client can always use a 
throwaway NBD_CMD_READ to keep the line alive, but that has more 
overhead; conversely, an extension is only useful if both client and 
server can negotiate to use it, which means that clients still have to 
be prepared for alternative fallbacks if they want to keep the line 
alive.  And we still don't have support for the server ever sending 
unsolicited messages (other than perhaps a structured reply where the 
server sends periodic reply chunks but never sends a final chunk, though 
even then it is the client that initiates the sequence of server 
replies), so while the client can keep the line to the server up, having 
the server keep the line open to the client is a bit harder.
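
To make the "throwaway NBD_CMD_READ" idea concrete, here is a minimal sketch of how a client might build such a request. It assumes the plain (non-extended) NBD request header layout; the helper name, the 512-byte length and the handle value are illustrative choices, not something any existing client is known to do.

```python
import struct

NBD_REQUEST_MAGIC = 0x25609513  # magic number that starts every NBD request
NBD_CMD_READ = 0                # command type for a read


def keepalive_read_request(handle, offset=0, length=512):
    """Pack a small NBD_CMD_READ request to use as a keepalive.

    Header layout, big-endian: magic (u32), command flags (u16),
    type (u16), handle (u64), offset (u64), length (u32).
    """
    return struct.pack(
        ">IHHQQI",
        NBD_REQUEST_MAGIC,
        0,              # no command flags
        NBD_CMD_READ,
        handle,
        offset,
        length,
    )


if __name__ == "__main__":
    request = keepalive_read_request(handle=1)
    print(len(request), request.hex())
```

The overhead mentioned above is visible here: besides the 28-byte request, the server has to send back a reply header plus `length` bytes of data, which the client must read and discard.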

This is more food for thought on whether it even makes sense for NBD to 
worry about assisting in keepalive matters, or whether it would just be 
bloating the protocol.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


* Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
  2018-02-14 15:11   ` Eric Blake
@ 2018-02-15  9:42     ` Wouter Verhelst
  2018-02-16 10:19       ` Alex Bligh
  0 siblings, 1 reply; 5+ messages in thread
From: Wouter Verhelst @ 2018-02-15  9:42 UTC (permalink / raw)
  To: Eric Blake; +Cc: Alexandre DERUMIER, qemu-devel, nbd list

Hi Eric,

On Wed, Feb 14, 2018 at 09:11:02AM -0600, Eric Blake wrote:
[NBD and keepalive]
> This is more food for thought on whether it even makes sense for NBD to
> worry about assisting in keepalive matters, or whether it would just be
> bloating the protocol.

I'm currently leaning towards the latter. I don't think it makes (much)
sense to run NBD over an unreliable transport. It uses TCP specifically
to not have to worry about that, under the expectation that it won't
break except in unusual circumstances; if you break that expectation, I
think it's not unfair to say "well, then you get to keep both pieces".

We already set the SO_KEEPALIVE socket option (at least nbd-server does;
don't know about qemu) to make the kernel send out TCP-level keepalive
probes. This happens only after two hours (by default), but it's
something you can configure on your system if you need it to be lower.
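
For illustration, a program can also lower the keepalive timing per socket instead of system-wide. The sketch below assumes a platform that exposes the TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT socket options (as Linux does); the function name and the timing values are arbitrary examples, not what nbd-server or qemu actually use.

```python
import socket


def enable_keepalive(sock, idle=20, interval=5, count=3):
    """Enable TCP keepalive on `sock` and, where supported, tune its timing.

    idle:     seconds of silence before the first probe is sent
    interval: seconds between unanswered probes
    count:    unanswered probes before the connection is dropped
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The per-socket tuning options are not available everywhere, so guard them.
    if hasattr(socket, "TCP_KEEPIDLE"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, count)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
    s.close()
```

Without the per-socket options, SO_KEEPALIVE alone falls back to the system defaults, which is where the two-hour figure comes from.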

Having said that, I can always be convinced otherwise by good arguments
:-)

-- 
Could you people please use IRC like normal people?!?

  -- Amaya Rodrigo Sastre, trying to quiet down the buzz in the DebConf 2008
     Hacklab


* Re: [Qemu-devel] drive-mirroring to nbd is failing with multiple parallel jobs (qemu 2.9 -> 2.11)
  2018-02-15  9:42     ` Wouter Verhelst
@ 2018-02-16 10:19       ` Alex Bligh
  0 siblings, 0 replies; 5+ messages in thread
From: Alex Bligh @ 2018-02-16 10:19 UTC (permalink / raw)
  To: Wouter Verhelst
  Cc: Alex Bligh, Eric Blake, qemu-devel, Alexandre Derumier, nbd list


> On 15 Feb 2018, at 04:42, Wouter Verhelst <w@uter.be> wrote:
> 
> 
> We already set the SO_KEEPALIVE socket option (at least nbd-server does;
> don't know about qemu) to make the kernel send out TCP-level keepalive
> probes. This happens only after two hours (by default), but it's
> something you can configure on your system if you need it to be lower.

+1 for just using SO_KEEPALIVE. I think I even submitted some (untested
and thus unmerged) patches for this.

-- 
Alex Bligh
