qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
@ 2019-08-15 18:08 Lukas Straub
  2019-08-15 18:57 ` Dr. David Alan Gilbert
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Straub @ 2019-08-15 18:08 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Xie Changlong, Wen Congyang

Hello Everyone,
These patches add support for continuous replication to COLO.
Please review.

Regards,
Lukas Straub

v2:
 - fix email formatting
 - fix checkpatch.pl warnings
 - fix patchew error
 - clearer commit messages

Lukas Straub (3):
  Replication: Ignore requests after failover
  net/filter.c: Add Options to insert filters anywhere in the filter
    list
  Update Documentation

 block/replication.c  |  38 ++++++++-
 docs/COLO-FT.txt     | 185 ++++++++++++++++++++++++++++++++-----------
 include/net/filter.h |   2 +
 net/filter.c         |  71 ++++++++++++++++-
 qemu-options.hx      |  10 +--
 5 files changed, 250 insertions(+), 56 deletions(-)
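The second patch lets a filter be placed anywhere in a netdev's filter list instead of only being appended. As a rough illustration of that kind of positional insert, here is a minimal sketch on a plain doubly linked list. The Filter/FilterList types, filter_insert(), and the head/tail/before-id/after-id positions are hypothetical names chosen for the example; they are not the net/filter.c code or the option names the patch adds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical filter node; QEMU's real NetFilterState looks different. */
typedef struct Filter {
    const char *id;
    struct Filter *prev, *next;
} Filter;

typedef struct {
    Filter *head, *tail;
} FilterList;

typedef enum { POS_HEAD, POS_TAIL, POS_BEFORE_ID, POS_AFTER_ID } FilterPos;

static Filter *filter_find(FilterList *l, const char *id)
{
    for (Filter *f = l->head; f; f = f->next) {
        if (strcmp(f->id, id) == 0) {
            return f;
        }
    }
    return NULL;
}

/* Link f directly after anchor; anchor == NULL means "at the head". */
static void link_after(FilterList *l, Filter *anchor, Filter *f)
{
    f->prev = anchor;
    f->next = anchor ? anchor->next : l->head;
    if (f->next) {
        f->next->prev = f;
    } else {
        l->tail = f;
    }
    if (anchor) {
        anchor->next = f;
    } else {
        l->head = f;
    }
}

/* Returns 0 on success, -1 if the reference filter id is not in the list. */
static int filter_insert(FilterList *l, Filter *f, FilterPos pos,
                         const char *ref_id)
{
    Filter *ref;

    switch (pos) {
    case POS_HEAD:
        link_after(l, NULL, f);
        return 0;
    case POS_TAIL:
        link_after(l, l->tail, f);
        return 0;
    case POS_BEFORE_ID:
    case POS_AFTER_ID:
        assert(ref_id != NULL);
        ref = filter_find(l, ref_id);
        if (!ref) {
            return -1;
        }
        /* "before ref" is the same as "after ref's predecessor". */
        link_after(l, pos == POS_AFTER_ID ? ref : ref->prev, f);
        return 0;
    }
    return -1;
}
```

Inserting relative to an id that does not exist is reported as an error rather than silently appending, which seems the safer choice for a filter chain whose order matters.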

--
2.20.1



* Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
  2019-08-15 18:08 [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication Lukas Straub
@ 2019-08-15 18:57 ` Dr. David Alan Gilbert
  2019-08-15 19:48   ` Lukas Straub
  0 siblings, 1 reply; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2019-08-15 18:57 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Zhang Chen, Jason Wang, Xie Changlong, qemu-devel, Wen Congyang

* Lukas Straub (lukasstraub2@web.de) wrote:
> Hello Everyone,
> These patches add support for continuous replication to COLO.
> Please review.


OK, for those who haven't followed COLO for a while: 'continuous
replication' means that after the first primary fails, you can promote the
original secondary to a new primary and start replicating again.

That is, current COLO gives you

p<->s
    <primary fails>
    s

with your patches you can do

    s becomes p2
    p2<->s2

and you're back to being resilient again.

Which is great, because that was always an important missing piece.
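The diagram above can be read as a small role state machine. The sketch below models it under stated assumptions: the ColoRole/ColoEvent names and colo_next_role() are hypothetical, and the model only captures the roles shown in the diagram, not QEMU's internal COLO state handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical roles a COLO node can be in. */
typedef enum { ROLE_PRIMARY, ROLE_SECONDARY, ROLE_STANDALONE } ColoRole;

/* Hypothetical events a node reacts to. */
typedef enum { EV_PEER_FAILED, EV_NEW_PEER_ATTACHED } ColoEvent;

/*
 * Return the node's role after ev.
 * continuous == false: "p<->s, primary fails, s" - the survivor keeps
 * running the guest but stays unprotected.
 * continuous == true: "s becomes p2, p2<->s2" - the survivor can take a
 * new secondary and become a primary again.
 */
static ColoRole colo_next_role(ColoRole role, ColoEvent ev, bool continuous)
{
    assert(role == ROLE_PRIMARY || role == ROLE_SECONDARY ||
           role == ROLE_STANDALONE);

    switch (ev) {
    case EV_PEER_FAILED:
        /* In this model, losing the peer leaves the survivor on its own. */
        return ROLE_STANDALONE;
    case EV_NEW_PEER_ATTACHED:
        if (role == ROLE_STANDALONE && continuous) {
            return ROLE_PRIMARY;   /* s becomes p2 and replicates to s2 */
        }
        return role;
    }
    return role;
}
```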

Do you have some test scripts/setup for this - it would be great
to automate some testing.

Dave

> Regards,
> Lukas Straub
> 
> v2:
>  - fix email formating
>  - fix checkpatch.pl warnings
>  - fix patchew error
>  - clearer commit messages
> 
> Lukas Straub (3):
>   Replication: Ignore requests after failover
>   net/filter.c: Add Options to insert filters anywhere in the filter
>     list
>   Update Documentation
> 
>  block/replication.c  |  38 ++++++++-
>  docs/COLO-FT.txt     | 185 ++++++++++++++++++++++++++++++++-----------
>  include/net/filter.h |   2 +
>  net/filter.c         |  71 ++++++++++++++++-
>  qemu-options.hx      |  10 +--
>  5 files changed, 250 insertions(+), 56 deletions(-)
> 
> --
> 2.20.1
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
  2019-08-15 18:57 ` Dr. David Alan Gilbert
@ 2019-08-15 19:48   ` Lukas Straub
  2019-08-16  1:51     ` Zhang, Chen
  0 siblings, 1 reply; 8+ messages in thread
From: Lukas Straub @ 2019-08-15 19:48 UTC (permalink / raw)
  To: Dr. David Alan Gilbert
  Cc: Zhang Chen, Jason Wang, Xie Changlong, qemu-devel, Wen Congyang

On Thu, 15 Aug 2019 19:57:37 +0100
"Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:

> * Lukas Straub (lukasstraub2@web.de) wrote:
> > Hello Everyone,
> > These patches add support for continuous replication to COLO.
> > Please review.
>
>
> OK, for those who haven't followed COLO for a while: 'continuous
> replication' means that after the first primary fails, you can promote the
> original secondary to a new primary and start replicating again.
>
> That is, current COLO gives you
>
> p<->s
>     <primary fails>
>     s
>
> with your patches you can do
>
>     s becomes p2
>     p2<->s2
>
> and you're back to being resilient again.
>
> Which is great, because that was always an important missing piece.
>
> Do you have some test scripts/setup for this - it would be great
> to automate some testing.

My plan is to write a Pacemaker resource agent[1] for qemu-colo and
then do some long-term testing in my small cluster here. Writing
standalone tests using that resource agent should be easy; it just needs
to be provided with the right arguments and environment variables.
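For reference, Pacemaker passes a resource agent its action as the first argument and its instance parameters as environment variables named OCF_RESKEY_<name>, so a standalone test mostly needs to set those up. A minimal sketch, in C to match the rest of QEMU (real agents are usually shell scripts), of reading such parameters; ocf_param(), ocf_param_true() and any parameter names used with them are hypothetical.

```c
/* Expose POSIX setenv()/unsetenv(), handy when driving this from a test. */
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return the OCF instance parameter `name`, which Pacemaker exports to the
 * agent as the environment variable OCF_RESKEY_<name>. Falls back to def
 * when the variable is unset or empty.
 */
static const char *ocf_param(const char *name, const char *def)
{
    char var[128];
    const char *val;
    int n;

    assert(name != NULL);
    n = snprintf(var, sizeof(var), "OCF_RESKEY_%s", name);
    if (n < 0 || (size_t)n >= sizeof(var)) {
        return def;   /* parameter name too long for this sketch */
    }
    val = getenv(var);
    return (val != NULL && val[0] != '\0') ? val : def;
}

/* Hypothetical helper: interpret a parameter as a boolean flag. */
static bool ocf_param_true(const char *name)
{
    const char *val = ocf_param(name, "false");

    return strcmp(val, "true") == 0 || strcmp(val, "yes") == 0 ||
           strcmp(val, "1") == 0;
}
```

A test harness would set, for example, OCF_RESKEY_qemu_binary (a made-up parameter name) before invoking the agent's start action, and the agent would read it back through ocf_param().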

Regards,
Lukas Straub

[1] https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#what-is-a-resource-agent



* Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
  2019-08-15 19:48   ` Lukas Straub
@ 2019-08-16  1:51     ` Zhang, Chen
  2019-08-16 18:20       ` Lukas Straub
  0 siblings, 1 reply; 8+ messages in thread
From: Zhang, Chen @ 2019-08-16  1:51 UTC (permalink / raw)
  To: Lukas Straub, Dr. David Alan Gilbert
  Cc: Wen Congyang, Jason Wang, Xie Changlong, qemu-devel



> -----Original Message-----
> From: Lukas Straub [mailto:lukasstraub2@web.de]
> Sent: Friday, August 16, 2019 3:48 AM
> To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> Cc: qemu-devel <qemu-devel@nongnu.org>; Zhang, Chen
> <chen.zhang@intel.com>; Jason Wang <jasowang@redhat.com>; Xie
> Changlong <xiechanglong.d@gmail.com>; Wen Congyang
> <wencongyang2@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious
> replication
> 
> On Thu, 15 Aug 2019 19:57:37 +0100
> "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> 
> > * Lukas Straub (lukasstraub2@web.de) wrote:
> > > Hello Everyone,
> > > These patches add support for continuous replication to COLO.
> > > Please review.
> >
> >
> > OK, for those who haven't followed COLO for a while: 'continuous
> > replication' means that after the first primary fails, you can promote
> > the original secondary to a new primary and start replicating again.
> >
> > That is, current COLO gives you
> >
> > p<->s
> >     <primary fails>
> >     s
> >
> > with your patches you can do
> >
> >     s becomes p2
> >     p2<->s2
> >
> > and you're back to being resilient again.
> >
> > Which is great, because that was always an important missing piece.
> >
> > Do you have some test scripts/setup for this - it would be great to
> > automate some testing.
> 
> My plan is to write a Pacemaker resource agent[1] for qemu-colo and then do
> some long-term testing in my small cluster here. Writing standalone tests using
> that resource agent should be easy; it just needs to be provided with the right
> arguments and environment variables.

Thanks for Dave's explanation.
It looks good to me, and I will test this series on my side.

Another question: is a "Pacemaker Resource Agent"[1] like a heartbeat module? I have written an internal heartbeat module running in QEMU; it lets COLO detect failures and trigger failover automatically, with no need for an external application to call the QMP command "x-colo-lost-heartbeat". If you need it, I can send an RFC version soon.
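Conceptually, a heartbeat module inside QEMU replaces the external caller of x-colo-lost-heartbeat with a timeout check on the replication peer. A minimal sketch of that check, assuming the caller supplies a monotonic clock in milliseconds and polls periodically; the Heartbeat struct, the hb_* functions and the timeout value are hypothetical and not taken from Zhang Chen's module.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t last_seen_ms;   /* time the last heartbeat arrived */
    uint64_t timeout_ms;     /* silence longer than this means the peer is lost */
    bool failover_started;   /* make sure failover is triggered only once */
} Heartbeat;

static void hb_init(Heartbeat *hb, uint64_t now_ms, uint64_t timeout_ms)
{
    assert(timeout_ms > 0);
    hb->last_seen_ms = now_ms;
    hb->timeout_ms = timeout_ms;
    hb->failover_started = false;
}

/* Call whenever a heartbeat message from the peer is received. */
static void hb_received(Heartbeat *hb, uint64_t now_ms)
{
    hb->last_seen_ms = now_ms;
}

/*
 * Call periodically, e.g. from a timer. Returns true exactly once, when the
 * peer has been silent for longer than the timeout; that is the point where
 * the module would do what an external x-colo-lost-heartbeat call does today.
 */
static bool hb_check(Heartbeat *hb, uint64_t now_ms)
{
    if (hb->failover_started) {
        return false;
    }
    if (now_ms - hb->last_seen_ms > hb->timeout_ms) {
        hb->failover_started = true;
        return true;
    }
    return false;
}
```

The one-shot flag matters because a timer keeps firing after the peer is gone, and failover should not be requested repeatedly.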

Thanks
Zhang Chen
> 
> Regards,
> Lukas Straub
> 
> [1] https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#what-is-a-resource-agent



* Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
  2019-08-16  1:51     ` Zhang, Chen
@ 2019-08-16 18:20       ` Lukas Straub
  2019-08-21  5:23         ` Zhang, Chen
  2019-08-21 17:34         ` Dr. David Alan Gilbert
  0 siblings, 2 replies; 8+ messages in thread
From: Lukas Straub @ 2019-08-16 18:20 UTC (permalink / raw)
  To: Zhang, Chen
  Cc: Wen Congyang, Jason Wang, Xie Changlong, Dr. David Alan Gilbert,
	qemu-devel

On Fri, 16 Aug 2019 01:51:20 +0000
"Zhang, Chen" <chen.zhang@intel.com> wrote:

> > -----Original Message-----
> > From: Lukas Straub [mailto:lukasstraub2@web.de]
> > Sent: Friday, August 16, 2019 3:48 AM
> > To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > Cc: qemu-devel <qemu-devel@nongnu.org>; Zhang, Chen
> > <chen.zhang@intel.com>; Jason Wang <jasowang@redhat.com>; Xie
> > Changlong <xiechanglong.d@gmail.com>; Wen Congyang
> > <wencongyang2@huawei.com>
> > Subject: Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious
> > replication
> >
> > On Thu, 15 Aug 2019 19:57:37 +0100
> > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> >
> > > * Lukas Straub (lukasstraub2@web.de) wrote:
> > > > Hello Everyone,
> > > > These patches add support for continuous replication to COLO.
> > > > Please review.
> > >
> > >
> > > OK, for those who haven't followed COLO for a while: 'continuous
> > > replication' means that after the first primary fails, you can promote
> > > the original secondary to a new primary and start replicating again.
> > >
> > > That is, current COLO gives you
> > >
> > > p<->s
> > >     <primary fails>
> > >     s
> > >
> > > with your patches you can do
> > >
> > >     s becomes p2
> > >     p2<->s2
> > >
> > > and you're back to being resilient again.
> > >
> > > Which is great, because that was always an important missing piece.
> > >
> > > Do you have some test scripts/setup for this - it would be great to
> > > automate some testing.
> >
> > My plan is to write a Pacemaker resource agent[1] for qemu-colo and then do
> > some long-term testing in my small cluster here. Writing standalone tests using
> > that resource agent should be easy; it just needs to be provided with the right
> > arguments and environment variables.
>
> Thanks for Dave's explanation.
> It looks good to me, and I will test this series on my side.
>
> Another question: is a "Pacemaker Resource Agent"[1] like a heartbeat module?

It's a bit more than that. Pacemaker itself is a cluster resource manager; you can think of it like sysvinit, but for clusters. It controls where in the cluster resources run, what state they are in (master/slave), and what to do in case of a node or resource failure. Resources can be anything: an SQL server, a web server, a VM, etc. Pacemaker itself doesn't control them directly; that's the job of the resource agents. So a resource agent is like an init script, but cluster-aware and with more actions, such as start, stop, monitor, promote (to master) or migrate_to.
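To make the init-script comparison concrete: an agent is invoked with the action name and reports the outcome through OCF exit codes (for example 0 for success, 7 for not running, 8 for running as master). The sketch below shows that dispatch shape in C; the exit-code values follow the OCF resource agent API, while the Resource struct and the handlers are hypothetical stubs, not a working qemu-colo agent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Exit codes defined by the OCF resource agent API (subset). */
enum {
    OCF_SUCCESS = 0,
    OCF_ERR_GENERIC = 1,
    OCF_ERR_UNIMPLEMENTED = 3,
    OCF_NOT_RUNNING = 7,
    OCF_RUNNING_MASTER = 8,
};

/* Hypothetical stand-in for the state of the managed QEMU instance. */
typedef struct {
    bool running;
    bool promoted;
} Resource;

static int ra_start(Resource *r)
{
    r->running = true;
    return OCF_SUCCESS;
}

static int ra_stop(Resource *r)
{
    r->running = false;
    r->promoted = false;
    return OCF_SUCCESS;
}

/* monitor distinguishes stopped, running, and running as master. */
static int ra_monitor(Resource *r)
{
    if (!r->running) {
        return OCF_NOT_RUNNING;
    }
    return r->promoted ? OCF_RUNNING_MASTER : OCF_SUCCESS;
}

static int ra_promote(Resource *r)
{
    if (!r->running) {
        return OCF_ERR_GENERIC;   /* cannot promote a stopped instance */
    }
    r->promoted = true;
    return OCF_SUCCESS;
}

static int ra_demote(Resource *r)
{
    r->promoted = false;
    return OCF_SUCCESS;
}

/* Map action names (argv[1] of a real agent) to handlers. */
static const struct {
    const char *name;
    int (*fn)(Resource *r);
} ra_actions[] = {
    { "start", ra_start },
    { "stop", ra_stop },
    { "monitor", ra_monitor },
    { "promote", ra_promote },
    { "demote", ra_demote },
};

static int ra_dispatch(Resource *r, const char *action)
{
    assert(r != NULL && action != NULL);
    for (size_t i = 0; i < sizeof(ra_actions) / sizeof(ra_actions[0]); i++) {
        if (strcmp(ra_actions[i].name, action) == 0) {
            return ra_actions[i].fn(r);
        }
    }
    /* Actions such as migrate_to are not handled by this sketch. */
    return OCF_ERR_UNIMPLEMENTED;
}
```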

> I have written an internal heartbeat module running in QEMU; it lets COLO detect failures and trigger failover automatically, with no need for an external application to call the QMP command "x-colo-lost-heartbeat". If you need it, I can send an RFC version soon.

Cool, this should make failover faster than with Pacemaker.
What is the plan for cases like primary failover, which need to issue multiple commands?

> Thanks
> Zhang Chen
> >
> > Regards,
> > Lukas Straub
> >
> > [1] https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#what-is-a-resource-agent




* Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
  2019-08-16 18:20       ` Lukas Straub
@ 2019-08-21  5:23         ` Zhang, Chen
  2019-08-21 17:34         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 8+ messages in thread
From: Zhang, Chen @ 2019-08-21  5:23 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Wen Congyang, Jason Wang, Xie Changlong, Dr. David Alan Gilbert,
	qemu-devel



> -----Original Message-----
> From: Lukas Straub [mailto:lukasstraub2@web.de]
> Sent: Saturday, August 17, 2019 2:20 AM
> To: Zhang, Chen <chen.zhang@intel.com>
> Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>; qemu-devel <qemu-
> devel@nongnu.org>; Jason Wang <jasowang@redhat.com>; Xie Changlong
> <xiechanglong.d@gmail.com>; Wen Congyang <wencongyang2@huawei.com>
> Subject: Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious
> replication
> 
> On Fri, 16 Aug 2019 01:51:20 +0000
> "Zhang, Chen" <chen.zhang@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: Lukas Straub [mailto:lukasstraub2@web.de]
> > > Sent: Friday, August 16, 2019 3:48 AM
> > > To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Cc: qemu-devel <qemu-devel@nongnu.org>; Zhang, Chen
> > > <chen.zhang@intel.com>; Jason Wang <jasowang@redhat.com>; Xie
> > > Changlong <xiechanglong.d@gmail.com>; Wen Congyang
> > > <wencongyang2@huawei.com>
> > > Subject: Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for
> > > continious replication
> > >
> > > On Thu, 15 Aug 2019 19:57:37 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >
> > > > * Lukas Straub (lukasstraub2@web.de) wrote:
> > > > > Hello Everyone,
> > > > > These patches add support for continuous replication to COLO.
> > > > > Please review.
> > > >
> > > >
> > > > OK, for those who haven't followed COLO for a while: 'continuous
> > > > replication' means that after the first primary fails, you can
> > > > promote the original secondary to a new primary and start
> > > > replicating again.
> > > >
> > > > That is, current COLO gives you
> > > >
> > > > p<->s
> > > >     <primary fails>
> > > >     s
> > > >
> > > > with your patches you can do
> > > >
> > > >     s becomes p2
> > > >     p2<->s2
> > > >
> > > > and you're back to being resilient again.
> > > >
> > > > Which is great, because that was always an important missing piece.
> > > >
> > > > Do you have some test scripts/setup for this - it would be great
> > > > to automate some testing.
> > >
> > > My plan is to write a Pacemaker resource agent[1] for qemu-colo and
> > > then do some long-term testing in my small cluster here. Writing
> > > standalone tests using that resource agent should be easy; it just
> > > needs to be provided with the right arguments and environment variables.
> >
> > Thanks for Dave's explanation.
> > It looks good to me, and I will test this series on my side.
> >
> > Another question: is a "Pacemaker Resource Agent"[1] like a heartbeat
> > module?
> 
> It's a bit more than that. Pacemaker itself is a cluster resource manager;
> you can think of it like sysvinit, but for clusters. It controls where in the
> cluster resources run, what state they are in (master/slave), and what to do
> in case of a node or resource failure. Resources can be anything: an SQL
> server, a web server, a VM, etc. Pacemaker itself doesn't control them
> directly; that's the job of the resource agents. So a resource agent is like
> an init script, but cluster-aware and with more actions, such as start, stop,
> monitor, promote (to master) or migrate_to.
>
> > I have written an internal heartbeat module running in QEMU; it lets COLO
> > detect failures and trigger failover automatically, with no need for an
> > external application to call the QMP command "x-colo-lost-heartbeat". If
> > you need it, I can send an RFC version soon.
>
> Cool, this should make failover faster than with Pacemaker.
> What is the plan for cases like primary failover, which need to issue
> multiple commands?

Yes, currently we need to issue some net filter delete commands after a primary failover.
We need a way to remove the related net filters and chardevs automatically.
But for Pacemaker it isn't a problem: you can send the related QMP commands after "x-colo-lost-heartbeat".
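For an external manager such as Pacemaker, the primary-failover steps described above reduce to sending a fixed list of QMP commands. A minimal sketch that builds that list as one JSON object per line: x-colo-lost-heartbeat, object-del and chardev-remove are existing QMP commands, but build_failover_qmp() and the ids used with it are hypothetical and must match whatever ids the VM was started with. The exact set and order of cleanup steps should be checked against docs/COLO-FT.txt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Append one QMP command line to buf; id == NULL means no arguments. */
static int append_cmd(char *buf, size_t size, size_t *used,
                      const char *cmd, const char *id)
{
    int n;

    if (id != NULL) {
        /* This sketch does no JSON escaping, so reject quotes/backslashes. */
        if (strpbrk(id, "\"\\") != NULL) {
            return -1;
        }
        n = snprintf(buf + *used, size - *used,
                     "{\"execute\": \"%s\", \"arguments\": {\"id\": \"%s\"}}\n",
                     cmd, id);
    } else {
        n = snprintf(buf + *used, size - *used, "{\"execute\": \"%s\"}\n", cmd);
    }
    if (n < 0 || (size_t)n >= size - *used) {
        return -1;   /* output would be truncated */
    }
    *used += (size_t)n;
    return 0;
}

/*
 * Write x-colo-lost-heartbeat, then object-del for each COLO net filter and
 * chardev-remove for each replication-only chardev. Returns the number of
 * bytes written, or -1 on error.
 */
static int build_failover_qmp(char *buf, size_t size,
                              const char *const *filter_ids, size_t n_filters,
                              const char *const *chardev_ids, size_t n_chardevs)
{
    size_t used = 0;

    assert(buf != NULL && size > 0);
    if (append_cmd(buf, size, &used, "x-colo-lost-heartbeat", NULL) < 0) {
        return -1;
    }
    for (size_t i = 0; i < n_filters; i++) {
        if (append_cmd(buf, size, &used, "object-del", filter_ids[i]) < 0) {
            return -1;
        }
    }
    for (size_t i = 0; i < n_chardevs; i++) {
        if (append_cmd(buf, size, &used, "chardev-remove", chardev_ids[i]) < 0) {
            return -1;
        }
    }
    return (int)used;
}
```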

Thanks
Zhang Chen

> 
> > Thanks
> > Zhang Chen
> > >
> > > Regards,
> > > Lukas Straub
> > >
> > > [1] https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#what-is-a-resource-agent




* Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
  2019-08-16 18:20       ` Lukas Straub
  2019-08-21  5:23         ` Zhang, Chen
@ 2019-08-21 17:34         ` Dr. David Alan Gilbert
  1 sibling, 0 replies; 8+ messages in thread
From: Dr. David Alan Gilbert @ 2019-08-21 17:34 UTC (permalink / raw)
  To: Lukas Straub
  Cc: Zhang, Chen, Jason Wang, Xie Changlong, qemu-devel, Wen Congyang

* Lukas Straub (lukasstraub2@web.de) wrote:
> On Fri, 16 Aug 2019 01:51:20 +0000
> "Zhang, Chen" <chen.zhang@intel.com> wrote:
> 
> > > -----Original Message-----
> > > From: Lukas Straub [mailto:lukasstraub2@web.de]
> > > Sent: Friday, August 16, 2019 3:48 AM
> > > To: Dr. David Alan Gilbert <dgilbert@redhat.com>
> > > Cc: qemu-devel <qemu-devel@nongnu.org>; Zhang, Chen
> > > <chen.zhang@intel.com>; Jason Wang <jasowang@redhat.com>; Xie
> > > Changlong <xiechanglong.d@gmail.com>; Wen Congyang
> > > <wencongyang2@huawei.com>
> > > Subject: Re: [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious
> > > replication
> > >
> > > On Thu, 15 Aug 2019 19:57:37 +0100
> > > "Dr. David Alan Gilbert" <dgilbert@redhat.com> wrote:
> > >
> > > > * Lukas Straub (lukasstraub2@web.de) wrote:
> > > > > Hello Everyone,
> > > > > These patches add support for continuous replication to COLO.
> > > > > Please review.
> > > >
> > > >
> > > > OK, for those who haven't followed COLO for a while: 'continuous
> > > > replication' means that after the first primary fails, you can promote
> > > > the original secondary to a new primary and start replicating again.
> > > >
> > > > That is, current COLO gives you
> > > >
> > > > p<->s
> > > >     <primary fails>
> > > >     s
> > > >
> > > > with your patches you can do
> > > >
> > > >     s becomes p2
> > > >     p2<->s2
> > > >
> > > > and you're back to being resilient again.
> > > >
> > > > Which is great, because that was always an important missing piece.
> > > >
> > > > Do you have some test scripts/setup for this - it would be great to
> > > > automate some testing.
> > >
> > > My plan is to write a Pacemaker resource agent[1] for qemu-colo and then do
> > > some long-term testing in my small cluster here. Writing standalone tests using
> > > that resource agent should be easy; it just needs to be provided with the right
> > > arguments and environment variables.

Could you update tests/test-replication.c to test the extra steps?

Dave

> > Thanks for Dave's explanation.
> > It looks good to me, and I will test this series on my side.
> >
> > Another question: is a "Pacemaker Resource Agent"[1] like a heartbeat module?
>
> It's a bit more than that. Pacemaker itself is a cluster resource manager; you can think of it like sysvinit, but for clusters. It controls where in the cluster resources run, what state they are in (master/slave), and what to do in case of a node or resource failure. Resources can be anything: an SQL server, a web server, a VM, etc. Pacemaker itself doesn't control them directly; that's the job of the resource agents. So a resource agent is like an init script, but cluster-aware and with more actions, such as start, stop, monitor, promote (to master) or migrate_to.
>
> > I have written an internal heartbeat module running in QEMU; it lets COLO detect failures and trigger failover automatically, with no need for an external application to call the QMP command "x-colo-lost-heartbeat". If you need it, I can send an RFC version soon.
>
> Cool, this should make failover faster than with Pacemaker.
> What is the plan for cases like primary failover, which need to issue multiple commands?
> 
> > Thanks
> > Zhang Chen
> > >
> > > Regards,
> > > Lukas Straub
> > >
> > > [1] https://github.com/ClusterLabs/resource-agents/blob/master/doc/dev-guides/ra-dev-guide.asc#what-is-a-resource-agent
> 
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



* [Qemu-devel] [PATCH v2 0/3] colo: Add support for continious replication
@ 2019-08-15 18:48 Lukas Straub
  0 siblings, 0 replies; 8+ messages in thread
From: Lukas Straub @ 2019-08-15 18:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zhang Chen, Jason Wang, Xie Changlong, Wen Congyang

Hello Everyone,
These patches add support for continuous replication to COLO.
Please review.

Regards,
Lukas Straub

v2:
 - fix email formatting
 - fix checkpatch.pl warnings
 - fix patchew error
 - clearer commit messages

Lukas Straub (3):
  Replication: Ignore requests after failover
  net/filter.c: Add Options to insert filters anywhere in the filter
    list
  Update Documentation

 block/replication.c  |  38 ++++++++-
 docs/COLO-FT.txt     | 185 ++++++++++++++++++++++++++++++++-----------
 include/net/filter.h |   2 +
 net/filter.c         |  71 ++++++++++++++++-
 qemu-options.hx      |  10 +--
 5 files changed, 250 insertions(+), 56 deletions(-)

--
2.20.1


