* xl save but leave domain paused
@ 2013-05-30 21:46 Ian Murray
  2013-06-04 14:28 ` Ian Campbell
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Murray @ 2013-05-30 21:46 UTC (permalink / raw)
  To: xen-devel

All,

I have long desired an ability to execute a domain save and leave the 
domain in a paused state. This is so that I can initiate an LVM snapshot 
to go with the checkpoint file. I know I can achieve this via a 
non-checkpoint save followed by a restore, but it seems a bit silly to 
reload the domain's memory from disk, and it doubles the "suspension" 
time of the domain. So, ideally, I would like an xl save -p option that 
leaves the domain paused. I did put in a feature request a long time 
ago, but it never made the cut.

To this end and as more of a starting point, I have written my own basic 
patch. While this appears to work, I see a (very small) opportunity for 
the domU to run for a short time between the libxl_domain_resume and 
libxl_domain_pause calls. This defeats the object as I am trying to 
maintain a disk snapshot that is exactly in synch with the save state.

Can anyone please offer some thoughts on how I can implement this 
properly? I have looked at the corresponding xc calls, but meddling with 
those is way beyond my knowledge. Another way of looking at the problem 
would be to be able to perform an xl save on a paused domain, as this 
would achieve the same result.

Thanks for reading and thanks for any suggestions that are forthcoming. 
I am not a C guru and even less of a Xen dev guru, so please treat me 
somewhat like an idiot. :)

Thanks,

Ian.

(Against RELEASE-4.2.2)


diff --git a/tools/libxl/xl_cmdimpl.c b/tools/libxl/xl_cmdimpl.c
index 7780426..d0394df 100644
--- a/tools/libxl/xl_cmdimpl.c
+++ b/tools/libxl/xl_cmdimpl.c
@@ -2976,7 +2976,7 @@ static void save_domain_core_writeconfig(int fd, const char *source,
              hdr.optional_data_len);
  }

-static int save_domain(const char *p, const char *filename, int checkpoint,
+static int save_domain(const char *p, const char *filename, int checkpoint, int leavepaused,
                  const char *override_config_file)
  {
      int fd;
@@ -3003,10 +3003,13 @@ static int save_domain(const char *p, const char *filename, int checkpoint,
      if (rc < 0)
          fprintf(stderr, "Failed to save domain, resuming domain\n");

-    if (checkpoint || rc < 0)
-        libxl_domain_resume(ctx, domid, 1, 0);
+    if (leavepaused || checkpoint || rc < 0) {
+        libxl_domain_resume(ctx, domid, 1, 0);
+        if (leavepaused && ! (rc < 0))
+            libxl_domain_pause(ctx, domid);
+    }
      else
          libxl_domain_destroy(ctx, domid, 0);

      exit(rc < 0 ? 1 : 0);
  }


* Re: xl save but leave domain paused
  2013-05-30 21:46 xl save but leave domain paused Ian Murray
@ 2013-06-04 14:28 ` Ian Campbell
  2013-06-09 21:41   ` Ian Murray
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Campbell @ 2013-06-04 14:28 UTC (permalink / raw)
  To: Ian Murray; +Cc: xen-devel

On Thu, 2013-05-30 at 22:46 +0100, Ian Murray wrote:
> To this end and as more of a starting point, I have written my own basic 
> patch. While this appears to work, I see a (very small) opportunity for 
> the domU to run for a short time between the libxl_domain_resume and 
> libxl_domain_pause calls. This defeats the object as I am trying to 
> maintain a disk snapshot that is exactly in synch with the save state.
> 
> Can anyone please offer some thoughts on how I can implement this 
> properly?

Does it work if you simply do the pause before the resume? Looking at
the hypervisor side it appears that pauses are reference counted and it
looks (based on a cursory glance) that it will do the right thing.
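
To illustrate why the ordering might matter, here is a toy model of a
reference-counted pause. It is only a sketch under assumptions: the
struct, its fields and every toy_* function are invented for
illustration and are not the hypervisor's actual data structures or
logic. The model treats a domain as runnable only when no pause
requests are outstanding and it is not suspended.

```c
#include <stdbool.h>

/* Toy model only. None of these names exist in Xen or libxl. */
struct toy_domain {
    int  pause_count;  /* outstanding pause requests (assumed refcount) */
    bool suspended;    /* true while suspended for the save */
    bool ever_ran;     /* latched once the domain becomes runnable */
};

/* Runnable only with no pause references and no pending suspend. */
bool toy_runnable(const struct toy_domain *d)
{
    return d->pause_count == 0 && !d->suspended;
}

/* Record whether the last state change opened a window to run. */
void toy_note(struct toy_domain *d)
{
    if (toy_runnable(d))
        d->ever_ran = true;
}

void toy_pause(struct toy_domain *d)
{
    d->pause_count++;
    toy_note(d);
}

void toy_unpause(struct toy_domain *d)
{
    if (d->pause_count > 0)
        d->pause_count--;
    toy_note(d);
}

void toy_resume(struct toy_domain *d)
{
    d->suspended = false;
    toy_note(d);
}

/* Start from the state just after a save (suspended, not paused) and
 * apply pause and resume in the requested order. */
struct toy_domain toy_after_save(bool pause_first)
{
    struct toy_domain d = { .pause_count = 0, .suspended = true,
                            .ever_ran = false };
    if (pause_first) {
        toy_pause(&d);
        toy_resume(&d);
    } else {
        toy_resume(&d);
        toy_pause(&d);
    }
    return d;
}
```

In this model both orderings end with the domain paused, but only
resume-then-pause ever sets ever_ran, which is the window described
above. Whether the real hypervisor behaves the same way would still
need testing.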

>  I have looked at the corresponding xc calls but meddling with 
> those is way beyond my knowledge. Another way of looking at the problem 
> would be to be able to perform an xl save on a paused domain, as this would 
> achieve the same result.

OOI what happens if you try that?

> (Against RELEASE-4.2.2)

FYI any eventual patch will need to be against unstable and then
considered separately for backporting.

Ian.


* Re: xl save but leave domain paused
  2013-06-04 14:28 ` Ian Campbell
@ 2013-06-09 21:41   ` Ian Murray
  2013-06-11 10:15     ` Ian Campbell
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Murray @ 2013-06-09 21:41 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel

Thanks for reading and responding


> 
> Does it work if you simply do the pause before the resume? Looking at
> the hypervisor side it appears that pauses are reference counted and it
> looks (based on a cursory glance) that it will do the right thing.
> 

I tried this before and it didn't seem to work. However, it seems to be working now, having repeated the exercise.

From memory, it seemed to leave the domain in either a "pss" or similar unhealthy state (I don't remember exactly). I have since discovered this may be a different issue around suspending/checkpointing using a CentOS 5.x Xen kernel, and I was also working with a domain I had broken through experimentation.

Anyway, I will spend more time testing this solution.

>>   I have looked at the corresponding xc calls but meddling with 
>>  those is way beyond my knowledge. Another way of looking at the problem 
>>  would be to be able to perform an xl save on a paused domain, as this would 
>>  achieve the same result.
> 
> OOI what happens if you try that?

As far as I remember it complained that the domain didn't respond to the suspend request. I was going to repeat the exercise but that seems like a moot point now.


> 
>>  (Against RELEASE-4.2.2)
> 
> FYI any eventual patch will need to be against unstable and then
> considered separately for backporting.

Sure. I just needed a working place to start.

If I was to prepare a patch against unstable, would it be accepted?


The code I ended up with was (which is what I think you expected):-

    if (leavepaused || checkpoint || rc < 0) {
        if (leavepaused && ! (rc < 0)) {
            libxl_domain_pause(ctx, domid);
            fprintf(stderr, "Pausing before resume\n");
        }
        libxl_domain_resume(ctx, domid, 1, 0);
    }
    else
         libxl_domain_destroy(ctx, domid, 0);
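
To make the four cases easier to check, the same branch can be written
as a small pure function that only reports which calls would be made,
and in what order. This is a sketch under assumptions: the enum and
plan_post_save() are hypothetical names invented for illustration and
are not part of xl or libxl. The condition itself is copied from the
code above.

```c
#include <stddef.h>

/* Hypothetical labels for the three libxl calls used after a save. */
enum post_save_action { ACT_PAUSE, ACT_RESUME, ACT_DESTROY };

/*
 * Mirror of the branch above. Writes the planned calls into out[] in
 * call order and returns how many were written (1 or 2).
 */
size_t plan_post_save(int rc, int checkpoint, int leavepaused,
                      enum post_save_action out[2])
{
    size_t n = 0;

    if (leavepaused || checkpoint || rc < 0) {
        /* Only stay paused if the save itself succeeded. */
        if (leavepaused && !(rc < 0))
            out[n++] = ACT_PAUSE;
        out[n++] = ACT_RESUME;
    } else {
        out[n++] = ACT_DESTROY;
    }
    return n;
}
```

With this, a successful save with leavepaused set plans a pause
followed by a resume, a failed save always plans a plain resume, and
only a successful plain save plans a destroy.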


> 
> Ian.
> 


* Re: xl save but leave domain paused
  2013-06-09 21:41   ` Ian Murray
@ 2013-06-11 10:15     ` Ian Campbell
  2013-06-12 10:21       ` Ian Murray
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Campbell @ 2013-06-11 10:15 UTC (permalink / raw)
  To: Ian Murray; +Cc: xen-devel

On Sun, 2013-06-09 at 22:41 +0100, Ian Murray wrote:
> Thanks for reading and responding
> 
> 
> > 
> > Does it work if you simply do the pause before the resume? Looking at
> > the hypervisor side it appears that pauses are reference counted and it
> > looks (based on a cursory glance) that it will do the right thing.
> > 
> 
> I tried this before and it didn't seem to work. However, it seems to be working now, having repeated the exercise.
> 
> From memory, it seemed to leave the domain in either a "pss" or similar unhealthy state (I don't remember exactly). I have since discovered this may be a different issue around suspending/checkpointing using a CentOS 5.x Xen kernel, and I was also working with a domain I had broken through experimentation.
> 
> Anyway, I will spend more time testing this solution.
> 
> >>   I have looked at the corresponding xc calls but meddling with 
> >>  those is way beyond my knowledge. Another way of looking at the problem 
> >>  would be to be able to perform an xl save on a paused domain, as this would 
> >>  achieve the same result.
> > 
> > OOI what happens if you try that?
> 
> As far as I remember it complained that the domain didn't respond to the suspend request. I was going to repeat the exercise but that seems like a moot point now.
> 
> 
> > 
> >>  (Against RELEASE-4.2.2)
> > 
> > FYI any eventual patch will need to be against unstable and then
> > considered separately for backporting.
> 
> Sure. I just needed a working place to start.
> 
> If I was to prepare a patch against unstable, would it be accepted?

It seems like useful functionality to me, so subject to reviewing the
actual implementation I think it more than likely would.

> 
> 
> The code I ended up with was (which is what I think you expected):-

Yep, minus the fprintf which I don't think is needed.

> 
>     if (leavepaused || checkpoint || rc < 0) {
>         if (leavepaused && ! (rc < 0)) {

I think the libxl coding style would be to cuddle the ! against the
bracket.

>             libxl_domain_pause(ctx, domid);
>             fprintf(stderr, "Pausing before resume\n");
>         }
>         libxl_domain_resume(ctx, domid, 1, 0);
>     }
>     else
>          libxl_domain_destroy(ctx, domid, 0);
> 
> 
> > 
> > Ian.
> > 


* Re: xl save but leave domain paused
  2013-06-11 10:15     ` Ian Campbell
@ 2013-06-12 10:21       ` Ian Murray
  2013-06-12 11:27         ` Ian Campbell
  0 siblings, 1 reply; 6+ messages in thread
From: Ian Murray @ 2013-06-12 10:21 UTC (permalink / raw)
  To: Ian Campbell; +Cc: xen-devel



> 
> It seems like useful functionality to me, so subject to reviewing the
> actual implementation I think it more than likely would.
> 


I see there is a long-term goal for snapshotting....

"

Full-VM snapshotting
  owner: ?
  status: none
  prognosis: Probably delay until 4.4
  Have a way of coordinating the taking and restoring of VM memory and
  disk snapshots.  This would involve some investigation into the best
  way to accomplish this."

I don't know if my proposed solution meets that requirement. I would argue that snapshotting of the virtual disk is beyond the scope of the hypervisor tools because there are many different ways to implement a virtual disk, each with their own "snapshotting" methods....

So perhaps it does meet the requirement.


>> 
>>  The code I ended up with was (which is what I think you expected):-
> 
> Yep, minus the fprintf which I don't think is needed.
> 

Yeah, that was in to make sure I was definitely running the right code, as a pause straight after the suspend produces the same output.

>> 
>>      if (leavepaused || checkpoint || rc < 0) {
>>          if (leavepaused && ! (rc < 0)) {
> 
> I think the libxl coding style would be to cuddle the ! against the
> bracket.
> 

Lol, I will take a look at the rest of the code to make sure it is in the same style.

>>              libxl_domain_pause(ctx, domid);
>>              fprintf(stderr, "Pausing before resume\n");
>>          }
>>          libxl_domain_resume(ctx, domid, 1, 0);
>>      }
>>      else
>>           libxl_domain_destroy(ctx, domid, 0);
>> 
>> 
>>  > 
>>  > Ian.
>>  > 
> 


* Re: xl save but leave domain paused
  2013-06-12 10:21       ` Ian Murray
@ 2013-06-12 11:27         ` Ian Campbell
  0 siblings, 0 replies; 6+ messages in thread
From: Ian Campbell @ 2013-06-12 11:27 UTC (permalink / raw)
  To: Ian Murray; +Cc: xen-devel

On Wed, 2013-06-12 at 11:21 +0100, Ian Murray wrote:
> 
> > 
> > It seems like useful functionality to me, so subject to reviewing the
> > actual implementation I think it more than likely would.
> > 
> 
> 
> I see there is a long-term goal for snapshotting....
> 
> "
> 
> Full-VM snapshotting
>   owner: ?
>   status: none
>   prognosis: Probably delay until 4.4
>   Have a way of coordinating the taking and restoring of VM memory and
>   disk snapshots.  This would involve some investigation into the best
>   way to accomplish this."
> 
> I don't know if my proposed solution meets that requirement. I would
> argue that snapshotting of the virtual disk is beyond the scope of the
> hypervisor tools because there are many different ways to implement a
> virtual disk, each with their own "snapshotting" methods....

I would imagine we would end up with something like having libxl support
callbacks for the relevant events, with xl implementing them as script
callouts and other toolstacks doing whatever they need to do. But as the
goal says, there needs to be some investigation of what the toolstacks
want/need and what the best way to achieve things is.

> So perhaps it does meet the requirement.

I think it is at least complementary or orthogonal to it, and it seems
to me like useful functionality in its own right.

Ian.


