* Periodic checkpointing (using perf and signals?)
@ 2013-07-17 15:44 Christopher Covington
From: Christopher Covington @ 2013-07-17 15:44 UTC
  To: linux-perf-users, criu

Hi,

I'm interested in taking checkpoints of processes from fast systems like
hardware and restoring them on really slow software models for performance
analysis. So far I've been able to save and restore checkpoints on the
different systems using CRIU. Now I'm looking for some way to trigger the
checkpointing. One basic use case might be to take a process that runs for, say,
100M instructions and take a checkpoint every 10M instructions, to be restored
as 10 parallel runs of the model.
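As a minimal illustration of that arithmetic, here is a hypothetical Python helper (not part of CRIU or perf) that splits the run into model segments. In this sketch the first segment starts from the program's initial state, so only the checkpoints at 10M, 20M, ..., 90M would actually need to be taken:

```python
from typing import List, Tuple


def checkpoint_schedule(total_insns: int, interval: int) -> List[Tuple[int, int]]:
    """Split a run of total_insns instructions into model segments.

    Each (start, length) pair is one model run: restore the checkpoint
    taken after `start` instructions and simulate the next `length`.
    """
    if interval <= 0:
        raise ValueError("interval must be positive")
    return [(start, min(interval, total_insns - start))
            for start in range(0, total_insns, interval)]


segments = checkpoint_schedule(100_000_000, 10_000_000)
print(len(segments))  # 10 segments, starting at 0, 10M, ..., 90M
```

A run that is not an exact multiple of the interval simply ends with a shorter last segment.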

I'm thinking of trying to use performance counters to trigger such behavior.
Does perf already have support for triggering things like this? If not, I'm
thinking of adding the ability to send a signal, such as SIGSTOP, to the
process of interest once the specified count, such as 10M instructions, has
been reached. CRIU or a wrapper could then wait for the process of interest to
stop, take the checkpoint, let the process continue, and then wait for it to
stop again or exit. Would such an approach make sense?
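A minimal sketch of that wrapper loop, assuming the wrapper launches the target as its own child (waitpid() only reports stops for children) and that something else, such as the proposed perf trigger, delivers the stop signal. The checkpoint callback is a hypothetical placeholder for whatever actually takes the dump:

```python
import os
import signal
from typing import Callable, Sequence


def run_with_checkpoints(argv: Sequence[str],
                         checkpoint: Callable[[int, int], None]) -> int:
    """Run argv as a child and checkpoint it every time it stops.

    Returns the child's exit status, or -N if it was killed by signal N.
    """
    pid = os.fork()
    if pid == 0:
        # Child: become the process of interest.
        try:
            os.execvp(argv[0], list(argv))
        finally:
            os._exit(127)  # only reached if exec failed

    stops = 0
    while True:
        # WUNTRACED makes waitpid() report stops as well as exits.
        _, status = os.waitpid(pid, os.WUNTRACED)
        if os.WIFSTOPPED(status):
            checkpoint(pid, stops)  # hypothetical hook, e.g. a CRIU dump of pid
            stops += 1
            os.kill(pid, signal.SIGCONT)  # let the process run on
        elif os.WIFEXITED(status):
            return os.WEXITSTATUS(status)
        elif os.WIFSIGNALED(status):
            return -os.WTERMSIG(status)
```

For a quick local test, a target such as `sh -c 'kill -s STOP $$; exit 0'` stops itself once, so the callback runs exactly once before the loop returns 0.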

Thanks,
Christopher

-- 
Employee of Qualcomm Innovation Center, Inc.
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by the Linux Foundation.


* Re: [CRIU] Periodic checkpointing (using perf and signals?)
@ 2013-07-17 15:57 ` Pavel Emelyanov
From: Pavel Emelyanov @ 2013-07-17 15:57 UTC
  To: Christopher Covington; +Cc: linux-perf-users, criu

On 07/17/2013 07:44 PM, Christopher Covington wrote:
> Hi,
> 
> I'm interested in taking checkpoints of processes from fast systems like
> hardware and restoring them on really slow software models for performance
> analysis.

Great idea! I will add it to http://criu.org/Usage_scenarios :)

> So far I've been able to save and restore checkpoints on the
> different systems using CRIU. Now I'm looking for some way to trigger the
> checkpointing. One basic use case might be to take a process that runs for, say,
> 100M instructions and take a checkpoint every 10M instructions, to be restored
> as 10 parallel runs of the model.
> 
> I'm thinking of trying to use performance counters to trigger such behavior.
> Does perf already have support for triggering things like this?

I'm not 100% sure, but I've seen examples of Python plugins for perf. From
these examples, I believe it's possible to write a plugin that will run
some code after noticing 100M instructions.
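Whatever ends up reporting the count, code that watches a cumulative counter will rarely see it land exactly on a boundary, so a trigger has to fire on the first reading at or past each multiple of the interval. A minimal sketch of that logic (a hypothetical class, independent of perf's actual scripting API):

```python
class IntervalTrigger:
    """Fire once per `interval` events crossed by a cumulative counter."""

    def __init__(self, interval: int) -> None:
        if interval <= 0:
            raise ValueError("interval must be positive")
        self.interval = interval
        self.next_boundary = interval  # first count that should fire

    def update(self, count: int) -> int:
        """Feed a new cumulative reading; return how many boundaries it crossed."""
        fired = 0
        while count >= self.next_boundary:
            fired += 1
            self.next_boundary += self.interval
        return fired


trigger = IntervalTrigger(10_000_000)
trigger.update(9_999_999)   # 0: boundary not reached yet
trigger.update(10_000_123)  # 1: fires slightly past 10M
```

Because the trigger only runs when a reading arrives, whatever it starts sees a count somewhat past the boundary.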

> If not, I'm
> thinking of adding the ability to send a signal, such as SIGSTOP, to the
> process of interest once the specified count, such as 10M instructions, has
> been reached. CRIU or a wrapper could then wait for the process of interest to
> stop, take the checkpoint, let the process continue, and then wait for it to
> stop again or exit. Would such an approach make sense?

It makes perfect sense! Several things to note from my side.

1. This is a perfect case for using the --track-mem and --prev-images-dir options
together. They make subsequent dumps take MUCH less time, since with them CRIU
does not take a full task dump but only grabs what has changed since the last
dump.

2. The current version of CRIU doesn't work with stopped tasks. We're currently
developing this, and the functionality will only be available in v0.7. However,
I think it's OK to just start the "criu dump" command after the perf trigger.
The dump would see a process that has executed slightly more than 10M
instructions, but the same would be true if you sent it a STOP signal.
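A sketch of how a wrapper might build the incremental dump commands from point 1. It assumes CRIU's -t (task PID), -D (images directory) and --leave-running options alongside the two named above, and that --prev-images-dir is resolved relative to the images directory. The helper name and directory layout are hypothetical, and the flags should be checked against `criu --help` for the installed version:

```python
import os
from typing import List


def dump_argv(pid: int, images_root: str, index: int) -> List[str]:
    """Build argv for the index-th periodic dump (hypothetical helper).

    Images go to <images_root>/<index>; every dump after the first points
    --prev-images-dir at the previous one so only changed memory is saved.
    """
    images_dir = os.path.join(images_root, str(index))
    argv = ["criu", "dump", "-t", str(pid), "-D", images_dir,
            "--track-mem",       # keep tracking memory changes for the next dump
            "--leave-running"]   # the task must keep running after each dump
    if index > 0:
        # Assumed to be relative to images_dir, hence the "..".
        argv += ["--prev-images-dir", os.path.join("..", str(index - 1))]
    return argv
```

The wrapper would then pass each list to something like `subprocess.run(..., check=True)` when the trigger fires.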

> Thanks,
> Christopher

Thanks,
Pavel

