linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Arnd Bergmann <arnd@arndb.de>
To: Dave Hansen <dave@linux.vnet.ibm.com>
Cc: Oren Laadan <orenl@cs.columbia.edu>,
	containers@lists.linux-foundation.org,
	linux-kernel@vger.kernel.org, Theodore Tso <tytso@mit.edu>,
	"Serge E. Hallyn" <serue@us.ibm.com>
Subject: Re: [RFC][PATCH 1/4] checkpoint-restart: general infrastructure
Date: Fri, 8 Aug 2008 11:46:54 +0200	[thread overview]
Message-ID: <200808081146.54834.arnd@arndb.de> (raw)
In-Reply-To: <20080807224034.735B1F84@kernel>

On Friday 08 August 2008, Dave Hansen wrote:
> +/* write the checkpoint header */
> +static int cr_write_hdr(struct cr_ctx *ctx)
> +{
> +	struct cr_hdr h;
> +	struct cr_hdr_head *hh = ctx->tbuf;
> +	struct timeval ktv;
> +
> +	h.type = CR_HDR_HEAD;
> +	h.len = sizeof(*hh);
> +	h.id = 0;
> +
> +	do_gettimeofday(&ktv);
> +
> +	hh->magic = 0x00a2d200;
> +	hh->major = (LINUX_VERSION_CODE >> 16) & 0xff;
> +	hh->minor = (LINUX_VERSION_CODE >> 8) & 0xff;
> +	hh->patch = (LINUX_VERSION_CODE) & 0xff;
> +
> +	hh->version = 1;
> +
> +	hh->flags = ctx->flags;
> +	hh->time = ktv.tv_sec;
> +
> +	return cr_write_obj(ctx, &h, hh);
> +}

Do you rely on the kernel version in order to determine the format
of the binary data, or is it just informational?

If you think the format can change in incompatible ways, you
probably need something more specific than the version number
these days, because there are just so many different trees with
the same numbers.


> +
> +/* debugging */
> +#if 0
> +#define CR_PRINTK(str, args...)  \
> +	printk(KERN_ERR "cr@%s#%d: " str, __func__, __LINE__, ##args)
> +#else
> +#define CR_PRINTK(...)		do {} while (0)
> +#endif
> +

Please use the existing pr_debug and dev_debug here, instead of creating
yet another version.

> +struct cr_hdr_tail {
> +	__u32 magic;
> +	__u32 cksum[2];
> +};

This structure has an odd multiple of 32-bit members, which means
that if you put it into a larger structure that also contains
64-bit members, the larger structure may get different alignment
on x86-32 and x86-64, which you might want to avoid.
I can't tell if this is an actual problem here.

> +
> +struct cr_hdr_task {
> +	__u64 state;
> +	__u32 exit_state;
> +	__u32 exit_code, exit_signal;
> +
> +	__u16 pid;
> +	__u16 tgid;
> +
> +	__u64 utime, stime, utimescaled, stimescaled;
> +	__u64 gtime;
> +	__u64 prev_utime, prev_stime;
> +	__u64 nvcsw, nivcsw;
> +	__u64 start_time_sec, start_time_nsec;
> +	__u64 real_start_time_sec, real_start_time_nsec;
> +	__u64 min_flt, maj_flt;
> +
> +	__s16 task_comm_len;
> +	char comm[TASK_COMM_LEN];
> +};

In this case, I'm pretty sure that sizeof(cr_hdr_task) on x86-32 is
different from x86-64, since it will be 32-bit aligned on x86-32.

> +
> +/*
> + * During restart the code reads in data from the chekcpoint image into a
> + * temporary buffer (ctx->hbuf). Because operations can be nested, one
> + * should call cr_hbuf_get() to reserve space in the buffer, and then
> + * cr_hbuf_put() when it no longer needs that space
> + */
> +
> +#include <linux/version.h>
> +#include <linux/sched.h>
> +#include <linux/file.h>
> +
> +#include "ckpt.h"
> +#include "ckpt_hdr.h"
> +
> +/**
> + * cr_hbuf_get - reserve space on the hbuf
> + * @ctx: checkpoint context
> + * @n: number of bytes to reserve
> + */
> +void *cr_hbuf_get(struct cr_ctx *ctx, int n)
> +{
> +	void *ptr;
> +
> +	BUG_ON(ctx->hpos + n > CR_HBUF_TOTAL);
> +	ptr = (void *) (((char *) ctx->hbuf) + ctx->hpos);
> +	ctx->hpos += n;
> +	return ptr;
> +}

Can (ctx->hpos + n > CR_HBUF_TOTAL) be controlled by the input
data? If so, this is a denial-of-service attack.

> +
> +int cr_kwrite(struct cr_ctx *ctx, void *buf, int count)
> +{
> +	mm_segment_t oldfs;
> +	int ret;
> +
> +	oldfs = get_fs();
> +	set_fs(KERNEL_DS);
> +	ret = cr_uwrite(ctx, buf, count);
> +	set_fs(oldfs);
> +
> +	return ret;
> +}

get_fs()/set_fs() always feels a bit ouch, and this way you have
to use __force to avoid the warnings about __user pointer casts
in sparse.
I wonder if you can use splice_read/splice_write to get around
this problem.

> +	struct cr_ctx *ctx;
> +	struct file *file;
> +	int fput_needed;
> +	int ret;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +

Why do you need CAP_SYS_ADMIN for this? Can't regular users
be allowed to checkpoint/restart their own tasks?

> --- linux-2.6.git/Makefile~handle_a_single_task_with_private_memory_maps	2008-08-05 09:04:27.000000000 -0700
> +++ linux-2.6.git-dave/Makefile	2008-08-05 09:07:53.000000000 -0700
> @@ -611,7 +611,7 @@ export mod_strip_cmd
>  
>  
>  ifeq ($(KBUILD_EXTMOD),)
> -core-y		+= kernel/ mm/ fs/ ipc/ security/ crypto/ block/
> +core-y		+= kernel/ mm/ fs/ ipc/ security/ crypto/ block/ ckpt/
>  

The name 'ckpt' is a bit unobvious, how about naming it 'checkpoint' instead?

	Arnd <><

  reply	other threads:[~2008-08-08  9:49 UTC|newest]

Thread overview: 71+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2008-08-07 22:40 [RFC][PATCH 0/4] kernel-based checkpoint restart Dave Hansen
2008-08-07 22:40 ` [RFC][PATCH 1/4] checkpoint-restart: general infrastructure Dave Hansen
2008-08-08  9:46   ` Arnd Bergmann [this message]
2008-08-08 18:50     ` Dave Hansen
2008-08-08 20:59       ` Oren Laadan
2008-08-08 22:17         ` Dave Hansen
2008-08-08 23:27           ` Oren Laadan
2008-08-08 22:23         ` Arnd Bergmann
2008-08-14  8:09         ` [Devel] " Pavel Emelyanov
2008-08-14 15:16           ` Dave Hansen
2008-08-08 22:13       ` Arnd Bergmann
2008-08-08 22:26         ` Dave Hansen
2008-08-08 22:39           ` Arnd Bergmann
2008-08-09  0:43             ` Dave Hansen
2008-08-09  6:37               ` Arnd Bergmann
2008-08-09 13:39                 ` Dave Hansen
2008-08-11 15:07           ` Serge E. Hallyn
2008-08-11 15:25             ` Arnd Bergmann
2008-08-14  5:53             ` Pavel Machek
2008-08-14 15:12               ` Dave Hansen
2008-08-20 21:40               ` Oren Laadan
2008-08-11 15:22         ` Serge E. Hallyn
2008-08-11 16:53           ` Arnd Bergmann
2008-08-11 17:11             ` Dave Hansen
2008-08-11 19:48             ` checkpoint/restart ABI Dave Hansen
2008-08-11 21:47               ` Arnd Bergmann
2008-08-11 23:14                 ` Jonathan Corbet
2008-08-11 23:23                   ` Dave Hansen
2008-08-21  5:56                 ` Oren Laadan
2008-08-21  8:43                   ` Arnd Bergmann
2008-08-21 15:43                     ` Oren Laadan
2008-08-11 21:54               ` Oren Laadan
2008-08-11 23:38               ` Jeremy Fitzhardinge
2008-08-11 23:54                 ` Peter Chubb
2008-08-12 14:49                   ` Serge E. Hallyn
2008-08-28 23:40                     ` Eric W. Biederman
2008-08-12 15:11                   ` Dave Hansen
2008-08-12 14:58                 ` Dave Hansen
2008-08-12 16:32                   ` Jeremy Fitzhardinge
2008-08-12 16:46                     ` Dave Hansen
2008-08-12 17:04                       ` Jeremy Fitzhardinge
2008-08-20 21:52                         ` Oren Laadan
2008-08-20 21:54                       ` Oren Laadan
2008-08-20 22:11                         ` Dave Hansen
2008-08-11 18:03   ` [RFC][PATCH 1/4] checkpoint-restart: general infrastructure Jonathan Corbet
2008-08-11 18:38     ` Dave Hansen
2008-08-12  3:44       ` Oren Laadan
2008-08-18  9:26   ` [Devel] " Pavel Emelyanov
2008-08-20 19:10     ` Dave Hansen
2008-08-07 22:40 ` [RFC][PATCH 2/4] checkpoint/restart: x86 support Dave Hansen
2008-08-08 12:09   ` Arnd Bergmann
2008-08-08 20:28     ` Oren Laadan
2008-08-08 22:29       ` Arnd Bergmann
2008-08-08 23:04         ` Oren Laadan
2008-08-09  0:38           ` Dave Hansen
2008-08-09  1:20             ` Oren Laadan
2008-08-09  2:20               ` Dave Hansen
2008-08-09  2:35                 ` Oren Laadan
2008-08-10 14:55             ` Jeremy Fitzhardinge
2008-08-11 15:36               ` Dave Hansen
2008-08-11 16:07                 ` Jeremy Fitzhardinge
2008-08-09  6:43           ` Arnd Bergmann
2008-08-07 22:40 ` [RFC][PATCH 3/4] checkpoint/restart: memory management Dave Hansen
2008-08-08 12:12   ` Arnd Bergmann
2008-08-07 22:40 ` [RFC][PATCH 4/4] introduce sys_checkpoint and sys_restore Dave Hansen
2008-08-08 12:15   ` Arnd Bergmann
2008-08-08 20:33     ` Oren Laadan
2008-08-08  9:25 ` [RFC][PATCH 0/4] kernel-based checkpoint restart Arnd Bergmann
2008-08-08 18:06   ` Dave Hansen
2008-08-08 18:18     ` Arnd Bergmann
2008-08-08 19:44   ` Oren Laadan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=200808081146.54834.arnd@arndb.de \
    --to=arnd@arndb.de \
    --cc=containers@lists.linux-foundation.org \
    --cc=dave@linux.vnet.ibm.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=orenl@cs.columbia.edu \
    --cc=serue@us.ibm.com \
    --cc=tytso@mit.edu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).