All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
@ 2010-08-03 23:11 Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown() Sukadev Bhattiprolu
                   ` (17 more replies)
  0 siblings, 18 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Checkpoint/restart file owner, file-locks and file-lease information.

Changelog[v3]:
	- Broke-up C/R of file-leases patches into smaller patches and included
	  them in this set.
	- Addressed comments from Jamie Lokier, Oren Laadan with help from
	  John Stultz on the computation of time offsets.


Sukadev Bhattiprolu (16):
  Add uid, euid params to f_modown()
  Add uid, euid params to __f_setown()
  Checkpoint file-owner information
  Restore file_owner info
  Move file_lock macros into linux/fs.h
  Checkpoint file-locks
  Define flock_set()
  Define flock64_set()
  Restore file-locks
  Initialize ->fl_break_time to 0
  Add ->fl_type_prev field.
  Add ->fl_break_notified field.
  Add jiffies_begin field to ckpt_ctx
  Checkpoint file-leases
  Define do_setlease()
  Restore file-leases

 drivers/char/tty_io.c            |    3 +-
 drivers/net/tun.c                |    3 +-
 fs/checkpoint.c                  |  410 +++++++++++++++++++++++++++++++++++---
 fs/fcntl.c                       |   19 +-
 fs/locks.c                       |  207 +++++++++++++++-----
 fs/notify/dnotify/dnotify.c      |    3 +-
 include/linux/checkpoint_hdr.h   |   26 +++
 include/linux/checkpoint_types.h |    1 +
 include/linux/fs.h               |   17 ++-
 kernel/checkpoint/sys.c          |    1 +
 10 files changed, 596 insertions(+), 94 deletions(-)

Note: Most of the "added lines" in fs/locks.c are comments about C/R :-)

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown()
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 02/16][cr][v3]: Add uid, euid params to __f_setown() Sukadev Bhattiprolu
                     ` (15 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Checkpoint/restart of file-owner.

Add uid, euid parameters to f_modown(). These parameters will be needed
when restarting an application (and hence restoring the file information),
from a checkpoint image.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/fcntl.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 2079af0..aeab1f4 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -197,7 +197,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 }
 
 static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
-                     int force)
+                     uid_t uid, uid_t euid, int force)
 {
 	write_lock_irq(&filp->f_owner.lock);
 	if (force || !filp->f_owner.pid) {
@@ -206,9 +206,8 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
 		filp->f_owner.pid_type = type;
 
 		if (pid) {
-			const struct cred *cred = current_cred();
-			filp->f_owner.uid = cred->uid;
-			filp->f_owner.euid = cred->euid;
+			filp->f_owner.uid = uid;
+			filp->f_owner.euid = euid;
 		}
 	}
 	write_unlock_irq(&filp->f_owner.lock);
@@ -223,7 +222,7 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
 	if (err)
 		return err;
 
-	f_modown(filp, pid, type, force);
+	f_modown(filp, pid, type, current_uid(), current_euid(), force);
 	return 0;
 }
 EXPORT_SYMBOL(__f_setown);
@@ -249,7 +248,7 @@ EXPORT_SYMBOL(f_setown);
 
 void f_delown(struct file *filp)
 {
-	f_modown(filp, NULL, PIDTYPE_PID, 1);
+	f_modown(filp, NULL, PIDTYPE_PID, current_uid(), current_euid(), 1);
 }
 
 pid_t f_getown(struct file *filp)
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown()
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 02/16][cr][v3]: Add uid, euid params to __f_setown() Sukadev Bhattiprolu
                   ` (16 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Checkpoint/restart of file-owner.

Add uid, euid parameters to f_modown(). These parameters will be needed
when restarting an application (and hence restoring the file information),
from a checkpoint image.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/fcntl.c |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/fs/fcntl.c b/fs/fcntl.c
index 2079af0..aeab1f4 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -197,7 +197,7 @@ static int setfl(int fd, struct file * filp, unsigned long arg)
 }
 
 static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
-                     int force)
+                     uid_t uid, uid_t euid, int force)
 {
 	write_lock_irq(&filp->f_owner.lock);
 	if (force || !filp->f_owner.pid) {
@@ -206,9 +206,8 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
 		filp->f_owner.pid_type = type;
 
 		if (pid) {
-			const struct cred *cred = current_cred();
-			filp->f_owner.uid = cred->uid;
-			filp->f_owner.euid = cred->euid;
+			filp->f_owner.uid = uid;
+			filp->f_owner.euid = euid;
 		}
 	}
 	write_unlock_irq(&filp->f_owner.lock);
@@ -223,7 +222,7 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
 	if (err)
 		return err;
 
-	f_modown(filp, pid, type, force);
+	f_modown(filp, pid, type, current_uid(), current_euid(), force);
 	return 0;
 }
 EXPORT_SYMBOL(__f_setown);
@@ -249,7 +248,7 @@ EXPORT_SYMBOL(f_setown);
 
 void f_delown(struct file *filp)
 {
-	f_modown(filp, NULL, PIDTYPE_PID, 1);
+	f_modown(filp, NULL, PIDTYPE_PID, current_uid(), current_euid(), 1);
 }
 
 pid_t f_getown(struct file *filp)
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 02/16][cr][v3]: Add uid, euid params to __f_setown()
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2010-08-03 23:11   ` [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown() Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 03/16][cr][v3]: Checkpoint file-owner information Sukadev Bhattiprolu
                     ` (14 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Instead of letting __f_setown() use the UID and EUID of the calling process,
pass them in as parameters.

This modified interface will be useful when checkpointing and restarting an
application that has a 'file owner' specified for an open file.  When
checkpointing the application, the UID and EUID of the process setting up
the owner are saved in the checkpoint image.

When the application is restarted, we use the UID and EUID values saved in
the checkpoint-image, rather than that of the calling process.

Changelog[v2]:
	- [Matthew Wilcox] Rather than a new __f_setown_uid() interface add
	  the uid parameters to __f_setown() itself.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 drivers/char/tty_io.c       |    3 ++-
 drivers/net/tun.c           |    3 ++-
 fs/fcntl.c                  |   10 ++++++----
 fs/locks.c                  |    3 ++-
 fs/notify/dnotify/dnotify.c |    3 ++-
 include/linux/fs.h          |    3 ++-
 6 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 115c8e4..9c282b2 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -1969,7 +1969,8 @@ static int tty_fasync(int fd, struct file *filp, int on)
 		}
 		get_pid(pid);
 		spin_unlock_irqrestore(&tty->ctrl_lock, flags);
-		retval = __f_setown(filp, pid, type, 0);
+		retval = __f_setown(filp, pid, type, current_uid(),
+				current_euid(), 0);
 		put_pid(pid);
 		if (retval)
 			goto out;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4326520..dcbc37d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1400,7 +1400,8 @@ static int tun_chr_fasync(int fd, struct file *file, int on)
 		goto out;
 
 	if (on) {
-		ret = __f_setown(file, task_pid(current), PIDTYPE_PID, 0);
+		ret = __f_setown(file, task_pid(current), PIDTYPE_PID,
+				current_uid(), current_euid(), 0);
 		if (ret)
 			goto out;
 		tun->flags |= TUN_FASYNC;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index aeab1f4..f44327d 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -214,7 +214,7 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
 }
 
 int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
-		int force)
+		uid_t uid, uid_t euid, int force)
 {
 	int err;
 
@@ -222,7 +222,7 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
 	if (err)
 		return err;
 
-	f_modown(filp, pid, type, current_uid(), current_euid(), force);
+	f_modown(filp, pid, type, uid, euid, force);
 	return 0;
 }
 EXPORT_SYMBOL(__f_setown);
@@ -240,7 +240,8 @@ int f_setown(struct file *filp, unsigned long arg, int force)
 	}
 	rcu_read_lock();
 	pid = find_vpid(who);
-	result = __f_setown(filp, pid, type, force);
+	result = __f_setown(filp, pid, type, current_uid(), current_euid(),
+			force);
 	rcu_read_unlock();
 	return result;
 }
@@ -296,7 +297,8 @@ static int f_setown_ex(struct file *filp, unsigned long arg)
 	if (owner.pid && !pid)
 		ret = -ESRCH;
 	else
-		ret = __f_setown(filp, pid, type, 1);
+		ret = __f_setown(filp, pid, type, current_uid(), 
+				current_euid(), 1);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/fs/locks.c b/fs/locks.c
index 9cd859e..ca0c7e2 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1514,7 +1514,8 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 		goto out_unlock;
 	}
 
-	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
+	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, current_uid(),
+				current_euid(), 0);
 out_unlock:
 	unlock_kernel();
 	return error;
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 0a63bf6..3e025e5 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -409,7 +409,8 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 		goto out;
 	}
 
-	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
+	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, current_uid(),
+				current_euid(), 0);
 	if (error) {
 		/* if we added, we must shoot */
 		if (dnentry == new_dnentry)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ee725ff..b4a6fb0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1304,7 +1304,8 @@ extern void kill_fasync(struct fasync_struct **, int, int);
 /* only for net: no internal synchronization */
 extern void __kill_fasync(struct fasync_struct *, int, int);
 
-extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force);
+extern int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
+		uid_t uid, uid_t euid, int force);
 extern int f_setown(struct file *filp, unsigned long arg, int force);
 extern void f_delown(struct file *filp);
 extern pid_t f_getown(struct file *filp);
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 02/16][cr][v3]: Add uid, euid params to __f_setown()
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown() Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 03/16][cr][v3]: Checkpoint file-owner information Sukadev Bhattiprolu
                   ` (15 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Instead of letting __f_setown() use the UID and EUID of the calling process,
pass them in as parameters.

This modified interface will be useful when checkpointing and restarting an
application that has a 'file owner' specified for an open file.  When
checkpointing the application, the UID and EUID of the process setting up
the owner are saved in the checkpoint image.

When the application is restarted, we use the UID and EUID values saved in
the checkpoint-image, rather than that of the calling process.

Changelog[v2]:
	- [Matthew Wilcox] Rather than a new __f_setown_uid() interface add
	  the uid parameters to __f_setown() itself.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 drivers/char/tty_io.c       |    3 ++-
 drivers/net/tun.c           |    3 ++-
 fs/fcntl.c                  |   10 ++++++----
 fs/locks.c                  |    3 ++-
 fs/notify/dnotify/dnotify.c |    3 ++-
 include/linux/fs.h          |    3 ++-
 6 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/drivers/char/tty_io.c b/drivers/char/tty_io.c
index 115c8e4..9c282b2 100644
--- a/drivers/char/tty_io.c
+++ b/drivers/char/tty_io.c
@@ -1969,7 +1969,8 @@ static int tty_fasync(int fd, struct file *filp, int on)
 		}
 		get_pid(pid);
 		spin_unlock_irqrestore(&tty->ctrl_lock, flags);
-		retval = __f_setown(filp, pid, type, 0);
+		retval = __f_setown(filp, pid, type, current_uid(),
+				current_euid(), 0);
 		put_pid(pid);
 		if (retval)
 			goto out;
diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 4326520..dcbc37d 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1400,7 +1400,8 @@ static int tun_chr_fasync(int fd, struct file *file, int on)
 		goto out;
 
 	if (on) {
-		ret = __f_setown(file, task_pid(current), PIDTYPE_PID, 0);
+		ret = __f_setown(file, task_pid(current), PIDTYPE_PID,
+				current_uid(), current_euid(), 0);
 		if (ret)
 			goto out;
 		tun->flags |= TUN_FASYNC;
diff --git a/fs/fcntl.c b/fs/fcntl.c
index aeab1f4..f44327d 100644
--- a/fs/fcntl.c
+++ b/fs/fcntl.c
@@ -214,7 +214,7 @@ static void f_modown(struct file *filp, struct pid *pid, enum pid_type type,
 }
 
 int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
-		int force)
+		uid_t uid, uid_t euid, int force)
 {
 	int err;
 
@@ -222,7 +222,7 @@ int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
 	if (err)
 		return err;
 
-	f_modown(filp, pid, type, current_uid(), current_euid(), force);
+	f_modown(filp, pid, type, uid, euid, force);
 	return 0;
 }
 EXPORT_SYMBOL(__f_setown);
@@ -240,7 +240,8 @@ int f_setown(struct file *filp, unsigned long arg, int force)
 	}
 	rcu_read_lock();
 	pid = find_vpid(who);
-	result = __f_setown(filp, pid, type, force);
+	result = __f_setown(filp, pid, type, current_uid(), current_euid(),
+			force);
 	rcu_read_unlock();
 	return result;
 }
@@ -296,7 +297,8 @@ static int f_setown_ex(struct file *filp, unsigned long arg)
 	if (owner.pid && !pid)
 		ret = -ESRCH;
 	else
-		ret = __f_setown(filp, pid, type, 1);
+		ret = __f_setown(filp, pid, type, current_uid(), 
+				current_euid(), 1);
 	rcu_read_unlock();
 
 	return ret;
diff --git a/fs/locks.c b/fs/locks.c
index 9cd859e..ca0c7e2 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1514,7 +1514,8 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 		goto out_unlock;
 	}
 
-	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
+	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, current_uid(),
+				current_euid(), 0);
 out_unlock:
 	unlock_kernel();
 	return error;
diff --git a/fs/notify/dnotify/dnotify.c b/fs/notify/dnotify/dnotify.c
index 0a63bf6..3e025e5 100644
--- a/fs/notify/dnotify/dnotify.c
+++ b/fs/notify/dnotify/dnotify.c
@@ -409,7 +409,8 @@ int fcntl_dirnotify(int fd, struct file *filp, unsigned long arg)
 		goto out;
 	}
 
-	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, 0);
+	error = __f_setown(filp, task_pid(current), PIDTYPE_PID, current_uid(),
+				current_euid(), 0);
 	if (error) {
 		/* if we added, we must shoot */
 		if (dnentry == new_dnentry)
diff --git a/include/linux/fs.h b/include/linux/fs.h
index ee725ff..b4a6fb0 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1304,7 +1304,8 @@ extern void kill_fasync(struct fasync_struct **, int, int);
 /* only for net: no internal synchronization */
 extern void __kill_fasync(struct fasync_struct *, int, int);
 
-extern int __f_setown(struct file *filp, struct pid *, enum pid_type, int force);
+extern int __f_setown(struct file *filp, struct pid *pid, enum pid_type type,
+		uid_t uid, uid_t euid, int force);
 extern int f_setown(struct file *filp, unsigned long arg, int force);
 extern void f_delown(struct file *filp);
 extern pid_t f_getown(struct file *filp);
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 03/16][cr][v3]: Checkpoint file-owner information
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2010-08-03 23:11   ` [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown() Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 02/16][cr][v3]: Add uid, euid params to __f_setown() Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 04/16][cr][v3]: Restore file_owner info Sukadev Bhattiprolu
                     ` (13 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Checkpoint the file->f_owner information for an open file. This
information will be used to restore the file-owner information
when the application is restarted from the checkpoint.

The file->f_owner information is "private" to each 'struct file' i.e.
fown_struct is not an external object shared with other file structures.
So the information can directly be added to the 'ckpt_hdr_file' object.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/checkpoint.c                |   27 +++++++++++----------------
 include/linux/checkpoint_hdr.h |    5 +++++
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index 87d7c6e..ce1b4af 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -168,12 +168,19 @@ int checkpoint_file_common(struct ckpt_ctx *ctx, struct file *file,
 			   struct ckpt_hdr_file *h)
 {
 	struct cred *f_cred = (struct cred *) file->f_cred;
+	struct pid *pid = file->f_owner.pid;
 
 	h->f_flags = file->f_flags;
 	h->f_mode = file->f_mode;
 	h->f_pos = file->f_pos;
 	h->f_version = file->f_version;
 
+	h->f_owner_pid = pid_nr_ns(pid, ns_of_pid(pid));
+	h->f_owner_pid_type = file->f_owner.pid_type;
+	h->f_owner_uid = file->f_owner.uid;
+	h->f_owner_euid = file->f_owner.euid;
+	h->f_owner_signum = file->f_owner.signum;
+
 	h->f_credref = checkpoint_obj(ctx, f_cred, CKPT_OBJ_CRED);
 	if (h->f_credref < 0)
 		return h->f_credref;
@@ -184,10 +191,10 @@ int checkpoint_file_common(struct ckpt_ctx *ctx, struct file *file,
 		return h->f_secref;
 	}
 
-	ckpt_debug("file %s credref %d secref %d\n",
-		file->f_dentry->d_name.name, h->f_credref, h->f_secref);
-
-	/* FIX: need also file->f_owner, etc */
+	ckpt_debug("file %s credref %d secref %d, fowner-pid %d, type %d, "
+			"fowner-signum %d\n", file->f_dentry->d_name.name,
+			h->f_credref, h->f_secref, h->f_owner_pid,
+			h->f_owner_pid_type, h->f_owner_signum);
 
 	return 0;
 }
@@ -267,7 +274,6 @@ static int checkpoint_file_desc(struct ckpt_ctx *ctx,
 	struct fdtable *fdt;
 	int objref, ret;
 	int coe = 0;	/* avoid gcc warning */
-	pid_t pid;
 
 	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_FILE_DESC);
 	if (!h)
@@ -302,17 +308,6 @@ static int checkpoint_file_desc(struct ckpt_ctx *ctx,
 	}
 
 	/*
-	 * TODO: Implement c/r of fowner and f_sigio.  Should be
-	 * trivial, but for now we just refuse its checkpoint
-	 */
-	pid = f_getown(file);
-	if (pid) {
-		ret = -EBUSY;
-		ckpt_err(ctx, ret, "%(T)fd %d has an owner (%d)\n", fd);
-		goto out;
-	}
-
-	/*
 	 * if seen first time, this will add 'file' to the objhash, keep
 	 * a reference to it, dump its state while at it.
 	 */
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 9e8d518..0381019 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -575,6 +575,11 @@ struct ckpt_hdr_file {
 	__u64 f_pos;
 	__u64 f_version;
 	__s32 f_secref;
+	__s32 f_owner_pid;
+	__u32 f_owner_pid_type;
+	__u32 f_owner_uid;
+	__u32 f_owner_euid;
+	__s32 f_owner_signum;
 } __attribute__((aligned(8)));
 
 struct ckpt_hdr_file_generic {
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 03/16][cr][v3]: Checkpoint file-owner information
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown() Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 02/16][cr][v3]: Add uid, euid params to __f_setown() Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 04/16][cr][v3]: Restore file_owner info Sukadev Bhattiprolu
                   ` (14 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Checkpoint the file->f_owner information for an open file. This
information will be used to restore the file-owner information
when the application is restarted from the checkpoint.

The file->f_owner information is "private" to each 'struct file' i.e.
fown_struct is not an external object shared with other file structures.
So the information can directly be added to the 'ckpt_hdr_file' object.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/checkpoint.c                |   27 +++++++++++----------------
 include/linux/checkpoint_hdr.h |    5 +++++
 2 files changed, 16 insertions(+), 16 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index 87d7c6e..ce1b4af 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -168,12 +168,19 @@ int checkpoint_file_common(struct ckpt_ctx *ctx, struct file *file,
 			   struct ckpt_hdr_file *h)
 {
 	struct cred *f_cred = (struct cred *) file->f_cred;
+	struct pid *pid = file->f_owner.pid;
 
 	h->f_flags = file->f_flags;
 	h->f_mode = file->f_mode;
 	h->f_pos = file->f_pos;
 	h->f_version = file->f_version;
 
+	h->f_owner_pid = pid_nr_ns(pid, ns_of_pid(pid));
+	h->f_owner_pid_type = file->f_owner.pid_type;
+	h->f_owner_uid = file->f_owner.uid;
+	h->f_owner_euid = file->f_owner.euid;
+	h->f_owner_signum = file->f_owner.signum;
+
 	h->f_credref = checkpoint_obj(ctx, f_cred, CKPT_OBJ_CRED);
 	if (h->f_credref < 0)
 		return h->f_credref;
@@ -184,10 +191,10 @@ int checkpoint_file_common(struct ckpt_ctx *ctx, struct file *file,
 		return h->f_secref;
 	}
 
-	ckpt_debug("file %s credref %d secref %d\n",
-		file->f_dentry->d_name.name, h->f_credref, h->f_secref);
-
-	/* FIX: need also file->f_owner, etc */
+	ckpt_debug("file %s credref %d secref %d, fowner-pid %d, type %d, "
+			"fowner-signum %d\n", file->f_dentry->d_name.name,
+			h->f_credref, h->f_secref, h->f_owner_pid,
+			h->f_owner_pid_type, h->f_owner_signum);
 
 	return 0;
 }
@@ -267,7 +274,6 @@ static int checkpoint_file_desc(struct ckpt_ctx *ctx,
 	struct fdtable *fdt;
 	int objref, ret;
 	int coe = 0;	/* avoid gcc warning */
-	pid_t pid;
 
 	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_FILE_DESC);
 	if (!h)
@@ -302,17 +308,6 @@ static int checkpoint_file_desc(struct ckpt_ctx *ctx,
 	}
 
 	/*
-	 * TODO: Implement c/r of fowner and f_sigio.  Should be
-	 * trivial, but for now we just refuse its checkpoint
-	 */
-	pid = f_getown(file);
-	if (pid) {
-		ret = -EBUSY;
-		ckpt_err(ctx, ret, "%(T)fd %d has an owner (%d)\n", fd);
-		goto out;
-	}
-
-	/*
 	 * if seen first time, this will add 'file' to the objhash, keep
 	 * a reference to it, dump its state while at it.
 	 */
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 9e8d518..0381019 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -575,6 +575,11 @@ struct ckpt_hdr_file {
 	__u64 f_pos;
 	__u64 f_version;
 	__s32 f_secref;
+	__s32 f_owner_pid;
+	__u32 f_owner_pid_type;
+	__u32 f_owner_uid;
+	__u32 f_owner_euid;
+	__s32 f_owner_signum;
 } __attribute__((aligned(8)));
 
 struct ckpt_hdr_file_generic {
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 04/16][cr][v3]: Restore file_owner info
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (2 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 03/16][cr][v3]: Checkpoint file-owner information Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 05/16][cr][v3]: Move file_lock macros into linux/fs.h Sukadev Bhattiprolu
                     ` (12 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Restore the file-owner information for each 'struct file'.  This is
essentially is like a new fcntl(F_SETOWN) and fcntl(F_SETSIG) calls,
except that the pid, uid, euid and signum values are read from the
checkpoint image.

Changelog[v3]:
	- [Oren Laadan]: Ensure find_vpid() found a valid pid before
	  making it the file owner.
Changelog[v2]:
	- [Matt Helsley, Serge Hallyn]: Don't trust uids in checkpoint image.
	  (added CAP_KILL check)
	- Check that signal number read from the checkpoint image is valid.
	  (not sure it is required, since its an incomplete check for tampering)

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/checkpoint.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index ce1b4af..b5486c1 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -618,6 +618,68 @@ static int attach_file(struct file *file)
 	return fd;
 }
 
+static int restore_file_owner(struct ckpt_ctx *ctx, struct ckpt_hdr_file *h,
+		struct file *file)
+{
+	int ret;
+	struct pid *pid;
+	uid_t uid, euid;
+
+	uid = h->f_owner_uid;
+	euid = h->f_owner_euid;
+
+	ckpt_debug("restore_file_owner(): uid %u, euid %u, pid %d, type %d\n",
+			uid, euid, h->f_owner_pid, h->f_owner_pid_type);
+	/*
+	 * We can't trust the uids in the checkpoint image and normally need
+	 * CAP_KILL. But if the uids match our ids, should be fine since we
+	 * have access to the file.
+	 *
+	 * TODO: Move this check to __f_setown() ?
+	 */
+	ret = -EACCES;
+	if (!capable(CAP_KILL) &&
+			(uid != current_uid() || euid != current_euid())) {
+		ckpt_err(ctx, ret, "image uids [%d, %d] don't match current "
+				"process uids [%d, %d] and no CAP_KILL\n",
+				uid, euid, current_uid(), current_euid());
+		return ret;
+	}
+
+	ret = -EINVAL;
+	if (!valid_signal(h->f_owner_signum)) {
+		ckpt_err(ctx, ret, "Invalid signum %d\n", h->f_owner_signum);
+		return ret;
+	}
+	file->f_owner.signum = h->f_owner_signum;
+
+	rcu_read_lock();
+
+	/*
+	 * If file had a non-NULL owner and we can't find the owner after
+	 * restart, return error.
+	 */
+	pid = find_vpid(h->f_owner_pid);
+	if (h->f_owner_pid && !pid)
+		ret = -ESRCH;
+	else {
+		/*
+		 * TODO: Do we need 'force' to be 1 here or can it be 0 ?
+		 * 	 'force' is used to modify the owner, if one is
+		 * 	 already set. Can it be set when we restart an
+		 * 	 application ?
+		 */
+		ret = __f_setown(file, pid, h->f_owner_pid_type, uid, euid, 1);
+	}
+
+	rcu_read_unlock();
+
+	if (ret < 0)
+		ckpt_err(ctx, ret, "__fsetown_uid() failed\n");
+
+	return ret;
+}
+
 #define CKPT_SETFL_MASK  \
 	(O_APPEND | O_NONBLOCK | O_NDELAY | FASYNC | O_DIRECT | O_NOATIME)
 
@@ -651,6 +713,10 @@ int restore_file_common(struct ckpt_ctx *ctx, struct file *file,
 	if (ret < 0)
 		return ret;
 
+	ret = restore_file_owner(ctx, h, file);
+	if (ret < 0)
+		return ret;
+
 	/*
 	 * Normally f_mode is set by open, and modified only via
 	 * fcntl(), so its value now should match that at checkpoint.
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 04/16][cr][v3]: Restore file_owner info
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (2 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 03/16][cr][v3]: Checkpoint file-owner information Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
       [not found]   ` <1280877097-12377-5-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2010-08-04 23:01   ` Oren Laadan
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                   ` (13 subsequent siblings)
  17 siblings, 2 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Restore the file-owner information for each 'struct file'.  This is
essentially is like a new fcntl(F_SETOWN) and fcntl(F_SETSIG) calls,
except that the pid, uid, euid and signum values are read from the
checkpoint image.

Changelog[v3]:
	- [Oren Laadan]: Ensure find_vpid() found a valid pid before
	  making it the file owner.
Changelog[v2]:
	- [Matt Helsley, Serge Hallyn]: Don't trust uids in checkpoint image.
	  (added CAP_KILL check)
	- Check that signal number read from the checkpoint image is valid.
	  (not sure it is required, since its an incomplete check for tampering)

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/checkpoint.c |   66 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 66 insertions(+), 0 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index ce1b4af..b5486c1 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -618,6 +618,68 @@ static int attach_file(struct file *file)
 	return fd;
 }
 
+static int restore_file_owner(struct ckpt_ctx *ctx, struct ckpt_hdr_file *h,
+		struct file *file)
+{
+	int ret;
+	struct pid *pid;
+	uid_t uid, euid;
+
+	uid = h->f_owner_uid;
+	euid = h->f_owner_euid;
+
+	ckpt_debug("restore_file_owner(): uid %u, euid %u, pid %d, type %d\n",
+			uid, euid, h->f_owner_pid, h->f_owner_pid_type);
+	/*
+	 * We can't trust the uids in the checkpoint image and normally need
+	 * CAP_KILL. But if the uids match our ids, should be fine since we
+	 * have access to the file.
+	 *
+	 * TODO: Move this check to __f_setown() ?
+	 */
+	ret = -EACCES;
+	if (!capable(CAP_KILL) &&
+			(uid != current_uid() || euid != current_euid())) {
+		ckpt_err(ctx, ret, "image uids [%d, %d] don't match current "
+				"process uids [%d, %d] and no CAP_KILL\n",
+				uid, euid, current_uid(), current_euid());
+		return ret;
+	}
+
+	ret = -EINVAL;
+	if (!valid_signal(h->f_owner_signum)) {
+		ckpt_err(ctx, ret, "Invalid signum %d\n", h->f_owner_signum);
+		return ret;
+	}
+	file->f_owner.signum = h->f_owner_signum;
+
+	rcu_read_lock();
+
+	/*
+	 * If file had a non-NULL owner and we can't find the owner after
+	 * restart, return error.
+	 */
+	pid = find_vpid(h->f_owner_pid);
+	if (h->f_owner_pid && !pid)
+		ret = -ESRCH;
+	else {
+		/*
+		 * TODO: Do we need 'force' to be 1 here or can it be 0 ?
+		 * 	 'force' is used to modify the owner, if one is
+		 * 	 already set. Can it be set when we restart an
+		 * 	 application ?
+		 */
+		ret = __f_setown(file, pid, h->f_owner_pid_type, uid, euid, 1);
+	}
+
+	rcu_read_unlock();
+
+	if (ret < 0)
+		ckpt_err(ctx, ret, "__fsetown_uid() failed\n");
+
+	return ret;
+}
+
 #define CKPT_SETFL_MASK  \
 	(O_APPEND | O_NONBLOCK | O_NDELAY | FASYNC | O_DIRECT | O_NOATIME)
 
@@ -651,6 +713,10 @@ int restore_file_common(struct ckpt_ctx *ctx, struct file *file,
 	if (ret < 0)
 		return ret;
 
+	ret = restore_file_owner(ctx, h, file);
+	if (ret < 0)
+		return ret;
+
 	/*
 	 * Normally f_mode is set by open, and modified only via
 	 * fcntl(), so its value now should match that at checkpoint.
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 05/16][cr][v3]: Move file_lock macros into linux/fs.h
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (3 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 04/16][cr][v3]: Restore file_owner info Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 06/16][cr][v3]: Checkpoint file-locks Sukadev Bhattiprolu
                     ` (11 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Move IS_POSIX(), IS_FLOCK(), IS_LEASE() and 'for_each_lock()' into
include/linux/fs.h since these are also needed to checkpoint/restart
file-locks.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/locks.c         |    7 -------
 include/linux/fs.h |    7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ca0c7e2..741b8b7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -130,16 +130,9 @@
 
 #include <asm/uaccess.h>
 
-#define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
-#define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
-#define IS_LEASE(fl)	(fl->fl_flags & FL_LEASE)
-
 int leases_enable = 1;
 int lease_break_time = 45;
 
-#define for_each_lock(inode, lockp) \
-	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
-
 static LIST_HEAD(file_lock_list);
 static LIST_HEAD(blocked_list);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b4a6fb0..abd2267 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1088,6 +1088,13 @@ struct file_lock {
 	} fl_u;
 };
 
+#define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
+#define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
+#define IS_LEASE(fl)	(fl->fl_flags & FL_LEASE)
+
+#define for_each_lock(inode, lockp) \
+	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
+
 /* The following constant reflects the upper bound of the file/locking space */
 #ifndef OFFSET_MAX
 #define INT_LIMIT(x)	(~((x)1 << (sizeof(x)*8 - 1)))
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 05/16][cr][v3]: Move file_lock macros into linux/fs.h
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (4 preceding siblings ...)
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 06/16][cr][v3]: Checkpoint file-locks Sukadev Bhattiprolu
                   ` (11 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Move IS_POSIX(), IS_FLOCK(), IS_LEASE() and 'for_each_lock()' into
include/linux/fs.h since these are also needed to checkpoint/restart
file-locks.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/locks.c         |    7 -------
 include/linux/fs.h |    7 +++++++
 2 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index ca0c7e2..741b8b7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -130,16 +130,9 @@
 
 #include <asm/uaccess.h>
 
-#define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
-#define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
-#define IS_LEASE(fl)	(fl->fl_flags & FL_LEASE)
-
 int leases_enable = 1;
 int lease_break_time = 45;
 
-#define for_each_lock(inode, lockp) \
-	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
-
 static LIST_HEAD(file_lock_list);
 static LIST_HEAD(blocked_list);
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index b4a6fb0..abd2267 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1088,6 +1088,13 @@ struct file_lock {
 	} fl_u;
 };
 
+#define IS_POSIX(fl)	(fl->fl_flags & FL_POSIX)
+#define IS_FLOCK(fl)	(fl->fl_flags & FL_FLOCK)
+#define IS_LEASE(fl)	(fl->fl_flags & FL_LEASE)
+
+#define for_each_lock(inode, lockp) \
+	for (lockp = &inode->i_flock; *lockp != NULL; lockp = &(*lockp)->fl_next)
+
 /* The following constant reflects the upper bound of the file/locking space */
 #ifndef OFFSET_MAX
 #define INT_LIMIT(x)	(~((x)1 << (sizeof(x)*8 - 1)))
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 06/16][cr][v3]: Checkpoint file-locks
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (4 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 05/16][cr][v3]: Move file_lock macros into linux/fs.h Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 07/16][cr][v3]: Define flock_set() Sukadev Bhattiprolu
                     ` (10 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

While checkpointing each file-descriptor, find all the locks on the
file and save information about the lock in the checkpoint-image.
A follow-on patch will use this informaiton to restore the file-locks.

Changelog[v3]:
	[Oren Laadan] Add a missing (loff_t) type cast and use a macro
		to set the marker/dummy file lock

Changelog[v2]:
	[Matt Helsley]: Use fixed sizes (__s64) instead of 'loff_t' in
		'struct ckpt_hdr_file_lock'.
	[Matt Helsley, Serge Hallyn]: Highlight new use of BKL (using
		lock_flocks() macros as suggested by Serge).
	[Matt Helsley]: Reorg code a bit to simplify error handling.
	[Matt Helsley]: Reorg code to initialize marker-lock (Pass a
		NULL lock to checkpoint_one_lock() to indicate marker).

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/checkpoint.c                |  100 ++++++++++++++++++++++++++++++++++-----
 include/linux/checkpoint_hdr.h |   18 +++++++
 2 files changed, 105 insertions(+), 13 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index b5486c1..57b6944 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -26,8 +26,19 @@
 #include <linux/checkpoint.h>
 #include <linux/eventpoll.h>
 #include <linux/eventfd.h>
+#include <linux/smp_lock.h>
 #include <net/sock.h>
 
+/*
+ * TODO: This code uses the BKL for consistency with other uses of
+ * 	 'for_each_lock()'. But since the BKL may be replaced with another
+ * 	 lock in the future, use lock_flocks() macros instead. lock_flocks()
+ * 	 are currently used in BKL-fix sand boxes and when those changes
+ * 	 are merged, the following macros can be removed
+ */
+#define lock_flocks()		lock_kernel()
+#define unlock_flocks()	unlock_kernel()
+
 /**************************************************************************
  * Checkpoint
  */
@@ -256,8 +267,78 @@ static int checkpoint_file(struct ckpt_ctx *ctx, void *ptr)
 	return ret;
 }
 
+static int checkpoint_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
+		struct file_lock *lock)
+{
+	int rc;
+	struct ckpt_hdr_file_lock *h;
+
+	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_FILE_LOCK);
+	if (!h)
+		return -ENOMEM;
+
+	if (lock) {
+		h->fl_start = lock->fl_start;
+		h->fl_end = lock->fl_end;
+		h->fl_type = lock->fl_type;
+		h->fl_flags = lock->fl_flags;
+	} else {
+		/* Checkpoint a dummy lock as a marker */
+		CKPT_HDR_SET_MARKER_FILE_LOCK(h);
+	}
+
+	rc = ckpt_write_obj(ctx, &h->h);
+
+	ckpt_hdr_put(ctx, h);
+
+	return rc;
+}
+
+int
+checkpoint_file_locks(struct ckpt_ctx *ctx, struct files_struct *files,
+		struct file *file)
+{
+	int rc;
+	struct inode *inode;
+	struct file_lock **lockpp;
+	struct file_lock *lockp;
+
+	lock_flocks();
+	inode = file->f_path.dentry->d_inode;
+	for_each_lock(inode, lockpp) {
+		lockp = *lockpp;
+		ckpt_debug("Lock [%lld, %lld, %d, 0x%x]\n", lockp->fl_start,
+				lockp->fl_end, lockp->fl_type, lockp->fl_flags);
+
+		if (lockp->fl_owner != files)
+			continue;
+
+		rc = -EBADF;
+		if (IS_POSIX(lockp))
+			rc = checkpoint_one_file_lock(ctx, file, lockp);
+
+		if (rc < 0) {
+			ckpt_err(ctx,  rc, "%(T), checkpoint of lock "
+					"[%lld, %lld, %d, 0x%x] failed\n",
+					lockp->fl_start, lockp->fl_end,
+					lockp->fl_type, lockp->fl_flags);
+			goto out;
+		}
+	}
+
+	/*
+	 * At the end of file-locks for this file, checkpoint a marker.
+	 */
+	rc = checkpoint_one_file_lock(ctx, file, NULL);
+	if (rc < 0)
+		ckpt_err(ctx,  rc, "%(T), checkpoint marker-lock failed\n");
+out:
+	unlock_flocks();
+	return rc;
+}
+
 /**
- * ckpt_write_file_desc - dump the state of a given file descriptor
+ * checkpoint_file_desc - dump the state of a given file descriptor
  * @ctx: checkpoint context
  * @files: files_struct pointer
  * @fd: file descriptor
@@ -288,18 +369,6 @@ static int checkpoint_file_desc(struct ckpt_ctx *ctx,
 	}
 	rcu_read_unlock();
 
-	ret = find_locks_with_owner(file, files);
-	/*
-	 * find_locks_with_owner() returns an error when there
-	 * are no locks found, so we *want* it to return an error
-	 * code.  Its success means we have to fail the checkpoint.
-	 */
-	if (!ret) {
-		ret = -EBADF;
-		ckpt_err(ctx, ret, "%(T)fd %d has file lock or lease\n", fd);
-		goto out;
-	}
-
 	/* sanity check (although this shouldn't happen) */
 	ret = -EBADF;
 	if (!file) {
@@ -323,6 +392,11 @@ static int checkpoint_file_desc(struct ckpt_ctx *ctx,
 	h->fd_close_on_exec = coe;
 
 	ret = ckpt_write_obj(ctx, &h->h);
+	if (ret < 0)
+		goto out;
+
+	ret = checkpoint_file_locks(ctx, files, file);
+
 out:
 	ckpt_hdr_put(ctx, h);
 	if (file)
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 0381019..ad08c8e 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -144,6 +144,8 @@ enum {
 #define CKPT_HDR_TTY_LDISC CKPT_HDR_TTY_LDISC
 	CKPT_HDR_EPOLL_ITEMS,  /* must be after file-table */
 #define CKPT_HDR_EPOLL_ITEMS CKPT_HDR_EPOLL_ITEMS
+	CKPT_HDR_FILE_LOCK,
+#define CKPT_HDR_FILE_LOCK CKPT_HDR_FILE_LOCK
 
 	CKPT_HDR_MM = 401,
 #define CKPT_HDR_MM CKPT_HDR_MM
@@ -586,6 +588,22 @@ struct ckpt_hdr_file_generic {
 	struct ckpt_hdr_file common;
 } __attribute__((aligned(8)));
 
+struct ckpt_hdr_file_lock {
+       struct ckpt_hdr h;
+       __s64 fl_start;
+       __s64 fl_end;
+       __u8 fl_type;
+       __u8 fl_flags;
+};
+
+#define CKPT_HDR_SET_MARKER_FILE_LOCK(h) {		\
+		h->fl_flags = FL_POSIX;		\
+		h->fl_start = (loff_t) -1;	\
+}
+
+#define CKPT_HDR_IS_MARKER_FILE_LOCK(h)		\
+		((h->fl_flags == FL_POSIX) && (h->fl_start == (loff_t) -1))
+
 struct ckpt_hdr_file_pipe {
 	struct ckpt_hdr_file common;
 	__s32 pipe_objref;
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 06/16][cr][v3]: Checkpoint file-locks
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (5 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 05/16][cr][v3]: Move file_lock macros into linux/fs.h Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-04 23:26   ` Oren Laadan
       [not found]   ` <1280877097-12377-7-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2010-08-03 23:11 ` [PATCH 07/16][cr][v3]: Define flock_set() Sukadev Bhattiprolu
                   ` (10 subsequent siblings)
  17 siblings, 2 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

While checkpointing each file-descriptor, find all the locks on the
file and save information about the lock in the checkpoint-image.
A follow-on patch will use this informaiton to restore the file-locks.

Changelog[v3]:
	[Oren Laadan] Add a missing (loff_t) type cast and use a macro
		to set the marker/dummy file lock

Changelog[v2]:
	[Matt Helsley]: Use fixed sizes (__s64) instead of 'loff_t' in
		'struct ckpt_hdr_file_lock'.
	[Matt Helsley, Serge Hallyn]: Highlight new use of BKL (using
		lock_flocks() macros as suggested by Serge).
	[Matt Helsley]: Reorg code a bit to simplify error handling.
	[Matt Helsley]: Reorg code to initialize marker-lock (Pass a
		NULL lock to checkpoint_one_lock() to indicate marker).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/checkpoint.c                |  100 ++++++++++++++++++++++++++++++++++-----
 include/linux/checkpoint_hdr.h |   18 +++++++
 2 files changed, 105 insertions(+), 13 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index b5486c1..57b6944 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -26,8 +26,19 @@
 #include <linux/checkpoint.h>
 #include <linux/eventpoll.h>
 #include <linux/eventfd.h>
+#include <linux/smp_lock.h>
 #include <net/sock.h>
 
+/*
+ * TODO: This code uses the BKL for consistency with other uses of
+ * 	 'for_each_lock()'. But since the BKL may be replaced with another
+ * 	 lock in the future, use lock_flocks() macros instead. lock_flocks()
+ * 	 are currently used in BKL-fix sand boxes and when those changes
+ * 	 are merged, the following macros can be removed
+ */
+#define lock_flocks()		lock_kernel()
+#define unlock_flocks()	unlock_kernel()
+
 /**************************************************************************
  * Checkpoint
  */
@@ -256,8 +267,78 @@ static int checkpoint_file(struct ckpt_ctx *ctx, void *ptr)
 	return ret;
 }
 
+static int checkpoint_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
+		struct file_lock *lock)
+{
+	int rc;
+	struct ckpt_hdr_file_lock *h;
+
+	h = ckpt_hdr_get_type(ctx, sizeof(*h), CKPT_HDR_FILE_LOCK);
+	if (!h)
+		return -ENOMEM;
+
+	if (lock) {
+		h->fl_start = lock->fl_start;
+		h->fl_end = lock->fl_end;
+		h->fl_type = lock->fl_type;
+		h->fl_flags = lock->fl_flags;
+	} else {
+		/* Checkpoint a dummy lock as a marker */
+		CKPT_HDR_SET_MARKER_FILE_LOCK(h);
+	}
+
+	rc = ckpt_write_obj(ctx, &h->h);
+
+	ckpt_hdr_put(ctx, h);
+
+	return rc;
+}
+
+int
+checkpoint_file_locks(struct ckpt_ctx *ctx, struct files_struct *files,
+		struct file *file)
+{
+	int rc;
+	struct inode *inode;
+	struct file_lock **lockpp;
+	struct file_lock *lockp;
+
+	lock_flocks();
+	inode = file->f_path.dentry->d_inode;
+	for_each_lock(inode, lockpp) {
+		lockp = *lockpp;
+		ckpt_debug("Lock [%lld, %lld, %d, 0x%x]\n", lockp->fl_start,
+				lockp->fl_end, lockp->fl_type, lockp->fl_flags);
+
+		if (lockp->fl_owner != files)
+			continue;
+
+		rc = -EBADF;
+		if (IS_POSIX(lockp))
+			rc = checkpoint_one_file_lock(ctx, file, lockp);
+
+		if (rc < 0) {
+			ckpt_err(ctx,  rc, "%(T), checkpoint of lock "
+					"[%lld, %lld, %d, 0x%x] failed\n",
+					lockp->fl_start, lockp->fl_end,
+					lockp->fl_type, lockp->fl_flags);
+			goto out;
+		}
+	}
+
+	/*
+	 * At the end of file-locks for this file, checkpoint a marker.
+	 */
+	rc = checkpoint_one_file_lock(ctx, file, NULL);
+	if (rc < 0)
+		ckpt_err(ctx,  rc, "%(T), checkpoint marker-lock failed\n");
+out:
+	unlock_flocks();
+	return rc;
+}
+
 /**
- * ckpt_write_file_desc - dump the state of a given file descriptor
+ * checkpoint_file_desc - dump the state of a given file descriptor
  * @ctx: checkpoint context
  * @files: files_struct pointer
  * @fd: file descriptor
@@ -288,18 +369,6 @@ static int checkpoint_file_desc(struct ckpt_ctx *ctx,
 	}
 	rcu_read_unlock();
 
-	ret = find_locks_with_owner(file, files);
-	/*
-	 * find_locks_with_owner() returns an error when there
-	 * are no locks found, so we *want* it to return an error
-	 * code.  Its success means we have to fail the checkpoint.
-	 */
-	if (!ret) {
-		ret = -EBADF;
-		ckpt_err(ctx, ret, "%(T)fd %d has file lock or lease\n", fd);
-		goto out;
-	}
-
 	/* sanity check (although this shouldn't happen) */
 	ret = -EBADF;
 	if (!file) {
@@ -323,6 +392,11 @@ static int checkpoint_file_desc(struct ckpt_ctx *ctx,
 	h->fd_close_on_exec = coe;
 
 	ret = ckpt_write_obj(ctx, &h->h);
+	if (ret < 0)
+		goto out;
+
+	ret = checkpoint_file_locks(ctx, files, file);
+
 out:
 	ckpt_hdr_put(ctx, h);
 	if (file)
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index 0381019..ad08c8e 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -144,6 +144,8 @@ enum {
 #define CKPT_HDR_TTY_LDISC CKPT_HDR_TTY_LDISC
 	CKPT_HDR_EPOLL_ITEMS,  /* must be after file-table */
 #define CKPT_HDR_EPOLL_ITEMS CKPT_HDR_EPOLL_ITEMS
+	CKPT_HDR_FILE_LOCK,
+#define CKPT_HDR_FILE_LOCK CKPT_HDR_FILE_LOCK
 
 	CKPT_HDR_MM = 401,
 #define CKPT_HDR_MM CKPT_HDR_MM
@@ -586,6 +588,22 @@ struct ckpt_hdr_file_generic {
 	struct ckpt_hdr_file common;
 } __attribute__((aligned(8)));
 
+struct ckpt_hdr_file_lock {
+       struct ckpt_hdr h;
+       __s64 fl_start;
+       __s64 fl_end;
+       __u8 fl_type;
+       __u8 fl_flags;
+};
+
+#define CKPT_HDR_SET_MARKER_FILE_LOCK(h) {		\
+		h->fl_flags = FL_POSIX;		\
+		h->fl_start = (loff_t) -1;	\
+}
+
+#define CKPT_HDR_IS_MARKER_FILE_LOCK(h)		\
+		((h->fl_flags == FL_POSIX) && (h->fl_start == (loff_t) -1))
+
 struct ckpt_hdr_file_pipe {
 	struct ckpt_hdr_file common;
 	__s32 pipe_objref;
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 07/16][cr][v3]: Define flock_set()
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (5 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 06/16][cr][v3]: Checkpoint file-locks Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 08/16][cr][v3]: Define flock64_set() Sukadev Bhattiprolu
                     ` (9 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Extract core functionality of fcntl_setlk() into a separate function,
flock_set(). flock_set() can be also used when restarting a checkpointed
application and restoring its file-locks.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/locks.c         |   44 +++++++++++++++++++++++++++-----------------
 include/linux/fs.h |    1 +
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 741b8b7..cb83741 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1759,14 +1759,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 	return error;
 }
 
-/* Apply the lock described by l to an open file descriptor.
- * This implements both the F_SETLK and F_SETLKW commands of fcntl().
- */
-int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
-		struct flock __user *l)
+int flock_set(unsigned int fd, struct file *filp, unsigned int cmd,
+		struct flock *flock)
 {
 	struct file_lock *file_lock = locks_alloc_lock();
-	struct flock flock;
 	struct inode *inode;
 	struct file *f;
 	int error;
@@ -1774,13 +1770,6 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
 	if (file_lock == NULL)
 		return -ENOLCK;
 
-	/*
-	 * This might block, so we do it before checking the inode.
-	 */
-	error = -EFAULT;
-	if (copy_from_user(&flock, l, sizeof(flock)))
-		goto out;
-
 	inode = filp->f_path.dentry->d_inode;
 
 	/* Don't allow mandatory locks on files that may be memory mapped
@@ -1792,7 +1781,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
 	}
 
 again:
-	error = flock_to_posix_lock(filp, file_lock, &flock);
+	error = flock_to_posix_lock(filp, file_lock, flock);
 	if (error)
 		goto out;
 	if (cmd == F_SETLKW) {
@@ -1800,7 +1789,7 @@ again:
 	}
 	
 	error = -EBADF;
-	switch (flock.l_type) {
+	switch (flock->l_type) {
 	case F_RDLCK:
 		if (!(filp->f_mode & FMODE_READ))
 			goto out;
@@ -1830,8 +1819,8 @@ again:
 	spin_lock(&current->files->file_lock);
 	f = fcheck(fd);
 	spin_unlock(&current->files->file_lock);
-	if (!error && f != filp && flock.l_type != F_UNLCK) {
-		flock.l_type = F_UNLCK;
+	if (!error && f != filp && flock->l_type != F_UNLCK) {
+		flock->l_type = F_UNLCK;
 		goto again;
 	}
 
@@ -1840,6 +1829,27 @@ out:
 	return error;
 }
 
+/* Apply the lock described by l to an open file descriptor.
+ * This implements both the F_SETLK and F_SETLKW commands of fcntl().
+ */
+int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
+		struct flock __user *l)
+{
+	int error;
+	struct flock flock;
+	
+	/*
+	 * This might block, so we do it before checking the inode
+	 * in flock_set().
+	 */
+	error = -EFAULT;
+	if (copy_from_user(&flock, l, sizeof(flock)))
+		return error;
+
+	return flock_set(fd, filp, cmd, &flock);
+}
+
+
 #if BITS_PER_LONG == 32
 /* Report the first existing lock that would conflict with l.
  * This implements the F_GETLK command of fcntl().
diff --git a/include/linux/fs.h b/include/linux/fs.h
index abd2267..fd66f37 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1112,6 +1112,7 @@ extern void send_sigio(struct fown_struct *fown, int fd, int band);
 extern int fcntl_getlk(struct file *, struct flock __user *);
 extern int fcntl_setlk(unsigned int, struct file *, unsigned int,
 			struct flock __user *);
+extern int flock_set(unsigned int, struct file *, unsigned int, struct flock *);
 
 #if BITS_PER_LONG == 32
 extern int fcntl_getlk64(struct file *, struct flock64 __user *);
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 07/16][cr][v3]: Define flock_set()
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (6 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 06/16][cr][v3]: Checkpoint file-locks Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 08/16][cr][v3]: Define flock64_set() Sukadev Bhattiprolu
                   ` (9 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Extract core functionality of fcntl_setlk() into a separate function,
flock_set(). flock_set() can be also used when restarting a checkpointed
application and restoring its file-locks.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/locks.c         |   44 +++++++++++++++++++++++++++-----------------
 include/linux/fs.h |    1 +
 2 files changed, 28 insertions(+), 17 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 741b8b7..cb83741 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1759,14 +1759,10 @@ static int do_lock_file_wait(struct file *filp, unsigned int cmd,
 	return error;
 }
 
-/* Apply the lock described by l to an open file descriptor.
- * This implements both the F_SETLK and F_SETLKW commands of fcntl().
- */
-int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
-		struct flock __user *l)
+int flock_set(unsigned int fd, struct file *filp, unsigned int cmd,
+		struct flock *flock)
 {
 	struct file_lock *file_lock = locks_alloc_lock();
-	struct flock flock;
 	struct inode *inode;
 	struct file *f;
 	int error;
@@ -1774,13 +1770,6 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
 	if (file_lock == NULL)
 		return -ENOLCK;
 
-	/*
-	 * This might block, so we do it before checking the inode.
-	 */
-	error = -EFAULT;
-	if (copy_from_user(&flock, l, sizeof(flock)))
-		goto out;
-
 	inode = filp->f_path.dentry->d_inode;
 
 	/* Don't allow mandatory locks on files that may be memory mapped
@@ -1792,7 +1781,7 @@ int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
 	}
 
 again:
-	error = flock_to_posix_lock(filp, file_lock, &flock);
+	error = flock_to_posix_lock(filp, file_lock, flock);
 	if (error)
 		goto out;
 	if (cmd == F_SETLKW) {
@@ -1800,7 +1789,7 @@ again:
 	}
 	
 	error = -EBADF;
-	switch (flock.l_type) {
+	switch (flock->l_type) {
 	case F_RDLCK:
 		if (!(filp->f_mode & FMODE_READ))
 			goto out;
@@ -1830,8 +1819,8 @@ again:
 	spin_lock(&current->files->file_lock);
 	f = fcheck(fd);
 	spin_unlock(&current->files->file_lock);
-	if (!error && f != filp && flock.l_type != F_UNLCK) {
-		flock.l_type = F_UNLCK;
+	if (!error && f != filp && flock->l_type != F_UNLCK) {
+		flock->l_type = F_UNLCK;
 		goto again;
 	}
 
@@ -1840,6 +1829,27 @@ out:
 	return error;
 }
 
+/* Apply the lock described by l to an open file descriptor.
+ * This implements both the F_SETLK and F_SETLKW commands of fcntl().
+ */
+int fcntl_setlk(unsigned int fd, struct file *filp, unsigned int cmd,
+		struct flock __user *l)
+{
+	int error;
+	struct flock flock;
+	
+	/*
+	 * This might block, so we do it before checking the inode
+	 * in flock_set().
+	 */
+	error = -EFAULT;
+	if (copy_from_user(&flock, l, sizeof(flock)))
+		return error;
+
+	return flock_set(fd, filp, cmd, &flock);
+}
+
+
 #if BITS_PER_LONG == 32
 /* Report the first existing lock that would conflict with l.
  * This implements the F_GETLK command of fcntl().
diff --git a/include/linux/fs.h b/include/linux/fs.h
index abd2267..fd66f37 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1112,6 +1112,7 @@ extern void send_sigio(struct fown_struct *fown, int fd, int band);
 extern int fcntl_getlk(struct file *, struct flock __user *);
 extern int fcntl_setlk(unsigned int, struct file *, unsigned int,
 			struct flock __user *);
+extern int flock_set(unsigned int, struct file *, unsigned int, struct flock *);
 
 #if BITS_PER_LONG == 32
 extern int fcntl_getlk64(struct file *, struct flock64 __user *);
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 08/16][cr][v3]: Define flock64_set()
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (6 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 07/16][cr][v3]: Define flock_set() Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 09/16][cr][v3]: Restore file-locks Sukadev Bhattiprolu
                     ` (8 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Extract core functionality of fcntl_setlk64() into a separate function,
flock64_set(). flock64_set() can be also used when restarting a checkpointed
application and restoring its file-locks.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/locks.c         |   38 ++++++++++++++++++++++++--------------
 include/linux/fs.h |    2 ++
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index cb83741..c62ab7f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1890,11 +1890,10 @@ out:
 /* Apply the lock described by l to an open file descriptor.
  * This implements both the F_SETLK and F_SETLKW commands of fcntl().
  */
-int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
-		struct flock64 __user *l)
+int flock64_set(unsigned int fd, struct file *filp, unsigned int cmd,
+		struct flock64 *flock)
 {
 	struct file_lock *file_lock = locks_alloc_lock();
-	struct flock64 flock;
 	struct inode *inode;
 	struct file *f;
 	int error;
@@ -1902,13 +1901,6 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
 	if (file_lock == NULL)
 		return -ENOLCK;
 
-	/*
-	 * This might block, so we do it before checking the inode.
-	 */
-	error = -EFAULT;
-	if (copy_from_user(&flock, l, sizeof(flock)))
-		goto out;
-
 	inode = filp->f_path.dentry->d_inode;
 
 	/* Don't allow mandatory locks on files that may be memory mapped
@@ -1920,7 +1912,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
 	}
 
 again:
-	error = flock64_to_posix_lock(filp, file_lock, &flock);
+	error = flock64_to_posix_lock(filp, file_lock, flock);
 	if (error)
 		goto out;
 	if (cmd == F_SETLKW64) {
@@ -1928,7 +1920,7 @@ again:
 	}
 	
 	error = -EBADF;
-	switch (flock.l_type) {
+	switch (flock->l_type) {
 	case F_RDLCK:
 		if (!(filp->f_mode & FMODE_READ))
 			goto out;
@@ -1953,8 +1945,8 @@ again:
 	spin_lock(&current->files->file_lock);
 	f = fcheck(fd);
 	spin_unlock(&current->files->file_lock);
-	if (!error && f != filp && flock.l_type != F_UNLCK) {
-		flock.l_type = F_UNLCK;
+	if (!error && f != filp && flock->l_type != F_UNLCK) {
+		flock->l_type = F_UNLCK;
 		goto again;
 	}
 
@@ -1962,6 +1954,24 @@ out:
 	locks_free_lock(file_lock);
 	return error;
 }
+
+/* Apply the lock described by l to an open file descriptor.
+ * This implements both the F_SETLK and F_SETLKW commands of fcntl().
+ */
+int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
+		struct flock64 __user *l)
+{
+	struct flock64 flock;
+
+	/*
+	 * This might block, so we do it before checking the inode in
+	 * flock64_set().
+	 */
+	if (copy_from_user(&flock, l, sizeof(flock)))
+		return -EFAULT;
+
+	return flock64_set(fd, filp, cmd, &flock);
+}
 #endif /* BITS_PER_LONG == 32 */
 
 /*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd66f37..49d4eeb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1118,6 +1118,8 @@ extern int flock_set(unsigned int, struct file *, unsigned int, struct flock *);
 extern int fcntl_getlk64(struct file *, struct flock64 __user *);
 extern int fcntl_setlk64(unsigned int, struct file *, unsigned int,
 			struct flock64 __user *);
+extern int flock64_set(unsigned int, struct file *, unsigned int,
+			struct flock64 *);
 #endif
 
 extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 08/16][cr][v3]: Define flock64_set()
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (7 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 07/16][cr][v3]: Define flock_set() Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 09/16][cr][v3]: Restore file-locks Sukadev Bhattiprolu
                   ` (8 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Extract core functionality of fcntl_setlk64() into a separate function,
flock64_set(). flock64_set() can be also used when restarting a checkpointed
application and restoring its file-locks.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/locks.c         |   38 ++++++++++++++++++++++++--------------
 include/linux/fs.h |    2 ++
 2 files changed, 26 insertions(+), 14 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index cb83741..c62ab7f 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1890,11 +1890,10 @@ out:
 /* Apply the lock described by l to an open file descriptor.
  * This implements both the F_SETLK and F_SETLKW commands of fcntl().
  */
-int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
-		struct flock64 __user *l)
+int flock64_set(unsigned int fd, struct file *filp, unsigned int cmd,
+		struct flock64 *flock)
 {
 	struct file_lock *file_lock = locks_alloc_lock();
-	struct flock64 flock;
 	struct inode *inode;
 	struct file *f;
 	int error;
@@ -1902,13 +1901,6 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
 	if (file_lock == NULL)
 		return -ENOLCK;
 
-	/*
-	 * This might block, so we do it before checking the inode.
-	 */
-	error = -EFAULT;
-	if (copy_from_user(&flock, l, sizeof(flock)))
-		goto out;
-
 	inode = filp->f_path.dentry->d_inode;
 
 	/* Don't allow mandatory locks on files that may be memory mapped
@@ -1920,7 +1912,7 @@ int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
 	}
 
 again:
-	error = flock64_to_posix_lock(filp, file_lock, &flock);
+	error = flock64_to_posix_lock(filp, file_lock, flock);
 	if (error)
 		goto out;
 	if (cmd == F_SETLKW64) {
@@ -1928,7 +1920,7 @@ again:
 	}
 	
 	error = -EBADF;
-	switch (flock.l_type) {
+	switch (flock->l_type) {
 	case F_RDLCK:
 		if (!(filp->f_mode & FMODE_READ))
 			goto out;
@@ -1953,8 +1945,8 @@ again:
 	spin_lock(&current->files->file_lock);
 	f = fcheck(fd);
 	spin_unlock(&current->files->file_lock);
-	if (!error && f != filp && flock.l_type != F_UNLCK) {
-		flock.l_type = F_UNLCK;
+	if (!error && f != filp && flock->l_type != F_UNLCK) {
+		flock->l_type = F_UNLCK;
 		goto again;
 	}
 
@@ -1962,6 +1954,24 @@ out:
 	locks_free_lock(file_lock);
 	return error;
 }
+
+/* Apply the lock described by l to an open file descriptor.
+ * This implements both the F_SETLK and F_SETLKW commands of fcntl().
+ */
+int fcntl_setlk64(unsigned int fd, struct file *filp, unsigned int cmd,
+		struct flock64 __user *l)
+{
+	struct flock64 flock;
+
+	/*
+	 * This might block, so we do it before checking the inode in
+	 * flock64_set().
+	 */
+	if (copy_from_user(&flock, l, sizeof(flock)))
+		return -EFAULT;
+
+	return flock64_set(fd, filp, cmd, &flock);
+}
 #endif /* BITS_PER_LONG == 32 */
 
 /*
diff --git a/include/linux/fs.h b/include/linux/fs.h
index fd66f37..49d4eeb 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1118,6 +1118,8 @@ extern int flock_set(unsigned int, struct file *, unsigned int, struct flock *);
 extern int fcntl_getlk64(struct file *, struct flock64 __user *);
 extern int fcntl_setlk64(unsigned int, struct file *, unsigned int,
 			struct flock64 __user *);
+extern int flock64_set(unsigned int, struct file *, unsigned int,
+			struct flock64 *);
 #endif
 
 extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 09/16][cr][v3]: Restore file-locks
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (7 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 08/16][cr][v3]: Define flock64_set() Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 10/16][cr][v3]: Initialize ->fl_break_time to 0 Sukadev Bhattiprolu
                     ` (7 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Restore POSIX file-locks of an application from its checkpoint image.

Read the saved file-locks from the checkpoint image and for each POSIX
lock, call flock_set() to set the lock on the file.

As pointed out by Matt Helsley, no special handling is necessary for a
process P2 in the checkpointed container that is blocked on a lock, L1
held by another process P1. Processes in the restarted container begin
execution only after all processes have restored. If the blocked process
P2 is restored first, it will prepare to return an -ERESTARTSYS from the
fcntl() system call, but wait for P1 to be restored. When P1 is restored,
it will re-acquire the lock L1 before P1 and P2 begin actual execution.
This ensures that even if P2 is scheduled to run before P1, P2 will go
back to waiting for the lock L1.

Changelog[v3]:
	- [Oren Laadan]: Use a macro that can be shared with user-space
	  to set/test marker file-lock.

Changelog[v2]:
	- Add support for C/R of F_SETLK64/F_GETLK64

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/checkpoint.c |  170 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 168 insertions(+), 2 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index 57b6944..d76b073 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -927,8 +927,170 @@ static void *restore_file(struct ckpt_ctx *ctx)
 	return (void *)file;
 }
 
+#if BITS_PER_LONG == 32
+
+/*
+ * NOTE: Even if we checkpointed a lock that was set with 'struct flock'
+ * 	 restore the lock using 'struct flock64'. Note that both these lock
+ * 	 types are first converted to a posix_file_lock before processing so
+ * 	 converting to 'struct flock64' is (hopefully) not a problem. 
+ * 	 NFS for instance uses IS_SETLK() instead of cmd == F_SETLK.
+ *
+ * 	 TODO:  Are there filesystems that implement F_SETLK but not F_SETLK64 ?
+ * 	 	If there are, restore_one_file_lock() will fail.
+ */
+static int
+ckpt_hdr_file_lock_to_flock64(struct ckpt_hdr_file_lock *h, struct flock64 *fl)
+{
+	/*
+	 * We checkpoint the 'raw' fl_type which in case of leases includes
+	 * the F_INPROGRESS flag. But for posix-locks, the fl_type should
+	 * be simple.
+	 */
+	switch(h->fl_type) {
+		case F_RDLCK:
+		case F_WRLCK:
+		case F_UNLCK:
+			break;
+		default:
+			ckpt_debug("Bad posix lock type 0x%x ?\n", h->fl_type);
+			return -EINVAL;
+	}
+
+	memset(fl, 0, sizeof(*fl));
+	fl->l_type = h->fl_type;
+	fl->l_start = h->fl_start;
+	fl->l_len = h->fl_end == OFFSET_MAX ? 0 : h->fl_end - h->fl_start + 1;
+	fl->l_whence = SEEK_SET;
+
+	/* TODO: Init ->l_sysid, l_pid fields */
+	ckpt_debug("Restoring filelock [%lld, %lld, %d]\n", fl->l_start,
+			fl->l_len, fl->l_type);
+
+	return 0;
+}
+
+static int restore_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
+		int fd, struct ckpt_hdr_file_lock *h)
+{
+	struct flock64 fl;
+	int ret;
+
+	ret = ckpt_hdr_file_lock_to_flock64(h, &fl);
+	if (ret < 0) {
+		ckpt_err(ctx, ret, "%(T) Unexpected flock\n");
+		return ret;
+	}
+
+	/*
+	 * Use F_SETLK because we should not have to wait for the lock. If
+	 * another process holds the lock, it indicates that filesystem-state
+	 * is not consistent with what it was at checkpoint. In which case we
+	 * better fail.
+	 */
+	ret = flock64_set(fd, file, F_SETLK64, &fl);
+	if (ret)
+		ckpt_err(ctx, ret, "flock64_set(): %d\n", (int)h->fl_type);
+
+	return ret;
+}
+
+#else
+
+static int
+ckpt_hdr_file_lock_to_flock(struct ckpt_hdr_file_lock *h, struct flock *fl)
+{
+	/*
+	 * We checkpoint the 'raw' fl_type which in case of leases includes
+	 * the F_INPROGRESS flag. But for posix-locks, the fl_type should
+	 * be simple.
+	 */
+	switch(h->fl_type) {
+		case F_RDLCK:
+		case F_WRLCK:
+		case F_UNLCK:
+			break;
+		default:
+			ckpt_debug("Bad posix lock type 0x%x ?\n", h->fl_type);
+			return -EINVAL;
+	}
+
+	memset(fl, 0, sizeof(*fl));
+
+	fl->l_type = h->fl_type;
+	fl->l_start = h->fl_start;
+	fl->l_len = fl->fl_end == OFFSET_MAX ? 0 : h->fl_end - h->fl_start + 1;
+	fl->l_whence = SEEK_SET;
+
+	ckpt_debug("Restoring filelock [%lld, %lld, %d]\n", fl->l_start,
+			fl->l_len, fl->l_type);
+
+	/* TODO: Init ->l_sysid, l_pid fields */
+
+	return 0;
+}
+
+static int restore_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
+		int fd, struct ckpt_hdr_file_lock *h)
+{
+	struct flock fl;
+	int ret;
+
+	ret = ckpt_hdr_file_lock_to_flock(h, &fl);
+	if (ret < 0) {
+		ckpt_err(ctx, ret, "%(T) Unexpected flock\n");
+		break;
+	}
+
+	/*
+	 * Use F_SETLK because we should not have to wait for the lock. If
+	 * another process holds the lock, it indicates that filesystem-state
+	 * is not consistent with what it was at checkpoint. In which case we
+	 * better fail.
+	 */
+	ret = flock_set(fd, file, F_SETLK, &fl);
+	if (ret)
+		ckpt_err(ctx, ret, "flock_set(): %d\n", (int)h->fl_type);
+
+	return ret;
+}
+#endif
+
+static int restore_file_locks(struct ckpt_ctx *ctx, struct file *file, int fd)
+{
+	int ret;
+	struct ckpt_hdr_file_lock *h;
+
+	ret = 0;
+	while (!ret) {
+
+		h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_FILE_LOCK);
+		if (IS_ERR(h))
+			return PTR_ERR(h);
+
+		ckpt_debug("Lock [%lld, %lld, %d, 0x%x]\n", h->fl_start,
+				h->fl_end, (int)h->fl_type, h->fl_flags);
+
+		/*
+		 * If it is a dummy-lock, we are done with this fd.
+		 */
+		if (CKPT_HDR_IS_MARKER_FILE_LOCK(h)) {
+			ckpt_debug("Found last lock for fd\n");
+			break;
+		}
+
+		ret = -EBADF;
+		if (h->fl_flags & FL_POSIX)
+			ret = restore_one_file_lock(ctx, file, fd, h); 
+
+		if (ret < 0)
+			ckpt_err(ctx, ret, "%(T) fl_flags 0x%x\n", h->fl_flags);
+	}
+	return ret;
+}
+
 /**
- * ckpt_read_file_desc - restore the state of a given file descriptor
+ * restore_file_desc - restore the state of a given file descriptor
  * @ctx: checkpoint context
  *
  * Restores the state of a file descriptor; looks up the objref (in the
@@ -974,7 +1136,11 @@ static int restore_file_desc(struct ckpt_ctx *ctx)
 	}
 
 	set_close_on_exec(h->fd_descriptor, h->fd_close_on_exec);
-	ret = 0;
+
+	ret = restore_file_locks(ctx, file, h->fd_descriptor);
+	if (ret < 0)
+		ckpt_err(ctx, ret, "Error on fd %d\n", h->fd_descriptor);
+
  out:
 	ckpt_hdr_put(ctx, h);
 	return ret;
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 09/16][cr][v3]: Restore file-locks
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (8 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 08/16][cr][v3]: Define flock64_set() Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 10/16][cr][v3]: Initialize ->fl_break_time to 0 Sukadev Bhattiprolu
                   ` (7 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Restore POSIX file-locks of an application from its checkpoint image.

Read the saved file-locks from the checkpoint image and for each POSIX
lock, call flock_set() to set the lock on the file.

As pointed out by Matt Helsley, no special handling is necessary for a
process P2 in the checkpointed container that is blocked on a lock, L1
held by another process P1. Processes in the restarted container begin
execution only after all processes have restored. If the blocked process
P2 is restored first, it will prepare to return an -ERESTARTSYS from the
fcntl() system call, but wait for P1 to be restored. When P1 is restored,
it will re-acquire the lock L1 before P1 and P2 begin actual execution.
This ensures that even if P2 is scheduled to run before P1, P2 will go
back to waiting for the lock L1.

Changelog[v3]:
	- [Oren Laadan]: Use a macro that can be shared with user-space
	  to set/test marker file-lock.

Changelog[v2]:
	- Add support for C/R of F_SETLK64/F_GETLK64

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/checkpoint.c |  170 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 168 insertions(+), 2 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index 57b6944..d76b073 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -927,8 +927,170 @@ static void *restore_file(struct ckpt_ctx *ctx)
 	return (void *)file;
 }
 
+#if BITS_PER_LONG == 32
+
+/*
+ * NOTE: Even if we checkpointed a lock that was set with 'struct flock'
+ * 	 restore the lock using 'struct flock64'. Note that both these lock
+ * 	 types are first converted to a posix_file_lock before processing so
+ * 	 converting to 'struct flock64' is (hopefully) not a problem. 
+ * 	 NFS for instance uses IS_SETLK() instead of cmd == F_SETLK.
+ *
+ * 	 TODO:  Are there filesystems that implement F_SETLK but not F_SETLK64 ?
+ * 	 	If there are, restore_one_file_lock() will fail.
+ */
+static int
+ckpt_hdr_file_lock_to_flock64(struct ckpt_hdr_file_lock *h, struct flock64 *fl)
+{
+	/*
+	 * We checkpoint the 'raw' fl_type which in case of leases includes
+	 * the F_INPROGRESS flag. But for posix-locks, the fl_type should
+	 * be simple.
+	 */
+	switch(h->fl_type) {
+		case F_RDLCK:
+		case F_WRLCK:
+		case F_UNLCK:
+			break;
+		default:
+			ckpt_debug("Bad posix lock type 0x%x ?\n", h->fl_type);
+			return -EINVAL;
+	}
+
+	memset(fl, 0, sizeof(*fl));
+	fl->l_type = h->fl_type;
+	fl->l_start = h->fl_start;
+	fl->l_len = h->fl_end == OFFSET_MAX ? 0 : h->fl_end - h->fl_start + 1;
+	fl->l_whence = SEEK_SET;
+
+	/* TODO: Init ->l_sysid, l_pid fields */
+	ckpt_debug("Restoring filelock [%lld, %lld, %d]\n", fl->l_start,
+			fl->l_len, fl->l_type);
+
+	return 0;
+}
+
+static int restore_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
+		int fd, struct ckpt_hdr_file_lock *h)
+{
+	struct flock64 fl;
+	int ret;
+
+	ret = ckpt_hdr_file_lock_to_flock64(h, &fl);
+	if (ret < 0) {
+		ckpt_err(ctx, ret, "%(T) Unexpected flock\n");
+		return ret;
+	}
+
+	/*
+	 * Use F_SETLK because we should not have to wait for the lock. If
+	 * another process holds the lock, it indicates that filesystem-state
+	 * is not consistent with what it was at checkpoint. In which case we
+	 * better fail.
+	 */
+	ret = flock64_set(fd, file, F_SETLK64, &fl);
+	if (ret)
+		ckpt_err(ctx, ret, "flock64_set(): %d\n", (int)h->fl_type);
+
+	return ret;
+}
+
+#else
+
+static int
+ckpt_hdr_file_lock_to_flock(struct ckpt_hdr_file_lock *h, struct flock *fl)
+{
+	/*
+	 * We checkpoint the 'raw' fl_type which in case of leases includes
+	 * the F_INPROGRESS flag. But for posix-locks, the fl_type should
+	 * be simple.
+	 */
+	switch(h->fl_type) {
+		case F_RDLCK:
+		case F_WRLCK:
+		case F_UNLCK:
+			break;
+		default:
+			ckpt_debug("Bad posix lock type 0x%x ?\n", h->fl_type);
+			return -EINVAL;
+	}
+
+	memset(fl, 0, sizeof(*fl));
+
+	fl->l_type = h->fl_type;
+	fl->l_start = h->fl_start;
+	fl->l_len = fl->fl_end == OFFSET_MAX ? 0 : h->fl_end - h->fl_start + 1;
+	fl->l_whence = SEEK_SET;
+
+	ckpt_debug("Restoring filelock [%lld, %lld, %d]\n", fl->l_start,
+			fl->l_len, fl->l_type);
+
+	/* TODO: Init ->l_sysid, l_pid fields */
+
+	return 0;
+}
+
+static int restore_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
+		int fd, struct ckpt_hdr_file_lock *h)
+{
+	struct flock fl;
+	int ret;
+
+	ret = ckpt_hdr_file_lock_to_flock(h, &fl);
+	if (ret < 0) {
+		ckpt_err(ctx, ret, "%(T) Unexpected flock\n");
+		break;
+	}
+
+	/*
+	 * Use F_SETLK because we should not have to wait for the lock. If
+	 * another process holds the lock, it indicates that filesystem-state
+	 * is not consistent with what it was at checkpoint. In which case we
+	 * better fail.
+	 */
+	ret = flock_set(fd, file, F_SETLK, &fl);
+	if (ret)
+		ckpt_err(ctx, ret, "flock_set(): %d\n", (int)h->fl_type);
+
+	return ret;
+}
+#endif
+
+static int restore_file_locks(struct ckpt_ctx *ctx, struct file *file, int fd)
+{
+	int ret;
+	struct ckpt_hdr_file_lock *h;
+
+	ret = 0;
+	while (!ret) {
+
+		h = ckpt_read_obj_type(ctx, sizeof(*h), CKPT_HDR_FILE_LOCK);
+		if (IS_ERR(h))
+			return PTR_ERR(h);
+
+		ckpt_debug("Lock [%lld, %lld, %d, 0x%x]\n", h->fl_start,
+				h->fl_end, (int)h->fl_type, h->fl_flags);
+
+		/*
+		 * If it is a dummy-lock, we are done with this fd.
+		 */
+		if (CKPT_HDR_IS_MARKER_FILE_LOCK(h)) {
+			ckpt_debug("Found last lock for fd\n");
+			break;
+		}
+
+		ret = -EBADF;
+		if (h->fl_flags & FL_POSIX)
+			ret = restore_one_file_lock(ctx, file, fd, h); 
+
+		if (ret < 0)
+			ckpt_err(ctx, ret, "%(T) fl_flags 0x%x\n", h->fl_flags);
+	}
+	return ret;
+}
+
 /**
- * ckpt_read_file_desc - restore the state of a given file descriptor
+ * restore_file_desc - restore the state of a given file descriptor
  * @ctx: checkpoint context
  *
  * Restores the state of a file descriptor; looks up the objref (in the
@@ -974,7 +1136,11 @@ static int restore_file_desc(struct ckpt_ctx *ctx)
 	}
 
 	set_close_on_exec(h->fd_descriptor, h->fd_close_on_exec);
-	ret = 0;
+
+	ret = restore_file_locks(ctx, file, h->fd_descriptor);
+	if (ret < 0)
+		ckpt_err(ctx, ret, "Error on fd %d\n", h->fd_descriptor);
+
  out:
 	ckpt_hdr_put(ctx, h);
 	return ret;
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 10/16][cr][v3]: Initialize ->fl_break_time to 0
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (8 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 09/16][cr][v3]: Restore file-locks Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 11/16][cr][v3]: Add ->fl_type_prev field Sukadev Bhattiprolu
                     ` (6 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Explicitly initialize ->fl_break_time to 0 in locks_init_lock() and
__locks_copy_lock(). ->fl_break_time will be later used during
checkpoint/restart of an applicaton that has a file-lease.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/locks.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index c62ab7f..0bd5af7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -185,6 +185,7 @@ void locks_init_lock(struct file_lock *fl)
 	fl->fl_flags = 0;
 	fl->fl_type = 0;
 	fl->fl_start = fl->fl_end = 0;
+	fl->fl_break_time = 0UL;
 	fl->fl_ops = NULL;
 	fl->fl_lmops = NULL;
 }
@@ -228,6 +229,7 @@ void __locks_copy_lock(struct file_lock *new, const struct file_lock *fl)
 	new->fl_type = fl->fl_type;
 	new->fl_start = fl->fl_start;
 	new->fl_end = fl->fl_end;
+	new->fl_break_time = 0UL;
 	new->fl_ops = NULL;
 	new->fl_lmops = NULL;
 }
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 10/16][cr][v3]: Initialize ->fl_break_time to 0
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (9 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 09/16][cr][v3]: Restore file-locks Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 11/16][cr][v3]: Add ->fl_type_prev field Sukadev Bhattiprolu
                   ` (6 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Explicitly initialize ->fl_break_time to 0 in locks_init_lock() and
__locks_copy_lock(). ->fl_break_time will be later used during
checkpoint/restart of an applicaton that has a file-lease.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/locks.c |    2 ++
 1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index c62ab7f..0bd5af7 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -185,6 +185,7 @@ void locks_init_lock(struct file_lock *fl)
 	fl->fl_flags = 0;
 	fl->fl_type = 0;
 	fl->fl_start = fl->fl_end = 0;
+	fl->fl_break_time = 0UL;
 	fl->fl_ops = NULL;
 	fl->fl_lmops = NULL;
 }
@@ -228,6 +229,7 @@ void __locks_copy_lock(struct file_lock *new, const struct file_lock *fl)
 	new->fl_type = fl->fl_type;
 	new->fl_start = fl->fl_start;
 	new->fl_end = fl->fl_end;
+	new->fl_break_time = 0UL;
 	new->fl_ops = NULL;
 	new->fl_lmops = NULL;
 }
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 11/16][cr][v3]: Add ->fl_type_prev field.
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (9 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 10/16][cr][v3]: Initialize ->fl_break_time to 0 Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 12/16][cr][v3]: Add ->fl_break_notified field Sukadev Bhattiprolu
                     ` (5 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

In preparation for checkpoint/restart of file leases, add ->fl_type_prev
field to 'struct file_lock'. This field is needed to correctly restore
file leases in case of recursive checkpoint/restart of an in-progress
lease.

This ->fl_type_prev is only initialized in this patch. It will actually
be used when restoring file leases after a checkpoint.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/locks.c         |   10 ++++++++++
 include/linux/fs.h |    1 +
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 0bd5af7..9a00876 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -184,6 +184,7 @@ void locks_init_lock(struct file_lock *fl)
 	fl->fl_file = NULL;
 	fl->fl_flags = 0;
 	fl->fl_type = 0;
+	fl->fl_type_prev = 0;
 	fl->fl_start = fl->fl_end = 0;
 	fl->fl_break_time = 0UL;
 	fl->fl_ops = NULL;
@@ -227,6 +228,7 @@ void __locks_copy_lock(struct file_lock *new, const struct file_lock *fl)
 	new->fl_file = NULL;
 	new->fl_flags = fl->fl_flags;
 	new->fl_type = fl->fl_type;
+	new->fl_type_prev = fl->fl_type_prev;
 	new->fl_start = fl->fl_start;
 	new->fl_end = fl->fl_end;
 	new->fl_break_time = 0UL;
@@ -293,6 +295,13 @@ static int assign_type(struct file_lock *fl, int type)
 	case F_WRLCK:
 	case F_UNLCK:
 		fl->fl_type = type;
+		/*
+		 * Clear ->fl_type_prev since this is a new lease type.
+		 * break_lease() will use this cleared state to know
+		 * if it must save the lease-type in case of checkpoint/
+		 * restart.
+		 */
+		fl->fl_type_prev = 0;
 		break;
 	default:
 		return -EINVAL;
@@ -1222,6 +1231,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
 
 	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
 		if (fl->fl_type != future) {
+			fl->fl_type_prev = fl->fl_type;
 			fl->fl_type = future;
 			fl->fl_break_time = break_time;
 			/* lease must have lmops break callback */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 49d4eeb..299cc09 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1066,6 +1066,7 @@ struct file_lock {
 	fl_owner_t fl_owner;
 	unsigned char fl_flags;
 	unsigned char fl_type;
+	unsigned char fl_type_prev;
 	unsigned int fl_pid;
 	struct pid *fl_nspid;
 	wait_queue_head_t fl_wait;
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 11/16][cr][v3]: Add ->fl_type_prev field.
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (10 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 10/16][cr][v3]: Initialize ->fl_break_time to 0 Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 12/16][cr][v3]: Add ->fl_break_notified field Sukadev Bhattiprolu
                   ` (5 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

In preparation for checkpoint/restart of file leases, add ->fl_type_prev
field to 'struct file_lock'. This field is needed to correctly restore
file leases in case of recursive checkpoint/restart of an in-progress
lease.

This ->fl_type_prev is only initialized in this patch. It will actually
be used when restoring file leases after a checkpoint.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/locks.c         |   10 ++++++++++
 include/linux/fs.h |    1 +
 2 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 0bd5af7..9a00876 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -184,6 +184,7 @@ void locks_init_lock(struct file_lock *fl)
 	fl->fl_file = NULL;
 	fl->fl_flags = 0;
 	fl->fl_type = 0;
+	fl->fl_type_prev = 0;
 	fl->fl_start = fl->fl_end = 0;
 	fl->fl_break_time = 0UL;
 	fl->fl_ops = NULL;
@@ -227,6 +228,7 @@ void __locks_copy_lock(struct file_lock *new, const struct file_lock *fl)
 	new->fl_file = NULL;
 	new->fl_flags = fl->fl_flags;
 	new->fl_type = fl->fl_type;
+	new->fl_type_prev = fl->fl_type_prev;
 	new->fl_start = fl->fl_start;
 	new->fl_end = fl->fl_end;
 	new->fl_break_time = 0UL;
@@ -293,6 +295,13 @@ static int assign_type(struct file_lock *fl, int type)
 	case F_WRLCK:
 	case F_UNLCK:
 		fl->fl_type = type;
+		/*
+		 * Clear ->fl_type_prev since this is a new lease type.
+		 * break_lease() will use this cleared state to know
+		 * if it must save the lease-type in case of checkpoint/
+		 * restart.
+		 */
+		fl->fl_type_prev = 0;
 		break;
 	default:
 		return -EINVAL;
@@ -1222,6 +1231,7 @@ int __break_lease(struct inode *inode, unsigned int mode)
 
 	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
 		if (fl->fl_type != future) {
+			fl->fl_type_prev = fl->fl_type;
 			fl->fl_type = future;
 			fl->fl_break_time = break_time;
 			/* lease must have lmops break callback */
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 49d4eeb..299cc09 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1066,6 +1066,7 @@ struct file_lock {
 	fl_owner_t fl_owner;
 	unsigned char fl_flags;
 	unsigned char fl_type;
+	unsigned char fl_type_prev;
 	unsigned int fl_pid;
 	struct pid *fl_nspid;
 	wait_queue_head_t fl_wait;
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 12/16][cr][v3]: Add ->fl_break_notified field.
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (10 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 11/16][cr][v3]: Add ->fl_type_prev field Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 13/16][cr][v3]: Add jiffies_begin field to ckpt_ctx Sukadev Bhattiprolu
                     ` (4 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

The file_lock->fl_break_notified field will be used when checkpointing and
restarting a file-lease. It is needed if the application was checkpointed
during the lease-break-protocol, to "remember" if the application was notified
of the lease-break.

This patch just initializes the ->fl_break_notified field. It will be used
in a follow-on patch, when checkpoint/restart of file-leases are implemented.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/locks.c         |    7 +++++++
 include/linux/fs.h |    1 +
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 9a00876..b5eb4c0 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -185,6 +185,7 @@ void locks_init_lock(struct file_lock *fl)
 	fl->fl_flags = 0;
 	fl->fl_type = 0;
 	fl->fl_type_prev = 0;
+	fl->fl_break_notified = 0;
 	fl->fl_start = fl->fl_end = 0;
 	fl->fl_break_time = 0UL;
 	fl->fl_ops = NULL;
@@ -230,6 +231,7 @@ void __locks_copy_lock(struct file_lock *new, const struct file_lock *fl)
 	new->fl_type = fl->fl_type;
 	new->fl_type_prev = fl->fl_type_prev;
 	new->fl_start = fl->fl_start;
+	new->fl_break_notified = fl->fl_break_notified;
 	new->fl_end = fl->fl_end;
 	new->fl_break_time = 0UL;
 	new->fl_ops = NULL;
@@ -458,6 +460,7 @@ static int lease_init(struct file *filp, int type, struct file_lock *fl)
 
 	fl->fl_file = filp;
 	fl->fl_flags = FL_LEASE;
+	fl->fl_break_notified = 0;
 	fl->fl_start = 0;
 	fl->fl_end = OFFSET_MAX;
 	fl->fl_ops = NULL;
@@ -1141,6 +1144,8 @@ int lease_modify(struct file_lock **before, int arg)
 	struct file_lock *fl = *before;
 	int error = assign_type(fl, arg);
 
+	fl->fl_break_notified = 0;
+
 	if (error)
 		return error;
 	locks_wake_up_blocks(fl);
@@ -1234,8 +1239,10 @@ int __break_lease(struct inode *inode, unsigned int mode)
 			fl->fl_type_prev = fl->fl_type;
 			fl->fl_type = future;
 			fl->fl_break_time = break_time;
+
 			/* lease must have lmops break callback */
 			fl->fl_lmops->fl_break(fl);
+			fl->fl_break_notified = 1;
 		}
 	}
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 299cc09..c21f002 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1067,6 +1067,7 @@ struct file_lock {
 	unsigned char fl_flags;
 	unsigned char fl_type;
 	unsigned char fl_type_prev;
+	unsigned char fl_break_notified;
 	unsigned int fl_pid;
 	struct pid *fl_nspid;
 	wait_queue_head_t fl_wait;
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 12/16][cr][v3]: Add ->fl_break_notified field.
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (11 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 11/16][cr][v3]: Add ->fl_type_prev field Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 13/16][cr][v3]: Add jiffies_begin field to ckpt_ctx Sukadev Bhattiprolu
                   ` (4 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

The file_lock->fl_break_notified field will be used when checkpointing and
restarting a file-lease. It is needed if the application was checkpointed
during the lease-break-protocol, to "remember" if the application was notified
of the lease-break.

This patch just initializes the ->fl_break_notified field. It will be used
in a follow-on patch, when checkpoint/restart of file-leases are implemented.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/locks.c         |    7 +++++++
 include/linux/fs.h |    1 +
 2 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index 9a00876..b5eb4c0 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -185,6 +185,7 @@ void locks_init_lock(struct file_lock *fl)
 	fl->fl_flags = 0;
 	fl->fl_type = 0;
 	fl->fl_type_prev = 0;
+	fl->fl_break_notified = 0;
 	fl->fl_start = fl->fl_end = 0;
 	fl->fl_break_time = 0UL;
 	fl->fl_ops = NULL;
@@ -230,6 +231,7 @@ void __locks_copy_lock(struct file_lock *new, const struct file_lock *fl)
 	new->fl_type = fl->fl_type;
 	new->fl_type_prev = fl->fl_type_prev;
 	new->fl_start = fl->fl_start;
+	new->fl_break_notified = fl->fl_break_notified;
 	new->fl_end = fl->fl_end;
 	new->fl_break_time = 0UL;
 	new->fl_ops = NULL;
@@ -458,6 +460,7 @@ static int lease_init(struct file *filp, int type, struct file_lock *fl)
 
 	fl->fl_file = filp;
 	fl->fl_flags = FL_LEASE;
+	fl->fl_break_notified = 0;
 	fl->fl_start = 0;
 	fl->fl_end = OFFSET_MAX;
 	fl->fl_ops = NULL;
@@ -1141,6 +1144,8 @@ int lease_modify(struct file_lock **before, int arg)
 	struct file_lock *fl = *before;
 	int error = assign_type(fl, arg);
 
+	fl->fl_break_notified = 0;
+
 	if (error)
 		return error;
 	locks_wake_up_blocks(fl);
@@ -1234,8 +1239,10 @@ int __break_lease(struct inode *inode, unsigned int mode)
 			fl->fl_type_prev = fl->fl_type;
 			fl->fl_type = future;
 			fl->fl_break_time = break_time;
+
 			/* lease must have lmops break callback */
 			fl->fl_lmops->fl_break(fl);
+			fl->fl_break_notified = 1;
 		}
 	}
 
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 299cc09..c21f002 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1067,6 +1067,7 @@ struct file_lock {
 	unsigned char fl_flags;
 	unsigned char fl_type;
 	unsigned char fl_type_prev;
+	unsigned char fl_break_notified;
 	unsigned int fl_pid;
 	struct pid *fl_nspid;
 	wait_queue_head_t fl_wait;
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 13/16][cr][v3]: Add jiffies_begin field to ckpt_ctx
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (11 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 12/16][cr][v3]: Add ->fl_break_notified field Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 14/16][cr][v3]: Checkpoint file-leases Sukadev Bhattiprolu
                     ` (3 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

'jiffies_begin' is needed to compute relative time-offsets after restart.
We cannot use ->ktime_begin for these computations since, as pointed out
by John Stultz, converting jiffies to CLOCK_MONOTONIC can result in
incorrect values.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 include/linux/checkpoint_types.h |    1 +
 kernel/checkpoint/sys.c          |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/checkpoint_types.h b/include/linux/checkpoint_types.h
index 3ffe9bd..aba9380 100644
--- a/include/linux/checkpoint_types.h
+++ b/include/linux/checkpoint_types.h
@@ -33,6 +33,7 @@ struct ckpt_ctx {
 	int crid;		/* unique checkpoint id */
 
 	ktime_t ktime_begin;	/* checkpoint start time */
+	unsigned long jiffies_begin;
 
 	int root_init;				/* [container] root init ? */
 	pid_t root_pid;				/* [container] root pid */
diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
index b761ec0..0beeee5 100644
--- a/kernel/checkpoint/sys.c
+++ b/kernel/checkpoint/sys.c
@@ -278,6 +278,7 @@ static struct ckpt_ctx *ckpt_ctx_alloc(int fd, unsigned long uflags,
 	ctx->uflags = uflags;
 	ctx->kflags = kflags;
 	ctx->ktime_begin = ktime_get();
+	ctx->jiffies_begin = jiffies;
 	ctx->coord_pidns = get_pid_ns(current->nsproxy->pid_ns);
 
 	atomic_set(&ctx->refcount, 0);
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 13/16][cr][v3]: Add jiffies_begin field to ckpt_ctx
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (12 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 12/16][cr][v3]: Add ->fl_break_notified field Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 14/16][cr][v3]: Checkpoint file-leases Sukadev Bhattiprolu
                   ` (3 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

'jiffies_begin' is needed to compute relative time-offsets after restart.
We cannot use ->ktime_begin for these computations since, as pointed out
by John Stultz, converting jiffies to CLOCK_MONOTONIC can result in
incorrect values.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 include/linux/checkpoint_types.h |    1 +
 kernel/checkpoint/sys.c          |    1 +
 2 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/include/linux/checkpoint_types.h b/include/linux/checkpoint_types.h
index 3ffe9bd..aba9380 100644
--- a/include/linux/checkpoint_types.h
+++ b/include/linux/checkpoint_types.h
@@ -33,6 +33,7 @@ struct ckpt_ctx {
 	int crid;		/* unique checkpoint id */
 
 	ktime_t ktime_begin;	/* checkpoint start time */
+	unsigned long jiffies_begin;
 
 	int root_init;				/* [container] root init ? */
 	pid_t root_pid;				/* [container] root pid */
diff --git a/kernel/checkpoint/sys.c b/kernel/checkpoint/sys.c
index b761ec0..0beeee5 100644
--- a/kernel/checkpoint/sys.c
+++ b/kernel/checkpoint/sys.c
@@ -278,6 +278,7 @@ static struct ckpt_ctx *ckpt_ctx_alloc(int fd, unsigned long uflags,
 	ctx->uflags = uflags;
 	ctx->kflags = kflags;
 	ctx->ktime_begin = ktime_get();
+	ctx->jiffies_begin = jiffies;
 	ctx->coord_pidns = get_pid_ns(current->nsproxy->pid_ns);
 
 	atomic_set(&ctx->refcount, 0);
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 14/16][cr][v3]: Checkpoint file-leases
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (12 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 13/16][cr][v3]: Add jiffies_begin field to ckpt_ctx Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 15/16][cr][v3]: Define do_setlease() Sukadev Bhattiprolu
                     ` (2 subsequent siblings)
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Build upon the C/R of file-locks to checkpoint file-leases. This patch
simply checkpoints the file-lease information. A follow-on patch
will use this checkpoint information to restorethe file leases.

C/R of file leases depends on whether the lease is currently being broken
(i.e F_INPROGRESS is set) or not.  If the file-lease is not being broken,
checkpoint of file-lease is identical to checkpoint of file-locks.

But if the lease is being broken, we checkpoint additional information,
such as:
        - the previous lease type,
        - the time remaining in the lease, and
        - whether we already notified the lease-holder about the lease-break

See follow-on patch ("Restore file-leases") for more details on how this
information is used while restoring file leases and for other semantics
relating to file-leases.

Changelog[v3]:
	- [Oren Laadan, John Stultz]: Use ->jiffies_begin to compute
	  remaining lease
        - Broke-up the draft-patch from previous version into smaller
          patches and addressed comments from Oren Laadan.

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/checkpoint.c                |   28 +++++++++++++++++++++++++---
 include/linux/checkpoint_hdr.h |    3 +++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index d76b073..5e5d0c5 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -267,6 +267,17 @@ static int checkpoint_file(struct ckpt_ctx *ctx, void *ptr)
 	return ret;
 }
 
+static int get_rem_lease(struct ckpt_ctx *ctx, struct file_lock *lock)
+{
+	int rem_lease;
+
+	rem_lease = 0;
+	if ((lock->fl_type & F_INPROGRESS))
+		rem_lease = (lock->fl_break_time - ctx->jiffies_begin) / HZ;
+
+	return rem_lease;
+}
+
 static int checkpoint_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
 		struct file_lock *lock)
 {
@@ -280,8 +291,14 @@ static int checkpoint_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
 	if (lock) {
 		h->fl_start = lock->fl_start;
 		h->fl_end = lock->fl_end;
+		/* checkpoint F_INPROGRESS also, if set */
 		h->fl_type = lock->fl_type;
+		h->fl_type_prev = lock->fl_type_prev;
 		h->fl_flags = lock->fl_flags;
+		if (IS_LEASE(lock)) {
+			h->fl_break_notified = lock->fl_break_notified;
+			h->fl_rem_lease = get_rem_lease(ctx, lock);
+		}
 	} else {
 		/* Checkpoint a dummy lock as a marker */
 		CKPT_HDR_SET_MARKER_FILE_LOCK(h);
@@ -307,14 +324,19 @@ checkpoint_file_locks(struct ckpt_ctx *ctx, struct files_struct *files,
 	inode = file->f_path.dentry->d_inode;
 	for_each_lock(inode, lockpp) {
 		lockp = *lockpp;
-		ckpt_debug("Lock [%lld, %lld, %d, 0x%x]\n", lockp->fl_start,
-				lockp->fl_end, lockp->fl_type, lockp->fl_flags);
+
+		ckpt_debug("Lock [%lld, %lld, %d, 0x%x], type_prev %d, "
+				"break-notified %d, rem-lease %d\n",
+				lockp->fl_start, lockp->fl_end, lockp->fl_type,
+				lockp->fl_flags, lockp->fl_type_prev,
+				lockp->fl_break_notified,
+				get_rem_lease(ctx, lockp));
 
 		if (lockp->fl_owner != files)
 			continue;
 
 		rc = -EBADF;
-		if (IS_POSIX(lockp))
+		if (IS_POSIX(lockp) || IS_LEASE(lockp))
 			rc = checkpoint_one_file_lock(ctx, file, lockp);
 
 		if (rc < 0) {
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index ad08c8e..9526504 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -593,7 +593,10 @@ struct ckpt_hdr_file_lock {
        __s64 fl_start;
        __s64 fl_end;
        __u8 fl_type;
+       __u8 fl_type_prev;
        __u8 fl_flags;
+       __u8 fl_break_notified;
+       __s32 fl_rem_lease;
 };
 
 #define CKPT_HDR_SET_MARKER_FILE_LOCK(h) {		\
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 14/16][cr][v3]: Checkpoint file-leases
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (13 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 13/16][cr][v3]: Add jiffies_begin field to ckpt_ctx Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 15/16][cr][v3]: Define do_setlease() Sukadev Bhattiprolu
                   ` (2 subsequent siblings)
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Build upon the C/R of file-locks to checkpoint file-leases. This patch
simply checkpoints the file-lease information. A follow-on patch
will use this checkpoint information to restorethe file leases.

C/R of file leases depends on whether the lease is currently being broken
(i.e F_INPROGRESS is set) or not.  If the file-lease is not being broken,
checkpoint of file-lease is identical to checkpoint of file-locks.

But if the lease is being broken, we checkpoint additional information,
such as:
        - the previous lease type,
        - the time remaining in the lease, and
        - whether we already notified the lease-holder about the lease-break

See follow-on patch ("Restore file-leases") for more details on how this
information is used while restoring file leases and for other semantics
relating to file-leases.

Changelog[v3]:
	- [Oren Laadan, John Stultz]: Use ->jiffies_begin to compute
	  remaining lease
        - Broke-up the draft-patch from previous version into smaller
          patches and addressed comments from Oren Laadan.

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/checkpoint.c                |   28 +++++++++++++++++++++++++---
 include/linux/checkpoint_hdr.h |    3 +++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index d76b073..5e5d0c5 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -267,6 +267,17 @@ static int checkpoint_file(struct ckpt_ctx *ctx, void *ptr)
 	return ret;
 }
 
+static int get_rem_lease(struct ckpt_ctx *ctx, struct file_lock *lock)
+{
+	int rem_lease;
+
+	rem_lease = 0;
+	if ((lock->fl_type & F_INPROGRESS))
+		rem_lease = (lock->fl_break_time - ctx->jiffies_begin) / HZ;
+
+	return rem_lease;
+}
+
 static int checkpoint_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
 		struct file_lock *lock)
 {
@@ -280,8 +291,14 @@ static int checkpoint_one_file_lock(struct ckpt_ctx *ctx, struct file *file,
 	if (lock) {
 		h->fl_start = lock->fl_start;
 		h->fl_end = lock->fl_end;
+		/* checkpoint F_INPROGRESS also, if set */
 		h->fl_type = lock->fl_type;
+		h->fl_type_prev = lock->fl_type_prev;
 		h->fl_flags = lock->fl_flags;
+		if (IS_LEASE(lock)) {
+			h->fl_break_notified = lock->fl_break_notified;
+			h->fl_rem_lease = get_rem_lease(ctx, lock);
+		}
 	} else {
 		/* Checkpoint a dummy lock as a marker */
 		CKPT_HDR_SET_MARKER_FILE_LOCK(h);
@@ -307,14 +324,19 @@ checkpoint_file_locks(struct ckpt_ctx *ctx, struct files_struct *files,
 	inode = file->f_path.dentry->d_inode;
 	for_each_lock(inode, lockpp) {
 		lockp = *lockpp;
-		ckpt_debug("Lock [%lld, %lld, %d, 0x%x]\n", lockp->fl_start,
-				lockp->fl_end, lockp->fl_type, lockp->fl_flags);
+
+		ckpt_debug("Lock [%lld, %lld, %d, 0x%x], type_prev %d, "
+				"break-notified %d, rem-lease %d\n",
+				lockp->fl_start, lockp->fl_end, lockp->fl_type,
+				lockp->fl_flags, lockp->fl_type_prev,
+				lockp->fl_break_notified,
+				get_rem_lease(ctx, lockp));
 
 		if (lockp->fl_owner != files)
 			continue;
 
 		rc = -EBADF;
-		if (IS_POSIX(lockp))
+		if (IS_POSIX(lockp) || IS_LEASE(lockp))
 			rc = checkpoint_one_file_lock(ctx, file, lockp);
 
 		if (rc < 0) {
diff --git a/include/linux/checkpoint_hdr.h b/include/linux/checkpoint_hdr.h
index ad08c8e..9526504 100644
--- a/include/linux/checkpoint_hdr.h
+++ b/include/linux/checkpoint_hdr.h
@@ -593,7 +593,10 @@ struct ckpt_hdr_file_lock {
        __s64 fl_start;
        __s64 fl_end;
        __u8 fl_type;
+       __u8 fl_type_prev;
        __u8 fl_flags;
+       __u8 fl_break_notified;
+       __s32 fl_rem_lease;
 };
 
 #define CKPT_HDR_SET_MARKER_FILE_LOCK(h) {		\
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 15/16][cr][v3]: Define do_setlease()
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (13 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 14/16][cr][v3]: Checkpoint file-leases Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-03 23:11   ` [PATCH 16/16][cr][v3]: Restore file-leases Sukadev Bhattiprolu
  2010-08-04 10:45   ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Move the core functionality of fcntl_setlease() into a new function,
do_setlease(). do_setlease() is, same as fcntl_setlease(), except for
two new parameters, 'notified' and 'break_time'.

do_setlease() is used, in a follow-on patch when restoring file leases.
These new parameters are needed if the lease was being broken when the
process was checkpointed.

The parameter 'notified' is used to notify the lease-holder exactly once,
even if the lease was checkpointed during a lease-break and then restarted.

The parameter, 'break_time' specifies the new break-time for the lease.
(i.e if the process had 10-seconds remaining on the lease when it was
checkpointed, the 'break_time' is used give the process 10 seconds on
the lease after the process is restarted).

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/locks.c         |   31 ++++++++++++++++++++-----------
 include/linux/fs.h |    2 ++
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index b5eb4c0..6e84d90 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1490,17 +1490,8 @@ int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 }
 EXPORT_SYMBOL_GPL(vfs_setlease);
 
-/**
- *	fcntl_setlease	-	sets a lease on an open file
- *	@fd: open file descriptor
- *	@filp: file pointer
- *	@arg: type of lease to obtain
- *
- *	Call this fcntl to establish a lease on the file.
- *	Note that you also need to call %F_SETSIG to
- *	receive a signal when the lease is broken.
- */
-int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
+int do_setlease(unsigned int fd, struct file *filp, long arg, 
+			unsigned long break_time, int notified)
 {
 	struct file_lock fl, *flp = &fl;
 	struct inode *inode = filp->f_path.dentry->d_inode;
@@ -1511,6 +1502,9 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 	if (error)
 		return error;
 
+	fl.fl_break_notified = notified;
+	fl.fl_break_time = break_time;
+
 	lock_kernel();
 
 	error = vfs_setlease(filp, arg, &flp);
@@ -1534,6 +1528,21 @@ out_unlock:
 }
 
 /**
+ *	fcntl_setlease	-	sets a lease on an open file
+ *	@fd: open file descriptor
+ *	@filp: file pointer
+ *	@arg: type of lease to obtain
+ *
+ *	Call this fcntl to establish a lease on the file.
+ *	Note that you also need to call %F_SETSIG to
+ *	receive a signal when the lease is broken.
+ */
+int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
+{
+	return do_setlease(fd, filp, arg, 0UL, 0);
+}
+
+/**
  * flock_lock_file_wait - Apply a FLOCK-style lock to a file
  * @filp: The file to apply the lock to
  * @fl: The lock to be applied
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c21f002..5056477 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1124,6 +1124,8 @@ extern int flock64_set(unsigned int, struct file *, unsigned int,
 			struct flock64 *);
 #endif
 
+extern int do_setlease(unsigned int fd, struct file *filp, long arg,
+			unsigned long break_time, int notified);
 extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
 extern int fcntl_getlease(struct file *filp);
 
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 15/16][cr][v3]: Define do_setlease()
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (14 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 14/16][cr][v3]: Checkpoint file-leases Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
  2010-08-03 23:11 ` [PATCH 16/16][cr][v3]: Restore file-leases Sukadev Bhattiprolu
  2010-08-04 10:45 ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
  17 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Move the core functionality of fcntl_setlease() into a new function,
do_setlease(). do_setlease() is, same as fcntl_setlease(), except for
two new parameters, 'notified' and 'break_time'.

do_setlease() is used, in a follow-on patch when restoring file leases.
These new parameters are needed if the lease was being broken when the
process was checkpointed.

The parameter 'notified' is used to notify the lease-holder exactly once,
even if the lease was checkpointed during a lease-break and then restarted.

The parameter, 'break_time' specifies the new break-time for the lease.
(i.e if the process had 10-seconds remaining on the lease when it was
checkpointed, the 'break_time' is used give the process 10 seconds on
the lease after the process is restarted).

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/locks.c         |   31 ++++++++++++++++++++-----------
 include/linux/fs.h |    2 ++
 2 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/fs/locks.c b/fs/locks.c
index b5eb4c0..6e84d90 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1490,17 +1490,8 @@ int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 }
 EXPORT_SYMBOL_GPL(vfs_setlease);
 
-/**
- *	fcntl_setlease	-	sets a lease on an open file
- *	@fd: open file descriptor
- *	@filp: file pointer
- *	@arg: type of lease to obtain
- *
- *	Call this fcntl to establish a lease on the file.
- *	Note that you also need to call %F_SETSIG to
- *	receive a signal when the lease is broken.
- */
-int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
+int do_setlease(unsigned int fd, struct file *filp, long arg, 
+			unsigned long break_time, int notified)
 {
 	struct file_lock fl, *flp = &fl;
 	struct inode *inode = filp->f_path.dentry->d_inode;
@@ -1511,6 +1502,9 @@ int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
 	if (error)
 		return error;
 
+	fl.fl_break_notified = notified;
+	fl.fl_break_time = break_time;
+
 	lock_kernel();
 
 	error = vfs_setlease(filp, arg, &flp);
@@ -1534,6 +1528,21 @@ out_unlock:
 }
 
 /**
+ *	fcntl_setlease	-	sets a lease on an open file
+ *	@fd: open file descriptor
+ *	@filp: file pointer
+ *	@arg: type of lease to obtain
+ *
+ *	Call this fcntl to establish a lease on the file.
+ *	Note that you also need to call %F_SETSIG to
+ *	receive a signal when the lease is broken.
+ */
+int fcntl_setlease(unsigned int fd, struct file *filp, long arg)
+{
+	return do_setlease(fd, filp, arg, 0UL, 0);
+}
+
+/**
  * flock_lock_file_wait - Apply a FLOCK-style lock to a file
  * @filp: The file to apply the lock to
  * @fl: The lock to be applied
diff --git a/include/linux/fs.h b/include/linux/fs.h
index c21f002..5056477 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -1124,6 +1124,8 @@ extern int flock64_set(unsigned int, struct file *, unsigned int,
 			struct flock64 *);
 #endif
 
+extern int do_setlease(unsigned int fd, struct file *filp, long arg,
+			unsigned long break_time, int notified);
 extern int fcntl_setlease(unsigned int fd, struct file *filp, long arg);
 extern int fcntl_getlease(struct file *filp);
 
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 16/16][cr][v3]: Restore file-leases
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (14 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 15/16][cr][v3]: Define do_setlease() Sukadev Bhattiprolu
@ 2010-08-03 23:11   ` Sukadev Bhattiprolu
  2010-08-04 10:45   ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
  16 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Restart an application with file-leases, from its checkpoint.

Restart of file-lease that is not being broken (i.e F_INPROGRESS is not set)
is almost identical to C/R of file-locks. i.e save the type of lease for the
file in the checkpoint image and when restarting, restore the lease by calling
do_setlease().

C/R of file-lease gets complicated (I think), if a process is checkpointed
when its lease was being revoked. i.e if P1 has a F_WRLCK lease on file F1
and P2 opens F1 for write, P2's open is blocked for lease_break_time (45 secs).
P1's lease is revoked (i.e set to F_UNLCK) and P1 is notified via a SIGIO to
flush any dirty data.

Basic design:

To restore a lease that is being broken, we temporarily re-assign the original
lease type (that we saved in ->fl_type_prev) to the lease-holder. i.e. in the
above example, give P1 a F_WRLCK lease). When the lease-breaker (P2) is
restarted after checkpoint, its open() system fails with -ERESTARTSYS and it
will retry the open(). This open() will re-initiate the lease-break protocol
(i.e P2 will go back to waiting and P1 will be notified).

Some observations about this approach:

1. We must use ->fl_type_prev because, when the lease is being broken,
  ->fl_type is already set to F_UNLCK and would not result in a
  lease-break protocol when P2 is restarted.

2. When the lease-break is initiated and we signal the lease-holder, we set
   the ->fl_break_notified field. When restarting the lease and repeating
   the lease-break protocol, we check the ->fl_break_notified field and
   signal the lease-holder only if did not signal before the checkpoint.

3. If P1 was was checkpointed 40 seconds into the lease_break_time,(i.e.
   it had 5 seconds remaining in the lease), we would ideally want to ensure
   that after restart, P1 gets 5 or at least 5 seconds to finish cleaning up
   the lease.

   But the actual time that P1 gets after the application is restarted
   depends on many factors (number of processes in the application
   process tree, load on system at the time of restart etc).

   Jamie Lokier had suggested that we favor the lease-holder (P1) during
   restart, even if it meant giving the lease-holder the entire lease-break
   interval (45 seconds) again after the restart. Oren Laadan suggested
   that rather than make that a kernel policy, we let the user choose a
   policy based on the application's behavior.

   The current patchset computes and checkpoints the remaining-lease and
   uses this value to restore the lease. i.e the kernel simply uses the
   "remaining-lease" value stored in the checkpoint image. Userspace tools
   can be developed to alter the remaining-lease value in the checkpoint
   image to either favor the lease-holder or the lease-breaker or to add
   a fixed delta.

4. The above design of C/R of file-leases assumes that both lease-holder
   and lease-breaker are restarted. If only the lease-holder is
   restarted, the kernel will re-assign the original lease (F_WRLCK in
   the example) to lease-holder. If no lease-breaker comes along, the
   kernel will leave the lease assigned to lease-holder.

   This should not be a problem because, as far as the lease-holder is
   concerned the lease was revoked and it will/should reacquire the
   lease.

Changelog[v3]:

	- Broke-up patchset into smaller patches and addressed comments
	  from Oren Laadan, Jamie Lokier.

Changelog[v2]:
	- comments from Matt Helsley, Serge Hallyn...

Signed-off-by: Sukadev Bhattiprolu <sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
---
 fs/checkpoint.c |   29 +++++++++++++++++++++-
 fs/locks.c      |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index 5e5d0c5..3b53f99 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -1090,8 +1090,10 @@ static int restore_file_locks(struct ckpt_ctx *ctx, struct file *file, int fd)
 		if (IS_ERR(h))
 			return PTR_ERR(h);
 
-		ckpt_debug("Lock [%lld, %lld, %d, 0x%x]\n", h->fl_start,
-				h->fl_end, (int)h->fl_type, h->fl_flags);
+		ckpt_debug("Lock [%lld, %lld, %d, 0x%x], fl_type_prev %d, "
+				"fl_break_notified %d\n", h->fl_start,
+				h->fl_end, (int)h->fl_type, h->fl_flags,
+				h->fl_type_prev, h->fl_break_notified);
 
 		/*
 		 * If it is a dummy-lock, we are done with this fd.
@@ -1104,6 +1106,29 @@ static int restore_file_locks(struct ckpt_ctx *ctx, struct file *file, int fd)
 		ret = -EBADF;
 		if (h->fl_flags & FL_POSIX)
 			ret = restore_one_file_lock(ctx, file, fd, h); 
+		else if (h->fl_flags & FL_LEASE) {
+			int type;
+			unsigned long bt;
+
+			type = h->fl_type;
+			if (h->fl_type & F_INPROGRESS)
+				type = h->fl_type_prev;
+
+			/*
+			 * Use ->fl_rem_lease to compute new break time in
+			 * jiffies relative to ->jiffies_begin.
+			 */
+			bt = ctx->jiffies_begin + (h->fl_rem_lease * HZ);
+
+			ckpt_debug("current-jiffies %lu, jiffies_begin %lu, "
+					"break-time-jiffies %lu\n", jiffies,
+					ctx->jiffies_begin, bt);
+
+			ret = do_setlease(fd, file, type, bt,
+						h->fl_break_notified);
+			if (ret)
+				ckpt_err(ctx, ret, "do_setlease(): %d\n", type);
+		}
 
 		if (ret < 0)
 			ckpt_err(ctx, ret, "%(T) fl_flags 0x%x\n", h->fl_flags);
diff --git a/fs/locks.c b/fs/locks.c
index 6e84d90..b0640b9 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1183,6 +1183,16 @@ static void time_out_leases(struct inode *inode)
  *	some kind of lock (maybe a lease) on this file.  Leases are broken on
  *	a call to open() or truncate().  This function can sleep unless you
  *	specified %O_NONBLOCK to your open().
+ *
+ *	Checkpoint/restart: Suppose 10 seconds remain in a lease when the
+ *		lease-holder is checkpointed. When the lease-holder is
+ *		restarted, it should probably only get 10-seconds of the
+ *		lease, but we favor the lease-holder and allow it to to
+ *		have entire lease-break-time again. This can of course
+ *		cause the lease-breaker(s) to starve if the application is
+ *		repeatedly checkpointed during the lease-break protocol,
+ *		but that is hopefully not a common occurence.
+ *
  */
 int __break_lease(struct inode *inode, unsigned int mode)
 {
@@ -1236,12 +1246,38 @@ int __break_lease(struct inode *inode, unsigned int mode)
 
 	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
 		if (fl->fl_type != future) {
-			fl->fl_type_prev = fl->fl_type;
+
+			/*
+			 * If ->fl_type_prev is already set, we could be in a
+			 * recursive checkpoint/restart. i.e. we were
+			 * checkpointed once when our lease was being broken.
+			 * We were then restarted from the checkpoint and
+			 * immediately checkpointed again before the restored
+			 * lease expired. In this case, we want to restore
+			 * the lease to the original type. So don't overwrite
+			 * ->fl_type_prev if its already set.
+			 */
+			if (!fl->fl_type_prev)
+				fl->fl_type_prev = fl->fl_type;
+
 			fl->fl_type = future;
-			fl->fl_break_time = break_time;
+			/*
+			 * If this lease was restored from a checkpoint (i.e.
+			 * ->fl_break_time is set), don't change the remaining
+			 *  time on the lease.
+			 */
+			if (!fl->fl_break_time)
+				fl->fl_break_time = break_time;
+
+			/*
+			 * Similarly, if we already notified the lease-holder
+			 * before the checkpoint, i.e. ->fl_break_notified is
+			 * set, don't notify again.
+			 */
 
 			/* lease must have lmops break callback */
-			fl->fl_lmops->fl_break(fl);
+			if (!fl->fl_break_notified)
+				fl->fl_lmops->fl_break(fl);
 			fl->fl_break_notified = 1;
 		}
 	}
@@ -1490,6 +1526,35 @@ int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 }
 EXPORT_SYMBOL_GPL(vfs_setlease);
 
+/*
+ * do_setlease():
+ *
+ *
+ * Checkpoint/restart notes:
+ *
+ * 	If this lease was from a checkpoint, we assume that the lease-breaker
+ * 	was also checkpointed, and we re-assign the lease to the lease holder.
+ *
+ * 	When the lease-breaker is restarted, it will retry the open() and thus
+ * 	initiate the lease-break protocol again. But to avoid confusing the
+ * 	application, we follow the entire lease-break protocol except that we
+ * 	signal the application only if it was not signalled before.
+ *
+ * 	Note that these semantics assume that the lease-breaker will come along
+ * 	and reclaim the lease. If that does not happen, (either because the
+ * 	lease-breaker was not checkpointed or is not being restarted at the
+ * 	same time as the lease-holder) the kernel will leave the lease assigned
+ * 	to the lease-holder, even though the application assumes it no longer
+ * 	has the lease.
+ *
+ * 	But this discrepancy will not cause a problem for the lease-holder,
+ * 	since the lease-holder "knows" it does not have the lease and will
+ * 	have to reacquire the lease.
+ *
+ * 	This semantics has a down-side to any new lease-breaker in that they
+ * 	will be blocked for the lease_break_time, even if the lease-holder no
+ * 	longer has the lease.
+ */
 int do_setlease(unsigned int fd, struct file *filp, long arg, 
 			unsigned long break_time, int notified)
 {
-- 
1.6.0.4

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH 16/16][cr][v3]: Restore file-leases
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (15 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 15/16][cr][v3]: Define do_setlease() Sukadev Bhattiprolu
@ 2010-08-03 23:11 ` Sukadev Bhattiprolu
       [not found]   ` <1280877097-12377-17-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  2010-08-04 23:35   ` Oren Laadan
  2010-08-04 10:45 ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
  17 siblings, 2 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Restart an application with file-leases, from its checkpoint.

Restart of file-lease that is not being broken (i.e F_INPROGRESS is not set)
is almost identical to C/R of file-locks. i.e save the type of lease for the
file in the checkpoint image and when restarting, restore the lease by calling
do_setlease().

C/R of file-lease gets complicated (I think), if a process is checkpointed
when its lease was being revoked. i.e if P1 has a F_WRLCK lease on file F1
and P2 opens F1 for write, P2's open is blocked for lease_break_time (45 secs).
P1's lease is revoked (i.e set to F_UNLCK) and P1 is notified via a SIGIO to
flush any dirty data.

Basic design:

To restore a lease that is being broken, we temporarily re-assign the original
lease type (that we saved in ->fl_type_prev) to the lease-holder. i.e. in the
above example, give P1 a F_WRLCK lease). When the lease-breaker (P2) is
restarted after checkpoint, its open() system fails with -ERESTARTSYS and it
will retry the open(). This open() will re-initiate the lease-break protocol
(i.e P2 will go back to waiting and P1 will be notified).

Some observations about this approach:

1. We must use ->fl_type_prev because, when the lease is being broken,
  ->fl_type is already set to F_UNLCK and would not result in a
  lease-break protocol when P2 is restarted.

2. When the lease-break is initiated and we signal the lease-holder, we set
   the ->fl_break_notified field. When restarting the lease and repeating
   the lease-break protocol, we check the ->fl_break_notified field and
   signal the lease-holder only if did not signal before the checkpoint.

3. If P1 was was checkpointed 40 seconds into the lease_break_time,(i.e.
   it had 5 seconds remaining in the lease), we would ideally want to ensure
   that after restart, P1 gets 5 or at least 5 seconds to finish cleaning up
   the lease.

   But the actual time that P1 gets after the application is restarted
   depends on many factors (number of processes in the application
   process tree, load on system at the time of restart etc).

   Jamie Lokier had suggested that we favor the lease-holder (P1) during
   restart, even if it meant giving the lease-holder the entire lease-break
   interval (45 seconds) again after the restart. Oren Laadan suggested
   that rather than make that a kernel policy, we let the user choose a
   policy based on the application's behavior.

   The current patchset computes and checkpoints the remaining-lease and
   uses this value to restore the lease. i.e the kernel simply uses the
   "remaining-lease" value stored in the checkpoint image. Userspace tools
   can be developed to alter the remaining-lease value in the checkpoint
   image to either favor the lease-holder or the lease-breaker or to add
   a fixed delta.

4. The above design of C/R of file-leases assumes that both lease-holder
   and lease-breaker are restarted. If only the lease-holder is
   restarted, the kernel will re-assign the original lease (F_WRLCK in
   the example) to lease-holder. If no lease-breaker comes along, the
   kernel will leave the lease assigned to lease-holder.

   This should not be a problem because, as far as the lease-holder is
   concerned the lease was revoked and it will/should reacquire the
   lease.

Changelog[v3]:

	- Broke-up patchset into smaller patches and addressed comments
	  from Oren Laadan, Jamie Lokier.

Changelog[v2]:
	- comments from Matt Helsley, Serge Hallyn...

Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
---
 fs/checkpoint.c |   29 +++++++++++++++++++++-
 fs/locks.c      |   71 ++++++++++++++++++++++++++++++++++++++++++++++++++++--
 2 files changed, 95 insertions(+), 5 deletions(-)

diff --git a/fs/checkpoint.c b/fs/checkpoint.c
index 5e5d0c5..3b53f99 100644
--- a/fs/checkpoint.c
+++ b/fs/checkpoint.c
@@ -1090,8 +1090,10 @@ static int restore_file_locks(struct ckpt_ctx *ctx, struct file *file, int fd)
 		if (IS_ERR(h))
 			return PTR_ERR(h);
 
-		ckpt_debug("Lock [%lld, %lld, %d, 0x%x]\n", h->fl_start,
-				h->fl_end, (int)h->fl_type, h->fl_flags);
+		ckpt_debug("Lock [%lld, %lld, %d, 0x%x], fl_type_prev %d, "
+				"fl_break_notified %d\n", h->fl_start,
+				h->fl_end, (int)h->fl_type, h->fl_flags,
+				h->fl_type_prev, h->fl_break_notified);
 
 		/*
 		 * If it is a dummy-lock, we are done with this fd.
@@ -1104,6 +1106,29 @@ static int restore_file_locks(struct ckpt_ctx *ctx, struct file *file, int fd)
 		ret = -EBADF;
 		if (h->fl_flags & FL_POSIX)
 			ret = restore_one_file_lock(ctx, file, fd, h); 
+		else if (h->fl_flags & FL_LEASE) {
+			int type;
+			unsigned long bt;
+
+			type = h->fl_type;
+			if (h->fl_type & F_INPROGRESS)
+				type = h->fl_type_prev;
+
+			/*
+			 * Use ->fl_rem_lease to compute new break time in
+			 * jiffies relative to ->jiffies_begin.
+			 */
+			bt = ctx->jiffies_begin + (h->fl_rem_lease * HZ);
+
+			ckpt_debug("current-jiffies %lu, jiffies_begin %lu, "
+					"break-time-jiffies %lu\n", jiffies,
+					ctx->jiffies_begin, bt);
+
+			ret = do_setlease(fd, file, type, bt,
+						h->fl_break_notified);
+			if (ret)
+				ckpt_err(ctx, ret, "do_setlease(): %d\n", type);
+		}
 
 		if (ret < 0)
 			ckpt_err(ctx, ret, "%(T) fl_flags 0x%x\n", h->fl_flags);
diff --git a/fs/locks.c b/fs/locks.c
index 6e84d90..b0640b9 100644
--- a/fs/locks.c
+++ b/fs/locks.c
@@ -1183,6 +1183,16 @@ static void time_out_leases(struct inode *inode)
  *	some kind of lock (maybe a lease) on this file.  Leases are broken on
  *	a call to open() or truncate().  This function can sleep unless you
  *	specified %O_NONBLOCK to your open().
+ *
+ *	Checkpoint/restart: Suppose 10 seconds remain in a lease when the
+ *		lease-holder is checkpointed. When the lease-holder is
+ *		restarted, it should probably only get 10-seconds of the
+ *		lease, but we favor the lease-holder and allow it to to
+ *		have entire lease-break-time again. This can of course
+ *		cause the lease-breaker(s) to starve if the application is
+ *		repeatedly checkpointed during the lease-break protocol,
+ *		but that is hopefully not a common occurence.
+ *
  */
 int __break_lease(struct inode *inode, unsigned int mode)
 {
@@ -1236,12 +1246,38 @@ int __break_lease(struct inode *inode, unsigned int mode)
 
 	for (fl = flock; fl && IS_LEASE(fl); fl = fl->fl_next) {
 		if (fl->fl_type != future) {
-			fl->fl_type_prev = fl->fl_type;
+
+			/*
+			 * If ->fl_type_prev is already set, we could be in a
+			 * recursive checkpoint/restart. i.e. we were
+			 * checkpointed once when our lease was being broken.
+			 * We were then restarted from the checkpoint and
+			 * immediately checkpointed again before the restored
+			 * lease expired. In this case, we want to restore
+			 * the lease to the original type. So don't overwrite
+			 * ->fl_type_prev if its already set.
+			 */
+			if (!fl->fl_type_prev)
+				fl->fl_type_prev = fl->fl_type;
+
 			fl->fl_type = future;
-			fl->fl_break_time = break_time;
+			/*
+			 * If this lease was restored from a checkpoint (i.e.
+			 * ->fl_break_time is set), don't change the remaining
+			 *  time on the lease.
+			 */
+			if (!fl->fl_break_time)
+				fl->fl_break_time = break_time;
+
+			/*
+			 * Similarly, if we already notified the lease-holder
+			 * before the checkpoint, i.e. ->fl_break_notified is
+			 * set, don't notify again.
+			 */
 
 			/* lease must have lmops break callback */
-			fl->fl_lmops->fl_break(fl);
+			if (!fl->fl_break_notified)
+				fl->fl_lmops->fl_break(fl);
 			fl->fl_break_notified = 1;
 		}
 	}
@@ -1490,6 +1526,35 @@ int vfs_setlease(struct file *filp, long arg, struct file_lock **lease)
 }
 EXPORT_SYMBOL_GPL(vfs_setlease);
 
+/*
+ * do_setlease():
+ *
+ *
+ * Checkpoint/restart notes:
+ *
+ * 	If this lease was from a checkpoint, we assume that the lease-breaker
+ * 	was also checkpointed, and we re-assign the lease to the lease holder.
+ *
+ * 	When the lease-breaker is restarted, it will retry the open() and thus
+ * 	initiate the lease-break protocol again. But to avoid confusing the
+ * 	application, we follow the entire lease-break protocol except that we
+ * 	signal the application only if it was not signalled before.
+ *
+ * 	Note that these semantics assume that the lease-breaker will come along
+ * 	and reclaim the lease. If that does not happen, (either because the
+ * 	lease-breaker was not checkpointed or is not being restarted at the
+ * 	same time as the lease-holder) the kernel will leave the lease assigned
+ * 	to the lease-holder, even though the application assumes it no longer
+ * 	has the lease.
+ *
+ * 	But this discrepancy will not cause a problem for the lease-holder,
+ * 	since the lease-holder "knows" it does not have the lease and will
+ * 	have to reacquire the lease.
+ *
+ * 	This semantics has a down-side to any new lease-breaker in that they
+ * 	will be blocked for the lease_break_time, even if the lease-holder no
+ * 	longer has the lease.
+ */
 int do_setlease(unsigned int fd, struct file *filp, long arg, 
 			unsigned long break_time, int notified)
 {
-- 
1.6.0.4


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
       [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
                     ` (15 preceding siblings ...)
  2010-08-03 23:11   ` [PATCH 16/16][cr][v3]: Restore file-leases Sukadev Bhattiprolu
@ 2010-08-04 10:45   ` Steven Whitehouse
  16 siblings, 0 replies; 50+ messages in thread
From: Steven Whitehouse @ 2010-08-04 10:45 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Matthew Wilcox, Containers, Jamie Lokier, John Stultz,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Hi,

On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
> Checkpoint/restart file owner, file-locks and file-lease information.
> 
Can you explain roughly how this is intended to work, or point me at a
document explaining it?

I'm trying to figure out how the file lock checkpoint will work with
cluster filesystems, or if there needs to be a mechanism to turn this
feature off for those filesystems. What prevents the lock state changing
in an incompatible way between the checkpoint and the restore?

Steve.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
  2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
                   ` (16 preceding siblings ...)
  2010-08-03 23:11 ` [PATCH 16/16][cr][v3]: Restore file-leases Sukadev Bhattiprolu
@ 2010-08-04 10:45 ` Steven Whitehouse
  2010-08-04 17:26   ` Matt Helsley
                     ` (3 more replies)
  17 siblings, 4 replies; 50+ messages in thread
From: Steven Whitehouse @ 2010-08-04 10:45 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Oren Laadan, Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Hi,

On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
> Checkpoint/restart file owner, file-locks and file-lease information.
> 
Can you explain roughly how this is intended to work, or point me at a
document explaining it?

I'm trying to figure out how the file lock checkpoint will work with
cluster filesystems, or if there needs to be a mechanism to turn this
feature off for those filesystems. What prevents the lock state changing
in an incompatible way between the checkpoint and the restore?

Steve.




^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
  2010-08-04 10:45 ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
  2010-08-04 17:26   ` Matt Helsley
@ 2010-08-04 17:26   ` Matt Helsley
  2010-08-04 19:01   ` Sukadev Bhattiprolu
  2010-08-04 19:01   ` Sukadev Bhattiprolu
  3 siblings, 0 replies; 50+ messages in thread
From: Matt Helsley @ 2010-08-04 17:26 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Matthew Wilcox, Containers, Jamie Lokier, John Stultz,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith,
	Sukadev Bhattiprolu

On Wed, Aug 04, 2010 at 11:45:20AM +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
> > Checkpoint/restart file owner, file-locks and file-lease information.
> > 
> Can you explain roughly how this is intended to work, or point me at a
> document explaining it?
> 
> I'm trying to figure out how the file lock checkpoint will work with
> cluster filesystems, or if there needs to be a mechanism to turn this
> feature off for those filesystems. What prevents the lock state changing
> in an incompatible way between the checkpoint and the restore?

Hi Steve,

[ I'm just going to address your cluster filesystem question and let
  Suka answer your questions on these patches. ]

	Open files whose file operations structs are missing the
.checkpoint operation cause checkpoint to fail. We haven't added a
.checkpoint operation to cluster filesystems because of the kinds of
issues you're referring to.

	I don't think there are any file locks/leases which do not
require opening the file(s) in question. That means file locks
and leases in cluster filesystems should also cause checkpoint
to fail.

	Each cluster filesystem probably needs some special care when
considering the use of the generic_file_checkpoint operation. 

	Using generic_file_checkpoint is appropriate when we have some
way to get a consistent image of the filesystem at the time checkpoint
takes place. How that happens is largely up to the userspace tools
called user-cr. Device-mapper snapshots, fsfreezer + rsync, and
filesystem snapshots will all work. Of course those tools usually don't
save more volatile state information like locks.

	It's quite possible cluster filesystems will need their own
.checkpoint file operations. generic_file_checkpoint is composed of a few
smaller functions which could make writing such ops easier. For example,
we've already reused the smaller functions in .checkpoint operations for
anon_inode-based interfaces, pipes, fifos, and more,

	What it may come down to is this: How do you backup a cluster
filesystem? If there's already a backup method that works then we can
write the .checkpoint operation to rely on it. Often that means we
can use generic_file_checkpoint. The "backup method" should be
something which can be invoked by the userspace checkpoint/restart tools
(user-cr). If the backup method is too slow we can work on
improving it or we can try something else.

	So perhaps the best thing we can do to help you is learn how
folks backup their cluster filesystems. Got any pointers to basic info
on that?

Cheers,
	-Matt Helsley

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
  2010-08-04 10:45 ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
@ 2010-08-04 17:26   ` Matt Helsley
       [not found]     ` <20100804172649.GM2927-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
  2010-08-04 18:03     ` Oren Laadan
  2010-08-04 17:26   ` Matt Helsley
                     ` (2 subsequent siblings)
  3 siblings, 2 replies; 50+ messages in thread
From: Matt Helsley @ 2010-08-04 17:26 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Sukadev Bhattiprolu, Oren Laadan, Serge Hallyn, Matt Helsley,
	Dan Smith, John Stultz, Matthew Wilcox, Jamie Lokier,
	linux-fsdevel, Containers

On Wed, Aug 04, 2010 at 11:45:20AM +0100, Steven Whitehouse wrote:
> Hi,
> 
> On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
> > Checkpoint/restart file owner, file-locks and file-lease information.
> > 
> Can you explain roughly how this is intended to work, or point me at a
> document explaining it?
> 
> I'm trying to figure out how the file lock checkpoint will work with
> cluster filesystems, or if there needs to be a mechanism to turn this
> feature off for those filesystems. What prevents the lock state changing
> in an incompatible way between the checkpoint and the restore?

Hi Steve,

[ I'm just going to address your cluster filesystem question and let
  Suka answer your questions on these patches. ]

	Open files whose file operations structs are missing the
.checkpoint operation cause checkpoint to fail. We haven't added a
.checkpoint operation to cluster filesystems because of the kinds of
issues you're referring to.

	I don't think there are any file locks/leases which do not
require opening the file(s) in question. That means file locks
and leases in cluster filesystems should also cause checkpoint
to fail.

	Each cluster filesystem probably needs some special care when
considering the use of the generic_file_checkpoint operation. 

	Using generic_file_checkpoint is appropriate when we have some
way to get a consistent image of the filesystem at the time checkpoint
takes place. How that happens is largely up to the userspace tools
called user-cr. Device-mapper snapshots, fsfreezer + rsync, and
filesystem snapshots will all work. Of course those tools usually don't
save more volatile state information like locks.

	It's quite possible cluster filesystems will need their own
.checkpoint file operations. generic_file_checkpoint is composed of a few
smaller functions which could make writing such ops easier. For example,
we've already reused the smaller functions in .checkpoint operations for
anon_inode-based interfaces, pipes, fifos, and more,

	What it may come down to is this: How do you backup a cluster
filesystem? If there's already a backup method that works then we can
write the .checkpoint operation to rely on it. Often that means we
can use generic_file_checkpoint. The "backup method" should be
something which can be invoked by the userspace checkpoint/restart tools
(user-cr). If the backup method is too slow we can work on
improving it or we can try something else.

	So perhaps the best thing we can do to help you is learn how
folks backup their cluster filesystems. Got any pointers to basic info
on that?

Cheers,
	-Matt Helsley


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
       [not found]     ` <20100804172649.GM2927-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
@ 2010-08-04 18:03       ` Oren Laadan
  0 siblings, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 18:03 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith,
	Sukadev Bhattiprolu, Steven Whitehouse



On 08/04/2010 01:26 PM, Matt Helsley wrote:
> On Wed, Aug 04, 2010 at 11:45:20AM +0100, Steven Whitehouse wrote:
>> Hi,
>>
>> On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
>>> Checkpoint/restart file owner, file-locks and file-lease information.
>>>
>> Can you explain roughly how this is intended to work, or point me at a
>> document explaining it?
>>
>> I'm trying to figure out how the file lock checkpoint will work with
>> cluster filesystems, or if there needs to be a mechanism to turn this
>> feature off for those filesystems. What prevents the lock state changing
>> in an incompatible way between the checkpoint and the restore?
>

Hi Steve,

In addition to Matt's reply -

Checkpoint/restart _assumes_ that there exists a mechanism to keep
the filesystem state _unchanged_ between checkpoint and restart.

For example, one can kill the application after checkpoint and keep
the filesystem from being touched.
A more likely scenario is to use a filesystem's snapshot/backup
solution during checkpoint to ensure a pristine copy for restart.
In particular, there needs to be a mechanism to accomplish this
in a cluster filesystem, or rely on dedicated userspace tools.

So at restart, the filesystem is assumed to be visible and in the
same state as before. That state also includes locks etc.

Also, c/r has a mechanism to detect cases where a file in use by
the checkpoint application(s) is shared with a task that is not
being checkpointed. In this case, checkpoint will fail, to prevent
inconsistencies.

(I also imagine that often a cluster filesystem is used by parallel
applications - which in turn require some support to be checkpointed
in a consisted manner).

Oren.


> Hi Steve,
>
> [ I'm just going to address your cluster filesystem question and let
>    Suka answer your questions on these patches. ]
>
> 	Open files whose file operations structs are missing the
> .checkpoint operation cause checkpoint to fail. We haven't added a
> .checkpoint operation to cluster filesystems because of the kinds of
> issues you're referring to.
>
> 	I don't think there are any file locks/leases which do not
> require opening the file(s) in question. That means file locks
> and leases in cluster filesystems should also cause checkpoint
> to fail.
>
> 	Each cluster filesystem probably needs some special care when
> considering the use of the generic_file_checkpoint operation.
>
> 	Using generic_file_checkpoint is appropriate when we have some
> way to get a consistent image of the filesystem at the time checkpoint
> takes place. How that happens is largely up to the userspace tools
> called user-cr. Device-mapper snapshots, fsfreezer + rsync, and
> filesystem snapshots will all work. Of course those tools usually don't
> save more volatile state information like locks.
>
> 	It's quite possible cluster filesystems will need their own
> .checkpoint file operations. generic_file_checkpoint is composed of a few
> smaller functions which could make writing such ops easier. For example,
> we've already reused the smaller functions in .checkpoint operations for
> anon_inode-based interfaces, pipes, fifos, and more,
>
> 	What it may come down to is this: How do you backup a cluster
> filesystem? If there's already a backup method that works then we can
> write the .checkpoint operation to rely on it. Often that means we
> can use generic_file_checkpoint. The "backup method" should be
> something which can be invoked by the userspace checkpoint/restart tools
> (user-cr). If the backup method is too slow we can work on
> improving it or we can try something else.
>
> 	So perhaps the best thing we can do to help you is learn how
> folks backup their cluster filesystems. Got any pointers to basic info
> on that?
>
> Cheers,
> 	-Matt Helsley
>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
  2010-08-04 17:26   ` Matt Helsley
       [not found]     ` <20100804172649.GM2927-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
@ 2010-08-04 18:03     ` Oren Laadan
  1 sibling, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 18:03 UTC (permalink / raw)
  To: Matt Helsley
  Cc: Steven Whitehouse, Sukadev Bhattiprolu, Serge Hallyn, Dan Smith,
	John Stultz, Matthew Wilcox, Jamie Lokier, linux-fsdevel,
	Containers



On 08/04/2010 01:26 PM, Matt Helsley wrote:
> On Wed, Aug 04, 2010 at 11:45:20AM +0100, Steven Whitehouse wrote:
>> Hi,
>>
>> On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
>>> Checkpoint/restart file owner, file-locks and file-lease information.
>>>
>> Can you explain roughly how this is intended to work, or point me at a
>> document explaining it?
>>
>> I'm trying to figure out how the file lock checkpoint will work with
>> cluster filesystems, or if there needs to be a mechanism to turn this
>> feature off for those filesystems. What prevents the lock state changing
>> in an incompatible way between the checkpoint and the restore?
>

Hi Steve,

In addition to Matt's reply -

Checkpoint/restart _assumes_ that there exists a mechanism to keep
the filesystem state _unchanged_ between checkpoint and restart.

For example, one can kill the application after checkpoint and keep
the filesystem from being touched.
A more likely scenario is to use a filesystem's snapshot/backup
solution during checkpoint to ensure a pristine copy for restart.
In particular, there needs to be a mechanism to accomplish this
in a cluster filesystem, or rely on dedicated userspace tools.

So at restart, the filesystem is assumed to be visible and in the
same state as before. That state also includes locks etc.

Also, c/r has a mechanism to detect cases where a file in use by
the checkpoint application(s) is shared with a task that is not
being checkpointed. In this case, checkpoint will fail, to prevent
inconsistencies.

(I also imagine that often a cluster filesystem is used by parallel
applications - which in turn require some support to be checkpointed
in a consisted manner).

Oren.


> Hi Steve,
>
> [ I'm just going to address your cluster filesystem question and let
>    Suka answer your questions on these patches. ]
>
> 	Open files whose file operations structs are missing the
> .checkpoint operation cause checkpoint to fail. We haven't added a
> .checkpoint operation to cluster filesystems because of the kinds of
> issues you're referring to.
>
> 	I don't think there are any file locks/leases which do not
> require opening the file(s) in question. That means file locks
> and leases in cluster filesystems should also cause checkpoint
> to fail.
>
> 	Each cluster filesystem probably needs some special care when
> considering the use of the generic_file_checkpoint operation.
>
> 	Using generic_file_checkpoint is appropriate when we have some
> way to get a consistent image of the filesystem at the time checkpoint
> takes place. How that happens is largely up to the userspace tools
> called user-cr. Device-mapper snapshots, fsfreezer + rsync, and
> filesystem snapshots will all work. Of course those tools usually don't
> save more volatile state information like locks.
>
> 	It's quite possible cluster filesystems will need their own
> .checkpoint file operations. generic_file_checkpoint is composed of a few
> smaller functions which could make writing such ops easier. For example,
> we've already reused the smaller functions in .checkpoint operations for
> anon_inode-based interfaces, pipes, fifos, and more,
>
> 	What it may come down to is this: How do you backup a cluster
> filesystem? If there's already a backup method that works then we can
> write the .checkpoint operation to rely on it. Often that means we
> can use generic_file_checkpoint. The "backup method" should be
> something which can be invoked by the userspace checkpoint/restart tools
> (user-cr). If the backup method is too slow we can work on
> improving it or we can try something else.
>
> 	So perhaps the best thing we can do to help you is learn how
> folks backup their cluster filesystems. Got any pointers to basic info
> on that?
>
> Cheers,
> 	-Matt Helsley
>
>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
  2010-08-04 10:45 ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
                     ` (2 preceding siblings ...)
  2010-08-04 19:01   ` Sukadev Bhattiprolu
@ 2010-08-04 19:01   ` Sukadev Bhattiprolu
  3 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-04 19:01 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Matthew Wilcox, Containers, Jamie Lokier, John Stultz,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Steven Whitehouse [swhiteho-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org] wrote:
| Hi,
| 
| On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
| > Checkpoint/restart file owner, file-locks and file-lease information.
| > 
| Can you explain roughly how this is intended to work, or point me at a
| document explaining it?

Well, there is no document atm and I tried to explain how it is supposed
to work in the patches that restart file-locks (patches 4. 9, 16).

| 
| I'm trying to figure out how the file lock checkpoint will work with
| cluster filesystems, or if there needs to be a mechanism to turn this
| feature off for those filesystems. What prevents the lock state changing
| in an incompatible way between the checkpoint and the restore?

But I see that your question is about the bigger picture. Given that
this question was asked on the previous version of the patchset, I will
add a FAQ/description in patch 0 next time.

For now, please see:

	http://marc.info/?l=linux-fsdevel&m=127657580718791&w=2

Thanks,

Sukadev

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
  2010-08-04 10:45 ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
  2010-08-04 17:26   ` Matt Helsley
  2010-08-04 17:26   ` Matt Helsley
@ 2010-08-04 19:01   ` Sukadev Bhattiprolu
  2010-08-04 19:16     ` Oren Laadan
       [not found]     ` <20100804190112.GA11571-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  2010-08-04 19:01   ` Sukadev Bhattiprolu
  3 siblings, 2 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-04 19:01 UTC (permalink / raw)
  To: Steven Whitehouse
  Cc: Oren Laadan, Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers

Steven Whitehouse [swhiteho@redhat.com] wrote:
| Hi,
| 
| On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
| > Checkpoint/restart file owner, file-locks and file-lease information.
| > 
| Can you explain roughly how this is intended to work, or point me at a
| document explaining it?

Well, there is no document atm and I tried to explain how it is supposed
to work in the patches that restart file-locks (patches 4. 9, 16).

| 
| I'm trying to figure out how the file lock checkpoint will work with
| cluster filesystems, or if there needs to be a mechanism to turn this
| feature off for those filesystems. What prevents the lock state changing
| in an incompatible way between the checkpoint and the restore?

But I see that your question is about the bigger picture. Given that
this question was asked on the previous version of the patchset, I will
add a FAQ/description in patch 0 next time.

For now, please see:

	http://marc.info/?l=linux-fsdevel&m=127657580718791&w=2

Thanks,

Sukadev

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
       [not found]     ` <20100804190112.GA11571-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
@ 2010-08-04 19:16       ` Oren Laadan
  0 siblings, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 19:16 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith,
	Steven Whitehouse


On 08/04/2010 03:01 PM, Sukadev Bhattiprolu wrote:
> Steven Whitehouse [swhiteho-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org] wrote:
> | Hi,
> |
> | On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
> |>  Checkpoint/restart file owner, file-locks and file-lease information.
> |>
> | Can you explain roughly how this is intended to work, or point me at a
> | document explaining it?
>
> Well, there is no document atm and I tried to explain how it is supposed
> to work in the patches that restart file-locks (patches 4. 9, 16).
>
> |
> | I'm trying to figure out how the file lock checkpoint will work with
> | cluster filesystems, or if there needs to be a mechanism to turn this
> | feature off for those filesystems. What prevents the lock state changing
> | in an incompatible way between the checkpoint and the restore?
>
> But I see that your question is about the bigger picture. Given that
> this question was asked on the previous version of the patchset, I will
> add a FAQ/description in patch 0 next time.
>
> For now, please see:
>
> 	http://marc.info/?l=linux-fsdevel&m=127657580718791&w=2
>

Would probably be a good idea to put something on the wiki
and/or inside Documentation/checkpoint/...

Oren

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
  2010-08-04 19:01   ` Sukadev Bhattiprolu
@ 2010-08-04 19:16     ` Oren Laadan
       [not found]     ` <20100804190112.GA11571-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
  1 sibling, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 19:16 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Steven Whitehouse, Serge Hallyn, Matt Helsley, Dan Smith,
	John Stultz, Matthew Wilcox, Jamie Lokier, linux-fsdevel,
	Containers


On 08/04/2010 03:01 PM, Sukadev Bhattiprolu wrote:
> Steven Whitehouse [swhiteho@redhat.com] wrote:
> | Hi,
> |
> | On Tue, 2010-08-03 at 16:11 -0700, Sukadev Bhattiprolu wrote:
> |>  Checkpoint/restart file owner, file-locks and file-lease information.
> |>
> | Can you explain roughly how this is intended to work, or point me at a
> | document explaining it?
>
> Well, there is no document atm and I tried to explain how it is supposed
> to work in the patches that restart file-locks (patches 4. 9, 16).
>
> |
> | I'm trying to figure out how the file lock checkpoint will work with
> | cluster filesystems, or if there needs to be a mechanism to turn this
> | feature off for those filesystems. What prevents the lock state changing
> | in an incompatible way between the checkpoint and the restore?
>
> But I see that your question is about the bigger picture. Given that
> this question was asked on the previous version of the patchset, I will
> add a FAQ/description in patch 0 next time.
>
> For now, please see:
>
> 	http://marc.info/?l=linux-fsdevel&m=127657580718791&w=2
>

Would probably be a good idea to put something on the wiki
and/or inside Documentation/checkpoint/...

Oren

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 04/16][cr][v3]: Restore file_owner info
       [not found]   ` <1280877097-12377-5-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-08-04 23:01     ` Oren Laadan
  0 siblings, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 23:01 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith



On 08/03/2010 07:11 PM, Sukadev Bhattiprolu wrote:
> Restore the file-owner information for each 'struct file'.  This is
> essentially is like a new fcntl(F_SETOWN) and fcntl(F_SETSIG) calls,
> except that the pid, uid, euid and signum values are read from the
> checkpoint image.
>
> Changelog[v3]:
> 	- [Oren Laadan]: Ensure find_vpid() found a valid pid before
> 	  making it the file owner.
> Changelog[v2]:
> 	- [Matt Helsley, Serge Hallyn]: Don't trust uids in checkpoint image.
> 	  (added CAP_KILL check)
> 	- Check that signal number read from the checkpoint image is valid.
> 	  (not sure it is required, since its an incomplete check for tampering)
>
> Signed-off-by: Sukadev Bhattiprolu<sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>

I may have missed a previous discussion on this - but: do you
plan to relate the uid/euid to the userns ?

[...]

> diff --git a/fs/checkpoint.c b/fs/checkpoint.c
> index ce1b4af..b5486c1 100644
> --- a/fs/checkpoint.c
> +++ b/fs/checkpoint.c
> @@ -618,6 +618,68 @@ static int attach_file(struct file *file)
>   	return fd;
>   }
>
> +static int restore_file_owner(struct ckpt_ctx *ctx, struct ckpt_hdr_file *h,
> +		struct file *file)
> +{
> +	int ret;
> +	struct pid *pid;
> +	uid_t uid, euid;
> +
> +	uid = h->f_owner_uid;
> +	euid = h->f_owner_euid;
> +
> +	ckpt_debug("restore_file_owner(): uid %u, euid %u, pid %d, type %d\n",
> +			uid, euid, h->f_owner_pid, h->f_owner_pid_type);
> +	/*
> +	 * We can't trust the uids in the checkpoint image and normally need
> +	 * CAP_KILL. But if the uids match our ids, should be fine since we
> +	 * have access to the file.
> +	 *
> +	 * TODO: Move this check to __f_setown() ?
> +	 */
> +	ret = -EACCES;
> +	if (!capable(CAP_KILL)&&
> +			(uid != current_uid() || euid != current_euid())) {
> +		ckpt_err(ctx, ret, "image uids [%d, %d] don't match current "
> +				"process uids [%d, %d] and no CAP_KILL\n",
> +				uid, euid, current_uid(), current_euid());
> +		return ret;
> +	}
> +
> +	ret = -EINVAL;
> +	if (!valid_signal(h->f_owner_signum)) {
> +		ckpt_err(ctx, ret, "Invalid signum %d\n", h->f_owner_signum);
> +		return ret;
> +	}
> +	file->f_owner.signum = h->f_owner_signum;
> +
> +	rcu_read_lock();
> +
> +	/*
> +	 * If file had a non-NULL owner and we can't find the owner after
> +	 * restart, return error.
> +	 */
> +	pid = find_vpid(h->f_owner_pid);
> +	if (h->f_owner_pid&&  !pid)
> +		ret = -ESRCH;
> +	else {
> +		/*
> +		 * TODO: Do we need 'force' to be 1 here or can it be 0 ?
> +		 * 	 'force' is used to modify the owner, if one is
> +		 * 	 already set. Can it be set when we restart an
> +		 * 	 application ?
> +		 */
> +		ret = __f_setown(file, pid, h->f_owner_pid_type, uid, euid, 1);

__f_setown() does not check its pid_type argument - you need
to sanitize here, no ?

(and not that I expect the PIDTYPE_... will ever change, but -
possibly encode the PIDTYPE_... types into CKPT_PIDTYPE_... )

[...]

Oren.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 04/16][cr][v3]: Restore file_owner info
  2010-08-03 23:11 ` [PATCH 04/16][cr][v3]: Restore file_owner info Sukadev Bhattiprolu
       [not found]   ` <1280877097-12377-5-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-08-04 23:01   ` Oren Laadan
  1 sibling, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 23:01 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers



On 08/03/2010 07:11 PM, Sukadev Bhattiprolu wrote:
> Restore the file-owner information for each 'struct file'.  This is
> essentially is like a new fcntl(F_SETOWN) and fcntl(F_SETSIG) calls,
> except that the pid, uid, euid and signum values are read from the
> checkpoint image.
>
> Changelog[v3]:
> 	- [Oren Laadan]: Ensure find_vpid() found a valid pid before
> 	  making it the file owner.
> Changelog[v2]:
> 	- [Matt Helsley, Serge Hallyn]: Don't trust uids in checkpoint image.
> 	  (added CAP_KILL check)
> 	- Check that signal number read from the checkpoint image is valid.
> 	  (not sure it is required, since its an incomplete check for tampering)
>
> Signed-off-by: Sukadev Bhattiprolu<sukadev@linux.vnet.ibm.com>

I may have missed a previous discussion on this - but: do you
plan to relate the uid/euid to the userns ?

[...]

> diff --git a/fs/checkpoint.c b/fs/checkpoint.c
> index ce1b4af..b5486c1 100644
> --- a/fs/checkpoint.c
> +++ b/fs/checkpoint.c
> @@ -618,6 +618,68 @@ static int attach_file(struct file *file)
>   	return fd;
>   }
>
> +static int restore_file_owner(struct ckpt_ctx *ctx, struct ckpt_hdr_file *h,
> +		struct file *file)
> +{
> +	int ret;
> +	struct pid *pid;
> +	uid_t uid, euid;
> +
> +	uid = h->f_owner_uid;
> +	euid = h->f_owner_euid;
> +
> +	ckpt_debug("restore_file_owner(): uid %u, euid %u, pid %d, type %d\n",
> +			uid, euid, h->f_owner_pid, h->f_owner_pid_type);
> +	/*
> +	 * We can't trust the uids in the checkpoint image and normally need
> +	 * CAP_KILL. But if the uids match our ids, should be fine since we
> +	 * have access to the file.
> +	 *
> +	 * TODO: Move this check to __f_setown() ?
> +	 */
> +	ret = -EACCES;
> +	if (!capable(CAP_KILL)&&
> +			(uid != current_uid() || euid != current_euid())) {
> +		ckpt_err(ctx, ret, "image uids [%d, %d] don't match current "
> +				"process uids [%d, %d] and no CAP_KILL\n",
> +				uid, euid, current_uid(), current_euid());
> +		return ret;
> +	}
> +
> +	ret = -EINVAL;
> +	if (!valid_signal(h->f_owner_signum)) {
> +		ckpt_err(ctx, ret, "Invalid signum %d\n", h->f_owner_signum);
> +		return ret;
> +	}
> +	file->f_owner.signum = h->f_owner_signum;
> +
> +	rcu_read_lock();
> +
> +	/*
> +	 * If file had a non-NULL owner and we can't find the owner after
> +	 * restart, return error.
> +	 */
> +	pid = find_vpid(h->f_owner_pid);
> +	if (h->f_owner_pid&&  !pid)
> +		ret = -ESRCH;
> +	else {
> +		/*
> +		 * TODO: Do we need 'force' to be 1 here or can it be 0 ?
> +		 * 	 'force' is used to modify the owner, if one is
> +		 * 	 already set. Can it be set when we restart an
> +		 * 	 application ?
> +		 */
> +		ret = __f_setown(file, pid, h->f_owner_pid_type, uid, euid, 1);

__f_setown() does not check its pid_type argument - you need
to sanitize here, no ?

(and not that I expect the PIDTYPE_... will ever change, but -
possibly encode the PIDTYPE_... types into CKPT_PIDTYPE_... )

[...]

Oren.


^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/16][cr][v3]: Checkpoint file-locks
       [not found]   ` <1280877097-12377-7-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-08-04 23:26     ` Oren Laadan
  0 siblings, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 23:26 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith



On 08/03/2010 07:11 PM, Sukadev Bhattiprolu wrote:
> While checkpointing each file-descriptor, find all the locks on the
> file and save information about the lock in the checkpoint-image.
> A follow-on patch will use this informaiton to restore the file-locks.
>
> Changelog[v3]:
> 	[Oren Laadan] Add a missing (loff_t) type cast and use a macro
> 		to set the marker/dummy file lock
>
> Changelog[v2]:
> 	[Matt Helsley]: Use fixed sizes (__s64) instead of 'loff_t' in
> 		'struct ckpt_hdr_file_lock'.
> 	[Matt Helsley, Serge Hallyn]: Highlight new use of BKL (using
> 		lock_flocks() macros as suggested by Serge).
> 	[Matt Helsley]: Reorg code a bit to simplify error handling.
> 	[Matt Helsley]: Reorg code to initialize marker-lock (Pass a
> 		NULL lock to checkpoint_one_lock() to indicate marker).
>
> Signed-off-by: Sukadev Bhattiprolu<sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
> ---
>   fs/checkpoint.c                |  100 ++++++++++++++++++++++++++++++++++-----
>   include/linux/checkpoint_hdr.h |   18 +++++++
>   2 files changed, 105 insertions(+), 13 deletions(-)
>
> diff --git a/fs/checkpoint.c b/fs/checkpoint.c
> index b5486c1..57b6944 100644
> --- a/fs/checkpoint.c
> +++ b/fs/checkpoint.c
> @@ -26,8 +26,19 @@
>   #include<linux/checkpoint.h>
>   #include<linux/eventpoll.h>
>   #include<linux/eventfd.h>
> +#include<linux/smp_lock.h>
>   #include<net/sock.h>
>
> +/*
> + * TODO: This code uses the BKL for consistency with other uses of
> + * 	 'for_each_lock()'. But since the BKL may be replaced with another
> + * 	 lock in the future, use lock_flocks() macros instead. lock_flocks()
> + * 	 are currently used in BKL-fix sand boxes and when those changes
> + * 	 are merged, the following macros can be removed
> + */
> +#define lock_flocks()		lock_kernel()
> +#define unlock_flocks()	unlock_kernel()
> +
>   /**************************************************************************
>    * Checkpoint
>    */
> @@ -256,8 +267,78 @@ static int checkpoint_file(struct ckpt_ctx *ctx, void *ptr)
>   	return ret;
>   }

I prefer the approach of writing-all-at-once over the current
one-at-a-time-and-mark-the-end. See my previous comment:
(https://lists.linux-foundation.org/pipermail/containers/2010-July/025057.html)

---
   Having looked at the code again - how about the following:
   get rid of this "last entry" altogether; instead, during
   checkpoint, first count the locks, then write a header with
   the number of locks, following by that many records of the
   locks themselves.

   This is what we do for other resource lists/chains as well.
   Also makes it a bit easier to parse since you always know
   what to expect once you see the header.
---

(Yes, I'm aware that if the list is long you may need to send
multiple chunks and for that "remember" where you were in the
list...)

[...]

Oren.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 06/16][cr][v3]: Checkpoint file-locks
  2010-08-03 23:11 ` [PATCH 06/16][cr][v3]: Checkpoint file-locks Sukadev Bhattiprolu
@ 2010-08-04 23:26   ` Oren Laadan
       [not found]   ` <1280877097-12377-7-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
  1 sibling, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 23:26 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers



On 08/03/2010 07:11 PM, Sukadev Bhattiprolu wrote:
> While checkpointing each file-descriptor, find all the locks on the
> file and save information about the lock in the checkpoint-image.
> A follow-on patch will use this informaiton to restore the file-locks.
>
> Changelog[v3]:
> 	[Oren Laadan] Add a missing (loff_t) type cast and use a macro
> 		to set the marker/dummy file lock
>
> Changelog[v2]:
> 	[Matt Helsley]: Use fixed sizes (__s64) instead of 'loff_t' in
> 		'struct ckpt_hdr_file_lock'.
> 	[Matt Helsley, Serge Hallyn]: Highlight new use of BKL (using
> 		lock_flocks() macros as suggested by Serge).
> 	[Matt Helsley]: Reorg code a bit to simplify error handling.
> 	[Matt Helsley]: Reorg code to initialize marker-lock (Pass a
> 		NULL lock to checkpoint_one_lock() to indicate marker).
>
> Signed-off-by: Sukadev Bhattiprolu<sukadev@linux.vnet.ibm.com>
> ---
>   fs/checkpoint.c                |  100 ++++++++++++++++++++++++++++++++++-----
>   include/linux/checkpoint_hdr.h |   18 +++++++
>   2 files changed, 105 insertions(+), 13 deletions(-)
>
> diff --git a/fs/checkpoint.c b/fs/checkpoint.c
> index b5486c1..57b6944 100644
> --- a/fs/checkpoint.c
> +++ b/fs/checkpoint.c
> @@ -26,8 +26,19 @@
>   #include<linux/checkpoint.h>
>   #include<linux/eventpoll.h>
>   #include<linux/eventfd.h>
> +#include<linux/smp_lock.h>
>   #include<net/sock.h>
>
> +/*
> + * TODO: This code uses the BKL for consistency with other uses of
> + * 	 'for_each_lock()'. But since the BKL may be replaced with another
> + * 	 lock in the future, use lock_flocks() macros instead. lock_flocks()
> + * 	 are currently used in BKL-fix sand boxes and when those changes
> + * 	 are merged, the following macros can be removed
> + */
> +#define lock_flocks()		lock_kernel()
> +#define unlock_flocks()	unlock_kernel()
> +
>   /**************************************************************************
>    * Checkpoint
>    */
> @@ -256,8 +267,78 @@ static int checkpoint_file(struct ckpt_ctx *ctx, void *ptr)
>   	return ret;
>   }

I prefer the approach of writing-all-at-once over the current
one-at-a-time-and-mark-the-end. See my previous comment:
(https://lists.linux-foundation.org/pipermail/containers/2010-July/025057.html)

---
   Having looked at the code again - how about the following:
   get rid of this "last entry" altogether; instead, during
   checkpoint, first count the locks, then write a header with
   the number of locks, following by that many records of the
   locks themselves.

   This is what we do for other resource lists/chains as well.
   Also makes it a bit easier to parse since you always know
   what to expect once you see the header.
---

(Yes, I'm aware that if the list is long you may need to send
multiple chunks and for that "remember" where you were in the
list...)

[...]

Oren.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 16/16][cr][v3]: Restore file-leases
       [not found]   ` <1280877097-12377-17-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-08-04 23:35     ` Oren Laadan
  0 siblings, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 23:35 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith


How about adding the intro of this patch as a section in the
respective Documentation/checkpoint/.... ?

Oren.


On 08/03/2010 07:11 PM, Sukadev Bhattiprolu wrote:
> Restart an application with file-leases, from its checkpoint.
>
> Restart of file-lease that is not being broken (i.e F_INPROGRESS is not set)
> is almost identical to C/R of file-locks. i.e save the type of lease for the
> file in the checkpoint image and when restarting, restore the lease by calling
> do_setlease().
>
> C/R of file-lease gets complicated (I think), if a process is checkpointed
> when its lease was being revoked. i.e if P1 has a F_WRLCK lease on file F1
> and P2 opens F1 for write, P2's open is blocked for lease_break_time (45 secs).
> P1's lease is revoked (i.e set to F_UNLCK) and P1 is notified via a SIGIO to
> flush any dirty data.
>
> Basic design:
>
> To restore a lease that is being broken, we temporarily re-assign the original
> lease type (that we saved in ->fl_type_prev) to the lease-holder. i.e. in the
> above example, give P1 a F_WRLCK lease). When the lease-breaker (P2) is
> restarted after checkpoint, its open() system fails with -ERESTARTSYS and it
> will retry the open(). This open() will re-initiate the lease-break protocol
> (i.e P2 will go back to waiting and P1 will be notified).
>
> Some observations about this approach:
>
> 1. We must use ->fl_type_prev because, when the lease is being broken,
>    ->fl_type is already set to F_UNLCK and would not result in a
>    lease-break protocol when P2 is restarted.
>
> 2. When the lease-break is initiated and we signal the lease-holder, we set
>     the ->fl_break_notified field. When restarting the lease and repeating
>     the lease-break protocol, we check the ->fl_break_notified field and
>     signal the lease-holder only if did not signal before the checkpoint.
>
> 3. If P1 was was checkpointed 40 seconds into the lease_break_time,(i.e.
>     it had 5 seconds remaining in the lease), we would ideally want to ensure
>     that after restart, P1 gets 5 or at least 5 seconds to finish cleaning up
>     the lease.
>
>     But the actual time that P1 gets after the application is restarted
>     depends on many factors (number of processes in the application
>     process tree, load on system at the time of restart etc).
>
>     Jamie Lokier had suggested that we favor the lease-holder (P1) during
>     restart, even if it meant giving the lease-holder the entire lease-break
>     interval (45 seconds) again after the restart. Oren Laadan suggested
>     that rather than make that a kernel policy, we let the user choose a
>     policy based on the application's behavior.
>
>     The current patchset computes and checkpoints the remaining-lease and
>     uses this value to restore the lease. i.e the kernel simply uses the
>     "remaining-lease" value stored in the checkpoint image. Userspace tools
>     can be developed to alter the remaining-lease value in the checkpoint
>     image to either favor the lease-holder or the lease-breaker or to add
>     a fixed delta.
>
> 4. The above design of C/R of file-leases assumes that both lease-holder
>     and lease-breaker are restarted. If only the lease-holder is
>     restarted, the kernel will re-assign the original lease (F_WRLCK in
>     the example) to lease-holder. If no lease-breaker comes along, the
>     kernel will leave the lease assigned to lease-holder.
>
>     This should not be a problem because, as far as the lease-holder is
>     concerned the lease was revoked and it will/should reacquire the
>     lease.
>
> Changelog[v3]:
>
> 	- Broke-up patchset into smaller patches and addressed comments
> 	  from Oren Laadan, Jamie Lokier.
>
> Changelog[v2]:
> 	- comments from Matt Helsley, Serge Hallyn...

[...]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH 16/16][cr][v3]: Restore file-leases
  2010-08-03 23:11 ` [PATCH 16/16][cr][v3]: Restore file-leases Sukadev Bhattiprolu
       [not found]   ` <1280877097-12377-17-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
@ 2010-08-04 23:35   ` Oren Laadan
  1 sibling, 0 replies; 50+ messages in thread
From: Oren Laadan @ 2010-08-04 23:35 UTC (permalink / raw)
  To: Sukadev Bhattiprolu
  Cc: Serge Hallyn, Matt Helsley, Dan Smith, John Stultz,
	Matthew Wilcox, Jamie Lokier, linux-fsdevel, Containers


How about adding the intro of this patch as a section in the
respective Documentation/checkpoint/.... ?

Oren.


On 08/03/2010 07:11 PM, Sukadev Bhattiprolu wrote:
> Restart an application with file-leases, from its checkpoint.
>
> Restart of file-lease that is not being broken (i.e F_INPROGRESS is not set)
> is almost identical to C/R of file-locks. i.e save the type of lease for the
> file in the checkpoint image and when restarting, restore the lease by calling
> do_setlease().
>
> C/R of file-lease gets complicated (I think), if a process is checkpointed
> when its lease was being revoked. i.e if P1 has a F_WRLCK lease on file F1
> and P2 opens F1 for write, P2's open is blocked for lease_break_time (45 secs).
> P1's lease is revoked (i.e set to F_UNLCK) and P1 is notified via a SIGIO to
> flush any dirty data.
>
> Basic design:
>
> To restore a lease that is being broken, we temporarily re-assign the original
> lease type (that we saved in ->fl_type_prev) to the lease-holder. i.e. in the
> above example, give P1 a F_WRLCK lease). When the lease-breaker (P2) is
> restarted after checkpoint, its open() system fails with -ERESTARTSYS and it
> will retry the open(). This open() will re-initiate the lease-break protocol
> (i.e P2 will go back to waiting and P1 will be notified).
>
> Some observations about this approach:
>
> 1. We must use ->fl_type_prev because, when the lease is being broken,
>    ->fl_type is already set to F_UNLCK and would not result in a
>    lease-break protocol when P2 is restarted.
>
> 2. When the lease-break is initiated and we signal the lease-holder, we set
>     the ->fl_break_notified field. When restarting the lease and repeating
>     the lease-break protocol, we check the ->fl_break_notified field and
>     signal the lease-holder only if did not signal before the checkpoint.
>
> 3. If P1 was was checkpointed 40 seconds into the lease_break_time,(i.e.
>     it had 5 seconds remaining in the lease), we would ideally want to ensure
>     that after restart, P1 gets 5 or at least 5 seconds to finish cleaning up
>     the lease.
>
>     But the actual time that P1 gets after the application is restarted
>     depends on many factors (number of processes in the application
>     process tree, load on system at the time of restart etc).
>
>     Jamie Lokier had suggested that we favor the lease-holder (P1) during
>     restart, even if it meant giving the lease-holder the entire lease-break
>     interval (45 seconds) again after the restart. Oren Laadan suggested
>     that rather than make that a kernel policy, we let the user choose a
>     policy based on the application's behavior.
>
>     The current patchset computes and checkpoints the remaining-lease and
>     uses this value to restore the lease. i.e the kernel simply uses the
>     "remaining-lease" value stored in the checkpoint image. Userspace tools
>     can be developed to alter the remaining-lease value in the checkpoint
>     image to either favor the lease-holder or the lease-breaker or to add
>     a fixed delta.
>
> 4. The above design of C/R of file-leases assumes that both lease-holder
>     and lease-breaker are restarted. If only the lease-holder is
>     restarted, the kernel will re-assign the original lease (F_WRLCK in
>     the example) to lease-holder. If no lease-breaker comes along, the
>     kernel will leave the lease assigned to lease-holder.
>
>     This should not be a problem because, as far as the lease-holder is
>     concerned the lease was revoked and it will/should reacquire the
>     lease.
>
> Changelog[v3]:
>
> 	- Broke-up patchset into smaller patches and addressed comments
> 	  from Oren Laadan, Jamie Lokier.
>
> Changelog[v2]:
> 	- comments from Matt Helsley, Serge Hallyn...

[...]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH 00/16][cr][v3]: C/R file owner, locks, leases
@ 2010-08-03 23:11 Sukadev Bhattiprolu
  0 siblings, 0 replies; 50+ messages in thread
From: Sukadev Bhattiprolu @ 2010-08-03 23:11 UTC (permalink / raw)
  To: Oren Laadan
  Cc: Matthew Wilcox, John Stultz, Containers, Jamie Lokier,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA, Dan Smith

Checkpoint/restart file owner, file-locks and file-lease information.

Changelog[v3]:
	- Broke-up C/R of file-leases patches into smaller patches and included
	  them in this set.
	- Addressed comments from Jamie Lokier, Oren Laadan with help from
	  John Stultz on the computation of time offsets.


Sukadev Bhattiprolu (16):
  Add uid, euid params to f_modown()
  Add uid, euid params to __f_setown()
  Checkpoint file-owner information
  Restore file_owner info
  Move file_lock macros into linux/fs.h
  Checkpoint file-locks
  Define flock_set()
  Define flock64_set()
  Restore file-locks
  Initialize ->fl_break_time to 0
  Add ->fl_type_prev field.
  Add ->fl_break_notified field.
  Add jiffies_begin field to ckpt_ctx
  Checkpoint file-leases
  Define do_setlease()
  Restore file-leases

 drivers/char/tty_io.c            |    3 +-
 drivers/net/tun.c                |    3 +-
 fs/checkpoint.c                  |  410 +++++++++++++++++++++++++++++++++++---
 fs/fcntl.c                       |   19 +-
 fs/locks.c                       |  207 +++++++++++++++-----
 fs/notify/dnotify/dnotify.c      |    3 +-
 include/linux/checkpoint_hdr.h   |   26 +++
 include/linux/checkpoint_types.h |    1 +
 include/linux/fs.h               |   17 ++-
 kernel/checkpoint/sys.c          |    1 +
 10 files changed, 596 insertions(+), 94 deletions(-)

Note: Most of the "added lines" in fs/locks.c are comments about C/R :-)

^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2010-08-04 23:36 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-08-03 23:11 [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown() Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 02/16][cr][v3]: Add uid, euid params to __f_setown() Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 03/16][cr][v3]: Checkpoint file-owner information Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 04/16][cr][v3]: Restore file_owner info Sukadev Bhattiprolu
     [not found]   ` <1280877097-12377-5-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-08-04 23:01     ` Oren Laadan
2010-08-04 23:01   ` Oren Laadan
     [not found] ` <1280877097-12377-1-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-08-03 23:11   ` [PATCH 01/16][cr][v3]: Add uid, euid params to f_modown() Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 02/16][cr][v3]: Add uid, euid params to __f_setown() Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 03/16][cr][v3]: Checkpoint file-owner information Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 04/16][cr][v3]: Restore file_owner info Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 05/16][cr][v3]: Move file_lock macros into linux/fs.h Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 06/16][cr][v3]: Checkpoint file-locks Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 07/16][cr][v3]: Define flock_set() Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 08/16][cr][v3]: Define flock64_set() Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 09/16][cr][v3]: Restore file-locks Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 10/16][cr][v3]: Initialize ->fl_break_time to 0 Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 11/16][cr][v3]: Add ->fl_type_prev field Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 12/16][cr][v3]: Add ->fl_break_notified field Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 13/16][cr][v3]: Add jiffies_begin field to ckpt_ctx Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 14/16][cr][v3]: Checkpoint file-leases Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 15/16][cr][v3]: Define do_setlease() Sukadev Bhattiprolu
2010-08-03 23:11   ` [PATCH 16/16][cr][v3]: Restore file-leases Sukadev Bhattiprolu
2010-08-04 10:45   ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
2010-08-03 23:11 ` [PATCH 05/16][cr][v3]: Move file_lock macros into linux/fs.h Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 06/16][cr][v3]: Checkpoint file-locks Sukadev Bhattiprolu
2010-08-04 23:26   ` Oren Laadan
     [not found]   ` <1280877097-12377-7-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-08-04 23:26     ` Oren Laadan
2010-08-03 23:11 ` [PATCH 07/16][cr][v3]: Define flock_set() Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 08/16][cr][v3]: Define flock64_set() Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 09/16][cr][v3]: Restore file-locks Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 10/16][cr][v3]: Initialize ->fl_break_time to 0 Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 11/16][cr][v3]: Add ->fl_type_prev field Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 12/16][cr][v3]: Add ->fl_break_notified field Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 13/16][cr][v3]: Add jiffies_begin field to ckpt_ctx Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 14/16][cr][v3]: Checkpoint file-leases Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 15/16][cr][v3]: Define do_setlease() Sukadev Bhattiprolu
2010-08-03 23:11 ` [PATCH 16/16][cr][v3]: Restore file-leases Sukadev Bhattiprolu
     [not found]   ` <1280877097-12377-17-git-send-email-sukadev-23VcF4HTsmIX0ybBhKVfKdBPR1lH4CV8@public.gmane.org>
2010-08-04 23:35     ` Oren Laadan
2010-08-04 23:35   ` Oren Laadan
2010-08-04 10:45 ` [PATCH 00/16][cr][v3]: C/R file owner, locks, leases Steven Whitehouse
2010-08-04 17:26   ` Matt Helsley
     [not found]     ` <20100804172649.GM2927-52DBMbEzqgQ/wnmkkaCWp/UQ3DHhIser@public.gmane.org>
2010-08-04 18:03       ` Oren Laadan
2010-08-04 18:03     ` Oren Laadan
2010-08-04 17:26   ` Matt Helsley
2010-08-04 19:01   ` Sukadev Bhattiprolu
2010-08-04 19:16     ` Oren Laadan
     [not found]     ` <20100804190112.GA11571-r/Jw6+rmf7HQT0dZR+AlfA@public.gmane.org>
2010-08-04 19:16       ` Oren Laadan
2010-08-04 19:01   ` Sukadev Bhattiprolu
  -- strict thread matches above, loose matches on Subject: below --
2010-08-03 23:11 Sukadev Bhattiprolu

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.