linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC)
@ 2020-10-04 19:24 Alexander Mikhalitsyn
  2020-10-04 19:24 ` [RFC PATCH 1/1] overlayfs: add ioctls that allows to get fhandle for layers dentries Alexander Mikhalitsyn
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2020-10-04 19:24 UTC (permalink / raw)
  To: miklos
  Cc: Alexander Mikhalitsyn, Amir Goldstein, Andrei Vagin,
	Pavel Tikhomirov, David Howells, linux-unionfs, linux-fsdevel,
	linux-kernel

Some time ago we discussed about the problem of Checkpoint-Restoring
overlayfs mounts [1]. Big thanks to Amir for review and suggestions.

Brief from previous discussion.
Problem statement: to checkpoint-restore overlayfs mounts we need
to save overlayfs mount state and save it into the image. Basically,
this state for us it's just mount options of overlayfs mount. But
here we have two problems:

I. during mounting overlayfs user may specify relative paths in upperdir,
workdir, lowerdir options

II. also user may unmount mount from which these paths was opened during mounting

This is real problems for us. My first patch was attempt to address both problems.
1. I've added refcnt get for mounts from which overlayfs was mounted.
2. I've changed overlayfs mountinfo show algorithm, so overlayfs started to *always*
show full paths for upperdir,workdir,lowerdirs.
3. I've added mnt_id show-time only option which allows to determine from which mnt_id
we opened options paths.

Pros:
- we can determine full information about overlayfs mount
- we hold refcnt to mount, so, user may unmount source mounts only
with lazy flag

Cons:
- by adding refcnt get for mount I've changed possible overlayfs usecases
- by showing *full* paths we can more easily reache PAGE_SIZE limit of 
mounts options in procfs
- by adding mnt_id show-only option I've added inconsistency between
mount-time options and show-time mount options

After very productive discussion with Amir and Pavel I've decided to write new
implementation. In new approach we decided *not* to take extra refcnts to mounts.
Also we decided to use exportfs fhandles instead of full paths. To determine
full path we plan to use the next algo:
1. Export {s_dev; fhandle} from overlayfs for *all* sources
2. User open_by_handle_at syscall to open all these fhandles (we need to
determine mount for each fhandle, looks like we can do this by s_dev by linear
search in /proc/<pid>/mountinfo)
3. Then readlink /proc/<pid>/fd/<opened fd>
4. Dump this full path+mnt_id

But there is question. How to export this {s_dev; fhandle} from kernel to userspace?
- We decided not to use procfs.
- Amir proposed solution - use xattrs. But after diving into it I've meet problem
where I can set this xattrs?
If I set this xattrs on overlayfs dentries then during rsync, or cp -p=xattr we will copy
this temporary information.
- ioctls? (this patchset implements this approach)
- fsinfo subsystem (not merged yet) [2]

Problems with ioctls:
1. We limited in output data size (16 KB AFAIK)
but MAX_HANDLE_SZ=128(bytes), OVL_MAX_STACK=500(num lowerdirs)
So, MAX_HANDLE_SZ*OVL_MAX_STACK = 64KB which is bigger than limit.
So, I've decided to give user one fhandle by one call. This is also
bad from the performance point of view.
2. When using ioctls we need to have *fixed* size of input and output.
So, if MAX_HANDLE_SZ will change in the future our _IOR('o', 2, struct ovl_mnt_opt_fh)
will also change with struct ovl_mnt_opt_fh.

So, I hope that we discuss about this patchset and try to make possible solutions together.

Thanks.
Regards, Alex.

[1] https://lore.kernel.org/linux-unionfs/20200604161133.20949-1-alexander.mikhalitsyn@virtuozzo.com/
[2] https://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git/log/?h=fsinfo-core

Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org

Alexander Mikhalitsyn (1):
  overlayfs: add ioctls that allows to get fhandle for layers dentries

 fs/overlayfs/readdir.c | 160 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/1] overlayfs: add ioctls that allows to get fhandle for layers dentries
  2020-10-04 19:24 [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Alexander Mikhalitsyn
@ 2020-10-04 19:24 ` Alexander Mikhalitsyn
  2020-10-04 19:48   ` Randy Dunlap
  2020-10-05  7:56 ` [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Amir Goldstein
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2020-10-04 19:24 UTC (permalink / raw)
  To: miklos
  Cc: Alexander Mikhalitsyn, Amir Goldstein, Andrei Vagin,
	Pavel Tikhomirov, David Howells, linux-unionfs, linux-fsdevel,
	linux-kernel

Add several ioctls to ovl_dir_operations that allows to get file handles
for upperdir, workdir, lowerdir dentries. Special {s_dev; fhandle}
format used. (Ideally should be {mnt_id; fhandle} but this impossible
because overlayfs not keeps mounts refcnt for layers.)

Added ioctls list:
OVL_IOC_GETLWRFHNDLSNUM - get lowerdirs count
OVL_IOC_GETLWRFHNDL - get i-th lowerdir fhandle
OVL_IOC_GETUPPRFHNDL - get upperdir fhandle
OVL_IOC_GETWRKFHNDL - get workdir fhandle

Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
---
 fs/overlayfs/readdir.c | 160 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 160 insertions(+)

diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 596002054ac6..12ee043d2b3a 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -13,6 +13,7 @@
 #include <linux/security.h>
 #include <linux/cred.h>
 #include <linux/ratelimit.h>
+#include <linux/exportfs.h>
 #include "overlayfs.h"
 
 struct ovl_cache_entry {
@@ -58,6 +59,20 @@ struct ovl_dir_file {
 	struct file *upperfile;
 };
 
+struct ovl_mnt_opt_fh {
+	__u32 s_dev;
+	struct file_handle fh;
+	/* use f_handle field from struct file_handle */
+	__u8 __fhdata[MAX_HANDLE_SZ];
+};
+
+struct ovl_mnt_opt_fh_req {
+	union {
+		unsigned int lowernum;
+		struct ovl_mnt_opt_fh result;
+	};
+} __packed;
+
 static struct ovl_cache_entry *ovl_cache_entry_from_node(struct rb_node *n)
 {
 	return rb_entry(n, struct ovl_cache_entry, node);
@@ -942,6 +957,150 @@ static int ovl_dir_open(struct inode *inode, struct file *file)
 	return 0;
 }
 
+static long ovl_ioctl_get_lowers_num(struct super_block *sb)
+{
+	struct ovl_entry *oe = sb->s_root->d_fsdata;
+	return oe->numlower;
+}
+
+static struct ovl_mnt_opt_fh *__ovl_encode_mnt_opt_fh(struct dentry *dentry)
+{
+	struct ovl_mnt_opt_fh *opt_fh;
+	int fh_type, dwords;
+	int buflen = MAX_HANDLE_SZ;
+	int err;
+
+	opt_fh = kzalloc(sizeof(struct ovl_mnt_opt_fh), GFP_KERNEL);
+	if (!opt_fh)
+		return ERR_PTR(-ENOMEM);
+
+	/* we ask for a non connected handle */
+	dwords = buflen >> 2;
+	fh_type = exportfs_encode_fh(dentry, (void *)opt_fh->fh.f_handle, &dwords, 0);
+	buflen = (dwords << 2);
+
+	err = -EIO;
+	if (WARN_ON(fh_type < 0) ||
+	    WARN_ON(buflen > MAX_HANDLE_SZ) ||
+	    WARN_ON(fh_type == FILEID_INVALID))
+		goto out_err;
+
+	opt_fh->fh.handle_type = fh_type;
+	opt_fh->fh.handle_bytes = buflen;
+
+	/*
+	 * Ideally, we want to have mnt_id+fhandle, but overlayfs not
+	 * keep refcnts on layers mounts and we couldn't determine
+	 * mnt_ids for layers. So, let's give s_dev to CRIU.
+	 * It's better than nothing.
+	 */
+	opt_fh->s_dev = dentry->d_sb->s_dev;
+
+	return opt_fh;
+
+out_err:
+	kfree(opt_fh);
+	return ERR_PTR(err);
+}
+
+static long __ovl_ioctl_get_fhandle(struct dentry *origin,
+				    unsigned long arg)
+{
+	struct ovl_mnt_opt_fh *fh;
+	int ret = 0;
+
+	fh = __ovl_encode_mnt_opt_fh(origin);
+	if (IS_ERR(fh))
+		return PTR_ERR(fh);
+
+	if (copy_to_user((struct ovl_mnt_opt_fh __user *)arg,
+			 fh, sizeof(*fh)))
+		ret = -EFAULT;
+
+	kfree(fh);
+	return ret;
+}
+
+static long ovl_ioctl_get_lower_fhandle(struct super_block *sb,
+					unsigned long arg)
+{
+	struct ovl_entry *oe = sb->s_root->d_fsdata;
+	struct dentry *origin;
+	struct ovl_mnt_opt_fh_req input;
+
+	BUILD_BUG_ON(sizeof(struct ovl_mnt_opt_fh_req) != sizeof(struct ovl_mnt_opt_fh));
+
+	if (copy_from_user(&input, (struct ovl_mnt_opt_fh_req __user *)arg,
+			   sizeof(input)))
+		return -EFAULT;
+
+	if (input.lowernum >= oe->numlower)
+		return -EINVAL;
+
+	origin = oe->lowerstack[input.lowernum].dentry;
+
+	return __ovl_ioctl_get_fhandle(origin, arg);
+}
+
+static long ovl_ioctl_get_upper_fhandle(struct super_block *sb,
+					unsigned long arg)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct dentry *origin;
+
+	if (!ofs->config.upperdir)
+		return -EINVAL;
+
+	origin = OVL_I(d_inode(sb->s_root))->__upperdentry;
+
+	return __ovl_ioctl_get_fhandle(origin, arg);
+}
+
+static long ovl_ioctl_get_work_fhandle(struct super_block *sb,
+				       unsigned long arg)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	if (!ofs->config.upperdir)
+		return -EINVAL;
+
+	return __ovl_ioctl_get_fhandle(ofs->workbasedir, arg);
+}
+
+#define	OVL_IOC_GETLWRFHNDLSNUM			_IO('o', 1)
+// DISCUSS: what if MAX_HANDLE_SZ will change?
+#define	OVL_IOC_GETLWRFHNDL			_IOR('o', 2, struct ovl_mnt_opt_fh)
+#define	OVL_IOC_GETUPPRFHNDL			_IOR('o', 3, struct ovl_mnt_opt_fh)
+#define	OVL_IOC_GETWRKFHNDL			_IOR('o', 4, struct ovl_mnt_opt_fh)
+
+static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	long ret;
+
+	switch (cmd) {
+	case OVL_IOC_GETLWRFHNDLSNUM:
+		ret = ovl_ioctl_get_lowers_num(file_inode(file)->i_sb);
+		break;
+
+	case OVL_IOC_GETLWRFHNDL:
+		ret = ovl_ioctl_get_lower_fhandle(file_inode(file)->i_sb, arg);
+		break;
+
+	case OVL_IOC_GETUPPRFHNDL:
+		ret = ovl_ioctl_get_upper_fhandle(file_inode(file)->i_sb, arg);
+		break;
+
+	case OVL_IOC_GETWRKFHNDL:
+		ret = ovl_ioctl_get_work_fhandle(file_inode(file)->i_sb, arg);
+		break;
+
+	default:
+		ret = -ENOTTY;
+	}
+
+	return ret;
+}
+
 const struct file_operations ovl_dir_operations = {
 	.read		= generic_read_dir,
 	.open		= ovl_dir_open,
@@ -949,6 +1108,7 @@ const struct file_operations ovl_dir_operations = {
 	.llseek		= ovl_dir_llseek,
 	.fsync		= ovl_dir_fsync,
 	.release	= ovl_dir_release,
+	.unlocked_ioctl	= ovl_ioctl,
 };
 
 int ovl_check_empty_dir(struct dentry *dentry, struct list_head *list)
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 1/1] overlayfs: add ioctls that allows to get fhandle for layers dentries
  2020-10-04 19:24 ` [RFC PATCH 1/1] overlayfs: add ioctls that allows to get fhandle for layers dentries Alexander Mikhalitsyn
@ 2020-10-04 19:48   ` Randy Dunlap
  0 siblings, 0 replies; 13+ messages in thread
From: Randy Dunlap @ 2020-10-04 19:48 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, miklos
  Cc: Amir Goldstein, Andrei Vagin, Pavel Tikhomirov, David Howells,
	linux-unionfs, linux-fsdevel, linux-kernel

On 10/4/20 12:24 PM, Alexander Mikhalitsyn wrote:
> +#define	OVL_IOC_GETLWRFHNDLSNUM			_IO('o', 1)
> +// DISCUSS: what if MAX_HANDLE_SZ will change?
> +#define	OVL_IOC_GETLWRFHNDL			_IOR('o', 2, struct ovl_mnt_opt_fh)
> +#define	OVL_IOC_GETUPPRFHNDL			_IOR('o', 3, struct ovl_mnt_opt_fh)
> +#define	OVL_IOC_GETWRKFHNDL			_IOR('o', 4, struct ovl_mnt_opt_fh)

Hi,

This needs to have Documentation/userspace-api/ioctl/ioctl-number.rst
updated also.

thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC)
  2020-10-04 19:24 [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Alexander Mikhalitsyn
  2020-10-04 19:24 ` [RFC PATCH 1/1] overlayfs: add ioctls that allows to get fhandle for layers dentries Alexander Mikhalitsyn
@ 2020-10-05  7:56 ` Amir Goldstein
  2020-10-05 19:46   ` Alexander Mikhalitsyn
  2020-10-05 17:02 ` [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd Alexander Mikhalitsyn
  2020-10-14 14:14 ` [RFC PATCH] overlayfs: add fsinfo(FSINFO_ATTR_OVL_SOURCES) support Alexander Mikhalitsyn
  3 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2020-10-05  7:56 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: Miklos Szeredi, Andrei Vagin, Pavel Tikhomirov, David Howells,
	overlayfs, linux-fsdevel, linux-kernel

On Sun, Oct 4, 2020 at 10:25 PM Alexander Mikhalitsyn
<alexander.mikhalitsyn@virtuozzo.com> wrote:
>
> Some time ago we discussed about the problem of Checkpoint-Restoring
> overlayfs mounts [1]. Big thanks to Amir for review and suggestions.
>
> Brief from previous discussion.
> Problem statement: to checkpoint-restore overlayfs mounts we need
> to save overlayfs mount state and save it into the image. Basically,
> this state for us it's just mount options of overlayfs mount. But
> here we have two problems:
>
> I. during mounting overlayfs user may specify relative paths in upperdir,
> workdir, lowerdir options
>
> II. also user may unmount mount from which these paths was opened during mounting
>
> This is real problems for us. My first patch was attempt to address both problems.
> 1. I've added refcnt get for mounts from which overlayfs was mounted.
> 2. I've changed overlayfs mountinfo show algorithm, so overlayfs started to *always*
> show full paths for upperdir,workdir,lowerdirs.
> 3. I've added mnt_id show-time only option which allows to determine from which mnt_id
> we opened options paths.
>
> Pros:
> - we can determine full information about overlayfs mount
> - we hold refcnt to mount, so, user may unmount source mounts only
> with lazy flag
>
> Cons:
> - by adding refcnt get for mount I've changed possible overlayfs usecases
> - by showing *full* paths we can more easily reache PAGE_SIZE limit of
> mounts options in procfs
> - by adding mnt_id show-only option I've added inconsistency between
> mount-time options and show-time mount options
>
> After very productive discussion with Amir and Pavel I've decided to write new
> implementation. In new approach we decided *not* to take extra refcnts to mounts.
> Also we decided to use exportfs fhandles instead of full paths. To determine
> full path we plan to use the next algo:
> 1. Export {s_dev; fhandle} from overlayfs for *all* sources
> 2. User open_by_handle_at syscall to open all these fhandles (we need to
> determine mount for each fhandle, looks like we can do this by s_dev by linear
> search in /proc/<pid>/mountinfo)
> 3. Then readlink /proc/<pid>/fd/<opened fd>
> 4. Dump this full path+mnt_id
>

Hi Alex,

The general concept looks good to me.
I will not provide specific comment on the implementation (it looks
fine) until the
concept API is accepted by the maintainer.

The main thing I want to make sure is that if we add this interface it can
serve other use cases as well.

During my talk on LPC, I got a similar request from two developers for two
different use cases. They wanted a safe method to iterate "changes
since baseline"
from either within the container or from the host.

Your proposed API is a step in the direction for meeting their requirement.
The major change is that ioctl (or whatever method) should expose the
layers topology of a specific object, not only the overlay instance.

For C/R you would query the layers topology of the overlay root dir.

My comments of the specific methods below are not meant to
object to the choice of ioctl, but they are meant to give the alternative
a fair chance. I am kind of leaning towards ioctl myself.

> But there is question. How to export this {s_dev; fhandle} from kernel to userspace?
> - We decided not to use procfs.

Why not?
C/R already uses procfs to export fhandle for fanotify/inotify
I kind of like the idea of having /sys/fs/overlay/instances etc.
It could be useful to many things.

> - Amir proposed solution - use xattrs. But after diving into it I've meet problem
> where I can set this xattrs?
> If I set this xattrs on overlayfs dentries then during rsync, or cp -p=xattr we will copy
> this temporary information.

No you won't.
rsync, cp will only copy xattrs listed with listxattr.
Several filesystems, such as cifs and nfs export "object properties"
via private xattrs
that are not listed in listxattr (e.g. CIFS_XATTR_CIFS_ACL).
You are not limited in what you can do in the "trusted.overlay" namespace, for
example "trusted.overlay.layers.0.fh"

The advantage is that it is very easy to implement and requires
less discussions about ABI, but I agree it does feel a bit like a hack.

> - ioctls? (this patchset implements this approach)
> - fsinfo subsystem (not merged yet) [2]
>
> Problems with ioctls:
> 1. We limited in output data size (16 KB AFAIK)
> but MAX_HANDLE_SZ=128(bytes), OVL_MAX_STACK=500(num lowerdirs)
> So, MAX_HANDLE_SZ*OVL_MAX_STACK = 64KB which is bigger than limit.
> So, I've decided to give user one fhandle by one call. This is also
> bad from the performance point of view.
> 2. When using ioctls we need to have *fixed* size of input and output.
> So, if MAX_HANDLE_SZ will change in the future our _IOR('o', 2, struct ovl_mnt_opt_fh)
> will also change with struct ovl_mnt_opt_fh.
>

The choice of API with fixed output size for a variable length info seems weird.

I am tempted to suggest extending name_to_handle_at(), for example
name_to_handle_at(ovl_root_fd, path, &fhandle, &layer_id, AT_LAYER)

Where layer_id can be input/output arg.

But I acknowledge this is going to be a much harder sell...

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd
  2020-10-04 19:24 [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Alexander Mikhalitsyn
  2020-10-04 19:24 ` [RFC PATCH 1/1] overlayfs: add ioctls that allows to get fhandle for layers dentries Alexander Mikhalitsyn
  2020-10-05  7:56 ` [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Amir Goldstein
@ 2020-10-05 17:02 ` Alexander Mikhalitsyn
  2020-10-05 17:08   ` Randy Dunlap
  2020-10-14 14:14 ` [RFC PATCH] overlayfs: add fsinfo(FSINFO_ATTR_OVL_SOURCES) support Alexander Mikhalitsyn
  3 siblings, 1 reply; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2020-10-05 17:02 UTC (permalink / raw)
  To: miklos
  Cc: Alexander Mikhalitsyn, Amir Goldstein, Andrei Vagin,
	Pavel Tikhomirov, David Howells, linux-unionfs, linux-fsdevel,
	linux-kernel

Second variant of possible interface to get source-dirs fhandles from
userspace. OVL_IOC_GETINFOFD ioctls opens special [ovlinfofd] descriptor
which is really just seq_file. When read from this seq_file we will get
something like this:
===
numlower: 2
L fhandle-bytes:c fhandle-type:1 f_handle:9685a2160200000000000000
L fhandle-bytes:c fhandle-type:1 f_handle:c74cd5c10300000000000000
U fhandle-bytes:c fhandle-type:1 f_handle:e45842640400000000000000
W fhandle-bytes:c fhandle-type:1 f_handle:d393374d0500000000000000
===

Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: David Howells <dhowells@redhat.com>
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
---
 fs/overlayfs/readdir.c | 171 +++++++++++++++++++++++++++++++++++++++++
 1 file changed, 171 insertions(+)

diff --git a/fs/overlayfs/readdir.c b/fs/overlayfs/readdir.c
index 12ee043d2b3a..60c3c47a6b3e 100644
--- a/fs/overlayfs/readdir.c
+++ b/fs/overlayfs/readdir.c
@@ -14,6 +14,9 @@
 #include <linux/cred.h>
 #include <linux/ratelimit.h>
 #include <linux/exportfs.h>
+#include <linux/anon_inodes.h>
+#include <linux/seq_file.h>
+#include <linux/syscalls.h>
 #include "overlayfs.h"
 
 struct ovl_cache_entry {
@@ -1067,11 +1070,175 @@ static long ovl_ioctl_get_work_fhandle(struct super_block *sb,
 	return __ovl_ioctl_get_fhandle(ofs->workbasedir, arg);
 }
 
+static int ovlinfofd_release(struct inode *inode, struct file *file)
+{
+	printk("ovlinfofd_release\n");
+	return single_release(inode, file);
+}
+
+#ifdef CONFIG_PROC_FS
+static void ovlinfofd_show_fdinfo(struct seq_file *m, struct file *f)
+{
+	/* TODO */
+}
+#endif
+
+static const struct file_operations ovlinfofd_fops = {
+	.owner		= THIS_MODULE,
+#ifdef CONFIG_PROC_FS
+	.show_fdinfo	= ovlinfofd_show_fdinfo,
+#endif
+	.release	= ovlinfofd_release,
+	.read		= seq_read,
+	.llseek		= seq_lseek,
+};
+
+static long __ovl_ioctl_show_dentry_fhandle(struct seq_file *s,
+					    const char *prefix,
+					    struct dentry *origin)
+{
+	struct ovl_mnt_opt_fh *fh;
+	int ret = 0, i;
+
+	fh = __ovl_encode_mnt_opt_fh(origin);
+	if (IS_ERR(fh))
+		return PTR_ERR(fh);
+
+	seq_printf(s, "%s fhandle-bytes:%x fhandle-type:%x f_handle:",
+		   prefix, fh->fh.handle_bytes, fh->fh.handle_type);
+
+	for (i = 0; i < fh->fh.handle_bytes; i++)
+		seq_printf(s, "%02x", (int)fh->fh.f_handle[i]);
+
+	seq_putc(s, '\n');
+
+	kfree(fh);
+	return ret;
+}
+
+static long ovl_ioctl_show_lower_fhandle(struct seq_file *s,
+					 unsigned long arg)
+{
+	struct super_block *sb = s->private;
+	struct ovl_entry *oe = sb->s_root->d_fsdata;
+	struct dentry *origin;
+
+	if (arg >= oe->numlower)
+		return -EINVAL;
+
+	origin = oe->lowerstack[arg].dentry;
+
+	return __ovl_ioctl_show_dentry_fhandle(s, "L", origin);
+}
+
+static long ovl_ioctl_show_upper_fhandle(struct seq_file *s)
+{
+	struct super_block *sb = s->private;
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct dentry *origin;
+
+	if (!ofs->config.upperdir)
+		return -EINVAL;
+
+	origin = OVL_I(d_inode(sb->s_root))->__upperdentry;
+
+	return __ovl_ioctl_show_dentry_fhandle(s, "U", origin);
+}
+
+static long ovl_ioctl_show_work_fhandle(struct seq_file *s)
+{
+	struct super_block *sb = s->private;
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	if (!ofs->config.upperdir)
+		return -EINVAL;
+
+	return __ovl_ioctl_show_dentry_fhandle(s, "W", ofs->workbasedir);
+}
+
+static int ovlinfofd_show(struct seq_file *s, void *unused)
+{
+	struct super_block *sb = s->private;
+	struct ovl_entry *oe = sb->s_root->d_fsdata;
+	int i;
+
+	printk("ovlinfofd_show\n");
+
+	seq_printf(s, "numlower: %d\n", oe->numlower);
+
+	for (i = 0; i < oe->numlower; i++)
+		ovl_ioctl_show_lower_fhandle(s, i);
+	ovl_ioctl_show_upper_fhandle(s);
+	ovl_ioctl_show_work_fhandle(s);
+
+	return 0;
+}
+
+static long ovl_ioctl_get_info_fd(struct super_block *sb,
+				  unsigned long arg)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct ovl_entry *oe = sb->s_root->d_fsdata;
+	int err, ufd, flags = arg;
+	struct fd f;
+
+	if (flags & ~(O_CLOEXEC))
+		return -EINVAL;
+
+	/* FIXME Comment taken from signalfd.c. Need to think about this.
+	 * When we call this, the initialization must be complete, since
+	 * anon_inode_getfd() will install the fd.
+	 */
+	ufd = anon_inode_getfd("[ovlinfofd]", &ovlinfofd_fops, NULL,
+				O_RDONLY | (flags & (O_CLOEXEC)));
+	if (ufd < 0)
+		return ufd;
+
+	f = fdget(ufd);
+	if (!f.file) {
+		err = -EBADF;
+		goto err_close;
+	}
+
+	/*
+	 * It's good to have some good guess of seq_file buffer size
+	 * from start because if we will just use single_open() function
+	 * then we will make several seq_file overflows and .show callback
+	 * will be called several times. It's very bad for performance.
+	 *
+	 * Guess is very simple: we show fhandles as hex string. So,
+	 * all that we need is take MAX_HANDLE_SZ * 2 and multiply by
+	 * number of overlayfs mount source-dirs.
+	 */
+	err = single_open_size(f.file, ovlinfofd_show, sb,
+			       MAX_HANDLE_SZ * 2 *
+			       (oe->numlower + 2 * !!ofs->config.upperdir));
+	if (err)
+		goto err_fdput;
+
+	/*
+	 * We doing tricky things by combining anon_inode_getfd with seq_files,
+	 * so, it's better to check that all fine with fops after single_open_size
+	 * call.
+	 */
+	WARN_ON(f.file->f_op != &ovlinfofd_fops);
+	fdput(f);
+
+	return ufd;
+
+err_fdput:
+	fdput(f);
+err_close:
+	ksys_close(ufd);
+	return err;
+}
+
 #define	OVL_IOC_GETLWRFHNDLSNUM			_IO('o', 1)
 // DISCUSS: what if MAX_HANDLE_SZ will change?
 #define	OVL_IOC_GETLWRFHNDL			_IOR('o', 2, struct ovl_mnt_opt_fh)
 #define	OVL_IOC_GETUPPRFHNDL			_IOR('o', 3, struct ovl_mnt_opt_fh)
 #define	OVL_IOC_GETWRKFHNDL			_IOR('o', 4, struct ovl_mnt_opt_fh)
+#define	OVL_IOC_GETINFOFD			_IO('o', 5)
 
 static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
@@ -1094,6 +1261,10 @@ static long ovl_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 		ret = ovl_ioctl_get_work_fhandle(file_inode(file)->i_sb, arg);
 		break;
 
+	case OVL_IOC_GETINFOFD:
+		ret = ovl_ioctl_get_info_fd(file_inode(file)->i_sb, arg);
+		break;
+
 	default:
 		ret = -ENOTTY;
 	}
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd
  2020-10-05 17:02 ` [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd Alexander Mikhalitsyn
@ 2020-10-05 17:08   ` Randy Dunlap
  2020-10-05 17:12     ` Alexander Mikhalitsyn
  0 siblings, 1 reply; 13+ messages in thread
From: Randy Dunlap @ 2020-10-05 17:08 UTC (permalink / raw)
  To: Alexander Mikhalitsyn, miklos
  Cc: Amir Goldstein, Andrei Vagin, Pavel Tikhomirov, David Howells,
	linux-unionfs, linux-fsdevel, linux-kernel

On 10/5/20 10:02 AM, Alexander Mikhalitsyn wrote:
>  #define	OVL_IOC_GETLWRFHNDLSNUM			_IO('o', 1)
>  // DISCUSS: what if MAX_HANDLE_SZ will change?
>  #define	OVL_IOC_GETLWRFHNDL			_IOR('o', 2, struct ovl_mnt_opt_fh)
>  #define	OVL_IOC_GETUPPRFHNDL			_IOR('o', 3, struct ovl_mnt_opt_fh)
>  #define	OVL_IOC_GETWRKFHNDL			_IOR('o', 4, struct ovl_mnt_opt_fh)
> +#define	OVL_IOC_GETINFOFD			_IO('o', 5)

Hi,

Quoting (repeating) from
https://lore.kernel.org/lkml/9cd0e9d1-f124-3f2d-86e6-e6e96a1ccb1e@infradead.org/:

This needs to have Documentation/userspace-api/ioctl/ioctl-number.rst
updated also.

...

Are you waiting until it's past RFC stage?

thanks.
-- 
~Randy


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd
  2020-10-05 17:08   ` Randy Dunlap
@ 2020-10-05 17:12     ` Alexander Mikhalitsyn
  2020-10-05 17:22       ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2020-10-05 17:12 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: miklos, Amir Goldstein, Andrei Vagin, Pavel Tikhomirov,
	David Howells, linux-unionfs, linux-fsdevel, linux-kernel

On Mon, 5 Oct 2020 10:08:42 -0700
Randy Dunlap <rdunlap@infradead.org> wrote:

> On 10/5/20 10:02 AM, Alexander Mikhalitsyn wrote:
> >  #define	OVL_IOC_GETLWRFHNDLSNUM			_IO('o', 1)
> >  // DISCUSS: what if MAX_HANDLE_SZ will change?
> >  #define	OVL_IOC_GETLWRFHNDL			_IOR('o', 2, struct ovl_mnt_opt_fh)
> >  #define	OVL_IOC_GETUPPRFHNDL			_IOR('o', 3, struct ovl_mnt_opt_fh)
> >  #define	OVL_IOC_GETWRKFHNDL			_IOR('o', 4, struct ovl_mnt_opt_fh)
> > +#define	OVL_IOC_GETINFOFD			_IO('o', 5)
> 
> Hi,
> 
> Quoting (repeating) from
> https://lore.kernel.org/lkml/9cd0e9d1-f124-3f2d-86e6-e6e96a1ccb1e@infradead.org/:
> 
> This needs to have Documentation/userspace-api/ioctl/ioctl-number.rst
> updated also.
> 
> ...
> 
> Are you waiting until it's past RFC stage?
> 
> thanks.
> -- 
> ~Randy
> 

Hi,

thank you! I will prepare this change too when we
decide which ioctls to add. ;)

Regards,
Alex.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd
  2020-10-05 17:12     ` Alexander Mikhalitsyn
@ 2020-10-05 17:22       ` Amir Goldstein
  2020-10-05 19:39         ` Alexander Mikhalitsyn
  0 siblings, 1 reply; 13+ messages in thread
From: Amir Goldstein @ 2020-10-05 17:22 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: Randy Dunlap, Miklos Szeredi, Andrei Vagin, Pavel Tikhomirov,
	David Howells, overlayfs, linux-fsdevel, linux-kernel

On Mon, Oct 5, 2020 at 8:13 PM Alexander Mikhalitsyn
<alexander.mikhalitsyn@virtuozzo.com> wrote:
>
> On Mon, 5 Oct 2020 10:08:42 -0700
> Randy Dunlap <rdunlap@infradead.org> wrote:
>
> > On 10/5/20 10:02 AM, Alexander Mikhalitsyn wrote:
> > >  #define    OVL_IOC_GETLWRFHNDLSNUM                 _IO('o', 1)
> > >  // DISCUSS: what if MAX_HANDLE_SZ will change?
> > >  #define    OVL_IOC_GETLWRFHNDL                     _IOR('o', 2, struct ovl_mnt_opt_fh)
> > >  #define    OVL_IOC_GETUPPRFHNDL                    _IOR('o', 3, struct ovl_mnt_opt_fh)
> > >  #define    OVL_IOC_GETWRKFHNDL                     _IOR('o', 4, struct ovl_mnt_opt_fh)
> > > +#define    OVL_IOC_GETINFOFD                       _IO('o', 5)
> >
> > Hi,
> >
> > Quoting (repeating) from
> > https://lore.kernel.org/lkml/9cd0e9d1-f124-3f2d-86e6-e6e96a1ccb1e@infradead.org/:
> >
> > This needs to have Documentation/userspace-api/ioctl/ioctl-number.rst
> > updated also.
> >
> > ...
> >
> > Are you waiting until it's past RFC stage?
> >
> > thanks.
> > --
> > ~Randy
> >
>
> Hi,
>
> thank you! I will prepare this change too when we
> decide which ioctls to add. ;)
>

Or... don't do ioctls and avoid the ABI headaches.
If you are going to expose a seqfile I think it would be much prefered
to do a /sys/fs/overlay/instance/<device minor>/layes file.

But that's just my opinion.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd
  2020-10-05 17:22       ` Amir Goldstein
@ 2020-10-05 19:39         ` Alexander Mikhalitsyn
  2020-10-06  9:54           ` Miklos Szeredi
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2020-10-05 19:39 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Randy Dunlap, Miklos Szeredi, Andrei Vagin, Pavel Tikhomirov,
	David Howells, overlayfs, linux-fsdevel, linux-kernel

On Mon, 5 Oct 2020 20:22:47 +0300
Amir Goldstein <amir73il@gmail.com> wrote:

> On Mon, Oct 5, 2020 at 8:13 PM Alexander Mikhalitsyn
> <alexander.mikhalitsyn@virtuozzo.com> wrote:
> >
> > On Mon, 5 Oct 2020 10:08:42 -0700
> > Randy Dunlap <rdunlap@infradead.org> wrote:
> >
> > > On 10/5/20 10:02 AM, Alexander Mikhalitsyn wrote:
> > > >  #define    OVL_IOC_GETLWRFHNDLSNUM                 _IO('o', 1)
> > > >  // DISCUSS: what if MAX_HANDLE_SZ will change?
> > > >  #define    OVL_IOC_GETLWRFHNDL                     _IOR('o', 2, struct ovl_mnt_opt_fh)
> > > >  #define    OVL_IOC_GETUPPRFHNDL                    _IOR('o', 3, struct ovl_mnt_opt_fh)
> > > >  #define    OVL_IOC_GETWRKFHNDL                     _IOR('o', 4, struct ovl_mnt_opt_fh)
> > > > +#define    OVL_IOC_GETINFOFD                       _IO('o', 5)
> > >
> > > Hi,
> > >
> > > Quoting (repeating) from
> > > https://lore.kernel.org/lkml/9cd0e9d1-f124-3f2d-86e6-e6e96a1ccb1e@infradead.org/:
> > >
> > > This needs to have Documentation/userspace-api/ioctl/ioctl-number.rst
> > > updated also.
> > >
> > > ...
> > >
> > > Are you waiting until it's past RFC stage?
> > >
> > > thanks.
> > > --
> > > ~Randy
> > >
> >
> > Hi,
> >
> > thank you! I will prepare this change too when we
> > decide which ioctls to add. ;)
> >
> 
> Or... don't do ioctls and avoid the ABI headaches.
> If you are going to expose a seqfile I think it would be much prefered
> to do a /sys/fs/overlay/instance/<device minor>/layes file.
> 
> But that's just my opinion.
> 
> Thanks,
> Amir.

Hi!

Oh, yes, it's just another one choice for us ;)
I decided to create ioctl which opens fd for transferring
data about overlayfs.
Sure, let's look on sys fs. I will dive into it!

Thanks

Regards,
Alex.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC)
  2020-10-05  7:56 ` [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Amir Goldstein
@ 2020-10-05 19:46   ` Alexander Mikhalitsyn
  2020-10-06  6:53     ` Amir Goldstein
  0 siblings, 1 reply; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2020-10-05 19:46 UTC (permalink / raw)
  To: Amir Goldstein
  Cc: Miklos Szeredi, Andrei Vagin, Pavel Tikhomirov, David Howells,
	overlayfs, linux-fsdevel, linux-kernel

Hi Amir,

On Mon, 5 Oct 2020 10:56:50 +0300
Amir Goldstein <amir73il@gmail.com> wrote:

> On Sun, Oct 4, 2020 at 10:25 PM Alexander Mikhalitsyn
> <alexander.mikhalitsyn@virtuozzo.com> wrote:
> >
> > Some time ago we discussed about the problem of Checkpoint-Restoring
> > overlayfs mounts [1]. Big thanks to Amir for review and suggestions.
> >
> > Brief from previous discussion.
> > Problem statement: to checkpoint-restore overlayfs mounts we need
> > to save overlayfs mount state and save it into the image. Basically,
> > this state for us it's just mount options of overlayfs mount. But
> > here we have two problems:
> >
> > I. during mounting overlayfs user may specify relative paths in upperdir,
> > workdir, lowerdir options
> >
> > II. also user may unmount mount from which these paths was opened during mounting
> >
> > This is real problems for us. My first patch was attempt to address both problems.
> > 1. I've added refcnt get for mounts from which overlayfs was mounted.
> > 2. I've changed overlayfs mountinfo show algorithm, so overlayfs started to *always*
> > show full paths for upperdir,workdir,lowerdirs.
> > 3. I've added mnt_id show-time only option which allows to determine from which mnt_id
> > we opened options paths.
> >
> > Pros:
> > - we can determine full information about overlayfs mount
> > - we hold refcnt to mount, so, user may unmount source mounts only
> > with lazy flag
> >
> > Cons:
> > - by adding refcnt get for mount I've changed possible overlayfs usecases
> > - by showing *full* paths we can more easily reache PAGE_SIZE limit of
> > mounts options in procfs
> > - by adding mnt_id show-only option I've added inconsistency between
> > mount-time options and show-time mount options
> >
> > After very productive discussion with Amir and Pavel I've decided to write new
> > implementation. In new approach we decided *not* to take extra refcnts to mounts.
> > Also we decided to use exportfs fhandles instead of full paths. To determine
> > full path we plan to use the next algo:
> > 1. Export {s_dev; fhandle} from overlayfs for *all* sources
> > 2. User open_by_handle_at syscall to open all these fhandles (we need to
> > determine mount for each fhandle, looks like we can do this by s_dev by linear
> > search in /proc/<pid>/mountinfo)
> > 3. Then readlink /proc/<pid>/fd/<opened fd>
> > 4. Dump this full path+mnt_id
> >
> 
> Hi Alex,
> 
> The general concept looks good to me.
> I will not provide specific comment on the implementation (it looks
> fine) until the
> concept API is accepted by the maintainer.
> 
> The main thing I want to make sure is that if we add this interface it can
> serve other use cases as well.

Yes, let's create universal interface.

> 
> During my talk on LPC, I got a similar request from two developers for two
> different use cases. They wanted a safe method to iterate "changes
> since baseline"
> from either within the container or from the host.

This discussions was on lkml or in private room?

> 
> Your proposed API is a step in the direction for meeting their requirement.
> The major change is that ioctl (or whatever method) should expose the
> layers topology of a specific object, not only the overlay instance.
> 
> For C/R you would query the layers topology of the overlay root dir.
> 
> My comments of the specific methods below are not meant to
> object to the choice of ioctl, but they are meant to give the alternative
> a fair chance. I am kind of leaning towards ioctl myself.
> 
> > But there is question. How to export this {s_dev; fhandle} from kernel to userspace?
> > - We decided not to use procfs.
> 
> Why not?
> C/R already uses procfs to export fhandle for fanotify/inotify
> I kind of like the idea of having /sys/fs/overlay/instances etc.
> It could be useful to many things.

Ah, sorry. For some reason I've decided that we excluded procfs/sysfs option :)
Let's take this option into account too.

> 
> > - Amir proposed solution - use xattrs. But after diving into it I've meet problem
> > where I can set this xattrs?
> > If I set this xattrs on overlayfs dentries then during rsync, or cp -p=xattr we will copy
> > this temporary information.
> 
> No you won't.
> rsync, cp will only copy xattrs listed with listxattr.
> Several filesystems, such as cifs and nfs export "object properties"
> via private xattrs
> that are not listed in listxattr (e.g. CIFS_XATTR_CIFS_ACL).
> You are not limited in what you can do in the "trusted.overlay" namespace, for
> example "trusted.overlay.layers.0.fh"
> 
> The advantage is that it is very easy to implement and requires
> less discussions about ABI, but I agree it does feel a bit like a hack.

Ack. I can try to write some draft implementation with xattrs.

> 
> > - ioctls? (this patchset implements this approach)
> > - fsinfo subsystem (not merged yet) [2]
> >
> > Problems with ioctls:
> > 1. We limited in output data size (16 KB AFAIK)
> > but MAX_HANDLE_SZ=128(bytes), OVL_MAX_STACK=500(num lowerdirs)
> > So, MAX_HANDLE_SZ*OVL_MAX_STACK = 64KB which is bigger than limit.
> > So, I've decided to give user one fhandle by one call. This is also
> > bad from the performance point of view.
> > 2. When using ioctls we need to have *fixed* size of input and output.
> > So, if MAX_HANDLE_SZ will change in the future our _IOR('o', 2, struct ovl_mnt_opt_fh)
> > will also change with struct ovl_mnt_opt_fh.
> >
> 
> The choice of API with fixed output size for a variable length info seems weird.

Yes, and I've proposed option with ioctl syscall where we open file descriptor
instead of doing direct copy_from_user/copy_to_user.

> 
> I am tempted to suggest extending name_to_handle_at(), for example
> name_to_handle_at(ovl_root_fd, path, &fhandle, &layer_id, AT_LAYER)
> 
> Where layer_id can be input/output arg.
> 
> But I acknowledge this is going to be a much harder sell...

Looks interesting. I'll need to dive and think about it.

> 
> Thanks,
> Amir.

Thanks for your attention and review of patchset!

Regards,
Alex.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC)
  2020-10-05 19:46   ` Alexander Mikhalitsyn
@ 2020-10-06  6:53     ` Amir Goldstein
  0 siblings, 0 replies; 13+ messages in thread
From: Amir Goldstein @ 2020-10-06  6:53 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: Miklos Szeredi, Andrei Vagin, Pavel Tikhomirov, David Howells,
	overlayfs, linux-fsdevel, linux-kernel

On Mon, Oct 5, 2020 at 10:47 PM Alexander Mikhalitsyn
<alexander.mikhalitsyn@virtuozzo.com> wrote:
>
> Hi Amir,
>
> On Mon, 5 Oct 2020 10:56:50 +0300
> Amir Goldstein <amir73il@gmail.com> wrote:
>
> > On Sun, Oct 4, 2020 at 10:25 PM Alexander Mikhalitsyn
> > <alexander.mikhalitsyn@virtuozzo.com> wrote:
> > >
> > > Some time ago we discussed about the problem of Checkpoint-Restoring
> > > overlayfs mounts [1]. Big thanks to Amir for review and suggestions.
> > >
> > > Brief from previous discussion.
> > > Problem statement: to checkpoint-restore overlayfs mounts we need
> > > to save overlayfs mount state and save it into the image. Basically,
> > > this state for us it's just mount options of overlayfs mount. But
> > > here we have two problems:
> > >
> > > I. during mounting overlayfs user may specify relative paths in upperdir,
> > > workdir, lowerdir options
> > >
> > > II. also user may unmount mount from which these paths was opened during mounting
> > >
> > > This is real problems for us. My first patch was attempt to address both problems.
> > > 1. I've added refcnt get for mounts from which overlayfs was mounted.
> > > 2. I've changed overlayfs mountinfo show algorithm, so overlayfs started to *always*
> > > show full paths for upperdir,workdir,lowerdirs.
> > > 3. I've added mnt_id show-time only option which allows to determine from which mnt_id
> > > we opened options paths.
> > >
> > > Pros:
> > > - we can determine full information about overlayfs mount
> > > - we hold refcnt to mount, so, user may unmount source mounts only
> > > with lazy flag
> > >
> > > Cons:
> > > - by adding refcnt get for mount I've changed possible overlayfs usecases
> > > - by showing *full* paths we can more easily reache PAGE_SIZE limit of
> > > mounts options in procfs
> > > - by adding mnt_id show-only option I've added inconsistency between
> > > mount-time options and show-time mount options
> > >
> > > After very productive discussion with Amir and Pavel I've decided to write new
> > > implementation. In new approach we decided *not* to take extra refcnts to mounts.
> > > Also we decided to use exportfs fhandles instead of full paths. To determine
> > > full path we plan to use the next algo:
> > > 1. Export {s_dev; fhandle} from overlayfs for *all* sources
> > > 2. User open_by_handle_at syscall to open all these fhandles (we need to
> > > determine mount for each fhandle, looks like we can do this by s_dev by linear
> > > search in /proc/<pid>/mountinfo)
> > > 3. Then readlink /proc/<pid>/fd/<opened fd>
> > > 4. Dump this full path+mnt_id
> > >
> >
> > Hi Alex,
> >
> > The general concept looks good to me.
> > I will not provide specific comment on the implementation (it looks
> > fine) until the
> > concept API is accepted by the maintainer.
> >
> > The main thing I want to make sure is that if we add this interface it can
> > serve other use cases as well.
>
> Yes, let's create universal interface.
>

Note that this universal interface contradicts the direction of sysfs
which is a convenient way for getting filesystem instance info, but
not object info.

> >
> > During my talk on LPC, I got a similar request from two developers for two
> > different use cases. They wanted a safe method to iterate "changes
> > since baseline"
> > from either within the container or from the host.
>
> This discussions was on lkml or in private room?
>

The containers track:
https://youtu.be/fSyr_IXM21Y?t=4939

We continued in private channels, but the general idea
is an API to provide some insight about underlying layers

> >
> > Your proposed API is a step in the direction for meeting their requirement.
> > The major change is that ioctl (or whatever method) should expose the
> > layers topology of a specific object, not only the overlay instance.
> >
> > For C/R you would query the layers topology of the overlay root dir.
> >
> > My comments of the specific methods below are not meant to
> > object to the choice of ioctl, but they are meant to give the alternative
> > a fair chance. I am kind of leaning towards ioctl myself.
> >
> > > But there is question. How to export this {s_dev; fhandle} from kernel to userspace?
> > > - We decided not to use procfs.
> >
> > Why not?
> > C/R already uses procfs to export fhandle for fanotify/inotify
> > I kind of like the idea of having /sys/fs/overlay/instances etc.
> > It could be useful to many things.
>
> Ah, sorry. For some reason I've decided that we excluded procfs/sysfs option :)
> Let's take this option into account too.
>
> >
> > > - Amir proposed solution - use xattrs. But after diving into it I've meet problem
> > > where I can set this xattrs?
> > > If I set this xattrs on overlayfs dentries then during rsync, or cp -p=xattr we will copy
> > > this temporary information.
> >
> > No you won't.
> > rsync, cp will only copy xattrs listed with listxattr.
> > Several filesystems, such as cifs and nfs export "object properties"
> > via private xattrs
> > that are not listed in listxattr (e.g. CIFS_XATTR_CIFS_ACL).
> > You are not limited in what you can do in the "trusted.overlay" namespace, for
> > example "trusted.overlay.layers.0.fh"
> >
> > The advantage is that it is very easy to implement and requires
> > less discussions about ABI, but I agree it does feel a bit like a hack.
>
> Ack. I can try to write some draft implementation with xattrs.
>

You don't have to write code before getting an ack from
maintainer on the design, but fine by me.

> >
> > > - ioctls? (this patchset implements this approach)
> > > - fsinfo subsystem (not merged yet) [2]
> > >
> > > Problems with ioctls:
> > > 1. We limited in output data size (16 KB AFAIK)
> > > but MAX_HANDLE_SZ=128(bytes), OVL_MAX_STACK=500(num lowerdirs)
> > > So, MAX_HANDLE_SZ*OVL_MAX_STACK = 64KB which is bigger than limit.
> > > So, I've decided to give user one fhandle by one call. This is also
> > > bad from the performance point of view.
> > > 2. When using ioctls we need to have *fixed* size of input and output.
> > > So, if MAX_HANDLE_SZ will change in the future our _IOR('o', 2, struct ovl_mnt_opt_fh)
> > > will also change with struct ovl_mnt_opt_fh.
> > >
> >
> > The choice of API with fixed output size for a variable length info seems weird.
>
> Yes, and I've proposed option with ioctl syscall where we open file descriptor
> instead of doing direct copy_from_user/copy_to_user.
>
> >
> > I am tempted to suggest extending name_to_handle_at(), for example
> > name_to_handle_at(ovl_root_fd, path, &fhandle, &layer_id, AT_LAYER)
> >
> > Where layer_id can be input/output arg.
> >
> > But I acknowledge this is going to be a much harder sell...
>
> Looks interesting. I'll need to dive and think about it.
>

This API change has a lot more stakeholders.
I think it would be wiser for you to stay within the overlayfs boundaries.

Thanks,
Amir.

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd
  2020-10-05 19:39         ` Alexander Mikhalitsyn
@ 2020-10-06  9:54           ` Miklos Szeredi
  0 siblings, 0 replies; 13+ messages in thread
From: Miklos Szeredi @ 2020-10-06  9:54 UTC (permalink / raw)
  To: Alexander Mikhalitsyn
  Cc: Amir Goldstein, Randy Dunlap, Andrei Vagin, Pavel Tikhomirov,
	David Howells, overlayfs, linux-fsdevel, linux-kernel

On Mon, Oct 5, 2020 at 9:40 PM Alexander Mikhalitsyn
<alexander.mikhalitsyn@virtuozzo.com> wrote:

> Oh, yes, it's just another one choice for us ;)
> I decided to create ioctl which opens fd for transferring
> data about overlayfs.
> Sure, let's look on sys fs. I will dive into it!

Or maybe use fsinfo(2) or whatever it's going to be called?

That one seems stuck at the moment, but having a clear set of use
cases can possibly help move it forward.

Thanks,
Miklos

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH] overlayfs: add fsinfo(FSINFO_ATTR_OVL_SOURCES) support
  2020-10-04 19:24 [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Alexander Mikhalitsyn
                   ` (2 preceding siblings ...)
  2020-10-05 17:02 ` [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd Alexander Mikhalitsyn
@ 2020-10-14 14:14 ` Alexander Mikhalitsyn
  3 siblings, 0 replies; 13+ messages in thread
From: Alexander Mikhalitsyn @ 2020-10-14 14:14 UTC (permalink / raw)
  To: miklos
  Cc: Alexander Mikhalitsyn, David Howells, Amir Goldstein,
	Andrei Vagin, Pavel Tikhomirov, linux-unionfs, linux-fsdevel,
	linux-kernel

FSINFO_ATTR_OVL_SOURCES fsinfo attribute allows us
to export fhandles for overlayfs source directories
such as upperdir, workdir, lowerdirs.

This patchs adds initial support of fsinfo into overlayfs.
If community decide to take this way of C/R support
in overlayfs then I have plan to implement FSINFO_ATTR_SUPPORTS
and FSINFO_ATTR_FEATURES standard attributes handlers too.

Cc: David Howells <dhowells@redhat.com>
Cc: Amir Goldstein <amir73il@gmail.com>
Cc: Andrei Vagin <avagin@gmail.com>
Cc: Pavel Tikhomirov <ptikhomirov@virtuozzo.com>
Cc: Miklos Szeredi <miklos@szeredi.hu>
Cc: linux-unionfs@vger.kernel.org
Cc: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Alexander Mikhalitsyn <alexander.mikhalitsyn@virtuozzo.com>
---
 fs/overlayfs/Makefile       |   1 +
 fs/overlayfs/fsinfo.c       | 133 ++++++++++++++++++++++++++++++++++++
 fs/overlayfs/overlayfs.h    |   6 ++
 fs/overlayfs/super.c        |   3 +
 include/uapi/linux/fsinfo.h |  31 +++++++++
 5 files changed, 174 insertions(+)
 create mode 100644 fs/overlayfs/fsinfo.c

diff --git a/fs/overlayfs/Makefile b/fs/overlayfs/Makefile
index 9164c585eb2f..db555c0e4508 100644
--- a/fs/overlayfs/Makefile
+++ b/fs/overlayfs/Makefile
@@ -7,3 +7,4 @@ obj-$(CONFIG_OVERLAY_FS) += overlay.o
 
 overlay-objs := super.o namei.o util.o inode.o file.o dir.o readdir.o \
 		copy_up.o export.o
+overlay-$(CONFIG_FSINFO)	+= fsinfo.o
diff --git a/fs/overlayfs/fsinfo.c b/fs/overlayfs/fsinfo.c
new file mode 100644
index 000000000000..9857949dcce5
--- /dev/null
+++ b/fs/overlayfs/fsinfo.c
@@ -0,0 +1,133 @@
+// SPDX-License-Identifier: GPL-2.0
+/* Filesystem information for overlayfs
+ *
+ * Copyright (C) 2020 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ */
+
+#include <linux/mount.h>
+#include <linux/fsinfo.h>
+#include "overlayfs.h"
+
+static int __ovl_encode_mnt_opt_fh(struct fsinfo_ovl_source *p,
+				   struct dentry *dentry)
+{
+	int fh_type, dwords;
+	int buflen = MAX_HANDLE_SZ;
+	int err;
+
+	/* we ask for a non connected handle */
+	dwords = buflen >> 2;
+	fh_type = exportfs_encode_fh(dentry, (void *)p->fh.f_handle, &dwords, 0);
+	buflen = (dwords << 2);
+
+	err = -EIO;
+	if (WARN_ON(fh_type < 0) ||
+	    WARN_ON(buflen > MAX_HANDLE_SZ) ||
+	    WARN_ON(fh_type == FILEID_INVALID))
+		goto out_err;
+
+	p->fh.handle_type = fh_type;
+	p->fh.handle_bytes = buflen;
+
+	/*
+	 * Ideally, we want to have mnt_id+fhandle, but overlayfs not
+	 * keep refcnts on layers mounts and we couldn't determine
+	 * mnt_ids for layers. So, let's give s_dev to CRIU.
+	 * It's better than nothing.
+	 */
+	p->s_dev = dentry->d_sb->s_dev;
+
+	return 0;
+
+out_err:
+	return err;
+}
+
+static int ovl_fsinfo_store_source(struct fsinfo_ovl_source *p,
+				   enum fsinfo_ovl_source_type type,
+				   struct dentry *dentry)
+{
+	__ovl_encode_mnt_opt_fh(p, dentry);
+	p->type = type;
+	return 0;
+}
+
+static long ovl_ioctl_stor_lower_fhandle(struct fsinfo_ovl_source *p,
+					 struct super_block *sb,
+					 unsigned long arg)
+{
+	struct ovl_entry *oe = sb->s_root->d_fsdata;
+	struct dentry *origin;
+
+	if (arg >= oe->numlower)
+		return -EINVAL;
+
+	origin = oe->lowerstack[arg].dentry;
+
+	return ovl_fsinfo_store_source(p, FSINFO_OVL_LWR, origin);
+}
+
+static long ovl_ioctl_stor_upper_fhandle(struct fsinfo_ovl_source *p,
+					 struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct dentry *origin;
+
+	if (!ofs->config.upperdir)
+		return -EINVAL;
+
+	origin = OVL_I(d_inode(sb->s_root))->__upperdentry;
+
+	return ovl_fsinfo_store_source(p, FSINFO_OVL_UPPR, origin);
+}
+
+static long ovl_ioctl_stor_work_fhandle(struct fsinfo_ovl_source *p,
+					struct super_block *sb)
+{
+	struct ovl_fs *ofs = sb->s_fs_info;
+
+	if (!ofs->config.upperdir)
+		return -EINVAL;
+
+	return ovl_fsinfo_store_source(p, FSINFO_OVL_WRK, ofs->workbasedir);
+}
+
+static int ovl_fsinfo_sources(struct path *path, struct fsinfo_context *ctx)
+{
+	struct fsinfo_ovl_source *p = ctx->buffer;
+	struct super_block *sb = path->dentry->d_sb;
+	struct ovl_fs *ofs = sb->s_fs_info;
+	struct ovl_entry *oe = sb->s_root->d_fsdata;
+	size_t nr_sources = (oe->numlower + 2 * !!ofs->config.upperdir);
+	unsigned int i = 0, j;
+	int ret = -ENODATA;
+
+	ret = nr_sources * sizeof(*p);
+	if (ret <= ctx->buf_size) {
+		if (ofs->config.upperdir) {
+			ovl_ioctl_stor_upper_fhandle(&p[i++], sb);
+			ovl_ioctl_stor_work_fhandle(&p[i++], sb);
+		}
+
+		for (j = 0; j < oe->numlower; j++)
+			ovl_ioctl_stor_lower_fhandle(&p[i++], sb, j);
+	}
+
+	return ret;
+}
+
+static const struct fsinfo_attribute ovl_fsinfo_attributes[] = {
+	/* TODO: implement FSINFO_ATTR_SUPPORTS and FSINFO_ATTR_FEATURES */
+	/*
+	FSINFO_VSTRUCT	(FSINFO_ATTR_SUPPORTS,		ovl_fsinfo_supports),
+	FSINFO_VSTRUCT	(FSINFO_ATTR_FEATURES,		ovl_fsinfo_features),
+	*/
+	FSINFO_LIST	(FSINFO_ATTR_OVL_SOURCES,	ovl_fsinfo_sources),
+	{}
+};
+
+int ovl_fsinfo(struct path *path, struct fsinfo_context *ctx)
+{
+	return fsinfo_get_attribute(path, ctx, ovl_fsinfo_attributes);
+}
diff --git a/fs/overlayfs/overlayfs.h b/fs/overlayfs/overlayfs.h
index 29bc1ec699e7..1c0ac23ecf8f 100644
--- a/fs/overlayfs/overlayfs.h
+++ b/fs/overlayfs/overlayfs.h
@@ -7,6 +7,7 @@
 #include <linux/kernel.h>
 #include <linux/uuid.h>
 #include <linux/fs.h>
+#include <linux/xattr.h>
 #include "ovl_entry.h"
 
 #undef pr_fmt
@@ -492,3 +493,8 @@ int ovl_set_origin(struct dentry *dentry, struct dentry *lower,
 
 /* export.c */
 extern const struct export_operations ovl_export_operations;
+
+/* fsinfo.c */
+#ifdef CONFIG_FSINFO
+extern int ovl_fsinfo(struct path *path, struct fsinfo_context *ctx);
+#endif
diff --git a/fs/overlayfs/super.c b/fs/overlayfs/super.c
index 4b38141c2985..1a4cdbbd766f 100644
--- a/fs/overlayfs/super.c
+++ b/fs/overlayfs/super.c
@@ -392,6 +392,9 @@ static const struct super_operations ovl_super_operations = {
 	.put_super	= ovl_put_super,
 	.sync_fs	= ovl_sync_fs,
 	.statfs		= ovl_statfs,
+#ifdef CONFIG_FSINFO
+	.fsinfo		= ovl_fsinfo,
+#endif
 	.show_options	= ovl_show_options,
 	.remount_fs	= ovl_remount,
 };
diff --git a/include/uapi/linux/fsinfo.h b/include/uapi/linux/fsinfo.h
index dcd764771a7d..83c2511691e4 100644
--- a/include/uapi/linux/fsinfo.h
+++ b/include/uapi/linux/fsinfo.h
@@ -10,6 +10,8 @@
 #include <linux/types.h>
 #include <linux/socket.h>
 #include <linux/openat2.h>
+#include <linux/fs.h>
+#include <linux/exportfs.h>
 
 /*
  * The filesystem attributes that can be requested.  Note that some attributes
@@ -44,6 +46,8 @@
 #define FSINFO_ATTR_AFS_SERVER_NAME	0x301	/* Name of the Nth server (string) */
 #define FSINFO_ATTR_AFS_SERVER_ADDRESSES 0x302	/* List of addresses of the Nth server */
 
+#define FSINFO_ATTR_OVL_SOURCES		0x400	/* List of overlayfs source dirs fhandles+sdev */
+
 /*
  * Optional fsinfo() parameter structure.
  *
@@ -341,4 +345,31 @@ struct fsinfo_error_state {
 
 #define FSINFO_ATTR_ERROR_STATE__STRUCT struct fsinfo_error_state
 
+/*
+ * Information struct for fsinfo(FSINFO_ATTR_FSINFO_ATTRIBUTE_INFO).
+ *
+ * This gives information about the overlayfs upperdir, workdir, lowerdir
+ * superblock options (exported as fhandles).
+ */
+enum fsinfo_ovl_source_type {
+	FSINFO_OVL_UPPR	= 0,	/* upperdir */
+	FSINFO_OVL_WRK	= 1,	/* workdir */
+	FSINFO_OVL_LWR	= 2,	/* lowerdir list item */
+};
+
+/* DISCUSS: we can also export mnt_unique_id here which introduced by fsinfo patchset
+ * and then use him to detect if source was unmounted in the time gap between the moment when
+ * overlayfs was mounted and C/R process was started.
+ * We can get mnt_unique_id also by using fsinfo(FSINFO_ATTR_MOUNT_ALL)
+ */
+struct fsinfo_ovl_source {
+	enum fsinfo_ovl_source_type type;
+	__u32 s_dev;
+	struct file_handle fh;
+	/* use f_handle field from struct file_handle */
+	__u8 __fhdata[MAX_HANDLE_SZ];
+};
+
+#define FSINFO_ATTR_OVL_SOURCES__STRUCT struct fsinfo_ovl_source
+
 #endif /* _UAPI_LINUX_FSINFO_H */
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2020-10-14 14:15 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-04 19:24 [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Alexander Mikhalitsyn
2020-10-04 19:24 ` [RFC PATCH 1/1] overlayfs: add ioctls that allows to get fhandle for layers dentries Alexander Mikhalitsyn
2020-10-04 19:48   ` Randy Dunlap
2020-10-05  7:56 ` [RFC PATCH 0/1] overlayfs: C/R enhancments (RFC) Amir Goldstein
2020-10-05 19:46   ` Alexander Mikhalitsyn
2020-10-06  6:53     ` Amir Goldstein
2020-10-05 17:02 ` [RFC PATCH] overlayfs: add OVL_IOC_GETINFOFD ioctl that opens ovlinfofd Alexander Mikhalitsyn
2020-10-05 17:08   ` Randy Dunlap
2020-10-05 17:12     ` Alexander Mikhalitsyn
2020-10-05 17:22       ` Amir Goldstein
2020-10-05 19:39         ` Alexander Mikhalitsyn
2020-10-06  9:54           ` Miklos Szeredi
2020-10-14 14:14 ` [RFC PATCH] overlayfs: add fsinfo(FSINFO_ATTR_OVL_SOURCES) support Alexander Mikhalitsyn

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).