All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
@ 2016-08-25 15:21 Dmitry Safonov
  2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
                   ` (3 more replies)
  0 siblings, 4 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-25 15:21 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, rostedt, viro, Dmitry Safonov

This patches set is cleanly RFC and is not supposed to be applied.
Also for RFC time it builds only on x86_64.

So, in a mail thread Oleg told that it would be worth to introduce vm_file
for vdso mappings as currently uprobes can not be placed on vDSO VMAs [1].
In this patches set I introduce in-kernel filesystem for vdso files.
After patches vDSO VMA now has inode and is just a private file mapping:
7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0                          [vvar]
7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18                         [vdso]

Then I introduce interface in uprobe_events to insert uprobes in vdso.
FWIW:
  [~]# cd kernel/linux
  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
Symbol table '.dynsym' contains 11 entries:
   Num:    Value          Size Type    Bind   Vis      Ndx Name
     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8 
     2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12 clock_gettime@@LINUX_2.6
     3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12 __vdso_gettimeofday@@LINUX_2.6
     4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12 gettimeofday@@LINUX_2.6
     5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12 __vdso_time@@LINUX_2.6
     6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12 time@@LINUX_2.6
     7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12 __vdso_clock_gettime@@LINUX_2.6
     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
     9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12 __vdso_getcpu@@LINUX_2.6
    10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12 getcpu@@LINUX_2.6
  [~]# cd /sys/kernel/debug/tracing/
  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
  [tracing]# echo 1 > events/uprobes/enable
  [tracing]# su test # it has UID=1001
  [tracing]$ date
  Thu Aug 25 17:19:29 MSK 2016
  [tracing]$ exit
  [tracing]# cat trace
  # tracer: nop
  #
  # entries-in-buffer/entries-written: 175/175   #P:4
  #
  #                              _-----=> irqs-off
  #                             / _----=> need-resched
  #                            | / _---=> hardirq/softirq
  #                            || / _--=> preempt-depth
  #                            ||| /     delay
  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
  #              | |       |   ||||       |         |
              bash-11560 [001] d...   316.470236: time: (0x7ffcacebae30)
              bash-11560 [001] d...   316.471436: gettimeofday: (0x7ffcacebac50)
              bash-11560 [001] d...   316.477550: time: (0x7ffcacebae30)
              bash-11560 [001] d...   316.477655: time: (0x7ffcacebae30)
            mktemp-11568 [001] d...   316.479589: gettimeofday: (0x7ffc603f0c50)
              date-11571 [001] d...   316.481890: clock_gettime: (0x7ffec9db58d0)
[...]  

If this approach will be decided as fine, I will prepare a better version,
fixing the following things:
o put vdsofs in generic fs/* dir
o support other archs and vdso blobs
o remove BUG_ON()'s and UID==1001 check
o remove extern's and use headers only
o refactor code in create_trace_uprobe()
o add some state to (struct trace_uprobe), so i.e., `cat uprobe_events` will
  print those uprobes as vdso-based
o document this interface in Documentation/trace/uprobetracer.txt
o prepare nice patches set?

So, opinions? Is it worth to add something like this?

[1]: https://lkml.org/lkml/2016/7/12/346

Dmitry Safonov (3):
  x86/vdso: create vdso file, use it for mapping
  uprobe: drop isdigit() check in create_trace_uprobe
  uprobe: add vdso support

Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Andy Lutomirski <luto@amacapital.net>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Cc: Dmitry Safonov <0x7f454c46@gmail.com>

 arch/x86/entry/vdso/vma.c   | 148 ++++++++++++++++++++++++++++++++++++++++++--
 kernel/trace/trace_uprobe.c |  50 +++++++++++----
 2 files changed, 180 insertions(+), 18 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 37+ messages in thread

* [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-25 15:21 [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) Dmitry Safonov
@ 2016-08-25 15:21 ` Dmitry Safonov
  2016-08-25 19:49   ` Dmitry Safonov
                     ` (4 more replies)
  2016-08-25 15:21 ` [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe Dmitry Safonov
                   ` (2 subsequent siblings)
  3 siblings, 5 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-25 15:21 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, rostedt, viro, Dmitry Safonov

I added here a new in-kernel fs with ramfs-like options.
Created vdso file in this fs (yet for testing, only 64-bit vdso).
Mapped this file to process's mm on setup_additional_pages.
Just for testing purpose it's done only for specific UID.

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/entry/vdso/vma.c | 146 ++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 140 insertions(+), 6 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index f840766659a8..e83830e422ae 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -13,6 +13,10 @@
 #include <linux/elf.h>
 #include <linux/cpu.h>
 #include <linux/ptrace.h>
+#include <linux/pagemap.h>
+#include <linux/mount.h>
+#include <linux/ramfs.h>
+#include <linux/file.h>
 #include <asm/pvclock.h>
 #include <asm/vgtod.h>
 #include <asm/proto.h>
@@ -35,6 +39,9 @@ void __init init_vdso_image(const struct vdso_image *image)
 						image->alt_len));
 }
 
+static struct vfsmount *vdso_mnt;
+struct file *vdso_file_64;
+
 struct linux_binprm;
 
 /*
@@ -217,12 +224,31 @@ static int map_vdso(const struct vdso_image *image, bool calculate_addr)
 	/*
 	 * MAYWRITE to allow gdb to COW and set breakpoints
 	 */
-	vma = _install_special_mapping(mm,
-				       text_start,
-				       image->size,
-				       VM_READ|VM_EXEC|
-				       VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
-				       &vdso_mapping);
+	if (__kuid_val(task_uid(current)) == 1001) {
+		unsigned long n_addr = mmap_region(vdso_file_64, text_start,
+				image->size, VM_READ|VM_EXEC|
+				VM_DONTEXPAND|VM_SOFTDIRTY|
+				VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
+		if (text_start != n_addr) {
+			pr_err("Failed to mmap vdso file at %lx, mmap_region returned %lx\n",
+					text_start, n_addr);
+			goto old_way;
+		}
+		vma = find_vma(mm, text_start);
+		if (IS_ERR(vma) || vma->vm_start != text_start) {
+			pr_err("Failed to find vdso mapped vma at %lx\n",
+					text_start);
+			goto old_way;
+		}
+	} else {
+old_way:
+		vma = _install_special_mapping(mm,
+				text_start,
+				image->size,
+				VM_READ|VM_EXEC|
+				VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC,
+				&vdso_mapping);
+	}
 
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
@@ -336,6 +362,109 @@ static int vgetcpu_online(unsigned int cpu)
 	return smp_call_function_single(cpu, vgetcpu_cpu_init, NULL, 1);
 }
 
+static __init int add_vdso_pages_to_page_cache(
+		const struct vdso_image *vdso_image, struct inode *inode)
+{
+	unsigned long i;
+	int ret;
+
+	for (i = 0; i < (vdso_image->size / PAGE_SIZE); i++) {
+		struct page *page = virt_to_page(vdso_image->data +
+						(i << PAGE_SHIFT));
+		int ret;
+
+		__SetPageLocked(page);
+		ret = add_to_page_cache_locked(page, inode->i_mapping,
+						i, __GFP_REPEAT);
+		__SetPageUptodate(page);
+		__ClearPageLocked(page);
+		if (unlikely(ret))
+			goto put_pages;
+	}
+	return 0;
+
+put_pages:
+	while (i > 0)
+		put_page(virt_to_page(vdso_image->data + (i << PAGE_SHIFT)));
+	return ret;
+}
+
+static char *vdso_vma_name(struct dentry *dentry, char *buffer, int buflen)
+{
+	return "[vdso]";
+}
+static const struct dentry_operations vdso_dops = {
+	.d_dname = vdso_vma_name,
+};
+
+static __init struct file *init_vdso_file(const struct vdso_image *vdso_image,
+					const char *name)
+{
+	struct super_block *sb;
+	struct qstr name_str;
+	struct inode *inode;
+	struct path path;
+	struct file *res;
+
+	if (IS_ERR(vdso_mnt))
+		return ERR_CAST(vdso_mnt);
+	sb = vdso_mnt->mnt_sb;
+
+	name_str.hash = 0;
+	name_str.len = strlen(name);
+	name_str.name = name;
+
+	res = ERR_PTR(-ENOMEM);
+	path.mnt = mntget(vdso_mnt);
+	path.dentry = d_alloc_pseudo(sb, &name_str);
+	if (!path.dentry)
+		goto put_path;
+	d_set_d_op(path.dentry, &vdso_dops);
+
+	res = ERR_PTR(-ENOSPC);
+	inode = ramfs_get_inode(sb, NULL, S_IFREG | S_IRUGO | S_IXUGO, 0);
+	if (!inode)
+		goto put_path;
+
+	inode->i_flags |= S_PRIVATE;
+	d_instantiate(path.dentry, inode);
+	inode->i_size = vdso_image->size;
+
+	res = ERR_PTR(add_vdso_pages_to_page_cache(vdso_image, inode));
+	if (IS_ERR(res))
+		goto put_path;
+
+	res = alloc_file(&path, FMODE_READ, &ramfs_file_operations);
+	if (!IS_ERR(res))
+		return res;
+
+put_path:
+	path_put(&path);
+	return res;
+}
+
+
+static struct file_system_type vdso_fs_type = {
+	.name		= "vdsofs",
+	.mount		= ramfs_mount,
+	.kill_sb	= kill_litter_super,
+};
+
+static int __init init_vdso_fs(void)
+{
+	int ret;
+
+	ret = register_filesystem(&vdso_fs_type);
+	if (ret)
+		return ret;
+
+	vdso_mnt = kern_mount(&vdso_fs_type);
+	if (IS_ERR(vdso_mnt))
+		return PTR_ERR(vdso_mnt);
+	return 0;
+}
+
+/* XXX: replace BUG_ON with return to old-way vdso handling */
 static int __init init_vdso(void)
 {
 	init_vdso_image(&vdso_image_64);
@@ -344,6 +473,11 @@ static int __init init_vdso(void)
 	init_vdso_image(&vdso_image_x32);
 #endif
 
+	BUG_ON(init_vdso_fs());
+
+	vdso_file_64 = init_vdso_file(&vdso_image_64, "vdso_image_64");
+	BUG_ON(IS_ERR(vdso_file_64));
+
 	/* notifier priority > KVM */
 	return cpuhp_setup_state(CPUHP_AP_X86_VDSO_VMA_ONLINE,
 				 "AP_X86_VDSO_VMA_ONLINE", vgetcpu_online, NULL);
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe
  2016-08-25 15:21 [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) Dmitry Safonov
  2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
@ 2016-08-25 15:21 ` Dmitry Safonov
  2016-08-29 22:58   ` Steven Rostedt
  2016-08-25 15:21 ` [RFC 3/3] uprobe: add vdso support Dmitry Safonov
  2016-08-25 20:49 ` [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) H. Peter Anvin
  3 siblings, 1 reply; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-25 15:21 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, rostedt, viro, Dmitry Safonov

It's useless. Before:
  [tracing]# echo 'p:test /a:0x0' >> uprobe_events
  [tracing]# echo 'p:test a:0x0' >> uprobe_events
  -bash: echo: write error: No such file or directory
  [tracing]# echo 'p:test 1:0x0' >> uprobe_events
  -bash: echo: write error: Invalid argument

After:
  [tracing]# echo 'p:test 1:0x0' >> uprobe_events
  -bash: echo: write error: No such file or directory

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 kernel/trace/trace_uprobe.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index c53485441c88..a74f2d9ff379 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -427,10 +427,6 @@ static int create_trace_uprobe(int argc, char **argv)
 		pr_info("Probe point is not specified.\n");
 		return -EINVAL;
 	}
-	if (isdigit(argv[1][0])) {
-		pr_info("probe point must be have a filename.\n");
-		return -EINVAL;
-	}
 	arg = strchr(argv[1], ':');
 	if (!arg) {
 		ret = -EINVAL;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* [RFC 3/3] uprobe: add vdso support
  2016-08-25 15:21 [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) Dmitry Safonov
  2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
  2016-08-25 15:21 ` [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe Dmitry Safonov
@ 2016-08-25 15:21 ` Dmitry Safonov
  2016-08-25 20:49 ` [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) H. Peter Anvin
  3 siblings, 0 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-25 15:21 UTC (permalink / raw)
  To: linux-kernel, mingo
  Cc: luto, tglx, hpa, x86, 0x7f454c46, oleg, rostedt, viro, Dmitry Safonov

For RFC, this patch is full of hacks:
- non-generic vdso_mnt -- it will not built on !x86_64
- used externs in trace_uprobe.c, rather than clean header-includes

This patch converts vdsofs from lookup-less, renames vdso file name
to just "64". It also adds support to uprobe_events for setting uprobe
to vdso image file.

One can set vdso uprobe now like this:
echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events

Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
---
 arch/x86/entry/vdso/vma.c   | 28 ++++++++++++++-------------
 kernel/trace/trace_uprobe.c | 46 +++++++++++++++++++++++++++++++++++++--------
 2 files changed, 53 insertions(+), 21 deletions(-)

diff --git a/arch/x86/entry/vdso/vma.c b/arch/x86/entry/vdso/vma.c
index e83830e422ae..9e966b649028 100644
--- a/arch/x86/entry/vdso/vma.c
+++ b/arch/x86/entry/vdso/vma.c
@@ -39,7 +39,7 @@ void __init init_vdso_image(const struct vdso_image *image)
 						image->alt_len));
 }
 
-static struct vfsmount *vdso_mnt;
+struct vfsmount *vdso_mnt;
 struct file *vdso_file_64;
 
 struct linux_binprm;
@@ -401,38 +401,40 @@ static __init struct file *init_vdso_file(const struct vdso_image *vdso_image,
 					const char *name)
 {
 	struct super_block *sb;
-	struct qstr name_str;
 	struct inode *inode;
 	struct path path;
 	struct file *res;
+	int ret;
 
 	if (IS_ERR(vdso_mnt))
 		return ERR_CAST(vdso_mnt);
 	sb = vdso_mnt->mnt_sb;
 
-	name_str.hash = 0;
-	name_str.len = strlen(name);
-	name_str.name = name;
-
 	res = ERR_PTR(-ENOMEM);
 	path.mnt = mntget(vdso_mnt);
-	path.dentry = d_alloc_pseudo(sb, &name_str);
-	if (!path.dentry)
+	path.dentry = d_alloc_name(vdso_mnt->mnt_root, name);
+	if (!path.dentry) {
+		pr_err("Failed to allocate dentry for %s vdso file\n", name);
 		goto put_path;
+	}
 	d_set_d_op(path.dentry, &vdso_dops);
 
 	res = ERR_PTR(-ENOSPC);
 	inode = ramfs_get_inode(sb, NULL, S_IFREG | S_IRUGO | S_IXUGO, 0);
-	if (!inode)
+	if (!inode) {
+		pr_err("Failed to get inode for %s vdso file\n", name);
 		goto put_path;
+	}
 
 	inode->i_flags |= S_PRIVATE;
-	d_instantiate(path.dentry, inode);
+	d_add(path.dentry, inode);
 	inode->i_size = vdso_image->size;
 
-	res = ERR_PTR(add_vdso_pages_to_page_cache(vdso_image, inode));
-	if (IS_ERR(res))
+	ret = add_vdso_pages_to_page_cache(vdso_image, inode);
+	if (ret) {
+		pr_err("Failed to put %s pages in page cache: %d\n", name, ret);
 		goto put_path;
+	}
 
 	res = alloc_file(&path, FMODE_READ, &ramfs_file_operations);
 	if (!IS_ERR(res))
@@ -475,7 +477,7 @@ static int __init init_vdso(void)
 
 	BUG_ON(init_vdso_fs());
 
-	vdso_file_64 = init_vdso_file(&vdso_image_64, "vdso_image_64");
+	vdso_file_64 = init_vdso_file(&vdso_image_64, "64");
 	BUG_ON(IS_ERR(vdso_file_64));
 
 	/* notifier priority > KVM */
diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
index a74f2d9ff379..3d4dc6c6ddd9 100644
--- a/kernel/trace/trace_uprobe.c
+++ b/kernel/trace/trace_uprobe.c
@@ -23,6 +23,7 @@
 #include <linux/uprobes.h>
 #include <linux/namei.h>
 #include <linux/string.h>
+#include <linux/mount.h>
 
 #include "trace_probe.h"
 
@@ -352,6 +353,9 @@ end:
  *
  *  - Remove uprobe: -:[GRP/]EVENT
  */
+extern struct vfsmount *vdso_mnt;
+extern int vfs_path_lookup(struct dentry *, struct vfsmount *,
+			   const char *, unsigned int, struct path *);
 static int create_trace_uprobe(int argc, char **argv)
 {
 	struct trace_uprobe *tu;
@@ -427,15 +431,41 @@ static int create_trace_uprobe(int argc, char **argv)
 		pr_info("Probe point is not specified.\n");
 		return -EINVAL;
 	}
-	arg = strchr(argv[1], ':');
-	if (!arg) {
-		ret = -EINVAL;
-		goto fail_address_parse;
-	}
+	if (argv[1][0] == ':') {
+		arg = strchr(&argv[1][1], ':');
+		if (!arg) {
+			ret = -EINVAL;
+			goto fail_address_parse;
+		}
+		*arg++ = '\0';
+		if (strcmp(&argv[1][1], "vdso")) {
+			ret = -EINVAL;
+			goto fail_address_parse;
+		}
+		filename = arg;
+		arg = strchr(filename, ':');
+		if (!filename) {
+			ret = -EINVAL;
+			goto fail_address_parse;
+		}
+		*arg++ = '\0';
+		if (!vdso_mnt) {
+			ret = -ENODEV;
+			goto fail_address_parse;
+		}
+		ret = vfs_path_lookup(vdso_mnt->mnt_root, vdso_mnt,
+				filename, LOOKUP_FOLLOW, &path);
+	} else {
+		arg = strchr(argv[1], ':');
+		if (!arg) {
+			ret = -EINVAL;
+			goto fail_address_parse;
+		}
 
-	*arg++ = '\0';
-	filename = argv[1];
-	ret = kern_path(filename, LOOKUP_FOLLOW, &path);
+		*arg++ = '\0';
+		filename = argv[1];
+		ret = kern_path(filename, LOOKUP_FOLLOW, &path);
+	}
 	if (ret)
 		goto fail_address_parse;
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
@ 2016-08-25 19:49   ` Dmitry Safonov
  2016-08-25 20:05     ` Dmitry Safonov
  2016-08-28 20:14   ` Cyrill Gorcunov
                     ` (3 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-25 19:49 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Ingo Molnar, Andy Lutomirski, Thomas Gleixner,
	H. Peter Anvin, X86 ML, Oleg Nesterov, Steven Rostedt, viro

2016-08-25 18:21 GMT+03:00 Dmitry Safonov <dsafonov@virtuozzo.com>:
> +static char *vdso_vma_name(struct dentry *dentry, char *buffer, int buflen)
> +{
> +       return "[vdso]";

It should be:
+       return dynamic_dname(dentry, buffer, buflen, "[vdso]");
returned pointer should be inside buffer.

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-25 19:49   ` Dmitry Safonov
@ 2016-08-25 20:05     ` Dmitry Safonov
  0 siblings, 0 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-25 20:05 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Ingo Molnar, Andy Lutomirski, Thomas Gleixner,
	H. Peter Anvin, X86 ML, Oleg Nesterov, Steven Rostedt, viro

On 08/25/2016 10:49 PM, Dmitry Safonov wrote:
> 2016-08-25 18:21 GMT+03:00 Dmitry Safonov <dsafonov@virtuozzo.com>:
>> +static char *vdso_vma_name(struct dentry *dentry, char *buffer, int buflen)
>> +{
>> +       return "[vdso]";
>
> It should be:
> +       return dynamic_dname(dentry, buffer, buflen, "[vdso]");
> returned pointer should be inside buffer.
>

Funny thing: after this fixup, I can easily get vdso blob image with:
[root@localhost ~]# dd 
if=/proc/11486/map_files/7ffd26596000-7ffd26598000 of=./vdso
16+0 records in
16+0 records out
8192 bytes (8.2 kB) copied, 0.000101732 s, 80.5 MB/s
[root@localhost ~]# objdump -dS ./vdso | head

./vdso:     file format elf64-x86-64


Disassembly of section .text:

00000000000008d0 <__vdso_clock_gettime>:
  8d0:	55                   	push   %rbp
  8d1:	83 ff 01             	cmp    $0x1,%edi
  8d4:	48 89 e5             	mov    %rsp,%rbp


-- 
              Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-08-25 15:21 [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) Dmitry Safonov
                   ` (2 preceding siblings ...)
  2016-08-25 15:21 ` [RFC 3/3] uprobe: add vdso support Dmitry Safonov
@ 2016-08-25 20:49 ` H. Peter Anvin
  2016-08-25 22:53   ` Dmitry Safonov
  3 siblings, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2016-08-25 20:49 UTC (permalink / raw)
  To: Dmitry Safonov, linux-kernel, mingo
  Cc: luto, tglx, x86, 0x7f454c46, oleg, rostedt, viro

On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>This patches set is cleanly RFC and is not supposed to be applied.
>Also for RFC time it builds only on x86_64.
>
>So, in a mail thread Oleg told that it would be worth to introduce
>vm_file
>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>[1].
>In this patches set I introduce in-kernel filesystem for vdso files.
>After patches vDSO VMA now has inode and is just a private file
>mapping:
>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0                        
> [vvar]
>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18                       
> [vdso]
>
>Then I introduce interface in uprobe_events to insert uprobes in vdso.
>FWIW:
>  [~]# cd kernel/linux
>  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>Symbol table '.dynsym' contains 11 entries:
>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND 
>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8 
>2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12
>clock_gettime@@LINUX_2.6
>3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12
>__vdso_gettimeofday@@LINUX_2.6
>4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12
>gettimeofday@@LINUX_2.6
>5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12
>__vdso_time@@LINUX_2.6
>  6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12 time@@LINUX_2.6
>7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12
>__vdso_clock_gettime@@LINUX_2.6
>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12
>__vdso_getcpu@@LINUX_2.6
>10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12
>getcpu@@LINUX_2.6
>  [~]# cd /sys/kernel/debug/tracing/
>  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>  [tracing]# echo 1 > events/uprobes/enable
>  [tracing]# su test # it has UID=1001
>  [tracing]$ date
>  Thu Aug 25 17:19:29 MSK 2016
>  [tracing]$ exit
>  [tracing]# cat trace
>  # tracer: nop
>  #
>  # entries-in-buffer/entries-written: 175/175   #P:4
>  #
>  #                              _-----=> irqs-off
>  #                             / _----=> need-resched
>  #                            | / _---=> hardirq/softirq
>  #                            || / _--=> preempt-depth
>  #                            ||| /     delay
>  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>  #              | |       |   ||||       |         |
>             bash-11560 [001] d...   316.470236: time: (0x7ffcacebae30)
>     bash-11560 [001] d...   316.471436: gettimeofday: (0x7ffcacebac50)
>             bash-11560 [001] d...   316.477550: time: (0x7ffcacebae30)
>             bash-11560 [001] d...   316.477655: time: (0x7ffcacebae30)
>   mktemp-11568 [001] d...   316.479589: gettimeofday: (0x7ffc603f0c50)
>    date-11571 [001] d...   316.481890: clock_gettime: (0x7ffec9db58d0)
>[...]  
>
>If this approach will be decided as fine, I will prepare a better
>version,
>fixing the following things:
>o put vdsofs in generic fs/* dir
>o support other archs and vdso blobs
>o remove BUG_ON()'s and UID==1001 check
>o remove extern's and use headers only
>o refactor code in create_trace_uprobe()
>o add some state to (struct trace_uprobe), so i.e., `cat uprobe_events`
>will
>  print those uprobes as vdso-based
>o document this interface in Documentation/trace/uprobetracer.txt
>o prepare nice patches set?
>
>So, opinions? Is it worth to add something like this?
>
>[1]: https://lkml.org/lkml/2016/7/12/346
>
>Dmitry Safonov (3):
>  x86/vdso: create vdso file, use it for mapping
>  uprobe: drop isdigit() check in create_trace_uprobe
>  uprobe: add vdso support
>
>Cc: Oleg Nesterov <oleg@redhat.com>
>Cc: Al Viro <viro@zeniv.linux.org.uk>
>Cc: Steven Rostedt <rostedt@goodmis.org>
>Cc: Andy Lutomirski <luto@amacapital.net>
>Cc: Thomas Gleixner <tglx@linutronix.de>
>Cc: Ingo Molnar <mingo@redhat.com>
>Cc: "H. Peter Anvin" <hpa@zytor.com>
>Cc: x86@kernel.org
>Cc: Dmitry Safonov <0x7f454c46@gmail.com>
>
>arch/x86/entry/vdso/vma.c   | 148
>++++++++++++++++++++++++++++++++++++++++++--
> kernel/trace/trace_uprobe.c |  50 +++++++++++----
> 2 files changed, 180 insertions(+), 18 deletions(-)

I think there is a lot to be said for this idea.  However, a private mapping is definitely wrong for the vvar data; for the vdso code it could be considered either way I suppose.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-08-25 20:49 ` [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) H. Peter Anvin
@ 2016-08-25 22:53   ` Dmitry Safonov
  2016-08-25 23:00     ` H. Peter Anvin
  2016-09-21  0:22     ` H. Peter Anvin
  0 siblings, 2 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-25 22:53 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, viro

2016-08-25 23:49 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>>This patches set is cleanly RFC and is not supposed to be applied.
>>Also for RFC time it builds only on x86_64.
>>
>>So, in a mail thread Oleg told that it would be worth to introduce
>>vm_file
>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>[1].
>>In this patches set I introduce in-kernel filesystem for vdso files.
>>After patches vDSO VMA now has inode and is just a private file
>>mapping:
>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>> [vvar]
>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>> [vdso]
>>
>>Then I introduce interface in uprobe_events to insert uprobes in vdso.
>>FWIW:
>>  [~]# cd kernel/linux
>>  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>Symbol table '.dynsym' contains 11 entries:
>>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
>>2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12
>>clock_gettime@@LINUX_2.6
>>3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12
>>__vdso_gettimeofday@@LINUX_2.6
>>4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12
>>gettimeofday@@LINUX_2.6
>>5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12
>>__vdso_time@@LINUX_2.6
>>  6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12 time@@LINUX_2.6
>>7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12
>>__vdso_clock_gettime@@LINUX_2.6
>>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>>9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12
>>__vdso_getcpu@@LINUX_2.6
>>10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12
>>getcpu@@LINUX_2.6
>>  [~]# cd /sys/kernel/debug/tracing/
>>  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>>  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>>  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>>  [tracing]# echo 1 > events/uprobes/enable
>>  [tracing]# su test # it has UID=1001
>>  [tracing]$ date
>>  Thu Aug 25 17:19:29 MSK 2016
>>  [tracing]$ exit
>>  [tracing]# cat trace
>>  # tracer: nop
>>  #
>>  # entries-in-buffer/entries-written: 175/175   #P:4
>>  #
>>  #                              _-----=> irqs-off
>>  #                             / _----=> need-resched
>>  #                            | / _---=> hardirq/softirq
>>  #                            || / _--=> preempt-depth
>>  #                            ||| /     delay
>>  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>  #              | |       |   ||||       |         |
>>             bash-11560 [001] d...   316.470236: time: (0x7ffcacebae30)
>>     bash-11560 [001] d...   316.471436: gettimeofday: (0x7ffcacebac50)
>>             bash-11560 [001] d...   316.477550: time: (0x7ffcacebae30)
>>             bash-11560 [001] d...   316.477655: time: (0x7ffcacebae30)
>>   mktemp-11568 [001] d...   316.479589: gettimeofday: (0x7ffc603f0c50)
>>    date-11571 [001] d...   316.481890: clock_gettime: (0x7ffec9db58d0)
>>[...]
>>
>>If this approach will be decided as fine, I will prepare a better
>>version,
>>fixing the following things:
>>o put vdsofs in generic fs/* dir
>>o support other archs and vdso blobs
>>o remove BUG_ON()'s and UID==1001 check
>>o remove extern's and use headers only
>>o refactor code in create_trace_uprobe()
>>o add some state to (struct trace_uprobe), so i.e., `cat uprobe_events`
>>will
>>  print those uprobes as vdso-based
>>o document this interface in Documentation/trace/uprobetracer.txt
>>o prepare nice patches set?
>>
>>So, opinions? Is it worth to add something like this?
>>
>>[1]: https://lkml.org/lkml/2016/7/12/346
>>
>>Dmitry Safonov (3):
>>  x86/vdso: create vdso file, use it for mapping
>>  uprobe: drop isdigit() check in create_trace_uprobe
>>  uprobe: add vdso support
>>
>>Cc: Oleg Nesterov <oleg@redhat.com>
>>Cc: Al Viro <viro@zeniv.linux.org.uk>
>>Cc: Steven Rostedt <rostedt@goodmis.org>
>>Cc: Andy Lutomirski <luto@amacapital.net>
>>Cc: Thomas Gleixner <tglx@linutronix.de>
>>Cc: Ingo Molnar <mingo@redhat.com>
>>Cc: "H. Peter Anvin" <hpa@zytor.com>
>>Cc: x86@kernel.org
>>Cc: Dmitry Safonov <0x7f454c46@gmail.com>
>>
>>arch/x86/entry/vdso/vma.c   | 148
>>++++++++++++++++++++++++++++++++++++++++++--
>> kernel/trace/trace_uprobe.c |  50 +++++++++++----
>> 2 files changed, 180 insertions(+), 18 deletions(-)
>
> I think there is a lot to be said for this idea.  However, a private mapping is definitely wrong for the vvar data; for the vdso code it could be considered either way I suppose.

Thanks on your reply.
As you could see, I preserved pure mapping of pfn for vvar:
7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0                          [vvar]
7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18                         [vdso]
(no inode number).
I also think it would be useless to do the same to vvar as it
has just data and there is no point in probing it.

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-08-25 22:53   ` Dmitry Safonov
@ 2016-08-25 23:00     ` H. Peter Anvin
  2016-08-26 11:16       ` Dmitry Safonov
  2016-09-21  0:22     ` H. Peter Anvin
  1 sibling, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2016-08-25 23:00 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, viro

On August 25, 2016 3:53:43 PM PDT, Dmitry Safonov <0x7f454c46@gmail.com> wrote:
>2016-08-25 23:49 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
>> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov
><dsafonov@virtuozzo.com> wrote:
>>>This patches set is cleanly RFC and is not supposed to be applied.
>>>Also for RFC time it builds only on x86_64.
>>>
>>>So, in a mail thread Oleg told that it would be worth to introduce
>>>vm_file
>>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>>[1].
>>>In this patches set I introduce in-kernel filesystem for vdso files.
>>>After patches vDSO VMA now has inode and is just a private file
>>>mapping:
>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>> [vvar]
>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>> [vdso]
>>>
>>>Then I introduce interface in uprobe_events to insert uprobes in
>vdso.
>>>FWIW:
>>>  [~]# cd kernel/linux
>>>  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>>Symbol table '.dynsym' contains 11 entries:
>>>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>>>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
>>>2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12
>>>clock_gettime@@LINUX_2.6
>>>3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12
>>>__vdso_gettimeofday@@LINUX_2.6
>>>4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12
>>>gettimeofday@@LINUX_2.6
>>>5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12
>>>__vdso_time@@LINUX_2.6
>>>  6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12
>time@@LINUX_2.6
>>>7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12
>>>__vdso_clock_gettime@@LINUX_2.6
>>>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>>>9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12
>>>__vdso_getcpu@@LINUX_2.6
>>>10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12
>>>getcpu@@LINUX_2.6
>>>  [~]# cd /sys/kernel/debug/tracing/
>>>  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>>>  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>>>  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>>>  [tracing]# echo 1 > events/uprobes/enable
>>>  [tracing]# su test # it has UID=1001
>>>  [tracing]$ date
>>>  Thu Aug 25 17:19:29 MSK 2016
>>>  [tracing]$ exit
>>>  [tracing]# cat trace
>>>  # tracer: nop
>>>  #
>>>  # entries-in-buffer/entries-written: 175/175   #P:4
>>>  #
>>>  #                              _-----=> irqs-off
>>>  #                             / _----=> need-resched
>>>  #                            | / _---=> hardirq/softirq
>>>  #                            || / _--=> preempt-depth
>>>  #                            ||| /     delay
>>>  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>>  #              | |       |   ||||       |         |
>>>             bash-11560 [001] d...   316.470236: time:
>(0x7ffcacebae30)
>>>     bash-11560 [001] d...   316.471436: gettimeofday:
>(0x7ffcacebac50)
>>>             bash-11560 [001] d...   316.477550: time:
>(0x7ffcacebae30)
>>>             bash-11560 [001] d...   316.477655: time:
>(0x7ffcacebae30)
>>>   mktemp-11568 [001] d...   316.479589: gettimeofday:
>(0x7ffc603f0c50)
>>>    date-11571 [001] d...   316.481890: clock_gettime:
>(0x7ffec9db58d0)
>>>[...]
>>>
>>>If this approach will be decided as fine, I will prepare a better
>>>version,
>>>fixing the following things:
>>>o put vdsofs in generic fs/* dir
>>>o support other archs and vdso blobs
>>>o remove BUG_ON()'s and UID==1001 check
>>>o remove extern's and use headers only
>>>o refactor code in create_trace_uprobe()
>>>o add some state to (struct trace_uprobe), so i.e., `cat
>uprobe_events`
>>>will
>>>  print those uprobes as vdso-based
>>>o document this interface in Documentation/trace/uprobetracer.txt
>>>o prepare nice patches set?
>>>
>>>So, opinions? Is it worth to add something like this?
>>>
>>>[1]: https://lkml.org/lkml/2016/7/12/346
>>>
>>>Dmitry Safonov (3):
>>>  x86/vdso: create vdso file, use it for mapping
>>>  uprobe: drop isdigit() check in create_trace_uprobe
>>>  uprobe: add vdso support
>>>
>>>Cc: Oleg Nesterov <oleg@redhat.com>
>>>Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>Cc: Steven Rostedt <rostedt@goodmis.org>
>>>Cc: Andy Lutomirski <luto@amacapital.net>
>>>Cc: Thomas Gleixner <tglx@linutronix.de>
>>>Cc: Ingo Molnar <mingo@redhat.com>
>>>Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>Cc: x86@kernel.org
>>>Cc: Dmitry Safonov <0x7f454c46@gmail.com>
>>>
>>>arch/x86/entry/vdso/vma.c   | 148
>>>++++++++++++++++++++++++++++++++++++++++++--
>>> kernel/trace/trace_uprobe.c |  50 +++++++++++----
>>> 2 files changed, 180 insertions(+), 18 deletions(-)
>>
>> I think there is a lot to be said for this idea.  However, a private
>mapping is definitely wrong for the vvar data; for the vdso code it
>could be considered either way I suppose.
>
>Thanks on your reply.
>As you could see, I preserved pure mapping of pfn for vvar:
>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0                        
> [vvar]
>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18                       
> [vdso]
>(no inode number).
>I also think it would be useless to do the same to vvar as it
>has just data and there is no point in probing it.

Well, it would things like mremap() just work and so on.  Let's get rid of special cases if we are.
-- 
Sent from my Android device with K-9 Mail. Please excuse brevity and formatting.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-08-25 23:00     ` H. Peter Anvin
@ 2016-08-26 11:16       ` Dmitry Safonov
  2016-08-26 14:32         ` Andy Lutomirski
  0 siblings, 1 reply; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-26 11:16 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, viro

2016-08-26 2:00 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
> On August 25, 2016 3:53:43 PM PDT, Dmitry Safonov <0x7f454c46@gmail.com> wrote:
>>2016-08-25 23:49 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
>>> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov
>><dsafonov@virtuozzo.com> wrote:
>>>>This patches set is cleanly RFC and is not supposed to be applied.
>>>>Also for RFC time it builds only on x86_64.
>>>>
>>>>So, in a mail thread Oleg told that it would be worth to introduce
>>>>vm_file
>>>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>>>[1].
>>>>In this patches set I introduce in-kernel filesystem for vdso files.
>>>>After patches vDSO VMA now has inode and is just a private file
>>>>mapping:
>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>>> [vvar]
>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>>> [vdso]
>>>>
>>>>Then I introduce interface in uprobe_events to insert uprobes in
>>vdso.
>>>>FWIW:
>>>>  [~]# cd kernel/linux
>>>>  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>>>Symbol table '.dynsym' contains 11 entries:
>>>>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>>>>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>>>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
>>>>2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12
>>>>clock_gettime@@LINUX_2.6
>>>>3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12
>>>>__vdso_gettimeofday@@LINUX_2.6
>>>>4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12
>>>>gettimeofday@@LINUX_2.6
>>>>5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12
>>>>__vdso_time@@LINUX_2.6
>>>>  6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12
>>time@@LINUX_2.6
>>>>7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12
>>>>__vdso_clock_gettime@@LINUX_2.6
>>>>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>>>>9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12
>>>>__vdso_getcpu@@LINUX_2.6
>>>>10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12
>>>>getcpu@@LINUX_2.6
>>>>  [~]# cd /sys/kernel/debug/tracing/
>>>>  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>>>>  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>>>>  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>>>>  [tracing]# echo 1 > events/uprobes/enable
>>>>  [tracing]# su test # it has UID=1001
>>>>  [tracing]$ date
>>>>  Thu Aug 25 17:19:29 MSK 2016
>>>>  [tracing]$ exit
>>>>  [tracing]# cat trace
>>>>  # tracer: nop
>>>>  #
>>>>  # entries-in-buffer/entries-written: 175/175   #P:4
>>>>  #
>>>>  #                              _-----=> irqs-off
>>>>  #                             / _----=> need-resched
>>>>  #                            | / _---=> hardirq/softirq
>>>>  #                            || / _--=> preempt-depth
>>>>  #                            ||| /     delay
>>>>  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>>>  #              | |       |   ||||       |         |
>>>>             bash-11560 [001] d...   316.470236: time:
>>(0x7ffcacebae30)
>>>>     bash-11560 [001] d...   316.471436: gettimeofday:
>>(0x7ffcacebac50)
>>>>             bash-11560 [001] d...   316.477550: time:
>>(0x7ffcacebae30)
>>>>             bash-11560 [001] d...   316.477655: time:
>>(0x7ffcacebae30)
>>>>   mktemp-11568 [001] d...   316.479589: gettimeofday:
>>(0x7ffc603f0c50)
>>>>    date-11571 [001] d...   316.481890: clock_gettime:
>>(0x7ffec9db58d0)
>>>>[...]
>>>>
>>>>If this approach will be decided as fine, I will prepare a better
>>>>version,
>>>>fixing the following things:
>>>>o put vdsofs in generic fs/* dir
>>>>o support other archs and vdso blobs
>>>>o remove BUG_ON()'s and UID==1001 check
>>>>o remove extern's and use headers only
>>>>o refactor code in create_trace_uprobe()
>>>>o add some state to (struct trace_uprobe), so i.e., `cat
>>uprobe_events`
>>>>will
>>>>  print those uprobes as vdso-based
>>>>o document this interface in Documentation/trace/uprobetracer.txt
>>>>o prepare nice patches set?
>>>>
>>>>So, opinions? Is it worth to add something like this?
>>>>
>>>>[1]: https://lkml.org/lkml/2016/7/12/346
>>>>
>>>>Dmitry Safonov (3):
>>>>  x86/vdso: create vdso file, use it for mapping
>>>>  uprobe: drop isdigit() check in create_trace_uprobe
>>>>  uprobe: add vdso support
>>>>
>>>>Cc: Oleg Nesterov <oleg@redhat.com>
>>>>Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>Cc: Andy Lutomirski <luto@amacapital.net>
>>>>Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>Cc: Ingo Molnar <mingo@redhat.com>
>>>>Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>Cc: x86@kernel.org
>>>>Cc: Dmitry Safonov <0x7f454c46@gmail.com>
>>>>
>>>>arch/x86/entry/vdso/vma.c   | 148
>>>>++++++++++++++++++++++++++++++++++++++++++--
>>>> kernel/trace/trace_uprobe.c |  50 +++++++++++----
>>>> 2 files changed, 180 insertions(+), 18 deletions(-)
>>>
>>> I think there is a lot to be said for this idea.  However, a private
>>mapping is definitely wrong for the vvar data; for the vdso code it
>>could be considered either way I suppose.
>>
>>Thanks on your reply.
>>As you could see, I preserved pure mapping of pfn for vvar:
>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>> [vvar]
>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>> [vdso]
>>(no inode number).
>>I also think it would be useless to do the same to vvar as it
>>has just data and there is no point in probing it.
>
> Well, it would things like mremap() just work and so on.  Let's get rid of special cases if we are.

Well, for RFC it wouldn't move context.vdso pointer on mremap(),
but as RFC is for x86_64 only, it will work on it.
Anyway, I don't think it would be hard to fix and make mremap() work on
other archs on post-RFC.

The only corner-case I see for now is that /proc/self/map_files/<vdso_range>
will point to [vdso] which is broken link. But one could read this file
and dump/read vdso blob.
So, in the other words: if some program assumes that /proc/self/map_files/*
should always point to correct file, it may be confused. Not sure, maybe
it would be confused by orphane-file mappings, so having dangling link
there is just fine.

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-08-26 11:16       ` Dmitry Safonov
@ 2016-08-26 14:32         ` Andy Lutomirski
  2016-08-26 14:42           ` Dmitry Safonov
  0 siblings, 1 reply; 37+ messages in thread
From: Andy Lutomirski @ 2016-08-26 14:32 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: H. Peter Anvin, Dmitry Safonov, linux-kernel, Ingo Molnar,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, Al Viro

On Fri, Aug 26, 2016 at 4:16 AM, Dmitry Safonov <0x7f454c46@gmail.com> wrote:
> 2016-08-26 2:00 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
>> On August 25, 2016 3:53:43 PM PDT, Dmitry Safonov <0x7f454c46@gmail.com> wrote:
>>>2016-08-25 23:49 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
>>>> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov
>>><dsafonov@virtuozzo.com> wrote:
>>>>>This patches set is cleanly RFC and is not supposed to be applied.
>>>>>Also for RFC time it builds only on x86_64.
>>>>>
>>>>>So, in a mail thread Oleg told that it would be worth to introduce
>>>>>vm_file
>>>>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>>>>[1].
>>>>>In this patches set I introduce in-kernel filesystem for vdso files.
>>>>>After patches vDSO VMA now has inode and is just a private file
>>>>>mapping:
>>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>>>> [vvar]
>>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>>>> [vdso]
>>>>>
>>>>>Then I introduce interface in uprobe_events to insert uprobes in
>>>vdso.
>>>>>FWIW:
>>>>>  [~]# cd kernel/linux
>>>>>  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>>>>Symbol table '.dynsym' contains 11 entries:
>>>>>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>>>>>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>>>>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
>>>>>2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12
>>>>>clock_gettime@@LINUX_2.6
>>>>>3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12
>>>>>__vdso_gettimeofday@@LINUX_2.6
>>>>>4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12
>>>>>gettimeofday@@LINUX_2.6
>>>>>5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12
>>>>>__vdso_time@@LINUX_2.6
>>>>>  6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12
>>>time@@LINUX_2.6
>>>>>7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12
>>>>>__vdso_clock_gettime@@LINUX_2.6
>>>>>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>>>>>9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12
>>>>>__vdso_getcpu@@LINUX_2.6
>>>>>10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12
>>>>>getcpu@@LINUX_2.6
>>>>>  [~]# cd /sys/kernel/debug/tracing/
>>>>>  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>>>>>  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>>>>>  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>>>>>  [tracing]# echo 1 > events/uprobes/enable
>>>>>  [tracing]# su test # it has UID=1001
>>>>>  [tracing]$ date
>>>>>  Thu Aug 25 17:19:29 MSK 2016
>>>>>  [tracing]$ exit
>>>>>  [tracing]# cat trace
>>>>>  # tracer: nop
>>>>>  #
>>>>>  # entries-in-buffer/entries-written: 175/175   #P:4
>>>>>  #
>>>>>  #                              _-----=> irqs-off
>>>>>  #                             / _----=> need-resched
>>>>>  #                            | / _---=> hardirq/softirq
>>>>>  #                            || / _--=> preempt-depth
>>>>>  #                            ||| /     delay
>>>>>  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>>>>  #              | |       |   ||||       |         |
>>>>>             bash-11560 [001] d...   316.470236: time:
>>>(0x7ffcacebae30)
>>>>>     bash-11560 [001] d...   316.471436: gettimeofday:
>>>(0x7ffcacebac50)
>>>>>             bash-11560 [001] d...   316.477550: time:
>>>(0x7ffcacebae30)
>>>>>             bash-11560 [001] d...   316.477655: time:
>>>(0x7ffcacebae30)
>>>>>   mktemp-11568 [001] d...   316.479589: gettimeofday:
>>>(0x7ffc603f0c50)
>>>>>    date-11571 [001] d...   316.481890: clock_gettime:
>>>(0x7ffec9db58d0)
>>>>>[...]
>>>>>
>>>>>If this approach will be decided as fine, I will prepare a better
>>>>>version,
>>>>>fixing the following things:
>>>>>o put vdsofs in generic fs/* dir
>>>>>o support other archs and vdso blobs
>>>>>o remove BUG_ON()'s and UID==1001 check
>>>>>o remove extern's and use headers only
>>>>>o refactor code in create_trace_uprobe()
>>>>>o add some state to (struct trace_uprobe), so i.e., `cat
>>>uprobe_events`
>>>>>will
>>>>>  print those uprobes as vdso-based
>>>>>o document this interface in Documentation/trace/uprobetracer.txt
>>>>>o prepare nice patches set?
>>>>>
>>>>>So, opinions? Is it worth to add something like this?
>>>>>
>>>>>[1]: https://lkml.org/lkml/2016/7/12/346
>>>>>
>>>>>Dmitry Safonov (3):
>>>>>  x86/vdso: create vdso file, use it for mapping
>>>>>  uprobe: drop isdigit() check in create_trace_uprobe
>>>>>  uprobe: add vdso support
>>>>>
>>>>>Cc: Oleg Nesterov <oleg@redhat.com>
>>>>>Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>>Cc: Andy Lutomirski <luto@amacapital.net>
>>>>>Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>Cc: Ingo Molnar <mingo@redhat.com>
>>>>>Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>Cc: x86@kernel.org
>>>>>Cc: Dmitry Safonov <0x7f454c46@gmail.com>
>>>>>
>>>>>arch/x86/entry/vdso/vma.c   | 148
>>>>>++++++++++++++++++++++++++++++++++++++++++--
>>>>> kernel/trace/trace_uprobe.c |  50 +++++++++++----
>>>>> 2 files changed, 180 insertions(+), 18 deletions(-)
>>>>
>>>> I think there is a lot to be said for this idea.  However, a private
>>>mapping is definitely wrong for the vvar data; for the vdso code it
>>>could be considered either way I suppose.
>>>
>>>Thanks on your reply.
>>>As you could see, I preserved pure mapping of pfn for vvar:
>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>> [vvar]
>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>> [vdso]
>>>(no inode number).
>>>I also think it would be useless to do the same to vvar as it
>>>has just data and there is no point in probing it.
>>
>> Well, it would things like mremap() just work and so on.  Let's get rid of special cases if we are.
>
> Well, for RFC it wouldn't move context.vdso pointer on mremap(),
> but as RFC is for x86_64 only, it will work on it.
> Anyway, I don't think it would be hard to fix and make mremap() work on
> other archs on post-RFC.
>
> The only corner-case I see for now is that /proc/self/map_files/<vdso_range>
> will point to [vdso] which is broken link. But one could read this file
> and dump/read vdso blob.
> So, in the other words: if some program assumes that /proc/self/map_files/*
> should always point to correct file, it may be confused. Not sure, maybe
> it would be confused by orphane-file mappings, so having dangling link
> there is just fine.

I don't see anything a priori wrong with having map_files point
somewhere, but it could be worth special casing it for special
mappings to preserve existing behavior (no file at all).

--Andy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-08-26 14:32         ` Andy Lutomirski
@ 2016-08-26 14:42           ` Dmitry Safonov
  2016-08-26 14:44             ` Dmitry Safonov
  0 siblings, 1 reply; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-26 14:42 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Dmitry Safonov, linux-kernel, Ingo Molnar,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, Al Viro

2016-08-26 17:32 GMT+03:00 Andy Lutomirski <luto@amacapital.net>:
> On Fri, Aug 26, 2016 at 4:16 AM, Dmitry Safonov <0x7f454c46@gmail.com> wrote:
>> 2016-08-26 2:00 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
>>> On August 25, 2016 3:53:43 PM PDT, Dmitry Safonov <0x7f454c46@gmail.com> wrote:
>>>>2016-08-25 23:49 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
>>>>> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov
>>>><dsafonov@virtuozzo.com> wrote:
>>>>>>This patches set is cleanly RFC and is not supposed to be applied.
>>>>>>Also for RFC time it builds only on x86_64.
>>>>>>
>>>>>>So, in a mail thread Oleg told that it would be worth to introduce
>>>>>>vm_file
>>>>>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>>>>>[1].
>>>>>>In this patches set I introduce in-kernel filesystem for vdso files.
>>>>>>After patches vDSO VMA now has inode and is just a private file
>>>>>>mapping:
>>>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>>>>> [vvar]
>>>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>>>>> [vdso]
>>>>>>
>>>>>>Then I introduce interface in uprobe_events to insert uprobes in
>>>>vdso.
>>>>>>FWIW:
>>>>>>  [~]# cd kernel/linux
>>>>>>  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>>>>>Symbol table '.dynsym' contains 11 entries:
>>>>>>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>>>>>>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>>>>>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
>>>>>>2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12
>>>>>>clock_gettime@@LINUX_2.6
>>>>>>3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12
>>>>>>__vdso_gettimeofday@@LINUX_2.6
>>>>>>4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12
>>>>>>gettimeofday@@LINUX_2.6
>>>>>>5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12
>>>>>>__vdso_time@@LINUX_2.6
>>>>>>  6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12
>>>>time@@LINUX_2.6
>>>>>>7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12
>>>>>>__vdso_clock_gettime@@LINUX_2.6
>>>>>>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>>>>>>9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12
>>>>>>__vdso_getcpu@@LINUX_2.6
>>>>>>10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12
>>>>>>getcpu@@LINUX_2.6
>>>>>>  [~]# cd /sys/kernel/debug/tracing/
>>>>>>  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>>>>>>  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>>>>>>  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>>>>>>  [tracing]# echo 1 > events/uprobes/enable
>>>>>>  [tracing]# su test # it has UID=1001
>>>>>>  [tracing]$ date
>>>>>>  Thu Aug 25 17:19:29 MSK 2016
>>>>>>  [tracing]$ exit
>>>>>>  [tracing]# cat trace
>>>>>>  # tracer: nop
>>>>>>  #
>>>>>>  # entries-in-buffer/entries-written: 175/175   #P:4
>>>>>>  #
>>>>>>  #                              _-----=> irqs-off
>>>>>>  #                             / _----=> need-resched
>>>>>>  #                            | / _---=> hardirq/softirq
>>>>>>  #                            || / _--=> preempt-depth
>>>>>>  #                            ||| /     delay
>>>>>>  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>>>>>  #              | |       |   ||||       |         |
>>>>>>             bash-11560 [001] d...   316.470236: time:
>>>>(0x7ffcacebae30)
>>>>>>     bash-11560 [001] d...   316.471436: gettimeofday:
>>>>(0x7ffcacebac50)
>>>>>>             bash-11560 [001] d...   316.477550: time:
>>>>(0x7ffcacebae30)
>>>>>>             bash-11560 [001] d...   316.477655: time:
>>>>(0x7ffcacebae30)
>>>>>>   mktemp-11568 [001] d...   316.479589: gettimeofday:
>>>>(0x7ffc603f0c50)
>>>>>>    date-11571 [001] d...   316.481890: clock_gettime:
>>>>(0x7ffec9db58d0)
>>>>>>[...]
>>>>>>
>>>>>>If this approach will be decided as fine, I will prepare a better
>>>>>>version,
>>>>>>fixing the following things:
>>>>>>o put vdsofs in generic fs/* dir
>>>>>>o support other archs and vdso blobs
>>>>>>o remove BUG_ON()'s and UID==1001 check
>>>>>>o remove extern's and use headers only
>>>>>>o refactor code in create_trace_uprobe()
>>>>>>o add some state to (struct trace_uprobe), so i.e., `cat
>>>>uprobe_events`
>>>>>>will
>>>>>>  print those uprobes as vdso-based
>>>>>>o document this interface in Documentation/trace/uprobetracer.txt
>>>>>>o prepare nice patches set?
>>>>>>
>>>>>>So, opinions? Is it worth to add something like this?
>>>>>>
>>>>>>[1]: https://lkml.org/lkml/2016/7/12/346
>>>>>>
>>>>>>Dmitry Safonov (3):
>>>>>>  x86/vdso: create vdso file, use it for mapping
>>>>>>  uprobe: drop isdigit() check in create_trace_uprobe
>>>>>>  uprobe: add vdso support
>>>>>>
>>>>>>Cc: Oleg Nesterov <oleg@redhat.com>
>>>>>>Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>>Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>>>Cc: Andy Lutomirski <luto@amacapital.net>
>>>>>>Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>>Cc: Ingo Molnar <mingo@redhat.com>
>>>>>>Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>>Cc: x86@kernel.org
>>>>>>Cc: Dmitry Safonov <0x7f454c46@gmail.com>
>>>>>>
>>>>>>arch/x86/entry/vdso/vma.c   | 148
>>>>>>++++++++++++++++++++++++++++++++++++++++++--
>>>>>> kernel/trace/trace_uprobe.c |  50 +++++++++++----
>>>>>> 2 files changed, 180 insertions(+), 18 deletions(-)
>>>>>
>>>>> I think there is a lot to be said for this idea.  However, a private
>>>>mapping is definitely wrong for the vvar data; for the vdso code it
>>>>could be considered either way I suppose.
>>>>
>>>>Thanks on your reply.
>>>>As you could see, I preserved pure mapping of pfn for vvar:
>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>>> [vvar]
>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>>> [vdso]
>>>>(no inode number).
>>>>I also think it would be useless to do the same to vvar as it
>>>>has just data and there is no point in probing it.
>>>
>>> Well, it would things like mremap() just work and so on.  Let's get rid of special cases if we are.
>>
>> Well, for RFC it wouldn't move context.vdso pointer on mremap(),
>> but as RFC is for x86_64 only, it will work on it.
>> Anyway, I don't think it would be hard to fix and make mremap() work on
>> other archs on post-RFC.
>>
>> The only corner-case I see for now is that /proc/self/map_files/<vdso_range>
>> will point to [vdso] which is broken link. But one could read this file
>> and dump/read vdso blob.
>> So, in the other words: if some program assumes that /proc/self/map_files/*
>> should always point to correct file, it may be confused. Not sure, maybe
>> it would be confused by orphane-file mappings, so having dangling link
>> there is just fine.
>
> I don't see anything a priori wrong with having map_files point
> somewhere, but it could be worth special casing it for special
> mappings to preserve existing behavior (no file at all).

Yep, that could be easily done, will do.
Anyway, just curious - what may it break?

Thanks on the reply, Andy. Does the patches set look sane for you?

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-08-26 14:42           ` Dmitry Safonov
@ 2016-08-26 14:44             ` Dmitry Safonov
  0 siblings, 0 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-26 14:44 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: H. Peter Anvin, Dmitry Safonov, linux-kernel, Ingo Molnar,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, Al Viro

2016-08-26 17:42 GMT+03:00 Dmitry Safonov <0x7f454c46@gmail.com>:
> 2016-08-26 17:32 GMT+03:00 Andy Lutomirski <luto@amacapital.net>:
>> On Fri, Aug 26, 2016 at 4:16 AM, Dmitry Safonov <0x7f454c46@gmail.com> wrote:
>>> 2016-08-26 2:00 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
>>>> On August 25, 2016 3:53:43 PM PDT, Dmitry Safonov <0x7f454c46@gmail.com> wrote:
>>>>>2016-08-25 23:49 GMT+03:00 H. Peter Anvin <hpa@zytor.com>:
>>>>>> On August 25, 2016 8:21:07 AM PDT, Dmitry Safonov
>>>>><dsafonov@virtuozzo.com> wrote:
>>>>>>>This patches set is cleanly RFC and is not supposed to be applied.
>>>>>>>Also for RFC time it builds only on x86_64.
>>>>>>>
>>>>>>>So, in a mail thread Oleg told that it would be worth to introduce
>>>>>>>vm_file
>>>>>>>for vdso mappings as currently uprobes can not be placed on vDSO VMAs
>>>>>>>[1].
>>>>>>>In this patches set I introduce in-kernel filesystem for vdso files.
>>>>>>>After patches vDSO VMA now has inode and is just a private file
>>>>>>>mapping:
>>>>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>>>>>> [vvar]
>>>>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>>>>>> [vdso]
>>>>>>>
>>>>>>>Then I introduce interface in uprobe_events to insert uprobes in
>>>>>vdso.
>>>>>>>FWIW:
>>>>>>>  [~]# cd kernel/linux
>>>>>>>  [linux]# readelf --syms arch/x86/entry/vdso/vdso64.so
>>>>>>>Symbol table '.dynsym' contains 11 entries:
>>>>>>>   Num:    Value          Size Type    Bind   Vis      Ndx Name
>>>>>>>     0: 0000000000000000     0 NOTYPE  LOCAL  DEFAULT  UND
>>>>>>>     1: 0000000000000470     0 SECTION LOCAL  DEFAULT    8
>>>>>>>2: 00000000000008d0   885 FUNC    WEAK   DEFAULT   12
>>>>>>>clock_gettime@@LINUX_2.6
>>>>>>>3: 0000000000000c50   472 FUNC    GLOBAL DEFAULT   12
>>>>>>>__vdso_gettimeofday@@LINUX_2.6
>>>>>>>4: 0000000000000c50   472 FUNC    WEAK   DEFAULT   12
>>>>>>>gettimeofday@@LINUX_2.6
>>>>>>>5: 0000000000000e30    21 FUNC    GLOBAL DEFAULT   12
>>>>>>>__vdso_time@@LINUX_2.6
>>>>>>>  6: 0000000000000e30    21 FUNC    WEAK   DEFAULT   12
>>>>>time@@LINUX_2.6
>>>>>>>7: 00000000000008d0   885 FUNC    GLOBAL DEFAULT   12
>>>>>>>__vdso_clock_gettime@@LINUX_2.6
>>>>>>>     8: 0000000000000000     0 OBJECT  GLOBAL DEFAULT  ABS LINUX_2.6
>>>>>>>9: 0000000000000e50    41 FUNC    GLOBAL DEFAULT   12
>>>>>>>__vdso_getcpu@@LINUX_2.6
>>>>>>>10: 0000000000000e50    41 FUNC    WEAK   DEFAULT   12
>>>>>>>getcpu@@LINUX_2.6
>>>>>>>  [~]# cd /sys/kernel/debug/tracing/
>>>>>>>  [tracing]# echo 'p:clock_gettime :vdso:/64:0x8d0' > uprobe_events
>>>>>>>  [tracing]# echo 'p:gettimeofday :vdso:/64:0xc50' >> uprobe_events
>>>>>>>  [tracing]# echo 'p:time :vdso:/64:0xe30' >> uprobe_events
>>>>>>>  [tracing]# echo 1 > events/uprobes/enable
>>>>>>>  [tracing]# su test # it has UID=1001
>>>>>>>  [tracing]$ date
>>>>>>>  Thu Aug 25 17:19:29 MSK 2016
>>>>>>>  [tracing]$ exit
>>>>>>>  [tracing]# cat trace
>>>>>>>  # tracer: nop
>>>>>>>  #
>>>>>>>  # entries-in-buffer/entries-written: 175/175   #P:4
>>>>>>>  #
>>>>>>>  #                              _-----=> irqs-off
>>>>>>>  #                             / _----=> need-resched
>>>>>>>  #                            | / _---=> hardirq/softirq
>>>>>>>  #                            || / _--=> preempt-depth
>>>>>>>  #                            ||| /     delay
>>>>>>>  #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
>>>>>>>  #              | |       |   ||||       |         |
>>>>>>>             bash-11560 [001] d...   316.470236: time:
>>>>>(0x7ffcacebae30)
>>>>>>>     bash-11560 [001] d...   316.471436: gettimeofday:
>>>>>(0x7ffcacebac50)
>>>>>>>             bash-11560 [001] d...   316.477550: time:
>>>>>(0x7ffcacebae30)
>>>>>>>             bash-11560 [001] d...   316.477655: time:
>>>>>(0x7ffcacebae30)
>>>>>>>   mktemp-11568 [001] d...   316.479589: gettimeofday:
>>>>>(0x7ffc603f0c50)
>>>>>>>    date-11571 [001] d...   316.481890: clock_gettime:
>>>>>(0x7ffec9db58d0)
>>>>>>>[...]
>>>>>>>
>>>>>>>If this approach will be decided as fine, I will prepare a better
>>>>>>>version,
>>>>>>>fixing the following things:
>>>>>>>o put vdsofs in generic fs/* dir
>>>>>>>o support other archs and vdso blobs
>>>>>>>o remove BUG_ON()'s and UID==1001 check
>>>>>>>o remove extern's and use headers only
>>>>>>>o refactor code in create_trace_uprobe()
>>>>>>>o add some state to (struct trace_uprobe), so i.e., `cat
>>>>>uprobe_events`
>>>>>>>will
>>>>>>>  print those uprobes as vdso-based
>>>>>>>o document this interface in Documentation/trace/uprobetracer.txt
>>>>>>>o prepare nice patches set?
>>>>>>>
>>>>>>>So, opinions? Is it worth to add something like this?
>>>>>>>
>>>>>>>[1]: https://lkml.org/lkml/2016/7/12/346
>>>>>>>
>>>>>>>Dmitry Safonov (3):
>>>>>>>  x86/vdso: create vdso file, use it for mapping
>>>>>>>  uprobe: drop isdigit() check in create_trace_uprobe
>>>>>>>  uprobe: add vdso support
>>>>>>>
>>>>>>>Cc: Oleg Nesterov <oleg@redhat.com>
>>>>>>>Cc: Al Viro <viro@zeniv.linux.org.uk>
>>>>>>>Cc: Steven Rostedt <rostedt@goodmis.org>
>>>>>>>Cc: Andy Lutomirski <luto@amacapital.net>
>>>>>>>Cc: Thomas Gleixner <tglx@linutronix.de>
>>>>>>>Cc: Ingo Molnar <mingo@redhat.com>
>>>>>>>Cc: "H. Peter Anvin" <hpa@zytor.com>
>>>>>>>Cc: x86@kernel.org
>>>>>>>Cc: Dmitry Safonov <0x7f454c46@gmail.com>
>>>>>>>
>>>>>>>arch/x86/entry/vdso/vma.c   | 148
>>>>>>>++++++++++++++++++++++++++++++++++++++++++--
>>>>>>> kernel/trace/trace_uprobe.c |  50 +++++++++++----
>>>>>>> 2 files changed, 180 insertions(+), 18 deletions(-)
>>>>>>
>>>>>> I think there is a lot to be said for this idea.  However, a private
>>>>>mapping is definitely wrong for the vvar data; for the vdso code it
>>>>>could be considered either way I suppose.
>>>>>
>>>>>Thanks on your reply.
>>>>>As you could see, I preserved pure mapping of pfn for vvar:
>>>>>7ffcc4b2b000-7ffcc4b2d000 r--p 00000000 00:00 0
>>>>> [vvar]
>>>>>7ffcc4b2d000-7ffcc4b2f000 r-xp 00000000 00:09 18
>>>>> [vdso]
>>>>>(no inode number).
>>>>>I also think it would be useless to do the same to vvar as it
>>>>>has just data and there is no point in probing it.
>>>>
>>>> Well, it would things like mremap() just work and so on.  Let's get rid of special cases if we are.
>>>
>>> Well, for RFC it wouldn't move context.vdso pointer on mremap(),
>>> but as RFC is for x86_64 only, it will work on it.
>>> Anyway, I don't think it would be hard to fix and make mremap() work on
>>> other archs on post-RFC.
>>>
>>> The only corner-case I see for now is that /proc/self/map_files/<vdso_range>
>>> will point to [vdso] which is broken link. But one could read this file
>>> and dump/read vdso blob.
>>> So, in the other words: if some program assumes that /proc/self/map_files/*
>>> should always point to correct file, it may be confused. Not sure, maybe
>>> it would be confused by orphane-file mappings, so having dangling link
>>> there is just fine.
>>
>> I don't see anything a priori wrong with having map_files point
>> somewhere, but it could be worth special casing it for special
>> mappings to preserve existing behavior (no file at all).
>
> Yep, that could be easily done, will do.
> Anyway, just curious - what may it break?
>
> Thanks on the reply, Andy. Does the patches set look sane for you?

I mean, the idea, not current big TODO list and nitpicks :)

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
  2016-08-25 19:49   ` Dmitry Safonov
@ 2016-08-28 20:14   ` Cyrill Gorcunov
  2016-08-29  9:18     ` Dmitry Safonov
  2016-08-29  9:28   ` Andy Lutomirski
                     ` (2 subsequent siblings)
  4 siblings, 1 reply; 37+ messages in thread
From: Cyrill Gorcunov @ 2016-08-28 20:14 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	rostedt, viro

On Thu, Aug 25, 2016 at 06:21:08PM +0300, Dmitry Safonov wrote:
> I added here a new in-kernel fs with ramfs-like options.
> Created vdso file in this fs (yet for testing, only 64-bit vdso).
> Mapped this file to process's mm on setup_additional_pages.
> Just for testing purpose it's done only for specific UID.
> 
> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>

Dmitry, could you clarify please, why "old way" remains in the code?
Even if RFC is not the new approach is supposing to completely remove
old one? Or I miss something obvious?

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-28 20:14   ` Cyrill Gorcunov
@ 2016-08-29  9:18     ` Dmitry Safonov
  0 siblings, 0 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-29  9:18 UTC (permalink / raw)
  To: Cyrill Gorcunov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg,
	rostedt, viro

On 08/28/2016 11:14 PM, Cyrill Gorcunov wrote:
> On Thu, Aug 25, 2016 at 06:21:08PM +0300, Dmitry Safonov wrote:
>> I added here a new in-kernel fs with ramfs-like options.
>> Created vdso file in this fs (yet for testing, only 64-bit vdso).
>> Mapped this file to process's mm on setup_additional_pages.
>> Just for testing purpose it's done only for specific UID.
>>
>> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>

Hi Cyrill,

> Dmitry, could you clarify please, why "old way" remains in the code?
> Even if RFC is not the new approach is supposing to completely remove
> old one? Or I miss something obvious?

Well, I left it for RFC, I think on post-RFC we can leave it as
cmd-param (alike vdso{64,32}_enabled) so it could be easily disabled if
it brokes something in userspace. I guess, that's preferred way to
introduce something like that.

-- 
              Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
  2016-08-25 19:49   ` Dmitry Safonov
  2016-08-28 20:14   ` Cyrill Gorcunov
@ 2016-08-29  9:28   ` Andy Lutomirski
  2016-08-29  9:50     ` Dmitry Safonov
  2016-09-03  0:08     ` Al Viro
  2016-08-30 14:33   ` Oleg Nesterov
  2016-09-03  0:20   ` Al Viro
  4 siblings, 2 replies; 37+ messages in thread
From: Andy Lutomirski @ 2016-08-29  9:28 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	X86 ML, Dmitry Safonov, Oleg Nesterov, Steven Rostedt, Al Viro

On Thu, Aug 25, 2016 at 8:21 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> I added here a new in-kernel fs with ramfs-like options.
> Created vdso file in this fs (yet for testing, only 64-bit vdso).
> Mapped this file to process's mm on setup_additional_pages.
> Just for testing purpose it's done only for specific UID.

I'm wondering whether all this code could be easily moved into the
core special mapping helpers so that all special mappings get the same
benefit.  We could embed a struct file * (or struct inode or whatever)
in special_mapping if needed.

Also, could this be simplified to use anon_inode?

(I'm not a VFS expert at all, so I could be way off base.)

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-29  9:28   ` Andy Lutomirski
@ 2016-08-29  9:50     ` Dmitry Safonov
  2016-08-30 14:58       ` Andy Lutomirski
  2016-09-03  0:08     ` Al Viro
  1 sibling, 1 reply; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-29  9:50 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	X86 ML, Dmitry Safonov, Oleg Nesterov, Steven Rostedt, Al Viro

On 08/29/2016 12:28 PM, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 8:21 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>> I added here a new in-kernel fs with ramfs-like options.
>> Created vdso file in this fs (yet for testing, only 64-bit vdso).
>> Mapped this file to process's mm on setup_additional_pages.
>> Just for testing purpose it's done only for specific UID.
>
> I'm wondering whether all this code could be easily moved into the
> core special mapping helpers so that all special mappings get the same
> benefit.  We could embed a struct file * (or struct inode or whatever)
> in special_mapping if needed.

Hmm, yes, I guess. The only thing -- we'll still need per-arch changes
to initialize those files on booting. But that looks like the proper
generic place to move this code.

> Also, could this be simplified to use anon_inode?
>
> (I'm not a VFS expert at all, so I could be way off base.)

Well, I'll try to do the second version with anon_inode.

Thanks,
              Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe
  2016-08-25 15:21 ` [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe Dmitry Safonov
@ 2016-08-29 22:58   ` Steven Rostedt
  2016-08-29 22:59     ` Steven Rostedt
  2016-08-30 14:57     ` Srikar Dronamraju
  0 siblings, 2 replies; 37+ messages in thread
From: Steven Rostedt @ 2016-08-29 22:58 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg, viro

On Thu, 25 Aug 2016 18:21:09 +0300
Dmitry Safonov <dsafonov@virtuozzo.com> wrote:

> It's useless. Before:
>   [tracing]# echo 'p:test /a:0x0' >> uprobe_events
>   [tracing]# echo 'p:test a:0x0' >> uprobe_events
>   -bash: echo: write error: No such file or directory
>   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
>   -bash: echo: write error: Invalid argument
> 
> After:
>   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
>   -bash: echo: write error: No such file or directory
> 
> Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>

Acked-by: Steven Rostedt <rostedt@goodmis.org>

-- Steve

> ---
>  kernel/trace/trace_uprobe.c | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> index c53485441c88..a74f2d9ff379 100644
> --- a/kernel/trace/trace_uprobe.c
> +++ b/kernel/trace/trace_uprobe.c
> @@ -427,10 +427,6 @@ static int create_trace_uprobe(int argc, char **argv)
>  		pr_info("Probe point is not specified.\n");
>  		return -EINVAL;
>  	}
> -	if (isdigit(argv[1][0])) {
> -		pr_info("probe point must be have a filename.\n");
> -		return -EINVAL;
> -	}
>  	arg = strchr(argv[1], ':');
>  	if (!arg) {
>  		ret = -EINVAL;

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe
  2016-08-29 22:58   ` Steven Rostedt
@ 2016-08-29 22:59     ` Steven Rostedt
  2016-08-29 23:01       ` Dmitry Safonov
  2016-08-30 14:37       ` Oleg Nesterov
  2016-08-30 14:57     ` Srikar Dronamraju
  1 sibling, 2 replies; 37+ messages in thread
From: Steven Rostedt @ 2016-08-29 22:59 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg, viro

On Mon, 29 Aug 2016 18:58:13 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

> On Thu, 25 Aug 2016 18:21:09 +0300
> Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> 
> > It's useless. Before:
> >   [tracing]# echo 'p:test /a:0x0' >> uprobe_events
> >   [tracing]# echo 'p:test a:0x0' >> uprobe_events
> >   -bash: echo: write error: No such file or directory
> >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
> >   -bash: echo: write error: Invalid argument
> > 
> > After:
> >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
> >   -bash: echo: write error: No such file or directory
> > 
> > Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>  
> 
> Acked-by: Steven Rostedt <rostedt@goodmis.org>

Actually, this patch seems agnostic to the series. I'll just pull it in
now.

-- Steve

> 
> > ---
> >  kernel/trace/trace_uprobe.c | 4 ----
> >  1 file changed, 4 deletions(-)
> > 
> > diff --git a/kernel/trace/trace_uprobe.c b/kernel/trace/trace_uprobe.c
> > index c53485441c88..a74f2d9ff379 100644
> > --- a/kernel/trace/trace_uprobe.c
> > +++ b/kernel/trace/trace_uprobe.c
> > @@ -427,10 +427,6 @@ static int create_trace_uprobe(int argc, char **argv)
> >  		pr_info("Probe point is not specified.\n");
> >  		return -EINVAL;
> >  	}
> > -	if (isdigit(argv[1][0])) {
> > -		pr_info("probe point must be have a filename.\n");
> > -		return -EINVAL;
> > -	}
> >  	arg = strchr(argv[1], ':');
> >  	if (!arg) {
> >  		ret = -EINVAL;  
> 

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe
  2016-08-29 22:59     ` Steven Rostedt
@ 2016-08-29 23:01       ` Dmitry Safonov
  2016-08-30 14:37       ` Oleg Nesterov
  1 sibling, 0 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-29 23:01 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, H. Peter Anvin, X86 ML, Oleg Nesterov, Al Viro

2016-08-30 1:59 GMT+03:00 Steven Rostedt <rostedt@goodmis.org>:
> On Mon, 29 Aug 2016 18:58:13 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
>> On Thu, 25 Aug 2016 18:21:09 +0300
>> Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
>>
>> > It's useless. Before:
>> >   [tracing]# echo 'p:test /a:0x0' >> uprobe_events
>> >   [tracing]# echo 'p:test a:0x0' >> uprobe_events
>> >   -bash: echo: write error: No such file or directory
>> >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
>> >   -bash: echo: write error: Invalid argument
>> >
>> > After:
>> >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
>> >   -bash: echo: write error: No such file or directory
>> >
>> > Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
>>
>> Acked-by: Steven Rostedt <rostedt@goodmis.org>
>
> Actually, this patch seems agnostic to the series. I'll just pull it in
> now.

Thanks, Steve!

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
                     ` (2 preceding siblings ...)
  2016-08-29  9:28   ` Andy Lutomirski
@ 2016-08-30 14:33   ` Oleg Nesterov
  2016-08-30 14:53     ` Dmitry Safonov
  2016-09-03  0:13     ` Al Viro
  2016-09-03  0:20   ` Al Viro
  4 siblings, 2 replies; 37+ messages in thread
From: Oleg Nesterov @ 2016-08-30 14:33 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, rostedt, viro

On 08/25, Dmitry Safonov wrote:
>
> +static __init struct file *init_vdso_file(const struct vdso_image *vdso_image,
> +					const char *name)
> +{
> +	struct super_block *sb;
> +	struct qstr name_str;
> +	struct inode *inode;
> +	struct path path;
> +	struct file *res;
> +
> +	if (IS_ERR(vdso_mnt))
> +		return ERR_CAST(vdso_mnt);
> +	sb = vdso_mnt->mnt_sb;
> +
> +	name_str.hash = 0;
> +	name_str.len = strlen(name);
> +	name_str.name = name;
> +
> +	res = ERR_PTR(-ENOMEM);
> +	path.mnt = mntget(vdso_mnt);
> +	path.dentry = d_alloc_pseudo(sb, &name_str);
> +	if (!path.dentry)
> +		goto put_path;
> +	d_set_d_op(path.dentry, &vdso_dops);
> +
> +	res = ERR_PTR(-ENOSPC);
> +	inode = ramfs_get_inode(sb, NULL, S_IFREG | S_IRUGO | S_IXUGO, 0);
> +	if (!inode)
> +		goto put_path;
> +
> +	inode->i_flags |= S_PRIVATE;
> +	d_instantiate(path.dentry, inode);
> +	inode->i_size = vdso_image->size;
> +
> +	res = ERR_PTR(add_vdso_pages_to_page_cache(vdso_image, inode));
> +	if (IS_ERR(res))
> +		goto put_path;
> +
> +	res = alloc_file(&path, FMODE_READ, &ramfs_file_operations);
> +	if (!IS_ERR(res))
> +		return res;
> +
> +put_path:
> +	path_put(&path);
> +	return res;
> +}

Not sure... I think alloc_anon_inode() makes more sense.

Perhaps you can look at fs/aio.c and extract some bits of code into
the new helpers which could be used by both aio_fs and vdso_fs?

Oleg.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe
  2016-08-29 22:59     ` Steven Rostedt
  2016-08-29 23:01       ` Dmitry Safonov
@ 2016-08-30 14:37       ` Oleg Nesterov
  2016-08-30 21:15         ` Steven Rostedt
  1 sibling, 1 reply; 37+ messages in thread
From: Oleg Nesterov @ 2016-08-30 14:37 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dmitry Safonov, linux-kernel, mingo, luto, tglx, hpa, x86,
	0x7f454c46, viro

On 08/29, Steven Rostedt wrote:
>
> On Mon, 29 Aug 2016 18:58:13 -0400
> Steven Rostedt <rostedt@goodmis.org> wrote:
>
> > On Thu, 25 Aug 2016 18:21:09 +0300
> > Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> >
> > > It's useless. Before:
> > >   [tracing]# echo 'p:test /a:0x0' >> uprobe_events
> > >   [tracing]# echo 'p:test a:0x0' >> uprobe_events
> > >   -bash: echo: write error: No such file or directory
> > >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
> > >   -bash: echo: write error: Invalid argument
> > > 
> > > After:
> > >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
> > >   -bash: echo: write error: No such file or directory
> > > 
> > > Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>  
> > 
> > Acked-by: Steven Rostedt <rostedt@goodmis.org>
>
> Actually, this patch seems agnostic to the series. I'll just pull it in
> now.

Yes, agreed, this isdigit() is pointless, thanks.

Oleg.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-30 14:33   ` Oleg Nesterov
@ 2016-08-30 14:53     ` Dmitry Safonov
  2016-09-03  0:13     ` Al Viro
  1 sibling, 0 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-08-30 14:53 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, H. Peter Anvin, X86 ML, Steven Rostedt, Al Viro

2016-08-30 17:33 GMT+03:00 Oleg Nesterov <oleg@redhat.com>:
> On 08/25, Dmitry Safonov wrote:
>>
>> +static __init struct file *init_vdso_file(const struct vdso_image *vdso_image,
>> +                                     const char *name)
>> +{
>> +     struct super_block *sb;
>> +     struct qstr name_str;
>> +     struct inode *inode;
>> +     struct path path;
>> +     struct file *res;
>> +
>> +     if (IS_ERR(vdso_mnt))
>> +             return ERR_CAST(vdso_mnt);
>> +     sb = vdso_mnt->mnt_sb;
>> +
>> +     name_str.hash = 0;
>> +     name_str.len = strlen(name);
>> +     name_str.name = name;
>> +
>> +     res = ERR_PTR(-ENOMEM);
>> +     path.mnt = mntget(vdso_mnt);
>> +     path.dentry = d_alloc_pseudo(sb, &name_str);
>> +     if (!path.dentry)
>> +             goto put_path;
>> +     d_set_d_op(path.dentry, &vdso_dops);
>> +
>> +     res = ERR_PTR(-ENOSPC);
>> +     inode = ramfs_get_inode(sb, NULL, S_IFREG | S_IRUGO | S_IXUGO, 0);
>> +     if (!inode)
>> +             goto put_path;
>> +
>> +     inode->i_flags |= S_PRIVATE;
>> +     d_instantiate(path.dentry, inode);
>> +     inode->i_size = vdso_image->size;
>> +
>> +     res = ERR_PTR(add_vdso_pages_to_page_cache(vdso_image, inode));
>> +     if (IS_ERR(res))
>> +             goto put_path;
>> +
>> +     res = alloc_file(&path, FMODE_READ, &ramfs_file_operations);
>> +     if (!IS_ERR(res))
>> +             return res;
>> +
>> +put_path:
>> +     path_put(&path);
>> +     return res;
>> +}
>
> Not sure... I think alloc_anon_inode() makes more sense.
>
> Perhaps you can look at fs/aio.c and extract some bits of code into
> the new helpers which could be used by both aio_fs and vdso_fs?

Thanks, will do for the next version.

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe
  2016-08-29 22:58   ` Steven Rostedt
  2016-08-29 22:59     ` Steven Rostedt
@ 2016-08-30 14:57     ` Srikar Dronamraju
  1 sibling, 0 replies; 37+ messages in thread
From: Srikar Dronamraju @ 2016-08-30 14:57 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dmitry Safonov, linux-kernel, mingo, luto, tglx, hpa, x86,
	0x7f454c46, oleg, viro

* Steven Rostedt <rostedt@goodmis.org> [2016-08-29 18:58:13]:

> On Thu, 25 Aug 2016 18:21:09 +0300
> Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> 
> > It's useless. Before:
> >   [tracing]# echo 'p:test /a:0x0' >> uprobe_events
> >   [tracing]# echo 'p:test a:0x0' >> uprobe_events
> >   -bash: echo: write error: No such file or directory
> >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
> >   -bash: echo: write error: Invalid argument
> > 
> > After:
> >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
> >   -bash: echo: write error: No such file or directory
> > 
> > Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>
> 
> Acked-by: Steven Rostedt <rostedt@goodmis.org>

Agree.

Acked-by: Srikar Dronamraju <srikar@linux.vnet.ibm.com>

-- 
Thanks and Regards
Srikar Dronamraju

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-29  9:50     ` Dmitry Safonov
@ 2016-08-30 14:58       ` Andy Lutomirski
  0 siblings, 0 replies; 37+ messages in thread
From: Andy Lutomirski @ 2016-08-30 14:58 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, Ingo Molnar, Thomas Gleixner, H. Peter Anvin,
	X86 ML, Dmitry Safonov, Oleg Nesterov, Steven Rostedt, Al Viro

On Mon, Aug 29, 2016 at 2:50 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> On 08/29/2016 12:28 PM, Andy Lutomirski wrote:
>>
>> On Thu, Aug 25, 2016 at 8:21 AM, Dmitry Safonov <dsafonov@virtuozzo.com>
>> wrote:
>>>
>>> I added here a new in-kernel fs with ramfs-like options.
>>> Created vdso file in this fs (yet for testing, only 64-bit vdso).
>>> Mapped this file to process's mm on setup_additional_pages.
>>> Just for testing purpose it's done only for specific UID.
>>
>>
>> I'm wondering whether all this code could be easily moved into the
>> core special mapping helpers so that all special mappings get the same
>> benefit.  We could embed a struct file * (or struct inode or whatever)
>> in special_mapping if needed.
>
>
> Hmm, yes, I guess. The only thing -- we'll still need per-arch changes
> to initialize those files on booting. But that looks like the proper
> generic place to move this code.

You might be able to get away with initializing on first use.

--Andy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe
  2016-08-30 14:37       ` Oleg Nesterov
@ 2016-08-30 21:15         ` Steven Rostedt
  2016-08-31 12:07           ` Oleg Nesterov
  0 siblings, 1 reply; 37+ messages in thread
From: Steven Rostedt @ 2016-08-30 21:15 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Safonov, linux-kernel, mingo, luto, tglx, hpa, x86,
	0x7f454c46, viro

On Tue, 30 Aug 2016 16:37:28 +0200
Oleg Nesterov <oleg@redhat.com> wrote:

> On 08/29, Steven Rostedt wrote:
> >
> > On Mon, 29 Aug 2016 18:58:13 -0400
> > Steven Rostedt <rostedt@goodmis.org> wrote:
> >  
> > > On Thu, 25 Aug 2016 18:21:09 +0300
> > > Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> > >  
> > > > It's useless. Before:
> > > >   [tracing]# echo 'p:test /a:0x0' >> uprobe_events
> > > >   [tracing]# echo 'p:test a:0x0' >> uprobe_events
> > > >   -bash: echo: write error: No such file or directory
> > > >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
> > > >   -bash: echo: write error: Invalid argument
> > > > 
> > > > After:
> > > >   [tracing]# echo 'p:test 1:0x0' >> uprobe_events
> > > >   -bash: echo: write error: No such file or directory
> > > > 
> > > > Signed-off-by: Dmitry Safonov <dsafonov@virtuozzo.com>    
> > > 
> > > Acked-by: Steven Rostedt <rostedt@goodmis.org>  
> >
> > Actually, this patch seems agnostic to the series. I'll just pull it in
> > now.  
> 
> Yes, agreed, this isdigit() is pointless, thanks.

Should I put you down as an Acked-by?

-- Steve

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe
  2016-08-30 21:15         ` Steven Rostedt
@ 2016-08-31 12:07           ` Oleg Nesterov
  0 siblings, 0 replies; 37+ messages in thread
From: Oleg Nesterov @ 2016-08-31 12:07 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Dmitry Safonov, linux-kernel, mingo, luto, tglx, hpa, x86,
	0x7f454c46, viro

On 08/30, Steven Rostedt wrote:
>
> On Tue, 30 Aug 2016 16:37:28 +0200
> Oleg Nesterov <oleg@redhat.com> wrote:
> 
> > On 08/29, Steven Rostedt wrote:
> > >
> > >
> > > Actually, this patch seems agnostic to the series. I'll just pull it in
> > > now.  
> > 
> > Yes, agreed, this isdigit() is pointless, thanks.
> 
> Should I put you down as an Acked-by?

Thanks, feel free to add my ack.

Oleg.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-29  9:28   ` Andy Lutomirski
  2016-08-29  9:50     ` Dmitry Safonov
@ 2016-09-03  0:08     ` Al Viro
  1 sibling, 0 replies; 37+ messages in thread
From: Al Viro @ 2016-09-03  0:08 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, X86 ML, Dmitry Safonov, Oleg Nesterov,
	Steven Rostedt

On Mon, Aug 29, 2016 at 02:28:08AM -0700, Andy Lutomirski wrote:
> On Thu, Aug 25, 2016 at 8:21 AM, Dmitry Safonov <dsafonov@virtuozzo.com> wrote:
> > I added here a new in-kernel fs with ramfs-like options.
> > Created vdso file in this fs (yet for testing, only 64-bit vdso).
> > Mapped this file to process's mm on setup_additional_pages.
> > Just for testing purpose it's done only for specific UID.
> 
> I'm wondering whether all this code could be easily moved into the
> core special mapping helpers so that all special mappings get the same
> benefit.  We could embed a struct file * (or struct inode or whatever)
> in special_mapping if needed.
> 
> Also, could this be simplified to use anon_inode?

Please, don't.  anon_inode is for situations when you don't mind sharing
the _same_ inode for different things.  This one very clearly isn't that.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-30 14:33   ` Oleg Nesterov
  2016-08-30 14:53     ` Dmitry Safonov
@ 2016-09-03  0:13     ` Al Viro
  1 sibling, 0 replies; 37+ messages in thread
From: Al Viro @ 2016-09-03  0:13 UTC (permalink / raw)
  To: Oleg Nesterov
  Cc: Dmitry Safonov, linux-kernel, mingo, luto, tglx, hpa, x86,
	0x7f454c46, rostedt

On Tue, Aug 30, 2016 at 04:33:12PM +0200, Oleg Nesterov wrote:
> > +	inode = ramfs_get_inode(sb, NULL, S_IFREG | S_IRUGO | S_IXUGO, 0);

> Not sure... I think alloc_anon_inode() makes more sense.

	Compared to this, you mean?  It's going to give you the wrong
permissions/i_op/a_ops, and you'll have to assign them manually.  ramfs
already gives the right ones for what's wanted, AFAICS.

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
                     ` (3 preceding siblings ...)
  2016-08-30 14:33   ` Oleg Nesterov
@ 2016-09-03  0:20   ` Al Viro
  2016-09-03  7:32     ` Dmitry Safonov
  4 siblings, 1 reply; 37+ messages in thread
From: Al Viro @ 2016-09-03  0:20 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: linux-kernel, mingo, luto, tglx, hpa, x86, 0x7f454c46, oleg, rostedt

On Thu, Aug 25, 2016 at 06:21:08PM +0300, Dmitry Safonov wrote:
> +		unsigned long n_addr = mmap_region(vdso_file_64, text_start,
> +				image->size, VM_READ|VM_EXEC|
> +				VM_DONTEXPAND|VM_SOFTDIRTY|
> +				VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
> +		if (text_start != n_addr) {
> +			pr_err("Failed to mmap vdso file at %lx, mmap_region returned %lx\n",
> +					text_start, n_addr);
> +			goto old_way;
> +		}
> +		vma = find_vma(mm, text_start);
> +		if (IS_ERR(vma) || vma->vm_start != text_start) {
> +			pr_err("Failed to find vdso mapped vma at %lx\n",
> +					text_start);
> +			goto old_way;

Umm...  Since when can find_vma() return ERR_PTR()?

> +	d_set_d_op(path.dentry, &vdso_dops);

Nope.  Set ->s_d_op to &vdso_dops and be done with that.

> +static struct file_system_type vdso_fs_type = {
> +	.name		= "vdsofs",
> +	.mount		= ramfs_mount,

Probably the wrong thing here.  Just use a simple wrapper using mount_pseudo()
for all work; see fs/aio.c:aio_mount().

> +	ret = register_filesystem(&vdso_fs_type);

Do you really want it user-mountable?  If not, no need to register...

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 1/3] x86/vdso: create vdso file, use it for mapping
  2016-09-03  0:20   ` Al Viro
@ 2016-09-03  7:32     ` Dmitry Safonov
  0 siblings, 0 replies; 37+ messages in thread
From: Dmitry Safonov @ 2016-09-03  7:32 UTC (permalink / raw)
  To: Al Viro
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, H. Peter Anvin, X86 ML, Oleg Nesterov,
	Steven Rostedt

Big thanks on review, Al!

2016-09-03 3:20 GMT+03:00 Al Viro <viro@zeniv.linux.org.uk>:
> On Thu, Aug 25, 2016 at 06:21:08PM +0300, Dmitry Safonov wrote:
>> +             unsigned long n_addr = mmap_region(vdso_file_64, text_start,
>> +                             image->size, VM_READ|VM_EXEC|
>> +                             VM_DONTEXPAND|VM_SOFTDIRTY|
>> +                             VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
>> +             if (text_start != n_addr) {
>> +                     pr_err("Failed to mmap vdso file at %lx, mmap_region returned %lx\n",
>> +                                     text_start, n_addr);
>> +                     goto old_way;
>> +             }
>> +             vma = find_vma(mm, text_start);
>> +             if (IS_ERR(vma) || vma->vm_start != text_start) {
>> +                     pr_err("Failed to find vdso mapped vma at %lx\n",
>> +                                     text_start);
>> +                     goto old_way;
>
> Umm...  Since when can find_vma() return ERR_PTR()?

Will fix

>> +     d_set_d_op(path.dentry, &vdso_dops);
>
> Nope.  Set ->s_d_op to &vdso_dops and be done with that.

Ok

>> +static struct file_system_type vdso_fs_type = {
>> +     .name           = "vdsofs",
>> +     .mount          = ramfs_mount,
>
> Probably the wrong thing here.  Just use a simple wrapper using mount_pseudo()
> for all work; see fs/aio.c:aio_mount().

Will check, thanks!

>> +     ret = register_filesystem(&vdso_fs_type);
>
> Do you really want it user-mountable?  If not, no need to register...

Of course, I don't want to, will omit that, thanks.

-- 
             Dmitry

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-08-25 22:53   ` Dmitry Safonov
  2016-08-25 23:00     ` H. Peter Anvin
@ 2016-09-21  0:22     ` H. Peter Anvin
  2016-09-21  0:32       ` H. Peter Anvin
  1 sibling, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2016-09-21  0:22 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, viro

The more I'm thinking about this, why don't we simply have these (the
various possible vdsos as well as vvar) as actual files in sysfs instead
of introducing a new filesystem?  I don't believe sysfs actually has to
be mounted in order for sysfs files to have an inode.

It could also be in procfs, I guess, but sysfs probably makes more sense.

I'm thinking something like:

/sys/kernel/vdso/{i386,x86_64,x32,vvar}

Not only would this let the container people and so on do weird things
much easier, but it ought to eliminate a whole slew of special cases.


	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-09-21  0:22     ` H. Peter Anvin
@ 2016-09-21  0:32       ` H. Peter Anvin
  2016-09-21  0:54         ` Andy Lutomirski
  0 siblings, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2016-09-21  0:32 UTC (permalink / raw)
  To: Dmitry Safonov
  Cc: Dmitry Safonov, linux-kernel, Ingo Molnar, Andy Lutomirski,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, viro

On 09/20/16 17:22, H. Peter Anvin wrote:
> The more I'm thinking about this, why don't we simply have these (the
> various possible vdsos as well as vvar) as actual files in sysfs instead
> of introducing a new filesystem?  I don't believe sysfs actually has to
> be mounted in order for sysfs files to have an inode.
> 
> It could also be in procfs, I guess, but sysfs probably makes more sense.
> 
> I'm thinking something like:
> 
> /sys/kernel/vdso/{i386,x86_64,x32,vvar}
> 
> Not only would this let the container people and so on do weird things
> much easier, but it ought to eliminate a whole slew of special cases.
> 

Even crazier idea: instead of a separate vvar file, have the vvar page
just be a part of these files (as a shared page)... I'm wondering if we
can even use load_elf_interp() since after all it is an ELF shared
library image...

	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-09-21  0:32       ` H. Peter Anvin
@ 2016-09-21  0:54         ` Andy Lutomirski
  2016-09-21  1:07           ` H. Peter Anvin
  0 siblings, 1 reply; 37+ messages in thread
From: Andy Lutomirski @ 2016-09-21  0:54 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dmitry Safonov, Dmitry Safonov, linux-kernel, Ingo Molnar,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, Al Viro

On Tue, Sep 20, 2016 at 5:32 PM, H. Peter Anvin <hpa@zytor.com> wrote:
> On 09/20/16 17:22, H. Peter Anvin wrote:
>> The more I'm thinking about this, why don't we simply have these (the
>> various possible vdsos as well as vvar) as actual files in sysfs instead
>> of introducing a new filesystem?  I don't believe sysfs actually has to
>> be mounted in order for sysfs files to have an inode.
>>
>> It could also be in procfs, I guess, but sysfs probably makes more sense.
>>
>> I'm thinking something like:
>>
>> /sys/kernel/vdso/{i386,x86_64,x32,vvar}
>>
>> Not only would this let the container people and so on do weird things
>> much easier, but it ought to eliminate a whole slew of special cases.
>>
>
> Even crazier idea: instead of a separate vvar file, have the vvar page
> just be a part of these files (as a shared page)... I'm wondering if we
> can even use load_elf_interp() since after all it is an ELF shared
> library image...

I think that may be too crazy:

 - If vvar is in the same inode, then that inode won't be a valid ELF
image, because the ELF header won't be in the right place.

 - vvar is highly magical.  IMO letting it get mapped with VM_MAYWRITE
is asking for trouble, as anything that writes it will COW it, leading
to strange malfunctions.

 - vvar can, and has, had IO pages in it.  This means that the actual
cache types can vary page-to-page in the vvar area, which is not
something that ordinary files do.

Also, if we let the users get an fd pointing to the vdso, then we're
more or less committing to never having contents in the vdso text that
vary per-process.  Are we okay with that.

Dmitry's patches have the vdso using the page cache, and I'm not sure
that even that is needed.  I think that a file with no backing
address_space that simply provides vm_fault instead may be sufficient,
especially for vvar.  I don't know if uprobes would be okay with that,
though.

My personal preference is to let them both be real struct file *
objects (possibly shared between all processes of the same vdso ABI)
but to prevent user code from ever creating an fd referring to one of
these files.

--Andy

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-09-21  0:54         ` Andy Lutomirski
@ 2016-09-21  1:07           ` H. Peter Anvin
  2016-09-21  1:17             ` H. Peter Anvin
  0 siblings, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2016-09-21  1:07 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Safonov, Dmitry Safonov, linux-kernel, Ingo Molnar,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, Al Viro

On 09/20/16 17:54, Andy Lutomirski wrote:
>  - If vvar is in the same inode, then that inode won't be a valid ELF
> image, because the ELF header won't be in the right place.

So the vvar ought to move into an actual ELF segment, which is probably
The Right Thing anyway.

>  - vvar is highly magical.  IMO letting it get mapped with VM_MAYWRITE
> is asking for trouble, as anything that writes it will COW it, leading
> to strange malfunctions.
> 
>  - vvar can, and has, had IO pages in it.  This means that the actual
> cache types can vary page-to-page in the vvar area, which is not
> something that ordinary files do.

Neither of these are any different than many devices, or various files
in procfs.

> My personal preference is to let them both be real struct file *
> objects (possibly shared between all processes of the same vdso ABI)
> but to prevent user code from ever creating an fd referring to one of
> these files.

Why?  It would help people doing weird things like process snapshotting
or bimodal execution enormously.  We want to share an inode, obviously;
the pointer is another issue.

> Also, if we let the users get an fd pointing to the vdso, then we're
> more or less committing to never having contents in the vdso text that
> vary per-process.  Are we okay with that.

This might be a reason to put these objects in procfs rather than sysfs,
but I have to admit that this seems *extremely* far fetched to me.
Obviously they vary per process in the sense that there are already
several to choose from.  In the case of process-unique vdsos there would
be a large number of them, of course.

	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-09-21  1:07           ` H. Peter Anvin
@ 2016-09-21  1:17             ` H. Peter Anvin
  2016-09-21  6:39               ` Andy Lutomirski
  0 siblings, 1 reply; 37+ messages in thread
From: H. Peter Anvin @ 2016-09-21  1:17 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Dmitry Safonov, Dmitry Safonov, linux-kernel, Ingo Molnar,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, Al Viro

On 09/20/16 18:07, H. Peter Anvin wrote:
> 
>>  - vvar is highly magical.  IMO letting it get mapped with VM_MAYWRITE
>> is asking for trouble, as anything that writes it will COW it, leading
>> to strange malfunctions.
>>

The vvar page obviously needs to be mapped MAP_SHARED, and the
underlying file needs to reject writes.  A solution where this area
doesn't end up MAP_SHARED is obviously defective.

As far as keeping the user from doing really stupid things... they can
map a RAM page over the vvar area and there is nothing the kernel really
can do to keep them from doing something like that without doing things
that are probably way worse than the disease.  At least it will be
obvious looking at the mapping file what is going on.

	-hpa

^ permalink raw reply	[flat|nested] 37+ messages in thread

* Re: [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs)
  2016-09-21  1:17             ` H. Peter Anvin
@ 2016-09-21  6:39               ` Andy Lutomirski
  0 siblings, 0 replies; 37+ messages in thread
From: Andy Lutomirski @ 2016-09-21  6:39 UTC (permalink / raw)
  To: H. Peter Anvin
  Cc: Dmitry Safonov, Dmitry Safonov, linux-kernel, Ingo Molnar,
	Thomas Gleixner, X86 ML, Oleg Nesterov, Steven Rostedt, Al Viro

> On 09/20/16 18:07, H. Peter Anvin wrote:
> >
> >>  - vvar is highly magical.  IMO letting it get mapped with VM_MAYWRITE
> >> is asking for trouble, as anything that writes it will COW it, leading
> >> to strange malfunctions.
> >>
>
> The vvar page obviously needs to be mapped MAP_SHARED, and the
> underlying file needs to reject writes.  A solution where this area
> doesn't end up MAP_SHARED is obviously defective.

Hmm, maybe.  But it does certainly work now, and I'm not sure what we
gain by making it more file-like than needed.  Using a non-null
vm_file removes tons of special cases, but giving it a real
address_space doesn't seem very useful to me.

>
> As far as keeping the user from doing really stupid things... they can
> map a RAM page over the vvar area and there is nothing the kernel really
> can do to keep them from doing something like that without doing things
> that are probably way worse than the disease.  At least it will be
> obvious looking at the mapping file what is going on.

I don't want an overly clever debugger to poke the page.  MAP_SHARED
might be sufficient.

^ permalink raw reply	[flat|nested] 37+ messages in thread

end of thread, other threads:[~2016-09-21  6:40 UTC | newest]

Thread overview: 37+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-08-25 15:21 [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) Dmitry Safonov
2016-08-25 15:21 ` [RFC 1/3] x86/vdso: create vdso file, use it for mapping Dmitry Safonov
2016-08-25 19:49   ` Dmitry Safonov
2016-08-25 20:05     ` Dmitry Safonov
2016-08-28 20:14   ` Cyrill Gorcunov
2016-08-29  9:18     ` Dmitry Safonov
2016-08-29  9:28   ` Andy Lutomirski
2016-08-29  9:50     ` Dmitry Safonov
2016-08-30 14:58       ` Andy Lutomirski
2016-09-03  0:08     ` Al Viro
2016-08-30 14:33   ` Oleg Nesterov
2016-08-30 14:53     ` Dmitry Safonov
2016-09-03  0:13     ` Al Viro
2016-09-03  0:20   ` Al Viro
2016-09-03  7:32     ` Dmitry Safonov
2016-08-25 15:21 ` [RFC 2/3] uprobe: drop isdigit() check in create_trace_uprobe Dmitry Safonov
2016-08-29 22:58   ` Steven Rostedt
2016-08-29 22:59     ` Steven Rostedt
2016-08-29 23:01       ` Dmitry Safonov
2016-08-30 14:37       ` Oleg Nesterov
2016-08-30 21:15         ` Steven Rostedt
2016-08-31 12:07           ` Oleg Nesterov
2016-08-30 14:57     ` Srikar Dronamraju
2016-08-25 15:21 ` [RFC 3/3] uprobe: add vdso support Dmitry Safonov
2016-08-25 20:49 ` [RFC 0/3] Put vdso in ramfs-like filesystem (vdsofs) H. Peter Anvin
2016-08-25 22:53   ` Dmitry Safonov
2016-08-25 23:00     ` H. Peter Anvin
2016-08-26 11:16       ` Dmitry Safonov
2016-08-26 14:32         ` Andy Lutomirski
2016-08-26 14:42           ` Dmitry Safonov
2016-08-26 14:44             ` Dmitry Safonov
2016-09-21  0:22     ` H. Peter Anvin
2016-09-21  0:32       ` H. Peter Anvin
2016-09-21  0:54         ` Andy Lutomirski
2016-09-21  1:07           ` H. Peter Anvin
2016-09-21  1:17             ` H. Peter Anvin
2016-09-21  6:39               ` Andy Lutomirski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.