All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCH 0/3] syscall/tracing: compat syscall support
@ 2016-09-09  8:03 ` Marcin Nowakowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-09  8:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, luto, rostedt, Marcin Nowakowski

This patchset adds support syscall event tracing for compat syscalls.

Patch 1 removes the unnecessary syscall_nr field from syscall metadata,
which was one of the obstacles for adding proper support for compat syscalls.

Patch 2 adds a method to distinguish handling of syscalls for compat tasks
if an arch requires that. In disussions about an earlier version of these
patches (http://marc.info/?l=linux-mips&m=147259973128606&w=2) it was suggested
to use audit arch for detecting syscall type. After analysing the code
for various arches it seemed to me that this would add an unnecessary
complexity (as would require extra APIs to enumerate and map all audit
arch types) and I've just simply used compat task status to determine call
type. I cannot see any added value from using the audit arch type in this
context, but it is of course possible that I'm missing some vital piece
of information here ... hence this change is sent as an RFC so that I can
get valuable feedback on the proposed solution.

Finally the last patch adds the missing compat syscall metadata.


Marcin Nowakowski (3):
  tracing/syscalls: remove syscall_nr from syscall metadata
  tracing/syscalls: add handling for compat tasks
  tracing/syscalls: add compat syscall metadata

 arch/mips/kernel/ftrace.c     |   4 +-
 arch/x86/include/asm/ftrace.h |  10 +-
 arch/x86/kernel/ftrace.c      |  14 +++
 include/linux/compat.h        |  74 ++++++++++++
 include/linux/ftrace.h        |   2 +-
 include/linux/syscalls.h      |   1 -
 include/trace/syscall.h       |   2 -
 kernel/trace/trace.h          |  11 +-
 kernel/trace/trace_syscalls.c | 254 +++++++++++++++++++++++++++---------------
 9 files changed, 265 insertions(+), 107 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 0/3] syscall/tracing: compat syscall support
@ 2016-09-09  8:03 ` Marcin Nowakowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-09  8:03 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, luto-kltTT9wpgjJwATOyAt5JVQ,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Marcin Nowakowski

This patchset adds support syscall event tracing for compat syscalls.

Patch 1 removes the unnecessary syscall_nr field from syscall metadata,
which was one of the obstacles for adding proper support for compat syscalls.

Patch 2 adds a method to distinguish handling of syscalls for compat tasks
if an arch requires that. In disussions about an earlier version of these
patches (http://marc.info/?l=linux-mips&m=147259973128606&w=2) it was suggested
to use audit arch for detecting syscall type. After analysing the code
for various arches it seemed to me that this would add an unnecessary
complexity (as would require extra APIs to enumerate and map all audit
arch types) and I've just simply used compat task status to determine call
type. I cannot see any added value from using the audit arch type in this
context, but it is of course possible that I'm missing some vital piece
of information here ... hence this change is sent as an RFC so that I can
get valuable feedback on the proposed solution.

Finally the last patch adds the missing compat syscall metadata.


Marcin Nowakowski (3):
  tracing/syscalls: remove syscall_nr from syscall metadata
  tracing/syscalls: add handling for compat tasks
  tracing/syscalls: add compat syscall metadata

 arch/mips/kernel/ftrace.c     |   4 +-
 arch/x86/include/asm/ftrace.h |  10 +-
 arch/x86/kernel/ftrace.c      |  14 +++
 include/linux/compat.h        |  74 ++++++++++++
 include/linux/ftrace.h        |   2 +-
 include/linux/syscalls.h      |   1 -
 include/trace/syscall.h       |   2 -
 kernel/trace/trace.h          |  11 +-
 kernel/trace/trace_syscalls.c | 254 +++++++++++++++++++++++++++---------------
 9 files changed, 265 insertions(+), 107 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/3] tracing/syscalls: remove syscall_nr from syscall metadata
@ 2016-09-09  8:03   ` Marcin Nowakowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-09  8:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, luto, rostedt, Marcin Nowakowski

Some architectures map multiple syscall numbers to a single syscall.
This meant that on those platforms, some system calls could not be
properly traced using syscall event tracing mechanism, as a different
number of a syscall was used for registration to the one used by
applications.
We can use syscall lookup together with the syscall metadata table
traversal to register for appropriate events instead. This slightly
increases the overhead during event (un)registration, but does not
impact the trace events themselves, which still use syscall numbers
directly.
At the moment it doesn't seem possible to generate the required
syscall map table at build time - as the syscall numbers are assigned
in very arch-specific ways, so any change would require a lot of
arch-specific changes to map things appropriately. It doesn't seem like
a sensible thing to do for tracing purposes without a major change in
how syscall tables and numbers are defined for each arch.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>

---
 include/linux/syscalls.h      |   1 -
 include/trace/syscall.h       |   2 -
 kernel/trace/trace_syscalls.c | 150 ++++++++++++++++++++++++++++--------------
 3 files changed, 100 insertions(+), 53 deletions(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index d022390..c13aadd 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -160,7 +160,6 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 	static struct syscall_metadata __used			\
 	  __syscall_meta_##sname = {				\
 		.name 		= "sys"#sname,			\
-		.syscall_nr	= -1,	/* Filled in at boot */	\
 		.nb_args 	= nb,				\
 		.types		= nb ? types_##sname : NULL,	\
 		.args		= nb ? args_##sname : NULL,	\
diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 7434f0f..b5fbebe 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -13,7 +13,6 @@
  * A syscall entry in the ftrace syscalls array.
  *
  * @name: name of the syscall
- * @syscall_nr: number of the syscall
  * @nb_args: number of parameters it takes
  * @types: list of types as strings
  * @args: list of args as strings (args[i] matches types[i])
@@ -23,7 +22,6 @@
  */
 struct syscall_metadata {
 	const char	*name;
-	int		syscall_nr;
 	int		nb_args;
 	const char	**types;
 	const char	**args;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index b2b6efc..1da10ca 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -404,17 +404,26 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
 	struct trace_array *tr = file->tr;
 	int ret = 0;
 	int num;
+	const char *name;
+
+	name = ((const struct syscall_metadata *)call->data)->name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
-		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
-	if (!tr->sys_refcount_enter)
+	if (!tr->sys_refcount_enter) {
 		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
-	if (!ret) {
-		rcu_assign_pointer(tr->enter_syscall_files[num], file);
-		tr->sys_refcount_enter++;
+		if (ret)
+			goto out_unlock;
+	}
+
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			rcu_assign_pointer(tr->enter_syscall_files[num], file);
 	}
+	tr->sys_refcount_enter++;
+
+out_unlock:
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }
@@ -424,13 +433,17 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
 {
 	struct trace_array *tr = file->tr;
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
-		return;
+	name = ((const struct syscall_metadata *)call->data)->name;
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_enter--;
-	RCU_INIT_POINTER(tr->enter_syscall_files[num], NULL);
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		   arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				   name))
+			RCU_INIT_POINTER(tr->enter_syscall_files[num], NULL);
+	}
 	if (!tr->sys_refcount_enter)
 		unregister_trace_sys_enter(ftrace_syscall_enter, tr);
 	mutex_unlock(&syscall_trace_lock);
@@ -442,17 +455,26 @@ static int reg_event_syscall_exit(struct trace_event_file *file,
 	struct trace_array *tr = file->tr;
 	int ret = 0;
 	int num;
+	const char *name;
+
+	name = ((const struct syscall_metadata *)call->data)->name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
-		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
-	if (!tr->sys_refcount_exit)
-		ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
-	if (!ret) {
-		rcu_assign_pointer(tr->exit_syscall_files[num], file);
-		tr->sys_refcount_exit++;
+	if (!tr->sys_refcount_exit) {
+		ret = register_trace_sys_enter(ftrace_syscall_exit, tr);
+		if (ret)
+			goto out_unlock;
+	}
+
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			rcu_assign_pointer(tr->exit_syscall_files[num], file);
 	}
+	tr->sys_refcount_exit++;
+
+out_unlock:
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }
@@ -462,13 +484,18 @@ static void unreg_event_syscall_exit(struct trace_event_file *file,
 {
 	struct trace_array *tr = file->tr;
 	int num;
+	const char *name;
+
+	name = ((const struct syscall_metadata *)call->data)->name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
-		return;
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_exit--;
-	RCU_INIT_POINTER(tr->exit_syscall_files[num], NULL);
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		   arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				   name))
+			RCU_INIT_POINTER(tr->exit_syscall_files[num], NULL);
+	}
 	if (!tr->sys_refcount_exit)
 		unregister_trace_sys_exit(ftrace_syscall_exit, tr);
 	mutex_unlock(&syscall_trace_lock);
@@ -477,14 +504,6 @@ static void unreg_event_syscall_exit(struct trace_event_file *file,
 static int __init init_syscall_trace(struct trace_event_call *call)
 {
 	int id;
-	int num;
-
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls) {
-		pr_debug("syscall %s metadata not mapped, disabling ftrace event\n",
-				((struct syscall_metadata *)call->data)->name);
-		return -ENOSYS;
-	}
 
 	if (set_syscall_print_fmt(call) < 0)
 		return -ENOMEM;
@@ -547,7 +566,6 @@ void __init init_ftrace_syscalls(void)
 		if (!meta)
 			continue;
 
-		meta->syscall_nr = i;
 		syscalls_metadata[i] = meta;
 	}
 }
@@ -603,19 +621,29 @@ static int perf_sysenter_enable(struct trace_event_call *call)
 {
 	int ret = 0;
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	name = ((const struct syscall_metadata *)call->data)->name;
 
 	mutex_lock(&syscall_trace_lock);
-	if (!sys_perf_refcount_enter)
+	if (!sys_perf_refcount_enter) {
 		ret = register_trace_sys_enter(perf_syscall_enter, NULL);
-	if (ret) {
-		pr_info("event trace: Could not activate"
+		if (ret) {
+			pr_info("event trace: Could not activate"
 				"syscall entry trace point");
-	} else {
-		set_bit(num, enabled_perf_enter_syscalls);
-		sys_perf_refcount_enter++;
+			goto out_unlock;
+		}
+	}
+
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			set_bit(num, enabled_perf_enter_syscalls);
 	}
+	sys_perf_refcount_enter++;
+
+out_unlock:
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }
@@ -623,12 +651,18 @@ static int perf_sysenter_enable(struct trace_event_call *call)
 static void perf_sysenter_disable(struct trace_event_call *call)
 {
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	name = ((const struct syscall_metadata *)call->data)->name;
 
 	mutex_lock(&syscall_trace_lock);
 	sys_perf_refcount_enter--;
-	clear_bit(num, enabled_perf_enter_syscalls);
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			clear_bit(num, enabled_perf_enter_syscalls);
+	}
 	if (!sys_perf_refcount_enter)
 		unregister_trace_sys_enter(perf_syscall_enter, NULL);
 	mutex_unlock(&syscall_trace_lock);
@@ -675,19 +709,29 @@ static int perf_sysexit_enable(struct trace_event_call *call)
 {
 	int ret = 0;
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	name = ((const struct syscall_metadata *)call->data)->name;
 
 	mutex_lock(&syscall_trace_lock);
-	if (!sys_perf_refcount_exit)
+	if (!sys_perf_refcount_exit) {
 		ret = register_trace_sys_exit(perf_syscall_exit, NULL);
-	if (ret) {
-		pr_info("event trace: Could not activate"
+		if (ret) {
+			pr_info("event trace: Could not activate"
 				"syscall exit trace point");
-	} else {
-		set_bit(num, enabled_perf_exit_syscalls);
-		sys_perf_refcount_exit++;
+			goto out_unlock;
+		}
 	}
+
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			set_bit(num, enabled_perf_exit_syscalls);
+	}
+	sys_perf_refcount_exit++;
+
+out_unlock:
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }
@@ -695,12 +739,18 @@ static int perf_sysexit_enable(struct trace_event_call *call)
 static void perf_sysexit_disable(struct trace_event_call *call)
 {
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	name = ((const struct syscall_metadata *)call->data)->name;
 
 	mutex_lock(&syscall_trace_lock);
 	sys_perf_refcount_exit--;
-	clear_bit(num, enabled_perf_exit_syscalls);
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			clear_bit(num, enabled_perf_exit_syscalls);
+	}
 	if (!sys_perf_refcount_exit)
 		unregister_trace_sys_exit(perf_syscall_exit, NULL);
 	mutex_unlock(&syscall_trace_lock);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 1/3] tracing/syscalls: remove syscall_nr from syscall metadata
@ 2016-09-09  8:03   ` Marcin Nowakowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-09  8:03 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA
  Cc: linux-api-u79uwXL29TY76Z2rM5mHXA, luto-kltTT9wpgjJwATOyAt5JVQ,
	rostedt-nx8X9YLhiw1AfugRpC6u6w, Marcin Nowakowski

Some architectures map multiple syscall numbers to a single syscall.
This meant that on those platforms, some system calls could not be
properly traced using syscall event tracing mechanism, as a different
number of a syscall was used for registration to the one used by
applications.
We can use syscall lookup together with the syscall metadata table
traversal to register for appropriate events instead. This slightly
increases the overhead during event (un)registration, but does not
impact the trace events themselves, which still use syscall numbers
directly.
At the moment it doesn't seem possible to generate the required
syscall map table at build time - as the syscall numbers are assigned
in very arch-specific ways, so any change would require a lot of
arch-specific changes to map things appropriately. It doesn't seem like
a sensible thing to do for tracing purposes without a major change in
how syscall tables and numbers are defined for each arch.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>

---
 include/linux/syscalls.h      |   1 -
 include/trace/syscall.h       |   2 -
 kernel/trace/trace_syscalls.c | 150 ++++++++++++++++++++++++++++--------------
 3 files changed, 100 insertions(+), 53 deletions(-)

diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h
index d022390..c13aadd 100644
--- a/include/linux/syscalls.h
+++ b/include/linux/syscalls.h
@@ -160,7 +160,6 @@ extern struct trace_event_functions exit_syscall_print_funcs;
 	static struct syscall_metadata __used			\
 	  __syscall_meta_##sname = {				\
 		.name 		= "sys"#sname,			\
-		.syscall_nr	= -1,	/* Filled in at boot */	\
 		.nb_args 	= nb,				\
 		.types		= nb ? types_##sname : NULL,	\
 		.args		= nb ? args_##sname : NULL,	\
diff --git a/include/trace/syscall.h b/include/trace/syscall.h
index 7434f0f..b5fbebe 100644
--- a/include/trace/syscall.h
+++ b/include/trace/syscall.h
@@ -13,7 +13,6 @@
  * A syscall entry in the ftrace syscalls array.
  *
  * @name: name of the syscall
- * @syscall_nr: number of the syscall
  * @nb_args: number of parameters it takes
  * @types: list of types as strings
  * @args: list of args as strings (args[i] matches types[i])
@@ -23,7 +22,6 @@
  */
 struct syscall_metadata {
 	const char	*name;
-	int		syscall_nr;
 	int		nb_args;
 	const char	**types;
 	const char	**args;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index b2b6efc..1da10ca 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -404,17 +404,26 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
 	struct trace_array *tr = file->tr;
 	int ret = 0;
 	int num;
+	const char *name;
+
+	name = ((const struct syscall_metadata *)call->data)->name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
-		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
-	if (!tr->sys_refcount_enter)
+	if (!tr->sys_refcount_enter) {
 		ret = register_trace_sys_enter(ftrace_syscall_enter, tr);
-	if (!ret) {
-		rcu_assign_pointer(tr->enter_syscall_files[num], file);
-		tr->sys_refcount_enter++;
+		if (ret)
+			goto out_unlock;
+	}
+
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			rcu_assign_pointer(tr->enter_syscall_files[num], file);
 	}
+	tr->sys_refcount_enter++;
+
+out_unlock:
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }
@@ -424,13 +433,17 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
 {
 	struct trace_array *tr = file->tr;
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
-		return;
+	name = ((const struct syscall_metadata *)call->data)->name;
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_enter--;
-	RCU_INIT_POINTER(tr->enter_syscall_files[num], NULL);
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		   arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				   name))
+			RCU_INIT_POINTER(tr->enter_syscall_files[num], NULL);
+	}
 	if (!tr->sys_refcount_enter)
 		unregister_trace_sys_enter(ftrace_syscall_enter, tr);
 	mutex_unlock(&syscall_trace_lock);
@@ -442,17 +455,26 @@ static int reg_event_syscall_exit(struct trace_event_file *file,
 	struct trace_array *tr = file->tr;
 	int ret = 0;
 	int num;
+	const char *name;
+
+	name = ((const struct syscall_metadata *)call->data)->name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
-		return -ENOSYS;
 	mutex_lock(&syscall_trace_lock);
-	if (!tr->sys_refcount_exit)
-		ret = register_trace_sys_exit(ftrace_syscall_exit, tr);
-	if (!ret) {
-		rcu_assign_pointer(tr->exit_syscall_files[num], file);
-		tr->sys_refcount_exit++;
+	if (!tr->sys_refcount_exit) {
+		ret = register_trace_sys_enter(ftrace_syscall_exit, tr);
+		if (ret)
+			goto out_unlock;
+	}
+
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			rcu_assign_pointer(tr->exit_syscall_files[num], file);
 	}
+	tr->sys_refcount_exit++;
+
+out_unlock:
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }
@@ -462,13 +484,18 @@ static void unreg_event_syscall_exit(struct trace_event_file *file,
 {
 	struct trace_array *tr = file->tr;
 	int num;
+	const char *name;
+
+	name = ((const struct syscall_metadata *)call->data)->name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (WARN_ON_ONCE(num < 0 || num >= NR_syscalls))
-		return;
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_exit--;
-	RCU_INIT_POINTER(tr->exit_syscall_files[num], NULL);
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		   arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				   name))
+			RCU_INIT_POINTER(tr->exit_syscall_files[num], NULL);
+	}
 	if (!tr->sys_refcount_exit)
 		unregister_trace_sys_exit(ftrace_syscall_exit, tr);
 	mutex_unlock(&syscall_trace_lock);
@@ -477,14 +504,6 @@ static void unreg_event_syscall_exit(struct trace_event_file *file,
 static int __init init_syscall_trace(struct trace_event_call *call)
 {
 	int id;
-	int num;
-
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
-	if (num < 0 || num >= NR_syscalls) {
-		pr_debug("syscall %s metadata not mapped, disabling ftrace event\n",
-				((struct syscall_metadata *)call->data)->name);
-		return -ENOSYS;
-	}
 
 	if (set_syscall_print_fmt(call) < 0)
 		return -ENOMEM;
@@ -547,7 +566,6 @@ void __init init_ftrace_syscalls(void)
 		if (!meta)
 			continue;
 
-		meta->syscall_nr = i;
 		syscalls_metadata[i] = meta;
 	}
 }
@@ -603,19 +621,29 @@ static int perf_sysenter_enable(struct trace_event_call *call)
 {
 	int ret = 0;
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	name = ((const struct syscall_metadata *)call->data)->name;
 
 	mutex_lock(&syscall_trace_lock);
-	if (!sys_perf_refcount_enter)
+	if (!sys_perf_refcount_enter) {
 		ret = register_trace_sys_enter(perf_syscall_enter, NULL);
-	if (ret) {
-		pr_info("event trace: Could not activate"
+		if (ret) {
+			pr_info("event trace: Could not activate"
 				"syscall entry trace point");
-	} else {
-		set_bit(num, enabled_perf_enter_syscalls);
-		sys_perf_refcount_enter++;
+			goto out_unlock;
+		}
+	}
+
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			set_bit(num, enabled_perf_enter_syscalls);
 	}
+	sys_perf_refcount_enter++;
+
+out_unlock:
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }
@@ -623,12 +651,18 @@ static int perf_sysenter_enable(struct trace_event_call *call)
 static void perf_sysenter_disable(struct trace_event_call *call)
 {
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	name = ((const struct syscall_metadata *)call->data)->name;
 
 	mutex_lock(&syscall_trace_lock);
 	sys_perf_refcount_enter--;
-	clear_bit(num, enabled_perf_enter_syscalls);
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			clear_bit(num, enabled_perf_enter_syscalls);
+	}
 	if (!sys_perf_refcount_enter)
 		unregister_trace_sys_enter(perf_syscall_enter, NULL);
 	mutex_unlock(&syscall_trace_lock);
@@ -675,19 +709,29 @@ static int perf_sysexit_enable(struct trace_event_call *call)
 {
 	int ret = 0;
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	name = ((const struct syscall_metadata *)call->data)->name;
 
 	mutex_lock(&syscall_trace_lock);
-	if (!sys_perf_refcount_exit)
+	if (!sys_perf_refcount_exit) {
 		ret = register_trace_sys_exit(perf_syscall_exit, NULL);
-	if (ret) {
-		pr_info("event trace: Could not activate"
+		if (ret) {
+			pr_info("event trace: Could not activate"
 				"syscall exit trace point");
-	} else {
-		set_bit(num, enabled_perf_exit_syscalls);
-		sys_perf_refcount_exit++;
+			goto out_unlock;
+		}
 	}
+
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			set_bit(num, enabled_perf_exit_syscalls);
+	}
+	sys_perf_refcount_exit++;
+
+out_unlock:
 	mutex_unlock(&syscall_trace_lock);
 	return ret;
 }
@@ -695,12 +739,18 @@ static int perf_sysexit_enable(struct trace_event_call *call)
 static void perf_sysexit_disable(struct trace_event_call *call)
 {
 	int num;
+	const char *name;
 
-	num = ((struct syscall_metadata *)call->data)->syscall_nr;
+	name = ((const struct syscall_metadata *)call->data)->name;
 
 	mutex_lock(&syscall_trace_lock);
 	sys_perf_refcount_exit--;
-	clear_bit(num, enabled_perf_exit_syscalls);
+	for (num = 0; num < NR_syscalls; num++) {
+		if (syscalls_metadata[num] &&
+		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
+				    name))
+			clear_bit(num, enabled_perf_exit_syscalls);
+	}
 	if (!sys_perf_refcount_exit)
 		unregister_trace_sys_exit(perf_syscall_exit, NULL);
 	mutex_unlock(&syscall_trace_lock);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 2/3] tracing/syscalls: add handling for compat tasks
  2016-09-09  8:03 ` Marcin Nowakowski
@ 2016-09-09  8:03   ` Marcin Nowakowski
  -1 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-09  8:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, luto, rostedt, Marcin Nowakowski

Extend the syscall tracing subsystem by adding a handler for compat
tasks. For some architectures, where compat tasks' syscall numbers have
an exclusive set of syscall numbers, this already works since the
removal of syscall_nr.
Architectures where the same syscall may use a different syscall number
for compat tasks need to define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP and
define a method arch_trace_is_compat_syscall(struct pt_regs*) that tells
if a current task is a compat one.
For architectures that define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP the
number of trace event files is doubled and all syscall trace events are
identified by the syscall number offset by NR_syscalls.

Note that as this patch series is posted as an RFC, this currently only
includes arch updates for MIPS and x86 (and has only been tested on
MIPS and x86_64). I will work on updating other arch trees after this
solution is reviewed.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>

---
 arch/mips/kernel/ftrace.c     |   4 +-
 arch/x86/include/asm/ftrace.h |  10 +---
 arch/x86/kernel/ftrace.c      |  14 ++++++
 include/linux/ftrace.h        |   2 +-
 kernel/trace/trace.h          |  11 +++-
 kernel/trace/trace_syscalls.c | 113 +++++++++++++++++++++++++-----------------
 6 files changed, 94 insertions(+), 60 deletions(-)

diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
index 937c54b..e150cf6 100644
--- a/arch/mips/kernel/ftrace.c
+++ b/arch/mips/kernel/ftrace.c
@@ -412,7 +412,7 @@ out:
 #ifdef CONFIG_FTRACE_SYSCALLS
 
 #ifdef CONFIG_32BIT
-unsigned long __init arch_syscall_addr(int nr)
+unsigned long __init arch_syscall_addr(int nr, int compat)
 {
 	return (unsigned long)sys_call_table[nr - __NR_O32_Linux];
 }
@@ -420,7 +420,7 @@ unsigned long __init arch_syscall_addr(int nr)
 
 #ifdef CONFIG_64BIT
 
-unsigned long __init arch_syscall_addr(int nr)
+unsigned long __init arch_syscall_addr(int nr, int compat)
 {
 #ifdef CONFIG_MIPS32_N32
 	if (nr >= __NR_N32_Linux && nr <= __NR_N32_Linux + __NR_N32_Linux_syscalls)
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index a4820d4..a24a21c 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -47,15 +47,7 @@ int ftrace_int3_handler(struct pt_regs *regs);
 #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION)
 #include <asm/compat.h>
 
-/*
- * Because ia32 syscalls do not map to x86_64 syscall numbers
- * this screws up the trace output when tracing a ia32 task.
- * Instead of reporting bogus syscalls, just do not trace them.
- *
- * If the user really wants these, then they should use the
- * raw syscall tracepoints with filtering.
- */
-#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
+#define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP 1
 static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
 {
 	if (in_compat_syscall())
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index d036cfb..78f3e36 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -28,6 +28,7 @@
 #include <asm/kprobes.h>
 #include <asm/ftrace.h>
 #include <asm/nops.h>
+#include <asm/syscall.h>
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
@@ -1035,3 +1036,16 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	}
 }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#ifdef CONFIG_FTRACE_SYSCALLS
+
+unsigned long arch_syscall_addr(int nr, int compat)
+{
+#if defined(CONFIG_X86_64) && defined(CONFIG_IA32_EMULATION)
+	if (compat)
+		return (unsigned long)ia32_sys_call_table[nr];
+#endif
+	return (unsigned long)sys_call_table[nr];
+}
+
+#endif /* CONFIG_FTRACE_SYSCALLS */
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7d565af..110f95d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -938,7 +938,7 @@ static inline void  disable_trace_on_warning(void) { }
 
 #ifdef CONFIG_FTRACE_SYSCALLS
 
-unsigned long arch_syscall_addr(int nr);
+unsigned long arch_syscall_addr(int nr, int compat);
 
 #endif /* CONFIG_FTRACE_SYSCALLS */
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f783df4..102a41a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -234,8 +234,15 @@ struct trace_array {
 #ifdef CONFIG_FTRACE_SYSCALLS
 	int			sys_refcount_enter;
 	int			sys_refcount_exit;
-	struct trace_event_file __rcu *enter_syscall_files[NR_syscalls];
-	struct trace_event_file __rcu *exit_syscall_files[NR_syscalls];
+
+#ifdef ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP
+#define FTRACE_SYSCALL_CNT (NR_syscalls * (1 + IS_ENABLED(CONFIG_COMPAT)))
+#else
+#define FTRACE_SYSCALL_CNT (NR_syscalls)
+#endif
+
+	struct trace_event_file __rcu *enter_syscall_files[FTRACE_SYSCALL_CNT];
+	struct trace_event_file __rcu *exit_syscall_files[FTRACE_SYSCALL_CNT];
 #endif
 	int			stop_count;
 	int			clock_id;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 1da10ca..dc7df38 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -44,37 +44,35 @@ static inline bool arch_syscall_match_sym_name(const char *sym, const char *name
 }
 #endif
 
-#ifdef ARCH_TRACE_IGNORE_COMPAT_SYSCALLS
+#ifdef ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP
 /*
  * Some architectures that allow for 32bit applications
  * to run on a 64bit kernel, do not map the syscalls for
  * the 32bit tasks the same as they do for 64bit tasks.
  *
- *     *cough*x86*cough*
- *
- * In such a case, instead of reporting the wrong syscalls,
- * simply ignore them.
- *
- * For an arch to ignore the compat syscalls it needs to
- * define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS as well as
+ * If a set of syscall numbers for 32-bit tasks overlaps
+ * the set of syscall numbers for 64-bit tasks, define
+ * ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP as well as
  * define the function arch_trace_is_compat_syscall() to let
- * the tracing system know that it should ignore it.
+ * the tracing system know that a compat syscall is being handled.
  */
-static int
-trace_get_syscall_nr(struct task_struct *task, struct pt_regs *regs)
+static inline bool trace_is_compat_syscall(struct pt_regs *regs)
 {
-	if (unlikely(arch_trace_is_compat_syscall(regs)))
-		return -1;
-
-	return syscall_get_nr(task, regs);
+	return arch_trace_is_compat_syscall(regs);
 }
 #else
+static inline bool trace_is_compat_syscall(struct pt_regs *regs)
+{
+	return false;
+}
+#endif /* ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP */
+
 static inline int
 trace_get_syscall_nr(struct task_struct *task, struct pt_regs *regs)
 {
 	return syscall_get_nr(task, regs);
 }
-#endif /* ARCH_TRACE_IGNORE_COMPAT_SYSCALLS */
+
 
 static __init struct syscall_metadata *
 find_syscall_meta(unsigned long syscall)
@@ -98,9 +96,9 @@ find_syscall_meta(unsigned long syscall)
 	return NULL;
 }
 
-static struct syscall_metadata *syscall_nr_to_meta(int nr)
+static struct syscall_metadata *trace_syscall_nr_to_meta(int nr)
 {
-	if (!syscalls_metadata || nr >= NR_syscalls || nr < 0)
+	if (!syscalls_metadata || nr >= FTRACE_SYSCALL_CNT || nr < 0)
 		return NULL;
 
 	return syscalls_metadata[nr];
@@ -110,7 +108,7 @@ const char *get_syscall_name(int syscall)
 {
 	struct syscall_metadata *entry;
 
-	entry = syscall_nr_to_meta(syscall);
+	entry = trace_syscall_nr_to_meta(syscall);
 	if (!entry)
 		return NULL;
 
@@ -130,7 +128,7 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
 
 	trace = (typeof(trace))ent;
 	syscall = trace->nr;
-	entry = syscall_nr_to_meta(syscall);
+	entry = trace_syscall_nr_to_meta(syscall);
 
 	if (!entry)
 		goto end;
@@ -176,7 +174,7 @@ print_syscall_exit(struct trace_iterator *iter, int flags,
 
 	trace = (typeof(trace))ent;
 	syscall = trace->nr;
-	entry = syscall_nr_to_meta(syscall);
+	entry = trace_syscall_nr_to_meta(syscall);
 
 	if (!entry) {
 		trace_seq_putc(s, '\n');
@@ -321,6 +319,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
 
+	if (trace_is_compat_syscall(regs))
+		syscall_nr += NR_syscalls;
+
 	/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */
 	trace_file = rcu_dereference_sched(tr->enter_syscall_files[syscall_nr]);
 	if (!trace_file)
@@ -329,7 +330,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	if (trace_trigger_soft_disabled(trace_file))
 		return;
 
-	sys_data = syscall_nr_to_meta(syscall_nr);
+	sys_data = trace_syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;
 
@@ -368,6 +369,9 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
 
+	if (trace_is_compat_syscall(regs))
+		syscall_nr += NR_syscalls;
+
 	/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE()) */
 	trace_file = rcu_dereference_sched(tr->exit_syscall_files[syscall_nr]);
 	if (!trace_file)
@@ -376,7 +380,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	if (trace_trigger_soft_disabled(trace_file))
 		return;
 
-	sys_data = syscall_nr_to_meta(syscall_nr);
+	sys_data = trace_syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;
 
@@ -415,7 +419,7 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
 			goto out_unlock;
 	}
 
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -438,7 +442,7 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
 	name = ((const struct syscall_metadata *)call->data)->name;
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_enter--;
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		   arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				   name))
@@ -466,7 +470,7 @@ static int reg_event_syscall_exit(struct trace_event_file *file,
 			goto out_unlock;
 	}
 
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -490,7 +494,7 @@ static void unreg_event_syscall_exit(struct trace_event_file *file,
 
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_exit--;
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		   arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				   name))
@@ -542,38 +546,47 @@ struct trace_event_class __refdata event_class_syscall_exit = {
 	.raw_init	= init_syscall_trace,
 };
 
-unsigned long __init __weak arch_syscall_addr(int nr)
+unsigned long __init __weak arch_syscall_addr(int nr, int compat)
 {
 	return (unsigned long)sys_call_table[nr];
 }
 
-void __init init_ftrace_syscalls(void)
+void __init init_ftrace_syscalls_meta(int compat)
 {
 	struct syscall_metadata *meta;
 	unsigned long addr;
 	int i;
 
-	syscalls_metadata = kcalloc(NR_syscalls, sizeof(*syscalls_metadata),
-				    GFP_KERNEL);
-	if (!syscalls_metadata) {
-		WARN_ON(1);
-		return;
-	}
-
 	for (i = 0; i < NR_syscalls; i++) {
-		addr = arch_syscall_addr(i);
+		addr = arch_syscall_addr(i, compat);
 		meta = find_syscall_meta(addr);
 		if (!meta)
 			continue;
 
-		syscalls_metadata[i] = meta;
+		syscalls_metadata[compat * NR_syscalls + i] = meta;
 	}
 }
 
+void __init init_ftrace_syscalls(void)
+{
+	syscalls_metadata = kcalloc(FTRACE_SYSCALL_CNT,
+				    sizeof(*syscalls_metadata), GFP_KERNEL);
+	if (!syscalls_metadata) {
+		WARN_ON(1);
+		return;
+	}
+
+	init_ftrace_syscalls_meta(0);
+#ifdef ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP
+	if (IS_ENABLED(CONFIG_COMPAT))
+		init_ftrace_syscalls_meta(1);
+#endif
+}
+
 #ifdef CONFIG_PERF_EVENTS
 
-static DECLARE_BITMAP(enabled_perf_enter_syscalls, NR_syscalls);
-static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
+static DECLARE_BITMAP(enabled_perf_enter_syscalls, FTRACE_SYSCALL_CNT);
+static DECLARE_BITMAP(enabled_perf_exit_syscalls, FTRACE_SYSCALL_CNT);
 static int sys_perf_refcount_enter;
 static int sys_perf_refcount_exit;
 
@@ -589,10 +602,14 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
+
+	if (trace_is_compat_syscall(regs))
+		syscall_nr += NR_syscalls;
+
 	if (!test_bit(syscall_nr, enabled_perf_enter_syscalls))
 		return;
 
-	sys_data = syscall_nr_to_meta(syscall_nr);
+	sys_data = trace_syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;
 
@@ -635,7 +652,7 @@ static int perf_sysenter_enable(struct trace_event_call *call)
 		}
 	}
 
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -657,7 +674,7 @@ static void perf_sysenter_disable(struct trace_event_call *call)
 
 	mutex_lock(&syscall_trace_lock);
 	sys_perf_refcount_enter--;
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -680,10 +697,14 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
+
+	if (trace_is_compat_syscall(regs))
+		syscall_nr += NR_syscalls;
+
 	if (!test_bit(syscall_nr, enabled_perf_exit_syscalls))
 		return;
 
-	sys_data = syscall_nr_to_meta(syscall_nr);
+	sys_data = trace_syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;
 
@@ -723,7 +744,7 @@ static int perf_sysexit_enable(struct trace_event_call *call)
 		}
 	}
 
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -745,7 +766,7 @@ static void perf_sysexit_disable(struct trace_event_call *call)
 
 	mutex_lock(&syscall_trace_lock);
 	sys_perf_refcount_exit--;
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 2/3] tracing/syscalls: add handling for compat tasks
@ 2016-09-09  8:03   ` Marcin Nowakowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-09  8:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, luto, rostedt, Marcin Nowakowski

Extend the syscall tracing subsystem by adding a handler for compat
tasks. For some architectures, where compat tasks' syscall numbers have
an exclusive set of syscall numbers, this already works since the
removal of syscall_nr.
Architectures where the same syscall may use a different syscall number
for compat tasks need to define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP and
define a method arch_trace_is_compat_syscall(struct pt_regs*) that tells
if a current task is a compat one.
For architectures that define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP the
number of trace event files is doubled and all syscall trace events are
identified by the syscall number offset by NR_syscalls.

Note that as this patch series is posted as an RFC, this currently only
includes arch updates for MIPS and x86 (and has only been tested on
MIPS and x86_64). I will work on updating other arch trees after this
solution is reviewed.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>

---
 arch/mips/kernel/ftrace.c     |   4 +-
 arch/x86/include/asm/ftrace.h |  10 +---
 arch/x86/kernel/ftrace.c      |  14 ++++++
 include/linux/ftrace.h        |   2 +-
 kernel/trace/trace.h          |  11 +++-
 kernel/trace/trace_syscalls.c | 113 +++++++++++++++++++++++++-----------------
 6 files changed, 94 insertions(+), 60 deletions(-)

diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
index 937c54b..e150cf6 100644
--- a/arch/mips/kernel/ftrace.c
+++ b/arch/mips/kernel/ftrace.c
@@ -412,7 +412,7 @@ out:
 #ifdef CONFIG_FTRACE_SYSCALLS
 
 #ifdef CONFIG_32BIT
-unsigned long __init arch_syscall_addr(int nr)
+unsigned long __init arch_syscall_addr(int nr, int compat)
 {
 	return (unsigned long)sys_call_table[nr - __NR_O32_Linux];
 }
@@ -420,7 +420,7 @@ unsigned long __init arch_syscall_addr(int nr)
 
 #ifdef CONFIG_64BIT
 
-unsigned long __init arch_syscall_addr(int nr)
+unsigned long __init arch_syscall_addr(int nr, int compat)
 {
 #ifdef CONFIG_MIPS32_N32
 	if (nr >= __NR_N32_Linux && nr <= __NR_N32_Linux + __NR_N32_Linux_syscalls)
diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
index a4820d4..a24a21c 100644
--- a/arch/x86/include/asm/ftrace.h
+++ b/arch/x86/include/asm/ftrace.h
@@ -47,15 +47,7 @@ int ftrace_int3_handler(struct pt_regs *regs);
 #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION)
 #include <asm/compat.h>
 
-/*
- * Because ia32 syscalls do not map to x86_64 syscall numbers
- * this screws up the trace output when tracing a ia32 task.
- * Instead of reporting bogus syscalls, just do not trace them.
- *
- * If the user really wants these, then they should use the
- * raw syscall tracepoints with filtering.
- */
-#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
+#define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP 1
 static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
 {
 	if (in_compat_syscall())
diff --git a/arch/x86/kernel/ftrace.c b/arch/x86/kernel/ftrace.c
index d036cfb..78f3e36 100644
--- a/arch/x86/kernel/ftrace.c
+++ b/arch/x86/kernel/ftrace.c
@@ -28,6 +28,7 @@
 #include <asm/kprobes.h>
 #include <asm/ftrace.h>
 #include <asm/nops.h>
+#include <asm/syscall.h>
 
 #ifdef CONFIG_DYNAMIC_FTRACE
 
@@ -1035,3 +1036,16 @@ void prepare_ftrace_return(unsigned long self_addr, unsigned long *parent,
 	}
 }
 #endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+
+#ifdef CONFIG_FTRACE_SYSCALLS
+
+unsigned long arch_syscall_addr(int nr, int compat)
+{
+#if defined(CONFIG_X86_64) && defined(CONFIG_IA32_EMULATION)
+	if (compat)
+		return (unsigned long)ia32_sys_call_table[nr];
+#endif
+	return (unsigned long)sys_call_table[nr];
+}
+
+#endif /* CONFIG_FTRACE_SYSCALLS */
diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 7d565af..110f95d 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -938,7 +938,7 @@ static inline void  disable_trace_on_warning(void) { }
 
 #ifdef CONFIG_FTRACE_SYSCALLS
 
-unsigned long arch_syscall_addr(int nr);
+unsigned long arch_syscall_addr(int nr, int compat);
 
 #endif /* CONFIG_FTRACE_SYSCALLS */
 
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index f783df4..102a41a 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -234,8 +234,15 @@ struct trace_array {
 #ifdef CONFIG_FTRACE_SYSCALLS
 	int			sys_refcount_enter;
 	int			sys_refcount_exit;
-	struct trace_event_file __rcu *enter_syscall_files[NR_syscalls];
-	struct trace_event_file __rcu *exit_syscall_files[NR_syscalls];
+
+#ifdef ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP
+#define FTRACE_SYSCALL_CNT (NR_syscalls * (1 + IS_ENABLED(CONFIG_COMPAT)))
+#else
+#define FTRACE_SYSCALL_CNT (NR_syscalls)
+#endif
+
+	struct trace_event_file __rcu *enter_syscall_files[FTRACE_SYSCALL_CNT];
+	struct trace_event_file __rcu *exit_syscall_files[FTRACE_SYSCALL_CNT];
 #endif
 	int			stop_count;
 	int			clock_id;
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index 1da10ca..dc7df38 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -44,37 +44,35 @@ static inline bool arch_syscall_match_sym_name(const char *sym, const char *name
 }
 #endif
 
-#ifdef ARCH_TRACE_IGNORE_COMPAT_SYSCALLS
+#ifdef ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP
 /*
  * Some architectures that allow for 32bit applications
  * to run on a 64bit kernel, do not map the syscalls for
  * the 32bit tasks the same as they do for 64bit tasks.
  *
- *     *cough*x86*cough*
- *
- * In such a case, instead of reporting the wrong syscalls,
- * simply ignore them.
- *
- * For an arch to ignore the compat syscalls it needs to
- * define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS as well as
+ * If a set of syscall numbers for 32-bit tasks overlaps
+ * the set of syscall numbers for 64-bit tasks, define
+ * ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP as well as
  * define the function arch_trace_is_compat_syscall() to let
- * the tracing system know that it should ignore it.
+ * the tracing system know that a compat syscall is being handled.
  */
-static int
-trace_get_syscall_nr(struct task_struct *task, struct pt_regs *regs)
+static inline bool trace_is_compat_syscall(struct pt_regs *regs)
 {
-	if (unlikely(arch_trace_is_compat_syscall(regs)))
-		return -1;
-
-	return syscall_get_nr(task, regs);
+	return arch_trace_is_compat_syscall(regs);
 }
 #else
+static inline bool trace_is_compat_syscall(struct pt_regs *regs)
+{
+	return false;
+}
+#endif /* ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP */
+
 static inline int
 trace_get_syscall_nr(struct task_struct *task, struct pt_regs *regs)
 {
 	return syscall_get_nr(task, regs);
 }
-#endif /* ARCH_TRACE_IGNORE_COMPAT_SYSCALLS */
+
 
 static __init struct syscall_metadata *
 find_syscall_meta(unsigned long syscall)
@@ -98,9 +96,9 @@ find_syscall_meta(unsigned long syscall)
 	return NULL;
 }
 
-static struct syscall_metadata *syscall_nr_to_meta(int nr)
+static struct syscall_metadata *trace_syscall_nr_to_meta(int nr)
 {
-	if (!syscalls_metadata || nr >= NR_syscalls || nr < 0)
+	if (!syscalls_metadata || nr >= FTRACE_SYSCALL_CNT || nr < 0)
 		return NULL;
 
 	return syscalls_metadata[nr];
@@ -110,7 +108,7 @@ const char *get_syscall_name(int syscall)
 {
 	struct syscall_metadata *entry;
 
-	entry = syscall_nr_to_meta(syscall);
+	entry = trace_syscall_nr_to_meta(syscall);
 	if (!entry)
 		return NULL;
 
@@ -130,7 +128,7 @@ print_syscall_enter(struct trace_iterator *iter, int flags,
 
 	trace = (typeof(trace))ent;
 	syscall = trace->nr;
-	entry = syscall_nr_to_meta(syscall);
+	entry = trace_syscall_nr_to_meta(syscall);
 
 	if (!entry)
 		goto end;
@@ -176,7 +174,7 @@ print_syscall_exit(struct trace_iterator *iter, int flags,
 
 	trace = (typeof(trace))ent;
 	syscall = trace->nr;
-	entry = syscall_nr_to_meta(syscall);
+	entry = trace_syscall_nr_to_meta(syscall);
 
 	if (!entry) {
 		trace_seq_putc(s, '\n');
@@ -321,6 +319,9 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
 
+	if (trace_is_compat_syscall(regs))
+		syscall_nr += NR_syscalls;
+
 	/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE) */
 	trace_file = rcu_dereference_sched(tr->enter_syscall_files[syscall_nr]);
 	if (!trace_file)
@@ -329,7 +330,7 @@ static void ftrace_syscall_enter(void *data, struct pt_regs *regs, long id)
 	if (trace_trigger_soft_disabled(trace_file))
 		return;
 
-	sys_data = syscall_nr_to_meta(syscall_nr);
+	sys_data = trace_syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;
 
@@ -368,6 +369,9 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
 
+	if (trace_is_compat_syscall(regs))
+		syscall_nr += NR_syscalls;
+
 	/* Here we're inside tp handler's rcu_read_lock_sched (__DO_TRACE()) */
 	trace_file = rcu_dereference_sched(tr->exit_syscall_files[syscall_nr]);
 	if (!trace_file)
@@ -376,7 +380,7 @@ static void ftrace_syscall_exit(void *data, struct pt_regs *regs, long ret)
 	if (trace_trigger_soft_disabled(trace_file))
 		return;
 
-	sys_data = syscall_nr_to_meta(syscall_nr);
+	sys_data = trace_syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;
 
@@ -415,7 +419,7 @@ static int reg_event_syscall_enter(struct trace_event_file *file,
 			goto out_unlock;
 	}
 
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -438,7 +442,7 @@ static void unreg_event_syscall_enter(struct trace_event_file *file,
 	name = ((const struct syscall_metadata *)call->data)->name;
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_enter--;
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		   arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				   name))
@@ -466,7 +470,7 @@ static int reg_event_syscall_exit(struct trace_event_file *file,
 			goto out_unlock;
 	}
 
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -490,7 +494,7 @@ static void unreg_event_syscall_exit(struct trace_event_file *file,
 
 	mutex_lock(&syscall_trace_lock);
 	tr->sys_refcount_exit--;
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		   arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				   name))
@@ -542,38 +546,47 @@ struct trace_event_class __refdata event_class_syscall_exit = {
 	.raw_init	= init_syscall_trace,
 };
 
-unsigned long __init __weak arch_syscall_addr(int nr)
+unsigned long __init __weak arch_syscall_addr(int nr, int compat)
 {
 	return (unsigned long)sys_call_table[nr];
 }
 
-void __init init_ftrace_syscalls(void)
+void __init init_ftrace_syscalls_meta(int compat)
 {
 	struct syscall_metadata *meta;
 	unsigned long addr;
 	int i;
 
-	syscalls_metadata = kcalloc(NR_syscalls, sizeof(*syscalls_metadata),
-				    GFP_KERNEL);
-	if (!syscalls_metadata) {
-		WARN_ON(1);
-		return;
-	}
-
 	for (i = 0; i < NR_syscalls; i++) {
-		addr = arch_syscall_addr(i);
+		addr = arch_syscall_addr(i, compat);
 		meta = find_syscall_meta(addr);
 		if (!meta)
 			continue;
 
-		syscalls_metadata[i] = meta;
+		syscalls_metadata[compat * NR_syscalls + i] = meta;
 	}
 }
 
+void __init init_ftrace_syscalls(void)
+{
+	syscalls_metadata = kcalloc(FTRACE_SYSCALL_CNT,
+				    sizeof(*syscalls_metadata), GFP_KERNEL);
+	if (!syscalls_metadata) {
+		WARN_ON(1);
+		return;
+	}
+
+	init_ftrace_syscalls_meta(0);
+#ifdef ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP
+	if (IS_ENABLED(CONFIG_COMPAT))
+		init_ftrace_syscalls_meta(1);
+#endif
+}
+
 #ifdef CONFIG_PERF_EVENTS
 
-static DECLARE_BITMAP(enabled_perf_enter_syscalls, NR_syscalls);
-static DECLARE_BITMAP(enabled_perf_exit_syscalls, NR_syscalls);
+static DECLARE_BITMAP(enabled_perf_enter_syscalls, FTRACE_SYSCALL_CNT);
+static DECLARE_BITMAP(enabled_perf_exit_syscalls, FTRACE_SYSCALL_CNT);
 static int sys_perf_refcount_enter;
 static int sys_perf_refcount_exit;
 
@@ -589,10 +602,14 @@ static void perf_syscall_enter(void *ignore, struct pt_regs *regs, long id)
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
+
+	if (trace_is_compat_syscall(regs))
+		syscall_nr += NR_syscalls;
+
 	if (!test_bit(syscall_nr, enabled_perf_enter_syscalls))
 		return;
 
-	sys_data = syscall_nr_to_meta(syscall_nr);
+	sys_data = trace_syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;
 
@@ -635,7 +652,7 @@ static int perf_sysenter_enable(struct trace_event_call *call)
 		}
 	}
 
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -657,7 +674,7 @@ static void perf_sysenter_disable(struct trace_event_call *call)
 
 	mutex_lock(&syscall_trace_lock);
 	sys_perf_refcount_enter--;
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -680,10 +697,14 @@ static void perf_syscall_exit(void *ignore, struct pt_regs *regs, long ret)
 	syscall_nr = trace_get_syscall_nr(current, regs);
 	if (syscall_nr < 0 || syscall_nr >= NR_syscalls)
 		return;
+
+	if (trace_is_compat_syscall(regs))
+		syscall_nr += NR_syscalls;
+
 	if (!test_bit(syscall_nr, enabled_perf_exit_syscalls))
 		return;
 
-	sys_data = syscall_nr_to_meta(syscall_nr);
+	sys_data = trace_syscall_nr_to_meta(syscall_nr);
 	if (!sys_data)
 		return;
 
@@ -723,7 +744,7 @@ static int perf_sysexit_enable(struct trace_event_call *call)
 		}
 	}
 
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
@@ -745,7 +766,7 @@ static void perf_sysexit_disable(struct trace_event_call *call)
 
 	mutex_lock(&syscall_trace_lock);
 	sys_perf_refcount_exit--;
-	for (num = 0; num < NR_syscalls; num++) {
+	for (num = 0; num < FTRACE_SYSCALL_CNT; num++) {
 		if (syscalls_metadata[num] &&
 		    arch_syscall_match_sym_name(syscalls_metadata[num]->name,
 				    name))
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 3/3] tracing/syscalls: add compat syscall metadata
  2016-09-09  8:03 ` Marcin Nowakowski
@ 2016-09-09  8:03   ` Marcin Nowakowski
  -1 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-09  8:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, luto, rostedt, Marcin Nowakowski

Now that compat syscalls are properly distinguished from native calls,
we can add metadata for compat syscalls as well.
All the macros used to generate the metadata are the same as for
standard syscalls, but with a compat_ prefix to distinguish them easily.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>

---
 include/linux/compat.h        | 74 +++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_syscalls.c |  7 ++--
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index f964ef7..9e84c92 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -30,7 +30,80 @@
 #define __SC_DELOUSE(t,v) ((t)(unsigned long)(v))
 #endif
 
+#ifdef CONFIG_FTRACE_SYSCALLS
+#ifndef __SC_STR_ADECL
+#define __SC_STR_ADECL(t, a)	#a
+#endif
+
+#ifndef __SC_STR_TDECL
+#define __SC_STR_TDECL(t, a)	#t
+#endif
+
+extern struct trace_event_class event_class_syscall_enter;
+extern struct trace_event_class event_class_syscall_exit;
+extern struct trace_event_functions enter_syscall_print_funcs;
+extern struct trace_event_functions exit_syscall_print_funcs;
+
+#define COMPAT_SYSCALL_TRACE_ENTER_EVENT(sname)				\
+	static struct syscall_metadata __syscall_meta_compat_##sname;		\
+	static struct trace_event_call __used				\
+	  event_enter_compat_##sname = {					\
+		.class			= &event_class_syscall_enter,	\
+		{							\
+			.name                   = "compat_sys_enter"#sname,	\
+		},							\
+		.event.funcs            = &enter_syscall_print_funcs,	\
+		.data			= (void *)&__syscall_meta_compat_##sname,\
+		.flags                  = TRACE_EVENT_FL_CAP_ANY,	\
+	};								\
+	static struct trace_event_call __used				\
+	  __attribute__((section("_ftrace_events")))			\
+	 *__event_enter_compat_##sname = &event_enter_compat_##sname;
+
+#define COMPAT_SYSCALL_TRACE_EXIT_EVENT(sname)					\
+	static struct syscall_metadata __syscall_meta_compat_##sname;		\
+	static struct trace_event_call __used				\
+	  event_exit_compat_##sname = {					\
+		.class			= &event_class_syscall_exit,	\
+		{							\
+			.name                   = "compat_sys_exit"#sname,	\
+		},							\
+		.event.funcs		= &exit_syscall_print_funcs,	\
+		.data			= (void *)&__syscall_meta_compat_##sname,\
+		.flags                  = TRACE_EVENT_FL_CAP_ANY,	\
+	};								\
+	static struct trace_event_call __used				\
+	  __attribute__((section("_ftrace_events")))			\
+	*__event_exit_compat_##sname = &event_exit_compat_##sname;
+
+#define COMPAT_SYSCALL_METADATA(sname, nb, ...)			\
+	static const char *types_compat_##sname[] = {			\
+		__MAP(nb,__SC_STR_TDECL,__VA_ARGS__)		\
+	};							\
+	static const char *args_compat_##sname[] = {			\
+		__MAP(nb,__SC_STR_ADECL,__VA_ARGS__)		\
+	};							\
+	COMPAT_SYSCALL_TRACE_ENTER_EVENT(sname);			\
+	COMPAT_SYSCALL_TRACE_EXIT_EVENT(sname);			\
+	static struct syscall_metadata __used			\
+	  __syscall_meta_compat_##sname = {				\
+		.name 		= "compat_sys"#sname,			\
+		.nb_args 	= nb,				\
+		.types		= nb ? types_compat_##sname : NULL,	\
+		.args		= nb ? args_compat_##sname : NULL,	\
+		.enter_event	= &event_enter_compat_##sname,		\
+		.exit_event	= &event_exit_compat_##sname,		\
+		.enter_fields	= LIST_HEAD_INIT(__syscall_meta_compat_##sname.enter_fields), \
+	};							\
+	static struct syscall_metadata __used			\
+	  __attribute__((section("__syscalls_metadata")))	\
+	 *__p_syscall_meta_compat_##sname = &__syscall_meta_compat_##sname;
+#else
+#define COMPAT_SYSCALL_METADATA(sname, nb, ...)
+#endif
+
 #define COMPAT_SYSCALL_DEFINE0(name) \
+		COMPAT_SYSCALL_METADATA(_##name, 0);				\
 	asmlinkage long compat_sys_##name(void)
 
 #define COMPAT_SYSCALL_DEFINE1(name, ...) \
@@ -47,6 +120,7 @@
 	COMPAT_SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
 
 #define COMPAT_SYSCALL_DEFINEx(x, name, ...)				\
+		COMPAT_SYSCALL_METADATA(name, x, __VA_ARGS__) \
 	asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
 		__attribute__((alias(__stringify(compat_SyS##name))));  \
 	static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index dc7df38..5c3587f 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -35,12 +35,15 @@ static struct syscall_metadata **syscalls_metadata;
 static inline bool arch_syscall_match_sym_name(const char *sym, const char *name)
 {
 	/*
-	 * Only compare after the "sys" prefix. Archs that use
+	 * Only compare after the "sys" or "compat_sys" prefix. Archs that use
 	 * syscall wrappers may have syscalls symbols aliases prefixed
 	 * with ".SyS" or ".sys" instead of "sys", leading to an unwanted
 	 * mismatch.
 	 */
-	return !strcmp(sym + 3, name + 3);
+	int prefix_len = 3;
+	if (!strncasecmp(sym, "compat_", 7))
+		prefix_len = 10;
+	return !strcmp(sym + prefix_len, name + prefix_len);
 }
 #endif
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [RFC PATCH 3/3] tracing/syscalls: add compat syscall metadata
@ 2016-09-09  8:03   ` Marcin Nowakowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-09  8:03 UTC (permalink / raw)
  To: linux-kernel; +Cc: linux-api, luto, rostedt, Marcin Nowakowski

Now that compat syscalls are properly distinguished from native calls,
we can add metadata for compat syscalls as well.
All the macros used to generate the metadata are the same as for
standard syscalls, but with a compat_ prefix to distinguish them easily.

Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>

---
 include/linux/compat.h        | 74 +++++++++++++++++++++++++++++++++++++++++++
 kernel/trace/trace_syscalls.c |  7 ++--
 2 files changed, 79 insertions(+), 2 deletions(-)

diff --git a/include/linux/compat.h b/include/linux/compat.h
index f964ef7..9e84c92 100644
--- a/include/linux/compat.h
+++ b/include/linux/compat.h
@@ -30,7 +30,80 @@
 #define __SC_DELOUSE(t,v) ((t)(unsigned long)(v))
 #endif
 
+#ifdef CONFIG_FTRACE_SYSCALLS
+#ifndef __SC_STR_ADECL
+#define __SC_STR_ADECL(t, a)	#a
+#endif
+
+#ifndef __SC_STR_TDECL
+#define __SC_STR_TDECL(t, a)	#t
+#endif
+
+extern struct trace_event_class event_class_syscall_enter;
+extern struct trace_event_class event_class_syscall_exit;
+extern struct trace_event_functions enter_syscall_print_funcs;
+extern struct trace_event_functions exit_syscall_print_funcs;
+
+#define COMPAT_SYSCALL_TRACE_ENTER_EVENT(sname)				\
+	static struct syscall_metadata __syscall_meta_compat_##sname;		\
+	static struct trace_event_call __used				\
+	  event_enter_compat_##sname = {					\
+		.class			= &event_class_syscall_enter,	\
+		{							\
+			.name                   = "compat_sys_enter"#sname,	\
+		},							\
+		.event.funcs            = &enter_syscall_print_funcs,	\
+		.data			= (void *)&__syscall_meta_compat_##sname,\
+		.flags                  = TRACE_EVENT_FL_CAP_ANY,	\
+	};								\
+	static struct trace_event_call __used				\
+	  __attribute__((section("_ftrace_events")))			\
+	 *__event_enter_compat_##sname = &event_enter_compat_##sname;
+
+#define COMPAT_SYSCALL_TRACE_EXIT_EVENT(sname)					\
+	static struct syscall_metadata __syscall_meta_compat_##sname;		\
+	static struct trace_event_call __used				\
+	  event_exit_compat_##sname = {					\
+		.class			= &event_class_syscall_exit,	\
+		{							\
+			.name                   = "compat_sys_exit"#sname,	\
+		},							\
+		.event.funcs		= &exit_syscall_print_funcs,	\
+		.data			= (void *)&__syscall_meta_compat_##sname,\
+		.flags                  = TRACE_EVENT_FL_CAP_ANY,	\
+	};								\
+	static struct trace_event_call __used				\
+	  __attribute__((section("_ftrace_events")))			\
+	*__event_exit_compat_##sname = &event_exit_compat_##sname;
+
+#define COMPAT_SYSCALL_METADATA(sname, nb, ...)			\
+	static const char *types_compat_##sname[] = {			\
+		__MAP(nb,__SC_STR_TDECL,__VA_ARGS__)		\
+	};							\
+	static const char *args_compat_##sname[] = {			\
+		__MAP(nb,__SC_STR_ADECL,__VA_ARGS__)		\
+	};							\
+	COMPAT_SYSCALL_TRACE_ENTER_EVENT(sname);			\
+	COMPAT_SYSCALL_TRACE_EXIT_EVENT(sname);			\
+	static struct syscall_metadata __used			\
+	  __syscall_meta_compat_##sname = {				\
+		.name 		= "compat_sys"#sname,			\
+		.nb_args 	= nb,				\
+		.types		= nb ? types_compat_##sname : NULL,	\
+		.args		= nb ? args_compat_##sname : NULL,	\
+		.enter_event	= &event_enter_compat_##sname,		\
+		.exit_event	= &event_exit_compat_##sname,		\
+		.enter_fields	= LIST_HEAD_INIT(__syscall_meta_compat_##sname.enter_fields), \
+	};							\
+	static struct syscall_metadata __used			\
+	  __attribute__((section("__syscalls_metadata")))	\
+	 *__p_syscall_meta_compat_##sname = &__syscall_meta_compat_##sname;
+#else
+#define COMPAT_SYSCALL_METADATA(sname, nb, ...)
+#endif
+
 #define COMPAT_SYSCALL_DEFINE0(name) \
+		COMPAT_SYSCALL_METADATA(_##name, 0);				\
 	asmlinkage long compat_sys_##name(void)
 
 #define COMPAT_SYSCALL_DEFINE1(name, ...) \
@@ -47,6 +120,7 @@
 	COMPAT_SYSCALL_DEFINEx(6, _##name, __VA_ARGS__)
 
 #define COMPAT_SYSCALL_DEFINEx(x, name, ...)				\
+		COMPAT_SYSCALL_METADATA(name, x, __VA_ARGS__) \
 	asmlinkage long compat_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__))\
 		__attribute__((alias(__stringify(compat_SyS##name))));  \
 	static inline long C_SYSC##name(__MAP(x,__SC_DECL,__VA_ARGS__));\
diff --git a/kernel/trace/trace_syscalls.c b/kernel/trace/trace_syscalls.c
index dc7df38..5c3587f 100644
--- a/kernel/trace/trace_syscalls.c
+++ b/kernel/trace/trace_syscalls.c
@@ -35,12 +35,15 @@ static struct syscall_metadata **syscalls_metadata;
 static inline bool arch_syscall_match_sym_name(const char *sym, const char *name)
 {
 	/*
-	 * Only compare after the "sys" prefix. Archs that use
+	 * Only compare after the "sys" or "compat_sys" prefix. Archs that use
 	 * syscall wrappers may have syscalls symbols aliases prefixed
 	 * with ".SyS" or ".sys" instead of "sys", leading to an unwanted
 	 * mismatch.
 	 */
-	return !strcmp(sym + 3, name + 3);
+	int prefix_len = 3;
+	if (!strncasecmp(sym, "compat_", 7))
+		prefix_len = 10;
+	return !strcmp(sym + prefix_len, name + prefix_len);
 }
 #endif
 
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/3] tracing/syscalls: add handling for compat tasks
  2016-09-09  8:03   ` Marcin Nowakowski
  (?)
@ 2016-09-12 17:35   ` Andy Lutomirski
  2016-09-13  5:41       ` Marcin Nowakowski
  -1 siblings, 1 reply; 13+ messages in thread
From: Andy Lutomirski @ 2016-09-12 17:35 UTC (permalink / raw)
  To: Marcin Nowakowski; +Cc: linux-kernel, Steven Rostedt, Linux API

On Sep 9, 2016 1:04 AM, "Marcin Nowakowski"
<marcin.nowakowski@imgtec.com> wrote:
>
> Extend the syscall tracing subsystem by adding a handler for compat
> tasks. For some architectures, where compat tasks' syscall numbers have
> an exclusive set of syscall numbers, this already works since the
> removal of syscall_nr.
> Architectures where the same syscall may use a different syscall number
> for compat tasks need to define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP and
> define a method arch_trace_is_compat_syscall(struct pt_regs*) that tells
> if a current task is a compat one.
> For architectures that define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP the
> number of trace event files is doubled and all syscall trace events are
> identified by the syscall number offset by NR_syscalls.
>
> Note that as this patch series is posted as an RFC, this currently only
> includes arch updates for MIPS and x86 (and has only been tested on
> MIPS and x86_64). I will work on updating other arch trees after this
> solution is reviewed.

I bet you didn't test x32 -- see below :)

>
> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
>
> ---
>  arch/mips/kernel/ftrace.c     |   4 +-
>  arch/x86/include/asm/ftrace.h |  10 +---
>  arch/x86/kernel/ftrace.c      |  14 ++++++
>  include/linux/ftrace.h        |   2 +-
>  kernel/trace/trace.h          |  11 +++-
>  kernel/trace/trace_syscalls.c | 113 +++++++++++++++++++++++++-----------------
>  6 files changed, 94 insertions(+), 60 deletions(-)
>
> diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
> index 937c54b..e150cf6 100644
> --- a/arch/mips/kernel/ftrace.c
> +++ b/arch/mips/kernel/ftrace.c
> @@ -412,7 +412,7 @@ out:
>  #ifdef CONFIG_FTRACE_SYSCALLS
>
>  #ifdef CONFIG_32BIT
> -unsigned long __init arch_syscall_addr(int nr)
> +unsigned long __init arch_syscall_addr(int nr, int compat)
>  {
>         return (unsigned long)sys_call_table[nr - __NR_O32_Linux];
>  }
> @@ -420,7 +420,7 @@ unsigned long __init arch_syscall_addr(int nr)
>
>  #ifdef CONFIG_64BIT
>
> -unsigned long __init arch_syscall_addr(int nr)
> +unsigned long __init arch_syscall_addr(int nr, int compat)

bool compat?

>  {
>  #ifdef CONFIG_MIPS32_N32
>         if (nr >= __NR_N32_Linux && nr <= __NR_N32_Linux + __NR_N32_Linux_syscalls)
> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
> index a4820d4..a24a21c 100644
> --- a/arch/x86/include/asm/ftrace.h
> +++ b/arch/x86/include/asm/ftrace.h
> @@ -47,15 +47,7 @@ int ftrace_int3_handler(struct pt_regs *regs);
>  #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION)
>  #include <asm/compat.h>
>
> -/*
> - * Because ia32 syscalls do not map to x86_64 syscall numbers
> - * this screws up the trace output when tracing a ia32 task.
> - * Instead of reporting bogus syscalls, just do not trace them.
> - *
> - * If the user really wants these, then they should use the
> - * raw syscall tracepoints with filtering.
> - */
> -#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
> +#define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP 1
>  static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
>  {
>         if (in_compat_syscall())

This isn't your fault obviously, but shouldn't that be in_ia32_syscall()?

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/3] tracing/syscalls: add handling for compat tasks
@ 2016-09-13  5:41       ` Marcin Nowakowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-13  5:41 UTC (permalink / raw)
  To: Andy Lutomirski; +Cc: linux-kernel, Steven Rostedt, Linux API

Hi Andy,

Thanks for your review and the comments, I'll address them in a next 
iteration. Do you have any other comments on the complete patchset?

On 12.09.2016 19:35, Andy Lutomirski wrote:
> On Sep 9, 2016 1:04 AM, "Marcin Nowakowski"
> <marcin.nowakowski@imgtec.com> wrote:
>>
>> Extend the syscall tracing subsystem by adding a handler for compat
>> tasks. For some architectures, where compat tasks' syscall numbers have
>> an exclusive set of syscall numbers, this already works since the
>> removal of syscall_nr.
>> Architectures where the same syscall may use a different syscall number
>> for compat tasks need to define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP and
>> define a method arch_trace_is_compat_syscall(struct pt_regs*) that tells
>> if a current task is a compat one.
>> For architectures that define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP the
>> number of trace event files is doubled and all syscall trace events are
>> identified by the syscall number offset by NR_syscalls.
>>
>> Note that as this patch series is posted as an RFC, this currently only
>> includes arch updates for MIPS and x86 (and has only been tested on
>> MIPS and x86_64). I will work on updating other arch trees after this
>> solution is reviewed.
>
> I bet you didn't test x32 -- see below :)

Indeed ... I've tried with x86_64 and 32-bit x86, but not x32 syscalls. 
I will review that part.

>>
>> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
>>
>> ---
>>  arch/mips/kernel/ftrace.c     |   4 +-
>>  arch/x86/include/asm/ftrace.h |  10 +---
>>  arch/x86/kernel/ftrace.c      |  14 ++++++
>>  include/linux/ftrace.h        |   2 +-
>>  kernel/trace/trace.h          |  11 +++-
>>  kernel/trace/trace_syscalls.c | 113 +++++++++++++++++++++++++-----------------
>>  6 files changed, 94 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
>> index 937c54b..e150cf6 100644
>> --- a/arch/mips/kernel/ftrace.c
>> +++ b/arch/mips/kernel/ftrace.c
>> @@ -412,7 +412,7 @@ out:
>>  #ifdef CONFIG_FTRACE_SYSCALLS
>>
>>  #ifdef CONFIG_32BIT
>> -unsigned long __init arch_syscall_addr(int nr)
>> +unsigned long __init arch_syscall_addr(int nr, int compat)
>>  {
>>         return (unsigned long)sys_call_table[nr - __NR_O32_Linux];
>>  }
>> @@ -420,7 +420,7 @@ unsigned long __init arch_syscall_addr(int nr)
>>
>>  #ifdef CONFIG_64BIT
>>
>> -unsigned long __init arch_syscall_addr(int nr)
>> +unsigned long __init arch_syscall_addr(int nr, int compat)
>
> bool compat?

Yes, that should make the intention more clear.

>>  {
>>  #ifdef CONFIG_MIPS32_N32
>>         if (nr >= __NR_N32_Linux && nr <= __NR_N32_Linux + __NR_N32_Linux_syscalls)
>> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
>> index a4820d4..a24a21c 100644
>> --- a/arch/x86/include/asm/ftrace.h
>> +++ b/arch/x86/include/asm/ftrace.h
>> @@ -47,15 +47,7 @@ int ftrace_int3_handler(struct pt_regs *regs);
>>  #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION)
>>  #include <asm/compat.h>
>>
>> -/*
>> - * Because ia32 syscalls do not map to x86_64 syscall numbers
>> - * this screws up the trace output when tracing a ia32 task.
>> - * Instead of reporting bogus syscalls, just do not trace them.
>> - *
>> - * If the user really wants these, then they should use the
>> - * raw syscall tracepoints with filtering.
>> - */
>> -#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
>> +#define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP 1
>>  static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
>>  {
>>         if (in_compat_syscall())
>
> This isn't your fault obviously, but shouldn't that be in_ia32_syscall()?

Thanks for pointing this out - I'll need to review this part of code a 
bit more.

Marcin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/3] tracing/syscalls: add handling for compat tasks
@ 2016-09-13  5:41       ` Marcin Nowakowski
  0 siblings, 0 replies; 13+ messages in thread
From: Marcin Nowakowski @ 2016-09-13  5:41 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Steven Rostedt, Linux API

Hi Andy,

Thanks for your review and the comments, I'll address them in a next 
iteration. Do you have any other comments on the complete patchset?

On 12.09.2016 19:35, Andy Lutomirski wrote:
> On Sep 9, 2016 1:04 AM, "Marcin Nowakowski"
> <marcin.nowakowski-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org> wrote:
>>
>> Extend the syscall tracing subsystem by adding a handler for compat
>> tasks. For some architectures, where compat tasks' syscall numbers have
>> an exclusive set of syscall numbers, this already works since the
>> removal of syscall_nr.
>> Architectures where the same syscall may use a different syscall number
>> for compat tasks need to define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP and
>> define a method arch_trace_is_compat_syscall(struct pt_regs*) that tells
>> if a current task is a compat one.
>> For architectures that define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP the
>> number of trace event files is doubled and all syscall trace events are
>> identified by the syscall number offset by NR_syscalls.
>>
>> Note that as this patch series is posted as an RFC, this currently only
>> includes arch updates for MIPS and x86 (and has only been tested on
>> MIPS and x86_64). I will work on updating other arch trees after this
>> solution is reviewed.
>
> I bet you didn't test x32 -- see below :)

Indeed ... I've tried with x86_64 and 32-bit x86, but not x32 syscalls. 
I will review that part.

>>
>> Signed-off-by: Marcin Nowakowski <marcin.nowakowski-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
>>
>> ---
>>  arch/mips/kernel/ftrace.c     |   4 +-
>>  arch/x86/include/asm/ftrace.h |  10 +---
>>  arch/x86/kernel/ftrace.c      |  14 ++++++
>>  include/linux/ftrace.h        |   2 +-
>>  kernel/trace/trace.h          |  11 +++-
>>  kernel/trace/trace_syscalls.c | 113 +++++++++++++++++++++++++-----------------
>>  6 files changed, 94 insertions(+), 60 deletions(-)
>>
>> diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
>> index 937c54b..e150cf6 100644
>> --- a/arch/mips/kernel/ftrace.c
>> +++ b/arch/mips/kernel/ftrace.c
>> @@ -412,7 +412,7 @@ out:
>>  #ifdef CONFIG_FTRACE_SYSCALLS
>>
>>  #ifdef CONFIG_32BIT
>> -unsigned long __init arch_syscall_addr(int nr)
>> +unsigned long __init arch_syscall_addr(int nr, int compat)
>>  {
>>         return (unsigned long)sys_call_table[nr - __NR_O32_Linux];
>>  }
>> @@ -420,7 +420,7 @@ unsigned long __init arch_syscall_addr(int nr)
>>
>>  #ifdef CONFIG_64BIT
>>
>> -unsigned long __init arch_syscall_addr(int nr)
>> +unsigned long __init arch_syscall_addr(int nr, int compat)
>
> bool compat?

Yes, that should make the intention more clear.

>>  {
>>  #ifdef CONFIG_MIPS32_N32
>>         if (nr >= __NR_N32_Linux && nr <= __NR_N32_Linux + __NR_N32_Linux_syscalls)
>> diff --git a/arch/x86/include/asm/ftrace.h b/arch/x86/include/asm/ftrace.h
>> index a4820d4..a24a21c 100644
>> --- a/arch/x86/include/asm/ftrace.h
>> +++ b/arch/x86/include/asm/ftrace.h
>> @@ -47,15 +47,7 @@ int ftrace_int3_handler(struct pt_regs *regs);
>>  #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION)
>>  #include <asm/compat.h>
>>
>> -/*
>> - * Because ia32 syscalls do not map to x86_64 syscall numbers
>> - * this screws up the trace output when tracing a ia32 task.
>> - * Instead of reporting bogus syscalls, just do not trace them.
>> - *
>> - * If the user really wants these, then they should use the
>> - * raw syscall tracepoints with filtering.
>> - */
>> -#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
>> +#define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP 1
>>  static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
>>  {
>>         if (in_compat_syscall())
>
> This isn't your fault obviously, but shouldn't that be in_ia32_syscall()?

Thanks for pointing this out - I'll need to review this part of code a 
bit more.

Marcin

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/3] tracing/syscalls: add handling for compat tasks
@ 2016-09-13 19:09         ` Andy Lutomirski
  0 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2016-09-13 19:09 UTC (permalink / raw)
  To: Marcin Nowakowski; +Cc: linux-kernel, Steven Rostedt, Linux API

On Mon, Sep 12, 2016 at 10:41 PM, Marcin Nowakowski
<marcin.nowakowski@imgtec.com> wrote:
> Hi Andy,
>
> Thanks for your review and the comments, I'll address them in a next
> iteration. Do you have any other comments on the complete patchset?

It seems reasonable to me.

>
> On 12.09.2016 19:35, Andy Lutomirski wrote:
>>
>> On Sep 9, 2016 1:04 AM, "Marcin Nowakowski"
>> <marcin.nowakowski@imgtec.com> wrote:
>>>
>>>
>>> Extend the syscall tracing subsystem by adding a handler for compat
>>> tasks. For some architectures, where compat tasks' syscall numbers have
>>> an exclusive set of syscall numbers, this already works since the
>>> removal of syscall_nr.
>>> Architectures where the same syscall may use a different syscall number
>>> for compat tasks need to define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP and
>>> define a method arch_trace_is_compat_syscall(struct pt_regs*) that tells
>>> if a current task is a compat one.
>>> For architectures that define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP the
>>> number of trace event files is doubled and all syscall trace events are
>>> identified by the syscall number offset by NR_syscalls.
>>>
>>> Note that as this patch series is posted as an RFC, this currently only
>>> includes arch updates for MIPS and x86 (and has only been tested on
>>> MIPS and x86_64). I will work on updating other arch trees after this
>>> solution is reviewed.
>>
>>
>> I bet you didn't test x32 -- see below :)
>
>
> Indeed ... I've tried with x86_64 and 32-bit x86, but not x32 syscalls. I
> will review that part.
>
>>>
>>> Signed-off-by: Marcin Nowakowski <marcin.nowakowski@imgtec.com>
>>>
>>> ---
>>>  arch/mips/kernel/ftrace.c     |   4 +-
>>>  arch/x86/include/asm/ftrace.h |  10 +---
>>>  arch/x86/kernel/ftrace.c      |  14 ++++++
>>>  include/linux/ftrace.h        |   2 +-
>>>  kernel/trace/trace.h          |  11 +++-
>>>  kernel/trace/trace_syscalls.c | 113
>>> +++++++++++++++++++++++++-----------------
>>>  6 files changed, 94 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
>>> index 937c54b..e150cf6 100644
>>> --- a/arch/mips/kernel/ftrace.c
>>> +++ b/arch/mips/kernel/ftrace.c
>>> @@ -412,7 +412,7 @@ out:
>>>  #ifdef CONFIG_FTRACE_SYSCALLS
>>>
>>>  #ifdef CONFIG_32BIT
>>> -unsigned long __init arch_syscall_addr(int nr)
>>> +unsigned long __init arch_syscall_addr(int nr, int compat)
>>>  {
>>>         return (unsigned long)sys_call_table[nr - __NR_O32_Linux];
>>>  }
>>> @@ -420,7 +420,7 @@ unsigned long __init arch_syscall_addr(int nr)
>>>
>>>  #ifdef CONFIG_64BIT
>>>
>>> -unsigned long __init arch_syscall_addr(int nr)
>>> +unsigned long __init arch_syscall_addr(int nr, int compat)
>>
>>
>> bool compat?
>
>
> Yes, that should make the intention more clear.
>
>>>  {
>>>  #ifdef CONFIG_MIPS32_N32
>>>         if (nr >= __NR_N32_Linux && nr <= __NR_N32_Linux +
>>> __NR_N32_Linux_syscalls)
>>> diff --git a/arch/x86/include/asm/ftrace.h
>>> b/arch/x86/include/asm/ftrace.h
>>> index a4820d4..a24a21c 100644
>>> --- a/arch/x86/include/asm/ftrace.h
>>> +++ b/arch/x86/include/asm/ftrace.h
>>> @@ -47,15 +47,7 @@ int ftrace_int3_handler(struct pt_regs *regs);
>>>  #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION)
>>>  #include <asm/compat.h>
>>>
>>> -/*
>>> - * Because ia32 syscalls do not map to x86_64 syscall numbers
>>> - * this screws up the trace output when tracing a ia32 task.
>>> - * Instead of reporting bogus syscalls, just do not trace them.
>>> - *
>>> - * If the user really wants these, then they should use the
>>> - * raw syscall tracepoints with filtering.
>>> - */
>>> -#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
>>> +#define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP 1
>>>  static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
>>>  {
>>>         if (in_compat_syscall())
>>
>>
>> This isn't your fault obviously, but shouldn't that be in_ia32_syscall()?
>
>
> Thanks for pointing this out - I'll need to review this part of code a bit
> more.
>
> Marcin



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [RFC PATCH 2/3] tracing/syscalls: add handling for compat tasks
@ 2016-09-13 19:09         ` Andy Lutomirski
  0 siblings, 0 replies; 13+ messages in thread
From: Andy Lutomirski @ 2016-09-13 19:09 UTC (permalink / raw)
  To: Marcin Nowakowski
  Cc: linux-kernel-u79uwXL29TY76Z2rM5mHXA, Steven Rostedt, Linux API

On Mon, Sep 12, 2016 at 10:41 PM, Marcin Nowakowski
<marcin.nowakowski-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org> wrote:
> Hi Andy,
>
> Thanks for your review and the comments, I'll address them in a next
> iteration. Do you have any other comments on the complete patchset?

It seems reasonable to me.

>
> On 12.09.2016 19:35, Andy Lutomirski wrote:
>>
>> On Sep 9, 2016 1:04 AM, "Marcin Nowakowski"
>> <marcin.nowakowski-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org> wrote:
>>>
>>>
>>> Extend the syscall tracing subsystem by adding a handler for compat
>>> tasks. For some architectures, where compat tasks' syscall numbers have
>>> an exclusive set of syscall numbers, this already works since the
>>> removal of syscall_nr.
>>> Architectures where the same syscall may use a different syscall number
>>> for compat tasks need to define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP and
>>> define a method arch_trace_is_compat_syscall(struct pt_regs*) that tells
>>> if a current task is a compat one.
>>> For architectures that define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP the
>>> number of trace event files is doubled and all syscall trace events are
>>> identified by the syscall number offset by NR_syscalls.
>>>
>>> Note that as this patch series is posted as an RFC, this currently only
>>> includes arch updates for MIPS and x86 (and has only been tested on
>>> MIPS and x86_64). I will work on updating other arch trees after this
>>> solution is reviewed.
>>
>>
>> I bet you didn't test x32 -- see below :)
>
>
> Indeed ... I've tried with x86_64 and 32-bit x86, but not x32 syscalls. I
> will review that part.
>
>>>
>>> Signed-off-by: Marcin Nowakowski <marcin.nowakowski-1AXoQHu6uovQT0dZR+AlfA@public.gmane.org>
>>>
>>> ---
>>>  arch/mips/kernel/ftrace.c     |   4 +-
>>>  arch/x86/include/asm/ftrace.h |  10 +---
>>>  arch/x86/kernel/ftrace.c      |  14 ++++++
>>>  include/linux/ftrace.h        |   2 +-
>>>  kernel/trace/trace.h          |  11 +++-
>>>  kernel/trace/trace_syscalls.c | 113
>>> +++++++++++++++++++++++++-----------------
>>>  6 files changed, 94 insertions(+), 60 deletions(-)
>>>
>>> diff --git a/arch/mips/kernel/ftrace.c b/arch/mips/kernel/ftrace.c
>>> index 937c54b..e150cf6 100644
>>> --- a/arch/mips/kernel/ftrace.c
>>> +++ b/arch/mips/kernel/ftrace.c
>>> @@ -412,7 +412,7 @@ out:
>>>  #ifdef CONFIG_FTRACE_SYSCALLS
>>>
>>>  #ifdef CONFIG_32BIT
>>> -unsigned long __init arch_syscall_addr(int nr)
>>> +unsigned long __init arch_syscall_addr(int nr, int compat)
>>>  {
>>>         return (unsigned long)sys_call_table[nr - __NR_O32_Linux];
>>>  }
>>> @@ -420,7 +420,7 @@ unsigned long __init arch_syscall_addr(int nr)
>>>
>>>  #ifdef CONFIG_64BIT
>>>
>>> -unsigned long __init arch_syscall_addr(int nr)
>>> +unsigned long __init arch_syscall_addr(int nr, int compat)
>>
>>
>> bool compat?
>
>
> Yes, that should make the intention more clear.
>
>>>  {
>>>  #ifdef CONFIG_MIPS32_N32
>>>         if (nr >= __NR_N32_Linux && nr <= __NR_N32_Linux +
>>> __NR_N32_Linux_syscalls)
>>> diff --git a/arch/x86/include/asm/ftrace.h
>>> b/arch/x86/include/asm/ftrace.h
>>> index a4820d4..a24a21c 100644
>>> --- a/arch/x86/include/asm/ftrace.h
>>> +++ b/arch/x86/include/asm/ftrace.h
>>> @@ -47,15 +47,7 @@ int ftrace_int3_handler(struct pt_regs *regs);
>>>  #if defined(CONFIG_FTRACE_SYSCALLS) && defined(CONFIG_IA32_EMULATION)
>>>  #include <asm/compat.h>
>>>
>>> -/*
>>> - * Because ia32 syscalls do not map to x86_64 syscall numbers
>>> - * this screws up the trace output when tracing a ia32 task.
>>> - * Instead of reporting bogus syscalls, just do not trace them.
>>> - *
>>> - * If the user really wants these, then they should use the
>>> - * raw syscall tracepoints with filtering.
>>> - */
>>> -#define ARCH_TRACE_IGNORE_COMPAT_SYSCALLS 1
>>> +#define ARCH_COMPAT_SYSCALL_NUMBERS_OVERLAP 1
>>>  static inline bool arch_trace_is_compat_syscall(struct pt_regs *regs)
>>>  {
>>>         if (in_compat_syscall())
>>
>>
>> This isn't your fault obviously, but shouldn't that be in_ia32_syscall()?
>
>
> Thanks for pointing this out - I'll need to review this part of code a bit
> more.
>
> Marcin



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-09-13 19:09 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-09-09  8:03 [RFC PATCH 0/3] syscall/tracing: compat syscall support Marcin Nowakowski
2016-09-09  8:03 ` Marcin Nowakowski
2016-09-09  8:03 ` [RFC PATCH 1/3] tracing/syscalls: remove syscall_nr from syscall metadata Marcin Nowakowski
2016-09-09  8:03   ` Marcin Nowakowski
2016-09-09  8:03 ` [RFC PATCH 2/3] tracing/syscalls: add handling for compat tasks Marcin Nowakowski
2016-09-09  8:03   ` Marcin Nowakowski
2016-09-12 17:35   ` Andy Lutomirski
2016-09-13  5:41     ` Marcin Nowakowski
2016-09-13  5:41       ` Marcin Nowakowski
2016-09-13 19:09       ` Andy Lutomirski
2016-09-13 19:09         ` Andy Lutomirski
2016-09-09  8:03 ` [RFC PATCH 3/3] tracing/syscalls: add compat syscall metadata Marcin Nowakowski
2016-09-09  8:03   ` Marcin Nowakowski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.