* [PATCH 1/7] trace: Move trace_seq_overflowed out of line
2017-03-15 2:14 Some inline debloating, 4.11 edition Andi Kleen
From: Andi Kleen @ 2017-03-15 2:14 UTC
To: akpm; +Cc: linux-kernel, Andi Kleen, rostedt
From: Andi Kleen <ak@linux.intel.com>
Inlining trace_seq_overflowed takes ~17k in text size in my kernel.
The function doesn't seem to be time critical, so we can just out of line it.
Function Total Avg Num
trace_seq_has_overflowed 17134 (0.00%) 33 514
This saves around 6k here
text data bss dec hex filename
9102881 5367568 11116544 25586993 1866d31 vmlinux-orig
9096494 5367568 11116544 25580606 186543e vmlinux-trace-seq
Cc: rostedt@goodmis.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/trace_seq.h | 12 +-----------
kernel/trace/trace_seq.c | 15 +++++++++++++++
2 files changed, 16 insertions(+), 11 deletions(-)
diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
index cfaf5a1d4bad..442e4f087b95 100644
--- a/include/linux/trace_seq.h
+++ b/include/linux/trace_seq.h
@@ -56,17 +56,7 @@ trace_seq_buffer_ptr(struct trace_seq *s)
return s->buffer + seq_buf_used(&s->seq);
}
-/**
- * trace_seq_has_overflowed - return true if the trace_seq took too much
- * @s: trace sequence descriptor
- *
- * Returns true if too much data was added to the trace_seq and it is
- * now full and will not take anymore.
- */
-static inline bool trace_seq_has_overflowed(struct trace_seq *s)
-{
- return s->full || seq_buf_has_overflowed(&s->seq);
-}
+bool trace_seq_has_overflowed(struct trace_seq *s);
/*
* Currently only defined when tracing is enabled.
diff --git a/kernel/trace/trace_seq.c b/kernel/trace/trace_seq.c
index e694c9f9efa4..4367cd43e38c 100644
--- a/kernel/trace/trace_seq.c
+++ b/kernel/trace/trace_seq.c
@@ -375,3 +375,18 @@ int trace_seq_to_user(struct trace_seq *s, char __user *ubuf, int cnt)
return seq_buf_to_user(&s->seq, ubuf, cnt);
}
EXPORT_SYMBOL_GPL(trace_seq_to_user);
+
+
+
+/**
+ * trace_seq_has_overflowed - return true if the trace_seq took too much
+ * @s: trace sequence descriptor
+ *
+ * Returns true if too much data was added to the trace_seq and it is
+ * now full and will not take anymore.
+ */
+bool trace_seq_has_overflowed(struct trace_seq *s)
+{
+ return s->full || seq_buf_has_overflowed(&s->seq);
+}
+EXPORT_SYMBOL_GPL(trace_seq_has_overflowed);
--
2.9.3
* Re: [PATCH 1/7] trace: Move trace_seq_overflowed out of line
From: Steven Rostedt @ 2017-03-16 0:54 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, Andi Kleen
On Tue, 14 Mar 2017 19:14:25 -0700
Andi Kleen <andi@firstfloor.org> wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> Inlining trace_seq_overflowed takes ~17k in text size in my kernel.
> The function doesn't seem to be time critical, so we can just out of line it.
Instead of out of lining trace_seq_has_overflowed(), have you tried to
out of line the function that's called by tracepoints (one per
tracepoint). That is, trace_handle_return()?
The trace_seq_handle_overflow() is used in not reproduced places that I
would like to keep it as an inline. If the issue is size of the kernel,
please just out of line the one place that calls it that is duplicated
for every tracepoint. Which happens to be trace_handle_return().
Thanks!
-- Steve
>
> Function Total Avg Num
> trace_seq_has_overflowed 17134 (0.00%) 33 514
>
> This saves around 6k here
>
> text data bss dec hex filename
> 9102881 5367568 11116544 25586993 1866d31 vmlinux-orig
> 9096494 5367568 11116544 25580606 186543e vmlinux-trace-seq
>
> Cc: rostedt@goodmis.org
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
> include/linux/trace_seq.h | 12 +-----------
> kernel/trace/trace_seq.c | 15 +++++++++++++++
> 2 files changed, 16 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/trace_seq.h b/include/linux/trace_seq.h
> index cfaf5a1d4bad..442e4f087b95 100644
> --- a/include/linux/trace_seq.h
> +++ b/include/linux/trace_seq.h
> @@ -56,17 +56,7 @@ trace_seq_buffer_ptr(struct trace_seq *s)
> return s->buffer + seq_buf_used(&s->seq);
> }
>
> -/**
> - * trace_seq_has_overflowed - return true if the trace_seq took too much
> - * @s: trace sequence descriptor
> - *
> - * Returns true if too much data was added to the trace_seq and it is
> - * now full and will not take anymore.
> - */
> -static inline bool trace_seq_has_overflowed(struct trace_seq *s)
> -{
> - return s->full || seq_buf_has_overflowed(&s->seq);
> -}
> +bool trace_seq_has_overflowed(struct trace_seq *s);
>
> /*
> * Currently only defined when tracing is enabled.
> diff --git a/kernel/trace/trace_seq.c b/kernel/trace/trace_seq.c
> index e694c9f9efa4..4367cd43e38c 100644
> --- a/kernel/trace/trace_seq.c
> +++ b/kernel/trace/trace_seq.c
> @@ -375,3 +375,18 @@ int trace_seq_to_user(struct trace_seq *s, char __user *ubuf, int cnt)
> return seq_buf_to_user(&s->seq, ubuf, cnt);
> }
> EXPORT_SYMBOL_GPL(trace_seq_to_user);
> +
> +
> +
> +/**
> + * trace_seq_has_overflowed - return true if the trace_seq took too much
> + * @s: trace sequence descriptor
> + *
> + * Returns true if too much data was added to the trace_seq and it is
> + * now full and will not take anymore.
> + */
> +bool trace_seq_has_overflowed(struct trace_seq *s)
> +{
> + return s->full || seq_buf_has_overflowed(&s->seq);
> +}
> +EXPORT_SYMBOL_GPL(trace_seq_has_overflowed);
* Re: [PATCH 1/7] trace: Move trace_seq_overflowed out of line
From: Andi Kleen @ 2017-03-16 2:27 UTC
To: Steven Rostedt; +Cc: Andi Kleen, akpm, linux-kernel, Andi Kleen
On Wed, Mar 15, 2017 at 08:54:20PM -0400, Steven Rostedt wrote:
> On Tue, 14 Mar 2017 19:14:25 -0700
> Andi Kleen <andi@firstfloor.org> wrote:
>
> > From: Andi Kleen <ak@linux.intel.com>
> >
> > Inlining trace_seq_overflowed takes ~17k in text size in my kernel.
> > The function doesn't seem to be time critical, so we can just out of line it.
>
> Instead of out of lining trace_seq_has_overflowed(), have you tried to
> out of line the function that's called by tracepoints (one per
> tracepoint). That is, trace_handle_return()?
This is a data driven approach so I always went for the largest savings.
>
> The trace_seq_handle_overflow() is used in not reproduced places that I
> would like to keep it as an inline. If the issue is size of the kernel,
I cannot parse this sentence. What advantage has it being inline?
> please just out of line the one place that calls it that is duplicated
> for every tracepoint. Which happens to be trace_handle_return().
It is used in lots of places outside trace_handle_return, so that would
give far less savings.
-Andi
include/linux/trace_events.h:143: return trace_seq_has_overflowed(s) ?
include/linux/trace_seq.h:60: * trace_seq_has_overflowed - return true if the trace_seq took too much
include/linux/trace_seq.h:66:static inline bool trace_seq_has_overflowed(struct trace_seq *s)
kernel/trace/ring_buffer.c:47: return !trace_seq_has_overflowed(s);
kernel/trace/ring_buffer.c:390: return !trace_seq_has_overflowed(s);
kernel/trace/trace.c:3268: if (trace_seq_has_overflowed(s))
kernel/trace/trace.c:3292: if (trace_seq_has_overflowed(s))
kernel/trace/trace.c:3318: if (trace_seq_has_overflowed(s))
kernel/trace/trace.c:3347: if (trace_seq_has_overflowed(s))
kernel/trace/trace.c:3399: if (trace_seq_has_overflowed(&iter->seq))
kernel/trace/trace.c:5490: if (trace_seq_has_overflowed(&iter->seq)) {
kernel/trace/trace_functions_graph.c:910: if (trace_seq_has_overflowed(s))
kernel/trace/trace_functions_graph.c:1221: if (trace_seq_has_overflowed(s))
kernel/trace/trace_output.c:354: return !trace_seq_has_overflowed(s);
kernel/trace/trace_output.c:374: return !trace_seq_has_overflowed(s);
kernel/trace/trace_output.c:435: return !trace_seq_has_overflowed(s);
kernel/trace/trace_output.c:522: return !trace_seq_has_overflowed(s);
kernel/trace/trace_output.c:550: return !trace_seq_has_overflowed(s);
kernel/trace/trace_output.c:586: return !trace_seq_has_overflowed(s);
kernel/trace/trace_output.c:1021: if (trace_seq_has_overflowed(s))
kernel/trace/trace_output.c:1071: if (ip == ULONG_MAX || trace_seq_has_overflowed(s))
kernel/trace/trace_probe.c:44: return !trace_seq_has_overflowed(s); \
kernel/trace/trace_probe.c:73: return !trace_seq_has_overflowed(s);
kernel/trace/trace_syscalls.c:147: if (trace_seq_has_overflowed(s))
* Re: [PATCH 1/7] trace: Move trace_seq_overflowed out of line
From: Steven Rostedt @ 2017-03-16 3:20 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, Andi Kleen
On Wed, 15 Mar 2017 19:27:57 -0700
Andi Kleen <andi@firstfloor.org> wrote:
> On Wed, Mar 15, 2017 at 08:54:20PM -0400, Steven Rostedt wrote:
> > On Tue, 14 Mar 2017 19:14:25 -0700
> > Andi Kleen <andi@firstfloor.org> wrote:
> >
> > > From: Andi Kleen <ak@linux.intel.com>
> > >
> > > Inlining trace_seq_overflowed takes ~17k in text size in my kernel.
> > > The function doesn't seem to be time critical, so we can just out of line it.
> >
> > Instead of out of lining trace_seq_has_overflowed(), have you tried to
> > out of line the function that's called by tracepoints (one per
> > tracepoint). That is, trace_handle_return()?
>
> This is a data driven approach so I always went for the largest savings.
>
> >
> > The trace_seq_handle_overflow() is used in not reproduced places that I
> > would like to keep it as an inline. If the issue is size of the kernel,
>
> I cannot parse this sentence. What advantage has it being inline?
Because you don't understand the problem. And why I'm against your
patch!
>
> > please just out of line the one place that calls it that is duplicated
> > for every tracepoint. Which happens to be trace_handle_return().
>
> It is used in lots of places outside trace_handle_return, so that would
> give far less savings.
Have you actually looked at what trace_seq_has_overflowed() is?
static inline bool trace_seq_has_overflowed(struct trace_seq *s)
{
return s->full || seq_buf_has_overflowed(&s->seq);
}
static inline bool
seq_buf_has_overflowed(struct seq_buf *s)
{
return s->len > s->size;
}
Basically trace_seq_has_overflowed() is the same as:
return s->full || s->seq.len > s->seq.size;
You really think the above in 24 locations would cause 17k difference??
>
> -Andi
>
> include/linux/trace_events.h:143: return trace_seq_has_overflowed(s) ?
Everything below is negligible. The one above, which is called in
trace_handle_return(), is your problem.
Let me explain it to you.
The above is part of the TRACE_EVENT() logic. It is duplicated for
*every* tracepoint in the system.
Looking at a current kernel:
# ls /debug/tracing/events/*/*/enable | wc -l
1267
There's 1267 events. That means the function trace_handle_return() is
called 1267 times! THAT IS THE PROBLEM!!!!
Look at include/trace/trace_events.h for
trace_raw_output_##call()
That's the macro that creates over a thousand functions calling
trace_handle_return().
So please, fix where the issue is and not the other function, as 23
callers is not going to be noticed.
-- Steve
> include/linux/trace_seq.h:60: * trace_seq_has_overflowed - return true if the trace_seq took too much
> include/linux/trace_seq.h:66:static inline bool trace_seq_has_overflowed(struct trace_seq *s)
> kernel/trace/ring_buffer.c:47: return !trace_seq_has_overflowed(s);
> kernel/trace/ring_buffer.c:390: return !trace_seq_has_overflowed(s);
> kernel/trace/trace.c:3268: if (trace_seq_has_overflowed(s))
> kernel/trace/trace.c:3292: if (trace_seq_has_overflowed(s))
> kernel/trace/trace.c:3318: if (trace_seq_has_overflowed(s))
> kernel/trace/trace.c:3347: if (trace_seq_has_overflowed(s))
> kernel/trace/trace.c:3399: if (trace_seq_has_overflowed(&iter->seq))
> kernel/trace/trace.c:5490: if (trace_seq_has_overflowed(&iter->seq)) {
> kernel/trace/trace_functions_graph.c:910: if (trace_seq_has_overflowed(s))
> kernel/trace/trace_functions_graph.c:1221: if (trace_seq_has_overflowed(s))
> kernel/trace/trace_output.c:354: return !trace_seq_has_overflowed(s);
> kernel/trace/trace_output.c:374: return !trace_seq_has_overflowed(s);
> kernel/trace/trace_output.c:435: return !trace_seq_has_overflowed(s);
> kernel/trace/trace_output.c:522: return !trace_seq_has_overflowed(s);
> kernel/trace/trace_output.c:550: return !trace_seq_has_overflowed(s);
> kernel/trace/trace_output.c:586: return !trace_seq_has_overflowed(s);
> kernel/trace/trace_output.c:1021: if (trace_seq_has_overflowed(s))
> kernel/trace/trace_output.c:1071: if (ip == ULONG_MAX || trace_seq_has_overflowed(s))
> kernel/trace/trace_probe.c:44: return !trace_seq_has_overflowed(s); \
> kernel/trace/trace_probe.c:73: return !trace_seq_has_overflowed(s);
> kernel/trace/trace_syscalls.c:147: if (trace_seq_has_overflowed(s))
>
* Re: [PATCH 1/7] trace: Move trace_seq_overflowed out of line
From: Steven Rostedt @ 2017-03-16 3:41 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, Andi Kleen
On Wed, 15 Mar 2017 23:20:30 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:
> > It is used in lots of places outside trace_handle_return, so that would
> > give far less savings.
Actually, I think you'll probably have *more* savings out-of-lining
trace_handle_return() than trace_seq_has_overflowed(). Why?
Think about it now before looking at the answer below.
> > include/linux/trace_events.h:143: return trace_seq_has_overflowed(s) ?
>
> Everything below is negligible. The one above, which is called in
> trace_handle_return(), is your problem.
>
> Let me explain it to you.
>
> The above is part of the TRACE_EVENT() logic. It is duplicated for
> *every* tracepoint in the system.
>
> Looking at a current kernel:
>
> # ls /debug/tracing/events/*/*/enable | wc -l
> 1267
>
> There's 1267 events. That means the function trace_handle_return() is
> called 1267 times! THAT IS THE PROBLEM!!!!
>
> Look at include/trace/trace_events.h for
>
> trace_raw_output_##call()
>
> That's the macro that creates over a thousand functions calling
> trace_handle_return().
>
trace_handle_return() is called 1267 times. If you out of line that
function, not only do you save the compares, you also save the
condition too! That could be a jump as well.
static inline enum print_line_t trace_handle_return(struct trace_seq *s)
{
return trace_seq_has_overflowed(s) ?
TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
}
The above is called 1267 times. If you move
trace_seq_has_overflowed() out of line, you only save the
s->full || s->seq.len > s->seq.size
part from being duplicated.
But if you out of line trace_handle_return(), you move out
s->full || s->seq.len > s->seq.size ?
TRACE_TYPE_PARTIAL_LINE :
TRACE_TYPE_HANDLED
1267 times as well.
Try it. It may surprise you.
-- Steve
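Steve's point can be sketched in plain userspace C. The types below are hypothetical stand-ins, not the kernel's actual definitions; all the sketch relies on is that struct seq_buf is embedded in struct trace_seq. With the whole ternary out of line, each generated trace_raw_output_<call>() function would emit only a call instruction:

```c
#include <assert.h>

/* Hypothetical stand-in types for the kernel structures. */
enum print_line_t { TRACE_TYPE_PARTIAL_LINE, TRACE_TYPE_HANDLED };
struct seq_buf { unsigned int len, size; };
struct trace_seq { struct seq_buf seq; int full; };

/* Out-of-line version: defined once, so none of the ~1267 callers
 * duplicates the compares or the conditional selection. */
enum print_line_t trace_handle_return(struct trace_seq *s)
{
	int overflowed = s->full || s->seq.len > s->seq.size;

	return overflowed ? TRACE_TYPE_PARTIAL_LINE : TRACE_TYPE_HANDLED;
}
```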
* [PATCH 2/7] x86/atomic: Move __atomic_add_unless out of line
From: Andi Kleen @ 2017-03-15 2:14 UTC
To: akpm; +Cc: linux-kernel, Andi Kleen, x86
From: Andi Kleen <ak@linux.intel.com>
__atomic_add_unless is fairly big and often used, so it's quite expensive
to inline it. But it's unlikely that a call makes much difference for
such a complex function doing an expensive atomic. So out of line it.
This saves around 12k of text.
text data bss dec hex filename
9084246 5367600 11116544 25568390 1862486 vmlinux-atomic-add
9096494 5367568 11116544 25580606 186543e vmlinux-before-atomic-add
Cc: x86@kernel.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
arch/x86/include/asm/atomic.h | 24 +-----------------------
arch/x86/lib/Makefile | 1 +
arch/x86/lib/atomic.c | 27 +++++++++++++++++++++++++++
3 files changed, 29 insertions(+), 23 deletions(-)
create mode 100644 arch/x86/lib/atomic.c
diff --git a/arch/x86/include/asm/atomic.h b/arch/x86/include/asm/atomic.h
index 14635c5ea025..069d69712275 100644
--- a/arch/x86/include/asm/atomic.h
+++ b/arch/x86/include/asm/atomic.h
@@ -225,29 +225,7 @@ ATOMIC_OPS(xor, ^)
#undef ATOMIC_FETCH_OP
#undef ATOMIC_OP
-/**
- * __atomic_add_unless - add unless the number is already a given value
- * @v: pointer of type atomic_t
- * @a: the amount to add to v...
- * @u: ...unless v is equal to u.
- *
- * Atomically adds @a to @v, so long as @v was not already @u.
- * Returns the old value of @v.
- */
-static __always_inline int __atomic_add_unless(atomic_t *v, int a, int u)
-{
- int c, old;
- c = atomic_read(v);
- for (;;) {
- if (unlikely(c == (u)))
- break;
- old = atomic_cmpxchg((v), c, c + (a));
- if (likely(old == c))
- break;
- c = old;
- }
- return c;
-}
+int __atomic_add_unless(atomic_t *v, int a, int u);
/**
* atomic_inc_short - increment of a short integer
diff --git a/arch/x86/lib/Makefile b/arch/x86/lib/Makefile
index 34a74131a12c..81303cefa9f4 100644
--- a/arch/x86/lib/Makefile
+++ b/arch/x86/lib/Makefile
@@ -25,6 +25,7 @@ lib-y += memcpy_$(BITS).o
lib-$(CONFIG_RWSEM_XCHGADD_ALGORITHM) += rwsem.o
lib-$(CONFIG_INSTRUCTION_DECODER) += insn.o inat.o
lib-$(CONFIG_RANDOMIZE_BASE) += kaslr.o
+lib-y += atomic.o
obj-y += msr.o msr-reg.o msr-reg-export.o hweight.o
diff --git a/arch/x86/lib/atomic.c b/arch/x86/lib/atomic.c
new file mode 100644
index 000000000000..dde7ddf67698
--- /dev/null
+++ b/arch/x86/lib/atomic.c
@@ -0,0 +1,27 @@
+#include <linux/module.h>
+#include <asm/atomic.h>
+
+/**
+ * __atomic_add_unless - add unless the number is already a given value
+ * @v: pointer of type atomic_t
+ * @a: the amount to add to v...
+ * @u: ...unless v is equal to u.
+ *
+ * Atomically adds @a to @v, so long as @v was not already @u.
+ * Returns the old value of @v.
+ */
+int __atomic_add_unless(atomic_t *v, int a, int u)
+{
+ int c, old;
+ c = atomic_read(v);
+ for (;;) {
+ if (unlikely(c == (u)))
+ break;
+ old = atomic_cmpxchg((v), c, c + (a));
+ if (likely(old == c))
+ break;
+ c = old;
+ }
+ return c;
+}
+EXPORT_SYMBOL(__atomic_add_unless);
--
2.9.3
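For reference, the same add-unless loop can be sketched in userspace with C11 atomics. This is an approximation, not the kernel code: atomic_cmpxchg() is architecture-specific, and the function name below is made up; the semantics (add @a to @v unless @v equals @u, return the old value) are the same:

```c
#include <stdatomic.h>

/* Userspace sketch of the __atomic_add_unless() loop using C11 atomics. */
int atomic_add_unless_sketch(atomic_int *v, int a, int u)
{
	int c = atomic_load(v);

	while (c != u) {
		/* On failure, compare_exchange reloads the current value
		 * into c, so the loop retries with fresh data. */
		if (atomic_compare_exchange_weak(v, &c, c + a))
			break;
	}
	return c;
}
```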
* [PATCH 3/7] sched: Out of line __update_load_avg
From: Andi Kleen @ 2017-03-15 2:14 UTC
To: akpm; +Cc: linux-kernel, Andi Kleen, peterz
From: Andi Kleen <ak@linux.intel.com>
This is a very complex function, which is called in multiple places.
It is unlikely that inlining or not inlining it makes any difference
for its run time.
This saves around 13k text in my kernel
text data bss dec hex filename
9083992 5367600 11116544 25568136 1862388 vmlinux-before-load-avg
9070166 5367600 11116544 25554310 185ed86 vmlinux-load-avg
Cc: peterz@infradead.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
kernel/sched/fair.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index dea138964b91..78ace89cd481 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -2848,7 +2848,7 @@ static u32 __compute_runnable_contrib(u64 n)
* load_avg = u_0` + y*(u_0 + u_1*y + u_2*y^2 + ... )
* = u_0 + u_1*y + u_2*y^2 + ... [re-labeling u_i --> u_{i+1}]
*/
-static __always_inline int
+static int
__update_load_avg(u64 now, int cpu, struct sched_avg *sa,
unsigned long weight, int running, struct cfs_rq *cfs_rq)
{
--
2.9.3
* [PATCH 4/7] kref: Remove WARN_ON for NULL release functions
From: Andi Kleen @ 2017-03-15 2:14 UTC
To: akpm; +Cc: linux-kernel, Andi Kleen, gregkh, peterz
From: Andi Kleen <ak@linux.intel.com>
The kref functions check for NULL release functions.
This WARN_ON seems rather pointless. We will eventually release and
then just crash nicely. It is also somewhat expensive because
these functions are inlined in a lot of places.
Removing the WARN_ONs saves around 2.3k in this kernel
(likely more in others with more drivers)
text data bss dec hex filename
9083992 5367600 11116544 25568136 1862388 vmlinux-before-load-avg
9070166 5367600 11116544 25554310 185ed86 vmlinux-load-avg
Cc: gregkh@linuxfoundation.org
Cc: peterz@infradead.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/kref.h | 6 ------
1 file changed, 6 deletions(-)
diff --git a/include/linux/kref.h b/include/linux/kref.h
index f4156f88f557..29220724bf1c 100644
--- a/include/linux/kref.h
+++ b/include/linux/kref.h
@@ -66,8 +66,6 @@ static inline void kref_get(struct kref *kref)
*/
static inline int kref_put(struct kref *kref, void (*release)(struct kref *kref))
{
- WARN_ON(release == NULL);
-
if (refcount_dec_and_test(&kref->refcount)) {
release(kref);
return 1;
@@ -79,8 +77,6 @@ static inline int kref_put_mutex(struct kref *kref,
void (*release)(struct kref *kref),
struct mutex *lock)
{
- WARN_ON(release == NULL);
-
if (refcount_dec_and_mutex_lock(&kref->refcount, lock)) {
release(kref);
return 1;
@@ -92,8 +88,6 @@ static inline int kref_put_lock(struct kref *kref,
void (*release)(struct kref *kref),
spinlock_t *lock)
{
- WARN_ON(release == NULL);
-
if (refcount_dec_and_lock(&kref->refcount, lock)) {
release(kref);
return 1;
--
2.9.3
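A userspace sketch of what the trimmed helper reduces to. The stand-ins are hypothetical: a plain int models refcount_t, and the bare decrement-and-test replaces refcount_dec_and_test(). It shows why the remaining inline body is small, and also the trade-off: a NULL release now crashes at the call site instead of warning first:

```c
#include <stdbool.h>

struct kref { int refcount; };	/* stand-in for the real refcount_t */

/* Without the WARN_ON, the inlined body is just a decrement-and-test
 * plus an indirect call on the final put. */
static bool kref_put_sketch(struct kref *kref,
			    void (*release)(struct kref *kref))
{
	if (--kref->refcount == 0) {
		release(kref);
		return true;
	}
	return false;
}
```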
* Re: [PATCH 4/7] kref: Remove WARN_ON for NULL release functions
From: Greg KH @ 2017-03-15 2:46 UTC
To: Andi Kleen; +Cc: akpm, linux-kernel, Andi Kleen, peterz
On Tue, Mar 14, 2017 at 07:14:28PM -0700, Andi Kleen wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> The kref functions check for NULL release functions.
> This WARN_ON seems rather pointless. We will eventually release and
> then just crash nicely. It is also somewhat expensive because
> these functions are inlined in a lot of places.
> Removing the WARN_ONs saves around 2.3k in this kernel
> (likely more in others with more drivers)
>
> text data bss dec hex filename
> 9083992 5367600 11116544 25568136 1862388 vmlinux-before-load-avg
> 9070166 5367600 11116544 25554310 185ed86 vmlinux-load-avg
WARN_ON() is heavy, didn't realize that.
No objection from me.
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
* Re: [PATCH 4/7] kref: Remove WARN_ON for NULL release functions
From: Peter Zijlstra @ 2017-03-15 12:27 UTC
To: Greg KH; +Cc: Andi Kleen, akpm, linux-kernel, Andi Kleen
On Wed, Mar 15, 2017 at 10:46:56AM +0800, Greg KH wrote:
> On Tue, Mar 14, 2017 at 07:14:28PM -0700, Andi Kleen wrote:
> > From: Andi Kleen <ak@linux.intel.com>
> >
> > The kref functions check for NULL release functions.
> > This WARN_ON seems rather pointless. We will eventually release and
> > then just crash nicely. It is also somewhat expensive because
> > these functions are inlined in a lot of places.
> > Removing the WARN_ONs saves around 2.3k in this kernel
> > (likely more in others with more drivers)
> >
> > text data bss dec hex filename
> > 9083992 5367600 11116544 25568136 1862388 vmlinux-before-load-avg
> > 9070166 5367600 11116544 25554310 185ed86 vmlinux-load-avg
>
> WARN_ON() is heavy, didn't realize that.
I actually have patches fixing that.
* [PATCH 5/7] Out of line dma_alloc/free_attrs
From: Andi Kleen @ 2017-03-15 2:14 UTC
To: akpm; +Cc: linux-kernel, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
These functions are very big and frequently used. I don't see any
sense in inlining them because they do slow operations.
Out of lining them saves ~19k text in my kernel.
text data bss dec hex filename
9047801 5367568 11116544 25531913 1859609 vmlinux-after-dma
9067798 5367600 11116544 25551942 185e446 vmlinux-before-dma
Cc: akpm@linux-foundation.org
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/dma-mapping.h | 45 ++++++-------------------------------------
lib/Makefile | 2 +-
lib/dma-mapping.c | 47 +++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 54 insertions(+), 40 deletions(-)
create mode 100644 lib/dma-mapping.c
diff --git a/include/linux/dma-mapping.h b/include/linux/dma-mapping.h
index 0977317c6835..1a357a65f4cc 100644
--- a/include/linux/dma-mapping.h
+++ b/include/linux/dma-mapping.h
@@ -473,46 +473,13 @@ dma_get_sgtable_attrs(struct device *dev, struct sg_table *sgt, void *cpu_addr,
#define arch_dma_alloc_attrs(dev, flag) (true)
#endif
-static inline void *dma_alloc_attrs(struct device *dev, size_t size,
- dma_addr_t *dma_handle, gfp_t flag,
- unsigned long attrs)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
- void *cpu_addr;
-
- BUG_ON(!ops);
-
- if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr))
- return cpu_addr;
-
- if (!arch_dma_alloc_attrs(&dev, &flag))
- return NULL;
- if (!ops->alloc)
- return NULL;
-
- cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
- debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
- return cpu_addr;
-}
-
-static inline void dma_free_attrs(struct device *dev, size_t size,
- void *cpu_addr, dma_addr_t dma_handle,
- unsigned long attrs)
-{
- const struct dma_map_ops *ops = get_dma_ops(dev);
-
- BUG_ON(!ops);
- WARN_ON(irqs_disabled());
-
- if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
- return;
-
- if (!ops->free || !cpu_addr)
- return;
+void *dma_alloc_attrs(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flag,
+ unsigned long attrs);
- debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
- ops->free(dev, size, cpu_addr, dma_handle, attrs);
-}
+void dma_free_attrs(struct device *dev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle,
+ unsigned long attrs);
static inline void *dma_alloc_coherent(struct device *dev, size_t size,
dma_addr_t *dma_handle, gfp_t flag)
diff --git a/lib/Makefile b/lib/Makefile
index 320ac46a8725..7fcfe102d244 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -23,7 +23,7 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
flex_proportions.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o kobject_uevent.o \
earlycpio.o seq_buf.o siphash.o \
- nmi_backtrace.o nodemask.o win_minmax.o
+ nmi_backtrace.o nodemask.o win_minmax.o dma-mapping.o
CFLAGS_radix-tree.o += -DCONFIG_SPARSE_RCU_POINTER
CFLAGS_idr.o += -DCONFIG_SPARSE_RCU_POINTER
diff --git a/lib/dma-mapping.c b/lib/dma-mapping.c
new file mode 100644
index 000000000000..3f221652c7e1
--- /dev/null
+++ b/lib/dma-mapping.c
@@ -0,0 +1,47 @@
+#include <linux/dma-mapping.h>
+#include <linux/module.h>
+
+void *dma_alloc_attrs(struct device *dev, size_t size,
+ dma_addr_t *dma_handle, gfp_t flag,
+ unsigned long attrs)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+ void *cpu_addr;
+
+ BUG_ON(!ops);
+
+ if (dma_alloc_from_coherent(dev, size, dma_handle, &cpu_addr))
+ return cpu_addr;
+
+ if (!arch_dma_alloc_attrs(&dev, &flag))
+ return NULL;
+ if (!ops->alloc)
+ return NULL;
+
+ cpu_addr = ops->alloc(dev, size, dma_handle, flag, attrs);
+ debug_dma_alloc_coherent(dev, size, *dma_handle, cpu_addr);
+ return cpu_addr;
+}
+
+EXPORT_SYMBOL(dma_alloc_attrs);
+
+void dma_free_attrs(struct device *dev, size_t size,
+ void *cpu_addr, dma_addr_t dma_handle,
+ unsigned long attrs)
+{
+ const struct dma_map_ops *ops = get_dma_ops(dev);
+
+ BUG_ON(!ops);
+ WARN_ON(irqs_disabled());
+
+ if (dma_release_from_coherent(dev, get_order(size), cpu_addr))
+ return;
+
+ if (!ops->free || !cpu_addr)
+ return;
+
+ debug_dma_free_coherent(dev, size, cpu_addr, dma_handle);
+ ops->free(dev, size, cpu_addr, dma_handle, attrs);
+}
+
+EXPORT_SYMBOL(dma_free_attrs);
--
2.9.3
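The dispatch pattern dma_alloc_attrs() follows can be sketched in userspace. The types and names below are hypothetical stand-ins for struct dma_map_ops; the real code additionally consults the coherent pool and the arch_dma_alloc_attrs() hook before dispatching:

```c
#include <stddef.h>

/* Hypothetical per-device ops table: the real allocator is an indirect
 * hook, and a missing hook means the request simply fails. */
struct alloc_ops { void *(*alloc)(size_t size); };
struct device_sketch { const struct alloc_ops *ops; };

static void *dma_alloc_attrs_sketch(struct device_sketch *dev, size_t size)
{
	if (!dev->ops || !dev->ops->alloc)
		return NULL;	/* no allocator registered for this device */
	return dev->ops->alloc(size);
}
```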
* [PATCH 6/7] megasas: Remove expensive inline from megasas_return_cmd
From: Andi Kleen @ 2017-03-15 2:14 UTC
To: akpm; +Cc: linux-kernel, Andi Kleen, megaraidlinux.pdl
From: Andi Kleen <ak@linux.intel.com>
Remove an inline from a fairly big function that is used often.
It's unlikely that calling or not calling it makes a lot of difference.
Saves around 8k text in my kernel.
text data bss dec hex filename
9047801 5367568 11116544 25531913 1859609 vmlinux-before-megasas
9039417 5367568 11116544 25523529 1857549 vmlinux-megasas
Cc: megaraidlinux.pdl@broadcom.com
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
drivers/scsi/megaraid/megaraid_sas_base.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c b/drivers/scsi/megaraid/megaraid_sas_base.c
index 7ac9a9ee9bd4..55b71de3fb1f 100644
--- a/drivers/scsi/megaraid/megaraid_sas_base.c
+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
@@ -244,7 +244,7 @@ struct megasas_cmd *megasas_get_cmd(struct megasas_instance
* @instance: Adapter soft state
* @cmd: Command packet to be returned to free command pool
*/
-inline void
+void
megasas_return_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
{
unsigned long flags;
--
2.9.3
* RE: [PATCH 6/7] megasas: Remove expensive inline from megasas_return_cmd
From: Sumit Saxena @ 2017-03-15 12:17 UTC (permalink / raw)
To: Andi Kleen, akpm; +Cc: linux-kernel, Andi Kleen, PDL,MEGARAIDLINUX
>-----Original Message-----
>From: megaraidlinux.pdl@broadcom.com
>[mailto:megaraidlinux.pdl@broadcom.com] On Behalf Of Andi Kleen
>Sent: Wednesday, March 15, 2017 7:45 AM
>To: akpm@linux-foundation.org
>Cc: linux-kernel@vger.kernel.org; Andi Kleen;
>megaraidlinux.pdl@broadcom.com
>Subject: [PATCH 6/7] megasas: Remove expensive inline from
>megasas_return_cmd
>
>From: Andi Kleen <ak@linux.intel.com>
>
>Remove the inline from a fairly big function that is called often.
>It's unlikely that inlining it, as opposed to calling it, makes a measurable difference.
>
>Saves around 8k of text in my kernel:
>
> text data bss dec hex filename
>9047801 5367568 11116544 25531913 1859609
>vmlinux-before-megasas
>9039417 5367568 11116544 25523529 1857549 vmlinux-megasas
>
>Cc: megaraidlinux.pdl@broadcom.com
>Signed-off-by: Andi Kleen <ak@linux.intel.com>
>---
> drivers/scsi/megaraid/megaraid_sas_base.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
>diff --git a/drivers/scsi/megaraid/megaraid_sas_base.c
>b/drivers/scsi/megaraid/megaraid_sas_base.c
>index 7ac9a9ee9bd4..55b71de3fb1f 100644
>--- a/drivers/scsi/megaraid/megaraid_sas_base.c
>+++ b/drivers/scsi/megaraid/megaraid_sas_base.c
>@@ -244,7 +244,7 @@ struct megasas_cmd *megasas_get_cmd(struct
>megasas_instance
> * @instance: Adapter soft state
> * @cmd: Command packet to be returned to free command pool
> */
>-inline void
>+void
> megasas_return_cmd(struct megasas_instance *instance, struct megasas_cmd *cmd)
> {
> 	unsigned long flags;
Acked-by: Sumit Saxena <sumit.saxena@broadcom.com>
>--
>2.9.3
>
* [PATCH 7/7] Remove expensive WARN_ON in pagefault_disabled_dec
2017-03-15 2:14 Some inline debloating, 4.11 edition Andi Kleen
` (5 preceding siblings ...)
2017-03-15 2:14 ` [PATCH 6/7] megasas: Remove expensive inline from megasas_return_cmd Andi Kleen
@ 2017-03-15 2:14 ` Andi Kleen
2017-03-15 21:49 ` Andrew Morton
6 siblings, 1 reply; 16+ messages in thread
From: Andi Kleen @ 2017-03-15 2:14 UTC (permalink / raw)
To: akpm; +Cc: linux-kernel, Andi Kleen
From: Andi Kleen <ak@linux.intel.com>
pagefault_disabled_dec is frequently inlined, and it has a WARN_ON
for underflow that expands to about 6.5k of extra code in total. The warning
doesn't seem useful enough to justify that much code, so remove it.
If it is ever needed, it could be made dependent on a debug kernel option.
Saves ~6.5k of text in my kernel:
text data bss dec hex filename
9039417 5367568 11116544 25523529 1857549 vmlinux-before-pf
9032805 5367568 11116544 25516917 1855b75 vmlinux-pf
Signed-off-by: Andi Kleen <ak@linux.intel.com>
---
include/linux/uaccess.h | 1 -
1 file changed, 1 deletion(-)
diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
index f30c187ed785..b691aad918fb 100644
--- a/include/linux/uaccess.h
+++ b/include/linux/uaccess.h
@@ -12,7 +12,6 @@ static __always_inline void pagefault_disabled_inc(void)
static __always_inline void pagefault_disabled_dec(void)
{
current->pagefault_disabled--;
- WARN_ON(current->pagefault_disabled < 0);
}
/*
--
2.9.3
* Re: [PATCH 7/7] Remove expensive WARN_ON in pagefault_disabled_dec
2017-03-15 2:14 ` [PATCH 7/7] Remove expensive WARN_ON in pagefault_disabled_dec Andi Kleen
@ 2017-03-15 21:49 ` Andrew Morton
0 siblings, 0 replies; 16+ messages in thread
From: Andrew Morton @ 2017-03-15 21:49 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel, Andi Kleen
On Tue, 14 Mar 2017 19:14:31 -0700 Andi Kleen <andi@firstfloor.org> wrote:
> From: Andi Kleen <ak@linux.intel.com>
>
> pagefault_disabled_dec is frequently inlined, and it has a WARN_ON
> for underflow that expands to about 6.5k of extra code in total. The warning
> doesn't seem useful enough to justify that much code, so remove it.
>
> If it is ever needed, it could be made dependent on a debug kernel option.
>
> Saves ~6.5k of text in my kernel:
>
> text data bss dec hex filename
> 9039417 5367568 11116544 25523529 1857549 vmlinux-before-pf
> 9032805 5367568 11116544 25516917 1855b75 vmlinux-pf
>
> Signed-off-by: Andi Kleen <ak@linux.intel.com>
> ---
> include/linux/uaccess.h | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/include/linux/uaccess.h b/include/linux/uaccess.h
> index f30c187ed785..b691aad918fb 100644
> --- a/include/linux/uaccess.h
> +++ b/include/linux/uaccess.h
> @@ -12,7 +12,6 @@ static __always_inline void pagefault_disabled_inc(void)
> static __always_inline void pagefault_disabled_dec(void)
> {
> current->pagefault_disabled--;
> - WARN_ON(current->pagefault_disabled < 0);
> }
Fair enough. We could switch to VM_WARN_ON but apparently even that is now
being enabled in some production systems, which somewhat defeats its
intent...
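For reference, the VM_WARN_ON variant Andrew mentions would be a one-line change (kernel fragment, sketch only; VM_WARN_ON comes from include/linux/mmdebug.h and compiles to nothing unless CONFIG_DEBUG_VM is set):

```c
/* Sketch: keep the check, but only in CONFIG_DEBUG_VM builds. */
static __always_inline void pagefault_disabled_dec(void)
{
	current->pagefault_disabled--;
	VM_WARN_ON(current->pagefault_disabled < 0);
}
```

As the reply notes, this only helps if CONFIG_DEBUG_VM is actually off in the kernels whose size matters.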