linux-kernel.vger.kernel.org archive mirror
* [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key
@ 2019-02-04 19:58 Daniel Bristot de Oliveira
  2019-02-04 19:58 ` [PATCH V4 1/9] jump_label: Add for_each_label_entry helper Daniel Bristot de Oliveira
                   ` (8 more replies)
  0 siblings, 9 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

While tuning a system with CPUs isolated as much as possible, we've
noticed that the isolated CPUs were periodically receiving bursts of 12 IPIs.
Tracing the functions that emit IPIs, we saw chronyd - an unprivileged
process - generating the IPIs when changing a static key to enable
network timestamping on sockets.

For instance, the trace showed:

# trace-cmd record --func-stack -p function -l smp_call_function_single -e irq_vectors -f 'vector == 251'...
# trace-cmd report...
[...]
         chronyd-858   [000]   433.286406: function:             smp_call_function_single
         chronyd-858   [000]   433.286408: kernel_stack:         <stack trace>
=> smp_call_function_many (ffffffffbc10ab22)
=> smp_call_function (ffffffffbc10abaa)
=> on_each_cpu (ffffffffbc10ac0b)
=> text_poke_bp (ffffffffbc02436a)
=> arch_jump_label_transform (ffffffffbc020ec8)
=> __jump_label_update (ffffffffbc1b300f)
=> jump_label_update (ffffffffbc1b30ed)
=> static_key_slow_inc (ffffffffbc1b334d)
=> net_enable_timestamp (ffffffffbc625c44)
=> sock_enable_timestamp (ffffffffbc6127d5)
=> sock_setsockopt (ffffffffbc612d2f)
=> SyS_setsockopt (ffffffffbc60d5e6)
=> tracesys (ffffffffbc7681c5)
          <idle>-0     [001]   433.286416: call_function_single_entry: vector=251
          <idle>-0     [001]   433.286419: call_function_single_exit: vector=251
  
[... The IPI takes place 12 times]

The static key in this case was the netstamp_needed_key. A static key
change from enabled->disabled (or disabled->enabled) causes the code to be
patched, and this is done in three steps:

-- Pseudo-code #1 - Current implementation ---
For each key to be updated:
	1) add an int3 trap to the address that will be patched
	    sync cores (send IPI to all other CPUs)
	2) update all but the first byte of the patched range
	    sync cores (send IPI to all other CPUs)
	3) replace the first byte (int3) by the first byte of replacing opcode 
	    sync cores (send IPI to all other CPUs)
-- Pseudo-code #1 ---

As the static key netstamp_needed_key has four entries (it is used in four
places in the code) in our kernel, 3 IPIs were generated for each
entry, resulting in 12 IPIs. The number of IPIs then is linear with
regard to the number 'n' of entries of a key: O(n*3), which is O(n).

This algorithm works fine for the update of a single key. But we think
it is possible to optimize the case in which a static key has more than
one entry. For instance, the sched_schedstats jump label has 56 entries
in my (updated) Fedora kernel, resulting in 168 IPIs to each CPU on
which the thread that is enabling the key is _not_ running.

Rather than doing each update individually, it is possible to queue all updates
first, and then apply them all at once, rewriting pseudo-code #1
in this way:

-- Pseudo-code #2 - proposal  ---
1)  for each key in the queue:
        add an int3 trap to the address that will be patched
    sync cores (send IPI to all other CPUs)

2)  for each key in the queue:
        update all but the first byte of the patched range
    sync cores (send IPI to all other CPUs)

3)  for each key in the queue:
        replace the first byte (int3) by the first byte of replacing opcode 
    sync cores (send IPI to all other CPUs)
-- Pseudo-code #2 - proposal  ---

Doing the update in this way, the number of IPIs becomes O(3) with regard
to the number of keys, which is O(1).

Currently, the jump label of a static key is transformed via the arch
specific function:

    void arch_jump_label_transform(struct jump_entry *entry,
                                   enum jump_label_type type)

The new approach (batch mode) uses two arch functions. The first has the
same arguments as arch_jump_label_transform(), and is the function:

    void arch_jump_label_transform_queue(struct jump_entry *entry,
                         enum jump_label_type type)

Rather than transforming the code, it adds the jump_entry to a queue of
entries to be updated.

After queuing all jump_entries, the function:
  
    void arch_jump_label_transform_apply(void)

Applies the changes in the queue.
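
Just for illustration, a caller of this pair of functions can be sketched as
below. It mirrors what the generic __jump_label_update() loop does later in
this series; the helper name and the exact loop body are hypothetical:

-------------------------- %< ----------------------------
/*
 * Illustrative sketch only: queue every entry of a key, then patch them
 * all with a bounded number of IPIs.  It assumes, as in this series,
 * that arch_jump_label_transform_queue() returns non-zero on a
 * successful enqueue and zero when the queue is full.
 */
static void update_entries_batched(struct static_key *key,
				   struct jump_entry *entry,
				   struct jump_entry *stop)
{
	for (; entry < stop && jump_entry_key(entry) == key; entry++) {
		if (arch_jump_label_transform_queue(entry,
						    jump_label_type(entry)))
			continue;

		/* Queue full: flush it and retry the same entry. */
		arch_jump_label_transform_apply();
		arch_jump_label_transform_queue(entry,
						jump_label_type(entry));
	}

	/* Apply whatever is still queued. */
	arch_jump_label_transform_apply();
}
-------------------------- >% ----------------------------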

One easy way to see the benefits of this patch set is switching
schedstats on and off. For instance:

-------------------------- %< ---------------------------- 
#!/bin/bash

while [ true ]; do 
    sysctl -w kernel.sched_schedstats=1
    sleep 2
    sysctl -w kernel.sched_schedstats=0
    sleep 2
done
-------------------------- >% ----------------------------

while watching the IPI count:

-------------------------- %< ---------------------------- 
# watch -n1 "cat /proc/interrupts | grep Function"
-------------------------- >% ----------------------------

With the current mode, it is possible to see around 168 IPIs every 2 seconds,
while with the batch mode the number of IPIs goes down to 3 every 2 seconds.

Regarding the performance impact of this patch set, I made two measurements:

    The time to update a key (the task that is causing the change)
    The time to run the int3 handler (the side effect on a thread that
                                      hits the code being changed)

The sched_schedstats static key was chosen as the key to be switched on and
off. The reason is that it is used in more than 56 places, in a hot path.
The change in the schedstats static key will be done with the following
command:

while [ true ]; do
    sysctl -w kernel.sched_schedstats=1
    usleep 500000
    sysctl -w kernel.sched_schedstats=0
    usleep 500000
done

In this way, the key will be updated twice per second. To force hits on the
int3 handler, the system will also run a kernel compilation with two jobs per
CPU. The test machine is a two nodes/24 CPUs box with an Intel Xeon processor
@2.27GHz.

Regarding the update part, the regular kernel takes, on average, 57 ms to update
the static key, while the kernel with the batch updates takes just 1.4 ms on
average. Although it seems too good to be true, it makes sense: the
sched_schedstats key is used in 56 places, so the current implementation was
expected to take around 56 times longer, as the IPIs are the most expensive
part of the update.

Regarding the int3 handler, the non-batch handler takes 45 ns on average, while
the batch version takes around 180 ns. At first glance, it seems to be a high
value. But it is not, considering that it is dealing with 56 patched entries
rather than one: it takes only about four times longer. This is possible
because the handler does a binary search in the vector: a lookup costs
log2(56) ~= 5.8 comparisons, so an overhead within a factor of four was expected.

(voice of tv propaganda) But, that is not all! As the int3 handler keeps on for
a shorter period (because the update part is on for a shorter time), the number
of hits in the int3 handler decreased by 10%.

The question then is: Is it worth paying the price of "135 ns" more in the int3
handler?

Considering that, in this test case, we are saving the handling of 53 IPIs,
each of which takes more than these 135 ns to handle, it seems to be a meager
price to pay. Moreover, the test case was forcing hits on the int3 handler;
in practice, they do not happen that often. The IPIs, on the other hand, take
place on all CPUs, whether they hit the int3 handler or not!

For instance, on an isolated CPU with a process running in user space
(the nohz_full use case), the chances of hitting the int3 handler are nearly
zero, while there is no way to avoid the IPIs. By bounding the IPIs, we are
improving this scenario a lot.

Changes from v3:
  - Optimizations in the hot path of poke_int3_handler() (Masami Hiramatsu)
  - Reduce code duplication in text_poke_bp()/text_poke_batch()
    (Masami Hiramatsu)
  - Set NOKPROBE_SYMBOL on functions called by poke_int3_handler()
    (Masami Hiramatsu)
Changes from v2:
  - Switch the order of patches 8 and 9 (Steven Rostedt)
  - Fallback in the case of failure in the batch mode (Steven Rostedt)
  - Fix a pointer in patch 7 (Steven Rostedt)
  - Adjust the commit log of the patch 1 (Borislav Petkov)
  - Adjust the commit log of the patch 3 (Jiri Kosina/Thomas Gleixner)
Changes from v1:
  - Split the patch in jump-label/x86-jump-label/alternative (Jason Baron)
  - Use bsearch in the int3 handler (Steven Rostedt)
  - Use a pre-allocated page to store the vector (Jason/Steven)
  - Do performance tests in the int3 handler (peterz)

Daniel Bristot de Oliveira (9):
  jump_label: Add for_each_label_entry helper
  jump_label: Add the jump_label_can_update_check() helper
  x86/jump_label: Move checking code away from __jump_label_transform()
  x86/jump_label: Add __jump_label_set_jump_code() helper
  x86/alternative: Split text_poke_bp() into three steps
  jump_label: Sort entries of the same key by the code
  x86/alternative: Batch of patch operations
  jump_label: Batch updates if arch supports it
  x86/jump_label: Batch jump label updates

 arch/x86/include/asm/jump_label.h    |   2 +
 arch/x86/include/asm/text-patching.h |  15 +++
 arch/x86/kernel/alternative.c        | 152 ++++++++++++++++++-----
 arch/x86/kernel/jump_label.c         | 174 ++++++++++++++++++++++-----
 include/linux/jump_label.h           |   6 +
 kernel/jump_label.c                  |  73 +++++++++--
 6 files changed, 358 insertions(+), 64 deletions(-)

-- 
2.17.1



* [PATCH V4 1/9] jump_label: Add for_each_label_entry helper
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
@ 2019-02-04 19:58 ` Daniel Bristot de Oliveira
  2019-02-04 19:58 ` [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper Daniel Bristot de Oliveira
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

Add a helper macro to make jump entry iteration code more readable.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 include/linux/jump_label.h | 3 +++
 kernel/jump_label.c        | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 3e113a1fa0f1..7e91af98bbb1 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -227,6 +227,9 @@ extern void static_key_disable(struct static_key *key);
 extern void static_key_enable_cpuslocked(struct static_key *key);
 extern void static_key_disable_cpuslocked(struct static_key *key);
 
+#define for_each_label_entry(key, entry, stop)				  \
+	for (; (entry < stop) && (jump_entry_key(entry) == key); entry++)
+
 /*
  * We should be using ATOMIC_INIT() for initializing .enabled, but
  * the inclusion of atomic.h is problematic for inclusion of jump_label.h
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index bad96b476eb6..288d630da22d 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -379,7 +379,7 @@ static void __jump_label_update(struct static_key *key,
 				struct jump_entry *stop,
 				bool init)
 {
-	for (; (entry < stop) && (jump_entry_key(entry) == key); entry++) {
+	for_each_label_entry(key, entry, stop) {
 		/*
 		 * An entry->code of 0 indicates an entry which has been
 		 * disabled because it was in an init text area.
-- 
2.17.1



* [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
  2019-02-04 19:58 ` [PATCH V4 1/9] jump_label: Add for_each_label_entry helper Daniel Bristot de Oliveira
@ 2019-02-04 19:58 ` Daniel Bristot de Oliveira
  2019-02-05  7:22   ` Borislav Petkov
  2019-02-04 19:58 ` [PATCH V4 3/9] x86/jump_label: Move checking code away from __jump_label_transform() Daniel Bristot de Oliveira
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

Move the check of if a jump_entry is valid to a function.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 kernel/jump_label.c | 30 ++++++++++++++++++++----------
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 288d630da22d..456c0d7cbb5b 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -374,22 +374,32 @@ static enum jump_label_type jump_label_type(struct jump_entry *entry)
 	return enabled ^ branch;
 }
 
+bool jump_label_can_update_check(struct jump_entry *entry, bool init)
+{
+	/*
+	 * An entry->code of 0 indicates an entry which has been
+	 * disabled because it was in an init text area.
+	 */
+	if (init || !jump_entry_is_init(entry)) {
+		if (!kernel_text_address(jump_entry_code(entry))) {
+			WARN_ONCE(1, "can't patch jump_label at %pS",
+				  (void *)jump_entry_code(entry));
+			return 0;
+		}
+		return 1;
+	}
+	return 0;
+}
+
 static void __jump_label_update(struct static_key *key,
 				struct jump_entry *entry,
 				struct jump_entry *stop,
 				bool init)
 {
 	for_each_label_entry(key, entry, stop) {
-		/*
-		 * An entry->code of 0 indicates an entry which has been
-		 * disabled because it was in an init text area.
-		 */
-		if (init || !jump_entry_is_init(entry)) {
-			if (kernel_text_address(jump_entry_code(entry)))
-				arch_jump_label_transform(entry, jump_label_type(entry));
-			else
-				WARN_ONCE(1, "can't patch jump_label at %pS",
-					  (void *)jump_entry_code(entry));
+		if (jump_label_can_update_check(entry, init)) {
+			arch_jump_label_transform(entry,
+						  jump_label_type(entry));
 		}
 	}
 }
-- 
2.17.1



* [PATCH V4 3/9] x86/jump_label: Move checking code away from __jump_label_transform()
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
  2019-02-04 19:58 ` [PATCH V4 1/9] jump_label: Add for_each_label_entry helper Daniel Bristot de Oliveira
  2019-02-04 19:58 ` [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper Daniel Bristot de Oliveira
@ 2019-02-04 19:58 ` Daniel Bristot de Oliveira
  2019-02-05  7:33   ` Borislav Petkov
  2019-02-04 19:58 ` [PATCH V4 4/9] x86/jump_label: Add __jump_label_set_jump_code() helper Daniel Bristot de Oliveira
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

Move the check of the current code, done before updating an entry, to
specialized functions. No changes in the method; only code relocation.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/jump_label.c | 60 +++++++++++++++++++++++++-----------
 1 file changed, 42 insertions(+), 18 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index f99bd26bd3f1..e443c43478eb 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -35,16 +35,53 @@ static void bug_at(unsigned char *ip, int line)
 	BUG();
 }
 
+static inline void __jump_label_trans_check_enable(struct jump_entry *entry,
+						   enum jump_label_type type,
+						   const unsigned char *ideal_nop,
+						   int init)
+{
+	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
+	const void *expect;
+	int line;
+
+	if (init) {
+		expect = default_nop; line = __LINE__;
+	} else {
+		expect = ideal_nop; line = __LINE__;
+	}
+
+	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
+		bug_at((void *)jump_entry_code(entry), line);
+}
+
+static inline void __jump_label_trans_check_disable(struct jump_entry *entry,
+						    enum jump_label_type type,
+						    union jump_code_union *jmp,
+						    int init)
+{
+	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
+	const void *expect;
+	int line;
+
+	if (init) {
+		expect = default_nop; line = __LINE__;
+	} else {
+		expect = jmp->code; line = __LINE__;
+	}
+
+	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
+		bug_at((void *)jump_entry_code(entry), line);
+}
+
+
 static void __ref __jump_label_transform(struct jump_entry *entry,
 					 enum jump_label_type type,
 					 void *(*poker)(void *, const void *, size_t),
 					 int init)
 {
 	union jump_code_union jmp;
-	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
 	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
-	const void *expect, *code;
-	int line;
+	const void *code;
 
 	jmp.jump = 0xe9;
 	jmp.offset = jump_entry_target(entry) -
@@ -54,26 +91,13 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
 		poker = text_poke_early;
 
 	if (type == JUMP_LABEL_JMP) {
-		if (init) {
-			expect = default_nop; line = __LINE__;
-		} else {
-			expect = ideal_nop; line = __LINE__;
-		}
-
+		__jump_label_trans_check_enable(entry, type, ideal_nop, init);
 		code = &jmp.code;
 	} else {
-		if (init) {
-			expect = default_nop; line = __LINE__;
-		} else {
-			expect = &jmp.code; line = __LINE__;
-		}
-
+		__jump_label_trans_check_disable(entry, type, &jmp, init);
 		code = ideal_nop;
 	}
 
-	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
-		bug_at((void *)jump_entry_code(entry), line);
-
 	/*
 	 * Make text_poke_bp() a default fallback poker.
 	 *
-- 
2.17.1



* [PATCH V4 4/9] x86/jump_label: Add __jump_label_set_jump_code() helper
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
                   ` (2 preceding siblings ...)
  2019-02-04 19:58 ` [PATCH V4 3/9] x86/jump_label: Move checking code away from __jump_label_transform() Daniel Bristot de Oliveira
@ 2019-02-04 19:58 ` Daniel Bristot de Oliveira
  2019-02-04 19:58 ` [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into three steps Daniel Bristot de Oliveira
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

Move the definition of the code to be written from
__jump_label_transform() to a specialized function.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/jump_label.c | 38 +++++++++++++++++++++---------------
 1 file changed, 22 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index e443c43478eb..2ef687db5a87 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -73,30 +73,36 @@ static inline void __jump_label_trans_check_disable(struct jump_entry *entry,
 		bug_at((void *)jump_entry_code(entry), line);
 }
 
+static void __jump_label_set_jump_code(struct jump_entry *entry,
+				       enum jump_label_type type,
+				       union jump_code_union *code,
+				       int init)
+{
+	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
+
+	code->jump = 0xe9;
+	code->offset = jump_entry_target(entry) -
+		     (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
+
+	if (type == JUMP_LABEL_JMP) {
+		__jump_label_trans_check_enable(entry, type, ideal_nop, init);
+	} else {
+		__jump_label_trans_check_disable(entry, type, code, init);
+		memcpy(code, ideal_nop, JUMP_LABEL_NOP_SIZE);
+	}
+}
 
 static void __ref __jump_label_transform(struct jump_entry *entry,
 					 enum jump_label_type type,
 					 void *(*poker)(void *, const void *, size_t),
 					 int init)
 {
-	union jump_code_union jmp;
-	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
-	const void *code;
-
-	jmp.jump = 0xe9;
-	jmp.offset = jump_entry_target(entry) -
-		     (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
+	union jump_code_union code;
 
 	if (early_boot_irqs_disabled)
 		poker = text_poke_early;
 
-	if (type == JUMP_LABEL_JMP) {
-		__jump_label_trans_check_enable(entry, type, ideal_nop, init);
-		code = &jmp.code;
-	} else {
-		__jump_label_trans_check_disable(entry, type, &jmp, init);
-		code = ideal_nop;
-	}
+	__jump_label_set_jump_code(entry, type, &code, init);
 
 	/*
 	 * Make text_poke_bp() a default fallback poker.
@@ -107,12 +113,12 @@ static void __ref __jump_label_transform(struct jump_entry *entry,
 	 *
 	 */
 	if (poker) {
-		(*poker)((void *)jump_entry_code(entry), code,
+		(*poker)((void *)jump_entry_code(entry), &code,
 			 JUMP_LABEL_NOP_SIZE);
 		return;
 	}
 
-	text_poke_bp((void *)jump_entry_code(entry), code, JUMP_LABEL_NOP_SIZE,
+	text_poke_bp((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE,
 		     (void *)jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
 }
 
-- 
2.17.1



* [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into three steps
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
                   ` (3 preceding siblings ...)
  2019-02-04 19:58 ` [PATCH V4 4/9] x86/jump_label: Add __jump_label_set_jump_code() helper Daniel Bristot de Oliveira
@ 2019-02-04 19:58 ` Daniel Bristot de Oliveira
  2019-02-06 19:07   ` Borislav Petkov
                     ` (2 more replies)
  2019-02-04 19:58 ` [PATCH V4 6/9] jump_label: Sort entries of the same key by the code Daniel Bristot de Oliveira
                   ` (3 subsequent siblings)
  8 siblings, 3 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

text_poke_bp() updates instructions on a live kernel on SMP in three steps:
 1) add an int3 trap to the address that will be patched
 2) update all but the first byte of the patched range
 3) replace the first byte (int3) by the first byte of the replacing opcode

This patch creates one function for each of these steps.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Acked-by: Jiri Kosina <jkosina@suse.cz>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/kernel/alternative.c | 38 +++++++++++++++++++++++++----------
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index d458c7973c56..202af29c43c0 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -767,6 +767,29 @@ int poke_int3_handler(struct pt_regs *regs)
 
 }
 
+static void text_poke_bp_set_handler(void *addr, void *handler,
+				     unsigned char int3)
+{
+	bp_int3_handler = handler;
+	bp_int3_addr = (u8 *)addr + sizeof(int3);
+	text_poke(addr, &int3, sizeof(int3));
+}
+
+static void patch_all_but_first_byte(void *addr, const void *opcode,
+				     size_t len, unsigned char int3)
+{
+	/* patch all but the first byte */
+	text_poke((char *)addr + sizeof(int3),
+		  (const char *) opcode + sizeof(int3),
+		  len - sizeof(int3));
+}
+
+static void patch_first_byte(void *addr, const void *opcode, unsigned char int3)
+{
+	/* patch the first byte */
+	text_poke(addr, opcode, sizeof(int3));
+}
+
 /**
  * text_poke_bp() -- update instructions on live kernel on SMP
  * @addr:	address to patch
@@ -791,27 +814,21 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 {
 	unsigned char int3 = 0xcc;
 
-	bp_int3_handler = handler;
-	bp_int3_addr = (u8 *)addr + sizeof(int3);
-	bp_patching_in_progress = true;
-
 	lockdep_assert_held(&text_mutex);
 
+	bp_patching_in_progress = true;
 	/*
 	 * Corresponding read barrier in int3 notifier for making sure the
 	 * in_progress and handler are correctly ordered wrt. patching.
 	 */
 	smp_wmb();
 
-	text_poke(addr, &int3, sizeof(int3));
+	text_poke_bp_set_handler(addr, handler, int3);
 
 	on_each_cpu(do_sync_core, NULL, 1);
 
 	if (len - sizeof(int3) > 0) {
-		/* patch all but the first byte */
-		text_poke((char *)addr + sizeof(int3),
-			  (const char *) opcode + sizeof(int3),
-			  len - sizeof(int3));
+		patch_all_but_first_byte(addr, opcode, len, int3);
 		/*
 		 * According to Intel, this core syncing is very likely
 		 * not necessary and we'd be safe even without it. But
@@ -820,8 +837,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 		on_each_cpu(do_sync_core, NULL, 1);
 	}
 
-	/* patch the first byte */
-	text_poke(addr, opcode, sizeof(int3));
+	patch_first_byte(addr, opcode, int3);
 
 	on_each_cpu(do_sync_core, NULL, 1);
 	/*
-- 
2.17.1



* [PATCH V4 6/9] jump_label: Sort entries of the same key by the code
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
                   ` (4 preceding siblings ...)
  2019-02-04 19:58 ` [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into three steps Daniel Bristot de Oliveira
@ 2019-02-04 19:58 ` Daniel Bristot de Oliveira
  2019-02-04 19:59 ` [PATCH V4 7/9] x86/alternative: Batch of patch operations Daniel Bristot de Oliveira
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:58 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

In the batching mode, entries with the same key should also be sorted by the
code, enabling a bsearch() of a code/addr when updating a key.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 kernel/jump_label.c | 16 ++++++++++++++++
 1 file changed, 16 insertions(+)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 456c0d7cbb5b..53b7c85c0b09 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -36,12 +36,28 @@ static int jump_label_cmp(const void *a, const void *b)
 	const struct jump_entry *jea = a;
 	const struct jump_entry *jeb = b;
 
+	/*
+	 * Entries are sorted by key.
+	 */
 	if (jump_entry_key(jea) < jump_entry_key(jeb))
 		return -1;
 
 	if (jump_entry_key(jea) > jump_entry_key(jeb))
 		return 1;
 
+#ifdef HAVE_JUMP_LABEL_BATCH
+	/*
+	 * In the batching mode, entries should also be sorted by the code
+	 * inside the already sorted list of entries, enabling a bsearch in
+	 * the vector.
+	 */
+	if (jump_entry_code(jea) < jump_entry_code(jeb))
+		return -1;
+
+	if (jump_entry_code(jea) > jump_entry_code(jeb))
+		return 1;
+#endif
+
 	return 0;
 }
 
-- 
2.17.1



* [PATCH V4 7/9] x86/alternative: Batch of patch operations
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
                   ` (5 preceding siblings ...)
  2019-02-04 19:58 ` [PATCH V4 6/9] jump_label: Sort entries of the same key by the code Daniel Bristot de Oliveira
@ 2019-02-04 19:59 ` Daniel Bristot de Oliveira
  2019-02-14 12:53   ` Borislav Petkov
  2019-02-04 19:59 ` [PATCH V4 8/9] jump_label: Batch updates if arch supports it Daniel Bristot de Oliveira
  2019-02-04 19:59 ` [PATCH V4 9/9] x86/jump_label: Batch jump label updates Daniel Bristot de Oliveira
  8 siblings, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

Currently, the patching of an address is done in three steps:

-- Pseudo-code #1 - Current implementation ---
        1) add an int3 trap to the address that will be patched
            sync cores (send IPI to all other CPUs)
        2) update all but the first byte of the patched range
            sync cores (send IPI to all other CPUs)
        3) replace the first byte (int3) by the first byte of replacing opcode
            sync cores (send IPI to all other CPUs)
-- Pseudo-code #1 ---

When a static key has more than one entry, these steps are called once for
each entry. The number of IPIs then is linear with regard to the number 'n' of
entries of a key: O(n*3), which is O(n).

This algorithm works fine for the update of a single key. But we think
it is possible to optimize the case in which a static key has more than
one entry. For instance, the sched_schedstats jump label has 56 entries
in my (updated) Fedora kernel, resulting in 168 IPIs to each CPU on
which the thread that is enabling the key is _not_ running.

With this patch, rather than receiving a single patch to be processed, a vector
of patches is passed, enabling the rewrite of the pseudo-code #1 in this
way:

-- Pseudo-code #2 - This patch  ---
1)  for each patch in the vector:
        add an int3 trap to the address that will be patched

    sync cores (send IPI to all other CPUs)

2)  for each patch in the vector:
        update all but the first byte of the patched range

    sync cores (send IPI to all other CPUs)

3)  for each patch in the vector:
        replace the first byte (int3) by the first byte of replacing opcode

    sync cores (send IPI to all other CPUs)
-- Pseudo-code #2 - This patch  ---

Doing the update in this way, the number of IPIs becomes O(3) with regard
to the number of keys, which is O(1).

The batch mode is done with the function text_poke_bp_batch(), which receives
two arguments: a vector of "struct text_to_poke", and the number of entries
in the vector.

The vector must be sorted by the addr field of the text_to_poke structure,
enabling the binary search of a handler in the poke_int3_handler function
(a fast path).
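
For illustration only, a hypothetical caller batching two pokes could look
like the sketch below. The addresses, opcodes, and the helper name are made
up; the structure fields, the sort-by-addr requirement, and the text_mutex
locking follow the code added by this patch:

-------------------------- %< ----------------------------
/*
 * Hypothetical example: batch two pokes in a single int3/IPI cycle.
 * The vector must be sorted by ->addr (addr1 < addr2 is assumed), len
 * must not exceed POKE_MAX_OPCODE_SIZE, and text_mutex must be held
 * around text_poke_bp_batch(), which asserts it.
 */
static void example_poke_two(void *addr1, const void *opc1,
			     void *addr2, const void *opc2, size_t len)
{
	struct text_to_poke tp[2] = {
		{ .addr = addr1, .len = len, .handler = addr1 + len },
		{ .addr = addr2, .len = len, .handler = addr2 + len },
	};

	memcpy((void *)tp[0].opcode, opc1, len);
	memcpy((void *)tp[1].opcode, opc2, len);

	mutex_lock(&text_mutex);
	text_poke_bp_batch(tp, 2);
	mutex_unlock(&text_mutex);
}
-------------------------- >% ----------------------------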

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/text-patching.h |  15 ++++
 arch/x86/kernel/alternative.c        | 124 ++++++++++++++++++++++-----
 2 files changed, 118 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
index e85ff65c43c3..42ea7846df33 100644
--- a/arch/x86/include/asm/text-patching.h
+++ b/arch/x86/include/asm/text-patching.h
@@ -18,6 +18,20 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
 #define __parainstructions_end	NULL
 #endif
 
+/*
+ * Currently, the max observed size in the kernel code is
+ * JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5.
+ * Raise it if needed.
+ */
+#define POKE_MAX_OPCODE_SIZE	5
+
+struct text_to_poke {
+	void *handler;
+	void *addr;
+	size_t len;
+	const char opcode[POKE_MAX_OPCODE_SIZE];
+};
+
 extern void *text_poke_early(void *addr, const void *opcode, size_t len);
 
 /*
@@ -37,6 +51,7 @@ extern void *text_poke_early(void *addr, const void *opcode, size_t len);
 extern void *text_poke(void *addr, const void *opcode, size_t len);
 extern int poke_int3_handler(struct pt_regs *regs);
 extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
+extern void text_poke_bp_batch(struct text_to_poke *tp, unsigned int nr_entries);
 extern int after_bootmem;
 
 #endif /* _ASM_X86_TEXT_PATCHING_H */
diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
index 202af29c43c0..318b6867dc5a 100644
--- a/arch/x86/kernel/alternative.c
+++ b/arch/x86/kernel/alternative.c
@@ -11,6 +11,8 @@
 #include <linux/stop_machine.h>
 #include <linux/slab.h>
 #include <linux/kdebug.h>
+#include <linux/kprobes.h>
+#include <linux/bsearch.h>
 #include <asm/text-patching.h>
 #include <asm/alternative.h>
 #include <asm/sections.h>
@@ -738,10 +740,26 @@ static void do_sync_core(void *info)
 }
 
 static bool bp_patching_in_progress;
-static void *bp_int3_handler, *bp_int3_addr;
+static struct text_to_poke *bp_int3_tpv;
+static unsigned int bp_int3_tpv_nr;
+
+static int text_bp_batch_bsearch(const void *key, const void *elt)
+{
+	struct text_to_poke *tp = (struct text_to_poke *) elt;
+
+	if (key < tp->addr)
+		return -1;
+	if (key > tp->addr)
+		return 1;
+	return 0;
+}
+NOKPROBE_SYMBOL(text_bp_batch_bsearch);
 
 int poke_int3_handler(struct pt_regs *regs)
 {
+	void *ip;
+	struct text_to_poke *tp;
+
 	/*
 	 * Having observed our INT3 instruction, we now must observe
 	 * bp_patching_in_progress.
@@ -757,21 +775,40 @@ int poke_int3_handler(struct pt_regs *regs)
 	if (likely(!bp_patching_in_progress))
 		return 0;
 
-	if (user_mode(regs) || regs->ip != (unsigned long)bp_int3_addr)
+	if (user_mode(regs))
 		return 0;
 
-	/* set up the specified breakpoint handler */
-	regs->ip = (unsigned long) bp_int3_handler;
+	ip = (void *) regs->ip - sizeof(unsigned char);
 
-	return 1;
+	/*
+	 * Skip the binary search if there is a single member in the vector.
+	 */
+	if (unlikely(bp_int3_tpv_nr == 1))
+		goto single_poke;
+
+	tp = bsearch(ip, bp_int3_tpv, bp_int3_tpv_nr,
+		     sizeof(struct text_to_poke),
+		     text_bp_batch_bsearch);
+	if (tp) {
+		/* set up the specified breakpoint handler */
+		regs->ip = (unsigned long) tp->handler;
+		return 1;
+	}
 
+	return 0;
+
+single_poke:
+	if (ip == bp_int3_tpv->addr) {
+		regs->ip = (unsigned long) bp_int3_tpv->handler;
+		return 1;
+	}
+
+	return 0;
 }
 
 static void text_poke_bp_set_handler(void *addr, void *handler,
 				     unsigned char int3)
 {
-	bp_int3_handler = handler;
-	bp_int3_addr = (u8 *)addr + sizeof(int3);
 	text_poke(addr, &int3, sizeof(int3));
 }
 
@@ -791,31 +828,36 @@ static void patch_first_byte(void *addr, const void *opcode, unsigned char int3)
 }
 
 /**
- * text_poke_bp() -- update instructions on live kernel on SMP
- * @addr:	address to patch
- * @opcode:	opcode of new instruction
- * @len:	length to copy
- * @handler:	address to jump to when the temporary breakpoint is hit
+ * text_poke_bp_batch() -- update instructions on live kernel on SMP
+ * @tp:		vector of instructions to patch
+ * @nr_entries:	number of entries in the vector
  *
  * Modify multi-byte instruction by using int3 breakpoint on SMP.
  * We completely avoid stop_machine() here, and achieve the
  * synchronization using int3 breakpoint.
  *
  * The way it is done:
- *	- add a int3 trap to the address that will be patched
+ *	- For each entry in the vector:
+ *	    - add a int3 trap to the address that will be patched
  *	- sync cores
- *	- update all but the first byte of the patched range
+ *	- For each entry in the vector:
+ *	    - update all but the first byte of the patched range
  *	- sync cores
- *	- replace the first byte (int3) by the first byte of
- *	  replacing opcode
+ *	- For each entry in the vector:
+ *	    - replace the first byte (int3) by the first byte of
+ *	      replacing opcode
  *	- sync cores
  */
-void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+void text_poke_bp_batch(struct text_to_poke *tp, unsigned int nr_entries)
 {
+	unsigned int i;
 	unsigned char int3 = 0xcc;
+	int patched_all_but_first = 0;
 
 	lockdep_assert_held(&text_mutex);
 
+	bp_int3_tpv = tp;
+	bp_int3_tpv_nr = nr_entries;
 	bp_patching_in_progress = true;
 	/*
 	 * Corresponding read barrier in int3 notifier for making sure the
@@ -823,12 +865,20 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 	 */
 	smp_wmb();
 
-	text_poke_bp_set_handler(addr, handler, int3);
+	for (i = 0; i < nr_entries; i++)
+		text_poke_bp_set_handler(tp[i].addr, tp[i].handler, int3);
 
 	on_each_cpu(do_sync_core, NULL, 1);
 
-	if (len - sizeof(int3) > 0) {
-		patch_all_but_first_byte(addr, opcode, len, int3);
+	for (i = 0; i < nr_entries; i++) {
+		if (tp[i].len - sizeof(int3) > 0) {
+			patch_all_but_first_byte(tp[i].addr, tp[i].opcode,
+						 tp[i].len, int3);
+			patched_all_but_first++;
+		}
+	}
+
+	if (patched_all_but_first) {
 		/*
 		 * According to Intel, this core syncing is very likely
 		 * not necessary and we'd be safe even without it. But
@@ -837,14 +887,46 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
 		on_each_cpu(do_sync_core, NULL, 1);
 	}
 
-	patch_first_byte(addr, opcode, int3);
+	for (i = 0; i < nr_entries; i++)
+		patch_first_byte(tp[i].addr, tp[i].opcode, int3);
 
 	on_each_cpu(do_sync_core, NULL, 1);
 	/*
 	 * sync_core() implies an smp_mb() and orders this store against
 	 * the writing of the new instruction.
 	 */
+	bp_int3_tpv_nr = 0;
+	bp_int3_tpv = NULL;
 	bp_patching_in_progress = false;
+}
+
+/**
+ * text_poke_bp() -- update instructions on live kernel on SMP
+ * @addr:	address to patch
+ * @opcode:	opcode of new instruction
+ * @len:	length to copy
+ * @handler:	address to jump to when the temporary breakpoint is hit
+ *
+ * Update a single instruction with the vector in the stack, avoiding
+ * dynamically allocated memory. This function should be used when it is
+ * not possible to allocate memory.
+ */
+void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
+{
+	struct text_to_poke tp = {
+		.handler = handler,
+		.addr = addr,
+		.len = len,
+	};
+
+	if (len > POKE_MAX_OPCODE_SIZE) {
+		WARN_ONCE(1, "len is larger than %d\n", POKE_MAX_OPCODE_SIZE);
+		return NULL;
+	}
+
+	memcpy((void *)tp.opcode, opcode, len);
+
+	text_poke_bp_batch(&tp, 1);
 
 	return addr;
 }
-- 
2.17.1



* [PATCH V4 8/9] jump_label: Batch updates if arch supports it
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
                   ` (6 preceding siblings ...)
  2019-02-04 19:59 ` [PATCH V4 7/9] x86/alternative: Batch of patch operations Daniel Bristot de Oliveira
@ 2019-02-04 19:59 ` Daniel Bristot de Oliveira
  2019-02-06  6:34   ` Masami Hiramatsu
  2019-02-14 13:33   ` Borislav Petkov
  2019-02-04 19:59 ` [PATCH V4 9/9] x86/jump_label: Batch jump label updates Daniel Bristot de Oliveira
  8 siblings, 2 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

If the architecture supports the batching of jump label updates, use it!

An easy way to see the benefits of this patch is switching the
schedstats on and off. For instance:

-------------------------- %< ----------------------------
#!/bin/bash

while [ true ]; do
    sysctl -w kernel.sched_schedstats=1
    sleep 2
    sysctl -w kernel.sched_schedstats=0
    sleep 2
done
-------------------------- >% ----------------------------

while watching the IPI count:

-------------------------- %< ----------------------------
# watch -n1 "cat /proc/interrupts | grep Function"
-------------------------- >% ----------------------------

With the current mode, it is possible to see around 168 IPIs every 2 seconds,
while with this patch the number of IPIs goes down to 3 every 2 seconds.

Regarding the performance impact of this patch set, I made two measurements:

    The time to update a key (the task that is causing the change)
    The time to run the int3 handler (the side effect on a thread that
                                      hits the code being changed)

The schedstats static key was chosen as the key to be switched on and off.
The reason is that it is used in more than 56 places, in a hot path. The
change in the schedstats static key will be done with the following command:

while [ true ]; do
    sysctl -w kernel.sched_schedstats=1
    usleep 500000
    sysctl -w kernel.sched_schedstats=0
    usleep 500000
done

In this way, the key will be updated twice per second. To force hits on the
int3 handler, the system will also run a kernel compilation with two jobs per
CPU. The test machine is a two nodes/24 CPUs box with an Intel Xeon processor
@2.27GHz.

Regarding the update part, the regular kernel takes, on average, 57 ms to update
the schedstats key, while the kernel with the batch updates takes just 1.4 ms
on average. Although it seems too good to be true, it makes sense: the
schedstats key is used in 56 places, so the current implementation was expected
to take around 56 times longer, as the IPIs are the most expensive part of
the update.

Regarding the int3 handler, the non-batch handler takes 45 ns on average, while
the batch version takes around 180 ns. At first glance, it seems to be a high
value. But it is not, considering that it is dealing with 56 patched entries
rather than one: it takes only about four times longer. This is possible
because the handler does a binary search in the vector: a lookup costs
log2(56) ~= 5.8 comparisons, so an overhead within a factor of four was expected.

(voice of tv propaganda) But, that is not all! As the int3 handler keeps on for
a shorter period (because the update part is on for a shorter time), the number
of hits in the int3 handler decreased by 10%.

The question then is: Is it worth paying the price of "135 ns" more in the int3
handler?

Considering that, in this test case, we are saving the handling of 53 IPIs,
each of which takes more than these 135 ns to handle, it seems to be a meager
price to pay. Moreover, the test case was forcing hits on the int3 handler;
in practice, they do not happen that often. The IPIs, on the other hand, take
place on all CPUs, whether they hit the int3 handler or not!

For instance, on an isolated CPU with a process running in user space
(the nohz_full use case), the chances of hitting the int3 handler are nearly
zero, while there is no way to avoid the IPIs. By bounding the IPIs, we are
improving this scenario a lot.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org

---
 include/linux/jump_label.h |  3 +++
 kernel/jump_label.c        | 29 +++++++++++++++++++++++++++++
 2 files changed, 32 insertions(+)

diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
index 7e91af98bbb1..b3dfce98edb7 100644
--- a/include/linux/jump_label.h
+++ b/include/linux/jump_label.h
@@ -215,6 +215,9 @@ extern void arch_jump_label_transform(struct jump_entry *entry,
 				      enum jump_label_type type);
 extern void arch_jump_label_transform_static(struct jump_entry *entry,
 					     enum jump_label_type type);
+extern int arch_jump_label_transform_queue(struct jump_entry *entry,
+					    enum jump_label_type type);
+extern void arch_jump_label_transform_apply(void);
 extern int jump_label_text_reserved(void *start, void *end);
 extern void static_key_slow_inc(struct static_key *key);
 extern void static_key_slow_dec(struct static_key *key);
diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 53b7c85c0b09..944c75a0b09b 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -407,6 +407,7 @@ bool jump_label_can_update_check(struct jump_entry *entry, bool init)
 	return 0;
 }
 
+#ifndef HAVE_JUMP_LABEL_BATCH
 static void __jump_label_update(struct static_key *key,
 				struct jump_entry *entry,
 				struct jump_entry *stop,
@@ -419,6 +420,34 @@ static void __jump_label_update(struct static_key *key,
 		}
 	}
 }
+#else
+static void __jump_label_update(struct static_key *key,
+				struct jump_entry *entry,
+				struct jump_entry *stop,
+				bool init)
+{
+	for_each_label_entry(key, entry, stop) {
+
+		if (!jump_label_can_update_check(entry, init))
+			continue;
+
+		if (arch_jump_label_transform_queue(entry,
+						    jump_label_type(entry)))
+			continue;
+
+		/*
+		 * Queue overflow: apply the current queue, and then
+		 * queue again. If it is still not possible to queue, BUG!
+		 */
+		arch_jump_label_transform_apply();
+		if (!arch_jump_label_transform_queue(entry,
+						     jump_label_type(entry))) {
+			BUG();
+		}
+	}
+	arch_jump_label_transform_apply();
+}
+#endif
 
 void __init jump_label_init(void)
 {
-- 
2.17.1



* [PATCH V4 9/9] x86/jump_label: Batch jump label updates
  2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
                   ` (7 preceding siblings ...)
  2019-02-04 19:59 ` [PATCH V4 8/9] jump_label: Batch updates if arch supports it Daniel Bristot de Oliveira
@ 2019-02-04 19:59 ` Daniel Bristot de Oliveira
  2019-02-06  6:29   ` Masami Hiramatsu
  8 siblings, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-04 19:59 UTC (permalink / raw)
  To: linux-kernel
  Cc: bristot, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

Currently, the jump label of a static key is transformed via the arch
specific function:

    void arch_jump_label_transform(struct jump_entry *entry,
                                   enum jump_label_type type)

The new approach (batch mode) uses two arch functions. The first has the
same arguments as arch_jump_label_transform(), and is the function:

    void arch_jump_label_transform_queue(struct jump_entry *entry,
					 enum jump_label_type type)

Rather than transforming the code, it adds the jump_entry to a queue of
entries to be updated. This function returns 1 in the case of a
successful enqueue of an entry. If it returns 0 (for instance, because
the queue is full), the caller must apply the queue and then try to
queue again.

This function expects the caller to sort the entries by address before
enqueueing them. This is already done by the arch-independent code, though.

After queuing all jump_entries, the function:

    void arch_jump_label_transform_apply(void)

Applies the changes in the queue.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 arch/x86/include/asm/jump_label.h |  2 +
 arch/x86/kernel/jump_label.c      | 88 +++++++++++++++++++++++++++++++
 2 files changed, 90 insertions(+)

diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
index 65191ce8e1cf..06c3cc22a058 100644
--- a/arch/x86/include/asm/jump_label.h
+++ b/arch/x86/include/asm/jump_label.h
@@ -2,6 +2,8 @@
 #ifndef _ASM_X86_JUMP_LABEL_H
 #define _ASM_X86_JUMP_LABEL_H
 
+#define HAVE_JUMP_LABEL_BATCH
+
 #define JUMP_LABEL_NOP_SIZE 5
 
 #ifdef CONFIG_X86_64
diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
index 2ef687db5a87..3c81cf8f06ca 100644
--- a/arch/x86/kernel/jump_label.c
+++ b/arch/x86/kernel/jump_label.c
@@ -15,6 +15,7 @@
 #include <asm/kprobes.h>
 #include <asm/alternative.h>
 #include <asm/text-patching.h>
+#include <linux/slab.h>
 
 union jump_code_union {
 	char code[JUMP_LABEL_NOP_SIZE];
@@ -130,6 +131,93 @@ void arch_jump_label_transform(struct jump_entry *entry,
 	mutex_unlock(&text_mutex);
 }
 
+struct text_to_poke *entry_vector;
+unsigned int entry_vector_max_elem __read_mostly;
+unsigned int entry_vector_nr_elem;
+
+void arch_jump_label_init(void)
+{
+	entry_vector = (void *) __get_free_page(GFP_KERNEL);
+
+	if (WARN_ON_ONCE(!entry_vector))
+		return;
+
+	entry_vector_max_elem = PAGE_SIZE / sizeof(struct text_to_poke);
+	return;
+}
+
+int arch_jump_label_transform_queue(struct jump_entry *entry,
+				     enum jump_label_type type)
+{
+	void *entry_code;
+	struct text_to_poke *tp;
+
+	/*
+	 * Batch mode disabled before being able to allocate memory:
+	 * Fallback to the non-batching mode.
+	 */
+	if (unlikely(!entry_vector_max_elem)) {
+		if (!slab_is_available() || early_boot_irqs_disabled)
+			goto fallback;
+
+		arch_jump_label_init();
+	}
+
+	/*
+	 * No more space in the vector, tell upper layer to apply
+	 * the queue before continuing.
+	 */
+	if (entry_vector_nr_elem == entry_vector_max_elem)
+		return 0;
+
+	tp = &entry_vector[entry_vector_nr_elem];
+
+	entry_code = (void *)jump_entry_code(entry);
+
+	/*
+	 * The int3 handler will do a bsearch in the queue, so we need entries
+	 * to be sorted. We can survive an unsorted list by rejecting the entry,
+	 * forcing the generic jump_label code to apply the queue. Warning once,
+	 * to raise the attention to the case of an unsorted entry that is
+	 * better not happen, because, in the worst case we will perform in the
+	 * same way as we do without batching - with some more overhead.
+	 */
+	if (entry_vector_nr_elem > 0) {
+		int prev_idx = entry_vector_nr_elem - 1;
+		struct text_to_poke *prev_tp = &entry_vector[prev_idx];
+
+		if (WARN_ON_ONCE(prev_tp->addr > entry_code))
+			return 0;
+	}
+
+	__jump_label_set_jump_code(entry, type,
+				   (union jump_code_union *) &tp->opcode, 0);
+
+	tp->addr = entry_code;
+	tp->handler = entry_code + JUMP_LABEL_NOP_SIZE;
+	tp->len = JUMP_LABEL_NOP_SIZE;
+
+	entry_vector_nr_elem++;
+
+	return 1;
+
+fallback:
+	arch_jump_label_transform(entry, type);
+	return 1;
+}
+
+void arch_jump_label_transform_apply(void)
+{
+	if (early_boot_irqs_disabled || !entry_vector_nr_elem)
+		return;
+
+	mutex_lock(&text_mutex);
+	text_poke_bp_batch(entry_vector, entry_vector_nr_elem);
+	mutex_unlock(&text_mutex);
+
+	entry_vector_nr_elem = 0;
+}
+
 static enum {
 	JL_STATE_START,
 	JL_STATE_NO_UPDATE,
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
  2019-02-04 19:58 ` [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper Daniel Bristot de Oliveira
@ 2019-02-05  7:22   ` Borislav Petkov
  2019-02-05 13:50     ` Daniel Bristot de Oliveira
  0 siblings, 1 reply; 36+ messages in thread
From: Borislav Petkov @ 2019-02-05  7:22 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

> Subject: Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper

s/the/a/

On Mon, Feb 04, 2019 at 08:58:55PM +0100, Daniel Bristot de Oliveira wrote:
> Move the check of if a jump_entry is valid to a function.

s/of //

> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 288d630da22d..456c0d7cbb5b 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -374,22 +374,32 @@ static enum jump_label_type jump_label_type(struct jump_entry *entry)
>  	return enabled ^ branch;
>  }
>  
> +bool jump_label_can_update_check(struct jump_entry *entry, bool init)

static.

Also, "jump_label_can_update" is sufficient for a name AFAICT.

> +{
> +	/*
> +	 * An entry->code of 0 indicates an entry which has been
> +	 * disabled because it was in an init text area.
> +	 */
> +	if (init || !jump_entry_is_init(entry)) {
> +		if (!kernel_text_address(jump_entry_code(entry))) {
> +			WARN_ONCE(1, "can't patch jump_label at %pS",
> +				  (void *)jump_entry_code(entry));
> +			return 0;
> +		}
> +		return 1;
> +	}
> +	return 0;

Those should be bools which it returns, no?

Also, I'd do the function this way, to make it more readable and not
have three returns back-to-back. :)

/*
 * An entry->code of 0 indicates an entry which has been disabled because it
 * was in an init text area.
 */
bool jump_label_can_update(struct jump_entry *entry, bool init)
{
        if (!init && jump_entry_is_init(entry))
                return false;

        if (WARN_ON_ONCE(!kernel_text_address(jump_entry_code(entry))),
                         "can't patch jump_label at %pS", (void *)jump_entry_code(entry))
                return false;

        return true;
}

That second check could be even:

        if (WARN_ON_ONCE(!kernel_text_address(jump_entry_code(entry))),
                         "can't patch jump_label at %pS", (void *)jump_entry_code(entry))
                return false;

but that's not more readable than above, I'd say.

>  static void __jump_label_update(struct static_key *key,
>  				struct jump_entry *entry,
>  				struct jump_entry *stop,
>  				bool init)
>  {
>  	for_each_label_entry(key, entry, stop) {
> -		/*
> -		 * An entry->code of 0 indicates an entry which has been
> -		 * disabled because it was in an init text area.
> -		 */
> -		if (init || !jump_entry_is_init(entry)) {
> -			if (kernel_text_address(jump_entry_code(entry)))
> -				arch_jump_label_transform(entry, jump_label_type(entry));
> -			else
> -				WARN_ONCE(1, "can't patch jump_label at %pS",
> -					  (void *)jump_entry_code(entry));
> +		if (jump_label_can_update_check(entry, init)) {
> +			arch_jump_label_transform(entry,
> +						  jump_label_type(entry));

Yeah, let that one stick out.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 3/9] x86/jump_label: Move checking code away from __jump_label_transform()
  2019-02-04 19:58 ` [PATCH V4 3/9] x86/jump_label: Move checking code away from __jump_label_transform() Daniel Bristot de Oliveira
@ 2019-02-05  7:33   ` Borislav Petkov
  2019-02-05 15:22     ` Daniel Bristot de Oliveira
  2019-02-15 10:05     ` Daniel Bristot de Oliveira
  0 siblings, 2 replies; 36+ messages in thread
From: Borislav Petkov @ 2019-02-05  7:33 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Mon, Feb 04, 2019 at 08:58:56PM +0100, Daniel Bristot de Oliveira wrote:
> Move the check of the current code, before updating an entry, to specialized
> functions. No changes in the method, only code relocation.
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Chris von Recklinghausen <crecklin@redhat.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Scott Wood <swood@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Clark Williams <williams@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/x86/kernel/jump_label.c | 60 +++++++++++++++++++++++++-----------
>  1 file changed, 42 insertions(+), 18 deletions(-)
> 
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index f99bd26bd3f1..e443c43478eb 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -35,16 +35,53 @@ static void bug_at(unsigned char *ip, int line)
>  	BUG();
>  }
>  
> +static inline void __jump_label_trans_check_enable(struct jump_entry *entry,
> +						   enum jump_label_type type,
> +						   const unsigned char *ideal_nop,
> +						   int init)
> +{
> +	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
> +	const void *expect;
> +	int line;
> +
> +	if (init) {
> +		expect = default_nop; line = __LINE__;
> +	} else {
> +		expect = ideal_nop; line = __LINE__;
> +	}
> +
> +	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
> +		bug_at((void *)jump_entry_code(entry), line);
> +}
> +
> +static inline void __jump_label_trans_check_disable(struct jump_entry *entry,
> +						    enum jump_label_type type,
> +						    union jump_code_union *jmp,
> +						    int init)
> +{
> +	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
> +	const void *expect;
> +	int line;
> +
> +	if (init) {
> +		expect = default_nop; line = __LINE__;
> +	} else {
> +		expect = jmp->code; line = __LINE__;
> +	}
> +
> +	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
> +		bug_at((void *)jump_entry_code(entry), line);
> +}

Why the carve out?

The next patch is adding __jump_label_set_jump_code() which calls them
so you could just as well keep the functionality all in that function without
having too many helpers which are called only once...

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
  2019-02-05  7:22   ` Borislav Petkov
@ 2019-02-05 13:50     ` Daniel Bristot de Oliveira
  2019-02-05 21:13       ` Borislav Petkov
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-05 13:50 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

Hi Borislav!

On 2/5/19 8:22 AM, Borislav Petkov wrote:
>> Subject: Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
> 
> s/the/a/

ack!

> On Mon, Feb 04, 2019 at 08:58:55PM +0100, Daniel Bristot de Oliveira wrote:
>> Move the check of if a jump_entry is valid to a function.
> 
> s/of //

ack!

>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>> index 288d630da22d..456c0d7cbb5b 100644
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -374,22 +374,32 @@ static enum jump_label_type jump_label_type(struct jump_entry *entry)
>>  	return enabled ^ branch;
>>  }
>>  
>> +bool jump_label_can_update_check(struct jump_entry *entry, bool init)
> 
> static.
> 
> Also, "jump_label_can_update" is sufficient for a name AFAICT.

sounds better indeed.

>> +{
>> +	/*
>> +	 * An entry->code of 0 indicates an entry which has been
>> +	 * disabled because it was in an init text area.
>> +	 */
>> +	if (init || !jump_entry_is_init(entry)) {
>> +		if (!kernel_text_address(jump_entry_code(entry))) {
>> +			WARN_ONCE(1, "can't patch jump_label at %pS",
>> +				  (void *)jump_entry_code(entry));
>> +			return 0;
>> +		}
>> +		return 1;
>> +	}
>> +	return 0;
> 
> Those should be bools which it returns, no?
> 
> Also, I'd do the function this way, to make it more readable and not
> have three returns back-to-back. :)
> 
> /*
>  * An entry->code of 0 indicates an entry which has been disabled because it
>  * was in an init text area.
>  */
> bool jump_label_can_update(struct jump_entry *entry, bool init)
> {
>         if (!init && jump_entry_is_init(entry))
>                 return false;
> 
>         if (WARN_ON_ONCE(!kernel_text_address(jump_entry_code(entry))),
>                          "can't patch jump_label at %pS", (void *)jump_entry_code(entry))
>                 return false;
> 
>         return true;
> }
> 
> That second check could be even:
> 
>         if (WARN_ON_ONCE(!kernel_text_address(jump_entry_code(entry))),
>                          "can't patch jump_label at %pS", (void *)jump_entry_code(entry))
>                 return false;
> 
> but that's not more readable than above, I'd say.

Agreed!

> 
>>  static void __jump_label_update(struct static_key *key,
>>  				struct jump_entry *entry,
>>  				struct jump_entry *stop,
>>  				bool init)
>>  {
>>  	for_each_label_entry(key, entry, stop) {
>> -		/*
>> -		 * An entry->code of 0 indicates an entry which has been
>> -		 * disabled because it was in an init text area.
>> -		 */
>> -		if (init || !jump_entry_is_init(entry)) {
>> -			if (kernel_text_address(jump_entry_code(entry)))
>> -				arch_jump_label_transform(entry, jump_label_type(entry));
>> -			else
>> -				WARN_ONCE(1, "can't patch jump_label at %pS",
>> -					  (void *)jump_entry_code(entry));
>> +		if (jump_label_can_update_check(entry, init)) {
>> +			arch_jump_label_transform(entry,
>> +						  jump_label_type(entry));
> 
> Yeah, let that one stick out.

I did not get this part...

Thanks!

-- Daniel


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 3/9] x86/jump_label: Move checking code away from __jump_label_transform()
  2019-02-05  7:33   ` Borislav Petkov
@ 2019-02-05 15:22     ` Daniel Bristot de Oliveira
  2019-02-15 10:05     ` Daniel Bristot de Oliveira
  1 sibling, 0 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-05 15:22 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On 2/5/19 8:33 AM, Borislav Petkov wrote:
> On Mon, Feb 04, 2019 at 08:58:56PM +0100, Daniel Bristot de Oliveira wrote:
>> Move the check of the current code, before updating an entry, to specialized
>> functions. No changes in the method, only code relocation.
>>
>> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
>> Cc: Jiri Kosina <jkosina@suse.cz>
>> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>> Cc: Chris von Recklinghausen <crecklin@redhat.com>
>> Cc: Jason Baron <jbaron@akamai.com>
>> Cc: Scott Wood <swood@redhat.com>
>> Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> Cc: Clark Williams <williams@redhat.com>
>> Cc: x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  arch/x86/kernel/jump_label.c | 60 +++++++++++++++++++++++++-----------
>>  1 file changed, 42 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>> index f99bd26bd3f1..e443c43478eb 100644
>> --- a/arch/x86/kernel/jump_label.c
>> +++ b/arch/x86/kernel/jump_label.c
>> @@ -35,16 +35,53 @@ static void bug_at(unsigned char *ip, int line)
>>  	BUG();
>>  }
>>  
>> +static inline void __jump_label_trans_check_enable(struct jump_entry *entry,
>> +						   enum jump_label_type type,
>> +						   const unsigned char *ideal_nop,
>> +						   int init)
>> +{
>> +	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
>> +	const void *expect;
>> +	int line;
>> +
>> +	if (init) {
>> +		expect = default_nop; line = __LINE__;
>> +	} else {
>> +		expect = ideal_nop; line = __LINE__;
>> +	}
>> +
>> +	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
>> +		bug_at((void *)jump_entry_code(entry), line);
>> +}
>> +
>> +static inline void __jump_label_trans_check_disable(struct jump_entry *entry,
>> +						    enum jump_label_type type,
>> +						    union jump_code_union *jmp,
>> +						    int init)
>> +{
>> +	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
>> +	const void *expect;
>> +	int line;
>> +
>> +	if (init) {
>> +		expect = default_nop; line = __LINE__;
>> +	} else {
>> +		expect = jmp->code; line = __LINE__;
>> +	}
>> +
>> +	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
>> +		bug_at((void *)jump_entry_code(entry), line);
>> +}
> 
> Why the carve out?
> 
> The next patch is adding __jump_label_set_jump_code() which calls them
> so you could just as well keep the functionality all in that function without
> having too many helpers which are called only once...
> 

agreed.

Thanks!
-- Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
  2019-02-05 13:50     ` Daniel Bristot de Oliveira
@ 2019-02-05 21:13       ` Borislav Petkov
  2019-02-07 13:21         ` Daniel Bristot de Oliveira
  0 siblings, 1 reply; 36+ messages in thread
From: Borislav Petkov @ 2019-02-05 21:13 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Tue, Feb 05, 2019 at 02:50:39PM +0100, Daniel Bristot de Oliveira wrote:
> >> +		if (jump_label_can_update_check(entry, init)) {
> >> +			arch_jump_label_transform(entry,
> >> +						  jump_label_type(entry));
> > 
> > Yeah, let that one stick out.
> 
> I did not get this part...

Just let the line stick outside of the 80 cols rule which is not a hard
one:

+          if (jump_label_can_update_check(entry, init)) {
+                  arch_jump_label_transform(entry, jump_label_type(entry));


:-)

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 9/9] x86/jump_label: Batch jump label updates
  2019-02-04 19:59 ` [PATCH V4 9/9] x86/jump_label: Batch jump label updates Daniel Bristot de Oliveira
@ 2019-02-06  6:29   ` Masami Hiramatsu
  0 siblings, 0 replies; 36+ messages in thread
From: Masami Hiramatsu @ 2019-02-06  6:29 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

Hi Daniel,

On Mon,  4 Feb 2019 20:59:02 +0100
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> Currently, the jump label of a static key is transformed via the arch
> specific function:
> 
>     void arch_jump_label_transform(struct jump_entry *entry,
>                                    enum jump_label_type type)
> 
> The new approach (batch mode) uses two arch functions, the first has the
> same arguments of the arch_jump_label_transform(), and is the function:
> 
>     void arch_jump_label_transform_queue(struct jump_entry *entry,
> 					 enum jump_label_type type)

This function actually returns an "int" value.
Also, it seems the function returns 0 for failure and 1 for success,
but usually an "int" return is expected to be -errno on failure or 0 on
success.

So could you update the interface to return -ENOSPC for vector
overflow, and -EINVAL for an invalid entry?
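
For illustration, the return points in the posted function might then map
onto that convention roughly like this (just a sketch, not a tested change):

-------------------------- %< ----------------------------
	/* Queue full: the caller should apply the queue and retry. */
	if (entry_vector_nr_elem == entry_vector_max_elem)
		return -ENOSPC;

	/* Unsorted entry: reject it as invalid. */
	if (WARN_ON_ONCE(prev_tp->addr > entry_code))
		return -EINVAL;

	/* ... enqueue the entry ... */
	return 0;
-------------------------- >% ----------------------------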

Thank you,

> 
> Rather than transforming the code, it adds the jump_entry in a queue of
> entries to be updated. This function returns 1 in the case of a
> successful enqueue of an entry. If it returns 0 - for instance, because
> the queue is full - the caller must apply the queue and then try to
> queue again.
> 
> This function expects the caller to sort the entries by address before
> enqueueing them. This is already done by the arch-independent code, though.
> 
> After queuing all jump_entries, the function:
> 
>     void arch_jump_label_transform_apply(void)
> 
> Applies the changes in the queue.
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Chris von Recklinghausen <crecklin@redhat.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Scott Wood <swood@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Clark Williams <williams@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/x86/include/asm/jump_label.h |  2 +
>  arch/x86/kernel/jump_label.c      | 88 +++++++++++++++++++++++++++++++
>  2 files changed, 90 insertions(+)
> 
> diff --git a/arch/x86/include/asm/jump_label.h b/arch/x86/include/asm/jump_label.h
> index 65191ce8e1cf..06c3cc22a058 100644
> --- a/arch/x86/include/asm/jump_label.h
> +++ b/arch/x86/include/asm/jump_label.h
> @@ -2,6 +2,8 @@
>  #ifndef _ASM_X86_JUMP_LABEL_H
>  #define _ASM_X86_JUMP_LABEL_H
>  
> +#define HAVE_JUMP_LABEL_BATCH
> +
>  #define JUMP_LABEL_NOP_SIZE 5
>  
>  #ifdef CONFIG_X86_64
> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
> index 2ef687db5a87..3c81cf8f06ca 100644
> --- a/arch/x86/kernel/jump_label.c
> +++ b/arch/x86/kernel/jump_label.c
> @@ -15,6 +15,7 @@
>  #include <asm/kprobes.h>
>  #include <asm/alternative.h>
>  #include <asm/text-patching.h>
> +#include <linux/slab.h>
>  
>  union jump_code_union {
>  	char code[JUMP_LABEL_NOP_SIZE];
> @@ -130,6 +131,93 @@ void arch_jump_label_transform(struct jump_entry *entry,
>  	mutex_unlock(&text_mutex);
>  }
>  
> +struct text_to_poke *entry_vector;
> +unsigned int entry_vector_max_elem __read_mostly;
> +unsigned int entry_vector_nr_elem;
> +
> +void arch_jump_label_init(void)
> +{
> +	entry_vector = (void *) __get_free_page(GFP_KERNEL);
> +
> +	if (WARN_ON_ONCE(!entry_vector))
> +		return;
> +
> +	entry_vector_max_elem = PAGE_SIZE / sizeof(struct text_to_poke);
> +	return;
> +}
> +
> +int arch_jump_label_transform_queue(struct jump_entry *entry,
> +				     enum jump_label_type type)
> +{
> +	void *entry_code;
> +	struct text_to_poke *tp;
> +
> +	/*
> +	 * Batch mode disabled before being able to allocate memory:
> +	 * Fallback to the non-batching mode.
> +	 */
> +	if (unlikely(!entry_vector_max_elem)) {
> +		if (!slab_is_available() || early_boot_irqs_disabled)
> +			goto fallback;
> +
> +		arch_jump_label_init();
> +	}
> +
> +	/*
> +	 * No more space in the vector, tell upper layer to apply
> +	 * the queue before continuing.
> +	 */
> +	if (entry_vector_nr_elem == entry_vector_max_elem)
> +		return 0;
> +
> +	tp = &entry_vector[entry_vector_nr_elem];
> +
> +	entry_code = (void *)jump_entry_code(entry);
> +
> +	/*
> +	 * The int3 handler will do a bsearch in the queue, so we need entries
> +	 * to be sorted. We can survive an unsorted list by rejecting the entry,
> +	 * forcing the generic jump_label code to apply the queue. Warning once,
> +	 * to raise the attention to the case of an unsorted entry that is
> +	 * better not happen, because, in the worst case we will perform in the
> +	 * same way as we do without batching - with some more overhead.
> +	 */
> +	if (entry_vector_nr_elem > 0) {
> +		int prev_idx = entry_vector_nr_elem - 1;
> +		struct text_to_poke *prev_tp = &entry_vector[prev_idx];
> +
> +		if (WARN_ON_ONCE(prev_tp->addr > entry_code))
> +			return 0;
> +	}
> +
> +	__jump_label_set_jump_code(entry, type,
> +				   (union jump_code_union *) &tp->opcode, 0);
> +
> +	tp->addr = entry_code;
> +	tp->handler = entry_code + JUMP_LABEL_NOP_SIZE;
> +	tp->len = JUMP_LABEL_NOP_SIZE;
> +
> +	entry_vector_nr_elem++;
> +
> +	return 1;
> +
> +fallback:
> +	arch_jump_label_transform(entry, type);
> +	return 1;
> +}
> +
> +void arch_jump_label_transform_apply(void)
> +{
> +	if (early_boot_irqs_disabled || !entry_vector_nr_elem)
> +		return;
> +
> +	mutex_lock(&text_mutex);
> +	text_poke_bp_batch(entry_vector, entry_vector_nr_elem);
> +	mutex_unlock(&text_mutex);
> +
> +	entry_vector_nr_elem = 0;
> +}
> +
>  static enum {
>  	JL_STATE_START,
>  	JL_STATE_NO_UPDATE,
> -- 
> 2.17.1
> 


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 8/9] jump_label: Batch updates if arch supports it
  2019-02-04 19:59 ` [PATCH V4 8/9] jump_label: Batch updates if arch supports it Daniel Bristot de Oliveira
@ 2019-02-06  6:34   ` Masami Hiramatsu
  2019-02-06 15:59     ` Daniel Bristot de Oliveira
  2019-02-14 13:33   ` Borislav Petkov
  1 sibling, 1 reply; 36+ messages in thread
From: Masami Hiramatsu @ 2019-02-06  6:34 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Mon,  4 Feb 2019 20:59:01 +0100
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -407,6 +407,7 @@ bool jump_label_can_update_check(struct jump_entry *entry, bool init)
>  	return 0;
>  }
>  
> +#ifndef HAVE_JUMP_LABEL_BATCH
>  static void __jump_label_update(struct static_key *key,
>  				struct jump_entry *entry,
>  				struct jump_entry *stop,
> @@ -419,6 +420,34 @@ static void __jump_label_update(struct static_key *key,
>  		}
>  	}
>  }
> +#else
> +static void __jump_label_update(struct static_key *key,
> +				struct jump_entry *entry,
> +				struct jump_entry *stop,
> +				bool init)
> +{
> +	for_each_label_entry(key, entry, stop) {
> +
> +		if (!jump_label_can_update_check(entry, init))
> +			continue;
> +
> +		if (arch_jump_label_transform_queue(entry,
> +						    jump_label_type(entry)))
> +			continue;
> +
> +		/*
> +		 * Queue's overflow: Apply the current queue, and then
> +		 * queue again. If it stills not possible to queue, BUG!
> +		 */
> +		arch_jump_label_transform_apply();
> +		if (!arch_jump_label_transform_queue(entry,
> +						     jump_label_type(entry))) {
> +			BUG();

Please do not rely on BUG(), since in either case (applied or not),
jump_label is not critical for normal operation. I think you should use
WARN_ONCE() here and lock the jump_label so that the root user can report it
to us :)
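
A rough sketch of that direction, leaving the "lock the jump_label" part
aside (illustrative only, not code from the series):

-------------------------- %< ----------------------------
		/*
		 * Applying the queue and re-queueing should not fail; if
		 * it does, warn and skip the entry instead of killing the
		 * machine.
		 */
		arch_jump_label_transform_apply();
		if (!arch_jump_label_transform_queue(entry,
						     jump_label_type(entry)))
			WARN_ONCE(1, "jump_label: can't queue entry at %pS",
				  (void *)jump_entry_code(entry));
-------------------------- >% ----------------------------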

Thank you,


-- 
Masami Hiramatsu <mhiramat@kernel.org>

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 8/9] jump_label: Batch updates if arch supports it
  2019-02-06  6:34   ` Masami Hiramatsu
@ 2019-02-06 15:59     ` Daniel Bristot de Oliveira
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-06 15:59 UTC (permalink / raw)
  To: Masami Hiramatsu
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On 2/6/19 7:34 AM, Masami Hiramatsu wrote:
> On Mon,  4 Feb 2019 20:59:01 +0100
> Daniel Bristot de Oliveira <bristot@redhat.com> wrote:
> 
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -407,6 +407,7 @@ bool jump_label_can_update_check(struct jump_entry *entry, bool init)
>>  	return 0;
>>  }
>>  
>> +#ifndef HAVE_JUMP_LABEL_BATCH
>>  static void __jump_label_update(struct static_key *key,
>>  				struct jump_entry *entry,
>>  				struct jump_entry *stop,
>> @@ -419,6 +420,34 @@ static void __jump_label_update(struct static_key *key,
>>  		}
>>  	}
>>  }
>> +#else
>> +static void __jump_label_update(struct static_key *key,
>> +				struct jump_entry *entry,
>> +				struct jump_entry *stop,
>> +				bool init)
>> +{
>> +	for_each_label_entry(key, entry, stop) {
>> +
>> +		if (!jump_label_can_update_check(entry, init))
>> +			continue;
>> +
>> +		if (arch_jump_label_transform_queue(entry,
>> +						    jump_label_type(entry)))
>> +			continue;
>> +
>> +		/*
>> +		 * Queue's overflow: Apply the current queue, and then
>> +		 * queue again. If it stills not possible to queue, BUG!
>> +		 */
>> +		arch_jump_label_transform_apply();
>> +		if (!arch_jump_label_transform_queue(entry,
>> +						     jump_label_type(entry))) {
>> +			BUG();
> 
> Please do not rely on BUG(), since in either case (applied or not),
> jump_label is not critical for normal operation. I think you should use
> WARN_ONCE() here and lock the jump_label so that the root user can report it
> to us :)

Oops! I wrote a patch changing this, removing the BUG(). It was a request from
Steve, but somehow I ended up missing it.

Thanks for reviewing this...

My bad, sorry :-(

-- Daniel
> Thank you,
> 
> 


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps
  2019-02-04 19:58 ` [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps Daniel Bristot de Oliveira
@ 2019-02-06 19:07   ` Borislav Petkov
  2019-02-08  0:11   ` Steven Rostedt
  2019-02-08  0:15   ` Steven Rostedt
  2 siblings, 0 replies; 36+ messages in thread
From: Borislav Petkov @ 2019-02-06 19:07 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Mon, Feb 04, 2019 at 08:58:58PM +0100, Daniel Bristot de Oliveira wrote:
> text_poke_bp() updates instructions on live kernel on SMP in three steps:
>  1) add a int3 trap to the address that will be patched
>  2) update all but the first byte of the patched range
>  3) replace the first byte (int3) by the first byte of
> 
> This patch creates one function for each of these steps.

s/This patch creates/Create/

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
  2019-02-05 21:13       ` Borislav Petkov
@ 2019-02-07 13:21         ` Daniel Bristot de Oliveira
  2019-02-07 14:08           ` Borislav Petkov
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-07 13:21 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On 2/5/19 10:13 PM, Borislav Petkov wrote:
> On Tue, Feb 05, 2019 at 02:50:39PM +0100, Daniel Bristot de Oliveira wrote:
>>>> +		if (jump_label_can_update_check(entry, init)) {
>>>> +			arch_jump_label_transform(entry,
>>>> +						  jump_label_type(entry));
>>>
>>> Yeah, let that one stick out.
>>
>> I did not get this part...
> 
> Just let the line stick outside of the 80 cols rule which is not a hard
> one:
> 
> +          if (jump_label_can_update_check(entry, init)) {
> +                  arch_jump_label_transform(entry, jump_label_type(entry));
> 
> 
> :-)
> 

How about this?

----------------- %< ----------------------------
Subject: jump_label: Add a jump_label_can_update() helper

Move the check if a jump_entry is valid to a function.

Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
Cc: Jiri Kosina <jkosina@suse.cz>
Cc: Josh Poimboeuf <jpoimboe@redhat.com>
Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
Cc: Chris von Recklinghausen <crecklin@redhat.com>
Cc: Jason Baron <jbaron@akamai.com>
Cc: Scott Wood <swood@redhat.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Clark Williams <williams@redhat.com>
Cc: x86@kernel.org
Cc: linux-kernel@vger.kernel.org
---
 kernel/jump_label.c | 27 +++++++++++++++++----------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/jump_label.c b/kernel/jump_label.c
index 288d630da22d..1e6f4d27e28d 100644
--- a/kernel/jump_label.c
+++ b/kernel/jump_label.c
@@ -374,22 +374,29 @@ static enum jump_label_type jump_label_type(struct jump_entry *entry)
 	return enabled ^ branch;
 }
 
+static bool jump_label_can_update(struct jump_entry *entry, bool init)
+{
+	/*
+	 * Cannot update code that was in an init text area.
+	 */
+	if (!init || jump_entry_is_init(entry))
+		return false;
+
+	if (WARN_ONCE(!kernel_text_address(jump_entry_code(entry)),
+			 "can't patch jump_label at %pS", (void *)jump_entry_code(entry)))
+		return false;
+
+	return true;
+}
+
 static void __jump_label_update(struct static_key *key,
 				struct jump_entry *entry,
 				struct jump_entry *stop,
 				bool init)
 {
 	for_each_label_entry(key, entry, stop) {
-		/*
-		 * An entry->code of 0 indicates an entry which has been
-		 * disabled because it was in an init text area.
-		 */
-		if (init || !jump_entry_is_init(entry)) {
-			if (kernel_text_address(jump_entry_code(entry)))
-				arch_jump_label_transform(entry, jump_label_type(entry));
-			else
-				WARN_ONCE(1, "can't patch jump_label at %pS",
-					  (void *)jump_entry_code(entry));
+		if (jump_label_can_update(entry, init)) {
+			arch_jump_label_transform(entry, jump_label_type(entry));
 		}
 	}
 }
---------------- >% --------------------

Thanks!

-- Daniel


^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
  2019-02-07 13:21         ` Daniel Bristot de Oliveira
@ 2019-02-07 14:08           ` Borislav Petkov
  2019-02-07 17:00             ` Daniel Bristot de Oliveira
  0 siblings, 1 reply; 36+ messages in thread
From: Borislav Petkov @ 2019-02-07 14:08 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Thu, Feb 07, 2019 at 02:21:09PM +0100, Daniel Bristot de Oliveira wrote:
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 288d630da22d..1e6f4d27e28d 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -374,22 +374,29 @@ static enum jump_label_type jump_label_type(struct jump_entry *entry)
>  	return enabled ^ branch;
>  }
>  
> +static bool jump_label_can_update(struct jump_entry *entry, bool init)
> +{
> +	/*
> +	 * Cannot update code that was in an init text area.
> +	 */
> +	if (!init || jump_entry_is_init(entry))

Shouldn't this be &&

?

> +		return false;
> +
> +	if (WARN_ONCE(!kernel_text_address(jump_entry_code(entry)),
> +			 "can't patch jump_label at %pS", (void *)jump_entry_code(entry)))
> +		return false;

Yeah, I think that this way of writing it is less readable than:

	if (!kernel_text_address(jump_entry_code(entry))) {
		WARN_ONCE(1, "can't patch jump_label at %pS", (void *)jump_entry_code(entry));
		return false;
	}

> +		if (jump_label_can_update(entry, init)) {
> +			arch_jump_label_transform(entry, jump_label_type(entry));

Yap.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
  2019-02-07 14:08           ` Borislav Petkov
@ 2019-02-07 17:00             ` Daniel Bristot de Oliveira
  2019-02-07 17:08               ` Borislav Petkov
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-07 17:00 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On 2/7/19 3:08 PM, Borislav Petkov wrote:
> On Thu, Feb 07, 2019 at 02:21:09PM +0100, Daniel Bristot de Oliveira wrote:
>> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
>> index 288d630da22d..1e6f4d27e28d 100644
>> --- a/kernel/jump_label.c
>> +++ b/kernel/jump_label.c
>> @@ -374,22 +374,29 @@ static enum jump_label_type jump_label_type(struct jump_entry *entry)
>>  	return enabled ^ branch;
>>  }
>>  
>> +static bool jump_label_can_update(struct jump_entry *entry, bool init)
>> +{
>> +	/*
>> +	 * Cannot update code that was in an init text area.
>> +	 */
>> +	if (!init || jump_entry_is_init(entry))
> 
> Shouldn't this be &&
> 
> ?

Oops! should be &&! sorry.

>> +		return false;
>> +
>> +	if (WARN_ONCE(!kernel_text_address(jump_entry_code(entry)),
>> +			 "can't patch jump_label at %pS", (void *)jump_entry_code(entry)))
>> +		return false;
> 
> Yeah, I think that this way of writing it is less readable than:
> 
> 	if (!kernel_text_address(jump_entry_code(entry))) {
> 		WARN_ONCE(1, "can't patch jump_label at %pS", (void *)jump_entry_code(entry));
> 		return false;
> 	}

It is taking 95 characters. In this case, wouldn't it be better to break?

        if (!kernel_text_address(jump_entry_code(entry))) {
                WARN_ONCE(1, "can't patch jump_label at %pS",
                          (void *)jump_entry_code(entry));
                return false;
        }

I agree your suggestion is better... just confirming that 95 is not too long...

> 
>> +		if (jump_label_can_update(entry, init)) {
>> +			arch_jump_label_transform(entry, jump_label_type(entry));
> 
> Yap.
> 
> Thx.
> 

Thanks!

-- Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper
  2019-02-07 17:00             ` Daniel Bristot de Oliveira
@ 2019-02-07 17:08               ` Borislav Petkov
  0 siblings, 0 replies; 36+ messages in thread
From: Borislav Petkov @ 2019-02-07 17:08 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Thu, Feb 07, 2019 at 06:00:53PM +0100, Daniel Bristot de Oliveira wrote:
> It is taking 95 characters. In this case, wouldn't it be better to break?
> 
>         if (!kernel_text_address(jump_entry_code(entry))) {
>                 WARN_ONCE(1, "can't patch jump_label at %pS",
>                           (void *)jump_entry_code(entry));
>                 return false;
>         }
> 
> I agree your suggestion is better... just confirming that 95 is not too long...

Well, current monitors are always > 80 cols unless someone has a tmux
with a gazillion windows in it (yeah, I know of a couple people who do
that :)) and at least to me, breaking the line like that looks less
readable because I need to look at the two lines to parse what's going
on.

Vs when you have one long line, so you look at

	WARN_ONCE(...

and go, oh, ok, we're warning here and nothing else. Without paying too
much attention to the actual arguments of the WARN.

But this is just me.

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps
  2019-02-04 19:58 ` [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps Daniel Bristot de Oliveira
  2019-02-06 19:07   ` Borislav Petkov
@ 2019-02-08  0:11   ` Steven Rostedt
  2019-02-08  0:15   ` Steven Rostedt
  2 siblings, 0 replies; 36+ messages in thread
From: Steven Rostedt @ 2019-02-08  0:11 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Mon,  4 Feb 2019 20:58:58 +0100
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index d458c7973c56..202af29c43c0 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -767,6 +767,29 @@ int poke_int3_handler(struct pt_regs *regs)
>  
>  }
>  
> +static void text_poke_bp_set_handler(void *addr, void *handler,
> +				     unsigned char int3)
> +{
> +	bp_int3_handler = handler;
> +	bp_int3_addr = (u8 *)addr + sizeof(int3);

Why add the above update to the handler in this function? It looks
strange in this patch. Then I thought, "hmm, maybe it has a reason to
be here in other patches". Then I see in patch 7, you *REMOVE* these
awkward lines from this function! Let's not move them here to begin
with.

We then don't even need to pass in "handler". And perhaps rename it to
just "text_poke_bp_add_int3()"?
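
In other words, with those two lines gone the helper would reduce to
something like this (sketch):

-------------------------- %< ----------------------------
static void text_poke_bp_add_int3(void *addr, unsigned char int3)
{
	text_poke(addr, &int3, sizeof(int3));
}
-------------------------- >% ----------------------------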

-- Steve


> +	text_poke(addr, &int3, sizeof(int3));
> +}
> +
> +static void patch_all_but_first_byte(void *addr, const void *opcode,
> +				     size_t len, unsigned char int3)
> +{
> +	/* patch all but the first byte */
> +	text_poke((char *)addr + sizeof(int3),
> +		  (const char *) opcode + sizeof(int3),
> +		  len - sizeof(int3));
> +}
> +

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps
  2019-02-04 19:58 ` [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps Daniel Bristot de Oliveira
  2019-02-06 19:07   ` Borislav Petkov
  2019-02-08  0:11   ` Steven Rostedt
@ 2019-02-08  0:15   ` Steven Rostedt
  2019-02-15 12:47     ` Daniel Bristot de Oliveira
  2 siblings, 1 reply; 36+ messages in thread
From: Steven Rostedt @ 2019-02-08  0:15 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Mon,  4 Feb 2019 20:58:58 +0100
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

>  
> +static void text_poke_bp_set_handler(void *addr, void *handler,
> +				     unsigned char int3)
> +{
> +	bp_int3_handler = handler;
> +	bp_int3_addr = (u8 *)addr + sizeof(int3);
> +	text_poke(addr, &int3, sizeof(int3));
> +}
> +

> +
> +static void patch_first_byte(void *addr, const void *opcode, unsigned char int3)
> +{
> +	/* patch the first byte */
> +	text_poke(addr, opcode, sizeof(int3));
> +}

Hmm, perhaps get rid of the first function entirely, and just do...
(although why have the "int3" here anyway?)

> +
>  /**
>   * text_poke_bp() -- update instructions on live kernel on SMP
>   * @addr:	address to patch
> @@ -791,27 +814,21 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
>  {
>  	unsigned char int3 = 0xcc;
>  
> -	bp_int3_handler = handler;
> -	bp_int3_addr = (u8 *)addr + sizeof(int3);
> -	bp_patching_in_progress = true;
> -
>  	lockdep_assert_held(&text_mutex);
>  
> +	bp_patching_in_progress = true;
>  	/*
>  	 * Corresponding read barrier in int3 notifier for making sure the
>  	 * in_progress and handler are correctly ordered wrt. patching.
>  	 */
>  	smp_wmb();
>  
> -	text_poke(addr, &int3, sizeof(int3));
> +	text_poke_bp_set_handler(addr, handler, int3);

	patch_first_byte(addr, &int3, int3);

Which could be just:

	patch_first_byte(addr, &int3);

if we remove passing in int3 (for its size?).

-- Steve

>  
>  	on_each_cpu(do_sync_core, NULL, 1);
>  
>  	if (len - sizeof(int3) > 0) {
> -		/* patch all but the first byte */
> -		text_poke((char *)addr + sizeof(int3),
> -			  (const char *) opcode + sizeof(int3),
> -			  len - sizeof(int3));
> +		patch_all_but_first_byte(addr, opcode, len, int3);
>  		/*
>  		 * According to Intel, this core syncing is very likely
>  		 * not necessary and we'd be safe even without it. But
> @@ -820,8 +837,7 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
>  		on_each_cpu(do_sync_core, NULL, 1);
>  	}
>  
> -	/* patch the first byte */
> -	text_poke(addr, opcode, sizeof(int3));
> +	patch_first_byte(addr, opcode, int3);
>  
>  	on_each_cpu(do_sync_core, NULL, 1);
>  	/*


^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 7/9] x86/alternative: Batch of patch operations
  2019-02-04 19:59 ` [PATCH V4 7/9] x86/alternative: Batch of patch operations Daniel Bristot de Oliveira
@ 2019-02-14 12:53   ` Borislav Petkov
  2019-02-14 14:06     ` Steven Rostedt
  0 siblings, 1 reply; 36+ messages in thread
From: Borislav Petkov @ 2019-02-14 12:53 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Mon, Feb 04, 2019 at 08:59:00PM +0100, Daniel Bristot de Oliveira wrote:
> Currently, the patch of an address is done in three steps:

"Currently, the kernel code at a specific address is patched in three steps:"

> -- Pseudo-code #1 - Current implementation ---
>         1) add an int3 trap to the address that will be patched
>             sync cores (send IPI to all other CPUs)
>         2) update all but the first byte of the patched range
>             sync cores (send IPI to all other CPUs)
>         3) replace the first byte (int3) by the first byte of replacing opcode
>             sync cores (send IPI to all other CPUs)
> -- Pseudo-code #1 ---
> 
> When a static key has more than one entry, these steps are called once for

"When a static key is used in multiple locations and thus multiple
entries need to be patched... "

> each entry. The number of IPIs then is linear with regard to the number 'n' of
> entries of a key: O(n*3), which is O(n).
> 
> This algorithm works fine for the update of a single key. But we think

Please no "we" in the commit message - instead, use passive
formulations and:

 "Describe your changes in imperative mood, e.g. "make xyzzy do frotz"
  instead of "[This patch] makes xyzzy do frotz" or "[I] changed xyzzy
  to do frotz", as if you are giving orders to the codebase to change
  its behaviour."

> it is possible to optimize the case in which a static key has more than
> one entry. For instance, the sched_schedstats jump label has 56 entries
> in my (updated) fedora kernel, resulting in 168 IPIs for each CPU in
> which the thread that is enabling the key is _not_ running.
> 
> With this patch, rather than receiving a single patch to be processed, a vector

"After this change, ..."

> of patches is passed, enabling the rewrite of the pseudo-code #1 in this

"patches" is a very overloaded term. Please change.

> way:
> 
> -- Pseudo-code #2 - This patch  ---
> 1)  for each patch in the vector:
>         add an int3 trap to the address that will be patched
> 
>     sync cores (send IPI to all other CPUs)
> 
> 2)  for each patch in the vector:
>         update all but the first byte of the patched range
> 
>     sync cores (send IPI to all other CPUs)
> 
> 3)  for each patch in the vector:
>         replace the first byte (int3) by the first byte of replacing opcode
> 
>     sync cores (send IPI to all other CPUs)
> -- Pseudo-code #2 - This patch  ---
> 
> Doing the update in this way, the number of IPI becomes O(3) with regard
> to the number of keys, which is O(1).
> 
> The batch mode is done with the function text_poke_bp_batch(), that receives
> two arguments: a vector of "struct text_to_poke", and the number of entries
> in the vector.

This should rather go in a comment above the function but, looking at it,
it already has an explanation so there's no need to repeat it here in the
commit message.

> The vector must be sorted by the addr field of the text_to_poke structure,
> enabling the binary search of a handler in the poke_int3_handler function
> (a fast path).

That absolutely needs to be somewhere above the vector struct definition!

> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Chris von Recklinghausen <crecklin@redhat.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Scott Wood <swood@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Clark Williams <williams@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
>  arch/x86/include/asm/text-patching.h |  15 ++++
>  arch/x86/kernel/alternative.c        | 124 ++++++++++++++++++++++-----
>  2 files changed, 118 insertions(+), 21 deletions(-)
> 
> diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
> index e85ff65c43c3..42ea7846df33 100644
> --- a/arch/x86/include/asm/text-patching.h
> +++ b/arch/x86/include/asm/text-patching.h
> @@ -18,6 +18,20 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
>  #define __parainstructions_end	NULL
>  #endif
>  
> +/*
> + * Currently, the max observed size in the kernel code is
> + * JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5.
> + * Raise it if needed.
> + */
> +#define POKE_MAX_OPCODE_SIZE	5
> +
> +struct text_to_poke {

That name needs bikeshedding. Maybe struct patch_loc is more fitting as
it is a patch location descriptor, AFAICT.

> +	void *handler;

Oh my, took me a while to realize that this is not really a function
which is supposed to handle something but the place we go to when the
temporary breakpoint hits. Please change that name and all the code
which calls it handler. It is very misleading.

> +	void *addr;
> +	size_t len;
> +	const char opcode[POKE_MAX_OPCODE_SIZE];
> +};
> +
>  extern void *text_poke_early(void *addr, const void *opcode, size_t len);
>  
>  /*
> @@ -37,6 +51,7 @@ extern void *text_poke_early(void *addr, const void *opcode, size_t len);
>  extern void *text_poke(void *addr, const void *opcode, size_t len);
>  extern int poke_int3_handler(struct pt_regs *regs);
>  extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
> +extern void text_poke_bp_batch(struct text_to_poke *tp, unsigned int nr_entries);
>  extern int after_bootmem;
>  
>  #endif /* _ASM_X86_TEXT_PATCHING_H */
> diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> index 202af29c43c0..318b6867dc5a 100644
> --- a/arch/x86/kernel/alternative.c
> +++ b/arch/x86/kernel/alternative.c
> @@ -11,6 +11,8 @@
>  #include <linux/stop_machine.h>
>  #include <linux/slab.h>
>  #include <linux/kdebug.h>
> +#include <linux/kprobes.h>
> +#include <linux/bsearch.h>
>  #include <asm/text-patching.h>
>  #include <asm/alternative.h>
>  #include <asm/sections.h>
> @@ -738,10 +740,26 @@ static void do_sync_core(void *info)
>  }
>  
>  static bool bp_patching_in_progress;
> -static void *bp_int3_handler, *bp_int3_addr;
> +static struct text_to_poke *bp_int3_tpv;
> +static unsigned int bp_int3_tpv_nr;

Those names need bikeshedding. See below.

> +
> +static int text_bp_batch_bsearch(const void *key, const void *elt)

This is a local function: patch_cmp() is fine. And if we end up calling
the vector element patch_loc then patch_loc_cmp() is perfectly fitting
:)

> +{
> +	struct text_to_poke *tp = (struct text_to_poke *) elt;
> +
> +	if (key < tp->addr)
> +		return -1;
> +	if (key > tp->addr)
> +		return 1;
> +	return 0;
> +}
> +NOKPROBE_SYMBOL(text_bp_batch_bsearch);
>  
>  int poke_int3_handler(struct pt_regs *regs)
>  {
> +	void *ip;
> +	struct text_to_poke *tp;

Flip the order of those pls. We love the reverse xmas tree :)

> +
>  	/*
>  	 * Having observed our INT3 instruction, we now must observe
>  	 * bp_patching_in_progress.
> @@ -757,21 +775,40 @@ int poke_int3_handler(struct pt_regs *regs)
>  	if (likely(!bp_patching_in_progress))
>  		return 0;
>  
> -	if (user_mode(regs) || regs->ip != (unsigned long)bp_int3_addr)
> +	if (user_mode(regs))
>  		return 0;
>  
> -	/* set up the specified breakpoint handler */
> -	regs->ip = (unsigned long) bp_int3_handler;
> +	ip = (void *) regs->ip - sizeof(unsigned char);
>  
> -	return 1;
> +	/*
> +	 * Skip the binary search if there is a single member in the vector.
> +	 */
> +	if (unlikely(bp_int3_tpv_nr == 1))

Eeww, a global vector length counter. AFAICT, this relies on the fact
that patching happens sequentially. And I guess it does because we grab
text_mutex.

Ok, it is not so much better but let's at least put all those things you
need during patching in a structure so that you can access them in the
different handlers:

static struct bp_patching_desc {
	struct patch_loc *vec;
	int nr_entries;
	bool in_progress;
	...
} bp_patching;

so that you can populate it in text_poke_bp_batch() and query it in the
rest of the code. We'll redesign the whole thing when we decide we wanna
do parallel patching someday.
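
poke_int3_handler() could then pull everything it needs from that one
descriptor, along these lines (a sketch; the field names still follow the
posted patch):

-------------------------- %< ----------------------------
	if (likely(!bp_patching.in_progress))
		return 0;

	if (user_mode(regs))
		return 0;

	tp = bsearch(ip, bp_patching.vec, bp_patching.nr_entries,
		     sizeof(*tp), patch_cmp);
	if (!tp)
		return 0;

	regs->ip = (unsigned long)tp->handler;
	return 1;
-------------------------- >% ----------------------------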

> +		goto single_poke;
> +
> +	tp = bsearch(ip, bp_int3_tpv, bp_int3_tpv_nr,
> +		     sizeof(struct text_to_poke),
> +		     text_bp_batch_bsearch);
> +	if (tp) {

Flip check:

	if (!tp)
		return 0;

and save yourself an indentation level.

> +		/* set up the specified breakpoint handler */
> +		regs->ip = (unsigned long) tp->handler;
> +		return 1;
> +	}
>  
> +	return 0;
> +
> +single_poke:
> +	if (ip == bp_int3_tpv->addr) {
> +		regs->ip = (unsigned long) bp_int3_tpv->handler;
> +		return 1;
> +	}
> +
> +	return 0;
>  }
>  
>  static void text_poke_bp_set_handler(void *addr, void *handler,
>  				     unsigned char int3)
>  {
> -	bp_int3_handler = handler;
> -	bp_int3_addr = (u8 *)addr + sizeof(int3);
>  	text_poke(addr, &int3, sizeof(int3));
>  }
>  
> @@ -791,31 +828,36 @@ static void patch_first_byte(void *addr, const void *opcode, unsigned char int3)
>  }
>  
>  /**
> - * text_poke_bp() -- update instructions on live kernel on SMP
> - * @addr:	address to patch
> - * @opcode:	opcode of new instruction
> - * @len:	length to copy
> - * @handler:	address to jump to when the temporary breakpoint is hit
> + * text_poke_bp_batch() -- update instructions on live kernel on SMP
> + * @tp:		vector of instructions to patch
> + * @nr_entries:	number of entries in the vector
>   *
>   * Modify multi-byte instruction by using int3 breakpoint on SMP.
>   * We completely avoid stop_machine() here, and achieve the
>   * synchronization using int3 breakpoint.
>   *
>   * The way it is done:
> - *	- add a int3 trap to the address that will be patched
> + *	- For each entry in the vector:
> + *	    - add a int3 trap to the address that will be patched
>   *	- sync cores
> - *	- update all but the first byte of the patched range
> + *	- For each entry in the vector:
> + *	    - update all but the first byte of the patched range
>   *	- sync cores
> - *	- replace the first byte (int3) by the first byte of
> - *	  replacing opcode
> + *	- For each entry in the vector:
> + *	    - replace the first byte (int3) by the first byte of
> + *	      replacing opcode
>   *	- sync cores
>   */
> -void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> +void text_poke_bp_batch(struct text_to_poke *tp, unsigned int nr_entries)
>  {
> +	unsigned int i;
>  	unsigned char int3 = 0xcc;
> +	int patched_all_but_first = 0;

Please sort the function-local variable declarations in reverse christmas
tree order:

	<type A> longest_variable_name;
	<type B> shorter_var_name;
	<type C> even_shorter;
	<type D> i;


>  	lockdep_assert_held(&text_mutex);
>  
> +	bp_int3_tpv = tp;
> +	bp_int3_tpv_nr = nr_entries;
>  	bp_patching_in_progress = true;
>  	/*
>  	 * Corresponding read barrier in int3 notifier for making sure the
> @@ -823,12 +865,20 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
>  	 */
>  	smp_wmb();
>  
> -	text_poke_bp_set_handler(addr, handler, int3);
> +	for (i = 0; i < nr_entries; i++)
> +		text_poke_bp_set_handler(tp[i].addr, tp[i].handler, int3);

Again, that "handler" is misleading here. But you get the idea.

Also, that second argument in text_poke_bp_set_handler() is unused. What
gives?

Also, we're passing around this silly int3 char. Why?

It can really be a global:

	static const unsigned char int3 = 0xcc;

and be referenced from everywhere. One less function arg to pay
attention to.

>  	on_each_cpu(do_sync_core, NULL, 1);
>  
> -	if (len - sizeof(int3) > 0) {
> -		patch_all_but_first_byte(addr, opcode, len, int3);
> +	for (i = 0; i < nr_entries; i++) {
> +		if (tp[i].len - sizeof(int3) > 0) {
> +			patch_all_but_first_byte(tp[i].addr, tp[i].opcode,
> +						 tp[i].len, int3);
> +			patched_all_but_first++;
> +		}
> +	}
> +
> +	if (patched_all_but_first) {
>  		/*
>  		 * According to Intel, this core syncing is very likely
>  		 * not necessary and we'd be safe even without it. But
> @@ -837,14 +887,46 @@ void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
>  		on_each_cpu(do_sync_core, NULL, 1);
>  	}
>  
> -	patch_first_byte(addr, opcode, int3);
> +	for (i = 0; i < nr_entries; i++)
> +		patch_first_byte(tp[i].addr, tp[i].opcode, int3);
>  
>  	on_each_cpu(do_sync_core, NULL, 1);
>  	/*
>  	 * sync_core() implies an smp_mb() and orders this store against
>  	 * the writing of the new instruction.
>  	 */
> +	bp_int3_tpv_nr = 0;
> +	bp_int3_tpv = NULL;
>  	bp_patching_in_progress = false;
> +}
> +
> +/**
> + * text_poke_bp() -- update instructions on live kernel on SMP
> + * @addr:	address to patch
> + * @opcode:	opcode of new instruction
> + * @len:	length to copy
> + * @handler:	address to jump to when the temporary breakpoint is hit
> + *
> + * Update a single instruction using an on-stack vector, avoiding
> + * dynamically allocated memory. This function should be used when it is
> + * not possible to allocate memory.
> + */
> +void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler)
> +{
> +	struct text_to_poke tp = {
> +		.handler = handler,
> +		.addr = addr,
> +		.len = len,
> +	};
> +
> +	if (len > POKE_MAX_OPCODE_SIZE) {
> +		WARN_ONCE(1, "len is larger than %d\n", POKE_MAX_OPCODE_SIZE);
> +		return NULL;
> +	}

	lockdep_assert_held(&text_mutex);

> +	memcpy((void *)tp.opcode, opcode, len);
> +
> +	text_poke_bp_batch(&tp, 1);
>  
>  	return addr;
>  }
> -- 
> 2.17.1
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 8/9] jump_label: Batch updates if arch supports it
  2019-02-04 19:59 ` [PATCH V4 8/9] jump_label: Batch updates if arch supports it Daniel Bristot de Oliveira
  2019-02-06  6:34   ` Masami Hiramatsu
@ 2019-02-14 13:33   ` Borislav Petkov
  1 sibling, 0 replies; 36+ messages in thread
From: Borislav Petkov @ 2019-02-14 13:33 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Mon, Feb 04, 2019 at 08:59:01PM +0100, Daniel Bristot de Oliveira wrote:
> If the architecture supports the batching of jump label updates, use it!
> 
> An easy way to see the benefits of this patch is switching the
> schedstats on and off. For instance:
> 
> -------------------------- %< ----------------------------
> #!/bin/bash
> 
> while [ true ]; do
>     sysctl -w kernel.sched_schedstats=1
>     sleep 2
>     sysctl -w kernel.sched_schedstats=0
>     sleep 2
> done
> -------------------------- >% ----------------------------
> 
> while watching the IPI count:
> 
> -------------------------- %< ----------------------------
> # watch -n1 "cat /proc/interrupts | grep Function"
> -------------------------- >% ----------------------------
> 
> With the current code, it is possible to see around 168 IPIs every 2 seconds,
> while with this patch the number of IPIs drops to 3 every 2 seconds.
> 
> Regarding the performance impact of this patch set, I made two measurements:
> 
>     The time to update a key (the task that is causing the change)
>     The time to run the int3 handler (the side effect on a thread that
>                                       hits the code being changed)
> 
> The schedstats static key was chosen as the key to be switched on and off.
> The reason is that it is used in more than 56 places, in a hot path. The
> change in the schedstats static key will be done with the following command:
> 
> while [ true ]; do
>     sysctl -w kernel.sched_schedstats=1
>     usleep 500000
>     sysctl -w kernel.sched_schedstats=0
>     usleep 500000
> done
> 
> In this way, the key will be updated twice per second. To force hits on the
> int3 handler, the system will also run a kernel compilation with two jobs per
> CPU. The test machine is a two-node/24-CPU box with an Intel Xeon processor
> @2.27GHz.
> 
> Regarding the update part, on average, the regular kernel takes 57 ms to update
> the schedstats key, while the kernel with the batch updates takes just 1.4 ms
> on average. Although it seems to be too good to be true, it makes sense: the
> schedstats key is used in 56 places, so the current implementation was expected
> to take around 56 times longer to update the key, as the IPIs are the most
> expensive part of the update.
> 
> Regarding the int3 handler, the non-batch handler takes 45 ns on average, while
> the batch version takes around 180 ns. At first glance, that seems like a high
> value, but it is not, considering that it covers 56 patched entries rather than
> one: it is only four times slower. This is possible because the handler uses a
> binary search in the vector: log2(56) = 5.8 comparisons, so an overhead of only
> a few times was expected.
> 
> (voice of tv propaganda) But that is not all! As the int3 handler stays in
> place for a shorter period (because the update part completes faster), the
> number of hits in the int3 handler decreased by 10%.
> 
> The question then is: Is it worth paying the price of "135 ns" more in the int3
> handler?
> 
> Considering that, in this test case, we are saving the handling of 53 IPIs,
> which takes more than those 135 ns, it seems a meager price to pay. Moreover,
> the test case was forcing hits on the int3 handler; in practice, they do not
> happen that often, while the IPI reaches all CPUs, whether they hit the int3
> handler or not.
> 
> For instance, on an isolated CPU with a process running in user-space
> (nohz_full use-case), the chances of hitting the int3 handler are nearly zero,
> while there is no way to avoid the IPIs. By bounding the IPIs, we improve this
> scenario a lot.

I like that commit message.

> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
> Cc: Masami Hiramatsu <mhiramat@kernel.org>
> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
> Cc: Jiri Kosina <jkosina@suse.cz>
> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
> Cc: Chris von Recklinghausen <crecklin@redhat.com>
> Cc: Jason Baron <jbaron@akamai.com>
> Cc: Scott Wood <swood@redhat.com>
> Cc: Marcelo Tosatti <mtosatti@redhat.com>
> Cc: Clark Williams <williams@redhat.com>
> Cc: x86@kernel.org
> Cc: linux-kernel@vger.kernel.org
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
> 
> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>

What happened here? Your signing tool went nuts? :-)

> ---
>  include/linux/jump_label.h |  3 +++
>  kernel/jump_label.c        | 29 +++++++++++++++++++++++++++++
>  2 files changed, 32 insertions(+)
> 
> diff --git a/include/linux/jump_label.h b/include/linux/jump_label.h
> index 7e91af98bbb1..b3dfce98edb7 100644
> --- a/include/linux/jump_label.h
> +++ b/include/linux/jump_label.h
> @@ -215,6 +215,9 @@ extern void arch_jump_label_transform(struct jump_entry *entry,
>  				      enum jump_label_type type);
>  extern void arch_jump_label_transform_static(struct jump_entry *entry,
>  					     enum jump_label_type type);
> +extern int arch_jump_label_transform_queue(struct jump_entry *entry,
> +					    enum jump_label_type type);
> +extern void arch_jump_label_transform_apply(void);
>  extern int jump_label_text_reserved(void *start, void *end);
>  extern void static_key_slow_inc(struct static_key *key);
>  extern void static_key_slow_dec(struct static_key *key);
> diff --git a/kernel/jump_label.c b/kernel/jump_label.c
> index 53b7c85c0b09..944c75a0b09b 100644
> --- a/kernel/jump_label.c
> +++ b/kernel/jump_label.c
> @@ -407,6 +407,7 @@ bool jump_label_can_update_check(struct jump_entry *entry, bool init)
>  	return 0;
>  }
>  
> +#ifndef HAVE_JUMP_LABEL_BATCH
>  static void __jump_label_update(struct static_key *key,
>  				struct jump_entry *entry,
>  				struct jump_entry *stop,
> @@ -419,6 +420,34 @@ static void __jump_label_update(struct static_key *key,
>  		}
>  	}
>  }
> +#else
> +static void __jump_label_update(struct static_key *key,
> +				struct jump_entry *entry,
> +				struct jump_entry *stop,
> +				bool init)
> +{
> +	for_each_label_entry(key, entry, stop) {
> +
> +		if (!jump_label_can_update_check(entry, init))
> +			continue;
> +
> +		if (arch_jump_label_transform_queue(entry,
> +						    jump_label_type(entry)))

Let those lines stick out - better than breaking them like that.
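I.e.:

		if (arch_jump_label_transform_queue(entry, jump_label_type(entry)))
			continue;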

> +			continue;
> +
> +		/*
> +		 * Queue overflow: apply the current queue, and then
> +		 * queue again. If it is still not possible to queue, BUG!
> +		 */
> +		arch_jump_label_transform_apply();
> +		if (!arch_jump_label_transform_queue(entry,
> +						     jump_label_type(entry))) {

Ditto.

> +			BUG();
> +		}
> +	}
> +	arch_jump_label_transform_apply();
> +}
> +#endif
>  
>  void __init jump_label_init(void)
>  {
> -- 
> 2.17.1
> 

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 7/9] x86/alternative: Batch of patch operations
  2019-02-14 12:53   ` Borislav Petkov
@ 2019-02-14 14:06     ` Steven Rostedt
  2019-02-14 14:30       ` Borislav Petkov
  0 siblings, 1 reply; 36+ messages in thread
From: Steven Rostedt @ 2019-02-14 14:06 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Daniel Bristot de Oliveira, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Greg Kroah-Hartman,
	Masami Hiramatsu, Jiri Kosina, Josh Poimboeuf,
	Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Thu, 14 Feb 2019 13:53:30 +0100
Borislav Petkov <bp@alien8.de> wrote:

> >  arch/x86/include/asm/text-patching.h |  15 ++++
> >  arch/x86/kernel/alternative.c        | 124 ++++++++++++++++++++++-----
> >  2 files changed, 118 insertions(+), 21 deletions(-)
> > 
> > diff --git a/arch/x86/include/asm/text-patching.h b/arch/x86/include/asm/text-patching.h
> > index e85ff65c43c3..42ea7846df33 100644
> > --- a/arch/x86/include/asm/text-patching.h
> > +++ b/arch/x86/include/asm/text-patching.h
> > @@ -18,6 +18,20 @@ static inline void apply_paravirt(struct paravirt_patch_site *start,
> >  #define __parainstructions_end	NULL
> >  #endif
> >  
> > +/*
> > + * Currently, the max observed size in the kernel code is
> > + * JUMP_LABEL_NOP_SIZE/RELATIVEJUMP_SIZE, which are 5.
> > + * Raise it if needed.
> > + */
> > +#define POKE_MAX_OPCODE_SIZE	5
> > +
> > +struct text_to_poke {  
> 
> That name needs bikeshedding. Maybe struct patch_loc is more fitting as
> it is a patch location descriptor, AFAICT.

Although you did say above that "patch" is an overloaded term. Also, I
do find the current name rather descriptive.

> 
> > +	void *handler;  
> 
> Oh my, took me a while to realize that this is not really a function
> which is supposed to handle something but the place we go to when the
> temporary breakpoint hits. Please change that name and all the code
> which calls it handler. It is very misleading.

	void *trampoline;

?

> 
> > +	void *addr;
> > +	size_t len;
> > +	const char opcode[POKE_MAX_OPCODE_SIZE];
> > +};
> > +
> >  extern void *text_poke_early(void *addr, const void *opcode, size_t len);
> >  
> >  /*
> > @@ -37,6 +51,7 @@ extern void *text_poke_early(void *addr, const void *opcode, size_t len);
> >  extern void *text_poke(void *addr, const void *opcode, size_t len);
> >  extern int poke_int3_handler(struct pt_regs *regs);
> >  extern void *text_poke_bp(void *addr, const void *opcode, size_t len, void *handler);
> > +extern void text_poke_bp_batch(struct text_to_poke *tp, unsigned int nr_entries);
> >  extern int after_bootmem;
> >  
> >  #endif /* _ASM_X86_TEXT_PATCHING_H */
> > diff --git a/arch/x86/kernel/alternative.c b/arch/x86/kernel/alternative.c
> > index 202af29c43c0..318b6867dc5a 100644
> > --- a/arch/x86/kernel/alternative.c
> > +++ b/arch/x86/kernel/alternative.c
> > @@ -11,6 +11,8 @@
> >  #include <linux/stop_machine.h>
> >  #include <linux/slab.h>
> >  #include <linux/kdebug.h>
> > +#include <linux/kprobes.h>
> > +#include <linux/bsearch.h>
> >  #include <asm/text-patching.h>
> >  #include <asm/alternative.h>
> >  #include <asm/sections.h>
> > @@ -738,10 +740,26 @@ static void do_sync_core(void *info)
> >  }
> >  
> >  static bool bp_patching_in_progress;
> > -static void *bp_int3_handler, *bp_int3_addr;
> > +static struct text_to_poke *bp_int3_tpv;
> > +static unsigned int bp_int3_tpv_nr;  
> 
> Those names need bikeshedding. See below.

But they are named after you! "bp_" :-)

-- Steve

> 
> > +
> > +static int text_bp_batch_bsearch(const void *key, const void *elt)  
> 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 7/9] x86/alternative: Batch of patch operations
  2019-02-14 14:06     ` Steven Rostedt
@ 2019-02-14 14:30       ` Borislav Petkov
  2019-02-14 14:40         ` Steven Rostedt
  0 siblings, 1 reply; 36+ messages in thread
From: Borislav Petkov @ 2019-02-14 14:30 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Daniel Bristot de Oliveira, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Greg Kroah-Hartman,
	Masami Hiramatsu, Jiri Kosina, Josh Poimboeuf,
	Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Thu, Feb 14, 2019 at 09:06:20AM -0500, Steven Rostedt wrote:
> Although you did say above that "patch" is an overloaded term. Also, I
> do find the current name rather descriptive.

When you look at "text_to_poke" as a struct name, you go, WTF is that?
The whole text? Some of it?

It should have the word "location" in it, to be clearer IMO.

> 	void *trampoline;

Well, it ain't a trampoline either. It is a "temporary location to go to while
patching is going on". Now make that a member name!

:-)))

> But they are named after you! "bp_" :-)

Uuh, you shouldn't have! :-)

Then they *must* be changed immediately!

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 7/9] x86/alternative: Batch of patch operations
  2019-02-14 14:30       ` Borislav Petkov
@ 2019-02-14 14:40         ` Steven Rostedt
  2019-02-14 14:54           ` Borislav Petkov
  2019-02-15 16:00           ` Daniel Bristot de Oliveira
  0 siblings, 2 replies; 36+ messages in thread
From: Steven Rostedt @ 2019-02-14 14:40 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Daniel Bristot de Oliveira, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Greg Kroah-Hartman,
	Masami Hiramatsu, Jiri Kosina, Josh Poimboeuf,
	Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Thu, 14 Feb 2019 15:30:30 +0100
Borislav Petkov <bp@alien8.de> wrote:

> > 	void *trampoline;  
> 
> Well, it ain't a trampoline either. It is a "temporary location to go to while
> patching is going on".

	void *detour;


-- Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 7/9] x86/alternative: Batch of patch operations
  2019-02-14 14:40         ` Steven Rostedt
@ 2019-02-14 14:54           ` Borislav Petkov
  2019-02-15 16:00           ` Daniel Bristot de Oliveira
  1 sibling, 0 replies; 36+ messages in thread
From: Borislav Petkov @ 2019-02-14 14:54 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Daniel Bristot de Oliveira, linux-kernel, Thomas Gleixner,
	Ingo Molnar, H. Peter Anvin, Greg Kroah-Hartman,
	Masami Hiramatsu, Jiri Kosina, Josh Poimboeuf,
	Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Thu, Feb 14, 2019 at 09:40:11AM -0500, Steven Rostedt wrote:
> 	void *detour;

Yap, with a nice comment on top and you're golden! :-)

Thx.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 3/9] x86/jump_label: Move checking code away from __jump_label_transform()
  2019-02-05  7:33   ` Borislav Petkov
  2019-02-05 15:22     ` Daniel Bristot de Oliveira
@ 2019-02-15 10:05     ` Daniel Bristot de Oliveira
  1 sibling, 0 replies; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-15 10:05 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Steven Rostedt (VMware),
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On 2/5/19 8:33 AM, Borislav Petkov wrote:
> On Mon, Feb 04, 2019 at 08:58:56PM +0100, Daniel Bristot de Oliveira wrote:
>> Move the check of the current code, before updating an entry, to specialized
>> functions. No changes in the method, only code relocation.
>>
>> Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
>> Cc: Masami Hiramatsu <mhiramat@kernel.org>
>> Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
>> Cc: Jiri Kosina <jkosina@suse.cz>
>> Cc: Josh Poimboeuf <jpoimboe@redhat.com>
>> Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
>> Cc: Chris von Recklinghausen <crecklin@redhat.com>
>> Cc: Jason Baron <jbaron@akamai.com>
>> Cc: Scott Wood <swood@redhat.com>
>> Cc: Marcelo Tosatti <mtosatti@redhat.com>
>> Cc: Clark Williams <williams@redhat.com>
>> Cc: x86@kernel.org
>> Cc: linux-kernel@vger.kernel.org
>> ---
>>  arch/x86/kernel/jump_label.c | 60 +++++++++++++++++++++++++-----------
>>  1 file changed, 42 insertions(+), 18 deletions(-)
>>
>> diff --git a/arch/x86/kernel/jump_label.c b/arch/x86/kernel/jump_label.c
>> index f99bd26bd3f1..e443c43478eb 100644
>> --- a/arch/x86/kernel/jump_label.c
>> +++ b/arch/x86/kernel/jump_label.c
>> @@ -35,16 +35,53 @@ static void bug_at(unsigned char *ip, int line)
>>  	BUG();
>>  }
>>  
>> +static inline void __jump_label_trans_check_enable(struct jump_entry *entry,
>> +						   enum jump_label_type type,
>> +						   const unsigned char *ideal_nop,
>> +						   int init)
>> +{
>> +	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
>> +	const void *expect;
>> +	int line;
>> +
>> +	if (init) {
>> +		expect = default_nop; line = __LINE__;
>> +	} else {
>> +		expect = ideal_nop; line = __LINE__;
>> +	}
>> +
>> +	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
>> +		bug_at((void *)jump_entry_code(entry), line);
>> +}
>> +
>> +static inline void __jump_label_trans_check_disable(struct jump_entry *entry,
>> +						    enum jump_label_type type,
>> +						    union jump_code_union *jmp,
>> +						    int init)
>> +{
>> +	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
>> +	const void *expect;
>> +	int line;
>> +
>> +	if (init) {
>> +		expect = default_nop; line = __LINE__;
>> +	} else {
>> +		expect = jmp->code; line = __LINE__;
>> +	}
>> +
>> +	if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
>> +		bug_at((void *)jump_entry_code(entry), line);
>> +}
> 
> Why the carve out?
> 
> The next patch is adding __jump_label_set_jump_code() which calls them
> so you could just as well keep the functionality all in that function without
> having too many helpers which are called only once...
> 

I removed these helpers and moved the code to __jump_label_set_jump_code().

The function looks like this now:
-------------- %< ----------------------
static void __jump_label_set_jump_code(struct jump_entry *entry,
				       enum jump_label_type type,
				       union jump_code_union *code,
				       int init)
{
	const unsigned char default_nop[] = { STATIC_KEY_INIT_NOP };
	const unsigned char *ideal_nop = ideal_nops[NOP_ATOMIC5];
	const void *expect;
	int line;

	code->jump = 0xe9;
	code->offset = jump_entry_target(entry) -
		     (jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);

	if (type == JUMP_LABEL_JMP) {
		if (init) {
			expect = default_nop; line = __LINE__;
		} else {
			expect = ideal_nop; line = __LINE__;
		}

		if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
			bug_at((void *)jump_entry_code(entry), line);

	} else {
		if (init) {
			expect = default_nop; line = __LINE__;
		} else {
			expect = code->code; line = __LINE__;
		}

		if (memcmp((void *)jump_entry_code(entry), expect, JUMP_LABEL_NOP_SIZE))
			bug_at((void *)jump_entry_code(entry), line);

		memcpy(code, ideal_nop, JUMP_LABEL_NOP_SIZE);
	}
}
-------------- >% ----------------------
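For reference, the caller in __jump_label_transform() then boils down to
something like this (a rough sketch, not the exact patch):

-------------- %< ----------------------
	union jump_code_union code;

	__jump_label_set_jump_code(entry, type, &code, init);

	text_poke_bp((void *)jump_entry_code(entry), &code, JUMP_LABEL_NOP_SIZE,
		     (void *)jump_entry_code(entry) + JUMP_LABEL_NOP_SIZE);
-------------- >% ----------------------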


Thanks!

-- Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps
  2019-02-08  0:15   ` Steven Rostedt
@ 2019-02-15 12:47     ` Daniel Bristot de Oliveira
  2019-02-21 15:26       ` Steven Rostedt
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-15 12:47 UTC (permalink / raw)
  To: Steven Rostedt, Borislav Petkov
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Jiri Kosina,
	Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On 2/8/19 1:15 AM, Steven Rostedt wrote:
> On Mon,  4 Feb 2019 20:58:58 +0100
> Daniel Bristot de Oliveira <bristot@redhat.com> wrote:
> 
>>  
>> +static void text_poke_bp_set_handler(void *addr, void *handler,
>> +				     unsigned char int3)
>> +{
>> +	bp_int3_handler = handler;
>> +	bp_int3_addr = (u8 *)addr + sizeof(int3);
>> +	text_poke(addr, &int3, sizeof(int3));
>> +}
>> +
>> +
>> +static void patch_first_byte(void *addr, const void *opcode, unsigned char int3)
>> +{
>> +	/* patch the first byte */
>> +	text_poke(addr, opcode, sizeof(int3));
>> +}
> Hmm, perhaps get rid of the first function entirely, and just do...
> (although why have the "int3" here anyway?)
> 

These helpers were created because they were used twice in the first versions of
this patch set. But with the change suggested by Masami, they are now called only
from text_poke_bp_batch(), so I am thinking of getting rid of them all (removing
this patch).

Thoughts?

Thanks!
-- Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 7/9] x86/alternative: Batch of patch operations
  2019-02-14 14:40         ` Steven Rostedt
  2019-02-14 14:54           ` Borislav Petkov
@ 2019-02-15 16:00           ` Daniel Bristot de Oliveira
  2019-02-15 17:20             ` Steven Rostedt
  1 sibling, 1 reply; 36+ messages in thread
From: Daniel Bristot de Oliveira @ 2019-02-15 16:00 UTC (permalink / raw)
  To: Steven Rostedt, Borislav Petkov
  Cc: linux-kernel, Thomas Gleixner, Ingo Molnar, H. Peter Anvin,
	Greg Kroah-Hartman, Masami Hiramatsu, Jiri Kosina,
	Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On 2/14/19 3:40 PM, Steven Rostedt wrote:
> On Thu, 14 Feb 2019 15:30:30 +0100
> Borislav Petkov <bp@alien8.de> wrote:
> 
>>> 	void *trampoline;  
>>
>> Well, it ain't a trampoline either. It is a "temporary location to go to while
>> patching is going on".
> 
> 	void *detour;
> 

The comment on top of text_poke_bp() defines:

handler:    address to jump to when the temporary breakpoint is hit

so, how about tmp_jump_addr?

well, detour works as well :-)

thoughts?

Thanks!
-- Daniel

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 7/9] x86/alternative: Batch of patch operations
  2019-02-15 16:00           ` Daniel Bristot de Oliveira
@ 2019-02-15 17:20             ` Steven Rostedt
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Rostedt @ 2019-02-15 17:20 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Borislav Petkov, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Fri, 15 Feb 2019 17:00:49 +0100
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> On 2/14/19 3:40 PM, Steven Rostedt wrote:
> > On Thu, 14 Feb 2019 15:30:30 +0100
> > Borislav Petkov <bp@alien8.de> wrote:
> >   
> >>> 	void *trampoline;    
> >>
> >> Well, it ain't a trampoline either. It is a "temporary location to go to while
> >> patching is going on".  
> > 
> > 	void *detour;
> >   
> 
> The comment on top of text_poke_bp() defines:
> 
> handler:    address to jump to when the temporary breakpoint is hit
> 
> so, how about tmp_jump_addr?
> 
> well, detour works as well :-)
> 
> thoughts?
> 
>

"detour" sounds cool.

-- Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps
  2019-02-15 12:47     ` Daniel Bristot de Oliveira
@ 2019-02-21 15:26       ` Steven Rostedt
  0 siblings, 0 replies; 36+ messages in thread
From: Steven Rostedt @ 2019-02-21 15:26 UTC (permalink / raw)
  To: Daniel Bristot de Oliveira
  Cc: Borislav Petkov, linux-kernel, Thomas Gleixner, Ingo Molnar,
	H. Peter Anvin, Greg Kroah-Hartman, Masami Hiramatsu,
	Jiri Kosina, Josh Poimboeuf, Peter Zijlstra (Intel),
	Chris von Recklinghausen, Jason Baron, Scott Wood,
	Marcelo Tosatti, Clark Williams, x86

On Fri, 15 Feb 2019 13:47:16 +0100
Daniel Bristot de Oliveira <bristot@redhat.com> wrote:

> On 2/8/19 1:15 AM, Steven Rostedt wrote:
> > On Mon,  4 Feb 2019 20:58:58 +0100
> > Daniel Bristot de Oliveira <bristot@redhat.com> wrote:
> >   
> >>  
> >> +static void text_poke_bp_set_handler(void *addr, void *handler,
> >> +				     unsigned char int3)
> >> +{
> >> +	bp_int3_handler = handler;
> >> +	bp_int3_addr = (u8 *)addr + sizeof(int3);
> >> +	text_poke(addr, &int3, sizeof(int3));
> >> +}
> >> +
> >> +
> >> +static void patch_first_byte(void *addr, const void *opcode, unsigned char int3)
> >> +{
> >> +	/* patch the first byte */
> >> +	text_poke(addr, opcode, sizeof(int3));
> >> +}  
> > Hmm, perhaps get rid of the first function entirely, and just do...
> > (although why have the "int3" here anyway?)
> >   
> 
> These helpers were created because they were used twice in the first versions of
> this patch set. But with the change suggested by Masami, they are called only in
> the text_poke_bp_batch() now, so I am thinking to get rid of them all (removing
> this patch).
> 
> Thoughts?
> 

Go ahead.

-- Steve

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2019-02-21 15:26 UTC | newest]

Thread overview: 36+ messages
2019-02-04 19:58 [PATCH V4 0/9] x86/jump_label: Bound IPIs sent when updating a static key Daniel Bristot de Oliveira
2019-02-04 19:58 ` [PATCH V4 1/9] jump_label: Add for_each_label_entry helper Daniel Bristot de Oliveira
2019-02-04 19:58 ` [PATCH V4 2/9] jump_label: Add the jump_label_can_update_check() helper Daniel Bristot de Oliveira
2019-02-05  7:22   ` Borislav Petkov
2019-02-05 13:50     ` Daniel Bristot de Oliveira
2019-02-05 21:13       ` Borislav Petkov
2019-02-07 13:21         ` Daniel Bristot de Oliveira
2019-02-07 14:08           ` Borislav Petkov
2019-02-07 17:00             ` Daniel Bristot de Oliveira
2019-02-07 17:08               ` Borislav Petkov
2019-02-04 19:58 ` [PATCH V4 3/9] x86/jump_label: Move checking code away from __jump_label_transform() Daniel Bristot de Oliveira
2019-02-05  7:33   ` Borislav Petkov
2019-02-05 15:22     ` Daniel Bristot de Oliveira
2019-02-15 10:05     ` Daniel Bristot de Oliveira
2019-02-04 19:58 ` [PATCH V4 4/9] x86/jump_label: Add __jump_label_set_jump_code() helper Daniel Bristot de Oliveira
2019-02-04 19:58 ` [PATCH V4 5/9] x86/alternative: Split text_poke_bp() into tree steps Daniel Bristot de Oliveira
2019-02-06 19:07   ` Borislav Petkov
2019-02-08  0:11   ` Steven Rostedt
2019-02-08  0:15   ` Steven Rostedt
2019-02-15 12:47     ` Daniel Bristot de Oliveira
2019-02-21 15:26       ` Steven Rostedt
2019-02-04 19:58 ` [PATCH V4 6/9] jump_label: Sort entries of the same key by the code Daniel Bristot de Oliveira
2019-02-04 19:59 ` [PATCH V4 7/9] x86/alternative: Batch of patch operations Daniel Bristot de Oliveira
2019-02-14 12:53   ` Borislav Petkov
2019-02-14 14:06     ` Steven Rostedt
2019-02-14 14:30       ` Borislav Petkov
2019-02-14 14:40         ` Steven Rostedt
2019-02-14 14:54           ` Borislav Petkov
2019-02-15 16:00           ` Daniel Bristot de Oliveira
2019-02-15 17:20             ` Steven Rostedt
2019-02-04 19:59 ` [PATCH V4 8/9] jump_label: Batch updates if arch supports it Daniel Bristot de Oliveira
2019-02-06  6:34   ` Masami Hiramatsu
2019-02-06 15:59     ` Daniel Bristot de Oliveira
2019-02-14 13:33   ` Borislav Petkov
2019-02-04 19:59 ` [PATCH V4 9/9] x86/jump_label: Batch jump label updates Daniel Bristot de Oliveira
2019-02-06  6:29   ` Masami Hiramatsu
