* [PATCH] x86: Intel microcode loader performance improvement
@ 2010-03-05 17:42 Dimitri Sivanich
  2010-03-08 10:33 ` Dmitry Adamushko
  2010-03-11 14:39 ` [tip:x86/microcode] x86: Improve Intel microcode loader performance tip-bot for Dimitri Sivanich
  0 siblings, 2 replies; 5+ messages in thread
From: Dimitri Sivanich @ 2010-03-05 17:42 UTC (permalink / raw)
  To: linux-kernel; +Cc: Ingo Molnar, Dmitry Adamushko

We've noticed that on large SGI UV system configurations, running
microcode.ctl can take very long periods of time.  This is due to
the large number of vmalloc/vfree calls made by the Intel
generic_load_microcode() logic.

By reusing allocated space, the following patch reduces the time
to run microcode.ctl on a 1024 cpu system from approximately 80
seconds down to 1 or 2 seconds.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>

---

 arch/x86/kernel/microcode_intel.c |   22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

Index: linux/arch/x86/kernel/microcode_intel.c
===================================================================
--- linux.orig/arch/x86/kernel/microcode_intel.c
+++ linux/arch/x86/kernel/microcode_intel.c
@@ -343,10 +343,11 @@ static enum ucode_state generic_load_mic
 				int (*get_ucode_data)(void *, const void *, size_t))
 {
 	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-	u8 *ucode_ptr = data, *new_mc = NULL, *mc;
+	u8 *ucode_ptr = data, *new_mc = NULL, *mc = NULL;
 	int new_rev = uci->cpu_sig.rev;
 	unsigned int leftover = size;
 	enum ucode_state state = UCODE_OK;
+	unsigned int curr_mc_size = 0;
 
 	while (leftover) {
 		struct microcode_header_intel mc_header;
@@ -361,9 +362,15 @@ static enum ucode_state generic_load_mic
 			break;
 		}
 
-		mc = vmalloc(mc_size);
-		if (!mc)
-			break;
+		/* For performance reasons, reuse mc area when possible */
+		if (!mc || mc_size > curr_mc_size) {
+			if (mc)
+				vfree(mc);
+			mc = vmalloc(mc_size);
+			if (!mc)
+				break;
+			curr_mc_size = mc_size;
+		}
 
 		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
 		    microcode_sanity_check(mc) < 0) {
@@ -376,13 +383,16 @@ static enum ucode_state generic_load_mic
 				vfree(new_mc);
 			new_rev = mc_header.rev;
 			new_mc  = mc;
-		} else
-			vfree(mc);
+			mc = NULL;	/* trigger new vmalloc */
+		}
 
 		ucode_ptr += mc_size;
 		leftover  -= mc_size;
 	}
 
+	if (mc)
+		vfree(mc);
+
 	if (leftover) {
 		if (new_mc)
 			vfree(new_mc);
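
For readers who want to try the buffer-reuse pattern outside the kernel, here is a minimal user-space sketch. It is not the kernel code: malloc()/free() stand in for vmalloc()/vfree(), and struct rec_header, pick_newest() and the alloc_calls counter are hypothetical names made up for illustration. It assumes each record starts with a revision and a total size. The scratch buffer is reallocated only when it is missing or too small, and it is handed over (with the pointer reset to NULL) when a record is kept as the best candidate.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record header: revision, then total record size in bytes. */
struct rec_header {
	unsigned int rev;
	unsigned int size;	/* includes the header itself */
};

static int alloc_calls;	/* illustration only: counts allocations */

static void *counted_alloc(size_t n)
{
	alloc_calls++;
	return malloc(n);
}

/*
 * Walk a blob of back-to-back records and return a heap copy of the one
 * with the highest revision above cur_rev, or NULL if there is none or
 * the blob is malformed.  The caller frees the result.
 */
static unsigned char *pick_newest(const unsigned char *data, size_t size,
				  unsigned int cur_rev, unsigned int *out_rev)
{
	const unsigned char *p = data;
	unsigned char *mc = NULL, *new_mc = NULL;
	size_t leftover = size, curr_mc_size = 0;
	unsigned int new_rev = cur_rev;

	assert(data != NULL && out_rev != NULL);

	while (leftover) {
		struct rec_header h;

		if (leftover < sizeof(h))
			break;
		memcpy(&h, p, sizeof(h));
		if (h.size < sizeof(h) || h.size > leftover)
			break;

		/* Reuse the scratch buffer unless it is missing or too small. */
		if (!mc || h.size > curr_mc_size) {
			free(mc);
			mc = counted_alloc(h.size);
			if (!mc)
				break;
			curr_mc_size = h.size;
		}
		memcpy(mc, p, h.size);

		if (h.rev > new_rev) {
			free(new_mc);
			new_rev = h.rev;
			new_mc = mc;
			mc = NULL;	/* buffer handed over; allocate a new one next time */
		}

		p += h.size;
		leftover -= h.size;
	}

	free(mc);		/* free(NULL) is a no-op */

	if (leftover) {		/* stopped early: treat the blob as bad */
		free(new_mc);
		return NULL;
	}
	if (new_mc)
		*out_rev = new_rev;
	return new_mc;
}
```

With four equally sized records of which only one is newer than cur_rev, this makes two allocations instead of four; the saving grows with the number of records that are scanned and rejected.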


* Re: [PATCH] x86: Intel microcode loader performance improvement
  2010-03-05 17:42 [PATCH] x86: Intel microcode loader performance improvement Dimitri Sivanich
@ 2010-03-08 10:33 ` Dmitry Adamushko
  2010-03-08 11:23   ` Avi Kivity
  2010-03-08 20:37   ` Bill Davidsen
  2010-03-11 14:39 ` [tip:x86/microcode] x86: Improve Intel microcode loader performance tip-bot for Dimitri Sivanich
  1 sibling, 2 replies; 5+ messages in thread
From: Dmitry Adamushko @ 2010-03-08 10:33 UTC (permalink / raw)
  To: Dimitri Sivanich; +Cc: linux-kernel, Ingo Molnar

On 5 March 2010 18:42, Dimitri Sivanich <sivanich@sgi.com> wrote:
> We've noticed that on large SGI UV system configurations, running
> microcode.ctl can take very long periods of time.  This is due to
> the large number of vmalloc/vfree calls made by the Intel
> generic_load_microcode() logic.
>
> By reusing allocated space, the following patch reduces the time
> to run microcode.ctl on a 1024 cpu system from approximately 80
> seconds down to 1 or 2 seconds.
>
> Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>

This approach seems reasonable in the scope of the current framework.

Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>

However, I think a better approach would be to have some kind of
shared storage for loaded microcode updates. Given that for the
majority of SMP systems all the cpus are normally updated to the very
same new instance of microcode, it should be enough to do a search for
the first cpu, cache the instance of microcode and then reuse it for
others.


-- Dmitry
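
The shared-storage idea above could look roughly like the following user-space sketch. Nothing here is existing kernel API: struct ucode_cache, cached_lookup(), search_fn and demo_search() are hypothetical names, and matching on exact equality of signature and platform flags is a simplifying assumption. The expensive search runs for the first CPU and its result is reused for every later CPU with the same signature; a CPU with a different signature (for example a newer stepping) falls back to the full search.

```c
#include <stddef.h>

/* Hypothetical single-entry cache of the last search result. */
struct ucode_cache {
	int valid;
	unsigned int sig;	/* CPU signature the entry was found for */
	unsigned int pf;	/* platform flags the entry was found for */
	unsigned int rev;	/* revision of the cached update */
	const void *mc;		/* cached update, NULL if none applies */
};

/* Hypothetical slow path: scan the whole microcode blob for one CPU. */
typedef const void *(*search_fn)(unsigned int sig, unsigned int pf,
				 unsigned int *rev_out);

static const void *cached_lookup(struct ucode_cache *c, unsigned int sig,
				 unsigned int pf, unsigned int *rev_out,
				 search_fn slow_search)
{
	const void *mc;

	/* Fast path: same kind of CPU as the one that filled the cache. */
	if (c->valid && c->sig == sig && c->pf == pf) {
		*rev_out = c->rev;
		return c->mc;
	}

	/* Slow path: do the full search once and remember the result. */
	mc = slow_search(sig, pf, rev_out);
	c->valid = 1;
	c->sig = sig;
	c->pf = pf;
	c->rev = *rev_out;
	c->mc = mc;
	return mc;
}

/* Stand-in for the real search, used only to count slow-path calls. */
static int search_calls;
static const unsigned char demo_blob[16];

static const void *demo_search(unsigned int sig, unsigned int pf,
			       unsigned int *rev_out)
{
	(void)sig;
	(void)pf;
	search_calls++;
	*rev_out = 0x11;
	return demo_blob;
}
```

A single entry is enough when all CPUs match; a mixed-stepping system would simply alternate between hits and full searches, so a small table keyed by signature might be a better fit there.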


* Re: [PATCH] x86: Intel microcode loader performance improvement
  2010-03-08 10:33 ` Dmitry Adamushko
@ 2010-03-08 11:23   ` Avi Kivity
  2010-03-08 20:37   ` Bill Davidsen
  1 sibling, 0 replies; 5+ messages in thread
From: Avi Kivity @ 2010-03-08 11:23 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Dimitri Sivanich, linux-kernel, Ingo Molnar

On 03/08/2010 12:33 PM, Dmitry Adamushko wrote:
> On 5 March 2010 18:42, Dimitri Sivanich <sivanich@sgi.com> wrote:
>> We've noticed that on large SGI UV system configurations, running
>> microcode.ctl can take very long periods of time.  This is due to
>> the large number of vmalloc/vfree calls made by the Intel
>> generic_load_microcode() logic.
>>
>> By reusing allocated space, the following patch reduces the time
>> to run microcode.ctl on a 1024 cpu system from approximately 80
>> seconds down to 1 or 2 seconds.
>>
>> Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
>
> This approach seems reasonable in the scope of the current framework.
>
> Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
>
> However, I think a better approach would be to have some kind of
> shared storage for loaded microcode updates. Given that for the
> majority of SMP systems all the cpus are normally updated to the very
> same new instance of microcode, it should be enough to do a search for
> the first cpu, cache the instance of microcode and then reuse it for
> others.
>

And/or update processors in parallel.

-- 
error compiling committee.c: too many arguments to function



* Re: [PATCH] x86: Intel microcode loader performance improvement
  2010-03-08 10:33 ` Dmitry Adamushko
  2010-03-08 11:23   ` Avi Kivity
@ 2010-03-08 20:37   ` Bill Davidsen
  1 sibling, 0 replies; 5+ messages in thread
From: Bill Davidsen @ 2010-03-08 20:37 UTC (permalink / raw)
  To: Dmitry Adamushko; +Cc: Dimitri Sivanich, linux-kernel, Ingo Molnar

Dmitry Adamushko wrote:
> On 5 March 2010 18:42, Dimitri Sivanich <sivanich@sgi.com> wrote:
>> We've noticed that on large SGI UV system configurations, running
>> microcode.ctl can take very long periods of time.  This is due to
>> the large number of vmalloc/vfree calls made by the Intel
>> generic_load_microcode() logic.
>>
>> By reusing allocated space, the following patch reduces the time
>> to run microcode.ctl on a 1024 cpu system from approximately 80
>> seconds down to 1 or 2 seconds.
>>
>> Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
> 
> This approach seems reasonable in the scope of the current framework.
> 
> Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
> 
> However, I think a better approach would be to have some kind of
> shared storage for loaded microcode updates. Given that for the
> majority of SMP systems all the cpus are normally updated to the very
> same new instance of microcode, it should be enough to do a search for
> the first cpu, cache the instance of microcode and then reuse it for
> others.
> 
The assumption that all CPUs are the same is not always true in practice: people
buy a system without fully populating it initially, and when they add
processors later, those have a more recent stepping. So reusing microcode or
updating in parallel would add complexity, and 2 sec for 1024 CPUs puts a pretty
low upper bound on possible improvement. Does further improvement to a small
one-time delay justify the additional complexity?

Systems that size are probably not booted all that often. Something to consider
before putting a lot of effort into it, I think.

-- 
Bill Davidsen <davidsen@tmr.com>
   "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


* [tip:x86/microcode] x86: Improve Intel microcode loader performance
  2010-03-05 17:42 [PATCH] x86: Intel microcode loader performance improvement Dimitri Sivanich
  2010-03-08 10:33 ` Dmitry Adamushko
@ 2010-03-11 14:39 ` tip-bot for Dimitri Sivanich
  1 sibling, 0 replies; 5+ messages in thread
From: tip-bot for Dimitri Sivanich @ 2010-03-11 14:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, sivanich, davidsen, dmitry.adamushko,
	tglx, mingo, avi

Commit-ID:  938179b4f8cf8a4f11234ebf2dff2eb48400acfe
Gitweb:     http://git.kernel.org/tip/938179b4f8cf8a4f11234ebf2dff2eb48400acfe
Author:     Dimitri Sivanich <sivanich@sgi.com>
AuthorDate: Fri, 5 Mar 2010 11:42:03 -0600
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Thu, 11 Mar 2010 13:49:06 +0100

x86: Improve Intel microcode loader performance

We've noticed that on large SGI UV system configurations,
running microcode.ctl can take very long periods of time.  This
is due to the large number of vmalloc/vfree calls made by the
Intel generic_load_microcode() logic.

By reusing allocated space, the following patch reduces the time
to run microcode.ctl on a 1024 cpu system from approximately 80
seconds down to 1 or 2 seconds.

Signed-off-by: Dimitri Sivanich <sivanich@sgi.com>
Acked-by: Dmitry Adamushko <dmitry.adamushko@gmail.com>
Cc: Avi Kivity <avi@redhat.com>
Cc: Bill Davidsen <davidsen@tmr.com>
LKML-Reference: <20100305174203.GA19638@sgi.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 arch/x86/kernel/microcode_intel.c |   22 ++++++++++++++++------
 1 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/microcode_intel.c b/arch/x86/kernel/microcode_intel.c
index 85a343e..3561702 100644
--- a/arch/x86/kernel/microcode_intel.c
+++ b/arch/x86/kernel/microcode_intel.c
@@ -343,10 +343,11 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 				int (*get_ucode_data)(void *, const void *, size_t))
 {
 	struct ucode_cpu_info *uci = ucode_cpu_info + cpu;
-	u8 *ucode_ptr = data, *new_mc = NULL, *mc;
+	u8 *ucode_ptr = data, *new_mc = NULL, *mc = NULL;
 	int new_rev = uci->cpu_sig.rev;
 	unsigned int leftover = size;
 	enum ucode_state state = UCODE_OK;
+	unsigned int curr_mc_size = 0;
 
 	while (leftover) {
 		struct microcode_header_intel mc_header;
@@ -361,9 +362,15 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 			break;
 		}
 
-		mc = vmalloc(mc_size);
-		if (!mc)
-			break;
+		/* For performance reasons, reuse mc area when possible */
+		if (!mc || mc_size > curr_mc_size) {
+			if (mc)
+				vfree(mc);
+			mc = vmalloc(mc_size);
+			if (!mc)
+				break;
+			curr_mc_size = mc_size;
+		}
 
 		if (get_ucode_data(mc, ucode_ptr, mc_size) ||
 		    microcode_sanity_check(mc) < 0) {
@@ -376,13 +383,16 @@ static enum ucode_state generic_load_microcode(int cpu, void *data, size_t size,
 				vfree(new_mc);
 			new_rev = mc_header.rev;
 			new_mc  = mc;
-		} else
-			vfree(mc);
+			mc = NULL;	/* trigger new vmalloc */
+		}
 
 		ucode_ptr += mc_size;
 		leftover  -= mc_size;
 	}
 
+	if (mc)
+		vfree(mc);
+
 	if (leftover) {
 		if (new_mc)
 			vfree(new_mc);

