All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] limit CPU time spent in kipmid
@ 2009-03-19 16:27 Martin Wilck
  2009-03-19 21:31 ` Corey Minyard
  2009-03-19 22:41 ` [Openipmi-developer] " Bela Lubkin
  0 siblings, 2 replies; 20+ messages in thread
From: Martin Wilck @ 2009-03-19 16:27 UTC (permalink / raw)
  To: linux-kernel, Corey Minyard, openipmi-developer

[-- Attachment #1: Type: text/plain, Size: 1094 bytes --]

Hello Corey, hi everyone,

here is a patch that limits the CPU time spent in kipmid. I know that it 
was previously stated that current kipmid "works as designed" (e.g. 
http://lists.us.dell.com/pipermail/linux-poweredge/2008-October/037636.html), 
yet users are irritated by the high amount of CPU time kipmid may use up 
on current servers with many sensors, even though it is "nice" CPU time. 
Moreover, kipmid busy-waiting for the KCS interface to become ready also 
prevents CPUs from sleeping.

The attached patch was developed and tested on an enterprise 
distribution kernel where it caused the CPU load of kipmid to drop to 
essentially 0 while still delivering reliable IPMI communication.

I am looking forward for comments.
Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

[-- Attachment #2: ipmi_si_max_busy-2.6.29-rc8.diff --]
[-- Type: text/x-patch, Size: 3493 bytes --]

While the current kipmid implementation is optimal in the sense that kipmid obtains
data as quickly as possible without stealing CPU time from other processes
(by running on low prio and calling schedule() in each loop iteration), it
may spend a lot of CPU time polling for data e.g. on the KCS interface. The
busy loop also prevents CPUs from entering sleep states when no KCS data is
available.

This patch adds a module parameter "kipmid_max_busy" that specifies how many
microseconds kipmid should busy-loop before going to sleep. It affects only the
SI_SM_CALL_WITH_DELAY case, where a delay is acceptible. My experiments have shown
that SI_SM_CALL_WITH_DELAY catches about 99% of smi_event_handler calls. The
new parameter defaults to 0 (previous behavior - busy-loop forever).

With this patch, at least on Fujitsu Siemens hardware, the kipmid CPU time can be
reduced to negligible values without noticeably affecting the performance of the
IPMI interface. Recommended values for kipmid_max_busy on this hardware are in the
100us range (50-500us).

Signed-off-by: Martin Wilck <martin.wilck@fujitsu-siemens.com>
--- linux-2.6.29-rc8/drivers/char/ipmi/ipmi_si_intf.c.orig	2009-03-13 03:39:28.000000000 +0100
+++ linux-2.6.29-rc8/drivers/char/ipmi/ipmi_si_intf.c	2009-03-19 13:24:10.009829167 +0100
@@ -298,6 +298,7 @@ static int force_kipmid[SI_MAX_PARMS];
 static int num_force_kipmid;
 
 static int unload_when_empty = 1;
+static unsigned int kipmid_max_busy;
 
 static int try_smi_init(struct smi_info *smi);
 static void cleanup_one_si(struct smi_info *to_clean);
@@ -927,20 +928,53 @@ static void set_run_to_completion(void *
 	}
 }
 
+
+
+static int ipmi_thread_must_sleep(enum si_sm_result smi_result, int *busy,
+				  struct timespec *busy_time)
+{
+	if (kipmid_max_busy == 0)
+		return 0;
+
+	if (smi_result != SI_SM_CALL_WITH_DELAY) {
+		*busy = 0;
+		return 0;
+	}
+
+	if (*busy == 0) {
+		*busy = 1;
+		getnstimeofday(busy_time);
+		timespec_add_ns(busy_time, kipmid_max_busy*NSEC_PER_USEC);
+	} else {
+		struct timespec now;
+		getnstimeofday(&now);
+		if (unlikely(timespec_compare(&now, busy_time))) {
+			*busy = 0;
+			return 1;
+		}
+	}
+	return 0;
+}
+
 static int ipmi_thread(void *data)
 {
 	struct smi_info *smi_info = data;
 	unsigned long flags;
 	enum si_sm_result smi_result;
+	struct timespec busy_time;
+	int busy = 0;
 
 	set_user_nice(current, 19);
 	while (!kthread_should_stop()) {
+		int must_sleep;
 		spin_lock_irqsave(&(smi_info->si_lock), flags);
 		smi_result = smi_event_handler(smi_info, 0);
 		spin_unlock_irqrestore(&(smi_info->si_lock), flags);
+		must_sleep = ipmi_thread_must_sleep(smi_result,
+						    &busy, &busy_time);
 		if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
 			; /* do nothing */
-		else if (smi_result == SI_SM_CALL_WITH_DELAY)
+		else if (smi_result == SI_SM_CALL_WITH_DELAY && !must_sleep)
 			schedule();
 		else
 			schedule_timeout_interruptible(1);
@@ -1213,6 +1247,11 @@ module_param(unload_when_empty, int, 0);
 MODULE_PARM_DESC(unload_when_empty, "Unload the module if no interfaces are"
 		 " specified or found, default is 1.  Setting to 0"
 		 " is useful for hot add of devices using hotmod.");
+module_param(kipmid_max_busy, uint, 0);
+MODULE_PARM_DESC(kipmid_max_busy,
+		 "Max time (in microseconds) to busy-wait for IPMI data before"
+		 " sleeping. 0 (default) means to wait forever. Set to 100-500"
+		 " if kipmid is using up a lot of CPU time.");
 
 
 static void std_irq_cleanup(struct smi_info *info)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid
  2009-03-19 16:27 [PATCH] limit CPU time spent in kipmid Martin Wilck
@ 2009-03-19 21:31 ` Corey Minyard
  2009-03-19 23:51   ` Greg KH
  2009-03-19 22:41 ` [Openipmi-developer] " Bela Lubkin
  1 sibling, 1 reply; 20+ messages in thread
From: Corey Minyard @ 2009-03-19 21:31 UTC (permalink / raw)
  To: Martin Wilck; +Cc: linux-kernel, openipmi-developer

Martin, thanks for the patch.  I had actually implemented something like 
this before, and it didn't really help very much with the hardware I 
had, so I had abandoned this method.  There's even a comment about it in 
si_sm_result smi_event_handler(). Maybe making it tunable is better, I 
don't know.  But I'm afraid this will kill performance on a lot of systems.

Did you test throughput on this?  The main problem people had without 
kipmid was that things like firmware upgrades took a *long* time; adding 
kipmid improved speeds by an order of magnitude or more.

It's my opinion that if you want this interface to work efficiently with 
good performance, you should design the hardware to be used efficiently 
by using interrupts (which are supported and disable kipmid).  With the 
way the hardware is defined, you cannot have both good performance and 
low CPU usage without interrupts.

It may be possible to add an option to choose between performance and 
efficiency, but it will have to default to performance.

-corey

Martin Wilck wrote:
> Hello Corey, hi everyone,
>
> here is a patch that limits the CPU time spent in kipmid. I know that 
> it was previously stated that current kipmid "works as designed" (e.g. 
> http://lists.us.dell.com/pipermail/linux-poweredge/2008-October/037636.html), 
> yet users are irritated by the high amount of CPU time kipmid may use 
> up on current servers with many sensors, even though it is "nice" CPU 
> time. Moreover, kipmid busy-waiting for the KCS interface to become 
> ready also prevents CPUs from sleeping.
>
> The attached patch was developed and tested on an enterprise 
> distribution kernel where it caused the CPU load of kipmid to drop to 
> essentially 0 while still delivering reliable IPMI communication.
>
> I am looking forward for comments.
> Martin
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* RE: [Openipmi-developer] [PATCH] limit CPU time spent in kipmid
  2009-03-19 16:27 [PATCH] limit CPU time spent in kipmid Martin Wilck
  2009-03-19 21:31 ` Corey Minyard
@ 2009-03-19 22:41 ` Bela Lubkin
  1 sibling, 0 replies; 20+ messages in thread
From: Bela Lubkin @ 2009-03-19 22:41 UTC (permalink / raw)
  To: 'Martin Wilck', linux-kernel, Corey Minyard, openipmi-developer

I'll give this a try, we've certainly had plenty of Fun with the
cost of running kipmid...

Why O Why are essentially all modern "managed" machines designed
with KCS?  Even when you bless it with an identifiable interrupt
(as HP have done), it still cannot perform anywhere near as well
as BT, with its whole-packet DMA transfers.

I've been assuming that some silicon designer put out a KCS chip
early in the game and ramped production up to the point where it
was practically free, while other protocols were still tacking a
few cents onto the cost of every box.  Thus, millions of hobbled
"server" computers.  Grumble snort.

>Bela<

> -----Original Message-----
> From: Martin Wilck [mailto:martin.wilck@fujitsu-siemens.com] 
> Sent: Thursday, March 19, 2009 9:28 AM
> To: linux-kernel@vger.kernel.org; Corey Minyard; 
> openipmi-developer@lists.sourceforge.net
> Subject: [Openipmi-developer] [PATCH] limit CPU time spent in kipmid
> 
> Hello Corey, hi everyone,
> 
> here is a patch that limits the CPU time spent in kipmid. I 
> know that it 
> was previously stated that current kipmid "works as designed" (e.g. 
> http://lists.us.dell.com/pipermail/linux-poweredge/2008-Octobe
r/037636.html), 
> yet users are irritated by the high amount of CPU time kipmid 
> may use up 
> on current servers with many sensors, even though it is 
> "nice" CPU time. 
> Moreover, kipmid busy-waiting for the KCS interface to become 
> ready also 
> prevents CPUs from sleeping.
> 
> The attached patch was developed and tested on an enterprise 
> distribution kernel where it caused the CPU load of kipmid to drop to 
> essentially 0 while still delivering reliable IPMI communication.
> 
> I am looking forward for comments.
> Martin
> 
> -- 
> Martin Wilck
> PRIMERGY System Software Engineer
> FSC IP ESP DEV 6
> 
> Fujitsu Siemens Computers GmbH
> Heinz-Nixdorf-Ring 1
> 33106 Paderborn
> Germany
> 
> Tel:			++49 5251 525 2796
> Fax:			++49 5251 525 2820
> Email:			mailto:martin.wilck@fujitsu-siemens.com
> Internet:		http://www.fujitsu-siemens.com
> Company Details:	http://www.fujitsu-siemens.com/imprint.html
> 

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid
  2009-03-19 21:31 ` Corey Minyard
@ 2009-03-19 23:51   ` Greg KH
  2009-03-20 15:30     ` Corey Minyard
  0 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2009-03-19 23:51 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Martin Wilck, linux-kernel, openipmi-developer

On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
> Martin, thanks for the patch.  I had actually implemented something like 
> this before, and it didn't really help very much with the hardware I had, 
> so I had abandoned this method.  There's even a comment about it in 
> si_sm_result smi_event_handler(). Maybe making it tunable is better, I 
> don't know.  But I'm afraid this will kill performance on a lot of systems.
>
> Did you test throughput on this?  The main problem people had without 
> kipmid was that things like firmware upgrades took a *long* time; adding 
> kipmid improved speeds by an order of magnitude or more.
>
> It's my opinion that if you want this interface to work efficiently with 
> good performance, you should design the hardware to be used efficiently by 
> using interrupts (which are supported and disable kipmid).  With the way 
> the hardware is defined, you cannot have both good performance and low CPU 
> usage without interrupts.
>
> It may be possible to add an option to choose between performance and 
> efficiency, but it will have to default to performance.

I would think that very infrequent things, like firmware upgrades, would
not take priority over a long-term "keep the cpu busy" type system, like
what we currently have.

Is there any way to switch between the different modes dynamically?

I like the idea of this change, as I have got a lot of complaints lately
about kipmi taking way too much cpu time up on idle systems, messing up
some user's process accounting rules in their management systems.  But I
worry about making it a module parameter, why can't this be a
"self-tunable" thing?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid
  2009-03-19 23:51   ` Greg KH
@ 2009-03-20 15:30     ` Corey Minyard
  2009-03-20 17:47       ` Greg KH
  0 siblings, 1 reply; 20+ messages in thread
From: Corey Minyard @ 2009-03-20 15:30 UTC (permalink / raw)
  To: Greg KH; +Cc: Martin Wilck, linux-kernel, openipmi-developer

Greg KH wrote:
> On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
>   
>> Martin, thanks for the patch.  I had actually implemented something like 
>> this before, and it didn't really help very much with the hardware I had, 
>> so I had abandoned this method.  There's even a comment about it in 
>> si_sm_result smi_event_handler(). Maybe making it tunable is better, I 
>> don't know.  But I'm afraid this will kill performance on a lot of systems.
>>
>> Did you test throughput on this?  The main problem people had without 
>> kipmid was that things like firmware upgrades took a *long* time; adding 
>> kipmid improved speeds by an order of magnitude or more.
>>
>> It's my opinion that if you want this interface to work efficiently with 
>> good performance, you should design the hardware to be used efficiently by 
>> using interrupts (which are supported and disable kipmid).  With the way 
>> the hardware is defined, you cannot have both good performance and low CPU 
>> usage without interrupts.
>>
>> It may be possible to add an option to choose between performance and 
>> efficiency, but it will have to default to performance.
>>     
>
> I would think that very infrequent things, like firmware upgrades, would
> not take priority over a long-term "keep the cpu busy" type system, like
> what we currently have.
>
> Is there any way to switch between the different modes dynamically?
>   
> I like the idea of this change, as I have got a lot of complaints lately
> about kipmi taking way too much cpu time up on idle systems, messing up
> some user's process accounting rules in their management systems.  But I
> worry about making it a module parameter, why can't this be a
> "self-tunable" thing?
>   
It's actually already sort of self-tuning.  kipmid sleeps unless there 
is IPMI activity.  It only spins if it is expecting something from the 
controller.

I've been thinking about this a little more.  Assuming that the 
self-tuning is working (and it appears to be working fine on my 
systems), that means that something is causing the IPMI driver to 
constantly talk to the management controller.  I can think of three things:

   1. The user is constantly sending messages to management controller.
   2. There is something wrong with the hardware, like the ATTN bit is
      stuck high, causing the driver to constantly poll the management
      controller.
   3. The driver either has a bug or needs some more work to account for
      something the hardware needs it to do to clear the ATTN bit.

If it's #1 above, then I don't know if there is anything we can do about 
it.  The patch Martin sent will simply slow things down.

#2 and #3 will require someone to do some debugging.  If the ATTN bit is 
stuck, you should see the "attentions" field in /proc/ipmi/0/si_stats 
constantly going up.  Actually, the contents of that file would be 
helpful, along with /proc/ipmi/0/stats.

-corey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid
  2009-03-20 15:30     ` Corey Minyard
@ 2009-03-20 17:47       ` Greg KH
  2009-03-20 18:28         ` Corey Minyard
  0 siblings, 1 reply; 20+ messages in thread
From: Greg KH @ 2009-03-20 17:47 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Martin Wilck, linux-kernel, openipmi-developer

On Fri, Mar 20, 2009 at 10:30:45AM -0500, Corey Minyard wrote:
> Greg KH wrote:
>> On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
>>   
>>> Martin, thanks for the patch.  I had actually implemented something like 
>>> this before, and it didn't really help very much with the hardware I had, 
>>> so I had abandoned this method.  There's even a comment about it in 
>>> si_sm_result smi_event_handler(). Maybe making it tunable is better, I 
>>> don't know.  But I'm afraid this will kill performance on a lot of 
>>> systems.
>>>
>>> Did you test throughput on this?  The main problem people had without 
>>> kipmid was that things like firmware upgrades took a *long* time; adding 
>>> kipmid improved speeds by an order of magnitude or more.
>>>
>>> It's my opinion that if you want this interface to work efficiently with 
>>> good performance, you should design the hardware to be used efficiently 
>>> by using interrupts (which are supported and disable kipmid).  With the 
>>> way the hardware is defined, you cannot have both good performance and 
>>> low CPU usage without interrupts.
>>>
>>> It may be possible to add an option to choose between performance and 
>>> efficiency, but it will have to default to performance.
>>>     
>>
>> I would think that very infrequent things, like firmware upgrades, would
>> not take priority over a long-term "keep the cpu busy" type system, like
>> what we currently have.
>>
>> Is there any way to switch between the different modes dynamically?
>>   I like the idea of this change, as I have got a lot of complaints lately
>> about kipmi taking way too much cpu time up on idle systems, messing up
>> some user's process accounting rules in their management systems.  But I
>> worry about making it a module parameter, why can't this be a
>> "self-tunable" thing?
>>   
> It's actually already sort of self-tuning.  kipmid sleeps unless there is 
> IPMI activity.  It only spins if it is expecting something from the 
> controller.
>
> I've been thinking about this a little more.  Assuming that the self-tuning 
> is working (and it appears to be working fine on my systems), that means 
> that something is causing the IPMI driver to constantly talk to the 
> management controller.  I can think of three things:
>
>   1. The user is constantly sending messages to management controller.
>   2. There is something wrong with the hardware, like the ATTN bit is
>      stuck high, causing the driver to constantly poll the management
>      controller.
>   3. The driver either has a bug or needs some more work to account for
>      something the hardware needs it to do to clear the ATTN bit.
>
> If it's #1 above, then I don't know if there is anything we can do about 
> it.  The patch Martin sent will simply slow things down.

Does the "normal" ipmi userspace tools do #1?

For #2, this might make sense, as I have had reports of some hardware
working just fine, while others have the load issue.  Both were
different hardware manufacturers.

> #2 and #3 will require someone to do some debugging.  If the ATTN bit is 
> stuck, you should see the "attentions" field in /proc/ipmi/0/si_stats 
> constantly going up.  Actually, the contents of that file would be helpful, 
> along with /proc/ipmi/0/stats.

Martin has one of these machines, right?  If not, I can dig and try to
get some information as well.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid
  2009-03-20 17:47       ` Greg KH
@ 2009-03-20 18:28         ` Corey Minyard
  2009-03-23 13:17           ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
  2009-03-23 13:25           ` [PATCH] limit CPU time spent in kipmid Martin Wilck
  0 siblings, 2 replies; 20+ messages in thread
From: Corey Minyard @ 2009-03-20 18:28 UTC (permalink / raw)
  To: Greg KH; +Cc: Martin Wilck, linux-kernel, openipmi-developer

Greg KH wrote:
> On Fri, Mar 20, 2009 at 10:30:45AM -0500, Corey Minyard wrote:
>   
>> Greg KH wrote:
>>     
>>> On Thu, Mar 19, 2009 at 04:31:00PM -0500, Corey Minyard wrote:
>>>   
>>>       
>>>> Martin, thanks for the patch.  I had actually implemented something like 
>>>> this before, and it didn't really help very much with the hardware I had, 
>>>> so I had abandoned this method.  There's even a comment about it in 
>>>> si_sm_result smi_event_handler(). Maybe making it tunable is better, I 
>>>> don't know.  But I'm afraid this will kill performance on a lot of 
>>>> systems.
>>>>
>>>> Did you test throughput on this?  The main problem people had without 
>>>> kipmid was that things like firmware upgrades took a *long* time; adding 
>>>> kipmid improved speeds by an order of magnitude or more.
>>>>
>>>> It's my opinion that if you want this interface to work efficiently with 
>>>> good performance, you should design the hardware to be used efficiently 
>>>> by using interrupts (which are supported and disable kipmid).  With the 
>>>> way the hardware is defined, you cannot have both good performance and 
>>>> low CPU usage without interrupts.
>>>>
>>>> It may be possible to add an option to choose between performance and 
>>>> efficiency, but it will have to default to performance.
>>>>     
>>>>         
>>> I would think that very infrequent things, like firmware upgrades, would
>>> not take priority over a long-term "keep the cpu busy" type system, like
>>> what we currently have.
>>>
>>> Is there any way to switch between the different modes dynamically?
>>>   I like the idea of this change, as I have got a lot of complaints lately
>>> about kipmi taking way too much cpu time up on idle systems, messing up
>>> some user's process accounting rules in their management systems.  But I
>>> worry about making it a module parameter, why can't this be a
>>> "self-tunable" thing?
>>>   
>>>       
>> It's actually already sort of self-tuning.  kipmid sleeps unless there is 
>> IPMI activity.  It only spins if it is expecting something from the 
>> controller.
>>
>> I've been thinking about this a little more.  Assuming that the self-tuning 
>> is working (and it appears to be working fine on my systems), that means 
>> that something is causing the IPMI driver to constantly talk to the 
>> management controller.  I can think of three things:
>>
>>   1. The user is constantly sending messages to management controller.
>>   2. There is something wrong with the hardware, like the ATTN bit is
>>      stuck high, causing the driver to constantly poll the management
>>      controller.
>>   3. The driver either has a bug or needs some more work to account for
>>      something the hardware needs it to do to clear the ATTN bit.
>>
>> If it's #1 above, then I don't know if there is anything we can do about 
>> it.  The patch Martin sent will simply slow things down.
>>     
>
> Does the "normal" ipmi userspace tools do #1?
>   
That depends how they are used and configured.  If you make them 
constantly poll for events or grab sensor values, then they will just 
use CPU.  By default they shouldn't do anything.

> For #2, this might make sense, as I have had reports of some hardware
> working just fine, while others have the load issue.  Both were
> different hardware manufacturers.
>
>   
>> #2 and #3 will require someone to do some debugging.  If the ATTN bit is 
>> stuck, you should see the "attentions" field in /proc/ipmi/0/si_stats 
>> constantly going up.  Actually, the contents of that file would be helpful, 
>> along with /proc/ipmi/0/stats.
>>     
>
> Martin has one of these machines, right?  If not, I can dig and try to
> get some information as well.
>   
I'll wait for Martin, hopefully he can get the info.

Thanks,

-corey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)
  2009-03-20 18:28         ` Corey Minyard
@ 2009-03-23 13:17           ` Martin Wilck
  2009-03-23 15:32             ` Greg KH
  2009-03-23 20:39             ` Corey Minyard
  2009-03-23 13:25           ` [PATCH] limit CPU time spent in kipmid Martin Wilck
  1 sibling, 2 replies; 20+ messages in thread
From: Martin Wilck @ 2009-03-23 13:17 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Greg KH, linux-kernel, openipmi-developer

[-- Attachment #1: Type: text/plain, Size: 4150 bytes --]

Hi Corey, hi Greg, hi all,

first of all I need to apologize, because _the first patch I sent was 
broken_. The attached patch should work better.

I did some benchmarking with this patch. In short:

1. The kipmid_max_busy parameter is a tunable that behaves reasonably. 
2. Low values of this parameter use up almost as little CPU as the 
"force_kipmid=0" case, but perform better.
3. It is important to distinguish cases with and without CPU load.
4. To offer this tunable to make a balance between max. CPU load of 
kipmid and performance appears to be worthwhile for many users.

Now the details ... The following tables are in CSV format. The 
benchmark used was a script using ipmitool to read all SDRs and all SEL 
events from the BMC 10x in a loop. This takes 22s with the default 
driver (using nearly 100% CPU), and almost 30x longer without kipmid 
(force_kipmid=off). The "busy cycles" in the table were calculated from 
oprofile CPU_CLK_UNHALTED counts; the "kipmid CPU%" are output from "ps 
-eo pcpu". The tested kernel was an Enterprise Linux kernel with HZ=1000.

"Results without load"
         "elapsed(s)"    "elapsed (rel.)"        "kipmid CPU% (ps)" 
  "CPU busy cycles (%)"
"default       "        22      1       32      103.15
"force_kipmid=0"        621     28.23   0       12.7
"kipmid_max_busy=5000"  21      0.95    34      100.16
"kipmid_max_busy=2000"  22      1       34      94.04
"kipmid_max_busy=1000"  27      1.23    25      26.89
"kipmid_max_busy=500"   24      1.09    0       69.44
"kipmid_max_busy=200"   42      1.91    0       46.72
"kipmid_max_busy=100"   68      3.09    0       17.87
"kipmid_max_busy=50"    101     4.59    0       22.91
"kipmid_max_busy=20"    163     7.41    0       19.98
"kipmid_max_busy=10"    213     9.68    0       13.19

As expected, kipmid_max_busy > 1000 has almost no effect (with HZ=1000). 
kipmid_max_busy=500 saves 30% busy time losing only 10% performance. 
With kipmid_max_busy=10, the performance result is 3x better than just 
switching kipmid totally off, with almost the same amount of CPU busy 
cycles. Note that the %CPU displayed by "ps", "top" etc drops to 0 for 
kipmid_max_busy < HZ. This effect is an artefact caused by the CPU time 
being measured only at timer interrupts. But it will also make user 
complains about kipmid drop to 0 - think about it ;-)

I took another run with a system under 100% CPU load by other processes. 
Now there is hardly any performance difference any more. As expected,
the kipmid runs are all only slightly faster than the interrupt-driven 
run which isn't affected by the CPU load. In this case, recording the 
CPU load from kipmid makes no sense (it is ~0 anyway).

         "elapsed(s)"    "elapsed (rel.)"        "kipmid CPU% (ps)"
"Results with 100% CPU load"
"default       "        500     22.73
"force_kipmid=0"        620     28.18
"kipmid_max_busy=1000"  460     20.91
"kipmid_max_busy=500"   500     22.73
"kipmid_max_busy=200"   530     24.09
"kipmid_max_busy=100"   570     25.91


As I said initially, these are results taken on a single system. On this 
system the KCS response times (from start to end of the 
SI_SM_CALL_WITH_DELAY loop) are between 200 and 2000 us:

us	%wait finished until
200	0%
400	21%
600	39%
800	44%
1000	55%
1200	89%
1400	94%
1600	97%

This may well be different on other systems, depending on the BMC, 
number of sensors, etc. Therefore I think this should remain a tunable, 
because finding an optimal value for arbitrary systems will be hard. Of 
course, the impi driver could implement some sort of self-tuning logic, 
but that would be overengineered to my taste. kipmid_max_busy would give 
HW vendors a chance to determine an optimal value for a given system and 
give a respective recommendation to users.

Best regards
Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

[-- Attachment #2: ipmi_si_max_busy-fixed-2.6.29-rc8.diff --]
[-- Type: text/x-patch, Size: 3298 bytes --]

Patch: Make busy-loops in kipmid configurable (take 2)

While the current kipmid implementation is optimal in the sense that kipmid obtains
data as quickly as possible without stealing CPU time from other processes
(by running on low prio and calling schedule() in each loop iteration), it
may spend a lot of CPU time polling for data e.g. on the KCS interface. The
busy loop also prevents CPUs from entering sleep states when no KCS data is
available.

This patch adds a module parameter "kipmid_max_busy" that specifies how many
microseconds kipmid should busy-loop before going to sleep. It affects only the
SI_SM_CALL_WITH_DELAY case, where a delay is acceptible. My experiments have shown
that SI_SM_CALL_WITH_DELAY catches about 99% of smi_event_handler calls. The
new parameter defaults to 0 (previous behavior - busy-loop forever).

Signed-off-by: Martin Wilck <martin.wilck@fujitsu-siemens.com>
--- linux-2.6.29-rc8/drivers/char/ipmi/ipmi_si_intf.c.orig	2009-03-13 03:39:28.000000000 +0100
+++ linux-2.6.29-rc8/drivers/char/ipmi/ipmi_si_intf.c	2009-03-23 12:48:17.595361036 +0100
@@ -298,6 +298,7 @@ static int force_kipmid[SI_MAX_PARMS];
 static int num_force_kipmid;
 
 static int unload_when_empty = 1;
+static unsigned int kipmid_max_busy;
 
 static int try_smi_init(struct smi_info *smi);
 static void cleanup_one_si(struct smi_info *to_clean);
@@ -927,20 +928,52 @@ static void set_run_to_completion(void *
 	}
 }
 
+static int ipmi_thread_must_sleep(enum si_sm_result smi_result, int *busy,
+				  struct timespec *busy_time)
+{
+	if (kipmid_max_busy == 0)
+		return 0;
+
+	if (smi_result != SI_SM_CALL_WITH_DELAY) {
+		if (*busy != 0)
+			*busy = 0;
+		return 0;
+	}
+
+	if (*busy == 0) {
+		*busy = 1;
+		getnstimeofday(busy_time);
+		timespec_add_ns(busy_time, kipmid_max_busy*NSEC_PER_USEC);
+	} else {
+		struct timespec now;
+		getnstimeofday(&now);
+		if (unlikely(timespec_compare(&now, busy_time) > 0)) {
+			*busy = 0;
+			return 1;
+		}
+	}
+	return 0;
+}
+
 static int ipmi_thread(void *data)
 {
 	struct smi_info *smi_info = data;
 	unsigned long flags;
 	enum si_sm_result smi_result;
+	struct timespec busy_time;
+	int busy = 0;
 
 	set_user_nice(current, 19);
 	while (!kthread_should_stop()) {
+		int must_sleep;
 		spin_lock_irqsave(&(smi_info->si_lock), flags);
 		smi_result = smi_event_handler(smi_info, 0);
 		spin_unlock_irqrestore(&(smi_info->si_lock), flags);
+		must_sleep = ipmi_thread_must_sleep(smi_result,
+						    &busy, &busy_time);
 		if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
 			; /* do nothing */
-		else if (smi_result == SI_SM_CALL_WITH_DELAY)
+		else if (smi_result == SI_SM_CALL_WITH_DELAY && !must_sleep)
 			schedule();
 		else
 			schedule_timeout_interruptible(1);
@@ -1213,6 +1246,11 @@ module_param(unload_when_empty, int, 0);
 MODULE_PARM_DESC(unload_when_empty, "Unload the module if no interfaces are"
 		 " specified or found, default is 1.  Setting to 0"
 		 " is useful for hot add of devices using hotmod.");
+module_param(kipmid_max_busy, uint, 0);
+MODULE_PARM_DESC(kipmid_max_busy,
+		 "Max time (in microseconds) to busy-wait for IPMI data before"
+		 " sleeping. 0 (default) means to wait forever. Set to 100-500"
+		 " if kipmid is using up a lot of CPU time.");
 
 
 static void std_irq_cleanup(struct smi_info *info)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid
  2009-03-20 18:28         ` Corey Minyard
  2009-03-23 13:17           ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
@ 2009-03-23 13:25           ` Martin Wilck
  1 sibling, 0 replies; 20+ messages in thread
From: Martin Wilck @ 2009-03-23 13:25 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Greg KH, linux-kernel, openipmi-developer

Corey Minyard wrote:

>>> It's actually already sort of self-tuning.  kipmid sleeps unless there is 
>>> IPMI activity.  It only spins if it is expecting something from the 
>>> controller.

The self-tuning is fine (as long as there is no CPU load, which may slow 
down stuff a lot, see my other posting). But on systems with many 
sensors it will lead to considerable CPU time shown in "top" and other 
tools for kipmid. And this confuses users. Users think that this is the 
hardware vendor's fault - that's why I am sending this patch (if you so 
wish, it is indeed the vendor's fault to use the outdated KCS interface 
- but that's a different discussion, please let's keep it separate).

>>> I've been thinking about this a little more.  Assuming that the self-tuning 
>>> is working (and it appears to be working fine on my systems), that means 
>>> that something is causing the IPMI driver to constantly talk to the 
>>> management controller.  I can think of three things:
>>>
>>>   1. The user is constantly sending messages to management controller.

This is what I did in my benchmark, of course. But also in real systems, 
there are now many sensors (think dozens of DIMMs with several sensors 
on each DIMM) and many events, causing constant IPMI traffic.

>>>   2. There is something wrong with the hardware, like the ATTN bit is
>>>      stuck high, causing the driver to constantly poll the management
>>>      controller.
>>>   3. The driver either has a bug or needs some more work to account for
>>>      something the hardware needs it to do to clear the ATTN bit.

I think both 2.) and 3.) is not the case here.

>>> If it's #1 above, then I don't know if there is anything we can do about 
>>> it.  The patch Martin sent will simply slow things down.

True, but only a little bit. Please look at the numbers in my other posting.

Best regards
Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)
  2009-03-23 13:17           ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
@ 2009-03-23 15:32             ` Greg KH
  2009-03-23 16:20               ` Martin Wilck
  2009-03-23 20:39             ` Corey Minyard
  1 sibling, 1 reply; 20+ messages in thread
From: Greg KH @ 2009-03-23 15:32 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Corey Minyard, linux-kernel, openipmi-developer

On Mon, Mar 23, 2009 at 02:17:20PM +0100, Martin Wilck wrote:
> +module_param(kipmid_max_busy, uint, 0);

Can you make this modifiable at runtime by setting the mode to 0644?
That way people would not have to unload and then reload the module in
order to be able to fix their machines, as they usually don't know they
have a problem until they run their system for a while.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)
  2009-03-23 15:32             ` Greg KH
@ 2009-03-23 16:20               ` Martin Wilck
  0 siblings, 0 replies; 20+ messages in thread
From: Martin Wilck @ 2009-03-23 16:20 UTC (permalink / raw)
  To: Greg KH; +Cc: Corey Minyard, linux-kernel, openipmi-developer

Greg KH wrote:

> Can you make this modifiable at runtime by setting the mode to 0644?
> That way people would not have to unload and then reload the module in
> order to be able to fix their machines, as they usually don't know they
> have a problem until they run their system for a while.

Good point, but let's first sort out if there's agreement about the 
general idea, and do the fine-tuning later. I may actually make sense to 
make this an array parameter so that different IPMI interfaces can have 
different settings (like force_kipmid).

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)
  2009-03-23 13:17           ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
  2009-03-23 15:32             ` Greg KH
@ 2009-03-23 20:39             ` Corey Minyard
  2009-03-24  9:22               ` Martin Wilck
                                 ` (2 more replies)
  1 sibling, 3 replies; 20+ messages in thread
From: Corey Minyard @ 2009-03-23 20:39 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Greg KH, linux-kernel, openipmi-developer

I've done some experimenting with your patch and some more thinking.  
(BTW, setting the permissions of kipmid_max_busy to 0644 as Greg 
suggested makes changing the value for testing a lot easier :).

Results are not so good with the system I was working with.  I have a 
tool that measures latency of individual messages, averaging over a 
number of messages.  It's part of the openipmi library, if you want to 
grab it.

For a message that requires almost no CPU from the management controller 
(a Get MC ID command), it takes around 5ms per message round-trip and 
uses about 10% of a CPU.  Setting the the max busy to 500 causes it to 
take about 23ms per message round trip and the CPU usage is not measurable.

Fetching SDRs (sensor data repository items), which will require more 
work on the management controller, is a bit different.  Each message 
takes 22ms with max_busy disabled using about 50% of the CPU.  Setting 
it to 500 changes the value to 44ms per message, no measurable CPU.  
Still not great, but not 5 times worse, either.  (The reason you are 
seeing 100% CPU and I'm not is because ipmitool issues more than one 
fetch to the driver at a time so the next command is ready to go as soon 
as the driver finishes one, so the driver will not do a 1-tick sleep 
between messages).

I'm guessing that the difference is that there is a long delay between 
receiving the command and issuing the result in the SDR fetch command.  
With your patch, this puts the driver to sleep for a tick when this 
happens.  The individual byte transfers are short, so the tick-long 
sleep doesn't happen in that case.

I'm also pretty sure I know what is going on in general.  You are using 
ipmitool to fetch sensors with a short poll time and your management 
controller does not implement a useful feature.

The reason that some systems doing this use a lot of CPU and other 
systems do not has do with the management controller design.  Some 
management controllers implement a UUID and a timestamp on the SDR data. 
ipmitool will locally cache the data and if the UUID and timestamp are 
the same it will not fetch the SDRs.  Just fetching the sensor values 
will be very efficient, much like the Get MC ID command.   If this is 
not implemented in the management controller, ipmitool will fetch all 
the SDRs every time you run it, which is terribly inefficient.  I'm 
guessing that's your situation.

I'm ok with the patch with the feature disabled by default.  I'd prefer 
for it to be disabled by default because I prefer to reward vendors that 
make our lives better and punish vendors that make our lives worse :).  
You should run it through checkpatch; there were one or two coding style 
violations.

I also have a few suggestions for solving this problem outside of this 
patch:

   1. Get your vendor to implement UUIDs and timestamps.  This will make
      things run more than an order of magnitude faster and more
      efficient.  Even better than interrupts.
   2. If that's not possible, don't use ipmitool.  Instead, write a
      program with the openipmi library that stays up all the time (so
      the SDR fetch is only done once at startup) and dumps the sensors
      periodically.
   3. If that's not feasible, poll less often and use events to catch
      critical changes.  Of course, this being IPMI, some vendors don't
      properly implement events on their sensors, so that may not work.

-corey


Martin Wilck wrote:
> Hi Corey, hi Greg, hi all,
>
> first of all I need to apologize, because _the first patch I sent was 
> broken_. The attached patch should work better.
>
> I did some benchmarking with this patch. In short:
>
> 1. The kipmid_max_busy parameter is a tunable that behaves reasonably. 
> 2. Low values of this parameter use up almost as little CPU as the 
> "force_kipmid=0" case, but perform better.
> 3. It is important to distinguish cases with and without CPU load.
> 4. To offer this tunable to make a balance between max. CPU load of 
> kipmid and performance appears to be worthwhile for many users.
>
> Now the details ... The following tables are in CSV format. The 
> benchmark used was a script using ipmitool to read all SDRs and all 
> SEL events from the BMC 10x in a loop. This takes 22s with the default 
> driver (using nearly 100% CPU), and almost 30x longer without kipmid 
> (force_kipmid=off). The "busy cycles" in the table were calculated 
> from oprofile CPU_CLK_UNHALTED counts; the "kipmid CPU%" are output 
> from "ps -eo pcpu". The tested kernel was an Enterprise Linux kernel 
> with HZ=1000.
>
> "Results without load"
>         "elapsed(s)"    "elapsed (rel.)"        "kipmid CPU% (ps)" 
>  "CPU busy cycles (%)"
> "default       "        22      1       32      103.15
> "force_kipmid=0"        621     28.23   0       12.7
> "kipmid_max_busy=5000"  21      0.95    34      100.16
> "kipmid_max_busy=2000"  22      1       34      94.04
> "kipmid_max_busy=1000"  27      1.23    25      26.89
> "kipmid_max_busy=500"   24      1.09    0       69.44
> "kipmid_max_busy=200"   42      1.91    0       46.72
> "kipmid_max_busy=100"   68      3.09    0       17.87
> "kipmid_max_busy=50"    101     4.59    0       22.91
> "kipmid_max_busy=20"    163     7.41    0       19.98
> "kipmid_max_busy=10"    213     9.68    0       13.19
>
> As expected, kipmid_max_busy > 1000 has almost no effect (with 
> HZ=1000). kipmid_max_busy=500 saves 30% busy time losing only 10% 
> performance. With kipmid_max_busy=10, the performance result is 3x 
> better than just switching kipmid totally off, with almost the same 
> amount of CPU busy cycles. Note that the %CPU displayed by "ps", "top" 
> etc drops to 0 for kipmid_max_busy < HZ. This effect is an artefact 
> caused by the CPU time being measured only at timer interrupts. But it 
> will also make user complains about kipmid drop to 0 - think about it ;-)
>
> I took another run with a system under 100% CPU load by other 
> processes. Now there is hardly any performance difference any more. As 
> expected,
> the kipmid runs are all only slightly faster than the interrupt-driven 
> run which isn't affected by the CPU load. In this case, recording the 
> CPU load from kipmid makes no sense (it is ~0 anyway).
>
>         "elapsed(s)"    "elapsed (rel.)"        "kipmid CPU% (ps)"
> "Results with 100% CPU load"
> "default       "        500     22.73
> "force_kipmid=0"        620     28.18
> "kipmid_max_busy=1000"  460     20.91
> "kipmid_max_busy=500"   500     22.73
> "kipmid_max_busy=200"   530     24.09
> "kipmid_max_busy=100"   570     25.91
>
>
> As I said initially, these are results taken on a single system. On 
> this system the KCS response times (from start to end of the 
> SI_SM_CALL_WITH_DELAY loop) are between 200 and 2000 us:
>
> us    %wait finished until
> 200    0%
> 400    21%
> 600    39%
> 800    44%
> 1000    55%
> 1200    89%
> 1400    94%
> 1600    97%
>
> This may well be different on other systems, depending on the BMC, 
> number of sensors, etc. Therefore I think this should remain a 
> tunable, because finding an optimal value for arbitrary systems will 
> be hard. Of course, the impi driver could implement some sort of 
> self-tuning logic, but that would be overengineered to my taste. 
> kipmid_max_busy would give HW vendors a chance to determine an optimal 
> value for a given system and give a respective recommendation to users.
>
> Best regards
> Martin
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)
  2009-03-23 20:39             ` Corey Minyard
@ 2009-03-24  9:22               ` Martin Wilck
  2009-03-24  9:30               ` Improving IPMI performance under load Martin Wilck
  2009-04-06 13:48               ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
  2 siblings, 0 replies; 20+ messages in thread
From: Martin Wilck @ 2009-03-24  9:22 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Greg KH, linux-kernel, openipmi-developer

Hi Corey,

thanks a lot for your comments.

> I've done some experimenting with your patch and some more thinking.  
> (BTW, setting the permissions of kipmid_max_busy to 0644 as Greg 
> suggested makes changing the value for testing a lot easier :).
> 
> Results are not so good with the system I was working with.  

I expected things to differ between various systems.

> I have a 
> tool that measures latency of individual messages, averaging over a 
> number of messages.  It's part of the openipmi library, if you want to 
> grab it.

I will check it out.

> I'm also pretty sure I know what is going on in general.  You are using 
> ipmitool to fetch sensors with a short poll time and your management 
> controller does not implement a useful feature.

Some of the SDRs were useful temperature sensors. The SEL data were also 
useful data. I wanted to have a simple benchmark that produces a similar 
load to a real situation. I am grateful for a suggestion of a better tool.

> The reason that some systems doing this use a lot of CPU and other 
> systems do not has do with the management controller design.  Some 
> management controllers implement a UUID and a timestamp on the SDR data. 
> ipmitool will locally cache the data and if the UUID and timestamp are 
> the same it will not fetch the SDRs.  Just fetching the sensor values 
> will be very efficient, much like the Get MC ID command.   If this is 
> not implemented in the management controller, ipmitool will fetch all 
> the SDRs every time you run it, which is terribly inefficient.  I'm 
> guessing that's your situation.

In the benchmark case, this is what I intended to do (I wanted to 
measure the KCS driver performance and load, after all). In the real 
life situation, we aren't using ipmitool at all, we have our own server 
management daemon.

> I'm ok with the patch with the feature disabled by default.  I'd prefer 
> for it to be disabled by default because I prefer to reward vendors that 
> make our lives better and punish vendors that make our lives worse :).  
  on the user side.
I agree the current behavior is good as default.

> You should run it through checkpatch; there were one or two coding style 
> violations.

Weird, I ran it. Perhaps not the latest version. I will resend soon.

> I also have a few suggestions for solving this problem outside of this 
> patch:
> 
>    1. Get your vendor to implement UUIDs and timestamps.  This will make
>       things run more than an order of magnitude faster and more
>       efficient.  Even better than interrupts.
>    2. If that's not possible, don't use ipmitool.  Instead, write a
>       program with the openipmi library that stays up all the time (so
>       the SDR fetch is only done once at startup) and dumps the sensors
>       periodically.

This is what we are doing in real life. I need to check how much caching 
the user space tool does, though.

>    3. If that's not feasible, poll less often and use events to catch
>       critical changes.  Of course, this being IPMI, some vendors don't
>       properly implement events on their sensors, so that may not work.

You forgot to suggest "implement BT" :-) still the best thing to do IMO.

Best regards,
Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Improving IPMI performance under load
  2009-03-23 20:39             ` Corey Minyard
  2009-03-24  9:22               ` Martin Wilck
@ 2009-03-24  9:30               ` Martin Wilck
  2009-03-24 13:08                 ` [Openipmi-developer] " Corey Minyard
  2009-04-06 13:48               ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
  2 siblings, 1 reply; 20+ messages in thread
From: Martin Wilck @ 2009-03-24  9:30 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Greg KH, linux-kernel, openipmi-developer

Hi Corey,

yesterday I posted some results about the IPMI performance under CPU 
load, which can be up to 25 times slower than in an idle system. I think 
it might be worthwhile to try to improve that behavior as well.

I made a variation of my patch which introduces a second parameter 
(kipmid_min_busy) that causes kipmid not to call schedule() for a 
certain amount of time. Thus if there's IPMI traffic pending, kipmid 
will busy-loop for kipmid_min_busy seconds, then starting to schedule() 
in each loop as it does now, and finally go to sleep when 
kipmid_max_busy is reached. At the same time, I changed the nice value 
of kipmid from 19 to 0.

With this patch and e.g. min_busy=100 and max_busy=200, there is no 
noticeable difference any more between IPMI performance with and without 
CPU load.

The patch + results still need cleanup, therefore I am not sending it 
right now. Just wanted to hear what you think.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Openipmi-developer] Improving IPMI performance under load
  2009-03-24  9:30               ` Improving IPMI performance under load Martin Wilck
@ 2009-03-24 13:08                 ` Corey Minyard
  2009-03-24 13:21                   ` Martin Wilck
  2009-03-24 15:50                   ` Matt Domsch
  0 siblings, 2 replies; 20+ messages in thread
From: Corey Minyard @ 2009-03-24 13:08 UTC (permalink / raw)
  To: Martin Wilck; +Cc: Greg KH, openipmi-developer, linux-kernel

Martin Wilck wrote:
> Hi Corey,
>
> yesterday I posted some results about the IPMI performance under CPU 
> load, which can be up to 25 times slower than in an idle system. I think 
> it might be worthwhile to try to improve that behavior as well.
>   
Yes, that would be expected, as kipmid would never be scheduled in a 
busy system, and it would just be the timer driving things.

> I made a variation of my patch which introduces a second parameter 
> (kipmid_min_busy) that causes kipmid not to call schedule() for a 
> certain amount of time. Thus if there's IPMI traffic pending, kipmid 
> will busy-loop for kipmid_min_busy seconds, then starting to schedule() 
> in each loop as it does now, and finally go to sleep when 
> kipmid_max_busy is reached. At the same time, I changed the nice value 
> of kipmid from 19 to 0.
>   
I would guess that changing the nice value is the main thing that caused 
the difference.  The other changes probably didn't make as big a difference.

> With this patch and e.g. min_busy=100 and max_busy=200, there is no 
> noticeable difference any more between IPMI performance with and without 
> CPU load.
>
> The patch + results still need cleanup, therefore I am not sending it 
> right now. Just wanted to hear what you think.
>   
I'm ok with tuning like this, but most users are probably not going to 
want this type of behavior.

-corey

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Openipmi-developer] Improving IPMI performance under load
  2009-03-24 13:08                 ` [Openipmi-developer] " Corey Minyard
@ 2009-03-24 13:21                   ` Martin Wilck
  2009-03-24 15:50                   ` Matt Domsch
  1 sibling, 0 replies; 20+ messages in thread
From: Martin Wilck @ 2009-03-24 13:21 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Greg KH, openipmi-developer, linux-kernel

Corey Minyard wrote:

> I would guess that changing the nice value is the main thing that caused 
> the difference.  The other changes probably didn't make as big a difference.

That's true, but setting the nice level to 0 isn't "nice" without 
kipmid_max_busy. The two parameters help to make sure that kipmid 
doesn't use excessive CPU time.

I am not sure about your reasons to call schedule() in every loop 
iteration. If there is no other process that needs to run, it will just 
waste cycles trying to figure that out. If there are other processes, 
you say yourself that "kipmid would never be scheduled in a
busy system". Does it really make sense to  call schedule() every 
microsecond? That's what kipmid effectively does if it waits for the KCS 
interface, because it'll do a port_inb() in every iteration which takes 
ca. 1us.

> I'm ok with tuning like this, but most users are probably not going to 
> want this type of behavior.

Let's wait and see :-)

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Openipmi-developer] Improving IPMI performance under load
  2009-03-24 13:08                 ` [Openipmi-developer] " Corey Minyard
  2009-03-24 13:21                   ` Martin Wilck
@ 2009-03-24 15:50                   ` Matt Domsch
  2009-03-24 17:15                     ` Martin Wilck
  1 sibling, 1 reply; 20+ messages in thread
From: Matt Domsch @ 2009-03-24 15:50 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Martin Wilck, Greg KH, openipmi-developer, linux-kernel

On Tue, Mar 24, 2009 at 08:08:36AM -0500, Corey Minyard wrote:
> Martin Wilck wrote:
> >Hi Corey,
> >
> >yesterday I posted some results about the IPMI performance under CPU 
> >load, which can be up to 25 times slower than in an idle system. I think 
> >it might be worthwhile to try to improve that behavior as well.
> >  
> Yes, that would be expected, as kipmid would never be scheduled in a 
> busy system, and it would just be the timer driving things.
> 
> >I made a variation of my patch which introduces a second parameter 
> >(kipmid_min_busy) that causes kipmid not to call schedule() for a 
> >certain amount of time. Thus if there's IPMI traffic pending, kipmid 
> >will busy-loop for kipmid_min_busy seconds, then starting to schedule() 
> >in each loop as it does now, and finally go to sleep when 
> >kipmid_max_busy is reached. At the same time, I changed the nice value 
> >of kipmid from 19 to 0.
> >  
> I would guess that changing the nice value is the main thing that caused 
> the difference.  The other changes probably didn't make as big a difference.
> 
> >With this patch and e.g. min_busy=100 and max_busy=200, there is no 
> >noticeable difference any more between IPMI performance with and without 
> >CPU load.
> >
> >The patch + results still need cleanup, therefore I am not sending it 
> >right now. Just wanted to hear what you think.
> >  
> I'm ok with tuning like this, but most users are probably not going to 
> want this type of behavior.

I still get complaints from users who see their CPU utilization spike
attributed to kipmi0 when userspace throws a lot of requests down to
the controller.  I've seen them want to limit kipmi0 even further, not
speed it up.

-- 
Matt Domsch
Linux Technology Strategist, Dell Office of the CTO
linux.dell.com & www.dell.com/linux

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [Openipmi-developer] Improving IPMI performance under load
  2009-03-24 15:50                   ` Matt Domsch
@ 2009-03-24 17:15                     ` Martin Wilck
  0 siblings, 0 replies; 20+ messages in thread
From: Martin Wilck @ 2009-03-24 17:15 UTC (permalink / raw)
  To: Matt Domsch; +Cc: Corey Minyard, Greg KH, openipmi-developer, linux-kernel

Matt Domsch wrote:

> I still get complaints from users who see their CPU utilization spike
> attributed to kipmi0 when userspace throws a lot of requests down to
> the controller.  I've seen them want to limit kipmi0 even further, not
> speed it up.

My first patch ("limit CPU time spent in kipmi") is targeted at exactly 
those users that you are talking about.

These users make their observations on idle systems. You'll never see 
kipmid use up CPU under "top" in systems with high CPU load. With the 
new patch I have in mind, kipmid can be tuned to be faster under load 
and at the same time use less cycles (at the cost of slightly decreased 
speed) when idle.

Martin

-- 
Martin Wilck
PRIMERGY System Software Engineer
FSC IP ESP DEV 6

Fujitsu Siemens Computers GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn
Germany

Tel:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			mailto:martin.wilck@fujitsu-siemens.com
Internet:		http://www.fujitsu-siemens.com
Company Details:	http://www.fujitsu-siemens.com/imprint.html

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN)
  2009-03-23 20:39             ` Corey Minyard
  2009-03-24  9:22               ` Martin Wilck
  2009-03-24  9:30               ` Improving IPMI performance under load Martin Wilck
@ 2009-04-06 13:48               ` Martin Wilck
  2009-06-04 18:39                 ` [PATCH] limit CPU time spent in kipmid (version 4) Martin Wilck
  2 siblings, 1 reply; 20+ messages in thread
From: Martin Wilck @ 2009-04-06 13:48 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Wilck Martin, Greg KH, linux-kernel, openipmi-developer

[-- Attachment #1: Type: text/plain, Size: 762 bytes --]

Hello Corey,

> I've done some experimenting with your patch and some more thinking.  
> (BTW, setting the permissions of kipmid_max_busy to 0644 as Greg 
> suggested makes changing the value for testing a lot easier :).

I apologize for the long delay - busy with other stuff.

Here is the patch with modifiable, per-interface module parameter. 
Feedback is welcome.

Best regards,
Martin (*note changed email address!*)

-- 
Dr. Martin Wilck
PRIMERGY System Software Engineer
x86 Server Engineering

Fujitsu Technology Solutions GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn, Germany

Phone:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			martin.wilck@ts.fujitsu.com
Internet:		http://ts.fujitsu.com
Company Details:	http://de.ts.fujitsu.com/imprint.html

[-- Attachment #2: kipmid_max_busy_arr_2.6.29.diff --]
[-- Type: text/x-patch, Size: 3312 bytes --]

Patch: Make busy-loops in kipmid configurable (take 3)

This patch adds a module parameter "kipmid_max_busy" that specifies how many
microseconds kipmid should busy-loop before going to sleep. The
new parameter defaults to 0 (previous behavior - busy-loop forever).

The main rationale of this patch is to be able to avoid irritating users
who see lots of CPU time spent int kipmid0. With a bit of tuning of the
kipmid_max_busy parameter it is possible to decrease the CPU usage of kipmid
to negligible values without losing much performance.

In this version of the patch, the parameter can be specified per-interface,
like other ipmi_si parameters, and can be modified at runtime via sysfs.

Signed-off-by: martin.wilck@ts.fujitsu.com

--- a/drivers/char/ipmi/ipmi_si_intf.c	2009-03-24 00:12:14.000000000 +0100
+++ b/drivers/char/ipmi/ipmi_si_intf.c	2009-04-06 17:16:41.000000000 +0200
@@ -297,6 +297,9 @@ struct smi_info {
 static int force_kipmid[SI_MAX_PARMS];
 static int num_force_kipmid;
 
+static unsigned int kipmid_max_busy[SI_MAX_PARMS];
+static int num_max_busy;
+
 static int unload_when_empty = 1;
 
 static int try_smi_init(struct smi_info *smi);
@@ -927,23 +930,57 @@ static void set_run_to_completion(void *
 	}
 }
 
+static int ipmi_thread_must_sleep(enum si_sm_result smi_result, int intf_num,
+				  int *busy, struct timespec *busy_time)
+{
+	unsigned int max_busy = 0;
+	if (smi_result != SI_SM_CALL_WITH_DELAY) {
+		if (*busy)
+			*busy = 0;
+		return 0;
+	}
+	if (intf_num < num_max_busy)
+		max_busy = kipmid_max_busy[intf_num];
+	if (*busy == 0) {
+		*busy = 1;
+		getnstimeofday(busy_time);
+		timespec_add_ns(busy_time, max_busy*NSEC_PER_USEC);
+	} else if (max_busy == 0)
+		return 0;
+	else {
+		struct timespec now;
+		getnstimeofday(&now);
+		if (unlikely(timespec_compare(&now, busy_time) > 0)) {
+			*busy = 0;
+			return 1;
+		}
+	}
+	return 0;
+}
+
 static int ipmi_thread(void *data)
 {
 	struct smi_info *smi_info = data;
 	unsigned long flags;
 	enum si_sm_result smi_result;
+	struct timespec busy_time;
+	int busy = 0;
 
 	set_user_nice(current, 19);
 	while (!kthread_should_stop()) {
+		int must_sleep;
 		spin_lock_irqsave(&(smi_info->si_lock), flags);
 		smi_result = smi_event_handler(smi_info, 0);
 		spin_unlock_irqrestore(&(smi_info->si_lock), flags);
+		must_sleep = ipmi_thread_must_sleep(smi_result,
+						    smi_info->intf_num,
+						    &busy, &busy_time);
 		if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
 			; /* do nothing */
-		else if (smi_result == SI_SM_CALL_WITH_DELAY)
+		else if (smi_result == SI_SM_CALL_WITH_DELAY && !must_sleep)
 			schedule();
 		else
-			schedule_timeout_interruptible(1);
+			schedule_timeout_interruptible(0);
 	}
 	return 0;
 }
@@ -1213,6 +1250,11 @@ module_param(unload_when_empty, int, 0);
 MODULE_PARM_DESC(unload_when_empty, "Unload the module if no interfaces are"
 		 " specified or found, default is 1.  Setting to 0"
 		 " is useful for hot add of devices using hotmod.");
+module_param_array(kipmid_max_busy, uint, &num_max_busy, 0644);
+MODULE_PARM_DESC(kipmid_max_busy,
+		 "Max time (in microseconds) to busy-wait for IPMI data before"
+		 " sleeping. 0 (default) means to wait forever. Set to 100-500"
+		 " if kipmid is using up a lot of CPU time.");
 
 
 static void std_irq_cleanup(struct smi_info *info)

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH] limit CPU time spent in kipmid (version 4)
  2009-04-06 13:48               ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
@ 2009-06-04 18:39                 ` Martin Wilck
  0 siblings, 0 replies; 20+ messages in thread
From: Martin Wilck @ 2009-06-04 18:39 UTC (permalink / raw)
  To: Corey Minyard; +Cc: Greg KH, linux-kernel, openipmi-developer, Bela Lubkin

[-- Attachment #1: Type: text/plain, Size: 872 bytes --]

Hi all,

I am sorry for the long silence. I am sending here a new version of my 
patch which takes into account Bela's suggestions (well, most of them).
I compiled and tested it with 2.6.29.4, the results are similar as 
before. By setting kipmid_max_busy_us to a value between 100 and 500, it 
is possible to bring down kipmid CPU load to practically 0 without 
loosing too much ipmi throughput performance.

Please give me some feedback whether this patch will get merged, and if 
not, what improvement is needed.

Regards
Martin

-- 
Dr. Martin Wilck
PRIMERGY System Software Engineer
x86 Server Engineering

Fujitsu Technology Solutions GmbH
Heinz-Nixdorf-Ring 1
33106 Paderborn, Germany

Phone:			++49 5251 525 2796
Fax:			++49 5251 525 2820
Email:			martin.wilck@ts.fujitsu.com
Internet:		http://ts.fujitsu.com
Company Details:	http://de.ts.fujitsu.com/imprint.html

[-- Attachment #2: kipmid_max_busy_arr2_2.6.29.diff --]
[-- Type: text/x-patch, Size: 2755 bytes --]

Signed-off-by: martin.wilck@ts.fujitsu.com
--- linux-2.6.29.4/drivers/char/ipmi/ipmi_si_intf.c	2009-05-19 01:52:34.000000000 +0200
+++ linux-2.6.29-rc8/drivers/char/ipmi/ipmi_si_intf.c	2009-06-04 15:30:34.855398091 +0200
@@ -297,6 +297,9 @@
 static int force_kipmid[SI_MAX_PARMS];
 static int num_force_kipmid;
 
+static unsigned int kipmid_max_busy_us[SI_MAX_PARMS];
+static int num_max_busy_us;
+
 static int unload_when_empty = 1;
 
 static int try_smi_init(struct smi_info *smi);
@@ -927,23 +930,56 @@
 	}
 }
 
+#define ipmi_si_set_not_busy(timespec) \
+	do { (timespec)->tv_nsec = -1; } while (0)
+#define ipmi_si_is_busy(timespec) ((timespec)->tv_nsec != -1)
+
+static int ipmi_thread_busy_wait(enum si_sm_result smi_result,
+				 const struct smi_info *smi_info,
+				 struct timespec *busy_until)
+{
+	unsigned int max_busy_us = 0;
+
+	if (smi_info->intf_num < num_max_busy_us)
+		max_busy_us = kipmid_max_busy_us[smi_info->intf_num];
+	if (max_busy_us == 0 || smi_result != SI_SM_CALL_WITH_DELAY)
+		ipmi_si_set_not_busy(busy_until);
+	else if (!ipmi_si_is_busy(busy_until)) {
+		getnstimeofday(busy_until);
+		timespec_add_ns(busy_until, max_busy_us*NSEC_PER_USEC);
+	} else {
+		struct timespec now;
+		getnstimeofday(&now);
+		if (unlikely(timespec_compare(&now, busy_until) > 0)) {
+			ipmi_si_set_not_busy(busy_until);
+			return 0;
+		}
+	}
+	return 1;
+}
+
 static int ipmi_thread(void *data)
 {
 	struct smi_info *smi_info = data;
 	unsigned long flags;
 	enum si_sm_result smi_result;
+	struct timespec busy_until;
 
+	ipmi_si_set_not_busy(&busy_until);
 	set_user_nice(current, 19);
 	while (!kthread_should_stop()) {
+		int busy_wait;
 		spin_lock_irqsave(&(smi_info->si_lock), flags);
 		smi_result = smi_event_handler(smi_info, 0);
 		spin_unlock_irqrestore(&(smi_info->si_lock), flags);
+		busy_wait = ipmi_thread_busy_wait(smi_result, smi_info,
+						  &busy_until);
 		if (smi_result == SI_SM_CALL_WITHOUT_DELAY)
 			; /* do nothing */
-		else if (smi_result == SI_SM_CALL_WITH_DELAY)
+		else if (smi_result == SI_SM_CALL_WITH_DELAY && busy_wait)
 			schedule();
 		else
-			schedule_timeout_interruptible(1);
+			schedule_timeout_interruptible(0);
 	}
 	return 0;
 }
@@ -1213,6 +1249,11 @@
 MODULE_PARM_DESC(unload_when_empty, "Unload the module if no interfaces are"
 		 " specified or found, default is 1.  Setting to 0"
 		 " is useful for hot add of devices using hotmod.");
+module_param_array(kipmid_max_busy_us, uint, &num_max_busy_us, 0644);
+MODULE_PARM_DESC(kipmid_max_busy_us,
+		 "Max time (in microseconds) to busy-wait for IPMI data before"
+		 " sleeping. 0 (default) means to wait forever. Set to 100-500"
+		 " if kipmid is using up a lot of CPU time.");
 
 
 static void std_irq_cleanup(struct smi_info *info)

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2009-06-04 18:50 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-03-19 16:27 [PATCH] limit CPU time spent in kipmid Martin Wilck
2009-03-19 21:31 ` Corey Minyard
2009-03-19 23:51   ` Greg KH
2009-03-20 15:30     ` Corey Minyard
2009-03-20 17:47       ` Greg KH
2009-03-20 18:28         ` Corey Minyard
2009-03-23 13:17           ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
2009-03-23 15:32             ` Greg KH
2009-03-23 16:20               ` Martin Wilck
2009-03-23 20:39             ` Corey Minyard
2009-03-24  9:22               ` Martin Wilck
2009-03-24  9:30               ` Improving IPMI performance under load Martin Wilck
2009-03-24 13:08                 ` [Openipmi-developer] " Corey Minyard
2009-03-24 13:21                   ` Martin Wilck
2009-03-24 15:50                   ` Matt Domsch
2009-03-24 17:15                     ` Martin Wilck
2009-04-06 13:48               ` [PATCH] limit CPU time spent in kipmid (PREVIOUS WAS BROKEN) Martin Wilck
2009-06-04 18:39                 ` [PATCH] limit CPU time spent in kipmid (version 4) Martin Wilck
2009-03-23 13:25           ` [PATCH] limit CPU time spent in kipmid Martin Wilck
2009-03-19 22:41 ` [Openipmi-developer] " Bela Lubkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.