linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Strange delays / what usually happens every 10 min?
@ 2007-11-13 14:24 Florian Boelstler
  2007-11-13 14:41 ` Peter Zijlstra
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Florian Boelstler @ 2007-11-13 14:24 UTC (permalink / raw)
  To: linux-kernel

Hi,

this issue has been already discussed on the kernelnewbies mailing list 
[1],[2] and suggested to be further discussed here.

I am currently working on a MPC8540-based custom board, which runs Linux
2.6.15 (arch/ppc). The original Linux sources have been modified to 
support that custom board. (Additional patches to support LTT are 
applied as well, though disabled in the running kernel)

I set up a periodically running kernel thread, which is delayed for a
single jiffy using schedule_timeout() in an infinite loop. It is used to
measure delays between invocations of that thread. For measuring the
distance in time the PPC's time base lower half register is used
(obtained using get_cycles() defined in asm/timex.h).

The thread calculates the delay to the previous run and only outputs the
result if a new maximum value has been determined (in respect to all
previous cycles). Further the thread outputs a warning if a very "high"
delay was determined. I.e. a delay greater than 5ms.

While running that test driver a delay of about 10ms _exactly_ occurs
every 10 minutes.

The kernel is configured using CONFIG_HZ=1000 and CONFIG_PREEMPT.
The CCB is at 333MHz, whereas the TBR update rate is 333 MHz / 8, i.e.
41,625 MHz.
Kernel configuration as a whole is found here: 
http://nopaste.info/5e4d0283bb.html

And now the funny part starts.
I got a response from Bruce Rowen on kernelnewbies, telling me that he 
came across the same problem. He increased his AMD-Geode-based platform 
to 1GB of RAM (256MB before) and also hit the 10-minutes-issue a few 
month ago (using Linux 2.6.13).
Going back to 256MB cured the problem. I did the same thing by 
instructing the boot loader in order to only use 256 MB of RAM (instead 
of 512MB) and yes, the 10-minutes-issue was gone as well.

Apart of some kernel threads almost all user processes have been killed
during the test. Only SSH and a bash were running (whereas a test with 
network interfaces completely disabled and only operated from a serial 
console turned out the same results).
The kernel comes with compiled in CIFS support, some kernel debugging
features like soft-lockup detection and preemption debugging. I.e. ps
lists the kernel threads ksoftirqd, watchdog, events, khelper, kthread,
kblockd, pdflush, aio, cifsoplockd and cifsdnotifyd.

An appropriate userspace test tool based on nanosleep() determined the
same results like the kernel thread:

root@mpc0:/# /tmp/wait.rt
looping 1 milli seconds nanosleep ...
15:26:16: #1 FRAME MAX 1996 us (at 4139773004 ticks)
15:26:16: #2 FRAME MAX 2002 us (at 4139856360 ticks)
15:26:16: #155 FRAME MAX 2102 us (at 4152597854 ticks)
15:41:37: #460398 FRAME MAX 8941 us (at 3813406605 ticks)
15:41:37: #460398 FRAME HIGH 8941 us (at 3813406605 ticks)
15:51:37: #760394 FRAME MAX 9936 us (at 3018602602 ticks)
15:51:37: #760394 FRAME HIGH 9936 us (at 3018602602 ticks)
16:01:37: #1060390 FRAME HIGH 9935 us (at 2223798809 ticks)
16:11:37: #1360386 FRAME HIGH 9934 us (at 1428994989 ticks)
16:21:37: #1660382 FRAME HIGH 9935 us (at 634191241 ticks)
[...]

Thanks for any help!

Cheers,

   Florian

[1] http://thread.gmane.org/gmane.linux.kernel.kernelnewbies/23419
[2] http://thread.gmane.org/gmane.linux.kernel.kernelnewbies/23426


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Strange delays / what usually happens every 10 min?
  2007-11-13 14:24 Strange delays / what usually happens every 10 min? Florian Boelstler
@ 2007-11-13 14:41 ` Peter Zijlstra
  2007-11-13 16:02 ` Dmitry Adamushko
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Zijlstra @ 2007-11-13 14:41 UTC (permalink / raw)
  To: Florian Boelstler; +Cc: linux-kernel


On Tue, 2007-11-13 at 15:24 +0100, Florian Boelstler wrote:
> Hi,
> 
> this issue has been already discussed on the kernelnewbies mailing list 
> [1],[2] and suggested to be further discussed here.
> 
> I am currently working on a MPC8540-based custom board, which runs Linux
> 2.6.15 (arch/ppc). The original Linux sources have been modified to 
> support that custom board. (Additional patches to support LTT are 
> applied as well, though disabled in the running kernel)
> 
> I set up a periodically running kernel thread, which is delayed for a
> single jiffy using schedule_timeout() in an infinite loop. It is used to
> measure delays between invocations of that thread. For measuring the
> distance in time the PPC's time base lower half register is used
> (obtained using get_cycles() defined in asm/timex.h).
> 
> The thread calculates the delay to the previous run and only outputs the
> result if a new maximum value has been determined (in respect to all
> previous cycles). Further the thread outputs a warning if a very "high"
> delay was determined. I.e. a delay greater than 5ms.
> 
> While running that test driver a delay of about 10ms _exactly_ occurs
> every 10 minutes.
> 
> The kernel is configured using CONFIG_HZ=1000 and CONFIG_PREEMPT.
> The CCB is at 333MHz, whereas the TBR update rate is 333 MHz / 8, i.e.
> 41,625 MHz.
> Kernel configuration as a whole is found here: 
> http://nopaste.info/5e4d0283bb.html
> 
> And now the funny part starts.
> I got a response from Bruce Rowen on kernelnewbies, telling me that he 
> came across the same problem. He increased his AMD-Geode-based platform 
> to 1GB of RAM (256MB before) and also hit the 10-minutes-issue a few 
> month ago (using Linux 2.6.13).
> Going back to 256MB cured the problem. I did the same thing by 
> instructing the boot loader in order to only use 256 MB of RAM (instead 
> of 512MB) and yes, the 10-minutes-issue was gone as well.
> 
> Apart of some kernel threads almost all user processes have been killed
> during the test. Only SSH and a bash were running (whereas a test with 
> network interfaces completely disabled and only operated from a serial 
> console turned out the same results).
> The kernel comes with compiled in CIFS support, some kernel debugging
> features like soft-lockup detection and preemption debugging. I.e. ps
> lists the kernel threads ksoftirqd, watchdog, events, khelper, kthread,
> kblockd, pdflush, aio, cifsoplockd and cifsdnotifyd.
> 
> An appropriate userspace test tool based on nanosleep() determined the
> same results like the kernel thread:
> 
> root@mpc0:/# /tmp/wait.rt
> looping 1 milli seconds nanosleep ...
> 15:26:16: #1 FRAME MAX 1996 us (at 4139773004 ticks)
> 15:26:16: #2 FRAME MAX 2002 us (at 4139856360 ticks)
> 15:26:16: #155 FRAME MAX 2102 us (at 4152597854 ticks)
> 15:41:37: #460398 FRAME MAX 8941 us (at 3813406605 ticks)
> 15:41:37: #460398 FRAME HIGH 8941 us (at 3813406605 ticks)
> 15:51:37: #760394 FRAME MAX 9936 us (at 3018602602 ticks)
> 15:51:37: #760394 FRAME HIGH 9936 us (at 3018602602 ticks)
> 16:01:37: #1060390 FRAME HIGH 9935 us (at 2223798809 ticks)
> 16:11:37: #1360386 FRAME HIGH 9934 us (at 1428994989 ticks)
> 16:21:37: #1660382 FRAME HIGH 9935 us (at 634191241 ticks)
> [...]

You could try the latency tracer from -rt.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Strange delays / what usually happens every 10 min?
  2007-11-13 14:24 Strange delays / what usually happens every 10 min? Florian Boelstler
  2007-11-13 14:41 ` Peter Zijlstra
@ 2007-11-13 16:02 ` Dmitry Adamushko
  2007-11-13 16:08 ` Chris Snook
  2007-11-13 16:23 ` Eric Dumazet
  3 siblings, 0 replies; 8+ messages in thread
From: Dmitry Adamushko @ 2007-11-13 16:02 UTC (permalink / raw)
  To: Florian Boelstler; +Cc: linux-kernel

On 13/11/2007, Florian Boelstler <kernel@boelstler.net> wrote:
> Hi,
>
> this issue has been already discussed on the kernelnewbies mailing list
> [1],[2] and suggested to be further discussed here.
>
> I am currently working on a MPC8540-based custom board, which runs Linux
> 2.6.15 (arch/ppc). The original Linux sources have been modified to
> support that custom board. (Additional patches to support LTT are
> applied as well, though disabled in the running kernel)
>
> I set up a periodically running kernel thread, which is delayed for a
> single jiffy using schedule_timeout() in an infinite loop. It is used to
> measure delays between invocations of that thread. For measuring the
> distance in time the PPC's time base lower half register is used
> (obtained using get_cycles() defined in asm/timex.h).
>
> The thread calculates the delay to the previous run and only outputs the
> result if a new maximum value has been determined (in respect to all
> previous cycles). Further the thread outputs a warning if a very "high"
> delay was determined. I.e. a delay greater than 5ms.
>
> While running that test driver a delay of about 10ms _exactly_ occurs
> every 10 minutes.
> And now the funny part starts.
> I got a response from Bruce Rowen on kernelnewbies, telling me that he
> came across the same problem. He increased his AMD-Geode-based platform
> to 1GB of RAM (256MB before) and also hit the 10-minutes-issue a few
> month ago (using Linux 2.6.13).
> Going back to 256MB cured the problem. I did the same thing by
> instructing the boot loader in order to only use 256 MB of RAM (instead
> of 512MB) and yes, the 10-minutes-issue was gone as well.

Not necessarily gone. I'd say, the question is not whether you still
have 'delays' (> 5 ms. in your case) once in a fixed interval (10
minutes) but whether you have noticeable blips, which are significant
wrt maximum 'delays' you otherwise observe between those points of
time.

What are maximum 'delays' you see before hitting this "once in 10
minutes" point?

say, with 256 Mb. the blips could just become lower (e.g. 2 ms.) and
are not reported as "big ones" (>5 ms. in your terms)...

Quite often the source of high periodic latency is SMI (System
Management Interrupts)... I don't know though, whether any of SMI
activities are somehow dependent on the size of RAM.


-- 
Best regards,
Dmitry Adamushko

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Strange delays / what usually happens every 10 min?
  2007-11-13 14:24 Strange delays / what usually happens every 10 min? Florian Boelstler
  2007-11-13 14:41 ` Peter Zijlstra
  2007-11-13 16:02 ` Dmitry Adamushko
@ 2007-11-13 16:08 ` Chris Snook
  2007-11-13 16:50   ` Clemens Koller
  2007-11-13 16:23 ` Eric Dumazet
  3 siblings, 1 reply; 8+ messages in thread
From: Chris Snook @ 2007-11-13 16:08 UTC (permalink / raw)
  To: Florian Boelstler; +Cc: linux-kernel

Florian Boelstler wrote:
> While running that test driver a delay of about 10ms _exactly_ occurs
> every 10 minutes.

This is precisely the sort of thing that BIOS/firmware-level SMI handlers do, 
particularly those that have monitoring or management features.  Try to 
determine if the kernel is doing anything during this time.  If the entire 
kernel seems to be frozen, talk to the people who wrote the firmware.

	-- Chris

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Strange delays / what usually happens every 10 min?
  2007-11-13 14:24 Strange delays / what usually happens every 10 min? Florian Boelstler
                   ` (2 preceding siblings ...)
  2007-11-13 16:08 ` Chris Snook
@ 2007-11-13 16:23 ` Eric Dumazet
  2007-11-13 17:54   ` Florian Boelstler
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Dumazet @ 2007-11-13 16:23 UTC (permalink / raw)
  To: Florian Boelstler; +Cc: linux-kernel, netdev

Florian Boelstler a écrit :
> Hi,
>
> this issue has been already discussed on the kernelnewbies mailing 
> list [1],[2] and suggested to be further discussed here.
>
> I am currently working on a MPC8540-based custom board, which runs Linux
> 2.6.15 (arch/ppc). The original Linux sources have been modified to 
> support that custom board. (Additional patches to support LTT are 
> applied as well, though disabled in the running kernel)
>
> I set up a periodically running kernel thread, which is delayed for a
> single jiffy using schedule_timeout() in an infinite loop. It is used to
> measure delays between invocations of that thread. For measuring the
> distance in time the PPC's time base lower half register is used
> (obtained using get_cycles() defined in asm/timex.h).
>
> The thread calculates the delay to the previous run and only outputs the
> result if a new maximum value has been determined (in respect to all
> previous cycles). Further the thread outputs a warning if a very "high"
> delay was determined. I.e. a delay greater than 5ms.
>
> While running that test driver a delay of about 10ms _exactly_ occurs
> every 10 minutes.
>
> The kernel is configured using CONFIG_HZ=1000 and CONFIG_PREEMPT.
> The CCB is at 333MHz, whereas the TBR update rate is 333 MHz / 8, i.e.
> 41,625 MHz.
> Kernel configuration as a whole is found here: 
> http://nopaste.info/5e4d0283bb.html
>
> And now the funny part starts.
> I got a response from Bruce Rowen on kernelnewbies, telling me that he 
> came across the same problem. He increased his AMD-Geode-based 
> platform to 1GB of RAM (256MB before) and also hit the 
> 10-minutes-issue a few month ago (using Linux 2.6.13).
> Going back to 256MB cured the problem. I did the same thing by 
> instructing the boot loader in order to only use 256 MB of RAM 
> (instead of 512MB) and yes, the 10-minutes-issue was gone as well.
>
> Apart of some kernel threads almost all user processes have been killed
> during the test. Only SSH and a bash were running (whereas a test with 
> network interfaces completely disabled and only operated from a serial 
> console turned out the same results).
> The kernel comes with compiled in CIFS support, some kernel debugging
> features like soft-lockup detection and preemption debugging. I.e. ps
> lists the kernel threads ksoftirqd, watchdog, events, khelper, kthread,
> kblockd, pdflush, aio, cifsoplockd and cifsdnotifyd.
>
> An appropriate userspace test tool based on nanosleep() determined the
> same results like the kernel thread:
>
> root@mpc0:/# /tmp/wait.rt
> looping 1 milli seconds nanosleep ...
> 15:26:16: #1 FRAME MAX 1996 us (at 4139773004 ticks)
> 15:26:16: #2 FRAME MAX 2002 us (at 4139856360 ticks)
> 15:26:16: #155 FRAME MAX 2102 us (at 4152597854 ticks)
> 15:41:37: #460398 FRAME MAX 8941 us (at 3813406605 ticks)
> 15:41:37: #460398 FRAME HIGH 8941 us (at 3813406605 ticks)
> 15:51:37: #760394 FRAME MAX 9936 us (at 3018602602 ticks)
> 15:51:37: #760394 FRAME HIGH 9936 us (at 3018602602 ticks)
> 16:01:37: #1060390 FRAME HIGH 9935 us (at 2223798809 ticks)
> 16:11:37: #1360386 FRAME HIGH 9934 us (at 1428994989 ticks)
> 16:21:37: #1660382 FRAME HIGH 9935 us (at 634191241 ticks)
> [...]
>
> Thanks for any help!
>
> Cheers,
>
>   Florian
>
> [1] http://thread.gmane.org/gmane.linux.kernel.kernelnewbies/23419
> [2] http://thread.gmane.org/gmane.linux.kernel.kernelnewbies/23426
>

Hi Florian

I think you hit the periodic flush of IP route cache, which is fired 
every 600 seconds per default.

(Check /proc/sys/net/ipv4/route/secret_interval )

For a 1GB machine, this hash table is so big that a full scan might take 
more than 10 ms, even if empty.

Instead of using less RAM, you could just boot with rhash_entries=1024 
to lower the size of this table.

Or just change secret_interval to 2000000 for example (not much more 
because * HZ could overflow)

Eric





^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Strange delays / what usually happens every 10 min?
  2007-11-13 16:08 ` Chris Snook
@ 2007-11-13 16:50   ` Clemens Koller
  2007-11-13 17:58     ` Florian Boelstler
  0 siblings, 1 reply; 8+ messages in thread
From: Clemens Koller @ 2007-11-13 16:50 UTC (permalink / raw)
  To: Chris Snook; +Cc: Florian Boelstler, linux-kernel


Chris Snook schrieb:
 > Florian Boelstler wrote:
 >> While running that test driver a delay of about 10ms _exactly_ occurs
 >> every 10 minutes.
 >
 > This is precisely the sort of thing that BIOS/firmware-level SMI
 > handlers do, particularly those that have monitoring or management
 > features.  Try to determine if the kernel is doing anything during this
 > time.  If the entire kernel seems to be frozen, talk to the people who
 > wrote the firmware.

There is no such thing as BIOS on this MPC8540 embedded system.
I guess he is also using some bootloader which just gives full
control to the linux kernel.

Florian:
Check the IP route chache flushes, as Eric mentioned.
I'm working on the MPC854x as well. If you send me your scripts,
I can give them a run here to gather more information.
(I'm about to update my kernels here)

Regards,

Clemens Koller
__________________________________
R&D Imaging Devices
Anagramm GmbH
Rupert-Mayer-Straße 45/1
Linhof Werksgelände
D-81379 München
Tel.089-741518-50
Fax 089-741518-19
http://www.anagramm-technology.com


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Strange delays / what usually happens every 10 min?
  2007-11-13 16:23 ` Eric Dumazet
@ 2007-11-13 17:54   ` Florian Boelstler
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Boelstler @ 2007-11-13 17:54 UTC (permalink / raw)
  To: linux-kernel; +Cc: netdev

Hi Eric,

Eric Dumazet wrote:
> Instead of using less RAM, you could just boot with rhash_entries=1024 
> to lower the size of this table.

I just tried that and it seems to reduce the scan time. This is the 
result for the first 40 minutes of runtime:

root@mpc0:/# /tmp/wait.rt
looping 1 milli seconds nanosleep ...
17:10:11/425384 #1 MAX 1996/83117/-268599896 us/tick/usec (at 2107848557)
17:10:11/427385 #2 MAX 2001/83327/2001 us/tick/usec (at 2107931884)
17:10:11/433534 #5 MAX 2149/89477/2150 us/tick/usec (at 2108187839)
17:27:02/5897 #505291 MAX 2512/104576/2513 us/tick/usec (at 1223589469)

The first ~10ms delay usually occurred after ~15 minutes. So one could 
argue that the reported HIGH-value at 17:27:02 (GMT) is the first flush 
of IP route cache. And all later flushes weren't longer than 2,5ms.

Thanks to all of you, especially Eric. Now it seems I got an instrument 
to lower system response time.

Cheers,

   Florian

PS: Unfortunately I had to remove some CC:-entries since the local 
firewall seems to not allow anything but NNTP (for gmane) and HTTP.


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Strange delays / what usually happens every 10 min?
  2007-11-13 16:50   ` Clemens Koller
@ 2007-11-13 17:58     ` Florian Boelstler
  0 siblings, 0 replies; 8+ messages in thread
From: Florian Boelstler @ 2007-11-13 17:58 UTC (permalink / raw)
  To: linux-kernel

Hi Clemens,

Clemens Koller wrote:
>  > This is precisely the sort of thing that BIOS/firmware-level SMI
>  > handlers do, particularly those that have monitoring or management
>  > features.  Try to determine if the kernel is doing anything during this
>  > time.  If the entire kernel seems to be frozen, talk to the people who
>  > wrote the firmware.
> 
> There is no such thing as BIOS on this MPC8540 embedded system.
> I guess he is also using some bootloader which just gives full
> control to the linux kernel.

Correct, it's the U-Boot boot loader.

> Florian:
> Check the IP route chache flushes, as Eric mentioned.
> I'm working on the MPC854x as well. If you send me your scripts,
> I can give them a run here to gather more information.
> (I'm about to update my kernels here)

Thanks, the rhash_entries argument seems to have cured the problem.

Cheers,

   Florian


^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2007-11-13 18:00 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-11-13 14:24 Strange delays / what usually happens every 10 min? Florian Boelstler
2007-11-13 14:41 ` Peter Zijlstra
2007-11-13 16:02 ` Dmitry Adamushko
2007-11-13 16:08 ` Chris Snook
2007-11-13 16:50   ` Clemens Koller
2007-11-13 17:58     ` Florian Boelstler
2007-11-13 16:23 ` Eric Dumazet
2007-11-13 17:54   ` Florian Boelstler

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).