John Stultz wrote:

> On Wed, 2007-01-24 at 10:41 +0100, John wrote:
>
>> I'm using the POSIX timers API. My platform is x86 running Linux
>> 2.6.18.6 patched with the high-resolution timer subsystem.
>>
>> http://www.tglx.de/hrtimers.html
>>
>> I've written a small "de-jittering engine" that receives packets in
>> small bursts due to network jitter (typical average rate of 1000 packets
>> per second), and re-sends them at a "smooth" rate.
>>
>> Just before I re-send a packet, I arm a one-shot timer in order to
>> receive a signal when it is time to send the next packet.
>>
>> I've noticed a strange phenomenon that I cannot explain.
>>
>> Sometimes (rarely) the one-shot timer will expire more than 50 µs later
>> than expected. This would seem normal, except that it happens periodically.
>>
>> For example, my app had been running normally for 2 minutes when it
>> started printing diagnostics (see below).
>>
>> The first T_NEXT_POP is the date the timer was supposed to expire,
>>
>> NOW is the date the timer was handled after returning from sigwaitinfo
>> (I am aware that blocking signals, and handling them at a specific point
>> in the code will add some latency)
>>
>> The second T_NEXT_POP is the date the next timer is supposed to expire.
>>
>> DIFF is the difference between real and expected dates.
>>
>> (All dates are CLOCK_MONOTONIC by the way.)
>>
>> As you can see, the first diagnostic came at 472.410501014... Then
>> another diagnostic almost exactly two seconds apart 9 times in a row!
>>
>> My process is the only SCHED_FIFO process on the system. There are no
>> user-space processes with a higher priority. AFAICT, only a kernel
>> thread could keep the CPU away from my app.
>>
>> Is there a periodic kernel thread that runs every 2 seconds, cannot be
>> preempted, and runs for over 50 µs??
>
> This sounds like a BIOS SMI issue. Can you reproduce this behavior on
> different hardware?

I am not familiar with the low-level details of the PC.
Let me check Wikipedia...

http://en.wikipedia.org/wiki/System_Management_Mode

SMI = System Management Interrupt

* Since the SMM code (SMI handler) is installed by the system firmware
  (BIOS), the OS and the SMM code may have expectations about hardware
  settings that are incompatible, such as different ideas of how the
  APIC should be set up.

* Operations in SMM take CPU time away from the OS, since the CPU state
  must be stored to memory (SMRAM) and any write back caches must be
  flushed. This can destroy real-time behavior and cause clock ticks to
  get lost.

* A digital logic analyser may be required to determine if SMM is
  occurring.

* Recovering the SMI handler code to analyze it for bugs,
  vulnerabilities, and secrets requires a logic analyzer or disassembly
  of the system firmware.

Do PCs typically enter System Management Mode periodically? Is it
possible to disable SMM? According to the Wikipedia article, even the
kernel is unaware that the CPU has entered SMM. Is that correct?

I've run my tests on a Dell PC. IIRC, the BIOS options are very basic.
I'll also try another PC as you suggest.

NB: I've attached dmesg and dot.config

If you see anything in there I should have turned on, or that I should
have turned off... (Should I disable power management?)

+++++

On a related note, John, AFAIU, you wrote the GTOD infrastructure.

In my app, I need to call clock_gettime(CLOCK_MONOTONIC, ...) very
often, i.e. around 2000 times every second (when I receive a packet, and
when I re-send a packet).

Is there a way to reduce the overhead / latency of these calls?
I've heard about vsyscalls. Are they relevant? Do I need a specific
glibc to use vsyscalls?

If I call clock_gettime(CLOCK_MONOTONIC, &spec) twice in a row, then
subtract the two timespecs, I get ~1400 ns on a 2.8 GHz P4.

AFAIU, my clock source is acpi_pm. I tried setting it to tsc but it made
hell break loose.
http://groups.google.com/group/fa.linux.kernel/msg/a095241d49adfc44?dmode=source

Apparently, 1400 ns is similar to what you observe with acpi_pm.
I suppose I'll need to use the TSC if I want any improvement?

+++++

Over the past few days, I've sent over 10 messages to the LKML, and none
of them have been archived. It seems majordomo is dropping my messages.
Do they look like spam?

Regards.
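
PS: For reference, here is a minimal sketch of the back-to-back
clock_gettime(CLOCK_MONOTONIC, ...) measurement described above. It is
not necessarily the exact code behind the ~1400 ns figure. The helper
names timespec_diff_ns and min_gettime_delta_ns are made up for
illustration, and keeping the smallest delta over many iterations is an
assumption chosen here to filter out samples disturbed by interrupts.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Returns (b - a) in nanoseconds. Hypothetical helper, not a libc call. */
int64_t timespec_diff_ns(const struct timespec *a, const struct timespec *b)
{
    assert(a != NULL && b != NULL);
    return (int64_t)(b->tv_sec - a->tv_sec) * 1000000000LL
         + (int64_t)(b->tv_nsec - a->tv_nsec);
}

/*
 * Reads CLOCK_MONOTONIC twice in a row, `iterations` times, and returns
 * the smallest delta seen, in nanoseconds. The minimum approximates the
 * cost of one clock_gettime() call plus the clock source granularity.
 * Returns -1 if clock_gettime() fails or if iterations <= 0.
 */
int64_t min_gettime_delta_ns(int iterations)
{
    int64_t best = -1;

    for (int i = 0; i < iterations; i++) {
        struct timespec t0, t1;

        if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0 ||
            clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
            return -1;

        int64_t d = timespec_diff_ns(&t0, &t1);
        if (best < 0 || d < best)
            best = d;
    }
    return best;
}
```

Calling min_gettime_delta_ns(100000) from main() and printing the result
gives one number to compare across clock sources. With a glibc older
than 2.17, clock_gettime lives in librt, so the program has to be linked
with -lrt.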