From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1751401AbXAXTDt (ORCPT ); Wed, 24 Jan 2007 14:03:49 -0500 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S1751410AbXAXTDt (ORCPT ); Wed, 24 Jan 2007 14:03:49 -0500 Received: from e33.co.us.ibm.com ([32.97.110.151]:53672 "EHLO e33.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1751401AbXAXTDs (ORCPT ); Wed, 24 Jan 2007 14:03:48 -0500 Subject: Re: One-shot high-resolution POSIX timer periodically late From: john stultz To: John Cc: linux-kernel@vger.kernel.org, tglx@timesys.com, mingo@elte.hu, akpm@osdl.org, shill@free.fr In-Reply-To: <45B729AF.4030701@privacy.net> References: <45B729AF.4030701@privacy.net> Content-Type: text/plain; charset=utf-8 Date: Wed, 24 Jan 2007 11:02:27 -0800 Message-Id: <1169665347.6905.3.camel@localhost.localdomain> Mime-Version: 1.0 X-Mailer: Evolution 2.8.1 Content-Transfer-Encoding: 8bit Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 2007-01-24 at 10:41 +0100, John wrote: > I'm using the POSIX timers API. My platform is x86 running Linux > 2.6.18.6 patched with the high-resolution timer subsystem. > > http://www.tglx.de/hrtimers.html > > I've written a small "de-jittering engine" that receives packets in > small bursts due to network jitter (typical average rate of 1000 packets > per second), and re-sends them at a "smooth" rate. > > Just before I re-send a packet, I arm a one-shot timer in order to > receive a signal when it is time to send the next packet. > > I've noticed a strange phenomenon that I cannot explain. > > Sometimes (rarely) the one-shot timer will expire more than 50 µs later > than expected. This would seem normal, except that it happens periodically. > > For example, my app had been running normally for 2 minutes when it > started printing diagnostics (see below). > > The first T_NEXT_POP is the date the timer was supposed to expire, > > NOW is the date the timer was handled after returning from sigwaitinfo > (I am aware that blocking signals, and handling them at a specific point > in the code will add some latency) > > The second T_NEXT_POP is the date the next timer is supposed to expire. > > DIFF is the difference between real and expected dates. > > (All dates are CLOCK_MONOTONIC by the way.) > > As you can see, the first diagnostic came at 472.410501014... Then > another diagnostic almost exactly two seconds apart 9 times in a row! > > My process is the only SCHED_FIFO process on the system. There are no > user-space processes with a higher priority. AFAICT, only a kernel > thread could keep the CPU away from my app. > > Is there a periodic kernel thread that runs every 2 seconds, cannot be > preempted, and runs for over 50 µs?? This sounds like a BIOS SMI issue. Can you reproduce this behavior on different hardware? thanks -john