From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1757347Ab2HNSzY (ORCPT ); Tue, 14 Aug 2012 14:55:24 -0400 Received: from www.linutronix.de ([62.245.132.108]:60953 "EHLO Galois.linutronix.de" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1756069Ab2HNSzW (ORCPT ); Tue, 14 Aug 2012 14:55:22 -0400 Date: Tue, 14 Aug 2012 20:55:16 +0200 (CEST) From: Thomas Gleixner To: Tejun Heo cc: linux-kernel@vger.kernel.org, torvalds@linux-foundation.org, mingo@redhat.com, akpm@linux-foundation.org, peterz@infradead.org Subject: Re: [PATCHSET] timer: clean up initializers and implement irqsafe timers In-Reply-To: <1344449428-24962-1-git-send-email-tj@kernel.org> Message-ID: References: <1344449428-24962-1-git-send-email-tj@kernel.org> User-Agent: Alpine 2.02 (LFD 1266 2009-07-14) MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII X-Linutronix-Spam-Score: -1.0 X-Linutronix-Spam-Level: - X-Linutronix-Spam-Status: No , -1.0 points, 5.0 required, ALL_TRUSTED=-1,SHORTCIRCUIT=-0.0001 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Wed, 8 Aug 2012, Tejun Heo wrote: > Timer internals are protected by irqsafe lock but the lock is > naturally dropped and irq enabled while a timer is executed. This > makes dequeueing timer for execution and the actual execution > non-atomic against IRQs. No matter what the timer function does, IRQs > can occur between timer dispatch and execution. This means that an > IRQ handler could interrupt any timer in progress and it's impossible > for an IRQ handler to cancel and drain a timer. > > This restriction manifests as ugly convolutions in workqueue > delayed_work interface. A !idle delayed_work is either on timer, > being transferred from timer to worklist, on worklist, or executing. > There are interfaces which need to cancel a pending delayed_work - > cancel_delayed_work() and friends and mod_delayed_work(). They want > to cancel a work item in the first three states but it's impossible to > drain the second state from IRQ handlers which lead to the following > oddities. > > * mod_delayed_work() can't be used from IRQ handlers. This function does not exist. So what? > * __cancel_delayed_work() can't use the usual try_to_grab_pending() > which handles all three states but instead only deals with the first > state using a separate implementation. There's no way to make a > delayed_work not pending from IRQ handlers. And why is that desired and justifies the mess you are trying to create in the timer code? > * The context / behavior differences among cancel_delayed_work(), > __cancel_delayed_work(), cancel_delayed_work_sync() are subtle and > confusing (the first two are mostly historical tho). We have a lot of subtle differences in interfaces for similar reasons. > This patchset implements irqsafe timers. For an irqsafe timer, IRQ is > not enabled from dispatch till the end of its execution making it safe > to drain the timer regardless of context. This will enable cleaning > up delayed_work interface. By burdening crap on the timer code. We had a similar context case handling in the original hrtimers code and Linus went berserk on it. There is no real good reason to reinvent it in a different flavour. Your general approach about workqueues seems to be adding hacks into other sensitive code paths [ see __schedule() ]. Can we please stop that? workqueues are not so special to justify that. The whole delayed work handling is a fragile construct due to the combination of the timer and the work struct itself. As long as the timer is pending the work is detached from the workqueue code and that's what causes the issues at hand. So rather than hacking special cases into the timer code, why can't we manage those delayed works in the workqueue core itself and avoid the whole timer dance? Right now delayed work arms a timer, whose callback enqueues the work and wakes the worker thread, which then executes the work. So what about changing delayed_work into: struct delayed_work { struct work_struct work; unsigned long expires; }; Now when delayed work gets scheduled it gets enqueued into a separate list in the workqueue core with the proper worker lock held. Then check the expiry time of the new work against the current expiry time of a timer in the worker itself. If the new expiry time is after the current expiry time, nothing to do. If the new expiry is before the current expiry time or the timer is not armed, then (re)arm the timer. When the timer expires it wakes the worker and that evaluates the delayed list for expired works and executes them and rearms the timer if necessary. To cancel delayed work you don't have to worry about the timer callback being executed at all, simply because the timer callback is just a wakeup of the worker and not fiddling with the work itself. If the work is removed before the worker thread runs, life goes on as usual. So all you have to do is to remove the work from the delayed list. If the timer is armed, just leave it alone and let it fire. Canceling delayed work is probably not a high frequency operation. In fact that would make cancel_delayed_work and cancel_work basically the same operation. I have no idea how many concurrent delayed works are on the fly, so I can't tell whether a simple ordered list is sufficient or if you need a tree which is better suited for a large number of sorted items. But that's a trivial to solve detail. Thoughts? tglx