From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S262524AbTHUKHj (ORCPT ); Thu, 21 Aug 2003 06:07:39 -0400 Received: (majordomo@vger.kernel.org) by vger.kernel.org id S262552AbTHUKHj (ORCPT ); Thu, 21 Aug 2003 06:07:39 -0400 Received: from [66.212.224.118] ([66.212.224.118]:50189 "EHLO hemi.commfireservices.com") by vger.kernel.org with ESMTP id S262524AbTHUKHh (ORCPT ); Thu, 21 Aug 2003 06:07:37 -0400 Date: Thu, 21 Aug 2003 06:07:34 -0400 (EDT) From: Zwane Mwaikambo X-X-Sender: zwane@montezuma.mastecende.com To: TeJun Huh Cc: linux-kernel@vger.kernel.org Subject: Re: Possible race condition in i386 global_irq_lock handling. In-Reply-To: <20030821084807.GA29913@atj.dyndns.org> Message-ID: References: <20030821084807.GA29913@atj.dyndns.org> MIME-Version: 1.0 Content-Type: TEXT/PLAIN; charset=US-ASCII Sender: linux-kernel-owner@vger.kernel.org X-Mailing-List: linux-kernel@vger.kernel.org On Thu, 21 Aug 2003, TeJun Huh wrote: > I've been reading i386 interrupt handling code for a couple of days > and encountered something that looks like a race condition. It's > between include/asm-i386/hardirq.h:irq_enter() and > arch/i386/kernel/irq.c:get_irqlock(). They seem to be using lockless > synchronization with local_irq_count of each cpu and global_irq_lock > variable. Ok 2.4 (but for future try and mention which kernel version). You'll have to forgive me if i misunderstand you.. > A. locking CPU > > 1. Do test_and_set_bit() on global_irq_lock, if fail, repeat. > 2. If all local_irq_count's are zero, we're the winner. Check other > stuff; otherwise, clear global_irq_lock and retry. Are you referring to hardirq_trylock()? > B. other CPUs > > 1. Increment local_irq_count > 2. test_bit() on global_irq_lock, if zero, continue handling interrupt; > otherwise, wait till it's cleared. > > For this to work, the locking CPU should fetch the value of > local_irq_count after global_irq_lock value becomes visible to other > CPUs, and other CPUs should fetch the value of global_irq_lock after > making the incremented local_irq_count visible to other CPUs. Why after? it's currently in an interrupt anyway, the local_irq_count is per cpu so it's not used on other cpus why do you need to make it visible on other processors? (save irqs_running() but even that's ok) > The locking CPU is OK because test_and_set_bit() forces ordering on > x86, but there should be a mb() betweewn step 1 and 2 for other CPUs > because none of ++ and test_bit is ordering. The B part is irq_enter() > in hardirq.h which looks like the following. > > static inline void irq_enter(int cpu, int irq) > { > ++local_irq_count(cpu); > > while (test_bit(0,&global_irq_lock)) { > cpu_relax(); > } > } > > Is it a race condition or am I getting it horribly wrong? Thx in > advance. I don't see or understand the race condition you're describing, local_irq_count is per cpu. Zwane