Date: Tue, 30 Dec 2003 20:21:25 -0200 (BRST)
From: Marcelo Tosatti
To: Linus Torvalds
Cc: Marcelo Tosatti, Daniel Tram Lux, steve@drifthost.com, James Bourne, Linux Kernel, Gergely Tamas, Bartlomiej Zolnierkiewicz
Subject: Re: no DRQ after issuing WRITE was Re: 2.4.23-uv3 patch set released
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 30 Dec 2003, Linus Torvalds wrote:

> On Tue, 30 Dec 2003, Marcelo Tosatti wrote:
> >
> > Small correction: people are not hitting the WAIT_READY (they are hitting
> > the problem from ide-disk.c, which uses WAIT_DRQ). But still...
>
> Ok. Do you have the full trace? In particular, if there is no locking in
> that path, and interrupts are enabled, you could possibly get not just an
> interrupt, but a preemption event. Now _that_ could blow up the timeout to
> any amount of time, and then even 100ms might not be enough.

The problem also happens on 2.4, which has no kernel preemption in mainline, so I believe preemption is not the culprit.
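The timeout concern above can be illustrated with a small user-space simulation. This is a hypothetical sketch, not the IDE driver's code: it assumes a poll loop that reads the status register, compares the elapsed time against a deadline, and gives up once the deadline has passed. If the poller is stalled for longer than the whole budget (by an interrupt handler, or by preemption where the kernel has it), the deadline expires while the loop is not running, and a naive loop reports a timeout without looking at the device again. All names (`struct sim`, `make_sim`, `read_status`, `tick`, `wait_drq_naive`, `wait_drq_recheck`) are invented for illustration, and the clock is simulated in milliseconds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model: the BSY (0x80) and DRQ (0x08) values follow the ATA
 * status register layout; everything else is invented for illustration. */
#define SIM_BSY 0x80
#define SIM_DRQ 0x08

struct sim {
    unsigned long now_ms;      /* simulated clock, in milliseconds */
    unsigned long ready_at_ms; /* device raises DRQ from this time on */
    unsigned long stall_at_ms; /* poller is stalled once at this time... */
    unsigned long stall_ms;    /* ...for this long (IRQ or preemption) */
    bool stalled;              /* has the one-off stall happened yet? */
};

static struct sim make_sim(unsigned long ready_at_ms,
                           unsigned long stall_at_ms,
                           unsigned long stall_ms)
{
    struct sim s = { 0, ready_at_ms, stall_at_ms, stall_ms, false };
    return s;
}

/* Busy until ready_at_ms, then BSY clear and DRQ set. */
static unsigned char read_status(const struct sim *s)
{
    return s->now_ms >= s->ready_at_ms ? SIM_DRQ : SIM_BSY;
}

/* One 1 ms poll interval, plus the one-off stall when it is due. */
static void tick(struct sim *s)
{
    s->now_ms += 1;
    if (!s->stalled && s->now_ms >= s->stall_at_ms) {
        s->now_ms += s->stall_ms;
        s->stalled = true;
    }
}

/* Naive: once the deadline has passed, give up without re-reading the
 * status, even if the deadline expired while the poller was not running. */
static bool wait_drq_naive(struct sim *s, unsigned long timeout_ms)
{
    unsigned long deadline = s->now_ms + timeout_ms;

    while (!(read_status(s) & SIM_DRQ)) {
        tick(s);
        if (s->now_ms > deadline)
            return false;
    }
    return true;
}

/* Re-check: after the deadline, sample the status one last time, so a
 * long stall only causes a failure if the device really is not ready. */
static bool wait_drq_recheck(struct sim *s, unsigned long timeout_ms)
{
    unsigned long deadline = s->now_ms + timeout_ms;

    while (!(read_status(s) & SIM_DRQ)) {
        tick(s);
        if (s->now_ms > deadline)
            return (read_status(s) & SIM_DRQ) != 0;
    }
    return true;
}
```

With a 100 ms budget, a device that raises DRQ at 20 ms and a single 200 ms stall at 5 ms, `wait_drq_naive` returns false while `wait_drq_recheck` returns true. Raising the budget to 300 ms also makes the naive loop succeed in this scenario, which is consistent with a larger timeout hiding the problem as long as the stall is shorter than the budget.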
Here are some details:

steve@drifthost.com wrote:

"Well i only just started getting them and i started with 2.4.20 and upgraded to 2.4.21 about 6weeks or so ago (or when it came out)"

  hda: status timeout: status=0xd0 { Busy }
  hda: no DRQ after issuing WRITE
  ide0: reset: success
  hda: status timeout: status=0xd0 { Busy }
  hda: no DRQ after issuing WRITE
  ide0: reset: success

daniel@starbattle.com wrote:

  hda: no DRQ after issuing WRITE
  ide0: reset: success
  hda: status timeout: status=0xd0 { Busy }
  hda: no DRQ after issuing WRITE
  ide0: reset: success

(Daniel wrote the patch that was applied to 2.4; it fixed the problem for him.)

There are several other reports of "no DRQ after issuing {MULTI}WRITE", some probably caused by this bug, others possibly not. You can find more reports (from both 2.6 and 2.4) at:

http://marc.theaimsgroup.com/?l=linux-kernel&w=2&r=1&s=no+DRQ+after+issuing+WRITE&q=b

> Is CONFIG_PREEMPT on in the cases, and is there really no locking
> anywhere? Preempting in the middle of the data transfer phase sounds like
> a fundamentally bad idea, and maybe the code needs a few preempt
> disable/enable pairs somewhere?

From my quick read of the code, there is no other locking involved.

It sounds like you are right that the timeout is too small, but we need confirmation from the people who can hit the problem that increasing it actually fixes it.
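For anyone decoding the log lines quoted above: status=0xd0 is 0x80 | 0x40 | 0x10, that is BSY, DRDY and DSC in the ATA status register. BSY was still set, so the other bits were not valid and the drive had not asserted DRQ, which matches the "no DRQ after issuing WRITE" messages followed by a reset. The sketch below decodes a status byte in the same "{ Busy }" style as the logs. The macro and function names (`ST_*`, `STATUS_TEXT_LEN`, `decode_status`, `ready_for_pio_write`) and the labels for the non-busy flags are assumptions for illustration, not the IDE driver's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* ATA status register bits. The bit positions follow the ATA register
 * layout; the macro names are illustrative, not the kernel's. */
#define ST_BSY  0x80 /* busy: the other bits are not valid while set */
#define ST_DRDY 0x40 /* device ready */
#define ST_DF   0x20 /* device (write) fault */
#define ST_DSC  0x10 /* seek complete */
#define ST_DRQ  0x08 /* data request: ready to transfer a data block */
#define ST_CORR 0x04 /* corrected data */
#define ST_IDX  0x02 /* index */
#define ST_ERR  0x01 /* error: details are in the error register */

#define STATUS_TEXT_LEN 128 /* large enough for every flag at once */

/* Render a status byte as "{ Flag Flag }", mimicking the "{ Busy }" form in
 * the logs. With BSY set only "Busy" is printed, since the rest is undefined. */
static void decode_status(unsigned char st, char out[STATUS_TEXT_LEN])
{
    static const struct { unsigned char bit; const char *name; } flags[] = {
        { ST_DRDY, "DriveReady" },     { ST_DF,  "DeviceFault" },
        { ST_DSC,  "SeekComplete" },   { ST_DRQ, "DataRequest" },
        { ST_CORR, "CorrectedError" }, { ST_IDX, "Index" },
        { ST_ERR,  "Error" },
    };

    strcpy(out, "{ ");
    if (st & ST_BSY) {
        strcat(out, "Busy ");
    } else {
        for (size_t i = 0; i < sizeof flags / sizeof flags[0]; i++) {
            if (st & flags[i].bit) {
                strcat(out, flags[i].name);
                strcat(out, " ");
            }
        }
    }
    strcat(out, "}");
}

/* The data-out phase of a PIO write may start only when BSY is clear
 * and DRQ is set. */
static bool ready_for_pio_write(unsigned char st)
{
    return !(st & ST_BSY) && (st & ST_DRQ);
}
```

For comparison, `decode_status(0x58, buf)` yields "{ DriveReady SeekComplete DataRequest }", the state a driver waits for before sending the sector data; 0xd0 never gets there because BSY is still set.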