* Re: Quick aic7xxx bug hunt...
2002-09-23 23:28 ` Jeff Garzik
@ 2002-09-23 7:44 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 11+ messages in thread
From: Benjamin Herrenschmidt @ 2002-09-23 7:44 UTC (permalink / raw)
To: Jeff Garzik, Justin T. Gibbs; +Cc: Konstantin Kletschke, linux-kernel
>Jeff Garzik wrote:
>> ahc_reset
>> aic7770_config -> can sleep
>> ahc_pci_config -> can sleep
>> ahc_shutdown -> can't sleep, whoops
>
>
>Though, to answer your question from a previous email, you can call the
>function in_interrupt() to see if you're in interrupt context.
Well... no, you can't rely on it for knowing if you can sleep or
not. in_interrupt() won't tell you interesting things like you
are holding a spinlock...
Ben.
^ permalink raw reply [flat|nested] 11+ messages in thread
* 2.4.20-pre7-ac3 aic7xxx broken?
@ 2002-09-23 18:00 Konstantin Kletschke
2002-09-23 19:15 ` Justin T. Gibbs
0 siblings, 1 reply; 11+ messages in thread
From: Konstantin Kletschke @ 2002-09-23 18:00 UTC (permalink / raw)
To: linux-kernel
Hi!
I wonder if my
Bus 0, device 9, function 0:
SCSI storage controller: Adaptec AHA-7850 (rev 3).
IRQ 11.
Master Capable. Latency=32. Min Gnt=4.Max Lat=4.
I/O at 0xa000 [0xa0ff].
Non-prefetchable 32 bit memory at 0xe7001000 [0xe7001fff].
is broken or the aic7xxx driver in linux-2.4.20-pre7-ac3?
The modul loads fine, the cdrom seems to work properly, though,
but this messages appears in my
message.log:
==> /var/log/syslog <==
Sep 23 19:32:23 sexmachine kernel: scsi1: PCI error Interrupt at seqaddr
= 0x9
Sep 23 19:32:23 sexmachine kernel: scsi1: Data Parity Error Detected
during address or write data phase
several times a minute. The aic7xxx_old produces no errorr messages...
Konsti
--
Kletschke & Uhlig GbR:
http://www.ku-gbr.de
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: 2.4.20-pre7-ac3 aic7xxx broken?
2002-09-23 18:00 2.4.20-pre7-ac3 aic7xxx broken? Konstantin Kletschke
@ 2002-09-23 19:15 ` Justin T. Gibbs
2002-09-23 21:27 ` Quick aic7xxx bug hunt Jeff Garzik
0 siblings, 1 reply; 11+ messages in thread
From: Justin T. Gibbs @ 2002-09-23 19:15 UTC (permalink / raw)
To: Konstantin Kletschke, linux-kernel
> Hi!
>
> I wonder if my
>
> Bus 0, device 9, function 0:
> SCSI storage controller: Adaptec AHA-7850 (rev 3).
...
> is broken or the aic7xxx driver in linux-2.4.20-pre7-ac3?
>
> The modul loads fine, the cdrom seems to work properly, though,
> but this messages appears in my
> message.log:
>
> ==> /var/log/syslog <==
> Sep 23 19:32:23 sexmachine kernel: scsi1: PCI error Interrupt at seqaddr
> = 0x9
> Sep 23 19:32:23 sexmachine kernel: scsi1: Data Parity Error Detected
> during address or write data phase
>
> several times a minute. The aic7xxx_old produces no errorr messages...
On some motherboards with some chipsets, you can get these messages if
another busmaster (say an IDE drive or a sound card) is hogging the bus.
Usually this is with a VIA chipset. Its not clear why the aic7xxx_old
driver would behave differently other than it disables memory write
and invalidate PCI transactions on this chip. The new driver doesn't
need that work around.
As to wether it is safe or not, well our investigation into these issues
has shown that the parity errors are real, but it is the parity that
is at fault. It is hard to say if that is the case for your MB or not.
Can you provide, in private email, a full lspci dump?
Thanks,
Justin
^ permalink raw reply [flat|nested] 11+ messages in thread
* Quick aic7xxx bug hunt...
2002-09-23 19:15 ` Justin T. Gibbs
@ 2002-09-23 21:27 ` Jeff Garzik
2002-09-23 21:54 ` Justin T. Gibbs
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Garzik @ 2002-09-23 21:27 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: Konstantin Kletschke, linux-kernel
Justin T. Gibbs wrote:
> On some motherboards with some chipsets, you can get these messages if
> another busmaster (say an IDE drive or a sound card) is hogging the bus.
> Usually this is with a VIA chipset. Its not clear why the aic7xxx_old
> driver would behave differently other than it disables memory write
> and invalidate PCI transactions on this chip. The new driver doesn't
> need that work around.
Justin,
One thing I notice is at least one PCI posting bug. When using MMIO
(write[bwlq] under Linux), you _must_ use a read[bwlq] to flush the
write to PCI, if you wish to ensure the write posts at a certain point
in the code.
Here is the example PCI posting bug, in ahc_clear_critical_section:
> ahc_outb(ahc, HCNTRL, ahc->unpause);
> do {
> ahc_delay(200);
> } while (!ahc_is_paused(ahc));
As you can see, there is no read before the udelay(), which is very
wrong on modern CPUs with write posting... that's definitely a driver
bug that will bite you on modern x86 motherboards [and is totally broken
on ia64 and other platforms].
Please let me know if you have further questions on PCI write posting...
Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Quick aic7xxx bug hunt...
2002-09-23 21:27 ` Quick aic7xxx bug hunt Jeff Garzik
@ 2002-09-23 21:54 ` Justin T. Gibbs
2002-09-23 22:18 ` Jeff Garzik
0 siblings, 1 reply; 11+ messages in thread
From: Justin T. Gibbs @ 2002-09-23 21:54 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Konstantin Kletschke, linux-kernel
> Justin T. Gibbs wrote:
>> On some motherboards with some chipsets, you can get these messages if
>> another busmaster (say an IDE drive or a sound card) is hogging the bus.
>> Usually this is with a VIA chipset. Its not clear why the aic7xxx_old
>> driver would behave differently other than it disables memory write
>> and invalidate PCI transactions on this chip. The new driver doesn't
>> need that work around.
>
>
> Justin,
>
> One thing I notice is at least one PCI posting bug. When using MMIO
> (write[bwlq] under Linux), you _must_ use a read[bwlq] to flush the write
> to PCI, if you wish to ensure the write posts at a certain point in the
> code.
>
> Here is the example PCI posting bug, in ahc_clear_critical_section:
>> ahc_outb(ahc, HCNTRL, ahc->unpause);
>> do {
>> ahc_delay(200);
>> } while (!ahc_is_paused(ahc));
>
> As you can see, there is no read before the udelay(), which is very wrong
> on modern CPUs with write posting... that's definitely a driver bug that
> will bite you on modern x86 motherboards [and is totally broken on ia64
> and other platforms].
>
> Please let me know if you have further questions on PCI write posting...
I somewhat doubt that any CPU would hold onto a posted write for 200us
since you are not guaranteed that a read will occur quickly and you want
those write buffers to be availble for other clients, but regardless, the
code has not been as you describe since November of last year. The
CHANGELOG reads:
Always perform a bus read prior to waiting in
a delay loop waiting for a bus write to take
effect. This ensures that the first time
through the loop the delay occurs after the
write has taken effect.
Of course, the "bug" was benign since the loop does perform a read
so the write is guaranteed to post during the first itteration through
the loop.
--
Justin
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Quick aic7xxx bug hunt...
2002-09-23 21:54 ` Justin T. Gibbs
@ 2002-09-23 22:18 ` Jeff Garzik
2002-09-23 22:44 ` Justin T. Gibbs
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Garzik @ 2002-09-23 22:18 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: Konstantin Kletschke, linux-kernel
Justin T. Gibbs wrote:
> I somewhat doubt that any CPU would hold onto a posted write for 200us
> since you are not guaranteed that a read will occur quickly and you want
> those write buffers to be availble for other clients, but regardless, the
> code has not been as you describe since November of last year.
Great, I stand corrected. Looks like 2.5 code is ancient then?
comments on the 2.4 code:
* the 1000us delay in ahc_reset needs to be turned into a sleep, instead
all paths to that function [AFAICS] can sleep. likewise for the huge
delay in ahc_acquire_seeprom.
* 400ms worst case udelay() is in ahc_clear_critical_section is kinda
annoying [but I suppose it can be lived with, if the average is a lot
less than that :)]
* the delay in ahc_init should be replaced with a sleep
* PCI posting? (aic7xxx_core.c, line 1322, the last statement in the
function...)
ahc_outb(ahc, CLRINT, CLRSCSIINT);
I'll look it over some more later.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Quick aic7xxx bug hunt...
2002-09-23 22:18 ` Jeff Garzik
@ 2002-09-23 22:44 ` Justin T. Gibbs
2002-09-23 22:50 ` Jeff Garzik
0 siblings, 1 reply; 11+ messages in thread
From: Justin T. Gibbs @ 2002-09-23 22:44 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Konstantin Kletschke, linux-kernel
> Great, I stand corrected. Looks like 2.5 code is ancient then?
Yes. I didn't do the original port and am now just finishing up my
port to 2.5.X.
> comments on the 2.4 code:
> * the 1000us delay in ahc_reset needs to be turned into a sleep, instead
> all paths to that function [AFAICS] can sleep. likewise for the huge
> delay in ahc_acquire_seeprom.
For all of these delays, I'd be more than happy to make them all into
sleeps if I can tell, from inside ahc_delay() if I'm in a context where
it is safe to sleep. On the other platforms that this core code runs on
I'm usually not in a context where it is safe to sleep, so I don't want
to switch to using a different driver primitive.
> * PCI posting? (aic7xxx_core.c, line 1322, the last statement in the
> function...)
>
> ahc_outb(ahc, CLRINT, CLRSCSIINT);
I don't care when the write occurs only that it will occur eventually.
The buffer will get flushed eventually so there is no need to call
ahc_flush_device_writes().
--
Justin
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Quick aic7xxx bug hunt...
2002-09-23 22:44 ` Justin T. Gibbs
@ 2002-09-23 22:50 ` Jeff Garzik
2002-09-23 22:55 ` Justin T. Gibbs
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Garzik @ 2002-09-23 22:50 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: Konstantin Kletschke, linux-kernel
Justin T. Gibbs wrote:
>>Great, I stand corrected. Looks like 2.5 code is ancient then?
>
>
> Yes. I didn't do the original port and am now just finishing up my
> port to 2.5.X.
>
>
>>comments on the 2.4 code:
>>* the 1000us delay in ahc_reset needs to be turned into a sleep, instead
>>all paths to that function [AFAICS] can sleep. likewise for the huge
>>delay in ahc_acquire_seeprom.
>
>
> For all of these delays, I'd be more than happy to make them all into
> sleeps if I can tell, from inside ahc_delay() if I'm in a context where
> it is safe to sleep. On the other platforms that this core code runs on
> I'm usually not in a context where it is safe to sleep, so I don't want
> to switch to using a different driver primitive.
For Linux it's unconditionally safe, and other platforms is sounds like
it's unconditionally not. So, s/ahc_delay/ahc_sleep/ for the places I
pointed out, and just make ahc_delay==ahc_sleep on non-Linux platforms
(or any similarly-functioning solution)
It's pretty much impossible to detect if you are inside certain
spinlocks, in a generic fashion.
>>* PCI posting? (aic7xxx_core.c, line 1322, the last statement in the
>>function...)
>>
>> ahc_outb(ahc, CLRINT, CLRSCSIINT);
>
>
> I don't care when the write occurs only that it will occur eventually.
> The buffer will get flushed eventually so there is no need to call
> ahc_flush_device_writes().
ok, thanks for clarifying.
Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Quick aic7xxx bug hunt...
2002-09-23 22:50 ` Jeff Garzik
@ 2002-09-23 22:55 ` Justin T. Gibbs
2002-09-23 23:22 ` Jeff Garzik
0 siblings, 1 reply; 11+ messages in thread
From: Justin T. Gibbs @ 2002-09-23 22:55 UTC (permalink / raw)
To: Jeff Garzik; +Cc: Konstantin Kletschke, linux-kernel
>> For all of these delays, I'd be more than happy to make them all into
>> sleeps if I can tell, from inside ahc_delay() if I'm in a context where
>> it is safe to sleep. On the other platforms that this core code runs on
>> I'm usually not in a context where it is safe to sleep, so I don't want
>> to switch to using a different driver primitive.
>
> For Linux it's unconditionally safe, and other platforms is sounds like
> it's unconditionally not. So, s/ahc_delay/ahc_sleep/ for the places I
> pointed out, and just make ahc_delay==ahc_sleep on non-Linux platforms
> (or any similarly-functioning solution)
So you can sleep while in an interrupt context? I didn't know that
2.5 had switched to using interrupt threads or some similar construct.
> It's pretty much impossible to detect if you are inside certain
> spinlocks, in a generic fashion.
In 2.5, I should only need to release my own host lock assuming it is
held.
--
Justin
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Quick aic7xxx bug hunt...
2002-09-23 22:55 ` Justin T. Gibbs
@ 2002-09-23 23:22 ` Jeff Garzik
2002-09-23 23:28 ` Jeff Garzik
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Garzik @ 2002-09-23 23:22 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: Konstantin Kletschke, linux-kernel
Justin T. Gibbs wrote:
>>>For all of these delays, I'd be more than happy to make them all into
>>>sleeps if I can tell, from inside ahc_delay() if I'm in a context where
>>>it is safe to sleep. On the other platforms that this core code runs on
>>>I'm usually not in a context where it is safe to sleep, so I don't want
>>>to switch to using a different driver primitive.
>>
>>For Linux it's unconditionally safe, and other platforms is sounds like
>>it's unconditionally not. So, s/ahc_delay/ahc_sleep/ for the places I
>>pointed out, and just make ahc_delay==ahc_sleep on non-Linux platforms
>>(or any similarly-functioning solution)
>
>
> So you can sleep while in an interrupt context? I didn't know that
> 2.5 had switched to using interrupt threads or some similar construct.
Of course not :)
ahc_reset
aic7770_config -> can sleep
ahc_pci_config -> can sleep
ahc_shutdown -> can't sleep, whoops
ahc_resume -> dead code
ahc_init
aic7770_config -> can sleep
ahc_pci_config -> can sleep
ahc_acquire_seeprom
check_extport -> can sleep
ahc_proc_write_seeprom -> can sleep
so, ahc_init and ahc_acquire_seeprom can s/ahc_delay/ahc_sleep/ safely.
Oh, and I found another bug: never use check_region, it's inherently
racy. Use request_region and check its return value.
Note I agree with a comment in the code, that wrapping SHT->detect() in
io_request_lock is silly... the comment describing the rationale in
drivers/scsi/scsi.c is not really accurate...
Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Quick aic7xxx bug hunt...
2002-09-23 23:22 ` Jeff Garzik
@ 2002-09-23 23:28 ` Jeff Garzik
2002-09-23 7:44 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 11+ messages in thread
From: Jeff Garzik @ 2002-09-23 23:28 UTC (permalink / raw)
To: Justin T. Gibbs; +Cc: Konstantin Kletschke, linux-kernel
Jeff Garzik wrote:
> ahc_reset
> aic7770_config -> can sleep
> ahc_pci_config -> can sleep
> ahc_shutdown -> can't sleep, whoops
Though, to answer your question from a previous email, you can call the
function in_interrupt() to see if you're in interrupt context.
Jeff
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2002-09-24 9:31 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2002-09-23 18:00 2.4.20-pre7-ac3 aic7xxx broken? Konstantin Kletschke
2002-09-23 19:15 ` Justin T. Gibbs
2002-09-23 21:27 ` Quick aic7xxx bug hunt Jeff Garzik
2002-09-23 21:54 ` Justin T. Gibbs
2002-09-23 22:18 ` Jeff Garzik
2002-09-23 22:44 ` Justin T. Gibbs
2002-09-23 22:50 ` Jeff Garzik
2002-09-23 22:55 ` Justin T. Gibbs
2002-09-23 23:22 ` Jeff Garzik
2002-09-23 23:28 ` Jeff Garzik
2002-09-23 7:44 ` Benjamin Herrenschmidt
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).