* Re: Linux-2.5.14..
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
@ 2002-05-06 6:30 ` Daniel Pittman
2002-05-06 6:51 ` Linux-2.5.14 Andrew Morton
2002-05-06 15:13 ` Linux-2.5.14 Linus Torvalds
2002-05-06 6:47 ` Linux-2.5.14 bert hubert
` (11 subsequent siblings)
12 siblings, 2 replies; 265+ messages in thread
From: Daniel Pittman @ 2002-05-06 6:30 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List, Andrew Morton
On Sun, 5 May 2002, Linus Torvalds wrote:
> There's a lot of stuff that has happened in the 2.5.x series lately,
> and you can see the gory details in the ChangeLog files that accompany
> releases these days, but I thought I'd point out 2.5.14, since it has
> some interesting fundamental changes to how dirty state is maintained
> in the VM.
>
> (The big changes were actually in 2.5.12, but 2.5.13 contained various
> minor fixes and tweaks, and 2.5.14 contains a number of fixes
> especially wrt truncate, so hopefully it's fairly _stable_ as of
> 2.5.14.)
From the look of the changelog at least a few of the file corruption
bugs with ext3, 2k block file systems and 2.5 have been fixed. Should I
expect this release to address the problems I was seeing?
Daniel
--
I keep my head above the surface, trying to breath, looking for land.
I keep an eye at the distant horizon waiting for help, clutching the sky.
-- Covenant, _Phoenix_
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: Linux-2.5.14..
2002-05-06 6:30 ` Linux-2.5.14 Daniel Pittman
@ 2002-05-06 6:51 ` Andrew Morton
2002-05-06 15:13 ` Linux-2.5.14 Linus Torvalds
1 sibling, 0 replies; 265+ messages in thread
From: Andrew Morton @ 2002-05-06 6:51 UTC (permalink / raw)
To: Daniel Pittman; +Cc: Linus Torvalds, Kernel Mailing List, Andrew Morton
Daniel Pittman wrote:
>
> On Sun, 5 May 2002, Linus Torvalds wrote:
> > There's a lot of stuff that has happened in the 2.5.x series lately,
> > and you can see the gory details in the ChangeLog files that accompany
> > releases these days, but I thought I'd point out 2.5.14, since it has
> > some interesting fundamental changes to how dirty state is maintained
> > in the VM.
> >
> > (The big changes were actually in 2.5.12, but 2.5.13 contained various
> > minor fixes and tweaks, and 2.5.14 contains a number of fixes
> > especially wrt truncate, so hopefully it's fairly _stable_ as of
> > 2.5.14.)
>
> From the look of the changelog at least a few of the file corruption
> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should I
> expect this release to address the problems I was seeing?
>
I don't have an explanation for the ext3 problem which you saw.
It's conceivable that 2.5.13 was leaving dirty buffers around after
they were "deleted", and that fsync grabbed them via the i_dirty_buffers
back door, and wrote them where they shouldn't have been written.
But they wouldn't have been mapped anywhere...
So I still need to try to reproduce that one. If you could have
another shot, it would be appreciated. But if it _does_ work OK,
I can't say it's fixed until I know what caused the 2.5.13 failure.
ext3 is very sensitive to what is going on in buffer.c. There's
a lot of tension in there between the desire to share code and
the desire to not be damaged by changes in the code which we share.
Generally, ext3 with data=journal is not happy at present.
Partly because it contains assertions of things which aren't true
any more.
Partly because of a known problem in ext3: assertion failure at
transaction.c:606. Stephen has a fix for this which we need to
wiggle into 2.4. For some reason, the 2.5 changes are triggering
it much more easily.
I'll spend a few hours this week trying to resurrect data=journal,
but if that doesn't work out I think I'll just turn it off for the
while, make it emit a warning and use data=ordered.
-
* Re: Linux-2.5.14..
2002-05-06 6:30 ` Linux-2.5.14 Daniel Pittman
2002-05-06 6:51 ` Linux-2.5.14 Andrew Morton
@ 2002-05-06 15:13 ` Linus Torvalds
2002-05-07 4:28 ` Linux-2.5.14 Daniel Pittman
2002-05-09 3:53 ` Linux-2.5.14 Daniel Pittman
1 sibling, 2 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-06 15:13 UTC (permalink / raw)
To: Daniel Pittman; +Cc: Kernel Mailing List, Andrew Morton
On Mon, 6 May 2002, Daniel Pittman wrote:
>
> From the look of the changelog at least a few of the file corruption
> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should I
> expect this release to address the problems I was seeing?
"Expect" is too strong a word. I'd say "hope" - a number of truncate bugs
were fixed, but whether that was what bit you, nobody knows.
I suspect the real answer is that we'd love for you to test things out,
but that if it ends up being too painful to recover if the problems happen
again, you probably shouldn't..
Linus
* Re: Linux-2.5.14..
2002-05-06 15:13 ` Linux-2.5.14 Linus Torvalds
@ 2002-05-07 4:28 ` Daniel Pittman
2002-05-09 3:53 ` Linux-2.5.14 Daniel Pittman
1 sibling, 0 replies; 265+ messages in thread
From: Daniel Pittman @ 2002-05-07 4:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List, Andrew Morton
On Mon, 6 May 2002, Linus Torvalds wrote:
> On Mon, 6 May 2002, Daniel Pittman wrote:
>>
>> From the look of the changelog at least a few of the file corruption
>> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should
>> I expect this release to address the problems I was seeing?
>
> "Expect" is too strong a word. I'd say "hope" - a number of truncate
> bugs were fixed, but whether that was what bit you, nobody knows.
Well, hope seems justified...
> I suspect the real answer is that we'd love for you to test things
> out, but that if it ends up being too painful to recover if the
> problems happen again, you probably shouldn't..
I did, and I failed to reproduce it working on a scratch disk. This was
a period of playing and I /hope/ that it's conclusive. I couldn't get
.12 to reliably fail, though, which is less inspiring.
I should be able to find some time in the next day or so to test it a
bit more on the scratch disk and then, if that works, I will update my
backups. :)
Still, it seems good so far.
Daniel
--
Television is the ideal propaganda medium, a mendacious monster, not primarily
out of malice but from its amoral nature.
-- Paul Johnson
* Re: Linux-2.5.14..
2002-05-06 15:13 ` Linux-2.5.14 Linus Torvalds
2002-05-07 4:28 ` Linux-2.5.14 Daniel Pittman
@ 2002-05-09 3:53 ` Daniel Pittman
2002-05-09 4:34 ` Linux-2.5.14 Andrew Morton
1 sibling, 1 reply; 265+ messages in thread
From: Daniel Pittman @ 2002-05-09 3:53 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List, Andrew Morton
On Mon, 6 May 2002, Linus Torvalds wrote:
> On Mon, 6 May 2002, Daniel Pittman wrote:
>>
>> From the look of the changelog at least a few of the file corruption
>> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should
>> I expect this release to address the problems I was seeing?
>
> "Expect" is too strong a word. I'd say "hope" - a number of truncate
> bugs were fixed, but whether that was what bit you, nobody knows.
>
> I suspect the real answer is that we'd love for you to test things
> out, but that if it ends up being too painful to recover if the
> problems happen again, you probably shouldn't..
Right. I got brave enough to test it on a real, live system after
extensive fake testing. It seems to work well, at least so far as
running the same workload that caused massive file corruption correctly.
So, I believe that 2.5.14 is working correctly with 2k ext3 filesystems,
at least for minimal use. I didn't do any sort of extreme load testing
or anything like that, being cautious about it.
On reboot, I got an assertion in ext3, though, and the following BUG
trace. So, something still isn't well, but it seems to be getting it
much more right. :)
Daniel
ksymoops 2.4.5 on i686 2.5.6. Options used
-V (default)
-k /proc/ksyms (default)
-l /proc/modules (default)
-o /lib/modules/2.5.6/ (default)
-m /boot/System.map-2.5.14 (specified)
Error (regular_file): read_ksyms stat /proc/ksyms failed
No modules in ksyms, skipping objects
No ksyms, skipping lsmod
Invalid Operand: 0000
CPU: 0
EIP: 0010:[<c015cf45>] Not Tainted
Using defaults from ksymoops -t elf32-i386 -a i386
EFLAGS: 00010286
EAX: 00000061 EBX: dc883900 ECX: c14ee080 EDX: df954ca0
ESI: dd36d200 EDI: dfd53600 EBP: d0d805c0 ESP: c14f1e6c
DS: 0018 ES: 0018 SS: 0018
Stack: c02c0060 c02c04e1 c02c0040 00000460 c02c0537 d2821380 d0d805c0 00000000
c14f1f04 c01557fd d0d805c0 d2821380 d2821380 00000800 d2821380 00000800
00000800 000000c0 c015555c d0d805c0 d2821380 0005f700 00000000 bfd71c00
Call Trace: [<c01557fd>] [<c015555c0>] [<c0155909>] [<c01557e4>] [<c0126bab>]
[<c01535fa>] [<c0132576>] [<c0106c97>]
Code: 0f 0b 60 04 40 00 2c c0 83 c4 14 6a 03 8b 45 00 50 53 e8 1c
>>EIP; c015cf45 <journal_dirty_metadata+13d/174> <=====
>>EBX; dc883900 <END_OF_CODE+1c4d838c/????>
>>ECX; c14ee080 <END_OF_CODE+1142b0c/????>
>>EDX; df954ca0 <END_OF_CODE+1f5a972c/????>
>>ESI; dd36d200 <END_OF_CODE+1cfc1c8c/????>
>>EDI; dfd53600 <END_OF_CODE+1f9a808c/????>
>>EBP; d0d805c0 <END_OF_CODE+109d504c/????>
>>ESP; c14f1e6c <END_OF_CODE+11468f8/????>
Trace; c01557fd <commit_write_fn+19/5c>
Trace; c015555c0 <END_OF_CODE+b411aa04c/????>
Trace; c0155909 <ext3_commit_write+c9/1e4>
Trace; c01557e4 <commit_write_fn+0/5c>
Trace; c0126bab <generic_file_write+4c3/6e4>
Trace; c01535fa <ext3_file_write+46/4c>
Trace; c0132576 <sys_write+96/f0>
Trace; c0106c97 <syscall_call+7/b>
Code; c015cf45 <journal_dirty_metadata+13d/174>
00000000 <_EIP>:
Code; c015cf45 <journal_dirty_metadata+13d/174> <=====
0: 0f 0b ud2a <=====
Code; c015cf47 <journal_dirty_metadata+13f/174>
2: 60 pusha
Code; c015cf48 <journal_dirty_metadata+140/174>
3: 04 40 add $0x40,%al
Code; c015cf4a <journal_dirty_metadata+142/174>
5: 00 2c c0 add %ch,(%eax,%eax,8)
Code; c015cf4d <journal_dirty_metadata+145/174>
8: 83 c4 14 add $0x14,%esp
Code; c015cf50 <journal_dirty_metadata+148/174>
b: 6a 03 push $0x3
Code; c015cf52 <journal_dirty_metadata+14a/174>
d: 8b 45 00 mov 0x0(%ebp),%eax
Code; c015cf55 <journal_dirty_metadata+14d/174>
10: 50 push %eax
Code; c015cf56 <journal_dirty_metadata+14e/174>
11: 53 push %ebx
Code; c015cf57 <journal_dirty_metadata+14f/174>
12: e8 1c 00 00 00 call 33 <_EIP+0x33> c015cf78 <journal_dirty_metadata+170/174>
1 error issued. Results may not be reliable.
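For readers unfamiliar with the ksymoops notation: an annotation such as `journal_dirty_metadata+13d/174` is read here as the hex offset of the address into the symbol, followed by the symbol's total size. A minimal sketch of that arithmetic, assuming this reading of the notation (the helper name `symbol_start` is hypothetical and not part of ksymoops):

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not part of ksymoops: given an address and its
 * "name+offset/size" annotation, recover where the symbol starts.
 * Returns 0 on success, -1 if the annotation does not parse. */
static int symbol_start(unsigned long addr, const char *annotation,
                        unsigned long *start, unsigned long *size)
{
    char name[64];
    unsigned long offset;

    assert(annotation != NULL);
    /* The name runs up to '+', then a hex offset, '/', and a hex size. */
    if (sscanf(annotation, "%63[^+]+%lx/%lx", name, &offset, size) != 3)
        return -1;
    if (offset > addr)
        return -1;
    *start = addr - offset;
    return 0;
}
```

For the EIP in the trace above, 0xc015cf45 with `+13d/174` puts the start of the function at 0xc015ce08, which agrees with the `+170/174` annotation on the call target 0xc015cf78.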
--
The artistic temperament is a disease which afflicts amateurs.
-- G. K. Chesterton, _Heretics_, 1905
* Re: Linux-2.5.14..
2002-05-09 3:53 ` Linux-2.5.14 Daniel Pittman
@ 2002-05-09 4:34 ` Andrew Morton
2002-05-09 6:02 ` Linux-2.5.14 Daniel Pittman
0 siblings, 1 reply; 265+ messages in thread
From: Andrew Morton @ 2002-05-09 4:34 UTC (permalink / raw)
To: Daniel Pittman; +Cc: Linus Torvalds, Kernel Mailing List
Daniel Pittman wrote:
>
> On Mon, 6 May 2002, Linus Torvalds wrote:
> > On Mon, 6 May 2002, Daniel Pittman wrote:
> >>
> >> From the look of the changelog at least a few of the file corruption
> >> bugs with ext3, 2k block file systems and 2.5 have been fixed. Should
> >> I expect this release to address the problems I was seeing?
> >
> > "Expect" is too strong a word. I'd say "hope" - a number of truncate
> > bugs were fixed, but whether that was what bit you, nobody knows.
> >
> > I suspect the real answer is that we'd love for you to test things
> > out, but that if it ends up being too painful to recover if the
> > problems happen again, you probably shouldn't..
>
> Right. I got brave enough to test it on a real, live system after
> extensive fake testing. It seems to work well, at least so far as
> running the same workload that caused massive file corruption correctly.
hmm.
> So, I believe that 2.5.14 is working correctly with 2k ext3 filesystems,
> at least for minimal use. I didn't do any sort of extreme load testing
> or anything like that, being cautious about it.
I've been testing 2.5.14 pretty hard for a couple of days.
ext2, ext3-ordered, ext3-writeback (all with small blocks) are solid.
reiserfs and vfat are solid.
JFS deadlocks (see the BUGBUG comment in jfs_txnmgr.c - it happens). I've
asked the JFS guys for help on this; possibly the code can just be removed:
the buffer-based writeout which I replaced wouldn't have written the
pages anyway...
ext3-journalled is not happy.
There's a locking bug between try_to_free_buffers and buffer_insert_inode_queue
which never seems to trigger. I've got that fixed.
There's a known race between unmount and writeback which is probably
impossible to trigger. (see the FIXME at __sync_list). Testing the
fix for that at present.
The "sync" functions aren't right. Pages which are both dirty and
under writeback are not correctly waited upon. This is a minor
correctness thing which nobody would notice. Still thinking
about the best way to close this.
So unless you're a JFS or ext3-journalled user, 2.5.14 is OK.
> On reboot, I got an assertion in ext3, though, and the following BUG
> trace. So, something still isn't well, but it seems to be getting it
> much more right. :)
>
> ...
>
> >>EIP; c015cf45 <journal_dirty_metadata+13d/174> <=====
>
> ...
> Code; c015cf45 <journal_dirty_metadata+13d/174> <=====
> 0: 0f 0b ud2a <=====
> Code; c015cf47 <journal_dirty_metadata+13f/174>
> 2: 60 pusha
> Code; c015cf48 <journal_dirty_metadata+140/174>
> 3: 04 40 add $0x40,%al
04 60 -> line 1120. Yup, I get that one too. I assume you were
testing with data=journal.
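The reading of the code bytes can be reproduced by hand. A sketch, assuming the i386 BUG() layout of that era, in which the `ud2` opcode (0f 0b) is followed by a 16-bit little-endian line number and a 32-bit little-endian pointer to the file name; the struct and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the data embedded after ud2. */
struct bug_info {
    uint16_t line;      /* source line of the BUG() */
    uint32_t file_ptr;  /* kernel address of the file name string */
};

/* Decode "0f 0b <line: 16-bit LE> <file: 32-bit LE>".
 * Returns 0 on success, -1 if the buffer is too short or does not
 * start with the ud2 opcode. */
static int decode_bug(const uint8_t *code, size_t len, struct bug_info *out)
{
    assert(code != NULL && out != NULL);
    if (len < 8 || code[0] != 0x0f || code[1] != 0x0b)
        return -1;
    out->line = (uint16_t)(code[2] | (code[3] << 8));
    out->file_ptr = (uint32_t)code[4] | ((uint32_t)code[5] << 8) |
                    ((uint32_t)code[6] << 16) | ((uint32_t)code[7] << 24);
    return 0;
}
```

Fed the first eight bytes of the Code: line (0f 0b 60 04 40 00 2c c0), this gives line 0x0460 = 1120 and file pointer 0xc02c0040, both of which also show up in the stack dump. Under that assumption, the `pusha` and `add` lines that follow `ud2a` in the disassembly are data rather than instructions.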
Thanks again...
-
* Re: Linux-2.5.14..
2002-05-09 4:34 ` Linux-2.5.14 Andrew Morton
@ 2002-05-09 6:02 ` Daniel Pittman
0 siblings, 0 replies; 265+ messages in thread
From: Daniel Pittman @ 2002-05-09 6:02 UTC (permalink / raw)
To: Andrew Morton; +Cc: Linus Torvalds, Kernel Mailing List
On Wed, 08 May 2002, Andrew Morton wrote:
> Daniel Pittman wrote:
>> On Mon, 6 May 2002, Linus Torvalds wrote:
>> > On Mon, 6 May 2002, Daniel Pittman wrote:
>> >>
>> >> From the look of the changelog at least a few of the file
>> >> corruption bugs with ext3, 2k block file systems and 2.5 have been
>> >> fixed. Should I expect this release to address the problems I was
>> >> seeing?
>> >
>> > "Expect" is too strong a word. I'd say "hope" - a number of
>> > truncate bugs were fixed, but whether that was what bit you, nobody
>> > knows.
>> >
>> > I suspect the real answer is that we'd love for you to test things
>> > out, but that if it ends up being too painful to recover if the
>> > problems happen again, you probably shouldn't..
>>
>> Right. I got brave enough to test it on a real, live system after
>> extensive fake testing. It seems to work well, at least so far as
>> running the same workload that caused massive file corruption
>> correctly.
>
> hmm.
Not conclusive, I know, but getting a panic after a brief test stopped
it at that point. :)
>> So, I believe that 2.5.14 is working correctly with 2k ext3
>> filesystems, at least for minimal use. I didn't do any sort of
>> extreme load testing or anything like that, being cautious about it.
>
> I've been testing 2.5.14 pretty hard for a couple of days.
[...]
> ext3-journalled is not happy.
Which probably explains my error, then, as I have not stepped back from
data journaled mode on my system.
[...]
> There's a known race between unmount and writeback which is probably
> impossible to trigger. (see the FIXME at __sync_list). Testing the
> fix for that at present.
Well, unmount wouldn't have happened for quite a long time in the
shutdown process, given it was at the initial 'send SIGTERM' stage...
[...]
> So unless you're a JFS or ext3-journalled user, 2.5.14 is OK.
The latter. Ah, well, at least you know.
[...]
> 04 60 -> line 1120. Yup, I get that one too. I assume you were
> testing with data=journal.
Confirmed.
Daniel
--
No, no, you're not thinking, you're just being logical.
-- Niels Bohr
* Re: Linux-2.5.14..
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
2002-05-06 6:30 ` Linux-2.5.14 Daniel Pittman
@ 2002-05-06 6:47 ` bert hubert
2002-05-06 7:07 ` Linux-2.5.14 Andrew Morton
2002-05-06 9:09 ` [PATCH] 2.5.14 IDE 55 Martin Dalecki
` (10 subsequent siblings)
12 siblings, 1 reply; 265+ messages in thread
From: bert hubert @ 2002-05-06 6:47 UTC (permalink / raw)
To: Linus Torvalds, akpm; +Cc: Kernel Mailing List
On Mon, May 06, 2002 at 03:54:46AM +0000, Linus Torvalds wrote:
> releases these days, but I thought I'd point out 2.5.14, since it has some
> interesting fundamental changes to how dirty state is maintained in the
> VM.
I parsed this 'dirty state' sentence all wrong at first :-) Andrew, Linus -
where does the current VM lie in between rmap-vm and aa-vm?
Regards,
bert hubert
--
http://www.PowerDNS.com Versatile DNS Software & Services
http://www.tk the dot in .tk
http://lartc.org Linux Advanced Routing & Traffic Control HOWTO
* Re: Linux-2.5.14..
2002-05-06 6:47 ` Linux-2.5.14 bert hubert
@ 2002-05-06 7:07 ` Andrew Morton
2002-05-06 14:00 ` Linux-2.5.14 Rik van Riel
0 siblings, 1 reply; 265+ messages in thread
From: Andrew Morton @ 2002-05-06 7:07 UTC (permalink / raw)
To: bert hubert; +Cc: Kernel Mailing List
bert hubert wrote:
>
> On Mon, May 06, 2002 at 03:54:46AM +0000, Linus Torvalds wrote:
>
> > releases these days, but I thought I'd point out 2.5.14, since it has some
> > interesting fundamental changes to how dirty state is maintained in the
> > VM.
>
> I parsed this 'dirty state' sentence all wrong at first :-) Andrew, Linus -
> where does the current VM lie in between rmap-vm and aa-vm?
>
"VM" is a broad term. The memory allocator, page replacement, swap and
all that stuff is unaltered - it is the same as 2.4.current. ie: Andrea's
VM from when his changes stopped going into the mainline kernel.
I made minimal changes in there to teach the page allocator that
all dirty memory is written back via pages and not sometimes-pages,
sometimes-buffers. Also to add support for the new `clustering
writeback' which address_spaces can perform.
It's probably not as well tuned as it could be at present, but
I don't see a lot of point in fiddling with it. As long as the
VM doesn't actually impede 2.5 development and evaluation of
2.5 performance, best to leave it alone until a VM developer
steps up to do the 2.6 VM.
The change to which Linus refers is:
In 2.4, dirty data from the write(2) system call is encapsulated
in buffer_heads and is placed on a global buffer list for writeout.
And dirty data from shared mappings is attached to its inode.
In 2.5, the buffer list went away, and dirty data from write(2)
is now managed in the same way as dirty data from mmap().
And because the kupdate and bdflush functions used to work
against the buffer LRU, replacements were introduced which do
the same thing against the inodes, instead of against the buffers.
So it's all page-oriented now.
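To make the shape of that change concrete, here is a toy userspace model, not kernel code: every type and function name below is hypothetical. It only illustrates the idea that dirty state hangs off the inode's pages and that writeback walks inodes rather than a global buffer list.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_PAGES_PER_INODE 8

/* Toy stand-in for an inode and its page cache; not the kernel's types. */
struct toy_inode {
    unsigned char dirty[TOY_PAGES_PER_INODE]; /* 1 = page needs writeout */
    struct toy_inode *next;                   /* next inode to consider */
};

/* In this model both write(2) and mmap() stores end up here:
 * the page is marked dirty on its inode. */
static void toy_mark_dirty(struct toy_inode *inode, size_t page)
{
    assert(inode != NULL && page < TOY_PAGES_PER_INODE);
    inode->dirty[page] = 1;
}

/* Writeback walks the inodes, then each inode's dirty pages.
 * Returns the number of pages "written". */
static size_t toy_writeback(struct toy_inode *head)
{
    size_t written = 0;

    for (struct toy_inode *i = head; i != NULL; i = i->next) {
        for (size_t p = 0; p < TOY_PAGES_PER_INODE; p++) {
            if (i->dirty[p]) {
                i->dirty[p] = 0; /* real code would start I/O here */
                written++;
            }
        }
    }
    return written;
}
```

Dirtying the same page twice still costs one writeout in this model, and a second writeback pass finds nothing left to do.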
-
* Re: Linux-2.5.14..
2002-05-06 7:07 ` Linux-2.5.14 Andrew Morton
@ 2002-05-06 14:00 ` Rik van Riel
0 siblings, 0 replies; 265+ messages in thread
From: Rik van Riel @ 2002-05-06 14:00 UTC (permalink / raw)
To: Andrew Morton; +Cc: bert hubert, Kernel Mailing List
On Mon, 6 May 2002, Andrew Morton wrote:
> bert hubert wrote:
> > I parsed this 'dirty state' sentence all wrong at first :-) Andrew, Linus -
> > where does the current VM lie in between rmap-vm and aa-vm?
> I made minimal changes in there to teach the page allocator that
> all dirty memory is written back via pages and not sometimes-pages,
> sometimes-buffers. Also to add support for the new `clustering
> writeback' which address_spaces can perform.
> So it's all page-oriented now.
Nice, this will make it possible to have much cleaner page
replacement code.
regards,
Rik
--
Bravely reimplemented by the knights who say "NIH".
http://www.surriel.com/ http://distro.conectiva.com/
* [PATCH] 2.5.14 IDE 55
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
2002-05-06 6:30 ` Linux-2.5.14 Daniel Pittman
2002-05-06 6:47 ` Linux-2.5.14 bert hubert
@ 2002-05-06 9:09 ` Martin Dalecki
2002-05-06 17:48 ` David Lang
` (2 more replies)
2002-05-07 11:22 ` [PATCH] 2.5.14 IDE 56 Martin Dalecki
` (9 subsequent siblings)
12 siblings, 3 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-06 9:09 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1141 bytes --]
Sun May 5 16:34:59 CEST 2002 ide-clean-55
Resync with 2.5.14.
- Update HPT374 driver carried over from 2.4.xx series by Andrew Morton.
Resync it with the recent host chip driver changes, or rather with the
introduction of an API in the first place.
- Consolidate the handling of device ID byte order in one place.
This was spotted and patched by Bartłomiej Żołnierkiewicz.
- Eliminate CONFIG_BLK_DEV_IDEPCI - it's duplicating the functionality of the
already present and fine CONFIG_PCI flag and if we are a PCI host, we are
indeed very likely to need host chip support anyway.
- Remove some redundant info about the model and channel number from
/proc/ide. Remove the binary entries not helpful to the user, and not used
by any program and redundant to corresponding ioctls.
- Properly return udma_read and udma_write values in taskfile.
- Only initialize XXX_udma to the default handlers if it has not been
initialized by the host chip initialization.
I have enabled spin lock debugging and can see that on device
flush the spin locks get wrong counts... no problems elsewhere thus
far. I will recheck them next time around.
[-- Attachment #2: ide-clean-55.diff.gz --]
[-- Type: application/x-gzip, Size: 20887 bytes --]
* Re: [PATCH] 2.5.14 IDE 55
2002-05-06 9:09 ` [PATCH] 2.5.14 IDE 55 Martin Dalecki
@ 2002-05-06 17:48 ` David Lang
2002-05-06 22:40 ` Roman Zippel
2002-05-07 0:03 ` Roman Zippel
2 siblings, 0 replies; 265+ messages in thread
From: David Lang @ 2002-05-06 17:48 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
[-- Attachment #1: Type: TEXT/PLAIN, Size: 705 bytes --]
On Mon, 6 May 2002, Martin Dalecki wrote:
> - Remove some redundant info about the model and channel number from
> /proc/ide. Remove the binary entries not helpful to the user, and not used
> by any program and redundant to corresponding ioctls.
Martin, how do you know it isn't used by any program?
changing /proc stuff 'because it's available elsewhere' is not a good idea,
people writing shell scripts do not necessarily have access to ioctls.
you must have missed the proc wars of past kernels, but in general it's
not a good idea to tinker with anything in /proc unless it is a
significant improvement and 'redundant to corresponding ioctls' is not a
significant improvement.
David Lang
[-- Attachment #2: Type: APPLICATION/X-GZIP, Size: 20887 bytes --]
* Re: [PATCH] 2.5.14 IDE 55
2002-05-06 9:09 ` [PATCH] 2.5.14 IDE 55 Martin Dalecki
2002-05-06 17:48 ` David Lang
@ 2002-05-06 22:40 ` Roman Zippel
2002-05-07 10:10 ` Martin Dalecki
2002-05-07 0:03 ` Roman Zippel
2 siblings, 1 reply; 265+ messages in thread
From: Roman Zippel @ 2002-05-06 22:40 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
Hi,
On Mon, 6 May 2002, Martin Dalecki wrote:
> - Eliminate CONFIG_BLK_DEV_IDEPCI - it's duplicating the functionality of the
> already present and fine CONFIG_PCI flag and if we are a PCI host, we are
> indeed very likely to need host chip support anyway.
Please don't do this! There are configurations possible with pci enabled
but without a pci ide adapter.
Could you please try to compile without CONFIG_BLK_DEV_IDEDMA_PCI enabled?
bye, Roman
* Re: [PATCH] 2.5.14 IDE 55
2002-05-06 22:40 ` Roman Zippel
@ 2002-05-07 10:10 ` Martin Dalecki
2002-05-07 11:31 ` Roman Zippel
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 10:10 UTC (permalink / raw)
To: Roman Zippel; +Cc: Linus Torvalds, Kernel Mailing List
Roman Zippel wrote:
> Hi,
>
> On Mon, 6 May 2002, Martin Dalecki wrote:
>
>
>>- Eliminate CONFIG_BLK_DEV_IDEPCI - it's duplicating the functionality of the
>> already present and fine CONFIG_PCI flag and if we are a PCI host, we are
>> indeed very likely to need host chip support anyway.
>
>
> Please don't do this! There are configurations possible with pci enabled
> but without a pci ide adapter.
So please just don't configure any PCI host chip support there.
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 10:10 ` Martin Dalecki
@ 2002-05-07 11:31 ` Roman Zippel
2002-05-07 10:31 ` Martin Dalecki
0 siblings, 1 reply; 265+ messages in thread
From: Roman Zippel @ 2002-05-07 11:31 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Kernel Mailing List
Hi,
On Tue, 7 May 2002, Martin Dalecki wrote:
> > Please don't do this! There are configurations possible with pci enabled
> > but without a pci ide adapter.
>
> So please just don't configure any PCI host chip support there.
Then ide-pci.c is still compiled into the kernel. Why?
bye, Roman
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 11:31 ` Roman Zippel
@ 2002-05-07 10:31 ` Martin Dalecki
2002-05-07 10:34 ` Martin Dalecki
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 10:31 UTC (permalink / raw)
To: Roman Zippel; +Cc: Kernel Mailing List
Roman Zippel wrote:
> Hi,
>
> On Tue, 7 May 2002, Martin Dalecki wrote:
>
>
>>>Please don't do this! There are configurations possible with pci enabled
>>>but without a pci ide adapter.
>>
>>So please just don't configure any PCI host chip support there.
>
>
> Then ide-pci.c is still compiled into the kernel. Why?
Because the big tables there are subject to go.
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 10:31 ` Martin Dalecki
@ 2002-05-07 10:34 ` Martin Dalecki
2002-05-07 11:48 ` Roman Zippel
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 10:34 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Roman Zippel, Kernel Mailing List
Martin Dalecki wrote:
> Roman Zippel wrote:
>
>> Hi,
>>
>> On Tue, 7 May 2002, Martin Dalecki wrote:
>>
>>
>>>> Please don't do this! There are configurations possible with pci
>>>> enabled but without a pci ide adapter.
>>>
>>>
>>> So please just don't configure any PCI host chip support there.
>>
>>
>>
>> Then ide-pci.c is still compiled into the kernel. Why?
>
>
>
> Because the big tables there are subject to go.
And at some point in time it will check whether there is
a request for any host chip support.
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 10:34 ` Martin Dalecki
@ 2002-05-07 11:48 ` Roman Zippel
2002-05-07 11:19 ` Martin Dalecki
0 siblings, 1 reply; 265+ messages in thread
From: Roman Zippel @ 2002-05-07 11:48 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Kernel Mailing List
Hi,
On Tue, 7 May 2002, Martin Dalecki wrote:
> >> Then ide-pci.c is still compiled into the kernel. Why?
> >
> > Because the big tables there are subject to go.
>
> And at some point in time it will check whether there is
> a request for any host chip support.
Could you please then do the above change _after_ you have done this?
bye, Roman
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 11:48 ` Roman Zippel
@ 2002-05-07 11:19 ` Martin Dalecki
2002-05-07 12:35 ` Roman Zippel
` (2 more replies)
0 siblings, 3 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 11:19 UTC (permalink / raw)
To: Roman Zippel; +Cc: Kernel Mailing List
Roman Zippel wrote:
> Hi,
>
> On Tue, 7 May 2002, Martin Dalecki wrote:
>
>
>>>>Then ide-pci.c is still compiled into the kernel. Why?
>>>
>>>Because the big tables there are subject to go.
>>
>>And at some point in time it will check whether there is
>>a request for any host chip support.
>
>
> Could you please then do the above change _after_ you have done this?
Well, one question remains: Please name me one PCI based architecture
which contains IDE support and does not contain any special host chip
attached to the very same PCI bus as well.
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 11:19 ` Martin Dalecki
@ 2002-05-07 12:35 ` Roman Zippel
2002-05-07 12:36 ` Andrey Panin
2002-05-07 12:38 ` Dave Jones
2 siblings, 0 replies; 265+ messages in thread
From: Roman Zippel @ 2002-05-07 12:35 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Kernel Mailing List
Hi,
On Tue, 7 May 2002, Martin Dalecki wrote:
> Well, one question remains: Please name me one PCI based architecture
> which contains IDE support and does not contain any special host chip
> attached to the very same PCI bus as well.
Architectures which are not directly PCI based, but which can have
PCI bridges. On the other hand even on PCI based archs you don't
necessarily want to compile in ide support automatically, please leave
that to Eric's autoconf.
bye, Roman
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 11:19 ` Martin Dalecki
2002-05-07 12:35 ` Roman Zippel
@ 2002-05-07 12:36 ` Andrey Panin
2002-05-07 11:32 ` Martin Dalecki
2002-05-07 12:38 ` Dave Jones
2 siblings, 1 reply; 265+ messages in thread
From: Andrey Panin @ 2002-05-07 12:36 UTC (permalink / raw)
To: Martin Dalecki; +Cc: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 931 bytes --]
On Tue, May 07, 2002 at 01:19:02 +0200, Martin Dalecki wrote:
> Roman Zippel wrote:
> >Hi,
> >
> >On Tue, 7 May 2002, Martin Dalecki wrote:
> >
> >
> >>>>Then ide-pci.c is still compiled into the kernel. Why?
> >>>
> >>>Because the big tables there are subject to go.
> >>
> >>And at some point in time it will check whether there is
> >>a request for any host chip support.
> >
> >
> >Could you please then do the above change _after_ you have done this?
>
> Well, one question remains: Please name me one PCI based architecture
> which contains IDE support and does not contain any special host chip
> attached to the very same PCI bus as well.
SiS 496/497 PCI chipset for i486's. It has integrated IDE controller,
but this controller is not connected to PCI bus.
--
Andrey Panin | Embedded systems software engineer
pazke@orbita1.ru | PGP key: wwwkeys.eu.pgp.net
[-- Attachment #2: Type: application/pgp-signature, Size: 232 bytes --]
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 12:36 ` Andrey Panin
@ 2002-05-07 11:32 ` Martin Dalecki
0 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 11:32 UTC (permalink / raw)
To: Andrey Panin; +Cc: linux-kernel
Andrey Panin wrote:
> On Tue, May 07, 2002 at 01:19:02 +0200, Martin Dalecki wrote:
>>Well, one question remains: Please name me one PCI based architecture
>>which contains IDE support and does not contain any special host chip
>>attached to the very same PCI bus as well.
>
>
> SiS 496/497 PCI chipset for i486's. It has integrated IDE controller,
> but this controller is not connected to PCI bus.
OK - That actually is an argument I follow. Thank you, I will adjust
the code. (Not quite right yet maybe, but I will do it.)
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 11:19 ` Martin Dalecki
2002-05-07 12:35 ` Roman Zippel
2002-05-07 12:36 ` Andrey Panin
@ 2002-05-07 12:38 ` Dave Jones
2 siblings, 0 replies; 265+ messages in thread
From: Dave Jones @ 2002-05-07 12:38 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Roman Zippel, Kernel Mailing List
On Tue, May 07, 2002 at 01:19:02PM +0200, Martin Dalecki wrote:
> Well, one question remains: Please name me one PCI based architecture
> which contains IDE support and does not contain any special host chip
> attached to the very same PCI bus as well.
If by 'attached' you mean integrated into chipset, I have several such machines
that have no IDE without an extra add-on card.
My quad ppro has two PCI buses, but no IDE controller.
My PCI 486 has no onboard IDE.
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
* Re: [PATCH] 2.5.14 IDE 55
2002-05-06 9:09 ` [PATCH] 2.5.14 IDE 55 Martin Dalecki
2002-05-06 17:48 ` David Lang
2002-05-06 22:40 ` Roman Zippel
@ 2002-05-07 0:03 ` Roman Zippel
2002-05-07 10:12 ` Martin Dalecki
2 siblings, 1 reply; 265+ messages in thread
From: Roman Zippel @ 2002-05-07 0:03 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
Hi,
On Mon, 6 May 2002, Martin Dalecki wrote:
> - Consolidate the handling of device ID byte order in one place.
> This was spotted and patched by Bartłomiej Żołnierkiewicz.
Another thing: where is the equivalent part of this removed code?
-static __inline__ void ide_fix_driveid(struct hd_driveid *id)
-{
-#if defined(CONFIG_AMIGA) || defined (CONFIG_MAC) || defined(M68K_IDE_SWAPW)
- u_char *p = (u_char *)id;
- int i, j, cnt;
- u_char t;
-
- if (!MACH_IS_AMIGA && !MACH_IS_MAC && !MACH_IS_Q40 && !MACH_IS_ATARI)
- return;
-#ifdef M68K_IDE_SWAPW
- if (M68K_IDE_SWAPW) /* fix bus byteorder first */
- for (i=0; i < 512; i+=2) {
- t = p[i]; p[i] = p[i+1]; p[i+1] = t;
- }
-#endif
bye, Roman
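
For readers following the quoted hunk: the removed loop fixes the bus byte order by swapping the two bytes of every 16-bit word in the 512-byte identify block. The following is a minimal standalone sketch of that step only. The function name swap_word_bytes() is made up for illustration and is not part of the kernel source.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Illustrative helper (hypothetical name): swap the two bytes of every
 * 16-bit word in buf. len is the buffer size in bytes and must be even.
 */
static void swap_word_bytes(uint8_t *buf, size_t len)
{
	size_t i;

	assert(len % 2 == 0);
	for (i = 0; i < len; i += 2) {
		uint8_t t = buf[i];

		buf[i] = buf[i + 1];
		buf[i + 1] = t;
	}
}
```

For the identify data the call would be made with len set to 512, matching the bound in the quoted loop.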
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 0:03 ` Roman Zippel
@ 2002-05-07 10:12 ` Martin Dalecki
2002-05-07 11:39 ` Roman Zippel
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 10:12 UTC (permalink / raw)
To: Roman Zippel; +Cc: Linus Torvalds, Kernel Mailing List
Roman Zippel wrote:
> Hi,
>
> On Mon, 6 May 2002, Martin Dalecki wrote:
>
>
>>- Consolidate the handling of device ID byte order in one place.
>> This was spotted and patched by Bartłomiej Żołnierkiewicz.
>
>
> Another thing: where is the equivalent part of this removed code?
Look closer: it's there in ide-probe.c.
>
> -static __inline__ void ide_fix_driveid(struct hd_driveid *id)
> -{
> -#if defined(CONFIG_AMIGA) || defined (CONFIG_MAC) || defined(M68K_IDE_SWAPW)
> - u_char *p = (u_char *)id;
> - int i, j, cnt;
> - u_char t;
> -
> - if (!MACH_IS_AMIGA && !MACH_IS_MAC && !MACH_IS_Q40 && !MACH_IS_ATARI)
> - return;
> -#ifdef M68K_IDE_SWAPW
> - if (M68K_IDE_SWAPW) /* fix bus byteorder first */
> - for (i=0; i < 512; i+=2) {
> - t = p[i]; p[i] = p[i+1]; p[i+1] = t;
> - }
> -#endif
>
> bye, Roman
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 10:12 ` Martin Dalecki
@ 2002-05-07 11:39 ` Roman Zippel
2002-05-07 10:40 ` Martin Dalecki
0 siblings, 1 reply; 265+ messages in thread
From: Roman Zippel @ 2002-05-07 11:39 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Kernel Mailing List
Hi,
On Tue, 7 May 2002, Martin Dalecki wrote:
> > Another thing: where is the equivalent part of this removed code?
>
> Look closer: it's there in ide-probe.c.
Does it still take the correct byte swapping into account?
Did you consider using a table for the fixup? It's not performance-critical,
and this might generate more compact code.
bye, Roman
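
The table idea raised above could look roughly like the sketch below. It is only an illustration under stated assumptions: struct demo_id is a made-up, cut-down stand-in for the much larger struct hd_driveid, and the names demo_fixups and demo_fix_id() are hypothetical. The point is that one small loop walks a table of (offset, width) pairs instead of one conversion statement per field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down identify block, for illustration only. */
struct demo_id {
	uint16_t config;
	uint16_t cyls;
	uint32_t lba_capacity;
	uint8_t  model[8];	/* byte string, needs no fixup */
};

/* One table entry per multi-byte field: where it is and how wide it is. */
struct fixup {
	size_t offset;
	size_t width;		/* 2 or 4 bytes */
};

static const struct fixup demo_fixups[] = {
	{ offsetof(struct demo_id, config), 2 },
	{ offsetof(struct demo_id, cyls), 2 },
	{ offsetof(struct demo_id, lba_capacity), 4 },
};

/* Convert every listed little-endian field to host byte order, in place. */
static void demo_fix_id(struct demo_id *id)
{
	uint8_t *base = (uint8_t *)id;
	size_t i;

	for (i = 0; i < sizeof(demo_fixups) / sizeof(demo_fixups[0]); i++) {
		uint8_t *p = base + demo_fixups[i].offset;

		assert(demo_fixups[i].width == 2 || demo_fixups[i].width == 4);
		if (demo_fixups[i].width == 2) {
			uint16_t v = (uint16_t)(p[0] | (p[1] << 8));

			memcpy(p, &v, sizeof(v));
		} else {
			uint32_t v = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
				     ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);

			memcpy(p, &v, sizeof(v));
		}
	}
}
```

On a little-endian host the loop leaves the data unchanged, and on a big-endian host it swaps each field. Whether this ends up more compact than open-coded conversions depends on the number of fields and on the compiler.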
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 11:39 ` Roman Zippel
@ 2002-05-07 10:40 ` Martin Dalecki
2002-05-07 12:42 ` Roman Zippel
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 10:40 UTC (permalink / raw)
To: Roman Zippel; +Cc: Kernel Mailing List
Roman Zippel wrote:
> Hi,
>
> On Tue, 7 May 2002, Martin Dalecki wrote:
>
>
>>>Another thing: where is the equivalent part of this removed code?
>>
>>Look closer: it's there in ide-probe.c.
>
>
> Does it still take the correct byte swapping into account?
> Did you consider using a table for the fixup? It's not performance-critical,
> and this might generate more compact code.
Well, right now we have two different reimplementations of the
get device id code, so this area is subject to change anyway.
BTW, it should indeed take both into account as far as I can
see. (Despite the fact that I could afford an ATARI, I can hardly
find one...)
* Re: [PATCH] 2.5.14 IDE 55
2002-05-07 10:40 ` Martin Dalecki
@ 2002-05-07 12:42 ` Roman Zippel
0 siblings, 0 replies; 265+ messages in thread
From: Roman Zippel @ 2002-05-07 12:42 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Kernel Mailing List
Hi,
On Tue, 7 May 2002, Martin Dalecki wrote:
> BTW, it should indeed take both into account as far as I can
> see. (Despite the fact that I could afford an ATARI, I can hardly
> find one...)
That's not necessary; I'm only afraid that functionality which isn't
needed on the latest hardware gets lost.
bye, Roman
* [PATCH] 2.5.14 IDE 56
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (2 preceding siblings ...)
2002-05-06 9:09 ` [PATCH] 2.5.14 IDE 55 Martin Dalecki
@ 2002-05-07 11:22 ` Martin Dalecki
2002-05-07 14:02 ` Padraig Brady
2002-05-08 18:46 ` Denis Vlasenko
2002-05-07 11:27 ` [PATCH] 2.5.14 IDE 57 Martin Dalecki
` (8 subsequent siblings)
12 siblings, 2 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 11:22 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 3171 bytes --]
Mon May 6 13:29:44 CEST 2002 ide-clean-56
- Push poll_timeout down from the hwgroup to the channel. We are resetting the
channel and not a whole hwgroup. This way using multiple pdc202xxx cards
should magically start to work with multiple performance and resets will no
longer lock the system.
- Updates for PDC4030 by Peter Denison <peterd@marshadder.uklinux.net>.
- Make ide_raw_taskfile() not care about request buffers. They were always
NULL.
- Port the set multi mode count operation over from the special setting
interface to ide_raw_taskfile(). Fix errors in the corresponding interrupt
handler in one go as well. It turned out that this is precisely the same code
as in task_no_data_intr(), so we can nuke it altogether. Finally, we have
found some problems with the set_pio_mode() command, which can fail with
-EBUSY. This is probably *very* common these days, especially when hdparm is
used during boot! (OK, it was masked by reporting too early that it had
finished... Crap, crap, utter crap it was!!!) For now hdparm should just be
extended to properly sync and retry on -EBUSY, and everything should be fine.
And now the 1 million EUR question for everybody who loves to put driver
settings into /proc:
How the hell could echo > /proc/ide/ide0/settings blah blah blah blah properly
handle cases like -EIO, -EBUSY and so on??? Having the possibility to do it
does not mean that it is a good idea to use it.
OK. After realizing the simple fact that quite a lot of low-level
hardware-manipulating ioctls need proper logic around their use, which is
*very* unlikely to be implemented in bash (I still prefer ksh), I
have made up my mind.
/proc/ide will be nuked.
- Execute the recalibration for error recovery on precisely the same request as
the one which failed.
- Remove set geometry. It's crap as far as the standard is concerned, because:
1. We rely on the existence of the identify command anyway.
2. This command was obsoleted *before* the identify command existed, as far
as I can see.
3. I was able to have a look at what other ATA/ATAPI drivers in the wild do:
they don't do it.
- Just call tuneproc in set_pio_mode() directly - we are already behind the rq
queue there.
- Having uncovered the broken logic behind the whole ioctl handling, we
have now simply made ide_spin_wait_hwgroup() wait for a somewhat
longer timeout before giving up. Previously this was just hiding the broken
concept of setting ioctl values through /proc/ide/ideX/settings; now it
really helps hdparm not to give up too early. (It probably shouldn't wreak
havoc on the global driver spin lock either. I will look into this
later.)
- Scrap the unnecessary, to say the least, disabling of interrupts for 3,
read it again please, 3 seconds, on the local CPU inside
ide_spin_wait_hwgroup(). Spin lock handling badly needs checking there as
well, as I now see...
Hey, apparently all "special" requests are gone. We now only have
to deal with REQ_DRIVE_ACB and REQ_DRIVE_CMD. One of them is still one too
many and will be killed.
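
The "sync and retry on -EBUSY" idea mentioned above for hdparm can be sketched as a small retry loop. This is an illustration only and not hdparm code: the callback type try_setting_fn and the names retry_on_busy() and demo_attempt() are hypothetical, and a real caller would presumably sleep or sync between attempts.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical callback: one attempt at applying a drive setting. */
typedef int (*try_setting_fn)(void *ctx);

/*
 * Call attempt() until it returns something other than -EBUSY, or until
 * max_tries attempts have been made. Returns the result of the last attempt.
 */
static int retry_on_busy(try_setting_fn attempt, void *ctx, int max_tries)
{
	int ret = -EBUSY;
	int i;

	assert(max_tries > 0);
	for (i = 0; i < max_tries && ret == -EBUSY; i++)
		ret = attempt(ctx);	/* a real caller would wait between tries */

	return ret;
}

/* Demo attempt: reports -EBUSY while the int countdown in ctx is positive. */
static int demo_attempt(void *ctx)
{
	int *busy_left = ctx;

	if (*busy_left > 0) {
		(*busy_left)--;
		return -EBUSY;
	}
	return 0;
}
```

Errors other than -EBUSY, such as -EIO, are returned to the caller immediately. That is exactly the kind of distinction the changelog argues a plain echo into /proc/ide/ideX/settings cannot make.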
[-- Attachment #2: ide-clean-56.diff --]
[-- Type: text/plain, Size: 35121 bytes --]
diff -urN linux-2.5.14/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.14/drivers/ide/ide.c 2002-05-07 02:36:37.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-07 02:15:18.000000000 +0200
@@ -193,7 +193,7 @@
#ifdef CONFIG_BLK_DEV_HD
if (ch->io_ports[IDE_DATA_OFFSET] == HD_DATA)
ch->noprobe = 1; /* may be overridden by ide_setup() */
-#endif /* CONFIG_BLK_DEV_HD */
+#endif
ch->major = ide_major[index];
sprintf(ch->name, "ide%d", index);
ch->bus_state = BUSSTATE_ON;
@@ -201,15 +201,14 @@
for (unit = 0; unit < MAX_DRIVES; ++unit) {
struct ata_device *drive = &ch->drives[unit];
- drive->type = ATA_DISK;
- drive->select.all = (unit<<4)|0xa0;
- drive->channel = ch;
- drive->ctl = 0x08;
- drive->ready_stat = READY_STAT;
- drive->bad_wstat = BAD_W_STAT;
- drive->special_cmd = (ATA_SPECIAL_RECALIBRATE | ATA_SPECIAL_GEOMETRY);
+ drive->type = ATA_DISK;
+ drive->select.all = (unit<<4)|0xa0;
+ drive->channel = ch;
+ drive->ctl = 0x08;
+ drive->ready_stat = READY_STAT;
+ drive->bad_wstat = BAD_W_STAT;
sprintf(drive->name, "hd%c", 'a' + (index * MAX_DRIVES) + unit);
- drive->max_failures = IDE_DEFAULT_MAX_FAILURES;
+ drive->max_failures = IDE_DEFAULT_MAX_FAILURES;
init_waitqueue_head(&drive->wqueue);
}
@@ -354,11 +353,8 @@
spin_unlock_irqrestore(&ide_lock, flags);
}
-static void ata_pre_reset(struct ata_device *drive)
+static void check_crc_errors(struct ata_device *drive)
{
- if (ata_ops(drive) && ata_ops(drive)->pre_reset)
- ata_ops(drive)->pre_reset(drive);
-
if (!drive->using_dma)
return;
@@ -392,38 +388,6 @@
return ~0UL;
}
-/*
- * This is used to issue WIN_SPECIFY, WIN_RESTORE, and WIN_SETMULT commands to
- * a drive.
- */
-static ide_startstop_t ata_special(struct ata_device *drive)
-{
- unsigned char special_cmd = drive->special_cmd;
-
-#ifdef DEBUG
- printk("%s: ata_special: 0x%02x\n", drive->name, special_cmd);
-#endif
- if (special_cmd & ATA_SPECIAL_TUNE) {
- drive->special_cmd &= ~ATA_SPECIAL_TUNE;
- if (drive->channel->tuneproc != NULL)
- drive->channel->tuneproc(drive, drive->tune_req);
- } else if (drive->driver != NULL) {
- if (ata_ops(drive)->special)
- return ata_ops(drive)->special(drive);
- else {
- drive->special_cmd = 0;
- drive->mult_req = 0;
-
- return ide_stopped;
- }
- } else if (special_cmd) {
- printk("%s: bad special flag: 0x%02x\n", drive->name, special_cmd);
- drive->special_cmd = 0;
- }
-
- return ide_stopped;
-}
-
extern struct block_device_operations ide_fops[];
/*
@@ -460,24 +424,24 @@
*/
static ide_startstop_t atapi_reset_pollfunc(struct ata_device *drive, struct request *__rq)
{
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
- byte stat;
+ struct ata_channel *ch = drive->channel;
+ u8 stat;
- SELECT_DRIVE(drive->channel,drive);
+ SELECT_DRIVE(ch,drive);
udelay (10);
if (OK_STAT(stat=GET_STAT(), 0, BUSY_STAT)) {
printk("%s: ATAPI reset complete\n", drive->name);
} else {
- if (time_before(jiffies, hwgroup->poll_timeout)) {
+ if (time_before(jiffies, ch->poll_timeout)) {
ide_set_handler (drive, atapi_reset_pollfunc, HZ/20, NULL);
return ide_started; /* continue polling */
}
- hwgroup->poll_timeout = 0; /* end of polling */
+ ch->poll_timeout = 0; /* end of polling */
printk("%s: ATAPI reset timed-out, status=0x%02x\n", drive->name, stat);
return do_reset1 (drive, 1); /* do it the old fashioned way */
}
- hwgroup->poll_timeout = 0; /* done polling */
+ ch->poll_timeout = 0; /* done polling */
return ide_stopped;
}
@@ -489,19 +453,18 @@
*/
static ide_startstop_t reset_pollfunc(struct ata_device *drive, struct request *__rq)
{
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
- struct ata_channel *hwif = drive->channel;
+ struct ata_channel *ch = drive->channel;
u8 stat;
if (!OK_STAT(stat=GET_STAT(), 0, BUSY_STAT)) {
- if (time_before(jiffies, hwgroup->poll_timeout)) {
+ if (time_before(jiffies, ch->poll_timeout)) {
ide_set_handler(drive, reset_pollfunc, HZ/20, NULL);
return ide_started; /* continue polling */
}
- printk("%s: reset timed-out, status=0x%02x\n", hwif->name, stat);
+ printk("%s: reset timed-out, status=0x%02x\n", ch->name, stat);
drive->failures++;
} else {
- printk("%s: reset: ", hwif->name);
+ printk("%s: reset: ", ch->name);
if ((stat = GET_ERR()) == 1) {
printk("success\n");
drive->failures = 0;
@@ -531,7 +494,8 @@
drive->failures++;
}
}
- hwgroup->poll_timeout = 0; /* done polling */
+ ch->poll_timeout = 0; /* done polling */
+
return ide_stopped;
}
@@ -555,21 +519,21 @@
{
unsigned int unit;
unsigned long flags;
- struct ata_channel *hwif = drive->channel;
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
+ struct ata_channel *ch = drive->channel;
__save_flags(flags); /* local CPU only */
__cli(); /* local CPU only */
/* For an ATAPI device, first try an ATAPI SRST. */
if (drive->type != ATA_DISK && !do_not_try_atapi) {
- ata_pre_reset(drive);
- SELECT_DRIVE(hwif,drive);
+ check_crc_errors(drive);
+ SELECT_DRIVE(ch, drive);
udelay (20);
OUT_BYTE(WIN_SRST, IDE_COMMAND_REG);
- hwgroup->poll_timeout = jiffies + WAIT_WORSTCASE;
+ ch->poll_timeout = jiffies + WAIT_WORSTCASE;
ide_set_handler(drive, atapi_reset_pollfunc, HZ/20, NULL);
__restore_flags(flags); /* local CPU only */
+
return ide_started;
}
@@ -578,11 +542,12 @@
* for any of the drives on this interface.
*/
for (unit = 0; unit < MAX_DRIVES; ++unit)
- ata_pre_reset(&hwif->drives[unit]);
+ check_crc_errors(&ch->drives[unit]);
#if OK_TO_RESET_CONTROLLER
if (!IDE_CONTROL_REG) {
__restore_flags(flags);
+
return ide_stopped;
}
/*
@@ -601,7 +566,7 @@
OUT_BYTE(drive->ctl|2,IDE_CONTROL_REG); /* clear SRST, leave nIEN */
}
udelay(10); /* more than enough time */
- hwgroup->poll_timeout = jiffies + WAIT_WORSTCASE;
+ ch->poll_timeout = jiffies + WAIT_WORSTCASE;
ide_set_handler(drive, reset_pollfunc, HZ/20, NULL);
/*
@@ -609,9 +574,10 @@
* state when the disks are reset this way. At least, the Winbond
* 553 documentation says that
*/
- if (hwif->resetproc != NULL)
- hwif->resetproc(drive);
+ if (ch->resetproc != NULL)
+ ch->resetproc(drive);
+ /* FIXME: we should handle multi mode setting here as well! */
#endif
__restore_flags (flags); /* local CPU only */
@@ -789,6 +755,36 @@
}
}
+#ifdef CONFIG_BLK_DEV_PDC4030
+# define IS_PDC4030_DRIVE (drive->channel->chipset == ide_pdc4030)
+#else
+# define IS_PDC4030_DRIVE (0) /* auto-NULLs out pdc4030 code */
+#endif
+
+/*
+ * We are still on the old request path here so issuing the recalibrate command
+ * directly should just work.
+ */
+static int do_recalibrate(struct ata_device *drive)
+{
+ printk(KERN_INFO "%s: recalibrating!\n", drive->name);
+
+ if (drive->type != ATA_DISK)
+ return ide_stopped;
+
+ if (!IS_PDC4030_DRIVE) {
+ struct ata_taskfile args;
+
+ memset(&args, 0, sizeof(args));
+ args.taskfile.sector_count = drive->sect;
+ args.taskfile.command = WIN_RESTORE;
+ args.handler = recal_intr;
+ ata_taskfile(drive, &args, NULL);
+ }
+
+ return IS_PDC4030_DRIVE ? ide_stopped : ide_started;
+}
+
/*
* Take action based on the error returned by the drive.
*/
@@ -835,13 +831,11 @@
else
ide_end_request(drive, rq, 0);
} else {
- if ((rq->errors & ERROR_RESET) == ERROR_RESET) {
- ++rq->errors;
+ ++rq->errors;
+ if ((rq->errors & ERROR_RESET) == ERROR_RESET)
return do_reset1(drive, 0);
- }
if ((rq->errors & ERROR_RECAL) == ERROR_RECAL)
- drive->special_cmd |= ATA_SPECIAL_RECALIBRATE;
- ++rq->errors;
+ return do_recalibrate(drive);
}
return ide_stopped;
}
@@ -960,7 +954,7 @@
goto kill_rq;
}
- block = rq->sector;
+ block = rq->sector;
/* Strange disk manager remap.
*/
@@ -991,14 +985,6 @@
}
}
- /* FIXME: We can see nicely here that all commands should be submitted
- * through the request queue and that the special field in drive should
- * go as soon as possible!
- */
-
- if (drive->special_cmd)
- return ata_special(drive);
-
/* This issues a special drive command, usually initiated by ioctl()
* from the external hdparm program.
*/
@@ -1011,7 +997,7 @@
ata_taskfile(drive, args, NULL);
if (((args->command_type == IDE_DRIVE_TASK_RAW_WRITE) ||
- (args->command_type == IDE_DRIVE_TASK_OUT)) &&
+ (args->command_type == IDE_DRIVE_TASK_OUT)) &&
args->prehandler && args->handler)
return args->prehandler(drive, rq);
@@ -1053,6 +1039,7 @@
return ata_ops(drive)->do_request(drive, rq, block);
else {
ide_end_request(drive, rq, 0);
+
return ide_stopped;
}
}
@@ -1499,7 +1486,7 @@
disable_irq(ch->irq); /* disable_irq_nosync ?? */
#endif
__cli(); /* local CPU only, as if we were handling an interrupt */
- if (hwgroup->poll_timeout != 0) {
+ if (ch->poll_timeout != 0) {
startstop = handler(drive, ch->hwgroup->rq);
} else if (drive_is_ready(drive)) {
if (drive->waiting_for_dma)
@@ -1598,7 +1585,7 @@
if (!ide_ack_intr(ch))
goto out_lock;
- if (handler == NULL || hwgroup->poll_timeout != 0) {
+ if (handler == NULL || ch->poll_timeout != 0) {
#if 0
printk(KERN_INFO "ide: unexpected interrupt %d %d\n", ch->unit, irq);
#endif
@@ -2327,23 +2314,24 @@
int ide_spin_wait_hwgroup(struct ata_device *drive)
{
ide_hwgroup_t *hwgroup = HWGROUP(drive);
- unsigned long timeout = jiffies + (3 * HZ);
+
+ /* FIXME: Wait on a proper timer. Instead of playing games on the
+ * spin_lock().
+ */
+
+ unsigned long timeout = jiffies + (10 * HZ);
spin_lock_irq(&ide_lock);
while (test_bit(IDE_BUSY, &hwgroup->flags)) {
- unsigned long lflags;
spin_unlock_irq(&ide_lock);
- __save_flags(lflags); /* local CPU only */
- __sti(); /* local CPU only; needed for jiffies */
- if (0 < (signed long)(jiffies - timeout)) {
- __restore_flags(lflags); /* local CPU only */
+ if (time_after(jiffies, timeout)) {
printk("%s: channel busy\n", drive->name);
return -EBUSY;
}
- __restore_flags(lflags); /* local CPU only */
spin_lock_irq(&ide_lock);
}
+
return 0;
}
@@ -2411,18 +2399,17 @@
static int set_pio_mode(struct ata_device *drive, int arg)
{
- struct request rq;
-
if (!drive->channel->tuneproc)
return -ENOSYS;
- if (drive->special_cmd & ATA_SPECIAL_TUNE)
+ /* FIXME: This is very much the same kind of problem as we have with
+ * set_multcount(); see there for a description.
+ */
+ if (HWGROUP(drive)->handler)
return -EBUSY;
- ide_init_drive_cmd(&rq);
- drive->tune_req = (u8) arg;
- drive->special_cmd |= ATA_SPECIAL_TUNE;
- ide_do_drive_cmd(drive, &rq, ide_wait);
+ if (drive->channel->tuneproc != NULL)
+ drive->channel->tuneproc(drive, (u8) arg);
return 0;
}
diff -urN linux-2.5.14/drivers/ide/ide-disk.c linux/drivers/ide/ide-disk.c
--- linux-2.5.14/drivers/ide/ide-disk.c 2002-05-07 02:36:37.000000000 +0200
+++ linux/drivers/ide/ide-disk.c 2002-05-07 02:01:48.000000000 +0200
@@ -34,9 +34,9 @@
#include <asm/io.h>
#ifdef CONFIG_BLK_DEV_PDC4030
-#define IS_PDC4030_DRIVE (drive->channel->chipset == ide_pdc4030)
+# define IS_PDC4030_DRIVE (drive->channel->chipset == ide_pdc4030)
#else
-#define IS_PDC4030_DRIVE (0) /* auto-NULLs out pdc4030 code */
+# define IS_PDC4030_DRIVE (0) /* auto-NULLs out pdc4030 code */
#endif
/*
@@ -304,9 +304,9 @@
}
if (IS_PDC4030_DRIVE) {
- extern ide_startstop_t promise_rw_disk(struct ata_device *, struct request *, unsigned long);
+ extern ide_startstop_t promise_do_request(struct ata_device *, struct request *, sector_t);
- return promise_rw_disk(drive, rq, block);
+ return promise_do_request(drive, rq, block);
}
/*
@@ -364,7 +364,7 @@
* point.
*/
- if (drive->doorlocking && ide_raw_taskfile(drive, &args, NULL))
+ if (drive->doorlocking && ide_raw_taskfile(drive, &args))
drive->doorlocking = 0;
}
return 0;
@@ -383,10 +383,10 @@
ide_cmd_type_parser(&args);
- return ide_raw_taskfile(drive, &args, NULL);
+ return ide_raw_taskfile(drive, &args);
}
-static void idedisk_release (struct inode *inode, struct file *filp, struct ata_device *drive)
+static void idedisk_release(struct inode *inode, struct file *filp, struct ata_device *drive)
{
if (drive->removable && !drive->usage) {
struct ata_taskfile args;
@@ -398,7 +398,7 @@
ide_cmd_type_parser(&args);
if (drive->doorlocking &&
- ide_raw_taskfile(drive, &args, NULL))
+ ide_raw_taskfile(drive, &args))
drive->doorlocking = 0;
}
if ((drive->id->cfs_enable_2 & 0x3000) && drive->wcache)
@@ -419,75 +419,6 @@
return drive->capacity - drive->sect0;
}
-static ide_startstop_t idedisk_special(struct ata_device *drive)
-{
- unsigned char special_cmd = drive->special_cmd;
-
- if (special_cmd & ATA_SPECIAL_GEOMETRY) {
- struct ata_taskfile args;
-
- drive->special_cmd &= ~ATA_SPECIAL_GEOMETRY;
-
- memset(&args, 0, sizeof(args));
- args.taskfile.sector_number = drive->sect;
- args.taskfile.low_cylinder = drive->cyl;
- args.taskfile.high_cylinder = drive->cyl>>8;
- args.taskfile.device_head = ((drive->head-1)|drive->select.all)&0xBF;
- if (!IS_PDC4030_DRIVE) {
- args.taskfile.sector_count = drive->sect;
- args.taskfile.command = WIN_SPECIFY;
- args.handler = set_geometry_intr;;
- }
- ata_taskfile(drive, &args, NULL);
- } else if (special_cmd & ATA_SPECIAL_RECALIBRATE) {
- drive->special_cmd &= ~ATA_SPECIAL_RECALIBRATE;
-
- if (!IS_PDC4030_DRIVE) {
- struct ata_taskfile args;
-
- memset(&args, 0, sizeof(args));
- args.taskfile.sector_count = drive->sect;
- args.taskfile.command = WIN_RESTORE;
- args.handler = recal_intr;
- ata_taskfile(drive, &args, NULL);
- }
- } else if (special_cmd & ATA_SPECIAL_MMODE) {
- drive->special_cmd &= ~ATA_SPECIAL_MMODE;
- if (drive->id && drive->mult_req > drive->id->max_multsect)
- drive->mult_req = drive->id->max_multsect;
- if (!IS_PDC4030_DRIVE) {
- struct ata_taskfile args;
-
- memset(&args, 0, sizeof(args));
- args.taskfile.sector_count = drive->mult_req;
- args.taskfile.command = WIN_SETMULT;
- args.handler = set_multmode_intr;
-
- ata_taskfile(drive, &args, NULL);
- }
- } else if (special_cmd) {
- drive->special_cmd = 0;
-
- printk(KERN_ERR "%s: bad special flag: 0x%02x\n", drive->name, special_cmd);
- return ide_stopped;
- }
- return IS_PDC4030_DRIVE ? ide_stopped : ide_started;
-}
-
-static void idedisk_pre_reset(struct ata_device *drive)
-{
- int legacy = (drive->id->cfs_enable_2 & 0x0400) ? 0 : 1;
-
- if (legacy)
- drive->special_cmd = (ATA_SPECIAL_GEOMETRY | ATA_SPECIAL_RECALIBRATE);
- else
- drive->special_cmd = 0;
- if (OK_TO_RESET_CONTROLLER)
- drive->mult_count = 0;
- if (drive->mult_req != drive->mult_count)
- drive->special_cmd |= ATA_SPECIAL_MMODE;
-}
-
#ifdef CONFIG_PROC_FS
#ifdef CONFIG_BLK_DEV_IDE_TCQ
@@ -556,28 +487,58 @@
*/
static int set_multcount(struct ata_device *drive, int arg)
{
- struct request rq;
+ struct ata_taskfile args;
+
+ /* Setting multi mode count on this channel type is not supported/not
+ * handled.
+ */
+ if (IS_PDC4030_DRIVE)
+ return -EIO;
+
+ /* Huh, we still haven't detected the device's capabilities.
+ */
+ if (!drive->id)
+ return -EIO;
- if (drive->special_cmd & ATA_SPECIAL_MMODE)
+ /* FIXME: Hmm... just bailing out may be problematic, since there *is*
+ * activity during boot. For now the same problem persists in
+ * set_pio_mode(); we will have to do something about it soon.
+ */
+ if (HWGROUP(drive)->handler)
return -EBUSY;
- ide_init_drive_cmd(&rq);
+ if (arg > drive->id->max_multsect)
+ arg = drive->id->max_multsect;
- drive->mult_req = arg;
- drive->special_cmd |= ATA_SPECIAL_MMODE;
+ memset(&args, 0, sizeof(args));
+ args.taskfile.sector_count = arg;
+ args.taskfile.command = WIN_SETMULT;
+ ide_cmd_type_parser(&args);
- ide_do_drive_cmd(drive, &rq, ide_wait);
+ if (!ide_raw_taskfile(drive, &args)) {
+ /* all went well track this setting as valid */
+ drive->mult_count = arg;
+
+ return 0;
+ } else
+ drive->mult_count = 0; /* reset */
- return (drive->mult_count == arg) ? 0 : -EIO;
+ return -EIO;
}
static int set_nowerr(struct ata_device *drive, int arg)
{
- if (ide_spin_wait_hwgroup(drive))
+ if (HWGROUP(drive)->handler)
return -EBUSY;
+
drive->nowerr = arg;
drive->bad_wstat = arg ? BAD_R_STAT : BAD_W_STAT;
+
+ /* FIXME: I'm less than sure that we are under the global request lock here!
+ */
+#if 0
spin_unlock_irq(&ide_lock);
+#endif
return 0;
}
@@ -593,7 +554,7 @@
args.taskfile.feature = (arg) ? SETFEATURES_EN_WCACHE : SETFEATURES_DIS_WCACHE;
args.taskfile.command = WIN_SETFEATURES;
ide_cmd_type_parser(&args);
- ide_raw_taskfile(drive, &args, NULL);
+ ide_raw_taskfile(drive, &args);
drive->wcache = arg;
@@ -608,7 +569,7 @@
args.taskfile.command = WIN_STANDBYNOW1;
ide_cmd_type_parser(&args);
- return ide_raw_taskfile(drive, &args, NULL);
+ return ide_raw_taskfile(drive, &args);
}
static int set_acoustic(struct ata_device *drive, int arg)
@@ -620,7 +581,7 @@
args.taskfile.sector_count = arg;
args.taskfile.command = WIN_SETFEATURES;
ide_cmd_type_parser(&args);
- ide_raw_taskfile(drive, &args, NULL);
+ ide_raw_taskfile(drive, &args);
drive->acoustic = arg;
@@ -673,17 +634,11 @@
{
struct hd_driveid *id = drive->id;
- ide_add_setting(drive, "bios_cyl", SETTING_RW, -1, -1, TYPE_INT, 0, 65535, 1, 1, &drive->bios_cyl, NULL);
- ide_add_setting(drive, "bios_head", SETTING_RW, -1, -1, TYPE_BYTE, 0, 255, 1, 1, &drive->bios_head, NULL);
- ide_add_setting(drive, "bios_sect", SETTING_RW, -1, -1, TYPE_BYTE, 0, 63, 1, 1, &drive->bios_sect, NULL);
ide_add_setting(drive, "address", SETTING_RW, HDIO_GET_ADDRESS, HDIO_SET_ADDRESS, TYPE_INTA, 0, 2, 1, 1, &drive->addressing, set_lba_addressing);
ide_add_setting(drive, "multcount", id ? SETTING_RW : SETTING_READ, HDIO_GET_MULTCOUNT, HDIO_SET_MULTCOUNT, TYPE_BYTE, 0, id ? id->max_multsect : 0, 1, 1, &drive->mult_count, set_multcount);
ide_add_setting(drive, "nowerr", SETTING_RW, HDIO_GET_NOWERR, HDIO_SET_NOWERR, TYPE_BYTE, 0, 1, 1, 1, &drive->nowerr, set_nowerr);
- ide_add_setting(drive, "lun", SETTING_RW, -1, -1, TYPE_INT, 0, 7, 1, 1, &drive->lun, NULL);
ide_add_setting(drive, "wcache", SETTING_RW, HDIO_GET_WCACHE, HDIO_SET_WCACHE, TYPE_BYTE, 0, 1, 1, 1, &drive->wcache, write_cache);
ide_add_setting(drive, "acoustic", SETTING_RW, HDIO_GET_ACOUSTIC, HDIO_SET_ACOUSTIC, TYPE_BYTE, 0, 254, 1, 1, &drive->acoustic, set_acoustic);
- ide_add_setting(drive, "failures", SETTING_RW, -1, -1, TYPE_INT, 0, 65535, 1, 1, &drive->failures, NULL);
- ide_add_setting(drive, "max_failures", SETTING_RW, -1, -1, TYPE_INT, 0, 65535, 1, 1, &drive->max_failures, NULL);
#ifdef CONFIG_BLK_DEV_IDE_TCQ
ide_add_setting(drive, "using_tcq", SETTING_RW, HDIO_GET_QDMA, HDIO_SET_QDMA, TYPE_BYTE, 0, IDE_MAX_TAG, 1, 1, &drive->using_tcq, set_using_tcq);
#endif
@@ -761,7 +716,7 @@
args.handler = task_no_data_intr;
/* submit command request */
- ide_raw_taskfile(drive, &args, NULL);
+ ide_raw_taskfile(drive, &args);
/* if OK, compute maximum address value */
if ((args.taskfile.command & 0x01) == 0) {
@@ -789,7 +744,7 @@
args.handler = task_no_data_intr;
/* submit command request */
- ide_raw_taskfile(drive, &args, NULL);
+ ide_raw_taskfile(drive, &args);
/* if OK, compute maximum address value */
if ((args.taskfile.command & 0x01) == 0) {
@@ -829,7 +784,7 @@
args.taskfile.command = WIN_SET_MAX;
args.handler = task_no_data_intr;
/* submit command request */
- ide_raw_taskfile(drive, &args, NULL);
+ ide_raw_taskfile(drive, &args);
/* if OK, read new maximum address value */
if ((args.taskfile.command & 0x01) == 0) {
addr_set = ((args.taskfile.device_head & 0x0f) << 24)
@@ -865,7 +820,7 @@
args.handler = task_no_data_intr;
/* submit command request */
- ide_raw_taskfile(drive, &args, NULL);
+ ide_raw_taskfile(drive, &args);
/* if OK, compute maximum address value */
if ((args.taskfile.command & 0x01) == 0) {
u32 high = (args.hobfile.high_cylinder << 16) |
@@ -1078,7 +1033,13 @@
printk("\n");
drive->mult_count = 0;
+#if 0
if (id->max_multsect) {
+
+ /* FIXME: reenable this again after making it to use
+ * the same code path as the ioctl stuff.
+ */
+
#ifdef CONFIG_IDEDISK_MULTI_MODE
id->multsect = ((id->max_multsect/2) > 1) ? id->max_multsect : 0;
id->multsect_valid = id->multsect ? 1 : 0;
@@ -1094,6 +1055,7 @@
drive->special_cmd |= ATA_SPECIAL_MMODE;
#endif
}
+#endif
/* FIXME: Nowadays there are many chipsets out there which *require* 32
* bit IO. Those will most propably not work properly with drives not
@@ -1136,9 +1098,7 @@
release: idedisk_release,
check_media_change: idedisk_check_media_change,
revalidate: NULL, /* use default method */
- pre_reset: idedisk_pre_reset,
capacity: idedisk_capacity,
- special: idedisk_special,
proc: idedisk_proc
};
diff -urN linux-2.5.14/drivers/ide/ide-tape.c linux/drivers/ide/ide-tape.c
--- linux-2.5.14/drivers/ide/ide-tape.c 2002-05-07 02:36:37.000000000 +0200
+++ linux/drivers/ide/ide-tape.c 2002-05-07 01:23:42.000000000 +0200
@@ -4273,16 +4273,6 @@
}
/*
- * idetape_pre_reset is called before an ATAPI/ATA software reset.
- */
-static void idetape_pre_reset (ide_drive_t *drive)
-{
- idetape_tape_t *tape = drive->driver_data;
- if (tape != NULL)
- set_bit (IDETAPE_IGNORE_DSC, &tape->flags);
-}
-
-/*
* Character device interface functions
*/
static ide_drive_t *get_drive_ptr (kdev_t i_rdev)
@@ -6164,8 +6154,6 @@
release: idetape_blkdev_release,
check_media_change: NULL,
revalidate: idetape_revalidate,
- pre_reset: idetape_pre_reset,
- capacity: NULL,
proc: idetape_proc
};
diff -urN linux-2.5.14/drivers/ide/ide-taskfile.c linux/drivers/ide/ide-taskfile.c
--- linux-2.5.14/drivers/ide/ide-taskfile.c 2002-05-07 02:36:37.000000000 +0200
+++ linux/drivers/ide/ide-taskfile.c 2002-05-07 02:30:11.000000000 +0200
@@ -489,40 +489,6 @@
}
/*
- * This is invoked on completion of a WIN_SETMULT cmd.
- */
-ide_startstop_t set_multmode_intr(struct ata_device *drive, struct request *__rq)
-{
- u8 stat;
-
- if (OK_STAT(stat = GET_STAT(),READY_STAT,BAD_STAT)) {
- drive->mult_count = drive->mult_req;
- } else {
- drive->mult_req = drive->mult_count = 0;
- drive->special_cmd |= ATA_SPECIAL_RECALIBRATE;
- ide_dump_status(drive, "set_multmode", stat);
- }
- return ide_stopped;
-}
-
-/*
- * This is invoked on completion of a WIN_SPECIFY cmd.
- */
-ide_startstop_t set_geometry_intr(struct ata_device *drive, struct request *__rq)
-{
- u8 stat;
-
- if (OK_STAT(stat=GET_STAT(),READY_STAT,BAD_STAT))
- return ide_stopped;
-
- if (stat & (ERR_STAT|DRQ_STAT))
- return ide_error(drive, "set_geometry_intr", stat);
-
- ide_set_handler(drive, set_geometry_intr, WAIT_CMD, NULL);
- return ide_started;
-}
-
-/*
* This is invoked on completion of a WIN_RESTORE (recalibrate) cmd.
*/
ide_startstop_t recal_intr(struct ata_device *drive, struct request *__rq)
@@ -729,11 +695,11 @@
args->command_type = IDE_DRIVE_TASK_IN;
return;
+ case CFA_WRITE_SECT_WO_ERASE:
case WIN_WRITE:
case WIN_WRITE_EXT:
case WIN_WRITE_VERIFY:
case WIN_WRITE_BUFFER:
- case CFA_WRITE_SECT_WO_ERASE:
case WIN_DOWNLOAD_MICROCODE:
args->prehandler = pre_task_out_intr;
args->handler = task_out_intr;
@@ -832,7 +798,7 @@
}
case WIN_SPECIFY:
- args->handler = set_geometry_intr;
+ args->handler = task_no_data_intr;
args->command_type = IDE_DRIVE_TASK_NO_DATA;
return;
@@ -874,7 +840,7 @@
return;
case WIN_SETMULT:
- args->handler = set_multmode_intr;
+ args->handler = task_no_data_intr;
args->command_type = IDE_DRIVE_TASK_NO_DATA;
return;
@@ -894,19 +860,19 @@
}
}
-int ide_raw_taskfile(struct ata_device *drive, struct ata_taskfile *args, byte *buf)
+int ide_raw_taskfile(struct ata_device *drive, struct ata_taskfile *args)
{
struct request rq;
memset(&rq, 0, sizeof(rq));
rq.flags = REQ_DRIVE_ACB;
- rq.buffer = buf;
+#if 0
if (args->command_type != IDE_DRIVE_TASK_NO_DATA)
rq.current_nr_sectors = rq.nr_sectors
= (args->hobfile.sector_count << 8)
| args->taskfile.sector_count;
-
+#endif
rq.special = args;
return ide_do_drive_cmd(drive, &rq, ide_wait);
@@ -996,8 +962,6 @@
EXPORT_SYMBOL(atapi_write);
EXPORT_SYMBOL(ata_taskfile);
EXPORT_SYMBOL(recal_intr);
-EXPORT_SYMBOL(set_geometry_intr);
-EXPORT_SYMBOL(set_multmode_intr);
EXPORT_SYMBOL(task_no_data_intr);
EXPORT_SYMBOL(ide_raw_taskfile);
EXPORT_SYMBOL(ide_cmd_type_parser);
diff -urN linux-2.5.14/drivers/ide/pdc4030.c linux/drivers/ide/pdc4030.c
--- linux-2.5.14/drivers/ide/pdc4030.c 2002-05-06 05:37:57.000000000 +0200
+++ linux/drivers/ide/pdc4030.c 2002-05-06 19:55:23.000000000 +0200
@@ -39,6 +39,7 @@
* Version 0.90 Transition to BETA code. No lost/unexpected interrupts
* Version 0.91 Bring in line with new bio code in 2.5.1
* Version 0.92 Update for IDE driver taskfile changes
+ * Version 0.93 Sync with 2.5.10, minor taskfile changes
*/
/*
@@ -380,6 +381,7 @@
}
/*
+ * promise_complete_pollfunc()
* This is the polling function for waiting (nicely!) until drive stops
* being busy. It is invoked at the end of a write, after the previous poll
* has finished.
@@ -388,20 +390,20 @@
*/
static ide_startstop_t promise_complete_pollfunc(struct ata_device *drive, struct request *rq)
{
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
+ struct ata_channel *ch = drive->channel;
if (GET_STAT() & BUSY_STAT) {
- if (time_before(jiffies, hwgroup->poll_timeout)) {
+ if (time_before(jiffies, ch->poll_timeout)) {
ide_set_handler(drive, promise_complete_pollfunc, HZ/100, NULL);
return ide_started; /* continue polling... */
}
- hwgroup->poll_timeout = 0;
+ ch->poll_timeout = 0;
printk(KERN_ERR "%s: completion timeout - still busy!\n",
drive->name);
return ide_error(drive, "busy timeout", GET_STAT());
}
- hwgroup->poll_timeout = 0;
+ ch->poll_timeout = 0;
#ifdef DEBUG_WRITE
printk(KERN_DEBUG "%s: Write complete - end_request\n", drive->name);
#endif
@@ -432,7 +434,7 @@
nsect = mcount;
mcount -= nsect;
- buffer = bio_kmap_irq(rq->bio, flags) + ide_rq_offset(rq);
+ buffer = bio_kmap_irq(rq->bio, &flags) + ide_rq_offset(rq);
rq->sector += nsect;
rq->nr_sectors -= nsect;
rq->current_nr_sectors -= nsect;
@@ -467,23 +469,23 @@
*/
static ide_startstop_t promise_write_pollfunc(struct ata_device *drive, struct request *rq)
{
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
+ struct ata_channel *ch = drive->channel;
if (IN_BYTE(IDE_NSECTOR_REG) != 0) {
- if (time_before(jiffies, hwgroup->poll_timeout)) {
+ if (time_before(jiffies, ch->poll_timeout)) {
ide_set_handler(drive, promise_write_pollfunc, HZ/100, NULL);
return ide_started; /* continue polling... */
}
- hwgroup->poll_timeout = 0;
+ ch->poll_timeout = 0;
printk(KERN_ERR "%s: write timed-out!\n",drive->name);
- return ide_error (drive, "write timeout", GET_STAT());
+ return ide_error(drive, "write timeout", GET_STAT());
}
/*
* Now write out last 4 sectors and poll for not BUSY
*/
promise_multwrite(drive, rq, 4);
- hwgroup->poll_timeout = jiffies + WAIT_WORSTCASE;
+ ch->poll_timeout = jiffies + WAIT_WORSTCASE;
ide_set_handler(drive, promise_complete_pollfunc, HZ/100, NULL);
#ifdef DEBUG_WRITE
printk(KERN_DEBUG "%s: Done last 4 sectors - status = %02x\n",
@@ -501,7 +503,7 @@
*/
static ide_startstop_t promise_write(struct ata_device *drive, struct request *rq)
{
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
+ struct ata_channel *ch = drive->channel;
#ifdef DEBUG_WRITE
printk(KERN_DEBUG "%s: promise_write: sectors(%ld-%ld), "
@@ -516,7 +518,7 @@
if (rq->nr_sectors > 4) {
if (promise_multwrite(drive, rq, rq->nr_sectors - 4))
return ide_stopped;
- hwgroup->poll_timeout = jiffies + WAIT_WORSTCASE;
+ ch->poll_timeout = jiffies + WAIT_WORSTCASE;
ide_set_handler(drive, promise_write_pollfunc, HZ/100, NULL);
return ide_started;
} else {
@@ -526,7 +528,7 @@
*/
if (promise_multwrite(drive, rq, rq->nr_sectors))
return ide_stopped;
- hwgroup->poll_timeout = jiffies + WAIT_WORSTCASE;
+ ch->poll_timeout = jiffies + WAIT_WORSTCASE;
ide_set_handler(drive, promise_complete_pollfunc, HZ/100, NULL);
#ifdef DEBUG_WRITE
printk(KERN_DEBUG "%s: promise_write: <= 4 sectors, "
@@ -537,13 +539,13 @@
}
/*
- * do_pdc4030_io() is called from do_rw_disk, having had the block number
- * already set up. It issues a READ or WRITE command to the Promise
+ * do_pdc4030_io() is called from promise_do_request, having had the block
+ * number already set up. It issues a READ or WRITE command to the Promise
* controller, assuming LBA has been used to set up the block number.
*/
-ide_startstop_t do_pdc4030_io(struct ata_device *drive, struct ata_taskfile *task, struct request *rq)
+ide_startstop_t do_pdc4030_io(struct ata_device *drive, struct ata_taskfile *args, struct request *rq)
{
- struct hd_drive_task_hdr *taskfile = &task->taskfile;
+ struct hd_drive_task_hdr *taskfile = &(args->taskfile);
unsigned long timeout;
byte stat;
@@ -628,7 +630,7 @@
}
}
-ide_startstop_t promise_rw_disk(struct ata_device *drive, struct request *rq, sector_t block)
+ide_startstop_t promise_do_request(struct ata_device *drive, struct request *rq, sector_t block)
{
struct ata_taskfile args;
@@ -647,12 +649,12 @@
args.taskfile.device_head = ((block>>8)&0x0f)|drive->select.all;
args.taskfile.command = (rq_data_dir(rq)==READ)?PROMISE_READ:PROMISE_WRITE;
- ide_cmd_type_parser(&args);
- /* We don't use the generic inerrupt handlers here? */
- args.prehandler = NULL;
+ /* We can't call ide_cmd_type_parser here, since it won't understand
+ our command, but that doesn't matter, since we don't use the
+ generic interrupt handlers either. Setup the bits of args that we
+ will need. */
args.handler = NULL;
rq->special = &args;
return do_pdc4030_io(drive, &args, rq);
}
-
diff -urN linux-2.5.14/drivers/ide/tcq.c linux/drivers/ide/tcq.c
--- linux-2.5.14/drivers/ide/tcq.c 2002-05-06 05:38:05.000000000 +0200
+++ linux/drivers/ide/tcq.c 2002-05-06 20:11:27.000000000 +0200
@@ -418,7 +418,7 @@
* pass NOP with sub-code 0x01 to device, so the command will not
* fail there
*/
- ide_raw_taskfile(drive, &args, NULL);
+ ide_raw_taskfile(drive, &args);
if (args.taskfile.feature & ABRT_ERR)
return 1;
@@ -448,7 +448,7 @@
args.taskfile.command = WIN_SETFEATURES;
ide_cmd_type_parser(&args);
- if (ide_raw_taskfile(drive, &args, NULL)) {
+ if (ide_raw_taskfile(drive, &args)) {
printk("%s: failed to enable write cache\n", drive->name);
return 1;
}
@@ -462,7 +462,7 @@
args.taskfile.command = WIN_SETFEATURES;
ide_cmd_type_parser(&args);
- if (ide_raw_taskfile(drive, &args, NULL)) {
+ if (ide_raw_taskfile(drive, &args)) {
printk("%s: disabling release interrupt fail\n", drive->name);
return 1;
}
@@ -476,7 +476,7 @@
args.taskfile.command = WIN_SETFEATURES;
ide_cmd_type_parser(&args);
- if (ide_raw_taskfile(drive, &args, NULL)) {
+ if (ide_raw_taskfile(drive, &args)) {
printk("%s: enabling service interrupt fail\n", drive->name);
return 1;
}
diff -urN linux-2.5.14/include/linux/ide.h linux/include/linux/ide.h
--- linux-2.5.14/include/linux/ide.h 2002-05-07 02:36:38.000000000 +0200
+++ linux/include/linux/ide.h 2002-05-07 02:30:15.000000000 +0200
@@ -47,7 +47,7 @@
# define DISK_RECOVERY_TIME 0 /* for hardware that needs it */
#endif
#ifndef OK_TO_RESET_CONTROLLER /* 1 needed for good error recovery */
-# define OK_TO_RESET_CONTROLLER 1 /* 0 for use with AH2372A/B interface */
+# define OK_TO_RESET_CONTROLLER 0 /* 0 for use with AH2372A/B interface */
#endif
#ifndef FANCY_STATUS_DUMPS /* 1 for human-readable drive errors */
# define FANCY_STATUS_DUMPS 1 /* 0 to reduce kernel size */
@@ -327,19 +327,9 @@
*/
request_queue_t queue; /* per device request queue */
-
unsigned long sleep; /* sleep until this time */
- /* Flags requesting/indicating one of the following special commands
- * executed on the request queue.
- */
-#define ATA_SPECIAL_GEOMETRY 0x01
-#define ATA_SPECIAL_RECALIBRATE 0x02
-#define ATA_SPECIAL_MMODE 0x04
-#define ATA_SPECIAL_TUNE 0x08
- unsigned char special_cmd;
- u8 mult_req; /* requested multiple sector setting */
- u8 tune_req; /* requested drive tuning setting */
+ u8 XXX_tune_req; /* requested drive tuning setting */
byte using_dma; /* disk is using dma for read/write */
byte using_tcq; /* disk is using queueing */
@@ -409,6 +399,7 @@
unsigned int failures; /* current failure count */
unsigned int max_failures; /* maximum allowed failure count */
struct device device; /* global device tree handle */
+
/*
* tcq statistics
*/
@@ -517,6 +508,8 @@
/* driver soft-power interface */
int (*busproc)(struct ata_device *, int);
byte bus_state; /* power state of the IDE bus */
+
+ unsigned long poll_timeout; /* timeout value during polled operations */
};
/*
@@ -565,17 +558,19 @@
return 1;
}
#else
-#define ata_pending_commands(drive) (0)
-#define ata_can_queue(drive) (1)
+# define ata_pending_commands(drive) (0)
+# define ata_can_queue(drive) (1)
#endif
typedef struct hwgroup_s {
+ /* FIXME: We should look for busy request queues instead of looking at
+ * the !NULL state of this field.
+ */
ide_startstop_t (*handler)(struct ata_device *, struct request *); /* irq handler, if active */
unsigned long flags; /* BUSY, SLEEPING */
struct ata_device *XXX_drive; /* current drive */
struct request *rq; /* current request */
struct timer_list timer; /* failsafe timer */
- unsigned long poll_timeout; /* timeout value during long polls */
int (*expiry)(struct ata_device *, struct request *); /* irq handler, if active */
} ide_hwgroup_t;
@@ -675,9 +670,7 @@
int (*check_media_change)(struct ata_device *);
void (*revalidate)(struct ata_device *);
- void (*pre_reset)(struct ata_device *);
sector_t (*capacity)(struct ata_device *);
- ide_startstop_t (*special)(struct ata_device *);
ide_proc_entry_t *proc;
};
@@ -827,15 +820,13 @@
*/
extern ide_startstop_t recal_intr(struct ata_device *, struct request *);
-extern ide_startstop_t set_geometry_intr(struct ata_device *, struct request *);
-extern ide_startstop_t set_multmode_intr(struct ata_device *, struct request *);
extern ide_startstop_t task_no_data_intr(struct ata_device *, struct request *);
/* This is setting up all fields in args, which depend upon the command type.
*/
extern void ide_cmd_type_parser(struct ata_taskfile *args);
-extern int ide_raw_taskfile(struct ata_device *drive, struct ata_taskfile *cmd, byte *buf);
+extern int ide_raw_taskfile(struct ata_device *, struct ata_taskfile *);
extern int ide_cmd_ioctl(struct ata_device *drive, unsigned long arg);
void ide_delay_50ms(void);
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 11:22 ` [PATCH] 2.5.14 IDE 56 Martin Dalecki
@ 2002-05-07 14:02 ` Padraig Brady
2002-05-07 13:15 ` Martin Dalecki
2002-05-08 18:46 ` Denis Vlasenko
1 sibling, 1 reply; 265+ messages in thread
From: Padraig Brady @ 2002-05-07 14:02 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
Martin Dalecki wrote:
> Mon May 6 13:29:44 CEST 2002 ide-clean-56
>
[snip]
> OK. After realizing the simple fact that quite a lot of low level
> hardware manipulating ioctls may require assistance in usage from
> proper logic which is *very* unlikely to be implemented in a bash
> (for me preferable still ksh) I have made my mind up.
>
> /proc/ide will be nuked.
Please consider this carefully, especially the read only bits.
One particular thing I use a lot is: /proc/ide/hda/capacity
Will there be another interface easily usable by scripts
to get this information?
Am I going to have to parse hdparm output?
....
geometry = 2434/255/63, sectors = 39102336, start = 0
Am I going to need hdparm on my embedded system?
Padraig.
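
For reference, parsing the hdparm line quoted above is not much code. The following is a minimal sketch, assuming the line has exactly the "geometry = C/H/S, sectors = N, start = M" layout shown in the message; other hdparm versions may print something different. The struct and function names are made up for this example.

```c
#include <stdio.h>

/* Fields of one "geometry = ..." line (hypothetical type for this sketch). */
struct disk_geometry {
	unsigned long cylinders;
	unsigned long heads;
	unsigned long sectors_per_track;
	unsigned long long total_sectors;
	unsigned long long start;
};

/*
 * Parse a line of the form
 *   " geometry = 2434/255/63, sectors = 39102336, start = 0"
 * Returns 0 on success, -1 if the line does not match that layout.
 */
static int parse_hdparm_geometry(const char *line, struct disk_geometry *g)
{
	int n = sscanf(line,
		       " geometry = %lu/%lu/%lu, sectors = %llu, start = %llu",
		       &g->cylinders, &g->heads, &g->sectors_per_track,
		       &g->total_sectors, &g->start);

	return n == 5 ? 0 : -1;
}
```

The caller would still have to run hdparm and pick the right line out of its output, which is the extra dependency being objected to here.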
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 14:02 ` Padraig Brady
@ 2002-05-07 13:15 ` Martin Dalecki
2002-05-07 14:30 ` Padraig Brady
2002-05-07 15:08 ` Anton Altaparmakov
0 siblings, 2 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 13:15 UTC (permalink / raw)
To: Padraig Brady; +Cc: Linus Torvalds, Kernel Mailing List
Padraig Brady wrote:
> Am I going to have to parse hdparm output?
> ....
> geometry = 2434/255/63, sectors = 39102336, start = 0
>
> Am I going to need hdparm on my embedded system?
Yes. Or just fsck hardcode the defaults.
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 13:15 ` Martin Dalecki
@ 2002-05-07 14:30 ` Padraig Brady
2002-05-07 15:08 ` Anton Altaparmakov
1 sibling, 0 replies; 265+ messages in thread
From: Padraig Brady @ 2002-05-07 14:30 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
Martin Dalecki wrote:
> Padraig Brady wrote:
>
>> Am I going to have to parse hdparm output?
>> ....
>> geometry = 2434/255/63, sectors = 39102336, start = 0
>>
>> Am I going to need hdparm on my embedded system?
>
>
> Yes. Or just fsck hardcode the defaults.
>
hardcode defaults?
Also are the following standard RH7.1 programs going to
need changing?
[padraig@pixelbeat padraig]$ find /sbin /usr/sbin /bin /usr/bin /lib
/usr/lib /usr/bin/X11/ -xdev -perm +111 | xargs grep -l /proc/ide
2>/dev/null
/sbin/mkinitrd
/sbin/fdisk
/sbin/sfdisk
/sbin/sndconfig
/usr/sbin/mouseconfig
/usr/sbin/kudzu
/usr/sbin/module_upgrade
/usr/sbin/updfstab
/usr/sbin/glidelink
/usr/sbin/sndconfig
/usr/lib/python1.5/site-packages/_kudzumodule.so
/usr/bin/X11/Xconfigurator
Padraig.
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 13:15 ` Martin Dalecki
2002-05-07 14:30 ` Padraig Brady
@ 2002-05-07 15:08 ` Anton Altaparmakov
2002-05-07 15:36 ` Linus Torvalds
1 sibling, 1 reply; 265+ messages in thread
From: Anton Altaparmakov @ 2002-05-07 15:08 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Padraig Brady, Linus Torvalds, Kernel Mailing List
At 14:15 07/05/02, Martin Dalecki wrote:
>Padraig Brady wrote:
>>Am I going to have to parse hdparm output?
>>....
>> geometry = 2434/255/63, sectors = 39102336, start = 0
>>Am I going to need hdparm on my embedded system?
>
>Yes. Or just fsck hardcode the defaults.
This is stupid! And if that isn't obvious to you, you should think a bit
more carefully...
Linux's power is exactly that it can be used on anything from a wristwatch
to a huge server and that it is flexible about everything. You are breaking
this flexibility for no apparent reason. (I don't accept "I can't cope with
this so I remove it." as a reason, sorry).
As the new IDE maintainer so far we have only seen you removing one feature
after the other in the name of cleanup, without adequate or even any at
all(!) replacements, renaming all functions to hell and back, and breaking
the ide core here there and everywhere. All critical bug fixes seem to have
been contributed by other people looking at your code which doesn't inspire
a lot of confidence in you... Even Alan Cox said a while ago that you have
his vote of no confidence (probably slightly rephrased here) because of
changes you were introducing and I tend to trust bearded kernel hackers
from Wales. (-;
Aren't you noticing that something is wrong here???
Best regards,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 15:08 ` Anton Altaparmakov
@ 2002-05-07 15:36 ` Linus Torvalds
2002-05-07 16:20 ` Jan Harkes
2002-05-07 16:29 ` Padraig Brady
0 siblings, 2 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-07 15:36 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Martin Dalecki, Padraig Brady, Kernel Mailing List
[ First off: any IDE-only thing that doesn't work for SCSI or other disks
doesn't solve a generic problem, so the complaint that some generic
tools might use it is totally invalid. ]
On Tue, 7 May 2002, Anton Altaparmakov wrote:
>
> Linux's power is exactly that it can be used on anything from a wristwatch
> to a huge server and that it is flexible about everything. You are breaking
> this flexibility for no apparent reason. (I don't accept "I can't cope with
> this so I remove it." as a reason, sorry).
Run the 57 patch, and complain if something doesn't work.
Linux's power is that we FIX stuff. That we make it the best system
possible, and that we don't just whine and argue about things.
> As the new IDE maintainer so far we have only seen you removing one
> feature after the other in the name of cleanup, without adequate or even
> any at all(!) replacements,
Who cares? Have you found _anything_ that Martin removed that was at all
worthwhile? I sure haven't.
Guys, you have to realize that the IDE layer has eight YEARS of absolute
crap in it. Seriously. It's _never_ been cleaned up before. It has stuff
so distasteful that it's scary.
Take it from me: it's a _lot_ easier to add cruft and crap on top of clean
code. You can do it yourself if you want to. You don't need a maintainer
to add barnacles.
All the information that /proc/ide gave you is basically available in
hdparm, and for your dear embedded system it apparently takes up less
space by being in user space. So what is the problem?
My vote is to remove as much as humanly possible.
"Everything should be made as simple as possible, but not
simpler" - Albert Einstein
Think about it, and really _understand_ it.
Linus
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 15:36 ` Linus Torvalds
@ 2002-05-07 16:20 ` Jan Harkes
2002-05-07 15:26 ` Martin Dalecki
2002-05-07 16:29 ` Padraig Brady
1 sibling, 1 reply; 265+ messages in thread
From: Jan Harkes @ 2002-05-07 16:20 UTC (permalink / raw)
To: Kernel Mailing List
On Tue, May 07, 2002 at 08:36:54AM -0700, Linus Torvalds wrote:
> On Tue, 7 May 2002, Anton Altaparmakov wrote:
> > As the new IDE maintainer so far we have only seen you removing one
> > feature after the other in the name of cleanup, without adequate or even
> > any at all(!) replacements,
>
> Who cares? Have you found _anything_ that Martin removed that was at all
> worthwhile? I sure haven't.
I'm still hoping a patch will show up that will allow me to regain
access to my compactflash cards and IBM microdrive disks. The code
currently doesn't rescan for new drives when a card has been inserted,
although it still seems to have all the necessary logic.
Jan
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 16:20 ` Jan Harkes
@ 2002-05-07 15:26 ` Martin Dalecki
2002-05-07 21:36 ` Jan Harkes
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 15:26 UTC (permalink / raw)
To: Jan Harkes; +Cc: Kernel Mailing List
Jan Harkes wrote:
> On Tue, May 07, 2002 at 08:36:54AM -0700, Linus Torvalds wrote:
>
>>On Tue, 7 May 2002, Anton Altaparmakov wrote:
>>
>>>As the new IDE maintainer so far we have only seen you removing one
>>>feature after the other in the name of cleanup, without adequate or even
>>>any at all(!) replacements,
>>
>>Who cares? Have you found _anything_ that Martin removed that was at all
>>worthwhile? I sure haven't.
>
>
> I'm still hoping a patch will show up that will allow me to regain
> access to my compactflash cards and IBM microdrive disks. The code
> currently doesn't rescan for new drives when a card has been inserted,
> although it still seems to have all the necessary logic.
>
Yes I'm fully aware of this, but the whole initialization
is currently very much in flux, and I will come back to this issue
once I think that things are in shape there. OK?
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 15:26 ` Martin Dalecki
@ 2002-05-07 21:36 ` Jan Harkes
2002-05-08 0:25 ` Guest section DW
0 siblings, 1 reply; 265+ messages in thread
From: Jan Harkes @ 2002-05-07 21:36 UTC (permalink / raw)
To: Kernel Mailing List
On Tue, May 07, 2002 at 05:26:10PM +0200, Martin Dalecki wrote:
> Jan Harkes wrote:
> >I'm still hoping a patch will show up that will allow me to regain
> >access to my compactflash cards and IBM microdrive disks. The code
> >currently doesn't rescan for new drives when a card has been inserted,
> >although it still seems to have all the necessary logic.
>
> Yes I'm fully aware of this, but the whole initialization
> is currently very much in flux, and I will come back to this issue
> once I think that things are in shape there. OK?
I thought so, you already indicated so around the time that it broke.
There is still a 2.4 kernel when I really need to get to the data.
Jan
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 21:36 ` Jan Harkes
@ 2002-05-08 0:25 ` Guest section DW
2002-05-08 3:03 ` Jan Harkes
2002-05-08 9:03 ` Martin Dalecki
0 siblings, 2 replies; 265+ messages in thread
From: Guest section DW @ 2002-05-08 0:25 UTC (permalink / raw)
To: Kernel Mailing List
On Tue, May 07, 2002 at 05:36:03PM -0400, Jan Harkes wrote:
> On Tue, May 07, 2002 at 05:26:10PM +0200, Martin Dalecki wrote:
> > Jan Harkes wrote:
> > >I'm still hoping a patch will show up that will allow me to regain
> > >access to my compactflash cards and IBM microdrive disks. The code
> > >currently doesn't rescan for new drives when a card has been inserted,
> > >although it still seems to have all the necessary logic.
> >
> > Yes I'm fully aware of this, but the whole initialization
> > is currently very much in flux, and I will come back to this issue
> > once I think that things are in shape there. OK?
>
> I thought so, you already indicated so around the time that it broke.
> There is still a 2.4 kernel when I really need to get to the data.
I usually do
blockdev --rereadpt /dev/sde
or so. That still works for me with 2.5.13.
Andries
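
For anyone who wants to trigger the same re-read without blockdev: to my understanding, --rereadpt boils down to a BLKRRPART ioctl on the opened device. A minimal sketch follows, assuming a Linux system with the kernel headers installed; the function name reread_partition_table is made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/fs.h>	/* BLKRRPART */

/*
 * Ask the kernel to re-read the partition table of a block device.
 * Returns 0 on success or a negative errno value on failure.
 */
static int reread_partition_table(const char *dev_path)
{
	int fd = open(dev_path, O_RDONLY);
	int ret = 0;

	if (fd < 0)
		return -errno;

	/* Save the error before close() can overwrite errno. */
	if (ioctl(fd, BLKRRPART) < 0)
		ret = -errno;

	close(fd);
	return ret;
}
```

Calling reread_partition_table("/dev/sde") would be the equivalent of the command above. It normally needs root, and the ioctl fails while any partition on the device is in use.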
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 0:25 ` Guest section DW
@ 2002-05-08 3:03 ` Jan Harkes
2002-05-08 9:03 ` Martin Dalecki
1 sibling, 0 replies; 265+ messages in thread
From: Jan Harkes @ 2002-05-08 3:03 UTC (permalink / raw)
To: Kernel Mailing List
On Wed, May 08, 2002 at 02:25:13AM +0200, Guest section DW wrote:
> On Tue, May 07, 2002 at 05:36:03PM -0400, Jan Harkes wrote:
> > On Tue, May 07, 2002 at 05:26:10PM +0200, Martin Dalecki wrote:
> > > Jan Harkes wrote:
> > > >I'm still hoping a patch will show up that will allow me to regain
> > > >access to my compactflash cards and IBM microdrive disks. The code
> > > >currently doesn't rescan for new drives when a card has been inserted,
> > > >although it still seems to have all the necessary logic.
> > >
> > > Yes I'm fully aware of this, but the whole initialization
> > > is currently very much in flux, and I will come back to this issue
> > > once I think that things are in shape there. OK?
> >
> > I thought so, you already indicated so around the time that it broke.
> > There is still a 2.4 kernel when I really need to get to the data.
>
> I usually do
>
> blockdev --rereadpt /dev/sde
>
> or so. That still works for me with 2.5.13.
For SCSI devices probably, but I get "/dev/hde: No such device" (ENODEV)
when a CF card is inserted and recognized.
(dmesg)
hde: SanDisk SDCFB-32, ATA DISK drive
ide2 at 0x100-0x107,0x10e on irq 3
ide_cs: hde: Vcc = 3.3, Vpp = 0.0
When the CF card is not inserted I get a subtly different error
"/dev/hde: No such device or address" (ENXIO).
It looks like the drive <-> driver association is only set up when the
ide-disk driver module is loaded and not when new hardware is found.
Jan
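
The two errors above can be told apart programmatically. The sketch below assumes the reading given in this message, namely that ENODEV means the hardware was seen but no driver is bound to it and ENXIO means nothing is there at all. That interpretation comes from the observed behaviour, not from a documented guarantee, and all names in the code are made up for this example.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum probe_result {
	PROBE_OK,		/* device opened fine */
	PROBE_NO_DRIVER,	/* ENODEV: "No such device" */
	PROBE_NO_DEVICE,	/* ENXIO: "No such device or address" */
	PROBE_OTHER		/* any other failure, e.g. ENOENT or EACCES */
};

/* Map an errno value from open() onto the cases described above. */
static enum probe_result classify_open_errno(int err)
{
	switch (err) {
	case 0:
		return PROBE_OK;
	case ENODEV:
		return PROBE_NO_DRIVER;
	case ENXIO:
		return PROBE_NO_DEVICE;
	default:
		return PROBE_OTHER;
	}
}

/* Try to open a device node read-only and classify the outcome. */
static enum probe_result probe_block_device(const char *path)
{
	int fd = open(path, O_RDONLY);

	if (fd < 0)
		return classify_open_errno(errno);
	close(fd);
	return PROBE_OK;
}
```

With the card inserted, probe_block_device("/dev/hde") would be expected to report PROBE_NO_DRIVER on the setup described here, and PROBE_NO_DEVICE with the card removed.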
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 0:25 ` Guest section DW
2002-05-08 3:03 ` Jan Harkes
@ 2002-05-08 9:03 ` Martin Dalecki
2002-05-08 12:10 ` Alan Cox
1 sibling, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 9:03 UTC (permalink / raw)
To: Guest section DW; +Cc: Kernel Mailing List
Guest section DW wrote:
> On Tue, May 07, 2002 at 05:36:03PM -0400, Jan Harkes wrote:
>
>>On Tue, May 07, 2002 at 05:26:10PM +0200, Martin Dalecki wrote:
>>
>>>Jan Harkes wrote:
>>>
>>>>I'm still hoping a patch will show up that will allow me to regain
>>>>access to my compactflash cards and IBM microdrive disks. The code
>>>>currently doesn't rescan for new drives when a card has been inserted,
>>>>although it still seems to have all the necessary logic.
>>>
>>>Yes I'm fully aware of this, but the whole initialization
>>>is currently very much in flux, and I will come back to this issue
>>>once I think that things are in shape there. OK?
>>
>>I thought so, you already indicated so around the time that it broke.
>>There is still a 2.4 kernel when I really need to get to the data.
>
>
> I usually do
>
> blockdev --rereadpt /dev/sde
>
> or so. That still works for me with 2.5.13.
What you now have to do by hand is rescan for partition
information, and that command triggers exactly that. And if I think
about it... and you know I'm evil... hmmm...
why not just leave it like that? Functionally, that is in some sense the
responsibility of /sbin/hotplug anyway...
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 9:03 ` Martin Dalecki
@ 2002-05-08 12:10 ` Alan Cox
2002-05-08 10:51 ` Martin Dalecki
0 siblings, 1 reply; 265+ messages in thread
From: Alan Cox @ 2002-05-08 12:10 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Guest section DW, Kernel Mailing List
> about it... and you know I'm evil... hmmm...
> why not just leave it like that? Functionally, that is in some sense the
> responsibility of /sbin/hotplug anyway...
How do you intend to order a sequence of I/O operations precisely against a
partition table change driven from user space? That's one I can't see a nice
answer for, and having a RAID controller that can do on-the-fly volume
resizing/creation/deletion, it's not just a matter of curiosity.
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 12:10 ` Alan Cox
@ 2002-05-08 10:51 ` Martin Dalecki
0 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 10:51 UTC (permalink / raw)
To: Alan Cox; +Cc: Guest section DW, Kernel Mailing List
Alan Cox wrote:
>>about it... and you know I'm evil... hmmm...
>>why not just leave it like that? Functionally, that is in some sense the
>>responsibility of /sbin/hotplug anyway...
>
>
> How do you intend to order a sequence of I/O operations precisely against a
> partition table change driven from user space? That's one I can't see a nice
> answer for, and having a RAID controller that can do on-the-fly volume
> resizing/creation/deletion, it's not just a matter of curiosity.
Nahh Alan we are just talking about the ide-cs stuff. I'm not that "evil".
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 15:36 ` Linus Torvalds
2002-05-07 16:20 ` Jan Harkes
@ 2002-05-07 16:29 ` Padraig Brady
2002-05-07 16:51 ` Linus Torvalds
` (3 more replies)
1 sibling, 4 replies; 265+ messages in thread
From: Padraig Brady @ 2002-05-07 16:29 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
Linus Torvalds wrote:
> [ First off: any IDE-only thing that doesn't work for SCSI or other disks
> doesn't solve a generic problem, so the complaint that some generic
> tools might use it is totally invalid. ]
>
> On Tue, 7 May 2002, Anton Altaparmakov wrote:
>
>>Linux's power is exactly that it can be used on anything from a wristwatch
>>to a huge server and that it is flexible about everything. You are breaking
>>this flexibility for no apparent reason. (I don't accept "I can't cope with
>>this so I remove it." as a reason, sorry).
>
>
> Run the 57 patch, and complain if something doesn't work.
>
> Linux's power is that we FIX stuff. That we make it the best system
> possible, and that we don't just whine and argue about things.
>
>
>>As the new IDE maintainer so far we have only seen you removing one
>>feature after the other in the name of cleanup, without adequate or even
>>any at all(!) replacements,
>
>
> Who cares? Have you found _anything_ that Martin removed that was at all
> worthwhile? I sure haven't.
>
> Guys, you have to realize that the IDE layer has eight YEARS of absolute
> crap in it. Seriously. It's _never_ been cleaned up before. It has stuff
> so distasteful that it's scary.
>
> Take it from me: it's a _lot_ easier to add cruft and crap on top of clean
> code. You can do it yourself if you want to. You don't need a maintainer
> to add barnacles.
>
> All the information that /proc/ide gave you is basically available in
> hdparm, and for your dear embedded system it apparently takes up less
> space by being in user space. So what is the problem?
Well my "dear" embedded system doesn't have libc :-(
So 35664 saved in kernel (less on disk), requires 25212
extra for hdparm + more for static linked uclibc (hope
it works ;-)). As a side note if this happens hdparm would
be a requirement for busybox IMHO, anyway getting back on topic...
All the info I've ever needed is /proc/ide/hdx/capacity
which I could get from /proc/partitions with a bit
more effort, so I vote for removing /proc/ide.
I think everyone realises Martin is doing great and much needed work
on IDE (btw I'll have those flash support patches soon Martin ;-)),
but I did think this change needed debate. In general I know it's a
hard decision what to export in proc, especially if there are
existing dependencies, a few already mentioned possibles in RH7.1:
/sbin/mkinitrd
/sbin/fdisk
/sbin/sfdisk
/sbin/sndconfig
/usr/sbin/mouseconfig
/usr/sbin/kudzu
/usr/sbin/module_upgrade
/usr/sbin/updfstab
/usr/sbin/glidelink
/usr/sbin/sndconfig
/usr/lib/python1.5/site-packages/_kudzumodule.so
/usr/bin/X11/Xconfigurator
For example, could the same argument be made for an lspci-only
interface to PCI info rather than /proc/bus/pci? The following
references are made to /proc/bus/pci on my system:
/sbin/lspci
/sbin/setpci
/sbin/sndconfig
/usr/sbin/mouseconfig
/usr/sbin/kudzu
/usr/sbin/module_upgrade
/usr/sbin/updfstab
/usr/sbin/glidelink
/usr/sbin/sndconfig
/usr/sbin/adsl-config
/usr/sbin/internet-config
/usr/sbin/isdn-config
/usr/lib/python1.5/site-packages/_kudzumodule.so
/usr/bin/X11/XFree86
/usr/bin/X11/pcitweak
/usr/bin/X11/scanpci
/usr/bin/X11/Xconfigurator
cheers,
Padraig.
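
The /proc/partitions route mentioned above for getting a drive's capacity can be sketched roughly as follows. This assumes the usual "major minor #blocks name" column layout with sizes in 1 KiB blocks; the function name partition_blocks is made up for this example, and it reads from any FILE * so that it can be tried on sample text as well as on the real file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up a device in /proc/partitions-style text and return its size in
 * 1 KiB blocks, or -1 if the device is not listed.
 */
static long long partition_blocks(FILE *fp, const char *name)
{
	char line[256], dev[64];
	unsigned int major, minor;
	unsigned long long blocks;

	while (fgets(line, sizeof(line), fp)) {
		/* The header and blank lines fail to match and are skipped. */
		if (sscanf(line, "%u %u %llu %63s",
			   &major, &minor, &blocks, dev) == 4 &&
		    strcmp(dev, name) == 0)
			return (long long)blocks;
	}
	return -1;
}
```

A caller would fopen("/proc/partitions", "r") and pass the stream in together with a name such as "hda". If /proc/ide/hdX/capacity counts 512-byte sectors, as I believe it does, the returned block count has to be doubled to get the same number.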
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 16:29 ` Padraig Brady
@ 2002-05-07 16:51 ` Linus Torvalds
2002-05-07 18:29 ` Kai Henningsen
2002-05-08 7:48 ` Juan Quintela
2002-05-07 17:08 ` Alan Cox
` (2 subsequent siblings)
3 siblings, 2 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-07 16:51 UTC (permalink / raw)
To: Padraig Brady; +Cc: Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
On Tue, 7 May 2002, Padraig Brady wrote:
>
> All the info I've ever needed is /proc/ide/hdx/capacity
> which I could get from /proc/partitions with a bit
> more effort, so I vote for removing /proc/ide.
Note that one thing that we might do is to leave the remnants of /proc/ide
but _without_ the very verbose per-chipset reporting.
At least to me it looks like it's all the chipset reporting that causes
the huge kernel bloat, and it shouldn't be impossible to reinstate a
minimal /proc/ide without those parts - while still keeping most of the
backwards compatibility.
However, since I really don't much like the idea of having special
"ide-only" /proc files, I personally think any information people actually
used should be either in truly generic files (/proc/partitions as an
example), _or_ they should be in the generic device tree (talk to Pat
Mochel about that).
So my personal reaction to removal of /proc/ide is: "good riddance, but if
it turns out that we seriously need it for backwards compatibility, we can
add back a skeleton without the bloat".
(Side note: I'm afraid that I don't think backwards compatibility weighs
very heavily on an embedded setup - I'm more thinking about things like "a
regular RedHat/SuSE/Debian/whatever install won't work any more".)
As to existing binaries (your list is interesting), I don't see what they
are doing about ide-specific stuff, since I sure hope those binaries are
happy with a SCSI-only system.
> For example, could the same argument be made for an lspci-only
> interface to PCI info rather than /proc/bus/pci? The following
> references are made to /proc/bus/pci on my system:
I personally do like ASCII /proc files, as long as they don't add
maintainability problems etc.
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 16:51 ` Linus Torvalds
@ 2002-05-07 18:29 ` Kai Henningsen
2002-05-08 7:48 ` Juan Quintela
1 sibling, 0 replies; 265+ messages in thread
From: Kai Henningsen @ 2002-05-07 18:29 UTC (permalink / raw)
To: linux-kernel
torvalds@transmeta.com (Linus Torvalds) wrote on 07.05.02 in <Pine.LNX.4.44.0205070944020.2509-100000@home.transmeta.com>:
> On Tue, 7 May 2002, Padraig Brady wrote:
> >
> > All the info I've ever needed is /proc/ide/hdx/capacity
> > which I could get from /proc/partitions with a bit
> > more effort, so I vote for removing /proc/ide.
>
> Note that one thing that we might do is to leave the remnants of /proc/ide
> but _without_ the very verbose per-chipset reporting.
>
> At least to me it looks like it's all the chipset reporting that causes
> the huge kernel bloat, and it shouldn't be impossible to reinstate a
> minimal /proc/ide without those parts - while still keeping most of the
> backwards compatibility.
What I'd like to see - in whatever exact form - is the IDE equivalent to
/proc/scsi/scsi. (And in fact, the SCSI version could use addition of at
least the disk size and the sdX mapping.)
It's rather useful for getting a quick overview of what's on a system, and
where.
Incidentally, is there a compelling reason why block device boot messages
are all different?
> However, since I really don't much like the idea of having special
> "ide-only" /proc files, I personally think any information people actually
> used should be either in truly generic files (/proc/partitions as an
> example), _or_ they should be in the generic device tree (talk to Pat
> Mochel about that).
/proc/bus/ide or something like that? Sure, why not? Exact place is pretty
much irrelevant except to legacy code.
MfG Kai
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 16:51 ` Linus Torvalds
2002-05-07 18:29 ` Kai Henningsen
@ 2002-05-08 7:48 ` Juan Quintela
2002-05-08 16:54 ` Linus Torvalds
1 sibling, 1 reply; 265+ messages in thread
From: Juan Quintela @ 2002-05-08 7:48 UTC (permalink / raw)
To: Linus Torvalds
Cc: Padraig Brady, Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
>>>>> "linus" == Linus Torvalds <torvalds@transmeta.com> writes:
Hi
linus> (Side note: I'm afraid that I don't think backwards compatibility weighs
linus> very heavily on an embedded setup - I'm more thinking about things like "a
linus> regular RedHat/SuSE/Debian/whatever install won't work any more".)
Here at Mandrake we have a patch for the install kernel that removes
/proc/ide, and I think we got it from Red Hat. That means that at
least two distros prefer saving ~25kb in the boot kernels to keeping
the reporting :p
Later, Juan.
--
In theory, practice and theory are the same, but in practice they
are different -- Larry McVoy
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 7:48 ` Juan Quintela
@ 2002-05-08 16:54 ` Linus Torvalds
0 siblings, 0 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-08 16:54 UTC (permalink / raw)
To: Juan Quintela
Cc: Padraig Brady, Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
On 8 May 2002, Juan Quintela wrote:
> linus> (Side note: I'm afraid that I don't think backwards compatibility weighs
> linus> very heavily on an embedded setup - I'm more thinking about things like "a
> linus> regular RedHat/SuSE/Debian/whatever install won't work any more".)
>
> here at Mandrake we have a patch for the install kernel to remove the
> /proc/ide, and I think that we got it from redhat, that means that at
> least two distros prefer to save ~25kb in the boot kernels than the
> reporting that they do :p
Well, that's a good sign in that it implies that things certainly work
fine without /proc/ide.
However, I think I phrased things badly: I'm not actually worried about
the RedHat or Mandrake "act of installation" itself - since that will
always use whatever kernel RH or Mandrake put on their CD's, and they can
always change their install scripts/programs to match the kernel they use.
I'm more worried about the issue of "I installed RH-x.x, and then I
upgraded the kernel, and now program xyz won't work any more", where "xyz"
is something perfectly reasonable and common.
For example, let's say that some strange version of "mount" _requires_
/proc/ide to work (don't ask me why), and that Mandrake happened to ship
that version in their 8.2 release, and if you use the new 2.5.15 kernel on
that installation, it simply won't work. THAT would be a problem where
some backwards compatibility crud is probably worth it.
But if /proc/ide removal breaks an embedded device (on which the kernel is
not normally upgraded by "normal" people that aren't willing to upgrade
other stuff at the same time), I won't worry too much. Or if the /proc/ide
changes mean that the actual installer has to be re-done, I won't worry.
And even breaking one or two applications might be quite acceptable: I
worry more about maintainability than _perfect_ backwards compatibility.
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 16:29 ` Padraig Brady
2002-05-07 16:51 ` Linus Torvalds
@ 2002-05-07 17:08 ` Alan Cox
2002-05-07 17:00 ` Linus Torvalds
2002-05-07 17:10 ` Richard B. Johnson
2002-05-08 7:36 ` Martin Dalecki
3 siblings, 1 reply; 265+ messages in thread
From: Alan Cox @ 2002-05-07 17:08 UTC (permalink / raw)
To: Padraig Brady
Cc: Linus Torvalds, Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
> All the info I've ever needed is /proc/ide/hdx/capacity
> which I could get from /proc/partitions with a bit
> more effort, so I vote for removing /proc/ide.
/proc/ide has useful information in it that you can't get easily by
other means at the moment - which controller is driving the disks, what
devices are present etc.
> For example, could the same argument be made for an lspci-only
> interface to PCI info rather than /proc/bus/pci? The following
> references are made to /proc/bus/pci on my system:
lspci relies on /proc/bus/pci - it's the only part of the universe that
actually knows how to handle PCI and virtualised PCI devices. Unlike the
older /proc/pci interface it keeps all the complex gunk out of the kernel.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:08 ` Alan Cox
@ 2002-05-07 17:00 ` Linus Torvalds
2002-05-07 17:19 ` benh
` (2 more replies)
0 siblings, 3 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-07 17:00 UTC (permalink / raw)
To: Alan Cox
Cc: Padraig Brady, Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
On Tue, 7 May 2002, Alan Cox wrote:
>
> /proc/ide has useful information in it that you can't get easily by
> other means at the moment - which controller is driving the disks, what
> devices are present etc.
I'd love for somebody to add the devices to the real device tree, at which
point this kind of information would be very much visible..
Right now devicefs isn't even mounted by default, but it's the only
_really_ generic way of showing things like this that we have. For people
who haven't seen it before, do a
mount -t driverfs /driverfs /driverfs
and go look in there.. In particular, if you have a PCI system with a USB
device tree (or _multiple_ such trees), notice how you can look at things
like
/driverfs/root/pci0/00:1f.4/usb_bus/000/
and it wouldn't be impossible (or even necessarily very hard) to make an
IDE controller export the "IDE device tree" the same way a USB controller
now exports the "USB device tree".
For things like hotplug etc, I think driverfs is eventually the only way
to go, simply because it gives you the full (and unambiguous) path to
_any_ device, and is completely bus-agnostic.
But there is definitely a potential backwards-compatibility-issue.
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:00 ` Linus Torvalds
@ 2002-05-07 17:19 ` benh
2002-05-07 17:24 ` Linus Torvalds
` (3 more replies)
2002-05-08 7:58 ` Martin Dalecki
2002-05-09 13:18 ` Pavel Machek
2 siblings, 4 replies; 265+ messages in thread
From: benh @ 2002-05-07 17:19 UTC (permalink / raw)
To: Linus Torvalds, Alan Cox
Cc: Padraig Brady, Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>
>and it wouldn't be impossible (or even necessarily very hard) to make an
>IDE controller export the "IDE device tree" the same way a USB controller
>now exports the "USB device tree".
>
>For things like hotplug etc, I think driverfs is eventually the only way
>to go, simply because it gives you the full (and unambiguous) path to
>_any_ device, and is completely bus-agnostic.
>
>But there is definitely a potential backwards-compatibility-issue.
One interesting thing here would be to have some optional link between
the bus-oriented device tree and the function-oriented tree (ie. devfs
or simply /dev). For example, an IDE node in driverfs could eventually
hold symlinks to the entries it provides in /dev when using devfs (or
just provide major/minor when not using devfs).
What do you think ?
One problem I've been faced with on ppc is to be able to match
a linux device with what the firmware (Open Firmware) thinks that
device is. The firmware view is bus-centered and it would be pretty
easy to provide some additional entries in driverfs that give the
OF fullpath of a given device. But then, the link between the actual
driver in driverfs and the "device" as used by, for example, the
filesystem isn't trivial.
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:19 ` benh
@ 2002-05-07 17:24 ` Linus Torvalds
2002-05-07 17:30 ` benh
2002-05-07 17:43 ` Richard Gooch
2002-05-07 17:27 ` Jauder Ho
` (2 subsequent siblings)
3 siblings, 2 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-07 17:24 UTC (permalink / raw)
To: benh
Cc: Alan Cox, Padraig Brady, Anton Altaparmakov, Martin Dalecki,
Kernel Mailing List
On Tue, 7 May 2002 benh@kernel.crashing.org wrote:
>
> One interesting thing here would be to have some optional link between
> the bus-oriented device tree and the function-oriented tree (ie. devfs
> or simply /dev).
There isn't any 1:1 thing - the device/bus-oriented one should _not_ show
virtual things like partitions etc that have no relevance for a driver,
while /dev (and thus devfs) obviously think that that is the important
part, much more important than how we actually got to the device.
I think we need to have some way of getting a mapping from /dev ->
devicefs, but I don't think that has to be a filesystem thing (it might
even be as simple as just one ioctl or new system call: 'get the "path" of
this device').
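No such ioctl or system call is defined in this thread, so none is shown here. What can be sketched is the key such a lookup would start from: the device number stored in the /dev node, which user space can already read with stat(2). The function name `device_numbers` is a hypothetical helper for illustration only.

```python
import os
import stat

def device_numbers(path):
    """Return (kind, major, minor) for a device node, following symlinks."""
    st = os.stat(path)
    if stat.S_ISBLK(st.st_mode):
        kind = "block"
    elif stat.S_ISCHR(st.st_mode):
        kind = "char"
    else:
        raise ValueError(f"{path} is not a device node")
    # st_rdev holds the device number the node refers to,
    # as opposed to st_dev, the device the node itself lives on.
    return kind, os.major(st.st_rdev), os.minor(st.st_rdev)
```

The device number says nothing about where the hardware sits on a bus, which is exactly the gap the proposed "get the path of this device" call would fill.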
There aren't that many people who actually care, I suspect.
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:24 ` Linus Torvalds
@ 2002-05-07 17:30 ` benh
2002-05-10 1:45 ` Mike Fedyk
2002-05-07 17:43 ` Richard Gooch
1 sibling, 1 reply; 265+ messages in thread
From: benh @ 2002-05-07 17:30 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, Padraig Brady, Anton Altaparmakov, Martin Dalecki,
Kernel Mailing List
>
>
>On Tue, 7 May 2002 benh@kernel.crashing.org wrote:
>>
>> One interesting thing here would be to have some optional link between
>> the bus-oriented device tree and the function-oriented tree (ie. devfs
>> or simply /dev).
>
>There isn't any 1:1 thing - the device/bus-oriented one should _not_ show
>virtual things like partitions etc that have no relevance for a driver,
>while /dev (and thus devfs) obviously think that that is the important
>part, much more important than how we actually got to the device.
>
>I think we need to have some way of getting a mapping from /dev ->
>devicefs, but I don't think that has to be a filesystem thing (it might
>even be as simple as just one ioctl or new system call: 'get the "path" of
>this device').
>
>There aren't that many people who actually care, I suspect.
Sure, it's obviously not 1:1. What I had in mind was for the controller
to show what devices it exports in the sense of raw devices, but I agree
the other way makes a lot more sense. My problem was how to be devfs
agnostic, but you answered with "ioctl or syscall" and that would indeed
be ok. The ioctl approach makes it applicable to network interfaces as well,
which is good.
The need for this link from a /dev node to driverfs will, I suspect, exist
only for cases like setting up the firmware, though I can imagine one may
want to tweak some IDE settings (available via driverfs in your proposed
scheme) knowing only the /dev node.
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:30 ` benh
@ 2002-05-10 1:45 ` Mike Fedyk
0 siblings, 0 replies; 265+ messages in thread
From: Mike Fedyk @ 2002-05-10 1:45 UTC (permalink / raw)
To: benh
Cc: Linus Torvalds, Alan Cox, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
On Tue, May 07, 2002 at 07:30:34PM +0200, benh@kernel.crashing.org wrote:
> >
> >
> >On Tue, 7 May 2002 benh@kernel.crashing.org wrote:
> >>
> >> One interesting thing here would be to have some optional link between
> >> the bus-oriented device tree and the function-oriented tree (ie. devfs
> >> or simply /dev).
> >
> >There isn't any 1:1 thing - the device/bus-oriented one should _not_ show
> >virtual things like partitions etc that have no relevance for a driver,
> >while /dev (and thus devfs) obviously think that that is the important
> >part, much more important than how we actually got to the device.
> >
> >I think we need to have some way of getting a mapping from /dev ->
> >devicefs, but I don't think that has to be a filesystem thing (it might
> >even be as simple as just one ioctl or new system call: 'get the "path" of
> >this device').
> >
> >There aren't that many people who actually care, I suspect.
>
> Sure, It's obviously not 1:1, what I had in mind was for the controller
> to show what devices it exports in the sense of raw devices, but I agree
> the other way makes a lot more sense. My problem was how to be devfs
> agnostic, but you answered with "ioctl or syscall" and that would indeed
> be ok. The ioctl approach makes it applicable to network interfaces as well,
> which is good.
Yes, when will we get something that associates the physical device with
the network ethX name?
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:24 ` Linus Torvalds
2002-05-07 17:30 ` benh
@ 2002-05-07 17:43 ` Richard Gooch
2002-05-07 18:05 ` Linus Torvalds
1 sibling, 1 reply; 265+ messages in thread
From: Richard Gooch @ 2002-05-07 17:43 UTC (permalink / raw)
To: Linus Torvalds
Cc: benh, Alan Cox, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
Linus Torvalds writes:
> On Tue, 7 May 2002 benh@kernel.crashing.org wrote:
> >
> > One interesting thing here would be to have some optional link between
> > the bus-oriented device tree and the function-oriented tree (ie. devfs
> > or simply /dev).
>
> There isn't any 1:1 thing - the device/bus-oriented one should _not_
> show virtual things like partitions etc that have no relevance for a
> driver, while /dev (and thus devfs) obviously think that that is the
> important part, much more important than how we actually got to the
> device.
Actually, I've always said that I think devfs should care about both
views. And that's why I think putting the driver tree (ala driverfs)
in devfs, and making the device-oriented part of the tree be symlinks
into the bus-oriented tree, is a good idea.
> I think we need to have some way of getting a mapping from /dev ->
> devicefs, but I don't think that has to be a filesystem thing (it
> might even be as simple as just one ioctl or new system call: 'get
> the "path" of this device').
Fugly. What's wrong with readlink(2) as this "magic syscall"?
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:43 ` Richard Gooch
@ 2002-05-07 18:05 ` Linus Torvalds
2002-05-07 18:26 ` Alan Cox
0 siblings, 1 reply; 265+ messages in thread
From: Linus Torvalds @ 2002-05-07 18:05 UTC (permalink / raw)
To: Richard Gooch
Cc: benh, Alan Cox, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
On Tue, 7 May 2002, Richard Gooch wrote:
>
> Actually, I've always said that I think devfs should care about both
> views.
And I think you're completely wrong.
The fact is, they are two completely different and orthogonal things, and
they have _nothing_ in common except for a very weak linkage of actual
"physical device" (which does not always exist).
The set of people that cares about one view is almost 100% different from
the set of people that care about the other view.
> Fugly. What's wrong with readlink(2) as this "magic syscall"?
Ehh - like the fact that it doesn't work on device files?
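The behaviour referred to here is easy to observe from user space: readlink(2) on anything that is not a symbolic link fails with EINVAL. A minimal sketch follows; `readlink_error` is a made-up helper name, not an existing API.

```python
import errno
import os

def readlink_error(path):
    """Return None if readlink() succeeds, else the name of the errno it failed with."""
    try:
        os.readlink(path)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    return None
```

Calling it on a device node such as /dev/null, or on a regular file, reports the failure, while a real symlink returns None. That is why a tool like "ls" never attempts the call on a device file.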
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:05 ` Linus Torvalds
@ 2002-05-07 18:26 ` Alan Cox
2002-05-07 18:16 ` Linus Torvalds
0 siblings, 1 reply; 265+ messages in thread
From: Alan Cox @ 2002-05-07 18:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Richard Gooch, benh, Alan Cox, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
> > Fugly. What's wrong with readlink(2) as this "magic syscall"?
> Ehh - like the fact that it doesn't work on device files?
I can't find anything in POSIX/SUS that says it isn't allowed to, however 8)
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:26 ` Alan Cox
@ 2002-05-07 18:16 ` Linus Torvalds
2002-05-07 18:40 ` Richard Gooch
0 siblings, 1 reply; 265+ messages in thread
From: Linus Torvalds @ 2002-05-07 18:16 UTC (permalink / raw)
To: Alan Cox
Cc: Richard Gooch, benh, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
On Tue, 7 May 2002, Alan Cox wrote:
>
> > > Fugly. What's wrong with readlink(2) as this "magic syscall"?
> > Ehh - like the fact that it doesn't work on device files?
>
> I can't find anything in POSIX/SUS that says it isn't allowed to, however 8)
We can certainly do it, it just doesn't buy us much of anything, since
none of the standard tools (ie "ls") will actually do the readlink() for
anything but a symlink.
So at that point it's just another magic syscall, except we've overloaded
an old one.
Which may certainly be acceptable, of course.
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:16 ` Linus Torvalds
@ 2002-05-07 18:40 ` Richard Gooch
2002-05-07 18:46 ` Linus Torvalds
2002-05-08 8:21 ` Martin Dalecki
0 siblings, 2 replies; 265+ messages in thread
From: Richard Gooch @ 2002-05-07 18:40 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, benh, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
Linus Torvalds writes:
>
>
> On Tue, 7 May 2002, Alan Cox wrote:
> >
> > > > Fugly. What's wrong with readlink(2) as this "magic syscall"?
> > > Ehh - like the fact that it doesn't work on device files?
> >
> > I can't find anything in POSIX/SUS that says it isn't allowed to, however 8)
>
> We can certainly do it, it just doesn't buy us much of anything, since
> none of the standard tools (ie "ls") will actually do the readlink() for
> anything but a symlink.
>
> So at that point it's just another magic syscall, except we've overloaded
> an old one.
>
> Which may certainly be acceptable, of course.
I wasn't suggesting a magic readlink(2). I was suggesting a *real*
one. Device nodes get stored in the physical tree (what you call
driverfs), and the entries in the logical tree are symlinks. Such as:
/dev/scsi/host0 symlink to /dev/bus/pci0/slot1/function2
or something like that. Easy to implement, easy to understand, easy to
manage.
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:40 ` Richard Gooch
@ 2002-05-07 18:46 ` Linus Torvalds
2002-05-07 23:54 ` Roman Zippel
` (2 more replies)
2002-05-08 8:21 ` Martin Dalecki
1 sibling, 3 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-07 18:46 UTC (permalink / raw)
To: Richard Gooch
Cc: Alan Cox, benh, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
On Tue, 7 May 2002, Richard Gooch wrote:
> > Which may certainly be acceptable, of course.
>
> I wasn't suggesting a magic readlink(2). I was suggesting a *real*
> one. Device nodes get stored in the physical tree (what you call
> driverfs), and the entries in the logical tree are symlinks.
NO.
This is one backwards compatibility thing that I'm _not_ removing.
We have tons of existing /dev trees, and I'm not making them into
symlinks.
Also, you obviously haven't thought it through AT ALL. Hint: partitions.
If you have /dev/hda1, that _cannot_ be a symlink to the physical tree,
because on a physical level that partition DOES NOT EXIST. It's purely a
virtual mapping.
Yet clearly there _is_ a mapping from /dev/hda1 onto the physical device
in question, and clearly it _is_ a meaningful operation to operate on the
physical device underlying /dev/hda1.
So if you want to have a sane interface, you need to have a way to look up
the physical device that underlies /dev/hda1.
Yet it clearly cannot be a symlink.
QED.
So stop mixing up physical devices and /dev. They should NOT be handled by
the same mechanism.
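The partition argument can be pictured as a many-to-one lookup table. The sketch below is only an illustration: the table, the function names and the driverfs-style paths in it are hypothetical and do not come from any real system.

```python
# Hypothetical table: each logical /dev node names the physical device beneath it.
# The paths are made up for illustration, not real driverfs output.
PHYSICAL_PARENT = {
    "/dev/hda":  "/driverfs/root/pci0/00:1f.1/ide0/0",
    "/dev/hda1": "/driverfs/root/pci0/00:1f.1/ide0/0",
    "/dev/hda2": "/driverfs/root/pci0/00:1f.1/ide0/0",
    "/dev/hdc":  "/driverfs/root/pci0/00:1f.1/ide1/0",
}

def physical_device(node):
    """Look up the physical device underlying a /dev node."""
    return PHYSICAL_PARENT[node]

def nodes_on(physical):
    """Reverse view: all logical nodes that sit on one physical device."""
    return sorted(n for n, p in PHYSICAL_PARENT.items() if p == physical)
```

Here /dev/hda1 resolves to the same physical entry as /dev/hda, yet it has no entry of its own on the physical side. A symlink would have to identify the partition with the whole disk, whereas a lookup keeps the two notions separate.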
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:46 ` Linus Torvalds
@ 2002-05-07 23:54 ` Roman Zippel
2002-05-08 6:57 ` Kai Henningsen
2002-05-09 13:58 ` Pavel Machek
2 siblings, 0 replies; 265+ messages in thread
From: Roman Zippel @ 2002-05-07 23:54 UTC (permalink / raw)
To: Linus Torvalds
Cc: Richard Gooch, Alan Cox, benh, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
Hi,
On Tue, 7 May 2002, Linus Torvalds wrote:
> Also, you obviously haven't thought it through AT ALL. Hint: partitions.
>
> If you have /dev/hda1, that _cannot_ be a symlink to the physical tree,
> because on a physical level that partition DOES NOT EXIST. It's purely a
> virtual mapping.
>
> Yet clearly there _is_ a mapping from /dev/hda1 onto the physical device
> in question, and clearly it _is_ a meaningful operation to operate on the
> physical device underlying /dev/hda1.
>
> So if you want to have a sane interface, you need to have a way to look up
> the physical device that underlies /dev/hda1.
>
> Yet it clearly cannot be a symlink.
>
> QED.
Somehow I expect Al to step in with something like:
mount -t partfs /devfs/bus/... /dev/hda
:-)
bye, Roman
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:46 ` Linus Torvalds
2002-05-07 23:54 ` Roman Zippel
@ 2002-05-08 6:57 ` Kai Henningsen
2002-05-08 9:37 ` Ian Molton
2002-05-09 13:58 ` Pavel Machek
2 siblings, 1 reply; 265+ messages in thread
From: Kai Henningsen @ 2002-05-08 6:57 UTC (permalink / raw)
To: torvalds; +Cc: linux-kernel
torvalds@transmeta.com (Linus Torvalds) wrote on 07.05.02 in <Pine.LNX.4.44.0205071142001.1067-100000@home.transmeta.com>:
> If you have /dev/hda1, that _cannot_ be a symlink to the physical tree,
> because on a physical level that partition DOES NOT EXIST. It's purely a
> virtual mapping.
Well ... one *could* argue that there's justification for showing those
partitions by the exact same argument that there's reason to show devices
on a SCSI or USB bus. It's just going further down the tree.
Say something like
/driverfs/root/pci0/00:1f.4/scsi_bus/003/pc_partition/2
Sure, it's software, not hardware. OTOH, it's one of the things that
change with hotplug. (And incidentally, fdisk changing partitions *might*
be handled somewhat like a hotplug event ...)
As to linking to /dev, I see no reason why you couldn't have that tree
include information (not in the tree *structure*, obviously) of what the
relevant device numbers are. That's more expensive than a lookup with a
pointer gotten from /dev, but it's certainly possible.
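To make the cost comparison concrete, here is a sketch of the "expensive" direction: finding a device by its numbers means walking the whole tree. It assumes, purely for illustration, that each device directory carries a small text file named `dev` holding "major:minor"; neither that file name nor `find_by_devnum` is something this thread defines.

```python
import os

def find_by_devnum(root, major, minor):
    """Walk a device tree and return the directories whose `dev` file matches."""
    wanted = f"{major}:{minor}"
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        if "dev" in filenames:
            with open(os.path.join(dirpath, "dev")) as fh:
                if fh.read().strip() == wanted:
                    matches.append(dirpath)
    return matches
```

The walk visits every directory in the tree, whereas a pointer handed out from the /dev side would reach the node in one step.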
MfG Kai
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:46 ` Linus Torvalds
2002-05-07 23:54 ` Roman Zippel
2002-05-08 6:57 ` Kai Henningsen
@ 2002-05-09 13:58 ` Pavel Machek
2 siblings, 0 replies; 265+ messages in thread
From: Pavel Machek @ 2002-05-09 13:58 UTC (permalink / raw)
To: Linus Torvalds
Cc: Richard Gooch, Alan Cox, benh, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
Hi!
> Also, you obviously haven't thought it through AT ALL. Hint: partitions.
>
> If you have /dev/hda1, that _cannot_ be a symlink to the physical tree,
> because on a physical level that partition DOES NOT EXIST. It's purely a
> virtual mapping.
I don't see why partitions in devicefs are a bad idea.
Some physical chips show as multiple devices in devicefs, so I'd guess it
would be okay for partitions to be there, too.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:40 ` Richard Gooch
2002-05-07 18:46 ` Linus Torvalds
@ 2002-05-08 8:21 ` Martin Dalecki
1 sibling, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 8:21 UTC (permalink / raw)
To: Richard Gooch
Cc: Linus Torvalds, Alan Cox, benh, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
Richard Gooch wrote:
> Linus Torvalds writes:
>
>>
>>On Tue, 7 May 2002, Alan Cox wrote:
>>
>>>>>Fugly. What's wrong with readlink(2) as this "magic syscall"?
>>>>
>>>>Ehh - like the fact that it doesn't work on device files?
>>>
>>>I can't find anything in POSIX/SUS that says it isn't allowed to, however 8)
>>
>>We can certainly do it, it just doesn't buy us much of anything, since
>>none of the standard tools (ie "ls") will actually do the readlink() for
>>anything but a symlink.
>>
>>So at that point it's just another magic syscall, except we've overloaded
>>an old one.
>>
>>Which may certainly be acceptable, of course.
>
>
> I wasn't suggesting a magic readlink(2). I was suggesting a *real*
> one. Device nodes get stored in the physical tree (what you call
> driverfs), and the entries in the logical tree are symlinks. Such as:
>
> /dev/scsi/host0 symlink to /dev/bus/pci0/slot1/function2
>
> or something like that. Easy to implement, easy to understand, easy to
> manage.
Now you take the last step toward Solaris and realize why I was
always against your solution (no personal offence)
to the device management problem - they do it all in user space
by precisely the above symlink system....
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:19 ` benh
2002-05-07 17:24 ` Linus Torvalds
@ 2002-05-07 17:27 ` Jauder Ho
2002-05-08 8:13 ` Martin Dalecki
2002-05-07 18:29 ` Patrick Mochel
2002-05-08 8:07 ` Martin Dalecki
3 siblings, 1 reply; 265+ messages in thread
From: Jauder Ho @ 2002-05-07 17:27 UTC (permalink / raw)
To: benh
Cc: Linus Torvalds, Alan Cox, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
Ben, what you are proposing is fairly similar to what Solaris does today.
There is a /devices directory that contains the real path while /dev
contains the legacy stuff. Seems to work fine and given the proper docs,
you can decipher what the /devices path points to fairly easily. So I
certainly wouldn't mind seeing this happen for Linux eventually.
--Jauder
On Tue, 7 May 2002 benh@kernel.crashing.org wrote:
> > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> >
> >and it wouldn't be impossible (or even necessarily very hard) to make an
> >IDE controller export the "IDE device tree" the same way a USB controller
> >now exports the "USB device tree".
> >
> >For things like hotplug etc, I think driverfs is eventually the only way
> >to go, simply because it gives you the full (and unambiguous) path to
> >_any_ device, and is completely bus-agnostic.
> >
> >But there is definitely a potential backwards-compatibility-issue.
>
> One interesting thing here would be to have some optional link between
> the bus-oriented device tree and the function-oriented tree (ie. devfs
> or simply /dev). For example, an IDE node in driverfs could eventually
> hold symlinks to the entries it provides in /dev when using devfs (or
> just provide major/minor when not using devfs).
>
> What do you think ?
>
> One problem I've been faced with on ppc is to be able to match
> a linux device with what the firmware (Open Firmware) thinks that
> device is. The firmware view is bus-centered and it would be pretty
> easy to provide some additional entries in driverfs that give the
> OF fullpath of a given device. But then, the link between the actual
> driver in driverfs and the "device" as used by, for example, the
> filesystem isn't trivial.
>
> Ben.
>
>
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:27 ` Jauder Ho
@ 2002-05-08 8:13 ` Martin Dalecki
0 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 8:13 UTC (permalink / raw)
To: Jauder Ho
Cc: benh, Linus Torvalds, Alan Cox, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
Jauder Ho wrote:
> Ben, what you are proposing is fairly similar to what Solaris does today.
> There is a /devices directory that contains the real path while /dev
> contains the legacy stuff. Seems to work fine and given the proper docs,
> you can decipher what the /devices path points to fairly easily. So I
> certainly wouldn't mind seeing this happen for Linux eventually.
Amen. We would only have to add a device special file
to some of the /devices stuff, and /dev/ could be a symlink tree
pointing there...
I *intentionally* named the standard mount point
of the devicefs /devices when I added the description
of how to mount it to driver-model.txt. The following words
are from *me*:
This can be done permanently by adding the following entry to
/etc/fstab (provided that the mount point exists, of course):
none /devices driverfs defaults 0 0
Or by hand on the command line:
~: mount -t driverfs none /devices
>
> --Jauder
>
> On Tue, 7 May 2002 benh@kernel.crashing.org wrote:
>
>
>>> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>>>
>>>and it wouldn't be impossible (or even necessarily very hard) to make an
>>>IDE controller export the "IDE device tree" the same way a USB controller
>>>now exports the "USB device tree".
>>>
>>>For things like hotplug etc, I think driverfs is eventually the only way
>>>to go, simply because it gives you the full (and unambiguous) path to
>>>_any_ device, and is completely bus-agnostic.
>>>
>>>But there is definitely a potential backwards-compatibility-issue.
>>
>>One interesting thing here would be to have some optional link between
>>the bus-oriented device tree and the function-oriented tree (ie. devfs
>>or simply /dev). For example, an IDE node in driverfs could eventually
>>hold symlinks to the entries it provides in /dev when using devfs (or
>>just provide major/minor when not using devfs).
>>
>>What do you think ?
>>
>>One problem I've been faced with on ppc is to be able to match
>>a linux device with what the firmware (Open Firmware) thinks that
>>device is. The firmware view is bus-centered and it would be pretty
>>easy to provide some additional entries in driverfs that give the
>>OF fullpath of a given device. But then, the link between the actual
>>driver in driverfs and the "device" as used by, for example, the
>>filesystem isn't trivial.
>>
>>Ben.
--
- phone: +49 214 8656 283
- job: eVision-Ventures AG, LEV .de (MY OPINIONS ARE MY OWN!)
- langs: de_DE.ISO8859-1, en_US, pl_PL.ISO8859-2, last ressort: ru_RU.KOI8-R
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:19 ` benh
2002-05-07 17:24 ` Linus Torvalds
2002-05-07 17:27 ` Jauder Ho
@ 2002-05-07 18:29 ` Patrick Mochel
2002-05-07 18:02 ` Greg KH
` (3 more replies)
2002-05-08 8:07 ` Martin Dalecki
3 siblings, 4 replies; 265+ messages in thread
From: Patrick Mochel @ 2002-05-07 18:29 UTC (permalink / raw)
To: benh
Cc: Linus Torvalds, Alan Cox, Padraig Brady, Anton Altaparmakov,
Martin Dalecki, Kernel Mailing List
On Tue, 7 May 2002 benh@kernel.crashing.org wrote:
> > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> >
> >and it wouldn't be impossible (or even necessarily very hard) to make an
> >IDE controller export the "IDE device tree" the same way a USB controller
> >now exports the "USB device tree".
> >
> >For things like hotplug etc, I think driverfs is eventually the only way
> >to go, simply because it gives you the full (and unambiguous) path to
> >_any_ device, and is completely bus-agnostic.
> >
> >But there is definitely a potential backwards-compatibility-issue.
>
> One interesting thing here would be to have some optional link between
> the bus-oriented device tree and the function-oriented tree (ie. devfs
> or simply /dev). For example, an IDE node in driverfs could eventually
> hold symlinks to the entries it provides in /dev when using devfs (or
> just provide major/minor when not using devfs).
I agree with such a concept, but as Linus said, it should go the other
way, from the functional interface to physical interface. There are many
details involved in doing such a thing, but it should work something like
this:
The logical subsystems (ide disks, networking, etc) would register with the
device model core and get a directory in driverfs:
/driverfs/class/ide/
Devices would be discovered and get a driverfs directory representing the
physical location of the device:
/driverfs/root/pci0/07.2/
Note that no drivers have been bound to the device. When the driver is
bound, it registers the device with the subsystem, passing in a
subsystem-specific structure. These can be made to point in some way to
the generic struct device of the device (from which the physical path can
be inferred).
When this happens, the subsystem creates a directory underneath its
driverfs directory, so you get:
/driverfs/class/ide/0/
And, a symlink is created to point to the directory in the physical path.
As the driver discovers partitions on the device, it can create special
nodes in its class directory.
At this point, userspace can be notified (via /sbin/hotplug). That can
create symlinks in /dev to the nodes that were just created, emulating
current /dev behavior.
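The registration flow described above can be sketched in userspace. This is only an illustration under stated assumptions: it builds the layout in a temporary directory rather than on a real driverfs mount, the helper names (register_class, discover_device, bind_device) are hypothetical, and the symlink name "device" is an assumption, not something driverfs is known to define.

```python
import os
import tempfile


def register_class(root, name):
    """A logical subsystem registers and gets class/<name>/ (hypothetical helper)."""
    path = os.path.join(root, "class", name)
    os.makedirs(path, exist_ok=True)
    return path


def discover_device(root, *physical):
    """A discovered device gets a directory for its physical location."""
    path = os.path.join(root, "root", *physical)
    os.makedirs(path, exist_ok=True)
    return path


def bind_device(class_dir, index, physical_dir):
    """On driver bind, create class/<name>/<index>/ plus a link to the physical path.

    The link name "device" is an assumption made for this sketch.
    """
    dev_dir = os.path.join(class_dir, str(index))
    os.makedirs(dev_dir, exist_ok=True)
    os.symlink(physical_dir, os.path.join(dev_dir, "device"))
    return dev_dir


def build_example(root):
    """Recreate the example paths from the text under `root`."""
    ide = register_class(root, "ide")
    phys = discover_device(root, "pci0", "07.2")
    return bind_device(ide, 0, phys), phys


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        dev_dir, phys = build_example(root)
        real_root = os.path.realpath(root)
        link_target = os.path.realpath(os.path.join(dev_dir, "device"))
        print(os.path.relpath(dev_dir, root))
        print(os.path.relpath(link_target, real_root))
```

The point of the sketch is the direction of the link: the functional (class) entry points at the physical entry, not the other way around.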
So, what does this do? To an extent, it reengineers the functionality of
devfs. I'll be the first to admit it. However, it centers less around the
filesystem, and more on the device model core.
Most devices already register with their subsystems, so having the
subsystems pass device info on to the core is relatively easy.
As partitions are discovered, you get paths like:
/driverfs/class/ide/0/2
Which gives you a default name for the device. With /sbin/hotplug, simple
userspace policy, and symlinks in /dev, you can emulate the current device
hierarchy. So, you get a device naming solution that gives you only the
device names for the devices you have.
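As a rough illustration of the "simple userspace policy" mentioned above, a naming rule could translate a class path into a legacy-style name. The mapping below (disk index 0 becomes "hda", and the trailing component is used as the partition number) is an assumption made for this sketch; nothing in the proposal fixes it, and the function name legacy_name is hypothetical.

```python
import string


def legacy_name(class_path):
    """Map '.../class/ide/<disk>[/<partition>]' to a legacy-style name like 'hda2'.

    Assumed policy: disk index N maps to the N-th letter, and the optional
    last component is appended as the partition number.
    """
    parts = [p for p in class_path.split("/") if p]
    # Accept paths with or without a leading mount point such as 'driverfs'.
    idx = parts.index("class")
    subsystem, rest = parts[idx + 1], parts[idx + 2:]
    if subsystem != "ide" or not rest:
        raise ValueError("unsupported class path: " + class_path)
    name = "hd" + string.ascii_lowercase[int(rest[0])]
    if len(rest) > 1:
        name += str(int(rest[1]))
    return name


if __name__ == "__main__":
    for path in ("/driverfs/class/ide/0/", "/driverfs/class/ide/0/2"):
        print(path, "->", "/dev/" + legacy_name(path))
```

A different distribution could swap in a different function without touching the kernel, which is the property being argued for.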
This approach also de-emphasizes the dependency on major and minor
numbers. If device nodes are created in kernel space initially, userspace
doesn't need to know what the major/minor is for a particular device. The
symlink to the device node is all that's needed to operate on the device.
Without the need to coordinate between kernel and userspace, at least some
majors/minors can be dynamically allocated as the subsystems and devices
are registered with the core. (These can then be exported via files in
driverfs). (This is similar to the dynamic allocation of minor numbers in
the USB subsystem that showed up recently...)
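A minimal sketch of what dynamic allocation could look like, assuming a subsystem that owns one major and hands out the lowest free minor as devices register. The class name MinorAllocator, the major number used in the example, and the "major:minor" export format are all assumptions for illustration; the limit of 256 reflects the 8-bit minor space of the time.

```python
class MinorAllocator:
    """Hands out the lowest free minor number under a single major (sketch only)."""

    def __init__(self, major, limit=256):
        self.major = major
        self.limit = limit
        self.in_use = {}  # minor -> device name

    def register(self, name):
        """Allocate a (major, minor) pair for a newly registered device."""
        for minor in range(self.limit):
            if minor not in self.in_use:
                self.in_use[minor] = name
                return (self.major, minor)
        raise RuntimeError("no free minor numbers")

    def unregister(self, minor):
        """Release a minor so it can be reused by a later device."""
        self.in_use.pop(minor, None)

    def export(self):
        """What a per-device file could expose: name -> 'major:minor'."""
        return {name: "%d:%d" % (self.major, minor)
                for minor, name in sorted(self.in_use.items())}


if __name__ == "__main__":
    alloc = MinorAllocator(major=180)  # example major, chosen arbitrarily
    alloc.register("printer0")
    alloc.register("scanner0")
    alloc.unregister(0)
    alloc.register("camera0")
    print(alloc.export())
```

Because userspace reads the numbers back from exported files, it never needs a static table of majors and minors.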
Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.
Thoughts? Comments? Flames?
-pat
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:29 ` Patrick Mochel
@ 2002-05-07 18:02 ` Greg KH
2002-05-07 18:44 ` Richard Gooch
` (2 subsequent siblings)
3 siblings, 0 replies; 265+ messages in thread
From: Greg KH @ 2002-05-07 18:02 UTC (permalink / raw)
To: Kernel Mailing List
On Tue, May 07, 2002 at 11:29:10AM -0700, Patrick Mochel wrote:
>
> Which gives you a default name for the device. With /sbin/hotplug, simple
> userspace policy, and symlinks in /dev, you can emulate the current device
> hierarchy. So, you get a device naming solution that gives you only the
> device names for the devices you have.
>
> This approach also de-emphasizes the dependency on major and minor
> numbers. If device nodes are created in kernel space initially, userspace
> doesn't need to know what the major/minor is for a particular device. The
> symlink to the device node is all that's needed to operate on the device.
>
> Without the need to coordinate between kernel and userspace, at least some
> majors/minors can be dynamically allocated as the subsystems and devices
> are registered with the core. (These can then be exported via files in
> driverfs). (This is similar to the dynamic allocation of minor numbers in
> the USB subsystem that showed up recently...)
And is exactly why this showed up in the USB subsystem :)
> Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.
And it removes the dependency on devfsd and its interface, replacing it
with the existing /sbin/hotplug interface. This allows different people
to implement different naming schemes if they so desire, moving naming
policy out of the kernel into userspace, where it belongs.
Yes, there will probably be a "default" naming scheme, matching what we
have today, but the ability to replace it with another one is _so_ much
easier than having to try to tie into devfsd (like the devreg
implementation does: http://www-124.ibm.com/devreg/ )
greg k-h
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:29 ` Patrick Mochel
2002-05-07 18:02 ` Greg KH
@ 2002-05-07 18:44 ` Richard Gooch
2002-05-07 18:44 ` Patrick Mochel
2002-05-07 18:49 ` Thunder from the hill
2002-05-08 8:18 ` Martin Dalecki
3 siblings, 1 reply; 265+ messages in thread
From: Richard Gooch @ 2002-05-07 18:44 UTC (permalink / raw)
To: Patrick Mochel
Cc: benh, Linus Torvalds, Alan Cox, Padraig Brady,
Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
Patrick Mochel writes:
> Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.
The size argument is not an issue. I've already said that devfs will
shrink a lot once I move tree management from my own code to the VFS.
At that point devfs will mostly be:
- an API
- a way of supporting the devfsd protocol.
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:44 ` Richard Gooch
@ 2002-05-07 18:44 ` Patrick Mochel
2002-05-07 19:21 ` Richard Gooch
0 siblings, 1 reply; 265+ messages in thread
From: Patrick Mochel @ 2002-05-07 18:44 UTC (permalink / raw)
To: Richard Gooch
Cc: benh, Linus Torvalds, Alan Cox, Padraig Brady,
Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
On Tue, 7 May 2002, Richard Gooch wrote:
> Patrick Mochel writes:
> > Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.
>
> The size argument is not an issue. I've already said that devfs will
> shrink a lot once I move tree management from my own code to the VFS.
I agree 100%. However, I think that move will be very painful. I tried to
do it a couple of months ago, and there were so many interdependencies and
oddities that I gave up after about 6 hours.
> At that point devfs will mostly be:
> - an API
> - a way of supporting the devfsd protocol.
I argue that you shouldn't need a separate daemon. We already have the
/sbin/hotplug interface. It's simple and sweet. We shouldn't need to rely
on an entirely separate daemon.
-pat
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:44 ` Patrick Mochel
@ 2002-05-07 19:21 ` Richard Gooch
2002-05-07 19:58 ` Patrick Mochel
0 siblings, 1 reply; 265+ messages in thread
From: Richard Gooch @ 2002-05-07 19:21 UTC (permalink / raw)
To: Patrick Mochel
Cc: benh, Linus Torvalds, Alan Cox, Padraig Brady,
Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
Patrick Mochel writes:
>
> On Tue, 7 May 2002, Richard Gooch wrote:
>
> > Patrick Mochel writes:
> > > Oh, and it's with a modern, clean filesystem, 1/5 the size of devfs.
> >
> > The size argument is not an issue. I've already said that devfs will
> > shrink a lot once I move tree management from my own code to the VFS.
>
> I agree 100%. However, I think that move will be very painful. I
> tried to do it a couple of months ago, and there were so many
> interdependencies and oddities that I gave up after about 6 hours.
Oh, it's certainly more than 6 hours of work. But it *will* get done.
> > At that point devfs will mostly be:
> > - an API
> > - a way of supporting the devfsd protocol.
>
> I argue that you shouldn't need a separate daemon. We already have
> the /sbin/hotplug interface. It's simple and sweet. We shouldn't
> need to rely on an entirely separate daemon.
The devfsd protocol is more lightweight. Plus it doesn't require
fork(2)+execve(2) overheads. And more importantly, you can capture
lookup() events.
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 19:21 ` Richard Gooch
@ 2002-05-07 19:58 ` Patrick Mochel
0 siblings, 0 replies; 265+ messages in thread
From: Patrick Mochel @ 2002-05-07 19:58 UTC (permalink / raw)
To: Richard Gooch; +Cc: Kernel Mailing List
> Oh, it's certainly more than 6 hours of work. But it *will* get done.
Even the mtrr driver was a good 8 hours to clean up, make readable and
more object-oriented. I wish you luck, as well as anyone that has to
attempt to decipher it.
> > > At that point devfs will mostly be:
> > > - an API
> > > - a way of supporting the devfsd protocol.
> >
> > I argue that you shouldn't need a separate daemon. We already have
> > the /sbin/hotplug interface. It's simple and sweet. We shouldn't
> > need to rely on an entirely separate daemon.
>
> The devfsd protocol is more lightweight. Plus it doesn't require
> fork(2)+execve(2) overheads. And more importantly, you can capture
> lookup() events.
These events are not performance critical, so the overhead is less
important. Besides, almost all systems have /sbin/hotplug, since it can be
anything - a shell script, a perl script, a tiny C executable.
The hotplug interface doesn't rely on any particular implementation. It
only relies on something on the other side implementing a particular
interface. The implementation can be replaced, as well as the format of
the policy, based on the constraints of the system or the whims of
the distro.
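To make the "anything that implements the interface" point concrete, here is a sketch of a replaceable handler. It assumes the event arrives as a subsystem name in argv[1] plus an ACTION variable in the environment; the DEVPATH variable and the policy table are hypothetical and used only for illustration.

```python
import os
import sys


def handle_event(argv, environ, policy):
    """Dispatch one hotplug-style event to a policy entry, if there is one.

    argv[1] is taken as the subsystem name and ACTION as the event type.
    Unknown events are ignored (None is returned).
    """
    subsystem = argv[1] if len(argv) > 1 else ""
    action = environ.get("ACTION", "")
    handler = policy.get((subsystem, action))
    if handler is None:
        return None
    return handler(environ)


def example_policy():
    """A toy policy table; DEVPATH is a hypothetical variable for this sketch."""
    return {
        ("ide", "add"): lambda env: "link " + env.get("DEVPATH", "?"),
        ("ide", "remove"): lambda env: "unlink " + env.get("DEVPATH", "?"),
    }


if __name__ == "__main__":
    result = handle_event(sys.argv, os.environ, example_policy())
    if result is not None:
        print(result)
```

Nothing stays resident between events: each invocation reads its arguments, acts, and exits, so there is no long-running process whose death would lose events.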
It also doesn't rely on a process running to capture events. What happens
if the devfsd process is killed?
-pat
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:29 ` Patrick Mochel
2002-05-07 18:02 ` Greg KH
2002-05-07 18:44 ` Richard Gooch
@ 2002-05-07 18:49 ` Thunder from the hill
2002-05-07 19:47 ` Patrick Mochel
2002-05-08 8:18 ` Martin Dalecki
3 siblings, 1 reply; 265+ messages in thread
From: Thunder from the hill @ 2002-05-07 18:49 UTC (permalink / raw)
To: Patrick Mochel
Cc: benh, Linus Torvalds, Alan Cox, Padraig Brady,
Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
Hi,
> > > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> /driverfs/class/ide/
> /driverfs/root/pci0/07.2/
> /driverfs/class/ide/0/
> /driverfs/class/ide/0/2
Why not fix devfs for that? My root directory is messed up enough. We
have dev, proc, tmp, ...
We might have /dev/driver or such, which doesn't make the root directory
any fuller. (It also avoids disturbing the newbies any further. It's quite
hard to explain to a Windows user why he can't remove /proc and /dev, and
what they are supposed to be.)
This is just my opinion...
Regards,
Thunder
--
if (errno == ENOTAVAIL)
fprintf(stderr, "Error: Talking to Microsoft server!\n");
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:49 ` Thunder from the hill
@ 2002-05-07 19:47 ` Patrick Mochel
2002-05-07 22:03 ` Richard Gooch
0 siblings, 1 reply; 265+ messages in thread
From: Patrick Mochel @ 2002-05-07 19:47 UTC (permalink / raw)
To: Thunder from the hill; +Cc: Kernel Mailing List
On Tue, 7 May 2002, Thunder from the hill wrote:
> Hi,
>
> > > > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> > /driverfs/class/ide/
> > /driverfs/root/pci0/07.2/
> > /driverfs/class/ide/0/
> > /driverfs/class/ide/0/2
>
> Why not fix devfs for that? My root directory is messed up enough. We
> have dev, proc, tmp, ...
For one, I am of the camp that believes devfs is unfixable.
For two, where driverfs is mounted is irrelevant: /driverfs, /sys,
/proc/bus are all valid places.
Besides, who cares what's in /? You have /home, which is all that really
matters, no?
> We might have /dev/driver or such, which doesn't make the root directory
> any fuller. (It also avoids disturbing the newbies any further. It's quite
> hard to explain to a Windows user why he can't remove /proc and /dev, and
> what they are supposed to be.)
So don't give them root access. Or, explain to them that they're magic,
like the pagefile.sys file. :)
-pat
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 19:47 ` Patrick Mochel
@ 2002-05-07 22:03 ` Richard Gooch
2002-05-08 8:14 ` Russell King
0 siblings, 1 reply; 265+ messages in thread
From: Richard Gooch @ 2002-05-07 22:03 UTC (permalink / raw)
To: Patrick Mochel; +Cc: Thunder from the hill, Kernel Mailing List
Patrick Mochel writes:
>
> On Tue, 7 May 2002, Thunder from the hill wrote:
>
> > Hi,
> >
> > > > > /driverfs/root/pci0/00:1f.4/usb_bus/000/
> > > /driverfs/class/ide/
> > > /driverfs/root/pci0/07.2/
> > > /driverfs/class/ide/0/
> > > /driverfs/class/ide/0/2
> >
> > Why not fix devfs for that? My root directory is messed up enough. We
> > have dev, proc, tmp, ...
>
> For one, I am of the camp that believes devfs is unfixable.
But it's not actually broken, now that the locking is fixed.
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 22:03 ` Richard Gooch
@ 2002-05-08 8:14 ` Russell King
2002-05-08 16:07 ` Richard Gooch
0 siblings, 1 reply; 265+ messages in thread
From: Russell King @ 2002-05-08 8:14 UTC (permalink / raw)
To: Richard Gooch; +Cc: Patrick Mochel, Thunder from the hill, Kernel Mailing List
On Tue, May 07, 2002 at 04:03:50PM -0600, Richard Gooch wrote:
> But it's not actually broken, now that the locking is fixed.
Really? What about the case of the missing BKL for device opens that
you haven't really commented on?
Seems like devfs _still_ has locking problems.
--
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 8:14 ` Russell King
@ 2002-05-08 16:07 ` Richard Gooch
2002-05-08 17:07 ` Russell King
0 siblings, 1 reply; 265+ messages in thread
From: Richard Gooch @ 2002-05-08 16:07 UTC (permalink / raw)
To: Russell King; +Cc: Patrick Mochel, Thunder from the hill, Kernel Mailing List
Russell King writes:
> On Tue, May 07, 2002 at 04:03:50PM -0600, Richard Gooch wrote:
> > But it's not actually broken, now that the locking is fixed.
>
> Really? What about the case of the missing BKL for device opens that
> you haven't really commented on?
I did comment to you, privately, saying I was waiting to see what the
consensus was on the issue of whether to move the BKL or not. I'll be
sending a patch later this week to fix it.
> Seems like devfs _still_ has locking problems.
A pretty minor one, given the comment I was responding to: "devfs is
unfixable". I've noticed that even Al has gone quiet on the "devfs
races" issue, now that the new code is in place :-)
Regards,
Richard....
Permanent: rgooch@atnf.csiro.au
Current: rgooch@ras.ucalgary.ca
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 16:07 ` Richard Gooch
@ 2002-05-08 17:07 ` Russell King
0 siblings, 0 replies; 265+ messages in thread
From: Russell King @ 2002-05-08 17:07 UTC (permalink / raw)
To: Richard Gooch; +Cc: Patrick Mochel, Thunder from the hill, Kernel Mailing List
On Wed, May 08, 2002 at 10:07:44AM -0600, Richard Gooch wrote:
> Russell King writes:
> > Really? What about the case of the missing BKL for device opens that
> > you haven't really commented on?
>
> I did comment to you, privately, saying I was waiting to see what the
> consensus was on the issue of whether to move the BKL or not. I'll be
> sending a patch later this week to fix it.
Yes, and hey, we still have the problem a week later, even after the
discussion went dead.
> > Seems like devfs _still_ has locking problems.
>
> A pretty minor one, given the comment I was responding to: "devfs is
> unfixable". I've noticed that even Al has gone quiet on the "devfs
> races" issue, now that the new code is in place :-)
Nevertheless, your comment about "no locking problems" is inaccurate.
devfs is calling at least one part of the kernel without obeying the
existing locking rules. That's definitely a devfs bug.
--
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 18:29 ` Patrick Mochel
` (2 preceding siblings ...)
2002-05-07 18:49 ` Thunder from the hill
@ 2002-05-08 8:18 ` Martin Dalecki
3 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 8:18 UTC (permalink / raw)
To: Patrick Mochel
Cc: benh, Linus Torvalds, Alan Cox, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
Patrick Mochel wrote:
> On Tue, 7 May 2002 benh@kernel.crashing.org wrote:
>
>
>>> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>>>
>>>and it wouldn't be impossible (or even necessarily very hard) to make an
>>>IDE controller export the "IDE device tree" the same way a USB controller
>>>now exports the "USB device tree".
>>>
>>>For things like hotplug etc, I think driverfs is eventually the only way
>>>to go, simply because it gives you the full (and unambiguous) path to
>>>_any_ device, and is completely bus-agnostic.
>>>
>>>But there is definitely a potential backwards-compatibility-issue.
>>
>>One interesting thing here would be to have some optional link between
>>the bus-oriented device tree and the function-oriented tree (ie. devfs
>>or simply /dev). For example, an IDE node in driverfs could eventually
>>hold symlinks to the entries it provides in /dev when using devfs (or
>>just provide major/minor when not using devfs).
>
>
> I agree with such a concept, but as Linus said, it should go the other
> way, from the functional interface to physical interface. There are many
> details involved in doing such a thing, but it should work something like
> this:
>
> The logical subsystems (ide disks, networking, etc) would register with the
> device model core and get a directory in driverfs:
>
> /driverfs/class/ide/
>
> Devices would be discovered and get a driverfs directory representing the
> physical location of the device:
>
> /driverfs/root/pci0/07.2/
>
> Note that no drivers have been bound to the device. When the driver is
> bound, it registers the device with the subsystem, passing in a
> subsystem-specific structure. These can be made to point in some way to
> the generic struct device of the device (from which the physical path can
> be inferred).
>
> When this happens, the subsystem creates a directory underneath its
> driverfs directory, so you get:
>
> /driverfs/class/ide/0/
>
> And, a symlink is created to point to the directory in the physical path.
> As the driver discovers partitions on the device, it can create special
> nodes in its class directory.
>
> At this point, userspace can be notified (via /sbin/hotplug). That can
> create symlinks in /dev to the nodes that were just created, emulating
> current /dev behavior.
>
> So, what does this do? To an extent, it reengineers the functionality of
> devfs. I'll be the first to admit it. However, it centers less around the
> filesystem, and more on the device model core.
>
> Most devices already register with their subsystems, so having the
> subsystems pass device info on to the core is relatively easy.
>
> As partitions are discovered, you get paths like:
>
> /driverfs/class/ide/0/2
Just a side note...
Please, please name it /devices/. Some old boys like me (age 30)
can benefit from similarities with some quite common "legacy" systems.
We don't have to "invent" for the sake of it.
And /devices/ is what I named it in the corresponding
documentation.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:19 ` benh
` (2 preceding siblings ...)
2002-05-07 18:29 ` Patrick Mochel
@ 2002-05-08 8:07 ` Martin Dalecki
3 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 8:07 UTC (permalink / raw)
To: benh
Cc: Linus Torvalds, Alan Cox, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
benh@kernel.crashing.org wrote:
>> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>>
>>and it wouldn't be impossible (or even necessarily very hard) to make an
>>IDE controller export the "IDE device tree" the same way a USB controller
>>now exports the "USB device tree".
>>
>>For things like hotplug etc, I think driverfs is eventually the only way
>>to go, simply because it gives you the full (and unambiguous) path to
>>_any_ device, and is completely bus-agnostic.
>>
>>But there is definitely a potential backwards-compatibility-issue.
>
>
> One interesting thing here would be to have some optional link between
> the bus-oriented device tree and the function-oriented tree (ie. devfs
> or simply /dev). For example, an IDE node in driverfs could eventually
> hold symlinks to the entries it provides in /dev when using devfs (or
> just provide major/minor when not using devfs).
>
> What do you think ?
>
> One problem I've been faced with on ppc is to be able to match
> a linux device with what the firmware (Open Firmware) thinks that
> device is. The firmware view is bus-centered and it would be pretty
> easy to provide some additional entries in driverfs that give the
> OF fullpath of a given device. But then, the link between the actual
> driver in driverfs and the "device" as used by, for example, the
> filesystem isn't trivial.
>
> Ben.
>
>
>
This is the "first" IDE controller on my notebook:
./devices/root/pci0/00:07.1/01f0
./devices/root/pci0/00:07.1/01f0/0
./devices/root/pci0/00:07.1/01f0/0/power
./devices/root/pci0/00:07.1/01f0/0/name
./devices/root/pci0/00:07.1/01f0/0/status
./devices/root/pci0/00:07.1/01f0/power
./devices/root/pci0/00:07.1/01f0/name
./devices/root/pci0/00:07.1/01f0/status
Guys I have done it already!
For your convenience I will attach the ata prefix to the
currently used port number in the next patch round.
OK?
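For readers who want to see the shape of that listing, the paths can be folded into a nested tree. This is only a sketch: build_tree is a hypothetical helper, and the interpretation of "01f0" as the port and "0" as a drive on it is inferred from the listing, not stated by any documentation.

```python
def build_tree(paths):
    """Fold a flat list of paths into nested dicts, one level per path component."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            if part in ("", "."):
                continue  # skip the leading './' and any empty components
            node = node.setdefault(part, {})
    return tree


LISTING = [
    "./devices/root/pci0/00:07.1/01f0",
    "./devices/root/pci0/00:07.1/01f0/0",
    "./devices/root/pci0/00:07.1/01f0/0/power",
    "./devices/root/pci0/00:07.1/01f0/0/name",
    "./devices/root/pci0/00:07.1/01f0/0/status",
    "./devices/root/pci0/00:07.1/01f0/power",
    "./devices/root/pci0/00:07.1/01f0/name",
    "./devices/root/pci0/00:07.1/01f0/status",
]

if __name__ == "__main__":
    port = build_tree(LISTING)["devices"]["root"]["pci0"]["00:07.1"]["01f0"]
    print("port entries: ", sorted(port))
    print("drive entries:", sorted(port["0"]))
```

Both the port and the drive beneath it carry the same three attribute files (power, name, status), which is what makes the tree uniform to walk.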
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:00 ` Linus Torvalds
2002-05-07 17:19 ` benh
@ 2002-05-08 7:58 ` Martin Dalecki
2002-05-08 12:18 ` Alan Cox
` (2 more replies)
2002-05-09 13:18 ` Pavel Machek
2 siblings, 3 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 7:58 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, Padraig Brady, Anton Altaparmakov, Kernel Mailing List
Linus Torvalds wrote:
>
> On Tue, 7 May 2002, Alan Cox wrote:
>
>>/proc/ide has useful information in it that you can't get easily by
>>other means at the moment - which controller is driving the disks, what
>>devices are present etc.
>
>
> I'd love for somebody to add the devices to the real device tree, at which
> point this kind of information would be very much visible..
>
> Right now devicefs isn't even mounted by default, but it's the only
> _really_ generic way of showing things like this that we have. For people
> who haven't seen it before, do a
>
> mount -t driverfs /devfs /devfs
>
> and go look in there.. In particular, if you have a PCI system with a USB
> device tree (or _multiple_ such trees), notice how you can look at things
> like
>
> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>
> and it wouldn't be impossible (or even necessarily very hard) to make an
> IDE controller export the "IDE device tree" the same way a USB controller
> now exports the "USB device tree".
>
> For things like hotplug etc, I think driverfs is eventually the only way
> to go, simply because it gives you the full (and unambiguous) path to
> _any_ device, and is completely bus-agnostic.
>
> But there is definitely a potential backwards-compatibility-issue.
Linus - there are no backward compatibility issues here.
Not a single application on my system messes with /proc/ide.
They showed you a list of programs which use /proc and not a list
of programs which use anything out of /proc/ide...
RedHat even disables all this chipset-specific reporting in their
public kernels. OK, kudzu is using it, but it does not *rely on it*.
Heck, kudzu ran every time I rebooted my system during
development and nothing ugly happened.
As for fdisk on my notebook, well, it runs just fine:
[root@kozaczek root]# fdisk /dev/hda
The number of cylinders for this disk is set to 2584.
There is nothing wrong with that, but this is larger than 1024,
and could in certain setups cause problems with:
1) software that runs at boot time (e.g., old versions of LILO)
2) booting and partitioning software from other OSs
(e.g., DOS FDISK, OS/2 FDISK)
Command (m for help): p
Disk /dev/hda: 240 heads, 63 sectors, 2584 cylinders
Units = cylinders of 15120 * 512 bytes
   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1         7     52888+  83  Linux
/dev/hda2             8      2556  19270440    5  Extended
/dev/hda4          2557      2584    211680   a0  IBM Thinkpad hibernation
/dev/hda5             8      2166  16322008+  83  Linux
/dev/hda6          2167      2219    400648+  82  Linux swap
Neither the programmers who wrote fdisk or cdrecord, nor anyone else,
were stupid enough to use anything from there, since using a
simple ioctl is easier anyway. I *did* check them.
(Admittedly I don't care about kudzu, but fdisk and friends I was
fully aware of.)
BTW. If one needs the size of the disk well we could
attach it as a file size to the device file in /dev IMHO. Why not?
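The size in question follows directly from the geometry fdisk printed above. A small worked example, assuming 512-byte sectors as in the fdisk output; the function names are hypothetical. Partitions that span whole cylinders (hda2, hda4) reproduce the Blocks column exactly, while hda1 does not, because it does not start on a cylinder boundary.

```python
SECTOR_BYTES = 512


def chs_capacity(heads, sectors, cylinders, sector_bytes=SECTOR_BYTES):
    """Return (sectors per cylinder, total bytes) for a CHS geometry."""
    sectors_per_cylinder = heads * sectors
    total_sectors = sectors_per_cylinder * cylinders
    return sectors_per_cylinder, total_sectors * sector_bytes


def partition_blocks(start_cyl, end_cyl, heads, sectors, sector_bytes=SECTOR_BYTES):
    """Size in 1 KiB blocks of a partition covering whole cylinders, inclusive."""
    cylinders = end_cyl - start_cyl + 1
    return cylinders * heads * sectors * sector_bytes // 1024


if __name__ == "__main__":
    per_cyl, total = chs_capacity(heads=240, sectors=63, cylinders=2584)
    print("sectors per cylinder:", per_cyl)
    print("disk size in bytes:  ", total)
    print("hda4 blocks:         ", partition_blocks(2557, 2584, 240, 63))
```

The sectors-per-cylinder figure matches the "Units = cylinders of 15120 * 512 bytes" line in the fdisk output.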
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 7:58 ` Martin Dalecki
@ 2002-05-08 12:18 ` Alan Cox
2002-05-08 11:09 ` Martin Dalecki
2002-05-08 18:21 ` Erik Andersen
2002-05-09 13:13 ` Pavel Machek
2002-05-10 12:01 ` Padraig Brady
2 siblings, 2 replies; 265+ messages in thread
From: Alan Cox @ 2002-05-08 12:18 UTC (permalink / raw)
To: Martin Dalecki
Cc: Linus Torvalds, Alan Cox, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
> RedHat even disables all this chipset-specific reporting in their
> public kernels. OK kudzu is using it, but it does not *rely on it*.
The boot kernel has a lot of it disabled not the main ones.
> Heck kudzu is running all the time I rebooted my system during
> development and nothing ugly did happen.
I can't speak directly for the Kudzu maintainer but I can say that having
a sane way to obtain the list of ide devices (all of them not just non
pcmcia) and the device bindings/type has been a long standing request.
If 2.6 breaks a 2.4 installer and nothing else I don't think it's a big
disaster and the cleanup may well be justified.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 12:18 ` Alan Cox
@ 2002-05-08 11:09 ` Martin Dalecki
2002-05-08 12:42 ` Alan Cox
2002-05-08 18:21 ` Erik Andersen
1 sibling, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 11:09 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, Padraig Brady, Anton Altaparmakov, Kernel Mailing List
Alan Cox wrote:
>>RedHat even disables all this chipset-specific reporting in their
>>public kernels. OK kudzu is using it, but it does not *rely on it*.
>
>
> The boot kernel has a lot of it disabled not the main ones.
>
>
>>Heck kudzu is running all the time I rebooted my system during
>>development and nothing ugly did happen.
>
>
> I can't speak directly for the Kudzu maintainer but I can say that having
> a sane way to obtain the list of ide devices (all of them not just non
> pcmcia) and the device bindings/type has been a long standing request.
>
> If 2.6 breaks a 2.4 installer and nothing else I don't think it's a big
> disaster and the cleanup may well be justified.
Well, personally I would just love it if there were a "go ahead and don't
care about compatibility" for the following:
Make hdX go away and use the SCSI device major/minor number stuff instead.
And then just making the ATA driver look as if it were some
incapable SCSI device would actually remove tons of code from kudzu and
friends without the need for any adjustment there.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 11:09 ` Martin Dalecki
@ 2002-05-08 12:42 ` Alan Cox
2002-05-08 11:23 ` Martin Dalecki
2002-05-09 2:37 ` Lincoln Dale
0 siblings, 2 replies; 265+ messages in thread
From: Alan Cox @ 2002-05-08 12:42 UTC (permalink / raw)
To: Martin Dalecki
Cc: Alan Cox, Linus Torvalds, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
> Make hdX go away and use the SCSI device major/minor number stuff instead.
> And then just making the ATA driver look as if it were some
> incapable SCSI device would actually remove tons of code from kudzu and
> friends without the need for any adjustment there.
The SCSI layer is significant overhead even in 2.5. Right now for example
it appears to be the primary bottleneck for the aacraid drivers. ATA6 is
also more capable than SCSI in several areas regardless of the notional
market positioning.
Linus talked about having a /dev/disc/... which once you have 32bit dev_t
makes complete sense. What you don't do however is throw IDE through the
SCSI midlayer, you merely make the /dev/disc/ point call into the right
drivers - be they raid, scsi or ide. That also lets the scsi emulation
crap get ripped out of the megaraid and aacraid drivers which will up
performance.
Alan
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 12:42 ` Alan Cox
@ 2002-05-08 11:23 ` Martin Dalecki
2002-05-09 2:37 ` Lincoln Dale
1 sibling, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 11:23 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, Padraig Brady, Anton Altaparmakov, Kernel Mailing List
Alan Cox wrote:
>>Make hdX gone and use the scsi device major/minor number stuff instead.
>>And then just making the ATA driver looking like if it where some
>>incapable SCSI would actually reduce tons of code from kudzu and
>>friends without the need for any adjustment there.
>
>
> The SCSI layer is significant overhead even in 2.5. Right now for example
> it appears to be the primary bottleneck for the aacraid drivers. ATA6 is
> also more capable than SCSI in several areas regardless of the notional
> market positioning.
>
> Linus talked about having a /dev/disc/... which once you have 32bit dev_t
> makes complete sense. What you don't do however is throw IDE through the
> SCSI midlayer, you merely make the /dev/disc/ point call into the right
> drivers - be they raid, scsi or ide. That also lets the scsi emulation
> crap get ripped out of the megaraid and aacraid drivers which will up
> performance.
>
> Alan
Alan... you have taken me wrong. What I mean is just the following.
Take away some minors from use by SCSI (or more probably from a common repository)
and use the same ioctl numbers where possible. Perhaps implement
some ioctl here and there... nothing more!
Not the whole: "we are just another SCSI device on the driver level".
That would not make sense indeed, especially since the SCSI mid-layer isn't
that pretty either...
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 12:42 ` Alan Cox
2002-05-08 11:23 ` Martin Dalecki
@ 2002-05-09 2:37 ` Lincoln Dale
2002-05-09 3:10 ` Andrew Morton
` (2 more replies)
1 sibling, 3 replies; 265+ messages in thread
From: Lincoln Dale @ 2002-05-09 2:37 UTC (permalink / raw)
To: Alan Cox
Cc: Martin Dalecki, Alan Cox, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
At 01:42 PM 8/05/2002 +0100, Alan Cox wrote:
>The SCSI layer is significant overhead even in 2.5.
i did some benchmarking on a high-end dual P3 Xeon (Serverworks chipset )
with QLogic 2300 (2gbit/s) 64/66 Fibre Channel controllers.
using the '/dev/sgX' interface to issue scsi reads/writes allowed me to hit
the magical limit of 200mbyte/sec throughput. (basically just about
linerate). (simultaneous "sg_read if=/dev/sgX mmap=1 bs=512 count=35M";
sg_read from the sg-tools package)
doing the same test thru the block-layer was basically capped at around
135mbyte/sec. (simultaneous "dd if=/dev/sdX of=/dev/null bs=512 count=35M").
whether the bottleneck was copy-from-kernel-to-userspace (ie. exhaustion of
Front-Side-Bus / memory bandwidth) or related to block-layer overhead and
scsi layer overheads, i haven't yet validated, but a ~35% performance
difference is relatively significant nonetheless.
cpu utilization on the sg interface was under 10%. using 'dd' on the sd
interface, both gigahertz P3 Xeons had 0% idle time.
cheers,
lincoln.
* Re: [PATCH] 2.5.14 IDE 56
2002-05-09 2:37 ` Lincoln Dale
@ 2002-05-09 3:10 ` Andrew Morton
2002-05-09 10:05 ` Lincoln Dale
2002-05-09 4:16 ` [PATCH] 2.5.14 IDE 56 Andre Hedrick
2002-05-09 14:58 ` Alan Cox
2 siblings, 1 reply; 265+ messages in thread
From: Andrew Morton @ 2002-05-09 3:10 UTC (permalink / raw)
To: Lincoln Dale
Cc: Alan Cox, Martin Dalecki, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
Lincoln Dale wrote:
>
> At 01:42 PM 8/05/2002 +0100, Alan Cox wrote:
> >The SCSI layer is significant overhead even in 2.5.
>
> i did some benchmarking on a high-end dual P3 Xeon (Serverworks chipset )
> with QLogic 2300 (2gbit/s) 64/66 Fibre Channel controllers.
>
> using the '/dev/sgX' interface to issue scsi reads/writes allowed me to hit
> the magical limit of 200mbyte/sec throughput. (basically just about
> linerate). (simultaneous "sg_read if=/dev/sgX mmap=1 bs=512 count=35M";
> sg_read from the sg-tools package)
>
> doing the same test thru the block-layer was basically capped at around
> 135mbyte/sec. (simultaneous "dd if=/dev/sdX of=/dev/null bs=512 count=35M").
>
> whether the bottleneck was copy-from-kernel-to-userspace (ie. exhaustion of
> Front-Side-Bus / memory bandwidth) or related to block-layer overhead and
> scsi layer overheads, i haven't yet validated, but at a ~35% performance
> difference is relatively significant nontheless.
>
> cpu utlization on the sg interface was under 10%. using 'dd' on the sd
> interface, both gigahertz P3 Xeons had 0% idle time.
>
You need to be careful with this stuff. Cache effects dominate.
I believe the /dev/sgX driver uses a fixed kernel-side buffer for
the transfer. So the source of the copy_to_user() will always come
out of cache if the CPU is snooping the busmastering. But not if
the CPU is performing cache invalidates in response to that busmastering.
But for `dd', which has to copy the data out of pagecache, the
copy_from_user() will get 100% misses on the source, guaranteed.
Also, the `sg_read' command reads everything into the same (small)
chunk of userspace memory. So the destination of copy_to_user()
is always in cache. Probably, the same is true with `dd bs=512',
but one would have to go read the dd source to verify.
This is also why the scsi_debug driver runs so much faster than normal
devices: it copies everything out of a fixed in-kernel buffer. ie:
out of L1 cache. Fast.
Similarly, `sg_dd' against scsi_debug is copying a fixed kernel buffer
into a fixed userspace buffer. But when `dd' tries to do the same thing
it incurs an additional copy into the pagecache. If the pagecache
readahead window exceeds your L1 cache size (it does) then it will
appear to be a lot slower.
Summary: the block layer ain't slow - it's memory which is slow ;)
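The cache argument above can be probed from userspace with a plain memcpy() experiment. The sketch below is only an illustration, not the sg or pagecache code path, and `copy_pass` with its parameters is a made-up name. It copies `total` bytes in `chunk`-sized pieces into one small destination buffer, taking the source either from a fixed window (like a fixed kernel-side buffer) or by walking a large region (like pagecache pages that are never revisited). Timing the calls is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy `total` bytes in `chunk`-sized pieces from `src` into `dst`.
 * The source offset wraps at `src_span`: a span equal to `chunk` rereads
 * one cache-hot window, while a span far larger than the CPU caches makes
 * every piece come from memory that has not been touched recently.
 * Only whole chunks are copied. Returns the number of bytes copied.
 */
size_t copy_pass(unsigned char *dst, const unsigned char *src,
                 size_t src_span, size_t chunk, size_t total)
{
    size_t off = 0, done = 0;

    assert(chunk > 0 && chunk <= src_span);
    while (done + chunk <= total) {
        if (off + chunk > src_span)
            off = 0;                    /* wrap to the start of the span */
        memcpy(dst, src + off, chunk);  /* same small destination every time */
        off += chunk;
        done += chunk;
    }
    return done;
}
```

Calling it once with `src_span == chunk` and once with a span of a few hundred megabytes, each wrapped in clock_gettime(), should show how much of a measured difference is memory rather than the I/O stack.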
-
* Re: [PATCH] 2.5.14 IDE 56
2002-05-09 3:10 ` Andrew Morton
@ 2002-05-09 10:05 ` Lincoln Dale
2002-05-09 18:50 ` Andrew Morton
0 siblings, 1 reply; 265+ messages in thread
From: Lincoln Dale @ 2002-05-09 10:05 UTC (permalink / raw)
To: Andrew Morton
Cc: Alan Cox, Martin Dalecki, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
At 08:10 PM 8/05/2002 -0700, Andrew Morton wrote:
> > whether the bottleneck was copy-from-kernel-to-userspace (ie. exhaustion of
> > Front-Side-Bus / memory bandwidth) or related to block-layer overhead and
> > scsi layer overheads, i haven't yet validated, but at a ~35% performance
> > difference is relatively significant nontheless.
>
>You need to be careful with this stuff. Cache effects dominate.
...
i've validated that the performance difference is due to copy_to_user().
i created a hack in the tree where a read() on a file opened with the
option O_NOCOPY causes no copy_to_user() to occur. (diff at the bottom of
this email).
on this test machine (dual P3 Xeon / 256K L2 cache, 2G PC133 SDRAM, QLogic
2300 FC HBA, 8 x 15K RPM disks).
maximum theoretical performance is 2gbit/s (~200mbyte/sec). kernel is 2.4.18.
i get the following performance numbers with 256K reads synchronously from
the disks:
/dev/md0 raid-0 with O_DIRECT:                           91847kbyte/sec (2781usec avg latency/read)
/dev/md0 raid-0:                                        129455kbyte/sec (1978usec avg latency/read)
/dev/md0 raid-0 with O_NOCOPY:                          195868kbyte/sec (1297usec avg latency/read)
requests split evenly across /dev/sd[e-l] w/ O_DIRECT:   78279kbyte/sec (3276usec avg latency/read)
requests split evenly across /dev/sd[e-l]:              105130kbyte/sec (2437usec avg latency/read)
requests split evenly across /dev/sd[e-l] w/ O_NOCOPY:  123050kbyte/sec (2088usec avg latency/read)
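The throughput and latency columns above are two views of the same measurement if the reads are issued one at a time. A small worked check, assuming "256K" means 262144 bytes, "kbyte" means 1024 bytes, and only one read is outstanding at any moment (the helper name is made up for illustration):

```c
#include <assert.h>

/*
 * Throughput in kbyte/sec (1 kbyte = 1024 bytes) for back-to-back
 * synchronous reads: each read moves `read_bytes` bytes and takes
 * `latency_usec` microseconds on average.
 */
double sync_read_kbytes_per_sec(double read_bytes, double latency_usec)
{
    assert(latency_usec > 0.0);
    return (read_bytes / 1024.0) / (latency_usec / 1e6);
}
```

With 256K reads, a 2781usec average latency works out to roughly 92000 kbyte/sec and 1297usec to roughly 197000 kbyte/sec, both within about 1% of the reported 91847 and 195868 figures.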
there's some interesting numbers here.
- given the performance difference between O_NOCOPY and pristine on
/dev/md0 one can definitely point at that being the copy_to_user()
overhead.
- however, when requests are split across multiple block-devices (/dev/sd[e-l])
the differences are significantly smaller -- and something weird is
going on:
- readahead?
- scsi priorities?
perhaps some form of async i/o is required to get the performance back.
i'll do some experiments with a 2.5.xx and see if the block-layer changes
cause any significant changes.
--- pristine/linux/include/asm-i386/fcntl.h Tue Sep 18 06:16:30 2001
+++ linux/include/asm-i386/fcntl.h Thu May 9 18:56:46 2002
@@ -20,6 +20,7 @@
#define O_LARGEFILE 0100000
#define O_DIRECTORY 0200000 /* must be a directory */
#define O_NOFOLLOW 0400000 /* don't follow links */
+#define O_NOCOPY 04 /* LTD HACK: dont do copy_to_user */
#define F_DUPFD 0 /* dup */
#define F_GETFD 1 /* get close_on_exec */
--- pristine/linux/mm/filemap.c Tue Feb 26 06:38:13 2002
+++ linux/mm/filemap.c Thu May 9 18:56:48 2002
@@ -1544,6 +1544,19 @@
return retval;
}
+int file_read_nocopy_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size)
+{
+ unsigned long count = desc->count;
+
+ if (size > count)
+ size = count;
+
+ desc->count = count - size;
+ desc->written += size;
+ desc->buf += size;
+ return size;
+}
+
int file_read_actor(read_descriptor_t * desc, struct page *page, unsigned long offset, unsigned long size)
{
char *kaddr;
@@ -1591,7 +1604,7 @@
desc.count = count;
desc.buf = buf;
desc.error = 0;
- do_generic_file_read(filp, ppos, &desc, file_read_actor);
+ do_generic_file_read(filp, ppos, &desc, ((filp->f_flags & O_NOCOPY) ? file_read_nocopy_actor : file_read_actor));
retval = desc.written;
if (!retval)
cheers,
lincoln.
* Re: [PATCH] 2.5.14 IDE 56
2002-05-09 10:05 ` Lincoln Dale
@ 2002-05-09 18:50 ` Andrew Morton
2002-05-10 0:33 ` Andi Kleen
2002-05-10 6:50 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Lincoln Dale
0 siblings, 2 replies; 265+ messages in thread
From: Andrew Morton @ 2002-05-09 18:50 UTC (permalink / raw)
To: Lincoln Dale
Cc: Alan Cox, Martin Dalecki, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
Lincoln Dale wrote:
>
> ...
> i've validated that the performance difference is due to copy_to_user().
> i created a hack in the tree where a read() on a file opened with the
> option O_NOCOPY causes no copy_to_user() to occur. (diff at the bottom of
> this email).
>
> on this test machine (dual P3 Xeon / 256K L2 cache, 2G PC133 SDRAM, QLogic
> 2300 FC HBA, 8 x 15K RPM disks).
> maximum theoretical performance is 2gbit/s (~200mbyte/sec). kernel is 2.4.18.
>
> i get the following performance numbers with 256K reads syncronously from
> the disks:
For bulk read() and write() I/O the best sized buffer is 8 kbytes. 4k is
pretty good, too. Anything larger blows the user-side buffer out of L1.
This is for x86.
This is a pretty important point, so let's repeat it:
Userspace programmers who are writing bulk-transfer read/write loops
should use an 8 kbyte transfer buffer.
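A minimal userspace sketch of such a bulk-transfer loop, assuming ordinary POSIX read()/write() on already-open descriptors. The function name `copy_fd` and the macro `XFER_SIZE` are illustrative, not taken from any existing tool. The one buffer is reused for every iteration so it can stay resident in L1.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

#define XFER_SIZE 8192  /* the 8 kbyte transfer buffer suggested above */

/*
 * Copy everything from in_fd to out_fd through one reused 8 kbyte
 * buffer. Returns the number of bytes copied, or -1 on error.
 */
long long copy_fd(int in_fd, int out_fd)
{
    char buf[XFER_SIZE];
    long long total = 0;

    assert(in_fd >= 0 && out_fd >= 0);
    for (;;) {
        ssize_t off = 0;
        ssize_t n = read(in_fd, buf, sizeof(buf));

        if (n == 0)
            break;                      /* end of input */
        if (n < 0) {
            if (errno == EINTR)
                continue;               /* interrupted, retry the read */
            return -1;
        }
        while (off < n) {               /* write() may be partial */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));

            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

Whether 8 kbytes is still the sweet spot depends on the L1 size of the CPU in question; the figure here simply follows the advice given for x86 at the time.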
> /dev/md0 raid-0 with O_DIRECT: 91847kbyte/sec (2781usec avg latency/read)
> /dev/md0 raid-0: 129455kbyte/sec (1978usec avg latency/read)
> /dev/md0 raid-0 with O_NOCOPY: 195868kbyte/sec (1297usec avg latency/read)
hmm. Why is O_DIRECT always the slowest? (and it would presumably do
even worse with an 8k transfer size).
-
* Re: [PATCH] 2.5.14 IDE 56
2002-05-09 18:50 ` Andrew Morton
@ 2002-05-10 0:33 ` Andi Kleen
2002-05-10 0:48 ` Andrew Morton
2002-05-10 6:50 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Lincoln Dale
1 sibling, 1 reply; 265+ messages in thread
From: Andi Kleen @ 2002-05-10 0:33 UTC (permalink / raw)
To: Andrew Morton; +Cc: linux-kernel
Andrew Morton <akpm@zip.com.au> writes:
> For bulk read() and write() I/O the best sized buffer is 8 kbytes. 4k is
> pretty good, too. Anything larger blows the user-side buffer out of L1.
> This is for x86.
Modern x86 supports prefetch hints that tell the CPU not to
pollute the caches with "streaming data". I bet using them would
be a big win. The rep ; movsl loop used in copy*user isn't
very good on modern x86 anyways (it is ok on PPro, but loses on Athlon
and P4)
I'm (slowly) working on such functions for x86-64, it should be eventually
possible to backport them to i386.
-Andi
* Re: [PATCH] 2.5.14 IDE 56
2002-05-10 0:33 ` Andi Kleen
@ 2002-05-10 0:48 ` Andrew Morton
2002-05-10 1:06 ` Andi Kleen
0 siblings, 1 reply; 265+ messages in thread
From: Andrew Morton @ 2002-05-10 0:48 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
Andi Kleen wrote:
>
> Andrew Morton <akpm@zip.com.au> writes:
>
> > For bulk read() and write() I/O the best sized buffer is 8 kbytes. 4k is
> > pretty good, too. Anything larger blows the user-side buffer out of L1.
> > This is for x86.
>
> Modern x86 support prefetch hints for the CPU to tell it to not
> pollute the caches with "streaming data". I bet using them would
> be a big win.
Maybe. For your basic:
for (many) {
read(fd1, buf, 8192);
write(fd2, buf, 8192);
}
you want `buf' cached, but not the pagecache for fd1 and fd2.
If the prefetch hints can express that then yes, nice.
> The rep ; movsl loop used in copy*user isn't
> very good on modern x86 anyways (it is ok on PPro, but loses on Athlon
> and P4)
On PII and PIII, rep;movsl is slower than an open-coded
duff-device copy for all src/dest alignments except for
the case where both are eight-byte-aligned. By up to
20%, iirc. four-byte-aligned to four-byte-aligned isn't
too bad.
Of course, a lot of copy_*_users are well-aligned. But
a lot are not. I ended up deciding that switching to
the duff-device copy would be a very small overall win, when
you weight it by the alignment patterns of normal kernel
usage.
But making a runtime selection of which copy function to
use (based on src/dest alignment) could speed up the
kernel's most expensive function by maybe 10-15% overall.
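A userspace sketch of such a runtime selection, under loud assumptions: it is not the kernel's copy_*_user code, `copy_words` merely stands in for the rep;movsl path that does well when both pointers are eight-byte-aligned, and `copy_duff` is a byte-wise Duff's device standing in for the open-coded copy. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*copy_fn)(unsigned char *, const unsigned char *, size_t);

/* Stand-in for the rep;movsl path: eight bytes per step, then the tail. */
static void copy_words(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t words = n / 8, i;

    for (i = 0; i < words; i++) {
        uint64_t w;

        memcpy(&w, src + 8 * i, sizeof(w));
        memcpy(dst + 8 * i, &w, sizeof(w));
    }
    memcpy(dst + 8 * words, src + 8 * words, n % 8);
}

/* Byte-wise Duff's device: the switch jumps into the unrolled loop body. */
static void copy_duff(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t rounds = (n + 7) / 8;

    if (n == 0)
        return;
    switch (n % 8) {
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--rounds > 0);
    }
}

/* Pick the copy routine from the alignment of both pointers. */
copy_fn select_copy(const void *dst, const void *src)
{
    uintptr_t bits = (uintptr_t)dst | (uintptr_t)src;

    return (bits % 8 == 0) ? copy_words : copy_duff;
}

void *copy_select(void *dst, const void *src, size_t n)
{
    assert(n == 0 || (dst != NULL && src != NULL));
    select_copy(dst, src)(dst, src, n);
    return dst;
}
```

The alignment test is one OR and one mask, so the selection cost is small next to the copy itself; whether the claimed 10-15% shows up would have to be measured with a timer harness on the CPU of interest.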
The test proggy is in http://www.zip.com.au/~akpm/linux/cptimer.tar.gz
-
* Re: [PATCH] 2.5.14 IDE 56
2002-05-10 0:48 ` Andrew Morton
@ 2002-05-10 1:06 ` Andi Kleen
2002-05-13 17:51 ` Pavel Machek
0 siblings, 1 reply; 265+ messages in thread
From: Andi Kleen @ 2002-05-10 1:06 UTC (permalink / raw)
To: Andrew Morton; +Cc: Andi Kleen, linux-kernel
On Fri, May 10, 2002 at 02:48:15AM +0200, Andrew Morton wrote:
> Andi Kleen wrote:
> >
> > Andrew Morton <akpm@zip.com.au> writes:
> >
> > > For bulk read() and write() I/O the best sized buffer is 8 kbytes. 4k is
> > > pretty good, too. Anything larger blows the user-side buffer out of L1.
> > > This is for x86.
> >
> > Modern x86 support prefetch hints for the CPU to tell it to not
> > pollute the caches with "streaming data". I bet using them would
> > be a big win.
>
> Maybe. For your basic:
>
> for (many) {
> read(fd1, buf, 8192);
> write(fd2, buf, 8192);
> }
>
> you want `buf' cached, but not the pagecache for fd1 and fd2.
> If the prefetch hints can express that then yes, nice.
SSE has prefetchnta
3dnow has something similar.
In addition you can use movnti* for stores. These should be faster
because they use write combining and avoid the latency of fetching
the cache line of the destination just to overwrite it.
The tricky bit is to avoid prefetches over the boundary of your copy.
Prefetching from an uncached area or write combined area (like the
AGP gart which could start in next page) triggers hardware bugs in
various boxes. This unfortunately complicates the prefetching loops
a bit.
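A userspace sketch of the idea, using compiler intrinsics rather than kernel assembly: `_mm_prefetch` with `_MM_HINT_NTA` emits prefetchnta and `_mm_stream_si32` emits movnti. The function name, the 64-byte line size and the 256-byte prefetch distance are assumptions for illustration. The prefetch is issued only while the prefetched address is still inside the source buffer, which is the boundary rule described above. On builds without SSE2 the function falls back to memcpy().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <xmmintrin.h>   /* _mm_prefetch, _mm_sfence */
#include <emmintrin.h>   /* _mm_stream_si32 (movnti) */
#endif

#define LINE_SIZE      64    /* assumed cache line size */
#define PREFETCH_AHEAD 256   /* assumed prefetch distance in bytes */

void *stream_copy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t left = n;

    assert(n == 0 || (dst != NULL && src != NULL));
#if defined(__SSE2__)
    if ((uintptr_t)d % 4 == 0) {        /* keep the streaming stores aligned */
        while (left >= LINE_SIZE) {
            size_t i;

            /* Never prefetch an address beyond the end of the source. */
            if (left > PREFETCH_AHEAD)
                _mm_prefetch((const char *)(s + PREFETCH_AHEAD), _MM_HINT_NTA);
            for (i = 0; i < LINE_SIZE; i += 4) {
                int w;

                memcpy(&w, s + i, sizeof(w));   /* source may be unaligned */
                _mm_stream_si32((int *)(void *)(d + i), w);
            }
            s += LINE_SIZE;
            d += LINE_SIZE;
            left -= LINE_SIZE;
        }
        _mm_sfence();   /* make the write-combined stores globally visible */
    }
#endif
    memcpy(d, s, left); /* tail, or the whole copy on the fallback path */
    return dst;
}
```

The guard costs one compare per cache line, which is the kind of loop complication mentioned above. Prefetching still pulls in the whole line containing the target byte, but a line containing a valid byte cannot cross into the next page.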
>
> > The rep ; movsl loop used in copy*user isn't
> > very good on modern x86 anyways (it is ok on PPro, but loses on Athlon
> > and P4)
>
> On PII and PIII, rep;movsl is slower than an open-coded
> duff-device copy for all src/dest alignments except for
> the case where both are eight-byte-aligned. By up to
> 20%, iirc. four-byte-aligned to four-byte-aligned isn't
> too bad.
That's surprising. AFAIK on PPro rep ; movs does magic prefetch
tricks in microcode, so it should be eventually faster if you do
not use explicit prefetching and you're not cache hot for
bigger copies (in smaller ones the setup overhead may dominate)
On Athlon rep ; movs loses clearly compared to an unrolled loop.
-Andi
* Re: [PATCH] 2.5.14 IDE 56
2002-05-10 1:06 ` Andi Kleen
@ 2002-05-13 17:51 ` Pavel Machek
2002-05-14 21:44 ` Andi Kleen
0 siblings, 1 reply; 265+ messages in thread
From: Pavel Machek @ 2002-05-13 17:51 UTC (permalink / raw)
To: Andi Kleen; +Cc: Andrew Morton, linux-kernel
Hi!
> The tricky bit is to avoid prefetches over the boundary of your copy.
> Prefetching from an uncached area or write combined area (like the
> AGP gart which could start in next page) triggers hardware bugs in
> various boxes. This unfortunately complicates the prefetching loops
> a bit.
CONFIG_MY_MACHINE_AINT_BORKEN? We definitely could assume that on x86-64.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
* Re: [PATCH] 2.5.14 IDE 56
2002-05-13 17:51 ` Pavel Machek
@ 2002-05-14 21:44 ` Andi Kleen
0 siblings, 0 replies; 265+ messages in thread
From: Andi Kleen @ 2002-05-14 21:44 UTC (permalink / raw)
To: Pavel Machek; +Cc: Andi Kleen, Andrew Morton, linux-kernel
On Mon, May 13, 2002 at 07:51:00PM +0200, Pavel Machek wrote:
> Hi!
>
> > The tricky bit is to avoid prefetches over the boundary of your copy.
> > Prefetching from an uncached area or write combined area (like the
> > AGP gart which could start in next page) triggers hardware bugs in
> > various boxes. This unfortunately complicates the prefetching loops
> > a bit.
>
> CONFIG_MY_MACHINE_AINT_BORKEN? We definitely could assume that on x86-64.
I was advised by AMD that I had better avoid it even for future boxes.
-Andi
* O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-09 18:50 ` Andrew Morton
2002-05-10 0:33 ` Andi Kleen
@ 2002-05-10 6:50 ` Lincoln Dale
2002-05-10 7:15 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56) Andrew Morton
2002-05-10 15:55 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Linus Torvalds
1 sibling, 2 replies; 265+ messages in thread
From: Lincoln Dale @ 2002-05-10 6:50 UTC (permalink / raw)
To: Andrew Morton
Cc: Alan Cox, Martin Dalecki, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
At 11:50 AM 9/05/2002 -0700, Andrew Morton wrote:
> > /dev/md0 raid-0 with O_DIRECT: 91847kbyte/sec (2781usec
> > /dev/md0 raid-0: 129455kbyte/sec (1978usec
> > /dev/md0 raid-0 with O_NOCOPY: 195868kbyte/sec (1297usec
>
>hmm. Why is O_DIRECT always the slowest? (and it would presumably do
>even worse with an 8k transfer size).
i just reproduced the test to validate the data. i'm using 8kbyte blocks here.
on kernel 2.4.18, O_DIRECT is still the slowest.
this machine has 2GB RAM, so it has 1.1GB RAM in HighMem.
booting a kernel with 'profile=2' set, the numbers were as follows:
- Base performance, /dev/md0 raid-0 8-disk array:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=8k blocks=4M /dev/md0
Completed writing 31250 mbytes in 214. 94761 seconds (153.05 Mbytes/sec), 53usec mean latency
- using /dev/md0 raid-0 8-disk array with O_DIRECT:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=8k blocks=4M direct /dev/md0
Completed reading 31250 mbytes in 1229.830726 seconds (26.64 Mbytes/sec), 306usec mean latency
- using /dev/md0 raid-0 8-disk array with O_NOCOPY hack:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=8k blocks=4M nocopy /dev/md0
Completed writing 31250 mbytes in 163.602116 seconds (200.29 Mbytes/sec), 39usec mean latency
so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no
O_DIRECT.
anyone have any clues?
from the profile of the O_DIRECT kernel, we have:
[root@mel-stglab-host1 src]# cat /tmp/profile2.txt | sort -n -k3 | tail -20
c01ceb90 submit_bh 270 2.4107
c01fc8c0 scsi_init_io_vc 286 0.7772
c0136ec0 create_bounce 323 0.9908
c0139d80 unlock_buffer 353 4.4125
c012f7d0 kmem_cache_alloc 465 1.6146
c0115a40 __wake_up 470 2.4479
c01fa720 __scsi_end_request 509 1.7674
c01fae00 scsi_request_fn 605 0.7002
c013cab0 end_buffer_io_kiobuf 675 10.5469
c01154e0 schedule 849 0.6170
c0131a40 rmqueue 868 1.5069
c025ede0 raid0_make_request 871 2.5923
c0225ee0 qla2x00_done 973 1.6436
c013cb60 brw_kiovec 1053 1.0446
c01ce400 __make_request 1831 1.1110
c01f30e0 scsi_dispatch_cmd 1854 2.0692
c011d010 do_softirq 2183 9.7455
c0136c30 bounce_end_io_read 13947 39.6222
c0105230 default_idle 231472 3616.7500
00000000 total 266665 0.1425
contrast this to the profile where we're not using O_DIRECT:
[root@mel-stglab-host1 src]# cat /tmp/profile3_base.txt | sort -n -k3 | tail -20
c012fdc0 kmem_cache_reap 369 0.4707
c013b330 set_bh_page 397 4.9625
c011d010 do_softirq 419 1.8705
c0131a40 rmqueue 466 0.8090
c01fa720 __scsi_end_request 484 1.6806
c012fa60 kmem_cache_free 496 3.8750
c013bd00 block_read_full_page 523 0.7783
c012f7d0 kmem_cache_alloc 571 1.9826
c013db39 _text_lock_buffer 729 0.9812
c0130ca0 shrink_cache 747 0.7781
c01cea70 generic_make_request 833 2.8924
c025ede0 raid0_make_request 930 2.7679
c013b280 get_unused_buffer_head 975 5.5398
c01fc8c0 scsi_init_io_vc 1003 2.7255
c013d490 try_to_free_buffers 1757 4.7745
c013a9d0 end_buffer_io_async 2482 14.1023
c01ce400 __make_request 2687 1.6305
c012a6e0 file_read_actor 6951 27.1523
c0105230 default_idle 15227 237.9219
00000000 total 45048 0.0241
the biggest difference here is bounce_end_io_read in O_DIRECT.
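Because default_idle dominates both profiles, the raw tick counts are easier to compare as a share of the non-idle ticks. A small worked helper, assuming the "total" row counts every profiling tick including idle (the function name is made up for illustration):

```c
#include <assert.h>

/* Share (0..1) of the non-idle profile ticks spent in one function. */
double busy_share(unsigned long fn_ticks, unsigned long total_ticks,
                  unsigned long idle_ticks)
{
    assert(total_ticks > idle_ticks);
    return (double)fn_ticks / (double)(total_ticks - idle_ticks);
}
```

Fed with the figures above, busy_share(13947, 266665, 231472) puts bounce_end_io_read at a little under 40% of the busy ticks in the O_DIRECT run, while busy_share(6951, 45048, 15227) puts file_read_actor at a little over 23% in the base run.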
given there's still lots of idle-time, i'll fire up lockmeter on here and
see if there are any gremlins there.
cheers,
lincoln.
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-10 6:50 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Lincoln Dale
@ 2002-05-10 7:15 ` Andrew Morton
2002-05-10 7:21 ` Jens Axboe
` (2 more replies)
2002-05-10 15:55 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Linus Torvalds
1 sibling, 3 replies; 265+ messages in thread
From: Andrew Morton @ 2002-05-10 7:15 UTC (permalink / raw)
To: Lincoln Dale; +Cc: Kernel Mailing List
Lincoln Dale wrote:
>
> At 11:50 AM 9/05/2002 -0700, Andrew Morton wrote:
> > > /dev/md0 raid-0 with O_DIRECT: 91847kbyte/sec (2781usec
> > > /dev/md0 raid-0: 129455kbyte/sec (1978usec
> > > /dev/md0 raid-0 with O_NOCOPY: 195868kbyte/sec (1297usec
> >
> >hmm. Why is O_DIRECT always the slowest? (and it would presumably do
> >even worse with an 8k transfer size).
>
> i just reproduced the test to validate the data. i'm using 8kbyte blocks here.
> on kernel is 2.4.18, O_DIRECT is still the slowest.
8k would rather disadvantage O_DIRECT. It can't do readahead and it
can't do write-behind. It'll be forced to do tons of tiny I/Os.
> ...
>
> the biggest difference here is bounce_end_io_read in O_DIRECT.
Well hopefully that will be gone in 2.4.20, for many popular
controllers.
Not sure why it bit you here. It could be that the page allocator
just happened to give you highmem pages for the O_DIRECT test.
Turning off highmem may make the test more repeatable.
> given there's still lots of idle-time, i'll file up lockmeter on here and
> see if theres any gremlins there.
lockmeter will go off the dial. All those copies happen at
interrupt time, inside the global io_request_lock. It's horrid.
Try it with the block-highmem patch:
http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19pre1aa1/00_block-highmem-all-18b-4.gz
-
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-10 7:15 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56) Andrew Morton
@ 2002-05-10 7:21 ` Jens Axboe
2002-05-10 8:12 ` Andrea Arcangeli
2002-05-10 10:14 ` Lincoln Dale
2 siblings, 0 replies; 265+ messages in thread
From: Jens Axboe @ 2002-05-10 7:21 UTC (permalink / raw)
To: Andrew Morton; +Cc: Lincoln Dale, Kernel Mailing List
On Fri, May 10 2002, Andrew Morton wrote:
> > given there's still lots of idle-time, i'll file up lockmeter on here and
> > see if theres any gremlins there.
>
> lockmeter will go off the dial. All those copies happen at
> interrupt time, inside the global io_request_lock. It's horrid.
Depends. For IDE, that is so. For SCSI, actually no: io_request_lock is
not held while doing the bounce copy. The write bounce copy never
happens with io_request_lock held for either; the copy-back on reads
only does if the caller holds io_request_lock while entering
end_that_request_first() (or its own replacement, __scsi_end_request()
for instance).
The read copy-back is nasty for most users regardless of io_request_lock
status, because it happens with interrupts disabled.
> Try it with the block-highmem patch:
> http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19pre1aa1/00_block-highmem-all-18b-4.gz
That's good advice :-)
--
Jens Axboe
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-10 7:15 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56) Andrew Morton
2002-05-10 7:21 ` Jens Axboe
@ 2002-05-10 8:12 ` Andrea Arcangeli
2002-05-10 10:14 ` Lincoln Dale
2 siblings, 0 replies; 265+ messages in thread
From: Andrea Arcangeli @ 2002-05-10 8:12 UTC (permalink / raw)
To: Andrew Morton; +Cc: Lincoln Dale, Kernel Mailing List
On Fri, May 10, 2002 at 12:15:19AM -0700, Andrew Morton wrote:
> Lincoln Dale wrote:
> >
> > At 11:50 AM 9/05/2002 -0700, Andrew Morton wrote:
> > > > /dev/md0 raid-0 with O_DIRECT: 91847kbyte/sec (2781usec
> > > > /dev/md0 raid-0: 129455kbyte/sec (1978usec
> > > > /dev/md0 raid-0 with O_NOCOPY: 195868kbyte/sec (1297usec
> > >
> > >hmm. Why is O_DIRECT always the slowest? (and it would presumably do
> > >even worse with an 8k transfer size).
> >
> > i just reproduced the test to validate the data. i'm using 8kbyte blocks here.
> > on kernel is 2.4.18, O_DIRECT is still the slowest.
>
> 8k would rather disadvantage O_DIRECT. It can't do readahead and it
> can't to write-behind. It'll be forced to do tons of tiny I/Os.
yes, that's the main factor. Suggested read/write buffer sizes with
O_DIRECT are of the order of 512k at least (that's the upper limit for a
single scsi dma request in 2.4), and 1M is even better so that you pipeline
the I/O some more in the ll_rw_block layer.
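A minimal userspace sketch of reading with O_DIRECT and a large, aligned buffer, assuming Linux open(2) semantics. The function name `read_direct`, the 4096-byte alignment and the fallback to a buffered open when the filesystem rejects O_DIRECT are assumptions made for illustration, not part of the benchmark discussed in this thread.

```c
#define _GNU_SOURCE          /* exposes O_DIRECT with glibc */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef O_DIRECT
#define O_DIRECT 0           /* assumption: plain buffered reads where unavailable */
#endif

#define DIRECT_ALIGN 4096    /* assumed to satisfy the device's alignment rules */

/*
 * Read a whole file or device with O_DIRECT in `bufsize`-byte requests.
 * Returns the number of bytes read, or -1 on error.
 */
long long read_direct(const char *path, size_t bufsize)
{
    void *buf = NULL;
    long long total = 0;
    int fd;

    assert(bufsize > 0 && bufsize % DIRECT_ALIGN == 0);
    fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0 && errno == EINVAL)
        fd = open(path, O_RDONLY);      /* filesystem refuses O_DIRECT */
    if (fd < 0)
        return -1;
    if (posix_memalign(&buf, DIRECT_ALIGN, bufsize) != 0) {
        close(fd);
        return -1;
    }
    for (;;) {
        ssize_t n = read(fd, buf, bufsize);

        if (n < 0) {
            if (errno == EINTR)
                continue;
            total = -1;
            break;
        }
        total += n;
        if ((size_t)n < bufsize)
            break;  /* short read: end of file, and the offset may now be unaligned */
    }
    free(buf);
    close(fd);
    return total;
}
```

Following the sizes suggested above, a call such as read_direct("/dev/sdX", 1 << 20) issues 1M requests, where /dev/sdX is a placeholder for a real device.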
Andrea
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-10 7:15 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56) Andrew Morton
2002-05-10 7:21 ` Jens Axboe
2002-05-10 8:12 ` Andrea Arcangeli
@ 2002-05-10 10:14 ` Lincoln Dale
2002-05-10 12:36 ` Andrea Arcangeli
2 siblings, 1 reply; 265+ messages in thread
From: Lincoln Dale @ 2002-05-10 10:14 UTC (permalink / raw)
To: Andrew Morton; +Cc: Kernel Mailing List
At 12:15 AM 10/05/2002 -0700, Andrew Morton wrote:
>Try it with the block-highmem patch:
>
>http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19pre1aa1/00_block-highmem-all-18b-4.gz
given i had to recompile the kernel to add lockmeter, i'd already cheated
and changed PAGE_OFFSET from 0xc0000000 to 0x80000000, obviating the
requirement for highmem altogether.
to be fair to O_DIRECT, i gave it 1mbyte disk-reads to work with and
gave normal i/o 8kbyte reads to work with.
still using 2.4.18 with profile=2 enabled and lockmeter in the kernel but
not turned on. still using the same disk spindles (just 6 this time), each
a 18G 15K RPM disk spindle.
i got tired of scanning the entire available space on an 18G disk so just
dropped the test down to the first 2G of each disk.
O_DIRECT is still a ~30% performance hit versus just talking to the
/dev/sdX device directly. profile traces at bottom.
normal block-device disks sd[m-r] without O_DIRECT, 64K x 8k reads:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance blocks=64K bs=8k /dev/sd[m-r]
Completed reading 12000 mbytes in 125.028612 seconds (95.98 Mbytes/sec), 76usec mean
normal block-device disks sd[m-r] with O_DIRECT, 5K x 1 megabyte reads:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance blocks=5K bs=1m direct /dev/sd[m-r]
Completed reading 12000 mbytes in 182.492975 seconds (65.76 Mbytes/sec), 15416usec mean
for interest's sake, compare this to using the 'raw' versions of the same disks:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance blocks=5K bs=1m /dev/raw/raw[2-7]
Completed reading 12000 mbytes in 206.346371 seconds (58.15 Mbytes/sec), 16860usec mean
of course, these are all ~25% worse than a mechanism that performs the
i/o while avoiding the copy_to_user() altogether:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance blocks=64K bs=8k nocopy /dev/sd[m-r]
Completed reading 12000 mbytes in 97.846938 seconds (122.64 Mbytes/sec), 59usec mean
anyone want to see any other benchmarks performed? would a comparison to
2.5.x be useful?
comparative profile=2 traces:
- no O_DIRECT:
[root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
80125060 _spin_lock_ 718 6.4107
8013bfc0 brw_kiovec 798 0.9591
801cbb40 generic_make_request 830 2.8819
801f9400 scsi_init_io_vc 831 2.2582
8013c840 try_to_free_buffers 1198 3.4034
8013a190 end_buffer_io_async 2453 12.7760
8012b100 file_read_actor 3459 36.0312
801cb4e0 __make_request 7532 4.6152
80105220 default_idle 106468 1663.5625
00000000 total 134102 0.0726
- O_DIRECT, disks /dev/sd[m-r]:
[root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
801cbb40 generic_make_request 72 0.2500
8013ab00 set_bh_page 73 1.1406
801cbc60 submit_bh 116 1.0357
801f72a0 __scsi_end_request 133 0.4618
80139540 unlock_buffer 139 1.7375
8013bf10 end_buffer_io_kiobuf 302 4.7188
8013bfc0 brw_kiovec 357 0.4291
801cb4e0 __make_request 995 0.6097
80105220 default_idle 34243 535.0469
00000000 total 37101 0.0201
- /dev/raw/raw[2-7]:
[root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
8013bf50 wait_kio 349 3.1161
801cbb40 generic_make_request 461 1.6007
801cbc60 submit_bh 526 4.6964
80139540 unlock_buffer 666 8.3250
801f72a0 __scsi_end_request 699 2.4271
8013bf10 end_buffer_io_kiobuf 1672 26.1250
8013bfc0 brw_kiovec 1906 2.2909
801cb4e0 __make_request 10495 6.4308
80105220 default_idle 84418 1319.0312
00000000 total 103516 0.0560
- O_NOCOPY hack: (userspace doesn't actually get the read data)
801f9400 scsi_init_io_vc 785 2.1332
8013c840 try_to_free_buffers 950 2.6989
801f72a0 __scsi_end_request 966 3.3542
801cbb40 generic_make_request 1017 3.5312
8013bf10 end_buffer_io_kiobuf 1672 26.1250
8013a190 end_buffer_io_async 1693 8.8177
8013bfc0 brw_kiovec 1906 2.2909
801cb4e0 __make_request 13682 8.3836
80105220 default_idle 112345 1755.3906
00000000 total 144891 0.0784
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-10 10:14 ` Lincoln Dale
@ 2002-05-10 12:36 ` Andrea Arcangeli
2002-05-11 3:23 ` Lincoln Dale
2002-05-12 11:23 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56) Lincoln Dale
0 siblings, 2 replies; 265+ messages in thread
From: Andrea Arcangeli @ 2002-05-10 12:36 UTC (permalink / raw)
To: Lincoln Dale; +Cc: Andrew Morton, Kernel Mailing List
On Fri, May 10, 2002 at 08:14:10PM +1000, Lincoln Dale wrote:
> At 12:15 AM 10/05/2002 -0700, Andrew Morton wrote:
> >Try it with the block-highmem patch:
> >
> >http://www.kernel.org/pub/linux/kernel/people/andrea/kernels/v2.4/2.4.19pre1aa1/00_block-highmem-all-18b-4.gz
>
> given i had to recompile the kernel to add lockmeter, i'd already cheated
> and changed PAGE_OFFSET from 0xc0000000 to 0x80000000, obviating the
> requirement for highmem altogether.
>
> being fair to O_DIRECT and giving it 1mbyte disk-reads to work with and
> giving normal i/o 8kbyte reads to work with.
> still using 2.4.18 with profile=2 enabled and lockmeter in the kernel but
> not turned on. still using the same disk spindles (just 6 this time), each
> a 18G 15K RPM disk spindle.
> i got tired of scanning the entire available space on an 18G disk so just
> dropped the test down to the first 2G of each disk.
are any of the disks mounted?
>
> O_DIRECT is still a ~30% performance hit versus just talking to the
> /dev/sdX device directly. profile traces at bottom.
>
> normal block-device disks sd[m-r] without O_DIRECT, 64K x 8k reads:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance blocks=64K bs=8k /dev/sd[m-r]
> Completed reading 12000 mbytes in 125.028612 seconds (95.98
> Mbytes/sec), 76usec mean
can you post your test_disk_performance program, so that we can see in
particular the semantics of blocks and bs? 64k*8k == 5k * 1M / 10.
>
> normal block-device disks sd[m-r] with O_DIRECT, 5K x 1 megabyte reads:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance blocks=5K bs=1m direct /dev/sd[m-r]
> Completed reading 12000 mbytes in 182.492975 seconds (65.76
> Mbytes/sec), 15416usec mean
>
> for interests-sake, compare this to using the 'raw' versions of the same
> disks:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance blocks=5K bs=1m /dev/raw/raw[2-7]
> Completed reading 12000 mbytes in 206.346371 seconds (58.15
> Mbytes/sec), 16860usec mean
O_DIRECT has to do some more work to check for coherency with the
pagecache, and it has some more overhead from the address space
operations. But O_DIRECT by default uses the blocksize of the blkdev,
which is set to 1k by default (if you never mounted it), versus the
hardblocksize of 512 bytes used by the raw device (assuming the sd[m-r]
aren't mounted).
This is most probably why O_DIRECT is faster than raw.c, otherwise they
would run almost at the same rate, the pagecache coherency fast paths
and the address space ops overhead of O_DIRECT shouldn't be noticeable.
>
> of course, these are all ~25% worse than if a mechanism of performing the
> i/o avoiding the copy_to_user() altogether:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance blocks=64K bs=8k nocopy /dev/sd[m-r]
> Completed reading 12000 mbytes in 97.846938 seconds (122.64
> Mbytes/sec), 59usec mean
the nocopy hack is not an interesting test for O_DIRECT/rawio: it
doesn't walk pagetables, and it doesn't allow the DMA to be done into
userspace memory. If you want the pagecache to be visible in userspace
(i.e. MAP_PRIVATE/MAP_SHARED) you must deal with pagetables somehow,
and if you want the read/write syscalls to DMA directly into userspace
memory (raw/O_DIRECT) you must still walk pagetables during those
syscalls before starting the DMA. If you don't want to explicitly deal
with the pagetables then you need to copy_user (case 1). On most archs,
where mem bandwidth is very expensive, avoiding the copy-user is a big
global win (other cpus won't collapse in smp etc..).
Your nocopy hack benchmark has some relevance only for usages of the
data done by the kernel. So if it is the kernel that reads the data
directly from pagecache (i.e. a kernel module), then your nocopy
benchmark matters. For example your nocopy benchmark also matters for
sendfile zerocopy, it will read at 122M/sec. But if it's userspace that
is supposed to receive the data (so not directly from pagecache on the
kernel direct mapping, but in userspace mapped memory) it cannot be
122M/sec, it has to be less due to the user address space management.
>
>
> anyone want to see any other benchmarks performed? would a comparison to
> 2.5.x be useful?
>
>
> comparative profile=2 traces:
> - no O_DIRECT:
> [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
> 80125060 _spin_lock_ 718 6.4107
> 8013bfc0 brw_kiovec 798 0.9591
> 801cbb40 generic_make_request 830 2.8819
> 801f9400 scsi_init_io_vc 831 2.2582
> 8013c840 try_to_free_buffers 1198 3.4034
> 8013a190 end_buffer_io_async 2453 12.7760
> 8012b100 file_read_actor 3459 36.0312
> 801cb4e0 __make_request 7532 4.6152
> 80105220 default_idle 106468 1663.5625
> 00000000 total 134102 0.0726
>
> - O_DIRECT, disks /dev/sd[m-r]:
> [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
> 801cbb40 generic_make_request 72 0.2500
> 8013ab00 set_bh_page 73 1.1406
> 801cbc60 submit_bh 116 1.0357
> 801f72a0 __scsi_end_request 133 0.4618
> 80139540 unlock_buffer 139 1.7375
> 8013bf10 end_buffer_io_kiobuf 302 4.7188
> 8013bfc0 brw_kiovec 357 0.4291
> 801cb4e0 __make_request 995 0.6097
> 80105220 default_idle 34243 535.0469
> 00000000 total 37101 0.0201
>
> - /dev/raw/raw[2-7]:
> [root@mel-stglab-host1 src]# readprofile -v | sort -n -k3 | tail -10
> 8013bf50 wait_kio 349 3.1161
> 801cbb40 generic_make_request 461 1.6007
> 801cbc60 submit_bh 526 4.6964
> 80139540 unlock_buffer 666 8.3250
> 801f72a0 __scsi_end_request 699 2.4271
> 8013bf10 end_buffer_io_kiobuf 1672 26.1250
> 8013bfc0 brw_kiovec 1906 2.2909
> 801cb4e0 __make_request 10495 6.4308
> 80105220 default_idle 84418 1319.0312
> 00000000 total 103516 0.0560
>
> - O_NOCOPY hack: (userspace doesn't actually get the read data)
> 801f9400 scsi_init_io_vc 785 2.1332
> 8013c840 try_to_free_buffers 950 2.6989
> 801f72a0 __scsi_end_request 966 3.3542
> 801cbb40 generic_make_request 1017 3.5312
> 8013bf10 end_buffer_io_kiobuf 1672 26.1250
> 8013a190 end_buffer_io_async 1693 8.8177
> 8013bfc0 brw_kiovec 1906 2.2909
> 801cb4e0 __make_request 13682 8.3836
> 80105220 default_idle 112345 1755.3906
> 00000000 total 144891 0.0784
Can you use -k4? -k3 sorts by the number of hits per function, but we
should take the size of the function into account too. Otherwise small
functions won't show up.
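As a rough illustration of the difference between the two sort keys:
readprofile -v prints address, symbol, hit count and a load figure
(hits normalised by function size), so -k3 orders by raw hits and -k4 by
the normalised figure. The sketch below is only an illustration; the
struct and function names are made up, and the sample rows are copied
from the no-O_DIRECT trace quoted above.

```c
#include <stdlib.h>

/* One row of `readprofile -v` output (hypothetical struct, for illustration). */
struct profile_row {
    const char *symbol;
    unsigned long hits;  /* column 3: raw tick count */
    double load;         /* column 4: hits normalised by function size */
};

/* Ascending order by raw hits, like `sort -n -k3`. */
static int cmp_by_hits(const void *a, const void *b)
{
    const struct profile_row *x = a, *y = b;
    return (x->hits > y->hits) - (x->hits < y->hits);
}

/* Ascending order by normalised load, like `sort -n -k4`. */
static int cmp_by_load(const void *a, const void *b)
{
    const struct profile_row *x = a, *y = b;
    return (x->load > y->load) - (x->load < y->load);
}

/* Sample rows taken from the no-O_DIRECT trace in this thread. */
static struct profile_row sample_rows[] = {
    { "_spin_lock_",          718,  6.4107 },
    { "brw_kiovec",           798,  0.9591 },
    { "end_buffer_io_async", 2453, 12.7760 },
    { "file_read_actor",     3459, 36.0312 },
    { "__make_request",      7532,  4.6152 },
};
#define SAMPLE_COUNT (sizeof(sample_rows) / sizeof(sample_rows[0]))

/* Sort the sample rows with the given comparator and return the last
 * (hottest) row under that ordering. */
static const struct profile_row *hottest(int (*cmp)(const void *, const void *))
{
    qsort(sample_rows, SAMPLE_COUNT, sizeof(sample_rows[0]), cmp);
    return &sample_rows[SAMPLE_COUNT - 1];
}
```

On these five rows, ordering by hits puts __make_request last, while
ordering by load puts the much smaller file_read_actor last.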
Can you also give a spin to the same benchmark with 2.4.19pre8aa2? It
has the vary-io stuff from Badari and further kiobuf optimization from
Chuck. (vary-io will work only with aic and qlogic, enabling it is a one
liner if the driver is just ok with variable bh->b_size in the same I/O
request). The right fix for avoiding the flood of small bh is bio in 2.5,
for 2.4 vary-io should be fine.
thanks,
Andrea
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-10 12:36 ` Andrea Arcangeli
@ 2002-05-11 3:23 ` Lincoln Dale
2002-05-13 11:19 ` Andrea Arcangeli
2002-05-12 11:23 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56) Lincoln Dale
1 sibling, 1 reply; 265+ messages in thread
From: Lincoln Dale @ 2002-05-11 3:23 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Andrew Morton, Kernel Mailing List
At 02:36 PM 10/05/2002 +0200, Andrea Arcangeli wrote:
> > being fair to O_DIRECT and giving it 1mbyte disk-reads to work with and
> > giving normal i/o 8kbyte reads to work with.
..
>is any of the disks mounted?
no.
for the O_DIRECT tests i also didn't have the MD driver touching
them. (ie. raidstop /dev/md[0-1]).
> > O_DIRECT is still a ~30% performance hit versus just talking to the
> > /dev/sdX device directly. profile traces at bottom.
> >
> > normal block-device disks sd[m-r] without O_DIRECT, 64K x 8k reads:
> > [root@mel-stglab-host1 src]# readprofile -r;
> > ./test_disk_performance blocks=64K bs=8k /dev/sd[m-r]
> > Completed reading 12000 mbytes in 125.028612 seconds (95.98
> > Mbytes/sec), 76usec mean
>
>can you post your test_disk_performance program
i'll post the program later on this weekend. (it's suffering from continual
scope-creep and additional development. :) ).
but basically, it's similar to 'dd' except that it works on multiple devices
simultaneously.
operation consists of sequential-reads or sequential-writes.
its main loop basically consists entirely of:
    /* loop thru blocks */
    for (blocknum=0; blocknum < blocks; blocknum++) {
        /* loop thru devices */
        for (devicenum=0; devicenum < num_devices; devicenum++) {
            before_time = time_tick();
            if (operation == 0) {
                /* read op */
                amt_read = read(fd[devicenum],
                                aligned_buffer[devicenum], block_size);
            } else {
                /* write-op */
                amt_read = write(fd[devicenum],
                                 aligned_buffer[devicenum], block_size);
            }
            after_time = time_tick();
            [check amt_read == block_size, calculate time
             histograms]
        }
    }
the open call consists of:
    for (i=0; i < num_devices; i++) {
        flags = (O_RDWR | O_LARGEFILE);
        if (nocopy) flags |= O_NOCOPY;
        if (direct) flags |= O_DIRECT;
        fd[i] = open(devices[i], flags);
        ...
i've since 'expanded' its functionality a bit so that i can do tests where
i'm rate-limiting different devices to different limits, variable
read/write/seeks, etc etc.
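The loop above reads into aligned_buffer[], and O_DIRECT and raw I/O
generally expect the user buffer to be suitably aligned. The thread does
not show how the program allocates these buffers, so the following is
only a sketch of one way to do it, using posix_memalign(). The helper
names and the choice of alignment value are assumptions, not taken from
test_disk_performance.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* for posix_memalign() */
#endif
#include <stdint.h>
#include <stdlib.h>

/*
 * Allocate `size` bytes aligned to `alignment` bytes.
 * `alignment` must be a power of two and a multiple of sizeof(void *).
 * Returns NULL on failure; release the buffer with free().
 * (Hypothetical helper, for illustration only.)
 */
static void *alloc_aligned_buffer(size_t alignment, size_t size)
{
    void *buffer = NULL;

    if (posix_memalign(&buffer, alignment, size) != 0)
        return NULL;
    return buffer;
}

/* Return 1 if `pointer` is a multiple of `alignment`, 0 otherwise. */
static int is_aligned(const void *pointer, size_t alignment)
{
    return alignment != 0 && ((uintptr_t)pointer % alignment) == 0;
}
```

A caller would allocate one such buffer per device, for example
alloc_aligned_buffer(4096, block_size), where 4096 is just a
conservative guess at a safe alignment.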
>so I in particular we
>can see the semantics of blocks and bs? 64k*8k == 5k * 1M / 10.
K == 1000
k == 1024
M == 1000*1000
m == 1024*1024
g == 1024*1024*1024
G == 1000*1000*1000
so the above is:
blocks = 64K, bs=8k means 64000 x 8192-byte read()s = 524288000 bytes
blocks = 5K, bs=1m means 5000 x 1048576-byte read()s = 5242880000 bytes
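The suffix convention can be sketched as a small C helper. This is an
illustration only: parse_count() and total_bytes() are hypothetical
names, not functions from test_disk_performance, and the sketch does not
check for overflow.

```c
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a count such as "64K" or "8k" using the convention listed above:
 * upper-case K/M/G are powers of 1000, lower-case k/m/g powers of 1024.
 * Returns 0 for an unrecognised or trailing-garbage suffix.
 */
static uint64_t parse_count(const char *text)
{
    char *end;
    uint64_t value = strtoull(text, &end, 10);
    uint64_t multiplier;

    switch (*end) {
    case '\0': return value;                      /* no suffix */
    case 'K':  multiplier = 1000ULL; break;
    case 'k':  multiplier = 1024ULL; break;
    case 'M':  multiplier = 1000ULL * 1000; break;
    case 'm':  multiplier = 1024ULL * 1024; break;
    case 'G':  multiplier = 1000ULL * 1000 * 1000; break;
    case 'g':  multiplier = 1024ULL * 1024 * 1024; break;
    default:   return 0;
    }
    /* Accept exactly one suffix character. */
    return end[1] == '\0' ? value * multiplier : 0;
}

/* Bytes transferred per device by `blocks` read()s of `bs` bytes each. */
static uint64_t total_bytes(const char *blocks, const char *bs)
{
    return parse_count(blocks) * parse_count(bs);
}
```

With these definitions total_bytes("64K", "8k") gives 524288000 and
total_bytes("5K", "1m") gives 5242880000, matching the two lines above.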
>O_DIRECT has to do some more work to check for the coherency with the
>pagecache and it has some more overhead with the address space
>operations, but O_DIRECT by default uses the blocksize of the blkdev,
>that is set to 1k by default (if you never mounted it) versus the
>hardblocksize of 512bytes used by the raw device (assuming the sd[m-r]
>aren't mounted).
i wonder if the MD driver set it to 512 bytes if it has been touched.
i'll reboot the box after each test to validate. (which, unfortunately, is
about a 10 minute reboot cycle for 22 x SCSI disks and 16 FC disks).
>This is most probably why O_DIRECT is faster than raw.c, otherwise they
>would run almost at the same rate, the pagecache coherency fast paths
>and the address space ops overhead of O_DIRECT shouldn't be noticeable.
as the statistics show, O_DIRECT is about 5% superior to raw.c.
> > of course, these are all ~25% worse than if a mechanism of performing the
> > i/o avoiding the copy_to_user() altogether:
> > [root@mel-stglab-host1 src]# readprofile -r;
> > ./test_disk_performance blocks=64K bs=8k nocopy /dev/sd[m-r]
> > Completed reading 12000 mbytes in 97.846938 seconds (122.64
> > Mbytes/sec), 59usec mean
>
>the nocopy hack is not an interesting test for O_DIRECT/rawio, it
>doesn't walk pagetables, it doesn't allow the DMA to be done into
>userspace memory. If you want the pagecache to be visible into userspace
>(i.e. MAP_PRIVATE/MAP_SHARED) you must deal with pagetables somehow,
>and if you want the read/write syscalls to DMA directly into userspace
>memory (raw/O_DIRECT) you must still walk pagetables during those
the nocopy hack is interesting from the point-of-view of seeing what the
copy_to_user() overhead actually is.
it is interesting to compare that to O_DIRECT.
i agree that doing pagecache-visible-in-userspace is hard to get right and
to do it fast.
but i'm not proposing any such development.
what i am thinking is "interesting" is for privileged programs which can
mmap() /dev/mem and have some async-i/o scheme which returns back
physical-address information about blocks.
sure, it has a lot of potential-security-issues associated with it, and
isn't useful for anything but really big-iron programs, but so have other
schemes that involve "lets put this userspace module in the kernel to avoid
user<->kernel copies".
>syscalls before starting the DMA. If you don't want to explicitly deal
>with the pagetables then you need to copy_user (case 1). In most archs
>where mem bandwith is very expensive avoiding the copy-user is a big
>global win (other cpus won't collapse in smp etc..).
>
>Your nocopy hack benchmark has some relevance only for usages of the
>data done by kernel. So if it is the kernel that reads the data directly
>from pagecache (i.e. a kernel module), then your nocopy benchmark
>matters. For example your nocopy benchmark also matters for sendfile
>zerocopy, it will read at 122M/sec. But if it's userspace supposed to
>receive the data (so not directly from pagecache on the kernel direct
>mapping, but in userspace mapped memory) it cannot be 122M/sec, it has
>to be less due the user address space management.
i guess i simply see that there are a bunch of possible big-iron programs
which:
- read from [raw] disk
- write results to network
- don't actually look at the payload
a few programs like this that come to mind are:
- Samba
- (user-space) NFS
- [HTTP] caching software
> > comparative profile=2 traces:
...
>Can you use -k4? this is the number of hits per function, but we should
>take the size of the function into account too. Otherwise small
>functions won't show up.
will do.
>Can you also give a spin to the same benchmark with 2.4.19pre8aa2? It
>has the vary-io stuff from Badari and futher kiobuf optimization from
>Chuck.
will do so.
>(vary-io will work only with aic and qlogic, enabling it is a one
>liner if the driver is just ok with variable bh->b_size in the same I/O
>request). right fix for avoiding the flood of small bh is bio in 2.5,
>for 2.4 vary-io should be fine.
i'm using the qlogic HBA driver from their web-site rather than the current
driver in the kernel which doesn't function with the 2gbit/s HBAs.
care to point out the line i should be looking for to change?
cheers,
lincoln.
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-11 3:23 ` Lincoln Dale
@ 2002-05-13 11:19 ` Andrea Arcangeli
2002-05-13 23:58 ` Lincoln Dale
0 siblings, 1 reply; 265+ messages in thread
From: Andrea Arcangeli @ 2002-05-13 11:19 UTC (permalink / raw)
To: Lincoln Dale; +Cc: Andrew Morton, Kernel Mailing List
On Sat, May 11, 2002 at 01:23:11PM +1000, Lincoln Dale wrote:
> At 02:36 PM 10/05/2002 +0200, Andrea Arcangeli wrote:
> >> being fair to O_DIRECT and giving it 1mbyte disk-reads to work with and
> >> giving normal i/o 8kbyte reads to work with.
> ..
> >is any of the disks mounted?
>
> no.
> for the O_DIRECT tests i also didn't have the MD driver touching
> them. (ie. raidstop /dev/md[0-1]).
>
> >> O_DIRECT is still a ~30% performance hit versus just talking to the
> >> /dev/sdX device directly. profile traces at bottom.
> >>
> >> normal block-device disks sd[m-r] without O_DIRECT, 64K x 8k reads:
> >> [root@mel-stglab-host1 src]# readprofile -r;
> >> ./test_disk_performance blocks=64K bs=8k /dev/sd[m-r]
> >> Completed reading 12000 mbytes in 125.028612 seconds (95.98
> >> Mbytes/sec), 76usec mean
> >
> >can you post your test_disk_performance program
>
> i'll post the program later on this weekend. (its suffering from continual
> scope-creap and additional development. :) ).
>
> but basically, its similar to 'dd' except works on multiple devices
> simultaneously.
> operation consists of sequential-reads or sequential-writes.
>
> its main loop basically consists entirely of:
> /* loop thru blocks */
> for (blocknum=0; blocknum < blocks; blocknum++) {
> /* loop thru devices */
> for (devicenum=0; devicenum < num_devices; devicenum++) {
> before_time = time_tick();
> if (operation == 0) {
> /* read op */
> amt_read = read(fd[devicenum],
> aligned_buffer[devicenum], block_size);
> } else {
> /* write-op */
> amt_read = write(fd[devicenum],
> aligned_buffer[devicenum], block_size);
> }
> after_time = time_tick();
>
> [check amt_read == block_size, calculate time
> histograms]
> }
> }
>
> the open call consists of:
> for (i=0; i < num_devices; i++) {
> flags = (O_RDWR | O_LARGEFILE);
> if (nocopy) flags |= O_NOCOPY;
> if (direct) flags |= O_DIRECT;
> fd[i] = open(devices[i], flags);
> ...
>
> i've since 'expanded' its functionality a bit so that i can do tests where
> i'm rate-limiting different devices to different limits, variable
> read/write/seeks, etc etc.
>
> >so I in particular we
> >can see the semantics of blocks and bs? 64k*8k == 5k * 1M / 10.
>
> K == 1000
> k == 1024
> M == 1000*1000
> m == 1024*1024
> g == 1024*1024*1024
> G == 1000*1000*1000
>
> so the above is:
> blocks = 64K, bs=8k means 64000 x 8192-byte read()s = 524288000 bytes
> blocks = 5K, bs=1m means 5000 x 1048576-byte read()s = 5242880000 bytes
if the program is doing only what is shown in the main loop, then you're
reading 10 times more data with O_DIRECT, that was my point in saying
64k*8k == 5k * 1M / 10, but I assume you took it into account (otherwise
it means O_DIRECT is just 5 times faster than buffered-I/O for you)
Also I would suggest measuring the time taken by the whole workload, not
only the time for read/write syscalls.
>
> >O_DIRECT has to do some more work to check for the coherency with the
> >pagecache and it has some more overhead with the address space
> >operations, but O_DIRECT by default uses the blocksize of the blkdev,
> >that is set to 1k by default (if you never mounted it) versus the
> >hardblocksize of 512bytes used by the raw device (assuming the sd[m-r]
> >aren't mounted).
>
> i wonder if the MD driver set it to 512 bytes if it has been touched.
it is set to 512 bytes.
> i'll reboot the box after each test to validate. (which, unfortunately, is
> about a 10 minute reboot cycle for 22 x SCSI disks and 16 FC disks).
>
> >This is most probably why O_DIRECT is faster than raw.c, otherwise they
> >would run almost at the same rate, the pagecache coherency fast paths
> >and the address space ops overhead of O_DIRECT shouldn't be noticeable.
>
> as the statistics show, O_DIRECT is about 5% superior to raw.c.
yep, as said that's because O_DIRECT uses a b_size as large as the
softblocksize (1k), while raw uses the hardblocksize, which is 512 bytes, so
raw spends twice as much memory and cpu handling those lists of smaller bh.
O_DIRECT is therefore a fair comparison with the buffered-IO, since the
buffered-IO also uses a 1k b_size.
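To put a number on the factor of two, here is a rough sketch. It assumes
one bh per b_size-sized chunk of the request, which simplifies what the
kernel actually does, and bh_per_request() is a made-up helper, not a
kernel function.

```c
#include <stddef.h>

/*
 * Number of buffer heads needed to cover one request if every bh carries
 * b_size bytes. Rounds up; returns 0 if b_size is 0.
 */
static size_t bh_per_request(size_t request_bytes, size_t b_size)
{
    if (b_size == 0)
        return 0;
    return (request_bytes + b_size - 1) / b_size;
}
```

For a 1 megabyte read, bh_per_request(1048576, 1024) is 1024 while
bh_per_request(1048576, 512) is 2048, so under this assumption the raw
device handles twice as many bh for the same amount of data.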
>
> >> of course, these are all ~25% worse than if a mechanism of performing the
> >> i/o avoiding the copy_to_user() altogether:
> >> [root@mel-stglab-host1 src]# readprofile -r;
> >> ./test_disk_performance blocks=64K bs=8k nocopy /dev/sd[m-r]
> >> Completed reading 12000 mbytes in 97.846938 seconds (122.64
> >> Mbytes/sec), 59usec mean
> >
> >the nocopy hack is not an interesting test for O_DIRECT/rawio, it
> >doesn't walk pagetables, it doesn't allow the DMA to be done into
> >userspace memory. If you want the pagecache to be visible into userspace
> >(i.e. MAP_PRIVATE/MAP_SHARED) you must deal with pagetables somehow,
> >and if you want the read/write syscalls to DMA directly into userspace
> >memory (raw/O_DIRECT) you must still walk pagetables during those
>
> the nocopy hack is interesting from the point-of-view of seeing what the
> copy_to_user() overhead actually is.
> it is interesting to compare that to O_DIRECT.
>
> i agree that doing pagecache-visible-in-userspace is hard to get right and
> to do it fast.
> but i'm not proposing any such development.
Yes, I only wanted to make clear the no-copy hack will always be faster
than anything that ends up putting the data in user memory (or providing
information about where the data in userspace is) somehow.
> what i am thinking is "interesting" is for privileged programs which can
> mmap() /dev/mem and have some async-i/o scheme which returns back
> physical-address information about blocks.
You'd at least need to reserve only a contiguous part of the physical
pages for that purpose, or you would run out of virtual address space on a
32bit arch; that is just a problem with fragmentation. Secondly, you should
use such a mmapped /dev/mem area as the backing store for the application
cache, or it's again a copy-user. It seems very messy. I think writing a
software TLB for a certain special VMA, allowing the resolution of a virtual
address in the VMA to a struct page with a very efficient lookup, would be
better if the aim is to skip the overhead of the pagetable management. And
it doesn't need special userspace hacks with a horrible API.
> sure, it has a lot of potential-security-issues associated with it, and
> isn't useful for anything but really big-iron program, but so has other
> schemes that involve "lets put this userspace module in the kernel to avoid
> user<->kernel copies".
>
> >syscalls before starting the DMA. If you don't want to explicitly deal
> >with the pagetables then you need to copy_user (case 1). In most archs
> >where mem bandwith is very expensive avoiding the copy-user is a big
> >global win (other cpus won't collapse in smp etc..).
> >
> >Your nocopy hack benchmark has some relevance only for usages of the
> >data done by kernel. So if it is the kernel that reads the data directly
> >from pagecache (i.e. a kernel module), then your nocopy benchmark
> >matters. For example your nocopy benchmark also matters for sendfile
> >zerocopy, it will read at 122M/sec. But if it's userspace supposed to
> >receive the data (so not directly from pagecache on the kernel direct
> >mapping, but in userspace mapped memory) it cannot be 122M/sec, it has
> >to be less due the user address space management.
>
> i guess i simply see that there are a bunch of possible big-iron programs
> which:
> - read from [raw] disk
> - write results to network
> - don't actually look at the payload
>
> a few program like this that come to mind are:
> - Samba
> - (user-space) NFS
> - [HTTP] caching software
>
> >> comparative profile=2 traces:
> ...
> >Can you use -k4? this is the number of hits per function, but we should
> >take the size of the function into account too. Otherwise small
> >functions won't show up.
>
> will do.
>
> >Can you also give a spin to the same benchmark with 2.4.19pre8aa2? It
> >has the vary-io stuff from Badari and futher kiobuf optimization from
> >Chuck.
>
> will do so.
>
> >(vary-io will work only with aic and qlogic, enabling it is a one
> >liner if the driver is just ok with variable bh->b_size in the same I/O
> >request). right fix for avoiding the flood of small bh is bio in 2.5,
> >for 2.4 vary-io should be fine.
>
> i'm using the qlogic HBA driver from their web-site rather than the current
> driver in the kernel which doesn't function with the 2gbit/s HBAs.
> care to point out the line i should be looking for to change?
Sure just search the .h file for something like this:
#define QLOGICISP { \
detect: isp1020_detect, \
release: isp1020_release, \
info: isp1020_info, \
queuecommand: isp1020_queuecommand, \
abort: isp1020_abort, \
reset: isp1020_reset, \
bios_param: isp1020_biosparam, \
can_queue: QLOGICISP_REQ_QUEUE_LEN, \
this_id: -1, \
sg_tablesize: QLOGICISP_MAX_SG(QLOGICISP_REQ_QUEUE_LEN), \
cmd_per_lun: 1, \
present: 0, \
unchecked_isa_dma: 0, \
use_clustering: DISABLE_CLUSTERING, \
can_do_varyio: 1, \
}
and add can_do_varyio: 1 as in the above excerpt from qlogicisp.h.
(btw, vary-io will then allow more efficient I/O handling than the
buffered-IO, so the comparison would become unfair to the underpowered
buffered-IO, but it would still be interesting to see the effect of varyio
in numbers)
You also mentioned the md device. If you do I/O to a raid0 array with 5 disks
attached to an MD device then your buffersize with O_DIRECT must be 5*512k at
least, or you cannot send big scsi commands to each scsi disk and
performance would be very bad compared to reading 1M from each /dev/sd
separately.
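The sizing rule can be written down as a tiny sketch. It assumes that a
request is spread evenly over the member disks, which ignores chunk-size
and alignment effects in the MD driver; both helper names are
hypothetical.

```c
#include <stddef.h>

/* Smallest O_DIRECT buffer that still gives each member disk
 * `per_disk_bytes` of I/O in a single request. */
static size_t raid0_buffer_bytes(size_t nr_disks, size_t per_disk_bytes)
{
    return nr_disks * per_disk_bytes;
}

/* Bytes that land on each member disk for one request of `buffer_bytes`,
 * assuming an even split. Returns 0 if there are no disks. */
static size_t per_disk_share(size_t buffer_bytes, size_t nr_disks)
{
    return nr_disks ? buffer_bytes / nr_disks : 0;
}
```

For the five-disk example, raid0_buffer_bytes(5, 512 * 1024) is 2621440
bytes (2.5 megabytes), whereas a 1 megabyte buffer would leave each disk
with only per_disk_share(1048576, 5) = 209715 bytes per request.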
One thing I would also recommend is to write a threaded version of the program
that reads or writes to all the /dev/sd disks simultaneously, first w/ O_DIRECT
then w/o O_DIRECT. The reason is that currently you aren't using all the disks
at once with O_DIRECT due to the lack of async-io, while for example
async-writeback w/o O_DIRECT allows better scaling over the disks. Not to
mention that benchmarking only the duration of the syscall doesn't sound
accurate if async I/O happens while userspace runs (userspace can get stalled
by completion interrupts etc.. and you're not measuring such overhead).
If you instead make a single raid0 array and you use a buffer size of nr_PV*512k
(or nr_PV*1M even better) then also O_DIRECT without threading should perform
similarly to the buffered IO.
In my measurements the lack of async-io with O_DIRECT (with a single disk)
wasn't significant in the bandwidth numbers, let's say a few percent slower than
the buffered-IO, but the CPU utilization and mem bandwidth was so much optimized
that I thought it definitely pays off even now without a kernel side async-io
(note that I wasn't doing simultaneous I/O to multiple devices, furthermore the
bandwidth of the membus was not shared with any other workload).
Since you "stripe" by hand in all disks you do a different workload than my
previous benchmarks and you definitely want to keep all the harddisks running
at the same time. I would also suggest benchmarking a single disk, to see if
there is still such a big performance difference (again: including the cost
outside the syscalls too).
Thanks for the interesting big-iron number-feedback :)
Andrea
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-13 11:19 ` Andrea Arcangeli
@ 2002-05-13 23:58 ` Lincoln Dale
2002-05-14 0:22 ` Andrea Arcangeli
0 siblings, 1 reply; 265+ messages in thread
From: Lincoln Dale @ 2002-05-13 23:58 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Andrew Morton, Kernel Mailing List
At 01:19 PM 13/05/2002 +0200, Andrea Arcangeli wrote:
> > so the above is:
> > blocks = 64K, bs=8k means 64000 x 8192-byte read()s = 524288000 bytes
> > blocks = 5K, bs=1m means 5000 x 1048576-byte read()s = 5242880000 bytes
>
>if the program is doing only what shown in the main loop, then you're
>reading 10 times more data with O_DIRECT, that was my point in saying
>64k*8k == 5k * 1M / 10, but I assume you took it into account (otherwise
>it means O_DIRECT is just 5 times faster than buffered-I/O for you)
no- count them up above.
without O_DIRECT i was doing: 64000 x 8192 = 524288000 bytes
with O_DIRECT i was doing: 5000 x 1048576 = 524288000 bytes.
ie. same amount of data.
regardless, the "mbyte/sec" is calculated at the very end.
>Also I would suggest to measure the time taken by the whole workload, not only
>the time for read/write syscalls.
the time taken for the whole workload _is_ calculated at the very end of
the workload. that way any "readahead" doesn't have an unfair advantage.
i also have gettimeofday() calls to measure latency on a per-read or
per-write basis
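A minimal sketch of the two measurements described here: per-call
latency from a pair of gettimeofday() samples, and whole-workload
throughput computed once at the end. The helper names are assumptions
for illustration; the actual program's accounting code has not been
posted in this thread.

```c
#include <sys/time.h>

/* Microseconds elapsed between two gettimeofday() samples. */
static long long elapsed_usec(const struct timeval *start,
                              const struct timeval *end)
{
    return (long long)(end->tv_sec - start->tv_sec) * 1000000LL
         + (long long)(end->tv_usec - start->tv_usec);
}

/*
 * Whole-workload throughput in mbytes/sec, given the total amount moved
 * and the wall-clock time from first request to last completion.
 * Returns 0.0 for a non-positive duration.
 */
static double throughput_mbytes_per_sec(double mbytes, long long usec)
{
    if (usec <= 0)
        return 0.0;
    return mbytes * 1e6 / (double)usec;
}
```

As a sanity check against the earlier results, 12000 mbytes in
125.028612 seconds works out to roughly 95.98 mbytes/sec with this
formula.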
...
>One thing I would also recommend is to write a threaded version of the
>program,
>that reads or writes to all the /dev/sd disks simultaneously, first w/
>O_DIRECT
>then w/o O_DIRECT. The reason is that currently you aren't using all the disks
>at once with O_DIRECT due the lack of async-io
i've thought about doing that - shame that there isn't an async version of
read(). the hard part is that i want to keep the disks at roughly the same
"block" at the same time, so there will still need to be some
synchronization of threads.
otherwise, basic SCSI id priority means that it won't be fair across all
disks. (remember that i'm also attempting to measure latency)
...
>Since you "stripe" by hand in all disks you do a different workload than my
>previous benchs and you definitely want to keep all the harddisk running
>at the
>same time. I would also suggest to benchmark a single disk, to see if there is
>still such a big performance difference (again: including the cost outside the
>syscalls too).
a single-disk means we hit the performance limits of a single disk
spindle. ie. around 45mbyte/sec sustained throughput.
if you don't think there's any real overhead in the MD driver, i'll just
use that instead for now. (raid-0)
>Thanks for the interesting big-iron number-feedback :)
kinda fun. :-)
cheers,
lincoln.
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-13 23:58 ` Lincoln Dale
@ 2002-05-14 0:22 ` Andrea Arcangeli
2002-05-14 2:43 ` O_DIRECT on 2.4.19pre8aa2 md device Lincoln Dale
0 siblings, 1 reply; 265+ messages in thread
From: Andrea Arcangeli @ 2002-05-14 0:22 UTC (permalink / raw)
To: Lincoln Dale; +Cc: Andrew Morton, Kernel Mailing List
On Tue, May 14, 2002 at 09:58:31AM +1000, Lincoln Dale wrote:
> At 01:19 PM 13/05/2002 +0200, Andrea Arcangeli wrote:
> >> so the above is:
> >> blocks = 64K, bs=8k means 64000 x 8192-byte read()s = 524288000 bytes
> >> blocks = 5K, bs=1m means 5000 x 1048576-byte read()s = 5242880000 bytes
> >
> >if the program is doing only what shown in the main loop, then you're
> >reading 10 times more data with O_DIRECT, that was my point in saying
> >64k*8k == 5k * 1M / 10, but I assume you took it into account (otherwise
> >it means O_DIRECT is just 5 times faster than buffered-I/O for you)
>
> no- count them up above.
> without O_DIRECT i was doing: 64000 x 8192 = 524288000 bytes
> with O_DIRECT i was doing: 5000 x 1048576 = 524288000 bytes.
>
> ie. same amount of data.
I don't mind the 1024/1000 difference, in this context let's assume
k=K,m=M and g=G, I only mind the 1 order of magnitude difference. python
tells me:
w/o O_DIRECT 64000 * 8192 = 524288000
w/ O_DIRECT 5000 * 1048576 = 5242880000
^ note the additional zero
they're both _bytes_, it's apples against apples.
You instead wrote:
without O_DIRECT i was doing: 64000 x 8192 = 524288000
with O_DIRECT i was doing: 5000 x 1048576 = 524288000
in the above you're losing a 0 at the end in the O_DIRECT case.
> regardless, the "mbyte/sec" is calculated at the very end.
So I assume the 1 order of magnitude difference didn't affect the
benchmark results anyways. That's to be expected: otherwise, as said in
the earlier email, O_DIRECT would be just running 5 times faster than
non-O_DIRECT :).
> >Also I would suggest to measure the time taken by the whole workload, not
> >only
> >the time for read/write syscalls.
>
> the time taken for the whole workload _is_ calculated at the very end of
> the workload. that way and "readahead" doesn't have an unfair advantage.
> i also have gettimeofday() calls to measure latency on a per-read or
> per-write basis
Ok fine, by only reading the pseudocode of the main loop it wasn't
obvious (I mistook the latency accounting for the global throughput
accounting). thanks for the clarification.
> ...
> >One thing I would also recommend is to write a threaded version of the
> >program,
> >that reads or writes to all the /dev/sd disks simultaneously, first w/
> >O_DIRECT
> >then w/o O_DIRECT. The reason is that currently you aren't using all the
> >disks
> >at once with O_DIRECT due the lack of async-io
>
> i've thought about doing that - shame that there isn't an async version of
> read(). the hard part is that i want to keep the disks at roughly the same
> "block" at the same time, so there will still need to be some
> syncronization of threads.
> otherwise, basic SCSI id priority means that it won't be fair across all
> disks. (remeber that i'm also attempting to measure latency)
>
> ...
> >Since you "stripe" by hand in all disks you do a different workload than my
> >previous benchs and you definitely want to keep all the harddisk running
> >at the
> >same time. I would also suggest to benchmark a single disk, to see if there
> >is
> >still such a big performance difference (again: including the cost outside
> >the
> >syscalls too).
>
> a single-disk means we hit the performance limits of a single disk
> spindle. ie. around 45mbyte/sec sustained throughput.
> if you don't think there's any real overhead in the MD driver, i'll just
> use that instead for now. (raid-0)
I think raid0 is a good start for keeping all disks running at the same
time with O_DIRECT too (just make sure to use a large buffer, nr_PV*512k or
nr_PV*1M, to allow the generation of large dma transactions to each
disk). The overhead of raid-0 shouldn't be noticeable; it should be
minimal compared to the overhead of the 1k bhs.
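
As a rough illustration of the sizing rule above, the per-syscall buffer can be derived from the number of disks in the stripe. This is only a sketch: the function names are made up here, and the count of 8 disks is just an example value.

```python
KIB = 1024
MIB = 1024 * KIB

def stripe_buffer_bytes(nr_pv, bytes_per_disk):
    """Buffer size for one O_DIRECT request that covers every disk once."""
    return nr_pv * bytes_per_disk

def per_disk_share(buffer_bytes, nr_pv):
    """Part of one request that lands on each disk of an evenly striped array."""
    return buffer_bytes // nr_pv

nr_pv = 8  # example value only
print(stripe_buffer_bytes(nr_pv, 512 * KIB) // MIB)  # 4
print(stripe_buffer_bytes(nr_pv, 1 * MIB) // MIB)    # 8
print(per_disk_share(4 * MIB, nr_pv) // KIB)         # 512
```
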
Andrea
* O_DIRECT on 2.4.19pre8aa2 md device
2002-05-14 0:22 ` Andrea Arcangeli
@ 2002-05-14 2:43 ` Lincoln Dale
2002-05-21 15:51 ` Andrea Arcangeli
0 siblings, 1 reply; 265+ messages in thread
From: Lincoln Dale @ 2002-05-14 2:43 UTC (permalink / raw)
To: Andrea Arcangeli, Andrew Morton, Kernel Mailing List
g'day,
At 02:22 AM 14/05/2002 +0200, Andrea Arcangeli wrote:
>I think raid0 is a good start to make all disks running at the same time
>for O_DIRECT too (only make sure to use a buffer large nr_PV*512k or
same hardware as before -- dual P3 Xeon (733MHz), 133MHz FSB, 2G PC133 SDRAM.
this time, a raid-0 array using MD driver across 8 x 18G 15K RPM disks. md
driver is using "128k chunks".
kernel is 2.4.19pre8aa2 with the qlogic 2300 HBA driver compiled with
vary_io set to 1. FC network is all 2gbit/s. no highmem.
kernel is booted using "profile=2" and has lockmeter compiled in also.
system rebooted after each test.
i promise it's the same amount of data for each test this time: :-)
O_DIRECT      blocksize = 4 megabytes, blocks = 28000:
              112000 mbytes in 977.869706 seconds (120.10 Mbytes/sec)
'raw'         blocksize = 4 megabytes, blocks = 28000:
              112000 mbytes in 1659.551271 seconds (70.77 Mbytes/sec)
base          blocksize = 8 kilobytes, blocks = 14336000:
              112000 mbytes in 918.287570 seconds (127.89 Mbytes/sec)
nocopy hack:  blocksize = 8 kilobytes, blocks = 14336000:
              112000 mbytes in 671.560772 seconds (174.88 Mbytes/sec)
net-effect is that O_DIRECT still has a performance hit versus base, 'raw'
just sucks wind versus the others, and even 'nocopy' cannot hit line-rate on
the fibre-channel card. (it's possible to hit 205mbytes/sec using sg_tools
sg_read or sg_dd).
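
A note on units: the Mbytes/sec figures above can be reproduced only if the "112000 mbytes" total is counted in units of 2^20 bytes while the rate is reported in units of 10^6 bytes per second. That is an inference from the printed numbers, not something stated about test_disk_performance; the helper name below is made up for the check.

```python
MIB = 1 << 20

def rate_mb_per_sec(blocks, block_bytes, seconds):
    """Throughput in units of 10**6 bytes per second."""
    return blocks * block_bytes / seconds / 1e6

# (blocks, block size in bytes, elapsed seconds) as reported above
runs = {
    "O_DIRECT": (28000, 4 * MIB, 977.869706),
    "raw": (28000, 4 * MIB, 1659.551271),
    "base": (14336000, 8192, 918.287570),
    "nocopy": (14336000, 8192, 671.560772),
}

for name, (blocks, block_bytes, seconds) in runs.items():
    total_mib = blocks * block_bytes // MIB
    rate = rate_mb_per_sec(blocks, block_bytes, seconds)
    print(f"{name:9s} {total_mib} MiB  {rate:.2f} MB/s")
```

Dividing 112000 by the elapsed time directly would instead give about 114.5 for the O_DIRECT run, which does not match the printed 120.10.
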
O_DIRECT:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=4m blocks=28000 direct /dev/md0 > /tmp/vary_direct.txt; readprofile -v | sort -n -k4 >> /tmp/vary_direct.txt
Completed reading 112000 mbytes in 977.869706 seconds (120.10 Mbytes/sec), 34849usec mean
[root@mel-stglab-host1 tmp]# tail -20 vary_direct.txt
8012aa50 mark_dirty_kiobuf 234 2.0893
8013f0e0 set_bh_page 134 2.0938
801d28b0 generic_make_request 785 2.5822
80136d40 __free_pages 137 2.8542
80142a10 max_block 406 3.1719
8011f950 do_softirq 724 3.2321
801405d0 brw_kiovec 3219 3.5296
80271370 md_make_request 484 4.3214
80200fb0 __scsi_end_request 1321 4.3454
8023d670 sd_find_queue 334 5.2188
80142c80 blkdev_get_block 358 5.5938
80140560 wait_kio 690 6.1607
80152820 end_kio_request 601 7.5125
80267320 raid0_make_request 3059 9.1042
8013e950 init_buffer 310 9.6875
801d29e0 submit_bh 1274 11.3750
801d22a0 __make_request 20967 13.5097
8013dd10 unlock_buffer 1283 16.0375
80140520 end_buffer_io_kiobuf 2946 46.0312
80106d20 default_idle 151886 2373.2188
'raw':
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=4m blocks=28000 /dev/raw/raw1 > /tmp/vary_raw.txt; readprofile -v | sort -n -k4 >> /tmp/vary_raw.txt
Completed reading 112000 mbytes in 1659.551271 seconds (70.77 Mbytes/sec), 59167usec mean
[root@mel-stglab-host1 src]# tail -20 /tmp/vary_raw.txt
8012a740 get_user_pages 636 1.3707
80203890 scsi_init_io_vc 989 1.8180
80136d40 __free_pages 126 2.6250
8012aa50 mark_dirty_kiobuf 300 2.6786
8011f950 do_softirq 836 3.7321
801d28b0 generic_make_request 1727 5.6809
8013e950 init_buffer 237 7.4062
801405d0 brw_kiovec 7164 7.8553
80200fb0 __scsi_end_request 2574 8.4671
8023d670 sd_find_queue 602 9.4062
80140560 wait_kio 1155 10.3125
80271370 md_make_request 1176 10.5000
8013f0e0 set_bh_page 799 12.4844
80152820 end_kio_request 1084 13.5500
80267320 raid0_make_request 5904 17.5714
801d29e0 submit_bh 2426 21.6607
8013dd10 unlock_buffer 2540 31.7500
801d22a0 __make_request 77413 49.8795
80140520 end_buffer_io_kiobuf 5540 86.5625
80106d20 default_idle 214369 3349.5156
base:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=8k blocks=14336000 /dev/md0 > /tmp/vary_base.txt; readprofile -v | sort -n -k4 >> /tmp/vary_base.txt
Completed reading 112000 mbytes in 918.287570 seconds (127.89 Mbytes/sec), 63usec mean
[root@mel-stglab-host1 src]# tail -20 /tmp/vary_base.txt
80135010 delta_nr_cache_pages 591 6.1562
80203890 scsi_init_io_vc 3448 6.3382
801288b0 _spin_unlock_ 894 6.9844
8013f380 create_empty_buffers 717 7.4688
80133e60 kmem_cache_alloc 2152 7.9118
80267320 raid0_make_request 3125 9.3006
801d28b0 generic_make_request 2861 9.4112
801d29e0 submit_bh 1304 11.6429
8013f0e0 set_bh_page 795 12.4219
80108a48 system_call 766 13.6786
801d22a0 __make_request 23675 15.2545
8012e0c0 unlock_page 1990 15.5469
80140ea0 try_to_free_buffers 5294 15.7560
801340e0 kmem_cache_free 2563 20.0234
80136d40 __free_pages 1012 21.0833
801298cc .text.lock.lockmeter 3129 21.1419
801287d0 _spin_lock_ 4097 36.5804
8013e970 end_buffer_io_async 9310 48.4896
8012edd0 file_read_actor 26102 233.0536
80106d20 default_idle 59883 935.6719
nocopy hack:
[root@mel-stglab-host1 src]# readprofile -r; ./test_disk_performance bs=8k blocks=14336000 nocopy /dev/md0 > /tmp/vary_nocopy.txt; readprofile -v | sort -n -k4 >> /tmp/vary_nocopy.txt
Completed reading 112000 mbytes in 671.560772 seconds (174.88 Mbytes/sec), 46usec mean
[root@mel-stglab-host1 src]# tail -20 /tmp/vary_nocopy.txt
8013f020 get_unused_buffer_head 1152 6.0000
80134fb0 delta_nr_inactive_pages 583 6.0729
80135010 delta_nr_cache_pages 617 6.4271
801288b0 _spin_unlock_ 854 6.6719
80133e60 kmem_cache_alloc 2154 7.9191
8013f380 create_empty_buffers 785 8.1771
80267320 raid0_make_request 3112 9.2619
801d28b0 generic_make_request 2876 9.4605
801d29e0 submit_bh 1293 11.5446
8013f0e0 set_bh_page 759 11.8594
80108a48 system_call 778 13.8929
8012e0c0 unlock_page 1814 14.1719
80140ea0 try_to_free_buffers 4908 14.6071
801d22a0 __make_request 23997 15.4620
801340e0 kmem_cache_free 2562 20.0156
80136d40 __free_pages 980 20.4167
801298cc .text.lock.lockmeter 3411 23.0473
801287d0 _spin_lock_ 4099 36.5982
8013e970 end_buffer_io_async 8741 45.5260
80106d20 default_idle 39093 610.8281
cheers,
lincoln.
* Re: O_DIRECT on 2.4.19pre8aa2 md device
2002-05-14 2:43 ` O_DIRECT on 2.4.19pre8aa2 md device Lincoln Dale
@ 2002-05-21 15:51 ` Andrea Arcangeli
2002-05-22 1:18 ` Lincoln Dale
0 siblings, 1 reply; 265+ messages in thread
From: Andrea Arcangeli @ 2002-05-21 15:51 UTC (permalink / raw)
To: Lincoln Dale; +Cc: Andrew Morton, Kernel Mailing List
On Tue, May 14, 2002 at 12:43:48PM +1000, Lincoln Dale wrote:
> g'day,
>
> At 02:22 AM 14/05/2002 +0200, Andrea Arcangeli wrote:
> >I think raid0 is a good start to make all disks running at the same time
> >for O_DIRECT too (only make sure to use a buffer large nr_PV*512k or
>
> same hardware as before -- dual P3 Xeon (733MHz), 133MHz FSB, 2G PC133
> SDRAM.
> this time, a raid-0 array using MD driver across 8 x 18G 15K RPM disks. md
> driver is using "128k chunks".
>
> kernel is 2.4.19pre8aa2 with the qlogic 2300 HBA driver compiled with
> vary_io set to 1. FC network is all 2gbit/s. no highmem.
> kernel is booted using "profile=2" and has lockmeter compiled in also.
> system rebooted after each test.
>
> i promise its the same amount of data for each test this time: :-)
> O_DIRECT blocksize = 4 megabytes, blocks = 28000: 112000 mbytes in
btw, since you have 8 disks it would probably be a bit faster with 8meg,
so that you submit at least 2 scsi commands to each disk at the same time,
not only 1.
> 977.869706 seconds (120.10 Mbytes/sec)
> 'raw' blocksize = 4 megabytes, blocks = 28000: 112000 mbytes in
> 1659.551271 seconds (70.77 Mbytes/sec)
> base blocksize = 8 kilobytes, blocks = 14336000: 112000 mbytes
> in 918.287570 seconds (127.89 Mbytes/sec)
120 vs 127 is pretty good, especially considering that an 8meg buffer may
be enough to win those 7 mbyte/sec back too.
> nocopy hack: blocksize = 8 kilobytes, blocks = 14336000: 112000 mbytes
> in 671.560772 seconds (174.88 Mbytes/sec)
>
> net-effect is that O_DIRECT still has a performance hit versus base, 'raw'
yes, a hit in absolute disk performance is expected due to the
additional synchronization after each read/write syscall returns (the
synchronous behaviour), at least when the mem bandwidth is very high and
the mem bandwidth and cpu are otherwise unused, i.e. when the machine is
completely dedicated to I/O. But look at the profiling...
> just sucks wind versus the others, even 'nocopy' cannot hit line-rate on
> the fibre-channel card. (its possible to hit 205mbytes/sec using sg_tools
> sg_read or sg_dd).
>
>
> O_DIRECT:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance bs=4m blocks=28000 direct /dev/md0 >
> /tmp/vary_direct.txt; readprofile -v | sort -n -k4 >> /tmp/vary_direct.txt
> Completed reading 112000 mbytes in 977.869706 seconds (120.10
> Mbytes/sec), 34849usec mean
>
> [root@mel-stglab-host1 tmp]# tail -20 vary_direct.txt
> 8012aa50 mark_dirty_kiobuf 234 2.0893
> 8013f0e0 set_bh_page 134 2.0938
> 801d28b0 generic_make_request 785 2.5822
> 80136d40 __free_pages 137 2.8542
> 80142a10 max_block 406 3.1719
> 8011f950 do_softirq 724 3.2321
> 801405d0 brw_kiovec 3219 3.5296
> 80271370 md_make_request 484 4.3214
> 80200fb0 __scsi_end_request 1321 4.3454
> 8023d670 sd_find_queue 334 5.2188
> 80142c80 blkdev_get_block 358 5.5938
> 80140560 wait_kio 690 6.1607
> 80152820 end_kio_request 601 7.5125
> 80267320 raid0_make_request 3059 9.1042
> 8013e950 init_buffer 310 9.6875
> 801d29e0 submit_bh 1274 11.3750
> 801d22a0 __make_request 20967 13.5097
> 8013dd10 unlock_buffer 1283 16.0375
> 80140520 end_buffer_io_kiobuf 2946 46.0312
> 80106d20 default_idle 151886 2373.2188
here you see the machine was basically idle for the whole time.
> base:
> [root@mel-stglab-host1 src]# readprofile -r;
> ./test_disk_performance bs=8k blocks=14336000 /dev/md0 >
> /tmp/vary_base.txt; readprofile -v | sort -n -k4 >> /tmp/vary_base.txt
> Completed reading 112000 mbytes in 918.287570 seconds (127.89
> Mbytes/sec), 63usec mean
>
> [root@mel-stglab-host1 src]# tail -20 /tmp/vary_base.txt
> 80135010 delta_nr_cache_pages 591 6.1562
> 80203890 scsi_init_io_vc 3448 6.3382
> 801288b0 _spin_unlock_ 894 6.9844
> 8013f380 create_empty_buffers 717 7.4688
> 80133e60 kmem_cache_alloc 2152 7.9118
> 80267320 raid0_make_request 3125 9.3006
> 801d28b0 generic_make_request 2861 9.4112
> 801d29e0 submit_bh 1304 11.6429
> 8013f0e0 set_bh_page 795 12.4219
> 80108a48 system_call 766 13.6786
> 801d22a0 __make_request 23675 15.2545
> 8012e0c0 unlock_page 1990 15.5469
> 80140ea0 try_to_free_buffers 5294 15.7560
> 801340e0 kmem_cache_free 2563 20.0234
> 80136d40 __free_pages 1012 21.0833
> 801298cc .text.lock.lockmeter 3129 21.1419
> 801287d0 _spin_lock_ 4097 36.5804
> 8013e970 end_buffer_io_async 9310 48.4896
> 8012edd0 file_read_actor 26102 233.0536
> 80106d20 default_idle 59883 935.6719
and here the machine was almost completely loaded in file_read_actor;
it was unusable for anything other than I/O.
To give a similar example, my first pentium had a very slow harddisk that
in DMA mode ran 10-20% slower than in PIO mode, but the cpu
utilization in DMA mode was so much lower that DMA was a huge
overall win during kernel compiles etc...
In any real usage where I/O is combined with CPU and mem bus utilization
for computations, and not only used to move data from disk to userspace
memory, O_DIRECT is a _huge_ win, as you found out. You have 100000 more
usable ticks for computations over 150000 total ticks. I didn't do the
exact math, but as a rough measure, with "buffered I/O" (i.e. base) only
around 20% of the cpu was available for doing useful things, while with
O_DIRECT 90% of the cpu is available, not to mention the difference in
membus utilization. So I think the improvement from O_DIRECT is just huge
and pays off, as I also mentioned in an earlier email in this thread, even
if the absolute I/O performance may decrease by a few percent due to the
synchronous behaviour. (One of the items for 2.5 is to optionally make the
synchronous behaviour go away, plus providing a generic_direct_IO that
synchronizes with the pagecache via anchors and so allows O_DIRECT to be
used with data journaling too.)
And again, if you use larger buffers you may also be able to recover the
last 8mbyte/sec difference :).
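
A rough cross-check of the idle figures can be done from the default_idle lines of the two profiles. This is only a sketch under two assumptions that are not stated in the thread: a timer frequency of HZ=100, and one profiling tick per CPU per timer interrupt. The function name is made up.

```python
HZ = 100   # assumption: timer frequency of the kernel under test
CPUS = 2   # dual P3 Xeon, as stated in the thread

# (elapsed seconds, default_idle ticks) from the readprofile output above
runs = {
    "O_DIRECT": (977.869706, 151886),
    "base": (918.287570, 59883),
}

def idle_fraction(seconds, idle_ticks, hz=HZ, cpus=CPUS):
    """Share of all profiling ticks that landed in default_idle."""
    return idle_ticks / (seconds * hz * cpus)

extra_idle = runs["O_DIRECT"][1] - runs["base"][1]
print(extra_idle)  # 92003
for name, (seconds, idle) in runs.items():
    print(name, round(idle_fraction(seconds, idle), 2))
```

The difference of about 92000 idle ticks is in line with the "100000 more usable ticks" estimate. The exact idle percentages depend on the assumed tick rate, so they should be read as an order-of-magnitude check only.
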
Andrea
* Re: O_DIRECT on 2.4.19pre8aa2 md device
2002-05-21 15:51 ` Andrea Arcangeli
@ 2002-05-22 1:18 ` Lincoln Dale
2002-05-22 2:51 ` Andrea Arcangeli
0 siblings, 1 reply; 265+ messages in thread
From: Lincoln Dale @ 2002-05-22 1:18 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Andrew Morton, Kernel Mailing List, Linus Torvalds
At 05:51 PM 21/05/2002 +0200, Andrea Arcangeli wrote:
...
>In any real usage where I/O is combined with CPU and mem bus utilization
>for computations, not only to move data from disk to userspace memory
>O_DIRECT is an _huge_ win as you found out. You have 100000 more usable
>ticks for computations over 150000 total ticks.
...
no disagreement there.
however, there is still an enormous gap between
access-/dev/sdX-via-block-layer and access-/dev/sdX-via-mmap (127mbyte/sec
versus 175+mbyte/sec), which matters a great deal for many "disk intensive"
applications.
if we _could_ produce superior performance with something like:
  (a) additional improvements to O_DIRECT (say, a mmap() hack for O_DIRECT)
      whereby i/o can go direct-to-userspace or
      userspace-can-mmap(readonly)-into-kernel-space
or
  (b) O_DIRECT with async-i/o
or
  (c) /dev/rawN-like interface (eg. /dev/directN) within a fixed disk
      buffer in kernel (eg. physical memory allocated at bootup) that is
      populated readonly into user-space
would such a hack be accepted in the 2.5 tree?
it sure isn't your "average desktop" setup where you hit these things as
performance limitations, but many folk (myself included) have very specific
applications which are essentially bottlenecked on PC hardware
limitations. these limitations are a side-effect of how the linux kernel
works.
other OSes suffer from the same thing too, but the fact that
no other mainstream OS does the right thing doesn't mean that we
cannot make linux even better.
obviously, many of the above would only be suitable for disk-read
operations and less-so for disk-write -- but it's probably easier to tackle
these one by one.
cheers,
lincoln.
* Re: O_DIRECT on 2.4.19pre8aa2 md device
2002-05-22 1:18 ` Lincoln Dale
@ 2002-05-22 2:51 ` Andrea Arcangeli
2002-06-03 4:53 ` high-end i/o performance of 2.4.19pre8aa2 (was: Re: O_DIRECT on 2.4.19pre8aa2 device) Lincoln Dale
0 siblings, 1 reply; 265+ messages in thread
From: Andrea Arcangeli @ 2002-05-22 2:51 UTC (permalink / raw)
To: Lincoln Dale; +Cc: Andrew Morton, Kernel Mailing List, Linus Torvalds
On Wed, May 22, 2002 at 11:18:13AM +1000, Lincoln Dale wrote:
> At 05:51 PM 21/05/2002 +0200, Andrea Arcangeli wrote:
> ...
> >In any real usage where I/O is combined with CPU and mem bus utilization
> >for computations, not only to move data from disk to userspace memory
> >O_DIRECT is an _huge_ win as you found out. You have 100000 more usable
> >ticks for computations over 150000 total ticks.
> ...
>
> no disagreement there.
>
> however, there is still an enormous difference between
> access-/dev/sdX-via-block-layer and access-dev/sdX-via-mmap (127mbyte/sec
> versus 175+mbyte/sec) that for many "disk intensive" applications is a huge
> difference.
hmm wait a moment, in your last email 175MB/sec was the nocopy hack, not
mmap("/dev/sda"). mmap has the same overhead as O_DIRECT in dealing with
the user pagetables (actually O_DIRECT has the potential to be a bit
lighter, because map_user_kiobuf resolves all the page faults up front,
before starting the I/O, instead of taking a real CPU page fault that has
to pass through do_page_fault for every single page), so if 175MB/sec was
the nocopy hack, 175MB/sec definitely cannot be mmap too.
In fact a nice addition to your proggy would be to also benchmark the I/O
bandwidth using mmap driven I/O, in addition to buffered I/O, raw, O_DIRECT
and the nocopy hack. Like raw and O_DIRECT, mmap driven I/O is also
zerocopy I/O, so only buffered I/O would show the total waste of cpu
and mem bandwidth. I guess mmap will be in the same performance range
as O_DIRECT (potentially a bit faster thanks to readahead), at least
during reads. Writes are a bit different: you have to flush by hand with
msync, region by region, as if they were write syscalls, or wait for
vm pressure.
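
A minimal sketch of what the mmap driven read path of such a benchmark could look like, written in Python for brevity. It is not taken from the actual test program; mmap_read and its parameters are hypothetical names. Touching one byte per page is enough to make the kernel fault that page in.

```python
import mmap
import os

def mmap_read(path, page_size=mmap.PAGESIZE):
    """Map path read-only and touch one byte per page.

    Returns (bytes_mapped, pages_touched).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        # seeking to the end also works for block devices, where st_size is 0
        size = os.lseek(fd, 0, os.SEEK_END)
        if size == 0:
            return 0, 0
        pages = 0
        with mmap.mmap(fd, size, access=mmap.ACCESS_READ) as m:
            for offset in range(0, size, page_size):
                m[offset]  # the load faults the page in if it is not resident
                pages += 1
        return size, pages
    finally:
        os.close(fd)
```

For the write side, Python's mmap.flush(offset, size) on a shared mapping is the counterpart of the region-by-region msync mentioned above.
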
175MB/sec means you only read into kernel space and you don't make the
data visible to userspace, so it is unbeatable no matter what :).
Only in-kernel services like sendfile and/or tux can take advantage of it,
by taking the radix tree lock and looking up the radix tree to find the
physical page.
>
> if we _could_ produce superior performance with something like:
> (a) additional improvements to O_DIRECT (say, a mmap() hack for O_DIRECT)
> whereby i/o can go direct-to-userspace or
> userspace-can-mmap(readonly)-into-
> kernel-space
even if you get a direct physical view of some of the address space (as
said in an earlier email, on 32bit the lack of address space would reduce
the area you can map to a small zone), you still need the information
about the physical memory where the I/O went, so that's still an API from
kernel to user, and so again an overhead compared to the nocopy hack. And
that information is exactly the radix tree/hashtable.
(btw, I'm still not a believer in the radix tree, and I wonder if anybody
did any real benchmark on the big irons with a properly tuned hashtable
that takes highmem into account too, like with Davem's patch; the last
benchmarks I saw were fake IIRC. The bonus of the radix tree is the
per-inode locking, and that it reduces the amount of normal zone that
needs to be allocated statically for numa-q. That is, of course, until
somebody notices they can exploit the radix tree with a few-line proggy
and hog the machine with unfreeable radix tree metadata, by allocating only
1 page of highmem cache for each radix tree leaf entry. That will need
further instrumentation in the vm so that it releases the highmem memory
related to the lowmem radix tree metadata. At least it's only a security
problem, not a practical problem for useful users.)
Not to mention that if there's a bug in your proggy you'd destabilize the
kernel. The truth is, this is no different at all from writing a proper
kernel service: if you need that kind of performance you should write
kernel code, not user code, and work with the pagecache directly. That's
what tux does of course, and that's what sendfile does too: sendfile is
like a part of a userspace fileserver that has been moved into kernel
space to send stuff over the wire in zerocopy, with knowledge of kernel
data structures like the pagecache/radix tree. You should build a
sendfile-like operation and do the work in the kernel anyway. That way you
also get all the advantages that kernel code has, like 4M tlb pages etc.
Also keep in mind that sendfile can already be used to copy data within
the fs too; in fact before 2.4.3, when the pskb layer was introduced in the
network code, fs copies that avoid passing through userspace were the only
thing it was useful for. Now, thanks to the blkdev-in-pagecache introduced
in 2.4.10, sendfile will even work with blkdevices, which it couldn't
previously. Unfortunately /bin/cp never started using it. I made a patch
to change cp to use sendfile and to fall back to read/write if it returned
-ENOSYS, but it has never been integrated; it's still available in my ftp
area at suse.com/pub/people/andrea if anybody is interested in copying
data in the filesystem without wasting mem bandwidth (I don't copy stuff
around heavily [I only create directories with -l], so it has never been
a real need for me). The worst part of my /bin/cp sendfile patch is that
sendfile cannot be interrupted by a signal, so copying a huge file would
hang the machine. But the same is true for huge reads and writes, so I
consider that a kernel issue, and I think userspace should just write the
whole thing at once. (However, as long as the kernel remains incapable of
interrupting sendfile, userspace can just as well sendfile in chunks of a
few mbyte each, so at least there's no risk of an accidental DoS; as long
as the chunks are on the order of a few megs there should be no loss in
performance.) btw, now that sendfile64 is available in 2.5 the cp hack is
doable again with LFS support too.
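
The chunked approach described above can be sketched in userspace as follows. This is not the cp patch itself, only an illustration in Python; copy_in_chunks and the 4 MiB default are made up here, and the fallback is broader than the -ENOSYS check of the patch since it catches any OSError.

```python
import os

def copy_in_chunks(src, dst, chunk=4 << 20):
    """Copy src to dst with sendfile(2) in bounded chunks.

    Falls back to read/write if sendfile is missing or refuses the
    descriptors. Returns the number of bytes copied.
    """
    in_fd = os.open(src, os.O_RDONLY)
    out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    copied = 0
    try:
        size = os.fstat(in_fd).st_size
        try:
            while copied < size:
                # each call moves at most `chunk` bytes, so the process
                # gets back to userspace between chunks
                sent = os.sendfile(out_fd, in_fd, copied,
                                   min(chunk, size - copied))
                if sent == 0:
                    break
                copied += sent
        except (AttributeError, OSError):
            # plain read/write fallback, resuming where sendfile stopped
            os.lseek(in_fd, copied, os.SEEK_SET)
            while True:
                data = os.read(in_fd, chunk)
                if not data:
                    break
                view = memoryview(data)
                while view:
                    view = view[os.write(out_fd, view):]
                copied += len(data)
        return copied
    finally:
        os.close(in_fd)
        os.close(out_fd)
```
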
Returning to our topic :), the pagetables can be seen as a representation
of the same information encoded in the radix tree. They're the data
structure meant to tell userspace where the physical page is where the
I/O happened, and they provide a security layer between kernel and
userspace, unlike /dev/mem or a kernel service accessing the radix tree
directly. Instead of mapping physically contiguous ram via /dev/mem and
telling userspace which ram it needs to use, the kernel sets up the
pagetables that lead userspace to the right ram without it having to deal
with its physical position. This provides protection, and the app cannot
crash the kernel (unless you find a bug in map_user_kiobuf :).
> or
> (b) O_DIRECT with async-i/o
that's a different problem: async-io only deepens the I/O pipeline, so it
will only give you those 7mbyte/sec back. It won't reduce cpu utilization
further and it won't make the I/O significantly faster (compared to using
large buffers for synchronous read/writes). It makes sure that no disk
is idle for no good reason, but it doesn't change at all the overhead on
the VM side of providing the data from disk to userspace (and the
other way around for writes).
> or
> (c) /dev/rawN-like interface (eg. /dev/directN) within a fixed disk
> buffer in kernel
> (eg. physical memory allocated at bootup) that is populated readonly
> into
> user-space)
> would such a hack be accepted in the 2.5 tree?
>
> it sure isn't your "average desktop" setup where you hit these things as
> performance limitations, but many folk (myself included) have very specific
> applications which are essentially bottlenecked on PC hardware
> limitations. these limitations are a side-effect of how the linux kernel
> works.
again, if you need to skip the overhead of passing the info to
userspace you should write kernel code. If you want to take advantage of
the protection of userspace, you also need to pay for setting up or
walking the pagetables. Anything that knows about physical
pages is kernel code; writing userspace code that maps /dev/mem only gives
the illusion of writing userspace code, it's kernel code anyway. The
framebuffer case of X is different: X really only needs to talk to a
device, it doesn't care about the kernel and doesn't need to communicate
with it; in fact when it does need to communicate with the kernel
it needs proper kernel modules like the pci/drm/agpgart/mtrr kernel drivers.
What the X server does with /dev/mem is no different from
mmap("/dev/fb0") and using the framebuffer mapped that way. For reads and
writes via pagecache or userspace memory, you instead have to deal
entirely with internal kernel data structures that only the kernel
knows, and that's why it's kernel code as soon as you want to bypass the
protection layer of the pagetables.
One way to speed up overwrites would be to make the lookup from
[virtual address,vma] to [physical page] faster, using a data structure that
would act as a software tlb. It could be attached to a vma and enabled
with an mmap flag, so that you enable it only on the SGA; an overwrite of
disk data would then be faster and you could skip the pagetable walk the
second time. But if you seek a lot and the number of pages is
huge, at least if they're not 4M pages, the number of entries to remember
would be quite large and it would waste some ram. Plus it wouldn't be a
critical optimization: you would only skip a few levels of indirection in
the lookup, like a radix tree with one level vs a radix tree with 3
levels of indirection, and such a data structure would need some
synchronization during vma modifications too.
> that doesn't mean that other OSes don't suffer from the same thing -- since
> no-other mainstream OS does the right thing, that doesn't mean that we
> cannot make linux even better.
>
> obviously, many of the above would only be suitable for disk-read
> operations and less-so for disk-write -- but its probably easier to tackle
> these one by one.
the point is, why are you reading from disk, and what are you going to do
with the data? For example, assume you read from disk to find the
string "xxx" in a file as fast as possible. Then the right way to
exploit the max bandwidth is to write a kernel module that triggers
readpage asynchronously (trivial in kernel, just don't wait_on_page
until you need the data), looks up the radix tree, kmaps the page,
searches for the string, kunmaps, starts a new async readpage, looks up the
next page, does wait_on_page on the next page and so on. That's the only
way to run at the core 175MB/sec speed; anything running in userspace
would be slower (even kernel code will be slower than 175MB/sec if kmap
actually does something with the pte, e.g. if you have more than 1G and
CONFIG_3G selected etc..)
Now I'm not suggesting moving a database to kernel space for the sake
of performance; the layer of protection is fundamental for the
reliability of the system. But if you have special needs for certain
special things, then a kernel module accelerator is the only way to go
faster than buffered-IO/read/mmap/O_DIRECT/raw or whatever else makes
the disk data visible to userspace, and that in turn has to
deal with the virtual mappings too and not only with the physical pages
mapped in the kernel direct mapping.
Long email sorry.
Andrea
* high-end i/o performance of 2.4.19pre8aa2 (was: Re: O_DIRECT on 2.4.19pre8aa2 device)
2002-05-22 2:51 ` Andrea Arcangeli
@ 2002-06-03 4:53 ` Lincoln Dale
0 siblings, 0 replies; 265+ messages in thread
From: Lincoln Dale @ 2002-06-03 4:53 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Andrew Morton, Kernel Mailing List, Linus Torvalds
g'day Andrea,
At 04:51 AM 22/05/2002 +0200, Andrea Arcangeli wrote:
>Infact a nice addition to your proggy would be to also benchmark the I/O
>bandwith using mmap driven I/O, instead of buffered I/O, raw, O_DIRECT
>and the nocopy hack. Like raw and O_DIRECT also the mmap driven I/O is
>zerocopy I/O, so only the buffered-IO would show the total waste of cpu
..
i've had some spare time to enhance my benchmark program. it is now
multithreaded and can do mmap() i/o also.
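
One way to get one reader thread per device while keeping all devices at roughly the same block, as discussed earlier in the thread, is a barrier before every read. The sketch below is in Python and is not how test_disk_performance2 necessarily does it; read_in_lockstep is a hypothetical name.

```python
import os
import threading

def read_in_lockstep(paths, block_size, blocks):
    """Read `blocks` blocks from each path, one thread per path.

    A barrier before every read keeps all threads on the same block index.
    Returns the number of bytes read per path.
    """
    fds = [os.open(p, os.O_RDONLY) for p in paths]
    barrier = threading.Barrier(len(fds))
    totals = [0] * len(fds)

    def worker(i):
        for _ in range(blocks):
            # nobody starts block n+1 before everyone has finished block n
            barrier.wait()
            totals[i] += len(os.read(fds[i], block_size))

    threads = [threading.Thread(target=worker, args=(i,))
               for i in range(len(fds))]
    try:
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        for fd in fds:
            os.close(fd)
    return totals
```

The cost of this design is that every round runs at the speed of the slowest device, which is the price for comparable per-device latency numbers.
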
interestingly enough, it shows that mmap() actually sucks. i didn't take a
"readprofile" to see why, but it used far more cpu time than anything else.
with the same hardware as before (Dual P3 Xeon 733MHz, 133MHz FSB, 2G PC133
SDRAM, QLogic 2300 HBA in 64/66 PCI slot) and same kernel (linux 2.4.19pre8
with vary-io, lockmeter & profile=2), the aggregate throughput across 8 x
15K RPM FC disks (2gbit/s FC switched network) is summarised below.
now that i've multithreaded the test app, the results change somewhat,
given we now have multiple threads issuing i/o operations:
raw i/o is slightly better than O_DIRECT again. mmap()'ed i/o is the worst.
[summary]
'base'      Completed reading 81920 mbytes in 524.289107 seconds
            (156.25 Mbytes/sec), 386usec mean/block
'nocopy'    Completed reading 81920 mbytes in 435.653909 seconds
            (188.04 Mbytes/sec), 302usec mean/block
'o_direct'  Completed reading 81920 mbytes in 429.757513 seconds
            (190.62 Mbytes/sec), 157655usec mean/block
'raw'       Completed reading 81920 mbytes in 420.940382 seconds
            (194.61 Mbytes/sec), 164368usec mean/block
'mmap'      Completed reading 81920 mbytes in 1049.446053 seconds
            (78.06 Mbytes/sec), 401124usec mean/block
given i'm not hitting the peak performance of a single 2gbit/s fibre
channel connection, i'll now migrate to dual FC HBAs and see what happens
... :-)
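
The summary rates follow directly from 81920 mbytes divided by the elapsed time. The short check below also compares them with the 205 mbytes/sec sg_read/sg_dd figure reported earlier in the thread; whether both figures use exactly the same units is an assumption, so the percentages are only indicative.

```python
TOTAL_MBYTES = 81920
SG_DD_PEAK = 205.0  # mbytes/sec reported earlier in the thread for sg_read/sg_dd

# elapsed seconds per mode, from the summary above
elapsed = {
    "base": 524.289107,
    "nocopy": 435.653909,
    "o_direct": 429.757513,
    "raw": 420.940382,
    "mmap": 1049.446053,
}

def rate(seconds, total=TOTAL_MBYTES):
    """Aggregate throughput in mbytes per second."""
    return total / seconds

for mode, seconds in elapsed.items():
    r = rate(seconds)
    print(f"{mode:9s} {r:7.2f} mbytes/sec  {100 * r / SG_DD_PEAK:5.1f}% of sg_dd figure")
```
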
[detail]
(1)
# performance of a single disk-spindle using normal linux block i/o layer.
# use 10G worth of 8kbyte blocks.
# this will show the peak performance of a single 15K RPM disk without any
# bus contention.
[root@mel-stglab-host1 root]# ./test_disk_performance2 bs=8k blocks=1280k mode=basic /dev/sde
Completed reading 10240 mbytes across 1 devices using 1310720 blocks of
8192 in 190.028450 seconds (53.89 Mbytes/sec), 151usec mean
(2)
# performance of multiple disk-spindles (8 disks, sde-sdl) using normal
# linux block i/o layer.
# read 10G sequentially from each disk using 8kbyte blocks parallel across
# all disks.
[root@mel-stglab-host1 root]# ./test_disk_performance2 bs=8k blocks=1280k mode=basic /dev/sd[e-l]
Completed reading 81920 mbytes across 8 devices using 10485760 blocks of
8192 in 524.289107 seconds (156.25 Mbytes/sec), 386usec mean
device #0 (/dev/sde) 10240 megabytes in 520.588206 seconds using 1310720
reads of 8192 bytes (19.67 Mbytes/sec), 320usec
device #1 (/dev/sdf) 10240 megabytes in 509.305642 seconds using 1310720
reads of 8192 bytes (20.11 Mbytes/sec), 352usec
device #2 (/dev/sdg) 10240 megabytes in 516.744421 seconds using 1310720
reads of 8192 bytes (19.82 Mbytes/sec), 351usec
device #3 (/dev/sdh) 10240 megabytes in 519.215781 seconds using 1310720
reads of 8192 bytes (19.72 Mbytes/sec), 382usec
device #4 (/dev/sdi) 10240 megabytes in 517.312922 seconds using 1310720
reads of 8192 bytes (19.79 Mbytes/sec), 339usec
device #5 (/dev/sdj) 10240 megabytes in 524.289183 seconds using 1310720
reads of 8192 bytes (19.53 Mbytes/sec), 315usec
device #6 (/dev/sdk) 10240 megabytes in 521.006994 seconds using 1310720
reads of 8192 bytes (19.65 Mbytes/sec), 322usec
device #7 (/dev/sdl) 10240 megabytes in 518.434472 seconds using 1310720
reads of 8192 bytes (19.75 Mbytes/sec), 345usec
(3)
# same test as above, but using 'nocopy hack' so PC architecture isn't the
# bottleneck.
# look at 8 disks (sde-sdl) using 8kbyte/block, 10G on each disk: (80G
# total) in parallel.
# using 'nocopy' means we can measure peak performance without PC
# front-side-bus becoming the bottleneck, but we're still using the normal
# read-file case.
[root@mel-stglab-host1 root]# ./test_disk_performance2 bs=8k blocks=1280k mode=nocopy /dev/sd[e-l]
Completed reading 81920 mbytes across 8 devices using 10485760 blocks of
8192 in 435.653909 seconds (188.04 Mbytes/sec), 302usec mean
device #0 (/dev/sde) 10240 megabytes in 408.566056 seconds using 1310720
reads of 8192 bytes (25.06 Mbytes/sec), 305usec
device #1 (/dev/sdf) 10240 megabytes in 411.353237 seconds using 1310720
reads of 8192 bytes (24.89 Mbytes/sec), 309usec
device #2 (/dev/sdg) 10240 megabytes in 414.244129 seconds using 1310720
reads of 8192 bytes (24.72 Mbytes/sec), 295usec
device #3 (/dev/sdh) 10240 megabytes in 416.225989 seconds using 1310720
reads of 8192 bytes (24.60 Mbytes/sec), 293usec
device #4 (/dev/sdi) 10240 megabytes in 417.593914 seconds using 1310720
reads of 8192 bytes (24.52 Mbytes/sec), 268usec
device #5 (/dev/sdj) 10240 megabytes in 422.646633 seconds using 1310720
reads of 8192 bytes (24.23 Mbytes/sec), 259usec
device #6 (/dev/sdk) 10240 megabytes in 431.343133 seconds using 1310720
reads of 8192 bytes (23.74 Mbytes/sec), 209usec
device #7 (/dev/sdl) 10240 megabytes in 435.654122 seconds using 1310720
reads of 8192 bytes (23.50 Mbytes/sec), 216usec
(4)
# use O_DIRECT to bypass readahead and most of block-layer memory-copies
# since there isn't any readahead anymore, use a large buffer to achieve
peak performance
[root@mel-stglab-host1 root]# ./test_disk_performance2 bs=4m blocks=2560
mode=direct /dev/sd[e-l]
Completed reading 81920 mbytes across 8 devices using 20480 blocks of
4194304 in 429.757513 seconds (190.62 Mbytes/sec), 157655usec mean
device #0 (/dev/sde) 10240 megabytes in 412.931460 seconds using 2560
reads of 4194304 bytes (24.80 Mbytes/sec), 160898usec
device #1 (/dev/sdf) 10240 megabytes in 414.531502 seconds using 2560
reads of 4194304 bytes (24.70 Mbytes/sec), 159234usec
device #2 (/dev/sdg) 10240 megabytes in 414.585683 seconds using 2560
reads of 4194304 bytes (24.70 Mbytes/sec), 158187usec
device #3 (/dev/sdh) 10240 megabytes in 415.257838 seconds using 2560
reads of 4194304 bytes (24.66 Mbytes/sec), 154983usec
device #4 (/dev/sdi) 10240 megabytes in 416.209758 seconds using 2560
reads of 4194304 bytes (24.60 Mbytes/sec), 138563usec
device #5 (/dev/sdj) 10240 megabytes in 419.144473 seconds using 2560
reads of 4194304 bytes (24.43 Mbytes/sec), 133593usec
device #6 (/dev/sdk) 10240 megabytes in 424.031534 seconds using 2560
reads of 4194304 bytes (24.15 Mbytes/sec), 102836usec
device #7 (/dev/sdl) 10240 megabytes in 429.773878 seconds using 2560
reads of 4194304 bytes (23.83 Mbytes/sec), 98300usec
(5)
# same as (4) but using /dev/raw/rawX instead of O_DIRECT
[root@mel-stglab-host1 root]# ./test_disk_performance2 bs=4m blocks=2560
mode=basic /dev/raw/raw[1-8]
Completed reading 81920 mbytes across 8 devices using 20480 blocks of
4194304 in 420.940382 seconds (194.61 Mbytes/sec), 164368usec mean
device #0 (raw1) 10240 megabytes in 420.869676 seconds using 2560 reads
of 4194304 bytes (24.33 Mbytes/sec), 164385usec
device #1 (raw2) 10240 megabytes in 420.859291 seconds using 2560 reads
of 4194304 bytes (24.33 Mbytes/sec), 164386usec
device #2 (raw3) 10240 megabytes in 420.834023 seconds using 2560 reads
of 4194304 bytes (24.33 Mbytes/sec), 164385usec
device #3 (raw4) 10240 megabytes in 420.896964 seconds using 2560 reads
of 4194304 bytes (24.33 Mbytes/sec), 163879usec
device #4 (raw5) 10240 megabytes in 420.838551 seconds using 2560 reads
of 4194304 bytes (24.33 Mbytes/sec), 164385usec
device #5 (raw6) 10240 megabytes in 420.867115 seconds using 2560 reads
of 4194304 bytes (24.33 Mbytes/sec), 164060usec
device #6 (raw7) 10240 megabytes in 420.906691 seconds using 2560 reads
of 4194304 bytes (24.33 Mbytes/sec), 164386usec
device #7 (raw8) 10240 megabytes in 420.932318 seconds using 2560 reads
of 4194304 bytes (24.33 Mbytes/sec), 164357usec
(6)
# use memory-mapped i/o, more for interest's sake looking at the efficiency
of the virtual-mapping of the linux kernel.
# use a large blocksize as it was shown that an 8kbyte blocksize is about
40% of the performance with a 4m blocksize
[root@mel-stglab-host1 root]# ./test_disk_performance2 bs=4m blocks=2560
mode=mmap /dev/sd[e-l]
Completed reading 81920 mbytes across 8 devices using 20480 blocks of
4194304 in 1049.446053 seconds (78.06 Mbytes/sec), 401124usec mean
device #0 (/dev/sde) 10240 megabytes in 1045.814369 seconds using 2560
reads of 4194304 bytes (9.79 Mbytes/sec), 342912usec
device #1 (/dev/sdf) 10240 megabytes in 1034.783187 seconds using 2560
reads of 4194304 bytes (9.90 Mbytes/sec), 430320usec
device #2 (/dev/sdg) 10240 megabytes in 1046.222514 seconds using 2560
reads of 4194304 bytes (9.79 Mbytes/sec), 340732usec
device #3 (/dev/sdh) 10240 megabytes in 1049.445858 seconds using 2560
reads of 4194304 bytes (9.76 Mbytes/sec), 353895usec
device #4 (/dev/sdi) 10240 megabytes in 1047.926118 seconds using 2560
reads of 4194304 bytes (9.77 Mbytes/sec), 362176usec
device #5 (/dev/sdj) 10240 megabytes in 1047.454030 seconds using 2560
reads of 4194304 bytes (9.78 Mbytes/sec), 377308usec
device #6 (/dev/sdk) 10240 megabytes in 1043.274980 seconds using 2560
reads of 4194304 bytes (9.82 Mbytes/sec), 387720usec
device #7 (/dev/sdl) 10240 megabytes in 1048.326277 seconds using 2560
reads of 4194304 bytes (9.77 Mbytes/sec), 347532usec
>175MB/sec means you only read to kernel space and you don't make such
>information visible to userspace, so it is unbeatable no matter what :).
>Only in kernel service like sendfile and/or tux can take advantage of it by
>taking the radix tree lock and looking up the radix tree to find the
>physical page.
>
> > if we _could_ produce superior performance with something like:
> > (a) additional improvements to O_DIRECT (say, a mmap() hack for O_DIRECT)
> > whereby i/o can go direct-to-userspace or
> > userspace-can-mmap(readonly)-into-
> > kernel-space
>
>even if you get a direct physical view of some of the address space (as
>said in an earlier email on 32bit the lack of address space would reduce
>the place you can map to a little zone), you still need the information
>on the physical memory where the I/O went, so that's still an API from
>kernel to user, so again an overhead compared to the nocopy hack
[ ignoring the fact that O_DIRECT and /dev/rawN are faster than 'nocopy'
for a moment ... :-) ]
around 18 months ago, i had it working.
basically, there was a character device-driver which allowed user-space to
mmap() kernel space.
a separate 'zone' existed that was reserved exclusively for data to/from
disk and network.
this zone was sized to 75% of RAM -- and (on the hardware being used) with
2G RAM, meant that there was 1.5GB available in this space.
the kernel would indicate the physical address of a "skb" on network
sockets and each packet put on the skbuff receive_q for a socket marked for
'zerocopy' would result in a zerocopy_buf_t being queued to user-space.
user-space could poll to get a list of the current zerocopy_buf_t's. (this
list wasn't copied to userspace using copy_to_user; user-space picked this
list up via the mmap() into kernel space too)
userspace could control what to do with buffers by various ioctl() calls on
the character-device. things like marking "i'm not interested in that
buffer anymore" would result in atomic_dec_and_test() on the refcount and a
subsequent skb_free() if there wasn't any more interest in the
buffer. there were also ioctl()s for queueing the buffer on a different
socket.
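
The buffer lifetime rule described above (drop a reference, free the buffer when the last holder lets go) can be sketched in user-space terms. This is only an illustration of the refcounting idea, not the original driver: the `SharedBuffer` class, its `get`/`put` methods and the `on_free` callback are hypothetical names, and a lock stands in for the kernel's atomic decrement-and-test.

```python
import threading


class SharedBuffer:
    """Hypothetical stand-in for a buffer shared between several holders."""

    def __init__(self, data, on_free):
        self.data = data
        self._refs = 1                  # the creator holds the first reference
        self._lock = threading.Lock()   # stands in for an atomic counter
        self._on_free = on_free         # called once, when the last reference goes

    def get(self):
        """Take an extra reference, e.g. before queueing on another socket."""
        with self._lock:
            if self._refs == 0:
                raise ValueError("buffer already freed")
            self._refs += 1

    def put(self):
        """Drop one reference; return True if this call freed the buffer."""
        with self._lock:
            if self._refs == 0:
                raise ValueError("buffer already freed")
            self._refs -= 1
            last = self._refs == 0
        if last:
            self._on_free(self)
        return last
```

A holder that says "i'm not interested in that buffer anymore" maps to `put()` here; only the final `put()` triggers the free callback.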
yes -- rather radical -- but it did avoid memory-bandwidth becoming the
bottleneck in an application which
predominantly did:
- read-from-disk, send on network socket
- read from network socket, write to disk
- read from network socket, write to alternate network socket, write to disk
given user-space could crash at any point, there was some housekeeping that
ensured that buffers tied to a process that had failed didn't stay around -
they were cleaned up.
there were also some rudimentary checks to ensure that user-space passing
buffer pointers to kernel-space was passing buffers that kernel-space
already knew about. ie. you couldn't crash the kernel by passing a bad
pointer.
i haven't resurrected this work since the VM underwent some radical changes
-- but it wouldn't be too hard to do. it'd still be a bit ugly in the
tie-ins to the tcp stack and buffer-cache but ....
i guess given O_DIRECT and /dev/rawN performance for to-disk/from-disk
being what it currently is (very good), there's less importance on having
it, if we could find a 'fast' way to queue to/from disk from/to network.
perhaps Ben's async i/o is the way to go for those apps.
>Long email sorry.
i was travelling and it gave me a chance to test out the 8 hour battery
life in my new laptop on a 14 hour flight. :-)
cheers,
lincoln.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-10 12:36 ` Andrea Arcangeli
2002-05-11 3:23 ` Lincoln Dale
@ 2002-05-12 11:23 ` Lincoln Dale
2002-05-13 11:37 ` Andrea Arcangeli
1 sibling, 1 reply; 265+ messages in thread
From: Lincoln Dale @ 2002-05-12 11:23 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: Andrew Morton, Kernel Mailing List
At 02:36 PM 10/05/2002 +0200, Andrea Arcangeli wrote:
>Can you also give a spin to the same benchmark with 2.4.19pre8aa2? It
>has the vary-io stuff from Badari and further kiobuf optimization from
>Chuck. (vary-io will work only with aic and qlogic, enabling it is a one
>liner if the driver is just ok with variable bh->b_size in the same I/O
>request). right fix for avoiding the flood of small bh is bio in 2.5,
>for 2.4 vary-io should be fine.
2.4.19pre8aa2 booted with "profile=2" on dual P3-Xeon (733MHz), 2G PC133
SDRAM, qlogic 2300 HBA (firmware 3.01.02 driver version 6.0b20).
8 x 15K RPM 18G FC disks are directly-attached using 2gbit/s FC (actually,
it's via a FC switch but that makes zero difference..).
kernel is set with PAGE_OFFSET_RAW at 8000000 (ie. no highmem defined)
(no idea if the "vary-io" stuff is enabled for the qlogic driver or not --
some hints as to what to look for needed here..)
a clean reboot was done prior to each test.
the benchmark results, in summary:
O_DIRECT 62.02 mbyte/sec (2048 x 1048576byte reads)
/dev/rawN 52.31 mbyte/sec (2048 x 1048576byte reads)
base: 127.71 mbyte/sec (262144 x 8192byte reads)
nocopy hack: 182.17 mbyte/sec (262144 x 8192byte reads)
we can still say that:
- 'nocopy' is still 50% faster than copy_to_user(). ie. overhead of
copy_to_user worth
~60mbyte/sec and 18usec latency
- O_DIRECT is superior to /dev/raw/rawN but is still a huge performance hit
versus normal i/o on the block devices.
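
As a quick cross-check of the copy_to_user() overhead, the arithmetic can be redone from the summary figures. This is just a sketch using the rates and mean latencies reported in this mail; the helper name `copy_overhead` is made up for illustration.

```python
# Summary figures copied from the results in this mail
# (throughput in mbyte/sec, mean latency per read in usec).
base_rate, base_latency = 127.71, 59
nocopy_rate, nocopy_latency = 182.17, 41


def copy_overhead(base, nocopy):
    """Return (absolute difference, speed-up of nocopy over base in percent)."""
    diff = nocopy - base
    return diff, 100.0 * diff / base


rate_diff, speedup_pct = copy_overhead(base_rate, nocopy_rate)
latency_diff = base_latency - nocopy_latency

print(f"throughput lost to the copy: {rate_diff:.2f} mbyte/sec")
print(f"nocopy speed-up over base:   {speedup_pct:.1f}%")
print(f"latency added by the copy:   {latency_diff} usec")
```

With these inputs the difference comes out at about 54.5 mbyte/sec, or roughly 43% over base, with 18usec of extra latency, which is in the same ballpark as the rounded figures quoted above.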
the benchmark results, in detail: (traces for each at bottom of email)
- O_DIRECT and disk devices never touched (ie. no filesystem on them at all),
2.4.19pre8aa2 performance is down slightly to 62mbyte/sec (compared to
65mbyte/sec with 2.4.18):
[root@mel-stglab-host1 src]# readprofile -r; \
./test_disk_performance blocks=2K bs=1m direct /dev/sd[e-l] >
/tmp/direct.txt; \
readprofile -v | sort -n -k4 >> /tmp/direct.txt
Completed reading 16000 mbytes in 257.977850 seconds (62.02
Mbytes/sec), 15867usec mean
- i/o without O_DIRECT on 2.4.19pre8aa2 basically has DOUBLE the performance:
[root@mel-stglab-host1 src]# readprofile -r; \
./test_disk_performance blocks=256K bs=8k /dev/sd[e-l] >
/tmp/base.txt; \
readprofile -v | sort -n -k4 >> /tmp/base.txt
Completed reading 16000 mbytes in 125.288417 seconds (127.71
Mbytes/sec), 59usec mean
- same test using i/o on /dev/raw/rawN instead:
[root@mel-stglab-host1 src]# readprofile -r; \
./test_disk_performance blocks=2K bs=1m /dev/raw/raw[1-8] >
/tmp/raw.txt; \
readprofile -v | sort -n -k4 >> /tmp/raw.txt
Completed reading 16000 mbytes in 305.878143 seconds (52.31
Mbytes/sec), 18583usec mean
- to round out the numbers, we still have a bogus file_read_actor() that
doesn't actually
do the copy_to_user, thereby showing the overhead associated with that:
[root@mel-stglab-host1 src]# readprofile -r; \
./test_disk_performance blocks=256K bs=8k nocopy /dev/sd[e-l] >
/tmp/nocopy.txt; \
readprofile -v | sort -n -k4 >> /tmp/nocopy.txt
Completed reading 16000 mbytes in 87.827854 seconds (182.17
Mbytes/sec), 41usec mean
O_DIRECT:
[root@mel-stglab-host1 src]# tail -20 /tmp/direct.txt
8012a670 follow_page 25 0.1202
8012a740 get_user_pages 89 0.1918
80136d40 __free_pages 10 0.2083
801d28b0 generic_make_request 83 0.2730
8012aa50 mark_dirty_kiobuf 35 0.3125
8013f0e0 set_bh_page 22 0.3438
8011f950 do_softirq 88 0.3929
8023d670 sd_find_queue 26 0.4062
80142a10 max_block 54 0.4219
80200fb0 __scsi_end_request 165 0.5428
80142c80 blkdev_get_block 37 0.5781
801405d0 brw_kiovec 581 0.6371
80140560 wait_kio 90 0.8036
80152820 end_kio_request 76 0.9500
801d29e0 submit_bh 181 1.6161
8013e950 init_buffer 55 1.7188
801d22a0 __make_request 3073 1.9800
8013dd10 unlock_buffer 189 2.3625
80140520 end_buffer_io_kiobuf 408 6.3750
80106d20 default_idle 45686 713.8438
base:
[root@mel-stglab-host1 src]# tail -20 /tmp/base.txt
80133e60 kmem_cache_alloc 249 0.9154
80200fb0 __scsi_end_request 291 0.9572
80134fb0 delta_nr_inactive_pages 93 0.9688
801288b0 _spin_unlock_ 131 1.0234
8013f380 create_empty_buffers 107 1.1146
80135010 delta_nr_cache_pages 119 1.2396
801d28b0 generic_make_request 396 1.3026
8013f0e0 set_bh_page 102 1.5938
80108a48 system_call 91 1.6250
801d29e0 submit_bh 185 1.6518
801340e0 kmem_cache_free 217 1.6953
80140ea0 try_to_free_buffers 664 1.9762
801d22a0 __make_request 3214 2.0709
8012e0c0 unlock_page 283 2.2109
801298cc .text.lock.lockmeter 332 2.2432
80136d40 __free_pages 125 2.6042
801287d0 _spin_lock_ 585 5.2232
8013e970 end_buffer_io_async 1234 6.4271
8012edd0 file_read_actor 3732 33.3214
80106d20 default_idle 8875 138.6719
/dev/raw/rawN:
[root@mel-stglab-host1 src]# tail -20 /tmp/raw.txt
80122c50 tqueue_bh 4 0.1250
8012a670 follow_page 33 0.1587
8012a740 get_user_pages 118 0.2543
80203890 scsi_init_io_vc 139 0.2555
8012aa50 mark_dirty_kiobuf 36 0.3214
80136d40 __free_pages 22 0.4583
8011f950 do_softirq 113 0.5045
801d28b0 generic_make_request 204 0.6711
8013e950 init_buffer 34 1.0625
8023d670 sd_find_queue 70 1.0938
8013f0e0 set_bh_page 74 1.1562
80200fb0 __scsi_end_request 365 1.2007
801405d0 brw_kiovec 1288 1.4123
80140560 wait_kio 193 1.7232
80152820 end_kio_request 166 2.0750
801d29e0 submit_bh 347 3.0982
8013dd10 unlock_buffer 357 4.4625
801d22a0 __make_request 11014 7.0966
80140520 end_buffer_io_kiobuf 835 13.0469
80106d20 default_idle 45156 705.5625
nocopy hack:
[root@mel-stglab-host1 src]# tail -20 /tmp/nocopy.txt
8012dec0 page_cache_read 197 0.7695
80134fb0 delta_nr_inactive_pages 77 0.8021
80133e60 kmem_cache_alloc 221 0.8125
8013f020 get_unused_buffer_head 182 0.9479
801288b0 _spin_unlock_ 124 0.9688
80135010 delta_nr_cache_pages 110 1.1458
8013f380 create_empty_buffers 114 1.1875
801d28b0 generic_make_request 375 1.2336
801d29e0 submit_bh 201 1.7946
8013f0e0 set_bh_page 121 1.8906
8012e0c0 unlock_page 254 1.9844
80108a48 system_call 116 2.0714
801d22a0 __make_request 3234 2.0838
80140ea0 try_to_free_buffers 707 2.1042
801340e0 kmem_cache_free 272 2.1250
801298cc .text.lock.lockmeter 392 2.6486
80136d40 __free_pages 134 2.7917
801287d0 _spin_lock_ 636 5.6786
8013e970 end_buffer_io_async 1200 6.2500
80106d20 default_idle 5226 81.6562
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56)
2002-05-12 11:23 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56) Lincoln Dale
@ 2002-05-13 11:37 ` Andrea Arcangeli
0 siblings, 0 replies; 265+ messages in thread
From: Andrea Arcangeli @ 2002-05-13 11:37 UTC (permalink / raw)
To: Lincoln Dale; +Cc: Andrew Morton, Kernel Mailing List
On Sun, May 12, 2002 at 09:23:55PM +1000, Lincoln Dale wrote:
> O_DIRECT:
> [root@mel-stglab-host1 src]# tail -20 /tmp/direct.txt
> 8012a670 follow_page 25 0.1202
> 8012a740 get_user_pages 89 0.1918
follow_page and get_user_pages are the actual cpu cost of walking the
pagetables. that could be trimmed down by wasting some memory for an
efficient software-kernel-side tlb.
> 80136d40 __free_pages 10 0.2083
> 801d28b0 generic_make_request 83 0.2730
> 8012aa50 mark_dirty_kiobuf 35 0.3125
> 8013f0e0 set_bh_page 22 0.3438
> 8011f950 do_softirq 88 0.3929
> 8023d670 sd_find_queue 26 0.4062
> 80142a10 max_block 54 0.4219
> 80200fb0 __scsi_end_request 165 0.5428
> 80142c80 blkdev_get_block 37 0.5781
> 801405d0 brw_kiovec 581 0.6371
> 80140560 wait_kio 90 0.8036
> 80152820 end_kio_request 76 0.9500
> 801d29e0 submit_bh 181 1.6161
> 8013e950 init_buffer 55 1.7188
> 801d22a0 __make_request 3073 1.9800
> 8013dd10 unlock_buffer 189 2.3625
> 80140520 end_buffer_io_kiobuf 408 6.3750
> 80106d20 default_idle 45686 713.8438
the cpu cost is much smaller than base as you can see and most of it
will be optimized away with vary-io that should lead to follow_page and
get_user_pages to move down in the above profiling.
>
> base:
> [root@mel-stglab-host1 src]# tail -20 /tmp/base.txt
> 80133e60 kmem_cache_alloc 249 0.9154
> 80200fb0 __scsi_end_request 291 0.9572
> 80134fb0 delta_nr_inactive_pages 93 0.9688
> 801288b0 _spin_unlock_ 131 1.0234
> 8013f380 create_empty_buffers 107 1.1146
> 80135010 delta_nr_cache_pages 119 1.2396
> 801d28b0 generic_make_request 396 1.3026
> 8013f0e0 set_bh_page 102 1.5938
> 80108a48 system_call 91 1.6250
> 801d29e0 submit_bh 185 1.6518
> 801340e0 kmem_cache_free 217 1.6953
> 80140ea0 try_to_free_buffers 664 1.9762
> 801d22a0 __make_request 3214 2.0709
> 8012e0c0 unlock_page 283 2.2109
> 801298cc .text.lock.lockmeter 332 2.2432
> 80136d40 __free_pages 125 2.6042
> 801287d0 _spin_lock_ 585 5.2232
> 8013e970 end_buffer_io_async 1234 6.4271
> 8012edd0 file_read_actor 3732 33.3214
> 80106d20 default_idle 8875 138.6719
as expected the biggest cost is file_read_actor.
Both profiles look fine, as expected.
> /dev/raw/rawN:
> [root@mel-stglab-host1 src]# tail -20 /tmp/raw.txt
> 80122c50 tqueue_bh 4 0.1250
> 8012a670 follow_page 33 0.1587
> 8012a740 get_user_pages 118 0.2543
> 80203890 scsi_init_io_vc 139 0.2555
> 8012aa50 mark_dirty_kiobuf 36 0.3214
> 80136d40 __free_pages 22 0.4583
> 8011f950 do_softirq 113 0.5045
> 801d28b0 generic_make_request 204 0.6711
> 8013e950 init_buffer 34 1.0625
> 8023d670 sd_find_queue 70 1.0938
> 8013f0e0 set_bh_page 74 1.1562
> 80200fb0 __scsi_end_request 365 1.2007
> 801405d0 brw_kiovec 1288 1.4123
> 80140560 wait_kio 193 1.7232
> 80152820 end_kio_request 166 2.0750
> 801d29e0 submit_bh 347 3.0982
> 8013dd10 unlock_buffer 357 4.4625
> 801d22a0 __make_request 11014 7.0966
> 80140520 end_buffer_io_kiobuf 835 13.0469
> 80106d20 default_idle 45156 705.5625
expected again, as you can see the cost goes quite up for __make_request
compared to O_DIRECT due to the 512 b_size (raw compared to base is unfair
because base uses 1k b_size and raw uses 512 b_size and that's why it's
wasting so much cpu time there, o_direct vs base is instead fair because
they both use 1k of b_size, once o_direct will take advantage of varyio
in your upgraded driver, the comparison between o_direct vs base will
become unfair too, o_direct will be more advantaged by a virtual-common
4k b_size)
> nocopy hack:
> [root@mel-stglab-host1 src]# tail -20 /tmp/nocopy.txt
> 8012dec0 page_cache_read 197 0.7695
> 80134fb0 delta_nr_inactive_pages 77 0.8021
> 80133e60 kmem_cache_alloc 221 0.8125
> 8013f020 get_unused_buffer_head 182 0.9479
> 801288b0 _spin_unlock_ 124 0.9688
> 80135010 delta_nr_cache_pages 110 1.1458
> 8013f380 create_empty_buffers 114 1.1875
> 801d28b0 generic_make_request 375 1.2336
> 801d29e0 submit_bh 201 1.7946
> 8013f0e0 set_bh_page 121 1.8906
> 8012e0c0 unlock_page 254 1.9844
> 80108a48 system_call 116 2.0714
> 801d22a0 __make_request 3234 2.0838
> 80140ea0 try_to_free_buffers 707 2.1042
> 801340e0 kmem_cache_free 272 2.1250
> 801298cc .text.lock.lockmeter 392 2.6486
> 80136d40 __free_pages 134 2.7917
> 801287d0 _spin_lock_ 636 5.6786
> 8013e970 end_buffer_io_async 1200 6.2500
> 80106d20 default_idle 5226 81.6562
very similar to o_direct, notice the overhead in _spin_lock_ here (and
also in "base") it's certainly the page_cache lock that won't go away in
2.5 with radix tree because you're working on the same file, higher
bandwidth because the disks run at the same time and possibly because
read/write returns faster and part of the cost of the I/O happens
outside your time measurements (for a more fair comparison you can
benchmark the whole workload, not only how fast read/write returns).
Andrea
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-10 6:50 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Lincoln Dale
2002-05-10 7:15 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14IDE 56) Andrew Morton
@ 2002-05-10 15:55 ` Linus Torvalds
2002-05-11 1:01 ` Gerrit Huizenga
2002-05-11 14:18 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Roy Sigurd Karlsbakk
1 sibling, 2 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-10 15:55 UTC (permalink / raw)
To: Lincoln Dale
Cc: Andrew Morton, Alan Cox, Martin Dalecki, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
On Fri, 10 May 2002, Lincoln Dale wrote:
>
> so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no
> O_DIRECT. anyone have any clues?
Yes.
O_DIRECT isn't doing any read-ahead.
For O_DIRECT to be a win, you need to make it asynchronous.
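
To illustrate what "asynchronous" buys when the kernel does no read-ahead for you, here is a minimal user-space sketch that keeps one read in flight while the caller works on the previous block. It is an assumption-laden illustration only: it uses ordinary buffered `os.pread` rather than O_DIRECT (which would also need aligned buffers), and the function name `read_with_prefetch` is made up.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_with_prefetch(path, block_size=1 << 20):
    """Yield blocks of `path`, keeping one read in flight ahead of the consumer."""
    fd = os.open(path, os.O_RDONLY)
    try:
        with ThreadPoolExecutor(max_workers=1) as pool:
            offset = 0
            pending = pool.submit(os.pread, fd, block_size, offset)
            while True:
                block = pending.result()      # wait only if the read is not done yet
                if not block:                 # empty result means end of file
                    break
                offset += len(block)
                # Start the next read before handing this block to the caller,
                # so I/O and processing overlap.
                pending = pool.submit(os.pread, fd, block_size, offset)
                yield block
    finally:
        os.close(fd)
```

The consumer simply iterates over `read_with_prefetch(path)`; the overlap between the pending read and the caller's work is what a synchronous read() with no read-ahead cannot provide.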
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-10 15:55 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Linus Torvalds
@ 2002-05-11 1:01 ` Gerrit Huizenga
2002-05-11 18:04 ` Linus Torvalds
2002-05-11 14:18 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Roy Sigurd Karlsbakk
1 sibling, 1 reply; 265+ messages in thread
From: Gerrit Huizenga @ 2002-05-11 1:01 UTC (permalink / raw)
To: Linus Torvalds
Cc: Lincoln Dale, Andrew Morton, Alan Cox, Martin Dalecki,
Padraig Brady, Anton Altaparmakov, Kernel Mailing List
In message <Pine.LNX.4.44.0205100854370.2230-100000@home.transmeta.com>,
Linus Torvalds writes:
>
>
> On Fri, 10 May 2002, Lincoln Dale wrote:
> >
> > so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no
> > O_DIRECT. anyone have any clues?
>
> Yes.
>
> O_DIRECT isn't doing any read-ahead.
>
> For O_DIRECT to be a win, you need to make it asynchronous.
>
> Linus
O_DIRECT is especially useful for applications which maintain their
own cache, e.g. a database. And adding Async to it is an even bigger
bonus (another Oracleism we did in PTX). No read ahead, no attempt
to keep the buffer in memory until memory pressure kicks in. Just
a good tool for doing random IO (like an OLTP database would do).
gerrit
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 1:01 ` Gerrit Huizenga
@ 2002-05-11 18:04 ` Linus Torvalds
2002-05-11 18:19 ` Larry McVoy
` (3 more replies)
0 siblings, 4 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-11 18:04 UTC (permalink / raw)
To: Gerrit Huizenga
Cc: Lincoln Dale, Andrew Morton, Alan Cox, Martin Dalecki,
Padraig Brady, Anton Altaparmakov, Kernel Mailing List
On Fri, 10 May 2002, Gerrit Huizenga wrote:
> In message <Pine.LNX.4.44.0205100854370.2230-100000@home.transmeta.com>,
> Linus Torvalds writes:
> >
> > For O_DIRECT to be a win, you need to make it asynchronous.
>
> O_DIRECT is especially useful for applications which maintain their
> own cache, e.g. a database. And adding Async to it is an even bigger
> bonus (another Oracleism we did in PTX).
The thing that has always disturbed me about O_DIRECT is that the whole
interface is just stupid, and was probably designed by a deranged monkey
on some serious mind-controlling substances [*].
It's simply not very pretty, and it doesn't perform very well either
because of the bad interfaces (where synchronicity of read/write is part
of it, but the inherent page-table-walking is another issue).
I bet you could get _better_ performance more cleanly by splitting up the
actual IO generation and the "user-space mapping" thing sanely. For
example, if you want to do an O_DIRECT read into a buffer, there is no
reason why it shouldn't be done in two phases:
(1) readahead: allocate pages, and start the IO asynchronously
(2) mmap the file with a MAP_UNCACHED flag, which causes read-faults to
"steal" the page from the page cache and make it private to the
mapping on page faults.
If you split it up like that, you can do much more interesting things than
O_DIRECT can do (ie the above is inherently asynchronous - we'll wait only
for IO to complete when the page is actually faulted in).
For O_DIRECT writes, you split it the other way around:
(1) mwrite() takes the pages in the memory area, and moves them into the
page cache, removing the page from the page table (and only copies
if existing pages already exist)
(2) fdatasync_area(fd, offset, len)
Again, the above is likely to be a lot more efficient _and_ can do things
that O_DIRECT only dreams of.
With my suggested _sane_ interface, I can do a noncached file copy that
should be "perfect" even in the face of memory pressure by simply doing
addr = mmap( .. MAP_UNCACHED .. src .. )
mwrite(dst, addr, len);
which does true zero-copy (and, since mwrite removes it from the page
table anyway, you can actually avoid even the TLB overhead trivially: if
mwrite notices that the page isn't mapped, it will just take it directly
from the page cache).
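
For comparison, the closest shape you can write with interfaces that exist today looks roughly like the sketch below. It is not the proposal: MAP_UNCACHED and mwrite() are suggestions made in this mail and do not exist, so this version uses an ordinary read-only mapping plus a read-ahead hint, and the final write() still copies the data. The function name `copy_via_mmap` is made up.

```python
import mmap
import os


def copy_via_mmap(src_path, dst_path):
    """Copy src to dst through a mapping of src; return the number of bytes copied."""
    with open(src_path, "rb") as src, open(dst_path, "wb") as dst:
        size = os.fstat(src.fileno()).st_size
        if size == 0:
            return 0  # an empty file cannot be mapped
        # Phase 1: ask the kernel to start reading ahead asynchronously.
        # posix_fadvise is only a hint and is not available on every platform.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(src.fileno(), 0, size, os.POSIX_FADV_WILLNEED)
        # Phase 2: map the file; pages are faulted in as they are touched.
        with mmap.mmap(src.fileno(), size, access=mmap.ACCESS_READ) as view:
            # Unlike the proposed mwrite(), this write copies out of the mapping.
            dst.write(view)
        return size
```

The two phases line up with the readahead and mmap steps described above; the part that has no existing equivalent is moving the pages into the destination's page cache instead of copying them.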
Sadly, database people don't seem to have any understanding of good taste,
and various OS people end up usually just saying "Yes, Mr Oracle, I'll
open up any orifice I have for your pleasure".
Linus
[*] In other words, it's an Oracleism.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:04 ` Linus Torvalds
@ 2002-05-11 18:19 ` Larry McVoy
2002-05-11 18:35 ` Linus Torvalds
2002-05-11 18:26 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 Alan Cox
` (2 subsequent siblings)
3 siblings, 1 reply; 265+ messages in thread
From: Larry McVoy @ 2002-05-11 18:19 UTC (permalink / raw)
To: Linus Torvalds
Cc: Gerrit Huizenga, Lincoln Dale, Andrew Morton, Alan Cox,
Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
On Sat, May 11, 2002 at 11:04:45AM -0700, Linus Torvalds wrote:
> The thing that has always disturbed me about O_DIRECT is that the whole
> interface is just stupid, and was probably designed by a deranged monkey
> on some serious mind-controlling substances [*].
>
> I bet you could get _better_ performance more cleanly by splitting up the
> actual IO generation and the "user-space mapping" thing sanely. For
> example, if you want to do an O_DIRECT read into a buffer, there is no
> reason why it shouldn't be done in two phases:
You're only halfway right. You want to avoid the mmap altogether. To see
why, postulate that you have infinitely fast I/O devices (I know that's
not true but it's close enough if you get enough DMA channels going at
once, it doesn't take very many to saturate memory). For any server
application, now all your time is in the mmap(). And there is no need
for it in general, it's just there because the upper layer of the system
is too lame to handle real page frames.
Go read the splice notes, ftp://bitmover.com/pub/splice.ps because those
were written after we had tuned things enough in IRIX that it was the
VM manipulations that became the bottleneck.
Another way to think of it is this: figure out how fast the hardware could
move the data. Now make it go that fast. Unless you can hide all the
VM crud somehow, you won't achieve 100% of the hardware's capability.
I know I've done a bad job explaining the splice crud, but there is
some pretty cool stuff in there, if you really got it, you'd see how
the server stuff, the database stuff, the aio stuff, all I/O of any
kind can be done in terms of the splice:pull() and splice:push()
interfaces and that it is the absolute lowest cost way to have a
generic I/O layer.
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:19 ` Larry McVoy
@ 2002-05-11 18:35 ` Linus Torvalds
2002-05-11 18:37 ` Larry McVoy
` (2 more replies)
0 siblings, 3 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-11 18:35 UTC (permalink / raw)
To: Larry McVoy
Cc: Gerrit Huizenga, Lincoln Dale, Andrew Morton, Alan Cox,
Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
On Sat, 11 May 2002, Larry McVoy wrote:
>
> You're only halfway right. You want to avoid the mmap altogether.
See my details on doing the perfect zero-copy copy thing.
The mmap doesn't actually touch the page tables - it ends up being nothing
but a "placeholder".
So if you do
addr = mmap( .. MAP_UNCACHED .. src .. )
mwrite(dst, addr, len);
then you can think of the mmap as just a "cookie" or the "hose" between
the source and the destination.
Does it have to be an mmap? No. But the advantage of the mmap is that you
can use the mmap to modify the stream if you want to, quite transparently.
And it gives the whole thing a whole lot more flexibility, in that if you
generate the data yourself, you'd just do the mwrite() - again with zero
copy overhead.
And I personally believe that "generate the data yourself" is actually a
very common case. A pure pipe between two places is not what a computer is
good at, or what a computer should be used for.
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:35 ` Linus Torvalds
@ 2002-05-11 18:37 ` Larry McVoy
2002-05-11 18:56 ` Linus Torvalds
2002-05-11 18:43 ` Mr. James W. Laferriere
2002-05-11 23:38 ` Lincoln Dale
2 siblings, 1 reply; 265+ messages in thread
From: Larry McVoy @ 2002-05-11 18:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Larry McVoy, Gerrit Huizenga, Lincoln Dale, Andrew Morton,
Alan Cox, Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
On Sat, May 11, 2002 at 11:35:21AM -0700, Linus Torvalds wrote:
> See my details on doing the perfect zero-copy copy thing.
>
> The mmap doesn't actually touch the page tables - it ends up being nothing
> but a "placeholder".
Huh, I must have missed something, does the mmap() not create any page
tables at all?
--
---
Larry McVoy lm at bitmover.com http://www.bitmover.com/lm
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:37 ` Larry McVoy
@ 2002-05-11 18:56 ` Linus Torvalds
2002-05-11 21:42 ` Gerrit Huizenga
0 siblings, 1 reply; 265+ messages in thread
From: Linus Torvalds @ 2002-05-11 18:56 UTC (permalink / raw)
To: Larry McVoy
Cc: Gerrit Huizenga, Lincoln Dale, Andrew Morton, Alan Cox,
Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
On Sat, 11 May 2002, Larry McVoy wrote:
> On Sat, May 11, 2002 at 11:35:21AM -0700, Linus Torvalds wrote:
> > See my details on doing the perfect zero-copy copy thing.
> >
> > The mmap doesn't actually touch the page tables - it ends up being nothing
> > but a "placeholder".
>
> Huh, I must have missed something, does the mmap() not create any page
> tables at all?
It can. But go down to the end in my first explanation to see why it
doesn't have to.
I'll write up the implementation notes and you'll see what I'm talking
about:
- readahead(fd, offset, size)
Obvious (except the readahead is free to ignore the size, it's just a
hint)
- mmap( MAP_UNCACHED )
This only sets up the "vma" descriptor (like all other MMAP's). It's
exactly like a regular private mapping, except instead of just
incrementing the page count on a page-in, it will look at whether the
page can just be removed from the page cache and inserted as a private
page into the mapping ("stealing" the page).
- fdatasync_area( fd, offset, len)
Obvious. It's fdatasync, except it only guarantees the specific range.
- mwrite(fd, addr, len)
This really does the "reverse" of mmap(MAP_UNCACHED) (and like a
mapping, addr/len have to be page-aligned).
This walks the page tables, and does the _smart_ thing:
- if no mapping exists, it looks at the backing store of the vma,
and gets the page directly from the backing store instead of
bothering to populate the page tables.
- if the mapped page exists, it removes it from the page table
- in either case, it moves the page it got into the page cache of the
destination file descriptor.
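
The mwrite() steps above can be modelled with plain dictionaries to make the page movement concrete. This is a toy model under stated assumptions, not kernel code: page caches and page tables are dicts, a "page" is a bytearray, the `Vma` class and its `fault` method are hypothetical, and taking a page from the backing store is modelled as removing it from the source cache (the "stealing" behaviour of the proposed MAP_UNCACHED).

```python
class Vma:
    """Toy mapping of a source file's pages, starting at virtual page `start`."""

    def __init__(self, start, npages, source_cache, source_index=0):
        self.start = start
        self.npages = npages
        self.source_cache = source_cache  # source page cache: file index -> page
        self.source_index = source_index  # file index backing virtual page `start`
        self.page_table = {}              # virtual page -> page, filled only on faults

    def file_index(self, vpage):
        return self.source_index + (vpage - self.start)

    def fault(self, vpage):
        """Model a read fault: steal the page from the source page cache."""
        page = self.source_cache.pop(self.file_index(vpage))
        self.page_table[vpage] = page
        return page


def mwrite(dst_cache, dst_index, vma, first_vpage, npages):
    """Move pages into dst_cache without copying their contents."""
    for i in range(npages):
        vpage = first_vpage + i
        if vpage in vma.page_table:
            # Mapped page: remove it from the page table.
            page = vma.page_table.pop(vpage)
        else:
            # No mapping: take the page straight from the backing store,
            # without ever populating the page table.
            page = vma.source_cache.pop(vma.file_index(vpage))
        # In either case the same page object lands in the destination cache.
        dst_cache[dst_index + i] = page
```

If nothing ever calls `fault()`, the page table stays empty for the whole copy, which is the "never build up any page tables" case.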
NOTE on zero-copy / no-page-fault behaviour:
- mwrite has to walk the page tables _anyway_ (the same as O_DIRECT),
since that's the only way to do zero-copy.
- since mwrite has to do that part, it's trivial to notice that the page
tables don't exist. In fact, it's a very natural result of the whole
algorithm.
- if user space doesn't touch the mapping itself in any way (other than
point mwrite() at it), you never build up any page tables at all, and
you never even need to touch the TLB (ie no flushes, no nothing).
- note how even "mmap( MAP_UNCACHED )" doesn't actually touch the TLB or
the page tables (unless it uses MAP_FIXED and you use it to unmap a
previous area, of course - that's all in the normal mmap code already)
See?
I will _guarantee_ that this is more efficient than any O_DIRECT ever was,
and it will get very close to your "optimal" thing (it does need to look
at some page tables, but since the page tables haven't ever really needed
to be built up for the pure copy case, it will be able to decide that the
page isn't there from the top-level page table if you align the virtual
area properly - ie at 4MB boundaries on an x86).
I suspect that this is about a few hundred lines of code (and a lot of
testing). And you can emulate O_DIRECT behaviour with it, along with
splice (only for page-cache entities, though), and a lot of other
off-by-one uses.
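To make the four steps above concrete, here is a minimal userspace sketch of the copy. It is an emulation under loud assumptions: MAP_UNCACHED, mwrite() and fdatasync_area() are only proposals in this message and do not exist, so the sketch substitutes an ordinary private mmap(), a pwrite() (which copies, so this is not zero-copy), and a whole-file fdatasync(). posix_fadvise(POSIX_FADV_WILLNEED) stands in for the readahead hint, and the name copy_via_mapping is made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy all of src_path to dst_path, following the order of the proposed
 * calls but using only interfaces that exist today.
 * Returns the number of bytes copied, or -1 on error.
 */
long copy_via_mapping(const char *src_path, const char *dst_path)
{
    struct stat st;
    void *map;
    off_t done = 0;
    long copied = -1;
    int src, dst;

    src = open(src_path, O_RDONLY);
    if (src < 0)
        return -1;
    dst = open(dst_path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) {
        close(src);
        return -1;
    }

    if (fstat(src, &st) == 0 && st.st_size == 0) {
        copied = 0;                     /* nothing to map: mmap rejects length 0 */
    } else if (st.st_size > 0) {
        /* readahead(): start the read I/O early; the kernel may ignore the hint */
        (void)posix_fadvise(src, 0, st.st_size, POSIX_FADV_WILLNEED);

        /* stand-in for mmap(MAP_UNCACHED): a plain private mapping, no page stealing */
        map = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, src, 0);
        if (map != MAP_FAILED) {
            /* stand-in for mwrite(): pwrite() copies the pages instead of moving them */
            while (done < st.st_size) {
                ssize_t n = pwrite(dst, (const char *)map + done,
                                   (size_t)(st.st_size - done), done);
                if (n <= 0)
                    break;
                done += n;
            }
            munmap(map, (size_t)st.st_size);

            /* stand-in for fdatasync_area(): sync the whole file, not just a range */
            if (done == st.st_size && fdatasync(dst) == 0)
                copied = (long)done;
        }
    }

    close(dst);
    close(src);
    return copied;
}
```

Calling copy_via_mapping("in", "out") touches every source page through the mapping, which is exactly the page-table work the proposal is meant to avoid; the sketch only shows the call order, not the performance property.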
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:56 ` Linus Torvalds
@ 2002-05-11 21:42 ` Gerrit Huizenga
0 siblings, 0 replies; 265+ messages in thread
From: Gerrit Huizenga @ 2002-05-11 21:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Larry McVoy, Lincoln Dale, Andrew Morton, Alan Cox,
Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
In message <Pine.LNX.4.44.0205111141070.879-100000@home.transmeta.com>, > :
Linus Torvalds writes:
>
>
> On Sat, 11 May 2002, Larry McVoy wrote:
> > On Sat, May 11, 2002 at 11:35:21AM -0700, Linus Torvalds wrote:
> > > See my details on doing the perfect zero-copy copy thing.
> > >
> > > The mmap doesn't actually touch the page tables - it ends up being nothing
> > > but a "placeholder".
>
> I'll write up the implementation notes and you'll see what I'm talking
> about:
>
> - readahead(fd, offset, size)
>
> Obvious (except the readahead is free to ignore the size, it's just a
> hint)
[...snip... lots of good ideas...]
I'm not sure this is quite the same problem that Oracle (and others)
typically used O_DIRECT for (not trying to be an apologist here, just
making sure the right problem gets solved)...
Most of what Oracle was managing with O_DIRECT was its "Shared Global
Area", which is usually a region of all possible memory that the OS
and other applications aren't using. It uses that space like a giant
buffer cache. Most of the IO's for OLTP applications were little
bitty random 2K IOs. So, their ideal goal was to have the ability to
say here's a list of 10,000 random 2K IOs I want you to do really
quickly and spread them out at these spots within the SGA. Those IOs
can be read asynchronously, but there needs to be some way to know when
the bits make it from disk to memory. Think of it as something like
a big async readv, ideally with the buffer cache and as much of the OS
out of the way as possible.
When the SGA is "full" (memory pressure) they do big async, no buffer
cache, non-deferred writev's (by non deferred, I mean that the write
is actually scheduled for disk, not buffered in memory indefinitely -
they really believe they are done with those buffers).
Now the mmap( MAP_UNCACHED ) thing might work, except that this isn't
really a private mapping - it's a shared mapping. So something like
tmpfs might be the answer, where the tmpfs had a property of being
uncached (in fact, Oracle would love it if that space were pinned into
memory/non-pageable/non-swappable). That way the clients don't block
taking page faults and the server schedules activities to get the
greatest throughput (e.g. schedule clients who wouldn't block).
Unfortunately, tmpfs takes away the niceness of the VM optimizations,
I fear.
Oh, and Database DSS workloads (Decision Support, scan all disks looking
for needles in a big haystack) have different tradeoffs, mostly needing
to focus on lots of sequential IO, where pre-fetching, reading, and
discarding buffers immediately after use are the primary focus and write
performance is not critical.
gerrit
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:35 ` Linus Torvalds
2002-05-11 18:37 ` Larry McVoy
@ 2002-05-11 18:43 ` Mr. James W. Laferriere
2002-05-11 23:38 ` Lincoln Dale
2 siblings, 0 replies; 265+ messages in thread
From: Mr. James W. Laferriere @ 2002-05-11 18:43 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
Hello Linus ,
On Sat, 11 May 2002, Linus Torvalds wrote:
... Large snip ...
> And I personally believe that "generate the data yourself" is actually a
> very common case. A pure pipe between two places is not what a computer is
> good at, or what a computer should be used for.
Hmmm , (This may not apply here But ...)
What about linux as a router (ip/ipx/...) or a bridge device ?
Tia , JimL
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | P.O. Box 854 | Give me Linux |
| babydr@baby-dragons.com | Coudersport PA 16915 | only on AXP |
+------------------------------------------------------------------+
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:35 ` Linus Torvalds
2002-05-11 18:37 ` Larry McVoy
2002-05-11 18:43 ` Mr. James W. Laferriere
@ 2002-05-11 23:38 ` Lincoln Dale
2002-05-12 0:36 ` yodaiken
2 siblings, 1 reply; 265+ messages in thread
From: Lincoln Dale @ 2002-05-11 23:38 UTC (permalink / raw)
To: Linus Torvalds
Cc: Larry McVoy, Gerrit Huizenga, Andrew Morton, Alan Cox,
Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
as the person who started this whole thread and made the assertion that
copying from A to B is common:
At 11:35 AM 11/05/2002 -0700, Linus Torvalds wrote:
>And I personally believe that "generate the data yourself" is actually a
>very common case. A pure pipe between two places is not what a computer is
>good at, or what a computer should be used for.
i think you'd be surprised. if we include "pipe from disk to network" then
a large number of 'server' applications do exactly this.
webservers do. fileservers do. http caches do. streaming-media servers do.
sure, they may add additional headers on the front and still generate
dynamic content in some cases, but the "common case" is 'pipe from disk to
network' or 'pipe from network to disk'.
'network' is typically TCP but can be UDP (with rate-limiting) in some cases.
it's very good to see this being discussed. that's a large step forward from
many people believing the problem was nonexistent.
i'm skeptical that continuing to use the page-cache is the correct way to
go -- many of these kinds of applications are doing their own form of
memory-management and hot-content 'caching' so are happy to manage a
few-to-several hundred megabytes of "page cache equivalent" data themselves.
at least on many of the 2.3.xx linux releases, that was one of the big
attractions of 'raw' devices -- they didn't get the box into an OOM situation.
if 2.5.xx and recent 2.4.xx have the issues of
page-cache-doesn't-shrink-fast-enough solved, then it's foreseeable it will fly.
cheers,
lincoln.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 23:38 ` Lincoln Dale
@ 2002-05-12 0:36 ` yodaiken
2002-05-12 2:40 ` Andrew Morton
0 siblings, 1 reply; 265+ messages in thread
From: yodaiken @ 2002-05-12 0:36 UTC (permalink / raw)
To: Lincoln Dale
Cc: Linus Torvalds, Larry McVoy, Gerrit Huizenga, Andrew Morton,
Alan Cox, Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
We did some i/o profiling about 6 years ago on a big scientific
app that had started in fortran and had been rewritten in c++.
The fortran code used r/w on files and used temp files;
the c++ did memmaps and had big data structures - taking advantage of
memory management.
One thing I thought was interesting is that it was easy to see how a smart
algorithm, not even such a smart one, could adapt i/o to the patterns of
i/o in the fortran code, but the c++ i/o patterns were really complex.
When everything goes into the page cache, it seems like you will lose
information.
On Sun, May 12, 2002 at 09:38:12AM +1000, Lincoln Dale wrote:
> as the person who started this whole thread and made the assertion that
> copying from A to B is common:
>
> At 11:35 AM 11/05/2002 -0700, Linus Torvalds wrote:
> >And I personally believe that "generate the data yourself" is actually a
> >very common case. A pure pipe between two places is not what a computer is
> >good at, or what a computer should be used for.
>
> i think you'd be surprised. if we include "pipe from disk to network" then
> a large number of 'server' applications do exactly this.
> webservers do. fileservers do. http caches do. streaming-media servers do.
>
> sure, they may add additional headers on the front and still generate
> dynamic content in some cases, but the "common case" is 'pipe from disk to
> network' or 'pipe from network to disk'.
> 'network' is typically TCP but can be UDP (with rate-limiting) in some cases.
>
>
> it's very good to see this being discussed. that's a large step forward from
> many people believing the problem was nonexistent.
>
> i'm skeptical that continuing to use the page-cache is the correct way to
> go -- many of these kinds of applications are doing their own form of
> memory-management and hot-content 'caching' so are happy to manage a
> few-to-several hundred megabytes of "page cache equivalent" data themselves.
> at least on many of the 2.3.xx linux releases, that was one of the big
> attractions of 'raw' devices -- they didn't get the box into an OOM situation.
> if 2.5.xx and recent 2.4.xx have the issues of
> page-cache-doesn't-shrink-fast-enough solved, then it's foreseeable it will fly.
>
>
> cheers,
>
> lincoln.
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
www.fsmlabs.com www.rtlinux.com
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-12 0:36 ` yodaiken
@ 2002-05-12 2:40 ` Andrew Morton
0 siblings, 0 replies; 265+ messages in thread
From: Andrew Morton @ 2002-05-12 2:40 UTC (permalink / raw)
To: yodaiken
Cc: Lincoln Dale, Linus Torvalds, Larry McVoy, Gerrit Huizenga,
Alan Cox, Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
yodaiken@fsmlabs.com wrote:
>
> We did some i/o profiling about 6 years ago on a big scientific
> app that had started in fortran and had been rewritten in c++
> the fortran code used r/w on files and used temp files
> the c++ did memmaps and had big data structures - taking advantage of
> memory management.
> one thing I thought was interesting is that it was easy to see how a smart
> algorithm, not even such a smart one, could adapt i/o to the patterns of
> i/o in the fortran code, but the c++ i/o patterns were really complex.
>
> when everything goes into the page cache, it seems like you will lose
> information.
>
That is certainly the case. If the application is seekily writing
to a file then we currently lay the file out on-disk in the order
in which the application seeked. So reading the file back
linearly is very slow.
Now this is not necessarily a bad thing - if the file was created
seekily then it will probably be _used_ seekily so no big
loss probably.
This problem is pretty unsolvable for filesystems which map blocks
to their disk address at write(2) time. It can be solved for
allocate-on-flush filesystems via a sort of the dirty page list,
or by maintaining ->dirty_pages in a tree or whatever.
There is one "file" where this problem really does matter - the
blockdev mapping "/dev/hda1". It is both highly fragmented and
poorly sorted on the dirty_pages list.
It's pretty trivial to perform a sillysort at writeout time:
if we just wrote page N and the next page isn't N+1 then do a
pagecache probe for "N+1". That's probably sufficient. If
not, there's a simple little sort routine over at
http://www.chiark.greenend.org.uk/~sgtatham/algorithms/listsort.html
which is appropriate to our lists.
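As a rough illustration of sorting a dirty-page list by page index before writeout, here is a small merge sort over a singly linked list. Assumptions: struct dirty_page and sort_dirty_pages are made-up names, not kernel structures, and this is a plain top-down merge sort rather than the bottom-up algorithm on the page linked above.

```c
#include <stddef.h>

/* Hypothetical stand-in for an entry on a dirty-page list. */
struct dirty_page {
    unsigned long index;            /* page offset within the file */
    struct dirty_page *next;
};

/* Merge two lists that are already sorted by ascending index. */
static struct dirty_page *merge_sorted(struct dirty_page *a, struct dirty_page *b)
{
    struct dirty_page head = { 0, NULL };
    struct dirty_page *tail = &head;

    while (a && b) {
        if (a->index <= b->index) {
            tail->next = a;
            a = a->next;
        } else {
            tail->next = b;
            b = b->next;
        }
        tail = tail->next;
    }
    tail->next = a ? a : b;         /* append whatever is left over */
    return head.next;
}

/* Sort the list by ascending page index and return the new head. */
struct dirty_page *sort_dirty_pages(struct dirty_page *list)
{
    struct dirty_page *slow, *fast, *second;

    if (!list || !list->next)
        return list;

    /* Find the middle with slow/fast pointers and cut the list in two. */
    slow = list;
    fast = list->next;
    while (fast && fast->next) {
        slow = slow->next;
        fast = fast->next->next;
    }
    second = slow->next;
    slow->next = NULL;

    return merge_sorted(sort_dirty_pages(list), sort_dirty_pages(second));
}
```

Once the list is in index order, writeout walks pages N, N+1, N+2 and so on, which is the effect the cheaper "probe for N+1" trick tries to approximate without a full sort.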
I'll be taking a look at the sillysort option once I've cleared away
some other I/O scheduling glitches.
-
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14
2002-05-11 18:04 ` Linus Torvalds
2002-05-11 18:19 ` Larry McVoy
@ 2002-05-11 18:26 ` Alan Cox
2002-05-11 18:09 ` Linus Torvalds
2002-05-11 18:45 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) yodaiken
2002-05-11 19:55 ` O_DIRECT performance impact on 2.4.18 Bernd Eckenfels
3 siblings, 1 reply; 265+ messages in thread
From: Alan Cox @ 2002-05-11 18:26 UTC (permalink / raw)
To: Linus Torvalds
Cc: Gerrit Huizenga, Lincoln Dale, Andrew Morton, Alan Cox,
Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
> > O_DIRECT is especially useful for applications which maintain their
> > own cache, e.g. a database. And adding Async to it is an even bigger
> > bonus (another Oracleism we did in PTX).
>
> The thing that has always disturbed me about O_DIRECT is that the whole
> interface is just stupid, and was probably designed by a deranged monkey
> on some serious mind-controlling substances [*].
Used with aio it's extremely nice. Without the aio patches it's a bit lacking
whenever readahead is useful.
Alan
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14
2002-05-11 18:26 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 Alan Cox
@ 2002-05-11 18:09 ` Linus Torvalds
0 siblings, 0 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-11 18:09 UTC (permalink / raw)
To: Alan Cox
Cc: Gerrit Huizenga, Lincoln Dale, Andrew Morton, Martin Dalecki,
Padraig Brady, Anton Altaparmakov, Kernel Mailing List
On Sat, 11 May 2002, Alan Cox wrote:
> >
> > The thing that has always disturbed me about O_DIRECT is that the whole
> > interface is just stupid, and was probably designed by a deranged monkey
> > on some serious mind-controlling substances [*].
>
> Used with aio it's extremely nice. Without the aio patches it's a bit lacking
> whenever readahead is useful.
But the point is that AIO is needed just to cover up the fundamental
idiocy in the interface. If the interface had been properly designed, it
would have been useful _without_ AIO.
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:04 ` Linus Torvalds
2002-05-11 18:19 ` Larry McVoy
2002-05-11 18:26 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 Alan Cox
@ 2002-05-11 18:45 ` yodaiken
2002-05-11 19:55 ` O_DIRECT performance impact on 2.4.18 Bernd Eckenfels
3 siblings, 0 replies; 265+ messages in thread
From: yodaiken @ 2002-05-11 18:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: Gerrit Huizenga, Lincoln Dale, Andrew Morton, Alan Cox,
Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
On Sat, May 11, 2002 at 11:04:45AM -0700, Linus Torvalds wrote:
> (1) readahead: allocate pages, and start the IO asynchronously
> (2) mmap the file with a MAP_UNCACHED flag, which causes read-faults to
> "steal" the page from the page cache and make it private to the
> mapping on page faults.
>
> If you split it up like that, you can do much more interesting things than
> O_DIRECT can do (ie the above is inherently asynchronous - we'll wait only
> for IO to complete when the page is actually faulted in).
I've never liked mmap although that may just be my advanced age
("we never had mmap, we copied files by cutting cuneiform in fresh
clay tablets, the way the gods intended ")
/* Hypothetical interface: struct kio, KIO_READ and KIO_WRITE do not exist. */
struct kio k;
int fd1, fd1a, fd2;
ssize_t n;

k.count = RECORDSIZE;
fd1 = open("inputfile", KIO_READ);
fd1a = dup(fd1);    /* dup creates a non-KIO descriptor for the same file */
fd2 = open("outputfile", KIO_WRITE);
while ((n = read(fd1, &k, sizeof(struct kio))) > 0) {
    write(fd2, &k, sizeof(struct kio));
    if (k.seekposition % 10000 == 0) {
        write(fd1a, "Another record sent, Mr E.\n", GROVELSIZE);
    }
}
> Sadly, database people don't seem to have any understanding of good taste,
> and various OS people end up usually just saying "Yes, Mr Oracle, I'll
> open up any orifice I have for your pleasure".
When you drive by that campus in redwood city you start to understand how
insignificant you are.
--
---------------------------------------------------------
Victor Yodaiken
Finite State Machine Labs: The RTLinux Company.
www.fsmlabs.com www.rtlinux.com
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18
2002-05-11 18:04 ` Linus Torvalds
` (2 preceding siblings ...)
2002-05-11 18:45 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) yodaiken
@ 2002-05-11 19:55 ` Bernd Eckenfels
3 siblings, 0 replies; 265+ messages in thread
From: Bernd Eckenfels @ 2002-05-11 19:55 UTC (permalink / raw)
To: linux-kernel
In article <Pine.LNX.4.44.0205111047280.2355-100000@home.transmeta.com> you wrote:
> I bet you could get _better_ performance more cleanly by splitting up the
> actual IO generation and the "user-space mapping" thing sanely. For
> example, if you want to do an O_DIRECT read into a buffer, there is no
> reason why it shouldn't be done in two phases:
This works for your load, but it does not work for the load it is designed
for. Sequential reading and throughput are not the way to measure it. You
need random reading and latency to see its merits.
Greetings
Bernd
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-10 15:55 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Linus Torvalds
2002-05-11 1:01 ` Gerrit Huizenga
@ 2002-05-11 14:18 ` Roy Sigurd Karlsbakk
2002-05-11 14:24 ` Jens Axboe
2002-05-11 23:17 ` Lincoln Dale
1 sibling, 2 replies; 265+ messages in thread
From: Roy Sigurd Karlsbakk @ 2002-05-11 14:18 UTC (permalink / raw)
To: Linus Torvalds, Lincoln Dale
Cc: Andrew Morton, Alan Cox, Martin Dalecki, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
On Friday 10 May 2002 17:55, Linus Torvalds wrote:
> On Fri, 10 May 2002, Lincoln Dale wrote:
> > so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no
> > O_DIRECT. anyone have any clues?
>
> Yes.
>
> O_DIRECT isn't doing any read-ahead.
>
> For O_DIRECT to be a win, you need to make it asynchronous.
Will the use of O_DIRECT affect disk elevatoring?
Sorry if this is OT - I just need to know
Please cc: to me as I'm not on the list
roy
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 14:18 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Roy Sigurd Karlsbakk
@ 2002-05-11 14:24 ` Jens Axboe
2002-05-11 18:25 ` Gerrit Huizenga
2002-05-11 23:17 ` Lincoln Dale
1 sibling, 1 reply; 265+ messages in thread
From: Jens Axboe @ 2002-05-11 14:24 UTC (permalink / raw)
To: Roy Sigurd Karlsbakk
Cc: Linus Torvalds, Lincoln Dale, Andrew Morton, Alan Cox,
Martin Dalecki, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
On Sat, May 11 2002, Roy Sigurd Karlsbakk wrote:
> On Friday 10 May 2002 17:55, Linus Torvalds wrote:
> > On Fri, 10 May 2002, Lincoln Dale wrote:
> > > so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no
> > > O_DIRECT. anyone have any clues?
> >
> > Yes.
> >
> > O_DIRECT isn't doing any read-ahead.
> >
> > For O_DIRECT to be a win, you need to make it asynchronous.
>
> Will the use of O_DIRECT affect disk elevatoring?
No, the I/O scheduler can't even tell whether it's being handed
O_DIRECT buffers or not.
Jens
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 14:24 ` Jens Axboe
@ 2002-05-11 18:25 ` Gerrit Huizenga
2002-05-11 20:17 ` Jens Axboe
0 siblings, 1 reply; 265+ messages in thread
From: Gerrit Huizenga @ 2002-05-11 18:25 UTC (permalink / raw)
To: Jens Axboe
Cc: Roy Sigurd Karlsbakk, Linus Torvalds, Lincoln Dale,
Andrew Morton, Alan Cox, Martin Dalecki, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
In message <20020511142434.GA1224@suse.de>, > : Jens Axboe writes:
> On Sat, May 11 2002, Roy Sigurd Karlsbakk wrote:
> > On Friday 10 May 2002 17:55, Linus Torvalds wrote:
> > > On Fri, 10 May 2002, Lincoln Dale wrote:
> > > > so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no
> > > > O_DIRECT. anyone have any clues?
> > >
> > > Yes.
> > >
> > > O_DIRECT isn't doing any read-ahead.
> > >
> > > For O_DIRECT to be a win, you need to make it asynchronous.
> >
> > Will the use of O_DIRECT affect disk elevatoring?
>
> No, the I/O scheduler can't even tell whether it's being handed
> O_DIRECT buffers or not.
We tried disabling the elevator while doing Raw IO with DB2
a couple of weeks ago. The database performance degraded much
more than expected. Disks were FC connected Tritons or SCSI
connected ServerRaid (or both?). Oracle often asks for a patch
to disable the elevator since they believe they can schedule IO
better. We didn't try with Oracle in this case, but DB2 and RAW
IO without an elevator was not a good choice.
gerrit
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 18:25 ` Gerrit Huizenga
@ 2002-05-11 20:17 ` Jens Axboe
2002-05-11 22:27 ` Gerrit Huizenga
0 siblings, 1 reply; 265+ messages in thread
From: Jens Axboe @ 2002-05-11 20:17 UTC (permalink / raw)
To: Gerrit Huizenga
Cc: Roy Sigurd Karlsbakk, Linus Torvalds, Lincoln Dale,
Andrew Morton, Alan Cox, Martin Dalecki, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
On Sat, May 11 2002, Gerrit Huizenga wrote:
> In message <20020511142434.GA1224@suse.de>, > : Jens Axboe writes:
> > On Sat, May 11 2002, Roy Sigurd Karlsbakk wrote:
> > > On Friday 10 May 2002 17:55, Linus Torvalds wrote:
> > > > On Fri, 10 May 2002, Lincoln Dale wrote:
> > > > > so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no
> > > > > O_DIRECT. anyone have any clues?
> > > >
> > > > Yes.
> > > >
> > > > O_DIRECT isn't doing any read-ahead.
> > > >
> > > > For O_DIRECT to be a win, you need to make it asynchronous.
> > >
> > > Will the use of O_DIRECT affect disk elevatoring?
> >
> > No, the I/O scheduler can't even tell whether it's being handed
> > O_DIRECT buffers or not.
>
> We tried disabling the elevator while doing Raw IO with DB2
> a couple of weeks ago. The database performance degraded much
I'm curious how you did this -- did you disable sorting and merging, or
just sorting? Merging is pretty essential to getting decent I/O speeds
in current kernels.
> more than expected. Disks were FC connected Tritons or SCSI
> connected ServerRaid (or both?). Oracle often asks for a patch
> to disable the elevator since they believe they can schedule IO
> better. We didn't try with Oracle in this case, but DB2 and RAW
> IO without an elevator was not a good choice.
Due to excessive queue scan times, lock contention, or just slight waste
of cycles?
--
Jens Axboe
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 20:17 ` Jens Axboe
@ 2002-05-11 22:27 ` Gerrit Huizenga
0 siblings, 0 replies; 265+ messages in thread
From: Gerrit Huizenga @ 2002-05-11 22:27 UTC (permalink / raw)
To: Jens Axboe
Cc: Roy Sigurd Karlsbakk, Linus Torvalds, Lincoln Dale,
Andrew Morton, Alan Cox, Martin Dalecki, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List, Jonathan Lahr
In message <20020511201742.GA1106@suse.de>, > : Jens Axboe writes:
> On Sat, May 11 2002, Gerrit Huizenga wrote:
> > In message <20020511142434.GA1224@suse.de>, > : Jens Axboe writes:
> > > On Sat, May 11 2002, Roy Sigurd Karlsbakk wrote:
> > > > On Friday 10 May 2002 17:55, Linus Torvalds wrote:
> > > > > On Fri, 10 May 2002, Lincoln Dale wrote:
> > > > > > so O_DIRECT in 2.4.18 still shows up as a 55% performance hit versus no
> > > > > > O_DIRECT. anyone have any clues?
> > > > >
> > > > > Yes.
> > > > >
> > > > > O_DIRECT isn't doing any read-ahead.
> > > > >
> > > > > For O_DIRECT to be a win, you need to make it asynchronous.
> > > >
> > > > Will the use of O_DIRECT affect disk elevatoring?
> > >
> > > No, the I/O scheduler can't even tell whether it's being handed
> > > O_DIRECT buffers or not.
> >
> > We tried disabling the elevator while doing Raw IO with DB2
> > a couple of weeks ago. The database performance degraded much
>
> I'm curious how you did this -- did you disable sorting and merging, or
> just sorting? Merging is pretty essential to getting decent I/O speeds
> in current kernels.
I believe sorting AND merging were turned off. BTW, this was 2.4 only,
our primary focus is getting product into people's hands this year. We
are hoping to play with the 2.5 IO scheduler, possibly in a few months.
> > more than expected. Disks were FC connected Tritons or SCSI
> > connected ServerRaid (or both?). Oracle often asks for a patch
> > to disable the elevator since they believe they can schedule IO
> > better. We didn't try with Oracle in this case, but DB2 and RAW
> > IO without an elevator was not a good choice.
>
> Due to excessive queue scan times, lock contention, or just slight waste
> of cycles?
A lot more interrupts on the RAID device, indicating a lot more
IOs, probably a direct result of disabling merging. Overall IO throughput
dropped pretty dramatically, reducing database throughput.
A good indication to gen a patch with just sorting turned off and
see where that gets us...
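To see why turning off merging multiplies the number of IOs, here is a toy model of back-merging with no sorting at all. Assumptions: struct io_request and merge_requests are invented for illustration and bear no relation to the real 2.4 elevator code, which also enforces limits such as a maximum request size.

```c
#include <stddef.h>

/* Toy request: a run of sectors (hypothetical, not the kernel's struct). */
struct io_request {
    unsigned long sector;       /* first sector */
    unsigned long nsectors;     /* length in sectors */
};

/*
 * Merge requests in place, in submission order, without sorting:
 * a request is folded into its predecessor only when it starts exactly
 * where the predecessor ends. Returns the number of requests left.
 */
size_t merge_requests(struct io_request *reqs, size_t n)
{
    size_t out = 0;

    if (n == 0)
        return 0;

    for (size_t i = 1; i < n; i++) {
        if (reqs[out].sector + reqs[out].nsectors == reqs[i].sector)
            reqs[out].nsectors += reqs[i].nsectors;   /* back merge */
        else
            reqs[++out] = reqs[i];                    /* start a new request */
    }
    return out + 1;
}
```

Five 8-sector requests at sectors 0, 8, 16, 100 and 108 collapse to two requests here; with merging disabled the device would see all five, each with its own interrupt.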
gerrit
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56)
2002-05-11 14:18 ` O_DIRECT performance impact on 2.4.18 (was: Re: [PATCH] 2.5.14 IDE 56) Roy Sigurd Karlsbakk
2002-05-11 14:24 ` Jens Axboe
@ 2002-05-11 23:17 ` Lincoln Dale
1 sibling, 0 replies; 265+ messages in thread
From: Lincoln Dale @ 2002-05-11 23:17 UTC (permalink / raw)
To: Roy Sigurd Karlsbakk; +Cc: linux-kernel
At 04:18 PM 11/05/2002 +0200, Roy Sigurd Karlsbakk wrote:
>Will the use of O_DIRECT affect disk elevatoring?
i believe the elevator is based on the 'block' layer and anything that goes
thru it. so the answer is that the requests would use the elevator.
for the test in question, i was doing sequential reads from the first block
of each disk until some block later on in the disk. (ie. a 2gbyte read or
18gbyte read).
given that was the case and the only i/o ops were 'read' operations,
elevator would make no difference here.
cheers,
lincoln.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-09 2:37 ` Lincoln Dale
2002-05-09 3:10 ` Andrew Morton
@ 2002-05-09 4:16 ` Andre Hedrick
2002-05-09 13:32 ` Alan Cox
2002-05-09 14:58 ` Alan Cox
2 siblings, 1 reply; 265+ messages in thread
From: Andre Hedrick @ 2002-05-09 4:16 UTC (permalink / raw)
To: Lincoln Dale
Cc: Alan Cox, Martin Dalecki, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
Lincoln,
You are right on the money!
There is about a 35-40% throughput killjoy in "copy-from-kernel-to-userspace".
It is easy to demo if you have a bus analyzer and can do accounting on the
data io less the command block overhead.
CR3's are your friend, not ...
On Thu, 9 May 2002, Lincoln Dale wrote:
> At 01:42 PM 8/05/2002 +0100, Alan Cox wrote:
> >The SCSI layer is significant overhead even in 2.5.
>
> i did some benchmarking on a high-end dual P3 Xeon (Serverworks chipset )
> with QLogic 2300 (2gbit/s) 64/66 Fibre Channel controllers.
>
> using the '/dev/sgX' interface to issue scsi reads/writes allowed me to hit
> the magical limit of 200mbyte/sec throughput. (basically just about
> linerate). (simultaneous "sg_read if=/dev/sgX mmap=1 bs=512 count=35M";
> sg_read from the sg-tools package)
>
> doing the same test thru the block-layer was basically capped at around
> 135mbyte/sec. (simultaneous "dd if=/dev/sdX of=/dev/null bs=512 count=35M").
>
> whether the bottleneck was copy-from-kernel-to-userspace (ie. exhaustion of
> Front-Side-Bus / memory bandwidth) or related to block-layer overhead and
> scsi layer overheads, i haven't yet validated, but a ~35% performance
> difference is relatively significant nonetheless.
>
> cpu utilization on the sg interface was under 10%. using 'dd' on the sd
> interface, both gigahertz P3 Xeons had 0% idle time.
>
>
> cheers,
>
> lincoln.
Andre Hedrick
LAD Storage Consulting Group
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-09 4:16 ` [PATCH] 2.5.14 IDE 56 Andre Hedrick
@ 2002-05-09 13:32 ` Alan Cox
0 siblings, 0 replies; 265+ messages in thread
From: Alan Cox @ 2002-05-09 13:32 UTC (permalink / raw)
To: Andre Hedrick
Cc: Lincoln Dale, Alan Cox, Martin Dalecki, Linus Torvalds,
Padraig Brady, Anton Altaparmakov, Kernel Mailing List
> You are right on the money!
> There is about a 35-40% throughput killjoy in "copy-from-kernel-to-userspace".
> It is easy to demo if you have a bus analyzer and can do accounting on the
> data io less the command block overhead.
>
> CR3's are your friend, not ...
You should be able to verify that by using large O_DIRECT I/O's. The
block layer itself may well be part of the overhead. With the scsi layer,
the block->scsi handling code is definitely a bottleneck to performance
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-09 2:37 ` Lincoln Dale
2002-05-09 3:10 ` Andrew Morton
2002-05-09 4:16 ` [PATCH] 2.5.14 IDE 56 Andre Hedrick
@ 2002-05-09 14:58 ` Alan Cox
2 siblings, 0 replies; 265+ messages in thread
From: Alan Cox @ 2002-05-09 14:58 UTC (permalink / raw)
To: Lincoln Dale
Cc: Alan Cox, Martin Dalecki, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
> doing the same test thru the block-layer was basically capped at around
> 135mbyte/sec. (simultaneous "dd if=/dev/sdX of=/dev/null bs=512 count=35M").
Tweak your dd to use O_DIRECT and use an O_DIRECT capable fs - that tells
you whether it's copy overhead or disk side stuff.
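A minimal sketch of such a test reader, assuming Linux with glibc: it reads a file and throws the data away, with or without O_DIRECT, so the two runs can be timed against each other. The name read_and_discard and the 4096-byte alignment are assumptions; the alignment a given device or filesystem actually requires may differ, and opening with O_DIRECT fails on filesystems that do not support it.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc only exposes O_DIRECT with this */
#endif
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

#define DIO_ALIGN 4096          /* assumed buffer alignment for O_DIRECT */
#define DIO_CHUNK (1 << 20)     /* read in large 1 MB requests */

/*
 * Read `path` to the end and discard the data, like dd of=/dev/null.
 * With use_direct != 0 the file is opened with O_DIRECT, bypassing the
 * page cache. Returns the number of bytes read, or -1 on error.
 */
long read_and_discard(const char *path, int use_direct)
{
    void *buf;
    long total = 0;
    ssize_t n;
    int fd;

    fd = open(path, O_RDONLY | (use_direct ? O_DIRECT : 0));
    if (fd < 0)
        return -1;

    /* O_DIRECT needs an aligned buffer; malloc() gives no such guarantee. */
    if (posix_memalign(&buf, DIO_ALIGN, DIO_CHUNK) != 0) {
        close(fd);
        return -1;
    }

    while ((n = read(fd, buf, DIO_CHUNK)) > 0)
        total += n;

    free(buf);
    close(fd);
    return n < 0 ? -1 : total;
}
```

If the O_DIRECT run is much faster than the buffered run on the same device, the copy to user space is the likely cost; if both are slow, the overhead sits further down in the block or scsi layers.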
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 12:18 ` Alan Cox
2002-05-08 11:09 ` Martin Dalecki
@ 2002-05-08 18:21 ` Erik Andersen
2002-05-08 18:59 ` Dave Jones
2002-05-08 19:31 ` Alan Cox
1 sibling, 2 replies; 265+ messages in thread
From: Erik Andersen @ 2002-05-08 18:21 UTC (permalink / raw)
To: Alan Cox
Cc: Martin Dalecki, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
On Wed May 08, 2002 at 01:18:47PM +0100, Alan Cox wrote:
> I can't speak directly for the Kudzu maintainer but I can say that having
> a sane way to obtain the list of ide devices (all of them not just non
> pcmcia) and the device bindings/type has been a long standing request.
Can't one simply do something like:
    char device_string[20];
    int i, type, major=0, minor=0;
    for(i=0; i<26; i++) {
        snprintf(device_string, sizeof(device_string), "/dev/hd%c", 'a'+i);
        if ((fd=open(device_string, O_RDONLY | O_NONBLOCK)) < 0) {
            continue;
        }
        switch ('a'+i) {
            case 'a':
                major=3;minor=0;
                break;
            case 'b':
                major=3;minor=64;
                break;
            case 'c':
                major=22;minor=0;
                break;
            case 'd':
                major=22;minor=64;
                break;
            .....
        }
etc.... to detect the available ide devices without groveling
through /proc/ide?
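
One way to avoid the hard-coded switch, sketched under the assumption that the /dev nodes exist and were created with the right numbers: ask stat() for the device number instead of deriving it from the name. dev_numbers is a hypothetical helper, not something from the thread. It only reads the inode, so it says nothing about whether a drive is actually attached.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>     /* major(), minor() on glibc */
#include <sys/types.h>

/*
 * Hypothetical helper: fetch the major/minor numbers recorded in a device
 * node. Returns 0 on success, -1 if path cannot be stat()ed or is not a
 * block or character device.
 */
int dev_numbers(const char *path, unsigned int *maj, unsigned int *min)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    if (!S_ISBLK(st.st_mode) && !S_ISCHR(st.st_mode))
        return -1;

    *maj = major(st.st_rdev);  /* st_rdev holds the node's device number */
    *min = minor(st.st_rdev);
    return 0;
}
```

For a conventionally created /dev/hda this would report 3 and 0, matching the first case in the switch above.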
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 18:21 ` Erik Andersen
@ 2002-05-08 18:59 ` Dave Jones
2002-05-08 19:31 ` Alan Cox
1 sibling, 0 replies; 265+ messages in thread
From: Dave Jones @ 2002-05-08 18:59 UTC (permalink / raw)
To: Erik Andersen, Alan Cox, Martin Dalecki, Linus Torvalds,
Padraig Brady, Anton Altaparmakov, Kernel Mailing List
On Wed, May 08, 2002 at 12:21:39PM -0600, Erik Andersen wrote:
> if ((fd=open(device_string, O_RDONLY | O_NONBLOCK)) < 0) {
...
> etc.... to detect the available ide devices without groveling
> through /proc/ide?
This goes splat with removable IDE devices like ZIP drives etc.
They fail to open() unless you put a disk in them.
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 18:21 ` Erik Andersen
2002-05-08 18:59 ` Dave Jones
@ 2002-05-08 19:31 ` Alan Cox
2002-05-08 21:16 ` Erik Andersen
1 sibling, 1 reply; 265+ messages in thread
From: Alan Cox @ 2002-05-08 19:31 UTC (permalink / raw)
To: andersen
Cc: Alan Cox, Martin Dalecki, Linus Torvalds, Padraig Brady,
Anton Altaparmakov, Kernel Mailing List
> int i, type, major=0, minor=0;
> for(i=0; i<26; i++) {
> snprintf(device_string, sizeof(device_string), "/dev/hd%c", 'a'+i);
> if ((fd=open(device_string, O_RDONLY | O_NONBLOCK)) < 0) {
> continue;
> }
If it opened, is it there? Suppose it's an IDE floppy and no media is
present. Maybe it's hiding in ide-scsi instead. It ends up being detective
work. The /device setup makes it explicit and clean.
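
To illustrate the detective work: a sketch that opens a node the way the quoted loop does and then tries to classify the failure from errno. The enum, the function name probe_node and the errno mapping are assumptions made for illustration. Which error a given driver returns for an empty removable drive is not stated in the thread, so the PROBE_NO_MEDIUM case in particular is a guess based on the generic Linux ENOMEDIUM code.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

enum probe_result {
    PROBE_OPENED,      /* open() succeeded */
    PROBE_NO_NODE,     /* the /dev entry itself is missing */
    PROBE_NO_DEVICE,   /* node exists, no driver or hardware behind it */
    PROBE_NO_MEDIUM,   /* drive present but empty (assumed mapping) */
    PROBE_UNKNOWN      /* anything else, e.g. permission problems */
};

/* Hypothetical helper: classify the outcome of a non-blocking open. */
enum probe_result probe_node(const char *path)
{
    int fd = open(path, O_RDONLY | O_NONBLOCK);

    if (fd >= 0) {
        close(fd);
        return PROBE_OPENED;
    }

    switch (errno) {
    case ENOENT:
        return PROBE_NO_NODE;
    case ENXIO:
    case ENODEV:
        return PROBE_NO_DEVICE;
#ifdef ENOMEDIUM
    case ENOMEDIUM:
        return PROBE_NO_MEDIUM;
#endif
    default:
        return PROBE_UNKNOWN;
    }
}
```

Even with this classification, a failed open cannot by itself separate "no drive" from "drive the driver refuses to open without media", which is the ambiguity raised above.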
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 19:31 ` Alan Cox
@ 2002-05-08 21:16 ` Erik Andersen
2002-05-08 22:14 ` Alan Cox
0 siblings, 1 reply; 265+ messages in thread
From: Erik Andersen @ 2002-05-08 21:16 UTC (permalink / raw)
To: Alan Cox; +Cc: Kernel Mailing List
On Wed May 08, 2002 at 08:31:11PM +0100, Alan Cox wrote:
> > int i, type, major=0, minor=0;
> > for(i=0; i<26; i++) {
> > snprintf(device_string, sizeof(device_string), "/dev/hd%c", 'a'+i);
> > if ((fd=open(device_string, O_RDONLY | O_NONBLOCK)) < 0) {
> > continue;
> > }
>
> If it opened, is it there? Suppose it's an IDE floppy and no media is
> present. Maybe it's hiding in ide-scsi instead. It ends up being detective
> work.
That suggests to me that IDE floppy needs to be fixed to open
even when no media is present when provided with the O_NONBLOCK
flag, which would be consistent with how CDROMs and everything
SCSI works.
As for ide-scsi, I thought that was going to go away?
> work. The /device setup makes it explicit and clean
Agreed. But I don't expect to see that showing up soon in 2.4.x,
which is what most people (like me) will be using for the next
year or two. Sure, in 2.5.x it might work, but it might eat your
disk too. So is groping about in /proc/ide the only way to get
reliable IDE device detection for 2.4.x, or is there some other
way?
-Erik
--
Erik B. Andersen http://codepoet-consulting.com/
--This message was written using 73% post-consumer electrons--
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 21:16 ` Erik Andersen
@ 2002-05-08 22:14 ` Alan Cox
0 siblings, 0 replies; 265+ messages in thread
From: Alan Cox @ 2002-05-08 22:14 UTC (permalink / raw)
To: andersen; +Cc: Alan Cox, Kernel Mailing List
> That suggests to me that IDE floppy needs to be fixed to open
> even when no media is present when provided with the O_NONBLOCK
> flag, which would be consistent with how CDROMs and everything
> SCSI works.
>
> As for ide-scsi, I thought that was going to go away?
If we move to a more general device tree and you can set the mode you wish
to use when accessing the device, it can. Right now the "mode", so to
speak, is implied by the file you open rather than by how you send it
commands and what you ask it to do.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 7:58 ` Martin Dalecki
2002-05-08 12:18 ` Alan Cox
@ 2002-05-09 13:13 ` Pavel Machek
2002-05-09 19:22 ` Daniel Jacobowitz
2002-05-10 12:01 ` Padraig Brady
2 siblings, 1 reply; 265+ messages in thread
From: Pavel Machek @ 2002-05-09 13:13 UTC (permalink / raw)
To: Martin Dalecki
Cc: Linus Torvalds, Alan Cox, Padraig Brady, Anton Altaparmakov,
Kernel Mailing List
Hi!
> BTW. If one needs the size of the disk well we could
> attach it as a file size to the device file in /dev IMHO. Why not?
Seems like a good idea. (I don't know how happy du is going to be with that. OTOH,
if du is not happy, we should fix it not to count block devices...)
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-09 13:13 ` Pavel Machek
@ 2002-05-09 19:22 ` Daniel Jacobowitz
0 siblings, 0 replies; 265+ messages in thread
From: Daniel Jacobowitz @ 2002-05-09 19:22 UTC (permalink / raw)
To: Kernel Mailing List
On Thu, May 09, 2002 at 01:13:42PM +0000, Pavel Machek wrote:
> Hi!
>
> > BTW. If one needs the size of the disk well we could
> > attach it as a file size to the device file in /dev IMHO. Why not?
>
> Seems like a good idea. (I don't know how happy du is going to be with that. OTOH,
> if du is not happy, we should fix it not to count block devices...)
> Pavel
The number /usr/bin/du shows is the block usage, not the logical size;
you could use the logical size for this...
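
The distinction can be shown with a sparse file. The sketch below is illustrative only (size_and_usage and sparse_demo are hypothetical names): st_size is the logical size, while st_blocks, counted in 512-byte units on Linux, is what a du-style tool adds up. Reporting a disk's capacity as the st_size of its device node would therefore leave block-usage totals alone.

```c
#define _POSIX_C_SOURCE 200809L   /* for mkstemp() and ftruncate() */
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Logical size (what ls -l shows) and allocated bytes (what du counts). */
int size_and_usage(int fd, long long *logical, long long *allocated)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    *logical = (long long)st.st_size;
    *allocated = (long long)st.st_blocks * 512;  /* 512-byte units on Linux */
    return 0;
}

/* Create a 1 MiB file consisting only of a hole, then report both numbers. */
int sparse_demo(long long *logical, long long *allocated)
{
    char name[] = "/tmp/sparse-demo-XXXXXX";
    int fd = mkstemp(name);
    int rc;

    if (fd < 0)
        return -1;
    unlink(name);                 /* file disappears once fd is closed */

    rc = ftruncate(fd, 1 << 20);  /* extend without writing any data */
    if (rc == 0)
        rc = size_and_usage(fd, logical, allocated);
    close(fd);
    return rc;
}
```

On typical Linux filesystems the logical size comes back as 1048576 while the allocated figure stays far below it, since no data blocks were written.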
--
Daniel Jacobowitz Carnegie Mellon University
MontaVista Software Debian GNU/Linux Developer
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 7:58 ` Martin Dalecki
2002-05-08 12:18 ` Alan Cox
2002-05-09 13:13 ` Pavel Machek
@ 2002-05-10 12:01 ` Padraig Brady
2 siblings, 0 replies; 265+ messages in thread
From: Padraig Brady @ 2002-05-10 12:01 UTC (permalink / raw)
To: Martin Dalecki
Cc: Linus Torvalds, Alan Cox, Anton Altaparmakov, Kernel Mailing List
Martin Dalecki wrote:
> Linus Torvalds wrote:
>
>> But there is definitely a potential backwards-compatibility-issue.
>
>
> Linus - there are no backward compatibility issues here.
> No single application from my system does mess with /proc/ide.
> They showed you a list of programs which use /proc and not a list
> of programs which use anything out of /proc/ide...
To be thorough, I grepped for /proc/ide, not just /proc;
the exact command was:
find /sbin /usr/sbin /bin /usr/bin /lib /usr/lib /usr/bin/X11/ -xdev
-perm +111 | xargs grep -l /proc/ide 2>/dev/null
Padraig.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 17:00 ` Linus Torvalds
2002-05-07 17:19 ` benh
2002-05-08 7:58 ` Martin Dalecki
@ 2002-05-09 13:18 ` Pavel Machek
2 siblings, 0 replies; 265+ messages in thread
From: Pavel Machek @ 2002-05-09 13:18 UTC (permalink / raw)
To: Linus Torvalds
Cc: Alan Cox, Padraig Brady, Anton Altaparmakov, Martin Dalecki,
Kernel Mailing List
Hi!
> /driverfs/root/pci0/00:1f.4/usb_bus/000/
>
> and it wouldn't be impossible (or even necessarily very hard) to make an
> IDE controller export the "IDE device tree" the same way a USB controller
> now exports the "USB device tree".
Look harder, it should be already there.
Pavel
--
Philips Velo 1: 1"x4"x8", 300gram, 60, 12MB, 40bogomips, linux, mutt,
details at http://atrey.karlin.mff.cuni.cz/~pavel/velo/index.html.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 16:29 ` Padraig Brady
2002-05-07 16:51 ` Linus Torvalds
2002-05-07 17:08 ` Alan Cox
@ 2002-05-07 17:10 ` Richard B. Johnson
2002-05-08 7:36 ` Martin Dalecki
3 siblings, 0 replies; 265+ messages in thread
From: Richard B. Johnson @ 2002-05-07 17:10 UTC (permalink / raw)
To: Padraig Brady; +Cc: Anton Altaparmakov, Martin Dalecki, Kernel Mailing List
On Tue, 7 May 2002, Padraig Brady wrote:
> Linus Torvalds wrote:
> > [ First off: any IDE-only thing that doesn't work for SCSI or other disks
> > doesn't solve a generic problem, so the complaint that some generic
> > tools might use it is totally invalid. ]
> >
> > On Tue, 7 May 2002, Anton Altaparmakov wrote:
> >
> >>Linux's power is exactly that it can be used on anything from a wristwatch
> >>to a huge server and that it is flexible about everything. You are breaking
> >>this flexibility for no apparent reason. (I don't accept "I can't cope with
> >>this so I remove it." as a reason, sorry).
> >
> >
> > Run the 57 patch, and complain if something doesn't work.
> >
> > Linux's power is that we FIX stuff. That we make it the best system
> > possible, and that we don't just whine and argue about things.
> >
> >
> >>As the new IDE maintainer so far we have only seen you removing one
> >>feature after the other in the name of cleanup, without adequate or even
> >>any at all(!) replacements,
> >
> >
> > Who cares? Have you found _anything_ that Martin removed that was at all
> > worthwhile? I sure haven't.
> >
> > Guys, you have to realize that the IDE layer has eight YEARS of absolute
> > crap in it. Seriously. It's _never_ been cleaned up before. It has stuff
> > so distasteful that it's scary.
> >
> > Take it from me: it's a _lot_ easier to add cruft and crap on top of clean
> > code. You can do it yourself if you want to. You don't need a maintainer
> > to add barnacles.
> >
> > All the information that /proc/ide gave you is basically available in
> > hdparm, and for your dear embedded system it apparently takes up less
> > space by being in user space. So what is the problem?
>
> Well my "dear" embedded system doesn't have libc :-(
> So 35664 saved in kernel (less on disk), requires 25212
> extra for hdparm + more for static linked uclibc (hope
> it works ;-)). As a side note if this happens hdparm would
> be a requirement for busybox IMHO, anyway getting back on topic...
Link your embedded stuff against a stripped-down shared libc...
-rwxr-xr-x 1 root root 876 Apr 26 13:08 crt1.o
-rwxr-xr-x 1 root root 160824 Feb 25 13:30 ld-linux.so.2
-rwxr-xr-x 1 root root 160824 Apr 30 11:31 ld.so
-rwxr-xr-x 1 root root 2376745 Feb 25 13:29 libc.so.6
-rwxr-xr-x 1 root root 368551 Feb 25 13:29 libm.so.6
This does most everything an embedded system needs. You can extract
the objects from a shared object file (copy), remove the ones you
obviously don't need, make a new shared object file and link. Keep
adding objects until you don't have any more unresolved symbols.
`ld` allows you to link to whatever you need. I put my special
'libc' plus another private shared library in /opt/lib. On the
target machine, /opt/lib is a sym-link to /lib.
LPATH=/opt/lib
ELINK=-rpath-link $(LPATH) \
      -rpath $(LPATH) \
      -L $(LPATH) -m elf_i386 \
      -dynamic-linker \
      $(LPATH)/ld-linux.so.2 \
      $(LPATH)/crt1.o \
      $(LPATH)/crtendS.o \
      $(LPATH)/libc.so.6 \
      $(LPATH)/libm.so.6

program: program.o
	ld -o program program.o $(ELINK)
Cheers,
Dick Johnson
Penguin : Linux version 2.4.18 on an i686 machine (797.90 BogoMips).
Windows-2000/Professional isn't.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 16:29 ` Padraig Brady
` (2 preceding siblings ...)
2002-05-07 17:10 ` Richard B. Johnson
@ 2002-05-08 7:36 ` Martin Dalecki
2002-05-08 17:22 ` Greg KH
3 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 7:36 UTC (permalink / raw)
To: Padraig Brady; +Cc: Linus Torvalds, Anton Altaparmakov, Kernel Mailing List
Padraig Brady wrote:
> Linus Torvalds wrote:
>
>> [ First off: any IDE-only thing that doesn't work for SCSI or other
>> disks
>> doesn't solve a generic problem, so the complaint that some generic
>> tools might use it is totally invalid. ]
>>
>> On Tue, 7 May 2002, Anton Altaparmakov wrote:
>>
>>> Linux's power is exactly that it can be used on anything from a
>>> wristwatch
>>> to a huge server and that it is flexible about everything. You are
>>> breaking
>>> this flexibility for no apparent reason. (I don't accept "I can't
>>> cope with
>>> this so I remove it." as a reason, sorry).
>>
>>
>>
>> Run the 57 patch, and complain if something doesn't work.
>>
>> Linux's power is that we FIX stuff. That we make it the best system
>> possible, and that we don't just whine and argue about things.
>>
>>
>>> As the new IDE maintainer so far we have only seen you removing one
>>> feature after the other in the name of cleanup, without adequate or even
>>> any at all(!) replacements,
>>
>>
>>
>> Who cares? Have you found _anything_ that Martin removed that was at all
>> worthwhile? I sure haven't.
>>
>> Guys, you have to realize that the IDE layer has eight YEARS of absolute
>> crap in it. Seriously. It's _never_ been cleaned up before. It has stuff
>> so distasteful that it's scary.
>>
>> Take it from me: it's a _lot_ easier to add cruft and crap on top of
>> clean
>> code. You can do it yourself if you want to. You don't need a maintainer
>> to add barnacles.
>>
>> All the information that /proc/ide gave you is basically available in
>> hdparm, and for your dear embedded system it apparently takes up less
>> space by being in user space. So what is the problem?
>
>
> Well my "dear" embedded system doesn't have libc :-(
> So 35664 saved in kernel (less on disk), requires 25212
> extra for hdparm + more for static linked uclibc (hope
> it works ;-)). As a side note if this happens hdparm would
> be a requirement for busybox IMHO, anyway getting back on topic...
>
> All the info I've ever needed is /proc/ide/hdx/capacity
> which I could get from /proc/partitions with a bit
> more effort, so I vote for removing /proc/ide.
>
> I think everyone realises Martin is doing great and much needed work
> on IDE (btw I'll have those flash support patches soon Martin ;-)),
> but I did think this change needed debate. In general I know it's a
> hard decision what to export in proc, especially if there are
> existing dependencies, a few already mentioned possibles in RH7.1:
>
> /sbin/mkinitrd
> /sbin/fdisk
> /sbin/sfdisk
> /sbin/sndconfig
> /usr/sbin/mouseconfig
> /usr/sbin/kudzu
> /usr/sbin/module_upgrade
> /usr/sbin/updfstab
> /usr/sbin/glidelink
> /usr/sbin/sndconfig
> /usr/lib/python1.5/site-packages/_kudzumodule.so
> /usr/bin/X11/Xconfigurator
>
> E.g., could the same argument be made for an lspci-only
> interface to PCI info rather than /proc/bus/pci? The following
> references are made to /proc/bus/pci on my system:
Especially in light of the fact that we have a device tree filesystem, I
rather think that /proc/bus/pci is indeed obsolete.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-08 7:36 ` Martin Dalecki
@ 2002-05-08 17:22 ` Greg KH
0 siblings, 0 replies; 265+ messages in thread
From: Greg KH @ 2002-05-08 17:22 UTC (permalink / raw)
To: Martin Dalecki
Cc: Padraig Brady, Linus Torvalds, Anton Altaparmakov, Kernel Mailing List
On Wed, May 08, 2002 at 09:36:27AM +0200, Martin Dalecki wrote:
> >
> >E.g., could the same argument be made for an lspci-only
> >interface to PCI info rather than /proc/bus/pci? The following
> >references are made to /proc/bus/pci on my system:
>
> Especially in light of the fact that we have a device tree filesystem, I
> rather think that /proc/bus/pci is indeed obsolete.
Not quite yet. I considered moving the functionality of /proc/bus/pci
into driverfs, but couldn't find a good solid reason to do it (and it
would involve changing lspci and any other userspace programs that use
it today.)
Now reimplementing /proc/bus/pci as a stand alone filesystem mounted in
that position (like usbfs is) is another story. pcifs anyone? :)
thanks,
greg k-h
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 56
2002-05-07 11:22 ` [PATCH] 2.5.14 IDE 56 Martin Dalecki
2002-05-07 14:02 ` Padraig Brady
@ 2002-05-08 18:46 ` Denis Vlasenko
1 sibling, 0 replies; 265+ messages in thread
From: Denis Vlasenko @ 2002-05-08 18:46 UTC (permalink / raw)
To: Martin Dalecki, Linus Torvalds; +Cc: Kernel Mailing List
On 7 May 2002 09:22, Martin Dalecki wrote:
> Mon May 6 13:29:44 CEST 2002 ide-clean-56
+ printk("%s: reset timed-out, status=0x%02x\n", ch->name, stat);
"timed out" (no dash)
--
vda
^ permalink raw reply [flat|nested] 265+ messages in thread
* [PATCH] 2.5.14 IDE 57
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (3 preceding siblings ...)
2002-05-07 11:22 ` [PATCH] 2.5.14 IDE 56 Martin Dalecki
@ 2002-05-07 11:27 ` Martin Dalecki
2002-05-07 13:16 ` Anton Altaparmakov
2002-05-11 14:09 ` Aaron Lehmann
2002-05-07 15:03 ` [PATCH] IDE 58 Martin Dalecki
` (7 subsequent siblings)
12 siblings, 2 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 11:27 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1361 bytes --]
Tue May 7 02:37:49 CEST 2002 ide-clean-57
Nuke /proc/ide. For explanations why, please see the frustrated comments in the
previous change log. If anyone still doesn't see why it wasn't a good thing,
well, please just take a look at the following:
Kernel size before:
/usr/src/linux# size vmlinux
   text    data     bss     dec     hex filename
1716049  403968  470252 2590269  27863d vmlinux
/usr/src/linux#
Kernel size after:
/usr/src/linux# size vmlinux
   text    data     bss     dec     hex filename
1680993  403488  470124 2554605  26faed vmlinux
/usr/src/linux#
2% of overall size! And this is not exactly a minimalistic setup.
Wow! What a waste of space!!!! Not even counting the runtime size
of this crap! And then let's take a look at the following
self-flattery:
-/*
- * Copyright (C) 1997-1998 Mark Lord
- *
- * This is the /proc/ide/ filesystem implementation.
- *
- * The major reason this exists is to provide sufficient access
- * to driver and config data, such that user-mode programs can
- * be developed to handle chipset tuning for most PCI interfaces.
- * This should provide better utilities, and less kernel bloat.
                                              ^^^^^^^^^^^^^^^^^^
Well there could only be an answer to this which would be
universally understandable in every Slavic language... but since
it's mothers day...
EOD.
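
Working the quoted `size` figures through, as a small sketch (the struct and function names are invented for illustration): the text segment shrinks by 35056 bytes, about 2.0%, and the whole image (text + data + bss) by 35664 bytes, about 1.4%. The latter is the same 35664 figure that comes up elsewhere in the thread.

```c
#include <stdio.h>

/* One row of `size` output, without the derived dec/hex columns. */
struct image_size {
    long text, data, bss;
};

/* The "dec" column: sum of all three segments. */
long image_total(struct image_size s)
{
    return s.text + s.data + s.bss;
}

/* Reduction from before to after, as a percentage of before. */
double shrink_percent(long before, long after)
{
    return 100.0 * (double)(before - after) / (double)before;
}

/* Print the savings for the text segment and for the whole image. */
void print_savings(struct image_size before, struct image_size after)
{
    long tb = image_total(before), ta = image_total(after);

    printf("text:  %ld bytes saved (%.2f%%)\n",
           before.text - after.text,
           shrink_percent(before.text, after.text));
    printf("total: %ld bytes saved (%.2f%%)\n",
           tb - ta, shrink_percent(tb, ta));
}
```

Calling print_savings with {1716049, 403968, 470252} and {1680993, 403488, 470124} reproduces the comparison; the "2%" in the message corresponds to the text segment rather than to the image as a whole.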
[-- Attachment #2: ide-clean-57.diff --]
[-- Type: text/plain, Size: 29987 bytes --]
diff -urN linux-2.5.14/drivers/ide/aec62xx.c linux/drivers/ide/aec62xx.c
--- linux-2.5.14/drivers/ide/aec62xx.c 2002-05-06 05:37:59.000000000 +0200
+++ linux/drivers/ide/aec62xx.c 2002-05-07 03:21:35.000000000 +0200
@@ -26,7 +26,7 @@
#include "ata-timing.h"
-#define DISPLAY_AEC62XX_TIMINGS
+#undef DISPLAY_AEC62XX_TIMINGS
#ifndef HIGH_4
#define HIGH_4(H) ((H)=(H>>4))
@@ -503,7 +503,7 @@
bmide_dev = dev;
aec62xx_display_info = &aec62xx_get_info;
}
-#endif /* DISPLAY_AEC62XX_TIMINGS && CONFIG_PROC_FS */
+#endif
return dev->irq;
}
diff -urN linux-2.5.14/drivers/ide/alim15x3.c linux/drivers/ide/alim15x3.c
--- linux-2.5.14/drivers/ide/alim15x3.c 2002-05-06 05:38:04.000000000 +0200
+++ linux/drivers/ide/alim15x3.c 2002-05-07 03:22:24.000000000 +0200
@@ -28,7 +28,7 @@
#include "ata-timing.h"
-#define DISPLAY_ALI_TIMINGS
+#undef DISPLAY_ALI_TIMINGS
#if defined(DISPLAY_ALI_TIMINGS) && defined(CONFIG_PROC_FS)
#include <linux/stat.h>
diff -urN linux-2.5.14/drivers/ide/amd74xx.c linux/drivers/ide/amd74xx.c
--- linux-2.5.14/drivers/ide/amd74xx.c 2002-05-06 05:38:03.000000000 +0200
+++ linux/drivers/ide/amd74xx.c 2002-05-07 03:23:35.000000000 +0200
@@ -94,7 +94,7 @@
* AMD /proc entry.
*/
-#ifdef CONFIG_PROC_FS
+#if 0 && defined(CONFIG_PROC_FS)
#include <linux/stat.h>
#include <linux/proc_fs.h>
@@ -384,7 +384,7 @@
* Register /proc/ide/amd74xx entry
*/
-#ifdef CONFIG_PROC_FS
+#if 0 && defined(CONFIG_PROC_FS)
if (!amd74xx_proc) {
amd_base = pci_resource_start(dev, 4);
bmide_dev = dev;
diff -urN linux-2.5.14/drivers/ide/cmd64x.c linux/drivers/ide/cmd64x.c
--- linux-2.5.14/drivers/ide/cmd64x.c 2002-05-06 05:38:00.000000000 +0200
+++ linux/drivers/ide/cmd64x.c 2002-05-07 03:24:08.000000000 +0200
@@ -79,7 +79,7 @@
#define UDIDETCR1 0x7B
#define DTPR1 0x7C
-#define DISPLAY_CMD64X_TIMINGS
+#undef DISPLAY_CMD64X_TIMINGS
#if defined(DISPLAY_CMD64X_TIMINGS) && defined(CONFIG_PROC_FS)
#include <linux/stat.h>
diff -urN linux-2.5.14/drivers/ide/cs5530.c linux/drivers/ide/cs5530.c
--- linux-2.5.14/drivers/ide/cs5530.c 2002-05-06 05:38:04.000000000 +0200
+++ linux/drivers/ide/cs5530.c 2002-05-07 03:24:29.000000000 +0200
@@ -29,7 +29,7 @@
#include "ata-timing.h"
-#define DISPLAY_CS5530_TIMINGS
+#undef DISPLAY_CS5530_TIMINGS
#if defined(DISPLAY_CS5530_TIMINGS) && defined(CONFIG_PROC_FS)
#include <linux/stat.h>
diff -urN linux-2.5.14/drivers/ide/hpt366.c linux/drivers/ide/hpt366.c
--- linux-2.5.14/drivers/ide/hpt366.c 2002-05-07 02:36:37.000000000 +0200
+++ linux/drivers/ide/hpt366.c 2002-05-07 03:25:23.000000000 +0200
@@ -65,7 +65,7 @@
#include "ata-timing.h"
-#define DISPLAY_HPT366_TIMINGS
+#undef DISPLAY_HPT366_TIMINGS
/* various tuning parameters */
#define HPT_RESET_STATE_ENGINE
diff -urN linux-2.5.14/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.14/drivers/ide/ide.c 2002-05-07 03:47:14.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-07 03:17:41.000000000 +0200
@@ -1921,12 +1921,6 @@
return 0;
}
-#ifdef CONFIG_PROC_FS
-ide_proc_entry_t generic_subdriver_entries[] = {
- { NULL, 0, NULL, NULL }
-};
-#endif
-
void ide_unregister(struct ata_channel *ch)
{
struct gendisk *gd;
@@ -1983,9 +1977,6 @@
}
}
}
-#ifdef CONFIG_PROC_FS
- destroy_proc_ide_drives(ch);
-#endif
spin_lock_irqsave(&ide_lock, flags);
/*
@@ -2208,9 +2199,6 @@
if (!initializing) {
ideprobe_init();
revalidate_drives();
-#ifdef CONFIG_PROC_FS
- create_proc_ide_interfaces();
-#endif
/* FIXME: Do we really have to call it second time here?! */
ide_driver_module();
}
@@ -3198,11 +3186,7 @@
}
drive->revalidate = 1;
drive->suspend_reset = 0;
-#ifdef CONFIG_PROC_FS
- ide_add_proc_entries(drive->proc, generic_subdriver_entries, drive);
- if (ata_ops(drive))
- ide_add_proc_entries(drive->proc, ata_ops(drive)->proc, drive);
-#endif
+
return 0;
}
@@ -3233,11 +3217,6 @@
#if defined(CONFIG_BLK_DEV_ISAPNP) && defined(CONFIG_ISAPNP) && defined(MODULE)
pnpide_init(0);
#endif
-#ifdef CONFIG_PROC_FS
- if (ata_ops(drive))
- ide_remove_proc_entries(drive->proc, ata_ops(drive)->proc);
- ide_remove_proc_entries(drive->proc, generic_subdriver_entries);
-#endif
auto_remove_settings(drive);
drive->driver = NULL;
drive->present = 0;
@@ -3315,10 +3294,6 @@
EXPORT_SYMBOL(ide_cmd);
EXPORT_SYMBOL(ide_delay_50ms);
EXPORT_SYMBOL(ide_stall_queue);
-#ifdef CONFIG_PROC_FS
-EXPORT_SYMBOL(ide_add_proc_entries);
-EXPORT_SYMBOL(ide_remove_proc_entries);
-#endif
EXPORT_SYMBOL(ide_add_setting);
EXPORT_SYMBOL(ide_remove_setting);
@@ -3484,10 +3459,6 @@
# endif
#endif
-#ifdef CONFIG_PROC_FS
- proc_ide_create();
-#endif
-
/*
* Initialize all device type driver modules.
*/
@@ -3553,9 +3524,6 @@
ide_unregister(&ide_hwifs[h]);
}
-# ifdef CONFIG_PROC_FS
- proc_ide_destroy();
-# endif
devfs_unregister(ide_devfs_handle);
}
diff -urN linux-2.5.14/drivers/ide/ide-cd.c linux/drivers/ide/ide-cd.c
--- linux-2.5.14/drivers/ide/ide-cd.c 2002-05-06 05:38:01.000000000 +0200
+++ linux/drivers/ide/ide-cd.c 2002-05-07 03:34:22.000000000 +0200
@@ -2906,7 +2906,6 @@
check_media_change: ide_cdrom_check_media_change,
revalidate: ide_cdrom_revalidate,
capacity: ide_cdrom_capacity,
- proc: NULL
};
/* options */
diff -urN linux-2.5.14/drivers/ide/ide-disk.c linux/drivers/ide/ide-disk.c
--- linux-2.5.14/drivers/ide/ide-disk.c 2002-05-07 03:47:14.000000000 +0200
+++ linux/drivers/ide/ide-disk.c 2002-05-07 03:17:38.000000000 +0200
@@ -419,68 +419,6 @@
return drive->capacity - drive->sect0;
}
-#ifdef CONFIG_PROC_FS
-
-#ifdef CONFIG_BLK_DEV_IDE_TCQ
-static int proc_idedisk_read_tcq
- (char *page, char **start, off_t off, int count, int *eof, void *data)
-{
- struct ata_device *drive = (struct ata_device *) data;
- char *out = page;
- int len, cmds, i;
- unsigned long flags;
-
- if (!blk_queue_tagged(&drive->queue)) {
- len = sprintf(out, "not configured\n");
- PROC_IDE_READ_RETURN(page, start, off, count, eof, len);
- }
-
- spin_lock_irqsave(&ide_lock, flags);
-
- len = sprintf(out, "TCQ currently on:\t%s\n", drive->using_tcq ? "yes" : "no");
- len += sprintf(out+len, "Max queue depth:\t%d\n",drive->queue_depth);
- len += sprintf(out+len, "Max achieved depth:\t%d\n",drive->max_depth);
- len += sprintf(out+len, "Max depth since last:\t%d\n",drive->max_last_depth);
- len += sprintf(out+len, "Current depth:\t\t%d\n", ata_pending_commands(drive));
- len += sprintf(out+len, "Active tags:\t\t[ ");
- for (i = 0, cmds = 0; i < drive->queue_depth; i++) {
- struct request *rq = blk_queue_tag_request(&drive->queue, i);
-
- if (!rq)
- continue;
-
- len += sprintf(out+len, "%d, ", i);
- cmds++;
- }
- len += sprintf(out+len, "]\n");
-
- len += sprintf(out+len, "Queue:\t\t\treleased [ %lu ] - started [ %lu ]\n", drive->immed_rel, drive->immed_comp);
-
- if (ata_pending_commands(drive) != cmds)
- len += sprintf(out+len, "pending request and queue count mismatch (counted: %d)\n", cmds);
-
- len += sprintf(out+len, "DMA status:\t\t%srunning\n", test_bit(IDE_DMA, &HWGROUP(drive)->flags) ? "" : "not ");
-
- drive->max_last_depth = 0;
-
- spin_unlock_irqrestore(&ide_lock, flags);
- PROC_IDE_READ_RETURN(page, start, off, count, eof, len);
-}
-#endif
-
-static ide_proc_entry_t idedisk_proc[] = {
-#ifdef CONFIG_BLK_DEV_IDE_TCQ
- { "tcq", S_IFREG|S_IRUSR, proc_idedisk_read_tcq, NULL },
-#endif
- { NULL, 0, NULL, NULL }
-};
-
-#else
-
-# define idedisk_proc NULL
-
-#endif
-
/*
* This is tightly woven into the driver->special can not touch.
* DON'T do it again until a total personality rewrite is committed.
@@ -1099,7 +1037,6 @@
check_media_change: idedisk_check_media_change,
revalidate: NULL, /* use default method */
capacity: idedisk_capacity,
- proc: idedisk_proc
};
MODULE_DESCRIPTION("ATA DISK Driver");
@@ -1116,10 +1053,6 @@
}
/* We must remove proc entries defined in this module.
Otherwise we oops while accessing these entries */
-#ifdef CONFIG_PROC_FS
- if (drive->proc)
- ide_remove_proc_entries(drive->proc, idedisk_proc);
-#endif
}
}
diff -urN linux-2.5.14/drivers/ide/ide-proc.c linux/drivers/ide/ide-proc.c
--- linux-2.5.14/drivers/ide/ide-proc.c 2002-05-07 02:36:37.000000000 +0200
+++ linux/drivers/ide/ide-proc.c 1970-01-01 01:00:00.000000000 +0100
@@ -1,477 +0,0 @@
-/*
- * Copyright (C) 1997-1998 Mark Lord
- *
- * This is the /proc/ide/ filesystem implementation.
- *
- * The major reason this exists is to provide sufficient access
- * to driver and config data, such that user-mode programs can
- * be developed to handle chipset tuning for most PCI interfaces.
- * This should provide better utilities, and less kernel bloat.
- *
- * The entire pci config space for a PCI interface chipset can be
- * retrieved by just reading it. e.g. "cat /proc/ide3/config"
- *
- * To modify registers *safely*, do something like:
- * echo "P40:88" >/proc/ide/ide3/config
- * That expression writes 0x88 to pci config register 0x40
- * on the chip which controls ide3. Multiple tuples can be issued,
- * and the writes will be completed as an atomic set:
- * echo "P40:88 P41:35 P42:00 P43:00" >/proc/ide/ide3/config
- *
- * All numbers must be specified using pairs of ascii hex digits.
- * It is important to note that these writes will be performed
- * after waiting for the IDE controller (both interfaces)
- * to be completely idle, to ensure no corruption of I/O in progress.
- *
- * Non-PCI registers can also be written, using "R" in place of "P"
- * in the above examples. The size of the port transfer is determined
- * by the number of pairs of hex digits given for the data. If a two
- * digit value is given, the write will be a byte operation; if four
- * digits are used, the write will be performed as a 16-bit operation;
- * and if eight digits are specified, a 32-bit "dword" write will be
- * performed. Odd numbers of digits are not permitted.
- *
- * If there is an error *anywhere* in the string of registers/data
- * then *none* of the writes will be performed.
- *
- * Drive/Driver settings can be retrieved by reading the drive's
- * "settings" files. e.g. "cat /proc/ide0/hda/settings"
- * To write a new value "val" into a specific setting "name", use:
- * echo "name:val" >/proc/ide/ide0/hda/settings
- */
-
-#include <linux/config.h>
-#include <asm/uaccess.h>
-#include <linux/errno.h>
-#include <linux/sched.h>
-#include <linux/proc_fs.h>
-#include <linux/stat.h>
-#include <linux/mm.h>
-#include <linux/pci.h>
-#include <linux/ctype.h>
-#include <linux/hdreg.h>
-#include <linux/ide.h>
-
-#include <asm/io.h>
-
-#ifndef MIN
-#define MIN(a,b) (((a) < (b)) ? (a) : (b))
-#endif
-
-#ifdef CONFIG_BLK_DEV_AEC62XX
-extern byte aec62xx_proc;
-int (*aec62xx_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_AEC62XX */
-#ifdef CONFIG_BLK_DEV_ALI15X3
-extern byte ali_proc;
-int (*ali_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_ALI15X3 */
-#ifdef CONFIG_BLK_DEV_AMD74XX
-extern byte amd74xx_proc;
-int (*amd74xx_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_AMD74XX */
-#ifdef CONFIG_BLK_DEV_CMD64X
-extern byte cmd64x_proc;
-int (*cmd64x_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_CMD64X */
-#ifdef CONFIG_BLK_DEV_CS5530
-extern byte cs5530_proc;
-int (*cs5530_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_CS5530 */
-#ifdef CONFIG_BLK_DEV_HPT34X
-extern byte hpt34x_proc;
-int (*hpt34x_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_HPT34X */
-#ifdef CONFIG_BLK_DEV_HPT366
-extern byte hpt366_proc;
-int (*hpt366_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_HPT366 */
-#ifdef CONFIG_BLK_DEV_PDC202XX
-extern byte pdc202xx_proc;
-int (*pdc202xx_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_PDC202XX */
-#ifdef CONFIG_BLK_DEV_PIIX
-extern byte piix_proc;
-int (*piix_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_PIIX */
-#ifdef CONFIG_BLK_DEV_SVWKS
-extern byte svwks_proc;
-int (*svwks_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_SVWKS */
-#ifdef CONFIG_BLK_DEV_SIS5513
-extern byte sis_proc;
-int (*sis_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_SIS5513 */
-#ifdef CONFIG_BLK_DEV_VIA82CXXX
-extern byte via_proc;
-int (*via_display_info)(char *, char **, off_t, int) = NULL;
-#endif /* CONFIG_BLK_DEV_VIA82CXXX */
-
-static struct proc_dir_entry * proc_ide_root = NULL;
-
-static int ide_getdigit(char c)
-{
- int digit;
- if (isdigit(c))
- digit = c - '0';
- else
- digit = -1;
- return digit;
-}
-
-static int proc_ide_read_settings
- (char *page, char **start, off_t off, int count, int *eof, void *data)
-{
- ide_drive_t *drive = data;
- ide_settings_t *setting = drive->settings;
- char *out = page;
- int len, rc, mul_factor, div_factor;
-
- out += sprintf(out, "name\t\t\tvalue\t\tmin\t\tmax\t\tmode\n");
- out += sprintf(out, "----\t\t\t-----\t\t---\t\t---\t\t----\n");
- while(setting) {
- mul_factor = setting->mul_factor;
- div_factor = setting->div_factor;
- out += sprintf(out, "%-24s", setting->name);
- if ((rc = ide_read_setting(drive, setting)) >= 0)
- out += sprintf(out, "%-16d", rc * mul_factor / div_factor);
- else
- out += sprintf(out, "%-16s", "write-only");
- out += sprintf(out, "%-16d%-16d", (setting->min * mul_factor + div_factor - 1) / div_factor, setting->max * mul_factor / div_factor);
- if (setting->rw & SETTING_READ)
- out += sprintf(out, "r");
- if (setting->rw & SETTING_WRITE)
- out += sprintf(out, "w");
- out += sprintf(out, "\n");
- setting = setting->next;
- }
- len = out - page;
- PROC_IDE_READ_RETURN(page,start,off,count,eof,len);
-}
-
-#define MAX_LEN 30
-
-static int proc_ide_write_settings
- (struct file *file, const char *buffer, unsigned long count, void *data)
-{
- ide_drive_t *drive = data;
- char name[MAX_LEN + 1];
- int for_real = 0, len;
- unsigned long n;
- const char *start = NULL;
- ide_settings_t *setting;
-
- if (!capable(CAP_SYS_ADMIN))
- return -EACCES;
- /*
- * Skip over leading whitespace
- */
- while (count && isspace(*buffer)) {
- --count;
- ++buffer;
- }
- /*
- * Do one full pass to verify all parameters,
- * then do another to actually write the new settings.
- */
- do {
- const char *p;
- p = buffer;
- n = count;
- while (n > 0) {
- int d, digits;
- unsigned int val = 0;
- start = p;
-
- while (n > 0 && *p != ':') {
- --n;
- p++;
- }
- if (*p != ':')
- goto parse_error;
- len = min(p - start, MAX_LEN);
- strncpy(name, start, min(len, MAX_LEN));
- name[len] = 0;
-
- if (n > 0) {
- --n;
- p++;
- } else
- goto parse_error;
-
- digits = 0;
- while (n > 0 && (d = ide_getdigit(*p)) >= 0) {
- val = (val * 10) + d;
- --n;
- ++p;
- ++digits;
- }
- if (n > 0 && !isspace(*p))
- goto parse_error;
- while (n > 0 && isspace(*p)) {
- --n;
- ++p;
- }
-
- /* Find setting by name */
- setting = drive->settings;
-
- while (setting) {
- if (strcmp(setting->name, name) == 0)
- break;
- setting = setting->next;
- }
- if (!setting)
- goto parse_error;
-
- if (for_real)
- ide_write_setting(drive, setting, val * setting->div_factor / setting->mul_factor);
- }
- } while (!for_real++);
- return count;
-parse_error:
- printk("proc_ide_write_settings(): parse error\n");
- return -EINVAL;
-}
-
-static ide_proc_entry_t generic_drive_entries[] = {
- { "settings", S_IFREG|S_IRUSR|S_IWUSR,proc_ide_read_settings, proc_ide_write_settings },
- { NULL, 0, NULL, NULL }
-};
-
-void ide_add_proc_entries(struct proc_dir_entry *dir, ide_proc_entry_t *p, void *data)
-{
- struct proc_dir_entry *ent;
-
- if (!dir || !p)
- return;
- while (p->name != NULL) {
- ent = create_proc_entry(p->name, p->mode, dir);
- if (!ent) return;
- ent->nlink = 1;
- ent->data = data;
- ent->read_proc = p->read_proc;
- ent->write_proc = p->write_proc;
- p++;
- }
-}
-
-void ide_remove_proc_entries(struct proc_dir_entry *dir, ide_proc_entry_t *p)
-{
- if (!dir || !p)
- return;
- while (p->name != NULL) {
- remove_proc_entry(p->name, dir);
- p++;
- }
-}
-
-/* FIXME: we should iterate over the hwifs here as everywhere else.
- */
-static void create_proc_ide_drives(struct ata_channel *hwif)
-{
- int d;
- struct proc_dir_entry *parent = hwif->proc;
- char name[64];
-
- for (d = 0; d < MAX_DRIVES; d++) {
- ide_drive_t *drive = &hwif->drives[d];
- struct ata_operations *driver = drive->driver;
-
- if (!drive->present)
- continue;
- if (drive->proc)
- continue;
-
- drive->proc = proc_mkdir(drive->name, parent);
- if (drive->proc) {
- ide_add_proc_entries(drive->proc, generic_drive_entries, drive);
- if (driver) {
- ide_add_proc_entries(drive->proc, generic_subdriver_entries, drive);
- ide_add_proc_entries(drive->proc, driver->proc, drive);
- }
- }
- sprintf(name,"ide%d/%s", (drive->name[2]-'a')/2, drive->name);
- }
-}
-
-static void destroy_proc_ide_device(struct ata_channel *hwif, ide_drive_t *drive)
-{
- struct ata_operations *driver = drive->driver;
-
- if (drive->proc) {
- if (driver)
- ide_remove_proc_entries(drive->proc, driver->proc);
- ide_remove_proc_entries(drive->proc, generic_drive_entries);
- remove_proc_entry(drive->name, proc_ide_root);
- remove_proc_entry(drive->name, hwif->proc);
- drive->proc = NULL;
- }
-}
-
-void destroy_proc_ide_drives(struct ata_channel *hwif)
-{
- int d;
-
- for (d = 0; d < MAX_DRIVES; d++) {
- ide_drive_t *drive = &hwif->drives[d];
-
- if (drive->proc)
- destroy_proc_ide_device(hwif, drive);
- }
-}
-
-void create_proc_ide_interfaces(void)
-{
- int h;
-
- for (h = 0; h < MAX_HWIFS; h++) {
- struct ata_channel *hwif = &ide_hwifs[h];
-
- if (!hwif->present)
- continue;
- if (!hwif->proc) {
- hwif->proc = proc_mkdir(hwif->name, proc_ide_root);
- if (!hwif->proc)
- return;
- }
- create_proc_ide_drives(hwif);
- }
-}
-
-static void destroy_proc_ide_interfaces(void)
-{
- int h;
-
- for (h = 0; h < MAX_HWIFS; h++) {
- struct ata_channel *hwif = &ide_hwifs[h];
- int exist = (hwif->proc != NULL);
-#if 0
- if (!hwif->present)
- continue;
-#endif
- if (exist) {
- destroy_proc_ide_drives(hwif);
- remove_proc_entry(hwif->name, proc_ide_root);
- hwif->proc = NULL;
- } else
- continue;
- }
-}
-
-void proc_ide_create(void)
-{
- proc_ide_root = proc_mkdir("ide", 0);
- if (!proc_ide_root) return;
-
- create_proc_ide_interfaces();
-
-#ifdef CONFIG_BLK_DEV_AEC62XX
- if ((aec62xx_display_info) && (aec62xx_proc))
- create_proc_info_entry("aec62xx", 0, proc_ide_root, aec62xx_display_info);
-#endif /* CONFIG_BLK_DEV_AEC62XX */
-#ifdef CONFIG_BLK_DEV_ALI15X3
- if ((ali_display_info) && (ali_proc))
- create_proc_info_entry("ali", 0, proc_ide_root, ali_display_info);
-#endif /* CONFIG_BLK_DEV_ALI15X3 */
-#ifdef CONFIG_BLK_DEV_AMD74XX
- if ((amd74xx_display_info) && (amd74xx_proc))
- create_proc_info_entry("amd74xx", 0, proc_ide_root, amd74xx_display_info);
-#endif /* CONFIG_BLK_DEV_AMD74XX */
-#ifdef CONFIG_BLK_DEV_CMD64X
- if ((cmd64x_display_info) && (cmd64x_proc))
- create_proc_info_entry("cmd64x", 0, proc_ide_root, cmd64x_display_info);
-#endif /* CONFIG_BLK_DEV_CMD64X */
-#ifdef CONFIG_BLK_DEV_CS5530
- if ((cs5530_display_info) && (cs5530_proc))
- create_proc_info_entry("cs5530", 0, proc_ide_root, cs5530_display_info);
-#endif /* CONFIG_BLK_DEV_CS5530 */
-#ifdef CONFIG_BLK_DEV_HPT34X
- if ((hpt34x_display_info) && (hpt34x_proc))
- create_proc_info_entry("hpt34x", 0, proc_ide_root, hpt34x_display_info);
-#endif /* CONFIG_BLK_DEV_HPT34X */
-#ifdef CONFIG_BLK_DEV_HPT366
- if ((hpt366_display_info) && (hpt366_proc))
- create_proc_info_entry("hpt366", 0, proc_ide_root, hpt366_display_info);
-#endif /* CONFIG_BLK_DEV_HPT366 */
-#ifdef CONFIG_BLK_DEV_SVWKS
- if ((svwks_display_info) && (svwks_proc))
- create_proc_info_entry("svwks", 0, proc_ide_root, svwks_display_info);
-#endif /* CONFIG_BLK_DEV_SVWKS */
-#ifdef CONFIG_BLK_DEV_PDC202XX
- if ((pdc202xx_display_info) && (pdc202xx_proc))
- create_proc_info_entry("pdc202xx", 0, proc_ide_root, pdc202xx_display_info);
-#endif /* CONFIG_BLK_DEV_PDC202XX */
-#ifdef CONFIG_BLK_DEV_PIIX
- if ((piix_display_info) && (piix_proc))
- create_proc_info_entry("piix", 0, proc_ide_root, piix_display_info);
-#endif /* CONFIG_BLK_DEV_PIIX */
-#ifdef CONFIG_BLK_DEV_SIS5513
- if ((sis_display_info) && (sis_proc))
- create_proc_info_entry("sis", 0, proc_ide_root, sis_display_info);
-#endif /* CONFIG_BLK_DEV_SIS5513 */
-#ifdef CONFIG_BLK_DEV_VIA82CXXX
- if ((via_display_info) && (via_proc))
- create_proc_info_entry("via", 0, proc_ide_root, via_display_info);
-#endif /* CONFIG_BLK_DEV_VIA82CXXX */
-}
-
-void proc_ide_destroy(void)
-{
- /*
- * Mmmm.. does this free up all resources,
- * or do we need to do a more proper cleanup here ??
- */
-#ifdef CONFIG_BLK_DEV_AEC62XX
- if ((aec62xx_display_info) && (aec62xx_proc))
- remove_proc_entry("ide/aec62xx",0);
-#endif /* CONFIG_BLK_DEV_AEC62XX */
-#ifdef CONFIG_BLK_DEV_ALI15X3
- if ((ali_display_info) && (ali_proc))
- remove_proc_entry("ide/ali",0);
-#endif /* CONFIG_BLK_DEV_ALI15X3 */
-#ifdef CONFIG_BLK_DEV_AMD74XX
- if ((amd74xx_display_info) && (amd74xx_proc))
- remove_proc_entry("ide/amd74xx",0);
-#endif /* CONFIG_BLK_DEV_AMD74XX */
-#ifdef CONFIG_BLK_DEV_CMD64X
- if ((cmd64x_display_info) && (cmd64x_proc))
- remove_proc_entry("ide/cmd64x",0);
-#endif /* CONFIG_BLK_DEV_CMD64X */
-#ifdef CONFIG_BLK_DEV_CS5530
- if ((cs5530_display_info) && (cs5530_proc))
- remove_proc_entry("ide/cs5530",0);
-#endif /* CONFIG_BLK_DEV_CS5530 */
-#ifdef CONFIG_BLK_DEV_HPT34X
- if ((hpt34x_display_info) && (hpt34x_proc))
- remove_proc_entry("ide/hpt34x",0);
-#endif /* CONFIG_BLK_DEV_HPT34X */
-#ifdef CONFIG_BLK_DEV_HPT366
- if ((hpt366_display_info) && (hpt366_proc))
- remove_proc_entry("ide/hpt366",0);
-#endif /* CONFIG_BLK_DEV_HPT366 */
-#ifdef CONFIG_BLK_DEV_PDC202XX
- if ((pdc202xx_display_info) && (pdc202xx_proc))
- remove_proc_entry("ide/pdc202xx",0);
-#endif /* CONFIG_BLK_DEV_PDC202XX */
-#ifdef CONFIG_BLK_DEV_PIIX
- if ((piix_display_info) && (piix_proc))
- remove_proc_entry("ide/piix",0);
-#endif /* CONFIG_BLK_DEV_PIIX */
-#ifdef CONFIG_BLK_DEV_SVWKS
- if ((svwks_display_info) && (svwks_proc))
- remove_proc_entry("ide/svwks",0);
-#endif /* CONFIG_BLK_DEV_SVWKS */
-#ifdef CONFIG_BLK_DEV_SIS5513
- if ((sis_display_info) && (sis_proc))
- remove_proc_entry("ide/sis", 0);
-#endif /* CONFIG_BLK_DEV_SIS5513 */
-#ifdef CONFIG_BLK_DEV_VIA82CXXX
- if ((via_display_info) && (via_proc))
- remove_proc_entry("ide/via",0);
-#endif /* CONFIG_BLK_DEV_VIA82CXXX */
-
- remove_proc_entry("ide/drivers", 0);
- destroy_proc_ide_interfaces();
- remove_proc_entry("ide", 0);
-}
diff -urN linux-2.5.14/drivers/ide/ide-tape.c linux/drivers/ide/ide-tape.c
--- linux-2.5.14/drivers/ide/ide-tape.c 2002-05-07 03:47:14.000000000 +0200
+++ linux/drivers/ide/ide-tape.c 2002-05-07 03:36:01.000000000 +0200
@@ -6108,31 +6108,6 @@
return 0;
}
-#ifdef CONFIG_PROC_FS
-
-static int proc_idetape_read_name
- (char *page, char **start, off_t off, int count, int *eof, void *data)
-{
- ide_drive_t *drive = (ide_drive_t *) data;
- idetape_tape_t *tape = drive->driver_data;
- char *out = page;
- int len;
-
- len = sprintf(out, "%s\n", tape->name);
- PROC_IDE_READ_RETURN(page, start, off, count, eof, len);
-}
-
-static ide_proc_entry_t idetape_proc[] = {
- { "name", S_IFREG|S_IRUGO, proc_idetape_read_name, NULL },
- { NULL, 0, NULL, NULL }
-};
-
-#else
-
-#define idetape_proc NULL
-
-#endif
-
static void idetape_revalidate(ide_drive_t *_dummy)
{
/* We don't have to handle any partition information here, which is the
@@ -6154,7 +6129,6 @@
release: idetape_blkdev_release,
check_media_change: NULL,
revalidate: idetape_revalidate,
- proc: idetape_proc
};
/*
diff -urN linux-2.5.14/drivers/ide/Makefile linux/drivers/ide/Makefile
--- linux-2.5.14/drivers/ide/Makefile 2002-05-07 02:36:37.000000000 +0200
+++ linux/drivers/ide/Makefile 2002-05-07 03:31:30.000000000 +0200
@@ -72,8 +72,6 @@
obj-$(CONFIG_BLK_DEV_ATARAID_PDC) += pdcraid.o
obj-$(CONFIG_BLK_DEV_ATARAID_HPT) += hptraid.o
-ide-obj-$(CONFIG_PROC_FS) += ide-proc.o
-
ide-mod-objs := ide-taskfile.o ide.o ide-probe.o ide-geometry.o ide-features.o ata-timing.o $(ide-obj-y)
include $(TOPDIR)/Rules.make
diff -urN linux-2.5.14/drivers/ide/pdc202xx.c linux/drivers/ide/pdc202xx.c
--- linux-2.5.14/drivers/ide/pdc202xx.c 2002-05-06 05:38:06.000000000 +0200
+++ linux/drivers/ide/pdc202xx.c 2002-05-07 03:25:51.000000000 +0200
@@ -51,7 +51,7 @@
#define PDC202XX_DEBUG_DRIVE_INFO 0
#define PDC202XX_DECODE_REGISTER_INFO 0
-#define DISPLAY_PDC202XX_TIMINGS
+#undef DISPLAY_PDC202XX_TIMINGS
#ifndef SPLIT_BYTE
#define SPLIT_BYTE(B,H,L) ((H)=(B>>4), (L)=(B-((B>>4)<<4)))
diff -urN linux-2.5.14/drivers/ide/piix.c linux/drivers/ide/piix.c
--- linux-2.5.14/drivers/ide/piix.c 2002-05-06 05:38:00.000000000 +0200
+++ linux/drivers/ide/piix.c 2002-05-07 03:26:49.000000000 +0200
@@ -110,7 +110,7 @@
* PIIX/ICH /proc entry.
*/
-#ifdef CONFIG_PROC_FS
+#if 0 && defined(CONFIG_PROC_FS)
#include <linux/stat.h>
#include <linux/proc_fs.h>
@@ -520,7 +520,7 @@
* Register /proc/ide/piix entry
*/
-#ifdef CONFIG_PROC_FS
+#if 0 && defined(CONFIG_PROC_FS)
if (!piix_proc) {
piix_base = pci_resource_start(dev, 4);
bmide_dev = dev;
diff -urN linux-2.5.14/drivers/ide/serverworks.c linux/drivers/ide/serverworks.c
--- linux-2.5.14/drivers/ide/serverworks.c 2002-05-06 05:38:03.000000000 +0200
+++ linux/drivers/ide/serverworks.c 2002-05-07 03:29:32.000000000 +0200
@@ -93,15 +93,16 @@
#include "ata-timing.h"
-#define DISPLAY_SVWKS_TIMINGS 1
+#undef DISPLAY_SVWKS_TIMINGS
#undef SVWKS_DEBUG_DRIVE_INFO
+static u8 svwks_revision = 0;
+
#if defined(DISPLAY_SVWKS_TIMINGS) && defined(CONFIG_PROC_FS)
#include <linux/stat.h>
#include <linux/proc_fs.h>
static struct pci_dev *bmide_dev;
-static byte svwks_revision = 0;
static int svwks_get_info(char *, char **, off_t, int);
extern int (*svwks_display_info)(char *, char **, off_t, int); /* ide-proc.c */
diff -urN linux-2.5.14/drivers/ide/sis5513.c linux/drivers/ide/sis5513.c
--- linux-2.5.14/drivers/ide/sis5513.c 2002-05-06 05:38:00.000000000 +0200
+++ linux/drivers/ide/sis5513.c 2002-05-07 03:27:49.000000000 +0200
@@ -58,7 +58,7 @@
/* When BROKEN_LEVEL is defined it limits the DMA mode
at boot time to its value */
// #define BROKEN_LEVEL XFER_SW_DMA_0
-#define DISPLAY_SIS_TIMINGS
+#undef DISPLAY_SIS_TIMINGS
/* Miscellaneaous flags */
#define SIS5513_LATENCY 0x01
diff -urN linux-2.5.14/drivers/ide/via82cxxx.c linux/drivers/ide/via82cxxx.c
--- linux-2.5.14/drivers/ide/via82cxxx.c 2002-05-06 05:37:58.000000000 +0200
+++ linux/drivers/ide/via82cxxx.c 2002-05-07 03:28:40.000000000 +0200
@@ -136,7 +136,7 @@
* VIA /proc entry.
*/
-#ifdef CONFIG_PROC_FS
+#if 0 && defined(CONFIG_PROC_FS)
#include <linux/stat.h>
#include <linux/proc_fs.h>
@@ -497,7 +497,7 @@
* Setup /proc/ide/via entry.
*/
-#ifdef CONFIG_PROC_FS
+#if 0 && defined(CONFIG_PROC_FS)
if (!via_proc) {
via_base = pci_resource_start(dev, 4);
bmide_dev = dev;
diff -urN linux-2.5.14/drivers/scsi/ide-scsi.c linux/drivers/scsi/ide-scsi.c
--- linux-2.5.14/drivers/scsi/ide-scsi.c 2002-05-06 05:37:53.000000000 +0200
+++ linux/drivers/scsi/ide-scsi.c 2002-05-07 03:37:28.000000000 +0200
@@ -557,7 +557,6 @@
check_media_change: NULL,
revalidate: idescsi_revalidate,
capacity: NULL,
- proc: NULL
};
/*
diff -urN linux-2.5.14/include/linux/ide.h linux/include/linux/ide.h
--- linux-2.5.14/include/linux/ide.h 2002-05-07 03:47:14.000000000 +0200
+++ linux/include/linux/ide.h 2002-05-07 03:17:40.000000000 +0200
@@ -379,8 +379,7 @@
void *driver_data; /* extra driver data */
devfs_handle_t de; /* directory for device */
- struct proc_dir_entry *proc; /* /proc/ide/ directory entry */
- struct ide_settings_s *settings; /* /proc/ide/ drive settings */
+ struct ide_settings_s *settings; /* ioctl entries */
char driver_req[10]; /* requests specific driver */
int last_lun; /* last logical unit */
@@ -612,43 +611,7 @@
extern int ide_write_setting(struct ata_device *, ide_settings_t *, int);
extern void ide_add_generic_settings(struct ata_device *);
-/*
- * /proc/ide interface
- */
-typedef struct {
- const char *name;
- mode_t mode;
- read_proc_t *read_proc;
- write_proc_t *write_proc;
-} ide_proc_entry_t;
-
-#ifdef CONFIG_PROC_FS
-void proc_ide_create(void);
-void proc_ide_destroy(void);
-void destroy_proc_ide_drives(struct ata_channel *);
-void create_proc_ide_interfaces(void);
-void ide_add_proc_entries(struct proc_dir_entry *dir, ide_proc_entry_t *p, void *data);
-void ide_remove_proc_entries(struct proc_dir_entry *dir, ide_proc_entry_t *p);
-read_proc_t proc_ide_read_geometry;
-
-/*
- * Standard exit stuff:
- */
-#define PROC_IDE_READ_RETURN(page,start,off,count,eof,len) \
-{ \
- len -= off; \
- if (len < count) { \
- *eof = 1; \
- if (len <= 0) \
- return 0; \
- } else \
- len = count; \
- *start = page + off; \
- return len; \
-}
-#else
-# define PROC_IDE_READ_RETURN(page,start,off,count,eof,len) return 0;
-#endif
+#define PROC_IDE_READ_RETURN(page,start,off,count,eof,len) return 0;
/*
* This structure describes the operations possible on a particular device type
@@ -671,8 +634,6 @@
void (*revalidate)(struct ata_device *);
sector_t (*capacity)(struct ata_device *);
-
- ide_proc_entry_t *proc;
};
/* Alas, no aliases. Too much hassle with bringing module.h everywhere */
@@ -863,7 +824,6 @@
void ide_init_subdrivers (void);
extern struct block_device_operations ide_fops[];
-extern ide_proc_entry_t generic_subdriver_entries[];
#ifdef CONFIG_BLK_DEV_IDE
/* Probe for devices attached to the systems host controllers.
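For readers unfamiliar with the old read_proc convention, the PROC_IDE_READ_RETURN macro removed from ide.h above can be modeled in userspace roughly as follows. This is only a sketch: the function name proc_read_window is hypothetical, and it returns its result to the caller, whereas the macro returned directly from the enclosing handler.

```c
#include <stddef.h>

/*
 * Userspace model of the removed PROC_IDE_READ_RETURN logic.
 * proc_read_window is a hypothetical name, not a kernel function.
 *
 * page  - buffer holding `len` bytes of already formatted text
 * off   - offset the reader wants to continue from
 * count - maximum number of bytes the reader accepts
 *
 * Returns the number of bytes available at *start and sets *eof
 * when this read reaches the end of the formatted text.
 */
static int proc_read_window(char *page, char **start, long off,
                            int count, int *eof, int len)
{
    len -= (int)off;            /* bytes remaining beyond the offset */
    if (len < count) {
        *eof = 1;               /* nothing left after this read */
        if (len <= 0)
            return 0;           /* offset is at or past the end */
    } else {
        len = count;            /* clamp to the requested size */
    }
    *start = page + off;
    return len;
}
```

With a 6-byte page and count 4, a read at offset 0 yields 4 bytes without eof, and a read at offset 4 yields the remaining 2 bytes with eof set.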
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 11:27 ` [PATCH] 2.5.14 IDE 57 Martin Dalecki
@ 2002-05-07 13:16 ` Anton Altaparmakov
2002-05-07 12:34 ` Martin Dalecki
2002-05-11 14:09 ` Aaron Lehmann
1 sibling, 1 reply; 265+ messages in thread
From: Anton Altaparmakov @ 2002-05-07 13:16 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
At 12:27 07/05/02, Martin Dalecki wrote:
>Tue May 7 02:37:49 CEST 2002 ide-clean-57
>
>Nuke /proc/ide. For explanations why, please see the frustrated comments
>in the previous change log.
This is a big mistake IMO.
Nuking the ability to change settings, fair enough, but only if an alternative
interface is provided for userspace to tweak everything; otherwise provide
the interface before you remove the existing one. (There may already be
another interface, I don't know... I am sure someone will tell me if there is!)
Removing the information provided by /proc/ide is very bad! It is very
useful to diagnose one's ide setup, to see what the host is configured as,
what all settings are set to, etc. This is the first place I look to check
whether the interfaces are configured as I expect them to be and in case of
problems, this is again the first place I look.
What alternatives are you going to present to give all the information that
/proc/ide gives? If the answer is none IMHO your patch is not acceptable...
Best regards,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 13:16 ` Anton Altaparmakov
@ 2002-05-07 12:34 ` Martin Dalecki
2002-05-07 13:56 ` Mikael Pettersson
2002-05-07 13:57 ` Anton Altaparmakov
0 siblings, 2 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 12:34 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Linus Torvalds, Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 938 bytes --]
Anton Altaparmakov wrote:
> At 12:27 07/05/02, Martin Dalecki wrote:
>
>> Tue May 7 02:37:49 CEST 2002 ide-clean-57
>>
>> Nuke /proc/ide. For explanations why, please see the frustrated
>> comments in the previous change log.
>
>
> This is a big mistake IMO.
>
> Nuking the ability to change settings, fair enough, but only if an
> alternative interface is provided for userspace to tweak everything;
> otherwise provide the interface before you remove the existing one.
> (There may already be another interface, I don't know... I am sure
> someone will tell me if there is!)
Ehmm... There *is* one interface there. hdparm will help
you. Note: the upcoming release of hdparm should contain the
following patch, which vastly increases its usability for the
average user. Just for convenience I'm attaching it here.
If you don't like hdparm - well, please shoot the
people who wrote init, ifconfig, eject and so on...
[-- Attachment #2: hdparm-4.9.diff --]
[-- Type: text/plain, Size: 3525 bytes --]
diff -ur hdparm-4.9/hdparm.8 hdparm-4.9-new/hdparm.8
--- hdparm-4.9/hdparm.8 2002-04-29 16:27:55.000000000 +0200
+++ hdparm-4.9-new/hdparm.8 2002-05-02 14:54:56.000000000 +0200
@@ -368,14 +368,16 @@
Tristate device for hotswap (DANGEROUS).
.TP
.I -X
-Set the IDE transfer mode for newer (E)IDE/ATA2 drives.
+Set the IDE transfer mode for newer (E)IDE/ATA-7 drives.
This is typically used in combination with
.I -d1
when enabling DMA to/from a drive on a supported interface chipset, where
-.I -X34
-is used to select multiword DMA mode2 transfers.
+.I -X mdma2
+is used to select multiword DMA mode2 transfers and
+.I -X sdma1
+is used to select single-word DMA mode1 transfers.
With systems which support UltraDMA burst timings,
-.I -X66
+.I -X udma2
is used to select UltraDMA mode2 transfers (you'll need to prepare
the chipset for UltraDMA beforehand).
Apart from that, use of this flag is
diff -ur hdparm-4.9/hdparm.c hdparm-4.9-new/hdparm.c
--- hdparm-4.9/hdparm.c 2002-04-29 16:27:42.000000000 +0200
+++ hdparm-4.9-new/hdparm.c 2002-05-02 14:48:56.000000000 +0200
@@ -606,6 +606,66 @@
printf(")\n");
}
+struct xfermode_entry {
+ int val;
+ char *name;
+};
+
+static struct xfermode_entry xfermode_table[] = {
+ { 8, "pio0" },
+ { 9, "pio1" },
+ { 10, "pio2" },
+ { 11, "pio3" },
+ { 12, "pio4" },
+ { 13, "pio5" },
+ { 14, "pio6" },
+ { 15, "pio7" },
+ { 16, "sdma0" },
+ { 17, "sdma1" },
+ { 18, "sdma2" },
+ { 19, "sdma3" },
+ { 20, "sdma4" },
+ { 21, "sdma5" },
+ { 22, "sdma6" },
+ { 23, "sdma7" },
+ { 32, "mdma0" },
+ { 33, "mdma1" },
+ { 34, "mdma2" },
+ { 35, "mdma3" },
+ { 36, "mdma4" },
+ { 37, "mdma5" },
+ { 38, "mdma6" },
+ { 39, "mdma7" },
+ { 64, "udma0" },
+ { 65, "udma1" },
+ { 66, "udma2" },
+ { 67, "udma3" },
+ { 68, "udma4" },
+ { 69, "udma5" },
+ { 70, "udma6" },
+ { 71, "udma7" },
+ { 0, NULL }
+};
+
+static unsigned int translate_xfermode(char * name)
+{
+ struct xfermode_entry *tmp;
+ char *endptr;
+ int val = -1;
+
+
+ for (tmp = xfermode_table; tmp->name; ++tmp) {
+ if (!strcmp(name, tmp->name))
+ return tmp->val;
+ }
+
+ val = strtol(name, &endptr, 10);
+ if (*endptr == '\0')
+ return val;
+
+ return -1;
+}
+
static void interpret_xfermode (unsigned int xfermode)
{
printf(" (");
@@ -1408,9 +1468,26 @@
num = (num * 10) + (*p++ - '0'); \
}
+#define GET_STRING(flag, num) tmpstr = name; \
+ tmpstr[0] = '\0'; \
+ if (!*p && argc && isalnum(**argv)) \
+ p = *argv++, --argc; \
+ while (isalnum(*p) && (tmpstr - name < 31)) { \
+ tmpstr[0] = *p++; \
+ tmpstr[1] = '\0'; \
+ ++tmpstr; \
+ } \
+ num = translate_xfermode(name); \
+ if (num == -1) \
+ flag = 0; \
+ else \
+ flag = 1;
+
int main(int argc, char **argv)
{
char c, *p;
+ char *tmpstr;
+ char name[32];
if ((progname = (char *) strrchr(*argv, '/')) == NULL)
progname = *argv;
@@ -1491,7 +1568,7 @@
case 'p':
noisy_piomode = noisy;
noisy = 1;
- GET_NUMBER(set_piomode,piomode);
+ GET_STRING(set_piomode,piomode);
break;
#endif
case 'r':
@@ -1551,7 +1628,7 @@
case 'X':
get_xfermode = noisy;
noisy = 1;
- GET_NUMBER(set_xfermode,xfermode);
+ GET_STRING(set_xfermode,xfermode);
if (!set_xfermode)
fprintf(stderr, "-X: missing value\n");
break;
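The name lookup added by the patch above can be tried standalone. The following is a minimal sketch, not the code that went into hdparm: the table is abbreviated (values copied from the patch), the return type is int rather than unsigned int so that -1 is a well-defined error value, and the rejection of empty strings and of numbers outside 0..255 is an assumption of this sketch. The patch itself accepts an empty string as 0, because strtol() consumes nothing and leaves endptr at the terminating NUL.

```c
#include <stdlib.h>
#include <string.h>

struct xfermode_entry {
    int val;
    const char *name;
};

/* Abbreviated table; the values are copied from the patch above. */
static const struct xfermode_entry xfermode_table[] = {
    { 12, "pio4"  },
    { 17, "sdma1" },
    { 34, "mdma2" },
    { 66, "udma2" },
    { 69, "udma5" },
    {  0, NULL    }
};

/* Map a mode name or a decimal string to its number; -1 if neither. */
static int translate_xfermode(const char *name)
{
    const struct xfermode_entry *e;
    char *endptr;
    long val;

    for (e = xfermode_table; e->name; ++e) {
        if (strcmp(name, e->name) == 0)
            return e->val;
    }

    val = strtol(name, &endptr, 10);
    /*
     * Require at least one digit and nothing after the number.
     * The 0..255 range is an assumption of this sketch.
     */
    if (endptr != name && *endptr == '\0' && val >= 0 && val <= 255)
        return (int)val;

    return -1;
}
```

Here translate_xfermode("udma2") and translate_xfermode("66") both give 66, so the old numeric form of -X keeps working alongside the names.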
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 12:34 ` Martin Dalecki
@ 2002-05-07 13:56 ` Mikael Pettersson
2002-05-07 14:04 ` Dave Jones
2002-05-07 13:57 ` Anton Altaparmakov
1 sibling, 1 reply; 265+ messages in thread
From: Mikael Pettersson @ 2002-05-07 13:56 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Anton Altaparmakov, Kernel Mailing List
Martin Dalecki writes:
> Anton Altaparmakov wrote:
> > At 12:27 07/05/02, Martin Dalecki wrote:
> >
> >> Tue May 7 02:37:49 CEST 2002 ide-clean-57
> >>
> >> Nuke /proc/ide. For explanations why, please see the frustrated
> >> comments in the previous change log.
> >
> >
> > This is a big mistake IMO.
> >
> > Nuking the ability to change settings, fair enough, but only if an
> > alternative interface is provided for userspace to tweak everything;
> > otherwise provide the interface before you remove the existing one.
> > (There may already be another interface, I don't know... I am sure
> > someone will tell me if there is!)
>
> Ehmm... There *is* one interface there. hdparm will help
> you. Note: the upcoming release of hdparm should contain the
hdparm -i requires root privs. cat /proc/ide/${file} does not.
hdparm is NOT an acceptable substitute for /proc/ide/.
/Mikael
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 12:34 ` Martin Dalecki
2002-05-07 13:56 ` Mikael Pettersson
@ 2002-05-07 13:57 ` Anton Altaparmakov
2002-05-07 14:08 ` Dave Jones
1 sibling, 1 reply; 265+ messages in thread
From: Anton Altaparmakov @ 2002-05-07 13:57 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
At 13:34 07/05/02, Martin Dalecki wrote:
>Anton Altaparmakov wrote:
>>At 12:27 07/05/02, Martin Dalecki wrote:
>>>Tue May 7 02:37:49 CEST 2002 ide-clean-57
>>>
>>>Nuke /proc/ide. For explanations why, please see the frustrated comments
>>>in the previous change log.
>>
>>This is a big mistake IMO.
>>Nuking the ability to change settings, fair enough, but only if an
>>alternative interface is provided for userspace to tweak everything;
>>otherwise provide the interface before you remove the existing one.
>>(There may already be another interface, I don't know... I am sure someone
>>will tell me if there is!)
>
>Ehmm... There *is* one interface there. hdparm will help
>you. Note: the upcoming release of hdparm should contain the
>following patch, which vastly increases its usability for the
>average user. Just for convenience I'm attaching it here.
How do I get this information with hdparm please?
[aia21@drop ide]$ cat via
----------VIA BusMastering IDE Configuration----------------
Driver Version: 3.34
South Bridge: VIA vt82c686b
Revision: ISA 0x40 IDE 0x6
Highest DMA rate: UDMA100
BM-DMA base: 0xd000
PCI clock: 33.3MHz
Master Read Cycle IRDY: 0ws
Master Write Cycle IRDY: 0ws
BM IDE Status Register Read Retry: yes
Max DRDY Pulse Width: No limit
-----------------------Primary IDE-------Secondary IDE------
Read DMA FIFO flush: yes yes
End Sector FIFO flush: no no
Prefetch Buffer: yes no
Post Write Buffer: yes no
Enabled: yes yes
Simplex only: no no
Cable Type: 80w 40w
-------------------drive0----drive1----drive2----drive3-----
Transfer Mode: UDMA PIO DMA UDMA
Address Setup: 30ns 120ns 30ns 30ns
Cmd Active: 90ns 90ns 90ns 90ns
Cmd Recovery: 30ns 30ns 30ns 30ns
Data Active: 90ns 330ns 90ns 90ns
Data Recovery: 30ns 270ns 30ns 30ns
Cycle Time: 20ns 600ns 120ns 60ns
Transfer Rate: 99.9MB/s 3.3MB/s 16.6MB/s 33.3MB/s
hdparm is a tool to query a device and how the controller is programmed to
talk to the device. But it is neither designed for nor capable of giving
information about the host itself. I just read the man page for hdparm and
there are no options in sight to show any of the things I have shown above.
Also, the commands below work as a normal user, but hdparm requires superuser
privileges... It is debatable whether a normal user should be allowed access,
but still, you are taking away existing functionality...
[aia21@drop hda]$ cat cache
1916
[aia21@drop hda]$ cat capacity
80418240
[aia21@drop hda]$ cat geometry
physical 79780/16/63
logical 5005/255/63
And hdparm never gives you the physical geometry AFAICS.
Either I am missing something or you are removing a lot of functionality
and replacing it with nothingness...
And as I said, I can understand removing the ability to write values into
/proc/ide/*, what I disagree with is the removal of the information
provided by read-only access to /proc/ide/*. And that is because I am not
aware of any other way to get the same information.
Best regards,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
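As a sanity check on the figures quoted in the message above: the physical geometry multiplies out exactly to the reported capacity (79780 * 16 * 63 = 80418240 sectors), while the logical geometry covers slightly less (5005 * 255 * 63 = 80405325 sectors, i.e. 12915 fewer). A minimal sketch of that arithmetic follows; the helper name chs_to_sectors is an assumption made for illustration and is not taken from the kernel or from hdparm.

```c
#include <stdint.h>

/*
 * Total number of sectors addressed by a cylinders/heads/sectors-per-track
 * triple.  The 64-bit result avoids overflow for large geometries.
 */
static uint64_t chs_to_sectors(uint32_t cyls, uint32_t heads, uint32_t spt)
{
    return (uint64_t)cyls * heads * spt;
}
```

Calling chs_to_sectors(79780, 16, 63) reproduces the value shown by "cat capacity", which is one reason the physical line is handy when cross-checking partition tables given in sectors.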
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 13:57 ` Anton Altaparmakov
@ 2002-05-07 14:08 ` Dave Jones
2002-05-07 13:11 ` Martin Dalecki
` (3 more replies)
0 siblings, 4 replies; 265+ messages in thread
From: Dave Jones @ 2002-05-07 14:08 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Martin Dalecki, Linus Torvalds, Kernel Mailing List
On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> How do I get this information with hdparm please?
>
> [aia21@drop ide]$ cat via
Bartlomiej Zolnierkiewicz moved all this stuff to userspace
a long time ago in 'ideinfo'.
> [aia21@drop hda]$ cat cache
> 1916
> [aia21@drop hda]$ cat capacity
> 80418240
> [aia21@drop hda]$ cat geometry
> physical 79780/16/63
> logical 5005/255/63
>
> And hdparm never gives you the physical geometry AFAICS.
Why would a normal user ever need to know this info?
> And as I said, I can understand removing the ability to write values into
> /proc/ide/*, what I disagree with is the removal of the information
> provided by read-only access to /proc/ide/*. And that is because I am not
> aware of any other way to get the same information.
The parsing gunk we have for /proc/ide is fugly, and should have been
done with sysctls from day one imo.
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 14:08 ` Dave Jones
@ 2002-05-07 13:11 ` Martin Dalecki
2002-05-07 14:29 ` Anton Altaparmakov
` (2 subsequent siblings)
3 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 13:11 UTC (permalink / raw)
To: Dave Jones; +Cc: Anton Altaparmakov, Linus Torvalds, Kernel Mailing List
Dave Jones wrote:
> On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> > How do I get this information with hdparm please?
> >
> > [aia21@drop ide]$ cat via
>
> Bartlomiej Zolnierkiewicz moved all this stuff to userspace
> a long time ago in 'ideinfo'.
>
> > [aia21@drop hda]$ cat cache
> > 1916
> > [aia21@drop hda]$ cat capacity
> > 80418240
> > [aia21@drop hda]$ cat geometry
> > physical 79780/16/63
> > logical 5005/255/63
> >
> > And hdparm never gives you the physical geometry AFAICS.
>
> Why would a normal user ever need to know this info?
>
> > And as I said, I can understand removing the ability to write values into
> > /proc/ide/*, what I disagree with is the removal of the information
> > provided by read-only access to /proc/ide/*. And that is because I am not
> > aware of any other way to get the same information.
>
> The parsing gunk we have for /proc/ide is fugly, and should have been
> done with sysctls from day one imo.
Amen. Where it turns out to be really worth it,
I do plan to move to sysctl. For example, at the ioctl
level we currently still have the problem that many
ioctls are attached to the device but act on the channel.
hdparm -xxx /dev/hda & hdparm -xxx /dev/hdc - BANG, race condition.
(At least at the level of logic.)
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 14:08 ` Dave Jones
2002-05-07 13:11 ` Martin Dalecki
@ 2002-05-07 14:29 ` Anton Altaparmakov
2002-05-07 13:36 ` Martin Dalecki
2002-05-07 16:51 ` Dave Jones
2002-05-07 15:07 ` Padraig Brady
2002-05-07 17:21 ` Andre Hedrick
3 siblings, 2 replies; 265+ messages in thread
From: Anton Altaparmakov @ 2002-05-07 14:29 UTC (permalink / raw)
To: Dave Jones; +Cc: Martin Dalecki, Linus Torvalds, Kernel Mailing List
At 15:08 07/05/02, Dave Jones wrote:
>On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> > How do I get this information with hdparm please?
> >
> > [aia21@drop ide]$ cat via
>
>Bartlomiej Zolnierkiewicz moved all this stuff to userspace
>a long time ago in 'ideinfo'.
[aia21@drop hda]$ ideinfo
bash: ideinfo: command not found
Obviously distros haven't caught up with this development. )-:
Care to give me a URL? A quick google for "ideinfo Linux download" didn't
bring up anything looking relevant.
> > [aia21@drop hda]$ cat cache
> > 1916
> > [aia21@drop hda]$ cat capacity
> > 80418240
> > [aia21@drop hda]$ cat geometry
> > physical 79780/16/63
> > logical 5005/255/63
> >
> > And hdparm never gives you the physical geometry AFAICS.
>
>Why would a normal user ever need to know this info?
I want to know this info. (-: Admittedly normal users don't need it... It
is useful for diagnosing problems with NTFS and MD setups for example (in
conjunction with fdisk -l shown in sectors).
> > And as I said, I can understand removing the ability to write values into
> > /proc/ide/*, what I disagree with is the removal of the information
> > provided by read-only access to /proc/ide/*. And that is because I am not
> > aware of any other way to get the same information.
>
>The parsing gunk we have for /proc/ide is fugly, and should have been
>done with sysctls from day one imo.
I like text parsing... It is not performance critical and makes info human
readable... Whether existing text parsers are any good or not, I don't
care, write a better one if you don't like the existing one or go beat up
the people who wrote the bad ones... That seems to be Martin's standard
reply, so I thought I would use it, too. (-;
Best regards,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 14:29 ` Anton Altaparmakov
@ 2002-05-07 13:36 ` Martin Dalecki
2002-05-07 15:08 ` Anton Altaparmakov
2002-05-07 16:51 ` Dave Jones
1 sibling, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 13:36 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Dave Jones, Linus Torvalds, Kernel Mailing List
Anton Altaparmakov wrote:
>
> [aia21@drop hda]$ ideinfo
> bash: ideinfo: command not found
>
> Obviously distros haven't caught up with this development. )-:
>
> Care to give me a URL? A quick google for "ideinfo Linux download"
> didn't bring up anything looking relevant.
http://www.j2.ru/frozenfido/ru.unix.bsd/1329707b3e3f8.html
Porting it should be fairly trivial. Basically lspci +
the parsing crap.
>
> I like text parsing... It is not performance critical and makes info
> human readable... Whether existing text parsers are any good or not, I
> don't care, write a better one if you don't like the existing one or go
> beat up the people who wrote the bad ones... That seems to be Martin's
> standard reply, so I thought I would use it, too. (-;
Feel free to do it yourself - in user space where it belongs.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 13:36 ` Martin Dalecki
@ 2002-05-07 15:08 ` Anton Altaparmakov
0 siblings, 0 replies; 265+ messages in thread
From: Anton Altaparmakov @ 2002-05-07 15:08 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Kernel Mailing List
At 14:36 07/05/02, Martin Dalecki wrote:
>Anton Altaparmakov wrote:
>>[aia21@drop hda]$ ideinfo
>>bash: ideinfo: command not found
>>Obviously distros haven't caught up with this development. )-:
>>Care to give me a URL? A quick google for "ideinfo Linux download" didn't
>>bring up anything looking relevant.
>
>http://www.j2.ru/frozenfido/ru.unix.bsd/1329707b3e3f8.html
>
>Porting it should be fairly trivial. Basically lspci +
>the parsing crap.
I don't want to port anything. I don't know ide and I don't want to know
ide. I want to be able to use it. I am an ide USER. You are the ide
DEVELOPER. If you take away functionality, YOU have to provide a
replacement, NOT tell me, the USER, to write it.
>>I like text parsing... It is not performance critical and makes info
>>human readable... Whether existing text parsers are any good or not, I
>>don't care, write a better one if you don't like the existing one or go
>>beat up the people who wrote the bad ones... That seems to be Martin's
>>standard reply, so I thought I would use it, too. (-;
>
>Feel free to do it yourself - in user space where it belongs.
I don't want to do it myself. I want YOU to do it because YOU are taking
away the functionality that already exists.
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 14:29 ` Anton Altaparmakov
2002-05-07 13:36 ` Martin Dalecki
@ 2002-05-07 16:51 ` Dave Jones
2002-05-08 3:38 ` Anton Altaparmakov
1 sibling, 1 reply; 265+ messages in thread
From: Dave Jones @ 2002-05-07 16:51 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Martin Dalecki, Linus Torvalds, Kernel Mailing List
On Tue, May 07, 2002 at 03:29:28PM +0100, Anton Altaparmakov wrote:
> [aia21@drop hda]$ ideinfo
> bash: ideinfo: command not found
> Obviously distros haven't caught up with this development. )-:
> Care to give me a URL? A quick google for "ideinfo Linux download" didn't
> bring up anything looking relevant.
Can't find where I got it from, and it seems to have fallen off google.
I put up the last version I had (which I hacked up a bit) at
http://www.codemonkey.org.uk/cruft/ide-info-0.0.5-dj.tar.gz
> >The parsing gunk we have for /proc/ide is fugly, and should have been
> >done with sysctls from day one imo.
>
> I like text parsing.
must.. resist.. /proc ascii/bin... holywar..
(besides, sysctl interface gives you ascii in /proc/sys/)
> It is not performance critical and makes info human
> readable... Whether existing text parsers are any good or not, I don't
> care, write a better one if you don't like the existing one
That's likely exactly the reason we ended up with the dungheap we have
now. Rewriting the parser when we already have a usable sysctl interface
seems to have no gain over the existing mess to me.
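To make the "sysctl gives you ascii in /proc/sys/" point concrete, here is a small sketch of what a userspace reader of such an entry could look like. It is only an illustration: the function names are invented for this example, and no /proc/sys/ide tree is assumed to exist. Each entry is treated as a plain one-line text file.

```c
#include <stdio.h>
#include <string.h>

/*
 * Read the first line from an already opened text stream into buf and
 * strip the trailing newline. Returns 0 on success, -1 on EOF or error.
 */
static int read_first_line(FILE *fp, char *buf, size_t len)
{
    if (len == 0 || fgets(buf, (int)len, fp) == NULL)
        return -1;
    buf[strcspn(buf, "\n")] = '\0';
    return 0;
}

/* Open a /proc/sys style file by path and read its single-line value. */
static int read_proc_text(const char *path, char *buf, size_t len)
{
    FILE *fp = fopen(path, "r");
    int ret;

    if (fp == NULL)
        return -1;
    ret = read_first_line(fp, buf, len);
    fclose(fp);
    return ret;
}
```

On a system with /proc mounted, calling read_proc_text() on an existing entry such as /proc/sys/kernel/ostype should hand back the same text that cat would print, with no format-specific parser needed.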
Dave.
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 16:51 ` Dave Jones
@ 2002-05-08 3:38 ` Anton Altaparmakov
2002-05-08 11:47 ` Dave Jones
0 siblings, 1 reply; 265+ messages in thread
From: Anton Altaparmakov @ 2002-05-08 3:38 UTC (permalink / raw)
To: Dave Jones; +Cc: Martin Dalecki, Linus Torvalds, Kernel Mailing List
At 17:51 07/05/02, Dave Jones wrote:
>On Tue, May 07, 2002 at 03:29:28PM +0100, Anton Altaparmakov wrote:
> > [aia21@drop hda]$ ideinfo
> > bash: ideinfo: command not found
> > Obviously distros haven't caught up with this development. )-:
> > Care to give me a URL? A quick google for "ideinfo Linux download" didn't
> > bring up anything looking relevant.
>
>Can't find where I got it from, and it seems to have fallen off google.
>I put up the last version I had (which I hacked up a bit) at
>http://www.codemonkey.org.uk/cruft/ide-info-0.0.5-dj.tar.gz
Ok, will get that. Someone else emailed me a URL and I tried that earlier on
(ages ago, it seems). It said version 0.0.4 and it displayed a lot of crap
on a running 2.5.14 kernel. Certainly it bears no resemblance to what
/proc/ide/via has to say, and it certainly bears no resemblance to
reality... )-: I hope...
> > >The parsing gunk we have for /proc/ide is fugly, and should have been
> > >done with sysctls from day one imo.
> >
> > I like text parsing.
>
>must.. resist.. /proc ascii/bin... holywar..
>(besides, sysctl interface gives you ascii in /proc/sys/)
It does indeed (if implemented). Agreed: if Martin were to change to sysctl
with a /proc interface, great. It would just mean /proc/ide becomes
/proc/sys/ide, and I have nothing against that...
> > It is not performance critical and makes info human
> > readable... Whether existing text parsers are any good or not, I don't
> > care, write a better one if you don't like the existing one
>
>That's likely exactly the reason we ended up with the dungheap we have
>now. Rewriting the parser when we already have a usable sysctl interface
>seems to have no gain over the existing mess to me.
Probably... I agree sysctl is great. I use it in ntfs myself. (-: And I
think /proc/sys is very nice... And people who don't like it, or who
don't compile in the /proc fs, can use _sysctl...
Cheers,
Anton
--
"I've not lost my mind. It's backed up on tape somewhere." - Unknown
--
Anton Altaparmakov <aia21 at cantab.net> (replace at with @)
Linux NTFS Maintainer / IRC: #ntfs on irc.openprojects.net
WWW: http://linux-ntfs.sf.net/ & http://www-stu.christs.cam.ac.uk/~aia21/
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 57
2002-05-08 3:38 ` Anton Altaparmakov
@ 2002-05-08 11:47 ` Dave Jones
0 siblings, 0 replies; 265+ messages in thread
From: Dave Jones @ 2002-05-08 11:47 UTC (permalink / raw)
To: Anton Altaparmakov; +Cc: Martin Dalecki, Linus Torvalds, Kernel Mailing List
On Wed, May 08, 2002 at 04:38:31AM +0100, Anton Altaparmakov wrote:
> >http://www.codemonkey.org.uk/cruft/ide-info-0.0.5-dj.tar.gz
> Ok, will get that. Someone else emailed me a url and I tried that earlier
> on (ages ago it seems) it said version 0.0.4
I don't think 0.0.5 actually hit the streets. I just named it that because
this one contained something or other I did (can't remember what exactly;
diff will tell you) that I intended to send to the author for 0.0.5.
> Certainly it bears no resemblance to what /proc/ide/via
> has to say and it certainly bears no resemblance to
> reality... )-:
Likely it needs an update for the newer VIA chipsets, as this code is
~2 years old. What it does do, however, is prove that this doesn't need
to be done in kernel space.
Dave.
--
| Dave Jones. http://www.codemonkey.org.uk
| SuSE Labs
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 14:08 ` Dave Jones
2002-05-07 13:11 ` Martin Dalecki
2002-05-07 14:29 ` Anton Altaparmakov
@ 2002-05-07 15:07 ` Padraig Brady
2002-05-07 17:21 ` Andre Hedrick
3 siblings, 0 replies; 265+ messages in thread
From: Padraig Brady @ 2002-05-07 15:07 UTC (permalink / raw)
To: Dave Jones
Cc: Anton Altaparmakov, Martin Dalecki, Linus Torvalds, Kernel Mailing List
Dave Jones wrote:
> On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> > How do I get this information with hdparm please?
> >
> > [aia21@drop ide]$ cat via
>
> Bartlomiej Zolnierkiewicz moved all this stuff to userspace
> a long time ago in 'ideinfo'.
>
> > [aia21@drop hda]$ cat cache
> > 1916
> > [aia21@drop hda]$ cat capacity
> > 80418240
> > [aia21@drop hda]$ cat geometry
> > physical 79780/16/63
> > logical 5005/255/63
> >
> > And hdparm never gives you the physical geometry AFAICS.
>
> Why would a normal user ever need to know this info?
Well one application we have here is a backup script in a web
interface (php running as nobody), which copies a whole disk
(compact flash) to the client while indicating the total size
to the client for feedback:
Header("Content-type: application/octet-stream");
$flash_size=`cat /proc/ide/hda/capacity`;
$flash_size=$flash_size*512;
Header("Content-length: $flash_size");
Header("Content-Disposition: attachment; filename=flash.img");
passthru("/bin/suid_copy_flash");
Now you could of course have a /bin/suid_get_flash_size
but this is messy/less efficient?
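For comparison, the arithmetic such a helper would have to do is small. The following is a sketch only, in C, with a made-up function name; it assumes the capacity file holds a count of 512-byte sectors, as the multiplication in the PHP snippet above implies. It uses a 64-bit type on purpose: for the 80418240-sector disk quoted earlier, the result is 41174138880 bytes, which does not fit in 32 bits.

```c
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

#define SECTOR_SIZE 512ULL /* bytes per sector, as assumed by the PHP snippet */

/*
 * Convert the text of a capacity file (a decimal sector count, possibly
 * followed by a newline) into a size in bytes.
 * Returns 0 on success, -1 on malformed input or overflow.
 */
static int capacity_text_to_bytes(const char *text, unsigned long long *bytes)
{
    char *end;
    unsigned long long sectors;

    errno = 0;
    sectors = strtoull(text, &end, 10);
    if (errno != 0 || end == text)
        return -1;                      /* out of range, or no digits found */
    if (sectors > ULLONG_MAX / SECTOR_SIZE)
        return -1;                      /* multiplication would overflow */
    *bytes = sectors * SECTOR_SIZE;
    return 0;
}
```

The same check applies whichever side does the multiplication: a Content-length computed in a 32-bit integer would be wrong for any device larger than 4 GiB.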
Padraig.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 14:08 ` Dave Jones
` (2 preceding siblings ...)
2002-05-07 15:07 ` Padraig Brady
@ 2002-05-07 17:21 ` Andre Hedrick
3 siblings, 0 replies; 265+ messages in thread
From: Andre Hedrick @ 2002-05-07 17:21 UTC (permalink / raw)
To: Dave Jones
Cc: Anton Altaparmakov, Martin Dalecki, Linus Torvalds, Kernel Mailing List
vaio:~ # hdparm -i /dev/hda
/dev/hda:
Model=FUJITSU MHJ2181AT, FwRev=D034, SerialNo=01001697
Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
BuffType=unknown, BuffSize=512kB, MaxMultSect=16, MultSect=16
CurCHS=17475/15/63, CurSects=16513875, LBA=yes, LBAsects=35433216
IORDY=yes, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
PIO modes: pio0 pio1 pio2 pio3 pio4
DMA modes: mdma0 mdma1 mdma2 udma0 udma1 *udma2 udma3 udma4
Drive Supports : ATA-2 ATA-3 ATA-4 ATA-5
Kernel Drive Geometry LogicalCHS=2205/255/63 PhysicalCHS=37495/15/63
BS, Dave. It does parse the difference nicely.
On Tue, 7 May 2002, Dave Jones wrote:
> On Tue, May 07, 2002 at 02:57:46PM +0100, Anton Altaparmakov wrote:
> > How do I get this information with hdparm please?
> >
> > [aia21@drop ide]$ cat via
>
> Bartlomiej Zolnierkiewicz moved all this stuff to userspace
> a long time ago in 'ideinfo'.
>
> > [aia21@drop hda]$ cat cache
> > 1916
> > [aia21@drop hda]$ cat capacity
> > 80418240
> > [aia21@drop hda]$ cat geometry
> > physical 79780/16/63
> > logical 5005/255/63
> >
> > And hdparm never gives you the physical geometry AFAICS.
>
> Why would a normal user ever need to know this info?
>
> > And as I said, I can understand removing the ability to write values into
> > /proc/ide/*, what I disagree with is the removal of the information
> > provided by read-only access to /proc/ide/*. And that is because I am not
> > aware of any other way to get the same information.
>
> The parsing gunk we have for /proc/ide is fugly, and should have been
> done with sysctls from day one imo.
>
> --
> | Dave Jones. http://www.codemonkey.org.uk
> | SuSE Labs
Andre Hedrick
LAD Storage Consulting Group
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 57
2002-05-07 11:27 ` [PATCH] 2.5.14 IDE 57 Martin Dalecki
2002-05-07 13:16 ` Anton Altaparmakov
@ 2002-05-11 14:09 ` Aaron Lehmann
1 sibling, 0 replies; 265+ messages in thread
From: Aaron Lehmann @ 2002-05-11 14:09 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
On Tue, May 07, 2002 at 01:27:32PM +0200, Martin Dalecki wrote:
> Tue May 7 02:37:49 CEST 2002 ide-clean-57
>
> Nuke /proc/ide.
I actually agree with you here. /proc has turned into a bloated mess
of unparsable text files. It would be nice if we could move some of
this information to a uniform sysctl() interface. I hope you'll work a
bit on that (if it isn't already implemented!).
^ permalink raw reply [flat|nested] 265+ messages in thread
* [PATCH] IDE 58
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (4 preceding siblings ...)
2002-05-07 11:27 ` [PATCH] 2.5.14 IDE 57 Martin Dalecki
@ 2002-05-07 15:03 ` Martin Dalecki
2002-05-08 6:42 ` Paul Mackerras
2002-05-09 19:58 ` [PATCH] 2.5.14 IDE 59 Martin Dalecki
` (6 subsequent siblings)
12 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-07 15:03 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 629 bytes --]
Tue May 7 14:28:47 CEST 2002 ide-clean-58
- Apply m68k fixes by Roman Zippel.
- Apply CDROM PIO mode fix by Osamu Tamita.
(You are true "Hawk-eye" hovering over my head! Respect - and many Thanks.)
- Virtualize the udma_enable method as well to help ARM and PPC people. If you
would like to have some other methods virtualized in a similar way, please
just tell me or, even better, do it yourself at the end of ide-dma.c.
I *don't mind* patches.
- Fix the pmac code to adhere to the new API. It's supposed to work again.
However, this is blind coding... I give myself an 80% chance of it working ;-).
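The "virtualized method" pattern described above can be sketched outside the kernel as an optional function pointer with a default fallback. The code below is a simplified stand-in, not the ide-dma.c code itself: the demo_* types, the handled_by field and both handler functions are invented for illustration, and only the shape (check the per-channel pointer, call it and return, otherwise fall through to the generic path) follows the patch.

```c
#include <stddef.h>

struct demo_drive;

/* Which code path handled the last request (illustration only). */
enum { HANDLED_NONE, HANDLED_DEFAULT, HANDLED_CHIP };

struct demo_channel {
    /* Optional host-chip-specific override; NULL means "use the default". */
    void (*udma_enable)(struct demo_drive *drive, int on, int verbose);
};

struct demo_drive {
    struct demo_channel *channel;
    int using_dma;
    int handled_by;
};

/* Generic implementation, used when the channel installs no override. */
static void default_udma_enable(struct demo_drive *drive, int on, int verbose)
{
    (void)verbose;
    drive->using_dma = on;
    drive->handled_by = HANDLED_DEFAULT;
}

/* Example of a chip-specific method a host driver might install. */
static void chip_udma_enable(struct demo_drive *drive, int on, int verbose)
{
    (void)verbose;
    drive->using_dma = on;
    drive->handled_by = HANDLED_CHIP;
}

/* Entry point: dispatch to the override if present, else fall back. */
static void demo_udma_enable(struct demo_drive *drive, int on, int verbose)
{
    struct demo_channel *ch = drive->channel;

    if (ch->udma_enable) {
        ch->udma_enable(drive, on, verbose);
        return;
    }
    default_udma_enable(drive, on, verbose);
}
```

A host driver that needs special handling sets the pointer once at setup time; every other driver leaves it NULL and gets the generic behaviour without any change.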
[-- Attachment #2: ide-clean-58.diff --]
[-- Type: text/plain, Size: 18516 bytes --]
diff -urN linux-2.5.14/drivers/ide/buddha.c linux/drivers/ide/buddha.c
--- linux-2.5.14/drivers/ide/buddha.c 2002-05-06 05:38:01.000000000 +0200
+++ linux/drivers/ide/buddha.c 2002-05-07 15:41:02.000000000 +0200
@@ -1,10 +1,10 @@
/*
* linux/drivers/ide/buddha.c -- Amiga Buddha, Catweasel and X-Surf IDE Driver
*
- * Copyright (C) 1997 by Geert Uytterhoeven
+ * Copyright (C) 1997, 2001 by Geert Uytterhoeven and others
*
- * This driver was written by based on the specifications in README.buddha and
- * the X-Surf info from Inside_XSurf.txt available at
+ * This driver was written based on the specifications in README.buddha and
+ * the X-Surf info from Inside_XSurf.txt available at
* http://www.jschoenfeld.com
*
* This file is subject to the terms and conditions of the GNU General Public
@@ -52,7 +52,7 @@
BUDDHA_BASE1, BUDDHA_BASE2, BUDDHA_BASE3
};
-static const u_int xsurf_bases[XSURF_NUM_HWIFS] __initdata = {
+static u_int xsurf_bases[XSURF_NUM_HWIFS] __initdata = {
XSURF_BASE1, XSURF_BASE2
};
@@ -97,7 +97,7 @@
BUDDHA_IRQ1, BUDDHA_IRQ2, BUDDHA_IRQ3
};
-static const int xsurf_irqports[XSURF_NUM_HWIFS] __initdata = {
+static int xsurf_irqports[XSURF_NUM_HWIFS] __initdata = {
XSURF_IRQ1, XSURF_IRQ2
};
@@ -108,8 +108,9 @@
* Board information
*/
-enum BuddhaType_Enum {BOARD_BUDDHA, BOARD_CATWEASEL, BOARD_XSURF};
-typedef enum BuddhaType_Enum BuddhaType;
+typedef enum BuddhaType_Enum {
+ BOARD_BUDDHA, BOARD_CATWEASEL, BOARD_XSURF
+} BuddhaType;
/*
@@ -175,15 +176,20 @@
if (!request_mem_region(board+XSURF_BASE1, 0x1000, "IDE"))
continue;
if (!request_mem_region(board+XSURF_BASE2, 0x1000, "IDE"))
+ goto fail_base2;
+ if (!request_mem_region(board+XSURF_IRQ1, 0x8, "IDE")) {
+ release_mem_region(board+XSURF_BASE2, 0x1000);
+fail_base2:
+ release_mem_region(board+XSURF_BASE1, 0x1000);
continue;
- if (!request_mem_region(board+XSURF_IRQ1, 0x8, "IDE"))
- continue;
+ }
}
buddha_board = ZTWO_VADDR(board);
/* write to BUDDHA_IRQ_MR to enable the board IRQ */
/* X-Surf doesn't have this. IRQs are always on */
- if(type != BOARD_XSURF) *(char *)(buddha_board+BUDDHA_IRQ_MR) = 0;
+ if (type != BOARD_XSURF)
+ z_writeb(0, buddha_board+BUDDHA_IRQ_MR);
for(i=0;i<buddha_num_hwifs;i++) {
if(type != BOARD_XSURF) {
diff -urN linux-2.5.14/drivers/ide/Config.in linux/drivers/ide/Config.in
--- linux-2.5.14/drivers/ide/Config.in 2002-05-07 17:56:57.000000000 +0200
+++ linux/drivers/ide/Config.in 2002-05-07 15:41:02.000000000 +0200
@@ -103,7 +103,7 @@
dep_mbool ' Amiga IDE Doubler support (EXPERIMENTAL)' CONFIG_BLK_DEV_IDEDOUBLER $CONFIG_BLK_DEV_GAYLE $CONFIG_EXPERIMENTAL
fi
if [ "$CONFIG_ZORRO" = "y" -a "$CONFIG_EXPERIMENTAL" = "y" ]; then
- dep_mbool ' Buddha/Catweasel IDE interface support (EXPERIMENTAL)' CONFIG_BLK_DEV_BUDDHA $CONFIG_ZORRO $CONFIG_EXPERIMENTAL
+ dep_mbool ' Buddha/Catweasel/X-Surf IDE interface support (EXPERIMENTAL)' CONFIG_BLK_DEV_BUDDHA $CONFIG_ZORRO $CONFIG_EXPERIMENTAL
fi
if [ "$CONFIG_ATARI" = "y" ]; then
dep_bool ' Falcon IDE interface support' CONFIG_BLK_DEV_FALCON_IDE $CONFIG_ATARI
diff -urN linux-2.5.14/drivers/ide/falconide.c linux/drivers/ide/falconide.c
--- linux-2.5.14/drivers/ide/falconide.c 2002-05-06 05:38:01.000000000 +0200
+++ linux/drivers/ide/falconide.c 2002-05-07 15:41:02.000000000 +0200
@@ -7,7 +7,7 @@
* License. See the file COPYING in the main directory of this archive for
* more details.
*/
-#include <linux/config.h>
+
#include <linux/types.h>
#include <linux/mm.h>
#include <linux/interrupt.h>
diff -urN linux-2.5.14/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.14/drivers/ide/ide.c 2002-05-07 17:57:04.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-07 15:48:53.000000000 +0200
@@ -2097,6 +2097,7 @@
ch->atapi_read = old.atapi_read;
ch->atapi_write = old.atapi_write;
ch->XXX_udma = old.XXX_udma;
+ ch->udma_enable = old.udma_enable;
ch->udma_start = old.udma_start;
ch->udma_stop = old.udma_stop;
ch->udma_read = old.udma_read;
diff -urN linux-2.5.14/drivers/ide/ide-cd.c linux/drivers/ide/ide-cd.c
--- linux-2.5.14/drivers/ide/ide-cd.c 2002-05-07 17:57:04.000000000 +0200
+++ linux/drivers/ide/ide-cd.c 2002-05-07 15:43:27.000000000 +0200
@@ -962,7 +962,7 @@
/* First, figure out if we need to bit-bucket
any of the leading sectors. */
- nskip = MIN(rq->current_nr_sectors - bio_sectors(rq->bio), sectors_to_transfer);
+ nskip = MIN((int)(rq->current_nr_sectors - bio_sectors(rq->bio)), sectors_to_transfer);
while (nskip > 0) {
/* We need to throw away a sector. */
diff -urN linux-2.5.14/drivers/ide/ide-dma.c linux/drivers/ide/ide-dma.c
--- linux-2.5.14/drivers/ide/ide-dma.c 2002-05-07 17:56:57.000000000 +0200
+++ linux/drivers/ide/ide-dma.c 2002-05-07 15:49:00.000000000 +0200
@@ -533,8 +533,20 @@
{
struct ata_channel *ch = drive->channel;
int set_high = 1;
- u8 unit = (drive->select.b.unit & 0x01);
- u64 addr = BLK_BOUNCE_HIGH;
+ u8 unit;
+ u64 addr;
+
+
+ /* Method overloaded by host chip specific code. */
+ if (ch->udma_enable) {
+ ch->udma_enable(drive, on, verbose);
+
+ return;
+ }
+
+ /* Fall back to the default implementation. */
+ unit = (drive->select.b.unit & 0x01);
+ addr = BLK_BOUNCE_HIGH;
if (!on) {
if (verbose)
diff -urN linux-2.5.14/drivers/ide/ide-pmac.c linux/drivers/ide/ide-pmac.c
--- linux-2.5.14/drivers/ide/ide-pmac.c 2002-05-06 05:38:04.000000000 +0200
+++ linux/drivers/ide/ide-pmac.c 2002-05-07 17:52:35.000000000 +0200
@@ -256,7 +256,15 @@
#define IDE_WAKEUP_DELAY_MS 2000
static void pmac_ide_setup_dma(struct device_node *np, int ix);
-static int pmac_ide_dmaproc(ide_dma_action_t func, struct ata_device *drive, struct request *rq);
+
+static void pmac_udma_enable(struct ata_device *drive, int on, int verbose);
+static int pmac_udma_start(struct ata_device *drive, struct request *rq);
+static int pmac_udma_stop(struct ata_device *drive);
+static int pmac_do_udma(unsigned int reading, struct ata_device *drive, struct request *rq);
+static int pmac_udma_read(struct ata_device *drive, struct request *rq);
+static int pmac_udma_write(struct ata_device *drive, struct request *rq);
+static int pmac_udma_irq_status(struct ata_device *drive);
+static int pmac_ide_dmaproc(struct ata_device *drive);
static int pmac_ide_build_dmatable(struct ata_device *drive, struct request *rq, int ix, int wr);
static int pmac_ide_tune_chipset(struct ata_device *drive, byte speed);
static void pmac_ide_tuneproc(struct ata_device *drive, byte pio);
@@ -323,7 +331,13 @@
ide_hwifs[ix].selectproc = pmac_ide_selectproc;
ide_hwifs[ix].speedproc = &pmac_ide_tune_chipset;
if (pmac_ide[ix].dma_regs && pmac_ide[ix].dma_table_cpu) {
- ide_hwifs[ix].udma = pmac_ide_dmaproc;
+ ide_hwifs[ix].udma_enable = pmac_udma_enable;
+ ide_hwifs[ix].udma_start = pmac_udma_start;
+ ide_hwifs[ix].udma_stop = pmac_udma_stop;
+ ide_hwifs[ix].udma_read = pmac_udma_read;
+ ide_hwifs[ix].udma_write = pmac_udma_write;
+ ide_hwifs[ix].udma_irq_status = pmac_udma_irq_status;
+ ide_hwifs[ix].XXX_udma = pmac_ide_dmaproc;
#ifdef CONFIG_BLK_DEV_IDEDMA_PMAC_AUTO
if (!noautodma)
ide_hwifs[ix].autodma = 1;
@@ -1025,7 +1039,13 @@
pmif->dma_table_cpu, pmif->dma_table_dma);
return;
}
- ide_hwifs[ix].udma = pmac_ide_dmaproc;
+ ide_hwifs[ix].udma_enable = pmac_udma_enable;
+ ide_hwifs[ix].udma_start = pmac_udma_start;
+ ide_hwifs[ix].udma_stop = pmac_udma_stop;
+ ide_hwifs[ix].udma_read = pmac_udma_read;
+ ide_hwifs[ix].udma_write = pmac_udma_write;
+ ide_hwifs[ix].udma_irq_status = pmac_udma_irq_status;
+ ide_hwifs[ix].XXX_udma = pmac_ide_dmaproc;
#ifdef CONFIG_BLK_DEV_IDEDMA_PMAC_AUTO
if (!noautodma)
ide_hwifs[ix].autodma = 1;
@@ -1336,130 +1356,178 @@
blk_queue_bounce_limit(&drive->queue, addr);
}
-static int pmac_ide_dmaproc(ide_dma_action_t func, struct ata_device *drive, struct request *rq)
+static void pmac_udma_enable(struct ata_device *drive, int on, int verbose)
+{
+ if (verbose) {
+ printk(KERN_INFO "%s: DMA disabled\n", drive->name);
+ }
+
+ drive->using_dma = 0;
+ ide_toggle_bounce(drive, 0);
+}
+
+static int pmac_udma_start(struct ata_device *drive, struct request *rq)
{
- int ix, dstat, reading, ata4;
+ int ix, ata4;
+ volatile struct dbdma_regs *dma;
+
+ /* Can we stuff a pointer to our intf structure in config_data
+ * or select_data in hwif ?
+ */
+ ix = pmac_ide_find(drive);
+ if (ix < 0)
+ return 0;
+ dma = pmac_ide[ix].dma_regs;
+ ata4 = (pmac_ide[ix].kind == controller_kl_ata4 ||
+ pmac_ide[ix].kind == controller_kl_ata4_80);
+
+ out_le32(&dma->control, (RUN << 16) | RUN);
+ /* Make sure it gets to the controller right now */
+ (void)in_le32(&dma->control);
+ return 0;
+}
+
+static int pmac_udma_stop(struct ata_device *drive)
+{
+ int ix, dstat, ata4;
+ volatile struct dbdma_regs *dma;
+
+ /* Can we stuff a pointer to our intf structure in config_data
+ * or select_data in hwif ?
+ */
+ ix = pmac_ide_find(drive);
+ if (ix < 0)
+ return 0;
+ dma = pmac_ide[ix].dma_regs;
+ ata4 = (pmac_ide[ix].kind == controller_kl_ata4 ||
+ pmac_ide[ix].kind == controller_kl_ata4_80);
+
+ drive->waiting_for_dma = 0;
+ dstat = in_le32(&dma->status);
+ out_le32(&dma->control, ((RUN|WAKE|DEAD) << 16));
+ pmac_ide_destroy_dmatable(drive->channel, ix);
+ /* verify good dma status */
+ return (dstat & (RUN|DEAD|ACTIVE)) != RUN;
+}
+
+static int pmac_do_udma(unsigned int reading, struct ata_device *drive, struct request *rq)
+{
+ int ix, ata4;
volatile struct dbdma_regs *dma;
byte unit = (drive->select.b.unit & 0x01);
-
+
/* Can we stuff a pointer to our intf structure in config_data
* or select_data in hwif ?
*/
ix = pmac_ide_find(drive);
if (ix < 0)
- return 0;
+ return 0;
dma = pmac_ide[ix].dma_regs;
ata4 = (pmac_ide[ix].kind == controller_kl_ata4 ||
pmac_ide[ix].kind == controller_kl_ata4_80);
-
- switch (func) {
- case ide_dma_off:
- printk(KERN_INFO "%s: DMA disabled\n", drive->name);
- case ide_dma_off_quietly:
- drive->using_dma = 0;
- ide_toggle_bounce(drive, 0);
- break;
- case ide_dma_on:
- case ide_dma_check:
- /* Change this to better match ide-dma.c */
- pmac_ide_check_dma(drive);
- ide_toggle_bounce(drive, drive->using_dma);
- break;
- case ide_dma_read:
- case ide_dma_write:
- reading = (func == ide_dma_read);
- if (!pmac_ide_build_dmatable(drive, rq, ix, !reading))
- return 1;
- /* Apple adds 60ns to wrDataSetup on reads */
- if (ata4 && (pmac_ide[ix].timings[unit] & TR_66_UDMA_EN)) {
- out_le32((unsigned *)(IDE_DATA_REG + IDE_TIMING_CONFIG + _IO_BASE),
- pmac_ide[ix].timings[unit] +
- ((func == ide_dma_read) ? 0x00800000UL : 0));
- (void)in_le32((unsigned *)(IDE_DATA_REG + IDE_TIMING_CONFIG + _IO_BASE));
- }
- drive->waiting_for_dma = 1;
- if (drive->type != ATA_DISK)
- return 0;
- ide_set_handler(drive, ide_dma_intr, WAIT_CMD, NULL);
- if ((rq->flags & REQ_DRIVE_ACB) &&
- (drive->addressing == 1)) {
- struct ata_taskfile *args = rq->special;
- OUT_BYTE(args->taskfile.command, IDE_COMMAND_REG);
- } else if (drive->addressing) {
- OUT_BYTE(reading ? WIN_READDMA_EXT : WIN_WRITEDMA_EXT, IDE_COMMAND_REG);
- } else {
- OUT_BYTE(reading ? WIN_READDMA : WIN_WRITEDMA, IDE_COMMAND_REG);
- }
- /* fall through */
- case ide_dma_begin:
- out_le32(&dma->control, (RUN << 16) | RUN);
- /* Make sure it gets to the controller right now */
- (void)in_le32(&dma->control);
- break;
- case ide_dma_end: /* returns 1 on error, 0 otherwise */
- drive->waiting_for_dma = 0;
- dstat = in_le32(&dma->status);
- out_le32(&dma->control, ((RUN|WAKE|DEAD) << 16));
- pmac_ide_destroy_dmatable(drive->channel, ix);
- /* verify good dma status */
- return (dstat & (RUN|DEAD|ACTIVE)) != RUN;
- case ide_dma_test_irq: /* returns 1 if dma irq issued, 0 otherwise */
- /* We have to things to deal with here:
- *
- * - The dbdma won't stop if the command was started
- * but completed with an error without transfering all
- * datas. This happens when bad blocks are met during
- * a multi-block transfer.
- *
- * - The dbdma fifo hasn't yet finished flushing to
- * to system memory when the disk interrupt occurs.
- *
- * The trick here is to increment drive->waiting_for_dma,
- * and return as if no interrupt occured. If the counter
- * reach a certain timeout value, we then return 1. If
- * we really got the interrupt, it will happen right away
- * again.
- * Apple's solution here may be more elegant. They issue
- * a DMA channel interrupt (a separate irq line) via a DBDMA
- * NOP command just before the STOP, and wait for both the
- * disk and DBDMA interrupts to have completed.
- */
-
- /* If ACTIVE is cleared, the STOP command have passed and
- * transfer is complete.
- */
- if (!(in_le32(&dma->status) & ACTIVE))
- return 1;
- if (!drive->waiting_for_dma)
- printk(KERN_WARNING "ide%d, ide_dma_test_irq \
- called while not waiting\n", ix);
- /* If dbdma didn't execute the STOP command yet, the
- * active bit is still set */
- drive->waiting_for_dma++;
- if (drive->waiting_for_dma >= DMA_WAIT_TIMEOUT) {
- printk(KERN_WARNING "ide%d, timeout waiting \
- for dbdma command stop\n", ix);
- return 1;
- }
- udelay(1);
+ if (!pmac_ide_build_dmatable(drive, rq, ix, !reading))
+ return 1;
+ /* Apple adds 60ns to wrDataSetup on reads */
+ if (ata4 && (pmac_ide[ix].timings[unit] & TR_66_UDMA_EN)) {
+ out_le32((unsigned *)(IDE_DATA_REG + IDE_TIMING_CONFIG + _IO_BASE),
+ pmac_ide[ix].timings[unit] +
+ ((reading) ? 0x00800000UL : 0));
+ (void)in_le32((unsigned *)(IDE_DATA_REG + IDE_TIMING_CONFIG + _IO_BASE));
+ }
+ drive->waiting_for_dma = 1;
+ if (drive->type != ATA_DISK)
+ return 0;
+ ide_set_handler(drive, ide_dma_intr, WAIT_CMD, NULL);
+ if ((rq->flags & REQ_DRIVE_ACB) &&
+ (drive->addressing == 1)) {
+ struct ata_taskfile *args = rq->special;
+ OUT_BYTE(args->taskfile.command, IDE_COMMAND_REG);
+ } else if (drive->addressing) {
+ OUT_BYTE(reading ? WIN_READDMA_EXT : WIN_WRITEDMA_EXT, IDE_COMMAND_REG);
+ } else {
+ OUT_BYTE(reading ? WIN_READDMA : WIN_WRITEDMA, IDE_COMMAND_REG);
+ }
+
+ return udma_start(drive, rq);
+}
+
+static int pmac_udma_read(struct ata_device *drive, struct request *rq)
+{
+ return pmac_do_udma(1, drive, rq);
+}
+
+static int pmac_udma_write(struct ata_device *drive, struct request *rq)
+{
+ return pmac_do_udma(0, drive, rq);
+}
+
+/*
+ * FIXME: This should be attached to a channel as we can see now!
+ */
+static int pmac_udma_irq_status(struct ata_device *drive)
+{
+ int ix, ata4;
+ volatile struct dbdma_regs *dma;
+
+ /* Can we stuff a pointer to our intf structure in config_data
+ * or select_data in hwif ?
+ */
+ ix = pmac_ide_find(drive);
+ if (ix < 0)
return 0;
+ dma = pmac_ide[ix].dma_regs;
+ ata4 = (pmac_ide[ix].kind == controller_kl_ata4 ||
+ pmac_ide[ix].kind == controller_kl_ata4_80);
+
+ /* We have to things to deal with here:
+ *
+ * - The dbdma won't stop if the command was started but completed with
+ * an error without transfering all datas. This happens when bad blocks
+ * are met during a multi-block transfer.
+ *
+ * - The dbdma fifo hasn't yet finished flushing to to system memory
+ * when the disk interrupt occurs.
+ *
+ * The trick here is to increment drive->waiting_for_dma, and return as
+ * if no interrupt occured. If the counter reach a certain timeout
+ * value, we then return 1. If we really got the interrupt, it will
+ * happen right away again. Apple's solution here may be more elegant.
+ * They issue a DMA channel interrupt (a separate irq line) via a DBDMA
+ * NOP command just before the STOP, and wait for both the disk and
+ * DBDMA interrupts to have completed.
+ */
- /* Let's implement tose just in case someone wants them */
- case ide_dma_bad_drive:
- case ide_dma_good_drive:
- return check_drive_lists(drive, (func == ide_dma_good_drive));
- case ide_dma_lostirq:
- case ide_dma_timeout:
- printk(KERN_WARNING "ide_pmac_dmaproc: chipset supported func only: %d\n", func);
+ /* If ACTIVE is cleared, the STOP command have passed and
+ * transfer is complete.
+ */
+ if (!(in_le32(&dma->status) & ACTIVE))
return 1;
- default:
- printk(KERN_WARNING "ide_pmac_dmaproc: unsupported func: %d\n", func);
+ if (!drive->waiting_for_dma)
+ printk(KERN_WARNING "ide%d, ide_dma_test_irq \
+ called while not waiting\n", ix);
+
+ /* If dbdma didn't execute the STOP command yet, the
+ * active bit is still set */
+ drive->waiting_for_dma++;
+ if (drive->waiting_for_dma >= DMA_WAIT_TIMEOUT) {
+ printk(KERN_WARNING "ide%d, timeout waiting \
+ for dbdma command stop\n", ix);
return 1;
}
+ udelay(1);
return 0;
}
-#endif /* CONFIG_BLK_DEV_IDEDMA_PMAC */
+
+static int pmac_ide_dmaproc(struct ata_device *drive)
+{
+ /* Change this to better match ide-dma.c */
+ pmac_ide_check_dma(drive);
+ ide_toggle_bounce(drive, drive->using_dma);
+
+ return 0;
+}
+#endif
static void idepmac_sleep_device(ide_drive_t *drive, int i, unsigned base)
{
diff -urN linux-2.5.14/drivers/ide/ide-taskfile.c linux/drivers/ide/ide-taskfile.c
--- linux-2.5.14/drivers/ide/ide-taskfile.c 2002-05-07 17:57:01.000000000 +0200
+++ linux/drivers/ide/ide-taskfile.c 2002-05-07 15:41:02.000000000 +0200
@@ -39,8 +39,6 @@
#define DTF(x...)
#endif
-#define SUPPORT_VLB_SYNC 1
-
/*
* for now, taskfile requests are special :/
*/
diff -urN linux-2.5.14/include/asm-m68k/ide.h linux/include/asm-m68k/ide.h
--- linux-2.5.14/include/asm-m68k/ide.h 2002-05-07 17:56:58.000000000 +0200
+++ linux/include/asm-m68k/ide.h 2002-05-07 15:41:02.000000000 +0200
@@ -121,6 +121,7 @@
#define inb(p) in_8(ADDR_TRANS_B(p))
#define inb_p(p) in_8(ADDR_TRANS_B(p))
#define inw(p) in_be16(ADDR_TRANS_W(p))
+#define inw_p(p) in_be16(ADDR_TRANS_W(p))
#define outb(v,p) out_8(ADDR_TRANS_B(p),v)
#define outb_p(v,p) out_8(ADDR_TRANS_B(p),v)
#define outw(v,p) out_be16(ADDR_TRANS_W(p),v)
diff -urN linux-2.5.14/include/linux/ide.h linux/include/linux/ide.h
--- linux-2.5.14/include/linux/ide.h 2002-05-07 17:57:04.000000000 +0200
+++ linux/include/linux/ide.h 2002-05-07 15:48:57.000000000 +0200
@@ -459,6 +459,8 @@
int (*XXX_udma)(struct ata_device *);
+ void (*udma_enable)(struct ata_device *, int, int);
+
int (*udma_start) (struct ata_device *, struct request *rq);
int (*udma_stop) (struct ata_device *);
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-07 15:03 ` [PATCH] IDE 58 Martin Dalecki
@ 2002-05-08 6:42 ` Paul Mackerras
2002-05-08 8:53 ` Martin Dalecki
0 siblings, 1 reply; 265+ messages in thread
From: Paul Mackerras @ 2002-05-08 6:42 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
Martin Dalecki writes:
> - Virtualize the udma_enable method as well to help ARM and PPC people. Please
> please if you would like to have some other methods virtualized in a similar
> way - just tell me or even better do it yourself at the end of ide-dma.c.
> I *don't mind* patches.
>
> - Fix the pmac code to adhere to the new API. It's supposed to work again.
> However this is blind coding... I give myself 80% chances for it to work ;-).
OK, now I am truly impressed. Not only does it compile cleanly, it
works first go!
I am using the tiny patch below, it sets the unmask flag so interrupts
will be unmasked by default (which is safe on powermacs).
Thanks,
Paul.
diff -urN linux-2.5/drivers/ide/ide-pmac.c pmac-2.5/drivers/ide/ide-pmac.c
--- linux-2.5/drivers/ide/ide-pmac.c Wed May 8 16:40:17 2002
+++ pmac-2.5/drivers/ide/ide-pmac.c Wed May 8 08:26:48 2002
@@ -343,6 +343,7 @@
ide_hwifs[ix].autodma = 1;
#endif
}
+ ide_hwifs[ix].unmask = 1;
}
#if 0
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 6:42 ` Paul Mackerras
@ 2002-05-08 8:53 ` Martin Dalecki
2002-05-08 10:37 ` Bjorn Wesen
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 8:53 UTC (permalink / raw)
To: Paul Mackerras; +Cc: Linus Torvalds, Kernel Mailing List, bjornw
Paul Mackerras wrote:
> Martin Dalecki writes:
>
>
>>- Virtualize the udma_enable method as well to help ARM and PPC people. Please
>> please if you would like to have some other methods virtualized in a similar
>> way - just tell me or even better do it yourself at the end of ide-dma.c.
>> I *don't mind* patches.
>>
>>- Fix the pmac code to adhere to the new API. It's supposed to work again.
>> However this is blind coding... I give myself 80% chances for it to work ;-).
>
>
> OK, now I am truly impressed. Not only does it compile cleanly, it
> works first go!
Thank you.
BTW: I would really love it if the CRIS architecture people could
"lend me" some small development system for their interesting CPU.
In return I could give them what's certainly worth "several weeks of
developer time". (If you hear me: this is a hint, in case you need an
argument for your management.)
Unfortunately, this is about the weirdest ATA interface
around, because the interface cell is mapped directly to
some CPU registers. As a CPU design I think it's a fine approach. Don't
get me wrong. You save yourself all the silicon that would be needed
for BM access arbitration, general handling and so on... Very nicely thought
out. But on the software side this is a bit weird, since you can't use
the generic I/O primitives of the arch in question.
This makes my cleanup of the portability layer a bit hard
to finish on the software side.
> I am using the tiny patch below, it sets the unmask flag so interrupts
> will be unmasked by default (which is safe on powermacs).
And on every other fscking PCI based system... (modulo the "problematic"
cmd640 and RZ1000). Should have been done a long time ago this way... I will
adjust the others as well.
> Thanks,
> Paul.
>
> diff -urN linux-2.5/drivers/ide/ide-pmac.c pmac-2.5/drivers/ide/ide-pmac.c
> --- linux-2.5/drivers/ide/ide-pmac.c Wed May 8 16:40:17 2002
> +++ pmac-2.5/drivers/ide/ide-pmac.c Wed May 8 08:26:48 2002
> @@ -343,6 +343,7 @@
> ide_hwifs[ix].autodma = 1;
> #endif
> }
> + ide_hwifs[ix].unmask = 1;
> }
>
> #if 0
>
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 8:53 ` Martin Dalecki
@ 2002-05-08 10:37 ` Bjorn Wesen
2002-05-08 10:16 ` Martin Dalecki
2002-05-08 11:00 ` Benjamin Herrenschmidt
0 siblings, 2 replies; 265+ messages in thread
From: Bjorn Wesen @ 2002-05-08 10:37 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Paul Mackerras, Linus Torvalds, Kernel Mailing List
On Wed, 8 May 2002, Martin Dalecki wrote:
> BTW> I would really love it if the cris architecture people could
> "lend me" some small developement system for they interresting CPU.
We'll consider it :) However,
> This unfortunately is the somehow most wired ATA interface
> around. Which is due to the fact that the interface cell is directly mapped to
> some CPU registers. As a CPU design I think it's a fine approach. Don't
> take me wrong. You save yourself the whole silicon which is needed
> for BM access arbitration and general handling and so on... Very nice tought
> out. But on the software side this is a bit wired, since you can't use
> the generic I/O primitives of the arch in question.
I don't see why all IDE-interfaces in the world have to be I/O-mapped just
because the first PC implementations used that. Sure it was an extended
ISA-bus but the ISA bus is long gone and we don't all run PC's anymore
either.
So the simple abstraction we need to hit IDE-bus registers is a macro or
inline, instead of a call of an I/O-primitive. It was too much work to
abstract this when I inserted the CRIS-arch IDE-driver in the first place
so I found a workaround but now seems like a better time..
Similarly, there is no reason at all why the CPU has to do _polling_ just
because the IDE _bus_ is using a PIO-mode. It probably does that on legacy
PC's but HW designed, hrm, more optimally can use DMA. Hence the hooks for
the ide_func_t.
So I'd figure the software side really would be _easier_ to implement with
those assumptions about how an IDE-interface is supposed to work gone.
> This makes my cleanup of the portability layer a bit hard
> to finish on the software side.
I understand that, so let's keep the discussion going and I'll check over
your current cleanup.
/Bjorn
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 10:37 ` Bjorn Wesen
@ 2002-05-08 10:16 ` Martin Dalecki
2002-05-08 19:06 ` Linus Torvalds
2002-05-08 11:00 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-08 10:16 UTC (permalink / raw)
To: Bjorn Wesen; +Cc: Paul Mackerras, Linus Torvalds, Kernel Mailing List
Bjorn Wesen wrote:
> On Wed, 8 May 2002, Martin Dalecki wrote:
>
>>BTW> I would really love it if the cris architecture people could
>>"lend me" some small developement system for they interresting CPU.
>
>
> We'll consider it :) However,
>
>
>>This unfortunately is the somehow most wired ATA interface
>>around. Which is due to the fact that the interface cell is directly mapped to
>>some CPU registers. As a CPU design I think it's a fine approach. Don't
>>take me wrong. You save yourself the whole silicon which is needed
>>for BM access arbitration and general handling and so on... Very nice tought
>>out. But on the software side this is a bit wired, since you can't use
>>the generic I/O primitives of the arch in question.
>
>
> I don't see why all IDE-interfaces in the world have to be I/O-mapped just
> because the first PC implementations used that. Sure it was an extended
> ISA-bus but the ISA bus is long gone and we don't all run PC's anymore
> either.
Hey, I agree, and I appreciate the design decisions of the CRIS CPU
as good and surprisingly refreshing. Take for example the whole
concept of the compacted command set and so on. They are just *cute*...
About a year ago I studied the publicly available documentation
on it.
> So the simple abstraction we need to hit IDE-bus registers is a macro or
> inline, instead of a call of an I/O-primitive. It was too much work to
> abstract this when I inserted the CRIS-arch IDE-driver in the first place
> so I found a workaround but now seems like a better time..
I don't think that it's always the proper approach for hardware
portability to do it on the "micro operation" level. That's good
for generics like inb and outb. In the case of the ATA interface it's
better to do it on the "functional" level above... Just like you did
with ata_read() and ata_write(), as they are called now. You can
see I picked it up, and when I sort out the transport method detection/setting
I will apply it to the other friends from the ata_read_xxx family as well.
And then we have the same approach in the udma_ family I just
introduced.
> Similarily, there is no reason at all why the CPU has to do _polling_ just
> because the IDE _bus_ is using a PIO-mode. It probably does that on legacy
> PC's but HW designed, hrm, more optimally can use DMA. Hence the hooks for
> the ide_func_t.
Well, right now I think that if you look at the IDE 58 patch you will see
that ide_func_t is a 'bit ugly', simply because it introduces
just another entity to the game. We don't need it.
struct ata_channel *is* the central entity for operations from
the host view. In my whole experience as a programmer it has always turned
out to be most sane to make the software design a homological mapping
of the generalized hardware design at this level of coding.
It's just natural: functions are there to serve a specific purpose.
> So I'd figure the software side really would be _easier_ to implement with
> those assumptions about how an IDE-interface is supposed to work gone.
>
>
>>This makes my cleanup of the portability layer a bit hard
>>to finish on the software side.
>
>
> I understand that, so lets keep the discussion going and I'll check over
> your current cleanup.
Well, please consider: if I had access to the hardware, it would possibly save
you a lot of reading through bad English ;-).
> /Bjorn
Regards.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 10:16 ` Martin Dalecki
@ 2002-05-08 19:06 ` Linus Torvalds
2002-05-08 19:10 ` Benjamin Herrenschmidt
0 siblings, 1 reply; 265+ messages in thread
From: Linus Torvalds @ 2002-05-08 19:06 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Bjorn Wesen, Paul Mackerras, Kernel Mailing List
On Wed, 8 May 2002, Martin Dalecki wrote:
>
> I don't think that it's always the proper aproach for hardware
> portability to do it on the "micro operation" level. That's good
> for generics like inb outb. In the case of the ATA interface it's
> better to do it on the "functional" level above...
Amen.
Hallelujah.
Listen to the man.
Please don't play games with "ide_outb()" etc, which cause 99% of the
architectures to have to make the 1:1 translation to just "outb()", and
which also makes it incredibly cumbersome to handle multiple _different_
controllers that just happen to use different schemes.
Instead, making the virtualization at a higher point means that you can
have _one_ set of common operations for traditional PCI/ATA controllers
(and that one set uses inx/outx/readx/writex), and then you have a few
others for the "strange" cases.
And done properly with per-controller (or drive - you may want to
virtualize at the drive level just because you could separate out
different kinds of drive accesses that way too) function pointers you can
then _mix_ access methods, without getting completely idiotic run-time
checks inside "ide_out()".
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 19:06 ` Linus Torvalds
@ 2002-05-08 19:10 ` Benjamin Herrenschmidt
2002-05-08 20:31 ` Alan Cox
` (2 more replies)
0 siblings, 3 replies; 265+ messages in thread
From: Benjamin Herrenschmidt @ 2002-05-08 19:10 UTC (permalink / raw)
To: Linus Torvalds, Martin Dalecki, Andre Hedrick
Cc: Bjorn Wesen, Paul Mackerras, Kernel Mailing List
>And done properly with per-controller (or drive - you may want to
>virtualize at the drive level just because you could separate out
>different kinds of drive accesses that way too) function pointers you can
>then _mix_ access methods, without getting completely idiotic run-time
>checks inside "ide_out()".
Which basically ends up as a set of 4 access functions, held as
function pointers in the ata_channel (or ata_drive, but I doubt
that would really be necessary): taskfile_in/out for access to
the taskfile registers (8 bits), and data_in/out for streaming
data in/out of the data reg (16 bits).
That would cleanly solve my problem of mixing MMIO and PIO
controllers in the same machine, that would solve the crazy
byteswapping needed by some controllers for PIO at least,
etc...
I would even suggest not caring about the taskfile register
address at all (that is kill the array of port addresses) but
just pass the taskfile_in/out functions the register number
(cyl_hi, cyl_lo, select, ....) as a nice symbolic constant,
and let the channel specific implementation figure it out.
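A minimal user-space sketch of that four-function idea follows. Every name in it (struct channel, struct channel_ops, enum tf_reg, select_drive and so on) is hypothetical and made up for illustration, not taken from the 2.5 IDE code, and the "hardware" is just simulated memory. It assumes one port-style channel with adjacent registers and one MMIO-style channel with registers 16 bytes apart and a byte-swapped data path.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is not the 2.5 IDE API. */
enum tf_reg {
	TF_ERROR = 1, TF_NSECT, TF_SECTOR, TF_CYL_LO,
	TF_CYL_HI, TF_SELECT, TF_STATUS, TF_NR_REGS
};

#define FIFO_WORDS 64

struct channel;

struct channel_ops {
	uint8_t (*taskfile_in)(struct channel *ch, enum tf_reg reg);
	void (*taskfile_out)(struct channel *ch, enum tf_reg reg, uint8_t val);
	void (*data_in)(struct channel *ch, uint16_t *buf, size_t words);
	void (*data_out)(struct channel *ch, const uint16_t *buf, size_t words);
};

struct channel {
	const struct channel_ops *ops;
	uint8_t regs[TF_NR_REGS * 16];	/* simulated register window */
	uint16_t fifo[FIFO_WORDS];	/* simulated 16-bit data register */
	size_t rd, wr;
};

/* Port-style channel: adjacent registers, data in native byte order. */
static uint8_t port_tf_in(struct channel *ch, enum tf_reg reg)
{
	return ch->regs[reg];
}

static void port_tf_out(struct channel *ch, enum tf_reg reg, uint8_t val)
{
	ch->regs[reg] = val;
}

static void port_data_in(struct channel *ch, uint16_t *buf, size_t words)
{
	while (words--)
		*buf++ = ch->fifo[ch->rd++ % FIFO_WORDS];
}

static void port_data_out(struct channel *ch, const uint16_t *buf, size_t words)
{
	while (words--)
		ch->fifo[ch->wr++ % FIFO_WORDS] = *buf++;
}

/* MMIO-style channel: registers 16 bytes apart, data byte-swapped. */
static uint16_t swab16(uint16_t v)
{
	return (uint16_t)((v << 8) | (v >> 8));
}

static uint8_t mmio_tf_in(struct channel *ch, enum tf_reg reg)
{
	return ch->regs[reg * 16];
}

static void mmio_tf_out(struct channel *ch, enum tf_reg reg, uint8_t val)
{
	ch->regs[reg * 16] = val;
}

static void mmio_data_in(struct channel *ch, uint16_t *buf, size_t words)
{
	while (words--)
		*buf++ = swab16(ch->fifo[ch->rd++ % FIFO_WORDS]);
}

static void mmio_data_out(struct channel *ch, const uint16_t *buf, size_t words)
{
	while (words--)
		ch->fifo[ch->wr++ % FIFO_WORDS] = swab16(*buf++);
}

const struct channel_ops port_ops = {
	port_tf_in, port_tf_out, port_data_in, port_data_out
};
const struct channel_ops mmio_ops = {
	mmio_tf_in, mmio_tf_out, mmio_data_in, mmio_data_out
};

/* Common code names the register and never sees its address. */
void select_drive(struct channel *ch, int unit)
{
	ch->ops->taskfile_out(ch, TF_SELECT, (uint8_t)(0xa0 | (unit << 4)));
}
```

Both kinds of channel can then sit in the same machine, and select_drive() works on either without a run-time "is this MMIO?" check.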
I haven't checked whether you have already killed all of the request/release
region crap done by the common ide code; that matter is completely
internal to the host controller driver, etc...
Now, Andre may tell us we need one more set for "slow IO"
versions for some HW. I don't know the details for these, so
I'll let the old man speak up here.
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 19:10 ` Benjamin Herrenschmidt
@ 2002-05-08 20:31 ` Alan Cox
2002-05-08 19:49 ` Benjamin Herrenschmidt
2002-05-08 20:29 ` Andre Hedrick
2002-05-09 15:19 ` Eric W. Biederman
2002-05-09 20:20 ` Ian Molton
2 siblings, 2 replies; 265+ messages in thread
From: Alan Cox @ 2002-05-08 20:31 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Linus Torvalds, Martin Dalecki, Andre Hedrick, Bjorn Wesen,
Paul Mackerras, Kernel Mailing List
> ata_channel (or ata_drive, but I doubt that would be really
> necessary) a set of 4 access functions: taskfile_in/out for
> access to taskfile registers (8 bits), and data_in/out for
> steaming datas in/out of the data reg (16 bits).
Please push it to a higher level than that. Load the taskfile as a set in
each method. Remember it's 1 potentially paired instruction to do an MMIO
write; it's a whole mess of synchronization and stalls to do a function
pointer call.
> address at all (that is kill the array of port addresses) but
> just pass the taskfile_in/out functions the register number
> (cyl_hi, cyl_lo, select, ....) as a nice symbolic constant,
> and let the channel specific implementation figure it out.
Pass dev->taskfile_load() a struct at least for the common paths. Make the
PIO block transfers single callbacks too, one per block rather than per word.
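As a rough illustration of the struct-per-call approach, here is a user-space sketch. The names (struct demo_taskfile, struct demo_chan, demo_start_read, the sim_* helpers) are hypothetical, the registers are simulated with a plain array, and the register offsets and the 28-bit LBA READ SECTOR(S) setup are the usual ATA layout as assumed here, not code from the patch.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_SECTOR_WORDS 256	/* one 512-byte sector */

/* Hypothetical names; a sketch, not the kernel's taskfile structures. */
struct demo_taskfile {
	uint8_t feature, nsect, sector, cyl_lo, cyl_hi, select, command;
};

struct demo_chan {
	/* One indirect call loads the whole register set... */
	void (*taskfile_load)(struct demo_chan *ch,
			      const struct demo_taskfile *tf);
	/* ...and one indirect call moves a whole block, not a word. */
	void (*pio_read_block)(struct demo_chan *ch, uint16_t *buf);
	uint8_t regs[8];	/* simulated taskfile registers */
	uint16_t next_word;	/* simulated data register contents */
};

void sim_taskfile_load(struct demo_chan *ch, const struct demo_taskfile *tf)
{
	ch->regs[1] = tf->feature;
	ch->regs[2] = tf->nsect;
	ch->regs[3] = tf->sector;
	ch->regs[4] = tf->cyl_lo;
	ch->regs[5] = tf->cyl_hi;
	ch->regs[6] = tf->select;
	ch->regs[7] = tf->command;	/* written last: it starts the command */
}

void sim_pio_read_block(struct demo_chan *ch, uint16_t *buf)
{
	size_t i;

	/* The per-word loop lives inside the callback, with no indirection. */
	for (i = 0; i < DEMO_SECTOR_WORDS; i++)
		buf[i] = ch->next_word++;
}

/* Common code fills in the struct and makes a single indirect call. */
void demo_start_read(struct demo_chan *ch, uint32_t lba, uint8_t nsect)
{
	struct demo_taskfile tf = {
		.nsect   = nsect,
		.sector  = (uint8_t)(lba & 0xff),
		.cyl_lo  = (uint8_t)((lba >> 8) & 0xff),
		.cyl_hi  = (uint8_t)((lba >> 16) & 0xff),
		.select  = (uint8_t)(0xe0 | ((lba >> 24) & 0x0f)),
		.command = 0x20,	/* READ SECTOR(S), 28-bit LBA */
	};

	ch->taskfile_load(ch, &tf);
}
```

Compared with one function pointer call per register or per data word, this costs two indirect calls per sector read: one to start the command and one to move the block.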
Alan
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 20:31 ` Alan Cox
@ 2002-05-08 19:49 ` Benjamin Herrenschmidt
2002-05-08 20:44 ` Alan Cox
2002-05-08 20:29 ` Andre Hedrick
1 sibling, 1 reply; 265+ messages in thread
From: Benjamin Herrenschmidt @ 2002-05-08 19:49 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, Martin Dalecki, Andre Hedrick, Bjorn Wesen,
Paul Mackerras, Kernel Mailing List
>
>Please push it higher level than that. Load the taskfile as a set in
>each method. Remember its 1 potentially paired instruction to do an MMIO
>write, its a whole mess of synchronziation and stalls to do a function
>pointer.
I thought about that, but what about corner cases where only a single
register is accessed? (typically alt status). Provide specific
routines? Also, how does the extended addressing work? By writing
several times to the cyl registers? That would then have to be dealt with
in each host driver as well.
>> address at all (that is kill the array of port addresses) but
>> just pass the taskfile_in/out functions the register number
>> (cyl_hi, cyl_lo, select, ....) as a nice symbolic constant,
>> and let the channel specific implementation figure it out.
>
>Pass dev->taskfile_load() a struct at least for the common paths. Make the
>PIO block transfers also single callbacks for each block not word.
Right. We could go the Darwin (Apple) way and have taskfile_load/store
functions handling the entire register set, controlled by a bitmask of which
registers have to be touched. It has a cost (testing each bit and
conditionally branching, which can suck hard) but probably less than
an indirect function call, which isn't predictable.
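The bitmask variant could look roughly like the sketch below. It is not Darwin's actual interface: the TFM_* bit names, struct masked_taskfile and masked_taskfile_load() are hypothetical, and the register window is simulated with an ordinary array.

```c
#include <stdint.h>

/* Hypothetical names; one bit per taskfile register, in register order. */
enum {
	TFM_FEATURE = 1u << 0,
	TFM_NSECT   = 1u << 1,
	TFM_SECTOR  = 1u << 2,
	TFM_CYL_LO  = 1u << 3,
	TFM_CYL_HI  = 1u << 4,
	TFM_SELECT  = 1u << 5,
	TFM_COMMAND = 1u << 6,
	TFM_NR_BITS = 7
};

struct masked_taskfile {
	unsigned int mask;		/* which registers to touch */
	uint8_t val[TFM_NR_BITS];	/* value for each register */
};

/*
 * Store only the registers selected by tf->mask. regs[] stands in for
 * the real register window; the register for bit i lives at offset i + 1.
 */
void masked_taskfile_load(volatile uint8_t *regs,
			  const struct masked_taskfile *tf)
{
	unsigned int i;

	for (i = 0; i < TFM_NR_BITS; i++)
		if (tf->mask & (1u << i))
			regs[i + 1] = tf->val[i];
}
```

A single-register write is then just a call with one bit set in the mask, so the same entry point covers both the fast path and the odd cases, at the price of the per-bit test mentioned above.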
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 19:49 ` Benjamin Herrenschmidt
@ 2002-05-08 20:44 ` Alan Cox
2002-05-08 20:04 ` Benjamin Herrenschmidt
2002-05-08 20:36 ` Andre Hedrick
0 siblings, 2 replies; 265+ messages in thread
From: Alan Cox @ 2002-05-08 20:44 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Alan Cox, Linus Torvalds, Martin Dalecki, Andre Hedrick,
Bjorn Wesen, Paul Mackerras, Kernel Mailing List
> I though about that, but what about corner cases where only a single
> register can be accessed ? (typically alt status). Provide specific
> routines ? Also, how does the extended addressing works ? by writing
> several times to the cyl registers ? That would have to be dealt with
> as well in each host driver then.
There are lots of cases we don't care about speed - things like setup of
the controller, changing UDMA mode etc.
> Right. We could go the darwin (apple) way and have taskfile_load/store
> functions doing the entire registers controlled by a bitmask of which
> registers has to be touched. it has a cost (testing each bit and
> conditionally branching, which can suck hard) but probably less than
Get yourself a conditional move instruction 8)
> an indirect function call which isn't predictable.
Or you have a small set of such functions for the critical paths - i.e. doing
actual block I/O - which are passed the set of values required to do that
operation and do the stores. What are the performance critical paths?
Begin a disk write
Begin a disk read
PIO transfer in
PIO transfer out
End a disk I/O fastpaths (no error case)
Maybe ATAPI command writes ?
beyond that I doubt the rest are critical
Alan
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 20:44 ` Alan Cox
@ 2002-05-08 20:04 ` Benjamin Herrenschmidt
2002-05-09 20:20 ` Ian Molton
2002-05-08 20:36 ` Andre Hedrick
1 sibling, 1 reply; 265+ messages in thread
From: Benjamin Herrenschmidt @ 2002-05-08 20:04 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, Martin Dalecki, Andre Hedrick, Bjorn Wesen,
Paul Mackerras, Kernel Mailing List
>> I though about that, but what about corner cases where only a single
>> register can be accessed ? (typically alt status). Provide specific
>> routines ? Also, how does the extended addressing works ? by writing
>> several times to the cyl registers ? That would have to be dealt with
>> as well in each host driver then.
>
>There are lots of cases we don't care about speed - things like setup of
>the controller, changing UDMA mode etc.
Right, so we keep the basic indirect access functions, and add the taskfile
ones on top for performance. Is that what you mean?
>> Right. We could go the darwin (apple) way and have taskfile_load/store
>> functions doing the entire registers controlled by a bitmask of which
>> registers has to be touched. it has a cost (testing each bit and
>> conditionally branching, which can suck hard) but probably less than
>
>Get yourself a conditional move instruction 8)
Hehe, let's make an ARM/PPC hybrid ;)
>> an indirect function call which isn't predictable.
>
>Or you have a small set of such functions for the critical paths - ie doing
>actual block I/O which pass the set of values required to do that operation
>and do the stores. What are the performance critical paths
>
> Begin a disk write
> Begin a disk read
> PIO transfer in
> PIO transfer out
> End a disk I/O fastpaths (no error case)
>
> Maybe ATAPI command writes ?
>
>beyond that I doubt the rest are critical
Well, I would normally agree with the above... except that IDE is so full of
corner cases that I don't want to see dealt with in each host controller
driver...
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 20:44 ` Alan Cox
2002-05-08 20:04 ` Benjamin Herrenschmidt
@ 2002-05-08 20:36 ` Andre Hedrick
1 sibling, 0 replies; 265+ messages in thread
From: Andre Hedrick @ 2002-05-08 20:36 UTC (permalink / raw)
To: Alan Cox
Cc: Benjamin Herrenschmidt, Linus Torvalds, Martin Dalecki,
Bjorn Wesen, Paul Mackerras, Kernel Mailing List
On Wed, 8 May 2002, Alan Cox wrote:
> > I though about that, but what about corner cases where only a single
> > register can be accessed ? (typically alt status). Provide specific
> > routines ? Also, how does the extended addressing works ? by writing
> > several times to the cyl registers ? That would have to be dealt with
> > as well in each host driver then.
>
> There are lots of cases we don't care about speed - things like setup of
> the controller, changing UDMA mode etc.
Erm, think about the state diagram for testing the acceptance and command
migration if TAG is going to be standard.
> Begin a disk write
> Begin a disk read
test command accept/reject
repeat 'til done:
check_status
> PIO transfer in
> PIO transfer out
io hardware atomic segment or burst
(for Linus, "atomic" is what is needed to complete, not just a "linusism")
goto repeat if !done with entire request
The above is where data integrity is lost if error happens.
> End a disk I/O fastpaths (no error case)
>
> Maybe ATAPI command writes ?
>
> beyond that I doubt the rest are critical
There are several other cases, but 95% is the command block execution.
The rest is sense data.
Cheers,
Andre Hedrick
LAD Storage Consulting Group
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 20:31 ` Alan Cox
2002-05-08 19:49 ` Benjamin Herrenschmidt
@ 2002-05-08 20:29 ` Andre Hedrick
2002-05-08 20:06 ` Benjamin Herrenschmidt
2002-05-09 12:14 ` Martin Dalecki
1 sibling, 2 replies; 265+ messages in thread
From: Andre Hedrick @ 2002-05-08 20:29 UTC (permalink / raw)
To: Alan Cox
Cc: Benjamin Herrenschmidt, Linus Torvalds, Martin Dalecki,
Bjorn Wesen, Paul Mackerras, Kernel Mailing List
Alan, we talked about this and the driver/hardware has a flaw.
If you count the total number of single IO operations to check
status/error et al., it is outright fugly. Preprocessing will kill us
like today.
On Wed, 8 May 2002, Alan Cox wrote:
> > ata_channel (or ata_drive, but I doubt that would be really
> > necessary) a set of 4 access functions: taskfile_in/out for
> > access to taskfile registers (8 bits), and data_in/out for
> > steaming datas in/out of the data reg (16 bits).
>
> Please push it higher level than that. Load the taskfile as a set in
> each method. Remember its 1 potentially paired instruction to do an MMIO
> write, its a whole mess of synchronziation and stalls to do a function
> pointer.
>
> > address at all (that is kill the array of port addresses) but
> > just pass the taskfile_in/out functions the register number
> > (cyl_hi, cyl_lo, select, ....) as a nice symbolic constant,
> > and let the channel specific implementation figure it out.
>
> Pass dev->taskfile_load() a struct at least for the common paths. Make the
> PIO block transfers also single callbacks for each block not word.
>
> Alan
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
Andre Hedrick
LAD Storage Consulting Group
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 20:29 ` Andre Hedrick
@ 2002-05-08 20:06 ` Benjamin Herrenschmidt
2002-05-09 12:14 ` Martin Dalecki
1 sibling, 0 replies; 265+ messages in thread
From: Benjamin Herrenschmidt @ 2002-05-08 20:06 UTC (permalink / raw)
To: Andre Hedrick, Alan Cox
Cc: Linus Torvalds, Martin Dalecki, Bjorn Wesen, Paul Mackerras,
Kernel Mailing List
>Alan, we talked about this and the driver/hardware has a flaw.
>If you count the total number of single IO operations to check
>status/error et al., it is out right fugly. Preprocessing will kill us
>like today.
So we can still end up having both
- The single ops indirected like I first proposed (with maybe the
addition of slow versions but that could be a parameter, and 32 bits
versions)
- A couple of "apply this taskfile" versions with well known semantics
used only in normal cases for perfs.
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 20:29 ` Andre Hedrick
2002-05-08 20:06 ` Benjamin Herrenschmidt
@ 2002-05-09 12:14 ` Martin Dalecki
1 sibling, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-09 12:14 UTC (permalink / raw)
To: Andre Hedrick
Cc: Alan Cox, Benjamin Herrenschmidt, Linus Torvalds, Bjorn Wesen,
Paul Mackerras, Kernel Mailing List
Andre Hedrick wrote:
> Alan, we talked about this and the driver/hardware has a flaw.
> If you count the total number of single IO operations to check
> status/error et al., it is out right fugly. Preprocessing will kill us
> like today.
You mean the preprocessing in the device's firmware, of course?
Just to confirm I got it right...
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 19:10 ` Benjamin Herrenschmidt
2002-05-08 20:31 ` Alan Cox
@ 2002-05-09 15:19 ` Eric W. Biederman
2002-05-09 20:20 ` Ian Molton
2 siblings, 0 replies; 265+ messages in thread
From: Eric W. Biederman @ 2002-05-09 15:19 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: Linus Torvalds, Martin Dalecki, Andre Hedrick, Bjorn Wesen,
Paul Mackerras, Kernel Mailing List
Benjamin Herrenschmidt <benh@kernel.crashing.org> writes:
> >And done properly with per-controller (or drive - you may want to
> >virtualize at the drive level just because you could separate out
> >different kinds of drive accesses that way too) function pointers you can
> >then _mix_ access methods, without getting completely idiotic run-time
> >checks inside "ide_out()".
>
> Which ends up basically into having function pointers in the
> ata_channel (or ata_drive, but I doubt that would be really
> necessary) a set of 4 access functions: taskfile_in/out for
> access to taskfile registers (8 bits), and data_in/out for
> steaming datas in/out of the data reg (16 bits).
>
> That would cleanly solve my problem of mixing MMIO and PIO
> controllers in the same machine, that would solve the crazy
> byteswapping needed by some controllers for PIO at least,
> etc...
>
> I would even suggest not caring about the taskfile register
> address at all (that is kill the array of port addresses) but
> just pass the taskfile_in/out functions the register number
> (cyl_hi, cyl_lo, select, ....) as a nice symbolic constant,
> and let the channel specific implementation figure it out.
> I haven't checked if you already killed all of the request/release
> region crap done by the common ide code, that is matter is completely
> internal to the host controller driver, etc...
>
> Now, andre may tell us we need one more set for "slow IO"
> versions for some HW, I don't know the details for these so
> I'll let the old man speak up here.
I'd suggest pointers in the ata_channel that abstract out the
functions of the host controllers. For most controllers we
can have a common PCI IDE library that implements them, and provides
a reference implementation for the weird cases.
From the ata-6 draft there are the following protocols, that should
be implementable on an IDE host controller.
- Software reset protocol
- Non-data command protocol
- PIO data-in command protocol
- PIO data-out command protocol
- DMA command protocol
- PACKET command protocol
- READ/WRITE DMA QUEUED command protocol
- EXECUTE DEVICE DIAGNOSTIC command protocol
- DEVICE RESET command protocol
- Ultra DMA data-in commands
- Ultra DMA data-out commands
Given the high level of the protocol abstraction we aren't
likely to beat ourselves to death with extra cpu or io overhead.
Nor is this an insane number of things to implement.
Perhaps more can be factored out (controllers being so similar) but
that is the abstraction we need for the layer sending commands to ATA
devices. This allows the higher layers to focus on sending commands
to ATA devices.
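One way to picture this protocol-level abstraction is the sketch below. It covers only a subset of the protocols listed above, and every name (enum demo_protocol, struct demo_host, demo_issue, the lib_* and odd_* handlers) is hypothetical. The handlers just return a tag so it is visible which one ran; real ones would drive the controller.

```c
#include <stddef.h>

/* A subset of the protocols listed above; names are hypothetical. */
enum demo_protocol {
	DEMO_PROTO_NON_DATA,
	DEMO_PROTO_PIO_IN,
	DEMO_PROTO_PIO_OUT,
	DEMO_PROTO_DMA,
	DEMO_PROTO_NR
};

struct demo_cmd {
	enum demo_protocol proto;
	unsigned char opcode;
};

typedef int (*demo_proto_fn)(const struct demo_cmd *cmd);

struct demo_host {
	const char *name;
	demo_proto_fn run[DEMO_PROTO_NR];	/* one handler per protocol */
};

/* Shared "library" handlers that most PCI controllers would reuse. */
static int lib_non_data(const struct demo_cmd *cmd) { (void)cmd; return 1; }
static int lib_pio_in(const struct demo_cmd *cmd)   { (void)cmd; return 2; }
static int lib_pio_out(const struct demo_cmd *cmd)  { (void)cmd; return 3; }
static int lib_dma(const struct demo_cmd *cmd)      { (void)cmd; return 4; }

/* A "weird" controller replaces only the protocol it does differently. */
static int odd_dma(const struct demo_cmd *cmd)      { (void)cmd; return 40; }

const struct demo_host generic_host = {
	"generic", { lib_non_data, lib_pio_in, lib_pio_out, lib_dma }
};
const struct demo_host odd_host = {
	"odd", { lib_non_data, lib_pio_in, lib_pio_out, odd_dma }
};

/* The upper layer picks the protocol; the host decides how to run it. */
int demo_issue(const struct demo_host *host, const struct demo_cmd *cmd)
{
	if ((unsigned int)cmd->proto >= DEMO_PROTO_NR || !host->run[cmd->proto])
		return -1;
	return host->run[cmd->proto](cmd);
}
```

With this shape there is one indirect call per command rather than per register access, which is why the overhead concern raised earlier in the thread largely goes away at this level.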
Eric
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 19:10 ` Benjamin Herrenschmidt
2002-05-08 20:31 ` Alan Cox
2002-05-09 15:19 ` Eric W. Biederman
@ 2002-05-09 20:20 ` Ian Molton
2 siblings, 0 replies; 265+ messages in thread
From: Ian Molton @ 2002-05-09 20:20 UTC (permalink / raw)
To: Benjamin Herrenschmidt
Cc: torvalds, dalecki, andre, bjorn.wesen, paulus, linux-kernel
On Wed, 8 May 2002 21:10:54 +0200
Benjamin Herrenschmidt <benh@kernel.crashing.org> wrote:
>
> That would cleanly solve my problem of mixing MMIO and PIO
> controllers in the same machine, that would solve the crazy
> byteswapping needed by some controllers for PIO at least,
> etc...
It's good here on the old and slightly odd Acorn Podule bus interface...
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] IDE 58
2002-05-08 10:37 ` Bjorn Wesen
2002-05-08 10:16 ` Martin Dalecki
@ 2002-05-08 11:00 ` Benjamin Herrenschmidt
1 sibling, 0 replies; 265+ messages in thread
From: Benjamin Herrenschmidt @ 2002-05-08 11:00 UTC (permalink / raw)
To: Bjorn Wesen, Martin Dalecki
Cc: Paul Mackerras, Linus Torvalds, Kernel Mailing List
>I don't see why all IDE-interfaces in the world have to be I/O-mapped just
>because the first PC implementations used that. Sure it was an extended
>ISA-bus but the ISA bus is long gone and we don't all run PC's anymore
>either.
>
>So the simple abstraction we need to hit IDE-bus registers is a macro or
>inline, instead of a call of an I/O-primitive. It was too much work to
>abstract this when I inserted the CRIS-arch IDE-driver in the first place
>so I found a workaround but now seems like a better time..
No, not a macro. There are cases where you want different access methods
on the same machine. For example, pmacs can have the "mac-io" (ide-pmac)
controller, which is MMIO based, _and_ a PCI-based legacy IDE controller
using inx/outx-like I/O. (A typical example is the Blue&White G3, which has
both on the motherboard.)
Ultimately, you want the hwif (or what it becomes in 2.5) to provide a set
of functions for accessing taskfile registers and doing the PIO data
stream reads/writes (that is, to replace inb/outb and insw/outsw).
>Similarily, there is no reason at all why the CPU has to do _polling_ just
>because the IDE _bus_ is using a PIO-mode. It probably does that on legacy
>PC's but HW designed, hrm, more optimally can use DMA. Hence the hooks for
>the ide_func_t.
>
>So I'd figure the software side really would be _easier_ to implement with
>those assumptions about how an IDE-interface is supposed to work gone.
>
>> This makes my cleanup of the portability layer a bit hard
>> to finish on the software side.
>
>I understand that, so let's keep the discussion going and I'll check over
>your current cleanup.
>
>/Bjorn
^ permalink raw reply [flat|nested] 265+ messages in thread
* [PATCH] 2.5.14 IDE 59
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (5 preceding siblings ...)
2002-05-07 15:03 ` [PATCH] IDE 58 Martin Dalecki
@ 2002-05-09 19:58 ` Martin Dalecki
2002-05-11 4:16 ` William Lee Irwin III
2002-05-11 16:59 ` [PATCH] 2.5.15 IDE 60 Martin Dalecki
` (5 subsequent siblings)
12 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-09 19:58 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1012 bytes --]
Wed May 8 17:06:57 CEST 2002 ide-clean-59
Basically PCI driver handling reorganization. This is one step further
ahead toward the goal of fully modularized host chip drivers.
- Adjust ide-scsi to the new error handling. Just don't try any device
resets there.
- Add unmasking of IRQ per default to the PMac PCI code.
- Split up the crap table from ide-pci. Let the corresponding drivers do
registration of the functions they provide. This small change makes
this patch rather big.
- Hard code the number of ports requested for DMA engines. They are always
precisely 8 on PCs. If you have something different to deal with,
well then please just provide your own init_dma method.
- Remove the HDIO_GETGEO_BIG ioctl. Patch by Andries Brouwer. Applies
unmodified.
- Make ON_BOARD equal to 0, so we can spare ourselves some typing in structure
initialization.
- Normalize the terminology in the host chip drivers. It will make spotting
the tons of common code found there later easier.
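To illustrate the ON_BOARD item, here is a minimal sketch of why a zero value saves typing. It assumes a hypothetical chipset table, and the structure layout, field names and ID values are invented for the example rather than copied from ide-pci. In C, any member omitted from an initializer is set to zero, so entries for on-board chips need not mention the field at all:

```c
/* Hypothetical stand-ins; the real definitions live in the IDE headers. */
enum { ON_BOARD = 0, OFF_BOARD = 1 };

struct chipset_entry {
	unsigned short vendor;	/* made-up ID values below */
	unsigned short device;
	int flags;
	int bootable;		/* ON_BOARD or OFF_BOARD */
};

static const struct chipset_entry chipsets[] = {
	/* bootable is omitted, so it is zero-initialized, i.e. ON_BOARD */
	{ .vendor = 0x1234, .device = 0x0001 },
	/* only the exceptional case has to be spelled out */
	{ .vendor = 0x1234, .device = 0x0002, .bootable = OFF_BOARD },
};
```

The same reasoning applies to flags: with 0 as the common default, most table entries shrink to the identifying fields alone.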
[-- Attachment #2: ide-clean-59.diff.gz --]
[-- Type: application/x-gzip, Size: 26602 bytes --]
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.14 IDE 59
2002-05-09 19:58 ` [PATCH] 2.5.14 IDE 59 Martin Dalecki
@ 2002-05-11 4:16 ` William Lee Irwin III
0 siblings, 0 replies; 265+ messages in thread
From: William Lee Irwin III @ 2002-05-11 4:16 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
On Thu, May 09, 2002 at 09:58:54PM +0200, Martin Dalecki wrote:
> Wed May 8 17:06:57 CEST 2002 ide-clean-59
> Basically PCI driver handling reorganization. This is one step further
> ahead toward the goal of fully modularized host chip drivers.
Martin, could you fix the broken build with the following config? I'd
rather not stick my hand in this pot while it's being vigorously
stirred, though I would like to get some test boots in.
Cheers,
Bill
#
# Automatically generated make config: don't edit
#
CONFIG_X86=y
CONFIG_ISA=y
# CONFIG_SBUS is not set
CONFIG_UID16=y
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
#
# General setup
#
CONFIG_NET=y
CONFIG_SYSVIPC=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_SYSCTL=y
#
# Loadable module support
#
# CONFIG_MODULES is not set
#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
CONFIG_M586=y
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
# CONFIG_MK7 is not set
# CONFIG_MELAN is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=5
CONFIG_X86_USE_STRING_486=y
CONFIG_X86_ALIGNMENT_16=y
CONFIG_X86_PPRO_FENCE=y
CONFIG_X86_F00F_BUG=y
# CONFIG_X86_MCE is not set
# CONFIG_X86_MCE_NONFATAL is not set
# CONFIG_X86_MCE_P4THERMAL is not set
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_MATH_EMULATION is not set
# CONFIG_MTRR is not set
CONFIG_SMP=y
# CONFIG_PREEMPT is not set
# CONFIG_MULTIQUAD is not set
CONFIG_HAVE_DEC_LOCK=y
#
# General options
#
#
# ACPI Support
#
# CONFIG_ACPI is not set
CONFIG_X86_IO_APIC=y
CONFIG_X86_LOCAL_APIC=y
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
CONFIG_PCI_NAMES=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set
# CONFIG_HOTPLUG_PCI is not set
CONFIG_KCORE_ELF=y
# CONFIG_KCORE_AOUT is not set
CONFIG_BINFMT_AOUT=y
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=y
# CONFIG_PM is not set
# CONFIG_APM is not set
#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set
#
# Parallel port support
#
# CONFIG_PARPORT is not set
#
# Plug and Play configuration
#
# CONFIG_PNP is not set
# CONFIG_ISAPNP is not set
# CONFIG_PNPBIOS is not set
#
# Block devices
#
# CONFIG_BLK_DEV_FD is not set
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_CISS_SCSI_TAPE is not set
# CONFIG_BLK_DEV_DAC960 is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_NBD is not set
CONFIG_BLK_DEV_RAM=y
CONFIG_BLK_DEV_RAM_SIZE=65536
# CONFIG_BLK_DEV_INITRD is not set
#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set
# CONFIG_BLK_DEV_MD is not set
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
# CONFIG_MD_RAID1 is not set
# CONFIG_MD_RAID5 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_BLK_DEV_LVM is not set
#
# Networking options
#
# CONFIG_PACKET is not set
# CONFIG_NETLINK_DEV is not set
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
# CONFIG_IP_MULTICAST is not set
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_ARPD is not set
# CONFIG_INET_ECN is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_IPV6 is not set
# CONFIG_KHTTPD is not set
# CONFIG_ATM is not set
# CONFIG_VLAN_8021Q is not set
#
#
#
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
#
# Appletalk devices
#
# CONFIG_DEV_APPLETALK is not set
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set
#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
#
# Telephony Support
#
# CONFIG_PHONE is not set
# CONFIG_PHONE_IXJ is not set
# CONFIG_PHONE_IXJ_PCMCIA is not set
#
# ATA/IDE/MFM/RLL support
#
CONFIG_IDE=y
#
# ATA and ATAPI Block devices
#
CONFIG_BLK_DEV_IDE=y
#
# Please see Documentation/ide.txt for help/info on IDE drives
#
# CONFIG_BLK_DEV_HD_IDE is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_BLK_DEV_IDEDISK=y
# CONFIG_IDEDISK_MULTI_MODE is not set
# CONFIG_IDEDISK_STROKE is not set
# CONFIG_BLK_DEV_IDEDISK_VENDOR is not set
# CONFIG_BLK_DEV_IDEDISK_FUJITSU is not set
# CONFIG_BLK_DEV_IDEDISK_IBM is not set
# CONFIG_BLK_DEV_IDEDISK_MAXTOR is not set
# CONFIG_BLK_DEV_IDEDISK_QUANTUM is not set
# CONFIG_BLK_DEV_IDEDISK_SEAGATE is not set
# CONFIG_BLK_DEV_IDEDISK_WD is not set
# CONFIG_BLK_DEV_COMMERIAL is not set
# CONFIG_BLK_DEV_TIVO is not set
# CONFIG_BLK_DEV_IDECS is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
CONFIG_BLK_DEV_IDEFLOPPY=y
# CONFIG_BLK_DEV_IDESCSI is not set
#
# ATA host chip set support
#
CONFIG_BLK_DEV_CMD640=y
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
# CONFIG_BLK_DEV_ISAPNP is not set
CONFIG_BLK_DEV_RZ1000=y
#
# PCI host chip set support
#
# CONFIG_BLK_DEV_OFFBOARD is not set
CONFIG_IDEPCI_SHARE_IRQ=y
# CONFIG_BLK_DEV_IDEDMA_PCI is not set
# CONFIG_IDEDMA_PCI_AUTO is not set
# CONFIG_IDEDMA_ONLYDISK is not set
# CONFIG_BLK_DEV_IDEDMA is not set
# CONFIG_BLK_DEV_IDE_TCQ is not set
# CONFIG_BLK_DEV_IDE_TCQ_DEFAULT is not set
# CONFIG_IDEDMA_NEW_DRIVE_LISTINGS is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_AEC62XX_TUNING is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_WDC_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_HPT34X_AUTODMA is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_PDC_ADMA is not set
# CONFIG_BLK_DEV_PDC202XX is not set
# CONFIG_PDC202XX_BURST is not set
# CONFIG_PDC202XX_FORCE is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_TRM290 is not set
# CONFIG_BLK_DEV_VIA82CXXX is not set
# CONFIG_BLK_DEV_SL82C105 is not set
# CONFIG_IDE_CHIPSETS is not set
# CONFIG_IDEDMA_AUTO is not set
# CONFIG_DMA_NONPCI is not set
# CONFIG_BLK_DEV_ATARAID is not set
# CONFIG_BLK_DEV_ATARAID_PDC is not set
# CONFIG_BLK_DEV_ATARAID_HPT is not set
#
# SCSI support
#
# CONFIG_SCSI is not set
#
# Fusion MPT device support
#
# CONFIG_FUSION is not set
#
# IEEE 1394 (FireWire) support (EXPERIMENTAL)
#
# CONFIG_IEEE1394 is not set
#
# I2O device support
#
# CONFIG_I2O is not set
# CONFIG_I2O_PCI is not set
# CONFIG_I2O_BLOCK is not set
# CONFIG_I2O_LAN is not set
# CONFIG_I2O_SCSI is not set
# CONFIG_I2O_PROC is not set
#
# Network device support
#
CONFIG_NETDEVICES=y
#
# ARCnet devices
#
# CONFIG_ARCNET is not set
CONFIG_DUMMY=y
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
# CONFIG_ETHERTAP is not set
#
# Ethernet (10 or 100Mbit)
#
# CONFIG_NET_ETHERNET is not set
#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_MYRI_SBUS is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_SK98LIN is not set
# CONFIG_TIGON3 is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set
#
# Token Ring devices
#
# CONFIG_TR is not set
# CONFIG_NET_FC is not set
# CONFIG_RCPCI is not set
# CONFIG_SHAPER is not set
#
# Wan interfaces
#
# CONFIG_WAN is not set
#
# "Tulip" family network device support
#
# CONFIG_NET_TULIP is not set
#
# Amateur Radio support
#
# CONFIG_HAMRADIO is not set
#
# IrDA (infrared) support
#
# CONFIG_IRDA is not set
#
# ISDN subsystem
#
# CONFIG_ISDN_BOOL is not set
#
# Old CD-ROM drivers (not SCSI, not IDE)
#
# CONFIG_CD_NO_IDESCSI is not set
#
# Input device support
#
# CONFIG_INPUT is not set
# CONFIG_INPUT_KEYBDEV is not set
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_GAMEPORT is not set
CONFIG_SOUND_GAMEPORT=y
# CONFIG_GAMEPORT_NS558 is not set
# CONFIG_GAMEPORT_L4 is not set
# CONFIG_INPUT_EMU10K1 is not set
# CONFIG_GAMEPORT_PCIGAME is not set
# CONFIG_GAMEPORT_FM801 is not set
# CONFIG_GAMEPORT_CS461x is not set
# CONFIG_SERIO is not set
# CONFIG_SERIO_SERPORT is not set
#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
# CONFIG_SERIAL is not set
# CONFIG_SERIAL_EXTENDED is not set
# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=2048
#
# I2C support
#
# CONFIG_I2C is not set
#
# Mice
#
# CONFIG_BUSMOUSE is not set
# CONFIG_MOUSE is not set
# CONFIG_QIC02_TAPE is not set
#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_INTEL_RNG is not set
# CONFIG_NVRAM is not set
# CONFIG_RTC is not set
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set
#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
# CONFIG_AGP is not set
# CONFIG_DRM is not set
# CONFIG_MWAVE is not set
#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set
#
# File systems
#
# CONFIG_QUOTA is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_ADFS_FS is not set
# CONFIG_ADFS_FS_RW is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_BFS_FS is not set
# CONFIG_EXT3_FS is not set
# CONFIG_JBD is not set
# CONFIG_JBD_DEBUG is not set
CONFIG_FAT_FS=y
CONFIG_MSDOS_FS=y
# CONFIG_UMSDOS_FS is not set
CONFIG_VFAT_FS=y
# CONFIG_EFS_FS is not set
# CONFIG_JFFS_FS is not set
# CONFIG_JFFS2_FS is not set
# CONFIG_CRAMFS is not set
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_JFS_FS is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
CONFIG_MINIX_FS=y
# CONFIG_VXFS_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS_DEBUG is not set
# CONFIG_HPFS_FS is not set
CONFIG_PROC_FS=y
# CONFIG_DEVFS_FS is not set
# CONFIG_DEVFS_MOUNT is not set
# CONFIG_DEVFS_DEBUG is not set
CONFIG_DEVPTS_FS=y
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX4FS_RW is not set
# CONFIG_ROMFS_FS is not set
CONFIG_EXT2_FS=y
# CONFIG_SYSV_FS is not set
CONFIG_UDF_FS=y
CONFIG_UDF_RW=y
# CONFIG_UFS_FS is not set
# CONFIG_UFS_FS_WRITE is not set
#
# Network File Systems
#
# CONFIG_CODA_FS is not set
# CONFIG_INTERMEZZO_FS is not set
# CONFIG_NFS_FS is not set
# CONFIG_NFS_V3 is not set
# CONFIG_ROOT_NFS is not set
# CONFIG_NFSD is not set
# CONFIG_NFSD_V3 is not set
# CONFIG_NFSD_TCP is not set
# CONFIG_SUNRPC is not set
# CONFIG_LOCKD is not set
# CONFIG_EXPORTFS is not set
# CONFIG_SMB_FS is not set
# CONFIG_NCP_FS is not set
# CONFIG_NCPFS_PACKET_SIGNING is not set
# CONFIG_NCPFS_IOCTL_LOCKING is not set
# CONFIG_NCPFS_STRONG is not set
# CONFIG_NCPFS_NFS_NS is not set
# CONFIG_NCPFS_OS2_NS is not set
# CONFIG_NCPFS_SMALLDOS is not set
# CONFIG_NCPFS_NLS is not set
# CONFIG_NCPFS_EXTRAS is not set
# CONFIG_ZISOFS_FS is not set
#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_SMB_NLS is not set
CONFIG_NLS=y
#
# Native Language Support
#
CONFIG_NLS_DEFAULT="iso8859-1"
# CONFIG_NLS_CODEPAGE_437 is not set
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
# CONFIG_NLS_UTF8 is not set
#
# Console drivers
#
CONFIG_VGA_CONSOLE=y
# CONFIG_VIDEO_SELECT is not set
# CONFIG_MDA_CONSOLE is not set
#
# Frame-buffer support
#
# CONFIG_FB is not set
#
# Sound
#
# CONFIG_SOUND is not set
#
# USB support
#
# CONFIG_USB is not set
#
# Bluetooth support
#
# CONFIG_BLUEZ is not set
#
# Kernel hacking
#
CONFIG_DEBUG_KERNEL=y
CONFIG_DEBUG_SLAB=y
CONFIG_DEBUG_IOVIRT=y
CONFIG_MAGIC_SYSRQ=y
CONFIG_DEBUG_SPINLOCK=y
#
# Library routines
#
# CONFIG_CRC32 is not set
# CONFIG_ZLIB_INFLATE is not set
# CONFIG_ZLIB_DEFLATE is not set
^ permalink raw reply [flat|nested] 265+ messages in thread
* [PATCH] 2.5.15 IDE 60
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (6 preceding siblings ...)
2002-05-09 19:58 ` [PATCH] 2.5.14 IDE 59 Martin Dalecki
@ 2002-05-11 16:59 ` Martin Dalecki
2002-05-11 18:47 ` Pierre Rousselet
2002-05-12 19:19 ` pdc202xx.c fails to compile in 2.5.15 Zlatko Calusic
2002-05-13 9:48 ` [PATCH] 2.5.15 IDE 61 Martin Dalecki
` (4 subsequent siblings)
12 siblings, 2 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-11 16:59 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 452 bytes --]
Fri May 10 16:17:01 CEST 2002 ide-clean-60
Synchronize with 2.5.15
- Rewrite ioctl handling.
- Apply fix for hpt366 "hang on boot" by Andre.
- Remove stale XXX_tune_req. It was no longer used.
- Propagate rq through ide_error(), ide_end_drive_cmd(), ide_dump_status(),
ide_wait_stat().
- Push the current drive down to ata_channel from hwgroup.
- Push the timer down to the ata_channel structure. Most probably it will end
up at the drive.
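The attached diff also replaces an open-coded signed jiffies comparison in ide_wait_stat() with time_after(). As a reading aid, here is a minimal userspace sketch of that wrap-safe comparison idiom. The helper names ticks_after and deadline_passed are hypothetical, and the cast assumes the usual two's-complement conversion from unsigned long to long. The point to watch is argument order: the first argument is the one being tested for "later":

```c
#include <limits.h>

/*
 * True when tick count a is later than b, even if the counter has
 * wrapped in between. The unsigned subtraction wraps, and the sign of
 * the result (viewed as a signed long) tells which value is ahead.
 */
static int ticks_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* True once the current tick count has moved past the deadline. */
static int deadline_passed(unsigned long now, unsigned long deadline)
{
	return ticks_after(now, deadline);
}
```

With this ordering, a timeout check reads "now is after the deadline". Swapping the two arguments would instead report a timeout for as long as the deadline is still in the future.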
[-- Attachment #2: ide-clean-60.diff --]
[-- Type: text/plain, Size: 58315 bytes --]
diff -urN linux-2.5.15/drivers/ide/hpt366.c linux/drivers/ide/hpt366.c
--- linux-2.5.15/drivers/ide/hpt366.c 2002-05-10 00:22:02.000000000 +0200
+++ linux/drivers/ide/hpt366.c 2002-05-10 16:18:40.000000000 +0200
@@ -670,8 +670,8 @@
* Disable the "fast interrupt" prediction.
*/
pci_read_config_byte(dev, regfast, &drive_fast);
- if (drive_fast & 0x02)
- pci_write_config_byte(dev, regfast, drive_fast & ~0x20);
+ if (drive_fast & 0x80)
+ pci_write_config_byte(dev, regfast, drive_fast & ~0x80);
pci_read_config_dword(dev, regtime, &reg1);
reg2 = pci_bus_clock_list(speed,
diff -urN linux-2.5.15/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.15/drivers/ide/ide.c 2002-05-10 00:24:14.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-11 18:38:21.000000000 +0200
@@ -337,17 +337,18 @@
unsigned long timeout, ata_expiry_t expiry)
{
unsigned long flags;
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
+ struct ata_channel *ch = drive->channel;
+ ide_hwgroup_t *hwgroup = ch->hwgroup;
spin_lock_irqsave(&ide_lock, flags);
if (hwgroup->handler != NULL) {
printk("%s: ide_set_handler: handler not null; old=%p, new=%p, from %p\n",
drive->name, hwgroup->handler, handler, __builtin_return_address(0));
}
- hwgroup->handler = handler;
- hwgroup->expiry = expiry;
- hwgroup->timer.expires = jiffies + timeout;
- add_timer(&hwgroup->timer);
+ hwgroup->handler = handler;
+ ch->expiry = expiry;
+ ch->timer.expires = jiffies + timeout;
+ add_timer(&ch->timer);
spin_unlock_irqrestore(&ide_lock, flags);
}
@@ -592,15 +593,11 @@
/*
* Clean up after success/failure of an explicit drive cmd
+ *
+ * Should be called under lock held.
*/
-void ide_end_drive_cmd(struct ata_device *drive, byte stat, byte err)
+void ide_end_drive_cmd(struct ata_device *drive, struct request *rq, u8 stat, u8 err)
{
- unsigned long flags;
- struct request *rq;
-
- spin_lock_irqsave(&ide_lock, flags);
- rq = HWGROUP(drive)->rq;
-
if (rq->flags & REQ_DRIVE_CMD) {
u8 *args = rq->buffer;
rq->errors = !OK_STAT(stat, READY_STAT, BAD_STAT);
@@ -640,14 +637,12 @@
blkdev_dequeue_request(rq);
HWGROUP(drive)->rq = NULL;
end_that_request_last(rq);
-
- spin_unlock_irqrestore(&ide_lock, flags);
}
/*
* Error reporting, in human readable form (luxurious, but a memory hog).
*/
-byte ide_dump_status(struct ata_device *drive, const char *msg, byte stat)
+u8 ide_dump_status(struct ata_device *drive, struct request * rq, const char *msg, u8 stat)
{
unsigned long flags;
byte err = 0;
@@ -722,8 +717,8 @@
IN_BYTE(IDE_SECTOR_REG));
}
}
- if (HWGROUP(drive) && HWGROUP(drive)->rq)
- printk(", sector=%ld", HWGROUP(drive)->rq->sector);
+ if (rq)
+ printk(", sector=%ld", rq->sector);
}
}
#endif
@@ -786,18 +781,17 @@
/*
* Take action based on the error returned by the drive.
*/
-ide_startstop_t ide_error(struct ata_device *drive, const char *msg, byte stat)
+ide_startstop_t ide_error(struct ata_device *drive, struct request *rq, const char *msg, byte stat)
{
- struct request *rq;
byte err;
- err = ide_dump_status(drive, msg, stat);
- if (drive == NULL || (rq = HWGROUP(drive)->rq) == NULL)
+ err = ide_dump_status(drive, rq, msg, stat);
+ if (!drive || !rq)
return ide_stopped;
/* retry only "normal" I/O: */
if (!(rq->flags & REQ_CMD)) {
rq->errors = 1;
- ide_end_drive_cmd(drive, stat, err);
+ ide_end_drive_cmd(drive, rq, stat, err);
return ide_stopped;
}
@@ -869,8 +863,8 @@
}
if (!OK_STAT(stat, READY_STAT, BAD_STAT))
- return ide_error(drive, "drive_cmd", stat); /* calls ide_end_drive_cmd */
- ide_end_drive_cmd (drive, stat, GET_ERR());
+ return ide_error(drive, rq, "drive_cmd", stat); /* calls ide_end_drive_cmd */
+ ide_end_drive_cmd(drive, rq, stat, GET_ERR());
return ide_stopped;
}
@@ -886,8 +880,11 @@
* setting a timer to wake up at half second intervals thereafter, until
* timeout is achieved, before timing out.
*/
-int ide_wait_stat(ide_startstop_t *startstop, struct ata_device *drive, byte good, byte bad, unsigned long timeout) {
- byte stat;
+int ide_wait_stat(ide_startstop_t *startstop,
+ struct ata_device *drive, struct request *rq,
+ byte good, byte bad, unsigned long timeout)
+{
+ u8 stat;
int i;
unsigned long flags;
@@ -903,9 +900,9 @@
ide__sti(); /* local CPU only */
timeout += jiffies;
while ((stat = GET_STAT()) & BUSY_STAT) {
- if (0 < (signed long)(jiffies - timeout)) {
+ if (time_after(jiffies, timeout)) {
__restore_flags(flags); /* local CPU only */
- *startstop = ide_error(drive, "status timeout", stat);
+ *startstop = ide_error(drive, rq, "status timeout", stat);
return 1;
}
}
@@ -923,7 +920,8 @@
if (OK_STAT((stat = GET_STAT()), good, bad))
return 0;
}
- *startstop = ide_error(drive, "status error", stat);
+ *startstop = ide_error(drive, rq, "status error", stat);
+
return 1;
}
@@ -975,7 +973,7 @@
ide_startstop_t res;
SELECT_DRIVE(ch, drive);
- if (ide_wait_stat(&res, drive, drive->ready_stat,
+ if (ide_wait_stat(&res, drive, rq, drive->ready_stat,
BUSY_STAT|DRQ_STAT, WAIT_READY)) {
printk(KERN_WARNING "%s: drive not ready for command\n", drive->name);
@@ -1066,20 +1064,21 @@
#ifdef DEBUG
printk("%s: DRIVE_CMD (null)\n", drive->name);
#endif
- ide_end_drive_cmd(drive, GET_STAT(), GET_ERR());
+ ide_end_drive_cmd(drive, rq, GET_STAT(), GET_ERR());
return ide_stopped;
}
ide_startstop_t restart_request(struct ata_device *drive)
{
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
+ struct ata_channel *ch = drive->channel;
+ ide_hwgroup_t *hwgroup = ch->hwgroup;
unsigned long flags;
struct request *rq;
spin_lock_irqsave(&ide_lock, flags);
hwgroup->handler = NULL;
- del_timer(&hwgroup->timer);
+ del_timer(&ch->timer);
rq = hwgroup->rq;
spin_unlock_irqrestore(&ide_lock, flags);
@@ -1195,11 +1194,11 @@
if (time_after(jiffies, sleep - WAIT_MIN_SLEEP))
sleep = jiffies + WAIT_MIN_SLEEP;
#if 1
- if (timer_pending(&channel->hwgroup->timer))
+ if (timer_pending(&channel->timer))
printk(KERN_ERR "ide_set_handler: timer already active\n");
#endif
set_bit(IDE_SLEEP, &channel->hwgroup->flags);
- mod_timer(&channel->hwgroup->timer, sleep);
+ mod_timer(&channel->timer, sleep);
/* we purposely leave hwgroup busy while sleeping */
} else {
/* Ugly, but how can we sleep for the lock otherwise? perhaps
@@ -1341,8 +1340,8 @@
static void ide_do_request(struct ata_channel *channel, int masked_irq)
{
ide_hwgroup_t *hwgroup = channel->hwgroup;
- ide_get_lock(&irq_lock, ata_irq_request, hwgroup);/* for atari only: POSSIBLY BROKEN HERE(?) */
+ ide_get_lock(&irq_lock, ata_irq_request, hwgroup);/* for atari only: POSSIBLY BROKEN HERE(?) */
__cli(); /* necessary paranoia: ensure IRQs are masked on local CPU */
while (!test_and_set_bit(IDE_BUSY, &hwgroup->flags)) {
@@ -1357,7 +1356,7 @@
ch = drive->channel;
- if (hwgroup->XXX_drive->channel->sharing_irq && ch != hwgroup->XXX_drive->channel && ch->io_ports[IDE_CONTROL_OFFSET]) {
+ if (channel->sharing_irq && ch != channel && ch->io_ports[IDE_CONTROL_OFFSET]) {
/* set nIEN for previous channel */
/* FIXME: check this! It appears to act on the current channel! */
@@ -1369,7 +1368,7 @@
/* Remember the last drive we where acting on.
*/
- hwgroup->XXX_drive = drive;
+ ch->drive = drive;
queue_commands(drive, masked_irq);
}
@@ -1423,7 +1422,8 @@
*/
void ide_timer_expiry(unsigned long data)
{
- ide_hwgroup_t *hwgroup = (ide_hwgroup_t *) data;
+ struct ata_channel *ch = (struct ata_channel *) data;
+ ide_hwgroup_t *hwgroup = ch->hwgroup;
ata_handler_t *handler;
ata_expiry_t *expiry;
unsigned long flags;
@@ -1435,7 +1435,7 @@
*/
spin_lock_irqsave(&ide_lock, flags);
- del_timer(&hwgroup->timer);
+ del_timer(&ch->timer);
if ((handler = hwgroup->handler) == NULL) {
@@ -1449,23 +1449,22 @@
if (test_and_clear_bit(IDE_SLEEP, &hwgroup->flags))
clear_bit(IDE_BUSY, &hwgroup->flags);
} else {
- struct ata_device *drive = hwgroup->XXX_drive;
+ struct ata_device *drive = ch->drive;
if (!drive) {
printk("ide_timer_expiry: hwgroup->drive was NULL\n");
hwgroup->handler = NULL;
} else {
- struct ata_channel *ch;
ide_startstop_t startstop;
/* paranoia */
if (!test_and_set_bit(IDE_BUSY, &hwgroup->flags))
printk("%s: ide_timer_expiry: hwgroup was not busy??\n", drive->name);
- if ((expiry = hwgroup->expiry) != NULL) {
+ if ((expiry = ch->expiry) != NULL) {
/* continue */
if ((wait = expiry(drive, HWGROUP(drive)->rq)) != 0) {
/* reengage timer */
- hwgroup->timer.expires = jiffies + wait;
- add_timer(&hwgroup->timer);
+ ch->timer.expires = jiffies + wait;
+ add_timer(&ch->timer);
spin_unlock_irqrestore(&ide_lock, flags);
return;
}
@@ -1497,7 +1496,7 @@
startstop = ide_stopped;
dma_timeout_retry(drive, ch->hwgroup->rq);
} else
- startstop = ide_error(drive, "irq timeout", GET_STAT());
+ startstop = ide_error(drive, ch->hwgroup->rq, "irq timeout", GET_STAT());
}
set_recovery_timer(ch);
enable_irq(ch->irq);
@@ -1507,7 +1506,7 @@
}
}
- ide_do_request(hwgroup->XXX_drive->channel, 0);
+ ide_do_request(ch->drive->channel, 0);
spin_unlock_irqrestore(&ide_lock, flags);
}
@@ -1617,7 +1616,7 @@
}
goto out_lock;
}
- drive = hwgroup->XXX_drive;
+ drive = ch->drive;
if (!drive_is_ready(drive)) {
/*
* This happens regularly when we share a PCI IRQ with another device.
@@ -1631,7 +1630,7 @@
if (!test_and_set_bit(IDE_BUSY, &hwgroup->flags))
printk(KERN_ERR "%s: %s: hwgroup was not busy!?\n", drive->name, __FUNCTION__);
hwgroup->handler = NULL;
- del_timer(&hwgroup->timer);
+ del_timer(&ch->timer);
spin_unlock(&ide_lock);
if (ch->unmask)
@@ -2002,7 +2001,7 @@
*/
hwgroup = ch->hwgroup;
- d = hwgroup->XXX_drive;
+ d = ch->drive;
for (i = 0; i < MAX_DRIVES; ++i) {
struct ata_device *drive = &ch->drives[i];
@@ -2013,8 +2012,9 @@
if (!drive->present)
continue;
- if (hwgroup->XXX_drive == drive)
- hwgroup->XXX_drive = NULL;
+ /* FIXME: possibly unnecessary */
+ if (ch->drive == drive)
+ ch->drive = NULL;
if (drive->id != NULL) {
kfree(drive->id);
@@ -2024,7 +2024,7 @@
blk_cleanup_queue(&drive->queue);
}
if (d->present)
- hwgroup->XXX_drive = d;
+ ch->drive = d;
/*
@@ -2220,84 +2220,6 @@
return ide_register_hw(&hw, NULL);
}
-void ide_add_setting(struct ata_device *drive, const char *name, int rw, int read_ioctl, int write_ioctl, int data_type, int min, int max, int mul_factor, int div_factor, void *data, ide_procset_t *set)
-{
- ide_settings_t **p = &drive->settings;
- ide_settings_t *setting = NULL;
-
- while ((*p) && strcmp((*p)->name, name) < 0)
- p = &((*p)->next);
- if ((setting = kmalloc(sizeof(*setting), GFP_KERNEL)) == NULL)
- goto abort;
- memset(setting, 0, sizeof(*setting));
- if ((setting->name = kmalloc(strlen(name) + 1, GFP_KERNEL)) == NULL)
- goto abort;
- strcpy(setting->name, name); setting->rw = rw;
- setting->read_ioctl = read_ioctl; setting->write_ioctl = write_ioctl;
- setting->data_type = data_type; setting->min = min;
- setting->max = max; setting->mul_factor = mul_factor;
- setting->div_factor = div_factor; setting->data = data;
- setting->set = set; setting->next = *p;
- if (drive->driver)
- setting->auto_remove = 1;
- *p = setting;
- return;
-abort:
- if (setting)
- kfree(setting);
-}
-
-void ide_remove_setting(struct ata_device *drive, char *name)
-{
- ide_settings_t **p = &drive->settings, *setting;
-
- while ((*p) && strcmp((*p)->name, name))
- p = &((*p)->next);
- if ((setting = (*p)) == NULL)
- return;
- (*p) = setting->next;
- kfree(setting->name);
- kfree(setting);
-}
-
-static void auto_remove_settings(struct ata_device *drive)
-{
- ide_settings_t *setting;
-repeat:
- setting = drive->settings;
- while (setting) {
- if (setting->auto_remove) {
- ide_remove_setting(drive, setting->name);
- goto repeat;
- }
- setting = setting->next;
- }
-}
-
-int ide_read_setting(struct ata_device *drive, ide_settings_t *setting)
-{
- int val = -EINVAL;
- unsigned long flags;
-
- if ((setting->rw & SETTING_READ)) {
- spin_lock_irqsave(&ide_lock, flags);
- switch(setting->data_type) {
- case TYPE_BYTE:
- val = *((u8 *) setting->data);
- break;
- case TYPE_SHORT:
- val = *((u16 *) setting->data);
- break;
- case TYPE_INT:
- case TYPE_INTA:
- val = *((u32 *) setting->data);
- break;
- }
- spin_unlock_irqrestore(&ide_lock, flags);
- }
- return val;
-}
-
int ide_spin_wait_hwgroup(struct ata_device *drive)
{
ide_hwgroup_t *hwgroup = HWGROUP(drive);
@@ -2322,46 +2244,6 @@
return 0;
}
-/*
- * FIXME: This should be changed to enqueue a special request
- * to the driver to change settings, and then wait on a semaphore for completion.
- * The current scheme of polling is kludgey, though safe enough.
- */
-int ide_write_setting(struct ata_device *drive, ide_settings_t *setting, int val)
-{
- int i;
- u32 *p;
-
- if (!capable(CAP_SYS_ADMIN))
- return -EACCES;
- if (!(setting->rw & SETTING_WRITE))
- return -EPERM;
- if (val < setting->min || val > setting->max)
- return -EINVAL;
- if (setting->set)
- return setting->set(drive, val);
- if (ide_spin_wait_hwgroup(drive))
- return -EBUSY;
- switch (setting->data_type) {
- case TYPE_BYTE:
- *((u8 *) setting->data) = val;
- break;
- case TYPE_SHORT:
- *((u16 *) setting->data) = val;
- break;
- case TYPE_INT:
- *((u32 *) setting->data) = val;
- break;
- case TYPE_INTA:
- p = (u32 *) setting->data;
- for (i = 0; i < 1 << PARTN_BITS; i++, p++)
- *p = val;
- break;
- }
- spin_unlock_irq(&ide_lock);
- return 0;
-}
-
static int set_io_32bit(struct ata_device *drive, int arg)
{
if (drive->channel->no_io_32bit)
@@ -2381,6 +2263,7 @@
return -EPERM;
udma_enable(drive, arg, 1);
+
return 0;
}
@@ -2401,20 +2284,6 @@
return 0;
}
-void ide_add_generic_settings(struct ata_device *drive)
-{
-/* drive setting name read/write access read ioctl write ioctl data type min max mul_factor div_factor data pointer set function */
- ide_add_setting(drive, "io_32bit", drive->channel->no_io_32bit ? SETTING_READ : SETTING_RW, HDIO_GET_32BIT, HDIO_SET_32BIT, TYPE_BYTE, 0, 1 + (SUPPORT_VLB_SYNC << 1), 1, 1, &drive->channel->io_32bit, set_io_32bit);
- ide_add_setting(drive, "pio_mode", SETTING_WRITE, -1, HDIO_SET_PIO_MODE, TYPE_BYTE, 0, 255, 1, 1, NULL, set_pio_mode);
- ide_add_setting(drive, "slow", SETTING_RW, -1, -1, TYPE_BYTE, 0, 1, 1, 1, &drive->channel->slow, NULL);
- ide_add_setting(drive, "unmaskirq", drive->channel->no_unmask ? SETTING_READ : SETTING_RW, HDIO_GET_UNMASKINTR, HDIO_SET_UNMASKINTR, TYPE_BYTE, 0, 1, 1, 1, &drive->channel->unmask, NULL);
- ide_add_setting(drive, "using_dma", SETTING_RW, HDIO_GET_DMA, HDIO_SET_DMA, TYPE_BYTE, 0, 1, 1, 1, &drive->using_dma, set_using_dma);
- ide_add_setting(drive, "ide_scsi", SETTING_RW, -1, -1, TYPE_BYTE, 0, 1, 1, 1, &drive->scsi, NULL);
- ide_add_setting(drive, "init_speed", SETTING_RW, -1, -1, TYPE_BYTE, 0, 69, 1, 1, &drive->init_speed, NULL);
- ide_add_setting(drive, "current_speed", SETTING_RW, -1, -1, TYPE_BYTE, 0, 69, 1, 1, &drive->current_speed, NULL);
- ide_add_setting(drive, "number", SETTING_RW, -1, -1, TYPE_BYTE, 0, 3, 1, 1, &drive->dn, NULL);
-}
-
/*
* Delay for *at least* 50ms. As we don't know how much time is left
* until the next tick occurs, we wait an extra tick to be safe.
@@ -2433,44 +2302,125 @@
#endif /* CONFIG_BLK_DEV_IDECS */
}
+/*
+ * Handle ioctls.
+ *
+ * NOTE: Due to ridiculous coding habits in the hdparm utility we have to
+ * always return unsigned long in case we are returning simple values.
+ */
static int ide_ioctl(struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
{
- int err = 0, major, minor;
+ unsigned int major, minor;
struct ata_device *drive;
struct request rq;
kdev_t dev;
- ide_settings_t *setting;
dev = inode->i_rdev;
major = major(dev); minor = minor(dev);
if ((drive = get_info_ptr(inode->i_rdev)) == NULL)
return -ENODEV;
- /* Find setting by ioctl */
- setting = drive->settings;
+ /* Contrary to popular belief, we disallow even reading the ioctl
+ * values for users who lack permission. We do this because such
+ * information could be used by an attacker to mount an unprivileged-user
+ * attack that triggers bugs present only on a particular
+ * configuration.
+ */
- while (setting) {
- if (setting->read_ioctl == cmd || setting->write_ioctl == cmd)
- break;
- setting = setting->next;
- }
+ if (!capable(CAP_SYS_ADMIN))
+ return -EACCES;
- if (setting != NULL) {
- if (cmd == setting->read_ioctl) {
- err = ide_read_setting(drive, setting);
- return err >= 0 ? put_user(err, (long *) arg) : err;
- } else {
- if ((minor(inode->i_rdev) & PARTN_MASK))
+ ide_init_drive_cmd(&rq);
+ switch (cmd) {
+ case HDIO_GET_32BIT: {
+ unsigned long val = drive->channel->io_32bit;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+ return 0;
+ }
+
+ case HDIO_SET_32BIT: {
+ int val;
+
+ if (arg < 0 || arg > 1 + (SUPPORT_VLB_SYNC << 1))
return -EINVAL;
- return ide_write_setting(drive, setting, arg);
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = set_io_32bit(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
}
- }
- ide_init_drive_cmd(&rq);
- switch (cmd) {
- case HDIO_GETGEO:
- {
+ case HDIO_SET_PIO_MODE: {
+ int val;
+
+ if (arg < 0 || arg > 255)
+ return -EINVAL;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = set_pio_mode(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
+ }
+
+ case HDIO_GET_UNMASKINTR: {
+ unsigned long val = drive->channel->unmask;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+ return 0;
+ }
+
+
+ case HDIO_SET_UNMASKINTR: {
+ if (arg < 0 || arg > 1)
+ return -EINVAL;
+
+ if (drive->channel->no_unmask)
+ return -EIO;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ drive->channel->unmask = arg;
+ spin_unlock_irq(&ide_lock);
+
+ return 0;
+ }
+
+ case HDIO_GET_DMA: {
+ unsigned long val = drive->using_dma;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+
+ return 0;
+ }
+
+ case HDIO_SET_DMA: {
+ int val;
+
+ if (arg < 0 || arg > 1)
+ return -EINVAL;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = set_using_dma(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
+ }
+
+ case HDIO_GETGEO: {
struct hd_geometry *loc = (struct hd_geometry *) arg;
unsigned short bios_cyl = drive->bios_cyl; /* truncate */
@@ -2484,11 +2434,12 @@
return 0;
}
- case HDIO_GETGEO_BIG_RAW:
- {
+ case HDIO_GETGEO_BIG_RAW: {
struct hd_big_geometry *loc = (struct hd_big_geometry *) arg;
+
if (!loc || (drive->type != ATA_DISK && drive->type != ATA_FLOPPY))
return -EINVAL;
+
if (put_user(drive->head, (u8 *) &loc->heads)) return -EFAULT;
if (put_user(drive->sect, (u8 *) &loc->sectors)) return -EFAULT;
if (put_user(drive->cyl, (unsigned int *) &loc->cylinders)) return -EFAULT;
@@ -2532,6 +2483,7 @@
return -EPERM;
}
return 0;
+
case BLKGETSIZE:
case BLKGETSIZE64:
case BLKROSET:
@@ -2566,6 +2518,9 @@
drive->channel->busproc(drive, (int)arg);
return 0;
+ /* Now check whether this particular ioctl has a special
+ * implementation.
+ */
default:
if (ata_ops(drive) && ata_ops(drive)->ioctl)
return ata_ops(drive)->ioctl(drive, inode, file, cmd, arg);
@@ -2573,7 +2528,7 @@
}
}
-static int ide_check_media_change (kdev_t i_rdev)
+static int ide_check_media_change(kdev_t i_rdev)
{
struct ata_device *drive;
int res = 0; /* not changed */
@@ -3201,7 +3156,6 @@
#if defined(CONFIG_BLK_DEV_ISAPNP) && defined(CONFIG_ISAPNP) && defined(MODULE)
pnpide_init(0);
#endif
- auto_remove_settings(drive);
drive->driver = NULL;
drive->present = 0;
@@ -3255,7 +3209,6 @@
EXPORT_SYMBOL(ide_lock);
EXPORT_SYMBOL(drive_is_flashcard);
EXPORT_SYMBOL(ide_timer_expiry);
-EXPORT_SYMBOL(ide_add_generic_settings);
EXPORT_SYMBOL(do_ide_request);
/*
* Driver module
@@ -3278,8 +3231,6 @@
EXPORT_SYMBOL(ide_cmd);
EXPORT_SYMBOL(ide_delay_50ms);
EXPORT_SYMBOL(ide_stall_queue);
-EXPORT_SYMBOL(ide_add_setting);
-EXPORT_SYMBOL(ide_remove_setting);
EXPORT_SYMBOL(ide_register_hw);
EXPORT_SYMBOL(ide_register);
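
The ide.c hunks above drop the table-driven settings lookup and handle each HDIO_* command in an explicit switch: check privilege, validate the range, take the lock, apply, unlock. What follows is only a userspace sketch of that shape, not kernel code. struct fake_drive, fake_lock(), fake_unlock(), fake_ioctl() and the CMD_* numbers are hypothetical stand-ins for capable(), ide_spin_wait_hwgroup(), ide_lock and the real HDIO_* values.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers; the real HDIO_* values are not modelled here. */
enum { CMD_GET_DMA = 1, CMD_SET_DMA = 2, CMD_SET_UNMASK = 3 };

struct fake_drive {
	int privileged;          /* stands in for capable(CAP_SYS_ADMIN) */
	int busy;                /* makes the lock helper fail, like a busy hwgroup */
	int no_unmask;           /* channel refuses IRQ unmasking */
	int locked;              /* models the global lock being held */
	unsigned long using_dma;
	unsigned long unmask;
};

/* Returns 0 with the lock held, or nonzero without taking it. */
static int fake_lock(struct fake_drive *d)
{
	if (d->busy)
		return -1;
	d->locked = 1;
	return 0;
}

static void fake_unlock(struct fake_drive *d)
{
	d->locked = 0;
}

int fake_ioctl(struct fake_drive *d, unsigned int cmd, unsigned long arg,
	       unsigned long *out)
{
	assert(d != NULL);

	/* Reads are refused for unprivileged callers too. */
	if (!d->privileged)
		return -EACCES;

	switch (cmd) {
	case CMD_GET_DMA:
		if (!out)
			return -EFAULT;
		*out = d->using_dma;
		return 0;

	case CMD_SET_DMA:
		if (arg > 1)
			return -EINVAL;
		if (fake_lock(d))
			return -EBUSY;
		d->using_dma = arg;
		fake_unlock(d);
		return 0;

	case CMD_SET_UNMASK:
		if (arg > 1)
			return -EINVAL;
		/* Every check that can fail runs before the lock is taken. */
		if (d->no_unmask)
			return -EIO;
		if (fake_lock(d))
			return -EBUSY;
		d->unmask = arg;
		fake_unlock(d);
		return 0;

	default:
		return -EINVAL;
	}
}
```

Putting the fallible checks ahead of the lock means no error return has to remember to unlock.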
diff -urN linux-2.5.15/drivers/ide/ide-cd.c linux/drivers/ide/ide-cd.c
--- linux-2.5.15/drivers/ide/ide-cd.c 2002-05-10 00:24:12.000000000 +0200
+++ linux/drivers/ide/ide-cd.c 2002-05-11 16:10:45.000000000 +0200
@@ -594,7 +594,7 @@
pc = (struct packet_command *) rq->special;
pc->stat = 1;
cdrom_end_request(drive, rq, 1);
- *startstop = ide_error (drive, "request sense failure", stat);
+ *startstop = ide_error (drive, rq, "request sense failure", stat);
return 1;
} else if (rq->flags & (REQ_PC | REQ_BLOCK_PC)) {
@@ -614,7 +614,7 @@
return 0;
} else if (!pc->quiet) {
/* Otherwise, print an error. */
- ide_dump_status(drive, "packet command error", stat);
+ ide_dump_status(drive, rq, "packet command error", stat);
}
/* Set the error flag and complete the request.
@@ -662,18 +662,18 @@
sense_key == DATA_PROTECT) {
/* No point in retrying after an illegal
request or data protect error.*/
- ide_dump_status (drive, "command error", stat);
+ ide_dump_status(drive, rq, "command error", stat);
cdrom_end_request(drive, rq, 0);
} else if (sense_key == MEDIUM_ERROR) {
/* No point in re-trying a zillion times on a bad
* sector. The error is not correctable at all.
*/
- ide_dump_status (drive, "media error (bad sector)", stat);
+ ide_dump_status(drive, rq, "media error (bad sector)", stat);
cdrom_end_request(drive, rq, 0);
} else if ((err & ~ABRT_ERR) != 0) {
/* Go to the default handler
for other errors. */
- *startstop = ide_error (drive, __FUNCTION__, stat);
+ *startstop = ide_error(drive, rq, __FUNCTION__, stat);
return 1;
} else if ((++rq->errors > ERROR_MAX)) {
/* We've racked up too many retries. Abort. */
@@ -732,7 +732,7 @@
struct cdrom_info *info = drive->driver_data;
/* Wait for the controller to be idle. */
- if (ide_wait_stat(&startstop, drive, 0, BUSY_STAT, WAIT_READY))
+ if (ide_wait_stat(&startstop, drive, rq, 0, BUSY_STAT, WAIT_READY))
return startstop;
if (info->dma) {
@@ -789,7 +789,7 @@
return startstop;
} else {
/* Otherwise, we must wait for DRQ to get set. */
- if (ide_wait_stat(&startstop, drive, DRQ_STAT, BUSY_STAT, WAIT_READY))
+ if (ide_wait_stat(&startstop, drive, rq, DRQ_STAT, BUSY_STAT, WAIT_READY))
return startstop;
}
@@ -917,7 +917,7 @@
__ide_end_request(drive, rq, 1, rq->nr_sectors);
return ide_stopped;
} else
- return ide_error (drive, "dma error", stat);
+ return ide_error (drive, rq, "dma error", stat);
}
/* Read the interrupt reason and the transfer length. */
@@ -1496,7 +1496,7 @@
*/
if (dma) {
if (dma_error)
- return ide_error(drive, "dma error", stat);
+ return ide_error(drive, rq, "dma error", stat);
__ide_end_request(drive, rq, 1, rq->nr_sectors);
return ide_stopped;
@@ -2659,12 +2659,6 @@
return nslots;
}
-static void ide_cdrom_add_settings(ide_drive_t *drive)
-{
- ide_add_setting(drive, "dsc_overlap",
- SETTING_RW, -1, -1, TYPE_BYTE, 0, 1, 1, 1, &drive->dsc_overlap, NULL);
-}
-
static
int ide_cdrom_setup(ide_drive_t *drive)
{
@@ -2798,7 +2792,7 @@
info->devinfo.handle = NULL;
return 1;
}
- ide_cdrom_add_settings(drive);
+
return 0;
}
diff -urN linux-2.5.15/drivers/ide/ide-disk.c linux/drivers/ide/ide-disk.c
--- linux-2.5.15/drivers/ide/ide-disk.c 2002-05-10 00:21:30.000000000 +0200
+++ linux/drivers/ide/ide-disk.c 2002-05-11 18:40:50.000000000 +0200
@@ -472,12 +472,6 @@
drive->nowerr = arg;
drive->bad_wstat = arg ? BAD_R_STAT : BAD_W_STAT;
- /* FIXME: I'm less then sure that we are under the global request lock here!
- */
-#if 0
- spin_unlock_irq(&ide_lock);
-#endif
-
return 0;
}
@@ -531,8 +525,10 @@
{
if (!drive->driver)
return -EPERM;
+
if (!drive->channel->XXX_udma)
return -EPERM;
+
if (arg == drive->queue_depth && drive->using_tcq)
return 0;
@@ -568,20 +564,6 @@
return (probe_lba_addressing(drive, arg));
}
-static void idedisk_add_settings(struct ata_device *drive)
-{
- struct hd_driveid *id = drive->id;
-
- ide_add_setting(drive, "address", SETTING_RW, HDIO_GET_ADDRESS, HDIO_SET_ADDRESS, TYPE_INTA, 0, 2, 1, 1, &drive->addressing, set_lba_addressing);
- ide_add_setting(drive, "multcount", id ? SETTING_RW : SETTING_READ, HDIO_GET_MULTCOUNT, HDIO_SET_MULTCOUNT, TYPE_BYTE, 0, id ? id->max_multsect : 0, 1, 1, &drive->mult_count, set_multcount);
- ide_add_setting(drive, "nowerr", SETTING_RW, HDIO_GET_NOWERR, HDIO_SET_NOWERR, TYPE_BYTE, 0, 1, 1, 1, &drive->nowerr, set_nowerr);
- ide_add_setting(drive, "wcache", SETTING_RW, HDIO_GET_WCACHE, HDIO_SET_WCACHE, TYPE_BYTE, 0, 1, 1, 1, &drive->wcache, write_cache);
- ide_add_setting(drive, "acoustic", SETTING_RW, HDIO_GET_ACOUSTIC, HDIO_SET_ACOUSTIC, TYPE_BYTE, 0, 254, 1, 1, &drive->acoustic, set_acoustic);
-#ifdef CONFIG_BLK_DEV_IDE_TCQ
- ide_add_setting(drive, "using_tcq", SETTING_RW, HDIO_GET_QDMA, HDIO_SET_QDMA, TYPE_BYTE, 0, IDE_MAX_TAG, 1, 1, &drive->using_tcq, set_using_tcq);
-#endif
-}
-
static int idedisk_suspend(struct device *dev, u32 state, u32 level)
{
struct ata_device *drive = dev->driver_data;
@@ -624,9 +606,6 @@
/* This is just a hook for the overall driver tree.
- *
- * FIXME: This is soon goig to replace the custom linked list games played up
- * to great extend between the different components of the IDE drivers.
*/
static struct device_driver idedisk_devdrv = {
@@ -783,8 +762,6 @@
sector_t set_max;
int drvid = -1;
- idedisk_add_settings(drive);
-
if (id == NULL)
return;
@@ -1022,6 +999,159 @@
return ide_unregister_subdriver(drive);
}
+static int idedisk_ioctl(struct ata_device *drive, struct inode *inode, struct file *file, unsigned int cmd, unsigned long arg)
+{
+ struct hd_driveid *id = drive->id;
+
+ switch (cmd) {
+ case HDIO_GET_ADDRESS: {
+ unsigned long val = drive->addressing;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+ return 0;
+ }
+
+ case HDIO_SET_ADDRESS: {
+ int val;
+
+ if (arg < 0 || arg > 2)
+ return -EINVAL;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = set_lba_addressing(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
+ }
+
+ case HDIO_GET_MULTCOUNT: {
+ unsigned long val = drive->mult_count & 0xFF;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+ return 0;
+ }
+
+ case HDIO_SET_MULTCOUNT: {
+ int val;
+
+ if (!id)
+ return -EBUSY;
+
+ if (arg < 0 || arg > id->max_multsect)
+ return -EINVAL;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = set_multcount(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
+ }
+
+ case HDIO_GET_NOWERR: {
+ unsigned long val = drive->nowerr;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+ return 0;
+ }
+
+ case HDIO_SET_NOWERR: {
+ int val;
+
+ if (arg < 0 || arg > 1)
+ return -EINVAL;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = set_nowerr(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
+ }
+
+ case HDIO_GET_WCACHE: {
+ unsigned long val = drive->wcache;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+ return 0;
+ }
+
+ case HDIO_SET_WCACHE: {
+ int val;
+
+ if (arg < 0 || arg > 1)
+ return -EINVAL;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = write_cache(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
+ }
+
+ case HDIO_GET_ACOUSTIC: {
+ unsigned long val = drive->acoustic;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+ return 0;
+ }
+
+ case HDIO_SET_ACOUSTIC: {
+ int val;
+
+ if (arg < 0 || arg > 254)
+ return -EINVAL;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = set_acoustic(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
+ }
+
+#ifdef CONFIG_BLK_DEV_IDE_TCQ
+ case HDIO_GET_QDMA: {
+ unsigned long val = drive->using_tcq;
+
+ if (put_user(val, (unsigned long *) arg))
+ return -EFAULT;
+ return 0;
+ }
+
+ case HDIO_SET_QDMA: {
+ int val;
+
+ if (arg < 0 || arg > IDE_MAX_TAG)
+ return -EINVAL;
+
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
+ val = set_using_tcq(drive, arg);
+ spin_unlock_irq(&ide_lock);
+
+ return val;
+ }
+#endif
+ default:
+ return -EINVAL;
+ }
+}
+
+
/*
* IDE subdriver functions, registered with ide.c
*/
@@ -1031,7 +1161,7 @@
standby: idedisk_standby,
do_request: idedisk_do_request,
end_request: NULL,
- ioctl: NULL,
+ ioctl: idedisk_ioctl,
open: idedisk_open,
release: idedisk_release,
check_media_change: idedisk_check_media_change,
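
With the ioctl hook now set to idedisk_ioctl, the default branch of the generic handler can pass unknown commands to the attached subdriver. The sketch below shows that delegation pattern in plain userspace C. struct fake_ops, struct fake_drive, disk_ioctl(), generic_ioctl() and the CMD_* numbers are hypothetical and only mirror the shape of ata_ops(drive)->ioctl. Returning -EINVAL when no hook exists is an assumption, since the corresponding line is outside the hunks shown.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical command numbers for illustration only. */
enum { CMD_GENERIC = 1, CMD_DISK_ONLY = 2 };

struct fake_drive;

/* Hypothetical subdriver operations table; only the ioctl hook is modelled. */
struct fake_ops {
	int (*ioctl)(struct fake_drive *d, unsigned int cmd, unsigned long arg);
};

struct fake_drive {
	const struct fake_ops *ops;  /* NULL when no subdriver is attached */
	unsigned long value;
};

/* Subdriver handler: knows only its own commands. */
static int disk_ioctl(struct fake_drive *d, unsigned int cmd, unsigned long arg)
{
	switch (cmd) {
	case CMD_DISK_ONLY:
		d->value = arg;
		return 0;
	default:
		return -EINVAL;
	}
}

const struct fake_ops disk_ops = { .ioctl = disk_ioctl };

/* Generic handler: common commands first, then the subdriver hook. */
int generic_ioctl(struct fake_drive *d, unsigned int cmd, unsigned long arg)
{
	assert(d != NULL);

	switch (cmd) {
	case CMD_GENERIC:
		return 0;
	default:
		if (d->ops && d->ops->ioctl)
			return d->ops->ioctl(d, cmd, arg);
		return -EINVAL;  /* assumed fallback when no hook is present */
	}
}
```

The generic switch stays limited to commands every device type shares, and each subdriver owns the range checks for its own settings.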
diff -urN linux-2.5.15/drivers/ide/ide-dma.c linux/drivers/ide/ide-dma.c
--- linux-2.5.15/drivers/ide/ide-dma.c 2002-05-10 00:22:44.000000000 +0200
+++ linux/drivers/ide/ide-dma.c 2002-05-11 14:57:37.000000000 +0200
@@ -208,7 +208,7 @@
printk(KERN_ERR "%s: dma_intr: bad DMA status (dma_stat=%x)\n",
drive->name, dma_stat);
}
- return ide_error(drive, "dma_intr", stat);
+ return ide_error(drive, rq, "dma_intr", stat);
}
/*
@@ -375,7 +375,7 @@
/*
* 1 dma-ing, 2 error, 4 intr
*/
-static int dma_timer_expiry(struct ata_device *drive, struct request *__rq)
+static int dma_timer_expiry(struct ata_device *drive, struct request *rq)
{
/* FIXME: What's that? */
u8 dma_stat = inb(drive->channel->dma_base+2);
@@ -390,7 +390,7 @@
if (dma_stat & 2) { /* ERROR */
u8 stat = GET_STAT();
- return ide_error(drive, "dma_timer_expiry", stat);
+ return ide_error(drive, rq, "dma_timer_expiry", stat);
}
if (dma_stat & 1) /* DMAing */
return WAIT_CMD;
diff -urN linux-2.5.15/drivers/ide/ide-features.c linux/drivers/ide/ide-features.c
--- linux-2.5.15/drivers/ide/ide-features.c 2002-05-10 00:25:17.000000000 +0200
+++ linux/drivers/ide/ide-features.c 2002-05-11 15:21:28.000000000 +0200
@@ -393,7 +393,7 @@
enable_irq(hwif->irq);
if (error) {
- ide_dump_status(drive, "set_drive_speed_status", stat);
+ ide_dump_status(drive, NULL, "set_drive_speed_status", stat);
return error;
}
diff -urN linux-2.5.15/drivers/ide/ide-floppy.c linux/drivers/ide/ide-floppy.c
--- linux-2.5.15/drivers/ide/ide-floppy.c 2002-05-10 00:22:51.000000000 +0200
+++ linux/drivers/ide/ide-floppy.c 2002-05-11 16:13:04.000000000 +0200
@@ -692,7 +692,7 @@
return 0;
}
rq->errors = error;
- ide_end_drive_cmd (drive, 0, 0);
+ ide_end_drive_cmd (drive, rq, 0, 0);
return 0;
}
@@ -1006,7 +1006,7 @@
idefloppy_floppy_t *floppy = drive->driver_data;
idefloppy_ireason_reg_t ireason;
- if (ide_wait_stat (&startstop,drive,DRQ_STAT,BUSY_STAT,WAIT_READY)) {
+ if (ide_wait_stat (&startstop, drive, rq, DRQ_STAT, BUSY_STAT, WAIT_READY)) {
printk (KERN_ERR "ide-floppy: Strange, packet command initiated yet DRQ isn't asserted\n");
return startstop;
}
@@ -1043,13 +1043,13 @@
return IDEFLOPPY_WAIT_CMD; /* Timeout for the packet command */
}
-static ide_startstop_t idefloppy_transfer_pc1(struct ata_device *drive, struct request *__rq)
+static ide_startstop_t idefloppy_transfer_pc1(struct ata_device *drive, struct request *rq)
{
idefloppy_floppy_t *floppy = drive->driver_data;
ide_startstop_t startstop;
idefloppy_ireason_reg_t ireason;
- if (ide_wait_stat (&startstop,drive,DRQ_STAT,BUSY_STAT,WAIT_READY)) {
+ if (ide_wait_stat(&startstop, drive, rq, DRQ_STAT, BUSY_STAT, WAIT_READY)) {
printk (KERN_ERR "ide-floppy: Strange, packet command initiated yet DRQ isn't asserted\n");
return startstop;
}
@@ -1960,14 +1960,6 @@
return 0;
}
-static void idefloppy_add_settings(ide_drive_t *drive)
-{
- ide_add_setting(drive, "bios_cyl", SETTING_RW, -1, -1, TYPE_INT, 0, 1023, 1, 1, &drive->bios_cyl, NULL);
- ide_add_setting(drive, "bios_head", SETTING_RW, -1, -1, TYPE_BYTE, 0, 255, 1, 1, &drive->bios_head, NULL);
- ide_add_setting(drive, "bios_sect", SETTING_RW, -1, -1, TYPE_BYTE, 0, 63, 1, 1, &drive->bios_sect, NULL);
-
-}
-
/*
* Driver initialization.
*/
@@ -2009,7 +2001,7 @@
}
(void) idefloppy_get_capacity (drive);
- idefloppy_add_settings(drive);
+
for (i = 0; i < MAX_DRIVES; ++i) {
struct ata_channel *hwif = drive->channel;
diff -urN linux-2.5.15/drivers/ide/ide-probe.c linux/drivers/ide/ide-probe.c
--- linux-2.5.15/drivers/ide/ide-probe.c 2002-05-10 00:21:27.000000000 +0200
+++ linux/drivers/ide/ide-probe.c 2002-05-11 18:26:41.000000000 +0200
@@ -628,9 +628,6 @@
return 1;
}
memset(hwgroup, 0, sizeof(*hwgroup));
- init_timer(&hwgroup->timer);
- hwgroup->timer.function = &ide_timer_expiry;
- hwgroup->timer.data = (unsigned long) hwgroup;
}
/*
@@ -659,6 +656,11 @@
* Everything is okay. Tag us as member of this hardware group.
*/
ch->hwgroup = hwgroup;
+
+ init_timer(&ch->timer);
+ ch->timer.function = &ide_timer_expiry;
+ ch->timer.data = (unsigned long) ch;
+
for (i = 0; i < MAX_DRIVES; ++i) {
struct ata_device *drive = &ch->drives[i];
request_queue_t *q;
@@ -667,8 +669,8 @@
if (!drive->present)
continue;
- if (!hwgroup->XXX_drive)
- hwgroup->XXX_drive = drive;
+ if (!ch->drive)
+ ch->drive = drive;
/*
* Init the per device request queue
@@ -842,7 +844,6 @@
for (unit = 0; unit < MAX_DRIVES; ++unit) {
char name[80];
- ide_add_generic_settings(ch->drives + unit);
ch->drives[unit].dn = ((ch->unit ? 2 : 0) + unit);
sprintf(name, "host%d/bus%d/target%d/lun%d",
ch->index, ch->unit, unit, ch->drives[unit].lun);
diff -urN linux-2.5.15/drivers/ide/ide-tape.c linux/drivers/ide/ide-tape.c
--- linux-2.5.15/drivers/ide/ide-tape.c 2002-05-10 00:22:28.000000000 +0200
+++ linux/drivers/ide/ide-tape.c 2002-05-11 16:11:49.000000000 +0200
@@ -1925,7 +1925,7 @@
idetape_increase_max_pipeline_stages (drive);
}
}
- ide_end_drive_cmd (drive, 0, 0);
+ ide_end_drive_cmd(drive, rq, 0, 0);
if (remove_stage)
idetape_remove_stage_head (drive);
if (tape->active_data_request == NULL)
@@ -2228,7 +2228,7 @@
* we will handle the next request.
*
*/
-static ide_startstop_t idetape_transfer_pc(struct ata_device *drive, struct request *__rq)
+static ide_startstop_t idetape_transfer_pc(struct ata_device *drive, struct request *rq)
{
idetape_tape_t *tape = drive->driver_data;
idetape_pc_t *pc = tape->pc;
@@ -2236,7 +2236,7 @@
int retries = 100;
ide_startstop_t startstop;
- if (ide_wait_stat (&startstop,drive,DRQ_STAT,BUSY_STAT,WAIT_READY)) {
+ if (ide_wait_stat(&startstop, drive, rq, DRQ_STAT, BUSY_STAT, WAIT_READY)) {
printk (KERN_ERR "ide-tape: Strange, packet command initiated yet DRQ isn't asserted\n");
return startstop;
}
@@ -5926,41 +5926,7 @@
tape->tape_block_size =( block_descrp->length[0]<<16) + (block_descrp->length[1]<<8) + block_descrp->length[2];
#if IDETAPE_DEBUG_INFO
printk (KERN_INFO "ide-tape: Adjusted block size - %d\n", tape->tape_block_size);
-#endif /* IDETAPE_DEBUG_INFO */
-}
-static void idetape_add_settings (ide_drive_t *drive)
-{
- idetape_tape_t *tape = drive->driver_data;
-
-/*
- * drive setting name read/write ioctl ioctl data type min max mul_factor div_factor data pointer set function
- */
- ide_add_setting(drive, "buffer", SETTING_READ, -1, -1, TYPE_SHORT, 0, 0xffff, 1, 2, &tape->capabilities.buffer_size, NULL);
- ide_add_setting(drive, "pipeline_min", SETTING_RW, -1, -1, TYPE_INT, 2, 0xffff, tape->stage_size / 1024, 1, &tape->min_pipeline, NULL);
- ide_add_setting(drive, "pipeline", SETTING_RW, -1, -1, TYPE_INT, 2, 0xffff, tape->stage_size / 1024, 1, &tape->max_stages, NULL);
- ide_add_setting(drive, "pipeline_max", SETTING_RW, -1, -1, TYPE_INT, 2, 0xffff, tape->stage_size / 1024, 1, &tape->max_pipeline, NULL);
- ide_add_setting(drive, "pipeline_used",SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, tape->stage_size / 1024, 1, &tape->nr_stages, NULL);
- ide_add_setting(drive, "pipeline_pending",SETTING_READ,-1, -1, TYPE_INT, 0, 0xffff, tape->stage_size / 1024, 1, &tape->nr_pending_stages, NULL);
- ide_add_setting(drive, "speed", SETTING_READ, -1, -1, TYPE_SHORT, 0, 0xffff, 1, 1, &tape->capabilities.speed, NULL);
- ide_add_setting(drive, "stage", SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1024, &tape->stage_size, NULL);
- ide_add_setting(drive, "tdsc", SETTING_RW, -1, -1, TYPE_INT, IDETAPE_DSC_RW_MIN, IDETAPE_DSC_RW_MAX, 1000, HZ, &tape->best_dsc_rw_frequency, NULL);
- ide_add_setting(drive, "dsc_overlap", SETTING_RW, -1, -1, TYPE_BYTE, 0, 1, 1, 1, &drive->dsc_overlap, NULL);
- ide_add_setting(drive, "pipeline_head_speed_c",SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->controlled_pipeline_head_speed, NULL);
- ide_add_setting(drive, "pipeline_head_speed_u",SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->uncontrolled_pipeline_head_speed, NULL);
- ide_add_setting(drive, "avg_speed", SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->avg_speed, NULL);
- ide_add_setting(drive, "debug_level",SETTING_RW, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->debug_level, NULL);
- if (tape->onstream) {
- ide_add_setting(drive, "cur_frames", SETTING_READ, -1, -1, TYPE_SHORT, 0, 0xffff, 1, 1, &tape->cur_frames, NULL);
- ide_add_setting(drive, "max_frames", SETTING_READ, -1, -1, TYPE_SHORT, 0, 0xffff, 1, 1, &tape->max_frames, NULL);
- ide_add_setting(drive, "insert_speed", SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->insert_speed, NULL);
- ide_add_setting(drive, "speed_control",SETTING_RW, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->speed_control, NULL);
- ide_add_setting(drive, "tape_still_time",SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->tape_still_time, NULL);
- ide_add_setting(drive, "max_insert_speed",SETTING_RW, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->max_insert_speed, NULL);
- ide_add_setting(drive, "insert_size", SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->insert_size, NULL);
- ide_add_setting(drive, "capacity", SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->capacity, NULL);
- ide_add_setting(drive, "first_frame", SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->first_frame_position, NULL);
- ide_add_setting(drive, "logical_blk", SETTING_READ, -1, -1, TYPE_INT, 0, 0xffff, 1, 1, &tape->logical_blk_num, NULL);
- }
+#endif
}
/*
@@ -6074,8 +6040,6 @@
drive->name, tape->name, tape->capabilities.speed, (tape->capabilities.buffer_size * 512) / tape->stage_size,
tape->stage_size / 1024, tape->max_stages * tape->stage_size / 1024,
tape->best_dsc_rw_frequency * 1000 / HZ, drive->using_dma ? ", DMA":"");
-
- idetape_add_settings(drive);
}
static int idetape_cleanup (ide_drive_t *drive)
diff -urN linux-2.5.15/drivers/ide/ide-taskfile.c linux/drivers/ide/ide-taskfile.c
--- linux-2.5.15/drivers/ide/ide-taskfile.c 2002-05-10 00:24:42.000000000 +0200
+++ linux/drivers/ide/ide-taskfile.c 2002-05-11 18:35:41.000000000 +0200
@@ -308,7 +308,7 @@
struct ata_taskfile *args = rq->special;
ide_startstop_t startstop;
- if (ide_wait_stat(&startstop, drive, DATA_READY, drive->bad_wstat, WAIT_DRQ))
+ if (ide_wait_stat(&startstop, drive, rq, DATA_READY, drive->bad_wstat, WAIT_DRQ))
return startstop;
ata_poll_drive_ready(drive);
@@ -329,7 +329,7 @@
*/
if (!rq->nr_sectors) {
if (stat & (ERR_STAT|DRQ_STAT)) {
- startstop = ide_error(drive, "task_mulout_intr", stat);
+ startstop = ide_error(drive, rq, "task_mulout_intr", stat);
return startstop;
}
@@ -342,7 +342,7 @@
if (!OK_STAT(stat, DATA_READY, BAD_R_STAT)) {
if (stat & (ERR_STAT | DRQ_STAT)) {
- startstop = ide_error(drive, "task_mulout_intr", stat);
+ startstop = ide_error(drive, rq, "task_mulout_intr", stat);
return startstop;
}
@@ -489,12 +489,12 @@
/*
* This is invoked on completion of a WIN_RESTORE (recalibrate) cmd.
*/
-ide_startstop_t recal_intr(struct ata_device *drive, struct request *__rq)
+ide_startstop_t recal_intr(struct ata_device *drive, struct request *rq)
{
u8 stat;
if (!OK_STAT(stat = GET_STAT(),READY_STAT,BAD_STAT))
- return ide_error(drive, "recal_intr", stat);
+ return ide_error(drive, rq, "recal_intr", stat);
return ide_stopped;
}
@@ -511,11 +511,11 @@
if (!OK_STAT(stat = GET_STAT(), READY_STAT, BAD_STAT)) {
/* Keep quiet for NOP because it is expected to fail. */
if (args && args->taskfile.command != WIN_NOP)
- return ide_error(drive, "task_no_data_intr", stat);
+ return ide_error(drive, rq, "task_no_data_intr", stat);
}
if (args)
- ide_end_drive_cmd (drive, stat, GET_ERR());
+ ide_end_drive_cmd(drive, rq, stat, GET_ERR());
return ide_stopped;
}
@@ -523,7 +523,7 @@
/*
* Handler for command with PIO data-in phase
*/
-static ide_startstop_t task_in_intr (struct ata_device *drive, struct request *rq)
+static ide_startstop_t task_in_intr(struct ata_device *drive, struct request *rq)
{
u8 stat = GET_STAT();
char *pBuf = NULL;
@@ -531,7 +531,7 @@
if (!OK_STAT(stat,DATA_READY,BAD_R_STAT)) {
if (stat & (ERR_STAT|DRQ_STAT)) {
- return ide_error(drive, "task_in_intr", stat);
+ return ide_error(drive, rq, "task_in_intr", stat);
}
if (!(stat & BUSY_STAT)) {
DTF("task_in_intr to Soon wait for next interrupt\n");
@@ -569,7 +569,7 @@
struct ata_taskfile *args = rq->special;
ide_startstop_t startstop;
- if (ide_wait_stat(&startstop, drive, DATA_READY, drive->bad_wstat, WAIT_DRQ)) {
+ if (ide_wait_stat(&startstop, drive, rq, DATA_READY, drive->bad_wstat, WAIT_DRQ)) {
printk(KERN_ERR "%s: no DRQ after issuing %s\n", drive->name, drive->mult_count ? "MULTWRITE" : "WRITE");
return startstop;
}
@@ -600,7 +600,7 @@
unsigned long flags;
if (!OK_STAT(stat,DRIVE_READY,drive->bad_wstat))
- return ide_error(drive, "task_out_intr", stat);
+ return ide_error(drive, rq, "task_out_intr", stat);
if (!rq->current_nr_sectors)
if (!ide_end_request(drive, rq, 1))
@@ -632,7 +632,7 @@
if (!OK_STAT(stat = GET_STAT(),DATA_READY,BAD_R_STAT)) {
if (stat & (ERR_STAT|DRQ_STAT)) {
- return ide_error(drive, "task_mulin_intr", stat);
+ return ide_error(drive, rq, "task_mulin_intr", stat);
}
/* no data yet, so wait for another interrupt */
ide_set_handler(drive, task_mulin_intr, WAIT_CMD, NULL);
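
Most hunks in this patch do the same mechanical thing: ide_error(), ide_dump_status(), ide_wait_stat() and ide_end_drive_cmd() gain an explicit struct request * argument, and callers with no request in flight pass NULL. The sketch below contrasts the two calling conventions in userspace C. All names (fake_request, fake_group, report_error_implicit(), report_error_explicit()) are hypothetical, and treating NULL as "no request context" is an assumption drawn from the NULL call sites in the patch.

```c
#include <assert.h>
#include <stddef.h>

struct fake_request { int errors; };

/* Shared per-group slot holding "the current request". */
struct fake_group { struct fake_request *rq; };

struct fake_drive { struct fake_group *group; };

/* Older shape: the helper digs the request out of shared state. */
int report_error_implicit(struct fake_drive *d)
{
	struct fake_request *rq = d->group->rq;

	if (!rq)
		return -1;
	return ++rq->errors;
}

/* Newer shape: the caller states which request the error belongs to. */
int report_error_explicit(struct fake_drive *d, struct fake_request *rq)
{
	assert(d != NULL);

	if (!rq)	/* assumed meaning: no request in flight */
		return -1;
	return ++rq->errors;
}
```

With the explicit form the helper no longer depends on what the shared slot happens to hold when it runs, which matters once more than one request can be outstanding.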
diff -urN linux-2.5.15/drivers/ide/pdc4030.c linux/drivers/ide/pdc4030.c
--- linux-2.5.15/drivers/ide/pdc4030.c 2002-05-10 00:22:49.000000000 +0200
+++ linux/drivers/ide/pdc4030.c 2002-05-11 15:38:26.000000000 +0200
@@ -185,7 +185,7 @@
if (pdc4030_cmd(drive,PROMISE_GET_CONFIG)) {
return 0;
}
- if (ide_wait_stat(&startstop, drive,DATA_READY,BAD_W_STAT,WAIT_DRQ)) {
+ if (ide_wait_stat(&startstop, drive, NULL, DATA_READY,BAD_W_STAT,WAIT_DRQ)) {
printk(KERN_INFO
"%s: Failed Promise read config!\n",hwif->name);
return 0;
@@ -309,14 +309,14 @@
*/
static ide_startstop_t promise_read_intr(struct ata_device *drive, struct request *rq)
{
- byte stat;
+ u8 stat;
int total_remaining;
unsigned int sectors_left, sectors_avail, nsect;
unsigned long flags;
char *to;
if (!OK_STAT(stat=GET_STAT(),DATA_READY,BAD_R_STAT)) {
- return ide_error(drive, "promise_read_intr", stat);
+ return ide_error(drive, rq, "promise_read_intr", stat);
}
read_again:
@@ -348,17 +348,18 @@
if ((rq->current_nr_sectors -= nsect) <= 0) {
ide_end_request(drive, rq, 1);
}
-/*
- * Now the data has been read in, do the following:
- *
- * if there are still sectors left in the request,
- * if we know there are still sectors available from the interface,
- * go back and read the next bit of the request.
- * else if DRQ is asserted, there are more sectors available, so
- * go back and find out how many, then read them in.
- * else if BUSY is asserted, we are going to get an interrupt, so
- * set the handler for the interrupt and just return
- */
+
+ /*
+ * Now the data has been read in. If sectors are left in the request:
+ *
+ * - if we know sectors are still available from the interface, go back
+ * and read the next bit of the request;
+ * - else if DRQ is asserted, more sectors are available, so go back,
+ * find out how many, and read them in;
+ * - else if BUSY is asserted, we are going to get an interrupt, so set
+ * the handler for the interrupt and just return.
+ */
+
if (total_remaining > 0) {
if (sectors_avail)
goto read_next;
@@ -375,7 +376,7 @@
}
printk(KERN_ERR "%s: Eeek! promise_read_intr: sectors left "
"!DRQ !BUSY\n", drive->name);
- return ide_error(drive, "promise read intr", stat);
+ return ide_error(drive, rq, "promise read intr", stat);
}
return ide_stopped;
}
@@ -400,7 +401,7 @@
ch->poll_timeout = 0;
printk(KERN_ERR "%s: completion timeout - still busy!\n",
drive->name);
- return ide_error(drive, "busy timeout", GET_STAT());
+ return ide_error(drive, rq, "busy timeout", GET_STAT());
}
ch->poll_timeout = 0;
@@ -478,7 +479,7 @@
}
ch->poll_timeout = 0;
printk(KERN_ERR "%s: write timed out!\n",drive->name);
- return ide_error(drive, "write timeout", GET_STAT());
+ return ide_error(drive, rq, "write timeout", GET_STAT());
}
/*
@@ -613,7 +614,7 @@
* call the promise_write function to deal with writing the data out
* NOTE: No interrupts are generated on writes. Write completion must be polled
*/
- if (ide_wait_stat(&startstop, drive, DATA_READY, drive->bad_wstat, WAIT_DRQ)) {
+ if (ide_wait_stat(&startstop, drive, rq, DATA_READY, drive->bad_wstat, WAIT_DRQ)) {
printk(KERN_ERR "%s: no DRQ after issuing "
"PROMISE_WRITE\n", drive->name);
return startstop;
diff -urN linux-2.5.15/drivers/ide/tcq.c linux/drivers/ide/tcq.c
--- linux-2.5.15/drivers/ide/tcq.c 2002-05-10 00:25:27.000000000 +0200
+++ linux/drivers/ide/tcq.c 2002-05-11 18:39:23.000000000 +0200
@@ -52,7 +52,7 @@
#undef IDE_TCQ_FIDDLE_SI
static ide_startstop_t ide_dmaq_intr(struct ata_device *drive, struct request *rq);
-static ide_startstop_t service(struct ata_device *drive);
+static ide_startstop_t service(struct ata_device *drive, struct request *rq);
static inline void drive_ctl_nien(struct ata_device *drive, int set)
{
@@ -70,7 +70,7 @@
struct ata_taskfile *args = rq->special;
ide__sti();
- ide_end_drive_cmd(drive, GET_STAT(), GET_ERR());
+ ide_end_drive_cmd(drive, rq, GET_STAT(), GET_ERR());
kfree(args);
return ide_stopped;
}
@@ -82,7 +82,8 @@
*/
static void tcq_invalidate_queue(struct ata_device *drive)
{
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
+ struct ata_channel *ch = drive->channel;
+ ide_hwgroup_t *hwgroup = ch->hwgroup;
request_queue_t *q = &drive->queue;
struct ata_taskfile *args;
struct request *rq;
@@ -92,7 +93,7 @@
spin_lock_irqsave(&ide_lock, flags);
- del_timer(&hwgroup->timer);
+ del_timer(&ch->timer);
if (test_bit(IDE_DMA, &hwgroup->flags))
udma_stop(drive);
@@ -169,7 +170,7 @@
* if pending commands, try service before giving up
*/
if (ata_pending_commands(drive) && (GET_STAT() & SERVICE_STAT))
- if (service(drive) == ide_started)
+ if (service(drive, hwgroup->rq) == ide_started)
return;
if (drive)
@@ -178,6 +179,7 @@
static void set_irq(struct ata_device *drive, ata_handler_t *handler)
{
+ struct ata_channel *ch = drive->channel;
ide_hwgroup_t *hwgroup = HWGROUP(drive);
unsigned long flags;
@@ -186,10 +188,14 @@
/*
* always just bump the timer for now, the timeout handling will
* have to be changed to be per-command
+ *
+ * FIXME: Jens - this is broken; it will interfere with
+ * the normal timer function on serialized drives!
*/
- hwgroup->timer.function = ata_tcq_irq_timeout;
- hwgroup->timer.data = (unsigned long) hwgroup->XXX_drive;
- mod_timer(&hwgroup->timer, jiffies + 5 * HZ);
+
+ ch->timer.function = ata_tcq_irq_timeout;
+ ch->timer.data = (unsigned long) ch->drive;
+ mod_timer(&ch->timer, jiffies + 5 * HZ);
hwgroup->handler = handler;
spin_unlock_irqrestore(&ide_lock, flags);
@@ -223,9 +229,8 @@
*
* Also, nIEN must be set as not to need protection against ide_dmaq_intr
*/
-static ide_startstop_t service(struct ata_device *drive)
+static ide_startstop_t service(struct ata_device *drive, struct request *rq)
{
- struct request *rq;
u8 feat;
u8 stat;
int tag;
@@ -242,7 +247,7 @@
/*
* need to select the right drive first...
*/
- if (drive != HWGROUP(drive)->XXX_drive) {
+ if (drive != drive->channel->drive) {
SELECT_DRIVE(drive->channel, drive);
udelay(10);
}
@@ -256,8 +261,9 @@
if (wait_altstat(drive, &stat, BUSY_STAT)) {
printk(KERN_ERR"%s: BUSY clear took too long\n", __FUNCTION__);
- ide_dump_status(drive, __FUNCTION__, stat);
+ ide_dump_status(drive, rq, __FUNCTION__, stat);
tcq_invalidate_queue(drive);
+
return ide_stopped;
}
@@ -267,8 +273,9 @@
* FIXME, invalidate queue
*/
if (stat & ERR_STAT) {
- ide_dump_status(drive, __FUNCTION__, stat);
+ ide_dump_status(drive, rq, __FUNCTION__, stat);
tcq_invalidate_queue(drive);
+
return ide_stopped;
}
@@ -301,7 +308,7 @@
return udma_tcq_start(drive, rq);
}
-static ide_startstop_t check_service(struct ata_device *drive)
+static ide_startstop_t check_service(struct ata_device *drive, struct request *rq)
{
u8 stat;
@@ -311,7 +318,7 @@
return ide_stopped;
if ((stat = GET_STAT()) & SERVICE_STAT)
- return service(drive);
+ return service(drive, rq);
/*
* we have pending commands, wait for interrupt
@@ -335,8 +342,9 @@
*/
if (unlikely(!OK_STAT(stat, READY_STAT, drive->bad_wstat | DRQ_STAT))) {
printk(KERN_ERR "%s: %s: error status %x\n", __FUNCTION__, drive->name,stat);
- ide_dump_status(drive, __FUNCTION__, stat);
+ ide_dump_status(drive, rq, __FUNCTION__, stat);
tcq_invalidate_queue(drive);
+
return ide_stopped;
}
@@ -349,7 +357,7 @@
/*
* we completed this command, check if we can service a new command
*/
- return check_service(drive);
+ return check_service(drive, rq);
}
/*
@@ -380,11 +388,11 @@
*/
if (stat & SERVICE_STAT) {
TCQ_PRINTK("%s: SERV (stat=%x)\n", __FUNCTION__, stat);
- return service(drive);
+ return service(drive, rq);
}
printk("%s: stat=%x, not expected\n", __FUNCTION__, stat);
- return check_service(drive);
+ return check_service(drive, rq);
}
/*
@@ -558,7 +566,7 @@
OUT_BYTE(args->taskfile.command, IDE_COMMAND_REG);
if (wait_altstat(drive, &stat, BUSY_STAT)) {
- ide_dump_status(drive, "queued start", stat);
+ ide_dump_status(drive, rq, "queued start", stat);
tcq_invalidate_queue(drive);
return ide_stopped;
}
@@ -566,7 +574,7 @@
drive_ctl_nien(drive, 0);
if (stat & ERR_STAT) {
- ide_dump_status(drive, "tcq_start", stat);
+ ide_dump_status(drive, rq, "tcq_start", stat);
return ide_stopped;
}
@@ -582,7 +590,7 @@
TCQ_PRINTK("REL in queued_start\n");
if ((stat = GET_STAT()) & SERVICE_STAT)
- return service(drive);
+ return service(drive, rq);
return ide_released;
}
diff -urN linux-2.5.15/drivers/scsi/ide-scsi.c linux/drivers/scsi/ide-scsi.c
--- linux-2.5.15/drivers/scsi/ide-scsi.c 2002-05-10 00:21:39.000000000 +0200
+++ linux/drivers/scsi/ide-scsi.c 2002-05-11 16:13:40.000000000 +0200
@@ -271,7 +271,7 @@
ide_end_request(drive, rq, uptodate);
return 0;
}
- ide_end_drive_cmd (drive, 0, 0);
+ ide_end_drive_cmd(drive, rq, 0, 0);
if (rq->errors >= ERROR_MAX) {
pc->scsi_cmd->result = DID_ERROR << 16;
if (log)
@@ -401,7 +401,7 @@
byte ireason;
ide_startstop_t startstop;
- if (ide_wait_stat (&startstop,drive,DRQ_STAT,BUSY_STAT,WAIT_READY)) {
+ if (ide_wait_stat(&startstop, drive, rq, DRQ_STAT, BUSY_STAT, WAIT_READY)) {
printk (KERN_ERR "ide-scsi: Strange, packet command initiated yet DRQ isn't asserted\n");
return startstop;
}
@@ -489,20 +489,6 @@
static ide_drive_t *idescsi_drives[MAX_HWIFS * MAX_DRIVES];
static int idescsi_initialized = 0;
-static void idescsi_add_settings(ide_drive_t *drive)
-{
- idescsi_scsi_t *scsi = drive->driver_data;
-
-/*
- * drive setting name read/write ioctl ioctl data type min max mul_factor div_factor data pointer set function
- */
- ide_add_setting(drive, "bios_cyl", SETTING_RW, -1, -1, TYPE_INT, 0, 1023, 1, 1, &drive->bios_cyl, NULL);
- ide_add_setting(drive, "bios_head", SETTING_RW, -1, -1, TYPE_BYTE, 0, 255, 1, 1, &drive->bios_head, NULL);
- ide_add_setting(drive, "bios_sect", SETTING_RW, -1, -1, TYPE_BYTE, 0, 63, 1, 1, &drive->bios_sect, NULL);
- ide_add_setting(drive, "transform", SETTING_RW, -1, -1, TYPE_INT, 0, 3, 1, 1, &scsi->transform, NULL);
- ide_add_setting(drive, "log", SETTING_RW, -1, -1, TYPE_INT, 0, 1, 1, 1, &scsi->log, NULL);
-}
-
/*
* Driver initialization.
*/
@@ -521,8 +507,7 @@
clear_bit(IDESCSI_SG_TRANSFORM, &scsi->transform);
#if IDESCSI_DEBUG_LOG
set_bit(IDESCSI_LOG_CMD, &scsi->log);
-#endif /* IDESCSI_DEBUG_LOG */
- idescsi_add_settings(drive);
+#endif
}
static int idescsi_cleanup (ide_drive_t *drive)
diff -urN linux-2.5.15/include/linux/ide.h linux/include/linux/ide.h
--- linux-2.5.15/include/linux/ide.h 2002-05-10 00:22:45.000000000 +0200
+++ linux/include/linux/ide.h 2002-05-11 18:34:47.000000000 +0200
@@ -264,8 +264,6 @@
#define ATA_SCSI 0x21
#define ATA_NO_LUN 0x7f
-struct ide_settings_s;
-
typedef union {
unsigned all : 8; /* all of the bits together */
struct {
@@ -329,8 +327,6 @@
unsigned long sleep; /* sleep until this time */
- u8 XXX_tune_req; /* requested drive tuning setting */
-
byte using_dma; /* disk is using dma for read/write */
byte using_tcq; /* disk is using queueing */
byte retry_pio; /* retrying dma capable host in pio */
@@ -379,7 +375,6 @@
void *driver_data; /* extra driver data */
devfs_handle_t de; /* directory for device */
- struct ide_settings_s *settings; /* ioctl entires */
char driver_req[10]; /* requests specific driver */
int last_lun; /* last logical unit */
@@ -418,6 +413,9 @@
int unit; /* channel number */
struct hwgroup_s *hwgroup; /* actually (ide_hwgroup_t *) */
+ struct timer_list timer; /* failsafe timer */
+ int (*expiry)(struct ata_device *, struct request *); /* irq handler, if active */
+ struct ata_device *drive; /* last serviced drive */
ide_ioreg_t io_ports[IDE_NR_PORTS]; /* task file registers */
hw_regs_t hw; /* Hardware info */
@@ -569,50 +567,10 @@
*/
ide_startstop_t (*handler)(struct ata_device *, struct request *); /* irq handler, if active */
unsigned long flags; /* BUSY, SLEEPING */
- struct ata_device *XXX_drive; /* current drive */
struct request *rq; /* current request */
- struct timer_list timer; /* failsafe timer */
- int (*expiry)(struct ata_device *, struct request *); /* irq handler, if active */
} ide_hwgroup_t;
-/* structure attached to the request for IDE_TASK_CMDS */
-
-/*
- * configurable drive settings
- */
-
-#define TYPE_INT 0
-#define TYPE_INTA 1
-#define TYPE_BYTE 2
-#define TYPE_SHORT 3
-
-#define SETTING_READ (1 << 0)
-#define SETTING_WRITE (1 << 1)
-#define SETTING_RW (SETTING_READ | SETTING_WRITE)
-
-typedef int (ide_procset_t)(struct ata_device *, int);
-typedef struct ide_settings_s {
- char *name;
- int rw;
- int read_ioctl;
- int write_ioctl;
- int data_type;
- int min;
- int max;
- int mul_factor;
- int div_factor;
- void *data;
- ide_procset_t *set;
- int auto_remove;
- struct ide_settings_s *next;
-} ide_settings_t;
-
-extern void ide_add_setting(struct ata_device *, const char *, int, int, int, int, int, int, int, int, void *, ide_procset_t *);
-extern void ide_remove_setting(struct ata_device *, char *);
-extern int ide_read_setting(struct ata_device *, ide_settings_t *);
-extern int ide_write_setting(struct ata_device *, ide_settings_t *, int);
-extern void ide_add_generic_settings(struct ata_device *);
-
+/* FIXME: kill this as soon as possible */
#define PROC_IDE_READ_RETURN(page,start,off,count,eof,len) return 0;
/*
@@ -683,13 +641,10 @@
/*
* Error reporting, in human readable form (luxurious, but a memory hog).
*/
-extern byte ide_dump_status(struct ata_device *, const char *, byte);
+extern u8 ide_dump_status(struct ata_device *, struct request *rq, const char *, u8);
-/*
- * ide_error() takes action based on the error returned by the controller.
- * The caller should return immediately after invoking this.
- */
-extern ide_startstop_t ide_error(struct ata_device *, const char *, byte);
+extern ide_startstop_t ide_error(struct ata_device *, struct request *rq,
+ const char *, byte);
/*
* Issue a simple drive command
@@ -713,7 +668,9 @@
* caller should return the updated value of "startstop" in this case.
* "startstop" is unchanged when the function returns 0;
*/
-extern int ide_wait_stat(ide_startstop_t *, struct ata_device *, byte, byte, unsigned long);
+extern int ide_wait_stat(ide_startstop_t *,
+ struct ata_device *, struct request *rq,
+ byte, byte, unsigned long);
extern int ide_wait_noerr(struct ata_device *, byte, byte, unsigned long);
@@ -759,7 +716,7 @@
/*
* Clean up after success/failure of an explicit drive cmd.
*/
-extern void ide_end_drive_cmd(struct ata_device *, byte, byte);
+extern void ide_end_drive_cmd(struct ata_device *, struct request *, u8, u8);
struct ata_taskfile {
struct hd_drive_task_hdr taskfile;
* Re: [PATCH] 2.5.15 IDE 60
2002-05-11 16:59 ` [PATCH] 2.5.15 IDE 60 Martin Dalecki
@ 2002-05-11 18:47 ` Pierre Rousselet
2002-05-11 19:12 ` Andre Hedrick
2002-05-12 19:19 ` pdc202xx.c fails to compile in 2.5.15 Zlatko Calusic
1 sibling, 1 reply; 265+ messages in thread
From: Pierre Rousselet @ 2002-05-11 18:47 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Kernel Mailing List
Martin Dalecki wrote:
> Fri May 10 16:17:01 CEST 2002 ide-clean-60
>
> Synchronize with 2.5.15
>
> - Rewrite ioctl handling.
>
> - Apply fix for hpt366 "hang on boot" by Andre.
No, it doesn't fix it for me.
--
Pierre
------------------------------------------------
Pierre Rousselet <pierre.rousselet@wanadoo.fr>
------------------------------------------------
* Re: [PATCH] 2.5.15 IDE 60
2002-05-11 18:47 ` Pierre Rousselet
@ 2002-05-11 19:12 ` Andre Hedrick
2002-05-11 19:52 ` Pierre Rousselet
0 siblings, 1 reply; 265+ messages in thread
From: Andre Hedrick @ 2002-05-11 19:12 UTC (permalink / raw)
To: Pierre Rousselet; +Cc: Martin Dalecki, Kernel Mailing List
You have to specify which of the 6 revisions of the chipset you have.
Also in some cases which of the 13 sub-revisions, and the latter is
determined by the sub-vendor-device.
On Sat, 11 May 2002, Pierre Rousselet wrote:
> Martin Dalecki wrote:
> > Fri May 10 16:17:01 CEST 2002 ide-clean-60
> >
> > Synchronize with 2.5.15
> >
> > - Rewrite ioctl handling.
> >
> > - Apply fix for hpt366 "hang on boot" by Andre.
>
> No, it doesn't fix it for me.
>
> --
> Pierre
> ------------------------------------------------
> Pierre Rousselet <pierre.rousselet@wanadoo.fr>
> ------------------------------------------------
Andre Hedrick
LAD Storage Consulting Group
* Re: [PATCH] 2.5.15 IDE 60
2002-05-11 19:12 ` Andre Hedrick
@ 2002-05-11 19:52 ` Pierre Rousselet
2002-05-11 23:48 ` Andre Hedrick
0 siblings, 1 reply; 265+ messages in thread
From: Pierre Rousselet @ 2002-05-11 19:52 UTC (permalink / raw)
To: Andre Hedrick; +Cc: Martin Dalecki, Kernel Mailing List
Andre Hedrick wrote:
> You have to specify which of the 6 revisions of the chipset you have.
> Also in some cases which of the 13 sub-revisions, and the latter is
> determined by the sub-vendor-device.
hde is ST310212A UDMA(66), hdg is SAMSUNG SV0322A UDMA(33) (motherboard
BE6).
# lspci -v gives (2.5.14 PCI_NAMES not set) :
00:13.0 Unknown mass storage controller: Triones Technologies, Inc.
HPT366 (rev 01)
Flags: bus master, medium devsel, latency 120, IRQ 11
I/O ports at cc00 [size=8]
I/O ports at d000 [size=4]
I/O ports at d400 [size=256]
Expansion ROM at <unassigned> [disabled] [size=128K]
00:13.1 Unknown mass storage controller: Triones Technologies, Inc.
HPT366 (rev 01)
Flags: bus master, medium devsel, latency 120, IRQ 11
I/O ports at d800 [size=8]
I/O ports at dc00 [size=4]
I/O ports at e000 [size=256]
# scanpci -v:
pci bus 0x0000 cardnum 0x13 function 0x00: vendor 0x1103 device 0x0004
Device unknown
STATUS 0x0200 COMMAND 0x0005
CLASS 0x01 0x80 0x00 REVISION 0x01
BIST 0x00 HEADER 0x80 LATENCY 0x78 CACHE 0x08
BASE0 0x0000cc01 addr 0x0000cc00 I/O
BASE1 0x0000d001 addr 0x0000d000 I/O
BASE4 0x0000d401 addr 0x0000d400 I/O
MAX_LAT 0x08 MIN_GNT 0x08 INT_PIN 0x01 INT_LINE 0x0b
BYTE_0 0x10c9a731 BYTE_1 0x00 BYTE_2 0x80736b0 BYTE_3 0xffffffff
pci bus 0x0000 cardnum 0x13 function 0x01: vendor 0x1103 device 0x0004
Device unknown
STATUS 0x0200 COMMAND 0x0007
CLASS 0x01 0x80 0x00 REVISION 0x01
BIST 0x00 HEADER 0x80 LATENCY 0x78 CACHE 0x08
BASE0 0x0000d801 addr 0x0000d800 I/O
BASE1 0x0000dc01 addr 0x0000dc00 I/O
BASE4 0x0000e001 addr 0x0000e000 I/O
MAX_LAT 0x08 MIN_GNT 0x08 INT_PIN 0x02 INT_LINE 0x0b
BYTE_0 0x10caa731 BYTE_1 0x00 BYTE_2 0x8073a28 BYTE_3 0xffffffff
# cat /proc/ide/hpt366 :
HighPoint HPT366/368/370
Controller: 0
Chipset: HPT366
--------------- Primary Channel --------------- Secondary Channel
--------------
Enabled: yes yes
--------------- drive0 --------- drive1 ------- drive0 ---------- drive1
-------
DMA capable: yes no no no
Mode: UDMA off off off
Controller: 1
Chipset: HPT366
--------------- Primary Channel --------------- Secondary Channel
--------------
Enabled: yes yes
--------------- drive0 --------- drive1 ------- drive0 ---------- drive1
-------
DMA capable: yes no no no
Mode: UDMA off off off
--
Pierre
------------------------------------------------
Pierre Rousselet <pierre.rousselet@wanadoo.fr>
------------------------------------------------
* Re: [PATCH] 2.5.15 IDE 60
2002-05-11 19:52 ` Pierre Rousselet
@ 2002-05-11 23:48 ` Andre Hedrick
0 siblings, 0 replies; 265+ messages in thread
From: Andre Hedrick @ 2002-05-11 23:48 UTC (permalink / raw)
To: Pierre Rousselet; +Cc: Martin Dalecki, Kernel Mailing List
I am sorry I can not do much for 2.5 at the moment.
Please revert to 2.4.19-pre7 plus my mondo patch.
bp6:~ # uname -a
Linux bp6 2.4.19-pre7 #1 SMP Tue May 7 12:57:46 PDT 2002 i686 unknown
00:13.0 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
Flags: bus master, medium devsel, latency 120, IRQ 18
I/O ports at d800 [size=8]
I/O ports at dc00 [size=4]
I/O ports at e000 [size=256]
Expansion ROM at ea000000 [disabled] [size=128K]
00:13.1 Unknown mass storage controller: Triones Technologies, Inc. HPT366 (rev 01)
Flags: bus master, medium devsel, latency 120, IRQ 18
I/O ports at e400 [size=8]
I/O ports at e800 [size=4]
I/O ports at ec00 [size=256]
HPT366: onboard version of chipset, pin1=1 pin2=2
HPT366: IDE controller on PCI bus 00 dev 98
PCI: Enabling device 00:13.0 (0005 -> 0007)
HPT366: chipset revision 1
HPT366: not 100% native mode: will probe irqs later
ide0: BM-DMA at 0xe000-0xe007, BIOS settings: hda:DMA, hdb:pio
HPT366: IDE controller on PCI bus 00 dev 99
HPT366: chipset revision 1
HPT366: not 100% native mode: will probe irqs later
ide1: BM-DMA at 0xec00-0xec07, BIOS settings: hdc:DMA, hdd:pio
hda: DupliDisk IDE RAID-1 Adapter( 1.19), ATA DISK drive
hdc: QUANTUM FIREBALLP KA13.6, ATA DISK drive
ide2: ports already in use, skipping probe
ide0 at 0xd800-0xd807,0xdc02 on irq 18
ide1 at 0xe400-0xe407,0xe802 on irq 18
hda: host protected area => 1
hda: setmax LBA 18041184, native 18039168
hda: 18039168 sectors (9236 MB) w/371KiB Cache, CHS=17896/16/63, UDMA(66)
hdc: host protected area => 1
hdc: 27068420 sectors (13859 MB) w/371KiB Cache, CHS=26853/16/63, UDMA(66)
ide-floppy driver 0.99.newide
Partition check:
/dev/ide/host0/bus0/target0/lun0: p1 p2 p3 < p5 p6 p7 p8 >
/dev/ide/host1/bus0/target0/lun0: unknown partition table
Floppy drive(s): fd0 is 1.44M
FDC 0 is a post-1991 82077
RAMDISK driver initialized: 16 RAM disks of 64000K size 1024 blocksize
loop: loaded (max 8 devices)
If it still fails it is a driver-chipset problem.
If it works in 2.4 but fails in 2.5 gawd only knows what is the issue.
On Sat, 11 May 2002, Pierre Rousselet wrote:
> Andre Hedrick wrote:
> > You have to specify which of the 6 revisions of the chipset you have.
> > Also in some cases which of the 13 sub-revisions, and the latter is
> > determined by the sub-vendor-device.
>
> hde is ST310212A UDMA(66), hdg is SAMSUNG SV0322A UDMA(33) (motherboard
> BE6).
>
> # lspci -v gives (2.5.14 PCI_NAMES not set) :
>
> 00:13.0 Unknown mass storage controller: Triones Technologies, Inc.
> HPT366 (rev 01)
> Flags: bus master, medium devsel, latency 120, IRQ 11
> I/O ports at cc00 [size=8]
> I/O ports at d000 [size=4]
> I/O ports at d400 [size=256]
> Expansion ROM at <unassigned> [disabled] [size=128K]
>
> 00:13.1 Unknown mass storage controller: Triones Technologies, Inc.
> HPT366 (rev 01)
> Flags: bus master, medium devsel, latency 120, IRQ 11
> I/O ports at d800 [size=8]
> I/O ports at dc00 [size=4]
> I/O ports at e000 [size=256]
>
> # scanpci -v:
>
> pci bus 0x0000 cardnum 0x13 function 0x00: vendor 0x1103 device 0x0004
> Device unknown
> STATUS 0x0200 COMMAND 0x0005
> CLASS 0x01 0x80 0x00 REVISION 0x01
> BIST 0x00 HEADER 0x80 LATENCY 0x78 CACHE 0x08
> BASE0 0x0000cc01 addr 0x0000cc00 I/O
> BASE1 0x0000d001 addr 0x0000d000 I/O
> BASE4 0x0000d401 addr 0x0000d400 I/O
> MAX_LAT 0x08 MIN_GNT 0x08 INT_PIN 0x01 INT_LINE 0x0b
> BYTE_0 0x10c9a731 BYTE_1 0x00 BYTE_2 0x80736b0 BYTE_3 0xffffffff
>
> pci bus 0x0000 cardnum 0x13 function 0x01: vendor 0x1103 device 0x0004
> Device unknown
> STATUS 0x0200 COMMAND 0x0007
> CLASS 0x01 0x80 0x00 REVISION 0x01
> BIST 0x00 HEADER 0x80 LATENCY 0x78 CACHE 0x08
> BASE0 0x0000d801 addr 0x0000d800 I/O
> BASE1 0x0000dc01 addr 0x0000dc00 I/O
> BASE4 0x0000e001 addr 0x0000e000 I/O
> MAX_LAT 0x08 MIN_GNT 0x08 INT_PIN 0x02 INT_LINE 0x0b
> BYTE_0 0x10caa731 BYTE_1 0x00 BYTE_2 0x8073a28 BYTE_3 0xffffffff
>
> # cat /proc/ide/hpt366 :
> HighPoint HPT366/368/370
>
> Controller: 0
> Chipset: HPT366
> --------------- Primary Channel --------------- Secondary Channel
> --------------
> Enabled: yes yes
> --------------- drive0 --------- drive1 ------- drive0 ---------- drive1
> -------
> DMA capable: yes no no no
> Mode: UDMA off off off
>
> Controller: 1
> Chipset: HPT366
> --------------- Primary Channel --------------- Secondary Channel
> --------------
> Enabled: yes yes
> --------------- drive0 --------- drive1 ------- drive0 ---------- drive1
> -------
> DMA capable: yes no no no
> Mode: UDMA off off off
>
> --
> Pierre
> ------------------------------------------------
> Pierre Rousselet <pierre.rousselet@wanadoo.fr>
> ------------------------------------------------
>
Andre Hedrick
LAD Storage Consulting Group
* pdc202xx.c fails to compile in 2.5.15
2002-05-11 16:59 ` [PATCH] 2.5.15 IDE 60 Martin Dalecki
2002-05-11 18:47 ` Pierre Rousselet
@ 2002-05-12 19:19 ` Zlatko Calusic
2002-05-12 19:40 ` Jurriaan on Alpha
2002-05-12 22:00 ` Petr Vandrovec
1 sibling, 2 replies; 265+ messages in thread
From: Zlatko Calusic @ 2002-05-12 19:19 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Kernel Mailing List
pdc202xx.c fails to compile in 2.5.15. Error messages below.
pdc202xx.c:1453: unknown field `exnablebits' specified in initializer
pdc202xx.c:1453: warning: braces around scalar initializer
pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
pdc202xx.c:1453: warning: braces around scalar initializer
pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
pdc202xx.c:1453: warning: initialization makes pointer from integer without a cast
pdc202xx.c:1453: warning: excess elements in scalar initializer
pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
pdc202xx.c:1453: warning: excess elements in scalar initializer
pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
pdc202xx.c:1453: warning: braces around scalar initializer
pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
pdc202xx.c:1453: warning: initialization makes pointer from integer without a cast
pdc202xx.c:1453: warning: excess elements in scalar initializer
pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
pdc202xx.c:1453: warning: excess elements in scalar initializer
pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
pdc202xx.c:1453: warning: excess elements in scalar initializer
pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
make[3]: *** [pdc202xx.o] Error 1
--
Zlatko
* Re: pdc202xx.c fails to compile in 2.5.15
2002-05-12 19:19 ` pdc202xx.c fails to compile in 2.5.15 Zlatko Calusic
@ 2002-05-12 19:40 ` Jurriaan on Alpha
2002-05-12 22:00 ` Petr Vandrovec
1 sibling, 0 replies; 265+ messages in thread
From: Jurriaan on Alpha @ 2002-05-12 19:40 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: Martin Dalecki, Kernel Mailing List
From: Zlatko Calusic <zlatko.calusic@iskon.hr>
Date: Sun, May 12, 2002 at 09:19:22PM +0200
> pdc202xx.c fails to compile in 2.5.15. Error messages below.
>
>
> pdc202xx.c:1453: unknown field `exnablebits' specified in initializer
That's a simple typing error - replace exnable by enable.
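For context, the error comes from a designated initializer that names a struct member which does not exist. The following is only a minimal sketch: `struct chipset_entry`, `struct enable_bit`, `example_chipsets` and `enable_reg_for` are hypothetical stand-ins (the real table in pdc202xx.c has more fields), the register values are illustrative, and it uses the C99 `.field =` spelling rather than whatever syntax the driver itself uses.

```c
#include <assert.h>

/* Hypothetical, cut-down stand-in for one "enable bits" triple. */
struct enable_bit {
	unsigned char reg, mask, val;
};

/* Hypothetical, cut-down stand-in for a chipset table entry. */
struct chipset_entry {
	unsigned short vendor;
	unsigned short device;
	struct enable_bit enablebits[2];	/* one triple per channel */
};

/*
 * Spelling the member below as .exnablebits would make the compiler
 * reject the initializer, because struct chipset_entry has no member
 * of that name. With the correct spelling the nested braces match the
 * array-of-struct member and the table compiles cleanly.
 */
static const struct chipset_entry example_chipsets[] = {
	{
		.vendor = 0x105a,	/* illustrative values only */
		.device = 0x0d30,
		.enablebits = { { 0x50, 0x02, 0x02 }, { 0x50, 0x04, 0x04 } },
	},
};

/* Return the enable register recorded for channel 0 or 1. */
static unsigned char enable_reg_for(const struct chipset_entry *c, int channel)
{
	assert(channel == 0 || channel == 1);
	return c->enablebits[channel].reg;
}
```

Once the field name is unknown, the compiler no longer knows which member the braces belong to, which would explain the long tail of "braces around scalar initializer" warnings following the single real error.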
Good luck,
Jurriaan
--
The man who thinks he is smarter than his wife is married to a very smart
woman.
Debian GNU/Linux 2.4.19p8 on Alpha 988 bogomips load:0.12 0.06 0.01
* Re: pdc202xx.c fails to compile in 2.5.15
2002-05-12 19:19 ` pdc202xx.c fails to compile in 2.5.15 Zlatko Calusic
2002-05-12 19:40 ` Jurriaan on Alpha
@ 2002-05-12 22:00 ` Petr Vandrovec
2002-05-13 12:03 ` Alan Cox
1 sibling, 1 reply; 265+ messages in thread
From: Petr Vandrovec @ 2002-05-12 22:00 UTC (permalink / raw)
To: Zlatko Calusic; +Cc: Martin Dalecki, Kernel Mailing List
On Sun, May 12, 2002 at 09:19:22PM +0200, Zlatko Calusic wrote:
> pdc202xx.c fails to compile in 2.5.15. Error messages below.
>
> pdc202xx.c:1453: unknown field `exnablebits' specified in initializer
> pdc202xx.c:1453: warning: braces around scalar initializer
> pdc202xx.c:1453: warning: (near initialization for `chipsets[3].init_dma')
> make[3]: *** [pdc202xx.o] Error 1
If you have a PDC20265 like I have, you must also remove the test on device class,
as the 20265 reports itself as generic mass storage (class 0x0180) and not as
IDE (it is real IDE, not RAID, really).
Because there are apparently devices on which you must check the device class
(2.5.14 talks about CY82C693 and IT8172G), I'll leave the proper fix to Martin,
but the simple fix below works fine on my Asus A7V.
Petr Vandrovec
vandrove@vc.cvut.cz
--- drivers/ide/ide-pci.c Sun May 12 02:46:44 2002
+++ drivers/ide/ide-pci.c Fri May 10 00:25:29 2002
@@ -701,7 +701,7 @@
hpt374_device_order_fixup(dev, d);
} else if (d->vendor == PCI_VENDOR_ID_PROMISE && d->device == PCI_DEVICE_ID_PROMISE_20268R)
pdc20270_device_order_fixup(dev, d);
- else if ((dev->class >> 8) == PCI_CLASS_STORAGE_IDE) {
+ else if (1 || (dev->class >> 8) == PCI_CLASS_STORAGE_IDE) {
printk(KERN_INFO "ATA: %s (%04x:%04x) on PCI slot %s\n",
dev->name, vendor, device, dev->slot_name);
setup_pci_device(dev, d);
* Re: pdc202xx.c fails to compile in 2.5.15
2002-05-12 22:00 ` Petr Vandrovec
@ 2002-05-13 12:03 ` Alan Cox
0 siblings, 0 replies; 265+ messages in thread
From: Alan Cox @ 2002-05-13 12:03 UTC (permalink / raw)
To: Petr Vandrovec; +Cc: Zlatko Calusic, Martin Dalecki, Kernel Mailing List
> If you have a PDC20265 like I have, you must also remove the test on device class,
> as the 20265 reports itself as generic mass storage (class 0x0180) and not as
> IDE (it is real IDE, not RAID, really).
It reports itself that way so that the Windows IDE disk driver doesn't
grab it and DOS/BIOS don't get odd ideas.
> Because there are apparently devices on which you must check the device class
> (2.5.14 talks about CY82C693 and IT8172G), I'll leave the proper fix to Martin,
> but the simple fix below works fine on my Asus A7V.
You need to do specific checks for the device in question. Removing the
class check, by the way, is something nobody reading this message should do,
even in the same situation, unless they know precisely what other
mass storage class devices they have present. Otherwise you can easily
trash a RAID array.
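As a rough illustration of what a device-specific check could look like (this is not the fix that went into the driver), the sketch below keeps the class test and adds an explicit whitelist for controllers known to be plain IDE despite reporting class 0x0180. All names here (`ata_should_claim`, `ide_in_disguise`, `struct pci_id`) are hypothetical, and the Promise vendor/device IDs are given as I understand them and should be verified against the PCI ID list before use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CLASS_STORAGE_IDE   0x0101u	/* base class 01h, subclass 01h */
#define CLASS_STORAGE_OTHER 0x0180u	/* base class 01h, subclass 80h */

struct pci_id {
	uint16_t vendor, device;
};

/* Hypothetical whitelist: devices in class 0x0180 that are really IDE. */
static const struct pci_id ide_in_disguise[] = {
	{ 0x105a, 0x0d30 },	/* assumed IDs for the Promise PDC20265 */
};

/*
 * Decide whether a generic ATA driver should claim a device.
 * class24 is the 24-bit class code (class, subclass, prog-if).
 */
static bool ata_should_claim(uint16_t vendor, uint16_t device, uint32_t class24)
{
	uint32_t class16 = class24 >> 8;	/* drop the prog-if byte */
	size_t i;

	assert((class24 >> 24) == 0);		/* class codes are 24 bits wide */

	if (class16 == CLASS_STORAGE_IDE)
		return true;
	if (class16 != CLASS_STORAGE_OTHER)
		return false;

	/* "Other mass storage": claim only what is explicitly listed. */
	for (i = 0; i < sizeof(ide_in_disguise) / sizeof(ide_in_disguise[0]); i++)
		if (ide_in_disguise[i].vendor == vendor &&
		    ide_in_disguise[i].device == device)
			return true;
	return false;
}
```

The point of the whitelist is that an unknown class 0x0180 device, which might be a RAID controller, is left alone instead of being driven as if it were an ordinary IDE channel.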
* [PATCH] 2.5.15 IDE 61
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (7 preceding siblings ...)
2002-05-11 16:59 ` [PATCH] 2.5.15 IDE 60 Martin Dalecki
@ 2002-05-13 9:48 ` Martin Dalecki
2002-05-13 12:17 ` [PATCH] 2.5.15 IDE 62 Martin Dalecki
` (3 subsequent siblings)
12 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-13 9:48 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 529 bytes --]
Sat May 11 18:45:08 CEST 2002 ide-clean-61
- Fix typo in pdc202xx driver.
- Fix locking order in ioctl.
- Fix wrong time_after usage introduced in 60. Maybe the fact I always get it
wrong is related to the fact that I'm using the mouse with the left hand!?
- Apply arch-clean-2 by Bartlomiej Zolnierkiewicz.
- Don't disable interrupts during ide_wait_stat(). I see no reason to.
- Push flags down from hwgroup to the ata_channel structure.
- Apply small fixes from Franz Sirl to make AEC6280 work properly again.
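
Regarding the time_after item: the argument order matters because time_after(a, b) is true when a is later than b. The sketch below is a userspace re-implementation of that idea, not the kernel macro itself. `tick_after`, `deadline_from`, `timed_out` and `timed_out_buggy` are names made up for illustration, and the signed cast assumes the usual two's-complement conversion.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/*
 * True when tick count a is later than b, even if the counter has
 * wrapped in between. Valid while the two values are less than half
 * the counter range apart.
 */
static bool tick_after(unsigned long a, unsigned long b)
{
	return (long)(b - a) < 0;
}

/* Compute a deadline delta ticks after now; wraparound is intended. */
static unsigned long deadline_from(unsigned long now, unsigned long delta)
{
	assert(delta <= (unsigned long)LONG_MAX);
	return now + delta;
}

/* Correct order, as in IDE 61: has "now" moved past the deadline? */
static bool timed_out(unsigned long now, unsigned long deadline)
{
	return tick_after(now, deadline);
}

/* Swapped order, as in IDE 60: true while the deadline is still ahead. */
static bool timed_out_buggy(unsigned long now, unsigned long deadline)
{
	return tick_after(deadline, now);
}
```

With the swapped order, the busy-wait loop reports a timeout on its first iteration whenever the deadline lies in the future, which is exactly the case it is meant to wait through.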
[-- Attachment #2: ide-clean-61.diff --]
[-- Type: text/plain, Size: 34090 bytes --]
diff -urN linux-2.5.15/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.15/drivers/ide/ide.c 2002-05-13 12:44:17.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-13 02:13:30.000000000 +0200
@@ -317,7 +317,7 @@
blkdev_dequeue_request(rq);
else
blk_queue_end_tag(&drive->queue, rq);
- HWGROUP(drive)->rq = NULL;
+ drive->rq = NULL;
end_that_request_last(rq);
ret = 0;
}
@@ -635,7 +635,7 @@
}
blkdev_dequeue_request(rq);
- HWGROUP(drive)->rq = NULL;
+ drive->rq = NULL;
end_that_request_last(rq);
}
@@ -886,7 +886,6 @@
{
u8 stat;
int i;
- unsigned long flags;
/* bail early if we've exceeded max_failures */
if (drive->max_failures && (drive->failures > drive->max_failures)) {
@@ -896,24 +895,20 @@
udelay(1); /* spec allows drive 400ns to assert "BUSY" */
if ((stat = GET_STAT()) & BUSY_STAT) {
- __save_flags(flags); /* local CPU only */
- ide__sti(); /* local CPU only */
timeout += jiffies;
while ((stat = GET_STAT()) & BUSY_STAT) {
- if (time_after(timeout, jiffies)) {
- __restore_flags(flags); /* local CPU only */
+ if (time_after(jiffies, timeout)) {
*startstop = ide_error(drive, rq, "status timeout", stat);
return 1;
}
}
- __restore_flags(flags); /* local CPU only */
}
+
/*
- * Allow status to settle, then read it again.
- * A few rare drives vastly violate the 400ns spec here,
- * so we'll wait up to 10usec for a "good" status
- * rather than expensively fail things immediately.
- * This fix courtesy of Matthew Faupel & Niccolo Rigacci.
+ * Allow status to settle, then read it again. A few rare drives
+ * vastly violate the 400ns spec here, so we'll wait up to 10usec for a
+ * "good" status rather than expensively fail things immediately. This
+ * fix courtesy of Matthew Faupel & Niccolo Rigacci.
*/
for (i = 0; i < 10; i++) {
udelay(1);
@@ -1074,15 +1069,13 @@
struct ata_channel *ch = drive->channel;
ide_hwgroup_t *hwgroup = ch->hwgroup;
unsigned long flags;
- struct request *rq;
spin_lock_irqsave(&ide_lock, flags);
hwgroup->handler = NULL;
del_timer(&ch->timer);
- rq = hwgroup->rq;
spin_unlock_irqrestore(&ide_lock, flags);
- return start_request(drive, rq);
+ return start_request(drive, drive->rq);
}
/*
@@ -1180,7 +1173,6 @@
if (choice)
return choice;
- channel->hwgroup->rq = NULL;
sleep = longest_sleep(channel);
if (sleep) {
@@ -1197,14 +1189,14 @@
if (timer_pending(&channel->timer))
printk(KERN_ERR "ide_set_handler: timer already active\n");
#endif
- set_bit(IDE_SLEEP, &channel->hwgroup->flags);
+ set_bit(IDE_SLEEP, &channel->active);
mod_timer(&channel->timer, sleep);
/* we purposely leave hwgroup busy while sleeping */
} else {
/* Ugly, but how can we sleep for the lock otherwise? perhaps
* from tq_disk? */
ide_release_lock(&irq_lock);/* for atari only */
- clear_bit(IDE_BUSY, &channel->hwgroup->flags);
+ clear_bit(IDE_BUSY, &channel->active);
}
return NULL;
@@ -1217,13 +1209,13 @@
*/
static void queue_commands(struct ata_device *drive, int masked_irq)
{
- ide_hwgroup_t *hwgroup = drive->channel->hwgroup;
+ struct ata_channel *ch = drive->channel;
ide_startstop_t startstop = -1;
for (;;) {
struct request *rq = NULL;
- if (!test_bit(IDE_BUSY, &hwgroup->flags))
+ if (!test_bit(IDE_BUSY, &ch->active))
printk(KERN_ERR"%s: error: not busy while queueing!\n", drive->name);
/* Abort early if we can't queue another command. for non
@@ -1232,13 +1224,13 @@
*/
if (!ata_can_queue(drive)) {
if (!ata_pending_commands(drive))
- clear_bit(IDE_BUSY, &hwgroup->flags);
+ clear_bit(IDE_BUSY, &ch->active);
break;
}
drive->sleep = 0;
- if (test_bit(IDE_DMA, &hwgroup->flags)) {
+ if (test_bit(IDE_DMA, &ch->active)) {
printk("ide_do_request: DMA in progress...\n");
break;
}
@@ -1256,8 +1248,8 @@
if (!(rq = elv_next_request(&drive->queue))) {
if (!ata_pending_commands(drive))
- clear_bit(IDE_BUSY, &hwgroup->flags);
- hwgroup->rq = NULL;
+ clear_bit(IDE_BUSY, &ch->active);
+ drive->rq = NULL;
break;
}
@@ -1268,7 +1260,7 @@
if (!(rq->flags & REQ_CMD) && ata_pending_commands(drive))
break;
- hwgroup->rq = rq;
+ drive->rq = rq;
/* Some systems have trouble with IDE IRQs arriving while the
* driver is still setting things up. So, here we disable the
@@ -1339,12 +1331,10 @@
*/
static void ide_do_request(struct ata_channel *channel, int masked_irq)
{
- ide_hwgroup_t *hwgroup = channel->hwgroup;
-
ide_get_lock(&irq_lock, ata_irq_request, hwgroup);/* for atari only: POSSIBLY BROKEN HERE(?) */
__cli(); /* necessary paranoia: ensure IRQs are masked on local CPU */
- while (!test_and_set_bit(IDE_BUSY, &hwgroup->flags)) {
+ while (!test_and_set_bit(IDE_BUSY, &channel->active)) {
struct ata_channel *ch;
struct ata_device *drive;
@@ -1405,7 +1395,7 @@
* un-busy drive etc (hwgroup->busy is cleared on return) and
* make sure request is sane
*/
- HWGROUP(drive)->rq = NULL;
+ drive->rq = NULL;
rq->errors = 0;
if (rq->bio) {
@@ -1446,8 +1436,8 @@
* complain about anything.
*/
- if (test_and_clear_bit(IDE_SLEEP, &hwgroup->flags))
- clear_bit(IDE_BUSY, &hwgroup->flags);
+ if (test_and_clear_bit(IDE_SLEEP, &ch->active))
+ clear_bit(IDE_BUSY, &ch->active);
} else {
struct ata_device *drive = ch->drive;
if (!drive) {
@@ -1457,11 +1447,11 @@
ide_startstop_t startstop;
/* paranoia */
- if (!test_and_set_bit(IDE_BUSY, &hwgroup->flags))
+ if (!test_and_set_bit(IDE_BUSY, &ch->active))
printk("%s: ide_timer_expiry: hwgroup was not busy??\n", drive->name);
if ((expiry = ch->expiry) != NULL) {
/* continue */
- if ((wait = expiry(drive, HWGROUP(drive)->rq)) != 0) {
+ if ((wait = expiry(drive, drive->rq)) != 0) {
/* reengage timer */
ch->timer.expires = jiffies + wait;
add_timer(&ch->timer);
@@ -1484,25 +1474,25 @@
#endif
__cli(); /* local CPU only, as if we were handling an interrupt */
if (ch->poll_timeout != 0) {
- startstop = handler(drive, ch->hwgroup->rq);
+ startstop = handler(drive, drive->rq);
} else if (drive_is_ready(drive)) {
if (drive->waiting_for_dma)
udma_irq_lost(drive);
(void) ide_ack_intr(ch);
printk("%s: lost interrupt\n", drive->name);
- startstop = handler(drive, ch->hwgroup->rq);
+ startstop = handler(drive, drive->rq);
} else {
if (drive->waiting_for_dma) {
startstop = ide_stopped;
- dma_timeout_retry(drive, ch->hwgroup->rq);
+ dma_timeout_retry(drive, drive->rq);
} else
- startstop = ide_error(drive, ch->hwgroup->rq, "irq timeout", GET_STAT());
+ startstop = ide_error(drive, drive->rq, "irq timeout", GET_STAT());
}
set_recovery_timer(ch);
enable_irq(ch->irq);
spin_lock_irq(&ide_lock);
if (startstop == ide_stopped)
- clear_bit(IDE_BUSY, &hwgroup->flags);
+ clear_bit(IDE_BUSY, &ch->active);
}
}
@@ -1627,7 +1617,7 @@
goto out_lock;
}
/* paranoia */
- if (!test_and_set_bit(IDE_BUSY, &hwgroup->flags))
+ if (!test_and_set_bit(IDE_BUSY, &ch->active))
printk(KERN_ERR "%s: %s: hwgroup was not busy!?\n", drive->name, __FUNCTION__);
hwgroup->handler = NULL;
del_timer(&ch->timer);
@@ -1637,7 +1627,7 @@
ide__sti(); /* local CPU only */
/* service this interrupt, may set handler for next interrupt */
- startstop = handler(drive, hwgroup->rq);
+ startstop = handler(drive, drive->rq);
spin_lock_irq(&ide_lock);
/*
@@ -1650,7 +1640,7 @@
set_recovery_timer(drive->channel);
if (startstop == ide_stopped) {
if (hwgroup->handler == NULL) { /* paranoia */
- clear_bit(IDE_BUSY, &hwgroup->flags);
+ clear_bit(IDE_BUSY, &ch->active);
ide_do_request(ch, ch->irq);
} else {
printk("%s: %s: huh? expected NULL handler on exit\n", drive->name, __FUNCTION__);
@@ -1738,7 +1728,7 @@
spin_lock_irqsave(&ide_lock, flags);
if (blk_queue_empty(&drive->queue) || action == ide_preempt) {
if (action == ide_preempt)
- HWGROUP(drive)->rq = NULL;
+ drive->rq = NULL;
} else {
if (action == ide_wait || action == ide_end)
queue_head = queue_head->prev;
@@ -2222,8 +2212,6 @@
int ide_spin_wait_hwgroup(struct ata_device *drive)
{
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
-
/* FIXME: Wait on a proper timer. Instead of playing games on the
* spin_lock().
*/
@@ -2232,7 +2220,7 @@
spin_lock_irq(&ide_lock);
- while (test_bit(IDE_BUSY, &hwgroup->flags)) {
+ while (test_bit(IDE_BUSY, &drive->channel->active)) {
spin_unlock_irq(&ide_lock);
if (time_after(jiffies, timeout)) {
printk("%s: channel busy\n", drive->name);
@@ -2316,7 +2304,9 @@
kdev_t dev;
dev = inode->i_rdev;
- major = major(dev); minor = minor(dev);
+ major = major(dev);
+ minor = minor(dev);
+
if ((drive = get_info_ptr(inode->i_rdev)) == NULL)
return -ENODEV;
@@ -2376,6 +2366,7 @@
if (put_user(val, (unsigned long *) arg))
return -EFAULT;
+
return 0;
}
@@ -2384,12 +2375,12 @@
if (arg < 0 || arg > 1)
return -EINVAL;
- if (ide_spin_wait_hwgroup(drive))
- return -EBUSY;
-
if (drive->channel->no_unmask)
return -EIO;
+ if (ide_spin_wait_hwgroup(drive))
+ return -EBUSY;
+
drive->channel->unmask = arg;
spin_unlock_irq(&ide_lock);
@@ -2426,11 +2417,20 @@
if (!loc || (drive->type != ATA_DISK && drive->type != ATA_FLOPPY))
return -EINVAL;
- if (put_user(drive->bios_head, (byte *) &loc->heads)) return -EFAULT;
- if (put_user(drive->bios_sect, (byte *) &loc->sectors)) return -EFAULT;
- if (put_user(bios_cyl, (unsigned short *) &loc->cylinders)) return -EFAULT;
+
+ if (put_user(drive->bios_head, (byte *) &loc->heads))
+ return -EFAULT;
+
+ if (put_user(drive->bios_sect, (byte *) &loc->sectors))
+ return -EFAULT;
+
+ if (put_user(bios_cyl, (unsigned short *) &loc->cylinders))
+ return -EFAULT;
+
if (put_user((unsigned)drive->part[minor(inode->i_rdev)&PARTN_MASK].start_sect,
- (unsigned long *) &loc->start)) return -EFAULT;
+ (unsigned long *) &loc->start))
+ return -EFAULT;
+
return 0;
}
@@ -2440,48 +2440,59 @@
if (!loc || (drive->type != ATA_DISK && drive->type != ATA_FLOPPY))
return -EINVAL;
- if (put_user(drive->head, (u8 *) &loc->heads)) return -EFAULT;
- if (put_user(drive->sect, (u8 *) &loc->sectors)) return -EFAULT;
- if (put_user(drive->cyl, (unsigned int *) &loc->cylinders)) return -EFAULT;
+ if (put_user(drive->head, (u8 *) &loc->heads))
+ return -EFAULT;
+
+ if (put_user(drive->sect, (u8 *) &loc->sectors))
+ return -EFAULT;
+
+ if (put_user(drive->cyl, (unsigned int *) &loc->cylinders))
+ return -EFAULT;
+
if (put_user((unsigned)drive->part[minor(inode->i_rdev)&PARTN_MASK].start_sect,
- (unsigned long *) &loc->start)) return -EFAULT;
+ (unsigned long *) &loc->start))
+ return -EFAULT;
+
return 0;
}
case BLKRRPART: /* Re-read partition tables */
- if (!capable(CAP_SYS_ADMIN))
- return -EACCES;
return ide_revalidate_disk(inode->i_rdev);
case HDIO_GET_IDENTITY:
if (minor(inode->i_rdev) & PARTN_MASK)
return -EINVAL;
+
if (drive->id == NULL)
return -ENOMSG;
+
if (copy_to_user((char *)arg, (char *)drive->id, sizeof(*drive->id)))
return -EFAULT;
+
return 0;
case HDIO_GET_NICE:
- return put_user(drive->dsc_overlap << IDE_NICE_DSC_OVERLAP |
- drive->atapi_overlap << IDE_NICE_ATAPI_OVERLAP,
+ return put_user(drive->dsc_overlap << IDE_NICE_DSC_OVERLAP |
+ drive->atapi_overlap << IDE_NICE_ATAPI_OVERLAP,
(long *) arg);
case HDIO_DRIVE_CMD:
- if (!capable(CAP_SYS_ADMIN) || !capable(CAP_SYS_RAWIO))
+ if (!capable(CAP_SYS_RAWIO))
return -EACCES;
+
return ide_cmd_ioctl(drive, arg);
case HDIO_SET_NICE:
- if (!capable(CAP_SYS_ADMIN)) return -EACCES;
if (arg != (arg & ((1 << IDE_NICE_DSC_OVERLAP))))
return -EPERM;
+
drive->dsc_overlap = (arg >> IDE_NICE_DSC_OVERLAP) & 1;
/* Only CD-ROM's and tapes support DSC overlap. */
if (drive->dsc_overlap && !(drive->type == ATA_ROM || drive->type == ATA_TAPE)) {
drive->dsc_overlap = 0;
return -EPERM;
}
+
return 0;
case BLKGETSIZE:
@@ -2505,25 +2516,24 @@
return block_ioctl(inode->i_bdev, cmd, arg);
case HDIO_GET_BUSSTATE:
- if (!capable(CAP_SYS_ADMIN))
- return -EACCES;
if (put_user(drive->channel->bus_state, (long *)arg))
return -EFAULT;
+
return 0;
case HDIO_SET_BUSSTATE:
- if (!capable(CAP_SYS_ADMIN))
- return -EACCES;
if (drive->channel->busproc)
drive->channel->busproc(drive, (int)arg);
+
return 0;
- /* Now check whatever this particular ioctl has a special
- * implementation.
+ /* Now check whether this particular ioctl has a device type
+ * specific implementation.
*/
default:
if (ata_ops(drive) && ata_ops(drive)->ioctl)
return ata_ops(drive)->ioctl(drive, inode, file, cmd, arg);
+
return -EINVAL;
}
}
@@ -2545,6 +2555,7 @@
res = 1; /* assume it was changed */
ata_put(ata_ops(drive));
}
+
return res;
}
diff -urN linux-2.5.15/drivers/ide/ide-disk.c linux/drivers/ide/ide-disk.c
--- linux-2.5.15/drivers/ide/ide-disk.c 2002-05-13 12:44:17.000000000 +0200
+++ linux/drivers/ide/ide-disk.c 2002-05-12 16:12:30.000000000 +0200
@@ -1058,6 +1058,7 @@
if (put_user(val, (unsigned long *) arg))
return -EFAULT;
+
return 0;
}
@@ -1081,6 +1082,7 @@
if (put_user(val, (unsigned long *) arg))
return -EFAULT;
+
return 0;
}
@@ -1100,7 +1102,7 @@
}
case HDIO_GET_ACOUSTIC: {
- u8 val = drive->acoustic;
+ unsigned long val = drive->acoustic;
if (put_user(val, (u8 *) arg))
return -EFAULT;
@@ -1128,6 +1130,7 @@
if (put_user(val, (u8 *) arg))
return -EFAULT;
+
return 0;
}
@@ -1153,7 +1156,7 @@
/*
- * IDE subdriver functions, registered with ide.c
+ * Subdriver functions.
*/
static struct ata_operations idedisk_driver = {
owner: THIS_MODULE,
@@ -1178,11 +1181,9 @@
while ((drive = ide_scan_devices(ATA_DISK, "ide-disk", &idedisk_driver, failed)) != NULL) {
if (idedisk_cleanup (drive)) {
- printk (KERN_ERR "%s: cleanup_module() called while still busy\n", drive->name);
- failed++;
+ printk(KERN_ERR "%s: cleanup_module() called while still busy\n", drive->name);
+ ++failed;
}
- /* We must remove proc entries defined in this module.
- Otherwise we oops while accessing these entries */
}
}
@@ -1203,10 +1204,11 @@
idedisk_cleanup(drive);
continue;
}
- failed--;
+ --failed;
}
revalidate_drives();
MOD_DEC_USE_COUNT;
+
return 0;
}
diff -urN linux-2.5.15/drivers/ide/ide-dma.c linux/drivers/ide/ide-dma.c
--- linux-2.5.15/drivers/ide/ide-dma.c 2002-05-13 12:44:17.000000000 +0200
+++ linux/drivers/ide/ide-dma.c 2002-05-11 23:23:14.000000000 +0200
@@ -382,10 +382,10 @@
#ifdef DEBUG
printk("%s: dma_timer_expiry: dma status == 0x%02x\n", drive->name, dma_stat);
-#endif /* DEBUG */
+#endif
#if 0
- HWGROUP(drive)->expiry = NULL; /* one free ride for now */
+ drive->expiry = NULL; /* one free ride for now */
#endif
if (dma_stat & 2) { /* ERROR */
diff -urN linux-2.5.15/drivers/ide/ide-pci.c linux/drivers/ide/ide-pci.c
--- linux-2.5.15/drivers/ide/ide-pci.c 2002-05-10 00:25:29.000000000 +0200
+++ linux/drivers/ide/ide-pci.c 2002-05-13 12:37:07.000000000 +0200
@@ -122,7 +122,7 @@
* Unless there is a bootable card that does not use the standard
* ports 1f0/170 (the ide0/ide1 defaults). The (bootable) flag.
*/
- if (bootable) {
+ if (bootable == ON_BOARD) {
for (h = 0; h < MAX_HWIFS; ++h) {
hwif = &ide_hwifs[h];
if (hwif->chipset == ide_unknown)
@@ -703,7 +703,7 @@
hpt374_device_order_fixup(dev, d);
} else if (d->vendor == PCI_VENDOR_ID_PROMISE && d->device == PCI_DEVICE_ID_PROMISE_20268R)
pdc20270_device_order_fixup(dev, d);
- else if ((dev->class >> 8) == PCI_CLASS_STORAGE_IDE) {
+ else {
printk(KERN_INFO "ATA: %s (%04x:%04x) on PCI slot %s\n",
dev->name, vendor, device, dev->slot_name);
setup_pci_device(dev, d);
diff -urN linux-2.5.15/drivers/ide/ide-taskfile.c linux/drivers/ide/ide-taskfile.c
--- linux-2.5.15/drivers/ide/ide-taskfile.c 2002-05-13 12:44:17.000000000 +0200
+++ linux/drivers/ide/ide-taskfile.c 2002-05-12 17:47:58.000000000 +0200
@@ -279,6 +279,7 @@
if (stat & BUSY_STAT)
return 0; /* drive busy: definitely not interrupting */
+
return 1; /* drive ready: *might* be interrupting */
}
diff -urN linux-2.5.15/drivers/ide/pdc202xx.c linux/drivers/ide/pdc202xx.c
--- linux-2.5.15/drivers/ide/pdc202xx.c 2002-05-10 00:25:39.000000000 +0200
+++ linux/drivers/ide/pdc202xx.c 2002-05-12 16:16:37.000000000 +0200
@@ -1450,7 +1450,7 @@
init_chipset: pdc202xx_init_chipset,
ata66_check: ata66_pdc202xx,
init_channel: ide_init_pdc202xx,
- exnablebits: {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
+ enablebits: {{0x50,0x02,0x02}, {0x50,0x04,0x04}},
bootable: OFF_BOARD,
extra: 48,
flags: ATA_F_IRQ | ATA_F_DMA
diff -urN linux-2.5.15/drivers/ide/sl82c105.c linux/drivers/ide/sl82c105.c
--- linux-2.5.15/drivers/ide/sl82c105.c 2002-05-10 00:23:12.000000000 +0200
+++ linux/drivers/ide/sl82c105.c 2002-05-12 03:35:22.000000000 +0200
@@ -76,7 +76,7 @@
if (ide_config_drive_speed(drive, xfer_mode) == 0)
drv_ctrl = get_timing_sl82c105(t);
- if (drive->using_dma == 0) {
+ if (!drive->using_dma) {
/*
* If we are actually using MW DMA, then we can not
* reprogram the interface drive control register.
diff -urN linux-2.5.15/drivers/ide/tcq.c linux/drivers/ide/tcq.c
--- linux-2.5.15/drivers/ide/tcq.c 2002-05-13 12:44:17.000000000 +0200
+++ linux/drivers/ide/tcq.c 2002-05-13 02:17:33.000000000 +0200
@@ -95,15 +95,15 @@
del_timer(&ch->timer);
- if (test_bit(IDE_DMA, &hwgroup->flags))
+ if (test_bit(IDE_DMA, &ch->active))
udma_stop(drive);
blk_queue_invalidate_tags(q);
drive->using_tcq = 0;
drive->queue_depth = 1;
- clear_bit(IDE_BUSY, &hwgroup->flags);
- clear_bit(IDE_DMA, &hwgroup->flags);
+ clear_bit(IDE_BUSY, &ch->active);
+ clear_bit(IDE_DMA, &ch->active);
hwgroup->handler = NULL;
/*
@@ -152,6 +152,7 @@
static void ata_tcq_irq_timeout(unsigned long data)
{
struct ata_device *drive = (struct ata_device *) data;
+ struct ata_channel *ch = drive->channel;
ide_hwgroup_t *hwgroup = HWGROUP(drive);
unsigned long flags;
@@ -159,7 +160,7 @@
spin_lock_irqsave(&ide_lock, flags);
- if (test_and_set_bit(IDE_BUSY, &hwgroup->flags))
+ if (test_and_set_bit(IDE_BUSY, &ch->active))
printk(KERN_ERR "ATA: %s: hwgroup not busy\n", __FUNCTION__);
if (hwgroup->handler == NULL)
printk(KERN_ERR "ATA: %s: missing isr!\n", __FUNCTION__);
@@ -170,7 +171,7 @@
* if pending commands, try service before giving up
*/
if (ata_pending_commands(drive) && (GET_STAT() & SERVICE_STAT))
- if (service(drive, hwgroup->rq) == ide_started)
+ if (service(drive, drive->rq) == ide_started)
return;
if (drive)
@@ -241,7 +242,7 @@
* Could be called with IDE_DMA in-progress from invalidate
* handler, refuse to do anything.
*/
- if (test_bit(IDE_DMA, &HWGROUP(drive)->flags))
+ if (test_bit(IDE_DMA, &drive->channel->active))
return ide_stopped;
/*
@@ -283,7 +284,7 @@
* should not happen, a buggy device could introduce loop
*/
if ((feat = GET_FEAT()) & NSEC_REL) {
- HWGROUP(drive)->rq = NULL;
+ drive->rq = NULL;
printk("%s: release in service\n", drive->name);
return ide_stopped;
}
@@ -298,7 +299,7 @@
return ide_stopped;
}
- HWGROUP(drive)->rq = rq;
+ drive->rq = rq;
/*
* we'll start a dma read or write, device will trigger
@@ -529,7 +530,7 @@
struct ata_channel *ch = drive->channel;
TCQ_PRINTK("%s: setting up queued %d\n", __FUNCTION__, rq->tag);
- if (!test_bit(IDE_BUSY, &ch->hwgroup->flags))
+ if (!test_bit(IDE_BUSY, &ch->active))
printk("queued_rw: IDE_BUSY not set\n");
if (tcq_wait_dataphase(drive))
@@ -584,7 +585,7 @@
*/
if ((feat = GET_FEAT()) & NSEC_REL) {
drive->immed_rel++;
- HWGROUP(drive)->rq = NULL;
+ drive->rq = NULL;
set_irq(drive, ide_dmaq_intr);
TCQ_PRINTK("REL in queued_start\n");
diff -urN linux-2.5.15/include/asm-alpha/ide.h linux/include/asm-alpha/ide.h
--- linux-2.5.15/include/asm-alpha/ide.h 2002-05-10 00:23:17.000000000 +0200
+++ linux/include/asm-alpha/ide.h 2002-05-12 16:28:28.000000000 +0200
@@ -82,10 +82,6 @@
#endif
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASMalpha_IDE_H */
diff -urN linux-2.5.15/include/asm-arm/ide.h linux/include/asm-arm/ide.h
--- linux-2.5.15/include/asm-arm/ide.h 2002-05-10 00:24:07.000000000 +0200
+++ linux/include/asm-arm/ide.h 2002-05-12 16:28:28.000000000 +0200
@@ -21,10 +21,6 @@
#include <asm/arch/ide.h>
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
/*
* We always use the new IDE port registering,
* so these are fixed here.
diff -urN linux-2.5.15/include/asm-cris/ide.h linux/include/asm-cris/ide.h
--- linux-2.5.15/include/asm-cris/ide.h 2002-05-10 00:22:28.000000000 +0200
+++ linux/include/asm-cris/ide.h 2002-05-12 16:28:28.000000000 +0200
@@ -96,10 +96,6 @@
#undef SUPPORT_SLOW_DATA_PORTS
#define SUPPORT_SLOW_DATA_PORTS 0
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
/* the drive addressing is done through a controller register on the Etrax CPU */
void OUT_BYTE(unsigned char data, ide_ioreg_t reg);
unsigned char IN_BYTE(ide_ioreg_t reg);
diff -urN linux-2.5.15/include/asm-i386/ide.h linux/include/asm-i386/ide.h
--- linux-2.5.15/include/asm-i386/ide.h 2002-05-10 00:21:52.000000000 +0200
+++ linux/include/asm-i386/ide.h 2002-05-13 03:09:01.000000000 +0200
@@ -86,10 +86,6 @@
#endif
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASMi386_IDE_H */
diff -urN linux-2.5.15/include/asm-ia64/ide.h linux/include/asm-ia64/ide.h
--- linux-2.5.15/include/asm-ia64/ide.h 2002-05-10 00:24:57.000000000 +0200
+++ linux/include/asm-ia64/ide.h 2002-05-12 16:28:28.000000000 +0200
@@ -92,10 +92,6 @@
#endif
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASM_IA64_IDE_H */
diff -urN linux-2.5.15/include/asm-m68k/ide.h linux/include/asm-m68k/ide.h
--- linux-2.5.15/include/asm-m68k/ide.h 2002-05-10 00:21:32.000000000 +0200
+++ linux/include/asm-m68k/ide.h 2002-05-12 16:28:28.000000000 +0200
@@ -145,10 +145,13 @@
#endif /* CONFIG_ATARI || CONFIG_Q40 */
+#define ATA_ARCH_ACK_INTR
+
+#ifdef CONFIG_ATARI
+#define ATA_ARCH_LOCK
static __inline__ void ide_release_lock (int *ide_lock)
{
-#ifdef CONFIG_ATARI
if (MACH_IS_ATARI) {
if (*ide_lock == 0) {
printk("ide_release_lock: bug\n");
@@ -157,12 +160,10 @@
*ide_lock = 0;
stdma_release();
}
-#endif /* CONFIG_ATARI */
}
static __inline__ void ide_get_lock (int *ide_lock, void (*handler)(int, void *, struct pt_regs *), void *data)
{
-#ifdef CONFIG_ATARI
if (MACH_IS_ATARI) {
if (*ide_lock == 0) {
if (in_interrupt() > 0)
@@ -171,10 +172,8 @@
*ide_lock = 1;
}
}
-#endif /* CONFIG_ATARI */
}
-
-#define ide_ack_intr(hwif) ((hwif)->hw.ack_intr ? (hwif)->hw.ack_intr(hwif) : 1)
+#endif /* CONFIG_ATARI */
/*
* On the Atari, we sometimes can't enable interrupts:
diff -urN linux-2.5.15/include/asm-mips/ide.h linux/include/asm-mips/ide.h
--- linux-2.5.15/include/asm-mips/ide.h 2002-05-10 00:22:38.000000000 +0200
+++ linux/include/asm-mips/ide.h 2002-05-12 16:28:28.000000000 +0200
@@ -68,10 +68,6 @@
#undef SUPPORT_VLB_SYNC
#define SUPPORT_VLB_SYNC 0
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASM_IDE_H */
diff -urN linux-2.5.15/include/asm-mips64/ide.h linux/include/asm-mips64/ide.h
--- linux-2.5.15/include/asm-mips64/ide.h 2002-05-10 00:21:32.000000000 +0200
+++ linux/include/asm-mips64/ide.h 2002-05-12 16:28:28.000000000 +0200
@@ -68,10 +68,6 @@
#endif
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASM_IDE_H */
diff -urN linux-2.5.15/include/asm-parisc/ide.h linux/include/asm-parisc/ide.h
--- linux-2.5.15/include/asm-parisc/ide.h 2002-05-10 00:23:34.000000000 +0200
+++ linux/include/asm-parisc/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -81,10 +81,6 @@
#endif
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASMi386_IDE_H */
diff -urN linux-2.5.15/include/asm-ppc/ide.h linux/include/asm-ppc/ide.h
--- linux-2.5.15/include/asm-ppc/ide.h 2002-05-10 00:23:34.000000000 +0200
+++ linux/include/asm-ppc/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -108,12 +108,8 @@
}
#if (defined CONFIG_APUS || defined CONFIG_BLK_DEV_MPC8xx_IDE )
-#define ide_ack_intr(hwif) (hwif->hw.ack_intr ? hwif->hw.ack_intr(hwif) : 1)
-#else
-#define ide_ack_intr(hwif) (1)
+#define ATA_ARCH_ACK_INTR
#endif
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
#endif /* __KERNEL__ */
diff -urN linux-2.5.15/include/asm-ppc64/ide.h linux/include/asm-ppc64/ide.h
--- linux-2.5.15/include/asm-ppc64/ide.h 2002-05-10 00:24:22.000000000 +0200
+++ linux/include/asm-ppc64/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -50,10 +50,6 @@
{
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASMPPC64_IDE_H */
diff -urN linux-2.5.15/include/asm-s390/ide.h linux/include/asm-s390/ide.h
--- linux-2.5.15/include/asm-s390/ide.h 2002-05-10 00:24:47.000000000 +0200
+++ linux/include/asm-s390/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -18,14 +18,6 @@
#define ide__sti() do {} while (0)
/*
- * The following are not needed for the non-m68k ports
- */
-#define ide_ack_intr(hwif) (1)
-#define ide_fix_driveid(id) do {} while (0)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
-/*
* We always use the new IDE port registering,
* so these are fixed here.
*/
diff -urN linux-2.5.15/include/asm-s390x/ide.h linux/include/asm-s390x/ide.h
--- linux-2.5.15/include/asm-s390x/ide.h 2002-05-10 00:21:33.000000000 +0200
+++ linux/include/asm-s390x/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -17,10 +17,6 @@
#define ide__sti() do {} while (0)
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
/*
* We always use the new IDE port registering,
* so these are fixed here.
diff -urN linux-2.5.15/include/asm-sh/ide.h linux/include/asm-sh/ide.h
--- linux-2.5.15/include/asm-sh/ide.h 2002-05-10 00:22:55.000000000 +0200
+++ linux/include/asm-sh/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -107,10 +107,6 @@
#endif
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASM_SH_IDE_H */
diff -urN linux-2.5.15/include/asm-sparc/ide.h linux/include/asm-sparc/ide.h
--- linux-2.5.15/include/asm-sparc/ide.h 2002-05-10 00:24:45.000000000 +0200
+++ linux/include/asm-sparc/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -165,11 +165,6 @@
/* __flush_dcache_range((unsigned long)src, end); */ /* P3 see hme */
}
-#define ide_ack_intr(hwif) (1)
-/* #define ide_ack_intr(hwif) ((hwif)->hw.ack_intr ? (hwif)->hw.ack_intr(hwif) : 1) */
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* _SPARC_IDE_H */
diff -urN linux-2.5.15/include/asm-sparc64/ide.h linux/include/asm-sparc64/ide.h
--- linux-2.5.15/include/asm-sparc64/ide.h 2002-05-10 00:23:37.000000000 +0200
+++ linux/include/asm-sparc64/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -181,10 +181,6 @@
#endif
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* _SPARC64_IDE_H */
diff -urN linux-2.5.15/include/asm-x86_64/ide.h linux/include/asm-x86_64/ide.h
--- linux-2.5.15/include/asm-x86_64/ide.h 2002-05-10 00:24:22.000000000 +0200
+++ linux/include/asm-x86_64/ide.h 2002-05-12 16:28:29.000000000 +0200
@@ -86,10 +86,6 @@
#endif
}
-#define ide_ack_intr(hwif) (1)
-#define ide_release_lock(lock) do {} while (0)
-#define ide_get_lock(lock, hdlr, data) do {} while (0)
-
#endif /* __KERNEL__ */
#endif /* __ASMi386_IDE_H */
diff -urN linux-2.5.15/include/linux/ide.h linux/include/linux/ide.h
--- linux-2.5.15/include/linux/ide.h 2002-05-13 12:44:17.000000000 +0200
+++ linux/include/linux/ide.h 2002-05-13 03:09:01.000000000 +0200
@@ -239,6 +239,19 @@
#include <asm/ide.h>
+/* Currently only m68k, apus and m8xx need it */
+#ifdef ATA_ARCH_ACK_INTR
+# define ide_ack_intr(hwif) (hwif->hw.ack_intr ? hwif->hw.ack_intr(hwif) : 1)
+#else
+# define ide_ack_intr(hwif) (1)
+#endif
+
+/* Currently only Atari needs it */
+#ifndef ATA_ARCH_LOCK
+# define ide_release_lock(lock) do {} while (0)
+# define ide_get_lock(lock, hdlr, data) do {} while (0)
+#endif
+
/*
* If the arch-dependant ide.h did not declare/define any OUT_BYTE or IN_BYTE
* functions, we make some defaults here. The only architecture currently
@@ -324,14 +337,16 @@
* magically just go away.
*/
request_queue_t queue; /* per device request queue */
+ struct request *rq; /* current request */
unsigned long sleep; /* sleep until this time */
- byte using_dma; /* disk is using dma for read/write */
- byte using_tcq; /* disk is using queueing */
byte retry_pio; /* retrying dma capable host in pio */
byte state; /* retry state */
- byte dsc_overlap; /* flag: DSC overlap */
+
+ unsigned using_dma : 1; /* disk is using dma for read/write */
+ unsigned using_tcq : 1; /* disk is using queueing */
+ unsigned dsc_overlap : 1; /* flag: DSC overlap */
unsigned waiting_for_dma: 1; /* dma currently in progress */
unsigned busy : 1; /* currently doing revalidate_disk() */
@@ -403,11 +418,39 @@
int max_depth;
} ide_drive_t;
+/*
+ * Status returned by various functions.
+ */
+typedef enum {
+ ide_stopped, /* no drive operation was started */
+ ide_started, /* a drive operation was started, and a handler was set */
+ ide_released /* started and released bus */
+} ide_startstop_t;
+
+/*
+ * Interrupt and timeout handler type.
+ */
+typedef ide_startstop_t (ata_handler_t)(struct ata_device *, struct request *);
+typedef int (ata_expiry_t)(struct ata_device *, struct request *);
+
enum {
ATA_PRIMARY = 0,
ATA_SECONDARY = 1
};
+enum {
+ IDE_BUSY, /* awaiting an interrupt */
+ IDE_SLEEP,
+ IDE_DMA /* DMA in progress */
+};
+
+typedef struct hwgroup_s {
+ /* FIXME: We should look for busy request queues instead of looking at
+ * the !NULL state of this field.
+ */
+ ide_startstop_t (*handler)(struct ata_device *, struct request *); /* irq handler, if active */
+} ide_hwgroup_t;
+
struct ata_channel {
struct device dev; /* device handle */
int unit; /* channel number */
@@ -415,7 +458,9 @@
struct hwgroup_s *hwgroup; /* actually (ide_hwgroup_t *) */
struct timer_list timer; /* failsafe timer */
int (*expiry)(struct ata_device *, struct request *); /* irq handler, if active */
+ unsigned long poll_timeout; /* timeout value during polled operations */
struct ata_device *drive; /* last serviced drive */
+ unsigned long active; /* active processing request */
ide_ioreg_t io_ports[IDE_NR_PORTS]; /* task file registers */
hw_regs_t hw; /* Hardware info */
@@ -506,9 +551,8 @@
#endif
/* driver soft-power interface */
int (*busproc)(struct ata_device *, int);
- byte bus_state; /* power state of the IDE bus */
- unsigned long poll_timeout; /* timeout value during polled operations */
+ byte bus_state; /* power state of the IDE bus */
};
/*
@@ -517,27 +561,8 @@
extern int ide_register_hw(hw_regs_t *hw, struct ata_channel **hwifp);
extern void ide_unregister(struct ata_channel *hwif);
-/*
- * Status returned by various functions.
- */
-typedef enum {
- ide_stopped, /* no drive operation was started */
- ide_started, /* a drive operation was started, and a handler was set */
- ide_released /* started and released bus */
-} ide_startstop_t;
-
-/*
- * Interrupt and timeout handler type.
- */
-typedef ide_startstop_t (ata_handler_t)(struct ata_device *, struct request *);
-typedef int (ata_expiry_t)(struct ata_device *, struct request *);
-
struct ata_taskfile;
-#define IDE_BUSY 0 /* awaiting an interrupt */
-#define IDE_SLEEP 1
-#define IDE_DMA 2 /* DMA in progress */
-
#define IDE_MAX_TAG 32
#ifdef CONFIG_BLK_DEV_IDE_TCQ
@@ -561,15 +586,6 @@
# define ata_can_queue(drive) (1)
#endif
-typedef struct hwgroup_s {
- /* FIXME: We should look for busy request queues instead of looking at
- * the !NULL state of this field.
- */
- ide_startstop_t (*handler)(struct ata_device *, struct request *); /* irq handler, if active */
- unsigned long flags; /* BUSY, SLEEPING */
- struct request *rq; /* current request */
-} ide_hwgroup_t;
-
/* FIXME: kill this as soon as possible */
#define PROC_IDE_READ_RETURN(page,start,off,count,eof,len) return 0;
^ permalink raw reply [flat|nested] 265+ messages in thread
* [PATCH] 2.5.15 IDE 62
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (8 preceding siblings ...)
2002-05-13 9:48 ` [PATCH] 2.5.15 IDE 61 Martin Dalecki
@ 2002-05-13 12:17 ` Martin Dalecki
2002-05-13 13:48 ` Jens Axboe
2002-05-13 15:36 ` Tom Rini
2002-05-14 10:26 ` [PATCH] 2.5.15 IDE 62a Martin Dalecki
` (2 subsequent siblings)
12 siblings, 2 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-13 12:17 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 253 bytes --]
Mon May 13 12:38:11 CEST 2002 ide-clean-62
- Add missing locking around ide_do_request in do_ide_request().
- Check all other places where locks get used for matching pairs in ide.c.
- Streamline device detection reporting to always use ->slot_name.
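To illustrate the first item: ide_do_request() expects the lock to be held by its caller, so the block layer entry point has to take and drop it as a matching pair. What follows is only a userspace sketch of that pattern, not the kernel code. Every demo_* name is hypothetical, and a C11 atomic_flag stands in for ide_lock.

```c
#include <stdatomic.h>

/* Hypothetical stand-ins for ide_lock and the channel state. */
atomic_flag demo_lock = ATOMIC_FLAG_INIT;
int demo_lock_held;	/* bookkeeping for this sketch only */
int demo_serviced;	/* number of requests serviced so far */

/* Spin until the flag is ours, roughly what spin_lock() does. */
void demo_spin_lock(void)
{
	while (atomic_flag_test_and_set_explicit(&demo_lock, memory_order_acquire))
		;	/* busy wait */
	demo_lock_held = 1;
}

/* Release the flag, the counterpart of spin_unlock(). */
void demo_spin_unlock(void)
{
	demo_lock_held = 0;
	atomic_flag_clear_explicit(&demo_lock, memory_order_release);
}

/* Inner worker: must be entered with the lock held. */
int demo_do_request_locked(void)
{
	if (!demo_lock_held)
		return -1;	/* caller forgot to take the lock */
	demo_serviced++;
	return 0;
}

/* Entry point: lock and unlock form a matching pair around the worker. */
int demo_do_request(void)
{
	int ret;

	demo_spin_lock();
	ret = demo_do_request_locked();
	demo_spin_unlock();

	return ret;
}
```

Calling demo_do_request_locked() directly fails in this sketch, while going through demo_do_request() succeeds, which is the difference the first changelog item is about.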
[-- Attachment #2: ide-clean-62.diff --]
[-- Type: text/plain, Size: 7762 bytes --]
diff -urN linux-2.5.15/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.15/drivers/ide/ide.c 2002-05-13 15:13:11.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-13 14:27:39.000000000 +0200
@@ -323,6 +323,7 @@
}
spin_unlock_irqrestore(&ide_lock, flags);
+
return ret;
}
@@ -341,6 +342,7 @@
ide_hwgroup_t *hwgroup = ch->hwgroup;
spin_lock_irqsave(&ide_lock, flags);
+
if (hwgroup->handler != NULL) {
printk("%s: ide_set_handler: handler not null; old=%p, new=%p, from %p\n",
drive->name, hwgroup->handler, handler, __builtin_return_address(0));
@@ -349,6 +351,7 @@
ch->expiry = expiry;
ch->timer.expires = jiffies + timeout;
add_timer(&ch->timer);
+
spin_unlock_irqrestore(&ide_lock, flags);
}
@@ -1071,8 +1074,10 @@
unsigned long flags;
spin_lock_irqsave(&ide_lock, flags);
+
hwgroup->handler = NULL;
del_timer(&ch->timer);
+
spin_unlock_irqrestore(&ide_lock, flags);
return start_request(drive, drive->rq);
@@ -1275,10 +1280,12 @@
disable_irq_nosync(drive->channel->irq);
spin_unlock(&ide_lock);
+
ide__sti(); /* allow other IRQs while we start this request */
startstop = start_request(drive, rq);
spin_lock_irq(&ide_lock);
+
if (masked_irq && drive->channel->irq != masked_irq)
enable_irq(drive->channel->irq);
@@ -1332,7 +1339,7 @@
static void ide_do_request(struct ata_channel *channel, int masked_irq)
{
ide_get_lock(&irq_lock, ata_irq_request, hwgroup);/* for atari only: POSSIBLY BROKEN HERE(?) */
- __cli(); /* necessary paranoia: ensure IRQs are masked on local CPU */
+// __cli(); /* necessary paranoia: ensure IRQs are masked on local CPU */
while (!test_and_set_bit(IDE_BUSY, &channel->active)) {
struct ata_channel *ch;
@@ -1362,11 +1369,16 @@
queue_commands(drive, masked_irq);
}
+
}
void do_ide_request(request_queue_t *q)
{
+ unsigned long flags;
+
+ spin_lock_irqsave(&ide_lock, flags);
ide_do_request(q->queuedata, 0);
+ spin_unlock_irqrestore(&ide_lock, flags);
}
/*
@@ -1455,7 +1467,9 @@
/* reengage timer */
ch->timer.expires = jiffies + wait;
add_timer(&ch->timer);
+
spin_unlock_irqrestore(&ide_lock, flags);
+
return;
}
}
@@ -1465,7 +1479,9 @@
* the handler() function, which means we need to globally
* mask the specific IRQ:
*/
+
spin_unlock(&ide_lock);
+
ch = drive->channel;
#if DISABLE_IRQ_NOSYNC
disable_irq_nosync(ch->irq);
@@ -1490,7 +1506,9 @@
}
set_recovery_timer(ch);
enable_irq(ch->irq);
+
spin_lock_irq(&ide_lock);
+
if (startstop == ide_stopped)
clear_bit(IDE_BUSY, &ch->active);
}
@@ -1621,6 +1639,7 @@
printk(KERN_ERR "%s: %s: hwgroup was not busy!?\n", drive->name, __FUNCTION__);
hwgroup->handler = NULL;
del_timer(&ch->timer);
+
spin_unlock(&ide_lock);
if (ch->unmask)
@@ -1725,7 +1744,9 @@
rq->rq_dev = mk_kdev(major,(drive->select.b.unit)<<PARTN_BITS);
if (action == ide_wait)
rq->waiting = &wait;
+
spin_lock_irqsave(&ide_lock, flags);
+
if (blk_queue_empty(&drive->queue) || action == ide_preempt) {
if (action == ide_preempt)
drive->rq = NULL;
@@ -1737,7 +1758,9 @@
}
q->elevator.elevator_add_req_fn(q, rq, queue_head);
ide_do_request(drive->channel, 0);
+
spin_unlock_irqrestore(&ide_lock, flags);
+
if (action == ide_wait) {
wait_for_completion(&wait); /* wait for it to be serviced */
return rq->errors ? -EIO : 0; /* return -EIO if errors */
@@ -1767,12 +1790,15 @@
spin_lock_irqsave(&ide_lock, flags);
if (drive->busy || (drive->usage > 1)) {
+
spin_unlock_irqrestore(&ide_lock, flags);
+
return -EBUSY;
}
drive->busy = 1;
MOD_INC_USE_COUNT;
+
spin_unlock_irqrestore(&ide_lock, flags);
res = wipe_partitions(i_rdev);
@@ -1789,6 +1815,7 @@
drive->busy = 0;
wake_up(&drive->wqueue);
MOD_DEC_USE_COUNT;
+
return res;
}
@@ -1950,6 +1977,7 @@
* All clear? Then blow away the buffer cache
*/
spin_unlock_irqrestore(&ide_lock, flags);
+
for (unit = 0; unit < MAX_DRIVES; ++unit) {
struct ata_device * drive = &ch->drives[unit];
@@ -1964,6 +1992,7 @@
}
}
}
+
spin_lock_irqsave(&ide_lock, flags);
/*
@@ -2221,11 +2250,14 @@
spin_lock_irq(&ide_lock);
while (test_bit(IDE_BUSY, &drive->channel->active)) {
+
spin_unlock_irq(&ide_lock);
+
if (time_after(jiffies, timeout)) {
printk("%s: channel busy\n", drive->name);
return -EBUSY;
}
+
spin_lock_irq(&ide_lock);
}
@@ -3455,7 +3487,7 @@
#if defined(CONFIG_BLK_DEV_IDE) || defined(CONFIG_BLK_DEV_IDE_MODULE)
# if defined(__mc68000__) || defined(CONFIG_APUS)
if (ide_hwifs[0].io_ports[IDE_DATA_OFFSET]) {
- ide_get_lock(&irq_lock, NULL, NULL);/* for atari only */
+ // ide_get_lock(&irq_lock, NULL, NULL);/* for atari only */
disable_irq(ide_hwifs[0].irq); /* disable_irq_nosync ?? */
// disable_irq_nosync(ide_hwifs[0].irq);
}
diff -urN linux-2.5.15/drivers/ide/ide-pci.c linux/drivers/ide/ide-pci.c
--- linux-2.5.15/drivers/ide/ide-pci.c 2002-05-13 15:13:11.000000000 +0200
+++ linux/drivers/ide/ide-pci.c 2002-05-13 15:01:41.000000000 +0200
@@ -553,15 +553,14 @@
}
}
}
-
- printk("ATA: %s: controller on PCI bus %02x dev %02x\n",
- dev->name, dev->bus->number, dev->devfn);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s dev %02x\n",
+ dev->name, dev->slot_name, dev->devfn);
setup_pci_device(dev, d);
if (!dev2)
return;
d2 = d;
- printk("ATA: %s: controller on PCI bus %02x dev %02x\n",
- dev2->name, dev2->bus->number, dev2->devfn);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s dev %02x\n",
+ dev2->name, dev2->slot_name, dev2->devfn);
setup_pci_device(dev2, d2);
}
@@ -584,8 +583,8 @@
}
}
- printk("%s: IDE controller on PCI bus %02x dev %02x\n",
- dev->name, dev->bus->number, dev->devfn);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s dev %02x\n",
+ dev->name, dev->slot_name, dev->devfn);
setup_pci_device(dev, d);
if (!dev2) {
return;
@@ -601,8 +600,8 @@
}
}
d2 = d;
- printk("%s: IDE controller on PCI bus %02x dev %02x\n",
- dev2->name, dev2->bus->number, dev2->devfn);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s dev %02x\n",
+ dev2->name, dev2->slot_name, dev2->devfn);
setup_pci_device(dev2, d2);
}
@@ -623,7 +622,7 @@
switch(class_rev) {
case 5:
case 4:
- case 3: printk("%s: IDE controller on PCI slot %s\n", dev->name, dev->slot_name);
+ case 3: printk(KERN_INFO "ATA: %s: controller on PCI slot %s\n", dev->name, dev->slot_name);
setup_pci_device(dev, d);
return;
default: break;
@@ -639,17 +638,17 @@
pci_read_config_byte(dev2, PCI_INTERRUPT_PIN, &pin2);
if ((pin1 != pin2) && (dev->irq == dev2->irq)) {
d->bootable = ON_BOARD;
- printk("%s: onboard version of chipset, pin1=%d pin2=%d\n", dev->name, pin1, pin2);
+ printk(KERN_INFO "ATA: %s: onboard version of chipset, pin1=%d pin2=%d\n", dev->name, pin1, pin2);
}
break;
}
}
- printk("%s: IDE controller on PCI slot %s\n", dev->name, dev->slot_name);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s\n", dev->name, dev->slot_name);
setup_pci_device(dev, d);
if (!dev2)
return;
d2 = d;
- printk("%s: IDE controller on PCI slot %s\n", dev2->name, dev2->slot_name);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s\n", dev2->name, dev2->slot_name);
setup_pci_device(dev2, d2);
}
@@ -679,6 +678,10 @@
}
if (!d) {
+ /* Only check the device calls, if it wasn't listed, since
+ * there are in esp. some pdc202xx chips which "work around"
+ * beeing grabbed by generic drivers.
+ */
if ((dev->class >> 8) == PCI_CLASS_STORAGE_IDE) {
printk(KERN_INFO "ATA: unknown interface: %s, on PCI slot %s\n",
dev->name, dev->slot_name);
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 12:17 ` [PATCH] 2.5.15 IDE 62 Martin Dalecki
@ 2002-05-13 13:48 ` Jens Axboe
2002-05-13 13:02 ` Martin Dalecki
2002-05-13 15:36 ` Tom Rini
1 sibling, 1 reply; 265+ messages in thread
From: Jens Axboe @ 2002-05-13 13:48 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
On Mon, May 13 2002, Martin Dalecki wrote:
> Mon May 13 12:38:11 CEST 2002 ide-clean-62
>
> - Add missing locking around ide_do_request in do_ide_request().
This is broken, do_ide_request() is already called with the request lock
held. tq_disk run -> generic_unplug_device (grab lock) ->
__generic_unplug_device -> do_ide_request(). You just introduced a
deadlock.
This code would have caused hangs or massive corruption immediately if
ide_lock wasn't already held there. Not to mention instant spin_unlock
BUG() triggers in queue_command().
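The calling convention described above can be modelled in userspace. This is only a sketch under stated assumptions, not kernel code: model_lock, model_trylock, model_unplug and the two request functions are hypothetical names, and a failed trylock stands in for a spinlock that would otherwise spin forever.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a non-recursive spinlock; not the kernel type. */
struct model_lock {
	bool held;
};

/*
 * Take the lock if it is free. A false return marks the point where a
 * real spinlock would spin forever.
 */
static bool model_trylock(struct model_lock *l)
{
	if (l->held)
		return false;
	l->held = true;
	return true;
}

/* Release the lock; the assert mirrors a BUG() on unlocking an unheld lock. */
static void model_unlock(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/* Request function that relies on its caller already holding the queue lock. */
static int request_fn_lock_held(struct model_lock *queue_lock)
{
	assert(queue_lock->held);
	return 0;
}

/* Request function that tries to take the queue lock a second time. */
static int request_fn_relocking(struct model_lock *queue_lock)
{
	if (!model_trylock(queue_lock))
		return -1;	/* the real code would deadlock here */
	model_unlock(queue_lock);
	return 0;
}

/* Models the unplug path: grab the queue lock, then call the request function. */
static int model_unplug(struct model_lock *queue_lock,
			int (*request_fn)(struct model_lock *))
{
	int ret;

	if (!model_trylock(queue_lock))
		return -1;
	ret = request_fn(queue_lock);
	model_unlock(queue_lock);
	return ret;
}
```

Passing request_fn_lock_held to model_unplug succeeds, while request_fn_relocking fails at its inner trylock, which is the spot where the real driver would hang.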
--
Jens Axboe
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 13:48 ` Jens Axboe
@ 2002-05-13 13:02 ` Martin Dalecki
2002-05-13 15:38 ` Jens Axboe
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-13 13:02 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Kernel Mailing List
Jens Axboe wrote:
> On Mon, May 13 2002, Martin Dalecki wrote:
>
>>Mon May 13 12:38:11 CEST 2002 ide-clean-62
>>
>>- Add missing locking around ide_do_request in do_ide_request().
>
>
> This is broken, do_ide_request() is already called with the request lock
> held. tq_disk run -> generic_unplug_device (grab lock) ->
> __generic_unplug_device -> do_ide_request(). You just introduced a
> deadlock.
>
> This code would have caused hangs or massive corruption immediately if
> ide_lock wasn't already held there. Not to mention instant spin_unlock
> BUG() triggers in queue_command()
>
Oops. Indeed I see now that the ide_lock is exported to
the upper layers above it in ide-probe.c
blk_init_queue(q, do_ide_request, &ide_lock);
But this is problematic in itself, since it means that
we are basically serializing between *all* requests
on all channels.
So I think we should have per channel locks on this level
right? This is anyway our unit for serialization.
(I'm just surprised that blk_init_queue() doesn't
provide queue specific locking and relies on exported
locks from the drivers...)
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 13:02 ` Martin Dalecki
@ 2002-05-13 15:38 ` Jens Axboe
2002-05-13 15:45 ` Martin Dalecki
` (2 more replies)
0 siblings, 3 replies; 265+ messages in thread
From: Jens Axboe @ 2002-05-13 15:38 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
On Mon, May 13 2002, Martin Dalecki wrote:
> Jens Axboe wrote:
> >On Mon, May 13 2002, Martin Dalecki wrote:
> >
> >>Mon May 13 12:38:11 CEST 2002 ide-clean-62
> >>
> >>- Add missing locking around ide_do_request in do_ide_request().
> >
> >
> >This is broken, do_ide_request() is already called with the request lock
> >held. tq_disk run -> generic_unplug_device (grab lock) ->
> >__generic_unplug_device -> do_ide_request(). You just introduced a
> >deadlock.
> >
> >This code would have caused hangs or massive corruption immediately if
> >ide_lock wasn't already held there. Not to mention instant spin_unlock
> >BUG() triggers in queue_command()
> >
>
> Oops. Indeed I see now that the ide_lock is exported to
> the upper layers above it in ide-probe.c
>
> blk_init_queue(q, do_ide_request, &ide_lock);
>
> But this is problematic in itself, since it means that
> we are basically serializing between *all* requests
> on all channels.
Correct.
> So I think we should have per channel locks on this level
> right? This is anyway our unit for serialization.
> (I'm just surprised that blk_init_queue() doesn't
> provide queue specific locking and relies on exported
> locks from the drivers...)
Sure go ahead and fine grain it, I had no time to go that much into
detail when ripping out io_request_lock. A drive->lock passed to
blk_init_queue would do nicely.
But beware that ide locking is a lot nastier than you think. I saw other
irq changes earlier, I just want to make sure that you are _absolutely_
certain that these changes are safe??
--
Jens Axboe
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 15:38 ` Jens Axboe
@ 2002-05-13 15:45 ` Martin Dalecki
2002-05-13 16:54 ` Linus Torvalds
2002-05-13 15:50 ` Martin Dalecki
2002-05-13 17:52 ` benh
2 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-13 15:45 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Kernel Mailing List
Jens Axboe wrote:
> On Mon, May 13 2002, Martin Dalecki wrote:
>
>>Jens Axboe wrote:
>>
>>>On Mon, May 13 2002, Martin Dalecki wrote:
>>>
>>>
>>>>Mon May 13 12:38:11 CEST 2002 ide-clean-62
>>>>
>>>>- Add missing locking around ide_do_request in do_ide_request().
>>>
>>>
>>>This is broken, do_ide_request() is already called with the request lock
>>>held. tq_disk run -> generic_unplug_device (grab lock) ->
>>>__generic_unplug_device -> do_ide_request(). You just introduced a
>>>deadlock.
>>>
>>>This code would have caused hangs or massive corruption immediately if
>>>ide_lock wasn't already held there. Not to mention instant spin_unlock
>>>BUG() triggers in queue_command()
>>>
>>
>>Oops. Indeed I see now that the ide_lock is exported to
>>the upper layers above it in ide-probe.c
>>
>>blk_init_queue(q, do_ide_request, &ide_lock);
>>
>>But this is problematic in itself, since it means that
>>we are basically serializing between *all* requests
>>on all channels.
>
>
> Correct.
>
>
>>So I think we should have per channel locks on this level
>>right? This is anyway our unit for serialization.
>>(I'm just surprised that blk_init_queue() doesn't
>>provide queue specific locking and relies on exported
>>locks from the drivers...)
>
>
> Sure go ahead and fine grain it, I had no time to go that much into
> detail when ripping out io_request_lock. A drive->lock passed to
> blk_init_queue would do nicely.
>
> But beware that ide locking is a lot nastier than you think. I saw other
> irq changes earlier, I just want to make sure that you are _absolutely_
> certain that these changes are safe??
Well on the channel level they are safe modulo cmd640 and rz1000.
We can handle them by serializing them on the global lock
in do_ide_request. Like:
if (ch->drive[0].serialized|| ch->drive[1].serialized)
then
spin_lock(serialize_lock);
The other case is a PCI IRQ shared between two channels,
but this case I can easily test on my HPT772 controller card.
You could have observed the hwgroup_t melting down... step by step.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 15:45 ` Martin Dalecki
@ 2002-05-13 16:54 ` Linus Torvalds
2002-05-13 16:55 ` Jens Axboe
2002-05-13 18:02 ` benh
0 siblings, 2 replies; 265+ messages in thread
From: Linus Torvalds @ 2002-05-13 16:54 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Jens Axboe, Kernel Mailing List
[ Martin - just a heads up that I'm not applying 62, so don't make new IDE
patches relative to that ]
On Mon, 13 May 2002, Martin Dalecki wrote:
>
> Well on the channel level they are safe modulo cmd640 and rz1000.
> We can handle them by serializing them on the global lock
> in do_ide_request. Like:
>
> if (ch->drive[0].serialized|| ch->drive[1].serialized)
> then
> spin_lock(serialize_lock);
NO.
The whole point of having a per-queue lock pointer is that this should be
initialized at queue creation time. Don't add more crud to the IDE
locking, we want to get _rid_ of the locking that IDE has thought
(traditionally incorrectly) that it could do better than the higher
levels.
So when you create the queue, you should decide at THAT point whether you
just want to pass in the same lock or not.
For a cmd640, you make sure that both queues get created with the same
lock. And for non-broken chipsets, you use per-queue locks.
And then you make sure that nobody EVER uses any other lock than the queue
lock.
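A minimal userspace sketch of that rule, assuming a hypothetical model_channel type and a boolean "serialized" flag for chipsets such as cmd640: the lock is chosen once at setup time, and the request path never touches any other lock. None of these names come from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a spinlock. */
struct model_lock {
	bool held;
};

struct model_channel {
	struct model_lock own_lock;	/* storage for a per-channel lock */
	struct model_lock *queue_lock;	/* the lock the queue is created with */
};

/*
 * Decide the locking once, at queue creation time. "serialized" is true
 * for chipsets whose two channels cannot operate independently.
 */
static void model_setup_pair(struct model_channel *primary,
			     struct model_channel *secondary,
			     bool serialized)
{
	primary->queue_lock = &primary->own_lock;
	secondary->queue_lock = serialized ? &primary->own_lock
					   : &secondary->own_lock;
}

/* Request path: only ever uses the lock chosen at setup time. */
static bool model_try_begin(struct model_channel *ch)
{
	if (ch->queue_lock->held)
		return false;	/* another request holds the same lock */
	ch->queue_lock->held = true;
	return true;
}

static void model_end(struct model_channel *ch)
{
	assert(ch->queue_lock->held);
	ch->queue_lock->held = false;
}
```

With serialized set, a request in flight on the primary channel blocks the secondary; without it, both channels proceed independently, and the request path contains no chipset-specific branch at all.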
Linus
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 16:54 ` Linus Torvalds
@ 2002-05-13 16:55 ` Jens Axboe
2002-05-13 16:00 ` Martin Dalecki
2002-05-13 18:02 ` benh
1 sibling, 1 reply; 265+ messages in thread
From: Jens Axboe @ 2002-05-13 16:55 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Martin Dalecki, Kernel Mailing List
On Mon, May 13 2002, Linus Torvalds wrote:
> > Well on the channel level they are safe modulo cmd640 and rz1000.
> > We can handle them by serializing them on the global lock
> > in do_ide_request. Like:
> >
> > if (ch->drive[0].serialized|| ch->drive[1].serialized)
> > then
> > spin_lock(serialize_lock);
>
> NO.
>
> The whole point of having a per-queue lock pointer is that this should be
> initialized at queue creation time. Don't add more crud to the IDE
> locking, we want to get _rid_ of the locking that IDE has thought
> (traditionally incorrectly) that it could do better than the higher
> levels.
>
> So when you create the queue, you should decide at THAT point whether you
> just want to pass in the same lock or not.
>
> For a cmd640, you make sure that both queues get created with the same
> lock. And for non-broken chipsets, you use per-queue locks.
>
> And then you make sure that nobody EVER uses any other lock than the queue
> lock.
Completely agreed. And when we finally use the queue as the
serialization point for "everything", then it all falls into place
nicely.
--
Jens Axboe
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 16:55 ` Jens Axboe
@ 2002-05-13 16:00 ` Martin Dalecki
0 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-13 16:00 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Kernel Mailing List
Jens Axboe wrote:
> On Mon, May 13 2002, Linus Torvalds wrote:
>
>>>Well on the channel level they are safe modulo cmd640 and rz1000.
>>>We can handle them by serializing them on the global lock
>>>in do_ide_request. Like:
>>>
>>>if (ch->drive[0].serialized|| ch->drive[1].serialized)
>>> then
>>> spin_lock(serialize_lock);
>>
>>NO.
>>
>>The whole point of having a per-queue lock pointer is that this should be
>>initialized at queue creation time. Don't add more crud to the IDE
>>locking, we want to get _rid_ of the locking that IDE has thought
>>(traditionally incorrectly) that it could do better than the higher
>>levels.
>>
>>So when you create the queue, you should decide at THAT point whether you
>>just want to pass in the same lock or not.
>>
>>For a cmd640, you make sure that both queues get created with the same
>>lock. And for non-broken chipsets, you use per-queue locks.
>>
>>And then you make sure that nobody EVER uses any other lock than the queue
>>lock.
>
>
> Completely agreed. And when we finally use the queue as the
> serialization point for "everything", then it all falls into place
> nicely.
Well, actually I came to the same conclusion regarding dealing
with broken chipsets. However please note that:
1. Queues are per device, since we have to deal with
the fact that the code flow can differ depending on whether:
1.1. The drive is doing DMA transfers.
1.2. The drive is doing TCQ. (Could and should be unified with 1.1.)
1.3. The drive is doing ATAPI.
2. Operations are per channel and not per queue.
Therefore the queue locking and basic serialization
has to be on the channel level, with the "lock recycling trick"
for the two interface chips which can't distinguish properly
between primary and secondary channel.
OK?
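The arrangement in points 1 and 2 can be sketched in userspace as follows. This is a model under assumptions, not the driver's data structures: model_channel, model_drive and model_queue are hypothetical types, and each per-device queue simply points at the lock of the channel it sits on.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MAX_DRIVES 2

/* Hypothetical stand-in for a spinlock. */
struct model_lock {
	bool held;
};

/* One queue per device; the queue only stores a pointer to its lock. */
struct model_queue {
	struct model_lock *lock;
};

struct model_drive {
	struct model_queue queue;
};

/* The channel owns the lock, since operations are per channel. */
struct model_channel {
	struct model_lock lock;
	struct model_drive drives[MODEL_MAX_DRIVES];
};

/* Point every device queue on this channel at the channel's lock. */
static void model_channel_init(struct model_channel *ch)
{
	int unit;

	ch->lock.held = false;
	for (unit = 0; unit < MODEL_MAX_DRIVES; ++unit)
		ch->drives[unit].queue.lock = &ch->lock;
}

/* Start an operation; fails while either drive on the channel is busy. */
static bool model_drive_try_start(struct model_drive *drive)
{
	if (drive->queue.lock->held)
		return false;
	drive->queue.lock->held = true;
	return true;
}

static void model_drive_finish(struct model_drive *drive)
{
	assert(drive->queue.lock->held);
	drive->queue.lock->held = false;
}
```

Master and slave on one channel exclude each other, while drives on different channels do not interfere. The "lock recycling trick" would amount to pointing a second channel's queues at the first channel's lock instead of its own.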
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 16:54 ` Linus Torvalds
2002-05-13 16:55 ` Jens Axboe
@ 2002-05-13 18:02 ` benh
1 sibling, 0 replies; 265+ messages in thread
From: benh @ 2002-05-13 18:02 UTC (permalink / raw)
To: Linus Torvalds, Martin Dalecki; +Cc: Jens Axboe, Kernel Mailing List
>And then you make sure that nobody EVER uses any other lock than the queue
>lock.
Except that some controllers are perfectly safe to use both channels
at the same time, except when dealing with rare and sensitive operations
(like changing channel settings) where a common set of registers is
shared between channels.
The controller driver in this case will want a lock per channel queue
and an internal lock to protect access to these shared registers. But
that lock can (and has to be) hidden in the controller driver.
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 15:38 ` Jens Axboe
2002-05-13 15:45 ` Martin Dalecki
@ 2002-05-13 15:50 ` Martin Dalecki
2002-05-13 17:52 ` benh
2 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-13 15:50 UTC (permalink / raw)
To: Jens Axboe; +Cc: Linus Torvalds, Kernel Mailing List
> But beware that ide locking is a lot nastier than you think. I saw other
> irq changes earlier, I just want to make sure that you are _absolutely_
> certain that these changes are safe??
Well, I'm at least warned now to double check with the current
BIO stuff...
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 15:38 ` Jens Axboe
2002-05-13 15:45 ` Martin Dalecki
2002-05-13 15:50 ` Martin Dalecki
@ 2002-05-13 17:52 ` benh
2002-05-13 15:55 ` Martin Dalecki
2 siblings, 1 reply; 265+ messages in thread
From: benh @ 2002-05-13 17:52 UTC (permalink / raw)
To: Jens Axboe, Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
>> So I think we should have per channel locks on this level
>> right? This is anyway our unit for serialization.
>> (I'm just surprised that blk_init_queue() doesn't
>> provide queue specific locking and relies on exported
>> locks from the drivers...)
>
>Sure go ahead and fine grain it, I had no time to go that much into
>detail when ripping out io_request_lock. A drive->lock passed to
>blk_init_queue would do nicely.
>
>But beware that ide locking is a lot nastier than you think. I saw other
>irq changes earlier, I just want to make sure that you are _absolutely_
>certain that these changes are safe??
You'll probably need a per-host lock (but that one can be safely
hidden in the host controller driver I believe) since some hosts
share some registers for their 2 channels (timings can be bitfields
in a single register controlling 2 channels, I'm not too sure about
legacy DMA).
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 17:52 ` benh
@ 2002-05-13 15:55 ` Martin Dalecki
2002-05-13 19:13 ` benh
0 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-13 15:55 UTC (permalink / raw)
To: benh; +Cc: Jens Axboe, Linus Torvalds, Kernel Mailing List
benh@kernel.crashing.org wrote:
>>>So I think we should have per channel locks on this level
>>>right? This is anyway our unit for serialization.
>>>(I'm just surprised that blk_init_queue() doesn't
>>>provide queue specific locking and relies on exported
>>>locks from the drivers...)
>>
>>Sure go ahead and fine grain it, I had no time to go that much into
>>detail when ripping out io_request_lock. A drive->lock passed to
>>blk_init_queue would do nicely.
>>
>>But beware that ide locking is a lot nastier than you think. I saw other
>>irq changes earlier, I just want to make sure that you are _absolutely_
>>certain that these changes are safe??
>
>
> You'll probably need a per-host lock (but that one can be safely
> hidden in the host controller driver I believe) since some hosts
> share some registers for their 2 channels (timings can be bitfields
> in a single register controlling 2 channels, I'm not too sure about
> legacy DMA).
Just to clarify it... From the host view it's not the chipset
it's a channel we have to deal with. And there are typically two
channels on a host. For the serialized parts, we have two
possibilities:
1. Preserve the current behaviour of using additionally a global
lock.
2. "Cheat" and reuse the lock from the primary channel during
the initialization of the secondary channel.
Hmmm.... Thinking a bit about it I'm now convinced that 2. is more
elegant than 1. And finally this will
just allow us to make the hwgroup_t go away entirely.
The only thing that worries me are the checks of hwgroup_t's
only remaining member, ->handler, which are used to determine whether
some IRQ is pending or not.
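A rough userspace sketch of that ->handler check, assuming the pointer has been moved into a per-channel structure. The model_channel type and the model_* functions are hypothetical and only illustrate the idea that a non-NULL handler means an interrupt is expected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_channel;
typedef int (*model_handler_t)(struct model_channel *ch);

struct model_channel {
	model_handler_t handler;	/* non-NULL while an interrupt is expected */
	int completed;			/* number of handled interrupts */
};

/* Arm the channel: the next interrupt is dispatched to h. */
static void model_set_handler(struct model_channel *ch, model_handler_t h)
{
	assert(ch->handler == NULL);	/* only one operation may be outstanding */
	ch->handler = h;
}

/* The check in question: is some IRQ pending on this channel? */
static bool model_irq_expected(const struct model_channel *ch)
{
	return ch->handler != NULL;
}

/* Interrupt entry point; returns false for an unexpected interrupt. */
static bool model_irq(struct model_channel *ch)
{
	model_handler_t h = ch->handler;

	if (h == NULL)
		return false;
	ch->handler = NULL;	/* clear first, so the handler may re-arm */
	h(ch);
	return true;
}

/* Example handler that just counts completions. */
static int model_complete(struct model_channel *ch)
{
	ch->completed++;
	return 0;
}
```

In the real driver both the check and the dispatch would have to run under the channel's lock; the model leaves locking out to keep the state machine visible.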
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 15:55 ` Martin Dalecki
@ 2002-05-13 19:13 ` benh
2002-05-14 8:48 ` Martin Dalecki
2002-05-17 11:40 ` Martin Dalecki
0 siblings, 2 replies; 265+ messages in thread
From: benh @ 2002-05-13 19:13 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Jens Axboe, Linus Torvalds, Kernel Mailing List
>Just to clarify it... From the host view it's not the chipset
>it's a channel we have to deal with. And there are typically two
>channels on a host. For the serialized parts, we have two
>possibilities:
>
>1. Preserve the current behaviour of using additionally a global
>lock.
>
>2. "Cheat" and reuse the lock from the primary channel during
>the initialization of the secondary channel.
>
>Hmmm.... Thinking a bit about it I'm now convinced that 2. is more
>elegant than 1. And finally this will
>just allow us to make the hwgroup_t go away entirely.
I would do things differently. From the common point of view,
what we deal with is
          controller
          /        \
   channel x, channel y, ....
That is an _arbitrary_ number of channels. So the host driver
should just register individual "channels" to the IDE layer,
each one has its queue lock, period.
Now, if for any reason, the host specific code has to synchronize
between several of its channels when dealing with things like
chipset configuration, it's up to that host driver to know about
it and deal with it; which makes perfect sense to be done with a
third lock specific to protecting those specific registers that
are shared and that is completely internal to the host chipset
driver.
The only case I see where the host may have to additionally go
and grab the other channel's locks (the queue lock or whatever
you call it) is if the actual setting change on one channel
has a side effect on another channel that is currently transferring.
But that is completely internal to the host, and yes, I agree
that reusing the other channel's lock is probably the best solution.
But in cases where you just have 2 bitfields in the same register
that need serialized access from both channels, a simple lock
protecting only that register seems to be plenty enough.
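As an illustration of the simple case, here is a userspace sketch of one register holding a timing bitfield per channel, guarded by a lock private to the host driver. The layout (one nibble per channel) and every name in it are assumptions made for the example, not taken from any real chipset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a spinlock. */
struct model_lock {
	bool held;
};

struct model_host {
	struct model_lock reg_lock;	/* internal to the host driver */
	uint8_t timing_reg;		/* low nibble: channel 0, high nibble: channel 1 */
};

static void model_lock_acquire(struct model_lock *l)
{
	assert(!l->held);
	l->held = true;
}

static void model_lock_release(struct model_lock *l)
{
	assert(l->held);
	l->held = false;
}

/*
 * Read-modify-write of one channel's 4-bit field. The lock keeps the
 * other channel's field intact if both channels are reconfigured at once.
 */
static void model_set_timing(struct model_host *host, unsigned int channel,
			     uint8_t timing)
{
	unsigned int shift = (channel & 1u) * 4u;
	uint8_t mask = (uint8_t)(0x0fu << shift);

	model_lock_acquire(&host->reg_lock);
	host->timing_reg = (uint8_t)((host->timing_reg & ~mask) |
				     ((uint8_t)(timing << shift) & mask));
	model_lock_release(&host->reg_lock);
}
```

The per-channel queue locks never appear here: the register lock is taken only for the short read-modify-write, so normal I/O on the two channels is not serialized by it.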
What did I miss ?
Ben.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 19:13 ` benh
@ 2002-05-14 8:48 ` Martin Dalecki
2002-05-17 11:40 ` Martin Dalecki
1 sibling, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-14 8:48 UTC (permalink / raw)
To: benh; +Cc: Jens Axboe, Linus Torvalds, Kernel Mailing List
benh@kernel.crashing.org wrote:
>>Just to clarify it... From the host view it's not the chipset
>>it's a channel we have to deal with. And there are typically two
>>channels on a host. For the serialized parts, we have two
>>possibilities:
>>
>>1. Preserve the current behaviour of using additionally a global
>>lock.
>>
>>2. "Cheat" and reuse the lock from the primary channel during
>>the initialization of the secondary channel.
>>
>>Hmmm.... Thinking a bit about it I'm now convinced that 2. is more
>>elegant than 1. And finally this will
>>just allow us to make the hwgroup_t go away entirely.
>
>
> I would do things differently. From the common point of view,
> what we deal with is
>
> controller
> / \
> channel x, channel y, ....
>
> That is an _arbitrary_ number of channels. So the host driver
> should just register individual "channels" to the IDE layer,
> each one has its queue lock, period.
>
> Now, if for any reason, the host specific code has to synchronize
> between several of its channels when dealing with things like
> chipset configuration, it's up to that host driver to know about
> it and deal with it; which makes perfect sense to be done with a
> third lock specific to protecting those specific registers that
> are shared and that is completely internal to the host chipset
> driver.
>
> The only case I see where the host may have to additionally go
> and grab the other channel's locks (the queue lock or whatever
> you call it) is if the actual setting change on one channel
> has a side effect on another channel that is currently transferring.
The problem is that not all setup register file access
is localized to the particular host chip driver. Go look
at rz1000 - it does not export any corresponding function.
And we therefore have to deal with it on the generic level.
> But that is completely internal to the host, and yes, I agree
> that reusing the other channel's lock is probably the best solution.
>
> But in cases where you just have 2 bitfields in the same register
> that need serialized access from both channels, a simple lock
> protecting only that register seems to be plenty enough.
>
> What did I miss ?
See above.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 19:13 ` benh
2002-05-14 8:48 ` Martin Dalecki
@ 2002-05-17 11:40 ` Martin Dalecki
2002-05-17 2:27 ` Benjamin Herrenschmidt
1 sibling, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-17 11:40 UTC (permalink / raw)
To: benh; +Cc: Jens Axboe, Linus Torvalds, Kernel Mailing List
benh@kernel.crashing.org wrote:
>>Just to clarify it... From the host view it's not the chipset
>>it's a channel we have to deal with. And there are typically two
>>channels on a host. For the serialized parts, we have two
>>possibilities:
>>
>>1. Preserve the current behaviour of using additionally a global
>>lock.
>>
>>2. "Cheat" and reuse the lock from the primary channel during
>>the initialization of the secondary channel.
>>
>>Hmmm.... Thinking a bit about it I'm now convinced that 2. is more
>>elegant than 1. And finally this will
>>just allow us to make the hwgroup_t go away entirely.
>
>
> I would do things differently. From the common point of view,
> what we deal with is
>
> controller
> / \
> channel x, channel y, ....
>
> That is an _arbitrary_ number of channels. So the host driver
> should just register individual "channels" to the IDE layer,
> each one has its queue lock, period.
>
> Now, if for any reason, the host specific code has to synchronize
> between several of its channels when dealing with things like
> chipset configuration, it's up to that host driver to know about
> it and deal with it; which makes perfect sense to be done with a
> third lock specific to protecting those specific registers that
> are shared and that is completely internal to the host chipset
> driver.
>
> The only case I see where the host may have to additionally go
> and grab the other channel's locks (the queue lock or whatever
> you call it) is if the actual setting change on one channel
> has a side effect on another channel that is currently transferring.
>
> But that is completely internal to the host, and yes, I agree
> that reusing the other channel's lock is probably the best solution.
>
> But in cases where you just have 2 bitfields in the same register
> that need serialized access from both channels, a simple lock
> protecting only that register seems to be plenty enough.
>
> What did I miss ?
1. The fact that there are some cases where the initialization code
doesn't necessarily go down to the host chip drivers right now.
2. Most of the current code...
BTW: The code will be much cleaner in the upcoming IDE 65, since
the allocation of the structures shared between two channels will be
simply pushed down to the corresponding host chip drivers instead of
the "match search" done after the channels have been initialized.
Since most of the host chip drivers are not reentrant anyway we will
be able to save quite a lot of allocation code as well.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-17 11:40 ` Martin Dalecki
@ 2002-05-17 2:27 ` Benjamin Herrenschmidt
0 siblings, 0 replies; 265+ messages in thread
From: Benjamin Herrenschmidt @ 2002-05-17 2:27 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Jens Axboe, Linus Torvalds, Kernel Mailing List
>1. The fact that there are some cases where the initialization code
>doesn't necessarily go down to the host chip drivers right now.
Well, I'd rather see the init code be some kind of "lib" called
by the host driver, but well...
>2. Most of the current code...
Sure ;) Note that the real point I missed is about broken controllers
that can't grok dealing with simultaneous requests on both channels
(like cmd640) in which case, you probably need the same request
queue to serialize access to both of them. In practice, that means
more or less dealing with both channels the same way we do
with slave vs. master.
My understanding though is that the block layer can't (yet ?) quite
deal with a single request queue for several target devices, and
it seems that the whole point of the old "busy" flag along with
Andre's taskfile stuff was to perform some kind of fair arbitration
between which channels/targets got a chance to process requests.
>BTW: The code will be much cleaner in the upcoming IDE 65, since
>the allocation of the structures shared between two channels will be
>simply pushed down to the corresponding host chip drivers instead of
>the "match search" done after the channels have been initialized.
Great ! We are slowly going toward a real host controller driver
template finally ;)
>Since most of the host chip drivers are not reentrant anyway we will
>be able to save quite a lot of allocation code as well.
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 62
2002-05-13 12:17 ` [PATCH] 2.5.15 IDE 62 Martin Dalecki
2002-05-13 13:48 ` Jens Axboe
@ 2002-05-13 15:36 ` Tom Rini
1 sibling, 0 replies; 265+ messages in thread
From: Tom Rini @ 2002-05-13 15:36 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
On Mon, May 13, 2002 at 02:17:04PM +0200, Martin Dalecki wrote:
> Mon May 13 12:38:11 CEST 2002 ide-clean-62
Hello. Since include/linux/ide.h uses 'u8', 'u16', and 'u64', can
you apply the following so that it doesn't rely on <asm/types.h> being
included indirectly?
--
Tom Rini (TR1265)
http://gate.crashing.org/~trini/
===== include/linux/ide.h 1.60 vs edited =====
--- 1.60/include/linux/ide.h Thu May 9 11:43:58 2002
+++ edited/include/linux/ide.h Mon May 13 08:34:32 2002
@@ -17,6 +17,7 @@
#include <linux/bitops.h>
#include <asm/byteorder.h>
#include <asm/hdreg.h>
+#include <asm/types.h>
/*
* This is the multiple IDE interface driver, as evolved from hd.c.
^ permalink raw reply [flat|nested] 265+ messages in thread
* [PATCH] 2.5.15 IDE 62a
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (9 preceding siblings ...)
2002-05-13 12:17 ` [PATCH] 2.5.15 IDE 62 Martin Dalecki
@ 2002-05-14 10:26 ` Martin Dalecki
2002-05-14 10:28 ` [PATCH] 2.5.15 IDE 63 Martin Dalecki
2002-05-15 12:04 ` [PATCH] 2.5.15 IDE 64 Martin Dalecki
12 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-14 10:26 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1504 bytes --]
Mon May 13 15:20:18 CEST 2002 ide-clean-62a
62 was crap. This applies on top of 61.
- Streamline device detection reporting to always use ->slot_name.
- Apply 64 bit sector size fixes to the overall code.
- Push ->handler down to the struct ata_channel.
- Introduce channel group based locking instead of a single global lock for all
operations. There are still some places where we have preserved the ide_lock.
We can't lock for queues during device probe and we protect global data
structures during device registration and unregistration in ide.c with it.
- Start replacement of serialized access to the registers of
channels which share them with proper host chip driver specific locking.
This affects the following host chip drivers:
cmd640.c, rz1000, ... ?
Seems some are setting the serialize flag just in case. So better let's do it
gradually over time.
Well, I still have to think about whether we really need to put channels sharing
an IRQ line in the same locking group.
The sick concept of a hw group is gone now. We have full blown
per channel request queues! Hopefully I will soon be able to get my hands on
a dual Athlon machine to check how this all behaves on an SMP machine.
- Move the whole SUPPORT_VLB_SYNC stuff to the only place where it is used: the
pdc4030 host chip driver. Eliminate it from the global driver part.
- Eliminate pseudo portability macros from pdc4030. This is a host chip firmly
based on VLB.
[-- Attachment #2: ide-clean-62a.diff --]
[-- Type: text/plain, Size: 74585 bytes --]
diff -urN linux-2.5.15/drivers/ide/cmd640.c linux/drivers/ide/cmd640.c
--- linux-2.5.15/drivers/ide/cmd640.c 2002-05-10 00:24:11.000000000 +0200
+++ linux/drivers/ide/cmd640.c 2002-05-13 23:53:42.000000000 +0200
@@ -155,7 +155,7 @@
#define CMDTIM 0x52
#define ARTTIM0 0x53
#define DRWTIM0 0x54
-#define ARTTIM1 0x55
+#define ARTTIM1 0x55
#define DRWTIM1 0x56
#define ARTTIM23 0x57
#define ARTTIM23_DIS_RA2 0x04
@@ -166,36 +166,42 @@
/*
* Registers and masks for easy access by drive index:
*/
-static byte prefetch_regs[4] = {CNTRL, CNTRL, ARTTIM23, ARTTIM23};
-static byte prefetch_masks[4] = {CNTRL_DIS_RA0, CNTRL_DIS_RA1, ARTTIM23_DIS_RA2, ARTTIM23_DIS_RA3};
+static u8 prefetch_regs[4] = {CNTRL, CNTRL, ARTTIM23, ARTTIM23};
+static u8 prefetch_masks[4] = {CNTRL_DIS_RA0, CNTRL_DIS_RA1, ARTTIM23_DIS_RA2, ARTTIM23_DIS_RA3};
#ifdef CONFIG_BLK_DEV_CMD640_ENHANCED
-static byte arttim_regs[4] = {ARTTIM0, ARTTIM1, ARTTIM23, ARTTIM23};
-static byte drwtim_regs[4] = {DRWTIM0, DRWTIM1, DRWTIM23, DRWTIM23};
+/*
+ * Protects register file access from overlapping on primary and secondary
+ * channel, since those share hardware resources.
+ */
+static spinlock_t cmd640_lock __cacheline_aligned = SPIN_LOCK_UNLOCKED;
+
+static u8 arttim_regs[4] = {ARTTIM0, ARTTIM1, ARTTIM23, ARTTIM23};
+static u8 drwtim_regs[4] = {DRWTIM0, DRWTIM1, DRWTIM23, DRWTIM23};
/*
* Current cmd640 timing values for each drive.
* The defaults for each are the slowest possible timings.
*/
-static byte setup_counts[4] = {4, 4, 4, 4}; /* Address setup count (in clocks) */
-static byte active_counts[4] = {16, 16, 16, 16}; /* Active count (encoded) */
-static byte recovery_counts[4] = {16, 16, 16, 16}; /* Recovery count (encoded) */
+static u8 setup_counts[4] = {4, 4, 4, 4}; /* Address setup count (in clocks) */
+static u8 active_counts[4] = {16, 16, 16, 16}; /* Active count (encoded) */
+static u8 recovery_counts[4] = {16, 16, 16, 16}; /* Recovery count (encoded) */
-#endif /* CONFIG_BLK_DEV_CMD640_ENHANCED */
+#endif
/*
* These are initialized to point at the devices we control
*/
static struct ata_channel *cmd_hwif0, *cmd_hwif1;
-static ide_drive_t *cmd_drives[4];
+static struct ata_device *cmd_drives[4];
/*
* Interface to access cmd640x registers
*/
static unsigned int cmd640_key;
static void (*put_cmd640_reg)(unsigned short reg, byte val);
-static byte (*get_cmd640_reg)(unsigned short reg);
+static u8 (*get_cmd640_reg)(unsigned short reg);
/*
* This is read from the CFR reg, and is used in several places.
@@ -221,9 +227,9 @@
restore_flags(flags);
}
-static byte get_cmd640_reg_pci1 (unsigned short reg)
+static u8 get_cmd640_reg_pci1 (unsigned short reg)
{
- byte b;
+ u8 b;
unsigned long flags;
save_flags(flags);
@@ -236,7 +242,7 @@
/* PCI method 2 access (from CMD datasheet) */
-static void put_cmd640_reg_pci2 (unsigned short reg, byte val)
+static void put_cmd640_reg_pci2 (unsigned short reg, u8 val)
{
unsigned long flags;
@@ -248,9 +254,9 @@
restore_flags(flags);
}
-static byte get_cmd640_reg_pci2 (unsigned short reg)
+static u8 get_cmd640_reg_pci2 (unsigned short reg)
{
- byte b;
+ u8 b;
unsigned long flags;
save_flags(flags);
@@ -264,7 +270,7 @@
/* VLB access */
-static void put_cmd640_reg_vlb (unsigned short reg, byte val)
+static void put_cmd640_reg_vlb (unsigned short reg, u8 val)
{
unsigned long flags;
@@ -275,9 +281,9 @@
restore_flags(flags);
}
-static byte get_cmd640_reg_vlb (unsigned short reg)
+static u8 get_cmd640_reg_vlb (unsigned short reg)
{
- byte b;
+ u8 b;
unsigned long flags;
save_flags(flags);
@@ -290,7 +296,7 @@
static int __init match_pci_cmd640_device (void)
{
- const byte ven_dev[4] = {0x95, 0x10, 0x40, 0x06};
+ const u8 ven_dev[4] = {0x95, 0x10, 0x40, 0x06};
unsigned int i;
for (i = 0; i < 4; i++) {
if (get_cmd640_reg(i) != ven_dev[i])
@@ -338,7 +344,7 @@
*/
static int __init probe_for_cmd640_vlb (void)
{
- byte b;
+ u8 b;
get_cmd640_reg = get_cmd640_reg_vlb;
put_cmd640_reg = put_cmd640_reg_vlb;
@@ -404,7 +410,7 @@
static void __init check_prefetch (unsigned int index)
{
struct ata_device *drive = cmd_drives[index];
- byte b = get_cmd640_reg(prefetch_regs[index]);
+ u8 b = get_cmd640_reg(prefetch_regs[index]);
if (b & prefetch_masks[index]) { /* is prefetch off? */
drive->channel->no_unmask = 0;
@@ -450,19 +456,19 @@
*/
static void set_prefetch_mode (unsigned int index, int mode)
{
- ide_drive_t *drive = cmd_drives[index];
+ struct ata_device *drive = cmd_drives[index];
int reg = prefetch_regs[index];
- byte b;
+ u8 b;
unsigned long flags;
save_flags(flags);
cli();
b = get_cmd640_reg(reg);
if (mode) { /* want prefetch on? */
-#if CMD640_PREFETCH_MASKS
+# if CMD640_PREFETCH_MASKS
drive->channel->no_unmask = 1;
drive->channel->unmask = 0;
-#endif
+# endif
drive->channel->no_io_32bit = 0;
b &= ~prefetch_masks[index]; /* enable prefetch */
} else {
@@ -480,7 +486,7 @@
*/
static void display_clocks (unsigned int index)
{
- byte active_count, recovery_count;
+ u8 active_count, recovery_count;
active_count = active_counts[index];
if (active_count == 1)
@@ -497,7 +503,7 @@
* Pack active and recovery counts into single byte representation
* used by controller
*/
-inline static byte pack_nibbles (byte upper, byte lower)
+static inline u8 pack_nibbles (u8 upper, u8 lower)
{
return ((upper & 0x0f) << 4) | (lower & 0x0f);
}
@@ -507,7 +513,7 @@
*/
static void __init retrieve_drive_counts (unsigned int index)
{
- byte b;
+ u8 b;
/*
* Get the internal setup timing, and convert to clock count
@@ -537,9 +543,9 @@
static void program_drive_counts (unsigned int index)
{
unsigned long flags;
- byte setup_count = setup_counts[index];
- byte active_count = active_counts[index];
- byte recovery_count = recovery_counts[index];
+ u8 setup_count = setup_counts[index];
+ u8 active_count = active_counts[index];
+ u8 recovery_count = recovery_counts[index];
/*
* Set up address setup count and drive read/write timing registers.
@@ -589,10 +595,12 @@
/*
* Set a specific pio_mode for a drive
*/
-static void cmd640_set_mode (unsigned int index, byte pio_mode, unsigned int cycle_time, unsigned int active_time, unsigned int setup_time)
+static void cmd640_set_mode (unsigned int index, u8 pio_mode, unsigned int cycle_time, unsigned int active_time, unsigned int setup_time)
{
int recovery_time, clock_time;
- byte setup_count, active_count, recovery_count, recovery_count2, cycle_count;
+ u8 setup_count, active_count;
+ u8 recovery_count, recovery_count2;
+ u8 cycle_count;
recovery_time = cycle_time - (setup_time + active_time);
clock_time = 1000 / system_bus_speed;
@@ -639,16 +647,19 @@
/*
* Drive PIO mode selection:
*/
-static void cmd640_tune_drive (ide_drive_t *drive, byte mode_wanted)
+static void cmd640_tune_drive(struct ata_device *drive, byte mode_wanted)
{
- byte b;
+ u8 b;
struct ata_timing *t;
unsigned int index = 0;
+ unsigned long flags;
+
+ spin_lock_irqsave(&cmd640_lock, flags);
while (drive != cmd_drives[index]) {
if (++index > 3) {
- printk("%s: bad news in cmd640_tune_drive\n", drive->name);
- return;
+ printk(KERN_ERR "%s: bad news in cmd640_tune_drive\n", drive->name);
+ goto out_lock;
}
}
switch (mode_wanted) {
@@ -659,21 +670,21 @@
if (mode_wanted)
b |= 0x27;
put_cmd640_reg(CNTRL, b);
- printk("%s: %sabled cmd640 fast host timing (devsel)\n", drive->name, mode_wanted ? "en" : "dis");
- return;
+ printk(KERN_INFO "%s: %sabled cmd640 fast host timing (devsel)\n", drive->name, mode_wanted ? "en" : "dis");
+ goto out_lock;
case 8: /* set prefetch off */
case 9: /* set prefetch on */
mode_wanted &= 1;
set_prefetch_mode(index, mode_wanted);
printk("%s: %sabled cmd640 prefetch\n", drive->name, mode_wanted ? "en" : "dis");
- return;
+ goto out_lock;
}
if (mode_wanted == 255)
t = ata_timing_data(ata_timing_mode(drive, XFER_PIO | XFER_EPIO));
else
- t = ata_timing_data(XFER_PIO_0 + min_t(byte, mode_wanted, 4));
+ t = ata_timing_data(XFER_PIO_0 + min_t(u8, mode_wanted, 4));
cmd640_set_mode(index, t->mode - XFER_PIO_0, t->cycle, t->active, t->setup);
@@ -681,10 +692,14 @@
drive->name, t->mode, t->cycle);
display_clocks(index);
+
+out_lock:
+ spin_unlock_irqrestore(&cmd640_lock, flags);
+
return;
}
-#endif /* CONFIG_BLK_DEV_CMD640_ENHANCED */
+#endif
/*
* Probe for a cmd640 chipset, and initialize it if found. Called from ide.c
@@ -697,7 +712,7 @@
int second_port_cmd640 = 0;
const char *bus_type, *port2;
unsigned int index;
- byte b, cfr;
+ u8 b, cfr;
if (cmd640_vlb && probe_for_cmd640_vlb()) {
bus_type = "VLB";
@@ -743,7 +758,7 @@
cmd_hwif0->chipset = ide_cmd640;
#ifdef CONFIG_BLK_DEV_CMD640_ENHANCED
cmd_hwif0->tuneproc = &cmd640_tune_drive;
-#endif /* CONFIG_BLK_DEV_CMD640_ENHANCED */
+#endif
/*
* Ensure compatibility by always using the slowest timings
@@ -777,7 +792,7 @@
second_port_cmd640 = 1;
#ifdef CONFIG_BLK_DEV_CMD640_ENHANCED
second_port_toggled = 1;
-#endif /* CONFIG_BLK_DEV_CMD640_ENHANCED */
+#endif
port2 = "enabled";
} else {
put_cmd640_reg(CNTRL, b); /* restore original setting */
@@ -796,7 +811,7 @@
cmd_hwif1->unit = ATA_SECONDARY;
#ifdef CONFIG_BLK_DEV_CMD640_ENHANCED
cmd_hwif1->tuneproc = &cmd640_tune_drive;
-#endif /* CONFIG_BLK_DEV_CMD640_ENHANCED */
+#endif
}
printk("%s: %sserialized, secondary interface %s\n", cmd_hwif1->name,
cmd_hwif0->serialized ? "" : "not ", port2);
@@ -806,7 +821,7 @@
* Do not unnecessarily disturb any prior BIOS setup of these.
*/
for (index = 0; index < (2 + (second_port_cmd640 << 1)); index++) {
- ide_drive_t *drive = cmd_drives[index];
+ struct ata_device *drive = cmd_drives[index];
#ifdef CONFIG_BLK_DEV_CMD640_ENHANCED
if (drive->autotune || ((index > 1) && second_port_toggled)) {
/*
@@ -837,7 +852,7 @@
check_prefetch (index);
printk("cmd640: drive%d timings/prefetch(%s) preserved\n",
index, drive->channel->no_io_32bit ? "off" : "on");
-#endif /* CONFIG_BLK_DEV_CMD640_ENHANCED */
+#endif
}
#ifdef CMD640_DUMP_REGS
diff -urN linux-2.5.15/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.15/drivers/ide/ide.c 2002-05-14 00:50:24.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-14 00:31:15.000000000 +0200
@@ -290,7 +290,7 @@
unsigned long flags;
int ret = 1;
- spin_lock_irqsave(&ide_lock, flags);
+ spin_lock_irqsave(drive->channel->lock, flags);
BUG_ON(!(rq->flags & REQ_STARTED));
@@ -322,7 +322,8 @@
ret = 0;
}
- spin_unlock_irqrestore(&ide_lock, flags);
+ spin_unlock_irqrestore(drive->channel->lock, flags);
+
return ret;
}
@@ -338,18 +339,19 @@
{
unsigned long flags;
struct ata_channel *ch = drive->channel;
- ide_hwgroup_t *hwgroup = ch->hwgroup;
- spin_lock_irqsave(&ide_lock, flags);
- if (hwgroup->handler != NULL) {
+ spin_lock_irqsave(ch->lock, flags);
+
+ if (ch->handler != NULL) {
printk("%s: ide_set_handler: handler not null; old=%p, new=%p, from %p\n",
- drive->name, hwgroup->handler, handler, __builtin_return_address(0));
+ drive->name, ch->handler, handler, __builtin_return_address(0));
}
- hwgroup->handler = handler;
+ ch->handler = handler;
ch->expiry = expiry;
ch->timer.expires = jiffies + timeout;
add_timer(&ch->timer);
- spin_unlock_irqrestore(&ide_lock, flags);
+
+ spin_unlock_irqrestore(ch->lock, flags);
}
static void check_crc_errors(struct ata_device *drive)
@@ -1067,20 +1069,21 @@
ide_startstop_t restart_request(struct ata_device *drive)
{
struct ata_channel *ch = drive->channel;
- ide_hwgroup_t *hwgroup = ch->hwgroup;
unsigned long flags;
- spin_lock_irqsave(&ide_lock, flags);
- hwgroup->handler = NULL;
+ spin_lock_irqsave(ch->lock, flags);
+
+ ch->handler = NULL;
del_timer(&ch->timer);
- spin_unlock_irqrestore(&ide_lock, flags);
+
+ spin_unlock_irqrestore(ch->lock, flags);
return start_request(drive, drive->rq);
}
/*
- * This is used by a drive to give excess bandwidth back to the hwgroup by
- * sleeping for timeout jiffies.
+ * This is used by a drive to give excess bandwidth back by sleeping for
+ * timeout jiffies.
*/
void ide_stall_queue(struct ata_device *drive, unsigned long timeout)
{
@@ -1096,77 +1099,57 @@
static unsigned long longest_sleep(struct ata_channel *channel)
{
unsigned long sleep = 0;
- int i;
-
- for (i = 0; i < MAX_HWIFS; ++i) {
- int unit;
- struct ata_channel *ch = &ide_hwifs[i];
+ int unit;
- if (!ch->present)
- continue;
+ for (unit = 0; unit < MAX_DRIVES; ++unit) {
+ struct ata_device *drive = &channel->drives[unit];
- if (ch->hwgroup != channel->hwgroup)
+ if (!drive->present)
continue;
- for (unit = 0; unit < MAX_DRIVES; ++unit) {
- struct ata_device *drive = &ch->drives[unit];
-
- if (!drive->present)
- continue;
-
- /* This device is sleeping and waiting to be serviced
- * later than any other device we checked thus far.
- */
- if (drive->sleep && (!sleep || time_after(sleep, drive->sleep)))
- sleep = drive->sleep;
- }
+ /* This device is sleeping and waiting to be serviced
+ * later than any other device we checked thus far.
+ */
+ if (drive->sleep && (!sleep || time_after(sleep, drive->sleep)))
+ sleep = drive->sleep;
}
return sleep;
}
/*
- * Select the next device which will be serviced.
+ * Select the next device which will be serviced. This selects only between
+ * devices on the same channel, since everything else will be scheduled on the
+ * queue level.
*/
static struct ata_device *choose_urgent_device(struct ata_channel *channel)
{
struct ata_device *choice = NULL;
unsigned long sleep = 0;
- int i;
+ int unit;
- for (i = 0; i < MAX_HWIFS; ++i) {
- int unit;
- struct ata_channel *ch = &ide_hwifs[i];
+ for (unit = 0; unit < MAX_DRIVES; ++unit) {
+ struct ata_device *drive = &channel->drives[unit];
- if (!ch->present)
+ if (!drive->present)
continue;
- if (ch->hwgroup != channel->hwgroup)
+ /* There are no request pending for this device.
+ */
+ if (list_empty(&drive->queue.queue_head))
continue;
- for (unit = 0; unit < MAX_DRIVES; ++unit) {
- struct ata_device *drive = &ch->drives[unit];
-
- if (!drive->present)
- continue;
-
- /* There are no request pending for this device.
- */
- if (list_empty(&drive->queue.queue_head))
- continue;
-
- /* This device still wants to remain idle.
- */
- if (drive->sleep && time_after(drive->sleep, jiffies))
- continue;
+ /* This device still wants to remain idle.
+ */
+ if (drive->sleep && time_after(drive->sleep, jiffies))
+ continue;
- /* Take this device, if there is no device choosen thus far or
- * it's more urgent.
- */
- if (!choice || (drive->sleep && (!choice->sleep || time_after(choice->sleep, drive->sleep)))) {
- if (!blk_queue_plugged(&drive->queue))
- choice = drive;
- }
+ /* Take this device, if there is no device choosen thus far or
+ * it's more urgent.
+ */
+ if (!choice || (drive->sleep && (!choice->sleep || time_after(choice->sleep, drive->sleep)))) {
+ if (!blk_queue_plugged(&drive->queue))
+ choice = drive;
}
}
@@ -1274,11 +1257,13 @@
if (masked_irq && drive->channel->irq != masked_irq)
disable_irq_nosync(drive->channel->irq);
- spin_unlock(&ide_lock);
+ spin_unlock(drive->channel->lock);
+
ide__sti(); /* allow other IRQs while we start this request */
startstop = start_request(drive, rq);
- spin_lock_irq(&ide_lock);
+ spin_lock_irq(drive->channel->lock);
+
if (masked_irq && drive->channel->irq != masked_irq)
enable_irq(drive->channel->irq);
@@ -1301,38 +1286,12 @@
/*
* Issue a new request.
- * Caller must have already done spin_lock_irqsave(&ide_lock, ...)
- *
- * A hwgroup is a serialized group of IDE interfaces. Usually there is
- * exactly one hwif (interface) per hwgroup, but buggy controllers (eg. CMD640)
- * may have both interfaces in a single hwgroup to "serialize" access.
- * Or possibly multiple ISA interfaces can share a common IRQ by being grouped
- * together into one hwgroup for serialized access.
- *
- * Note also that several hwgroups can end up sharing a single IRQ,
- * possibly along with many other devices. This is especially common in
- * PCI-based systems with off-board IDE controller cards.
- *
- * The IDE driver uses the queue spinlock to protect access to the request
- * queues.
- *
- * The first thread into the driver for a particular hwgroup sets the
- * hwgroup->flags IDE_BUSY flag to indicate that this hwgroup is now active,
- * and then initiates processing of the top request from the request queue.
- *
- * Other threads attempting entry notice the busy setting, and will simply
- * queue their new requests and exit immediately. Note that hwgroup->flags
- * remains busy even when the driver is merely awaiting the next interrupt.
- * Thus, the meaning is "this hwgroup is busy processing a request".
- *
- * When processing of a request completes, the completing thread or IRQ-handler
- * will start the next request from the queue. If no more work remains,
- * the driver will clear the hwgroup->flags IDE_BUSY flag and exit.
+ * Caller must have already done spin_lock_irqsave(channel->lock, ...)
*/
static void ide_do_request(struct ata_channel *channel, int masked_irq)
{
ide_get_lock(&irq_lock, ata_irq_request, hwgroup);/* for atari only: POSSIBLY BROKEN HERE(?) */
- __cli(); /* necessary paranoia: ensure IRQs are masked on local CPU */
+// __cli(); /* necessary paranoia: ensure IRQs are masked on local CPU */
while (!test_and_set_bit(IDE_BUSY, &channel->active)) {
struct ata_channel *ch;
@@ -1362,6 +1321,7 @@
queue_commands(drive, masked_irq);
}
+
}
void do_ide_request(request_queue_t *q)
@@ -1413,7 +1373,6 @@
void ide_timer_expiry(unsigned long data)
{
struct ata_channel *ch = (struct ata_channel *) data;
- ide_hwgroup_t *hwgroup = ch->hwgroup;
ata_handler_t *handler;
ata_expiry_t *expiry;
unsigned long flags;
@@ -1424,10 +1383,11 @@
* worth mentioning.
*/
- spin_lock_irqsave(&ide_lock, flags);
+ spin_lock_irqsave(ch->lock, flags);
del_timer(&ch->timer);
- if ((handler = hwgroup->handler) == NULL) {
+ handler = ch->handler;
+ if (!handler) {
/*
* Either a marginal timeout occurred (got the interrupt just
@@ -1441,31 +1401,35 @@
} else {
struct ata_device *drive = ch->drive;
if (!drive) {
- printk("ide_timer_expiry: hwgroup->drive was NULL\n");
- hwgroup->handler = NULL;
+ printk(KERN_ERR "ide_timer_expiry: IRQ handler was NULL\n");
+ ch->handler = NULL;
} else {
ide_startstop_t startstop;
/* paranoia */
if (!test_and_set_bit(IDE_BUSY, &ch->active))
- printk("%s: ide_timer_expiry: hwgroup was not busy??\n", drive->name);
+ printk(KERN_ERR "%s: ide_timer_expiry: IRQ handler was not busy??\n", drive->name);
if ((expiry = ch->expiry) != NULL) {
/* continue */
if ((wait = expiry(drive, drive->rq)) != 0) {
/* reengage timer */
ch->timer.expires = jiffies + wait;
add_timer(&ch->timer);
- spin_unlock_irqrestore(&ide_lock, flags);
+
+ spin_unlock_irqrestore(ch->lock, flags);
+
return;
}
}
- hwgroup->handler = NULL;
+ ch->handler = NULL;
/*
* We need to simulate a real interrupt when invoking
* the handler() function, which means we need to globally
* mask the specific IRQ:
*/
- spin_unlock(&ide_lock);
+
+ spin_unlock(ch->lock);
+
ch = drive->channel;
#if DISABLE_IRQ_NOSYNC
disable_irq_nosync(ch->irq);
@@ -1490,7 +1454,9 @@
}
set_recovery_timer(ch);
enable_irq(ch->irq);
- spin_lock_irq(&ide_lock);
+
+ spin_lock_irq(ch->lock);
+
if (startstop == ide_stopped)
clear_bit(IDE_BUSY, &ch->active);
}
@@ -1498,7 +1464,7 @@
ide_do_request(ch->drive->channel, 0);
- spin_unlock_irqrestore(&ide_lock, flags);
+ spin_unlock_irqrestore(ch->lock, flags);
}
/*
@@ -1560,14 +1526,12 @@
void ata_irq_request(int irq, void *data, struct pt_regs *regs)
{
struct ata_channel *ch = data;
- ide_hwgroup_t *hwgroup = ch->hwgroup;
-
unsigned long flags;
struct ata_device *drive;
- ata_handler_t *handler = hwgroup->handler;
+ ata_handler_t *handler = ch->handler;
ide_startstop_t startstop;
- spin_lock_irqsave(&ide_lock, flags);
+ spin_lock_irqsave(ch->lock, flags);
if (!ide_ack_intr(ch))
goto out_lock;
@@ -1619,16 +1583,17 @@
/* paranoia */
if (!test_and_set_bit(IDE_BUSY, &ch->active))
printk(KERN_ERR "%s: %s: hwgroup was not busy!?\n", drive->name, __FUNCTION__);
- hwgroup->handler = NULL;
+ ch->handler = NULL;
del_timer(&ch->timer);
- spin_unlock(&ide_lock);
+
+ spin_unlock(ch->lock);
if (ch->unmask)
ide__sti(); /* local CPU only */
/* service this interrupt, may set handler for next interrupt */
startstop = handler(drive, drive->rq);
- spin_lock_irq(&ide_lock);
+ spin_lock_irq(ch->lock);
/*
* Note that handler() may have set things up for another
@@ -1639,7 +1604,7 @@
*/
set_recovery_timer(drive->channel);
if (startstop == ide_stopped) {
- if (hwgroup->handler == NULL) { /* paranoia */
+ if (!ch->handler) { /* paranoia */
clear_bit(IDE_BUSY, &ch->active);
ide_do_request(ch, ch->irq);
} else {
@@ -1649,7 +1614,7 @@
queue_commands(drive, ch->irq);
out_lock:
- spin_unlock_irqrestore(&ide_lock, flags);
+ spin_unlock_irqrestore(ch->lock, flags);
}
/*
@@ -1725,7 +1690,9 @@
rq->rq_dev = mk_kdev(major,(drive->select.b.unit)<<PARTN_BITS);
if (action == ide_wait)
rq->waiting = &wait;
- spin_lock_irqsave(&ide_lock, flags);
+
+ spin_lock_irqsave(drive->channel->lock, flags);
+
if (blk_queue_empty(&drive->queue) || action == ide_preempt) {
if (action == ide_preempt)
drive->rq = NULL;
@@ -1737,7 +1704,9 @@
}
q->elevator.elevator_add_req_fn(q, rq, queue_head);
ide_do_request(drive->channel, 0);
- spin_unlock_irqrestore(&ide_lock, flags);
+
+ spin_unlock_irqrestore(drive->channel->lock, flags);
+
if (action == ide_wait) {
wait_for_completion(&wait); /* wait for it to be serviced */
return rq->errors ? -EIO : 0; /* return -EIO if errors */
@@ -1764,15 +1733,19 @@
if ((drive = get_info_ptr(i_rdev)) == NULL)
return -ENODEV;
+ /* FIXME: The locking here doesn't make the slightest sense! */
spin_lock_irqsave(&ide_lock, flags);
if (drive->busy || (drive->usage > 1)) {
+
spin_unlock_irqrestore(&ide_lock, flags);
+
return -EBUSY;
}
drive->busy = 1;
MOD_INC_USE_COUNT;
+
spin_unlock_irqrestore(&ide_lock, flags);
res = wipe_partitions(i_rdev);
@@ -1789,6 +1762,7 @@
drive->busy = 0;
wake_up(&drive->wqueue);
MOD_DEC_USE_COUNT;
+
return res;
}
@@ -1912,7 +1886,7 @@
{
struct gendisk *gd;
struct ata_device *d;
- ide_hwgroup_t *hwgroup;
+ spinlock_t *lock;
int unit;
int i;
unsigned long flags;
@@ -1950,6 +1924,7 @@
* All clear? Then blow away the buffer cache
*/
spin_unlock_irqrestore(&ide_lock, flags);
+
for (unit = 0; unit < MAX_DRIVES; ++unit) {
struct ata_device * drive = &ch->drives[unit];
@@ -1964,6 +1939,7 @@
}
}
}
+
spin_lock_irqsave(&ide_lock, flags);
/*
@@ -1987,10 +1963,10 @@
#endif
/*
- * Remove us from the hwgroup.
+ * Remove us from the lock group.
*/
- hwgroup = ch->hwgroup;
+ lock = ch->lock;
d = ch->drive;
for (i = 0; i < MAX_DRIVES; ++i) {
struct ata_device *drive = &ch->drives[i];
@@ -2020,7 +1996,7 @@
/*
* Free the irq if we were the only channel using it.
*
- * Free the hwgroup if we were the only member.
+ * Free the lock group if we were the only member.
*/
n_irq = n_ch = 0;
for (i = 0; i < MAX_HWIFS; ++i) {
@@ -2031,14 +2007,14 @@
if (tmp->irq == ch->irq)
++n_irq;
- if (tmp->hwgroup == ch->hwgroup)
+ if (tmp->lock == ch->lock)
++n_ch;
}
if (n_irq == 1)
- free_irq(ch->irq, ch->hwgroup);
+ free_irq(ch->irq, ch);
if (n_ch == 1) {
- kfree(ch->hwgroup);
- ch->hwgroup = NULL;
+ kfree(ch->lock);
+ ch->lock = NULL;
}
#if defined(CONFIG_BLK_DEV_IDEDMA) && !defined(CONFIG_DMA_NONPCI)
@@ -2072,7 +2048,7 @@
old = *ch;
init_hwif_data(ch, ch->index);
- ch->hwgroup = old.hwgroup;
+ ch->lock = old.lock;
ch->tuneproc = old.tuneproc;
ch->speedproc = old.speedproc;
ch->selectproc = old.selectproc;
@@ -2218,15 +2194,18 @@
unsigned long timeout = jiffies + (10 * HZ);
- spin_lock_irq(&ide_lock);
+ spin_lock_irq(drive->channel->lock);
while (test_bit(IDE_BUSY, &drive->channel->active)) {
- spin_unlock_irq(&ide_lock);
+
+ spin_unlock_irq(drive->channel->lock);
+
if (time_after(jiffies, timeout)) {
printk("%s: channel busy\n", drive->name);
return -EBUSY;
}
- spin_lock_irq(&ide_lock);
+
+ spin_lock_irq(drive->channel->lock);
}
return 0;
@@ -2260,12 +2239,6 @@
if (!drive->channel->tuneproc)
return -ENOSYS;
- /* FIXME: This is very much the same kind of problem as we have with
- * set_mutlmode() see for a edscription there.
- */
- if (HWGROUP(drive)->handler)
- return -EBUSY;
-
if (drive->channel->tuneproc != NULL)
drive->channel->tuneproc(drive, (u8) arg);
@@ -2334,14 +2307,14 @@
case HDIO_SET_32BIT: {
int val;
- if (arg < 0 || arg > 1 + (SUPPORT_VLB_SYNC << 1))
+ if (arg < 0 || arg > 1)
return -EINVAL;
if (ide_spin_wait_hwgroup(drive))
return -EBUSY;
val = set_io_32bit(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
@@ -2356,7 +2329,7 @@
return -EBUSY;
val = set_pio_mode(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
@@ -2382,7 +2355,7 @@
return -EBUSY;
drive->channel->unmask = arg;
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return 0;
}
@@ -2406,7 +2379,7 @@
return -EBUSY;
val = set_using_dma(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
@@ -3455,7 +3428,7 @@
#if defined(CONFIG_BLK_DEV_IDE) || defined(CONFIG_BLK_DEV_IDE_MODULE)
# if defined(__mc68000__) || defined(CONFIG_APUS)
if (ide_hwifs[0].io_ports[IDE_DATA_OFFSET]) {
- ide_get_lock(&irq_lock, NULL, NULL);/* for atari only */
+ // ide_get_lock(&irq_lock, NULL, NULL);/* for atari only */
disable_irq(ide_hwifs[0].irq); /* disable_irq_nosync ?? */
// disable_irq_nosync(ide_hwifs[0].irq);
}
diff -urN linux-2.5.15/drivers/ide/ide-disk.c linux/drivers/ide/ide-disk.c
--- linux-2.5.15/drivers/ide/ide-disk.c 2002-05-14 00:50:24.000000000 +0200
+++ linux/drivers/ide/ide-disk.c 2002-05-13 23:26:11.000000000 +0200
@@ -316,7 +316,7 @@
unsigned long flags;
int ret;
- spin_lock_irqsave(&ide_lock, flags);
+ spin_lock_irqsave(drive->channel->lock, flags);
ret = blk_queue_start_tag(&drive->queue, rq);
@@ -325,7 +325,7 @@
if (ata_pending_commands(drive) > drive->max_last_depth)
drive->max_last_depth = ata_pending_commands(drive);
- spin_unlock_irqrestore(&ide_lock, flags);
+ spin_unlock_irqrestore(drive->channel->lock, flags);
if (ret) {
BUG_ON(!ata_pending_commands(drive));
@@ -438,13 +438,6 @@
if (!drive->id)
return -EIO;
- /* FIXME: Hmm... just bailing out my be problematic, since there *is*
- * activity during boot. For now the same problem persists in
- * set_pio_mode() we will have to do something about it soon.
- */
- if (HWGROUP(drive)->handler)
- return -EBUSY;
-
if (arg > drive->id->max_multsect)
arg = drive->id->max_multsect;
@@ -466,9 +459,6 @@
static int set_nowerr(struct ata_device *drive, int arg)
{
- if (HWGROUP(drive)->handler)
- return -EBUSY;
-
drive->nowerr = arg;
drive->bad_wstat = arg ? BAD_R_STAT : BAD_W_STAT;
@@ -576,8 +566,8 @@
return 0;
/* wait until all commands are finished */
- printk("ide_disk_suspend()\n");
- while (HWGROUP(drive)->handler)
+ /* FIXME: waiting for spinlocks should be done instead. */
+ while (drive->channel->handler)
yield();
/* set the drive to standby */
@@ -1022,7 +1012,7 @@
return -EBUSY;
val = set_lba_addressing(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
@@ -1048,7 +1038,7 @@
return -EBUSY;
val = set_multcount(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
@@ -1072,7 +1062,7 @@
return -EBUSY;
val = set_nowerr(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
@@ -1096,7 +1086,7 @@
return -EBUSY;
val = write_cache(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
@@ -1119,7 +1109,7 @@
return -EBUSY;
val = set_acoustic(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
@@ -1144,7 +1134,7 @@
return -EBUSY;
val = set_using_tcq(drive, arg);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
return val;
}
diff -urN linux-2.5.15/drivers/ide/ide-pci.c linux/drivers/ide/ide-pci.c
--- linux-2.5.15/drivers/ide/ide-pci.c 2002-05-14 00:50:24.000000000 +0200
+++ linux/drivers/ide/ide-pci.c 2002-05-13 21:08:19.000000000 +0200
@@ -553,15 +553,14 @@
}
}
}
-
- printk("ATA: %s: controller on PCI bus %02x dev %02x\n",
- dev->name, dev->bus->number, dev->devfn);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s dev %02x\n",
+ dev->name, dev->slot_name, dev->devfn);
setup_pci_device(dev, d);
if (!dev2)
return;
d2 = d;
- printk("ATA: %s: controller on PCI bus %02x dev %02x\n",
- dev2->name, dev2->bus->number, dev2->devfn);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s dev %02x\n",
+ dev2->name, dev2->slot_name, dev2->devfn);
setup_pci_device(dev2, d2);
}
@@ -584,8 +583,8 @@
}
}
- printk("%s: IDE controller on PCI bus %02x dev %02x\n",
- dev->name, dev->bus->number, dev->devfn);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s dev %02x\n",
+ dev->name, dev->slot_name, dev->devfn);
setup_pci_device(dev, d);
if (!dev2) {
return;
@@ -601,8 +600,8 @@
}
}
d2 = d;
- printk("%s: IDE controller on PCI bus %02x dev %02x\n",
- dev2->name, dev2->bus->number, dev2->devfn);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s dev %02x\n",
+ dev2->name, dev2->slot_name, dev2->devfn);
setup_pci_device(dev2, d2);
}
@@ -623,7 +622,7 @@
switch(class_rev) {
case 5:
case 4:
- case 3: printk("%s: IDE controller on PCI slot %s\n", dev->name, dev->slot_name);
+ case 3: printk(KERN_INFO "ATA: %s: controller on PCI slot %s\n", dev->name, dev->slot_name);
setup_pci_device(dev, d);
return;
default: break;
@@ -639,17 +638,17 @@
pci_read_config_byte(dev2, PCI_INTERRUPT_PIN, &pin2);
if ((pin1 != pin2) && (dev->irq == dev2->irq)) {
d->bootable = ON_BOARD;
- printk("%s: onboard version of chipset, pin1=%d pin2=%d\n", dev->name, pin1, pin2);
+ printk(KERN_INFO "ATA: %s: onboard version of chipset, pin1=%d pin2=%d\n", dev->name, pin1, pin2);
}
break;
}
}
- printk("%s: IDE controller on PCI slot %s\n", dev->name, dev->slot_name);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s\n", dev->name, dev->slot_name);
setup_pci_device(dev, d);
if (!dev2)
return;
d2 = d;
- printk("%s: IDE controller on PCI slot %s\n", dev2->name, dev2->slot_name);
+ printk(KERN_INFO "ATA: %s: controller on PCI slot %s\n", dev2->name, dev2->slot_name);
setup_pci_device(dev2, d2);
}
@@ -679,6 +678,10 @@
}
if (!d) {
+ /* Only check the device class if it wasn't listed, since
+ * there are in particular some pdc202xx chips which "work around"
+ * being grabbed by generic drivers.
+ */
if ((dev->class >> 8) == PCI_CLASS_STORAGE_IDE) {
printk(KERN_INFO "ATA: unknown interface: %s, on PCI slot %s\n",
dev->name, dev->slot_name);
diff -urN linux-2.5.15/drivers/ide/ide-pmac.c linux/drivers/ide/ide-pmac.c
--- linux-2.5.15/drivers/ide/ide-pmac.c 2002-05-10 00:25:18.000000000 +0200
+++ linux/drivers/ide/ide-pmac.c 2002-05-13 23:13:55.000000000 +0200
@@ -15,12 +15,14 @@
* Some code taken from drivers/ide/ide-dma.c:
*
* Copyright (c) 1995-1998 Mark Lord
- *
+ *
* TODO:
- *
+ *
* - Find a way to duplicate less code with ide-dma and use the
* dma fileds in the hwif structure instead of our own
+ *
* - Fix check_disk_change() call
+ *
* - Make module-able (includes setting ppc_md. hooks from within
* this file and not from arch code, and handling module deps with
* mediabay (by having both modules do dynamic lookup of each other
@@ -111,7 +113,7 @@
/* 66Mhz cell, found in KeyLargo. Can do ultra mode 0 to 2 on
* 40 connector cable and to 4 on 80 connector one.
* Clock unit is 15ns (66Mhz)
- *
+ *
* 3 Values can be programmed:
* - Write data setup, which appears to match the cycle time. They
* also call it DIOW setup.
@@ -146,7 +148,7 @@
/* 33Mhz cell, found in OHare, Heathrow (& Paddington) and KeyLargo
* Can do pio & mdma modes, clock unit is 30ns (33Mhz)
- *
+ *
* The access time and recovery time can be programmed. Some older
* Darwin code base limit OHare to 150ns cycle time. I decided to do
* the same here fore safety against broken old hardware ;)
@@ -180,7 +182,7 @@
# define GOOD_DMA_DRIVE 1
/* Rounded Multiword DMA timings
- *
+ *
* I gave up finding a generic formula for all controller
* types and instead, built tables based on timing values
* used by Apple in Darwin's implementation.
@@ -280,7 +282,7 @@
#endif /* CONFIG_PMAC_PBOOK */
static int __pmac
-pmac_ide_find(ide_drive_t *drive)
+pmac_ide_find(struct ata_device *drive)
{
struct ata_channel *hwif = drive->channel;
ide_ioreg_t base;
@@ -352,7 +354,7 @@
* device for yaboot configuration
*/
struct device_node*
-pmac_ide_get_devnode(ide_drive_t *drive)
+pmac_ide_get_devnode(struct ata_device *drive)
{
int i = pmac_ide_find(drive);
if (i < 0)
@@ -365,7 +367,7 @@
* is enough, I beleive selectproc will be called whenever an IDE command is started,
* but... */
static void __pmac
-pmac_ide_selectproc(ide_drive_t *drive)
+pmac_ide_selectproc(struct ata_device *drive)
{
int i = pmac_ide_find(drive);
if (i < 0)
@@ -387,11 +389,11 @@
* almost identical to the generic one and works, I've not yet
* managed to figure out what bit is causing the lockup in the
* generic code, possibly a timing issue...
- *
+ *
* --BenH
*/
static int __pmac
-wait_for_ready(ide_drive_t *drive)
+wait_for_ready(struct ata_device *drive)
{
/* Timeout bumped for some powerbooks */
int timeout = 2000;
@@ -417,12 +419,12 @@
}
static int __pmac
-pmac_ide_do_setfeature(ide_drive_t *drive, byte command)
+pmac_ide_do_setfeature(struct ata_device *drive, byte command)
{
int result = 1;
unsigned long flags;
struct ata_channel *hwif = drive->channel;
-
+
disable_irq(hwif->irq); /* disable_irq_nosync ?? */
udelay(1);
SELECT_DRIVE(drive->channel, drive);
@@ -477,7 +479,7 @@
/* Calculate PIO timings */
static void __pmac
-pmac_ide_tuneproc(ide_drive_t *drive, byte pio)
+pmac_ide_tuneproc(struct ata_device *drive, byte pio)
{
struct ata_timing *t;
int i;
@@ -536,8 +538,8 @@
#ifdef IDE_PMAC_DEBUG
printk(KERN_ERR "ide_pmac: Set PIO timing for mode %d, reg: 0x%08x\n",
pio, *timings);
-#endif
-
+#endif
+
if (drive->select.all == IN_BYTE(IDE_SELECT_REG))
pmac_ide_selectproc(drive);
}
@@ -553,14 +555,14 @@
addrTicks = SYSCLK_TICKS_66(udma_timings[speed & 0xf].addrSetup);
*timings = ((*timings) & ~(TR_66_UDMA_MASK | TR_66_MDMA_MASK)) |
- (wrDataSetupTicks << TR_66_UDMA_WRDATASETUP_SHIFT) |
+ (wrDataSetupTicks << TR_66_UDMA_WRDATASETUP_SHIFT) |
(rdyToPauseTicks << TR_66_UDMA_RDY2PAUS_SHIFT) |
(addrTicks <<TR_66_UDMA_ADDRSETUP_SHIFT) |
TR_66_UDMA_EN;
#ifdef IDE_PMAC_DEBUG
printk(KERN_ERR "ide_pmac: Set UDMA timing for mode %d, reg: 0x%08x\n",
speed & 0xf, *timings);
-#endif
+#endif
return 0;
}
@@ -584,7 +586,7 @@
/* Adjust for drive */
if (drive_cycle_time && drive_cycle_time > cycleTime)
cycleTime = drive_cycle_time;
- /* OHare limits according to some old Apple sources */
+ /* OHare limits according to some old Apple sources */
if ((intf_type == controller_ohare) && (cycleTime < 150))
cycleTime = 150;
/* Get the proper timing array for this controller */
@@ -616,7 +618,7 @@
#ifdef IDE_PMAC_DEBUG
printk(KERN_ERR "ide_pmac: MDMA, cycleTime: %d, accessTime: %d, recTime: %d\n",
cycleTime, accessTime, recTime);
-#endif
+#endif
if (intf_type == controller_kl_ata4 || intf_type == controller_kl_ata4_80) {
/* 66Mhz cell */
accessTicks = SYSCLK_TICKS_66(accessTime);
@@ -646,7 +648,7 @@
int halfTick = 0;
int origAccessTime = accessTime;
int origRecTime = recTime;
-
+
accessTicks = SYSCLK_TICKS(accessTime);
accessTicks = max(accessTicks, 1U);
accessTicks = min(accessTicks, 0x1fU);
@@ -658,7 +660,7 @@
if ((accessTicks > 1) &&
((accessTime - IDE_SYSCLK_NS/2) >= origAccessTime) &&
((recTime - IDE_SYSCLK_NS/2) >= origRecTime)) {
- halfTick = 1;
+ halfTick = 1;
accessTicks--;
}
*timings = ((*timings) & ~TR_33_MDMA_MASK) |
@@ -667,19 +669,19 @@
if (halfTick)
*timings |= TR_33_MDMA_HALFTICK;
}
-#ifdef IDE_PMAC_DEBUG
+# ifdef IDE_PMAC_DEBUG
printk(KERN_ERR "ide_pmac: Set MDMA timing for mode %d, reg: 0x%08x\n",
speed & 0xf, *timings);
-#endif
+# endif
return 0;
}
-#endif /* #ifdef CONFIG_BLK_DEV_IDEDMA_PMAC */
+#endif
/* You may notice we don't use this function on normal operation,
* our, normal mdma function is supposed to be more precise
*/
static int __pmac
-pmac_ide_tune_chipset (ide_drive_t *drive, byte speed)
+pmac_ide_tune_chipset (struct ata_device *drive, byte speed)
{
int intf = pmac_ide_find(drive);
int unit = (drive->select.b.unit & 0x01);
@@ -688,21 +690,21 @@
if (intf < 0)
return 1;
-
+
timings = &pmac_ide[intf].timings[unit];
-
+
switch(speed) {
#ifdef CONFIG_BLK_DEV_IDEDMA_PMAC
case XFER_UDMA_4:
case XFER_UDMA_3:
if (pmac_ide[intf].kind != controller_kl_ata4_80)
- return 1;
+ return 1;
case XFER_UDMA_2:
case XFER_UDMA_1:
case XFER_UDMA_0:
if (pmac_ide[intf].kind != controller_kl_ata4 &&
pmac_ide[intf].kind != controller_kl_ata4_80)
- return 1;
+ return 1;
ret = set_timings_udma(timings, speed);
break;
case XFER_MW_DMA_2:
@@ -714,7 +716,7 @@
case XFER_SW_DMA_1:
case XFER_SW_DMA_0:
return 1;
-#endif /* CONFIG_BLK_DEV_IDEDMA_PMAC */
+#endif
case XFER_PIO_4:
case XFER_PIO_3:
case XFER_PIO_2:
@@ -731,8 +733,8 @@
ret = pmac_ide_do_setfeature(drive, speed);
if (ret)
return ret;
-
- pmac_ide_selectproc(drive);
+
+ pmac_ide_selectproc(drive);
drive->current_speed = speed;
return 0;
@@ -742,7 +744,7 @@
sanitize_timings(int i)
{
unsigned value;
-
+
switch(pmac_ide[i].kind) {
case controller_kl_ata4:
case controller_kl_ata4_80:
@@ -770,8 +772,8 @@
pmac_ide_check_base(ide_ioreg_t base)
{
int ix;
-
- for (ix = 0; ix < MAX_HWIFS; ++ix)
+
+ for (ix = 0; ix < MAX_HWIFS; ++ix)
if (base == pmac_ide[ix].regbase)
return ix;
return -1;
@@ -794,7 +796,7 @@
pmac_find_ide_boot(char *bootdevice, int n)
{
int i;
-
+
/*
* Look through the list of IDE interfaces for this one.
*/
@@ -857,7 +859,7 @@
int in_bay = 0;
u8 pbus, pid;
struct pci_dev *pdev = NULL;
-
+
/*
* If this node is not under a mac-io or dbdma node,
* leave it to the generic PCI driver.
@@ -883,7 +885,7 @@
if (pdev == NULL)
printk(KERN_WARNING "ide: no PCI host for device %s, DMA disabled\n",
np->full_name);
-
+
/*
* If this slot is taken (e.g. by ide-pci.c) try the next one.
*/
@@ -950,7 +952,7 @@
&& strcasecmp(np->parent->name, "media-bay") == 0) {
#ifdef CONFIG_PMAC_PBOOK
media_bay_set_ide_infos(np->parent,base,irq,i);
-#endif /* CONFIG_PMAC_PBOOK */
+#endif
in_bay = 1;
if (!bidp)
pmif->aapl_bus_id = 1;
@@ -961,7 +963,7 @@
*/
ppc_md.feature_call(PMAC_FTR_IDE_ENABLE, np, 0, 1);
} else {
- /* This is necessary to enable IDE when net-booting */
+ /* This is necessary to enable IDE when net-booting */
printk(KERN_INFO "pmac_ide: enabling IDE bus ID %d\n",
pmif->aapl_bus_id);
ppc_md.feature_call(PMAC_FTR_IDE_RESET, np, pmif->aapl_bus_id, 1);
@@ -981,14 +983,14 @@
#ifdef CONFIG_PMAC_PBOOK
if (in_bay && check_media_bay_by_base(base, MB_CD) == 0)
hwif->noprobe = 0;
-#endif /* CONFIG_PMAC_PBOOK */
+#endif
#ifdef CONFIG_BLK_DEV_IDEDMA_PMAC
if (pdev && np->n_addrs >= 2) {
/* has a DBDMA controller channel */
pmac_ide_setup_dma(np, i);
}
-#endif /* CONFIG_BLK_DEV_IDEDMA_PMAC */
+#endif
++i;
}
@@ -998,12 +1000,12 @@
#ifdef CONFIG_PMAC_PBOOK
pmu_register_sleep_notifier(&idepmac_sleep_notifier);
-#endif /* CONFIG_PMAC_PBOOK */
+#endif
}
#ifdef CONFIG_BLK_DEV_IDEDMA_PMAC
-static void __init
+static void __init
pmac_ide_setup_dma(struct device_node *np, int ix)
{
struct pmac_ide_hwif *pmif = &pmac_ide[ix];
@@ -1037,7 +1039,7 @@
if (pmif->sg_table == NULL) {
pci_free_consistent( ide_hwifs[ix].pci_dev,
(MAX_DCMDS + 2) * sizeof(struct dbdma_cmd),
- pmif->dma_table_cpu, pmif->dma_table_dma);
+ pmif->dma_table_cpu, pmif->dma_table_dma);
return;
}
ide_hwifs[ix].udma_enable = pmac_udma_enable;
@@ -1230,7 +1232,7 @@
/* Calculate MultiWord DMA timings */
static int __pmac
-pmac_ide_mdma_enable(ide_drive_t *drive, int idx)
+pmac_ide_mdma_enable(struct ata_device *drive, int idx)
{
byte bits = drive->id->dma_mword & 0x07;
byte feature = dma_bits_to_command(bits);
@@ -1240,16 +1242,16 @@
int ret;
/* Set feature on drive */
- printk(KERN_INFO "%s: Enabling MultiWord DMA %d\n", drive->name, feature & 0xf);
+ printk(KERN_INFO "%s: Enabling MultiWord DMA %d\n", drive->name, feature & 0xf);
ret = pmac_ide_do_setfeature(drive, feature);
if (ret) {
- printk(KERN_WARNING "%s: Failed !\n", drive->name);
- return 0;
+ printk(KERN_WARNING "%s: Failed !\n", drive->name);
+ return 0;
}
if (!drive->init_speed)
drive->init_speed = feature;
-
+
/* which drive is it ? */
if (drive->select.b.unit & 0x01)
timings = &pmac_ide[idx].timings[1];
@@ -1265,13 +1267,13 @@
/* Calculate controller timings */
set_timings_mdma(pmac_ide[idx].kind, timings, feature, drive_cycle_time);
- drive->current_speed = feature;
+ drive->current_speed = feature;
return 1;
}
/* Calculate Ultra DMA timings */
static int __pmac
-pmac_ide_udma_enable(ide_drive_t *drive, int idx, int high_speed)
+pmac_ide_udma_enable(struct ata_device *drive, int idx, int high_speed)
{
byte bits = drive->id->dma_ultra & 0x1f;
byte feature = udma_bits_to_command(bits, high_speed);
@@ -1279,7 +1281,7 @@
int ret;
/* Set feature on drive */
- printk(KERN_INFO "%s: Enabling Ultra DMA %d\n", drive->name, feature & 0xf);
+ printk(KERN_INFO "%s: Enabling Ultra DMA %d\n", drive->name, feature & 0xf);
ret = pmac_ide_do_setfeature(drive, feature);
if (ret) {
printk(KERN_WARNING "%s: Failed !\n", drive->name);
@@ -1297,12 +1299,12 @@
set_timings_udma(timings, feature);
- drive->current_speed = feature;
+ drive->current_speed = feature;
return 1;
}
static int __pmac
-pmac_ide_check_dma(ide_drive_t *drive)
+pmac_ide_check_dma(struct ata_device *drive)
{
int ata4, udma, idx;
struct hd_driveid *id = drive->id;
@@ -1343,7 +1345,7 @@
return 0;
}
-static void ide_toggle_bounce(ide_drive_t *drive, int on)
+static void ide_toggle_bounce(struct ata_device *drive, int on)
{
dma64_addr_t addr = BLK_BOUNCE_HIGH;
@@ -1432,7 +1434,7 @@
/* Apple adds 60ns to wrDataSetup on reads */
if (ata4 && (pmac_ide[ix].timings[unit] & TR_66_UDMA_EN)) {
out_le32((unsigned *)(IDE_DATA_REG + IDE_TIMING_CONFIG + _IO_BASE),
- pmac_ide[ix].timings[unit] +
+ pmac_ide[ix].timings[unit] +
((reading) ? 0x00800000UL : 0));
(void)in_le32((unsigned *)(IDE_DATA_REG + IDE_TIMING_CONFIG + _IO_BASE));
}
@@ -1530,7 +1532,7 @@
}
#endif
-static void idepmac_sleep_device(ide_drive_t *drive, int i, unsigned base)
+static void idepmac_sleep_device(struct ata_device *drive, int i, unsigned base)
{
int j;
@@ -1568,7 +1570,7 @@
#ifdef CONFIG_PMAC_PBOOK
static void __pmac
-idepmac_wake_device(ide_drive_t *drive, int used_dma)
+idepmac_wake_device(struct ata_device *drive, int used_dma)
{
/* We force the IDE subdriver to check for a media change
* This must be done first or we may lost the condition
@@ -1581,7 +1583,7 @@
/* We kick the VFS too (see fix in ide.c revalidate) */
check_disk_change(mk_kdev(drive->channel->major, (drive->select.b.unit) << PARTN_BITS));
-
+
#ifdef CONFIG_BLK_DEV_IDEDMA_PMAC
/* We re-enable DMA on the drive if it was active. */
/* This doesn't work with the CD-ROM in the media-bay, probably
@@ -1590,12 +1592,12 @@
*/
if (used_dma && !ide_spin_wait_hwgroup(drive)) {
/* Lock HW group */
- set_bit(IDE_BUSY, &HWGROUP(drive)->flags);
+ set_bit(IDE_BUSY, &drive->channel->active);
pmac_ide_check_dma(drive);
- clear_bit(IDE_BUSY, &HWGROUP(drive)->flags);
- spin_unlock_irq(&ide_lock);
+ clear_bit(IDE_BUSY, &drive->channel->active);
+ spin_unlock_irq(drive->channel->lock);
}
-#endif /* CONFIG_BLK_DEV_IDEDMA_PMAC */
+#endif
}
static void __pmac
@@ -1606,11 +1608,11 @@
/* We clear the timings */
pmac_ide[i].timings[0] = 0;
pmac_ide[i].timings[1] = 0;
-
+
/* The media bay will handle itself just fine */
if (mediabay)
return;
-
+
/* Disable the bus */
ppc_md.feature_call(PMAC_FTR_IDE_ENABLE, np, pmac_ide[i].aapl_bus_id, 0);
}
@@ -1630,7 +1632,7 @@
}
static void
-idepmac_sleep_drive(ide_drive_t *drive, int idx, unsigned long base)
+idepmac_sleep_drive(struct ata_device *drive, int idx, unsigned long base)
{
/* Wait for HW group to complete operations */
if (ide_spin_wait_hwgroup(drive))
@@ -1639,22 +1641,22 @@
return;
else {
/* Lock HW group */
- set_bit(IDE_BUSY, &HWGROUP(drive)->flags);
+ set_bit(IDE_BUSY, &drive->channel->active);
/* Stop the device */
idepmac_sleep_device(drive, idx, base);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
}
}
static void
-idepmac_wake_drive(ide_drive_t *drive, unsigned long base)
+idepmac_wake_drive(struct ata_device *drive, unsigned long base)
{
int j;
-
+
/* Reset timings */
pmac_ide_selectproc(drive);
mdelay(10);
-
+
/* Wait up to 20 seconds for the drive to be ready */
for (j = 0; j < 200; j++) {
int status;
@@ -1669,7 +1671,7 @@
/* We resume processing on the HW group */
spin_lock_irq(&ide_lock);
- clear_bit(IDE_BUSY, &HWGROUP(drive)->flags);
+ clear_bit(IDE_BUSY, &drive->channel->active);
if (!list_empty(&drive->queue.queue_head))
do_ide_request(&drive->queue);
spin_unlock_irq(&ide_lock);
@@ -1685,7 +1687,7 @@
int i, ret;
unsigned long base;
int big_delay;
-
+
switch (when) {
case PBOOK_SLEEP_REQUEST:
break;
@@ -1707,7 +1709,7 @@
}
/* Disable irq during sleep */
disable_irq(pmac_ide[i].irq);
-
+
/* Check if this is a media bay with an IDE device or not
* a media bay.
*/
@@ -1722,8 +1724,8 @@
if ((base = pmac_ide[i].regbase) == 0)
continue;
-
- /* Make sure we have sane timings */
+
+ /* Make sure we have sane timings */
sanitize_timings(i);
/* Check if this is a media bay with an IDE device or not
@@ -1731,7 +1733,7 @@
*/
ret = check_media_bay_by_base(base, MB_CD);
if ((ret == 0) || (ret == -ENODEV)) {
- idepmac_wake_interface(i, base, (ret == 0));
+ idepmac_wake_interface(i, base, (ret == 0));
big_delay = 1;
}
@@ -1739,18 +1741,18 @@
/* Let hardware get up to speed */
if (big_delay)
mdelay(IDE_WAKEUP_DELAY_MS);
-
+
for (i = 0; i < pmac_ide_count; ++i) {
struct ata_channel *hwif;
int used_dma, dn;
int irq_on = 0;
-
+
if ((base = pmac_ide[i].regbase) == 0)
continue;
-
+
hwif = &ide_hwifs[i];
for (dn=0; dn<MAX_DRIVES; dn++) {
- ide_drive_t *drive = &hwif->drives[dn];
+ struct ata_device *drive = &hwif->drives[dn];
if (!drive->present)
continue;
/* We don't have re-configured DMA yet */
diff -urN linux-2.5.15/drivers/ide/ide-probe.c linux/drivers/ide/ide-probe.c
--- linux-2.5.15/drivers/ide/ide-probe.c 2002-05-14 00:50:21.000000000 +0200
+++ linux/drivers/ide/ide-probe.c 2002-05-13 23:39:29.000000000 +0200
@@ -578,15 +578,15 @@
{
unsigned long flags;
int i;
- ide_hwgroup_t *hwgroup;
- ide_hwgroup_t *new_hwgroup;
+ spinlock_t *lock;
+ spinlock_t *new_lock;
struct ata_channel *match = NULL;
/* Spare allocation before sleep. */
- new_hwgroup = kmalloc(sizeof(*hwgroup), GFP_KERNEL);
+ new_lock = kmalloc(sizeof(*lock), GFP_KERNEL);
spin_lock_irqsave(&ide_lock, flags);
- ch->hwgroup = NULL;
+ ch->lock = NULL;
#if MAX_HWIFS > 1
/*
@@ -596,7 +596,7 @@
struct ata_channel *h = &ide_hwifs[i];
/* scan only initialized channels */
- if (!h->hwgroup)
+ if (!h->lock)
continue;
if (ch->irq != h->irq)
@@ -606,7 +606,7 @@
if (ch->chipset != ide_pci || h->chipset != ide_pci ||
ch->serialized || h->serialized) {
- if (match && match->hwgroup && match->hwgroup != h->hwgroup)
+ if (match && match->lock && match->lock != h->lock)
printk("%s: potential irq problem with %s and %s\n", ch->name, h->name, match->name);
/* don't undo a prior perfect match */
if (!match || match->irq != ch->irq)
@@ -615,19 +615,20 @@
}
#endif
/*
- * If we are still without a hwgroup, then form a new one
+ * If we are still without a lock group, then form a new one
*/
- if (match) {
- hwgroup = match->hwgroup;
- if(new_hwgroup)
- kfree(new_hwgroup);
- } else {
- hwgroup = new_hwgroup;
- if (!hwgroup) {
+ if (!match) {
+ lock = new_lock;
+ if (!lock) {
spin_unlock_irqrestore(&ide_lock, flags);
+
return 1;
}
- memset(hwgroup, 0, sizeof(*hwgroup));
+ spin_lock_init(lock);
+ } else {
+ lock = match->lock;
+ if(new_lock)
+ kfree(new_lock);
}
/*
@@ -645,7 +646,8 @@
if (request_irq(ch->irq, &ata_irq_request, sa, ch->name, ch)) {
if (!match)
- kfree(hwgroup);
+ kfree(lock);
+
spin_unlock_irqrestore(&ide_lock, flags);
return 1;
@@ -653,9 +655,9 @@
}
/*
- * Everything is okay. Tag us as member of this hardware group.
+ * Everything is okay. Tag us as member of this lock group.
*/
- ch->hwgroup = hwgroup;
+ ch->lock = lock;
init_timer(&ch->timer);
ch->timer.function = &ide_timer_expiry;
@@ -678,7 +680,7 @@
q = &drive->queue;
q->queuedata = drive->channel;
- blk_init_queue(q, do_ide_request, &ide_lock);
+ blk_init_queue(q, do_ide_request, drive->channel->lock);
blk_queue_segment_boundary(q, 0xffff);
/* ATA can do up to 128K per request, pdc4030 needs smaller limit */
diff -urN linux-2.5.15/drivers/ide/ide-taskfile.c linux/drivers/ide/ide-taskfile.c
--- linux-2.5.15/drivers/ide/ide-taskfile.c 2002-05-14 00:50:24.000000000 +0200
+++ linux/drivers/ide/ide-taskfile.c 2002-05-14 00:23:19.000000000 +0200
@@ -61,41 +61,6 @@
* Data transfer functions for polled IO.
*/
-#if SUPPORT_VLB_SYNC
-/*
- * Some localbus EIDE interfaces require a special access sequence
- * when using 32-bit I/O instructions to transfer data. We call this
- * the "vlb_sync" sequence, which consists of three successive reads
- * of the sector count register location, with interrupts disabled
- * to ensure that the reads all happen together.
- */
-static void ata_read_vlb(struct ata_device *drive, void *buffer, unsigned int wcount)
-{
- unsigned long flags;
-
- __save_flags(flags); /* local CPU only */
- __cli(); /* local CPU only */
- IN_BYTE(IDE_NSECTOR_REG);
- IN_BYTE(IDE_NSECTOR_REG);
- IN_BYTE(IDE_NSECTOR_REG);
- insl(IDE_DATA_REG, buffer, wcount);
- __restore_flags(flags); /* local CPU only */
-}
-
-static void ata_write_vlb(struct ata_device *drive, void *buffer, unsigned int wcount)
-{
- unsigned long flags;
-
- __save_flags(flags); /* local CPU only */
- __cli(); /* local CPU only */
- IN_BYTE(IDE_NSECTOR_REG);
- IN_BYTE(IDE_NSECTOR_REG);
- IN_BYTE(IDE_NSECTOR_REG);
- outsl(IDE_DATA_REG, buffer, wcount);
- __restore_flags(flags); /* local CPU only */
-}
-#endif
-
static void ata_read_32(struct ata_device *drive, void *buffer, unsigned int wcount)
{
insl(IDE_DATA_REG, buffer, wcount);
@@ -157,12 +122,7 @@
io_32bit = drive->channel->io_32bit;
if (io_32bit) {
-#if SUPPORT_VLB_SYNC
- if (io_32bit & 2)
- ata_read_vlb(drive, buffer, wcount);
- else
-#endif
- ata_read_32(drive, buffer, wcount);
+ ata_read_32(drive, buffer, wcount);
} else {
#if SUPPORT_SLOW_DATA_PORTS
if (drive->channel->slow)
@@ -188,12 +148,7 @@
io_32bit = drive->channel->io_32bit;
if (io_32bit) {
-#if SUPPORT_VLB_SYNC
- if (io_32bit & 2)
- ata_write_vlb(drive, buffer, wcount);
- else
-#endif
- ata_write_32(drive, buffer, wcount);
+ ata_write_32(drive, buffer, wcount);
} else {
#if SUPPORT_SLOW_DATA_PORTS
if (drive->channel->slow)
@@ -320,7 +275,6 @@
static ide_startstop_t task_mulout_intr(struct ata_device *drive, struct request *rq)
{
u8 stat = GET_STAT();
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
int mcount = drive->mult_count;
ide_startstop_t startstop;
@@ -349,7 +303,7 @@
}
/* no data yet, so wait for another interrupt */
- if (hwgroup->handler == NULL)
+ if (!drive->channel->handler)
ide_set_handler(drive, task_mulout_intr, WAIT_CMD, NULL);
return ide_started;
@@ -392,7 +346,7 @@
} while (mcount);
rq->errors = 0;
- if (hwgroup->handler == NULL)
+ if (!drive->channel->handler)
ide_set_handler(drive, task_mulout_intr, WAIT_CMD, NULL);
return ide_started;
diff -urN linux-2.5.15/drivers/ide/pdc4030.c linux/drivers/ide/pdc4030.c
--- linux-2.5.15/drivers/ide/pdc4030.c 2002-05-14 00:50:21.000000000 +0200
+++ linux/drivers/ide/pdc4030.c 2002-05-14 00:22:41.000000000 +0200
@@ -1,14 +1,11 @@
/* -*- linux-c -*-
- * linux/drivers/ide/pdc4030.c Version 0.92 Jan 15, 2002
*
* Copyright (C) 1995-2002 Linus Torvalds & authors (see below)
- */
-
-/*
+ *
* Principal Author/Maintainer: peterd@pnd-pc.demon.co.uk
*
* This file provides support for the second port and cache of Promise
- * IDE interfaces, e.g. DC4030VL, DC4030VL-1 and DC4030VL-2.
+ * VLB based IDE interfaces, e.g. DC4030VL, DC4030VL-1 and DC4030VL-2.
*
* Thanks are due to Mark Lord for advice and patiently answering stupid
* questions, and all those mugs^H^H^H^Hbrave souls who've tested this,
@@ -44,14 +41,14 @@
/*
* Once you've compiled it in, you'll have to also enable the interface
- * setup routine from the kernel command line, as in
+ * setup routine from the kernel command line, as in
*
* 'linux ide0=dc4030' or 'linux ide1=dc4030'
*
* It should now work as a second controller also ('ide1=dc4030') but only
* if you DON'T have BIOS V4.44, which has a bug. If you have this version
* and EPROM programming facilities, you need to fix 4 bytes:
- * 2496: 81 81
+ * 2496: 81 81
* 2497: 3E 3E
* 2498: 22 98 *
* 2499: 06 05 *
@@ -67,7 +64,7 @@
*
* As of January 1999, Promise Technology Inc. have finally supplied me with
* some technical information which has shed a glimmer of light on some of the
- * problems I was having, especially with writes.
+ * problems I was having, especially with writes.
*
* There are still potential problems with the robustness and efficiency of
* this driver because I still don't understand what the card is doing with
@@ -94,20 +91,85 @@
#include "pdc4030.h"
-#if SUPPORT_VLB_SYNC != 1
-#error This driver will not work unless SUPPORT_VLB_SYNC is 1
-#endif
+/*
+ * Data transfer functions for polled IO.
+ */
+
+/*
+ * Some localbus EIDE interfaces require a special access sequence
+ * when using 32-bit I/O instructions to transfer data. We call this
+ * the "vlb_sync" sequence, which consists of three successive reads
+ * of the sector count register location, with interrupts disabled
+ * to ensure that the reads all happen together.
+ */
+static void read_vlb(struct ata_device *drive, void *buffer, unsigned int wcount)
+{
+ unsigned long flags;
+
+ __save_flags(flags); /* local CPU only */
+ __cli(); /* local CPU only */
+ inb(IDE_NSECTOR_REG);
+ inb(IDE_NSECTOR_REG);
+ inb(IDE_NSECTOR_REG);
+ insl(IDE_DATA_REG, buffer, wcount);
+ __restore_flags(flags); /* local CPU only */
+}
+
+static void write_vlb(struct ata_device *drive, void *buffer, unsigned int wcount)
+{
+ unsigned long flags;
+
+ __save_flags(flags); /* local CPU only */
+ __cli(); /* local CPU only */
+ inb(IDE_NSECTOR_REG);
+ inb(IDE_NSECTOR_REG);
+ inb(IDE_NSECTOR_REG);
+ outsl(IDE_DATA_REG, buffer, wcount);
+ __restore_flags(flags); /* local CPU only */
+}
+
+static void read_16(struct ata_device *drive, void *buffer, unsigned int wcount)
+{
+ insw(IDE_DATA_REG, buffer, wcount<<1);
+}
+
+static void write_16(struct ata_device *drive, void *buffer, unsigned int wcount)
+{
+ outsw(IDE_DATA_REG, buffer, wcount<<1);
+}
+
+/*
+ * This is used for most PIO data transfers *from* the device.
+ */
+static void promise_read(struct ata_device *drive, void *buffer, unsigned int wcount)
+{
+ if (drive->channel->io_32bit)
+ read_vlb(drive, buffer, wcount);
+ else
+ read_16(drive, buffer, wcount);
+}
+
+/*
+ * This is used for most PIO data transfers *to* the device interface.
+ */
+static void promise_write(struct ata_device *drive, void *buffer, unsigned int wcount)
+{
+ if (drive->channel->io_32bit)
+ write_vlb(drive, buffer, wcount);
+ else
+ write_16(drive, buffer, wcount);
+}
/*
* promise_selectproc() is invoked by ide.c
* in preparation for access to the specified drive.
*/
-static void promise_selectproc (ide_drive_t *drive)
+static void promise_selectproc(struct ata_device *drive)
{
- unsigned int number;
+ u8 number;
number = (drive->channel->unit << 1) + drive->select.b.unit;
- OUT_BYTE(number,IDE_FEATURE_REG);
+ outb(number, IDE_FEATURE_REG);
}
/*
@@ -115,15 +177,15 @@
* by command F0. They all have the same success/failure notification -
* 'P' (=0x50) on success, 'p' (=0x70) on failure.
*/
-int pdc4030_cmd(ide_drive_t *drive, byte cmd)
+int pdc4030_cmd(struct ata_device *drive, byte cmd)
{
unsigned long timeout, timer;
byte status_val;
promise_selectproc(drive); /* redundant? */
- OUT_BYTE(0xF3,IDE_SECTOR_REG);
- OUT_BYTE(cmd,IDE_SELECT_REG);
- OUT_BYTE(PROMISE_EXTENDED_COMMAND,IDE_COMMAND_REG);
+ outb(0xF3, IDE_SECTOR_REG);
+ outb(cmd, IDE_SELECT_REG);
+ outb(PROMISE_EXTENDED_COMMAND, IDE_COMMAND_REG);
timeout = HZ * 10;
timeout += jiffies;
do {
@@ -134,7 +196,7 @@
/* Delays at least 10ms to give interface a chance */
timer = jiffies + (HZ + 99)/100 + 1;
while (time_after(timer, jiffies));
- status_val = IN_BYTE(IDE_SECTOR_REG);
+ status_val = inb(IDE_SECTOR_REG);
} while (status_val != 0x50 && status_val != 0x70);
if(status_val == 0x50)
@@ -146,7 +208,7 @@
/*
* pdc4030_identify sends a vendor-specific IDENTIFY command to the drive
*/
-int pdc4030_identify(ide_drive_t *drive)
+int pdc4030_identify(struct ata_device *drive)
{
return pdc4030_cmd(drive, PROMISE_IDENTIFY);
}
@@ -164,24 +226,25 @@
*/
int __init setup_pdc4030(struct ata_channel *hwif)
{
- ide_drive_t *drive;
+ struct ata_device *drive;
struct ata_channel *hwif2;
struct dc_ident ident;
int i;
ide_startstop_t startstop;
- if (!hwif) return 0;
+ if (!hwif)
+ return 0;
drive = &hwif->drives[0];
hwif2 = &ide_hwifs[hwif->index+1];
if (hwif->chipset == ide_pdc4030) /* we've already been found ! */
return 1;
- if (IN_BYTE(IDE_NSECTOR_REG) == 0xFF || IN_BYTE(IDE_SECTOR_REG) == 0xFF) {
+ if (inb(IDE_NSECTOR_REG) == 0xFF || inb(IDE_SECTOR_REG) == 0xFF) {
return 0;
}
if (IDE_CONTROL_REG)
- OUT_BYTE(0x08,IDE_CONTROL_REG);
+ outb(0x08, IDE_CONTROL_REG);
if (pdc4030_cmd(drive,PROMISE_GET_CONFIG)) {
return 0;
}
@@ -190,7 +253,7 @@
"%s: Failed Promise read config!\n",hwif->name);
return 0;
}
- ata_read(drive, &ident, SECTOR_WORDS);
+ promise_read(drive, &ident, SECTOR_WORDS);
if (ident.id[1] != 'P' || ident.id[0] != 'T') {
return 0;
}
@@ -233,7 +296,9 @@
hwif->chipset = hwif2->chipset = ide_pdc4030;
hwif->unit = ATA_PRIMARY;
hwif2->unit = ATA_SECONDARY;
- hwif->selectproc = hwif2->selectproc = &promise_selectproc;
+ hwif->ata_read = hwif2->ata_read = promise_read;
+ hwif->ata_write = hwif2->ata_write = promise_write;
+ hwif->selectproc = hwif2->selectproc = promise_selectproc;
hwif->serialized = hwif2->serialized = 1;
/* Shift the remaining interfaces up by one */
@@ -269,20 +334,20 @@
*/
int __init detect_pdc4030(struct ata_channel *hwif)
{
- ide_drive_t *drive = &hwif->drives[0];
+ struct ata_device *drive = &hwif->drives[0];
if (IDE_DATA_REG == 0) { /* Skip test for non-existent interface */
return 0;
}
- OUT_BYTE(0xF3, IDE_SECTOR_REG);
- OUT_BYTE(0x14, IDE_SELECT_REG);
- OUT_BYTE(PROMISE_EXTENDED_COMMAND, IDE_COMMAND_REG);
-
+ outb(0xF3, IDE_SECTOR_REG);
+ outb(0x14, IDE_SELECT_REG);
+ outb(PROMISE_EXTENDED_COMMAND, IDE_COMMAND_REG);
+
ide_delay_50ms();
- if (IN_BYTE(IDE_ERROR_REG) == 'P' &&
- IN_BYTE(IDE_NSECTOR_REG) == 'T' &&
- IN_BYTE(IDE_SECTOR_REG) == 'I') {
+ if (inb(IDE_ERROR_REG) == 'P' &&
+ inb(IDE_NSECTOR_REG) == 'T' &&
+ inb(IDE_SECTOR_REG) == 'I') {
return 1;
} else {
return 0;
@@ -321,9 +386,9 @@
read_again:
do {
- sectors_left = IN_BYTE(IDE_NSECTOR_REG);
- IN_BYTE(IDE_SECTOR_REG);
- } while (IN_BYTE(IDE_NSECTOR_REG) != sectors_left);
+ sectors_left = inb(IDE_NSECTOR_REG);
+ inb(IDE_SECTOR_REG);
+ } while (inb(IDE_NSECTOR_REG) != sectors_left);
sectors_avail = rq->nr_sectors - sectors_left;
if (!sectors_avail)
goto read_again;
@@ -334,7 +399,7 @@
nsect = sectors_avail;
sectors_avail -= nsect;
to = bio_kmap_irq(rq->bio, &flags) + ide_rq_offset(rq);
- ata_read(drive, to, nsect * SECTOR_WORDS);
+ promise_read(drive, to, nsect * SECTOR_WORDS);
#ifdef DEBUG_READ
printk(KERN_DEBUG "%s: promise_read: sectors(%ld-%ld), "
"buf=0x%08lx, rem=%ld\n", drive->name, rq->sector,
@@ -458,7 +523,7 @@
* Ok, we're all setup for the interrupt
* re-entering us on the last transfer.
*/
- ata_write(drive, buffer, nsect << 7);
+ promise_write(drive, buffer, nsect << 7);
bio_kunmap_irq(buffer, &flags);
} while (mcount);
@@ -472,7 +537,7 @@
{
struct ata_channel *ch = drive->channel;
- if (IN_BYTE(IDE_NSECTOR_REG) != 0) {
+ if (inb(IDE_NSECTOR_REG) != 0) {
if (time_before(jiffies, ch->poll_timeout)) {
ide_set_handler(drive, promise_write_pollfunc, HZ/100, NULL);
return ide_started; /* continue polling... */
@@ -496,13 +561,13 @@
}
/*
- * promise_write() transfers a block of one or more sectors of data to a
- * drive as part of a disk write operation. All but 4 sectors are transferred
- * in the first attempt, then the interface is polled (nicely!) for completion
- * before the final 4 sectors are transferred. There is no interrupt generated
- * on writes (at least on the DC4030VL-2), we just have to poll for NOT BUSY.
+ * This transfers a block of one or more sectors of data to a drive as part of
+ * a disk write operation. All but 4 sectors are transferred in the first
+ * attempt, then the interface is polled (nicely!) for completion before the
+ * final 4 sectors are transferred. There is no interrupt generated on writes
+ * (at least on the DC4030VL-2), we just have to poll for NOT BUSY.
*/
-static ide_startstop_t promise_write(struct ata_device *drive, struct request *rq)
+static ide_startstop_t promise_do_write(struct ata_device *drive, struct request *rq)
{
struct ata_channel *ch = drive->channel;
@@ -558,18 +623,18 @@
}
if (IDE_CONTROL_REG)
- OUT_BYTE(drive->ctl, IDE_CONTROL_REG); /* clear nIEN */
+ outb(drive->ctl, IDE_CONTROL_REG); /* clear nIEN */
SELECT_MASK(drive->channel, drive, 0);
- OUT_BYTE(taskfile->feature, IDE_FEATURE_REG);
- OUT_BYTE(taskfile->sector_count, IDE_NSECTOR_REG);
+ outb(taskfile->feature, IDE_FEATURE_REG);
+ outb(taskfile->sector_count, IDE_NSECTOR_REG);
/* refers to number of sectors to transfer */
- OUT_BYTE(taskfile->sector_number, IDE_SECTOR_REG);
+ outb(taskfile->sector_number, IDE_SECTOR_REG);
/* refers to sector offset or start sector */
- OUT_BYTE(taskfile->low_cylinder, IDE_LCYL_REG);
- OUT_BYTE(taskfile->high_cylinder, IDE_HCYL_REG);
- OUT_BYTE(taskfile->device_head, IDE_SELECT_REG);
- OUT_BYTE(taskfile->command, IDE_COMMAND_REG);
+ outb(taskfile->low_cylinder, IDE_LCYL_REG);
+ outb(taskfile->high_cylinder, IDE_HCYL_REG);
+ outb(taskfile->device_head, IDE_SELECT_REG);
+ outb(taskfile->command, IDE_COMMAND_REG);
switch (rq_data_dir(rq)) {
case READ:
@@ -590,7 +655,7 @@
udelay(1);
return promise_read_intr(drive, rq);
}
- if (IN_BYTE(IDE_SELECT_REG) & 0x01) {
+ if (inb(IDE_SELECT_REG) & 0x01) {
#ifdef DEBUG_READ
printk(KERN_DEBUG "%s: read: waiting for "
"interrupt\n", drive->name);
@@ -621,7 +686,7 @@
}
if (!drive->channel->unmask)
__cli(); /* local CPU only */
- return promise_write(drive, rq);
+ return promise_do_write(drive, rq);
}
default:
diff -urN linux-2.5.15/drivers/ide/tcq.c linux/drivers/ide/tcq.c
--- linux-2.5.15/drivers/ide/tcq.c 2002-05-14 00:50:24.000000000 +0200
+++ linux/drivers/ide/tcq.c 2002-05-13 22:03:40.000000000 +0200
@@ -83,7 +83,6 @@
static void tcq_invalidate_queue(struct ata_device *drive)
{
struct ata_channel *ch = drive->channel;
- ide_hwgroup_t *hwgroup = ch->hwgroup;
request_queue_t *q = &drive->queue;
struct ata_taskfile *args;
struct request *rq;
@@ -104,7 +103,7 @@
drive->queue_depth = 1;
clear_bit(IDE_BUSY, &ch->active);
clear_bit(IDE_DMA, &ch->active);
- hwgroup->handler = NULL;
+ ch->handler = NULL;
/*
* Do some internal stuff -- we really need this command to be
@@ -153,7 +152,6 @@
{
struct ata_device *drive = (struct ata_device *) data;
struct ata_channel *ch = drive->channel;
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
unsigned long flags;
printk(KERN_ERR "ATA: %s: timeout waiting for interrupt...\n", __FUNCTION__);
@@ -161,9 +159,9 @@
spin_lock_irqsave(&ide_lock, flags);
if (test_and_set_bit(IDE_BUSY, &ch->active))
- printk(KERN_ERR "ATA: %s: hwgroup not busy\n", __FUNCTION__);
- if (hwgroup->handler == NULL)
- printk(KERN_ERR "ATA: %s: missing isr!\n", __FUNCTION__);
+ printk(KERN_ERR "ATA: %s: IRQ handler not busy\n", __FUNCTION__);
+ if (!ch->handler)
+ printk(KERN_ERR "ATA: %s: missing ISR!\n", __FUNCTION__);
spin_unlock_irqrestore(&ide_lock, flags);
@@ -181,7 +179,6 @@
static void set_irq(struct ata_device *drive, ata_handler_t *handler)
{
struct ata_channel *ch = drive->channel;
- ide_hwgroup_t *hwgroup = HWGROUP(drive);
unsigned long flags;
spin_lock_irqsave(&ide_lock, flags);
@@ -197,8 +194,8 @@
ch->timer.function = ata_tcq_irq_timeout;
ch->timer.data = (unsigned long) ch->drive;
mod_timer(&ch->timer, jiffies + 5 * HZ);
+ ch->handler = handler;
- hwgroup->handler = handler;
spin_unlock_irqrestore(&ide_lock, flags);
}
diff -urN linux-2.5.15/drivers/ide/umc8672.c linux/drivers/ide/umc8672.c
--- linux-2.5.15/drivers/ide/umc8672.c 2002-05-10 00:23:22.000000000 +0200
+++ linux/drivers/ide/umc8672.c 2002-05-13 22:06:19.000000000 +0200
@@ -21,7 +21,7 @@
*/
/*
- * VLB Controller Support from
+ * VLB Controller Support from
* Wolfram Podien
* Rohoefe 3
* D28832 Achim
@@ -34,7 +34,7 @@
* #define UMC_DRIVE0 11
* in the beginning of the driver, which sets the speed of drive 0 to 11 (there
* are some lines present). 0 - 11 are allowed speed values. These values are
- * the results from the DOS speed test program supplied from UMC. 11 is the
+ * the results from the DOS speed test program supplied from UMC. 11 is the
* highest speed (about PIO mode 3)
*/
#define REALLY_SLOW_IO /* some systems can safely undef this */
@@ -92,13 +92,11 @@
out_umc (0xd7,(speedtab[0][speeds[2]] | (speedtab[0][speeds[3]]<<4)));
out_umc (0xd6,(speedtab[0][speeds[0]] | (speedtab[0][speeds[1]]<<4)));
tmp = 0;
- for (i = 3; i >= 0; i--)
- {
+ for (i = 3; i >= 0; i--) {
tmp = (tmp << 2) | speedtab[1][speeds[i]];
}
out_umc (0xdc,tmp);
- for (i = 0;i < 4; i++)
- {
+ for (i = 0;i < 4; i++) {
out_umc (0xd0+i,speedtab[2][speeds[i]]);
out_umc (0xd8+i,speedtab[2][speeds[i]]);
}
@@ -108,10 +106,9 @@
speeds[0], speeds[1], speeds[2], speeds[3]);
}
-static void tune_umc (ide_drive_t *drive, byte pio)
+static void tune_umc(struct ata_device *drive, byte pio)
{
unsigned long flags;
- ide_hwgroup_t *hwgroup = ide_hwifs[drive->channel->index ^ 1].hwgroup;
if (pio == 255)
pio = ata_timing_mode(drive, XFER_PIO | XFER_EPIO) - XFER_PIO_0;
@@ -121,16 +118,12 @@
printk("%s: setting umc8672 to PIO mode%d (speed %d)\n", drive->name, pio, pio_to_umc[pio]);
save_flags(flags); /* all CPUs */
cli(); /* all CPUs */
- if (hwgroup && hwgroup->handler != NULL) {
- printk("umc8672: other interface is busy: exiting tune_umc()\n");
- } else {
- current_speeds[drive->name[2] - 'a'] = pio_to_umc[pio];
- umc_set_speeds (current_speeds);
- }
+ current_speeds[drive->name[2] - 'a'] = pio_to_umc[pio];
+ umc_set_speeds (current_speeds);
restore_flags(flags); /* all CPUs */
}
-void __init init_umc8672 (void) /* called from ide.c */
+void __init init_umc8672(void) /* called from ide.c */
{
unsigned long flags;
diff -urN linux-2.5.15/include/asm-cris/ide.h linux/include/asm-cris/ide.h
--- linux-2.5.15/include/asm-cris/ide.h 2002-05-14 00:50:24.000000000 +0200
+++ linux/include/asm-cris/ide.h 2002-05-14 00:32:15.000000000 +0200
@@ -90,9 +90,6 @@
/* some configuration options we don't need */
-#undef SUPPORT_VLB_SYNC
-#define SUPPORT_VLB_SYNC 0
-
#undef SUPPORT_SLOW_DATA_PORTS
#define SUPPORT_SLOW_DATA_PORTS 0
diff -urN linux-2.5.15/include/asm-m68k/ide.h linux/include/asm-m68k/ide.h
--- linux-2.5.15/include/asm-m68k/ide.h 2002-05-14 00:50:24.000000000 +0200
+++ linux/include/asm-m68k/ide.h 2002-05-14 00:31:49.000000000 +0200
@@ -83,9 +83,6 @@
#undef SUPPORT_SLOW_DATA_PORTS
#define SUPPORT_SLOW_DATA_PORTS 0
-#undef SUPPORT_VLB_SYNC
-#define SUPPORT_VLB_SYNC 0
-
/* this definition is used only on startup .. */
#undef HD_DATA
#define HD_DATA NULL
diff -urN linux-2.5.15/include/asm-mips/ide.h linux/include/asm-mips/ide.h
--- linux-2.5.15/include/asm-mips/ide.h 2002-05-14 00:50:24.000000000 +0200
+++ linux/include/asm-mips/ide.h 2002-05-14 00:31:56.000000000 +0200
@@ -65,9 +65,6 @@
#endif
}
-#undef SUPPORT_VLB_SYNC
-#define SUPPORT_VLB_SYNC 0
-
#endif /* __KERNEL__ */
#endif /* __ASM_IDE_H */
diff -urN linux-2.5.15/include/asm-ppc/ide.h linux/include/asm-ppc/ide.h
--- linux-2.5.15/include/asm-ppc/ide.h 2002-05-14 00:50:24.000000000 +0200
+++ linux/include/asm-ppc/ide.h 2002-05-14 00:31:36.000000000 +0200
@@ -43,8 +43,6 @@
#undef SUPPORT_SLOW_DATA_PORTS
#define SUPPORT_SLOW_DATA_PORTS 0
-#undef SUPPORT_VLB_SYNC
-#define SUPPORT_VLB_SYNC 0
#define ide__sti() __sti()
diff -urN linux-2.5.15/include/asm-sparc/ide.h linux/include/asm-sparc/ide.h
--- linux-2.5.15/include/asm-sparc/ide.h 2002-05-14 00:50:24.000000000 +0200
+++ linux/include/asm-sparc/ide.h 2002-05-14 00:32:09.000000000 +0200
@@ -76,9 +76,6 @@
#undef SUPPORT_SLOW_DATA_PORTS
#define SUPPORT_SLOW_DATA_PORTS 0
-#undef SUPPORT_VLB_SYNC
-#define SUPPORT_VLB_SYNC 0
-
#undef HD_DATA
#define HD_DATA ((ide_ioreg_t)0)
diff -urN linux-2.5.15/include/asm-sparc64/ide.h linux/include/asm-sparc64/ide.h
--- linux-2.5.15/include/asm-sparc64/ide.h 2002-05-14 00:50:24.000000000 +0200
+++ linux/include/asm-sparc64/ide.h 2002-05-14 00:32:03.000000000 +0200
@@ -72,9 +72,6 @@
#undef SUPPORT_SLOW_DATA_PORTS
#define SUPPORT_SLOW_DATA_PORTS 0
-#undef SUPPORT_VLB_SYNC
-#define SUPPORT_VLB_SYNC 0
-
#undef HD_DATA
#define HD_DATA ((ide_ioreg_t)0)
diff -urN linux-2.5.15/include/linux/ide.h linux/include/linux/ide.h
--- linux-2.5.15/include/linux/ide.h 2002-05-14 00:50:24.000000000 +0200
+++ linux/include/linux/ide.h 2002-05-14 01:08:54.000000000 +0200
@@ -40,9 +40,6 @@
/* Right now this is only needed by a promise controlled.
*/
-#ifndef SUPPORT_VLB_SYNC /* 1 to support weird 32-bit chips */
-# define SUPPORT_VLB_SYNC 1 /* 0 to reduce kernel size */
-#endif
#ifndef DISK_RECOVERY_TIME /* off=0; on=access_delay_time */
# define DISK_RECOVERY_TIME 0 /* for hardware that needs it */
#endif
@@ -74,8 +71,6 @@
*/
#define DMA_PIO_RETRY 1 /* retrying in PIO */
-#define HWGROUP(drive) (drive->channel->hwgroup)
-
/*
* Definitions for accessing IDE controller registers
*/
@@ -444,18 +439,16 @@
IDE_DMA /* DMA in progress */
};
-typedef struct hwgroup_s {
- /* FIXME: We should look for busy request queues instead of looking at
- * the !NULL state of this field.
- */
- ide_startstop_t (*handler)(struct ata_device *, struct request *); /* irq handler, if active */
-} ide_hwgroup_t;
-
struct ata_channel {
struct device dev; /* device handle */
int unit; /* channel number */
- struct hwgroup_s *hwgroup; /* actually (ide_hwgroup_t *) */
+ /* This lock is used to serialize requests on the same device queue or
+ * between different queues sharing the same irq line.
+ */
+ spinlock_t *lock;
+
+ ide_startstop_t (*handler)(struct ata_device *, struct request *); /* irq handler, if active */
struct timer_list timer; /* failsafe timer */
int (*expiry)(struct ata_device *, struct request *); /* irq handler, if active */
unsigned long poll_timeout; /* timeout value during polled operations */
@@ -777,11 +770,7 @@
extern int system_bus_speed;
-/*
- * ide_stall_queue() can be used by a drive to give excess bandwidth back
- * to the hwgroup by sleeping for timeout jiffies.
- */
-void ide_stall_queue(struct ata_device *, unsigned long);
+extern void ide_stall_queue(struct ata_device *, unsigned long);
/*
* CompactFlash cards and their brethern pretend to be removable hard disks,
^ permalink raw reply [flat|nested] 265+ messages in thread
* [PATCH] 2.5.15 IDE 63
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (10 preceding siblings ...)
2002-05-14 10:26 ` [PATCH] 2.5.15 IDE 62a Martin Dalecki
@ 2002-05-14 10:28 ` Martin Dalecki
2002-05-15 12:04 ` [PATCH] 2.5.15 IDE 64 Martin Dalecki
12 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-14 10:28 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 1461 bytes --]
Tue May 14 02:36:12 CEST 2002 ide-clean-63:
- Propagate the queue handling changes to pmac as well.
- Move set_transfer to ide-taskfile.c; this is the only place where it's used,
so it can be made static there. The same applies to ide_ata66_check().
- Move ide_auto_reduce_xfer to ide.c.
- Make ide_cmd() local to the only place where it's used. Rename it to
drive_cmd(). Don't pass drive_cmd_intr() as parameter.
- Remove ide_next command completion type. Nobody is using it.
- Move ide_do_drive_cmd to ide-taskfile. It's used there and in sub-drivers,
not in ide.c. The usage inside the device type drivers is entirely bogus,
inconsistent and so on...
- Kill bogus IRQ masking code. The kernel is supposed to handle this properly.
We should not try to work around possible bugs in the overall irq handling
code. Wow, this is increasing the system's overall responsiveness by a
significant margin.
- Remove dysfunctional pdcadma code. It is only misleading to the user.
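As an illustration of the ide_auto_reduce_xfer item above, here is a minimal
standalone sketch of the step-down rule that helper applies after CRC errors:
drop one Ultra DMA mode at a time, and fall back to PIO 4 once UDMA 0 fails.
The names struct fake_device and reduce_xfer() are hypothetical stand-ins, and
the mode codes are assumed to follow the usual ATA SET FEATURES encoding; the
range check replaces the switch statement used in the patch.

```c
#include <assert.h>

/* Stand-in transfer mode codes (assumed values, for illustration only). */
enum xfer_mode {
	XFER_PIO_4  = 0x0c,
	XFER_UDMA_0 = 0x40,
	XFER_UDMA_1 = 0x41,
	XFER_UDMA_2 = 0x42,
	XFER_UDMA_3 = 0x43,
	XFER_UDMA_4 = 0x44,
	XFER_UDMA_5 = 0x45,
	XFER_UDMA_6 = 0x46,
	XFER_UDMA_7 = 0x47
};

/* Hypothetical, trimmed-down device state; not the kernel's ata_device. */
struct fake_device {
	unsigned char current_speed;
	unsigned int crc_count;
};

/*
 * Return the mode to try next. Without recorded CRC errors the current
 * mode is kept. Otherwise the error counter is cleared and the mode is
 * lowered by one UDMA step; UDMA 0 and any non-UDMA mode drop to PIO 4.
 */
static unsigned char reduce_xfer(struct fake_device *drive)
{
	if (!drive->crc_count)
		return drive->current_speed;
	drive->crc_count = 0;

	if (drive->current_speed > XFER_UDMA_0 &&
	    drive->current_speed <= XFER_UDMA_7)
		return (unsigned char)(drive->current_speed - 1);

	return XFER_PIO_4;
}
```

The sketch only computes the next mode; as in the patch, programming the
channel is left to the caller (there, through the channel's speedproc hook).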
Finally I know where the locking mismatches happen: it's in the device type
drivers, which entirely disobey the locking done inside the generic code and
which push REQ_DRIVE_CMD type requests directly down through the request queue.
We will have to deal with them later, simply due to the fact that I can not do
everything at once. And finally I have different plans for them :-). (Hint: Why
should there be 4 different ATAPI device handling drivers instead of one?)
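For readers following the REQ_DRIVE_CMD discussion, here is a rough sketch of
where such a request lands in the queue once ide_next is gone. It models only
the placement rules stated in the ide_do_drive_cmd() comment (ide_wait and
ide_end append at the tail, ide_preempt inserts at the head). The array-backed
struct fake_queue, the ACT_* names and enqueue() are hypothetical; locking,
sleeping on completion and the elevator are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the remaining ide_action_t values. */
enum action { ACT_WAIT, ACT_PREEMPT, ACT_END };

#define QUEUE_MAX 8

/* Toy request queue holding request ids; index 0 is the head. */
struct fake_queue {
	int ids[QUEUE_MAX];
	size_t len;
};

/*
 * Place a request according to the documented action semantics.
 * Returns 0 on success, -1 if the toy queue is full.
 */
static int enqueue(struct fake_queue *q, int id, enum action a)
{
	size_t i;

	if (q->len >= QUEUE_MAX)
		return -1;

	if (a == ACT_PREEMPT) {
		/* Shift everything back by one slot and take the head. */
		for (i = q->len; i > 0; i--)
			q->ids[i] = q->ids[i - 1];
		q->ids[0] = id;
	} else {
		/* ACT_WAIT and ACT_END both append at the tail. */
		q->ids[q->len] = id;
	}
	q->len++;
	return 0;
}
```

In this model the only difference between the wait and end actions is what the
caller does afterwards (sleep until completion or return immediately), which is
why both share the tail branch.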
[-- Attachment #2: ide-clean-63.diff --]
[-- Type: text/plain, Size: 41654 bytes --]
diff -urN linux-2.5.15/arch/alpha/defconfig linux/arch/alpha/defconfig
--- linux-2.5.15/arch/alpha/defconfig 2002-05-10 00:24:42.000000000 +0200
+++ linux/arch/alpha/defconfig 2002-05-14 04:11:51.000000000 +0200
@@ -266,7 +266,6 @@
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
-# CONFIG_BLK_DEV_PDC_ADMA is not set
# CONFIG_BLK_DEV_PDC202XX is not set
# CONFIG_PDC202XX_BURST is not set
# CONFIG_PDC202XX_FORCE is not set
diff -urN linux-2.5.15/arch/i386/defconfig linux/arch/i386/defconfig
--- linux-2.5.15/arch/i386/defconfig 2002-05-10 00:21:57.000000000 +0200
+++ linux/arch/i386/defconfig 2002-05-14 04:11:41.000000000 +0200
@@ -275,7 +275,6 @@
CONFIG_BLK_DEV_PIIX=y
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
-# CONFIG_BLK_DEV_PDC_ADMA is not set
# CONFIG_BLK_DEV_PDC202XX is not set
# CONFIG_PDC202XX_BURST is not set
# CONFIG_PDC202XX_FORCE is not set
diff -urN linux-2.5.15/arch/ia64/defconfig linux/arch/ia64/defconfig
--- linux-2.5.15/arch/ia64/defconfig 2002-05-10 00:21:51.000000000 +0200
+++ linux/arch/ia64/defconfig 2002-05-14 04:11:58.000000000 +0200
@@ -247,7 +247,6 @@
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
-# CONFIG_BLK_DEV_PDC_ADMA is not set
# CONFIG_BLK_DEV_PDC202XX is not set
# CONFIG_PDC202XX_BURST is not set
# CONFIG_PDC202XX_FORCE is not set
diff -urN linux-2.5.15/arch/sparc64/defconfig linux/arch/sparc64/defconfig
--- linux-2.5.15/arch/sparc64/defconfig 2002-05-10 00:23:31.000000000 +0200
+++ linux/arch/sparc64/defconfig 2002-05-14 04:11:47.000000000 +0200
@@ -310,7 +310,6 @@
# CONFIG_BLK_DEV_PIIX is not set
CONFIG_BLK_DEV_NS87415=y
# CONFIG_BLK_DEV_OPTI621 is not set
-# CONFIG_BLK_DEV_PDC_ADMA is not set
# CONFIG_BLK_DEV_PDC202XX is not set
# CONFIG_PDC202XX_BURST is not set
# CONFIG_PDC202XX_FORCE is not set
diff -urN linux-2.5.15/arch/x86_64/defconfig linux/arch/x86_64/defconfig
--- linux-2.5.15/arch/x86_64/defconfig 2002-05-10 00:25:17.000000000 +0200
+++ linux/arch/x86_64/defconfig 2002-05-14 04:11:54.000000000 +0200
@@ -228,7 +228,6 @@
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
-# CONFIG_BLK_DEV_PDC_ADMA is not set
# CONFIG_BLK_DEV_PDC202XX is not set
# CONFIG_PDC202XX_BURST is not set
# CONFIG_PDC202XX_FORCE is not set
diff -urN linux-2.5.15/drivers/ide/ata-timing.c linux/drivers/ide/ata-timing.c
--- linux-2.5.15/drivers/ide/ata-timing.c 2002-05-10 00:23:31.000000000 +0200
+++ linux/drivers/ide/ata-timing.c 2002-05-14 02:39:39.000000000 +0200
@@ -70,7 +70,7 @@
* then to be matched agains in esp. other drives no the same channel or even
* the whole particular host chip.
*/
-short ata_timing_mode(ide_drive_t *drive, int map)
+short ata_timing_mode(struct ata_device *drive, int map)
{
struct hd_driveid *id = drive->id;
short best = 0;
@@ -192,7 +192,7 @@
return t;
}
-int ata_timing_compute(ide_drive_t *drive, short speed, struct ata_timing *t,
+int ata_timing_compute(struct ata_device *drive, short speed, struct ata_timing *t,
int T, int UT)
{
struct hd_driveid *id = drive->id;
diff -urN linux-2.5.15/drivers/ide/Config.help linux/drivers/ide/Config.help
--- linux-2.5.15/drivers/ide/Config.help 2002-05-10 00:25:30.000000000 +0200
+++ linux/drivers/ide/Config.help 2002-05-14 04:11:23.000000000 +0200
@@ -294,9 +294,6 @@
It is normally safe to answer Y; however, the default is N.
-CONFIG_BLK_DEV_PDC_ADMA
- Please read the comments at the top of <file:drivers/ide/ide-pci.c>.
-
CONFIG_BLK_DEV_AEC62XX
This driver adds up to 4 more EIDE devices sharing a single
interrupt. This add-on card is a bootable PCI UDMA controller. In
diff -urN linux-2.5.15/drivers/ide/Config.in linux/drivers/ide/Config.in
--- linux-2.5.15/drivers/ide/Config.in 2002-05-10 00:25:33.000000000 +0200
+++ linux/drivers/ide/Config.in 2002-05-14 04:11:34.000000000 +0200
@@ -73,7 +73,6 @@
fi
dep_bool ' NS87415 chipset support (EXPERIMENTAL)' CONFIG_BLK_DEV_NS87415 $CONFIG_BLK_DEV_IDEDMA_PCI
dep_mbool ' OPTi 82C621 chipset enhanced support (EXPERIMENTAL)' CONFIG_BLK_DEV_OPTI621 $CONFIG_PCI $CONFIG_EXPERIMENTAL
- dep_mbool ' Pacific Digital A-DMA support (EXPERIMENTAL)' CONFIG_BLK_DEV_PDC_ADMA $CONFIG_EXPERIMENTAL
dep_bool ' PROMISE PDC202{46|62|65|67|68|69|70} support' CONFIG_BLK_DEV_PDC202XX $CONFIG_BLK_DEV_IDEDMA_PCI
dep_bool ' Special UDMA Feature' CONFIG_PDC202XX_BURST $CONFIG_BLK_DEV_PDC202XX
dep_bool ' Special FastTrak Feature' CONFIG_PDC202XX_FORCE $CONFIG_BLK_DEV_PDC202XX
diff -urN linux-2.5.15/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.15/drivers/ide/ide.c 2002-05-14 04:22:37.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-14 04:11:13.000000000 +0200
@@ -54,7 +54,6 @@
#include <linux/delay.h>
#include <linux/ide.h>
#include <linux/devfs_fs_kernel.h>
-#include <linux/completion.h>
#include <linux/reboot.h>
#include <linux/cdrom.h>
#include <linux/device.h>
@@ -354,6 +353,31 @@
spin_unlock_irqrestore(ch->lock, flags);
}
+static u8 auto_reduce_xfer(struct ata_device *drive)
+{
+ if (!drive->crc_count)
+ return drive->current_speed;
+ drive->crc_count = 0;
+
+ switch(drive->current_speed) {
+ case XFER_UDMA_7: return XFER_UDMA_6;
+ case XFER_UDMA_6: return XFER_UDMA_5;
+ case XFER_UDMA_5: return XFER_UDMA_4;
+ case XFER_UDMA_4: return XFER_UDMA_3;
+ case XFER_UDMA_3: return XFER_UDMA_2;
+ case XFER_UDMA_2: return XFER_UDMA_1;
+ case XFER_UDMA_1: return XFER_UDMA_0;
+ /*
+ * OOPS we do not goto non Ultra DMA modes
+ * without iCRC's available we force
+ * the system to PIO and make the user
+ * invoke the ATA-1 ATA-2 DMA modes.
+ */
+ case XFER_UDMA_0:
+ default: return XFER_PIO_4;
+ }
+}
+
static void check_crc_errors(struct ata_device *drive)
{
if (!drive->using_dma)
@@ -362,8 +386,10 @@
/* check the DMA crc count */
if (drive->crc_count) {
udma_enable(drive, 0, 0);
- if ((drive->channel->speedproc) != NULL)
- drive->channel->speedproc(drive, ide_auto_reduce_xfer(drive));
+ if (drive->channel->speedproc) {
+ u8 pio = auto_reduce_xfer(drive);
+ drive->channel->speedproc(drive, pio);
+ }
if (drive->current_speed >= XFER_SW_DMA_0)
udma_enable(drive, 1, 1);
} else
@@ -835,19 +861,6 @@
}
/*
- * Issue a simple drive command. The drive must be selected beforehand.
- */
-void ide_cmd(struct ata_device *drive, byte cmd, byte nsect, ata_handler_t handler)
-{
- ide_set_handler (drive, handler, WAIT_CMD, NULL);
- if (IDE_CONTROL_REG)
- OUT_BYTE(drive->ctl,IDE_CONTROL_REG); /* clear nIEN */
- SELECT_MASK(drive->channel, drive, 0);
- OUT_BYTE(nsect,IDE_NSECTOR_REG);
- OUT_BYTE(cmd,IDE_COMMAND_REG);
-}
-
-/*
* Invoked on completion of a special DRIVE_CMD.
*/
static ide_startstop_t drive_cmd_intr(struct ata_device *drive, struct request *rq)
@@ -865,13 +878,27 @@
}
if (!OK_STAT(stat, READY_STAT, BAD_STAT))
- return ide_error(drive, rq, "drive_cmd", stat); /* calls ide_end_drive_cmd */
+ return ide_error(drive, rq, "drive_cmd", stat); /* already calls ide_end_drive_cmd */
ide_end_drive_cmd(drive, rq, stat, GET_ERR());
return ide_stopped;
}
/*
+ * Issue a simple drive command. The drive must be selected beforehand.
+ */
+static void drive_cmd(struct ata_device *drive, u8 cmd, u8 nsect)
+{
+ ide_set_handler(drive, drive_cmd_intr, WAIT_CMD, NULL);
+ if (IDE_CONTROL_REG)
+ OUT_BYTE(drive->ctl, IDE_CONTROL_REG); /* clear nIEN */
+ SELECT_MASK(drive->channel, drive, 0);
+ OUT_BYTE(nsect, IDE_NSECTOR_REG);
+ OUT_BYTE(cmd, IDE_COMMAND_REG);
+}
+
+
+/*
* Busy-wait for the drive status to be not "busy". Check then the status for
* all of the "good" bits and none of the "bad" bits, and if all is okay it
* returns 0. All other cases return 1 after invoking ide_error() -- caller
@@ -1014,12 +1041,12 @@
OUT_BYTE(0xc2, IDE_HCYL_REG);
OUT_BYTE(args[2],IDE_FEATURE_REG);
OUT_BYTE(args[1],IDE_SECTOR_REG);
- ide_cmd(drive, args[0], args[3], &drive_cmd_intr);
+ drive_cmd(drive, args[0], args[3]);
return ide_started;
}
OUT_BYTE(args[2],IDE_FEATURE_REG);
- ide_cmd(drive, args[0], args[1], &drive_cmd_intr);
+ drive_cmd(drive, args[0], args[1]);
return ide_started;
}
@@ -1187,10 +1214,10 @@
/*
- * Feed commands to a drive until it barfs. Called with ide_lock/DRIVE_LOCK
- * held and busy channel.
+ * Feed commands to a drive until it barfs. Called with queue lock held and
+ * busy channel.
*/
-static void queue_commands(struct ata_device *drive, int masked_irq)
+static void queue_commands(struct ata_device *drive)
{
struct ata_channel *ch = drive->channel;
ide_startstop_t startstop = -1;
@@ -1199,7 +1226,7 @@
struct request *rq = NULL;
if (!test_bit(IDE_BUSY, &ch->active))
- printk(KERN_ERR"%s: error: not busy while queueing!\n", drive->name);
+ printk(KERN_ERR "%s: error: not busy while queueing!\n", drive->name);
/* Abort early if we can't queue another command. for non
* tcq, ata_can_queue is always 1 since we never get here
@@ -1214,7 +1241,7 @@
drive->sleep = 0;
if (test_bit(IDE_DMA, &ch->active)) {
- printk("ide_do_request: DMA in progress...\n");
+ printk(KERN_ERR "%s: error: DMA in progress...\n", drive->name);
break;
}
@@ -1245,18 +1272,6 @@
drive->rq = rq;
- /* Some systems have trouble with IDE IRQs arriving while the
- * driver is still setting things up. So, here we disable the
- * IRQ used by this interface while the request is being
- * started. This may look bad at first, but pretty much the
- * same thing happens anyway when any interrupt comes in, IDE
- * or otherwise -- the kernel masks the IRQ while it is being
- * handled.
- */
-
- if (masked_irq && drive->channel->irq != masked_irq)
- disable_irq_nosync(drive->channel->irq);
-
spin_unlock(drive->channel->lock);
ide__sti(); /* allow other IRQs while we start this request */
@@ -1264,9 +1279,6 @@
spin_lock_irq(drive->channel->lock);
- if (masked_irq && drive->channel->irq != masked_irq)
- enable_irq(drive->channel->irq);
-
/* command started, we are busy */
if (startstop == ide_started)
break;
@@ -1288,7 +1300,7 @@
* Issue a new request.
* Caller must have already done spin_lock_irqsave(channel->lock, ...)
*/
-static void ide_do_request(struct ata_channel *channel, int masked_irq)
+static void do_request(struct ata_channel *channel)
{
ide_get_lock(&irq_lock, ata_irq_request, hwgroup);/* for atari only: POSSIBLY BROKEN HERE(?) */
// __cli(); /* necessary paranoia: ensure IRQs are masked on local CPU */
@@ -1319,14 +1331,14 @@
*/
ch->drive = drive;
- queue_commands(drive, masked_irq);
+ queue_commands(drive);
}
}
void do_ide_request(request_queue_t *q)
{
- ide_do_request(q->queuedata, 0);
+ do_request(q->queuedata);
}
/*
@@ -1368,7 +1380,7 @@
/*
* This is our timeout function for all drive operations. But note that it can
* also be invoked as a result of a "sleep" operation triggered by the
- * mod_timer() call in ide_do_request.
+ * mod_timer() call in do_request.
*/
void ide_timer_expiry(unsigned long data)
{
@@ -1462,7 +1474,7 @@
}
}
- ide_do_request(ch->drive->channel, 0);
+ do_request(ch->drive->channel);
spin_unlock_irqrestore(ch->lock, flags);
}
@@ -1606,12 +1618,12 @@
if (startstop == ide_stopped) {
if (!ch->handler) { /* paranoia */
clear_bit(IDE_BUSY, &ch->active);
- ide_do_request(ch, ch->irq);
+ do_request(ch);
} else {
printk("%s: %s: huh? expected NULL handler on exit\n", drive->name, __FUNCTION__);
}
} else if (startstop == ide_released)
- queue_commands(drive, ch->irq);
+ queue_commands(drive);
out_lock:
spin_unlock_irqrestore(ch->lock, flags);
@@ -1642,81 +1654,6 @@
}
/*
- * This function is intended to be used prior to invoking ide_do_drive_cmd().
- */
-void ide_init_drive_cmd(struct request *rq)
-{
- memset(rq, 0, sizeof(*rq));
- rq->flags = REQ_DRIVE_CMD;
-}
-
-/*
- * This function issues a special IDE device request onto the request queue.
- *
- * If action is ide_wait, then the rq is queued at the end of the request
- * queue, and the function sleeps until it has been processed. This is for use
- * when invoked from an ioctl handler.
- *
- * If action is ide_preempt, then the rq is queued at the head of the request
- * queue, displacing the currently-being-processed request and this function
- * returns immediately without waiting for the new rq to be completed. This is
- * VERY DANGEROUS, and is intended for careful use by the ATAPI tape/cdrom
- * driver code.
- *
- * If action is ide_next, then the rq is queued immediately after the
- * currently-being-processed-request (if any), and the function returns without
- * waiting for the new rq to be completed. As above, This is VERY DANGEROUS,
- * and is intended for careful use by the ATAPI tape/cdrom driver code.
- *
- * If action is ide_end, then the rq is queued at the end of the request queue,
- * and the function returns immediately without waiting for the new rq to be
- * completed. This is again intended for careful use by the ATAPI tape/cdrom
- * driver code.
- */
-int ide_do_drive_cmd(struct ata_device *drive, struct request *rq, ide_action_t action)
-{
- unsigned long flags;
- unsigned int major = drive->channel->major;
- request_queue_t *q = &drive->queue;
- struct list_head *queue_head = &q->queue_head;
- DECLARE_COMPLETION(wait);
-
-#ifdef CONFIG_BLK_DEV_PDC4030
- if (drive->channel->chipset == ide_pdc4030 && rq->buffer != NULL)
- return -ENOSYS; /* special drive cmds not supported */
-#endif
- rq->errors = 0;
- rq->rq_status = RQ_ACTIVE;
- rq->rq_dev = mk_kdev(major,(drive->select.b.unit)<<PARTN_BITS);
- if (action == ide_wait)
- rq->waiting = &wait;
-
- spin_lock_irqsave(drive->channel->lock, flags);
-
- if (blk_queue_empty(&drive->queue) || action == ide_preempt) {
- if (action == ide_preempt)
- drive->rq = NULL;
- } else {
- if (action == ide_wait || action == ide_end)
- queue_head = queue_head->prev;
- else
- queue_head = queue_head->next;
- }
- q->elevator.elevator_add_req_fn(q, rq, queue_head);
- ide_do_request(drive->channel, 0);
-
- spin_unlock_irqrestore(drive->channel->lock, flags);
-
- if (action == ide_wait) {
- wait_for_completion(&wait); /* wait for it to be serviced */
- return rq->errors ? -EIO : 0; /* return -EIO if errors */
- }
-
- return 0;
-
-}
-
-/*
* This routine is called to flush all partitions and partition tables
* for a changed disk, and then re-read the new partition table.
* If we are revalidating a disk because of a media change, then we
@@ -3206,13 +3143,10 @@
EXPORT_SYMBOL(ide_fixstring);
EXPORT_SYMBOL(ide_wait_stat);
EXPORT_SYMBOL(restart_request);
-EXPORT_SYMBOL(ide_init_drive_cmd);
-EXPORT_SYMBOL(ide_do_drive_cmd);
EXPORT_SYMBOL(ide_end_drive_cmd);
EXPORT_SYMBOL(__ide_end_request);
EXPORT_SYMBOL(ide_end_request);
EXPORT_SYMBOL(ide_revalidate_disk);
-EXPORT_SYMBOL(ide_cmd);
EXPORT_SYMBOL(ide_delay_50ms);
EXPORT_SYMBOL(ide_stall_queue);
@@ -3362,9 +3296,6 @@
# ifdef CONFIG_BLK_DEV_AMD74XX
init_amd74xx();
# endif
-# ifdef CONFIG_BLK_DEV_PDC_ADMA
- init_pdcadma();
-# endif
# ifdef CONFIG_BLK_DEV_SVWKS
init_svwks();
# endif
diff -urN linux-2.5.15/drivers/ide/ide-features.c linux/drivers/ide/ide-features.c
--- linux-2.5.15/drivers/ide/ide-features.c 2002-05-14 00:50:21.000000000 +0200
+++ linux/drivers/ide/ide-features.c 2002-05-14 02:56:51.000000000 +0200
@@ -78,31 +78,6 @@
return "XFER ERROR";
}
-byte ide_auto_reduce_xfer (ide_drive_t *drive)
-{
- if (!drive->crc_count)
- return drive->current_speed;
- drive->crc_count = 0;
-
- switch(drive->current_speed) {
- case XFER_UDMA_7: return XFER_UDMA_6;
- case XFER_UDMA_6: return XFER_UDMA_5;
- case XFER_UDMA_5: return XFER_UDMA_4;
- case XFER_UDMA_4: return XFER_UDMA_3;
- case XFER_UDMA_3: return XFER_UDMA_2;
- case XFER_UDMA_2: return XFER_UDMA_1;
- case XFER_UDMA_1: return XFER_UDMA_0;
- /*
- * OOPS we do not goto non Ultra DMA modes
- * without iCRC's available we force
- * the system to PIO and make the user
- * invoke the ATA-1 ATA-2 DMA modes.
- */
- case XFER_UDMA_0:
- default: return XFER_PIO_4;
- }
-}
-
/*
* hd_driveid data come as little endian,
* they need to be converted on big endian machines
@@ -110,7 +85,7 @@
void ide_fix_driveid(struct hd_driveid *id)
{
#ifndef __LITTLE_ENDIAN
-#ifdef __BIG_ENDIAN
+# ifdef __BIG_ENDIAN
int i;
unsigned short *stringcast;
@@ -196,13 +171,13 @@
for (i = 0; i < 48; i++)
id->words206_254[i] = __le16_to_cpu(id->words206_254[i]);
id->integrity_word = __le16_to_cpu(id->integrity_word);
-#else
-#error "Please fix <asm/byteorder.h>"
-#endif /* __BIG_ENDIAN */
-#endif /* __LITTLE_ENDIAN */
+# else
+# error "Please fix <asm/byteorder.h>"
+# endif
+#endif
}
-int ide_driveid_update (ide_drive_t *drive)
+int ide_driveid_update(struct ata_device *drive)
{
/*
* Re-read drive->id for possible DMA mode
@@ -255,56 +230,9 @@
}
/*
- * Verify that we are doing an approved SETFEATURES_XFER with respect
- * to the hardware being able to support request. Since some hardware
- * can improperly report capabilties, we check to see if the host adapter
- * in combination with the device (usually a disk) properly detect
- * and acknowledge each end of the ribbon.
- */
-int ide_ata66_check (ide_drive_t *drive, struct ata_taskfile *args)
-{
- if ((args->taskfile.command == WIN_SETFEATURES) &&
- (args->taskfile.sector_number > XFER_UDMA_2) &&
- (args->taskfile.feature == SETFEATURES_XFER)) {
- if (!drive->channel->udma_four) {
- printk("%s: Speed warnings UDMA 3/4/5 is not functional.\n", drive->channel->name);
- return 1;
- }
-#ifndef CONFIG_IDEDMA_IVB
- if ((drive->id->hw_config & 0x6000) == 0) {
-#else
- if (((drive->id->hw_config & 0x2000) == 0) ||
- ((drive->id->hw_config & 0x4000) == 0)) {
-#endif
- printk("%s: Speed warnings UDMA 3/4/5 is not functional.\n", drive->name);
- return 1;
- }
- }
- return 0;
-}
-
-/*
- * Backside of HDIO_DRIVE_CMD call of SETFEATURES_XFER.
- * 1 : Safe to update drive->id DMA registers.
- * 0 : OOPs not allowed.
- */
-int set_transfer (ide_drive_t *drive, struct ata_taskfile *args)
-{
- if ((args->taskfile.command == WIN_SETFEATURES) &&
- (args->taskfile.sector_number >= XFER_SW_DMA_0) &&
- (args->taskfile.feature == SETFEATURES_XFER) &&
- (drive->id->dma_ultra ||
- drive->id->dma_mword ||
- drive->id->dma_1word))
- return 1;
-
- return 0;
-}
-
-/*
- * All hosts that use the 80c ribbon mus use!
+ * All hosts that use the 80c ribbon must use this!
*/
-byte eighty_ninty_three (ide_drive_t *drive)
+byte eighty_ninty_three(struct ata_device *drive)
{
return ((byte) ((drive->channel->udma_four) &&
#ifndef CONFIG_IDEDMA_IVB
@@ -324,7 +252,7 @@
*
* const char *msg == consider adding for verbose errors.
*/
-int ide_config_drive_speed (ide_drive_t *drive, byte speed)
+int ide_config_drive_speed(struct ata_device *drive, byte speed)
{
struct ata_channel *hwif = drive->channel;
int i;
@@ -407,7 +335,7 @@
} else {
outb(inb(hwif->dma_base+2) & ~(1<<(5+unit)), hwif->dma_base+2);
}
-#endif /* (CONFIG_BLK_DEV_IDEDMA) && !(CONFIG_DMA_NONPCI) */
+#endif
switch(speed) {
case XFER_UDMA_7: drive->id->dma_ultra |= 0x8080; break;
@@ -429,11 +357,7 @@
return error;
}
-EXPORT_SYMBOL(ide_auto_reduce_xfer);
EXPORT_SYMBOL(ide_fix_driveid);
EXPORT_SYMBOL(ide_driveid_update);
-EXPORT_SYMBOL(ide_ata66_check);
-EXPORT_SYMBOL(set_transfer);
EXPORT_SYMBOL(eighty_ninty_three);
EXPORT_SYMBOL(ide_config_drive_speed);
-
diff -urN linux-2.5.15/drivers/ide/ide-floppy.c linux/drivers/ide/ide-floppy.c
--- linux-2.5.15/drivers/ide/ide-floppy.c 2002-05-14 00:50:21.000000000 +0200
+++ linux/drivers/ide/ide-floppy.c 2002-05-14 03:32:32.000000000 +0200
@@ -879,7 +879,7 @@
pc = idefloppy_next_pc_storage (drive);
rq = idefloppy_next_rq_storage (drive);
idefloppy_create_request_sense_cmd (pc);
- idefloppy_queue_pc_head (drive, pc, rq);
+ idefloppy_queue_pc_head(drive, pc, rq);
}
/*
diff -urN linux-2.5.15/drivers/ide/ide-geometry.c linux/drivers/ide/ide-geometry.c
--- linux-2.5.15/drivers/ide/ide-geometry.c 2002-05-10 00:21:32.000000000 +0200
+++ linux/drivers/ide/ide-geometry.c 2002-05-14 02:45:46.000000000 +0200
@@ -1,6 +1,4 @@
/*
- * linux/drivers/ide/ide-geometry.c
- *
* Sun Feb 24 23:13:03 CET 2002: Patch by Andries Brouwer to remove the
* confused CMOS probe applied. This is solving more problems than it may
* (unexpectedly) introduce.
@@ -14,17 +12,17 @@
#if defined(CONFIG_BLK_DEV_IDE) || defined(CONFIG_BLK_DEV_IDE_MODULE)
-extern ide_drive_t * get_info_ptr(kdev_t);
+extern struct ata_device * get_info_ptr(kdev_t);
/*
* If heads is nonzero: find a translation with this many heads and S=63.
* Otherwise: find out how OnTrack Disk Manager would translate the disk.
*/
static void
-ontrack(ide_drive_t *drive, int heads, unsigned int *c, int *h, int *s)
+ontrack(struct ata_device *drive, int heads, unsigned int *c, int *h, int *s)
{
- static const byte dm_head_vals[] = {4, 8, 16, 32, 64, 128, 255, 0};
- const byte *headp = dm_head_vals;
+ static const u8 dm_head_vals[] = {4, 8, 16, 32, 64, 128, 255, 0};
+ const u8 *headp = dm_head_vals;
unsigned long total;
/*
@@ -72,13 +70,13 @@
* -1 = similar to "0", plus redirect sector 0 to sector 1.
* 2 = convert to a CHS geometry with "ptheads" heads.
*
- * Returns 0 if the translation was not possible, if the device was not
+ * Returns 0 if the translation was not possible, if the device was not
* an IDE disk drive, or if a geometry was "forced" on the commandline.
* Returns 1 if the geometry translation was successful.
*/
-int ide_xlate_1024 (kdev_t i_rdev, int xparm, int ptheads, const char *msg)
+int ide_xlate_1024(kdev_t i_rdev, int xparm, int ptheads, const char *msg)
{
- ide_drive_t *drive;
+ struct ata_device *drive;
const char *msg1 = "";
int heads = 0;
int c, h, s;
@@ -144,4 +142,4 @@
drive->bios_cyl, drive->bios_head, drive->bios_sect);
return ret;
}
-#endif /* defined(CONFIG_BLK_DEV_IDE) || defined(CONFIG_BLK_DEV_IDE_MODULE) */
+#endif
diff -urN linux-2.5.15/drivers/ide/ide-pmac.c linux/drivers/ide/ide-pmac.c
--- linux-2.5.15/drivers/ide/ide-pmac.c 2002-05-14 04:22:37.000000000 +0200
+++ linux/drivers/ide/ide-pmac.c 2002-05-14 02:23:23.000000000 +0200
@@ -55,8 +55,6 @@
#endif
#include "ata-timing.h"
-extern spinlock_t ide_lock;
-
#undef IDE_PMAC_DEBUG
#define DMA_WAIT_TIMEOUT 500
@@ -1669,12 +1667,12 @@
break;
}
- /* We resume processing on the HW group */
- spin_lock_irq(&ide_lock);
+ /* We resume processing on the lock group */
+ spin_lock_irq(drive->channel->lock);
clear_bit(IDE_BUSY, &drive->channel->active);
if (!list_empty(&drive->queue.queue_head))
do_ide_request(&drive->queue);
- spin_unlock_irq(&ide_lock);
+ spin_unlock_irq(drive->channel->lock);
}
/* Note: We support only master drives for now. This will have to be
diff -urN linux-2.5.15/drivers/ide/ide-tape.c linux/drivers/ide/ide-tape.c
--- linux-2.5.15/drivers/ide/ide-tape.c 2002-05-14 00:50:21.000000000 +0200
+++ linux/drivers/ide/ide-tape.c 2002-05-14 03:56:21.000000000 +0200
@@ -1917,9 +1917,9 @@
idetape_active_next_stage (drive);
/*
- * Insert the next request into the request queue.
+ * Insert the next request into the request queue.
*/
- (void) ide_do_drive_cmd (drive, tape->active_data_request, ide_end);
+ ide_do_drive_cmd(drive, tape->active_data_request, ide_end);
} else if (!error) {
if (!tape->onstream)
idetape_increase_max_pipeline_stages (drive);
@@ -1986,7 +1986,7 @@
ide_init_drive_cmd (rq);
rq->buffer = (char *) pc;
rq->flags = IDETAPE_PC_RQ1;
- (void) ide_do_drive_cmd (drive, rq, ide_preempt);
+ ide_do_drive_cmd(drive, rq, ide_preempt);
}
/*
@@ -3197,7 +3197,7 @@
ide_init_drive_cmd (&rq);
rq.buffer = (char *) pc;
rq.flags = IDETAPE_PC_RQ1;
- return ide_do_drive_cmd (drive, &rq, ide_wait);
+ return ide_do_drive_cmd(drive, &rq, ide_wait);
}
static void idetape_create_load_unload_cmd (ide_drive_t *drive, idetape_pc_t *pc,int cmd)
diff -urN linux-2.5.15/drivers/ide/ide-taskfile.c linux/drivers/ide/ide-taskfile.c
--- linux-2.5.15/drivers/ide/ide-taskfile.c 2002-05-14 04:22:37.000000000 +0200
+++ linux/drivers/ide/ide-taskfile.c 2002-05-14 03:56:48.000000000 +0200
@@ -19,6 +19,7 @@
#include <linux/errno.h>
#include <linux/genhd.h>
#include <linux/blkpg.h>
+#include <linux/completion.h>
#include <linux/slab.h>
#include <linux/pci.h>
#include <linux/delay.h>
@@ -813,6 +814,77 @@
}
}
+/*
+ * This function is intended to be used prior to invoking ide_do_drive_cmd().
+ */
+void ide_init_drive_cmd(struct request *rq)
+{
+ memset(rq, 0, sizeof(*rq));
+ rq->flags = REQ_DRIVE_CMD;
+}
+
+/*
+ * This function issues a special IDE device request onto the request queue.
+ *
+ * If action is ide_wait, then the rq is queued at the end of the request
+ * queue, and the function sleeps until it has been processed. This is for use
+ * when invoked from an ioctl handler.
+ *
+ * If action is ide_preempt, then the rq is queued at the head of the request
+ * queue, displacing the currently-being-processed request and this function
+ * returns immediately without waiting for the new rq to be completed. This is
+ * VERY DANGEROUS, and is intended for careful use by the ATAPI tape/cdrom
+ * driver code.
+ *
+ * If action is ide_end, then the rq is queued at the end of the request queue,
+ * and the function returns immediately without waiting for the new rq to be
+ * completed. This is again intended for careful use by the ATAPI tape/cdrom
+ * driver code.
+ */
+int ide_do_drive_cmd(struct ata_device *drive, struct request *rq, ide_action_t action)
+{
+ unsigned long flags;
+ unsigned int major = drive->channel->major;
+ request_queue_t *q = &drive->queue;
+ struct list_head *queue_head = &q->queue_head;
+ DECLARE_COMPLETION(wait);
+
+#ifdef CONFIG_BLK_DEV_PDC4030
+ if (drive->channel->chipset == ide_pdc4030 && rq->buffer != NULL)
+ return -ENOSYS; /* special drive cmds not supported */
+#endif
+ rq->errors = 0;
+ rq->rq_status = RQ_ACTIVE;
+ rq->rq_dev = mk_kdev(major,(drive->select.b.unit)<<PARTN_BITS);
+ if (action == ide_wait)
+ rq->waiting = &wait;
+
+ spin_lock_irqsave(drive->channel->lock, flags);
+
+ if (blk_queue_empty(&drive->queue) || action == ide_preempt) {
+ if (action == ide_preempt)
+ drive->rq = NULL;
+ } else {
+ if (action == ide_wait)
+ queue_head = queue_head->prev;
+ else
+ queue_head = queue_head->next;
+ }
+ q->elevator.elevator_add_req_fn(q, rq, queue_head);
+
+ do_ide_request(q);
+
+ spin_unlock_irqrestore(drive->channel->lock, flags);
+
+ if (action == ide_wait) {
+ wait_for_completion(&wait); /* wait for it to be serviced */
+ return rq->errors ? -EIO : 0; /* return -EIO if errors */
+ }
+
+ return 0;
+
+}
+
int ide_raw_taskfile(struct ata_device *drive, struct ata_taskfile *args)
{
struct request rq;
@@ -839,12 +911,59 @@
* interface.
*/
+/*
+ * Backside of HDIO_DRIVE_CMD call of SETFEATURES_XFER.
+ * 1 : Safe to update drive->id DMA registers.
+ * 0 : OOPs not allowed.
+ */
+static int set_transfer(struct ata_device *drive, struct ata_taskfile *args)
+{
+ if ((args->taskfile.command == WIN_SETFEATURES) &&
+ (args->taskfile.sector_number >= XFER_SW_DMA_0) &&
+ (args->taskfile.feature == SETFEATURES_XFER) &&
+ (drive->id->dma_ultra ||
+ drive->id->dma_mword ||
+ drive->id->dma_1word))
+ return 1;
+
+ return 0;
+}
+
+/*
+ * Verify that we are doing an approved SETFEATURES_XFER with respect
+ * to the hardware being able to support the request. Since some hardware
+ * can improperly report capabilities, we check to see if the host adapter
+ * in combination with the device (usually a disk) properly detect
+ * and acknowledge each end of the ribbon.
+ */
+static int ata66_check(struct ata_device *drive, struct ata_taskfile *args)
+{
+ if ((args->taskfile.command == WIN_SETFEATURES) &&
+ (args->taskfile.sector_number > XFER_UDMA_2) &&
+ (args->taskfile.feature == SETFEATURES_XFER)) {
+ if (!drive->channel->udma_four) {
+ printk("%s: Speed warnings UDMA 3/4/5 is not functional.\n", drive->channel->name);
+ return 1;
+ }
+#ifndef CONFIG_IDEDMA_IVB
+ if ((drive->id->hw_config & 0x6000) == 0) {
+#else
+ if (((drive->id->hw_config & 0x2000) == 0) ||
+ ((drive->id->hw_config & 0x4000) == 0)) {
+#endif
+ printk("%s: Speed warnings UDMA 3/4/5 is not functional.\n", drive->name);
+ return 1;
+ }
+ }
+ return 0;
+}
+
int ide_cmd_ioctl(struct ata_device *drive, unsigned long arg)
{
int err = 0;
u8 vals[4];
u8 *argbuf = vals;
- u8 xfer_rate = 0;
+ u8 pio = 0;
int argsize = 4;
struct ata_taskfile args;
struct request rq;
@@ -879,10 +998,11 @@
}
/* Always make sure the transfer reate has been setup.
+ * FIXME: what about setting up the drive with ->tuneproc?
*/
if (set_transfer(drive, &args)) {
- xfer_rate = vals[1];
- if (ide_ata66_check(drive, &args))
+ pio = vals[1];
+ if (ata66_check(drive, &args))
goto abort;
}
@@ -891,10 +1011,11 @@
rq.buffer = argbuf;
err = ide_do_drive_cmd(drive, &rq, ide_wait);
- if (!err && xfer_rate) {
+ if (!err && pio) {
/* active-retuning-calls future */
- if ((drive->channel->speedproc) != NULL)
- drive->channel->speedproc(drive, xfer_rate);
+ /* FIXME: what about the setup for the drive?! */
+ if (drive->channel->speedproc)
+ drive->channel->speedproc(drive, pio);
ide_driveid_update(drive);
}
@@ -916,6 +1037,8 @@
EXPORT_SYMBOL(ata_taskfile);
EXPORT_SYMBOL(recal_intr);
EXPORT_SYMBOL(task_no_data_intr);
+EXPORT_SYMBOL(ide_init_drive_cmd);
+EXPORT_SYMBOL(ide_do_drive_cmd);
EXPORT_SYMBOL(ide_raw_taskfile);
EXPORT_SYMBOL(ide_cmd_type_parser);
EXPORT_SYMBOL(ide_cmd_ioctl);
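Editorial sketch, not part of the patch: the comment on ide_do_drive_cmd() in the ide-taskfile.c hunk above documents three queueing actions. The toy model below only illustrates that documented placement (head for ide_preempt, tail for ide_wait and ide_end, with only ide_wait sleeping). It is not the kernel's list handling; the names toy_queue, toy_queue_cmd and the ACT_* values are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mirrors the three ide_action_t values that remain after this patch. */
enum action { ACT_WAIT, ACT_PREEMPT, ACT_END };

#define QUEUE_CAP 8

/* Hypothetical stand-in for the request queue: ids[0] is the head. */
struct toy_queue {
	int ids[QUEUE_CAP];
	size_t len;
};

/*
 * Insert a request id where the comment says it should go:
 * at the head for ACT_PREEMPT, at the tail for ACT_WAIT and ACT_END.
 * Returns 1 if the caller would then sleep until the request completes
 * (ACT_WAIT only), 0 if it would return immediately.
 */
static int toy_queue_cmd(struct toy_queue *q, int id, enum action act)
{
	size_t pos = (act == ACT_PREEMPT) ? 0 : q->len;

	assert(q->len < QUEUE_CAP);
	/* Shift everything at or after pos one slot towards the tail. */
	memmove(&q->ids[pos + 1], &q->ids[pos],
		(q->len - pos) * sizeof(q->ids[0]));
	q->ids[pos] = id;
	q->len++;

	return act == ACT_WAIT;
}
```

Queueing request 1 with ACT_END, request 2 with ACT_WAIT and request 3 with ACT_PREEMPT leaves the toy queue ordered 3, 1, 2, and only the ACT_WAIT call reports that the caller would block.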
diff -urN linux-2.5.15/drivers/ide/Makefile linux/drivers/ide/Makefile
--- linux-2.5.15/drivers/ide/Makefile 2002-05-10 00:21:51.000000000 +0200
+++ linux/drivers/ide/Makefile 2002-05-14 04:11:02.000000000 +0200
@@ -54,7 +54,6 @@
ide-obj-$(CONFIG_BLK_DEV_SVWKS) += serverworks.o
ide-obj-$(CONFIG_BLK_DEV_PDC202XX) += pdc202xx.o
ide-obj-$(CONFIG_BLK_DEV_PDC4030) += pdc4030.o
-ide-obj-$(CONFIG_BLK_DEV_PDC_ADMA) += pdcadma.o
ide-obj-$(CONFIG_BLK_DEV_PIIX) += piix.o
ide-obj-$(CONFIG_BLK_DEV_QD65XX) += qd65xx.o
ide-obj-$(CONFIG_BLK_DEV_IDE_RAPIDE) += rapide.o
@@ -72,6 +71,6 @@
obj-$(CONFIG_BLK_DEV_ATARAID_PDC) += pdcraid.o
obj-$(CONFIG_BLK_DEV_ATARAID_HPT) += hptraid.o
-ide-mod-objs := ide-taskfile.o ide.o ide-probe.o ide-geometry.o ide-features.o ata-timing.o $(ide-obj-y)
+ide-mod-objs := ide-taskfile.o ide.o ide-probe.o ide-geometry.o ide-features.o ata-timing.o $(ide-obj-y)
include $(TOPDIR)/Rules.make
diff -urN linux-2.5.15/drivers/ide/pcihost.h linux/drivers/ide/pcihost.h
--- linux-2.5.15/drivers/ide/pcihost.h 2002-05-10 00:21:38.000000000 +0200
+++ linux/drivers/ide/pcihost.h 2002-05-14 04:10:46.000000000 +0200
@@ -72,9 +72,6 @@
#ifdef CONFIG_BLK_DEV_AMD74XX
extern int init_amd74xx(void);
#endif
-#ifdef CONFIG_BLK_DEV_PDC_ADMA
-extern int init_pdcadma(void);
-#endif
#ifdef CONFIG_BLK_DEV_SVWKS
extern int init_svwks(void);
#endif
diff -urN linux-2.5.15/drivers/ide/pdc4030.c linux/drivers/ide/pdc4030.c
--- linux-2.5.15/drivers/ide/pdc4030.c 2002-05-14 04:22:37.000000000 +0200
+++ linux/drivers/ide/pdc4030.c 2002-05-14 03:11:56.000000000 +0200
@@ -316,8 +316,7 @@
memcpy(hwif2->io_ports, hwif->hw.io_ports, sizeof(hwif2->io_ports));
hwif2->irq = hwif->irq;
hwif2->hw.irq = hwif->hw.irq = hwif->irq;
- hwif->io_32bit = 3;
- hwif2->io_32bit = 3;
+ hwif->io_32bit = hwif2->io_32bit = 1;
for (i=0; i<2 ; i++) {
if (!ident.current_tm[i].cyl)
hwif->drives[i].noprobe = 1;
diff -urN linux-2.5.15/drivers/ide/pdcadma.c linux/drivers/ide/pdcadma.c
--- linux-2.5.15/drivers/ide/pdcadma.c 2002-05-10 00:22:52.000000000 +0200
+++ linux/drivers/ide/pdcadma.c 1970-01-01 01:00:00.000000000 +0100
@@ -1,128 +0,0 @@
-/*
- * linux/drivers/ide/pdcadma.c Version 0.01 June 21, 2001
- *
- * Copyright (C) 1999-2000 Andre Hedrick <andre@linux-ide.org>
- * May be copied or modified under the terms of the GNU General Public License
- *
- */
-
-#include <linux/config.h>
-#include <linux/types.h>
-#include <linux/kernel.h>
-#include <linux/delay.h>
-#include <linux/timer.h>
-#include <linux/mm.h>
-#include <linux/ioport.h>
-#include <linux/blkdev.h>
-#include <linux/hdreg.h>
-
-#include <linux/interrupt.h>
-#include <linux/init.h>
-#include <linux/pci.h>
-#include <linux/ide.h>
-
-#include <asm/io.h>
-#include <asm/irq.h>
-
-#include "ata-timing.h"
-#include "pcihost.h"
-
-#undef DISPLAY_PDCADMA_TIMINGS
-
-#if defined(DISPLAY_PDCADMA_TIMINGS) && defined(CONFIG_PROC_FS)
-#include <linux/stat.h>
-#include <linux/proc_fs.h>
-
-static int pdcadma_get_info(char *, char **, off_t, int);
-extern int (*pdcadma_display_info)(char *, char **, off_t, int); /* ide-proc.c */
-static struct pci_dev *bmide_dev;
-
-static int pdcadma_get_info (char *buffer, char **addr, off_t offset, int count)
-{
- char *p = buffer;
- u32 bibma = pci_resource_start(bmide_dev, 4);
-
- p += sprintf(p, "\n PDC ADMA %04X Chipset.\n", bmide_dev->device);
- p += sprintf(p, "UDMA\n");
- p += sprintf(p, "PIO\n");
-
- return p-buffer; /* => must be less than 4k! */
-}
-#endif
-
-byte pdcadma_proc = 0;
-
-extern char *ide_xfer_verbose (byte xfer_rate);
-
-#ifdef CONFIG_BLK_DEV_IDEDMA
-
-/*
- * This initiates/aborts (U)DMA read/write operations on a drive.
- */
-static int pdcadma_dmaproc(struct ata_device *drive)
-{
- udma_enable(drive, 0, 0);
-
- return 0;
-}
-#endif
-
-static unsigned int __init pci_init_pdcadma(struct pci_dev *dev)
-{
-#if defined(DISPLAY_PDCADMA_TIMINGS) && defined(CONFIG_PROC_FS)
- if (!pdcadma_proc) {
- pdcadma_proc = 1;
- bmide_dev = dev;
- pdcadma_display_info = pdcadma_get_info;
- }
-#endif
- return 0;
-}
-
-static unsigned int __init ata66_pdcadma(struct ata_channel *channel)
-{
- return 1;
-}
-
-static void __init ide_init_pdcadma(struct ata_channel *hwif)
-{
- hwif->autodma = 0;
- hwif->dma_base = 0;
-
-// hwif->tuneproc = &pdcadma_tune_drive;
-// hwif->speedproc = &pdcadma_tune_chipset;
-
-// if (hwif->dma_base) {
-// hwif->XXX_dmaproc = &pdcadma_dmaproc;
-// hwif->autodma = 1;
-// }
-}
-
-static void __init ide_dmacapable_pdcadma(struct ata_channel *hwif, unsigned long dmabase)
-{
-// ide_setup_dma(hwif, dmabase, 8);
-}
-
-
-/* module data table */
-static struct ata_pci_device chipset __initdata = {
- PCI_VENDOR_ID_PDC, PCI_DEVICE_ID_PDC_1841,
- pci_init_pdcadma,
- ata66_pdcadma,
- ide_init_pdcadma,
- ide_dmacapable_pdcadma,
- {
- {0x00,0x00,0x00},
- {0x00,0x00,0x00}
- },
- OFF_BOARD,
- 0,
- ATA_F_NODMA
-};
-
-int __init init_pdcadma(void)
-{
- ata_register_chipset(&chipset);
-
- return 0;
-}
diff -urN linux-2.5.15/drivers/ide/tcq.c linux/drivers/ide/tcq.c
--- linux-2.5.15/drivers/ide/tcq.c 2002-05-14 04:22:37.000000000 +0200
+++ linux/drivers/ide/tcq.c 2002-05-14 02:22:11.000000000 +0200
@@ -90,7 +90,7 @@
printk(KERN_INFO "ATA: %s: invalidating pending queue (%d)\n", drive->name, ata_pending_commands(drive));
- spin_lock_irqsave(&ide_lock, flags);
+ spin_lock_irqsave(ch->lock, flags);
del_timer(&ch->timer);
@@ -144,7 +144,7 @@
* start doing stuff again
*/
q->request_fn(q);
- spin_unlock_irqrestore(&ide_lock, flags);
+ spin_unlock_irqrestore(ch->lock, flags);
printk(KERN_DEBUG "ATA: tcq_invalidate_queue: done\n");
}
@@ -156,14 +156,14 @@
printk(KERN_ERR "ATA: %s: timeout waiting for interrupt...\n", __FUNCTION__);
- spin_lock_irqsave(&ide_lock, flags);
+ spin_lock_irqsave(ch->lock, flags);
if (test_and_set_bit(IDE_BUSY, &ch->active))
printk(KERN_ERR "ATA: %s: IRQ handler not busy\n", __FUNCTION__);
if (!ch->handler)
printk(KERN_ERR "ATA: %s: missing ISR!\n", __FUNCTION__);
- spin_unlock_irqrestore(&ide_lock, flags);
+ spin_unlock_irqrestore(ch->lock, flags);
/*
* if pending commands, try service before giving up
@@ -181,7 +181,7 @@
struct ata_channel *ch = drive->channel;
unsigned long flags;
- spin_lock_irqsave(&ide_lock, flags);
+ spin_lock_irqsave(ch->lock, flags);
/*
* always just bump the timer for now, the timeout handling will
@@ -196,7 +196,7 @@
mod_timer(&ch->timer, jiffies + 5 * HZ);
ch->handler = handler;
- spin_unlock_irqrestore(&ide_lock, flags);
+ spin_unlock_irqrestore(ch->lock, flags);
}
/*
diff -urN linux-2.5.15/drivers/scsi/ide-scsi.c linux/drivers/scsi/ide-scsi.c
--- linux-2.5.15/drivers/scsi/ide-scsi.c 2002-05-14 00:50:21.000000000 +0200
+++ linux/drivers/scsi/ide-scsi.c 2002-05-14 03:54:24.000000000 +0200
@@ -792,18 +792,6 @@
*/
int idescsi_device_reset (Scsi_Cmnd *cmd)
{
-#if 0
- ide_drive_t *drive = idescsi_drives[cmd->target];
- struct request req;
-
- ide_init_drive_cmd(&req);
- req.flags = REQ_SPECIAL;
-
- /* FIX ME, the next executable line causes on oops in lk 2.5.10-dj1
- * [code copied from ide-cd's ide_cdrom_reset(), does it work?]
- */
- ide_do_drive_cmd(drive, &req, ide_wait);
-#endif
return SUCCESS;
}
diff -urN linux-2.5.15/include/linux/ide.h linux/include/linux/ide.h
--- linux-2.5.15/include/linux/ide.h 2002-05-14 04:22:38.000000000 +0200
+++ linux/include/linux/ide.h 2002-05-14 03:38:06.000000000 +0200
@@ -449,19 +449,20 @@
spinlock_t *lock;
ide_startstop_t (*handler)(struct ata_device *, struct request *); /* irq handler, if active */
- struct timer_list timer; /* failsafe timer */
+ struct timer_list timer; /* failsafe timer */
int (*expiry)(struct ata_device *, struct request *); /* irq handler, if active */
- unsigned long poll_timeout; /* timeout value during polled operations */
- struct ata_device *drive; /* last serviced drive */
+ unsigned long poll_timeout; /* timeout value during polled operations */
+ struct ata_device *drive; /* last serviced drive */
+
unsigned long active; /* active processing request */
- ide_ioreg_t io_ports[IDE_NR_PORTS]; /* task file registers */
- hw_regs_t hw; /* Hardware info */
+ ide_ioreg_t io_ports[IDE_NR_PORTS]; /* task file registers */
+ hw_regs_t hw; /* hardware info */
#ifdef CONFIG_PCI
- struct pci_dev *pci_dev; /* for pci chipsets */
+ struct pci_dev *pci_dev; /* for pci chipsets */
#endif
struct ata_device drives[MAX_DRIVES]; /* drive info */
- struct gendisk *gd; /* gendisk structure */
+ struct gendisk *gd; /* gendisk structure */
/*
* Routines to tune PIO and DMA mode for drives.
@@ -469,7 +470,11 @@
* A value of 255 indicates that the function should choose the optimal
* mode itself.
*/
+
+ /* setup disk on a channel for a particular transfer mode */
void (*tuneproc) (struct ata_device *, byte pio);
+
+ /* setup the chipset timing for a particular transfer mode */
int (*speedproc) (struct ata_device *, byte pio);
/* tweaks hardware to select drive */
@@ -487,6 +492,9 @@
/* check host's drive quirk list */
int (*quirkproc) (struct ata_device *);
+ /* driver soft-power interface */
+ int (*busproc)(struct ata_device *, int);
+
/* CPU-polled transfer routines */
void (*ata_read)(struct ata_device *, void *, unsigned int);
void (*ata_write)(struct ata_device *, void *, unsigned int);
@@ -535,17 +543,14 @@
unsigned no_io_32bit : 1; /* disallow enabling 32bit I/O */
unsigned no_unmask : 1; /* disallow setting unmask bit */
unsigned auto_poll : 1; /* supports nop auto-poll */
- byte io_32bit; /* 0=16-bit, 1=32-bit, 2/3=32bit+sync */
- byte unmask; /* flag: okay to unmask other irqs */
- byte slow; /* flag: slow data port */
+ unsigned unmask : 1; /* flag: okay to unmask other irqs */
+ unsigned slow : 1; /* flag: slow data port */
+ unsigned io_32bit : 1; /* 0=16-bit, 1=32-bit */
+ unsigned char bus_state; /* power state of the IDE bus */
#if (DISK_RECOVERY_TIME > 0)
- unsigned long last_time; /* time when previous rq was done */
+ unsigned long last_time; /* time when previous rq was done */
#endif
- /* driver soft-power interface */
- int (*busproc)(struct ata_device *, int);
-
- byte bus_state; /* power state of the IDE bus */
};
/*
@@ -656,12 +661,6 @@
const char *, byte);
/*
- * Issue a simple drive command
- * The drive must be selected beforehand.
- */
-void ide_cmd(struct ata_device *, byte, byte, ata_handler_t);
-
-/*
* ide_fixstring() cleans up and (optionally) byte-swaps a text string,
* removing leading/trailing blanks and compressing internal blanks.
* It is primarily used to tidy up the model name/number fields as
@@ -710,7 +709,6 @@
*/
typedef enum {
ide_wait, /* insert rq at end of list, and wait for it */
- ide_next, /* insert rq immediately after current request */
ide_preempt, /* insert rq in front of current request */
ide_end /* insert rq at end of list, but don't wait for it */
} ide_action_t;
@@ -760,13 +758,10 @@
void ide_delay_50ms(void);
-extern byte ide_auto_reduce_xfer(struct ata_device *);
extern void ide_fix_driveid(struct hd_driveid *id);
extern int ide_driveid_update(struct ata_device *);
-extern int ide_ata66_check(struct ata_device *, struct ata_taskfile *);
extern int ide_config_drive_speed(struct ata_device *, byte);
extern byte eighty_ninty_three(struct ata_device *);
-extern int set_transfer(struct ata_device *, struct ata_taskfile *);
extern int system_bus_speed;
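Editorial sketch, not part of the patch: the ide.h hunk above narrows io_32bit from a byte (documented values 0 to 3) to a one-bit field, and the pdc4030.c hunk changes the stored value from 3 to 1. The snippet below shows, under the assumption that nothing else reads the old "2/3 = 32bit+sync" values, what a one-bit unsigned field does to them. The struct and function names are hypothetical reductions for illustration only.

```c
/* Hypothetical reduced stand-ins for the old and new field layouts. */
struct old_flags {
	unsigned char io_32bit;		/* 0=16-bit, 1=32-bit, 2/3=32bit+sync */
};

struct new_flags {
	unsigned io_32bit : 1;		/* 0=16-bit, 1=32-bit */
};

/* Store v in the old byte-wide field and read it back unchanged. */
static unsigned store_old(unsigned v)
{
	struct old_flags f;

	f.io_32bit = (unsigned char)v;
	return f.io_32bit;
}

/*
 * Store v in the new one-bit field and read it back. For an unsigned
 * bit-field the value is reduced modulo 2, so only the lowest bit survives.
 */
static unsigned store_new(unsigned v)
{
	struct new_flags f;

	f.io_32bit = v;
	return f.io_32bit;
}
```

In this model an old value of 3 reads back as 1, but an old value of 2 would read back as 0, that is, as plain 16-bit I/O. Any remaining writer of 2 would therefore need the same kind of adjustment the pdc4030.c hunk makes.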
* [PATCH] 2.5.15 IDE 64
2002-05-06 3:53 Linux-2.5.14 Linus Torvalds
` (11 preceding siblings ...)
2002-05-14 10:28 ` [PATCH] 2.5.15 IDE 63 Martin Dalecki
@ 2002-05-15 12:04 ` Martin Dalecki
2002-05-15 13:12 ` Russell King
12 siblings, 1 reply; 265+ messages in thread
From: Martin Dalecki @ 2002-05-15 12:04 UTC (permalink / raw)
To: Linus Torvalds; +Cc: Kernel Mailing List
[-- Attachment #1: Type: text/plain, Size: 316 bytes --]
Tue May 14 13:35:04 CEST 2002 ide-clean-64:
Let's just get this over with before queue handling is targeted again...
- Implement suggestions by Russell King for improved portability and separation
between PCI and non-PCI host code.
- pdc202xx updates from Thierry Vignaud.
- Tiny PIO fix from Tomita.
[-- Attachment #2: ide-clean-64.diff --]
[-- Type: text/plain, Size: 73906 bytes --]
diff -urN linux-2.5.15/drivers/ide/ide.c linux/drivers/ide/ide.c
--- linux-2.5.15/drivers/ide/ide.c 2002-05-15 14:55:12.000000000 +0200
+++ linux/drivers/ide/ide.c 2002-05-15 13:34:25.000000000 +0200
@@ -1145,7 +1145,7 @@
}
/*
- * Select the next device which will be serviced. This selects onlt between
+ * Select the next device which will be serviced. This selects only between
* devices on the same channel, since everything else will be scheduled on the
* queue level.
*/
diff -urN linux-2.5.15/drivers/ide/ide-dma.c linux/drivers/ide/ide-dma.c
--- linux-2.5.15/drivers/ide/ide-dma.c 2002-05-15 14:55:05.000000000 +0200
+++ linux/drivers/ide/ide-dma.c 1970-01-01 01:00:00.000000000 +0100
@@ -1,866 +0,0 @@
-/**** vi:set ts=8 sts=8 sw=8:************************************************
- *
- * Copyright (c) 1999-2000 Andre Hedrick <andre@linux-ide.org>
- * Copyright (c) 1995-1998 Mark Lord
- *
- * May be copied or modified under the terms of the GNU General Public License
- *
- * Special Thanks to Mark for his Six years of work.
- *
- * This module provides support for the bus-master IDE DMA functions
- * of various PCI chipsets, including the Intel PIIX (i82371FB for
- * the 430 FX chipset), the PIIX3 (i82371SB for the 430 HX/VX and
- * 440 chipsets), and the PIIX4 (i82371AB for the 430 TX chipset)
- * ("PIIX" stands for "PCI ISA IDE Xcellerator").
- *
- * Pretty much the same code works for other IDE PCI bus-mastering chipsets.
- *
- * DMA is supported for all IDE devices (disk drives, cdroms, tapes, floppies).
- *
- * By default, DMA support is prepared for use, but is currently enabled only
- * for drives which already have DMA enabled (UltraDMA or mode 2 multi/single),
- * or which are recognized as "good" (see table below). Drives with only mode0
- * or mode1 (multi/single) DMA should also work with this chipset/driver
- * (eg. MC2112A) but are not enabled by default.
- *
- * Use "hdparm -i" to view modes supported by a given drive.
- *
- * The hdparm-3.5 (or later) utility can be used for manually enabling/disabling
- * DMA support, but must be (re-)compiled against this kernel version or later.
- *
- * To enable DMA, use "hdparm -d1 /dev/hd?" on a per-drive basis after booting.
- * If problems arise, ide.c will disable DMA operation after a few retries.
- * This error recovery mechanism works and has been extremely well exercised.
- *
- * IDE drives, depending on their vintage, may support several different modes
- * of DMA operation. The boot-time modes are indicated with a "*" in
- * the "hdparm -i" listing, and can be changed with *knowledgeable* use of
- * the "hdparm -X" feature. There is seldom a need to do this, as drives
- * normally power-up with their "best" PIO/DMA modes enabled.
- *
- * Testing has been done with a rather extensive number of drives,
- * with Quantum & Western Digital models generally outperforming the pack,
- * and Fujitsu & Conner (and some Seagate which are really Conner) drives
- * showing more lackluster throughput.
- *
- * Keep an eye on /var/adm/messages for "DMA disabled" messages.
- *
- * Some people have reported trouble with Intel Zappa motherboards.
- * This can be fixed by upgrading the AMI BIOS to version 1.00.04.BS0,
- * available from ftp://ftp.intel.com/pub/bios/10004bs0.exe
- * (thanks to Glen Morrell <glen@spin.Stanford.edu> for researching this).
- *
- * Thanks to "Christopher J. Reimer" <reimer@doe.carleton.ca> for
- * fixing the problem with the BIOS on some Acer motherboards.
- *
- * Thanks to "Benoit Poulot-Cazajous" <poulot@chorus.fr> for testing
- * "TX" chipset compatibility and for providing patches for the "TX" chipset.
- *
- * Thanks to Christian Brunner <chb@muc.de> for taking a good first crack
- * at generic DMA -- his patches were referred to when preparing this code.
- *
- * Most importantly, thanks to Robert Bringman <rob@mars.trion.com>
- * for supplying a Promise UDMA board & WD UDMA drive for this work!
- *
- * And, yes, Intel Zappa boards really *do* use both PIIX IDE ports.
- *
- * ATA-66/100 and recovery functions, I forgot the rest......
- */
-
-#include <linux/config.h>
-#define __NO_VERSION__
-#include <linux/module.h>
-#include <linux/types.h>
-#include <linux/kernel.h>
-#include <linux/timer.h>
-#include <linux/mm.h>
-#include <linux/interrupt.h>
-#include <linux/pci.h>
-#include <linux/init.h>
-#include <linux/ide.h>
-#include <linux/delay.h>
-
-#include <asm/io.h>
-#include <asm/irq.h>
-
-/*
- * Long lost data from 2.0.34 that is now in 2.0.39
- *
- * This was used in ./drivers/block/triton.c to do DMA Base address setup
- * when PnP failed. Oh the things we forget. I believe this was part
- * of SFF-8038i that has been withdrawn from public access... :-((
- */
-#define DEFAULT_BMIBA 0xe800 /* in case BIOS did not init it */
-#define DEFAULT_BMCRBA 0xcc00 /* VIA's default value */
-#define DEFAULT_BMALIBA 0xd400 /* ALI's default value */
-
-#ifdef CONFIG_IDEDMA_NEW_DRIVE_LISTINGS
-
-struct drive_list_entry {
- char * id_model;
- char * id_firmware;
-};
-
-struct drive_list_entry drive_whitelist[] = {
- { "Micropolis 2112A", NULL },
- { "CONNER CTMA 4000", NULL },
- { "CONNER CTT8000-A", NULL },
- { "ST34342A", NULL },
- { NULL, NULL }
-};
-
-struct drive_list_entry drive_blacklist[] = {
-
- { "WDC AC11000H", NULL },
- { "WDC AC22100H", NULL },
- { "WDC AC32500H", NULL },
- { "WDC AC33100H", NULL },
- { "WDC AC31600H", NULL },
- { "WDC AC32100H", "24.09P07" },
- { "WDC AC23200L", "21.10N21" },
- { "Compaq CRD-8241B", NULL },
- { "CRD-8400B", NULL },
- { "CRD-8480B", NULL },
- { "CRD-8480C", NULL },
- { "CRD-8482B", NULL },
- { "CRD-84", NULL },
- { "SanDisk SDP3B", NULL },
- { "SanDisk SDP3B-64", NULL },
- { "SANYO CD-ROM CRD", NULL },
- { "HITACHI CDR-8", NULL },
- { "HITACHI CDR-8335", NULL },
- { "HITACHI CDR-8435", NULL },
- { "Toshiba CD-ROM XM-6202B", NULL },
- { "CD-532E-A", NULL },
- { "E-IDE CD-ROM CR-840", NULL },
- { "CD-ROM Drive/F5A", NULL },
- { "RICOH CD-R/RW MP7083A", NULL },
- { "WPI CDD-820", NULL },
- { "SAMSUNG CD-ROM SC-148C", NULL },
- { "SAMSUNG CD-ROM SC-148F", NULL },
- { "SAMSUNG CD-ROM SC", NULL },
- { "SanDisk SDP3B-64", NULL },
- { "SAMSUNG CD-ROM SN-124", NULL },
- { "PLEXTOR CD-R PX-W8432T", NULL },
- { "ATAPI CD-ROM DRIVE 40X MAXIMUM", NULL },
- { "_NEC DV5800A", NULL },
- { NULL, NULL }
-
-};
-
-static int in_drive_list(struct hd_driveid *id, struct drive_list_entry * drive_table)
-{
- for ( ; drive_table->id_model ; drive_table++)
- if ((!strcmp(drive_table->id_model, id->model)) &&
- ((drive_table->id_firmware && !strstr(drive_table->id_firmware, id->fw_rev)) ||
- (!drive_table->id_firmware)))
- return 1;
- return 0;
-}
-
-#else
-
-/*
- * good_dma_drives() lists the model names (from "hdparm -i")
- * of drives which do not support mode2 DMA but which are
- * known to work fine with this interface under Linux.
- */
-const char *good_dma_drives[] = {"Micropolis 2112A",
- "CONNER CTMA 4000",
- "CONNER CTT8000-A",
- "ST34342A", /* for Sun Ultra */
- NULL};
-
-/*
- * bad_dma_drives() lists the model names (from "hdparm -i")
- * of drives which supposedly support (U)DMA but which are
- * known to corrupt data with this interface under Linux.
- *
- * This is an empirical list. Its generated from bug reports. That means
- * while it reflects actual problem distributions it doesn't answer whether
- * the drive or the controller, or cabling, or software, or some combination
- * thereof is the fault. If you don't happen to agree with the kernel's
- * opinion of your drive - use hdparm to turn DMA on.
- */
-const char *bad_dma_drives[] = {"WDC AC11000H",
- "WDC AC22100H",
- "WDC AC32100H",
- "WDC AC32500H",
- "WDC AC33100H",
- "WDC AC31600H",
- NULL};
-
-#endif
-
-/*
- * This is the handler for disk read/write DMA interrupts.
- */
-ide_startstop_t ide_dma_intr(struct ata_device *drive, struct request *rq)
-{
- u8 stat, dma_stat;
-
- dma_stat = udma_stop(drive);
- if (OK_STAT(stat = GET_STAT(),DRIVE_READY,drive->bad_wstat|DRQ_STAT)) {
- if (!dma_stat) {
- __ide_end_request(drive, rq, 1, rq->nr_sectors);
- return ide_stopped;
- }
- printk(KERN_ERR "%s: dma_intr: bad DMA status (dma_stat=%x)\n",
- drive->name, dma_stat);
- }
- return ide_error(drive, rq, "dma_intr", stat);
-}
-
-/*
- * FIXME: taskfiles should be a map of pages, not a long virt address... /jens
- * FIXME: I agree with Jens --mdcki!
- */
-static int build_sglist(struct ata_channel *ch, struct request *rq)
-{
- struct scatterlist *sg = ch->sg_table;
- int nents = 0;
-
- if (rq->flags & REQ_DRIVE_ACB) {
- struct ata_taskfile *args = rq->special;
-#if 1
- unsigned char *virt_addr = rq->buffer;
- int sector_count = rq->nr_sectors;
-#else
- nents = blk_rq_map_sg(rq->q, rq, ch->sg_table);
-
- if (nents > rq->nr_segments)
- printk("ide-dma: received %d segments, build %d\n", rq->nr_segments, nents);
-#endif
-
- if (args->command_type == IDE_DRIVE_TASK_RAW_WRITE)
- ch->sg_dma_direction = PCI_DMA_TODEVICE;
- else
- ch->sg_dma_direction = PCI_DMA_FROMDEVICE;
-
- /*
- * FIXME: This depends upon a hard coded page size!
- */
- if (sector_count > 128) {
- memset(&sg[nents], 0, sizeof(*sg));
-
- sg[nents].page = virt_to_page(virt_addr);
- sg[nents].offset = (unsigned long) virt_addr & ~PAGE_MASK;
- sg[nents].length = 128 * SECTOR_SIZE;
- ++nents;
- virt_addr = virt_addr + (128 * SECTOR_SIZE);
- sector_count -= 128;
- }
- memset(&sg[nents], 0, sizeof(*sg));
- sg[nents].page = virt_to_page(virt_addr);
- sg[nents].offset = (unsigned long) virt_addr & ~PAGE_MASK;
- sg[nents].length = sector_count * SECTOR_SIZE;
- ++nents;
- } else {
- nents = blk_rq_map_sg(rq->q, rq, ch->sg_table);
-
- if (rq->q && nents > rq->nr_phys_segments)
- printk("ide-dma: received %d phys segments, build %d\n", rq->nr_phys_segments, nents);
-
- if (rq_data_dir(rq) == READ)
- ch->sg_dma_direction = PCI_DMA_FROMDEVICE;
- else
- ch->sg_dma_direction = PCI_DMA_TODEVICE;
-
- }
- return pci_map_sg(ch->pci_dev, sg, nents, ch->sg_dma_direction);
-}
-
-/*
- * For both Blacklisted and Whitelisted drives.
- * This is setup to be called as an extern for future support
- * to other special driver code.
- */
-int check_drive_lists(struct ata_device *drive, int good_bad)
-{
- struct hd_driveid *id = drive->id;
-
-#ifdef CONFIG_IDEDMA_NEW_DRIVE_LISTINGS
- if (good_bad) {
- return in_drive_list(id, drive_whitelist);
- } else {
- int blacklist = in_drive_list(id, drive_blacklist);
- if (blacklist)
- printk("%s: Disabling (U)DMA for %s\n", drive->name, id->model);
- return(blacklist);
- }
-#else
- const char **list;
-
- if (good_bad) {
- /* Consult the list of known "good" drives */
- list = good_dma_drives;
- while (*list) {
- if (!strcmp(*list++,id->model))
- return 1;
- }
- } else {
- /* Consult the list of known "bad" drives */
- list = bad_dma_drives;
- while (*list) {
- if (!strcmp(*list++,id->model)) {
- printk("%s: Disabling (U)DMA for %s\n",
- drive->name, id->model);
- return 1;
- }
- }
- }
-#endif
- return 0;
-}
-
-static int config_drive_for_dma(struct ata_device *drive)
-{
- int config_allows_dma = 1;
- struct hd_driveid *id = drive->id;
- struct ata_channel *ch = drive->channel;
-
-#ifdef CONFIG_IDEDMA_ONLYDISK
- if (drive->type != ATA_DISK)
- config_allows_dma = 0;
-#endif
-
- if (id && (id->capability & 1) && ch->autodma && config_allows_dma) {
- /* Consult the list of known "bad" drives */
- if (udma_black_list(drive)) {
- udma_enable(drive, 0, 1);
-
- return 0;
- }
-
- /* Enable DMA on any drive that has UltraDMA (mode 6/7/?) enabled */
- if ((id->field_valid & 4) && (eighty_ninty_three(drive)))
- if ((id->dma_ultra & (id->dma_ultra >> 14) & 2)) {
- udma_enable(drive, 1, 1);
-
- return 0;
- }
- /* Enable DMA on any drive that has UltraDMA (mode 3/4/5) enabled */
- if ((id->field_valid & 4) && (eighty_ninty_three(drive)))
- if ((id->dma_ultra & (id->dma_ultra >> 11) & 7)) {
- udma_enable(drive, 1, 1);
-
- return 0;
- }
- /* Enable DMA on any drive that has UltraDMA (mode 0/1/2) enabled */
- if (id->field_valid & 4) /* UltraDMA */
- if ((id->dma_ultra & (id->dma_ultra >> 8) & 7)) {
- udma_enable(drive, 1, 1);
-
- return 0;
- }
- /* Enable DMA on any drive that has mode2 DMA (multi or single) enabled */
- if (id->field_valid & 2) /* regular DMA */
- if ((id->dma_mword & 0x404) == 0x404 || (id->dma_1word & 0x404) == 0x404) {
- udma_enable(drive, 1, 1);
-
- return 0;
- }
- /* Consult the list of known "good" drives */
- if (udma_white_list(drive)) {
- udma_enable(drive, 1, 1);
-
- return 0;
- }
- }
- udma_enable(drive, 0, 0);
-
- return 0;
-}
-
-/*
- * 1 dma-ing, 2 error, 4 intr
- */
-static int dma_timer_expiry(struct ata_device *drive, struct request *rq)
-{
- /* FIXME: What's that? */
- u8 dma_stat = inb(drive->channel->dma_base+2);
-
-#ifdef DEBUG
- printk("%s: dma_timer_expiry: dma status == 0x%02x\n", drive->name, dma_stat);
-#endif
-
-#if 0
- drive->expiry = NULL; /* one free ride for now */
-#endif
-
- if (dma_stat & 2) { /* ERROR */
- u8 stat = GET_STAT();
- return ide_error(drive, rq, "dma_timer_expiry", stat);
- }
- if (dma_stat & 1) /* DMAing */
- return WAIT_CMD;
- return 0;
-}
-
-int ata_start_dma(struct ata_device *drive, struct request *rq)
-{
- struct ata_channel *ch = drive->channel;
- unsigned long dma_base = ch->dma_base;
- unsigned int reading = 0;
-
- if (rq_data_dir(rq) == READ)
- reading = 1 << 3;
-
- /* try PIO instead of DMA */
- if (!udma_new_table(ch, rq))
- return 1;
-
- outl(ch->dmatable_dma, dma_base + 4); /* PRD table */
- outb(reading, dma_base); /* specify r/w */
- outb(inb(dma_base+2)|6, dma_base+2); /* clear INTR & ERROR flags */
- drive->waiting_for_dma = 1;
- return 0;
-}
-
-/*
- * This initiates/aborts DMA read/write operations on a drive.
- *
- * The caller is assumed to have selected the drive and programmed the drive's
- * sector address using CHS or LBA. All that remains is to prepare for DMA
- * and then issue the actual read/write DMA/PIO command to the drive.
- *
- * For ATAPI devices, we just prepare for DMA and return. The caller should
- * then issue the packet command to the drive and call us again with
- * udma_start afterwards.
- *
- * Returns 0 if all went well.
- * Returns 1 if DMA read/write could not be started, in which case
- * the caller should revert to PIO for the current request.
- * May also be invoked from trm290.c
- */
-int XXX_ide_dmaproc(struct ata_device *drive)
-{
- return config_drive_for_dma(drive);
-}
-
-/*
- * Needed for allowing full modular support of ide-driver
- */
-void ide_release_dma(struct ata_channel *ch)
-{
- if (!ch->dma_base)
- return;
-
- if (ch->dmatable_cpu) {
- pci_free_consistent(ch->pci_dev,
- PRD_ENTRIES * PRD_BYTES,
- ch->dmatable_cpu,
- ch->dmatable_dma);
- ch->dmatable_cpu = NULL;
- }
- if (ch->sg_table) {
- kfree(ch->sg_table);
- ch->sg_table = NULL;
- }
- if ((ch->dma_extra) && (ch->unit == 0))
- release_region((ch->dma_base + 16), ch->dma_extra);
- release_region(ch->dma_base, 8);
- ch->dma_base = 0;
-}
-
-/*
- * This can be called for a dynamically installed interface. Don't __init it
- */
-void ata_init_dma(struct ata_channel *ch, unsigned long dma_base)
-{
- if (!request_region(dma_base, 8, ch->name)) {
- printk(KERN_ERR "ATA: ERROR: BM DMA portst already in use!\n");
-
- return;
- }
- printk(KERN_INFO" %s: BM-DMA at 0x%04lx-0x%04lx", ch->name, dma_base, dma_base + 7);
- ch->dma_base = dma_base;
- ch->dmatable_cpu = pci_alloc_consistent(ch->pci_dev,
- PRD_ENTRIES * PRD_BYTES,
- &ch->dmatable_dma);
- if (ch->dmatable_cpu == NULL)
- goto dma_alloc_failure;
-
- ch->sg_table = kmalloc(sizeof(struct scatterlist) * PRD_ENTRIES,
- GFP_KERNEL);
- if (ch->sg_table == NULL) {
- pci_free_consistent(ch->pci_dev, PRD_ENTRIES * PRD_BYTES,
- ch->dmatable_cpu, ch->dmatable_dma);
- goto dma_alloc_failure;
- }
-
- if (!ch->XXX_udma)
- ch->XXX_udma = XXX_ide_dmaproc;
-
- if (ch->chipset != ide_trm290) {
- u8 dma_stat = inb(dma_base+2);
- printk(", BIOS settings: %s:%s, %s:%s",
- ch->drives[0].name, (dma_stat & 0x20) ? "DMA" : "pio",
- ch->drives[1].name, (dma_stat & 0x40) ? "DMA" : "pio");
- }
- printk("\n");
- return;
-
-dma_alloc_failure:
- printk(" -- ERROR, UNABLE TO ALLOCATE DMA TABLES\n");
-}
-
-/****************************************************************************
- * UDMA function which should have architecture specific counterparts where
- * neccessary.
- *
- * The intention is that at some point in time we will move this whole to
- * architecture specific kernel sections. For now I would love the architecture
- * maintainers to just #ifdef #endif this stuff directly here. I have for now
- * tryed to update as much as I could in the architecture specific code. But
- * of course I may have done mistakes, so please bear with me and update it
- * here the proper way.
- *
- * Thank you a lot in advance!
- *
- * Sat May 4 20:29:46 CEST 2002 Marcin Dalecki.
- */
-
-/*
- * This is the generic part of the DMA setup used by the host chipset drivers
- * in the corresponding DMA setup method.
- *
- * FIXME: there are some places where this gets used driectly for "error
- * recovery" in the ATAPI drivers. This was just plain wrong before, in esp.
- * not portable, and just got uncovered now.
- */
-void udma_enable(struct ata_device *drive, int on, int verbose)
-{
- struct ata_channel *ch = drive->channel;
- int set_high = 1;
- u8 unit;
- u64 addr;
-
-
- /* Method overloaded by host chip specific code. */
- if (ch->udma_enable) {
- ch->udma_enable(drive, on, verbose);
-
- return;
- }
-
- /* Fall back to the default implementation. */
- unit = (drive->select.b.unit & 0x01);
- addr = BLK_BOUNCE_HIGH;
-
- if (!on) {
- if (verbose)
- printk("%s: DMA disabled\n", drive->name);
- set_high = 0;
- outb(inb(ch->dma_base + 2) & ~(1 << (5 + unit)), ch->dma_base + 2);
-#ifdef CONFIG_BLK_DEV_IDE_TCQ
- udma_tcq_enable(drive, 0);
-#endif
- }
-
- /* toggle bounce buffers */
-
- if (on && drive->type == ATA_DISK && drive->channel->highmem) {
- if (!PCI_DMA_BUS_IS_PHYS)
- addr = BLK_BOUNCE_ANY;
- else
- addr = drive->channel->pci_dev->dma_mask;
- }
-
- blk_queue_bounce_limit(&drive->queue, addr);
-
- drive->using_dma = on;
-
- if (on) {
- outb(inb(ch->dma_base + 2) | (1 << (5 + unit)), ch->dma_base + 2);
-#ifdef CONFIG_BLK_DEV_IDE_TCQ_DEFAULT
- udma_tcq_enable(drive, 1);
-#endif
- }
-}
-
-/*
- * This prepares a dma request. Returns 0 if all went okay, returns 1
- * otherwise. May also be invoked from trm290.c
- */
-int udma_new_table(struct ata_channel *ch, struct request *rq)
-{
- unsigned int *table = ch->dmatable_cpu;
-#ifdef CONFIG_BLK_DEV_TRM290
- unsigned int is_trm290_chipset = (ch->chipset == ide_trm290);
-#else
- const int is_trm290_chipset = 0;
-#endif
- unsigned int count = 0;
- int i;
- struct scatterlist *sg;
-
- ch->sg_nents = i = build_sglist(ch, rq);
- if (!i)
- return 0;
-
- sg = ch->sg_table;
- while (i) {
- u32 cur_addr;
- u32 cur_len;
-
- cur_addr = sg_dma_address(sg);
- cur_len = sg_dma_len(sg);
-
- /*
- * Fill in the dma table, without crossing any 64kB boundaries.
- * Most hardware requires 16-bit alignment of all blocks,
- * but the trm290 requires 32-bit alignment.
- */
-
- while (cur_len) {
- u32 xcount, bcount = 0x10000 - (cur_addr & 0xffff);
-
- if (count++ >= PRD_ENTRIES) {
- printk("ide-dma: count %d, sg_nents %d, cur_len %d, cur_addr %u\n",
- count, ch->sg_nents, cur_len, cur_addr);
- BUG();
- }
-
- if (bcount > cur_len)
- bcount = cur_len;
- *table++ = cpu_to_le32(cur_addr);
- xcount = bcount & 0xffff;
- if (is_trm290_chipset)
- xcount = ((xcount >> 2) - 1) << 16;
- if (xcount == 0x0000) {
- /*
- * Most chipsets correctly interpret a length of
- * 0x0000 as 64KB, but at least one (e.g. CS5530)
- * misinterprets it as zero (!). So here we break
- * the 64KB entry into two 32KB entries instead.
- */
- if (count++ >= PRD_ENTRIES) {
- pci_unmap_sg(ch->pci_dev, sg,
- ch->sg_nents,
- ch->sg_dma_direction);
- return 0;
- }
-
- *table++ = cpu_to_le32(0x8000);
- *table++ = cpu_to_le32(cur_addr + 0x8000);
- xcount = 0x8000;
- }
- *table++ = cpu_to_le32(xcount);
- cur_addr += bcount;
- cur_len -= bcount;
- }
-
- sg++;
- i--;
- }
-
- if (!count)
- printk(KERN_ERR "%s: empty DMA table?\n", ch->name);
- else if (!is_trm290_chipset)
- *--table |= cpu_to_le32(0x80000000);
-
- return count;
-}
-
-/* Teardown mappings after DMA has completed. */
-void udma_destroy_table(struct ata_channel *ch)
-{
- pci_unmap_sg(ch->pci_dev, ch->sg_table, ch->sg_nents, ch->sg_dma_direction);
-}
-
-void udma_print(struct ata_device *drive)
-{
-#ifdef CONFIG_ARCH_ACORN
- printk(", DMA");
-#else
- struct hd_driveid *id = drive->id;
- char *str = NULL;
-
- if ((id->field_valid & 4) && (eighty_ninty_three(drive)) &&
- (id->dma_ultra & (id->dma_ultra >> 14) & 3)) {
- if ((id->dma_ultra >> 15) & 1)
- str = ", UDMA(mode 7)"; /* UDMA BIOS-enabled! */
- else
- str = ", UDMA(133)"; /* UDMA BIOS-enabled! */
- } else if ((id->field_valid & 4) && (eighty_ninty_three(drive)) &&
- (id->dma_ultra & (id->dma_ultra >> 11) & 7)) {
- if ((id->dma_ultra >> 13) & 1) {
- str = ", UDMA(100)"; /* UDMA BIOS-enabled! */
- } else if ((id->dma_ultra >> 12) & 1) {
- str = ", UDMA(66)"; /* UDMA BIOS-enabled! */
- } else {
- str = ", UDMA(44)"; /* UDMA BIOS-enabled! */
- }
- } else if ((id->field_valid & 4) &&
- (id->dma_ultra & (id->dma_ultra >> 8) & 7)) {
- if ((id->dma_ultra >> 10) & 1) {
- str = ", UDMA(33)"; /* UDMA BIOS-enabled! */
- } else if ((id->dma_ultra >> 9) & 1) {
- str = ", UDMA(25)"; /* UDMA BIOS-enabled! */
- } else {
- str = ", UDMA(16)"; /* UDMA BIOS-enabled! */
- }
- } else if (id->field_valid & 4)
- str = ", (U)DMA"; /* Can be BIOS-enabled! */
- else
- str = ", DMA";
-
- printk(str);
-#endif
-}
-
-/*
- * Drive back/white list handling for UDMA capability:
- */
-
-int udma_black_list(struct ata_device *drive)
-{
- return check_drive_lists(drive, 0);
-}
-
-int udma_white_list(struct ata_device *drive)
-{
- return check_drive_lists(drive, 1);
-}
-
-/*
- * Generic entry points for functions provided possibly by the host chip set
- * drivers.
- */
-
-/*
- * Prepare the channel for a DMA startfer. Please note that only the broken
- * Pacific Digital host chip needs the reques to be passed there to decide
- * about addressing modes.
- */
-
-int udma_start(struct ata_device *drive, struct request *rq)
-{
- struct ata_channel *ch = drive->channel;
- unsigned long dma_base = ch->dma_base;
-
- if (ch->udma_start)
- return ch->udma_start(drive, rq);
-
- /* Note that this is done *after* the cmd has
- * been issued to the drive, as per the BM-IDE spec.
- * The Promise Ultra33 doesn't work correctly when
- * we do this part before issuing the drive cmd.
- */
- outb(inb(dma_base)|1, dma_base); /* start DMA */
- return 0;
-}
-
-int udma_stop(struct ata_device *drive)
-{
- struct ata_channel *ch = drive->channel;
- unsigned long dma_base = ch->dma_base;
- u8 dma_stat;
-
- if (ch->udma_stop)
- return ch->udma_stop(drive);
-
- drive->waiting_for_dma = 0;
- outb(inb(dma_base)&~1, dma_base); /* stop DMA */
- dma_stat = inb(dma_base+2); /* get DMA status */
- outb(dma_stat|6, dma_base+2); /* clear the INTR & ERROR bits */
- udma_destroy_table(ch); /* purge DMA mappings */
-
- return (dma_stat & 7) != 4 ? (0x10 | dma_stat) : 0; /* verify good DMA status */
-}
-
-/*
- * This is the default read write function.
- *
- * It's exported only for host chips which use it for fallback or (too) late
- * capability checking.
- */
-
-int ata_do_udma(unsigned int reading, struct ata_device *drive, struct request *rq)
-{
- if (ata_start_dma(drive, rq))
- return 1;
-
- if (drive->type != ATA_DISK)
- return 0;
-
- reading <<= 3;
-
- ide_set_handler(drive, ide_dma_intr, WAIT_CMD, dma_timer_expiry); /* issue cmd to drive */
- if ((rq->flags & REQ_DRIVE_ACB) && (drive->addressing == 1)) {
- struct ata_taskfile *args = rq->special;
-
- OUT_BYTE(args->taskfile.command, IDE_COMMAND_REG);
- } else if (drive->addressing) {
- OUT_BYTE(reading ? WIN_READDMA_EXT : WIN_WRITEDMA_EXT, IDE_COMMAND_REG);
- } else {
- OUT_BYTE(reading ? WIN_READDMA : WIN_WRITEDMA, IDE_COMMAND_REG);
- }
-
- return udma_start(drive, rq);
-}
-
-int udma_read(struct ata_device *drive, struct request *rq)
-{
- struct ata_channel *ch = drive->channel;
-
- if (ch->udma_read)
- return ch->udma_read(drive, rq);
-
- return ata_do_udma(1, drive, rq);
-}
-
-int udma_write(struct ata_device *drive, struct request *rq)
-{
- struct ata_channel *ch = drive->channel;
-
- if (ch->udma_write)
- return ch->udma_write(drive, rq);
-
- return ata_do_udma(0, drive, rq);
-}
-
-/*
- * FIXME: This should be attached to a channel as we can see now!
- */
-int udma_irq_status(struct ata_device *drive)
-{
- struct ata_channel *ch = drive->channel;
- u8 dma_stat;
-
- if (ch->udma_irq_status)
- return ch->udma_irq_status(drive);
-
- /* default action */
- dma_stat = inb(ch->dma_base + 2);
-
- return (dma_stat & 4) == 4; /* return 1 if INTR asserted */
-}
-
-void udma_timeout(struct ata_device *drive)
-{
- printk(KERN_ERR "ATA: UDMA timeout occured %s!\n", drive->name);
-
- /* Invoke the chipset specific handler now. */
- if (drive->channel->udma_timeout)
- drive->channel->udma_timeout(drive);
-
-}
-
-void udma_irq_lost(struct ata_device *drive)
-{
- if (drive->channel->udma_irq_lost)
- drive->channel->udma_irq_lost(drive);
-}
-
-EXPORT_SYMBOL(udma_enable);
-EXPORT_SYMBOL(udma_start);
-EXPORT_SYMBOL(udma_stop);
-EXPORT_SYMBOL(udma_read);
-EXPORT_SYMBOL(udma_write);
-EXPORT_SYMBOL(ata_do_udma);
-EXPORT_SYMBOL(udma_irq_status);
-EXPORT_SYMBOL(udma_print);
-EXPORT_SYMBOL(udma_black_list);
-EXPORT_SYMBOL(udma_white_list);
diff -urN linux-2.5.15/drivers/ide/ide-taskfile.c linux/drivers/ide/ide-taskfile.c
--- linux-2.5.15/drivers/ide/ide-taskfile.c 2002-05-15 14:55:12.000000000 +0200
+++ linux/drivers/ide/ide-taskfile.c 2002-05-15 14:07:43.000000000 +0200
@@ -562,7 +562,7 @@
if (!ide_end_request(drive, rq, 1))
return ide_stopped;
- if ((rq->current_nr_sectors==1) ^ (stat & DRQ_STAT)) {
+ if ((rq->nr_sectors == 1) != (stat & DRQ_STAT)) {
pBuf = ide_map_rq(rq, &flags);
DTF("write: %p, rq->current_nr_sectors: %d\n", pBuf, (int) rq->current_nr_sectors);
diff -urN linux-2.5.15/drivers/ide/Makefile linux/drivers/ide/Makefile
--- linux-2.5.15/drivers/ide/Makefile 2002-05-15 14:55:12.000000000 +0200
+++ linux/drivers/ide/Makefile 2002-05-14 17:10:48.000000000 +0200
@@ -10,7 +10,7 @@
O_TARGET := idedriver.o
-export-objs := ide-taskfile.o ide.o ide-features.o ide-probe.o ide-dma.o ataraid.o
+export-objs := ide-taskfile.o ide.o ide-features.o ide-probe.o quirks.o pcidma.o ataraid.o
obj-y :=
obj-m :=
@@ -43,7 +43,8 @@
ide-obj-$(CONFIG_BLK_DEV_HPT366) += hpt366.o
ide-obj-$(CONFIG_BLK_DEV_HT6560B) += ht6560b.o
ide-obj-$(CONFIG_BLK_DEV_IDE_ICSIDE) += icside.o
-ide-obj-$(CONFIG_BLK_DEV_IDEDMA_PCI) += ide-dma.o
+ide-obj-$(CONFIG_BLK_DEV_IDEDMA) += quirks.o
+ide-obj-$(CONFIG_BLK_DEV_IDEDMA_PCI) += pcidma.o
ide-obj-$(CONFIG_BLK_DEV_IDE_TCQ) += tcq.o
ide-obj-$(CONFIG_PCI) += ide-pci.o
ide-obj-$(CONFIG_BLK_DEV_ISAPNP) += ide-pnp.o
diff -urN linux-2.5.15/drivers/ide/pcidma.c linux/drivers/ide/pcidma.c
--- linux-2.5.15/drivers/ide/pcidma.c 1970-01-01 01:00:00.000000000 +0100
+++ linux/drivers/ide/pcidma.c 2002-05-15 13:28:50.000000000 +0200
@@ -0,0 +1,555 @@
+/**** vi:set ts=8 sts=8 sw=8:************************************************
+ *
+ * Copyright (C) 2002 Marcin Dalecki <martin@dalecki.de>
+ *
+ * Based on previous work by:
+ *
+ * Copyright (c) 1999-2000 Andre Hedrick <andre@linux-ide.org>
+ * Copyright (c) 1995-1998 Mark Lord
+ *
+ * May be copied or modified under the terms of the GNU General Public License
+ */
+
+/*
+ * These are the generic BM DMA support functions for PCI bus based systems.
+ */
+
+#include <linux/config.h>
+#define __NO_VERSION__
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/timer.h>
+#include <linux/mm.h>
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/ide.h>
+#include <linux/delay.h>
+
+#include <asm/io.h>
+#include <asm/irq.h>
+
+#define DEFAULT_BMIBA 0xe800 /* in case BIOS did not init it */
+#define DEFAULT_BMCRBA 0xcc00 /* VIA's default value */
+#define DEFAULT_BMALIBA 0xd400 /* ALI's default value */
+
+/*
+ * This is the handler for disk read/write DMA interrupts.
+ */
+ide_startstop_t ide_dma_intr(struct ata_device *drive, struct request *rq)
+{
+ u8 stat, dma_stat;
+
+ dma_stat = udma_stop(drive);
+ if (OK_STAT(stat = GET_STAT(),DRIVE_READY,drive->bad_wstat|DRQ_STAT)) {
+ if (!dma_stat) {
+ __ide_end_request(drive, rq, 1, rq->nr_sectors);
+ return ide_stopped;
+ }
+ printk(KERN_ERR "%s: dma_intr: bad DMA status (dma_stat=%x)\n",
+ drive->name, dma_stat);
+ }
+ return ide_error(drive, rq, "dma_intr", stat);
+}
+
+/*
+ * FIXME: taskfiles should be a map of pages, not a long virt address... /jens
+ * FIXME: I agree with Jens --mdcki!
+ */
+static int build_sglist(struct ata_channel *ch, struct request *rq)
+{
+ struct scatterlist *sg = ch->sg_table;
+ int nents = 0;
+
+ if (rq->flags & REQ_DRIVE_ACB) {
+ struct ata_taskfile *args = rq->special;
+#if 1
+ unsigned char *virt_addr = rq->buffer;
+ int sector_count = rq->nr_sectors;
+#else
+ nents = blk_rq_map_sg(rq->q, rq, ch->sg_table);
+
+ if (nents > rq->nr_segments)
+ printk("ide-dma: received %d segments, build %d\n", rq->nr_segments, nents);
+#endif
+
+ if (args->command_type == IDE_DRIVE_TASK_RAW_WRITE)
+ ch->sg_dma_direction = PCI_DMA_TODEVICE;
+ else
+ ch->sg_dma_direction = PCI_DMA_FROMDEVICE;
+
+ /*
+ * FIXME: This depends upon a hard coded page size!
+ */
+ if (sector_count > 128) {
+ memset(&sg[nents], 0, sizeof(*sg));
+
+ sg[nents].page = virt_to_page(virt_addr);
+ sg[nents].offset = (unsigned long) virt_addr & ~PAGE_MASK;
+ sg[nents].length = 128 * SECTOR_SIZE;
+ ++nents;
+ virt_addr = virt_addr + (128 * SECTOR_SIZE);
+ sector_count -= 128;
+ }
+ memset(&sg[nents], 0, sizeof(*sg));
+ sg[nents].page = virt_to_page(virt_addr);
+ sg[nents].offset = (unsigned long) virt_addr & ~PAGE_MASK;
+ sg[nents].length = sector_count * SECTOR_SIZE;
+ ++nents;
+ } else {
+ nents = blk_rq_map_sg(rq->q, rq, ch->sg_table);
+
+ if (rq->q && nents > rq->nr_phys_segments)
+ printk("ide-dma: received %d phys segments, build %d\n", rq->nr_phys_segments, nents);
+
+ if (rq_data_dir(rq) == READ)
+ ch->sg_dma_direction = PCI_DMA_FROMDEVICE;
+ else
+ ch->sg_dma_direction = PCI_DMA_TODEVICE;
+
+ }
+
+ return pci_map_sg(ch->pci_dev, sg, nents, ch->sg_dma_direction);
+}
+
+/*
+ * BM DMA status register bits: 1 = DMA active, 2 = error, 4 = interrupt.
+ */
+static int dma_timer_expiry(struct ata_device *drive, struct request *rq)
+{
+ /* FIXME: What's that? */
+ u8 dma_stat = inb(drive->channel->dma_base+2);
+
+#ifdef DEBUG
+ printk("%s: dma_timer_expiry: dma status == 0x%02x\n", drive->name, dma_stat);
+#endif
+
+#if 0
+ drive->expiry = NULL; /* one free ride for now */
+#endif
+
+ if (dma_stat & 2) { /* ERROR */
+ u8 stat = GET_STAT();
+ return ide_error(drive, rq, "dma_timer_expiry", stat);
+ }
+ if (dma_stat & 1) /* DMAing */
+ return WAIT_CMD;
+ return 0;
+}
+
+int ata_start_dma(struct ata_device *drive, struct request *rq)
+{
+ struct ata_channel *ch = drive->channel;
+ unsigned long dma_base = ch->dma_base;
+ unsigned int reading = 0;
+
+ if (rq_data_dir(rq) == READ)
+ reading = 1 << 3;
+
+ /* try PIO instead of DMA */
+ if (!udma_new_table(ch, rq))
+ return 1;
+
+ outl(ch->dmatable_dma, dma_base + 4); /* PRD table */
+ outb(reading, dma_base); /* specify r/w */
+ outb(inb(dma_base+2)|6, dma_base+2); /* clear INTR & ERROR flags */
+ drive->waiting_for_dma = 1;
+
+ return 0;
+}
+
+/*
+ * Configure a device for DMA operation.
+ */
+int XXX_ide_dmaproc(struct ata_device *drive)
+{
+ int config_allows_dma = 1;
+ struct hd_driveid *id = drive->id;
+ struct ata_channel *ch = drive->channel;
+
+#ifdef CONFIG_IDEDMA_ONLYDISK
+ if (drive->type != ATA_DISK)
+ config_allows_dma = 0;
+#endif
+
+ if (id && (id->capability & 1) && ch->autodma && config_allows_dma) {
+ /* Consult the list of known "bad" drives */
+ if (udma_black_list(drive)) {
+ udma_enable(drive, 0, 1);
+
+ return 0;
+ }
+
+ /* Enable DMA on any drive that has UltraDMA (mode 6/7/?) enabled */
+ if ((id->field_valid & 4) && (eighty_ninty_three(drive)))
+ if ((id->dma_ultra & (id->dma_ultra >> 14) & 2)) {
+ udma_enable(drive, 1, 1);
+
+ return 0;
+ }
+ /* Enable DMA on any drive that has UltraDMA (mode 3/4/5) enabled */
+ if ((id->field_valid & 4) && (eighty_ninty_three(drive)))
+ if ((id->dma_ultra & (id->dma_ultra >> 11) & 7)) {
+ udma_enable(drive, 1, 1);
+
+ return 0;
+ }
+ /* Enable DMA on any drive that has UltraDMA (mode 0/1/2) enabled */
+ if (id->field_valid & 4) /* UltraDMA */
+ if ((id->dma_ultra & (id->dma_ultra >> 8) & 7)) {
+ udma_enable(drive, 1, 1);
+
+ return 0;
+ }
+ /* Enable DMA on any drive that has mode2 DMA (multi or single) enabled */
+ if (id->field_valid & 2) /* regular DMA */
+ if ((id->dma_mword & 0x404) == 0x404 || (id->dma_1word & 0x404) == 0x404) {
+ udma_enable(drive, 1, 1);
+
+ return 0;
+ }
+ /* Consult the list of known "good" drives */
+ if (udma_white_list(drive)) {
+ udma_enable(drive, 1, 1);
+
+ return 0;
+ }
+ }
+ udma_enable(drive, 0, 0);
+
+ return 0;
+}
+
+/*
+ * Needed for allowing full modular support of ide-driver
+ */
+void ide_release_dma(struct ata_channel *ch)
+{
+ if (!ch->dma_base)
+ return;
+
+ if (ch->dmatable_cpu) {
+ pci_free_consistent(ch->pci_dev,
+ PRD_ENTRIES * PRD_BYTES,
+ ch->dmatable_cpu,
+ ch->dmatable_dma);
+ ch->dmatable_cpu = NULL;
+ }
+ if (ch->sg_table) {
+ kfree(ch->sg_table);
+ ch->sg_table = NULL;
+ }
+ if ((ch->dma_extra) && (ch->unit == 0))
+ release_region((ch->dma_base + 16), ch->dma_extra);
+ release_region(ch->dma_base, 8);
+ ch->dma_base = 0;
+}
+
+/****************************************************************************
+ * PCI specific UDMA channel method implementations.
+ */
+
+/*
+ * This is the generic part of the DMA setup used by the host chipset drivers
+ * in the corresponding DMA setup method.
+ *
+ * FIXME: there are some places where this gets used directly for "error
+ * recovery" in the ATAPI drivers. This was just plain wrong before, in
+ * particular not portable, and only got uncovered now.
+ */
+static void udma_pci_enable(struct ata_device *drive, int on, int verbose)
+{
+ struct ata_channel *ch = drive->channel;
+ int set_high = 1;
+ u8 unit;
+ u64 addr;
+
+ /* Fall back to the default implementation. */
+ unit = (drive->select.b.unit & 0x01);
+ addr = BLK_BOUNCE_HIGH;
+
+ if (!on) {
+ if (verbose)
+ printk("%s: DMA disabled\n", drive->name);
+ set_high = 0;
+ outb(inb(ch->dma_base + 2) & ~(1 << (5 + unit)), ch->dma_base + 2);
+#ifdef CONFIG_BLK_DEV_IDE_TCQ
+ udma_tcq_enable(drive, 0);
+#endif
+ }
+
+ /* toggle bounce buffers */
+
+ if (on && drive->type == ATA_DISK && drive->channel->highmem) {
+ if (!PCI_DMA_BUS_IS_PHYS)
+ addr = BLK_BOUNCE_ANY;
+ else
+ addr = drive->channel->pci_dev->dma_mask;
+ }
+
+ blk_queue_bounce_limit(&drive->queue, addr);
+
+ drive->using_dma = on;
+
+ if (on) {
+ outb(inb(ch->dma_base + 2) | (1 << (5 + unit)), ch->dma_base + 2);
+#ifdef CONFIG_BLK_DEV_IDE_TCQ_DEFAULT
+ udma_tcq_enable(drive, 1);
+#endif
+ }
+}
+
+/*
+ * This prepares a DMA request. Returns the number of PRD entries built, or
+ * 0 if the caller should use PIO instead. May also be invoked from trm290.c
+ */
+int udma_new_table(struct ata_channel *ch, struct request *rq)
+{
+ unsigned int *table = ch->dmatable_cpu;
+#ifdef CONFIG_BLK_DEV_TRM290
+ unsigned int is_trm290_chipset = (ch->chipset == ide_trm290);
+#else
+ const int is_trm290_chipset = 0;
+#endif
+ unsigned int count = 0;
+ int i;
+ struct scatterlist *sg;
+
+ ch->sg_nents = i = build_sglist(ch, rq);
+ if (!i)
+ return 0;
+
+ sg = ch->sg_table;
+ while (i) {
+ u32 cur_addr;
+ u32 cur_len;
+
+ cur_addr = sg_dma_address(sg);
+ cur_len = sg_dma_len(sg);
+
+ /*
+ * Fill in the dma table, without crossing any 64kB boundaries.
+ * Most hardware requires 16-bit alignment of all blocks,
+ * but the trm290 requires 32-bit alignment.
+ */
+
+ while (cur_len) {
+ u32 xcount, bcount = 0x10000 - (cur_addr & 0xffff);
+
+ if (count++ >= PRD_ENTRIES) {
+ printk("ide-dma: count %d, sg_nents %d, cur_len %d, cur_addr %u\n",
+ count, ch->sg_nents, cur_len, cur_addr);
+ BUG();
+ }
+
+ if (bcount > cur_len)
+ bcount = cur_len;
+ *table++ = cpu_to_le32(cur_addr);
+ xcount = bcount & 0xffff;
+ if (is_trm290_chipset)
+ xcount = ((xcount >> 2) - 1) << 16;
+ if (xcount == 0x0000) {
+ /*
+ * Most chipsets correctly interpret a length of
+ * 0x0000 as 64KB, but at least one (e.g. CS5530)
+ * misinterprets it as zero (!). So here we break
+ * the 64KB entry into two 32KB entries instead.
+ */
+ if (count++ >= PRD_ENTRIES) {
+ pci_unmap_sg(ch->pci_dev, sg,
+ ch->sg_nents,
+ ch->sg_dma_direction);
+ return 0;
+ }
+
+ *table++ = cpu_to_le32(0x8000);
+ *table++ = cpu_to_le32(cur_addr + 0x8000);
+ xcount = 0x8000;
+ }
+ *table++ = cpu_to_le32(xcount);
+ cur_addr += bcount;
+ cur_len -= bcount;
+ }
+
+ sg++;
+ i--;
+ }
+
+ if (!count)
+ printk(KERN_ERR "%s: empty DMA table?\n", ch->name);
+ else if (!is_trm290_chipset)
+ *--table |= cpu_to_le32(0x80000000);
+
+ return count;
+}
+
+/* Teardown mappings after DMA has completed. */
+void udma_destroy_table(struct ata_channel *ch)
+{
+ pci_unmap_sg(ch->pci_dev, ch->sg_table, ch->sg_nents, ch->sg_dma_direction);
+}
+
+/*
+ * Prepare the channel for a DMA transfer. Please note that only the broken
+ * Pacific Digital host chip needs the request to be passed there to decide
+ * about addressing modes.
+ */
+
+static int udma_pci_start(struct ata_device *drive, struct request *rq)
+{
+ struct ata_channel *ch = drive->channel;
+ unsigned long dma_base = ch->dma_base;
+
+ /* Note that this is done *after* the cmd has
+ * been issued to the drive, as per the BM-IDE spec.
+ * The Promise Ultra33 doesn't work correctly when
+ * we do this part before issuing the drive cmd.
+ */
+ outb(inb(dma_base)|1, dma_base); /* start DMA */
+ return 0;
+}
+
+static int udma_pci_stop(struct ata_device *drive)
+{
+ struct ata_channel *ch = drive->channel;
+ unsigned long dma_base = ch->dma_base;
+ u8 dma_stat;
+
+ drive->waiting_for_dma = 0;
+ outb(inb(dma_base)&~1, dma_base); /* stop DMA */
+ dma_stat = inb(dma_base+2); /* get DMA status */
+ outb(dma_stat|6, dma_base+2); /* clear the INTR & ERROR bits */
+ udma_destroy_table(ch); /* purge DMA mappings */
+
+ return (dma_stat & 7) != 4 ? (0x10 | dma_stat) : 0; /* verify good DMA status */
+}
+
+static int udma_pci_read(struct ata_device *drive, struct request *rq)
+{
+ return ata_do_udma(1, drive, rq);
+}
+
+static int udma_pci_write(struct ata_device *drive, struct request *rq)
+{
+ return ata_do_udma(0, drive, rq);
+}
+
+/*
+ * FIXME: This should be attached to a channel as we can see now!
+ */
+static int udma_pci_irq_status(struct ata_device *drive)
+{
+ struct ata_channel *ch = drive->channel;
+ u8 dma_stat;
+
+ /* default action */
+ dma_stat = inb(ch->dma_base + 2);
+
+ return (dma_stat & 4) == 4; /* return 1 if INTR asserted */
+}
+
+static void udma_pci_timeout(struct ata_device *drive)
+{
+ printk(KERN_ERR "ATA: UDMA timeout occurred %s!\n", drive->name);
+}
+
+static void udma_pci_irq_lost(struct ata_device *drive)
+{
+}
+
+/*
+ * This can be called for a dynamically installed interface. Don't __init it
+ */
+void ata_init_dma(struct ata_channel *ch, unsigned long dma_base)
+{
+ if (!request_region(dma_base, 8, ch->name)) {
+ printk(KERN_ERR "ATA: ERROR: BM DMA ports already in use!\n");
+
+ return;
+ }
+ printk(KERN_INFO" %s: BM-DMA at 0x%04lx-0x%04lx", ch->name, dma_base, dma_base + 7);
+ ch->dma_base = dma_base;
+ ch->dmatable_cpu = pci_alloc_consistent(ch->pci_dev,
+ PRD_ENTRIES * PRD_BYTES,
+ &ch->dmatable_dma);
+ if (ch->dmatable_cpu == NULL)
+ goto dma_alloc_failure;
+
+ ch->sg_table = kmalloc(sizeof(struct scatterlist) * PRD_ENTRIES,
+ GFP_KERNEL);
+ if (ch->sg_table == NULL) {
+ pci_free_consistent(ch->pci_dev, PRD_ENTRIES * PRD_BYTES,
+ ch->dmatable_cpu, ch->dmatable_dma);
+ goto dma_alloc_failure;
+ }
+
+ /*
+ * We could just assign them, and then leave it up to the chipset
+ * specific code to override these after they've called this function.
+ */
+ if (!ch->XXX_udma)
+ ch->XXX_udma = XXX_ide_dmaproc;
+ if (!ch->udma_enable)
+ ch->udma_enable = udma_pci_enable;
+ if (!ch->udma_start)
+ ch->udma_start = udma_pci_start;
+ if (!ch->udma_stop)
+ ch->udma_stop = udma_pci_stop;
+ if (!ch->udma_read)
+ ch->udma_read = udma_pci_read;
+ if (!ch->udma_write)
+ ch->udma_write = udma_pci_write;
+ if (!ch->udma_irq_status)
+ ch->udma_irq_status = udma_pci_irq_status;
+ if (!ch->udma_timeout)
+ ch->udma_timeout = udma_pci_timeout;
+ if (!ch->udma_irq_lost)
+ ch->udma_irq_lost = udma_pci_irq_lost;
+
+ if (ch->chipset != ide_trm290) {
+ u8 dma_stat = inb(dma_base+2);
+ printk(", BIOS settings: %s:%s, %s:%s",
+ ch->drives[0].name, (dma_stat & 0x20) ? "DMA" : "pio",
+ ch->drives[1].name, (dma_stat & 0x40) ? "DMA" : "pio");
+ }
+ printk("\n");
+ return;
+
+dma_alloc_failure:
+ printk(" -- ERROR, UNABLE TO ALLOCATE DMA TABLES\n");
+}
+
+/*
+ * This is the default read write function.
+ *
+ * It's exported only for host chips which use it for fallback or (too) late
+ * capability checking.
+ */
+
+int ata_do_udma(unsigned int reading, struct ata_device *drive, struct request *rq)
+{
+ if (ata_start_dma(drive, rq))
+ return 1;
+
+ if (drive->type != ATA_DISK)
+ return 0;
+
+ reading <<= 3;
+
+ ide_set_handler(drive, ide_dma_intr, WAIT_CMD, dma_timer_expiry); /* issue cmd to drive */
+ if ((rq->flags & REQ_DRIVE_ACB) && (drive->addressing == 1)) {
+ struct ata_taskfile *args = rq->special;
+
+ outb(args->taskfile.command, IDE_COMMAND_REG);
+ } else if (drive->addressing) {
+ outb(reading ? WIN_READDMA_EXT : WIN_WRITEDMA_EXT, IDE_COMMAND_REG);
+ } else {
+ outb(reading ? WIN_READDMA : WIN_WRITEDMA, IDE_COMMAND_REG);
+ }
+
+ return udma_start(drive, rq);
+}
+
+EXPORT_SYMBOL(ata_do_udma);
+EXPORT_SYMBOL(ide_dma_intr);
diff -urN linux-2.5.15/drivers/ide/pcihost.h linux/drivers/ide/pcihost.h
--- linux-2.5.15/drivers/ide/pcihost.h 2002-05-15 14:55:12.000000000 +0200
+++ linux/drivers/ide/pcihost.h 2002-05-15 13:29:10.000000000 +0200
@@ -10,10 +10,6 @@
* ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
* FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
* more details.
- *
- * You should have received a copy of the GNU General Public License along with
- * this program; if not, write to the Free Software Foundation, Inc., 59 Temple
- * Place, Suite 330, Boston, MA 02111-1307 USA
*/
/*
diff -urN linux-2.5.15/drivers/ide/pdc202xx.c linux/drivers/ide/pdc202xx.c
--- linux-2.5.15/drivers/ide/pdc202xx.c 2002-05-15 14:55:05.000000000 +0200
+++ linux/drivers/ide/pdc202xx.c 2002-05-15 13:45:41.000000000 +0200
@@ -53,220 +53,10 @@
#define PDC202XX_DEBUG_DRIVE_INFO 0
#define PDC202XX_DECODE_REGISTER_INFO 0
-#undef DISPLAY_PDC202XX_TIMINGS
-
#ifndef SPLIT_BYTE
#define SPLIT_BYTE(B,H,L) ((H)=(B>>4), (L)=(B-((B>>4)<<4)))
#endif
-#if defined(DISPLAY_PDC202XX_TIMINGS) && defined(CONFIG_PROC_FS)
-#include <linux/stat.h>
-#include <linux/proc_fs.h>
-
-static int pdc202xx_get_info(char *, char **, off_t, int);
-extern int (*pdc202xx_display_info)(char *, char **, off_t, int); /* ide-proc.c */
-static struct pci_dev *bmide_dev;
-
-char *pdc202xx_pio_verbose (u32 drive_pci)
-{
- if ((drive_pci & 0x000ff000) == 0x000ff000) return("NOTSET");
- if ((drive_pci & 0x00000401) == 0x00000401) return("PIO 4");
- if ((drive_pci & 0x00000602) == 0x00000602) return("PIO 3");
- if ((drive_pci & 0x00000803) == 0x00000803) return("PIO 2");
- if ((drive_pci & 0x00000C05) == 0x00000C05) return("PIO 1");
- if ((drive_pci & 0x00001309) == 0x00001309) return("PIO 0");
- return("PIO ?");
-}
-
-char *pdc202xx_dma_verbose (u32 drive_pci)
-{
- if ((drive_pci & 0x00036000) == 0x00036000) return("MWDMA 2");
- if ((drive_pci & 0x00046000) == 0x00046000) return("MWDMA 1");
- if ((drive_pci & 0x00056000) == 0x00056000) return("MWDMA 0");
- if ((drive_pci & 0x00056000) == 0x00056000) return("SWDMA 2");
- if ((drive_pci & 0x00068000) == 0x00068000) return("SWDMA 1");
- if ((drive_pci & 0x000BC000) == 0x000BC000) return("SWDMA 0");
- return("PIO---");
-}
-
-char *pdc202xx_ultra_verbose (u32 drive_pci, u16 slow_cable)
-{
- if ((drive_pci & 0x000ff000) == 0x000ff000)
- return("NOTSET");
- if ((drive_pci & 0x00012000) == 0x00012000)
- return((slow_cable) ? "UDMA 2" : "UDMA 4");
- if ((drive_pci & 0x00024000) == 0x00024000)
- return((slow_cable) ? "UDMA 1" : "UDMA 3");
- if ((drive_pci & 0x00036000) == 0x00036000)
- return("UDMA 0");
- return(pdc202xx_dma_verbose(drive_pci));
-}
-
-static char * pdc202xx_info (char *buf, struct pci_dev *dev)
-{
- char *p = buf;
-
- u32 bibma = pci_resource_start(dev, 4);
- u32 reg60h = 0, reg64h = 0, reg68h = 0, reg6ch = 0;
- u16 reg50h = 0, pmask = (1<<10), smask = (1<<11);
- u8 hi = 0, lo = 0;
-
- /*
- * at that point bibma+0x2 et bibma+0xa are byte registers
- * to investigate:
- */
- u8 c0 = inb_p((unsigned short)bibma + 0x02);
- u8 c1 = inb_p((unsigned short)bibma + 0x0a);
-
- u8 sc11 = inb_p((unsigned short)bibma + 0x11);
- u8 sc1a = inb_p((unsigned short)bibma + 0x1a);
- u8 sc1b = inb_p((unsigned short)bibma + 0x1b);
- u8 sc1c = inb_p((unsigned short)bibma + 0x1c);
- u8 sc1d = inb_p((unsigned short)bibma + 0x1d);
- u8 sc1e = inb_p((unsigned short)bibma + 0x1e);
- u8 sc1f = inb_p((unsigned short)bibma + 0x1f);
-
- pci_read_config_word(dev, 0x50, &reg50h);
- pci_read_config_dword(dev, 0x60, &reg60h);
- pci_read_config_dword(dev, 0x64, &reg64h);
- pci_read_config_dword(dev, 0x68, &reg68h);
- pci_read_config_dword(dev, 0x6c, &reg6ch);
-
- switch(dev->device) {
- case PCI_DEVICE_ID_PROMISE_20267:
- p += sprintf(p, "\n PDC20267 Chipset.\n");
- break;
- case PCI_DEVICE_ID_PROMISE_20265:
- p += sprintf(p, "\n PDC20265 Chipset.\n");
- break;
- case PCI_DEVICE_ID_PROMISE_20262:
- p += sprintf(p, "\n PDC20262 Chipset.\n");
- break;
- case PCI_DEVICE_ID_PROMISE_20246:
- p += sprintf(p, "\n PDC20246 Chipset.\n");
- reg50h |= 0x0c00;
- break;
- default:
- p += sprintf(p, "\n PDC202XX Chipset.\n");
- break;
- }
-
- p += sprintf(p, "------------------------------- General Status ---------------------------------\n");
- p += sprintf(p, "Burst Mode : %sabled\n", (sc1f & 0x01) ? "en" : "dis");
- p += sprintf(p, "Host Mode : %s\n", (sc1f & 0x08) ? "Tri-Stated" : "Normal");
- p += sprintf(p, "Bus Clocking : %s\n",
- ((sc1f & 0xC0) == 0xC0) ? "100 External" :
- ((sc1f & 0x80) == 0x80) ? "66 External" :
- ((sc1f & 0x40) == 0x40) ? "33 External" : "33 PCI Internal");
- p += sprintf(p, "IO pad select : %s mA\n",
- ((sc1c & 0x03) == 0x03) ? "10" :
- ((sc1c & 0x02) == 0x02) ? "8" :
- ((sc1c & 0x01) == 0x01) ? "6" :
- ((sc1c & 0x00) == 0x00) ? "4" : "??");
- SPLIT_BYTE(sc1e, hi, lo);
- p += sprintf(p, "Status Polling Period : %d\n", hi);
- p += sprintf(p, "Interrupt Check Status Polling Delay : %d\n", lo);
- p += sprintf(p, "--------------- Primary Channel ---------------- Secondary Channel -------------\n");
- p += sprintf(p, " %s %s\n",
- (c0&0x80)?"disabled":"enabled ",
- (c1&0x80)?"disabled":"enabled ");
- p += sprintf(p, "66 Clocking %s %s\n",
- (sc11&0x02)?"enabled ":"disabled",
- (sc11&0x08)?"enabled ":"disabled");
- p += sprintf(p, " Mode %s Mode %s\n",
- (sc1a & 0x01) ? "MASTER" : "PCI ",
- (sc1b & 0x01) ? "MASTER" : "PCI ");
- p += sprintf(p, " %s %s\n",
- (sc1d & 0x08) ? "Error " :
- ((sc1d & 0x05) == 0x05) ? "Not My INTR " :
- (sc1d & 0x04) ? "Interrupting" :
- (sc1d & 0x02) ? "FIFO Full " :
- (sc1d & 0x01) ? "FIFO Empty " : "????????????",
- (sc1d & 0x80) ? "Error " :
- ((sc1d & 0x50) == 0x50) ? "Not My INTR " :
- (sc1d & 0x40) ? "Interrupting" :
- (sc1d & 0x20) ? "FIFO Full " :
- (sc1d & 0x10) ? "FIFO Empty " : "????????????");
- p += sprintf(p, "--------------- drive0 --------- drive1 -------- drive0 ---------- drive1 ------\n");
- p += sprintf(p, "DMA enabled: %s %s %s %s\n",
- (c0&0x20)?"yes":"no ",(c0&0x40)?"yes":"no ",(c1&0x20)?"yes":"no ",(c1&0x40)?"yes":"no ");
- p += sprintf(p, "DMA Mode: %s %s %s %s\n",
- pdc202xx_ultra_verbose(reg60h, (reg50h & pmask)),
- pdc202xx_ultra_verbose(reg64h, (reg50h & pmask)),
- pdc202xx_ultra_verbose(reg68h, (reg50h & smask)),
- pdc202xx_ultra_verbose(reg6ch, (reg50h & smask)));
- p += sprintf(p, "PIO Mode: %s %s %s %s\n",
- pdc202xx_pio_verbose(reg60h),
- pdc202xx_pio_verbose(reg64h),
- pdc202xx_pio_verbose(reg68h),
- pdc202xx_pio_verbose(reg6ch));
-#if 0
- p += sprintf(p, "--------------- Can ATAPI DMA ---------------\n");
-#endif
- return (char *)p;
-}
-
-static char * pdc202xx_info_new (char *buf, struct pci_dev *dev)
-{
- char *p = buf;
-// u32 bibma = pci_resource_start(dev, 4);
-
-// u32 reg60h = 0, reg64h = 0, reg68h = 0, reg6ch = 0;
-// u16 reg50h = 0, word88 = 0;
-// int udmasel[4]={0,0,0,0}, piosel[4]={0,0,0,0}, i=0, hd=0;
-
- switch(dev->device) {
- case PCI_DEVICE_ID_PROMISE_20275:
- p += sprintf(p, "\n PDC20275 Chipset.\n");
- break;
- case PCI_DEVICE_ID_PROMISE_20276:
- p += sprintf(p, "\n PDC20276 Chipset.\n");
- break;
- case PCI_DEVICE_ID_PROMISE_20269:
- p += sprintf(p, "\n PDC20269 TX2 Chipset.\n");
- break;
- case PCI_DEVICE_ID_PROMISE_20268:
- case PCI_DEVICE_ID_PROMISE_20268R:
- p += sprintf(p, "\n PDC20268 TX2 Chipset.\n");
- break;
-default:
- p += sprintf(p, "\n PDC202XX Chipset.\n");
- break;
- }
- return (char *)p;
-}
-
-static int pdc202xx_get_info (char *buffer, char **addr, off_t offset, int count)
-{
- char *p = buffer;
- switch(bmide_dev->device) {
- case PCI_DEVICE_ID_PROMISE_20275:
- case PCI_DEVICE_ID_PROMISE_20276:
- case PCI_DEVICE_ID_PROMISE_20269:
- case PCI_DEVICE_ID_PROMISE_20268:
- case PCI_DEVICE_ID_PROMISE_20268R:
- p = pdc202xx_info_new(buffer, bmide_dev);
- break;
- default:
- p = pdc202xx_info(buffer, bmide_dev);
- break;
- }
- return p-buffer; /* => must be less than 4k! */
-}
-#endif /* defined(DISPLAY_PDC202XX_TIMINGS) && defined(CONFIG_PROC_FS) */
-
-byte pdc202xx_proc = 0;
-
-const char *pdc_quirk_drives[] = {
- "QUANTUM FIREBALLlct08 08",
- "QUANTUM FIREBALLP KA6.4",
- "QUANTUM FIREBALLP LM20.4",
- "QUANTUM FIREBALLP KX20.5",
- "QUANTUM FIREBALLP KX27.3",
- "QUANTUM FIREBALLP LM20.5",
- NULL
-};
-
extern char *ide_xfer_verbose (byte xfer_rate);
/* A Register */
@@ -322,7 +112,6 @@
switch(registers) {
case REG_A:
- bit2 = 0;
printk("A Register ");
if (value & 0x80) printk("SYNC_IN ");
if (value & 0x40) printk("ERRDY_EN ");
@@ -335,7 +124,6 @@
printk("PIO(A) = %d ", bit2);
break;
case REG_B:
- bit1 = 0;bit2 = 0;
printk("B Register ");
if (value & 0x80) { printk("MB2 ");bit1 |= 0x80; }
if (value & 0x40) { printk("MB1 ");bit1 |= 0x40; }
@@ -349,7 +137,6 @@
printk("PIO(B) = %d ", bit2);
break;
case REG_C:
- bit2 = 0;
printk("C Register ");
if (value & 0x80) printk("DMARQp ");
if (value & 0x40) printk("IORDYp ");
@@ -379,23 +166,22 @@
#endif /* PDC202XX_DECODE_REGISTER_INFO */
-static int check_in_drive_lists(struct ata_device *drive, const char **list)
-{
+int check_in_drive_lists(struct ata_device *drive) {
+ const char *pdc_quirk_drives[] = {
+ "QUANTUM FIREBALLlct08 08",
+ "QUANTUM FIREBALLP KA6.4",
+ "QUANTUM FIREBALLP LM20.4",
+ "QUANTUM FIREBALLP KX20.5",
+ "QUANTUM FIREBALLP KX27.3",
+ "QUANTUM FIREBALLP LM20.5",
+ NULL
+ };
+ const char**list = pdc_quirk_drives;
struct hd_driveid *id = drive->id;
- if (pdc_quirk_drives == list) {
- while (*list) {
- if (strstr(id->model, *list++)) {
- return 2;
- }
- }
- } else {
- while (*list) {
- if (!strcmp(*list++,id->model)) {
- return 1;
- }
- }
- }
+ while (*list)
+ if (strstr(id->model, *list++))
+ return 2;
return 0;
}
@@ -523,6 +309,15 @@
return err;
}
+#define set_2regs(a, b) \
+ OUT_BYTE((a + adj), indexreg); \
+ OUT_BYTE(b, datareg);
+
+#define set_reg_and_wait(value, reg, delay) \
+ OUT_BYTE(value, reg); \
+ mdelay(delay);
+
+
static int pdc202xx_new_tune_chipset(struct ata_device *drive, byte speed)
{
struct ata_channel *hwif = drive->channel;
@@ -549,121 +344,79 @@
case XFER_UDMA_7:
speed = XFER_UDMA_6;
case XFER_UDMA_6:
- OUT_BYTE((0x10 + adj), indexreg);
- OUT_BYTE(0x1a, datareg);
- OUT_BYTE((0x11 + adj), indexreg);
- OUT_BYTE(0x01, datareg);
- OUT_BYTE((0x12 + adj), indexreg);
- OUT_BYTE(0xcb, datareg);
+ set_2regs(0x10, 0x1a);
+ set_2regs(0x11, 0x01);
+ set_2regs(0x12, 0xcb);
break;
case XFER_UDMA_5:
- OUT_BYTE((0x10 + adj), indexreg);
- OUT_BYTE(0x1a, datareg);
- OUT_BYTE((0x11 + adj), indexreg);
- OUT_BYTE(0x02, datareg);
- OUT_BYTE((0x12 + adj), indexreg);
- OUT_BYTE(0xcb, datareg);
+ set_2regs(0x10, 0x1a);
+ set_2regs(0x11, 0x02);
+ set_2regs(0x12, 0xcb);
break;
case XFER_UDMA_4:
- OUT_BYTE((0x10 + adj), indexreg);
- OUT_BYTE(0x1a, datareg);
- OUT_BYTE((0x11 + adj), indexreg);
- OUT_BYTE(0x03, datareg);
- OUT_BYTE((0x12 + adj), indexreg);
- OUT_BYTE(0xcd, datareg);
+ set_2regs(0x10, 0x1a);
+ set_2regs(0x11, 0x03);
+ set_2regs(0x12, 0xcd);
break;
case XFER_UDMA_3:
- OUT_BYTE((0x10 + adj), indexreg);
- OUT_BYTE(0x1a, datareg);
- OUT_BYTE((0x11 + adj), indexreg);
- OUT_BYTE(0x05, datareg);
- OUT_BYTE((0x12 + adj), indexreg);
- OUT_BYTE(0xcd, datareg);
+ set_2regs(0x10, 0x1a);
+ set_2regs(0x11, 0x05);
+ set_2regs(0x12, 0xcd);
break;
case XFER_UDMA_2:
- OUT_BYTE((0x10 + adj), indexreg);
- OUT_BYTE(0x2a, datareg);
- OUT_BYTE((0x11 + adj), indexreg);
- OUT_BYTE(0x07, datareg);
- OUT_BYTE((0x12 + adj), indexreg);
- OUT_BYTE(0xcd, datareg);
+ set_2regs(0x10, 0x2a);
+ set_2regs(0x11, 0x07);
+ set_2regs(0x12, 0xcd);
break;
case XFER_UDMA_1:
- OUT_BYTE((0x10 + adj), indexreg);
- OUT_BYTE(0x3a, datareg);
- OUT_BYTE((0x11 + adj), indexreg);
- OUT_BYTE(0x0a, datareg);
- OUT_BYTE((0x12 + adj), indexreg);
- OUT_BYTE(0xd0, datareg);
+ set_2regs(0x10, 0x3a);
+ set_2regs(0x11, 0x0a);
+ set_2regs(0x12, 0xd0);
break;
case XFER_UDMA_0:
- OUT_BYTE((0x10 + adj), indexreg);
- OUT_BYTE(0x4a, datareg);
- OUT_BYTE((0x11 + adj), indexreg);
- OUT_BYTE(0x0f, datareg);
- OUT_BYTE((0x12 + adj), indexreg);
- OUT_BYTE(0xd5, datareg);
+ set_2regs(0x10, 0x4a);
+ set_2regs(0x11, 0x0f);
+ set_2regs(0x12, 0xd5);
break;
case XFER_MW_DMA_2:
- OUT_BYTE((0x0e + adj), indexreg);
- OUT_BYTE(0x69, datareg);
- OUT_BYTE((0x0f + adj), indexreg);
- OUT_BYTE(0x25, datareg);
+ set_2regs(0x0e, 0x69);
+ set_2regs(0x0f, 0x25);
break;
case XFER_MW_DMA_1:
- OUT_BYTE((0x0e + adj), indexreg);
- OUT_BYTE(0x6b, datareg);
- OUT_BYTE((0x0f+ adj), indexreg);
- OUT_BYTE(0x27, datareg);
+ set_2regs(0x0e, 0x6b);
+ set_2regs(0x0f, 0x27);
break;
case XFER_MW_DMA_0:
- OUT_BYTE((0x0e + adj), indexreg);
- OUT_BYTE(0xdf, datareg);
- OUT_BYTE((0x0f + adj), indexreg);
- OUT_BYTE(0x5f, datareg);
+ set_2regs(0x0e, 0xdf);
+ set_2regs(0x0f, 0x5f);
break;
#else
switch (speed) {
#endif /* CONFIG_BLK_DEV_IDEDMA */
case XFER_PIO_4:
- OUT_BYTE((0x0c + adj), indexreg);
- OUT_BYTE(0x23, datareg);
- OUT_BYTE((0x0d + adj), indexreg);
- OUT_BYTE(0x09, datareg);
- OUT_BYTE((0x13 + adj), indexreg);
- OUT_BYTE(0x25, datareg);
+ set_2regs(0x0c, 0x23);
+ set_2regs(0x0d, 0x09);
+ set_2regs(0x13, 0x25);
break;
case XFER_PIO_3:
- OUT_BYTE((0x0c + adj), indexreg);
- OUT_BYTE(0x27, datareg);
- OUT_BYTE((0x0d + adj), indexreg);
- OUT_BYTE(0x0d, datareg);
- OUT_BYTE((0x13 + adj), indexreg);
- OUT_BYTE(0x35, datareg);
+ set_2regs(0x0c, 0x27);
+ set_2regs(0x0d, 0x0d);
+ set_2regs(0x13, 0x35);
break;
case XFER_PIO_2:
- OUT_BYTE((0x0c + adj), indexreg);
- OUT_BYTE(0x23, datareg);
- OUT_BYTE((0x0d + adj), indexreg);
- OUT_BYTE(0x26, datareg);
- OUT_BYTE((0x13 + adj), indexreg);
- OUT_BYTE(0x64, datareg);
+ set_2regs(0x0c, 0x23);
+ set_2regs(0x0d, 0x26);
+ set_2regs(0x13, 0x64);
break;
case XFER_PIO_1:
- OUT_BYTE((0x0c + adj), indexreg);
- OUT_BYTE(0x46, datareg);
- OUT_BYTE((0x0d + adj), indexreg);
- OUT_BYTE(0x29, datareg);
- OUT_BYTE((0x13 + adj), indexreg);
- OUT_BYTE(0xa4, datareg);
+ set_2regs(0x0c, 0x46);
+ set_2regs(0x0d, 0x29);
+ set_2regs(0x13, 0xa4);
break;
case XFER_PIO_0:
- OUT_BYTE((0x0c + adj), indexreg);
- OUT_BYTE(0xfb, datareg);
- OUT_BYTE((0x0d + adj), indexreg);
- OUT_BYTE(0x2b, datareg);
- OUT_BYTE((0x13 + adj), indexreg);
- OUT_BYTE(0xac, datareg);
+ set_2regs(0x0c, 0xfb);
+ set_2regs(0x0d, 0x2b);
+ set_2regs(0x13, 0xac);
break;
default:
;
@@ -684,21 +437,15 @@
* 180, 120, 90, 90, 90, 60, 30
* 11, 5, 4, 3, 2, 1, 0
*/
-static int config_chipset_for_pio(struct ata_device *drive, byte pio)
+static void config_chipset_for_pio(struct ata_device *drive, byte pio)
{
- byte speed = 0x00;
+ byte speed;
if (pio == 255)
speed = ata_timing_mode(drive, XFER_PIO | XFER_EPIO);
- else
- speed = XFER_PIO_0 + min_t(byte, pio, 4);
+ else speed = XFER_PIO_0 + min_t(byte, pio, 4);
- return ((int) pdc202xx_tune_chipset(drive, speed));
-}
-
-static void pdc202xx_tune_drive(struct ata_device *drive, byte pio)
-{
- (void) config_chipset_for_pio(drive, pio);
+ pdc202xx_tune_chipset(drive, speed);
}
#ifdef CONFIG_BLK_DEV_IDEDMA
@@ -833,8 +580,7 @@
if (drive->type != ATA_DISK)
return 0;
if (id->capability & 4) { /* IORDY_EN & PREFETCH_EN */
- OUT_BYTE((iordy + adj), indexreg);
- OUT_BYTE((IN_BYTE(datareg)|0x03), datareg);
+ set_2regs(iordy, (IN_BYTE(datareg)|0x03));
}
goto jumpbit_is_set;
}
@@ -971,7 +717,7 @@
on = 0;
verbose = 0;
no_dma_set:
- (void) config_chipset_for_pio(drive, 5);
+ config_chipset_for_pio(drive, 5);
}
udma_enable(drive, on, verbose);
@@ -979,11 +725,6 @@
return 0;
}
-int pdc202xx_quirkproc(struct ata_device *drive)
-{
- return ((int) check_in_drive_lists(drive, pdc_quirk_drives));
-}
-
static int pdc202xx_udma_start(struct ata_device *drive, struct request *rq)
{
u8 clock = 0;
@@ -1119,15 +860,7 @@
return (dma_stat & 4) == 4; /* return 1 if INTR asserted */
}
-static void pdc202xx_udma_timeout(struct ata_device *drive)
-{
- if (!drive->channel->resetproc)
- return;
- /* Assume naively that resetting the drive may help. */
- drive->channel->resetproc(drive);
-}
-
-static void pdc202xx_udma_irq_lost(struct ata_device *drive)
+static void pdc202xx_bug (struct ata_device *drive)
{
if (!drive->channel->resetproc)
return;
@@ -1143,10 +876,8 @@
void pdc202xx_new_reset(struct ata_device *drive)
{
- OUT_BYTE(0x04,IDE_CONTROL_REG);
- mdelay(1000);
- OUT_BYTE(0x00,IDE_CONTROL_REG);
- mdelay(1000);
+ set_reg_and_wait(0x04,IDE_CONTROL_REG, 1000);
+ set_reg_and_wait(0x00,IDE_CONTROL_REG, 1000);
printk("PDC202XX: %s channel reset.\n",
drive->channel->unit ? "Secondary" : "Primary");
}
@@ -1156,40 +887,12 @@
unsigned long high_16 = pci_resource_start(drive->channel->pci_dev, 4);
byte udma_speed_flag = IN_BYTE(high_16 + 0x001f);
- OUT_BYTE(udma_speed_flag | 0x10, high_16 + 0x001f);
- mdelay(100);
- OUT_BYTE(udma_speed_flag & ~0x10, high_16 + 0x001f);
- mdelay(2000); /* 2 seconds ?! */
+ set_reg_and_wait(udma_speed_flag | 0x10, high_16 + 0x001f, 100);
+ set_reg_and_wait(udma_speed_flag & ~0x10, high_16 + 0x001f, 2000); /* 2 seconds ?! */
printk("PDC202XX: %s channel reset.\n",
drive->channel->unit ? "Secondary" : "Primary");
}
-/*
- * Since SUN Cobalt is attempting to do this operation, I should disclose
- * this has been a long time ago Thu Jul 27 16:40:57 2000 was the patch date
- * HOTSWAP ATA Infrastructure.
- */
-static int pdc202xx_tristate(struct ata_device * drive, int state)
-{
-#if 0
- struct ata_channel *hwif = drive->channel;
- unsigned long high_16 = pci_resource_start(hwif->pci_dev, 4);
- byte sc1f = inb(high_16 + 0x001f);
-
- if (!hwif)
- return -EINVAL;
-
-// hwif->bus_state = state;
-
- if (state) {
- outb(sc1f | 0x08, high_16 + 0x001f);
- } else {
- outb(sc1f & ~0x08, high_16 + 0x001f);
- }
-#endif
- return 0;
-}
-
static unsigned int __init pdc202xx_init_chipset(struct pci_dev *dev)
{
unsigned long high_16 = pci_resource_start(dev, 4);
@@ -1213,10 +916,8 @@
break;
case PCI_DEVICE_ID_PROMISE_20267:
case PCI_DEVICE_ID_PROMISE_20265:
- OUT_BYTE(udma_speed_flag | 0x10, high_16 + 0x001f);
- mdelay(100);
- OUT_BYTE(udma_speed_flag & ~0x10, high_16 + 0x001f);
- mdelay(2000); /* 2 seconds ?! */
+ set_reg_and_wait(udma_speed_flag | 0x10, high_16 + 0x001f, 100);
+ set_reg_and_wait(udma_speed_flag & ~0x10, high_16 + 0x001f, 2000); /* 2 seconds ?! */
break;
case PCI_DEVICE_ID_PROMISE_20262:
/*
@@ -1229,10 +930,8 @@
* reset leaves the timing registers intact,
* but resets the drives.
*/
- OUT_BYTE(udma_speed_flag | 0x10, high_16 + 0x001f);
- mdelay(100);
- OUT_BYTE(udma_speed_flag & ~0x10, high_16 + 0x001f);
- mdelay(2000); /* 2 seconds ?! */
+ set_reg_and_wait(udma_speed_flag | 0x10, high_16 + 0x001f, 100);
+ set_reg_and_wait(udma_speed_flag & ~0x10, high_16 + 0x001f, 2000); /* 2 seconds ?! */
default:
if ((dev->class >> 8) != PCI_CLASS_STORAGE_IDE) {
byte irq = 0, irq2 = 0;
@@ -1265,31 +964,7 @@
}
#endif /* CONFIG_PDC202XX_BURST */
-#ifdef CONFIG_PDC202XX_MASTER
- if (!(primary_mode & 1)) {
- printk("%s: FORCING PRIMARY MODE BIT 0x%02x -> 0x%02x ",
- dev->name, primary_mode, (primary_mode|1));
- OUT_BYTE(primary_mode|1, high_16 + 0x001a);
- printk("%s\n", (IN_BYTE(high_16 + 0x001a) & 1) ? "MASTER" : "PCI");
- }
-
- if (!(secondary_mode & 1)) {
- printk("%s: FORCING SECONDARY MODE BIT 0x%02x -> 0x%02x ",
- dev->name, secondary_mode, (secondary_mode|1));
- OUT_BYTE(secondary_mode|1, high_16 + 0x001b);
- printk("%s\n", (IN_BYTE(high_16 + 0x001b) & 1) ? "MASTER" : "PCI");
- }
-#endif
-
fttk_tx_series:
-
-#if defined(DISPLAY_PDC202XX_TIMINGS) && defined(CONFIG_PROC_FS)
- if (!pdc202xx_proc) {
- pdc202xx_proc = 1;
- bmide_dev = dev;
- pdc202xx_display_info = &pdc202xx_get_info;
- }
-#endif
return dev->irq;
}
@@ -1314,8 +989,8 @@
static void __init ide_init_pdc202xx(struct ata_channel *hwif)
{
- hwif->tuneproc = &pdc202xx_tune_drive;
- hwif->quirkproc = &pdc202xx_quirkproc;
+ hwif->tuneproc = &config_chipset_for_pio;
+ hwif->quirkproc = &check_in_drive_lists;
switch(hwif->pci_dev->device) {
case PCI_DEVICE_ID_PROMISE_20275:
@@ -1329,7 +1004,6 @@
case PCI_DEVICE_ID_PROMISE_20267:
case PCI_DEVICE_ID_PROMISE_20265:
case PCI_DEVICE_ID_PROMISE_20262:
- hwif->busproc = &pdc202xx_tristate;
hwif->resetproc = &pdc202xx_reset;
case PCI_DEVICE_ID_PROMISE_20246:
hwif->speedproc = &pdc202xx_tune_chipset;
@@ -1337,19 +1011,13 @@
break;
}
-#undef CONFIG_PDC202XX_32_UNMASK
-#ifdef CONFIG_PDC202XX_32_UNMASK
- hwif->io_32bit = 1;
- hwif->unmask = 1;
-#endif
-
#ifdef CONFIG_BLK_DEV_IDEDMA
if (hwif->dma_base) {
hwif->udma_start = pdc202xx_udma_start;
hwif->udma_stop = pdc202xx_udma_stop;
hwif->udma_irq_status = pdc202xx_udma_irq_status;
- hwif->udma_irq_lost = pdc202xx_udma_irq_lost;
- hwif->udma_timeout = pdc202xx_udma_timeout;
+ hwif->udma_irq_lost = pdc202xx_bug;
+ hwif->udma_timeout = pdc202xx_bug;
hwif->XXX_udma = pdc202xx_dmaproc;
hwif->highmem = 1;
if (!noautodma)
@@ -1509,7 +1177,7 @@
int __init init_pdc202xx(void)
{
- int i;
+ unsigned int i;
for (i = 0; i < ARRAY_SIZE(chipsets); ++i) {
ata_register_chipset(&chipsets[i]);
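A side note on the set_2regs() and set_reg_and_wait() helpers introduced in the pdc202xx.c hunks: both expand to two statements, so they are only safe where a full statement inside braces or a case body is expected, which is how the patch uses them. Below is a minimal sketch of the usual do { } while (0) wrapping. It is not part of the patch. The port I/O is replaced by a hypothetical log_write() that records writes into an array so the example runs in user space, and adj, indexreg and datareg are passed explicitly instead of being picked up from the enclosing scope. The register values are the XFER_UDMA_6 ones from the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for OUT_BYTE(): records (value, port) pairs. */
struct io_write {
	unsigned char value;
	unsigned long port;
};

static struct io_write io_log[16];
static size_t io_log_len;

static void log_write(unsigned char value, unsigned long port)
{
	assert(io_log_len < sizeof(io_log) / sizeof(io_log[0]));
	io_log[io_log_len].value = value;
	io_log[io_log_len].port = port;
	io_log_len++;
}

/*
 * Select an indexed register, then write its data byte.
 * The do { } while (0) wrapper turns the expansion into one statement,
 * so the macro can sit in an unbraced if/else.
 */
#define SET_2REGS(reg, val, adj, indexreg, datareg)			\
	do {								\
		log_write((unsigned char)((reg) + (adj)), (indexreg));	\
		log_write((val), (datareg));				\
	} while (0)

/* Program the three UDMA timing registers only when 'enable' is set. */
static void tune_udma(int enable, unsigned char adj,
		      unsigned long indexreg, unsigned long datareg)
{
	if (enable)
		SET_2REGS(0x10, 0x1a, adj, indexreg, datareg);
	else
		return;
	SET_2REGS(0x11, 0x01, adj, indexreg, datareg);
	SET_2REGS(0x12, 0xcb, adj, indexreg, datareg);
}
```

With the two-statement form used in the patch, the unbraced if/else in tune_udma() would not compile, because the second statement of the expansion would separate the if from its else.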
diff -urN linux-2.5.15/drivers/ide/quirks.c linux/drivers/ide/quirks.c
--- linux-2.5.15/drivers/ide/quirks.c 1970-01-01 01:00:00.000000000 +0100
+++ linux/drivers/ide/quirks.c 2002-05-15 13:26:34.000000000 +0200
@@ -0,0 +1,231 @@
+/**** vi:set ts=8 sts=8 sw=8:************************************************
+ *
+ * Copyright (C) 2002 Marcin Dalecki <martin@dalecki.de>
+ *
+ * Copyright (c) 1999-2000 Andre Hedrick <andre@linux-ide.org>
+ * Copyright (c) 1995-1998 Mark Lord
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published by
+ * the Free Software Foundation.
+ */
+
+/*
+ * Just the black and white list handling for BM-DMA operation.
+ */
+
+#include <linux/config.h>
+#define __NO_VERSION__
+#include <linux/module.h>
+#include <linux/types.h>
+#include <linux/kernel.h>
+#include <linux/timer.h>
+#include <linux/mm.h>
+#include <linux/interrupt.h>
+#include <linux/pci.h>
+#include <linux/init.h>
+#include <linux/ide.h>
+#include <linux/delay.h>
+
+#include <asm/io.h>
+#include <asm/irq.h>
+
+#ifdef CONFIG_IDEDMA_NEW_DRIVE_LISTINGS
+
+struct drive_list_entry {
+ char * id_model;
+ char * id_firmware;
+};
+
+struct drive_list_entry drive_whitelist[] = {
+ { "Micropolis 2112A", NULL },
+ { "CONNER CTMA 4000", NULL },
+ { "CONNER CTT8000-A", NULL },
+ { "ST34342A", NULL },
+ { NULL, NULL }
+};
+
+struct drive_list_entry drive_blacklist[] = {
+
+ { "WDC AC11000H", NULL },
+ { "WDC AC22100H", NULL },
+ { "WDC AC32500H", NULL },
+ { "WDC AC33100H", NULL },
+ { "WDC AC31600H", NULL },
+ { "WDC AC32100H", "24.09P07" },
+ { "WDC AC23200L", "21.10N21" },
+ { "Compaq CRD-8241B", NULL },
+ { "CRD-8400B", NULL },
+ { "CRD-8480B", NULL },
+ { "CRD-8480C", NULL },
+ { "CRD-8482B", NULL },
+ { "CRD-84", NULL },
+ { "SanDisk SDP3B", NULL },
+ { "SanDisk SDP3B-64", NULL },
+ { "SANYO CD-ROM CRD", NULL },
+ { "HITACHI CDR-8", NULL },
+ { "HITACHI CDR-8335", NULL },
+ { "HITACHI CDR-8435", NULL },
+ { "Toshiba CD-ROM XM-6202B", NULL },
+ { "CD-532E-A", NULL },
+ { "E-IDE CD-ROM CR-840", NULL },
+ { "CD-ROM Drive/F5A", NULL },
+ { "RICOH CD-R/RW MP7083A", NULL },
+ { "WPI CDD-820", NULL },
+ { "SAMSUNG CD-ROM SC-148C", NULL },
+ { "SAMSUNG CD-ROM SC-148F", NULL },
+ { "SAMSUNG CD-ROM SC", NULL },
+ { "SanDisk SDP3B-64", NULL },
+ { "SAMSUNG CD-ROM SN-124", NULL },
+ { "PLEXTOR CD-R PX-W8432T", NULL },
+ { "ATAPI CD-ROM DRIVE 40X MAXIMUM", NULL },
+ { "_NEC DV5800A", NULL },
+ { NULL, NULL }
+
+};
+
+static int in_drive_list(struct hd_driveid *id, struct drive_list_entry * drive_table)
+{
+ for ( ; drive_table->id_model ; drive_table++)
+ if ((!strcmp(drive_table->id_model, id->model)) &&
+ ((drive_table->id_firmware && !strstr(drive_table->id_firmware, id->fw_rev)) ||
+ (!drive_table->id_firmware)))
+ return 1;
+ return 0;
+}
+
+#else
+
+/*
+ * good_dma_drives() lists the model names (from "hdparm -i")
+ * of drives which do not support mode2 DMA but which are
+ * known to work fine with this interface under Linux.
+ */
+const char *good_dma_drives[] = {"Micropolis 2112A",
+ "CONNER CTMA 4000",
+ "CONNER CTT8000-A",
+ "ST34342A", /* for Sun Ultra */
+ NULL};
+
+/*
+ * bad_dma_drives() lists the model names (from "hdparm -i")
+ * of drives which supposedly support (U)DMA but which are
+ * known to corrupt data with this interface under Linux.
+ *
+ * This is an empirical list. It's generated from bug reports. That means
+ * while it reflects actual problem distributions it doesn't answer whether
+ * the drive or the controller, or cabling, or software, or some combination
+ * thereof is the fault. If you don't happen to agree with the kernel's
+ * opinion of your drive - use hdparm to turn DMA on.
+ */
+const char *bad_dma_drives[] = {"WDC AC11000H",
+ "WDC AC22100H",
+ "WDC AC32100H",
+ "WDC AC32500H",
+ "WDC AC33100H",
+ "WDC AC31600H",
+ NULL};
+
+#endif
+
+/*
+ * For both Blacklisted and Whitelisted drives.
+ * This is setup to be called as an extern for future support
+ * to other special driver code.
+ */
+int check_drive_lists(struct ata_device *drive, int good_bad)
+{
+ struct hd_driveid *id = drive->id;
+
+#ifdef CONFIG_IDEDMA_NEW_DRIVE_LISTINGS
+ if (good_bad) {
+ return in_drive_list(id, drive_whitelist);
+ } else {
+ int blacklist = in_drive_list(id, drive_blacklist);
+ if (blacklist)
+ printk("%s: Disabling (U)DMA for %s\n", drive->name, id->model);
+ return(blacklist);
+ }
+#else
+ const char **list;
+
+ if (good_bad) {
+ /* Consult the list of known "good" drives */
+ list = good_dma_drives;
+ while (*list) {
+ if (!strcmp(*list++,id->model))
+ return 1;
+ }
+ } else {
+ /* Consult the list of known "bad" drives */
+ list = bad_dma_drives;
+ while (*list) {
+ if (!strcmp(*list++,id->model)) {
+ printk("%s: Disabling (U)DMA for %s\n",
+ drive->name, id->model);
+ return 1;
+ }
+ }
+ }
+#endif
+ return 0;
+}
+
+void udma_print(struct ata_device *drive)
+{
+#ifdef CONFIG_ARCH_ACORN
+ printk(", DMA");
+#else
+ struct hd_driveid *id = drive->id;
+ char *str = NULL;
+
+ if ((id->field_valid & 4) && (eighty_ninty_three(drive)) &&
+ (id->dma_ultra & (id->dma_ultra >> 14) & 3)) {
+ if ((id->dma_ultra >> 15) & 1)
+ str = ", UDMA(mode 7)"; /* UDMA BIOS-enabled! */
+ else
+ str = ", UDMA(133)"; /* UDMA BIOS-enabled! */
+ } else if ((id->field_valid & 4) && (eighty_ninty_three(drive)) &&
+ (id->dma_ultra & (id->dma_ultra >> 11) & 7)) {
+ if ((id->dma_ultra >> 13) & 1) {
+ str = ", UDMA(100)"; /* UDMA BIOS-enabled! */
+ } else if ((id->dma_ultra >> 12) & 1) {
+ str = ", UDMA(66)"; /* UDMA BIOS-enabled! */
+ } else {
+ str = ", UDMA(44)"; /* UDMA BIOS-enabled! */
+ }
+ } else if ((id->field_valid & 4) &&
+ (id->dma_ultra & (id->dma_ultra >> 8) & 7)) {
+ if ((id->dma_ultra >> 10) & 1) {
+ str = ", UDMA(33)"; /* UDMA BIOS-enabled! */
+ } else if ((id->dma_ultra >> 9) & 1) {
+ str = ", UDMA(25)"; /* UDMA BIOS-enabled! */
+ } else {
+ str = ", UDMA(16)"; /* UDMA BIOS-enabled! */
+ }
+ } else if (id->field_valid & 4)
+ str = ", (U)DMA"; /* Can be BIOS-enabled! */
+ else
+ str = ", DMA";
+
+ printk(str);
+#endif
+}
+
+/*
+ * Drive black/white list handling for UDMA capability:
+ */
+
+int udma_black_list(struct ata_device *drive)
+{
+ return check_drive_lists(drive, 0);
+}
+
+int udma_white_list(struct ata_device *drive)
+{
+ return check_drive_lists(drive, 1);
+}
+
+EXPORT_SYMBOL(udma_print);
+EXPORT_SYMBOL(udma_black_list);
+EXPORT_SYMBOL(udma_white_list);
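For readers following the list handling in quirks.c, here is a minimal user-space sketch of the table walk. It assumes a simplified rule: a NULL firmware field matches any revision, and a non-NULL one must equal the drive's revision exactly. That rule is an assumption made for illustration. in_drive_list() in the patch tests the firmware field with !strstr(), so it matches when the drive's revision string is not found in the table entry. struct drive_id, sample_blacklist and drive_listed() are hypothetical names, not kernel API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Same shape as the table entries in the patch, with const added. */
struct drive_list_entry {
	const char *id_model;
	const char *id_firmware;
};

/* Hypothetical cut-down stand-in for struct hd_driveid. */
struct drive_id {
	const char *model;
	const char *fw_rev;
};

/* Two entries borrowed from the patch's blacklist, NULL-terminated. */
static const struct drive_list_entry sample_blacklist[] = {
	{ "WDC AC11000H", NULL },
	{ "WDC AC32100H", "24.09P07" },
	{ NULL, NULL }
};

/*
 * Return 1 if the drive is listed, 0 otherwise.
 * Assumed rule: model must match exactly; a NULL firmware entry is a
 * wildcard, a non-NULL one must equal the drive's firmware revision.
 */
static int drive_listed(const struct drive_id *id,
			const struct drive_list_entry *table)
{
	assert(id != NULL && table != NULL);
	for (; table->id_model; table++) {
		if (strcmp(table->id_model, id->model) != 0)
			continue;
		if (!table->id_firmware ||
		    strcmp(table->id_firmware, id->fw_rev) == 0)
			return 1;
	}
	return 0;
}
```

Keeping the model test and the firmware test as two separate ifs makes the intended rule easier to audit than one combined condition.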
diff -urN linux-2.5.15/include/linux/ide.h linux/include/linux/ide.h
--- linux-2.5.15/include/linux/ide.h 2002-05-15 14:55:12.000000000 +0200
+++ linux/include/linux/ide.h 2002-05-15 14:14:15.000000000 +0200
@@ -820,22 +820,55 @@
void __init ide_scan_pcibus(int scan_direction);
#endif
+
+static inline void udma_enable(struct ata_device *drive, int on, int verbose)
+{
+ drive->channel->udma_enable(drive, on, verbose);
+}
+
+static inline int udma_start(struct ata_device *drive, struct request *rq)
+{
+ return drive->channel->udma_start(drive, rq);
+}
+
+static inline int udma_stop(struct ata_device *drive)
+{
+ return drive->channel->udma_stop(drive);
+}
+
+static inline int udma_read(struct ata_device *drive, struct request *rq)
+{
+ return drive->channel->udma_read(drive, rq);
+}
+
+static inline int udma_write(struct ata_device *drive, struct request *rq)
+{
+ return drive->channel->udma_write(drive, rq);
+}
+
+static inline int udma_irq_status(struct ata_device *drive)
+{
+ return drive->channel->udma_irq_status(drive);
+}
+
+static inline void udma_timeout(struct ata_device *drive)
+{
+ return drive->channel->udma_timeout(drive);
+}
+
+static inline void udma_irq_lost(struct ata_device *drive)
+{
+ return drive->channel->udma_irq_lost(drive);
+}
+
#ifdef CONFIG_BLK_DEV_IDEDMA
extern int udma_new_table(struct ata_channel *, struct request *);
extern void udma_destroy_table(struct ata_channel *);
extern void udma_print(struct ata_device *);
-extern void udma_enable(struct ata_device *, int, int);
extern int udma_black_list(struct ata_device *);
extern int udma_white_list(struct ata_device *);
-extern void udma_timeout(struct ata_device *);
-extern void udma_irq_lost(struct ata_device *);
-extern int udma_start(struct ata_device *, struct request *rq);
-extern int udma_stop(struct ata_device *);
-extern int udma_read(struct ata_device *, struct request *rq);
-extern int udma_write(struct ata_device *, struct request *rq);
-extern int udma_irq_status(struct ata_device *);
extern int ata_do_udma(unsigned int reading, struct ata_device *drive, struct request *rq);
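The inline wrappers added to ide.h in this hunk all follow one pattern: forward the call through a function pointer stored in the drive's channel. Below is a minimal user-space sketch of that pattern, with cut-down hypothetical structs (struct channel, struct device) standing in for struct ata_channel and struct ata_device, and a made-up demo driver supplying the methods. The asserts are only there for the sketch; the wrappers in the patch do no NULL check. The sketch also drops the "return" in the void wrapper, since ISO C does not allow a return with an expression in a void function.

```c
#include <assert.h>
#include <stddef.h>

struct device;

/* Hypothetical cut-down channel: just two of the per-chipset methods. */
struct channel {
	int  (*udma_irq_status)(struct device *);
	void (*udma_timeout)(struct device *);
};

/* Hypothetical cut-down device; 'timeouts' exists only for the demo. */
struct device {
	struct channel *channel;
	int timeouts;
};

/* Generic entry points: dispatch to whatever the channel installed. */
static inline int udma_irq_status(struct device *drive)
{
	assert(drive->channel->udma_irq_status != NULL);
	return drive->channel->udma_irq_status(drive);
}

static inline void udma_timeout(struct device *drive)
{
	assert(drive->channel->udma_timeout != NULL);
	drive->channel->udma_timeout(drive);
}

/* Made-up chipset driver methods. */
static int demo_irq_status(struct device *drive)
{
	(void)drive;
	return 1;	/* pretend the interrupt line is asserted */
}

static void demo_timeout(struct device *drive)
{
	drive->timeouts++;
}

static struct channel demo_channel = {
	.udma_irq_status = demo_irq_status,
	.udma_timeout = demo_timeout,
};
```

A chipset driver then only has to fill in the pointers, as ide_init_pdc202xx() does for hwif->udma_irq_lost and hwif->udma_timeout in the pdc202xx.c part of the patch.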
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 64
2002-05-15 12:04 ` [PATCH] 2.5.15 IDE 64 Martin Dalecki
@ 2002-05-15 13:12 ` Russell King
2002-05-15 12:14 ` Martin Dalecki
0 siblings, 1 reply; 265+ messages in thread
From: Russell King @ 2002-05-15 13:12 UTC (permalink / raw)
To: Martin Dalecki; +Cc: Linus Torvalds, Kernel Mailing List
On Wed, May 15, 2002 at 02:04:24PM +0200, Martin Dalecki wrote:
> Tue May 14 13:35:04 CEST 2002 ide-clean-64:
>
> Let's just get over with this before queue handling will be targeted again...
>
> - Implement suggestions by Russel King for improved portability and separation
RusseLL please. 8)
--
Russell King (rmk@arm.linux.org.uk) The developer of ARM Linux
http://www.arm.linux.org.uk/personal/aboutme.html
^ permalink raw reply [flat|nested] 265+ messages in thread
* Re: [PATCH] 2.5.15 IDE 64
2002-05-15 13:12 ` Russell King
@ 2002-05-15 12:14 ` Martin Dalecki
0 siblings, 0 replies; 265+ messages in thread
From: Martin Dalecki @ 2002-05-15 12:14 UTC (permalink / raw)
To: Russell King; +Cc: Linus Torvalds, Kernel Mailing List
Russell King wrote:
> On Wed, May 15, 2002 at 02:04:24PM +0200, Martin Dalecki wrote:
>
>>Tue May 14 13:35:04 CEST 2002 ide-clean-64:
>>
>>Let's just get over with this before queue handling will be targeted again...
>>
>>- Implement suggestions by Russel King for improved portability and separation
>
>
> RusseLL please. 8)
Please accept my deep and thorough apologies for the spelling mistake.
Would maybe just "The ARM King" suit you as well? ;-).
^ permalink raw reply [flat|nested] 265+ messages in thread