linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [BK PATCHES] add ata scsi driver
@ 2003-05-26 18:12 James Bottomley
  2003-05-26 18:18 ` Jens Axboe
  0 siblings, 1 reply; 68+ messages in thread
From: James Bottomley @ 2003-05-26 18:12 UTC (permalink / raw)
  To: torvalds, Jens Axboe; +Cc: Linux Kernel


    On Mon, May 26 2003, Linus Torvalds wrote:
    > > What does the block layer need, that it doesn't have now?
    > 
    > Exactly. I'd _love_ for people to really think about this.
    
    In discussion with Jeff, it seems most of what he wants is already
    there. He just doesn't know it yet :-)
    
    Maybe that's my problem as well, maybe the code / comments / doc /
    whatever is not clear enough.
    
My wishlist for this would be:

1. Unified SG segment allocation.  The SCSI layer currently has a
mempool implementation to cope with this, is there a reason it can't
become block generic?

2. Device locality awareness.  Quite a bit of the esoteric SCSI queueing
code occurs because we have two type of queue events:
a. device can't accept another command---stop queue and restart when the
device sends a completion back
b. the host adapter is out of resources for *all* its devices.  Block
all device queues until we free some resources (again, usually a
returning command).

3. Perhaps some type of unified command handling.  At the moment, we all
seem to allocate DMA'able regions for our commands/taskfiles/whatever
and attach them to reqest->special.  Then we need to release them again
before completing the request.

4. Same thing goes for sense buffers.

5. There needs to be some amalgam of the SCSI code for dynamic tag
command queue depth handling.

OK, I'll stop now.

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 18:12 [BK PATCHES] add ata scsi driver James Bottomley
@ 2003-05-26 18:18 ` Jens Axboe
  2003-05-26 18:47   ` James Bottomley
  0 siblings, 1 reply; 68+ messages in thread
From: Jens Axboe @ 2003-05-26 18:18 UTC (permalink / raw)
  To: James Bottomley; +Cc: torvalds, Linux Kernel

On Mon, May 26 2003, James Bottomley wrote:
> 
>     On Mon, May 26 2003, Linus Torvalds wrote:
>     > > What does the block layer need, that it doesn't have now?
>     > 
>     > Exactly. I'd _love_ for people to really think about this.
>     
>     In discussion with Jeff, it seems most of what he wants is already
>     there. He just doesn't know it yet :-)
>     
>     Maybe that's my problem as well, maybe the code / comments / doc /
>     whatever is not clear enough.
>     
> My wishlist for this would be:
> 
> 1. Unified SG segment allocation.  The SCSI layer currently has a
> mempool implementation to cope with this, is there a reason it can't
> become block generic?

Of course that is doable, when I killed scsi_dma.c it was just a direct
replacement. Given that IDE had no such dynamic sg list allocation
requirements, it stayed in SCSI. Overdesign is never good :)

> 2. Device locality awareness.  Quite a bit of the esoteric SCSI queueing
> code occurs because we have two type of queue events:
> a. device can't accept another command---stop queue and restart when the
> device sends a completion back

This should be doable.

> b. the host adapter is out of resources for *all* its devices.  Block
> all device queues until we free some resources (again, usually a
> returning command).

This is harder, because it involves more than one specific queue.

> 3. Perhaps some type of unified command handling.  At the moment, we all
> seem to allocate DMA'able regions for our commands/taskfiles/whatever
> and attach them to reqest->special.  Then we need to release them again
> before completing the request.
> 
> 4. Same thing goes for sense buffers.

Completely agree.

> 5. There needs to be some amalgam of the SCSI code for dynamic tag
> command queue depth handling.

Again, block layer queueing was designed for what I needed (ide tcq) and
no overdesign was attempted. If you describe what you need, I'd be very
happy to oblige and add those bits. Some decent depth change handling, I
presume?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 18:18 ` Jens Axboe
@ 2003-05-26 18:47   ` James Bottomley
  2003-05-26 19:07     ` Jens Axboe
  0 siblings, 1 reply; 68+ messages in thread
From: James Bottomley @ 2003-05-26 18:47 UTC (permalink / raw)
  To: Jens Axboe; +Cc: torvalds, Linux Kernel

On Mon, 2003-05-26 at 14:18, Jens Axboe wrote:
> > 1. Unified SG segment allocation.  The SCSI layer currently has a
> > mempool implementation to cope with this, is there a reason it can't
> > become block generic?
> 
> Of course that is doable, when I killed scsi_dma.c it was just a direct
> replacement. Given that IDE had no such dynamic sg list allocation
> requirements, it stayed in SCSI. Overdesign is never good :)

I agree with the sentiment.  I just don't think variable size SG tables
will remain the exclusive province of SCSI forever.

> > b. the host adapter is out of resources for *all* its devices.  Block
> > all device queues until we free some resources (again, usually a
> > returning command).
> 
> This is harder, because it involves more than one specific queue.

Yes, this is our nastycase, especially for locking and ref
counting...you didn't say I only had to hand off the easy problems,
though...

Hotpluggin has to have some awareness of this locality too.  Even for
IDE, hot unplug a card and you can lose two devices per cable.

> > 5. There needs to be some amalgam of the SCSI code for dynamic tag
> > command queue depth handling.
> 
> Again, block layer queueing was designed for what I needed (ide tcq) and
> no overdesign was attempted. If you describe what you need, I'd be very
> happy to oblige and add those bits. Some decent depth change handling, I
> presume?

Pretty much yes, now.  We lost all of our memory allocation nightmare
problems when we moved away from fixed command queues per device to lazy
command allocation using slabs.

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 18:47   ` James Bottomley
@ 2003-05-26 19:07     ` Jens Axboe
  2003-05-26 19:17       ` James Bottomley
  0 siblings, 1 reply; 68+ messages in thread
From: Jens Axboe @ 2003-05-26 19:07 UTC (permalink / raw)
  To: James Bottomley; +Cc: torvalds, Linux Kernel

On Mon, May 26 2003, James Bottomley wrote:
> On Mon, 2003-05-26 at 14:18, Jens Axboe wrote:
> > > 1. Unified SG segment allocation.  The SCSI layer currently has a
> > > mempool implementation to cope with this, is there a reason it can't
> > > become block generic?
> > 
> > Of course that is doable, when I killed scsi_dma.c it was just a direct
> > replacement. Given that IDE had no such dynamic sg list allocation
> > requirements, it stayed in SCSI. Overdesign is never good :)
> 
> I agree with the sentiment.  I just don't think variable size SG tables
> will remain the exclusive province of SCSI forever.

Agree, until that becomes a problem I don't see a reason to rip it out
of SCSI.

> > > b. the host adapter is out of resources for *all* its devices.  Block
> > > all device queues until we free some resources (again, usually a
> > > returning command).
> > 
> > This is harder, because it involves more than one specific queue.
> 
> Yes, this is our nastycase, especially for locking and ref
> counting...you didn't say I only had to hand off the easy problems,
> though...

:)

I really think this should be handled in the SCSI layer. You are dealing
with devices hanging off a hardware unit of some sort.

It's also possible to go too far with this whole abstraction deal. The
resulting code ends up being contorted, instead of having to separate
'straight forward' cases in SCSI and XYZ (for instance).

> Hotpluggin has to have some awareness of this locality too.  Even for
> IDE, hot unplug a card and you can lose two devices per cable.

Hmmm yes. Now we are moving into general device management, though.

> > > 5. There needs to be some amalgam of the SCSI code for dynamic tag
> > > command queue depth handling.
> > 
> > Again, block layer queueing was designed for what I needed (ide tcq) and
> > no overdesign was attempted. If you describe what you need, I'd be very
> > happy to oblige and add those bits. Some decent depth change handling, I
> > presume?
> 
> Pretty much yes, now.  We lost all of our memory allocation nightmare
> problems when we moved away from fixed command queues per device to lazy
> command allocation using slabs.

Alright, so what do you need? Start out with X tags, shrink to Y (based
on repeated queue full conditions)? Anything else?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 19:07     ` Jens Axboe
@ 2003-05-26 19:17       ` James Bottomley
  2003-05-26 19:33         ` Jens Axboe
  2003-05-26 20:27         ` Linus Torvalds
  0 siblings, 2 replies; 68+ messages in thread
From: James Bottomley @ 2003-05-26 19:17 UTC (permalink / raw)
  To: Jens Axboe; +Cc: torvalds, Linux Kernel

On Mon, 2003-05-26 at 15:07, Jens Axboe wrote:
> Alright, so what do you need? Start out with X tags, shrink to Y (based
> on repeated queue full conditions)? Anything else?

Actually, it's easier than that: just an API to alter the number of tags
in the block layer (really only the size of your internal hash table). 
The actual heuristics of when to alter the queue depth is the province
of the individual drivers (although Doug Ledford was going to come up
with a generic implementation).

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 19:17       ` James Bottomley
@ 2003-05-26 19:33         ` Jens Axboe
  2003-05-27 12:39           ` Jens Axboe
  2003-05-26 20:27         ` Linus Torvalds
  1 sibling, 1 reply; 68+ messages in thread
From: Jens Axboe @ 2003-05-26 19:33 UTC (permalink / raw)
  To: James Bottomley; +Cc: torvalds, Linux Kernel

On Mon, May 26 2003, James Bottomley wrote:
> On Mon, 2003-05-26 at 15:07, Jens Axboe wrote:
> > Alright, so what do you need? Start out with X tags, shrink to Y (based
> > on repeated queue full conditions)? Anything else?
> 
> Actually, it's easier than that: just an API to alter the number of tags
> in the block layer (really only the size of your internal hash table). 
> The actual heuristics of when to alter the queue depth is the province
> of the individual drivers (although Doug Ledford was going to come up
> with a generic implementation).

That's actually what I meant, that the SCSI layer would call down into
the block layer to set the size. I don't/want to know about queue full
conditions.

The internal memory requirements for the queue table is small (a bit per
tag), so I think we can basically get away with just decrementing
->max_depth.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 19:17       ` James Bottomley
  2003-05-26 19:33         ` Jens Axboe
@ 2003-05-26 20:27         ` Linus Torvalds
  2003-05-26 20:36           ` James Bottomley
  2003-05-26 20:38           ` Jens Axboe
  1 sibling, 2 replies; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26 20:27 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, Linux Kernel


On 26 May 2003, James Bottomley wrote:
>
> On Mon, 2003-05-26 at 15:07, Jens Axboe wrote:
> > Alright, so what do you need? Start out with X tags, shrink to Y (based
> > on repeated queue full conditions)? Anything else?
> 
> Actually, it's easier than that: just an API to alter the number of tags
> in the block layer (really only the size of your internal hash table). 
> The actual heuristics of when to alter the queue depth is the province
> of the individual drivers (although Doug Ledford was going to come up
> with a generic implementation).

Talking about tagged queueing - does the SCSI layer still remove the
request from the request list when it starts executing it?

At least historically that's a major mistake, and generates a crappy 
elevator, because it removes information from the block layer about where 
the disk is (or is going to be).

I know Andrew thinks that SCSI tagged queuing is a bunch of crap, and he 
has the latency numbers to prove it. He blames the SCSI disks themselves, 
but I think it might be the fact that SCSI makes it impossible to make a 
fair queuing algorithm for higher levels by hiding information.

Has anybody looked at just removing the request at command _completion_ 
time instead? That's what IDE does, and it's the _right_ thing to do.

I'd hate for SATA to pick up these kinds of mistakes from the SCSI layer.

			Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:27         ` Linus Torvalds
@ 2003-05-26 20:36           ` James Bottomley
  2003-05-26 20:45             ` Linus Torvalds
  2003-05-26 20:38           ` Jens Axboe
  1 sibling, 1 reply; 68+ messages in thread
From: James Bottomley @ 2003-05-26 20:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, Linux Kernel

On Mon, 2003-05-26 at 16:27, Linus Torvalds wrote:
> Talking about tagged queueing - does the SCSI layer still remove the
> request from the request list when it starts executing it?

That's a block layer requirement (blk_queue_start_tag does the dequeue).
But yes, for untagged requests, we still dequeue them manually.

> At least historically that's a major mistake, and generates a crappy 
> elevator, because it removes information from the block layer about where 
> the disk is (or is going to be).
> 
> I know Andrew thinks that SCSI tagged queuing is a bunch of crap, and he 
> has the latency numbers to prove it. He blames the SCSI disks themselves, 
> but I think it might be the fact that SCSI makes it impossible to make a 
> fair queuing algorithm for higher levels by hiding information.
> 
> Has anybody looked at just removing the request at command _completion_ 
> time instead? That's what IDE does, and it's the _right_ thing to do.

Well...I could do it, but since it only applies to untagged devices
(which is really tapes and some CD-ROMs nowadays), I'm not sure it would
be worth the effort.

> I'd hate for SATA to pick up these kinds of mistakes from the SCSI layer.

The elevator is based on linear head movements.  I'm not sure its
optimal to figure out a way to leave all the executing tagged requests
in the queue, is it?  They can be spread out all over the disc with no
idea which one the disc is currently executing.

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:27         ` Linus Torvalds
  2003-05-26 20:36           ` James Bottomley
@ 2003-05-26 20:38           ` Jens Axboe
  2003-05-26 20:49             ` Linus Torvalds
  1 sibling, 1 reply; 68+ messages in thread
From: Jens Axboe @ 2003-05-26 20:38 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: James Bottomley, Linux Kernel

On Mon, May 26 2003, Linus Torvalds wrote:
> 
> On 26 May 2003, James Bottomley wrote:
> >
> > On Mon, 2003-05-26 at 15:07, Jens Axboe wrote:
> > > Alright, so what do you need? Start out with X tags, shrink to Y (based
> > > on repeated queue full conditions)? Anything else?
> > 
> > Actually, it's easier than that: just an API to alter the number of tags
> > in the block layer (really only the size of your internal hash table). 
> > The actual heuristics of when to alter the queue depth is the province
> > of the individual drivers (although Doug Ledford was going to come up
> > with a generic implementation).
> 
> Talking about tagged queueing - does the SCSI layer still remove the
> request from the request list when it starts executing it?

Yes

> At least historically that's a major mistake, and generates a crappy 
> elevator, because it removes information from the block layer about where 
> the disk is (or is going to be).

Not necessarily, the io schedulers keep track of where we are going. You
don't need an active front request for that.

> I know Andrew thinks that SCSI tagged queuing is a bunch of crap, and he 
> has the latency numbers to prove it. He blames the SCSI disks themselves, 
> but I think it might be the fact that SCSI makes it impossible to make a 
> fair queuing algorithm for higher levels by hiding information.
> 
> Has anybody looked at just removing the request at command _completion_ 
> time instead? That's what IDE does, and it's the _right_ thing to do.

I know this is a pet peeve of yours (must be, I remember you bringing it
up at least 3 time before :), but I don't think that's necessarily true.
It shouldn't matter _one_ bit whether you leave the request there or
not, it's unmergeable. As long as the io scheduler keeps track of this
(and it does!) we are golden.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:36           ` James Bottomley
@ 2003-05-26 20:45             ` Linus Torvalds
  2003-05-26 20:51               ` Jens Axboe
  2003-05-26 20:56               ` James Bottomley
  0 siblings, 2 replies; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26 20:45 UTC (permalink / raw)
  To: James Bottomley; +Cc: Jens Axboe, Linux Kernel


On 26 May 2003, James Bottomley wrote:
> 
> > I'd hate for SATA to pick up these kinds of mistakes from the SCSI layer.
> 
> The elevator is based on linear head movements.

Historically, yes.

But we've been moving more and more towards a latency-based elevator, or
at least one that takes latency into account. Which is exactly why you'd
like to leave unfinished requests on the queue, because otherwise your
queue doesn't really show what is really going on.

In particular, while the higher layers don't actually _do_ this yet, from 
a latency standpoint the right thing to do when some request seems to get 
starved is to refuse to feed more tagged requeusts to the device until the 
starved request has finished. 

As I mentioned, Andrew actually had some really bad latency numbers to
prove that this is a real issue. SCSI with more than 4 tags or so results 
in potentially _major_ starvation, because the disks themselves are not 
even trying to avoid it.

Also, even aside from the starvation issue with unfair disks, just from a
"linear head movement" standpoint, you really want to sort the queue
according to what is going on _now_ in the disk. If the disk eats the
requests immediately (but doesn't execute them immediately), the sorting
has nothing to go on, and you get tons of back-and-forth movements.

Basically, if you're trying to do an elevator, you need to know what the 
outstanding commands are. Think it through on paper, and you'll see what I 
mean.

			Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:38           ` Jens Axboe
@ 2003-05-26 20:49             ` Linus Torvalds
  2003-05-26 20:57               ` Jens Axboe
  0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26 20:49 UTC (permalink / raw)
  To: Jens Axboe; +Cc: James Bottomley, Linux Kernel


On Mon, 26 May 2003, Jens Axboe wrote:
> 
> I know this is a pet peeve of yours (must be, I remember you bringing it
> up at least 3 time before :), but I don't think that's necessarily true.
> It shouldn't matter _one_ bit whether you leave the request there or
> not, it's unmergeable.

It's not the merging that I worry about. It's latency and starvation.

Think of it this way: if you keep feeding a disk requests, and the disk 
always tries to do the closest one (which is a likely algorithm), you can 
easily have a situation where the disk _never_ actually schedules a 
request that is at one "end" of the platter. 

Think of all the fairness issues we've had in the elevator code, and 
realize that the low-level disk probably implements _none_ of those 
fairness algorithms.

> As long as the io scheduler keeps track of this (and it does!) we are
> golden.

Hmm.. Where does it keep track of request latency for requests that have 
been removed from the queue?

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:45             ` Linus Torvalds
@ 2003-05-26 20:51               ` Jens Axboe
  2003-05-26 20:56               ` James Bottomley
  1 sibling, 0 replies; 68+ messages in thread
From: Jens Axboe @ 2003-05-26 20:51 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: James Bottomley, Linux Kernel

On Mon, May 26 2003, Linus Torvalds wrote:
> 
> On 26 May 2003, James Bottomley wrote:
> > 
> > > I'd hate for SATA to pick up these kinds of mistakes from the SCSI layer.
> > 
> > The elevator is based on linear head movements.
> 
> Historically, yes.
> 
> But we've been moving more and more towards a latency-based elevator, or
> at least one that takes latency into account. Which is exactly why you'd
> like to leave unfinished requests on the queue, because otherwise your
> queue doesn't really show what is really going on.

The newer io schedulers divide the queue into a front and backend, so
it's not exactly trivial to just 'leave it on the queue'. Which queue?

You know which ones are pending, they are on the busy queue. You can
look there.

> In particular, while the higher layers don't actually _do_ this yet, from 
> a latency standpoint the right thing to do when some request seems to get 
> starved is to refuse to feed more tagged requeusts to the device until the 
> starved request has finished. 

Agree

> As I mentioned, Andrew actually had some really bad latency numbers to
> prove that this is a real issue. SCSI with more than 4 tags or so results 
> in potentially _major_ starvation, because the disks themselves are not 
> even trying to avoid it.

Basically everyone agrees that this shouldn't happen in newer disks, I'm
inclined to think we are seeing internal queueing bugs in the SCSI
drivers or SCSI layer itself.

> Also, even aside from the starvation issue with unfair disks, just from a
> "linear head movement" standpoint, you really want to sort the queue
> according to what is going on _now_ in the disk. If the disk eats the
> requests immediately (but doesn't execute them immediately), the sorting
> has nothing to go on, and you get tons of back-and-forth movements.
> 
> Basically, if you're trying to do an elevator, you need to know what the 
> outstanding commands are. Think it through on paper, and you'll see what I 
> mean.

Who said we are using an elevator? News flash, we haven't used an
elevator design for a long time.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:45             ` Linus Torvalds
  2003-05-26 20:51               ` Jens Axboe
@ 2003-05-26 20:56               ` James Bottomley
  1 sibling, 0 replies; 68+ messages in thread
From: James Bottomley @ 2003-05-26 20:56 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, Linux Kernel

On Mon, 2003-05-26 at 16:45, Linus Torvalds wrote:
> 
> On 26 May 2003, James Bottomley wrote:
> > 
> > > I'd hate for SATA to pick up these kinds of mistakes from the SCSI layer.
> > 
> > The elevator is based on linear head movements.
> 
> Historically, yes.
> 
> But we've been moving more and more towards a latency-based elevator, or
> at least one that takes latency into account. Which is exactly why you'd
> like to leave unfinished requests on the queue, because otherwise your
> queue doesn't really show what is really going on.
> 
> In particular, while the higher layers don't actually _do_ this yet, from 
> a latency standpoint the right thing to do when some request seems to get 
> starved is to refuse to feed more tagged requeusts to the device until the 
> starved request has finished. 

OK, number seven on the list was going to be thinking about tracking
timeouts at the block layer.

Tag starvation handling is a property of each individual SCSI HBA (and I
know it shouldn't be).  Some do a good job, some don't.  However, it's
implicitly also handled in the SCSI error handler because on timeout we
quiesce the device (which finally causes the starved tag to execute).  I
have thought about doing it properly in the error handler, but it's a
radical divergence---you basically need to begin to throttle the queue,
but wait to see if your starved tag gets serviced and then increase the
feed again (as opposed to our current quiesce then handle model).

> As I mentioned, Andrew actually had some really bad latency numbers to
> prove that this is a real issue. SCSI with more than 4 tags or so results 
> in potentially _major_ starvation, because the disks themselves are not 
> even trying to avoid it.

Tag starvation isn't really a problem for the majority of modern SCSI
drives (big iron vendors yelled and kicked about this a while ago).  You
can still find some of the older drives that do have these issues (I
keep several for the purposes of exciting the error handling).

> Also, even aside from the starvation issue with unfair disks, just from a
> "linear head movement" standpoint, you really want to sort the queue
> according to what is going on _now_ in the disk. If the disk eats the
> requests immediately (but doesn't execute them immediately), the sorting
> has nothing to go on, and you get tons of back-and-forth movements.

Most scsi discs have a non linear geometry anyway, so "head scheduling"
is pretty useless to them.  All SCSI really wants is I/O consolidation. 
i.e. we'd like to use fewer larger sized requests.

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:49             ` Linus Torvalds
@ 2003-05-26 20:57               ` Jens Axboe
  2003-05-26 21:34                 ` Linus Torvalds
  0 siblings, 1 reply; 68+ messages in thread
From: Jens Axboe @ 2003-05-26 20:57 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: James Bottomley, Linux Kernel

On Mon, May 26 2003, Linus Torvalds wrote:
> 
> On Mon, 26 May 2003, Jens Axboe wrote:
> > 
> > I know this is a pet peeve of yours (must be, I remember you bringing it
> > up at least 3 time before :), but I don't think that's necessarily true.
> > It shouldn't matter _one_ bit whether you leave the request there or
> > not, it's unmergeable.
> 
> It's not the merging that I worry about. It's latency and starvation.
> 
> Think of it this way: if you keep feeding a disk requests, and the disk 
> always tries to do the closest one (which is a likely algorithm), you can 
> easily have a situation where the disk _never_ actually schedules a 
> request that is at one "end" of the platter. 

Then you have a bad disk, period. If the disks always tries to
approximate SPTF internally, then it's a bad design. Apparently that
Other OS times read/write requests out after 3 seconds, we we at least
know we are getting service in that time frame. Not saying that is good
enough, just a data point.

But the situation you describe above can easily be fixed, you described
the solution yourself in the previous mail. The silly tag depth is a
problem in itself, it should not be done. Keeping a sane number of tags
just to keep the disk busy, and we can use the "don't queue more
requests before X finishes, because X has been waiting for Y ms" tactic.

In fact, considering folks want to make error handling for command
timeouts a block property (that I agree with, we are already going there
with the SG_IO stuff), we can soft timeout a command if need be and
handle the case from there. What do you think?

> Think of all the fairness issues we've had in the elevator code, and 
> realize that the low-level disk probably implements _none_ of those 
> fairness algorithms.

I think it does, to some extent at least.

> > As long as the io scheduler keeps track of this (and it does!) we are
> > golden.
> 
> Hmm.. Where does it keep track of request latency for requests that have 
> been removed from the queue?

Well, it doesn't...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:57               ` Jens Axboe
@ 2003-05-26 21:34                 ` Linus Torvalds
  2003-05-26 23:58                   ` Nick Piggin
                                     ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26 21:34 UTC (permalink / raw)
  To: Jens Axboe; +Cc: James Bottomley, Linux Kernel


On Mon, 26 May 2003, Jens Axboe wrote:
> 
> > Think of all the fairness issues we've had in the elevator code, and 
> > realize that the low-level disk probably implements _none_ of those 
> > fairness algorithms.
> 
> I think it does, to some extent at least.

I doubt they do a very good job of it. I know of bad cases, even with 
"high-end" hardware. Sure, we can hope that it's getting better, but do we 
want to bet on it.

> > Hmm.. Where does it keep track of request latency for requests that have 
> > been removed from the queue?
> 
> Well, it doesn't...

Yeah. Which means that right now _really_ long starvation will show up as
timeouts, while other cases will just show up as bad latency.

Which will _work_, of course (assuming the timeout handling is correct,
which is a big if in itself), but it still sucks from a usability
standpoint.

Even if we drop our timeouts from 30 seconds (or whatever they are now)
down to just a few seconds, that's a _loooong_ time, and we should be a
lot more proactive about things. Audio/video stuff tends to want things
with latencies in the tenth-of-a-second range, even when they buffer
things up internally to hide the worst cases.

			Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 21:34                 ` Linus Torvalds
@ 2003-05-26 23:58                   ` Nick Piggin
  2003-05-27  0:09                     ` Linus Torvalds
  2003-05-27  0:16                   ` Alan Cox
  2003-05-27  6:54                   ` Jens Axboe
  2 siblings, 1 reply; 68+ messages in thread
From: Nick Piggin @ 2003-05-26 23:58 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, James Bottomley, Linux Kernel



Linus Torvalds wrote:

>On Mon, 26 May 2003, Jens Axboe wrote:
>
>>>Think of all the fairness issues we've had in the elevator code, and 
>>>realize that the low-level disk probably implements _none_ of those 
>>>fairness algorithms.
>>>
>>I think it does, to some extent at least.
>>
>
>I doubt they do a very good job of it. I know of bad cases, even with 
>"high-end" hardware. Sure, we can hope that it's getting better, but do we 
>want to bet on it.
>
>
>>>Hmm.. Where does it keep track of request latency for requests that have 
>>>been removed from the queue?
>>>
>>Well, it doesn't...
>>
>
>Yeah. Which means that right now _really_ long starvation will show up as
>timeouts, while other cases will just show up as bad latency.
>
There is an elevator notifier which is called on request
completion in Andrew's tree (needed for AS io scheduler). This
can be used to do what you want.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 23:58                   ` Nick Piggin
@ 2003-05-27  0:09                     ` Linus Torvalds
  2003-05-27  0:49                       ` Nick Piggin
  0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2003-05-27  0:09 UTC (permalink / raw)
  To: Nick Piggin; +Cc: Jens Axboe, James Bottomley, Linux Kernel


On Tue, 27 May 2003, Nick Piggin wrote:
>
> There is an elevator notifier which is called on request
> completion in Andrew's tree (needed for AS io scheduler). This
> can be used to do what you want.

Well, yeah, sure, you can use it to keep track of outstanding requests. 
But wouldn't it be nicer to see them in the first place?

The question being: how do you sanely avoid adding more requests to the 
queue if you start seeing starvation? Or if adding more requests, at least 
using an ordered tag or a barrier or whatever to make sure that you tell 
the disk that the new request must not be done before the old one has 
finally been satisfied?

I think you'd want to see the old requests in order to be able to make 
that decision reasonably well. 

This is clearly not a 2.6.x issue, btw.

			Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 21:34                 ` Linus Torvalds
  2003-05-26 23:58                   ` Nick Piggin
@ 2003-05-27  0:16                   ` Alan Cox
  2003-05-27  6:54                   ` Jens Axboe
  2 siblings, 0 replies; 68+ messages in thread
From: Alan Cox @ 2003-05-27  0:16 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, James Bottomley, Linux Kernel Mailing List

On Llu, 2003-05-26 at 22:34, Linus Torvalds wrote:
> Even if we drop our timeouts from 30 seconds (or whatever they are now)
> down to just a few seconds, that's a _loooong_ time, and we should be a
> lot more proactive about things. Audio/video stuff tends to want things
> with latencies in the tenth-of-a-second range, even when they buffer
> things up internally to hide the worst cases.

Many IDE drives will take several seconds to give up on a failing I/O
because they do all the recovery themselves. IDE CD/DVD can be
especially bad for this.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27  0:09                     ` Linus Torvalds
@ 2003-05-27  0:49                       ` Nick Piggin
  0 siblings, 0 replies; 68+ messages in thread
From: Nick Piggin @ 2003-05-27  0:49 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, James Bottomley, Linux Kernel



Linus Torvalds wrote:

>On Tue, 27 May 2003, Nick Piggin wrote:
>
>>There is an elevator notifier which is called on request
>>completion in Andrew's tree (needed for AS io scheduler). This
>>can be used to do what you want.
>>
>
>Well, yeah, sure, you can use it to keep track of outstanding requests. 
>But wouldn't it be nicer to see them in the first place?
>
Yeah. Basically the driver will call:
elv_next_request, elv_remove_request, elv_completed_request

elv_remove_request can easily just move the request to another
list, which is removed by elv_completed_request.

Don't let the names bother you, the elevator (in Andrew's tree)
gets all the information it needs.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 21:34                 ` Linus Torvalds
  2003-05-26 23:58                   ` Nick Piggin
  2003-05-27  0:16                   ` Alan Cox
@ 2003-05-27  6:54                   ` Jens Axboe
  2003-05-27 14:20                     ` James Bottomley
  2003-05-27 14:36                     ` Linus Torvalds
  2 siblings, 2 replies; 68+ messages in thread
From: Jens Axboe @ 2003-05-27  6:54 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: James Bottomley, Linux Kernel

On Mon, May 26 2003, Linus Torvalds wrote:
> 
> On Mon, 26 May 2003, Jens Axboe wrote:
> > 
> > > Think of all the fairness issues we've had in the elevator code, and 
> > > realize that the low-level disk probably implements _none_ of those 
> > > fairness algorithms.
> > 
> > I think it does, to some extent at least.
> 
> I doubt they do a very good job of it. I know of bad cases, even with 
> "high-end" hardware. Sure, we can hope that it's getting better, but do we 
> want to bet on it.

No I agree, it's always nice to handle these cases in software so we
don't have to rely on these things getting fixed.

> > > Hmm.. Where does it keep track of request latency for requests that have 
> > > been removed from the queue?
> > 
> > Well, it doesn't...
> 
> Yeah. Which means that right now _really_ long starvation will show up as
> timeouts, while other cases will just show up as bad latency.
> 
> Which will _work_, of course (assuming the timeout handling is correct,
> which is a big if in itself), but it still sucks from a usability
> standpoint.
> 
> Even if we drop our timeouts from 30 seconds (or whatever they are now)
> down to just a few seconds, that's a _loooong_ time, and we should be a
> lot more proactive about things. Audio/video stuff tends to want things
> with latencies in the tenth-of-a-second range, even when they buffer
> things up internally to hide the worst cases.

Here's something ridicolously simple, that just wont start a new tag if
the oldest tag is older than 100ms. Clearly nothing for submission, but
it gets the point across.

Now only look at reads, and we've got something a little useful at
least.

James, speaking of queue localities and tcq... Doug mentioned some time
ago that aic7xxx dishes out tags numbers from a hba pool which makes it
impossible to support with out current block layer queueing code. Maybe
it we associate the blk_queue_tag structure with a bunch of queues
instead of having a 1:1 mapping it could work.

===== drivers/block/ll_rw_blk.c 1.170 vs edited =====
--- 1.170/drivers/block/ll_rw_blk.c	Thu May  8 11:30:11 2003
+++ edited/drivers/block/ll_rw_blk.c	Tue May 27 08:44:44 2003
@@ -574,6 +574,13 @@
 		BUG();
 	}
 
+	if (!list_empty(&bqt->busy_list)) {
+		struct request *__rq = list_entry_rq(bqt->busy_list.prev);
+
+		if (time_after(rq->timeout, jiffies))
+			return 1;
+	}
+
 	for (map = bqt->tag_map; *map == -1UL; map++) {
 		tag += BLK_TAGS_PER_LONG;
 
@@ -589,6 +596,7 @@
 	bqt->tag_index[tag] = rq;
 	blkdev_dequeue_request(rq);
 	list_add(&rq->queuelist, &bqt->busy_list);
+	rq->timeout = jiffies + HZ / 10;
 	bqt->busy++;
 	return 0;
 }

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 19:33         ` Jens Axboe
@ 2003-05-27 12:39           ` Jens Axboe
  2003-05-27 14:26             ` James Bottomley
  0 siblings, 1 reply; 68+ messages in thread
From: Jens Axboe @ 2003-05-27 12:39 UTC (permalink / raw)
  To: James Bottomley; +Cc: torvalds, Linux Kernel

On Mon, May 26 2003, Jens Axboe wrote:
> On Mon, May 26 2003, James Bottomley wrote:
> > On Mon, 2003-05-26 at 15:07, Jens Axboe wrote:
> > > Alright, so what do you need? Start out with X tags, shrink to Y (based
> > > on repeated queue full conditions)? Anything else?
> > 
> > Actually, it's easier than that: just an API to alter the number of tags
> > in the block layer (really only the size of your internal hash table). 
> > The actual heuristics of when to alter the queue depth is the province
> > of the individual drivers (although Doug Ledford was going to come up
> > with a generic implementation).
> 
> That's actually what I meant, that the SCSI layer would call down into
> the block layer to set the size. I don't/want to know about queue full
> conditions.
> 
> The internal memory requirements for the queue table is small (a bit per
> tag), so I think we can basically get away with just decrementing
> ->max_depth.

James, something like this would be enough then (untested, compiles)?

===== drivers/block/ll_rw_blk.c 1.170 vs edited =====
--- 1.170/drivers/block/ll_rw_blk.c	Thu May  8 11:30:11 2003
+++ edited/drivers/block/ll_rw_blk.c	Tue May 27 14:37:20 2003
@@ -413,11 +413,12 @@
 {
 	struct blk_queue_tag *bqt = q->queue_tags;
 
-	if (unlikely(bqt == NULL || bqt->max_depth < tag))
+	if (unlikely(bqt == NULL))
 		return NULL;
 
 	return bqt->tag_index[tag];
 }
+
 /**
  * blk_queue_free_tags - release tag maintenance info
  * @q:  the request queue for the device
@@ -448,38 +449,26 @@
 	q->queue_flags &= ~(1 << QUEUE_FLAG_QUEUED);
 }
 
-/**
- * blk_queue_init_tags - initialize the queue tag info
- * @q:  the request queue for the device
- * @depth:  the maximum queue depth supported
- **/
-int blk_queue_init_tags(request_queue_t *q, int depth)
+static int init_tag_map(struct blk_queue_tag *tags, int depth)
 {
-	struct blk_queue_tag *tags;
 	int bits, i;
 
 	if (depth > (queue_nr_requests*2)) {
 		depth = (queue_nr_requests*2);
-		printk("blk_queue_init_tags: adjusted depth to %d\n", depth);
+		printk(KERN_ERR "%s: adjusted depth to %d\n", __FUNCTION__, depth);
 	}
 
-	tags = kmalloc(sizeof(struct blk_queue_tag),GFP_ATOMIC);
-	if (!tags)
-		goto fail;
-
 	tags->tag_index = kmalloc(depth * sizeof(struct request *), GFP_ATOMIC);
 	if (!tags->tag_index)
-		goto fail_index;
+		goto fail;
 
 	bits = (depth / BLK_TAGS_PER_LONG) + 1;
 	tags->tag_map = kmalloc(bits * sizeof(unsigned long), GFP_ATOMIC);
 	if (!tags->tag_map)
-		goto fail_map;
+		goto fail;
 
 	memset(tags->tag_index, 0, depth * sizeof(struct request *));
 	memset(tags->tag_map, 0, bits * sizeof(unsigned long));
-	INIT_LIST_HEAD(&tags->busy_list);
-	tags->busy = 0;
 	tags->max_depth = depth;
 
 	/*
@@ -488,22 +477,89 @@
 	for (i = depth; i < bits * BLK_TAGS_PER_LONG; i++)
 		__set_bit(i, tags->tag_map);
 
+	return 0;
+fail:
+	kfree(tags->tag_index);
+	return -ENOMEM;
+}
+
+
+/**
+ * blk_queue_init_tags - initialize the queue tag info
+ * @q:  the request queue for the device
+ * @depth:  the maximum queue depth supported
+ **/
+int blk_queue_init_tags(request_queue_t *q, int depth)
+{
+	struct blk_queue_tag *tags;
+
+	tags = kmalloc(sizeof(struct blk_queue_tag),GFP_ATOMIC);
+	if (!tags)
+		goto fail;
+
+	if (init_tag_map(tags, depth))
+		goto fail;
+
+	INIT_LIST_HEAD(&tags->busy_list);
+	tags->busy = 0;
+
 	/*
 	 * assign it, all done
 	 */
 	q->queue_tags = tags;
 	q->queue_flags |= (1 << QUEUE_FLAG_QUEUED);
 	return 0;
-
-fail_map:
-	kfree(tags->tag_index);
-fail_index:
-	kfree(tags);
 fail:
+	kfree(tags);
 	return -ENOMEM;
 }
 
 /**
+ * blk_queue_resize_tags - change the queueing depth
+ * @q:  the request queue for the device
+ * @new_depth: the new max command queueing depth
+ *
+ *  Notes:
+ *    Must be called with the queue lock held.
+ **/
+int blk_queue_resize_tags(request_queue_t *q, int new_depth)
+{
+	struct blk_queue_tag *bqt = q->queue_tags;
+	struct request **tag_index;
+	unsigned long *tag_map;
+	int bits, max_depth;
+
+	if (!bqt)
+		return -ENXIO;
+
+	/*
+	 * don't bother sizing down
+	 */
+	if (new_depth <= bqt->max_depth) {
+		bqt->max_depth = new_depth;
+		return 0;
+	}
+
+	/*
+	 * save the old state info, so we can copy it back
+	 */
+	tag_index = bqt->tag_index;
+	tag_map = bqt->tag_map;
+	max_depth = bqt->max_depth;
+
+	if (init_tag_map(bqt, new_depth))
+		return -ENOMEM;
+
+	memcpy(bqt->tag_index, tag_index, max_depth * sizeof(struct request *));
+	bits = (max_depth / BLK_TAGS_PER_LONG) + 1;
+	memcpy(bqt->tag_map, bqt->tag_map, bits * sizeof(unsigned long));
+
+	kfree(tag_index);
+	kfree(tag_map);
+	return 0;
+}
+
+/**
  * blk_queue_end_tag - end tag operations for a request
  * @q:  the request queue for the device
  * @tag:  the tag that has completed
@@ -523,9 +579,6 @@
 	int tag = rq->tag;
 
 	BUG_ON(tag == -1);
-
-	if (unlikely(tag >= bqt->max_depth))
-		return;
 
 	if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
 		printk("attempt to clear non-busy tag (%d)\n", tag);
===== include/linux/blkdev.h 1.105 vs edited =====
--- 1.105/include/linux/blkdev.h	Thu May  8 11:30:11 2003
+++ edited/include/linux/blkdev.h	Tue May 27 12:36:53 2003
@@ -452,6 +452,7 @@
 extern void blk_queue_end_tag(request_queue_t *, struct request *);
 extern int blk_queue_init_tags(request_queue_t *, int);
 extern void blk_queue_free_tags(request_queue_t *);
+extern int blk_queue_resize_tags(request_queue_t *, int);
 extern void blk_queue_invalidate_tags(request_queue_t *);
 extern void blk_congestion_wait(int rw, long timeout);
 

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27  6:54                   ` Jens Axboe
@ 2003-05-27 14:20                     ` James Bottomley
  2003-05-27 14:36                     ` Linus Torvalds
  1 sibling, 0 replies; 68+ messages in thread
From: James Bottomley @ 2003-05-27 14:20 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, Linux Kernel

On Tue, 2003-05-27 at 02:54, Jens Axboe wrote:
> James, speaking of queue localities and tcq... Doug mentioned some time
> ago that aic7xxx dishes out tags numbers from a hba pool which makes it
> impossible to support with out current block layer queueing code. Maybe
> it we associate the blk_queue_tag structure with a bunch of queues
> instead of having a 1:1 mapping it could work.

Yes, A large number of devices (not just the aic7xxx) have a single
issue queue for all outgoing requests (this is the reason for the
can_queue limit in the host template).

If the issue queue is < 256 slots, it makes a lot of sense to use global
tag numbers instead of device local ones, since then you can simply map
the returning tag number to an index in the issue queue and not bother
having to keep a hash table of <pun, lun, tag> to look it up in.

Even drivers which have larger or variable size issue queues (here the
qlogic ones spring immediately to mind) often have a virtual index
number attached to their commands (which, this time is 16 bits) which
they'd like to use as a unique offset into the issue queue.  Obviously,
they face exactly the same challenges as tags and they'd like a similar
solution.

A global structure for a bunch of queues would probably be the most
useful way forwards.

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 12:39           ` Jens Axboe
@ 2003-05-27 14:26             ` James Bottomley
  2003-05-27 17:16               ` Jens Axboe
  0 siblings, 1 reply; 68+ messages in thread
From: James Bottomley @ 2003-05-27 14:26 UTC (permalink / raw)
  To: Jens Axboe; +Cc: torvalds, Linux Kernel

On Tue, 2003-05-27 at 08:39, Jens Axboe wrote:
> James, something like this would be enough then (untested, compiles)?

Yes...the only concern I would have is that if you downsize the tag map,
you don't seem to keep any memory of what the high water mark actually
is.  Thus, I can drop the depth by 8 and then increase it again by 4 and
you won't see that the increase can be accommodated by the already
allocated space.

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27  6:54                   ` Jens Axboe
  2003-05-27 14:20                     ` James Bottomley
@ 2003-05-27 14:36                     ` Linus Torvalds
  2003-05-27 14:59                       ` James Bottomley
  2003-05-27 19:43                       ` Jens Axboe
  1 sibling, 2 replies; 68+ messages in thread
From: Linus Torvalds @ 2003-05-27 14:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: James Bottomley, Linux Kernel


On Tue, 27 May 2003, Jens Axboe wrote:
> 
> Here's something ridicolously simple, that just wont start a new tag if
> the oldest tag is older than 100ms. Clearly nothing for submission, but
> it gets the point across.

Yes, I think something like this should work very well.

In fact, it might fit in very well indeed with AS - and in general it
might be a good idea to have some nice interface for the IO scheduler to
give this kind of ordering hints down to the hardware.

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 14:36                     ` Linus Torvalds
@ 2003-05-27 14:59                       ` James Bottomley
  2003-05-27 15:21                         ` Jeff Garzik
  2003-05-27 19:43                       ` Jens Axboe
  1 sibling, 1 reply; 68+ messages in thread
From: James Bottomley @ 2003-05-27 14:59 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jens Axboe, Linux Kernel, jgarzik

On Tue, 2003-05-27 at 10:36, Linus Torvalds wrote:
    Btw, in case you wonder why I care about names and organization, it's 
    because with the names and organization comes assumptions and 
    expectations.
    
    One prime example of this is cdrecord, and the incredible braindamage that
    the name "SCSI" foisted upon it. Why? Because everybody (ie schily)  
    _knows_ that SCSI is addressed by bus/id/lun, and thinks that anything
    else is wrong. So you have total idiocies like the "cdrecord -scanbus"  
    crap for finding your device, and totally useless naming that makes no 
    sense in any sane environment.
    
    Calling something SCSI when it isn't brings on these kinds of bad things: 
    people make assuptions that aren't sensible or desireable.
    
    Names have power. There's baggage and assumptions in a name. In the case
    of SCSI, there is a _lot_ of baggage.
    
I took this one on board a long time ago.  Even in the SCSI world, FC
devices don't think in terms of PUN, they think in terms of WWN.

If you look at the mid layer (and I don't promise this to be complete
yet) we're moving away from referring to things by host/channel/id/lun. 
Now we just have host/device list in most places.  About the only place
we convert back to the numbers is to print messages.

We're certainly not there yet, since we need to support legacy
interfaces like /proc/scsi/scsi.  But eventually you'll probably see us
using the sysfs name instead of the id (FC devices will probably stuff
WWNs in here, other things may use numbers) and lun (not sure how we'll
represent SCSI-3 LUN hierarchies yet).  Hopefully, it will be possible
to make the mid layer entirely unaware of any id/lun distinction so it
could be configured for a flatter host/device space instead.

James





^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 14:59                       ` James Bottomley
@ 2003-05-27 15:21                         ` Jeff Garzik
  2003-05-27 15:38                           ` James Bottomley
  2003-05-28 10:50                           ` Lincoln Dale
  0 siblings, 2 replies; 68+ messages in thread
From: Jeff Garzik @ 2003-05-27 15:21 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linus Torvalds, Jens Axboe, Linux Kernel

On Tue, May 27, 2003 at 10:59:51AM -0400, James Bottomley wrote:
> We're certainly not there yet, since we need to support legacy
> interfaces like /proc/scsi/scsi.  But eventually you'll probably see us
> using the sysfs name instead of the id (FC devices will probably stuff
> WWNs in here, other things may use numbers) and lun (not sure how we'll
> represent SCSI-3 LUN hierarchies yet).  Hopefully, it will be possible
> to make the mid layer entirely unaware of any id/lun distinction so it
> could be configured for a flatter host/device space instead.

ATA defines WWN too...  I'm curious what the format is?  uuid-ish?

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 15:21                         ` Jeff Garzik
@ 2003-05-27 15:38                           ` James Bottomley
  2003-05-27 15:50                             ` Jeff Garzik
  2003-05-28 10:50                           ` Lincoln Dale
  1 sibling, 1 reply; 68+ messages in thread
From: James Bottomley @ 2003-05-27 15:38 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Jens Axboe, Linux Kernel

On Tue, 2003-05-27 at 11:21, Jeff Garzik wrote:
> ATA defines WWN too...  I'm curious what the format is?  uuid-ish?

Heh, you'll regret asking that one.  There are currently seven possible
identifier formats for the WWN (which is basically the SCSI Device
Identification VPD page).

The extremely gory details are in the SPC-3 spec (section 7.6.4 of r12).

However, really, as far as Linux is concerned we don't need to care.  It
just needs to be reducible to an ASCII representation and dumped into
the sysfs name or embedded into bus_id.  The reduction to ASCII can be
subsystem (or even device driver) specific.

However, note that I'm not thinking of forcing the WWN here.  I'm
thinking of using whatever makes most sense to the device, so SPI
devices will continue to use simple target/lun numbers here.  FC devices
will probably want to use their variant of WWN/PortID.  The rationale is
to get rid of the unnecessary internal mappings some drivers use to get
to the physical address they send on the wire.

Thus, if you never address ATA devices by the WWN, you probably never
want to make it part of the addressing scheme.

Exporting a unique ID for userspace to use is a different (and probably
orthogonal) issue.

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 15:38                           ` James Bottomley
@ 2003-05-27 15:50                             ` Jeff Garzik
  2003-05-27 16:00                               ` James Bottomley
  0 siblings, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-27 15:50 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linus Torvalds, Jens Axboe, Linux Kernel

On Tue, May 27, 2003 at 11:38:50AM -0400, James Bottomley wrote:
> Thus, if you never address ATA devices by the WWN, you probably never
> want to make it part of the addressing scheme.
> 
> Exporting a unique ID for userspace to use is a different (and probably
> orthogonal) issue.

Oh, no question.  My main interest is having a persistent id for a
device's media.  Then Linux can use that to allow mapping in case device
names or majors change at each boot (and similar situations).

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 15:50                             ` Jeff Garzik
@ 2003-05-27 16:00                               ` James Bottomley
  2003-05-27 16:16                                 ` Jeff Garzik
  0 siblings, 1 reply; 68+ messages in thread
From: James Bottomley @ 2003-05-27 16:00 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, Jens Axboe, Linux Kernel

On Tue, 2003-05-27 at 11:50, Jeff Garzik wrote:
> Oh, no question.  My main interest is having a persistent id for a
> device's media.  Then Linux can use that to allow mapping in case device
> names or majors change at each boot (and similar situations).

Well, OK, that's not an in-kernel issue.

My current thought for this is that deriving the unique name can be
awfully device specific (even in SCSI there are several fallback methods
plus the usual black/white list of things that don't quite return
entirely unique objects).  Thus, I think the derivation probably belongs
in userspace as part of hotplug.  Once the unique ID is determined, it
could be written to a well known place in the sysfs tree for all the
rest of the OS (things like udev for persistent device naming) to use.

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 16:00                               ` James Bottomley
@ 2003-05-27 16:16                                 ` Jeff Garzik
  2003-05-28  9:35                                   ` Christoph Hellwig
  0 siblings, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-27 16:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: Linus Torvalds, Jens Axboe, Linux Kernel

On Tue, May 27, 2003 at 12:00:28PM -0400, James Bottomley wrote:
> On Tue, 2003-05-27 at 11:50, Jeff Garzik wrote:
> > Oh, no question.  My main interest is having a persistent id for a
> > device's media.  Then Linux can use that to allow mapping in case device
> > names or majors change at each boot (and similar situations).
> 
> Well, OK, that's not an in-kernel issue.

It's definitely an in-kernel issue, because the mapping is in-kernel now:

	<major,minor> -> queue


> My current thought for this is that deriving the unique name can be
> awfully device specific (even in SCSI there are several fallback methods
> plus the usual black/white list of things that don't quite return
> entirely unique objects).

Agreed.  And kernel assistance is likely necessary in many cases to
obtain the unique id, even if it's only userspace that's reading it.


> Thus, I think the derivation probably belongs
> in userspace as part of hotplug.  Once the unique ID is determined, it
> could be written to a well known place in the sysfs tree for all the
> rest of the OS (things like udev for persistent device naming) to use.

Agreed, with the above proviso.

	Jeff





^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 14:26             ` James Bottomley
@ 2003-05-27 17:16               ` Jens Axboe
  2003-05-27 18:09                 ` James Bottomley
  0 siblings, 1 reply; 68+ messages in thread
From: Jens Axboe @ 2003-05-27 17:16 UTC (permalink / raw)
  To: James Bottomley; +Cc: torvalds, Linux Kernel

On Tue, May 27 2003, James Bottomley wrote:
> On Tue, 2003-05-27 at 08:39, Jens Axboe wrote:
> > James, something like this would be enough then (untested, compiles)?
> 
> Yes...the only concern I would have is that if you downsize the tag map,
> you don't seem to keep any memory of what the high water mark actually
> is.  Thus, I can drop the depth by 8 and then increase it again by 4 and
> you won't see that the increase can be accommodated by the already
> allocated space.

If you increase it again, the maps are resized. Is that a problem? Seems
ok to me.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 17:16               ` Jens Axboe
@ 2003-05-27 18:09                 ` James Bottomley
  2003-05-27 18:21                   ` Jens Axboe
  0 siblings, 1 reply; 68+ messages in thread
From: James Bottomley @ 2003-05-27 18:09 UTC (permalink / raw)
  To: Jens Axboe; +Cc: torvalds, Linux Kernel

On Tue, 2003-05-27 at 13:16, Jens Axboe wrote:
> If you increase it again, the maps are resized. Is that a problem? Seems
> ok to me.

What I mean is that you allocate memory whenever the depth increases. 
Even if you have an array large enough to accommodate the increase
(because you don't release when you decrease the tag depth).

On further examination, there's also an invalid tag race:  If a device
is throttling, it might want to do a big decrease followed fairly
quickly by a small increase.  When it does the increase, you potentially
still have outstanding tags above the new depth, which will now run off
the end of your newly allocated tag array.

James




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 18:09                 ` James Bottomley
@ 2003-05-27 18:21                   ` Jens Axboe
  2003-05-27 18:30                     ` James Bottomley
  0 siblings, 1 reply; 68+ messages in thread
From: Jens Axboe @ 2003-05-27 18:21 UTC (permalink / raw)
  To: James Bottomley; +Cc: torvalds, Linux Kernel

On Tue, May 27 2003, James Bottomley wrote:
> On Tue, 2003-05-27 at 13:16, Jens Axboe wrote:
> > If you increase it again, the maps are resized. Is that a problem? Seems
> > ok to me.
> 
> What I mean is that you allocate memory whenever the depth increases. 
> Even if you have an array large enough to accommodate the increase
> (because you don't release when you decrease the tag depth).

Yes I know what you mean, but my question is if it's worth it to keep
track of? No memory is really lost, but we are doing a copy of tag map
and bitmap of course in addition to the extra allocation. Given that
you're not going to change depths 100 times per seconds, I don't think
it's worth it to actually do anything about it. Especially since you'll
quickly settle at the desired depth and stay there. But hey, I'm an
accomodating guy (pfft), so here's the change just for you :)

> On further examination, there's also an invalid tag race:  If a device
> is throttling, it might want to do a big decrease followed fairly
> quickly by a small increase.  When it does the increase, you potentially
> still have outstanding tags above the new depth, which will now run off
> the end of your newly allocated tag array.

Oh yes you are right. How does the attached look? With real_max_depth,
that should work as well since we'll only ever alloc a bigger area.

===== drivers/block/ll_rw_blk.c 1.170 vs edited =====
--- 1.170/drivers/block/ll_rw_blk.c	Thu May  8 11:30:11 2003
+++ edited/drivers/block/ll_rw_blk.c	Tue May 27 20:21:00 2003
@@ -413,11 +413,12 @@
 {
 	struct blk_queue_tag *bqt = q->queue_tags;
 
-	if (unlikely(bqt == NULL || bqt->max_depth < tag))
+	if (unlikely(bqt == NULL || tag >= bqt->real_max_depth))
 		return NULL;
 
 	return bqt->tag_index[tag];
 }
+
 /**
  * blk_queue_free_tags - release tag maintenance info
  * @q:  the request queue for the device
@@ -448,39 +449,28 @@
 	q->queue_flags &= ~(1 << QUEUE_FLAG_QUEUED);
 }
 
-/**
- * blk_queue_init_tags - initialize the queue tag info
- * @q:  the request queue for the device
- * @depth:  the maximum queue depth supported
- **/
-int blk_queue_init_tags(request_queue_t *q, int depth)
+static int init_tag_map(struct blk_queue_tag *tags, int depth)
 {
-	struct blk_queue_tag *tags;
 	int bits, i;
 
 	if (depth > (queue_nr_requests*2)) {
 		depth = (queue_nr_requests*2);
-		printk("blk_queue_init_tags: adjusted depth to %d\n", depth);
+		printk(KERN_ERR "%s: adjusted depth to %d\n", __FUNCTION__, depth);
 	}
 
-	tags = kmalloc(sizeof(struct blk_queue_tag),GFP_ATOMIC);
-	if (!tags)
-		goto fail;
-
 	tags->tag_index = kmalloc(depth * sizeof(struct request *), GFP_ATOMIC);
 	if (!tags->tag_index)
-		goto fail_index;
+		goto fail;
 
 	bits = (depth / BLK_TAGS_PER_LONG) + 1;
 	tags->tag_map = kmalloc(bits * sizeof(unsigned long), GFP_ATOMIC);
 	if (!tags->tag_map)
-		goto fail_map;
+		goto fail;
 
 	memset(tags->tag_index, 0, depth * sizeof(struct request *));
 	memset(tags->tag_map, 0, bits * sizeof(unsigned long));
-	INIT_LIST_HEAD(&tags->busy_list);
-	tags->busy = 0;
 	tags->max_depth = depth;
+	tags->real_max_depth = bits * BITS_PER_LONG;
 
 	/*
 	 * set the upper bits if the depth isn't a multiple of the word size
@@ -488,22 +478,89 @@
 	for (i = depth; i < bits * BLK_TAGS_PER_LONG; i++)
 		__set_bit(i, tags->tag_map);
 
+	return 0;
+fail:
+	kfree(tags->tag_index);
+	return -ENOMEM;
+}
+
+
+/**
+ * blk_queue_init_tags - initialize the queue tag info
+ * @q:  the request queue for the device
+ * @depth:  the maximum queue depth supported
+ **/
+int blk_queue_init_tags(request_queue_t *q, int depth)
+{
+	struct blk_queue_tag *tags;
+
+	tags = kmalloc(sizeof(struct blk_queue_tag),GFP_ATOMIC);
+	if (!tags)
+		goto fail;
+
+	if (init_tag_map(tags, depth))
+		goto fail;
+
+	INIT_LIST_HEAD(&tags->busy_list);
+	tags->busy = 0;
+
 	/*
 	 * assign it, all done
 	 */
 	q->queue_tags = tags;
 	q->queue_flags |= (1 << QUEUE_FLAG_QUEUED);
 	return 0;
-
-fail_map:
-	kfree(tags->tag_index);
-fail_index:
-	kfree(tags);
 fail:
+	kfree(tags);
 	return -ENOMEM;
 }
 
 /**
+ * blk_queue_resize_tags - change the queueing depth
+ * @q:  the request queue for the device
+ * @new_depth: the new max command queueing depth
+ *
+ *  Notes:
+ *    Must be called with the queue lock held.
+ **/
+int blk_queue_resize_tags(request_queue_t *q, int new_depth)
+{
+	struct blk_queue_tag *bqt = q->queue_tags;
+	struct request **tag_index;
+	unsigned long *tag_map;
+	int bits, max_depth;
+
+	if (!bqt)
+		return -ENXIO;
+
+	/*
+	 * don't bother sizing down
+	 */
+	if (new_depth <= bqt->real_max_depth) {
+		bqt->max_depth = new_depth;
+		return 0;
+	}
+
+	/*
+	 * save the old state info, so we can copy it back
+	 */
+	tag_index = bqt->tag_index;
+	tag_map = bqt->tag_map;
+	max_depth = bqt->real_max_depth;
+
+	if (init_tag_map(bqt, new_depth))
+		return -ENOMEM;
+
+	memcpy(bqt->tag_index, tag_index, max_depth * sizeof(struct request *));
+	bits = max_depth / BLK_TAGS_PER_LONG;
+	memcpy(bqt->tag_map, bqt->tag_map, bits * sizeof(unsigned long));
+
+	kfree(tag_index);
+	kfree(tag_map);
+	return 0;
+}
+
+/**
  * blk_queue_end_tag - end tag operations for a request
  * @q:  the request queue for the device
  * @tag:  the tag that has completed
@@ -524,7 +581,7 @@
 
 	BUG_ON(tag == -1);
 
-	if (unlikely(tag >= bqt->max_depth))
+	if (unlikely(tag >= bqt->real_max_depth))
 		return;
 
 	if (unlikely(!__test_and_clear_bit(tag, bqt->tag_map))) {
===== include/linux/blkdev.h 1.105 vs edited =====
--- 1.105/include/linux/blkdev.h	Thu May  8 11:30:11 2003
+++ edited/include/linux/blkdev.h	Tue May 27 20:15:31 2003
@@ -179,7 +179,8 @@
 	unsigned long *tag_map;		/* bit map of free/busy tags */
 	struct list_head busy_list;	/* fifo list of busy tags */
 	int busy;			/* current depth */
-	int max_depth;
+	int max_depth;			/* what we will send to device */
+	int real_max_depth;		/* what the array can hold */
 };
 
 struct request_queue
@@ -452,6 +453,7 @@
 extern void blk_queue_end_tag(request_queue_t *, struct request *);
 extern int blk_queue_init_tags(request_queue_t *, int);
 extern void blk_queue_free_tags(request_queue_t *);
+extern int blk_queue_resize_tags(request_queue_t *, int);
 extern void blk_queue_invalidate_tags(request_queue_t *);
 extern void blk_congestion_wait(int rw, long timeout);
 

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 18:21                   ` Jens Axboe
@ 2003-05-27 18:30                     ` James Bottomley
  0 siblings, 0 replies; 68+ messages in thread
From: James Bottomley @ 2003-05-27 18:30 UTC (permalink / raw)
  To: Jens Axboe; +Cc: torvalds, Linux Kernel

On Tue, 2003-05-27 at 14:21, Jens Axboe wrote:
> Oh yes you are right. How does the attached look? With real_max_depth,
> that should work as well since we'll only ever alloc a bigger area.

That looks perfect.  You submit the block layer changes, I'll plumb it
into SCSI and we can try it out....just as soon as I convert the one
SCSI driver that uses the block queue tags to do dynamic queue
adjustment...

James



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 14:36                     ` Linus Torvalds
  2003-05-27 14:59                       ` James Bottomley
@ 2003-05-27 19:43                       ` Jens Axboe
  1 sibling, 0 replies; 68+ messages in thread
From: Jens Axboe @ 2003-05-27 19:43 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: James Bottomley, Linux Kernel

On Tue, May 27 2003, Linus Torvalds wrote:
> 
> On Tue, 27 May 2003, Jens Axboe wrote:
> > 
> > Here's something ridicolously simple, that just wont start a new tag if
> > the oldest tag is older than 100ms. Clearly nothing for submission, but
> > it gets the point across.
> 
> Yes, I think something like this should work very well.

Agree, it should take the edge of crappy hardware at least.

> In fact, it might fit in very well indeed with AS - and in general it
> might be a good idea to have some nice interface for the IO scheduler to
> give this kind of ordering hints down to the hardware.

And deadline, they share the same request expire mechanism. But I read
your hint, I'll add the hint and fix this for real. Was waiting for the
other tcq patch to be comitted as they overlap, but I see that is in
so...

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 16:16                                 ` Jeff Garzik
@ 2003-05-28  9:35                                   ` Christoph Hellwig
  0 siblings, 0 replies; 68+ messages in thread
From: Christoph Hellwig @ 2003-05-28  9:35 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: James Bottomley, Linus Torvalds, Jens Axboe, Linux Kernel

On Tue, May 27, 2003 at 12:16:24PM -0400, Jeff Garzik wrote:
> > Well, OK, that's not an in-kernel issue.
> 
> It's definitely an in-kernel issue, because the mapping is in-kernel now:
> 
> 	<major,minor> -> queue

The mapping in the kernel is gendisk -> queue.  On the syscall
boundary we have an addition dev_t -> gendisk mapping.  The
scsi midlayer e.g. doesn't care about dev_t or even major / minor
at all.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27 15:21                         ` Jeff Garzik
  2003-05-27 15:38                           ` James Bottomley
@ 2003-05-28 10:50                           ` Lincoln Dale
  1 sibling, 0 replies; 68+ messages in thread
From: Lincoln Dale @ 2003-05-28 10:50 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: James Bottomley, Linus Torvalds, Jens Axboe, Linux Kernel

At 11:21 AM 27/05/2003 -0400, Jeff Garzik wrote:
>ATA defines WWN too...  I'm curious what the format is?  uuid-ish?

FC uses a 64-bit identifier for a WWN.

you end up with things like "50:00:0e:10:00:00:01:1c".
ironically, FC switching doesn't actually 'use' the WWN but a 24-bit FC_ID.

for things like mappings -- who knows -- perhaps WWN is good -- perhaps 
something like a serial # is also good -- perhaps both are good?

         mel-stglab-mds9509-1# show scsi-target lun vsan 1

         - DF350F from HITACHI (Rev 0000)
           FCID is 0x7f0002 in VSAN 1, PWWN is 50:00:0e:10:00:00:01:1c
           ------------------------------------------------------------------------------
           LUN    Capacity  Status  Serial Number    Device-Id
                  (MB)
           ------------------------------------------------------------------------------
           0x0    16651     Online                   C:2 A:0 T:1 HITACHI 
D35067950000
           0x1    16651     Online                   C:2 A:0 T:1 HITACHI 
D35067950001

         - ST318452FC from HP (Rev HP03)
           FCID is 0x7f01da in VSAN 1, PWWN is 22:00:00:04:cf:8c:1f:95
           ------------------------------------------------------------------------------
           LUN    Capacity  Status  Serial Number    Device-Id
                  (MB)
           ------------------------------------------------------------------------------
           0x0    17920     Online  3EV0JVQJ         C:1 A:0 T:3 
20:00:00:04:cf:8c:1f:95
         [..]


cheers,

lincoln.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-06-02  9:46 ` Andre Hedrick
@ 2003-06-02 13:56   ` Alan Cox
  0 siblings, 0 replies; 68+ messages in thread
From: Alan Cox @ 2003-06-02 13:56 UTC (permalink / raw)
  To: Andre Hedrick; +Cc: Jeff Garzik, Linus Torvalds, Linux Kernel Mailing List

On Llu, 2003-06-02 at 10:46, Andre Hedrick wrote:
> Linus, my professional opinion is to follow Jeff's direction for 2.5/2.6.
> This will allow Linux to push open source to the hardware vendors.
> There are several bastardized scsi-ide drivers in ./scsi.
> 
> pci2000.c,h :: pci2220i.c,h :: psi240i.c,h + psi_*.h :: eata*

megaraid, i2o, 3ware, aacraid all also drive IDE devices too now
Promise new driver is using the scsi layer as well (because they do
command queuing as well as drive tagged queue stuff). I've been 
hacking on the SI stuff a bit and its also apparent our IDE PATA
layer won't support that well in a hurry either since it also
has command buffering.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  4:58 Jeff Garzik
  2003-05-26  5:15 ` Linus Torvalds
  2003-05-26  5:59 ` Benjamin Herrenschmidt
@ 2003-06-02  9:46 ` Andre Hedrick
  2003-06-02 13:56   ` Alan Cox
  2 siblings, 1 reply; 68+ messages in thread
From: Andre Hedrick @ 2003-06-02  9:46 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: torvalds, linux-kernel


JG,

This is the right move after months of debating ideas.
Migrating the contents of ./drivers/ide/pci to scsi is reasonable too.
This will allow the legacy drivers to die.

Taskfile came to late to be really useful, it is now best used as a model
to tackle FIS/FPDMA mapping for SATA II.

Linus, my professional opinion is to follow Jeff's direction for 2.5/2.6.
This will allow Linux to push open source to the hardware vendors.
There are several bastardized scsi-ide drivers in ./scsi.

pci2000.c,h :: pci2220i.c,h :: psi240i.c,h + psi_*.h :: eata*

ATAPI is nothing more than Bastardized SCSI beaten with an ugly stick.

Packet-Taskfile is more than painful, and I am not in the fighting mood.

Cheers,

Andre

On Mon, 26 May 2003, Jeff Garzik wrote:

> Just to echo some comments I said in private, this driver is _not_
> a replacement for drivers/ide.  This is not, and has never been,
> the intention.  In fact, I need drivers/ide's continued existence,
> so that I may have fewer boundaries on future development.
> 
> Even though ATAPI support doesn't exist and error handling is
> primitive, this driver has been extensively tested locally and I feel
> is ready for a full and public kernel developer assault :)
> 
> James ok'd sending this...  I'll be sending "un-hack scsi headers" patch
> through him via his scsi-misc-2.5 tree.
> 
> 
> 
> 
> Linus, please do a
> 
> 	bk pull bk://kernel.bkbits.net/jgarzik/scsi-2.5
> 
> Others may download the patch from
> 
> ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.5/2.5.69-bk18-scsi1.patch.bz2
> 
> This will update the following files:
> 
>  drivers/scsi/Kconfig    |   27 
>  drivers/scsi/Makefile   |    1 
>  drivers/scsi/ata_piix.c |  322 ++++++
>  drivers/scsi/libata.c   | 2247 ++++++++++++++++++++++++++++++++++++++++++++++++
>  include/linux/ata.h     |  485 ++++++++++
>  5 files changed, 3082 insertions(+)
> 
> through these ChangeSets:
> 
> <jgarzik@redhat.com> (03/05/26 1.1357)
>    [scsi ata] make PATA config option actually do something useful
> 
> <jgarzik@redhat.com> (03/05/26 1.1356)
>    [scsi ata] include hacks, b/c scsi headers not in include/linux
> 
> <jgarzik@redhat.com> (03/05/26 1.1355)
>    [scsi] add ATA driver
> 
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 

Andre Hedrick
LAD Storage Consulting Group


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27  6:51                     ` Linus Torvalds
@ 2003-05-27  7:29                       ` Jeff Garzik
  0 siblings, 0 replies; 68+ messages in thread
From: Jeff Garzik @ 2003-05-27  7:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> One prime example of this is cdrecord, and the incredible braindamage that
> the name "SCSI" foisted upon it. Why? Because everybody (ie schily)  
> _knows_ that SCSI is addressed by bus/id/lun, and thinks that anything
> else is wrong. So you have total idiocies like the "cdrecord -scanbus"  
> crap for finding your device, and totally useless naming that makes no 
> sense in any sane environment.
> 
> Calling something SCSI when it isn't brings on these kinds of bad things: 
> people make assuptions that aren't sensible or desireable.
> 
> Names have power. There's baggage and assumptions in a name. In the case
> of SCSI, there is a _lot_ of baggage.


Now that argument I can buy.

There's still helper functions to be created before a native block 
driver can directly use struct requests for fully native queueing. 
Brand new device, host registration code.  PM, hotplug, yadda yadda.  It 
winds up being a lot of code still, and it not as simple as you and Jens 
seem to be making the task out to be.  That's why I brought up 
/dev/{disk,floppy,cdrom}...

If all that work is to be done for a brand new, native block driver, we 
should at least intend on using the code as a bus-agnostic command 
transport layer, with packages of helpers like my current "libata" doing 
the command set work (and sometimes, some amount of low-level driver 
work, where commonality exists).

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27  6:30                   ` Linus Torvalds
@ 2003-05-27  6:51                     ` Linus Torvalds
  2003-05-27  7:29                       ` Jeff Garzik
  0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2003-05-27  6:51 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel


On Mon, 26 May 2003, Linus Torvalds wrote:
> 
> And quite frankly, names matter, and calling it SCSI is clearly wrong.  

Btw, in case you wonder why I care about names and organization, it's 
because with the names and organization comes assumptions and 
expectations.

One prime example of this is cdrecord, and the incredible braindamage that
the name "SCSI" foisted upon it. Why? Because everybody (ie schily)  
_knows_ that SCSI is addressed by bus/id/lun, and thinks that anything
else is wrong. So you have total idiocies like the "cdrecord -scanbus"  
crap for finding your device, and totally useless naming that makes no 
sense in any sane environment.

Calling something SCSI when it isn't brings on these kinds of bad things: 
people make assuptions that aren't sensible or desireable.

Names have power. There's baggage and assumptions in a name. In the case
of SCSI, there is a _lot_ of baggage.

			Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27  6:07                 ` Jeff Garzik
@ 2003-05-27  6:30                   ` Linus Torvalds
  2003-05-27  6:51                     ` Linus Torvalds
  0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2003-05-27  6:30 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel


On Tue, 27 May 2003, Jeff Garzik wrote:
> 
> As you see from Alan's message and others, it isn't pseudo-SCSI.

It _is_ pseudo-scsi.

Or rather, what used to be SCSI is quickly becoming irrelevant. There's 
almost nothing left, except for the command set. And SCSI is a lot more 
than the command set, it's the full definition of the signalling from 
command set to electricals to connectors.

And the other stuff matters. The linux SCSI layer proper is full of the
_addressing_ that is part of the SCSI standard proper, and that is pretty
much total nonsense outside of that standard (it's starting to be nonsense 
even inside that standard, since everybody running away fron the old buses 
and the old addressing).

So we shouldn't call it SCSI, because it clearly IS NOT, whatever you
claim. This is a _fact_, I don't see why you argue against it. SCSI has a
well-defined definition (or rather, a _set_ of definitions), and SATA
ain't there. One is T10, the other is part of T13. 

And quite frankly, names matter, and calling it SCSI is clearly wrong.  
What makes you _think_ it is SCSI is that everybody uses the command set,
and all devices are starting to largely just talk MMC-2+.

But calling it MMC-2 is also incorrect, since everybody really talks a
superset, and we should just accept that and not try to limit outselves.

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:09               ` Linus Torvalds
  2003-05-27  0:29                 ` Alan Cox
@ 2003-05-27  6:07                 ` Jeff Garzik
  2003-05-27  6:30                   ` Linus Torvalds
  1 sibling, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-27  6:07 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> On Mon, 26 May 2003, Jeff Garzik wrote:
> 
>>Correct, but precisely by saying that, you're missing something.
> 
> 
> You're missing _my_ point.
> 
> 
>>The SCSI midlayer provides infrastructure I need -- which is not 
>>specific to SCSI at all.
> 
> 
> If it isn't specific to SCSI, then it sure as hell shouldn't BE THERE!
> 
> My point is that it's _wrong_ to make non-SCSI drivers use the SCSI layer, 
> because that shows that something is misdesigned.
> 
> And I bet there isn't all that much left that really helps.
> 
> You adding more "pseudo-SCSI" crap just _makes_things_worse. It does not 
> advance anything, it regresses. 


As you see from Alan's message and others, it isn't pseudo-SCSI.

Besides what he mentioned, there is Serial Attached SCSI (SAS), where a 
host controller can simultaneously support SAS disks and SATA disks.  So 
it's either an IDE driver that does SCSI, or a SCSI driver that does 
IDE, or a driver that's in both IDE and SCSI subsystems, or... ?  Having 
fun yet?  :)


> On 27 May 2003, Alan Cox wrote:
>>> I actually think thats a positive thing. It means 2.5 drivers/scsi is
>>> now very close to being the "native queueing driver" with some
>>> additional default plugins for doing scsi scanning, scsi error recovery 
>>> and a few other scsi bits.
> 
> Hey, that may well be the way to go, in which case the core stuff should
> be renamed and moved off somewhere else. Leaving drivers/scsi with just 
> the actual low-level SCSI drivers. 

For all these reasons, I continue to maintain that starting out as a 
SCSI driver, and then evolving, is the best route.  The non-SCSI parts 
leave drivers/scsi, as they should.  The SCSI parts stay.  The SCSI 
mid-layer gets smaller.  All the while, the driver continues to work. 
Everybody wins.

Starting out as a native block driver and _then_ adding SCSI support and 
native queueing and jazz does not sound even remotely like a good path 
to follow.

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-27  0:22       ` Alan Cox
@ 2003-05-27  4:15         ` Linus Torvalds
  0 siblings, 0 replies; 68+ messages in thread
From: Linus Torvalds @ 2003-05-27  4:15 UTC (permalink / raw)
  To: Alan Cox; +Cc: Jeff Garzik, Linux Kernel Mailing List


On 27 May 2003, Alan Cox wrote:
> 
> I actually think thats a positive thing. It means 2.5 drivers/scsi is
> now very close to being the "native queueing driver" with some
> additional default plugins for doing scsi scanning, scsi error recovery 
> and a few other scsi bits.

Hey, that may well be the way to go, in which case the core stuff should
be renamed and moved off somewhere else. Leaving drivers/scsi with just 
the actual low-level SCSI drivers. 

			Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 20:09               ` Linus Torvalds
@ 2003-05-27  0:29                 ` Alan Cox
  2003-05-27  6:07                 ` Jeff Garzik
  1 sibling, 0 replies; 68+ messages in thread
From: Alan Cox @ 2003-05-27  0:29 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Garzik, Linux Kernel Mailing List

On Llu, 2003-05-26 at 21:09, Linus Torvalds wrote:
> My point is that it's _wrong_ to make non-SCSI drivers use the SCSI layer, 
> because that shows that something is misdesigned.

Sure - there is lots of stuff that ought to be generic error
handling/timers/requeue/ioctl stuff that right now happens to be in the
scsi layer, and a megaton of user space that only supports it

> You adding more "pseudo-SCSI" crap just _makes_things_worse. It does not 
> advance anything, it regresses. 

SATA is SCSI with some legacy crap nailed on the back end, next
generation SATA controllers look like scsi, act like scsi, think like
scsi. The bang a register count to ten poke a couple of bits and babysit
the command world of IDE is dying. 

Also with regards to the confusion comment, several vendors now have
identical hardware interfaces for their smart SATA and SCSI devices.
dpt_i2o has an ATA form, aacraid has a SATA form so the confusion 
problem about device names also won't be going away in a hurry.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:40     ` Linus Torvalds
  2003-05-26  5:53       ` Jeff Garzik
@ 2003-05-27  0:22       ` Alan Cox
  2003-05-27  4:15         ` Linus Torvalds
  1 sibling, 1 reply; 68+ messages in thread
From: Alan Cox @ 2003-05-27  0:22 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Garzik, Linux Kernel Mailing List

On Llu, 2003-05-26 at 06:40, Linus Torvalds wrote:
> > And for specifically Intel SATA, drivers/ide flat out doesn't work (even 
> > though it claims to).
> 
> Well, I don't think it claimed to, until today. Still doesn't work?

Even if it did it would at best be a toy. The core IDE layer doesn't
handle SCSI errors properly (needed for ATAPI) except using ide-scsi. It
doesn't handle hot plugging of devices, it doesn't handle tagged
queueing very well, it hasn't the slightest idea about multipath (SATA2
can do), it doesn't know a lot of other things either.

SATA and especially SATA2 is basically SCSI with some slightly odd ways
of issuing READ10/WRITE10 to disk devices. A "native" driver would
basically be a copy of most of drivers/scsi.

I actually think thats a positive thing. It means 2.5 drivers/scsi is
now very close to being the "native queueing driver" with some
additional default plugins for doing scsi scanning, scsi error recovery 
and a few other scsi bits.



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 17:47             ` Jeff Garzik
@ 2003-05-26 20:09               ` Linus Torvalds
  2003-05-27  0:29                 ` Alan Cox
  2003-05-27  6:07                 ` Jeff Garzik
  0 siblings, 2 replies; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26 20:09 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel


On Mon, 26 May 2003, Jeff Garzik wrote:
> 
> Correct, but precisely by saying that, you're missing something.

You're missing _my_ point.

> The SCSI midlayer provides infrastructure I need -- which is not 
> specific to SCSI at all.

If it isn't specific to SCSI, then it sure as hell shouldn't BE THERE!

My point is that it's _wrong_ to make non-SCSI drivers use the SCSI layer, 
because that shows that something is misdesigned.

And I bet there isn't all that much left that really helps.

You adding more "pseudo-SCSI" crap just _makes_things_worse. It does not 
advance anything, it regresses. 

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 17:59               ` Jeff Garzik
@ 2003-05-26 18:11                 ` Jens Axboe
  0 siblings, 0 replies; 68+ messages in thread
From: Jens Axboe @ 2003-05-26 18:11 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel

On Mon, May 26 2003, Jeff Garzik wrote:
> Jens Axboe wrote:
> >On Mon, May 26 2003, Linus Torvalds wrote:
> >
> >>>What does the block layer need, that it doesn't have now?
> >>
> >>Exactly. I'd _love_ for people to really think about this.
> >
> >
> >In discussion with Jeff, it seems most of what he wants is already
> >there. He just doesn't know it yet :-)
> 
> 
> Another important point is time.
> 
> I continue to agree that a native block driver is the best direction.
> 
> But with 2.6.0 looming, I think it's best to evolve my ATA driver to be 
> a native block driver from a scsi one.   Not start out as a native 
> driver.  That's significant pre-2.6 churn.

I don't think that makes any sense. If you really do find missing
functionality that are candidates to be generic block property, we can
add them.

> Or, it lives out-of-tree until 2.7 and people with SATA hardware have to 
> go out-of-tree for their driver for months and months, until the working 
> driver is deemed sufficiently native :)  In the meantime, distros 
> wanting working SATA will just ship the SCSI driver as-is.  :(

I don't know why you are even worrying about this yet, time will decide
what happens. As it stands right now, I still consider your driver to be
something to play with, not something that should go into the kernel
anytime soon.

Maybe in 6 months time it has evolved to be something really great, we
add it then. Right now you are still in the design stages in some areas.


-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 17:24             ` Jens Axboe
  2003-05-26 17:54               ` Jeff Garzik
@ 2003-05-26 17:59               ` Jeff Garzik
  2003-05-26 18:11                 ` Jens Axboe
  1 sibling, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26 17:59 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, linux-kernel

Jens Axboe wrote:
> On Mon, May 26 2003, Linus Torvalds wrote:
> 
>>>What does the block layer need, that it doesn't have now?
>>
>>Exactly. I'd _love_ for people to really think about this.
> 
> 
> In discussion with Jeff, it seems most of what he wants is already
> there. He just doesn't know it yet :-)


Another important point is time.

I continue to agree that a native block driver is the best direction.

But with 2.6.0 looming, I think it's best to evolve my ATA driver to be 
a native block driver from a scsi one.   Not start out as a native 
driver.  That's significant pre-2.6 churn.

Or, it lives out-of-tree until 2.7 and people with SATA hardware have to 
go out-of-tree for their driver for months and months, until the working 
driver is deemed sufficiently native :)  In the meantime, distros 
wanting working SATA will just ship the SCSI driver as-is.  :(

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 17:24             ` Jens Axboe
@ 2003-05-26 17:54               ` Jeff Garzik
  2003-05-26 17:59               ` Jeff Garzik
  1 sibling, 0 replies; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26 17:54 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Linus Torvalds, linux-kernel

Jens Axboe wrote:
> On Mon, May 26 2003, Linus Torvalds wrote:
> 
>>>What does the block layer need, that it doesn't have now?
>>
>>Exactly. I'd _love_ for people to really think about this.
> 
> 
> In discussion with Jeff, it seems most of what he wants is already
> there. He just doesn't know it yet :-)


hehe.  Here's a salient point:

native block drivers are typically used in one of two ways:  creating a 
whole subsystem (ide, scsi), or servicing a single host (dac960).

I am doing the first:  creating a whole subsystem.  The infrastructure 
involved in that, over and above what block already provides, is that 
part I dread coding.

If I am to code that, I want to do so ONCE.  And that means a 
bus-agnostic /dev/{disk,cdrom,tape}.  Not /dev/{ad,acd,at,ag}.

	Jeff



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 16:56           ` Linus Torvalds
@ 2003-05-26 17:47             ` Jeff Garzik
  2003-05-26 20:09               ` Linus Torvalds
  0 siblings, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26 17:47 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> On Mon, 26 May 2003, Jeff Garzik wrote:
> 
>>>>I think with SATA + drivers/ide, you reach a point of diminishing 
>>>>returns versus amount of time spent on mid-layer coding.
>>>
>>>I think that's a valid approach, and just have a special driver for SATA. 
>>>That's not the part I worry about. 
>>
>>You're still at a point of diminishing returns for a native driver too.
> 
> 
> No, because for a block-queue native driver there would be one huge 
> advantage: making sure the native queueing is brought up to be a fully 
> used interface.
> 
> Originally, native queueing was all there was, and all drivers interfaced 
> directly to it. The IDE and SCSI layers ended up being used largely
> because they allowed drivers to use a global naming scheme, not because 
> they were "better". In fact, in many ways they were a lot worse than the 
> native queues, but then they had some other infrastructure that did help.
> 
> My bet is that most of that infrastructure today is just not worth it.

[...]

> Clue-in-time: if hotplugging doesn't work on a native level, it won't work 
> on a SCSI level either. So that _cannot_ be a real issue.
> 
> Device model stuff and power management all comes from other areas, and 
> have little to do with SCSI.

Correct, but precisely by saying that, you're missing something.

The SCSI midlayer provides infrastructure I need -- which is not 
specific to SCSI at all.

Device registration, host registration, major/minor sub-allocation, 
device model, power management, command queueing features, command 
timeout, error handling thread, userland command submission interface (a 
la taskfile ioctl or SG_IO)...  all these are possible via the generic 
block layer -- but the generic block layer does not directly provide 
these facilities.  Each subsystem (ide, scsi, ...) has to provide bits 
on top of that to make it usable.  Each native block driver must 
re-create these for its own subsystem.

All those "little bits" add up to code I have to write, for a native 
driver, which I don't have to write for SCSI.

IF you are willing to sign off on a generic /dev/disk (bus agnostic), 
then I am interested.  That allows me to code up the above stuff ONCE, 
and not see others do the same work when they write a native blk driver 
(or avoid this work, by using SCSI like I do now).  Otherwise. I'm 
writing all those little bits YET AGAIN, duplicating stuff scsi already 
does.

Taking that to a concrete level, that basically means more helpers at 
the native block level.

Given that I have always agreed a native block driver is the best 
eventual method, what's stopping me?  Time:

If /dev/disk is a non-starter for 2.6, the best route is to take my SCSI 
driver and evolve it over time to be a native block driver.  Not start 
with a native block driver.  That gives us a SATA framework now, and 
SCSI can disappear eventually.
#include <linux_is_usable_not_pie_in_sky_theory.h>


(re-arranged msg a bit)
>> Userland support is nonexistent, and users would have start using yet 
>> another set of tools to deal with their disks, instead of the existing 
>> [scsi] ones.

> That's not true. The command set is already exported, so that even things 
> like packet commands can be sent (that's how you write CD-RW under 2.5.x, 
> and it's the _generic_ layer that does all the command queueing). 
> 
> That, btw, i show you should do the ATA transport thing. Even SCSI devices 
> can get the commands fed that way. It's neither ATA nor SCSI, it's a 
> "packet interface for the generic queue".

I agree the capabilities are there for packet interface, but one still 
has to define the userland interface.  Yes, I know about SG_IO.  It's: 
somewhat SCSI-specific, and sub-optimal.   Think back to December 2001, 
when you described a packet interface using read(2) and write(2). 
Basically SG_IO, but better and without scsi specifics.


> The only remaining piece of the puzzle (from a user land perspective) ends 
> up being the mapping from major/minor -> queue. That used to be a _huge_
> issue, and the main reason you couldn't write a SCSI driver outside the 
> SCSI layer, for example (but look at DAC960 - it decided to do so 
> _anyway_, even if it meant being ostracised and put on a separate major).
> 
> But that should be much less of an issue now, since our queue mapping is a 
> lot more flexible these days.

The mapping part is easy, IMO.  I would just have the "generic 
infrastructure" export chrdrvs for each queue I manage, using the 
interface you described in December 2001.  sysfs can then describe the 
{major+minor | CIDR | whatever}-to-queue mapping.

For example, when /dev/disk device registration occurs at runtime, the 
/dev/disk infrastructure would make a new chrdrv appear.


Anyway, stepping back to the larger issue of "userland tools", I'm not 
just talking about a taskfile I/O interface.  I'm talking about all the 
little distro tools for IDE and SCSI, like /usr/bin/scsi_unique_id or 
/sbin/hdparm.  And the assumptions that distro installers make.

Creating yet another set of these tools and standard practices is just 
not worth it, without bus-agnostic /dev/disk.  Sysadmins are going to be 
annoyed at yet another interface and yet another not-IDE, not-SCSI (and 
thus second-class) block major.

It's not worth it for me, with working code now, and it's not worth for 
the next person who wants to add a bus type to Linux.

Unless I am to be creating infrastructure which non-SATA people can use, 
your suggestion equates to making the SATA driver a second-class 
citizen, with lots of additional coding for little gain, and with lots 
of pain on the distro side that must be repeated again and again for 
each native block driver.

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 16:57           ` Linus Torvalds
@ 2003-05-26 17:24             ` Jens Axboe
  2003-05-26 17:54               ` Jeff Garzik
  2003-05-26 17:59               ` Jeff Garzik
  0 siblings, 2 replies; 68+ messages in thread
From: Jens Axboe @ 2003-05-26 17:24 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Jeff Garzik, linux-kernel

On Mon, May 26 2003, Linus Torvalds wrote:
> > What does the block layer need, that it doesn't have now?
> 
> Exactly. I'd _love_ for people to really think about this.

In discussion with Jeff, it seems most of what he wants is already
there. He just doesn't know it yet :-)

Maybe that's my problem as well, maybe the code / comments / doc /
whatever is not clear enough.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  6:21         ` Jeff Garzik
@ 2003-05-26 16:57           ` Linus Torvalds
  2003-05-26 17:24             ` Jens Axboe
  0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26 16:57 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel


On Mon, 26 May 2003, Jeff Garzik wrote:
> 
> Tangenting a bit:

Not at all. This is the thrust of _my_ worries.

> What does the block layer need, that it doesn't have now?

Exactly. I'd _love_ for people to really think about this.

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  6:01         ` Jeff Garzik
@ 2003-05-26 16:56           ` Linus Torvalds
  2003-05-26 17:47             ` Jeff Garzik
  0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26 16:56 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel


On Mon, 26 May 2003, Jeff Garzik wrote:
> >>I think with SATA + drivers/ide, you reach a point of diminishing 
> >>returns versus amount of time spent on mid-layer coding.
> > 
> > I think that's a valid approach, and just have a special driver for SATA. 
> > That's not the part I worry about. 
> 
> You're still at a point of diminishing returns for a native driver too.

No, because for a block-queue native driver there would be one huge 
advantage: making sure the native queueing is brought up to be a fully 
used interface.

Originally, native queueing was all there was, and all drivers interfaced 
directly to it. The IDE and SCSI layers ended up being used largely
because they allowed drivers to use a global naming scheme, not because 
they were "better". In fact, in many ways they were a lot worse than the 
native queues, but then they had some other infrastructure that did help.

My bet is that most of that infrastructure today is just not worth it.

> Userland support is nonexistent, and users would have start using yet 
> another set of tools to deal with their disks, instead of the existing 
> [scsi] ones.

That's not true. The command set is already exported, so that even things 
like packet commands can be sent (that's how you write CD-RW under 2.5.x, 
and it's the _generic_ layer that does all the command queueing). 

That, btw, i show you should do the ATA transport thing. Even SCSI devices 
can get the commands fed that way. It's neither ATA nor SCSI, it's a 
"packet interface for the generic queue".

The only remaining piece of the puzzle (from a user land perspective) ends 
up being the mapping from major/minor -> queue. That used to be a _huge_
issue, and the main reason you couldn't write a SCSI driver outside the 
SCSI layer, for example (but look at DAC960 - it decided to do so 
_anyway_, even if it meant being ostracised and put on a separate major).

But that should be much less of an issue now, since our queue mapping is a 
lot more flexible these days.

> Device model, power management, hotplugging are all handled or 
> in-the-works for scsi, and they are exactly what SATA needs.

Clue-in-time: if hotplugging doesn't work on a native level, it won't work 
on a SCSI level either. So that _cannot_ be a real issue.

Device model stuff and power management all comes from other areas, and 
have little to do with SCSI.

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 11:13       ` Jeff Garzik
@ 2003-05-26 11:37         ` Bartlomiej Zolnierkiewicz
  0 siblings, 0 replies; 68+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-05-26 11:37 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel


On Mon, 26 May 2003, Jeff Garzik wrote:

> Bartlomiej Zolnierkiewicz wrote:
> > On Mon, 26 May 2003, Jeff Garzik wrote:
> >
> >
> >>Linus Torvalds wrote:
> >>
> >>>On Mon, 26 May 2003, Jeff Garzik wrote:
> >>>
> >>>
> >>>>Just to echo some comments I said in private, this driver is _not_
> >>>>a replacement for drivers/ide.  This is not, and has never been,
> >>>>the intention.  In fact, I need drivers/ide's continued existence,
> >>>>so that I may have fewer boundaries on future development.
> >>>
> >>>
> >>>Just out of interest, is there any _point_ to this driver? I can
> >>>appreciate the approach, but I'd like to know if it does anything (at all)
> >>>better than the native IDE driver? Faster? Anything?
> >>
> >>
> >>Direction:  SATA is much more suited to SCSI, because otherwise you wind
> >>up re-creating all the queueing and error handling mess that SCSI
> >>already does for you.  The SATA2 host controllers coming out soon do
> >>full host-side TCQ, not the dain-bramaged ATA TCQ bus-release stuff.
> >>Doing SATA2 devel in drivers/ide will essentially be re-creating the
> >>SCSI mid-layer.
> >
> >
> > And now you are recreating ATA in SCSI ;-).
> >
> > Don't get me wrong: I like idea very much, but why you can't
> > share common code between drivers/ide and your ATA-SCSI.
>
> SATA2 is really 100% different from PATA in all respects except for the
> ATA commands themselves being sent down the pipe.  Doing that inside
> drivers/ide really means two driver cores at that point.
>
> Sharing code is definitely an option, though!
> Why do you think I named it "libata"?  :)
>
>
> Another SATA wrinkle:  the phy layer.
>
> SATA, like network cards, has an underlying connection layer.  The host
> connects to the device.  Your link state may be up or down.  This is
> easily mapped to SCSI semantics (initiator connection to target).  Much
> more code if writing from scratch.
>
>
> >>Legacy-free:  Because I don't have to worry about legacy host
> >>controllers, I can ignore limitations drivers/ide cannot.  In
> >>drivers/ide, each host IO (PIO/MMIO) is done via function pointer.  If
> >>your arch has a mach_vec, more function pointers.  Mine does direct
> >>calls to the asm/io.h functions in faster.  So, ATA command submission
> >>is measureably faster.
> >
> >
> > I think it is simply wrong, you should use function pointers.
> > You can have ie. two PCI hosts, one using PIO and one using MMIO.
> >
> > "measureably faster", I doubt.
> > IO operations are REALLY slow when compared to CPU cycles.
>
> You misunderstand.
>
> drivers/ide -- uses function pointers HWIF->outb, etc. inside a static
> taskfile-submit function.  The TF submit functions statically order each
> HWIF->outb() call, and you cannot change that order from the low-level
> driver.
>
> my driver -- uses separate taskfile-submit functions for different hardware.
>
> My method directly uses outb(), writel(), or whatever is necessary.  It
> allows one to use __writel() and barriers to optimize as needed, or to
> support special cache coherency needs.  Further, for SATA and some
> advanced PATA controllers, my method allows one to use a completely
> alien and new taskfile submission method.  For example, building SATA2
> FIS's as an array of u32's, and then queueing that FIS onto a DMA ring.
>   The existing drivers/ide code is nowhere close to that.

Its not that hard, but nevermind.

> Putting it into drivers/ide language, my driver adds a
> HWIF->do_flagged_taskfile hook, and eliminates the HWIF->outb hooks.
> Turns ~13 hook calls into one, from looking at flagged_taskfile.

Okay, I misunderstood.
This is cool, but you are using ie. outb not only in taskfiles.

> And FWIW, I would not state "measureably faster" unless I actually
> benchmarked it.  :)  It's a pittance in the grand scheme of things,
> since you're inevitably limited by device speed, but lower CPU usage is
> something nobody complains about...
>
>
> >>sysfs:  James and co are putting time into getting scsi sysfs right.  I
> >>would rather ride their coattails, and have my driver Just Work with
> >>sysfs and the driver model.
> >
> >
> > No big deal here, ATA will get it too.
>
> Oh agreed.  But scsi is pretty much there now (it's in testing in a BK
> tree), and we're darn close to 2.6.0.
>
>
> >>PIO data transfer is faster and more scheduler-friendly, since it polls
> >>from a kernel thread.
> >
> >
> > CPU polling is faster than IRQs?
>
> Yes, almost always.
>
> The problem with polling is that it chews up CPU if you're not careful,
> starving out other processes.  I'm careful :)

Good.

> >>And for specifically Intel SATA, drivers/ide flat out doesn't work (even
> >>though it claims to).
> >
> >
> > So fix it ;-).
>
> As you can see from the discussion, I think a new driver best serves the
> future of SATA2 as well as today's SATA.  So I'm not gonna bother...  I
> have a driver that works.
>
> If you're interested in working the problem, the result of booting ICH5
> SATA in drivers/ide is a hang at
> 	hdX: attached to ide-disk driver
>
> Nothing happens after that.  sysrq doesn't work, so my assumption was
> that the bus was locked up.
>
>
> >>So, I conclude:  faster, smaller, and better future direction.  IMO, of
> >>course :)
> >
> >
> > And right now ugly and incomplete.
> > IMO, of course ;-).
>
> Incomplete?  Kinda sorta.  SATAPI isn't here yet, so all it needs is
> more fine-grained error handling.  I call it 100% stable now, though.
>
> Ugly?  Compared to drivers/ide?  I actually have a lot of admiration and
> respect for the PATA knowledge embedded in drivers/ide.  But I would
> never call it pretty :)

Silly excuse ;-).

Thanks,
--
Bartlomiej

> 	Jeff


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26 10:32     ` Bartlomiej Zolnierkiewicz
@ 2003-05-26 11:13       ` Jeff Garzik
  2003-05-26 11:37         ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26 11:13 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz; +Cc: Linus Torvalds, linux-kernel

Bartlomiej Zolnierkiewicz wrote:
> On Mon, 26 May 2003, Jeff Garzik wrote:
> 
> 
>>Linus Torvalds wrote:
>>
>>>On Mon, 26 May 2003, Jeff Garzik wrote:
>>>
>>>
>>>>Just to echo some comments I said in private, this driver is _not_
>>>>a replacement for drivers/ide.  This is not, and has never been,
>>>>the intention.  In fact, I need drivers/ide's continued existence,
>>>>so that I may have fewer boundaries on future development.
>>>
>>>
>>>Just out of interest, is there any _point_ to this driver? I can
>>>appreciate the approach, but I'd like to know if it does anything (at all)
>>>better than the native IDE driver? Faster? Anything?
>>
>>
>>Direction:  SATA is much more suited to SCSI, because otherwise you wind
>>up re-creating all the queueing and error handling mess that SCSI
>>already does for you.  The SATA2 host controllers coming out soon do
>>full host-side TCQ, not the dain-bramaged ATA TCQ bus-release stuff.
>>Doing SATA2 devel in drivers/ide will essentially be re-creating the
>>SCSI mid-layer.
> 
> 
> And now you are recreating ATA in SCSI ;-).
> 
> Don't get me wrong: I like idea very much, but why you can't
> share common code between drivers/ide and your ATA-SCSI.

SATA2 is really 100% different from PATA in all respects except for the 
ATA commands themselves being sent down the pipe.  Doing that inside 
drivers/ide really means two driver cores at that point.

Sharing code is definitely an option, though!
Why do you think I named it "libata"?  :)


Another SATA wrinkle:  the phy layer.

SATA, like network cards, has an underlying connection layer.  The host 
connects to the device.  Your link state may be up or down.  This is 
easily mapped to SCSI semantics (initiator connection to target).  Much 
more code if writing from scratch.


>>Legacy-free:  Because I don't have to worry about legacy host
>>controllers, I can ignore limitations drivers/ide cannot.  In
>>drivers/ide, each host IO (PIO/MMIO) is done via function pointer.  If
>>your arch has a mach_vec, more function pointers.  Mine does direct
>>calls to the asm/io.h functions in faster.  So, ATA command submission
>>is measureably faster.
> 
> 
> I think it is simply wrong, you should use function pointers.
> You can have ie. two PCI hosts, one using PIO and one using MMIO.
> 
> "measureably faster", I doubt.
> IO operations are REALLY slow when compared to CPU cycles.

You misunderstand.

drivers/ide -- uses function pointers HWIF->outb, etc. inside a static 
taskfile-submit function.  The TF submit functions statically order each 
HWIF->outb() call, and you cannot change that order from the low-level 
driver.

my driver -- uses separate taskfile-submit functions for different hardware.

My method directly uses outb(), writel(), or whatever is necessary.  It 
allows one to use __writel() and barriers to optimize as needed, or to 
support special cache coherency needs.  Further, for SATA and some 
advanced PATA controllers, my method allows one to use a completely 
alien and new taskfile submission method.  For example, building SATA2 
FIS's as an array of u32's, and then queueing that FIS onto a DMA ring. 
  The existing drivers/ide code is nowhere close to that.

Putting it into drivers/ide language, my driver adds a 
HWIF->do_flagged_taskfile hook, and eliminates the HWIF->outb hooks. 
Turns ~13 hook calls into one, from looking at flagged_taskfile.

And FWIW, I would not state "measureably faster" unless I actually 
benchmarked it.  :)  It's a pittance in the grand scheme of things, 
since you're inevitably limited by device speed, but lower CPU usage is 
something nobody complains about...


>>sysfs:  James and co are putting time into getting scsi sysfs right.  I
>>would rather ride their coattails, and have my driver Just Work with
>>sysfs and the driver model.
> 
> 
> No big deal here, ATA will get it too.

Oh agreed.  But scsi is pretty much there now (it's in testing in a BK 
tree), and we're darn close to 2.6.0.


>>PIO data transfer is faster and more scheduler-friendly, since it polls
>>from a kernel thread.
> 
> 
> CPU polling is faster than IRQs?

Yes, almost always.

The problem with polling is that it chews up CPU if you're not careful, 
starving out other processes.  I'm careful :)


>>And for specifically Intel SATA, drivers/ide flat out doesn't work (even
>>though it claims to).
> 
> 
> So fix it ;-).

As you can see from the discussion, I think a new driver best serves the 
future of SATA2 as well as today's SATA.  So I'm not gonna bother...  I 
have a driver that works.

If you're interested in working the problem, the result of booting ICH5 
SATA in drivers/ide is a hang at
	hdX: attached to ide-disk driver

Nothing happens after that.  sysrq doesn't work, so my assumption was 
that the bus was locked up.


>>So, I conclude:  faster, smaller, and better future direction.  IMO, of
>>course :)
> 
> 
> And right now ugly and incomplete.
> IMO, of course ;-).

Incomplete?  Kinda sorta.  SATAPI isn't here yet, so all it needs is 
more fine-grained error handling.  I call it 100% stable now, though.

Ugly?  Compared to drivers/ide?  I actually have a lot of admiration and 
respect for the PATA knowledge embedded in drivers/ide.  But I would 
never call it pretty :)

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:30   ` Jeff Garzik
  2003-05-26  5:36     ` Jeff Garzik
  2003-05-26  5:40     ` Linus Torvalds
@ 2003-05-26 10:32     ` Bartlomiej Zolnierkiewicz
  2003-05-26 11:13       ` Jeff Garzik
  2 siblings, 1 reply; 68+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2003-05-26 10:32 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel


On Mon, 26 May 2003, Jeff Garzik wrote:

> Linus Torvalds wrote:
> > On Mon, 26 May 2003, Jeff Garzik wrote:
> >
> >>Just to echo some comments I said in private, this driver is _not_
> >>a replacement for drivers/ide.  This is not, and has never been,
> >>the intention.  In fact, I need drivers/ide's continued existence,
> >>so that I may have fewer boundaries on future development.
> >
> >
> > Just out of interest, is there any _point_ to this driver? I can
> > appreciate the approach, but I'd like to know if it does anything (at all)
> > better than the native IDE driver? Faster? Anything?
>
>
> Direction:  SATA is much more suited to SCSI, because otherwise you wind
> up re-creating all the queueing and error handling mess that SCSI
> already does for you.  The SATA2 host controllers coming out soon do
> full host-side TCQ, not the dain-bramaged ATA TCQ bus-release stuff.
> Doing SATA2 devel in drivers/ide will essentially be re-creating the
> SCSI mid-layer.

And now you are recreating ATA in SCSI ;-).

Don't get me wrong: I like idea very much, but why you can't
share common code between drivers/ide and your ATA-SCSI.

> Modularity:  drivers/ide has come a long way.  It needed to be turned
> "inside out", and that's what Alan did.  But there's still a lot of code
> that needs to be factored out/about, before hotplugging and device model
> stuff is sane.

Its getting close.

> Legacy-free:  Because I don't have to worry about legacy host
> controllers, I can ignore limitations drivers/ide cannot.  In
> drivers/ide, each host IO (PIO/MMIO) is done via function pointer.  If
> your arch has a mach_vec, more function pointers.  Mine does direct
> calls to the asm/io.h functions in faster.  So, ATA command submission
> is measureably faster.

I think it is simply wrong, you should use function pointers.
You can have ie. two PCI hosts, one using PIO and one using MMIO.

"measureably faster", I doubt.
IO operations are REALLY slow when compared to CPU cycles.

> sysfs:  James and co are putting time into getting scsi sysfs right.  I
> would rather ride their coattails, and have my driver Just Work with
> sysfs and the driver model.

No big deal here, ATA will get it too.

> PIO data transfer is faster and more scheduler-friendly, since it polls
> from a kernel thread.

CPU polling is faster than IRQs?

> And for specifically Intel SATA, drivers/ide flat out doesn't work (even
> though it claims to).

So fix it ;-).

> So, I conclude:  faster, smaller, and better future direction.  IMO, of
> course :)

And right now ugly and incomplete.
IMO, of course ;-).

Regards,
--
Bartlomiej

> 	Jeff
>
>
>
> (the following is somewhat comparing apples to oranges, but I like doing
> it nonetheless)

social engineering removed



^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:53       ` Jeff Garzik
@ 2003-05-26  6:21         ` Jeff Garzik
  2003-05-26 16:57           ` Linus Torvalds
  0 siblings, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26  6:21 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Jeff Garzik wrote:
> Linus Torvalds wrote:
>> In other words: I'd really like to see what you can do with a _native_
>> request queue driver, and what the probloems are. And maybe port 
>> _those_ parts of SCSI to it.

> Actually getting down to coding, I see it as a huge amount of work for 
> little gain.  You have to consider all the userspace interfaces, sysfs 
> and device model support that wants coding, -after- you're done with the 
> basic SATA block driver.  Userland proggies already exist for scsi.


Tangenting a bit:

What does the block layer need, that it doesn't have now?

Each major subsystem right now is re-creating driver classes (floppy, 
tape, disk, ...) and that should be moved up and made more general. 
People should use /dev/diskX, /dev/floppyX, /dev/cdromX etc. without 
having to worry about IDE or SCSI or Jeff's SATA or whatever. 
Registration of majors/minors/CIDR/etc.  sysfs.  host 
controller-specific struct request_queue settings.  All these have to be 
recreated each time a new native block driver is created.  If you peer 
closely at some scsi drivers already merged, you see that too chose to 
translate SCSI rather than re-create all the junk necessary for a native 
block driver.  I'm definitely not the first to do what I did.  :)

When such code moves up into the block layer, then we can have a unified 
userland for doing class-specific activities, and then have a few 
bus-specific tools for the ATA- or SCSI-specific needs not met by 
class-general code.

As it stands now, I think a new native block driver creates more user 
confusion and pain than it solves, in addition to saving developer pain, 
if you consider all the device classes and userland tools and such that 
need writing.

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:59 ` Benjamin Herrenschmidt
@ 2003-05-26  6:03   ` Jeff Garzik
  0 siblings, 0 replies; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26  6:03 UTC (permalink / raw)
  To: Benjamin Herrenschmidt; +Cc: Linus Torvalds, linux-kernel mailing list

Benjamin Herrenschmidt wrote:
> Btw, Jeff, while I agree about not boring about old PATA hardware,
> I'd still like to see support for MDMA modes in there. For once,
> there is no real difference in supporting both UDMA and MDMA, it's
> really just a matter of extending the range of the "mode" parameter,
> and I'd like to be able to use the driver on configs like pmacs which
> typically have a U/DMA capable channel for the internal HD and one
> MDMA only channel (ATAPI CD/DVD, ZIP) without having to play bad tricks
> to get both drivers up.


We'll see... I would really like to ignore MDMA.  For the reasons just 
outlined in replies to Linus, PATA support is quite useful in limited 
situations, but is not at all the focus of the driver.  When situations 
arise where special support for some PATA situation is needed, I will 
most likely just say "use drivers/ide"...

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:42       ` Linus Torvalds
@ 2003-05-26  6:01         ` Jeff Garzik
  2003-05-26 16:56           ` Linus Torvalds
  0 siblings, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26  6:01 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> On Mon, 26 May 2003, Jeff Garzik wrote:
> 
>>I think with SATA + drivers/ide, you reach a point of diminishing 
>>returns versus amount of time spent on mid-layer coding.
> 
> 
> I think that's a valid approach, and just have a special driver for SATA. 
> That's not the part I worry about. 

You're still at a point of diminishing returns for a native driver too.

ATAPI support would be a pain.  Can't use drivers/ide/* nor ide-scsi.

Userland support is nonexistent, and users would have start using yet 
another set of tools to deal with their disks, instead of the existing 
[scsi] ones.

Device model, power management, hotplugging are all handled or 
in-the-works for scsi, and they are exactly what SATA needs.  I would 
have to recreate all that handling from scratch.

I think you're missing how much code and pain is actually saved by using 
the scsi layer...  like I mentioned in the last message, long-term, 
these issues can (and should) be solved by moving some of this stuff out 
of the scsi layer into the block layer, creating an overall desirable 
flattening effect of all code involved.


> The part I worry about is the SCSI layer itself, and also potential user
> confusion.

I think this can only benefit the scsi layer and continue its trend of 
becoming more lightweight and lib'ified.

WRT user confusion, I agree:  that's why PATA is a config option.  I 
would prefer people think "SATA? use my driver.  PATA? use drivers/ide"

Andre suggested that I might support PCI PATA controllers in my driver 
"officially", but I think user confusion and transition issues rule out 
that, for non-hackers.

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  4:58 Jeff Garzik
  2003-05-26  5:15 ` Linus Torvalds
@ 2003-05-26  5:59 ` Benjamin Herrenschmidt
  2003-05-26  6:03   ` Jeff Garzik
  2003-06-02  9:46 ` Andre Hedrick
  2 siblings, 1 reply; 68+ messages in thread
From: Benjamin Herrenschmidt @ 2003-05-26  5:59 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Linus Torvalds, linux-kernel mailing list

On Mon, 2003-05-26 at 06:58, Jeff Garzik wrote:
> Just to echo some comments I said in private, this driver is _not_
> a replacement for drivers/ide.  This is not, and has never been,
> the intention.  In fact, I need drivers/ide's continued existence,
> so that I may have fewer boundaries on future development.
> 
> Even though ATAPI support doesn't exist and error handling is
> primitive, this driver has been extensively tested locally and I feel
> is ready for a full and public kernel developer assault :)
> 
> James ok'd sending this...  I'll be sending "un-hack scsi headers" patch
> through him via his scsi-misc-2.5 tree.

Btw, Jeff, while I agree about not boring about old PATA hardware,
I'd still like to see support for MDMA modes in there. For once,
there is no real difference in supporting both UDMA and MDMA, it's
really just a matter of extending the range of the "mode" parameter,
and I'd like to be able to use the driver on configs like pmacs which
typically have a U/DMA capable channel for the internal HD and one
MDMA only channel (ATAPI CD/DVD, ZIP) without having to play bad tricks
to get both drivers up.

Ben.


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:40     ` Linus Torvalds
@ 2003-05-26  5:53       ` Jeff Garzik
  2003-05-26  6:21         ` Jeff Garzik
  2003-05-27  0:22       ` Alan Cox
  1 sibling, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26  5:53 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> On Mon, 26 May 2003, Jeff Garzik wrote:
> 
>>Direction:  SATA is much more suited to SCSI, because otherwise you wind 
>>up re-creating all the queueing and error handling mess that SCSI 
>>already does for you. 
> 
> 
> Last I looked, the SCSI interfaces were much nastier than the native 
> queueing, and if there is anything missing I'd rather put it at that 
> layer, instead of making everything use the SCSI layer.


The SCSI mid-layer has quite flexible command submission and 
synchronization.  Since each SATA host controller I've seen so far 
differs in its queueing implementation and limits, this fits perfectly 
with the existing SCSI set up.

So, short-term, I save a ton of code over a SATA block driver. 
Long-term, the SCSI mid-layer benefits from the developer attention and 
becomes more lightweight as we push some generic concepts upwards into 
the block layer.  Just look at how far scsi mid-layer has come in 2.5, 
versus 2.4!  Much more lightweight even now.

So, short-term I disagree.  Long-term, I actually agree w/ you, in an 
indirect sorta way :)


> Because when you talk about error handling messes, you're talking SCSI. 
> THAT is messy. At least judging by the fact that a lot of SCSI drivers 
> don't seem to get it right.

This isn't an issue for me.  I have to do my own error handling, in 
fact.  I just define ->eh_strategy_handler, and do my own thing.  SCSI 
mid-layer nicely provides a kernel thread and command synchronization 
before and after it calls ->eh_strategy_handler.


> In other words: I'd really like to see what you can do with a _native_
> request queue driver, and what the probloems are. And maybe port _those_ 
> parts of SCSI to it.

I considered a native block driver, or perhaps a native block driver for 
SATA and SCSI pass-thru for SATAPI.

Actually getting down to coding, I see it as a huge amount of work for 
little gain.  You have to consider all the userspace interfaces, sysfs 
and device model support that wants coding, -after- you're done with the 
basic SATA block driver.  Userland proggies already exist for scsi.

(more on this in next reply)


>>And for specifically Intel SATA, drivers/ide flat out doesn't work (even 
>>though it claims to).
> 
> 
> Well, I don't think it claimed to, until today. Still doesn't work?

Nope.  Not before or after.  (even without the patch, it should have 
worked in some amount of compat mode)

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:36     ` Jeff Garzik
@ 2003-05-26  5:42       ` Linus Torvalds
  2003-05-26  6:01         ` Jeff Garzik
  0 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26  5:42 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel


On Mon, 26 May 2003, Jeff Garzik wrote:
> 
> I think with SATA + drivers/ide, you reach a point of diminishing 
> returns versus amount of time spent on mid-layer coding.

I think that's a valid approach, and just have a special driver for SATA. 
That's not the part I worry about. 

The part I worry about is the SCSI layer itself, and also potential user
confusion.

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:30   ` Jeff Garzik
  2003-05-26  5:36     ` Jeff Garzik
@ 2003-05-26  5:40     ` Linus Torvalds
  2003-05-26  5:53       ` Jeff Garzik
  2003-05-27  0:22       ` Alan Cox
  2003-05-26 10:32     ` Bartlomiej Zolnierkiewicz
  2 siblings, 2 replies; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26  5:40 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel


On Mon, 26 May 2003, Jeff Garzik wrote:
> 
> Direction:  SATA is much more suited to SCSI, because otherwise you wind 
> up re-creating all the queueing and error handling mess that SCSI 
> already does for you. 

Last I looked, the SCSI interfaces were much nastier than the native 
queueing, and if there is anything missing I'd rather put it at that 
layer, instead of making everything use the SCSI layer.

Because when you talk about error handling messes, you're talking SCSI. 
THAT is messy. At least judging by the fact that a lot of SCSI drivers 
don't seem to get it right.

In other words: I'd really like to see what you can do with a _native_
request queue driver, and what the probloems are. And maybe port _those_ 
parts of SCSI to it.

> And for specifically Intel SATA, drivers/ide flat out doesn't work (even 
> though it claims to).

Well, I don't think it claimed to, until today. Still doesn't work?

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:30   ` Jeff Garzik
@ 2003-05-26  5:36     ` Jeff Garzik
  2003-05-26  5:42       ` Linus Torvalds
  2003-05-26  5:40     ` Linus Torvalds
  2003-05-26 10:32     ` Bartlomiej Zolnierkiewicz
  2 siblings, 1 reply; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26  5:36 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Jeff Garzik wrote:
> So, I conclude:  faster, smaller, and better future direction.  IMO, of 
> course :)


As a FWIW, this driver is mainly intended as a Serial ATA driver.

It just happens to do PATA by coincedence (i.e. because it makes devel 
easier for me).

For example, current Intel SATA is "PATA, but without the timing junk."

I think with SATA + drivers/ide, you reach a point of diminishing 
returns versus amount of time spent on mid-layer coding.

	Jeff




^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  5:15 ` Linus Torvalds
@ 2003-05-26  5:30   ` Jeff Garzik
  2003-05-26  5:36     ` Jeff Garzik
                       ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26  5:30 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: linux-kernel

Linus Torvalds wrote:
> On Mon, 26 May 2003, Jeff Garzik wrote:
> 
>>Just to echo some comments I said in private, this driver is _not_
>>a replacement for drivers/ide.  This is not, and has never been,
>>the intention.  In fact, I need drivers/ide's continued existence,
>>so that I may have fewer boundaries on future development.
> 
> 
> Just out of interest, is there any _point_ to this driver? I can
> appreciate the approach, but I'd like to know if it does anything (at all)
> better than the native IDE driver? Faster? Anything?


Direction:  SATA is much more suited to SCSI, because otherwise you wind 
up re-creating all the queueing and error handling mess that SCSI 
already does for you.  The SATA2 host controllers coming out soon do 
full host-side TCQ, not the dain-bramaged ATA TCQ bus-release stuff. 
Doing SATA2 devel in drivers/ide will essentially be re-creating the 
SCSI mid-layer.

Modularity:  drivers/ide has come a long way.  It needed to be turned 
"inside out", and that's what Alan did.  But there's still a lot of code 
that needs to be factored out/about, before hotplugging and device model 
stuff is sane.

Legacy-free:  Because I don't have to worry about legacy host 
controllers, I can ignore limitations drivers/ide cannot.  In 
drivers/ide, each host IO (PIO/MMIO) is done via function pointer.  If 
your arch has a mach_vec, more function pointers.  Mine does direct 
calls to the asm/io.h functions in faster.  So, ATA command submission 
is measureably faster.

sysfs:  James and co are putting time into getting scsi sysfs right.  I 
would rather ride their coattails, and have my driver Just Work with 
sysfs and the driver model.

PIO data transfer is faster and more scheduler-friendly, since it polls 
from a kernel thread.

And for specifically Intel SATA, drivers/ide flat out doesn't work (even 
though it claims to).

So, I conclude:  faster, smaller, and better future direction.  IMO, of 
course :)

	Jeff



(the following is somewhat comparing apples to oranges, but I like doing 
it nonetheless)

bash-2.05b$ wc -l drivers/scsi/libata.c drivers/scsi/ata_piix.c 
include/linux/ata.h
    2247 drivers/scsi/libata.c
     322 drivers/scsi/ata_piix.c
     485 include/linux/ata.h
    3054 total

bash-2.05b$ wc -l drivers/ide/*.[ch] drivers/ide/pci/piix.[ch]
    3418 drivers/ide/ide-cd.c
     733 drivers/ide/ide-cd.h
      71 drivers/ide/ide-default.c
    1841 drivers/ide/ide-disk.c
    1145 drivers/ide/ide-dma.c
    2090 drivers/ide/ide-floppy.c
     135 drivers/ide/ide-geometry.c
    1330 drivers/ide/ide-io.c
    1322 drivers/ide/ide-iops.c
     437 drivers/ide/ide-lib.c
      97 drivers/ide/ide-pnp.c
    1466 drivers/ide/ide-probe.c
     930 drivers/ide/ide-proc.c
    6330 drivers/ide/ide-tape.c
    2006 drivers/ide/ide-taskfile.c
     798 drivers/ide/ide-tcq.c
     281 drivers/ide/ide-timing.h
    2539 drivers/ide/ide.c
      41 drivers/ide/ide_modes.h
     899 drivers/ide/setup-pci.c
     840 drivers/ide/pci/piix.c
     317 drivers/ide/pci/piix.h
   29066 total


^ permalink raw reply	[flat|nested] 68+ messages in thread

* Re: [BK PATCHES] add ata scsi driver
  2003-05-26  4:58 Jeff Garzik
@ 2003-05-26  5:15 ` Linus Torvalds
  2003-05-26  5:30   ` Jeff Garzik
  2003-05-26  5:59 ` Benjamin Herrenschmidt
  2003-06-02  9:46 ` Andre Hedrick
  2 siblings, 1 reply; 68+ messages in thread
From: Linus Torvalds @ 2003-05-26  5:15 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: linux-kernel


On Mon, 26 May 2003, Jeff Garzik wrote:
>
> Just to echo some comments I said in private, this driver is _not_
> a replacement for drivers/ide.  This is not, and has never been,
> the intention.  In fact, I need drivers/ide's continued existence,
> so that I may have fewer boundaries on future development.

Just out of interest, is there any _point_ to this driver? I can
appreciate the approach, but I'd like to know if it does anything (at all)
better than the native IDE driver? Faster? Anything?

		Linus


^ permalink raw reply	[flat|nested] 68+ messages in thread

* [BK PATCHES] add ata scsi driver
@ 2003-05-26  4:58 Jeff Garzik
  2003-05-26  5:15 ` Linus Torvalds
                   ` (2 more replies)
  0 siblings, 3 replies; 68+ messages in thread
From: Jeff Garzik @ 2003-05-26  4:58 UTC (permalink / raw)
  To: torvalds, linux-kernel

Just to echo some comments I said in private, this driver is _not_
a replacement for drivers/ide.  This is not, and has never been,
the intention.  In fact, I need drivers/ide's continued existence,
so that I may have fewer boundaries on future development.

Even though ATAPI support doesn't exist and error handling is
primitive, this driver has been extensively tested locally and I feel
is ready for a full and public kernel developer assault :)

James ok'd sending this...  I'll be sending "un-hack scsi headers" patch
through him via his scsi-misc-2.5 tree.




Linus, please do a

	bk pull bk://kernel.bkbits.net/jgarzik/scsi-2.5

Others may download the patch from

ftp://ftp.kernel.org/pub/linux/kernel/people/jgarzik/patchkits/2.5/2.5.69-bk18-scsi1.patch.bz2

This will update the following files:

 drivers/scsi/Kconfig    |   27 
 drivers/scsi/Makefile   |    1 
 drivers/scsi/ata_piix.c |  322 ++++++
 drivers/scsi/libata.c   | 2247 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/ata.h     |  485 ++++++++++
 5 files changed, 3082 insertions(+)

through these ChangeSets:

<jgarzik@redhat.com> (03/05/26 1.1357)
   [scsi ata] make PATA config option actually do something useful

<jgarzik@redhat.com> (03/05/26 1.1356)
   [scsi ata] include hacks, b/c scsi headers not in include/linux

<jgarzik@redhat.com> (03/05/26 1.1355)
   [scsi] add ATA driver


^ permalink raw reply	[flat|nested] 68+ messages in thread

end of thread, other threads:[~2003-06-02 14:41 UTC | newest]

Thread overview: 68+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-05-26 18:12 [BK PATCHES] add ata scsi driver James Bottomley
2003-05-26 18:18 ` Jens Axboe
2003-05-26 18:47   ` James Bottomley
2003-05-26 19:07     ` Jens Axboe
2003-05-26 19:17       ` James Bottomley
2003-05-26 19:33         ` Jens Axboe
2003-05-27 12:39           ` Jens Axboe
2003-05-27 14:26             ` James Bottomley
2003-05-27 17:16               ` Jens Axboe
2003-05-27 18:09                 ` James Bottomley
2003-05-27 18:21                   ` Jens Axboe
2003-05-27 18:30                     ` James Bottomley
2003-05-26 20:27         ` Linus Torvalds
2003-05-26 20:36           ` James Bottomley
2003-05-26 20:45             ` Linus Torvalds
2003-05-26 20:51               ` Jens Axboe
2003-05-26 20:56               ` James Bottomley
2003-05-26 20:38           ` Jens Axboe
2003-05-26 20:49             ` Linus Torvalds
2003-05-26 20:57               ` Jens Axboe
2003-05-26 21:34                 ` Linus Torvalds
2003-05-26 23:58                   ` Nick Piggin
2003-05-27  0:09                     ` Linus Torvalds
2003-05-27  0:49                       ` Nick Piggin
2003-05-27  0:16                   ` Alan Cox
2003-05-27  6:54                   ` Jens Axboe
2003-05-27 14:20                     ` James Bottomley
2003-05-27 14:36                     ` Linus Torvalds
2003-05-27 14:59                       ` James Bottomley
2003-05-27 15:21                         ` Jeff Garzik
2003-05-27 15:38                           ` James Bottomley
2003-05-27 15:50                             ` Jeff Garzik
2003-05-27 16:00                               ` James Bottomley
2003-05-27 16:16                                 ` Jeff Garzik
2003-05-28  9:35                                   ` Christoph Hellwig
2003-05-28 10:50                           ` Lincoln Dale
2003-05-27 19:43                       ` Jens Axboe
  -- strict thread matches above, loose matches on Subject: below --
2003-05-26  4:58 Jeff Garzik
2003-05-26  5:15 ` Linus Torvalds
2003-05-26  5:30   ` Jeff Garzik
2003-05-26  5:36     ` Jeff Garzik
2003-05-26  5:42       ` Linus Torvalds
2003-05-26  6:01         ` Jeff Garzik
2003-05-26 16:56           ` Linus Torvalds
2003-05-26 17:47             ` Jeff Garzik
2003-05-26 20:09               ` Linus Torvalds
2003-05-27  0:29                 ` Alan Cox
2003-05-27  6:07                 ` Jeff Garzik
2003-05-27  6:30                   ` Linus Torvalds
2003-05-27  6:51                     ` Linus Torvalds
2003-05-27  7:29                       ` Jeff Garzik
2003-05-26  5:40     ` Linus Torvalds
2003-05-26  5:53       ` Jeff Garzik
2003-05-26  6:21         ` Jeff Garzik
2003-05-26 16:57           ` Linus Torvalds
2003-05-26 17:24             ` Jens Axboe
2003-05-26 17:54               ` Jeff Garzik
2003-05-26 17:59               ` Jeff Garzik
2003-05-26 18:11                 ` Jens Axboe
2003-05-27  0:22       ` Alan Cox
2003-05-27  4:15         ` Linus Torvalds
2003-05-26 10:32     ` Bartlomiej Zolnierkiewicz
2003-05-26 11:13       ` Jeff Garzik
2003-05-26 11:37         ` Bartlomiej Zolnierkiewicz
2003-05-26  5:59 ` Benjamin Herrenschmidt
2003-05-26  6:03   ` Jeff Garzik
2003-06-02  9:46 ` Andre Hedrick
2003-06-02 13:56   ` Alan Cox

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).