linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [RFC] Disk shock protection (revisited)
  2008-02-28 11:13     ` Alan Cox
@ 2008-02-24 18:03       ` Pavel Machek
  2008-02-28 17:00       ` Greg Freemyer
  2008-03-07 18:03       ` Elias Oltmanns
  2 siblings, 0 replies; 25+ messages in thread
From: Pavel Machek @ 2008-02-24 18:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: Elias Oltmanns, linux-ide, linux-kernel, Jens Axboe

Hi!

> > > That sounds like a non starter. What if the box is busy, what if the
> > > daemon or something you touch needs memory and causes paging ?
> > 
> > The daemon runs mlock'd anyway, so there won't be any need for paging
> 
> mlock does not guarantee anything of that form. A syscall by an mlocked
> process which causes a memory allocation can cause paging of another
> process on the system.

Well... but you can be careful about the syscalls, right?

Anyway, active protection is 'best effort' anyway. There's not enough
time to park heads if you drop the machine without tilting it first...
and we have been running with no protection for years now...

> > stays in memory all the time, it can go ahead and notify the kernel that
> > the disk heads should be unloaded. The kernel takes care to insert the
> > idle immediate command at the head of the queue. Am I missing something?
> 
> Yes - the fact we may well have bounced off the floor already.

Well, shit happens. Even notebook with parked harddrive is not
guaranteed to survive the fall.
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [RFC] Disk shock protection (revisited)
@ 2008-02-25 23:56 Elias Oltmanns
  2008-02-26  0:02 ` Jeff Garzik
                   ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Elias Oltmanns @ 2008-02-25 23:56 UTC (permalink / raw)
  To: linux-ide; +Cc: linux-kernel, Jens Axboe

Hi all,

at the moment I'm having another go at trying to make the disk shock
protection patch fit for upstream submission. However, there are still
some fundamental issues I'd like to discuss in order to make sure that
I'm heading in the right direction.

The general idea: A daemon running in user space monitors input data
from an accelerometer. When the daemon detects a critical condition,
i.e., a sudden acceleration (for instance, laptop slides off the desk),
it signals the kernel so the hard disk may be put into a (more) safe
state. To this end, the kernel has to issue an idle immediate command
with unload feature and stop the block layer queue afterwards. Once the
daemon tells us that the imminent danger is over, the most important
task for the kernel is to restart the block layer queue. See below for
more details.

This project is (and I personally am) mainly concerned with laptops
equipped with an accelerometer and an (S)ATA hard drive that supports
the unload feature of the idle immediate command. Jens Axboe, however,
suggested right from the beginning that there might be more general
applications for the block layer queue freezing part of the story. The
question is now to what extent are the requirements for a disk shock
protection facility (specific to ATA devices) and a general block layer
queue freezing facility compatible and in what way should they be
exposed to user space.

Probably, the major problem is that I don't really know what kind of
applications (apart from shock protection) I should be thinking of that
might want to have a queue freezing facility at hand. Still, one thing
seems obvious to me: For disk shock protection, time is of the essence,
whereas in the more general case of simple block layer queue freezing,
the situation is different as far as lower levels are concerned. In
particular, I'm inclined to believe that in the context of such a
general application it would be desirable to be able to freeze the queue
of an ATA device *without* issuing an idle immediate command first.
Obviously, the interface exposed to userspace would have to provide for
these diverging needs.

The disk-protect patch in it's current form [1] got stuck somewhere
between trying to provide a general queue freezing facility and
accommodating the needs of a disk shock protection setup. The sysfs
attributes required to request immediate disk parking from user space
are exported under /sys/block/. This is very convenient from the user's
point of view because the hierarchy is intuitive and you can easily find
the subdirectory associated to your hard disk. Conceptually, though, it
doesn't feel right. That is, for simple queue freezing, it is perfectly
alright, of course, but I don't see why and, indeed, how an ATA specific
feature like immediate disk parking could be controlled from the block
layer in a straight forward way. Besides, Jens, quite understandably,
objects to the introduction of yet another queue hook which is the
current way of telling ATA and non-ATA devices apart. Instead, he
suggests to implement generic block layer notification requests like
REQ_LB_OP_FREEZE of type REQ_TYPE_LINUX_BLOCK and let low level drivers
act upon it as they see fit. But then we would still need a way to
configure the way libata / ide actuaaly does respond to those block
layer messages. As explained above, the user might want to choose
whether or idle immediate is to be issued or simple queue freezing is
enough for his / her purposes. Besides, some drives that actually
support the unload feature of the idle immediate command don't report
that capability in the IDENTIFY data, so userspace needs a way to tell
the driver that the feature is available after all.

So, roughly my questions are these:

1. Who is to be in charge for the shock protection application? Should
   userspace speak to libata / ide directly (through sysfs) and the low
   level drivers will notify block layer to stop the queeue or should
   userspace always talk to the block layer, regardless whether we want
   to park an ATA disk or just freeze the queue? In the latter case we'd
   still need the option to configure the exact behaviour for ATA
   devices.
2. Depending on the answer to the previous question, by what mechanism
   should block layer and lld interact? Special requests, queue hooks or
   something in some way similar to power management functions (once
   suggested by James Bottomley)?
3. What is the preferred way to pass device specific configuration
   options to libata (preferrably at runtime, i.e., after module
   loading)?

Please let me know if you need any further information. Also, I will
certainly have more questions once I try to my hand at any of your
suggestions.

Thanks in advance,

Elias


[1] http://article.gmane.org/gmane.linux.drivers.hdaps.devel/1094

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-25 23:56 [RFC] Disk shock protection (revisited) Elias Oltmanns
@ 2008-02-26  0:02 ` Jeff Garzik
  2008-02-26  0:30   ` Elias Oltmanns
  2008-02-26 12:39 ` Alan Cox
  2008-02-26 20:47 ` Willy Tarreau
  2 siblings, 1 reply; 25+ messages in thread
From: Jeff Garzik @ 2008-02-26  0:02 UTC (permalink / raw)
  To: linux-ide, linux-kernel, Jens Axboe

Elias Oltmanns wrote:
> The general idea: A daemon running in user space monitors input data
> from an accelerometer. When the daemon detects a critical condition,
> i.e., a sudden acceleration (for instance, laptop slides off the desk),
> it signals the kernel so the hard disk may be put into a (more) safe
> state. To this end, the kernel has to issue an idle immediate command
> with unload feature and stop the block layer queue afterwards. Once the
> daemon tells us that the imminent danger is over, the most important
> task for the kernel is to restart the block layer queue. See below for
> more details.

Speaking specifically to that problem, it seems to me that you either 
want an mlock'd daemon, or just simply to keep everything in the kernel, 
for this specific solution.

You don't want, for example, to swap out other apps, swap in the daemon, 
in order to handle a sudden acceleration.

	Jeff



^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-26  0:02 ` Jeff Garzik
@ 2008-02-26  0:30   ` Elias Oltmanns
  2008-02-26  1:33     ` Henrique de Moraes Holschuh
  0 siblings, 1 reply; 25+ messages in thread
From: Elias Oltmanns @ 2008-02-26  0:30 UTC (permalink / raw)
  To: linux-ide; +Cc: linux-kernel, Jens Axboe

Jeff Garzik <jeff@garzik.org> wrote:
> Elias Oltmanns wrote:
>> The general idea: A daemon running in user space monitors input data
>> from an accelerometer. When the daemon detects a critical condition,
>> i.e., a sudden acceleration (for instance, laptop slides off the desk),
>> it signals the kernel so the hard disk may be put into a (more) safe
>> state. To this end, the kernel has to issue an idle immediate command
>> with unload feature and stop the block layer queue afterwards. Once the
>> daemon tells us that the imminent danger is over, the most important
>> task for the kernel is to restart the block layer queue. See below for
>> more details.
>
> Speaking specifically to that problem, it seems to me that you either
> want an mlock'd daemon, or just simply to keep everything in the
> kernel, for this specific solution.

Yes, the daemon is mlock'd.

>
> You don't want, for example, to swap out other apps, swap in the
> daemon, in order to handle a sudden acceleration.

Quite. But with mlock this particular problem can be handled in user
space just fine. The only reason I can see right now for putting this
logic into the kernel as well is to keep the functionality around even
after task freeze during suspend / resume. On the other hand, I don't
know whether this is really worth the effort even though the time when
the suspend operation is in progress can arguably be one of the most
accident-prone moments (think of users packing their things in a hurry).

Regards,

Elias

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-26  0:30   ` Elias Oltmanns
@ 2008-02-26  1:33     ` Henrique de Moraes Holschuh
  0 siblings, 0 replies; 25+ messages in thread
From: Henrique de Moraes Holschuh @ 2008-02-26  1:33 UTC (permalink / raw)
  To: linux-ide, linux-kernel, Jens Axboe

On Tue, 26 Feb 2008, Elias Oltmanns wrote:
> > You don't want, for example, to swap out other apps, swap in the
> > daemon, in order to handle a sudden acceleration.
> 
> Quite. But with mlock this particular problem can be handled in user
> space just fine. The only reason I can see right now for putting this

And, as long as there is a way to also invoke it from within the kernel, we
can call it from inside the kernel as well when there is a reason to make
that function available.  Full flexibility is easily attainable here and
nothing we should bother about too much.

> logic into the kernel as well is to keep the functionality around even

Some hardware (Apple's?) has the entire APS logic in firmware (and AFAIK
*also* export the accelerometer data for other uses).  On those boxes, if
you want to trust the firmware, you just ignore the accelerometer and hook
to an interrupt.  When you get the interrupt, you freeze the queue and
unload heads.  Adding that to work in-kernel would be trivial.

Adding a suspend-time-only emergency HDAPS in-kernel monitor thread would
also be doable, if we wanted to duplicate that for ThinkPads (I don't really
think it is needed, but...).  As long as queue freezing and the required
unload immediate procedure can be called in at *any* time before the disk
device, and its buses and controller are suspended, it would do to implement
whatever we feel it is needed.

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-25 23:56 [RFC] Disk shock protection (revisited) Elias Oltmanns
  2008-02-26  0:02 ` Jeff Garzik
@ 2008-02-26 12:39 ` Alan Cox
  2008-02-28  8:24   ` Elias Oltmanns
  2008-02-26 20:47 ` Willy Tarreau
  2 siblings, 1 reply; 25+ messages in thread
From: Alan Cox @ 2008-02-26 12:39 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-ide, linux-kernel, Jens Axboe

> The general idea: A daemon running in user space monitors input data
> from an accelerometer. When the daemon detects a critical condition,

That sounds like a non starter. What if the box is busy, what if the
daemon or something you touch needs memory and causes paging ?

Given the accelerometer data should be very simple doesn't it actually
make sense in this specific case to put the logic (not thresholds) in
kernel space.

> state. To this end, the kernel has to issue an idle immediate command
> with unload feature and stop the block layer queue afterwards. Once the

Yep. Pity the worst case completion time for an IDE I/O is 60 seconds or
so.

> 1. Who is to be in charge for the shock protection application? Should
>    userspace speak to libata / ide directly (through sysfs) and the low

I think it has to be kernel side for speed, and because you will need to
issue idle immediate while a command sequence is active which is
*extremely* hairy as you have to recover from the mess and restart the
relevant I/O. Plus you may need controller specific knowledge on issuing
it (and changes to libata).

> 2. Depending on the answer to the previous question, by what mechanism
>    should block layer and lld interact? Special requests, queue hooks or
>    something in some way similar to power management functions (once
>    suggested by James Bottomley)?

Idle immediate seem to simply fit the queue model, it happens in
*parallel* to I/O events and is special in all sorts of ways.

> 3. What is the preferred way to pass device specific configuration
>    options to libata (preferrably at runtime, i.e., after module
>    loading)?

sysfs

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-25 23:56 [RFC] Disk shock protection (revisited) Elias Oltmanns
  2008-02-26  0:02 ` Jeff Garzik
  2008-02-26 12:39 ` Alan Cox
@ 2008-02-26 20:47 ` Willy Tarreau
  2008-02-28 10:10   ` Elias Oltmanns
  2 siblings, 1 reply; 25+ messages in thread
From: Willy Tarreau @ 2008-02-26 20:47 UTC (permalink / raw)
  To: linux-ide, linux-kernel, Jens Axboe

Hi Elias,

On Tue, Feb 26, 2008 at 12:56:31AM +0100, Elias Oltmanns wrote:

[ very interesting project ]

> Probably, the major problem is that I don't really know what kind of
> applications (apart from shock protection) I should be thinking of that
> might want to have a queue freezing facility at hand.

In terms of applications, depending on the sensitivity of the accelerometer,
we can imagine that a laptop would immediately force unmount crypted
filesystems if it believes it's being stolen, for instance. It's just a
random idea that comes to my mind, in the hope it may help you imagine
some crazy usages. But generally you should not fool your mind with too
many hypothetical cases, ideas will come once you provide a smart interface
and this interface will evolve with future needs.

Concerning your daemon, I think that every millisecond counts when a
laptop falls on the floor. So I think that running it in the kernel
should help you gain those precious milliseconds. I doubt your daemon
could trigger fast enough while X is starting or during some activities
which require a lot of CPU or uninterruptible I/O. If (I don't know)
the driver can be woken up by an interrupt from the controller, it
might react faster.

Good luck, and I sincerely wish you success on this project!
Willy


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-26 12:39 ` Alan Cox
@ 2008-02-28  8:24   ` Elias Oltmanns
  2008-02-28 11:13     ` Alan Cox
  0 siblings, 1 reply; 25+ messages in thread
From: Elias Oltmanns @ 2008-02-28  8:24 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel, Jens Axboe

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
>> The general idea: A daemon running in user space monitors input data
>> from an accelerometer. When the daemon detects a critical condition,
>
> That sounds like a non starter. What if the box is busy, what if the
> daemon or something you touch needs memory and causes paging ?

The daemon runs mlock'd anyway, so there won't be any need for paging
there. As for responsiveness under heavy load, I'm not quite sure I get
your meaning. On my system, at least, the only way I have managed to
decrease responsiveness noticeably is to cause a lot of I/O operations
on my disk. But even then it's not the overall responsiveness that gets
hurt but just any action that requires further I/O. Since the daemon
stays in memory all the time, it can go ahead and notify the kernel that
the disk heads should be unloaded. The kernel takes care to insert the
idle immediate command at the head of the queue. Am I missing something?

>
> Given the accelerometer data should be very simple doesn't it actually
> make sense in this specific case to put the logic (not thresholds) in
> kernel space.

The simplicity of the input data doesn't necessarily imply that the
evaluation logic is simple as well; but then the daemon is rather simple
in this case. Still, probably due to my lack of experience I don't quite
see what can be gained by putting it into kernel space which cannot be
achieved using the mlock feature or nice levels.

The important thing is this: There will be a dedicated code path for
disk head parking in the kernel. If the actual decision about when head
parking should take place is left to a daemon in user space, it is much
easier for the user to specify which devices should be protected and
which input data the decision should be based upon in case the system
happens to have access to more than one accelerometer. Right now, I don't
feel quite up to the job to write a dedicated kernel module that
replaces the daemon and is designed in a sufficiently generic way to
cope with all sorts of weird system configurations. Since I wouldn't
even know where to start, someone would have to point me in the right
direction first and probably have a lot of patience with me and my
questions in the process.

>
>> state. To this end, the kernel has to issue an idle immediate command
>> with unload feature and stop the block layer queue afterwards. Once the
>
> Yep. Pity the worst case completion time for an IDE I/O is 60 seconds or
> so.

Well, the low level driver would have to make sure that no requests are
accepted after the idle immediate command has been received. The block
layer queue is stopped later merely to stop the request_fn() to be
called for the time that lld won't accept any requests anyway. See
further comments below.

>
>> 1. Who is to be in charge for the shock protection application? Should
>>    userspace speak to libata / ide directly (through sysfs) and the low
>
> I think it has to be kernel side for speed, and because you will need to
> issue idle immediate while a command sequence is active which is
> *extremely* hairy as you have to recover from the mess and restart the
> relevant I/O. Plus you may need controller specific knowledge on issuing
> it (and changes to libata).

As indicated above, I'd appreciate it if you could explain in a bit more
detail why it is not enough to let the kernel take care of just the
actual disk parking. It really is perfectly possible that I miss
something obvious here, so please bare with me.

Let me also make quite clear what exactly I intend to keep in kernel
space and what the daemon is supposed to be doing. When the daemon
decides that we had better stop all I/O to the disk, it writes an
integer to a sysfs attribute specifying the number of seconds it expects
the disk to be kept in the safe mode for. From there on everything is
going to be handled in kernel space, i.e., issuing idle immediate while
making sure that no other command gets issued to the hardware after that
and freezing the block layer queue eventually in order to stop
the request_fn() from being called needlessly. Once the specified time
is up or if the daemon writes 0 to that sysfs attrribute before that
time, it is kernel space code again that takes care that normal
operation is resumed.

>
>> 2. Depending on the answer to the previous question, by what mechanism
>>    should block layer and lld interact? Special requests, queue hooks or
>>    something in some way similar to power management functions (once
>>    suggested by James Bottomley)?
>
> Idle immediate seem to simply fit the queue model, it happens in
> *parallel* to I/O events and is special in all sorts of ways.

Well, this is something we'll have to discuss too since I don't have the
SATA specs and haven't a clue as to how idle immediate behaves in an NCQ
enabled system. However, my question was about something more basic than
that, namely, what should be handled by the block layer and what by the
libata / ide subsystem and how they should interact with each other.
But never mind that now because I have had some ideas since and will
come up with a patch series once the other issues have been settled, so
we can have a more hands on discussion about this particular problem
then.

>
>> 3. What is the preferred way to pass device specific configuration
>>    options to libata (preferrably at runtime, i.e., after module
>>    loading)?
>
> sysfs

Yes, I thought as much. I just haven't quite worked out yet where or how
I am supposed to introduce libata specific sysfs attributes since this
seems to be left to the scsi midlayer so far.

Regards,

Elias

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-26 20:47 ` Willy Tarreau
@ 2008-02-28 10:10   ` Elias Oltmanns
  0 siblings, 0 replies; 25+ messages in thread
From: Elias Oltmanns @ 2008-02-28 10:10 UTC (permalink / raw)
  To: linux-ide; +Cc: linux-kernel, Jens Axboe

Willy Tarreau <w@1wt.eu> wrote:
> Hi Elias,

Hi Willy,

>
> On Tue, Feb 26, 2008 at 12:56:31AM +0100, Elias Oltmanns wrote:
>
> [ very interesting project ]
>
>> Probably, the major problem is that I don't really know what kind of
>> applications (apart from shock protection) I should be thinking of that
>> might want to have a queue freezing facility at hand.
>
> In terms of applications, depending on the sensitivity of the accelerometer,
> we can imagine that a laptop would immediately force unmount crypted
> filesystems if it believes it's being stolen, for instance. It's just a
> random idea that comes to my mind, in the hope it may help you imagine
> some crazy usages.

Well, this application would use the same input data (acceleromtere) but
it would certainly not require a generic queue freezing facility.

> But generally you should not fool your mind with too many hypothetical
> cases, ideas will come once you provide a smart interface and this
> interface will evolve with future needs.
>
> Concerning your daemon, I think that every millisecond counts when a
> laptop falls on the floor. So I think that running it in the kernel
> should help you gain those precious milliseconds.

The idle immediate command itself may need up to 300 milliseconds to
complete according to the ATA standard. This seems like a very long time
compared to CPU standards, i.e., the time usually needed to serve a
lightweight daemon.

> I doubt your daemon could trigger fast enough while X is starting or
> during some activities which require a lot of CPU or uninterruptible
> I/O.

On my system the daemon's response *feels* just fine even while
compiling a kernel; I haven't done any measurements or benchmarks
though. The thing I'm most concerned about is uninterruptible I/O but
I'm not quite sure whether and how this can be addressed in kernel
space.

Regards,

Elias

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-28  8:24   ` Elias Oltmanns
@ 2008-02-28 11:13     ` Alan Cox
  2008-02-24 18:03       ` Pavel Machek
                         ` (2 more replies)
  0 siblings, 3 replies; 25+ messages in thread
From: Alan Cox @ 2008-02-28 11:13 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-ide, linux-kernel, Jens Axboe

> > That sounds like a non starter. What if the box is busy, what if the
> > daemon or something you touch needs memory and causes paging ?
> 
> The daemon runs mlock'd anyway, so there won't be any need for paging

mlock does not guarantee anything of that form. A syscall by an mlocked
process which causes a memory allocation can cause paging of another
process on the system.

> there. As for responsiveness under heavy load, I'm not quite sure I get
> your meaning. On my system, at least, the only way I have managed to
> decrease responsiveness noticeably is to cause a lot of I/O operations

It depends a lot on hardware but you can certainly get user space delays
in seconds as an extreme worst case.

> stays in memory all the time, it can go ahead and notify the kernel that
> the disk heads should be unloaded. The kernel takes care to insert the
> idle immediate command at the head of the queue. Am I missing something?

Yes - the fact we may well have bounced off the floor already.

> happens to have access to more than one accelerometer. Right now, I don't
> feel quite up to the job to write a dedicated kernel module that
> replaces the daemon and is designed in a sufficiently generic way to

Thats fine - nothing says a user space daemon isn't a good starting point.

> > Yep. Pity the worst case completion time for an IDE I/O is 60 seconds or
> > so.
> 
> Well, the low level driver would have to make sure that no requests are
> accepted after the idle immediate command has been received. The block

No doesn't work like that. The command currently being processed on IDE
can take up to 60 seconds to complete. Idle immediate (on the devices it
works for - it hangs some) is very special in that it can be used in some
cases to interrupt a running command sequence. It requires a significant
amount of work in the driver layer to then clean up and requeue the
partial command and to know if it is possible to do so.

> and freezing the block layer queue eventually in order to stop
> the request_fn() from being called needlessly. Once the specified time
> is up or if the daemon writes 0 to that sysfs attrribute before that
> time, it is kernel space code again that takes care that normal
> operation is resumed.

I think we have three things here

1.	A general queue freeze scheme from user space
2.	A general implementation of a queue freeze that stops further
command issuing while the queue is blocked
3.	The ability for devices to provide a function to be called
when a queue freeze is done (ie idle immediate and the like)

The fine details of how you abort an ATA command don't actually matter
for an initial implementation and can be written once the core stuff is
right.

> Well, this is something we'll have to discuss too since I don't have the
> SATA specs and haven't a clue as to how idle immediate behaves in an NCQ
> enabled system. However, my question was about something more basic than

I have the specs, and I don't understand it or even if it is valid to do
so. Some research (as in trying it and seeing) may be needed.

> Yes, I thought as much. I just haven't quite worked out yet where or how
> I am supposed to introduce libata specific sysfs attributes since this
> seems to be left to the scsi midlayer so far.

The scsi midlayer is the main manager of queues so that seems sane - and
if you've got the basic queue freeze logic right then one assumes it will
work for scsi too.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-28 11:13     ` Alan Cox
  2008-02-24 18:03       ` Pavel Machek
@ 2008-02-28 17:00       ` Greg Freemyer
  2008-03-07 18:03       ` Elias Oltmanns
  2 siblings, 0 replies; 25+ messages in thread
From: Greg Freemyer @ 2008-02-28 17:00 UTC (permalink / raw)
  To: Alan Cox; +Cc: Elias Oltmanns, linux-ide, linux-kernel, Jens Axboe, lmb

On Thu, Feb 28, 2008 at 6:13 AM, Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
> > > That sounds like a non starter. What if the box is busy, what if the
>  > > daemon or something you touch needs memory and causes paging ?
>  >
>  > The daemon runs mlock'd anyway, so there won't be any need for paging
>
>  mlock does not guarantee anything of that form. A syscall by an mlocked
>  process which causes a memory allocation can cause paging of another
>  process on the system.
>
>
>  > there. As for responsiveness under heavy load, I'm not quite sure I get
>  > your meaning. On my system, at least, the only way I have managed to
>  > decrease responsiveness noticeably is to cause a lot of I/O operations
>
>  It depends a lot on hardware but you can certainly get user space delays
>  in seconds as an extreme worst case.

I don't know the details, but I believe the Linux-HA heartbeat daemons
take significant effort to eliminate unexpected delays.  See
http://www.linux-ha.org/

Lars Marowsky-Bree of Novell is extremely involved in the project and
he at least occasionally posts on LKML.  I've cc'ed him.

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-02-28 11:13     ` Alan Cox
  2008-02-24 18:03       ` Pavel Machek
  2008-02-28 17:00       ` Greg Freemyer
@ 2008-03-07 18:03       ` Elias Oltmanns
  2008-03-07 18:25         ` [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata Elias Oltmanns
                           ` (5 more replies)
  2 siblings, 6 replies; 25+ messages in thread
From: Elias Oltmanns @ 2008-03-07 18:03 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel, Jens Axboe

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:
[...]
>> there. As for responsiveness under heavy load, I'm not quite sure I get
>> your meaning. On my system, at least, the only way I have managed to
>> decrease responsiveness noticeably is to cause a lot of I/O operations
>
> It depends a lot on hardware but you can certainly get user space delays
> in seconds as an extreme worst case.
>
>> stays in memory all the time, it can go ahead and notify the kernel that
>> the disk heads should be unloaded. The kernel takes care to insert the
>> idle immediate command at the head of the queue. Am I missing something?
>
> Yes - the fact we may well have bounced off the floor already.

Well, with or without shock protection it can't get any worse by then,
can it? But in all those cases where the system manages to get the heads
of the platter in time, the owner may be greatful for this feature.

>
>> happens to have access to more than one accelerometer. Right now, I don't
>> feel quite up to the job to write a dedicated kernel module that
>> replaces the daemon and is designed in a sufficiently generic way to
>
> Thats fine - nothing says a user space daemon isn't a good starting point.

A starting point it is then.

>
>> > Yep. Pity the worst case completion time for an IDE I/O is 60 seconds or
>> > so.
>> 
>> Well, the low level driver would have to make sure that no requests are
>> accepted after the idle immediate command has been received. The block
>
> No doesn't work like that. The command currently being processed on IDE
> can take up to 60 seconds to complete. Idle immediate (on the devices it
> works for - it hangs some) is very special in that it can be used in some
> cases to interrupt a running command sequence. It requires a significant
> amount of work in the driver layer to then clean up and requeue the
> partial command and to know if it is possible to do so.

This business of aborting commands is exactly what I haven't a clue
about. At first I thought I could do something similar to
ata_do_link_abort but I obviously want to avoid the need for a soft
reset before issuing idle immediate. How am I to go about it?

>
>> and freezing the block layer queue eventually in order to stop
>> the request_fn() from being called needlessly. Once the specified time
>> is up or if the daemon writes 0 to that sysfs attrribute before that
>> time, it is kernel space code again that takes care that normal
>> operation is resumed.
>
> I think we have three things here
>
> 1.	A general queue freeze scheme from user space
> 2.	A general implementation of a queue freeze that stops further
> command issuing while the queue is blocked
> 3.	The ability for devices to provide a function to be called
> when a queue freeze is done (ie idle immediate and the like)
>
> The fine details of how you abort an ATA command don't actually matter
> for an initial implementation and can be written once the core stuff is
> right.
>
[...]
>> Yes, I thought as much. I just haven't quite worked out yet where or how
>> I am supposed to introduce libata specific sysfs attributes since this
>> seems to be left to the scsi midlayer so far.
>
> The scsi midlayer is the main manager of queues so that seems sane - and
> if you've got the basic queue freeze logic right then one assumes it will
> work for scsi too.

Basic queue freezing certainly will. But we'll need attributes specific
to ATA so the user can determine whether
1. idle immediate should be issued (if supported) on queue freeze
   events;
2. idle immediate is supported on this particular device even though
   dev->id doesn't say so;
3. idle immediate is malfunctioning and should be avoided even though
   dev->id reports support for that feature;
4. (perhaps we should drop that): use standby immediate if idle
   immediate isn't supported for some reason.

I'm going to send a first draft of a patch series in reply to this
email. It is a stripped down version intended to get the general idea
across. The first of these four patches will eventually need to be
modified to actually abort in flight commands and clear up the mess
afterwards. However, first and foremost I'd like to draw your attention
to the use of REQ_TYPE_LINUX_BLOCK requests as demonstrated in the other
three patches. The question is whether the underlying concept is right.
Although the question how to handle REQ_TYPE_LINUX_BLOCK requests in the
scsi subsystem has been raised on the linux-scsi ml, it has never been
answered really because this request type was deemed unsuitable for the
application in question. See, for instance, the thread starting at [1].
My patch approach has been partly inspired by the patch discussed there.
Before I raise this issue yet again, I'd like to know whether
REQ_TYPE_LINUX_BLOCK is the right choice for my application in your
opinion or whether another mechanism might be more suitable as James
suggested a while ago (see [2]).

Regards,

Elias

[1] http://permalink.gmane.org/gmane.linux.scsi/30049
[2] http://permalink.gmane.org/gmane.linux.scsi/37951

^ permalink raw reply	[flat|nested] 25+ messages in thread

* [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata
  2008-03-07 18:03       ` Elias Oltmanns
@ 2008-03-07 18:25         ` Elias Oltmanns
  2008-03-15 12:39           ` Pavel Machek
  2008-03-20 14:13           ` Alan Cox
  2008-03-07 18:25         ` [PATCH 2/4] disk-protect: SCSI support for REQ_TYPE_LINUX_BLOCK requests Elias Oltmanns
                           ` (4 subsequent siblings)
  5 siblings, 2 replies; 25+ messages in thread
From: Elias Oltmanns @ 2008-03-07 18:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel, Jens Axboe

This patch adds some helper functions to libata in order to provide low
level suport for disk shock protection. The user interface to this
functionality will be implemented in the block layer in a subsequent patch.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ata/libata-core.c |   69 +++++++++++++++++++++++++++++++++++++++++++++
 drivers/ata/libata-eh.c   |    9 ++++++
 drivers/ata/libata-scsi.c |   31 ++++++++++++++++++++
 drivers/ata/libata.h      |    1 +
 include/linux/ata.h       |   12 ++++++++
 include/linux/libata.h    |    4 ++-
 6 files changed, 125 insertions(+), 1 deletions(-)

diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6380726..74c3e7e 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6634,6 +6634,75 @@ int ata_port_start(struct ata_port *ap)
 }
 
 /**
+ *	ata_protect_dev - Stop I/O to device and unload disk heads
+ *	@dev: Device to be protected
+ *
+ *	Issue an IDLE IMMEDIATE command with UNLOAD FEATURE.
+ *
+ *	LOCKING:
+ *	Kernel thread context (may sleep)
+ *
+ *	RETURNS:
+ *	0 on success, -EIO on failure.
+ */
+int ata_protect_dev(struct ata_device *dev)
+{
+	struct ata_port *ap = dev->link->ap;
+	struct ata_taskfile tf;
+	unsigned long flags;
+	unsigned int err_mask;
+	int unload = ata_id_has_unload(dev->id);
+	/* We need a mechanism to set unload to 1 for devices that
+	 * support the unload feature of idle immediate but don't
+	 * report that capability in dev->id. Is dev->horkage or
+	 * dev->flags the right place for such a flag?
+	 */
+
+	spin_lock_irqsave(ap->lock, flags);
+	if (unlikely(!(dev->flags & ATA_DFLAG_PROTECTED))) {
+		spin_unlock_irqrestore(ap->lock, flags);
+		goto out;
+	}
+	spin_unlock_irqrestore(ap->lock, flags);
+
+	ata_tf_init(dev, &tf);
+	tf.flags |= ATA_TFLAG_DEVICE | ATA_TFLAG_ISADDR;
+	tf.protocol |= ATA_PROT_NODATA;
+	if (unload) {
+		tf.command = ATA_CMD_IDLEIMMEDIATE;
+		tf.feature = 0x44;
+		tf.lbal = 0x4c;
+		tf.lbam = 0x4e;
+		tf.lbah = 0x55;
+	} else
+		tf.command = ATA_CMD_STANDBYNOW1;
+
+	err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
+	if (err_mask)
+		goto abort;
+
+	if (unload) {
+		if (tf.lbal == 0xc4)
+			ata_dev_printk(dev, KERN_DEBUG,
+				       "head parked\n");
+		else
+			goto abort;
+	} else
+		ata_dev_printk(dev, KERN_DEBUG,
+			       "head park not requested, used standby\n");
+
+out:
+	return 0;
+
+abort:
+	ata_dev_printk(dev, KERN_DEBUG, "head NOT parked\n");
+	spin_lock_irqsave(ap->lock, flags);
+	dev->flags &= ~ATA_DFLAG_PROTECTED;
+	spin_unlock_irqrestore(ap->lock, flags);
+	return -EIO;
+}
+
+/**
  *	ata_dev_init - Initialize an ata_device structure
  *	@dev: Device structure to initialize
  *
diff --git a/drivers/ata/libata-eh.c b/drivers/ata/libata-eh.c
index 21a81cd..edb92dc 100644
--- a/drivers/ata/libata-eh.c
+++ b/drivers/ata/libata-eh.c
@@ -2584,6 +2584,15 @@ int ata_eh_recover(struct ata_port *ap, ata_prereset_fn_t prereset,
 			ata_link_for_each_dev(dev, link)
 				ata_dev_enable_pm(dev, ap->pm_policy);
 
+		ata_link_for_each_dev(dev, link)
+			if (ehc->i.dev_action[dev->devno] & ATA_EH_PROTECT) {
+				ata_eh_about_to_do(link, dev, ATA_EH_PROTECT);
+				rc = ata_protect_dev(dev);
+				if (rc)
+					goto dev_fail;
+				ata_eh_done(link, dev, ATA_EH_PROTECT);
+			}
+
 		/* this link is okay now */
 		ehc->i.flags = 0;
 		continue;
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index 14daf48..d0fd762 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -856,6 +856,34 @@ static void ata_scsi_dev_config(struct scsi_device *sdev,
 	}
 }
 
+static void ata_scsi_protect_dev(struct ata_device *dev)
+{
+	struct ata_link *link = dev->link;
+	struct ata_eh_info *ehi = &link->eh_info;
+
+	if (dev->flags & ATA_DFLAG_PROTECTED)
+		return;
+
+	dev->flags |= ATA_DFLAG_PROTECTED;
+	ehi->dev_action[dev->devno] |= ATA_EH_PROTECT;
+	ehi->flags |= ATA_EHI_QUIET;
+	ata_port_schedule_eh(link->ap);
+}
+
+static void ata_scsi_unprotect_dev(struct ata_device *dev)
+{
+	struct ata_link *link = dev->link;
+	struct ata_eh_info *ehi = &link->eh_info;
+
+	if (!(dev->flags & ATA_DFLAG_PROTECTED))
+		return;
+	dev->flags &= ~ATA_DFLAG_PROTECTED;
+
+	ehi->dev_action[dev->devno] |= ATA_EH_REVALIDATE;
+	ehi->flags |= ATA_EHI_QUIET;
+	ata_port_schedule_eh(link->ap);
+}
+
 /**
  *	ata_scsi_slave_config - Set SCSI device attributes
  *	@sdev: SCSI device to examine
@@ -1517,6 +1545,9 @@ static int ata_scsi_translate(struct ata_device *dev, struct scsi_cmnd *cmd,
 
 	VPRINTK("ENTER\n");
 
+	if (unlikely(dev->flags & ATA_DFLAG_PROTECTED))
+		return SCSI_MLQUEUE_DEVICE_BUSY;
+
 	qc = ata_scsi_qc_new(dev, cmd, done);
 	if (!qc)
 		goto err_mem;
diff --git a/drivers/ata/libata.h b/drivers/ata/libata.h
index bbe59c2..807cf2f 100644
--- a/drivers/ata/libata.h
+++ b/drivers/ata/libata.h
@@ -95,6 +95,7 @@ extern void ata_dev_select(struct ata_port *ap, unsigned int device,
                            unsigned int wait, unsigned int can_sleep);
 extern void swap_buf_le16(u16 *buf, unsigned int buf_words);
 extern int ata_flush_cache(struct ata_device *dev);
+extern int ata_protect_dev(struct ata_device *dev);
 extern void ata_dev_init(struct ata_device *dev);
 extern void ata_link_init(struct ata_port *ap, struct ata_link *link, int pmp);
 extern int sata_link_init_spd(struct ata_link *link);
diff --git a/include/linux/ata.h b/include/linux/ata.h
index e672e80..aa16cbb 100644
--- a/include/linux/ata.h
+++ b/include/linux/ata.h
@@ -395,6 +395,18 @@ struct ata_taskfile {
 
 #define ata_id_cdb_intr(id)	(((id)[0] & 0x60) == 0x20)
 
+static inline int ata_id_has_unload(const u16 *id)
+{
+	/* ATA-7 specifies two places to indicate unload feature support.
+	 * Since I don't really understand the difference, I'll just check
+	 * both and only return zero if none of them indicates otherwise. */
+	if ((id[84] & 0xC000) == 0x4000 && id[84] & (1 << 13))
+		return id[84] & (1 << 13);
+	if ((id[87] & 0xC000) == 0x4000)
+		return id[87] & (1 << 13);
+	return 0;
+}
+
 static inline bool ata_id_has_hipm(const u16 *id)
 {
 	u16 val = id[76];
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 124033c..2db23c3 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -147,6 +147,7 @@ enum {
 
 	ATA_DFLAG_DETACH	= (1 << 16),
 	ATA_DFLAG_DETACHED	= (1 << 17),
+	ATA_DFLAG_PROTECTED	= (1 << 18),
 
 	ATA_DEV_UNKNOWN		= 0,	/* unknown device */
 	ATA_DEV_ATA		= 1,	/* ATA device */
@@ -299,9 +300,10 @@ enum {
 	ATA_EH_SOFTRESET	= (1 << 1),
 	ATA_EH_HARDRESET	= (1 << 2),
 	ATA_EH_ENABLE_LINK	= (1 << 3),
+	ATA_EH_PROTECT		= (1 << 4),
 
 	ATA_EH_RESET_MASK	= ATA_EH_SOFTRESET | ATA_EH_HARDRESET,
-	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE,
+	ATA_EH_PERDEV_MASK	= ATA_EH_REVALIDATE | ATA_EH_PROTECT,
 
 	/* ata_eh_info->flags */
 	ATA_EHI_HOTPLUGGED	= (1 << 0),  /* could have been hotplugged */



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 2/4] disk-protect: SCSI support for REQ_TYPE_LINUX_BLOCK requests
  2008-03-07 18:03       ` Elias Oltmanns
  2008-03-07 18:25         ` [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata Elias Oltmanns
@ 2008-03-07 18:25         ` Elias Oltmanns
  2008-03-07 18:26         ` [PATCH 3/4] disk-protect: Add a REQ_TYPE_LINUX_BLOCK request handler to libata Elias Oltmanns
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 25+ messages in thread
From: Elias Oltmanns @ 2008-03-07 18:25 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel, Jens Axboe

Add the necessary infrastructure to the SCSI midlayer so it can handle
REQ_TYPE_LINUX_BLOCK requests. It is rather simple at this stage and may
have to be adapted as new REQ_LB_OP* commands are introduced.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/scsi/scsi_lib.c  |   56 ++++++++++++++++++++++++++++++++++++++++++++--
 drivers/scsi/sd.c        |    9 +++++++
 include/linux/blkdev.h   |    1 +
 include/scsi/scsi_host.h |   22 ++++++++++++++++++
 4 files changed, 86 insertions(+), 2 deletions(-)

diff --git a/drivers/scsi/scsi_lib.c b/drivers/scsi/scsi_lib.c
index a9ac5b1..c25201b 100644
--- a/drivers/scsi/scsi_lib.c
+++ b/drivers/scsi/scsi_lib.c
@@ -1448,12 +1448,30 @@ static void scsi_kill_request(struct request *req, struct request_queue *q)
 	__scsi_done(cmd);
 }
 
+static void scsi_finish_lb_req(struct request *req)
+{
+	struct request_queue *q = req->q;
+	struct scsi_device *sdev = q->queuedata;
+	unsigned long flags;
+
+	spin_lock_irqsave(q->queue_lock, flags);
+	end_that_request_last(req, 1);
+	spin_unlock_irqrestore(q->queue_lock, flags);
+	put_device(&sdev->sdev_gendev);
+}
+
 static void scsi_softirq_done(struct request *rq)
 {
 	struct scsi_cmnd *cmd = rq->completion_data;
-	unsigned long wait_for = (cmd->allowed + 1) * cmd->timeout_per_command;
+	unsigned long wait_for;
 	int disposition;
 
+	if (blk_lb_request(rq)) {
+		scsi_finish_lb_req(rq);
+		return;
+	}
+
+	wait_for = (cmd->allowed + 1) * cmd->timeout_per_command;
 	INIT_LIST_HEAD(&cmd->eh_entry);
 
 	disposition = scsi_decide_disposition(cmd);
@@ -1483,6 +1501,24 @@ static void scsi_softirq_done(struct request *rq)
 	}
 }
 
+static void scsi_exec_lb_req(struct request *req)
+{
+	struct scsi_device *sdev = req->q->queuedata;
+	struct scsi_host_template *shostt = sdev->host->hostt;
+	int rc;
+
+	if (shostt->lb_request_fn)
+		rc = shostt->lb_request_fn(req);
+	else
+		rc = FAILED;
+
+	if (rc == FAILED)
+		req->errors = -EIO;
+	else if (rc == QUEUED)
+		return;
+	blk_complete_request(req);
+}
+
 /*
  * Function:    scsi_request_fn()
  *
@@ -1525,7 +1561,23 @@ static void scsi_request_fn(struct request_queue *q)
 		 * accept it.
 		 */
 		req = elv_next_request(q);
-		if (!req || !scsi_dev_queue_ready(q, sdev))
+		if (!req)
+			break;
+
+		/*
+		 * We do not account for linux blk req in the device
+		 * or host busy accounting because it is not necessarily
+		 * a scsi command that is sent to some object. The lower
+		 * level can translate it into a request/scsi_cmnd, if
+		 * necessary, and then queue that up using REQ_TYPE_BLOCK_PC.
+		 */
+		if (blk_lb_request(req)) {
+			blkdev_dequeue_request(req);
+			scsi_exec_lb_req(req);
+			continue;
+		}
+
+		if (!scsi_dev_queue_ready(q, sdev))
 			break;
 
 		if (unlikely(!scsi_device_online(sdev))) {
diff --git a/drivers/scsi/sd.c b/drivers/scsi/sd.c
index cfd859a..1be9821 100644
--- a/drivers/scsi/sd.c
+++ b/drivers/scsi/sd.c
@@ -357,6 +357,15 @@ static int sd_prep_fn(struct request_queue *q, struct request *rq)
 	if (rq->cmd_type == REQ_TYPE_BLOCK_PC) {
 		ret = scsi_setup_blk_pc_cmnd(sdp, rq);
 		goto out;
+	} else if (rq->cmd_type == REQ_TYPE_LINUX_BLOCK) {
+		get_device(&sdp->sdev_gendev);
+		/*
+		 * Since these requests don't need preparation, we'll
+		 * basically just accept them unconditionally at this
+		 * point.
+		 */
+		ret = BLKPREP_OK;
+		goto out;
 	} else if (rq->cmd_type != REQ_TYPE_FS) {
 		ret = BLKPREP_KILL;
 		goto out;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index d18ee67..5955b57 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -521,6 +521,7 @@ enum {
 
 #define blk_fs_request(rq)	((rq)->cmd_type == REQ_TYPE_FS)
 #define blk_pc_request(rq)	((rq)->cmd_type == REQ_TYPE_BLOCK_PC)
+#define blk_lb_request(rq)	((rq)->cmd_type == REQ_TYPE_LINUX_BLOCK)
 #define blk_special_request(rq)	((rq)->cmd_type == REQ_TYPE_SPECIAL)
 #define blk_sense_request(rq)	((rq)->cmd_type == REQ_TYPE_SENSE)
 
diff --git a/include/scsi/scsi_host.h b/include/scsi/scsi_host.h
index 0fd4746..c2946d0 100644
--- a/include/scsi/scsi_host.h
+++ b/include/scsi/scsi_host.h
@@ -8,6 +8,7 @@
 #include <linux/mutex.h>
 
 struct request_queue;
+struct request;
 struct block_device;
 struct completion;
 struct module;
@@ -152,6 +153,27 @@ struct scsi_host_template {
 				  void (*done)(struct scsi_cmnd *));
 
 	/*
+	 * The lb_request_fn function is used to pass
+	 * REQ_TYPE_LINUX_BLOCK requests to the LLDD. The return value
+	 * FAILED indicates that the command opcode has not been known
+	 * by lb_request_fn. In contrast, the return value SUCCESS
+	 * means that the opcode has been recognised and the request
+	 * has been processed accordingly. Note, however, that SUCCESS
+	 * does not necessarily mean that all actions have been
+	 * performed successfully; errors are recorded in req->errors.
+	 * lb_request_fn can also return QUEUED in order to prevent
+	 * midlayer from enqueueng the request for completion.
+	 * Obviously, the LLDD must take care that the request will be
+	 * completed by means of blk_complete_request eventually.
+	 *
+	 * Status: OPTIONAL
+	 */
+	/* TODO: We might need to accept a return value NEEDS_RETRY some
+	 * time.
+	 */
+	int (* lb_request_fn)(struct request *req);
+
+	/*
 	 * This is an error handling strategy routine.  You don't need to
 	 * define one of these if you don't want to - there is a default
 	 * routine that is present that should work in most cases.  For those



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 3/4] disk-protect: Add a REQ_TYPE_LINUX_BLOCK request handler to libata
  2008-03-07 18:03       ` Elias Oltmanns
  2008-03-07 18:25         ` [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata Elias Oltmanns
  2008-03-07 18:25         ` [PATCH 2/4] disk-protect: SCSI support for REQ_TYPE_LINUX_BLOCK requests Elias Oltmanns
@ 2008-03-07 18:26         ` Elias Oltmanns
  2008-03-15 12:42           ` Pavel Machek
  2008-03-07 18:26         ` [PATCH 4/4] disk-protect: Add a generic block layer queue freezing facility Elias Oltmanns
                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 25+ messages in thread
From: Elias Oltmanns @ 2008-03-07 18:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel, Jens Axboe

Defines a generic handler to be used as the lb_request_fn() callback for
libata managed hosts. Support for REQ_LB_OP_FREEZE and REQ_LB_OP_THAW is
included as well.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 drivers/ata/ahci.c        |    1 +
 drivers/ata/ata_piix.c    |    1 +
 drivers/ata/libata-scsi.c |   53 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h    |    2 ++
 include/linux/libata.h    |    1 +
 5 files changed, 58 insertions(+), 0 deletions(-)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 54f38c2..324c4fa 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -273,6 +273,7 @@ static struct scsi_host_template ahci_sht = {
 	.name			= DRV_NAME,
 	.ioctl			= ata_scsi_ioctl,
 	.queuecommand		= ata_scsi_queuecmd,
+	.lb_request_fn		= ata_scsi_lb_request_fn,
 	.change_queue_depth	= ata_scsi_change_queue_depth,
 	.can_queue		= AHCI_MAX_CMDS - 1,
 	.this_id		= ATA_SHT_THIS_ID,
diff --git a/drivers/ata/ata_piix.c b/drivers/ata/ata_piix.c
index b406b39..c2b7ad7 100644
--- a/drivers/ata/ata_piix.c
+++ b/drivers/ata/ata_piix.c
@@ -283,6 +283,7 @@ static struct scsi_host_template piix_sht = {
 	.name			= DRV_NAME,
 	.ioctl			= ata_scsi_ioctl,
 	.queuecommand		= ata_scsi_queuecmd,
+	.lb_request_fn		= ata_scsi_lb_request_fn,
 	.can_queue		= ATA_DEF_QUEUE,
 	.this_id		= ATA_SHT_THIS_ID,
 	.sg_tablesize		= LIBATA_MAX_PRD,
diff --git a/drivers/ata/libata-scsi.c b/drivers/ata/libata-scsi.c
index d0fd762..df25aa4 100644
--- a/drivers/ata/libata-scsi.c
+++ b/drivers/ata/libata-scsi.c
@@ -3123,6 +3123,59 @@ void ata_scsi_simulate(struct ata_device *dev, struct scsi_cmnd *cmd,
 	}
 }
 
+/**
+ *	ata_scsi_lb_request_fn - process block request for libata-managed device
+ *	@req: REQ_TYPE_LINUX_BLOCK request to be processed
+ *
+ *	This function is the default handler for REQ_TYPE_LINUX_BLOCK
+ *	requests in libata.
+ *
+ *	LOCKING:
+ *	Releases queue lock, and obtains host lock.
+ *
+ *	RETURNS:
+ *	SUCCESS if opcode is known to us, FAILED otherwise.
+ */
+
+int ata_scsi_lb_request_fn(struct request *req)
+{
+	struct request_queue *q = req->q;
+	struct scsi_device *sdev = q->queuedata;
+	struct Scsi_Host *shost = sdev->host;
+	struct ata_port *ap = ata_shost_to_port(shost);
+	struct ata_device *dev;
+	int rc = SUCCESS;
+
+	spin_unlock(q->queue_lock);
+	spin_lock(ap->lock);
+
+	dev = ata_scsi_find_dev(ap, sdev);
+	if (unlikely(!dev)) {
+		req->errors = -ENXIO;
+		goto out;
+	}
+
+	switch (req->cmd[0]) {
+	case REQ_LB_OP_FREEZE:
+		ata_scsi_protect_dev(dev);
+		break;
+
+	case REQ_LB_OP_THAW:
+		ata_scsi_unprotect_dev(dev);
+		break;
+
+	default:
+		rc = FAILED;
+		goto out;
+	}
+
+out:
+	spin_unlock(ap->lock);
+	spin_lock(q->queue_lock);
+	return rc;
+}
+EXPORT_SYMBOL_GPL(ata_scsi_lb_request_fn);
+
 int ata_scsi_add_hosts(struct ata_host *host, struct scsi_host_template *sht)
 {
 	int i, rc;
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 5955b57..1854a69 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -164,6 +164,8 @@ enum {
 	 */
 	REQ_LB_OP_EJECT	= 0x40,		/* eject request */
 	REQ_LB_OP_FLUSH = 0x41,		/* flush device */
+	REQ_LB_OP_FREEZE = 0x42,	/* freeze queue (protect device) */
+	REQ_LB_OP_THAW	= 0x3,		/* thaw queue */
 };
 
 /*
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 2db23c3..4e97964 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -809,6 +809,7 @@ extern void ata_host_init(struct ata_host *, struct device *,
 extern int ata_scsi_detect(struct scsi_host_template *sht);
 extern int ata_scsi_ioctl(struct scsi_device *dev, int cmd, void __user *arg);
 extern int ata_scsi_queuecmd(struct scsi_cmnd *cmd, void (*done)(struct scsi_cmnd *));
+extern int ata_scsi_lb_request_fn(struct request *req);
 extern void ata_sas_port_destroy(struct ata_port *);
 extern struct ata_port *ata_sas_port_alloc(struct ata_host *,
 					   struct ata_port_info *, struct Scsi_Host *);



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* [PATCH 4/4] disk-protect: Add a generic block layer queue freezing facility
  2008-03-07 18:03       ` Elias Oltmanns
                           ` (2 preceding siblings ...)
  2008-03-07 18:26         ` [PATCH 3/4] disk-protect: Add a REQ_TYPE_LINUX_BLOCK request handler to libata Elias Oltmanns
@ 2008-03-07 18:26         ` Elias Oltmanns
  2008-03-15 12:49           ` Pavel Machek
  2008-03-07 22:43         ` [RFC] Disk shock protection (revisited) Alan Cox
  2008-03-13 14:51         ` Elias Oltmanns
  5 siblings, 1 reply; 25+ messages in thread
From: Elias Oltmanns @ 2008-03-07 18:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel, Jens Axboe

This patch provides a generic way to freeze the request queue of a block
device temporarily. This functionality is exposed to user space via sysfs.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 block/ll_rw_blk.c      |  140 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |    7 ++
 2 files changed, 147 insertions(+), 0 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 8b91994..2626007 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -39,6 +39,8 @@
 
 static void blk_unplug_work(struct work_struct *work);
 static void blk_unplug_timeout(unsigned long data);
+static void blk_unfreeze_work(struct work_struct *work);
+static void blk_unfreeze_timeout(unsigned long data);
 static void drive_stat_acct(struct request *rq, int new_io);
 static void init_request_from_bio(struct request *req, struct bio *bio);
 static int __make_request(struct request_queue *q, struct bio *bio);
@@ -231,6 +233,13 @@ void blk_queue_make_request(struct request_queue * q, make_request_fn * mfn)
 	q->unplug_timer.function = blk_unplug_timeout;
 	q->unplug_timer.data = (unsigned long)q;
 
+	q->max_unfreeze = 30;
+
+	INIT_WORK(&q->unfreeze_work, blk_unfreeze_work);
+
+	q->unfreeze_timer.function = blk_unfreeze_timeout;
+	q->unfreeze_timer.data = (unsigned long)q;
+
 	/*
 	 * by default assume old behaviour and bounce for any highmem page
 	 */
@@ -1861,6 +1870,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	}
 
 	init_timer(&q->unplug_timer);
+	init_timer(&q->unfreeze_timer);
 
 	kobject_set_name(&q->kobj, "%s", "queue");
 	q->kobj.ktype = &queue_ktype;
@@ -1975,6 +1985,93 @@ int blk_get_queue(struct request_queue *q)
 
 EXPORT_SYMBOL(blk_get_queue);
 
+static void issue_queue_freeze_cmd(struct request_queue *q, int thaw)
+{
+	struct request *rq;
+	unsigned long flags;
+
+	rq = blk_get_request(q, 0, GFP_NOIO);
+	if (!rq)
+		return;
+	rq->cmd_type = REQ_TYPE_LINUX_BLOCK;
+	rq->cmd_len = 1;
+	rq->cmd_flags |= REQ_NOMERGE | REQ_SOFTBARRIER;
+	if (thaw)
+		rq->cmd[0] = REQ_LB_OP_THAW;
+	else
+		rq->cmd[0] = REQ_LB_OP_FREEZE;
+	rq->rq_disk = NULL;
+	rq->sense = NULL;
+	rq->sense_len = 0;
+	rq->retries = 5;
+	rq->timeout = HZ;
+	spin_lock_irqsave(q->queue_lock, flags);
+	__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
+	if (thaw)
+		blk_start_queue(q);
+	else {
+		blk_stop_queue(q);
+		q->request_fn(q);
+	}
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static void blk_unfreeze_work(struct work_struct *work)
+{
+	struct request_queue *q = container_of(work, struct request_queue,
+					       unfreeze_work);
+	int pending;
+
+	pending = timer_pending(&q->unfreeze_timer);
+	if (!pending)
+		issue_queue_freeze_cmd(q, 1);
+}
+
+static void blk_unfreeze_timeout(unsigned long data)
+{
+	struct request_queue *q = (struct request_queue *) data;
+
+	kblockd_schedule_work(&q->unfreeze_work);
+}
+
+/**
+ * blk_freeze_queue - (un)freeze queue and manage the queue unfreeze timer
+ * @q:		queue to be (un)frozen
+ * @second:	number of seconds to freeze the queue for
+ *
+ * Description: This function (re)sets the unfreeze timer of a request
+ *    queue. When a timer is started / stopped (not just modified),
+ *    then low level drivers are notified about this event.
+ */
+static int blk_freeze_queue(struct request_queue *q, unsigned int seconds)
+{
+	int thaw, rc;
+
+	if (seconds > 0) {
+		if (seconds > q->max_unfreeze)
+			seconds = q->max_unfreeze;
+		/* set / reset the thaw timer */
+		rc = !mod_timer(&q->unfreeze_timer,
+				msecs_to_jiffies(seconds*1000) + jiffies);
+		thaw = 0;
+		if (rc && blk_queue_stopped(q)) {
+			/* Someone else has stopped the queue already.
+			 * Best leave it alone.
+			 */
+			del_timer(&q->unfreeze_timer);
+			rc = 0;
+		}
+	} else {
+		rc = del_timer(&q->unfreeze_timer);
+		thaw = 1;
+	}
+
+	if (rc)
+		issue_queue_freeze_cmd(q, thaw);
+
+	return rc;
+}
+
 static inline void blk_free_request(struct request_queue *q, struct request *rq)
 {
 	if (rq->cmd_flags & REQ_ELVPRIV)
@@ -4080,6 +4177,42 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
 	return queue_var_show(max_hw_sectors_kb, (page));
 }
 
+/*
+ * When reading the 'protect' attribute, we return seconds remaining
+ * before the unfreeze timeout expires.
+ */
+static ssize_t queue_protect_show(struct request_queue *q, char *page)
+{
+	unsigned int seconds = 0;
+
+	if (blk_queue_stopped(q) && timer_pending(&q->unfreeze_timer))
+		/*
+		 * Adding 1 in order to guarantee nonzero value until timer
+		 * has actually expired.
+		 */
+		seconds = jiffies_to_msecs(q->unfreeze_timer.expires
+					   - jiffies) / 1000 + 1;
+	return queue_var_show(seconds, (page));
+}
+
+/*
+ * When writing to the 'protect' attribute, input is the number of
+ * seconds to freeze the queue for. We call a helper function to park
+ * the heads and freeze/block the queue, then we make a block layer
+ * call to setup the thaw timeout. If input is 0, then the queue will
+ * be thawed by that helper function immediately.
+ */
+static ssize_t queue_protect_store(struct request_queue *q,
+				   const char *page, size_t count)
+{
+	unsigned long seconds;
+
+	queue_var_store(&seconds, page, count);
+	blk_freeze_queue(q, seconds);
+
+	return count;
+}
+
 
 static struct queue_sysfs_entry queue_requests_entry = {
 	.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
@@ -4110,12 +4243,19 @@ static struct queue_sysfs_entry queue_iosched_entry = {
 	.store = elv_iosched_store,
 };
 
+static struct queue_sysfs_entry queue_protect_entry = {
+	.attr = { .name = "protect", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_protect_show,
+	.store = queue_protect_store,
+};
+
 static struct attribute *default_attrs[] = {
 	&queue_requests_entry.attr,
 	&queue_ra_entry.attr,
 	&queue_max_hw_sectors_entry.attr,
 	&queue_max_sectors_entry.attr,
 	&queue_iosched_entry.attr,
+	&queue_protect_entry.attr,
 	NULL,
 };
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1854a69..082e2d3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -385,6 +385,13 @@ struct request_queue
 	unsigned long		unplug_delay;	/* After this many jiffies */
 	struct work_struct	unplug_work;
 
+	/*
+	 * Auto-unfreeze state
+	 */
+	struct timer_list	unfreeze_timer;
+	int			max_unfreeze;	/* At most this many seconds */
+	struct work_struct	unfreeze_work;
+
 	struct backing_dev_info	backing_dev_info;
 
 	/*



^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-03-07 18:03       ` Elias Oltmanns
                           ` (3 preceding siblings ...)
  2008-03-07 18:26         ` [PATCH 4/4] disk-protect: Add a generic block layer queue freezing facility Elias Oltmanns
@ 2008-03-07 22:43         ` Alan Cox
  2008-03-13 14:51         ` Elias Oltmanns
  5 siblings, 0 replies; 25+ messages in thread
From: Alan Cox @ 2008-03-07 22:43 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-ide, linux-kernel, Jens Axboe

> This business of aborting commands is exactly what I haven't a clue
> about. At first I thought I could do something similar to
> ata_do_link_abort but I obviously want to avoid the need for a soft
> reset before issuing idle immediate. How am I to go about it?

See the ATA 7 specification, and then stare at libata, and then stress a
lot and realise it is going to be very hard.

Seriously - get the queue freezing stuff working first and then I'm sure
one of us libata folk who actually have the misfortune to read these
specs regularly will add in the details like that.


^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-03-07 18:03       ` Elias Oltmanns
                           ` (4 preceding siblings ...)
  2008-03-07 22:43         ` [RFC] Disk shock protection (revisited) Alan Cox
@ 2008-03-13 14:51         ` Elias Oltmanns
  2008-03-15 14:30           ` Alan Cox
  5 siblings, 1 reply; 25+ messages in thread
From: Elias Oltmanns @ 2008-03-13 14:51 UTC (permalink / raw)
  To: Alan Cox; +Cc: linux-ide, linux-kernel, Jens Axboe

Elias Oltmanns <eo@nebensachen.de> wrote:
[...]
> I'm going to send a first draft of a patch series in reply to this
> email. It is a stripped down version intended to get the general idea
> across.

Have you had got round to looking at these patches yet?

> The first of these four patches will eventually need to be modified to
> actually abort in flight commands and clear up the mess afterwards.
> However, first and foremost I'd like to draw your attention to the use
> of REQ_TYPE_LINUX_BLOCK requests as demonstrated in the other three
> patches. The question is whether the underlying concept is right.
> Although the question how to handle REQ_TYPE_LINUX_BLOCK requests in
> the scsi subsystem has been raised on the linux-scsi ml, it has never
> been answered really because this request type was deemed unsuitable
> for the application in question. See, for instance, the thread
> starting at [1]. My patch approach has been partly inspired by the
> patch discussed there. Before I raise this issue yet again, I'd like
> to know whether REQ_TYPE_LINUX_BLOCK is the right choice for my
> application in your opinion or whether another mechanism might be more
> suitable as James suggested a while ago (see [2]).
>
> Regards,
>
> Elias
>
> [1] http://permalink.gmane.org/gmane.linux.scsi/30049
> [2] http://permalink.gmane.org/gmane.linux.scsi/37951

Sorry, I got these two the wrong way round. [1] should be [2] and vice
versa.

Regards,

Elias

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata
  2008-03-07 18:25         ` [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata Elias Oltmanns
@ 2008-03-15 12:39           ` Pavel Machek
  2008-03-20 14:13           ` Alan Cox
  1 sibling, 0 replies; 25+ messages in thread
From: Pavel Machek @ 2008-03-15 12:39 UTC (permalink / raw)
  To: Alan Cox, linux-ide, linux-kernel, Jens Axboe

On Fri 2008-03-07 19:25:16, Elias Oltmanns wrote:
> This patch adds some helper functions to libata in order to provide low
> level suport for disk shock protection. The user interface to this
> functionality will be implemented in the block layer in a subsequent patch.
> 
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>

Looks ok to me, but I'm not an ata expert.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 3/4] disk-protect: Add a REQ_TYPE_LINUX_BLOCK request handler to libata
  2008-03-07 18:26         ` [PATCH 3/4] disk-protect: Add a REQ_TYPE_LINUX_BLOCK request handler to libata Elias Oltmanns
@ 2008-03-15 12:42           ` Pavel Machek
  0 siblings, 0 replies; 25+ messages in thread
From: Pavel Machek @ 2008-03-15 12:42 UTC (permalink / raw)
  To: Alan Cox, linux-ide, linux-kernel, Jens Axboe

On Fri 2008-03-07 19:26:14, Elias Oltmanns wrote:
> Defines a generic handler to be used as the lb_request_fn() callback for
> libata managed hosts. Support for REQ_LB_OP_FREEZE and REQ_LB_OP_THAW is
> included as well.

> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>

Both this and previous patch look ok to me -- but I'm not ATA
person...

It would be nice to get them in - thinkpad x42 kills few disks a year
when running w/o active protection.

-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] disk-protect: Add a generic block layer queue freezing facility
  2008-03-07 18:26         ` [PATCH 4/4] disk-protect: Add a generic block layer queue freezing facility Elias Oltmanns
@ 2008-03-15 12:49           ` Pavel Machek
  2008-03-16 16:16             ` Elias Oltmanns
  0 siblings, 1 reply; 25+ messages in thread
From: Pavel Machek @ 2008-03-15 12:49 UTC (permalink / raw)
  To: Alan Cox, linux-ide, linux-kernel, Jens Axboe

On Fri 2008-03-07 19:26:41, Elias Oltmanns wrote:
> This patch provides a generic way to freeze the request queue of a block
> device temporarily. This functionality is exposed to user space via sysfs.
> 
> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>

I guess this should have patch going to documentation. Otherwise it
looks ok.


> +/*
> + * When reading the 'protect' attribute, we return seconds remaining
> + * before the unfreeze timeout expires.
> + */
> +static ssize_t queue_protect_show(struct request_queue *q, char *page)
> +{
> +	unsigned int seconds = 0;
> +
> +	if (blk_queue_stopped(q) && timer_pending(&q->unfreeze_timer))
> +		/*
> +		 * Adding 1 in order to guarantee nonzero value until timer
> +		 * has actually expired.
> +		 */
> +		seconds = jiffies_to_msecs(q->unfreeze_timer.expires
> +					   - jiffies) / 1000 + 1;

Is it okay to read expires without locking? 
						Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [RFC] Disk shock protection (revisited)
  2008-03-13 14:51         ` Elias Oltmanns
@ 2008-03-15 14:30           ` Alan Cox
  0 siblings, 0 replies; 25+ messages in thread
From: Alan Cox @ 2008-03-15 14:30 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-ide, linux-kernel, Jens Axboe

On Thu, 13 Mar 2008 15:51:59 +0100
Elias Oltmanns <eo@nebensachen.de> wrote:

> Elias Oltmanns <eo@nebensachen.de> wrote:
> [...]
> > I'm going to send a first draft of a patch series in reply to this
> > email. It is a stripped down version intended to get the general idea
> > across.
> 
> Have you had got round to looking at these patches yet?

I've spent almost all of the last two weeks at or travelling to/from
conferences so no.

Alan

^ permalink raw reply	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] disk-protect: Add a generic block layer queue freezing facility
  2008-03-15 12:49           ` Pavel Machek
@ 2008-03-16 16:16             ` Elias Oltmanns
  2008-03-17 23:00               ` Elias Oltmanns
  0 siblings, 1 reply; 25+ messages in thread
From: Elias Oltmanns @ 2008-03-16 16:16 UTC (permalink / raw)
  To: linux-ide; +Cc: Alan Cox, linux-kernel, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 1439 bytes --]

Pavel Machek <pavel@ucw.cz> wrote:
> On Fri 2008-03-07 19:26:41, Elias Oltmanns wrote:
>> This patch provides a generic way to freeze the request queue of a block
>> device temporarily. This functionality is exposed to user space via sysfs.
>> 
>> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
>
> I guess this should have patch going to documentation. Otherwise it
> looks ok.

Yes, I'll add documentation eventually. There is more to be added before
this patch can go in because the protect attribute will appear for all
block devices employing request queues. In particular, we will have to
make sure that drivers will handle REQ_TYPE_LINUX_BLOCK requests
gracefully even if they don't implement support for a particular
command.

>
>> +/*
>> + * When reading the 'protect' attribute, we return seconds remaining
>> + * before the unfreeze timeout expires.
>> + */
>> +static ssize_t queue_protect_show(struct request_queue *q, char *page)
>> +{
>> +	unsigned int seconds = 0;
>> +
>> +	if (blk_queue_stopped(q) && timer_pending(&q->unfreeze_timer))
>> +		/*
>> +		 * Adding 1 in order to guarantee nonzero value until timer
>> +		 * has actually expired.
>> +		 */
>> +		seconds = jiffies_to_msecs(q->unfreeze_timer.expires
>> +					   - jiffies) / 1000 + 1;
>
> Is it okay to read expires without locking? 

No, I have been a bit careless with regard to locking in this patch. See
the revised version below.

Regards,

Elias

---

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: tmp.patch --]
[-- Type: text/x-patch, Size: 6660 bytes --]

This patch provides a generic way to freeze the request queue of a block
device temporarily. This functionality is exposed to user space via sysfs.

Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
---

 block/ll_rw_blk.c      |  147 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |    7 ++
 2 files changed, 154 insertions(+), 0 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 8b91994..f295855 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -39,6 +39,8 @@
 
 static void blk_unplug_work(struct work_struct *work);
 static void blk_unplug_timeout(unsigned long data);
+static void blk_unfreeze_work(struct work_struct *work);
+static void blk_unfreeze_timeout(unsigned long data);
 static void drive_stat_acct(struct request *rq, int new_io);
 static void init_request_from_bio(struct request *req, struct bio *bio);
 static int __make_request(struct request_queue *q, struct bio *bio);
@@ -231,6 +233,13 @@ void blk_queue_make_request(struct request_queue * q, make_request_fn * mfn)
 	q->unplug_timer.function = blk_unplug_timeout;
 	q->unplug_timer.data = (unsigned long)q;
 
+	q->max_unfreeze = 30;
+
+	INIT_WORK(&q->unfreeze_work, blk_unfreeze_work);
+
+	q->unfreeze_timer.function = blk_unfreeze_timeout;
+	q->unfreeze_timer.data = (unsigned long)q;
+
 	/*
 	 * by default assume old behaviour and bounce for any highmem page
 	 */
@@ -1861,6 +1870,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	}
 
 	init_timer(&q->unplug_timer);
+	init_timer(&q->unfreeze_timer);
 
 	kobject_set_name(&q->kobj, "%s", "queue");
 	q->kobj.ktype = &queue_ktype;
@@ -1975,6 +1985,97 @@ int blk_get_queue(struct request_queue *q)
 
 EXPORT_SYMBOL(blk_get_queue);
 
+static void issue_queue_freeze_cmd(struct request_queue *q, int thaw)
+{
+	struct request *rq;
+	unsigned long flags;
+
+	rq = blk_get_request(q, 0, GFP_NOIO);
+	if (!rq)
+		return;
+	rq->cmd_type = REQ_TYPE_LINUX_BLOCK;
+	rq->cmd_len = 1;
+	rq->cmd_flags |= REQ_NOMERGE | REQ_SOFTBARRIER;
+	if (thaw)
+		rq->cmd[0] = REQ_LB_OP_THAW;
+	else
+		rq->cmd[0] = REQ_LB_OP_FREEZE;
+	rq->rq_disk = NULL;
+	rq->sense = NULL;
+	rq->sense_len = 0;
+	rq->retries = 5;
+	rq->timeout = HZ;
+	spin_lock_irqsave(q->queue_lock, flags);
+	__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
+	if (thaw)
+		blk_start_queue(q);
+	else {
+		blk_stop_queue(q);
+		q->request_fn(q);
+	}
+	spin_unlock_irqrestore(q->queue_lock, flags);
+}
+
+static void blk_unfreeze_work(struct work_struct *work)
+{
+	struct request_queue *q = container_of(work, struct request_queue,
+					       unfreeze_work);
+	int pending;
+
+	spin_lock_irq(q->queue_lock);
+	pending = timer_pending(&q->unfreeze_timer);
+	spin_unlock_irq(q->queue_lock);
+	if (!pending)
+		issue_queue_freeze_cmd(q, 1);
+}
+
+static void blk_unfreeze_timeout(unsigned long data)
+{
+	struct request_queue *q = (struct request_queue *) data;
+
+	kblockd_schedule_work(&q->unfreeze_work);
+}
+
+/**
+ * blk_freeze_queue - (un)freeze queue and manage the queue unfreeze timer
+ * @q:		queue to be (un)frozen
+ * @second:	number of seconds to freeze the queue for
+ *
+ * Description: This function (re)sets the unfreeze timer of a request
+ *    queue. When a timer is started / stopped (not just modified),
+ *    then low level drivers are notified about this event.
+ */
+static int blk_freeze_queue(struct request_queue *q, unsigned int seconds)
+{
+	int thaw, rc;
+
+	spin_lock_irq(q->queue_lock);
+	if (seconds > 0) {
+		if (seconds > q->max_unfreeze)
+			seconds = q->max_unfreeze;
+		/* set / reset the thaw timer */
+		rc = !mod_timer(&q->unfreeze_timer,
+				msecs_to_jiffies(seconds * 1000) + jiffies);
+		thaw = 0;
+		if (rc && blk_queue_stopped(q)) {
+			/* Someone else has stopped the queue already.
+			 * Best leave it alone.
+			 */
+			del_timer(&q->unfreeze_timer);
+			rc = 0;
+		}
+	} else {
+		rc = del_timer(&q->unfreeze_timer);
+		thaw = 1;
+	}
+	spin_unlock_irq(q->queue_lock);
+
+	if (rc)
+		issue_queue_freeze_cmd(q, thaw);
+
+	return rc;
+}
+
 static inline void blk_free_request(struct request_queue *q, struct request *rq)
 {
 	if (rq->cmd_flags & REQ_ELVPRIV)
@@ -4080,6 +4181,45 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
 	return queue_var_show(max_hw_sectors_kb, (page));
 }
 
+/*
+ * When reading the 'protect' attribute, we return seconds remaining
+ * before the unfreeze timeout expires.
+ */
+static ssize_t queue_protect_show(struct request_queue *q, char *page)
+{
+	unsigned int seconds = 0;
+
+	spin_lock_irq(q->queue_lock);
+	if (blk_queue_stopped(q) && timer_pending(&q->unfreeze_timer))
+		/*
+		 * Adding 1 in order to guarantee nonzero value until timer
+		 * has actually expired.
+		 */
+		seconds = jiffies_to_msecs(q->unfreeze_timer.expires
+					   - jiffies) / 1000 + 1;
+	spin_unlock_irq(q->queue_lock);
+
+	return queue_var_show(seconds, (page));
+}
+
+/*
+ * When writing to the 'protect' attribute, input is the number of
+ * seconds to freeze the queue for. We call a helper function to park
+ * the heads and freeze/block the queue, then we make a block layer
+ * call to setup the thaw timeout. If input is 0, then the queue will
+ * be thawed by that helper function immediately.
+ */
+static ssize_t queue_protect_store(struct request_queue *q,
+				   const char *page, size_t count)
+{
+	unsigned long seconds;
+
+	queue_var_store(&seconds, page, count);
+	blk_freeze_queue(q, seconds);
+
+	return count;
+}
+
 
 static struct queue_sysfs_entry queue_requests_entry = {
 	.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
@@ -4110,12 +4250,19 @@ static struct queue_sysfs_entry queue_iosched_entry = {
 	.store = elv_iosched_store,
 };
 
+static struct queue_sysfs_entry queue_protect_entry = {
+	.attr = { .name = "protect", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_protect_show,
+	.store = queue_protect_store,
+};
+
 static struct attribute *default_attrs[] = {
 	&queue_requests_entry.attr,
 	&queue_ra_entry.attr,
 	&queue_max_hw_sectors_entry.attr,
 	&queue_max_sectors_entry.attr,
 	&queue_iosched_entry.attr,
+	&queue_protect_entry.attr,
 	NULL,
 };
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1854a69..082e2d3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -385,6 +385,13 @@ struct request_queue
 	unsigned long		unplug_delay;	/* After this many jiffies */
 	struct work_struct	unplug_work;
 
+	/*
+	 * Auto-unfreeze state
+	 */
+	struct timer_list	unfreeze_timer;
+	int			max_unfreeze;	/* At most this many seconds */
+	struct work_struct	unfreeze_work;
+
 	struct backing_dev_info	backing_dev_info;
 
 	/*

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 4/4] disk-protect: Add a generic block layer queue freezing facility
  2008-03-16 16:16             ` Elias Oltmanns
@ 2008-03-17 23:00               ` Elias Oltmanns
  0 siblings, 0 replies; 25+ messages in thread
From: Elias Oltmanns @ 2008-03-17 23:00 UTC (permalink / raw)
  To: linux-ide; +Cc: Alan Cox, linux-kernel, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 1471 bytes --]

Elias Oltmanns <eo@nebensachen.de> wrote:
> Pavel Machek <pavel@ucw.cz> wrote:
>> On Fri 2008-03-07 19:26:41, Elias Oltmanns wrote:
>>> This patch provides a generic way to freeze the request queue of a block
>>> device temporarily. This functionality is exposed to user space via sysfs.
>>> 
>>> Signed-off-by: Elias Oltmanns <eo@nebensachen.de>
[...]
>>> +/*
>>> + * When reading the 'protect' attribute, we return seconds remaining
>>> + * before the unfreeze timeout expires.
>>> + */
>>> +static ssize_t queue_protect_show(struct request_queue *q, char *page)
>>> +{
>>> +	unsigned int seconds = 0;
>>> +
>>> +	if (blk_queue_stopped(q) && timer_pending(&q->unfreeze_timer))
>>> +		/*
>>> +		 * Adding 1 in order to guarantee nonzero value until timer
>>> +		 * has actually expired.
>>> +		 */
>>> +		seconds = jiffies_to_msecs(q->unfreeze_timer.expires
>>> +					   - jiffies) / 1000 + 1;
>>
>> Is it okay to read expires without locking? 
>
> No, I have been a bit careless with regard to locking in this patch. See
> the revised version below.

Come to think of it, the lock really shouldn't be released between
modding / checking the timer and issuing the respective command. This
makes the whole thing a bit messy because allocating a request the
standard way is done before acquiring the spin_lock. Since we don't know
at that time whether a command is actually going to be enqueued, this
looks rather suboptimal to me. A better idea anyone?

Regards,

Elias


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 04-block-freeze-queue.patch --]
[-- Type: text/x-patch, Size: 6438 bytes --]

---

 block/ll_rw_blk.c      |  142 ++++++++++++++++++++++++++++++++++++++++++++++++
 include/linux/blkdev.h |    7 ++
 2 files changed, 149 insertions(+), 0 deletions(-)

diff --git a/block/ll_rw_blk.c b/block/ll_rw_blk.c
index 8b91994..112af66 100644
--- a/block/ll_rw_blk.c
+++ b/block/ll_rw_blk.c
@@ -39,6 +39,8 @@
 
 static void blk_unplug_work(struct work_struct *work);
 static void blk_unplug_timeout(unsigned long data);
+static void blk_unfreeze_work(struct work_struct *work);
+static void blk_unfreeze_timeout(unsigned long data);
 static void drive_stat_acct(struct request *rq, int new_io);
 static void init_request_from_bio(struct request *req, struct bio *bio);
 static int __make_request(struct request_queue *q, struct bio *bio);
@@ -231,6 +233,13 @@ void blk_queue_make_request(struct request_queue * q, make_request_fn * mfn)
 	q->unplug_timer.function = blk_unplug_timeout;
 	q->unplug_timer.data = (unsigned long)q;
 
+	q->max_unfreeze = 30;
+
+	INIT_WORK(&q->unfreeze_work, blk_unfreeze_work);
+
+	q->unfreeze_timer.function = blk_unfreeze_timeout;
+	q->unfreeze_timer.data = (unsigned long)q;
+
 	/*
 	 * by default assume old behaviour and bounce for any highmem page
 	 */
@@ -1861,6 +1870,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
 	}
 
 	init_timer(&q->unplug_timer);
+	init_timer(&q->unfreeze_timer);
 
 	kobject_set_name(&q->kobj, "%s", "queue");
 	q->kobj.ktype = &queue_ktype;
@@ -1975,6 +1985,92 @@ int blk_get_queue(struct request_queue *q)
 
 EXPORT_SYMBOL(blk_get_queue);
 
+static void issue_queue_freeze_cmd(struct request_queue *q,
+				   struct request *rq, int thaw)
+{
+	rq->cmd_type = REQ_TYPE_LINUX_BLOCK;
+	rq->cmd_len = 1;
+	rq->cmd_flags |= REQ_NOMERGE | REQ_SOFTBARRIER;
+	if (thaw)
+		rq->cmd[0] = REQ_LB_OP_THAW;
+	else
+		rq->cmd[0] = REQ_LB_OP_FREEZE;
+	rq->rq_disk = NULL;
+	rq->sense = NULL;
+	rq->sense_len = 0;
+	rq->retries = 5;
+	rq->timeout = HZ;
+	__elv_add_request(q, rq, ELEVATOR_INSERT_FRONT, 0);
+	if (thaw)
+		blk_start_queue(q);
+	else {
+		blk_stop_queue(q);
+		q->request_fn(q);
+	}
+}
+
+static void blk_unfreeze_work(struct work_struct *work)
+{
+	struct request_queue *q = container_of(work, struct request_queue,
+					       unfreeze_work);
+	struct request *rq;
+
+	rq = blk_get_request(q, 0, __GFP_WAIT);
+	spin_lock_irq(q->queue_lock);
+	if (!timer_pending(&q->unfreeze_timer))
+		issue_queue_freeze_cmd(q, rq, 1);
+	else
+		__blk_put_request(q, rq);
+	spin_unlock_irq(q->queue_lock);
+}
+
+static void blk_unfreeze_timeout(unsigned long data)
+{
+	struct request_queue *q = (struct request_queue *) data;
+
+	kblockd_schedule_work(&q->unfreeze_work);
+}
+
+/**
+ * blk_freeze_queue - (un)freeze queue and manage the queue unfreeze timer
+ * @q:		queue to be (un)frozen
+ * @second:	number of seconds to freeze the queue for
+ *
+ * Description: This function (re)sets the unfreeze timer of a request
+ *    queue. When a timer is started / stopped (not just modified),
+ *    then low level drivers are notified about this event.
+ */
+static int blk_freeze_queue(struct request_queue *q, unsigned int seconds)
+{
+	struct request *rq;
+	int rc;
+
+	rq = blk_get_request(q, 0, __GFP_HIGH | __GFP_WAIT);
+	spin_lock_irq(q->queue_lock);
+	if (seconds > 0) {
+		if (seconds > q->max_unfreeze)
+			seconds = q->max_unfreeze;
+		/* set / reset the thaw timer */
+		rc = !mod_timer(&q->unfreeze_timer,
+				msecs_to_jiffies(seconds * 1000) + jiffies);
+		if (rc && blk_queue_stopped(q)) {
+			/* Someone else has stopped the queue already.
+			 * Best leave it alone.
+			 */
+			del_timer(&q->unfreeze_timer);
+			rc = 0;
+		}
+	} else
+		rc = del_timer(&q->unfreeze_timer);
+	if (rc)
+		issue_queue_freeze_cmd(q, rq, !seconds);
+	else
+		__blk_put_request(q, rq);
+	spin_unlock_irq(q->queue_lock);
+
+	return rc;
+}
+
 static inline void blk_free_request(struct request_queue *q, struct request *rq)
 {
 	if (rq->cmd_flags & REQ_ELVPRIV)
@@ -4080,6 +4176,45 @@ static ssize_t queue_max_hw_sectors_show(struct request_queue *q, char *page)
 	return queue_var_show(max_hw_sectors_kb, (page));
 }
 
+/*
+ * When reading the 'protect' attribute, we return seconds remaining
+ * before the unfreeze timeout expires.
+ */
+static ssize_t queue_protect_show(struct request_queue *q, char *page)
+{
+	unsigned int seconds = 0;
+
+	spin_lock_irq(q->queue_lock);
+	if (blk_queue_stopped(q) && timer_pending(&q->unfreeze_timer))
+		/*
+		 * Adding 1 in order to guarantee nonzero value until timer
+		 * has actually expired.
+		 */
+		seconds = jiffies_to_msecs(q->unfreeze_timer.expires
+					   - jiffies) / 1000 + 1;
+	spin_unlock_irq(q->queue_lock);
+
+	return queue_var_show(seconds, (page));
+}
+
+/*
+ * When writing to the 'protect' attribute, input is the number of
+ * seconds to freeze the queue for. We call a helper function to park
+ * the heads and freeze/block the queue, then we make a block layer
+ * call to setup the thaw timeout. If input is 0, then the queue will
+ * be thawed by that helper function immediately.
+ */
+static ssize_t queue_protect_store(struct request_queue *q,
+				   const char *page, size_t count)
+{
+	unsigned long seconds;
+
+	queue_var_store(&seconds, page, count);
+	blk_freeze_queue(q, seconds);
+
+	return count;
+}
+
 
 static struct queue_sysfs_entry queue_requests_entry = {
 	.attr = {.name = "nr_requests", .mode = S_IRUGO | S_IWUSR },
@@ -4110,12 +4245,19 @@ static struct queue_sysfs_entry queue_iosched_entry = {
 	.store = elv_iosched_store,
 };
 
+static struct queue_sysfs_entry queue_protect_entry = {
+	.attr = { .name = "protect", .mode = S_IRUGO | S_IWUSR },
+	.show = queue_protect_show,
+	.store = queue_protect_store,
+};
+
 static struct attribute *default_attrs[] = {
 	&queue_requests_entry.attr,
 	&queue_ra_entry.attr,
 	&queue_max_hw_sectors_entry.attr,
 	&queue_max_sectors_entry.attr,
 	&queue_iosched_entry.attr,
+	&queue_protect_entry.attr,
 	NULL,
 };
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 1854a69..082e2d3 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -385,6 +385,13 @@ struct request_queue
 	unsigned long		unplug_delay;	/* After this many jiffies */
 	struct work_struct	unplug_work;
 
+	/*
+	 * Auto-unfreeze state
+	 */
+	struct timer_list	unfreeze_timer;
+	int			max_unfreeze;	/* At most this many seconds */
+	struct work_struct	unfreeze_work;
+
 	struct backing_dev_info	backing_dev_info;
 
 	/*

^ permalink raw reply related	[flat|nested] 25+ messages in thread

* Re: [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata
  2008-03-07 18:25         ` [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata Elias Oltmanns
  2008-03-15 12:39           ` Pavel Machek
@ 2008-03-20 14:13           ` Alan Cox
  1 sibling, 0 replies; 25+ messages in thread
From: Alan Cox @ 2008-03-20 14:13 UTC (permalink / raw)
  To: Elias Oltmanns; +Cc: linux-ide, linux-kernel, Jens Axboe

> +	err_mask = ata_exec_internal(dev, &tf, NULL, DMA_NONE, NULL, 0, 0);
> +	if (err_mask)
> +		goto abort;

We assume you never use ata_exec_internal on a "live" to block layer
device, so this doesn't work for the general case. In the EH handler for
unparking it should be fine as the EH thread runs with the drive queue
shut down.


> +static inline int ata_id_has_unload(const u16 *id)
> +{
> +	/* ATA-7 specifies two places to indicate unload feature support.
> +	 * Since I don't really understand the difference, I'll just check
> +	 * both and only return zero if none of them indicates otherwise. */

Probably wise and we can fix that once it is obvious what reality is
using.

>

^ permalink raw reply	[flat|nested] 25+ messages in thread

end of thread, other threads:[~2008-03-20 14:29 UTC | newest]

Thread overview: 25+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-02-25 23:56 [RFC] Disk shock protection (revisited) Elias Oltmanns
2008-02-26  0:02 ` Jeff Garzik
2008-02-26  0:30   ` Elias Oltmanns
2008-02-26  1:33     ` Henrique de Moraes Holschuh
2008-02-26 12:39 ` Alan Cox
2008-02-28  8:24   ` Elias Oltmanns
2008-02-28 11:13     ` Alan Cox
2008-02-24 18:03       ` Pavel Machek
2008-02-28 17:00       ` Greg Freemyer
2008-03-07 18:03       ` Elias Oltmanns
2008-03-07 18:25         ` [PATCH 1/4] disk-protect: Add disk shock protection helpers to libata Elias Oltmanns
2008-03-15 12:39           ` Pavel Machek
2008-03-20 14:13           ` Alan Cox
2008-03-07 18:25         ` [PATCH 2/4] disk-protect: SCSI support for REQ_TYPE_LINUX_BLOCK requests Elias Oltmanns
2008-03-07 18:26         ` [PATCH 3/4] disk-protect: Add a REQ_TYPE_LINUX_BLOCK request handler to libata Elias Oltmanns
2008-03-15 12:42           ` Pavel Machek
2008-03-07 18:26         ` [PATCH 4/4] disk-protect: Add a generic block layer queue freezing facility Elias Oltmanns
2008-03-15 12:49           ` Pavel Machek
2008-03-16 16:16             ` Elias Oltmanns
2008-03-17 23:00               ` Elias Oltmanns
2008-03-07 22:43         ` [RFC] Disk shock protection (revisited) Alan Cox
2008-03-13 14:51         ` Elias Oltmanns
2008-03-15 14:30           ` Alan Cox
2008-02-26 20:47 ` Willy Tarreau
2008-02-28 10:10   ` Elias Oltmanns

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).