From: "Rafael J. Wysocki" <rjw@sisk.pl>
To: Toshi Kani <toshi.kani@hp.com>
Cc: linux-s390@vger.kernel.org, jiang.liu@huawei.com,
	wency@cn.fujitsu.com, linux-acpi@vger.kernel.org,
	Greg KH <gregkh@linuxfoundation.org>,
	linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	isimatu.yasuaki@jp.fujitsu.com, yinghai@kernel.org,
	srivatsa.bhat@linux.vnet.ibm.com, guohanjun@huawei.com,
	bhelgaas@google.com, akpm@linux-foundation.org,
	linuxppc-dev@lists.ozlabs.org, lenb@kernel.org
Subject: Re: [RFC PATCH v2 01/12] Add sys_hotplug.h for system device hotplug framework
Date: Tue, 05 Feb 2013 00:52:30 +0100
Message-ID: <7003418.onqVlaaHJS@vostro.rjw.lan>
In-Reply-To: <1360016009.23410.213.camel@misato.fc.hp.com>

On Monday, February 04, 2013 03:13:29 PM Toshi Kani wrote:
> On Mon, 2013-02-04 at 21:07 +0100, Rafael J. Wysocki wrote:
> > On Monday, February 04, 2013 06:33:52 AM Greg KH wrote:
> > > On Mon, Feb 04, 2013 at 03:21:22PM +0100, Rafael J. Wysocki wrote:
> > > > On Monday, February 04, 2013 04:48:10 AM Greg KH wrote:
> > > > > On Sun, Feb 03, 2013 at 09:44:39PM +0100, Rafael J. Wysocki wrote:
> > > > > > > Yes, but those are just remove events and we can only see how destructive they
> > > > > > > were after the removal.  The point is to be able to figure out whether or not
> > > > > > > we *want* to do the removal in the first place.
> > > > > > > 
> > > > > > > Say you have a computing node which signals a hardware problem in a processor
> > > > > > > package (the container with CPU cores, memory, PCI host bridge etc.).  You
> > > > > > > may want to eject that package, but you don't want to kill the system this
> > > > > > > way.  So if the eject is doable, it is very much desirable to do it, but if it
> > > > > > > is not doable, you'd rather shut the box down and do the replacement afterward.
> > > > > > > That may be costly, however (maybe weeks of computations), so it should be
> > > > > > > avoided if possible, but not at the expense of crashing the box if the eject
> > > > > > > doesn't work out.
> > > > > > 
> > > > > > It seems to me that we could handle that with the help of a new flag, say
> > > > > > "no_eject", in struct device, a global mutex, and a function that will walk
> > > > > > the given subtree of the device hierarchy and check if "no_eject" is set for
> > > > > > any devices in there.  Plus a global "no_eject" switch, perhaps.
> > > > > 
> > > > > I think this will always be racy, or at worst, slow things down on
> > > > > normal device operations as you will always be having to grab this flag
> > > > > whenever you want to do something new.
> > > > 
> > > > I don't see why this particular scheme should be racy, at least I don't see any
> > > > obvious races in it (although I'm not that good at race detection in general,
> > > > admittedly).
> > > > 
> > > > Also, I don't expect that flag to be used for everything, just for things known
> > > > to seriously break if a forcible eject is done.  That may not be precise enough,
> > > > so that's a matter of defining its purpose more precisely.
> > > > 
> > > > We can do something like that on the ACPI level (ie. introduce a no_eject flag
> > > > in struct acpi_device and provide an interface for the layers above ACPI to
> > > > manipulate it) but then devices without ACPI namespace objects won't be
> > > > covered.  That may not be a big deal, though.
> > > > 
> > > > So say dev is about to be used for something incompatible with ejecting, so to
> > > > speak.  Then, one would do platform_lock_eject(dev), which would check if dev
> > > > has an ACPI handle and then take acpi_eject_lock (if so).  The return value of
> > > > platform_lock_eject(dev) would need to be checked to see if the device is not
> > > > gone.  If it returns success (0), one would do something to the device and
> > > > call platform_no_eject(dev) and then platform_unlock_eject(dev).
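
To make that a bit more concrete, the caller's side could look roughly like
the sketch below.  The platform_* helpers are only the hypothetical interface
described above, and claim_device_against_eject() is a made-up example caller;
none of this exists yet:

#include <linux/device.h>

/* Hypothetical interface, as described above -- nothing of this exists yet. */
int platform_lock_eject(struct device *dev);
void platform_no_eject(struct device *dev);
void platform_unlock_eject(struct device *dev);

/* Sketch of a caller about to use dev in a way incompatible with ejecting. */
static int claim_device_against_eject(struct device *dev)
{
        int error = platform_lock_eject(dev);   /* fails if dev is already gone */

        if (error)
                return error;

        /* ... start using the device for whatever needs it to stay ... */

        platform_no_eject(dev);                 /* mark dev as not ejectable */
        platform_unlock_eject(dev);
        return 0;
}
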
> > > 
> > > How does a device "know" it is doing something that is incompatible with
> > > ejecting?  That's a non-trivial task from what I can tell.
> > 
> > I agree that this is complicated in general.  But.
> > 
> > There are devices known to have software "offline" and "online" operations
> > such that after the "offline" the given device is guaranteed to be not used
> > until "online".  We have that for CPU cores, for example, and user space can
> > do it via /sys/devices/system/cpu/cpuX/online .  So, why don't we make the
> > "online" set the no_eject flag (under the lock as appropriate) and the
> > "offline" clear it?  And why don't we define such "online" and "offline" for
> > all of the other "system" stuff, like memory, PCI host bridges etc. and make it
> > behave analogously?
> > 
> > Then, it is quite simple to say which devices should use the no_eject flag:
> > devices that have "online" and "offline" exported to user space.  And guess
> > who's responsible for "offlining" all of those things before trying to eject
> > them: user space is.  From the kernel's point of view it is all clear.  Hands
> > clean. :-)
> > 
> > Now, there's a different problem how to expose all of the relevant information
> > to user space so that it knows what to "offline" for the specific eject
> > operation to succeed, but that's kind of separate and worth addressing
> > anyway.
> 
> So, the idea is to run a user space program that off-lines all relevant
> devices before trimming ACPI devices.  Is that right?  That sounds like
> an idea worth considering.  This basically moves the "sequencer" part
> from the kernel space (as in my proposal) into user space.  I agree
> that how to expose all of the relevant info to user space is an issue.
> Also, we will need to make sure that the user program always runs in
> response to a kernel request and then reports the result back to the
> kernel, so that the kernel can do the rest of the operation and report
> the result to FW with _OST or _EJ0.  This loop has to close.  I think
> it is going to be more complicated than the kernel-only approach.

I actually didn't think about that.  The point is that trying to offline
everything *synchronously* may just be pointless, because everything may have
been offlined upfront, before the eject is even requested.  So the sequence
would be to first offline, from user space, the things we'll want to eject,
and then to send the eject request (e.g. via sysfs too).

Eject requests from eject buttons and things like that may just fail if
some of the components involved that should be offline are still online.
The fact that we might be able to offline them synchronously if we tried
doesn't matter, pretty much as it doesn't matter for hot-swappable disks.
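
To illustrate, the eject side then boils down to something like the sketch
below.  acpi_eject_lock is the global lock discussed earlier in the thread,
while eject_scope_is_busy() and do_subsystem_eject() are made-up placeholders
for the scope check and the actual removal:

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/mutex.h>
#include <linux/types.h>

static DEFINE_MUTEX(acpi_eject_lock);   /* the global lock discussed earlier */

/* Made-up placeholders for the scope check and the subsystem removal code. */
bool eject_scope_is_busy(struct device *dev);
int do_subsystem_eject(struct device *dev);

/* Sketch: refuse the eject unless everything in the scope is offline. */
static int handle_eject_request(struct device *dev)
{
        int error;

        mutex_lock(&acpi_eject_lock);
        if (eject_scope_is_busy(dev))
                error = -EBUSY;         /* ends up reported to FW via _OST */
        else
                error = do_subsystem_eject(dev);
        mutex_unlock(&acpi_eject_lock);

        return error;
}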

You'd probably never try to hot-remove a disk before unmounting filesystems
mounted from it or failing it as a RAID component and nobody sane wants the
kernel to do things like that automatically when the user presses the eject
button.  In my opinion we should treat memory eject, or CPU package eject, or
PCI host bridge eject in exactly the same way: Don't eject if it is not
prepared for ejecting in the first place.

And if you think about it, that makes things *massively* simpler, because now
the kernel doesn't need to worry about all of those "synchronous removal"
scenarios that very well may involve every single device in the system, and
the whole problem is nicely split into several separate "implement
offline/online" problems that are subsystem-specific and a single
"eject if everything relevant is offline" problem which is kind of trivial.
Plus the one of exposing information to user space, which is separate too.
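
Just to show how small the subsystem-specific part can be, here is a sketch of
an "online" attribute.  subsys_online()/subsys_offline() stand in for whatever
the given subsystem already does today, and the no_eject field together with
platform_lock_eject()/platform_unlock_eject() are the hypothetical pieces from
earlier in the thread:

#include <linux/device.h>
#include <linux/string.h>
#include <linux/types.h>

/* Stand-ins for the subsystem's existing online/offline code and for the
 * hypothetical eject locking interface discussed earlier. */
int subsys_online(struct device *dev);
int subsys_offline(struct device *dev);
int platform_lock_eject(struct device *dev);
void platform_unlock_eject(struct device *dev);

/* Sketch of a subsystem's "online" store that also maintains the
 * (hypothetical) no_eject flag under the eject lock. */
static ssize_t online_store(struct device *dev, struct device_attribute *attr,
                            const char *buf, size_t count)
{
        bool online;
        int error;

        error = strtobool(buf, &online);
        if (error)
                return error;

        error = platform_lock_eject(dev);
        if (error)
                return error;

        error = online ? subsys_online(dev) : subsys_offline(dev);
        if (!error)
                dev->no_eject = online; /* hypothetical flag in struct device */

        platform_unlock_eject(dev);
        return error ? error : count;
}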

Now, each of them can be worked on separately, *tested* separately and
debugged separately if need be and it is much easier to isolate failures
and so on.

> In addition, I am not sure if the "no_eject" flag in acpi_device is
> really necessary here since the user program will inform the kernel if
> all devices are off-line.  Also, the kernel will likely need to expose
> the device info to the user program to tell which devices need to be
> off-lined.  At that time, the kernel already knows if there is any
> on-line device in the scope.

Well, that depends on what "the kernel" means and how it knows that.  Surely
the "online" components have to be marked somehow so that it is easy to check
whether any of them are in the scope in a subsystem-independent way, so why
don't we use something like the no_eject flag for that?
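
For instance (and this is only a sketch, with no_eject being the hypothetical
flag in struct device), the scope check could simply walk the subtree with the
existing device_for_each_child():

#include <linux/device.h>
#include <linux/errno.h>
#include <linux/types.h>

/* Sketch of the subsystem-independent check: any device in the subtree that
 * is still marked no_eject (hypothetical flag) blocks the eject. */
static int check_no_eject(struct device *dev, void *data)
{
        if (dev->no_eject)
                return -EBUSY;

        return device_for_each_child(dev, NULL, check_no_eject);
}

bool eject_scope_is_busy(struct device *scope_root)
{
        return check_no_eject(scope_root, NULL) != 0;
}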

Rafael


-- 
I speak only for myself.
Rafael J. Wysocki, Intel Open Source Technology Center.
