From: Michal Hocko <mhocko@kernel.org>
To: joeyli <jlee@suse.com>
Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Kani Toshimitsu <toshi.kani@hpe.com>,
	Jiri Kosina <jkosina@suse.cz>,
	linux-mm@kvack.org, LKML <linux-kernel@vger.kernel.org>,
	linux-api@vger.kernel.org
Subject: Re: memory hotplug and force_remove
Date: Fri, 31 Mar 2017 14:02:36 +0200
Message-ID: <20170331120236.GO27098@dhcp22.suse.cz>
In-Reply-To: <20170331115530.GB28365@linux-l9pv.suse>

On Fri 31-03-17 19:55:30, Joey Lee wrote:
> On Fri, Mar 31, 2017 at 12:55:05PM +0200, Michal Hocko wrote:
> > On Fri 31-03-17 18:49:05, Joey Lee wrote:
> > > Hi Michal,
> > > 
> > > On Fri, Mar 31, 2017 at 10:30:17AM +0200, Michal Hocko wrote:
> > [...]
> > > > @@ -241,11 +232,10 @@ static int acpi_scan_try_to_offline(struct acpi_device *device)
> > > >  		acpi_walk_namespace(ACPI_TYPE_ANY, handle, ACPI_UINT32_MAX,
> > > >  				    NULL, acpi_bus_offline, (void *)true,
> > > >  				    (void **)&errdev);
> > > > -		if (!errdev || acpi_force_hot_remove)
> > > > +		if (!errdev)
> > > >  			acpi_bus_offline(handle, 0, (void *)true,
> > > >  					 (void **)&errdev);
> > > > -
> > > > -		if (errdev && !acpi_force_hot_remove) {
> > > > +		else {
> > >               ^^^^^^^^^^^^^
> > > This should still check errdev after the parent has been offlined, and
> > > roll the parent and children back to the online state if it is set:
> > > 
> > > -		if (errdev && !acpi_force_hot_remove) {
> > > +		if (errdev) {
> > 
> > You are right, I have missed that acpi_bus_offline modifies errdev.
> > Thanks for spotting that! Updated patch is below.
> > ---
> > From 8df0abd29988ffb52b6df52407b96d6015861bb7 Mon Sep 17 00:00:00 2001
> > From: Michal Hocko <mhocko@suse.com>
> > Date: Fri, 31 Mar 2017 10:08:41 +0200
> > Subject: [PATCH] acpi: drop support for force_remove
> > 
> > /sys/firmware/acpi/hotplug/force_remove was presumably added to support
> > auto offlining in the past. This is, however, inherently dangerous for
> > some hotpluggable resources like memory. The memory offlining fails when
> > the memory is still in use and cannot be dropped or migrated. If we
> > ignore the failure we are basically allowing for subtle memory
> > corruption or a crash.
> > 
> > We have actually noticed the latter while hitting BUG() during the memory
> > hotremove (remove_memory):
> > 	ret = walk_memory_range(PFN_DOWN(start), PFN_UP(start + size - 1), NULL,
> > 			check_memblock_offlined_cb);
> > 	if (ret)
> > 		BUG();
> > 
> > It took us a non-trivial amount of time to realize that the customer had
> > force_remove enabled. Even if the BUG() were removed here and we could
> > propagate the error up the call chain, it wouldn't help at all because
> > we would then hit a crash or memory corruption later, which would be even
> > harder to debug. So force_remove is unfixable for memory hotremove. We
> > haven't checked whether other hotpluggable resources are prone to similar
> > problems.
> > 
> > Remove the force_remove functionality because it is not fixable currently.
> > Keep the sysfs file and report an error if somebody tries to enable it.
> > Encourage users to report the missing functionality and work with
> > them on an alternative solution.
> > 
> > Signed-off-by: Michal Hocko <mhocko@suse.com>
> 
> This patch looks good to me. Please feel free to add:
> 
> Reviewed-by: Lee, Chun-Yi <jlee@suse.com>

Thanks for the review, Joey!
-- 
Michal Hocko
SUSE Labs
