From: "Rafael J. Wysocki" <rafael@kernel.org>
To: Dan Williams <dan.j.williams@intel.com>
Cc: "Rafael J. Wysocki" <rafael@kernel.org>,
	"linux-nvdimm@lists.01.org" <linux-nvdimm@lists.01.org>,
	"Rafael J. Wysocki" <rafael.j.wysocki@intel.com>,
	Andy Shevchenko <andriy.shevchenko@intel.com>,
	Jonathan Corbet <corbet@lwn.net>,
	Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
	Len Brown <len.brown@intel.com>, Len Brown <lenb@kernel.org>,
	"Rafael J. Wysocki" <rjw@rjwysocki.net>,
	Pavel Machek <pavel@ucw.cz>, Stable <stable@vger.kernel.org>,
	ACPI Devel Maling List <linux-acpi@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH 00/12] ACPI/NVDIMM: Runtime Firmware Activation
Date: Sun, 28 Jun 2020 19:22:55 +0200
Message-ID: <CAJZ5v0i=SkqtgcXzq0oYNEAuYA-FvBEG-bm6fyidzAsYSNcEdQ@mail.gmail.com>
In-Reply-To: <CAPcyv4jqShnZr1b0-upwWf8L3JjKtHox_pCuu229630rXGuLkg@mail.gmail.com>

On Fri, Jun 26, 2020 at 8:43 PM Dan Williams <dan.j.williams@intel.com> wrote:
>
> On Fri, Jun 26, 2020 at 7:22 AM Rafael J. Wysocki <rafael@kernel.org> wrote:
> >
> > On Fri, Jun 26, 2020 at 2:06 AM Dan Williams <dan.j.williams@intel.com> wrote:
> > >
> > > Quoting the documentation:
> > >
> > >     Some persistent memory devices run a firmware locally on the device /
> > >     "DIMM" to perform tasks like media management, capacity provisioning,
> > >     and health monitoring. The process of updating that firmware typically
> > >     involves a reboot because it has implications for in-flight memory
> > >     transactions. However, reboots are disruptive and at least the Intel
> > >     persistent memory platform implementation, described by the Intel ACPI
> > >     DSM specification [1], has added support for activating firmware at
> > >     runtime.
> > >
> > >     [1]: https://docs.pmem.io/persistent-memory/
> > >
> > > The approach taken is to abstract the Intel platform specific mechanism
> > > behind a libnvdimm-generic sysfs interface. The interface could support
> > > runtime-firmware-activation on another architecture without need to
> > > change userspace tooling.
> > >
> > > The ACPI NFIT implementation involves a set of device-specific-methods
> > > (DSMs) to 'arm' individual devices for activation and a bus-level
> > > 'trigger' method to execute the activation. Informational / enumeration
> > > methods are also provided at the bus and device level.
> > >
> > > One complicating aspect of the memory device firmware activation is that
> > > the memory controller may need to be quiesced, no memory cycles, during
> > > the activation. While the platform has mechanisms to support holding off
> > > in-flight DMA during the activation, the device response to that delay
> > > is potentially undefined. The platform may reject a runtime firmware
> > > update if, for example, a PCI-E device does not support its completion
> > > timeout value being increased to meet the activation time. Outside of
> > > device timeouts the quiesce period may also violate application
> > > timeouts.
> > >
> > > Given the above device and application timeout considerations the
> > > implementation defaults to hooking into the suspend path to trigger the
> > > activation, i.e. that a suspend-resume cycle (at least up to the syscore
> > > suspend point) is required.
> >
> > Well, that doesn't work if the suspend method for the system is set to
> > suspend-to-idle (for example, via /sys/power/mem_sleep), because the
> > syscore callbacks are not invoked in that case.
> >
> > Also you probably don't need the device power state toggling that
> > happens during regular suspend/resume (you may not want it even for
> > some devices).
> >
> > The hibernation freeze/thaw may be a better match and there is some
> > test support in there already that may be kind of co-opted for your
> > use case.
>
> Hmm, yes I guess freeze should be sufficient to quiesce most
> device-DMA in the general case as applications will stop sending
> requests.

It is expected to be sufficient to quiesce all of them.

If that is not the case, the integrity of the hibernation image cannot
be guaranteed on the system in question.

> I do expect some RDMA devices will happily keep on
> transmitting, but that likely will need explicit mitigation. It also
> appears the suspend callback for at least one RDMA device,
> mlx5_suspend(), is rather violent as it appears to fully tear down the
> device context, not just suspend operations.
>
> To be clear, what debug interface were you thinking I could glom onto
> to just trigger firmware-activate at the end of the freeze phase?

Functionally, the same as for suspend, but using the hibernation
interface, so "echo platform > /sys/power/pm_test" followed by "echo
disk > /sys/power/state".
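
As a rough sketch of that sequence, assuming a kernel with the PM debug
interface (/sys/power/pm_test) and hibernation enabled: the function name
and the optional directory argument are illustrative only, added so the
writes can be dry-run against a scratch directory. Resetting pm_test to
"none" afterwards is an addition, not part of the suggestion above.

```shell
#!/bin/sh
# Sketch: run a hibernation test cycle at the "platform" test level.
# With pm_test set, the kernel is expected to carry the sequence up to the
# selected level and then undo it, rather than writing a hibernation image.
# The directory argument is a hypothetical override for dry runs; it
# defaults to the real /sys/power.
pm_test_hibernate() {
    power_dir="${1:-/sys/power}"

    # Select the test level.
    echo platform > "$power_dir/pm_test" || return 1

    # Start the hibernation sequence; on a real system this write blocks
    # until the test cycle has finished.
    echo disk > "$power_dir/state" || return 1

    # Put the test level back so later transitions behave normally.
    echo none > "$power_dir/pm_test"
}
```

Calling pm_test_hibernate as root with no argument acts on the real
/sys/power; passing a scratch directory only exercises the writes.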

But it might be cleaner to introduce a special "hibernation mode", i.e.
one more item in /sys/power/disk, that will trigger what you need
(in analogy with "test_resume").
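
/sys/power/disk lists the available hibernation modes on one line, with
the selected one in square brackets (for example "[platform] shutdown
reboot suspend test_resume"). Tooling that wanted to use such a mode
could probe for it first. The helpers below are a minimal sketch assuming
that listing format; their names are made up for illustration, and no
name for the new mode is proposed in this thread.

```shell
#!/bin/sh
# Print the currently selected entry (the one in square brackets) of a
# /sys/power file such as disk or mem_sleep.
selected_entry() {
    sed -n 's/.*\[\([^]]*\)].*/\1/p' "$1"
}

# Succeed if the named entry is offered by the file, selected or not.
has_entry() {
    # Strip the brackets, put one entry per line, then match the whole line.
    sed 's/[][]//g' "$1" | tr ' ' '\n' | grep -qxF -- "$2"
}
```

For instance, 'has_entry /sys/power/disk test_resume' reports whether the
running kernel offers the existing test_resume mode, and the same check
would apply to a newly added mode.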