* Re: [SPDK] Best practices on driver binding for SPDK in production environments
@ 2018-07-09 19:24 Lance Hartmann ORACLE
From: Lance Hartmann ORACLE @ 2018-07-09 19:24 UTC (permalink / raw)
  To: spdk



> On Jul 2, 2018, at 2:22 PM, Lance Hartmann ORACLE <lance.hartmann(a)oracle.com> wrote:
> 
> 
> Hi Jim,
> 
> Comments inlined below:
> 
> 
>> On Jul 2, 2018, at 12:35 PM, Harris, James R <james.r.harris(a)intel.com> wrote:
>> 
>>  
>>  
>> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Lance Hartmann ORACLE <lance.hartmann(a)oracle.com>
>> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
>> Date: Tuesday, June 26, 2018 at 9:56 AM
>> To: Storage Performance Development Kit <spdk(a)lists.01.org>
>> Subject: [SPDK] Best practices on driver binding for SPDK in production environments
>>  
>>  
>> <snip>
>>  
>> 4.   With my new udev rules in place, I was successful in getting specific NVMe controllers (based on bus-device-function) to unbind from the Linux nvme driver and bind to vfio-pci.   However, I made a couple of observations in the kernel log (dmesg).   In particular, I was drawn to the following for an NVMe controller at BDF 0000:40:00.0, for which I had a udev rule to unbind from nvme and bind to vfio-pci:
>>  
>> [   35.534279] nvme nvme1: pci function 0000:40:00.0
>> [   37.964945] nvme nvme1: failed to mark controller live
>> [   37.964947] nvme nvme1: Removing after probe failure status: 0
>>  
>> One theory I have for the above is that my udev RUN rule was invoked while the nvme driver’s probe() was still running on this controller, and perhaps the unbind request came in before the probe() completed, hence the “nvme1: failed to mark controller live”.   This has left me wondering whether, instead of triggering on (Event 2) when the bind occurs, I should try to trigger on the “last” udev event, an “add”, where the NVMe namespaces are instantiated.   Of course, I’d need to know ahead of time just how many namespaces exist on that controller if I were to do that, so I’d trigger on the last one.   I’m wondering if that may help to avoid what looks like a complaint during the middle of probe() of that particular controller.   Then again, maybe I can just safely ignore that and not worry about it at all?   Thoughts?
>>  
>> [Jim]  Can you confirm your suspicion - maybe add a 1 or 2 second delay after detecting Event 2 before unbinding – and see if that eliminates the probe failures?  I’m not suggesting that as a workaround or solution – just want to know for sure if we need to worry about deferring the unbind until after the kernel driver’s probe has completed.  It sounds like these error messages are benign, but it would be nice to avoid them.
> 
> For experimentation purposes, yes, I might be able to instrument a delay to see if the kernel nvme probe failures go away.  I don’t know if udev execution is multi-threaded or not, and thus whether such a delay would block other udev events from getting processed while mine sleeps, but I can explore this at least as an experiment.
> 
> Let me emphasize another point.   While playing with this further, I did subsequently discover that the end result, at least with my particular NVMe drives, was in fact not benign.   That is, although the NVMe controller did appear successfully bound to vfio-pci, execution of any SPDK apps (e.g. perf, identify) returned a failure attempting to communicate with the controller.   I then removed my udev rule, manually unbound the controller from vfio-pci, and rebound it to the kernel’s nvme driver.  After doing that, inspection of dmesg revealed a complaint from the nvme driver accessing the device.   And so I then rebooted the system — again, having ensured that my udev rule was not in place (neither in my rootfs nor the initramfs) — to see how the controller would behave following a reboot and coming up with the kernel nvme driver in the default scenario.  Again, dmesg revealed complaints about accessing that particular NVMe controller.   Finally, I power-cycled the host, and lo and behold, after doing that the NVMe controller came up fine.
> 
> In summary, I will at least attempt the delay experiment and see if that helps us sidestep the probe failure and avoid leaving the NVMe controller in a bad state.   If that works, I may then alter the udev rule to trigger on the add action of the last namespace rather than the bind action to the nvme driver and see how that works.


Jim, a delay -- "sleep 3" -- added to the udev rule before running the unbind operation appears to avoid triggering the (kernel) nvme driver probe() failure I previously encountered.   Whether this is a corner case that could/should be addressed in that device driver, or is unique to this particular NVMe controller/firmware, is unknown at this time, but at least I have a work-around.   I intend to explore this further via a change to the implementation where I trigger on the last add of the namespace instead of the bind of the nvme driver.   If that proves to be more sound, I'd prefer it over using a sleep, which in my experience is one of those things that frequently comes back to haunt you ;-).
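
For reference, the work-around boils down to something like this at the top of whatever the RUN rule ends up executing (an illustrative sketch, not my exact script; the helper and its argument convention are made up for the example):

    #!/bin/sh
    # invoked from the udev RUN rule with the controller's BDF as $1, e.g. 0000:40:00.0
    sleep 3                                          # crude delay: let the kernel nvme probe() finish first
    echo "$1" > /sys/bus/pci/drivers/nvme/unbind     # then detach the controller from the nvme driver
    # ... followed by the bind to vfio-pci as before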


--
Lance Hartmann
lance.hartmann(a)oracle.com





* Re: [SPDK] Best practices on driver binding for SPDK in production environments
@ 2018-07-02 19:22 Lance Hartmann ORACLE
From: Lance Hartmann ORACLE @ 2018-07-02 19:22 UTC (permalink / raw)
  To: spdk



Hi Jim,

Comments inlined below:


> On Jul 2, 2018, at 12:35 PM, Harris, James R <james.r.harris(a)intel.com> wrote:
> 
>  
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Lance Hartmann ORACLE <lance.hartmann(a)oracle.com>
> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Date: Tuesday, June 26, 2018 at 9:56 AM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: [SPDK] Best practices on driver binding for SPDK in production environments
>  
>  
> <snip>
>  
> 4.   With my new udev rules in place, I was successful in getting specific NVMe controllers (based on bus-device-function) to unbind from the Linux nvme driver and bind to vfio-pci.   However, I made a couple of observations in the kernel log (dmesg).   In particular, I was drawn to the following for an NVMe controller at BDF 0000:40:00.0, for which I had a udev rule to unbind from nvme and bind to vfio-pci:
>  
> [   35.534279] nvme nvme1: pci function 0000:40:00.0
> [   37.964945] nvme nvme1: failed to mark controller live
> [   37.964947] nvme nvme1: Removing after probe failure status: 0
>  
> One theory I have for the above is that my udev RUN rule was invoked while the nvme driver’s probe() was still running on this controller, and perhaps the unbind request came in before the probe() completed, hence the “nvme1: failed to mark controller live”.   This has left me wondering whether, instead of triggering on (Event 2) when the bind occurs, I should try to trigger on the “last” udev event, an “add”, where the NVMe namespaces are instantiated.   Of course, I’d need to know ahead of time just how many namespaces exist on that controller if I were to do that, so I’d trigger on the last one.   I’m wondering if that may help to avoid what looks like a complaint during the middle of probe() of that particular controller.   Then again, maybe I can just safely ignore that and not worry about it at all?   Thoughts?
>  
> [Jim]  Can you confirm your suspicion - maybe add a 1 or 2 second delay after detecting Event 2 before unbinding – and see if that eliminates the probe failures?  I’m not suggesting that as a workaround or solution – just want to know for sure if we need to worry about deferring the unbind until after the kernel driver’s probe has completed.  It sounds like these error messages are benign, but it would be nice to avoid them.

For experimentation purposes, yes, I might be able to instrument a delay to see if the kernel nvme probe failures go away.  I don’t know if udev execution is multi-threaded or not, and thus whether such a delay would block other udev events from getting processed while mine sleeps, but I can explore this at least as an experiment.

Let me emphasize another point.   While playing with this further, I did subsequently discover that the end result, at least with my particular NVMe drives, was in fact not benign.   That is, although the NVMe controller did appear successfully bound to vfio-pci, execution of any SPDK apps (e.g. perf, identify) returned a failure attempting to communicate with the controller.   I then removed my udev rule, manually unbound the controller from vfio-pci, and rebound it to the kernel’s nvme driver.  After doing that, inspection of dmesg revealed a complaint from the nvme driver accessing the device.   And so I then rebooted the system — again, having ensured that my udev rule was not in place (neither in my rootfs nor the initramfs) — to see how the controller would behave following a reboot and coming up with the kernel nvme driver in the default scenario.  Again, dmesg revealed complaints about accessing that particular NVMe controller.   Finally, I power-cycled the host, and lo and behold, after doing that the NVMe controller came up fine.

In summary, I will at least attempt the delay experiment and see if that helps us sidestep the probe failure and avoid leaving the NVMe controller in a bad state.   If that works, I may then alter the udev rule to trigger on the add action of the last namespace rather than the bind action to the nvme driver and see how that works.
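
If I go that route, I picture a rule along these lines (illustrative only; the BDF and helper path are made up, and with multiple namespaces one would still need some way to act only on the last add):

    # Fire when a namespace block device appears under the targeted controller,
    # i.e. only after the nvme probe() has gotten far enough to create /dev/nvme*n*.
    ACTION=="add", SUBSYSTEM=="block", KERNEL=="nvme*n*", KERNELS=="0000:40:00.0", \
        RUN+="/usr/local/bin/spdk-rebind.sh 0000:40:00.0"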

>  
> Overall this seems like a reasonable approach though.  How do you see this working if a system has multiple NVMe SSDs – one of which has the OS install, and the rest should be assigned to uio/vfio?


We do have this exact scenario; i.e. systems with NVMe controllers (on which file systems are mounted) that depend on the kernel nvme driver, while other NVMe controllers are ‘reserved’ for SPDK use.   Among my udev rule’s trigger criteria is the BDF (bus-device-function), so this should work fine.   We just have to make abundantly clear how careful one must be when configuring the system to use this mechanism, to avoid inadvertently triggering on an NVMe controller that’s needed for use with the kernel nvme driver.
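
For illustration, the kind of restriction I mean looks roughly like this (the BDFs and helper path are made up for the example):

    # Only ever touch the explicitly listed functions; every other NVMe controller,
    # including the one holding the root file system, stays on the kernel nvme driver.
    ACTION=="bind", SUBSYSTEM=="pci", DRIVER=="nvme", KERNEL=="0000:40:00.0|0000:41:00.0", \
        RUN+="/usr/local/bin/spdk-rebind.sh %k"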

--
Lance Hartmann
lance.hartmann(a)oracle.com




* Re: [SPDK] Best practices on driver binding for SPDK in production environments
@ 2018-07-02 17:35 Harris, James R
From: Harris, James R @ 2018-07-02 17:35 UTC (permalink / raw)
  To: spdk




From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Lance Hartmann ORACLE <lance.hartmann(a)oracle.com>
Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
Date: Tuesday, June 26, 2018 at 9:56 AM
To: Storage Performance Development Kit <spdk(a)lists.01.org>
Subject: [SPDK] Best practices on driver binding for SPDK in production environments


<snip>

4.   With my new udev rules in place, I was successful in getting specific NVMe controllers (based on bus-device-function) to unbind from the Linux nvme driver and bind to vfio-pci.   However, I made a couple of observations in the kernel log (dmesg).   In particular, I was drawn to the following for an NVMe controller at BDF 0000:40:00.0, for which I had a udev rule to unbind from nvme and bind to vfio-pci:

[   35.534279] nvme nvme1: pci function 0000:40:00.0
[   37.964945] nvme nvme1: failed to mark controller live
[   37.964947] nvme nvme1: Removing after probe failure status: 0

One theory I have for the above is that my udev RUN rule was invoked while the nvme driver’s probe() was still running on this controller, and perhaps the unbind request came in before the probe() completed, hence the “nvme1: failed to mark controller live”.   This has left me wondering whether, instead of triggering on (Event 2) when the bind occurs, I should try to trigger on the “last” udev event, an “add”, where the NVMe namespaces are instantiated.   Of course, I’d need to know ahead of time just how many namespaces exist on that controller if I were to do that, so I’d trigger on the last one.   I’m wondering if that may help to avoid what looks like a complaint during the middle of probe() of that particular controller.   Then again, maybe I can just safely ignore that and not worry about it at all?   Thoughts?

[Jim]  Can you confirm your suspicion - maybe add a 1 or 2 second delay after detecting Event 2 before unbinding – and see if that eliminates the probe failures?  I’m not suggesting that as a workaround or solution – just want to know for sure if we need to worry about deferring the unbind until after the kernel driver’s probe has completed.  It sounds like these error messages are benign, but it would be nice to avoid them.

Overall this seems like a reasonable approach though.  How do you see this working if a system has multiple NVMe SSDs – one of which has the OS install, and the rest should be assigned to uio/vfio?

I discovered another issue during this experimentation that is somewhat tangential to this task, but I’ll write a separate email on that topic.

thanks for any feedback,
--
Lance Hartmann
lance.hartmann(a)oracle.com



* [SPDK] Best practices on driver binding for SPDK in production environments
@ 2018-06-26 16:56 Lance Hartmann ORACLE
From: Lance Hartmann ORACLE @ 2018-06-26 16:56 UTC (permalink / raw)
  To: spdk



This email to the SPDK list is a follow-on to a brief discussion held during a recent SPDK community meeting (Tue Jun 26 UTC 15:00).

Lifted and edited from the Trello agenda item (https://trello.com/c/U291IBYx/91-best-practices-on-driver-binding-for-spdk-in-production-environments):

During development many (most?) people rely on running SPDK's scripts/setup.sh to perform a number of initializations, among them unbinding the Linux kernel nvme driver from NVMe controllers targeted for use by the SPDK and then binding them to either uio_pci_generic or vfio-pci.   This script is well suited to development environments, but is not targeted for use in production systems employing the SPDK.

I'd like to confer with my fellow SPDK community members on ideas, suggestions and best practices for handling this driver unbinding/binding.   I wrote some udev rules along with updates to some other Linux system conf files for automatically loading either the uio_pci_generic or vfio-pci modules.  I also had to update my initramfs so that when the system comes all the way up, the desired NVMe controllers are already bound to the needed driver for SPDK operation.  And, as a bonus, it should "just work" when a hotplug occurs as well.   However, there may be additional considerations I might have overlooked, on which I'd appreciate input.   Further, there's the matter of how and whether to semi-automate this configuration via some kind of script, and how that might vary by Linux distro, to say nothing of deciding between uio_pci_generic and vfio-pci.
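
To give a flavor of the conf-file side, the module-loading piece on a RedHat-family system can be as small as the following (a sketch under those assumptions, not my exact files); getting the udev rule itself into the initramfs then needs whatever your initramfs tool requires (e.g. an install_items entry in a dracut conf snippet) plus a rebuild with "dracut -f":

    # /etc/modules-load.d/vfio-pci.conf -- load the driver at boot so it is
    # already available by the time the udev rule fires
    vfio-pci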

And, now some details:

1.  I performed this on an Oracle Linux (OL) distro.   I’m currently unaware of how and which configuration files might differ from distro to distro.   Oracle Linux is RedHat-compatible, so I’m confident my implementation should run similarly on RedHat-based systems, but I’ve yet to delve into other distros like Debian, SuSE, etc.

2.  In preparation for writing my own udev rules, I unbound a specific NVMe controller from the Linux nvme driver by hand (a sketch of the sysfs commands involved follows the event listing below).   Then, in another window I launched "udevadm monitor -k -p" so that I could observe the usual udev events when an NVMe controller is bound to the nvme driver.   On my system, I observed four (4) udev kernel events (abbreviated/edited output to avoid this becoming excessively long):

(Event 1)
KERNEL[382128.187273] add      /devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0 (nvme)
ACTION=add
DEVNAME=/dev/nvme0
…
SUBSYSTEM=nvme

(Event 2)
KERNEL[382128.244658] bind     /devices/pci0000:00/0000:00:02.2/0000:30:00.0 (pci)
ACTION=bind
DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:30:00.0
DRIVER=nvme
…
SUBSYSTEM=pci

(Event 3)
KERNEL[382130.697832] add      /devices/virtual/bdi/259:0 (bdi)
ACTION=add
DEVPATH=/devices/virtual/bdi/259:0
...
SUBSYSTEM=bdi

(Event 4)
KERNEL[382130.698192] add      /devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0/nvme0n1 (block)
ACTION=add
DEVNAME=/dev/nvme0n1
DEVPATH=/devices/pci0000:00/0000:00:02.2/0000:30:00.0/nvme/nvme0/nvme0n1
DEVTYPE=disk
...
SUBSYSTEM=block
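
(For completeness, the by-hand unbind/rebind that produces events like the above is just the usual sysfs writes; the BDF below matches the devpath in the events:)

    echo 0000:30:00.0 > /sys/bus/pci/drivers/nvme/unbind   # detach the controller from the kernel nvme driver
    echo 0000:30:00.0 > /sys/bus/pci/drivers/nvme/bind     # re-attach it; this is what generates the events shown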


3.   My udev rule triggers on (Event 2) above:  the bind action.   Upon this action, my udev rule appends operations to the special udev RUN variable so that udev essentially mirrors what the SPDK’s scripts/setup.sh does:  unbinding from the nvme driver and binding to, in my case, the vfio-pci driver.
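
To make that concrete, the overall shape is roughly the following (rule file name, helper path and BDF are illustrative only, and the real scripts/setup.sh does considerably more, e.g. uio_pci_generic and hugepage handling):

    # /etc/udev/rules.d/99-spdk-vfio.rules (illustrative)
    ACTION=="bind", SUBSYSTEM=="pci", DRIVER=="nvme", KERNEL=="0000:40:00.0", \
        RUN+="/usr/local/bin/spdk-rebind.sh %k"

    #!/bin/sh
    # /usr/local/bin/spdk-rebind.sh (illustrative helper)
    bdf="$1"                                                      # PCI BDF handed in by udev, e.g. 0000:40:00.0
    echo "$bdf" > /sys/bus/pci/drivers/nvme/unbind                # detach from the kernel nvme driver
    echo vfio-pci > "/sys/bus/pci/devices/$bdf/driver_override"   # pin this one function to vfio-pci
    echo "$bdf" > /sys/bus/pci/drivers_probe                      # ask the PCI core to re-probe (and thus bind) it

One nice property of driver_override over the older new_id approach is that it claims only the listed function rather than every controller with the same vendor/device ID, which matters when identical drives back the OS.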

4.   With my new udev rules in place, I was successful in getting specific NVMe controllers (based on bus-device-function) to unbind from the Linux nvme driver and bind to vfio-pci.   However, I made a couple of observations in the kernel log (dmesg).   In particular, I was drawn to the following for an NVMe controller at BDF 0000:40:00.0, for which I had a udev rule to unbind from nvme and bind to vfio-pci:

[   35.534279] nvme nvme1: pci function 0000:40:00.0
[   37.964945] nvme nvme1: failed to mark controller live
[   37.964947] nvme nvme1: Removing after probe failure status: 0

One theory I have for the above is that my udev RUN rule was invoked while the nvme driver’s probe() was still running on this controller, and perhaps the unbind request came in before the probe() completed, hence the “nvme1: failed to mark controller live”.   This has left me wondering whether, instead of triggering on (Event 2) when the bind occurs, I should try to trigger on the “last” udev event, an “add”, where the NVMe namespaces are instantiated.   Of course, I’d need to know ahead of time just how many namespaces exist on that controller if I were to do that, so I’d trigger on the last one.   I’m wondering if that may help to avoid what looks like a complaint during the middle of probe() of that particular controller.   Then again, maybe I can just safely ignore that and not worry about it at all?   Thoughts?

I discovered another issue during this experimentation that is somewhat tangential to this task, but I’ll write a separate email on that topic.

thanks for any feedback,
--
Lance Hartmann
lance.hartmann(a)oracle.com



