* please revert a nvme-cli commit
@ 2018-06-13  7:55 Christoph Hellwig
  2018-06-13 13:31 ` Keith Busch
  2018-06-13 15:51 ` James Smart
  0 siblings, 2 replies; 6+ messages in thread
From: Christoph Hellwig @ 2018-06-13  7:55 UTC (permalink / raw)


d8f949c21e9d832149518f26c71dfd8de16cd628 does bogus blind retries
on failure, and the kernel part of it was rejected.  Let's sort this
out properly.

Similarly at least for now bb2d87d7f386882005bb87fe9412af23c646c876 is
bogus as well.  We require fabrics connect to be synchronous, and changing
that will require a longer discussion first.


* please revert a nvme-cli commit
  2018-06-13  7:55 please revert a nvme-cli commit Christoph Hellwig
@ 2018-06-13 13:31 ` Keith Busch
  2018-06-13 15:51 ` James Smart
  1 sibling, 0 replies; 6+ messages in thread
From: Keith Busch @ 2018-06-13 13:31 UTC (permalink / raw)


On Wed, Jun 13, 2018 at 09:55:57AM +0200, Christoph Hellwig wrote:
> d8f949c21e9d832149518f26c71dfd8de16cd628 does bogus blind retries
> on failure, and the kernel part of it was rejected.  Let's sort this
> out properly.
> 
> Similarly at least for now bb2d87d7f386882005bb87fe9412af23c646c876 is
> bogus as well.  We require fabrics connect to be synchronous, and changing
> that will require a longer discussion first.

Okay, will do.


* please revert a nvme-cli commit
  2018-06-13  7:55 please revert a nvme-cli commit Christoph Hellwig
  2018-06-13 13:31 ` Keith Busch
@ 2018-06-13 15:51 ` James Smart
  2018-06-14 12:42   ` Christoph Hellwig
  1 sibling, 1 reply; 6+ messages in thread
From: James Smart @ 2018-06-13 15:51 UTC (permalink / raw)


Really?  Sigh.  I have lots of consumers that have no issues with these 
changes and there is nothing that acts "incompatible".  It's been 1.5 
months - where have you been?

These conditions can occur independent of any change in kernel 
implementation and are significant robustness corrections.

On 6/13/2018 12:55 AM, Christoph Hellwig wrote:
> d8f949c21e9d832149518f26c71dfd8de16cd628 does bogus blind retries
> on failure, and the kernel part of it was rejected.  Let's sort this
> out properly.

title: nvme-cli: Add ioctl retry support for "connect-all"

This should have been sorted out a long time ago.  The current 
implementation is completely "happy path".  Failure can occur any time 
an error on a discovery controller causes a reconnect.  This change is 
isolated to retrieving discovery log records and is specifically tied 
to an EAGAIN status - not a "blind failure".
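
For reference, the shape of the change is roughly the following (a
sketch only, assuming the kernel surfaces the condition as errno EAGAIN
from the ioctl - the retry count, the backoff and the single-page log
read are illustrative, not the exact code from the commit):

/*
 * Re-issue the discovery Get Log Page when the ioctl fails with EAGAIN
 * (discovery controller reconnecting), with a bounded number of
 * attempts.
 */
#include <errno.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

#define DISC_LOG_RETRIES	5

static int get_discovery_log(int fd, void *buf, __u32 len)
{
	struct nvme_admin_cmd cmd = {
		.opcode   = 0x02,			/* Get Log Page */
		.nsid     = 0xffffffff,
		.addr     = (__u64)(uintptr_t)buf,
		.data_len = len,
		/* cdw10: LID 0x70 (discovery), NUMDL in bits 31:16 */
		.cdw10    = 0x70 | (((len / 4) - 1) << 16),
	};
	int i, ret;

	for (i = 0; i < DISC_LOG_RETRIES; i++) {
		ret = ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
		if (ret >= 0 || errno != EAGAIN)
			break;
		sleep(1);	/* controller reconnecting, back off */
	}
	return ret;
}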

>
> Similarly at least for now bb2d87d7f386882005bb87fe9412af23c646c876 is
> bogus as well.  We require fabrics connect to be synchronous, and changing
> that will require a longer discussion first.

title: nvme-cli: Wait for device file if not present after successful 
add_ctrl

This, in general, has little to do with the kernel implementation, which 
only influences its likelihood.  There's always the possibility that the 
creation of the /dev node in userspace takes time and the return from 
the create thread could beat it.  It only becomes less likely with the 
full finish-connect before return, because of the added time.

Thus there's no "synchronous" requirement - what does that even mean? 
What is the difference between a controller being created then 
immediately failing and going into a reconnect that delays?

-- james


* please revert a nvme-cli commit
  2018-06-13 15:51 ` James Smart
@ 2018-06-14 12:42   ` Christoph Hellwig
  2018-06-14 16:29     ` James Smart
  0 siblings, 1 reply; 6+ messages in thread
From: Christoph Hellwig @ 2018-06-14 12:42 UTC (permalink / raw)


On Wed, Jun 13, 2018 at 08:51:38AM -0700, James Smart wrote:
> Really?  Sigh.  I have lots of consumers that have no issues with these 
> changes and there is nothing that acts "incompatible".  It's been 1.5 
> months - where have you been?
>
> These conditions can occur independent of any change in kernel 
> implementation and are significant robustness corrections.

I don't agree.  For one, so far we've guaranteed that the device node
appears before we return from the write to the /dev/nvme-fabrics
device.  If that changes it is a significant ABI break, and we need
to fix that in the kernel.

And even if it wasn't, the fix is still wrong - we'd need to wait for it
to appear using libudev APIs and/or one of the file notification
syscalls rather than adding a probing loop in a different place than
the actual open.
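
Something along these lines for the file notification variant (a sketch
only - controller naming, the timeout handling and the error handling
are illustrative):

/*
 * Watch /dev with inotify and wait, with a timeout, for the expected
 * node to show up, instead of probing for it in a loop.
 */
#include <limits.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/inotify.h>
#include <unistd.h>

static int wait_for_dev_node(const char *name /* e.g. "nvme2" */,
			     int timeout_ms)
{
	char buf[sizeof(struct inotify_event) + NAME_MAX + 1]
		__attribute__((aligned(__alignof__(struct inotify_event))));
	char path[PATH_MAX];
	struct pollfd pfd;
	int fd, ret = -1;

	fd = inotify_init1(IN_CLOEXEC);
	if (fd < 0)
		return -1;
	if (inotify_add_watch(fd, "/dev", IN_CREATE) < 0)
		goto out;

	/*
	 * Re-check after arming the watch so we don't miss a node that
	 * was created before the watch existed.
	 */
	snprintf(path, sizeof(path), "/dev/%s", name);
	if (!access(path, F_OK)) {
		ret = 0;
		goto out;
	}

	pfd.fd = fd;
	pfd.events = POLLIN;
	while (ret && poll(&pfd, 1, timeout_ms) > 0) {
		ssize_t len = read(fd, buf, sizeof(buf));
		char *p;

		for (p = buf; p < buf + len;
		     p += sizeof(struct inotify_event) +
			  ((struct inotify_event *)p)->len) {
			struct inotify_event *ev = (struct inotify_event *)p;

			if (ev->len && !strcmp(ev->name, name)) {
				ret = 0;
				break;
			}
		}
	}
out:
	close(fd);
	return ret;
}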

And for retrying the actual I/O we need to decide on the exact semantics
we want to support first.  A blind retry-n-times loop is always a bad
idea; we need to build some sort of reliable infrastructure.  Be that
optionally marking requests as not failfast, and/or some sort of poll
notification for a device that is ready.


* please revert a nvme-cli commit
  2018-06-14 12:42   ` Christoph Hellwig
@ 2018-06-14 16:29     ` James Smart
  2018-06-15  9:38       ` Christoph Hellwig
  0 siblings, 1 reply; 6+ messages in thread
From: James Smart @ 2018-06-14 16:29 UTC (permalink / raw)




On 6/14/2018 5:42 AM, Christoph Hellwig wrote:
> On Wed, Jun 13, 2018 at 08:51:38AM -0700, James Smart wrote:
>> Really?  Sigh.  I have lots of consumers that have no issues with these
>> changes and there is nothing that acts "incompatible".  It's been 1.5
>> months - where have you been?
>>
>> These conditions can occur independent of any change in kernel
>> implementation and are significant robustness corrections.
> I don't agree.  For one, so far we've guaranteed that the device node
> appears before we return from the write to the /dev/nvme-fabrics
> device.  If that changes it is a significant ABI break, and we need
> to fix that in the kernel.

What provides the guarantee that it appears first?

Based on what I've seen there is nothing but time that guarantees it, 
and if that's true, the udev side in userspace can always get hung up 
and exceed that time.


>
> And even if it wasn't, the fix is still wrong - we'd need to wait for it
> to appear using libudev APIs and/or one of the file notification
> syscalls rather than adding a probing loop in a different place than
> the actual open.

that's fine, I can certainly go look in that direction.

>
> And for retrying the actual I/O we need to decide on the exact semantics
> we want to support first.  A blind retry-n-times loop is always a bad
> idea; we need to build some sort of reliable infrastructure.  Be that
> optionally marking requests as not failfast, and/or some sort of poll
> notification for a device that is ready.

Ok.  I've already had complaints that the cli shouldn't hang around for 
a minute or more - the time it takes for a controller to lose 
connectivity and hit the loss timeout, and that's not even counting the 
commands that probe several controllers, each of which could do this 
sequentially, resulting in a delay of many minutes.  Since the user may 
also want to kill the waiting cli, I'm of the opinion that the kernel 
request should be terminated, not suspended.  Thus the I/Os should 
continue to be failfast, and the policy should live in the cli based on 
what the command is.  Making the cli use the poll notification to find 
out when the controller is ready again is fine.

What should be the general policy for any command?  If we receive an 
EAGAIN (not ready), do we poll until ready or device failure?  Or do we 
return failure immediately unless the command is connect-all (discovery 
log read), which waits for ready/failure?

What should be the policy for the commands that probe multiple 
controllers?  Do commands fail immediately, causing the command to skip 
controllers that aren't ready (with a message when it does so)?  Note: 
the skip is already there on command failure, but if we suspend as a 
general policy, it would affect it.

-- james


* please revert a nvme-cli commit
  2018-06-14 16:29     ` James Smart
@ 2018-06-15  9:38       ` Christoph Hellwig
  0 siblings, 0 replies; 6+ messages in thread
From: Christoph Hellwig @ 2018-06-15  9:38 UTC (permalink / raw)


On Thu, Jun 14, 2018 at 09:29:24AM -0700, James Smart wrote:
>> I don't agree.  For one, so far we've guaranteed that the device node
>> appears before we return from the write to the /dev/nvme-fabrics
>> device.  If that changes it is a significant ABI break, and we need
>> to fix that in the kernel.
>
> What provides the guarantee that it appears first?
>
> Based on what I've seen there is nothing but time that guarantees it, 
> and if that's true, the udev side in userspace can always get hung up 
> and exceed that time.

Then we need to use the proper APIs to wait for it to appear instead
of just trying a few times.  I suspect those are going to be either
libudev or libsysfs APIs, but I'll need to look into the details.
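
For the libudev option, something like this (a sketch only - the "nvme"
subsystem match and the controller name handling are assumptions, and a
real version would also need an enumerate pass to catch a controller
that showed up before the monitor was armed; link with -ludev):

#include <libudev.h>
#include <poll.h>
#include <string.h>

/*
 * Subscribe to "nvme" uevents and wait for the add event of the
 * controller we just created, instead of probing for the /dev node.
 */
static int wait_for_ctrl(const char *ctrl /* e.g. "nvme2" */,
			 int timeout_ms)
{
	struct udev *udev = udev_new();
	struct udev_monitor *mon = NULL;
	struct pollfd pfd;
	int ret = -1;

	if (!udev)
		return -1;
	mon = udev_monitor_new_from_netlink(udev, "udev");
	if (!mon)
		goto out;

	udev_monitor_filter_add_match_subsystem_devtype(mon, "nvme", NULL);
	udev_monitor_enable_receiving(mon);

	pfd.fd = udev_monitor_get_fd(mon);
	pfd.events = POLLIN;

	while (poll(&pfd, 1, timeout_ms) > 0) {
		struct udev_device *dev = udev_monitor_receive_device(mon);

		if (!dev)
			continue;
		if (!strcmp(udev_device_get_action(dev), "add") &&
		    !strcmp(udev_device_get_sysname(dev), ctrl))
			ret = 0;
		udev_device_unref(dev);
		if (!ret)
			break;
	}
out:
	udev_monitor_unref(mon);
	udev_unref(udev);
	return ret;
}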

>> And for retrying the actual I/O we need to decide on the exact semantics
>> we want to support first.  A blind retry-n-times loop is always a bad
>> idea; we need to build some sort of reliable infrastructure.  Be that
>> optionally marking requests as not failfast, and/or some sort of poll
>> notification for a device that is ready.
>
> Ok.  I've already had complaints that the cli shouldn't hang around for 
> a minute or more - the time it takes for a controller to lose 
> connectivity and hit the loss timeout, and that's not even counting the 
> commands that probe several controllers, each of which could do this 
> sequentially, resulting in a delay of many minutes.  Since the user may 
> also want to kill the waiting cli, I'm of the opinion that the kernel 
> request should be terminated, not suspended.  Thus the I/Os should 
> continue to be failfast, and the policy should live in the cli based on 
> what the command is.  Making the cli use the poll notification to find 
> out when the controller is ready again is fine.

The passthrough ioctls can set a per-command timeout, so we should set
that to the time we want to wait for.  And yes, we should do a killable
wait.  Right now the block layer doesn't offer an API for that, but
we really should switch all of NVMe over to that.  I can take it as
an action item to add that support.
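
For reference, struct nvme_admin_cmd in linux/nvme_ioctl.h already
carries a timeout_ms field for exactly this; a sketch with illustrative
values (the 10 second figure and the Identify Controller example are
placeholders):

#include <stdint.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

/* Identify Controller with a 10s per-command timeout. */
static int identify_ctrl_bounded(int fd, void *buf /* 4096 bytes */)
{
	struct nvme_admin_cmd cmd = {
		.opcode     = 0x06,		/* Identify */
		.addr       = (__u64)(uintptr_t)buf,
		.data_len   = 4096,
		.cdw10      = 1,		/* CNS 01h: Identify Controller */
		.timeout_ms = 10 * 1000,	/* bound the wait for this command */
	};

	return ioctl(fd, NVME_IOCTL_ADMIN_CMD, &cmd);
}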

> What should be the general policy for any command?  If we receive an 
> EAGAIN (not ready), do we poll until ready or device failure?  Or do we 
> return failure immediately unless the command is connect-all (discovery 
> log read), which waits for ready/failure?

We should only ever return EAGAIN if opened with O_NONBLOCK for a start.
And then, preferably, we should allow polling for the controller to be
ready.
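
On the cli side that would look roughly like this (purely a sketch of
the proposed semantics - neither the O_NONBLOCK behaviour nor the poll
support exists today, and POLLOUT as the readiness indication is an
assumption):

#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/nvme_ioctl.h>

/*
 * Proposed semantics only: with O_NONBLOCK the ioctl returns EAGAIN
 * while the controller is not ready, and the fd becomes pollable so we
 * can wait for readiness instead of retrying blindly.
 */
static int admin_cmd_nonblock(const char *path, struct nvme_admin_cmd *cmd)
{
	int fd = open(path, O_RDWR | O_NONBLOCK);
	int ret;

	if (fd < 0)
		return -1;

	for (;;) {
		ret = ioctl(fd, NVME_IOCTL_ADMIN_CMD, cmd);
		if (ret >= 0 || errno != EAGAIN)
			break;

		/* hypothetical readiness notification */
		struct pollfd pfd = { .fd = fd, .events = POLLOUT };

		if (poll(&pfd, 1, -1) < 0)
			break;
	}

	close(fd);
	return ret;
}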

> What should be the policy for the commands that probe multiple 
> controllers?  Do commands fail immediately, causing the command to skip 
> controllers that aren't ready (with a message when it does so)?  Note: 
> the skip is already there on command failure, but if we suspend as a 
> general policy, it would affect it.

I think we should have the ioctl block until the supplied timeout
expires (or it is killed) and just skip the controllers that time out.
But I'd be happy to listen to other ideas if they sound more sensible.

