linux-nvme.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* nvme-fabrics: shutdown 1-minute deadlock
@ 2021-09-20 15:09 Martin Belanger
  2021-09-20 17:14 ` Martin Belanger
  0 siblings, 1 reply; 2+ messages in thread
From: Martin Belanger @ 2021-09-20 15:09 UTC (permalink / raw)
  To: linux-nvme

Hello NVMe community,

I ran into a 1-minute deadlock trying to disconnect from a remote discovery controller (over tcp) while the controller is unreachable. The remote controller could be unreachable because the network is down or the controller simply crashed unexpectedly. The cause is irrelevant. Suffice it to say that the kernel does not yet know that the controller is unreachable and we're trying to disconnect from the controller. The problem is that during a disconnect we try to write commands to the remote controller and the default timeout for a write operation is 1 minute (admin_timeout). 

For example, in nvme_shutdown_ctrl() we call reg_write32() (which really invokes nvmf_reg_write32) to set the NVME_CC_SHN_NORMAL bit, and this operation will block waiting for a response that will never come. Interestingly, in the same function we also call reg_read32() to read the CSTS register, but this time we specify a 5-sec timeout (i.e. ctrl->shutdown_timeout). In other words, there is an inconsistency between the write and the read timeouts in that one function. 

Similarly, nvme_disable_ctrl() calls reg_write32() (i.e. nvmf_reg_write32) to clear NVME_CC_ENABLE and NVME_CC_SHN_MASK, and once again this will block for 1 minute if the controller in unreachable. 

I would like to propose that the prototype for reg_write32() be changed to allow for the caller to specify a timeout as follows:

int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val, unsigned timeout);

This timeout will simply be passed to nvmf_reg_write32() and in turn to __nvme_submit_sync_cmd().

When invoking reg_write32(), one would set timeout to 0 (zero) to indicate that the default 1-minute timeout shall be used. Otherwise, a non-0 timeout would overwrite the default. It's only in functions nvme_shutdown_ctrl() and nvme_disable_ctrl() that we would specify a timeout shorter than 1 minute. For example, we could use ctrl->shutdown_timeout as the value for timeout.

I would like to hear your thoughts before I submit a patch. Maybe there's a better or easier way to work around this.

Regards,
Martin Belanger
Engineering Technologist, Dell Inc.
_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: nvme-fabrics: shutdown 1-minute deadlock
  2021-09-20 15:09 nvme-fabrics: shutdown 1-minute deadlock Martin Belanger
@ 2021-09-20 17:14 ` Martin Belanger
  0 siblings, 0 replies; 2+ messages in thread
From: Martin Belanger @ 2021-09-20 17:14 UTC (permalink / raw)
  To: linux-nvme

Hello NVMe community,

I ran into a 1-minute deadlock trying to disconnect from a remote discovery controller (over tcp) while the controller is unreachable. The remote controller could be unreachable because the network is down or the controller simply crashed unexpectedly. The cause is irrelevant. Suffice it to say that the kernel does not yet know that the controller is unreachable and we're trying to disconnect from the controller. The problem is that during a disconnect we try to write commands to the remote controller and the default timeout for a write operation is 1 minute (admin_timeout). 

For example, in nvme_shutdown_ctrl() we call reg_write32() (which really invokes nvmf_reg_write32) to set the NVME_CC_SHN_NORMAL bit, and this operation will block waiting for a response that will never come. Interestingly, in the same function we also call reg_read32() to read the CSTS register, but this time we specify a 5-sec timeout (i.e. ctrl->shutdown_timeout). In other words, there is an inconsistency between the write and the read timeouts in that one function. 

Similarly, nvme_disable_ctrl() calls reg_write32() (i.e. nvmf_reg_write32) to clear NVME_CC_ENABLE and NVME_CC_SHN_MASK, and once again this will block for 1 minute if the controller in unreachable. 

I would like to propose that the prototype for reg_write32() be changed to allow for the caller to specify a timeout as follows:

int (*reg_write32)(struct nvme_ctrl *ctrl, u32 off, u32 val, unsigned timeout);

This timeout will simply be passed to nvmf_reg_write32() and in turn to __nvme_submit_sync_cmd().

When invoking reg_write32(), one would set timeout to 0 (zero) to indicate that the default 1-minute timeout shall be used. Otherwise, a non-0 timeout would overwrite the default. It's only in functions nvme_shutdown_ctrl() and nvme_disable_ctrl() that we would specify a timeout shorter than 1 minute. For example, we could use ctrl->shutdown_timeout as the value for timeout.

I would like to hear your thoughts before I submit a patch. Maybe there's a better or easier way to work around this.

Regards,
Martin Belanger
Engineering Technologist, Dell Inc.


Here's another suggestion. Instead of changing the prototype for reg_write32(), in nvmf_reg_write32() we could simply check ctrl->state and if the state indicates that the ctrl is being deleted, then use ctrl->shutdown_timeout as the timeout. All the changes are limited to this one function as shown below.

int nvmf_reg_write32(struct nvme_ctrl *ctrl, u32 off, u32 val)
{
	struct nvme_command cmd = { };
	int ret;
	unsigned timeout = 0;

	/* Use shutdown timeout when the controller is being deleted */
	switch (ctrl->state) {
	case NVME_CTRL_DELETING:
	case NVME_CTRL_DELETING_NOIO:
	case NVME_CTRL_DEAD:
		timeout = jiffies + (ctrl->shutdown_timeout * HZ);
		break;
	default:
		timeout = 0;
		break;
	}

	cmd.prop_set.opcode = nvme_fabrics_command;
	cmd.prop_set.fctype = nvme_fabrics_type_property_set;
	cmd.prop_set.attrib = 0;
	cmd.prop_set.offset = cpu_to_le32(off);
	cmd.prop_set.value = cpu_to_le64(val);

	ret = __nvme_submit_sync_cmd(ctrl->fabrics_q, &cmd, NULL, NULL, 0,
				     timeout, NVME_QID_ANY, 0, 0);
	if (unlikely(ret))
		dev_err(ctrl->device,
			"Property Set error: %d, offset %#x\n",
			ret > 0 ? ret & ~NVME_SC_DNR : ret, off);
	return ret;
}

Again, please let me know if this solution is acceptable and I will submit a patch.

Regards,
Martin Belanger
Engineering Technologist, Dell Inc.

_______________________________________________
Linux-nvme mailing list
Linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2021-09-20 17:14 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-20 15:09 nvme-fabrics: shutdown 1-minute deadlock Martin Belanger
2021-09-20 17:14 ` Martin Belanger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).