All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/4] traffic based keep alive support (TP 4024)
@ 2018-09-28  1:15 Sagi Grimberg
  2018-09-28  1:15 ` [PATCH 1/4] nvme: introduce ctrl attributes enumeration Sagi Grimberg
                   ` (4 more replies)
  0 siblings, 5 replies; 10+ messages in thread
From: Sagi Grimberg @ 2018-09-28  1:15 UTC (permalink / raw)


Traffic Based Keep Alive TP (4024) was ratified recently. The technical
proposal introduces a mode where keep alive commands may be omitted
based on the presence of admin or io traffic during the keep alive timeout
period. The rational is that normal command execution is sufficient
indication of host, controller and transport health so the keep alive command
is redundant.

Moreover, There is no way to guarantee in the various transports that a
keep alive command will not be starved by heavy traffic, causing the
keep alive timeout to expire. There were a number bug reports on
linux nvme community on this exact issue.

The implementation simply a controller-wide indicator that is cleared
upon reset of the keep alive timer and set when a command (target) or
completion (host) is processed. When the keep alive timer expires, we
check if any command/completion were processed, we restart the keep
alive timer even if a keep alive command was not executed.

Sagi Grimberg (4):
  nvme: introduce ctrl attributes enumeration
  nvmet: support for traffic based keep-alive
  nvme-core: cache controller attributes
  nvme-core: support traffic based keep-alive based on controller
    support

 drivers/nvme/host/core.c        | 12 ++++++++++++
 drivers/nvme/host/nvme.h        |  2 ++
 drivers/nvme/target/admin-cmd.c |  3 ++-
 drivers/nvme/target/core.c      | 12 ++++++++++++
 drivers/nvme/target/nvmet.h     |  2 ++
 include/linux/nvme.h            |  5 +++++
 6 files changed, 35 insertions(+), 1 deletion(-)

-- 
2.17.1

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 1/4] nvme: introduce ctrl attributes enumeration
  2018-09-28  1:15 [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg
@ 2018-09-28  1:15 ` Sagi Grimberg
  2018-09-28  1:15 ` [PATCH 2/4] nvmet: support for traffic based keep-alive Sagi Grimberg
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2018-09-28  1:15 UTC (permalink / raw)


We are growing more controller attributes, so use
a proper enumeration for it. For now just add the
128-bit hostid which we support.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/target/admin-cmd.c | 2 +-
 include/linux/nvme.h            | 4 ++++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index eb240c134cb9..86be3d0ae847 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -300,7 +300,7 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 	/* XXX: figure out what to do about RTD3R/RTD3 */
 	id->oaes = cpu_to_le32(NVMET_AEN_CFG_OPTIONAL);
-	id->ctratt = cpu_to_le32(1 << 0);
+	id->ctratt = cpu_to_le32(NVME_CTRL_ATTR_HID_128_BIT);
 
 	id->oacs = 0;
 
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 68e91ef5494c..0692683c103a 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -198,6 +198,10 @@ enum {
 	NVME_PS_FLAGS_NON_OP_STATE	= 1 << 1,
 };
 
+enum nvme_ctrl_attr {
+	NVME_CTRL_ATTR_HID_128_BIT	= (1 << 0),
+};
+
 struct nvme_id_ctrl {
 	__le16			vid;
 	__le16			ssvid;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 2/4] nvmet: support for traffic based keep-alive
  2018-09-28  1:15 [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg
  2018-09-28  1:15 ` [PATCH 1/4] nvme: introduce ctrl attributes enumeration Sagi Grimberg
@ 2018-09-28  1:15 ` Sagi Grimberg
  2018-09-28  1:15 ` [PATCH 3/4] nvme-core: cache controller attributes Sagi Grimberg
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2018-09-28  1:15 UTC (permalink / raw)


A controller that supports traffic based keep-alive can restart
its keep alive timer even if keep-alive was not issued in the
kato period but other admin or io commands was. For each command
set ctrl->cmd_seen to true, and when keep-alive timer expires,
if any commands were seen, resched ka_work instead of escalating
to a fatal error.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/target/admin-cmd.c |  3 ++-
 drivers/nvme/target/core.c      | 12 ++++++++++++
 drivers/nvme/target/nvmet.h     |  2 ++
 include/linux/nvme.h            |  1 +
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/drivers/nvme/target/admin-cmd.c b/drivers/nvme/target/admin-cmd.c
index 86be3d0ae847..5eb972a7c092 100644
--- a/drivers/nvme/target/admin-cmd.c
+++ b/drivers/nvme/target/admin-cmd.c
@@ -300,7 +300,8 @@ static void nvmet_execute_identify_ctrl(struct nvmet_req *req)
 
 	/* XXX: figure out what to do about RTD3R/RTD3 */
 	id->oaes = cpu_to_le32(NVMET_AEN_CFG_OPTIONAL);
-	id->ctratt = cpu_to_le32(NVME_CTRL_ATTR_HID_128_BIT);
+	id->ctratt = cpu_to_le32(NVME_CTRL_ATTR_HID_128_BIT |
+		NVME_CTRL_ATTR_TBKAS);
 
 	id->oacs = 0;
 
diff --git a/drivers/nvme/target/core.c b/drivers/nvme/target/core.c
index b5ec96abd048..7084704b468d 100644
--- a/drivers/nvme/target/core.c
+++ b/drivers/nvme/target/core.c
@@ -298,6 +298,15 @@ static void nvmet_keep_alive_timer(struct work_struct *work)
 {
 	struct nvmet_ctrl *ctrl = container_of(to_delayed_work(work),
 			struct nvmet_ctrl, ka_work);
+	bool cmd_seen = ctrl->cmd_seen;
+
+	ctrl->cmd_seen = false;
+	if (cmd_seen) {
+		pr_debug("ctrl %d traffic-based keep-alive timer expired, reschedule\n",
+			ctrl->cntlid);
+		schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+		return;
+	}
 
 	pr_err("ctrl %d keep-alive timer (%d seconds) expired!\n",
 		ctrl->cntlid, ctrl->kato);
@@ -700,6 +709,9 @@ bool nvmet_req_init(struct nvmet_req *req, struct nvmet_cq *cq,
 		goto fail;
 	}
 
+	if (sq->ctrl)
+		sq->ctrl->cmd_seen = true;
+
 	return true;
 
 fail:
diff --git a/drivers/nvme/target/nvmet.h b/drivers/nvme/target/nvmet.h
index ec9af4ee03b6..8f965e025e29 100644
--- a/drivers/nvme/target/nvmet.h
+++ b/drivers/nvme/target/nvmet.h
@@ -154,6 +154,8 @@ struct nvmet_ctrl {
 	struct nvmet_cq		**cqs;
 	struct nvmet_sq		**sqs;
 
+	bool			cmd_seen;
+
 	struct mutex		lock;
 	u64			cap;
 	u32			cc;
diff --git a/include/linux/nvme.h b/include/linux/nvme.h
index 0692683c103a..6ea47158c41b 100644
--- a/include/linux/nvme.h
+++ b/include/linux/nvme.h
@@ -200,6 +200,7 @@ enum {
 
 enum nvme_ctrl_attr {
 	NVME_CTRL_ATTR_HID_128_BIT	= (1 << 0),
+	NVME_CTRL_ATTR_TBKAS		= (1 << 6),
 };
 
 struct nvme_id_ctrl {
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 3/4] nvme-core: cache controller attributes
  2018-09-28  1:15 [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg
  2018-09-28  1:15 ` [PATCH 1/4] nvme: introduce ctrl attributes enumeration Sagi Grimberg
  2018-09-28  1:15 ` [PATCH 2/4] nvmet: support for traffic based keep-alive Sagi Grimberg
@ 2018-09-28  1:15 ` Sagi Grimberg
  2018-09-28  1:15 ` [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support Sagi Grimberg
  2018-10-16  1:07 ` [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg
  4 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2018-09-28  1:15 UTC (permalink / raw)


We get the controller attributes in identify, cache
them as we'll need them for traffic based keep alive
support.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 1 +
 drivers/nvme/host/nvme.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index ef6748f4ff47..6bb6214c0634 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2409,6 +2409,7 @@ int nvme_init_identify(struct nvme_ctrl *ctrl)
 	ctrl->sgls = le32_to_cpu(id->sgls);
 	ctrl->kas = le16_to_cpu(id->kas);
 	ctrl->max_namespaces = le32_to_cpu(id->mnan);
+	ctrl->ctratt = le32_to_cpu(id->ctratt);
 
 	if (id->rtd3e) {
 		/* us -> s */
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index bb4a2003c097..ee1c4fc0249c 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -193,6 +193,7 @@ struct nvme_ctrl {
 	u8 apsta;
 	u32 oaes;
 	u32 aen_result;
+	u32 ctratt;
 	unsigned int shutdown_timeout;
 	unsigned int kato;
 	bool subsystem;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support
  2018-09-28  1:15 [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg
                   ` (2 preceding siblings ...)
  2018-09-28  1:15 ` [PATCH 3/4] nvme-core: cache controller attributes Sagi Grimberg
@ 2018-09-28  1:15 ` Sagi Grimberg
  2018-10-17  7:05   ` Christoph Hellwig
  2018-10-16  1:07 ` [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg
  4 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2018-09-28  1:15 UTC (permalink / raw)


If the controller supports traffic based keep alive, we restart
the keep alive timer if any admin or io commands was completed
during the kato period. This prevents a possible starvation of
keep alive commands in the presence of heavy traffic as in such
case, we already have a health indication from the host perspective.

Signed-off-by: Sagi Grimberg <sagi at grimberg.me>
---
 drivers/nvme/host/core.c | 11 +++++++++++
 drivers/nvme/host/nvme.h |  1 +
 2 files changed, 12 insertions(+)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 6bb6214c0634..54a24987a7d4 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -251,6 +251,7 @@ void nvme_complete_rq(struct request *req)
 
 	trace_nvme_complete_rq(req);
 
+	nvme_req(req)->ctrl->comp_seen = true;
 	if (unlikely(status != BLK_STS_OK && nvme_req_needs_retry(req))) {
 		if ((req->cmd_flags & REQ_NVME_MPATH) &&
 		    blk_path_error(status)) {
@@ -841,6 +842,7 @@ static void nvme_keep_alive_end_io(struct request *rq, blk_status_t status)
 		return;
 	}
 
+	ctrl->comp_seen = false;
 	schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
 }
 
@@ -865,6 +867,15 @@ static void nvme_keep_alive_work(struct work_struct *work)
 {
 	struct nvme_ctrl *ctrl = container_of(to_delayed_work(work),
 			struct nvme_ctrl, ka_work);
+	bool comp_seen = ctrl->comp_seen;
+
+	if ((ctrl->ctratt & NVME_CTRL_ATTR_TBKAS) && comp_seen) {
+		dev_dbg(ctrl->device,
+			"traffic based keep-alive timer expired, reschedule\n");
+		ctrl->comp_seen = false;
+		schedule_delayed_work(&ctrl->ka_work, ctrl->kato * HZ);
+		return;
+	}
 
 	if (nvme_keep_alive(ctrl)) {
 		/* allocation failure, reset the controller */
diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
index ee1c4fc0249c..e9aaae257daa 100644
--- a/drivers/nvme/host/nvme.h
+++ b/drivers/nvme/host/nvme.h
@@ -145,6 +145,7 @@ enum nvme_ctrl_state {
 };
 
 struct nvme_ctrl {
+	bool comp_seen;
 	enum nvme_ctrl_state state;
 	bool identified;
 	spinlock_t lock;
-- 
2.17.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [PATCH 0/4] traffic based keep alive support (TP 4024)
  2018-09-28  1:15 [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg
                   ` (3 preceding siblings ...)
  2018-09-28  1:15 ` [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support Sagi Grimberg
@ 2018-10-16  1:07 ` Sagi Grimberg
  4 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2018-10-16  1:07 UTC (permalink / raw)


Ping?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support
  2018-09-28  1:15 ` [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support Sagi Grimberg
@ 2018-10-17  7:05   ` Christoph Hellwig
  2018-10-19  0:38     ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2018-10-17  7:05 UTC (permalink / raw)


> +	nvme_req(req)->ctrl->comp_seen = true;

shouldn't we only do this for controllers with keep a live support?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support
  2018-10-17  7:05   ` Christoph Hellwig
@ 2018-10-19  0:38     ` Sagi Grimberg
  2018-10-19  6:00       ` Christoph Hellwig
  0 siblings, 1 reply; 10+ messages in thread
From: Sagi Grimberg @ 2018-10-19  0:38 UTC (permalink / raw)



>> +	nvme_req(req)->ctrl->comp_seen = true;
> 
> shouldn't we only do this for controllers with keep a live support?

What is the harm setting it anyways? Would it be better to condition it
on keep alive support instead?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support
  2018-10-19  0:38     ` Sagi Grimberg
@ 2018-10-19  6:00       ` Christoph Hellwig
  2018-10-19  6:30         ` Sagi Grimberg
  0 siblings, 1 reply; 10+ messages in thread
From: Christoph Hellwig @ 2018-10-19  6:00 UTC (permalink / raw)


On Thu, Oct 18, 2018@05:38:29PM -0700, Sagi Grimberg wrote:
>
>>> +	nvme_req(req)->ctrl->comp_seen = true;
>>
>> shouldn't we only do this for controllers with keep a live support?
>
> What is the harm setting it anyways? Would it be better to condition it
> on keep alive support instead?

This writes to a cache line in the controller on every I/O completion.
And while it isn't an atomic it isn't exactly going to help scalability,
so I'd rather have a branch before it, which will be predicted not taken
for controllers that don't have the feature (aka at least every PCIe
controller on the planet)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support
  2018-10-19  6:00       ` Christoph Hellwig
@ 2018-10-19  6:30         ` Sagi Grimberg
  0 siblings, 0 replies; 10+ messages in thread
From: Sagi Grimberg @ 2018-10-19  6:30 UTC (permalink / raw)



>>>> +	nvme_req(req)->ctrl->comp_seen = true;
>>>
>>> shouldn't we only do this for controllers with keep a live support?
>>
>> What is the harm setting it anyways? Would it be better to condition it
>> on keep alive support instead?
> 
> This writes to a cache line in the controller on every I/O completion.
> And while it isn't an atomic it isn't exactly going to help scalability,

Yea, this was why I placed it in the front of nvme_ctrl with hope that
that nvme_ctrl located right behind variables accessed in the hot path..

> so I'd rather have a branch before it, which will be predicted not taken
> for controllers that don't have the feature (aka at least every PCIe
> controller on the planet)

That would induce a read access a variable (kas) deeper in nvme_ctrl
which is "colder"... I don't mind doing it, just wanted to raise the
trade-off. We can also keep a cached copy of kas in the drivers
representations if it becomes a problem...

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2018-10-19  6:30 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-09-28  1:15 [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg
2018-09-28  1:15 ` [PATCH 1/4] nvme: introduce ctrl attributes enumeration Sagi Grimberg
2018-09-28  1:15 ` [PATCH 2/4] nvmet: support for traffic based keep-alive Sagi Grimberg
2018-09-28  1:15 ` [PATCH 3/4] nvme-core: cache controller attributes Sagi Grimberg
2018-09-28  1:15 ` [PATCH 4/4] nvme-core: support traffic based keep-alive based on controller support Sagi Grimberg
2018-10-17  7:05   ` Christoph Hellwig
2018-10-19  0:38     ` Sagi Grimberg
2018-10-19  6:00       ` Christoph Hellwig
2018-10-19  6:30         ` Sagi Grimberg
2018-10-16  1:07 ` [PATCH 0/4] traffic based keep alive support (TP 4024) Sagi Grimberg

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.