From: Sagi Grimberg <sagi@grimberg.me>
To: "Wunderlich, Mark" <mark.wunderlich@intel.com>,
"linux-nvme@lists.infradead.org" <linux-nvme@lists.infradead.org>
Subject: Re: [PATCH V3] nvmet-tcp: enable optional queue idle period tracking
Date: Tue, 23 Mar 2021 15:52:41 -0700 [thread overview]
Message-ID: <c4a1f41c-9e99-4ed0-db06-e26496be71df@grimberg.me> (raw)
In-Reply-To: <BYAPR11MB2869C572ECE86BF9CBF7F649E5649@BYAPR11MB2869.namprd11.prod.outlook.com>
Hey Mark,
> nvmet-tcp: enable optional queue idle period tracking
>
> Add 'idle_poll_period_usecs' option used by io_work() to support
> network devices enabled with advanced interrupt moderation
> supporting a relaxed interrupt model. It was discovered that
> such a NIC used on the target was unable to sustain initiator
> connection establishment, because the existing io_work() flow
> immediately exits after a loop with no activity and does not
> re-queue itself.
>
> With this new option a queue is assigned a period of time in
> which no activity must occur before it is considered 'idle'.
> Until the queue is idle the work item is requeued.
>
> The new module option is defined as changeable at runtime, making
> it flexible for testing purposes.
>
> The pre-existing legacy behavior is preserved when no module option
> for idle_poll_period_usecs is specified.
>
> Signed-off-by: Mark Wunderlich <mark.wunderlich@intel.com>
> ---
It's easier to read here with the format:
Changes from v2:
- ...
- ...
Changes from v1:
- ...
- ...
> V2 of this patch removes the accounting of time deducted from the
> idle deadline period during io_work activity. The result is a
> simpler solution, requiring only the selection of a sufficient
> optional time period that will catch any non-idle activity to
> keep a queue active.
>
> Testing was performed with a NIC using standard HW interrupt mode,
> with and without the new module option enabled. No measurable
> performance drop was seen whether the patch was applied with the new
> option specified or not. A side effect of using the new option with
> a standard NIC is a reduced context switch rate. We measured a drop
> from roughly 90K to less than 300 (for 32 active connections).
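For anyone wanting to reproduce that measurement, the system-wide
context-switch counter in /proc/stat is one low-overhead way to watch it
(the perf invocation shown is just one possible alternative):

```shell
# Cumulative context switches since boot; sample twice and diff
# while the target is loaded with the active connections:
grep ctxt /proc/stat

# Or watch the rate directly over a 10s window, if perf is available:
# perf stat -e context-switches -a sleep 10
```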
>
> For a NIC using a passive advanced interrupt moderation policy, it was
> then successfully able to achieve and maintain active connections with
> the target.
> ---
> V3 of this patch provides a bit more simplification, pulling the
> tracking code out of io_work and into two support functions. Now a
> single test is made after the processing loop in io_work to determine
> whether optional idle tracking is active. The base logic of idle
> tracking used as the condition to re-queue the worker remains the
> same.
> ---
> drivers/nvme/target/tcp.c | 44 +++++++++++++++++++++++++++++++++++++++++---
> 1 file changed, 41 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/target/tcp.c b/drivers/nvme/target/tcp.c
> index dc1f0f647189..0e86115af9f4 100644
> --- a/drivers/nvme/target/tcp.c
> +++ b/drivers/nvme/target/tcp.c
> @@ -29,6 +29,16 @@ static int so_priority;
> module_param(so_priority, int, 0644);
> MODULE_PARM_DESC(so_priority, "nvmet tcp socket optimize priority");
>
> +/* Define a time period (in usecs) that io_work() shall sample an activated
> + * queue before determining it to be idle. This optional module behavior
> + * can enable NIC solutions that support socket optimized packet processing
> + * using advanced interrupt moderation techniques.
> + */
> +static int idle_poll_period_usecs;
> +module_param(idle_poll_period_usecs, int, 0644);
> +MODULE_PARM_DESC(idle_poll_period_usecs,
> + "nvmet tcp io_work poll till idle time period in usecs");
> +
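As a side note, since the parameter is 0644 it can also be flipped at
runtime. For testing, usage would look something like this (the 5000
value is purely illustrative, assuming the module loads as nvmet_tcp):

```shell
# Enable a ~5 ms idle window at load time:
modprobe nvmet_tcp idle_poll_period_usecs=5000

# 0644 permissions make it adjustable on the fly:
echo 5000 > /sys/module/nvmet_tcp/parameters/idle_poll_period_usecs

# Setting it back to 0 restores the legacy immediate-exit behavior:
echo 0 > /sys/module/nvmet_tcp/parameters/idle_poll_period_usecs
```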
> #define NVMET_TCP_RECV_BUDGET 8
> #define NVMET_TCP_SEND_BUDGET 8
> #define NVMET_TCP_IO_WORK_BUDGET 64
> @@ -119,6 +129,9 @@ struct nvmet_tcp_queue {
> struct ahash_request *snd_hash;
> struct ahash_request *rcv_hash;
>
> + unsigned long poll_start;
> + unsigned long poll_end;
> +
> spinlock_t state_lock;
> enum nvmet_tcp_queue_state state;
>
> @@ -1198,11 +1211,34 @@ static void nvmet_tcp_schedule_release_queue(struct nvmet_tcp_queue *queue)
> spin_unlock(&queue->state_lock);
> }
>
> +static inline void nvmet_tcp_arm_queue_deadline(struct nvmet_tcp_queue *queue)
> +{
> + queue->poll_start = jiffies;
> + queue->poll_end = queue->poll_start +
> + usecs_to_jiffies(idle_poll_period_usecs);
> +}
> +
> +static bool nvmet_tcp_check_queue_deadline(struct nvmet_tcp_queue *queue,
> + int ops)
> +{
> + if (!idle_poll_period_usecs)
> + return false;
> +
> + if (ops || !queue->poll_start)
> + nvmet_tcp_arm_queue_deadline(queue);
> +
> + if (!time_in_range(jiffies, queue->poll_start, queue->poll_end)) {
> + queue->poll_start = queue->poll_end = 0;
> + return false;
> + }
> + return true;
Can this be simpler somehow, without clearing the poll limits and then
looking at poll_start to indicate that?
> +}
> +
> static void nvmet_tcp_io_work(struct work_struct *w)
> {
> struct nvmet_tcp_queue *queue =
> container_of(w, struct nvmet_tcp_queue, io_work);
> - bool pending;
> + bool pending, requeue;
> int ret, ops = 0;
>
> do {
> @@ -1223,9 +1259,11 @@ static void nvmet_tcp_io_work(struct work_struct *w)
> } while (pending && ops < NVMET_TCP_IO_WORK_BUDGET);
>
> /*
> - * We exahusted our budget, requeue our selves
> + * Requeue the worker if idle deadline period is in progress or any
> + * ops activity was recorded during the do-while loop above.
> */
> - if (pending)
> + requeue = nvmet_tcp_check_queue_deadline(queue, ops) || pending;
> + if (requeue)
I'm thinking that this requeue variable is redundant. You can do
instead:
if (nvmet_tcp_check_queue_deadline(queue, ops) || pending)
> queue_work_on(queue_cpu(queue), nvmet_tcp_wq, &queue->io_work);
> }
>
>
> _______________________________________________
> Linux-nvme mailing list
> Linux-nvme@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-nvme
>