From: Waiman Long <longman@redhat.com>
To: Tejun Heo <tj@kernel.org>, Jens Axboe <axboe@kernel.dk>
Cc: cgroups@vger.kernel.org, linux-block@vger.kernel.org, linux-kernel@vger.kernel.org, Ming Lei <ming.lei@redhat.com>, Waiman Long <longman@redhat.com>
Subject: [PATCH v5 4/4] blk-cgroup: Document the design of new lockless iostat_cpu list
Date: Thu, 2 Jun 2022 14:54:01 -0400
Message-ID: <20220602185401.162937-1-longman@redhat.com>
In-Reply-To: <20220602133543.128088-2-longman@redhat.com>

A set of percpu lockless lists per block cgroup (blkcg) is added to track
the set of recently updated iostat_cpu structures. Add comment in the code
to document the design of this new set of lockless lists.

Signed-off-by: Waiman Long <longman@redhat.com>
---
 block/blk-cgroup.c | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/block/blk-cgroup.c b/block/blk-cgroup.c
index 8af97f3b2fc9..f8f27551c16a 100644
--- a/block/blk-cgroup.c
+++ b/block/blk-cgroup.c
@@ -60,6 +60,21 @@ static struct workqueue_struct *blkcg_punt_bio_wq;
 #define BLKG_DESTROY_BATCH_SIZE 64
 
 /*
+ * Lockless lists for tracking IO stats update
+ *
+ * New IO stats are stored in the percpu iostat_cpu within blkcg_gq (blkg).
+ * There are multiple blkg's (one for each block device) attached to each
+ * blkcg. The rstat code keeps track of which cpu has IO stats updated,
+ * but it doesn't know which blkg has the updated stats. If there are many
+ * block devices in a system, the cost of iterating all the blkg's to flush
+ * out the IO stats can be high. To reduce such overhead, a set of percpu
+ * lockless lists (lhead) per blkcg are used to track the set of recently
+ * updated iostat_cpu's since the last flush. An iostat_cpu will be put
+ * onto the lockless list on the update side [blk_cgroup_bio_start()] if
+ * not there yet and then removed when being flushed [blkcg_rstat_flush()].
+ * References to blkg are gotten and then put back in the process to
+ * protect against blkg removal.
+ *
  * lnode.next of the last entry in a lockless list is NULL. To enable us to
  * use lnode.next as a boolean flag to indicate its presence in a lockless
  * list, we have to make it non-NULL for all. This is done by using a
-- 
2.31.1
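For readers who want to see the scheme in the comment above in action, here is a minimal userspace sketch using C11 atomics instead of the kernel's llist API. Every name in it (struct lnode, struct lhead, struct stat_cpu, list_end, track_update(), flush_updates()) is hypothetical and not taken from the patch. The quoted comment is cut off before it says how lnode.next is made non-NULL, so the static end-of-list marker used here is an assumption, not necessarily what the patch does. The blkg reference counting mentioned in the comment is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these names are kernel APIs. */
struct lnode {
	_Atomic(struct lnode *) next;	/* NULL means "not on any list" */
};

struct lhead {
	_Atomic(struct lnode *) first;
};

/* Assumed end-of-list marker: even the last entry has a non-NULL next. */
static struct lnode list_end;

struct stat_cpu {
	struct lnode lnode;
	atomic_ulong pending;	/* updates since the last flush */
	unsigned long total;	/* flushed total, touched only by the flusher */
};

static void lhead_init(struct lhead *h)
{
	atomic_init(&h->first, &list_end);
}

static void stat_cpu_init(struct stat_cpu *s)
{
	atomic_init(&s->lnode.next, (struct lnode *)NULL);
	atomic_init(&s->pending, 0);
	s->total = 0;
}

/* Update side: record n and queue the node unless it is already queued. */
static bool track_update(struct lhead *h, struct stat_cpu *s, unsigned long n)
{
	struct lnode *expected = NULL;
	struct lnode *first;

	atomic_fetch_add(&s->pending, n);

	/* A non-NULL next doubles as the "already on a list" flag. */
	if (!atomic_compare_exchange_strong(&s->lnode.next, &expected, &list_end))
		return false;

	/* Lock-free push onto the head of the list. */
	first = atomic_load(&h->first);
	do {
		atomic_store(&s->lnode.next, first);
	} while (!atomic_compare_exchange_weak(&h->first, &first, &s->lnode));
	return true;
}

/* Flush side: detach the whole list, then fold in each node's pending count. */
static unsigned int flush_updates(struct lhead *h)
{
	struct lnode *node = atomic_exchange(&h->first, &list_end);
	unsigned int flushed = 0;

	while (node != &list_end) {
		struct stat_cpu *s = (struct stat_cpu *)
			((char *)node - offsetof(struct stat_cpu, lnode));
		struct lnode *next = atomic_load(&node->next);

		/* Clear the flag first so a later update can queue the node again. */
		atomic_store(&node->next, (struct lnode *)NULL);
		s->total += atomic_exchange(&s->pending, 0);
		flushed++;
		node = next;
	}
	return flushed;
}
```

In this model a caller initialises one lhead and its stat_cpu entries, calls track_update() on each stats update, and calls flush_updates() periodically. The flusher then visits only the entries that changed since the previous flush, which is the saving the comment describes for systems with many block devices.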
Thread overview: 30+ messages
2022-06-01 21:18 [PATCH v3 0/2] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
2022-06-01 21:18 ` Waiman Long
2022-06-01 21:18 ` [PATCH v3 1/2] blk-cgroup: Correctly free percpu iostat_cpu in blkg on error exit Waiman Long
2022-06-01 21:18 ` [PATCH v3 2/2] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
2022-06-01 21:18 ` Waiman Long
2022-06-01 21:26 ` Tejun Heo
2022-06-01 21:30 ` Waiman Long
2022-06-02 6:32 ` kernel test robot
2022-06-02 1:54 ` [PATCH v4 " Waiman Long
2022-06-02 1:54 ` Waiman Long
2022-06-02 13:35 ` [PATCH v5 0/3] " Waiman Long
2022-06-02 13:35 ` [PATCH v5 1/3] blk-cgroup: Correctly free percpu iostat_cpu in blkg on error exit Waiman Long
2022-06-02 18:54 ` Waiman Long [this message]
2022-06-02 18:54 ` [PATCH v5 4/4] blk-cgroup: Document the design of new lockless iostat_cpu list Waiman Long
2022-06-02 19:05 ` Tejun Heo
2022-06-02 19:12 ` Waiman Long
2022-06-02 19:12 ` Waiman Long
2022-06-02 13:35 ` [PATCH v5 2/3] blk-cgroup: Return -ENOMEM directly in blkcg_css_alloc() error path Waiman Long
2022-06-02 13:35 ` Waiman Long
2022-06-02 16:16 ` Tejun Heo
2022-06-02 17:17 ` Waiman Long
2022-06-02 17:17 ` Waiman Long
2022-06-02 13:35 ` [PATCH v5 3/3] blk-cgroup: Optimize blkcg_rstat_flush() Waiman Long
2022-06-02 16:58 ` Tejun Heo
2022-06-02 17:26 ` Waiman Long
2022-06-02 17:26 ` Waiman Long
2022-06-02 17:46 ` Tejun Heo
2022-06-02 17:46 ` Tejun Heo
2022-06-02 18:18 ` Waiman Long
2022-06-02 18:18 ` Waiman Long