* [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
@ 2022-10-24 6:48 Xiao Ni
2022-10-28 21:06 ` Song Liu
2022-11-14 18:14 ` Song Liu
From: Xiao Ni @ 2022-10-24 6:48 UTC (permalink / raw)
To: song; +Cc: guoqing.jiang, linux-raid, ffan
io_acct_set was added for raid0/raid5 io accounting, and an md_io_acct needs
to be allocated in the i/o path. These structures are freed when the bios
come back from the member disks, but right now there is no way to know
whether all of those bios have returned. In the takeover process, the raid0
memory resources, including the memory pool for md_io_acct, are freed. If
some bios are still outstanding at that point, they can cause a panic when
they return, because they reference a NULL pointer or an invalid address.

This patch adds io_acct_cnt, so that stopping a raid0 array can wait until
all such bios have come back.
Reported-by: Fine Fan <ffan@redhat.com>
Signed-off-by: Xiao Ni <xni@redhat.com>
---
V2: Move the struct mddev pointer to the start of struct md_io_acct
drivers/md/md.c | 13 ++++++++++++-
drivers/md/md.h | 11 ++++++++---
drivers/md/raid0.c | 6 ++++++
3 files changed, 26 insertions(+), 4 deletions(-)
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 6f3b2c1cb6cd..208f69849054 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -685,6 +685,7 @@ void mddev_init(struct mddev *mddev)
atomic_set(&mddev->flush_pending, 0);
init_waitqueue_head(&mddev->sb_wait);
init_waitqueue_head(&mddev->recovery_wait);
+ init_waitqueue_head(&mddev->wait_io_acct);
mddev->reshape_position = MaxSector;
mddev->reshape_backwards = 0;
mddev->last_sync_action = "none";
@@ -8618,15 +8619,18 @@ int acct_bioset_init(struct mddev *mddev)
{
int err = 0;
- if (!bioset_initialized(&mddev->io_acct_set))
+ if (!bioset_initialized(&mddev->io_acct_set)) {
+ atomic_set(&mddev->io_acct_cnt, 0);
err = bioset_init(&mddev->io_acct_set, BIO_POOL_SIZE,
offsetof(struct md_io_acct, bio_clone), 0);
+ }
return err;
}
EXPORT_SYMBOL_GPL(acct_bioset_init);
void acct_bioset_exit(struct mddev *mddev)
{
+ WARN_ON(atomic_read(&mddev->io_acct_cnt) != 0);
bioset_exit(&mddev->io_acct_set);
}
EXPORT_SYMBOL_GPL(acct_bioset_exit);
@@ -8635,12 +8639,17 @@ static void md_end_io_acct(struct bio *bio)
{
struct md_io_acct *md_io_acct = bio->bi_private;
struct bio *orig_bio = md_io_acct->orig_bio;
+ struct mddev *mddev = md_io_acct->mddev;
orig_bio->bi_status = bio->bi_status;
bio_end_io_acct(orig_bio, md_io_acct->start_time);
bio_put(bio);
bio_endio(orig_bio);
+
+ if (atomic_dec_and_test(&mddev->io_acct_cnt))
+ if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
+ wake_up(&mddev->wait_io_acct);
}
/*
@@ -8660,6 +8669,8 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
md_io_acct = container_of(clone, struct md_io_acct, bio_clone);
md_io_acct->orig_bio = *bio;
md_io_acct->start_time = bio_start_io_acct(*bio);
+ md_io_acct->mddev = mddev;
+ atomic_inc(&mddev->io_acct_cnt);
clone->bi_end_io = md_end_io_acct;
clone->bi_private = md_io_acct;
diff --git a/drivers/md/md.h b/drivers/md/md.h
index b4e2d8b87b61..a7c89ed53be5 100644
--- a/drivers/md/md.h
+++ b/drivers/md/md.h
@@ -255,6 +255,7 @@ struct md_cluster_info;
* array is ready yet.
* @MD_BROKEN: This is used to stop writes and mark array as failed.
* @MD_DELETED: This device is being deleted
+ * @MD_QUIESCE: This device is being quiesced. Now only raid0 use this flag
*
* change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
*/
@@ -272,6 +273,7 @@ enum mddev_flags {
MD_NOT_READY,
MD_BROKEN,
MD_DELETED,
+ MD_QUIESCE,
};
enum mddev_sb_flags {
@@ -513,6 +515,8 @@ struct mddev {
* metadata and bitmap writes
*/
struct bio_set io_acct_set; /* for raid0 and raid5 io accounting */
+ atomic_t io_acct_cnt;
+ wait_queue_head_t wait_io_acct;
/* Generic flush handling.
* The last to finish preflush schedules a worker to submit
@@ -710,9 +714,10 @@ struct md_thread {
};
struct md_io_acct {
- struct bio *orig_bio;
- unsigned long start_time;
- struct bio bio_clone;
+ struct mddev *mddev;
+ struct bio *orig_bio;
+ unsigned long start_time;
+ struct bio bio_clone;
};
#define THREAD_WAKEUP 0
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 857c49399c28..aced0ad8cdab 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -754,6 +754,12 @@ static void *raid0_takeover(struct mddev *mddev)
static void raid0_quiesce(struct mddev *mddev, int quiesce)
{
+ /* It doesn't use a separate struct to count how many bios are submitted
+ * to member disks to avoid memory alloc and performance decrease
+ */
+ set_bit(MD_QUIESCE, &mddev->flags);
+ wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
+ clear_bit(MD_QUIESCE, &mddev->flags);
}
static struct md_personality raid0_personality=
--
2.32.0 (Apple Git-132)
* Re: [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
2022-10-24 6:48 [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce Xiao Ni
@ 2022-10-28 21:06 ` Song Liu
2022-11-14 18:14 ` Song Liu
From: Song Liu @ 2022-10-28 21:06 UTC (permalink / raw)
To: Xiao Ni; +Cc: guoqing.jiang, linux-raid, ffan
On Sun, Oct 23, 2022 at 11:48 PM Xiao Ni <xni@redhat.com> wrote:
>
> io_acct_set was added for raid0/raid5 io accounting, and an md_io_acct needs
> to be allocated in the i/o path. These structures are freed when the bios
> come back from the member disks, but right now there is no way to know
> whether all of those bios have returned. In the takeover process, the raid0
> memory resources, including the memory pool for md_io_acct, are freed. If
> some bios are still outstanding at that point, they can cause a panic when
> they return, because they reference a NULL pointer or an invalid address.
>
> This patch adds io_acct_cnt, so that stopping a raid0 array can wait until
> all such bios have come back.
>
> Reported-by: Fine Fan <ffan@redhat.com>
> Signed-off-by: Xiao Ni <xni@redhat.com>
Applied to md-next. Thanks!
Song
> ---
> V2: Move struct mddev* to the start of struct mddev_io_acct
> drivers/md/md.c | 13 ++++++++++++-
> drivers/md/md.h | 11 ++++++++---
> drivers/md/raid0.c | 6 ++++++
> 3 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 6f3b2c1cb6cd..208f69849054 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -685,6 +685,7 @@ void mddev_init(struct mddev *mddev)
> atomic_set(&mddev->flush_pending, 0);
> init_waitqueue_head(&mddev->sb_wait);
> init_waitqueue_head(&mddev->recovery_wait);
> + init_waitqueue_head(&mddev->wait_io_acct);
> mddev->reshape_position = MaxSector;
> mddev->reshape_backwards = 0;
> mddev->last_sync_action = "none";
> @@ -8618,15 +8619,18 @@ int acct_bioset_init(struct mddev *mddev)
> {
> int err = 0;
>
> - if (!bioset_initialized(&mddev->io_acct_set))
> + if (!bioset_initialized(&mddev->io_acct_set)) {
> + atomic_set(&mddev->io_acct_cnt, 0);
> err = bioset_init(&mddev->io_acct_set, BIO_POOL_SIZE,
> offsetof(struct md_io_acct, bio_clone), 0);
> + }
> return err;
> }
> EXPORT_SYMBOL_GPL(acct_bioset_init);
>
> void acct_bioset_exit(struct mddev *mddev)
> {
> + WARN_ON(atomic_read(&mddev->io_acct_cnt) != 0);
> bioset_exit(&mddev->io_acct_set);
> }
> EXPORT_SYMBOL_GPL(acct_bioset_exit);
> @@ -8635,12 +8639,17 @@ static void md_end_io_acct(struct bio *bio)
> {
> struct md_io_acct *md_io_acct = bio->bi_private;
> struct bio *orig_bio = md_io_acct->orig_bio;
> + struct mddev *mddev = md_io_acct->mddev;
>
> orig_bio->bi_status = bio->bi_status;
>
> bio_end_io_acct(orig_bio, md_io_acct->start_time);
> bio_put(bio);
> bio_endio(orig_bio);
> +
> + if (atomic_dec_and_test(&mddev->io_acct_cnt))
> + if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
> + wake_up(&mddev->wait_io_acct);
> }
>
> /*
> @@ -8660,6 +8669,8 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> md_io_acct = container_of(clone, struct md_io_acct, bio_clone);
> md_io_acct->orig_bio = *bio;
> md_io_acct->start_time = bio_start_io_acct(*bio);
> + md_io_acct->mddev = mddev;
> + atomic_inc(&mddev->io_acct_cnt);
>
> clone->bi_end_io = md_end_io_acct;
> clone->bi_private = md_io_acct;
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index b4e2d8b87b61..a7c89ed53be5 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -255,6 +255,7 @@ struct md_cluster_info;
> * array is ready yet.
> * @MD_BROKEN: This is used to stop writes and mark array as failed.
> * @MD_DELETED: This device is being deleted
> + * @MD_QUIESCE: This device is being quiesced. Now only raid0 use this flag
> *
> * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
> */
> @@ -272,6 +273,7 @@ enum mddev_flags {
> MD_NOT_READY,
> MD_BROKEN,
> MD_DELETED,
> + MD_QUIESCE,
> };
>
> enum mddev_sb_flags {
> @@ -513,6 +515,8 @@ struct mddev {
> * metadata and bitmap writes
> */
> struct bio_set io_acct_set; /* for raid0 and raid5 io accounting */
> + atomic_t io_acct_cnt;
> + wait_queue_head_t wait_io_acct;
>
> /* Generic flush handling.
> * The last to finish preflush schedules a worker to submit
> @@ -710,9 +714,10 @@ struct md_thread {
> };
>
> struct md_io_acct {
> - struct bio *orig_bio;
> - unsigned long start_time;
> - struct bio bio_clone;
> + struct mddev *mddev;
> + struct bio *orig_bio;
> + unsigned long start_time;
> + struct bio bio_clone;
> };
>
> #define THREAD_WAKEUP 0
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 857c49399c28..aced0ad8cdab 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -754,6 +754,12 @@ static void *raid0_takeover(struct mddev *mddev)
>
> static void raid0_quiesce(struct mddev *mddev, int quiesce)
> {
> + /* It doesn't use a separate struct to count how many bios are submitted
> + * to member disks to avoid memory alloc and performance decrease
> + */
> + set_bit(MD_QUIESCE, &mddev->flags);
> + wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
> + clear_bit(MD_QUIESCE, &mddev->flags);
> }
>
> static struct md_personality raid0_personality=
> --
> 2.32.0 (Apple Git-132)
>
* Re: [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
2022-10-24 6:48 [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce Xiao Ni
2022-10-28 21:06 ` Song Liu
@ 2022-11-14 18:14 ` Song Liu
2022-11-14 23:18 ` Xiao Ni
From: Song Liu @ 2022-11-14 18:14 UTC (permalink / raw)
To: Xiao Ni; +Cc: guoqing.jiang, linux-raid, ffan
Hi Xiao,
On Sun, Oct 23, 2022 at 11:48 PM Xiao Ni <xni@redhat.com> wrote:
>
> io_acct_set was added for raid0/raid5 io accounting, and an md_io_acct needs
> to be allocated in the i/o path. These structures are freed when the bios
> come back from the member disks, but right now there is no way to know
> whether all of those bios have returned. In the takeover process, the raid0
> memory resources, including the memory pool for md_io_acct, are freed. If
> some bios are still outstanding at that point, they can cause a panic when
> they return, because they reference a NULL pointer or an invalid address.
>
> This patch adds io_acct_cnt, so that stopping a raid0 array can wait until
> all such bios have come back.
I am very sorry to bring this up late. Have you tested the performance
impact of this change? I am afraid this may introduce some visible
performance regression for very high speed arrays.
Thanks,
Song
>
> Reported-by: Fine Fan <ffan@redhat.com>
> Signed-off-by: Xiao Ni <xni@redhat.com>
> ---
> V2: Move struct mddev* to the start of struct mddev_io_acct
> drivers/md/md.c | 13 ++++++++++++-
> drivers/md/md.h | 11 ++++++++---
> drivers/md/raid0.c | 6 ++++++
> 3 files changed, 26 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 6f3b2c1cb6cd..208f69849054 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -685,6 +685,7 @@ void mddev_init(struct mddev *mddev)
> atomic_set(&mddev->flush_pending, 0);
> init_waitqueue_head(&mddev->sb_wait);
> init_waitqueue_head(&mddev->recovery_wait);
> + init_waitqueue_head(&mddev->wait_io_acct);
> mddev->reshape_position = MaxSector;
> mddev->reshape_backwards = 0;
> mddev->last_sync_action = "none";
> @@ -8618,15 +8619,18 @@ int acct_bioset_init(struct mddev *mddev)
> {
> int err = 0;
>
> - if (!bioset_initialized(&mddev->io_acct_set))
> + if (!bioset_initialized(&mddev->io_acct_set)) {
> + atomic_set(&mddev->io_acct_cnt, 0);
> err = bioset_init(&mddev->io_acct_set, BIO_POOL_SIZE,
> offsetof(struct md_io_acct, bio_clone), 0);
> + }
> return err;
> }
> EXPORT_SYMBOL_GPL(acct_bioset_init);
>
> void acct_bioset_exit(struct mddev *mddev)
> {
> + WARN_ON(atomic_read(&mddev->io_acct_cnt) != 0);
> bioset_exit(&mddev->io_acct_set);
> }
> EXPORT_SYMBOL_GPL(acct_bioset_exit);
> @@ -8635,12 +8639,17 @@ static void md_end_io_acct(struct bio *bio)
> {
> struct md_io_acct *md_io_acct = bio->bi_private;
> struct bio *orig_bio = md_io_acct->orig_bio;
> + struct mddev *mddev = md_io_acct->mddev;
>
> orig_bio->bi_status = bio->bi_status;
>
> bio_end_io_acct(orig_bio, md_io_acct->start_time);
> bio_put(bio);
> bio_endio(orig_bio);
> +
> + if (atomic_dec_and_test(&mddev->io_acct_cnt))
> + if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
> + wake_up(&mddev->wait_io_acct);
> }
>
> /*
> @@ -8660,6 +8669,8 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> md_io_acct = container_of(clone, struct md_io_acct, bio_clone);
> md_io_acct->orig_bio = *bio;
> md_io_acct->start_time = bio_start_io_acct(*bio);
> + md_io_acct->mddev = mddev;
> + atomic_inc(&mddev->io_acct_cnt);
>
> clone->bi_end_io = md_end_io_acct;
> clone->bi_private = md_io_acct;
> diff --git a/drivers/md/md.h b/drivers/md/md.h
> index b4e2d8b87b61..a7c89ed53be5 100644
> --- a/drivers/md/md.h
> +++ b/drivers/md/md.h
> @@ -255,6 +255,7 @@ struct md_cluster_info;
> * array is ready yet.
> * @MD_BROKEN: This is used to stop writes and mark array as failed.
> * @MD_DELETED: This device is being deleted
> + * @MD_QUIESCE: This device is being quiesced. Now only raid0 use this flag
> *
> * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
> */
> @@ -272,6 +273,7 @@ enum mddev_flags {
> MD_NOT_READY,
> MD_BROKEN,
> MD_DELETED,
> + MD_QUIESCE,
> };
>
> enum mddev_sb_flags {
> @@ -513,6 +515,8 @@ struct mddev {
> * metadata and bitmap writes
> */
> struct bio_set io_acct_set; /* for raid0 and raid5 io accounting */
> + atomic_t io_acct_cnt;
> + wait_queue_head_t wait_io_acct;
>
> /* Generic flush handling.
> * The last to finish preflush schedules a worker to submit
> @@ -710,9 +714,10 @@ struct md_thread {
> };
>
> struct md_io_acct {
> - struct bio *orig_bio;
> - unsigned long start_time;
> - struct bio bio_clone;
> + struct mddev *mddev;
> + struct bio *orig_bio;
> + unsigned long start_time;
> + struct bio bio_clone;
> };
>
> #define THREAD_WAKEUP 0
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 857c49399c28..aced0ad8cdab 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -754,6 +754,12 @@ static void *raid0_takeover(struct mddev *mddev)
>
> static void raid0_quiesce(struct mddev *mddev, int quiesce)
> {
> + /* It doesn't use a separate struct to count how many bios are submitted
> + * to member disks to avoid memory alloc and performance decrease
> + */
> + set_bit(MD_QUIESCE, &mddev->flags);
> + wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
> + clear_bit(MD_QUIESCE, &mddev->flags);
> }
>
> static struct md_personality raid0_personality=
> --
> 2.32.0 (Apple Git-132)
>
* Re: [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
2022-11-14 18:14 ` Song Liu
@ 2022-11-14 23:18 ` Xiao Ni
2022-11-17 2:02 ` Xiao Ni
From: Xiao Ni @ 2022-11-14 23:18 UTC (permalink / raw)
To: Song Liu; +Cc: guoqing.jiang, linux-raid, ffan
Hi Song
I'll do a performance test today and give the test result.
Regards
Xiao
On Tue, Nov 15, 2022 at 2:14 AM Song Liu <song@kernel.org> wrote:
>
> Hi Xiao,
>
> On Sun, Oct 23, 2022 at 11:48 PM Xiao Ni <xni@redhat.com> wrote:
> >
> > io_acct_set was added for raid0/raid5 io accounting, and an md_io_acct needs
> > to be allocated in the i/o path. These structures are freed when the bios
> > come back from the member disks, but right now there is no way to know
> > whether all of those bios have returned. In the takeover process, the raid0
> > memory resources, including the memory pool for md_io_acct, are freed. If
> > some bios are still outstanding at that point, they can cause a panic when
> > they return, because they reference a NULL pointer or an invalid address.
> >
> > This patch adds io_acct_cnt, so that stopping a raid0 array can wait until
> > all such bios have come back.
>
> I am very sorry to bring this up late. Have you tested the performance
> impact of this change? I am afraid this may introduce some visible
> performance regression for very high speed arrays.
>
> Thanks,
> Song
>
>
> >
> > Reported-by: Fine Fan <ffan@redhat.com>
> > Signed-off-by: Xiao Ni <xni@redhat.com>
> > ---
> > V2: Move struct mddev* to the start of struct mddev_io_acct
> > drivers/md/md.c | 13 ++++++++++++-
> > drivers/md/md.h | 11 ++++++++---
> > drivers/md/raid0.c | 6 ++++++
> > 3 files changed, 26 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 6f3b2c1cb6cd..208f69849054 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -685,6 +685,7 @@ void mddev_init(struct mddev *mddev)
> > atomic_set(&mddev->flush_pending, 0);
> > init_waitqueue_head(&mddev->sb_wait);
> > init_waitqueue_head(&mddev->recovery_wait);
> > + init_waitqueue_head(&mddev->wait_io_acct);
> > mddev->reshape_position = MaxSector;
> > mddev->reshape_backwards = 0;
> > mddev->last_sync_action = "none";
> > @@ -8618,15 +8619,18 @@ int acct_bioset_init(struct mddev *mddev)
> > {
> > int err = 0;
> >
> > - if (!bioset_initialized(&mddev->io_acct_set))
> > + if (!bioset_initialized(&mddev->io_acct_set)) {
> > + atomic_set(&mddev->io_acct_cnt, 0);
> > err = bioset_init(&mddev->io_acct_set, BIO_POOL_SIZE,
> > offsetof(struct md_io_acct, bio_clone), 0);
> > + }
> > return err;
> > }
> > EXPORT_SYMBOL_GPL(acct_bioset_init);
> >
> > void acct_bioset_exit(struct mddev *mddev)
> > {
> > + WARN_ON(atomic_read(&mddev->io_acct_cnt) != 0);
> > bioset_exit(&mddev->io_acct_set);
> > }
> > EXPORT_SYMBOL_GPL(acct_bioset_exit);
> > @@ -8635,12 +8639,17 @@ static void md_end_io_acct(struct bio *bio)
> > {
> > struct md_io_acct *md_io_acct = bio->bi_private;
> > struct bio *orig_bio = md_io_acct->orig_bio;
> > + struct mddev *mddev = md_io_acct->mddev;
> >
> > orig_bio->bi_status = bio->bi_status;
> >
> > bio_end_io_acct(orig_bio, md_io_acct->start_time);
> > bio_put(bio);
> > bio_endio(orig_bio);
> > +
> > + if (atomic_dec_and_test(&mddev->io_acct_cnt))
> > + if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
> > + wake_up(&mddev->wait_io_acct);
> > }
> >
> > /*
> > @@ -8660,6 +8669,8 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> > md_io_acct = container_of(clone, struct md_io_acct, bio_clone);
> > md_io_acct->orig_bio = *bio;
> > md_io_acct->start_time = bio_start_io_acct(*bio);
> > + md_io_acct->mddev = mddev;
> > + atomic_inc(&mddev->io_acct_cnt);
> >
> > clone->bi_end_io = md_end_io_acct;
> > clone->bi_private = md_io_acct;
> > diff --git a/drivers/md/md.h b/drivers/md/md.h
> > index b4e2d8b87b61..a7c89ed53be5 100644
> > --- a/drivers/md/md.h
> > +++ b/drivers/md/md.h
> > @@ -255,6 +255,7 @@ struct md_cluster_info;
> > * array is ready yet.
> > * @MD_BROKEN: This is used to stop writes and mark array as failed.
> > * @MD_DELETED: This device is being deleted
> > + * @MD_QUIESCE: This device is being quiesced. Now only raid0 use this flag
> > *
> > * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
> > */
> > @@ -272,6 +273,7 @@ enum mddev_flags {
> > MD_NOT_READY,
> > MD_BROKEN,
> > MD_DELETED,
> > + MD_QUIESCE,
> > };
> >
> > enum mddev_sb_flags {
> > @@ -513,6 +515,8 @@ struct mddev {
> > * metadata and bitmap writes
> > */
> > struct bio_set io_acct_set; /* for raid0 and raid5 io accounting */
> > + atomic_t io_acct_cnt;
> > + wait_queue_head_t wait_io_acct;
> >
> > /* Generic flush handling.
> > * The last to finish preflush schedules a worker to submit
> > @@ -710,9 +714,10 @@ struct md_thread {
> > };
> >
> > struct md_io_acct {
> > - struct bio *orig_bio;
> > - unsigned long start_time;
> > - struct bio bio_clone;
> > + struct mddev *mddev;
> > + struct bio *orig_bio;
> > + unsigned long start_time;
> > + struct bio bio_clone;
> > };
> >
> > #define THREAD_WAKEUP 0
> > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> > index 857c49399c28..aced0ad8cdab 100644
> > --- a/drivers/md/raid0.c
> > +++ b/drivers/md/raid0.c
> > @@ -754,6 +754,12 @@ static void *raid0_takeover(struct mddev *mddev)
> >
> > static void raid0_quiesce(struct mddev *mddev, int quiesce)
> > {
> > + /* It doesn't use a separate struct to count how many bios are submitted
> > + * to member disks to avoid memory alloc and performance decrease
> > + */
> > + set_bit(MD_QUIESCE, &mddev->flags);
> > + wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
> > + clear_bit(MD_QUIESCE, &mddev->flags);
> > }
> >
> > static struct md_personality raid0_personality=
> > --
> > 2.32.0 (Apple Git-132)
> >
>
* Re: [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
2022-11-14 23:18 ` Xiao Ni
@ 2022-11-17 2:02 ` Xiao Ni
2022-11-17 19:56 ` Song Liu
From: Xiao Ni @ 2022-11-17 2:02 UTC (permalink / raw)
To: Song Liu; +Cc: guoqing.jiang, linux-raid, ffan
Hi Song
The performance is good. Please check the result below.
And for the patch itself, do you think we should add an smp_mb(), like this?
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 4d0139cae8b5..3696e3825e27 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -8650,9 +8650,11 @@ static void md_end_io_acct(struct bio *bio)
bio_put(bio);
bio_endio(orig_bio);
- if (atomic_dec_and_test(&mddev->io_acct_cnt))
+ if (atomic_dec_and_test(&mddev->io_acct_cnt)) {
+ smp_mb();
if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
wake_up(&mddev->wait_io_acct);
+ }
}
/*
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 9d4831ca802c..1818f79bfdf7 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -757,6 +757,7 @@ static void raid0_quiesce(struct mddev *mddev, int quiesce)
* to member disks to avoid memory alloc and performance decrease
*/
set_bit(MD_QUIESCE, &mddev->flags);
+ smp_mb();
wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
clear_bit(MD_QUIESCE, &mddev->flags);
}
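The idea of the pairing is that either md_end_io_acct() sees MD_QUIESCE and
does the wake_up(), or raid0_quiesce() sees io_acct_cnt already at zero, so
the wakeup can't be lost. Roughly (my understanding of the ordering, not a
formal proof):

    raid0_quiesce()                        md_end_io_acct()
    ---------------                        ----------------
    set_bit(MD_QUIESCE, &mddev->flags);    atomic_dec_and_test(&mddev->io_acct_cnt);
    smp_mb();                              smp_mb();
    atomic_read(&mddev->io_acct_cnt);      test_bit(MD_QUIESCE, &mddev->flags);

With a full barrier on both sides, at least one of the two loads must observe
the other side's store.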
Test result:

                 without patch   with patch
psync read       100MB/s         101MB/s      job:1  bs:4k
                 1015MB/s        1016MB/s     job:1  bs:128k
                 1359MB/s        1358MB/s     job:1  bs:256k
                 1394MB/s        1393MB/s     job:40 bs:4k
                 4959MB/s        4873MB/s     job:40 bs:128k
                 6166MB/s        6157MB/s     job:40 bs:256k

                 without patch   with patch
psync write      286MB/s         275MB/s      job:1  bs:4k
                 1810MB/s        1808MB/s     job:1  bs:128k
                 1814MB/s        1814MB/s     job:1  bs:256k
                 1802MB/s        1801MB/s     job:40 bs:4k
                 1814MB/s        1814MB/s     job:40 bs:128k
                 1814MB/s        1814MB/s     job:40 bs:256k

                 without patch   with patch
psync randread   39.3MB/s        39.7MB/s     job:1  bs:4k
                 791MB/s         783MB/s      job:1  bs:128k
                 1183MiB/s       1217MB/s     job:1  bs:256k
                 1183MiB/s       1235MB/s     job:40 bs:4k
                 3768MB/s        3705MB/s     job:40 bs:128k
                 4410MB/s        4418MB/s     job:40 bs:256k

                 without patch   with patch
psync randwrite  281MB/s         272MB/s      job:1  bs:4k
                 1708MB/s        1706MB/s     job:1  bs:128k
                 1658MB/s        1644MB/s     job:1  bs:256k
                 1796MB/s        1796MB/s     job:40 bs:4k
                 1818MB/s        1818MB/s     job:40 bs:128k
                 1820MB/s        1820MB/s     job:40 bs:256k

                 without patch   with patch
aio read         1294MB/s        1270MB/s     job:1  bs:4k   depth:128
                 3956MB/s        4000MB/s     job:1  bs:128k depth:128
                 3955MB/s        4000MB/s     job:1  bs:256k depth:128
aio write        1255MB/s        1241MB/s     job:1  bs:4k   depth:128
                 1813MB/s        1814MB/s     job:1  bs:128k depth:128
                 1814MB/s        1814MB/s     job:1  bs:256k depth:128
aio randread     1112MB/s        1117MB/s     job:1  bs:4k   depth:128
                 3875MB/s        3975MB/s     job:1  bs:128k depth:128
                 4284MB/s        4407MB/s     job:1  bs:256k depth:128
aio randwrite    1080MB/s        1172MB/s     job:1  bs:4k   depth:128
                 1814MB/s        1814MB/s     job:1  bs:128k depth:128
                 1816MB/s        1817MB/s     job:1  bs:256k depth:128
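The numbers are from fio runs against the md device. A job with the
parameters above would look roughly like the following (the exact job file
isn't shown here; the device path, --direct and runtime are assumptions):

    fio --name=test --filename=/dev/md0 --direct=1 --ioengine=psync \
        --rw=read --bs=4k --numjobs=40 --runtime=60 --time_based \
        --group_reporting

The aio rows use --ioengine=libaio --iodepth=128 --numjobs=1 instead.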
Best Regards
Xiao
On Tue, Nov 15, 2022 at 7:18 AM Xiao Ni <xni@redhat.com> wrote:
>
> Hi Song
>
> I'll do a performance test today and give the test result.
>
> Regards
> Xiao
>
> On Tue, Nov 15, 2022 at 2:14 AM Song Liu <song@kernel.org> wrote:
> >
> > Hi Xiao,
> >
> > On Sun, Oct 23, 2022 at 11:48 PM Xiao Ni <xni@redhat.com> wrote:
> > >
> > > io_acct_set was added for raid0/raid5 io accounting, and an md_io_acct needs
> > > to be allocated in the i/o path. These structures are freed when the bios
> > > come back from the member disks, but right now there is no way to know
> > > whether all of those bios have returned. In the takeover process, the raid0
> > > memory resources, including the memory pool for md_io_acct, are freed. If
> > > some bios are still outstanding at that point, they can cause a panic when
> > > they return, because they reference a NULL pointer or an invalid address.
> > >
> > > This patch adds io_acct_cnt, so that stopping a raid0 array can wait until
> > > all such bios have come back.
> >
> > I am very sorry to bring this up late. Have you tested the performance
> > impact of this change? I am afraid this may introduce some visible
> > performance regression for very high speed arrays.
> >
> > Thanks,
> > Song
> >
> >
> > >
> > > Reported-by: Fine Fan <ffan@redhat.com>
> > > Signed-off-by: Xiao Ni <xni@redhat.com>
> > > ---
> > > V2: Move struct mddev* to the start of struct mddev_io_acct
> > > drivers/md/md.c | 13 ++++++++++++-
> > > drivers/md/md.h | 11 ++++++++---
> > > drivers/md/raid0.c | 6 ++++++
> > > 3 files changed, 26 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 6f3b2c1cb6cd..208f69849054 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -685,6 +685,7 @@ void mddev_init(struct mddev *mddev)
> > > atomic_set(&mddev->flush_pending, 0);
> > > init_waitqueue_head(&mddev->sb_wait);
> > > init_waitqueue_head(&mddev->recovery_wait);
> > > + init_waitqueue_head(&mddev->wait_io_acct);
> > > mddev->reshape_position = MaxSector;
> > > mddev->reshape_backwards = 0;
> > > mddev->last_sync_action = "none";
> > > @@ -8618,15 +8619,18 @@ int acct_bioset_init(struct mddev *mddev)
> > > {
> > > int err = 0;
> > >
> > > - if (!bioset_initialized(&mddev->io_acct_set))
> > > + if (!bioset_initialized(&mddev->io_acct_set)) {
> > > + atomic_set(&mddev->io_acct_cnt, 0);
> > > err = bioset_init(&mddev->io_acct_set, BIO_POOL_SIZE,
> > > offsetof(struct md_io_acct, bio_clone), 0);
> > > + }
> > > return err;
> > > }
> > > EXPORT_SYMBOL_GPL(acct_bioset_init);
> > >
> > > void acct_bioset_exit(struct mddev *mddev)
> > > {
> > > + WARN_ON(atomic_read(&mddev->io_acct_cnt) != 0);
> > > bioset_exit(&mddev->io_acct_set);
> > > }
> > > EXPORT_SYMBOL_GPL(acct_bioset_exit);
> > > @@ -8635,12 +8639,17 @@ static void md_end_io_acct(struct bio *bio)
> > > {
> > > struct md_io_acct *md_io_acct = bio->bi_private;
> > > struct bio *orig_bio = md_io_acct->orig_bio;
> > > + struct mddev *mddev = md_io_acct->mddev;
> > >
> > > orig_bio->bi_status = bio->bi_status;
> > >
> > > bio_end_io_acct(orig_bio, md_io_acct->start_time);
> > > bio_put(bio);
> > > bio_endio(orig_bio);
> > > +
> > > + if (atomic_dec_and_test(&mddev->io_acct_cnt))
> > > + if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
> > > + wake_up(&mddev->wait_io_acct);
> > > }
> > >
> > > /*
> > > @@ -8660,6 +8669,8 @@ void md_account_bio(struct mddev *mddev, struct bio **bio)
> > > md_io_acct = container_of(clone, struct md_io_acct, bio_clone);
> > > md_io_acct->orig_bio = *bio;
> > > md_io_acct->start_time = bio_start_io_acct(*bio);
> > > + md_io_acct->mddev = mddev;
> > > + atomic_inc(&mddev->io_acct_cnt);
> > >
> > > clone->bi_end_io = md_end_io_acct;
> > > clone->bi_private = md_io_acct;
> > > diff --git a/drivers/md/md.h b/drivers/md/md.h
> > > index b4e2d8b87b61..a7c89ed53be5 100644
> > > --- a/drivers/md/md.h
> > > +++ b/drivers/md/md.h
> > > @@ -255,6 +255,7 @@ struct md_cluster_info;
> > > * array is ready yet.
> > > * @MD_BROKEN: This is used to stop writes and mark array as failed.
> > > * @MD_DELETED: This device is being deleted
> > > + * @MD_QUIESCE: This device is being quiesced. Now only raid0 use this flag
> > > *
> > > * change UNSUPPORTED_MDDEV_FLAGS for each array type if new flag is added
> > > */
> > > @@ -272,6 +273,7 @@ enum mddev_flags {
> > > MD_NOT_READY,
> > > MD_BROKEN,
> > > MD_DELETED,
> > > + MD_QUIESCE,
> > > };
> > >
> > > enum mddev_sb_flags {
> > > @@ -513,6 +515,8 @@ struct mddev {
> > > * metadata and bitmap writes
> > > */
> > > struct bio_set io_acct_set; /* for raid0 and raid5 io accounting */
> > > + atomic_t io_acct_cnt;
> > > + wait_queue_head_t wait_io_acct;
> > >
> > > /* Generic flush handling.
> > > * The last to finish preflush schedules a worker to submit
> > > @@ -710,9 +714,10 @@ struct md_thread {
> > > };
> > >
> > > struct md_io_acct {
> > > - struct bio *orig_bio;
> > > - unsigned long start_time;
> > > - struct bio bio_clone;
> > > + struct mddev *mddev;
> > > + struct bio *orig_bio;
> > > + unsigned long start_time;
> > > + struct bio bio_clone;
> > > };
> > >
> > > #define THREAD_WAKEUP 0
> > > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> > > index 857c49399c28..aced0ad8cdab 100644
> > > --- a/drivers/md/raid0.c
> > > +++ b/drivers/md/raid0.c
> > > @@ -754,6 +754,12 @@ static void *raid0_takeover(struct mddev *mddev)
> > >
> > > static void raid0_quiesce(struct mddev *mddev, int quiesce)
> > > {
> > > + /* It doesn't use a separate struct to count how many bios are submitted
> > > + * to member disks to avoid memory alloc and performance decrease
> > > + */
> > > + set_bit(MD_QUIESCE, &mddev->flags);
> > > + wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
> > > + clear_bit(MD_QUIESCE, &mddev->flags);
> > > }
> > >
> > > static struct md_personality raid0_personality=
> > > --
> > > 2.32.0 (Apple Git-132)
> > >
> >
* Re: [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
2022-11-17 2:02 ` Xiao Ni
@ 2022-11-17 19:56 ` Song Liu
2022-11-18 1:39 ` Xiao Ni
From: Song Liu @ 2022-11-17 19:56 UTC (permalink / raw)
To: Xiao Ni; +Cc: guoqing.jiang, linux-raid, ffan
Hi Xiao,
Thanks for the results.
On Wed, Nov 16, 2022 at 6:03 PM Xiao Ni <xni@redhat.com> wrote:
>
> Hi Song
>
> The performance is good. Please check the result below.
>
> And for the patch itself, do you think we should add a smp_mb
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 4d0139cae8b5..3696e3825e27 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -8650,9 +8650,11 @@ static void md_end_io_acct(struct bio *bio)
> bio_put(bio);
> bio_endio(orig_bio);
>
> - if (atomic_dec_and_test(&mddev->io_acct_cnt))
> + if (atomic_dec_and_test(&mddev->io_acct_cnt)) {
> + smp_mb();
> if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
> wake_up(&mddev->wait_io_acct);
> + }
> }
>
> /*
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index 9d4831ca802c..1818f79bfdf7 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -757,6 +757,7 @@ static void raid0_quiesce(struct mddev *mddev, int quiesce)
> * to member disks to avoid memory alloc and performance decrease
> */
> set_bit(MD_QUIESCE, &mddev->flags);
> + smp_mb();
> wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
> clear_bit(MD_QUIESCE, &mddev->flags);
> }
>
> Test result:
I think there is some noise in the result?
>
> without patch with patch
> psync read 100MB/s 101MB/s job:1 bs:4k
For example, this is a small improvement, but
> 1015MB/s 1016MB/s job:1 bs:128k
> 1359MB/s 1358MB/s job:1 bs:256k
> 1394MB/s 1393MB/s job:40 bs:4k
> 4959MB/s 4873MB/s job:40 bs:128k
> 6166MB/s 6157MB/s job:40 bs:256k
>
> without patch with patch
> psync write 286MB/s 275MB/s job:1 bs:4k
this is a big regression (~4%).
> 1810MB/s 1808MB/s job:1 bs:128k
> 1814MB/s 1814MB/s job:1 bs:256k
> 1802MB/s 1801MB/s job:40 bs:4k
> 1814MB/s 1814MB/s job:40 bs:128k
> 1814MB/s 1814MB/s job:40 bs:256k
>
> without patch
> psync randread 39.3MB/s 39.7MB/s job:1 bs:4k
> 791MB/s 783MB/s job:1 bs:128k
> 1183MiB/s 1217MB/s job:1 bs:256k
> 1183MiB/s 1235MB/s job:40 bs:4k
> 3768MB/s 3705MB/s job:40 bs:128k
And some regression for 128kB but improvement for 4kB.
> 4410MB/s 4418MB/s job:40 bs:256k
So I am not quite convinced by these results.
Also, do we really need an extra counter here? Can we use
mddev->active_io instead?
Thanks,
Song
[...]
* Re: [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
2022-11-17 19:56 ` Song Liu
@ 2022-11-18 1:39 ` Xiao Ni
2022-11-18 2:36 ` Song Liu
From: Xiao Ni @ 2022-11-18 1:39 UTC (permalink / raw)
To: Song Liu; +Cc: guoqing.jiang, linux-raid, ffan
On Fri, Nov 18, 2022 at 3:57 AM Song Liu <song@kernel.org> wrote:
>
> Hi Xiao,
>
> Thanks for the results.
>
> On Wed, Nov 16, 2022 at 6:03 PM Xiao Ni <xni@redhat.com> wrote:
> >
> > Hi Song
> >
> > The performance is good. Please check the result below.
> >
> > And for the patch itself, do you think we should add a smp_mb
> > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > index 4d0139cae8b5..3696e3825e27 100644
> > --- a/drivers/md/md.c
> > +++ b/drivers/md/md.c
> > @@ -8650,9 +8650,11 @@ static void md_end_io_acct(struct bio *bio)
> > bio_put(bio);
> > bio_endio(orig_bio);
> >
> > - if (atomic_dec_and_test(&mddev->io_acct_cnt))
> > + if (atomic_dec_and_test(&mddev->io_acct_cnt)) {
> > + smp_mb();
> > if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
> > wake_up(&mddev->wait_io_acct);
> > + }
> > }
> >
> > /*
> > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> > index 9d4831ca802c..1818f79bfdf7 100644
> > --- a/drivers/md/raid0.c
> > +++ b/drivers/md/raid0.c
> > @@ -757,6 +757,7 @@ static void raid0_quiesce(struct mddev *mddev, int quiesce)
> > * to member disks to avoid memory alloc and performance decrease
> > */
> > set_bit(MD_QUIESCE, &mddev->flags);
> > + smp_mb();
> > wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
> > clear_bit(MD_QUIESCE, &mddev->flags);
> > }
> >
> > Test result:
>
> I think there is some noise in the result?
>
> >
> > without patch with patch
> > psync read 100MB/s 101MB/s job:1 bs:4k
>
> For example, this is a small improvement, but
>
> > 1015MB/s 1016MB/s job:1 bs:128k
> > 1359MB/s 1358MB/s job:1 bs:256k
> > 1394MB/s 1393MB/s job:40 bs:4k
> > 4959MB/s 4873MB/s job:40 bs:128k
> > 6166MB/s 6157MB/s job:40 bs:256k
> >
> > without patch with patch
> > psync write 286MB/s 275MB/s job:1 bs:4k
>
> this is a big regression (~4%).
>
> > 1810MB/s 1808MB/s job:1 bs:128k
> > 1814MB/s 1814MB/s job:1 bs:256k
> > 1802MB/s 1801MB/s job:40 bs:4k
> > 1814MB/s 1814MB/s job:40 bs:128k
> > 1814MB/s 1814MB/s job:40 bs:256k
> >
> > without patch
> > psync randread 39.3MB/s 39.7MB/s job:1 bs:4k
> > 791MB/s 783MB/s job:1 bs:128k
> > 1183MiB/s 1217MB/s job:1 bs:256k
> > 1183MiB/s 1235MB/s job:40 bs:4k
> > 3768MB/s 3705MB/s job:40 bs:128k
>
> And some regression for 128kB but improvement for 4kB.
>
> > 4410MB/s 4418MB/s job:40 bs:256k
>
> So I am not quite convinced by these results.
Thanks for pointing out the problem. Maybe I need to precondition the devices
before testing. I'll post the results again.
>
> Also, do we really need an extra counter here? Can we use
> mddev->active_io instead?
At first, I thought of this too. But active_io is decreased as soon as
pers->make_request returns, so it can't be used to wait for all bios to
come back.

Can we decrease mddev->active_io in the bi_end_io of each personality
instead? Right now only mddev_suspend uses mddev->active_io, and it needs
to wait for all active I/O to finish. From that point of view, it should
be better to decrease active_io in bi_end_io. What's your opinion?
Best Regards
Xiao
* Re: [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
2022-11-18 1:39 ` Xiao Ni
@ 2022-11-18 2:36 ` Song Liu
2022-11-18 4:24 ` Xiao Ni
From: Song Liu @ 2022-11-18 2:36 UTC (permalink / raw)
To: Xiao Ni; +Cc: guoqing.jiang, linux-raid, ffan
On Thu, Nov 17, 2022 at 5:39 PM Xiao Ni <xni@redhat.com> wrote:
>
> On Fri, Nov 18, 2022 at 3:57 AM Song Liu <song@kernel.org> wrote:
> >
> > Hi Xiao,
> >
> > Thanks for the results.
> >
> > On Wed, Nov 16, 2022 at 6:03 PM Xiao Ni <xni@redhat.com> wrote:
> > >
> > > Hi Song
> > >
> > > The performance is good. Please check the result below.
> > >
> > > And for the patch itself, do you think we should add a smp_mb
> > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > index 4d0139cae8b5..3696e3825e27 100644
> > > --- a/drivers/md/md.c
> > > +++ b/drivers/md/md.c
> > > @@ -8650,9 +8650,11 @@ static void md_end_io_acct(struct bio *bio)
> > > bio_put(bio);
> > > bio_endio(orig_bio);
> > >
> > > - if (atomic_dec_and_test(&mddev->io_acct_cnt))
> > > + if (atomic_dec_and_test(&mddev->io_acct_cnt)) {
> > > + smp_mb();
> > > if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
> > > wake_up(&mddev->wait_io_acct);
> > > + }
> > > }
> > >
> > > /*
> > > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> > > index 9d4831ca802c..1818f79bfdf7 100644
> > > --- a/drivers/md/raid0.c
> > > +++ b/drivers/md/raid0.c
> > > @@ -757,6 +757,7 @@ static void raid0_quiesce(struct mddev *mddev, int quiesce)
> > > * to member disks to avoid memory alloc and performance decrease
> > > */
> > > set_bit(MD_QUIESCE, &mddev->flags);
> > > + smp_mb();
> > > wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
> > > clear_bit(MD_QUIESCE, &mddev->flags);
> > > }
> > >
> > > Test result:
> >
> > I think there is some noise in the result?
> >
> > >
> > > without patch with patch
> > > psync read 100MB/s 101MB/s job:1 bs:4k
> >
> > For example, this is a small improvement, but
> >
> > > 1015MB/s 1016MB/s job:1 bs:128k
> > > 1359MB/s 1358MB/s job:1 bs:256k
> > > 1394MB/s 1393MB/s job:40 bs:4k
> > > 4959MB/s 4873MB/s job:40 bs:128k
> > > 6166MB/s 6157MB/s job:40 bs:256k
> > >
> > > without patch with patch
> > > psync write 286MB/s 275MB/s job:1 bs:4k
> >
> > this is a big regression (~4%).
> >
> > > 1810MB/s 1808MB/s job:1 bs:128k
> > > 1814MB/s 1814MB/s job:1 bs:256k
> > > 1802MB/s 1801MB/s job:40 bs:4k
> > > 1814MB/s 1814MB/s job:40 bs:128k
> > > 1814MB/s 1814MB/s job:40 bs:256k
> > >
> > > without patch
> > > psync randread 39.3MB/s 39.7MB/s job:1 bs:4k
> > > 791MB/s 783MB/s job:1 bs:128k
> > > 1183MiB/s 1217MB/s job:1 bs:256k
> > > 1183MiB/s 1235MB/s job:40 bs:4k
> > > 3768MB/s 3705MB/s job:40 bs:128k
> >
> > And some regression for 128kB but improvement for 4kB.
> >
> > > 4410MB/s 4418MB/s job:40 bs:256k
> >
> > So I am not quite convinced by these results.
>
> Thanks for pointing out the problem. Maybe I need to do a precondition before
> the testing. I'll give the result again.
> >
> > Also, do we really need an extra counter here? Can we use
> > mddev->active_io instead?
>
> At first, I thought of this too. But active_io is decreased as soon as
> pers->make_request returns, so it can't be used to wait for all bios to
> come back.
Ah, that's right.
>
> Can we decrease mddev->active_io in the bi_end_io of each personality
> instead? Right now only mddev_suspend uses mddev->active_io, and it needs
> to wait for all active I/O to finish. From that point of view, it should
> be better to decrease active_io in bi_end_io. What's your opinion?
I think we can give it a try. It may break some cases, though. If the mdadm
tests still behave the same with the change, we can go ahead with it.
OTOH, depending on the perf results, we may consider using percpu_ref for
both active_io and io_acct_cnt.
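Something along these lines, just as a rough sketch (io_acct_ref and
md_io_acct_release are made-up names, and this is completely untested):

    /* init, e.g. in acct_bioset_init() */
    err = percpu_ref_init(&mddev->io_acct_ref, md_io_acct_release,
                          PERCPU_REF_ALLOW_REINIT, GFP_KERNEL);

    /* submit path (md_account_bio) */
    percpu_ref_get(&mddev->io_acct_ref);

    /* end_io path (md_end_io_acct) */
    percpu_ref_put(&mddev->io_acct_ref);

    /* quiesce: switch to atomic mode and wait for the count to hit zero */
    percpu_ref_kill(&mddev->io_acct_ref);
    wait_event(mddev->wait_io_acct,
               percpu_ref_is_zero(&mddev->io_acct_ref));
    percpu_ref_resurrect(&mddev->io_acct_ref);

where md_io_acct_release() would just do the wake_up(&mddev->wait_io_acct).
In the common case the get/put stay per-cpu, so the fast path doesn't touch
a shared cacheline.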
Thanks,
Song
* Re: [PATCH V2 1/1] Add mddev->io_acct_cnt for raid0_quiesce
2022-11-18 2:36 ` Song Liu
@ 2022-11-18 4:24 ` Xiao Ni
From: Xiao Ni @ 2022-11-18 4:24 UTC (permalink / raw)
To: Song Liu; +Cc: guoqing.jiang, linux-raid, ffan
On Fri, Nov 18, 2022 at 10:37 AM Song Liu <song@kernel.org> wrote:
>
> On Thu, Nov 17, 2022 at 5:39 PM Xiao Ni <xni@redhat.com> wrote:
> >
> > On Fri, Nov 18, 2022 at 3:57 AM Song Liu <song@kernel.org> wrote:
> > >
> > > Hi Xiao,
> > >
> > > Thanks for the results.
> > >
> > > On Wed, Nov 16, 2022 at 6:03 PM Xiao Ni <xni@redhat.com> wrote:
> > > >
> > > > Hi Song
> > > >
> > > > The performance is good. Please check the result below.
> > > >
> > > > And for the patch itself, do you think we should add a smp_mb
> > > > diff --git a/drivers/md/md.c b/drivers/md/md.c
> > > > index 4d0139cae8b5..3696e3825e27 100644
> > > > --- a/drivers/md/md.c
> > > > +++ b/drivers/md/md.c
> > > > @@ -8650,9 +8650,11 @@ static void md_end_io_acct(struct bio *bio)
> > > > bio_put(bio);
> > > > bio_endio(orig_bio);
> > > >
> > > > - if (atomic_dec_and_test(&mddev->io_acct_cnt))
> > > > + if (atomic_dec_and_test(&mddev->io_acct_cnt)) {
> > > > + smp_mb();
> > > > if (unlikely(test_bit(MD_QUIESCE, &mddev->flags)))
> > > > wake_up(&mddev->wait_io_acct);
> > > > + }
> > > > }
> > > >
> > > > /*
> > > > diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> > > > index 9d4831ca802c..1818f79bfdf7 100644
> > > > --- a/drivers/md/raid0.c
> > > > +++ b/drivers/md/raid0.c
> > > > @@ -757,6 +757,7 @@ static void raid0_quiesce(struct mddev *mddev, int quiesce)
> > > > * to member disks to avoid memory alloc and performance decrease
> > > > */
> > > > set_bit(MD_QUIESCE, &mddev->flags);
> > > > + smp_mb();
> > > > wait_event(mddev->wait_io_acct, !atomic_read(&mddev->io_acct_cnt));
> > > > clear_bit(MD_QUIESCE, &mddev->flags);
> > > > }
> > > >
> > > > Test result:
> > >
> > > I think there is some noise in the result?
> > >
> > > >
> > > > without patch with patch
> > > > psync read 100MB/s 101MB/s job:1 bs:4k
> > >
> > > For example, this is a small improvement, but
> > >
> > > > 1015MB/s 1016MB/s job:1 bs:128k
> > > > 1359MB/s 1358MB/s job:1 bs:256k
> > > > 1394MB/s 1393MB/s job:40 bs:4k
> > > > 4959MB/s 4873MB/s job:40 bs:128k
> > > > 6166MB/s 6157MB/s job:40 bs:256k
> > > >
> > > > without patch with patch
> > > > psync write 286MB/s 275MB/s job:1 bs:4k
> > >
> > > this is a big regression (~4%).
> > >
> > > > 1810MB/s 1808MB/s job:1 bs:128k
> > > > 1814MB/s 1814MB/s job:1 bs:256k
> > > > 1802MB/s 1801MB/s job:40 bs:4k
> > > > 1814MB/s 1814MB/s job:40 bs:128k
> > > > 1814MB/s 1814MB/s job:40 bs:256k
> > > >
> > > > without patch
> > > > psync randread 39.3MB/s 39.7MB/s job:1 bs:4k
> > > > 791MB/s 783MB/s job:1 bs:128k
> > > > 1183MiB/s 1217MB/s job:1 bs:256k
> > > > 1183MiB/s 1235MB/s job:40 bs:4k
> > > > 3768MB/s 3705MB/s job:40 bs:128k
> > >
> > > And some regression for 128kB but improvement for 4kB.
> > >
> > > > 4410MB/s 4418MB/s job:40 bs:256k
> > >
> > > So I am not quite convinced by these results.
> >
> > Thanks for pointing out the problem. Maybe I need to do a precondition before
> > the testing. I'll give the result again.
> > >
> > > Also, do we really need an extra counter here? Can we use
> > > mddev->active_io instead?
> >
> > At first, I thought of this too. But active_io is decreased as soon as
> > pers->make_request returns, so it can't be used to wait for all bios to
> > come back.
>
> Ah, that's right.
>
> >
> > Can we decrease mddev->active_io in the bi_end_io of each personality
> > instead? Right now only mddev_suspend uses mddev->active_io, and it needs
> > to wait for all active I/O to finish. From that point of view, it should
> > be better to decrease active_io in bi_end_io. What's your opinion?
>
> I think we can give it a try. It may break some cases, though. If the mdadm
> tests still behave the same with the change, we can go ahead with it.
What are the cases?
>
> OTOH, depending on the perf results, we may consider using percpu_ref for
> both active_io and io_acct_cnt.
Thanks. I'll give it a try.
Regards
Xiao