* Re: [dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths
       [not found] ` <1660769316-5302-4-git-send-email-bmarzins@redhat.com>
@ 2022-08-22 16:15   ` Martin Wilck
  2022-08-22 17:46     ` Benjamin Marzinski
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Wilck @ 2022-08-22 16:15 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: wuguanghao3, dm-devel

On Wed, 2022-08-17 at 15:48 -0500, Benjamin Marzinski wrote:
> If there are a very large number of paths that all get checked at the
> same time, it is possible for the path checkers to starve out other
> vecs->lock users, like uevent processing. To avoid this, if the path
> checkers are taking a long time, checkerloop now occasionally unlocks
> and allows other vecs->lock waiters to run.
> 
> In order to make sure that path checking is always making real
> progress, checkerloop() only checks if it should unlock every 128 path
> checks. To check, it sees if there are waiters and more than a second
> has passed since it acquired the vecs->lock. If both of these are true,
> it drops the lock and calls nanosleep() to schedule.
> 
> When checkerloop() reacquires the lock, it may no longer be at the
> correct place in the vector. While paths can be deleted from any point
> of the pathvec, they can only be added at the end. This means that the
> next path to check must be between its previous location and the start
> of the vector. To help determine the correct starting place,
> checkerloop() marks each path as not checked at the start of each
> checker loop. New paths start in the unchecked state. As paths are
> checked, they are marked as such. If the checker loop is interrupted,
> when it reacquires the lock, it will iterate backwards from the last
> location it checked to the start of the vector. The first path it finds
> that is marked as checked must be the last path it checked.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/structs.h |  1 +
>  multipathd/main.c      | 82 +++++++++++++++++++++++++++++++++---------
>  2 files changed, 67 insertions(+), 16 deletions(-)
> 
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index a6a09441..9d4fb9c8 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -351,6 +351,7 @@ struct path {
>         int fast_io_fail;
>         unsigned int dev_loss;
>         int eh_deadline;
> +       bool is_checked;
>         /* configlet pointers */
>         vector hwe;
>         struct gen_path generic_path;
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 71079113..37c04018 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2548,6 +2548,11 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
>         }
>         return 1;
>  }
> +enum checker_state {
> +       CHECKER_STARTING,
> +       CHECKER_RUNNING,
> +       CHECKER_FINISHED,
> +};
>  
>  static void *
>  checkerloop (void *ap)
> @@ -2555,7 +2560,6 @@ checkerloop (void *ap)
>         struct vectors *vecs;
>         struct path *pp;
>         int count = 0;
> -       unsigned int i;
>         struct timespec last_time;
>         struct config *conf;
>         int foreign_tick = 0;
> @@ -2581,8 +2585,9 @@ checkerloop (void *ap)
>  
>         while (1) {
>                 struct timespec diff_time, start_time, end_time;
> -               int num_paths = 0, strict_timing, rc = 0;
> +               int num_paths = 0, strict_timing, rc = 0, i = 0;
>                 unsigned int ticks = 0;
> +               enum checker_state checker_state = CHECKER_STARTING;
>  
>                 if (set_config_state(DAEMON_RUNNING) != DAEMON_RUNNING)
>                         /* daemon shutdown */
> @@ -2603,22 +2608,67 @@ checkerloop (void *ap)
>                 if (use_watchdog)
>                         sd_notify(0, "WATCHDOG=1");
>  #endif
> +               while (checker_state != CHECKER_FINISHED) {
> +                       unsigned int paths_checked = 0;
> +                       struct timespec chk_start_time;
>  
> -               pthread_cleanup_push(cleanup_lock, &vecs->lock);
> -               lock(&vecs->lock);
> -               pthread_testcancel();
> -               vector_foreach_slot (vecs->pathvec, pp, i) {
> -                       rc = check_path(vecs, pp, ticks);
> -                       if (rc < 0) {
> -                               condlog(1, "%s: check_path() failed, removing",
> -                                       pp->dev);
> -                               vector_del_slot(vecs->pathvec, i);
> -                               free_path(pp);
> -                               i--;
> -                       } else
> -                               num_paths += rc;
> +                       pthread_cleanup_push(cleanup_lock, &vecs->lock);
> +                       lock(&vecs->lock);
> +                       pthread_testcancel();
> +                       get_monotonic_time(&chk_start_time);
> +                       if (checker_state == CHECKER_STARTING) {
> +                               vector_foreach_slot(vecs->pathvec, pp, i)
> +                                       pp->is_checked = false;
> +                               i = 0;
> +                               checker_state = CHECKER_RUNNING;
> +                       } else {
> +                               /*
> +                                * Paths could have been removed since we
> +                                * dropped the lock. Find the path to continue
> +                                * checking at. Since paths can be removed from
> +                                * anywhere in the vector, but can only be added
> +                                * at the end, the last checked path must be
> +                                * between its old location and the start of
> +                                * the vector.
> +                                */
> +                               if (i >= VECTOR_SIZE(vecs->pathvec))
> +                                       i = VECTOR_SIZE(vecs->pathvec) - 1;

What if VECTOR_SIZE(vecs->pathvec) == 0? Maybe you should catch that in
the while () condition above?

Martin




* Re: [dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths
  2022-08-22 16:15   ` [dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths Martin Wilck
@ 2022-08-22 17:46     ` Benjamin Marzinski
  2022-08-22 18:11       ` Martin Wilck
  0 siblings, 1 reply; 13+ messages in thread
From: Benjamin Marzinski @ 2022-08-22 17:46 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel, wuguanghao3

On Mon, Aug 22, 2022 at 04:15:01PM +0000, Martin Wilck wrote:
> On Wed, 2022-08-17 at 15:48 -0500, Benjamin Marzinski wrote:
> > If there are a very large number of paths that all get checked at the
> > same time, it is possible for the path checkers to starve out other
> > vecs->lock users, like uevent processing. To avoid this, if the path
> > checkers are taking a long time, checkerloop now occasionally unlocks
> > and allows other vecs->lock waiters to run.
> > 
> > In order to make sure that path checking is always making real
> > progress, checkerloop() only checks if it should unlock every 128 path
> > checks. To check, it sees if there are waiters and more than a second
> > has passed since it acquired the vecs->lock. If both of these are true,
> > it drops the lock and calls nanosleep() to schedule.
> > 
> > When checkerloop() reacquires the lock, it may no longer be at the
> > correct place in the vector. While paths can be deleted from any point
> > of the pathvec, they can only be added at the end. This means that the
> > next path to check must be between its previous location and the start
> > of the vector. To help determine the correct starting place,
> > checkerloop() marks each path as not checked at the start of each
> > checker loop. New paths start in the unchecked state. As paths are
> > checked, they are marked as such. If the checker loop is interrupted,
> > when it reacquires the lock, it will iterate backwards from the last
> > location it checked to the start of the vector. The first path it finds
> > that is marked as checked must be the last path it checked.
> > 
> > Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> > ---
> >  libmultipath/structs.h |  1 +
> >  multipathd/main.c      | 82 +++++++++++++++++++++++++++++++++---------
> >  2 files changed, 67 insertions(+), 16 deletions(-)
> > 
> > diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> > index a6a09441..9d4fb9c8 100644
> > --- a/libmultipath/structs.h
> > +++ b/libmultipath/structs.h
> > @@ -351,6 +351,7 @@ struct path {
> >         int fast_io_fail;
> >         unsigned int dev_loss;
> >         int eh_deadline;
> > +       bool is_checked;
> >         /* configlet pointers */
> >         vector hwe;
> >         struct gen_path generic_path;
> > diff --git a/multipathd/main.c b/multipathd/main.c
> > index 71079113..37c04018 100644
> > --- a/multipathd/main.c
> > +++ b/multipathd/main.c
> > @@ -2548,6 +2548,11 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
> >         }
> >         return 1;
> >  }
> > +enum checker_state {
> > +       CHECKER_STARTING,
> > +       CHECKER_RUNNING,
> > +       CHECKER_FINISHED,
> > +};
> >  
> >  static void *
> >  checkerloop (void *ap)
> > @@ -2555,7 +2560,6 @@ checkerloop (void *ap)
> >         struct vectors *vecs;
> >         struct path *pp;
> >         int count = 0;
> > -       unsigned int i;
> >         struct timespec last_time;
> >         struct config *conf;
> >         int foreign_tick = 0;
> > @@ -2581,8 +2585,9 @@ checkerloop (void *ap)
> >  
> >         while (1) {
> >                 struct timespec diff_time, start_time, end_time;
> > -               int num_paths = 0, strict_timing, rc = 0;
> > +               int num_paths = 0, strict_timing, rc = 0, i = 0;
> >                 unsigned int ticks = 0;
> > +               enum checker_state checker_state = CHECKER_STARTING;
> >  
> >                 if (set_config_state(DAEMON_RUNNING) != DAEMON_RUNNING)
> >                         /* daemon shutdown */
> > @@ -2603,22 +2608,67 @@ checkerloop (void *ap)
> >                 if (use_watchdog)
> >                         sd_notify(0, "WATCHDOG=1");
> >  #endif
> > +               while (checker_state != CHECKER_FINISHED) {
> > +                       unsigned int paths_checked = 0;
> > +                       struct timespec chk_start_time;
> >  
> > -               pthread_cleanup_push(cleanup_lock, &vecs->lock);
> > -               lock(&vecs->lock);
> > -               pthread_testcancel();
> > -               vector_foreach_slot (vecs->pathvec, pp, i) {
> > -                       rc = check_path(vecs, pp, ticks);
> > -                       if (rc < 0) {
> > -                               condlog(1, "%s: check_path() failed, removing",
> > -                                       pp->dev);
> > -                               vector_del_slot(vecs->pathvec, i);
> > -                               free_path(pp);
> > -                               i--;
> > -                       } else
> > -                               num_paths += rc;
> > +                       pthread_cleanup_push(cleanup_lock, &vecs->lock);
> > +                       lock(&vecs->lock);
> > +                       pthread_testcancel();
> > +                       get_monotonic_time(&chk_start_time);
> > +                       if (checker_state == CHECKER_STARTING) {
> > +                               vector_foreach_slot(vecs->pathvec, pp, i)
> > +                                       pp->is_checked = false;
> > +                               i = 0;
> > +                               checker_state = CHECKER_RUNNING;
> > +                       } else {
> > +                               /*
> > +                                * Paths could have been removed since we
> > +                                * dropped the lock. Find the path to continue
> > +                                * checking at. Since paths can be removed from
> > +                                * anywhere in the vector, but can only be added
> > +                                * at the end, the last checked path must be
> > +                                * between its old location and the start of
> > +                                * the vector.
> > +                                */
> > +                               if (i >= VECTOR_SIZE(vecs->pathvec))
> > +                                       i = VECTOR_SIZE(vecs->pathvec) - 1;
> 
> What if VECTOR_SIZE(vecs->pathvec) == 0? Maybe you should catch that in
> the while () condition above?

I could for clarity if you want, but the code is correct as is. If
VECTOR_SIZE() is 0, then i will start at -1. This will cause the while()
loop to immediately exit, since VECTOR_SLOT() checks for i < 0. Right
after the while loop, i gets bumped up to start checking at the first
device (which of course isn't there).  It's the same logic as what
happens if the while() loop searches through the entire pathvec, and
doesn't find any checked paths.  Obviously, the empty vector case does a
bit of unnecessary work after finding out that the vector is empty, and
I could add something like

if (VECTOR_SIZE(vecs->pathvec) == 0) {
	checker_state = CHECKER_FINISHED;
	goto unlock;
}

If you'd prefer.
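
For context, the full resume logic in the patch, including the while()
loop that follows the lines quoted above, looks like this:

if (i >= VECTOR_SIZE(vecs->pathvec))
	i = VECTOR_SIZE(vecs->pathvec) - 1;
while ((pp = VECTOR_SLOT(vecs->pathvec, i))) {
	if (pp->is_checked == true)
		break;
	i--;
}
i++;

With an empty pathvec, i starts at -1, VECTOR_SLOT() returns NULL, the
while() loop body never runs, and the final i++ leaves i at 0 with no
slots to iterate over.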

-Ben

> 
> Martin
> 
> 

* Re: [dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths
  2022-08-22 17:46     ` Benjamin Marzinski
@ 2022-08-22 18:11       ` Martin Wilck
  2022-08-24  2:47         ` Wu Guanghao
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Wilck @ 2022-08-22 18:11 UTC (permalink / raw)
  To: bmarzins; +Cc: wuguanghao3, dm-devel

On Mon, 2022-08-22 at 12:46 -0500, Benjamin Marzinski wrote:
> On Mon, Aug 22, 2022 at 04:15:01PM +0000, Martin Wilck wrote:
> > 
> > > +                               if (i >= VECTOR_SIZE(vecs->pathvec))
> > > +                                       i = VECTOR_SIZE(vecs->pathvec) - 1;
> > 
> > What if VECTOR_SIZE(vecs->pathvec) == 0? Maybe you should catch
> > that in
> > the while () condition above?
> 
> I could for clarity if you want, but the code is correct as is. If
> VECTOR_SIZE() is 0, then i will start at -1. This will cause the
> while()
> loop to immediately exit, since VECTOR_SLOT() checks for i < 0. Right
> after the while loop, i gets bumped up to start checking at the first
> device (which of course isn't there).  It's the same logic as what
> happens if the while() loop searches through the entire pathvec, and
> doesn't find any checked paths.  Obviously, the empty vector case
> does a
> bit of unnecessary work after finding out that the vector is empty,
> and
> I could add something like
> 
> if (VECTOR_SIZE(vecs->pathvec) == 0) {
>         checker_state = CHECKER_FINISHED;
>         goto unlock;
> }
> 
> If you'd prefer.

No, it's fine. I realized that your code was correct after I'd hit
"Send" :-/

Wu Guanghao, do you already have some results to report?

Martin



* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
       [not found] <1660769316-5302-1-git-send-email-bmarzins@redhat.com>
       [not found] ` <1660769316-5302-4-git-send-email-bmarzins@redhat.com>
@ 2022-08-22 18:12 ` Martin Wilck
  2022-09-15  6:56 ` Wu Guanghao
  2 siblings, 0 replies; 13+ messages in thread
From: Martin Wilck @ 2022-08-22 18:12 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: wuguanghao3, dm-devel

On Wed, 2022-08-17 at 15:48 -0500, Benjamin Marzinski wrote:
> When there are a huge number of paths (> 10000), the amount of time
> that the checkerloop can hold the vecs lock for while checking the
> paths can get to be large enough that it starves other vecs lock
> users.  If path checking takes long enough, it's possible that uxlsnr
> threads will never run. To deal with this, this patchset makes it
> possible to drop the vecs lock while checking the paths, and then
> reacquire it and continue with the next path to check.
> 
> My choice of only checking if there are waiters every 128 paths
> checked, and of only interrupting if path checking has taken more than
> a second, is arbitrary. I didn't want to slow down path checking in
> the common case where this isn't an issue, and I wanted to avoid path
> checking getting starved by other vecs->lock users. Having the
> checkerloop wait for 10000 nsec was based on my own testing with a
> setup using 4K multipath devices with 4 paths each. This was almost
> always long enough for the uevent or uxlsnr client to grab the vecs
> lock, but I'm not sure how dependent this is on details of the system.
> For instance, with my setup it never took more than 20 seconds to
> check the paths, and usually looping through all the paths took well
> under 10 seconds, most often under 5. I would only occasionally run
> into situations where a uxlsnr client would time out.
> 
> V2 Changes:
> 0003: Switched to a simpler method of determining the path to continue
>       checking at, as suggested by Martin Wilck. Also fixed a bug when
>       the previous index was larger than the current vector size.
> 
> Benjamin Marzinski (6):
>   multipathd: Use regular pthread_mutex_t for waiter_lock
>   multipathd: track waiters for mutex_lock
>   multipathd: Occasionally allow waiters to interrupt checking paths
>   multipathd: allow uxlsnr clients to interrupt checking paths
>   multipathd: fix uxlsnr timeout
>   multipathd: Don't check if timespec.tv_sec is zero

For the series:

Reviewed-by: Martin Wilck <mwilck@suse.com>


> 
>  libmultipath/lock.h    |  16 +++++
>  libmultipath/structs.h |   1 +
>  multipathd/main.c      | 147 ++++++++++++++++++++++++++---------------
>  multipathd/uxlsnr.c    |  23 +++++--
>  multipathd/uxlsnr.h    |   1 +
>  multipathd/waiter.c    |  14 ++--
>  6 files changed, 136 insertions(+), 66 deletions(-)
> 


* Re: [dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths
  2022-08-22 18:11       ` Martin Wilck
@ 2022-08-24  2:47         ` Wu Guanghao
  0 siblings, 0 replies; 13+ messages in thread
From: Wu Guanghao @ 2022-08-24  2:47 UTC (permalink / raw)
  To: Martin Wilck, bmarzins; +Cc: dm-devel



On 2022/8/23 2:11, Martin Wilck wrote:
> On Mon, 2022-08-22 at 12:46 -0500, Benjamin Marzinski wrote:
>> On Mon, Aug 22, 2022 at 04:15:01PM +0000, Martin Wilck wrote:
>>>
>>>> +                               if (i >= VECTOR_SIZE(vecs->pathvec))
>>>> +                                       i = VECTOR_SIZE(vecs->pathvec) - 1;
>>>
>>> What if VECTOR_SIZE(vecs->pathvec) == 0? Maybe you should catch
>>> that in
>>> the while () condition above?
>>
>> I could for clarity if you want, but the code is correct as is. If
>> VECTOR_SIZE() is 0, then i will start at -1. This will cause the
>> while()
>> loop to immediately exit, since VECTOR_SLOT() checks for i < 0. Right
>> after the while loop, i gets bumped up to start checking at the first
>> device (which of course isn't there).  It's the same logic as what
>> happens if the while() loop searches through the entire pathvec, and
>> doesn't find any checked paths.  Obviously, the empty vector case
>> does a
>> bit of unnecessary work after finding out that the vector is empty,
>> and
>> I could add something like
>>
>> if (VECTOR_SIZE(vecs->pathvec) == 0) {
>>         checker_state = CHECKER_FINISHED;
>>         goto unlock;
>> }
>>
>> If you'd prefer.
> 
> No, it's fine. I realized that your code was correct after I'd hit
> "Send" :-/
> 
> Wu Guanghaho, have you already some results to report?
> 
The test environment is not ready yet; it may take until next week to have results.
> Martin
> 
> 
> .
> 


* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
       [not found] <1660769316-5302-1-git-send-email-bmarzins@redhat.com>
       [not found] ` <1660769316-5302-4-git-send-email-bmarzins@redhat.com>
  2022-08-22 18:12 ` [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted Martin Wilck
@ 2022-09-15  6:56 ` Wu Guanghao
  2022-09-15 18:17   ` Benjamin Marzinski
  2022-09-20  6:45   ` Martin Wilck
  2 siblings, 2 replies; 13+ messages in thread
From: Wu Guanghao @ 2022-09-15  6:56 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: linfeilong, lixiaokeng, device-mapper development, Martin Wilck

Sorry for the late feedback.

The version we are currently testing is 0.8.4, so we only merged the
first 3 patches in this series. In actual testing, we found that the
improvement was not very noticeable.

Test environment: 1000 multipath devices, 16 paths per device
Test command:  Aggregate multipath devices using multipathd add path
Time consumed (time for 16 paths to aggregate into 1 multipath device):
1. Original:
	< 4s:   76.9%;
	4s~10s: 18.4%;
	>10s:	 4.7%;
2. After using the patches:
	< 4s:	79.1%;
	4s~10s: 18.2%;
	>10s:	 2.7%;

From the results, the improvement from the patch is not obvious, so we
made a few changes to it, and it then worked as expected. The
modifications are as follows:

paths_checked was made configurable; we currently set n = 2.
We removed the check on diff_time.
The sleep was changed from 10us to 5ms.

if (++paths_checked % n == 0 &&
    lock_has_waiters(&vecs->lock)) {
	get_monotonic_time(&end_time);
	timespecsub(&end_time, &chk_start_time,
		    &diff_time);
	if (diff_time.tv_sec > 0) // Deleted this check; goto directly
		goto unlock;
}

if (checker_state != CHECKER_FINISHED) {
	/* Yield to waiters */
	//struct timespec wait = { .tv_nsec = 10000, };
	//nanosleep(&wait, NULL);
	usleep(5000);  // sleep changed from 10us to 5ms
}

Time consumed (after making the above changes to the patch):
< 4s: 99.5%;
= 4s: 0.5%;
> 4s: 0%;

On 2022/8/18 4:48, Benjamin Marzinski wrote:
> When there are a huge number of paths (> 10000), the amount of time that
> the checkerloop can hold the vecs lock for while checking the paths can
> get to be large enough that it starves other vecs lock users.  If path
> checking takes long enough, it's possible that uxlsnr threads will never
> run. To deal with this, this patchset makes it possible to drop the vecs
> lock while checking the paths, and then reacquire it and continue with
> the next path to check.
> 
> My choice of only checking if there are waiters every 128 paths checked,
> and of only interrupting if path checking has taken more than a second,
> is arbitrary. I didn't want to slow down path checking in the common case
> where this isn't an issue, and I wanted to avoid path checking getting
> starved by other vecs->lock users. Having the checkerloop wait for 10000
> nsec was based on my own testing with a setup using 4K multipath devices
> with 4 paths each. This was almost always long enough for the uevent or
> uxlsnr client to grab the vecs lock, but I'm not sure how dependent this
> is on details of the system. For instance, with my setup it never took
> more than 20 seconds to check the paths, and usually looping through all
> the paths took well under 10 seconds, most often under 5. I would only
> occasionally run into situations where a uxlsnr client would time out.
> 
> V2 Changes:
> 0003: Switched to a simpler method of determining the path to continue
>       checking at, as suggested by Martin Wilck. Also fixed a bug when
>       the previous index was larger than the current vector size.
> 
> Benjamin Marzinski (6):
>   multipathd: Use regular pthread_mutex_t for waiter_lock
>   multipathd: track waiters for mutex_lock
>   multipathd: Occasionally allow waiters to interrupt checking paths
>   multipathd: allow uxlsnr clients to interrupt checking paths
>   multipathd: fix uxlsnr timeout
>   multipathd: Don't check if timespec.tv_sec is zero
> 
>  libmultipath/lock.h    |  16 +++++
>  libmultipath/structs.h |   1 +
>  multipathd/main.c      | 147 ++++++++++++++++++++++++++---------------
>  multipathd/uxlsnr.c    |  23 +++++--
>  multipathd/uxlsnr.h    |   1 +
>  multipathd/waiter.c    |  14 ++--
>  6 files changed, 136 insertions(+), 66 deletions(-)
> 


* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
  2022-09-15  6:56 ` Wu Guanghao
@ 2022-09-15 18:17   ` Benjamin Marzinski
  2022-09-20  9:20     ` Wu Guanghao
  2022-09-20  6:45   ` Martin Wilck
  1 sibling, 1 reply; 13+ messages in thread
From: Benjamin Marzinski @ 2022-09-15 18:17 UTC (permalink / raw)
  To: Wu Guanghao
  Cc: linfeilong, lixiaokeng, device-mapper development, Martin Wilck

On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
> Sorry for the late feedback.
> 
> The version we are currently testing is 0.8.4, so we only merged the
> first 3 patches in this series. In actual testing, we found that the
> improvement was not very noticeable.
> 
> Test environment: 1000 multipath devices, 16 paths per device
> Test command:  Aggregate multipath devices using multipathd add path
> Time consumed (time for 16 paths to aggregate into 1 multipath device):
> 1. Original:
> 	< 4s:   76.9%;
> 	4s~10s: 18.4%;
> 	>10s:	 4.7%;
> 2. After using the patches:
> 	< 4s:	79.1%;
> 	4s~10s: 18.2%;
> 	>10s:	 2.7%;
> 
> From the results, the improvement from the patch is not obvious, so we
> made a few changes to it, and it then worked as expected. The
> modifications are as follows:
> 
> paths_checked was made configurable; we currently set n = 2.
> We removed the check on diff_time.
> The sleep was changed from 10us to 5ms.

I worry that this is giving too much deference to the uevents. If there
is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
worried that this will make it take significantly longer for the path
checker to complete a cycle.  Making paths_checked configurable, so that
this doesn't trigger too often, does help to avoid the issue that
checking the time from chk_start_time was dealing with, and makes the
mechanics of this simpler.  But 5ms seems like a long time to have to
wait just to make sure that another process has had time to grab the
vecs lock.  Perhaps it would be better to sleep for a shorter length of
time, but in a loop, where we can check to see if progress has been
made, perhaps by checking some counter of events and user commands
serviced. This way, we aren't sleeping too long for no good reason.
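
Something along these lines, as a rough sketch; the counter
(vecs_lock_serviced here) is hypothetical and would have to be bumped by
the uevent and uxlsnr code whenever they finish a vecs->lock critical
section:

/* hypothetical counter of serviced events/commands; reads are just a hint */
extern unsigned int vecs_lock_serviced;

static void yield_to_lock_waiters(void)
{
	/* sleep in 100us slices instead of one long 5ms sleep */
	struct timespec wait = { .tv_nsec = 100 * 1000 };
	unsigned int start = vecs_lock_serviced;
	int tries;

	for (tries = 0; tries < 50; tries++) {
		nanosleep(&wait, NULL);
		if (vecs_lock_serviced != start)
			break;	/* a waiter made progress; resume checking */
	}
}

This still caps the wait at the same 5ms total, but usually returns much
sooner.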

> if (++paths_checked % n == 0 &&
>     lock_has_waiters(&vecs->lock)) {
> 	get_monotonic_time(&end_time);
> 	timespecsub(&end_time, &chk_start_time,
> 		    &diff_time);
> 	if (diff_time.tv_sec > 0) // Deleted this check; goto directly
> 		goto unlock;
> }
> 
> if (checker_state != CHECKER_FINISHED) {
> 	/* Yield to waiters */
> 	//struct timespec wait = { .tv_nsec = 10000, };
> 	//nanosleep(&wait, NULL);
> 	usleep(5000);  // sleep changed from 10us to 5ms
> }
> 
> Time consumed (after making the above changes to the patch):
> < 4s: 99.5%;
> = 4s: 0.5%;
> > 4s: 0%;

Since I believe that this is likely causing a real impact on the checker
speed, I'm not sure that we're looking for results like this. Are you
seeing that "path checkers took longer than ..." message? How long does
it say the checker is taking (and what do you have max_polling_interval
set to)?

-Ben

> On 2022/8/18 4:48, Benjamin Marzinski wrote:
> > When there are a huge number of paths (> 10000), the amount of time that
> > the checkerloop can hold the vecs lock for while checking the paths can
> > get to be large enough that it starves other vecs lock users.  If path
> > checking takes long enough, it's possible that uxlsnr threads will never
> > run. To deal with this, this patchset makes it possible to drop the vecs
> > lock while checking the paths, and then reacquire it and continue with
> > the next path to check.
> > 
> > My choice of only checking if there are waiters every 128 paths checked,
> > and of only interrupting if path checking has taken more than a second,
> > is arbitrary. I didn't want to slow down path checking in the common case
> > where this isn't an issue, and I wanted to avoid path checking getting
> > starved by other vecs->lock users. Having the checkerloop wait for 10000
> > nsec was based on my own testing with a setup using 4K multipath devices
> > with 4 paths each. This was almost always long enough for the uevent or
> > uxlsnr client to grab the vecs lock, but I'm not sure how dependent this
> > is on details of the system. For instance, with my setup it never took
> > more than 20 seconds to check the paths, and usually looping through all
> > the paths took well under 10 seconds, most often under 5. I would only
> > occasionally run into situations where a uxlsnr client would time out.
> > 
> > V2 Changes:
> > 0003: Switched to a simpler method of determining the path to continue
> >       checking at, as suggested by Martin Wilck. Also fixed a bug when
> >       the previous index was larger than the current vector size.
> > 
> > Benjamin Marzinski (6):
> >   multipathd: Use regular pthread_mutex_t for waiter_lock
> >   multipathd: track waiters for mutex_lock
> >   multipathd: Occasionally allow waiters to interrupt checking paths
> >   multipathd: allow uxlsnr clients to interrupt checking paths
> >   multipathd: fix uxlsnr timeout
> >   multipathd: Don't check if timespec.tv_sec is zero
> > 
> >  libmultipath/lock.h    |  16 +++++
> >  libmultipath/structs.h |   1 +
> >  multipathd/main.c      | 147 ++++++++++++++++++++++++++---------------
> >  multipathd/uxlsnr.c    |  23 +++++--
> >  multipathd/uxlsnr.h    |   1 +
> >  multipathd/waiter.c    |  14 ++--
> >  6 files changed, 136 insertions(+), 66 deletions(-)
> > 

* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
  2022-09-15  6:56 ` Wu Guanghao
  2022-09-15 18:17   ` Benjamin Marzinski
@ 2022-09-20  6:45   ` Martin Wilck
  1 sibling, 0 replies; 13+ messages in thread
From: Martin Wilck @ 2022-09-20  6:45 UTC (permalink / raw)
  To: bmarzins, wuguanghao3; +Cc: linfeilong, lixiaokeng, dm-devel

On Thu, 2022-09-15 at 14:56 +0800, Wu Guanghao wrote:
> Sorry for the late feedback.
> 
> The version we are currently testing is 0.8.4, so we only merge the
> first 3 patches in this series of patches. Then after the actual
> test,
> it was found that the effect improvement is not very obvious.
> 

Which path checker are you using? 
If it's TUR, could you try to simply change the sync wait time?

static void tur_timeout(struct timespec *tsp)
{
	get_monotonic_time(tsp);
	tsp->tv_nsec += 1000 * 1000; /* 1 millisecond */
	normalize_timespec(tsp);
}

Change the 1 millisecond above to e.g. one microsecond. That should
speed up the checker significantly. You will get a higher rate of
"pending" path states, but that shouldn't matter much.

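That is, just this one line changed (a sketch of the suggestion, not a
tested patch):

	tsp->tv_nsec += 1000; /* 1 microsecond */
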
Regards
Martin


* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
  2022-09-15 18:17   ` Benjamin Marzinski
@ 2022-09-20  9:20     ` Wu Guanghao
  2022-10-17 11:31       ` Wu Guanghao
  0 siblings, 1 reply; 13+ messages in thread
From: Wu Guanghao @ 2022-09-20  9:20 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: linfeilong, lixiaokeng, device-mapper development, Martin Wilck



On 2022/9/16 2:17, Benjamin Marzinski wrote:
> On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
>> Sorry for the late feedback.
>>
>> The version we are currently testing is 0.8.4, so we only merged the
>> first 3 patches in this series. In actual testing, we found that the
>> improvement was not very noticeable.
>>
>> Test environment: 1000 multipath devices, 16 paths per device
>> Test command:  Aggregate multipath devices using multipathd add path
>> Time consumed (time for 16 paths to aggregate into 1 multipath device):
>> 1. Original:
>> 	< 4s:   76.9%;
>> 	4s~10s: 18.4%;
>> 	>10s:	 4.7%;
>> 2. After using the patches:
>> 	< 4s:	79.1%;
>> 	4s~10s: 18.2%;
>> 	>10s:	 2.7%;
>>
>> From the results, the improvement from the patch is not obvious, so we
>> made a few changes to it, and it then worked as expected. The
>> modifications are as follows:
>>
>> paths_checked was made configurable; we currently set n = 2.
>> We removed the check on diff_time.
>> The sleep was changed from 10us to 5ms.
> 
> I worry that this is giving too much deference to the uevents. If there
> is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
> worried that this will make it take significantly longer for the path
> checker to complete a cycle.  Making paths_checked configurable, so that
> this doesn't trigger too often, does help to avoid the issue that
> checking the time from chk_start_time was dealing with, and makes the
> mechanics of this simpler.  But 5ms seems like a long time to have to
> wait just to make sure that another process has had time to grab the
> vecs lock.  Perhaps it would be better to sleep for a shorter length of
> time, but in a loop, where we can check to see if progress has been
> made, perhaps by checking some counter of events and user commands
> serviced. This way, we aren't sleeping too long for no good reason.
I agree with this; we are also going to adjust the sleep time and then
continue the test.

>> if (++paths_checked % n == 0 &&
>>     lock_has_waiters(&vecs->lock)) {
>> 	get_monotonic_time(&end_time);
>> 	timespecsub(&end_time, &chk_start_time,
>> 		    &diff_time);
>> 	if (diff_time.tv_sec > 0) // Deleted this check; goto directly
>> 		goto unlock;
>> }
>>
>> if (checker_state != CHECKER_FINISHED) {
>> 	/* Yield to waiters */
>> 	//struct timespec wait = { .tv_nsec = 10000, };
>> 	//nanosleep(&wait, NULL);
>> 	usleep(5000);  // sleep changed from 10us to 5ms
>> }
>>
>> Time consumed (after making the above changes to the patch):
>> < 4s: 99.5%;
>> = 4s: 0.5%;
>> > 4s: 0%;
> 
> Since I believe that this is likely causing a real impact on the checker
> speed, I'm not sure that we're looking for results like this. Are you
> seeing that "path checkers took longer than ..." message? How long does
> it say the checker is taking (and what do you have max_polling_interval
> set to)?
Yes, I saw this message. In the current test environment, using the original
code, all path checks take at least 50s to complete. We currently set
max_polling_interval=20s, so it must print this message. Adding sleep just
makes this message print more often.

> -Ben
> 


* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
  2022-09-20  9:20     ` Wu Guanghao
@ 2022-10-17 11:31       ` Wu Guanghao
  2022-10-24  2:22         ` Wu Guanghao
  0 siblings, 1 reply; 13+ messages in thread
From: Wu Guanghao @ 2022-10-17 11:31 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: linfeilong, lixiaokeng, device-mapper development, Martin Wilck



On 2022/9/20 17:20, Wu Guanghao wrote:
> 
> 
> On 2022/9/16 2:17, Benjamin Marzinski wrote:
>> On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
>>> Sorry for the late feedback.
>>>
>>> The version we are currently testing is 0.8.4, so we only merged the
>>> first 3 patches in this series. In actual testing, we found that the
>>> improvement was not very noticeable.
>>>
>>> Test environment: 1000 multipath devices, 16 paths per device
>>> Test command:  Aggregate multipath devices using multipathd add path
>>> Time consumed (time for 16 paths to aggregate into 1 multipath device):
>>> 1. Original:
>>> 	< 4s:   76.9%;
>>> 	4s~10s: 18.4%;
>>> 	>10s:	 4.7%;
>>> 2. After using the patches:
>>> 	< 4s:	79.1%;
>>> 	4s~10s: 18.2%;
>>> 	>10s:	 2.7%;
>>>
>>> From the results, the improvement from the patch is not obvious, so we
>>> made a few changes to it, and it then worked as expected. The
>>> modifications are as follows:
>>>
>>> paths_checked was made configurable; we currently set n = 2.
>>> We removed the check on diff_time.
>>> The sleep was changed from 10us to 5ms.
>>
>> I worry that this is giving too much deference to the uevents. If there
>> is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
>> worried that this will make it take significantly longer for the path
>> checker to complete a cycle.  Making paths_checked configurable, so that
>> this doesn't trigger too often, does help to avoid the issue that
>> checking the time from chk_start_time was dealing with, and makes the
>> mechanics of this simpler.  But 5ms seems like a long time to have to
>> wait just to make sure that another process has had time to grab the
>> vecs lock.  Perhaps it would be better to sleep for a shorter length of
>> time, but in a loop, where we can check to see if progress has been
>> made, perhaps by checking some counter of events and user commands
>> serviced. This way, we aren't sleeping too long for no good reason.
> I agree with this; we are also going to adjust the sleep time and then
> continue the test.

We have tested delay times of 10us, 100us and 1ms, but the results
weren't very good: more than 30% of the LUNs still took longer than 4s
to aggregate. Could the delay time also be made a configurable item, so
that it can be set according to the situation?

The following is the patch after our modifications to checkerloop().

Signed-off-by: wuguanghao <wuguanghao3@huawei.com>
---
 libmultipath/config.c   |  3 ++
 libmultipath/config.h   |  3 ++
 libmultipath/defaults.h |  3 ++
 libmultipath/dict.c     | 58 +++++++++++++++++++++++++++
 libmultipath/structs.h  |  1 +
 multipathd/main.c       | 87 +++++++++++++++++++++++++++++++++--------
 6 files changed, 138 insertions(+), 17 deletions(-)

diff --git a/libmultipath/config.c b/libmultipath/config.c
index 7d0d711..7658626 100644
--- a/libmultipath/config.c
+++ b/libmultipath/config.c
@@ -752,6 +752,9 @@ load_config (char * file)
 	conf->remove_retries = 0;
 	conf->ghost_delay = DEFAULT_GHOST_DELAY;
 	conf->all_tg_pt = DEFAULT_ALL_TG_PT;
+	conf->check_delay_time = DEFAULT_CHECK_DELAY_TIME;
+	conf->check_path_num_per_splice = DEFAULT_CHECK_PATH_NUM_PER_SPLICE;
+
 	/*
 	 * preload default hwtable
 	 */
diff --git a/libmultipath/config.h b/libmultipath/config.h
index ceecff2..a26dd99 100644
--- a/libmultipath/config.h
+++ b/libmultipath/config.h
@@ -229,6 +229,9 @@ struct config {
 	vector elist_property;
 	vector elist_protocol;
 	char *enable_foreign;
+
+	int check_delay_time;
+	int check_path_num_per_splice;
 };

 extern struct udev * udev;
diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
index 52fe05b..b3fb1bc 100644
--- a/libmultipath/defaults.h
+++ b/libmultipath/defaults.h
@@ -50,6 +50,9 @@
 #define DEFAULT_FIND_MULTIPATHS_TIMEOUT -10
 #define DEFAULT_UNKNOWN_FIND_MULTIPATHS_TIMEOUT 1
 #define DEFAULT_ALL_TG_PT ALL_TG_PT_OFF
+#define DEFAULT_CHECK_DELAY_TIME 10
+#define DEFAULT_CHECK_PATH_NUM_PER_SPLICE 1
+
 /* Enable all foreign libraries by default */
 #define DEFAULT_ENABLE_FOREIGN ""

diff --git a/libmultipath/dict.c b/libmultipath/dict.c
index 2c8ea10..2262caa 100644
--- a/libmultipath/dict.c
+++ b/libmultipath/dict.c
@@ -1407,6 +1407,46 @@ def_uxsock_timeout_handler(struct config *conf, vector strvec)
 	free(buff);
 	return 0;
 }
+static int
+def_check_delay_time_handler(struct config *conf, vector strvec)
+{
+	int local_check_delay_time;
+	char *buff;
+
+	buff = set_value(strvec);
+	if (!buff)
+		return 1;
+
+	if (sscanf(buff, "%u", &local_check_delay_time) == 1 &&
+	    local_check_delay_time > DEFAULT_CHECK_DELAY_TIME)
+		conf->check_delay_time = local_check_delay_time;
+	else
+		conf->check_delay_time = DEFAULT_CHECK_DELAY_TIME;
+
+	free(buff);
+	return 0;
+}
+
+static int
+def_check_path_num_per_splice_handler(struct config *conf, vector strvec)
+{
+	int local_check_path_num_per_splice;
+	char *buff;
+
+	buff = set_value(strvec);
+	if (!buff)
+		return 1;
+
+	if (sscanf(buff, "%u", &local_check_path_num_per_splice) == 1 &&
+	    local_check_path_num_per_splice > DEFAULT_CHECK_PATH_NUM_PER_SPLICE)
+		conf->check_path_num_per_splice = local_check_path_num_per_splice;
+	else
+		conf->check_path_num_per_splice = DEFAULT_CHECK_PATH_NUM_PER_SPLICE;
+
+	free(buff);
+	return 0;
+}
+

 static int
 hw_vpd_vendor_handler(struct config *conf, vector strvec)
@@ -1546,6 +1586,21 @@ snprint_def_uxsock_timeout(struct config *conf, char * buff, int len,
 	return snprintf(buff, len, "%u", conf->uxsock_timeout);
 }

+static int
+snprint_def_check_delay_time(struct config *conf, char * buff, int len,
+			   const void *data)
+{
+	return snprintf(buff, len, "%u", conf->check_delay_time);
+}
+
+static int
+snprint_def_check_path_num_per_splice(struct config *conf, char * buff, int len,
+			   const void *data)
+{
+	return snprintf(buff, len, "%u", conf->check_path_num_per_splice);
+}
+
+
 static int
 snprint_ble_simple (struct config *conf, char * buff, int len,
 		    const void * data)
@@ -1804,6 +1859,9 @@ init_keywords(vector keywords)
 	install_keyword("enable_foreign", &def_enable_foreign_handler,
 			&snprint_def_enable_foreign);
 	install_keyword("marginal_pathgroups", &def_marginal_pathgroups_handler, &snprint_def_marginal_pathgroups);
+	install_keyword("check_delay_time", &def_check_delay_time_handler, &snprint_def_check_delay_time);
+	install_keyword("check_path_num_per_splice", &def_check_path_num_per_splice_handler, &snprint_def_check_path_num_per_splice);
+
 	__deprecated install_keyword("default_selector", &def_selector_handler, NULL);
 	__deprecated install_keyword("default_path_grouping_policy", &def_pgpolicy_handler, NULL);
 	__deprecated install_keyword("default_uid_attribute", &def_uid_attribute_handler, NULL);
diff --git a/libmultipath/structs.h b/libmultipath/structs.h
index 11e516a..391de96 100644
--- a/libmultipath/structs.h
+++ b/libmultipath/structs.h
@@ -288,6 +288,7 @@ struct path {
 	int find_multipaths_timeout;
 	int marginal;
 	int vpd_vendor_id;
+	bool is_checked;
 	/* configlet pointers */
 	vector hwe;
 	struct gen_path generic_path;
diff --git a/multipathd/main.c b/multipathd/main.c
index 5035d5b..865b7b5 100644
--- a/multipathd/main.c
+++ b/multipathd/main.c
@@ -2388,6 +2388,11 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
 	}
 	return 1;
 }
+enum checker_state {
+	CHECKER_STARTING,
+	CHECKER_RUNNING,
+	CHECKER_FINISHED,
+};

 static void *
 checkerloop (void *ap)
@@ -2395,11 +2400,11 @@ checkerloop (void *ap)
 	struct vectors *vecs;
 	struct path *pp;
 	int count = 0;
-	unsigned int i;
 	struct timespec last_time;
 	struct config *conf;
 	int foreign_tick = 0;
 	bool use_watchdog;
+	int check_delay_time, check_path_num_per_splice;

 	pthread_cleanup_push(rcu_unregister, NULL);
 	rcu_register_thread();
@@ -2414,12 +2419,15 @@ checkerloop (void *ap)
 	/* use_watchdog is set from process environment and never changes */
 	conf = get_multipath_config();
 	use_watchdog = conf->use_watchdog;
+	check_delay_time = conf->check_delay_time;
+	check_path_num_per_splice = conf->check_path_num_per_splice;
 	put_multipath_config(conf);

 	while (1) {
 		struct timespec diff_time, start_time, end_time;
-		int num_paths = 0, strict_timing, rc = 0;
+		int num_paths = 0, strict_timing, rc = 0, i = 0;
 		unsigned int ticks = 0;
+		enum checker_state checker_state = CHECKER_STARTING;

 		get_monotonic_time(&start_time);
 		if (start_time.tv_sec && last_time.tv_sec) {
@@ -2443,21 +2451,66 @@ checkerloop (void *ap)
 		} else if (rc == EINVAL)
 			/* daemon shutdown */
 			break;
-
-		pthread_cleanup_push(cleanup_lock, &vecs->lock);
-		lock(&vecs->lock);
-		pthread_testcancel();
-		vector_foreach_slot (vecs->pathvec, pp, i) {
-			rc = check_path(vecs, pp, ticks);
-			if (rc < 0) {
-				vector_del_slot(vecs->pathvec, i);
-				free_path(pp);
-				i--;
-			} else
-				num_paths += rc;
-		}
-		lock_cleanup_pop(vecs->lock);
-
+		condlog(4, "check_delay_time is %u, check_path_num_per_splice is %u", check_delay_time, check_path_num_per_splice);
+
+		while (checker_state != CHECKER_FINISHED) {
+			unsigned int paths_checked = 0;
+			struct timespec chk_start_time;
+			pthread_cleanup_push(cleanup_lock, &vecs->lock);
+			lock(&vecs->lock);
+			pthread_testcancel();
+			get_monotonic_time(&chk_start_time);
+			if (checker_state == CHECKER_STARTING) {
+				vector_foreach_slot(vecs->pathvec, pp, i)
+					pp->is_checked = false;
+				i = 0;
+				checker_state = CHECKER_RUNNING;
+			} else {
+				/*
+				 * Paths could have been removed since we
+				 * dropped the lock. Find the path to continue
+				 * checking at. Since paths can be removed from
+				 * anywhere in the vector, but can only be added
+				 * at the end, the last checked path must be
+				 * between its old location and the start of
+				 * the vector.
+				 */
+				if (i >= VECTOR_SIZE(vecs->pathvec))
+					i = VECTOR_SIZE(vecs->pathvec) - 1;
+				while ((pp = VECTOR_SLOT(vecs->pathvec, i))) {
+					if (pp->is_checked == true)
+						break;
+					i--;
+				}
+				i++;
+			}
+			vector_foreach_slot_after (vecs->pathvec, pp, i) {
+				pp->is_checked = true;
+				rc = check_path(vecs, pp, ticks);
+				if (rc < 0) {
+					condlog(1, "%s: check_path() failed, removing",
+						pp->dev);
+					vector_del_slot(vecs->pathvec, i);
+					free_path(pp);
+					i--;
+				} else
+					num_paths += rc;
+				if (++paths_checked % check_path_num_per_splice == 0 &&
+				    lock_has_waiters(&vecs->lock)) {
+					get_monotonic_time(&end_time);
+					timespecsub(&end_time, &chk_start_time,
+						    &diff_time);
+					goto unlock;
+				}
+			}
+			checker_state = CHECKER_FINISHED;
+unlock:
+			lock_cleanup_pop(vecs->lock);
+			if (checker_state != CHECKER_FINISHED) {
+				/* Yield to waiters */
+				usleep(check_delay_time);
+			}
+		}
 		pthread_cleanup_push(cleanup_lock, &vecs->lock);
 		lock(&vecs->lock);
 		pthread_testcancel();
-- 
2.27.0
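
For reference, with this patch the new options go in the defaults
section of multipath.conf. A sketch using the values we tested
(check_delay_time is in microseconds, since it is passed directly to
usleep()):

defaults {
	# 5ms yield whenever lock waiters are present
	check_delay_time 5000
	# check for waiters after every 2 paths
	check_path_num_per_splice 2
}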


* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
  2022-10-17 11:31       ` Wu Guanghao
@ 2022-10-24  2:22         ` Wu Guanghao
  2022-10-24  7:59           ` Martin Wilck
  0 siblings, 1 reply; 13+ messages in thread
From: Wu Guanghao @ 2022-10-24  2:22 UTC (permalink / raw)
  To: Benjamin Marzinski
  Cc: linfeilong, lixiaokeng, device-mapper development, Martin Wilck

Friendly ping ...
Is the series of patches for vecs->lock ready to be merged into mainline?
Thanks

On 2022/10/17 19:31, Wu Guanghao wrote:
> 
> 
> On 2022/9/20 17:20, Wu Guanghao wrote:
>>
>>
>> On 2022/9/16 2:17, Benjamin Marzinski wrote:
>>> On Thu, Sep 15, 2022 at 02:56:36PM +0800, Wu Guanghao wrote:
>>>> Sorry for the late feedback.
>>>>
>>>> The version we are currently testing is 0.8.4, so we only merged the
>>>> first 3 patches in this series. In actual testing, we found that the
>>>> improvement was not very noticeable.
>>>>
>>>> Test environment: 1000 multipath devices, 16 paths per device
>>>> Test command:  Aggregate multipath devices using multipathd add path
>>>> Time consumed (time for 16 paths to aggregate into 1 multipath device):
>>>> 1. Original:
>>>> 	< 4s:   76.9%;
>>>> 	4s~10s: 18.4%;
>>>> 	>10s:	 4.7%;
>>>> 2. After using the patches:
>>>> 	< 4s:	79.1%;
>>>> 	4s~10s: 18.2%;
>>>> 	>10s:	 2.7%;
>>>>
>>>> From the results, the improvement from the patch is not obvious, so we
>>>> made a few changes to it, and it then worked as expected. The
>>>> modifications are as follows:
>>>>
>>>> paths_checked was made configurable; we currently set n = 2.
>>>> We removed the check on diff_time.
>>>> The sleep was changed from 10us to 5ms.
>>>
>>> I worry that this is giving too much deference to the uevents. If there
>>> is a uevent storm, checking will stop for 5ms every 2 paths checked. I'm
>>> worried that this will make it take significantly longer for the path
>>> checker to complete a cycle.  Making paths_checked configurable, so that
>>> this doesn't trigger too often, does help to avoid the issue that
>>> checking the time from chk_start_time was dealing with, and makes the
>>> mechanics of this simpler.  But 5ms seems like a long time to have to
>>> wait just to make sure that another process has had time to grab the
>>> vecs lock.  Perhaps it would be better to sleep for a shorter length of
>>> time, but in a loop, where we can check to see if progress has been
>>> made, perhaps by checking some counter of events and user commands
>>> serviced. This way, we aren't sleeping too long for no good reason.
>> I agree with this; we are also going to adjust the sleep time and then
>> continue the test.
> 
> We have tested delay times of 10us, 100us and 1ms, but the results
> weren't very good: more than 30% of the LUNs still took longer than 4s
> to aggregate. Could the delay time also be made a configurable item, so
> that it can be set according to the situation?
> 
> The following is the patch after our modifications to checkerloop().
> 
> Signed-off-by: wuguanghao <wuguanghao3@huawei.com>
> ---
>  libmultipath/config.c   |  3 ++
>  libmultipath/config.h   |  3 ++
>  libmultipath/defaults.h |  3 ++
>  libmultipath/dict.c     | 58 +++++++++++++++++++++++++++
>  libmultipath/structs.h  |  1 +
>  multipathd/main.c       | 87 +++++++++++++++++++++++++++++++++--------
>  6 files changed, 138 insertions(+), 17 deletions(-)
> 
> diff --git a/libmultipath/config.c b/libmultipath/config.c
> index 7d0d711..7658626 100644
> --- a/libmultipath/config.c
> +++ b/libmultipath/config.c
> @@ -752,6 +752,9 @@ load_config (char * file)
>  	conf->remove_retries = 0;
>  	conf->ghost_delay = DEFAULT_GHOST_DELAY;
>  	conf->all_tg_pt = DEFAULT_ALL_TG_PT;
> +	conf->check_delay_time = DEFAULT_CHECK_DELAY_TIME;
> +	conf->check_path_num_per_splice = DEFAULT_CHECK_PATH_NUM_PER_SPLICE;
> +
>  	/*
>  	 * preload default hwtable
>  	 */
> diff --git a/libmultipath/config.h b/libmultipath/config.h
> index ceecff2..a26dd99 100644
> --- a/libmultipath/config.h
> +++ b/libmultipath/config.h
> @@ -229,6 +229,9 @@ struct config {
>  	vector elist_property;
>  	vector elist_protocol;
>  	char *enable_foreign;
> +
> +	int check_delay_time;
> +	int check_path_num_per_splice;
>  };
> 
>  extern struct udev * udev;
> diff --git a/libmultipath/defaults.h b/libmultipath/defaults.h
> index 52fe05b..b3fb1bc 100644
> --- a/libmultipath/defaults.h
> +++ b/libmultipath/defaults.h
> @@ -50,6 +50,9 @@
>  #define DEFAULT_FIND_MULTIPATHS_TIMEOUT -10
>  #define DEFAULT_UNKNOWN_FIND_MULTIPATHS_TIMEOUT 1
>  #define DEFAULT_ALL_TG_PT ALL_TG_PT_OFF
> +#define DEFAULT_CHECK_DELAY_TIME 10
> +#define DEFAULT_CHECK_PATH_NUM_PER_SPLICE 1
> +
>  /* Enable all foreign libraries by default */
>  #define DEFAULT_ENABLE_FOREIGN ""
> 
> diff --git a/libmultipath/dict.c b/libmultipath/dict.c
> index 2c8ea10..2262caa 100644
> --- a/libmultipath/dict.c
> +++ b/libmultipath/dict.c
> @@ -1407,6 +1407,46 @@ def_uxsock_timeout_handler(struct config *conf, vector strvec)
>  	free(buff);
>  	return 0;
>  }
> +static int
> +def_check_delay_time_handler(struct config *conf, vector strvec)
> +{
> +	int local_check_delay_time;
> +	char *buff;
> +
> +	buff = set_value(strvec);
> +	if (!buff)
> +		return 1;
> +
> +	if (sscanf(buff, "%u", &local_check_delay_time) == 1 &&
> +	    local_check_delay_time > DEFAULT_CHECK_DELAY_TIME)
> +		conf->check_delay_time = local_check_delay_time;
> +	else
> +		conf->check_delay_time = DEFAULT_CHECK_DELAY_TIME;
> +
> +	free(buff);
> +	return 0;
> +}
> +
> +static int
> +def_check_path_num_per_splice_handler(struct config *conf, vector strvec)
> +{
> +	int local_check_path_num_per_splice;
> +	char *buff;
> +
> +	buff = set_value(strvec);
> +	if (!buff)
> +		return 1;
> +
> +	if (sscanf(buff, "%u", &local_check_path_num_per_splice) == 1 &&
> +	    local_check_path_num_per_splice > DEFAULT_CHECK_PATH_NUM_PER_SPLICE)
> +		conf->check_path_num_per_splice = local_check_path_num_per_splice;
> +	else
> +		conf->check_path_num_per_splice = DEFAULT_CHECK_PATH_NUM_PER_SPLICE;
> +
> +	free(buff);
> +	return 0;
> +}
> +
> 
>  static int
>  hw_vpd_vendor_handler(struct config *conf, vector strvec)
> @@ -1546,6 +1586,21 @@ snprint_def_uxsock_timeout(struct config *conf, char * buff, int len,
>  	return snprintf(buff, len, "%u", conf->uxsock_timeout);
>  }
> 
> +static int
> +snprint_def_check_delay_time(struct config *conf, char * buff, int len,
> +			   const void *data)
> +{
> +	return snprintf(buff, len, "%u", conf->check_delay_time);
> +}
> +
> +static int
> +snprint_def_check_path_num_per_splice(struct config *conf, char * buff, int len,
> +			   const void *data)
> +{
> +	return snprintf(buff, len, "%u", conf->check_path_num_per_splice);
> +}
> +
> +
>  static int
>  snprint_ble_simple (struct config *conf, char * buff, int len,
>  		    const void * data)
> @@ -1804,6 +1859,9 @@ init_keywords(vector keywords)
>  	install_keyword("enable_foreign", &def_enable_foreign_handler,
>  			&snprint_def_enable_foreign);
>  	install_keyword("marginal_pathgroups", &def_marginal_pathgroups_handler, &snprint_def_marginal_pathgroups);
> +	install_keyword("check_delay_time", &def_check_delay_time_handler, &snprint_def_check_delay_time);
> +	install_keyword("check_path_num_per_splice", &def_check_path_num_per_splice_handler, &snprint_def_check_path_num_per_splice);
> +
>  	__deprecated install_keyword("default_selector", &def_selector_handler, NULL);
>  	__deprecated install_keyword("default_path_grouping_policy", &def_pgpolicy_handler, NULL);
>  	__deprecated install_keyword("default_uid_attribute", &def_uid_attribute_handler, NULL);
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index 11e516a..391de96 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -288,6 +288,7 @@ struct path {
>  	int find_multipaths_timeout;
>  	int marginal;
>  	int vpd_vendor_id;
> +	bool is_checked;
>  	/* configlet pointers */
>  	vector hwe;
>  	struct gen_path generic_path;
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 5035d5b..865b7b5 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2388,6 +2388,11 @@ check_path (struct vectors * vecs, struct path * pp, unsigned int ticks)
>  	}
>  	return 1;
>  }
> +enum checker_state {
> +	CHECKER_STARTING,
> +	CHECKER_RUNNING,
> +	CHECKER_FINISHED,
> +};
> 
>  static void *
>  checkerloop (void *ap)
> @@ -2395,11 +2400,11 @@ checkerloop (void *ap)
>  	struct vectors *vecs;
>  	struct path *pp;
>  	int count = 0;
> -	unsigned int i;
>  	struct timespec last_time;
>  	struct config *conf;
>  	int foreign_tick = 0;
>  	bool use_watchdog;
> +	int check_delay_time, check_path_num_per_splice;
> 
>  	pthread_cleanup_push(rcu_unregister, NULL);
>  	rcu_register_thread();
> @@ -2414,12 +2419,15 @@ checkerloop (void *ap)
>  	/* use_watchdog is set from process environment and never changes */
>  	conf = get_multipath_config();
>  	use_watchdog = conf->use_watchdog;
> +	check_delay_time = conf->check_delay_time;
> +	check_path_num_per_splice = conf->check_path_num_per_splice;
>  	put_multipath_config(conf);
> 
>  	while (1) {
>  		struct timespec diff_time, start_time, end_time;
> -		int num_paths = 0, strict_timing, rc = 0;
> +		int num_paths = 0, strict_timing, rc = 0, i = 0;
>  		unsigned int ticks = 0;
> +		enum checker_state checker_state = CHECKER_STARTING;
> 
>  		get_monotonic_time(&start_time);
>  		if (start_time.tv_sec && last_time.tv_sec) {
> @@ -2443,21 +2451,66 @@ checkerloop (void *ap)
>  		} else if (rc == EINVAL)
>  			/* daemon shutdown */
>  			break;
> -
> -		pthread_cleanup_push(cleanup_lock, &vecs->lock);
> -		lock(&vecs->lock);
> -		pthread_testcancel();
> -		vector_foreach_slot (vecs->pathvec, pp, i) {
> -			rc = check_path(vecs, pp, ticks);
> -			if (rc < 0) {
> -				vector_del_slot(vecs->pathvec, i);
> -				free_path(pp);
> -				i--;
> -			} else
> -				num_paths += rc;
> -		}
> -		lock_cleanup_pop(vecs->lock);
> -
> +		condlog(4, "check_delay_time is %u, check_path_num_per_splice is %u", check_delay_time, check_path_num_per_splice);
> +
> +		while (checker_state != CHECKER_FINISHED) {
> +			unsigned int paths_checked = 0;
> +			struct timespec chk_start_time;
> +			pthread_cleanup_push(cleanup_lock, &vecs->lock);
> +			lock(&vecs->lock);
> +			pthread_testcancel();
> +			get_monotonic_time(&chk_start_time);
> +			if (checker_state == CHECKER_STARTING) {
> +				vector_foreach_slot(vecs->pathvec, pp, i)
> +					pp->is_checked = false;
> +				i = 0;
> +				checker_state = CHECKER_RUNNING;
> +			} else {
> +				/*
> +				 * Paths could have been removed since we
> +				 * dropped the lock. Find the path to continue
> +				 * checking at. Since paths can be removed from
> +				 * anywhere in the vector, but can only be added
> +				 * at the end, the last checked path must be
> +				 * between its old location and the start of
> +				 * the vector.
> +				 */
> +				if (i >= VECTOR_SIZE(vecs->pathvec))
> +					i = VECTOR_SIZE(vecs->pathvec) - 1;
> +			while (i >= 0 && (pp = VECTOR_SLOT(vecs->pathvec, i))) {
> +				if (pp->is_checked)
> +						break;
> +					i--;
> +				}
> +				i++;
> +			}
> +			vector_foreach_slot_after (vecs->pathvec, pp, i) {
> +				pp->is_checked = true;
> +				rc = check_path(vecs, pp, ticks);
> +				if (rc < 0) {
> +					condlog(1, "%s: check_path() failed, removing",
> +						pp->dev);
> +					vector_del_slot(vecs->pathvec, i);
> +					free_path(pp);
> +					i--;
> +				} else
> +					num_paths += rc;
> +				if (++paths_checked % check_path_num_per_splice == 0 &&
> +				    lock_has_waiters(&vecs->lock)) {
> +					get_monotonic_time(&end_time);
> +					timespecsub(&end_time, &chk_start_time,
> +						    &diff_time);
> +					goto unlock;
> +				}
> +			}
> +			checker_state = CHECKER_FINISHED;
> +unlock:
> +			lock_cleanup_pop(vecs->lock);
> +			if (checker_state != CHECKER_FINISHED) {
> +				/* Yield to waiters */
> +				usleep(check_delay_time);
> +			}
> +		}
>  		pthread_cleanup_push(cleanup_lock, &vecs->lock);
>  		lock(&vecs->lock);
>  		pthread_testcancel();
> 
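
Just to make sure I'm reading the flow correctly: the checker loop above
boils down to the pattern below. This is a simplified, self-contained
sketch, not code from the patch -- the array, the lock stubs, and every
name in it (struct spath, take_lock(), drop_lock(),
lock_has_waiters_stub(), check_one_path()) are hypothetical stand-ins
for the pathvec, vecs->lock, and check_path().

#include <stdbool.h>
#include <stddef.h>
#include <unistd.h>

/* Minimal stand-in for struct path; only the marker matters here. */
struct spath {
	bool is_checked;
};

/* Illustrative stubs standing in for vecs->lock and the real checker. */
static void take_lock(void) {}
static void drop_lock(void) {}
static bool lock_has_waiters_stub(void) { return false; }
static void check_one_path(struct spath *p) { (void)p; }

/*
 * Check every path once, yielding the lock to waiters every
 * num_per_splice checks.  While the lock is dropped, other threads may
 * delete entries anywhere in paths[] (shifting the rest left) but may
 * only append new ones, so on re-entry the last checked path is found
 * by scanning backwards from the old index.
 */
static void check_all_paths(struct spath *paths, size_t *nr_paths,
			    unsigned int num_per_splice,
			    unsigned int delay_us)
{
	size_t i;
	bool finished = false;

	/* In multipathd this marking happens under the lock. */
	for (i = 0; i < *nr_paths; i++)
		paths[i].is_checked = false;

	i = 0;
	while (!finished) {
		unsigned int checked_this_splice = 0;

		take_lock();
		/* Entries may have vanished; clamp and re-find our spot. */
		if (i >= *nr_paths)
			i = *nr_paths ? *nr_paths - 1 : 0;
		while (i > 0 && !paths[i].is_checked)
			i--;
		if (i < *nr_paths && paths[i].is_checked)
			i++;	/* resume after the last checked path */

		finished = true;
		for (; i < *nr_paths; i++) {
			paths[i].is_checked = true;
			check_one_path(&paths[i]);
			if (++checked_this_splice % num_per_splice == 0 &&
			    lock_has_waiters_stub()) {
				finished = false;
				break;
			}
		}
		drop_lock();
		if (!finished)
			usleep(delay_us);	/* let waiters run */
	}
}

int main(void)
{
	struct spath paths[4] = { { false } };
	size_t nr = 4;

	check_all_paths(paths, &nr, 2, 1000);
	return 0;
}

The backward scan works because deletions shift entries left while
insertions only append, so the checked paths always form a prefix of
the vector: the first checked entry found when scanning back from the
old index is necessarily the last path checked. The two knobs map to
the new defaults-section options, e.g. "check_path_num_per_splice" and
"check_delay_time" in multipath.conf.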


* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
  2022-10-24  2:22         ` Wu Guanghao
@ 2022-10-24  7:59           ` Martin Wilck
  2022-10-26  3:47             ` Wu Guanghao
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Wilck @ 2022-10-24  7:59 UTC (permalink / raw)
  To: bmarzins, wuguanghao3; +Cc: linfeilong, lixiaokeng, dm-devel

Hello Wu Guanghao,

On Mon, 2022-10-24 at 10:22 +0800, Wu Guanghao wrote:
> Friendly ping ...
> Is the series of patches for vecs->lock ready to be merged into
> mainline?
> Thanks

Ben's series has been merged in 0.9.1 already. There have been no
reviewed patches on top of it, unless I am mistaken.

Btw, I never received an answer from you about my suggestion from 
Sep. 20:

> > The version we are currently testing is 0.8.4, so we only merged the
> > first 3 patches in this series. After actual testing, we found
> > that the improvement was not very noticeable.
> > 
> 
> Which path checker are you using? 
> If it's TUR, could you try to simply change the sync wait time?
> 
> static void tur_timeout(struct timespec *tsp)
> {
> 	get_monotonic_time(tsp);
> 	tsp->tv_nsec += 1000 * 1000; /* 1 millisecond */
> 	normalize_timespec(tsp);
> }
> 
> Change the 1 millisecond above to e.g. one microsecond. That should
> speed up the checker significantly. You will get a higher rate of
> "pending" path states, but that shouldn't matter much.
> 
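
Spelled out, the experiment would be this one-line change to the helper
quoted above (an untested sketch; everything else stays as-is):

static void tur_timeout(struct timespec *tsp)
{
	get_monotonic_time(tsp);
	tsp->tv_nsec += 1000; /* 1 microsecond instead of 1 millisecond */
	normalize_timespec(tsp);
}

If that alone makes the checker loop noticeably faster, the 1 ms sync
wait is your bottleneck.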

Regards,
Martin


* Re: [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted.
  2022-10-24  7:59           ` Martin Wilck
@ 2022-10-26  3:47             ` Wu Guanghao
  0 siblings, 0 replies; 13+ messages in thread
From: Wu Guanghao @ 2022-10-26  3:47 UTC (permalink / raw)
  To: Martin Wilck, bmarzins; +Cc: linfeilong, lixiaokeng, dm-devel



On 2022/10/24 15:59, Martin Wilck wrote:
> Hello Wu Guanghao,
> 
> On Mon, 2022-10-24 at 10:22 +0800, Wu Guanghao wrote:
>> Friendly ping ...
>> Is the series of patches for vecs->lock ready to be merged into
>> mainline?
>> Thanks
> 
> Ben's series has been merged in 0.9.1 already. There have been no
> reviewed patches on top of it, unless I am mistaken.

Yes, it has been merged, sorry.

> Btw, I never received an answer from you about my suggestion from 
> Sep. 20:
> 
>>> The version we are currently testing is 0.8.4, so we only merged the
>>> first 3 patches in this series. After actual testing, we found
>>> that the improvement was not very noticeable.
>>>
>>
>> Which path checker are you using? 
>> If it's TUR, could you try to simply change the sync wait time?
>>
>> static void tur_timeout(struct timespec *tsp)
>> {
>> 	get_monotonic_time(tsp);
>> 	tsp->tv_nsec += 1000 * 1000; /* 1 millisecond */
>> 	normalize_timespec(tsp);
>> }
>>

We are ready to start the test and will give you feedback on the
test results later.

>> Change the 1 millisecond above to e.g. one microsecond. That should
>> speed up the checker significantly. You will get a higher rate of
>> "pending" path states, but that shouldn't matter much.
>>
> 
> Regards,
> Martin
> 


Thread overview: 13+ messages
     [not found] <1660769316-5302-1-git-send-email-bmarzins@redhat.com>
     [not found] ` <1660769316-5302-4-git-send-email-bmarzins@redhat.com>
2022-08-22 16:15   ` [dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths Martin Wilck
2022-08-22 17:46     ` Benjamin Marzinski
2022-08-22 18:11       ` Martin Wilck
2022-08-24  2:47         ` Wu Guanghao
2022-08-22 18:12 ` [dm-devel] [PATCH V2 0/6] allowing path checking to be interrupted Martin Wilck
2022-09-15  6:56 ` Wu Guanghao
2022-09-15 18:17   ` Benjamin Marzinski
2022-09-20  9:20     ` Wu Guanghao
2022-10-17 11:31       ` Wu Guanghao
2022-10-24  2:22         ` Wu Guanghao
2022-10-24  7:59           ` Martin Wilck
2022-10-26  3:47             ` Wu Guanghao
2022-09-20  6:45   ` Martin Wilck
