Re: [dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths

* Re: [dm-devel] [PATCH V2 3/6] multipathd: Occasionally allow waiters to interrupt checking paths
       [not found] ` <1660769316-5302-4-git-send-email-bmarzins@redhat.com>
@ 2022-08-22 16:15   ` Martin Wilck
  2022-08-22 17:46     ` Benjamin Marzinski
  0 siblings, 1 reply; 13+ messages in thread
From: Martin Wilck @ 2022-08-22 16:15 UTC (permalink / raw)
  To: bmarzins, christophe.varoqui; +Cc: wuguanghao3, dm-devel

On Wed, 2022-08-17 at 15:48 -0500, Benjamin Marzinski wrote:
> If there are a very large number of paths that all get checked at the
> same time, it is possible for the path checkers to starve out other
> vecs->lock users, like uevent processing. To avoid this, if the path
> checkers are taking a long time, checkerloop now occasionally unlocks
> and allows other vecs->lock waiters to run.
> 
> In order to make sure that path checking is always making real
> progress,
> checkerloop() only checks if it should unlock every 128 path checks.
> To
> check, it sees if there are waiters and more than a second has passed
> since it acquired the vecs->lock. If both of these are true, it drops
> the lock and calls nanosleep() to schedule.
> 
> When checkerloop() reacquires the lock, it may no longer be at the
> correct place in the vector. While paths can be deleted from any
> point
> of the pathvec, they can only be added at the end. This means that
> the
> next path to check must be between its previous location and the
> start
> of the vector. To help determine the correct starting place,
> checkerloop() marks each path as not checked at the start of each
> checker loop. New paths start in the unchecked state. As paths are
> checked, they are marked as such. If the checker loop is interrupted,
> when it reacquires the lock, it will iterate backwards from the last
> location in checked to the start of the vector. The first path it
> finds
> that is marked as checked must be the last path it checked.
> 
> Signed-off-by: Benjamin Marzinski <bmarzins@redhat.com>
> ---
>  libmultipath/structs.h |  1 +
>  multipathd/main.c      | 82 +++++++++++++++++++++++++++++++++-------
> --
>  2 files changed, 67 insertions(+), 16 deletions(-)
> 
> diff --git a/libmultipath/structs.h b/libmultipath/structs.h
> index a6a09441..9d4fb9c8 100644
> --- a/libmultipath/structs.h
> +++ b/libmultipath/structs.h
> @@ -351,6 +351,7 @@ struct path {
>         int fast_io_fail;
>         unsigned int dev_loss;
>         int eh_deadline;
> +       bool is_checked;
>         /* configlet pointers */
>         vector hwe;
>         struct gen_path generic_path;
> diff --git a/multipathd/main.c b/multipathd/main.c
> index 71079113..37c04018 100644
> --- a/multipathd/main.c
> +++ b/multipathd/main.c
> @@ -2548,6 +2548,11 @@ check_path (struct vectors * vecs, struct path
> * pp, unsigned int ticks)
>         }
>         return 1;
>  }
> +enum checker_state {
> +       CHECKER_STARTING,
> +       CHECKER_RUNNING,
> +       CHECKER_FINISHED,
> +};
>  
>  static void *
>  checkerloop (void *ap)
> @@ -2555,7 +2560,6 @@ checkerloop (void *ap)
>         struct vectors *vecs;
>         struct path *pp;
>         int count = 0;
> -       unsigned int i;
>         struct timespec last_time;
>         struct config *conf;
>         int foreign_tick = 0;
> @@ -2581,8 +2585,9 @@ checkerloop (void *ap)
>  
>         while (1) {
>                 struct timespec diff_time, start_time, end_time;
> -               int num_paths = 0, strict_timing, rc = 0;
> +               int num_paths = 0, strict_timing, rc = 0, i = 0;
>                 unsigned int ticks = 0;
> +               enum checker_state checker_state = CHECKER_STARTING;
>  
>                 if (set_config_state(DAEMON_RUNNING) !=
> DAEMON_RUNNING)
>                         /* daemon shutdown */
> @@ -2603,22 +2608,67 @@ checkerloop (void *ap)
>                 if (use_watchdog)
>                         sd_notify(0, "WATCHDOG=1");
>  #endif
> +               while (checker_state != CHECKER_FINISHED) {
> +                       unsigned int paths_checked = 0;
> +                       struct timespec chk_start_time;
>  
> -               pthread_cleanup_push(cleanup_lock, &vecs->lock);
> -               lock(&vecs->lock);
> -               pthread_testcancel();
> -               vector_foreach_slot (vecs->pathvec, pp, i) {
> -                       rc = check_path(vecs, pp, ticks);
> -                       if (rc < 0) {
> -                               condlog(1, "%s: check_path() failed,
> removing",
> -                                       pp->dev);
> -                               vector_del_slot(vecs->pathvec, i);
> -                               free_path(pp);
> -                               i--;
> -                       } else
> -                               num_paths += rc;
> +                       pthread_cleanup_push(cleanup_lock, &vecs-
> >lock);
> +                       lock(&vecs->lock);
> +                       pthread_testcancel();
> +                       get_monotonic_time(&chk_start_time);
> +                       if (checker_state == CHECKER_STARTING) {
> +                               vector_foreach_slot(vecs->pathvec,
> pp, i)
> +                                       pp->is_checked = false;
> +                               i = 0;
> +                               checker_state = CHECKER_RUNNING;
> +                       } else {
> +                               /*
> +                                * Paths could have been removed
> since we
> +                                * dropped the lock. Find the path to
> continue
> +                                * checking at. Since paths can be
> removed from
> +                                * anywhere in the vector, but can
> only be added
> +                                * at the end, the last checked path
> must be
> +                                * between its old location, and the
> start or
> +                                * the vector.
> +                                */
> +                               if (i >= VECTOR_SIZE(vecs->pathvec))
> +                                       i = VECTOR_SIZE(vecs-
> >pathvec) - 1;

What if VECTOR_SIZE(vecs->pathvec) == 0? Maybe you should catch that in
the while () condition above?

Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://listman.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 13+ messages in thread