All of lore.kernel.org
 help / color / mirror / Atom feed
From: Christian Couder <christian.couder@gmail.com>
To: Matheus Tavares <matheus.bernardino@usp.br>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
	Jeff Hostetler <git@jeffhostetler.com>
Subject: Re: [PATCH 2/5] parallel-checkout: make it truly parallel
Date: Wed, 31 Mar 2021 06:32:38 +0200	[thread overview]
Message-ID: <CAP8UFD0CmPmvK7a-+5fKk0e0=te_e7cCFf6_-vSFrX9COTBg-w@mail.gmail.com> (raw)
In-Reply-To: <fca797698c78e2248bd2645bd0dce23ec4df35ae.1616015337.git.matheus.bernardino@usp.br>

On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
<matheus.bernardino@usp.br> wrote:

> diff --git a/.gitignore b/.gitignore
> index 3dcdb6bb5a..26f8ddfc55 100644
> --- a/.gitignore
> +++ b/.gitignore
> @@ -33,6 +33,7 @@
>  /git-check-mailmap
>  /git-check-ref-format
>  /git-checkout
> +/git-checkout--helper

I wonder if "checkout--worker" would be better than "checkout--helper".

>  /git-checkout-index
>  /git-cherry
>  /git-cherry-pick

[...]

> +#define ASSERT_PC_ITEM_RESULT_SIZE(got, exp) \
> +{ \
> +       if (got != exp) \
> +               BUG("corrupted result from checkout worker (got %dB, exp %dB)", \

Maybe precompilers are smart enough to not replace the "got" and "exp"
in the above string, but it might be a bit confusing for readers.
Anway I wonder if this macro could just be a regular (possibly inline)
function.

> +                   got, exp); \
> +} while(0)

> +static void parse_and_save_result(const char *line, int len,
> +                                 struct pc_worker *worker)
> +{
> +       struct pc_item_result *res;
> +       struct parallel_checkout_item *pc_item;
> +       struct stat *st = NULL;
> +
> +       if (len < PC_ITEM_RESULT_BASE_SIZE)
> +               BUG("too short result from checkout worker (got %dB, exp %dB)",
> +                   len, (int)PC_ITEM_RESULT_BASE_SIZE);
> +
> +       res = (struct pc_item_result *)line;
> +
> +       /*
> +        * Worker should send either the full result struct on success, or
> +        * just the base (i.e. no stat data), otherwise.
> +        */
> +       if (res->status == PC_ITEM_WRITTEN) {
> +               ASSERT_PC_ITEM_RESULT_SIZE(len, (int)sizeof(struct pc_item_result));
> +               st = &res->st;
> +       } else {
> +               ASSERT_PC_ITEM_RESULT_SIZE(len, (int)PC_ITEM_RESULT_BASE_SIZE);
> +       }
> +
> +       if (!worker->nr_items_to_complete || res->id != worker->next_item_to_complete)

Nit: maybe it could be useful to distinguish between these 2 potential
bugs and have a specific BUG() for each one.

> +               BUG("checkout worker sent unexpected item id");

> +       worker->next_item_to_complete++;
> +       worker->nr_items_to_complete--;
> +
> +       pc_item = &parallel_checkout.items[res->id];
> +       pc_item->status = res->status;
> +       if (st)
> +               pc_item->st = *st;
> +}
> +
> +

One blank line is enough.

> +static void gather_results_from_workers(struct pc_worker *workers,
> +                                       int num_workers)
> +{
> +       int i, active_workers = num_workers;
> +       struct pollfd *pfds;
> +
> +       CALLOC_ARRAY(pfds, num_workers);
> +       for (i = 0; i < num_workers; i++) {
> +               pfds[i].fd = workers[i].cp.out;
> +               pfds[i].events = POLLIN;
> +       }
> +
> +       while (active_workers) {
> +               int nr = poll(pfds, num_workers, -1);
> +
> +               if (nr < 0) {
> +                       if (errno == EINTR)
> +                               continue;
> +                       die_errno("failed to poll checkout workers");
> +               }
> +
> +               for (i = 0; i < num_workers && nr > 0; i++) {

Is it possible that nr is 0? If that happens, it looks like we would
be in an infinite `while (active_workers) { ... }` loop.

Actually in poll(2) there is: "A value of 0 indicates that the call
timed out and no file descriptors were ready." So it seems that it
could, at least theorically, happen.

> +                       struct pc_worker *worker = &workers[i];
> +                       struct pollfd *pfd = &pfds[i];
> +
> +                       if (!pfd->revents)
> +                               continue;
> +
> +                       if (pfd->revents & POLLIN) {
> +                               int len;
> +                               const char *line = packet_read_line(pfd->fd, &len);
> +
> +                               if (!line) {
> +                                       pfd->fd = -1;
> +                                       active_workers--;
> +                               } else {
> +                                       parse_and_save_result(line, len, worker);
> +                               }
> +                       } else if (pfd->revents & POLLHUP) {
> +                               pfd->fd = -1;
> +                               active_workers--;
> +                       } else if (pfd->revents & (POLLNVAL | POLLERR)) {
> +                               die(_("error polling from checkout worker"));
> +                       }
> +
> +                       nr--;
> +               }
> +       }
> +
> +       free(pfds);
> +}

[...]

> +enum pc_item_status {
> +       PC_ITEM_PENDING = 0,
> +       PC_ITEM_WRITTEN,
> +       /*
> +        * The entry could not be written because there was another file
> +        * already present in its path or leading directories. Since
> +        * checkout_entry_ca() removes such files from the working tree before
> +        * enqueueing the entry for parallel checkout, it means that there was
> +        * a path collision among the entries being written.
> +        */
> +       PC_ITEM_COLLIDED,
> +       PC_ITEM_FAILED,
> +};
> +
> +struct parallel_checkout_item {
> +       /*
> +        * In main process ce points to a istate->cache[] entry. Thus, it's not
> +        * owned by us. In workers they own the memory, which *must be* released.
> +        */
> +       struct cache_entry *ce;
> +       struct conv_attrs ca;
> +       size_t id; /* position in parallel_checkout.items[] of main process */
> +
> +       /* Output fields, sent from workers. */
> +       enum pc_item_status status;
> +       struct stat st;
> +};

Maybe the previous patch could have declared both 'enum
pc_item_status' and 'struct parallel_checkout_item' here, in
parallel-checkout.h, so that this patch wouldn't need to move them
here.

  reply	other threads:[~2021-03-31  4:33 UTC|newest]

Thread overview: 40+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-03-31  4:22   ` Christian Couder
2021-04-02 14:39     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-03-31  4:32   ` Christian Couder [this message]
2021-04-02 14:42     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-03-31  4:33   ` Christian Couder
2021-04-02 14:45     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-03-31  5:36   ` Christian Couder
2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-03-19  3:24   ` Matheus Tavares
2021-03-19 22:58     ` Junio C Hamano
2021-03-31  5:42 ` Christian Couder
2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-08 19:52   ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-04-16 21:43   ` Junio C Hamano
2021-04-17 19:57     ` Matheus Tavares Bernardino
2021-04-19  9:41     ` Christian Couder
2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-19  9:36       ` Christian Couder
2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAP8UFD0CmPmvK7a-+5fKk0e0=te_e7cCFf6_-vSFrX9COTBg-w@mail.gmail.com' \
    --to=christian.couder@gmail.com \
    --cc=git@jeffhostetler.com \
    --cc=git@vger.kernel.org \
    --cc=gitster@pobox.com \
    --cc=matheus.bernardino@usp.br \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.