From: Matheus Tavares Bernardino <matheus.bernardino@usp.br>
To: Christian Couder <christian.couder@gmail.com>
Cc: git <git@vger.kernel.org>, Junio C Hamano <gitster@pobox.com>,
	Jeff Hostetler <git@jeffhostetler.com>
Subject: Re: [PATCH 1/5] unpack-trees: add basic support for parallel checkout
Date: Fri, 2 Apr 2021 11:39:30 -0300	[thread overview]
Message-ID: <CAHd-oW7TYmbJAx9Jrzwsfb0zsdT=KmhumJnOJygk0yTWTW3i=Q@mail.gmail.com> (raw)
In-Reply-To: <CAP8UFD2eaaQEuytk6cPoYco2_dfdbsM3aAvJyYFqgYSfkZ487g@mail.gmail.com>

On Wed, Mar 31, 2021 at 1:22 AM Christian Couder
<christian.couder@gmail.com> wrote:
>
> On Wed, Mar 17, 2021 at 10:12 PM Matheus Tavares
> <matheus.bernardino@usp.br> wrote:
>
> > +struct parallel_checkout_item {
> > +       /* pointer to a istate->cache[] entry. Not owned by us. */
> > +       struct cache_entry *ce;
> > +       struct conv_attrs ca;
> > +       struct stat st;
> > +       enum pc_item_status status;
> > +};
> > +
> > +struct parallel_checkout {
> > +       enum pc_status status;
> > +       struct parallel_checkout_item *items;
> > +       size_t nr, alloc;
> > +};
>
> It looks like this struct parallel_checkout contains what is called
> the parallel checkout queue in many places. It would be a bit better
> if a name or at least a comment could make that clear.

Right, will do.

> > +       case CA_CLASS_INCORE_PROCESS:
> > +               /*
> > +                * The parallel queue and the delayed queue are not compatible,
> > +                * so they must be kept completely separated. And we can't tell
> > +                * if a long-running process will delay its response without
> > +                * actually asking it to perform the filtering. Therefore, this
> > +                * type of filter is not allowed in parallel checkout.
> > +                *
> > +                * Furthermore, there should only be one instance of the
> > +                * long-running process filter as we don't know how it is
> > +                * managing its own concurrency. So, spreading the entries that
> > +                * requisite such a filter among the parallel workers would
> > +                * require a lot more inter-process communication. We would
> > +                * probably have to designate a single process to interact with
> > +                * the filter and send all the necessary data to it, for each
> > +                * entry.
> > +                */
> > +               return 0;
>
> So it looks like we don't process entries that need filtering. It's a
> bit disappointing as the commit message says:
>
> "This new interface allows us to enqueue some of the entries being
> checked out to later uncompress, smudge, and write them in parallel."
>
> So it seems to say that entries could be smudged in parallel. Or maybe
> I am missing something?

Good point, this part of the commit message is indeed misleading. Only
internal filters, like re-encoding and end-of-line conversions, can be
performed in parallel. I'll rephrase that sentence, thanks!
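
For illustration, a hypothetical .gitattributes showing the split (the
attribute names are real; the file patterns are made up):

```
# in-process eol conversion: parallel-eligible
*.txt  text eol=crlf
# in-process re-encoding: parallel-eligible
*.xml  working-tree-encoding=UTF-16
# external filter driver (e.g. LFS): handled sequentially
*.psd  filter=lfs
```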

> > +static int handle_results(struct checkout *state)
[...]
> > +
> > +       if (have_pending)
> > +               error(_("parallel checkout finished with pending entries"));
>
> Is this an error that can actually happen? Or is it a bug? If this can
> actually happen what is the state of the working directory [...] ?

This could happen due to either a bug or an actual error, e.g. if
one of the workers die()s or gets killed before finishing its work
queue. In these cases, the files associated with the unfinished items
will be missing from the working tree, and their index entries will
have null stat() information.

> [...] and what
> can the user do? Start another checkout with an option to do it
> sequentially?

Hmm, it depends on what caused the error. For example, if the user
accidentally killed one of the workers, starting another checkout
(either parallel or sequential) should be able to create the missing
files. But that might not be the case if the error was caused by a
fatal condition in one of the workers which led it to die() (like a
packet write failure or corrupted object).

Nevertheless, I think it might be interesting to rephrase the error
message here to be more explicit about the outcome state. Perhaps even
mention how many entries will be missing.

> [...]
>
> > +static int write_pc_item_to_fd(struct parallel_checkout_item *pc_item, int fd,
> > +                              const char *path)
> > +{
> > +       int ret;
> > +       struct stream_filter *filter;
> > +       struct strbuf buf = STRBUF_INIT;
> > +       char *new_blob;
> > +       unsigned long size;
>
> I thought that we shouldn't use unsigned long anymore for sizes, but
> unfortunately it looks like that's what read_blob_entry() expects.
>
> > +       size_t newsize = 0;
>
> I don't think we need to initialize newsize. Also we could perhaps
> declare it closer to where it is used below.
>
> > +       ssize_t wrote;
> > +
> > +       /* Sanity check */
> > +       assert(is_eligible_for_parallel_checkout(pc_item->ce, &pc_item->ca));
> > +
> > +       filter = get_stream_filter_ca(&pc_item->ca, &pc_item->ce->oid);
> > +       if (filter) {
> > +               if (stream_blob_to_fd(fd, &pc_item->ce->oid, filter, 1)) {
> > +                       /* On error, reset fd to try writing without streaming */
> > +                       if (reset_fd(fd, path))
> > +                               return -1;
> > +               } else {
> > +                       return 0;
> > +               }
> > +       }
> > +
> > +       new_blob = read_blob_entry(pc_item->ce, &size);
> > +       if (!new_blob)
> > +               return error("unable to read sha1 file of %s (%s)", path,
>
> With the support of sha256, I guess we should avoid talking about
> sha1. Maybe: s/sha1/object/ Or maybe we could just say that we
> couldn't read the object, as read_blob_entry() would perhaps read it
> from a pack-file and not from a loose file?

Good point, thanks!

> > +                            oid_to_hex(&pc_item->ce->oid));
> > +
> > +       /*
> > +        * checkout metadata is used to give context for external process
> > +        * filters. Files requiring such filters are not eligible for parallel
> > +        * checkout, so pass NULL.
> > +        */
> > +       ret = convert_to_working_tree_ca(&pc_item->ca, pc_item->ce->name,
> > +                                        new_blob, size, &buf, NULL);
> > +
> > +       if (ret) {
> > +               free(new_blob);
> > +               new_blob = strbuf_detach(&buf, &newsize);
> > +               size = newsize;
>
> "newsize" seems to be used only here. It would be nice if we could get
> rid of it and pass "&size" to strbuf_detach(), but maybe we cannot
> because of the unsigned long/size_t type mismatch. At least we could
> declare "newsize" in this scope though.

Sure, I'll move "newsize" down to this block and remove the
initialization, thanks.


Thread overview: 40+ messages
2021-03-17 21:12 [PATCH 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-03-17 21:12 ` [PATCH 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-03-31  4:22   ` Christian Couder
2021-04-02 14:39     ` Matheus Tavares Bernardino [this message]
2021-03-17 21:12 ` [PATCH 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-03-31  4:32   ` Christian Couder
2021-04-02 14:42     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-03-31  4:33   ` Christian Couder
2021-04-02 14:45     ` Matheus Tavares Bernardino
2021-03-17 21:12 ` [PATCH 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-03-17 21:12 ` [PATCH 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-03-31  5:36   ` Christian Couder
2021-03-18 20:56 ` [PATCH 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-03-19  3:24   ` Matheus Tavares
2021-03-19 22:58     ` Junio C Hamano
2021-03-31  5:42 ` Christian Couder
2021-04-08 16:16 ` [PATCH v2 " Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-08 16:17   ` [PATCH v2 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-08 19:52   ` [PATCH v2 0/5] Parallel Checkout (part 2) Junio C Hamano
2021-04-16 21:43   ` Junio C Hamano
2021-04-17 19:57     ` Matheus Tavares Bernardino
2021-04-19  9:41     ` Christian Couder
2021-04-19  0:14   ` [PATCH v3 " Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19  0:14     ` [PATCH v3 5/5] parallel-checkout: add design documentation Matheus Tavares
2021-04-19  9:36       ` Christian Couder
2021-04-19 19:53     ` [PATCH v4 0/5] Parallel Checkout (part 2) Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 1/5] unpack-trees: add basic support for parallel checkout Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 2/5] parallel-checkout: make it truly parallel Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 3/5] parallel-checkout: add configuration options Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 4/5] parallel-checkout: support progress displaying Matheus Tavares
2021-04-19 19:53       ` [PATCH v4 5/5] parallel-checkout: add design documentation Matheus Tavares
