From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-15.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM, HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 31B08C432BE for ; Tue, 3 Aug 2021 19:39:54 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [23.128.96.18]) by mail.kernel.org (Postfix) with ESMTP id 1996D60BD3 for ; Tue, 3 Aug 2021 19:39:54 +0000 (UTC) Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S240184AbhHCTkE (ORCPT ); Tue, 3 Aug 2021 15:40:04 -0400 Received: from lindbergh.monkeyblade.net ([23.128.96.19]:51524 "EHLO lindbergh.monkeyblade.net" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S240238AbhHCTj4 (ORCPT ); Tue, 3 Aug 2021 15:39:56 -0400 Received: from mail-wm1-x334.google.com (mail-wm1-x334.google.com [IPv6:2a00:1450:4864:20::334]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 9E91BC061383 for ; Tue, 3 Aug 2021 12:39:35 -0700 (PDT) Received: by mail-wm1-x334.google.com with SMTP id a192-20020a1c7fc90000b0290253b32e8796so2494947wmd.0 for ; Tue, 03 Aug 2021 12:39:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=q3XRO2JHGoG03/g3gEaz2PJltoYcpBW/dzqMzPu1hMY=; b=EhecSSXyObNRQ/OPwF+/V5HJEQsagytgr55gGF+IFM+dR/P8rGhvr5/+vwfXU10qkp e2E8wnhdqnfr24DWze848QYb7sqPYP1YzLni011ZmHKqCqcUeTkiZI1pXxlvfez0QwM4 m7UYK66V5GG1zZM+Ogdc+iPZSxUHErAhrw63fqHhVR1HnxlVXElv+3Hx83eCg0t86Cl5 mDTYu5mzlT+PuXFxfgweaxoPD0kb2Y5f1fIrp+VZ9rbTFitkbrCVChWvKntLuVvFK2XK tZK7kW1KIarP6dbe+P3z+xzVQ0O0NW+fUM1AVVQ5tiUTdHZt/dA+u7DxogCWd5KRmR92 yCLA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=q3XRO2JHGoG03/g3gEaz2PJltoYcpBW/dzqMzPu1hMY=; b=oYIdSXncXeay/MIm414Fi47YSIGM29be2EYvaBRMnrp/YuXSivWJw5emfu4ivKW2Pm k5MdjoOlhXc1bQNfxXFgoFDQpDzRgLEi1BelvetIobVPc9FrC46+0Ug7R/vTV1mCKoYJ gu5jvOQBf5mMAC1jN2LdQBcenTJN+KJ1qTmvZlKfKSLlM46GmrTrm1H7IcoCA1brPXCv 0Mmkyd/IY1phvfQYtWCq2s7gWCzo+Ipzd6EAwsm6TAdpE1AkLkfiBD/zGnJX+8b1ZKz8 6K2KpK1mlNH+TiRD8caBG8od/3Q9SHsxQ/SD7hgKxfPJ5JSLV58of+jy+UxQaYRusRpU hrkQ== X-Gm-Message-State: AOAM530y1oGPazsyiqEZ2VkKIhWj3jB7cZWYSmcHn3b/+NPEB5Y2qM2Z 3Pj7JPb7mk88CBZYKQWuXVxpRqv0gM4QsA== X-Google-Smtp-Source: ABdhPJyzHDO0EfjbwcRPWw0j6flD5+e+Tvrs/m0CPheDNSqiz/zc58E5bT/75RohdVokdIcKvYrapg== X-Received: by 2002:a1c:46c4:: with SMTP id t187mr6081317wma.64.1628019573934; Tue, 03 Aug 2021 12:39:33 -0700 (PDT) Received: from vm.nix.is (vm.nix.is. [2a01:4f8:120:2468::2]) by smtp.gmail.com with ESMTPSA id n8sm15333562wrx.46.2021.08.03.12.39.33 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Tue, 03 Aug 2021 12:39:33 -0700 (PDT) From: =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= To: git@vger.kernel.org Cc: Junio C Hamano , Emily Shaffer , Jeff King , Taylor Blau , Felipe Contreras , Eric Sunshine , "brian m . carlson" , Josh Steadmon , Jonathan Tan , Derrick Stolee , =?UTF-8?q?=C3=86var=20Arnfj=C3=B6r=C3=B0=20Bjarmason?= Subject: [PATCH v4 24/36] run-command: add stdin callback for parallelization Date: Tue, 3 Aug 2021 21:38:50 +0200 Message-Id: X-Mailer: git-send-email 2.33.0.rc0.593.g54be4d223c3 In-Reply-To: References: MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Precedence: bulk List-ID: X-Mailing-List: git@vger.kernel.org From: Emily Shaffer If a user of the run_processes_parallel() API wants to pipe a large amount of information to stdin of each parallel command, that information could exceed the buffer of the pipe allocated for that process's stdin. Generally this is solved by repeatedly writing to child_process.in between calls to start_command() and finish_command(); run_processes_parallel() did not provide users an opportunity to access child_process at that time. Because the data might be extremely large (for example, a list of all refs received during a push from a client) simply taking a string_list or strbuf is not as scalable as using a callback; the rest of the run_processes_parallel() API also uses callbacks, so making this feature match the rest of the API reduces mental load on the user. Signed-off-by: Emily Shaffer Signed-off-by: Ævar Arnfjörð Bjarmason --- builtin/fetch.c | 1 + builtin/submodule--helper.c | 2 +- hook.c | 1 + run-command.c | 54 +++++++++++++++++++++++++++++++++++-- run-command.h | 17 +++++++++++- submodule.c | 1 + t/helper/test-run-command.c | 31 ++++++++++++++++++--- t/t0061-run-command.sh | 30 +++++++++++++++++++++ 8 files changed, 129 insertions(+), 8 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index 25740c13df1..fef6e85d003 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -1817,6 +1817,7 @@ static int fetch_multiple(struct string_list *list, int max_children) result = run_processes_parallel_tr2(max_children, &fetch_next_remote, &fetch_failed_to_start, + NULL, &fetch_finished, &state, "fetch", "parallel/fetch"); diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c index f73963ad67d..f42ded548bf 100644 --- a/builtin/submodule--helper.c +++ b/builtin/submodule--helper.c @@ -2294,7 +2294,7 @@ static int update_submodules(struct submodule_update_clone *suc) int i; run_processes_parallel_tr2(suc->max_jobs, update_clone_get_next_task, - update_clone_start_failure, + update_clone_start_failure, NULL, update_clone_task_finished, suc, "submodule", "parallel/update"); diff --git a/hook.c b/hook.c index 63c9a60921f..a7462639d97 100644 --- a/hook.c +++ b/hook.c @@ -175,6 +175,7 @@ int run_hooks(const char *hook_name, const char *hook_path, run_processes_parallel_tr2(jobs, pick_next_hook, notify_start_failure, + NULL, notify_hook_finished, &cb_data, "hook", diff --git a/run-command.c b/run-command.c index 482ee2d76c6..f1616858d18 100644 --- a/run-command.c +++ b/run-command.c @@ -1492,6 +1492,7 @@ struct parallel_processes { get_next_task_fn get_next_task; start_failure_fn start_failure; + feed_pipe_fn feed_pipe; task_finished_fn task_finished; struct { @@ -1519,6 +1520,13 @@ static int default_start_failure(struct strbuf *out, return 0; } +static int default_feed_pipe(struct strbuf *pipe, + void *pp_cb, + void *pp_task_cb) +{ + return 1; +} + static int default_task_finished(int result, struct strbuf *out, void *pp_cb, @@ -1549,6 +1557,7 @@ static void pp_init(struct parallel_processes *pp, int n, get_next_task_fn get_next_task, start_failure_fn start_failure, + feed_pipe_fn feed_pipe, task_finished_fn task_finished, void *data) { @@ -1567,6 +1576,7 @@ static void pp_init(struct parallel_processes *pp, pp->get_next_task = get_next_task; pp->start_failure = start_failure ? start_failure : default_start_failure; + pp->feed_pipe = feed_pipe ? feed_pipe : default_feed_pipe; pp->task_finished = task_finished ? task_finished : default_task_finished; pp->nr_processes = 0; @@ -1664,6 +1674,37 @@ static int pp_start_one(struct parallel_processes *pp) return 0; } +static void pp_buffer_stdin(struct parallel_processes *pp) +{ + int i; + struct strbuf sb = STRBUF_INIT; + + /* Buffer stdin for each pipe. */ + for (i = 0; i < pp->max_processes; i++) { + if (pp->children[i].state == GIT_CP_WORKING && + pp->children[i].process.in > 0) { + int done; + strbuf_reset(&sb); + done = pp->feed_pipe(&sb, pp->data, + pp->children[i].data); + if (sb.len) { + if (write_in_full(pp->children[i].process.in, + sb.buf, sb.len) < 0) { + if (errno != EPIPE) + die_errno("write"); + done = 1; + } + } + if (done) { + close(pp->children[i].process.in); + pp->children[i].process.in = 0; + } + } + } + + strbuf_release(&sb); +} + static void pp_buffer_stderr(struct parallel_processes *pp, int output_timeout) { int i; @@ -1728,6 +1769,7 @@ static int pp_collect_finished(struct parallel_processes *pp) pp->nr_processes--; pp->children[i].state = GIT_CP_FREE; pp->pfd[i].fd = -1; + pp->children[i].process.in = 0; child_process_init(&pp->children[i].process); if (i != pp->output_owner) { @@ -1761,6 +1803,7 @@ static int pp_collect_finished(struct parallel_processes *pp) int run_processes_parallel(int n, get_next_task_fn get_next_task, start_failure_fn start_failure, + feed_pipe_fn feed_pipe, task_finished_fn task_finished, void *pp_cb) { @@ -1769,7 +1812,9 @@ int run_processes_parallel(int n, int spawn_cap = 4; struct parallel_processes pp; - pp_init(&pp, n, get_next_task, start_failure, task_finished, pp_cb); + sigchain_push(SIGPIPE, SIG_IGN); + + pp_init(&pp, n, get_next_task, start_failure, feed_pipe, task_finished, pp_cb); while (1) { for (i = 0; i < spawn_cap && !pp.shutdown && @@ -1786,6 +1831,7 @@ int run_processes_parallel(int n, } if (!pp.nr_processes) break; + pp_buffer_stdin(&pp); pp_buffer_stderr(&pp, output_timeout); pp_output(&pp); code = pp_collect_finished(&pp); @@ -1797,11 +1843,15 @@ int run_processes_parallel(int n, } pp_cleanup(&pp); + + sigchain_pop(SIGPIPE); + return 0; } int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task, start_failure_fn start_failure, + feed_pipe_fn feed_pipe, task_finished_fn task_finished, void *pp_cb, const char *tr2_category, const char *tr2_label) { @@ -1811,7 +1861,7 @@ int run_processes_parallel_tr2(int n, get_next_task_fn get_next_task, ((n < 1) ? online_cpus() : n)); result = run_processes_parallel(n, get_next_task, start_failure, - task_finished, pp_cb); + feed_pipe, task_finished, pp_cb); trace2_region_leave(tr2_category, tr2_label, NULL); diff --git a/run-command.h b/run-command.h index cfb6887e4ae..80d394664ae 100644 --- a/run-command.h +++ b/run-command.h @@ -422,6 +422,20 @@ typedef int (*start_failure_fn)(struct strbuf *out, void *pp_cb, void *pp_task_cb); +/** + * This callback is called repeatedly on every child process who requests + * start_command() to create a pipe by setting child_process.in < 0. + * + * pp_cb is the callback cookie as passed into run_processes_parallel, and + * pp_task_cb is the callback cookie as passed into get_next_task_fn. + * The contents of 'send' will be read into the pipe and passed to the pipe. + * + * Return nonzero to close the pipe. + */ +typedef int (*feed_pipe_fn)(struct strbuf *pipe, + void *pp_cb, + void *pp_task_cb); + /** * This callback is called on every child process that finished processing. * @@ -456,10 +470,11 @@ typedef int (*task_finished_fn)(int result, int run_processes_parallel(int n, get_next_task_fn, start_failure_fn, + feed_pipe_fn, task_finished_fn, void *pp_cb); int run_processes_parallel_tr2(int n, get_next_task_fn, start_failure_fn, - task_finished_fn, void *pp_cb, + feed_pipe_fn, task_finished_fn, void *pp_cb, const char *tr2_category, const char *tr2_label); /** diff --git a/submodule.c b/submodule.c index 8e611fe1dbf..db1700a502d 100644 --- a/submodule.c +++ b/submodule.c @@ -1632,6 +1632,7 @@ int fetch_populated_submodules(struct repository *r, run_processes_parallel_tr2(max_parallel_jobs, get_next_submodule, fetch_start_failure, + NULL, fetch_finish, &spf, "submodule", "parallel/fetch"); diff --git a/t/helper/test-run-command.c b/t/helper/test-run-command.c index 7ae03dc7123..9348184d303 100644 --- a/t/helper/test-run-command.c +++ b/t/helper/test-run-command.c @@ -32,8 +32,13 @@ static int parallel_next(struct child_process *cp, return 0; strvec_pushv(&cp->args, d->argv); + cp->in = d->in; + cp->no_stdin = d->no_stdin; strbuf_addstr(err, "preloaded output of a child\n"); number_callbacks++; + + *task_cb = xmalloc(sizeof(int)); + *(int*)(*task_cb) = 2; return 1; } @@ -55,6 +60,17 @@ static int task_finished(int result, return 1; } +static int test_stdin(struct strbuf *pipe, void *cb, void *task_cb) +{ + int *lines_remaining = task_cb; + + if (*lines_remaining) + strbuf_addf(pipe, "sample stdin %d\n", --(*lines_remaining)); + + return !(*lines_remaining); +} + + struct testsuite { struct string_list tests, failed; int next; @@ -185,7 +201,7 @@ static int testsuite(int argc, const char **argv) suite.tests.nr, max_jobs); ret = run_processes_parallel(max_jobs, next_test, test_failed, - test_finished, &suite); + test_stdin, test_finished, &suite); if (suite.failed.nr > 0) { ret = 1; @@ -413,15 +429,22 @@ int cmd__run_command(int argc, const char **argv) if (!strcmp(argv[1], "run-command-parallel")) exit(run_processes_parallel(jobs, parallel_next, - NULL, NULL, &proc)); + NULL, NULL, NULL, &proc)); if (!strcmp(argv[1], "run-command-abort")) exit(run_processes_parallel(jobs, parallel_next, - NULL, task_finished, &proc)); + NULL, NULL, task_finished, &proc)); if (!strcmp(argv[1], "run-command-no-jobs")) exit(run_processes_parallel(jobs, no_job, - NULL, task_finished, &proc)); + NULL, NULL, task_finished, &proc)); + + if (!strcmp(argv[1], "run-command-stdin")) { + proc.in = -1; + proc.no_stdin = 0; + exit (run_processes_parallel(jobs, parallel_next, NULL, + test_stdin, NULL, &proc)); + } fprintf(stderr, "check usage\n"); return 1; diff --git a/t/t0061-run-command.sh b/t/t0061-run-command.sh index 7d599675e35..87759482ad1 100755 --- a/t/t0061-run-command.sh +++ b/t/t0061-run-command.sh @@ -143,6 +143,36 @@ test_expect_success 'run_command runs in parallel with more tasks than jobs avai test_cmp expect actual ' +cat >expect <<-EOF +preloaded output of a child +listening for stdin: +sample stdin 1 +sample stdin 0 +preloaded output of a child +listening for stdin: +sample stdin 1 +sample stdin 0 +preloaded output of a child +listening for stdin: +sample stdin 1 +sample stdin 0 +preloaded output of a child +listening for stdin: +sample stdin 1 +sample stdin 0 +EOF + +test_expect_success 'run_command listens to stdin' ' + write_script stdin-script <<-\EOF && + echo "listening for stdin:" + while read line; do + echo "$line" + done + EOF + test-tool run-command run-command-stdin 2 ./stdin-script 2>actual && + test_cmp expect actual +' + cat >expect <<-EOF preloaded output of a child asking for a quick stop -- 2.33.0.rc0.595.ge31e012651d