All of lore.kernel.org
 help / color / mirror / Atom feed
From: Johannes Schindelin <Johannes.Schindelin@gmx.de>
To: Elijah Newren <newren@gmail.com>
Cc: Matthew John Cheetham via GitGitGadget <gitgitgadget@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	Matthew John Cheetham <mjcheetham@outlook.com>
Subject: Re: [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information
Date: Sun, 6 Feb 2022 22:25:33 +0100 (CET)	[thread overview]
Message-ID: <nycvar.QRO.7.76.6.2202062214110.347@tvgsbejvaqbjf.bet> (raw)
In-Reply-To: <CABPp-BHeLzinXkX3WgqBNYntJwY_ZAm5D7VdOR7KQahvLOuV=w@mail.gmail.com>

Hi Elijah,

On Thu, 27 Jan 2022, Elijah Newren wrote:

> On Wed, Jan 26, 2022 at 3:37 PM Matthew John Cheetham via GitGitGadget
> <gitgitgadget@gmail.com> wrote:
> >
> > From: Matthew John Cheetham <mjcheetham@outlook.com>
> >
> > When operating at the scale that Scalar wants to support, certain data
> > shapes are more likely to cause undesirable performance issues, such as
> > large numbers or large sizes of loose objects.
>
> Makes sense.
>
> > By including statistics about this, `scalar diagnose` now makes it
> > easier to identify such scenarios.
> >
> > Signed-off-by: Matthew John Cheetham <mjcheetham@outlook.com>
> > ---
> >  contrib/scalar/scalar.c          | 60 ++++++++++++++++++++++++++++++++
> >  contrib/scalar/t/t9099-scalar.sh |  2 ++
> >  2 files changed, 62 insertions(+)
> >
> > diff --git a/contrib/scalar/scalar.c b/contrib/scalar/scalar.c
> > index 690933ffdf3..c0ad4948215 100644
> > --- a/contrib/scalar/scalar.c
> > +++ b/contrib/scalar/scalar.c
> > @@ -686,6 +686,60 @@ static void dir_file_stats(struct strbuf *buf, const char *path)
> >         closedir(dir);
> >  }
> >
> > +static int count_files(char *path)
> > +{
> > +       DIR *dir = opendir(path);
> > +       struct dirent *e;
> > +       int count = 0;
> > +
> > +       if (!dir)
> > +               return 0;
> > +
> > +       while ((e = readdir(dir)) != NULL)
> > +               if (!is_dot_or_dotdot(e->d_name) && e->d_type == DT_REG)
> > +                       count++;
> > +
> > +       closedir(dir);
> > +       return count;
> > +}
> > +
> > +static void loose_objs_stats(struct strbuf *buf, const char *path)
> > +{
> > +       DIR *dir = opendir(path);
> > +       struct dirent *e;
> > +       int count;
> > +       int total = 0;
> > +       unsigned char c;
> > +       struct strbuf count_path = STRBUF_INIT;
> > +       size_t base_path_len;
> > +
> > +       if (!dir)
> > +               return;
> > +
> > +       strbuf_addstr(buf, "Object directory stats for ");
> > +       strbuf_add_absolute_path(buf, path);
> > +       strbuf_addstr(buf, ":\n");
> > +
> > +       strbuf_add_absolute_path(&count_path, path);
> > +       strbuf_addch(&count_path, '/');
> > +       base_path_len = count_path.len;
> > +
> > +       while ((e = readdir(dir)) != NULL)
> > +               if (!is_dot_or_dotdot(e->d_name) &&
> > +                   e->d_type == DT_DIR && strlen(e->d_name) == 2 &&
> > +                   !hex_to_bytes(&c, e->d_name, 1)) {
>
> You only recurse into directories, ignoring individual files.
>
> > +                       strbuf_setlen(&count_path, base_path_len);
> > +                       strbuf_addstr(&count_path, e->d_name);
> > +                       total += (count = count_files(count_path.buf));
> > +                       strbuf_addf(buf, "%s : %7d files\n", e->d_name, count);
>
> This shows the number of files within a directory.
>
> > +               }
> > +
> > +       strbuf_addf(buf, "Total: %d loose objects", total);
>
> and this shows the total number of files across all the directories.
>
> But the commit message suggested you also wanted to check for large
> sizes of loose objects.  Did that get ripped out at some point with
> the commit message not being updated, or is it perhaps going to be
> included later?

No, there was no plan to include this information later, as the original
.NET implementation of `scalar diagnose` did not provide that information,
either (which I take as a strong sign that we never needed this type of
information to help users, at least not up until this point).

Besides, it would be kind of a difficult thing to say conclusively what
makes a loose file "big". Is it the zlib-compressed size on disk? Or the
unpacked size? Should there be a configurable threshold to determine when
an object is big? Should `core.bigFileThreshold` be co-opted for this?

Together with the fact that there was no need for this information in
practice, it makes me doubt that we should add this type of information. I
actually suspect that _iff_ information of that type would be helpful, a
more complete tool like git-sizer (https://github.com/github/git-sizer/)
would be needed, and I do not really want to subsume git-sizer's
functionality in `scalar diagnose`.

I rephrased the commit message.

Ciao,
Dscho

>
> > +
> > +       strbuf_release(&count_path);
> > +       closedir(dir);
> > +}
> > +
> >  static int cmd_diagnose(int argc, const char **argv)
> >  {
> >         struct option options[] = {
> > @@ -734,6 +788,12 @@ static int cmd_diagnose(int argc, const char **argv)
> >         if ((res = stage(tmp_dir.buf, &buf, "packs-local.txt")))
> >                 goto diagnose_cleanup;
> >
> > +       strbuf_reset(&buf);
> > +       loose_objs_stats(&buf, ".git/objects");
> > +
> > +       if ((res = stage(tmp_dir.buf, &buf, "objects-local.txt")))
> > +               goto diagnose_cleanup;
> > +
> >         if ((res = stage_directory(tmp_dir.buf, ".git", 0)) ||
> >             (res = stage_directory(tmp_dir.buf, ".git/hooks", 0)) ||
> >             (res = stage_directory(tmp_dir.buf, ".git/info", 0)) ||
> > diff --git a/contrib/scalar/t/t9099-scalar.sh b/contrib/scalar/t/t9099-scalar.sh
> > index b1745851e31..f2ec156d819 100755
> > --- a/contrib/scalar/t/t9099-scalar.sh
> > +++ b/contrib/scalar/t/t9099-scalar.sh
> > @@ -77,6 +77,8 @@ test_expect_success UNZIP 'scalar diagnose' '
> >         unzip -p "$zip_path" diagnostics.log >out &&
> >         test_file_not_empty out &&
> >         unzip -p "$zip_path" packs-local.txt >out &&
> > +       test_file_not_empty out &&
> > +       unzip -p "$zip_path" objects-local.txt >out &&
> >         test_file_not_empty out
> >  '
> >
> > --
> > gitgitgadget
>

  reply	other threads:[~2022-02-06 21:25 UTC|newest]

Thread overview: 140+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-26  8:41 [PATCH 0/5] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 1/5] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-01-26  9:34   ` René Scharfe
2022-01-26 22:20     ` Taylor Blau
2022-02-06 21:34       ` Johannes Schindelin
2022-01-27 19:38   ` Elijah Newren
2022-01-26  8:41 ` [PATCH 2/5] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-01-26  8:41 ` [PATCH 3/5] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-01-26 22:43   ` Taylor Blau
2022-01-27 15:14     ` Derrick Stolee
2022-02-06 21:38       ` Johannes Schindelin
2022-01-26  8:41 ` [PATCH 4/5] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-01-26 22:50   ` Taylor Blau
2022-01-27 15:17     ` Derrick Stolee
2022-01-27 18:59   ` Elijah Newren
2022-02-06 21:25     ` Johannes Schindelin [this message]
2022-01-26  8:41 ` [PATCH 5/5] scalar diagnose: show a spinner while staging content Johannes Schindelin via GitGitGadget
2022-01-27 15:19 ` [PATCH 0/5] scalar: implement the subcommand "diagnose" Derrick Stolee
2022-02-06 21:13   ` Johannes Schindelin
2022-02-06 22:39 ` [PATCH v2 0/6] " Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 1/6] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-07 23:30       ` Junio C Hamano
2022-02-08 13:12         ` Johannes Schindelin
2022-02-08 17:44           ` Junio C Hamano
2022-02-08 20:58             ` René Scharfe
2022-02-09 22:48               ` Junio C Hamano
2022-02-10 19:10                 ` René Scharfe
2022-02-10 19:23                   ` Junio C Hamano
2022-02-11 19:16                     ` René Scharfe
2022-02-11 21:27                       ` Junio C Hamano
2022-02-12  9:12                         ` René Scharfe
2022-02-13  6:25                           ` Junio C Hamano
2022-02-13  9:02                             ` René Scharfe
2022-02-14 17:22                               ` Junio C Hamano
2022-02-08 12:54       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 2/6] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 3/6] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-02-07 19:55     ` René Scharfe
2022-02-08 12:08       ` Johannes Schindelin
2022-02-06 22:39   ` [PATCH v2 4/6] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 5/6] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-02-06 22:39   ` [PATCH v2 6/6] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-04 15:25   ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-07  2:06       ` Elijah Newren
2022-05-09 21:04         ` Johannes Schindelin
2022-05-04 15:25     ` [PATCH v3 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-04 15:25     ` [PATCH v3 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-07  2:23     ` [PATCH v3 0/7] scalar: implement the subcommand "diagnose" Elijah Newren
2022-05-10 19:26     ` [PATCH v4 " Johannes Schindelin via GitGitGadget
2022-05-10 19:26       ` [PATCH v4 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-10 21:48         ` Junio C Hamano
2022-05-10 22:06           ` rsbecker
2022-05-10 23:21             ` Junio C Hamano
2022-05-11 16:14               ` René Scharfe
2022-05-11 19:27                 ` Junio C Hamano
2022-05-12 16:16                   ` René Scharfe
2022-05-12 18:15                     ` Junio C Hamano
2022-05-12 21:31                       ` Junio C Hamano
2022-05-14  7:06                         ` René Scharfe
2022-05-12 22:31           ` [PATCH] fixup! " Junio C Hamano
2022-05-10 19:26       ` [PATCH v4 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-10 21:56         ` Junio C Hamano
2022-05-10 22:23           ` rsbecker
2022-05-19 18:12             ` Johannes Schindelin
2022-05-19 18:09           ` Johannes Schindelin
2022-05-19 18:44             ` Junio C Hamano
2022-05-10 19:27       ` [PATCH v4 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-17 14:51         ` Ævar Arnfjörð Bjarmason
2022-05-18 17:35           ` Junio C Hamano
2022-05-20  7:30             ` Ævar Arnfjörð Bjarmason
2022-05-20 15:55               ` Johannes Schindelin
2022-05-21  9:54                 ` Ævar Arnfjörð Bjarmason
2022-05-22  5:50                   ` Junio C Hamano
2022-05-24 12:25                     ` Johannes Schindelin
2022-05-24 18:11                       ` Ævar Arnfjörð Bjarmason
2022-05-24 19:29                         ` Junio C Hamano
2022-05-25 10:31                           ` Johannes Schindelin
2022-05-10 19:27       ` [PATCH v4 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-17 14:53         ` Ævar Arnfjörð Bjarmason
2022-05-10 19:27       ` [PATCH v4 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-10 19:27       ` [PATCH v4 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-17 15:03       ` [PATCH v4 0/7] scalar: implement the subcommand "diagnose" Ævar Arnfjörð Bjarmason
2022-05-17 15:28         ` rsbecker
2022-05-19 18:17           ` Johannes Schindelin
2022-05-19 18:17       ` [PATCH v5 " Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-20 14:41           ` René Scharfe
2022-05-20 16:21             ` Junio C Hamano
2022-05-19 18:17         ` [PATCH v5 2/7] archive --add-file-with-contents: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-19 18:17         ` [PATCH v5 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-19 18:18         ` [PATCH v5 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-19 19:23         ` [PATCH v5 0/7] scalar: implement the subcommand "diagnose" Junio C Hamano
2022-05-21 15:08         ` [PATCH v6 " Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 1/7] archive: optionally add "virtual" files Johannes Schindelin via GitGitGadget
2022-05-25 21:11             ` Junio C Hamano
2022-05-26  9:09               ` René Scharfe
2022-05-26 17:10                 ` Junio C Hamano
2022-05-26 18:57                   ` René Scharfe
2022-05-26 20:16                     ` Junio C Hamano
2022-05-27 17:02                       ` René Scharfe
2022-05-27 19:01                         ` Junio C Hamano
2022-05-28  6:57                           ` René Scharfe
2022-05-21 15:08           ` [PATCH v6 2/7] archive --add-virtual-file: allow paths containing colons Johannes Schindelin via GitGitGadget
2022-05-25 20:22             ` Junio C Hamano
2022-05-25 21:42               ` Junio C Hamano
2022-05-25 22:34                 ` Junio C Hamano
2022-05-21 15:08           ` [PATCH v6 3/7] scalar: validate the optional enlistment argument Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 4/7] Implement `scalar diagnose` Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 5/7] scalar diagnose: include disk space information Johannes Schindelin via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 6/7] scalar: teach `diagnose` to gather packfile info Matthew John Cheetham via GitGitGadget
2022-05-21 15:08           ` [PATCH v6 7/7] scalar: teach `diagnose` to gather loose objects information Matthew John Cheetham via GitGitGadget
2022-05-28 23:11           ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 1/7] archive: optionally add "virtual" files Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 2/7] archive --add-virtual-file: allow paths containing colons Junio C Hamano
2022-06-15 18:16               ` Adam Dinwoodie
2022-06-15 20:00                 ` Junio C Hamano
2022-06-15 21:36                   ` Adam Dinwoodie
2022-06-18 20:19                     ` Johannes Schindelin
2022-06-18 22:05                       ` Junio C Hamano
2022-06-20  9:41                       ` Adam Dinwoodie
2022-05-28 23:11             ` [PATCH v6+ 3/7] scalar: validate the optional enlistment argument Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 4/7] scalar: implement `scalar diagnose` Junio C Hamano
2022-06-10  2:08               ` Ævar Arnfjörð Bjarmason
2022-06-10 16:44                 ` Junio C Hamano
2022-06-10 17:35                   ` Ævar Arnfjörð Bjarmason
2022-05-28 23:11             ` [PATCH v6+ 5/7] scalar diagnose: include disk space information Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 6/7] scalar: teach `diagnose` to gather packfile info Junio C Hamano
2022-05-28 23:11             ` [PATCH v6+ 7/7] scalar: teach `diagnose` to gather loose objects information Junio C Hamano
2022-05-30 10:12             ` [PATCH v6+ 0/7] js/scalar-diagnose rebased Johannes Schindelin
2022-05-30 17:37               ` Junio C Hamano

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=nycvar.QRO.7.76.6.2202062214110.347@tvgsbejvaqbjf.bet \
    --to=johannes.schindelin@gmx.de \
    --cc=git@vger.kernel.org \
    --cc=gitgitgadget@gmail.com \
    --cc=mjcheetham@outlook.com \
    --cc=newren@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.