From: Jeff King <peff@peff.net>
To: Duy Nguyen <pclouds@gmail.com>
Cc: "Christian Couder" <christian.couder@gmail.com>,
	"Thomas Gummerer" <t.gummerer@gmail.com>,
	"Matheus Tavares Bernardino" <matheus.bernardino@usp.br>,
	git <git@vger.kernel.org>,
	"Оля Тележная" <olyatelezhnaya@gmail.com>,
	"Elijah Newren" <newren@gmail.com>,
	"Tanushree Tumane" <tanushreetumane@gmail.com>
Subject: Re: Questions on GSoC 2019 Ideas
Date: Mon, 4 Mar 2019 23:51:40 -0500
Message-ID: <20190305045140.GH19800@sigill.intra.peff.net>
In-Reply-To: <CACsJy8ATKdcDdbTzCdZFhChKEAWhjuYQJBpGXZ9HAVXK1r2pFw@mail.gmail.com>

On Sun, Mar 03, 2019 at 05:12:59PM +0700, Duy Nguyen wrote:

> On Sun, Mar 3, 2019 at 2:18 PM Christian Couder
> <christian.couder@gmail.com> wrote:
> > One thing I am still worried about is if we are sure that adding
> > parallelism is likely to get us a significant performance improvement
> > or not. If the performance of this code is bounded by disk or memory
> > access, then adding parallelism might not bring any benefit. (It could
> > perhaps decrease performance if memory locality gets worse.) So I'd
> > like some confirmation either by running some tests or by experienced
> > Git developers that it is likely to be a win.
> 
> This is a good point. My guess is the pack access consists of two
> parts: zlib inflation plus delta resolution (which is just another
> form of compression), and actual I/O. The former is CPU bound and may take
> advantage of multiple cores. However, the cache we have kinda helps
> reduce CPU work load already, so perhaps the actual gain is not that
> much (or maybe we could just improve this cache to be more efficient).
> I'm adding Jeff, maybe he has done some experiments on parallel pack
> access, who knows.

Sorry, I don't have anything intelligent to add here. I do know that
`index-pack` doesn't scale well with more cores. I don't think I've ever
looked at adding parallel access to the packs themselves. I suspect it
would be tricky due to a few global variables (the pack windows, the
delta cache, etc.).

> The second good thing from parallel pack access is not about utilizing
> processing power from multiple cores, but about _not_ blocking. I
> think one example use case here is parallel checkout. While one thread
> is blocked by pack access code for whatever reason, the others can
> still continue doing other stuff (e.g. write the checked out file to
> disk) or even access the pack again to check more things out.

I'm not sure if it would help much for packs, because they're organized
to have pretty good cold-cache read-ahead behavior. But who knows until
we measure it.

I do suspect that inflating (and delta reconstruction) done in parallel
could be a win for git-grep, especially if you have a really simple
regex that is quick to search.
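
As a rough illustration of that last idea (a sketch, not Git code): the
snippet below inflates zlib-compressed blobs on a thread pool and runs a
simple regex over each one. The names `grep_blob` and `parallel_grep` are
hypothetical, the blobs are plain zlib streams rather than real pack
entries, and there is no delta reconstruction. It also assumes CPython's
zlib module releases the GIL while inflating, which is what would let the
threads overlap at all.

```python
import re
import zlib
from concurrent.futures import ThreadPoolExecutor


def grep_blob(compressed, pattern):
    """Inflate one zlib-compressed blob and return its matching lines."""
    text = zlib.decompress(compressed).decode("utf-8", errors="replace")
    return [line for line in text.splitlines() if pattern.search(line)]


def parallel_grep(blobs, regex, workers=4):
    """Inflate and search each blob on a thread pool, keeping input order."""
    pattern = re.compile(regex)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input blobs.
        return list(pool.map(lambda blob: grep_blob(blob, pattern), blobs))


if __name__ == "__main__":
    # Stand-ins for object contents; real pack data is laid out differently.
    sample = [zlib.compress(s.encode()) for s in ("foo\nbar\n", "baz\nfoo bar\n")]
    print(parallel_grep(sample, r"foo"))
```

Whether this beats a single thread would still have to be measured; with
a cheap regex the inflate step should dominate, which is the case where
parallelism has the best chance of paying off.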

-Peff

