* git-p4 out of memory for very large repository
@ 2013-08-23  1:12 Corey Thompson
  2013-08-23  7:16 ` Luke Diamand
  0 siblings, 1 reply; 13+ messages in thread
From: Corey Thompson @ 2013-08-23  1:12 UTC (permalink / raw)
  To: git

Hello,

Has anyone actually gotten git-p4 to clone a large Perforce repository?
I have one codebase in particular that gets to about 67%, then
consistently gets git-fast-import (and oftentimes a few other
processes) killed by the OOM killer.

I've found some patches out there that claim to resolve this, but
they're all for versions of git-p4.py from several years ago.  Not only
will they not apply cleanly, but as far as I can tell the issues that
these patches are meant to address aren't in the current version,
anyway.

Any suggestions would be greatly appreciated.

Thanks,
Corey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-23  1:12 git-p4 out of memory for very large repository Corey Thompson
@ 2013-08-23  7:16 ` Luke Diamand
  2013-08-23 11:48   ` Corey Thompson
  0 siblings, 1 reply; 13+ messages in thread
From: Luke Diamand @ 2013-08-23  7:16 UTC (permalink / raw)
  To: Corey Thompson; +Cc: git

On 23/08/13 02:12, Corey Thompson wrote:
> Hello,
>
> Has anyone actually gotten git-p4 to clone a large Perforce repository?

Yes. I've cloned repos with a couple of Gig of files.

> I have one codebase in particular that gets to about 67%, then
> consistently gets git-fast-import (and oftentimes a few other
> processes) killed by the OOM killer.

What size is this codebase? Which version and platform of git are you using?

Maybe it's a regression, or perhaps you've hit some new, previously 
unknown size limit?

Thanks
Luke


>
> I've found some patches out there that claim to resolve this, but
> they're all for versions of git-p4.py from several years ago.  Not only
> will they not apply cleanly, but as far as I can tell the issues that
> these patches are meant to address aren't in the current version,
> anyway.
>
> Any suggestions would be greatly appreciated.
>
> Thanks,
> Corey
> --
> To unsubscribe from this list: send the line "unsubscribe git" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-23  7:16 ` Luke Diamand
@ 2013-08-23 11:48   ` Corey Thompson
  2013-08-23 11:59     ` Corey Thompson
  2013-08-25 15:50     ` Pete Wyckoff
  0 siblings, 2 replies; 13+ messages in thread
From: Corey Thompson @ 2013-08-23 11:48 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git

On Fri, Aug 23, 2013 at 08:16:58AM +0100, Luke Diamand wrote:
> On 23/08/13 02:12, Corey Thompson wrote:
> >Hello,
> >
> >Has anyone actually gotten git-p4 to clone a large Perforce repository?
> 
> Yes. I've cloned repos with a couple of Gig of files.
> 
> >I have one codebase in particular that gets to about 67%, then
> >consistently gets git-fast-import (and oftentimes a few other
> >processes) killed by the OOM killer.
> 
> What size is this codebase? Which version and platform of git are you using?
> 
> Maybe it's a regression, or perhaps you've hit some new, previously
> unknown size limit?
> 
> Thanks
> Luke
> 
> 
> >
> >I've found some patches out there that claim to resolve this, but
> >they're all for versions of git-p4.py from several years ago.  Not only
> >will they not apply cleanly, but as far as I can tell the issues that
> >these patches are meant to address aren't in the current version,
> >anyway.
> >
> >Any suggestions would be greatly appreciated.
> >
> >Thanks,
> >Corey
> >--
> >To unsubscribe from this list: send the line "unsubscribe git" in
> >the body of a message to majordomo@vger.kernel.org
> >More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 

Sorry, I guess I could have included more details in my original post.
Since then, I have also made an attempt to clone another (slightly more
recent) branch, and at last had success.  So I see this does indeed
work, it just seems to be very unhappy with one particular branch.

So, here are a few statistics I collected on the two branches.

branch-that-fails:
total workspace disk usage (current head): 12GB
68 files over 20MB
largest three being about 118MB

branch-that-clones:
total workspace disk usage (current head): 11GB
22 files over 20MB
largest three being about 80MB
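
Numbers like these can be pulled from a synced workspace with something
along the lines of:

du -sh .
find . -type f -size +20M | wc -l
find . -type f -size +20M -exec du -m {} + | sort -n | tail -3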

I suspect that part of the problem here might be that my company likes
to submit very large binaries into our repo (.tar.gzs, pre-compiled
third party binaries, etc.).

Is there any way I can clone this in pieces?  The best I've come up with
is to clone only up to a change number just before it tends to fail, and
then rebase to the latest.  My clone succeeded, but the rebase still
runs out of memory.  It would be great if I could specify a change
number to rebase up to, so that I can just take this thing a few hundred
changes at a time.
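
Concretely, what I mean is roughly this, with the change numbers elided:

git p4 clone --destination=branch //path/to/branch@<change-just-before-failure>
cd branch
git p4 rebase   # tries to go all the way to #head and gets OOM-killed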

Thanks,
Corey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-23 11:48   ` Corey Thompson
@ 2013-08-23 11:59     ` Corey Thompson
  2013-08-23 19:42       ` Luke Diamand
  2013-08-25 15:50     ` Pete Wyckoff
  1 sibling, 1 reply; 13+ messages in thread
From: Corey Thompson @ 2013-08-23 11:59 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git

On Fri, Aug 23, 2013 at 07:48:56AM -0400, Corey Thompson wrote:
> Sorry, I guess I could have included more details in my original post.
> Since then, I have also made an attempt to clone another (slightly more
> recent) branch, and at last had success.  So I see this does indeed
> work, it just seems to be very unhappy with one particular branch.
> 
> So, here are a few statistics I collected on the two branches.
> 
> branch-that-fails:
> total workspace disk usage (current head): 12GB
> 68 files over 20MB
> largest three being about 118MB
> 
> branch-that-clones:
> total workspace disk usage (current head): 11GB
> 22 files over 20MB
> largest three being about 80MB
> 
> I suspect that part of the problem here might be that my company likes
> to submit very large binaries into our repo (.tar.gzs, pre-compiled
> third party binaries, etc.).
> 
> Is there any way I can clone this in pieces?  The best I've come up with
> is to clone only up to a change number just before it tends to fail, and
> then rebase to the latest.  My clone succeeded, but the rebase still
> runs out of memory.  It would be great if I could specify a change
> number to rebase up to, so that I can just take this thing a few hundred
> changes at a time.
> 
> Thanks,
> Corey

And I still haven't told you anything about my platform or git
version...

This is on Fedora Core 11, with git 1.8.3.4 built from the github repo
(117eea7e).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-23 11:59     ` Corey Thompson
@ 2013-08-23 19:42       ` Luke Diamand
  2013-08-24  0:56         ` Corey Thompson
  0 siblings, 1 reply; 13+ messages in thread
From: Luke Diamand @ 2013-08-23 19:42 UTC (permalink / raw)
  To: Corey Thompson; +Cc: git


I think I've cloned files as large as that or larger. If you just want to
clone this and move on, perhaps you just need a bit more memory? What's the
size of your physical memory and swap partition? Per process memory limit?
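
Something like this would tell us:

    free -m
    swapon -s
    ulimit -a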


On 23 Aug 2013 12:59, "Corey Thompson" <cmtptr@gmail.com> wrote:
> On Fri, Aug 23, 2013 at 07:48:56AM -0400, Corey Thompson wrote:
>> Sorry, I guess I could have included more details in my original post.
>> Since then, I have also made an attempt to clone another (slightly more
>> recent) branch, and at last had success.  So I see this does indeed
>> work, it just seems to be very unhappy with one particular branch.
>>
>> So, here are a few statistics I collected on the two branches.
>>
>> branch-that-fails:
>> total workspace disk usage (current head): 12GB
>> 68 files over 20MB
>> largest three being about 118MB
>>
>> branch-that-clones:
>> total workspace disk usage (current head): 11GB
>> 22 files over 20MB
>> largest three being about 80MB
>>
>> I suspect that part of the problem here might be that my company likes
>> to submit very large binaries into our repo (.tar.gzs, pre-compiled
>> third party binaries, etc.).
>>
>> Is there any way I can clone this in pieces?  The best I've come up with
>> is to clone only up to a change number just before it tends to fail, and
>> then rebase to the latest.  My clone succeeded, but the rebase still
>> runs out of memory.  It would be great if I could specify a change
>> number to rebase up to, so that I can just take this thing a few hundred
>> changes at a time.
>>
>> Thanks,
>> Corey
> 
> And I still haven't told you anything about my platform or git
> version...
> 
> This is on Fedora Core 11, with git 1.8.3.4 built from the github repo
> (117eea7e).

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-23 19:42       ` Luke Diamand
@ 2013-08-24  0:56         ` Corey Thompson
  0 siblings, 0 replies; 13+ messages in thread
From: Corey Thompson @ 2013-08-24  0:56 UTC (permalink / raw)
  To: Luke Diamand; +Cc: git

On Fri, Aug 23, 2013 at 08:42:44PM +0100, Luke Diamand wrote:
> 
> I think I've cloned files as large as that or larger. If you just want to
> clone this and move on, perhaps you just need a bit more memory? What's the
> size of your physical memory and swap partition? Per process memory limit?
> 

The machine has 32GB of memory, so I'd hope that's more than
sufficient!

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 268288
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1024
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 10240
cpu time               (seconds, -t) unlimited
max user processes              (-u) 1024
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

Admittedly I don't typically look at ulimit, so please excuse me if I
interpret this wrong, but I feel like this is indicating that the only
artificial limit in place is a maximum of 64kB mlock()'d memory.

Thanks,
Corey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-23 11:48   ` Corey Thompson
  2013-08-23 11:59     ` Corey Thompson
@ 2013-08-25 15:50     ` Pete Wyckoff
  2013-08-26 13:47       ` Corey Thompson
  1 sibling, 1 reply; 13+ messages in thread
From: Pete Wyckoff @ 2013-08-25 15:50 UTC (permalink / raw)
  To: Corey Thompson; +Cc: Luke Diamand, git

cmtptr@gmail.com wrote on Fri, 23 Aug 2013 07:48 -0400:
> On Fri, Aug 23, 2013 at 08:16:58AM +0100, Luke Diamand wrote:
> > On 23/08/13 02:12, Corey Thompson wrote:
> > >Hello,
> > >
> > >Has anyone actually gotten git-p4 to clone a large Perforce repository?
> > 
> > Yes. I've cloned repos with a couple of Gig of files.
> > 
> > >I have one codebase in particular that gets to about 67%, then
> > >consistently gets git-fast-import (and oftentimes a few other
> > >processes) killed by the OOM killer.
[..]
> Sorry, I guess I could have included more details in my original post.
> Since then, I have also made an attempt to clone another (slightly more
> recent) branch, and at last had success.  So I see this does indeed
> work, it just seems to be very unhappy with one particular branch.
> 
> So, here are a few statistics I collected on the two branches.
> 
> branch-that-fails:
> total workspace disk usage (current head): 12GB
> 68 files over 20MB
> largest three being about 118MB
> 
> branch-that-clones:
> total workspace disk usage (current head): 11GB
> 22 files over 20MB
> largest three being about 80MB
> 
> I suspect that part of the problem here might be that my company likes
> to submit very large binaries into our repo (.tar.gzs, pre-compiled
> third party binaries, etc.).
> 
> Is there any way I can clone this in pieces?  The best I've come up with
> is to clone only up to a change number just before it tends to fail, and
> then rebase to the latest.  My clone succeeded, but the rebase still
> runs out of memory.  It would be great if I could specify a change
> number to rebase up to, so that I can just take this thing a few hundred
> changes at a time.

Modern git, including your version, does "streaming" reads from p4,
so the git-p4 python process never even holds a whole file's
worth of data.  You're seeing git-fast-import die, it seems.  It
will hold onto the entire file contents.  But just one, not the
entire repo.  How big is the single largest file?
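
Something like this should show the biggest revisions on the p4 side
(the depot path is made up, and -a walks every revision so it can be
slow):

    p4 sizes -a //depot/big/... | sort -n -k2 | tail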

You can import in pieces.  See the change numbers like this:

    p4 changes -m 1000 //depot/big/...
    p4 changes -m 1000 //depot/big/...@<some-old-change>

Import something far enough back in history so that it seems
to work:

    git p4 clone --destination=big //depot/big@60602
    cd big

Sync up a bit at a time:

    git p4 sync @60700
    git p4 sync @60800
    ...

I don't expect this to get around the problem you describe,
however.  Sounds like there is one gigantic file that is causing
git-fast-import to fill all of memory.  You will at least isolate
the change.

There are options to git-fast-import to limit max pack size
and to cause it to skip importing files that are too big, if
that would help.
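
The knobs I have in mind are fast-import's --max-pack-size and
--big-file-threshold (the sizes below are just illustrative):

    git fast-import --max-pack-size=1g --big-file-threshold=100m

though git-p4 starts fast-import itself, so getting these passed
through may take a little hacking.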

You can also use a client spec to hide the offending files
from git.
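
E.g. an exclusion line in the client view, plus telling git-p4 to
honour it (client name and paths made up):

    //depot/big/...            //your-client/big/...
    -//depot/big/tarballs/...  //your-client/big/tarballs/...

    git config git-p4.useClientSpec true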

Can you watch with "top"?  Hit "M" to sort by memory usage, and
see how big the processes get before falling over.

		-- Pete

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-25 15:50     ` Pete Wyckoff
@ 2013-08-26 13:47       ` Corey Thompson
  2013-08-28 15:41         ` Corey Thompson
  0 siblings, 1 reply; 13+ messages in thread
From: Corey Thompson @ 2013-08-26 13:47 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: Luke Diamand, git

On Sun, Aug 25, 2013 at 11:50:01AM -0400, Pete Wyckoff wrote:
> Modern git, including your version, does "streaming" reads from p4,
> so the git-p4 python process never even holds a whole file's
> worth of data.  You're seeing git-fast-import die, it seems.  It
> will hold onto the entire file contents.  But just one, not the
> entire repo.  How big is the single largest file?
> 
> You can import in pieces.  See the change numbers like this:
> 
>     p4 changes -m 1000 //depot/big/...
>     p4 changes -m 1000 //depot/big/...@<some-old-change>
> 
> Import something far enough back in history so that it seems
> to work:
> 
>     git p4 clone --destination=big //depot/big@60602
>     cd big
> 
> Sync up a bit at a time:
> 
>     git p4 sync @60700
>     git p4 sync @60800
>     ...
> 
> I don't expect this to get around the problem you describe,
> however.  Sounds like there is one gigantic file that is causing
> git-fast-import to fill all of memory.  You will at least isolate
> the change.
> 
> There are options to git-fast-import to limit max pack size
> and to cause it to skip importing files that are too big, if
> that would help.
> 
> You can also use a client spec to hide the offending files
> from git.
> 
> Can you watch with "top"?  Hit "M" to sort by memory usage, and
> see how big the processes get before falling over.
> 
> 		-- Pete

You are correct that git-fast-import is killed by the OOM killer, but I
was unclear about which process was malloc()ing so much memory that the
OOM killer got invoked (as other completely unrelated processes usually
also get killed when this happens).

Unless there's one gigantic file in one change that gets removed by
another change, I don't think that's the problem; as I mentioned in
another email, the machine has 32GB physical memory and the largest
single file in the current head is only 118MB.  Even if there is a very
large transient file somewhere in the history, I seriously doubt it's
tens of gigabytes in size.

I have tried watching it with top before, but it takes several hours
before it dies.  I haven't been able to see any explosion of memory
usage, even within the final hour, but I've never caught it just before
it dies, either.  I suspect that whatever the issue is here, it happens
very quickly.

If I'm unable to get through this today using the incremental p4 sync
method you described, I'll try running a full-blown clone overnight with
top in batch mode writing to a log file to see whether it catches
anything.
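
Probably something along these lines (interval and paths to taste):

top -b -d 30 > ~/top-clone.log 2>&1 &
git p4 clone --destination=branch //path/to/branch@all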

Thanks again,
Corey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-26 13:47       ` Corey Thompson
@ 2013-08-28 15:41         ` Corey Thompson
  2013-08-29 22:46           ` Pete Wyckoff
  0 siblings, 1 reply; 13+ messages in thread
From: Corey Thompson @ 2013-08-28 15:41 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: Luke Diamand, git

On Mon, Aug 26, 2013 at 09:47:56AM -0400, Corey Thompson wrote:
> You are correct that git-fast-import is killed by the OOM killer, but I
> was unclear about which process was malloc()ing so much memory that the
> OOM killer got invoked (as other completely unrelated processes usually
> also get killed when this happens).
> 
> Unless there's one gigantic file in one change that gets removed by
> another change, I don't think that's the problem; as I mentioned in
> another email, the machine has 32GB physical memory and the largest
> single file in the current head is only 118MB.  Even if there is a very
> large transient file somewhere in the history, I seriously doubt it's
> tens of gigabytes in size.
> 
> I have tried watching it with top before, but it takes several hours
> before it dies.  I haven't been able to see any explosion of memory
> usage, even within the final hour, but I've never caught it just before
> it dies, either.  I suspect that whatever the issue is here, it happens
> very quickly.
> 
> If I'm unable to get through this today using the incremental p4 sync
> method you described, I'll try running a full-blown clone overnight with
> top in batch mode writing to a log file to see whether it catches
> anything.
> 
> Thanks again,
> Corey

Unfortunately I have not made much progress.  The incremental sync method
fails with the output pasted below.  The change I specified is only one
change number above where that repo was cloned...

So I tried a 'git p4 rebase' overnight with top running, and as I feared
I did not see anything out of the ordinary.  git, git-fast-import, and
git-p4 all hovered under 1.5% MEM the entire time, right up until
death.  The last entry in my log shows git-fast-import at 0.8%, with git
and git-p4 at 0.0% and 0.1%, respectively.  I could try again with a
more granular period, but I feel like this method is ultimately a goose
chase.

Corey


$ git p4 sync //path/to/some/branch@505859
Doing initial import of //path/to/some/branch/ from revision @505859 into refs/remotes/p4/master
fast-import failed: warning: Not updating refs/remotes/p4/master (new tip 29ef6ff25f1448fa2f907d22fd704594dc8769bd does not contain d477672be5ac6a00cc9175ba2713d5395660e840)
git-fast-import statistics:
---------------------------------------------------------------------
Alloc'd objects:     165000
Total objects:           69 (    232434 duplicates                  )
      blobs  :           45 (    209904 duplicates         40 deltas of         42 attempts) 
      trees  :           23 (     22530 duplicates          0 deltas of         23 attempts) 
      commits:            1 (         0 duplicates          0 deltas of          0 attempts) 
      tags   :            0 (         0 duplicates          0 deltas of          0 attempts)
Total branches:           1 (         1 loads     )
      marks:           1024 (         0 unique    )
      atoms:         105170
Memory total:         24421 KiB
       pools:         17976 KiB
     objects:          6445 KiB
---------------------------------------------------------------------
pack_report: getpagesize()            =       4096
pack_report: core.packedGitWindowSize =   33554432
pack_report: core.packedGitLimit      =  268435456
pack_report: pack_used_ctr            =       4371
pack_report: pack_mmap_calls          =        124
pack_report: pack_open_windows        =          8 /          9
pack_report: pack_mapped              =  268435456 /  268435456
---------------------------------------------------------------------

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-28 15:41         ` Corey Thompson
@ 2013-08-29 22:46           ` Pete Wyckoff
  2013-09-02 19:42             ` Luke Diamand
  0 siblings, 1 reply; 13+ messages in thread
From: Pete Wyckoff @ 2013-08-29 22:46 UTC (permalink / raw)
  To: Corey Thompson; +Cc: Luke Diamand, git

cmtptr@gmail.com wrote on Wed, 28 Aug 2013 11:41 -0400:
> On Mon, Aug 26, 2013 at 09:47:56AM -0400, Corey Thompson wrote:
> > You are correct that git-fast-import is killed by the OOM killer, but I
> > was unclear about which process was malloc()ing so much memory that the
> > OOM killer got invoked (as other completely unrelated processes usually
> > also get killed when this happens).
> > 
> > Unless there's one gigantic file in one change that gets removed by
> > another change, I don't think that's the problem; as I mentioned in
> > another email, the machine has 32GB physical memory and the largest
> > single file in the current head is only 118MB.  Even if there is a very
> > large transient file somewhere in the history, I seriously doubt it's
> > tens of gigabytes in size.
> > 
> > I have tried watching it with top before, but it takes several hours
> > before it dies.  I haven't been able to see any explosion of memory
> > usage, even within the final hour, but I've never caught it just before
> > it dies, either.  I suspect that whatever the issue is here, it happens
> > very quickly.
> > 
> > If I'm unable to get through this today using the incremental p4 sync
> > method you described, I'll try running a full-blown clone overnight with
> > top in batch mode writing to a log file to see whether it catches
> > anything.
> > 
> > Thanks again,
> > Corey
> 
> Unfortunately I have not made much progress.  The incremental sync method
> fails with the output pasted below.  The change I specified is only one
> change number above where that repo was cloned...

I usually just do "git p4 sync @505859".  The error message below
crops up when things get confused.  Usually after a previous
error.  I tend to destroy the repo and try again.  Sorry I can't
explain better what's happening here.  It's not a memory
issue; it reports only 24 MB used.

> So I tried a 'git p4 rebase' overnight with top running, and as I feared
> I did not see anything out of the ordinary.  git, git-fast-import, and
> git-p4 all hovered under 1.5% MEM the entire time, right up until
> death.  The last entry in my log shows git-fast-import at 0.8%, with git
> and git-p4 at 0.0% and 0.1%, respectively.  I could try again with a
> more granular period, but I feel like this method is ultimately a goose
> chase.

Bizarre.  There is no good explanation why memory usage would go
up to 32 GB (?) within one top interval (3 sec ?).  My theory
about one gigantic object is debunked:  you have only the 118 MB
one.  Perhaps there's some container or process memory limit, as
Luke guessed, but it's not obvious here.

The other big hammer is "strace".  If you're still interested in
playing with this, you could do:

    strace -vf -tt -s 200 -o /tmp/strace.out git p4 clone ....

and hours later, see if something suggests itself toward the
end of that output file.

		-- Pete

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-08-29 22:46           ` Pete Wyckoff
@ 2013-09-02 19:42             ` Luke Diamand
  2013-09-06 19:03               ` Corey Thompson
  0 siblings, 1 reply; 13+ messages in thread
From: Luke Diamand @ 2013-09-02 19:42 UTC (permalink / raw)
  To: Pete Wyckoff; +Cc: Corey Thompson, git

I guess you could try changing the OOM score for git-fast-import.

change /proc/<pid>/oom_adj.

I think a value of -17 would make it very unlikely to be killed.
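
Something like this once the import is under way (assuming a single
fast-import process; lowering the score needs root):

    echo -17 | sudo tee /proc/$(pgrep -f fast-import)/oom_adj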

On 29/08/13 23:46, Pete Wyckoff wrote:
> cmtptr@gmail.com wrote on Wed, 28 Aug 2013 11:41 -0400:
>> On Mon, Aug 26, 2013 at 09:47:56AM -0400, Corey Thompson wrote:
>>> You are correct that git-fast-import is killed by the OOM killer, but I
>>> was unclear about which process was malloc()ing so much memory that the
>>> OOM killer got invoked (as other completely unrelated processes usually
>>> also get killed when this happens).
>>>
>>> Unless there's one gigantic file in one change that gets removed by
>>> another change, I don't think that's the problem; as I mentioned in
>>> another email, the machine has 32GB physical memory and the largest
>>> single file in the current head is only 118MB.  Even if there is a very
>>> large transient file somewhere in the history, I seriously doubt it's
>>> tens of gigabytes in size.
>>>
>>> I have tried watching it with top before, but it takes several hours
>>> before it dies.  I haven't been able to see any explosion of memory
>>> usage, even within the final hour, but I've never caught it just before
>>> it dies, either.  I suspect that whatever the issue is here, it happens
>>> very quickly.
>>>
>>> If I'm unable to get through this today using the incremental p4 sync
>>> method you described, I'll try running a full-blown clone overnight with
>>> top in batch mode writing to a log file to see whether it catches
>>> anything.
>>>
>>> Thanks again,
>>> Corey
>>
>> Unfortunately I have not made much progress.  The incremental sync method
>> fails with the output pasted below.  The change I specified is only one
>> change number above where that repo was cloned...
>
> I usually just do "git p4 sync @505859".  The error message below
> crops up when things get confused.  Usually after a previous
> error.  I tend to destroy the repo and try again.  Sorry I can't
> explain better what's happening here.  It's not a memory
> issue; it reports only 24 MB used.
>
>> So I tried a 'git p4 rebase' overnight with top running, and as I feared
>> I did not see anything out of the ordinary.  git, git-fast-import, and
>> git-p4 all hovered under 1.5% MEM the entire time, right up until
>> death.  The last entry in my log shows git-fast-import at 0.8%, with git
>> and git-p4 at 0.0% and 0.1%, respectively.  I could try again with a
>> more granular period, but I feel like this method is ultimately a goose
>> chase.
>
> Bizarre.  There is no good explanation why memory usage would go
> up to 32 GB (?) within one top interval (3 sec ?).  My theory
> about one gigantic object is debunked:  you have only the 118 MB
> one.  Perhaps there's some container or process memory limit, as
> Luke guessed, but it's not obvious here.
>
> The other big hammer is "strace".  If you're still interested in
> playing with this, you could do:
>
>      strace -vf -tt -s 200 -o /tmp/strace.out git p4 clone ....
>
> and hours later, see if something suggests itself toward the
> end of that output file.
>
> 		-- Pete

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-09-02 19:42             ` Luke Diamand
@ 2013-09-06 19:03               ` Corey Thompson
  2013-09-07  8:19                 ` Pete Wyckoff
  0 siblings, 1 reply; 13+ messages in thread
From: Corey Thompson @ 2013-09-06 19:03 UTC (permalink / raw)
  To: Luke Diamand; +Cc: Pete Wyckoff, git

On Mon, Sep 02, 2013 at 08:42:36PM +0100, Luke Diamand wrote:
> I guess you could try changing the OOM score for git-fast-import.
> 
> change /proc/<pid>/oom_adj.
> 
> I think a value of -17 would make it very unlikely to be killed.
> 
> On 29/08/13 23:46, Pete Wyckoff wrote:
> >I usually just do "git p4 sync @505859".  The error message below
> >crops up when things get confused.  Usually after a previous
> >error.  I tend to destroy the repo and try again.  Sorry I can't
> >explain better what's happening here.  It's not a memory
> >issue; it reports only 24 MB used.
> >
> >Bizarre.  There is no good explanation why memory usage would go
> >up to 32 GB (?) within one top interval (3 sec ?).  My theory
> >about one gigantic object is debunked:  you have only the 118 MB
> >one.  Perhaps there's some container or process memory limit, as
> >Luke guessed, but it's not obvious here.
> >
> >The other big hammer is "strace".  If you're still interested in
> >playing with this, you could do:
> >
> >     strace -vf -tt -s 200 -o /tmp/strace.out git p4 clone ....
> >
> >and hours later, see if something suggests itself toward the
> >end of that output file.
> >
> >		-- Pete
> 

Finally, I claim success!  Unfortunately I did not try either of the OOM
score or strace suggestions - sorry!  After spending so much time on
this, I've gotten to the point that I'm more interested in getting it to
work than in figuring out why the direct approach isn't working; it
sounds like you're both pretty confident that git is working as it
should, and I don't maintain the system I'm doing this on so I don't
doubt that there might be some artificial limit or other quirk here that
we just aren't seeing.

Anyway, what I found is that Pete's incremental method does work, I just
have to know how to do it properly!  This is what I WAS doing to
generate the error message I pasted several posts ago:

git p4 clone //path/to/branch@<begin>,<stage1>
cd branch
git p4 sync //path/to/branch@<stage2>
# ERROR!
# (I also tried //path/to/branch@<stage1+1>,<stage2>, same error)

Eventually what happened is that I downloaded the free 20-user p4d, set
up a very small repository with only 4 changes, and started some old
fashioned trial-and-error.  Here's what I should have been doing all
along:

git p4 clone //path/to/branch@<begin>,<stage1>
cd branch
git p4 sync //path/to/branch@<begin>,<stage2>
git p4 sync //path/to/branch@<begin>,<stage3>
# and so on...

And syncing a few thousand changes every day over the course of the past
week, my git repo is finally up to the Perforce HEAD.  So I suppose
ultimately this was my own misunderstanding, partly because when you
begin your range at the original first change number the output looks
suspiciously like it's importing changes again that it's already
imported.  Maybe this is all documented somewhere, and if it is I just
failed to find it.

Thanks to both of you for all your help!
Corey

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: git-p4 out of memory for very large repository
  2013-09-06 19:03               ` Corey Thompson
@ 2013-09-07  8:19                 ` Pete Wyckoff
  0 siblings, 0 replies; 13+ messages in thread
From: Pete Wyckoff @ 2013-09-07  8:19 UTC (permalink / raw)
  To: Corey Thompson; +Cc: Luke Diamand, git

cmtptr@gmail.com wrote on Fri, 06 Sep 2013 15:03 -0400:
> Finally, I claim success!  Unfortunately I did not try either of the OOM
> score or strace suggestions - sorry!  After spending so much time on
> this, I've gotten to the point that I'm more interested in getting it to
> work than in figuring out why the direct approach isn't working; it
> sounds like you're both pretty confident that git is working as it
> should, and I don't maintain the system I'm doing this on so I don't
> doubt that there might be some artificial limit or other quirk here that
> we just aren't seeing.
> 
> Anyway, what I found is that Pete's incremental method does work, I just
> have to know how to do it properly!  This is what I WAS doing to
> generate the error message I pasted several posts ago:
> 
> git p4 clone //path/to/branch@<begin>,<stage1>
> cd branch
> git p4 sync //path/to/branch@<stage2>
> # ERROR!
> # (I also tried //path/to/branch@<stage1+1>,<stage2>, same error)
> 
> Eventually what happened is that I downloaded the free 20-user p4d, set
> up a very small repository with only 4 changes, and started some old
> fashioned trial-and-error.  Here's what I should have been doing all
> along:
> 
> git p4 clone //path/to/branch@<begin>,<stage1>
> cd branch
> git p4 sync //path/to/branch@<begin>,<stage2>
> git p4 sync //path/to/branch@<begin>,<stage3>
> # and so on...
> 
> And syncing a few thousand changes every day over the course of the past
> week, my git repo is finally up to the Perforce HEAD.  So I suppose
> ultimately this was my own misunderstanding, partly because when you
> begin your range at the original first change number the output looks
> suspiciously like it's importing changes again that it's already
> imported.  Maybe this is all documented somewhere, and if it is I just
> failed to find it.
> 
> Thanks to both of you for all your help!

That you got it to work is the most important thing.  Amazing all
the effort you put into it; a lesser hacker would have walked
away much earlier.

The changes don't overlap.  If you give it a range that includes
changes already synced, git-p4 makes sure to start only at the
lowest change it has not yet seen.  I'll see if I can update the
docs somewhere.
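
In other words, with made-up change numbers:

    git p4 sync //depot/big@1000,2000    # imports 1000..2000
    git p4 sync //depot/big@1000,3000    # only 2001..3000 are new, so only they are imported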

		-- Pete

^ permalink raw reply	[flat|nested] 13+ messages in thread

Thread overview: 13+ messages
2013-08-23  1:12 git-p4 out of memory for very large repository Corey Thompson
2013-08-23  7:16 ` Luke Diamand
2013-08-23 11:48   ` Corey Thompson
2013-08-23 11:59     ` Corey Thompson
2013-08-23 19:42       ` Luke Diamand
2013-08-24  0:56         ` Corey Thompson
2013-08-25 15:50     ` Pete Wyckoff
2013-08-26 13:47       ` Corey Thompson
2013-08-28 15:41         ` Corey Thompson
2013-08-29 22:46           ` Pete Wyckoff
2013-09-02 19:42             ` Luke Diamand
2013-09-06 19:03               ` Corey Thompson
2013-09-07  8:19                 ` Pete Wyckoff
