* Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
@ 2012-03-29 20:53 Eric Bénard
  2012-03-29 22:03 ` Richard Purdie
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Bénard @ 2012-03-29 20:53 UTC (permalink / raw)
  To: openembedded-core

Hi,

I noticed that in from-scratch builds for qemuarm most of the time is
spent fetching sources, especially those fetched using git
(linux-yocto for example) and svn (gcc, eglibc & co).

To reduce the fetch time, would it make sense to:
- fetch gcc/glibc & co from the archive of a stable version and then
  apply patches on top of it (maybe patches stored in an archive
  fetched from OE's website and applied in bulk, or patches stored in OE)
- do the same thing for the linux-yocto kernel, or add a --reference
  option to the git fetcher so that we can provide a local tree as a
  reference?

Do you have other ideas (apart from using a local mirror) to optimize
the fetch time?

Thanks,
Eric




* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-29 20:53 Fetch time optimization (svn : gcc/eglibc - git : linux-yocto) Eric Bénard
@ 2012-03-29 22:03 ` Richard Purdie
  2012-03-30  1:03   ` Bruce Ashfield
  2012-03-30  8:50   ` Eric Bénard
  0 siblings, 2 replies; 20+ messages in thread
From: Richard Purdie @ 2012-03-29 22:03 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Thu, 2012-03-29 at 22:53 +0200, Eric Bénard wrote:
> I noticed in from scratch builds for qemuarm that the longest time is
> taken in fetching sources, especially those fetched using git
> (linux-yocto for example) & svn (gcc, eglibc & co).

Are you timing these as fetches from the source control systems or from
the mirror tarballs of the repositories? The tarballs should be
faster...

> To reduce the fetch time would that make sense to 
> - fetch gcc/glibc & co from the archive of a stable version and then
>   apply patches on top of it (maybe patches stored in an archive
>   fetched from oe's website and applied in bulk or patches stored in OE)

Unfortunately the patches tend to get unwieldy. The tarballs of the svn
repos on the mirror should be about equal in size to the upstream
archive in this case. 

> - do the same thing for the linux-yocto kernel or add a --reference
>   option to the git fetcher so that we can provide a local tree as a
>   reference ?

This is effectively how the repositories in DL_DIR are used. If you
place a tree in the right place there, it should reuse references...

> Do you have other ideas (appart from using a local mirror) to optimize
> the fetch time ?

I'd be interested firstly to understand if you're using the SCM directly
or using the mirror tarballs as that should make a big difference. In
the standard configuration it should be using mirror tarballs...

Cheers,

Richard





* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-29 22:03 ` Richard Purdie
@ 2012-03-30  1:03   ` Bruce Ashfield
  2012-03-30  6:44     ` Samuel Stirtzel
  2012-03-30  7:00     ` Martin Jansa
  2012-03-30  8:50   ` Eric Bénard
  1 sibling, 2 replies; 20+ messages in thread
From: Bruce Ashfield @ 2012-03-30  1:03 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Thu, Mar 29, 2012 at 6:03 PM, Richard Purdie
<richard.purdie@linuxfoundation.org> wrote:
> On Thu, 2012-03-29 at 22:53 +0200, Eric Bénard wrote:
>> I noticed in from scratch builds for qemuarm that the longest time is
>> taken in fetching sources, especially those fetched using git
>> (linux-yocto for example) & svn (gcc, eglibc & co).
>
> Are you timing these as fetches from the source control systems or from
> the mirror tarballs of the repositories. The tarballs should be
> faster...
>
>> To reduce the fetch time would that make sense to
>> - fetch gcc/glibc & co from the archive of a stable version and then
>>   apply patches on top of it (maybe patches stored in an archive
>>   fetched from oe's website and applied in bulk or patches stored in OE)
>
> Unfortunately the patches tend to get unwieldy. The tarballs of the svn
> repos on the mirror should be about equal in size to the upstream
> archive in this case.
>
>> - do the same thing for the linux-yocto kernel or add a --reference
>>   option to the git fetcher so that we can provide a local tree as a
>>   reference ?
>
> This is effectively how the repositories in DL_DIR are used. If you
> place a tree in the right place there, it should reuse references...

Agreed .. they definitely do here.

Richard probably recalls me asking for a --reference option several
years ago as well .. but in the end, at some point the initial fetch happens
and then the blobs are re-used. So setting up local mirrors, or pre-fetching
are options to make sure that the first download is primed and ready to
go. For most builds I do, any time fetching just happens in the background
and doesn't get in the way.

>
>> Do you have other ideas (appart from using a local mirror) to optimize
>> the fetch time ?
>
> I'd be interested firstly to understand if you're using the SCM directly
> or using the mirror tarballs as that should make a big difference. In
> the standard configuration it should be using mirror tarballs...

As would I, since there are some ideas, but they either break workflows,
don't follow best practices or compromise the completeness of the data.

Cheers,

Bruce

>
> Cheers,
>
> Richard



-- 
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end"




* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30  1:03   ` Bruce Ashfield
@ 2012-03-30  6:44     ` Samuel Stirtzel
  2012-03-30  9:21       ` Paul Eggleton
  2012-03-30  9:32       ` Richard Purdie
  2012-03-30  7:00     ` Martin Jansa
  1 sibling, 2 replies; 20+ messages in thread
From: Samuel Stirtzel @ 2012-03-30  6:44 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

2012/3/30 Bruce Ashfield <bruce.ashfield@gmail.com>:
> On Thu, Mar 29, 2012 at 6:03 PM, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
>> On Thu, 2012-03-29 at 22:53 +0200, Eric Bénard wrote:
>>> I noticed in from scratch builds for qemuarm that the longest time is
>>> taken in fetching sources, especially those fetched using git
>>> (linux-yocto for example) & svn (gcc, eglibc & co).
>>
>> Are you timing these as fetches from the source control systems or from
>> the mirror tarballs of the repositories. The tarballs should be
>> faster...
>>
>>> To reduce the fetch time would that make sense to
>>> - fetch gcc/glibc & co from the archive of a stable version and then
>>>   apply patches on top of it (maybe patches stored in an archive
>>>   fetched from oe's website and applied in bulk or patches stored in OE)
>>
>> Unfortunately the patches tend to get unwieldy. The tarballs of the svn
>> repos on the mirror should be about equal in size to the upstream
>> archive in this case.
>>
>>> - do the same thing for the linux-yocto kernel or add a --reference
>>>   option to the git fetcher so that we can provide a local tree as a
>>>   reference ?
>>
>> This is effectively how the repositories in DL_DIR are used. If you
>> place a tree in the right place there, it should reuse references...
>
> Agreed .. they definitely do here.
>
> Richard probably recalls me asking for a --reference option several
> years ago as well .. but in the end, at some point the initial fetch happens
> and then the blobs are re-used. So setting up local mirrors, or pre-fetching
> are options to make sure that the first download is primed and ready to
> go. For most builds I do, any time fetching just happens in the background
> and doesn't get in the way.
>
>>
>>> Do you have other ideas (appart from using a local mirror) to optimize
>>> the fetch time ?
>>
>> I'd be interested firstly to understand if you're using the SCM directly
>> or using the mirror tarballs as that should make a big difference. In
>> the standard configuration it should be using mirror tarballs...
>
> As would I, since there are some ideas, but they either break workflows,
> don't follow best practices or compromise the completeness of the data.
>
> Cheers,
>
> Bruce
>
>>
>> Cheers,
>>
>> Richard

Hi,
this might be a bit off-topic, but another idea would be to add a
separate threading mechanism for fetching.

The current threading helps to make optimal use of the CPU and memory,
but sometimes you have to wait for a download to finish.
Instead there could be a separate set of threads that only download
the sources and make optimal use of the bandwidth too.

This would also allow fetching files while the normal threads are busy
with configuring/building/packaging recipes.


The downside would be that it requires some sort of inter-process
communication.
Or it could be regulated with a simple check of whether the download
has finished.

How does this idea sound to you?


-- 
Regards
Samuel




* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30  1:03   ` Bruce Ashfield
  2012-03-30  6:44     ` Samuel Stirtzel
@ 2012-03-30  7:00     ` Martin Jansa
  2012-03-30 10:06       ` Richard Purdie
  1 sibling, 1 reply; 20+ messages in thread
From: Martin Jansa @ 2012-03-30  7:00 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer


On Thu, Mar 29, 2012 at 09:03:15PM -0400, Bruce Ashfield wrote:
> On Thu, Mar 29, 2012 at 6:03 PM, Richard Purdie
> <richard.purdie@linuxfoundation.org> wrote:
> > On Thu, 2012-03-29 at 22:53 +0200, Eric Bénard wrote:
> >> I noticed in from scratch builds for qemuarm that the longest time is
> >> taken in fetching sources, especially those fetched using git
> >> (linux-yocto for example) & svn (gcc, eglibc & co).
> >
> > Are you timing these as fetches from the source control systems or from
> > the mirror tarballs of the repositories. The tarballs should be
> > faster...
> >
> >> To reduce the fetch time would that make sense to
> >> - fetch gcc/glibc & co from the archive of a stable version and then
> >>   apply patches on top of it (maybe patches stored in an archive
> >>   fetched from oe's website and applied in bulk or patches stored in OE)
> >
> > Unfortunately the patches tend to get unwieldy. The tarballs of the svn
> > repos on the mirror should be about equal in size to the upstream
> > archive in this case.
> >
> >> - do the same thing for the linux-yocto kernel or add a --reference
> >>   option to the git fetcher so that we can provide a local tree as a
> >>   reference ?
> >
> > This is effectively how the repositories in DL_DIR are used. If you
> > place a tree in the right place there, it should reuse references...
> 
> Agreed .. they definitely do here.

What's the right place?

I guess the idea was to use --reference against e.g. some other kernel
recipe's source checkout.

And I guess that building linux-foo won't notice that there is e.g.
/OE/downloads/git2/gitorious.org.shr.linux.git
from which it could share a lot of objects using --reference.

Bob Ham (rah on #oe) said that he is working on some sort of support
for --reference in bitbake after I refused to add yet another
linux-bar recipe to meta-smartphone, but I'm not sure how he plans to
implement it so that it is useful and works out of the box in
different environments with different sources available in the
downloads dir.
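
As a rough illustration of what "the right place" might look like, the
sketch below guesses the clone directory under DL_DIR from a git URL.
The rule (host plus path, with "/" turned into ".", under git2/) is
only inferred from the paths quoted in this thread; it is not taken
from the fetcher's code, and the function name and the gitorious URL
are assumptions made for the example.

```python
from urllib.parse import urlparse


def guessed_clone_dir(dl_dir, url):
    """Guess where a bare clone of `url` would sit under `dl_dir`.

    Assumption: the directory name is host + path with '/' replaced
    by '.', as the paths quoted in this thread suggest.
    """
    parts = urlparse(url)
    name = (parts.netloc + parts.path).replace("/", ".")
    return f"{dl_dir}/git2/{name}"


# URL and resulting path as they appear in this thread
print(guessed_clone_dir("/home/ebenard/OE-CORE/build/downloads",
                        "git://git.yoctoproject.org/linux-yocto-3.2"))
# Hypothetical URL, reconstructed from the directory name above
print(guessed_clone_dir("/OE/downloads",
                        "git://gitorious.org/shr/linux.git"))
```

If that guess holds, a tree pre-seeded under that name would be what
the fetcher picks up, but a differently named kernel repository would
still not be found automatically.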

Cheers,

-- 
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com



* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-29 22:03 ` Richard Purdie
  2012-03-30  1:03   ` Bruce Ashfield
@ 2012-03-30  8:50   ` Eric Bénard
  2012-03-30 15:12     ` Richard Purdie
  1 sibling, 1 reply; 20+ messages in thread
From: Eric Bénard @ 2012-03-30  8:50 UTC (permalink / raw)
  To: openembedded-core

On Thu, 29 Mar 2012 23:03:13 +0100,
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:

> On Thu, 2012-03-29 at 22:53 +0200, Eric Bénard wrote:
> > I noticed in from scratch builds for qemuarm that the longest time is
> > taken in fetching sources, especially those fetched using git
> > (linux-yocto for example) & svn (gcc, eglibc & co).
> 
> Are you timing these as fetches from the source control systems or from
> the mirror tarballs of the repositories. The tarballs should be
> faster...
> 
The default configuration seems to fetch from the source control
systems, as I always see very long fetch times for
gcc/eglibc/linux-yocto (despite having a 2.2 MBytes/s DSL downlink).

> > To reduce the fetch time would that make sense to 
> > - fetch gcc/glibc & co from the archive of a stable version and then
> >   apply patches on top of it (maybe patches stored in an archive
> >   fetched from oe's website and applied in bulk or patches stored in OE)
> 
> Unfortunately the patches tend to get unwieldy. The tarballs of the svn
> repos on the mirror should be about equal in size to the upstream
> archive in this case. 
> 
I don't think it's a size problem; rather, fetching through svn or
git is far less efficient than http or ftp, especially from GNU's svn
server, which may be overloaded.
Moreover, in a pure OE context we have no use for all the source
history provided by svn or git, and it makes for a very large
download.

> > - do the same thing for the linux-yocto kernel or add a --reference
> >   option to the git fetcher so that we can provide a local tree as a
> >   reference ?
> 
> This is effectively how the repositories in DL_DIR are used. If you
> place a tree in the right place there, it should reuse references...
> 
> > Do you have other ideas (appart from using a local mirror) to optimize
> > the fetch time ?
> 
> I'd be interested firstly to understand if you're using the SCM directly
> or using the mirror tarballs as that should make a big difference. In
> the standard configuration it should be using mirror tarballs...
> 
That doesn't seem to be the case.
From a clean oe-core + bitbake clone:

. ./openembedded-core/oe-init-build-env
edit local.conf to select qemuarm and set BB_NUMBER_THREADS to 8
bitbake core-image-minimal -c fetchall

and then I see bitbake stall at around 209 or 214 tasks waiting, and
I see this in ps:
/home/ebenard/OE-CORE/build/tmp-eglibc/sysroots/x86_64-linux/usr/bin/git.real
clone --bare --mirror
git://git.yoctoproject.org/linux-yocto-3.2 /home/ebenard/OE-CORE/build/downloads/git2/git.yoctoproject.org.linux-yocto-3.2
and
svn co -r 184847
http://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch@184847

... and both are actually fetching at only around 200 KiB/s, which
takes quite a long time because (measured in another downloads dir)
the final trees are huge:
du -s git.yoctoproject.org.linux-yocto-3.2/
610824	git.yoctoproject.org.linux-yocto-3.2/
du -s gcc.gnu.org/
1602496	gcc.gnu.org/
du -s www.eglibc.org/
625048	www.eglibc.org
If I launch at the same time:
wget ftp://ftp.gnu.org/gnu/gcc/gcc-4.6.3/gcc-4.6.3.tar.bz2
I get a download speed close to 1MB/s, and the file to download is only
64MB, which would save bandwidth.
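
As a back-of-the-envelope check of these figures, the sketch below
turns the du sizes into fetch times at the observed rate. It assumes
du reports 1 KiB blocks (GNU du's default) and that the on-disk size
approximates the bytes transferred, which probably overstates the svn
case; the function and variable names are made up for illustration.

```python
# du -s sizes quoted above, assumed to be in 1 KiB blocks
tree_sizes_kib = {
    "linux-yocto-3.2 (git)": 610824,
    "gcc (svn)": 1602496,
    "eglibc (svn)": 625048,
}

SCM_RATE_KIB_S = 200        # observed git/svn fetch rate
TARBALL_SIZE_KIB = 64 * 1024  # gcc-4.6.3.tar.bz2, about 64MB
TARBALL_RATE_KIB_S = 1024   # observed ftp rate, about 1MB/s


def minutes_at(size_kib, rate_kib_s):
    """Transfer time in minutes for size_kib at rate_kib_s."""
    return size_kib / rate_kib_s / 60


for name, size in tree_sizes_kib.items():
    print(f"{name}: about {minutes_at(size, SCM_RATE_KIB_S):.0f} min")

tarball_min = minutes_at(TARBALL_SIZE_KIB, TARBALL_RATE_KIB_S)
print(f"gcc release tarball: about {tarball_min:.1f} min")
```

Under those assumptions the SCM fetches land in the range of one to
two hours each, against roughly a minute for the release tarball.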

Eric




* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30  6:44     ` Samuel Stirtzel
@ 2012-03-30  9:21       ` Paul Eggleton
  2012-03-30  9:32       ` Richard Purdie
  1 sibling, 0 replies; 20+ messages in thread
From: Paul Eggleton @ 2012-03-30  9:21 UTC (permalink / raw)
  To: Samuel Stirtzel; +Cc: openembedded-core

On Friday 30 March 2012 08:44:56 Samuel Stirtzel wrote:
> this might be a bit off-topic, but another idea would be to add a
> separate threading mechanism for fetching.
> 
> Current threading can help to use the CPU and memory load to it's optimum,
> but sometimes you have to wait for a download to finish..
> Instead there could be a separate set of threads that only download
> the sources and make optimal use of the bandwidth too.
> 
> This would also allow to fetch files when the normal threads are busy
> with configuring/building/packaging recipes.

What you're really suggesting here is a modified BitBake scheduler that 
understands that fetch tasks that require network bandwidth are different from 
other tasks such as compile ones which stress the CPU. It sounds like it might 
be worth investigating at least. FYI, BitBake's schedulers are pluggable and 
not particularly complicated (see bitbake/lib/bb/runqueue.py).
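
To make the idea concrete, here is a minimal standalone sketch of a
scheduler that counts fetch tasks and CPU-bound tasks against separate
limits. It is a toy model only: it does not use the classes in
runqueue.py, and the Task type, the "fetch"/"cpu" labels and the slot
parameters are assumptions made for the example.

```python
from collections import namedtuple

# kind is "fetch" (network-bound) or "cpu" (configure/compile/package)
Task = namedtuple("Task", "name kind")


def pick_runnable(ready, running, cpu_slots, fetch_slots):
    """Return the ready tasks that can start now.

    `ready` is assumed to hold only tasks whose dependencies are met,
    already sorted by priority. Fetch tasks do not consume CPU slots,
    but are capped separately so they cannot saturate the network.
    """
    cpu_busy = sum(1 for t in running if t.kind == "cpu")
    fetch_busy = sum(1 for t in running if t.kind == "fetch")
    started = []
    for task in ready:
        if task.kind == "fetch" and fetch_busy < fetch_slots:
            fetch_busy += 1
            started.append(task)
        elif task.kind == "cpu" and cpu_busy < cpu_slots:
            cpu_busy += 1
            started.append(task)
    return started


ready = [Task("gcc:do_fetch", "fetch"), Task("zlib:do_compile", "cpu"),
         Task("linux-yocto:do_fetch", "fetch"),
         Task("busybox:do_compile", "cpu")]
running = [Task("eglibc:do_compile", "cpu")]
print([t.name for t in pick_runnable(ready, running, 2, 1)])
```

With two CPU slots (one already busy) and one fetch slot, one fetch
and one compile start together, so a slow download no longer occupies
a slot that a compile could have used.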

Cheers,
Paul

-- 

Paul Eggleton
Intel Open Source Technology Centre




* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30  6:44     ` Samuel Stirtzel
  2012-03-30  9:21       ` Paul Eggleton
@ 2012-03-30  9:32       ` Richard Purdie
  2012-03-30 10:07         ` Samuel Stirtzel
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Purdie @ 2012-03-30  9:32 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Fri, 2012-03-30 at 08:44 +0200, Samuel Stirtzel wrote:
> this might be a bit off-topic, but another idea would be to add a
> separate threading mechanism for fetching.
> 
> Current threading can help to use the CPU and memory load to it's optimum,
> but sometimes you have to wait for a download to finish..
> Instead there could be a separate set of threads that only download
> the sources and make optimal use of the bandwidth too.
> 
> This would also allow to fetch files when the normal threads are busy
> with configuring/building/packaging recipes.
> 
> 
> The downside would be that it requires some sort of inter process
> communication.
> Or it could be regulated with a simple check if the download is finished..
> 
> How does this idea sound to you?

It's easier than you think to do this: bitbake has a pluggable scheduler
implementation, so you'd just have to write one which excludes "fetch"
operations from the total thread count.

Sadly this isn't really the place most people have a bottleneck in day
to day usage of the system. People have tried various algorithms for
enhancing the scheduler and as far as I know never found anything that
makes a significant difference, much to everyone's surprise :/.

Cheers,

Richard






* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30  7:00     ` Martin Jansa
@ 2012-03-30 10:06       ` Richard Purdie
  0 siblings, 0 replies; 20+ messages in thread
From: Richard Purdie @ 2012-03-30 10:06 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Fri, 2012-03-30 at 09:00 +0200, Martin Jansa wrote:
> On Thu, Mar 29, 2012 at 09:03:15PM -0400, Bruce Ashfield wrote:
> > On Thu, Mar 29, 2012 at 6:03 PM, Richard Purdie
> > <richard.purdie@linuxfoundation.org> wrote:
> > >> - do the same thing for the linux-yocto kernel or add a --reference
> > >>   option to the git fetcher so that we can provide a local tree as a
> > >>   reference ?
> > >
> > > This is effectively how the repositories in DL_DIR are used. If you
> > > place a tree in the right place there, it should reuse references...
> > 
> > Agreed .. they definitely do here.
> 
> What's right place?
> 
> I guess the idea was to use --reference for e.g. some other kernel recipe
> sources checkout.
> 
> And I guess that building linux-foo won't notice that there is e.g.
> /OE/downloads/git2/gitorious.org.shr.linux.git:
> from which it can share a lot of objects using --reference
> 
> Bob Ham (rah on #oe) said that he is working on some sort of support
> for --reference with bitbake after I've refused to add just another
> linux-bar recipe to meta-smartphone, but not sure how he plans to
> implement it to be usefull and working oob in different env with
> different sources available in downloads dir.

You could conceivably symlink all your different kernel directories
together within git2/. As far as I can tell, the fetcher simply wouldn't
care in most cases. The branch structures could get a little mangled I
guess and you'd not want to share the resulting mirror tarballs.

There is an argument for using one large shared container for all the
git objects as another way of solving this. I don't know how well git
supports that, but at the object level it's a non-issue at least.

Cheers,

Richard





* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30  9:32       ` Richard Purdie
@ 2012-03-30 10:07         ` Samuel Stirtzel
  2012-03-30 10:45           ` Richard Purdie
  0 siblings, 1 reply; 20+ messages in thread
From: Samuel Stirtzel @ 2012-03-30 10:07 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

2012/3/30 Richard Purdie <richard.purdie@linuxfoundation.org>:
> On Fri, 2012-03-30 at 08:44 +0200, Samuel Stirtzel wrote:
>> this might be a bit off-topic, but another idea would be to add a
>> separate threading mechanism for fetching.
>>
>> Current threading can help to use the CPU and memory load to it's optimum,
>> but sometimes you have to wait for a download to finish..
>> Instead there could be a separate set of threads that only download
>> the sources and make optimal use of the bandwidth too.
>>
>> This would also allow to fetch files when the normal threads are busy
>> with configuring/building/packaging recipes.
>>
>>
>> The downside would be that it requires some sort of inter process
>> communication.
>> Or it could be regulated with a simple check if the download is finished..
>>
>> How does this idea sound to you?
>
> Its easier than you think to do this, bitbake has a plugable scheduler
> implementation so you'd just have to write one which ignores "fetch"
> operations from the total thread count.
>
> Sadly this isn't really the place most people have a bottleneck in day
> to day usage of the system. People have tried various algorithms for
> enhancing the scheduler and as far as I know never found anything that
> makes a significant difference, much to everyone's surprise :/.
>
> Cheers,
>
> Richard

Of course this will only reduce the time for recipes when they are
built for the first time, or when the version/URL changes.
It is not that important, I agree, but it would improve the situation
for first-time users and new installations.


Example for 2 threads:
http://pastebin.com/kviwQZJ3
It is very likely that the current implementation also uses CPU and
network resources at the same time, but it can happen that a build
task has to wait for a download to finish, or vice versa.

Excluding fetch tasks from the thread count would only do half of the
job and _could_ cause network bottlenecks ;)
Fetching should be "independent" from the dependency chain.
E.g. downloads should not wait for dependencies to finish building,
but the download sequence should still match the dependency chain
sequence.



If it is really that easy, then I will look into it.

-- 
Regards
Samuel




* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 10:07         ` Samuel Stirtzel
@ 2012-03-30 10:45           ` Richard Purdie
  2012-04-02  8:15             ` Samuel Stirtzel
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Purdie @ 2012-03-30 10:45 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Fri, 2012-03-30 at 12:07 +0200, Samuel Stirtzel wrote:
> Of course this will only reduce the time of recipes if they are build
> for the first time,
> or when the version/URL changes.
> It is not that important, I agree,
> but it would improve the situation for first time users, or new installations.
> 
> Example for 2 threads:
> http://pastebin.com/kviwQZJ3
> It is very likely that the current situation also uses cpu and network
> resources at the same time,
> but it might occur that the build-task has to wait for a download to
> finish or vice versa.
> 
> Ignoring fetch tasks from the thread count would only do half of the
> job and _could_ cause network bottlenecks ;)
> Fetching should be "independent" from the dependency chain.

This simply isn't true, and there is also no benefit to splitting them
to be independent. The fetch tasks have dependencies just like any
other task (for example git:// urls depend on git-native being built
unless it's in ASSUME_PROVIDED).

> E.g.: it should not wait with the downloads for dependencies to finish building,
> the download sequence should still match the dependency chain sequence.

I'm afraid I don't understand what you mean. I think you will find that
if you exclude the "fetch" tasks from the normal "cpu" thread count you
will get the behaviour you are describing.

Cheers,

Richard





* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30  8:50   ` Eric Bénard
@ 2012-03-30 15:12     ` Richard Purdie
  2012-03-30 15:24       ` Eric Bénard
  0 siblings, 1 reply; 20+ messages in thread
From: Richard Purdie @ 2012-03-30 15:12 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Fri, 2012-03-30 at 10:50 +0200, Eric Bénard wrote:
> On Thu, 29 Mar 2012 23:03:13 +0100,
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
> 
> > On Thu, 2012-03-29 at 22:53 +0200, Eric Bénard wrote:
> > > I noticed in from scratch builds for qemuarm that the longest time is
> > > taken in fetching sources, especially those fetched using git
> > > (linux-yocto for example) & svn (gcc, eglibc & co).
> > 
> > Are you timing these as fetches from the source control systems or from
> > the mirror tarballs of the repositories. The tarballs should be
> > faster...
> > 
> the default configuration seems to fetch from source control systems
> as I always see very long time to fetch gcc/eglibc/linux-yocto
> (despite having a 2.2 MBytes/s downlink DSL line).

If you're hitting the SCMs I can understand the frustration.

> I don't think that's a size problem but that fetching through svn or
> git is far less efficient than http or ftp especially from gnu's svn
> which may be overloaded.

Agreed.

> Morover in a pure OE context we have no interest of all the source
> history provided by svn or git and that makes a very big volume to
> download.

The fetcher will deal with this well in the svn case. In the git case,
we made a decision to include history since it's not that much more
expensive. Both these assumptions are based on a working, up-to-date
mirror.

> . ./openembedded-core/oe-init-build-env 
> edit local.conf to select qemuarm & BBTHREAD to 8
> bitbake core-image-minimal -c fetchall
> 
> and then I see bitbake stops at around 209 or 214 tasks waiting and
> I see that in ps :
> /home/ebenard/OE-CORE/build/tmp-eglibc/sysroots/x86_64-linux/usr/bin/git.real
> clone --bare --mirror
> git://git.yoctoproject.org/linux-yocto-3.2 /home/ebenard/OE-CORE/build/downloads/git2/git.yoctoproject.org.linux-yocto-3.2
> and
> svn co -r 184847
> http://gcc.gnu.org/svn/gcc/branches/gcc-4_6-branch@184847
> 
> ... and both are actually fetching at only around 200 KiB/s which last
> for quite a long time as (from an other downloads dir) the final
> tree size are huge :
> du -s git.yoctoproject.org.linux-yocto-3.2/
> 610824	git.yoctoproject.org.linux-yocto-3.2/
> du -s gcc.gnu.org/
> 1602496	gcc.gnu.org/
> du -s www.eglibc.org/
> 625048	www.eglibc.org
> If I launch at the same time :
> wget ftp://ftp.gnu.org/gnu/gcc/gcc-4.6.3/gcc-4.6.3.tar.bz2
> I get a download speed close to 1MB/s and the file to download is only
> 64MB which would save bandwidth.

Try adding this to your configuration:

PREMIRRORS = "\
git://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \n \
svn://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \n"

and see if that helps the performance. We might consider making
this the default for OE-Core, although some people are nervous about
doing this...
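
To show roughly how such a setting is read, here is a simplified
sketch that parses a PREMIRRORS-style string into (pattern,
replacement) pairs and returns the mirror for the first pattern that
matches a URL. Matching the whole URL against one regex is a
simplification for the example, not the fetcher's real matching logic,
and the function names and the svn URL are assumptions.

```python
import re

# Same two entries as above; "\\n" is the literal backslash-n separator
PREMIRRORS = (
    "git://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \\n "
    "svn://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \\n"
)


def parse_mirrors(value):
    """Split a PREMIRRORS-style string into (pattern, replacement) pairs."""
    entries = value.replace("\\n", "\n").split("\n")
    return [tuple(entry.split()) for entry in entries if entry.strip()]


def first_mirror(url, mirrors):
    """Return the replacement of the first pattern matching url, or None."""
    for pattern, replacement in mirrors:
        if re.fullmatch(pattern, url):
            return replacement
    return None


mirrors = parse_mirrors(PREMIRRORS)
print(first_mirror("git://git.yoctoproject.org/linux-yocto-3.2", mirrors))
# Hypothetical svn URL, for illustration only
print(first_mirror("svn://gcc.gnu.org/svn/gcc/branches", mirrors))
print(first_mirror("http://example.com/foo.tar.gz", mirrors))
```

URLs that match no pattern (the last call) fall through to their
normal upstream location, so only git and svn sources are redirected.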

Cheers,

Richard





* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 15:12     ` Richard Purdie
@ 2012-03-30 15:24       ` Eric Bénard
  2012-03-30 15:49         ` Bruce Ashfield
  2012-03-30 16:02         ` Richard Purdie
  0 siblings, 2 replies; 20+ messages in thread
From: Eric Bénard @ 2012-03-30 15:24 UTC (permalink / raw)
  To: openembedded-core

On Fri, 30 Mar 2012 16:12:44 +0100,
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
> On Fri, 2012-03-30 at 10:50 +0200, Eric Bénard wrote:
> > the default configuration seems to fetch from source control systems
> > as I always see very long time to fetch gcc/eglibc/linux-yocto
> > (despite having a 2.2 MBytes/s downlink DSL line).
> 
> If you're hitting the SCMs I can understand the frustration.
> 
It's not frustration, it's feedback on the default behaviour. But I
agree that it could be frustrating for someone trying OE-core for the
first time ;-)

> Try adding this to your configuration:
> 
> PREMIRRORS = "\
> git://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \n \
> svn://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \n"
> 
> and see if that helps the performance. It might be we consider making
> this the default for OE-Core although some people are nervous about
> doing this...
> 
Sure, that will help: in my work setup I have my own mirrors
configured. But again, that's not what a new user will have, and here
I'm testing the plain default configuration to help find bugs or
things to improve in the release.

I think fetching from git or svn should not be the first choice in
recipes like gcc, eglibc, linux & co where we are based on a
stable released version: it doesn't bring real added value to the
user in an OE context and it wastes bandwidth (a tbz2 kernel is around
75MB, a git one is around 600MB).

Eric




* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 15:24       ` Eric Bénard
@ 2012-03-30 15:49         ` Bruce Ashfield
  2012-03-30 15:55           ` Eric Bénard
  2012-03-30 16:02         ` Richard Purdie
  1 sibling, 1 reply; 20+ messages in thread
From: Bruce Ashfield @ 2012-03-30 15:49 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Fri, Mar 30, 2012 at 11:24 AM, Eric Bénard <eric@eukrea.com> wrote:
> On Fri, 30 Mar 2012 16:12:44 +0100,
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>> On Fri, 2012-03-30 at 10:50 +0200, Eric Bénard wrote:
>> > the default configuration seems to fetch from source control systems
>> > as I always see very long time to fetch gcc/eglibc/linux-yocto
>> > (despite having a 2.2 MBytes/s downlink DSL line).
>>
>> If you're hitting the SCMs I can understand the frustration.
>>
> that's not a frustration, that's a feedback on the default
> behaviour. But I agree with you that could be a frustration for someone
> trying OE-core for the first time ;-)
>
>> Try adding this to your configuration:
>>
>> PREMIRRORS = "\
>> git://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \n \
>> svn://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \n"
>>
>> and see if that helps the performance. It might be we consider making
>> this the default for OE-Core although some people are nervous about
>> doing this...
>>
> sure that will help : in my work setup I have my own mirrors configured
> but here again, that's not what a new user will have and in that
> case, I'm testing the plain default configuration to help finding bugs
> or things to improve the release.
>
> I think fetching from git or svn should not be the first thing to do in
> recipes like gcc, eglibc, linux & co where we are based on a
> stable released version : this doesn't bring real added value to the
> user in OE context and this wastes bandwidth (a tbz2 kernel is around

s/user/developer/ and there is value in having git history. I know we'd never do
without it in our shop.

I suggested shallow clones and some other options to Richard a few weeks
ago, or some other hybrid models. They all vary in terms of nastiness and
have some good and bad points.

But from a kernel guy's point of view, you definitely want to work inside git,
but I can see that from a non-kernel point of view, build and boot is all that
really matters.

Cheers,

Bruce

> 75MB, a git one is around 600MB).
>
> Eric
>
> _______________________________________________
> Openembedded-core mailing list
> Openembedded-core@lists.openembedded.org
> http://lists.linuxtogo.org/cgi-bin/mailman/listinfo/openembedded-core



-- 
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end"



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 15:49         ` Bruce Ashfield
@ 2012-03-30 15:55           ` Eric Bénard
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Bénard @ 2012-03-30 15:55 UTC (permalink / raw)
  To: openembedded-core

On Fri, 30 Mar 2012 11:49:33 -0400,
Bruce Ashfield <bruce.ashfield@gmail.com> wrote:
> On Fri, Mar 30, 2012 at 11:24 AM, Eric Bénard <eric@eukrea.com> wrote:
> > I think fetching from git or svn should not be the first thing to do in
> > recipes like gcc, eglibc, linux & co where we are based on a
> > stable released version : this doesn't bring real added value to the
> > user in OE context and this wastes bandwidth (a tbz2 kernel is around
> 
> s/user/developer/ and there is value in having git history. I know we'd never do
> without it in our shop.
> 
> I suggested shallow clones and some other options to Richard a few weeks
> ago, or some other hybrid models. They all vary in terms of nastiness and
> have some good and bad points.
> 
> But from a kernel guy's point of view, you definitely want to work inside git,
> but

Do you mean you work in the git tree of linux-yocto *directly
inside* OE's sources / downloads ?

Eric



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 15:24       ` Eric Bénard
  2012-03-30 15:49         ` Bruce Ashfield
@ 2012-03-30 16:02         ` Richard Purdie
  2012-03-30 16:17           ` Eric Bénard
  1 sibling, 1 reply; 20+ messages in thread
From: Richard Purdie @ 2012-03-30 16:02 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Fri, 2012-03-30 at 17:24 +0200, Eric Bénard wrote:
> On Fri, 30 Mar 2012 16:12:44 +0100,
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
> > On Fri, 2012-03-30 at 10:50 +0200, Eric Bénard wrote:
> > > the default configuration seems to fetch from source control systems
> > > as I always see very long time to fetch gcc/eglibc/linux-yocto
> > > (despite having a 2.2 MBytes/s downlink DSL line).
> > 
> > If you're hitting the SCMs I can understand the frustration.
> > 
> that's not a frustration, that's a feedback on the default
> behaviour. But I agree with you that could be a frustration for someone
> trying OE-core for the first time ;-)
> 
> > Try adding this to your configuration:
> > 
> > PREMIRRORS = "\
> > git://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \n \
> > svn://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \n"
> > 
> > and see if that helps the performance. It might be we consider making
> > this the default for OE-Core although some people are nervous about
> > doing this...
> > 
> sure that will help : in my work setup I have my own mirrors configured
> but here again, that's not what a new user will have and in that
> case, I'm testing the plain default configuration to help finding bugs
> or things to improve the release.
> 
> I think fetching from git or svn should not be the first thing to do in
> recipes like gcc, eglibc, linux & co where we are based on a
> stable released version : this doesn't bring real added value to the
> user in OE context and this wastes bandwidth (a tbz2 kernel is around
> 75MB, a git one is around 600MB).

We've gone around in circles on this. We did use tarballs for gcc,
people complained. We switched to svn, you're not happy and probably
others aren't. We can't win.

Adding the PREMIRRORS makes the situation better, though I agree it's not
perfect. The original question was how we can speed it up, and this is an
easy way to do so for the default user case without changing anything
major.
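
To illustrate how such a setting is meant to work, here is a simplified
sketch: each PREMIRRORS entry is treated as a pattern/replacement pair, and
a source URL that matches a pattern is tried against the mirror first. This
is a toy model under stated assumptions, not BitBake's actual fetcher code;
the helper names and the example URLs are hypothetical.

```python
import re

# The PREMIRRORS value from this thread; entries are separated by a literal "\n".
PREMIRRORS = (
    "git://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \\n "
    "svn://.*/.*   http://downloads.yoctoproject.org/mirror/sources/ \\n"
)


def parse_mirrors(value):
    """Split a mirror string into (pattern, replacement) pairs."""
    pairs = []
    for entry in value.split("\\n"):
        fields = entry.split()
        if len(fields) == 2:
            pairs.append((fields[0], fields[1]))
    return pairs


def premirror_for(url, mirrors):
    """Return the mirror base of the first pattern matching url, else None."""
    for pattern, replacement in mirrors:
        if re.fullmatch(pattern, url):
            return replacement
    return None


if __name__ == "__main__":
    mirrors = parse_mirrors(PREMIRRORS)
    # Hypothetical source URLs, used only to exercise the matching.
    for url in ("git://git.example.org/linux-yocto.git",
                "http://ftp.example.org/gcc.tar.bz2"):
        print(url, "->", premirror_for(url, mirrors))
```

In this model a git:// or svn:// URL is redirected to the mirror, while a
plain http:// tarball URL matches no entry and is fetched from its
original location.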

I hear what you're saying on the tarball vs. SCM issue, but using
tarballs does break use cases that some users rely on; the opposite is not
true.

It's also ultimately down to the maintainers of recipes. The gcc recipe is
more maintainable the way it's set up now compared to large numbers of
patches (which took an age to apply) and doesn't have much of an
additional bandwidth cost.

The linux-yocto kernel recipe heavily uses the SCM to do things, so
whilst it does have a higher download cost, it also adds value and is
ultimately a maintainer's choice too.

So whilst I hear what you're saying, I don't think we can change
anything other than the PREMIRROR...

Cheers,

Richard






^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 16:02         ` Richard Purdie
@ 2012-03-30 16:17           ` Eric Bénard
  2012-03-30 17:33             ` Bruce Ashfield
  0 siblings, 1 reply; 20+ messages in thread
From: Eric Bénard @ 2012-03-30 16:17 UTC (permalink / raw)
  To: openembedded-core

On Fri, 30 Mar 2012 17:02:24 +0100,
Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
> The linux-yocto kernel recipe heavily uses the SCM to do things so
> whilst it does have a higher download cost, it also adds value and is
> ultimately a maintainer's choice too.
> 
OK now that I've given a closer look at the linux-yocto recipes &
bbclass I understand better why you need it in that recipe and that
this recipe is heavily based on git's features.

> So whilst I hear what you're saying, I don't think we can change
> anything other than the PREMIRROR...
> 
Then maybe, for new users testing OE, having PREMIRRORS set in the
default configuration would be a great thing, so that they don't believe
OE is a big slow beast just because they have to wait hours for git or
svn to fetch sources.

Eric



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 16:17           ` Eric Bénard
@ 2012-03-30 17:33             ` Bruce Ashfield
  2012-03-30 18:36               ` Eric Bénard
  0 siblings, 1 reply; 20+ messages in thread
From: Bruce Ashfield @ 2012-03-30 17:33 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

On Fri, Mar 30, 2012 at 12:17 PM, Eric Bénard <eric@eukrea.com> wrote:
> On Fri, 30 Mar 2012 17:02:24 +0100,
> Richard Purdie <richard.purdie@linuxfoundation.org> wrote:
>> The linux-yocto kernel recipe heavily uses the SCM to do things so
>> whilst it does have a higher download cost, it also adds value and is
>> ultimately a maintainer's choice too.
>>
> OK now that I've given a closer look at the linux-yocto recipes &
> bbclass I understand better why you need it in that recipe and that
> this recipe is heavily based on git's features.

There are alternatives that I'm going to be exploring going forward, just
nothing that we can bring in during the stabilization cycle. The recipes
manipulate git and use it to construct what you build; they don't absolutely
require a full git history, so there are some potential savings to be had.

It just obviously limits flexibility if a derived recipe wants to merge branches
and histories to construct what is built. So having a simple/shallow history for
basic builds while not breaking more complex cases probably hits the sweet
spot.

Cheers,

Bruce

>
>> So whilst I hear what you're saying, I don't think we can change
>> anything other than the PREMIRROR...
>>
> then maybe for the new users testing OE, having PREMIRRORs set in the
> default configuration would be a great thing so that they don't believe
> OE is a big slow beast just because they have to wait hours for git or
> svn to fetch sources.
>
> Eric



-- 
"Thou shalt not follow the NULL pointer, for chaos and madness await
thee at its end"



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 17:33             ` Bruce Ashfield
@ 2012-03-30 18:36               ` Eric Bénard
  0 siblings, 0 replies; 20+ messages in thread
From: Eric Bénard @ 2012-03-30 18:36 UTC (permalink / raw)
  To: openembedded-core

On Fri, 30 Mar 2012 13:33:05 -0400,
Bruce Ashfield <bruce.ashfield@gmail.com> wrote:
> There are alternatives that I'm going to be exploring going forward, just
> nothing that we can bring in during the stabilization cycle. The recipes
> manipulate git and use it to construct what you build, they don't absolutely
> require a full git history, so there are some potential savings to be had.
> 
> It just obviously limits flexibility if a derived recipe wants to merge branches
> and histories to construct what is built. So having a simple/shallow history for
> basic builds while not breaking more complex cases probably hits the sweet
> spot.
> 
OK, in the end all the slow download problems I ran into while testing oe-core
& qemuarm from scratch were due to a problem on the server hosting
yocto's git and mirror services (so setting PREMIRRORS to use
yocto's mirror didn't improve the situation).
Now that this problem is fixed on the yocto server, the time to download
the linux-yocto kernel went from 90-120 minutes down to 20-30 minutes,
which seems more reasonable!
So there really was a problem, but I was not looking in the right
direction to fix it :-(

Eric



^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: Fetch time optimization (svn : gcc/eglibc - git : linux-yocto)
  2012-03-30 10:45           ` Richard Purdie
@ 2012-04-02  8:15             ` Samuel Stirtzel
  0 siblings, 0 replies; 20+ messages in thread
From: Samuel Stirtzel @ 2012-04-02  8:15 UTC (permalink / raw)
  To: Patches and discussions about the oe-core layer

2012/3/30 Richard Purdie <richard.purdie@linuxfoundation.org>:
> On Fri, 2012-03-30 at 12:07 +0200, Samuel Stirtzel wrote:
>> Of course this will only reduce the time of recipes if they are built
>> for the first time,
>> or when the version/URL changes.
>> It is not that important, I agree,
>> but it would improve the situation for first time users, or new installations.
>>
>> Example for 2 threads:
>> http://pastebin.com/kviwQZJ3
>> It is very likely that the current situation also uses cpu and network
>> resources at the same time,
>> but it might occur that the build-task has to wait for a download to
>> finish or vice versa.
>>
>> Ignoring fetch tasks from the thread count would only do half of the
>> job and _could_ cause network bottlenecks ;)
>> Fetching should be "independent" from the dependency chain.
>
> This simply isn't true and there is also no benefit to splitting them to
> be independent. The fetch tasks have dependencies just like any other
> task (for example git:// urls depend on git-native being built unless
> it's in ASSUME_PROVIDED).
You are right, my mistake.
Adding some line like PARALLEL_FETCH to the config will do the rest.

>
>> E.g.: it should not wait with the downloads for dependencies to finish building,
>> the download sequence should still match the dependency chain sequence.
>
> I'm afraid I don't understand what you mean. I think you will find that
> if you exclude the "fetch" tasks from the normal "cpu" thread count you
> will get the behaviour you are describing.

I was erroneously assuming that the download only starts after all
dependencies have finished building,
but of course it only looked that way because the threads were blocked by
the build tasks.
So yes, the method you mentioned will work.
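
The effect can be sketched with a toy scheduler: when fetch tasks share the
thread pool with build tasks they block each other, whereas counting them
separately lets a build start while downloads are still running. This is
only an illustrative model, not BitBake's runqueue; the task names,
durations and slot counts are made up, and it assumes the dependency graph
has no cycles and at least one slot per pool.

```python
def makespan(tasks, cpu_slots, fetch_slots=None):
    """Simulate a greedy scheduler and return the total elapsed time.

    tasks: dict name -> (kind, duration, deps), kind is "fetch" or "cpu".
    fetch_slots=None means fetch tasks compete for the cpu_slots pool.
    """
    done, running, now = set(), {}, 0  # running maps name -> finish time
    while len(done) < len(tasks):
        # Start every task whose dependencies are done and that has a free slot.
        for name, (kind, duration, deps) in tasks.items():
            if name in done or name in running:
                continue
            if not all(dep in done for dep in deps):
                continue
            if fetch_slots is None:
                free = cpu_slots - len(running)
            else:
                limit = fetch_slots if kind == "fetch" else cpu_slots
                used = sum(1 for r in running if tasks[r][0] == kind)
                free = limit - used
            if free > 0:
                running[name] = now + duration
        # Jump to the next completion and retire everything ending then.
        now = min(running.values())
        for name in [n for n, end in running.items() if end == now]:
            del running[name]
            done.add(name)
    return now


# Hypothetical workload: two fetch/build pairs plus one independent build.
TASKS = {
    "fetch_a": ("fetch", 2, []),
    "fetch_b": ("fetch", 6, []),
    "build_x": ("cpu", 6, []),
    "build_a": ("cpu", 6, ["fetch_a"]),
    "build_b": ("cpu", 2, ["fetch_b"]),
}

if __name__ == "__main__":
    print("shared pool, 2 threads:", makespan(TASKS, cpu_slots=2))
    print("fetch counted separately:", makespan(TASKS, cpu_slots=2, fetch_slots=2))
```

In the shared-pool run the independent build has to wait for a download to
free a thread; with fetches counted separately it starts immediately, and
the total time of this made-up workload is shorter.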
>
> Cheers,
>
> Richard
>



-- 
Regards
Samuel



^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2012-04-02  8:25 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
2012-03-29 20:53 Fetch time optimization (svn : gcc/eglibc - git : linux-yocto) Eric Bénard
2012-03-29 22:03 ` Richard Purdie
2012-03-30  1:03   ` Bruce Ashfield
2012-03-30  6:44     ` Samuel Stirtzel
2012-03-30  9:21       ` Paul Eggleton
2012-03-30  9:32       ` Richard Purdie
2012-03-30 10:07         ` Samuel Stirtzel
2012-03-30 10:45           ` Richard Purdie
2012-04-02  8:15             ` Samuel Stirtzel
2012-03-30  7:00     ` Martin Jansa
2012-03-30 10:06       ` Richard Purdie
2012-03-30  8:50   ` Eric Bénard
2012-03-30 15:12     ` Richard Purdie
2012-03-30 15:24       ` Eric Bénard
2012-03-30 15:49         ` Bruce Ashfield
2012-03-30 15:55           ` Eric Bénard
2012-03-30 16:02         ` Richard Purdie
2012-03-30 16:17           ` Eric Bénard
2012-03-30 17:33             ` Bruce Ashfield
2012-03-30 18:36               ` Eric Bénard
