* Build time data
@ 2012-04-11 20:42 Chris Tapp
  2012-04-11 21:19 ` Autif Khan
  ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread

From: Chris Tapp @ 2012-04-11 20:42 UTC (permalink / raw)
To: Yocto Project

Is there a page somewhere that gives a rough idea of how quickly a full build runs on various systems?

I need a faster build platform, but want to get a reasonable price / performance balance ;-)

I'm looking at something like an i7-2700K but am not yet tied...

Chris Tapp

opensource@keylevel.com
www.keylevel.com

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-11 20:42 Build time data Chris Tapp
@ 2012-04-11 21:19 ` Autif Khan
  2012-04-11 21:38 ` Bob Cochran
  2012-04-12  0:30 ` Darren Hart
  2 siblings, 0 replies; 36+ messages in thread

From: Autif Khan @ 2012-04-11 21:19 UTC (permalink / raw)
To: Chris Tapp; +Cc: Yocto Project

> Is there a page somewhere that gives a rough idea of how quickly a full build runs on various systems?

I don't think there is a page anywhere. This is as rough as it can get on the two machines I have, not including the time it takes to download the source files.

HP 2.8GHz Core i7 dual-core hyper-threaded machine with a 5400 rpm disk:

BB_NUM_THREADS = "8"
PARALLEL_MAKE = "8"

core-image-minimal   2.5 hours
core-image-sato      5 hours
core-image-sdk       8-10 hours

A build machine - Core i7-3960X 3.3 GHz - 6 cores hyper-threaded, with an SSD for build output and a 7200 rpm HDD claiming 6.0 Gbps for downloads, poky, whatever:

BB_NUM_THREADS = "24"
PARALLEL_MAKE = "24"

core-image-minimal   27 minutes
core-image-sato      58-62 minutes
core-image-sdk       110-120 minutes

The OS was always Ubuntu 11.10 - Xubuntu on the HP laptop, Server on the build machine.

> I need a faster build platform, but want to get a reasonable price / performance balance ;-)
>
> I'm looking at something like an i7-2700K but am not yet tied...

The build machine cost about $3500 or so from Tiger Direct/Newegg.

It was well worth it - instead of doing nightly builds, I can now do a clean build in under one hour.

^ permalink raw reply	[flat|nested] 36+ messages in thread
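[Editor's note: the BB_NUM_THREADS and PARALLEL_MAKE values quoted in these messages live in conf/local.conf of the build directory. A minimal sketch for the 8-thread laptop above; note that the documented form of PARALLEL_MAKE passes make's -j flag, whereas the message above shows a bare number:

```
# conf/local.conf -- parallelism settings from the message above.
# Rule of thumb: roughly one to two BitBake threads per hardware thread.
BB_NUM_THREADS = "8"
# PARALLEL_MAKE holds options handed to make; "-j 8" is the documented form.
PARALLEL_MAKE = "-j 8"
```

Both values should scale with core count, as the 24-thread build machine above illustrates.]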
* Re: Build time data
  2012-04-11 20:42 Build time data Chris Tapp
  2012-04-11 21:19 ` Autif Khan
@ 2012-04-11 21:38 ` Bob Cochran
  2012-04-12  0:30 ` Darren Hart
  2 siblings, 0 replies; 36+ messages in thread

From: Bob Cochran @ 2012-04-11 21:38 UTC (permalink / raw)
To: yocto

On 04/11/2012 04:42 PM, Chris Tapp wrote:
> Is there a page somewhere that gives a rough idea of how quickly a full build runs on various systems?
>
> I need a faster build platform, but want to get a reasonable price / performance balance ;-)
>
> I'm looking at something like an i7-2700K but am not yet tied...
>
> Chris Tapp
>
> opensource@keylevel.com
> www.keylevel.com

I haven't seen one, but it would be great to have this on the wiki where everyone could post what they're seeing & using. Maybe the autobuilder has some useful statistics (http://autobuilder.yoctoproject.org:8010/)? Of course, you'll have to be careful to determine whether anything else was running at the time of the build.

On a related note, I have been wondering whether I would get the bang for the buck with an SSD for my build machines. I would guess that building embedded Linux images isn't a typical use pattern for an SSD. I wonder if the long write & erase durations of flash technology would show their ugly face during a poky build. I would think that the embedded micro inside the SSD managing the writes might get taxed to the limit trying to slice the data.

I would appreciate anyone's experience with SSDs on build machines.

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-11 20:42 Build time data Chris Tapp
  2012-04-11 21:19 ` Autif Khan
  2012-04-11 21:38 ` Bob Cochran
@ 2012-04-12  0:30 ` Darren Hart
  2012-04-12  0:43 ` Osier-mixon, Jeffrey
  ` (4 more replies)
  2 siblings, 5 replies; 36+ messages in thread

From: Darren Hart @ 2012-04-12 0:30 UTC (permalink / raw)
To: Chris Tapp; +Cc: Yocto Project

On 04/11/2012 01:42 PM, Chris Tapp wrote:
> Is there a page somewhere that gives a rough idea of how quickly a full build runs on various systems?
>
> I need a faster build platform, but want to get a reasonable price / performance balance ;-)
>
> I'm looking at something like an i7-2700K but am not yet tied...

We really do need to get some pages up on this as it comes up a lot.

Currently Yocto Project builds scale well up to about 12 cores, so the first step is to get as many cores as you can. Sacrifice some clock speed for cores if you have to. If you can do dual-socket, do it. If not, try for a six-core.

Next up is storage. We read and write a LOT of data. SSDs are one way to go, but we've been known to chew through them, and they aren't priced as consumables. You can get about 66% of the performance of a single SSD with a pair of good-quality SATA2 or better drives configured in RAID0 (no redundancy). Ideally, you would have your OS and sources on an SSD and use a RAID0 array to build on. This data is all recreatable, so it's "OK" if you lose a disk and therefore ALL of your build data.

Now RAM: you will want about 2 GB of RAM per core, with a minimum of 4 GB.

Finally, software. Be sure to run a "server" kernel, which is optimized for throughput as opposed to interactivity (like desktop kernels). This implies CONFIG_PREEMPT_NONE=y. You'll want a 64-bit kernel to avoid the performance penalty inherent in 32-bit PAE kernels - and you will want lots of memory.
You can save some IO by mounting your its-ok-if-i-lose-all-my-data build partition as follows:

/dev/md0   /build   ext4   noauto,noatime,nodiratime,commit=6000

You can also drop the journal when you format it. Just don't power off your machine without properly shutting down!

That should get you some pretty good build times.

I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2 Intel SSD, and two Seagate Barracudas in a RAID0 array for my /build partition. I run a headless Ubuntu 11.10 (x86_64) installation with the 3.0.0-16-server kernel. I can build core-image-minimal in < 30 minutes and core-image-sato in < 50 minutes from scratch.

Hopefully that gives you some ideas to get started.

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

^ permalink raw reply	[flat|nested] 36+ messages in thread
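[Editor's note: Darren's fstab line and "drop the journal" advice translate to roughly the following. This is a sketch only; /dev/md0 and /build are his example names, and the mkfs command destroys any data on the target device:

```
# Format the RAID0 array without a journal -- the build data is all
# recreatable, so crash consistency is being traded away for IO.
mkfs.ext4 -O ^has_journal -L build /dev/md0

# /etc/fstab entry: no atime updates, and a very long commit interval
# (6000 s) so the kernel rarely flushes dirty data to disk mid-build.
# /dev/md0  /build  ext4  noauto,noatime,nodiratime,commit=6000  0  0

mount /build
```

With noauto, the partition must be mounted explicitly after boot, which fits a partition you are willing to reformat after a crash.]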
* Re: Build time data
  2012-04-12  0:30 ` Darren Hart
@ 2012-04-12  0:43 ` Osier-mixon, Jeffrey
  2012-04-12  4:39 ` Bob Cochran
  ` (3 subsequent siblings)
  4 siblings, 0 replies; 36+ messages in thread

From: Osier-mixon, Jeffrey @ 2012-04-12 0:43 UTC (permalink / raw)
To: Darren Hart; +Cc: Yocto Project

Excellent topic for a wiki page.

On Wed, Apr 11, 2012 at 5:30 PM, Darren Hart <dvhart@linux.intel.com> wrote:
> On 04/11/2012 01:42 PM, Chris Tapp wrote:
>> Is there a page somewhere that gives a rough idea of how quickly a full build runs on various systems?
>>
>> I need a faster build platform, but want to get a reasonable price / performance balance ;-)
>>
>> I'm looking at something like an i7-2700K but am not yet tied...
>
> We really do need to get some pages up on this as it comes up a lot.
>
> Currently Yocto Project builds scale well up to about 12 cores, so first step is to get as many cores as you can. Sacrifice some speed for cores if you have to. If you can do dual-socket, do it. If not, try for a six core.
>
> Next up is storage. We read and write a LOT of data. SSDs are one way to go, but we've been known to chew through them and they aren't priced as consumables. You can get about 66% of the performance of a single SSD with a pair of good quality SATA2 or better drives configured in RAID0 (no redundancy). Ideally, you would have your OS and sources on an SSD and use a RAID0 array to build on. This data is all recreatable, so it's "OK" if you lose a disk and therefore ALL of your build data.
>
> Now RAM, you will want about 2 GB of RAM per core, with a minimum of 4GB.
>
> Finally, software. Be sure to run a "server" kernel which is optimized for throughput as opposed to interactivity (like desktop kernels). This implies CONFIG_PREEMPT_NONE=y. You'll want a 64-bit kernel to avoid the performance penalty inherent with 32-bit PAE kernels - and you will want lots of memory.
>
> You can save some IO by mounting your its-ok-if-i-lose-all-my-data build partition as follows:
>
> /dev/md0   /build   ext4   noauto,noatime,nodiratime,commit=6000
>
> As well as drop the journal from it when you format it. Just don't power off your machine without properly shutting down!
>
> That should get you some pretty good build times.
>
> I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build partition. I run a headless Ubuntu 11.10 (x86_64) installation running the 3.0.0-16-server kernel. I can build core-image-minimal in < 30 minutes and core-image-sato in < 50 minutes from scratch.
>
> Hopefully that gives you some ideas to get started.
>
> --
> Darren Hart
> Intel Open Source Technology Center
> Yocto Project - Linux Kernel
> _______________________________________________
> yocto mailing list
> yocto@yoctoproject.org
> https://lists.yoctoproject.org/listinfo/yocto

--
Jeff Osier-Mixon http://jefro.net/blog
Yocto Project Community Manager @Intel http://yoctoproject.org

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-12  0:30 ` Darren Hart
  2012-04-12  0:43 ` Osier-mixon, Jeffrey
@ 2012-04-12  4:39 ` Bob Cochran
  2012-04-12  7:10 ` Darren Hart
  2012-04-12  7:35 ` Joshua Immanuel
  ` (2 subsequent siblings)
  4 siblings, 1 reply; 36+ messages in thread

From: Bob Cochran @ 2012-04-12 4:39 UTC (permalink / raw)
To: Darren Hart; +Cc: Yocto Project

On 04/11/2012 08:30 PM, Darren Hart wrote:
> SSDs are one way to
> go, but we've been known to chew through them and they aren't priced as
> consumables.

Hi Darren,

Could you please elaborate on "been known to chew through them"?

Are you running into an upper limit on write / erase cycles? Are you encountering hard (or soft) failures?

Thanks,

Bob

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-12  4:39 ` Bob Cochran
@ 2012-04-12  7:10 ` Darren Hart
  0 siblings, 0 replies; 36+ messages in thread

From: Darren Hart @ 2012-04-12 7:10 UTC (permalink / raw)
To: Bob Cochran; +Cc: Yocto Project

On 04/11/2012 09:39 PM, Bob Cochran wrote:
> On 04/11/2012 08:30 PM, Darren Hart wrote:
>> SSDs are one way to
>> go, but we've been known to chew through them and they aren't priced as
>> consumables.
>
> Hi Darren,
>
> Could you please elaborate on "been known to chew through them"?
>
> Are you running into an upper limit on write / erase cycles? Are you
> encountering hard (or soft) failures?

Some have reported early physical disk failure. Due to the cost of SSDs, not a lot of people seem to be trying it out. I *believe* the current generation of SSDs would perform admirably, but I haven't tested that.

I know Deny builds with SSDs, perhaps he would care to comment?

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-12  0:30 ` Darren Hart
  2012-04-12  0:43 ` Osier-mixon, Jeffrey
  2012-04-12  4:39 ` Bob Cochran
@ 2012-04-12  7:35 ` Joshua Immanuel
  2012-04-12  8:00 ` Martin Jansa
  2012-04-12 14:08 ` Björn Stenberg
  2012-04-13  9:56 ` Tomas Frydrych
  4 siblings, 1 reply; 36+ messages in thread

From: Joshua Immanuel @ 2012-04-12 7:35 UTC (permalink / raw)
To: Darren Hart; +Cc: Yocto Project

Darren,

On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
> I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2
> Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build
> partition. I run a headless Ubuntu 11.10 (x86_64) installation running
> the 3.0.0-16-server kernel. I can build core-image-minimal in < 30
> minutes and core-image-sato in < 50 minutes from scratch.

wow. Can I get a shell? :D

--
Joshua Immanuel
HiPro IT Solutions Private Limited
http://hipro.co.in

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-12  7:35 ` Joshua Immanuel
@ 2012-04-12  8:00 ` Martin Jansa
  2012-04-12  9:36 ` Joshua Immanuel
  2012-04-12 14:12 ` Darren Hart
  0 siblings, 2 replies; 36+ messages in thread

From: Martin Jansa @ 2012-04-12 8:00 UTC (permalink / raw)
To: Joshua Immanuel; +Cc: Yocto Project, Darren Hart

On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:
> Darren,
>
> On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
> > I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2
> > Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build
> > partition. I run a headless Ubuntu 11.10 (x86_64) installation running
> > the 3.0.0-16-server kernel. I can build core-image-minimal in < 30
> > minutes and core-image-sato in < 50 minutes from scratch.

Why not use that much RAM for WORKDIR in tmpfs? I bought 16GB just to be able to do my builds in tmpfs and keep only more permanent data on RAID.

Cheers,

--
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

^ permalink raw reply	[flat|nested] 36+ messages in thread
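[Editor's note: Martin's setup — build output in tmpfs, permanent data on RAID — can be sketched as follows. The mount point and size are illustrative, and relocating the whole build output via TMPDIR in local.conf is one common way to get WORKDIR onto the tmpfs (Martin's exact configuration is not shown in the thread):

```
# Mount a tmpfs for scratch build output (on a 16GB RAM machine,
# leave headroom for the compilers themselves).
mount -t tmpfs -o size=12G tmpfs /mnt/build-tmpfs

# conf/local.conf: point the build output at the tmpfs. WORKDIR lives
# under TMPDIR by default, so this puts the per-recipe work there too.
# TMPDIR = "/mnt/build-tmpfs/tmp"
```

As Martin notes later in the thread, the catch is capacity: the tmpfs contents vanish on reboot and large recipes can fill it, forcing builds to be done in stages.]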
* Re: Build time data
  2012-04-12  8:00 ` Martin Jansa
@ 2012-04-12  9:36 ` Joshua Immanuel
  2012-04-12 14:12 ` Darren Hart
  1 sibling, 0 replies; 36+ messages in thread

From: Joshua Immanuel @ 2012-04-12 9:36 UTC (permalink / raw)
To: Martin Jansa; +Cc: Yocto Project, Darren Hart

On Thu, 2012-04-12 at 10:00 +0200, Martin Jansa wrote:
> > On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
> > > I run on a beast with 12 cores, 48GB of RAM, OS and sources on
> > > a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for
> > > my /build partition. I run a headless Ubuntu 11.10 (x86_64)
> > > installation running the 3.0.0-16-server kernel. I can build
> > > core-image-minimal in < 30 minutes and core-image-sato in < 50
> > > minutes from scratch.
>
> why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to be
> able to do my builds in tmpfs and keep only more permanent data on
> RAID.

+1

I tried using tmpfs for WORKDIR on my T420, which has 8GB of RAM. (In India, the maximum single-slot DDR3 RAM we can get is 4GB.) Obviously, this is not sufficient :( Maybe I shouldn't use the laptop for build purposes. Moreover, every time I build an image with Yocto, the temperature peaks at 87 degrees Celsius. I hope my HDD doesn't die.

--
Joshua Immanuel
HiPro IT Solutions Private Limited
http://hipro.co.in

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-12  8:00 ` Martin Jansa
  2012-04-12  9:36 ` Joshua Immanuel
@ 2012-04-12 14:12 ` Darren Hart
  2012-04-12 23:37 ` Flanagan, Elizabeth
  1 sibling, 1 reply; 36+ messages in thread

From: Darren Hart @ 2012-04-12 14:12 UTC (permalink / raw)
To: Martin Jansa; +Cc: Yocto Project

On 04/12/2012 01:00 AM, Martin Jansa wrote:
> On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:
>> Darren,
>>
>> On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
>>> I run on a beast with 12 cores, 48GB of RAM, OS and sources on
>>> a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array
>>> for my /build partition. I run a headless Ubuntu 11.10 (x86_64)
>>> installation running the 3.0.0-16-server kernel. I can build
>>> core-image-minimal in < 30 minutes and core-image-sato in < 50
>>> minutes from scratch.
>
> why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to
> be able to do my builds in tmpfs and keep only more permanent data
> on RAID.

We've done some experiments with tmpfs, adding Beth on CC. If I recall correctly, my RAID0 array with the mount options I specified accomplishes much of what tmpfs does for me without the added setup. With a higher commit interval, the kernel doesn't try to sync the dcache with the disks as frequently (e.g. not even once during a build), so it's effectively writing to memory (although there is still plenty of IO occurring).

The other reason is that while 48GB is plenty for a single build, I often run many builds in parallel, sometimes in virtual machines when I need to reproduce or test something on different hosts.
For example:
https://picasaweb.google.com/lh/photo/7PCrqXQqxL98SAY1ecNzDdMTjNZETYmyPJy0liipFm0?feat=directlink

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-12 14:12 ` Darren Hart
@ 2012-04-12 23:37 ` Flanagan, Elizabeth
  2012-04-13  5:51 ` Martin Jansa
  0 siblings, 1 reply; 36+ messages in thread

From: Flanagan, Elizabeth @ 2012-04-12 23:37 UTC (permalink / raw)
To: Darren Hart; +Cc: Yocto Project

On Thu, Apr 12, 2012 at 7:12 AM, Darren Hart <dvhart@linux.intel.com> wrote:
> On 04/12/2012 01:00 AM, Martin Jansa wrote:
> > On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:
> >> Darren,
> >>
> >> On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
> >>> I run on a beast with 12 cores, 48GB of RAM, OS and sources on
> >>> a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array
> >>> for my /build partition. I run a headless Ubuntu 11.10 (x86_64)
> >>> installation running the 3.0.0-16-server kernel. I can build
> >>> core-image-minimal in < 30 minutes and core-image-sato in < 50
> >>> minutes from scratch.
> >
> > why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to
> > be able to do my builds in tmpfs and keep only more permanent data
> > on RAID.
>
> We've done some experiments with tmpfs, adding Beth on CC. If I recall
> correctly, my RAID0 array with the mount options I specified
> accomplishes much of what tmpfs does for me without the added setup.

This should be the case in general. For the most part, if you have a decent RAID setup (we're using RAID10 on the autobuilder) with fast disks, you should be able to hit tmpfs speed (or close to it). I've done some experiments with this, and what I found was maybe a 5 minute difference, sometimes, from a clean build between tmpfs and RAID10.

I discussed this during Yocto Developer Day. Let me boil it down a bit to explain some of what I did on the autobuilders.

A caveat first, though: I would avoid using autobuilder time as representative of prime Yocto build time.
The autobuilder hosts a lot of different services that sometimes impact build time, and this can vary depending on what else is going on on the machine.

There are four places, in general, where you want to look at optimizing outside of dependency issues: CPU, disk, memory, and build process. What I found was that the most useful of these in getting the autobuilder time down were disk and build process.

With disk, spreading it across the RAID saved us not only a bit of time, but also helped us avoid trashed disks. More disk thrash == higher failure rate. So far this year we've seen two disk failures that have resulted in almost zero autobuilder downtime.

The real time saver, however, ended up being maintaining sstate across build runs. Even with our sstate on NFS, we're still seeing a dramatic decrease in build time.

I would be interested in seeing what times you get with tmpfs. I've done tmpfs builds before and have seen good results, but bang for the buck did end up being a RAID array.

> With a higher commit interval, the kernel doesn't try to sync the
> dcache with the disks as frequently (eg not even once during a build),
> so it's effectively writing to memory (although there is still plenty
> of IO occurring).
>
> The other reason is that while 48GB is plenty for a single build, I
> often run many builds in parallel, sometimes in virtual machines when
> I need to reproduce or test something on different hosts.
> For example:
> https://picasaweb.google.com/lh/photo/7PCrqXQqxL98SAY1ecNzDdMTjNZETYmyPJy0liipFm0?feat=directlink

--
Elizabeth Flanagan
Yocto Project Build and Release

^ permalink raw reply	[flat|nested] 36+ messages in thread
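[Editor's note: Beth's "maintaining sstate across build runs" maps to a couple of local.conf variables. A hedged sketch — the paths are illustrative, and the thread does not show her actual configuration beyond "sstate on NFS":

```
# conf/local.conf -- keep the shared-state cache on storage that
# survives individual builds (Beth's setup uses NFS; path is made up).
SSTATE_DIR = "/nfs/yocto/sstate-cache"

# Optionally also pull prebuilt objects from a read-only mirror;
# PATH is a literal token BitBake substitutes, not a placeholder here.
# SSTATE_MIRRORS = "file://.* file:///nfs/yocto/sstate-cache/PATH"
```

With a populated cache, unchanged tasks are restored from sstate instead of being rebuilt, which is why it dwarfs the disk-layout tuning discussed above.]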
* Re: Build time data
  2012-04-12 23:37 ` Flanagan, Elizabeth
@ 2012-04-13  5:51 ` Martin Jansa
  2012-04-13  6:08 ` Darren Hart
  2012-04-17 15:29 ` Martin Jansa
  0 siblings, 2 replies; 36+ messages in thread

From: Martin Jansa @ 2012-04-13 5:51 UTC (permalink / raw)
To: Flanagan, Elizabeth; +Cc: Yocto Project, Darren Hart

On Thu, Apr 12, 2012 at 04:37:00PM -0700, Flanagan, Elizabeth wrote:
> On Thu, Apr 12, 2012 at 7:12 AM, Darren Hart <dvhart@linux.intel.com> wrote:
> > On 04/12/2012 01:00 AM, Martin Jansa wrote:
> > > On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote:
> > >> Darren,
> > >>
> > >> On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote:
> > >>> I run on a beast with 12 cores, 48GB of RAM, OS and sources on
> > >>> a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array
> > >>> for my /build partition. I run a headless Ubuntu 11.10 (x86_64)
> > >>> installation running the 3.0.0-16-server kernel. I can build
> > >>> core-image-minimal in < 30 minutes and core-image-sato in < 50
> > >>> minutes from scratch.
> > >
> > > why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to
> > > be able to do my builds in tmpfs and keep only more permanent data
> > > on RAID.
> >
> > We've done some experiments with tmpfs, adding Beth on CC. If I recall
> > correctly, my RAID0 array with the mount options I specified
> > accomplishes much of what tmpfs does for me without the added setup.
>
> This should be the case in general. For the most part, if you have a decent
> RAID setup (We're using RAID10 on the ab) with fast disks you should be
> able to hit tmpfs speed (or close to it). I've done some experiments with
> this and what I found was maybe a 5 minute difference, sometimes, from a
> clean build between tmpfs and RAID10.

5 minutes on a very small image like core-image-minimal (30 min) is 1/6 of that time :)..
I have much bigger images and an even bigger ipk feed, so a rebuild from scratch takes about 24 hours for one architecture..

And my system is very slow compared to yours. I've found my measurement of core-image-minimal-with-mtdutils around 95 mins
http://patchwork.openembedded.org/patch/17039/
but that was with a Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, and RAID5 (the same 3 SATA2 disks) for BUILDDIR (raid as mdraid); now I have a Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but a different motherboard..

The problem with tmpfs is that no RAM is big enough to build the whole feed in one go, so I have to build in steps (e.g. bitbake gcc for all machines with the same architecture, then clean up WORKDIR and switch to another arch, then bitbake small-image, bigger-image, qt4-x11-free, ...). qt4-x11-free alone is able to eat a 15GB tmpfs almost completely.

> I discussed this during Yocto Developer Day. Let me boil it down a bit to
> explain some of what I did on the autobuilders.
>
> Caveat first though. I would avoid using autobuilder time as representative
> of prime yocto build time. The autobuilder hosts a lot of different
> services that sometimes impact build time and this can vary depending on
> what else is going on on the machine.
>
> There are four places, in general, where you want to look at optimizing
> outside of dependency issues. CPU, disk, memory, build process. What I
> found was that the most useful of these in getting the autobuilder time
> down was disk and build process.
>
> With disk, spreading it across the RAID saved us not only a bit of time,
> but also helped us avoid trashed disks. More disk thrash == higher failure
> rate. So far this year we've seen two disk failures that have resulted in
> almost zero autobuilder downtime.

True for RAID10, but for WORKDIR itself RAID0 is cheaper, and even with a higher failure rate that's not a big issue for WORKDIR.. you just have to cleansstate the tasks that were hit in the middle of a build..
> The real time saver however ended up being maintaining sstate across build
> runs. Even with our sstate on nfs, we're still seeing a dramatic decrease
> in build time.
>
> I would be interested in seeing what times you get with tmpfs. I've done
> tmpfs builds before and have seen good results, but bang for the buck did
> end up being a RAID array.

I'll check if core-image-minimal can be built with just a 15GB tmpfs; otherwise I would have to build it in 2 steps and the time won't be precise.

> > With a higher commit interval, the kernel doesn't try to sync the
> > dcache with the disks as frequently (eg not even once during a build),
> > so it's effectively writing to memory (although there is still plenty
> > of IO occurring).
> >
> > The other reason is that while 48GB is plenty for a single build, I
> > often run many builds in parallel, sometimes in virtual machines when
> > I need to reproduce or test something on different hosts.
> >
> > For example:
> > https://picasaweb.google.com/lh/photo/7PCrqXQqxL98SAY1ecNzDdMTjNZETYmyPJy0liipFm0?feat=directlink

--
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-13  5:51 ` Martin Jansa
@ 2012-04-13  6:08 ` Darren Hart
  2012-04-13  6:38 ` Martin Jansa
  2012-04-13  7:24 ` Wolfgang Denk
  1 sibling, 2 replies; 36+ messages in thread

From: Darren Hart @ 2012-04-13 6:08 UTC (permalink / raw)
To: Martin Jansa; +Cc: Yocto Project

On 04/12/2012 10:51 PM, Martin Jansa wrote:
> And my system is very slow compared to yours, I've found my
> measurement of core-image-minimal-with-mtdutils around 95 mins
> http://patchwork.openembedded.org/patch/17039/ but this was with
> Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5
> (the same 3 SATA2 disks) BUILDDIR (raid as mdraid), now I have
> Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but
> different motherboard..

Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The savings RAID5 affords you are more significant with more disks, but with 3 disks it's only 1 disk better than RAID10, with a lot more overhead.

I spent some time outlining all this a while back:
http://www.dvhart.com/2011/03/qnap_ts419p_configuration_raid_levels_and_throughput/

Here's the relevant bit:

"RAID 5 distributes parity across all the drives in the array, this parity calculation is both compute intensive and IO intensive. Every write requires the parity calculation, and data must be written to every drive."
--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data
  2012-04-13  6:08 ` Darren Hart
@ 2012-04-13  6:38 ` Martin Jansa
  2012-04-13  7:24 ` Wolfgang Denk
  1 sibling, 0 replies; 36+ messages in thread

From: Martin Jansa @ 2012-04-13 6:38 UTC (permalink / raw)
To: Darren Hart; +Cc: Yocto Project

On Thu, Apr 12, 2012 at 11:08:19PM -0700, Darren Hart wrote:
> On 04/12/2012 10:51 PM, Martin Jansa wrote:
> > And my system is very slow compared to yours, I've found my
> > measurement of core-image-minimal-with-mtdutils around 95 mins
> > http://patchwork.openembedded.org/patch/17039/ but this was with
> > Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5
> > (the same 3 SATA2 disks) BUILDDIR (raid as mdraid), now I have
> > Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but
> > different motherboard..
>
> Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The
> savings RAID5 affords you are more significant with more disks, but with
> 3 disks it's only 1 disk better than RAID10, with a lot more overhead.

Because RAID10 needs at least 4 drives and all my SATA ports are already used, and it's also on my /home partition.. Please note that this is not some company build server, just my desktop, where it happens I do a lot of builds for a community distribution for smartphones, http://shr-project.org

The server we have available for builds is _much_ slower than this, especially in IO (some virtualized host on a busy server), but it has much better network bandwidth.. :)

Cheers,

> I spent some time outlining all this a while back:
> http://www.dvhart.com/2011/03/qnap_ts419p_configuration_raid_levels_and_throughput/
>
> Here's the relevant bit:
>
> "RAID 5 distributes parity across all the drives in the array, this
> parity calculation is both compute intensive and IO intensive. Every
> write requires the parity calculation, and data must be written to
> every drive."
> --
> Darren Hart
> Intel Open Source Technology Center
> Yocto Project - Linux Kernel

--
Martin 'JaMa' Jansa     jabber: Martin.Jansa@gmail.com

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-13 6:08 ` Darren Hart 2012-04-13 6:38 ` Martin Jansa @ 2012-04-13 7:24 ` Wolfgang Denk 1 sibling, 0 replies; 36+ messages in thread From: Wolfgang Denk @ 2012-04-13 7:24 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project Dear Darren Hart, In message <4F87C2D3.8020805@linux.intel.com> you wrote: > > > Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for WORKDIR, RAID5 > > (the same 3 SATA2 disks) BUILDDIR (raid as mdraid), now I have > > Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but > > different motherboard.. > > Why RAID5 for BUILDDIR? The write overhead of RAID5 is very high. The > savings RAID5 allots you is more significant with more disks, but with > 3 disks it's only 1 disk better than RAID10, with a lot more overhead. Indeed, RAID5 with just 3 devices makes little sense - especially when running on the same drives as the RAID0 workdir. > I spent some time outlining all this a while back: > http://www.dvhart.com/2011/03/qnap_ts419p_configuration_raid_levels_and_throughput/ Well, such data from a 4-spindle array are not telling much. When you are asking for I/O performance on RAID arrays, you want to distribute load over _many_ spindles. Do your comparisons on an 8 or 16 (or more) spindle setup, and the results will be much different. Also, your test of copying huge files is just one usage mode: strictly sequential access. But what we see with OE / Yocto builds is completely different. Here you will see a huge number of small and even tiny data transfers. "Classical" recommendations for performance optimization of RAID arrays (which usually tune for such big, sequential accesses only), like using big stripe sizes and huge read-ahead etc., turn out to be counter-productive here. It makes no sense to have, for example, a stripe size of 256 kB or more when 95% or more of your disk accesses write less than 4 kB.
> Here's the relevant bit: > > "RAID 5 distributes parity across all the drives in the array; this > parity calculation is both compute intensive and IO intensive. Every > write requires the parity calculation, and data must be written to > every drive." But did you look at a real system? I never found the CPU load of the parity calculations to be a bottleneck. I'd rather have the CPU spend cycles on computing parity than run it with all cores idle because it's waiting for I/O to complete. I found that for the workloads we have (software builds like Yocto etc.) a multi-spindle software RAID array outperforms all other solutions (and especially the h/w RAID controllers I have had access to so far - these don't even come close to the same number of IOPS). Oh, and BTW: if you care about reliability, then don't use RAID5. Go for RAID6. Yes, it's more expensive, but it's also much less painful when you have to rebuild the array in case of a disk failure. I've seen too many cases where a second disk would fail during the rebuild to ever go with RAID5 for big systems again - restoring several TB of data from tape ain't no fun. See also the RAID wiki for specific performance optimizations on such RAID arrays. Best regards, Wolfgang Denk -- DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: wd@denx.de Never put off until tomorrow what you can put off indefinitely. ^ permalink raw reply [flat|nested] 36+ messages in thread
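Wolfgang's advice (software RAID, RAID6 over RAID5, small stripes for the tiny writes a build generates) can be sketched as an mdadm setup. The device names, spindle count, chunk size, and mount point below are illustrative assumptions, not values from the thread:

```shell
# Hypothetical multi-spindle software RAID6 for a build filesystem.
# 8 devices and a 64 kB chunk are assumptions; tune for your workload.
mdadm --create /dev/md0 --level=6 --raid-devices=8 --chunk=64 /dev/sd[b-i]
mkfs.ext4 /dev/md0
# Mount options in the spirit of the thread (noatime already implies nodiratime).
mount -o noatime,commit=6000 /dev/md0 /build
```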
* Re: Build time data 2012-04-13 5:51 ` Martin Jansa 2012-04-13 6:08 ` Darren Hart @ 2012-04-17 15:29 ` Martin Jansa 1 sibling, 0 replies; 36+ messages in thread From: Martin Jansa @ 2012-04-17 15:29 UTC (permalink / raw) To: Flanagan, Elizabeth; +Cc: Yocto Project, Darren Hart On Fri, Apr 13, 2012 at 07:51:51AM +0200, Martin Jansa wrote: > On Thu, Apr 12, 2012 at 04:37:00PM -0700, Flanagan, Elizabeth wrote: > > On Thu, Apr 12, 2012 at 7:12 AM, Darren Hart <dvhart@linux.intel.com> wrote: > > > On 04/12/2012 01:00 AM, Martin Jansa wrote: > > > > On Thu, Apr 12, 2012 at 01:05:00PM +0530, Joshua Immanuel wrote: > > > >> Darren, > > > >> > > > >> On Wed, 2012-04-11 at 17:30 -0700, Darren Hart wrote: > > > >>> I run on a beast with 12 cores, 48GB of RAM, OS and sources on > > > >>> a G2 Intel SSD, with two Seagate Barracudas in a RAID0 array > > > >>> for my /build partition. I run a headless Ubuntu 11.10 (x86_64) > > > >>> installation running the 3.0.0-16-server kernel. I can build > > > >>> core-image-minimal in < 30 minutes and core-image-sato in < 50 > > > >>> minutes from scratch. > > > > > > > > why not use so much RAM for WORKDIR in tmpfs? I bought 16GB just to > > > > be able to do my builds in tmpfs and keep only more permanent data > > > > on RAID. > > > > > > We've done some experiments with tmpfs, adding Beth on CC. If I recall > > > correctly, my RAID0 array with the mount options I specified > > > accomplishes much of what tmpfs does for me without the added setup. > > > > > > > This should be the case in general. For the most part, if you have a decent > > RAID setup (We're using RAID10 on the ab) with fast disks you should be > > able to hit tmpfs speed (or close to it).
I've done some experiments with > > this and what I found was maybe a 5 minute difference, sometimes, from a > > clean build between tmpfs and RAID10. > > 5 minutes on a very small image like core-image-minimal (30 min) is 1/6 of > that time :).. > > I have much bigger images and an even bigger ipk feed, so to rebuild from > scratch takes about 24 hours for one architecture.. > > And my system is very slow compared to yours, I've found my measurement > of core-image-minimal-with-mtdutils around 95 mins > http://patchwork.openembedded.org/patch/17039/ > but this was with Phenom II X4 965, 4GB RAM, RAID0 (3 SATA2 disks) for > WORKDIR, RAID5 (the same 3 SATA2 disks) BUILDDIR (raid as mdraid), now > I have Bulldozer AMD FX(tm)-8120, 16GB RAM, still the same RAID0 but > different motherboard.. > > The problem with tmpfs is that no RAM is big enough to build the whole feed in > one go, so I have to build in steps (e.g. bitbake gcc for all machines > with the same architecture, then clean up WORKDIR and switch to another > arch, then bitbake small-image, bigger-image, qt4-x11-free, ...). > qt4-x11-free is able to eat a 15GB tmpfs almost completely. > > > I discussed this during Yocto Developer Day. Let me boil it down a bit to > > explain some of what I did on the autobuilders. > > > > Caveat first though. I would avoid using autobuilder time as representative > > of prime Yocto build time. The autobuilder hosts a lot of different > > services that sometimes impact build time and this can vary depending on > > what else is going on on the machine. > > > > There are four places, in general, where you want to look at optimizing > > outside of dependency issues: CPU, disk, memory, build process. What I > > found was that the most useful of these in getting the autobuilder time > > down were disk and build process. > > > > With disk, spreading it across the RAID saved us not only a bit of time, > > but also helped us avoid trashed disks. More disk thrash == higher failure > > rate.
So far this year we've seen two disk failures that have resulted in > > almost zero autobuilder downtime. > > True for RAID10, but for WORKDIR itself RAID0 is cheaper, and even a higher > failure rate is not a big issue for WORKDIR.. just have to cleansstate the > tasks which were hit in the middle of a build.. > > > The real time saver however ended up being maintaining sstate across build > > runs. Even with our sstate on nfs, we're still seeing a dramatic decrease > > in build time. > > > > I would be interested in seeing what times you get with tmpfs. I've done > > tmpfs builds before and have seen good results, but bang for the buck did > > end up being a RAID array. > > I'll check if core-image-minimal can be built with just 15GB tmpfs, > otherwise I would have to build it in 2 steps and the time won't be > precise. It was enough with rm_work, so here are my results: The difference is much smaller than I expected, but again those are very small images (next time I'll try to do just qt4 builds). Fastest is TMPDIR on tmpfs (BUILDDIR is not important - same times with BUILDDIR also in tmpfs and on a SATA2 disk). raid0 is only about 4% slower. A single SATA2 disk is slowest, but only a bit slower than raid5, though that could be caused by bug #2314 as I had to run the build twice.. And all times were just from the first successful build; it could be different with an avg time over 10 builds..
And all builds on AMD FX(tm)-8120 Eight-Core Processor 16G DDR3-1600 RAM standalone SATA2 disk ST31500341AS mdraid on 3 older SATA2 disks HDS728080PLA380 bitbake: commit 4219e2ea033232d95117211947b751bdb5efafd4 Author: Saul Wold <sgw@linux.intel.com> Date: Tue Apr 10 17:57:15 2012 -0700 openembedded-core: commit 4396db54dba4afdb9f1099f4e386dc25c76f49fb Author: Richard Purdie <richard.purdie@linuxfoundation.org> Date: Sat Apr 14 23:42:16 2012 +0100 + fix for opkg-utils, so that package-index doesn't take ages to complete BUILDDIR = 1 SATA2 disk TMPDIR = tmpfs real 84m32.995s user 263m46.316s sys 48m26.376s BUILDDIR = tmpfs TMPDIR = tmpfs real 84m10.528s user 264m16.144s sys 50m21.853s BUILDDIR = raid5 TMPDIR = raid5 real 91m20.470s user 263m47.156s sys 52m23.400s BUILDDIR = raid0 TMPDIR = raid0 real 87m29.526s user 263m0.799s sys 51m37.242s BUILDDIR = 1 SATA2 disk TMPDIR = the same SATA2 disk Summary: 1 task failed: /OE/oe-core/openembedded-core/meta/recipes-core/eglibc/eglibc_2.15.bb, do_compile Summary: There was 1 ERROR message shown, returning a non-zero exit code. see https://bugzilla.yoctoproject.org/show_bug.cgi?id=2314 real 48m23.412s user 163m55.082s sys 23m26.990s + touch oe-core/tmp-eglibc/work/x86_64-oe-linux/eglibc-2.15-r6+svnr17386/eglibc-2_15/libc/Makerules + Summary: There were 6 WARNING messages shown. real 44m13.401s user 92m44.427s sys 27m38.347s = real 92m36.813s user 256m39.509s sys 51m05.337s -- Martin 'JaMa' Jansa jabber: Martin.Jansa@gmail.com ^ permalink raw reply [flat|nested] 36+ messages in thread
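For comparison with Martin's numbers, a TMPDIR-on-tmpfs setup can be sketched as follows; the 15 GB size matches the thread, but the mount point and TMPDIR path are illustrative assumptions:

```shell
# Mount a 15 GB tmpfs for the build tree (contents vanish on reboot, so
# keep DL_DIR and sstate on persistent storage).
sudo mkdir -p /mnt/build-tmpfs
sudo mount -t tmpfs -o size=15g tmpfs /mnt/build-tmpfs

# Then in conf/local.conf (hypothetical path):
#   TMPDIR = "/mnt/build-tmpfs/tmp"
```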
* Re: Build time data 2012-04-12 0:30 ` Darren Hart ` (2 preceding siblings ...) 2012-04-12 7:35 ` Joshua Immanuel @ 2012-04-12 14:08 ` Björn Stenberg 2012-04-12 14:34 ` Darren Hart 2012-04-13 9:56 ` Tomas Frydrych 4 siblings, 1 reply; 36+ messages in thread From: Björn Stenberg @ 2012-04-12 14:08 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project Darren Hart wrote: > /dev/md0 /build ext4 > noauto,noatime,nodiratime,commit=6000 A minor detail: 'nodiratime' is a subset of 'noatime', so there is no need to specify both. > I run on a beast with 12 cores, 48GB of RAM, OS and sources on a G2 > Intel SSD, with two Seagate Barracudas in a RAID0 array for my /build > partition. I run a headless Ubuntu 11.10 (x86_64) installation running > the 3.0.0-16-server kernel. I can build core-image-minimal in < 30 > minutes and core-image-sato in < 50 minutes from scratch. I'm guessing those are rather fast cores? I build on a different type of beast: 64 cores at 2.1GHz and 128 GB RAM. The OS is on a single SSD and the build dir (and sources) is on a RAID0 array of Intel 520 SSDs. Kernel is the same Ubuntu 3.0.0-16-server as yours. Yet for all the combined horsepower, I am unable to match your time of 30 minutes for core-image-minimal. I clock in at around 37 minutes for a qemux86-64 build with ipk output: ------ NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't need to be rerun and all succeeded. real 36m32.118s user 214m39.697s sys 108m49.152s ------ These numbers also show that my build is running less than 9x realtime, indicating that 80% of my cores sit idle most of the time. This confirms what "ps xf" says during the builds: Only rarely is bitbake running more than a handful of tasks at once, even with BB_NUMBER_THREADS at 64. And many of these tasks are in turn running sequential loops on a single core. I'm hoping to find time soon to look deeper into this issue and suggest remedies. 
It is my distinct feeling that we should be able to build significantly faster on powerful machines. -- Björn ^ permalink raw reply [flat|nested] 36+ messages in thread
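Björn's "less than 9x realtime" figure falls straight out of the `time` output he posted: effective parallelism is (user + sys) / real. A small sketch of that arithmetic using his numbers:

```shell
# Effective parallelism of Björn's build: (user + sys) / real,
# with the times from the `time` output above converted to seconds.
real=2192.118    # 36m32.118s
user=12879.697   # 214m39.697s
sys=6529.152     # 108m49.152s
awk -v r="$real" -v u="$user" -v s="$sys" \
    'BEGIN { printf "effective parallelism: %.2fx\n", (u + s) / r }'
```

This yields about 8.85x on a 64-core box, consistent with his estimate that most cores sit idle most of the time.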
* Re: Build time data 2012-04-12 14:08 ` Björn Stenberg @ 2012-04-12 14:34 ` Darren Hart 2012-04-12 22:43 ` Chris Tapp ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: Darren Hart @ 2012-04-12 14:34 UTC (permalink / raw) To: Björn Stenberg; +Cc: Yocto Project On 04/12/2012 07:08 AM, Björn Stenberg wrote: > Darren Hart wrote: >> /dev/md0 /build ext4 >> noauto,noatime,nodiratime,commit=6000 > > A minor detail: 'nodiratime' is a subset of 'noatime', so there is no > need to specify both. Excellent, thanks for the tip. > >> I run on a beast with 12 cores, 48GB of RAM, OS and sources on a >> G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my >> /build partition. I run a headless Ubuntu 11.10 (x86_64) >> installation running the 3.0.0-16-server kernel. I can build >> core-image-minimal in < 30 minutes and core-image-sato in < 50 >> minutes from scratch. > > I'm guessing those are rather fast cores? They are: model name : Intel(R) Xeon(R) CPU X5680 @ 3.33GHz > I build on a different type > of beast: 64 cores at 2.1GHz and 128 GB ram. The OS is on a single > SSD and the build dir (and sources) is on a RAID0 array of Intel 520 > SSDs. Kernel is the same ubuntu 3.0.0-16-server as yours. Now that I think about it, my downloads are on the RAID0 array too. One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS and PARALLEL_MAKE. I noticed a negative impact if I increased these beyond 12 and 14 respectively. I tested this with bb-matrix (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but can provide useful results and killer 3D surface plots of build time with BB and PM on the axis. Can't seem to find a plot image at the moment for some reason... > > Yet for all the combined horsepower, I am unable to match your time > of 30 minutes for core-image-minimal. 
I clock in at around 37 minutes > for a qemux86-64 build with ipk output: > > ------ NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't > need to be rerun and all succeeded. > > real 36m32.118s user 214m39.697s sys 108m49.152s ------ > > These numbers also show that my build is running less than 9x > realtime, indicating that 80% of my cores sit idle most of the time. Yup, that sounds about right. The build has a linear component to it, and anything above about 12 just doesn't help. In fact the added scheduling overhead seems to hurt. > This confirms what "ps xf" says during the builds: Only rarely is > bitbake running more than a handful of tasks at once, even with > BB_NUMBER_THREADS at 64. And many of these tasks are in turn running > sequential loops on a single core. > > I'm hoping to find time soon to look deeper into this issue and > suggest remedies. It is my distinct feeling that we should be able to > build significantly faster on powerful machines. > Reducing the dependency chains that result in the linear component of the build (forcing serialized execution) is one place we've focused, and could probably still use some attention. CC'ing RP as he's done a lot there. -- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-12 14:34 ` Darren Hart @ 2012-04-12 22:43 ` Chris Tapp 2012-04-12 22:56 ` Darren Hart 2012-04-13 8:45 ` Richard Purdie 2012-04-13 8:47 ` Björn Stenberg 2 siblings, 1 reply; 36+ messages in thread From: Chris Tapp @ 2012-04-12 22:43 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project On 12 Apr 2012, at 15:34, Darren Hart wrote: > > > On 04/12/2012 07:08 AM, Björn Stenberg wrote: >> Darren Hart wrote: >>> /dev/md0 /build ext4 >>> noauto,noatime,nodiratime,commit=6000 >> >> A minor detail: 'nodiratime' is a subset of 'noatime', so there is no >> need to specify both. > > Excellent, thanks for the tip. > >> >>> I run on a beast with 12 cores, 48GB of RAM, OS and sources on a >>> G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my >>> /build partition. I run a headless Ubuntu 11.10 (x86_64) >>> installation running the 3.0.0-16-server kernel. I can build >>> core-image-minimal in < 30 minutes and core-image-sato in < 50 >>> minutes from scratch. >> >> I'm guessing those are rather fast cores? > > They are: > model name : Intel(R) Xeon(R) CPU X5680 @ 3.33GHz Nice, but well out of my budget - I've got to make do with what one of your CPUs costs for the whole system ;-) > >> I build on a different type >> of beast: 64 cores at 2.1GHz and 128 GB ram. The OS is on a single >> SSD and the build dir (and sources) is on a RAID0 array of Intel 520 >> SSDs. Kernel is the same ubuntu 3.0.0-16-server as yours. > > Now that I think about it, my downloads are on the RAID0 array too. > > One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS > and PARALLEL_MAKE. I noticed a negative impact if I increased these > beyond 12 and 14 respectively. I tested this with bb-matrix > (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but > can provide useful results and killer 3D surface plots of build time > with BB and PM on the axis. Can't seem to find a plot image at the > moment for some reason... 
> >> >> Yet for all the combined horsepower, I am unable to match your time >> of 30 minutes for core-image-minimal. I clock in at around 37 minutes >> for a qemux86-64 build with ipk output: >> >> ------ NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't >> need to be rerun and all succeeded. >> >> real 36m32.118s user 214m39.697s sys 108m49.152s ------ >> >> These numbers also show that my build is running less than 9x >> realtime, indicating that 80% of my cores sit idle most of the time. > > Yup, that sounds about right. The build has a linear component to it, > and anything above about 12 just doesn't help. In fact the added > scheduling overhead seems to hurt. > >> This confirms what "ps xf" says during the builds: Only rarely is >> bitbake running more than a handful tasks at once, even with >> BB_NUMBER_THREADS at 64. And many of these tasks are in turn running >> sequential loops on a single core. >> >> I'm hoping to find time soon to look deeper into this issue and >> suggest remedies. It my distinct feeling that we should be able to >> build significantly faster on powerful machines. >> > > Reducing the dependency chains that result in the linear component of > the build (forcing serialized execution) is one place we've focused, and > could probably still use some attention. CC'ing RP as he's done a lot there. Current plan for a 'budget' system is: DX79TO motherboard, i7 3820, 16GB RAM, a pair of 60GB OCZ Vertex III's in RAID-0 for downloads / build, SATA HD for OS (Ubuntu 11.10 x86_64). That'll give me a 2.7x boost just on CPU and the SSDs (and maybe some over-clocking) will give some more. Not sure if SSDs in RAID-0 will give any boost, so I'll run some tests. Thanks to all for the comments in this thread. Chris Tapp opensource@keylevel.com www.keylevel.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-12 22:43 ` Chris Tapp @ 2012-04-12 22:56 ` Darren Hart 2012-04-18 19:41 ` Chris Tapp 0 siblings, 1 reply; 36+ messages in thread From: Darren Hart @ 2012-04-12 22:56 UTC (permalink / raw) To: Chris Tapp; +Cc: Yocto Project On 04/12/2012 03:43 PM, Chris Tapp wrote: > On 12 Apr 2012, at 15:34, Darren Hart wrote: >> >> >> On 04/12/2012 07:08 AM, Björn Stenberg wrote: >>> Darren Hart wrote: >>>> /dev/md0 /build ext4 >>>> noauto,noatime,nodiratime,commit=6000 >>> >>> A minor detail: 'nodiratime' is a subset of 'noatime', so there is no >>> need to specify both. >> >> Excellent, thanks for the tip. >> >>> >>>> I run on a beast with 12 cores, 48GB of RAM, OS and sources on a >>>> G2 Intel SSD, with two Seagate Barracudas in a RAID0 array for my >>>> /build partition. I run a headless Ubuntu 11.10 (x86_64) >>>> installation running the 3.0.0-16-server kernel. I can build >>>> core-image-minimal in < 30 minutes and core-image-sato in < 50 >>>> minutes from scratch. >>> >>> I'm guessing those are rather fast cores? >> >> They are: >> model name : Intel(R) Xeon(R) CPU X5680 @ 3.33GHz > > Nice, but well out of my budget - I've got to make do with what one of your CPUs costs for the whole system ;-) > >> >>> I build on a different type >>> of beast: 64 cores at 2.1GHz and 128 GB ram. The OS is on a single >>> SSD and the build dir (and sources) is on a RAID0 array of Intel 520 >>> SSDs. Kernel is the same ubuntu 3.0.0-16-server as yours. >> >> Now that I think about it, my downloads are on the RAID0 array too. >> >> One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS >> and PARALLEL_MAKE. I noticed a negative impact if I increased these >> beyond 12 and 14 respectively. I tested this with bb-matrix >> (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but >> can provide useful results and killer 3D surface plots of build time >> with BB and PM on the axis. 
Can't seem to find a plot image at the >> moment for some reason... >> >>> >>> Yet for all the combined horsepower, I am unable to match your time >>> of 30 minutes for core-image-minimal. I clock in at around 37 minutes >>> for a qemux86-64 build with ipk output: >>> >>> ------ NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't >>> need to be rerun and all succeeded. >>> >>> real 36m32.118s user 214m39.697s sys 108m49.152s ------ >>> >>> These numbers also show that my build is running less than 9x >>> realtime, indicating that 80% of my cores sit idle most of the time. >> >> Yup, that sounds about right. The build has a linear component to it, >> and anything above about 12 just doesn't help. In fact the added >> scheduling overhead seems to hurt. >> >>> This confirms what "ps xf" says during the builds: Only rarely is >>> bitbake running more than a handful tasks at once, even with >>> BB_NUMBER_THREADS at 64. And many of these tasks are in turn running >>> sequential loops on a single core. >>> >>> I'm hoping to find time soon to look deeper into this issue and >>> suggest remedies. It my distinct feeling that we should be able to >>> build significantly faster on powerful machines. >>> >> >> Reducing the dependency chains that result in the linear component of >> the build (forcing serialized execution) is one place we've focused, and >> could probably still use some attention. CC'ing RP as he's done a lot there. > > Current plan for a 'budget' system is: > > DX79TO motherboard, i7 3820, 16GB RAM, a pair of 60GB OCZ Vertex III's in RAID-0 for downloads / build, SATA HD for OS (Ubuntu 11.10 x86_64). > > That'll give me a 2.7x boost just on CPU and the SSDs (and maybe some over-clocking) will give some more. > > Not sure if SSDs in RAID-0 will give any boost, so I'll run some tests. > > Thanks to all for the comments in this thread. Get back to us with times, and we'll build up a wiki page. 
> > Chris Tapp > > opensource@keylevel.com > www.keylevel.com -- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-12 22:56 ` Darren Hart @ 2012-04-18 19:41 ` Chris Tapp 2012-04-18 20:27 ` Chris Tapp 2012-04-18 20:55 ` Darren Hart 0 siblings, 2 replies; 36+ messages in thread From: Chris Tapp @ 2012-04-18 19:41 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project On 12 Apr 2012, at 23:56, Darren Hart wrote: > Get back to us with times, and we'll build up a wiki page. Some initial results / comments: I'm running on: - i7 3820 (quad core, hyper-threading, 3.6GHz) - 16GB RAM (1600MHz XMP profile) - Asus P9X79 Pro motherboard - Ubuntu 11.10 x86_64 server installed on a 60GB OCZ Vertex 3 SSD on a 3Gb/s interface - Two 60GB OCZ Vertex 3s as RAID-0 on 6Gb/s interfaces. The following results use a DL_DIR on the OS SSD (pre-populated) - I'm not interested in the speed of the internet, especially as I've only got a relatively slow connection ;-) Poky-6.0.1 is also installed on the OS SSD. I've done a few builds of core-image-minimal: 1) Build dir on the OS SSD 2) Build dir on the SSD RAID + various bits of tuning. The results are basically the same, so it seems as if the SSD RAID makes no difference. Benchmarking it does show twice the read/write performance of the OS SSD, as expected. Disabling journalling and increasing the commit time to 6000 also made no significant difference to the build times, which were (to the nearest minute): Real : 42m User : 133m System : 19m These times were starting from nothing, and seem to fit with your 30 minutes with 3 times as many cores! BTW, BB_NUMBER_THREADS was set to 16 and PARALLEL_MAKE to 12. I also tried rebuilding the kernel: bitbake -c clean linux-yocto rm -rf the sstate bits for the above bitbake linux-yocto and got the following times: Real : 39m User : 105m System : 16m Which kind of fits with an observation. The minimal build had something like 1530 stages to complete. The first 750 to 800 of these flew past with all 8 'cores' running at just about 100% all the time. 
Load average (short term) was about 19, so plenty ready to run. However, round about the time python-native, the kernel, libxslt and gettext kicked in, the CPU usage dropped right off - to the point that the short term load average dropped below 3. It did pick up again later on (after the kernel was completed) before slowing down again towards the end (when it would seem reasonable to expect that less can run in parallel). It seems as if some of these bits (or others around this time) aren't making use of parallel make, or there is a queue of dependent tasks that needs to be serialized. The kernel build is a much bigger part of the build than I was expecting, but this is only a small image. However, it looks as if the main compilation phase completes very early on and a lot of time is then spent building the modules (in a single thread, it seems) and in packaging - which leads me to ask if RPM is the best option (speed-wise)? I don't use the packages myself (though understand they are needed internally), so I can use the fastest (if there is one). Is there anything else I should be considering to improve build times? As I said above, this is just a rough-cut at some benchmarking and I plan to do some more, especially if there are other things to try and/or any other information that would be useful. Still, it's looking much, much faster than my old build system :-) Chris Tapp opensource@keylevel.com www.keylevel.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-18 19:41 ` Chris Tapp @ 2012-04-18 20:27 ` Chris Tapp 2012-04-18 20:55 ` Darren Hart 1 sibling, 0 replies; 36+ messages in thread From: Chris Tapp @ 2012-04-18 20:27 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project On 18 Apr 2012, at 20:41, Chris Tapp wrote: > On 12 Apr 2012, at 23:56, Darren Hart wrote: >> Get back to us with times, and we'll build up a wiki page. > > <snip> > > I also tried rebuilding the kernel: > bitbake -c clean linux-yocto > rm -rf the sstate bits for the above > bitbake linux-yocto > > and got the following times: (CORRECT TIMES INSERTED): > Real : 11m > User : 15m > System : 2m The comments about low load averages during kernel build still stand. Chris Tapp opensource@keylevel.com www.keylevel.com ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-18 19:41 ` Chris Tapp 2012-04-18 20:27 ` Chris Tapp @ 2012-04-18 20:55 ` Darren Hart 2012-04-19 22:39 ` Chris Tapp 1 sibling, 1 reply; 36+ messages in thread From: Darren Hart @ 2012-04-18 20:55 UTC (permalink / raw) To: Chris Tapp; +Cc: Yocto Project On 04/18/2012 12:41 PM, Chris Tapp wrote: > On 12 Apr 2012, at 23:56, Darren Hart wrote: >> Get back to us with times, and we'll build up a wiki page. > > Some initial results / comments: > > I'm running on: > - i7 3820 (quad core, hyper-treading, 3.6GHz) > - 16GB RAM (1600MHz XMP profile) > - Asus P9X79 Pro motherboard > - Ubuntu 11.10 x86_64 server installed on a 60GB OCZ Vertex 3 SSD on a 3Gb/s interface > - Two 60GB OCZ Vertex 3s as RAID-0 on 6Gb/s interfaces. > > The following results use a DL_DIR on the OS SSD (pre-populated) - > I'm not interested in the speed of the internet, especially as I've only got a relatively slow connection ;-) > > Poky-6.0.1 is also installed on the OS SSD. > > I've done a few builds of core-image-minimal: > > 1) Build dir on the OS SSD > 2) Build dir on the SSD RAID + various bits of tuning. > > The results are basically the same, so it seems as if the SSD RAID makes no difference. Benchmarking it does show twice the read/write performance of the OS SSD, as expected. Disabling journalling and increasing the commit time to 6000 also made no significant difference to the build times, which were (to the nearest minute): That is not surprising. With 4 cores and a very serialized build target, I would not expect your SSD to be the bottleneck. > > Real : 42m > User : 133m > System : 19m > > These time were starting from nothing, and seem to fit with your 30 minutes with 3 times as many cores! BTW, BB_NUMBER_THREADS was set to 16 and PARALLEL_MAKE to 12. A couple of things to keep in mind here. The minimal build is very serialized in comparison to something like a sato build. 
If you want to optimize your build times, look at the bbmatrix* scripts shipped with poky to find the sweet spot for your target image and your build system. I suspect you will find your BB_NUMBER_THREADS and PARALLEL_MAKE settings are too high for your system. I'd start with them at 8 and 8, or 8 and 6 respectively. > > I also tried rebuilding the kernel: > bitbake -c clean linux-yocto > rm -rf the sstate bits for the above > bitbake linux-yocto > > and got the following times: > > Real : 39m > User : 105m > System : 16m > > Which kind of fits with an observation. The minimal build had something like 1530 stages to complete. The first 750 to 800 of these flew past with all 8 'cores' running at just about 100% all the time. Load average (short term) was about 19, so plenty ready to run. However, round about the time python-native, the kernel, libxslt and gettext kicked in, the CPU usage dropped right off - to the point that the short term load average dropped below 3. It did pick up again later on (after the kernel was completed) before slowing down again towards the end (when it would seem reasonable to expect that less can run in parallel). > > It seems as if some of these bits (or others around this time) > aren't making use of parallel make or there is a queue of dependent tasks that needs to be serialized. > > The kernel build is a much bigger part of the build than I was expecting, but this is only a small image. However, it looks as if the main compilation phase completes very early on and a lot of time is then spent building the modules (in a single thread, it seems) and in packaging - which leads me to ask if RPM is the best option (speed-wise)? I don't use the packages myself (though understand they are needed internally), so I can use the fastest (if there is one). IPK is faster than RPM. This is what I use on most of my builds. > > Is there anything else I should be considering to improve build > times? 
Run the Ubuntu server kernel to eliminate some scheduling overhead. Reducing the parallel settings mentioned above should help here too. Welcome to Ubuntu 11.10 (GNU/Linux 3.0.0-16-server x86_64) dvhart@rage:~ $ uname -r 3.0.0-16-server > As I said above, this is just a rough-cut at some benchmarking and I plan to do some more, especially if there are other things to try and/or any other information that would be useful. > > Still, it's looking much, much faster than my old build system :-) > > Chris Tapp > > opensource@keylevel.com > www.keylevel.com -- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel ^ permalink raw reply [flat|nested] 36+ messages in thread
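Darren's suggestions in this message reduce to a few lines of conf/local.conf; the values are the ones he proposes for a 4-core/8-thread machine and should be re-tuned (e.g. with bb-matrix) for other hardware:

```
# conf/local.conf fragment, values per Darren's advice above
PACKAGE_CLASSES = "package_ipk"
BB_NUMBER_THREADS = "8"
PARALLEL_MAKE = "-j 8"   # some posts in this thread use a bare "8"; the -j form is the documented one
```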
* Re: Build time data 2012-04-18 20:55 ` Darren Hart @ 2012-04-19 22:39 ` Chris Tapp 0 siblings, 0 replies; 36+ messages in thread From: Chris Tapp @ 2012-04-19 22:39 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project

On 18 Apr 2012, at 21:55, Darren Hart wrote:

<snip>

> A couple of things to keep in mind here. The minimal build is very
> serialized in comparison to something like a sato build. If you want to
> optimize your build times, look at the bb-matrix* scripts shipped with
> poky to find the sweet spot for your target image and your build system.
> I suspect you will find your BB_NUMBER_THREADS and PARALLEL_MAKE
> settings are too high for your system. I'd start with them at 8 and 8,
> or 8 and 6 respectively.

I've run a few of the matrix variants (it's going to take a few days to
get a full set). 8 and 16 threads are giving the same results (within a
few seconds) for parallel make values in the range 6 to 12.

I tried a core-image-sato build and it completed in 61m/244m/40m, which
is much closer to your <50m than I thought I would get.

One thing I noticed during the build was that gettext-native seemed
slow. Doing a 'clean' on it and re-baking shows that it takes over 4
minutes to build, with most of the time (2m38) being spent in
'do_configure'. It also seems as if this is on the critical path, as
nothing else was getting scheduled while it was building. There seems to
be a lot of 'nothing' going on during the do_configure phase (i.e. very
little CPU use). Or, to put it another way, 2.5% of the build time is
taken up configuring this package!

> IPK is faster than RPM. This is what I use on most of my builds.

Makes no noticeable difference in my testing so far, but I'll stick with
IPK from now on.

<snip>

> Run the ubuntu server kernel to eliminate some scheduling overhead.
> Reducing the parallel settings mentioned above should help here too.
I'm running 11.x server, as you mentioned this before ;-)

Chris Tapp

opensource@keylevel.com
www.keylevel.com

^ permalink raw reply	[flat|nested] 36+ messages in thread
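Chris's 61m/244m/40m figures can be turned into a rough concurrency number the same way Björn does later in the thread (user CPU time divided by wall-clock time). A small sketch; note it ignores sys time and I/O wait, so it understates total machine load:

```python
def avg_busy_cpus(real_minutes, user_minutes):
    """Average number of CPUs kept busy over a build:
    user CPU time divided by wall-clock time."""
    return user_minutes / real_minutes

# core-image-sato at 61m real / 244m user -> about 4 CPUs busy on average,
# i.e. roughly half of an 8-thread machine idle over the whole build.
print(avg_busy_cpus(61, 244))
```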
* Re: Build time data 2012-04-12 14:34 ` Darren Hart 2012-04-12 22:43 ` Chris Tapp @ 2012-04-13 8:45 ` Richard Purdie 2012-04-19 10:00 ` Koen Kooi 2012-04-19 12:48 ` Joshua Immanuel 2012-04-13 8:47 ` Björn Stenberg 2 siblings, 2 replies; 36+ messages in thread From: Richard Purdie @ 2012-04-13 8:45 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project On Thu, 2012-04-12 at 07:34 -0700, Darren Hart wrote: > > On 04/12/2012 07:08 AM, Björn Stenberg wrote: > > Darren Hart wrote: > >> /dev/md0 /build ext4 > >> noauto,noatime,nodiratime,commit=6000 > > > > A minor detail: 'nodiratime' is a subset of 'noatime', so there is no > > need to specify both. > > Excellent, thanks for the tip. Note the key here is that for a system with large amounts of memory, you can effectively keep the build in memory due to the long commit time. All the tests I've done show we are not IO bound anyway. > > Yet for all the combined horsepower, I am unable to match your time > > of 30 minutes for core-image-minimal. I clock in at around 37 minutes > > for a qemux86-64 build with ipk output: > > > > ------ NOTE: Tasks Summary: Attempted 1363 tasks of which 290 didn't > > need to be rerun and all succeeded. > > > > real 36m32.118s user 214m39.697s sys 108m49.152s ------ > > > > These numbers also show that my build is running less than 9x > > realtime, indicating that 80% of my cores sit idle most of the time. > > Yup, that sounds about right. The build has a linear component to it, > and anything above about 12 just doesn't help. In fact the added > scheduling overhead seems to hurt. > > > This confirms what "ps xf" says during the builds: Only rarely is > > bitbake running more than a handful tasks at once, even with > > BB_NUMBER_THREADS at 64. And many of these tasks are in turn running > > sequential loops on a single core. > > > > I'm hoping to find time soon to look deeper into this issue and > > suggest remedies. 
It's my distinct feeling that we should be able to
> > build significantly faster on powerful machines.
> >
> Reducing the dependency chains that result in the linear component of
> the build (forcing serialized execution) is one place we've focused, and
> could probably still use some attention. CC'ing RP as he's done a lot
> there.

The minimal build is about our worst case single-threaded build as it is
highly dependency ordered. We've already done a lot of work looking at
the "single thread" of core dependencies, and this is, for example, why
we have gettext-minimal-native, which unlocked some of the core path
dependencies. When you look at what we build, there is a reason for most
of it, unfortunately. There are emails from me on the mailing list about
what I looked at and found; I tried to keep a record of it somewhere at
least.

You can get some wins with things like ASSUME_PROVIDED += "git-native".

For something like a sato build you should see more parallelism.

I do also have some small gains in some pending patches:

http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2&id=2023801e25d81e8cffb643eac259c18b9fecda0b
http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2&id=ecf5f5de8368fdcf90c3d38eafc689d6d265514b
http://git.yoctoproject.org/cgit.cgi/poky-contrib/commit/?h=rpurdie/t2&id=2190a51ffac71c9d19305601f8a3a46e467b745a

which look at speeding up do_package, do_package_write_rpm and do_rootfs
(with rpm). These were developed too late for 1.2 and are in some cases
only partially complete, but they show some ways we can squeeze some
extra performance out of the system.

There are undoubtedly ways we can improve performance, but I think we've
done the low-hanging fruit and we need some fresh ideas.

Cheers,

Richard

^ permalink raw reply	[flat|nested] 36+ messages in thread
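Richard's ASSUME_PROVIDED tip goes in conf/local.conf. A minimal fragment might look like this — note that whether the host's git is new enough for the metadata that uses it is an assumption you must verify yourself:

```
# conf/local.conf -- use the host's git instead of building git-native.
# Only safe if the host git is recent enough for the recipes that need it.
ASSUME_PROVIDED += "git-native"
```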
* Re: Build time data 2012-04-13 8:45 ` Richard Purdie @ 2012-04-19 10:00 ` Koen Kooi 2012-04-19 12:48 ` Joshua Immanuel 1 sibling, 0 replies; 36+ messages in thread From: Koen Kooi @ 2012-04-19 10:00 UTC (permalink / raw) To: Yocto Project

On 13 Apr 2012, at 10:45, Richard Purdie wrote:

> On Thu, 2012-04-12 at 07:34 -0700, Darren Hart wrote:
>> On 04/12/2012 07:08 AM, Björn Stenberg wrote:
>>> Darren Hart wrote:
>>>> /dev/md0 /build ext4 noauto,noatime,nodiratime,commit=6000
>>>
>>> A minor detail: 'nodiratime' is a subset of 'noatime', so there is no
>>> need to specify both.
>>
>> Excellent, thanks for the tip.
>
> Note the key here is that for a system with large amounts of memory,
> you can effectively keep the build in memory due to the long commit time.
>
> All the tests I've done show we are not IO bound anyway.

Consider this scenario:

  OS disk on spinning rust (sda1, /)
  BUILDDIR on spinning rust (sdb1, /OE)
  WORKDIR on SSD (sdc1, /OE/build/tmp/work)
  SD card in USB reader (sde1)

When I do the following during a build, all CPUs will enter IO wait and
the build grinds to a halt:

  cd /media ; xz -d -c foo.img.xz | pv -s 3488M > /dev/sde

That only touches the OS disk and the SD card, but for some reason the
3.2.8 kernel stops IO to the OE disks as well. do_patch for my kernel
recipe has been taking more than an hour now; it usually completes in
less than 5 minutes (a few hundred patches applied with a custom
patcher, git-am).

regards,

Koen

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-13 8:45 ` Richard Purdie 2012-04-19 10:00 ` Koen Kooi @ 2012-04-19 12:48 ` Joshua Immanuel 2012-04-19 12:52 ` Richard Purdie 1 sibling, 1 reply; 36+ messages in thread From: Joshua Immanuel @ 2012-04-19 12:48 UTC (permalink / raw) To: Richard Purdie; +Cc: Yocto Project, Darren Hart

Hello,

On Fri, 2012-04-13 at 09:45 +0100, Richard Purdie wrote:
> There are undoubtedly ways we can improve performance but I think
> we've done the low hanging fruit and we need some fresh ideas.

Is there a way to integrate distcc in yocto so that we could distribute
the build across machines?

--
Joshua Immanuel
HiPro IT Solutions Private Limited
http://hipro.co.in

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-19 12:48 ` Joshua Immanuel @ 2012-04-19 12:52 ` Richard Purdie 2012-04-19 13:47 ` Samuel Stirtzel 0 siblings, 1 reply; 36+ messages in thread From: Richard Purdie @ 2012-04-19 12:52 UTC (permalink / raw) To: Joshua Immanuel; +Cc: Yocto Project, Darren Hart

On Thu, 2012-04-19 at 18:18 +0530, Joshua Immanuel wrote:
> Hello,
>
> On Fri, 2012-04-13 at 09:45 +0100, Richard Purdie wrote:
> > There are undoubtedly ways we can improve performance but I think
> > we've done the low hanging fruit and we need some fresh ideas.
>
> Is there a way to integrate distcc in yocto so that we could distribute
> the build across machines.

See icecream.bbclass, but compiling is not the bottleneck; it's
configure, install and packaging...

Cheers,

Richard

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-19 12:52 ` Richard Purdie @ 2012-04-19 13:47 ` Samuel Stirtzel 0 siblings, 0 replies; 36+ messages in thread From: Samuel Stirtzel @ 2012-04-19 13:47 UTC (permalink / raw) To: Richard Purdie; +Cc: Yocto Project, Darren Hart 2012/4/19 Richard Purdie <richard.purdie@linuxfoundation.org>: > On Thu, 2012-04-19 at 18:18 +0530, Joshua Immanuel wrote: >> Hello, >> >> On Fri, 2012-04-13 at 09:45 +0100, Richard Purdie wrote: >> > There are undoubtedly ways we can improve performance but I think >> > we've done the low hanging fruit and we need some fresh ideas. >> >> Is there a way to integrate distcc in yocto so that we could distribute >> the build across machines. > > See icecream.bbclass but compiling is not the bottleneck, its configure, > install and packaging... Multi threaded package managers come to my mind, also multi threaded bzip2 (see [1]) Maybe multi threaded autotools / cmake, but that will be future talk (and a headache for the developers). > > Cheers, > > Richard > > _______________________________________________ > yocto mailing list > yocto@yoctoproject.org > https://lists.yoctoproject.org/listinfo/yocto [1] http://compression.ca/pbzip2/ -- Regards Samuel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-12 14:34 ` Darren Hart 2012-04-12 22:43 ` Chris Tapp 2012-04-13 8:45 ` Richard Purdie @ 2012-04-13 8:47 ` Björn Stenberg 2012-04-13 14:41 ` Darren Hart 2 siblings, 1 reply; 36+ messages in thread From: Björn Stenberg @ 2012-04-13 8:47 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project

Darren Hart wrote:
> One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS
> and PARALLEL_MAKE. I noticed a negative impact if I increased these
> beyond 12 and 14 respectively. I tested this with bb-matrix
> (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but
> can provide useful results and killer 3D surface plots of build time
> with BB and PM on the axis.

Very nice! I ran a batch overnight with permutations of 8,12,16,24,64 cores:

BB PM %e      %S      %U       %P   %c      %w       %R        %F  %M      %x
8  8  2288.96 2611.37 10773.53 584% 810299  18460161 690464859 0   1715456 0
8  12 2198.40 2648.57 10846.28 613% 839750  18559413 690563187 0   1982864 0
8  16 2157.26 2672.79 10943.59 631% 898599  18487946 690761197 0   1715440 0
8  24 2125.15 2916.33 11199.27 664% 800009  18412764 690856116 0   1715440 0
8  64 2189.14 7084.14 12906.95 913% 1491503 18646891 699897733 0   1715440 0
12 8  2277.66 2625.82 10805.21 589% 691752  18596208 690998433 0   1715440 0
12 12 2194.04 2664.01 10934.65 619% 714997  18717017 691199925 0   1715440 0
12 16 2183.95 2736.33 11162.30 636% 1090270 18359128 690559327 0   1715440 0
12 24 2120.46 2907.63 11229.50 666% 829783  18644293 690729638 0   1715312 0
12 64 2171.58 6767.09 12822.86 902% 1524683 18634668 690904549 0   1867456 0
16 8  2294.59 2691.74 10813.69 588% 771621  18637582 686712129 0   1715344 0
16 12 2201.51 2704.54 11017.23 623% 753662  18590533 699231236 0   1715424 0
16 16 2154.54 2692.31 11023.28 636% 809586  18557781 691014487 0   1715440 0
16 24 2130.33 2932.18 11259.09 666% 905669  18531776 691082307 0   2030992 0
16 64 2184.01 6954.71 12922.39 910% 1467774 18800203 701770099 0   1715440 0
24 8  2284.88 2645.88 10854.89 590% 833061  18523938 691067170 0   1715328 0
24 12 2203.72 2696.96 11033.10 623% 931443  18457749 691187723 0   2016368 0
24 16 2176.02 2727.94 11113.33 636% 940044  18420200 690959670 0   1715440 0
24 24 2170.38 2938.80 11643.10 671% 1023328 18641215 686665448 15  1715440 0
24 64 2200.02 7188.60 12902.42 913% 1509158 18924772 690615091 66  1715440 0
64 8  2309.40 2702.33 10952.18 591% 753168  18687309 690927732 10  1867440 0
64 12 2230.80 2765.98 11131.22 622% 875495  18744802 691213524 28  1715216 0
64 16 2182.22 2786.22 11180.86 640% 881328  18724987 691020084 109 1768576 0
64 24 2136.20 3001.36 11238.81 666% 898320  18646384 691239254 46  1715312 0
64 64 2189.73 7154.10 12846.99 913% 1416830 18781801 690890798 41  1715424 0

What it shows is that BB_NUMBER_THREADS makes no difference at all in
this range. As for PARALLEL_MAKE, it shows 24 is better than 16 but 64
is too high, incurring a massive scheduling penalty. I wonder if newer
kernel versions have become more efficient. In hindsight, I should have
included 32 and 48 cores in the test.

Unfortunately I was unable to produce plots with bb-matrix-plot.sh. It
gave me pretty png files, but without any plotted data:

# ../../poky/scripts/contrib/bb-perf/bb-matrix-plot.sh
line 0: Number of grid points must be in [2:1000] - not changed!

Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
Warning: Single isoline (scan) is not enough for a pm3d plot.
Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.

Result: http://imgur.com/mfgWb

--
Björn

^ permalink raw reply	[flat|nested] 36+ messages in thread
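Björn's table can be mined mechanically for the sweet spot. A sketch over a subset of the rows above, using only the elapsed-time column (%e, in seconds):

```python
# Each tuple: (BB_NUMBER_THREADS, PARALLEL_MAKE, elapsed seconds) --
# a subset of Björn's bb-matrix results quoted above.
results = [
    (8, 8, 2288.96), (8, 12, 2198.40), (8, 16, 2157.26),
    (8, 24, 2125.15), (8, 64, 2189.14),
    (12, 8, 2277.66), (12, 12, 2194.04), (12, 16, 2183.95),
    (12, 24, 2120.46), (12, 64, 2171.58),
]

# The fastest combination within this subset.
bb, pm, elapsed = min(results, key=lambda r: r[2])
print(f"fastest: BB={bb} PM={pm} ({elapsed / 60:.1f} min)")
```

On this subset the winner is BB=12, PM=24, consistent with Björn's conclusion that BB barely matters in this range while PM=24 beats both 16 and 64.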
* Re: Build time data 2012-04-13 8:47 ` Björn Stenberg @ 2012-04-13 14:41 ` Darren Hart 2012-04-19 7:24 ` Björn Stenberg 0 siblings, 1 reply; 36+ messages in thread From: Darren Hart @ 2012-04-13 14:41 UTC (permalink / raw) To: Björn Stenberg; +Cc: Yocto Project

On 04/13/2012 01:47 AM, Björn Stenberg wrote:
> Darren Hart wrote:
>> One thing that comes to mind is the parallel settings, BB_NUMBER_THREADS
>> and PARALLEL_MAKE. I noticed a negative impact if I increased these
>> beyond 12 and 14 respectively. I tested this with bb-matrix
>> (scripts/contrib/bb-perf/bb-matrix.sh). The script is a bit fickle, but
>> can provide useful results and killer 3D surface plots of build time
>> with BB and PM on the axis.
>
> Very nice! I ran a batch overnight with permutations of 8,12,16,24,64 cores:
>
> <snip results table>
>
> What it shows is that BB_NUMBER_THREADS makes no difference at all in
> this range. As for PARALLEL_MAKE, it shows 24 is better than 16 but 64
> is too high, incurring a massive scheduling penalty. I wonder if newer
> kernel versions have become more efficient. In hindsight, I should have
> included 32 and 48 cores in the test.
>
> Unfortunately I was unable to produce plots with bb-matrix-plot.sh. It
> gave me pretty png files, but missing any plotted data:

Right, gnuplot likes evenly spaced values of BB and PM. So you could have
done: 8,12,16,24,28,32 (anything above that is going to go down anyway).
Unfortunately, the gaps force the plot to generate spikes at the
interpolated points. I'm open to ideas on how to make it compatible with
arbitrary gaps and avoid the spikes. Perhaps I should rewrite this with
python matplotlib and scipy and use the interpolate module. This is
non-trivial, so not something I'll get to quickly.

> # ../../poky/scripts/contrib/bb-perf/bb-matrix-plot.sh
> line 0: Number of grid points must be in [2:1000] - not changed!
>
> Warning: Single isoline (scan) is not enough for a pm3d plot.
> Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
> Warning: Single isoline (scan) is not enough for a pm3d plot.
> Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
> Warning: Single isoline (scan) is not enough for a pm3d plot.
> Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
> Warning: Single isoline (scan) is not enough for a pm3d plot.
> Hint: Missing blank lines in the data file? See 'help pm3d' and FAQ.
>
> Result: http://imgur.com/mfgWb

--
Darren Hart
Intel Open Source Technology Center
Yocto Project - Linux Kernel

^ permalink raw reply	[flat|nested] 36+ messages in thread
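For what it's worth, the "Single isoline (scan)" warning usually means the data file lacks the blank line gnuplot's pm3d mode expects between scans (groups of rows sharing one BB value). A sketch of regrouping flat rows into that block format — this is illustrative, not the actual bb-matrix-plot.sh logic:

```python
from itertools import groupby

def pm3d_blocks(rows):
    """Format (BB, PM, elapsed) rows the way gnuplot's pm3d expects:
    rows grouped by BB, with a blank line between groups (scans)."""
    blocks = []
    for bb, group in groupby(sorted(rows), key=lambda r: r[0]):
        blocks.append("\n".join(f"{b} {p} {e}" for b, p, e in group))
    return "\n\n".join(blocks) + "\n"

rows = [(8, 8, 2288.96), (8, 12, 2198.40), (12, 8, 2277.66), (12, 12, 2194.04)]
print(pm3d_blocks(rows))
```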
* Re: Build time data 2012-04-13 14:41 ` Darren Hart @ 2012-04-19 7:24 ` Björn Stenberg 2012-04-19 14:11 ` Darren Hart 0 siblings, 1 reply; 36+ messages in thread From: Björn Stenberg @ 2012-04-19 7:24 UTC (permalink / raw) To: Darren Hart; +Cc: Yocto Project Darren Hart wrote: > Right, gnuplot likes evenly spaced values of BB and PM. So you could > have done: 8,12,16,24,28,32 I did that, and uploaded it to the wiki: https://wiki.yoctoproject.org/wiki/Build_Performance#parallelism Looks like 24/32 is the sweet spot for this system, for this build. -- Björn ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-19 7:24 ` Björn Stenberg @ 2012-04-19 14:11 ` Darren Hart 0 siblings, 0 replies; 36+ messages in thread From: Darren Hart @ 2012-04-19 14:11 UTC (permalink / raw) To: Björn Stenberg; +Cc: Yocto Project On 04/19/2012 12:24 AM, Björn Stenberg wrote: > Darren Hart wrote: >> Right, gnuplot likes evenly spaced values of BB and PM. So you could >> have done: 8,12,16,24,28,32 > > I did that, and uploaded it to the wiki: > https://wiki.yoctoproject.org/wiki/Build_Performance#parallelism > > Looks like 24/32 is the sweet spot for this system, for this build. > Fantastic! I'm glad to see a sweet spot above 12x12. I'll have to rerun on my system to see if things have improved for me as well. Thanks for taking the time to summarize the discussion and get it on the wiki! -- Darren Hart Intel Open Source Technology Center Yocto Project - Linux Kernel ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-12 0:30 ` Darren Hart ` (3 preceding siblings ...) 2012-04-12 14:08 ` Björn Stenberg @ 2012-04-13 9:56 ` Tomas Frydrych 2012-04-13 10:23 ` Koen Kooi 4 siblings, 1 reply; 36+ messages in thread From: Tomas Frydrych @ 2012-04-13 9:56 UTC (permalink / raw) To: yocto

On 12/04/12 01:30, Darren Hart wrote:
> Next up is storage.

Indeed. In my experience, by far the biggest limiting factor in builds
is getting IO bound. If you are not running a dedicated build machine,
it is well worth using a dedicated disk for the poky tmp dir; assuming
you have CPU time left, this leaves the machine completely usable for
other things.

> Now RAM, you will want about 2 GB of RAM per core, with a minimum of 4GB.

My experience does not bear this out at all; building Yocto on a 6-core
hyper-threaded desktop machine, I have never seen system memory use get
significantly over the 2GB mark (out of 8GB available) while doing a
Yocto build using 10 cores/threads.

On a custom desktop machine with an i7-x990 3.47GHz, 8GB RAM and quite
conventional hard disks, letting poky use 10 cores/threads (so I can get
my work done while it does its own thing in the background), a fresh
build of core-image-minimal for beagleboard, with debug & profile tools
and test apps, takes 77 minutes. Obviously not anywhere near as fast as
the Intel OTC Xeon beast, but much cheaper hardware, and for my purposes
the build speed is well within a region where it is no longer a
productivity issue.

Tomas

^ permalink raw reply	[flat|nested] 36+ messages in thread
* Re: Build time data 2012-04-13 9:56 ` Tomas Frydrych @ 2012-04-13 10:23 ` Koen Kooi 0 siblings, 0 replies; 36+ messages in thread From: Koen Kooi @ 2012-04-13 10:23 UTC (permalink / raw) To: Tomas Frydrych; +Cc: yocto

On 13 Apr 2012, at 11:56, Tomas Frydrych wrote:

> On 12/04/12 01:30, Darren Hart wrote:
>> Next up is storage.
>
> Indeed. In my experience by far the biggest limiting factor in the
> builds is getting io bound. If you are not running a dedicated build
> machine, it is well worth using a dedicated disk for the poky tmp dir;
> assuming you have cpu time left, this leaves the machine completely
> usable for other things.
>
>> Now RAM, you will want about 2 GB of RAM per core, with a minimum of 4GB.
>
> My experience does not bear this out at all; building Yocto on a 6 core
> hyper threaded desktop machine I have never ever seen the system memory
> use to get significantly over a 2GB mark (out of 8GB available), doing
> Yocto build using 10 cores/threads.

Try building webkit or asio: the linker will use ~1.5GB per object, so
for asio you need PARALLEL_MAKE * 1.5 GB of RAM to avoid swapping to disk.

^ permalink raw reply	[flat|nested] 36+ messages in thread
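Koen's rule of thumb is easy to turn into a sizing check. A sketch using his ~1.5 GB-per-link figure; the worst case assumes every make job hits its link step simultaneously, which is pessimistic for most recipes but matches his webkit/asio observation:

```python
def worst_case_link_ram_gb(parallel_make, gb_per_link=1.5):
    """Worst-case RAM demand if every make job is in a ~1.5 GB link step
    at once (Koen's figure for linker-heavy recipes like webkit/asio)."""
    return parallel_make * gb_per_link

# Tomas's setup: PARALLEL_MAKE of 10 on an 8 GB machine -> ~15 GB worst
# case for a linker-heavy recipe, i.e. swapping to disk.
print(worst_case_link_ram_gb(10))
```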
end of thread, other threads:[~2012-04-19 22:39 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2012-04-11 20:42 Build time data Chris Tapp 2012-04-11 21:19 ` Autif Khan 2012-04-11 21:38 ` Bob Cochran 2012-04-12 0:30 ` Darren Hart 2012-04-12 0:43 ` Osier-mixon, Jeffrey 2012-04-12 4:39 ` Bob Cochran 2012-04-12 7:10 ` Darren Hart 2012-04-12 7:35 ` Joshua Immanuel 2012-04-12 8:00 ` Martin Jansa 2012-04-12 9:36 ` Joshua Immanuel 2012-04-12 14:12 ` Darren Hart 2012-04-12 23:37 ` Flanagan, Elizabeth 2012-04-13 5:51 ` Martin Jansa 2012-04-13 6:08 ` Darren Hart 2012-04-13 6:38 ` Martin Jansa 2012-04-13 7:24 ` Wolfgang Denk 2012-04-17 15:29 ` Martin Jansa 2012-04-12 14:08 ` Björn Stenberg 2012-04-12 14:34 ` Darren Hart 2012-04-12 22:43 ` Chris Tapp 2012-04-12 22:56 ` Darren Hart 2012-04-18 19:41 ` Chris Tapp 2012-04-18 20:27 ` Chris Tapp 2012-04-18 20:55 ` Darren Hart 2012-04-19 22:39 ` Chris Tapp 2012-04-13 8:45 ` Richard Purdie 2012-04-19 10:00 ` Koen Kooi 2012-04-19 12:48 ` Joshua Immanuel 2012-04-19 12:52 ` Richard Purdie 2012-04-19 13:47 ` Samuel Stirtzel 2012-04-13 8:47 ` Björn Stenberg 2012-04-13 14:41 ` Darren Hart 2012-04-19 7:24 ` Björn Stenberg 2012-04-19 14:11 ` Darren Hart 2012-04-13 9:56 ` Tomas Frydrych 2012-04-13 10:23 ` Koen Kooi