* Storing Sstate in S3 success stories?
@ 2019-02-26  1:44 Timothy Froehlich
  2019-02-26  2:18 ` Chuck Wolber
                   ` (2 more replies)
  0 siblings, 3 replies; 6+ messages in thread
From: Timothy Froehlich @ 2019-02-26  1:44 UTC (permalink / raw)
  To: Yocto discussion list

I've spent a bit too long this past week trying to build a reproducible
build infrastructure in AWS. I have very little experience with cloud
infrastructure, and I'm wondering if I'm going in the wrong direction. I'm
attempting to host my sstate_cache as a mirror in a private S3 bucket, and
I believe I have everything configured properly, including exposing the
bucket to http requests, since I can wget files that I've previously
synced up to the bucket. However, if I add SSTATE_MIRRORS to my build,
bitbake slows to a crawl (it's a powerful VM) and barely seems to fetch
anything. The EC2 instance is in the same region as the S3 bucket, roles
have been configured properly to allow access, etc.

I'm not looking for help debugging this, I just want to know whether I'm
right that hosting my sstate in an S3 bucket should work. I've only been
able to find one mention of it being done with no reproduction hints.
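
For readers who have not set up a mirror before, the kind of local.conf
line under discussion can be sketched as follows. This is only an
illustration: the helper name and the bucket URL are hypothetical, and the
exact SSTATE_MIRRORS syntax should be checked against the reference manual
for the Yocto release in use.

```python
def sstate_mirror_line(base_url: str) -> str:
    """Build a local.conf assignment that points SSTATE_MIRRORS at base_url.

    'file://.*' is the pattern matched against sstate object URLs; the
    literal token PATH is substituted by BitBake with the object's
    relative path inside the cache.
    """
    base = base_url.rstrip("/")  # avoid a double slash before PATH
    return f'SSTATE_MIRRORS ?= "file://.* {base}/PATH"'


# Hypothetical bucket endpoint, used only for illustration.
print(sstate_mirror_line("http://example-bucket.s3.amazonaws.com/sstate-cache/"))
```

The printed line is what would be pasted into conf/local.conf; switching
between http and https is then a one-word change in the URL.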


* Re: Storing Sstate in S3 success stories?
  2019-02-26  1:44 Storing Sstate in S3 success stories? Timothy Froehlich
@ 2019-02-26  2:18 ` Chuck Wolber
  2019-02-26  8:19 ` Erik Hoogeveen
  2019-02-26 19:35 ` Brian Walsh
  2 siblings, 0 replies; 6+ messages in thread
From: Chuck Wolber @ 2019-02-26  2:18 UTC (permalink / raw)
  To: Timothy Froehlich; +Cc: Yocto discussion list

Have you done any Wireshark analysis on the traffic? My guess is that the
round trip with network latency is bumping your build time by a factor of
at least 100x. The sstate cache is hammered on continuously, so you have
probably introduced a significant bottleneck.

..Ch:W..

-- 
“I would challenge anyone here to think of a question upon which we once
had a scientific answer, however inadequate, but for which now the best
answer is a religious one."  -Sam Harris


* Re: Storing Sstate in S3 success stories?
  2019-02-26  1:44 Storing Sstate in S3 success stories? Timothy Froehlich
  2019-02-26  2:18 ` Chuck Wolber
@ 2019-02-26  8:19 ` Erik Hoogeveen
  2019-02-26 18:35   ` Timothy Froehlich
  2019-02-26 19:35 ` Brian Walsh
  2 siblings, 1 reply; 6+ messages in thread
From: Erik Hoogeveen @ 2019-02-26  8:19 UTC (permalink / raw)
  To: Yocto discussion list, Timothy Froehlich

Hi Timothy,

The S3 protocol is HTTP(S) based, so the overhead per object is quite significant. This is not much of a problem for large files, but the sstate_cache contains mostly lots of really small files. I think in this case you're better off storing the cache on a secondary EBS volume that you can attach as a regular block device to the EC2 instance. You can switch off the volume's delete-on-termination setting so that it survives EC2 termination.

Since EBS volumes are quite a bit more expensive than S3 buckets, you could instead take snapshots to transfer the state between build runs; then you can destroy the EBS volume when nothing is running.

All the documentation about EBS is here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/AmazonEBS.html
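
The per-object overhead argument can be made concrete with a rough back-of-the-envelope model. The sketch below is an assumption-laden estimate, not a measurement: the function name, the object count, the per-request latency, and the bandwidth figure are all hypothetical, and the model ignores the fact that parallel requests also share bandwidth.

```python
def estimated_fetch_seconds(n_objects, total_bytes, per_request_s,
                            bandwidth_bytes_per_s, parallel=1):
    """Rough time to fetch a cache made of many small objects.

    Total time is modelled as fixed per-request overhead (spread over
    `parallel` concurrent requests) plus raw transfer time.
    """
    request_time = n_objects * per_request_s / parallel
    transfer_time = total_bytes / bandwidth_bytes_per_s
    return request_time + transfer_time


# Hypothetical figures: 10,000 objects, 500 MB in total,
# 50 ms of overhead per request, 100 MB/s of bandwidth.
serial = estimated_fetch_seconds(10_000, 500_000_000, 0.05, 100_000_000)
fanned_out = estimated_fetch_seconds(10_000, 500_000_000, 0.05,
                                     100_000_000, parallel=16)
print(f"serial: {serial:.1f} s, 16 parallel requests: {fanned_out:.1f} s")
```

Under these assumed numbers the request overhead dominates the transfer time by two orders of magnitude, which is the effect being described; real values depend heavily on how many fetches run concurrently.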

Cheers,
Erik


* Re: Storing Sstate in S3 success stories?
  2019-02-26  8:19 ` Erik Hoogeveen
@ 2019-02-26 18:35   ` Timothy Froehlich
  0 siblings, 0 replies; 6+ messages in thread
From: Timothy Froehlich @ 2019-02-26 18:35 UTC (permalink / raw)
  To: Erik Hoogeveen; +Cc: Yocto discussion list

Well, based on the responses above I did some more research, and it didn't
seem like the file sizes should be causing problems on the scale that I was
seeing, so I investigated further. I realized that despite my build/sstate
directory slowly getting larger, bitbake wasn't actually fetching the files
and was leaving empty files behind. I tried changing the https in my
SSTATE_MIRRORS line to just http and it worked perfectly, pulling down half
a gig of sstate before I could even tell it was working. So there was
likely some other misconfiguration in our AWS account that caused the https
to fail (despite being able to wget individual files using https). Thanks
for the responses!
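
A quick way to spot the symptom described here (zero-byte files left in
the local sstate directory after failed fetches) is to walk the directory
and list empty files. This is a minimal sketch; the function name is
hypothetical, and the directory to scan depends on where SSTATE_DIR points
in a given build.

```python
import os


def find_empty_files(root):
    """Return the sorted paths of all zero-byte regular files under root."""
    empty = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # isfile() is False for broken symlinks, so getsize() is safe.
            if os.path.isfile(path) and os.path.getsize(path) == 0:
                empty.append(path)
    return sorted(empty)


# Example usage (the path is an assumption about the build layout):
#   for path in find_empty_files("build/sstate-cache"):
#       print(path)
```

A long list of empty files after a build suggests that fetches are
failing rather than merely being slow.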


-- 
Tim Froehlich
Embedded Linux Engineer
tfroehlich@archsys.io
215-218-8955


* Re: Storing Sstate in S3 success stories?
  2019-02-26  1:44 Storing Sstate in S3 success stories? Timothy Froehlich
  2019-02-26  2:18 ` Chuck Wolber
  2019-02-26  8:19 ` Erik Hoogeveen
@ 2019-02-26 19:35 ` Brian Walsh
  2019-02-27  0:29   ` Timothy Froehlich
  2 siblings, 1 reply; 6+ messages in thread
From: Brian Walsh @ 2019-02-26 19:35 UTC (permalink / raw)
  To: Timothy Froehlich; +Cc: Yocto discussion list


A lot of the files end up with plus signs in the name. This causes
problems with retrieving files through http access with S3. S3
translates all plus signs to spaces, even those in the file path. So
if my-file_v1.0+g1241876 actually exists as named in S3 an http
request for that file will trigger the server to look for
"my-file_v1.0 g1241876"

I ran into this problem trying to host an opkg repository in S3 for upgrading.

It may mostly work for you but there will be many files that it will
never be able to find in your S3 hosted sstate.
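
The plus-sign behaviour described above matches the general
form-encoding convention in which '+' stands for a space. The sketch
below only illustrates that convention with Python's standard library; it
does not show what S3 or the BitBake fetcher actually do, and
percent-encoding the '+' as %2B is mentioned as a common workaround, not
as something confirmed in this thread.

```python
from urllib.parse import quote, unquote_plus

name = "my-file_v1.0+g1241876"

# A server that applies form-style decoding turns '+' into a space,
# so it ends up looking for a different object name.
as_seen_by_server = unquote_plus(name)

# Percent-encoding the '+' keeps the name intact through that decoding.
encoded = quote(name)
round_tripped = unquote_plus(encoded)

print(repr(as_seen_by_server))
print(encoded)
print(round_tripped == name)
```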

Maybe this has been fixed by AWS. I noticed the problem a year or two ago.

https://stackoverflow.com/questions/36734171/how-to-decide-if-the-filename-has-a-plus-sign-in-it#36758133
https://news.ycombinator.com/item?id=15398804



* Re: Storing Sstate in S3 success stories?
  2019-02-26 19:35 ` Brian Walsh
@ 2019-02-27  0:29   ` Timothy Froehlich
  0 siblings, 0 replies; 6+ messages in thread
From: Timothy Froehlich @ 2019-02-27  0:29 UTC (permalink / raw)
  To: Brian Walsh; +Cc: Yocto discussion list

This doesn't seem to be an issue. I have multiple files with plus signs in
their names that made it back down to my local cache without requiring a
rebuild (including the whole Linux kernel).



