* qemu.org server bandwidth report (May 2021)
From: Stefan Hajnoczi @ 2021-05-10  9:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Thomas Huth, danpb, Philippe Mathieu-Daudé

Hi,
A few months ago qemu.org hit network bandwidth limits and incurred
costs for exceeding them. Since then we have implemented several
changes to make continuous integration systems more
bandwidth-efficient and reduce the biggest sources of traffic to
qemu.org.

During the Mar-Apr billing cycle, qemu.org still exceeded its network
bandwidth limit, but only by a small amount. Bandwidth consumption
needs to stay under ~6-7 TB/month. Below are the details of how we're
doing.

Thank you to Paolo Bonzini, Thomas Huth, Philippe Mathieu-Daudé,
Daniel Berrangé, and everyone who helped with bandwidth reduction. The
main change was a move to GitLab.com, which now serves the main QEMU
git repository URLs. We also updated documentation and links to
encourage people to use these new URLs.

qemu.org bandwidth usage has been as follows:
- Jan: 12.56 TB
- Feb: 10.55 TB
- Mar: 10.28 TB
- Apr: 7.62 TB
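Expressed as month-over-month changes, the figures above show where the big drop happened. A minimal Python sketch; the only inputs are the four monthly totals listed:

```python
# Monthly qemu.org bandwidth totals from the list above, in TB.
usage_tb = {"Jan": 12.56, "Feb": 10.55, "Mar": 10.28, "Apr": 7.62}


def month_over_month(usage):
    """Return (month, percent change versus the previous month) pairs."""
    months = list(usage)  # dicts keep insertion order, so Jan..Apr
    return [
        (cur, 100.0 * (usage[cur] - usage[prev]) / usage[prev])
        for prev, cur in zip(months, months[1:])
    ]


for month, pct in month_over_month(usage_tb):
    print(f"{month}: {pct:+.1f}%")

# Overall reduction from January to April.
total_drop = 100.0 * (usage_tb["Jan"] - usage_tb["Apr"]) / usage_tb["Jan"]
print(f"Jan -> Apr: {total_drop:.1f}% lower")
```

This prints -16.0% for Feb, -2.6% for Mar and -25.9% for Apr, i.e. 39.3% lower overall. More than half of the absolute reduction (2.66 of 4.94 TB) happened between March and April.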

So far in May, qemu.org has averaged 232.25 GB/day, putting it on
track for about 7 TB total this month.
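The "on track for 7 TB" estimate is the daily average multiplied by the number of days in May. A small sketch, assuming decimal units (1 TB = 1000 GB) and that the average holds for all 31 days; the 7 TB threshold is an assumption, taken as the upper end of the ~6-7 TB/month budget:

```python
DAILY_AVG_GB = 232.25  # May 2021 daily average reported above
DAYS_IN_MAY = 31
LIMIT_TB = 7.0  # assumption: upper end of the ~6-7 TB/month budget


def projected_month_tb(daily_gb: float, days: int) -> float:
    """Extrapolate a month's total traffic in TB from a daily average in GB."""
    return daily_gb * days / 1000.0  # decimal units: 1 TB = 1000 GB


projected = projected_month_tb(DAILY_AVG_GB, DAYS_IN_MAY)
print(f"projected: {projected:.2f} TB, over budget: {projected > LIMIT_TB}")
```

This prints `projected: 7.20 TB, over budget: True`, so at the current rate May would still land slightly above the top of the budget.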

The top 3 web traffic consumers are Google Cloud and Amazon Web
Services IP addresses. This suggests that some continuous integration
systems are still accessing qemu.org git repositories. It is unlikely
that these are crawlers, because the User-Agent web stats show that
crawlers only consume a few GB whereas the top three hosts consume
tens or hundreds of GB each.
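The thread does not say how the per-host ranking was produced. As a sketch, assuming the web server writes Apache's standard "combined" log format, summing response bytes per client address is enough to find the heaviest hosts (matching addresses to Google Cloud or AWS would additionally need the providers' published IP ranges, which is not shown). The sample log lines are made up and use documentation-only addresses:

```python
import re
from collections import Counter

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def top_hosts(lines, n=3):
    """Sum response bytes per client address and return the n largest."""
    totals = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m["bytes"] != "-":  # "-" means no response body was sent
            totals[m["host"]] += int(m["bytes"])
    return totals.most_common(n)


# Made-up example lines (RFC 5737 documentation addresses).
sample = [
    '203.0.113.7 - - [10/May/2021:09:00:00 +0000] '
    '"POST /git/qemu.git/git-upload-pack HTTP/1.1" 200 50000000 "-" "git/2.31.1"',
    '203.0.113.7 - - [10/May/2021:09:05:00 +0000] '
    '"GET /git/qemu.git/info/refs?service=git-upload-pack HTTP/1.1" 200 4000 "-" "git/2.31.1"',
    '198.51.100.9 - - [10/May/2021:09:10:00 +0000] '
    '"GET /qemu-4.2.0.tar.xz HTTP/1.1" 200 60000000 "-" "Wget/1.20.3 (linux-gnu)"',
]
print(top_hosts(sample))
```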

Roughly 75% of traffic is git (https), 25% is tarball downloads, and
the rest is wiki/web/miscellaneous traffic. Fun fact:
qemu-4.2.0.tar.xz is the most popular download!
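To get a feel for the absolute volumes, the quoted shares can be applied to April's 7.62 TB total. This is rough arithmetic, assuming the shares measured now also held during April and that web server traffic accounts for essentially all of the billed bandwidth:

```python
APRIL_TB = 7.62  # April total from the list above
# Traffic shares quoted above; the wiki/web remainder is too small to list.
SHARES = {"git (https)": 0.75, "tarballs": 0.25}


def volumes(total_tb, shares):
    """Split a monthly total into per-category volumes in TB."""
    return {category: total_tb * share for category, share in shares.items()}


for category, tb in volumes(APRIL_TB, SHARES).items():
    print(f"{category}: ~{tb:.1f} TB")
```

On these rough numbers git accounts for about 5.7 TB and tarballs for about 1.9 TB, so git traffic is by far the biggest lever.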

I will send another update in 2 months so we can see where bandwidth
usage settles. At that point we can decide whether more steps are
necessary.

Thanks,
Stefan



* Re: qemu.org server bandwidth report (May 2021)
From: Daniel P. Berrangé @ 2021-05-10 10:31 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Paolo Bonzini, Thomas Huth, berrange, qemu-devel,
	Philippe Mathieu-Daudé

On Mon, May 10, 2021 at 10:49:19AM +0100, Stefan Hajnoczi wrote:

> qemu.org bandwidth usage has been as follows:
> - Jan: 12.56 TB
> - Feb: 10.55 TB
> - Mar: 10.28 TB
> - Apr: 7.62 TB
> 
> In May qemu.org has averaged 232.25 GB/day so far putting it on track
> for 7 TB total this month.

That decrease seems to show that moving to GitLab has had a big
effect. It's not big enough yet, though.

> Roughly 75% of traffic is git (https), 25% is tarball downloads, and
> the rest is wiki/web/miscellaneous traffic. Fun fact:
> qemu-4.2.0.tar.xz is the most popular download!

First, git traffic...

When you say "git (https)", do you exclusively mean access to git
via https:// protocol URIs, or does that include git:// URIs
too?

Or is git:// URI traffic not accounted for at all in your 75%/25%
split?

For the https:// URIs, should we set up an HTTP redirect?

When git clones via https, it fetches some specific paths, which
I believe we have rules for in the httpd conf:

  ScriptAliasMatch "^/git/(.*\.git/(HEAD|info/refs))$" \
    /usr/libexec/git-core/git-http-backend/$1
  ScriptAliasMatch "^/git/(.*\.git/git-(upload|receive)-pack)$" \
    /usr/libexec/git-core/git-http-backend/$1

If we set those URI path matches to send an HTTP 307 redirect
to GitLab, that would essentially kill off our git traffic on
qemu.org, while still allowing the qemu.org gitweb UI to
work normally. The downside is that people won't notice that
they need to update their clone URIs. Still, it feels like an easy
win, and since 307 is a temporary redirect code, we can easily
remove the redirect later.
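The proposed mapping can be sketched outside Apache to make the behaviour concrete. The Python below reuses the two path patterns from the ScriptAliasMatch rules above; the GitLab base URL and the assumption that every qemu.org repository has a same-named counterpart under it are illustrative, not taken from the server config:

```python
import re
from typing import Optional, Tuple

# The two path patterns from the ScriptAliasMatch rules above.
GIT_HTTP_PATTERNS = [
    re.compile(r"^/git/(.*\.git/(HEAD|info/refs))$"),
    re.compile(r"^/git/(.*\.git/git-(upload|receive)-pack)$"),
]

# Assumption for illustration: each qemu.org repo has a same-named
# counterpart under this GitLab group.
GITLAB_BASE = "https://gitlab.com/qemu-project/"


def redirect_for(path: str, query: str = "") -> Optional[Tuple[int, str]]:
    """Return (307, Location) for git smart-HTTP requests, or None for
    anything else (e.g. gitweb pages), which keeps being served locally."""
    for pattern in GIT_HTTP_PATTERNS:
        m = pattern.match(path)
        if m:
            location = GITLAB_BASE + m.group(1)
            if query:  # keep e.g. "service=git-upload-pack"
                location += "?" + query
            return 307, location
    return None


print(redirect_for("/git/qemu.git/info/refs", "service=git-upload-pack"))
print(redirect_for("/gitweb.cgi"))
```

307 fits here because it is temporary and tells clients to repeat the request with the same method and body, which matters for the POST to git-upload-pack. Also, git's documented default for `http.followRedirects` is `initial`: it follows a redirect on the first request to a remote and then uses the redirected URL as the base for follow-up requests, so after the initial info/refs request the client talks to GitLab directly.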



Second, tarball traffic...

The qemu-5.2.0.tar.xz file is 102 MB in size.

This is quite ridiculous and is directly caused by the number of
binary blobs we're bundling and the corresponding need to bundle
their source.

In fact, 66% of this is EDK2's fault: just removing the EDK2
ROMs and source code drops it to 38 MB.

Deleting capstone, dtc, slirp and meson saves 2 MB compressed.

Deleting all remaining contents of roms/ gets us down to 14 MB.

Of course we likely want to provide ROMs as a convenience to
users who are not distro vendors, but we could perhaps do that
in a more flexible way.

Even users who want the ROMs likely don't need all of them.
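Breakdowns like the ones above can be approximated by grouping a release tarball's contents by directory. A sketch using Python's standard tarfile module; note that it sums uncompressed file sizes, so it ranks directories by raw size and will not reproduce the compressed savings quoted above (binary ROM images and source code compress quite differently):

```python
import tarfile
from collections import Counter


def size_by_subdir(tarball_path, depth=2, n=10):
    """Sum uncompressed sizes of regular files in a tarball, grouped by the
    first `depth` path components (e.g. "qemu-5.2.0/roms")."""
    totals = Counter()
    with tarfile.open(tarball_path, "r:*") as tar:  # "r:*" detects xz/gz/bz2
        for member in tar:
            if member.isfile():
                key = "/".join(member.name.split("/")[:depth])
                totals[key] += member.size
    return totals.most_common(n)
```

For example, `size_by_subdir("qemu-5.2.0.tar.xz")` would list the ten largest top-level subdirectories of that release, and `depth=3` would split roms/ into its individual firmware trees.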


Third, qemu 4.2.0...

I wonder why this is the most popular. Something must be linking
to this, as you would otherwise have to go out of your way to
search it out.

Do we have any stats on the referrer URLs?

I wonder if some key page(s) need updating.

If we're unlucky there might be some CI system that hardcoded
use of qemu 4.2.0 that's frequently pulling it.

> I will send another update in 2 months so we can see where bandwidth
> usage finally settled. At that point we can decide whether more steps
> are necessary.




Regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|




* Re: qemu.org server bandwidth report (May 2021)
From: Stefan Hajnoczi @ 2021-05-10 13:40 UTC (permalink / raw)
  To: Daniel P. Berrangé
  Cc: Paolo Bonzini, Thomas Huth, qemu-devel, Philippe Mathieu-Daudé

On Mon, May 10, 2021 at 11:31 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>
> On Mon, May 10, 2021 at 10:49:19AM +0100, Stefan Hajnoczi wrote:
>
> > qemu.org bandwidth usage has been as follows:
> > - Jan: 12.56 TB
> > - Feb: 10.55 TB
> > - Mar: 10.28 TB
> > - Apr: 7.62 TB
> >
> > In May qemu.org has averaged 232.25 GB/day so far putting it on track
> > for 7 TB total this month.
>
> That decrease seems to show we've had a big effect from moving to
> gitlab. Not big enough yet though.
>
> > Roughly 75% of traffic is git (https), 25% is tarball downloads, and
> > the rest is wiki/web/miscellaneous traffic. Fun fact:
> > qemu-4.2.0.tar.xz is the most popular download!
>
> First git traffic...
>
> When you say  "git (https)" are you exclusively meaning access of
> git via https:// protocol URIs, or does that include git:// URIs
> too ?

This includes git-http-backend(1) only. I think gitweb traffic is separate.

>
> Or are git:/// URI traffic not accounted for at all in your 75%/25%
> split there ?

git-daemon is not included in the stats because these are web server
stats only. Based on the network bandwidth fees that QEMU has been
paying, I do know that git-daemon traffic is much smaller than
git-http-backend traffic.

>
> For the https:// URIs should we setup a HTTP redirect ?
>
> When git clones via https it fetches some specific paths which
> I believe we have rules for in httpd conf:
>
>   ScriptAliasMatch "^/git/(.*\.git/(HEAD|info/refs))$" \
>     /usr/libexec/git-core/git-http-backend/$1
>   ScriptAliasMatch "^/git/(.*\.git/git-(upload|receive)-pack)$" \
>     /usr/libexec/git-core/git-http-backend/$1
>
> If we set those URI path matches to send a HTTP 307 redirect
> to gitlab, that would essentially kill off our git traffic on
> qemu.org, while still allowing the qemu.org gitweb UI to
> work normally. The downside is that people won't notice to
> update their clone URIs. Still feels like an easy win and
> we can easily remove the redirect if we use code 307.

I remember there were concerns about warning messages that
git-clone(1) prints when an HTTP redirect is encountered? If everyone
is okay I can turn the git-http-backend(1) aliases into HTTP 307
redirects to GitLab.

> Third, qemu 4.2.0....
>
> I wonder why this is the most popular. Something must be linking
> to this, as you would otherwise have to go out of your way to
> search it out.
>
> Do we have any stats on the referrer URLs ?
>
> I wonder if there's some key page(s) that need updating ?
>
> If we're unlucky there might be some CI system that hardcoded
> use of qemu 4.2.0 that's frequently pulling it.

The majority of qemu-4.2.0.tar.xz downloads have the wget user agent
and no referrer. The IP addresses don't have a clear pattern (there
are many).
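Tallies like these (user agent, referrer, number of distinct client addresses) can be pulled from the access log for a single file. A sketch, again assuming Apache's "combined" log format, which the thread does not actually state:

```python
import re
from collections import Counter

# Apache "combined" format (assumed), with the request line split into
# method and path.
DOWNLOAD_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def download_sources(lines, filename):
    """Count user-agent products and referrers for GET requests of `filename`,
    plus the number of distinct client addresses."""
    agents, referers, hosts = Counter(), Counter(), set()
    for line in lines:
        m = DOWNLOAD_LINE.match(line)
        if not m or m["method"] != "GET" or not m["path"].endswith("/" + filename):
            continue
        agents[m["agent"].split("/")[0]] += 1  # "Wget/1.20.3 (...)" -> "Wget"
        referers[m["referer"]] += 1  # "-" means no Referer header was sent
        hosts.add(m["host"])
    return agents, referers, len(hosts)
```

Passing an open log file, e.g. `download_sources(log, "qemu-4.2.0.tar.xz")`, gives the user-agent and referrer counts plus how many distinct addresses fetched the file.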

Stefan



* Re: qemu.org server bandwidth report (May 2021)
From: Daniel P. Berrangé @ 2021-05-10 13:46 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Paolo Bonzini, Thomas Huth, qemu-devel, Philippe Mathieu-Daudé

On Mon, May 10, 2021 at 02:40:16PM +0100, Stefan Hajnoczi wrote:
> On Mon, May 10, 2021 at 11:31 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >
> > For the https:// URIs should we setup a HTTP redirect ?
> >
> > When git clones via https it fetches some specific paths which
> > I believe we have rules for in httpd conf:
> >
> >   ScriptAliasMatch "^/git/(.*\.git/(HEAD|info/refs))$" \
> >     /usr/libexec/git-core/git-http-backend/$1
> >   ScriptAliasMatch "^/git/(.*\.git/git-(upload|receive)-pack)$" \
> >     /usr/libexec/git-core/git-http-backend/$1
> >
> > If we set those URI path matches to send a HTTP 307 redirect
> > to gitlab, that would essentially kill off our git traffic on
> > qemu.org, while still allowing the qemu.org gitweb UI to
> > work normally. The downside is that people won't notice to
> > update their clone URIs. Still feels like an easy win and
> > we can easily remove the redirect if we use code 307.
> 
> I remember there were concerns about warning messages that
> git-clone(1) prints when an HTTP redirect is encountered? If everyone
> is okay I can turn the git-http-backend(1) aliases into HTTP 307
> redirects to GitLab.

I presume that'll be the case with git fetch/pull too, and anything
else which talks to the server.

Nonetheless, if git prints a warning message when it gets a redirect,
I'd say that is probably a desirable feature, as it'll make it more
likely that people will fix their URIs to point directly at GitLab.

Regards,
Daniel

* Re: qemu.org server bandwidth report (May 2021)
From: Alex Bennée @ 2021-05-10 15:47 UTC (permalink / raw)
  To: Stefan Hajnoczi
  Cc: Paolo Bonzini, Thomas Huth, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	qemu-devel


Stefan Hajnoczi <stefanha@gmail.com> writes:

> On Mon, May 10, 2021 at 11:31 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
>>
>> On Mon, May 10, 2021 at 10:49:19AM +0100, Stefan Hajnoczi wrote:
>>
>> > qemu.org bandwidth usage has been as follows:
>> > - Jan: 12.56 TB
>> > - Feb: 10.55 TB
>> > - Mar: 10.28 TB
>> > - Apr: 7.62 TB
>> >
>> > In May qemu.org has averaged 232.25 GB/day so far putting it on track
>> > for 7 TB total this month.
<snip>
>>
>> For the https:// URIs should we setup a HTTP redirect ?
>>
>> When git clones via https it fetches some specific paths which
>> I believe we have rules for in httpd conf:
>>
>>   ScriptAliasMatch "^/git/(.*\.git/(HEAD|info/refs))$" \
>>     /usr/libexec/git-core/git-http-backend/$1
>>   ScriptAliasMatch "^/git/(.*\.git/git-(upload|receive)-pack)$" \
>>     /usr/libexec/git-core/git-http-backend/$1
>>
>> If we set those URI path matches to send a HTTP 307 redirect
>> to gitlab, that would essentially kill off our git traffic on
>> qemu.org, while still allowing the qemu.org gitweb UI to
>> work normally. The downside is that people won't notice to
>> update their clone URIs. Still feels like an easy win and
>> we can easily remove the redirect if we use code 307.
>
> I remember there were concerns about warning messages that
> git-clone(1) prints when an HTTP redirect is encountered? If everyone
> is okay I can turn the git-http-backend(1) aliases into HTTP 307
> redirects to GitLab.
>
>> Third, qemu 4.2.0....
>>
>> I wonder why this is the most popular. Something must be linking
>> to this, as you would otherwise have to go out of your way to
>> search it out.
>>
>> Do we have any stats on the referrer URLs ?
>>
>> I wonder if there's some key page(s) that need updating ?
>>
>> If we're unlucky there might be some CI system that hardcoded
>> use of qemu 4.2.0 that's frequently pulling it.
>
> The majority of qemu-4.2.0.tar.xz downloads have the wget user agent
> and no referrer. The IP addresses don't have a clear pattern (there
> are many).

I've just checked my Gentoo box and I can see it pulls directly from:

  SRC_URI="https://download.qemu.org/${P}.tar.xz"

and the *9999* builds (HEAD, which I doubt many people use) point to:

  EGIT_REPO_URI="https://git.qemu.org/git/qemu.git"

but the lowest version is 5.2.0 and 6.0.0 is already in the repo, so
these particular users are probably a minority.

However, Google does point to a number of instructions online that
have wget and "qemu-4.2.0.tar.xz" in them.

>
> Stefan


-- 
Alex Bennée



* Re: qemu.org server bandwidth report (May 2021)
From: Stefan Hajnoczi @ 2021-05-10 16:05 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Paolo Bonzini, Thomas Huth, Daniel P. Berrangé,
	Philippe Mathieu-Daudé,
	qemu-devel

On Mon, May 10, 2021 at 4:53 PM Alex Bennée <alex.bennee@linaro.org> wrote:
>
>
> Stefan Hajnoczi <stefanha@gmail.com> writes:
>
> > On Mon, May 10, 2021 at 11:31 AM Daniel P. Berrangé <berrange@redhat.com> wrote:
> >>
> >> On Mon, May 10, 2021 at 10:49:19AM +0100, Stefan Hajnoczi wrote:
> >>
> >> > qemu.org bandwidth usage has been as follows:
> >> > - Jan: 12.56 TB
> >> > - Feb: 10.55 TB
> >> > - Mar: 10.28 TB
> >> > - Apr: 7.62 TB
> >> >
> >> > In May qemu.org has averaged 232.25 GB/day so far putting it on track
> >> > for 7 TB total this month.
> <snip>
> >>
> >> For the https:// URIs should we setup a HTTP redirect ?
> >>
> >> When git clones via https it fetches some specific paths which
> >> I believe we have rules for in httpd conf:
> >>
> >>   ScriptAliasMatch "^/git/(.*\.git/(HEAD|info/refs))$" \
> >>     /usr/libexec/git-core/git-http-backend/$1
> >>   ScriptAliasMatch "^/git/(.*\.git/git-(upload|receive)-pack)$" \
> >>     /usr/libexec/git-core/git-http-backend/$1
> >>
> >> If we set those URI path matches to send a HTTP 307 redirect
> >> to gitlab, that would essentially kill off our git traffic on
> >> qemu.org, while still allowing the qemu.org gitweb UI to
> >> work normally. The downside is that people won't notice to
> >> update their clone URIs. Still feels like an easy win and
> >> we can easily remove the redirect if we use code 307.
> >
> > I remember there were concerns about warning messages that
> > git-clone(1) prints when an HTTP redirect is encountered? If everyone
> > is okay I can turn the git-http-backend(1) aliases into HTTP 307
> > redirects to GitLab.
> >
> >> Third, qemu 4.2.0....
> >>
> >> I wonder why this is the most popular. Something must be linking
> >> to this, as you would otherwise have to go out of your way to
> >> search it out.
> >>
> >> Do we have any stats on the referrer URLs ?
> >>
> >> I wonder if there's some key page(s) that need updating ?
> >>
> >> If we're unlucky there might be some CI system that hardcoded
> >> use of qemu 4.2.0 that's frequently pulling it.
> >
> > The majority of qemu-4.2.0.tar.xz downloads have the wget user agent
> > and no referrer. The IP addresses don't have a clear pattern (there
> > are many).
>
> I've just checked my Gentoo box and I can see it pulls directly from:
>
>   SRC_URI="https://download.qemu.org/${P}.tar.xz"
>
> and the *9999* builds (HEAD, which I doubt many people use) points to:
>
>   EGIT_REPO_URI="https://git.qemu.org/git/qemu.git"
>
> but the lowest version is 5.2.0 and 6.0.0 is already in the repo so
> these particular users probably are a minority.
>
> However Google does point to a number of instructions online that have
> wget and "qemu-4.2.0.tar.xz" in them.

Thank you for checking this! Are you in touch with the maintainers or
able to tweak the ebuilds or documentation?

Thanks,
Stefan

