git.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Jakub Narebski <jnareb@gmail.com>
Cc: Jeff Garzik <jeff@garzik.org>,
	Martin Langhoff <martin.langhoff@gmail.com>,
	Git Mailing List <git@vger.kernel.org>,
	"H. Peter Anvin" <hpa@zytor.com>,
	Rogan Dawes <discard@dawes.za.net>,
	Kernel Org Admin <ftpadmin@kernel.org>
Subject: Re: kernel.org mirroring (Re: [GIT PULL] MMC update)
Date: Sun, 10 Dec 2006 11:50:15 -0800 (PST)
Message-ID: <Pine.LNX.4.64.0612101129190.12500@woody.osdl.org>
In-Reply-To: <200612102011.52589.jnareb@gmail.com>



On Sun, 10 Dec 2006, Jakub Narebski wrote:
> >> If-Modified-Since:, If-Match:, If-None-Match: do you?
> 
> And in the CGI standard there is a way to access additional HTTP header
> info from a CGI script: the environment variables are named HTTP_<HEADER>,
> for example if the browser sent an If-Modified-Since: header, its value
> can be found in the HTTP_IF_MODIFIED_SINCE environment variable.

Guys, you're missing something fairly fundamental.

It helps almost _nothing_ to support client-side caching with all these 
fancy "If-Modified-Since:" etc crap.

That's not the _problem_.

It's usually not one client asking for the gitweb pages: the load comes 
from just lots of people independently asking for it. So client-side 
caching may help a tiny tiny bit, but it's not actually fixing the 
fundamental problem at all.

So forget about "If-Modified-Since:" etc. It may help in benchmarks when 
you try it yourself, and use "refresh" on the client side. But the basic 
problem is all about lots of clients that do NOT have things cached, 
because the client caches are all filled up with pr0n, not with gitweb 
data from yesterday.

So the thing to help is server-side caching with good access patterns, so 
that the server won't have to seek all over the disk when clients that 
_don't_ have things in their caches want to see the "git projects" summary 
overview (that currently lists something like 200+ projects).

So to get that list of 200+ projects, right now gitweb will literally walk 
them all, look at their refs, their descriptions, their ages (which 
requires looking up the refs, and the objects behind the refs), and if 
they aren't cached, you're going to have several disk seeks for each 
project.

At 200+ projects, the thing that makes it slow is those disk seeks. Even 
with a fast disk and RAID array, the seeks are all basically going to be 
interdependent, so there's no room for disk arm movement optimization, and 
in the absence of any other load it's still going to be several seconds 
just for the seeks (say 10ms per seek and four or five seeks per project: 
that's 8 to 10 seconds _just_ for the seeks to generate the top-level 
summary page, and quite frankly, five seeks is probably optimistic).
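
To make the arithmetic above concrete, here is a back-of-the-envelope 
sketch. The function name and its parameters are made up for 
illustration; the numbers are the rough guesses from the paragraph above, 
not measurements.

```python
def seek_time_seconds(projects, seeks_per_project, seek_ms):
    """Estimate time spent purely on dependent disk seeks.

    The seeks are assumed to be serialized (each one depends on the
    result of the previous one), so the costs simply add up.
    """
    return projects * seeks_per_project * seek_ms / 1000.0


# 200 projects, five seeks each, 10 ms per seek
print(seek_time_seconds(200, 5, 10))  # 10.0
```

With four seeks per project the same estimate comes out at 8 seconds, 
and anything else hitting the disk only makes it worse.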

Now, hopefully some of it will be in the disk cache, but when the 
mirroring happens, it will basically blow the disk caches away totally 
(when using the "--checksum" option), and then you literally have tens of 
seconds to generate that one top-level page. 

And when mirroring is blowing out the disk caches, the thing will be doing 
other things _too_ to the disk, of course.

So what you want is server-side caching, and you basically _never_ want to 
re-generate that data synchronously (because even if the server can take 
the load, having the clients wait for half a minute or more for the data 
is just NOT FRIENDLY). This is why I suggested the grace-period where we 
fill the cache on the server side in the background _while_at_the_same_time 
actually feeding the clients the old cached contents.
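
A minimal sketch of that grace-period idea, assuming a single 
long-running server process. This is not gitweb code: the class name 
GracePeriodCache, the generate callback and the max_age_s parameter are 
all hypothetical, and the five-minute default is just an example value. 
Once a page is cached, get() always returns immediately; if the cached 
copy is too old, it kicks off one background regeneration and still 
hands out the old copy.

```python
import threading
import time


class GracePeriodCache:
    """Serve the cached page; regenerate in the background once stale."""

    def __init__(self, generate, max_age_s=300.0, clock=time.monotonic):
        self._generate = generate      # slow function returning the page
        self._max_age_s = max_age_s    # age after which a refresh starts
        self._clock = clock            # injectable for testing
        self._lock = threading.Lock()
        self._page = None
        self._stamp = None
        self._refresher = None         # background thread, if one is running

    def _refresh(self):
        page = self._generate()        # the slow part runs outside the lock
        with self._lock:
            self._page = page
            self._stamp = self._clock()
            self._refresher = None

    def get(self):
        with self._lock:
            page = self._page
            if page is not None:
                stale = self._clock() - self._stamp >= self._max_age_s
                if stale and self._refresher is None:
                    # Start at most one refresh; keep serving the old page.
                    self._refresher = threading.Thread(
                        target=self._refresh, daemon=True)
                    self._refresher.start()
                return page
        # Cold cache: nothing to serve yet, so this caller has to wait.
        self._refresh()
        with self._lock:
            return self._page
```

The only caller that ever waits is the one that hits a completely cold 
cache (and this sketch does not deduplicate concurrent cold-start 
requests). A CGI setup that forks per request would need the cache to 
live in a file or a separate daemon instead of in process memory.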

Because what matters most to _clients_ is not getting the most recent 
up-to-date data within the last few minutes - people who go to the 
overview page want to just get a list of projects, and they want to get 
them in a second or two, not half a minute later.

And btw, all those "If-Modified-Since:" things are irrelevant, since quite 
often, the top-level page really technically _has_ been modified in the 
last few minutes, because with the kernel and git projects, _somebody_ has 
usually pushed out one of the projects within the last hour.

And no, people don't just sit there refreshing their browser page all the 
time. I bet even "active" git users do it at most once or twice a day, 
which means that their client cache will _never_ be up-to-date.

But if you do it with server-side caches and grace-periods, you can 
generally say "we have something that is at most five minutes old", and 
most importantly, you can hopefully do it without a lot of disk seeks 
(because you just cache the _one_ page as _one_ object), so hopefully you 
can do it in a few hundred ms even if the thing is on disk and even if 
there's a lot of other load going on.
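
One way to keep "_one_ page as _one_ object" is to store the rendered 
page as a single file and swap it in atomically, so that serving it is 
one open and one sequential read. A rough sketch, with hypothetical 
helper names store_page and load_page and a cache path chosen by the 
caller:

```python
import os
import tempfile


def store_page(cache_path, html):
    """Write the rendered page to a temp file, then rename it into place.

    os.replace() is atomic when source and destination are on the same
    filesystem, so readers see either the old page or the new one,
    never a half-written file.
    """
    directory = os.path.dirname(os.path.abspath(cache_path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(html)
        os.replace(tmp_path, cache_path)
    except BaseException:
        # Clean up the temp file if anything went wrong before the rename.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise


def load_page(cache_path):
    """Read the whole cached page back in one go."""
    with open(cache_path, encoding="utf-8") as f:
        return f.read()
```

The background regeneration would call store_page() when it is done, and 
every client request would just call load_page().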

I bet the top-level "all projects" summary page and the individual 
project summary pages are the important things to cache. That's probably 
what most people look at, and they are the ones that have lots of 
server-side cache locality. Individual commits and diffs probably don't 
get the same kind of "lots of people looking at them" and thus don't get 
the same kind of benefit from caching.

(Individual commits hopefully also need fewer disk seeks, at least with 
packed repositories. So even if you have to re-generate them from scratch, 
they won't have the seek times themselves taking up tens of seconds, 
unless the project is entirely unpacked and diffing just generates total 
disk seek hell)

