All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC PATCHv4 00/17] gitweb: Simple file based output caching
@ 2010-06-14 16:08 Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 01/17] gitweb: Return or exit after done serving request Jakub Narebski
                   ` (19 more replies)
  0 siblings, 20 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

This 17 patches long patch series is intended (yet again) as preliminary
version for splitting large 'gitweb: File based caching layer (from
git.kernel.org)' mega-patch by John 'Warthog9' Hawley aka J.H., by starting
small and adding features piece by piece.

This is fourth version (fourth release) of this series, and is available on
http://repo.or.cz/w/git/jnareb-git.git as 'gitweb/cache-kernel-v4' branch.
Earlier versions are available there as branches 'gitweb/cache-kernel',
'gitweb/cache-kernel-v2' and 'gitweb/cache-kernel-v3'.  Previous version of
this series was sent to git mailing list as:

* [RFC PATCHv3 00/10] gitweb: Simple file based output caching
  Message-Id: <1266349005-15393-1-git-send-email-jnareb@gmail.com>
  http://thread.gmane.org/gmane.comp.version-control.git/140138

This version is based on top of c0cb4ed (git-instaweb: Configure it to work
with new gitweb structure, 2010-05-28), which was merged as 'ps/gitweb-soc'
into next in 5db4adf, on 2010-06-03.  Next version would be rebased on top
of 'next', which would include "sub run" code in 'jn/gitweb-fastcgi', which
would require resolving conflict with this series.

This series of patches does not include comments on individual patches,
comparing them to previous version, and to original series by J.H.  The
next release of this series, hopefully not as RFC, would include them.


The MAIN DIFFERENCE from previous version is that, inspired by splitting
gitweb part of "Splitting gitweb and developing write functionalities
(Integrated web client for git)" Google Summer of Code 2010 project by
Pavan Kumar Sankara, the gitweb output caching code now resides in separate
modules under 'lib/' subdirectory, instead of being all in 'cache.pm' file.

The 'gitweb: href(..., -path_info => 0|1)' which began v3 series is already
merged in, together with other patches from 'jn/gitweb-caching-prep'.

The following changes are available in the git repository at:

  git://repo.or.cz/git/jnareb-git.git gitweb/cache-kernel-v4


All comments welcome!

Shortlog:
~~~~~~~~~
 1. gitweb: Return or exit after done serving request

    This patch was generated in response to noticing the following kind of
    errors in /var/log/httpd/error_log:

      [Sun Jun 13 11:58:02 2010] gitweb.cgi: Subroutine git_atom redefined
      at /var/www/perl/gitweb/gitweb.cgi line 6804.

    when running gitweb under mod_perl (with ModPerl::Registry handler).
    After this patch I didn't see such errors... but I failed to reproduce
    this error without this patch.

    This patch was sent separately to git mailing list as
      http://thread.gmane.org/gmane.comp.version-control.git/149040
    but should probably be skipped.

 2. gitweb: Fix typo in hash key name in %opts in git_header_html

    This patch was sent separately to git mailing list and is already
    present in 'master' as ad709ea (gitweb: Fix typo in hash key name in
    %opts in git_header_html, 2010-06-13).

 3. gitweb: Prepare for splitting gitweb

    IMPORTANT!!!

    This patch is the main reason this series is CC-ed to Pavan Kumar
    Sankara (student) and Christian Couder (mentor) of "Integrated web
    client for git" GSoC2010 project, which includes splitting gitweb.

    As you can see later in series it makes it very easy to add new gitweb
    modules which are to be installed.  Just add

      GITWEB_MODULES += lib/Module/Foo.pm

    to gitweb/Makefile for Module::Foo.

 4. gitweb/lib - Very simple file based cache
 5. gitweb/lib - Stat-based cache expiration

    The major change is that the code is now in GitwebCache::SimpleFileCache
    module in gitweb/lib/GitwebCache/SimpleFileCache.pm file, instead of
    being in gitweb/cache.pm file.

    Other change is removing using File::Spec to manipulate pathnames,
    using faster concatenation instead.  Using File::Spec do not bring us
    extra portability, as gitweb itself assumes '/' as path separator.
    It should be tiny little bit faster without File::Spec::catfile etc.

 6. gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/)

    This is new patch, not present in previous series.  Note that you have
    to run benchmark manually; it is not run by t9503 test.

    The t/t9503/benchmark_caching_interface.pl includes example benchmark
    results at the end of the file.

 7. gitweb/lib - Simple select(FH) based output capture

    The major change is that the code is now in GitwebCache::Capture and
    GitwebCache::Capture::SelectFH modules which are in *separate* files
    (one file per module), instead of being all in gitweb/cache.pm

 8. gitweb/lib - Alternate ways of capturing output

    This is new patch, not present in previous series.  It includes new
    test (not run by t9503) and new benchmark.

 9. gitweb/lib - Cache captured output (using get/set)

    The major change is that the code is now in GitwebCache::CacheOutput
    module in gitweb/lib/GitwebCache/CacheOutput.pm file.

10. gitweb: Add optional output caching

    "use lib __DIR__.'/lib';" is now added by earlier patch in series.
    It uses new infrastructure instead of 'bag of modules' cache.pm file.

11. gitweb/lib - Adaptive cache expiration time
12. gitweb/lib - Use CHI compatible (compute method) caching interface

    Similar to what was in previous version of this series.

13. gitweb/lib - Use locking to avoid 'cache miss stampede' problem
14. gitweb/lib - Serve stale data when waiting for filling cache
15. gitweb/lib - Regenerate (refresh) cache in background


    The major change is that the code is now in new (not present in
    previous version) GitwebCache::FileCacheWithLocking module (in its own
    file), which is derived from GitwebCache::SimpleFileCache.  This allow
    to choose whether to use locking (and other features that require
    locking) or not.

16. gitweb: Show appropriate "Generating..." page when regenerating cache
17. gitweb: Add startup delay to activity indicator for cache

    No major changes, I think (besides using new infrastructure).

 gitweb/Makefile                                |   10 +
 gitweb/README                                  |   62 ++++
 gitweb/gitweb.perl                             |  333 +++++++++++++++++--
 gitweb/lib/GitwebCache/CacheOutput.pm          |   93 ++++++
 gitweb/lib/GitwebCache/Capture.pm              |   66 ++++
 gitweb/lib/GitwebCache/Capture/PerlIO.pm       |   79 +++++
 gitweb/lib/GitwebCache/Capture/SelectFH.pm     |   82 +++++
 gitweb/lib/GitwebCache/Capture/TiedCapture.pm  |  149 +++++++++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |  264 +++++++++++++++
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |  417 ++++++++++++++++++++++++
 t/t9500-gitweb-standalone-no-errors.sh         |   19 +
 t/t9503-gitweb-caching.sh                      |   38 +++
 t/t9503/benchmark_caching_interface.pl         |  132 ++++++++
 t/t9503/benchmark_capture_implementations.pl   |  198 +++++++++++
 t/t9503/test_cache_interface.pl                |  409 +++++++++++++++++++++++
 t/t9503/test_cache_output.pl                   |   66 ++++
 t/t9503/test_capture_interface.pl              |   76 +++++
 17 files changed, 2468 insertions(+), 25 deletions(-)

 create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/PerlIO.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/SelectFH.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/TiedCapture.pm
 create mode 100644 gitweb/lib/GitwebCache/FileCacheWithLocking.pm
 create mode 100644 gitweb/lib/GitwebCache/SimpleFileCache.pm
 create mode 100755 t/t9503-gitweb-caching.sh
 create mode 100755 t/t9503/benchmark_caching_interface.pl
 create mode 100755 t/t9503/benchmark_capture_implementations.pl
 create mode 100755 t/t9503/test_cache_interface.pl
 create mode 100755 t/t9503/test_cache_output.pl
 create mode 100755 t/t9503/test_capture_interface.pl

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 01/17] gitweb: Return or exit after done serving request
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 02/17] gitweb: Fix typo in hash key name in %opts in git_header_html Jakub Narebski
                   ` (18 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Check if there is a caller in top frame of gitweb, and either 'return'
if gitweb code is wrapped in subroutine, or 'exit' if it is not.

This should avoid

  gitweb.cgi: Subroutine git_SOMETHING redefined at gitweb.cgi line NNN

warnings in error_log when running gitweb with mod_perl (using
ModPerl::Registry handler)

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 934aacb..c54a8a8 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -987,7 +987,17 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
 	die_error(400, "Project needed");
 }
 $actions{$action}->();
+
 DONE_GITWEB:
+if (defined caller) {
+	# wrapped in a subroutine processing requests,
+	# e.g. mod_perl with ModPerl::Registry, or PSGI with Plack::App::WrapCGI
+	return;
+} else {
+	# pure CGI script, serving single request
+	# or 'exit' hijacked approprietly
+	exit;
+}
 1;
 
 ## ======================================================================
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 02/17] gitweb: Fix typo in hash key name in %opts in git_header_html
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 01/17] gitweb: Return or exit after done serving request Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 03/17] gitweb: Prepare for splitting gitweb Jakub Narebski
                   ` (17 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

The name of the key has to be the same in call site handle_errors_html
and in called subroutine that uses it, i.e. git_header_html.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index c54a8a8..74be92b 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3230,7 +3230,7 @@ sub git_header_html {
 	}
 	print $cgi->header(-type=>$content_type, -charset => 'utf-8',
 	                   -status=> $status, -expires => $expires)
-		unless ($opts{'-no_http_headers'});
+		unless ($opts{'-no_http_header'});
 	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
 	print <<EOF;
 <?xml version="1.0" encoding="utf-8"?>
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 03/17] gitweb: Prepare for splitting gitweb
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 01/17] gitweb: Return or exit after done serving request Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 02/17] gitweb: Fix typo in hash key name in %opts in git_header_html Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 04/17] gitweb/lib - Very simple file based cache Jakub Narebski
                   ` (16 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Prepare gitweb for having been split into modules that are to be
installed alongside gitweb in 'lib/' subdirectory, by adding

  use lib __DIR__.'/lib';

to gitweb.perl (to main gitweb script), and preparing for putting
modules (relative path) in $(GITWEB_MODULES) in gitweb/Makefile.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/Makefile    |    3 +++
 gitweb/gitweb.perl |    9 +++++++++
 2 files changed, 12 insertions(+), 0 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index d2584fe..d2401e1 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -55,6 +55,7 @@ PERL_PATH  ?= /usr/bin/perl
 bindir_SQ = $(subst ','\'',$(bindir))#'
 gitwebdir_SQ = $(subst ','\'',$(gitwebdir))#'
 gitwebstaticdir_SQ = $(subst ','\'',$(gitwebdir)/static)#'
+gitweblibdir_SQ = $(subst ','\'',$(gitwebdir)/lib)#'
 SHELL_PATH_SQ = $(subst ','\'',$(SHELL_PATH))#'
 PERL_PATH_SQ  = $(subst ','\'',$(PERL_PATH))#'
 DESTDIR_SQ    = $(subst ','\'',$(DESTDIR))#'
@@ -150,6 +151,8 @@ install: all
 	$(INSTALL) -m 755 $(GITWEB_PROGRAMS) '$(DESTDIR_SQ)$(gitwebdir_SQ)'
 	$(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
 	$(INSTALL) -m 644 $(GITWEB_FILES) '$(DESTDIR_SQ)$(gitwebstaticdir_SQ)'
+	$(foreach dir,$(sort $(dir $(GITWEB_MODULES))),test -d '$(DESTDIR_SQ)$(gitwebdir_SQ)/$(dir)' || $(INSTALL) -d -m 755 '$(DESTDIR_SQ)$(gitwebdir_SQ)/$(dir)';)
+	$(foreach mod,$(GITWEB_MODULES),$(INSTALL) -m 644 $(mod) '$(DESTDIR_SQ)$(gitwebdir_SQ)/$(mod)';)
 
 ### Cleaning rules
 
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 74be92b..bb81e14 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -9,6 +9,14 @@
 
 use strict;
 use warnings;
+
+use File::Spec;
+# __DIR__ is taken from Dir::Self __DIR__ fragment
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__ . '/lib';
+
 use CGI qw(:standard :escapeHTML -nosticky);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser set_message);
@@ -16,6 +24,7 @@ use Encode;
 use Fcntl ':mode';
 use File::Find qw();
 use File::Basename qw(basename);
+
 binmode STDOUT, ':utf8';
 
 our $t0;
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 04/17] gitweb/lib - Very simple file based cache
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (2 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 03/17] gitweb: Prepare for splitting gitweb Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 05/17] gitweb/lib - Stat-based cache expiration Jakub Narebski
                   ` (15 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

This is first step towards implementing file based output (response)
caching layer that is used on such large sites as kernel.org.

This patch introduces GitwebCaching::SimpleFileCache package, which
follows Cache::Cache / CHI interface, although do not implement it
fully.  The intent of following established convention for cache
interface is to be able to replace our simple file based cache,
e.g. by the one using memcached.

Like in original patch by John 'Warthog9' Hawley (J.H.) (the one this
commit intends to be incremental step to), the data is stored in the
case as-is, without adding metadata (like expiration date), and
without serialization (which means that one can store only scalar
data).

To be implemented (from original patch by J.H.):
* cache expiration (based on file stats, current time and global
  expiration time); currently elements in cache do not expire at all
* actually using this cache in gitweb, with exception of error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Possible extensions (beyound what was in original patch):
* (optionally) show information about cache utilization
* AJAX (JavaScript-based) progress indicator
* JavaScript code to update relative dates in cached output
* make cache size-aware (try to not exceed specified maximum size)
* utilize X-Sendfile header (or equivalent) to show cached data
  (optional, as it makes sense only if web server supports sendfile
  feature and have it enabled)
* variable expiration feature from CHI, allowing items to expire a bit
  earlier than the stated expiration time to prevent cache miss
  stampedes (although locking, if available, should take care of
  this)

The code of GitwebCaching::SimpleFileCache package in gitweb/lib
was heavily based on file-based cache in Cache::Cache package, i.e.
on Cache::FileCache, Cache::FileBackend and Cache::BaseCache, and on
file-based cache in CHI, i.e. on CHI::Driver::File and CHI::Driver
(including implementing atomic write, something that original patch
lacks).  It tries to follow more modern CHI architecture, but without
requiring Moose.  It is much simplified compared to both interfaces
and their file-based drivers.

This patch does not yet enable output caching in gitweb (it doesn't
have all required features yet); on the other hand it includes tests
of cache Perl API in t/t9503-gitweb-caching.sh.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/lib/GitwebCache/SimpleFileCache.pm |  310 +++++++++++++++++++++++++++++
 t/t9503-gitweb-caching.sh                 |   32 +++
 t/t9503/test_cache_interface.pl           |   81 ++++++++
 3 files changed, 423 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/SimpleFileCache.pm
 create mode 100755 t/t9503-gitweb-caching.sh
 create mode 100755 t/t9503/test_cache_interface.pl

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
new file mode 100644
index 0000000..b51a124
--- /dev/null
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -0,0 +1,310 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine, simple file-based cache
+#
+
+# Minimalistic cache that stores data in the filesystem, without serialization
+# and currently without any kind of cache expiration (all keys last forever till
+# they got explicitely removed).
+#
+# It follows Cache::Cache and CHI interfaces (but does not implement it fully)
+
+package GitwebCache::SimpleFileCache;
+
+use strict;
+use warnings;
+
+use File::Path qw(mkpath);
+use File::Temp qw(tempfile);
+use Digest::MD5 qw(md5_hex);
+
+# by default, the cache nests all entries on the filesystem single
+# directory deep, i.e. '60/b725f10c9c85c70d97880dfe8191b3' for
+# key name (key digest) 60b725f10c9c85c70d97880dfe8191b3.
+#
+our $DEFAULT_CACHE_DEPTH = 1;
+
+# by default, the root of the cache is located in 'cache'.
+#
+our $DEFAULT_CACHE_ROOT = "cache";
+
+# by default we don't use cache namespace (empty namespace);
+# empty namespace does not allow for simple implementation of clear() method.
+#
+our $DEFAULT_NAMESPACE = '';
+
+# ......................................................................
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to $DEFAULT_NAMESPACE.
+#  * 'cache_root' (Cache::FileCache compatibile),
+#    'root_dir' (CHI::Driver::File compatibile),
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to $DEFAULT_CACHE_ROOT.
+#  * 'cache_depth' (Cache::FileCache compatibile),
+#    'depth' (CHI::Driver::File compatibile),
+#    The number of subdirectories deep to cache object item.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+sub new {
+	my ($proto, $p_options_hash_ref) = @_;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	my ($root, $depth, $ns);
+	if (defined $p_options_hash_ref) {
+		$root  =
+			$p_options_hash_ref->{'cache_root'} ||
+			$p_options_hash_ref->{'root_dir'};
+		$depth =
+			$p_options_hash_ref->{'cache_depth'} ||
+			$p_options_hash_ref->{'depth'};
+		$ns    = $p_options_hash_ref->{'namespace'};
+	}
+	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
+	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
+	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
+
+	$self->set_root($root);
+	$self->set_depth($depth);
+	$self->set_namespace($ns);
+
+	return $self;
+}
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(depth root namespace)) {
+	my $field = $i;
+	no strict 'refs';
+	*{"get_$field"} = sub {
+		my $self = shift;
+		return $self->{$field};
+	};
+	*{"set_$field"} = sub {
+		my ($self, $value) = @_;
+		$self->{$field} = $value;
+	};
+}
+
+# ----------------------------------------------------------------------
+# utility functions and methods
+
+# Return root dir for namespace (lazily built, cached)
+sub path_to_namespace {
+	my ($self) = @_;
+
+	if (!exists $self->{'path_to_namespace'}) {
+		if (defined $self->{'namespace'} &&
+		    $self->{'namespace'} ne '') {
+			$self->{'path_to_namespace'} = "$self->{'root'}/$self->{'namespace'}";
+		} else {
+			$self->{'path_to_namespace'} =  $self->{'root'};
+		}
+	}
+	return $self->{'path_to_namespace'};
+}
+
+# Take an human readable key, and return file path.
+# Puts dirname of file path in second argument, if it is provided.
+sub path_to_key {
+	my ($self, $key, $dir_ref) = @_;
+
+	my @paths = ( $self->path_to_namespace() );
+
+	# Create a unique (hashed) key from human readable key
+	my $filename = md5_hex($key); # or $digester->add($key)->hexdigest();
+
+	# Split filename so that it have DEPTH subdirectories,
+	# where each subdirectory has a two-letter name
+	push @paths, unpack("(a2)".($self->{'depth'} || 1)." a*", $filename);
+	$filename = pop @paths;
+
+	# Join paths together, computing dir separately if $dir_ref was passed.
+	my $filepath;
+	if (defined $dir_ref && ref($dir_ref)) {
+		my $dir = join('/', @paths);
+		$filepath = "$dir/$filename";
+		$$dir_ref = $dir;
+	} else {
+		$filepath = join('/', @paths, $filename);
+	}
+
+	return $filepath;
+}
+
+
+# ----------------------------------------------------------------------
+# worker methods
+
+sub fetch {
+	my ($self, $key) = @_;
+
+	my $file = $self->path_to_key($key);
+	return undef unless (defined $file && -f $file);
+
+	# Fast slurp, adapted from File::Slurp::read, with unnecessary options removed
+	# via CHI::Driver::File (from CHI-0.33)
+	my $buf = '';
+	open my $read_fh, '<', $file
+		or return undef;
+	binmode $read_fh, ':raw';
+
+	my $size_left = -s $read_fh;
+
+	while ($size_left > 0) {
+		my $read_cnt = sysread($read_fh, $buf, $size_left, length($buf));
+		return undef unless defined $read_cnt;
+
+		last if $read_cnt == 0;
+		$size_left -= $read_cnt;
+		#last if $size_left <= 0;
+	}
+
+	close $read_fh
+		or die "Couldn't close file '$file' opened for reading: $!";
+	return $buf;
+}
+
+sub store {
+	my ($self, $key, $data) = @_;
+
+	my $dir;
+	my $file = $self->path_to_key($key, \$dir);
+	return undef unless (defined $file && defined $dir);
+
+	# ensure that directory leading to cache file exists
+	if (!-d $dir) {
+		mkpath($dir, 0, 0777)
+			or die "Couldn't create leading directory '$dir' (mkpath): $!";
+	}
+
+	# generate a temporary file
+	my $temp = File::Temp->new(
+		#DIR => $dir,
+		TEMPLATE => "${file}_XXXXX",
+		SUFFIX => ".tmp"
+	) or die "Couldn't create temporary file with '${file}_XXXXX' template: $!";
+	chmod 0666, $temp
+		or warn "Couldn't change permissions to 0666 for '$temp': $!";
+
+	# Fast spew, adapted from File::Slurp::write, with unnecessary options removed
+	# via CHI::Driver::File (from CHI-0.33)
+	my $write_fh = $temp;
+	binmode $write_fh, ':raw';
+
+	my $size_left = length($data);
+	my $offset = 0;
+
+	while ($size_left > 0) {
+		my $write_cnt = syswrite($write_fh, $data, $size_left, $offset);
+		return undef unless defined $write_cnt;
+		
+		$size_left -= $write_cnt;
+		$offset += $write_cnt; # == length($data);
+	}
+
+	close $temp
+		or die "Couldn't close temporary file '$temp' opened for writing: $!";
+	rename($temp, $file)
+		or die "Couldn't rename temporary file '$temp' to '$file': $!";
+}
+
+# get size of an element associated with the $key (not the size of whole cache)
+sub get_size {
+	my ($self, $key) = @_;
+
+	my $path = $self->path_to_key($key)
+		or return undef;
+	if (-f $path) {
+		return -s $path;
+	}
+	return 0;
+}
+
+# ......................................................................
+# interface methods
+
+# Removing and expiring
+
+# $cache->remove($key)
+#
+# Remove the data associated with the $key from the cache.
+sub remove {
+	my ($self, $key) = @_;
+
+	my $file = $self->path_to_key($key)
+		or return undef;
+	return undef unless -f $file;
+	unlink($file)
+		or die "Couldn't remove file '$file': $!";
+}
+
+# Getting and setting
+
+# $cache->set($key, $data);
+#
+# Associates $data with $key in the cache, overwriting any existing entry.
+# Returns $data.
+sub set {
+	my ($self, $key, $data) = @_;
+
+	return unless (defined $key && defined $data);
+
+	$self->store($key, $data);
+
+	return $data;
+}
+
+# $data = $cache->get($key);
+#
+# Returns the data associated with $key.  If $key does not exist
+# or has expired, returns undef.
+sub get {
+	my ($self, $key) = @_;
+
+	my $data = $self->fetch($key)
+		or return undef;
+
+	return $data;
+}
+
+# $data = $cache->compute($key, $code);
+#
+# Combines the get and set operations in a single call.  Attempts to
+# get $key; if successful, returns the value.  Otherwise, calls $code
+# and uses the return value as the new value for $key, which is then
+# returned.
+sub compute {
+	my ($self, $key, $code) = @_;
+
+	my $data = $self->get($key);
+	if (!defined $data) {
+		$data = $code->($self, $key);
+		$self->set($key, $data);
+	}
+
+	return $data;
+}
+
+# end of package GitwebCache::SimpleFileCache;
+1;
+
+
diff --git a/t/t9503-gitweb-caching.sh b/t/t9503-gitweb-caching.sh
new file mode 100755
index 0000000..768080c
--- /dev/null
+++ b/t/t9503-gitweb-caching.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+#
+# Copyright (c) 2010 Jakub Narebski
+#
+
+test_description='caching interface to be used in gitweb'
+#test_description='caching interface used in gitweb, gitweb caching
+#
+#This test checks cache (interface) used in gitweb caching, caching
+#infrastructure and gitweb response (output) caching (the last by
+#running gitweb as CGI script from commandline).'
+
+# for now we are running only cache interface tests
+. ./test-lib.sh
+
+# this test is present in gitweb-lib.sh
+if ! test_have_prereq PERL; then
+	say 'perl not available, skipping test'
+	test_done
+fi
+
+"$PERL_PATH" -MTest::More -e 0 >/dev/null 2>&1 || {
+	say 'perl module Test::More unavailable, skipping test'
+	test_done
+}
+
+# ----------------------------------------------------------------------
+
+test_external 'GitwebCache::* Perl API (in gitweb/cache.pm)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
+
+test_done
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
new file mode 100755
index 0000000..9242129
--- /dev/null
+++ b/t/t9503/test_cache_interface.pl
@@ -0,0 +1,81 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+
+# test source version
+use lib "$ENV{TEST_DIRECTORY}/../gitweb/lib";
+
+# Test creating a cache
+#
+BEGIN { use_ok('GitwebCache::SimpleFileCache'); }
+my $cache = new_ok('GitwebCache::SimpleFileCache');
+
+# Test that default values are defined
+#
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+	'$DEFAULT_CACHE_ROOT defined');
+ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+	'$DEFAULT_CACHE_DEPTH defined');
+
+# Test accessors and default values for cache
+#
+SKIP: {
+	skip 'default values not defined', 3
+		unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT &&
+		        $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH);
+
+	is($cache->get_namespace(), '', "default namespace is ''");
+	cmp_ok($cache->get_root(),  'eq', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
+		"default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'");
+	cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
+		"default cache depth is $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH");
+}
+
+# Test the getting, setting, and removal of a cached value
+# (Cache::Cache interface)
+#
+my $key = 'Test Key';
+my $value = 'Test Value';
+
+subtest 'Cache::Cache interface' => sub {
+	foreach my $method (qw(get set remove)) {
+		can_ok($cache, $method);
+	}
+
+	$cache->set($key, $value);
+	cmp_ok($cache->get_size($key), '>', 0, 'get_size after set, is greater than 0');
+	is($cache->get($key), $value,          'get after set, returns cached value');
+	$cache->remove($key);
+	ok(!defined($cache->get($key)),        'get after remove, is undefined');
+
+	eval { $cache->remove('Not-Existent Key'); };
+	ok(!$@,                                'remove on non-existent key doesn\'t die');
+	diag($@) if $@;
+
+	done_testing();
+};
+
+# Test the getting and setting of a cached value
+# (CHI interface)
+#
+my $call_count = 0;
+sub get_value {
+	$call_count++;
+	return $value;
+}
+subtest 'CHI interface' => sub {
+	can_ok($cache, qw(compute));
+
+	is($cache->compute($key, \&get_value), $value, "compute 1st time (set) returns '$value'");
+	is($cache->compute($key, \&get_value), $value, "compute 2nd time (get) returns '$value'");
+	is($cache->compute($key, \&get_value), $value, "compute 3rd time (get) returns '$value'");
+	cmp_ok($call_count, '==', 1, 'get_value() is called once from compute');
+
+	done_testing();
+};
+
+done_testing();
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 05/17] gitweb/lib - Stat-based cache expiration
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (3 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 04/17] gitweb/lib - Very simple file based cache Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 06/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/) Jakub Narebski
                   ` (14 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Add stat-based cache expiration to file-based GitwebCache::SimpleFileCache.
Contrary to the way other caching interfaces such as Cache::Cache and CHI
do it, the time cache element expires in is _global_ value associated with
cache instance, and is not local property of cache entry.  (Currently cache
entry does not store any metadata associated with entry... which means that
there is no need for serialization / marshalling / freezing and thawing.)
Default expire time is -1, which means never expire.

To check if cache entry is expired, GitwebCache::SimpleFileCache compares
difference between mtime (last modify time) of a cache file and current time
with (global) time to expire.  It is done using CHI-compatibile is_valid()
method.

Add some tests checking that expiring works correctly (on the level of API).

To be implemented (from original patch by J.H.):
* actually using this cache in gitweb, except error pages
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/lib/GitwebCache/SimpleFileCache.pm |   42 ++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl           |   19 +++++++++++++
 2 files changed, 58 insertions(+), 3 deletions(-)

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index b51a124..91b3373 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -57,6 +57,10 @@ our $DEFAULT_NAMESPACE = '';
 #    The number of subdirectories deep to cache object item.  This should be
 #    large enough that no cache directory has more than a few hundred objects.
 #    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatibile),
+#    'expires_in' (CHI compatibile) [seconds]
+#    The expiration time for objects place in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
@@ -64,7 +68,7 @@ sub new {
 	my $self  = {};
 	$self = bless($self, $class);
 
-	my ($root, $depth, $ns);
+	my ($root, $depth, $ns, $expires_in);
 	if (defined $p_options_hash_ref) {
 		$root  =
 			$p_options_hash_ref->{'cache_root'} ||
@@ -73,14 +77,19 @@ sub new {
 			$p_options_hash_ref->{'cache_depth'} ||
 			$p_options_hash_ref->{'depth'};
 		$ns    = $p_options_hash_ref->{'namespace'};
+		$expires_in =
+			$p_options_hash_ref->{'default_expires_in'} ||
+			$p_options_hash_ref->{'expires_in'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
 	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
+	$expires_in = -1 unless defined($expires_in); # <0 means never
 
 	$self->set_root($root);
 	$self->set_depth($depth);
 	$self->set_namespace($ns);
+	$self->set_expires_in($expires_in);
 
 	return $self;
 }
@@ -91,7 +100,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace)) {
+foreach my $i (qw(depth root namespace expires_in)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -257,6 +266,31 @@ sub remove {
 		or die "Couldn't remove file '$file': $!";
 }
 
+# $cache->is_valid($key)
+#
+# Returns a boolean indicating whether $key exists in the cache
+# and has not expired (global per-cache 'expires_in').
+sub is_valid {
+	my ($self, $key) = @_;
+
+	my $path = $self->path_to_key($key);
+
+	# does file exists in cache?
+	return 0 unless -f $path;
+	# get its modification time
+	my $mtime = (stat(_))[9] # _ to reuse stat structure used in -f test
+		or die "Couldn't stat file '$path': $!";
+
+	# expire time can be set to never
+	my $expires_in = $self->get_expires_in();
+	return 1 unless (defined $expires_in && $expires_in >= 0);
+
+	# is file expired?
+	my $now = time();
+
+	return (($now - $mtime) < $expires_in);
+}
+
 # Getting and setting
 
 # $cache->set($key, $data);
@@ -280,6 +314,7 @@ sub set {
 sub get {
 	my ($self, $key) = @_;
 
+	return undef unless $self->is_valid($key);
 	my $data = $self->fetch($key)
 		or return undef;
 
@@ -304,7 +339,8 @@ sub compute {
 	return $data;
 }
 
-# end of package GitwebCache::SimpleFileCache;
 1;
+__END__
+# end of package GitwebCache::SimpleFileCache;
 
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 9242129..b1e9036 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -78,4 +78,23 @@ subtest 'CHI interface' => sub {
 	done_testing();
 };
 
+# Test cache expiration
+#
+subtest 'cache expiration' => sub {
+	$cache->set_expires_in(60*60*24); # set expire time to 1 day
+	cmp_ok($cache->get_expires_in(), '>', 0, '"expires in" is greater than 0');
+	is($cache->get($key), $value,            'get returns cached value (not expired in 1d)');
+
+	$cache->set_expires_in(-1); # set expire time to never expire
+	is($cache->get_expires_in(), -1,         '"expires in" is set to never (-1)');
+	is($cache->get($key), $value,            'get returns cached value (not expired)');
+
+	$cache->set_expires_in(0);
+	is($cache->get_expires_in(),  0,         '"expires in" is set to now (0)');
+	$cache->set($key, $value);
+	ok(!defined($cache->get($key)),          'cache is expired');
+
+	done_testing();
+};
+
 done_testing();
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 06/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/)
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (4 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 05/17] gitweb/lib - Stat-based cache expiration Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 07/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
                   ` (13 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Add t/t9503/benchmark_caching_interface.pl, which benchmarks 'set' and
'get' methods from GitwebCache::SimpleFileCache, and compares them
against Cache::FileCache, Cache::MemoryCache, CHI with 'File' driver,
Cache::FastMmap if they are available.

Includes example benchmark results.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 t/t9503/benchmark_caching_interface.pl |  132 ++++++++++++++++++++++++++++++++
 1 files changed, 132 insertions(+), 0 deletions(-)
 create mode 100755 t/t9503/benchmark_caching_interface.pl

diff --git a/t/t9503/benchmark_caching_interface.pl b/t/t9503/benchmark_caching_interface.pl
new file mode 100755
index 0000000..5a46077
--- /dev/null
+++ b/t/t9503/benchmark_caching_interface.pl
@@ -0,0 +1,132 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use File::Spec;
+use File::Path;
+use Benchmark qw(:all);
+
+# benchmark source version
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__."/../../gitweb/lib";
+use GitwebCache::SimpleFileCache;
+
+my $key = 'http://localhost/cgi-bin/gitweb.cgi?p=git/git.git;a=blob_plain;f=lorem.txt;hb=HEAD';
+my $value = <<'EOF';
+Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
+eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
+minim veniam, quis nostrud exercitation ullamco laboris nisi ut
+aliquip ex ea commodo consequat. Duis aute irure dolor in
+reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
+pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
+culpa qui officia deserunt mollit anim id est laborum.
+EOF
+
+my $cache_root = __DIR__.'/cache';
+my %cache_options = (
+	'cache_root' => $cache_root, # Cache::FileCache compatibile
+	'root_dir'   => $cache_root, # CHI::Driver::File compatibile
+	'cache_depth' => 1, # Cache::FileCache compatibile
+	'depth'       => 1, # CHI::Driver::File compatibile
+	'default_expires_in' => 24*60*60, # Cache::Cache compatibile
+	'expires_in'         => 24*60*60, # CHI compatibile
+	'expire_time'        => 24*60*60, # Cache::FastMmap compatibile
+);
+
+mkdir($cache_root);
+
+my @caches;
+push @caches, GitwebCache::SimpleFileCache->new({
+	'namespace' => 'GitwebCache-SimpleFileCache',
+	%cache_options,
+});
+
+if (eval { require Cache::FileCache; 1; }) {
+	push @caches, Cache::FileCache->new({
+		'namespace' => 'Cache-FileCache',
+		%cache_options,
+	});
+}
+
+if (eval { require Cache::MemoryCache; 1; }) {
+	push @caches, Cache::MemoryCache->new({
+		'namespace' => 'Cache-MemoryCache',
+		%cache_options,
+	});
+}
+if (eval { require CHI; 1; }) {
+	push @caches, CHI->new(
+		'driver' => 'File',
+		'namespace' => 'CHI-Driver-File',
+		%cache_options,
+	);
+}
+if (eval { require Cache::FastMmap; 1 }) {
+	push @caches, Cache::FastMmap->new(
+		'share_file' => $cache_root.'/Cache-FastMmap',
+		'init_file' => 1,
+		'raw_values' => 1,
+		%cache_options,
+	);
+}
+
+my %caches;
+my $count = -10;
+
+print '$cache->set($key, $value)'."\n";
+for my $cache (@caches) {
+	my $name = ref($cache);
+	$name = 'GitwebCache' if ($name =~ /^GitwebCache/);
+	$caches{$name} = sub { $cache->set($key, $value); };
+}
+
+my $result_set = timethese($count, { %caches });
+cmpthese($result_set);
+
+print "\n";
+
+print '$cache->get($key)'."\n";
+for my $cache (@caches) {
+	my $name = ref($cache);
+	$name = 'GitwebCache' if ($name =~ /^GitwebCache/);
+	$caches{$name} = sub { $cache->get($key); };
+}
+
+my $result_get = timethese($count, { %caches });
+cmpthese($result_get);
+
+rmtree($cache_root);
+
+1;
+__END__
+## EXAMPLE OUTPUT ##
+#
+# $cache->set($key, $value)
+# Benchmark: running Cache::FastMmap, Cache::FileCache, Cache::MemoryCache, GitwebCache
+#   for at least 10 CPU seconds...
+# Cache::FastMmap:    11 wallclock secs ( 8.46 usr +  2.17 sys = 10.63 CPU) @ 15710.25/s (n=167000)
+# Cache::FileCache:   15 wallclock secs ( 8.43 usr +  2.25 sys = 10.68 CPU) @ 356.27/s   (n=3805)
+# Cache::MemoryCache: 13 wallclock secs ( 9.91 usr +  0.09 sys = 10.00 CPU) @ 3306.20/s  (n=33062)
+# GitwebCache:        29 wallclock secs ( 7.02 usr +  3.47 sys = 10.49 CPU) @ 605.91/s   (n=6356)
+#                       Rate Cache::FileCache GitwebCache Cache::MemoryCache Cache::FastMmap
+# Cache::FileCache     356/s               --        -41%               -89%            -98%
+# GitwebCache          606/s              70%          --               -82%            -96%
+# Cache::MemoryCache  3306/s             828%        446%                 --            -79%
+# Cache::FastMmap    15710/s            4310%       2493%               375%              --
+#
+# $cache->get($key)
+# Benchmark: running Cache::FastMmap, Cache::FileCache, Cache::MemoryCache, GitwebCache
+#   for at least 10 CPU seconds...
+# Cache::FastMmap:    13 wallclock secs ( 7.32 usr +  2.83 sys = 10.15 CPU) @ 24260.59/s (n=246245)
+# Cache::FileCache:   12 wallclock secs ( 9.22 usr +  1.30 sys = 10.52 CPU) @ 972.62/s   (n=10232)
+# Cache::MemoryCache: 14 wallclock secs ( 9.89 usr +  0.12 sys = 10.01 CPU) @ 3679.52/s  (n=36832)
+# GitwebCache:        20 wallclock secs ( 8.16 usr +  2.36 sys = 10.52 CPU) @ 4401.05/s  (n=46299)
+#                       Rate Cache::FileCache Cache::MemoryCache GitwebCache Cache::FastMmap
+# Cache::FileCache     973/s               --               -74%        -78%            -96%
+# Cache::MemoryCache  3680/s             278%                 --        -16%            -85%
+# GitwebCache         4401/s             352%                20%          --            -82%
+# Cache::FastMmap    24261/s            2394%               559%        451%              --
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 07/17] gitweb/lib - Simple select(FH) based output capture
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (5 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 06/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/) Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 08/17] gitweb/lib - Alternate ways of capturing output Jakub Narebski
                   ` (12 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Add two packages: GitwebCache::Capture, which defines interface, and
GitwebCache::Capture::SelectFH, which is actually implements simple
capturing.  GitwebCache::Capture::SelectFH captures output by using
select(FILEHANDLE) to change default filehandle for output.  This
means that output of a "print" or a "printf" (or a "write") without
a filehandle would be captured.

To change mode of filehandle used for capturing correctly,
  binmode select(), <mode>;
needs to be used in place of
  binmode STDOUT, <mode>;

Capturing is done using in-memory file held in Perl scalar.

Using select(FILEHANDLE) is a bit fragile as a method of capturing
output, as it assumes that we always use "print" or "printf" without
filehandle, and use select() which returns default filehandle for
output in place of explicit STDOUT.  On the other hand it has the
advantage of being simple.  Alternate solutions include using tie
(like in CGI::Cache), or using PerlIO layers - but the last requires
non-standard PerlIO::Util module.


Includes separate tests for capturing output, in t9503/test_capture.pl
which is run as external test from t9503-gitweb-caching.sh

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/lib/GitwebCache/Capture.pm          |   66 ++++++++++++++++++++++
 gitweb/lib/GitwebCache/Capture/SelectFH.pm |   82 ++++++++++++++++++++++++++++
 gitweb/lib/GitwebCache/SimpleFileCache.pm  |    7 +--
 t/t9503-gitweb-caching.sh                  |    5 ++-
 t/t9503/test_capture_interface.pl          |   76 ++++++++++++++++++++++++++
 5 files changed, 231 insertions(+), 5 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/Capture.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/SelectFH.pm
 create mode 100755 t/t9503/test_capture_interface.pl

diff --git a/gitweb/lib/GitwebCache/Capture.pm b/gitweb/lib/GitwebCache/Capture.pm
new file mode 100644
index 0000000..3e9fe81
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture.pm
@@ -0,0 +1,66 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Output capturing for gitweb caching engine
+#
+
+# It is base abstract class (a role) for capturing output of gitweb
+# actions for gitweb caching engine.
+# 
+# Child (derived) concrete classes, which actually implement some method
+# of capturing STDOUT output, must implement the following methods:
+# * ->new(), to create new object of a capturing class
+# * ->start(), to start capturing output
+# * ->stop(), to stop capturing output and return it
+#
+# Before starting capture by using capture_block etc. subroutines,
+# one has to run <child class>->setup().
+
+package GitwebCache::Capture;
+
+use strict;
+use warnings;
+
+use Exporter qw(import);
+our @EXPORT    = qw(capture_start capture_stop capture_block);
+our @EXPORT_OK = qw(setup_capture);
+our %EXPORT_TAGS = (all => [ @EXPORT, @EXPORT_OK ]);
+
+# Holds object used for capture (of child class)
+my $capture;
+
+sub setup_capture {
+	my $self = shift || __PACKAGE__;
+
+	$capture = $self->new(@_);
+}
+
+sub capture {
+	my ($self, $code) = @_;
+
+	$self->start();
+	$code->();
+	return $self->stop();
+}
+
+# Wrap caching data; capture only STDOUT
+sub capture_block (&) {
+	my $code = shift;
+	return $capture->capture($code);
+}
+
+sub capture_start {
+	$capture->start(@_);
+}
+
+sub capture_stop {
+	return $capture->stop(@_);
+}
+
+1;
+__END__
+# end of package GitwebCache::Capture;
diff --git a/gitweb/lib/GitwebCache/Capture/SelectFH.pm b/gitweb/lib/GitwebCache/Capture/SelectFH.pm
new file mode 100644
index 0000000..2b904d3
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture/SelectFH.pm
@@ -0,0 +1,82 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Simple output capturing using select(FH);
+#
+
+# This module (class) captures output of 'print <sth>', 'printf <sth>'
+# and 'write <sth>' (without a filehandle) by using select(FILEHANDLE)
+# to change default filehandle for output, changing it to in-memory
+# file (saving output to scalar).
+#
+# Note that when using this simplest way of capturing, to change mode of
+# filehandle using for capturing correctly, "binmode STDOUT, <mode>;"
+# has to be changed to "binmode select(), <mode>;".  This has no change
+# if we are not capturing output using GitwebCache::Capture::SelectFH.
+
+package GitwebCache::Capture::SelectFH;
+
+use PerlIO;
+
+use strict;
+use warnings;
+
+use base qw(GitwebCache::Capture);
+use GitwebCache::Capture qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = @GitwebCache::Capture::EXPORT;
+our @EXPORT_OK   = @GitwebCache::Capture::EXPORT_OK;
+our %EXPORT_TAGS = %GitwebCache::Capture::EXPORT_TAGS;
+
+# Constructor
+sub new {
+	my $proto = shift;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	$self->{'oldfh'} = select();
+	$self->{'data'} = '';
+
+	return $self;
+}
+
+# Start capturing data (STDOUT)
+# (printed using 'print <sth>' or 'printf <sth>')
+sub start {
+	my $self = shift;
+	
+	$self->{'data'}    = '';
+	$self->{'data_fh'} = undef;
+	
+	open $self->{'data_fh'}, '>', \$self->{'data'}
+		or die "Couldn't open in-memory file for capture: $!";
+	$self->{'oldfh'} = select($self->{'data_fh'});
+
+	# note: this does not cover all cases
+	binmode select(), ':utf8'
+		if ((PerlIO::get_layers($self->{'oldfh'}))[-1] eq 'utf8');
+}
+
+# Stop capturing data (required for die_error)
+sub stop {
+	my $self = shift;
+
+	# return if we didn't start capturing
+	return unless defined $self->{'data_fh'};
+	
+	select($self->{'oldfh'});
+	close $self->{'data_fh'}
+		or die "Couldn't close in-memory file for capture: $!";
+	return $self->{'data'};
+}
+
+1;
+__END__
+# end of package GitwebCache::Capture::SelectFH;
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 91b3373..7c90350 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -201,16 +201,17 @@ sub store {
 
 	# ensure that directory leading to cache file exists
 	if (!-d $dir) {
-		mkpath($dir, 0, 0777)
+		eval { mkpath($dir, 0, 0777); 1 }
 			or die "Couldn't create leading directory '$dir' (mkpath): $!";
 	}
 
 	# generate a temporary file
+	# (Temp::File dies itself if it cannot create file)
 	my $temp = File::Temp->new(
 		#DIR => $dir,
 		TEMPLATE => "${file}_XXXXX",
 		SUFFIX => ".tmp"
-	) or die "Couldn't create temporary file with '${file}_XXXXX' template: $!";
+	);# or die "Couldn't create temporary file with '${file}_XXXXX' template: $!";
 	chmod 0666, $temp
 		or warn "Couldn't change permissions to 0666 for '$temp': $!";
 
@@ -342,5 +343,3 @@ sub compute {
 1;
 __END__
 # end of package GitwebCache::SimpleFileCache;
-
-
diff --git a/t/t9503-gitweb-caching.sh b/t/t9503-gitweb-caching.sh
index 768080c..73b3f5a 100755
--- a/t/t9503-gitweb-caching.sh
+++ b/t/t9503-gitweb-caching.sh
@@ -26,7 +26,10 @@ fi
 
 # ----------------------------------------------------------------------
 
-test_external 'GitwebCache::* Perl API (in gitweb/cache.pm)' \
+test_external 'GitwebCache::SimpleFileCache Perl API (in gitweb/cache.pm)' \
 	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
 
+test_external 'GitwebCache::Capture Perl API (in gitweb/cache.pm)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_capture_interface.pl
+
 test_done
diff --git a/t/t9503/test_capture_interface.pl b/t/t9503/test_capture_interface.pl
new file mode 100755
index 0000000..d7aa480
--- /dev/null
+++ b/t/t9503/test_capture_interface.pl
@@ -0,0 +1,76 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+use utf8;
+
+use Test::More;
+
+# test source version
+use lib "$ENV{TEST_DIRECTORY}/../gitweb/lib";
+
+# ....................................................................
+
+# prototypes must be known at compile time, otherwise they do not work
+BEGIN { use_ok('GitwebCache::Capture::SelectFH', qw(:all)); }
+
+# Test setting up capture
+#
+my $capture = new_ok('GitwebCache::Capture::SelectFH' => [], 'The $capture');
+isa_ok($capture, 'GitwebCache::Capture', 'The $capture');
+ok(setup_capture('GitwebCache::Capture::SelectFH'),
+   'setup_capture with package name: GitwebCache::Capture::SelectFH');
+ok(setup_capture($capture),
+   'setup_capture with subclass object: $capture');
+
+# Test properties of capture_block
+#
+is(prototype('capture_block'), '&', 'capture_block has (&) prototype');
+
+# Test capturing
+#
+diag('Should not print anything except test results and diagnostic');
+my $test_data = 'Capture this';
+my $captured = capture_block {
+	print $test_data;
+};
+is($captured, $test_data, 'capture_block captures simple data');
+
+binmode STDOUT, ':utf8';
+$test_data = <<'EOF';
+Áéí óú
+ÄËÑÏÖ
+Ábçdèfg
+Zażółć gęsią jaźń
+山田 太郎
+ブレームのテストです。
+
+はれひほふ
+
+しているのが、いるので。
+濱浜ほれぷりぽれまびぐりろへ。
+EOF
+utf8::decode($test_data);
+$captured = capture_block {
+	binmode select(), ':utf8';
+
+	print $test_data;
+};
+utf8::decode($captured);
+is($captured, $test_data, 'capture_block captures utf8 data');
+
+$test_data = '|\x{fe}\x{ff}|\x{9F}|\000|'; # invalid utf-8
+$captured = capture_block {
+	binmode select(), ':raw';
+
+	print $test_data;
+};
+is($captured, $test_data, 'capture_block captures raw data');
+
+
+done_testing();
+
+# Local Variables:
+# encoding: utf-8
+# End:
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 08/17] gitweb/lib - Alternate ways of capturing output
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (6 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 07/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 09/17] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
                   ` (11 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Besides GitwebCache::Capture::SelectFH, which uses select(FH) to
redirect 'print LIST' and 'printf FORMAT, LIST' to in-memory file to
capture output, add GitwebCache::Capture::TiedCapture which uses
tie-ing filehandle to capture output, and GitwebCache::Capture::PerlIO
which uses push_layer method from non-core PerlIO::Util module to
capture output.

Add test (which canbe run standalone) for all those implementations,
checking ':utf8' and ':raw' output, and benchmark comparing them
(includes example benchmark tests).  Please note that the test for
alternate implementations is not run from t/t9503-gitweb-caching.sh
test.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/lib/GitwebCache/Capture/PerlIO.pm      |   79 ++++++++++
 gitweb/lib/GitwebCache/Capture/TiedCapture.pm |  149 +++++++++++++++++++
 t/t9503/benchmark_capture_implementations.pl  |  198 +++++++++++++++++++++++++
 3 files changed, 426 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/Capture/PerlIO.pm
 create mode 100644 gitweb/lib/GitwebCache/Capture/TiedCapture.pm
 create mode 100755 t/t9503/benchmark_capture_implementations.pl

diff --git a/gitweb/lib/GitwebCache/Capture/PerlIO.pm b/gitweb/lib/GitwebCache/Capture/PerlIO.pm
new file mode 100644
index 0000000..199aeed
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture/PerlIO.pm
@@ -0,0 +1,79 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Output capturing using PerlIO layers
+#
+
+# This module requires PaerlIO::Util installed.
+
+package GitwebCache::Capture::PerlIO;
+
+use PerlIO::Util;
+
+use strict;
+use warnings;
+
+use base qw(GitwebCache::Capture);
+use GitwebCache::Capture qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = @GitwebCache::Capture::EXPORT;
+our @EXPORT_OK   = @GitwebCache::Capture::EXPORT_OK;
+our %EXPORT_TAGS = %GitwebCache::Capture::EXPORT_TAGS;
+
+# Constructor
+sub new {
+	my $proto = shift;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	$self->{'data'} = '';
+
+	return $self;
+}
+
+# Start capturing data (STDOUT)
+# (printed using 'print <sth>' or 'printf <sth>')
+sub start {
+	my $self = shift;
+
+	$self->{'data'}    = '';
+	*STDOUT->push_layer('scalar' => \$self->{'data'});
+
+	# push ':utf8' on top, if it was on top
+	*STDOUT->push_layer(':utf8')
+		if ((*STDOUT->get_layers())[-2] eq 'utf8');
+}
+
+# Stop capturing data (required for die_error)
+sub stop {
+	my $self = shift;
+
+	# return if we didn't start capturing
+	my @layers = *STDOUT->get_layers();
+	return unless grep { $_ eq 'scalar' } @layers;
+
+	my $was_utf8 = $layers[-1] eq 'utf8';
+	# stop saving to scalar, i.e. remove topmost 'scalar' layer,
+	# but remember that 'utf8' layer might be on top of it
+	while ((my $layer = *STDOUT->pop_layer())) {
+		pop @layers;
+		last if $layer eq 'scalar';
+	}
+	# restore ':utf8' mode, if needed
+	if ($was_utf8 && $layers[-1] ne 'utf8') {
+		*STDOUT->push_layer('utf8');
+	}
+
+	return $self->{'data'};
+}
+
+1;
+__END__
+# end of package GitwebCache::Capture::PerlIO;
diff --git a/gitweb/lib/GitwebCache/Capture/TiedCapture.pm b/gitweb/lib/GitwebCache/Capture/TiedCapture.pm
new file mode 100644
index 0000000..e7cc2e3
--- /dev/null
+++ b/gitweb/lib/GitwebCache/Capture/TiedCapture.pm
@@ -0,0 +1,149 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Simple output capturing by tie-ing filehandle
+#
+
+package GitwebCache::Capture::TiedCapture;
+
+use PerlIO;
+
+use strict;
+use warnings;
+
+use base qw(GitwebCache::Capture);
+use GitwebCache::Capture qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = @GitwebCache::Capture::EXPORT;
+our @EXPORT_OK   = @GitwebCache::Capture::EXPORT_OK;
+our %EXPORT_TAGS = %GitwebCache::Capture::EXPORT_TAGS;
+
+# Constructor
+sub new {
+	my $proto = shift;
+
+	my $class = ref($proto) || $proto;
+	my $self  = {};
+	$self = bless($self, $class);
+
+	$self->{'data'} = '';
+	$self->{'tied'} = undef;
+
+	return $self;
+}
+
+# Start capturing data (STDOUT)
+# (printed using 'print <sth>' or 'printf <sth>')
+sub start {
+	my $self = shift;
+
+	# savie tie
+	$self->{'tied'} = tied *STDOUT;
+
+	$self->{'data'} = '';
+	tie *STDOUT, 'GitwebCache::TiedCapture', \$self->{'data'};
+
+	# re-binmode, so that tied class would pick it up
+	binmode STDOUT,
+		(PerlIO::get_layers(*STDOUT))[-1] eq 'utf8' ? ':utf8' : ':raw';
+}
+
+# Stop capturing data (required for die_error)
+sub stop {
+	my $self = shift;
+
+	# return if we didn't start capturing
+	return unless tied(*STDOUT)->isa('GitwebCache::TiedCapture');
+
+	# restore ties, if there were any
+	untie *STDOUT;
+	if ($self->{'tied'}) {
+		tie *STDOUT, 'Tie::Restore', $self->{'tied'};
+	}
+
+	return $self->{'data'};
+}
+
+1;
+
+########################################################################
+
+package GitwebCache::TiedCapture;
+
+our $VERSION = '0.001';
+$VERSION = eval $VERSION;
+
+use utf8; # not strictly necessary: utf8::* is in core
+use strict;
+use warnings;
+
+sub TIEHANDLE {
+	my ($proto, $dataref) = @_;
+	my $class = ref($proto) || $proto;
+	my $self = {};
+	$self = bless($self, $class);
+	$self->{'scalar'} = $dataref;
+	$self->{'binmode'} = ':utf8';
+	return $self;
+}
+
+sub append_str {
+	my ($self, $str) = @_;
+	utf8::encode($str) if ($self->{'binmode'} eq ':utf8');
+	${$self->{'scalar'}} .= $str;
+}
+
+sub WRITE {
+	my ($self, $buffer, $length, $offset) = @_;
+	$self->append_str(substr($buffer, $offset, $length));
+}
+
+sub PRINT {
+	my $self = shift;
+	$self->append_str(join('',@_));
+}
+
+sub PRINTF {
+	my $self = shift;
+	$self->append_str(sprintf(@_));
+}
+
+sub BINMODE {
+	my $self = shift;
+	$self->{'mode'} = shift || ':raw';
+}
+
+#sub UNTIE {
+#	local $^W = 0;
+#}
+
+1;
+
+########################################################################
+
+package Tie::Restore;
+# Written by Robby Walker ( webmaster@pointwriter.com )
+# for Point Writer ( http://www.pointwriter.com/ ).
+
+our $VERSION = '0.11';
+$VERSION = eval $VERSION;
+
+# $object = tied %hash;
+# tie %hash, 'Tie::Restore', $object;
+
+sub TIESCALAR { $_[1] }
+sub TIEARRAY  { $_[1] }
+sub TIEHASH   { $_[1] }
+sub TIEHANDLE { $_[1] }
+
+1;
+
+
+1;
+__END__
+# end of package GitwebCache::Capture::TiedCapture;
diff --git a/t/t9503/benchmark_capture_implementations.pl b/t/t9503/benchmark_capture_implementations.pl
new file mode 100755
index 0000000..73806a3
--- /dev/null
+++ b/t/t9503/benchmark_capture_implementations.pl
@@ -0,0 +1,198 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use File::Spec;
+use File::Path;
+use Benchmark qw(:all);
+
+use PerlIO;
+
+# benchmark source version
+sub __DIR__ () {
+	File::Spec->rel2abs(join '', (File::Spec->splitpath(__FILE__))[0, 1]);
+}
+use lib __DIR__."/../../gitweb/lib";
+
+# ....................................................................
+
+# Load modules (without importing)
+#
+my @modules =
+	map { "GitwebCache::Capture::$_" }
+	qw(SelectFH TiedCapture);
+foreach my $mod (@modules) {
+	eval "require $mod";
+}
+if (eval { require PerlIO::Util; 1 }) {
+	require GitwebCache::Capture::PerlIO;
+	push @modules, 'GitwebCache::Capture::PerlIO';
+}
+
+# Set up capturing, for each module
+#
+my @captures = map { $_->new() } @modules;
+
+my $test_data = <<'EOF';
+Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do
+eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad
+minim veniam, quis nostrud exercitation ullamco laboris nisi ut
+aliquip ex ea commodo consequat. Duis aute irure dolor in
+reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
+pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
+culpa qui officia deserunt mollit anim id est laborum.
+
+Sed ut perspiciatis unde omnis iste natus error sit voluptatem
+accusantium doloremque laudantium, totam rem aperiam, eaque ipsa quae
+ab illo inventore veritatis et quasi architecto beatae vitae dicta
+sunt explicabo. Nemo enim ipsam voluptatem quia voluptas sit
+aspernatur aut odit aut fugit, sed quia consequuntur magni dolores eos
+qui ratione voluptatem sequi nesciunt. Neque porro quisquam est, qui
+dolorem ipsum quia dolor sit amet, consectetur, adipisci velit, sed
+quia non numquam eius modi tempora incidunt ut labore et dolore magnam
+aliquam quaerat voluptatem. Ut enim ad minima veniam, quis nostrum
+exercitationem ullam corporis suscipit laboriosam, nisi ut aliquid ex
+ea commodi consequatur? Quis autem vel eum iure reprehenderit qui in
+ea voluptate velit esse quam nihil molestiae consequatur, vel illum
+qui dolorem eum fugiat quo voluptas nulla pariatur?
+EOF
+
+my @captured_output;
+my $repeat = 100;
+sub capture_output {
+	my ($class, $mode) = @_;
+
+	$class->start();
+	binmode select(), $mode if defined($mode);
+	print $test_data for (1..$repeat);
+
+	return $class->stop();
+}
+
+my %codehash;
+for (my $i = 0; $i < @captures; $i++) {
+	my $capture = $captures[$i];
+	my $name = ref($capture);
+	$name =~ s/^.*:://;
+
+	$codehash{$name} = sub { $captured_output[$i] = capture_output($capture) };
+}
+
+# ....................................................................
+
+my $test_other_modules = 0;
+
+if ($test_other_modules) {
+
+	if (eval { require Capture::Tiny; 1; }) {
+		$codehash{'Capture::Tiny'} = sub {
+			my ($stdout, $stderr) = Capture::Tiny::capture(sub {
+				print $test_data for (1..$repeat);
+			});
+			print STDERR $stderr if defined($stderr);
+		};
+	}
+
+	if (eval { require IO::CaptureOutput; 1; }) {
+		$codehash{'IO::CaptureOutput'} = sub {
+			my ($stdout, $stderr);
+			IO::CaptureOutput::capture(sub {
+				print $test_data for (1..$repeat);
+			}, \$stdout, \$stderr);
+			print STDERR $stderr if defined($stderr);
+		};
+		# somehow it interferes with capturing in GitwebCache::Capture::PerlIO
+		delete $codehash{'PerlIO'};
+	}
+
+	if (eval { require IO::Capture::Stdout; 1; }) {
+		$codehash{'IO::Capture'} = sub {
+			my $capture = IO::Capture::Stdout->new();
+
+			$capture->start();
+			print $test_data for (1..$repeat);
+			$capture->stop();
+
+			my $captured_output = join('', $capture->read());
+		};
+	}
+} # end if ($test_other_modules)
+
+# ....................................................................
+
+print "Capturing $repeat x ".length($test_data).
+      " = ".($repeat * length($test_data))." characters\n";
+my $count = -10; # CPU seconds
+my $result = timethese($count, \%codehash);
+cmpthese($result);
+
+#if (exists $codehash{PerlIO}) {
+#	cmpthese(-10, {
+#		'PerlIO::get_layers'  => sub { PerlIO::get_layers(*STDOUT); },
+#		'PerlIO::Util method' => sub { *STDOUT->get_layers(); },
+#	});
+#}
+
+1;
+__END__
+## EXAMPLE OUTPUT ##
+#
+## 1 x $test_data, PerlIO using *STDOUT->get_layers();
+# Benchmark: running PerlIO, SelectFH, TiedCapture for at least 10 CPU seconds...
+#      PerlIO:  9 wallclock secs (10.38 usr +  0.13 sys = 10.51 CPU) @  9676.31/s (n=101698)
+#    SelectFH: 12 wallclock secs (10.51 usr +  0.02 sys = 10.53 CPU) @ 12294.21/s (n=129458)
+# TiedCapture: 10 wallclock secs (10.24 usr +  0.06 sys = 10.30 CPU) @  9489.22/s (n=97739)
+#                Rate TiedCapture      PerlIO    SelectFH
+# TiedCapture  9489/s          --         -2%        -23%
+# PerlIO       9676/s          2%          --        -21%
+# SelectFH    12294/s         30%         27%          --
+#
+## 10 x $test_data, PerlIO using *STDOUT->get_layers();
+# Benchmark: running PerlIO, SelectFH, TiedCapture for at least 10 CPU seconds...
+#      PerlIO:  9 wallclock secs (10.47 usr +  0.07 sys = 10.54 CPU) @ 7558.35/s (n=79665)
+#    SelectFH: 11 wallclock secs (10.36 usr +  0.04 sys = 10.40 CPU) @ 8970.87/s (n=93297)
+# TiedCapture: 11 wallclock secs (10.45 usr +  0.02 sys = 10.47 CPU) @ 2602.77/s (n=27251)
+#               Rate TiedCapture      PerlIO    SelectFH
+# TiedCapture 2603/s          --        -66%        -71%
+# PerlIO      7558/s        190%          --        -16%
+# SelectFH    8971/s        245%         19%          --
+#
+## 100 x $test_data, PerlIO using *STDOUT->get_layers();
+# Benchmark: running PerlIO, SelectFH, TiedCapture for at least 50 CPU seconds...
+#      PerlIO: 67 wallclock secs (35.28 usr + 17.82 sys = 53.10 CPU) @ 832.41/s (n=44201)
+#    SelectFH: 73 wallclock secs (33.83 usr + 18.63 sys = 52.46 CPU) @ 830.06/s (n=43545)
+# TiedCapture: 71 wallclock secs (50.93 usr +  0.41 sys = 51.34 CPU) @  95.31/s (n=4893)
+#               Rate TiedCapture    SelectFH      PerlIO
+# TiedCapture 95.3/s          --        -89%        -89%
+# SelectFH     830/s        771%          --         -0%
+# PerlIO       832/s        773%          0%          --
+#
+## 100 x $test_data, PerlIO using mix of *STDOUT->get_layers() and PerlIO::get_layers(*STDOUT);
+# Capturing 100 x 1314 = 131400 characters
+# Benchmark: timing 25000 iterations of PerlIO, SelectFH, TiedCapture...
+#      PerlIO:  30 wallclock secs  (19.05 usr + 10.29 sys =  29.34 CPU) @ 852.08/s (n=25000)
+#    SelectFH:  30 wallclock secs  (18.95 usr + 10.26 sys =  29.21 CPU) @ 855.87/s (n=25000)
+# TiedCapture: 307 wallclock secs (267.37 usr +  2.95 sys = 270.32 CPU) @  92.48/s (n=25000)
+#               Rate TiedCapture      PerlIO    SelectFH
+# TiedCapture 92.5/s          --        -89%        -89%
+# PerlIO       852/s        821%          --         -0%
+# SelectFH     856/s        825%          0%          --
+#
+## 100 x $test_data (IO::CaptureOutput interferes with GitwebCache::Capture::PerlIO)
+# Capturing 100 x 1314 = 131400 characters
+# Benchmark: running IO::CaptureOutput, SelectFH, TiedCapture for at least 10 CPU seconds...
+# IO::CaptureOutput: 12 wallclock secs ( 5.12 usr +  5.63 sys = 10.75 CPU) @ 126.60/s (n=1361)
+#          SelectFH: 12 wallclock secs ( 6.93 usr +  3.45 sys = 10.38 CPU) @ 808.29/s (n=8390)
+#       TiedCapture: 11 wallclock secs (10.11 usr +  0.01 sys = 10.12 CPU) @ 103.26/s (n=1045)
+#                    Rate       TiedCapture IO::CaptureOutput          SelectFH
+# TiedCapture       103/s                --              -18%              -87%
+# IO::CaptureOutput 127/s               23%                --              -84%
+# SelectFH          808/s              683%              538%                --
+#
+## PerlIO::get_layers   == PerlIO::get_layers(*STDOUT)
+## PerlIU::Util method  == *STDOUT->get_layers()
+#                        Rate PerlIO::Util method  PerlIO::get_layers
+# PerlIO::Util method 54405/s                  --                -38%
+# PerlIO::get_layers  87672/s                 61%                  --
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 09/17] gitweb/lib - Cache captured output (using get/set)
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (7 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 08/17] gitweb/lib - Alternate ways of capturing output Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 10/17] gitweb: Add optional output caching (from gitweb/cache.pm) Jakub Narebski
                   ` (10 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Add GitwebCache::CacheOutput package, which contains cache_output
subroutine, and (re)exports capture_stop from GitwebCache::Capture.
The cache_output gets data from cache and prints it, or captures
output of provided subroutine (code reference), saves it to cache and
prints it.  It currently uses Cache::Cache compatibile (get, set)
interface to cache.  Capturing is currently (not configurable) done
using GitwebCache::Capture::SelectFH introduced in previous commit,
but any class derived from GitwebCache::Capture (like provided example
GitwebCache::Capture::TiedCapture and GitwebCache::Capture::PerlIO)
would work.

Gitweb would use cache_output to get page from cache, or to generate
page and save it to cache.  The capture_stop would be used in
die_error subroutine, because error pages would not be cached.

It is assumed that data is saved to cache _converted_, and should
therefore be read from cache and printed to STDOUT in ':raw' (binary)
mode.


Add t9503/test_cache_output.pl test, run as external test in
t9503-gitweb-caching.  It checks that cache_output behaves correctly,
namely that it saves and restores action output in cache, and that it
prints generated output or cached output.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/lib/GitwebCache/CacheOutput.pm |   62 +++++++++++++++++++++++++++++++
 t/t9503-gitweb-caching.sh             |    3 +
 t/t9503/test_cache_output.pl          |   66 +++++++++++++++++++++++++++++++++
 3 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm
 create mode 100755 t/t9503/test_cache_output.pl

diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
new file mode 100644
index 0000000..8195f0b
--- /dev/null
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -0,0 +1,62 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+#
+# This program is licensed under the GPLv2
+
+#
+# Capturing and caching (gitweb) output
+#
+
+# Capture output, save it in cache and print it, or retrieve it from
+# cache and print it.
+
+package GitwebCache::CacheOutput;
+
+use strict;
+use warnings;
+
+use GitwebCache::SimpleFileCache;
+use GitwebCache::Capture::SelectFH qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = qw(cache_output capture_stop);
+our %EXPORT_TAGS = (all => [ @EXPORT ]);
+
+# cache_output($cache, $key, $action_code);
+#
+# Attempts to get $key from $cache; if successful, prints the value.
+# Otherwise, calls $action_code, capture its output and use
+# the captured output as the new value for $key in $cache,
+# then print captured output.
+our $CAPTURE_CLASS = 'GitwebCache::Capture::SelectFH';
+
+sub cache_output {
+	my ($cache, $key, $code) = @_;
+
+	my $data = $cache->get($key);
+
+	# capture and cache output, if there was nothing in the cache
+	if (!defined $data) {
+		my $capture = $CAPTURE_CLASS;
+		setup_capture($capture);
+
+		# do not use 'capture_block' prototype
+		$data = &capture_block($code);
+		$cache->set($key, $data) if defined $data;
+	}
+
+	# print cached data
+	if (defined $data) {
+		# select() instead of STDOUT is here for t9503 test:
+		binmode select(), ':raw';
+		print $data;
+	}
+
+	return $data;
+}
+
+1;
+__END__
+# end of package GitwebCache::CacheOutput;
diff --git a/t/t9503-gitweb-caching.sh b/t/t9503-gitweb-caching.sh
index 73b3f5a..0afcc0c 100755
--- a/t/t9503-gitweb-caching.sh
+++ b/t/t9503-gitweb-caching.sh
@@ -32,4 +32,7 @@ test_external 'GitwebCache::SimpleFileCache Perl API (in gitweb/cache.pm)' \
 test_external 'GitwebCache::Capture Perl API (in gitweb/cache.pm)' \
 	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_capture_interface.pl
 
+test_external 'GitwebCache::CacheOutput Perl API (in gitweb/cache.pm)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_output.pl
+
 test_done
diff --git a/t/t9503/test_cache_output.pl b/t/t9503/test_cache_output.pl
new file mode 100755
index 0000000..96eedb5
--- /dev/null
+++ b/t/t9503/test_cache_output.pl
@@ -0,0 +1,66 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+
+# test source version
+use lib "$ENV{TEST_DIRECTORY}/../gitweb/lib";
+
+# ....................................................................
+
+# prototypes must be known at compile time, otherwise they do not work
+BEGIN { use_ok('GitwebCache::CacheOutput'); }
+
+# Test setting up $cache and $capture
+my $cache   = new_ok('GitwebCache::SimpleFileCache'   => [], 'The $cache  ');
+my $capture = new_ok('GitwebCache::Capture::SelectFH' => [], 'The $capture');
+
+# ......................................................................
+
+# Prepare for testing cache_output
+my $key = 'Key';
+my $action_output = <<'EOF';
+# This is data to be cached and shown
+EOF
+my $cached_output = <<"EOF";
+$action_output# (version recovered from cache)
+EOF
+sub action {
+	print $action_output;
+}
+
+# Catch output printed by cache_fetch
+# (only for 'print <sth>' and 'printf <sth>')
+sub capture_output_of_cache_output {
+	my $test_data = '';
+
+	open my $test_data_fh, '>', \$test_data;
+	my $oldfh = select($test_data_fh);
+
+	cache_output($cache, $key, \&action);
+
+	select($oldfh);
+	close $test_data_fh;
+
+	return $test_data;
+}
+
+# clean state
+$cache->set_expires_in(-1);
+$cache->remove($key);
+my $test_data;
+
+# first time (if there is no cache) generates cache entry
+$test_data = capture_output_of_cache_output();
+is($test_data, $action_output,        'action output is printed (generated)');
+is($cache->get($key), $action_output, 'action output is saved in cache (generated)');
+
+# second time (if cache is set/valid) reads from cache
+$cache->set($key, $cached_output);
+$test_data = capture_output_of_cache_output();
+is($test_data, $cached_output,        'action output is printed (from cache)');
+
+done_testing();
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 10/17] gitweb: Add optional output caching (from gitweb/cache.pm)
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (8 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 09/17] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 10/18] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
                   ` (9 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

This commit actually adds output caching to gitweb, as we have now
minimal features required for it in GitwebCache::SimpleFileCache
(a 'dumb' but fast file-based cache engine).  To enable cache you need
(at least) set $caching_enabled to true in gitweb config, and copy
gitweb/cache.pm alongside generated gitweb.cgi - this is described
in more detail in the new "Gitweb caching" section in gitweb/README.

Caching in theory can be done using any Perl module that implements
Cache::Cache compatibile get/set (method) interface.  The default is
to use GitwebCache::SimpleFileCache from gitweb/cache.pm.  Capturing
and caching output is done via cache_output subroutine from
GitwebCache::CacheOutput, also in gitweb/cache.pm.

The cache_output subroutine in GitwebCache::CacheOutput currently
captures output using GitwebCache::Capture::SelectFH package, which in
turn uses select(FILEHANDLE) to change default filehandle for output.
This means that a "print" or a "printf" (or a "write") in gitweb
source without a filehandle would be captured.  To change mode of
filehandle used for capturing correctly, "binmode STDOUT, <mode>" had
to be changed to "binmode select(), <mode>".

Capturing and caching is designed in such way that there is no
behaviour change if $caching_enabled is false.  If caching is not
enabled, then capturing is also turned off.

Enabling caching causes the following additional changes to gitweb
output:
* Disables content-type negotiation (choosing between 'text/html'
  mimetype and 'application/xhtml+xml') when caching, as there is no
  content-type negotiation done when retrieving page from cache.
  Use lowest common denominator of 'text/html' mimetype which can
  be used by all browsers.
* Disable optional timing info (how much time it took to generate the
  original page, and how many git commands it took), and in its place show
  unconditionally when page was originally generated (in GMT / UTC
  timezone).
* Disable 'blame_incremental' view, as it doesn't make sense without
  printing data as soon as it is generated (which would require tee-ing
  when capturing output for caching)... and it doesn't work currently
  anyway.

Add basic tests of caching support to t9500-gitweb-standalone-no-errors
test: set $caching_enabled to true and check for errors for first time
run (generating cache) and second time run (retrieving from cache) for a
single view - summary view for a project.


To be implemented (from original patch by J.H.):
* adaptive cache expiration, based on average system load
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/Makefile                        |    6 ++
 gitweb/README                          |   58 +++++++++++++++
 gitweb/gitweb.perl                     |  127 ++++++++++++++++++++++++++------
 t/t9500-gitweb-standalone-no-errors.sh |   19 +++++
 4 files changed, 187 insertions(+), 23 deletions(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index d2401e1..025060b 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -111,6 +111,12 @@ endif
 
 GITWEB_FILES += static/git-logo.png static/git-favicon.png
 
+# gitweb output caching
+GITWEB_MODULES += lib/GitwebCache/CacheOutput.pm
+GITWEB_MODULES += lib/GitwebCache/SimpleFileCache.pm
+GITWEB_MODULES += lib/GitwebCache/Capture.pm
+GITWEB_MODULES += lib/GitwebCache/Capture/SelectFH.pm
+
 GITWEB_REPLACE = \
 	-e 's|++GIT_VERSION++|$(GIT_VERSION)|g' \
 	-e 's|++GIT_BINDIR++|$(bindir)|g' \
diff --git a/gitweb/README b/gitweb/README
index 0e19be8..9b3e5d7 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -236,6 +236,11 @@ not include variables usually directly set during build):
    If server load exceed this value then return "503 Service Unavaliable" error.
    Server load is taken to be 0 if gitweb cannot determine its value.  Set it to
    undefined value to turn it off.  The default is 300.
+ * $caching_enabled
+   If true, gitweb would use caching to speed up generating response.
+   Currently supported is only output (response) caching.  See "Gitweb caching"
+   section below for details on how to configure and customize caching.
+   The default is false (caching is disabled).
 
 
 Projects list file format
@@ -308,6 +313,59 @@ You can use the following files in repository:
    descriptions.
 
 
+Gitweb caching
+~~~~~~~~~~~~~~
+
+Currently gitweb supports only output (HTTP response) caching, similar
+to the one used on http://git.kernel.org.  To turn it on, set 
+$caching_enabled variable to true value in gitweb config file, i.e.:
+
+   our $caching_enabled = 1;
+
+You can choose which caching engine should gitweb use by setting $cache
+variable to _initialized_ instance of cache interface, e.g.:
+
+   use CHI;
+   our $cache = CHI->new( driver => 'Memcached',
+   	servers => [ "10.0.0.15:11211", "10.0.0.15:11212" ],
+   	l1_cache => { driver => 'FastMmap', root_dir => '/var/cache/gitweb' }
+   );
+
+Alternatively you can set $cache variable to the name of cache class,
+e.g.:
+
+   use Cache::FileCache;
+   our $cache = 'Cache::FileCache';
+
+In this case caching engine should support Cache::Cache or CHI names for
+cache config (see below), and ignore unrecognized options.  Such caching
+engine should also implement (at least) ->get($key) and ->set($key, $data)
+methods (Cache::Cache and CHI compatible interface).
+
+If $cache is left unset (if it is left undefined), then gitweb would use
+GitwebCache::SimpleFileCache as caching engine.  This engine is 'dumb' (but
+fast) file based caching layer, currently without any support for cache size
+limiting, or even removing expired / grossly expired entries.  It has
+therefore the downside of requiring a huge amount of disk space if there are
+a number of repositories involved.  It is not uncommon for git.kernel.org to
+have on the order of 80G - 120G accumulate over the course of a few months.
+It is therefore recommended that the cache directory be periodically
+completely deleted; this operation is safe to perform.  Suggested mechanism
+(substitute $cachedir for actual path to gitweb cache):
+
+   # mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+
+Site-wide cache options are defined in %cache_options hash.  Those options
+apply only when $cache is unset (GitwebCache::SimpleFileCache is used), or
+if $cache is name of cache class (e.g. $cache = 'Cache::FileCache').  You
+can override cache options in gitweb config, e.g.:
+
+   $cache_options{'expires_in'} = 60; # 60 seconds = 1 minute
+
+Please read comments for %cache_options entries in gitweb/gitweb.perl for
+description of available cache options.
+
+
 Webserver configuration
 -----------------------
 
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index bb81e14..6772f6e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -236,6 +236,41 @@ our %avatar_size = (
 # Leave it undefined (or set to 'undef') to turn off load checking.
 our $maxload = 300;
 
+# This enables/disables the caching layer in gitweb.  Currently supported
+# is only output (response) caching, similar to the one used on git.kernel.org.
+our $caching_enabled = 0;
+# Set to _initialized_ instance of cache interface implementing (at least)
+# get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces).
+# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb'
+# (but fast) file based caching layer, currently without any support for
+# cache size limiting.  It is therefore recommended that the cache directory
+# be periodically completely deleted; this operation is safe to perform.
+# Suggested mechanism:
+# mv $cachedir $cachedir.flush && mkdir $cachedir && rm -rf $cachedir.flush
+our $cache;
+# You define site-wide cache options defaults here; override them with
+# $GITWEB_CONFIG as necessary.
+our %cache_options = (
+	# The location in the filesystem that will hold the root of the cache.
+	# This directory will be created as needed (if possible) on the first
+	# cache set.  Note that either this directory must exists and web server
+	# has to have write permissions to it, or web server must be able to
+	# create this directory.
+	# Possible values:
+	# * 'cache' (relative to gitweb),
+	# * File::Spec->catdir(File::Spec->tmpdir(), 'gitweb-cache'),
+	# * '/var/cache/gitweb' (FHS compliant, requires being set up),
+	'cache_root' => 'cache',
+	# The number of subdirectories deep to cache object item.  This should be
+	# large enough that no cache directory has more than a few hundred
+	# objects.  Each non-leaf directory contains up to 256 subdirectories
+	# (00-ff).  Must be larger than 0.
+	'cache_depth' => 1,
+	# The (global) expiration time for objects placed in the cache, in seconds.
+	'expires_in' => 20,
+);
+
+
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
 our %feature = (
@@ -995,7 +1030,37 @@ if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
     !$project) {
 	die_error(400, "Project needed");
 }
-$actions{$action}->();
+
+if ($caching_enabled) {
+	if (!eval { require GitwebCache::CacheOutput; 1; }) {
+		die_error(500, "Caching enabled and GitwebCache::CacheOutput not found");
+	}
+	GitwebCache::CacheOutput->import();
+
+	# $cache might be initialized (instantiated) cache, i.e. cache object,
+	# or it might be name of class, or it might be undefined
+	unless (defined $cache && ref($cache)) {
+		$cache ||= 'GitwebCache::SimpleFileCache';
+		$cache = $cache->new({
+			%cache_options,
+			#'cache_root' => '/tmp/cache',
+			#'cache_depth' => 2,
+			#'expires_in' => 20, # in seconds (CHI compatibile)
+			# (Cache::Cache compatibile initialization)
+			'default_expires_in' => $cache_options{'expires_in'},
+			# (CHI compatibile initialization)
+			'root_dir' => $cache_options{'cache_root'},
+			'depth' => $cache_options{'cache_depth'},
+		});
+	}
+
+	# human readable key identifying gitweb output
+	my $output_key = href(-replay => 1, -full => 1, -path_info => 0);
+
+	cache_output($cache, $output_key, $actions{$action});
+} else {
+	$actions{$action}->();
+}
 
 DONE_GITWEB:
 if (defined caller) {
@@ -3230,7 +3295,9 @@ sub git_header_html {
 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
 	# we have to do this because MSIE sometimes globs '*/*', pretending to
 	# support xhtml+xml but choking when it gets what it asked for.
-	if (defined $cgi->http('HTTP_ACCEPT') &&
+	# Disable content-type negotiation when caching (use mimetype good for all).
+	if (!$caching_enabled &&
+	    defined $cgi->http('HTTP_ACCEPT') &&
 	    $cgi->http('HTTP_ACCEPT') =~ m/(,|;|\s|^)application\/xhtml\+xml(,|;|\s|$)/ &&
 	    $cgi->Accept('application/xhtml+xml') != 0) {
 		$content_type = 'application/xhtml+xml';
@@ -3404,17 +3471,25 @@ sub git_footer_html {
 	}
 	print "</div>\n"; # class="page_footer"
 
-	if (defined $t0 && gitweb_check_feature('timed')) {
+	# timing info doesn't make much sense with output (response) caching,
+	# so when caching is enabled gitweb prints the time of page generation
+	if ((defined $t0 || $caching_enabled) &&
+	    gitweb_check_feature('timed')) {
 		print "<div id=\"generating_info\">\n";
-		print 'This page took '.
-		      '<span id="generating_time" class="time_span">'.
-		      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
-		      ' seconds </span>'.
-		      ' and '.
-		      '<span id="generating_cmd">'.
-		      $number_of_git_cmds.
-		      '</span> git commands '.
-		      " to generate.\n";
+		if ($caching_enabled) {
+			print 'This page was generated at '.
+			      gmtime( time() )." GMT\n";
+		} else {
+			print 'This page took '.
+			      '<span id="generating_time" class="time_span">'.
+			      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
+			      ' seconds </span>'.
+			      ' and '.
+			      '<span id="generating_cmd">'.
+			      $number_of_git_cmds.
+			      '</span> git commands '.
+			      " to generate.\n";
+		}
 		print "</div>\n"; # class="page_footer"
 	}
 
@@ -3423,8 +3498,8 @@ sub git_footer_html {
 	}
 
 	print qq!<script type="text/javascript" src="$javascript"></script>\n!;
-	if (defined $action &&
-	    $action eq 'blame_incremental') {
+	if (!$caching_enabled &&
+	    defined $action && $action eq 'blame_incremental') {
 		print qq!<script type="text/javascript">\n!.
 		      qq!startBlame("!. href(action=>"blame_data", -replay=>1) .qq!",\n!.
 		      qq!           "!. href() .qq!");\n!.
@@ -3465,6 +3540,10 @@ sub die_error {
 		500 => '500 Internal Server Error',
 		503 => '503 Service Unavailable',
 	);
+
+	# Do not cache error pages
+	capture_stop() if ($caching_enabled);
+
 	git_header_html($http_responses{$status}, undef, %opts);
 	print <<EOF;
 <div class="page_body">
@@ -4246,7 +4325,7 @@ sub git_patchset_body {
 	while ($patch_line) {
 
 		# parse "git diff" header line
-		if ($patch_line =~ m/^diff --git (\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\"|[^ "]*) (.*)$/) {
+		if ($patch_line =~ m/^diff --git (\"(?:[^\\\"]*(?:\\.[^\\\"]*)*)\"|[^ "]*) (.*)$/) {#"#"'
 			# $1 is from_name, which we do not use
 			$to_name = unquote($2);
 			$to_name =~ s!^b/!!;
@@ -5060,7 +5139,8 @@ sub git_tag {
 
 sub git_blame_common {
 	my $format = shift || 'porcelain';
-	if ($format eq 'porcelain' && $cgi->param('js')) {
+	if ($format eq 'porcelain' && $cgi->param('js') &&
+	    !$caching_enabled) {
 		$format = 'incremental';
 		$action = 'blame_incremental'; # for page title etc
 	}
@@ -5114,7 +5194,8 @@ sub git_blame_common {
 			or print "ERROR $!\n";
 
 		print 'END';
-		if (defined $t0 && gitweb_check_feature('timed')) {
+		if (!$caching_enabled &&
+		    defined $t0 && gitweb_check_feature('timed')) {
 			print ' '.
 			      Time::HiRes::tv_interval($t0, [Time::HiRes::gettimeofday()]).
 			      ' '.$number_of_git_cmds;
@@ -5134,7 +5215,7 @@ sub git_blame_common {
 		$formats_nav .=
 			$cgi->a({-href => href(action=>"blame", javascript=>0, -replay=>1)},
 			        "blame") . " (non-incremental)";
-	} else {
+	} elsif (!$caching_enabled) {
 		$formats_nav .=
 			$cgi->a({-href => href(action=>"blame_incremental", -replay=>1)},
 			        "blame") . " (incremental)";
@@ -5293,7 +5374,7 @@ sub git_blame {
 }
 
 sub git_blame_incremental {
-	git_blame_common('incremental');
+	git_blame_common(!$caching_enabled ? 'incremental' : undef);
 }
 
 sub git_blame_data {
@@ -5373,9 +5454,9 @@ sub git_blob_plain {
 			($sandbox ? 'attachment' : 'inline')
 			. '; filename="' . $save_as . '"');
 	local $/ = undef;
-	binmode STDOUT, ':raw';
+	binmode select(), ':raw';
 	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	binmode select(), ':utf8'; # as set at the beginning of gitweb.cgi
 	close $fd;
 }
 
@@ -5655,9 +5736,9 @@ sub git_snapshot {
 
 	open my $fd, "-|", $cmd
 		or die_error(500, "Execute git-archive failed");
-	binmode STDOUT, ':raw';
+	binmode select(), ':raw';
 	print <$fd>;
-	binmode STDOUT, ':utf8'; # as set at the beginning of gitweb.cgi
+	binmode select(), ':utf8'; # as set at the beginning of gitweb.cgi
 	close $fd;
 }
 
diff --git a/t/t9500-gitweb-standalone-no-errors.sh b/t/t9500-gitweb-standalone-no-errors.sh
index 63b6b06..aeb471d 100755
--- a/t/t9500-gitweb-standalone-no-errors.sh
+++ b/t/t9500-gitweb-standalone-no-errors.sh
@@ -647,4 +647,23 @@ test_expect_success \
 	 gitweb_run "p=.git;a=summary"'
 test_debug 'cat gitweb.log'
 
+# ----------------------------------------------------------------------
+# caching
+
+cat >>gitweb_config.perl <<\EOF
+$caching_enabled = 1;
+$cache_options{'expires_in'} = -1; # never expire cache for tests
+EOF
+rm -rf cache
+
+test_expect_success \
+	'caching enabled (project summary, first run)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
+test_expect_success \
+	'caching enabled (project summary, second run)' \
+	'gitweb_run "p=.git;a=summary"'
+test_debug 'cat gitweb.log'
+
 test_done
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 10/18] gitweb/lib - Cache captured output (using get/set)
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (9 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 10/17] gitweb: Add optional output caching (from gitweb/cache.pm) Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 11/17] gitweb/lib - Adaptive cache expiration time Jakub Narebski
                   ` (8 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Add GitwebCache::CacheOutput package, which contains cache_output
subroutine, and (re)exports capture_stop from GitwebCache::Capture.
The cache_output gets data from cache and prints it, or captures
output of provided subroutine (code reference), saves it to cache and
prints it.  It currently uses Cache::Cache compatibile (get, set)
interface to cache.  Capturing is currently (not configurable) done
using GitwebCache::Capture::SelectFH introduced in previous commit,
but any class derived from GitwebCache::Capture (like provided example
GitwebCache::Capture::TiedCapture and GitwebCache::Capture::PerlIO)
would work.

Gitweb would use cache_output to get page from cache, or to generate
page and save it to cache.  The capture_stop would be used in
die_error subroutine, because error pages would not be cached.

It is assumed that data is saved to cache _converted_, and should
therefore be read from cache and printed to STDOUT in ':raw' (binary)
mode.


Add t9503/test_cache_output.pl test, run as external test in
t9503-gitweb-caching.  It checks that cache_output behaves correctly,
namely that it saves and restores action output in cache, and that it
prints generated output or cached output.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/lib/GitwebCache/CacheOutput.pm |   62 +++++++++++++++++++++++++++++++
 t/t9503-gitweb-caching.sh             |    3 +
 t/t9503/test_cache_output.pl          |   66 +++++++++++++++++++++++++++++++++
 3 files changed, 131 insertions(+), 0 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/CacheOutput.pm
 create mode 100755 t/t9503/test_cache_output.pl

diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
new file mode 100644
index 0000000..8195f0b
--- /dev/null
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -0,0 +1,62 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+#
+# This program is licensed under the GPLv2
+
+#
+# Capturing and caching (gitweb) output
+#
+
+# Capture output, save it in cache and print it, or retrieve it from
+# cache and print it.
+
+package GitwebCache::CacheOutput;
+
+use strict;
+use warnings;
+
+use GitwebCache::SimpleFileCache;
+use GitwebCache::Capture::SelectFH qw(:all);
+
+use Exporter qw(import);
+our @EXPORT      = qw(cache_output capture_stop);
+our %EXPORT_TAGS = (all => [ @EXPORT ]);
+
+# cache_output($cache, $key, $action_code);
+#
+# Attempts to get $key from $cache; if successful, prints the value.
+# Otherwise, calls $action_code, capture its output and use
+# the captured output as the new value for $key in $cache,
+# then print captured output.
+our $CAPTURE_CLASS = 'GitwebCache::Capture::SelectFH';
+
+sub cache_output {
+	my ($cache, $key, $code) = @_;
+
+	my $data = $cache->get($key);
+
+	# capture and cache output, if there was nothing in the cache
+	if (!defined $data) {
+		my $capture = $CAPTURE_CLASS;
+		setup_capture($capture);
+
+		# do not use 'capture_block' prototype
+		$data = &capture_block($code);
+		$cache->set($key, $data) if defined $data;
+	}
+
+	# print cached data
+	if (defined $data) {
+		# select() instead of STDOUT is here for t9503 test:
+		binmode select(), ':raw';
+		print $data;
+	}
+
+	return $data;
+}
+
+1;
+__END__
+# end of package GitwebCache::CacheOutput;
diff --git a/t/t9503-gitweb-caching.sh b/t/t9503-gitweb-caching.sh
index 73b3f5a..0afcc0c 100755
--- a/t/t9503-gitweb-caching.sh
+++ b/t/t9503-gitweb-caching.sh
@@ -32,4 +32,7 @@ test_external 'GitwebCache::SimpleFileCache Perl API (in gitweb/cache.pm)' \
 test_external 'GitwebCache::Capture Perl API (in gitweb/cache.pm)' \
 	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_capture_interface.pl
 
+test_external 'GitwebCache::CacheOutput Perl API (in gitweb/cache.pm)' \
+	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_output.pl
+
 test_done
diff --git a/t/t9503/test_cache_output.pl b/t/t9503/test_cache_output.pl
new file mode 100755
index 0000000..96eedb5
--- /dev/null
+++ b/t/t9503/test_cache_output.pl
@@ -0,0 +1,66 @@
+#!/usr/bin/perl
+use lib (split(/:/, $ENV{GITPERLLIB}));
+
+use warnings;
+use strict;
+
+use Test::More;
+
+# test source version
+use lib "$ENV{TEST_DIRECTORY}/../gitweb/lib";
+
+# ....................................................................
+
+# prototypes must be known at compile time, otherwise they do not work
+BEGIN { use_ok('GitwebCache::CacheOutput'); }
+
+# Test setting up $cache and $capture
+my $cache   = new_ok('GitwebCache::SimpleFileCache'   => [], 'The $cache  ');
+my $capture = new_ok('GitwebCache::Capture::SelectFH' => [], 'The $capture');
+
+# ......................................................................
+
+# Prepare for testing cache_output
+my $key = 'Key';
+my $action_output = <<'EOF';
+# This is data to be cached and shown
+EOF
+my $cached_output = <<"EOF";
+$action_output# (version recovered from cache)
+EOF
+sub action {
+	print $action_output;
+}
+
+# Catch output printed by cache_fetch
+# (only for 'print <sth>' and 'printf <sth>')
+sub capture_output_of_cache_output {
+	my $test_data = '';
+
+	open my $test_data_fh, '>', \$test_data;
+	my $oldfh = select($test_data_fh);
+
+	cache_output($cache, $key, \&action);
+
+	select($oldfh);
+	close $test_data_fh;
+
+	return $test_data;
+}
+
+# clean state
+$cache->set_expires_in(-1);
+$cache->remove($key);
+my $test_data;
+
+# first time (if there is no cache) generates cache entry
+$test_data = capture_output_of_cache_output();
+is($test_data, $action_output,        'action output is printed (generated)');
+is($cache->get($key), $action_output, 'action output is saved in cache (generated)');
+
+# second time (if cache is set/valid) reads from cache
+$cache->set($key, $cached_output);
+$test_data = capture_output_of_cache_output();
+is($test_data, $cached_output,        'action output is printed (from cache)');
+
+done_testing();
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 11/17] gitweb/lib - Adaptive cache expiration time
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (10 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 10/18] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 12/17] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
                   ` (7 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Add to GitwebCache::SimpleFileCache support for adaptive lifetime
(cache expiration) control.  Cache lifetime can be increased or
decreased by any factor, e.g. load average, through the definition
of the 'check_load' callback.

Note that using ->set_expires_in, or unsetting 'check_load' via
->set_check_load(undef) turns off adaptive caching.

Make gitweb automatically adjust cache lifetime by load, using
get_loadavg() function.  Define and describe default parameters for
dynamic (adaptive) cache expiration time control.

There are some very basic tests of dynamic expiration time in t9503,
namely checking if dynamic expire time is within given upper and lower
bounds.

To be implemented (from original patch by J.H.):
* optional locking interface, where only one process can update cache
  (using flock)
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl                        |   27 +++++++++-
 gitweb/lib/GitwebCache/SimpleFileCache.pm |   80 +++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl           |   33 ++++++++++++
 3 files changed, 133 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 6772f6e..5ae5757 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -261,13 +261,36 @@ our %cache_options = (
 	# * File::Spec->catdir(File::Spec->tmpdir(), 'gitweb-cache'),
 	# * '/var/cache/gitweb' (FHS compliant, requires being set up),
 	'cache_root' => 'cache',
+
 	# The number of subdirectories deep to cache object item.  This should be
 	# large enough that no cache directory has more than a few hundred
 	# objects.  Each non-leaf directory contains up to 256 subdirectories
 	# (00-ff).  Must be larger than 0.
 	'cache_depth' => 1,
-	# The (global) expiration time for objects placed in the cache, in seconds.
-	'expires_in' => 20,
+
+	# The (global) minimum expiration time for objects placed in the cache,
+	# in seconds.  If the dynamic adaptive cache exporation time is lower
+	# than this number, we set cache timeout to this minimum.
+	'expires_min' => 20,   # 20 seconds
+
+	# The (global) maximum expiration time for dynamic (adaptive) caching
+	# algorithm, in seconds.  If the adaptive cache lifetime exceeds this
+	# number, we set cache timeout to this maximum.
+	# (If 'expires_min' >= 'expires_max', there is no adaptive cache timeout,
+	# and 'expires_min' is used as expiration time for objects in cache.)
+	'expires_max' => 1200, # 20 minutes
+
+	# Cache lifetime will be increased by applying this factor to the result
+	# from 'check_load' callback (see below).
+	'expires_factor' => 60, # expire time in seconds for 1.0 (100% CPU) load
+
+	# User supplied callback for deciding the cache policy, usually system
+	# load.  Multiplied by 'expires_factor' gives adaptive expiration time,
+	# in seconds, subject to the limits imposed by 'expires_min' and
+	# 'expires_max' bounds.  Set to undef (or delete) to turn off dynamic
+	# lifetime control.
+	# (Compatibile with Cache::Adaptive.)
+	'check_load' => \&get_loadavg,
 );
 
 
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 7c90350..e3548dc 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -61,6 +61,22 @@ our $DEFAULT_NAMESPACE = '';
 #    'expires_in' (CHI compatibile) [seconds]
 #    The expiration time for objects place in the cache.
 #    Defaults to -1 (never expire) if not explicitly set.
+#    Sets 'expires_min' to given value.
+#  * 'expires_min' [seconds]
+#    The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+#    Used as lower bound in adaptive cache lifetime / expiration.
+#    Defaults to 20 seconds; 'expires_in' sets it also.
+#  * 'expires_max' [seconds]
+#    The maximum expiration time for objects in cache.
+#    Used as upper bound in adaptive cache lifetime / expiration.
+#    Defaults to 1200 seconds, if not set; 
+#    defaults to 'expires_min' if 'expires_in' is used.
+#  * 'check_load'
+#    Subroutine (code) used for adaptive cache lifetime / expiration.
+#    If unset, adaptive caching is turned off; defaults to unset.
+#  * 'increase_factor' [seconds / 100% CPU load]
+#    Factor multiplying 'check_load' result when calculating cache lietime.
+#    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
@@ -68,7 +84,8 @@ sub new {
 	my $self  = {};
 	$self = bless($self, $class);
 
-	my ($root, $depth, $ns, $expires_in);
+	my ($root, $depth, $ns);
+	my ($expires_min, $expires_max, $increase_factor, $check_load);
 	if (defined $p_options_hash_ref) {
 		$root  =
 			$p_options_hash_ref->{'cache_root'} ||
@@ -77,19 +94,31 @@ sub new {
 			$p_options_hash_ref->{'cache_depth'} ||
 			$p_options_hash_ref->{'depth'};
 		$ns    = $p_options_hash_ref->{'namespace'};
-		$expires_in =
+		$expires_min =
+			$p_options_hash_ref->{'expires_min'} ||
 			$p_options_hash_ref->{'default_expires_in'} ||
 			$p_options_hash_ref->{'expires_in'};
+		$expires_max =
+			$p_options_hash_ref->{'expires_max'};
+		$increase_factor = $p_options_hash_ref->{'expires_factor'};
+		$check_load      = $p_options_hash_ref->{'check_load'};
 	}
 	$root  = $DEFAULT_CACHE_ROOT  unless defined($root);
 	$depth = $DEFAULT_CACHE_DEPTH unless defined($depth);
 	$ns    = $DEFAULT_NAMESPACE   unless defined($ns);
-	$expires_in = -1 unless defined($expires_in); # <0 means never
+	$expires_min = -1 unless defined($expires_min);
+	$expires_max = $expires_min
+		if (!defined($expires_max) && exists $p_options_hash_ref->{'expires_in'});
+	$expires_max = -1 unless (defined($expires_max));
+	$increase_factor = 60 unless defined($increase_factor);
 
 	$self->set_root($root);
 	$self->set_depth($depth);
 	$self->set_namespace($ns);
-	$self->set_expires_in($expires_in);
+	$self->set_expires_min($expires_min);
+	$self->set_expires_max($expires_max);
+	$self->set_increase_factor($increase_factor);
+	$self->set_check_load($check_load);
 
 	return $self;
 }
@@ -100,7 +129,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_in)) {
+foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -113,6 +142,47 @@ foreach my $i (qw(depth root namespace expires_in)) {
 	};
 }
 
+# ......................................................................
+# pseudo-accessors
+
+# returns adaptive lifetime of cache entry for given $key [seconds]
+sub get_expires_in {
+	my ($self) = @_;
+
+	# short-circuit
+	if (!defined $self->{'check_load'} ||
+	    $self->{'expires_max'} <= $self->{'expires_min'}) {
+		return $self->{'expires_min'};
+	}
+
+	my $expires_in =
+		#$self->{'expires_min'} +
+		$self->{'increase_factor'} * $self->check_load();
+
+	if ($expires_in < $self->{'expires_min'}) {
+		return $self->{'expires_min'};
+	} elsif ($expires_in > $self->{'expires_max'}) {
+		return $self->{'expires_max'};
+	}
+
+	return $expires_in;
+}
+
+# sets expiration time to $duration, turns off adaptive cache lifetime
+sub set_expires_in {
+	my ($self, $duration) = @_;
+
+	$self->{'expires_min'} = $self->{'expires_max'} = $duration;
+}
+
+# runs 'check_load' subroutine, for adaptive cache lifetime.
+# Note: check in caller that 'check_load' exists.
+sub check_load {
+	my $self = shift;
+	#return &{$self->{'check_load'}}();
+	return $self->{'check_load'}->();
+}
+
 # ----------------------------------------------------------------------
 # utility functions and methods
 
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index b1e9036..37c1f2b 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -97,4 +97,37 @@ subtest 'cache expiration' => sub {
 	done_testing();
 };
 
+# Test assertions for adaptive cache expiration
+#
+my $load = 0.0;
+sub load { return $load; }
+my $expires_min = 10;
+my $expires_max = 30;
+$cache->set_expires_in(-1);
+$cache->set_expires_min($expires_min);
+$cache->set_expires_max($expires_max);
+$cache->set_check_load(\&load);
+subtest 'adaptive cache expiration' => sub {
+	cmp_ok($cache->get_expires_min(), '==', $expires_min,
+	       '"expires min" set correctly');
+	cmp_ok($cache->get_expires_max(), '==', $expires_max,
+	       '"expires max" set correctly');
+
+	$load = 0.0;
+	cmp_ok($cache->get_expires_in(), '>=', $expires_min,
+	       '"expires in" bound from down for load=0');
+	cmp_ok($cache->get_expires_in(), '<=', $expires_max,
+	       '"expires in" bound from up   for load=0');
+	
+	$load = 1_000;
+	cmp_ok($cache->get_expires_in(), '>=', $expires_min,
+	       '"expires in" bound from down for heavy load');
+	cmp_ok($cache->get_expires_in(), '<=', $expires_max,
+	       '"expires in" bound from up   for heavy load');
+
+	done_testing();
+};
+
+$cache->set_expires_in(-1);
+
 done_testing();
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 12/17] gitweb/lib - Use CHI compatibile (compute method) caching interface
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (11 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 11/17] gitweb/lib - Adaptive cache expiration time Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 13/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
                   ` (6 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

If $cache provides CHI compatible ->compute($key, $code) method, use it
instead of Cache::Cache compatible ->get($key) and ->set($key, $data).

GitwebCache::SimpleFileCache provides 'compute' method, which currently
simply use 'get' and 'set' methods in proscribed manner.  Nevertheless
'compute' method can be more flexible in choosing when to refresh cache,
and which process is to refresh/(re)generate cache entry.  This method
would use (advisory) locking to prevent 'cache miss stampede' (aka
'stampeding herd') problem in the next commit.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/lib/GitwebCache/CacheOutput.pm |   31 +++++++++++++++++++++++++++++++
 1 files changed, 31 insertions(+), 0 deletions(-)

diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index 8195f0b..de4bd4d 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -35,6 +35,37 @@ our $CAPTURE_CLASS = 'GitwebCache::Capture::SelectFH';
 sub cache_output {
 	my ($cache, $key, $code) = @_;
 
+	if ($cache->can('compute')) {
+		#other solution: goto &cache_output_compute;
+		return cache_output_compute($cache, $key, $code);
+	} else {
+		#other solution: goto &cache_output_get_set;
+		return cache_output_get_set($cache, $key, $code);
+	}
+}
+
+# for $cache which can ->compute($key, $code)
+sub cache_output_compute {
+	my ($cache, $key, $code) = @_;
+
+	my $capture = $CAPTURE_CLASS;
+	setup_capture($capture);
+
+	my $data = $cache->compute($key, sub { &capture_block($code) });
+
+	if (defined $data) {
+		# select() instead of STDOUT is here for t9503 test:
+		binmode select(), ':raw';
+		print $data;
+	}
+
+	return $data;
+}
+
+# for $cache which can ->get($key) and ->set($key, $data)
+sub cache_output_get_set {
+	my ($cache, $key, $code) = @_;
+
 	my $data = $cache->get($key);
 
 	# capture and cache output, if there was nothing in the cache
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 13/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (12 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 12/17] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 14/17] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
                   ` (5 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

Create GitwebCache::FileCacheWithLocking module (class), derived from
GitwebCache::SimpleFileCache, where the ->compute($key, $code) method
use locking (via flock) to ensure that only one process would generate
data to update/fill-in cache; the rest would wait for the cache to
be (re)generated and would read data from cache.  If process that was
to (re)generate data dies or exits, one of the readers would take its
role.

Currently this feature can not be disabled via %cache_options,
although you can set $cache to 'GitwebCache::SimpleFileCache' instead.
Future new features (like: serving stale data while cache is being
regenerated, (re)generating cache in background, activity indicator)
all depend on locking.


A test in t9503 shows that in the case where there are two clients
trying to simultaneously access non-existent or stale cache entry,
(and generating data takes (artifically) a bit of time), if they are
using ->compute method the data is (re)generated once, as opposed to
if those clients are just using ->get/->set methods.

To be implemented (from original patch by J.H.):
* background building, and showing stale cache
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Note that there is slight inefficiency, in that filename for lockfile
depends on the filename for cache entry (it just adds '.lock' suffix),
but is recalculated from $key for both paths.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/Makefile                                |    1 +
 gitweb/README                                  |    6 +-
 gitweb/gitweb.perl                             |    4 +-
 gitweb/lib/GitwebCache/CacheOutput.pm          |    2 +-
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   95 ++++++++++++++
 t/t9503-gitweb-caching.sh                      |    2 +-
 t/t9503/test_cache_interface.pl                |  157 +++++++++++++++++++++++-
 7 files changed, 260 insertions(+), 7 deletions(-)
 create mode 100644 gitweb/lib/GitwebCache/FileCacheWithLocking.pm

diff --git a/gitweb/Makefile b/gitweb/Makefile
index 025060b..c736648 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -114,6 +114,7 @@ GITWEB_FILES += static/git-logo.png static/git-favicon.png
 # gitweb output caching
 GITWEB_MODULES += lib/GitwebCache/CacheOutput.pm
 GITWEB_MODULES += lib/GitwebCache/SimpleFileCache.pm
+GITWEB_MODULES += lib/GitwebCache/FileCacheWithLocking.pm
 GITWEB_MODULES += lib/GitwebCache/Capture.pm
 GITWEB_MODULES += lib/GitwebCache/Capture/SelectFH.pm
 
diff --git a/gitweb/README b/gitweb/README
index 9b3e5d7..7309e8e 100644
--- a/gitweb/README
+++ b/gitweb/README
@@ -342,8 +342,12 @@ cache config (see below), and ignore unrecognized options.  Such caching
 engine should also implement (at least) ->get($key) and ->set($key, $data)
 methods (Cache::Cache and CHI compatible interface).
 
+You can set $cache to 'GitwebCache::SimpleFileCache' if you don't want
+to use locking, but then some advanced features, like generating data in
+background, wouldn't work because they require locking.
+
 If $cache is left unset (if it is left undefined), then gitweb would use
-GitwebCache::SimpleFileCache as caching engine.  This engine is 'dumb' (but
+GitwebCache::FileCacheWithLocking as caching engine.  This engine is 'dumb' (but
 fast) file based caching layer, currently without any support for cache size
 limiting, or even removing expired / grossly expired entries.  It has
 therefore the downside of requiring a huge amount of disk space if there are
diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5ae5757..411eb0d 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -241,7 +241,7 @@ our $maxload = 300;
 our $caching_enabled = 0;
 # Set to _initialized_ instance of cache interface implementing (at least)
 # get($key) and set($key, $data) methods (Cache::Cache and CHI interfaces).
-# If unset, GitwebCache::SimpleFileCache would be used, which is 'dumb'
+# If unset, GitwebCache::FileCacheWithLocking would be used, which is 'dumb'
 # (but fast) file based caching layer, currently without any support for
 # cache size limiting.  It is therefore recommended that the cache directory
 # be periodically completely deleted; this operation is safe to perform.
@@ -1063,7 +1063,7 @@ if ($caching_enabled) {
 	# $cache might be initialized (instantiated) cache, i.e. cache object,
 	# or it might be name of class, or it might be undefined
 	unless (defined $cache && ref($cache)) {
-		$cache ||= 'GitwebCache::SimpleFileCache';
+		$cache ||= 'GitwebCache::FileCacheWithLocking';
 		$cache = $cache->new({
 			%cache_options,
 			#'cache_root' => '/tmp/cache',
diff --git a/gitweb/lib/GitwebCache/CacheOutput.pm b/gitweb/lib/GitwebCache/CacheOutput.pm
index de4bd4d..a397a45 100644
--- a/gitweb/lib/GitwebCache/CacheOutput.pm
+++ b/gitweb/lib/GitwebCache/CacheOutput.pm
@@ -17,7 +17,7 @@ package GitwebCache::CacheOutput;
 use strict;
 use warnings;
 
-use GitwebCache::SimpleFileCache;
+use GitwebCache::SimpleFileCache; # base class
 use GitwebCache::Capture::SelectFH qw(:all);
 
 use Exporter qw(import);
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
new file mode 100644
index 0000000..c91c0ee
--- /dev/null
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -0,0 +1,95 @@
+# gitweb - simple web interface to track changes in git repositories
+#
+# (C) 2006, John 'Warthog9' Hawley <warthog19@eaglescrag.net>
+# (C) 2010, Jakub Narebski <jnareb@gmail.com>
+#
+# This program is licensed under the GPLv2
+
+#
+# Gitweb caching engine, simple file-based cache, with locking
+#
+
+# Based on GitwebCache::SimpleFileCache, minimalistic cache that
+# stores data in the filesystem, without serialization.
+#
+# It uses file locks (flock) to have only one process generating data
+# and writing to cache, when using CHI interface ->compute() method.
+
+package GitwebCache::FileCacheWithLocking;
+use base qw(GitwebCache::SimpleFileCache);
+
+use strict;
+use warnings;
+
+use File::Path qw(mkpath);
+use Fcntl qw(:flock);
+
+# ......................................................................
+# constructor is inherited from GitwebCache::SimpleFileCache
+
+# ----------------------------------------------------------------------
+# utility functions and methods
+
+# Take an human readable key, and return path to be used for lockfile
+# Puts dirname of file path in second argument, if it is provided.
+sub get_lockname {
+	my $self = shift;
+
+	return $self->path_to_key(@_) . '.lock';
+}
+
+# ......................................................................
+# interface methods
+
+# $data = $cache->compute($key, $code);
+#
+# Combines the get and set operations in a single call.  Attempts to
+# get $key; if successful, returns the value.  Otherwise, calls $code
+# and uses the return value as the new value for $key, which is then
+# returned.
+#
+# Uses file locking to have only one process updating value for $key
+# to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
+sub compute {
+	my ($self, $key, $code) = @_;
+
+	my $data = $self->get($key);
+	return $data if defined $data;
+
+	my $dir;
+	my $lockfile = $self->get_lockname($key, \$dir);
+
+	# ensure that directory leading to lockfile exists
+	if (!-d $dir) {
+		eval { mkpath($dir, 0, 0777); 1 }
+			or die "Couldn't mkpath '$dir' for lockfile: $!";
+	}
+
+	# this loop is to protect against situation where process that
+	# acquired exclusive lock (writer) dies or exits (die_error)
+	# before writing data to cache
+	my $lock_state; # needed for loop condition
+	do {
+		open my $lock_fh, '+>', $lockfile
+			or die "Could't open lockfile '$lockfile': $!";
+		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
+		if ($lock_state) {
+			# acquired writers lock
+			$data = $code->($self, $key);
+			$self->set($key, $data);
+		} else {
+			# get readers lock
+			flock($lock_fh, LOCK_SH);
+			$data = $self->fetch($key);
+		}
+		# closing lockfile releases lock
+		close $lock_fh
+			or die "Could't close lockfile '$lockfile': $!";
+	} until (defined $data || $lock_state);
+	# repeat until we have data, or we tried generating data oneself and failed
+	return $data;
+}
+
+1;
+__END__
+# end of package GitwebCache::FileCacheWithLocking
diff --git a/t/t9503-gitweb-caching.sh b/t/t9503-gitweb-caching.sh
index 0afcc0c..56f5aa0 100755
--- a/t/t9503-gitweb-caching.sh
+++ b/t/t9503-gitweb-caching.sh
@@ -26,7 +26,7 @@ fi
 
 # ----------------------------------------------------------------------
 
-test_external 'GitwebCache::SimpleFileCache Perl API (in gitweb/cache.pm)' \
+test_external 'GitwebCache::*FileCache* Perl API (in gitweb/cache.pm)' \
 	"$PERL_PATH" "$TEST_DIRECTORY"/t9503/test_cache_interface.pl
 
 test_external 'GitwebCache::Capture Perl API (in gitweb/cache.pm)' \
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 37c1f2b..f4e2418 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -4,6 +4,7 @@ use lib (split(/:/, $ENV{GITPERLLIB}));
 use warnings;
 use strict;
 
+use IO::Handle;
 use Test::More;
 
 # test source version
@@ -11,8 +12,9 @@ use lib "$ENV{TEST_DIRECTORY}/../gitweb/lib";
 
 # Test creating a cache
 #
-BEGIN { use_ok('GitwebCache::SimpleFileCache'); }
-my $cache = new_ok('GitwebCache::SimpleFileCache');
+BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
+my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
 # Test that default values are defined
 #
@@ -130,4 +132,155 @@ subtest 'adaptive cache expiration' => sub {
 
 $cache->set_expires_in(-1);
 
+# Test 'stampeding herd' / 'cache miss stampede' problem
+# (probably should be run only if GIT_TEST_LONG)
+#
+my $slow_time = 1; # how many seconds to sleep in mockup of slow generation
+sub get_value_slow {
+	$call_count++;
+	sleep $slow_time;
+	return $value;
+}
+
+sub cache_get_set {
+	my ($cache, $key) = @_;
+
+	my $data = $cache->get($key);
+	if (!defined $data) {
+		$data = get_value_slow();
+		$cache->set($key, $data);
+	}
+
+	return $data;
+}
+
+sub cache_compute {
+	my ($cache, $key) = @_;
+
+	my $data = $cache->compute($key, \&get_value_slow);
+	return $data;
+}
+
+sub run_child {
+	my ($writer, $data) = @_;
+
+	print $writer "$call_count\0";
+	print $writer "$data\0";
+	$writer->flush();
+
+	exit 0;
+}
+
+sub run_parent {
+	my ($reader, $data, $child) = @_;
+
+	local $/ = "\0";
+
+	my @lines = <$reader>;
+	chomp @lines;
+
+	waitpid $child, 0;
+	close $reader;
+
+	my ($child_count, $child_data) = @lines;
+	return ($child_count, $child_data);
+}
+
+sub count_total {
+	my ($kid_fh, $data, $pid) = @_;
+
+	if ($pid) {
+		my ($child_count, $child_data) =
+			run_parent($kid_fh, $data, $pid);
+		$call_count += $child_count;
+
+	} else {
+		run_child(\*STDOUT, $data);
+	}
+}
+
+my ($pid, $kid_fh);
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 1
+		unless defined $pid;
+
+	my $data = cache_get_set($cache, $key);
+	count_total($kid_fh, $data, $pid);
+
+	cmp_ok($call_count, '==', 2, 'parallel get/set: get_value_slow() called twice');
+}
+
+$call_count = 0;
+$cache->remove($key);
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 1
+		unless defined $pid;
+
+	my $data = cache_compute($cache, $key);
+	count_total($kid_fh, $data, $pid);
+
+	cmp_ok($call_count, '==', 1, 'parallel compute: get_value_slow() called once');
+}
+
+
+# Test that it doesn't hang if get_action exits/dies
+#
+sub get_value_die {
+	$call_count++;
+	die "get_value_die\n";
+}
+
+sub cache_compute_catch {
+	my ($cache, $key) = @_;
+
+	eval { $cache->compute($key, \&get_value_die); };
+	my $eval_error = $@;
+	return $eval_error;
+}
+
+sub catch_errors {
+	my ($kid_fh, $eval_error, $pid) = @_;
+
+	my $child_eval_error;
+	if ($pid) {
+		(undef, $child_eval_error) =
+			run_parent($kid_fh, $eval_error, $pid);
+
+	} else {
+		run_child(\*STDOUT, $eval_error);
+	}
+
+	return ($eval_error, $child_eval_error);
+}
+
+$call_count = 0;
+my $result = eval {
+	$pid = open $kid_fh, '-|';
+ SKIP: {
+		skip "cannot fork: $!", 2
+			unless defined $pid;
+
+		local $SIG{ALRM} = sub { die "alarm\n"; };
+		alarm 4*$slow_time;
+
+		my $eval_error = cache_compute_catch($cache, 'No Key');
+		my $child_eval_error;
+
+		($eval_error, $child_eval_error) =
+			catch_errors($kid_fh, $eval_error, $pid);
+
+		is($eval_error,       "get_value_die\n", 'get_value_die() died in parent');
+		is($child_eval_error, "get_value_die\n", 'get_value_die() died in child');
+
+		alarm 0;
+	}
+};
+ok(!$@, 'no alarm call (neither process hung)');
+diag($@) if $@;
+
 done_testing();
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 14/17] gitweb/lib - Serve stale data when waiting for filling cache
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (13 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 13/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 15/17] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
                   ` (4 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

When process fails to acquire exclusive (writers) lock, then instead
of waiting for the other process to (re)generate and fill cache, serve
stale (expired) data from cache.  This is of course possible only if
there is some stale data in cache for given key.

This feature of GitwebCache::FileCacheWithLocking is used only for an
->update($key, $code) method.  It is controlled by 'max_lifetime'
cache parameter; you can set it to -1 to always serve stale data
if it exists, and you can set it to 0 (or any value smaller than
'expires_min') to turn this feature off.

This feature, as it is implemented currently, makes ->update() method a
bit asymmetric with respect to process that acquired writers lock and
those processes that didn't, which can be seen in the new test in t9503.
The process that is to regenerate (refresh) data in cache must wait for
the data to be generated in full before showing anything to client, while
the other processes show stale (expired) data immediately.  In order to
remove or reduce this asymmetry gitweb would need to employ one of the two
alternate solutions.  Either data should be (re)generated in background,
so that process that acquired writers lock would generate data in
background while serving stale data, or alternatively the process that
generates data should pass output to original STDOUT while capturing it
("tee" output).

When developing this feature, ->is_valid() method in base class
GitwebCache::SimpleFileCache acquired additional extra optional
parameter, where one can pass expire time instead of using whole-cache
global (adaptive) expire time.

To be implemented (from original patch by J.H.):
* background building,
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl                             |    8 ++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   94 +++++++++++++++++++++++-
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |    9 +-
 t/t9503/test_cache_interface.pl                |   58 ++++++++++++++-
 4 files changed, 160 insertions(+), 9 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 411eb0d..5aee9a1 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -291,6 +291,14 @@ our %cache_options = (
 	# lifetime control.
 	# (Compatibile with Cache::Adaptive.)
 	'check_load' => \&get_loadavg,
+
+	# Maximum cache file life, in seconds.  If cache entry lifetime exceeds
+	# this value, it wouldn't be served as being too stale when waiting for
+	# cache to be regenerated/refreshed, instead of trying to display
+	# existing cache date.
+	# Set it to -1 to always serve existing data if it exists,
+	# set it to 0 to turn off serving stale data - always wait.
+	'max_lifetime' => 5*60*60, # 5 hours
 );
 
 
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index c91c0ee..266114c 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -25,7 +25,88 @@ use File::Path qw(mkpath);
 use Fcntl qw(:flock);
 
 # ......................................................................
-# constructor is inherited from GitwebCache::SimpleFileCache
+# constructor
+
+# The options are set by passing in a reference to a hash containing
+# any of the following keys:
+#  * 'namespace'
+#    The namespace associated with this cache.  This allows easy separation of
+#    multiple, distinct caches without worrying about key collision.  Defaults
+#    to $DEFAULT_NAMESPACE.
+#  * 'cache_root' (Cache::FileCache compatibile),
+#    'root_dir' (CHI::Driver::File compatibile),
+#    The location in the filesystem that will hold the root of the cache.
+#    Defaults to $DEFAULT_CACHE_ROOT.
+#  * 'cache_depth' (Cache::FileCache compatibile),
+#    'depth' (CHI::Driver::File compatibile),
+#    The number of subdirectories deep to cache object item.  This should be
+#    large enough that no cache directory has more than a few hundred objects.
+#    Defaults to $DEFAULT_CACHE_DEPTH unless explicitly set.
+#  * 'default_expires_in' (Cache::Cache compatibile),
+#    'expires_in' (CHI compatibile) [seconds]
+#    The expiration time for objects place in the cache.
+#    Defaults to -1 (never expire) if not explicitly set.
+#    Sets 'expires_min' to given value.
+#  * 'expires_min' [seconds]
+#    The minimum expiration time for objects in cache (e.g. with 0% CPU load).
+#    Used as lower bound in adaptive cache lifetime / expiration.
+#    Defaults to 20 seconds; 'expires_in' sets it also.
+#  * 'expires_max' [seconds]
+#    The maximum expiration time for objects in cache.
+#    Used as upper bound in adaptive cache lifetime / expiration.
+#    Defaults to 1200 seconds, if not set; 
+#    defaults to 'expires_min' if 'expires_in' is used.
+#  * 'check_load'
+#    Subroutine (code) used for adaptive cache lifetime / expiration.
+#    If unset, adaptive caching is turned off; defaults to unset.
+#  * 'increase_factor' [seconds / 100% CPU load]
+#    Factor multiplying 'check_load' result when calculating cache lietime.
+#    Defaults to 60 seconds for 100% SPU load ('check_load' returning 1.0).
+#
+# (all the above are inherited from GitwebCache::SimpleFileCache)
+#
+#  * 'max_lifetime' [seconds]
+#    If it is greater than 0, and cache entry is expired but not older
+#    than it, serve stale data when waiting for cache entry to be 
+#    regenerated (refreshed).  Non-adaptive.
+#    Defaults to -1 (never expire / always serve stale).
+sub new {
+	my ($proto, $p_options_hash_ref) = @_;
+
+	my $class = ref($proto) || $proto;
+	my $self = $class->SUPER::new($p_options_hash_ref);
+
+	my ($max_lifetime);
+	if (defined $p_options_hash_ref) {
+		$max_lifetime =
+			$p_options_hash_ref->{'max_lifetime'} ||
+			$p_options_hash_ref->{'max_cache_lifetime'};
+	}
+	$max_lifetime = -1 unless defined($max_lifetime);
+
+	$self->set_max_lifetime($max_lifetime);
+
+	return $self;
+}
+
+# ......................................................................
+# accessors
+
+# http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
+
+# creates get_depth() and set_depth($depth) etc. methods
+foreach my $i (qw(max_lifetime)) {
+	my $field = $i;
+	no strict 'refs';
+	*{"get_$field"} = sub {
+		my $self = shift;
+		return $self->{$field};
+	};
+	*{"set_$field"} = sub {
+		my ($self, $value) = @_;
+		$self->{$field} = $value;
+	};
+}
 
 # ----------------------------------------------------------------------
 # utility functions and methods
@@ -78,9 +159,14 @@ sub compute {
 			$data = $code->($self, $key);
 			$self->set($key, $data);
 		} else {
-			# get readers lock
-			flock($lock_fh, LOCK_SH);
-			$data = $self->fetch($key);
+			# try to retrieve stale data
+			$data = $self->fetch($key)
+				if $self->is_valid($key, $self->get_max_lifetime());
+			if (!defined $data) {
+				# get readers lock if there is no stale data to serve
+				flock($lock_fh, LOCK_SH);
+				$data = $self->fetch($key);
+			}
 		}
 		# closing lockfile releases lock
 		close $lock_fh
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index e3548dc..5289894 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -337,12 +337,13 @@ sub remove {
 		or die "Couldn't remove file '$file': $!";
 }
 
-# $cache->is_valid($key)
+# $cache->is_valid($key[, $expires_in])
 #
 # Returns a boolean indicating whether $key exists in the cache
-# and has not expired (global per-cache 'expires_in').
+# and has not expired.  Uses global per-cache expires time, unless
+# passed optional $expires_in argument.
 sub is_valid {
-	my ($self, $key) = @_;
+	my ($self, $key, $expires_in) = @_;
 
 	my $path = $self->path_to_key($key);
 
@@ -353,7 +354,7 @@ sub is_valid {
 		or die "Couldn't stat file '$path': $!";
 
 	# expire time can be set to never
-	my $expires_in = $self->get_expires_in();
+	$expires_in = defined $expires_in ? $expires_in : $self->get_expires_in();
 	return 1 unless (defined $expires_in && $expires_in >= 0);
 
 	# is file expired?
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index f4e2418..185307e 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -13,7 +13,9 @@ use lib "$ENV{TEST_DIRECTORY}/../gitweb/lib";
 # Test creating a cache
 #
 BEGIN { use_ok('GitwebCache::FileCacheWithLocking'); }
-my $cache = new_ok('GitwebCache::FileCacheWithLocking');
+my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
+	'max_lifetime' => 0, # turn it off
+} ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
 # Test that default values are defined
@@ -283,4 +285,58 @@ my $result = eval {
 ok(!$@, 'no alarm call (neither process hung)');
 diag($@) if $@;
 
+# Test that cache returns stale data in existing but expired cache situation
+# (probably should be run only if GIT_TEST_LONG)
+#
+my $stale_value = 'Stale Value';
+$cache->set($key, $stale_value);
+$call_count = 0;
+sub run_compute_forked {
+	my ($pid, $kid_fh) = @_;
+
+	my $data = $cache->compute($key, \&get_value_slow);
+
+	my $child_data;
+	if ($pid) {
+		(undef, $child_data) =
+			run_parent($kid_fh, $data, $pid);
+
+	} else {
+		run_child(\*STDOUT, $data);
+	}
+
+	return ($data, $child_data);
+}
+$cache->set_expires_in(0);    # expire now
+$cache->set_max_lifetime(-1); # forever (always serve stale data)
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 2
+		unless defined $pid;
+
+	my ($data, $child_data) = run_compute_forked($pid, $kid_fh);
+
+	# returning stale data works
+	ok($data eq $stale_value || $child_data eq $stale_value,
+	   'stale data in at least one process when expired');
+
+	$cache->set_expires_in(-1); # never expire for next ->get
+	is($cache->get($key), $value, 'value got set correctly, even if stale data returned');
+}
+
+$cache->set_expires_in(0);   # expire now
+$cache->set_max_lifetime(0); # don't serve stale data
+$pid = open $kid_fh, '-|';
+SKIP: {
+	skip "cannot fork: $!", 2
+		unless defined $pid;
+
+	my ($data, $child_data) = run_compute_forked($pid, $kid_fh);
+
+	# no returning stale data
+	isnt($data,       $stale_value, 'no stale data in parent if configured');
+	isnt($child_data, $stale_value, 'no stale data in child  if configured');
+}
+$cache->set_expires_in(-1);
+
 done_testing();
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 15/17] gitweb/lib - Regenerate (refresh) cache in background
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (14 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 14/17] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
                   ` (3 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

This commit removes asymmetry in serving stale data (if it exists)
when regenerating cache in GitwebCache::FileCacheWithLocking.  The process
that acquired exclusive (writers) lock, and is therefore selected to
be the one that (re)generates data to fill the cache, can now generate
data in background, while serving stale data.

Those background processes are daemonized, i.e. detached from the main
process (the one returning data or stale data).  Otherwise there might be a
problem when gitweb is running as (part of) long-lived process, for example
from mod_perl (or in the future from FastCGI): it would leave unreaped
children as zombies (entries in process table).  We don't want to wait for
background process, and we can't set $SIG{CHLD} to 'IGNORE' in gitweb to
automatically reap child processes, because this interferes with using
  open my $fd, '-|', git_cmd(), 'param', ...
    or die_error(...)
  # read from <$fd>
  close $fd
    or die_error(...)
In the above code "close" for magic "-|" open calls waitpid...  and we
would would die with "No child processes".  Removing 'or die' would
possibly remove ability to react to other errors.

This feature can be enabled or disabled on demand via 'background_cache'
cache parameter.  It is turned on by default.


To be implemented (from original patch by J.H.):
* server-side progress indicator when waiting for filling cache,
  which in turn requires separating situations (like snapshots and
  other non-HTML responses) where we should not show 'please wait'
  message

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl                             |    9 ++++
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   61 +++++++++++++++++++----
 gitweb/lib/GitwebCache/SimpleFileCache.pm      |    3 +-
 t/t9503/test_cache_interface.pl                |   12 ++++-
 4 files changed, 70 insertions(+), 15 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 5aee9a1..2ca1ad7 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -299,6 +299,15 @@ our %cache_options = (
 	# Set it to -1 to always serve existing data if it exists,
 	# set it to 0 to turn off serving stale data - always wait.
 	'max_lifetime' => 5*60*60, # 5 hours
+
+	# This enables/disables background caching.  If it is set to true value,
+	# caching engine would return stale data (if it is not older than
+	# 'max_lifetime' seconds) if it exists, and launch process if regenerating
+	# (refreshing) cache into the background.  If it is set to false value,
+	# the process that fills cache must always wait for data to be generated.
+	# In theory this will make gitweb seem more responsive at the price of
+	# serving possibly stale data.
+	'background_cache' => 1,
 );
 
 
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index 266114c..bde1420 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -23,6 +23,7 @@ use warnings;
 
 use File::Path qw(mkpath);
 use Fcntl qw(:flock);
+use POSIX qw(setsid);
 
 # ......................................................................
 # constructor
@@ -70,21 +71,27 @@ use Fcntl qw(:flock);
 #    than it, serve stale data when waiting for cache entry to be 
 #    regenerated (refreshed).  Non-adaptive.
 #    Defaults to -1 (never expire / always serve stale).
+#  * 'background_cache' (boolean)
+#    This enables/disables regenerating cache in background process.
+#    Defaults to true.
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
 	my $class = ref($proto) || $proto;
 	my $self = $class->SUPER::new($p_options_hash_ref);
 
-	my ($max_lifetime);
+	my ($max_lifetime, $background_cache);
 	if (defined $p_options_hash_ref) {
 		$max_lifetime =
 			$p_options_hash_ref->{'max_lifetime'} ||
 			$p_options_hash_ref->{'max_cache_lifetime'};
+		$background_cache = $p_options_hash_ref->{'background_cache'};
 	}
 	$max_lifetime = -1 unless defined($max_lifetime);
+	$background_cache = 1 unless defined($background_cache);
 
 	$self->set_max_lifetime($max_lifetime);
+	$self->set_background_cache($background_cache);
 
 	return $self;
 }
@@ -95,7 +102,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime)) {
+foreach my $i (qw(max_lifetime background_cache)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -132,9 +139,10 @@ sub get_lockname {
 # Uses file locking to have only one process updating value for $key
 # to avoid 'cache miss stampede' (aka 'stampeding herd') problem.
 sub compute {
-	my ($self, $key, $code) = @_;
+	my ($self, $key, $coderef) = @_;
+	my ($data, $stale_data);
 
-	my $data = $self->get($key);
+	$data = $self->get($key);
 	return $data if defined $data;
 
 	my $dir;
@@ -153,16 +161,47 @@ sub compute {
 	do {
 		open my $lock_fh, '+>', $lockfile
 			or die "Could't open lockfile '$lockfile': $!";
+
 		$lock_state = flock($lock_fh, LOCK_EX | LOCK_NB);
 		if ($lock_state) {
-			# acquired writers lock
-			$data = $code->($self, $key);
-			$self->set($key, $data);
+			## acquired writers lock, have to generate data
+			my $pid;
+			if ($self->{'background_cache'}) {
+				# try to retrieve stale data
+				$stale_data = $self->fetch($key)
+					if $self->is_valid($key, $self->get_max_lifetime());
+
+				# fork if there is stale data, for background process
+				# to regenerate/refresh the cache (generate data)
+				$pid = fork() if (defined $stale_data);
+			}
+			if (!defined $pid || !$pid) {
+				## didn't fork, or are in background process
+
+				# daemonize background process, detaching it from parent
+				# see also Proc::Daemonize, Apache2::SubProcess
+				if (defined $pid) {
+					POSIX::setsid();  # or setpgrp(0, 0);
+					fork() && exit 0;
+				}
+
+				$data = $coderef->($self, $key);
+				$self->set($key, $data);
+
+				if (defined $pid) {
+					## in background process; parent will serve stale data
+					close $lock_fh
+						or die "Couldn't close lockfile '$lockfile' (background): $!";
+					exit 0;
+				}
+			}
+			
 		} else {
 			# try to retrieve stale data
-			$data = $self->fetch($key)
+			$stale_data = $self->fetch($key)
 				if $self->is_valid($key, $self->get_max_lifetime());
-			if (!defined $data) {
+
+			if (!defined $stale_data) {
 				# get readers lock if there is no stale data to serve
 				flock($lock_fh, LOCK_SH);
 				$data = $self->fetch($key);
@@ -171,9 +210,9 @@ sub compute {
 		# closing lockfile releases lock
 		close $lock_fh
 			or die "Could't close lockfile '$lockfile': $!";
-	} until (defined $data || $lock_state);
+	} until (defined $data || defined $stale_data || $lock_state);
 	# repeat until we have data, or we tried generating data oneself and failed
-	return $data;
+	return defined $data ? $data : $stale_data;
 }
 
 1;
diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 5289894..9eb4896 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -129,7 +129,8 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(depth root namespace expires_min expires_max increase_factor check_load)) {
+foreach my $i (qw(depth root namespace
+                  expires_min expires_max increase_factor check_load)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 185307e..667fb5e 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -18,6 +18,9 @@ my $cache = new_ok('GitwebCache::FileCacheWithLocking', [ {
 } ]);
 isa_ok($cache, 'GitwebCache::SimpleFileCache');
 
+# compute can fork, don't generate zombies
+#local $SIG{CHLD} = 'IGNORE';
+
 # Test that default values are defined
 #
 ok(defined $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
@@ -309,18 +312,21 @@ sub run_compute_forked {
 }
 $cache->set_expires_in(0);    # expire now
 $cache->set_max_lifetime(-1); # forever (always serve stale data)
+ok($cache->get_background_cache(), 'generate cache in background by default');
 $pid = open $kid_fh, '-|';
 SKIP: {
-	skip "cannot fork: $!", 2
+	skip "cannot fork: $!", 3
 		unless defined $pid;
 
 	my ($data, $child_data) = run_compute_forked($pid, $kid_fh);
 
 	# returning stale data works
-	ok($data eq $stale_value || $child_data eq $stale_value,
-	   'stale data in at least one process when expired');
+	is($data,       $stale_value, 'stale data in parent when expired');
+	is($child_data, $stale_value, 'stale data in child  when expired');
 
 	$cache->set_expires_in(-1); # never expire for next ->get
+	diag('waiting for background process to have time to set data');
+	sleep $slow_time; # wait for background process to have chance to set data
 	is($cache->get($key), $value, 'value got set correctly, even if stale data returned');
 }
 
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (15 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 15/17] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-14 16:08 ` [RFC PATCHv4 17/17] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
                   ` (2 subsequent siblings)
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

When there exist stale/expired (but not too stale) version of
(re)generated page in cache, gitweb returns stale version (and updates
cache in background, assuming 'background_cache' is set to true value).
When there is no stale version suitable to serve the client, currently
we have to wait for the data to be generated in full before showing it.
Add to GitwebCache::FileCacheWithLocking, via 'generating_info' callback,
the ability to show user some activity indicator / progress bar, to
show that we are working on generating data.

Gitweb itself uses "Generating..." page as activity indicator, which
redirects (via <meta http-equiv="Refresh" ...>) to refreshed version
of the page after the cache is filled (via trick of not closing page
and therefore not closing connection till data is available in cache,
checked by getting shared/readers lock on lockfile for cache entry).
The git_generating_data_html() subroutine, which is used by gitweb
to implement this feature, is highly configurable: you can choose
frequency of writing some data so that connection won't get closed,
and maximum time to wait for data in "Generating..." page (see comments
in %generating_options hash definition).

Currently git_generating_data_html() contains hardcoded "whitelist" of
actions for which such HTML "Generating..." page makes sense.


This implements final feature from the original gitweb output caching
patch by J.H.

Inspired-by-code-by: John 'Warthog9' Hawley <warthog9@kernel.org>
Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl                             |  127 +++++++++++++++++++++++-
 gitweb/lib/GitwebCache/FileCacheWithLocking.pm |   52 +++++++++-
 t/t9503/test_cache_interface.pl                |   61 +++++++++++
 3 files changed, 234 insertions(+), 6 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2ca1ad7..8d7540e 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -21,7 +21,7 @@ use CGI qw(:standard :escapeHTML -nosticky);
 use CGI::Util qw(unescape);
 use CGI::Carp qw(fatalsToBrowser set_message);
 use Encode;
-use Fcntl ':mode';
+use Fcntl qw(:mode :flock);
 use File::Find qw();
 use File::Basename qw(basename);
 
@@ -308,8 +308,32 @@ our %cache_options = (
 	# In theory this will make gitweb seem more responsive at the price of
 	# serving possibly stale data.
 	'background_cache' => 1,
-);
 
+	# Subroutine which would be called when gitweb has to wait for data to
+	# be generated (it can't serve stale data because there isn't any,
+	# or if it exists it is older than 'max_lifetime').  The default
+	# is to use git_generating_data_html(), which creates "Generating..."
+	# page, which would then redirect or redraw/rewrite the page when
+	# data is ready.
+	# Set it to `undef' to disable this feature.
+	#
+	# Such subroutine (if invoked from GitwebCache::SimpleFileCache)
+	# is passed the following parameters: $cache instance, human-readable
+	# $key to current page, and filehandle $lock_fh to lockfile.
+	'generating_info' => \&git_generating_data_html,
+);
+# You define site-wide options for "Generating..." page (if enabled) here
+# (which means that $cache_options{'generating_info'} is set to coderef);
+# override them with $GITWEB_CONFIG as necessary.
+our %generating_options = (
+	# The time between generating new piece of output to prevent from
+	# redirection before data is ready, i.e. time between printing each
+	# dot in activity indicator / progress info, in seconds.
+	'print_interval' => 2,
+	# Maximum time "Generating..." page would be present, waiting for data,
+	# before unconditional redirect, in seconds.
+	'timeout' => $cache_options{'expires_min'},
+);
 
 # You define site-wide feature defaults here; override them with
 # $GITWEB_CONFIG as necessary.
@@ -3324,6 +3348,105 @@ sub get_page_title {
 	return $title;
 }
 
+# creates "Generating..." page when caching enabled and not in cache
+sub git_generating_data_html {
+	my ($cache, $key, $lock_fh) = @_;
+
+	# whitelist of actions that should get "Generating..." page
+	unless ($action =~ /(?:blame(?:|_incremental) | blobdiff | blob |
+	                     commitdiff | commit | forks | heads | tags |
+	                     log | shortlog | history | search |
+	                     tag | tree | summary | project_list)/x) {
+		return;
+	}
+	# blacklist of actions that should not have "Generating..." page
+	#if ($action =~ /(?:atom | rss | opml |
+	#                 blob_plain | blobdiff_plain | commitdiff_plain |
+	#                 patch | patches |
+	#                 blame_data | search_help | object | project_index |
+	#                 snapshot/x) { # binary
+	#	return;
+	#}
+
+	# Stop capturing response, just in case (we should be not generating response)
+	#
+	capture_stop(); # or gitweb could use 'print $STDOUT' in place of 'print STDOUT'
+
+	my $title = "[Generating...] " . get_page_title();
+	# TODO: the following line of code duplicates the one
+	# in git_header_html, and it should probably be refactored.
+	my $mod_perl_version = $ENV{'MOD_PERL'} ? " $ENV{'MOD_PERL'}" : '';
+
+	# Use the trick that 'refresh' HTTP header equivalent (set via http-equiv)
+	# with timeout of 0 seconds would redirect as soon as page is finished.
+	# This "Generating..." redirect page should not be cached (externally).
+	print STDOUT $cgi->header(-type => 'text/html', -charset => 'utf-8',
+	                          -status=> '200 OK', -expires => 'now');
+	print STDOUT <<"EOF";
+<?xml version="1.0" encoding="utf-8"?>
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
+                      "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en-US" lang="en-US">
+<!-- git web interface version $version -->
+<!-- git core binaries version $git_version -->
+<head>
+<meta http-equiv="content-type" content="text/html; charset=utf-8" />
+<meta http-equiv="refresh" content="0" />
+<meta name="generator" content="gitweb/$version git/$git_version$mod_perl_version" />
+<meta name="robots" content="noindex, nofollow" />
+<title>$title</title>
+</head>
+<body>
+EOF
+
+	local $| = 1; # autoflush
+	print STDOUT 'Generating...';
+
+	my $total_time = 0;
+	my $interval = $generating_options{'print_interval'} || 1;
+	my $timeout  = $generating_options{'timeout'};
+	my $alarm_handler = sub {
+		local $! = 1;
+		print STDOUT '.';
+		$total_time += $interval;
+		if ($total_time > $timeout) {
+			die "timeout\n";
+		}
+	};
+	eval {
+		# check if we can use functions from Time::HiRes
+		if (defined $t0) {
+			local $SIG{ALRM} = $alarm_handler;
+			Time::HiRes::alarm($interval, $interval);
+		} else {
+			local $SIG{ALRM} = sub {
+				$alarm_handler->();
+				alarm($interval);
+			};
+			alarm($interval);
+		}
+		my $lock_acquired;
+		do {
+			# loop is needed here because SIGALRM (from 'alarm')
+			# can interrupt process of acquiring lock
+			$lock_acquired = flock($lock_fh, LOCK_SH); # blocking readers lock
+		} until ($lock_acquired);
+		alarm 0;
+	};
+	# It doesn't really matter if we got lock, or timed-out
+	# but we should re-throw unknown (unexpected) errors
+	die $@ if ($@ and $@ !~ /timeout/);
+
+	print STDOUT <<"EOF";
+
+</body>
+</html>
+EOF
+
+	exit 0;
+	#return;
+}
+
 sub git_header_html {
 	my $status = shift || "200 OK";
 	my $expires = shift;
diff --git a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
index bde1420..ad078e8 100644
--- a/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
+++ b/gitweb/lib/GitwebCache/FileCacheWithLocking.pm
@@ -74,24 +74,32 @@ use POSIX qw(setsid);
 #  * 'background_cache' (boolean)
 #    This enables/disables regenerating cache in background process.
 #    Defaults to true.
+#  * 'generating_info'
+#    Subroutine (code) called when process has to wait for cache entry
+#    to be (re)generated (when there is no not-too-stale data to serve
+#    instead), for other process (or bacground process).  It is passed
+#    $cache instance, $key, and opened $lock_fh filehandle to lockfile.
+#    Unset by default (which means no activity indicator).
 sub new {
 	my ($proto, $p_options_hash_ref) = @_;
 
 	my $class = ref($proto) || $proto;
 	my $self = $class->SUPER::new($p_options_hash_ref);
 
-	my ($max_lifetime, $background_cache);
+	my ($max_lifetime, $background_cache, $generating_info);
 	if (defined $p_options_hash_ref) {
 		$max_lifetime =
 			$p_options_hash_ref->{'max_lifetime'} ||
 			$p_options_hash_ref->{'max_cache_lifetime'};
 		$background_cache = $p_options_hash_ref->{'background_cache'};
+		$generating_info  = $p_options_hash_ref->{'generating_info'};
 	}
 	$max_lifetime = -1 unless defined($max_lifetime);
 	$background_cache = 1 unless defined($background_cache);
 
 	$self->set_max_lifetime($max_lifetime);
 	$self->set_background_cache($background_cache);
+	$self->set_generating_info($generating_info);
 
 	return $self;
 }
@@ -102,7 +110,7 @@ sub new {
 # http://perldesignpatterns.com/perldesignpatterns.html#AccessorPattern
 
 # creates get_depth() and set_depth($depth) etc. methods
-foreach my $i (qw(max_lifetime background_cache)) {
+foreach my $i (qw(max_lifetime background_cache generating_info)) {
 	my $field = $i;
 	no strict 'refs';
 	*{"get_$field"} = sub {
@@ -115,6 +123,17 @@ foreach my $i (qw(max_lifetime background_cache)) {
 	};
 }
 
+# $cache->generating_info($key, $lock);
+# runs 'generating_info' subroutine, for activity indicator,
+# checking if it is defined first.
+sub generating_info {
+	my $self = shift;
+
+	if (defined $self->{'generating_info'}) {
+		$self->{'generating_info'}->($self, @_);
+	}
+}
+
 # ----------------------------------------------------------------------
 # utility functions and methods
 
@@ -173,7 +192,8 @@ sub compute {
 
 				# fork if there is stale data, for background process
 				# to regenerate/refresh the cache (generate data)
-				$pid = fork() if (defined $stale_data);
+				$pid = fork()
+					if (defined $stale_data || $self->{'generating_info'});
 			}
 			if (!defined $pid || !$pid) {
 				## didn't fork, or are in background process
@@ -190,18 +210,42 @@ sub compute {
 
 				if (defined $pid) {
 					## in background process; parent will serve stale data
+					## or show activity indicator, and serve data
 					close $lock_fh
 						or die "Couldn't close lockfile '$lockfile' (background): $!";
 					exit 0;
 				}
+
+			} else {
+				## forked, in parent process
+
+				# provide "generating page..." info if there is no stale data to serve
+				# might exit, or force web browser to do redirection (refresh)
+				if (!defined $stale_data) {
+					# lock can get inherited across forks; unlock
+					# flock($lock_fh, LOCK_UN); # <-- this doesn't work
+					close $lock_fh
+						or die "Couldn't close lockfile '$lockfile' for reopen: $!";
+					open $lock_fh, '<', $lockfile
+						or die "Couldn't reopen (for reading) lockfile '$lockfile': $!";
+
+					$self->generating_info($key, $lock_fh);
+					# generating info may exit, so we can not get there
+					# wait for and get data from background process
+					flock($lock_fh, LOCK_SH);
+					$data = $self->fetch($key);
+				}
 			}
-			
+
 		} else {
 			# try to retrieve stale data
 			$stale_data = $self->fetch($key)
 				if $self->is_valid($key, $self->get_max_lifetime());
 
 			if (!defined $stale_data) {
+				# there is no stale data to serve
+				# provide "generating page..." info
+				$self->generating_info($key, $lock_fh);
 				# get readers lock if there is no stale data to serve
 				flock($lock_fh, LOCK_SH);
 				$data = $self->fetch($key);
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index 667fb5e..a84faf9 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -345,4 +345,65 @@ SKIP: {
 }
 $cache->set_expires_in(-1);
 
+# Test 'generating_info' feature
+#
+$cache->remove($key);
+my $progress_info = "Generating...";
+sub test_generating_info {
+	local $| = 1;
+	print "$progress_info";
+}
+$cache->set_generating_info(\&test_generating_info);
+# Catch output printed by ->compute
+# (only for 'print <sth>' and 'printf <sth>')
+sub capture_compute {
+	my $output = '';
+
+	open my $output_fh, '>', \$output;
+	my $oldfh = select($output_fh);
+
+	my $data = $cache->compute($key, \&get_value_slow);
+
+	select($oldfh);
+	close $output_fh;
+
+	return ($output, $data);
+}
+sub run_capture_compute_forked {
+	my $pid = shift;
+
+	my ($output, $data) = capture_compute();
+	my ($child_output, $child_data);
+
+	if ($pid) {
+		local $/ = "\0";
+		chomp($child_output = <$kid_fh>);
+		chomp($child_data   = <$kid_fh>);
+
+		waitpid $pid, 0;
+		close $kid_fh;
+	} else {
+		local $| = 1;
+		$output = '' unless defined $output;
+		$data   = '' unless defined $data;
+		print "$output\0$data\0";
+		exit 0;
+	}
+
+	return ($output, $data, $child_output, $child_data);
+}
+SKIP: {
+	$pid = open $kid_fh, '-|';
+	skip "cannot fork: $!", 4
+		unless defined $pid;
+
+	my ($output, $data, $child_output, $child_data) =
+		run_capture_compute_forked($pid);
+
+	is($output,       $progress_info, 'progress info from parent');
+	is($child_output, $progress_info, 'progress info from child');
+	is($data,         $value,         'data info from parent');
+	is($child_data,   $value,         'data info from child');
+}
+
 done_testing();
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 17/17] gitweb: Add startup delay to activity indicator for cache
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (16 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
@ 2010-06-14 16:08 ` Jakub Narebski
  2010-06-24  7:56 ` [RFC PATCHv4 18/17] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
  2010-06-24  7:56 ` [RFC PATCHv4 19/17] gitweb: Add beginnings of cache administration page (WIP) Jakub Narebski
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-14 16:08 UTC (permalink / raw)
  To: git
  Cc: Pavan Kumar Sunkara, Petr Baudis, Christian Couder,
	John 'Warthog9' Hawley, John 'Warthog9' Hawley,
	Jakub Narebski

This adds support for [optional] startup delay to git_generating_data_html()
subroutine, which is used to provide "Generating..." page as activity
indicator when waiting for page to be generated if caching.  If the data
(page contents) gets generated within $generating_options{'staryp_delay'}
seconds, the "Generating..." page won't get displayed.

This feature was created in response to complaint by Petr 'Pasky' Baudis'
about "Generating..." feature.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
 gitweb/gitweb.perl |   20 ++++++++++++++++++++
 1 files changed, 20 insertions(+), 0 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 8d7540e..313114c 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -326,6 +326,9 @@ our %cache_options = (
 # (which means that $cache_options{'generating_info'} is set to coderef);
 # override them with $GITWEB_CONFIG as necessary.
 our %generating_options = (
+	# The delay before displaying "Generating..." page, in seconds.  It is
+	# intended for "Generating..." page to be shown only when really needed.
+	'startup_delay' => 1,
 	# The time between generating new piece of output to prevent from
 	# redirection before data is ready, i.e. time between printing each
 	# dot in activity indicator / progress info, in seconds.
@@ -3368,6 +3371,23 @@ sub git_generating_data_html {
 	#	return;
 	#}
 
+	# Initial delay
+	if ($generating_options{'startup_delay'} > 0) {
+		eval {
+			local $SIG{ALRM} = sub { die "alarm clock restart\n" }; # NB: \n required
+			alarm $generating_options{'startup_delay'};
+			flock($lock_fh, LOCK_SH); # blocking readers lock
+			alarm 0;
+		};
+		if ($@) {
+			# propagate unexpected errors
+			die $@ if $@ !~ /alarm clock restart/;
+		} else {
+			# we got response within 'startup_delay' timeout
+			return;
+		}
+	}
+
 	# Stop capturing response, just in case (we should be not generating response)
 	#
 	capture_stop(); # or gitweb could use 'print $STDOUT' in place of 'print STDOUT'
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 18/17] gitweb/lib - Add clear() and size() methods to caching interface
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (17 preceding siblings ...)
  2010-06-14 16:08 ` [RFC PATCHv4 17/17] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
@ 2010-06-24  7:56 ` Jakub Narebski
  2010-06-24  7:56 ` [RFC PATCHv4 19/17] gitweb: Add beginnings of cache administration page (WIP) Jakub Narebski
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-24  7:56 UTC (permalink / raw)
  To: git
  Cc: Petr Baudis, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley, Jakub Narebski

Add ->size() method, which following Cache::Cache interface returns
estimated total size of all entries in whole cache (in the namsepace
assiciated with give cache instance).  Note that ->get_size($key)
returns size of a single entry!

Add ->clear() method, which removes all entries from the namespace
associated with given cache instance.  For safety it requires
namespace to be set to true value, which means that it cannot be
empty; therefore default namespace is changed to 'gitweb'.

The ->clear() method should be fairly safe, because it first renames
directory (which should be atomic), and only then removes it
(following code from CGI::Driver::File).

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is new patch, implementing methods that allow for rudimentary
gitweb cache administration.  For now it is only safe removing whole
cache, in the future it could include removing only stale (expired)
entries.

This is "backend" patch.

 gitweb/lib/GitwebCache/SimpleFileCache.pm |   51 ++++++++++++++++++++++++++--
 t/t9503/test_cache_interface.pl           |    1 -
 2 files changed, 47 insertions(+), 5 deletions(-)

diff --git a/gitweb/lib/GitwebCache/SimpleFileCache.pm b/gitweb/lib/GitwebCache/SimpleFileCache.pm
index 9eb4896..457bbe7 100644
--- a/gitweb/lib/GitwebCache/SimpleFileCache.pm
+++ b/gitweb/lib/GitwebCache/SimpleFileCache.pm
@@ -20,8 +20,9 @@ package GitwebCache::SimpleFileCache;
 use strict;
 use warnings;
 
-use File::Path qw(mkpath);
-use File::Temp qw(tempfile);
+use File::Path qw(mkpath rmtree);
+use File::Temp qw(tempfile mktemp);
+use File::Find qw(find);
 use Digest::MD5 qw(md5_hex);
 
 # by default, the cache nests all entries on the filesystem single
@@ -37,7 +38,7 @@ our $DEFAULT_CACHE_ROOT = "cache";
 # by default we don't use cache namespace (empty namespace);
 # empty namespace does not allow for simple implementation of clear() method.
 #
-our $DEFAULT_NAMESPACE = '';
+our $DEFAULT_NAMESPACE = "gitweb";
 
 # ......................................................................
 # constructor
@@ -202,6 +203,9 @@ sub path_to_namespace {
 	return $self->{'path_to_namespace'};
 }
 
+# $path = $cache->path_to_key($key);
+# $path = $cache->path_to_key($key, \$dir);
+#
 # Take an human readable key, and return file path.
 # Puts dirname of file path in second argument, if it is provided.
 sub path_to_key {
@@ -321,7 +325,7 @@ sub get_size {
 }
 
 # ......................................................................
-# interface methods
+# interface methods dealing with single item
 
 # Removing and expiring
 
@@ -412,6 +416,45 @@ sub compute {
 	return $data;
 }
 
+# ......................................................................
+# interface methods dealing with whole namespace
+
+# $cache->clear();
+#
+# Remove all entries from the namespace.
+# Namespace must be defined and not empty.
+sub clear {
+	my $self = shift;
+
+	return unless $self->get_namespace();
+
+	my $namespace_dir = $self->path_to_namespace();
+	return if !-d $namespace_dir;
+
+	my $renamed_dir = mktemp($namespace_dir . '.XXXX');
+	rename($namespace_dir, $renamed_dir);
+	rmtree($renamed_dir);
+	die "Couldn't remove '$renamed_dir' directory"
+		if -d $renamed_dir;
+}
+
+# $size = $cache->size();
+#
+# Size of whole names (or whole cache if namespace empty)
+sub size {
+	my $self = shift;
+
+	my $namespace_dir = $self->path_to_namespace();
+	return if !-d $namespace_dir;
+
+	my $total_size = 0;
+	my $add_size = sub { $total_size += -s $_ };
+
+	File::Find::find({ wanted => $add_size, no_chdir => 1 }, $namespace_dir);
+
+	return $total_size;
+}
+
 1;
 __END__
 # end of package GitwebCache::SimpleFileCache;
diff --git a/t/t9503/test_cache_interface.pl b/t/t9503/test_cache_interface.pl
index a84faf9..0a18f77 100755
--- a/t/t9503/test_cache_interface.pl
+++ b/t/t9503/test_cache_interface.pl
@@ -35,7 +35,6 @@ SKIP: {
 		unless ($GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT &&
 		        $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH);
 
-	is($cache->get_namespace(), '', "default namespace is ''");
 	cmp_ok($cache->get_root(),  'eq', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT,
 		"default cache root is '$GitwebCache::SimpleFileCache::DEFAULT_CACHE_ROOT'");
 	cmp_ok($cache->get_depth(), '==', $GitwebCache::SimpleFileCache::DEFAULT_CACHE_DEPTH,
-- 
1.7.0.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread

* [RFC PATCHv4 19/17] gitweb: Add beginnings of cache administration page (WIP)
  2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
                   ` (18 preceding siblings ...)
  2010-06-24  7:56 ` [RFC PATCHv4 18/17] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
@ 2010-06-24  7:56 ` Jakub Narebski
  19 siblings, 0 replies; 21+ messages in thread
From: Jakub Narebski @ 2010-06-24  7:56 UTC (permalink / raw)
  To: git
  Cc: Petr Baudis, John 'Warthog9' Hawley,
	John 'Warthog9' Hawley, Jakub Narebski

Currently it shows estimated size of cache, and provides link to
clearing cache.

Cache administration page is visible only on local computer; the same
is true with respect to ability to clear cache.  Those are bare
beginnings of autorization framework.

Signed-off-by: Jakub Narebski <jnareb@gmail.com>
---
This is "frontend" (interface) patch for gitweb cache management.

This is very much work in progress, sent to git mailing list as a
proof of concept.  Would something like that (perhaps with more
advanced authorization for admin) be useful?

The 'cache' page looks like this:

  Cache location      Size	 
  ------------------+---------+--------------
  cache/gitweb        14 KiB	[Clear cache]
  ------------------+---------+--------------

where '[Clear cache]' is a submit button.

 gitweb/gitweb.perl |  131 +++++++++++++++++++++++++++++++++++++++++++++++++---
 1 files changed, 124 insertions(+), 7 deletions(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 313114c..deb0553 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -24,8 +24,13 @@ use Encode;
 use Fcntl qw(:mode :flock);
 use File::Find qw();
 use File::Basename qw(basename);
+use POSIX;
 
 binmode STDOUT, ':utf8';
 
 our $t0;
@@ -781,6 +786,19 @@ our %actions = (
 	"project_index" => \&git_project_index,
 );
 
+our %uncacheable_actions = ();
+# some actions are available only conditionally
+if ($caching_enabled) {
+	$actions{'cache'} = \&git_cache_admin;
+	$actions{'clear_cache'} = \&git_cache_clear;
+
+	map { $uncacheable_actions{$_} = 1 } qw(cache clear_cache);
+}
+sub is_cacheable {
+	my $action = shift;
+	return !$uncacheable_actions{$action};
+}
+
 # finally, we have the hash of allowed extra_options for the commands that
 # allow them
 our %allowed_options = (
@@ -1093,7 +1111,7 @@ if (!defined $action) {
 if (!defined($actions{$action})) {
 	die_error(400, "Unknown action");
 }
-if ($action !~ m/^(?:opml|project_list|project_index)$/ &&
+if ($action !~ m/^(?:opml|project_list|project_index|cache|clear_cache)$/ &&
     !$project) {
 	die_error(400, "Project needed");
 }
@@ -1102,12 +1120,20 @@ if ($caching_enabled) {
 	# $cache might be initialized (instantiated) cache, i.e. cache object,
 	# or it might be name of class, or it might be undefined
 	unless (defined $cache && ref($cache)) {
-		$cache ||= 'GitwebCache::FileCacheWithLocking';
+		if (eval { require GitwebCache::FileCacheWithLocking; 1; }) {
+			$cache ||= 'GitwebCache::FileCacheWithLocking';
+		} else {
+			# GitwebCache::CacheOutput loads GitwebCache::SimpleFileCache
+			$cache ||= 'GitwebCache::SimpleFileCache';
+		}
 		$cache = $cache->new({
 			%cache_options,
 			#'cache_root' => '/tmp/cache',
@@ -1120,7 +1146,9 @@ if ($caching_enabled) {
 			'depth' => $cache_options{'cache_depth'},
 		});
 	}
-
+}
+if ($caching_enabled &&
+    is_cacheable($action)) {
 	# human readable key identifying gitweb output
 	my $output_key = href(-replay => 1, -full => 1, -path_info => 0);
 
@@ -3472,7 +3500,7 @@ sub git_header_html {
 	my $expires = shift;
 	my %opts = @_;
 
-	my $title = get_page_title();
+	my $title = $opts{'-title'} || get_page_title();
 	my $content_type;
 	# require explicit support from the UA if we are to send the page as
 	# 'application/xhtml+xml', otherwise send it as plain old 'text/html'.
@@ -7118,3 +7146,92 @@ XML
 </opml>
 XML
 }
+
+# see Number::Bytes::Human
+sub human_readable_size {
+	my $bytes = shift || return;
+
+	my @units = ('', 'KiB', 'MiB', 'GiB', 'TiB');
+	my $block = 1024;
+
+	my $x = $bytes;
+	my $unit;
+	foreach (@units) {
+		$unit = $_, last if POSIX::ceil($x) < $block;
+		$x /= $block;
+	}
+
+	my $num;
+	if ($x < 10.0) {
+		$num = sprintf("%.1f", POSIX::ceil($x*10)/10); 
+	} else {
+		$num = sprintf("%d", POSIX::ceil($x));
+	}
+
+	return "$num $unit";
+}
+
+sub cache_admin_auth_ok {
+	return $ENV{'SERVER_ADDR'} eq $ENV{'REMOTE_ADDR'};
+}
+
+sub git_cache_admin {
+	$caching_enabled
+		or die_error(403, "Caching disabled");
+	cache_admin_auth_ok()
+		or die_error(403, "Cache administration not allowed");
+	$cache && ref($cache)
+		or die_error(500, "Cache is not present");
+
+	git_header_html(undef, undef,
+		-title => to_utf8($site_name) . " - Gitweb cache");
+
+	print <<'EOF_HTML';
+<table class="cache_admin">
+<tr><th>Cache location</th><th>Size</th><th>&nbsp;</th></tr>
+EOF_HTML
+	print '<tr class="light">' .
+	      '<td class="path">' . esc_path($cache->path_to_namespace()) . '</td>' .
+	      '<td>';
+	my $size;
+	$size = $cache->size()     if (!defined $size && $cache->can('size'));
+	$size = $cache->get_size() if (!defined $size && $cache->can('get_size'));
+	if (defined $size) {
+		print human_readable_size($size);
+	}
+	print '</td><td>';
+	if ($cache->can('clear')) {
+		print $cgi->start_form({-method => "POST",
+		                        -action => $my_uri,
+		                        -enctype => CGI::URL_ENCODED}) .
+		      $cgi->input({-name=>"a", -value=>"clear_cache", -type=>"hidden"}) .
+		      $cgi->submit({-label => 'Clear cache'}) .
+		      $cgi->end_form();
+#		print <<'EOF_FORM';
+#
+#<form method="post" action="?a=clear_cache" enctype="application/x-www-form-urlencoded">
+#<input type="submit" value="Clear cache" />
+#</form>
+#EOF_FORM
+	}
+	print <<'EOF_HTML';
+</td></tr>
+</table>
+EOF_HTML
+
+	git_footer_html();
+}
+
+sub git_cache_clear {
+	if (cache_admin_auth_ok() && $cache &&
+	    $cgi->request_method() eq 'POST') {
+
+		$cache->clear();
+	}
+
+	#print "cleared";
+	print $cgi->redirect(-uri => href(action=>'cache', -full=>1),
+	                     -status => '303 See Other');
+}

^ permalink raw reply related	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2010-06-24  7:57 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-06-14 16:08 [RFC PATCHv4 00/17] gitweb: Simple file based output caching Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 01/17] gitweb: Return or exit after done serving request Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 02/17] gitweb: Fix typo in hash key name in %opts in git_header_html Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 03/17] gitweb: Prepare for splitting gitweb Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 04/17] gitweb/lib - Very simple file based cache Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 05/17] gitweb/lib - Stat-based cache expiration Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 06/17] gitweb/lib - Benchmarking GitwebCache::SimpleFileCache (in t/9603/) Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 07/17] gitweb/lib - Simple select(FH) based output capture Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 08/17] gitweb/lib - Alternate ways of capturing output Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 09/17] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 10/17] gitweb: Add optional output caching (from gitweb/cache.pm) Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 10/18] gitweb/lib - Cache captured output (using get/set) Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 11/17] gitweb/lib - Adaptive cache expiration time Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 12/17] gitweb/lib - Use CHI compatibile (compute method) caching interface Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 13/17] gitweb/lib - Use locking to avoid 'cache miss stampede' problem Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 14/17] gitweb/lib - Serve stale data when waiting for filling cache Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 15/17] gitweb/lib - Regenerate (refresh) cache in background Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 16/17] gitweb: Show appropriate "Generating..." page when regenerating cache Jakub Narebski
2010-06-14 16:08 ` [RFC PATCHv4 17/17] gitweb: Add startup delay to activity indicator for cache Jakub Narebski
2010-06-24  7:56 ` [RFC PATCHv4 18/17] gitweb/lib - Add clear() and size() methods to caching interface Jakub Narebski
2010-06-24  7:56 ` [RFC PATCHv4 19/17] gitweb: Add beginnings of cache administration page (WIP) Jakub Narebski

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.