* [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
@ 2011-06-09 13:15 Jeremie Nikaes
  2011-06-09 13:16 ` [RFC/PATCH 2/2] Git-remote-mediawiki: Add push support Jeremie Nikaes
                   ` (3 more replies)
  0 siblings, 4 replies; 10+ messages in thread
From: Jeremie Nikaes @ 2011-06-09 13:15 UTC (permalink / raw)
  To: git
  Cc: Jeremie Nikaes, Arnaud Lacurie, Claire Fousse, David Amouyal,
	Matthieu Moy, Sylvain Boulmé

Implement a gate between git and mediawiki, allowing git users to
push and pull objects from mediawiki just as one would do with a
classic git repository, thanks to remote-helpers.

Currently supported commands are:
     git clone mediawiki::http://onewiki.com
     git pull

You need the following packages installed (available on common
repositories):
     libmediawiki-api-perl
     libdatetime-format-iso8601-perl

Use remote helpers in order to be as transparent as possible
to the git user.

Download Mediawiki revisions through the Mediawiki API and then
fast-import into git.
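For illustration (not part of the patch), here is a minimal fast-import
stream of the shape the helper emits; the page name, committer, and
content are invented:

```shell
# Import one made-up wiki page as one commit on a mediawiki ref,
# then read the file back out of the resulting commit.
dir=$(mktemp -d) && cd "$dir" && git -c init.defaultBranch=master init -q
git fast-import --quiet <<'EOF'
commit refs/mediawiki/origin/master
committer Anonymous <Anonymous@wiki.example> 1307600000 +0000
data 13
wiki revision
M 644 inline Main_Page.mw
data 6
Hello

EOF
git cat-file -p refs/mediawiki/origin/master:Main_Page.mw
```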

Mediawiki revisions and git commits are linked by notes bound to
commits.
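For example, the linkage can be reproduced with plain git notes
commands (throwaway repository, invented revision number):

```shell
# Attach a mediawiki-style note to a commit and read it back;
# the revision number 42 is made up for illustration.
dir=$(mktemp -d) && cd "$dir" && git -c init.defaultBranch=master init -q
git -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "Import of a wiki page"
git -c user.name=demo -c user.email=demo@example.com \
    notes --ref=mediawiki add -m "mediawiki_revision: 42" HEAD
git notes --ref=mediawiki show HEAD   # prints "mediawiki_revision: 42"
```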

The import part is done on a refs/mediawiki/<remote> branch before
being fetched into refs/remotes/origin/master. (Huge thanks to Jonathan
Nieder for his help.)

For now, the whole wiki is cloned, but it will be possible to clone only
some pages: the clone is based on a list of pages which is now all
pages.

Code clarified & improved with the help of Jeff King and Junio C Hamano.

We were not able to reproduce the empty-timestamp bug noticed by Jeff
King, so this needs further testing. A placeholder is still implemented
just in case: its value is the last valid timestamp received, plus one.

With "use encoding 'utf-8'", non-ISO characters are now fully supported
in both file content and filenames.
A small helper, run_git, is also added to execute any git command and to
UTF-8 encode the results of git commands.
However, UTF-8 encoding of filenames could cause problems if different
file systems handle UTF-8 filenames differently. Applying uri_escape to
mediawiki filenames is conceivable, and remains to be discussed.

Partial cloning is supported using the following syntax :
"git clone mediawiki::http://wikiurl##A_Page##Another_Page"
As always, this url is kept in .git/config, helping to always keep
track of these specific pages
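For example (hypothetical wiki URL), the page-carrying URL shows up in
the repository configuration once the remote exists:

```shell
# The "##" page list travels inside the remote URL stored in .git/config.
dir=$(mktemp -d) && cd "$dir" && git -c init.defaultBranch=master init -q
git remote add origin "mediawiki::http://wikiurl##A_Page##Another_Page"
git config --get remote.origin.url
```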

Signed-off-by: Jérémie Nikaes <jeremie.nikaes@ensimag.imag.fr>
Signed-off-by: Arnaud Lacurie <arnaud.lacurie@ensimag.imag.fr>
Signed-off-by: Claire Fousse <claire.fousse@ensimag.imag.fr>
Signed-off-by: David Amouyal <david.amouyal@ensimag.imag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Sylvain Boulmé <sylvain.boulme@imag.fr>
---
 -- Changes since v2:
 Added the timestamp placeholder
 UTF-8 encoding now properly supported
 Partial cloning functionality added

 contrib/mw-to-git/git-remote-mediawiki     |  322 ++++++++++++++++++++++++++++
 contrib/mw-to-git/git-remote-mediawiki.txt |    7 +
 2 files changed, 329 insertions(+), 0 deletions(-)
 create mode 100755 contrib/mw-to-git/git-remote-mediawiki
 create mode 100644 contrib/mw-to-git/git-remote-mediawiki.txt

diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
new file mode 100755
index 0000000..176ff09
--- /dev/null
+++ b/contrib/mw-to-git/git-remote-mediawiki
@@ -0,0 +1,322 @@
+#! /usr/bin/perl
+
+use strict;
+use MediaWiki::API;
+use DateTime::Format::ISO8601;
+use encoding 'utf8';
+use URI::Escape;
+use warnings;
+
+# Mediawiki filenames can contain forward slashes. This variable decides which pattern they should be replaced with
+my $slash_replacement = "%2F";
+
+my $remotename = $ARGV[0];
+# Current syntax to fetch only a set of pages mediawiki::http://mediawikiurl##A_Page##Another_Page
+my @pages_titles = split(/##/,$ARGV[1]);
+my $url = shift (@pages_titles);
+
+
+# commands parser
+my $entry;
+my @cmd;
+while (1) {
+	$| = 1; #flush STDOUT
+	$entry = <STDIN>;
+	chomp($entry);
+	@cmd = split(/ /,$entry);
+	if (defined($cmd[0])) {
+		if ($cmd[0] eq "capabilities") {
+			last unless (!defined($cmd[1]));
+			mw_capabilities();
+		} elsif ($cmd[0] eq "list") {
+			last unless (!defined($cmd[2]));
+			mw_list($cmd[1]);
+		} elsif ($cmd[0] eq "import") {
+			last unless ($cmd[1] ne "" && !defined($cmd[2]));
+			mw_import($cmd[1]);
+		} elsif ($cmd[0] eq "option") {
+			last unless ($cmd[1] ne "" && $cmd[2] ne "" && !defined($cmd[3]));
+			mw_option($cmd[1],$cmd[2]);
+		} elsif ($cmd[0] eq "push") {
+			# Check the pattern <src>:<dst>
+			my @pushargs = split(/:/,$cmd[1]);
+			last unless ($pushargs[1] ne "" && !defined($pushargs[2]));
+			mw_push($pushargs[0],$pushargs[1]);
+		} else {
+			print STDERR "Unknown capability. Aborting...\n";
+			last;
+		}
+	} else {
+		# End of input
+		last;
+	}
+
+}
+
+########################## Functions ##############################
+sub get_pages{
+	my $mediawiki = MediaWiki::API->new;
+	$mediawiki->{config}->{api_url} = "$url/api.php";
+
+	my $pages;
+	if (!@pages_titles){
+		$pages = $mediawiki->list({
+			action => 'query',
+			list => 'allpages',
+			aplimit => 500,
+		});
+		if (!defined($pages)) {
+			print STDERR "fatal: '$url' does not appear to be a mediawiki\n";
+			print STDERR "fatal: make sure '$url/api.php' is a valid page\n";
+			exit 1;
+		}
+		return @$pages;
+	} else {
+		#the list of titles should follow the pattern 'page1|page2|...'
+		my $titles = "";
+		foreach my $title (@pages_titles){
+			$titles.="$title|";
+		}
+		#suppress the last | added in the foreach
+		chop($titles);
+
+		$pages = $mediawiki->api({
+			action => 'query',
+			titles => $titles,
+		});
+		if (!defined($pages)) {
+			print STDERR "fatal: None of the pages exist \n";
+			exit 1;
+		}
+		return values (%{$pages->{query}->{pages}});
+	}
+}
+
+sub run_git {
+	open(my $git, "-|:encoding(UTF-8)", "git " . $_[0]);
+	my $res = do { local $/; <$git> };
+	close($git);
+
+	return $res;
+}
+
+
+sub get_last_local_revision {
+	# Get note regarding last mediawiki revision
+	my $note = run_git("notes --ref=mediawiki show refs/mediawiki/$remotename/master 2>/dev/null");
+	my @note_info = split(/ /, $note);
+
+	my $lastrevision_number;
+	if (!(defined($note_info[0]) && $note_info[0] eq "mediawiki_revision:")) {
+		print STDERR "No previous mediawiki revision found";
+		$lastrevision_number = 0;
+	} else {
+		# Notes are formatted : mediawiki_revision: #number
+		$lastrevision_number = $note_info[1];
+		chomp($lastrevision_number);
+		print STDERR "Last local mediawiki revision found is $lastrevision_number ";
+	}
+	return $lastrevision_number;
+}
+
+sub get_last_remote_revision {
+	my $mediawiki = MediaWiki::API->new;
+	$mediawiki->{config}->{api_url} = "$url/api.php";
+
+	my @pages = get_pages();
+
+	my $max_rev_num = 0;
+
+	foreach my $page (@pages) {
+		my $id = $page->{pageid};
+
+		my $query = {
+			action => 'query',
+			prop => 'revisions',
+			rvprop => 'ids',
+			pageids => $id,
+		};
+
+		my $result = $mediawiki->api($query);
+		
+		my $lastrev = pop(@{$result->{query}->{pages}->{$id}->{revisions}});
+		
+		$max_rev_num = ($lastrev->{revid} > $max_rev_num ? $lastrev->{revid} : $max_rev_num);
+	}
+
+	print STDERR "Last remote revision found is $max_rev_num\n";
+	return $max_rev_num;
+}
+
+sub literal_data {
+	my ($content) = @_;
+	print STDOUT "data ", bytes::length($content), "\n", $content;
+}
+
+sub mw_capabilities {
+	# Revisions are imported to the private namespace
+	# refs/mediawiki/$remotename/ by the helper and fetched into
+	# refs/remotes/$remotename later by fetch.
+	print STDOUT "refspec refs/heads/*:refs/mediawiki/$remotename/*\n";
+	print STDOUT "import\n";
+	print STDOUT "list\n";
+	print STDOUT "option\n";
+	print STDOUT "push\n";
+	print STDOUT "\n";
+}
+
+sub mw_list {
+	# MediaWiki does not have branches; we consider one branch,
+	# arbitrarily called master
+	print STDOUT "? refs/heads/master\n";
+	print STDOUT '@'."refs/heads/master HEAD\n";
+	print STDOUT "\n";
+
+}
+
+sub mw_option {
+	print STDOUT "unsupported\n";
+}
+
+sub mw_import {
+	my @wiki_name = split(/:\/\//,$url);
+	my $wiki_name = $wiki_name[1];
+
+	my $mediawiki = MediaWiki::API->new;
+	$mediawiki->{config}->{api_url} = "$url/api.php";
+
+	my @pages = get_pages();
+
+	my @revisions;
+	print STDERR "Searching revisions...\n";
+	my $fetch_from = get_last_local_revision() + 1;
+	if ($fetch_from == 1) {
+		print STDERR ", fetching from beginning\n";
+	} else {
+		print STDERR ", fetching from here\n";
+	}
+	my $n = 1;
+	foreach my $page (@pages) {
+		my $id = $page->{pageid};
+
+		print STDERR "$n/", scalar(@pages), ": ". $page->{title}."\n";
+		$n++;
+
+		my $query = {
+			action => 'query',
+			prop => 'revisions',
+			rvprop => 'ids',
+			rvdir => 'newer',
+			rvstartid => $fetch_from,
+			rvlimit => 500,
+			pageids => $id,
+		};
+
+		my $revnum = 0;
+		# Get 500 revisions at a time due to the mediawiki api limit
+		while (1) {
+			my $result = $mediawiki->api($query);
+
+			# Parse each of those 500 revisions
+			foreach my $revision (@{$result->{query}->{pages}->{$id}->{revisions}}) {
+				my $page_rev_ids;
+				$page_rev_ids->{pageid} = $page->{pageid};
+				$page_rev_ids->{revid} = $revision->{revid};
+				push (@revisions, $page_rev_ids);
+				$revnum++;
+			}
+			last unless $result->{'query-continue'};
+			$query->{rvstartid} = $result->{'query-continue'}->{revisions}->{rvstartid};
+		}
+		print STDERR "  Found ", $revnum, " revision(s).\n";
+	}
+
+	# Creation of the fast-import stream
+	print STDERR "Fetching & writing export data...\n";
+	
+	$n = 0;
+	my $last_timestamp = 0; # Placeholder in case $rev->{timestamp} is undefined
+
+	foreach my $pagerevids (sort {$a->{revid} <=> $b->{revid}} @revisions) {
+		#fetch the content of the pages
+		my $query = {
+			action => 'query',
+			prop => 'revisions',
+			rvprop => 'content|timestamp|comment|user|ids',
+			revids => $pagerevids->{revid},
+		};
+
+		my $result = $mediawiki->api($query);
+
+		my $rev = pop(@{$result->{query}->{pages}->{$pagerevids->{pageid}}->{revisions}});
+
+		$n++;
+		my $user = $rev->{user} || 'Anonymous';
+
+		if (!defined($rev->{timestamp})) {
+			$last_timestamp++;
+		} else {
+			$last_timestamp = $rev->{timestamp};
+		}
+		my $dt = DateTime::Format::ISO8601->parse_datetime($last_timestamp);
+
+		my $comment = defined $rev->{comment} ? $rev->{comment} : '*Empty MediaWiki Message*';
+		
+		my $title = $result->{query}->{pages}->{$pagerevids->{pageid}}->{title};
+		$title =~ y/ /_/;
+		#$title = uri_escape($title); # It would be nice to use uri_escape to be cross-compatible
+		# across file systems that handle accented characters differently
+		$title =~ s/\//$slash_replacement/g;
+		
+		my $content = $rev->{'*'};
+		# This \n is important: it is due to mediawiki's way of handling ends of files.
+		$content .= "\n";
+
+		print STDERR "$n/", scalar(@revisions), ": Revision #$pagerevids->{revid} of $title\n";
+
+		print STDOUT "commit refs/mediawiki/$remotename/master\n";
+		print STDOUT "mark :$n\n";
+		print STDOUT "committer $user <$user\@$wiki_name> ", $dt->epoch, " +0000\n";
+		literal_data($comment);
+		# If it's not a clone, fast-import needs to know where to start from
+		if ($fetch_from != 1 && $n == 1) {
+			print STDOUT "from refs/mediawiki/$remotename/master^0\n";
+		}
+		print STDOUT "M 644 inline $title.mw\n";
+		literal_data($content);
+		print STDOUT "\n\n";
+
+
+
+
+		# mediawiki revision number in the git note
+		if ($fetch_from == 1 && $n == 1) {
+			print STDOUT "reset refs/notes/mediawiki\n";
+		}
+		print STDOUT "commit refs/notes/mediawiki\n";
+		print STDOUT "committer $user <$user\@$wiki_name> ", $dt->epoch, " +0000\n";
+		literal_data("note added by git-mediawiki");
+		if ($fetch_from != 1 && $n == 1) {
+			print STDOUT "from refs/notes/mediawiki^0\n";
+		}
+		print STDOUT "N inline :$n\n";
+		literal_data("mediawiki_revision: " . $pagerevids->{revid});
+		print STDOUT "\n\n";
+	}
+
+	if ($fetch_from == 1) {
+		if ($n != 0) {
+			print STDOUT "reset $_[0]\n"; #$_[0] contains refs/heads/master
+			print STDOUT "from :$n\n";
+		} else {
+			print STDERR "You appear to have cloned an empty mediawiki\n";
+			#Something has to be done on the remote-helper side. If nothing is done, an error is
+			#thrown saying that HEAD refers to the unknown object 0000000000000000000
+		}
+	}
+
+}
+
+sub mw_push {
+	print STDERR "Push not yet implemented\n";
+}
diff --git a/contrib/mw-to-git/git-remote-mediawiki.txt b/contrib/mw-to-git/git-remote-mediawiki.txt
new file mode 100644
index 0000000..4d211f5
--- /dev/null
+++ b/contrib/mw-to-git/git-remote-mediawiki.txt
@@ -0,0 +1,7 @@
+Git-Mediawiki is a project which aims at building a gate
+between git and mediawiki, allowing git users to push and pull
+objects from mediawiki just as one would do with a classic git
+repository, thanks to remote-helpers.
+
+For more information, visit the wiki at
+https://github.com/Bibzball/Git-Mediawiki/wiki
-- 
1.7.4.1

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* [RFC/PATCH 2/2] Git-remote-mediawiki: Add push support
  2011-06-09 13:15 [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Jeremie Nikaes
@ 2011-06-09 13:16 ` Jeremie Nikaes
  2011-06-09 17:15   ` Junio C Hamano
  2011-06-09 14:03 ` [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Sverre Rabbelier
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 10+ messages in thread
From: Jeremie Nikaes @ 2011-06-09 13:16 UTC (permalink / raw)
  To: git
  Cc: Jeremie Nikaes, Arnaud Lacurie, Claire Fousse, David Amouyal,
	Matthieu Moy, Sylvain Boulmé

Push is now supported by the remote-helper.
Thanks to the notes metadata, the last local and remote mediawiki
revisions can be compared, in order to detect non-fast-forward pushes
and everything-up-to-date situations.

When allowed, push walks each commit between remotes/origin/master
and HEAD, collects every blob related to these commits, and pushes them
in chronological order. To do so, it uses git rev-list --children HEAD
and walks the history from remotes/origin/master to HEAD through the
children. In other words:

	* Shortest path from remotes/origin/master to HEAD
	* For each commit encountered, push blobs related to this commit
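
Sketched with plain git commands in a toy repository (names and history
invented for illustration), the child-following step looks like:

```shell
# Build a 3-commit history A -> B -> C, then find the child of the
# oldest commit the way the helper does: grep its sha1 in the
# output of `git rev-list --children`.
dir=$(mktemp -d) && cd "$dir" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=d GIT_AUTHOR_EMAIL=d@e GIT_COMMITTER_NAME=d GIT_COMMITTER_EMAIL=d@e
git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
git commit -q --allow-empty -m C
base=$(git rev-parse HEAD~2)   # stands in for remotes/origin/master
child=$(git rev-list --children HEAD | grep "^$base" | cut -d' ' -f2)
git log -1 --format=%s "$child"   # prints "B"
```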

An automatic git pull --rebase is executed after a successful push to
get the metadata back from mediawiki. This is also done to keep the
history close in shape to a mediawiki history. It can be a problem,
since it also flattens the entire history. (This solution is still
to be discussed.)

To send files to mediawiki, the mediawiki API is used. A filter is
applied to the data sent, because mediawiki pages cannot have trailing
blank characters. The filter is thus more or less a right trim.

Signed-off-by: Jérémie Nikaes <jeremie.nikaes@ensimag.imag.fr>
Signed-off-by: Arnaud Lacurie <arnaud.lacurie@ensimag.imag.fr>
Signed-off-by: Claire Fousse <claire.fousse@ensimag.imag.fr>
Signed-off-by: David Amouyal <david.amouyal@ensimag.imag.fr>
Signed-off-by: Matthieu Moy <matthieu.moy@grenoble-inp.fr>
Signed-off-by: Sylvain Boulmé <sylvain.boulme@imag.fr>
---
 contrib/mw-to-git/git-remote-mediawiki |   93 +++++++++++++++++++++++++++++++-
 1 files changed, 92 insertions(+), 1 deletions(-)

diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
index 176ff09..dc1aacf 100755
--- a/contrib/mw-to-git/git-remote-mediawiki
+++ b/contrib/mw-to-git/git-remote-mediawiki
@@ -148,6 +148,14 @@ sub get_last_remote_revision {
 	return $max_rev_num;
 }
 
+sub mediawiki_filter($) {
+	# Mediawiki does not allow trailing whitespace on a page, and pages end with a single \n.
+	# This function right-trims a string and appends a \n to follow this rule
+	my $string = shift;
+	$string =~ s/\s+$//;
+	return $string."\n";
+}
+
 sub literal_data {
 	my ($content) = @_;
 	print STDOUT "data ", bytes::length($content), "\n", $content;
@@ -175,6 +183,7 @@ sub mw_list {
 }
 
 sub mw_option {
+	print STDERR "remote-helper capability 'option' not yet implemented\n";
 	print STDOUT "unsupported\n";
 }
 
@@ -318,5 +327,87 @@ sub mw_import {
 }
 
 sub mw_push {
-	print STDERR "Push not yet implemented\n";
+
+	sub push_file {
+		#$_[0] contains a string in this format :
+		#100644 100644 <sha1_of_blob_before_commit> <sha1_of_blob_now> <status>\0<filename.mw>\0
+		#$_[1] contains the title of the commit message (the only phrase kept in the revision message)
+		my @blob_info_split = split(/ |\t|\0/, $_[0]);
+
+		my $sha1 = $blob_info_split[3];
+		my $complete_file_name = $blob_info_split[5];
+		# complete_file_name = uri_unescape($complete_file_name); # If we use the uri escape before
+		# we should unescape here, before anything
+		
+		if (substr($complete_file_name,-3) eq ".mw"){
+			my $title = substr($complete_file_name,0,-3);
+			$title =~ s/$slash_replacement/\//g;
+
+			my $file_content = run_git("cat-file -p $sha1");
+			
+			my $mw = MediaWiki::API->new();
+			$mw->{config}->{api_url} = "$url/api.php";
+
+			# log in to the wiki: a way to push changes under a given identity should be added here
+			#$mw->login( { lgname => 'login', lgpassword => 'passwd' })
+			#|| die $mw->{error}->{code} . ': ' . $mw->{error}->{details};
+			
+			$mw->edit( {
+				action => 'edit',
+				summary => $_[1],
+				title => $title,
+				text => mediawiki_filter($file_content),
+			}, {
+				skip_encoding => 1 # Helps with names with accentuated characters
+			}) || die 'Fatal: Error ' . $mw->{error}->{code} . ' from mediawiki: ' . $mw->{error}->{details};
+
+			print STDERR "Pushed file: $sha1 - $title\n";
+		} else {
+			print STDERR "$complete_file_name is not a mediawiki file (not pushable in this version)\n"
+		}
+	}
+	
+	my $last_local_revid = get_last_local_revision();
+	my $last_remote_revid = get_last_remote_revision();
+
+	# Get sha1 of commit pointed by local HEAD
+	my $HEAD_sha1 = run_git("rev-parse $_[0] 2>/dev/null"); chomp($HEAD_sha1);
+	# Get sha1 of commit pointed by remotes/origin/master
+	my $remoteorigin_sha1 = run_git("rev-parse refs/remotes/origin/master 2>/dev/null"); chomp($remoteorigin_sha1);
+
+	if ($last_local_revid < $last_remote_revid){
+		my $message = "\"To prevent you from losing history, non-fast-forward updates were rejected \\n";
+		$message .= "Merge the remote changes (e.g. 'git pull') before pushing again. See the";
+		$message .= " 'Note about fast-forwards' section of 'git push --help' for details.\"";
+		print STDOUT "error $_[0] $message\n";
+		print STDOUT "\n";
+	} elsif ($HEAD_sha1 ne $remoteorigin_sha1) {
+		# Get every commit in between HEAD and refs/remotes/origin/master,
+		# including HEAD and refs/remotes/origin/master
+		my $parsed_sha1 = $remoteorigin_sha1;
+		while ($parsed_sha1 ne $HEAD_sha1) {
+			my @commit_info =  grep(/^$parsed_sha1/, `git rev-list --children $_[0]`);
+			my @commit_info_split = split(/ |\n/, $commit_info[0]);
+			# $commit_info_split[0] is the sha1 of the commit itself
+			# $commit_info_split[1] is the sha1 of its direct child
+			my $blob_infos = run_git("diff --raw --abbrev=40 -z $commit_info_split[0] $commit_info_split[1]");
+			my @blob_info_list = split(/\n/, $blob_infos);
+			# Keep the first line of the commit message as mediawiki comment for the revision
+			my $commit_msg = (split(/\n/, run_git("show --pretty=format:\"%s\" $commit_info_split[1]")))[0];
+			chomp($commit_msg);
+			foreach my $blob_info (@blob_info_list) {
+				# Push every blob
+				push_file($blob_info, $commit_msg);
+			}
+			$parsed_sha1 = $commit_info_split[1];
+		}
+
+		print STDOUT "ok $_[1]\n";
+		print STDOUT "\n";
+		
+		# Pulling from mediawiki after pushing in order to keep things synchronized
+		exec("git pull --rebase >/dev/null");
+	} else {
+		print STDOUT "\n";
+	}
 }
-- 
1.7.4.1


* Re: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
  2011-06-09 13:15 [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Jeremie Nikaes
  2011-06-09 13:16 ` [RFC/PATCH 2/2] Git-remote-mediawiki: Add push support Jeremie Nikaes
@ 2011-06-09 14:03 ` Sverre Rabbelier
  2011-06-09 14:30   ` Jérémie NIKAES
  2011-06-09 22:44 ` Jeff King
  2011-06-10  0:21 ` Jeff King
  3 siblings, 1 reply; 10+ messages in thread
From: Sverre Rabbelier @ 2011-06-09 14:03 UTC (permalink / raw)
  To: Jeremie Nikaes
  Cc: git, Arnaud Lacurie, Claire Fousse, David Amouyal, Matthieu Moy,
	Sylvain Boulmé

Heya,

On Thu, Jun 9, 2011 at 15:15, Jeremie Nikaes
<jeremie.nikaes@ensimag.imag.fr> wrote:
> +sub mw_capabilities {
> +       print STDOUT "push\n";

> +sub mw_push {
> +       print STDERR "Push not yet implemented\n";
> +}

If it's not supported then you shouldn't advertise it above. Is there
any particular reason you implemented it this way? E.g., did you
encounter a breakage in remote-helpers if push is not implemented?

-- 
Cheers,

Sverre Rabbelier


* Re: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
  2011-06-09 14:03 ` [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Sverre Rabbelier
@ 2011-06-09 14:30   ` Jérémie NIKAES
  2011-06-09 14:32     ` Sverre Rabbelier
  0 siblings, 1 reply; 10+ messages in thread
From: Jérémie NIKAES @ 2011-06-09 14:30 UTC (permalink / raw)
  To: Sverre Rabbelier
  Cc: git, Arnaud Lacurie, Claire Fousse, David Amouyal, Matthieu Moy,
	Sylvain Boulmé

2011/6/9 Sverre Rabbelier <srabbelier@gmail.com>:
> Heya,
>
> On Thu, Jun 9, 2011 at 15:15, Jeremie Nikaes
> <jeremie.nikaes@ensimag.imag.fr> wrote:
>> +sub mw_capabilities {
>> +       print STDOUT "push\n";
>
>> +sub mw_push {
>> +       print STDERR "Push not yet implemented\n";
>> +}
>
> If it's not supported then you shouldn't advertise it above. Is there
> any particular reason you implemented it this way? E.g., did you
> encounter a breakage in remote-helpers if push is not implemented?

Well, if you don't give the capability to push, you get the following message:

error: failed to push some refs to 'mediawiki::http://mediawikiurl'

Which is less explicit than 'Push not yet implemented'.


Regards,
-- 
Jérémie Nikaes


* Re: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
  2011-06-09 14:30   ` Jérémie NIKAES
@ 2011-06-09 14:32     ` Sverre Rabbelier
  0 siblings, 0 replies; 10+ messages in thread
From: Sverre Rabbelier @ 2011-06-09 14:32 UTC (permalink / raw)
  To: Jérémie NIKAES
  Cc: git, Arnaud Lacurie, Claire Fousse, David Amouyal, Matthieu Moy,
	Sylvain Boulmé

Heya,

2011/6/9 Jérémie NIKAES <jeremie.nikaes@ensimag.imag.fr>:
> Well, if you dont give the capability to push, you get the following message
>
> error: failed to push some refs to 'mediawiki::http://mediawikiurl'
>
> Which is less explicit than 'Push not yet implemented'.

I suspected as much. In that case perhaps the transport-helper.c code
needs to be updated to give a more helpful message?

-- 
Cheers,

Sverre Rabbelier


* Re: [RFC/PATCH 2/2] Git-remote-mediawiki: Add push support
  2011-06-09 13:16 ` [RFC/PATCH 2/2] Git-remote-mediawiki: Add push support Jeremie Nikaes
@ 2011-06-09 17:15   ` Junio C Hamano
  0 siblings, 0 replies; 10+ messages in thread
From: Junio C Hamano @ 2011-06-09 17:15 UTC (permalink / raw)
  To: Jeremie Nikaes
  Cc: git, Arnaud Lacurie, Claire Fousse, David Amouyal, Matthieu Moy,
	Sylvain Boulmé

Jeremie Nikaes <jeremie.nikaes@ensimag.imag.fr> writes:

> diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
> index 176ff09..dc1aacf 100755
> --- a/contrib/mw-to-git/git-remote-mediawiki
> +++ b/contrib/mw-to-git/git-remote-mediawiki
> @@ -148,6 +148,14 @@ sub get_last_remote_revision {
>  	return $max_rev_num;
>  }
>  
> +sub mediawiki_filter($) {

The only caller calls this function with a plain single scalar
string, and it is not emulating or overriding any Perl built-in. I do
not see why you want to confuse the readers with a prototype here.

> @@ -318,5 +327,87 @@ sub mw_import {
>  }
>  
>  sub mw_push {
> -	print STDERR "Push not yet implemented\n";
> +
> +	sub push_file {

The language lets you write nested functions, but in this case I do not
think it is buying you anything, other than one unnecessary level of
indentation that makes the resulting code harder to read.

> +		#$_[0] contains a string in this format :
> +		#100644 100644 <sha1_of_blob_before_commit> <sha1_of_blob_now> <status>\0<filename.mw>\0
> +		#$_[1] contains the title of the commit message (the only phrase kept in the revision message)
> +		my @blob_info_split = split(/ |\t|\0/, $_[0]);

What if a filename has space or tab in it?  A code that reads from "-z"
output should not be using split().  Something like this (untested)?

  # avoid $_[number] unless in a trivial few-liner function. they
  # are unreadable.
  my ($raw_diff, $message) = @_;
  my ($old_mode, $new_mode, $old_sha1, $new_sha1, $status, $path) =
  ($raw_diff =~ /^:([0-7]+) ([0-7]+) ([0-9a-f]{40}) ([0-9a-f]{40}) (\S+)\0(.*?)\0$/) 

> +		if (substr($complete_file_name,-3) eq ".mw"){
> +			my $title = substr($complete_file_name,0,-3);
> +			$title =~ s/$slash_replacement/\//g;

It is probably more customary to write this like so:
	
	if (($title = $complete_file_name) =~ s/\.mw$//) {
		...

> +	} elsif ($HEAD_sha1 ne $remoteorigin_sha1) {
> +		# Get every commit in between HEAD and refs/remotes/origin/master,
> +		# including HEAD and refs/remotes/origin/master
> +		my $parsed_sha1 = $remoteorigin_sha1;
> +		while ($parsed_sha1 ne $HEAD_sha1) {
> +			my @commit_info =  grep(/^$parsed_sha1/, `git rev-list --children $_[0]`);

It feels extremely wasteful to traverse the whole history with rev-list
every time you iterate this loop. Can't you do better?
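
One way to avoid the repeated traversal, sketched here as a possibility
rather than what the patch does, is to list the commits once with
git rev-list --reverse (toy repository, invented history):

```shell
# List the commits between a base and HEAD exactly once, oldest first,
# instead of re-running rev-list --children on every loop iteration.
dir=$(mktemp -d) && cd "$dir" && git -c init.defaultBranch=master init -q
export GIT_AUTHOR_NAME=d GIT_AUTHOR_EMAIL=d@e GIT_COMMITTER_NAME=d GIT_COMMITTER_EMAIL=d@e
git commit -q --allow-empty -m A
git commit -q --allow-empty -m B
git commit -q --allow-empty -m C
base=$(git rev-parse HEAD~2)   # stands in for refs/remotes/origin/master
for sha in $(git rev-list --reverse "$base..HEAD"); do
	git log -1 --format=%s "$sha"
done
```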

> +			my $blob_infos = run_git("diff --raw --abbrev=40 -z $commit_info_split[0] $commit_info_split[1]");
> +			my @blob_info_list = split(/\n/, $blob_infos);

Huh?  Didn't you read from "-z" output?

> +			# Keep the first line of the commit message as mediawiki comment for the revision
> +			my $commit_msg = (split(/\n/, run_git("show --pretty=format:\"%s\" $commit_info_split[1]")))[0];
> +			chomp($commit_msg);
> +			foreach my $blob_info (@blob_info_list) {
> +				# Push every blob
> +				push_file($blob_info, $commit_msg);
> +			}
> +			$parsed_sha1 = $commit_info_split[1];
> +		}
> +
> +		print STDOUT "ok $_[1]\n";
> +		print STDOUT "\n";
> +		
> +		# Pulling from mediawiki after pushing in order to keep things synchronized
> +		exec("git pull --rebase >/dev/null");
> +	} else {
> +		print STDOUT "\n";
> +	}
>  }


* Re: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
  2011-06-09 13:15 [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Jeremie Nikaes
  2011-06-09 13:16 ` [RFC/PATCH 2/2] Git-remote-mediawiki: Add push support Jeremie Nikaes
  2011-06-09 14:03 ` [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Sverre Rabbelier
@ 2011-06-09 22:44 ` Jeff King
  2011-06-10  0:21 ` Jeff King
  3 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2011-06-09 22:44 UTC (permalink / raw)
  To: Jeremie Nikaes
  Cc: git, Arnaud Lacurie, Claire Fousse, David Amouyal, Matthieu Moy,
	Sylvain Boulmé

On Thu, Jun 09, 2011 at 03:15:59PM +0200, Jeremie Nikaes wrote:

> For now, the whole wiki is cloned, but it will be possible to clone only
> some pages: the clone is based on a list of pages which is now all
> pages.

This is not really true anymore, is it? Later you say:

> Partial cloning is supported using the following syntax :
> "git clone mediawiki::http://wikiurl##A_Page##Another_Page"
> As always, this url is kept in .git/config, helping to always keep
> track of these specific pages

so I think it is not "will be" possible any more.

> +sub get_pages{
> +	my $mediawiki = MediaWiki::API->new;
> +	$mediawiki->{config}->{api_url} = "$url/api.php";
> [...]
> +	} else {
> +		#the list of titles should follow the pattern 'page1|page2|...'
> +		my $titles = "";
> +		foreach my $title (@pages_titles){
> +			$titles.="$title|";
> +		}
> +		#suppress the last | added in the foreach
> +		chop($titles);

This is usually spelled:

  my $titles = join('|', @pages_titles);

> +		$pages = $mediawiki->api({
> +			action => 'query',
> +			titles => $titles,
> +		});
> +		if (!defined($pages)) {
> +			print STDERR "fatal: None of the pages exist \n";
> +			exit 1;
> +		}

That's not an accurate error message. If the pages don't exist, we will
actually get back a set of pages with negative ids (so you can tell
which ones exist and which ones don't). If $pages is undefined, it's
actually not a valid mediawiki repo.

Also, according to the mediawiki API, we can send only 51 titles at a
time. So we need to break this into pieces.

However, I wonder if this code path is needed at all. We are mapping
titles to page ids, so that we can later ask mediawiki for revisions by
page id. But why not just ask for revisions by title, and skip this
extra round trip to the server?
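
The batching Peff describes can be sketched in shell; the limit of 3
titles per batch below is a toy stand-in for mediawiki's real cap:

```shell
# Join page titles with '|' in bounded batches, one batch per line,
# as would be needed to stay under the API's titles-per-request limit.
printf 'Page%d\n' 1 2 3 4 5 6 7 | xargs -n 3 | tr ' ' '|'
# prints:
#   Page1|Page2|Page3
#   Page4|Page5|Page6
#   Page7
```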

Speaking of round trips, I did have an idea for reducing round-trips in
the "mostly up to date" case. We can ask for the revisions for multiple
titles at once (apparently up to 51, or 501 if you have special bot
privileges), but you will only get the latest revision for each. Which
isn't sufficient for us to do anything except tell whether or not there
are any revisions to fetch.

So without the optimization, with N pages we will make N requests for
new revisions. But with it, we will make N/51 requests for the latest
revisions, and then M (where M <= N) for every page that actually has
new content. In other words, it is a good optimization as long as less
than 49/51 pages have changed, on average. So it's bad for "clone", but
very good for a subsequent "fetch". The best case is a fetch where
nothing has changed, which should have 1/51th as many round-trips to
determine that is the case.

-Peff

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
  2011-06-09 13:15 [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled Jeremie Nikaes
                   ` (2 preceding siblings ...)
  2011-06-09 22:44 ` Jeff King
@ 2011-06-10  0:21 ` Jeff King
  2011-06-10  6:31   ` Arnaud Lacurie
  3 siblings, 1 reply; 10+ messages in thread
From: Jeff King @ 2011-06-10  0:21 UTC (permalink / raw)
  To: Jeremie Nikaes
  Cc: git, Arnaud Lacurie, Claire Fousse, David Amouyal, Matthieu Moy,
	Sylvain Boulmé

On Thu, Jun 09, 2011 at 03:15:59PM +0200, Jeremie Nikaes wrote:

> Partial cloning is supported using the following syntax :
> "git clone mediawiki::http://wikiurl##A_Page##Another_Page"
> As always, this url is kept in .git/config, helping to always keep
> track of these specific pages

Earlier today I posted a 10-patch series to allow git to handle
something like:

  git clone \
    -c mediawiki.page=GitWorkflows \
    -c mediawiki.page=Tig \
    https://git.wiki.kernel.org

and set that config into the .git/config file. The patch for the
mediawiki helper to actually read and use it (instead of the "##" syntax
in the url) is just:

---
diff --git a/contrib/mw-to-git/git-remote-mediawiki b/contrib/mw-to-git/git-remote-mediawiki
index fd26f87..75c3537 100755
--- a/contrib/mw-to-git/git-remote-mediawiki
+++ b/contrib/mw-to-git/git-remote-mediawiki
@@ -11,10 +11,9 @@ use warnings;
 my $slash_replacement = "%2F";
 
 my $remotename = $ARGV[0];
-# Current syntax to fetch only a set of pages mediawiki::http://mediawikiurl##A_Page##Another_Page
-my @pages_titles = split(/##/,$ARGV[1]);
-my $url = shift (@pages_titles);
-
+my $url = $ARGV[1];
+my @pages_titles = `git config --get-all mediawiki.page`;
+chomp @pages_titles;
 
 # commands parser
 my $entry;


* Re: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
  2011-06-10  0:21 ` Jeff King
@ 2011-06-10  6:31   ` Arnaud Lacurie
  2011-06-10  7:22     ` Jeff King
  0 siblings, 1 reply; 10+ messages in thread
From: Arnaud Lacurie @ 2011-06-10  6:31 UTC (permalink / raw)
  To: Jeff King
  Cc: Jeremie Nikaes, git, Claire Fousse, David Amouyal, Matthieu Moy,
	Sylvain Boulmé

Hi!

What you did there is great!
It'll help clarify the command.

2011/6/10 Jeff King <peff@peff.net>:
> On Thu, Jun 09, 2011 at 03:15:59PM +0200, Jeremie Nikaes wrote:
> Earlier today I posted a 10-patch series to allow git to handle
> something like:
>
>  git clone \
>    -c mediawiki.page=GitWorkflows \
>    -c mediawiki.page=Tig \
>    https://git.wiki.kernel.org
>

So we give up on the mediawiki prefix (mediawiki::https://git.wiki.kernel.org)
when it comes to partial cloning?

Regards

-- 
Arnaud Lacurie


* Re: [PATCHv3 1/2] Add a remote helper to interact with mediawiki, pull & clone handled
  2011-06-10  6:31   ` Arnaud Lacurie
@ 2011-06-10  7:22     ` Jeff King
  0 siblings, 0 replies; 10+ messages in thread
From: Jeff King @ 2011-06-10  7:22 UTC (permalink / raw)
  To: Arnaud Lacurie
  Cc: Jeremie Nikaes, git, Claire Fousse, David Amouyal, Matthieu Moy,
	Sylvain Boulmé

On Fri, Jun 10, 2011 at 08:31:41AM +0200, Arnaud Lacurie wrote:

> >  git clone \
> >    -c mediawiki.page=GitWorkflows \
> >    -c mediawiki.page=Tig \
> >    https://git.wiki.kernel.org
> >
> 
> So we give up on the mediawiki prefix (mediawiki::https://git.wiki.kernel.org)
> when it comes to partial cloning ?

Oops, no, I just forgot it. You need the mediawiki prefix so git knows
to invoke the mediawiki helper.

Sorry for the confusion.

-Peff

