* [PATCH 0/4] WIP: git-remote-media wiki namespace support @ 2017-10-29 16:08 Antoine Beaupré 2017-10-29 16:08 ` [PATCH 1/4] remote-mediawiki: add " Antoine Beaupré ` (4 more replies) 0 siblings, 5 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-29 16:08 UTC (permalink / raw) To: git Hi, For a few years now, work has been happening in a [GitHub issue] to improve git's support for MediaWiki sites, which are implemented in the contrib/mw-to-git/ module, mostly visible in the git-remote-mediawiki command. [GitHub issue]: https://github.com/Git-Mediawiki/Git-Mediawiki/issues/10 This specific patchset adds support for namespaces in MediaWiki. Without this, it is impossible to fetch pages outside the "(Main)" namespace (e.g. Talk pages or "meta"). Namespaces are heavily used on many wikis and this seems like an essential feature to have. I have been hesitant in pushing those patches here because I know how strict the git community is regarding patchsets and I was afraid they would just get shot down, especially because there are no unit tests for the new functionality. Obviously, doing unit tests against a full MediaWiki instance isn't exactly trivial. Even though the contrib module features a test suite and a way to install MediaWiki, I haven't had the chance to test this yet, so unit tests are still missing. This is the main reason why this is marked WIP. I have tried to follow the patch submission guide, but I believe this is my first Git patch, so please be gentle. Any review would be greatly appreciated and I hope this can be eventually merged in. This work is also available on GitHub: https://github.com/anarcat/git/tree/mediawiki-namespaces Thanks in advance, A. ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH 1/4] remote-mediawiki: add namespace support 2017-10-29 16:08 [PATCH 0/4] WIP: git-remote-media wiki namespace support Antoine Beaupré @ 2017-10-29 16:08 ` Antoine Beaupré 2017-10-29 17:24 ` Eric Sunshine ` (2 more replies) 2017-10-29 16:08 ` [PATCH 2/4] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré ` (3 subsequent siblings) 4 siblings, 3 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-29 16:08 UTC (permalink / raw) To: git; +Cc: Kevin, Antoine Beaupré From: Kevin <kevin@ki-ai.org> this introduces a new remote.origin.namespaces argument that is a space-separated list of namespaces. the list of pages extract is then taken from all the specified namespaces. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 34 +++++++++++++++++++++++++++-- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index e7f857c1a..1c5e39831 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -17,6 +17,7 @@ use Git; use Git::Mediawiki qw(clean_filename smudge_filename connect_maybe EMPTY HTTP_CODE_OK); use DateTime::Format::ISO8601; +use Scalar::Util; use warnings; # By default, use UTF-8 to communicate with Git and the user @@ -63,6 +64,10 @@ chomp(@tracked_pages); my @tracked_categories = split(/[ \n]/, run_git("config --get-all remote.${remotename}.categories")); chomp(@tracked_categories); +# Just like @tracked_categories, but for MediaWiki namespaces. 
+my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +chomp(@tracked_namespaces); + # Import media files on pull my $import_media = run_git("config --get --bool remote.${remotename}.mediaimport"); chomp($import_media); @@ -256,6 +261,23 @@ sub get_mw_tracked_categories { return; } +sub get_mw_tracked_namespaces { + my $pages = shift; + foreach my $local_namespace (@tracked_namespaces) { + my $mw_pages = $mediawiki->list( { + action => 'query', + list => 'allpages', + apnamespace => get_mw_namespace_id($local_namespace), + aplimit => 'max' } ) + || die $mediawiki->{error}->{code} . ': ' + . $mediawiki->{error}->{details} . "\n"; + foreach my $page (@{$mw_pages}) { + $pages->{$page->{title}} = $page; + } + } + return; +} + sub get_mw_all_pages { my $pages = shift; # No user-provided list, get the list of pages from the API. @@ -319,6 +341,10 @@ sub get_mw_pages { $user_defined = 1; get_mw_tracked_categories(\%pages); } + if (@tracked_namespaces) { + $user_defined = 1; + get_mw_tracked_namespaces(\%pages); + } if (!$user_defined) { get_mw_all_pages(\%pages); } @@ -1263,7 +1289,6 @@ my %cached_mw_namespace_id; sub get_mw_namespace_id { $mediawiki = connect_maybe($mediawiki, $remotename, $url); my $name = shift; - if (!exists $namespace_id{$name}) { # Look at configuration file, if the record for that namespace is # already cached. Namespaces are stored in form: @@ -1331,7 +1356,12 @@ sub get_mw_namespace_id { sub get_mw_namespace_id_for_page { my $namespace = shift; if ($namespace =~ /^([^:]*):/) { - return get_mw_namespace_id($namespace); + my ($ns, $id) = split(/:/, $namespace); + if (Scalar::Util::looks_like_number($id)) { + return get_mw_namespace_id($ns); + } else{ + return + } } else { return; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH 1/4] remote-mediawiki: add namespace support 2017-10-29 16:08 ` [PATCH 1/4] remote-mediawiki: add " Antoine Beaupré @ 2017-10-29 17:24 ` Eric Sunshine 2017-10-29 18:29 ` Antoine Beaupré 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré 2017-10-30 10:43 ` [PATCH 1/4] remote-mediawiki: add " Matthieu Moy 2 siblings, 1 reply; 78+ messages in thread From: Eric Sunshine @ 2017-10-29 17:24 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List, Kevin On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: > From: Kevin <kevin@ki-ai.org> > > this introduces a new remote.origin.namespaces argument that is a s/this/This/ > space-separated list of namespaces. the list of pages extract is then s/the/The/ > taken from all the specified namespaces. > > Reviewed-by: Antoine Beaupré <anarcat@debian.org> > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -1331,7 +1356,12 @@ sub get_mw_namespace_id { > sub get_mw_namespace_id_for_page { > my $namespace = shift; > if ($namespace =~ /^([^:]*):/) { This is not a new issue, but why capture if $1 is never referenced in the code below? > - return get_mw_namespace_id($namespace); > + my ($ns, $id) = split(/:/, $namespace); > + if (Scalar::Util::looks_like_number($id)) { > + return get_mw_namespace_id($ns); So, the idea is that if the input has form "something:number", then you want to look up "something" as a namespace name. Anything else (such as "something:foobar") is not considered a valid page reference. Right? > + } else{ Missing space before open brace. > + return Not required, but missing semi-colon. > + } > } else { > return; > } The multiple 'return's are a bit messy. 
Perhaps collapse the entire function to something like this:

    sub get_mw_namespace_id_for_page {
        my $arg = shift;
        if ($arg =~ /^([^:]+):\d+$/) {
            return get_mw_namespace_id($1);
        }
        return undef;
    }

Then, you don't even need Scalar::Util::looks_like_number() (unless, I suppose, the incoming number is expected to be something other than simple digits).

In fact, it may be that the original code *was* meant to do exactly the same as shown in my example above, but that the person who wrote it accidentally typed:

    return get_mw_namespace_id($namespace);

instead of the intended:

    return get_mw_namespace_id($1);

So, a minimal fix would be simply to change $namespace to $1. Tightening the regex as I did in my example would be a bonus (though probably ought to be a separate patch).

^ permalink raw reply [flat|nested] 78+ messages in thread
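To make the three variants discussed in this review easier to compare, here is a minimal sketch. It assumes a hypothetical stand-in for get_mw_namespace_id() backed by a fixed table (the real function talks to the wiki), and the names lookup_original, lookup_minimal_fix, lookup_tightened and show are invented for illustration only.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the real get_mw_namespace_id(), which asks the
# wiki (or a cache) for the id. This fixed table is an assumption made only
# so the sketch runs without a MediaWiki instance.
my %fake_namespace_id = ('Talk' => 1, 'File' => 6);

sub get_mw_namespace_id {
    my $name = shift;
    return $fake_namespace_id{$name};
}

# Pre-patch behaviour: the regex captures the prefix, but the whole
# argument is passed on, so "Talk:42" is looked up as a namespace name.
sub lookup_original {
    my $arg = shift;
    if ($arg =~ /^([^:]*):/) {
        return get_mw_namespace_id($arg);
    }
    return;
}

# Minimal fix from the review: pass the captured prefix ($1) instead.
sub lookup_minimal_fix {
    my $arg = shift;
    if ($arg =~ /^([^:]*):/) {
        return get_mw_namespace_id($1);
    }
    return;
}

# Tightened variant from the review: only "name:digits" is accepted.
sub lookup_tightened {
    my $arg = shift;
    if ($arg =~ /^([^:]+):\d+$/) {
        return get_mw_namespace_id($1);
    }
    return;
}

# Render undef readably for the comparison table below.
sub show {
    my $value = shift;
    return defined $value ? $value : 'undef';
}

for my $arg ('Talk:42', 'Talk:Foo', 'NoColon') {
    printf "%-8s original=%s minimal=%s tightened=%s\n", $arg,
        show(lookup_original($arg)),
        show(lookup_minimal_fix($arg)),
        show(lookup_tightened($arg));
}
```

With this stand-in table, the original variant never finds an id for "Talk:42", the minimal fix resolves both "Talk:42" and "Talk:Foo" to the Talk namespace, and the tightened variant rejects "Talk:Foo". Whether rejecting non-numeric suffixes is desirable depends on what callers actually pass in, which this thread leaves open.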
* Re: [PATCH 1/4] remote-mediawiki: add namespace support 2017-10-29 17:24 ` Eric Sunshine @ 2017-10-29 18:29 ` Antoine Beaupré 2017-10-29 20:07 ` Eric Sunshine 0 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-10-29 18:29 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List, Kevin On 2017-10-29 13:24:03, Eric Sunshine wrote: > On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: >> From: Kevin <kevin@ki-ai.org> >> >> this introduces a new remote.origin.namespaces argument that is a > > s/this/This/ ack. >> space-separated list of namespaces. the list of pages extract is then > > s/the/The/ ack. >> taken from all the specified namespaces. >> >> Reviewed-by: Antoine Beaupré <anarcat@debian.org> >> Signed-off-by: Antoine Beaupré <anarcat@debian.org> >> --- >> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl >> @@ -1331,7 +1356,12 @@ sub get_mw_namespace_id { >> sub get_mw_namespace_id_for_page { >> my $namespace = shift; >> if ($namespace =~ /^([^:]*):/) { > > This is not a new issue, but why capture if $1 is never referenced in > the code below? meh, i dunno. >> - return get_mw_namespace_id($namespace); >> + my ($ns, $id) = split(/:/, $namespace); >> + if (Scalar::Util::looks_like_number($id)) { >> + return get_mw_namespace_id($ns); > > So, the idea is that if the input has form "something:number", then > you want to look up "something" as a namespace name. Anything else > (such as "something:foobar") is not considered a valid page reference. > Right? frankly, i have no idea what's going on here. >> + } else{ > > Missing space before open brace. right. >> + return > > Not required, but missing semi-colon. ok. >> + } >> } else { >> return; >> } > > The multiple 'return's are a bit messy. 
Perhaps collapse the entire > function to something like this: > > sub get_mw_namespace_id_for_page { > my $arg = shift; > if ($arg =~ /^([^:]+):\d+$/) { > return get_mw_namespace_id($1); > } > return undef; > } > > Then, you don't need even need Scalar::Util::looks_like_number() > (unless, I suppose, the incoming number is expected to be something > other than simple digits). > > In fact, it may be that the intent of the original code *was* meant to > do exactly the same as shown in my example above, but that the person > who wrote it accidentally typed: > > return get_mw_namespace_id($namespace); > > instead of the intended: > > return get_mw_namespace_id($1); > > So, a minimal fix would be simply to change $namespace to $1. > Tightening the regex as I did in my example would be a bonus (though > probably ought to be a separate patch). so while i'm happy to just copy-paste your code in there, that's kind of a sensitive area of the code, as it was originally used only in the upload procedure, which I haven't tested at all. so i'm hesitant in just merging that in as is. i don't understand why or how this even works, to be honest: page names don't necessarily look like numbers, in fact, they generally don't. i don't understand why the patch submitted here even touches that function at all, considering that the function is only used on uploads. I just cargo-culted it from the original issue... sigh. a. -- C'est trop facile quand les guerres sont finies D'aller gueuler que c'était la dernière Amis bourgeois vous me faites envie Ne voyez vous pas donc point vos cimetières? - Jaques Brel ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 1/4] remote-mediawiki: add namespace support 2017-10-29 18:29 ` Antoine Beaupré @ 2017-10-29 20:07 ` Eric Sunshine 2017-10-29 23:08 ` Kevin 0 siblings, 1 reply; 78+ messages in thread From: Eric Sunshine @ 2017-10-29 20:07 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List, Kevin On Sun, Oct 29, 2017 at 2:29 PM, Antoine Beaupré <anarcat@debian.org> wrote: > On 2017-10-29 13:24:03, Eric Sunshine wrote: >> On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: >> So, the idea is that if the input has form "something:number", then >> you want to look up "something" as a namespace name. Anything else >> (such as "something:foobar") is not considered a valid page reference. >> Right? > > frankly, i have no idea what's going on here. > >> The multiple 'return's are a bit messy. Perhaps collapse the entire >> function to something like this: >> >> sub get_mw_namespace_id_for_page { >> my $arg = shift; >> if ($arg =~ /^([^:]+):\d+$/) { >> return get_mw_namespace_id($1); >> } >> return undef; >> } >> >> In fact, it may be that the intent of the original code *was* meant to >> do exactly the same as shown in my example above, but that the person >> who wrote it accidentally typed: >> >> return get_mw_namespace_id($namespace); >> >> instead of the intended: >> >> return get_mw_namespace_id($1); >> >> So, a minimal fix would be simply to change $namespace to $1. >> Tightening the regex as I did in my example would be a bonus (though >> probably ought to be a separate patch). > > so while i'm happy to just copy-paste your code in there, that's kind of > a sensitive area of the code, as it was originally used only in the > upload procedure, which I haven't tested at all. so i'm hesitant in just > merging that in as is. I don't think there's a need to copy/paste my example code. If you instead make the minimal suggested fix, then the resulting code will be effectively equivalent to my example (minus the tighter regex). 
> i don't understand why or how this even works, to be honest: page names
> don't necessarily look like numbers, in fact, they generally don't. i
> don't understand why the patch submitted here even touches that function
> at all, considering that the function is only used on uploads. I just
> cargo-culted it from the original issue...

I am neither familiar with nor a user of MediaWiki or the Git bridging, and I don't know what page names look like, but I'm pretty well convinced from reading both the existing code and this patch that the changes to get_mw_namespace_id_for_page() are really just a bug fix to that function. My interpretation is that the function really was intended to strip the ":id" portion of "name:id" before calling get_mw_namespace_id(); the fact that the original code neglects to do so seems just an oversight. The fact that the regex uses capturing parentheses implies strongly that it was indeed the intention to use $1 in the call to get_mw_namespace_id().

Unlike the "fix" in the patch you posted from Kevin, which is perhaps unnecessarily complicated, the fix I suggested above is about as minimal as possible. That is, changing:

    return get_mw_namespace_id($namespace);

to:

    return get_mw_namespace_id($1);

should achieve the same result. (It could be made more robust by tightening the regex as in my example, but that's a separate topic, not needed just to get the function to work as intended.)

^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 1/4] remote-mediawiki: add namespace support 2017-10-29 20:07 ` Eric Sunshine @ 2017-10-29 23:08 ` Kevin 2017-10-30 2:14 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Kevin @ 2017-10-29 23:08 UTC (permalink / raw) To: Eric Sunshine, Antoine Beaupré; +Cc: Git List So I shared the patch some time ago (~2 years). Surprisingly its just now getting attention. I guess some renewed interest in using mediawiki with git. Myself, however, am no longer using mediawiki. Nor am I completely clear on what the reasons were for using some variable or another a couple of years ago. So... the best of luck, sorry I couldn't be more helpful. Eric Sunshine: > On Sun, Oct 29, 2017 at 2:29 PM, Antoine Beaupré <anarcat@debian.org> wrote: >> On 2017-10-29 13:24:03, Eric Sunshine wrote: >>> On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: >>> So, the idea is that if the input has form "something:number", then >>> you want to look up "something" as a namespace name. Anything else >>> (such as "something:foobar") is not considered a valid page reference. >>> Right? >> >> frankly, i have no idea what's going on here. >> >>> The multiple 'return's are a bit messy. Perhaps collapse the entire >>> function to something like this: >>> >>> sub get_mw_namespace_id_for_page { >>> my $arg = shift; >>> if ($arg =~ /^([^:]+):\d+$/) { >>> return get_mw_namespace_id($1); >>> } >>> return undef; >>> } >>> >>> In fact, it may be that the intent of the original code *was* meant to >>> do exactly the same as shown in my example above, but that the person >>> who wrote it accidentally typed: >>> >>> return get_mw_namespace_id($namespace); >>> >>> instead of the intended: >>> >>> return get_mw_namespace_id($1); >>> >>> So, a minimal fix would be simply to change $namespace to $1. >>> Tightening the regex as I did in my example would be a bonus (though >>> probably ought to be a separate patch). 
>> >> so while i'm happy to just copy-paste your code in there, that's kind of >> a sensitive area of the code, as it was originally used only in the >> upload procedure, which I haven't tested at all. so i'm hesitant in just >> merging that in as is. > > I don't think there's a need to copy/paste my example code. If you > instead make the minimal suggested fix, then the resulting code will > be effectively equivalent to my example (minus the tighter regex). > >> i don't understand why or how this even works, to be honest: page names >> don't necessarily look like numbers, in fact, they generally don't. i >> don't understand why the patch submitted here even touches that function >> at all, considering that the function is only used on uploads. I just >> cargo-culted it from the original issue... > > I, myself, am not familiar with or a user of Mediawiki or with the Git > bridging, and I don't know what page names look like, but I'm pretty > well convinced from reading both the existing code and this patch that > the changes to get_mw_namespace_id_for_page() are really just a bug > fix to that function. My interpretation is that the function really > was intended to strip the ":id" portion of "name:id" before calling > get_mw_namespace_id(); the fact that the original code neglects to do > so seems just an oversight. The fact that the regex uses capturing > parentheses implies strongly that it was indeed the intention to use > $1 in the call to get_mw_namespace_id(). Unlike the "fix" in the patch > you posted from Kevin, which is perhaps unnecessarily complicated, the > fix I suggested above is about a minimal as possible. That is, > changing: > > return get_mw_namespace_id($namespace); > > to: > > return get_mw_namespace_id($1); > > should achieve the same result. (It could be made more robust by > tightening the regex as in my example, but that's a separate topic, > not needed just to get the function to work as intended.) 
> ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 1/4] remote-mediawiki: add namespace support 2017-10-29 23:08 ` Kevin @ 2017-10-30 2:14 ` Antoine Beaupré 0 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:14 UTC (permalink / raw) To: kevin, Eric Sunshine; +Cc: Git List On 2017-10-29 23:08:00, Kevin wrote: > So I shared the patch some time ago (~2 years). Surprisingly its just > now getting attention. I guess some renewed interest in using mediawiki > with git. I think what's happening is that someone (ie. me :p) figured it was about frigging time to actually send those patches to the git mailing list. ;) And I'm glad we're seeing such good reviews, so thanks Eric for that... > Myself, however, am no longer using mediawiki. Nor am I > completely clear on what the reasons were for using some variable or > another a couple of years ago. So... the best of luck, sorry I couldn't > be more helpful. That's too bad, but thanks for the feedback anyways. :) Frankly, I'm tempted to just completely remove the get_mw_namespace_id_for_page hunk - it's completely unrelated to the rest of the patch. Could that be a bugfix for a separate issue that crept up in your patchset? For example this? https://github.com/Git-Mediawiki/Git-Mediawiki/issues/43 A. -- That's one of the remarkable things about life: it's never so bad that it can't get worse. - Calvin ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v2 0/7] remote-mediawiki: add namespace support 2017-10-29 16:08 ` [PATCH 1/4] remote-mediawiki: add " Antoine Beaupré 2017-10-29 17:24 ` Eric Sunshine @ 2017-10-30 2:51 ` Antoine Beaupré 2017-10-30 2:51 ` [PATCH 1/7] " Antoine Beaupré ` (7 more replies) 2017-10-30 10:43 ` [PATCH 1/4] remote-mediawiki: add " Matthieu Moy 2 siblings, 8 replies; 78+ messages in thread
From: Antoine Beaupré @ 2017-10-30 2:51 UTC (permalink / raw) To: git

This patch series tries to integrate all the feedback received in the recent review from Eric Sunshine. It completely removes the confusing changes to get_mw_namespace_id_for_page() because I believe they are unrelated to the namespace support.

I also split up the last patch into 4 different patches for clarity and fixed the vocabulary (it's "virtual" namespaces, not "special", which is a specific namespace).

I left that die() in there because it makes the code a little cleaner and I'm lazy.

Thanks again for the good feedback!

^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH 1/7] remote-mediawiki: add namespace support 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré @ 2017-10-30 2:51 ` Antoine Beaupré 2017-10-30 2:51 ` [PATCH 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré ` (6 subsequent siblings) 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:51 UTC (permalink / raw) To: git; +Cc: Kevin, Antoine Beaupré From: Kevin <kevin@ki-ai.org> This introduces a new remote.origin.namespaces argument that is a space-separated list of namespaces. The list of pages extract is then taken from all the specified namespaces. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index e7f857c1a..5ffb57595 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -63,6 +63,10 @@ chomp(@tracked_pages); my @tracked_categories = split(/[ \n]/, run_git("config --get-all remote.${remotename}.categories")); chomp(@tracked_categories); +# Just like @tracked_categories, but for MediaWiki namespaces. +my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +chomp(@tracked_namespaces); + # Import media files on pull my $import_media = run_git("config --get --bool remote.${remotename}.mediaimport"); chomp($import_media); @@ -256,6 +260,23 @@ sub get_mw_tracked_categories { return; } +sub get_mw_tracked_namespaces { + my $pages = shift; + foreach my $local_namespace (@tracked_namespaces) { + my $mw_pages = $mediawiki->list( { + action => 'query', + list => 'allpages', + apnamespace => get_mw_namespace_id($local_namespace), + aplimit => 'max' } ) + || die $mediawiki->{error}->{code} . ': ' + . $mediawiki->{error}->{details} . 
"\n"; + foreach my $page (@{$mw_pages}) { + $pages->{$page->{title}} = $page; + } + } + return; +} + sub get_mw_all_pages { my $pages = shift; # No user-provided list, get the list of pages from the API. @@ -319,6 +340,10 @@ sub get_mw_pages { $user_defined = 1; get_mw_tracked_categories(\%pages); } + if (@tracked_namespaces) { + $user_defined = 1; + get_mw_tracked_namespaces(\%pages); + } if (!$user_defined) { get_mw_all_pages(\%pages); } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH 2/7] remote-mediawiki: allow fetching namespaces with spaces 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré 2017-10-30 2:51 ` [PATCH 1/7] " Antoine Beaupré @ 2017-10-30 2:51 ` Antoine Beaupré 2017-10-30 2:51 ` [PATCH 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré ` (5 subsequent siblings) 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:51 UTC (permalink / raw) To: git; +Cc: Ingo Ruhnke, Antoine Beaupré From: Ingo Ruhnke <grumbel@gmail.com> we still want to use spaces as separators in the config, but we should allow the user to specify namespaces with spaces, so we use underscore for this. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 5ffb57595..a1d783789 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -65,6 +65,7 @@ chomp(@tracked_categories); # Just like @tracked_categories, but for MediaWiki namespaces. my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +for (@tracked_namespaces) { s/_/ /g; } chomp(@tracked_namespaces); # Import media files on pull -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
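The parsing rule described in this patch can be tried in isolation. The sketch below wraps the same split and substitution in a hypothetical helper, parse_tracked_namespaces (the patch does this inline on the output of `git config --get-all`), and feeds it a made-up configuration value.

```perl
use strict;
use warnings;

# Hypothetical helper: the patch performs these two steps inline on the
# output of `git config --get-all remote.<name>.namespaces`.
sub parse_tracked_namespaces {
    my $raw = shift;
    # Spaces and newlines both separate entries in the config value.
    my @namespaces = split(/[ \n]/, $raw);
    # Underscores stand in for spaces inside a single namespace name.
    for (@namespaces) { s/_/ /g; }
    return @namespaces;
}

# Made-up config value: two entries on one line, one on the next.
my @namespaces = parse_tracked_namespaces("Talk User_talk\nHelp\n");
print join('|', @namespaces), "\n";
```

One consequence of this encoding is that every underscore in the configured value becomes a space, so a name written with a literal underscore cannot be expressed differently from one written with a space.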
* [PATCH 3/7] remote-mediawiki: show known namespace choices on failure 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré 2017-10-30 2:51 ` [PATCH 1/7] " Antoine Beaupré 2017-10-30 2:51 ` [PATCH 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré @ 2017-10-30 2:51 ` Antoine Beaupré 2017-10-30 2:51 ` [PATCH 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré ` (4 subsequent siblings) 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:51 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré If we fail to find a requested namespace, we should tell the user which ones we know about, since those were already fetched. This allows users to fetch all namespaces by specifying a dummy namespace, failing, then copying the list of namespaces in the config. Eventually, we should have a flag that allows fetching all namespaces automatically. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index a1d783789..e7616e1a2 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -1334,7 +1334,9 @@ sub get_mw_namespace_id { my $id; if (!defined $ns) { - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; + my @namespaces = sort keys %namespace_id; + for (@namespaces) { s/ /_/g; } + print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n"; $ns = {is_namespace => 0}; $namespace_id{$name} = $ns; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
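As a rough illustration of the message this patch produces, the sketch below builds the same string from a made-up namespace table. The helper name unknown_namespace_message and the sample ids are assumptions for the example; in the script the names come from the %namespace_id cache and the message is printed to STDERR.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the patched error path: list the known
# namespace names, sorted, with spaces turned back into underscores so the
# list can be pasted into the space-separated config value.
sub unknown_namespace_message {
    my ($name, $known) = @_;    # $known: hashref of namespace name => record
    my @namespaces = sort keys %{$known};
    for (@namespaces) { s/ /_/g; }
    return "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n";
}

# Made-up table standing in for the cached namespace list.
my %sample = ('User talk' => 3, 'Talk' => 1, 'Help' => 12);
print unknown_namespace_message('Bogus', \%sample);
```

The substitution runs on a copy of the key list, so the table itself keeps its original names.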
* [PATCH 4/7] remote-mediawiki: skip virtual namespaces 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré ` (2 preceding siblings ...) 2017-10-30 2:51 ` [PATCH 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré @ 2017-10-30 2:51 ` Antoine Beaupré 2017-11-01 13:52 ` Eric Sunshine 2017-10-30 2:51 ` [PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré ` (3 subsequent siblings) 7 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:51 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré Virtual namespaces do not correspond to pages in the database and are automatically generated by MediaWiki. It makes little sense, therefore, to fetch pages from those namespaces and the MW API doesn't support listing those pages. According to the documentation, those virtual namespaces are currently "Special" (-1) and "Media" (-2) but we treat all negative namespaces as "virtual" as a future-proofing mechanism. Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index e7616e1a2..5c85e64b6 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,10 +264,12 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; foreach my $local_namespace (@tracked_namespaces) { + my $namespace_id = get_mw_namespace_id($local_namespace); + next if $namespace_id < 0; # virtual namespaces don't support allpages my $mw_pages = $mediawiki->list( { action => 'query', list => 'allpages', - apnamespace => get_mw_namespace_id($local_namespace), + apnamespace => $namespace_id, aplimit => 'max' } ) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details} . 
"\n"; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH 4/7] remote-mediawiki: skip virtual namespaces 2017-10-30 2:51 ` [PATCH 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré @ 2017-11-01 13:52 ` Eric Sunshine 2017-11-01 16:45 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Eric Sunshine @ 2017-11-01 13:52 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Sun, Oct 29, 2017 at 10:51 PM, Antoine Beaupré <anarcat@debian.org> wrote: > Virtual namespaces do not correspond to pages in the database and are > automatically generated by MediaWiki. It makes little sense, > therefore, to fetch pages from those namespaces and the MW API doesn't > support listing those pages. > > According to the documentation, those virtual namespaces are currently > "Special" (-1) and "Media" (-2) but we treat all negative namespaces > as "virtual" as a future-proofing mechanism. This patch makes more sense now with the additional commentary. Thanks. More below. > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > index e7616e1a2..5c85e64b6 100755 > --- a/contrib/mw-to-git/git-remote-mediawiki.perl > +++ b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -264,10 +264,12 @@ sub get_mw_tracked_categories { > sub get_mw_tracked_namespaces { > my $pages = shift; > foreach my $local_namespace (@tracked_namespaces) { > + my $namespace_id = get_mw_namespace_id($local_namespace); > + next if $namespace_id < 0; # virtual namespaces don't support allpages Since (it appears) that get_mw_namespace_id() can return undef, you probably still need to take that into account before performing a numeric comparison: next if !$namespace_id || $namespace_id < 0; > my $mw_pages = $mediawiki->list( { > action => 'query', > list => 'allpages', > - apnamespace => get_mw_namespace_id($local_namespace), > + apnamespace => $namespace_id, > aplimit => 'max' } ) > || die $mediawiki->{error}->{code} . 
': ' > . $mediawiki->{error}->{details} . "\n"; ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 4/7] remote-mediawiki: skip virtual namespaces 2017-11-01 13:52 ` Eric Sunshine @ 2017-11-01 16:45 ` Antoine Beaupré 2017-11-02 1:24 ` Junio C Hamano 0 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-01 16:45 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List On 2017-11-01 09:52:09, Eric Sunshine wrote: > On Sun, Oct 29, 2017 at 10:51 PM, Antoine Beaupré <anarcat@debian.org> wrote: >> Virtual namespaces do not correspond to pages in the database and are >> automatically generated by MediaWiki. It makes little sense, >> therefore, to fetch pages from those namespaces and the MW API doesn't >> support listing those pages. >> >> According to the documentation, those virtual namespaces are currently >> "Special" (-1) and "Media" (-2) but we treat all negative namespaces >> as "virtual" as a future-proofing mechanism. > > This patch makes more sense now with the additional commentary. > Thanks. More below. > >> Signed-off-by: Antoine Beaupré <anarcat@debian.org> >> --- >> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl >> index e7616e1a2..5c85e64b6 100755 >> --- a/contrib/mw-to-git/git-remote-mediawiki.perl >> +++ b/contrib/mw-to-git/git-remote-mediawiki.perl >> @@ -264,10 +264,12 @@ sub get_mw_tracked_categories { >> sub get_mw_tracked_namespaces { >> my $pages = shift; >> foreach my $local_namespace (@tracked_namespaces) { >> + my $namespace_id = get_mw_namespace_id($local_namespace); >> + next if $namespace_id < 0; # virtual namespaces don't support allpages > > Since (it appears) that get_mw_namespace_id() can return undef, you > probably still need to take that into account before performing a > numeric comparison: > > next if !$namespace_id || $namespace_id < 0; I would argue that this bug exists already elsewhere in the code - no error handling exists there... Furthermore, it should be !defined() because it can be 0. 
It might still be worth fixing this, but I'm not sure what the process is here - in the latest "what's cooking" Junio said this patchset would be merged in "next". Should I reroll the patchset to fix this or not? A. -- To love only one is barbarism, for it comes at the expense of all others. Even if it is the love of God. - Nietzsche, "Beyond Good and Evil" ^ permalink raw reply [flat|nested] 78+ messages in thread
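The point about `!defined()` versus a plain truthiness test can be illustrated with a minimal sketch. The two helper names below are hypothetical and are not part of git-remote-mediawiki; the only assumption carried over from the thread is that the lookup may return undef and that 0 is a valid id (the main namespace).

```perl
use strict;
use warnings;

# Hypothetical helper: decide whether a namespace id should be skipped.
# Skips undef (lookup failed) and negative ids (virtual namespaces),
# but keeps 0, which is the id of the main namespace.
sub should_skip_namespace {
    my ($namespace_id) = @_;
    return 1 if !defined $namespace_id;
    return $namespace_id < 0 ? 1 : 0;
}

# Same decision written with a truthiness test: this variant also
# skips 0, because 0 is false in Perl.
sub should_skip_namespace_truthy {
    my ($namespace_id) = @_;
    return (!$namespace_id || $namespace_id < 0) ? 1 : 0;
}

for my $id (undef, -1, 0, 4) {
    my $label = defined $id ? $id : 'undef';
    printf "%-5s defined-check: %d  truthy-check: %d\n",
        $label, should_skip_namespace($id), should_skip_namespace_truthy($id);
}
```

The two variants only disagree for id 0, which is exactly the case that matters once the main namespace can be requested.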
* Re: [PATCH 4/7] remote-mediawiki: skip virtual namespaces 2017-11-01 16:45 ` Antoine Beaupré @ 2017-11-02 1:24 ` Junio C Hamano 2017-11-02 21:20 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Junio C Hamano @ 2017-11-02 1:24 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Eric Sunshine, Git List Antoine Beaupré <anarcat@debian.org> writes: > It might still worth fixing this, but I'm not sure what the process is > here - in the latest "what's cooking" Junio said this patchset would be > merged in "next". Should I reroll the patchset to fix this or not? The process is for you (the contributor of the topic) to yell at me, "don't merge it yet, there still are updates to come". That message _may_ come too late, in which case we may have to go incremental, but I usually try to leave at least a few days between the time I mark a topic as "will merge" and the time I actually do the merge, for this exact reason. Thanks. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 4/7] remote-mediawiki: skip virtual namespaces 2017-11-02 1:24 ` Junio C Hamano @ 2017-11-02 21:20 ` Antoine Beaupré 2017-11-06 0:38 ` Junio C Hamano 0 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:20 UTC (permalink / raw) To: Junio C Hamano; +Cc: Eric Sunshine, Git List On 2017-11-02 10:24:40, Junio C Hamano wrote: > Antoine Beaupré <anarcat@debian.org> writes: > >> It might still worth fixing this, but I'm not sure what the process is >> here - in the latest "what's cooking" Junio said this patchset would be >> merged in "next". Should I reroll the patchset to fix this or not? > > The process is for you (the contributor of the topic) to yell at me, > "don't merge it yet, there still are updates to come". YELL! "don't merge it yet, there still are updates to come". :) > That message _may_ come to late, in which case we may have to go > incremental, but I usually try to leave at least a few days between > the time I mark a topic as "will merge" and the time I actually do > the merge, for this exact reason. Awesome, thanks for the update. i'll roll a v4 with the last tweaks, hopefully that will be the last. a. -- How inappropriate to call this planet 'Earth' when it is quite clearly 'Ocean'. - Arthur C. Clarke ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 4/7] remote-mediawiki: skip virtual namespaces 2017-11-02 21:20 ` Antoine Beaupré @ 2017-11-06 0:38 ` Junio C Hamano 0 siblings, 0 replies; 78+ messages in thread From: Junio C Hamano @ 2017-11-06 0:38 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Eric Sunshine, Git List Antoine Beaupré <anarcat@debian.org> writes: > On 2017-11-02 10:24:40, Junio C Hamano wrote: >> Antoine Beaupré <anarcat@debian.org> writes: >> >>> It might still worth fixing this, but I'm not sure what the process is >>> here - in the latest "what's cooking" Junio said this patchset would be >>> merged in "next". Should I reroll the patchset to fix this or not? >> >> The process is for you (the contributor of the topic) to yell at me, >> "don't merge it yet, there still are updates to come". > > YELL! "don't merge it yet, there still are updates to come". :) Thanks; heard you loud and clear. >> That message _may_ come to late, in which case we may have to go >> incremental, but I usually try to leave at least a few days between >> the time I mark a topic as "will merge" and the time I actually do >> the merge, for this exact reason. > > Awesome, thanks for the update. > > i'll roll a v4 with the last tweaks, hopefully that will be the last. Thanks. ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré ` (3 preceding siblings ...) 2017-10-30 2:51 ` [PATCH 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré @ 2017-10-30 2:51 ` Antoine Beaupré 2017-11-01 19:56 ` Eric Sunshine 2017-10-30 2:51 ` [PATCH 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré ` (2 subsequent siblings) 7 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:51 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré When we specify a list of namespaces to fetch from, by default the MW API will not fetch from the default namespace, referred to as "(Main)" in the documentation: https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces I haven't found a way to address that "(Main)" namespace when getting the namespace ids: indeed, when listing namespaces, there is no "canonical" field for the main namespace, although there is a "*" field that is set to "" (empty). So in theory, we could specify the empty namespace to get the main namespace, but that would make specifying namespaces harder for the user: we would need to teach users about the "empty" default namespace. It would also make the code more complicated: we'd need to parse quotes in the configuration. So we simply override the query here and allow the user to specify "(Main)" since that is the publicly documented name.
Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 9 +++++++-- 1 file changed, 7 insertions(+), 2 deletions(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 5c85e64b6..2c2a7367b 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,9 +264,14 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; foreach my $local_namespace (@tracked_namespaces) { - my $namespace_id = get_mw_namespace_id($local_namespace); + my ($namespace_id, $mw_pages); + if ($local_namespace eq "(Main)") { + $namespace_id = 0; + } else { + $namespace_id = get_mw_namespace_id($local_namespace); + } next if $namespace_id < 0; # virtual namespaces don't support allpages - my $mw_pages = $mediawiki->list( { + $mw_pages = $mediawiki->list( { action => 'query', list => 'allpages', apnamespace => $namespace_id, -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace 2017-10-30 2:51 ` [PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré @ 2017-11-01 19:56 ` Eric Sunshine 2017-11-02 21:19 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Eric Sunshine @ 2017-11-01 19:56 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Sun, Oct 29, 2017 at 10:51 PM, Antoine Beaupré <anarcat@debian.org> wrote: > When we specify a list of namespaces to fetch from, by default the MW > API will not fetch from the default namespace, refered to as "(Main)" > in the documentation: > > https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces > > I haven't found a way to address that "(Main)" namespace when getting > the namespace ids: indeed, when listing namespaces, there is no > "canonical" field for the main namespace, although there is a "*" > field that is set to "" (empty). So in theory, we could specify the > empty namespace to get the main namespace, but that would make > specifying namespaces harder for the user: we would need to teach > users about the "empty" default namespace. It would also make the code > more complicated: we'd need to parse quotes in the configuration. > > So we simply override the query here and allow the user to specify > "(Main)" since that is the publicly documented name. Thanks, this explanation makes the patch a lot clearer. More below... 
> Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -264,9 +264,14 @@ sub get_mw_tracked_categories { > sub get_mw_tracked_namespaces { > my $pages = shift; > foreach my $local_namespace (@tracked_namespaces) { > - my $namespace_id = get_mw_namespace_id($local_namespace); > + my ($namespace_id, $mw_pages); > + if ($local_namespace eq "(Main)") { > + $namespace_id = 0; > + } else { > + $namespace_id = get_mw_namespace_id($local_namespace); > + } I meant to ask this in the previous round, but with the earlier patch mixing several distinct changes into one, I plumb forgot: Would it make sense to move this "(Main)" special case into get_mw_namespace_id() itself? After all, that function is all about determining an ID associated with a name, and "(Main)" is a name. > next if $namespace_id < 0; # virtual namespaces don't support allpages > - my $mw_pages = $mediawiki->list( { > + $mw_pages = $mediawiki->list( { Why did the "my" of $mw_pages get moved up to the top of the foreach loop? I can't seem to see any reason for it. Is this an unrelated change accidentally included in this patch? > action => 'query', > list => 'allpages', > apnamespace => $namespace_id, > -- ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace 2017-11-01 19:56 ` Eric Sunshine @ 2017-11-02 21:19 ` Antoine Beaupré 0 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:19 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List On 2017-11-01 15:56:51, Eric Sunshine wrote: >> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl >> @@ -264,9 +264,14 @@ sub get_mw_tracked_categories { >> sub get_mw_tracked_namespaces { >> my $pages = shift; >> foreach my $local_namespace (@tracked_namespaces) { >> - my $namespace_id = get_mw_namespace_id($local_namespace); >> + my ($namespace_id, $mw_pages); >> + if ($local_namespace eq "(Main)") { >> + $namespace_id = 0; >> + } else { >> + $namespace_id = get_mw_namespace_id($local_namespace); >> + } > > I meant to ask this in the previous round, but with the earlier patch > mixing several distinct changes into one, I plumb forgot: Would it > make sense to move this "(Main)" special case into > get_mw_namespace_id() itself? After all, that function is all about > determining an ID associated with a name, and "(Main)" is a name. Right. At first sight, I agree: get_mw_namespace_id should do the right thing. But then, I look at the code of that function, and it strikes me as ... well... really hard to actually do this the right way. In fact, I suspect that passing "" to get_mw_namespace_id would actually do the right thing. The problem, as I explained before, is that passing that in the configuration is pretty hard: it would needlessly complicate the configuration setting, so I think it's a fair shortcut to do it here. >> next if $namespace_id < 0; # virtual namespaces don't support allpages >> - my $mw_pages = $mediawiki->list( { >> + $mw_pages = $mediawiki->list( { > > Why did the "my" of $my_pages get moved up to the top of the foreach > loop? I can't seem to see any reason for it. 
Is this an unrelated > change accidentally included in this patch? Just a habit of declaring variables at the beginning of a block. Maybe it's because I'm old? :) I'll reroll a last patchset with those fixes. A. -- One of the strongest motives that leads men to art and science is escape from everyday life with its painful crudity and hopeless dreariness. Such men make this cosmos and its construction the pivot of their emotional life, in order to find the peace and security which they cannot find in the narrow whirlpool of personal experience. - Albert Einstein ^ permalink raw reply [flat|nested] 78+ messages in thread
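The question of where the "(Main)" special case should live can be sketched with a small wrapper. This is only an illustration under stated assumptions: `resolve_namespace_id`, `%fake_ids`, and `$lookup` are hypothetical names, and the lookup is a stand-in for the real API-backed get_mw_namespace_id(), whose internals are not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical wrapper, not the actual get_mw_namespace_id():
# $lookup is any code reference that maps a namespace name to an id
# (or returns undef when the name is unknown).
sub resolve_namespace_id {
    my ($name, $lookup) = @_;
    # "(Main)" is the documented name of the default namespace, id 0.
    return 0 if $name eq '(Main)';
    return $lookup->($name);
}

# Stand-in for the API-backed lookup, for illustration only.
my %fake_ids = ('Talk' => 1, 'User' => 2, 'Special' => -1);
my $lookup = sub { return $fake_ids{ $_[0] } };

for my $name ('(Main)', 'Talk', 'Special', 'Nonexistent') {
    my $id = resolve_namespace_id($name, $lookup);
    print "$name => ", (defined $id ? $id : 'undef'), "\n";
}
```

With the special case handled in one place, callers would only need the undef and negative-id checks, whichever function ends up owning the "(Main)" name.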
* [PATCH 6/7] remote-mediawiki: process namespaces in order 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré ` (4 preceding siblings ...) 2017-10-30 2:51 ` [PATCH 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré @ 2017-10-30 2:51 ` Antoine Beaupré 2017-11-01 19:59 ` Eric Sunshine 2017-10-30 2:51 ` [PATCH 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré 7 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:51 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré Ideally, we'd process them in numeric order since that is more logical, but we can't do that yet since this is where we find the numeric identifiers in the first place. Lexicographic order is a good compromise. Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 2c2a7367b..5199af6f6 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -263,7 +263,7 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; - foreach my $local_namespace (@tracked_namespaces) { + foreach my $local_namespace (sort @tracked_namespaces) { my ($namespace_id, $mw_pages); if ($local_namespace eq "(Main)") { $namespace_id = 0; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH 6/7] remote-mediawiki: process namespaces in order 2017-10-30 2:51 ` [PATCH 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré @ 2017-11-01 19:59 ` Eric Sunshine 0 siblings, 0 replies; 78+ messages in thread From: Eric Sunshine @ 2017-11-01 19:59 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Sun, Oct 29, 2017 at 10:51 PM, Antoine Beaupré <anarcat@debian.org> wrote: > Ideally, we'd process them in numeric order since that is more > logical, but we can't do that yet since this is where we find the > numeric identifiers in the first place. Lexicographic order is a good > compromise. The reader of this commit message is left with the question: Why is this change needed? Is it for the benefit of a human eventually seeing the output? Is it because a subsequent patch requires a certain order? > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > contrib/mw-to-git/git-remote-mediawiki.perl | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > index 2c2a7367b..5199af6f6 100755 > --- a/contrib/mw-to-git/git-remote-mediawiki.perl > +++ b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -263,7 +263,7 @@ sub get_mw_tracked_categories { > > sub get_mw_tracked_namespaces { > my $pages = shift; > - foreach my $local_namespace (@tracked_namespaces) { > + foreach my $local_namespace (sort @tracked_namespaces) { > my ($namespace_id, $mw_pages); > if ($local_namespace eq "(Main)") { > $namespace_id = 0; > -- ^ permalink raw reply [flat|nested] 78+ messages in thread
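For readers weighing the ordering question, here is a minimal sketch of the difference between the lexicographic order the patch uses and the numeric order it mentions as the ideal. The `%id_of` mapping is a hypothetical stand-in for ids that, in the real code, are only known after querying the wiki.

```perl
use strict;
use warnings;

my @tracked = ('Talk', 'Help', '(Main)', 'User');

# Default sort compares strings, so it needs no knowledge of the ids.
my @by_name = sort @tracked;
print "by name: @by_name\n";

# Once ids are known (hypothetical mapping here), a numeric sort is possible.
my %id_of = ('(Main)' => 0, 'Talk' => 1, 'User' => 2, 'Help' => 12);
my @by_id = sort { $id_of{$a} <=> $id_of{$b} } @tracked;
print "by id:   @by_id\n";
```

Either order is deterministic, which may be the main practical benefit over iterating in configuration order.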
* [PATCH 7/7] remote-mediawiki: show progress while fetching namespaces 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré ` (5 preceding siblings ...) 2017-10-30 2:51 ` [PATCH 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré @ 2017-10-30 2:51 ` Antoine Beaupré 2017-11-01 20:01 ` Eric Sunshine 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré 7 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:51 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré Without this, the fetch process appears to hang while we fetch page listings across the namespaces. Obviously, it should be possible to silence this with -q, but that's an issue already present everywhere in the code and should be fixed separately: https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30 Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 5199af6f6..61e6dd798 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -278,6 +278,7 @@ sub get_mw_tracked_namespaces { aplimit => 'max' } ) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details} . "\n"; + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; foreach my $page (@{$mw_pages}) { $pages->{$page->{title}} = $page; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH 7/7] remote-mediawiki: show progress while fetching namespaces 2017-10-30 2:51 ` [PATCH 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré @ 2017-11-01 20:01 ` Eric Sunshine 0 siblings, 0 replies; 78+ messages in thread From: Eric Sunshine @ 2017-11-01 20:01 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Sun, Oct 29, 2017 at 10:51 PM, Antoine Beaupré <anarcat@debian.org> wrote: > Without this, the fetch process seems hanged while we fetch page > listings across the namespaces. Obviously, it should be possible to > silence this with -q, but that's an issue already present everywhere > in the code and should be fixed separately: > > https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30 Unlike the previous round, this commit message makes it clear that this new printed message is indeed intentional. Thanks. > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > contrib/mw-to-git/git-remote-mediawiki.perl | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > index 5199af6f6..61e6dd798 100755 > --- a/contrib/mw-to-git/git-remote-mediawiki.perl > +++ b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -278,6 +278,7 @@ sub get_mw_tracked_namespaces { > aplimit => 'max' } ) > || die $mediawiki->{error}->{code} . ': ' > . $mediawiki->{error}->{details} . "\n"; > + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; > foreach my $page (@{$mw_pages}) { > $pages->{$page->{title}} = $page; > } > -- > 2.11.0 ^ permalink raw reply [flat|nested] 78+ messages in thread
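One Perl detail that may be worth double-checking in the progress message: `$#{$ref}` is the index of the last element of the referenced array, not the number of elements. The sketch below uses hypothetical helper names and sample data to show the difference; it does not claim the patch intended one or the other.

```perl
use strict;
use warnings;

# Hypothetical helpers contrasting the two expressions.
sub last_index {
    my ($array_ref) = @_;
    return $#{$array_ref};          # index of the last element, -1 if empty
}

sub page_count {
    my ($array_ref) = @_;
    return scalar @{$array_ref};    # number of elements
}

# Sample data shaped like an allpages result, for illustration only.
my $mw_pages = [ { title => 'A' }, { title => 'B' }, { title => 'C' } ];

print "last index: ", last_index($mw_pages), "\n";
print "page count: ", page_count($mw_pages), "\n";
```

If the message is meant to report how many pages were found, `scalar @{$mw_pages}` would be the expression to use, assuming that is the intent.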
* [PATCH v3 0/7] remote-mediawiki: namespace support 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré ` (6 preceding siblings ...) 2017-10-30 2:51 ` [PATCH 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré @ 2017-11-02 21:25 ` Antoine Beaupré 2017-11-02 21:25 ` [PATCH v3 1/7] remote-mediawiki: add " Antoine Beaupré ` (8 more replies) 7 siblings, 9 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:25 UTC (permalink / raw) To: git This should be the final roll of patches for namespace support. I included the undef check even though that problem occurs elsewhere in the code. I also removed the needless "my" move. Hopefully that should be the last in the queue! ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v3 1/7] remote-mediawiki: add namespace support 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré @ 2017-11-02 21:25 ` Antoine Beaupré 2017-11-02 21:25 ` [PATCH v3 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré ` (7 subsequent siblings) 8 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:25 UTC (permalink / raw) To: git; +Cc: Kevin, Antoine Beaupré From: Kevin <kevin@ki-ai.org> This introduces a new remote.origin.namespaces argument that is a space-separated list of namespaces. The list of pages to extract is then taken from all the specified namespaces. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index e7f857c1a..5ffb57595 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -63,6 +63,10 @@ chomp(@tracked_pages); my @tracked_categories = split(/[ \n]/, run_git("config --get-all remote.${remotename}.categories")); chomp(@tracked_categories); +# Just like @tracked_categories, but for MediaWiki namespaces. +my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +chomp(@tracked_namespaces); + # Import media files on pull my $import_media = run_git("config --get --bool remote.${remotename}.mediaimport"); chomp($import_media); @@ -256,6 +260,23 @@ sub get_mw_tracked_categories { return; } +sub get_mw_tracked_namespaces { + my $pages = shift; + foreach my $local_namespace (@tracked_namespaces) { + my $mw_pages = $mediawiki->list( { + action => 'query', + list => 'allpages', + apnamespace => get_mw_namespace_id($local_namespace), + aplimit => 'max' } ) + || die $mediawiki->{error}->{code} . ': ' + . 
$mediawiki->{error}->{details} . "\n"; + foreach my $page (@{$mw_pages}) { + $pages->{$page->{title}} = $page; + } + } + return; +} + sub get_mw_all_pages { my $pages = shift; # No user-provided list, get the list of pages from the API. @@ -319,6 +340,10 @@ sub get_mw_pages { $user_defined = 1; get_mw_tracked_categories(\%pages); } + if (@tracked_namespaces) { + $user_defined = 1; + get_mw_tracked_namespaces(\%pages); + } if (!$user_defined) { get_mw_all_pages(\%pages); } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH v3 2/7] remote-mediawiki: allow fetching namespaces with spaces 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré 2017-11-02 21:25 ` [PATCH v3 1/7] remote-mediawiki: add " Antoine Beaupré @ 2017-11-02 21:25 ` Antoine Beaupré 2017-11-02 21:25 ` [PATCH v3 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré ` (6 subsequent siblings) 8 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:25 UTC (permalink / raw) To: git; +Cc: Ingo Ruhnke, Antoine Beaupré From: Ingo Ruhnke <grumbel@gmail.com> We still want to use spaces as separators in the config, but we should allow the user to specify namespaces with spaces, so we use underscores for this. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 5ffb57595..a1d783789 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -65,6 +65,7 @@ chomp(@tracked_categories); # Just like @tracked_categories, but for MediaWiki namespaces. my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +for (@tracked_namespaces) { s/_/ /g; } chomp(@tracked_namespaces); # Import media files on pull -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
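The underscore convention described in this patch can be tried in isolation with a small sketch. `parse_namespaces` is a hypothetical helper, not a function in git-remote-mediawiki, and the `grep { length }` filter for empty fields is an addition made for this example rather than something the patch does.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the parsing described above: split the
# configured value on spaces or newlines, then turn underscores into
# spaces so that names such as "User talk" can be expressed.
sub parse_namespaces {
    my ($config_value) = @_;
    # Dropping empty fields is an assumption of this sketch.
    my @namespaces = grep { length } split /[ \n]/, $config_value;
    s/_/ /g for @namespaces;
    return @namespaces;
}

my @ns = parse_namespaces("Talk User_talk Help_talk\n");
print join(', ', map { "'$_'" } @ns), "\n";
```

A side effect of this convention is that a namespace whose real name contains a literal underscore cannot be distinguished from one containing a space, which seems acceptable given how MediaWiki treats the two in titles.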
* [PATCH v3 3/7] remote-mediawiki: show known namespace choices on failure 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré 2017-11-02 21:25 ` [PATCH v3 1/7] remote-mediawiki: add " Antoine Beaupré 2017-11-02 21:25 ` [PATCH v3 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré @ 2017-11-02 21:25 ` Antoine Beaupré 2017-11-02 21:25 ` [PATCH v3 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré ` (5 subsequent siblings) 8 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:25 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré If we fail to find a requested namespace, we should tell the user which ones we know about, since those were already fetched. This allows users to fetch all namespaces by specifying a dummy namespace, failing, then copying the list of namespaces in the config. Eventually, we should have a flag that allows fetching all namespaces automatically. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index a1d783789..e7616e1a2 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -1334,7 +1334,9 @@ sub get_mw_namespace_id { my $id; if (!defined $ns) { - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; + my @namespaces = sort keys %namespace_id; + for (@namespaces) { s/ /_/g; } + print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n"; $ns = {is_namespace => 0}; $namespace_id{$name} = $ns; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH v3 4/7] remote-mediawiki: skip virtual namespaces 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (2 preceding siblings ...) 2017-11-02 21:25 ` [PATCH v3 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré @ 2017-11-02 21:25 ` Antoine Beaupré 2017-11-02 22:43 ` Eric Sunshine 2017-11-02 21:25 ` [PATCH v3 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré ` (4 subsequent siblings) 8 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:25 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré Virtual namespaces do not correspond to pages in the database and are automatically generated by MediaWiki. It makes little sense, therefore, to fetch pages from those namespaces and the MW API doesn't support listing those pages. According to the documentation, those virtual namespaces are currently "Special" (-1) and "Media" (-2) but we treat all negative namespaces as "virtual" as a future-proofing mechanism. 
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index e7616e1a2..21fb2e302 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,10 +264,13 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; foreach my $local_namespace (@tracked_namespaces) { + my $namespace_id = get_mw_namespace_id($local_namespace); + # virtual namespaces don't support allpages + next if !defined($namespace_id) || $namespace_id < 0; my $mw_pages = $mediawiki->list( { action => 'query', list => 'allpages', - apnamespace => get_mw_namespace_id($local_namespace), + apnamespace => $namespace_id, aplimit => 'max' } ) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details} . "\n"; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH v3 4/7] remote-mediawiki: skip virtual namespaces 2017-11-02 21:25 ` [PATCH v3 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré @ 2017-11-02 22:43 ` Eric Sunshine 2017-11-02 22:54 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Eric Sunshine @ 2017-11-02 22:43 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Thu, Nov 2, 2017 at 5:25 PM, Antoine Beaupré <anarcat@debian.org> wrote: > Virtual namespaces do not correspond to pages in the database and are > automatically generated by MediaWiki. It makes little sense, > therefore, to fetch pages from those namespaces and the MW API doesn't > support listing those pages. > > According to the documentation, those virtual namespaces are currently > "Special" (-1) and "Media" (-2) but we treat all negative namespaces > as "virtual" as a future-proofing mechanism. > > Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> It probably would be best to omit this Reviewed-by: since it was not provided explicitly. More importantly, I'm neither a user of nor familiar with MediaWiki or its API, so a Reviewed-by: from me has little or no value. Probably best would be for someone such as Matthieu to give his Reviewed-by: if he so desires. > Signed-off-by: Antoine Beaupré <anarcat@debian.org> ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH v3 4/7] remote-mediawiki: skip virtual namespaces 2017-11-02 22:43 ` Eric Sunshine @ 2017-11-02 22:54 ` Antoine Beaupré 0 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 22:54 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List, Junio C Hamano On 2017-11-02 18:43:00, Eric Sunshine wrote: > On Thu, Nov 2, 2017 at 5:25 PM, Antoine Beaupré <anarcat@debian.org> wrote: >> Virtual namespaces do not correspond to pages in the database and are >> automatically generated by MediaWiki. It makes little sense, >> therefore, to fetch pages from those namespaces and the MW API doesn't >> support listing those pages. >> >> According to the documentation, those virtual namespaces are currently >> "Special" (-1) and "Media" (-2) but we treat all negative namespaces >> as "virtual" as a future-proofing mechanism. >> >> Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> > > It probably would be best to omit this Reviewed-by: since it was not > provided explicitly. More importantly, I'm neither a user of nor > familiar with MediaWiki or its API, so a Reviewed-by: from me has > little or no value. Probably best would be for someone such as > Matthieu to give his Reviewed-by: if he so desires. Alright, I was wondering what the process was for those. I didn't want to leave your contributions by the wayside... I'll wait a little while longer for more feedback and then resend without those. unless... @junio: my github repo has the branch without those Reviewed-by tags, iirc. so if you can merge from there, that will keep me from sending yet another pile of patches for such a trivial change... a. -- Semantics is the gravity of abstraction. ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v3 5/7] remote-mediawiki: support fetching from (Main) namespace 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (3 preceding siblings ...) 2017-11-02 21:25 ` [PATCH v3 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré @ 2017-11-02 21:25 ` Antoine Beaupré 2017-11-02 22:48 ` Eric Sunshine 2017-11-02 21:25 ` [PATCH v3 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré ` (3 subsequent siblings) 8 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:25 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré When we specify a list of namespaces to fetch from, by default the MW API will not fetch from the default namespace, referred to as "(Main)" in the documentation: https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces I haven't found a way to address that "(Main)" namespace when getting the namespace ids: indeed, when listing namespaces, there is no "canonical" field for the main namespace, although there is a "*" field that is set to "" (empty). So in theory, we could specify the empty namespace to get the main namespace, but that would make specifying namespaces harder for the user: we would need to teach users about the "empty" default namespace. It would also make the code more complicated: we'd need to parse quotes in the configuration. So we simply override the query here and allow the user to specify "(Main)" since that is the publicly documented name.
Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 21fb2e302..898541a9f 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,7 +264,12 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; foreach my $local_namespace (@tracked_namespaces) { - my $namespace_id = get_mw_namespace_id($local_namespace); + my $namespace_id; + if ($local_namespace eq "(Main)") { + $namespace_id = 0; + } else { + $namespace_id = get_mw_namespace_id($local_namespace); + } # virtual namespaces don't support allpages next if !defined($namespace_id) || $namespace_id < 0; my $mw_pages = $mediawiki->list( { -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
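The lookup described in this patch can be tried in isolation. The following is only a sketch: `resolve_namespace_id` and the `%known_ids` table are hypothetical stand-ins for `get_mw_namespace_id()` and the data it gets from the MediaWiki API, which are not queried here.

```perl
use strict;
use warnings;

# Hypothetical name-to-id table standing in for what the MediaWiki API
# would report; negative ids are the "virtual" namespaces.
my %known_ids = (Talk => 1, Help => 12, Special => -1, Media => -2);

# Hypothetical stand-in for get_mw_namespace_id().
sub resolve_namespace_id {
    my $name = shift;
    # "(Main)" has no canonical name in the API listing, so map it by hand.
    return 0 if $name eq '(Main)';
    return $known_ids{$name};    # undef when the namespace is unknown
}

my @fetchable;
foreach my $name ('(Main)', 'Talk', 'Special', 'Bogus') {
    my $id = resolve_namespace_id($name);
    # Same guard as in the patch: skip unknown and virtual namespaces.
    next if !defined($id) || $id < 0;
    push @fetchable, "$name=$id";
}
print "@fetchable\n";    # prints "(Main)=0 Talk=1"
```

Note that the guard has to test definedness before the numeric comparison, since id 0 is a valid (and false) value.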
* Re: [PATCH v3 5/7] remote-mediawiki: support fetching from (Main) namespace 2017-11-02 21:25 ` [PATCH v3 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré @ 2017-11-02 22:48 ` Eric Sunshine 0 siblings, 0 replies; 78+ messages in thread From: Eric Sunshine @ 2017-11-02 22:48 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Thu, Nov 2, 2017 at 5:25 PM, Antoine Beaupré <anarcat@debian.org> wrote: > When we specify a list of namespaces to fetch from, by default the MW > API will not fetch from the default namespace, referred to as "(Main)" > in the documentation: > > https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces > > I haven't found a way to address that "(Main)" namespace when getting > the namespace ids: indeed, when listing namespaces, there is no > "canonical" field for the main namespace, although there is a "*" > field that is set to "" (empty). So in theory, we could specify the > empty namespace to get the main namespace, but that would make > specifying namespaces harder for the user: we would need to teach > users about the "empty" default namespace. It would also make the code > more complicated: we'd need to parse quotes in the configuration. > > So we simply override the query here and allow the user to specify > "(Main)" since that is the publicly documented name. > > Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> As with the previous patch, it would be best to drop this Reviewed-by: since it has no value with my name attached to it and was not provided explicitly. > Signed-off-by: Antoine Beaupré <anarcat@debian.org> ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v3 6/7] remote-mediawiki: process namespaces in order 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (4 preceding siblings ...) 2017-11-02 21:25 ` [PATCH v3 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré @ 2017-11-02 21:25 ` Antoine Beaupré 2017-11-02 22:49 ` Eric Sunshine 2017-11-02 21:25 ` [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré ` (2 subsequent siblings) 8 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:25 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré Ideally, we'd process them in numeric order since that is more logical, but we can't do that yet since this is where we find the numeric identifiers in the first place. Lexicographic order is a good compromise. Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 898541a9f..f53e638cf 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -263,7 +263,7 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; - foreach my $local_namespace (@tracked_namespaces) { + foreach my $local_namespace (sort @tracked_namespaces) { my $namespace_id; if ($local_namespace eq "(Main)") { $namespace_id = 0; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
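To make the trade-off in this commit message concrete, here is a small sketch comparing the two orderings. The namespace list and the `%id` table are illustrative assumptions; in the real script the ids are only known after each namespace has been looked up.

```perl
use strict;
use warnings;

my @tracked_namespaces = ('Talk', '(Main)', 'Help', 'User talk');

# What the patch does: plain sort, i.e. string comparison.
my @lexicographic = sort @tracked_namespaces;
print join(', ', @lexicographic), "\n";   # prints "(Main), Help, Talk, User talk"

# What numeric order would look like if the ids were already at hand
# (hypothetical table, for illustration only).
my %id = ('(Main)' => 0, 'Talk' => 1, 'User talk' => 3, 'Help' => 12);
my @numeric = sort { $id{$a} <=> $id{$b} } @tracked_namespaces;
print join(', ', @numeric), "\n";         # prints "(Main), Talk, User talk, Help"
```

Either way the order is deterministic from one fetch to the next, which is the main gain over iterating in configuration order.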
* Re: [PATCH v3 6/7] remote-mediawiki: process namespaces in order 2017-11-02 21:25 ` [PATCH v3 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré @ 2017-11-02 22:49 ` Eric Sunshine 0 siblings, 0 replies; 78+ messages in thread From: Eric Sunshine @ 2017-11-02 22:49 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Thu, Nov 2, 2017 at 5:25 PM, Antoine Beaupré <anarcat@debian.org> wrote: > Ideally, we'd process them in numeric order since that is more > logical, but we can't do that yet since this is where we find the > numeric identifiers in the first place. Lexicographic order is a good > compromise. > > Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Ditto: It would be best to drop this Reviewed-by: since it has no value with my name attached to it and was not provided explicitly. > Signed-off-by: Antoine Beaupré <anarcat@debian.org> ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (5 preceding siblings ...) 2017-11-02 21:25 ` [PATCH v3 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré @ 2017-11-02 21:25 ` Antoine Beaupré 2017-11-02 22:18 ` Thomas Adam 2017-11-02 22:50 ` Eric Sunshine 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré 8 siblings, 2 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 21:25 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré Without this, the fetch process appears to hang while we fetch page listings across the namespaces. Obviously, it should be possible to silence this with -q, but that's an issue already present everywhere in the code and should be fixed separately: https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30 Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index f53e638cf..dc43a950b 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -279,6 +279,7 @@ sub get_mw_tracked_namespaces { aplimit => 'max' } ) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details} . "\n"; + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; foreach my $page (@{$mw_pages}) { $pages->{$page->{title}} = $page; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
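A side note on the progress line in this patch: in Perl, `$#{$ref}` is the index of the last element of the referenced array, so it is one lower than the number of elements. Whether that is what the message is meant to report is not stated in the thread; the sketch below only shows the difference, using a made-up three-page list.

```perl
use strict;
use warnings;

# Made-up result list standing in for what $mediawiki->list() returns.
my $mw_pages = [ { title => 'A' }, { title => 'B' }, { title => 'C' } ];

my $last_index = $#{$mw_pages};        # index of the last element: 2
my $count      = scalar @{$mw_pages};  # number of elements: 3

print {*STDERR} "last index $last_index, $count pages found\n";
```

For an empty list, `$#{$mw_pages}` is -1 while `scalar @{$mw_pages}` is 0.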
* Re: [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-02 21:25 ` [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré @ 2017-11-02 22:18 ` Thomas Adam 2017-11-02 22:26 ` Antoine Beaupré 2017-11-02 22:50 ` Eric Sunshine 1 sibling, 1 reply; 78+ messages in thread From: Thomas Adam @ 2017-11-02 22:18 UTC (permalink / raw) To: Antoine Beaupré; +Cc: git Hi, On Thu, Nov 02, 2017 at 05:25:18PM -0400, Antoine Beaupré wrote: > + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; How is this any different to using warn()? I appreciate you're using a globbed filehandle, but it seems superfluous to me. Kindly, Thomas ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-02 22:18 ` Thomas Adam @ 2017-11-02 22:26 ` Antoine Beaupré 2017-11-02 22:31 ` Thomas Adam 0 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 22:26 UTC (permalink / raw) To: Thomas Adam; +Cc: git On 2017-11-02 22:18:07, Thomas Adam wrote: > Hi, > > On Thu, Nov 02, 2017 at 05:25:18PM -0400, Antoine Beaupré wrote: >> + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; > > How is this any different to using warn()? I appreciate you're using a > globbed filehandle, but it seems superfluous to me. It's what is used everywhere in the module, I'm just tagging along. This was discussed before: there's an issue about cleaning up the messaging in that module, that can be fixed separately. A. -- N'aimer qu'un seul est barbarie, car c'est au détriment de tous les autres. Fût-ce l'amour de Dieu. - Nietzsche, "Par delà le bien et le mal" ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-02 22:26 ` Antoine Beaupré @ 2017-11-02 22:31 ` Thomas Adam 2017-11-02 23:10 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Thomas Adam @ 2017-11-02 22:31 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Thomas Adam, git On Thu, Nov 02, 2017 at 06:26:43PM -0400, Antoine Beaupré wrote: > On 2017-11-02 22:18:07, Thomas Adam wrote: > > Hi, > > > > On Thu, Nov 02, 2017 at 05:25:18PM -0400, Antoine Beaupré wrote: > >> + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; > > > > How is this any different to using warn()? I appreciate you're using a > > globbed filehandle, but it seems superfluous to me. > > It's what is used everywhere in the module, I'm just tagging along. > > This was discussed before: there's an issue about cleaning up the > messaging in that module, that can be fixed separately. Understood. That should happen sooner rather than later. -- Thomas Adam ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-02 22:31 ` Thomas Adam @ 2017-11-02 23:10 ` Antoine Beaupré 2017-11-04 9:57 ` Thomas Adam 0 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-02 23:10 UTC (permalink / raw) To: Thomas Adam; +Cc: Thomas Adam, git On 2017-11-02 22:31:02, Thomas Adam wrote: > On Thu, Nov 02, 2017 at 06:26:43PM -0400, Antoine Beaupré wrote: >> On 2017-11-02 22:18:07, Thomas Adam wrote: >> > Hi, >> > >> > On Thu, Nov 02, 2017 at 05:25:18PM -0400, Antoine Beaupré wrote: >> >> + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; >> > >> > How is this any different to using warn()? I appreciate you're using a >> > globbed filehandle, but it seems superfluous to me. >> >> It's what is used everywhere in the module, I'm just tagging along. >> >> This was discussed before: there's an issue about cleaning up the >> messaging in that module, that can be fixed separately. > > Understood. That should happen sooner rather than later. Actually, is there a standard way to do this in git with Perl extensions? I know about "option verbosity N" but how should I translate this into Perl? Carp? Warn? Log::Any? Log4perl? Recommendations welcome... A. -- Si Dieu existe, j'espère qu'Il a une excuse valable - Daniel Pennac ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-02 23:10 ` Antoine Beaupré @ 2017-11-04 9:57 ` Thomas Adam 0 siblings, 0 replies; 78+ messages in thread From: Thomas Adam @ 2017-11-04 9:57 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Thomas Adam, git On Thu, Nov 02, 2017 at 07:10:17PM -0400, Antoine Beaupré wrote: > Actually, is there a standard way to do this in git with Perl > extensions? I know about "option verbosity N" but how should I translate > this into Perl? Carp? Warn? Log::Any? Log4perl? No, not really. From a quick glance at some of the existing perl code in git, a lot of it continues to use "print STDERR" -- but then to be fair, a lot of the perl code also reads like it has been written by C programmers... While there's nothing wrong with using "print STDERR", it's probably wiser to transition this to using Carp in the long run -- it would decrease the round-trip time to debugging should there be a situation where that was needed, and hence I would recommend using "warn" for less-severe errors/debugging. -- Thomas Adam ^ permalink raw reply [flat|nested] 78+ messages in thread
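For readers weighing the options named in this sub-thread, the sketch below puts `warn` and `Carp::carp` side by side. The `report_progress` helper is hypothetical, and a `__WARN__` handler is installed only so that the messages can be collected and shown; nothing here is taken from git-remote-mediawiki itself.

```perl
use strict;
use warnings;
use Carp qw(carp);

# Hypothetical helper: reports progress through carp(), which appends
# location information (see the Carp documentation for which frame it picks).
sub report_progress {
    my ($count, $namespace) = @_;
    carp "$count pages found in namespace $namespace";
    return;
}

# Collect everything sent through the warning machinery.
my @seen;
local $SIG{__WARN__} = sub { push @seen, $_[0] };

warn "plain message with newline\n";   # printed as-is, no location added
warn "plain message without newline";  # Perl appends " at FILE line N."
report_progress(3, 'Talk');            # carp adds its own location text

print {*STDERR} $_ for @seen;
```

Both `warn` and `carp` go through `$SIG{__WARN__}`, so a single handler could later honour a verbosity setting, which a bare `print {*STDERR}` would bypass.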
* Re: [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-02 21:25 ` [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré 2017-11-02 22:18 ` Thomas Adam @ 2017-11-02 22:50 ` Eric Sunshine 1 sibling, 0 replies; 78+ messages in thread From: Eric Sunshine @ 2017-11-02 22:50 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Thu, Nov 2, 2017 at 5:25 PM, Antoine Beaupré <anarcat@debian.org> wrote: > Without this, the fetch process seems hanged while we fetch page > listings across the namespaces. Obviously, it should be possible to > silence this with -q, but that's an issue already present everywhere > in the code and should be fixed separately: > > https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30 > > Reviewed-by: Eric Sunshine <sunshine@sunshineco.com> Ditto: It would be best to drop this Reviewed-by: since it has no value with my name attached to it and was not provided explicitly. > Signed-off-by: Antoine Beaupré <anarcat@debian.org> ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v4 0/7] remote-mediawiki: namespace support 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (6 preceding siblings ...) 2017-11-02 21:25 ` [PATCH v3 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré @ 2017-11-06 21:19 ` Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 1/7] remote-mediawiki: add " Antoine Beaupré ` (6 more replies) 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré 8 siblings, 7 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-06 21:19 UTC (permalink / raw) To: git; +Cc: gitster Hopefully, the final series. This includes only one more fix, from Thomas, to remove an extra loop. This should, at last, be ready to merge. ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v4 1/7] remote-mediawiki: add namespace support 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré @ 2017-11-06 21:19 ` Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré ` (5 subsequent siblings) 6 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-06 21:19 UTC (permalink / raw) To: git; +Cc: gitster, Kevin, Antoine Beaupré From: Kevin <kevin@ki-ai.org> This introduces a new remote.origin.namespaces argument that is a space-separated list of namespaces. The list of pages to extract is then taken from all the specified namespaces. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index e7f857c1a..5ffb57595 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -63,6 +63,10 @@ chomp(@tracked_pages); my @tracked_categories = split(/[ \n]/, run_git("config --get-all remote.${remotename}.categories")); chomp(@tracked_categories); +# Just like @tracked_categories, but for MediaWiki namespaces. +my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +chomp(@tracked_namespaces); + # Import media files on pull my $import_media = run_git("config --get --bool remote.${remotename}.mediaimport"); chomp($import_media); @@ -256,6 +260,23 @@ sub get_mw_tracked_categories { return; } +sub get_mw_tracked_namespaces { + my $pages = shift; + foreach my $local_namespace (@tracked_namespaces) { + my $mw_pages = $mediawiki->list( { + action => 'query', + list => 'allpages', + apnamespace => get_mw_namespace_id($local_namespace), + aplimit => 'max' } ) + || die $mediawiki->{error}->{code} .
': ' + . $mediawiki->{error}->{details} . "\n"; + foreach my $page (@{$mw_pages}) { + $pages->{$page->{title}} = $page; + } + } + return; +} + sub get_mw_all_pages { my $pages = shift; # No user-provided list, get the list of pages from the API. @@ -319,6 +340,10 @@ sub get_mw_pages { $user_defined = 1; get_mw_tracked_categories(\%pages); } + if (@tracked_namespaces) { + $user_defined = 1; + get_mw_tracked_namespaces(\%pages); + } if (!$user_defined) { get_mw_all_pages(\%pages); } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH v4 2/7] remote-mediawiki: allow fetching namespaces with spaces 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 1/7] remote-mediawiki: add " Antoine Beaupré @ 2017-11-06 21:19 ` Antoine Beaupré 2017-11-07 7:08 ` Thomas Adam 2017-11-06 21:19 ` [PATCH v4 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré ` (4 subsequent siblings) 6 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-06 21:19 UTC (permalink / raw) To: git; +Cc: gitster, Ingo Ruhnke, Antoine Beaupré From: Ingo Ruhnke <grumbel@gmail.com> we still want to use spaces as separators in the config, but we should allow the user to specify namespaces with spaces, so we use underscore for this. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 5ffb57595..a1d783789 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -65,6 +65,7 @@ chomp(@tracked_categories); # Just like @tracked_categories, but for MediaWiki namespaces. my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +for (@tracked_namespaces) { s/_/ /g; } chomp(@tracked_namespaces); # Import media files on pull -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH v4 2/7] remote-mediawiki: allow fetching namespaces with spaces 2017-11-06 21:19 ` [PATCH v4 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré @ 2017-11-07 7:08 ` Thomas Adam 2017-11-07 16:03 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Thomas Adam @ 2017-11-07 7:08 UTC (permalink / raw) To: Antoine Beaupré; +Cc: git, gitster, Ingo Ruhnke On Mon, Nov 06, 2017 at 04:19:48PM -0500, Antoine Beaupré wrote: > From: Ingo Ruhnke <grumbel@gmail.com> > > we still want to use spaces as separators in the config, but we should > allow the user to specify namespaces with spaces, so we use underscore > for this. > > Reviewed-by: Antoine Beaupré <anarcat@debian.org> > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > contrib/mw-to-git/git-remote-mediawiki.perl | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > index 5ffb57595..a1d783789 100755 > --- a/contrib/mw-to-git/git-remote-mediawiki.perl > +++ b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -65,6 +65,7 @@ chomp(@tracked_categories); > > # Just like @tracked_categories, but for MediaWiki namespaces. > my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); > +for (@tracked_namespaces) { s/_/ /g; } > chomp(@tracked_namespaces); Depending on the number of namespaces returned, it might be easier to convert this to the following: my @tracked_namespaces = map { chomp; s/_/ /g; $_; } split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); This would, once again, avoid creating @tracked_namespaces, and iterating over it. Note that this isn't about trying to 'golf' this; it's a performance consideration. Kindly, Thomas Adam ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH v4 2/7] remote-mediawiki: allow fetching namespaces with spaces 2017-11-07 7:08 ` Thomas Adam @ 2017-11-07 16:03 ` Antoine Beaupré 0 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:03 UTC (permalink / raw) To: Thomas Adam; +Cc: git, gitster, Ingo Ruhnke On 2017-11-07 07:08:08, Thomas Adam wrote: > On Mon, Nov 06, 2017 at 04:19:48PM -0500, Antoine Beaupré wrote: >> From: Ingo Ruhnke <grumbel@gmail.com> >> >> we still want to use spaces as separators in the config, but we should >> allow the user to specify namespaces with spaces, so we use underscore >> for this. >> >> Reviewed-by: Antoine Beaupré <anarcat@debian.org> >> Signed-off-by: Antoine Beaupré <anarcat@debian.org> >> --- >> contrib/mw-to-git/git-remote-mediawiki.perl | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl >> index 5ffb57595..a1d783789 100755 >> --- a/contrib/mw-to-git/git-remote-mediawiki.perl >> +++ b/contrib/mw-to-git/git-remote-mediawiki.perl >> @@ -65,6 +65,7 @@ chomp(@tracked_categories); >> >> # Just like @tracked_categories, but for MediaWiki namespaces. >> my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); >> +for (@tracked_namespaces) { s/_/ /g; } >> chomp(@tracked_namespaces); > > Depending on the number if namespaces returned, it might be easier to convert > this to the following: > > my @tracked_namespaces = map { > chomp; s/_/ /g; $_; > } split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); > > This would, once again, avoid creating @tracked_namespaces, and iterating over > it. > > Note that this isn't about trying to 'golf' this; it's a performance > consideration. I'm not sure it's worth it. Mediawiki has only about 10 default namespaces, and the user needs to specify them by hand here. I wouldn't be concerned about the performance. A. 
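The two forms discussed in this exchange can be compared directly. In the sketch below, `$raw` is a hypothetical stand-in for the output of `git config --get-all remote.<name>.namespaces`; no git command is run.

```perl
use strict;
use warnings;

# Hypothetical config output: entries separated by spaces or newlines,
# with "_" standing for a space inside a namespace name.
my $raw = "Talk User_talk\nHelp\n";

# Form used in the patch: split first, then rewrite each element in place.
my @with_loop = split(/[ \n]/, $raw);
for (@with_loop) { s/_/ /g; }

# Form suggested in review: transform while the list is being built.
my @with_map = map { s/_/ /g; $_ } split(/[ \n]/, $raw);

print join('|', @with_loop), "\n";   # prints "Talk|User talk|Help"
print join('|', @with_map), "\n";    # prints "Talk|User talk|Help"
```

Since the separators are already consumed by `split`, neither form needs a `chomp` to get this result; with a handful of user-specified namespaces, the choice is mostly one of style.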
-- Education is the most powerful weapon which we can use to change the world. - Nelson Mandela ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v4 3/7] remote-mediawiki: show known namespace choices on failure 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 1/7] remote-mediawiki: add " Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré @ 2017-11-06 21:19 ` Antoine Beaupré 2017-11-07 10:45 ` Thomas Adam 2017-11-06 21:19 ` [PATCH v4 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré ` (3 subsequent siblings) 6 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-11-06 21:19 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré If we fail to find a requested namespace, we should tell the user which ones we know about, since those were already fetched. This allows users to fetch all namespaces by specifying a dummy namespace, failing, then copying the list of namespaces in the config. Eventually, we should have a flag that allows fetching all namespaces automatically. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index a1d783789..6364d4e91 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -1334,7 +1334,8 @@ sub get_mw_namespace_id { my $id; if (!defined $ns) { - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; + my @namespaces = map { s/ /_/g; $_; } sort keys %namespaces_id; + print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n"; $ns = {is_namespace => 0}; $namespace_id{$name} = $ns; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH v4 3/7] remote-mediawiki: show known namespace choices on failure 2017-11-06 21:19 ` [PATCH v4 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré @ 2017-11-07 10:45 ` Thomas Adam 2017-11-07 16:07 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Thomas Adam @ 2017-11-07 10:45 UTC (permalink / raw) To: Antoine Beaupré; +Cc: git, gitster On Mon, Nov 06, 2017 at 04:19:49PM -0500, Antoine Beaupré wrote: > If we fail to find a requested namespace, we should tell the user > which ones we know about, since those were already fetched. This > allows users to fetch all namespaces by specifying a dummy namespace, > failing, then copying the list of namespaces in the config. > > Eventually, we should have a flag that allows fetching all namespaces > automatically. > > Reviewed-by: Antoine Beaupré <anarcat@debian.org> > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > index a1d783789..6364d4e91 100755 > --- a/contrib/mw-to-git/git-remote-mediawiki.perl > +++ b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -1334,7 +1334,8 @@ sub get_mw_namespace_id { > my $id; > > if (!defined $ns) { > - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; > + my @namespaces = map { s/ /_/g; $_; } sort keys %namespaces_id; Oops. This was my typo from my original suggestion. The hash is '%namespace_id', not '%namespaces_id'. However, how did this slip through testing? I'm assuming you blindly copied this from my example, which although quick to do, is only being caught because of my sharp eyes... -- Thomas Adam ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH v4 3/7] remote-mediawiki: show known namespace choices on failure 2017-11-07 10:45 ` Thomas Adam @ 2017-11-07 16:07 ` Antoine Beaupré 0 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:07 UTC (permalink / raw) To: Thomas Adam; +Cc: git, gitster On 2017-11-07 10:45:27, Thomas Adam wrote: > On Mon, Nov 06, 2017 at 04:19:49PM -0500, Antoine Beaupré wrote: >> If we fail to find a requested namespace, we should tell the user >> which ones we know about, since those were already fetched. This >> allows users to fetch all namespaces by specifying a dummy namespace, >> failing, then copying the list of namespaces in the config. >> >> Eventually, we should have a flag that allows fetching all namespaces >> automatically. >> >> Reviewed-by: Antoine Beaupré <anarcat@debian.org> >> Signed-off-by: Antoine Beaupré <anarcat@debian.org> >> --- >> contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl >> index a1d783789..6364d4e91 100755 >> --- a/contrib/mw-to-git/git-remote-mediawiki.perl >> +++ b/contrib/mw-to-git/git-remote-mediawiki.perl >> @@ -1334,7 +1334,8 @@ sub get_mw_namespace_id { >> my $id; >> >> if (!defined $ns) { >> - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; >> + my @namespaces = map { s/ /_/g; $_; } sort keys %namespaces_id; > > Oops. This was my typo from my original suggestion. The hash is > '%namespace_id', not '%namespaces_id'. However, how did this slip through > testing? I'm assuming you blindly copied this from my example, which although > quick to do, is only being caught because of my sharp eyes... I must admit I did not test that at all. Honestly, I'm just trying to finalize this so we can move to GitHub and I can move on other things. :) I rerolled with your fix. A. 
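On the question of how the `%namespaces_id` typo could slip through: assuming the script is compiled under `use strict` (not visible in the excerpts quoted in this thread), an undeclared hash is rejected at compile time, so simply running the script once would have caught it. The sketch below shows that behaviour on a tiny made-up snippet, compiled from a string so the failure can be caught and printed.

```perl
use strict;
use warnings;

# Made-up snippet reproducing the shape of the typo; it is compiled from
# a string so that the error does not abort this script.
my $snippet = <<'EOF';
use strict;
my %namespace_id = (Talk => 1);
my @names = sort keys %namespaces_id;   # typo: extra "s"
1;
EOF

my $ok  = eval $snippet;
my $err = $@;
print $ok ? "compiled\n" : "rejected: $err";
```

Without `use strict`, the same line would compile and silently yield an empty list of known namespaces.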
-- If builders built houses the way programmers built programs, The first woodpecker to come along would destroy civilization. - Gerald Weinberg ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v4 4/7] remote-mediawiki: skip virtual namespaces 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (2 preceding siblings ...) 2017-11-06 21:19 ` [PATCH v4 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré @ 2017-11-06 21:19 ` Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré ` (2 subsequent siblings) 6 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-06 21:19 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré Virtual namespaces do not correspond to pages in the database and are automatically generated by MediaWiki. It makes little sense, therefore, to fetch pages from those namespaces and the MW API doesn't support listing those pages. According to the documentation, those virtual namespaces are currently "Special" (-1) and "Media" (-2) but we treat all negative namespaces as "virtual" as a future-proofing mechanism. Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 6364d4e91..7f483180f 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,10 +264,13 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; foreach my $local_namespace (@tracked_namespaces) { + my $namespace_id = get_mw_namespace_id($local_namespace); + # virtual namespaces don't support allpages + next if !defined($namespace_id) || $namespace_id < 0; my $mw_pages = $mediawiki->list( { action => 'query', list => 'allpages', - apnamespace => get_mw_namespace_id($local_namespace), + apnamespace => $namespace_id, aplimit => 'max' } ) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details} . 
"\n"; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH v4 5/7] remote-mediawiki: support fetching from (Main) namespace 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (3 preceding siblings ...) 2017-11-06 21:19 ` [PATCH v4 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré @ 2017-11-06 21:19 ` Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré 6 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-06 21:19 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré When we specify a list of namespaces to fetch from, by default the MW API will not fetch from the default namespace, referred to as "(Main)" in the documentation: https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces I haven't found a way to address that "(Main)" namespace when getting the namespace ids: indeed, when listing namespaces, there is no "canonical" field for the main namespace, although there is a "*" field that is set to "" (empty). So in theory, we could specify the empty namespace to get the main namespace, but that would make specifying namespaces harder for the user: we would need to teach users about the "empty" default namespace. It would also make the code more complicated: we'd need to parse quotes in the configuration. So we simply override the query here and allow the user to specify "(Main)" since that is the publicly documented name.
Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 7f483180f..7a0824f31 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,7 +264,12 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; foreach my $local_namespace (@tracked_namespaces) { - my $namespace_id = get_mw_namespace_id($local_namespace); + my $namespace_id; + if ($local_namespace eq "(Main)") { + $namespace_id = 0; + } else { + $namespace_id = get_mw_namespace_id($local_namespace); + } # virtual namespaces don't support allpages next if !defined($namespace_id) || $namespace_id < 0; my $mw_pages = $mediawiki->list( { -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
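For readers following the series, the id resolution that results from patches 4/7 and 5/7 can be sketched in isolation. This is only an illustration, not the code from the patch: lookup_namespace_id(), resolve_namespace_id() and the %fake_ids table are hypothetical stand-ins for get_mw_namespace_id() and the wiki's real namespace list.

```perl
use strict;
use warnings;

# Hypothetical stand-in for get_mw_namespace_id(): the real helper asks the
# wiki; this table is invented for the example.
my %fake_ids = ('Talk' => 1, 'Special' => -1, 'Media' => -2);

sub lookup_namespace_id {
    my ($name) = @_;
    return $fake_ids{$name};    # undef when the name is unknown
}

# Return the numeric id to query, or undef when the namespace must be skipped.
sub resolve_namespace_id {
    my ($name) = @_;
    # "(Main)" is special-cased because it has no canonical name to look up.
    my $id = $name eq '(Main)' ? 0 : lookup_namespace_id($name);
    # Test defined() first so an unknown name never reaches the numeric
    # comparison (which would warn under "use warnings").
    return if !defined($id) || $id < 0;
    return $id;
}

for my $name ('(Main)', 'Talk', 'Special', 'Nonexistent') {
    my $id = resolve_namespace_id($name);
    print defined($id) ? "$name: query id $id\n" : "$name: skipped\n";
}
```

The order of the two tests in the guard matters: checking defined() before the "< 0" comparison is what keeps an unknown namespace from triggering an "uninitialized value" warning.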
* [PATCH v4 6/7] remote-mediawiki: process namespaces in order 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (4 preceding siblings ...) 2017-11-06 21:19 ` [PATCH v4 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré @ 2017-11-06 21:19 ` Antoine Beaupré 2017-11-06 21:19 ` [PATCH v4 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré 6 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-06 21:19 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré Ideally, we'd process them in numeric order since that is more logical, but we can't do that yet since this is where we find the numeric identifiers in the first place. Lexicographic order is a good compromise. Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 7a0824f31..7dccb44e0 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -263,7 +263,7 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; - foreach my $local_namespace (@tracked_namespaces) { + foreach my $local_namespace (sort @tracked_namespaces) { my $namespace_id; if ($local_namespace eq "(Main)") { $namespace_id = 0; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
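As a small illustration of what "lexicographic order" means here: Perl's sort without a comparison block compares the names as strings. The list of namespaces below is invented for the example.

```perl
use strict;
use warnings;

# Invented sample; in the script this list comes from the git configuration.
my @tracked_namespaces = ('Talk', 'Help', '(Main)', 'File', 'User talk');

# Default sort() is a string comparison, so "(Main)" comes first because
# "(" sorts before every ASCII letter.
my @ordered = sort @tracked_namespaces;

print join(', ', @ordered), "\n";    # (Main), File, Help, Talk, User talk
```

Sorting by numeric id instead would need the ids up front, which, as the commit message says, are only discovered inside this loop.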
* [PATCH v4 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (5 preceding siblings ...) 2017-11-06 21:19 ` [PATCH v4 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré @ 2017-11-06 21:19 ` Antoine Beaupré 6 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-06 21:19 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré Without this, the fetch process appears to hang while we fetch page listings across the namespaces. Obviously, it should be possible to silence this with -q, but that's an issue already present everywhere in the code and should be fixed separately: https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30 Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 7dccb44e0..fcdc29197 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -279,6 +279,7 @@ sub get_mw_tracked_namespaces { aplimit => 'max' } ) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details} . "\n"; + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; foreach my $page (@{$mw_pages}) { $pages->{$page->{title}} = $page; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
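One detail of the progress line may be worth spelling out: in Perl, $#{$ref} is the index of the last element, not the number of elements, so the message as written prints one less than the number of pages returned. If a page count is what is wanted (an assumption about intent, not something stated in the patch), scalar @{$ref} gives it. A minimal sketch with invented data:

```perl
use strict;
use warnings;

# Invented sample standing in for the list returned by the API.
my $mw_pages = [ { title => 'Foo' }, { title => 'Bar' }, { title => 'Baz' } ];

my $last_index = $#{$mw_pages};          # index of the last element
my $count      = scalar @{$mw_pages};    # number of elements

print {*STDERR} "last index: $last_index, page count: $count\n";
```

For an empty list the two diverge more visibly: the last index is -1 while the count is 0.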
* [PATCH v5 0/7] namespace support 2017-11-02 21:25 ` [PATCH v3 0/7] remote-mediawiki: namespace support Antoine Beaupré ` (7 preceding siblings ...) 2017-11-06 21:19 ` [PATCH v4 0/7] remote-mediawiki: namespace support Antoine Beaupré @ 2017-11-07 16:06 ` Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 1/7] remote-mediawiki: add " Antoine Beaupré ` (7 more replies) 8 siblings, 8 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:06 UTC (permalink / raw) To: git; +Cc: gitster Yet another reroll to fix a typo. ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH v5 1/7] remote-mediawiki: add namespace support 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré @ 2017-11-07 16:06 ` Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré ` (6 subsequent siblings) 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:06 UTC (permalink / raw) To: git; +Cc: gitster, Kevin, Antoine Beaupré From: Kevin <kevin@ki-ai.org> This introduces a new remote.origin.namespaces argument that is a space-separated list of namespaces. The list of pages to extract is then taken from all the specified namespaces. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index e7f857c1a..5ffb57595 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -63,6 +63,10 @@ chomp(@tracked_pages); my @tracked_categories = split(/[ \n]/, run_git("config --get-all remote.${remotename}.categories")); chomp(@tracked_categories); +# Just like @tracked_categories, but for MediaWiki namespaces. +my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +chomp(@tracked_namespaces); + # Import media files on pull my $import_media = run_git("config --get --bool remote.${remotename}.mediaimport"); chomp($import_media); @@ -256,6 +260,23 @@ sub get_mw_tracked_categories { return; } +sub get_mw_tracked_namespaces { + my $pages = shift; + foreach my $local_namespace (@tracked_namespaces) { + my $mw_pages = $mediawiki->list( { + action => 'query', + list => 'allpages', + apnamespace => get_mw_namespace_id($local_namespace), + aplimit => 'max' } ) + || die $mediawiki->{error}->{code} . ': ' + . 
$mediawiki->{error}->{details} . "\n"; + foreach my $page (@{$mw_pages}) { + $pages->{$page->{title}} = $page; + } + } + return; +} + sub get_mw_all_pages { my $pages = shift; # No user-provided list, get the list of pages from the API. @@ -319,6 +340,10 @@ sub get_mw_pages { $user_defined = 1; get_mw_tracked_categories(\%pages); } + if (@tracked_namespaces) { + $user_defined = 1; + get_mw_tracked_namespaces(\%pages); + } if (!$user_defined) { get_mw_all_pages(\%pages); } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
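A side effect of the design above that may not be obvious from the diff: because get_mw_pages() passes the same %pages hash to each collector and pages are stored under their title, a page reached through more than one route is only recorded once. A minimal sketch, where add_pages() is a hypothetical helper mirroring the inner loop of get_mw_tracked_namespaces() and the titles are invented:

```perl
use strict;
use warnings;

my %pages;

# Hypothetical helper: store each page under its title, as the inner loop
# of get_mw_tracked_namespaces() does.
sub add_pages {
    my ($pages, @found) = @_;
    $pages->{ $_->{title} } = $_ for @found;
    return;
}

# First batch, e.g. pages found through another selector.
add_pages(\%pages, { title => 'Main Page' }, { title => 'Talk:Main Page' });
# Second batch, e.g. pages listed from a namespace; one title overlaps.
add_pages(\%pages, { title => 'Talk:Main Page' }, { title => 'Talk:Sandbox' });

print scalar(keys %pages), " unique pages\n";
```

The later entry simply overwrites the earlier one for the overlapping title, so the order in which the collectors run only matters if the page records differ.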
* [PATCH v5 2/7] remote-mediawiki: allow fetching namespaces with spaces 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 1/7] remote-mediawiki: add " Antoine Beaupré @ 2017-11-07 16:06 ` Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré ` (5 subsequent siblings) 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:06 UTC (permalink / raw) To: git; +Cc: gitster, Ingo Ruhnke, Antoine Beaupré From: Ingo Ruhnke <grumbel@gmail.com> We still want to use spaces as separators in the config, but we should allow the user to specify namespaces with spaces, so we use underscore for this. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 5ffb57595..a1d783789 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -65,6 +65,7 @@ chomp(@tracked_categories); # Just like @tracked_categories, but for MediaWiki namespaces. my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +for (@tracked_namespaces) { s/_/ /g; } chomp(@tracked_namespaces); # Import media files on pull -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
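To make the parsing concrete, here is a standalone sketch of the two steps touched by patches 1/7 and 2/7. The $config_output string is an invented stand-in for what "git config --get-all remote.<name>.namespaces" might print; the split pattern and the substitution are the ones from the diff.

```perl
use strict;
use warnings;

# Invented stand-in for the output of
#   git config --get-all remote.<name>.namespaces
# with two config entries, the first holding two names.
my $config_output = "Talk User_talk\nHelp\n";

# Spaces and newlines both separate namespace names.
my @tracked_namespaces = split(/[ \n]/, $config_output);

# Underscores stand for spaces inside a single name.
for (@tracked_namespaces) { s/_/ /g; }

print "[$_]\n" for @tracked_namespaces;
# [Talk]
# [User talk]
# [Help]
```

The brackets in the output are only there to make the embedded space in "User talk" visible.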
* [PATCH v5 3/7] remote-mediawiki: show known namespace choices on failure 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 1/7] remote-mediawiki: add " Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 2/7] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré @ 2017-11-07 16:06 ` Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré ` (4 subsequent siblings) 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:06 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré If we fail to find a requested namespace, we should tell the user which ones we know about, since those were already fetched. This allows users to fetch all namespaces by specifying a dummy namespace, failing, then copying the list of namespaces in the config. Eventually, we should have a flag that allows fetching all namespaces automatically. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index a1d783789..5e8845893 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -1334,7 +1334,8 @@ sub get_mw_namespace_id { my $id; if (!defined $ns) { - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; + my @namespaces = map { s/ /_/g; $_; } sort keys %namespace_id; + print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces\n"; $ns = {is_namespace => 0}; $namespace_id{$name} = $ns; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH v5 4/7] remote-mediawiki: skip virtual namespaces 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré ` (2 preceding siblings ...) 2017-11-07 16:06 ` [PATCH v5 3/7] remote-mediawiki: show known namespace choices on failure Antoine Beaupré @ 2017-11-07 16:06 ` Antoine Beaupré 2017-11-07 16:06 ` [PATCH v5 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré ` (3 subsequent siblings) 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:06 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré Virtual namespaces do not correspond to pages in the database and are automatically generated by MediaWiki. It makes little sense, therefore, to fetch pages from those namespaces and the MW API doesn't support listing those pages. According to the documentation, those virtual namespaces are currently "Special" (-1) and "Media" (-2) but we treat all negative namespaces as "virtual" as a future-proofing mechanism. Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 5e8845893..611a04cd7 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,10 +264,13 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; foreach my $local_namespace (@tracked_namespaces) { + my $namespace_id = get_mw_namespace_id($local_namespace); + # virtual namespaces don't support allpages + next if !defined($namespace_id) || $namespace_id < 0; my $mw_pages = $mediawiki->list( { action => 'query', list => 'allpages', - apnamespace => get_mw_namespace_id($local_namespace), + apnamespace => $namespace_id, aplimit => 'max' } ) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details} . 
"\n"; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH v5 5/7] remote-mediawiki: support fetching from (Main) namespace 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré ` (3 preceding siblings ...) 2017-11-07 16:06 ` [PATCH v5 4/7] remote-mediawiki: skip virtual namespaces Antoine Beaupré @ 2017-11-07 16:06 ` Antoine Beaupré 2017-11-07 16:07 ` [PATCH v5 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré ` (2 subsequent siblings) 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:06 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré When we specify a list of namespaces to fetch from, by default the MW API will not fetch from the default namespace, referred to as "(Main)" in the documentation: https://www.mediawiki.org/wiki/Manual:Namespace#Built-in_namespaces I haven't found a way to address that "(Main)" namespace when getting the namespace ids: indeed, when listing namespaces, there is no "canonical" field for the main namespace, although there is a "*" field that is set to "" (empty). So in theory, we could specify the empty namespace to get the main namespace, but that would make specifying namespaces harder for the user: we would need to teach users about the "empty" default namespace. It would also make the code more complicated: we'd need to parse quotes in the configuration. So we simply override the query here and allow the user to specify "(Main)" since that is the publicly documented name. 
Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 7 ++++++- 1 file changed, 6 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 611a04cd7..0e60b85c8 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,7 +264,12 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; foreach my $local_namespace (@tracked_namespaces) { - my $namespace_id = get_mw_namespace_id($local_namespace); + my $namespace_id; + if ($local_namespace eq "(Main)") { + $namespace_id = 0; + } else { + $namespace_id = get_mw_namespace_id($local_namespace); + } # virtual namespaces don't support allpages next if !defined($namespace_id) || $namespace_id < 0; my $mw_pages = $mediawiki->list( { -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH v5 6/7] remote-mediawiki: process namespaces in order 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré ` (4 preceding siblings ...) 2017-11-07 16:06 ` [PATCH v5 5/7] remote-mediawiki: support fetching from (Main) namespace Antoine Beaupré @ 2017-11-07 16:07 ` Antoine Beaupré 2017-11-07 16:07 ` [PATCH v5 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré 2017-11-08 2:07 ` [PATCH v5 0/7] namespace support Junio C Hamano 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:07 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré Ideally, we'd process them in numeric order since that is more logical, but we can't do that yet since this is where we find the numeric identifiers in the first place. Lexicographic order is a good compromise. Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 0e60b85c8..c9f46359b 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -263,7 +263,7 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; - foreach my $local_namespace (@tracked_namespaces) { + foreach my $local_namespace (sort @tracked_namespaces) { my $namespace_id; if ($local_namespace eq "(Main)") { $namespace_id = 0; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH v5 7/7] remote-mediawiki: show progress while fetching namespaces 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré ` (5 preceding siblings ...) 2017-11-07 16:07 ` [PATCH v5 6/7] remote-mediawiki: process namespaces in order Antoine Beaupré @ 2017-11-07 16:07 ` Antoine Beaupré 2017-11-08 2:07 ` [PATCH v5 0/7] namespace support Junio C Hamano 7 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-11-07 16:07 UTC (permalink / raw) To: git; +Cc: gitster, Antoine Beaupré Without this, the fetch process appears to hang while we fetch page listings across the namespaces. Obviously, it should be possible to silence this with -q, but that's an issue already present everywhere in the code and should be fixed separately: https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30 Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index c9f46359b..af9cbc9d0 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -279,6 +279,7 @@ sub get_mw_tracked_namespaces { aplimit => 'max' } ) || die $mediawiki->{error}->{code} . ': ' . $mediawiki->{error}->{details} . "\n"; + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; foreach my $page (@{$mw_pages}) { $pages->{$page->{title}} = $page; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH v5 0/7] namespace support 2017-11-07 16:06 ` [PATCH v5 0/7] namespace support Antoine Beaupré ` (6 preceding siblings ...) 2017-11-07 16:07 ` [PATCH v5 7/7] remote-mediawiki: show progress while fetching namespaces Antoine Beaupré @ 2017-11-08 2:07 ` Junio C Hamano 7 siblings, 0 replies; 78+ messages in thread From: Junio C Hamano @ 2017-11-08 2:07 UTC (permalink / raw) To: Antoine Beaupré; +Cc: git Antoine Beaupré <anarcat@debian.org> writes: > Yet another reroll to fix a typo. Thanks. Will replace. Let's wait for a few more days and then merge it to 'next' and down to 'master'. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 1/4] remote-mediawiki: add namespace support 2017-10-29 16:08 ` [PATCH 1/4] remote-mediawiki: add " Antoine Beaupré 2017-10-29 17:24 ` Eric Sunshine 2017-10-30 2:51 ` [PATCH v2 0/7] " Antoine Beaupré @ 2017-10-30 10:43 ` Matthieu Moy 2 siblings, 0 replies; 78+ messages in thread From: Matthieu Moy @ 2017-10-30 10:43 UTC (permalink / raw) To: Antoine Beaupré; +Cc: git, Kevin Antoine Beaupré <anarcat@debian.org> writes: > @@ -319,6 +341,10 @@ sub get_mw_pages { > $user_defined = 1; > get_mw_tracked_categories(\%pages); > } > + if (@tracked_namespaces) { > + $user_defined = 1; > + get_mw_tracked_namespaces(\%pages); > + } > if (!$user_defined) { > get_mw_all_pages(\%pages); > } Space Vs tabs indent issue (I have tab-width = 8, you probably have 4 and this "if" looks underindented). -- Matthieu Moy https://matthieu-moy.fr/ ^ permalink raw reply [flat|nested] 78+ messages in thread
* [PATCH 2/4] remote-mediawiki: allow fetching namespaces with spaces 2017-10-29 16:08 [PATCH 0/4] WIP: git-remote-media wiki namespace support Antoine Beaupré 2017-10-29 16:08 ` [PATCH 1/4] remote-mediawiki: add " Antoine Beaupré @ 2017-10-29 16:08 ` Antoine Beaupré 2017-10-29 16:08 ` [PATCH 3/4] remote-mediawiki: show known namespace choices on failure Antoine Beaupré ` (2 subsequent siblings) 4 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-29 16:08 UTC (permalink / raw) To: git; +Cc: Ingo Ruhnke, Antoine Beaupré From: Ingo Ruhnke <grumbel@gmail.com> We still want to use spaces as separators in the config, but we should allow the user to specify namespaces with spaces, so we use underscore for this. Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 1 + 1 file changed, 1 insertion(+) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 1c5e39831..fc48846a1 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -66,6 +66,7 @@ chomp(@tracked_categories); # Just like @tracked_categories, but for MediaWiki namespaces. my @tracked_namespaces = split(/[ \n]/, run_git("config --get-all remote.${remotename}.namespaces")); +for (@tracked_namespaces) { s/_/ /g; } chomp(@tracked_namespaces); # Import media files on pull -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* [PATCH 3/4] remote-mediawiki: show known namespace choices on failure 2017-10-29 16:08 [PATCH 0/4] WIP: git-remote-media wiki namespace support Antoine Beaupré 2017-10-29 16:08 ` [PATCH 1/4] remote-mediawiki: add " Antoine Beaupré 2017-10-29 16:08 ` [PATCH 2/4] remote-mediawiki: allow fetching namespaces with spaces Antoine Beaupré @ 2017-10-29 16:08 ` Antoine Beaupré 2017-10-29 17:34 ` Eric Sunshine 2017-11-04 10:57 ` Thomas Adam 2017-10-29 16:08 ` [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces Antoine Beaupré 2017-10-30 10:40 ` [PATCH 0/4] WIP: git-remote-media wiki namespace support Matthieu Moy 4 siblings, 2 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-29 16:08 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré if we fail to find a requested namespace, we should tell the user which ones we know about, since we already do. this allows users to feetch all namespaces by specifying a dummy namespace, failing, then copying the list of namespaces in the config. eventually, we should have a flag that allows fetching all namespaces automatically. 
Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index fc48846a1..07cc74bac 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -1334,7 +1334,9 @@ sub get_mw_namespace_id { my $id; if (!defined $ns) { - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; + my @namespaces = sort keys %namespace_id; + for (@namespaces) { s/ /_/g; } + print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces.\n"; $ns = {is_namespace => 0}; $namespace_id{$name} = $ns; } -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH 3/4] remote-mediawiki: show known namespace choices on failure 2017-10-29 16:08 ` [PATCH 3/4] remote-mediawiki: show known namespace choices on failure Antoine Beaupré @ 2017-10-29 17:34 ` Eric Sunshine 2017-10-29 18:31 ` Antoine Beaupré 2017-11-04 10:57 ` Thomas Adam 1 sibling, 1 reply; 78+ messages in thread From: Eric Sunshine @ 2017-10-29 17:34 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: > if we fail to find a requested namespace, we should tell the user s/if/If/ > which ones we know about, since we already do. this allows users to s/this/This/ Not sure what ", since we already do" means here. > feetch all namespaces by specifying a dummy namespace, failing, then s/feetch/fetch/ > copying the list of namespaces in the config. > > eventually, we should have a flag that allows fetching all namespaces > automatically. > > Reviewed-by: Antoine Beaupré <anarcat@debian.org> > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -1334,7 +1334,9 @@ sub get_mw_namespace_id { > my $id; > > if (!defined $ns) { > - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; > + my @namespaces = sort keys %namespace_id; > + for (@namespaces) { s/ /_/g; } > + print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces.\n"; Probably want to drop the terminating "." in the error message. > $ns = {is_namespace => 0}; > $namespace_id{$name} = $ns; > } ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 3/4] remote-mediawiki: show known namespace choices on failure 2017-10-29 17:34 ` Eric Sunshine @ 2017-10-29 18:31 ` Antoine Beaupré 0 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-29 18:31 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List On 2017-10-29 13:34:31, Eric Sunshine wrote: > On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: >> if we fail to find a requested namespace, we should tell the user > > s/if/If/ fixed. >> which ones we know about, since we already do. this allows users to > > s/this/This/ > > Not sure what ", since we already do" means here. we already have fetched the mapping, fixed. >> feetch all namespaces by specifying a dummy namespace, failing, then > > s/feetch/fetch/ fixed. >> copying the list of namespaces in the config. >> >> eventually, we should have a flag that allows fetching all namespaces >> automatically. >> >> Reviewed-by: Antoine Beaupré <anarcat@debian.org> >> Signed-off-by: Antoine Beaupré <anarcat@debian.org> >> --- >> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl >> @@ -1334,7 +1334,9 @@ sub get_mw_namespace_id { >> my $id; >> >> if (!defined $ns) { >> - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; >> + my @namespaces = sort keys %namespace_id; >> + for (@namespaces) { s/ /_/g; } >> + print {*STDERR} "No such namespace ${name} on MediaWiki, known namespaces: @namespaces.\n"; > > Probably want to drop the terminating "." in the error message. meh... i just respected what was already there, but it's true it can be error-prone when copy-pasting, so removed. a. -- A ballot is like a bullet. You don't throw your ballots until you see a target, and if that target is not within your reach, keep your ballot in your pocket. - Malcom X ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 3/4] remote-mediawiki: show known namespace choices on failure 2017-10-29 16:08 ` [PATCH 3/4] remote-mediawiki: show known namespace choices on failure Antoine Beaupré 2017-10-29 17:34 ` Eric Sunshine @ 2017-11-04 10:57 ` Thomas Adam 1 sibling, 0 replies; 78+ messages in thread From: Thomas Adam @ 2017-11-04 10:57 UTC (permalink / raw) To: Antoine Beaupré; +Cc: git On Sun, Oct 29, 2017 at 12:08:56PM -0400, Antoine Beaupré wrote: > if we fail to find a requested namespace, we should tell the user > which ones we know about, since we already do. this allows users to > feetch all namespaces by specifying a dummy namespace, failing, then > copying the list of namespaces in the config. > > eventually, we should have a flag that allows fetching all namespaces > automatically. > > Reviewed-by: Antoine Beaupré <anarcat@debian.org> > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > contrib/mw-to-git/git-remote-mediawiki.perl | 4 +++- > 1 file changed, 3 insertions(+), 1 deletion(-) > > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > index fc48846a1..07cc74bac 100755 > --- a/contrib/mw-to-git/git-remote-mediawiki.perl > +++ b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -1334,7 +1334,9 @@ sub get_mw_namespace_id { > my $id; > > if (!defined $ns) { > - print {*STDERR} "No such namespace ${name} on MediaWiki.\n"; > + my @namespaces = sort keys %namespace_id; > + for (@namespaces) { s/ /_/g; } I am sure we can improve upon the need to process @namespaces twice: my @namespaces = map { s/ /_/g; $_; } sort keys %namespaces_id; -- Thomas Adam ^ permalink raw reply [flat|nested] 78+ messages in thread
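For comparison, the single-pass form suggested above can be tried in isolation. The sketch below uses the hash name from the patch (%namespace_id) with invented contents, and also shows the /r modifier, which is a different technique available on Perl 5.14 and later (not something proposed in the thread) that returns the substituted copy instead of editing $_ in place.

```perl
use strict;
use warnings;

# Invented sample of the name-to-id cache.
my %namespace_id = ('User talk' => 3, 'Talk' => 1, 'Help talk' => 13);

# Single-pass form from the review: edit $_ in place, then return it.
# keys() hands back copies, so the hash itself is left untouched.
my @in_place = map { s/ /_/g; $_; } sort keys %namespace_id;

# Alternative on Perl 5.14+: /r returns the modified copy directly.
my @with_r = map { s/ /_/gr } sort keys %namespace_id;

print "known namespaces: @with_r\n";    # known namespaces: Help_talk Talk User_talk
```

Both forms produce the same list; which one reads better is a matter of taste and of the minimum Perl version the script has to support.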
* [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces 2017-10-29 16:08 [PATCH 0/4] WIP: git-remote-media wiki namespace support Antoine Beaupré ` (2 preceding siblings ...) 2017-10-29 16:08 ` [PATCH 3/4] remote-mediawiki: show known namespace choices on failure Antoine Beaupré @ 2017-10-29 16:08 ` Antoine Beaupré 2017-10-29 19:49 ` Eric Sunshine 2017-10-30 10:40 ` [PATCH 0/4] WIP: git-remote-media wiki namespace support Matthieu Moy 4 siblings, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-10-29 16:08 UTC (permalink / raw) To: git; +Cc: Antoine Beaupré Reviewed-by: Antoine Beaupré <anarcat@debian.org> Signed-off-by: Antoine Beaupré <anarcat@debian.org> --- contrib/mw-to-git/git-remote-mediawiki.perl | 31 +++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl index 07cc74bac..ccefde4dc 100755 --- a/contrib/mw-to-git/git-remote-mediawiki.perl +++ b/contrib/mw-to-git/git-remote-mediawiki.perl @@ -264,16 +264,27 @@ sub get_mw_tracked_categories { sub get_mw_tracked_namespaces { my $pages = shift; - foreach my $local_namespace (@tracked_namespaces) { - my $mw_pages = $mediawiki->list( { - action => 'query', - list => 'allpages', - apnamespace => get_mw_namespace_id($local_namespace), - aplimit => 'max' } ) - || die $mediawiki->{error}->{code} . ': ' - . $mediawiki->{error}->{details} . 
"\n"; - foreach my $page (@{$mw_pages}) { - $pages->{$page->{title}} = $page; + foreach my $local_namespace (sort @tracked_namespaces) { + my ($mw_pages, $namespace_id); + if ($local_namespace eq "(Main)") { + $namespace_id = 0; + } else { + $namespace_id = get_mw_namespace_id($local_namespace); + } + if ($namespace_id >= 0) { + if ($mw_pages = $mediawiki->list( { + action => 'query', + list => 'allpages', + apnamespace => $namespace_id, + aplimit => 'max' } )) { + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; + foreach my $page (@{$mw_pages}) { + $pages->{$page->{title}} = $page; + } + } else { + warn $mediawiki->{error}->{code} . ': ' + . $mediawiki->{error}->{details} . "\n"; + } } } return; -- 2.11.0 ^ permalink raw reply related [flat|nested] 78+ messages in thread
* Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces 2017-10-29 16:08 ` [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces Antoine Beaupré @ 2017-10-29 19:49 ` Eric Sunshine 2017-10-30 2:11 ` Antoine Beaupré 2017-10-30 2:43 ` Antoine Beaupré 0 siblings, 2 replies; 78+ messages in thread From: Eric Sunshine @ 2017-10-29 19:49 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: > Subject: remote-mediawiki: allow using (Main) as a namespace and skip special namespaces This patch is more difficult to review than it perhaps ought to be since it is making multiple unrelated changes. It's not clear from the description what special namespaces are and why they need to be skipped. It's also not clear why (Main) is special. Perhaps the commit message(s) could explain these issues in more detail. To simplify review and make it easier to gauge what is going on, it might make sense to split this patch into at least two: one which skips "special namespaces", and one which gives special treatment to (Main). More below... > Reviewed-by: Antoine Beaupré <anarcat@debian.org> > Signed-off-by: Antoine Beaupré <anarcat@debian.org> > --- > diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl > @@ -264,16 +264,27 @@ sub get_mw_tracked_categories { > sub get_mw_tracked_namespaces { > my $pages = shift; > - foreach my $local_namespace (@tracked_namespaces) { > - my $mw_pages = $mediawiki->list( { > - action => 'query', > - list => 'allpages', > - apnamespace => get_mw_namespace_id($local_namespace), > - aplimit => 'max' } ) > - || die $mediawiki->{error}->{code} . ': ' > - . $mediawiki->{error}->{details} . 
"\n"; > - foreach my $page (@{$mw_pages}) { > - $pages->{$page->{title}} = $page; > + foreach my $local_namespace (sort @tracked_namespaces) { > + my ($mw_pages, $namespace_id); > + if ($local_namespace eq "(Main)") { > + $namespace_id = 0; > + } else { > + $namespace_id = get_mw_namespace_id($local_namespace); > + } > + if ($namespace_id >= 0) { This may be problematic since get_mw_namespace_id() may return undef rather than a number, in which case Perl will complain. Since the code skips the $mediawiki query altogether when it encounters "(Main)", you could fix this problem and simplify the code overall by simply skipping the bulk of the foreach loop body instead of mucking around with $namespace_id. For instance: foreach my $local_namespace (sort @tracked_namespaces) { next if ($local_namespace eq "(Main)"); ...normal processing... } > + if ($mw_pages = $mediawiki->list( { > + action => 'query', > + list => 'allpages', > + apnamespace => $namespace_id, > + aplimit => 'max' } )) { > + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; The original code did not emit this diagnostic but the new code does so unconditionally. Is this just leftover debugging code or is intended that all users should see this information all the time? > + foreach my $page (@{$mw_pages}) { > + $pages->{$page->{title}} = $page; > + } > + } else { > + warn $mediawiki->{error}->{code} . ': ' > + . $mediawiki->{error}->{details} . "\n"; I guess this is the part which "skips special namespaces". The original code die()'d but this merely warns. Aside from these "special namespaces", are there genuine cases when the $mediawiki query would return an error, and which should indeed die(), or is warning appropriate for all $mediawiki query error cases? > + } > } > } > return; ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces 2017-10-29 19:49 ` Eric Sunshine @ 2017-10-30 2:11 ` Antoine Beaupré 2017-10-30 3:49 ` Eric Sunshine 2017-10-30 2:43 ` Antoine Beaupré 1 sibling, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:11 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List On 2017-10-29 15:49:28, Eric Sunshine wrote: > On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: >> Subject: remote-mediawiki: allow using (Main) as a namespace and skip special namespaces > > This patch is more difficult to review than it perhaps ought to be > since it is making multiple unrelated changes. > > It's not clear from the description what special namespaces are and > why they need to be skipped. It's also not clear why (Main) is > special. Perhaps the commit message(s) could explain these issues in > more detail. > > To simplify review and make it easier to gauge what is going on, it > might make sense to split this patch into at least two: one which > skips "special namespaces", and one which gives special treatment to > (Main). Agreed, I'll try to do that. > More below... > >> Reviewed-by: Antoine Beaupré <anarcat@debian.org> >> Signed-off-by: Antoine Beaupré <anarcat@debian.org> >> --- >> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl >> @@ -264,16 +264,27 @@ sub get_mw_tracked_categories { >> sub get_mw_tracked_namespaces { >> my $pages = shift; >> - foreach my $local_namespace (@tracked_namespaces) { >> - my $mw_pages = $mediawiki->list( { >> - action => 'query', >> - list => 'allpages', >> - apnamespace => get_mw_namespace_id($local_namespace), >> - aplimit => 'max' } ) >> - || die $mediawiki->{error}->{code} . ': ' >> - . $mediawiki->{error}->{details} .
"\n"; >> - foreach my $page (@{$mw_pages}) { >> - $pages->{$page->{title}} = $page; >> + foreach my $local_namespace (sort @tracked_namespaces) { >> + my ($mw_pages, $namespace_id); >> + if ($local_namespace eq "(Main)") { >> + $namespace_id = 0; >> + } else { >> + $namespace_id = get_mw_namespace_id($local_namespace); >> + } >> + if ($namespace_id >= 0) { > > This may be problematic since get_mw_namespace_id() may return undef > rather than a number, in which case Perl will complain. Since the code > skips the $mediawiki query altogether when it encounters "(Main)", you > could fix this problem and simplify the code overall by simply > skipping the bulk of the foreach loop body instead of mucking around > with $namespace_id. For instance: > > foreach my $local_namespace (sort @tracked_namespaces) { > next if ($local_namespace eq "(Main)"); > ...normal processing... > } Ah yes. I see your point but it doesn't actually skip the query when it encouters main ($namespace_id >= 0). >> + if ($mw_pages = $mediawiki->list( { >> + action => 'query', >> + list => 'allpages', >> + apnamespace => $namespace_id, >> + aplimit => 'max' } )) { >> + print {*STDERR} "$#{$mw_pages} found in namespace $local_namespace ($namespace_id)\n"; > > The original code did not emit this diagnostic but the new code does > so unconditionally. Is this just leftover debugging code or is > intended that all users should see this information all the time? This is a known issue that permeates the whole remote at this point, and it is quite annoying. https://github.com/Git-Mediawiki/Git-Mediawiki/issues/30 I have, however, considered it useful to include this to show progress as it can take a while to fetch all namespace information... Obviously, once we figure out how to silence this stuff (ie. how to recognize -q), it should be silenced like everything else, but until then I think it's quite useful. 
>> + foreach my $page (@{$mw_pages}) { >> + $pages->{$page->{title}} = $page; >> + } >> + } else { >> + warn $mediawiki->{error}->{code} . ': ' >> + . $mediawiki->{error}->{details} . "\n"; > > I guess this is the part which "skips special namespaces". The > original code die()'d but this merely warns. Aside from these "special > namespaces", are there genuine cases when the $mediawiki query would > return an error, and which should indeed die(), or is warning > appropriate for all $mediawiki query error cases? Maybe I didn't get the indentation right, but this } else { is for query failures, *not* the if ($namespace_id < 0). So < 0 is just silently skipped. The original code was die()'ing on failures, but I think that's a mistake: we should fetch what we can and warn on the failures. That allows the user to fix multiple problems at once instead of having to rerun the script repeatedly. A. -- Le féminisme n'a jamais tué personne Le machisme tue tous les jours. - Benoîte Groulx ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces 2017-10-30 2:11 ` Antoine Beaupré @ 2017-10-30 3:49 ` Eric Sunshine 0 siblings, 0 replies; 78+ messages in thread From: Eric Sunshine @ 2017-10-30 3:49 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Sun, Oct 29, 2017 at 10:11 PM, Antoine Beaupré <anarcat@debian.org> wrote: > On 2017-10-29 15:49:28, Eric Sunshine wrote: >> On Sun, Oct 29, 2017 at 12:08 PM, Antoine Beaupré <anarcat@debian.org> wrote: >>> + foreach my $local_namespace (sort @tracked_namespaces) { >>> + my ($mw_pages, $namespace_id); >>> + if ($local_namespace eq "(Main)") { >>> + $namespace_id = 0; >>> + } else { >>> + $namespace_id = get_mw_namespace_id($local_namespace); >>> + } >>> + if ($namespace_id >= 0) { >> >> This may be problematic since get_mw_namespace_id() may return undef >> rather than a number, in which case Perl will complain. Since the code >> skips the $mediawiki query altogether when it encounters "(Main)", you >> could fix this problem and simplify the code overall by simply >> skipping the bulk of the foreach loop body instead of mucking around >> with $namespace_id. For instance: >> >> foreach my $local_namespace (sort @tracked_namespaces) { >> next if ($local_namespace eq "(Main)"); >> ...normal processing... >> } > > Ah yes. I see your point but it doesn't actually skip the query when it > encouters main ($namespace_id >= 0). Ah, yes, you're right. My brain glossed over the '=' in '>=' for some reason. ^ permalink raw reply [flat|nested] 78+ messages in thread
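The behaviour Antoine describes in this subthread (skip negative namespace ids silently, warn on a failed query, and keep going rather than die) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patch itself: fake_list() and %fake_wiki are hypothetical stand-ins for the $mediawiki->list() query, and the warning text is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the wiki: namespace id => array ref of pages.
my %fake_wiki = (
    0 => [ { title => 'Main Page' } ],
    1 => [ { title => 'Talk:Main Page' } ],
);

# Hypothetical stand-in for $mediawiki->list(): returns an array ref of
# pages on success, or undef when the query fails.
sub fake_list {
    my ($namespace_id) = @_;
    return $fake_wiki{$namespace_id};
}

# Collect pages from every namespace id. Failures produce a warning and
# are recorded, so one bad namespace does not abort the whole fetch.
sub collect_pages {
    my (@namespace_ids) = @_;
    my (%pages, @failed);
    foreach my $namespace_id (@namespace_ids) {
        # Negative ids are skipped silently, as in the patch.
        next if $namespace_id < 0;
        my $mw_pages = fake_list($namespace_id);
        if (!$mw_pages) {
            warn "query failed for namespace $namespace_id\n";
            push @failed, $namespace_id;
            next;
        }
        $pages{ $_->{title} } = $_ for @{$mw_pages};
    }
    return (\%pages, \@failed);
}

# Namespace 42 does not exist in the stand-in wiki, so it is reported
# as a failure while the other namespaces are still fetched.
my ($pages, $failed) = collect_pages(0, -1, 1, 42);
```

Returning the list of failed ids alongside the pages is one way to let a caller decide afterwards whether a partial fetch is acceptable; whether the real helper should do that is a design question this sketch does not settle.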
* Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces 2017-10-29 19:49 ` Eric Sunshine 2017-10-30 2:11 ` Antoine Beaupré @ 2017-10-30 2:43 ` Antoine Beaupré 2017-10-30 3:52 ` Eric Sunshine 1 sibling, 1 reply; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 2:43 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List On 2017-10-29 15:49:28, Eric Sunshine wrote: [...] >> Reviewed-by: Antoine Beaupré <anarcat@debian.org> >> Signed-off-by: Antoine Beaupré <anarcat@debian.org> >> --- >> diff --git a/contrib/mw-to-git/git-remote-mediawiki.perl b/contrib/mw-to-git/git-remote-mediawiki.perl >> @@ -264,16 +264,27 @@ sub get_mw_tracked_categories { >> sub get_mw_tracked_namespaces { >> my $pages = shift; >> - foreach my $local_namespace (@tracked_namespaces) { >> - my $mw_pages = $mediawiki->list( { >> - action => 'query', >> - list => 'allpages', >> - apnamespace => get_mw_namespace_id($local_namespace), >> - aplimit => 'max' } ) >> - || die $mediawiki->{error}->{code} . ': ' >> - . $mediawiki->{error}->{details} . "\n"; >> - foreach my $page (@{$mw_pages}) { >> - $pages->{$page->{title}} = $page; >> + foreach my $local_namespace (sort @tracked_namespaces) { >> + my ($mw_pages, $namespace_id); >> + if ($local_namespace eq "(Main)") { >> + $namespace_id = 0; >> + } else { >> + $namespace_id = get_mw_namespace_id($local_namespace); >> + } >> + if ($namespace_id >= 0) { > > This may be problematic since get_mw_namespace_id() may return undef > rather than a number, in which case Perl will complain. [...] Actually, get_mw_namespace_id() doesn't seem like it can return undef - did you mistake it with get_mw_namespace_id_for_page()? A. -- Uncompromising war resistance and refusal to do military service under any circumstances. - Albert Einstein ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces 2017-10-30 2:43 ` Antoine Beaupré @ 2017-10-30 3:52 ` Eric Sunshine 2017-10-30 12:17 ` Antoine Beaupré 0 siblings, 1 reply; 78+ messages in thread From: Eric Sunshine @ 2017-10-30 3:52 UTC (permalink / raw) To: Antoine Beaupré; +Cc: Git List On Sun, Oct 29, 2017 at 10:43 PM, Antoine Beaupré <anarcat@debian.org> wrote: > On 2017-10-29 15:49:28, Eric Sunshine wrote: >> This may be problematic since get_mw_namespace_id() may return undef >> rather than a number, in which case Perl will complain. > > Actually, get_mw_namespace_id() doesn't seem like it can return undef - > did you mistake it with get_mw_namespace_id_for_page()? Hmm, no. What I see in the function is this: my $id; ... if ($ns->{is_namespace}) { $id = $ns->{id}; } ... return $id; So, $id starts undefined and is assigned only conditionally before being returned, but perhaps I'm missing some subtlety. ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces 2017-10-30 3:52 ` Eric Sunshine @ 2017-10-30 12:17 ` Antoine Beaupré 0 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 12:17 UTC (permalink / raw) To: Eric Sunshine; +Cc: Git List On 2017-10-29 23:52:16, Eric Sunshine wrote: > On Sun, Oct 29, 2017 at 10:43 PM, Antoine Beaupré <anarcat@debian.org> wrote: >> On 2017-10-29 15:49:28, Eric Sunshine wrote: >>> This may be problematic since get_mw_namespace_id() may return undef >>> rather than a number, in which case Perl will complain. >> >> Actually, get_mw_namespace_id() doesn't seem like it can return undef - >> did you mistake it with get_mw_namespace_id_for_page()? > > Hmm, no. What I see in the function is this: > > my $id; > ... > if ($ns->{is_namespace}) { > $id = $ns->{id}; > } > ... > return $id; > > So, $id starts undefined and is assigned only conditionally before > being returned, but perhaps I'm missing some subtlety. Ah yes, you're probably right there. -- During the initial stage of the struggle, the oppressed, instead of striving for liberation, tend themselves to become oppressors. The very structure of their thought has been conditioned by the contradictions of the concrete, existential situation by which they were shaped. Their ideal is to be men; but for them, to be men is to be oppressors. This is their model of humanity. - Paulo Freire, Pedagogy of the Oppressed ^ permalink raw reply [flat|nested] 78+ messages in thread
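The undef case Eric points out, and one possible guard for it, can be sketched as below. The names lookup_namespace_id(), is_fetchable() and %known_namespaces are hypothetical and exist only for this illustration; the real get_mw_namespace_id() queries the wiki and caches results, which is not reproduced here.

```perl
use strict;
use warnings;

# Hypothetical lookup table standing in for the wiki's namespace list.
my %known_namespaces = (Talk => 1, File => 6, Special => -1);

# Same shape as the function Eric quotes: $id is declared without a
# value and assigned on one branch only, so unknown names yield undef.
sub lookup_namespace_id {
    my ($name) = @_;
    my $id;
    if (exists $known_namespaces{$name}) {
        $id = $known_namespaces{$name};
    }
    return $id;
}

# Decide whether a namespace should be queried. Definedness is checked
# before the numeric comparison, because comparing undef with >= emits
# an "uninitialized value" warning when warnings are enabled.
sub is_fetchable {
    my ($name) = @_;
    my $id = $name eq '(Main)' ? 0 : lookup_namespace_id($name);
    return (defined($id) && $id >= 0) ? 1 : 0;
}

# (Main) maps to id 0 and is fetchable; Special has a negative id and an
# unknown name has no id at all, so neither of those is fetchable.
my @fetchable = grep { is_fetchable($_) } ('(Main)', 'Talk', 'Special', 'Nope');
```

Treating an unknown namespace the same as a negative id keeps the loop quiet; a caller that wants to report unknown names would need to distinguish the undef case explicitly.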
* Re: [PATCH 0/4] WIP: git-remote-media wiki namespace support 2017-10-29 16:08 [PATCH 0/4] WIP: git-remote-media wiki namespace support Antoine Beaupré ` (3 preceding siblings ...) 2017-10-29 16:08 ` [PATCH 4/4] remote-mediawiki: allow using (Main) as a namespace and skip special namespaces Antoine Beaupré @ 2017-10-30 10:40 ` Matthieu Moy 2017-10-30 12:20 ` Antoine Beaupré 4 siblings, 1 reply; 78+ messages in thread From: Matthieu Moy @ 2017-10-30 10:40 UTC (permalink / raw) To: Antoine Beaupré; +Cc: git Antoine Beaupré <anarcat@debian.org> writes: > Obviously, doing unit tests against a full MediaWiki instance isn't > exactly trivial. Not trivial, but doable: there is all the infrastructure to do so in t/: install-wiki.sh to automatically install Mediawiki, and then a testsuite that interacts with it. This has been written under the assumption that the developer had a lighttpd instance running on localhost, but this can probably be adapted to run on Travis-CI (install lighttpd & Mediawiki in the install: part, and run the tests afterwards), so that anyone can run the tests by just submitting a pull-request to Git-Mediawiki. If you are to work more on Git-Mediawiki, don't underestimate the usefulness of the testsuite (for example, Git-Mediawiki was developed against a prehistoric version of Mediawiki, the testsuite can help ensure it still works on the latest version), nor the fun of playing with install scripts and CI systems ;-). Cheers, -- Matthieu Moy https://matthieu-moy.fr/ ^ permalink raw reply [flat|nested] 78+ messages in thread
* Re: [PATCH 0/4] WIP: git-remote-media wiki namespace support 2017-10-30 10:40 ` [PATCH 0/4] WIP: git-remote-media wiki namespace support Matthieu Moy @ 2017-10-30 12:20 ` Antoine Beaupré 0 siblings, 0 replies; 78+ messages in thread From: Antoine Beaupré @ 2017-10-30 12:20 UTC (permalink / raw) To: Matthieu Moy; +Cc: git On 2017-10-30 11:40:06, Matthieu Moy wrote: > Antoine Beaupré <anarcat@debian.org> writes: > >> Obviously, doing unit tests against a full MediaWiki instance isn't >> exactly trivial. > > Not trivial, but doable: there is all the infrastructure to do so in t/: > install-wiki.sh to automatically install Mediawiki, and then a testsuite > that interacts with it. > > This has been written under the assumption that the developer had a > lighttpd instance running on localhost, but this can probably be adapted > to run on Travis-CI (install lighttpd & Mediawiki in the install: part, > and run the tests afterwards), so that anyone can run the tests by just > submitting a pull-request to Git-Mediawiki. > > If you are to work more on Git-Mediawiki, don't underestimate the > usefulness of the testsuite (for example, Git-Mediawiki was developed > against a prehistoric version of Mediawiki, the testsuite can help > ensure it still works on the latest version), nor the fun of playing > with install scripts and CI systems ;-). Hello! Glad to hear from you. :) So I actually tried install-wiki.sh, and it "failed to start lighttpd" and told me to see logs. I couldn't find them and stopped there... It would be great to hook this up into CI somewhere, but I suspect it isn't, considering how it doesn't actually work out of the box. I'm hoping we can still do things and fix some things without going through that trouble, but I recognize it would be better to have unit tests operational. Honestly, I would prefer just having this thing work and not have to work on it.
:) I have lots of things on my plate and I'm just scratching an itch on this one - some backup script broke and I am trying to fix it. Once it works, my work is done, so unfortunately I cannot lead that project (but I'd be happy to help when I can of course). A. -- The greatest tragedy in mankind's entire history may be the hijacking of morality by religion. - Arthur C. Clarke ^ permalink raw reply [flat|nested] 78+ messages in thread