* Bug in .mailmap handling? @ 2013-07-12 16:07 Stefan Beller 2013-07-12 17:35 ` Junio C Hamano 2013-07-12 20:28 ` Junio C Hamano 0 siblings, 2 replies; 8+ messages in thread From: Stefan Beller @ 2013-07-12 16:07 UTC (permalink / raw) To: git Hello, you may have noticed I am currently trying to bring the mailmap file of git itself up to date. I noticed some behavior, which I did not expect. Have a look yourself: --- # prepare test environment: mkdir testmailmap cd testmailmap/ git init # do a commit: echo "asdf" > test1 git add test1 git commit -a --author="A <A@example.org>" -m "add test1" # commit with same name, but different email # (different capitalization does the trick already, # but here I am going to use a different mail) echo "asdf" > test2 git add test2 git commit -a --author="A <changed_email@example.org>" -m "add test2" # how do we know it's the same person? git shortlog A (2): add test1 add test2 # reports as expected: git shortlog -sne 1 A <A@example.org> 1 A <changed_email@example.org> # Adding the line to the mailmap should make life easy, so we know # it's the same person echo "A <A@example.org> <changed_email@example.org>" > .mailmap # Come on, I just wanted to have it reported as one person! git shortlog -sne 1 A <A@example.org> 1 A <a@example.org> # So let's try another line in the mailmap file, (small 'a') echo "A <a@example.org> <changed_email@example.org>" > .mailmap # We're not there yet? git shortlog -sne 1 A <A@example.org> 1 A <a@example.org> # Now let's write it rather explicit: # (essentially just write 2 lines into the mailmap file) cat << EOF > .mailmap A <a@example.org> <changed_email@example.org> A <a@example.org> <A@example.org> EOF # works as expected now git shortlog -sne 2 A <a@example.org> # works as expected now as well git shortlog A (2): add test1 add test2 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Bug in .mailmap handling? 2013-07-12 16:07 Bug in .mailmap handling? Stefan Beller @ 2013-07-12 17:35 ` Junio C Hamano 2013-07-12 17:47 ` Stefan Beller 2013-07-12 20:28 ` Junio C Hamano 1 sibling, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2013-07-12 17:35 UTC (permalink / raw) To: Stefan Beller; +Cc: git Stefan Beller <stefanbeller@googlemail.com> writes: > Hello, > > you may have noticed I am currently trying to bring the > mailmap file of git itself up to date. I noticed > some behavior, which I did not expect. Have a look yourself: > > --- > # prepare test environment: > mkdir testmailmap > cd testmailmap/ > git init > > # do a commit: > echo "asdf" > test1 > git add test1 > git commit -a --author="A <A@example.org>" -m "add test1" > > # commit with same name, but different email > # (different capitalization does the trick already, > # but here I am going to use a different mail) > echo "asdf" > test2 > git add test2 > git commit -a --author="A <changed_email@example.org>" -m "add test2" > > # how do we know it's the same person? > git shortlog > A (2): > add test1 > add test2 You don't, and it is a long known behaviour. > # reports as expected: > git shortlog -sne > 1 A <A@example.org> > 1 A <changed_email@example.org> Yes. > # Adding the line to the mailmap should make life easy, so we know > # it's the same person > echo "A <A@example.org> <changed_email@example.org>" > .mailmap > > # Come on, I just wanted to have it reported as one person! > git shortlog -sne > 1 A <A@example.org> > 1 A <a@example.org> Err, where does the lowercase a@ come from in the above? Are we missing some steps before we get here? > # So let's try another line in the mailmap file, (small 'a') > echo "A <a@example.org> <changed_email@example.org>" > .mailmap This is ">", not ">>", I presume? Otherwise changed_email is mapped to two destination, no? > # We're not there yet? > git shortlog -sne > 1 A <A@example.org> > 1 A <a@example.org> Expected, as long as some hidden set-up you did not describe that caused me to say "Err, where does the lowercase a@ come from" is there, i.e. one of the two commits is done by <a@example.org>. > # Now let's write it rather explicit: > # (essentially just write 2 lines into the mailmap file) > cat << EOF > .mailmap > A <a@example.org> <changed_email@example.org> > A <a@example.org> <A@example.org> > EOF > > # works as expected now > git shortlog -sne > 2 A <a@example.org> Makes sense. > # works as expected now as well > git shortlog > A (2): > add test1 > add test2 ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Bug in .mailmap handling? 2013-07-12 17:35 ` Junio C Hamano @ 2013-07-12 17:47 ` Stefan Beller 0 siblings, 0 replies; 8+ messages in thread From: Stefan Beller @ 2013-07-12 17:47 UTC (permalink / raw) To: Junio C Hamano, git [-- Attachment #1: Type: text/plain, Size: 2947 bytes --] The commands are exactly as given in the first mail. (I run it multiple times, and this exact behavior comes up) I think there is some problem with the capitalisation of the first (if not all) characters. (Hence the small 'a') So either check with these example commands yourself, or take my latest patch for the mailmap to reproduce on real names, please. On 07/12/2013 07:35 PM, Junio C Hamano wrote: > Stefan Beller <stefanbeller@googlemail.com> writes: > >> Hello, >> >> you may have noticed I am currently trying to bring the >> mailmap file of git itself up to date. I noticed >> some behavior, which I did not expect. Have a look yourself: >> >> --- >> # prepare test environment: >> mkdir testmailmap >> cd testmailmap/ >> git init >> >> # do a commit: >> echo "asdf" > test1 >> git add test1 >> git commit -a --author="A <A@example.org>" -m "add test1" >> >> # commit with same name, but different email >> # (different capitalization does the trick already, >> # but here I am going to use a different mail) >> echo "asdf" > test2 >> git add test2 >> git commit -a --author="A <changed_email@example.org>" -m "add test2" >> >> # how do we know it's the same person? >> git shortlog >> A (2): >> add test1 >> add test2 > > You don't, and it is a long known behaviour. > >> # reports as expected: >> git shortlog -sne >> 1 A <A@example.org> >> 1 A <changed_email@example.org> > > Yes. > >> # Adding the line to the mailmap should make life easy, so we know >> # it's the same person >> echo "A <A@example.org> <changed_email@example.org>" > .mailmap >> >> # Come on, I just wanted to have it reported as one person! >> git shortlog -sne >> 1 A <A@example.org> >> 1 A <a@example.org> > > Err, where does the lowercase a@ come from in the above? Are we > missing some steps before we get here? > >> # So let's try another line in the mailmap file, (small 'a') >> echo "A <a@example.org> <changed_email@example.org>" > .mailmap > > This is ">", not ">>", I presume? Otherwise changed_email is mapped > to two destination, no? > >> # We're not there yet? >> git shortlog -sne >> 1 A <A@example.org> >> 1 A <a@example.org> > > Expected, as long as some hidden set-up you did not describe that > caused me to say "Err, where does the lowercase a@ come from" is > there, i.e. one of the two commits is done by <a@example.org>. > >> # Now let's write it rather explicit: >> # (essentially just write 2 lines into the mailmap file) >> cat << EOF > .mailmap >> A <a@example.org> <changed_email@example.org> >> A <a@example.org> <A@example.org> >> EOF >> >> # works as expected now >> git shortlog -sne >> 2 A <a@example.org> > > Makes sense. > >> # works as expected now as well >> git shortlog >> A (2): >> add test1 >> add test2 [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 899 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Bug in .mailmap handling? 2013-07-12 16:07 Bug in .mailmap handling? Stefan Beller 2013-07-12 17:35 ` Junio C Hamano @ 2013-07-12 20:28 ` Junio C Hamano 2013-07-12 20:35 ` Stefan Beller 2013-07-12 20:38 ` Junio C Hamano 1 sibling, 2 replies; 8+ messages in thread From: Junio C Hamano @ 2013-07-12 20:28 UTC (permalink / raw) To: Stefan Beller; +Cc: git Stefan Beller <stefanbeller@googlemail.com> writes: > # Adding the line to the mailmap should make life easy, so we know > # it's the same person > echo "A <A@example.org> <changed_email@example.org>" > .mailmap While I was looking at this, I noticed this piece of code: diff --git a/mailmap.c b/mailmap.c index 2a7b366..418081e 100644 --- a/mailmap.c +++ b/mailmap.c @@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name, while (nend > nstart && isspace(*nend)) --nend; - *name = (nstart < nend ? nstart : NULL); + *name = (nstart <= nend ? nstart : NULL); *email = left+1; *(nend+1) = '\0'; *right++ = '\0'; The function is given a buffer "A <A@example.org>...", nstart scans from the beginning of the buffer, skipping whitespaces (there isn't any, so nstart points at the buffer), while nend starts from one byte before the first '<' and skips whitespaces backwards and ends at the first non-whitespace (i.e. it hits "A" at the beginning of the buffer). nstart == nend in this case for a single-letter name, and an off-by-one error makes it fail to pick up the name, which makes the entry equivalent to <A@example.org> <changed_email@example.org> without the name. I do not think this bug affected anything you observed, though. > git shortlog -sne > 1 A <A@example.org> > 1 A <a@example.org> This is coming from mailmap.c::add_mapping() that downcases the e-mail address. changed_email@example.org is mapped to a@example.org because of this downcasing, while "A <A@example.org>" does not have any entry for it in the .mailmap file, so it is given back as-is. Hence we see two distinct entries. ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: Bug in .mailmap handling? 2013-07-12 20:28 ` Junio C Hamano @ 2013-07-12 20:35 ` Stefan Beller 2013-07-12 20:38 ` Junio C Hamano 1 sibling, 0 replies; 8+ messages in thread From: Stefan Beller @ 2013-07-12 20:35 UTC (permalink / raw) To: Junio C Hamano; +Cc: git On 07/12/2013 10:28 PM, Junio C Hamano wrote: > Stefan Beller <stefanbeller@googlemail.com> writes: > >> # Adding the line to the mailmap should make life easy, so we know >> # it's the same person >> echo "A <A@example.org> <changed_email@example.org>" > .mailmap > > While I was looking at this, I noticed this piece of code: > > diff --git a/mailmap.c b/mailmap.c > index 2a7b366..418081e 100644 > --- a/mailmap.c > +++ b/mailmap.c > @@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name, > while (nend > nstart && isspace(*nend)) > --nend; > > - *name = (nstart < nend ? nstart : NULL); > + *name = (nstart <= nend ? nstart : NULL); > *email = left+1; > *(nend+1) = '\0'; > *right++ = '\0'; > > The function is given a buffer "A <A@example.org>...", nstart scans > from the beginning of the buffer, skipping whitespaces (there isn't > any, so nstart points at the buffer), while nend starts from one > byte before the first '<' and skips whitespaces backwards and ends > at the first non-whitespace (i.e. it hits "A" at the beginning of > the buffer). nstart == nend in this case for a single-letter name, > and an off-by-one error makes it fail to pick up the name, which > makes the entry equivalent to > > <A@example.org> <changed_email@example.org> > > without the name. I do not think this bug affected anything you > observed, though. > >> git shortlog -sne >> 1 A <A@example.org> >> 1 A <a@example.org> > > This is coming from mailmap.c::add_mapping() that downcases the > e-mail address. > > changed_email@example.org is mapped to a@example.org because of this > downcasing, while "A <A@example.org>" does not have any entry for it > in the .mailmap file, so it is given back as-is. Hence we see two > distinct entries. > So do I understand it right, we're having 2 bugs in here? One being triggered by the short name, only one character. So if you want to debug the other bug with a longer name, you can either use a made up name, or run git shortlog -sne |grep Knut in the git repository having the mailmap file already updated. The way the mailmap file is written, I'd assume only one line to be found, as of now 2 lines come up 2 Knut Franke <knut.franke@gmx.de> 1 Knut Franke <Knut.Franke@gmx.de> which seems to downcase the whole first email. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Bug in .mailmap handling? 2013-07-12 20:28 ` Junio C Hamano 2013-07-12 20:35 ` Stefan Beller @ 2013-07-12 20:38 ` Junio C Hamano 2013-07-12 20:50 ` Junio C Hamano 1 sibling, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2013-07-12 20:38 UTC (permalink / raw) To: Stefan Beller; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > Stefan Beller <stefanbeller@googlemail.com> writes: > >> git shortlog -sne >> 1 A <A@example.org> >> 1 A <a@example.org> > > This is coming from mailmap.c::add_mapping() that downcases the > e-mail address. > > changed_email@example.org is mapped to a@example.org because of this > downcasing, while "A <A@example.org>" does not have any entry for it > in the .mailmap file, so it is given back as-is. Hence we see two > distinct entries. I think it is wrong for the parser to lose information by downcasing. It is perfectly fine to do the comparison case insensitively, to allow <changed_Email@example.org> in the input is mangled using the entry for <changed_email@example.org> in the .mailmap file; it is not fine to downcase the parsed result, especially the side that is used as the "rewritten result" (i.e. the tokens earlier on the line), as that would mean mangling the output. Let me see if I can quickly whip up a fix. ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: Bug in .mailmap handling? 2013-07-12 20:38 ` Junio C Hamano @ 2013-07-12 20:50 ` Junio C Hamano 2013-07-12 21:13 ` [PATCH] Add a testcase for checking case insensitivity of mail map Stefan Beller 0 siblings, 1 reply; 8+ messages in thread From: Junio C Hamano @ 2013-07-12 20:50 UTC (permalink / raw) To: Stefan Beller; +Cc: git Junio C Hamano <gitster@pobox.com> writes: > Junio C Hamano <gitster@pobox.com> writes: > >> Stefan Beller <stefanbeller@googlemail.com> writes: >> >>> git shortlog -sne >>> 1 A <A@example.org> >>> 1 A <a@example.org> >> >> This is coming from mailmap.c::add_mapping() that downcases the >> e-mail address. >> >> changed_email@example.org is mapped to a@example.org because of this >> downcasing, while "A <A@example.org>" does not have any entry for it >> in the .mailmap file, so it is given back as-is. Hence we see two >> distinct entries. > > I think it is wrong for the parser to lose information by > downcasing. > > It is perfectly fine to do the comparison case insensitively, to > allow <changed_Email@example.org> in the input is mangled using the > entry for <changed_email@example.org> in the .mailmap file; it is > not fine to downcase the parsed result, especially the side that is > used as the "rewritten result" (i.e. the tokens earlier on the line), > as that would mean mangling the output. > > Let me see if I can quickly whip up a fix. It might be just the matter of doing this. I suspect that we could drop the downcasing of old-email, too, but the new-email is supposed to be the rewritten result that appears on the output, and downcasing it is very wrong. I also suspect that this was an old workaround for the original string-list that did not know how to match items case insensitively, which we should have removed when we added case sensitivity support to the string-list at around 577f63e7 (Merge branch 'ap/log-mailmap', 2013-01-20). mailmap.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/mailmap.c b/mailmap.c index 418081e..c64a53d 100644 --- a/mailmap.c +++ b/mailmap.c @@ -56,9 +56,6 @@ static void add_mapping(struct string_list *map, if (old_email) for (p = old_email; *p; p++) *p = tolower(*p); - if (new_email) - for (p = new_email; *p; p++) - *p = tolower(*p); if (old_email == NULL) { old_email = new_email; ^ permalink raw reply related [flat|nested] 8+ messages in thread
* [PATCH] Add a testcase for checking case insensitivity of mail map 2013-07-12 20:50 ` Junio C Hamano @ 2013-07-12 21:13 ` Stefan Beller 0 siblings, 0 replies; 8+ messages in thread From: Stefan Beller @ 2013-07-12 21:13 UTC (permalink / raw) To: git, Junio C Hamano; +Cc: Stefan Beller Your patch seems to do it. I added a test case which doesn't fail, if your patch is added. Signed-off-by: Stefan Beller <stefanbeller@googlemail.com> --- t/t4203-mailmap.sh | 33 +++++++++++++++++++++++++++++++++ 1 file changed, 33 insertions(+) diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh index 842b754..07152e9 100755 --- a/t/t4203-mailmap.sh +++ b/t/t4203-mailmap.sh @@ -409,4 +409,37 @@ test_expect_success 'Blame output (complex mapping)' ' test_cmp expect actual.fuzz ' +test_expect_success 'cleanup after mailmap.blob tests' ' + rm -f .mailmap +' + +cat >expect <<\EOF + 2 A <A@example.org> + 2 Other Author <other@author.xx> + 2 Santa Claus <santa.claus@northpole.xx> + 1 A U Thor <author@example.com> + 1 CTO <cto@company.xx> + 1 Some Dude <some@dude.xx> +EOF + +test_expect_success 'Test case sensitivity of Names' ' + # do a commit: + echo "asdf" > test1 + git add test1 + git commit -a --author="A <A@example.org>" -m "add test1" + + # commit with same name, but different email + # (different capitalization does the trick already, + # but here I am going to use a different mail) + echo "asdf" > test2 + git add test2 + git commit -a --author="A <changed_email@example.org>" -m "add test2" + + # Adding the line to the mailmap should make life easy, so we know + # it is the same person + echo "A <A@example.org> <changed_email@example.org>" > .mailmap + + git shortlog -sne HEAD >actual && test_cmp expect actual +' + test_done -- 1.8.3.2.776.gfcf213d ^ permalink raw reply related [flat|nested] 8+ messages in thread
end of thread, other threads:[~2013-07-12 21:13 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2013-07-12 16:07 Bug in .mailmap handling? Stefan Beller 2013-07-12 17:35 ` Junio C Hamano 2013-07-12 17:47 ` Stefan Beller 2013-07-12 20:28 ` Junio C Hamano 2013-07-12 20:35 ` Stefan Beller 2013-07-12 20:38 ` Junio C Hamano 2013-07-12 20:50 ` Junio C Hamano 2013-07-12 21:13 ` [PATCH] Add a testcase for checking case insensitivity of mail map Stefan Beller
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.