From: Anthony Ramine <n.oxyde@gmail.com>
To: Duy Nguyen <pclouds@gmail.com>
Cc: Git Mailing List <git@vger.kernel.org>
Subject: Re: [PATCH v3] wildmatch: properly fold case everywhere
Date: Wed, 29 May 2013 19:57:44 +0200
Message-ID: <BAB62C57-FE7D-476A-ACA7-5831BAF3E558@gmail.com>
In-Reply-To: <CACsJy8A61nYu9a-BhUiBhBEv-e6_CtYyZE3sG9iCiau+3EKVdw@mail.gmail.com>

Replied inline.

-- 
Anthony Ramine

On 29 May 2013 at 15:52, Duy Nguyen wrote:

> On Wed, May 29, 2013 at 8:37 PM, Anthony Ramine <n.oxyde@gmail.com> wrote:
>> On 29 May 2013 at 15:22, Duy Nguyen wrote:
>> 
>>> On Tue, May 28, 2013 at 8:58 PM, Anthony Ramine <n.oxyde@gmail.com> wrote:
>>>> Case folding is not done correctly when matching against the [:upper:]
>>>> character class and uppercased character ranges (e.g. A-Z).
>>>> Specifically, an uppercase letter fails to match against any of them
>>>> when case folding is requested because plain characters in the pattern
>>>> and the whole string are preemptively lowercased to handle the base case
>>>> fast.
>>> 
>>> I did a little test with glibc fnmatch and also checked the source
>>> code. I don't think 'a' matches [:upper:]. So I'm not sure if that's a
>>> correct behavior or a bug in glibc. The spec is not clear (I think) on
>>> this. I guess we should just assume that 'a' should match '[:upper:]'?
>> 
>> I don't know. In my opinion, if case folding is enabled, we should treat [:upper:], [:lower:] and [:alpha:] as equivalent.
>> 
>> This opinion is shared by GNU Flex [1]:
>> 
>>>      • If your scanner is case-insensitive (the ‘-i’ flag), then ‘[:upper:]’ and ‘[:lower:]’ are equivalent to ‘[:alpha:]’.
>> 
>> [1] http://flex.sourceforge.net/manual/Patterns.html
> 
> Then we should do it too because of this precedent, I think.
> 
>>>> @@ -196,6 +196,11 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>>>>                                       }
>>>>                                       if (t_ch <= p_ch && t_ch >= prev_ch)
>>>>                                               matched = 1;
>>>> +                                       else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch)) {
>>>> +                                               uchar t_ch_upper = toupper(t_ch);
>>>> +                                               if (t_ch_upper <= p_ch && t_ch_upper >= prev_ch)
>>>> +                                                       matched = 1;
>>>> +                                       }
>>> 
>>> Or we could stick with tolower. Something like this:
>>> 
>>> if ((t_ch <= p_ch && t_ch >= prev_ch) ||
>>>     ((flags & WM_CASEFOLD) &&
>>>      t_ch <= tolower(p_ch) && t_ch >= tolower(prev_ch)))
>>>         matched = 1;
>>> 
>>> I think it's easier to read if we either downcase all, or upcase all, not both.
>> 
>> If the range to match against is [A-_], it becomes [a-_], which is an empty range because ord('a') > ord('_'). I think it is simpler to apply toupper() after the fact, as I did.
>> 
>> Anyway maybe I should add a test for that corner case?
> 
> Yeah I was thinking about such a case, but I saw glibc do it... I
> guess we just found another bug, at least in compat/fnmatch.c. Yes a
> test for it would be great, in case I change my mind 2 years from now
> and decide to turn it the other way ;)

Should I patch compat/fnmatch.c too? That would make it differ from glibc's version.
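
To make the [A-_] corner case concrete, here is a minimal sketch comparing the two approaches discussed above. The helper names range_match_toupper() and range_match_tolower() are hypothetical and not part of wildmatch.c; the plain <ctype.h> functions stand in for the ISLOWER() macro, and the text character is assumed to have been lowercased already, as dowild() does when WM_CASEFOLD is set.

```c
#include <ctype.h>

/*
 * Hypothetical helper mirroring the patch: test the range as written,
 * then retry with the uppercased text character when folding case.
 */
static int range_match_toupper(unsigned char t_ch, unsigned char lo,
			       unsigned char hi, int casefold)
{
	if (t_ch >= lo && t_ch <= hi)
		return 1;
	if (casefold && islower(t_ch)) {
		unsigned char t_ch_upper = (unsigned char)toupper(t_ch);
		return t_ch_upper >= lo && t_ch_upper <= hi;
	}
	return 0;
}

/*
 * Hypothetical helper mirroring the suggested alternative: lowercase
 * both range endpoints instead. For [A-_] this yields [a-_], which is
 * empty because 'a' (97) sorts after '_' (95) in ASCII.
 */
static int range_match_tolower(unsigned char t_ch, unsigned char lo,
			       unsigned char hi, int casefold)
{
	if (t_ch >= lo && t_ch <= hi)
		return 1;
	if (casefold) {
		unsigned char lo_lower = (unsigned char)tolower(lo);
		unsigned char hi_lower = (unsigned char)tolower(hi);
		return t_ch >= lo_lower && t_ch <= hi_lower;
	}
	return 0;
}
```

With lo = 'A', hi = '_' and case folding on, the toupper variant accepts 'g' (it becomes 'G', which lies in the range), while the tolower variant rejects it because the lowered range is empty.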

>> 
>>>>                                       p_ch = 0; /* This makes "prev_ch" get set to 0. */
>>>>                               } else if (p_ch == '[' && p[1] == ':') {
>>>>                                       const uchar *s;
>>>> @@ -245,6 +250,8 @@ static int dowild(const uchar *p, const uchar *text, unsigned int flags)
>>>>                                       } else if (CC_EQ(s,i, "upper")) {
>>>>                                               if (ISUPPER(t_ch))
>>>>                                                       matched = 1;
>>>> +                                               else if ((flags & WM_CASEFOLD) && ISLOWER(t_ch))
>>>> +                                                       matched = 1;
>>>>                                       } else if (CC_EQ(s,i, "xdigit")) {
>>>>                                               if (ISXDIGIT(t_ch))
>>>>                                                       matched = 1;
>>> 
>>> If WM_CASEFOLD is set, maybe isalpha(t_ch) is enough then?
>> 
>> Yes, isalpha() is enough, but I wanted to keep the two cases separate. I can amend that if you want.
> 
> Either way is fine. I don't think this code is performance critical. Your call.
> --
> Duy
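
As a rough illustration of the [:upper:] point in the quoted exchange, the following sketch shows the isalpha() shortcut: under case folding, [:upper:] behaves like [:alpha:]. The helper upper_class_match() is hypothetical, and the plain <ctype.h> functions are used here in place of wildmatch's ISUPPER()/ISLOWER() macros, which is an assumption made for brevity.

```c
#include <ctype.h>

/*
 * Hypothetical helper: does t_ch match [:upper:]?
 * With case folding, any letter matches; without it, only uppercase
 * letters do. Returns 1 on match, 0 otherwise.
 */
static int upper_class_match(unsigned char t_ch, int casefold)
{
	if (casefold)
		return isalpha(t_ch) != 0;
	return isupper(t_ch) != 0;
}
```

This collapses the ISUPPER() check and the added ISLOWER() branch into one test; keeping them separate, as in the patch, is equivalent for ASCII letters.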
