* [Buildroot] [PATCH] support/check-uniq-files: support weird locales and filenames
From: Yann E. MORIN @ 2018-03-31 12:52 UTC
  To: buildroot

Currently, when a filename contains characters not representable in the
user's locale, we fail hard, especially when the host python is python3.

This is because python2 and python3 handle encoding/decoding strings
differently, with python3 presumably doing the right thing, but it
breaks on some systems, while python2 presumably does the wrong thing,
but it works everywhere. (Just joking, obviously...)

Part of the issue is that the csv reader in python2 is broken with
UTF-8.

We fix the issue by ditching the csv reader, and simply reading the file
in binary mode, manually partitioning the lines on the first comma.

Then, we use the binary-encoded (really, un-encoded) package names and
filenames as values and keys, respectively.

Finally, for each filename or package we need to print, we try to decode
it with the defaults for the user's settings, but catch any decoding
exception and fall back to dumping the raw, binary values in that case.

Thanks a lot to Arnout for the live help doing this patch. :-)

Reported-by: Jaap Crezee <jaap@jcz.nl>
Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
Cc: Arnout Vandecappelle <arnout@mind.be>
Cc: Jaap Crezee <jaap@jcz.nl>
---
 support/scripts/check-uniq-files | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/support/scripts/check-uniq-files b/support/scripts/check-uniq-files
index be808cce03..f110176274 100755
--- a/support/scripts/check-uniq-files
+++ b/support/scripts/check-uniq-files
@@ -26,16 +26,23 @@ def main():
         return False
 
     file_to_pkg = defaultdict(list)
-    with open(args.packages_file_list[0], 'r') as pkg_file_list:
-        r = csv.reader(pkg_file_list, delimiter=',')
-        for row in r:
-            pkg = row[0]
-            file = row[1]
+    with open(args.packages_file_list[0], 'rb') as pkg_file_list:
+        for line in pkg_file_list.readlines():
+            pkg, _, file = line.rstrip(b'\n').partition(b',')
             file_to_pkg[file].append(pkg)
 
     for file in file_to_pkg:
         if len(file_to_pkg[file]) > 1:
-            sys.stderr.write(warn.format(args.type, file, file_to_pkg[file]))
+            # If possible, try to decode the binary strings with
+            # the default user's locale
+            try:
+                sys.stderr.write(warn.format(args.type, file.decode(),
+                                             [p.decode() for p in file_to_pkg[file]]))
+            except UnicodeDecodeError:
+                # ... but fallback to just dumping them raw if they
+                # contain non-representable chars
+                sys.stderr.write(warn.format(args.type, file,
+                                             file_to_pkg[file]))
 
 
 if __name__ == "__main__":
-- 
2.14.1
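
The approach described in the commit log can be tried outside of
Buildroot with the minimal sketch below. It is not the script itself:
the WARN template, the helper names load_file_to_pkg() and
format_duplicates(), and the sample data are all hypothetical and only
serve the illustration. It assumes Python 3, where bytes.decode()
without an argument uses UTF-8 rather than the locale's encoding.

```python
import io
from collections import defaultdict

# Hypothetical message template, for illustration only.
WARN = "Warning: {} file {} is touched by more than one package: {}\n"


def load_file_to_pkg(stream):
    """Map each file name (bytes) to the packages (bytes) that list it."""
    file_to_pkg = defaultdict(list)
    for line in stream:
        # Split on the first comma only, so commas in file names survive.
        pkg, _, fname = line.rstrip(b"\n").partition(b",")
        file_to_pkg[fname].append(pkg)
    return file_to_pkg


def format_duplicates(file_to_pkg, kind="target"):
    """Return one warning per file listed by more than one package."""
    messages = []
    for fname, pkgs in file_to_pkg.items():
        if len(pkgs) < 2:
            continue
        try:
            # Python 3: decode() with no argument means UTF-8.
            text = WARN.format(kind, fname.decode(),
                               [p.decode() for p in pkgs])
        except UnicodeDecodeError:
            # Not decodable: fall back to the raw bytes objects.
            text = WARN.format(kind, fname, pkgs)
        messages.append(text)
    return messages


# Made-up "package,file" records; the last two end in a byte that is
# not valid UTF-8 on its own.
sample = io.BytesIO(
    b"busybox,./bin/sh\n"
    b"bash,./bin/sh\n"
    b"foo,./usr/share/a,b.txt\n"
    b"pkg-a,./usr/share/caf\xe9\n"
    b"pkg-b,./usr/share/caf\xe9\n"
)
table = load_file_to_pkg(sample)
for message in format_duplicates(table):
    print(message, end="")
```

Keeping the keys and values as bytes means the duplicate detection never
depends on any encoding; decoding only happens when a message is
printed, which is where the fallback applies.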


* [Buildroot] [PATCH] support/check-uniq-files: support weird locales and filenames
From: Arnout Vandecappelle @ 2018-03-31 13:37 UTC
  To: buildroot



On 31-03-18 14:52, Yann E. MORIN wrote:
> Currently, when a filename contains characters not representable in the
> user's locale, we fail hard, especially when the host python is python3.
> 
> This is because python2 and python3 handle encoding/decoding strings
> differently, with python3 presumably doing the right thing, but it
> breaks on some systems, while python2 presumably does the wrong thing,
> but it works everywhere. (Just joking, obviously...)
> 
> Part of the issue is that the csv reader in python2 is broken with
> UTF-8.
> 
> We fix the issue by ditching the csv reader, and simply reading the file
> in binary mode, manually partitioning the lines on the first comma.
> 
> Then, we use the binary-encoded (really, un-encoded) package names and
> filenames as values and keys, respectively.
> 
> Finally, for each filename or package we need to print, we try to decode
> it with the defaults for the user's settings, but catch any decoding
> exception and fall back to dumping the raw, binary values in that case.
> 
> Thanks a lot to Arnout for the live help doing this patch. :-)
> 
> Reported-by: Jaap Crezee <jaap@jcz.nl>
> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> Cc: Arnout Vandecappelle <arnout@mind.be>
> Cc: Jaap Crezee <jaap@jcz.nl>


 Applied to master, thanks. But I couldn't resist extending the commit log a
little more - it really was too short :-P

 Regards,
 Arnout


-- 
Arnout Vandecappelle                          arnout at mind be
Senior Embedded Software Architect            +32-16-286500
Essensium/Mind                                http://www.mind.be
G.Geenslaan 9, 3001 Leuven, Belgium           BE 872 984 063 RPR Leuven
LinkedIn profile: http://www.linkedin.com/in/arnoutvandecappelle
GPG fingerprint:  7493 020B C7E3 8618 8DEC 222C 82EB F404 F9AC 0DDF


* [Buildroot] [PATCH] support/check-uniq-files: support weird locales and filenames
From: Jaap Crezee @ 2018-04-01 21:25 UTC
  To: buildroot

On 03/31/18 15:37, Arnout Vandecappelle wrote:
>> Reported-by: Jaap Crezee <jaap@jcz.nl>
>> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
>> Cc: Arnout Vandecappelle <arnout@mind.be>
>> Cc: Jaap Crezee <jaap@jcz.nl>
> 
> 
>  Applied to master, thanks. But I couldn't resist extending the commit log a
> little more - it really was too short :-P

Excellent guys, thank you.
I can confirm it works for me.
Are such patches expected to be applied to/merged on 2018.02.x as well?

regards,

Jaap Crezee


* [Buildroot] [PATCH] support/check-uniq-files: support weird locales and filenames
From: Yann E. MORIN @ 2018-04-01 21:40 UTC
  To: buildroot

Jaap, All,

On 2018-04-01 23:25 +0200, Jaap Crezee spake thusly:
> On 03/31/18 15:37, Arnout Vandecappelle wrote:
> >> Reported-by: Jaap Crezee <jaap@jcz.nl>
> >> Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
> >> Cc: Arnout Vandecappelle <arnout@mind.be>
> >> Cc: Jaap Crezee <jaap@jcz.nl>
> >  Applied to master, thanks. But I couldn't resist extending the commit log a
> > little more - it really was too short :-P
> Excellent guys, thank you.
> I can confirm it works for me.

OK, good to know, thanks for the feedback. :-)

> Are such patches expected to be applied to/merged on 2018.02.x as well?

Yeah, they will eventually trickle down to the LTS branch. But we
presumably want to test-bench them a bit in master before doing the
backport...

Regards,
Yann E. MORIN.

-- 
.-----------------.--------------------.------------------.--------------------.
|  Yann E. MORIN  | Real-Time Embedded | /"\ ASCII RIBBON | Erics' conspiracy: |
| +33 662 376 056 | Software  Designer | \ / CAMPAIGN     |  ___               |
| +33 223 225 172 `------------.-------:  X  AGAINST      |  \e/  There is no  |
| http://ymorin.is-a-geek.org/ | _/*\_ | / \ HTML MAIL    |   v   conspiracy.  |
'------------------------------^-------^------------------^--------------------'


* [Buildroot] [PATCH] support/check-uniq-files: support weird locales and filenames
From: Peter Korsgaard @ 2018-04-07 17:50 UTC
  To: buildroot

>>>>> "Yann" == Yann E MORIN <yann.morin.1998@free.fr> writes:

 > Currently, when a filename contains characters not representable in the
 > user's locale, we fail hard, especially when the host python is python3.

 > This is because python2 and python3 handle encoding/decoding strings
 > differently, with python3 presumably doing the right thing, but it
 > breaks on some systems, while python2 presumably does the wrong thing,
 > but it works everywhere. (Just joking, obviously...)

 > Part of the issue is that the csv reader in python2 is broken with
 > UTF-8.

 > We fix the issue by ditching the csv reader, and simply reading the file
 > in binary mode, manually partitioning the lines on the first comma.

 > Then, we use the binary-encoded (really, un-encoded) package names and
 > filenames as values and keys, respectively.

 > Finally, for each filename or package we need to print, we try to decode
 > it with the defaults for the user's settings, but catch any decoding
 > exception and fall back to dumping the raw, binary values in that case.

 > Thanks a lot to Arnout for the live help doing this patch. :-)

 > Reported-by: Jaap Crezee <jaap@jcz.nl>
 > Signed-off-by: "Yann E. MORIN" <yann.morin.1998@free.fr>
 > Cc: Arnout Vandecappelle <arnout@mind.be>
 > Cc: Jaap Crezee <jaap@jcz.nl>

Committed to 2018.02.x, thanks.

-- 
Bye, Peter Korsgaard
