From: bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r@public.gmane.org
Subject: [Bug 60807] not all the pages are encoded using utf-8
Date: Fri, 14 Feb 2014 10:22:04 +0000
Mime-Version: 1.0
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 7bit
Sender: linux-man-owner-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
To: linux-man-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
List-Id: linux-man@vger.kernel.org

https://bugzilla.kernel.org/show_bug.cgi?id=60807

Michael Kerrisk changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |mtk.manpages-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org

--- Comment #4 from Michael Kerrisk ---
(In reply to Peter Schiffer from comment #3)
> $ ./print_encoding.sh man?/*
>
> Man Page                Encoding by file  Encoding by first line
>
> * man2/close.2          iso-8859-1
> * man2/getdomainname.2  iso-8859-1
> * man2/getrlimit.2      iso-8859-1
> * man2/madvise.2        iso-8859-1
> * man2/mount.2          utf-8
> * man2/sysinfo.2        iso-8859-1
> * man2/umask.2          iso-8859-1
> * man3/encrypt.3        iso-8859-1
> * man3/fclose.3         iso-8859-1
> * man3/fflush.3         iso-8859-1
> * man3/lockf.3          iso-8859-1
> * man3/rand.3           iso-8859-1
> * man3/strtok.3         iso-8859-1
> * man3/toupper.3        iso-8859-1
> * man3/updwtmp.3        iso-8859-1
> * man4/st.4             utf-8
> * man5/utmp.5           iso-8859-1
> * man7/armscii-8.7      iso-8859-1        ARMSCII-8
> * man7/cp1251.7         unknown-8bit      CP1251
> * man7/environ.7        iso-8859-1
> * man7/hier.7           iso-8859-1
> * man7/iso_8859-10.7    iso-8859-1        ISO-8859-10
> * man7/iso_8859-11.7    iso-8859-1        ISO-8859-11
> * man7/iso_8859-13.7    iso-8859-1        ISO-8859-7
> * man7/iso_8859-14.7    iso-8859-1        ISO-8859-14
> * man7/iso_8859-15.7    iso-8859-1        ISO-8859-15
> * man7/iso_8859-16.7    iso-8859-1        ISO-8859-16
> * man7/iso_8859-1.7     iso-8859-1
> * man7/iso_8859-2.7     iso-8859-1        ISO-8859-2
> * man7/iso_8859-3.7     iso-8859-1        ISO-8859-3
> * man7/iso_8859-4.7     iso-8859-1        ISO-8859-4
> * man7/iso_8859-5.7     iso-8859-1        ISO-8859-5
> * man7/iso_8859-6.7     iso-8859-1        ISO-8859-6
> * man7/iso_8859-7.7     iso-8859-1        ISO-8859-7
> * man7/iso_8859-8.7     iso-8859-1        ISO-8859-8
> * man7/iso_8859-9.7     iso-8859-1        ISO-8859-9
> * man7/koi8-r.7         unknown-8bit      KOI8-R
> * man7/koi8-u.7         unknown-8bit
> * man7/suffixes.7       iso-8859-1
>
> $ ./convert_to_utf_8.sh tmp_encoded man?/*
> Converting man2/close.2 from iso-8859-1
> Converting man2/getdomainname.2 from iso-8859-1
> Converting man2/getrlimit.2 from iso-8859-1
> Converting man2/madvise.2 from iso-8859-1
> Converting man2/mount.2 from utf-8
> Converting man2/sysinfo.2 from iso-8859-1
> Converting man2/umask.2 from iso-8859-1
> Converting man3/encrypt.3 from iso-8859-1
> Converting man3/fclose.3 from iso-8859-1
> Converting man3/fflush.3 from iso-8859-1
> Converting man3/lockf.3 from iso-8859-1
> Converting man3/rand.3 from iso-8859-1
> Converting man3/strtok.3 from iso-8859-1
> Converting man3/toupper.3 from iso-8859-1
> Converting man3/updwtmp.3 from iso-8859-1
> Converting man4/st.4 from utf-8
> Converting man5/utmp.5 from iso-8859-1
> Converting man7/armscii-8.7 from armscii-8
> Converting man7/cp1251.7 from cp1251
> Converting man7/environ.7 from iso-8859-1
> Converting man7/hier.7 from iso-8859-1
> Converting man7/iso_8859-10.7 from iso_8859-10
> Converting man7/iso_8859-11.7 from iso-8859-1
> Converting man7/iso_8859-13.7 from iso-8859-1
> Converting man7/iso_8859-14.7 from iso_8859-14
> Converting man7/iso_8859-15.7 from iso_8859-15
> Converting man7/iso_8859-16.7 from iso_8859-16
> Converting man7/iso_8859-1.7 from iso_8859-1
> Converting man7/iso_8859-2.7 from iso_8859-2
> Converting man7/iso_8859-3.7 from iso_8859-3
> Converting man7/iso_8859-4.7 from iso_8859-4
> Converting man7/iso_8859-5.7 from iso_8859-5
> Converting man7/iso_8859-6.7 from iso_8859-6
> Converting man7/iso_8859-7.7 from iso_8859-7
> Converting man7/iso_8859-8.7 from iso_8859-8
> Converting man7/iso_8859-9.7 from iso_8859-9
> Converting man7/koi8-r.7 from koi8-r
> Converting man7/koi8-u.7 from koi8-u
> Converting man7/suffixes.7 from iso-8859-1
>
> $ cd tmp_encoded/
>
> $ ../print_encoding.sh man?/*
>
> Man Page                Encoding by file  Encoding by first line
>
> * man2/close.2          utf-8             UTF-8
> * man2/getdomainname.2  utf-8             UTF-8
> * man2/getrlimit.2      utf-8             UTF-8
> * man2/madvise.2        utf-8             UTF-8
> * man2/mount.2          utf-8             UTF-8
> * man2/sysinfo.2        utf-8             UTF-8
> * man2/umask.2          utf-8             UTF-8
> * man3/encrypt.3        utf-8             UTF-8
> * man3/fclose.3         utf-8             UTF-8
> * man3/fflush.3         utf-8             UTF-8
> * man3/lockf.3          utf-8             UTF-8
> * man3/rand.3           utf-8             UTF-8
> * man3/strtok.3         utf-8             UTF-8
> * man3/toupper.3        utf-8             UTF-8
> * man3/updwtmp.3        utf-8             UTF-8
> * man4/st.4             utf-8             UTF-8
> * man5/utmp.5           utf-8             UTF-8
> * man7/armscii-8.7      utf-8             UTF-8
> * man7/cp1251.7         utf-8             UTF-8
> * man7/environ.7        utf-8             UTF-8
> * man7/hier.7           utf-8             UTF-8
> * man7/iso_8859-10.7    utf-8             UTF-8
> * man7/iso_8859-11.7    utf-8             UTF-8
> * man7/iso_8859-13.7    utf-8             UTF-8
> * man7/iso_8859-14.7    utf-8             UTF-8
> * man7/iso_8859-15.7    utf-8             UTF-8
> * man7/iso_8859-16.7    utf-8             UTF-8
> * man7/iso_8859-1.7     utf-8             UTF-8
> * man7/iso_8859-2.7     utf-8             UTF-8
> * man7/iso_8859-3.7     utf-8             UTF-8
> * man7/iso_8859-4.7     utf-8             UTF-8
> * man7/iso_8859-5.7     utf-8             UTF-8
> * man7/iso_8859-6.7     utf-8             UTF-8
> * man7/iso_8859-7.7     utf-8             UTF-8
> * man7/iso_8859-8.7     utf-8             UTF-8
> * man7/iso_8859-9.7     utf-8             UTF-8
> * man7/koi8-r.7         utf-8             UTF-8
> * man7/koi8-u.7         utf-8             UTF-8
> * man7/suffixes.7       utf-8             UTF-8

Peter,

Sorry to be slow following up on this. Thanks for the scripts.

As some background, I'll just note that the current encoding markers in the iso_8859* pages were added in response to this 2009 bug report:
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=519209

It seems a reasonable idea to convert everything to UTF-8, but I have some concerns/questions.

1. Is the encoding line

    '\" t -*- coding: UTF-8 -*-

   really needed, or does modern groff just work this out?

2. I'm concerned about backward compatibility issues.
   As in: what if someone loads the man pages onto a system with an old groff? Now, as far as I can work out, groff added Unicode input support in v1.20, 2009
   (http://lists.gnu.org/archive/html/groff/2009-01/msg00011.html). So, perhaps that's long enough ago that we don't need to worry too much about these issues.

Any thoughts?
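The scripts from comment #3 are not reproduced in this thread, so the following is only a minimal Python sketch of the conversion their output suggests, not Peter's actual implementation. It assumes: the source encoding comes from an Emacs-style coding tag (as in '\" t -*- coding: UTF-8 -*-) on the first line when one is present; an untagged file is treated as UTF-8 if it decodes cleanly and as ISO-8859-1 otherwise; and the result is re-encoded as UTF-8 with the first line declaring UTF-8. The helper names and the ISO-8859-1 fallback are hypothetical. Two known gaps: Python's standard codecs have no ARMSCII-8 codec, so armscii-8.7 would need another tool such as iconv, and koi8-u.7 carries no tag, so the fallback would misread it (the original script evidently knew to convert it "from koi8-u").

```python
import re
from typing import Optional

# Hypothetical helpers, not the scripts from comment #3.
# Emacs-style coding tag, e.g. the first line  '\" t -*- coding: UTF-8 -*-
TAG_RE = re.compile(r"-\*-\s*coding:\s*([-\w.:]+)\s*-\*-")
UTF8_TAG = "-*- coding: UTF-8 -*-"


def declared_coding(first_line: str) -> Optional[str]:
    """Return the encoding named by a coding tag on the line, or None."""
    m = TAG_RE.search(first_line)
    return m.group(1) if m else None


def guess_source(raw: bytes, fallback: str = "iso-8859-1") -> str:
    """Pick the encoding to decode a page from (assumed policy)."""
    # The tag is plain ASCII, so decoding the first line as Latin-1 cannot fail.
    first_line = raw.split(b"\n", 1)[0].decode("latin-1")
    tagged = declared_coding(first_line)
    if tagged:
        return tagged
    try:
        raw.decode("utf-8")  # untagged but already valid UTF-8
        return "utf-8"
    except UnicodeDecodeError:
        return fallback  # assumption: other untagged pages are Latin-1


def convert_page(raw: bytes) -> bytes:
    """Re-encode a page as UTF-8 and make its first line declare UTF-8."""
    text = raw.decode(guess_source(raw))
    head, sep, rest = text.partition("\n")
    if TAG_RE.search(head):
        # Replace the existing tag in place.
        head = TAG_RE.sub(UTF8_TAG, head, count=1)
    elif head.startswith(("'\\\"", ".\\\"")):
        # First line is already a roff comment ('\" or .\"): append the tag.
        head = head + " " + UTF8_TAG
    else:
        # Otherwise add a new comment line carrying the tag.
        head = ".\\\" " + UTF8_TAG + "\n" + head
    return (head + sep + rest).encode("utf-8")
```

Checking for valid UTF-8 before falling back matters here: mount.2 and st.4 carry no tag but are already UTF-8, and decoding them as ISO-8859-1 would double-encode their non-ASCII characters.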