* [PATCH 0/2] More reproducible document builds
@ 2015-09-01 22:47 Ben Hutchings
  2015-09-01 22:49 ` [PATCH 2/2] DocBook: Use a fixed encoding for output Ben Hutchings
  0 siblings, 1 reply; 7+ messages in thread
From: Ben Hutchings @ 2015-09-01 22:47 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Jérémy Bobbio, reproducible-builds, linux-doc,
	Randy Dunlap, Michal Marek, linux-kbuild

This series fixes a couple more reproducibility bugs:

1. The 'mandocs' target always creates dummy man pages in the source
tree.  Aside from their very existence being a nasty hack, and the
wrong-ness of writing to the source tree, they contain a build
timestamp and if we build a package of the source later it won't be
reproducible.  This moves them to the same place as all the other man
pages are built.

(Moving the dummy man pages shouldn't cause a problem for
reproducibility of the man page packages, because the dummies are in
section 1 and not section 9. However the dummies still don't get
cleaned properly.)

2. The current locale affects the encoding of HTML pages.
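
To illustrate point 2: the same character is stored as different bytes depending on the output encoding, so two builds under different locales need not produce identical files. The following is a minimal sketch, not part of the series; the function names `encode_latin1` and `encode_utf8` are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Encode one Unicode code point as ISO-8859-1 (Latin-1).
 * Returns the number of bytes written (1), or 0 if the code point
 * cannot be represented in Latin-1.
 */
size_t encode_latin1(uint32_t cp, unsigned char *out)
{
	if (cp > 0xFF)
		return 0;
	out[0] = (unsigned char)cp;
	return 1;
}

/*
 * Encode one Unicode code point as UTF-8.
 * Returns the number of bytes written (1 to 4), or 0 for values that
 * are not valid Unicode scalar values.
 */
size_t encode_utf8(uint32_t cp, unsigned char *out)
{
	if (cp < 0x80) {
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;	/* surrogates are not encodable */
	if (cp < 0x10000) {
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

For U+00E9 (the "é" in a contributor's name, say), `encode_latin1` writes the single byte 0xE9 while `encode_utf8` writes the two bytes 0xC3 0xA9, so the generated pages differ byte for byte even though they display the same text.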

With these patches (and the previous set) applied, Debian's package of
Linux 4.2 is fully reproducible:
https://reproducible.debian.net/rb-pkg/experimental/amd64/linux.html

Ben.

Ben Hutchings (2):
  Documentation: Avoid creating man pages in source tree
  DocBook: Use a fixed encoding for output

 Documentation/DocBook/Makefile | 6 ++++++
 Makefile                       | 2 +-
 scripts/Makefile               | 7 +++++--
 scripts/check-lc_ctype.c       | 6 ++++++
 scripts/kernel-doc             | 9 +++++----
 5 files changed, 23 insertions(+), 7 deletions(-)
 create mode 100644 scripts/check-lc_ctype.c

-- 
Ben Hutchings
The first rule of tautology club is the first rule of tautology club.



* [PATCH 2/2] DocBook: Use a fixed encoding for output
  2015-09-01 22:47 [PATCH 0/2] More reproducible document builds Ben Hutchings
@ 2015-09-01 22:49 ` Ben Hutchings
  2015-09-11 19:30   ` Jonathan Corbet
  0 siblings, 1 reply; 7+ messages in thread
From: Ben Hutchings @ 2015-09-01 22:49 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Jérémy Bobbio, reproducible-builds, linux-doc,
	Randy Dunlap, Michal Marek, linux-kbuild

Currently the encoding of documents generated by DocBook depends on
the current locale.  Make the output reproducible independently of
the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
preference, or ASCII (LC_CTYPE=C) as a fallback.

LC_CTYPE can normally be overridden by LC_ALL, but the top-level
Makefile unsets that.

Signed-off-by: Ben Hutchings <ben@decadent.org.uk>
---
 Documentation/DocBook/Makefile | 6 ++++++
 Makefile                       | 2 +-
 scripts/Makefile               | 7 +++++--
 scripts/check-lc_ctype.c       | 6 ++++++
 4 files changed, 18 insertions(+), 3 deletions(-)
 create mode 100644 scripts/check-lc_ctype.c
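
The commit message above relies on two things: the POSIX precedence among the locale variables, and a "prefer C.UTF-8, else C" fallback. Both can be sketched in C as below. This is an illustration under stated assumptions, not code from the patch: `effective_ctype` and `pick_fixed_ctype` are hypothetical names, and the first function takes the variable values as arguments rather than reading the real environment.

```c
#include <locale.h>
#include <stddef.h>

/*
 * POSIX precedence for the character-type category: LC_ALL wins if it
 * is set and non-empty, then LC_CTYPE, then LANG; otherwise "C".
 * The arguments stand in for the environment variables (NULL = unset).
 */
const char *effective_ctype(const char *lc_all, const char *lc_ctype,
			    const char *lang)
{
	if (lc_all && *lc_all)
		return lc_all;
	if (lc_ctype && *lc_ctype)
		return lc_ctype;
	if (lang && *lang)
		return lang;
	return "C";
}

/*
 * Roughly the choice the Makefile makes through try-run: prefer
 * C.UTF-8 when the C library can load it, otherwise fall back to
 * plain C (ASCII).
 * Note: this changes the calling process's LC_CTYPE as a side effect.
 */
const char *pick_fixed_ctype(void)
{
	if (setlocale(LC_CTYPE, "C.UTF-8"))
		return "C.UTF-8";
	setlocale(LC_CTYPE, "C");
	return "C";
}
```

With LC_ALL unset, as the top-level Makefile arranges, `effective_ctype(NULL, "C.UTF-8", "de_DE.ISO-8859-1")` yields "C.UTF-8", which is why exporting LC_CTYPE alone is enough here. Which value `pick_fixed_ctype` returns depends on the C library installed on the build host.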

diff --git a/Documentation/DocBook/Makefile b/Documentation/DocBook/Makefile
index 198e9b5..9af25da 100644
--- a/Documentation/DocBook/Makefile
+++ b/Documentation/DocBook/Makefile
@@ -68,6 +68,12 @@ installmandocs: mandocs
 #External programs used
 KERNELDOC = $(srctree)/scripts/kernel-doc
 DOCPROC   = $(objtree)/scripts/docproc
+CHECK_LC_CTYPE = $(objtree)/scripts/check-lc_ctype
+
+# Use a fixed encoding - UTF-8 if the C library has support built-in
+# or ASCII if not
+LC_CTYPE := $(call try-run, LC_CTYPE=C.UTF-8 $(CHECK_LC_CTYPE),C.UTF-8,C)
+export LC_CTYPE
 
 XMLTOFLAGS = -m $(srctree)/$(src)/stylesheet.xsl
 XMLTOFLAGS += --skip-validation
diff --git a/Makefile b/Makefile
index 13270c0..5846c06 100644
--- a/Makefile
+++ b/Makefile
@@ -1338,7 +1338,7 @@ $(help-board-dirs): help-%:
 # Documentation targets
 # ---------------------------------------------------------------------------
 %docs: scripts_basic FORCE
-	$(Q)$(MAKE) $(build)=scripts build_docproc
+	$(Q)$(MAKE) $(build)=scripts build_docproc build_check-lc_ctype
 	$(Q)$(MAKE) $(build)=Documentation/DocBook $@
 
 else # KBUILD_EXTMOD
diff --git a/scripts/Makefile b/scripts/Makefile
index 2016a64..6f0349f 100644
--- a/scripts/Makefile
+++ b/scripts/Makefile
@@ -7,6 +7,7 @@
 # conmakehash:   Create chartable
 # conmakehash:	 Create arrays for initializing the kernel console tables
 # docproc:       Used in Documentation/DocBook
+# check-lc_ctype: Used in Documentation/DocBook
 
 HOST_EXTRACFLAGS += -I$(srctree)/tools/include
 
@@ -23,14 +24,16 @@ HOSTCFLAGS_asn1_compiler.o = -I$(srctree)/include
 always		:= $(hostprogs-y) $(hostprogs-m)
 
 # The following hostprogs-y programs are only build on demand
-hostprogs-y += unifdef docproc
+hostprogs-y += unifdef docproc check-lc_ctype
 
 # These targets are used internally to avoid "is up to date" messages
-PHONY += build_unifdef build_docproc
+PHONY += build_unifdef build_docproc build_check-lc_ctype
 build_unifdef: $(obj)/unifdef
 	@:
 build_docproc: $(obj)/docproc
 	@:
+build_check-lc_ctype: $(obj)/check-lc_ctype
+	@:
 
 subdir-$(CONFIG_MODVERSIONS) += genksyms
 subdir-y                     += mod
diff --git a/scripts/check-lc_ctype.c b/scripts/check-lc_ctype.c
new file mode 100644
index 0000000..51fe229
--- /dev/null
+++ b/scripts/check-lc_ctype.c
@@ -0,0 +1,6 @@
+#include <locale.h>
+
+int main(void)
+{
+	return !setlocale(LC_CTYPE, "");
+}
-- 
Ben Hutchings
The first rule of tautology club is the first rule of tautology club.



* Re: [PATCH 2/2] DocBook: Use a fixed encoding for output
  2015-09-01 22:49 ` [PATCH 2/2] DocBook: Use a fixed encoding for output Ben Hutchings
@ 2015-09-11 19:30   ` Jonathan Corbet
  2015-09-11 21:40     ` [Reproducible-builds] " Daniel Kahn Gillmor
  2015-09-14  0:32     ` Ben Hutchings
  0 siblings, 2 replies; 7+ messages in thread
From: Jonathan Corbet @ 2015-09-11 19:30 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Jérémy Bobbio, reproducible-builds, linux-doc,
	Randy Dunlap, Michal Marek, linux-kbuild

On Tue, 01 Sep 2015 23:49:19 +0100
Ben Hutchings <ben@decadent.org.uk> wrote:

> Currently the encoding of documents generated by DocBook depends on
> the current locale.  Make the output reproducible independently of
> the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
> preference, or ASCII (LC_CTYPE=C) as a fallback.

I guess I have to ask, though: doesn't it seem that having the docs
produced according to the current locale is the Right Thing to do?  Users
have their locale set as it is for a reason, it seems like the production
of textual documents should respect their choice.

Am I missing something here?

Thanks,

jon


* Re: [Reproducible-builds] [PATCH 2/2] DocBook: Use a fixed encoding for output
  2015-09-11 19:30   ` Jonathan Corbet
@ 2015-09-11 21:40     ` Daniel Kahn Gillmor
  2015-09-12 20:06       ` Jonathan Corbet
  2015-09-14  0:32     ` Ben Hutchings
  1 sibling, 1 reply; 7+ messages in thread
From: Daniel Kahn Gillmor @ 2015-09-11 21:40 UTC (permalink / raw)
  To: Jonathan Corbet, Ben Hutchings
  Cc: Randy Dunlap, linux-kbuild, Jérémy Bobbio, linux-doc,
	Michal Marek, reproducible-builds

On Fri 2015-09-11 15:30:59 -0400, Jonathan Corbet wrote:
> On Tue, 01 Sep 2015 23:49:19 +0100
> Ben Hutchings <ben@decadent.org.uk> wrote:
>
>> Currently the encoding of documents generated by DocBook depends on
>> the current locale.  Make the output reproducible independently of
>> the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
>> preference, or ASCII (LC_CTYPE=C) as a fallback.
>
> I guess I have to ask, though: doesn't it seem that having the docs
> produced according to the current locale is the Right Thing to do?  Users
> have their locale set as it is for a reason, it seems like the production
> of textual documents should respect their choice.
>
> Am I missing something here?

I sympathize with Jonathan's general concern here -- if this patchset
makes it impossible for people to build documentation with (for example)
their preferred collation order, it would be suboptimal.

On the other hand, this seems to focus on character encodings
specifically; do we really want to encourage any sort of encodings other
than UTF-8?  The only plausible arguments i've heard are for documents
that are exclusively CJK characters, which could achieve a modest size
reduction using more targeted encodings.  afaik, there are no such
documents in the kernel, and i doubt there ever will be.

          --dkg


* Re: [Reproducible-builds] [PATCH 2/2] DocBook: Use a fixed encoding for output
  2015-09-11 21:40     ` [Reproducible-builds] " Daniel Kahn Gillmor
@ 2015-09-12 20:06       ` Jonathan Corbet
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Corbet @ 2015-09-12 20:06 UTC (permalink / raw)
  To: Daniel Kahn Gillmor
  Cc: Ben Hutchings, Randy Dunlap, linux-kbuild,
	Jérémy Bobbio, linux-doc, Michal Marek,
	reproducible-builds

On Fri, 11 Sep 2015 17:40:33 -0400
Daniel Kahn Gillmor <dkg@fifthhorseman.net> wrote:

> I sympathize with Jonathan's general concern here -- if this patchset
> makes it impossible for people to build documentation with (for example)
> their preferred collation order, it would be suboptimal.
> 
> On the other hand, this seems to focus on character encodings
> specifically; do we really want to encourage any sort of encodings other
> than UTF-8?  The only plausible arguments i've heard are for documents
> that are exclusively CJK characters, which could achieve a modest size
> reduction using more targeted encodings.  afaik, there are no such
> documents in the kernel, and i doubt there ever will be.

Well, there are CJK documents in the kernel, actually, though none are in
the DocBook directory currently.

Regardless of this, it's not a matter of which encodings we are
encouraging.  If we want to encourage utf-8 use, we might not want to
start in the kernel's documentation directory.  I think we need to
respect the user's choice in this regard and not try to override it.  If
I take this patch, I suspect somebody will yell at me for it...

With regard to reproducible builds: success in this area certainly
requires reproducing the build environment as well.  Honestly, I think
that needs to include the locale settings.

Let me know if you think I've totally misunderstood things.

jon


* Re: [PATCH 2/2] DocBook: Use a fixed encoding for output
  2015-09-11 19:30   ` Jonathan Corbet
  2015-09-11 21:40     ` [Reproducible-builds] " Daniel Kahn Gillmor
@ 2015-09-14  0:32     ` Ben Hutchings
  2015-09-18 16:30       ` Jonathan Corbet
  1 sibling, 1 reply; 7+ messages in thread
From: Ben Hutchings @ 2015-09-14  0:32 UTC (permalink / raw)
  To: Jonathan Corbet
  Cc: Jérémy Bobbio, reproducible-builds, linux-doc,
	Randy Dunlap, Michal Marek, linux-kbuild

On Fri, 2015-09-11 at 13:30 -0600, Jonathan Corbet wrote:
> On Tue, 01 Sep 2015 23:49:19 +0100
> Ben Hutchings <ben@decadent.org.uk> wrote:
> 
> > Currently the encoding of documents generated by DocBook depends on
> > the current locale.  Make the output reproducible independently of
> > the locale, by setting the encoding to UTF-8 (LC_CTYPE=C.UTF-8) by
> > preference, or ASCII (LC_CTYPE=C) as a fallback.
> 
> I guess I have to ask, though: doesn't it seem that having the docs
> produced according to the current locale is the Right Thing to do?  Users
> have their locale set as it is for a reason, it seems like the production
> of textual documents should respect their choice.
> 
> Am I missing something here?

Yes - the locale's character encoding applies to plain text, but rich
text formats can have a locale-independent encoding which the viewer
will automatically convert to the current locale's encoding.

For HTML, the document encoding can be explicit in the document header
(and is, in this case).
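
As a toy illustration of "explicit in the document header": a viewer can read the declared charset from the HTML itself instead of consulting the locale. The sketch below is only an assumption-laden example; `declared_charset` is a hypothetical helper, it only recognises a lower-case `charset=`, and real viewers follow the full HTML encoding-detection rules rather than a substring search.

```c
#include <stddef.h>
#include <string.h>

/*
 * Toy extraction of the charset declared in an HTML head, e.g.
 *   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
 * Copies the charset name into buf and returns buf, or returns NULL if
 * no "charset=" is found or buf cannot hold at least one character.
 * Assumption: the declaration is spelled with lower-case "charset=".
 */
const char *declared_charset(const char *html, char *buf, size_t size)
{
	const char *p = strstr(html, "charset=");
	size_t n = 0;

	if (!p || size == 0)
		return NULL;
	p += strlen("charset=");
	if (*p == '"' || *p == '\'')
		p++;	/* the <meta charset="utf-8"> form */
	/* copy up to the closing quote, separator or end of tag */
	while (*p && !strchr("\"' ;>/", *p) && n + 1 < size)
		buf[n++] = *p++;
	buf[n] = '\0';
	return n ? buf : NULL;
}
```

Given a head containing `charset=UTF-8`, the helper returns "UTF-8" whatever LC_CTYPE the reader happens to use, which is the sense in which the HTML output carries its own encoding.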

Manual pages were already consistently encoded in UTF-8, as this is the
default behaviour of DocBook-XSL (and is what man-db prefers as input).

PDF and Postscript documents have arbitrary and explicit mappings from
character numbers (or names) to glyphs, and PDF documents normally have
a mapping from glyphs back to Unicode code points to support searching
and copying text.

Ben.



* Re: [PATCH 2/2] DocBook: Use a fixed encoding for output
  2015-09-14  0:32     ` Ben Hutchings
@ 2015-09-18 16:30       ` Jonathan Corbet
  0 siblings, 0 replies; 7+ messages in thread
From: Jonathan Corbet @ 2015-09-18 16:30 UTC (permalink / raw)
  To: Ben Hutchings
  Cc: Jérémy Bobbio, reproducible-builds, linux-doc,
	Randy Dunlap, Michal Marek, linux-kbuild

On Mon, 14 Sep 2015 01:32:50 +0100
Ben Hutchings <ben@decadent.org.uk> wrote:

> > I guess I have to ask, though: doesn't it seem that having the docs
> > produced according to the current locale is the Right Thing to do?  Users
> > have their locale set as it is for a reason, it seems like the production
> > of textual documents should respect their choice.
> > 
> > Am I missing something here?  
> 
> Yes - the locale's character encoding applies to plain text, but rich
> text formats can have a locale-independent encoding which the viewer
> will automatically convert to the current locale's encoding.
> 
> For HTML, the document encoding can be explicit in the document header
> (and is, in this case).
> 
> Manual pages were already consistently encoded in UTF-8, as this is the
> default behaviour of DocBook-XSL (and is what man-db prefers as input).
> 
> PDF and Postscript documents have arbitrary and explicit mappings from
> character numbers (or names) to glyphs, and PDF documents normally have
> a mapping from glyphs back to Unicode code points to support searching
> and copying text.

OK, I guess you've talked me into it.  Can I ask you for one last favor,
though: please resubmit this patch with a couple of tweaks:

 - Based off current mainline, please (or docs-next, but that shouldn't
   be necessary).  The patch as sent doesn't apply.

 - Could you add a comment to the check-lc_ctype proglet so that somebody
   stumbling across it in the scripts directory knows why it's there?
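
For reference, one possible shape for such a comment is sketched below. This is a guess at what would satisfy the request, not the wording of any resubmitted patch; the check is wrapped in a hypothetical function `check_lc_ctype` so it can be called directly, whereas the proglet in the patch does the same thing in `main`.

```c
#include <locale.h>

/*
 * check-lc_ctype: report whether the C library can load the LC_CTYPE
 * locale named by the environment.
 *
 * The DocBook Makefile runs this with LC_CTYPE=C.UTF-8 to decide
 * whether documentation can be built with a fixed UTF-8 encoding or
 * must fall back to ASCII (LC_CTYPE=C), so that the output does not
 * depend on the builder's locale.
 *
 * Returns 0 (success) if the locale is usable, 1 otherwise; a main()
 * that returns this value behaves like the proglet in the patch.
 */
int check_lc_ctype(void)
{
	/* "" means: take the locale name from the environment */
	return !setlocale(LC_CTYPE, "");
}
```

Run with `LC_CTYPE=C` (or `LC_ALL=C`) in the environment, the check succeeds on any conforming C library, since the "C" locale is always available.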

Thanks,

jon

