linux-spdx.vger.kernel.org archive mirror
From: Philippe Ombredanne <pombredanne@nexb.com>
To: J Lovejoy <opensource@jilayne.com>
Cc: linux-spdx@vger.kernel.org, Steve Winslow <swinslow@linuxfoundation.org>
Subject: Re: some ideas on guidelines
Date: Wed, 12 Jun 2019 12:26:30 +0200
Message-ID: <CAOFm3uHTGF-bBOixc_c0rM_0PWb1Rqx8k=xTJZ6QOqOLpyF5yg@mail.gmail.com>
In-Reply-To: <B3E07802-79E4-4703-90ED-8BB2D8F37092@jilayne.com>

Hi Jilayne:

On Wed, Jun 12, 2019 at 7:25 AM J Lovejoy <opensource@jilayne.com> wrote:
>
> GOAL: The over-arching goal here is to provide clear, concise, and
> machine-readable license information at the file-level for the Linux
> kernel by placing SPDX License List short identifiers at the top of
> each file in order to make it easier for downstream users and
> distributors to use automated processes and comply with the applicable
> license terms.
>
> NOTE: The guidance is either to REPLACE the existing license notice
> with the SPDX license identifier or ADD the SPDX license identifier.
> The rationale here is that where the license notice is clear, then
> replacing should be okay as this is essentially upgrading the current
> notice to something more modern and machine readable. But everywhere
> else, a conservative approach of adding the SPDX identifiers (and as
> such, keeping the existing license notice info) means that others can
> see both. This also avoids the need to create or retain some file with
> all the removed notices, which seems to be distasteful and untenable
> based on the threads related to that topic. The SPDX identifier still
> needs to be accurate, of course.
>
> TOOLING CONSIDERATION: To make it easier on tooling, putting some kind
> of START/END notation, as Steve has recommended,

Having a convention to enclose a notice in markers would have no
impact on scancode and would not make things easier for it: the notice
would be detected and reported whether or not it is enclosed in
markers. Markers could be leveraged later as a way to speed things up,
of course, but that's minor.

If tagging notice text boundaries is the route selected for the
kernel, then it is worth crafting something that is well thought out,
as the kernel's conventions **will** surely be adopted by other projects.

FWIW, here are a few examples of such markers that exist in the wild,
from a quick grep in the scancode license notices database:

- Mozilla: BEGIN LICENSE BLOCK / END LICENSE BLOCK
- Apple: @APPLE_LICENSE_HEADER_START@ / @APPLE_LICENSE_HEADER_END@ and
  some variations
- Oracle: CDDL HEADER START/END, GPL HEADER START/END, LGPL HEADER
  START/END, used with their highly impractical "DO NOT ALTER OR REMOVE
  COPYRIGHT NOTICES OR THIS FILE HEADER."
- LICENSE_START/LICENSE_END (and variations such as %%%LICENSE_START
  used in some man pages and tools including the kernel)
- BSDCOPYRIGHTBEGIN, ECOSGPLCOPYRIGHTBEGIN and other variations in eCos
- Qt and KDE: QT_BEGIN_LICENSE with variations
- COPYRIGHTBEGIN/END
- Begin-Header/End-Header
- BEGIN LICENSE TEXT / END LICENSE TEXT
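
To make the marker idea concrete, here is a minimal Python sketch of how
a tool could pull out the text enclosed by a few of the marker pairs
listed above. The pair table and the function name find_marked_notices
are assumptions made for illustration; this is not how scancode or any
kernel tool works.

```python
import re

# Hypothetical table of marker pairs, taken from a few examples in the list.
MARKER_PAIRS = [
    ("BEGIN LICENSE BLOCK", "END LICENSE BLOCK"),
    ("@APPLE_LICENSE_HEADER_START@", "@APPLE_LICENSE_HEADER_END@"),
    ("CDDL HEADER START", "CDDL HEADER END"),
    ("LICENSE_START", "LICENSE_END"),
]


def find_marked_notices(text):
    """Return a list of (start_marker, enclosed_text) for each marked block."""
    found = []
    for start, end in MARKER_PAIRS:
        # Non-greedy match so that two blocks in one file stay separate.
        pattern = re.compile(
            re.escape(start) + r"(.*?)" + re.escape(end), re.DOTALL
        )
        for match in pattern.finditer(text):
            found.append((start, match.group(1).strip()))
    return found


sample = """/* ***** BEGIN LICENSE BLOCK *****
 * Version: MPL 1.1
 * ***** END LICENSE BLOCK ***** */
int x;
"""
blocks = find_marked_notices(sample)
```

A tool that trusts the SPDX identifier could skip the enclosed text,
while a scanner could still read it; the markers only delimit the
notice, they do not decide how much weight it carries.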

> thus allowing tooling
> to ignore what’s enclosed there and just read the SPDX identifier as
> the definitive license notice.

There is something inconsistent here: either a custom notice or
disclaimer is needed and has some legal importance, or it has none and
should be removed. If it has some importance and needs to be kept,
then I cannot "just read the SPDX identifier as the definitive license
notice" as you wrote; I think I would need to consider both the id and
the extra notice. Or am I missing something?

>  As time goes by, if copyright holders
> come across these files and want to remove the original notices, then
> they have the right to do so.
>
> GUIDANCE: The following is meant to provide some high-level guidance
> for how to handle common scenarios and triage the approaches to reach
> the stated goal.
> The following is not intended to be legal advice. Rather, it is meant
> to reflect the intention of the participating individuals to improve
> the quality and machine-readability of the applicable license
> information in Linux kernel files. The approach described below has
> been developed with the Linux kernel in mind and might not be
> appropriate for other projects or communities.
>
> #1   Where a file contains the standard license notice as stated in
> the GPL-2.0 license text for GPL-2.0-only or GPL-2.0-or-later and no
> other license information whatsoever —> then REPLACE the standard
> license notice with the SPDX identifier for the relevant license.
>
> #2   Where a file contains a non-substantive variation on the standard
> GPL-2.0 license notice, but still provides clear distinction as to
> GPL-2.0-only or GPL-2.0-or-later consistent with the intent of the
> standard license notice and no other license information whatsoever
> —> then REPLACE the standard license notice with the SPDX identifier
> for the relevant license.
>
> #3   Where a file contains a license notice that is non-standard as
> compared to that stated in the GPL-2.0 license text but is nonetheless
> clear as to GPL-2.0-only or GPL-2.0-or-later and no other license
> information whatsoever —> then REPLACE the standard license notice
> with the SPDX identifier for the relevant license.
>
> NOTES RELATED TO #1-3:
> The SPDX identifier is simply a more concise way to express the same
> intention regarding what license applies to the file as the standard
> license notice, but does so in a reliable, machine-readable way that
> meets the needs of modern software supply chain use and efforts to
> automate detection of license information in order to facilitate more
> complete license information and license compliance. One consideration
> is whether replacing existing license notices with more concise,
> machine-readable expression of the same information could run afoul of
> a strict reading of GPL-2.0, section 1.
> Such a strict reading applied to the scenarios described in #1-3 is
> unconvincing for the following reasons:
> *  Although the license text itself recommends the use of the standard
> license notice, it is not a hard requirement of the license. The
> definitive text, as always, is the full text of the license itself.
> Notably, the license author/steward, the Free Software Foundation
> (FSF), encourages use of the standard header, but more broadly
> recommends clear communication of the license variant chosen for the
> given work as seen in various pages on their site.[1] Furthermore,
> Richard Stallman endorsed the use of the revised SPDX identifiers for
> helping provide clarity as to whether a licensor has chosen the
> license-version-only or any-later-version option.[2]
> *  This project to improve license information in the Linux kernel
> files has been discussed among kernel developers, on kernel mailing
> lists, and documented in public files and documentation beginning in
> mid-2017, to which many kernel copyright holders past and present have
> access and would be likely to see and which has received positive
> response and encouragement.
> [1] See https://www.gnu.org/licenses/gpl-howto.html which provides the
> standard license notice, but then also goes on to suggest one clear
> and explicit statement such as, “This program is released under
> license FOO”. FAQ questions
> https://www.gnu.org/licenses/gpl-faq.en.html#LicenseCopyOnly and
> https://www.gnu.org/licenses/gpl-faq.en.html#NoticeInSourceFile also
> stress the general need for clarity without mandating use of the
> specific standard license notice.
> [2] See https://www.gnu.org/licenses/identify-licenses-clearly.html
>
> #4   Where the file contains a license notice that clearly states the
> file is licensed under “GPL” with no indication of version number
> and no other license information whatsoever —> ADD SPDX identifier
> for GPL-2.0-or-later
>     Rationale: This is consistent with the text of the license which
> states, “If the Program does not specify a version number of this
> License, you may choose any version ever published by the Free
> Software Foundation.” Because the Linux kernel is well-known to be
> licensed under GPL-2.0-only and use of GPL-1.0 is generally sparse, it
> is within the options given in the license text to choose
> GPL-2.0-or-later in this case. Doing so more easily enables use of such
> files beyond the Linux kernel.

Just FYI, I am fine with a GPL-2.0-or-later choice for the kernel,
but scancode will report these cases as GPL-1.0-or-later.
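
The difference between the two readings of a versionless notice can be
sketched in a few lines of Python. Everything here is an assumption made
for illustration (the function name gpl_identifier, the regexes, the
versionless_default parameter); real scanners use far richer rules.

```python
import re


def gpl_identifier(notice, versionless_default="GPL-2.0-or-later"):
    """Map a short GPL notice to an SPDX identifier (illustrative only)."""
    text = " ".join(notice.lower().split())
    if "gpl" not in text and "general public license" not in text:
        return None
    # "version 2" but not "version 2.1" (which would point at the LGPL).
    has_v2 = re.search(r"version 2(?!\.?\d)|gpl ?v2", text) is not None
    if not has_v2:
        # No version stated: the outcome is a policy choice, not a detection.
        return versionless_default
    or_later = re.search(r"any later version|or later", text) is not None
    return "GPL-2.0-or-later" if or_later else "GPL-2.0-only"


versionless = "This file is licensed under the GPL."
kernel_choice = gpl_identifier(versionless)
scanner_choice = gpl_identifier(versionless, versionless_default="GPL-1.0-or-later")
```

With the default, the versionless notice maps to GPL-2.0-or-later as in
guideline #4; passing GPL-1.0-or-later mirrors what scancode is said to
report for the same text.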

> #5   Where the file contains a license notice that: a) refers to the
> COPYING file or another specific file (or references GPL and the
> COPYING or another specific file) with no other information as to the
> specific license whatsoever; and b) the COPYING or other specific file
> can be located and is clearly a copy of GPL-2.0  —> ADD SPDX
> identifier for GPL-2.0-only
>     Rationale: This is similar to #4, but the combination of a clear
> reference to a specific license file and the fact that the Linux
> kernel is clearly intended to be GPL-2.0-only leads to the intent that
> this is also GPL-2.0-only. The COPYING file currently in the kernel is
> at
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/COPYING
> and refers to GPL-2.0-only. The (earlier) version of the COPYING file
> also had Linus expressing GPL-2.0-only: see
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/COPYING?id=1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
>
> #6   Where a file contains a license notice that is non-standard as
> compared to that stated in the GPL-2.0 license text but is nonetheless
> clear as to GPL-2.0-only or GPL-2.0-or-later and there is other
> license information, and that license information contains the
> following:
>                 #6a  An existing known additional license or exception
> for which there is an SPDX identifier
>                          —> ADD appropriate SPDX license expression
> (use of AND, OR, WITH), where the person making the change does not
> represent the copyright holders for the file
>                          —> REPLACE with appropriate SPDX license
> expression, where person(s) making or signing-off on changes represent
> copyright holders
>                 #6b   An additional license or exception for which
> there is no SPDX identifier as per the existing SPDX License List
> Matching Guidelines:
>                         --  If clearly a different license and use is
> more than one or two files, then submit for addition to SPDX License
> List at http://13.57.134.254/app/submit_new_license/
>                         -- If close to an existing license/exception
> on the SPDX License List such that the SPDX license’s matching
> markup might be extended to accommodate as a match, submit to SPDX
> legal team for review of such.
>                         -- If some mess of a license that is unclear,
> an abomination, contains non-free elements, or otherwise poses some
> kind of challenge, then attempt to contact copyright holders to change
> license with recommendation
>                 #6c   An additional or different disclaimer or
> warranty text:
>                         — Where the copyright holders of the file in
> question can be contacted, then ask them to remove this and use the
> appropriate SPDX identifier for GPL
>                         —  Where copyright holders of the file in
> question cannot be easily contacted or found, then analyze differences
> between additional disclaimer text and standard disclaimer included in
> GPL, then:
>                                 —> if additional disclaimer text
> adds no additional substantive aspects to the standard GPL disclaimer,
> REPLACE with appropriate SPDX license identifier for GPL-2.0
>                                 —> If additional disclaimer text
> adds additional substantive aspects to the standard GPL disclaimer,
> ADD the appropriate SPDX license identifier for GPL-2.0
> ========
> Please note: while I am a lawyer, I do not represent any kernel
> developers nor any of the people involved in this work. I understand
> that no lawyer could represent the interest of the Linux kernel and
> its many copyright holders in total. We can, however, discuss this in
> a public forum and come up with some consensus as to reasonable
> guidelines and rationale for such.
> I have tried to collect the various thoughts and opinions expressed on
> the mailing list on these topics.
> I’m particularly interested in the following feedback:
> A) This takes a somewhat conservative approach regarding retaining
> some of the license notices and adding SPDX identifiers, rather than
> replacing. I’d like to know from those involved in using scanning
> tools (Thomas, Philippe) if this would be tenable.

Speaking for the scanning tool in use here (i.e. the scancode-toolkit),
having SPDX ids alone or with some extra notice has no impact. The
SPDX id and the license notice will both be detected, and each detected
text is reported with its own corresponding license expression (these
would happen to be the same and can later be combined and simplified
into a single expression).
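
As a rough sketch of that last step, assuming each detection has already
been reduced to an SPDX expression string, identical expressions can be
collapsed and distinct ones joined. The function name combine_detections
is hypothetical and this is not scancode's implementation.

```python
def combine_detections(expressions):
    """Collapse duplicate license expressions and AND the remaining ones."""
    unique = []
    for expr in expressions:
        normalized = " ".join(expr.split())  # tolerate stray whitespace
        if normalized not in unique:
            unique.append(normalized)
    if not unique:
        return None
    if len(unique) == 1:
        return unique[0]
    # Parenthesize compound expressions so the joined result stays unambiguous.
    return " AND ".join(f"({e})" if " " in e else e for e in unique)


# An SPDX id and a full notice in the same file, both detected as the same license.
same = combine_detections(["GPL-2.0-only", "GPL-2.0-only"])
mixed = combine_detections(["GPL-2.0-only WITH Linux-syscall-note", "MIT"])
```

When the id and the notice agree, the file ends up with a single
expression, which is why keeping both has no impact on the result.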

It would likely not impact checkpatch.pl either since it cares only
about the SPDX identifiers.
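
For comparison, a tool that only cares about the identifier needs very
little. The sketch below is not checkpatch.pl's actual logic; the regex,
the function name spdx_expression and the max_lines cutoff are
assumptions chosen for illustration.

```python
import re

SPDX_TAG = re.compile(r"SPDX-License-Identifier:\s*(.+)")


def spdx_expression(source, max_lines=5):
    """Return the SPDX expression found near the top of a file, or None."""
    for line in source.splitlines()[:max_lines]:
        match = SPDX_TAG.search(line)
        if match:
            # Drop a trailing C comment closer such as "*/" if present.
            return match.group(1).split("*/")[0].strip()
    return None


c_file = "// SPDX-License-Identifier: GPL-2.0-only\n/* some extra notice */\n"
header = "/* SPDX-License-Identifier: GPL-2.0-or-later */\n"
```

Any notice text that follows the tag is simply never looked at, so
keeping or removing it changes nothing for this kind of check.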

BUT if you start to butcher the original notice (for example, you
remove the GPL notice part and keep a warranty disclaimer), the
detection results will be butchered accordingly. That standalone
disclaimer will eventually be detected either as a bare disclaimer with
no related license or as a partial detection of another notice (since
scancode eventually does a multidiff/red line comparison).
The same would likely apply to other license scanners that do not use
a diff, and the effect could be amplified there: regex-based scanners
such as Fossology may get unlucky and have no regex for the butchered
text, and probabilistic scanners such as Licensee and many others may
see the butchered text fall below their false positive threshold and
ignore it entirely.
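
The threshold effect can be illustrated with a toy similarity check
built on Python's difflib. This is not the algorithm of any scanner
named above; the 0.9 threshold and the function names similarity and
is_detected are assumptions made for the example.

```python
from difflib import SequenceMatcher

GRANT = (
    "This program is free software; you can redistribute it and/or modify "
    "it under the terms of the GNU General Public License as published by "
    "the Free Software Foundation; either version 2 of the License, or "
    "(at your option) any later version."
)
DISCLAIMER = (
    "This program is distributed in the hope that it will be useful, but "
    "WITHOUT ANY WARRANTY; without even the implied warranty of "
    "MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU "
    "General Public License for more details."
)
REFERENCE_NOTICE = GRANT + "\n\n" + DISCLAIMER


def similarity(candidate, reference=REFERENCE_NOTICE):
    """Word-level similarity ratio between a candidate and the reference."""
    a = reference.lower().split()
    b = candidate.lower().split()
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def is_detected(candidate, threshold=0.9):
    """Report a match only when the similarity reaches the threshold."""
    return similarity(candidate) >= threshold


full_ok = is_detected(REFERENCE_NOTICE)   # complete notice
butchered_ok = is_detected(DISCLAIMER)    # grant removed, disclaimer kept
```

The complete notice matches the reference exactly, while the
disclaimer-only remnant covers less than half of the reference words and
falls below the threshold, so a scanner of this kind would drop it.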

Therefore my advice would be either to keep a complete and consistent
notice or to keep none, i.e. avoid cherry-picking parts of a notice.
Cherry-picking will surely result in some license detection, but not
the one you would expect: it will likely be inconclusive and require
more review.

--
Cordially
Philippe Ombredanne
