Subject: Re: some ideas on guidelines
From: J Lovejoy
Date: Sun, 23 Jun 2019 18:57:38 -0600
To: Philippe Ombredanne
Cc: linux-spdx@vger.kernel.org
Message-Id: <2BD3A377-AFD6-4DF5-969D-FCCB71FAAC05@jilayne.com>

Hi Philippe,

See comments on tooling considerations below:

> On Jun 12, 2019, at 4:26 AM, Philippe Ombredanne wrote:
>
> Hi Jilayne:
>
> On Wed, Jun 12, 2019 at 7:25 AM J Lovejoy wrote:
>>
>> GOAL: The over-arching goal here is to provide clear, concise, and
>> machine-readable license information at the file-level for the Linux
>> kernel by placing SPDX License List short identifiers at the top of
>> each file in order to make it easier for downstream users and
>> distributors to use automated processes and comply with the applicable
>> license terms.
>>
>> NOTE: The guidance is either to REPLACE the existing license notice
>> with the SPDX license identifier or ADD the SPDX license identifier.
>> The rationale here is that where the license notice is clear, then
>> replacing should be okay as this is essentially upgrading the current
>> notice to something more modern and machine readable. But everywhere
>> else, a conservative approach of adding the SPDX identifiers (and as
>> such, keeping the existing license notice info) means that others can
>> see both. This also avoids the need to create or retain some file with
>> all the removed notices, which seems to be distasteful and untenable
>> based on the threads related to that topic. The SPDX identifier still
>> needs to be accurate, of course.
>>
>> TOOLING CONSIDERATION: To make it easier on tooling, putting some kind
>> of START/END notation, as Steve has recommended,
>
> Having some convention to enclose a notice in some markers would have
> no impact and would not make it easier for scancode: the notice would
> be detected and reported if it is enclosed in markers or not. This could
> be leveraged later as a way to speed things up of course, but that's
> minor.
>
> If tagging notice text boundaries is the route selected for the
> kernel, then it is worth crafting something that is well thought out
> as the kernel ways **will** surely be adopted by other projects.
>
> FWIW, here are a few examples of using such markers that exist in the
> wild from a quick grep in scancode license notices database:
>
> - Mozilla: BEGIN LICENSE BLOCK/ END LICENSE BLOCK
> - Apple: @APPLE_LICENSE_HEADER_START@
>   @APPLE_LICENSE_HEADER_END@ and some variations
> - Oracle: CDDL HEADER START/END , GPL HEADER START/END, LGPL HEADER
>   START/END used with their highly impractical "DO NOT ALTER OR REMOVE
>   COPYRIGHT NOTICES OR THIS FILE HEADER."
> - LICENSE_START/LICENSE_END (and variations such as %%%LICENSE_START
>   used in some man pages and tools including the kernel)
> - BSDCOPYRIGHTBEGIN, ECOSGPLCOPYRIGHTBEGIN and other variations in eCos.
> - Qt and KDE: QT_BEGIN_LICENSE with variations
> - COPYRIGHTBEGIN/END
> - Begin-Header/End-Header
> - BEGIN LICENSE TEXT/END LICENSE TEXT
>
>> thus allowing tooling
>> to ignore what’s enclosed there and just read the SPDX identifier as
>> the definitive license notice.
>
> There is something inconsistent here:

well, it’s not inconsistent, really, and it is consistent with the SPDX
spec, actually… more below...

> either a custom notice or
> disclaimer is needed and has some legal importance
> or it has none and
> should be removed.

but you are making the wildly optimistic assumption that such
determination is black and white; it is not.
Reasonable attorneys may disagree as to what is of “legal importance”
or not in these cases and ultimately, we don’t decide; a judge does.

This is the challenge, as I implicitly note below, of the Linux kernel
not having its own lawyer or one point of responsibility. We are
collectively making a decision that impacts lots and lots of Linux
users.

If we had some text that we all generally agreed we didn’t think was
substantively adding anything to the standard disclaimer text and thus
just used the GPL-2.0-[only / or-later] tag and didn’t add a new SPDX
identifier to represent the non-substantive text, we’d basically be
following the advice of the SPDX spec for section 4.6 License
Information in File (represented by the stuff found in the file that we
left there, but enclosed in some kind of denotation) and section 4.5
Concluded License (represented by the SPDX license identifier).

Don’t get me wrong - the best case scenario for these kinds of things
is to have the copyright holder clean it up - but I am just trying to
come up with something for when that’s not feasible that is a bit on
the conservative side and accommodates the concerns raised about
full-scale removing stuff.

> If it has some importance and needs to be kept,
> then I cannot "just read the SPDX identifier as the definitive license
> notice" as you wrote, I think I would need to consider both the id and
> extra notice. Or am I missing something?

yes and no. see above :)

>
>> As time goes by, if copyright holders
>> come across these files and want to remove the original notices, then
>> they have the right to do so.
>>
>> GUIDANCE: The following is meant to provide some high-level guidance
>> for how to handle common scenarios and triage the approaches to reach
>> the stated goal.
>> The following is not intended to be legal advice.
>> Rather, it is meant
>> to reflect the intention of the participating individuals to improve
>> the quality and machine-readability of the applicable license
>> information in Linux kernel files. The approach described below has
>> been developed with the Linux kernel in mind and might not be
>> appropriate for other projects or communities.
>>
>> #1 Where a file contains the standard license notice as stated in
>> the GPL-2.0 license text for GPL-2.0-only or GPL-2.0-or-later and no
>> other license information whatsoever --> then REPLACE the standard
>> license notice with the SPDX identifier for the relevant license.
>>
>> #2 Where a file contains a non-substantive variation on the standard
>> GPL-2.0 license notice, but still provides clear distinction as to
>> GPL-2.0-only or GPL-2.0-or-later consistent with the intent of the
>> standard license notice and no other license information whatsoever
>> --> then REPLACE the standard license notice with the SPDX identifier
>> for the relevant license.
>>
>> #3 Where a file contains a license notice that is non-standard as
>> compared to that stated in the GPL-2.0 license text but is nonetheless
>> clear as to GPL-2.0-only or GPL-2.0-or-later and no other license
>> information whatsoever --> then REPLACE the standard license notice
>> with the SPDX identifier for the relevant license.
>>
>> NOTES RELATED TO #1-3:
>> The SPDX identifier is simply a more concise way to express the same
>> intention regarding what license applies to the file as the standard
>> license notice, but does so in a reliable, machine-readable way that
>> meets the needs of modern software supply chain use and efforts to
>> automate detection of license information in order to facilitate more
>> complete license information and license compliance.
>> One consideration
>> is whether replacing existing license notices with more concise,
>> machine-readable expression of the same information could run afoul of
>> a strict reading of GPL-2.0, section 1.
>> Such a strict reading applied to the scenarios described in #1-3 is
>> unconvincing for the following reasons:
>> * Although the license text itself recommends the use of the standard
>> license notice, it is not a hard requirement of the license. The
>> definitive text, as always, is the full text of the license itself.
>> Notably, the license author/steward, the Free Software Foundation
>> (FSF), encourages use of the standard header, but more broadly
>> recommends clear communication of the license variant chosen for the
>> given work as seen in various pages on their site.[1] Furthermore,
>> Richard Stallman endorsed the use of the revised SPDX identifiers for
>> helping provide clarity as to whether a licensor has chosen the
>> license-version-only or any-later-version option.[2]
>> * This project to improve license information in the Linux kernel
>> files has been discussed among kernel developers, on kernel mailing
>> lists, and documented in public files and documentation beginning in
>> mid-2017 to which many kernel copyright holders past and present have
>> access and would be likely to see and which has received positive
>> response and encouragement.
>> [1] See https://www.gnu.org/licenses/gpl-howto.html which provides the
>> standard license notice, but then also goes on to suggest one clear
>> and explicit statement such as, “This program is released under
>> license FOO”. FAQ questions
>> https://www.gnu.org/licenses/gpl-faq.en.html#LicenseCopyOnly and
>> https://www.gnu.org/licenses/gpl-faq.en.html#NoticeInSourceFile also
>> stress the general need for clarity without mandating use of the
>> specific standard license notice.
>> [2] See https://www.gnu.org/licenses/identify-licenses-clearly.html
>>
>> #4 Where the file contains a license notice that clearly states the
>> file is licensed under “GPL” with no indication of version number
>> and no other license information whatsoever --> ADD SPDX identifier
>> for GPL-2.0-or-later
>> Rationale: This is consistent with the text of the license which
>> states, “If the Program does not specify a version number of this
>> License, you may choose any version ever published by the Free
>> Software Foundation.” Because the Linux kernel is well-known to be
>> licensed under GPL-2.0-only and use of GPL-1.0 is generally sparse, it
>> is within the options given in the license text to choose
>> GPL-2.0-or-later in this case. Doing so more easily enables use of
>> such files beyond the Linux kernel.
>
> Just FYI, I am fine with a GPL-2.0-or-later choice for the kernel,
> but scancode will report these cases as GPL-1.0-or-later.

good to know, thanks. I don’t think that is an issue, agree?

>
>> #5 Where the file contains a license notice that: a) refers to the
>> COPYING file or another specific file (or references GPL and the
>> COPYING or another specific file) with no other information as to the
>> specific license whatsoever; and b) the COPYING or other specific file
>> can be located and is clearly a copy of GPL-2.0 --> ADD SPDX
>> identifier for GPL-2.0-only
>> Rationale: This is similar to #4, but the combination of a clear
>> reference to a specific license file and the fact that the Linux
>> kernel is clearly intended to be GPL-2.0-only leads to the intent that
>> this is also GPL-2.0-only. The COPYING file currently in the kernel is
>> at
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/COPYING,
>> and refers to GPL-2.0-only.
>> The (earlier) version of the
>> COPYING file also had Linus expressing GPL-2.0-only: see
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/COPYING?id=1da177e4c3f41524e886b7f1b8a0c1fc7321cac2
>>
>> #6 Where a file contains a license notice that is non-standard as
>> compared to that stated in the GPL-2.0 license text but is nonetheless
>> clear as to GPL-2.0-only or GPL-2.0-or-later and there is other
>> license information, and that license information contains the
>> following:
>>   #6a An existing known additional license or exception for which
>>   there is an SPDX identifier
>>     --> ADD appropriate SPDX license expression (use of AND, OR,
>>     WITH), where person making change does not represent copyright
>>     holders for file
>>     --> REPLACE with appropriate SPDX license expression, where
>>     person(s) making or signing-off on changes represent copyright
>>     holders
>>   #6b An additional license or exception for which there is no SPDX
>>   identifier as per the existing SPDX License List Matching
>>   Guidelines:
>>     -- If clearly a different license and use is more than one or two
>>     files, then submit for addition to SPDX License List at
>>     http://13.57.134.254/app/submit_new_license/
>>     -- If close to an existing license/exception on the SPDX License
>>     List such that the SPDX license’s matching markup might be
>>     extended to accommodate as a match, submit to SPDX legal team for
>>     review of such.
>>     -- If some mess of a license that is unclear, an abomination,
>>     contains non-free elements, or otherwise poses some kind of
>>     challenge, then attempt to contact copyright holders to change
>>     license with recommendation
>>   #6c An additional or different disclaimer or warranty text:
>>     -- Where the copyright holders of the file in question can be
>>     contacted, then ask them to remove this and use the appropriate
>>     SPDX identifier for GPL
>>     -- Where copyright holders of the file in question cannot be
>>     easily contacted or found, then analyze differences between
>>     additional disclaimer text and standard disclaimer included in
>>     GPL, then:
>>       --> If additional disclaimer text adds no additional
>>       substantive aspects to the standard GPL disclaimer, REPLACE
>>       with appropriate SPDX license identifier for GPL-2.0
>>       --> If additional disclaimer text adds additional substantive
>>       aspects to the standard GPL disclaimer, ADD the appropriate
>>       SPDX license identifier for GPL-2.0
>> ========
>> Please note: while I am a lawyer, I do not represent any kernel
>> developers nor any of the people involved in this work. I understand
>> that no lawyer could represent the interest of the Linux kernel and
>> its many copyright holders in total. We can, however, discuss this in
>> a public forum and come up with some consensus as to reasonable
>> guidelines and rationale for such.
>> I have tried to collect the various thoughts and opinions expressed on
>> the mailing list on these topics.
>> I’m particularly interested in the following feedback:
>> A) This takes a somewhat conservative approach regarding retaining
>> some of the license notices and adding SPDX identifiers, rather than
>> replacing. I’d like to know from those involved in using scanning
>> tools (Thomas, Philippe) if this would be tenable.
>
> Speaking for the scanning tool in use here (i.e.
> the scancode-toolkit)
> having SPDX ids alone or with some extra notice has no impact. The
> SPDX id and the license notice will be detected and each detected
> text reported with its own corresponding license expression (which
> would happen to be the same and that can later be combined and
> simplified in a single expression.)
>
> It would likely not impact checkpatch.pl either since it cares only
> about the SPDX identifiers.
>
> BUT If you start to butcher the original notice (such as you remove
> the GPL notice part and keep a warranty disclaimer) the detection
> results will be butchered accordingly and that standalone disclaimer
> will be eventually detected either as a bare disclaimer with no
> related license or as a partial detection of another notice (since
> scancode eventually does a multidiff/red line comparison).

good to know. I don’t think I intended to suggest we’d butcher up the
existing notice - I think we either leave it all in, and ADD SPDX
identifier or REPLACE it all. That was what I was trying to delineate
here overall.

I think these disclaimer ones are particularly tricky. Might be worth
trying to settle some of the other threshold issues raised by John and
Richard (response to those next and in order!) and then come back to
this.

Thanks,
Jilayne

> The same would likely apply to other license scanners that do not use
> a diff, though this could be amplified as regex-based scanners such as
> Fossology may get unlucky and miss having a regex for the butchered
> text and probabilistic scanners such as Licensee and many others may
> see the butchered text going below their false positive threshold and
> ignore it entirely.
>
> Therefore my advice would be either to keep a complete and consistent
> notice or to keep none e.g.
> avoid cherry picking parts of a notice as
> this will surely result in some license detection but not the one you
> would expect: it will likely be inconclusive and require more review.
>
> --
> Cordially
> Philippe Ombredanne
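
For illustration only, here is a minimal Python sketch of the START/END
idea from the TOOLING CONSIDERATION above: tooling reads the SPDX
identifier as the concluded license and treats whatever sits between the
markers as retained notice text, kept whole. The marker strings
"LICENSE-NOTICE-START" / "LICENSE-NOTICE-END" and the function name
split_header are made up for this example. No marker convention has been
agreed on this list, and this is not how scancode or checkpatch.pl work
internally.

```python
import re

# Hypothetical markers, invented for this sketch; the thread has not
# settled on any START/END convention.
START_MARK = "LICENSE-NOTICE-START"
END_MARK = "LICENSE-NOTICE-END"

# Matches the SPDX tag line in either "// ..." or "/* ... */" comment style.
SPDX_RE = re.compile(r"SPDX-License-Identifier:\s*(.+?)\s*(?:\*/)?\s*$")


def split_header(text):
    """Return (spdx_expression, retained_notice) for one file's text.

    spdx_expression is the first SPDX tag found, or None.
    retained_notice is the text between the markers with comment
    decoration stripped, or "" if no marked notice is present.
    """
    spdx = None
    notice_lines = []
    inside = False
    for line in text.splitlines():
        if spdx is None:
            match = SPDX_RE.search(line)
            if match:
                spdx = match.group(1)
                continue
        if START_MARK in line:
            inside = True
            continue
        if END_MARK in line:
            inside = False
            continue
        if inside:
            # Drop leading/trailing spaces and C comment characters.
            notice_lines.append(line.strip(" */"))
    return spdx, "\n".join(notice_lines).strip()


sample = """\
// SPDX-License-Identifier: GPL-2.0-or-later
/*
 * LICENSE-NOTICE-START
 * This file is released under the GPL.
 * LICENSE-NOTICE-END
 */
int main(void) { return 0; }
"""

spdx, notice = split_header(sample)
print(spdx)    # GPL-2.0-or-later
print(notice)  # This file is released under the GPL.
```

The sketch keeps the enclosed notice as one untouched unit or reports
none at all, which mirrors the "complete notice or none" advice above; a
real scanner would still need to analyze the retained text itself.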