From: Stefan Herbrechtsmeier <stefan.herbrechtsmeier-oss@weidmueller.com>
To: Martin Koppehel <martin@mko.dev>,
	richard.purdie@linuxfoundation.org,
	Jasper Orschulko <Jasper.Orschulko@iris-sensing.com>,
	"bitbake-devel@lists.openembedded.org"
	<bitbake-devel@lists.openembedded.org>
Cc: Daniel Baumgart <Daniel.Baumgart@iris-sensing.com>
Subject: Re: [bitbake-devel] Improving npm(sw) fetcher & integration within Bitbake
Date: Fri, 5 Nov 2021 14:16:22 +0100	[thread overview]
Message-ID: <4d2e6c3c-b1f8-01e7-76b3-557201755a87@weidmueller.com> (raw)
In-Reply-To: <4106f9ef-5b2e-5276-f1bb-c80a989d7fdf@mko.dev>

Hi,

Am 05.11.2021 um 12:10 schrieb Martin Koppehel:
> On 11/5/21 10:07, Stefan Herbrechtsmeier wrote:
>> Am 05.11.2021 um 00:15 schrieb Richard Purdie via lists.openembedded.org:
>>> On Thu, 2021-11-04 at 12:29 +0000, Jasper Orschulko wrote:
>>>> Dear Bitbake developers,

[snip]

>>>> So how can we address these issues?
>>>>
>>>> We plan to implement a "sub-fetcher" for npmsw (a concept which might
>>>> also be recyclable for similar use-cases). This would take the
>>>> form of e.g.:
>>>>
>>>> SRC_URI = "npmsw+git://git-uri.git;npm-topdir=path_to_npm_project;..."
>>>>
>>>> The idea is that the npmsw fetcher would then call an arbitrary sub-
>>>> fetcher (in this case git, although any fetcher will be supported) and,
>>>> after the sub-fetcher has extracted the source code into the DL_DIR,
>>>> the npm fetcher will create a secondary download folder as a copy of
>>>> the sub-fetcher's download folder. Within this copy, the npm fetcher
>>>> will call `npm ci`, effectively downloading the npm packages by doing a
>>>> clean install on the basis of the package.json and package-lock.json
>>>> files within the npmsw download dir. This results in a much faster
>>>> build, as it removes the need for separate handling of the individual
>>>> node packages, as well as streamlining the developer's workflow with
>>>> the build process within Bitbake.
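As a sketch of how such a combined URL might be decomposed, the helper below splits the proposed scheme into the sub-fetcher URL and the npmsw-specific parameters. The function and parameter names are illustrative only, not the actual bitbake fetcher API:

```python
def split_npmsw_uri(src_uri):
    """Split a proposed 'npmsw+<scheme>://...' SRC_URI into the URL the
    sub-fetcher would receive and the npmsw-specific parameters.
    Illustrative sketch, not real bitbake fetcher code."""
    # Separate the URL from the ';key=value' bitbake-style parameters.
    url, _, param_str = src_uri.partition(";")
    params = dict(p.split("=", 1) for p in param_str.split(";") if p)
    # 'npmsw+git' means: delegate the actual download to the git fetcher.
    scheme, _, rest = url.partition("://")
    wrapper, _, sub_scheme = scheme.partition("+")
    if wrapper != "npmsw":
        raise ValueError("not an npmsw+ URL: " + src_uri)
    return sub_scheme + "://" + rest, params
```

With the example above, the git fetcher would receive `git://git-uri.git` while the npmsw wrapper keeps `npm-topdir` to locate the package.json within the checkout.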
>>
>> How should this support the download proxy? The npm ci command needs a 
>> registry or a cache to work.
> The npm ci command can utilize a private registry and/or http proxy if 
> that's required. We didn't consider that case yet, but I think we could 
> add a call to npm to configure a proxy according to e.g. a set of 
> environment variables.

By proxy I mean the Yocto HTTP download proxy, not a private npm registry:
https://downloads.yoctoproject.org/mirror/sources/
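To make the concern concrete: bitbake satisfies offline and mirrored builds through its own mirror machinery, which an `npm ci` run inside the fetcher would bypass unless the fetcher translates it into npm's proxy/registry configuration. A minimal local.conf sketch using standard bitbake variables:

```
# local.conf sketch: bitbake's own fetchers honour these mirror settings,
# but a plain 'npm ci' inside a fetcher would not.
PREMIRRORS:prepend = "\
    git://.*/.*    https://downloads.yoctoproject.org/mirror/sources/ \
    https?://.*/.* https://downloads.yoctoproject.org/mirror/sources/ \
"
BB_NO_NETWORK = "1"  # require all sources to come from DL_DIR or mirrors
```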

>> Furthermore you need a patch step in between the fetch steps to 
>> support tuning / fixing of the configuration before the second fetch 
>> step.
> Our idea was to build a completely checked out and installed repository 
> and archive this in the DL_DIR, which then can be used in the do_patch 
> phase.

This makes the download recipe-specific, so npm packages can't be shared 
between recipes.

> Are we missing some important use-case here? Whenever it is necessary to 
> patch the package.json/package-lock.json this should ideally be done in 
> your upstream repository.

Yes, but what if your upstream repository no longer exists, or upstream 
doesn't accept your change?


> Our primary motivation behind leaving the package-lock within the source 
> repository was to have a single source of truth for the dependency 
> versions.

What happens if there is a CVE in a common dependency? You have to wait 
for every project to integrate the update, and you have to check the 
external sources to know whether the package was updated.

The underlying problem is the differing requirements of the developer 
and the distribution points of view.

>>>> As this fetcher would be implemented separately from the current npmsw
>>>> fetcher, this will not cause any breaking changes for existing setups.
>>>>
>>>> Additionally, we plan on writing a separate npmsw.bbclass, which will
>>>> parse the package.json for each node module for an automated Bitbake
>>>> license manifest generation, which will resolve the current challenge
>>>> of having to maintain these manually, as currently described at
>>>> https://www.yoctoproject.org/docs/latest/mega-manual/mega-manual.html#npm-using-the-registry-modules-method
>>
>> These licenses will be generated by the recipetool, and you could 
>> provide checksums to detect the correct licenses.
>>
>> The license inside the package.json is only a hint; you need a 
>> license file to fulfil license compliance. Because of this I 
>> removed package.json from LIC_FILES_CHKSUM, because it is useless 
>> for license compliance.
> You're right here that there's a need to have the full license file. In 
> this case, a license crawler would need to traverse node_modules, 
> scan for LICENSE[.md|.txt] files and then generate the checksums.
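Such a crawler could look roughly like this. The LIC_FILES_CHKSUM entry format is the standard bitbake one; the function itself is a hypothetical sketch:

```python
import hashlib
import os

def collect_license_checksums(node_modules):
    """Walk node_modules and emit LIC_FILES_CHKSUM-style entries for every
    LICENSE-like file found. Sketch only; a real crawler would also fall
    back to the 'license' field in package.json when no file exists."""
    names = {"LICENSE", "LICENSE.md", "LICENSE.txt", "LICENCE", "COPYING"}
    entries = []
    for root, _dirs, files in os.walk(node_modules):
        for name in files:
            if name in names:
                path = os.path.join(root, name)
                with open(path, "rb") as f:
                    md5 = hashlib.md5(f.read()).hexdigest()
                rel = os.path.relpath(path, node_modules)
                entries.append("file://node_modules/%s;md5=%s" % (rel, md5))
    return sorted(entries)
```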
>>
>>>> If this is something you see as a worthwhile goal, we will provide a
>>>> set of patch files within the coming weeks.
>>
>> I think you mixed up the unusable npm implementation with your special 
>> use case.
>>
>> The problem is that the current npm implementation isn't really 
>> usable. I'm working on this and already have a prototype that can 
>> install, build and *test* a proprietary Angular project and node-red, 
>> as well as koa/examples from GitHub.
> You make a very interesting point here, primarily because you cover two 
> very different use cases. I think we have to distinguish between 
> something like a web interface that only uses Node.js and npm at 
> compile time for dependency management and bundling, where Node.js 
> itself is not even required on the target (this is our use case). The 
> second class of use cases is running software like node-red directly on 
> the target, where the current approach of the npm fetcher works quite 
> well. Our thoughts primarily focused on the web interface use case, but 
> I agree with you that we should keep an eye on supporting all use cases.
>>
>> If I understand you correctly, you would like to build an npm recipe 
>> that can change its dependencies without updating the recipe, except 
>> for the SRCREV of the repositories.
> 
> That is true, and I believe keeping the package-lock file directly in 
> the source repository is something worth pursuing not only for us.
> Do you have a strong preference for keeping the dependencies outside of 
> the source repository?

The problem is the different focus of a project versus a distribution. 
If you use the dependencies directly, you rely on the policy of the 
project and its dependencies. It must be possible to override the 
decision of an individual project or dependency if it doesn't match your 
requirements.

My question is whether we really need a fetcher for the content of a 
package-lock, or whether we should create a recipe from a package-lock 
instead.
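The recipe-generation alternative would boil down to something like the sketch below, which turns pinned lockfile entries into per-package SRC_URI lines. The SRC_URI parameter names are illustrative, not the real npm fetcher's, and the lockfile layout shown is the v1 "dependencies" form:

```python
def lockfile_to_src_uri(lock):
    """Turn the pinned dependencies of a package-lock.json into one
    SRC_URI entry per package, the way a recipe generator (rather than
    an on-the-fly fetcher) would. Parameter names are illustrative."""
    lines = []
    for name, info in sorted(lock.get("dependencies", {}).items()):
        # Each pinned dependency records a 'resolved' tarball URL and an
        # 'integrity' hash, which is all a fetcher needs.
        lines.append('SRC_URI += "%s;package=%s;integrity=%s"'
                     % (info["resolved"], name, info["integrity"]))
    return lines
```

Generating these lines once, at recipe-creation time, would restore the usual OE abilities to override or patch an individual dependency, at the cost of regenerating the recipe after every dependency change.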

>>> At a first read it sounds reasonable but I don't know the answers to
>>> a few questions which make or break things from an OE/bitbake
>>> perspective. Those questions are:
>>>
>>> a) Once DL_DIR has been populated by this fetch mechanism, can a
>>> subsequent build run with just the data from there without accessing
>>> the network?
> This does hold for well-built packages that only use code out of 
> node_modules. We cannot guarantee it in general, because a package 
> could execute arbitrary JS code during its build, including fetching 
> content from the internet.
>>>
>>> b) Is the information encoded into SRC_URI enough to give a
>>> deterministic build result, i.e. if we run this build at some later
>>> date, will we get the same result?
> The package-lock.json should be checked into the source repository, so 
> pinning down SRCREV guarantees a 100% reproducible dependency installation.
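For reference, the integrity hashes in a pinned package-lock.json are Subresource Integrity values of the form `<algo>-<base64 digest>`, which a fetcher can verify offline. A minimal sketch:

```python
import base64
import hashlib

def check_integrity(data, integrity):
    """Verify fetched tarball bytes against an npm 'integrity' field
    ('<algo>-<base64 digest>', the Subresource Integrity format)."""
    algo, _, expected = integrity.partition("-")
    actual = base64.b64encode(hashlib.new(algo, data).digest()).decode()
    return actual == expected
```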
>>>
>>> c) Is fetching only happening during the do_fetch task and not in any
>>> subsequent step?
> Yes, we want to perform a full fetch directly in do_fetch and then 
> archive the result of this operation within the DL_DIR, so subsequent 
> builds can be done directly from the DL_DIR.
>>>
>>> I'd love for some of the other people who've worked on this code to
>>> jump in as I don't use it or understand it in detail. I am worried
>>> about how we maintain this longer term as different people seem to
>>> have different use cases which sees the code changing in different
>>> directions and we're starting to look like we may end up with
>>> multiple ways of doing things which I really dislike.
>>
>> This leads to the question of what the desired way is to integrate a 
>> package / dependency manager. Nowadays every language (even C/C++) has 
>> a package manager available, and more and more build systems (e.g. 
>> Meson, CMake) support automatic download of dependencies. The common 
>> integration into OE is a script (recipetool) that generates a recipe 
>> from the foreign configuration. The current npm implementation is 
>> special because it reuses a foreign configuration and translates it 
>> into fetch commands on the fly. This leads to the problem that common 
>> tweaks, like overriding a dependency or sharing configuration between 
>> recipes via an include file, aren't possible. We could fix this by 
>> removing the foreign configuration and doing the translation during 
>> recipe creation. But this means you have to recreate the recipe after 
>> every dependency change.
>>
>> Is it a valid use case for OE to support foreign dependency 
>> configurations like npm-shrinkwrap.json, go.sum or conan.lock?
> Agreed. Especially for cases like JavaScript/Go/Rust, where 
> dependency management is a core part of the language and ecosystem, we 
> should support these.

What is the advantage of a package-manager-specific fetcher over a 
package-manager-specific recipe generator? Do these advantages outweigh 
the loss of common OE features?

Regards
   Stefan


Thread overview: 16+ messages
2021-11-04 12:29 Improving npm(sw) fetcher & integration within Bitbake Jasper Orschulko
2021-11-04 13:09 ` [bitbake-devel] " Alexander Kanavin
2021-11-06 16:58   ` Mike Crowe
2021-11-08  8:01     ` Stefan Herbrechtsmeier
2021-11-08 12:44       ` Jasper Orschulko
2021-11-11  7:51         ` Stefan Herbrechtsmeier
2021-11-04 23:15 ` Richard Purdie
2021-11-05  9:07   ` Stefan Herbrechtsmeier
2021-11-05 11:24     ` Jean-Marie Lemetayer
2021-11-05 16:02       ` Jasper Orschulko
2021-11-05 17:42         ` Alexander Kanavin
     [not found]         ` <5fb67154d576b74629e4836a86dcb5e479b73e67.camel@linuxfoundation.org>
2021-11-06 10:30           ` Konrad Weihmann
2021-11-08  7:41           ` Stefan Herbrechtsmeier
2021-11-08  7:59             ` Alexander Kanavin
     [not found]     ` <4106f9ef-5b2e-5276-f1bb-c80a989d7fdf@mko.dev>
2021-11-05 11:12       ` Martin Koppehel
2021-11-05 13:16       ` Stefan Herbrechtsmeier [this message]
