Date: Wed, 7 Jun 2017 13:09:37 -0400 (EDT)
From: Nicolas Pitre
To: Ingo Molnar
Cc: Ingo Molnar, Peter Zijlstra, linux-kernel@vger.kernel.org, Linus Torvalds, Thomas Gleixner
Subject: Re: [PATCH v2 0/8] scheduler tinification

On Wed, 7 Jun 2017, Ingo Molnar wrote:

> 
> * Nicolas Pitre wrote:
> 
> > Many embedded systems don't need the full scheduler support. Most of the
> > time, user space is tightly controlled and many of the scheduler facilities
> > are simply unused.
> 
> Sorry, NAK:
> 
> > 23 files changed, 3190 insertions(+), 2897 deletions(-)
> 
> That's a lot of extra code plus churn for a code base that is already pretty
> #ifdef heavy.
> 
> Also, the savings are marginal, even with significant functionality disabled:
> 
> >    text    data     bss     dec     hex filename
> >   28623    3404     128   32155    7d9b kernel/sched/built-in.o
> >
> > With this series and dl and rt classes disabled:
> >
> >    text    data     bss     dec     hex filename
> >   20734    3334      40   24108    5e2c kernel/sched/built-in.o
> 
> With 1GHz + 1GB RAM SoCs being well below $10 in bulk we worry about code
> complexity, predictability, testability, behavioral and ABI uniformity a lot more
> than about the last 10-20k of kernel text footprint...
> 
> So I think the 'tiny' efforts are fundamentally misguided and are shooting for an
> ever shrinking market of RAM/ROM starved products whose share is shrinking every
> month.

I'm rather seeing the opposite: an ever-growing market of internet-connected, coin-cell-battery-powered tiny devices where the amount of RAM is counted in kilobytes rather than megabytes.

Let me repeat some background about my fundamental motivation, and then maybe you'll understand why I'm doing this.

What is the biggest buzzword in the IT industry besides AI right now? It is IoT. Most IoT targets are so small that people are writing new operating systems from scratch for them. Lots of fragmentation already exists. We're talking about systems with less than one megabyte of RAM, sometimes much less. Still, those things are being connected to the internet. And this is going to be a total security nightmare.

I wish to be able to leverage the Linux ecosystem for as much of the IoT space as possible to avoid the worst of those nightmares. The Linux ecosystem has a *lot* of knowledgeable people around it, a lot of testing infrastructure and tooling already available, etc. If a security issue turns up on Linux, it has a greater chance of being caught early, or otherwise fixed quickly. Finding people with the right knowledge is also easier for Linux than it would be for any RTOS out there.

Still with me so far?

Yes, we have tools that can automatically reduce the kernel size. We can use LTO with the compiler, etc. LTO is pretty good already: it can typically reduce the kernel size by 20%. If all system calls except a few are disabled, LTO can get rid of another 20%. The minimal kernel I get is still 400-500 KB in size. That's still too big.

There is 120 KB of VFS code that is always there, even though no real filesystem is configured in the kernel at all.
There is another 100 KB of core driver support code, even though the drivers I'm using are very simple and make no use of most of it. Etc. There comes a point where there is no option but to explicitly trim out parts of the kernel, as such decisions cannot be automated; hence this patch series. Bringing the scheduler under 20 KB is therefore very useful in that context.

Alternatively, I could push for a parallel implementation, as I did with the TTY layer, where I obtained a 6x size reduction. But in the scheduler case I obtained only a 2x size reduction. So I thought it would be more profitable to get about the same savings by reworking the existing code instead. Eventually I would contribute a very bare scheduler class as a smaller alternative to the fair scheduler for deployments where that makes sense. Unless you have actually changed your mind about alternative whole-scheduler implementations, that is...

For Linux to be suitable for small IoT, it has to be small, damn small. My target is 256 KB of RAM. And if you look at the kind of applications those 256 KB systems run, it's basically one main task. It typically acquires sensor data, sends it with some encrypted protocol over a wireless network to the internet, and possibly accepts commands back. So what do you need from the OS to achieve that? A few system calls, a minimal scheduler, minimal memory management, a minimal filesystem structure and a minimal network stack. And your user app.

So why not build each of those blocks on the existing Linux syscall interface and internal API? You could then take your standard full-featured Linux workstation, develop your user app on it, and run it there using all the existing native debugging tools, etc. In the end you just pick the mini version of everything for the final target and you're done. And you don't have to learn a whole new OS, development environment, programming model, etc.
Next on my list would be a cache-less, completely serialized VFS bypass. It would have only what's needed to link the read/write syscalls, a filesystem driver and a block driver, while preserving the existing kernel APIs. Because it would be really small, the maintenance cost of such a "parallel" implementation wouldn't be very high. It would certainly be much less than the cost of maintaining a single code path that can scale to both extremes.

PS: As far as I remember, Linus didn't condemn the idea the last time I brought up this topic in his presence. I therefore hope we can find ways to bring Linux into the largest deployment of computing devices to come.

Nicolas