Subject: Re: RFC v2: post-init-read-only protection for data allocated dynamically
To: Michal Hocko, Dave Hansen
References: <9200d87d-33b6-2c70-0095-e974a30639fd@huawei.com>
 <0b55343e-4305-a9f1-2b17-51c3c734aea6@huawei.com>
 <20170510080542.GF31466@dhcp22.suse.cz>
CC: Laura Abbott, kernel-hardening@lists.openwall.com
From: Igor Stoppa
Message-ID: <885311a2-5b9f-4402-0a71-5a3be7870aa0@huawei.com>
Date: Wed, 10 May 2017 11:57:42 +0300
In-Reply-To: <20170510080542.GF31466@dhcp22.suse.cz>

On 10/05/17 11:05, Michal Hocko wrote:
> On Fri 05-05-17 13:42:27, Igor Stoppa wrote:

[...]

>> ... in the case I have in mind, I have various, heterogeneous chunks of
>> data, coming from various subsystems, not necessarily page aligned.
>> And, even if they were page aligned, most likely they would be far
>> smaller than a page, even a 4k page.
>
> This aspect of various sizes makes the SLAB allocator not optimal
> because it operates on caches (pools of pages) which manage objects of
> the same size.

Indeed, that's why I wasn't too excited about caches, but probably I was
not able to explain sufficiently well why in the RFC :-/

> You could use the maximum size of all objects and waste
> some memory but you would have to know this max in advance which would
> make this approach less practical. You could create more caches of
> course but that still requires to know those sizes in advance.

Yes, and even generic per-architecture or per-board profiling wouldn't
necessarily do much good: taking SELinux as an example, one could have
two identical boards with almost identical binaries installed, differing
only in the rules/policy. That difference alone could easily lead to
very different size requirements for the sealable page pool.

> So it smells like a dedicated allocator which operates on a pool of
> pages might be a better option in the end.

ok

> This depends on what you
> expect from the allocator. NUMA awareness? Very effective hotpath? Very
> good fragmentation avoidance? CPU cache awareness? Special alignment
> requirements? Reasonable free()? Etc...

From the perspective of selling the feature to as many subsystems as
possible, I'd say that the primary requirement is that it shouldn't
affect runtime performance. I suppose (but it's just my guess) that
trading a milliseconds-scale boot-time slowdown for additional integrity
is acceptable in the vast majority of cases.

The only alignment requirements I can think of come from the MMU being
able to write-protect memory only at the granularity of whole physical
pages.
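To make the page-granularity point concrete, this is an illustrative
sketch only (the seal_pool name and helpers are made up, and I'm
assuming set_memory_ro()/set_memory_rw() are usable on the
architectures we care about):

#include <linux/errno.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>
#include <asm/set_memory.h>	/* or <asm/cacheflush.h>, depending on arch/tree */

/* Hypothetical pool descriptor, only to illustrate page-granular sealing. */
struct seal_pool {
	void *base;	/* vmalloc() memory is page aligned */
	int npages;
};

static int seal_pool_init(struct seal_pool *pool, int npages)
{
	pool->base = vmalloc((size_t)npages << PAGE_SHIFT);
	if (!pool->base)
		return -ENOMEM;
	pool->npages = npages;
	return 0;
}

/* Flip the whole pool to read-only in its kernel mapping. */
static int seal_pool_protect(struct seal_pool *pool)
{
	return set_memory_ro((unsigned long)pool->base, pool->npages);
}

/* And back to read-write, for whatever legitimately needs to change. */
static int seal_pool_unprotect(struct seal_pool *pool)
{
	return set_memory_rw((unsigned long)pool->base, pool->npages);
}

This only flips the protection of the pool's own mapping; it says
nothing yet about how the individual objects get packed into it.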
> To me it seems that this being an initialization mostly thingy a simple
> allocator which manages a pool of pages (one set of sealed and one for
> allocations)

Shouldn't the set of pages used for keeping track of the others also be
sealed? Once one is read-only, the other should not change either.

> and which only appends new objects as they fit to unsealed
> pages would be sufficient for starter.

Any "free" that might happen during the initialization transient would
actually result in an untracked gap, right?

What about the size of the pool of pages? No predefined size; instead,
request a new page whenever the memory remaining in the page currently
in use is not enough to fit the latest allocation request? (A rough
sketch of what I mean is at the end of this mail.)

There are also two aspects we discussed earlier:

- livepatch: how to deal with it? Identify the page it wants to modify
  and temporarily un-protect it?

- modules: unloading and reloading modules will eventually lead to an
  ever-increasing number of permanently lost pages. Repeatedly loading
  and unloading the same module is probably not so common, with a major
  exception being USB, where almost anything can show up. And disappear.
  This seems like a major showstopper for the linear allocator you
  propose.

My reasoning in pursuing the kmalloc approach was that it is already
equipped with mechanisms for dealing with these sorts of cases, where
memory can become fragmented. I also wouldn't risk introducing bugs
with my homebrew allocator ...

The initial thought was that there could be a master toggle to
seal/unseal all the memory affected. But you were not too excited,
iirc :-D

Alternatively, kmalloc could be enhanced to unseal only the pages it
wants to modify. I don't think much can be done for data that is placed
in the same page as something that needs to be altered, but whatever is
outside of that page could still enjoy the protection of the seal.

--
igor
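P.S. since I referred above to requesting a new page on demand, this is
roughly what I have in mind, as an illustrative sketch only (the
lin_pool name is made up, and alignment, locking and the bookkeeping
needed for the later sealing pass are deliberately left out):

#include <linux/kernel.h>
#include <linux/mm.h>
#include <linux/vmalloc.h>

/* Hypothetical descriptor for the append-only pool (no free()). */
struct lin_pool {
	void *cur;	/* page currently being filled, NULL before 1st alloc */
	size_t offset;	/* first unused byte within cur */
};

/*
 * Append-only allocation with no predefined pool size: when the current
 * page cannot fit the request, grab a fresh page and leave the tail of
 * the old one as an untracked gap.
 */
static void *lin_alloc(struct lin_pool *pool, size_t size)
{
	void *obj;

	if (WARN_ON(size > PAGE_SIZE))
		return NULL;	/* multi-page objects not handled in the sketch */

	if (!pool->cur || pool->offset + size > PAGE_SIZE) {
		pool->cur = vmalloc(PAGE_SIZE);
		if (!pool->cur)
			return NULL;
		pool->offset = 0;
	}
	obj = pool->cur + pool->offset;
	pool->offset += size;
	return obj;
}

Each page handed out this way would then be write-protected once the
owning subsystem declares itself done with initialization.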