Date: Wed, 13 Jul 2016 17:32:12 +0200
From: Matthias Dahl
To: Michal Hocko
Cc: linux-raid@vger.kernel.org, linux-mm@kvack.org, dm-devel@redhat.com, linux-kernel@vger.kernel.org, Mike Snitzer
Subject: Re: Page Allocation Failures/OOM with dm-crypt on software RAID10 (Intel Rapid Storage)
In-Reply-To: <20160713134717.GL28723@dhcp22.suse.cz>

Hello...

On 2016-07-13 15:47, Michal Hocko wrote:

> This is getting out of my area of expertise so I am not sure I can help
> you much more, I am afraid.

That's okay. Thank you so much for investing the time.

For what it is worth, I did some further tests and here is what I came up with:

If I create the plain dm-crypt device with --perf-submit_from_crypt_cpus, I can run the tests for as long as I want but the memory problem never occurs, meaning buffer/cache increase accordingly and thus free memory decreases but used mem stays pretty constant low.
Yet the problem here is that the system becomes sluggish and throughput is severely impacted. ksoftirqd is hovering at 100% the whole time.

My guess is that normally dm-crypt simply takes every request, encrypts it and queues it internally by itself, and that this queue is then slowly emptied to the underlying device's kernel queue. That would explain why I am seeing the explosive increase in used memory (rather than in buffer/cache), which in the end causes an OOM situation. But that is just my guess. And IMHO that is not the right thing to do (tm), as can be seen in this case.

No matter what, I have no clue how to further diagnose this issue. And given that I already had unsolvable issues with dm-crypt a couple of months ago on my old machine, where the system simply hung or went OOM when the swap was encrypted and just a few kilobytes needed to be swapped out, I am no longer sure I can trust dm-crypt with full disk encryption to the point where I feel "safe"... as in, nothing bad will happen and the system won't suddenly hang because of it. Or, if a bug is introduced, that it will actually be possible to diagnose it and help fix it, or that it will eventually be fixed at all.

Which is really a pity, since I would really have liked to help solve this. With the swap issue, I did git bisects and tests, and narrowed it down to the kernel versions in which said bug was introduced... but in the end, the bug is still present as far as I know. :(

I will probably look into ext4 fs encryption again. My whole point is just that in case any of the disks goes faulty and needs to be replaced or sent in for warranty, I don't have to worry about mails, personal or business data still being left on the device in a readable form (e.g. if it is no longer accessible or has reallocated sectors or whatever).

Oh well. Pity, really.

Thanks again,
Matthias

-- 
Dipl.-Inf. (FH) Matthias Dahl | Software Engineer | binary-island.eu
 services: custom software [desktop, mobile, web], server administration