Date: Thu, 12 May 2016 22:20:09 +0200
From: Marc Haber
To: "Dr. David Alan Gilbert"
Cc: Borislav Petkov, Paolo Bonzini, linux-kernel@vger.kernel.org, kvm ML
Subject: transparent huge pages breaks KVM on AMD.
Message-ID: <20160512202009.GZ9143@torres.zugschlus.de>
In-Reply-To: <20160423185246.GC8376@gallifrey>
References: <570EEF6D.40307@redhat.com>
 <20160414052220.GE7600@torres.zugschlus.de>
 <20160421083948.GF21755@torres.zugschlus.de> <20160421123711.GD28821@pd.tnic>
 <20160421145005.GI21755@torres.zugschlus.de> <20160421165106.GK28821@pd.tnic>
 <20160421200433.GL21755@torres.zugschlus.de> <20160423160429.GL8531@pd.tnic>
 <20160423184341.GA21755@torres.zugschlus.de> <20160423185246.GC8376@gallifrey>

Hi David,

On Sat, Apr 23, 2016 at 07:52:46PM +0100, Dr. David Alan Gilbert wrote:
> Hmm, your problem does sound like bad hardware, but....
> If you've got a nice reliable crash, can you try turning transparent huge pages
> off on the host;
> echo never > /sys/kernel/mm/transparent_hugepage/enabled

I must have missed this hint in the middle of the "your hardware is bad"
avalanche that came over me.

I spent two weeks bisecting kernels that all tested "good", because during the
repeated reconfigurations, transparent huge pages had been turned off in the
kernel configuration. After running each kernel for 24 hours, I eventually
ended up with a working 4.5 kernel.
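The sysfs knob from David's hint can be checked and flipped from a shell. A
minimal sketch, assuming a Linux host that exposes
/sys/kernel/mm/transparent_hugepage/enabled and root privileges for writing
to it; `thp_mode`, `thp_set` and `THP_FILE` are hypothetical helper names,
not part of any kernel tooling:

```shell
#!/bin/sh
# Sketch: report and change the host's transparent huge page (THP) mode.
# Assumption: Linux host with the sysfs file below; writing needs root.
# thp_mode, thp_set and THP_FILE are hypothetical names for illustration.

THP_FILE=${THP_FILE:-/sys/kernel/mm/transparent_hugepage/enabled}

# The file lists every mode with the active one in brackets,
# e.g. "always [madvise] never"; print only the bracketed word.
thp_mode() {
    sed -n 's/.*\[\([a-z]*\)].*/\1/p' "$THP_FILE"
}

# Write a new mode (always, madvise or never) to the runtime setting;
# no reboot of the host is required.
thp_set() {
    echo "$1" > "$THP_FILE"
}
```

On a real host, running `thp_set never` and then `thp_mode` should print
`never`. Keeping the path in a variable lets the parsing be tried against a
copy of the file without touching the live setting.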
The configuration diff was short and included transparent huge pages, and
finally, upon re-reading the thread, I found your hint.

I have now established that 4.5, 4.5.1 and 4.5.4 reliably corrupt KVM guest
memory within the first hour of running under disk load, causing the VM
either to drop dead in the water or to read random data from disk. Rebooting
the VM fixes it. This happens as soon as transparent huge pages are enabled
on the host.

Turning off transparent huge pages with

  echo never > /sys/kernel/mm/transparent_hugepage/enabled

fixes the issue even without rebooting the host; once the VM is started
again, it works just fine.

Is this an issue in (a) transparent huge pages, (b) KVM or (c) qemu? Where
should this issue be forwarded? Or do we just accept it and turn transparent
huge pages off?

Greetings
Marc

-- 
-----------------------------------------------------------------------------
Marc Haber         | "I don't trust Computers. They | Email address in header
Leimen, Germany    |  lose things."    Winona Ryder | Phone: *49 6224 1600402
Nordisch by Nature |  How to make an American Quilt | Fax:   *49 6224 1600421