From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
        id S1751064AbdAQBzX (ORCPT <rfc822;w@1wt.eu>);
        Mon, 16 Jan 2017 20:55:23 -0500
Received: from mail-wm0-f67.google.com ([74.125.82.67]:34746 "EHLO
        mail-wm0-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1750784AbdAQBzR (ORCPT
        <rfc822;linux-kernel@vger.kernel.org>);
        Mon, 16 Jan 2017 20:55:17 -0500
From: MasterPrenium <masterprenium.lkml@gmail.com>
X-Google-Original-From: MasterPrenium <MasterPrenium.LKML@gmail.com>
Subject: Re: PROBLEM: Kernel BUG with raid5 soft + Xen + DRBD - invalid opcode
To: Shaohua Li <shli@kernel.org>
References: <585D6C34.2020908@gmail.com>
 <20170104223015.cr6vtyhxuwxrg76g@kernel.org>
 <e56edc2b-f2ad-2ab1-4184-5d7cad80085a@gmail.com>
 <20170105193745.qnmqsussxy7nasdn@kernel.org>
 <2717981a-4308-3f7b-15c6-f384a41fd445@gmail.com>
 <20170109224435.sfyrvkxhajgrq2i5@kernel.org>
Cc: linux-kernel@vger.kernel.org, xen-users@lists.xen.org,
        linux-raid@vger.kernel.org,
        "MasterPrenium@gmail.com" <MasterPrenium@gmail.com>,
        xen-devel@lists.xenproject.org
Message-ID: <128ad8a8-3aaf-f5f1-3709-373ad504ca44@gmail.com>
Date: Tue, 17 Jan 2017 02:54:06 +0100
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
 Thunderbird/45.6.0
MIME-Version: 1.0
In-Reply-To: <20170109224435.sfyrvkxhajgrq2i5@kernel.org>
Content-Type: text/plain; charset=windows-1252; format=flowed
Content-Transfer-Encoding: 8bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Shaohua,

I've run a few more small tests; maybe they will help.

- I tried creating the RAID 5 array with only 2 of its 3 drives (mdadm --create 
/dev/md10 --raid-devices=3 --level=5 /dev/sdc1 /dev/sdd1 missing).
The same issue occurs.
- However, once (still with 2 of 3 drives) I was unable to crash the 
kernel using exactly the same procedure as before, even after 
re-creating the filesystems, etc.
To reproduce the BUG again, I had to re-create the array.

Could this be related to the following message?
[  155.667456] md10: Warning: Device sdc1 is misaligned

I don't know how to "align" a drive in a RAID array... The partition 
itself is correctly aligned (according to parted).

- In another test (still 2 of 3 drives in the array), the kernel did not 
crash, but the CPU was stuck at 100% I/O wait. Trying to reboot finally 
produced these printk messages: http://pastebin.com/uzVHUUrC
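About the "Device sdc1 is misaligned" warning: as far as I understand, it 
is printed when md stacks the member's queue limits, and the core of that 
check is whether the start of md's data area on the member (partition 
start plus md's data offset, both in 512-byte sectors) lines up with the 
member's physical block size or minimum I/O size. Below is a rough sketch 
of that arithmetic. It is a simplification under my own assumptions, not 
the kernel's actual code (for example, it ignores any alignment_offset the 
disk itself reports), and the function name and example values are 
hypothetical. The inputs could be taken from /sys/block/sdc/sdc1/start, 
/sys/block/sdc/queue/physical_block_size, 
/sys/block/sdc/queue/minimum_io_size and the "Data Offset" line of 
mdadm --examine /dev/sdc1.

```python
SECTOR_SIZE = 512  # sysfs "start" and mdadm "Data Offset" are in 512-byte sectors


def member_is_aligned(start_sector, data_offset_sectors,
                      physical_block_size, io_min=0):
    """Hypothetical helper, simplified model only (not the kernel's logic).

    Returns True if the byte offset where md's data area begins on the
    member device is a multiple of the larger of the device's physical
    block size and its minimum I/O size.
    """
    granularity = max(physical_block_size, io_min)
    offset_bytes = (start_sector + data_offset_sectors) * SECTOR_SIZE
    return offset_bytes % granularity == 0


# Partition starting at sector 2048 with a data offset of 262144 sectors
# (128 MiB) on a disk with 4096-byte physical sectors.
print(member_is_aligned(2048, 262144, 4096))  # True
# Legacy partition starting at sector 63 on the same disk, no data offset.
print(member_is_aligned(63, 0, 4096))         # False
```

With a 4096-byte physical block size, this model reports any starting 
sector that is not a multiple of 8 as misaligned.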

If you have a patch for me to try (perhaps something that makes the 
kernel more verbose about the issue), please let me know; I'll test it, 
as this issue is really blocking for me...

Best regards,

MasterPrenium


On 9 January 2017 at 23:44, Shaohua Li wrote:
> On Sun, Jan 08, 2017 at 02:31:15PM +0100, MasterPrenium wrote:
>> Hello,
>>
>> Replies below, plus:
>> - I don't know if this helps, but after the crash, when the system
>> reboots, the RAID 5 array re-synchronizes:
>> [   37.028239] md10: Warning: Device sdc1 is misaligned
>> [   37.028541] created bitmap (15 pages) for device md10
>> [   37.030433] md10: bitmap initialized from disk: read 1 pages, set 59 of 29807 bits
>>
>> - Sometimes the kernel crashes completely (serial and network connections
>> are lost); other times I only get the "BUG" dump and still have network
>> access (but a reboot is impossible; I need to reset the system).
>>
>> - You can find the blktrace (captured while running fio) here; I hope it's
>> complete, since the end of the file is where the kernel crashed: https://goo.gl/X9jZ50
> Looks like most are normal full-stripe writes.
>
>>> I'm trying to reproduce this, but with no success so far. So:
>>> ext4->btrfs->raid5, crash
>>> btrfs->raid5, no crash
>>> Right? Does the subvolume matter? When you create the raid5 array, does adding
>>> the '--assume-clean' option change the behavior? I'd like to narrow down the issue.
>>> If you can capture a blktrace of the raid5 array, it would give us a hint
>>> about what kind of IO it is.
>> Yes, correct.
>> The subvolume doesn't matter.
>> --assume-clean doesn't change the behaviour.
> So it's not a resync issue.
>
>> Don't forget that the system needs to be running on Xen to crash; without
>> Xen (on a native kernel) it doesn't crash (or at least, I was not able to
>> make it crash).
>>>> Regarding your patch, I can't find it. Is it the one sent by Konstantin
>>>> Khlebnikov?
>>> Right.
>> It doesn't help :(. Maybe the crash just happens a little later.
> OK, the patch is unlikely to help, since the IO size isn't very big.
>
> I don't have a good idea yet. My best guess so far is that the virtual machine
> introduces extra delay, which might trigger race conditions that aren't seen
> on native. I'll check whether I can find something locally.
>
> Thanks,
> Shaohua