linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Dexuan Cui <decui@microsoft.com>
To: "hch@lst.de" <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>,
	Bart Van Assche <Bart.VanAssche@sandisk.com>,
	"hare@suse.com" <hare@suse.com>, "hare@suse.de" <hare@suse.de>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"jth@kernel.org" <jth@kernel.org>,
	Nick Meier <Nick.Meier@microsoft.com>,
	"Alex Ng (LIS)" <alexng@microsoft.com>,
	Long Li <longli@microsoft.com>,
	"Adrian Suhov (Cloudbase Solutions SRL)" <v-adsuho@microsoft.com>,
	"Chris Valean (Cloudbase Solutions SRL)" <v-chvale@microsoft.com>
Subject: RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements")
Date: Wed, 15 Feb 2017 13:51:33 +0000	[thread overview]
Message-ID: <MWHPR03MB26691FB87F0823E2E62868BABF5B0@MWHPR03MB2669.namprd03.prod.outlook.com> (raw)
In-Reply-To: <20170214163437.GA23956@lst.de>

> From: hch@lst.de [mailto:hch@lst.de]
> Sent: Wednesday, February 15, 2017 00:35
> > I tested today's linux-next (next-20170214) + the 2 patches just now and
> got
> > a weird result:
> > sometimes the VM stills hung with a new calltrace (BUG: spinlock bad
> > magic) , but sometimes the VM did boot up despite the new calltrace!
> >
> > Attached is the log of a "good" boot.
> >
> > It looks we have a memory corruption issue somewhere...
>
> Yes.
It's due to an uninitialized spinlock. Please see the below.

> > Actually previously I saw the "BUG: spinlock bad magic" message once, but
> I
> > couldn't repro it later, so I didn't mention it to you.
>
> Interesting.
Ditto.
And probably my memory was inaccurate due to the long period of bisecting.
We should always see the message.

>
> A few questions on the dmesg:
>
> [    6.208794] sd 2:0:0:0: [storvsc] Sense Key : Illegal Request [current]
> [    6.209447] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operation
> code
> [    6.210043] sd 3:0:0:0: [storvsc] Sense Key : Illegal Request [current]
> [    6.210618] sd 3:0:0:0: [storvsc] Add. Sense: Invalid command operation
> code
> [    6.212272] sd 2:0:0:0: [storvsc] Sense Key : Illegal Request [current]
> [    6.212897] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operation
> code
> [    6.213474] sd 3:0:0:0: [storvsc] Sense Key : Illegal Request [current]
> [    6.214051] sd 3:0:0:0: [storvsc] Add. Sense: Invalid command operation
> code
>
> I didn't see anything like this in the other logs.  Are these messages
> something usual on HyperV VMs?

The messages should be normal, and I have always seen them for years
at least.

I think they're triggered by "/sbin/hdparm /dev/sda" or
"/lib/udev/ata_id /dev/sda"  in my Ubuntu 16.04 VM: during the VM
boot-up, the 2 programs are executed by the initrd's systemd-udev or other
system init scripts and the 2 programs try to run these 2 SCSI commands,
which are not supported by Hyper-V:

ATA PASS THROUGH (12) -- 0xa1
ATA PASS THROUGH (16) -- 0x85

IMO the commands are used when PATA/SATA devices are connected to
SCSI-to-ATA bridge, so it's understandable that Hyper-V doesn't support
them.

In the case of boot failure (hang), the 2 programs don't have a chance to
run, so we can't see the messages.

IMO we should ignore the messages, which should be harmless.

> [    6.358405] XFS (sdb1): Mounting V5 Filesystem
> [    6.404478] XFS (sdb1): Ending clean mount
> [    7.535174] BUG: spinlock bad magic on CPU#0, swapper/0/0
> [    7.536807]  lock: host_ts+0x30/0xffffffffffffe1a0 [hv_utils], .magic:
> 00000000, .owner: <none>/-1, .owner_cpu: 0
> [    7.538436] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next-
> 20170214+ #1
> [    7.539142] Hardware name: Microsoft Corporation Virtual Machine/Virtual
> Machine, BIOS 090006  04/28/2016
> [    7.539142] Call Trace:
> [    7.539142]  <IRQ>
> [    7.539142]  dump_stack+0x63/0x82
> [    7.539142]  spin_dump+0x78/0xc0
> [    7.539142]  do_raw_spin_lock+0xfd/0x160
> [    7.539142]  _raw_spin_lock_irqsave+0x4c/0x60
> [    7.539142]  ? timesync_onchannelcallback+0x153/0x220 [hv_utils]
> [    7.539142]  timesync_onchannelcallback+0x153/0x220 [hv_utils]
>
> Can you resolve this address using gdb to a line of code?  Once inside
> gdb do:
>
> l *(timesync_onchannelcallback+0x153)

(gdb) l *(timesync_onchannelcallback+0x153)
0xffffffffc0104593 is in timesync_onchannelcallback (drivers/hv/hv_util.c:279).
274             } else {
275                     /*
276                      * Save the adjusted time sample from the host and the snapshot
277                      * of the current system time for PTP device.
278                      */
279                     spin_lock_irqsave(&host_ts.lock, flags);
280
281                     cur_reftime = hyperv_cs->read(hyperv_cs);
282                     host_ts.host_time = hosttime;
283                     host_ts.ref_time = cur_reftime;

It turns out the "host_ts.lock" isn't initialized with spin_lock_init().
I'll submit a patch for this.

However, the SCSI issue (i.e. *sometimes* the VM fails to boot) is still there, I think.
We need to continue to debug...

Thanks,
-- Dexuan

      reply	other threads:[~2017-02-15 13:51 UTC|newest]

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-18  9:48 [PATCH] genhd: Do not hold event lock when scheduling workqueue elements Hannes Reinecke
2017-01-31  0:31 ` Bart Van Assche
2017-01-31 16:15   ` Hannes Reinecke
2017-02-03 12:22     ` Dexuan Cui
2017-02-07  2:23       ` Dexuan Cui
2017-02-07  2:56         ` Bart Van Assche
2017-02-07  3:48           ` Dexuan Cui
2017-02-07  6:29             ` Dexuan Cui
2017-02-07 16:09               ` Jens Axboe
2017-02-08 10:48                 ` Dexuan Cui
2017-02-08 17:43                   ` Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements") Jens Axboe
2017-02-08 18:03                     ` hch
2017-02-09  7:35                       ` Dexuan Cui
2017-02-09 13:08                         ` hch
2017-02-10 14:49                           ` Dexuan Cui
2017-02-14 13:47                             ` hch
2017-02-14 14:17                               ` Dexuan Cui
2017-02-14 14:28                                 ` hch
2017-02-14 14:46                                   ` Dexuan Cui
2017-02-14 14:51                                     ` hch
2017-02-14 15:54                                       ` Dexuan Cui
2017-02-14 16:34                                         ` hch
2017-02-15 13:51                                           ` Dexuan Cui [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=MWHPR03MB26691FB87F0823E2E62868BABF5B0@MWHPR03MB2669.namprd03.prod.outlook.com \
    --to=decui@microsoft.com \
    --cc=Bart.VanAssche@sandisk.com \
    --cc=Nick.Meier@microsoft.com \
    --cc=alexng@microsoft.com \
    --cc=axboe@kernel.dk \
    --cc=hare@suse.com \
    --cc=hare@suse.de \
    --cc=hch@lst.de \
    --cc=jth@kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=longli@microsoft.com \
    --cc=martin.petersen@oracle.com \
    --cc=v-adsuho@microsoft.com \
    --cc=v-chvale@microsoft.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).