From: Dexuan Cui <decui@microsoft.com> To: "hch@lst.de" <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk>, Bart Van Assche <Bart.VanAssche@sandisk.com>, "hare@suse.com" <hare@suse.com>, "hare@suse.de" <hare@suse.de>, "Martin K. Petersen" <martin.petersen@oracle.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>, "jth@kernel.org" <jth@kernel.org>, Nick Meier <Nick.Meier@microsoft.com>, "Alex Ng (LIS)" <alexng@microsoft.com>, Long Li <longli@microsoft.com>, "Adrian Suhov (Cloudbase Solutions SRL)" <v-adsuho@microsoft.com>, "Chris Valean (Cloudbase Solutions SRL)" <v-chvale@microsoft.com> Subject: RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements") Date: Wed, 15 Feb 2017 13:51:33 +0000 [thread overview] Message-ID: <MWHPR03MB26691FB87F0823E2E62868BABF5B0@MWHPR03MB2669.namprd03.prod.outlook.com> (raw) In-Reply-To: <20170214163437.GA23956@lst.de> > From: hch@lst.de [mailto:hch@lst.de] > Sent: Wednesday, February 15, 2017 00:35 > > I tested today's linux-next (next-20170214) + the 2 patches just now an= d > got > > a weird result: > > sometimes the VM stills hung with a new calltrace (BUG: spinlock bad > > magic) , but sometimes the VM did boot up despite the new calltrace! > > > > Attached is the log of a "good" boot. > > > > It looks we have a memory corruption issue somewhere... > > Yes. It's due to an uninitialized spinlock. Please see the below. > > Actually previously I saw the "BUG: spinlock bad magic" message once, b= ut > I > > couldn't repro it later, so I didn't mention it to you. > > Interesting. Ditto. And probably my memory was inaccurate due to the long period of bisecting. We should always see the message. > > A few questions on the dmesg: > > [ 6.208794] sd 2:0:0:0: [storvsc] Sense Key : Illegal Request [current= ] > [ 6.209447] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operatio= n > code > [ 6.210043] sd 3:0:0:0: [storvsc] Sense Key : Illegal Request [current= ] > [ 6.210618] sd 3:0:0:0: [storvsc] Add. Sense: Invalid command operatio= n > code > [ 6.212272] sd 2:0:0:0: [storvsc] Sense Key : Illegal Request [current= ] > [ 6.212897] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operatio= n > code > [ 6.213474] sd 3:0:0:0: [storvsc] Sense Key : Illegal Request [current= ] > [ 6.214051] sd 3:0:0:0: [storvsc] Add. Sense: Invalid command operatio= n > code > > I didn't see anything like this in the other logs. Are these messages > something usual on HyperV VMs? The messages should be normal, and I have always seen them for years at least. I think they're triggered by "/sbin/hdparm /dev/sda" or "/lib/udev/ata_id /dev/sda" in my Ubuntu 16.04 VM: during the VM boot-up, the 2 programs are executed by the initrd's systemd-udev or other system init scripts and the 2 programs try to run these 2 SCSI commands, which are not supported by Hyper-V: ATA PASS THROUGH (12) -- 0xa1 ATA PASS THROUGH (16) -- 0x85 IMO the commands are used when PATA/SATA devices are connected to SCSI-to-ATA bridge, so it's understandable that Hyper-V doesn't support them. In the case of boot failure (hang), the 2 programs don't have a chance to run, so we can't see the messages. IMO we should ignore the messages, which should be harmless. > [ 6.358405] XFS (sdb1): Mounting V5 Filesystem > [ 6.404478] XFS (sdb1): Ending clean mount > [ 7.535174] BUG: spinlock bad magic on CPU#0, swapper/0/0 > [ 7.536807] lock: host_ts+0x30/0xffffffffffffe1a0 [hv_utils], .magic: > 00000000, .owner: <none>/-1, .owner_cpu: 0 > [ 7.538436] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next- > 20170214+ #1 > [ 7.539142] Hardware name: Microsoft Corporation Virtual Machine/Virtu= al > Machine, BIOS 090006 04/28/2016 > [ 7.539142] Call Trace: > [ 7.539142] <IRQ> > [ 7.539142] dump_stack+0x63/0x82 > [ 7.539142] spin_dump+0x78/0xc0 > [ 7.539142] do_raw_spin_lock+0xfd/0x160 > [ 7.539142] _raw_spin_lock_irqsave+0x4c/0x60 > [ 7.539142] ? timesync_onchannelcallback+0x153/0x220 [hv_utils] > [ 7.539142] timesync_onchannelcallback+0x153/0x220 [hv_utils] > > Can you resolve this address using gdb to a line of code? Once inside > gdb do: > > l *(timesync_onchannelcallback+0x153) (gdb) l *(timesync_onchannelcallback+0x153) 0xffffffffc0104593 is in timesync_onchannelcallback (drivers/hv/hv_util.c:2= 79). 274 } else { 275 /* 276 * Save the adjusted time sample from the host and = the snapshot 277 * of the current system time for PTP device. 278 */ 279 spin_lock_irqsave(&host_ts.lock, flags); 280 281 cur_reftime =3D hyperv_cs->read(hyperv_cs); 282 host_ts.host_time =3D hosttime; 283 host_ts.ref_time =3D cur_reftime; It turns out the "host_ts.lock" isn't initialized with spin_lock_init(). I'll submit a patch for this. However, the SCSI issue (i.e. *sometimes* the VM fails to boot) is still th= ere, I think. We need to continue to debug... Thanks, -- Dexuan
WARNING: multiple messages have this Message-ID (diff)
From: Dexuan Cui <decui@microsoft.com> To: "hch@lst.de" <hch@lst.de> Cc: Jens Axboe <axboe@kernel.dk>, Bart Van Assche <Bart.VanAssche@sandisk.com>, "hare@suse.com" <hare@suse.com>, "hare@suse.de" <hare@suse.de>, "Martin K. Petersen" <martin.petersen@oracle.com>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "linux-block@vger.kernel.org" <linux-block@vger.kernel.org>, "jth@kernel.org" <jth@kernel.org>, Nick Meier <Nick.Meier@microsoft.com>, "Alex Ng (LIS)" <alexng@microsoft.com>, Long Li <longli@microsoft.com>, "Adrian Suhov (Cloudbase Solutions SRL)" <v-adsuho@microsoft.com>, "Chris Valean (Cloudbase Solutions SRL)" <v-chvale@microsoft.com> Subject: RE: Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements") Date: Wed, 15 Feb 2017 13:51:33 +0000 [thread overview] Message-ID: <MWHPR03MB26691FB87F0823E2E62868BABF5B0@MWHPR03MB2669.namprd03.prod.outlook.com> (raw) In-Reply-To: <20170214163437.GA23956@lst.de> > From: hch@lst.de [mailto:hch@lst.de] > Sent: Wednesday, February 15, 2017 00:35 > > I tested today's linux-next (next-20170214) + the 2 patches just now and > got > > a weird result: > > sometimes the VM stills hung with a new calltrace (BUG: spinlock bad > > magic) , but sometimes the VM did boot up despite the new calltrace! > > > > Attached is the log of a "good" boot. > > > > It looks we have a memory corruption issue somewhere... > > Yes. It's due to an uninitialized spinlock. Please see the below. > > Actually previously I saw the "BUG: spinlock bad magic" message once, but > I > > couldn't repro it later, so I didn't mention it to you. > > Interesting. Ditto. And probably my memory was inaccurate due to the long period of bisecting. We should always see the message. > > A few questions on the dmesg: > > [ 6.208794] sd 2:0:0:0: [storvsc] Sense Key : Illegal Request [current] > [ 6.209447] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operation > code > [ 6.210043] sd 3:0:0:0: [storvsc] Sense Key : Illegal Request [current] > [ 6.210618] sd 3:0:0:0: [storvsc] Add. Sense: Invalid command operation > code > [ 6.212272] sd 2:0:0:0: [storvsc] Sense Key : Illegal Request [current] > [ 6.212897] sd 2:0:0:0: [storvsc] Add. Sense: Invalid command operation > code > [ 6.213474] sd 3:0:0:0: [storvsc] Sense Key : Illegal Request [current] > [ 6.214051] sd 3:0:0:0: [storvsc] Add. Sense: Invalid command operation > code > > I didn't see anything like this in the other logs. Are these messages > something usual on HyperV VMs? The messages should be normal, and I have always seen them for years at least. I think they're triggered by "/sbin/hdparm /dev/sda" or "/lib/udev/ata_id /dev/sda" in my Ubuntu 16.04 VM: during the VM boot-up, the 2 programs are executed by the initrd's systemd-udev or other system init scripts and the 2 programs try to run these 2 SCSI commands, which are not supported by Hyper-V: ATA PASS THROUGH (12) -- 0xa1 ATA PASS THROUGH (16) -- 0x85 IMO the commands are used when PATA/SATA devices are connected to SCSI-to-ATA bridge, so it's understandable that Hyper-V doesn't support them. In the case of boot failure (hang), the 2 programs don't have a chance to run, so we can't see the messages. IMO we should ignore the messages, which should be harmless. > [ 6.358405] XFS (sdb1): Mounting V5 Filesystem > [ 6.404478] XFS (sdb1): Ending clean mount > [ 7.535174] BUG: spinlock bad magic on CPU#0, swapper/0/0 > [ 7.536807] lock: host_ts+0x30/0xffffffffffffe1a0 [hv_utils], .magic: > 00000000, .owner: <none>/-1, .owner_cpu: 0 > [ 7.538436] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.10.0-rc8-next- > 20170214+ #1 > [ 7.539142] Hardware name: Microsoft Corporation Virtual Machine/Virtual > Machine, BIOS 090006 04/28/2016 > [ 7.539142] Call Trace: > [ 7.539142] <IRQ> > [ 7.539142] dump_stack+0x63/0x82 > [ 7.539142] spin_dump+0x78/0xc0 > [ 7.539142] do_raw_spin_lock+0xfd/0x160 > [ 7.539142] _raw_spin_lock_irqsave+0x4c/0x60 > [ 7.539142] ? timesync_onchannelcallback+0x153/0x220 [hv_utils] > [ 7.539142] timesync_onchannelcallback+0x153/0x220 [hv_utils] > > Can you resolve this address using gdb to a line of code? Once inside > gdb do: > > l *(timesync_onchannelcallback+0x153) (gdb) l *(timesync_onchannelcallback+0x153) 0xffffffffc0104593 is in timesync_onchannelcallback (drivers/hv/hv_util.c:279). 274 } else { 275 /* 276 * Save the adjusted time sample from the host and the snapshot 277 * of the current system time for PTP device. 278 */ 279 spin_lock_irqsave(&host_ts.lock, flags); 280 281 cur_reftime = hyperv_cs->read(hyperv_cs); 282 host_ts.host_time = hosttime; 283 host_ts.ref_time = cur_reftime; It turns out the "host_ts.lock" isn't initialized with spin_lock_init(). I'll submit a patch for this. However, the SCSI issue (i.e. *sometimes* the VM fails to boot) is still there, I think. We need to continue to debug... Thanks, -- Dexuan
next prev parent reply other threads:[~2017-02-15 13:51 UTC|newest] Thread overview: 36+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-01-18 9:48 [PATCH] genhd: Do not hold event lock when scheduling workqueue elements Hannes Reinecke 2017-01-31 0:31 ` Bart Van Assche 2017-01-31 0:31 ` Bart Van Assche 2017-01-31 16:15 ` Hannes Reinecke 2017-01-31 16:15 ` Hannes Reinecke 2017-02-03 12:22 ` Dexuan Cui 2017-02-03 12:22 ` Dexuan Cui 2017-02-07 2:23 ` Dexuan Cui 2017-02-07 2:23 ` Dexuan Cui 2017-02-07 2:56 ` Bart Van Assche 2017-02-07 2:56 ` Bart Van Assche 2017-02-07 3:48 ` Dexuan Cui 2017-02-07 6:29 ` Dexuan Cui 2017-02-07 6:29 ` Dexuan Cui 2017-02-07 16:09 ` Jens Axboe 2017-02-08 10:48 ` Dexuan Cui 2017-02-08 10:48 ` Dexuan Cui 2017-02-08 17:43 ` Boot regression (was "Re: [PATCH] genhd: Do not hold event lock when scheduling workqueue elements") Jens Axboe 2017-02-08 18:03 ` hch 2017-02-09 7:35 ` Dexuan Cui 2017-02-09 7:35 ` Dexuan Cui 2017-02-09 13:08 ` hch 2017-02-10 14:49 ` Dexuan Cui 2017-02-10 14:49 ` Dexuan Cui 2017-02-14 13:47 ` hch 2017-02-14 14:17 ` Dexuan Cui 2017-02-14 14:17 ` Dexuan Cui 2017-02-14 14:28 ` hch 2017-02-14 14:46 ` Dexuan Cui 2017-02-14 14:46 ` Dexuan Cui 2017-02-14 14:51 ` hch 2017-02-14 15:54 ` Dexuan Cui 2017-02-14 15:54 ` Dexuan Cui 2017-02-14 16:34 ` hch 2017-02-15 13:51 ` Dexuan Cui [this message] 2017-02-15 13:51 ` Dexuan Cui
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=MWHPR03MB26691FB87F0823E2E62868BABF5B0@MWHPR03MB2669.namprd03.prod.outlook.com \ --to=decui@microsoft.com \ --cc=Bart.VanAssche@sandisk.com \ --cc=Nick.Meier@microsoft.com \ --cc=alexng@microsoft.com \ --cc=axboe@kernel.dk \ --cc=hare@suse.com \ --cc=hare@suse.de \ --cc=hch@lst.de \ --cc=jth@kernel.org \ --cc=linux-block@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=longli@microsoft.com \ --cc=martin.petersen@oracle.com \ --cc=v-adsuho@microsoft.com \ --cc=v-chvale@microsoft.com \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.