All of lore.kernel.org
 help / color / mirror / Atom feed
* understanding partprobe failure
@ 2015-12-17 12:19 Loic Dachary
  2015-12-17 15:49 ` Ilya Dryomov
  0 siblings, 1 reply; 3+ messages in thread
From: Loic Dachary @ 2015-12-17 12:19 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development

[-- Attachment #1: Type: text/plain, Size: 5839 bytes --]

Hi Ilya,

I'm seeing a partprobe failure right after a disk was zapped with sgdisk --clear --mbrtogpt -- /dev/vdb:

partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.

waiting 60 seconds (see the log below) and trying again succeeds. The partprobe call is guarded by udevadm settle to prevent udev actions from racing and nothing else goes on in the machine.

Any idea how that could happen ?

Cheers

2015-12-17 11:46:10,356.356 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:get_dm_uuid /dev/vdb uuid path is /sys/dev/block/253:16/dm/uuid
2015-12-17 11:46:10,357.357 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Zapping partition table on /dev/vdb
2015-12-17 11:46:10,358.358 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --zap-all -- /dev/vdb
2015-12-17 11:46:10,365.365 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution: invalid backup GPT header, but valid main header; regenerating
2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:backup header from main header.
2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:on the recovery & transformation menu to examine the two tables.
2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning! One or more CRCs don't match. You should repair the disk!
2015-12-17 11:46:10,368.368 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
2015-12-17 11:46:11,413.413 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:****************************************************************************
2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:verification and recovery are STRONGLY recommended.
2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:****************************************************************************
2015-12-17 11:46:11,415.415 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
2015-12-17 11:46:11,415.415 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:GPT data structures destroyed! You may now partition the disk using fdisk or
2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:other utilities.
2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/vdb
2015-12-17 11:46:12,504.504 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Creating new GPT entries.
2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The operation has completed successfully.
2015-12-17 11:46:12,506.506 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
2015-12-17 11:46:12,507.507 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:46:15,427.427 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:(ignored, waiting 60s)
2015-12-17 11:47:16,925.925 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
2015-12-17 11:47:19,681.681 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
2015-12-17 11:47:20,125.125 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
-- 
Loïc Dachary, Artisan Logiciel Libre


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: understanding partprobe failure
  2015-12-17 12:19 understanding partprobe failure Loic Dachary
@ 2015-12-17 15:49 ` Ilya Dryomov
  2015-12-17 23:23   ` Loic Dachary
  0 siblings, 1 reply; 3+ messages in thread
From: Ilya Dryomov @ 2015-12-17 15:49 UTC (permalink / raw)
  To: Loic Dachary; +Cc: Ceph Development

On Thu, Dec 17, 2015 at 1:19 PM, Loic Dachary <loic@dachary.org> wrote:
> Hi Ilya,
>
> I'm seeing a partprobe failure right after a disk was zapped with sgdisk --clear --mbrtogpt -- /dev/vdb:
>
> partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
>
> waiting 60 seconds (see the log below) and trying again succeeds. The partprobe call is guarded by udevadm settle to prevent udev actions from racing and nothing else goes on in the machine.
>
> Any idea how that could happen ?
>
> Cheers
>
> 2015-12-17 11:46:10,356.356 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:get_dm_uuid /dev/vdb uuid path is /sys/dev/block/253:16/dm/uuid
> 2015-12-17 11:46:10,357.357 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Zapping partition table on /dev/vdb
> 2015-12-17 11:46:10,358.358 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --zap-all -- /dev/vdb
> 2015-12-17 11:46:10,365.365 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution: invalid backup GPT header, but valid main header; regenerating
> 2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:backup header from main header.
> 2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
> 2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
> 2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:on the recovery & transformation menu to examine the two tables.
> 2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
> 2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning! One or more CRCs don't match. You should repair the disk!
> 2015-12-17 11:46:10,368.368 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
> 2015-12-17 11:46:11,413.413 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:****************************************************************************
> 2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
> 2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:verification and recovery are STRONGLY recommended.
> 2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:****************************************************************************
> 2015-12-17 11:46:11,415.415 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
> 2015-12-17 11:46:11,415.415 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
> 2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:GPT data structures destroyed! You may now partition the disk using fdisk or
> 2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:other utilities.
> 2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/vdb
> 2015-12-17 11:46:12,504.504 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Creating new GPT entries.
> 2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
> 2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
> 2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The operation has completed successfully.
> 2015-12-17 11:46:12,506.506 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
> 2015-12-17 11:46:12,507.507 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
> 2015-12-17 11:46:15,427.427 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
> 2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
> 2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:(ignored, waiting 60s)
> 2015-12-17 11:47:16,925.925 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
> 2015-12-17 11:47:19,681.681 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
> 2015-12-17 11:47:20,125.125 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600

Well, evidently something was using that partition.  This is on
openstack, right?  It probably makes it hard to debug, but trying to
reproduce and doing some tracing is probably the only way to get an
idea.

udevadm settle doesn't guarantee that a device (or one of its
partitions) isn't going to be busy - it just waits for udevd to empty
its queue.  Both sgdisk invocations complained about a busy device, is
it possible something external to udev was doing something with it?

Thanks,

                Ilya

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: understanding partprobe failure
  2015-12-17 15:49 ` Ilya Dryomov
@ 2015-12-17 23:23   ` Loic Dachary
  0 siblings, 0 replies; 3+ messages in thread
From: Loic Dachary @ 2015-12-17 23:23 UTC (permalink / raw)
  To: Ilya Dryomov; +Cc: Ceph Development

[-- Attachment #1: Type: text/plain, Size: 6823 bytes --]



On 17/12/2015 16:49, Ilya Dryomov wrote:
> On Thu, Dec 17, 2015 at 1:19 PM, Loic Dachary <loic@dachary.org> wrote:
>> Hi Ilya,
>>
>> I'm seeing a partprobe failure right after a disk was zapped with sgdisk --clear --mbrtogpt -- /dev/vdb:
>>
>> partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use. As a result, the old partition(s) will remain in use. You should reboot now before making further changes.
>>
>> waiting 60 seconds (see the log below) and trying again succeeds. The partprobe call is guarded by udevadm settle to prevent udev actions from racing and nothing else goes on in the machine.
>>
>> Any idea how that could happen ?
>>
>> Cheers
>>
>> 2015-12-17 11:46:10,356.356 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:get_dm_uuid /dev/vdb uuid path is /sys/dev/block/253:16/dm/uuid
>> 2015-12-17 11:46:10,357.357 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Zapping partition table on /dev/vdb
>> 2015-12-17 11:46:10,358.358 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --zap-all -- /dev/vdb
>> 2015-12-17 11:46:10,365.365 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution: invalid backup GPT header, but valid main header; regenerating
>> 2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:backup header from main header.
>> 2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
>> 2015-12-17 11:46:10,366.366 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
>> 2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:on the recovery & transformation menu to examine the two tables.
>> 2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
>> 2015-12-17 11:46:10,367.367 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning! One or more CRCs don't match. You should repair the disk!
>> 2015-12-17 11:46:10,368.368 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:
>> 2015-12-17 11:46:11,413.413 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:****************************************************************************
>> 2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Caution: Found protective or hybrid MBR and corrupt GPT. Using GPT, but disk
>> 2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:verification and recovery are STRONGLY recommended.
>> 2015-12-17 11:46:11,414.414 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:****************************************************************************
>> 2015-12-17 11:46:11,415.415 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
>> 2015-12-17 11:46:11,415.415 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
>> 2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:GPT data structures destroyed! You may now partition the disk using fdisk or
>> 2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:other utilities.
>> 2015-12-17 11:46:11,416.416 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/sgdisk --clear --mbrtogpt -- /dev/vdb
>> 2015-12-17 11:46:12,504.504 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Creating new GPT entries.
>> 2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:Warning: The kernel is still using the old partition table.
>> 2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The new table will be used at the next reboot.
>> 2015-12-17 11:46:12,505.505 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:The operation has completed successfully.
>> 2015-12-17 11:46:12,506.506 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:Calling partprobe on zapped device /dev/vdb
>> 2015-12-17 11:46:12,507.507 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
>> 2015-12-17 11:46:15,427.427 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
>> 2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:DEBUG:ceph-disk:partprobe /dev/vdb failed : Error: Partition(s) 1 on /dev/vdb have been written, but we have been unable to inform the kernel of the change, probably because it/they are in use.  As a result, the old partition(s) will remain in use.  You should reboot now before making further changes.
>> 2015-12-17 11:46:16,860.860 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:(ignored, waiting 60s)
>> 2015-12-17 11:47:16,925.925 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
>> 2015-12-17 11:47:19,681.681 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/sbin/partprobe /dev/vdb
>> 2015-12-17 11:47:20,125.125 INFO:tasks.workunit.client.0.target167114233028.stderr:DEBUG:CephDisk:INFO:ceph-disk:Running command: /usr/bin/udevadm settle --timeout=600
> 
> Well, evidently something was using that partition.  This is on
> openstack, right?  It probably makes it hard to debug, but trying to
> reproduce and doing some tracing is probably the only way to get an
> idea.

This is on a VMs running integration tests with a tightly controlled environment.

> udevadm settle doesn't guarantee that a device (or one of its
> partitions) isn't going to be busy - it just waits for udevd to empty
> its queue.  Both sgdisk invocations complained about a busy device, is
> it possible something external to udev was doing something with it?

I think it may be multipath or dmcrypt temporarily holding the partition and triggering this error. I'll try to verify that.

Thanks !


-- 
Loïc Dachary, Artisan Logiciel Libre


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 198 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2015-12-17 23:23 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-12-17 12:19 understanding partprobe failure Loic Dachary
2015-12-17 15:49 ` Ilya Dryomov
2015-12-17 23:23   ` Loic Dachary

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.