* MD Remnants After --stop
@ 2016-11-04 15:35 Marc Smith
  2016-11-07  5:44 ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-11-04 15:35 UTC (permalink / raw)
  To: linux-raid

Hi,

It may be that I've never noticed this before, so maybe it's not a
problem... after using '--stop' to deactivate/stop an MD array, there
are remnants of it lingering: an entry in /sys/block (e.g.,
/sys/block/md127) remains, and so does the device node in /dev (e.g.,
/dev/md127).

Is this normal? Like I said, it probably is, and I've just never
noticed it before. I assume it's not going to hurt anything, but is
there a way to clean it up without rebooting? Obviously I could
remove the /dev entry, but what about /sys/block?

Thanks for your time.


--Marc

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: MD Remnants After --stop
  2016-11-04 15:35 MD Remnants After --stop Marc Smith
@ 2016-11-07  5:44 ` NeilBrown
  2016-11-18 19:31   ` Marc Smith
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2016-11-07  5:44 UTC (permalink / raw)
  To: Marc Smith, linux-raid

[-- Attachment #1: Type: text/plain, Size: 1989 bytes --]

On Sat, Nov 05 2016, Marc Smith wrote:

> Hi,
>
> It may be that I've never noticed this before, so maybe its not a
> problem... after using '--stop' to deactivate/stop an MD array, there
> are remnants of it lingering, namely an entry in /sys/block (eg,
> /sys/block/md127) and the device node in /dev remains (eg,
> /dev/md127).
>
> Is this normal? Like I said, it probably is, and I've just never
> noticed it before. I assume its not going to hurt anything, but is
> there a way to clean it up, without rebooting? Obviously I could
> remove the /dev entry, but what about /sys/block?
>

You can remove them both by running
  mdadm -S /dev/md127

but they'll probably just reappear again.

This seems to be an on-going battle between md and udev.  I've "fixed"
it at least once, but it keeps coming back.

When md removes the md127 device, a message is sent to udev.
As part of its response to this message, udev tries to open /dev/md127.
Because of the rather unusual way that md devices are created (it made
sense nearly 20 years ago when it was designed), opening /dev/md127
causes md to create device md127 again.

You could
  mv /dev/md127 /dev/md127X
  mdadm -S /dev/md127X
  rm /dev/md127X
That stops udev from opening /dev/md127.  It seems to work reliably.
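
As a convenience, the rename trick can be wrapped in a small shell
function (a sketch only; the device path is an example and no error
handling is shown):

```shell
# Sketch of the rename workaround above (device path is an example).
# Hiding the node first means udev's reaction to the stop cannot find
# /dev/md127 to reopen, so md does not recreate the device.
stop_md_hidden() {
    dev="$1"
    tmp="${dev}X"
    mv "$dev" "$tmp"    # hide the device node from udev
    mdadm -S "$tmp"     # stop the array via the renamed node
    rm -f "$tmp"        # remove the leftover node
}

# usage: stop_md_hidden /dev/md127
```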

md used to generate a CHANGE event before the REMOVE event, and only the
CHANGE event caused udev to open the device file.  I removed that and
the problem went away.  Apparently some change has happened to udev and
now it opens the file in response to REMOVE as well.

So to "fix" it again, you need to figure out what udev is doing and fix
that.

Alternately... place "create names=yes" in your mdadm.conf
and always use names, not numbers, to work with md arrays.
e.g. /dev/md/home /dev/md/root /dev/md/scratch etc.

That will trigger the use of an alternate scheme for creating md devices
(using minor numbers >= 512) which udev cannot break so easily.  When
udev tries to open /dev/md_home, that open will fail.
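
A minimal mdadm.conf sketch of that approach (the array name and member
devices below are placeholders, not from this thread):

```conf
# /etc/mdadm.conf (fragment)
CREATE names=yes

# Then create and address arrays by name, e.g.:
#   mdadm --create /dev/md/home --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
# The kernel device then appears as /dev/md_home (minor >= 512)
# instead of /dev/mdNNN.
```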

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]


* Re: MD Remnants After --stop
  2016-11-07  5:44 ` NeilBrown
@ 2016-11-18 19:31   ` Marc Smith
  2016-11-21  3:42     ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-11-18 19:31 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Mon, Nov 7, 2016 at 12:44 AM, NeilBrown <neilb@suse.com> wrote:
> On Sat, Nov 05 2016, Marc Smith wrote:
>
>> Hi,
>>
>> It may be that I've never noticed this before, so maybe its not a
>> problem... after using '--stop' to deactivate/stop an MD array, there
>> are remnants of it lingering, namely an entry in /sys/block (eg,
>> /sys/block/md127) and the device node in /dev remains (eg,
>> /dev/md127).
>>
>> Is this normal? Like I said, it probably is, and I've just never
>> noticed it before. I assume its not going to hurt anything, but is
>> there a way to clean it up, without rebooting? Obviously I could
>> remove the /dev entry, but what about /sys/block?
>>
>
> You can remove them both by running
>   mdadm -S /dev/md127
>
> but they'll probably just reappear again.
>
> This seems to be an on-going battle between md and udev.  I've "fixed"
> it at least once, but it keeps coming back.
>
> When md removes the md127 device, a message is sent to udev.
> As part of its response to this message, udev tries to open /dev/md127.
> Because of the rather unusual way that md devices are created (it made
> sense nearly 20 years ago when it was designed), opening /dev/md127
> causes md to create device md127 again.
>
> You could
>   mv /dev/md127 /dev/md127X
>   mdadm -S /dev/md127X
>   rm /dev/md127X
> that stop udev from opening /dev/md127.  It seems to work reliably.
>
> md used to generate a CHANGE event before the REMOVE event, and only the
> CHANGE event caused udev to open the device file.  I removed that and
> the problem went away.  Apparently some change has happened to udev and
> now it opens the file in response to REMOVE as well.

I used "udevadm monitor -pku" to watch the events when running "mdadm
--stop /dev/md127" and this is what I see:

--snip--
KERNEL[163074.119778] change   /devices/virtual/block/md127 (block)
ACTION=change
DEVNAME=/dev/md127
DEVPATH=/devices/virtual/block/md127
DEVTYPE=disk
MAJOR=9
MINOR=127
SEQNUM=3701
SUBSYSTEM=block

UDEV  [163074.121569] change   /devices/virtual/block/md127 (block)
ACTION=change
DEVNAME=/dev/md127
DEVPATH=/devices/virtual/block/md127
DEVTYPE=disk
MAJOR=9
MINOR=127
SEQNUM=3701
SUBSYSTEM=block
SYSTEMD_READY=0
USEC_INITIALIZED=370470
--snip--

I don't see any 'remove' event generated. I should mention, if I
haven't already, that I'm testing md-cluster (--bitmap=clustered) and
am currently using Linux 4.9-rc3.


--Marc

>
> So to "fix" it again, you need to figure out what udev is doing and fix
> that.
>
> Alternately... place "create names=yes" in your mdadm.conf
> and always use names, not numbers, to work with md arrays.
> e.g. /dev/md/home /dev/md/root /dev/md/scratch etc.
>
> When will trigger the use of an alternate scheme for creating md devices
> (using minor numbers >= 512) which udev cannot break so easily.  When it
> tries to open /dev/md_home, that will fail.
>
> NeilBrown


* Re: MD Remnants After --stop
  2016-11-18 19:31   ` Marc Smith
@ 2016-11-21  3:42     ` NeilBrown
  2016-11-21 14:08       ` Marc Smith
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2016-11-21  3:42 UTC (permalink / raw)
  To: Marc Smith; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 2838 bytes --]

On Sat, Nov 19 2016, Marc Smith wrote:

> On Mon, Nov 7, 2016 at 12:44 AM, NeilBrown <neilb@suse.com> wrote:
>> On Sat, Nov 05 2016, Marc Smith wrote:
>>
>>> Hi,
>>>
>>> It may be that I've never noticed this before, so maybe its not a
>>> problem... after using '--stop' to deactivate/stop an MD array, there
>>> are remnants of it lingering, namely an entry in /sys/block (eg,
>>> /sys/block/md127) and the device node in /dev remains (eg,
>>> /dev/md127).
>>>
>>> Is this normal? Like I said, it probably is, and I've just never
>>> noticed it before. I assume its not going to hurt anything, but is
>>> there a way to clean it up, without rebooting? Obviously I could
>>> remove the /dev entry, but what about /sys/block?
>>>
>>
>> You can remove them both by running
>>   mdadm -S /dev/md127
>>
>> but they'll probably just reappear again.
>>
>> This seems to be an on-going battle between md and udev.  I've "fixed"
>> it at least once, but it keeps coming back.
>>
>> When md removes the md127 device, a message is sent to udev.
>> As part of its response to this message, udev tries to open /dev/md127.
>> Because of the rather unusual way that md devices are created (it made
>> sense nearly 20 years ago when it was designed), opening /dev/md127
>> causes md to create device md127 again.
>>
>> You could
>>   mv /dev/md127 /dev/md127X
>>   mdadm -S /dev/md127X
>>   rm /dev/md127X
>> that stop udev from opening /dev/md127.  It seems to work reliably.
>>
>> md used to generate a CHANGE event before the REMOVE event, and only the
>> CHANGE event caused udev to open the device file.  I removed that and
>> the problem went away.  Apparently some change has happened to udev and
>> now it opens the file in response to REMOVE as well.
>
> I used "udevadm monitor -pku" to watch the events when running "mdadm
> --stop /dev/md127" and this is what I see:
>
> --snip--
> KERNEL[163074.119778] change   /devices/virtual/block/md127 (block)
> ACTION=change
> DEVNAME=/dev/md127
> DEVPATH=/devices/virtual/block/md127
> DEVTYPE=disk
> MAJOR=9
> MINOR=127
> SEQNUM=3701
> SUBSYSTEM=block
>
> UDEV  [163074.121569] change   /devices/virtual/block/md127 (block)
> ACTION=change
> DEVNAME=/dev/md127
> DEVPATH=/devices/virtual/block/md127
> DEVTYPE=disk
> MAJOR=9
> MINOR=127
> SEQNUM=3701
> SUBSYSTEM=block
> SYSTEMD_READY=0
> USEC_INITIALIZED=370470
> --snip--
>
> I don't see any 'remove' event generated. I should mention if I hadn't
> already that I'm testing md-cluster (--bitmap=clustered), and
> currently using Linux 4.9-rc3.

What version of mdadm are you using?
You need one which contains
Commit: 229e66cb9689 ("Manage.c: Only issue change events for kernels older than 2.6.28")

which hasn't made it into a release yet.  But if you are playing with
md-cluster, I would guess you are using the latest from git...

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]


* Re: MD Remnants After --stop
  2016-11-21  3:42     ` NeilBrown
@ 2016-11-21 14:08       ` Marc Smith
  2016-11-22 15:12         ` Marc Smith
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-11-21 14:08 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Sun, Nov 20, 2016 at 10:42 PM, NeilBrown <neilb@suse.com> wrote:
> On Sat, Nov 19 2016, Marc Smith wrote:
>
>> On Mon, Nov 7, 2016 at 12:44 AM, NeilBrown <neilb@suse.com> wrote:
>>> On Sat, Nov 05 2016, Marc Smith wrote:
>>>
>>>> Hi,
>>>>
>>>> It may be that I've never noticed this before, so maybe its not a
>>>> problem... after using '--stop' to deactivate/stop an MD array, there
>>>> are remnants of it lingering, namely an entry in /sys/block (eg,
>>>> /sys/block/md127) and the device node in /dev remains (eg,
>>>> /dev/md127).
>>>>
>>>> Is this normal? Like I said, it probably is, and I've just never
>>>> noticed it before. I assume its not going to hurt anything, but is
>>>> there a way to clean it up, without rebooting? Obviously I could
>>>> remove the /dev entry, but what about /sys/block?
>>>>
>>>
>>> You can remove them both by running
>>>   mdadm -S /dev/md127
>>>
>>> but they'll probably just reappear again.
>>>
>>> This seems to be an on-going battle between md and udev.  I've "fixed"
>>> it at least once, but it keeps coming back.
>>>
>>> When md removes the md127 device, a message is sent to udev.
>>> As part of its response to this message, udev tries to open /dev/md127.
>>> Because of the rather unusual way that md devices are created (it made
>>> sense nearly 20 years ago when it was designed), opening /dev/md127
>>> causes md to create device md127 again.
>>>
>>> You could
>>>   mv /dev/md127 /dev/md127X
>>>   mdadm -S /dev/md127X
>>>   rm /dev/md127X
>>> that stop udev from opening /dev/md127.  It seems to work reliably.
>>>
>>> md used to generate a CHANGE event before the REMOVE event, and only the
>>> CHANGE event caused udev to open the device file.  I removed that and
>>> the problem went away.  Apparently some change has happened to udev and
>>> now it opens the file in response to REMOVE as well.
>>
>> I used "udevadm monitor -pku" to watch the events when running "mdadm
>> --stop /dev/md127" and this is what I see:
>>
>> --snip--
>> KERNEL[163074.119778] change   /devices/virtual/block/md127 (block)
>> ACTION=change
>> DEVNAME=/dev/md127
>> DEVPATH=/devices/virtual/block/md127
>> DEVTYPE=disk
>> MAJOR=9
>> MINOR=127
>> SEQNUM=3701
>> SUBSYSTEM=block
>>
>> UDEV  [163074.121569] change   /devices/virtual/block/md127 (block)
>> ACTION=change
>> DEVNAME=/dev/md127
>> DEVPATH=/devices/virtual/block/md127
>> DEVTYPE=disk
>> MAJOR=9
>> MINOR=127
>> SEQNUM=3701
>> SUBSYSTEM=block
>> SYSTEMD_READY=0
>> USEC_INITIALIZED=370470
>> --snip--
>>
>> I don't see any 'remove' event generated. I should mention if I hadn't
>> already that I'm testing md-cluster (--bitmap=clustered), and
>> currently using Linux 4.9-rc3.
>
> What version of mdadm are you using?

v3.4


> You need one which contains
> Commit: 229e66cb9689 ("Manage.c: Only issue change events for kernels older than 2.6.28")
>
> which hasn't made it into a release yet.  But if you are playing with
> md-cluster, I would guess you are using the latest from git...

Wasn't, but I will now. Thanks.

--Marc


>
> NeilBrown


* Re: MD Remnants After --stop
  2016-11-21 14:08       ` Marc Smith
@ 2016-11-22 15:12         ` Marc Smith
  2016-11-22 23:51           ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-11-22 15:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Hi,

Sorry, I'm not trying to beat a dead horse here, but I do feel
something has changed... I just tested with Linux 4.5.2, and when
stopping an md array (with mdadm --stop) the entry in /sys/block/ is
removed, and even the /dev/mdXXX node and the /dev/md/name link are
removed properly.

When testing with Linux 4.9-rc3, the entry in /sys/block/ still
remains after using mdadm --stop (the array_state attribute reads
"clear"), and the /dev/mdXXX device node still exists (the
/dev/md/name link is removed, by udev I assume).

Looks like Linux 4.9 is at rc6 now -- have there been any changes that
would correct this behavior? Or is this the expected behavior in the
latest version? I'm not sure when this changed, but I did go back to
4.5.2 and confirmed that everything is removed correctly in that
version; I don't know whether this started in 4.9 or somewhere between
4.5 and 4.9.
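
As a sketch, the leftover state can be checked programmatically. The
sysfs root is a parameter here only so the helper is easy to exercise
(normally it is /sys/block), and the device name is an example:

```shell
# Report whether an md device left a sysfs remnant after 'mdadm --stop'.
md_remnant_state() {
    sysfs_root="$1"
    name="$2"
    if [ -d "$sysfs_root/$name" ]; then
        # On an affected kernel the directory survives and reads "clear".
        cat "$sysfs_root/$name/md/array_state"
    else
        echo "removed"
    fi
}

# usage: mdadm --stop /dev/md127; md_remnant_state /sys/block md127
```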

Can anyone else confirm with Linux 4.9 that the /sys/block/mdXXX entry
lingers after using mdadm --stop? I suppose it could be some other
system that is causing this on my machines. I tested using the latest
from the master branch of mdadm and get the same result.


Thanks for your time and help.

--Marc


On Mon, Nov 21, 2016 at 9:08 AM, Marc Smith <marc.smith@mcc.edu> wrote:
> On Sun, Nov 20, 2016 at 10:42 PM, NeilBrown <neilb@suse.com> wrote:
>> On Sat, Nov 19 2016, Marc Smith wrote:
>>
>>> On Mon, Nov 7, 2016 at 12:44 AM, NeilBrown <neilb@suse.com> wrote:
>>>> On Sat, Nov 05 2016, Marc Smith wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> It may be that I've never noticed this before, so maybe its not a
>>>>> problem... after using '--stop' to deactivate/stop an MD array, there
>>>>> are remnants of it lingering, namely an entry in /sys/block (eg,
>>>>> /sys/block/md127) and the device node in /dev remains (eg,
>>>>> /dev/md127).
>>>>>
>>>>> Is this normal? Like I said, it probably is, and I've just never
>>>>> noticed it before. I assume its not going to hurt anything, but is
>>>>> there a way to clean it up, without rebooting? Obviously I could
>>>>> remove the /dev entry, but what about /sys/block?
>>>>>
>>>>
>>>> You can remove them both by running
>>>>   mdadm -S /dev/md127
>>>>
>>>> but they'll probably just reappear again.
>>>>
>>>> This seems to be an on-going battle between md and udev.  I've "fixed"
>>>> it at least once, but it keeps coming back.
>>>>
>>>> When md removes the md127 device, a message is sent to udev.
>>>> As part of its response to this message, udev tries to open /dev/md127.
>>>> Because of the rather unusual way that md devices are created (it made
>>>> sense nearly 20 years ago when it was designed), opening /dev/md127
>>>> causes md to create device md127 again.
>>>>
>>>> You could
>>>>   mv /dev/md127 /dev/md127X
>>>>   mdadm -S /dev/md127X
>>>>   rm /dev/md127X
>>>> that stop udev from opening /dev/md127.  It seems to work reliably.
>>>>
>>>> md used to generate a CHANGE event before the REMOVE event, and only the
>>>> CHANGE event caused udev to open the device file.  I removed that and
>>>> the problem went away.  Apparently some change has happened to udev and
>>>> now it opens the file in response to REMOVE as well.
>>>
>>> I used "udevadm monitor -pku" to watch the events when running "mdadm
>>> --stop /dev/md127" and this is what I see:
>>>
>>> --snip--
>>> KERNEL[163074.119778] change   /devices/virtual/block/md127 (block)
>>> ACTION=change
>>> DEVNAME=/dev/md127
>>> DEVPATH=/devices/virtual/block/md127
>>> DEVTYPE=disk
>>> MAJOR=9
>>> MINOR=127
>>> SEQNUM=3701
>>> SUBSYSTEM=block
>>>
>>> UDEV  [163074.121569] change   /devices/virtual/block/md127 (block)
>>> ACTION=change
>>> DEVNAME=/dev/md127
>>> DEVPATH=/devices/virtual/block/md127
>>> DEVTYPE=disk
>>> MAJOR=9
>>> MINOR=127
>>> SEQNUM=3701
>>> SUBSYSTEM=block
>>> SYSTEMD_READY=0
>>> USEC_INITIALIZED=370470
>>> --snip--
>>>
>>> I don't see any 'remove' event generated. I should mention if I hadn't
>>> already that I'm testing md-cluster (--bitmap=clustered), and
>>> currently using Linux 4.9-rc3.
>>
>> What version of mdadm are you using?
>
> v3.4
>
>
>> You need one which contains
>> Commit: 229e66cb9689 ("Manage.c: Only issue change events for kernels older than 2.6.28")
>>
>> which hasn't made it into a release yet.  But if you are playing with
>> md-cluster, I would guess you are using the latest from git...
>
> Wasn't, but I will now. Thanks.
>
> --Marc
>
>
>>
>> NeilBrown


* Re: MD Remnants After --stop
  2016-11-22 15:12         ` Marc Smith
@ 2016-11-22 23:51           ` NeilBrown
  2016-11-23 15:21             ` Marc Smith
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2016-11-22 23:51 UTC (permalink / raw)
  To: Marc Smith; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4794 bytes --]

On Wed, Nov 23 2016, Marc Smith wrote:

> Hi,
>
> Sorry, I'm not trying to beat a dead horse here, but I do feel
> something has changed... I just tested with Linux 4.5.2 and when
> stopping an md array (with mdadm --stop) the entry in /sys/block/ is
> removed, and even the /dev/mdXXX and /dev/md/name link are removed
> properly.
>
> When testing with Linux 4.9-rc3, the entries in /sys/block/ still
> remain (array_state attribute value is "clear") after using mdadm
> --stop and the /dev/mdXXX device exists (the /dev/md/name link is
> removed, by udev I assume).

With the latest (git) mdadm, what events are reported by "udevadm monitor"?

I only see remove events, and the entries from /dev and /sys are
removed.

If I could reproduce your problem, I would fix it...

NeilBrown

>
> Looks like Linux 4.9 is at rc6 now -- have there been any changes that
> would correct this behavior? Or is this expected behavior with the
> latest version? Not sure when this changed, but I did go back to 4.5.2
> and confirmed everything is removed correctly in that version, not
> sure if this is different starting in 4.9, or something between 4.5
> and 4.9.
>
> Can anyone else confirm with Linux 4.9 that the /sys/block/mdXXX entry
> lingers after using mdadm --stop? I suppose it could be some other
> system that is causing this on my machines. I tested using the latest
> from the master branch of mdadm and get the same result.
>
>
> Thanks for your time and help.
>
> --Marc
>
>
> On Mon, Nov 21, 2016 at 9:08 AM, Marc Smith <marc.smith@mcc.edu> wrote:
>> On Sun, Nov 20, 2016 at 10:42 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Sat, Nov 19 2016, Marc Smith wrote:
>>>
>>>> On Mon, Nov 7, 2016 at 12:44 AM, NeilBrown <neilb@suse.com> wrote:
>>>>> On Sat, Nov 05 2016, Marc Smith wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> It may be that I've never noticed this before, so maybe its not a
>>>>>> problem... after using '--stop' to deactivate/stop an MD array, there
>>>>>> are remnants of it lingering, namely an entry in /sys/block (eg,
>>>>>> /sys/block/md127) and the device node in /dev remains (eg,
>>>>>> /dev/md127).
>>>>>>
>>>>>> Is this normal? Like I said, it probably is, and I've just never
>>>>>> noticed it before. I assume its not going to hurt anything, but is
>>>>>> there a way to clean it up, without rebooting? Obviously I could
>>>>>> remove the /dev entry, but what about /sys/block?
>>>>>>
>>>>>
>>>>> You can remove them both by running
>>>>>   mdadm -S /dev/md127
>>>>>
>>>>> but they'll probably just reappear again.
>>>>>
>>>>> This seems to be an on-going battle between md and udev.  I've "fixed"
>>>>> it at least once, but it keeps coming back.
>>>>>
>>>>> When md removes the md127 device, a message is sent to udev.
>>>>> As part of its response to this message, udev tries to open /dev/md127.
>>>>> Because of the rather unusual way that md devices are created (it made
>>>>> sense nearly 20 years ago when it was designed), opening /dev/md127
>>>>> causes md to create device md127 again.
>>>>>
>>>>> You could
>>>>>   mv /dev/md127 /dev/md127X
>>>>>   mdadm -S /dev/md127X
>>>>>   rm /dev/md127X
>>>>> that stop udev from opening /dev/md127.  It seems to work reliably.
>>>>>
>>>>> md used to generate a CHANGE event before the REMOVE event, and only the
>>>>> CHANGE event caused udev to open the device file.  I removed that and
>>>>> the problem went away.  Apparently some change has happened to udev and
>>>>> now it opens the file in response to REMOVE as well.
>>>>
>>>> I used "udevadm monitor -pku" to watch the events when running "mdadm
>>>> --stop /dev/md127" and this is what I see:
>>>>
>>>> --snip--
>>>> KERNEL[163074.119778] change   /devices/virtual/block/md127 (block)
>>>> ACTION=change
>>>> DEVNAME=/dev/md127
>>>> DEVPATH=/devices/virtual/block/md127
>>>> DEVTYPE=disk
>>>> MAJOR=9
>>>> MINOR=127
>>>> SEQNUM=3701
>>>> SUBSYSTEM=block
>>>>
>>>> UDEV  [163074.121569] change   /devices/virtual/block/md127 (block)
>>>> ACTION=change
>>>> DEVNAME=/dev/md127
>>>> DEVPATH=/devices/virtual/block/md127
>>>> DEVTYPE=disk
>>>> MAJOR=9
>>>> MINOR=127
>>>> SEQNUM=3701
>>>> SUBSYSTEM=block
>>>> SYSTEMD_READY=0
>>>> USEC_INITIALIZED=370470
>>>> --snip--
>>>>
>>>> I don't see any 'remove' event generated. I should mention if I hadn't
>>>> already that I'm testing md-cluster (--bitmap=clustered), and
>>>> currently using Linux 4.9-rc3.
>>>
>>> What version of mdadm are you using?
>>
>> v3.4
>>
>>
>>> You need one which contains
>>> Commit: 229e66cb9689 ("Manage.c: Only issue change events for kernels older than 2.6.28")
>>>
>>> which hasn't made it into a release yet.  But if you are playing with
>>> md-cluster, I would guess you are using the latest from git...
>>
>> Wasn't, but I will now. Thanks.
>>
>> --Marc
>>
>>
>>>
>>> NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]


* Re: MD Remnants After --stop
  2016-11-22 23:51           ` NeilBrown
@ 2016-11-23 15:21             ` Marc Smith
  2016-11-23 23:38               ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-11-23 15:21 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Tue, Nov 22, 2016 at 6:51 PM, NeilBrown <neilb@suse.com> wrote:
> On Wed, Nov 23 2016, Marc Smith wrote:
>
>> Hi,
>>
>> Sorry, I'm not trying to beat a dead horse here, but I do feel
>> something has changed... I just tested with Linux 4.5.2 and when
>> stopping an md array (with mdadm --stop) the entry in /sys/block/ is
>> removed, and even the /dev/mdXXX and /dev/md/name link are removed
>> properly.
>>
>> When testing with Linux 4.9-rc3, the entries in /sys/block/ still
>> remain (array_state attribute value is "clear") after using mdadm
>> --stop and the /dev/mdXXX device exists (the /dev/md/name link is
>> removed, by udev I assume).
>
> With the latest (git) mdadm, when events are reported by "udevadm monitor"??
>
> I only see remove events, and the entries from /dev and /sys are
> removed.
>
> If I could reproduce your problem, I would fix it...

On one set of hosts I can reliably reproduce this issue; on another
system I could previously reproduce it, but it now seems to be working
fine... I haven't found the connection (same distro and kernel
versions on all hosts).

# mdadm --version
mdadm - v3.4-100-g52a9408 - 26th October 2016


From 'mdadm --stop /dev/md/blah1' (non-clustered RAID0 array):

--snip--
# udevadm monitor -pku
calling: monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[32930.834312] change   /devices/virtual/block/md126 (block)
ACTION=change
DEVNAME=/dev/md126
DEVPATH=/devices/virtual/block/md126
DEVTYPE=disk
MAJOR=9
MINOR=126
SEQNUM=3678
SUBSYSTEM=block

UDEV  [32930.836032] change   /devices/virtual/block/md126 (block)
ACTION=change
DEVNAME=/dev/md126
DEVPATH=/devices/virtual/block/md126
DEVTYPE=disk
MAJOR=9
MINOR=126
SEQNUM=3678
SUBSYSTEM=block
SYSTEMD_READY=0
USEC_INITIALIZED=843336612
--snip--

Kernel logs from that:
--snip--
[32928.465695] md126: detected capacity change from 146681102336 to 0
[32928.465699] md: md126 stopped.
[32928.465702] md: unbind<dm-3>
[32928.465798] udevd[499]: inotify event: 8 for /dev/md126
[32928.465964] udevd[499]: device /dev/md126 closed, synthesising 'change'
[32928.466029] udevd[499]: seq 3678 queued, 'change' 'block'
[32928.466129] udevd[499]: seq 3678 forked new worker [27035]
[32928.466357] udevd[27035]: seq 3678 running
[32928.466423] udevd[27035]: removing watch on '/dev/md126'
[32928.466492] udevd[27035]: IMPORT 'probe-bcache -o udev /dev/md126'
/usr/lib/udev/rules.d/69-bcache.rules:16
[32928.466712] udevd[27036]: starting 'probe-bcache -o udev /dev/md126'
[32928.467540] udevd[27035]: 'probe-bcache -o udev /dev/md126' [27036]
exit with return code 0
[32928.467564] udevd[27035]: update old name,
'/dev/disk/by-id/md-name-tgtnode1.parodyne.com:blah1' no longer
belonging to '/devices/virtual/block/md126'
[32928.470851] md: export_rdev(dm-3)
[32928.470920] md: unbind<dm-2>
[32928.478843] md: export_rdev(dm-2)
--snip--
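
The "device /dev/md126 closed, synthesising 'change'" line in the log
above comes from udev's inotify watch on the device node: closing a
watched block device makes udev synthesise a CHANGE event. As a sketch
only (the rule file name is an assumption, and disabling the watch
affects other consumers of change events), a udev rule could turn that
watch off for md devices:

```conf
# /etc/udev/rules.d/90-md-nowatch.rules  (hypothetical file name)
# 'nowatch' stops udev from installing the inotify watch whose close
# notification synthesises the CHANGE event seen above.
KERNEL=="md*", OPTIONS+="nowatch"
```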


From 'mdadm --stop /dev/md/asdf1' (clustered RAID1 array):

--snip--
# udevadm monitor -pku
calling: monitor
monitor will print the received events for:
UDEV - the event which udev sends out after rule processing
KERNEL - the kernel uevent

KERNEL[34402.247229] change   /devices/virtual/block/md127 (block)
ACTION=change
DEVNAME=/dev/md127
DEVPATH=/devices/virtual/block/md127
DEVTYPE=disk
MAJOR=9
MINOR=127
SEQNUM=3679
SUBSYSTEM=block

KERNEL[34402.247885] offline
/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
ACTION=offline
DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
SEQNUM=3680
SUBSYSTEM=dlm

UDEV  [34402.248269] offline
/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
ACTION=offline
DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
SEQNUM=3680
SUBSYSTEM=dlm
USEC_INITIALIZED=402248230

KERNEL[34402.248841] remove
/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
ACTION=remove
DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
SEQNUM=3681
SUBSYSTEM=dlm

UDEV  [34402.248899] change   /devices/virtual/block/md127 (block)
ACTION=change
DEVNAME=/dev/md127
DEVPATH=/devices/virtual/block/md127
DEVTYPE=disk
MAJOR=9
MINOR=127
SEQNUM=3679
SUBSYSTEM=block
SYSTEMD_READY=0
USEC_INITIALIZED=1273670

UDEV  [34402.248990] remove
/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
ACTION=remove
DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
SEQNUM=3681
SUBSYSTEM=dlm
USEC_INITIALIZED=2248905
--snip--


Kernel logs from that:
--snip--
[34399.753876] udevd[499]: inotify event: 8 for /dev/md127
[34399.765389] md127: detected capacity change from 73340747776 to 0
[34399.765393] md: md127 stopped.
[34399.765579] udevd[499]: device /dev/md127 closed, synthesising 'change'
[34399.765656] udevd[499]: seq 3679 queued, 'change' 'block'
[34399.765751] udevd[499]: seq 3679 forked new worker [6317]
[34399.765878] udevd[6317]: seq 3679 running
[34399.765943] udevd[6317]: removing watch on '/dev/md127'
[34399.766012] udevd[6317]: IMPORT 'probe-bcache -o udev /dev/md127'
/usr/lib/udev/rules.d/69-bcache.rules:16
[34399.766259] udevd[6318]: starting 'probe-bcache -o udev /dev/md127'
[34399.766295] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: leaving the
lockspace group...
[34399.766421] udevd[499]: seq 3680 queued, 'offline' 'dlm'
[34399.766549] udevd[499]: seq 3680 forked new worker [6319]
[34399.767080] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: group event done 0 0
[34399.767297] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e:
release_lockspace final free
[34399.767320] md: unbind<dm-1>
[34399.795574] md: export_rdev(dm-1)
[34399.795640] md: unbind<dm-0>
[34399.803565] md: export_rdev(dm-0)
--snip--


On the other machines, where the md array stopped correctly (removing
the entries from /dev and /sys), I do see the 'remove' events with
"udevadm monitor". What produces those remove events? Is that
something issued directly by the mdadm tool, or indirectly as part of
the stop/tear-down that mdadm initiates?


--Marc

>
> NeilBrown
>
>>
>> Looks like Linux 4.9 is at rc6 now -- have there been any changes that
>> would correct this behavior? Or is this expected behavior with the
>> latest version? Not sure when this changed, but I did go back to 4.5.2
>> and confirmed everything is removed correctly in that version, not
>> sure if this is different starting in 4.9, or something between 4.5
>> and 4.9.
>>
>> Can anyone else confirm with Linux 4.9 that the /sys/block/mdXXX entry
>> lingers after using mdadm --stop? I suppose it could be some other
>> system that is causing this on my machines. I tested using the latest
>> from the master branch of mdadm and get the same result.
>>
>>
>> Thanks for your time and help.
>>
>> --Marc
>>
>>
>> On Mon, Nov 21, 2016 at 9:08 AM, Marc Smith <marc.smith@mcc.edu> wrote:
>>> On Sun, Nov 20, 2016 at 10:42 PM, NeilBrown <neilb@suse.com> wrote:
>>>> On Sat, Nov 19 2016, Marc Smith wrote:
>>>>
>>>>> On Mon, Nov 7, 2016 at 12:44 AM, NeilBrown <neilb@suse.com> wrote:
>>>>>> On Sat, Nov 05 2016, Marc Smith wrote:
>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> It may be that I've never noticed this before, so maybe its not a
>>>>>>> problem... after using '--stop' to deactivate/stop an MD array, there
>>>>>>> are remnants of it lingering, namely an entry in /sys/block (eg,
>>>>>>> /sys/block/md127) and the device node in /dev remains (eg,
>>>>>>> /dev/md127).
>>>>>>>
>>>>>>> Is this normal? Like I said, it probably is, and I've just never
>>>>>>> noticed it before. I assume its not going to hurt anything, but is
>>>>>>> there a way to clean it up, without rebooting? Obviously I could
>>>>>>> remove the /dev entry, but what about /sys/block?
>>>>>>>
>>>>>>
>>>>>> You can remove them both by running
>>>>>>   mdadm -S /dev/md127
>>>>>>
>>>>>> but they'll probably just reappear again.
>>>>>>
>>>>>> This seems to be an on-going battle between md and udev.  I've "fixed"
>>>>>> it at least once, but it keeps coming back.
>>>>>>
>>>>>> When md removes the md127 device, a message is sent to udev.
>>>>>> As part of its response to this message, udev tries to open /dev/md127.
>>>>>> Because of the rather unusual way that md devices are created (it made
>>>>>> sense nearly 20 years ago when it was designed), opening /dev/md127
>>>>>> causes md to create device md127 again.
>>>>>>
>>>>>> You could
>>>>>>   mv /dev/md127 /dev/md127X
>>>>>>   mdadm -S /dev/md127X
>>>>>>   rm /dev/md127X
>>>>>> that stops udev from opening /dev/md127.  It seems to work reliably.
>>>>>>
>>>>>> md used to generate a CHANGE event before the REMOVE event, and only the
>>>>>> CHANGE event caused udev to open the device file.  I removed that and
>>>>>> the problem went away.  Apparently some change has happened to udev and
>>>>>> now it opens the file in response to REMOVE as well.
>>>>>
>>>>> I used "udevadm monitor -pku" to watch the events when running "mdadm
>>>>> --stop /dev/md127" and this is what I see:
>>>>>
>>>>> --snip--
>>>>> KERNEL[163074.119778] change   /devices/virtual/block/md127 (block)
>>>>> ACTION=change
>>>>> DEVNAME=/dev/md127
>>>>> DEVPATH=/devices/virtual/block/md127
>>>>> DEVTYPE=disk
>>>>> MAJOR=9
>>>>> MINOR=127
>>>>> SEQNUM=3701
>>>>> SUBSYSTEM=block
>>>>>
>>>>> UDEV  [163074.121569] change   /devices/virtual/block/md127 (block)
>>>>> ACTION=change
>>>>> DEVNAME=/dev/md127
>>>>> DEVPATH=/devices/virtual/block/md127
>>>>> DEVTYPE=disk
>>>>> MAJOR=9
>>>>> MINOR=127
>>>>> SEQNUM=3701
>>>>> SUBSYSTEM=block
>>>>> SYSTEMD_READY=0
>>>>> USEC_INITIALIZED=370470
>>>>> --snip--
>>>>>
>>>>> I don't see any 'remove' event generated. I should mention if I hadn't
>>>>> already that I'm testing md-cluster (--bitmap=clustered), and
>>>>> currently using Linux 4.9-rc3.
>>>>
>>>> What version of mdadm are you using?
>>>
>>> v3.4
>>>
>>>
>>>> You need one which contains
>>>> Commit: 229e66cb9689 ("Manage.c: Only issue change events for kernels older than 2.6.28")
>>>>
>>>> which hasn't made it into a release yet.  But if you are playing with
>>>> md-cluster, I would guess you are using the latest from git...
>>>
>>> Wasn't, but I will now. Thanks.
>>>
>>> --Marc
>>>
>>>
>>>>
>>>> NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: MD Remnants After --stop
  2016-11-23 15:21             ` Marc Smith
@ 2016-11-23 23:38               ` NeilBrown
  2016-11-26 16:41                 ` Marc Smith
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2016-11-23 23:38 UTC (permalink / raw)
  To: Marc Smith; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 7582 bytes --]

On Thu, Nov 24 2016, Marc Smith wrote:

> On Tue, Nov 22, 2016 at 6:51 PM, NeilBrown <neilb@suse.com> wrote:
>> On Wed, Nov 23 2016, Marc Smith wrote:
>>
>>> Hi,
>>>
>>> Sorry, I'm not trying to beat a dead horse here, but I do feel
>>> something has changed... I just tested with Linux 4.5.2 and when
>>> stopping an md array (with mdadm --stop) the entry in /sys/block/ is
>>> removed, and even the /dev/mdXXX and /dev/md/name link are removed
>>> properly.
>>>
>>> When testing with Linux 4.9-rc3, the entries in /sys/block/ still
>>> remain (array_state attribute value is "clear") after using mdadm
>>> --stop and the /dev/mdXXX device exists (the /dev/md/name link is
>>> removed, by udev I assume).
>>
>> With the latest (git) mdadm, when events are reported by "udevadm monitor"??
>>
>> I only see remove events, and the entries from /dev and /sys are
>> removed.
>>
>> If I could reproduce your problem, I would fix it...
>
> On one set of hosts I can reliably reproduce this issue, and on
> another system I could previously reproduce this, but now seems to be
> working fine... I haven't found the connection (same distro / kernel
> versions on all hosts).
>
> # mdadm --version
> mdadm - v3.4-100-g52a9408 - 26th October 2016
>
>
> From 'mdadm --stop /dev/md/blah1' (non-clustered RAID0 array):
>
> --snip--
> # udevadm monitor -pku
> calling: monitor
> monitor will print the received events for:
> UDEV - the event which udev sends out after rule processing
> KERNEL - the kernel uevent
>
> KERNEL[32930.834312] change   /devices/virtual/block/md126 (block)
> ACTION=change
> DEVNAME=/dev/md126
> DEVPATH=/devices/virtual/block/md126
> DEVTYPE=disk
> MAJOR=9
> MINOR=126
> SEQNUM=3678
> SUBSYSTEM=block
>
> UDEV  [32930.836032] change   /devices/virtual/block/md126 (block)
> ACTION=change
> DEVNAME=/dev/md126
> DEVPATH=/devices/virtual/block/md126
> DEVTYPE=disk
> MAJOR=9
> MINOR=126
> SEQNUM=3678
> SUBSYSTEM=block
> SYSTEMD_READY=0
> USEC_INITIALIZED=843336612

Using a recent version of mdadm and a kernel newer than 4.6 (commit
399146b8), a CHANGE event should only be generated when the array is
started. To see a CHANGE on stop, you must have an older mdadm or an
older kernel... unless something else is synthesizing a CHANGE event.

When using a clustered array, you can also see a CHANGE event when a new
disk is being added to an array.

> --snip--
>
> Kernel logs from that:
> --snip--
> [32928.465695] md126: detected capacity change from 146681102336 to 0
> [32928.465699] md: md126 stopped.
> [32928.465702] md: unbind<dm-3>
> [32928.465798] udevd[499]: inotify event: 8 for /dev/md126
> [32928.465964] udevd[499]: device /dev/md126 closed, synthesising 'change'
> [32928.466029] udevd[499]: seq 3678 queued, 'change' 'block'
> [32928.466129] udevd[499]: seq 3678 forked new worker [27035]
> [32928.466357] udevd[27035]: seq 3678 running
> [32928.466423] udevd[27035]: removing watch on '/dev/md126'
> [32928.466492] udevd[27035]: IMPORT 'probe-bcache -o udev /dev/md126'
> /usr/lib/udev/rules.d/69-bcache.rules:16
> [32928.466712] udevd[27036]: starting 'probe-bcache -o udev /dev/md126'
> [32928.467540] udevd[27035]: 'probe-bcache -o udev /dev/md126' [27036]
> exit with return code 0
> [32928.467564] udevd[27035]: update old name,
> '/dev/disk/by-id/md-name-tgtnode1.parodyne.com:blah1' no longer
> belonging to '/devices/virtual/block/md126'
> [32928.470851] md: export_rdev(dm-3)
> [32928.470920] md: unbind<dm-2>
> [32928.478843] md: export_rdev(dm-2)
> --snip--
>
>
> From 'mdadm --stop /dev/md/asdf1' (clustered RAID1 array):
>
> --snip--
> # udevadm monitor -pku
> calling: monitor
> monitor will print the received events for:
> UDEV - the event which udev sends out after rule processing
> KERNEL - the kernel uevent
>
> KERNEL[34402.247229] change   /devices/virtual/block/md127 (block)
> ACTION=change
> DEVNAME=/dev/md127
> DEVPATH=/devices/virtual/block/md127
> DEVTYPE=disk
> MAJOR=9
> MINOR=127
> SEQNUM=3679
> SUBSYSTEM=block
>
> KERNEL[34402.247885] offline
> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
> ACTION=offline
> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
> SEQNUM=3680
> SUBSYSTEM=dlm
>
> UDEV  [34402.248269] offline
> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
> ACTION=offline
> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
> SEQNUM=3680
> SUBSYSTEM=dlm
> USEC_INITIALIZED=402248230
>
> KERNEL[34402.248841] remove
> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
> ACTION=remove
> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
> SEQNUM=3681
> SUBSYSTEM=dlm
>
> UDEV  [34402.248899] change   /devices/virtual/block/md127 (block)
> ACTION=change
> DEVNAME=/dev/md127
> DEVPATH=/devices/virtual/block/md127
> DEVTYPE=disk
> MAJOR=9
> MINOR=127
> SEQNUM=3679
> SUBSYSTEM=block
> SYSTEMD_READY=0
> USEC_INITIALIZED=1273670
>
> UDEV  [34402.248990] remove
> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
> ACTION=remove
> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
> SEQNUM=3681
> SUBSYSTEM=dlm
> USEC_INITIALIZED=2248905
> --snip--
>
>
> Kernel logs from that:
> --snip--
> [34399.753876] udevd[499]: inotify event: 8 for /dev/md127
> [34399.765389] md127: detected capacity change from 73340747776 to 0
> [34399.765393] md: md127 stopped.
> [34399.765579] udevd[499]: device /dev/md127 closed, synthesising 'change'

That is weird.  udev is synthesizing a CHANGE event.  How nice of it
(not!).
It doesn't do this for dm, only other devices.

This happens when an open-for-write is closed.
If you edit mdopen.c and change the O_RDWR in open_mddev() to O_RDONLY,
the synthesized CHANGE event will go away.


> [34399.765656] udevd[499]: seq 3679 queued, 'change' 'block'
> [34399.765751] udevd[499]: seq 3679 forked new worker [6317]
> [34399.765878] udevd[6317]: seq 3679 running
> [34399.765943] udevd[6317]: removing watch on '/dev/md127'
> [34399.766012] udevd[6317]: IMPORT 'probe-bcache -o udev /dev/md127'
> /usr/lib/udev/rules.d/69-bcache.rules:16
> [34399.766259] udevd[6318]: starting 'probe-bcache -o udev /dev/md127'
> [34399.766295] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: leaving the
> lockspace group...
> [34399.766421] udevd[499]: seq 3680 queued, 'offline' 'dlm'
> [34399.766549] udevd[499]: seq 3680 forked new worker [6319]
> [34399.767080] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: group event done 0 0
> [34399.767297] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e:
> release_lockspace final free
> [34399.767320] md: unbind<dm-1>
> [34399.795574] md: export_rdev(dm-1)
> [34399.795640] md: unbind<dm-0>
> [34399.803565] md: export_rdev(dm-0)
> --snip--
>
>
> On the other machines where the md array stopped correctly (removing
> the entries from /dev and /sys) I do see the 'remove' events with
> "udevadm monitor". What produces those remove events? Is that
> something directly from the mdadm tool, or indirectly as part of the
> stop/tear-down that mdadm initiates?

REMOVE events are generated by

md_free() -> del_gendisk() ->  blk_unregister_queue()

which should happen when the md object is finally released.
If there is no CHANGE event, then nothing should stop this from
happening.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 800 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: MD Remnants After --stop
  2016-11-23 23:38               ` NeilBrown
@ 2016-11-26 16:41                 ` Marc Smith
  2016-11-27 22:20                   ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-11-26 16:41 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

So, I modified mdopen.c to look like this:

--- a/mdopen.c    2016-11-25 17:04:25.782299330 -0500
+++ b/mdopen.c    2016-11-26 10:57:35.883621355 -0500
@@ -416,7 +416,7 @@
  */
 int open_mddev(char *dev, int report_errors)
 {
-    int mdfd = open(dev, O_RDWR);
+    int mdfd = open(dev, O_RDONLY);
     if (mdfd < 0 && errno == EACCES)
         mdfd = open(dev, O_RDONLY);
     if (mdfd < 0) {


And now, when running 'mdadm --stop' here is what I see...

From the output 'udevadm monitor -pku':

--snip--
KERNEL[297486.536908] offline
/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
ACTION=offline
DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
SEQNUM=3651
SUBSYSTEM=dlm

UDEV  [297486.537541] offline
/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
ACTION=offline
DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
SEQNUM=3651
SUBSYSTEM=dlm
USEC_INITIALIZED=7486537404

KERNEL[297486.538325] remove
/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
ACTION=remove
DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
SEQNUM=3652
SUBSYSTEM=dlm

UDEV  [297486.538644] remove
/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
ACTION=remove
DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
SEQNUM=3652
SUBSYSTEM=dlm
USEC_INITIALIZED=86538345
--snip--

And from the kernel log:

--snip--
[297504.958244] md127: detected capacity change from 73340747776 to 0
[297504.958249] md: md127 stopped.
[297504.958884] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: leaving the
lockspace group...
[297504.959004] udevd[487]: seq 3651 queued, 'offline' 'dlm'
[297504.959161] udevd[487]: seq 3651 forked new worker [5392]
[297504.959417] udevd[5392]: seq 3651 running
[297504.959474] udevd[5392]: no db file to read
/run/udev/data/+dlm:62fccfd6-605f-19e6-be6d-99a1e3cb987e: No such file
or directory
[297504.959524] udevd[5392]: passed device to netlink monitor 0x2251c30
[297504.959527] udevd[5392]: seq 3651 processed
[297504.960101] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: group event done 0 0
[297504.960299] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e:
release_lockspace final free
[297504.960329] md: unbind<dm-0>
[297504.960448] udevd[487]: seq 3652 queued, 'remove' 'dlm'
[297504.960500] udevd[487]: passed 214 byte device to netlink monitor 0x224b130
[297504.960584] udevd[5392]: seq 3652 running
[297504.960606] udevd[5392]: no db file to read
/run/udev/data/+dlm:62fccfd6-605f-19e6-be6d-99a1e3cb987e: No such file
or directory
[297504.967168] md: export_rdev(dm-0)
[297504.967231] md: unbind<dm-1>
[297504.975176] md: export_rdev(dm-1)
--snip--

So that did get rid of the synthesized CHANGE event, but still no
REMOVE event. =)

Still trying to rule out anything strange with my Linux distro /
setup... but I assume even if udev were mishandling something, we
should still see a REMOVE event on '--stop'.


--Marc


On Wed, Nov 23, 2016 at 6:38 PM, NeilBrown <neilb@suse.com> wrote:
> On Thu, Nov 24 2016, Marc Smith wrote:
>
>> On Tue, Nov 22, 2016 at 6:51 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Wed, Nov 23 2016, Marc Smith wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry, I'm not trying to beat a dead horse here, but I do feel
>>>> something has changed... I just tested with Linux 4.5.2 and when
>>>> stopping an md array (with mdadm --stop) the entry in /sys/block/ is
>>>> removed, and even the /dev/mdXXX and /dev/md/name link are removed
>>>> properly.
>>>>
>>>> When testing with Linux 4.9-rc3, the entries in /sys/block/ still
>>>> remain (array_state attribute value is "clear") after using mdadm
>>>> --stop and the /dev/mdXXX device exists (the /dev/md/name link is
>>>> removed, by udev I assume).
>>>
>>> With the latest (git) mdadm, when events are reported by "udevadm monitor"??
>>>
>>> I only see remove events, and the entries from /dev and /sys are
>>> removed.
>>>
>>> If I could reproduce your problem, I would fix it...
>>
>> On one set of hosts I can reliably reproduce this issue, and on
>> another system I could previously reproduce this, but now seems to be
>> working fine... I haven't found the connection (same distro / kernel
>> versions on all hosts).
>>
>> # mdadm --version
>> mdadm - v3.4-100-g52a9408 - 26th October 2016
>>
>>
>> From 'mdadm --stop /dev/md/blah1' (non-clustered RAID0 array):
>>
>> --snip--
>> # udevadm monitor -pku
>> calling: monitor
>> monitor will print the received events for:
>> UDEV - the event which udev sends out after rule processing
>> KERNEL - the kernel uevent
>>
>> KERNEL[32930.834312] change   /devices/virtual/block/md126 (block)
>> ACTION=change
>> DEVNAME=/dev/md126
>> DEVPATH=/devices/virtual/block/md126
>> DEVTYPE=disk
>> MAJOR=9
>> MINOR=126
>> SEQNUM=3678
>> SUBSYSTEM=block
>>
>> UDEV  [32930.836032] change   /devices/virtual/block/md126 (block)
>> ACTION=change
>> DEVNAME=/dev/md126
>> DEVPATH=/devices/virtual/block/md126
>> DEVTYPE=disk
>> MAJOR=9
>> MINOR=126
>> SEQNUM=3678
>> SUBSYSTEM=block
>> SYSTEMD_READY=0
>> USEC_INITIALIZED=843336612
>
> Using the recent version of mdadm, and a kernel newer than 4.6 (commit
> 399146b8) a CHANGE event should only be generated when the array is
> started. To see a CHANGE of stop, you must have an older mdadm or and
> older kernel.... unless something else is synthesizing a CHANGE event.
>
> When using a clustered array, you can also see a CHANGE event when a new
> disk is being added to an array.
>
>> --snip--
>>
>> Kernel logs from that:
>> --snip--
>> [32928.465695] md126: detected capacity change from 146681102336 to 0
>> [32928.465699] md: md126 stopped.
>> [32928.465702] md: unbind<dm-3>
>> [32928.465798] udevd[499]: inotify event: 8 for /dev/md126
>> [32928.465964] udevd[499]: device /dev/md126 closed, synthesising 'change'
>> [32928.466029] udevd[499]: seq 3678 queued, 'change' 'block'
>> [32928.466129] udevd[499]: seq 3678 forked new worker [27035]
>> [32928.466357] udevd[27035]: seq 3678 running
>> [32928.466423] udevd[27035]: removing watch on '/dev/md126'
>> [32928.466492] udevd[27035]: IMPORT 'probe-bcache -o udev /dev/md126'
>> /usr/lib/udev/rules.d/69-bcache.rules:16
>> [32928.466712] udevd[27036]: starting 'probe-bcache -o udev /dev/md126'
>> [32928.467540] udevd[27035]: 'probe-bcache -o udev /dev/md126' [27036]
>> exit with return code 0
>> [32928.467564] udevd[27035]: update old name,
>> '/dev/disk/by-id/md-name-tgtnode1.parodyne.com:blah1' no longer
>> belonging to '/devices/virtual/block/md126'
>> [32928.470851] md: export_rdev(dm-3)
>> [32928.470920] md: unbind<dm-2>
>> [32928.478843] md: export_rdev(dm-2)
>> --snip--
>>
>>
>> From 'mdadm --stop /dev/md/asdf1' (clustered RAID1 array):
>>
>> --snip--
>> # udevadm monitor -pku
>> calling: monitor
>> monitor will print the received events for:
>> UDEV - the event which udev sends out after rule processing
>> KERNEL - the kernel uevent
>>
>> KERNEL[34402.247229] change   /devices/virtual/block/md127 (block)
>> ACTION=change
>> DEVNAME=/dev/md127
>> DEVPATH=/devices/virtual/block/md127
>> DEVTYPE=disk
>> MAJOR=9
>> MINOR=127
>> SEQNUM=3679
>> SUBSYSTEM=block
>>
>> KERNEL[34402.247885] offline
>> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
>> ACTION=offline
>> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> SEQNUM=3680
>> SUBSYSTEM=dlm
>>
>> UDEV  [34402.248269] offline
>> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
>> ACTION=offline
>> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> SEQNUM=3680
>> SUBSYSTEM=dlm
>> USEC_INITIALIZED=402248230
>>
>> KERNEL[34402.248841] remove
>> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
>> ACTION=remove
>> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> SEQNUM=3681
>> SUBSYSTEM=dlm
>>
>> UDEV  [34402.248899] change   /devices/virtual/block/md127 (block)
>> ACTION=change
>> DEVNAME=/dev/md127
>> DEVPATH=/devices/virtual/block/md127
>> DEVTYPE=disk
>> MAJOR=9
>> MINOR=127
>> SEQNUM=3679
>> SUBSYSTEM=block
>> SYSTEMD_READY=0
>> USEC_INITIALIZED=1273670
>>
>> UDEV  [34402.248990] remove
>> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
>> ACTION=remove
>> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> SEQNUM=3681
>> SUBSYSTEM=dlm
>> USEC_INITIALIZED=2248905
>> --snip--
>>
>>
>> Kernel logs from that:
>> --snip--
>> [34399.753876] udevd[499]: inotify event: 8 for /dev/md127
>> [34399.765389] md127: detected capacity change from 73340747776 to 0
>> [34399.765393] md: md127 stopped.
>> [34399.765579] udevd[499]: device /dev/md127 closed, synthesising 'change'
>
> That is weird.  udev is synthesizing a CHANGE event.  How nice of it
> (not!).
> It doesn't do this for dm, only other devices.
>
> This happens when an open-for-write is closed.
> If you edit mdopen.c and change the O_RDWR in open_mddev() to O_RDONLY,
> this change will go away.
>
>
>> [34399.765656] udevd[499]: seq 3679 queued, 'change' 'block'
>> [34399.765751] udevd[499]: seq 3679 forked new worker [6317]
>> [34399.765878] udevd[6317]: seq 3679 running
>> [34399.765943] udevd[6317]: removing watch on '/dev/md127'
>> [34399.766012] udevd[6317]: IMPORT 'probe-bcache -o udev /dev/md127'
>> /usr/lib/udev/rules.d/69-bcache.rules:16
>> [34399.766259] udevd[6318]: starting 'probe-bcache -o udev /dev/md127'
>> [34399.766295] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: leaving the
>> lockspace group...
>> [34399.766421] udevd[499]: seq 3680 queued, 'offline' 'dlm'
>> [34399.766549] udevd[499]: seq 3680 forked new worker [6319]
>> [34399.767080] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: group event done 0 0
>> [34399.767297] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e:
>> release_lockspace final free
>> [34399.767320] md: unbind<dm-1>
>> [34399.795574] md: export_rdev(dm-1)
>> [34399.795640] md: unbind<dm-0>
>> [34399.803565] md: export_rdev(dm-0)
>> --snip--
>>
>>
>> On the other machines where the md array stopped correctly (removing
>> the entries from /dev and /sys) I do see the 'remove' events with
>> "udevadm monitor". What produces those remove events? Is that
>> something directly from the mdadm tool, or indirectly as part of the
>> stop/tear-down that mdadm initiates?
>
> REMOVE events are generated by
>
> md_free() -> del_gendisk() ->  blk_unregister_queue()
>
> which should happen when the md object is finally released.
> If there is no CHANGE event, then nothing should stop this from
> happening.
>
> NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: MD Remnants After --stop
  2016-11-26 16:41                 ` Marc Smith
@ 2016-11-27 22:20                   ` NeilBrown
  2016-11-28  2:25                     ` Marc Smith
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2016-11-27 22:20 UTC (permalink / raw)
  To: Marc Smith; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4146 bytes --]

On Sun, Nov 27 2016, Marc Smith wrote:

> So, I modified mdopen.c to look like this:
>
> --- a/mdopen.c    2016-11-25 17:04:25.782299330 -0500
> +++ b/mdopen.c    2016-11-26 10:57:35.883621355 -0500
> @@ -416,7 +416,7 @@
>   */
>  int open_mddev(char *dev, int report_errors)
>  {
> -    int mdfd = open(dev, O_RDWR);
> +    int mdfd = open(dev, O_RDONLY);
>      if (mdfd < 0 && errno == EACCES)
>          mdfd = open(dev, O_RDONLY);
>      if (mdfd < 0) {
>
>
> And now, when running 'mdadm --stop' here is what I see...
>
> From the output 'udevadm monitor -pku':
>
> --snip--
> KERNEL[297486.536908] offline
> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
> ACTION=offline
> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
> SEQNUM=3651
> SUBSYSTEM=dlm
>
> UDEV  [297486.537541] offline
> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
> ACTION=offline
> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
> SEQNUM=3651
> SUBSYSTEM=dlm
> USEC_INITIALIZED=7486537404
>
> KERNEL[297486.538325] remove
> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
> ACTION=remove
> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
> SEQNUM=3652
> SUBSYSTEM=dlm
>
> UDEV  [297486.538644] remove
> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
> ACTION=remove
> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
> SEQNUM=3652
> SUBSYSTEM=dlm
> USEC_INITIALIZED=86538345
> --snip--
>
> And from the kernel log:
>
> --snip--
> [297504.958244] md127: detected capacity change from 73340747776 to 0
> [297504.958249] md: md127 stopped.
> [297504.958884] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: leaving the
> lockspace group...
> [297504.959004] udevd[487]: seq 3651 queued, 'offline' 'dlm'
> [297504.959161] udevd[487]: seq 3651 forked new worker [5392]
> [297504.959417] udevd[5392]: seq 3651 running
> [297504.959474] udevd[5392]: no db file to read
> /run/udev/data/+dlm:62fccfd6-605f-19e6-be6d-99a1e3cb987e: No such file
> or directory
> [297504.959524] udevd[5392]: passed device to netlink monitor 0x2251c30
> [297504.959527] udevd[5392]: seq 3651 processed
> [297504.960101] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: group event done 0 0
> [297504.960299] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e:
> release_lockspace final free
> [297504.960329] md: unbind<dm-0>
> [297504.960448] udevd[487]: seq 3652 queued, 'remove' 'dlm'
> [297504.960500] udevd[487]: passed 214 byte device to netlink monitor 0x224b130
> [297504.960584] udevd[5392]: seq 3652 running
> [297504.960606] udevd[5392]: no db file to read
> /run/udev/data/+dlm:62fccfd6-605f-19e6-be6d-99a1e3cb987e: No such file
> or directory
> [297504.967168] md: export_rdev(dm-0)
> [297504.967231] md: unbind<dm-1>
> [297504.975176] md: export_rdev(dm-1)
> --snip--
>
> So that did get rid of the synthesized CHANGE event, but still no
> REMOVE event. =)
>
> Still trying to rule-out there isn't anything strange with my Linux
> distro / setup... but I assume even if udev was mishandling something,
> we should still be seeing a REMOVE event on '--stop'.
>

Getting rid of the CHANGE events is good as they can muddy the waters a
bit.

The lack of a REMOVE event suggests that something is still holding a
reference to the 'md' kobject.  There aren't many references though.
One is held when the array is active.  That is dropped by
mddev_delayed_delete() which is called from a work-queue.  So it can be
delayed a little bit, but not much.
The other references are held by member devices, and dropped when the
devices are unbound from the array - and the kernel log shows the
"unbind" messages.

In either case, if there is a reference outstanding, there should be
some evidence in the /sys/block/mdXX/md directory.  Either a 'dev-XX'
directory or the various bitmap attributes.

Can you report the result of

  find /sys/block/md127/md
and
  find /sys/block/md127/md -type f | xargs grep .

after stopping the array?
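(Editor's aside, not part of the original mail: the find | xargs grep . pipeline is handy here because grep prints each attribute file's name alongside its contents. A generic illustration on a scratch directory standing in for /sys/block/md127/md — the paths below are made up for the demo:)

```shell
# Build a scratch tree shaped like an md sysfs directory.
dir=$(mktemp -d)
mkdir -p "$dir/bitmap"
printf 'clear\n' > "$dir/array_state"
printf 'none\n'  > "$dir/bitmap/location"

# Same shape as: find /sys/block/md127/md -type f | xargs grep .
# Prints "path:content" for every non-empty attribute file.
out=$(find "$dir" -type f | xargs grep .)
echo "$out"
rm -r "$dir"
```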

Thanks,
NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: MD Remnants After --stop
  2016-11-27 22:20                   ` NeilBrown
@ 2016-11-28  2:25                     ` Marc Smith
  2016-12-01  2:52                       ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-11-28  2:25 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Sun, Nov 27, 2016 at 5:20 PM, NeilBrown <neilb@suse.com> wrote:
> On Sun, Nov 27 2016, Marc Smith wrote:
>
>> So, I modified mdopen.c to look like this:
>>
>> --- a/mdopen.c    2016-11-25 17:04:25.782299330 -0500
>> +++ b/mdopen.c    2016-11-26 10:57:35.883621355 -0500
>> @@ -416,7 +416,7 @@
>>   */
>>  int open_mddev(char *dev, int report_errors)
>>  {
>> -    int mdfd = open(dev, O_RDWR);
>> +    int mdfd = open(dev, O_RDONLY);
>>      if (mdfd < 0 && errno == EACCES)
>>          mdfd = open(dev, O_RDONLY);
>>      if (mdfd < 0) {
>>
>>
>> And now, when running 'mdadm --stop' here is what I see...
>>
>> From the output 'udevadm monitor -pku':
>>
>> --snip--
>> KERNEL[297486.536908] offline
>> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
>> ACTION=offline
>> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> SEQNUM=3651
>> SUBSYSTEM=dlm
>>
>> UDEV  [297486.537541] offline
>> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
>> ACTION=offline
>> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> SEQNUM=3651
>> SUBSYSTEM=dlm
>> USEC_INITIALIZED=7486537404
>>
>> KERNEL[297486.538325] remove
>> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
>> ACTION=remove
>> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> SEQNUM=3652
>> SUBSYSTEM=dlm
>>
>> UDEV  [297486.538644] remove
>> /kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e (dlm)
>> ACTION=remove
>> DEVPATH=/kernel/dlm/62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> LOCKSPACE=62fccfd6-605f-19e6-be6d-99a1e3cb987e
>> SEQNUM=3652
>> SUBSYSTEM=dlm
>> USEC_INITIALIZED=86538345
>> --snip--
>>
>> And from the kernel log:
>>
>> --snip--
>> [297504.958244] md127: detected capacity change from 73340747776 to 0
>> [297504.958249] md: md127 stopped.
>> [297504.958884] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: leaving the
>> lockspace group...
>> [297504.959004] udevd[487]: seq 3651 queued, 'offline' 'dlm'
>> [297504.959161] udevd[487]: seq 3651 forked new worker [5392]
>> [297504.959417] udevd[5392]: seq 3651 running
>> [297504.959474] udevd[5392]: no db file to read
>> /run/udev/data/+dlm:62fccfd6-605f-19e6-be6d-99a1e3cb987e: No such file
>> or directory
>> [297504.959524] udevd[5392]: passed device to netlink monitor 0x2251c30
>> [297504.959527] udevd[5392]: seq 3651 processed
>> [297504.960101] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e: group event done 0 0
>> [297504.960299] dlm: 62fccfd6-605f-19e6-be6d-99a1e3cb987e:
>> release_lockspace final free
>> [297504.960329] md: unbind<dm-0>
>> [297504.960448] udevd[487]: seq 3652 queued, 'remove' 'dlm'
>> [297504.960500] udevd[487]: passed 214 byte device to netlink monitor 0x224b130
>> [297504.960584] udevd[5392]: seq 3652 running
>> [297504.960606] udevd[5392]: no db file to read
>> /run/udev/data/+dlm:62fccfd6-605f-19e6-be6d-99a1e3cb987e: No such file
>> or directory
>> [297504.967168] md: export_rdev(dm-0)
>> [297504.967231] md: unbind<dm-1>
>> [297504.975176] md: export_rdev(dm-1)
>> --snip--
>>
>> So that did get rid of the synthesized CHANGE event, but still no
>> REMOVE event. =)
>>
>> Still trying to rule-out there isn't anything strange with my Linux
>> distro / setup... but I assume even if udev was mishandling something,
>> we should still be seeing a REMOVE event on '--stop'.
>>
>
> Getting rid of the CHANGE events is good as they can muddy the waters a
> bit.
>
> The lack of a REMOVE event suggests that something it still holding a
> reference to the 'md' kobject.  There aren't many references though.
> One is held when the array is active.  That is dropped by
> mddev_delayed_delete() which is called from a work-queue.  So it can be
> delayed a little bit, but not much.
> The other references are held by member devices, and dropped when the
> devices are unbound from the array - and the kernel log shows the
> "unbind" messages.
>
> In either case, if there is a reference outstanding, there should be
> some evidence in the /sys/mdXX/md directory.  Either a 'dev-XX'
> directory or the various bitmap attributes..
>
> Can you report the result of
>
>   find /sys/block/md127/md
> and
>   find /sys/block/md127/md -type f | xargs grep .
>
> after stopping the array?

# find /sys/block/md127/md
/sys/block/md127/md
/sys/block/md127/md/reshape_position
/sys/block/md127/md/layout
/sys/block/md127/md/raid_disks
/sys/block/md127/md/bitmap
/sys/block/md127/md/bitmap/chunksize
/sys/block/md127/md/bitmap/space
/sys/block/md127/md/bitmap/can_clear
/sys/block/md127/md/bitmap/time_base
/sys/block/md127/md/bitmap/metadata
/sys/block/md127/md/bitmap/backlog
/sys/block/md127/md/bitmap/location
/sys/block/md127/md/bitmap/max_backlog_used
/sys/block/md127/md/array_size
/sys/block/md127/md/max_read_errors
/sys/block/md127/md/metadata_version
/sys/block/md127/md/component_size
/sys/block/md127/md/reshape_direction
/sys/block/md127/md/resync_start
/sys/block/md127/md/new_dev
/sys/block/md127/md/safe_mode_delay
/sys/block/md127/md/level
/sys/block/md127/md/chunk_size
/sys/block/md127/md/array_state


# find /sys/block/md127/md -type f | xargs grep .
/sys/block/md127/md/reshape_position:none
/sys/block/md127/md/layout:0
/sys/block/md127/md/bitmap/chunksize:0
/sys/block/md127/md/bitmap/space:0
/sys/block/md127/md/bitmap/time_base:0
/sys/block/md127/md/bitmap/metadata:internal
/sys/block/md127/md/bitmap/backlog:0
/sys/block/md127/md/bitmap/location:none
/sys/block/md127/md/bitmap/max_backlog_used:0
/sys/block/md127/md/array_size:default
/sys/block/md127/md/max_read_errors:20
/sys/block/md127/md/metadata_version:none
/sys/block/md127/md/component_size:0
/sys/block/md127/md/reshape_direction:forwards
/sys/block/md127/md/resync_start:0
grep: /sys/block/md127/md/new_dev: Permission denied
/sys/block/md127/md/safe_mode_delay:0.000
/sys/block/md127/md/chunk_size:0
/sys/block/md127/md/array_state:clear


--Marc

>
> Thanks,
> NeilBrown

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: MD Remnants After --stop
  2016-11-28  2:25                     ` Marc Smith
@ 2016-12-01  2:52                       ` NeilBrown
  2016-12-01 19:40                         ` Marc Smith
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2016-12-01  2:52 UTC (permalink / raw)
  To: Marc Smith; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1311 bytes --]

On Mon, Nov 28 2016, Marc Smith wrote:

>
> # find /sys/block/md127/md
> /sys/block/md127/md
> /sys/block/md127/md/reshape_position
> /sys/block/md127/md/layout
> /sys/block/md127/md/raid_disks
> /sys/block/md127/md/bitmap
> /sys/block/md127/md/bitmap/chunksize

This tells me that:
	sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
hasn't been run, so mddev_delayed_delete() hasn't run.
That suggests the final mddev_put() hasn't run, i.e. mddev->active is > 0

Everything else suggests that array has been stopped and cleaned and
should be gone...

This seems to suggest that there is an unbalanced mddev_get() without a
matching mddev_put().  I cannot find it though.

If I could reproduce it, I would try to see what is happening by:

- putting
   printk("mddev->active = %d\n", atomic_read(&mddev->active));
 at the top of mddev_put().  That shouldn't be *too* noisy.

- putting
   printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
   	list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);

  in mddev_put() just before those values are tested.

 - putting
     printk("queue_work\n");
   just before the 'queue_work()' call in mddev_put.

 - putting
     printk("mddev_delayed_delete\n");
   in mddev_delayed_delete()

Then see what gets printed when you stop the array.

NeilBrown

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]


* Re: MD Remnants After --stop
  2016-12-01  2:52                       ` NeilBrown
@ 2016-12-01 19:40                         ` Marc Smith
  2016-12-01 22:35                           ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-12-01 19:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
> On Mon, Nov 28 2016, Marc Smith wrote:
>
>>
>> # find /sys/block/md127/md
>> /sys/block/md127/md
>> /sys/block/md127/md/reshape_position
>> /sys/block/md127/md/layout
>> /sys/block/md127/md/raid_disks
>> /sys/block/md127/md/bitmap
>> /sys/block/md127/md/bitmap/chunksize
>
> This tells me that:
>         sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
> hasn't been run, so mddev_delayed_delete() hasn't run.
> That suggests the final mddev_put() hsn't run. i.e. mddev->active is > 0
>
> Everything else suggests that array has been stopped and cleaned and
> should be gone...
>
> This seems to suggest that there is an unbalanced mddev_get() without a
> matching mddev_put().  I cannot find it though.
>
> If I could reproduce it, I would try to see what is happening by:
>
> - putting
>    printk("mddev->active = %d\n", atomic_read(&mddev->active));
>  in the top of mddev_put().  That shouldn't be *too* noisy.
>
> - putting
>    printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>         list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>
>   in mddev_put() just before those values are tested.
>
>  - putting
>      printk("queue_work\n");
>    just before the 'queue_work()' call in mddev_put.
>
>  - putting
>      printk("mddev_delayed_delete\n");
>    in mddev_delayed_delete()
>
> Then see what gets printed when you stop the array.

I made those modifications to md.c and here is the kernel log when stopping:

--snip--
[ 3937.233487] mddev->active = 2
[ 3937.233503] mddev->active = 2
[ 3937.233509] mddev->active = 2
[ 3937.233516] mddev->active = 1
[ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
[ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
[ 3937.241489] md127: detected capacity change from 73340747776 to 0
[ 3937.241493] md: md127 stopped.
[ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
[ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
[ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
[ 3937.241989] udevd[4991]: seq 3631 running
[ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
lockspace group...
[ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
[ 3937.242068] mddev->active = 3
[ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
[ 3937.242080] mddev->active = 3
[ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
/usr/lib/udev/rules.d/69-bcache.rules:16
[ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
[ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
[ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
[ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
release_lockspace final free
[ 3937.242861] md: unbind<dm-1>
[ 3937.256606] md: export_rdev(dm-1)
[ 3937.256612] md: unbind<dm-0>
[ 3937.263601] md: export_rdev(dm-0)
[ 3937.263688] mddev->active = 4
[ 3937.263751] mddev->active = 3
--snip--

I didn't use my modified mdadm which stops the synthesized CHANGE from
occurring, but if needed, I can re-run the test using that.

I don't see any of the "queue_work" or "mddev_delayed_delete" messages
anywhere in the kernel logs. Here is how those monitoring lines are
set in md.c:

--snip--
static void mddev_put(struct mddev *mddev)
{
        struct bio_set *bs = NULL;
        printk("mddev->active = %d\n", atomic_read(&mddev->active));
        if (!atomic_dec_and_lock(&mddev->active, &all_mddevs_lock))
                return;
        printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
                list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
        if (!mddev->raid_disks && list_empty(&mddev->disks) &&
            mddev->ctime == 0 && !mddev->hold_active) {
                /* Array is not configured at all, and not held active,
                 * so destroy it */
                list_del_init(&mddev->all_mddevs);
                bs = mddev->bio_set;
                mddev->bio_set = NULL;
                if (mddev->gendisk) {
                        /* We did a probe so need to clean up.  Call
                         * queue_work inside the spinlock so that
                         * flush_workqueue() after mddev_find will
                         * succeed in waiting for the work to be done.
                         */
                        INIT_WORK(&mddev->del_work, mddev_delayed_delete);
                        printk("queue_work\n");
                        queue_work(md_misc_wq, &mddev->del_work);
                } else
                        kfree(mddev);
        }
        spin_unlock(&all_mddevs_lock);
        if (bs)
                bioset_free(bs);
}
--snip--

--snip--
static void mddev_delayed_delete(struct work_struct *ws)
{
        struct mddev *mddev = container_of(ws, struct mddev, del_work);
        printk("mddev_delayed_delete\n");
        sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
        kobject_del(&mddev->kobj);
        kobject_put(&mddev->kobj);
}
--snip--

Let me know if the printk() lines weren't placed in the proper spots
and I'll fix and re-run the test. Thanks for your time.


--Marc


>
> NeilBrown


* Re: MD Remnants After --stop
  2016-12-01 19:40                         ` Marc Smith
@ 2016-12-01 22:35                           ` NeilBrown
  2016-12-02 18:18                             ` Stephane Thiell
  2016-12-02 19:12                             ` Marc Smith
  0 siblings, 2 replies; 20+ messages in thread
From: NeilBrown @ 2016-12-01 22:35 UTC (permalink / raw)
  To: Marc Smith; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 4565 bytes --]

On Fri, Dec 02 2016, Marc Smith wrote:

> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
>> On Mon, Nov 28 2016, Marc Smith wrote:
>>
>>>
>>> # find /sys/block/md127/md
>>> /sys/block/md127/md
>>> /sys/block/md127/md/reshape_position
>>> /sys/block/md127/md/layout
>>> /sys/block/md127/md/raid_disks
>>> /sys/block/md127/md/bitmap
>>> /sys/block/md127/md/bitmap/chunksize
>>
>> This tells me that:
>>         sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>> hasn't been run, so mddev_delayed_delete() hasn't run.
>> That suggests the final mddev_put() hsn't run. i.e. mddev->active is > 0
>>
>> Everything else suggests that array has been stopped and cleaned and
>> should be gone...
>>
>> This seems to suggest that there is an unbalanced mddev_get() without a
>> matching mddev_put().  I cannot find it though.
>>
>> If I could reproduce it, I would try to see what is happening by:
>>
>> - putting
>>    printk("mddev->active = %d\n", atomic_read(&mddev->active));
>>  in the top of mddev_put().  That shouldn't be *too* noisy.
>>
>> - putting
>>    printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>>         list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>
>>   in mddev_put() just before those values are tested.
>>
>>  - putting
>>      printk("queue_work\n");
>>    just before the 'queue_work()' call in mddev_put.
>>
>>  - putting
>>      printk("mddev_delayed_delete\n");
>>    in mddev_delayed_delete()
>>
>> Then see what gets printed when you stop the array.
>
> I made those modifications to md.c and here is the kernel log when stopping:
>
> --snip--
> [ 3937.233487] mddev->active = 2
> [ 3937.233503] mddev->active = 2
> [ 3937.233509] mddev->active = 2
> [ 3937.233516] mddev->active = 1
> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0

At this point, mdadm has opened the /dev/md127 device, accessed a few
attributes via sysfs just to check on the status, and then closed it
again.
The array is still active, but we know that no other process has it
open.


> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
> [ 3937.241493] md: md127 stopped.

Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
Still nothing else has the array open.

> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
> [ 3937.241989] udevd[4991]: seq 3631 running
> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
> lockspace group...
> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
> [ 3937.242068] mddev->active = 3

But somehow the ->active count got up to 3.
mdadm probably still has it open, but two other things do too.
If you have "mdadm --monitor" running in the background (which is good)
it will temporarily increase, then decrease the count.
udevd opens the device temporarily too.
So this isn't necessarily a problem.

> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
> [ 3937.242080] mddev->active = 3
> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
> /usr/lib/udev/rules.d/69-bcache.rules:16
> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
> release_lockspace final free
> [ 3937.242861] md: unbind<dm-1>
> [ 3937.256606] md: export_rdev(dm-1)
> [ 3937.256612] md: unbind<dm-0>
> [ 3937.263601] md: export_rdev(dm-0)
> [ 3937.263688] mddev->active = 4
> [ 3937.263751] mddev->active = 3

But here, the active count only drops to 2 (it is decremented
after it is printed).  Assuming there really were no more messages like
this, there are two active references to the md device, and we don't
know what they are.

>
> I didn't use my modified mdadm which stops the synthesized CHANGE from
> occurring, but if needed, I can re-run the test using that.

It would be good to use the modified mdadm, if only to reduce the
noise.  It won't change the end result, but might make it easier to see
what is happening.
Also please add
   WARN_ON(1);

at the start of mddev_get() and mddev_put().
That will provide a stack trace whenever either of these is called, so
we can see who takes a reference, and who doesn't release it.

Thanks,
NeilBrown


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]


* Re: MD Remnants After --stop
  2016-12-01 22:35                           ` NeilBrown
@ 2016-12-02 18:18                             ` Stephane Thiell
  2016-12-02 19:12                             ` Marc Smith
  1 sibling, 0 replies; 20+ messages in thread
From: Stephane Thiell @ 2016-12-02 18:18 UTC (permalink / raw)
  To: NeilBrown; +Cc: Marc Smith, linux-raid

Hey,

Just wanted to jump in as I reported a very similar problem recently on:

https://github.com/neilbrown/mdadm/issues/29

Although it’s much better with mdadm 3.4 in my case, I still occasionally get a remnant md device after --stop ...

Please let me know what you think.

Best,

Stephan

> On Dec 1, 2016, at 2:35 PM, NeilBrown <neilb@suse.com> wrote:
> 
> On Fri, Dec 02 2016, Marc Smith wrote:
> 
>> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Mon, Nov 28 2016, Marc Smith wrote:
>>> 
>>>> 
>>>> # find /sys/block/md127/md
>>>> /sys/block/md127/md
>>>> /sys/block/md127/md/reshape_position
>>>> /sys/block/md127/md/layout
>>>> /sys/block/md127/md/raid_disks
>>>> /sys/block/md127/md/bitmap
>>>> /sys/block/md127/md/bitmap/chunksize
>>> 
>>> This tells me that:
>>>        sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>>> hasn't been run, so mddev_delayed_delete() hasn't run.
>>> That suggests the final mddev_put() hsn't run. i.e. mddev->active is > 0
>>> 
>>> Everything else suggests that array has been stopped and cleaned and
>>> should be gone...
>>> 
>>> This seems to suggest that there is an unbalanced mddev_get() without a
>>> matching mddev_put().  I cannot find it though.
>>> 
>>> If I could reproduce it, I would try to see what is happening by:
>>> 
>>> - putting
>>>   printk("mddev->active = %d\n", atomic_read(&mddev->active));
>>> in the top of mddev_put().  That shouldn't be *too* noisy.
>>> 
>>> - putting
>>>   printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>>>        list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>> 
>>>  in mddev_put() just before those values are tested.
>>> 
>>> - putting
>>>     printk("queue_work\n");
>>>   just before the 'queue_work()' call in mddev_put.
>>> 
>>> - putting
>>>     printk("mddev_delayed_delete\n");
>>>   in mddev_delayed_delete()
>>> 
>>> Then see what gets printed when you stop the array.
>> 
>> I made those modifications to md.c and here is the kernel log when stopping:
>> 
>> --snip--
>> [ 3937.233487] mddev->active = 2
>> [ 3937.233503] mddev->active = 2
>> [ 3937.233509] mddev->active = 2
>> [ 3937.233516] mddev->active = 1
>> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
> 
> At this point, mdadm has opened the /dev/md127 device, accessed a few
> attributes via sysfs just to check on the status, and then closed it
> again.
> The array is still active, but we know that no other process has it
> open.
> 
> 
>> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
>> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
>> [ 3937.241493] md: md127 stopped.
> 
> Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
> Still nothing else has the array open.
> 
>> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
>> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
>> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
>> [ 3937.241989] udevd[4991]: seq 3631 running
>> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
>> lockspace group...
>> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
>> [ 3937.242068] mddev->active = 3
> 
> But somehow the ->active count got up to 3.
> mdadm probably still has it open, but two other things do too.
> If you have "mdadm --monitor" running in the background (which is good)
> it will temporarily increase, then decrease the count.
> udevd opens the device temporarily too.
> So this isn't necessarily a problem.
> 
>> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
>> [ 3937.242080] mddev->active = 3
>> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
>> /usr/lib/udev/rules.d/69-bcache.rules:16
>> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
>> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
>> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
>> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
>> release_lockspace final free
>> [ 3937.242861] md: unbind<dm-1>
>> [ 3937.256606] md: export_rdev(dm-1)
>> [ 3937.256612] md: unbind<dm-0>
>> [ 3937.263601] md: export_rdev(dm-0)
>> [ 3937.263688] mddev->active = 4
>> [ 3937.263751] mddev->active = 3
> 
> But here, the active count only drops down to 2. (it is decremented
> after it is printed).  Assuming there really were no more messages like
> this, there are two active references to the md device, and we don't
> know what they are.
> 
>> 
>> I didn't use my modified mdadm which stops the synthesized CHANGE from
>> occurring, but if needed, I can re-run the test using that.
> 
> It would be good to use the modified mdadm, if only to reduce the
> noise.  It won't change the end result, but might make it easier to see
> what is happening.
> Also please add
>   WARN_ON(1);
> 
> in the start of mddev_get() and mddev_put().
> That will provide a stack trace whenever either of these are called, so
> we can see who takes a references, and who doesn't release it.
> 
> Thanks,
> NeilBrown



* Re: MD Remnants After --stop
  2016-12-01 22:35                           ` NeilBrown
  2016-12-02 18:18                             ` Stephane Thiell
@ 2016-12-02 19:12                             ` Marc Smith
  2016-12-02 20:22                               ` Marc Smith
  1 sibling, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-12-02 19:12 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Thu, Dec 1, 2016 at 5:35 PM, NeilBrown <neilb@suse.com> wrote:
> On Fri, Dec 02 2016, Marc Smith wrote:
>
>> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
>>> On Mon, Nov 28 2016, Marc Smith wrote:
>>>
>>>>
>>>> # find /sys/block/md127/md
>>>> /sys/block/md127/md
>>>> /sys/block/md127/md/reshape_position
>>>> /sys/block/md127/md/layout
>>>> /sys/block/md127/md/raid_disks
>>>> /sys/block/md127/md/bitmap
>>>> /sys/block/md127/md/bitmap/chunksize
>>>
>>> This tells me that:
>>>         sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>>> hasn't been run, so mddev_delayed_delete() hasn't run.
>>> That suggests the final mddev_put() hsn't run. i.e. mddev->active is > 0
>>>
>>> Everything else suggests that array has been stopped and cleaned and
>>> should be gone...
>>>
>>> This seems to suggest that there is an unbalanced mddev_get() without a
>>> matching mddev_put().  I cannot find it though.
>>>
>>> If I could reproduce it, I would try to see what is happening by:
>>>
>>> - putting
>>>    printk("mddev->active = %d\n", atomic_read(&mddev->active));
>>>  in the top of mddev_put().  That shouldn't be *too* noisy.
>>>
>>> - putting
>>>    printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>>>         list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>>
>>>   in mddev_put() just before those values are tested.
>>>
>>>  - putting
>>>      printk("queue_work\n");
>>>    just before the 'queue_work()' call in mddev_put.
>>>
>>>  - putting
>>>      printk("mddev_delayed_delete\n");
>>>    in mddev_delayed_delete()
>>>
>>> Then see what gets printed when you stop the array.
>>
>> I made those modifications to md.c and here is the kernel log when stopping:
>>
>> --snip--
>> [ 3937.233487] mddev->active = 2
>> [ 3937.233503] mddev->active = 2
>> [ 3937.233509] mddev->active = 2
>> [ 3937.233516] mddev->active = 1
>> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
>
> At this point, mdadm has opened the /dev/md127 device, accessed a few
> attributes via sysfs just to check on the status, and then closed it
> again.
> The array is still active, but we know that no other process has it
> open.
>
>
>> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
>> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
>> [ 3937.241493] md: md127 stopped.
>
> Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
> Still nothing else has the array open.
>
>> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
>> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
>> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
>> [ 3937.241989] udevd[4991]: seq 3631 running
>> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
>> lockspace group...
>> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
>> [ 3937.242068] mddev->active = 3
>
> But somehow the ->active count got up to 3.
> mdadm probably still has it open, but two other things do too.
> If you have "mdadm --monitor" running in the background (which is good)
> it will temporarily increase, then decrease the count.
> udevd opens the device temporarily too.
> So this isn't necessarily a problem.
>
>> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
>> [ 3937.242080] mddev->active = 3
>> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
>> /usr/lib/udev/rules.d/69-bcache.rules:16
>> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
>> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
>> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
>> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
>> release_lockspace final free
>> [ 3937.242861] md: unbind<dm-1>
>> [ 3937.256606] md: export_rdev(dm-1)
>> [ 3937.256612] md: unbind<dm-0>
>> [ 3937.263601] md: export_rdev(dm-0)
>> [ 3937.263688] mddev->active = 4
>> [ 3937.263751] mddev->active = 3
>
> But here, the active count only drops down to 2. (it is decremented
> after it is printed).  Assuming there really were no more messages like
> this, there are two active references to the md device, and we don't
> know what they are.
>
>>
>> I didn't use my modified mdadm which stops the synthesized CHANGE from
>> occurring, but if needed, I can re-run the test using that.
>
> It would be good to use the modified mdadm, if only to reduce the
> noise.  It won't change the end result, but might make it easier to see
> what is happening.
> Also please add
>    WARN_ON(1);
>
> in the start of mddev_get() and mddev_put().
> That will provide a stack trace whenever either of these are called, so
> we can see who takes a references, and who doesn't release it.

Okay, I added that to both functions, and now I can't get stopping the
array to misbehave (eg, not generate the REMOVE event). I've been
trying all morning! I literally just added the WARN_ON(1) to those two
functions, and that's all I changed. I compiled and reinstalled the image,
no other changes. I've tried quite a few times now to reproduce this,
and I'm failing to do so -- every time the REMOVE event is generated
and everything is removed correctly.

I'm going to switch back to the previous image and confirm it's
reproducible with that.

--Marc


>
> Thanks,
> NeilBrown
>


* Re: MD Remnants After --stop
  2016-12-02 19:12                             ` Marc Smith
@ 2016-12-02 20:22                               ` Marc Smith
  2016-12-05  0:41                                 ` NeilBrown
  0 siblings, 1 reply; 20+ messages in thread
From: Marc Smith @ 2016-12-02 20:22 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

Finally, I got it! Why is it that when I want it to break, it doesn't? =)

I will say, using the modified mdadm that prevents the synthesized
CHANGE event, it seems to not induce the problem as regularly.

Below are the kernel logs after stopping an array:

--snip--
[16438.999544] ------------[ cut here ]------------
[16438.999554] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
mddev_find+0x85/0x205
[16438.999567] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16438.999592] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16438.999593] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16438.999595]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16438.999598]  ffffffff81065522 0000000000000000 000000000090007e
ffff8803182d4000
[16438.999601]  000000000000007e 0000000000000009 0000000000000001
ffffffff81875356
[16438.999604] Call Trace:
[16438.999612]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16438.999618]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16438.999620]  [<ffffffff81875356>] ? mddev_find+0x85/0x205
[16438.999623]  [<ffffffff8187978e>] ? md_open+0x10/0x9a
[16438.999628]  [<ffffffff811513c3>] ? __blkdev_get+0xc3/0x345
[16438.999634]  [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16438.999641]  [<ffffffff811517f2>] ? blkdev_get+0x1ad/0x2ad
[16438.999647]  [<ffffffff81132a3f>] ? walk_component+0x36/0x20f
[16438.999653]  [<ffffffff81150501>] ? bdgrab+0xd/0x12
[16438.999659]  [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16438.999666]  [<ffffffff8112664e>] ? do_dentry_open.isra.16+0x1b2/0x28a
[16438.999672]  [<ffffffff81134dd9>] ? path_openat+0xcc7/0xeb1
[16438.999677]  [<ffffffff8113500b>] ? do_filp_open+0x48/0x9e
[16438.999680]  [<ffffffff8113a058>] ? dput+0x21/0x1cb
[16438.999683]  [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16438.999685]  [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16438.999690]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16438.999692] ---[ end trace a0868b86aec8f14c ]---
[16438.999743] ------------[ cut here ]------------
[16438.999747] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16438.999748] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16438.999763] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16438.999764] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16438.999765]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16438.999768]  ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fb80
[16438.999771]  ffff8803182fc000 ffffc9000ff17f30 ffff8805c7871998
ffffffff818792a0
[16438.999774] Call Trace:
[16438.999777]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16438.999780]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16438.999782]  [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16438.999787]  [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16438.999789]  [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16438.999792]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16438.999795]  [<ffffffff811840ca>] ? kernfs_iop_get_link+0x14b/0x17b
[16438.999798]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16438.999800]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16438.999803]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16438.999804] ---[ end trace a0868b86aec8f14d ]---
[16438.999806] ------------[ cut here ]------------
[16438.999809] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16438.999809] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16438.999825] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16438.999826] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16438.999827]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16438.999830]  ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fb80
[16438.999833]  ffff8803182fc000 ffffc9000ff17f30 ffff8805c7871998
ffffffff818781b3
[16438.999836] Call Trace:
[16438.999838]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16438.999840]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16438.999843]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16438.999845]  [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16438.999847]  [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16438.999850]  [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16438.999852]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16438.999855]  [<ffffffff811840ca>] ? kernfs_iop_get_link+0x14b/0x17b
[16438.999857]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16438.999860]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16438.999862]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16438.999864] ---[ end trace a0868b86aec8f14e ]---
[16438.999864] mddev->active = 2
[16438.999884] ------------[ cut here ]------------
[16438.999887] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16438.999887] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16438.999902] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16438.999903] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16438.999904]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16438.999907]  ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fd10
[16438.999910]  ffff8805c8def000 ffff8805c96b7e00 0000000000000001
ffffffff818792a0
[16438.999913] Call Trace:
[16438.999916]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16438.999918]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16438.999920]  [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16438.999923]  [<ffffffff811844cf>] ? sysfs_kf_seq_show+0x7a/0xc4
[16438.999927]  [<ffffffff81143d99>] ? seq_read+0x16c/0x323
[16438.999929]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16438.999931]  [<ffffffff8113a067>] ? dput+0x30/0x1cb
[16438.999934]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16438.999936]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16438.999939]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16438.999941] ---[ end trace a0868b86aec8f14f ]---
[16438.999942] ------------[ cut here ]------------
[16438.999945] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16438.999952] Modules linked in:
[16438.999958]  fcst(O) scst_changer(O) scst_tape(O) scst_vdisk(O)
scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O) scst(O) qla2xxx
bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs ib_srp iw_nes
iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000060] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16439.000061] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000061]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000064]  ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fd10
[16439.000067]  ffff8805c8def000 ffff8805c96b7e00 0000000000000001
ffffffff818781b3
[16439.000070] Call Trace:
[16439.000073]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000075]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000077]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.000080]  [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16439.000082]  [<ffffffff811844cf>] ? sysfs_kf_seq_show+0x7a/0xc4
[16439.000085]  [<ffffffff81143d99>] ? seq_read+0x16c/0x323
[16439.000087]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.000089]  [<ffffffff8113a067>] ? dput+0x30/0x1cb
[16439.000092]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.000095]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.000097]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000099] ---[ end trace a0868b86aec8f150 ]---
[16439.000099] mddev->active = 2
[16439.000111] ------------[ cut here ]------------
[16439.000114] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16439.000115] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000130] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16439.000131] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000132]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000135]  ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fba0
[16439.000138]  ffff8805c8def000 ffff8805c96b7100 0000000000000001
ffffffff818792a0
[16439.000141] Call Trace:
[16439.000144]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000146]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000148]  [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16439.000151]  [<ffffffff811844cf>] ? sysfs_kf_seq_show+0x7a/0xc4
[16439.000153]  [<ffffffff81143d99>] ? seq_read+0x16c/0x323
[16439.000156]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.000158]  [<ffffffff8113a166>] ? dput+0x12f/0x1cb
[16439.000161]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.000163]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.000166]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000167] ---[ end trace a0868b86aec8f151 ]---
[16439.000169] ------------[ cut here ]------------
[16439.000171] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.000172] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000187] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16439.000188] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000189]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000192]  ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fba0
[16439.000195]  ffff8805c8def000 ffff8805c96b7100 0000000000000001
ffffffff818781b3
[16439.000198] Call Trace:
[16439.000201]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000203]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000205]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.000207]  [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16439.000210]  [<ffffffff811844cf>] ? sysfs_kf_seq_show+0x7a/0xc4
[16439.000212]  [<ffffffff81143d99>] ? seq_read+0x16c/0x323
[16439.000215]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.000217]  [<ffffffff8113a166>] ? dput+0x12f/0x1cb
[16439.000219]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.000222]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.000224]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000226] ---[ end trace a0868b86aec8f152 ]---
[16439.000227] mddev->active = 2
[16439.000236] ------------[ cut here ]------------
[16439.000239] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.000240] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000255] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16439.000256] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000257]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000260]  ffffffff81065522 ffff8803182d4000 ffff880622d20aa0
ffff8803182ae800
[16439.000263]  ffff880622d209d8 ffff880622d20b20 ffff88030f854ac8
ffffffff818781b3
[16439.000266] Call Trace:
[16439.000269]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000271]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000273]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.000276]  [<ffffffff81151258>] ? __blkdev_put+0x11c/0x1c4
[16439.000278]  [<ffffffff81151aff>] ? blkdev_close+0x1c/0x1f
[16439.000280]  [<ffffffff81129c69>] ? __fput+0xd8/0x18a
[16439.000285]  [<ffffffff8107990c>] ? task_work_run+0x5d/0x73
[16439.000288]  [<ffffffff81001048>] ? exit_to_usermode_loop+0x48/0x5d
[16439.000290]  [<ffffffff8100135c>] ? syscall_return_slowpath+0x3a/0x4c
[16439.000292]  [<ffffffff81a7da9f>] ? entry_SYSCALL_64_fastpath+0x92/0x94
[16439.000294] ---[ end trace a0868b86aec8f153 ]---
[16439.000295] mddev->active = 1
[16439.000296] rd=2 empty=0 ctime=1480694644 hold=0
[16439.000302] ------------[ cut here ]------------
[16439.000305] WARNING: CPU: 4 PID: 31175 at drivers/md/md.c:449
mddev_find+0x85/0x205
[16439.000305] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000321] CPU: 4 PID: 31175 Comm: mdadm Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16439.000322] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000323]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000326]  ffffffff81065522 0000000000000000 000000000090007e
ffff8803182d4000
[16439.000329]  000000000000007e 0000000000000009 0000000000000001
ffffffff81875356
[16439.000332] Call Trace:
[16439.000334]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000336]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000338]  [<ffffffff81875356>] ? mddev_find+0x85/0x205
[16439.000341]  [<ffffffff8187978e>] ? md_open+0x10/0x9a
[16439.000343]  [<ffffffff811513c3>] ? __blkdev_get+0xc3/0x345
[16439.000345]  [<ffffffff811517f2>] ? blkdev_get+0x1ad/0x2ad
[16439.000348]  [<ffffffff81150501>] ? bdgrab+0xd/0x12
[16439.000350]  [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.000353]  [<ffffffff8112664e>] ? do_dentry_open.isra.16+0x1b2/0x28a
[16439.000355]  [<ffffffff81134dd9>] ? path_openat+0xcc7/0xeb1
[16439.000359]  [<ffffffff8109b30f>] ? console_unlock+0x254/0x46c
[16439.000362]  [<ffffffff8113500b>] ? do_filp_open+0x48/0x9e
[16439.000364]  [<ffffffff8113a058>] ? dput+0x21/0x1cb
[16439.000367]  [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.000369]  [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.000372]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000374] ---[ end trace a0868b86aec8f154 ]---
[16439.000405] udevd[494]: inotify event: 8 for /dev/md126
[16439.000420] ------------[ cut here ]------------
[16439.000427] WARNING: CPU: 11 PID: 494 at drivers/md/md.c:449
mddev_find+0x85/0x205
[16439.000427] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.000449] CPU: 11 PID: 494 Comm: udevd Tainted: G        W  O
4.9.0-rc3-esos.prod #1
[16439.000450] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.000452]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.000455]  ffffffff81065522 0000000000000000 000000000090007e
ffff8803182d4000
[16439.000458]  000000000000007e 0000000000000009 0000000000000001
ffffffff81875356
[16439.000461] Call Trace:
[16439.000467]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.000471]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.000474]  [<ffffffff81875356>] ? mddev_find+0x85/0x205
[16439.000476]  [<ffffffff8187978e>] ? md_open+0x10/0x9a
[16439.000480]  [<ffffffff81151522>] ? __blkdev_get+0x222/0x345
[16439.000483]  [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.000485]  [<ffffffff811517f2>] ? blkdev_get+0x1ad/0x2ad
[16439.000488]  [<ffffffff81132aab>] ? walk_component+0xa2/0x20f
[16439.000490]  [<ffffffff81150501>] ? bdgrab+0xd/0x12
[16439.000493]  [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.000496]  [<ffffffff8112664e>] ? do_dentry_open.isra.16+0x1b2/0x28a
[16439.000498]  [<ffffffff81134dd9>] ? path_openat+0xcc7/0xeb1
[16439.000500]  [<ffffffff81132400>] ? lookup_fast+0x1c0/0x267
[16439.000503]  [<ffffffff8113a067>] ? dput+0x30/0x1cb
[16439.000505]  [<ffffffff8113316c>] ? path_lookupat+0xea/0xfe
[16439.000507]  [<ffffffff8113500b>] ? do_filp_open+0x48/0x9e
[16439.000510]  [<ffffffff8113cf32>] ? current_time+0x54/0x5d
[16439.000514]  [<ffffffff811840ca>] ? kernfs_iop_get_link+0x14b/0x17b
[16439.000516]  [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.000518]  [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.000523]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.000524] ---[ end trace a0868b86aec8f155 ]---
[16439.009255] md126: detected capacity change from 73340747776 to 0
[16439.009259] md: md126 stopped.
[16439.009419] dlm: d8c5a2a8-4fbe-7f67-ab89-d1834f978d2a: leaving the
lockspace group...
[16439.009424] udevd[494]: device /dev/md126 closed, synthesising 'change'
[16439.009512] udevd[494]: seq 3817 queued, 'offline' 'dlm'
[16439.009709] udevd[494]: seq 3817 forked new worker [31176]
[16439.009762] udevd[494]: seq 3818 queued, 'change' 'block'
[16439.009882] udevd[494]: seq 3818 forked new worker [31177]
[16439.009904] udevd[31176]: seq 3817 running
[16439.009955] udevd[31176]: no db file to read
/run/udev/data/+dlm:d8c5a2a8-4fbe-7f67-ab89-d1834f978d2a: No such file
or directory
[16439.010002] udevd[31176]: passed device to netlink monitor 0xdee2c0
[16439.010005] udevd[31176]: seq 3817 processed
[16439.010255] ------------[ cut here ]------------
[16439.010260] WARNING: CPU: 20 PID: 31177 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16439.010261] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.010280] CPU: 20 PID: 31177 Comm: udevd Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.010281] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.010282]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.010286]  ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fb80
[16439.010289]  ffff8806198f8000 ffffc9000f66ff30 ffff8805c95c7ed8
ffffffff818792a0
[16439.010292] Call Trace:
[16439.010296]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.010299]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.010301]  [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16439.010304]  [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16439.010306]  [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16439.010309]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.010312]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.010315]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.010318]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.010319] ---[ end trace a0868b86aec8f156 ]---
[16439.010321] ------------[ cut here ]------------
[16439.010324] WARNING: CPU: 20 PID: 31177 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.010324] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.010340] CPU: 20 PID: 31177 Comm: udevd Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.010341] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.010342]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.010345]  ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fb80
[16439.010348]  ffff8806198f8000 ffffc9000f66ff30 ffff8805c95c7ed8
ffffffff818781b3
[16439.010351] Call Trace:
[16439.010354]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.010356]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.010358]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.010360]  [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16439.010363]  [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16439.010365]  [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16439.010368]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.010370]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.010373]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.010375]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.010377] ---[ end trace a0868b86aec8f157 ]---
[16439.010377] mddev->active = 3
[16439.010385] dlm: d8c5a2a8-4fbe-7f67-ab89-d1834f978d2a: group event done 0 0
[16439.010399] ------------[ cut here ]------------
[16439.010402] WARNING: CPU: 20 PID: 31177 at drivers/md/md.c:449
md_attr_show+0x61/0x8f
[16439.010403] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.010419] CPU: 20 PID: 31177 Comm: udevd Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.010420] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.010421]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.010424]  ffffffff81065522 ffff8803182d4048 ffff8803182d4000
ffffffff8209fc20
[16439.010426]  ffff8806198f8000 ffffc9000f66ff30 ffff8805c95c7ed8
ffffffff818792a0
[16439.010429] Call Trace:
[16439.010432]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.010434]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.010436]  [<ffffffff818792a0>] ? md_attr_show+0x61/0x8f
[16439.010439]  [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16439.010441]  [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16439.010444]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.010447]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.010449]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.010452]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.010453] ---[ end trace a0868b86aec8f158 ]---
[16439.010455] ------------[ cut here ]------------
[16439.010457] WARNING: CPU: 20 PID: 31177 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.010458] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.010473] CPU: 20 PID: 31177 Comm: udevd Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.010474] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.010475]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.010478]  ffffffff81065522 ffff8803182d4000 ffff8803182d4000
ffffffff8209fc20
[16439.010480]  ffff8806198f8000 ffffc9000f66ff30 ffff8805c95c7ed8
ffffffff818781b3
[16439.010483] Call Trace:
[16439.010486]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.010488]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.010491]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.010493]  [<ffffffff818792c4>] ? md_attr_show+0x85/0x8f
[16439.010495]  [<ffffffff811843ea>] ? sysfs_kf_read+0x61/0x97
[16439.010498]  [<ffffffff81183a52>] ? kernfs_fop_read+0xdc/0x13e
[16439.010501]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.010503]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.010506]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.010508]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.010510] ---[ end trace a0868b86aec8f159 ]---
[16439.010511] mddev->active = 3
[16439.010600] dlm: d8c5a2a8-4fbe-7f67-ab89-d1834f978d2a:
release_lockspace final free
[16439.010628] md: unbind<dm-2>
[16439.011727] ------------[ cut here ]------------
[16439.011732] WARNING: CPU: 12 PID: 31178 at drivers/md/md.c:449
mddev_find+0x85/0x205
[16439.011733] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.011751] CPU: 12 PID: 31178 Comm: probe-bcache Tainted: G
W  O    4.9.0-rc3-esos.prod #1
[16439.011752] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.011753]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.011757]  ffffffff81065522 0000000000000000 000000000090007e
ffff8803182d4000
[16439.011760]  000000000000007e 0000000000000009 0000000000000001
ffffffff81875356
[16439.011763] Call Trace:
[16439.011767]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.011769]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.011772]  [<ffffffff81875356>] ? mddev_find+0x85/0x205
[16439.011774]  [<ffffffff8187978e>] ? md_open+0x10/0x9a
[16439.011777]  [<ffffffff81151522>] ? __blkdev_get+0x222/0x345
[16439.011779]  [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.011781]  [<ffffffff811517f2>] ? blkdev_get+0x1ad/0x2ad
[16439.011784]  [<ffffffff81132aab>] ? walk_component+0xa2/0x20f
[16439.011789]  [<ffffffff810ebf11>] ? get_page_from_freelist+0x58f/0x6dc
[16439.011791]  [<ffffffff81150501>] ? bdgrab+0xd/0x12
[16439.011794]  [<ffffffff81151935>] ? blkdev_get_by_dev+0x43/0x43
[16439.011796]  [<ffffffff8112664e>] ? do_dentry_open.isra.16+0x1b2/0x28a
[16439.011798]  [<ffffffff81134dd9>] ? path_openat+0xcc7/0xeb1
[16439.011801]  [<ffffffff8113500b>] ? do_filp_open+0x48/0x9e
[16439.011806]  [<ffffffff81105e6a>] ? handle_mm_fault+0x607/0xb0e
[16439.011809]  [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.011811]  [<ffffffff811274ab>] ? do_sys_open+0x135/0x1bc
[16439.011814]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.011815] ---[ end trace a0868b86aec8f15a ]---
[16439.024630] md: export_rdev(dm-2)
[16439.024689] md: unbind<dm-3>
[16439.035626] md: export_rdev(dm-3)
[16439.035781] ------------[ cut here ]------------
[16439.035786] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:449
md_seq_next+0x5b/0x93
[16439.035787] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.035805] CPU: 11 PID: 31175 Comm: mdadm Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.035806] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.035808]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.035811]  ffffffff81065522 0000000000000001 ffff88061a157000
ffff88061a1573b8
[16439.035814]  ffff8803133adb00 ffff880320a15000 000000000000004b
ffffffff81879329
[16439.035817] Call Trace:
[16439.035821]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.035823]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.035826]  [<ffffffff81879329>] ? md_seq_next+0x5b/0x93
[16439.035829]  [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.035832]  [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.035834]  [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.035837]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.035840]  [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.035842]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.035845]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.035848]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.035850] ---[ end trace a0868b86aec8f15b ]---
[16439.035858] ------------[ cut here ]------------
[16439.035861] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:449
md_seq_next+0x5b/0x93
[16439.035861] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.035877] CPU: 11 PID: 31175 Comm: mdadm Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.035878] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.035879]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.035882]  ffffffff81065522 ffff88061a157000 ffff8803182d5000
ffff8803182d53b8
[16439.035885]  ffff8803133adb00 ffff880320a15000 00000000000000c9
ffffffff81879329
[16439.035888] Call Trace:
[16439.035891]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.035893]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.035895]  [<ffffffff81879329>] ? md_seq_next+0x5b/0x93
[16439.035897]  [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.035900]  [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.035902]  [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.035904]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.035907]  [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.035909]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.035911]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.035914]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.035915] ---[ end trace a0868b86aec8f15c ]---
[16439.035916] ------------[ cut here ]------------
[16439.035919] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.035920] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.035935] CPU: 11 PID: 31175 Comm: mdadm Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.035936] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.035936]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.035940]  ffffffff81065522 ffff88061a157000 ffff8803182d5000
ffff8803182d53b8
[16439.035942]  ffff8803133adb00 ffff880320a15000 00000000000000c9
ffffffff818781b3
[16439.035946] Call Trace:
[16439.035948]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.035950]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.035953]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.035955]  [<ffffffff81879359>] ? md_seq_next+0x8b/0x93
[16439.035957]  [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.035960]  [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.035961]  [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.035964]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.035966]  [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.035969]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.035971]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.035974]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.035975] ---[ end trace a0868b86aec8f15d ]---
[16439.035976] mddev->active = 1
[16439.035978] rd=2 empty=0 ctime=1480694775 hold=0
[16439.035984] ------------[ cut here ]------------
[16439.035987] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:449
md_seq_next+0x5b/0x93
[16439.035988] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.036003] CPU: 11 PID: 31175 Comm: mdadm Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.036004] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.036005]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.036008]  ffffffff81065522 ffff8803182d5000 ffff8803182d4000
ffff8803182d43b8
[16439.036011]  ffff8803133adb00 ffff880320a15000 0000000000000147
ffffffff81879329
[16439.036014] Call Trace:
[16439.036016]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.036018]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.036021]  [<ffffffff81879329>] ? md_seq_next+0x5b/0x93
[16439.036023]  [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.036025]  [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.036027]  [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.036030]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.036032]  [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.036034]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.036037]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.036039]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.036041] ---[ end trace a0868b86aec8f15e ]---
[16439.036042] ------------[ cut here ]------------
[16439.036044] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.036045] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.036060] CPU: 11 PID: 31175 Comm: mdadm Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.036061] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.036062]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.036065]  ffffffff81065522 ffff8803182d5000 ffff8803182d4000
ffff8803182d43b8
[16439.036067]  ffff8803133adb00 ffff880320a15000 0000000000000147
ffffffff818781b3
[16439.036070] Call Trace:
[16439.036073]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.036075]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.036077]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.036079]  [<ffffffff81879359>] ? md_seq_next+0x8b/0x93
[16439.036082]  [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.036084]  [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.036086]  [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.036088]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.036091]  [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.036093]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.036096]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.036098]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.036100] ---[ end trace a0868b86aec8f15f ]---
[16439.036100] mddev->active = 1
[16439.036102] rd=2 empty=0 ctime=1480694673 hold=0
[16439.036103] ------------[ cut here ]------------
[16439.036105] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.036106] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.036121] CPU: 11 PID: 31175 Comm: mdadm Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.036122] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.036122]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.036125]  ffffffff81065522 ffff8803182d4000 0000000000000002
ffffffff8209ff30
[16439.036128]  ffff8803133adb00 ffff880320a15000 0000000000000147
ffffffff818781b3
[16439.036131] Call Trace:
[16439.036134]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.036136]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.036138]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.036140]  [<ffffffff81879359>] ? md_seq_next+0x8b/0x93
[16439.036143]  [<ffffffff81143e60>] ? seq_read+0x233/0x323
[16439.036145]  [<ffffffff81175e07>] ? proc_reg_read+0x3f/0x5d
[16439.036147]  [<ffffffff81175dc8>] ? proc_reg_write+0x5d/0x5d
[16439.036150]  [<ffffffff811279a2>] ? __vfs_read+0x1c/0xe2
[16439.036152]  [<ffffffff8112c69b>] ? SyS_newfstat+0x1f/0x27
[16439.036154]  [<ffffffff811283df>] ? vfs_read+0x98/0x11b
[16439.036157]  [<ffffffff811294d2>] ? SyS_read+0x48/0x81
[16439.036160]  [<ffffffff81a7da20>] ? entry_SYSCALL_64_fastpath+0x13/0x94
[16439.036161] ---[ end trace a0868b86aec8f160 ]---
[16439.036162] mddev->active = 4
[16439.036298] ------------[ cut here ]------------
[16439.036302] WARNING: CPU: 11 PID: 31175 at drivers/md/md.c:458
mddev_put+0x18/0x16b
[16439.036303] Modules linked in: fcst(O) scst_changer(O) scst_tape(O)
scst_vdisk(O) scst_disk(O) ib_srpt(O) iscsi_scst(O) qla2x00tgt(O)
scst(O) qla2xxx bonding mlx5_core bna ib_umad rdma_ucm ib_uverbs
ib_srp iw_nes iw_cxgb4 cxgb4 iw_cxgb3 ib_qib rdmavt mlx4_ib ib_mthca
[16439.036319] CPU: 11 PID: 31175 Comm: mdadm Tainted: G        W  O
 4.9.0-rc3-esos.prod #1
[16439.036320] Hardware name: Dell Inc. PowerEdge R710/00NH4P, BIOS
6.4.0 07/23/2013
[16439.036321]  0000000000000000 ffffffff81396464 0000000000000000
0000000000000000
[16439.036324]  ffffffff81065522 ffff8803182d4000 ffff880622d20aa0
ffff8803182ae800
[16439.036327]  ffff880622d209d8 ffff880622d20b20 ffff88030f854ac8
ffffffff818781b3
[16439.036330] Call Trace:
[16439.036334]  [<ffffffff81396464>] ? dump_stack+0x46/0x59
[16439.036336]  [<ffffffff81065522>] ? __warn+0xc8/0xe1
[16439.036339]  [<ffffffff818781b3>] ? mddev_put+0x18/0x16b
[16439.036341]  [<ffffffff81151258>] ? __blkdev_put+0x11c/0x1c4
[16439.036344]  [<ffffffff81151aff>] ? blkdev_close+0x1c/0x1f
[16439.036346]  [<ffffffff81129c69>] ? __fput+0xd8/0x18a
[16439.036350]  [<ffffffff8107990c>] ? task_work_run+0x5d/0x73
[16439.036352]  [<ffffffff81001048>] ? exit_to_usermode_loop+0x48/0x5d
[16439.036354]  [<ffffffff8100135c>] ? syscall_return_slowpath+0x3a/0x4c
[16439.036357]  [<ffffffff81a7da9f>] ? entry_SYSCALL_64_fastpath+0x92/0x94
[16439.036359] ---[ end trace a0868b86aec8f161 ]---
[16439.036360] mddev->active = 3
--snip--


--Marc

On Fri, Dec 2, 2016 at 2:12 PM, Marc Smith <marc.smith@mcc.edu> wrote:
> On Thu, Dec 1, 2016 at 5:35 PM, NeilBrown <neilb@suse.com> wrote:
>> On Fri, Dec 02 2016, Marc Smith wrote:
>>
>>> On Wed, Nov 30, 2016 at 9:52 PM, NeilBrown <neilb@suse.com> wrote:
>>>> On Mon, Nov 28 2016, Marc Smith wrote:
>>>>
>>>>>
>>>>> # find /sys/block/md127/md
>>>>> /sys/block/md127/md
>>>>> /sys/block/md127/md/reshape_position
>>>>> /sys/block/md127/md/layout
>>>>> /sys/block/md127/md/raid_disks
>>>>> /sys/block/md127/md/bitmap
>>>>> /sys/block/md127/md/bitmap/chunksize
>>>>
>>>> This tells me that:
>>>>         sysfs_remove_group(&mddev->kobj, &md_bitmap_group);
>>>> hasn't been run, so mddev_delayed_delete() hasn't run.
>>>> That suggests the final mddev_put() hasn't run, i.e. mddev->active is > 0
>>>>
>>>> Everything else suggests that array has been stopped and cleaned and
>>>> should be gone...
>>>>
>>>> This seems to suggest that there is an unbalanced mddev_get() without a
>>>> matching mddev_put().  I cannot find it though.
>>>>
>>>> If I could reproduce it, I would try to see what is happening by:
>>>>
>>>> - putting
>>>>    printk("mddev->active = %d\n", atomic_read(&mddev->active));
>>>>  at the top of mddev_put().  That shouldn't be *too* noisy.
>>>>
>>>> - putting
>>>>    printk("rd=%d empty=%d ctime=%d hold=%d\n", mddev->raid_disks,
>>>>         list_empty(&mddev->disks), mddev->ctime, mddev->hold_active);
>>>>
>>>>   in mddev_put() just before those values are tested.
>>>>
>>>>  - putting
>>>>      printk("queue_work\n");
>>>>    just before the 'queue_work()' call in mddev_put.
>>>>
>>>>  - putting
>>>>      printk("mddev_delayed_delete\n");
>>>>    in mddev_delayed_delete()
>>>>
>>>> Then see what gets printed when you stop the array.
>>>
>>> I made those modifications to md.c and here is the kernel log when stopping:
>>>
>>> --snip--
>>> [ 3937.233487] mddev->active = 2
>>> [ 3937.233503] mddev->active = 2
>>> [ 3937.233509] mddev->active = 2
>>> [ 3937.233516] mddev->active = 1
>>> [ 3937.233516] rd=2 empty=0 ctime=1480617270 hold=0
>>
>> At this point, mdadm has opened the /dev/md127 device, accessed a few
>> attributes via sysfs just to check on the status, and then closed it
>> again.
>> The array is still active, but we know that no other process has it
>> open.
>>
>>
>>> [ 3937.233679] udevd[492]: inotify event: 8 for /dev/md127
>>> [ 3937.241489] md127: detected capacity change from 73340747776 to 0
>>> [ 3937.241493] md: md127 stopped.
>>
>> Now mdadm has opened the array again and issued the STOP_ARRAY ioctl.
>> Still nothing else has the array open.
>>
>>> [ 3937.241665] udevd[492]: device /dev/md127 closed, synthesising 'change'
>>> [ 3937.241726] udevd[492]: seq 3631 queued, 'change' 'block'
>>> [ 3937.241829] udevd[492]: seq 3631 forked new worker [4991]
>>> [ 3937.241989] udevd[4991]: seq 3631 running
>>> [ 3937.242002] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: leaving the
>>> lockspace group...
>>> [ 3937.242039] udevd[4991]: removing watch on '/dev/md127'
>>> [ 3937.242068] mddev->active = 3
>>
>> But somehow the ->active count got up to 3.
>> mdadm probably still has it open, but two other things do too.
>> If you have "mdadm --monitor" running in the background (which is good)
>> it will temporarily increase, then decrease the count.
>> udevd opens the device temporarily too.
>> So this isn't necessarily a problem.
>>
>>> [ 3937.242069] udevd[492]: seq 3632 queued, 'offline' 'dlm'
>>> [ 3937.242080] mddev->active = 3
>>> [ 3937.242104] udevd[4991]: IMPORT 'probe-bcache -o udev /dev/md127'
>>> /usr/lib/udev/rules.d/69-bcache.rules:16
>>> [ 3937.242161] udevd[492]: seq 3632 forked new worker [4992]
>>> [ 3937.242259] udevd[4993]: starting 'probe-bcache -o udev /dev/md127'
>>> [ 3937.242753] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19: group event done 0 0
>>> [ 3937.242847] dlm: dc18e34c-b136-1964-1c34-4509a7c60a19:
>>> release_lockspace final free
>>> [ 3937.242861] md: unbind<dm-1>
>>> [ 3937.256606] md: export_rdev(dm-1)
>>> [ 3937.256612] md: unbind<dm-0>
>>> [ 3937.263601] md: export_rdev(dm-0)
>>> [ 3937.263688] mddev->active = 4
>>> [ 3937.263751] mddev->active = 3
>>
>> But here, the active count only drops down to 2. (it is decremented
>> after it is printed).  Assuming there really were no more messages like
>> this, there are two active references to the md device, and we don't
>> know what they are.
>>
>>>
>>> I didn't use my modified mdadm which stops the synthesized CHANGE from
>>> occurring, but if needed, I can re-run the test using that.
>>
>> It would be good to use the modified mdadm, if only to reduce the
>> noise.  It won't change the end result, but might make it easier to see
>> what is happening.
>> Also please add
>>    WARN_ON(1);
>>
>> in the start of mddev_get() and mddev_put().
>> That will provide a stack trace whenever either of these are called, so
>> we can see who takes a references, and who doesn't release it.
>
> Okay, I added that to both functions, and now I can't get stopping the
> array to misbehave (eg, not generate the REMOVE event). I've been
> trying all morning! I literally just added the WARN_ON(1) to those two
> functions, and that's all I changed. I compiled and reinstalled the image,
> no other changes. I've tried quite a few times now to reproduce this,
> and I'm failing to do so -- every time the REMOVE event is generated
> and everything is removed correctly.
>
> I'm going to switch back to the previous image and confirm it's
> reproducible with that.
>
> --Marc
>
>
>>
>> Thanks,
>> NeilBrown
>>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: MD Remnants After --stop
  2016-12-02 20:22                               ` Marc Smith
@ 2016-12-05  0:41                                 ` NeilBrown
  2016-12-05 21:37                                   ` Marc Smith
  0 siblings, 1 reply; 20+ messages in thread
From: NeilBrown @ 2016-12-05  0:41 UTC (permalink / raw)
  To: Marc Smith; +Cc: linux-raid

[-- Attachment #1: Type: text/plain, Size: 1719 bytes --]

On Sat, Dec 03 2016, Marc Smith wrote:

> Finally, I got it! Why is it when I want it to break, it doesn't. =)

welcome to my world :-)


>
> I will say, using the modified mdadm that prevents the synthesized
> CHANGE event, it seems to not induce the problem as regularly.
>
> Below are the kernel logs after stopping an array:

Thank you so much for persisting with this.
The logs you provide make it clear that two separate processes (494 and
31178) increment the ->active count by opening the device, but never
decrement that count by closing the device.
It seems too unlikely that either process would be holding the
file descriptor open indefinitely, so something must be going wrong
either as part of 'open', or as part of 'close'.

Now that I know where to look, the bug is obvious.  Why didn't I see
that before?

The open request is failing, almost certainly because MD_CLOSING is set,
but the ->active count isn't being decremented on failure.
This patch should fix it.

Please test and report results.

Thanks,
NeilBrown

Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag" v4.9-rc1)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 2089d46b0eb8..a8e07eb2ca5f 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7087,11 +7087,14 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	}
 	BUG_ON(mddev != bdev->bd_disk->private_data);
 
-	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
+	if ((err = mutex_lock_interruptible(&mddev->open_mutex))) {
+		mddev_put(mddev);
 		goto out;
+	}
 
 	if (test_bit(MD_CLOSING, &mddev->flags)) {
 		mutex_unlock(&mddev->open_mutex);
+		mddev_put(mddev);
 		return -ENODEV;
 	}
 

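For readers following along, the failure mode can be modeled in plain userspace C. This is a sketch, not the kernel code: struct fields and function names mirror the driver's, but everything here is deliberately simplified (an int instead of atomic_t, no locking).

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's struct mddev: just the two
 * pieces relevant to the bug. */
struct mddev {
	int active;    /* models atomic_t active */
	bool closing;  /* models test_bit(MD_CLOSING, &flags) */
};

static void mddev_get(struct mddev *m) { m->active++; }
static void mddev_put(struct mddev *m) { m->active--; }

/* Buggy shape of md_open(): a reference is taken on entry, but it is
 * never dropped when the open fails because MD_CLOSING is set. */
static int md_open_buggy(struct mddev *m)
{
	mddev_get(m);
	if (m->closing)
		return -1;  /* leaks one reference per failed open */
	return 0;
}

/* Fixed shape, mirroring the patch above: balance the get on every
 * failure path so ->active can fall back to zero. */
static int md_open_fixed(struct mddev *m)
{
	mddev_get(m);
	if (m->closing) {
		mddev_put(m);  /* drop the reference before bailing out */
		return -1;
	}
	return 0;
}
```

In this model, two failed opens with the buggy variant (e.g. udevd and a monitor racing the stop) leave ->active pinned above zero forever, matching the two unaccounted references seen in the logs.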

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: MD Remnants After --stop
  2016-12-05  0:41                                 ` NeilBrown
@ 2016-12-05 21:37                                   ` Marc Smith
  0 siblings, 0 replies; 20+ messages in thread
From: Marc Smith @ 2016-12-05 21:37 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid

On Sun, Dec 4, 2016 at 7:41 PM, NeilBrown <neilb@suse.com> wrote:
> On Sat, Dec 03 2016, Marc Smith wrote:
>
>> Finally, I got it! Why is it when I want it to break, it doesn't. =)
>
> welcome to my world :-)
>
>
>>
>> I will say, using the modified mdadm that prevents the synthesized
>> CHANGE event, it seems to not induce the problem as regularly.
>>
>> Below are the kernel logs after stopping an array:
>
> Thank you so much for persisting with this.
> The logs you provide make it clear that two separate processes (494 and
> 31178) increment the ->active count by opening the device, but never
> decrement that count by closing the device.
> It seems too unlikely that either process would be holding the
> file descriptor open indefinitely, so something must be going wrong
> either as part of 'open', or as part of 'close'.
>
> Now that I know where to look, the bug is obvious.  Why didn't I see
> that before?
>
> The open request is failing, almost certainly because MD_CLOSING is set,
> but the ->active count isn't being decremented on failure.
> This patch should fix it.
>
> Please test and report results.
>
> Thanks,
> NeilBrown
>
> Fixes: af8d8e6f0315 ("md: changes for MD_STILL_CLOSED flag" v4.9-rc1)
>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 2089d46b0eb8..a8e07eb2ca5f 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -7087,11 +7087,14 @@ static int md_open(struct block_device *bdev, fmode_t mode)
>         }
>         BUG_ON(mddev != bdev->bd_disk->private_data);
>
> -       if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
> +       if ((err = mutex_lock_interruptible(&mddev->open_mutex))) {
> +               mddev_put(mddev);
>                 goto out;
> +       }
>
>         if (test_bit(MD_CLOSING, &mddev->flags)) {
>                 mutex_unlock(&mddev->open_mutex);
> +               mddev_put(mddev);
>                 return -ENODEV;
>         }
>

That did the trick! I ran 'mdadm --stop' eight different times on two
nodes, and every time it was removed completely from /dev and
/sys/block just like expected. =)
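A quick way to script that verification (a sketch only; md127 is just an example name) is to stat the two locations this thread is about and report anything left behind:

```c
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Count (and print) any leftovers a stopped array left behind:
 * the /dev node and the /sys/block entry discussed in this thread.
 * Returns 0 when the stop cleaned up completely. */
static int md_remnants(const char *name)
{
	const char *prefix[] = { "/dev/", "/sys/block/" };
	char path[288];
	struct stat st;
	int left = 0;

	for (int i = 0; i < 2; i++) {
		snprintf(path, sizeof(path), "%s%s", prefix[i], name);
		if (stat(path, &st) == 0) {
			printf("leftover: %s\n", path);
			left++;
		}
	}
	return left;
}
```

Called as `md_remnants("md127")` right after `mdadm --stop /dev/md127`, a patched kernel should give 0; the pre-patch behavior reported here would give 2.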

Thanks for your effort and time on this. I think I read in the other
thread about this patch that it may make it into 4.9?


--Marc

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2016-12-05 21:37 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-04 15:35 MD Remnants After --stop Marc Smith
2016-11-07  5:44 ` NeilBrown
2016-11-18 19:31   ` Marc Smith
2016-11-21  3:42     ` NeilBrown
2016-11-21 14:08       ` Marc Smith
2016-11-22 15:12         ` Marc Smith
2016-11-22 23:51           ` NeilBrown
2016-11-23 15:21             ` Marc Smith
2016-11-23 23:38               ` NeilBrown
2016-11-26 16:41                 ` Marc Smith
2016-11-27 22:20                   ` NeilBrown
2016-11-28  2:25                     ` Marc Smith
2016-12-01  2:52                       ` NeilBrown
2016-12-01 19:40                         ` Marc Smith
2016-12-01 22:35                           ` NeilBrown
2016-12-02 18:18                             ` Stephane Thiell
2016-12-02 19:12                             ` Marc Smith
2016-12-02 20:22                               ` Marc Smith
2016-12-05  0:41                                 ` NeilBrown
2016-12-05 21:37                                   ` Marc Smith
