* Improve processing efficiency for addition and deletion of multipath devices
@ 2016-11-16  1:46 tang.junhui
  2016-11-16  7:53 ` Hannes Reinecke
  0 siblings, 1 reply; 47+ messages in thread
From: tang.junhui @ 2016-11-16  1:46 UTC (permalink / raw)
  To: christophe.varoqui, bmarzins, hare; +Cc: dm-devel, tang.junhui



In these scenarios, multipath processing efficiency is very low:
1) There are many paths per multipath device;
2) Devices are added/deleted in bursts, e.g. on iSCSI login/logout or
FC link up/down.

Multipath processing is then so slow that some applications are not
satisfied; for example, OpenStack often times out waiting for multipath
devices to be created.

I think the reason for the low processing efficiency is that multipathd
processes uevent messages one by one, and each one also produces a new
DM addition/change/deletion uevent, which increases system resource
consumption; actually, most of these uevents carry no useful
information at all.

So, can we find a way to reduce these uevent messages and improve
multipath processing efficiency? Personally, I think we can merge the
uevent messages before processing them. For example, during 4 iSCSI
session login procedures, we could wait a moment until the addition
uevents of all 4 paths have arrived, and then merge them into one and
handle it in a single pass. Deletion uevents could be handled the same
way.

What do you think? Any points of view are welcome.




^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-16  1:46 Improve processing efficiency for addition and deletion of multipath devices tang.junhui
@ 2016-11-16  7:53 ` Hannes Reinecke
  2016-11-16  8:45   ` tang.junhui
  0 siblings, 1 reply; 47+ messages in thread
From: Hannes Reinecke @ 2016-11-16  7:53 UTC (permalink / raw)
  To: tang.junhui, christophe.varoqui, bmarzins; +Cc: dm-devel

On 11/16/2016 02:46 AM, tang.junhui@zte.com.cn wrote:
> In these scenarios, multipath processing efficiency is very low:
> 1) There are many paths per multipath device;
> 2) Devices are added/deleted in bursts, e.g. on iSCSI login/logout or
> FC link up/down.
> 
> Multipath processing is then so slow that some applications are not
> satisfied; for example, OpenStack often times out waiting for
> multipath devices to be created.
> 
> I think the reason for the low processing efficiency is that
> multipathd processes uevent messages one by one, and each one also
> produces a new DM addition/change/deletion uevent, which increases
> system resource consumption; actually, most of these uevents carry no
> useful information at all.
> 
> So, can we find a way to reduce these uevent messages and improve
> multipath processing efficiency? Personally, I think we can merge the
> uevent messages before processing them. For example, during 4 iSCSI
> session login procedures, we could wait a moment until the addition
> uevents of all 4 paths have arrived, and then merge them into one and
> handle it in a single pass. Deletion uevents could be handled the same
> way.
> 
> What do you think? Any points of view are welcome.

The problem is that we don't know beforehand how many uevents we
should be waiting for.
And even if we did, there would still be a chance of one or several of
these uevents failing to set up the device, so we would be waiting
forever.

One possible way out would be to modify the way we're handling
events internally. Event processing really consists of several steps:
1) Getting information about the attached device (pathinfo() and friends)
2) Storing the information in our pathvec
3) Identifying and updating the mpp structure with the new pathvec
Currently, we're processing each step for every uevent.
As we have only a single lock protecting both pathvec and mppvec, we
have to take the lock prior to step 2 and release it after step 3.
So while we could receive events in parallel, we can only process them
one by one, with every event having to re-do step 3.

The idea would be to split the single lock into a pathvec lock and an
mppvec lock, and create a separate thread for updating the mppvec.

Event processing could then be streamlined by having the uevent thread
add the new device to the pathvec and notify the mppvec thread.
This thread could then check whether a pathvec update is in progress,
and only start once the pathvec handling has finished.
With this we would avoid having to issue several similar mppvec
updates, and the entire handling would be far smoother.
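
A minimal sketch of that split-lock design (all names are invented for
illustration; this is not the actual multipathd code):

#include <pthread.h>
#include <stdbool.h>

/* Illustrative stand-ins for multipathd's vectors and locks. */
static pthread_mutex_t pathvec_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t mppvec_lock  = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  mpp_kick     = PTHREAD_COND_INITIALIZER;
static bool pathvec_dirty;          /* set by the uevent thread */

/* Steps 1+2: the uevent thread stores new path info, then notifies. */
void on_path_uevent(void)
{
        pthread_mutex_lock(&pathvec_lock);
        /* ... add/update the entry in pathvec ... */
        pathvec_dirty = true;
        pthread_mutex_unlock(&pathvec_lock);
        pthread_cond_signal(&mpp_kick);
}

/* Step 3: one updater thread rebuilds mpp state once per burst of
 * notifications, instead of once per uevent. */
void *mpp_updater(void *arg)
{
        (void)arg;
        for (;;) {
                pthread_mutex_lock(&pathvec_lock);
                while (!pathvec_dirty)
                        pthread_cond_wait(&mpp_kick, &pathvec_lock);
                pathvec_dirty = false;  /* consume all pending kicks */
                /* ... snapshot the changed paths ... */
                pthread_mutex_unlock(&pathvec_lock);

                pthread_mutex_lock(&mppvec_lock);
                /* ... update mpp structures from the snapshot and
                 * reload each affected map once ... */
                pthread_mutex_unlock(&mppvec_lock);
        }
        return NULL;
}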

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-16  7:53 ` Hannes Reinecke
@ 2016-11-16  8:45   ` tang.junhui
  2016-11-16  9:49     ` Martin Wilck
  0 siblings, 1 reply; 47+ messages in thread
From: tang.junhui @ 2016-11-16  8:45 UTC (permalink / raw)
  To: Hannes Reinecke; +Cc: dm-devel, dm-devel-bounces



Hello Hannes,

Since the received uevent messages are stored in a queue, and uevents
arrive much faster than they are processed, we can merge the queued
uevents first and then process the result in the next step. Of course,
the uevents of some paths of a multipath device may not have been
received yet, but we do not need to wait for them: we can handle the
remaining paths in the original way when their uevents arrive later.

I think that this way we can merge most uevent messages and eliminate
most of the unnecessary DM change uevents during creation/deletion of
multipath devices.

The method you mentioned seems a little complex to me, and it does not
reduce the DM addition/change/deletion uevents, which consume a lot of
system resources.

Sincerely,
Tang








^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-16  8:45   ` tang.junhui
@ 2016-11-16  9:49     ` Martin Wilck
  2016-11-17  1:41       ` tang.junhui
  0 siblings, 1 reply; 47+ messages in thread
From: Martin Wilck @ 2016-11-16  9:49 UTC (permalink / raw)
  To: dm-devel

Hi Tang,

On Wed, 2016-11-16 at 16:45 +0800, tang.junhui@zte.com.cn wrote:

> I think that this way we can merge most uevent messages and
> eliminate most of the unnecessary DM change uevents during
> creation/deletion of multipath devices.
> The method you mentioned seems a little complex to me, and it does
> not reduce the DM addition/change/deletion uevents, which consume a
> lot of system resources.

Let me quickly introduce myself, I'm a new member of Hannes' team at
SUSE and new on this ML.

Apart from Hannes' proposal for more fine-grained locking, I can see
the following options:

 a) instead of kicking the uevent processing thread whenever anything
is queued, kick it at predefined time intervals (1 second, say). The
uevent processor would then accumulate all changes received since it
had last been kicked, and apply DM changes only when necessary. This
may need to be a config option because it could obviously lead to
delayed device setup in some configurations (see the sketch after this
list).

 b) the logic of a) could be improved by the uevent listener detecting
"event storms" and waiting for them to end before kicking the
processing thread. The details of the heuristics for "storms" would
need to be worked out, of course.

 c) sometimes several uevents for the same physical path occur in quick
succession. Normally it should be sufficient to filter these such that
only the last event is processed.
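
As a rough illustration of option a) (names invented; a real
implementation would live in multipathd's uevent queue handling):

#include <pthread.h>
#include <time.h>

static pthread_mutex_t q_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  q_cond = PTHREAD_COND_INITIALIZER;

/* Wake up once per interval rather than once per queued event, so a
 * whole burst of uevents is handled in a single pass. */
void *uevent_processor(void *arg)
{
        (void)arg;
        for (;;) {
                struct timespec deadline;
                clock_gettime(CLOCK_REALTIME, &deadline);
                deadline.tv_sec += 1;   /* the configurable interval */

                pthread_mutex_lock(&q_lock);
                /* New events only extend the batch; nobody signals
                 * q_cond, so this simply sleeps until the deadline. */
                pthread_cond_timedwait(&q_cond, &q_lock, &deadline);
                /* ... detach everything queued in this interval ... */
                pthread_mutex_unlock(&q_lock);

                /* ... process the detached batch, applying DM changes
                 * only where necessary ... */
        }
        return NULL;
}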

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-16  9:49     ` Martin Wilck
@ 2016-11-17  1:41       ` tang.junhui
  2016-11-17 10:48         ` Martin Wilck
  0 siblings, 1 reply; 47+ messages in thread
From: tang.junhui @ 2016-11-17  1:41 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel



Hi Martin,

Nice to see you. May you succeed in your team and in our open source
community.

You have put forward a good proposal to queue more uevent messages by
kicking the uevent processing thread at predefined time intervals. It
is also a good idea to make such intervals configurable.

As for processing several uevents for the same physical device, I
think the difference of opinion between us is "FILTER" vs. "MERGE".
Personally, I think merging is more accurate; for example, suppose we
receive addition uevents for 4 paths of the same physical device:
1) uevent add sdb
2) uevent add sdc
3) uevent add sdd
4) uevent add sde

We cannot simply filter out uevents 1), 2) and 3) and process only
uevent 4), which would lose paths of this multipath device.

In my opinion, we should MERGE the four uevent messages into one, like:
1) uevent add sdb sdc sdd sde

Then the uevent processing thread only needs to process one uevent
message, and it produces only one DM addition uevent (vs. now: one DM
addition uevent and 3 DM change uevents), which really reduces system
consumption (for example, it avoids having udev process every DM
uevent the kernel produces).
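
A hypothetical sketch of that MERGE step (types and names invented for
illustration; a real version would operate on multipathd's uevent
queue and resolve each path to its WWID):

#include <stdio.h>
#include <string.h>

struct uev {
        const char *action;     /* "add", "change", "remove" */
        const char *kname;      /* kernel device name, e.g. "sdb" */
        const char *wwid;       /* LUN identity the path maps to */
};

/* Fold consecutive "add" events for the same WWID into one merged
 * event, e.g. "add sdb sdc sdd sde", so the map is loaded only once. */
void merge_adds(const struct uev *q, int n)
{
        int i = 0;

        while (i < n) {
                if (strcmp(q[i].action, "add") != 0) {
                        printf("%s %s\n", q[i].action, q[i].kname);
                        i++;
                        continue;
                }
                printf("add %s", q[i].kname);
                int j = i + 1;
                while (j < n && strcmp(q[j].action, "add") == 0 &&
                       strcmp(q[j].wwid, q[i].wwid) == 0)
                        printf(" %s", q[j++].kname);
                printf("\n");
                i = j;
        }
}

int main(void)
{
        const struct uev q[] = {
                { "add", "sdb", "lun0" }, { "add", "sdc", "lun0" },
                { "add", "sdd", "lun0" }, { "add", "sde", "lun0" },
        };
        merge_adds(q, 4);       /* prints: add sdb sdc sdd sde */
        return 0;
}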

Regards,
Tang








^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-17  1:41       ` tang.junhui
@ 2016-11-17 10:48         ` Martin Wilck
  2016-11-18  1:02           ` tang.junhui
  2016-11-18 22:26           ` Benjamin Marzinski
  0 siblings, 2 replies; 47+ messages in thread
From: Martin Wilck @ 2016-11-17 10:48 UTC (permalink / raw)
  To: tang.junhui; +Cc: dm-devel

Hi Tang,

> As for processing several uevents for the same physical device, I
> think the difference of opinion between us is "FILTER" vs. "MERGE".
> Personally, I think merging is more accurate; for example, suppose we
> receive addition uevents for 4 paths of the same physical device:
> 1) uevent add sdb
> 2) uevent add sdc
> 3) uevent add sdd
> 4) uevent add sde
> 
> We cannot simply filter out uevents 1), 2) and 3) and process only
> uevent 4), which would lose paths of this multipath device.

Of course. My "filtering" idea was meant for cases where several events
for the same device are queued, e.g.

  1) add sda
  2) change sda
  3) delete sda

Is it always sufficient to look only at the last event in such a case?
I think so, but I'm not 100% certain.
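
For illustration only (invented names), keeping just the newest queued
event per kernel device could look like this:

#include <string.h>

struct uev {
        const char *action;
        const char *kname;
};

/* Mark every event that is superseded by a later event for the same
 * device; the caller then processes only the unmarked (newest) ones. */
void filter_keep_last(const struct uev *q, int n, int *keep)
{
        for (int i = 0; i < n; i++) {
                keep[i] = 1;
                for (int j = i + 1; j < n; j++) {
                        if (strcmp(q[i].kname, q[j].kname) == 0) {
                                keep[i] = 0;
                                break;
                        }
                }
        }
}

Whether this is safe is exactly the open question here: as discussed
later in the thread, some change events carry environment variables
that later events do not repeat.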

Regards
Martin


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-17 10:48         ` Martin Wilck
@ 2016-11-18  1:02           ` tang.junhui
  2016-11-18  7:39             ` Martin Wilck
  2016-11-18 22:26           ` Benjamin Marzinski
  1 sibling, 1 reply; 47+ messages in thread
From: tang.junhui @ 2016-11-18  1:02 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel



Hi Martin,

Regarding your opinion:

> My "filtering" idea was meant for cases where several events
> for the same device are queued, e.g.
>
>   1) add sda
>   2) change sda
>   3) delete sda
>
> Is it always sufficient to look only at the last event in such a case?

I do not agree. The reasons are as follows:

1) It is risky to filter uevents like that: a SCSI device has been
   generated, and even if its lifetime is very short, we cannot turn a
   blind eye to it; the system or applications may need the multipath
   device built on top of that SCSI device.

2) The scenario you mention rarely happens. We are more concerned
   about the more common situation of devices being added or deleted
   on iSCSI login/logout or FC link up/down, with many iSCSI or FC
   links per LUN. In that situation we may receive many uevents from
   different paths of the same LUN device, and we want to merge these
   uevents into one and process them together.

Regards,
Tang









^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-18  1:02           ` tang.junhui
@ 2016-11-18  7:39             ` Martin Wilck
  2016-11-18  8:24               ` tang.junhui
  0 siblings, 1 reply; 47+ messages in thread
From: Martin Wilck @ 2016-11-18  7:39 UTC (permalink / raw)
  To: tang.junhui; +Cc: dm-devel

Hi Tang,

On Fri, 2016-11-18 at 09:02 +0800, tang.junhui@zte.com.cn wrote:
> 
> we are more concerned about the more common situation of devices
> being added or deleted on iSCSI login/logout or FC link up/down,
> with many iSCSI or FC links per LUN. In that situation we may
> receive many uevents from different paths of the same LUN device,
> and we want to merge these uevents into one and process them
> together.

I'm not arguing against that. But consider what you'd do if you had to
process the series of uevents [ 1 "add sda", 2 "add sdb", 3 "del sda",
4 "add sdc", 5 "del sdb", 6 "add sda" ]. If you merge these, assuming
all belong to the same multipath map, what would be your action? I
would set up a map with sda and sdc, using the device properties from
events 4 and 6. That means that the rest of the events have been
"filtered" out. Anything else would mean repeated map loads for the
same multipath map, which is what we want to avoid. Agreed?

Regards,
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-18  7:39             ` Martin Wilck
@ 2016-11-18  8:24               ` tang.junhui
  2016-11-18  8:30                 ` Martin Wilck
  0 siblings, 1 reply; 47+ messages in thread
From: tang.junhui @ 2016-11-18  8:24 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel



Hi Martin,

In your case, my action would be:
1) merge uevents 1) and 2) into one uevent "add sda sdb", and process
them together
2) process uevent "del sda"
3) process uevent "add sdc"
4) process uevent "del sdb"
5) process uevent "add sda"

Though the processing efficiency in such a scenario is lower than with
your approach, it is simple and reliable. More importantly, Martin, you
are still focusing on a special scene; what I am concerned with looks
like this:
UDEV  [19172.014482] add /devices/platform/host3/session17/target3:0:0/3:0:0:0/block/sdc (block)
UDEV  [19172.249389] add /devices/platform/host4/session18/target4:0:0/4:0:0:0/block/sdd (block)
UDEV  [19172.343791] add /devices/platform/host3/session17/target3:0:0/3:0:0:2/block/sdf (block)
UDEV  [19172.364496] add /devices/platform/host5/session19/target5:0:0/5:0:0:0/block/sdh (block)
UDEV  [19172.523794] add /devices/platform/host4/session18/target4:0:0/4:0:0:2/block/sdi (block)
UDEV  [19172.621333] add /devices/platform/host3/session17/target3:0:0/3:0:0:4/block/sdn (block)
UDEV  [19172.699473] add /devices/platform/host4/session18/target4:0:0/4:0:0:1/block/sdg (block)
UDEV  [19172.704605] add /devices/platform/host3/session17/target3:0:0/3:0:0:1/block/sde (block)
UDEV  [19172.709687] add /devices/platform/host3/session17/target3:0:0/3:0:0:3/block/sdj (block)
UDEV  [19172.714919] add /devices/platform/host4/session18/target4:0:0/4:0:0:3/block/sdl (block)
UDEV  [19172.728891] add /devices/platform/host4/session18/target4:0:0/4:0:0:4/block/sdo (block)
UDEV  [19172.872156] add /devices/platform/host3/session17/target3:0:0/3:0:0:6/block/sdt (block)
UDEV  [19172.915542] add /devices/platform/host4/session18/target4:0:0/4:0:0:6/block/sdu (block)
UDEV  [19173.086935] add /devices/platform/host6/session20/target6:0:0/6:0:0:0/block/sdw (block)
UDEV  [19173.108278] add /devices/platform/host6/session20/target6:0:0/6:0:0:1/block/sdz (block)
UDEV  [19173.195153] add /devices/platform/host4/session18/target4:0:0/4:0:0:5/block/sdr (block)
UDEV  [19173.228397] add /devices/platform/host3/session17/target3:0:0/3:0:0:5/block/sdq (block)
UDEV  [19173.363632] add /devices/platform/host5/session19/target5:0:0/5:0:0:2/block/sdm (block)
UDEV  [19173.386560] add /devices/platform/host3/session17/target3:0:0/3:0:0:7/block/sdx (block)
UDEV  [19173.394515] add /devices/platform/host4/session18/target4:0:0/4:0:0:7/block/sdy (block)
UDEV  [19173.410152] add /devices/platform/host5/session19/target5:0:0/5:0:0:1/block/sdk (block)
UDEV  [19173.474286] add /devices/platform/host6/session20/target6:0:0/6:0:0:2/block/sdab (block)
UDEV  [19173.508438] add /devices/platform/host5/session19/target5:0:0/5:0:0:3/block/sdp (block)
UDEV  [19173.713146] add /devices/platform/host5/session19/target5:0:0/5:0:0:4/block/sds (block)
UDEV  [19173.782065] add /devices/platform/host5/session19/target5:0:0/5:0:0:5/block/sdv (block)
UDEV  [19173.787447] add /devices/platform/host5/session19/target5:0:0/5:0:0:6/block/sdaa (block)
UDEV  [19173.803218] add /devices/platform/host6/session20/target6:0:0/6:0:0:3/block/sdad (block)
UDEV  [19173.849411] add /devices/platform/host5/session19/target5:0:0/5:0:0:7/block/sdac (block)
UDEV  [19173.918301] add /devices/platform/host6/session20/target6:0:0/6:0:0:5/block/sdaf (block)
UDEV  [19173.941005] add /devices/platform/host6/session20/target6:0:0/6:0:0:4/block/sdae (block)
UDEV  [19173.987206] add /devices/platform/host6/session20/target6:0:0/6:0:0:7/block/sdah (block)
UDEV  [19173.992431] add /devices/platform/host6/session20/target6:0:0/6:0:0:6/block/sdag (block)

Or like this:
UDEV  [20712.402631] remove /devices/platform/host3/session17/target3:0:0/3:0:0:0/block/sdc (block)
UDEV  [20712.427716] remove /devices/platform/host6/session20/target6:0:0/6:0:0:0/block/sdw (block)
UDEV  [20712.459854] remove /devices/platform/host4/session18/target4:0:0/4:0:0:0/block/sdd (block)
UDEV  [20712.471124] remove /devices/platform/host5/session19/target5:0:0/5:0:0:0/block/sdh (block)
UDEV  [20712.492190] remove /devices/platform/host6/session20/target6:0:0/6:0:0:1/block/sdz (block)
UDEV  [20712.495245] remove /devices/platform/host4/session18/target4:0:0/4:0:0:1/block/sdg (block)
UDEV  [20712.512007] remove /devices/platform/host3/session17/target3:0:0/3:0:0:1/block/sde (block)
UDEV  [20712.522986] remove /devices/platform/host5/session19/target5:0:0/5:0:0:1/block/sdk (block)
UDEV  [20712.528676] remove /devices/platform/host6/session20/target6:0:0/6:0:0:2/block/sdab (block)
UDEV  [20712.529891] remove /devices/platform/host5/session19/target5:0:0/5:0:0:2/block/sdm (block)
UDEV  [20712.536178] remove /devices/platform/host4/session18/target4:0:0/4:0:0:2/block/sdi (block)
UDEV  [20712.545444] remove /devices/platform/host4/session18/target4:0:0/4:0:0:3/block/sdl (block)
UDEV  [20712.548006] remove /devices/platform/host3/session17/target3:0:0/3:0:0:3/block/sdj (block)
UDEV  [20712.551935] remove /devices/platform/host5/session19/target5:0:0/5:0:0:3/block/sdp (block)
UDEV  [20712.555405] remove /devices/platform/host3/session17/target3:0:0/3:0:0:2/block/sdf (block)
UDEV  [20712.556947] remove /devices/platform/host4/session18/target4:0:0/4:0:0:4/block/sdo (block)
UDEV  [20712.563524] remove /devices/platform/host5/session19/target5:0:0/5:0:0:4/block/sds (block)
UDEV  [20712.572948] remove /devices/platform/host6/session20/target6:0:0/6:0:0:4/block/sdae (block)
UDEV  [20712.574738] remove /devices/platform/host6/session20/target6:0:0/6:0:0:3/block/sdad (block)
UDEV  [20712.576736] remove /devices/platform/host4/session18/target4:0:0/4:0:0:5/block/sdr (block)
UDEV  [20712.581343] remove /devices/platform/host3/session17/target3:0:0/3:0:0:4/block/sdn (block)
UDEV  [20712.583883] remove /devices/platform/host3/session17/target3:0:0/3:0:0:5/block/sdq (block)
UDEV  [20712.604498] remove /devices/platform/host6/session20/target6:0:0/6:0:0:5/block/sdaf (block)
UDEV  [20712.605536] remove /devices/platform/host4/session18/target4:0:0/4:0:0:6/block/sdu (block)
UDEV  [20712.605721] remove /devices/platform/host3/session17/target3:0:0/3:0:0:6/block/sdt (block)
UDEV  [20712.606517] remove /devices/platform/host5/session19/target5:0:0/5:0:0:5/block/sdv (block)
UDEV  [20712.610709] remove /devices/platform/host5/session19/target5:0:0/5:0:0:6/block/sdaa (block)
UDEV  [20712.618187] remove /devices/platform/host4/session18/target4:0:0/4:0:0:7/block/sdy (block)
UDEV  [20712.628520] remove /devices/platform/host6/session20/target6:0:0/6:0:0:6/block/sdag (block)
UDEV  [20712.642216] remove /devices/platform/host5/session19/target5:0:0/5:0:0:7/block/sdac (block)
UDEV  [20712.643796] remove /devices/platform/host3/session17/target3:0:0/3:0:0:7/block/sdx (block)
UDEV  [20712.653786] remove /devices/platform/host6/session20/target6:0:0/6:0:0:7/block/sdah (block)

These scenarios are more realistic, and solving them is the more
urgent need.

Regards,
Tang









^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-18  8:24               ` tang.junhui
@ 2016-11-18  8:30                 ` Martin Wilck
  2016-11-18  8:56                   ` tang.junhui
                                     ` (2 more replies)
  0 siblings, 3 replies; 47+ messages in thread
From: Martin Wilck @ 2016-11-18  8:30 UTC (permalink / raw)
  To: tang.junhui; +Cc: dm-devel

On Fri, 2016-11-18 at 16:24 +0800, tang.junhui@zte.com.cn wrote:
> Hi Martin, 
> 
> In your case, my action would be:
> 1) merge uevents 1) and 2) into one uevent "add sda sdb", and
> process them together

This will fail because sdb is non-existent at the time you try - no?

> Though the processing efficiency in such a scenario is lower than
> with your approach, it is simple and reliable. More importantly,
> Martin, you are still focusing on a special scene; what I am
> concerned with looks like this:

[...]

I understand what you're concerned with. I just think we need to do
both. I agree that many events for many different devices are more
likely. But once you start merging, you'd rather be prepared for
several events for the same phys device, too.

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-18  8:30                 ` Martin Wilck
@ 2016-11-18  8:56                   ` tang.junhui
  2016-11-18  9:12                   ` tang.junhui
  2016-11-21 18:19                   ` Benjamin Marzinski
  2 siblings, 0 replies; 47+ messages in thread
From: tang.junhui @ 2016-11-18  8:56 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel



Hi Martin,

> This will fail because sdb is non-existent at the time you try - no?

Non-existent paths in merged uevents do not affect the overall
algorithm; they just need to be handled at implementation time.

Your suggestion is meaningful, but I think we should focus on the
existing problem, resolve it, and then do more work to make things
better in other areas. Isn't that so?

Regards,
Tang








^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-18  8:30                 ` Martin Wilck
  2016-11-18  8:56                   ` tang.junhui
@ 2016-11-18  9:12                   ` tang.junhui
  2016-11-21 18:19                   ` Benjamin Marzinski
  2 siblings, 0 replies; 47+ messages in thread
From: tang.junhui @ 2016-11-18  9:12 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel



Hi Martin,

> But once you start merging, you'd rather be prepared for
> several events for the same phys device, too.

We can use the rule that merging continues only while there are no
repeated uevents from the same sd device; otherwise, we pause merging
and kick the uevent processing thread to process the uevents merged so
far.
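
A tiny sketch of that stop condition (invented types, for illustration
only):

#include <string.h>

struct uev {
        const char *action;
        const char *kname;
};

/* Grow the merge window only while every queued event is for a
 * distinct device; a repeat means ordering matters, so flush first. */
int merge_window(const struct uev *q, int n)
{
        for (int i = 0; i < n; i++)
                for (int j = 0; j < i; j++)
                        if (strcmp(q[i].kname, q[j].kname) == 0)
                                return i;  /* merge q[0..i-1] now */
        return n;                          /* no repeats: merge all */
}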

Regards,
Tang







^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-17 10:48         ` Martin Wilck
  2016-11-18  1:02           ` tang.junhui
@ 2016-11-18 22:26           ` Benjamin Marzinski
  2016-11-23  1:08             ` tang.junhui
  2016-11-24  9:21             ` Martin Wilck
  1 sibling, 2 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-18 22:26 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel, tang.junhui

My $.02

I'm not sure that we want to wait for a predetermined time to grab the
uevents.  If they are coming in slowly, multipathd is more responsive by
processing them immediately, and if they are coming in a storm, they
will naturally pile up faster than we deal with them. I also don't
really like the idea of waiting when we get an event to see if another
one comes along, for the same reasons. That being said, I wouldn't NACK
a config option (turned off by default) that set a minimum wait time
between uevent dispatches, because I might be wrong here.  However, it
seems like what uevent_dispatch() already does is the right thing,
which is to pull all outstanding uevents off the uevent queue into a
new list.

The issue is that we simply need to work on processing the whole list at
a time.  Martin's filtering definitely has a place here. He is correct
that if we get a delete uevent for a device, we don't need to process
any of the earlier ones.  We do need to look at both the add and change
uevents individually. For instance, one of the change uevents out of a
group could be for changing the read-only status of a device. If we just
took the most recent, we would miss that information.  The good news is
that we don't need to actually call uev_add_path and uev_update_path
once for each of these events.  All we need to do is read through the
uev environment variables for all of them to find the information we
need (like change in ro status), and then call the function once with
that information.

Hannes pointed out that for adding paths we don't need to do any locking
until we want to add the path to the pathvec. We could grab all the
outstanding add events from the list that gets passed to service_uevq,
and collect the pathinfo for all the paths, add them all into the
pathvec, and then create or update the multipath devices. Delete
uevents could work similarly: we find all the paths for a multipath
device in our list, remove them all, and reload the device once.
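
A rough sketch of that batching idea (names invented; gather_pathinfo()
stands in for multipathd's pathinfo(), and the stub below just fakes
the slow device interrogation):

#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct path {
        char dev[16];
};

/* Stand-in for the slow, lock-free part: sysfs/SCSI interrogation. */
static struct path *gather_pathinfo(const char *kname)
{
        struct path *pp = calloc(1, sizeof(*pp));

        if (pp)
                strncpy(pp->dev, kname, sizeof(pp->dev) - 1);
        return pp;
}

static pthread_mutex_t vecs_lock = PTHREAD_MUTEX_INITIALIZER;

/* Collect path info for a whole batch of "add" events without the
 * lock, then take the lock once: insert all the paths and create or
 * update each affected multipath map a single time. */
void service_add_batch(const char **knames, int n, struct path **out)
{
        for (int i = 0; i < n; i++)
                out[i] = gather_pathinfo(knames[i]);

        pthread_mutex_lock(&vecs_lock);
        /* ... append out[0..n-1] to the pathvec and reload each
         * affected map once ... */
        pthread_mutex_unlock(&vecs_lock);
}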

I'm not sure how much a separate thread for doing the multipath work
would buy us, however. It's true that we could have a separate lock for
the mpvec, so we didn't need to hold the pathvec lock while updating the
mpvec, but actually updating the mpvec only takes a moment. The time-
consuming part is building the multipath device and passing it to the
kernel. For this, we do need the paths locked. Otherwise, what would
keep a path from getting removed while the multipath device was using
it to get set up? If working with multipath devices and processing
path uevents happen at the same time, this is a very real
possibility.

But even if we keep one thread processing the uevents, simply avoiding
calling uev_add_path and uev_update_path for repeated add and change
events, and batching multiple add and delete events together could
provide a real speedup, IMHO.

Now, the holy grail of multipathd efficiency would be to totally rework
multipathd's locking into something sensible.  We could have locks for
the vectors that only got held when you were actually traversing or
manipulating the vectors, coupled with reference counts on the
individual paths and multipaths, so that they didn't get removed while
they were in use, and probably locks on the individual paths and
multipaths for just the sections that really needed to be protected.
The problem is that this would take a huge amount of code auditing to
get mostly right, and then a while to flush out all the missed corner
cases.  At least I think it would. But I dismissed decoupling the config
structure from the vecs lock as too painful, and clearly that was
possible.

At any rate, I'd rather get rid of the gazillion waiter threads first.

-Ben

On Thu, Nov 17, 2016 at 11:48:32AM +0100, Martin Wilck wrote:
> Hi Tang,
> 
> > As for processing several uevents for the same physical device, I
> > think the difference of opinion between us is "FILTER" vs. "MERGE".
> > Personally, I think merging is more accurate; for example, suppose
> > we receive addition uevents for 4 paths of the same physical device:
> > 1) uevent add sdb
> > 2) uevent add sdc
> > 3) uevent add sdd
> > 4) uevent add sde
> > 
> > We cannot simply filter out uevents 1), 2) and 3) and process only
> > uevent 4), which would lose paths of this multipath device.
> 
> Of course. My "filtering" idea was meant for cases where several events
> for the same device are queued, e.g.
> 
>   1) add sda
>   2) change sda
>   3) delete sda
> 
> Is it always sufficient to look only at the last event in such a case?
> I think so, but I'm not 100% certain.
> 
> Regards
> Martin

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-18  8:30                 ` Martin Wilck
  2016-11-18  8:56                   ` tang.junhui
  2016-11-18  9:12                   ` tang.junhui
@ 2016-11-21 18:19                   ` Benjamin Marzinski
  2 siblings, 0 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-21 18:19 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel, tang.junhui

On Fri, Nov 18, 2016 at 09:30:57AM +0100, Martin Wilck wrote:
> On Fri, 2016-11-18 at 16:24 +0800, tang.junhui@zte.com.cn wrote:
> > Hi Martin, 
> > 
> > In your case, my action would be:
> > 1) merge uevents 1) and 2) into one uevent "add sda sdb", and
> > process them together
> 
> This will fail because sdb is non-existent at the time you try - no?

Yes. If we get a cluster of events for a device, and the last event is
a remove event, we don't need to process them. Just like you mention,
after that remove event is sent by udev, the device is gone.  There is
nothing that multipathd could do with it even if it wanted to. So there
is no point in trying the add and change event processing, since we
already know that they will fail.

-Ben

> 
> > Though the processing efficiency in such a scenario is lower than
> > with your approach, it is simple and reliable. More importantly,
> > Martin, you are still focusing on a special scene; what I am
> > concerned with looks like this:
> 
> [...]
> 
> I understand what you're concerned with. I just think we need to do
> both. I agree that many events for many different devices are more
> likely. But once you start merging, you'd rather be prepared for
> several events for the same phys device, too.
> 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)
> 

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-18 22:26           ` Benjamin Marzinski
@ 2016-11-23  1:08             ` tang.junhui
  2016-11-29  9:07               ` Zdenek Kabelac
  2016-11-24  9:21             ` Martin Wilck
  1 sibling, 1 reply; 47+ messages in thread
From: tang.junhui @ 2016-11-23  1:08 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: zhang.kai16, dm-devel, Bart Van Assche, Martin Wilck



So, now we have at least 4 ways to improve multipath efficiency:
1) Filtering uevents;
2) Merging uevents;
3) Using separate locks for mpvec and pathvec;
4) Getting rid of the gazillion waiter threads.

This is exciting, but how do we achieve this blueprint?
Can we set up some working groups and develop these improvements in
parallel, since most of them are independent?






^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-18 22:26           ` Benjamin Marzinski
  2016-11-23  1:08             ` tang.junhui
@ 2016-11-24  9:21             ` Martin Wilck
  2016-11-28 18:46               ` Benjamin Marzinski
  1 sibling, 1 reply; 47+ messages in thread
From: Martin Wilck @ 2016-11-24  9:21 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel, tang.junhui

On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:

> The issue is that we simply need to work on processing the whole
> list at a time.  Martin's filtering definitely has a place here. He
> is correct that if we get a delete uevent for a device, we don't need
> to process any of the earlier ones.  We do need to look at both the
> add and change uevents individually. For instance, one of the change
> uevents out of a group could be for changing the read-only status of
> a device. If we just took the most recent, we would miss that
> information.  The good news is that we don't need to actually call
> uev_add_path and uev_update_path once for each of these events.  All
> we need to do is read through the uev environment variables for all
> of them to find the information we need (like change in ro status),
> and then call the function once with that information.

Do we need to implement the environment-merging ourselves? AFAICS this
logic is implemented in udev itself. So if the last event for a given
device is a change event, we'd just need to fetch the device properties
from the udev db. Therefore IMO we just have to look at the last
received event for a given path:

 - DELETE => discard the device
 - ADD => use the udev properties coming with the event
 - CHANGE => query udev db for current set of properties
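
For illustration, that dispatch rule might look like this (helper
names invented; the stubs stand in for the real actions):

#include <string.h>

static void discard_path(const char *kname)        { (void)kname; }
static void add_from_uevent_env(const char *kname) { (void)kname; }
static void requery_udev_db(const char *kname)     { (void)kname; }

/* Dispatch only the newest queued event for a given path. */
void dispatch_last_event(const char *action, const char *kname)
{
        if (strcmp(action, "remove") == 0)
                discard_path(kname);         /* device is gone */
        else if (strcmp(action, "add") == 0)
                add_from_uevent_env(kname);  /* event carries props */
        else if (strcmp(action, "change") == 0)
                requery_udev_db(kname);      /* db has current set */
}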
 
> At any rate, I'd rather get rid of the gazillion waiter threads
> first.

Hm, I thought the threads were good because they avoid one
unresponsive device stalling everything?

Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-24  9:21             ` Martin Wilck
@ 2016-11-28 18:46               ` Benjamin Marzinski
  2016-11-29  6:47                 ` Hannes Reinecke
  2016-11-29  7:57                 ` Martin Wilck
  0 siblings, 2 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-28 18:46 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel, tang.junhui

On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
> On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:
> 
> > The issue is that we simply need to work on processing the whole
> > list at a time.  Martin's filtering definitely has a place here. He
> > is correct that if we get a delete uevent for a device, we don't
> > need to process any of the earlier ones.  We do need to look at
> > both the add and change uevents individually. For instance, one of
> > the change uevents out of a group could be for changing the
> > read-only status of a device. If we just took the most recent, we
> > would miss that information.  The good news is that we don't need
> > to actually call uev_add_path and uev_update_path once for each of
> > these events.  All we need to do is read through the uev
> > environment variables for all of them to find the information we
> > need (like change in ro status), and then call the function once
> > with that information.
> 
> Do we need to implement the environment-merging ourselves? AFAICS this
> logic is implemented in udev itself. So if the last event for a given
> device is a change event, we'd just need to fetch the device properties
> from the udev db. Therefore IMO we just have to look at the last
> received event for a given path:
> 
>  - DELETE => discard the device
>  - ADD => use the udev properties coming with the event
>  - CHANGE => query udev db for current set of properties

The issue is that when we get a change event, there may be udev
environment variables that tell us what that specific event is for. For
instance, a change in the Read-only status of the path device. Future
change events will not have those environment variables set.  However,
if we want to only call uev_update_path once, we need to know all of the
things that changed, so we need to examine the appropriate udev
environment variables for all the change events, so that when we call
uev_update_path, we can take all the necessary actions (in this
instance, changing the Read-only status of the multipath device).
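
A sketch of that environment-variable merge (the struct is invented;
DISK_RO is the variable the kernel sets on read-only state changes,
but the surrounding code is illustration only):

#include <stdlib.h>
#include <string.h>

struct uev {
        int nenv;
        const char **env;       /* "KEY=value" strings */
};

/* Scan every queued change event for one path and remember the newest
 * DISK_RO value, so uev_update_path-style work is done only once. */
int merged_ro_state(const struct uev *evs, int n)
{
        int ro = -1;            /* -1: no read-only change seen */

        for (int i = 0; i < n; i++)
                for (int j = 0; j < evs[i].nenv; j++)
                        if (strncmp(evs[i].env[j], "DISK_RO=", 8) == 0)
                                ro = atoi(evs[i].env[j] + 8);
        return ro;
}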

> > At any rate, I'd rather get rid of the gazillion waiter threads
> > first.
> 
> Hm, I thought the threads are good because this avoids one unresponsive
> device to stall everything?

There is work making dm events pollable, so that you can wait for any
number of them with one thread. At the moment, once we get an event, we
lock the vecs lock, which pretty much keeps everything else from
running, so this doesn't really change that.

-Ben
 
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 18:46               ` Benjamin Marzinski
@ 2016-11-29  6:47                 ` Hannes Reinecke
  2016-11-29  8:02                   ` Martin Wilck
  2016-11-29  7:57                 ` Martin Wilck
  1 sibling, 1 reply; 47+ messages in thread
From: Hannes Reinecke @ 2016-11-29  6:47 UTC (permalink / raw)
  To: Benjamin Marzinski, Martin Wilck; +Cc: dm-devel, tang.junhui

On 11/28/2016 07:46 PM, Benjamin Marzinski wrote:
> On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
>> On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:
>>
>>> The issue is that we simply need to work on processing the whole
>>> list at a time.  Martin's filtering definitely has a place here. He
>>> is correct that if we get a delete uevent for a device, we don't
>>> need to process any of the earlier ones.  We do need to look at
>>> both the add and change uevents individually. For instance, one of
>>> the change uevents out of a group could be for changing the
>>> read-only status of a device. If we just took the most recent, we
>>> would miss that information.  The good news is that we don't need
>>> to actually call uev_add_path and uev_update_path once for each of
>>> these events.  All we need to do is read through the uev
>>> environment variables for all of them to find the information we
>>> need (like change in ro status), and then call the function once
>>> with that information.
>>
>> Do we need to implement the environment-merging ourselves? AFAICS this
>> logic is implemented in udev itself. So if the last event for a given
>> device is a change event, we'd just need to fetch the device properties
>> from the udev db. Therefore IMO we just have to look at the last
>> received event for a given path:
>>
>>  - DELETE => discard the device
>>  - ADD => use the udev properties coming with the event
>>  - CHANGE => query udev db for current set of properties
> 
> The issue is that when we get a change event, there may be udev
> environment variables that tell us what that specific event is for. For
> instance, a change in the Read-only status of the path device. Future
> change events will not have those environment variables set.  However,
> if we want to only call uev_update_path once, we need to know all of the
> things that changed, so we need to examine the appropriate udev
> environment variables for all the change events, so that when we call
> uev_update_path, we can take all the necessary actions (in this
> instance, changing the Read-only status of the multipath device).
> 
>>> At any rate, I'd rather get rid of the gazillion waiter threads
>>> first.
>>
>> Hm, I thought the threads are good because this prevents one unresponsive
>> device from stalling everything?
> 
> There is work making dm events pollable, so that you can wait for any
> number of them with one thread. At the moment, once we get an event, we
> lock the vecs lock, which pretty much keeps everything else from
> running, so this doesn't really change that.
> 
Which again leads me to the question:
Why are we waiting for dm events?
The code handling them is pretty arcane, and from what I've seen there
is nothing in there we wouldn't be informed of via other mechanisms
(path checker, uevents).
So why do we still bother with them?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 18:46               ` Benjamin Marzinski
  2016-11-29  6:47                 ` Hannes Reinecke
@ 2016-11-29  7:57                 ` Martin Wilck
  2016-11-29 17:41                   ` Benjamin Marzinski
  1 sibling, 1 reply; 47+ messages in thread
From: Martin Wilck @ 2016-11-29  7:57 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: dm-devel, tang.junhui

On Mon, 2016-11-28 at 12:46 -0600, Benjamin Marzinski wrote:
> On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
> > 
> > from the udev db. Therefore IMO we just have to look at the last
> > received event for a given path:
> > 
> >  - DELETE => discard the device
> >  - ADD => use the udev properties coming with the event
> >  - CHANGE => query udev db for current set of properties
> 
> The issue is that when we get a change event, there may be udev
> environment variables that tell us what that specific event is for.
> For
> instance, a change in the Read-only status of the path device. Future
> change events will not have those environment variables set.

And the udev db will not have records of the environment variables of
previous change events? IOW, in your example, we can't derive the read-
only status of a device by looking at the current set of udev
properties of the device, only by tracking the full uevent history?

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-29  6:47                 ` Hannes Reinecke
@ 2016-11-29  8:02                   ` Martin Wilck
  2016-11-29  8:10                     ` Zdenek Kabelac
  2016-11-29 17:25                     ` Benjamin Marzinski
  0 siblings, 2 replies; 47+ messages in thread
From: Martin Wilck @ 2016-11-29  8:02 UTC (permalink / raw)
  To: Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel, tang.junhui

On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote:
> On 11/28/2016 07:46 PM, Benjamin Marzinski wrote:
> > On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
> > > On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:
> > > 
> > > > At any rate, I'd rather get rid of the gazillion waiter threads
> > > > first.
> > > 
> > > Hm, I thought the threads are good because this prevents one
> > > unresponsive
> > > device from stalling everything?
> > 
> > There is work making dm events pollable, so that you can wait for
> > any
> > number of them with one thread. At the moment, once we get an
> > event, we
> > lock the vecs lock, which pretty much keeps everything else from
> > running, so this doesn't really change that.
> > 
> 
> Which again leads me to the question:
> Why are we waiting for dm events?
> The code handling them is pretty arcane, and from what I've seen
> there
> is nothing in there we wouldn't be informed of via other
> mechanisms
> (path checker, uevents).
> So why do we still bother with them?

I was asking myself the same question. From my inspection of the kernel
code, there are two code paths that trigger a dm event but no uevent
(bypass_pg() and switch_pg_num(), both related to path group
switching). If these are covered by the path checker, I see no point in
waiting for DM events. But of course, I may be missing something.

Regards,
Martin


-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-29  8:02                   ` Martin Wilck
@ 2016-11-29  8:10                     ` Zdenek Kabelac
  2016-11-29  8:16                       ` Martin Wilck
  2016-11-29 17:25                     ` Benjamin Marzinski
  1 sibling, 1 reply; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-29  8:10 UTC (permalink / raw)
  To: Martin Wilck, Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel, tang.junhui

Dne 29.11.2016 v 09:02 Martin Wilck napsal(a):
> On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote:
>> On 11/28/2016 07:46 PM, Benjamin Marzinski wrote:
>>> On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
>>>> On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:
>>>>
>>>>> At any rate, I'd rather get rid of the gazillion waiter threads
>>>>> first.
>>>>
>>>> Hm, I thought the threads are good because this prevents one
>>>> unresponsive
>>>> device from stalling everything?
>>>
>>> There is work making dm events pollable, so that you can wait for
>>> any
>>> number of them with one thread. At the moment, once we get an
>>> event, we
>>> lock the vecs lock, which pretty much keeps everything else from
>>> running, so this doesn't really change that.
>>>
>>
>> Which again leads me to the question:
>> Why are we waiting for dm events?
>> The code handling them is pretty arcane, and from what I've seen
>> there
>> is nothing in there we wouldn't be informed of via other
>> mechanisms
>> (path checker, uevents).
>> So why do we still bother with them?
>
> I was asking myself the same question. From my inspection of the kernel
> code, there are two code paths that trigger a dm event but no uevent
> (bypass_pg() and switch_pg_num(), both related to path group
> switching). If these are covered by the path checker, I see no point in
> waiting for DM events. But of course, I may be missing something.
>

Processing of 'dm' events should likely be delegated to 'dmeventd' -
a daemon which already solves the problem of waiting for an event.

A plugin then just takes the action.

IMHO there is nothing easier you can have.

It's then up to dmeventd to maintain the best 'connection' with the kernel and its events.


But still - I'd really like to see the focus on the biggest bottleneck
first - i.e. 'blkid' executed on appearing & disappearing component devices.


Regards

Zdenek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-29  8:10                     ` Zdenek Kabelac
@ 2016-11-29  8:16                       ` Martin Wilck
  2016-11-29  8:24                         ` Zdenek Kabelac
  0 siblings, 1 reply; 47+ messages in thread
From: Martin Wilck @ 2016-11-29  8:16 UTC (permalink / raw)
  To: Zdenek Kabelac, Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel, tang.junhui

On Tue, 2016-11-29 at 09:10 +0100, Zdenek Kabelac wrote:
> Dne 29.11.2016 v 09:02 Martin Wilck napsal(a):
> > On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote:
> > > On 11/28/2016 07:46 PM, Benjamin Marzinski wrote:
> > > > On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
> > > > > On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:
> > > > > 
> > > > > > At any rate, I'd rather get rid of the gazillion waiter
> > > > > > threads
> > > > > > first.
> > > > > 
> > > > > Hm, I thought the threads are good because this prevents one
> > > > > unresponsive
> > > > > device from stalling everything?
> > > > 
> > > > There is work making dm events pollable, so that you can wait
> > > > for
> > > > any
> > > > number of them with one thread. At the moment, once we get an
> > > > event, we
> > > > lock the vecs lock, which pretty much keeps everything else
> > > > from
> > > > running, so this doesn't really change that.
> > > > 
> > > 
> > > Which again leads me to the question:
> > > Why are we waiting for dm events?
> > > The code handling them is pretty arcane, and from what I've seen
> > > there
> > > is nothing in there we wouldn't be informed of via other
> > > mechanisms
> > > (path checker, uevents).
> > > So why do we still bother with them?
> > 
> > I was asking myself the same question. From my inspection of the
> > kernel
> > code, there are two code paths that trigger a dm event but no
> > uevent
> > (bypass_pg() and switch_pg_num(), both related to path group
> > switching). If these are covered by the path checker, I see no
> > point in
> > waiting for DM events. But of course, I may be missing something.
> > 
> 
> Processing of 'dm' events should likely be delegated to 'dmeventd' -
> a daemon which already solves the problem of waiting for an
> event.
> 
> A plugin then just takes the action.
> 
> IMHO there is nothing easier you can have.

> It's then up to dmeventd to maintain the best 'connection' with the
> kernel and its events.

But that would simply move the "gazillion waiter threads" from
multipathd to dmeventd, right? And it would introduce another boot
sequence dependency for multipathd; I'm not sure that's desirable.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-29  8:16                       ` Martin Wilck
@ 2016-11-29  8:24                         ` Zdenek Kabelac
  0 siblings, 0 replies; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-29  8:24 UTC (permalink / raw)
  To: Martin Wilck, Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel, tang.junhui

Dne 29.11.2016 v 09:16 Martin Wilck napsal(a):
> On Tue, 2016-11-29 at 09:10 +0100, Zdenek Kabelac wrote:
>> Dne 29.11.2016 v 09:02 Martin Wilck napsal(a):
>>> On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote:
>>>> On 11/28/2016 07:46 PM, Benjamin Marzinski wrote:
>>>>> On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
>>>>>> On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:
>>>>>>
>>>>>>> At any rate, I'd rather get rid of the gazillion waiter
>>>>>>> threads
>>>>>>> first.
>>>>>>
>>>>>> Hm, I thought the threads are good because this prevents one
>>>>>> unresponsive
>>>>>> device from stalling everything?
>>>>>
>>>>> There is work making dm events pollable, so that you can wait
>>>>> for
>>>>> any
>>>>> number of them with one thread. At the moment, once we get an
>>>>> event, we
>>>>> lock the vecs lock, which pretty much keeps everything else
>>>>> from
>>>>> running, so this doesn't really change that.
>>>>>
>>>>
>>>> Which again leads me to the question:
>>>> Why are we waiting for dm events?
>>>> The code handling them is pretty arcane, and from what I've seen
>>>> there
>>>> is nothing in there we wouldn't be informed of via other
>>>> mechanisms
>>>> (path checker, uevents).
>>>> So why do we still bother with them?
>>>
>>> I was asking myself the same question. From my inspection of the
>>> kernel
>>> code, there are two code paths that trigger a dm event but no
>>> uevent
>>> (bypass_pg() and switch_pg_num(), both related to path group
>>> switching). If these are covered by the path checker, I see no
>>> point in
>>> waiting for DM events. But of course, I may be missing something.
>>>
>>
>> Processing of 'dm' events likely should be postponed to 'dmeventd' -
>> which is a daemon resolving the problem here with waiting for an
>> event.
>>
>> Plugin just takes the action.
>>
>> IMHO there is nothing easier you can have.
>
>> It's then upto dmeventd to maintain the best 'connection' with kernel
>> and events.
>
> But that would simply move the "gazillion waiter threads" from
> multipathd to dmeventd, right? And it would introduce another boot
> sequence dependency for multipathd; I'm not sure that's desirable.
>


Well - frankly, how many multipath devices have you seen on a single machine?

Have you noticed that every single mounted "XFS" device spawns around 9 threads
these days?

But I'm not going to defend the current 'thread' explosion around device
monitoring - this thing is BEING resolved.

The main point here is that multipathd does NOT need to solve this issue at
all and can let dmeventd resolve it - and once the kernel has a better
mechanism with reliable event passing (which is NOT udev, btw), dmeventd
will use it.

The solution is not for everyone to reimplement everything here...


Regards

Zdenek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-23  1:08             ` tang.junhui
@ 2016-11-29  9:07               ` Zdenek Kabelac
  2016-11-29 10:13                 ` tang.junhui
  0 siblings, 1 reply; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-29  9:07 UTC (permalink / raw)
  To: tang.junhui, Benjamin Marzinski
  Cc: zhang.kai16, dm-devel, Martin Wilck, Bart Van Assche

Dne 23.11.2016 v 02:08 tang.junhui@zte.com.cn napsal(a):
> So, now we have at least 4 ways to improve multipath efficiency:
> 1)  Filtering uevents;
> 2)  Merging uevents;
> 3)  Using separate locks for mpvec and pathvec;
> 4)  Getting rid of the gazillion waiter threads.
>
> This is exciting, but how do we achieve this blueprint?
> Can we set up some working groups and develop it in parallel
> to implement each improvement since most of them are independent?
>

What I'm missing here is an analysis of the performance impact.

Do we have 'perf' traces or other reports showing where the CPU time is wasted?

While solving all of these issues surely makes sense, some tasks have a
major impact on speed, while others users will mostly not notice at all.


Regards

Zdenek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-29  9:07               ` Zdenek Kabelac
@ 2016-11-29 10:13                 ` tang.junhui
  0 siblings, 0 replies; 47+ messages in thread
From: tang.junhui @ 2016-11-29 10:13 UTC (permalink / raw)
  To: Zdenek Kabelac
  Cc: zhang.kai16, dm-devel-bounces, dm-devel, Bart Van Assche, Martin Wilck


[-- Attachment #1.1: Type: text/plain, Size: 1607 bytes --]

Hello Zdenek

I think your suggestion is constructive:
some unnecessary operations in udev rules waste too much system
resource (CPU etc.).
It is another way to improve multipath efficiency;
could you put forward a more specific proposal?

Regards,
Tang





From:    Zdenek Kabelac <zdenek.kabelac@gmail.com>
To:      tang.junhui@zte.com.cn, Benjamin Marzinski <bmarzins@redhat.com>
Cc:      zhang.kai16@zte.com.cn, dm-devel@redhat.com, Martin Wilck <mwilck@suse.com>, Bart Van Assche <bart.vanassche@sandisk.com>
Date:    2016/11/29 17:22
Subject: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices
Sender:  dm-devel-bounces@redhat.com



Dne 23.11.2016 v 02:08 tang.junhui@zte.com.cn napsal(a):
> So, now we have at least 4 ways to improve multipath efficiency:
> 1)  Filtering uevents;
> 2)  Merging uevents;
> 3)  Using separate locks for mpvec and pathvec;
> 4)  Getting rid of the gazillion waiter threads.
>
> This is exciting, but how do we achieve this blueprint?
> Can we set up some working groups and develop it in parallel
> to implement each improvement since most of them are independent?
>

What I'm missing here is an analysis of the performance impact.

Do we have 'perf' traces or other reports showing where the CPU time is wasted?

While solving all of these issues surely makes sense, some tasks have a
major impact on speed, while others users will mostly not notice at all.


Regards

Zdenek

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel



[-- Attachment #1.2: Type: text/html, Size: 2887 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-29  8:02                   ` Martin Wilck
  2016-11-29  8:10                     ` Zdenek Kabelac
@ 2016-11-29 17:25                     ` Benjamin Marzinski
  1 sibling, 0 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-29 17:25 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel, tang.junhui

On Tue, Nov 29, 2016 at 09:02:08AM +0100, Martin Wilck wrote:
> On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote:
> > On 11/28/2016 07:46 PM, Benjamin Marzinski wrote:
> > > On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
> > > > On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:
> > > > 
> > > > > At any rate, I'd rather get rid of the gazillion waiter threads
> > > > > first.
> > > > 
> > > > Hm, I thought the threads are good because this prevents one
> > > > unresponsive
> > > > device from stalling everything?
> > > 
> > > There is work making dm events pollable, so that you can wait for
> > > any
> > > number of them with one thread. At the moment, once we get an
> > > event, we
> > > lock the vecs lock, which pretty much keeps everything else from
> > > running, so this doesn't really change that.
> > > 
> > 
> > Which again leads me to the question:
> > Why are we waiting for dm events?
> > The code handling them is pretty arcane, and from what I've seen
> > there
> > is nothing in there we wouldn't be informed of via other
> > mechanisms
> > (path checker, uevents).
> > So why do we still bother with them?

But multipath would still need to respond to the uevents just like it
currently responds to the dm events (by calling update_multipath), so
aside from the code setting up waiting on the dm events (which isn't
that bad, except that we currently have to set it up and tear it down
for every multipath device) things would still be the same. The locking
requirements of responding to an event would definitely stay the same,
regardless of whether we were responding to a dm_event or a uevent.
 
> I was asking myself the same question. From my inspection of the kernel
> code, there are two code paths that trigger a dm event but no uevent
> (bypass_pg() and switch_pg_num(), both related to path group
> switching). If these are covered by the path checker, I see no point in
> waiting for DM events. But of course, I may be missing something.

One answer is that uevents are only "best effort". They are not
guaranteed to be delivered. And the time when they are least likely to
be delivered is when the system is under a lot of memory pressure, which
is exactly what can happen if you lose all paths to your devices, and
are backing up IO. However, from my discussion with Hannes about this,
neither of us came up with any situation where missing a uevent is
disastrous to multipathd. Uevents don't bring paths back. The checker
loop does.  Uevents will tell multipathd when a path fails, but the
kernel is the one in charge of routing IO. If multipathd missed an event
for the last failing path, it would not correctly enter recovery mode,
but the checker loop should take care of that when it finds that the
path is down.  The way the checker code is right now, if the kernel
notices a path failure and marks a path down between checkerloop runs,
and multipathd missed the event notifying it of this, and the path came
back before the next checker pass, multipath wouldn't restore the path.
We can obviously solve this by just adding the syncing code from
update_multipath to the check_path code after we call
update_multipath_strings. So, multipathd will need to add some code to
protect it from missing uevents, and losing an event will cause
multipathd to do some things slower than otherwise, but switching to
a design that just uses uevents is workable.
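
A sketch of that protective sync step (illustrative only - the names
follow multipath-tools, but the exact signatures are approximated):

/* In check_path(), after the checker has run: reconcile the kernel's
 * dm state with the checker result, in case a uevent was missed. */
static void sync_missed_events(struct vectors *vecs, struct path *pp)
{
	/* refresh mpp state from the kernel table/status */
	update_multipath_strings(pp->mpp, vecs->pathvec);

	/* the kernel failed the path while we missed the event, but
	 * the checker now says it is up: reinstate it ourselves */
	if (pp->dmstate == PSTATE_FAILED && pp->state == PATH_UP)
		dm_reinstate_path(pp->mpp->alias, pp->dev_t);
}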

On the other hand, when you think about it, uevents have a lot of
processing associated with them.  This entire thread is about how to cut
that down. dm events only need to be processed by the program waiting
for them, and programs only get the events that they care about. If we
can make dm event processing less cumbersome (and that is currently
being worked on, by making them pollable), I'm not even sure that we
need to send uevents for paths failing and being restored (except for
possibly when your last path goes down, or your first path goes up). The
whole point of multipath is to manage these paths so that nobody else
needs to care about this stuff.

The one big benefit of removing the dm event waiting code is that we
can remove it completely, which we couldn't do with the uevent code.
Even if we cut down on the number of uevents that multipath generates
and has to respond to, multipathd will still need to respond to uevents,
because many of the important ones come from outside of device-mapper.

-Ben

> 
> Regards,
> Martin
> 
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-29  7:57                 ` Martin Wilck
@ 2016-11-29 17:41                   ` Benjamin Marzinski
  0 siblings, 0 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-29 17:41 UTC (permalink / raw)
  To: Martin Wilck; +Cc: dm-devel, tang.junhui

On Tue, Nov 29, 2016 at 08:57:18AM +0100, Martin Wilck wrote:
> On Mon, 2016-11-28 at 12:46 -0600, Benjamin Marzinski wrote:
> > On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
> > > 
> > > from the udev db. Therefore IMO we just have to look at the last
> > > received event for a given path:
> > > 
> > >  - DELETE => discard the device
> > >  - ADD => use the udev properties coming with the event
> > >  - CHANGE => query udev db for current set of properties
> > 
> > The issue is that when we get a change event, there may be udev
> > environment variables that tell us what that specific event is for.
> > For
> > instance, a change in the Read-only status of the path device. Future
> > change events will not have those environment variables set.
> 
> And the udev db will not have records of the environment variables of
> previous change events? IOW, in your example, we can't derive the read-
> only status of a device by looking at the current set of udev
> properties of the device, only by tracking the full uevent history?

Correct, when a device's read only status is changed, the kernel will
send a change uevent with the udev environment variable "DISK_RO=<n>"
where <n> is 1 if the device changed to read-only, and 0 if it changed
to read-write.  This environment variable is only set for this change
event.

Now, it's totally possible that in every call to uev_update_path,
multipathd could check a device's read-only status in sysfs and take
action if it had changed. Otherwise, it needs to look through all the
change events in the batch it is processing to see if any of them have
the DISK_RO environment variable set, and it would need to pass this
information to uev_update_path. I personally favor this second approach.
I can easily imagine multipath wanting to deal with certain change
events in a specific way, especially if we go the route of ignoring dm
events completely, and relying only on uevents for updating multipath.
In these cases, it may not always be possible to look in sysfs to find
out what caused a previous change event, and we would have to read the
uevent environment variables for all the events we are processing.
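
For comparison, a sketch of the first option - querying the current
state from sysfs on each call (the /sys/block/<dev>/ro attribute is
real; the helper itself is hypothetical):

#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Returns 0 (read-write), 1 (read-only), or -1 on error. */
static int sysfs_get_ro(const char *devname)
{
	char path[256], buf[8];
	ssize_t n;
	int fd;

	snprintf(path, sizeof(path), "/sys/block/%s/ro", devname);
	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;
	n = read(fd, buf, sizeof(buf) - 1);
	close(fd);
	if (n <= 0)
		return -1;
	buf[n] = '\0';
	return atoi(buf);
}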

-Ben

> 
> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 15:37   ` Hannes Reinecke
@ 2016-12-01  1:16     ` tang.junhui
  0 siblings, 0 replies; 47+ messages in thread
From: tang.junhui @ 2016-12-01  1:16 UTC (permalink / raw)
  To: Benjamin Marzinski, Hannes Reinecke
  Cc: Bart Van Assche, dm-devel, Martin Wilck


[-- Attachment #1.1: Type: text/plain, Size: 9855 bytes --]

Hello Ben, Hannes 

I'm sorry for the late reply.

> You can't just get the wwid with no work (look at all work uev_add_path
> does, specifically alloc_path_with_pathinfo). Now you could reorder
> this, but there isn't much point, since it is doing useful things, like
> checking if this is a spurious uevent, and necessary things, like
> figuring out the device type and using that and the configuration to
> figure out HOW to get the wwid.

IMO, the WWID can be obtained from the uevent itself, since there is an
ID_SERIAL field in the uevent body.
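
For example (a sketch only - whether ID_SERIAL, which is set by udev's
scsi_id/ata_id rules, is a valid WWID substitute on all transports is
exactly the assumption being made here; WWID_SIZE and the wwid member
are from the proposal):

#include <stdio.h>
#include <libudev.h>

/* Sketch: fill the proposed uev->wwid member straight from the
 * uevent's properties, without calling pathinfo(). */
static int uevent_get_wwid(struct uevent *uev)
{
	const char *serial =
		udev_device_get_property_value(uev->udev, "ID_SERIAL");

	if (!serial)
		return -1;
	snprintf(uev->wwid, WWID_SIZE, "%s", serial);
	return 0;
}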

> It seems like what you want to do is to
> call uev_add_path multiple times, but defer most of the work that
> ev_add_path does (creating or updating the multipath device), until
> you've processed all the paths.

Not exactly, the input parameter "struct uevent *uev" is a batch of 
merged uevents, so uev_add_path() is called once to process the merged 
uevent.

> split off uev_add_path() and
> ev_add_path().
> Then uev_add_path could generate a list of fully-formed paths which
> ev_add_path() would process.
> IE generalize coalesce_paths() to work on a passed-in list rather than
> the normal vecs->pathvec.

Hannes, I think my thoughts are close to your idea now. In uev_add_path(),
we get all the information for the merged paths, and then call ev_add_path()
to create or update the multipath device. Maybe the difference between us
is "processing all uevents of the same type (add, etc.) in ev_add_path()"
versus "processing all uevents of the same type (add, etc.) which came from
the same LUN in ev_add_path()". I think your idea can also reach the goal
and reduce the number of DM reloads. So we will try to code it as you
suggest, and have list_merger_uevents(&uevq_tmp) only merge uevents of the
same type (add, etc.), stopping the merge when a uevent of another type
occurs; a rough sketch follows below.

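Here is the rough sketch of the merge pass we have in mind (assuming the
wwid/merger_node members proposed earlier, an oldest-to-newest uevq, and
the kernel-style list macros from multipath-tools' list.h):

#include <string.h>

static void list_merger_uevents(struct list_head *uevq)
{
	struct uevent *later, *earlier, *tmp;

	/* walk from the newest event backwards */
	list_for_each_entry_reverse(later, uevq, node) {
		/* fold matching older events into 'later' */
		list_for_each_entry_safe(earlier, tmp, uevq, node) {
			if (earlier == later)
				break;	/* only events older than 'later' */
			if (strncmp(earlier->wwid, later->wwid, WWID_SIZE))
				continue;
			/* same type merges; a newer "remove" swallows
			 * all older events for this WWID */
			if (!strcmp(later->action, "remove") ||
			    !strcmp(earlier->action, later->action)) {
				list_del(&earlier->node);
				list_add_tail(&earlier->node,
					      &later->merger_node);
			}
		}
	}
}
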
Thanks all,
Tang
 



From:    Hannes Reinecke <hare@suse.de>
To:      Benjamin Marzinski <bmarzins@redhat.com>, tang.junhui@zte.com.cn
Cc:      dm-devel@redhat.com, zhang.kai16@zte.com.cn, Martin Wilck <mwilck@suse.com>, Bart Van Assche <bart.vanassche@sandisk.com>
Date:    2016/11/28 23:46
Subject: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices
Sender:  dm-devel-bounces@redhat.com



On 11/28/2016 04:25 PM, Benjamin Marzinski wrote:
> On Mon, Nov 28, 2016 at 10:19:15AM +0800, tang.junhui@zte.com.cn wrote:
>>    Hello Christophe, Ben, Hannes, Martin, Bart,
>>    I am a member of the host-side software development team of the
>>    ZXUSP storage project at ZTE Corporation. Facing market demand, our
>>    team has decided to write code to improve multipath efficiency next
>>    month. The whole idea is in the mail below. We hope to participate
>>    in and make progress with the open source community, so any
>>    suggestion and comment would be welcome.
> 
> Like I mentioned before, I think this is a good idea in general, but the
> devil is in the details here.
> 
>>
>>    Thanks,
>>    Tang
>>
>> 
------------------------------------------------------------------------------------------------------------------------------
>> 
------------------------------------------------------------------------------------------------------------------------------
>>    1.        Problem
>>    In these scenarios, multipath processing efficiency is low:
>>    1) Many paths exist in each multipath device,
>>    2) Device addition or deletion during iSCSI login/logout or FC link
>>    up/down.
> 
> <snip>
> 
>>    4.        Proposal
>>    Instead of processing uevents one by one, uevents coming from the
>>    same LUN device can be merged into one; then the uevent processing
>>    thread only needs to process it once, and it only produces one DM
>>    addition uevent, which reduces system resource consumption.
>>
>>    The example from Chapter 2 is used again to explain the
>>    proposal:
>>    1) Multipath receives block device addition uevents from udev:
>>    UDEV  [89068.806214] add 
>>     /devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc 
(block)
>>    UDEV  [89068.909457] add 
>>     /devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg 
(block)
>>    UDEV  [89068.944956] add 
>>     /devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde 
(block)
>>    UDEV  [89068.959215] add 
>>     /devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh 
(block)
>>    UDEV  [89068.978558] add 
>>     /devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk 
(block)
>>    UDEV  [89069.004217] add 
>>     /devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj 
(block)
>>    UDEV  [89069.486361] add 
>>     /devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf 
(block)
>>    UDEV  [89069.495194] add 
>>     /devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd 
(block)
>>    UDEV  [89069.511628] add 
>>     /devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi 
(block)
>>    UDEV  [89069.716292] add 
>>     /devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl 
(block)
>>    UDEV  [89069.748456] add 
>>     /devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm 
(block)
>>    UDEV  [89069.789662] add 
>>     /devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn 
(block)
>>
>>    2) Multipath merges these 12 uevents to 3 internal uvents
>>    UEVENT add sdc sdh sdd sdl
>>    UEVENT add sde sdj sdf sdm
>>    UEVENT add sdg sdk sdi sdn
>>
>>    3) Multipath process these 3 uevents one by one, and only produce 3
>>    addition
>>    DM uvents, no dm change uevent exists.
>>    KERNEL[89068.899614] add      /devices/virtual/block/dm-2 (block)
>>    KERNEL[89068.955364] add      /devices/virtual/block/dm-3 (block)
>>    KERNEL[89069.018903] add      /devices/virtual/block/dm-4 (block)
> 
> Just because I'm pedantic: There will, of course, be dm change events.
> Without them, you couldn't have a multipath device. Whenever you load a
> table in a dm device (including during the initial creation), you get a
> change event.
> 
>>    4) Udev process these uevents above, and transfer it to multipath
>>    UDEV  [89068.926428] add      /devices/virtual/block/dm-2 (block)
>>    UDEV  [89069.007511] add      /devices/virtual/block/dm-3 (block)
>>    UDEV  [89069.098054] add      /devices/virtual/block/dm-4 (block)
> 
> multipathd ignores add events for dm devices (look at uev_trigger). A dm
> device isn't set up until its initial change event happens.
> 
>>    5) Multipath processes these uevents above, and finishes the 
creation of
>>    multipath
>>    devices.
>>
>>    5.        Coding
>>    After taking over uevents form uevent listening thread, uevent 
processing
>>    thread can
>>    merge these uevents before processing:
>>    int uevent_dispatch(int (*uev_trigger)(struct uevent *, void *
>>    trigger_data),
>>                void * trigger_data)
>>    {
>>        ...
>>        while (1) {
>>            ...
>>            list_splice_init(&uevq, &uevq_tmp);
>>            ...
>>            list_merger_uevents(&uevq_tmp);
>>            service_uevq(&uevq_tmp);
>>        }
>>        ...
>>    }
>>    In the structure "struct uevent", an additional member "char
>>    wwid[WWID_SIZE]" is added to record each device's WWID for addition
>>    or change uevents, to identify whether these uevents come from the
>>    same LUN, and an additional member "struct list_head merger_node" is
>>    added to record the list of uevents which have been merged with this
>>    uevent:
>>    struct uevent {
>>        struct list_head node;
>>        struct list_head merger_node;
>>        char wwid[WWID_SIZE]
>>        struct udev_device *udev;
>>        ...
>>    };
> 
> You can't just get the wwid with no work (look at all work uev_add_path
> does, specifically alloc_path_with_pathinfo). Now you could reorder
> this, but there isn't much point, since it is doing useful things, like
> checking if this is a spurious uevent, and necessary things, like
> figuring out the device type and using that and the configuration to
> figure out HOW to get the wwid. It seems like what you want to do is to
> call uev_add_path multiple times, but defer most of the work that
> ev_add_path does (creating or updating the multipath device), until
> you've processed all the paths.
> 
Which is kinda what I've proposed; split off uev_add_path() and
ev_add_path().
Then uev_add_path could generate a list of fully-formed paths which
ev_add_path() would process.
IE generalize coalesce_paths() to work on a passed-in list rather than
the normal vecs->pathvec.
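
As a sketch, the signature change could look like this (approximated
from multipath-tools; 'vector' is libmultipath's array type and the
extra pathvec argument is the new bit):

/* today coalesce_paths() always iterates vecs->pathvec internally;
 * passing the list in would let callers restrict it: */
int coalesce_paths(struct vectors *vecs, vector newmp, vector pathvec,
		   char *refwwid, int force_reload);

so that ev_add_path() could hand in just the fully-formed paths from the
current uevent batch instead of the global pathvec.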

Or consider using vecs->pathvec to hold 'orphan' paths when processing
uevents; then we could add the paths to the pathvec in uev_add_path()
and have another thread going through the list of paths and trying to
call coalesce_paths() for the changed mpps.
Will be a tad tricky, as we'd need to identify which mpps needs to be
updated; but nothing too awkward.

Meanwhile I'm working on converting vecs->pathvec and vecs->mpvec to RCU
lists; that should get rid of the need for most locks, which I suspect
are the main reason for the scaling issues.
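
Purely as an illustration of why readers would then no longer need the
vecs lock (multipath-tools does not use liburcu; this is just the shape
of the idea, using userspace RCU's list primitives):

#include <string.h>
#include <urcu.h>		/* userspace RCU (liburcu) */
#include <urcu/rculist.h>

struct path_entry {
	struct cds_list_head node;
	char dev[32];
};

static CDS_LIST_HEAD(pathvec_rcu);

/* lockless lookup; the calling thread must have done
 * rcu_register_thread() once at startup */
static struct path_entry *find_path_rcu(const char *dev)
{
	struct path_entry *pe, *found = NULL;

	rcu_read_lock();
	cds_list_for_each_entry_rcu(pe, &pathvec_rcu, node) {
		if (!strcmp(pe->dev, dev)) {
			found = pe;
			break;
		}
	}
	rcu_read_unlock();
	/* only safe because writers remove entries with
	 * cds_list_del_rcu() and free them after synchronize_rcu() */
	return found;
}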

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel


[-- Attachment #1.2: Type: text/html, Size: 14860 bytes --]

[-- Attachment #2: Type: text/plain, Size: 0 bytes --]



^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-29  7:52     ` Martin Wilck
@ 2016-11-29 19:21       ` Benjamin Marzinski
  0 siblings, 0 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-29 19:21 UTC (permalink / raw)
  To: Martin Wilck; +Cc: tang.junhui, zhang.kai16, dm-devel, Bart Van Assche

On Tue, Nov 29, 2016 at 08:52:17AM +0100, Martin Wilck wrote:
> On Mon, 2016-11-28 at 11:31 -0600, Benjamin Marzinski wrote:
> > On Mon, Nov 28, 2016 at 11:28:56AM +0100, Martin Wilck wrote:
> > > On Mon, 2016-11-28 at 10:19 +0800, tang.junhui@zte.com.cn wrote:
> > > > 
> > > > 4.        Proposal 
> > > > Instead of processing uevents one by one, uevents coming from
> > > > the same LUN device can be merged into one; then the uevent
> > > > processing thread only needs to process it once, and it only
> > > > produces one DM addition uevent, which reduces system resource
> > > > consumption.
> > > > 
> > > 
> > > Here comes an idea how to achieve this without a lot of additional
> > > code:
> > > 
> > > libmultipath already has code to check whether any maps need to be
> > > updated (in coalesce_paths()). Instead of recording uevents,
> > > merging
> > > them, and calling ev_add_path() for every affected WWID, it might
> > > be
> > > sufficient to set daemon state to DAEMON_CONFIGURE and wake up the
> > > main
> > > multipathd thread to call reconfigure(). Then we only need to make
> > > sure
> > > that coalesce_paths() really reloads or creates maps only when
> > > necessary. I have some patches here that I made for that purpose,
> > > for a
> > > different scenario (multipathd to avoid RELOAD ioctls when it's
> > > started
> > > in a scenario where most paths are already set up by udev).
> > 
> > There is a long standing multipath issue that this will likely make a
> > lot worse.  Currently, multipathd can drop paths that it doesn't have
> > access to on reconfigure. This is wrong. When the path comes back, it
> > won't issue another add uevent. This means that multipathd won't be
> > able
> > to re-enable it (since it stopped monitoring it during the
> > reconfigure).
> > In fact the only way to get the path back is to either manually
> > intervene or to have the path actually get removed from the system
> > and
> > come back.
> 
> Thanks for pointing this out. But don't you think it can be dealt with
> by fixing the reconfigure() path? 

Sure. We should fix that even if we don't make calls to reconfigure more
common. I was just pointing out that by going this route, we make fixing
that issue much more of a priority.

> My point is: any code that merges uevents and tries to figure out the
> correct "minimal" set DM operations to put the changes in place re-
> implement at least parts of the logic of coalesce_paths(), and face
> similar problems.

Yeah. But reconfigure does a lot of stuff that we probably shouldn't be
doing when we are simply adding and removing paths, such as reload the
configuration (and everything that goes along with that, such as
remaking the pathvec to deal with changes in which devices are
blacklisted). We could try breaking out just what needs to be done to
deal with changes into a separate function, but I'm pretty certain that to
catch all the possibilities of batched events, we will end up doing more
work than necessary in many cases.  In fact, we may find that this takes
long enough that we actually take a performance hit unless we have a
large batch. 

-Ben

> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 17:22         ` Benjamin Marzinski
@ 2016-11-29  9:34           ` Zdenek Kabelac
  0 siblings, 0 replies; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-29  9:34 UTC (permalink / raw)
  To: Benjamin Marzinski, Hannes Reinecke
  Cc: tang.junhui, zhang.kai16, dm-devel, Zdenek Kabelac,
	Bart Van Assche, Martin Wilck

Dne 28.11.2016 v 18:22 Benjamin Marzinski napsal(a):
> On Mon, Nov 28, 2016 at 01:08:51PM +0100, Hannes Reinecke wrote:
>> On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
>>> Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a):
>>>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
>>>
>>> With multipath device - you ONLY want to 'scan' device (with blkid)  when
>>> only the initial first member device of multipath gets in.
>>>
>>> So you start multipath (resume -> CHANGE) - it should be the ONLY place
>>> to run 'blkid' test (which really goes though over 3/4MB of disk read,
>>> to check if there is not ZFS somewhere)
>>>
>>> Then any next disk being a member of multipath (recognized by 'multipath
>>> -c',
>>> should NOT scan)  - as far  as  I can tell current order is opposite,
>>> fist there is  'blkid' (60) and then rule (62) recognizes a mpath_member.
>>>
>>> Thus every add disk fires very lengthy blkid scan.
>>>
>>> Of course I'm not here an expert on dm multipath rules so passing this
>>> on to prajnoha@ -  but I'd guess this is primary source of slowdowns.
>>>
>>> There should be exactly ONE blkid for a single multipath device - as
>>> long as 'RELOAD' only  add/remove  paths  (there is no reason to scan
>>> component devices)
>>>
>> ATM 'multipath -c' is just a simple test if the device is supposed to be
>> handled by multipath.
>
> Well, "simple" might be stretching the truth a little. I'd really like
> to not have to call multipath -c on every change event to a device since
> this isn't a particularly quick callout either. In fact we do this with
> the redhat code, but we use some horrible hacks, that could be solved if
> udev would just allow rules to compare the value of two environment
> variables. But this is impossible. I opened a bugzilla for allowing
> this, but it recently got closed WONTFIX. The idea is that multipathd

There are a number of ways to solve this.

One could be a udev built-in command plugin to make parsing of the WWID efficient.

multipath could also be able to generate a udev rule dynamically
(i.e. --makeudevrule) which would check if the udev rule is older than the
config file and reload udev with the new rule if necessary.
In such generated rules you then could use  WWID constant for compare operation.

But even plain 'multipath -c' is still better - since it's executed already 
anyway - my point here is - it should be executed before 'blkid' is attempted.

> would set a timestamp when it started, when a new wwid was added to the
> wwids file, and when it got reconfigured. These are the only times a
> configuration could change (well, users could change /etc/multipath.conf
> but not reload multipathd, but that will already cause problems). udev
> would read this timestamp and save it to the database.  When change
> events come along, as long as the timestamp hasn't changed, the old
> value of "multipath -c" is still correect.
>
>> And the number of bytes read by blkid shouldn't be _that_ large; a simple
>> 'blkid' on my device caused it to read 35k ...
>>
>> Also udev will become very unhappy if we're not calling blkid for every
>> device; you'd be having a hard time reconstructing the event for those
>> devices.
>> While it's trivial to import variables from parent devices, it's
>> impossible to do that from unrelated devices; you'd need a dedicated
>> daemon for that.
>> So we cannot skip blkid without additional tooling.
>
> We need to run blkid on the multipath device, but I'm not sure what it
> gets us on the paths, once we have determined that they belong to

There is NO reason to run blkid on a path (component device) -  since the 
discovered filesystem is later overwritten by mpath member info anyway.

Now lvm2 even tries to detect whether the user has tried to use a 'path'
instead of the real mpath device on a misconfigured system - and a number
of such misuses have already been detected.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 17:31   ` Benjamin Marzinski
@ 2016-11-29  7:52     ` Martin Wilck
  2016-11-29 19:21       ` Benjamin Marzinski
  0 siblings, 1 reply; 47+ messages in thread
From: Martin Wilck @ 2016-11-29  7:52 UTC (permalink / raw)
  To: Benjamin Marzinski; +Cc: Bart, tang.junhui, zhang.kai16, dm-devel, Assche

On Mon, 2016-11-28 at 11:31 -0600, Benjamin Marzinski wrote:
> On Mon, Nov 28, 2016 at 11:28:56AM +0100, Martin Wilck wrote:
> > On Mon, 2016-11-28 at 10:19 +0800, tang.junhui@zte.com.cn wrote:
> > > 
> > > 4.        Proposal 
> > > Instead of processing uevents one by one, uevents coming from
> > > the same LUN device can be merged into one; then the uevent
> > > processing thread only needs to process it once, and it only
> > > produces one DM addition uevent, which reduces system resource
> > > consumption.
> > > 
> > 
> > Here comes an idea how to achieve this without a lot of additional
> > code:
> > 
> > libmultipath already has code to check whether any maps need to be
> > updated (in coalesce_paths()). Instead of recording uevents,
> > merging
> > them, and calling ev_add_path() for every affected WWID, it might
> > be
> > sufficient to set daemon state to DAEMON_CONFIGURE and wake up the
> > main
> > multipathd thread to call reconfigure(). Then we only need to make
> > sure
> > that coalesce_paths() really reloads or creates maps only when
> > necessary. I have some patches here that I made for that purpose,
> > for a
> > different scenario (multipathd to avoid RELOAD ioctls when it's
> > started
> > in a scenario where most paths are already set up by udev).
> 
> There is a long standing multipath issue that this will likely make a
> lot worse.  Currently, multipathd can drop paths that it doesn't have
> access to on reconfigure. This is wrong. When the path comes back, it
> won't issue another add uevent. This means that multipathd won't be
> able
> to re-enable it (since it stopped monitoring it during the
> reconfigure).
> In fact the only way to get the path back is to either manually
> intervene or to have the path actually get removed from the system
> and
> come back.

Thanks for pointing this out. But don't you think it can be dealt with
by fixing the reconfigure() path? 

My point is: any code that merges uevents and tries to figure out the
correct "minimal" set DM operations to put the changes in place re-
implement at least parts of the logic of coalesce_paths(), and face
similar problems.

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 10:28 ` Martin Wilck
@ 2016-11-28 17:31   ` Benjamin Marzinski
  2016-11-29  7:52     ` Martin Wilck
  0 siblings, 1 reply; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-28 17:31 UTC (permalink / raw)
  To: Martin Wilck; +Cc: tang.junhui, zhang.kai16, dm-devel, Bart Van Assche

On Mon, Nov 28, 2016 at 11:28:56AM +0100, Martin Wilck wrote:
> On Mon, 2016-11-28 at 10:19 +0800, tang.junhui@zte.com.cn wrote:
> > 
> > 4.        Proposal 
> > Instead of processing uevents one by one, uevents coming from the
> > same LUN device can be merged into one; then the uevent processing
> > thread only needs to process it once, and it only produces one DM
> > addition uevent, which reduces system resource consumption.
> > 
> 
> Here comes an idea how to achieve this without a lot of additional
> code:
> 
> libmultipath already has code to check whether any maps need to be
> updated (in coalesce_paths()). Instead of recording uevents, merging
> them, and calling ev_add_path() for every affected WWID, it might be
> sufficient to set daemon state to DAEMON_CONFIGURE and wake up the main
> multipathd thread to call reconfigure(). Then we only need to make sure
> that coalesce_paths() really reloads or creates maps only when
> necessary. I have some patches here that I made for that purpose, for a
> different scenario (multipathd to avoid RELOAD ioctls when it's started
> in a scenario where most paths are already set up by udev).

There is a long standing multipath issue that this will likely make a
lot worse.  Currently, multipathd can drop paths that it doesn't have
access to on reconfigure. This is wrong. When the path comes back, it
won't issue another add uevent. This means that multipathd won't be able
to re-enable it (since it stopped monitoring it during the reconfigure).
In fact the only way to get the path back is to either manually
intervene or to have the path actually get removed from the system and
come back.

The correct thing for multipath to do is to only remove paths when it is
manually instructed to, when a reconfigure blacklists the path, or when
it gets a remove uevent.  Now currently this isn't a huge issue since
people don't commonly reconfigure their multipath devices, but if we
start running reconfigure all the time, it certainly will become one.

-Ben
 
> Regards
> Martin
> 
> -- 
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 12:08       ` Hannes Reinecke
  2016-11-28 12:23         ` Peter Rajnoha
  2016-11-28 12:55         ` Zdenek Kabelac
@ 2016-11-28 17:22         ` Benjamin Marzinski
  2016-11-29  9:34           ` Zdenek Kabelac
  2 siblings, 1 reply; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-28 17:22 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: tang.junhui, zhang.kai16, Zdenek Kabelac, dm-devel,
	Bart Van Assche, Martin Wilck

On Mon, Nov 28, 2016 at 01:08:51PM +0100, Hannes Reinecke wrote:
> On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
> > Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a):
> >> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
> >>> Dne 28.11.2016 v 03:19 tang.junhui@zte.com.cn napsal(a):
> >>>> Hello Christophe, Ben, Hannes, Martin, Bart,
> >>>>> I am a member of the host-side software development team of the
> >>>>> ZXUSP storage project at ZTE Corporation. Facing market demand,
> >>>>> our team has decided to write code to improve multipath efficiency
> >>>>> next month. The whole idea is in the mail below. We hope to
> >>>>> participate in and make progress with the open source community,
> >>>>> so any suggestion and comment would be welcome.
> >>>>
> >>>
> >>>
> >>> Hi
> >>>
> >>> First - we are aware of these issue.
> >>>
> >>> The solution proposed in this mail would surely help - but there is
> >>> likely a bigger issue to be solved first.
> >>>
> >>> The core trouble is to avoid 'blkid' disk identification being
> >>> executed.
> >>> Recent version of multipath is already marking plain 'RELOAD' operation
> >>> of table (which should not be changing disk content) with extra DM bit,
> >>> so udev rules ATM skips 'pvscan' - we also would like to extend the
> >>> functionality to skip rules more and reimport existing 'symlinks' from
> >>> udev database (so they would not get deleted).
> >>>
> >>> I believe the processing of udev rules is 'relatively' quick as long
> >>> as it does not need to read/write ANYTHING from real disks.
> >>>
> >> Hmm. You sure this is an issue?
> >> We definitely need to skip uevent handling when a path goes down (but I
> >> think we do that already), but for 'add' events we absolutely need to
> >> call blkid to figure out if the device has changed.
> >> There are storage arrays out there who use a 'path down/path up' cycle
> >> to inform initiators about any device layout change.
> >> So we wouldn't be able to handle those properly if we don't call blkid
> >> here.
> > 
> > The core trouble is -
> > 
> > 
> > With multipath device - you ONLY want to 'scan' device (with blkid)  when
> > only the initial first member device of multipath gets in.
> > 
> > So you start multipath (resume -> CHANGE) - it should be the ONLY place
> > to run 'blkid' test (which really goes though over 3/4MB of disk read,
> > to check if there is not ZFS somewhere)
> > 
> > Then any next disk being a member of multipath (recognized by 'multipath
> > -c',
> > should NOT scan)  - as far  as  I can tell current order is opposite,
> > fist there is  'blkid' (60) and then rule (62) recognizes a mpath_member.
> > 
> > Thus every add disk fires very lengthy blkid scan.
> > 
> > Of course I'm not here an expert on dm multipath rules so passing this
> > on to prajnoha@ -  but I'd guess this is primary source of slowdowns.
> > 
> > There should be exactly ONE blkid for a single multipath device - as
> > long as 'RELOAD' only  add/remove  paths  (there is no reason to scan
> > component devices)
> > 
> ATM 'multipath -c' is just a simple test if the device is supposed to be
> handled by multipath.

Well, "simple" might be stretching the truth a little. I'd really like
to not have to call multipath -c on every change event to a device since
this isn't a particularly quick callout either. In fact we do this with
the redhat code, but we use some horrible hacks, that could be solved if
udev would just allow rules to compare the value of two environment
variables. But this is impossible. I opened a bugzilla for allowing
this, but it recently got closed WONTFIX. The idea is that multipathd
would set a timestamp when it started, when a new wwid was added to the
wwids file, and when it got reconfigured. These are the only times a
configuration could change (well, users could change /etc/multipath.conf
but not reload multipathd, but that will already cause problems). udev
would read this timestamp and save it to the database.  When change
events come along, as long as the timestamp hasn't changed, the old
value of "multipath -c" is still correect.
 
> And the number of bytes read by blkid shouldn't be _that_ large; a simple
> 'blkid' on my device caused it to read 35k ...
> 
> Also udev will become very unhappy if we're not calling blkid for every
> device; you'd be having a hard time reconstructing the event for those
> devices.
> While it's trivial to import variables from parent devices, it's
> impossible to do that from unrelated devices; you'd need a dedicated
> daemon for that.
> So we cannot skip blkid without additional tooling.

We need to run blkid on the multipath device, but I'm not sure what it
gets us on the paths, once we have determined that they belong to
multipath.  multipathd doesn't use any of the values that blkid sets,
and really nothing else should be directly accessing this device.

-Ben

> >>
> >>> So while aggregation of 'uevents' in multipath would 'shorten' queue
> >>> processing of events - it would still not speedup scan alone.
> >>>
> >>> We need to drastically shorten unnecessary disk re-scanning.
> >>>
> >>> Also note - if you have a lot of disks -  it might be worth to checkout
> >>> whether udev picks  'right amount of udev workers'.
> >>> There is heuristic logic to avoid system overload - but might be worth
> >>> to check if in you system with your amount of CPU/RAM/DISKS  the
> >>> computed number is the best for scaling - i.e. if you double the amount
> >>> of workers - do you
> >>> get any better performance ?
> >>>
> >> That doesn't help, as we only have one queue (within multipath) to
> >> handle all uevents.
> > 
> > This was meant for systems with many different multipath devices.
> > Obviously would not help with a single multipath device.
> > 
> I'm talking about the multipath daemon.
> There will be exactly _one_ instance of the multipath daemon running for
> the entire system, which will be handling _all_ udev events with a
> single queue.
> Independent of the number of attached devices.
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 16:07   ` Benjamin Marzinski
@ 2016-11-28 16:26     ` Zdenek Kabelac
  0 siblings, 0 replies; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-28 16:26 UTC (permalink / raw)
  To: Benjamin Marzinski, Hannes Reinecke
  Cc: zhang.kai16, dm-devel, Martin Wilck, tang.junhui, Bart Van Assche

On 28.11.2016 at 17:07, Benjamin Marzinski wrote:
> On Mon, Nov 28, 2016 at 11:05:29AM +0100, Hannes Reinecke wrote:
>> Hi Junhui,
>>
>> Tricky bit would be to figure out _when_ to stop uevent processing and
>> call 'ev_add_path'.
>> I guess we'd need to make 'ev_add_path' run in a separate thread;
>> that way we'd take pressure off the uevent processing thread and
>> the whole thing should run far smoother.
>
> I still think that we don't need to force a wait here.
> uevent_dispatch() already pulls off all the queued events in a batch. We
> could just merge over the batch that we are given.  This has the nice
> property that when uevents aren't coming in a storm, we quickly process
> the next event, and when they are, the further multipathd gets behind, the
> larger the list it will merge over.
>


It's worth mentioning at this point that libdevmapper 1.02 is NOT a library
with multithread support - there is a 'certain' limited set of thread-safe
functions, but certainly not the whole library.

So any threaded use of libdevmapper.so would likely require a mutex guarding
access to the library code.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 10:05 ` Hannes Reinecke
@ 2016-11-28 16:07   ` Benjamin Marzinski
  2016-11-28 16:26     ` Zdenek Kabelac
  0 siblings, 1 reply; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-28 16:07 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: tang.junhui, zhang.kai16, dm-devel, Bart Van Assche, Martin Wilck

On Mon, Nov 28, 2016 at 11:05:29AM +0100, Hannes Reinecke wrote:
> Hi Junhui,
> 
> On 11/28/2016 03:19 AM, tang.junhui@zte.com.cn wrote:
> > Hello Christophe, Ben, Hannes, Martin, Bart,
> > I am a member of the host-side software development team of the ZXUSP storage
> > project in ZTE Corporation. Facing the market demand, our team has decided to write
> > code to improve multipath efficiency next month. The whole idea is in the mail
> > below. We hope to participate in and make progress with the open source community,
> > so any suggestion and comment would be welcome.
> > 
> Thank you for looking into it.
> This is one area which definitely needs improvement, so your
> contribution would be most welcome.
> 
> > 
> > ------------------------------------------------------------------------------------------------------------------------------
> > 
> > ------------------------------------------------------------------------------------------------------------------------------
> > 
> > 1.        Problem
> > In these scenarios, multipath processing efficiency is low:
> > 1) Many paths exist in each multipath device,
> > 2) Device addition or deletion during iSCSI login/logout or FC link
> > up/down.
> > 
> > 2.        Reasons
> > Multipath processes uevents one by one, and each one also produces a new dm
> > addition, change, or deletion uevent, which increases system resource
> > consumption; actually most of these uevents serve no purpose at all.
> > 
> [ .. ]
> > In list_merger_uevents(&uevq_tmp), each node is traversed from the
> > latest to the oldest, and an older node with the same WWID and uevent
> > type (e.g. add) is moved to the merger_node list of the later node.
> > If a deletion uevent node occurs, other older uevent nodes for this
> > device are filtered out (thanks to Martin’s idea).
> > 
> > After the above processing, attention must be paid to the fact that the
> > parameter “struct uevent * uev” is no longer a single uevent in and
> > after uev_trigger(); code needs to be modified to process batch uevents
> > in uev_add_path() and so on.
> > 
> To handle this properly you need to modify uev_add_path(); basically you
> need to treat 'uev_add_path()' and 'ev_add_path()' as two _distinct_
> events, where the former processes uevents and adds the path to an
> internal list (->pathvec would be ideally suited here).
> 'ev_add_path()' then would need to be called once all uevents are
> processed, and would be processing all elements in ->pathvec, creating
> the resulting multipath map in one go.
> 
> Actually not a bad idea.
> 
> Tricky bit would be to figure out _when_ to stop uevent processing and
> call 'ev_add_path'.
> I guess we'd need to make 'ev_add_path' run in a separate thread;
> that way we'd take pressure off the uevent processing thread and
> the whole thing should run far smoother.

I still think that we don't need to force a wait here.
uevent_dispatch() already pulls off all the queued events in a batch. We
could just merge over the batch that we are given.  This has the nice
property that when uevents aren't coming in a storm, we quickly process
the next event, and when they are, the further multipathd gets behind, the
larger the list it will merge over.
 
> And looking at it closer: when you move 'ev_add_path' into a separate
> thread and separate out 'uev_add_path', you _already_ have your event
> merging implemented.

Like I mentioned before, I'm not sure what a separate thread will buy us
here, since we can't create multipath devices while paths might be
getting deleted. I do think your earlier point that we don't need any
locking until the paths are added to the pathvec is important.  Instead
of going a path at a time, getting the path's information and then
adding it to the pathvec, we could get the information on all the
outstanding paths and then add them all at once, and then call the
map creation code (basically the work that ev_add_path does) on all the
paths.

> With much less effort than trying to merge things at the uevent level.

Yes. I'd much rather deal with the uevents as they actually are, and
have uev_add_path (possibly now called uev_add_paths) do the work.
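
As a rough sketch of that shape (the vector helpers and lock() are existing
libmultipath/multipathd primitives, while path_from_uevent() and
rebuild_affected_maps() are hypothetical stand-ins for the current pathinfo
and map-creation code):

static int uev_add_paths(struct list_head *batch, struct vectors *vecs)
{
	struct uevent *uev;
	struct path *pp;
	int i;
	vector new_paths = vector_alloc();

	if (!new_paths)
		return 1;

	/* phase 1: gather path info for the whole batch; this is the
	 * slow part, so do it without holding the vecs lock */
	list_for_each_entry(uev, batch, node) {
		pp = path_from_uevent(uev);	/* hypothetical */
		if (pp && vector_alloc_slot(new_paths))
			vector_set_slot(new_paths, pp);
	}

	/* phase 2: publish the paths and rebuild the affected maps in
	 * one locked pass, so each map sees one reload instead of N */
	lock(vecs->lock);
	vector_foreach_slot(new_paths, pp, i)
		store_path(vecs->pathvec, pp);
	rebuild_affected_maps(vecs, new_paths);	/* hypothetical */
	unlock(vecs->lock);

	vector_free(new_paths);
	return 0;
}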

> I'd be willing to look into it; please keep me informed about any
> progress here.
> 
> Cheers,
> 
> Hannes
> -- 
> Dr. Hannes Reinecke		   Teamlead Storage & Networking
> hare@suse.de			               +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 15:25 ` Benjamin Marzinski
@ 2016-11-28 15:37   ` Hannes Reinecke
  2016-12-01  1:16     ` tang.junhui
  0 siblings, 1 reply; 47+ messages in thread
From: Hannes Reinecke @ 2016-11-28 15:37 UTC (permalink / raw)
  To: Benjamin Marzinski, tang.junhui
  Cc: dm-devel, zhang.kai16, Martin Wilck, Bart Van Assche

On 11/28/2016 04:25 PM, Benjamin Marzinski wrote:
> On Mon, Nov 28, 2016 at 10:19:15AM +0800, tang.junhui@zte.com.cn wrote:
>>    Hello Christophe, Ben, Hannes, Martin, Bart,
>>    I am a member of the host-side software development team of the ZXUSP
>>    storage project in ZTE Corporation. Facing the market demand, our team
>>    has decided to write code to improve multipath efficiency next month.
>>    The whole idea is in the mail below. We hope to participate in and make
>>    progress with the open source community, so any suggestion and comment
>>    would be welcome.
> 
> Like I mentioned before, I think this is a good idea in general, but the
> devil is in the details here.
> 
>>
>>    Thanks,
>>    Tang
>>
>>    ------------------------------------------------------------------------------------------------------------------------------
>>    ------------------------------------------------------------------------------------------------------------------------------
>>    1.        Problem
>>    In these scenarios, multipath processing efficiency is low:
>>    1) Many paths exist in each multipath device,
>>    2) Device addition or deletion during iSCSI login/logout or FC link
>>    up/down.
> 
> <snip>
> 
>>    4.        Proposal
>>    Instead of processing uevents one by one, uevents coming from the same
>>    LUN device can be merged into one; then the uevent processing thread
>>    only needs to process it once, and it only produces one DM addition
>>    uevent, which reduces system resource consumption.
>>
>>    The example in Chapter 2 is continued to use to explain the proposal:
>>    1) Multipath receives block device addition uevents from udev:
>>    UDEV  [89068.806214] add    
>>     /devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block)
>>    UDEV  [89068.909457] add    
>>     /devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block)
>>    UDEV  [89068.944956] add    
>>     /devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block)
>>    UDEV  [89068.959215] add    
>>     /devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block)
>>    UDEV  [89068.978558] add    
>>     /devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block)
>>    UDEV  [89069.004217] add    
>>     /devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block)
>>    UDEV  [89069.486361] add    
>>     /devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block)
>>    UDEV  [89069.495194] add    
>>     /devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block)
>>    UDEV  [89069.511628] add    
>>     /devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block)
>>    UDEV  [89069.716292] add    
>>     /devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block)
>>    UDEV  [89069.748456] add    
>>     /devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block)
>>    UDEV  [89069.789662] add    
>>     /devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block)
>>
>>    2) Multipath merges these 12 uevents into 3 internal uevents
>>    UEVENT add sdc sdh sdd sdl
>>    UEVENT add sde sdj sdf sdm
>>    UEVENT add sdg sdk sdi sdn
>>
>>    3) Multipath processes these 3 uevents one by one, and only produces 3
>>    addition DM uevents; no dm change uevent exists.
>>    KERNEL[89068.899614] add      /devices/virtual/block/dm-2 (block)
>>    KERNEL[89068.955364] add      /devices/virtual/block/dm-3 (block)
>>    KERNEL[89069.018903] add      /devices/virtual/block/dm-4 (block)
> 
> Just because I'm pedantic: There will, of course, be dm change events.
> Without them, you couldn't have a multipath device. Whenever you load a
> table in a dm device (including during the initial creation), you get a
> change event.
>  
>>    4) Udev processes these uevents above, and transfers them to multipath
>>    UDEV  [89068.926428] add      /devices/virtual/block/dm-2 (block)
>>    UDEV  [89069.007511] add      /devices/virtual/block/dm-3 (block)
>>    UDEV  [89069.098054] add      /devices/virtual/block/dm-4 (block)
> 
> multipathd ignores add events for dm devices (look at uev_trigger). A dm
> device isn't set up until its initial change event happens.
> 
>>    5) Multipath processes these uevents above, and finishes the creation of
>>    multipath
>>    devices.
>>
>>    5.        Coding
>>    After taking over uevents from the uevent listening thread, the uevent
>>    processing thread can merge these uevents before processing:
>>    int uevent_dispatch(int (*uev_trigger)(struct uevent *, void *
>>    trigger_data),
>>                void * trigger_data)
>>    {
>>        ...
>>        while (1) {
>>            ...
>>            list_splice_init(&uevq, &uevq_tmp);
>>            ...
>>            list_merger_uevents(&uevq_tmp);
>>            service_uevq(&uevq_tmp);
>>        }
>>        ...
>>    }
>>    In the structure “struct uevent”, an additional member “char
>>    wwid[WWID_SIZE]” is added to record each device's WWID for addition or
>>    change uevents, to identify whether these uevents come from the same
>>    LUN, and an additional member “struct list_head merger_node” is added
>>    to record the list of uevents which have been merged into this uevent:
>>    struct uevent {
>>        struct list_head node;
>>        struct list_head merger_node;
>>        char wwid[WWID_SIZE];
>>        struct udev_device *udev;
>>        ...
>>    };
> 
> You can't just get the wwid with no work (look at all the work uev_add_path
> does, specifically alloc_path_with_pathinfo). Now you could reorder
> this, but there isn't much point, since it is doing useful things, like
> checking if this is a spurious uevent, and necessary things, like
> figuring out the device type and using that with the configuration to
> figure out HOW to get the wwid. It seems like what you want to do is to
> call uev_add_path multiple times, but defer most of the work that
> ev_add_path does (creating or updating the multipath device), until
> you've processed all the paths.
> 
Which is kinda what I've proposed; split off uev_add_path() and
ev_add_path().
Then uev_add_path could generate a list of fully-formed paths which
ev_add_path() would process.
IE generalize coalesce_paths() to work on a passed-in list rather than
the normal vecs->pathvec.
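
A rough sketch of that generalization (the signatures are illustrative, not
the actual coalesce_paths() prototype, and the types come from libmultipath):

static int coalesce_path_list(struct vectors *vecs, vector paths,
			      vector mpvec, char *refwwid, int force_reload)
{
	/* body of today's coalesce_paths(), but iterating over the
	 * passed-in "paths" vector instead of vecs->pathvec */
	return 0;
}

int coalesce_paths(struct vectors *vecs, vector mpvec, char *refwwid,
		   int force_reload)
{
	/* existing behaviour becomes a thin wrapper */
	return coalesce_path_list(vecs, vecs->pathvec, mpvec,
				  refwwid, force_reload);
}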

Or consider using vecs->pathvec to hold 'orphan' paths when processing
uevents; then we could add the paths to the pathvec in uev_add_path()
and have another thread going through the list of paths and trying to
call coalesce_paths() for the changed mpps.
Will be a tad tricky, as we'd need to identify which mpps need to be
updated; but nothing too awkward.

Meanwhile I'm working on converting vecs->pathvec and vecs->mpvec to RCU
lists; that should get rid of the need for most locks, which I suspect
are the main reason for the scaling issues.
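
For illustration, the locking model would look roughly like this with
userspace RCU (liburcu) - multipathd does not do this today, and pathvec is
a vector rather than a linked list, so treat all of it as a sketch:

#include <urcu.h>		/* every thread must call rcu_register_thread() */
#include <urcu/rculist.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct rpath {
	struct cds_list_head node;
	char dev[32];
};

static CDS_LIST_HEAD(rpaths);
static pthread_mutex_t rpaths_mtx = PTHREAD_MUTEX_INITIALIZER;

/* readers (checker loop, uevent handlers) take no lock at all */
static int rpath_known(const char *dev)
{
	struct rpath *pp;
	int found = 0;

	rcu_read_lock();
	cds_list_for_each_entry_rcu(pp, &rpaths, node) {
		if (!strcmp(pp->dev, dev)) {
			found = 1;
			break;
		}
	}
	rcu_read_unlock();
	return found;
}

/* writers only serialize among themselves */
static void rpath_del(struct rpath *pp)
{
	pthread_mutex_lock(&rpaths_mtx);
	cds_list_del_rcu(&pp->node);
	pthread_mutex_unlock(&rpaths_mtx);
	synchronize_rcu();	/* wait out current readers, then free */
	free(pp);
}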

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28  2:19 tang.junhui
                   ` (2 preceding siblings ...)
  2016-11-28 10:28 ` Martin Wilck
@ 2016-11-28 15:25 ` Benjamin Marzinski
  2016-11-28 15:37   ` Hannes Reinecke
  3 siblings, 1 reply; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-28 15:25 UTC (permalink / raw)
  To: tang.junhui; +Cc: zhang.kai16, dm-devel, Bart Van Assche, Martin Wilck

On Mon, Nov 28, 2016 at 10:19:15AM +0800, tang.junhui@zte.com.cn wrote:
>    Hello Christophe, Ben, Hannes, Martin, Bart,
>    I am a member of the host-side software development team of the ZXUSP
>    storage project in ZTE Corporation. Facing the market demand, our team
>    has decided to write code to improve multipath efficiency next month.
>    The whole idea is in the mail below. We hope to participate in and make
>    progress with the open source community, so any suggestion and comment
>    would be welcome.

Like I mentioned before, I think this is a good idea in general, but the
devil is in the details here.

> 
>    Thanks,
>    Tang
> 
>    ------------------------------------------------------------------------------------------------------------------------------
>    ------------------------------------------------------------------------------------------------------------------------------
>    1.        Problem
>    In these scenarios, multipath processing efficiency is low:
>    1) Many paths exist in each multipath device,
>    2) Device addition or deletion during iSCSI login/logout or FC link
>    up/down.

<snip>

>    4.        Proposal
>    Instead of processing uevents one by one, uevents coming from the same
>    LUN device can be merged into one; then the uevent processing thread
>    only needs to process it once, and it only produces one DM addition
>    uevent, which reduces system resource consumption.
> 
>    The example in Chapter 2 is continued to use to explain the proposal:
>    1) Multipath receives block device addition uevents from udev:
>    UDEV  [89068.806214] add    
>     /devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block)
>    UDEV  [89068.909457] add    
>     /devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block)
>    UDEV  [89068.944956] add    
>     /devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block)
>    UDEV  [89068.959215] add    
>     /devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block)
>    UDEV  [89068.978558] add    
>     /devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block)
>    UDEV  [89069.004217] add    
>     /devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block)
>    UDEV  [89069.486361] add    
>     /devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block)
>    UDEV  [89069.495194] add    
>     /devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block)
>    UDEV  [89069.511628] add    
>     /devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block)
>    UDEV  [89069.716292] add    
>     /devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block)
>    UDEV  [89069.748456] add    
>     /devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block)
>    UDEV  [89069.789662] add    
>     /devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block)
> 
>    2) Multipath merges these 12 uevents into 3 internal uevents
>    UEVENT add sdc sdh sdd sdl
>    UEVENT add sde sdj sdf sdm
>    UEVENT add sdg sdk sdi sdn
> 
>    3) Multipath processes these 3 uevents one by one, and only produces 3
>    addition DM uevents; no dm change uevent exists.
>    KERNEL[89068.899614] add      /devices/virtual/block/dm-2 (block)
>    KERNEL[89068.955364] add      /devices/virtual/block/dm-3 (block)
>    KERNEL[89069.018903] add      /devices/virtual/block/dm-4 (block)

Just because I'm pedantic: There will, of course, be dm change events.
Without them, you couldn't have a multipath device. Whenever you load a
table in a dm device (including during the initial creation), you get a
change event.
 
>    4) Udev processes these uevents above, and transfers them to multipath
>    UDEV  [89068.926428] add      /devices/virtual/block/dm-2 (block)
>    UDEV  [89069.007511] add      /devices/virtual/block/dm-3 (block)
>    UDEV  [89069.098054] add      /devices/virtual/block/dm-4 (block)

multipathd ignores add events for dm devices (look at uev_trigger). A dm
device isn't set up until its initial change event happens.

>    5) Multipath processes these uevents above, and finishes the creation of
>    multipath
>    devices.
> 
>    5.        Coding
>    After taking over uevents from the uevent listening thread, the uevent
>    processing thread can merge these uevents before processing:
>    int uevent_dispatch(int (*uev_trigger)(struct uevent *, void *
>    trigger_data),
>                void * trigger_data)
>    {
>        ...
>        while (1) {
>            ...
>            list_splice_init(&uevq, &uevq_tmp);
>            ...
>            list_merger_uevents(&uevq_tmp);
>            service_uevq(&uevq_tmp);
>        }
>        ...
>    }
>    In the structure “struct uevent”, an additional member “char
>    wwid[WWID_SIZE]” is added to record each device's WWID for addition or
>    change uevents, to identify whether these uevents come from the same
>    LUN, and an additional member “struct list_head merger_node” is added
>    to record the list of uevents which have been merged into this uevent:
>    struct uevent {
>        struct list_head node;
>        struct list_head merger_node;
>        char wwid[WWID_SIZE];
>        struct udev_device *udev;
>        ...
>    };

You can't just get the wwid with no work (look at all the work uev_add_path
does, specifically alloc_path_with_pathinfo). Now you could reorder
this, but there isn't much point, since it is doing useful things, like
checking if this is a spurious uevent, and necessary things, like
figuring out the device type and using that with the configuration to
figure out HOW to get the wwid. It seems like what you want to do is to
call uev_add_path multiple times, but defer most of the work that
ev_add_path does (creating or updating the multipath device), until
you've processed all the paths.

>    In list_merger_uevents(&uevq_tmp), each node is traversed from the
>    latest to the oldest, and an older node with the same WWID and uevent
>    type (e.g. add) is moved to the merger_node list of the later node.
>    If a deletion uevent node occurs, other older uevent nodes for this
>    device are filtered out (thanks to Martin’s idea).
> 
>    After the above processing, attention must be paid to the fact that
>    the parameter “struct uevent * uev” is no longer a single uevent in
>    and after uev_trigger(); code needs to be modified to process batch
>    uevents in uev_add_path() and so on.
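
For what it's worth, the merge/filter rule described above could look
roughly like this in isolation (simplified stand-in types instead of the
real struct uevent and libmultipath list helpers; the newest-first ordering
and field names are assumptions):

#include <string.h>

struct uev {
	char wwid[128];
	char action[8];		/* "add", "change", "remove" */
	struct uev *next;	/* newest first */
	struct uev *merged;	/* older events folded into this one */
};

/*
 * Walk from the newest event down: fold an older event with the same
 * wwid and action into the newer one; a newer "remove" makes all older
 * events for that device pointless, so they are simply dropped.
 */
static void merge_uevents(struct uev *newest)
{
	for (struct uev *n = newest; n; n = n->next) {
		struct uev **pp = &n->next;

		while (*pp) {
			struct uev *o = *pp;

			if (strcmp(o->wwid, n->wwid)) {
				pp = &o->next;
			} else if (!strcmp(n->action, "remove")) {
				*pp = o->next;	/* drop it (real code
						 * would free it here) */
			} else if (!strcmp(o->action, n->action)) {
				*pp = o->next;	/* unlink ... */
				o->next = n->merged;
				n->merged = o;	/* ... and remember it */
			} else {
				pp = &o->next;
			}
		}
	}
}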

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 12:08       ` Hannes Reinecke
  2016-11-28 12:23         ` Peter Rajnoha
@ 2016-11-28 12:55         ` Zdenek Kabelac
  2016-11-28 17:22         ` Benjamin Marzinski
  2 siblings, 0 replies; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-28 12:55 UTC (permalink / raw)
  To: Hannes Reinecke, tang.junhui, Christophe Varoqui, bmarzins,
	Martin Wilck, Bart Van Assche
  Cc: zhang.kai16, dm-devel

On 28.11.2016 at 13:08, Hannes Reinecke wrote:
> On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
>> On 28.11.2016 at 11:42, Hannes Reinecke wrote:
>> With a multipath device - you ONLY want to 'scan' the device (with blkid)
>> when the initial first member device of the multipath gets in.
>>
>> So you start multipath (resume -> CHANGE) - it should be the ONLY place
>> to run the 'blkid' test (which really goes through over 3/4MB of disk
>> reads, to check if there is not ZFS somewhere)
>>
>> Then any next disk being a member of multipath (recognized by 'multipath
>> -c') should NOT be scanned - as far as I can tell the current order is
>> the opposite: first there is 'blkid' (60) and then rule (62) recognizes
>> a mpath_member.
>>
>> Thus every added disk fires a very lengthy blkid scan.
>>
>> Of course I'm not an expert on dm multipath rules here, so passing this
>> on to prajnoha@ - but I'd guess this is the primary source of slowdowns.
>>
>> There should be exactly ONE blkid for a single multipath device - as
>> long as 'RELOAD' only adds/removes paths (there is no reason to scan
>> component devices)
>>
> ATM 'multipath -c' is just a simple test if the device is supposed to be
> handled by multipath.

Yes - 'multipath -c' is a way to recognize a 'private/internal' device of a
multipath device (aka a component).

This is the very same issue as with e.g. lvm2 raid1 leg devices - where you
have a raid device and its individual _rimageXX legs. In lvm2 we bypass/skip
scans of 'rimage' - only the resulting 'raid' device is checked.

There is no reason to scan a 'hidden' subdevice.

Unfortunately udev doesn't support 'hidden' devices - so it's up to the
'blockdevice' maintainer to implement this logic in udev rules.
So it needs a careful design to NOT miss the scan for a 'user accessible'
device while not doing unnecessary scans of devices no one except e.g. lvm2
should ever access (there is absolutely NO reason to have full udev DB
content for any lvm2 private device - there is always only minimal info -
the same should apply to dm multipath - where components are 'private' to
multipath).

> And the number of bytes read by blkid shouldn't be _that_ large; a simple
> 'blkid' on my device caused it to read 35k ...

Unless you've configured it (or is it the default for Suse?) NOT to scan for
ZFS, you would see this 3 times in strace:

read(3, 
"@\200\0\0@\200\1\0@\200\2\0@\200\3\0@\200\4\0@\200\f\0@\200\r\0@\200\30\0"..., 
262144) = 262144

So a default 'blkid /dev/sda1' really does read a lot of data from disk.


>
> Also udev will become very unhappy if we're not calling blkid for every
> device; you'd be having a hard time reconstructing the event for those
> devices.

It's not about making udev happy :)
A private device (or subsystem) is simply a missing piece in udev.
There is really no reason to fill the udev DB with a device no one should be
using. And I hope we both agree that an mpath 'component' should never be
used by any user.
Of course no one objects to reading it via 'dd' - but a component should not
appear in the udev DB device list as a regular device.



> While it's trivial to import variables from parent devices, it's
> impossible to do that from unrelated devices; you'd need a dedicated
> daemon for that.
> So we cannot skip blkid without additional tooling.

Trust me -  we can and we should improve it even more.


>
>>>
>>>> So while aggregation of 'uevents' in multipath would 'shorten' queue
>>>> processing of events - it would still not speed up the scan itself.
>>>>
>>>> We need to drastically reduce unnecessary disk re-scanning.
>>>>
>>>> Also note - if you have a lot of disks - it might be worth checking
>>>> whether udev picks the 'right amount of udev workers'.
>>>> There is heuristic logic to avoid system overload - but it might be
>>>> worth checking whether, on your system with your amount of
>>>> CPU/RAM/DISKS, the computed number is the best for scaling - i.e. if
>>>> you double the amount of workers, do you get any better performance?
>>>>
>>> That doesn't help, as we only have one queue (within multipath) to
>>> handle all uevents.
>>
>> This was meant for systems with many different multipath devices.
>> Obviously it would not help with a single multipath device.
>>
> I'm talking about the multipath daemon.
> There will be exactly _one_ instance of the multipath daemon running for
> the entire system, which will be handling _all_ udev events with a
> single queue.
> Independent of the number of attached devices.


Ohh - I thought there was more parallelism in multipathd.
Ok - so then more workers would really not help.

That's another reason why improving the speed of the rules should be
priority task #1.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 12:08       ` Hannes Reinecke
@ 2016-11-28 12:23         ` Peter Rajnoha
  2016-11-28 12:55         ` Zdenek Kabelac
  2016-11-28 17:22         ` Benjamin Marzinski
  2 siblings, 0 replies; 47+ messages in thread
From: Peter Rajnoha @ 2016-11-28 12:23 UTC (permalink / raw)
  To: Hannes Reinecke, Zdenek Kabelac, tang.junhui, Christophe Varoqui,
	bmarzins, Martin Wilck, Bart Van Assche
  Cc: zhang.kai16, dm-devel

On 11/28/2016 01:08 PM, Hannes Reinecke wrote:
> On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
>> On 28.11.2016 at 11:42, Hannes Reinecke wrote:
>>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
>>>> On 28.11.2016 at 03:19, tang.junhui@zte.com.cn wrote:
>>>>> Hello Christophe, Ben, Hannes, Martin, Bart,
>>>>> I am a member of the host-side software development team of the ZXUSP
>>>>> storage project in ZTE Corporation. Facing the market demand, our team
>>>>> has decided to write code to improve multipath efficiency next month.
>>>>> The whole idea is in the mail below. We hope to participate in and make
>>>>> progress with the open source community, so any suggestion and comment
>>>>> would be welcome.
>>>>>
>>>>
>>>>
>>>> Hi
>>>>
>>>> First - we are aware of these issues.
>>>>
>>>> The solution proposed in this mail would surely help - but there is
>>>> likely a bigger issue to be solved first.
>>>>
>>>> The core trouble is to avoid having the 'blkid' disk identification
>>>> executed.
>>>> Recent versions of multipath already mark a plain 'RELOAD' operation
>>>> of the table (which should not change disk content) with an extra DM
>>>> bit, so the udev rules ATM skip 'pvscan' - we would also like to extend
>>>> the functionality to skip more rules and reimport existing 'symlinks'
>>>> from the udev database (so they would not get deleted).
>>>>
>>>> I believe the processing of udev rules is 'relatively' quick as long
>>>> as it does not need to read/write ANYTHING from real disks.
>>>>
>>> Hmm. You sure this is an issue?
>>> We definitely need to skip uevent handling when a path goes down (but I
>>> think we do that already), but for 'add' events we absolutely need to
>>> call blkid to figure out if the device has changed.
>>> There are storage arrays out there that use a 'path down/path up' cycle
>>> to inform initiators about any device layout change.
>>> So we wouldn't be able to handle those properly if we don't call blkid
>>> here.
>>
>> The core trouble is -
>>
>>
>> With a multipath device - you ONLY want to 'scan' the device (with blkid)
>> when the initial first member device of the multipath gets in.
>>
>> So you start multipath (resume -> CHANGE) - it should be the ONLY place
>> to run the 'blkid' test (which really goes through over 3/4MB of disk
>> reads, to check if there is not ZFS somewhere)
>>
>> Then any next disk being a member of multipath (recognized by 'multipath
>> -c') should NOT be scanned - as far as I can tell the current order is
>> the opposite: first there is 'blkid' (60) and then rule (62) recognizes
>> a mpath_member.
>>
>> Thus every added disk fires a very lengthy blkid scan.
>>
>> Of course I'm not an expert on dm multipath rules here, so passing this
>> on to prajnoha@ - but I'd guess this is the primary source of slowdowns.
>>
>> There should be exactly ONE blkid for a single multipath device - as
>> long as 'RELOAD' only adds/removes paths (there is no reason to scan
>> component devices)
>>
> ATM 'multipath -c' is just a simple test if the device is supposed to be
> handled by multipath.
> 
> And the number of bytes read by blkid shouldn't be _that_ large; a simple
> 'blkid' on my device caused it to read 35k ...
> 
> Also udev will become very unhappy if we're not calling blkid for every
> device; you'd be having a hard time reconstructing the event for those
> devices.

What do you mean by event reconstruction?

I don't think we really need to call blkid for every device. If we have
configured that a certain device is surely an mpath component (based on the
WWN in the mpath config), I think we don't need to call blkid at all - it's
an mpath component and the top-level device should simply be used for any
scanning.

I mean, I still don't see why we need to call blkid and then overwrite the
ID_FS_TYPE variable right away based on the result of 'multipath -c'. If we
reversed this order, we could save the extra blkid that's not actually
needed.

-- 
Peter

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 11:51     ` Zdenek Kabelac
  2016-11-28 12:06       ` Peter Rajnoha
@ 2016-11-28 12:08       ` Hannes Reinecke
  2016-11-28 12:23         ` Peter Rajnoha
                           ` (2 more replies)
  1 sibling, 3 replies; 47+ messages in thread
From: Hannes Reinecke @ 2016-11-28 12:08 UTC (permalink / raw)
  To: Zdenek Kabelac, tang.junhui, Christophe Varoqui, bmarzins,
	Martin Wilck, Bart Van Assche
  Cc: zhang.kai16, dm-devel

On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
> On 28.11.2016 at 11:42, Hannes Reinecke wrote:
>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
>>> On 28.11.2016 at 03:19, tang.junhui@zte.com.cn wrote:
>>>> Hello Christophe, Ben, Hannes, Martin, Bart,
>>>> I am a member of the host-side software development team of the ZXUSP
>>>> storage project in ZTE Corporation. Facing the market demand, our team
>>>> has decided to write code to improve multipath efficiency next month.
>>>> The whole idea is in the mail below. We hope to participate in and make
>>>> progress with the open source community, so any suggestion and comment
>>>> would be welcome.
>>>>
>>>
>>>
>>> Hi
>>>
>>> First - we are aware of these issues.
>>>
>>> The solution proposed in this mail would surely help - but there is
>>> likely a bigger issue to be solved first.
>>>
>>> The core trouble is to avoid having the 'blkid' disk identification
>>> executed.
>>> Recent versions of multipath already mark a plain 'RELOAD' operation
>>> of the table (which should not change disk content) with an extra DM
>>> bit, so the udev rules ATM skip 'pvscan' - we would also like to extend
>>> the functionality to skip more rules and reimport existing 'symlinks'
>>> from the udev database (so they would not get deleted).
>>>
>>> I believe the processing of udev rules is 'relatively' quick as long
>>> as it does not need to read/write ANYTHING from real disks.
>>>
>> Hmm. You sure this is an issue?
>> We definitely need to skip uevent handling when a path goes down (but I
>> think we do that already), but for 'add' events we absolutely need to
>> call blkid to figure out if the device has changed.
>> There are storage arrays out there that use a 'path down/path up' cycle
>> to inform initiators about any device layout change.
>> So we wouldn't be able to handle those properly if we don't call blkid
>> here.
> 
> The core trouble is -
> 
> 
> With a multipath device - you ONLY want to 'scan' the device (with blkid)
> when the initial first member device of the multipath gets in.
> 
> So you start multipath (resume -> CHANGE) - it should be the ONLY place
> to run the 'blkid' test (which really goes through over 3/4MB of disk
> reads, to check if there is not ZFS somewhere)
> 
> Then any next disk being a member of multipath (recognized by 'multipath
> -c') should NOT be scanned - as far as I can tell the current order is
> the opposite: first there is 'blkid' (60) and then rule (62) recognizes
> a mpath_member.
> 
> Thus every added disk fires a very lengthy blkid scan.
> 
> Of course I'm not an expert on dm multipath rules here, so passing this
> on to prajnoha@ - but I'd guess this is the primary source of slowdowns.
> 
> There should be exactly ONE blkid for a single multipath device - as
> long as 'RELOAD' only adds/removes paths (there is no reason to scan
> component devices)
> 
ATM 'multipath -c' is just a simple test if the device is supposed to be
handled by multipath.

And the number of bytes read by blkid shouldn't be _that_ large; a simple
'blkid' on my device caused it to read 35k ...

Also udev will become very unhappy if we're not calling blkid for every
device; you'd be having a hard time reconstructing the event for those
devices.
While it's trivial to import variables from parent devices, it's
impossible to do that from unrelated devices; you'd need a dedicated
daemon for that.
So we cannot skip blkid without additional tooling.

>>
>>> So while aggregation of 'uevents' in multipath would 'shorten' queue
>>> processing of events - it would still not speed up the scan itself.
>>>
>>> We need to drastically reduce unnecessary disk re-scanning.
>>>
>>> Also note - if you have a lot of disks - it might be worth checking
>>> whether udev picks the 'right amount of udev workers'.
>>> There is heuristic logic to avoid system overload - but it might be
>>> worth checking whether, on your system with your amount of
>>> CPU/RAM/DISKS, the computed number is the best for scaling - i.e. if
>>> you double the amount of workers, do you get any better performance?
>>>
>> That doesn't help, as we only have one queue (within multipath) to
>> handle all uevents.
> 
> This was meant for systems with many different multipath devices.
> Obviously it would not help with a single multipath device.
> 
I'm talking about the multipath daemon.
There will be exactly _one_ instance of the multipath daemon running for
the entire system, which will be handling _all_ udev events with a
single queue.
Independent of the number of attached devices.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 11:51     ` Zdenek Kabelac
@ 2016-11-28 12:06       ` Peter Rajnoha
  2016-11-28 12:08       ` Hannes Reinecke
  1 sibling, 0 replies; 47+ messages in thread
From: Peter Rajnoha @ 2016-11-28 12:06 UTC (permalink / raw)
  To: Zdenek Kabelac, Hannes Reinecke, tang.junhui, Christophe Varoqui,
	bmarzins, Martin Wilck, Bart Van Assche
  Cc: zhang.kai16, dm-devel

On 11/28/2016 12:51 PM, Zdenek Kabelac wrote:
> On 28.11.2016 at 11:42, Hannes Reinecke wrote:
>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
>>> On 28.11.2016 at 03:19, tang.junhui@zte.com.cn wrote:
>>>> Hello Christophe, Ben, Hannes, Martin, Bart,
>>>> I am a member of the host-side software development team of the ZXUSP
>>>> storage project in ZTE Corporation. Facing the market demand, our team
>>>> has decided to write code to improve multipath efficiency next month.
>>>> The whole idea is in the mail below. We hope to participate in and make
>>>> progress with the open source community, so any suggestion and comment
>>>> would be welcome.
>>>>
>>>
>>>
>>> Hi
>>>
>>> First - we are aware of these issues.
>>>
>>> The solution proposed in this mail would surely help - but there is
>>> likely a bigger issue to be solved first.
>>>
>>> The core trouble is to avoid having the 'blkid' disk identification
>>> executed.
>>> Recent versions of multipath already mark a plain 'RELOAD' operation
>>> of the table (which should not change disk content) with an extra DM
>>> bit, so the udev rules ATM skip 'pvscan' - we would also like to extend
>>> the functionality to skip more rules and reimport existing 'symlinks'
>>> from the udev database (so they would not get deleted).
>>>
>>> I believe the processing of udev rules is 'relatively' quick as long
>>> as it does not need to read/write ANYTHING from real disks.
>>>
>> Hmm. You sure this is an issue?
>> We definitely need to skip uevent handling when a path goes down (but I
>> think we do that already), but for 'add' events we absolutely need to
>> call blkid to figure out if the device has changed.
>> There are storage arrays out there that use a 'path down/path up' cycle
>> to inform initiators about any device layout change.
>> So we wouldn't be able to handle those properly if we don't call blkid
>> here.
> 
> The core trouble is -
> 
> 
> With a multipath device - you ONLY want to 'scan' the device (with blkid)
> when the initial first member device of the multipath gets in.
> 
> So you start multipath (resume -> CHANGE) - it should be the ONLY place
> to run the 'blkid' test (which really goes through over 3/4MB of disk
> reads, to check if there is not ZFS somewhere)
> 
> Then any next disk being a member of multipath (recognized by 'multipath
> -c') should NOT be scanned - as far as I can tell the current order is
> the opposite: first there is 'blkid' (60) and then rule (62) recognizes
> a mpath_member.
> 
> Thus every added disk fires a very lengthy blkid scan.
> 
> Of course I'm not an expert on dm multipath rules here, so passing this
> on to prajnoha@ - but I'd guess this is the primary source of slowdowns.
> 
> There should be exactly ONE blkid for a single multipath device - as
> long as 'RELOAD' only adds/removes paths (there is no reason to scan
> component devices)

That's true indeed - the blkid is executed in
60-persistent-storage.rules (for all disks in general). And multipath rules
are executed later - at the moment, if we detect that the disk is a
multipath component, we overwrite the ID_FS_TYPE to mpath_member. But
that means the blkid was executed uselessly before.

So if we wanted to save some cycles, this could be the place to also
look at optimization, as Zdenek already pointed out.

When it comes to the need to detect whether the component has changed or
not (as Hannes pointed out), that would be only for a change in
signatures (that blkid detects only). But is that enough? If we care
about the change in signatures that blkid detects, don't we care about
data/content that may have changed the same way as those signatures?

-- 
Peter

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 10:42   ` Hannes Reinecke
@ 2016-11-28 11:51     ` Zdenek Kabelac
  2016-11-28 12:06       ` Peter Rajnoha
  2016-11-28 12:08       ` Hannes Reinecke
  0 siblings, 2 replies; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-28 11:51 UTC (permalink / raw)
  To: Hannes Reinecke, tang.junhui, Christophe Varoqui, bmarzins,
	Martin Wilck, Bart Van Assche
  Cc: zhang.kai16, dm-devel

On 28.11.2016 at 11:42, Hannes Reinecke wrote:
> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
>> On 28.11.2016 at 03:19, tang.junhui@zte.com.cn wrote:
>>> Hello Christophe, Ben, Hannes, Martin, Bart,
>>> I am a member of the host-side software development team of the ZXUSP
>>> storage project in ZTE Corporation. Facing the market demand, our team
>>> has decided to write code to improve multipath efficiency next month.
>>> The whole idea is in the mail below. We hope to participate in and make
>>> progress with the open source community, so any suggestion and comment
>>> would be welcome.
>>>
>>
>>
>> Hi
>>
>> First - we are aware of these issues.
>>
>> The solution proposed in this mail would surely help - but there is
>> likely a bigger issue to be solved first.
>>
>> The core trouble is to avoid having the 'blkid' disk identification
>> executed.
>> Recent versions of multipath already mark a plain 'RELOAD' operation
>> of the table (which should not change disk content) with an extra DM
>> bit, so the udev rules ATM skip 'pvscan' - we would also like to extend
>> the functionality to skip more rules and reimport existing 'symlinks'
>> from the udev database (so they would not get deleted).
>>
>> I believe the processing of udev rules is 'relatively' quick as long
>> as it does not need to read/write ANYTHING from real disks.
>>
> Hmm. You sure this is an issue?
> We definitely need to skip uevent handling when a path goes down (but I
> think we do that already), but for 'add' events we absolutely need to
> call blkid to figure out if the device has changed.
> There are storage arrays out there that use a 'path down/path up' cycle
> to inform initiators about any device layout change.
> So we wouldn't be able to handle those properly if we don't call blkid here.

The core trouble is -


With a multipath device - you ONLY want to 'scan' the device (with blkid)
when the initial first member device of the multipath gets in.

So you start multipath (resume -> CHANGE) - it should be the ONLY place
to run the 'blkid' test (which really goes through over 3/4MB of disk reads,
to check if there is not ZFS somewhere)

Then any next disk being a member of multipath (recognized by 'multipath -c')
should NOT be scanned - as far as I can tell the current order is the
opposite: first there is 'blkid' (60) and then rule (62) recognizes a
mpath_member.

Thus every added disk fires a very lengthy blkid scan.

Of course I'm not an expert on dm multipath rules here, so passing this on to
prajnoha@ - but I'd guess this is the primary source of slowdowns.

There should be exactly ONE blkid for a single multipath device - as
long as 'RELOAD' only adds/removes paths (there is no reason to scan
component devices)

>
>> So while aggregation of 'uevents' in multipath would 'shorten' queue
>> processing of events - it would still not speed up the scan itself.
>>
>> We need to drastically reduce unnecessary disk re-scanning.
>>
>> Also note - if you have a lot of disks - it might be worth checking
>> whether udev picks the 'right amount of udev workers'.
>> There is heuristic logic to avoid system overload - but it might be
>> worth checking whether, on your system with your amount of
>> CPU/RAM/DISKS, the computed number is the best for scaling - i.e. if
>> you double the amount of workers, do you get any better performance?
>>
> That doesn't help, as we only have one queue (within multipath) to
> handle all uevents.

This was meant for systems with many different multipath devices.
Obviously it would not help with a single multipath device.

Regards

Zdenek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 10:06 ` Zdenek Kabelac
@ 2016-11-28 10:42   ` Hannes Reinecke
  2016-11-28 11:51     ` Zdenek Kabelac
  0 siblings, 1 reply; 47+ messages in thread
From: Hannes Reinecke @ 2016-11-28 10:42 UTC (permalink / raw)
  To: Zdenek Kabelac, tang.junhui, Christophe Varoqui, bmarzins,
	Martin Wilck, Bart Van Assche
  Cc: zhang.kai16, dm-devel

On 11/28/2016 11:06 AM, Zdenek Kabelac wrote:
> On 28.11.2016 at 03:19, tang.junhui@zte.com.cn wrote:
>> Hello Christophe, Ben, Hannes, Martin, Bart,
>> I am a member of the host-side software development team of the ZXUSP
>> storage project in ZTE Corporation. Facing the market demand, our team
>> has decided to write code to improve multipath efficiency next month.
>> The whole idea is in the mail below. We hope to participate in and make
>> progress with the open source community, so any suggestion and comment
>> would be welcome.
>>
> 
> 
> Hi
> 
> First - we are aware of these issues.
> 
> The solution proposed in this mail would surely help - but there is
> likely a bigger issue to be solved first.
> 
> The core trouble is to avoid having the 'blkid' disk identification
> executed.
> Recent versions of multipath already mark a plain 'RELOAD' operation
> of the table (which should not change disk content) with an extra DM
> bit, so the udev rules ATM skip 'pvscan' - we would also like to extend
> the functionality to skip more rules and reimport existing 'symlinks'
> from the udev database (so they would not get deleted).
> 
> I believe the processing of udev rules is 'relatively' quick as long
> as it does not need to read/write ANYTHING from real disks.
> 
Hmm. You sure this is an issue?
We definitely need to skip uevent handling when a path goes down (but I
think we do that already), but for 'add' events we absolutely need to
call blkid to figure out if the device has changed.
There are storage arrays out there that use a 'path down/path up' cycle
to inform initiators about any device layout change.
So we wouldn't be able to handle those properly if we don't call blkid here.

> So while aggregation of 'uevents' in multipath would 'shorten' queue
> processing of events - it would still not speed up the scan itself.
> 
> We need to drastically reduce unnecessary disk re-scanning.
> 
> Also note - if you have a lot of disks - it might be worth checking
> whether udev picks the 'right amount of udev workers'.
> There is heuristic logic to avoid system overload - but it might be
> worth checking whether, on your system with your amount of
> CPU/RAM/DISKS, the computed number is the best for scaling - i.e. if
> you double the amount of workers, do you get any better performance?
> 
That doesn't help, as we only have one queue (within multipath) to
handle all uevents.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28  2:19 tang.junhui
  2016-11-28 10:05 ` Hannes Reinecke
  2016-11-28 10:06 ` Zdenek Kabelac
@ 2016-11-28 10:28 ` Martin Wilck
  2016-11-28 17:31   ` Benjamin Marzinski
  2016-11-28 15:25 ` Benjamin Marzinski
  3 siblings, 1 reply; 47+ messages in thread
From: Martin Wilck @ 2016-11-28 10:28 UTC (permalink / raw)
  To: tang.junhui, Christophe Varoqui, bmarzins, Hannes Reinecke,
	Bart Van Assche
  Cc: zhang.kai16, dm-devel

On Mon, 2016-11-28 at 10:19 +0800, tang.junhui@zte.com.cn wrote:
> 
> 4.        Proposal 
> Instead of processing uevents one by one, uevents coming from the same
> LUN device can be merged into one; then the uevent processing thread only
> needs to process it once, and it only produces one DM addition uevent,
> which reduces system resource consumption.
> 

Here comes an idea for how to achieve this without a lot of additional
code:

libmultipath already has code to check whether any maps need to be
updated (in coalesce_paths()). Instead of recording uevents, merging
them, and calling ev_add_path() for every affected WWID, it might be
sufficient to set the daemon state to DAEMON_CONFIGURE and wake up the
main multipathd thread to call reconfigure(). Then we only need to make
sure that coalesce_paths() really reloads or creates maps only when
necessary. I have some patches here that I made for that purpose, for a
different scenario (making multipathd avoid RELOAD ioctls when it is
started in a situation where most paths are already set up by udev).
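
A minimal sketch of that wake-up, assuming the daemon's running_state /
config_lock / config_cond plumbing (names follow multipathd/main.c but
should be treated as assumptions here):

#include <pthread.h>

/* stand-in for the enum in multipathd/main.h */
enum daemon_status { DAEMON_IDLE, DAEMON_CONFIGURE };

extern enum daemon_status running_state;
extern pthread_mutex_t config_lock;
extern pthread_cond_t config_cond;

/* called from the uevent path instead of per-event ev_add_path() */
static void request_reconfigure(void)
{
	pthread_mutex_lock(&config_lock);
	running_state = DAEMON_CONFIGURE;
	pthread_cond_signal(&config_cond);
	pthread_mutex_unlock(&config_lock);
}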

Regards
Martin

-- 
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28  2:19 tang.junhui
  2016-11-28 10:05 ` Hannes Reinecke
@ 2016-11-28 10:06 ` Zdenek Kabelac
  2016-11-28 10:42   ` Hannes Reinecke
  2016-11-28 10:28 ` Martin Wilck
  2016-11-28 15:25 ` Benjamin Marzinski
  3 siblings, 1 reply; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-28 10:06 UTC (permalink / raw)
  To: tang.junhui, Christophe Varoqui, bmarzins, Hannes Reinecke,
	Martin Wilck, Bart Van Assche
  Cc: zhang.kai16, dm-devel

On 28.11.2016 at 03:19, tang.junhui@zte.com.cn wrote:
> Hello Christophe, Ben, Hannes, Martin, Bart,
> I am a member of the host-side software development team of the ZXUSP storage
> project in ZTE Corporation. Facing the market demand, our team has decided to
> write code to improve multipath efficiency next month. The whole idea is in the
> mail below. We hope to participate in and make progress with the open source
> community, so any suggestion and comment would be welcome.
>


Hi

First - we are aware of these issues.

The solution proposed in this mail would surely help - but there is likely a 
bigger issue to be solved first.

The core trouble is to avoid having the 'blkid' disk identification executed.
Recent versions of multipath already mark a plain 'RELOAD' operation
of the table (which should not change disk content) with an extra DM bit,
so the udev rules ATM skip 'pvscan' - we would also like to extend the
functionality to skip more rules and reimport existing 'symlinks' from
the udev database (so they would not get deleted).

I believe the processing of udev rules is 'relatively' quick as long
as it does not need to read/write ANYTHING from real disks.

So while aggregation of 'uevents' in multipath would 'shorten' queue
processing of events - it would still not speed up the scan itself.

We need to drastically reduce unnecessary disk re-scanning.

Also note - if you have a lot of disks - it might be worth checking
whether udev picks the 'right amount of udev workers'.
There is heuristic logic to avoid system overload - but it might be worth
checking whether, on your system with your amount of CPU/RAM/DISKS, the
computed number is the best for scaling - i.e. if you double the amount of
workers, do you get any better performance?

Regards

Zdenek

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28  2:19 tang.junhui
@ 2016-11-28 10:05 ` Hannes Reinecke
  2016-11-28 16:07   ` Benjamin Marzinski
  2016-11-28 10:06 ` Zdenek Kabelac
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 47+ messages in thread
From: Hannes Reinecke @ 2016-11-28 10:05 UTC (permalink / raw)
  To: tang.junhui, Christophe Varoqui, bmarzins, Martin Wilck, Bart Van Assche
  Cc: zhang.kai16, dm-devel

Hi Junhui,

On 11/28/2016 03:19 AM, tang.junhui@zte.com.cn wrote:
> Hello Christophe, Ben, Hannes, Martin, Bart,
> I am a member of the host-side software development team of the ZXUSP storage
> project in ZTE Corporation. Facing the market demand, our team has decided to write
> code to improve multipath efficiency next month. The whole idea is in the mail
> below. We hope to participate in and make progress with the open source community,
> so any suggestion and comment would be welcome.
> 
Thank you for looking into it.
This is one area which definitely needs improvement, so your
contribution would be most welcome.

> 
> ------------------------------------------------------------------------------------------------------------------------------
> 
> ------------------------------------------------------------------------------------------------------------------------------
> 
> 1.        Problem
> In these scenarios, multipath processing efficiency is low:
> 1) Many paths exist in each multipath device,
> 2) Device addition or deletion during iSCSI login/logout or FC link
> up/down.
> 
> 2.        Reasons
> Multipath processes uevents one by one, and each one also produces a new dm
> addition, change, or deletion uevent, which increases system resource
> consumption; actually most of these uevents serve no purpose at all.
> 
[ .. ]
> In list_merger_uevents(&uevq_tmp), each node is traversed from the
> latest to the oldest, and an older node with the same WWID and uevent
> type (e.g. add) is moved to the merger_node list of the later node.
> If a deletion uevent node occurs, other older uevent nodes for this
> device are filtered out (thanks to Martin’s idea).
> 
> After the above processing, attention must be paid to the fact that the
> parameter “struct uevent * uev” is no longer a single uevent in and
> after uev_trigger(); code needs to be modified to process batch uevents
> in uev_add_path() and so on.
> 
To handle this properly you need to modify uev_add_path(); basically you
need to treat 'uev_add_path()' and 'ev_add_path()' as two _distinct_
events, where the former processes uevents and adds the path to an
internal list (->pathvec would be ideally suited here).
'ev_add_path()' then would need to be called once all uevents are
processed, and would be processing all elements in ->pathvec, creating
the resulting multipath map in one go.

Actually not a bad idea.

Tricky bit would be to figure out _when_ to stop uevent processing and
call 'ev_add_path'.
I guess we'd need to make 'ev_add_path' run in a separate thread;
that way we'd take pressure off the uevent processing thread and
the whole thing should run far smoother.

And looking at it closer: when you move 'ev_add_path' into a separate
thread and separate out 'uev_add_path', you _already_ have your event
merging implemented.

With much less effort than trying to merge things at the uevent level.

I'd be willing to look into it; please keep me informed about any
progress here.

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		   Teamlead Storage & Networking
hare@suse.de			               +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: Improve processing efficiency for addition and deletion of multipath devices
@ 2016-11-28  2:19 tang.junhui
  2016-11-28 10:05 ` Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 47+ messages in thread
From: tang.junhui @ 2016-11-28  2:19 UTC (permalink / raw)
  To: Christophe Varoqui, bmarzins, Hannes Reinecke, Martin Wilck,
	Bart Van Assche
  Cc: zhang.kai16, dm-devel, tang.junhui


[-- Attachment #1.1: Type: text/plain, Size: 9191 bytes --]

Hello Christophe, Ben, Hannes, Martin, Bart, 

I am a member of the host-side software development team of the ZXUSP storage
project in ZTE Corporation. Facing the market demand, our team has decided to
write code to improve multipath efficiency next month. The whole idea is in
the mail below. We hope to participate in and make progress with the open
source community, so any suggestion and comment would be welcome.

Thanks,
Tang

------------------------------------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------------------------------------
1.      Problem
In these scenarios, multipath processing efficiency is low:
1) Many paths exist in each multipath device,
2) Devices are added or deleted during iSCSI login/logout or FC link
up/down.

2.      Reasons
Multipath processes uevents one by one, and each one also produces a new
dm addition, change, or deletion uevent, increasing system resource
consumption; actually most of these uevents serve no purpose at all.

E.g. the login procedure of 4 iSCSI sessions with 3 LUNs:
1) Multipath processes these uevents one by one:
UDEV  [89068.806214] add 
/devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block)
UDEV  [89068.909457] add 
/devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block)
UDEV  [89068.944956] add 
/devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block)
UDEV  [89068.959215] add 
/devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block)
UDEV  [89068.978558] add 
/devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block)
UDEV  [89069.004217] add 
/devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block)
UDEV  [89069.486361] add 
/devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block)
UDEV  [89069.495194] add 
/devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block)
UDEV  [89069.511628] add 
/devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block)
UDEV  [89069.716292] add 
/devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block)
UDEV  [89069.748456] add 
/devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block)
UDEV  [89069.789662] add 
/devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block)

2) Multipath also produces DM uevents in step 1), which are processed by
udev and other processes listening to the kernel:
KERNEL[89068.899614] add      /devices/virtual/block/dm-2 (block)
KERNEL[89068.902477] change   /devices/virtual/block/dm-2 (block)
KERNEL[89068.955364] add      /devices/virtual/block/dm-3 (block)
KERNEL[89068.960663] change   /devices/virtual/block/dm-3 (block)
KERNEL[89069.018903] add      /devices/virtual/block/dm-4 (block)
KERNEL[89069.042102] change   /devices/virtual/block/dm-4 (block)
KERNEL[89069.297252] change   /devices/virtual/block/dm-2 (block)
KERNEL[89069.346718] change   /devices/virtual/block/dm-4 (block)
KERNEL[89069.388361] change   /devices/virtual/block/dm-3 (block)
KERNEL[89069.548270] change   /devices/virtual/block/dm-4 (block)
KERNEL[89069.607306] change   /devices/virtual/block/dm-2 (block)
KERNEL[89070.118067] change   /devices/virtual/block/dm-3 (block)
KERNEL[89070.136256] change   /devices/virtual/block/dm-2 (block)
KERNEL[89070.157222] change   /devices/virtual/block/dm-4 (block)
KERNEL[89070.216269] change   /devices/virtual/block/dm-3 (block)

3) After processing in step 2), udev also transfers these uevents to
multipath:
UDEV  [89068.926428] add      /devices/virtual/block/dm-2 (block)
UDEV  [89069.007511] add      /devices/virtual/block/dm-3 (block)
UDEV  [89069.098054] add      /devices/virtual/block/dm-4 (block)
UDEV  [89069.291184] change   /devices/virtual/block/dm-2 (block)
UDEV  [89069.320632] change   /devices/virtual/block/dm-4 (block)
UDEV  [89069.381434] change   /devices/virtual/block/dm-3 (block)
UDEV  [89069.637666] change   /devices/virtual/block/dm-2 (block)
UDEV  [89069.682303] change   /devices/virtual/block/dm-4 (block)
UDEV  [89069.860877] change   /devices/virtual/block/dm-2 (block)
UDEV  [89069.904735] change   /devices/virtual/block/dm-4 (block)
UDEV  [89070.327167] change   /devices/virtual/block/dm-2 (block)
UDEV  [89070.371114] change   /devices/virtual/block/dm-4 (block)
UDEV  [89070.434592] change   /devices/virtual/block/dm-3 (block)
UDEV  [89070.572072] change   /devices/virtual/block/dm-3 (block)
UDEV  [89070.703181] change   /devices/virtual/block/dm-3 (block)

4) Multipath processes the uevents above.

Processing uevents one by one is inefficient, and it produces too many
uevents, which further reduces processing efficiency. The problem is
similar in the logout procedure of iSCSI sessions.

3.      Negative effect
Multipath processing is so slow that it cannot satisfy some
applications. For example, OpenStack often times out waiting for the
creation of multipath devices.

4.      Proposal
Rather than processing uevents one by one, uevents coming from the same
LUN can be merged into one; the uevent processing thread then only needs
to process it once, and it produces only one DM addition uevent, which
reduces system resource consumption.

The example from Chapter 2 is reused to explain the proposal:
1) Multipath receives block device addition uevents from udev:
UDEV  [89068.806214] add 
/devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block)
UDEV  [89068.909457] add 
/devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block)
UDEV  [89068.944956] add 
/devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block)
UDEV  [89068.959215] add 
/devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block)
UDEV  [89068.978558] add 
/devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block)
UDEV  [89069.004217] add 
/devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block)
UDEV  [89069.486361] add 
/devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block)
UDEV  [89069.495194] add 
/devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block)
UDEV  [89069.511628] add 
/devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block)
UDEV  [89069.716292] add 
/devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block)
UDEV  [89069.748456] add 
/devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block)
UDEV  [89069.789662] add 
/devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block)

2) Multipath merges these 12 uevents into 3 internal uevents:
UEVENT add sdc sdh sdd sdl 
UEVENT add sde sdj sdf sdm 
UEVENT add sdg sdk sdi sdn

3) Multipath processes these 3 uevents one by one, and produces only 3
DM addition uevents; no dm change uevent occurs.
KERNEL[89068.899614] add      /devices/virtual/block/dm-2 (block)
KERNEL[89068.955364] add      /devices/virtual/block/dm-3 (block)
KERNEL[89069.018903] add      /devices/virtual/block/dm-4 (block)

4) Udev processes the uevents above and transfers them to multipath:
UDEV  [89068.926428] add      /devices/virtual/block/dm-2 (block)
UDEV  [89069.007511] add      /devices/virtual/block/dm-3 (block)
UDEV  [89069.098054] add      /devices/virtual/block/dm-4 (block)

5) Multipath processes the uevents above and finishes the creation of
the multipath devices.

5.      Coding
After taking over uevents from the uevent listening thread, the uevent
processing thread can merge these uevents before processing them:
int uevent_dispatch(int (*uev_trigger)(struct uevent *, void *trigger_data),
                    void *trigger_data)
{
    ...
    while (1) {
        ...
        /* take over the uevents queued by the listening thread */
        list_splice_init(&uevq, &uevq_tmp);
        ...
        list_merger_uevents(&uevq_tmp);  /* merge before processing */
        service_uevq(&uevq_tmp);
    }
    ...
}
In the structure “struct uevent”, an additional member “char
wwid[WWID_SIZE]” is added to record each device's WWID for addition or
change uevents, to identify whether these uevents come from the same
LUN; and an additional member “struct list_head merger_node” is added to
record the list of uevents that have been merged into this uevent:
struct uevent {
    struct list_head node;        /* position in the main uevent queue */
    struct list_head merger_node; /* uevents merged into this one */
    char wwid[WWID_SIZE];         /* device WWID, used for merge matching */
    struct udev_device *udev;
    ...
};
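
Filling the new wwid member at uevent time is a prerequisite for
merging. One possible way, assuming the WWID can be taken from the udev
database (ID_SERIAL is just one candidate property; the real code may
have to honour the configured uid_attribute):

static void uevent_get_wwid(struct uevent *uev)
{
    const char *value;

    memset(uev->wwid, 0, WWID_SIZE);
    if (!uev->udev)
        return;
    /* property set by the persistent-storage udev rules */
    value = udev_device_get_property_value(uev->udev, "ID_SERIAL");
    if (value)
        strncpy(uev->wwid, value, WWID_SIZE - 1);
}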

In list_merger_uevents(&uevq_tmp), each node is traversed from the
newest to the oldest, and any older node with the same WWID and uevent
type (e.g. add) is moved to the merger_node list of the newer node. If a
deletion uevent node occurs, other older uevent nodes for that device
are filtered out (thanks to Martin's idea). A sketch of this merge pass
follows.
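
A minimal sketch, assuming the kernel-style list helpers multipath-tools
already carries; free_uevent() is a hypothetical destructor for a
dropped uevent:

void list_merger_uevents(struct list_head *uevq)
{
    struct uevent *later, *earlier, *tmp;

    /* walk newest -> oldest; each kept node absorbs older duplicates */
    list_for_each_entry_reverse(later, uevq, node) {
        if (!later->wwid[0])
            continue;               /* no WWID, nothing to merge on */

        earlier = list_entry(later->node.prev, struct uevent, node);
        while (&earlier->node != uevq) {
            tmp = list_entry(earlier->node.prev, struct uevent, node);

            if (!strcmp(earlier->wwid, later->wwid)) {
                if (!strcmp(later->action, "remove")) {
                    /* device is gone: older events are pointless */
                    list_del(&earlier->node);
                    free_uevent(earlier);
                } else if (!strcmp(earlier->action, later->action)) {
                    /* same device, same type: fold into newer event */
                    list_move(&earlier->node, &later->merger_node);
                }
            }
            earlier = tmp;
        }
    }
}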

After the above processing, note that the parameter “struct uevent *uev”
is no longer a single uevent in and after uev_trigger(); the code needs
to be modified to process batches of uevents in uev_add_path() and so
on, along the lines sketched below.
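
For illustration only, a hedged sketch of what a batch-aware
uev_add_path() might look like; add_one_path() and reload_map_once() are
made-up placeholders for the per-path work and the single dm reload:

int uev_add_path(struct uevent *uev, struct vectors *vecs)
{
    struct uevent *merged;
    int ret;

    ret = add_one_path(uev, vecs);          /* the event itself */
    list_for_each_entry(merged, &uev->merger_node, node)
        ret |= add_one_path(merged, vecs);  /* everything merged into it */

    /* all paths share one WWID, so one reload covers the whole batch */
    return ret | reload_map_once(uev->wwid, vecs);
}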



^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread

Thread overview: 47+ messages
2016-11-16  1:46 Improve processing efficiency for addition and deletion of multipath devices tang.junhui
2016-11-16  7:53 ` Hannes Reinecke
2016-11-16  8:45   ` tang.junhui
2016-11-16  9:49     ` Martin Wilck
2016-11-17  1:41       ` tang.junhui
2016-11-17 10:48         ` Martin Wilck
2016-11-18  1:02           ` tang.junhui
2016-11-18  7:39             ` Martin Wilck
2016-11-18  8:24               ` tang.junhui
2016-11-18  8:30                 ` Martin Wilck
2016-11-18  8:56                   ` tang.junhui
2016-11-18  9:12                   ` tang.junhui
2016-11-21 18:19                   ` Benjamin Marzinski
2016-11-18 22:26           ` Benjamin Marzinski
2016-11-23  1:08             ` tang.junhui
2016-11-29  9:07               ` Zdenek Kabelac
2016-11-29 10:13                 ` tang.junhui
2016-11-24  9:21             ` Martin Wilck
2016-11-28 18:46               ` Benjamin Marzinski
2016-11-29  6:47                 ` Hannes Reinecke
2016-11-29  8:02                   ` Martin Wilck
2016-11-29  8:10                     ` Zdenek Kabelac
2016-11-29  8:16                       ` Martin Wilck
2016-11-29  8:24                         ` Zdenek Kabelac
2016-11-29 17:25                     ` Benjamin Marzinski
2016-11-29  7:57                 ` Martin Wilck
2016-11-29 17:41                   ` Benjamin Marzinski
2016-11-28  2:19 tang.junhui
2016-11-28 10:05 ` Hannes Reinecke
2016-11-28 16:07   ` Benjamin Marzinski
2016-11-28 16:26     ` Zdenek Kabelac
2016-11-28 10:06 ` Zdenek Kabelac
2016-11-28 10:42   ` Hannes Reinecke
2016-11-28 11:51     ` Zdenek Kabelac
2016-11-28 12:06       ` Peter Rajnoha
2016-11-28 12:08       ` Hannes Reinecke
2016-11-28 12:23         ` Peter Rajnoha
2016-11-28 12:55         ` Zdenek Kabelac
2016-11-28 17:22         ` Benjamin Marzinski
2016-11-29  9:34           ` Zdenek Kabelac
2016-11-28 10:28 ` Martin Wilck
2016-11-28 17:31   ` Benjamin Marzinski
2016-11-29  7:52     ` Martin Wilck
2016-11-29 19:21       ` Benjamin Marzinski
2016-11-28 15:25 ` Benjamin Marzinski
2016-11-28 15:37   ` Hannes Reinecke
2016-12-01  1:16     ` tang.junhui
