* Improve processing efficiency for addition and deletion of multipath devices
@ 2016-11-16  1:46 tang.junhui

From: tang.junhui @ 2016-11-16  1:46 UTC (permalink / raw)
To: christophe.varoqui, bmarzins, hare; +Cc: dm-devel, tang.junhui

In these scenarios, multipath processing efficiency is very low:
1) There are many paths per multipath device,
2) Devices are added/deleted on iSCSI login/logout or FC link up/down.

Multipath processes events so slowly that some applications are not
satisfied; for example, OpenStack often times out waiting for the
creation of multipath devices.

I think the reason for the low processing efficiency is that multipath
processes uevent messages one by one, and each one also produces a new
DM addition/change/deletion uevent, which increases system resource
consumption; actually, most of these uevent messages are pointless.

So, can we find a way to reduce these uevent messages and improve
multipath processing efficiency? Personally, I think we can merge these
uevent messages before processing. For example, during 4 iSCSI session
login procedures, we could wait for a moment until the addition uevent
messages of all 4 paths have arrived, and then merge them into one and
handle it at once. Deletion uevent messages could be handled the same
way.

What do you think? Any points of view are welcome.
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: Hannes Reinecke @ 2016-11-16  7:53 UTC (permalink / raw)
To: tang.junhui, christophe.varoqui, bmarzins; +Cc: dm-devel

On 11/16/2016 02:46 AM, tang.junhui@zte.com.cn wrote:
> Personally, I think we can merge these uevent messages before
> processing. For example, during the 4 iSCSI session login procedures,
> we can wait for a moment until the addition uevent messages of 4
> paths are all arrived, and then we can merge these uevent messages to
> one and deal with it at once. [...]

The problem is that we don't know beforehand how many uevents we should
be waiting for. And even if we did, there would still be a chance that
one or several of these uevents fail to set up the device, so we would
be waiting forever.

The one possible way out would be to modify the way we're handling
events internally. Event processing really is several steps:
1) Get information about the attached device (pathinfo() and friends)
2) Store the information in our pathvec
3) Identify and update the mpp structure with the new pathvec

Currently, we're processing each step for every uevent. As we have only
a single lock protecting both pathvec and mppvec, we have to take the
lock prior to step 2 and release it after step 3. So while we could
receive events in parallel, we can only process them one by one, with
every event having to re-do step 3.

The idea would be to split the single lock into a pathvec lock and an
mppvec lock, and to create a separate thread for updating the mppvec.
Event processing could then be streamlined: the uevent thread adds the
new device to the pathvec and notifies the mppvec thread. That thread
could then check whether a pathvec update is in progress, and only
start once the pathvec handling has finished. With this we would avoid
issuing several similar mppvec updates, and the entire handling would
be far smoother.

Cheers,
Hannes
--
Dr. Hannes Reinecke		Teamlead Storage & Networking
hare@suse.de			+49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
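[Editorial note: the design above — a pathvec guarded by its own lock, with a
separate worker thread that coalesces mppvec updates — can be sketched in a
few lines. This is a toy model in Python for brevity; all names
(`EventPipeline`, `add_path`, `map_loads`) are illustrative, not the real
multipath-tools C API.]

```python
import threading

class EventPipeline:
    """Toy sketch of the two-lock proposal: uevent handlers append to
    pathvec under its own lock, while a single worker thread rebuilds
    the map table (mppvec), coalescing several notifications into one
    rebuild where possible."""

    def __init__(self):
        self.pathvec = []                  # (devname, wwid) paths seen so far
        self.mppvec = {}                   # wwid -> list of path devnames
        self.pathvec_lock = threading.Lock()
        self.cond = threading.Condition()  # wakes the mppvec worker
        self.dirty = False
        self.map_loads = 0                 # how often maps were rebuilt
        threading.Thread(target=self._mpp_worker, daemon=True).start()

    def add_path(self, dev, wwid):
        # Steps 1+2: store path info under the pathvec lock only.
        with self.pathvec_lock:
            self.pathvec.append((dev, wwid))
        # Kick the worker; several kicks may be served by one rebuild.
        with self.cond:
            self.dirty = True
            self.cond.notify()

    def _mpp_worker(self):
        # Step 3: rebuild mppvec once per batch of notifications.
        while True:
            with self.cond:
                while not self.dirty:
                    self.cond.wait()
                self.dirty = False
            with self.pathvec_lock:
                snapshot = list(self.pathvec)
            table = {}
            for dev, wwid in snapshot:
                table.setdefault(wwid, []).append(dev)
            self.mppvec = table
            self.map_loads += 1
```

Adding four paths in quick succession then typically triggers fewer than four
map rebuilds, which is the point of decoupling the two locks.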
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: tang.junhui @ 2016-11-16  8:45 UTC (permalink / raw)
To: Hannes Reinecke; +Cc: dm-devel, dm-devel-bounces

Hello Hannes,

Since the received uevent messages are stored in a queue, and uevents
are received much faster than they are processed, we can merge the
queued uevent messages first and then process the result in the next
step. Of course, the uevent messages of some paths of a multipath
device may not have arrived yet, but we do not need to wait for them:
we can handle the remaining paths in the original way when their uevent
messages arrive later.

I think we can merge most uevent messages this way and eliminate most
of the unnecessary DM change uevents during creation/deletion of
multipath devices. The method you mentioned seems a little complex to
me, and it does not reduce the DM addition/change/deletion uevent
messages, which consume a lot of system resources.

Sincerely,
Tang

Hannes Reinecke wrote:
> The problem is that we don't know beforehand how many uevents we
> should be waiting for. [...]
> The idea would be to split the single lock into a pathvec lock and an
> mppvec lock, and create a separate thread for updating the mppvec.
> [...]

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: Martin Wilck @ 2016-11-16  9:49 UTC (permalink / raw)
To: dm-devel

Hi Tang,

On Wed, 2016-11-16 at 16:45 +0800, tang.junhui@zte.com.cn wrote:
> I think we can merge most uevent messages and eliminate most of the
> unnecessary DM change uevents during creation/deletion of multipath
> devices this way. The method you mentioned seems a little complex,
> and it does not reduce the DM addition/change/deletion uevent
> messages, which consume a lot of system resources.

Let me quickly introduce myself: I'm a new member of Hannes' team at
SUSE and new on this ML.

Apart from Hannes' proposal for more fine-grained locking, I can see
the following options:

a) Instead of kicking the uevent processing thread whenever anything is
queued, kick it at predefined time intervals (1 second, say). The
uevent processor would then accumulate all changes received since it
was last kicked and apply DM changes only when necessary. This may need
to be a config option, because it could obviously lead to delayed
device setup in some configurations.

b) The logic of a) could be improved by having the uevent listener
detect "event storms" and wait for them to end before kicking the
processing thread. The details of the heuristics for "storms" would
need to be worked out, of course.

c) Sometimes several uevents for the same physical path occur in quick
succession. Normally it should be sufficient to filter these such that
only the last event is processed.

Regards,
Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
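[Editorial note: option a) above — wake the processor periodically and drain
everything queued since the last kick — amounts to a simple batch drain. A
minimal Python sketch, with illustrative names only (`drain_pending`,
`apply_dm_changes` are not real multipathd functions):]

```python
import queue

def drain_pending(events):
    """One wakeup of the periodically-kicked processor: collect every
    uevent queued since the last kick into a single batch, so that DM
    changes can be applied once per batch instead of once per event."""
    batch = []
    while True:
        try:
            batch.append(events.get_nowait())
        except queue.Empty:
            return batch

# The processing thread would then loop roughly like:
#   while running:
#       time.sleep(interval)          # e.g. 1 second, made configurable
#       batch = drain_pending(events)
#       if batch:
#           apply_dm_changes(batch)   # one reload per map, not per event
```

The trade-off named in the message is visible here: a longer interval means
larger batches (fewer DM reloads) but also delays device setup by up to one
interval.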
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: tang.junhui @ 2016-11-17  1:41 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel

Hi Martin,

Nice to see you. May you succeed in your team and in our open source
community.

You have put forward a good proposal: queue more uevent messages by
kicking the uevent processing thread at predefined time intervals. It
is also a good idea to make such an interval configurable.

As to processing several uevents for the same physical device, I think
the difference of opinion between us is "FILTER" versus "MERGE".
Personally, I think merging is more accurate. For example, we receive
4 path addition uevent messages for the same physical device:
1) uevent add sdb
2) uevent add sdc
3) uevent add sdd
4) uevent add sde

We cannot simply filter out uevent messages 1), 2), 3) and process only
uevent message 4); that would lose paths of this multipath device. In
my opinion, we should MERGE the four uevent messages into one like:
1) uevent add sdb sdc sdd sde

Then the uevent processing thread only needs to process one uevent
message, and it only produces one DM addition uevent (vs. now: one DM
addition uevent and 3 DM change uevents), which really reduces system
consumption (for example, it spares udev from processing every DM
uevent the DM kernel produces).

Regards,
Tang

From: Martin Wilck <mwilck@suse.com>
To: dm-devel@redhat.com
Date: 2016/11/16 17:59
Subject: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices

> Apart from Hannes' proposal for more fine-grained locking, I can see
> the following options: [...]

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
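[Editorial note: the MERGE idea above — collapsing queued "add" uevents that
belong to the same LUN into one event carrying all path names — can be
sketched as follows. This is an illustrative Python model, not multipathd
code; `wwid_of` stands in for the real path identification step
(pathinfo/WWID lookup):]

```python
def merge_add_uevents(uevents, wwid_of):
    """Merge queued 'add' uevents that belong to the same LUN (same
    WWID) into one event carrying all path names, so the processor can
    create the map once instead of once per path; other actions pass
    through unmerged."""
    merged = {}   # wwid -> merged ("add", [devs]) event
    out = []
    for action, dev in uevents:
        if action != "add":
            out.append((action, [dev]))
            continue
        wwid = wwid_of(dev)
        if wwid in merged:
            merged[wwid][1].append(dev)   # fold into the existing event
        else:
            ev = ("add", [dev])
            merged[wwid] = ev
            out.append(ev)
    return out
```

With the four adds from the example, the queue collapses to the single
"add sdb sdc sdd sde" event described above.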
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: Martin Wilck @ 2016-11-17 10:48 UTC (permalink / raw)
To: tang.junhui; +Cc: dm-devel

Hi Tang,

> As to processing several uevents for the same physical device, I
> think the difference of opinion between us is "FILTER" versus
> "MERGE". Personally, I think merging is more accurate. [...]
> We cannot simply filter out uevent messages 1), 2), 3) and process
> only uevent message 4); that would lose paths of this multipath
> device.

Of course. My "filtering" idea was meant for cases where several events
for the same device are queued, e.g.
1) add sda
2) change sda
3) delete sda
Is it always sufficient to look only at the last event in such a case?
I think so, but I'm not 100% certain.

Regards
Martin

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
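[Editorial note: the "filter" idea in the message above — when several events
for the *same* device node are queued, act only on the newest one — is a
last-writer-wins reduction. A minimal Python sketch under that assumption
(the thread itself questions whether it is always safe):]

```python
def filter_same_device(uevents):
    """Keep only the newest queued event per device node: for
    [add sda, change sda, delete sda] only the final 'delete' survives.
    Each device keeps the position of its first appearance in the
    queue, which preserves relative ordering between devices."""
    last = {}
    for action, dev in uevents:
        last[dev] = action          # later events override earlier ones
    return [(action, dev) for dev, action in last.items()]
```

Events for different devices are untouched; only repeats for one device are
collapsed.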
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: tang.junhui @ 2016-11-18  1:02 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel

Hi Martin,

Regarding your opinion:
> My "filtering" idea was meant for cases where several events
> for the same device are queued, e.g.
> 1) add sda
> 2) change sda
> 3) delete sda
> Is it always sufficient to look only at the last event in such a
> case?

I do not agree, for the following reasons:

1) It is risky to filter uevents like that. A SCSI device has been
created; its lifetime may be very short, but we cannot turn a blind eye
to it: the system or applications may need the multipath device built
on that SCSI device.

2) The scenario you mentioned rarely happens. We are more concerned
about the more common situation of devices being added or deleted on
iSCSI login/logout or FC link up/down, with many iSCSI or FC links per
LUN. In that situation we may receive many uevents from different paths
of the same LUN device, and we want to merge these uevents into one and
process them together.

Regards,
Tang

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: Martin Wilck @ 2016-11-18  7:39 UTC (permalink / raw)
To: tang.junhui; +Cc: dm-devel

Hi Tang,

On Fri, 2016-11-18 at 09:02 +0800, tang.junhui@zte.com.cn wrote:
> We are more concerned about the more common situation of devices
> being added or deleted on iSCSI login/logout or FC link up/down,
> with many iSCSI or FC links per LUN. In that situation we may
> receive many uevents from different paths of the same LUN device,
> and we want to merge these uevents into one and process them
> together.

I'm not arguing against that. But consider what you'd do if you have to
process the series of uevents
[ 1 "add sda", 2 "add sdb", 3 "del sda", 4 "add sdc", 5 "del sdb",
  6 "add sda" ].
If you merge these, assuming all belong to the same multipath map, what
would be your action?

I would set up a map with sda and sdc, using the device properties from
events 4 and 6. That means that the rest of the events have been
"filtered" out. Anything else would mean repeated map loads for the
same multipath map, which is what we want to avoid. Agreed?

Regards,
Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
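[Editorial note: the reduction described in the message above — fold a mixed
add/del sequence for one map so that a later "del" cancels an earlier "add",
and only the final state of each device is acted on — can be sketched
directly. Illustrative Python, hypothetical function name:]

```python
def surviving_paths(uevents):
    """Fold a queued add/del sequence per device: 'del' cancels any
    earlier 'add' for the same device, so one map load covers only the
    devices that still exist at the end of the batch."""
    state = {}
    for action, dev in uevents:
        if action == "add":
            state[dev] = True
        elif action == "del":
            state.pop(dev, None)    # cancels a pending add, if any
    return sorted(state)
```

Running it on the six-event series from the message yields exactly the map
contents proposed there: sda and sdc.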
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: tang.junhui @ 2016-11-18  8:24 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel

Hi Martin,

In your case, my action would be:
1) merge uevents 1) and 2) into one uevent "add sda sdb", and process
   them together
2) process uevent "del sda"
3) process uevent "add sdc"
4) process uevent "del sdb"
5) process uevent "add sda"

Though the processing efficiency in such a scenario is lower than with
your approach, it is simple and reliable. More importantly, Martin, you
are still focusing on such a special scene; what I am concerned about
looks like this:

UDEV [19172.014482] add /devices/platform/host3/session17/target3:0:0/3:0:0:0/block/sdc (block)
UDEV [19172.249389] add /devices/platform/host4/session18/target4:0:0/4:0:0:0/block/sdd (block)
UDEV [19172.343791] add /devices/platform/host3/session17/target3:0:0/3:0:0:2/block/sdf (block)
UDEV [19172.364496] add /devices/platform/host5/session19/target5:0:0/5:0:0:0/block/sdh (block)
UDEV [19172.523794] add /devices/platform/host4/session18/target4:0:0/4:0:0:2/block/sdi (block)
UDEV [19172.621333] add /devices/platform/host3/session17/target3:0:0/3:0:0:4/block/sdn (block)
UDEV [19172.699473] add /devices/platform/host4/session18/target4:0:0/4:0:0:1/block/sdg (block)
UDEV [19172.704605] add /devices/platform/host3/session17/target3:0:0/3:0:0:1/block/sde (block)
UDEV [19172.709687] add /devices/platform/host3/session17/target3:0:0/3:0:0:3/block/sdj (block)
UDEV [19172.714919] add /devices/platform/host4/session18/target4:0:0/4:0:0:3/block/sdl (block)
UDEV [19172.728891] add /devices/platform/host4/session18/target4:0:0/4:0:0:4/block/sdo (block)
UDEV [19172.872156] add /devices/platform/host3/session17/target3:0:0/3:0:0:6/block/sdt (block)
UDEV [19172.915542] add /devices/platform/host4/session18/target4:0:0/4:0:0:6/block/sdu (block)
UDEV [19173.086935] add /devices/platform/host6/session20/target6:0:0/6:0:0:0/block/sdw (block)
UDEV [19173.108278] add /devices/platform/host6/session20/target6:0:0/6:0:0:1/block/sdz (block)
UDEV [19173.195153] add /devices/platform/host4/session18/target4:0:0/4:0:0:5/block/sdr (block)
UDEV [19173.228397] add /devices/platform/host3/session17/target3:0:0/3:0:0:5/block/sdq (block)
UDEV [19173.363632] add /devices/platform/host5/session19/target5:0:0/5:0:0:2/block/sdm (block)
UDEV [19173.386560] add /devices/platform/host3/session17/target3:0:0/3:0:0:7/block/sdx (block)
UDEV [19173.394515] add /devices/platform/host4/session18/target4:0:0/4:0:0:7/block/sdy (block)
UDEV [19173.410152] add /devices/platform/host5/session19/target5:0:0/5:0:0:1/block/sdk (block)
UDEV [19173.474286] add /devices/platform/host6/session20/target6:0:0/6:0:0:2/block/sdab (block)
UDEV [19173.508438] add /devices/platform/host5/session19/target5:0:0/5:0:0:3/block/sdp (block)
UDEV [19173.713146] add /devices/platform/host5/session19/target5:0:0/5:0:0:4/block/sds (block)
UDEV [19173.782065] add /devices/platform/host5/session19/target5:0:0/5:0:0:5/block/sdv (block)
UDEV [19173.787447] add /devices/platform/host5/session19/target5:0:0/5:0:0:6/block/sdaa (block)
UDEV [19173.803218] add /devices/platform/host6/session20/target6:0:0/6:0:0:3/block/sdad (block)
UDEV [19173.849411] add /devices/platform/host5/session19/target5:0:0/5:0:0:7/block/sdac (block)
UDEV [19173.918301] add /devices/platform/host6/session20/target6:0:0/6:0:0:5/block/sdaf (block)
UDEV [19173.941005] add /devices/platform/host6/session20/target6:0:0/6:0:0:4/block/sdae (block)
UDEV [19173.987206] add /devices/platform/host6/session20/target6:0:0/6:0:0:7/block/sdah (block)
UDEV [19173.992431] add /devices/platform/host6/session20/target6:0:0/6:0:0:6/block/sdag (block)

Or like this:

UDEV [20712.402631] remove /devices/platform/host3/session17/target3:0:0/3:0:0:0/block/sdc (block)
UDEV [20712.427716] remove /devices/platform/host6/session20/target6:0:0/6:0:0:0/block/sdw (block)
UDEV [20712.459854] remove /devices/platform/host4/session18/target4:0:0/4:0:0:0/block/sdd (block)
UDEV [20712.471124] remove /devices/platform/host5/session19/target5:0:0/5:0:0:0/block/sdh (block)
UDEV [20712.492190] remove /devices/platform/host6/session20/target6:0:0/6:0:0:1/block/sdz (block)
UDEV [20712.495245] remove /devices/platform/host4/session18/target4:0:0/4:0:0:1/block/sdg (block)
UDEV [20712.512007] remove /devices/platform/host3/session17/target3:0:0/3:0:0:1/block/sde (block)
UDEV [20712.522986] remove /devices/platform/host5/session19/target5:0:0/5:0:0:1/block/sdk (block)
UDEV [20712.528676] remove /devices/platform/host6/session20/target6:0:0/6:0:0:2/block/sdab (block)
UDEV [20712.529891] remove /devices/platform/host5/session19/target5:0:0/5:0:0:2/block/sdm (block)
UDEV [20712.536178] remove /devices/platform/host4/session18/target4:0:0/4:0:0:2/block/sdi (block)
UDEV [20712.545444] remove /devices/platform/host4/session18/target4:0:0/4:0:0:3/block/sdl (block)
UDEV [20712.548006] remove /devices/platform/host3/session17/target3:0:0/3:0:0:3/block/sdj (block)
UDEV [20712.551935] remove /devices/platform/host5/session19/target5:0:0/5:0:0:3/block/sdp (block)
UDEV [20712.555405] remove /devices/platform/host3/session17/target3:0:0/3:0:0:2/block/sdf (block)
UDEV [20712.556947] remove /devices/platform/host4/session18/target4:0:0/4:0:0:4/block/sdo (block)
UDEV [20712.563524] remove /devices/platform/host5/session19/target5:0:0/5:0:0:4/block/sds (block)
UDEV [20712.572948] remove /devices/platform/host6/session20/target6:0:0/6:0:0:4/block/sdae (block)
UDEV [20712.574738] remove /devices/platform/host6/session20/target6:0:0/6:0:0:3/block/sdad (block)
UDEV [20712.576736] remove /devices/platform/host4/session18/target4:0:0/4:0:0:5/block/sdr (block)
UDEV [20712.581343] remove /devices/platform/host3/session17/target3:0:0/3:0:0:4/block/sdn (block)
UDEV [20712.583883] remove /devices/platform/host3/session17/target3:0:0/3:0:0:5/block/sdq (block)
UDEV [20712.604498] remove /devices/platform/host6/session20/target6:0:0/6:0:0:5/block/sdaf (block)
UDEV [20712.605536] remove /devices/platform/host4/session18/target4:0:0/4:0:0:6/block/sdu (block)
UDEV [20712.605721] remove /devices/platform/host3/session17/target3:0:0/3:0:0:6/block/sdt (block)
UDEV [20712.606517] remove /devices/platform/host5/session19/target5:0:0/5:0:0:5/block/sdv (block)
UDEV [20712.610709] remove /devices/platform/host5/session19/target5:0:0/5:0:0:6/block/sdaa (block)
UDEV [20712.618187] remove /devices/platform/host4/session18/target4:0:0/4:0:0:7/block/sdy (block)
UDEV [20712.628520] remove /devices/platform/host6/session20/target6:0:0/6:0:0:6/block/sdag (block)
UDEV [20712.642216] remove /devices/platform/host5/session19/target5:0:0/5:0:0:7/block/sdac (block)
UDEV [20712.643796] remove /devices/platform/host3/session17/target3:0:0/3:0:0:7/block/sdx (block)
UDEV [20712.653786] remove /devices/platform/host6/session20/target6:0:0/6:0:0:7/block/sdah (block)

These scenarios are more realistic and more urgently in need of a
solution.

Regards,
Tang

From: Martin Wilck <mwilck@suse.com>
To: tang.junhui@zte.com.cn; Cc: dm-devel@redhat.com
Date: 2016/11/18 15:40
Subject: Re: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices

> I'm not arguing against that. But consider what you'd do if you have
> to process the series of uevents [ 1 "add sda", 2 "add sdb", 3 "del
> sda", 4 "add sdc", 5 "del sdb", 6 "add sda" ]. [...]
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: Martin Wilck @ 2016-11-18  8:30 UTC (permalink / raw)
To: tang.junhui; +Cc: dm-devel

On Fri, 2016-11-18 at 16:24 +0800, tang.junhui@zte.com.cn wrote:
> Hi Martin,
>
> In your case, my action would be:
> 1) merge uevents 1) and 2) into one uevent "add sda sdb", and
> process them together

This will fail because sdb is non-existent at the time you try - no?

> Though the processing efficiency in such a scenario is lower than
> with your approach, it is simple and reliable. More importantly,
> Martin, you are still focusing on such a special scene; what I am
> concerned about looks like this: [...]

I understand what you're concerned with. I just think we need to do
both. I agree that many events for many different devices are more
likely. But once you start merging, you'd rather be prepared for
several events for the same phys device, too.

Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel
* Re: Improve processing efficiency for addition and deletion of multipath devices

From: tang.junhui @ 2016-11-18  8:56 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel

Hi Martin,

> This will fail because sdb is non-existent at the time you try - no?

Non-existent paths in merged uevents do not affect the overall
algorithm; they just need to be considered at implementation time.

Your suggestion is meaningful, but I think we should focus on the
existing problem first, resolve it, and then do further work to make
things better in other areas, shouldn't we?

Regards,
Tang
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-18 8:30 ` Martin Wilck
2016-11-18 8:56 ` tang.junhui
@ 2016-11-18 9:12 ` tang.junhui
2016-11-21 18:19 ` Benjamin Marzinski
2 siblings, 0 replies; 47+ messages in thread
From: tang.junhui @ 2016-11-18 9:12 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel

Hi Martin,

> But once you start merging, you'd rather be prepared for
> several events for the same phys device, too.

We can merge on the condition that there are no repeated uevents from
the same sd device; otherwise, we pause merging and kick the uevent
processing thread to process the uevents merged so far.

Regards,
Tang

From: Martin Wilck <mwilck@suse.com>
To: tang.junhui@zte.com.cn
Cc: dm-devel@redhat.com
Date: 2016/11/18 16:38
Subject: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices
Sender: dm-devel-bounces@redhat.com

[...]
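Tang's "pause merging on a repeated device" threshold could look roughly like the following minimal C sketch. This is an illustration only, assuming hypothetical `struct uevent` and `merge_uevents` names, not multipath-tools' real types: events with the same action are collected into one batch until either the action changes or a device name repeats, at which point the batch is flushed to the processing thread.

```c
#include <assert.h>
#include <string.h>

#define MAX_MERGED 64

/* hypothetical stand-in for multipathd's uevent structure */
struct uevent {
    char action[8];     /* "add", "change", "remove" */
    char kernel[16];    /* kernel device name, e.g. "sda" */
};

/*
 * Merge a run of queued uevents with the same action into one batch,
 * but stop as soon as a device name repeats: a second event for the
 * same sd device means we flush what we have and let the processing
 * thread handle the merged batch first.
 * Returns the number of events placed into batch[].
 */
static int merge_uevents(const struct uevent *queue, int n,
                         const struct uevent *batch[MAX_MERGED])
{
    int count = 0;

    for (int i = 0; i < n && count < MAX_MERGED; i++) {
        /* a different action ends the batch */
        if (count > 0 && strcmp(queue[i].action, queue[0].action) != 0)
            break;
        /* a repeated device ends the batch: pause merging here */
        for (int j = 0; j < count; j++)
            if (strcmp(batch[j]->kernel, queue[i].kernel) == 0)
                return count;
        batch[count++] = &queue[i];
    }
    return count;
}
```

With a queue of `add sda, add sdb, add sda, add sdc`, the first batch contains only `sda` and `sdb`; the repeated `sda` forces a flush, matching the behavior described above.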
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-18 8:30 ` Martin Wilck
2016-11-18 8:56 ` tang.junhui
2016-11-18 9:12 ` tang.junhui
@ 2016-11-21 18:19 ` Benjamin Marzinski
2 siblings, 0 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-21 18:19 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel, tang.junhui

On Fri, Nov 18, 2016 at 09:30:57AM +0100, Martin Wilck wrote:
> On Fri, 2016-11-18 at 16:24 +0800, tang.junhui@zte.com.cn wrote:
> > Hi Martin,
> >
> > In your case, my action is:
> > 1) merge uevents 1) 2) into one uevent "add sda sdb", and process
> > them together
>
> This will fail because sdb is non-existent at the time you try - no?

Yes. If we get a cluster of events for a device, and the last event is
a remove event, we don't need to process them. Just as you mention,
after that remove event is sent by udev, the device is gone. There is
nothing that multipathd could do with it even if it wanted to. So
there is no point in trying the add and change event processing, since
we already know that it will fail.

-Ben

> I understand what you're concerned with. I just think we need to do
> both. I agree that many events for many different devices are more
> likely. But once you start merging, you'd rather be prepared for
> several events for the same phys device, too.
[...]
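Ben's point about a trailing remove event can be sketched as a per-device filter. The structure and function names below are hypothetical, not multipath-tools' actual API; the sketch only shows the decision logic: if the newest event for a path is a "remove", the older "add"/"change" events are dead weight, and if the device also appeared within the same window, the whole cluster is a no-op.

```c
#include <assert.h>
#include <string.h>

/* hypothetical stand-in for multipathd's uevent structure */
struct uevent {
    char action[8];
    char kernel[16];
};

/*
 * Reduce the queued event cluster for one path device to the events
 * that still need processing.  If the final event is a "remove", the
 * earlier "add"/"change" events are pointless: by the time we would
 * act on them the device node is already gone.  If the device also
 * appeared within the same window ("add ... remove"), nothing at all
 * needs to be done.
 * Returns the number of events written to out[].
 */
static int filter_cluster(const struct uevent *in, int n,
                          const struct uevent *out[])
{
    if (n > 0 && strcmp(in[n - 1].action, "remove") == 0) {
        if (strcmp(in[0].action, "add") == 0)
            return 0;           /* added and removed: complete no-op */
        out[0] = &in[n - 1];
        return 1;               /* pre-existing device: just remove it */
    }
    for (int i = 0; i < n; i++)
        out[i] = &in[i];
    return n;
}
```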
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-17 10:48 ` Martin Wilck
2016-11-18 1:02 ` tang.junhui
@ 2016-11-18 22:26 ` Benjamin Marzinski
2016-11-23 1:08 ` tang.junhui
2016-11-24 9:21 ` Martin Wilck
1 sibling, 2 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-18 22:26 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel, tang.junhui

My $.02

I'm not sure that we want to wait for a predetermined time to grab the
uevents. If they are coming in slowly, multipathd is more responsive
when it processes them immediately, and if they are coming in a storm,
they will naturally pile up faster than we can deal with them. I also
don't really like the idea of waiting, when we get an event, to see
whether another one comes along, for the same reasons. That being said,
I wouldn't NACK a config option (turned off by default) that set a
minimum wait time between uevent dispatches, because I might be wrong
here.

However, it seems like what uevent_dispatch() already does is the right
thing, which is to pull all outstanding uevents off the uevent queue
into a new list. The issue is that we simply need to work on processing
the whole list at a time.

Martin's filtering definitely has a place here. He is correct that if
we get a delete uevent for a device, we don't need to process any of
the earlier ones. We do need to look at the add and change uevents
individually, though. For instance, one of the change uevents out of a
group could be for changing the read-only status of a device. If we
just took the most recent, we would miss that information. The good
news is that we don't need to actually call uev_add_path and
uev_update_path once for each of these events. All we need to do is
read through the uevent environment variables for all of them to find
the information we need (like a change in ro status), and then call the
function once with that information.

Hannes pointed out that for adding paths we don't need to do any
locking until we want to add the path to the pathvec. We could grab all
the outstanding add events from the list that gets passed to
service_uevq, collect the pathinfo for all the paths, add them all into
the pathvec, and then create or update the multipath devices. Delete
uevents could work similarly: we find all the paths for a multipath
device in our list, remove them all, and reload the device once.

I'm not sure how much a separate thread for doing the multipath work
would buy us, however. It's true that we could have a separate lock for
the mpvec, so we didn't need to hold the pathvec lock while updating
the mpvec, but actually updating the mpvec only takes a moment. The
time-consuming part is building the multipath device and passing it to
the kernel. For this, we do need the paths locked. Otherwise, what
would keep a path from getting removed while the multipath device was
using it to get set up? If working with multipath devices and
processing path uevents are happening at the same time, this is a very
real possibility.

But even if we keep one thread processing the uevents, simply avoiding
calling uev_add_path and uev_update_path for repeated add and change
events, and batching multiple add and delete events together, could
provide a real speedup, IMHO.

Now, the holy grail of multipathd efficiency would be to totally rework
multipathd's locking into something sensible. We could have locks for
the vectors that only got held when you were actually traversing or
manipulating the vectors, coupled with reference counts on the
individual paths and multipaths so that they didn't get removed while
they were in use, and probably locks on the individual paths and
multipaths for just the sections that really need to be protected. The
problem is that this would take a huge amount of code auditing to get
mostly right, and then a while to flush out all the missed corner
cases. At least I think it would. But I dismissed decoupling the config
structure from the vecs lock as too painful, and clearly that was
possible. At any rate, I'd rather get rid of the gazillion waiter
threads first.

-Ben

On Thu, Nov 17, 2016 at 11:48:32AM +0100, Martin Wilck wrote:
> Hi Tang,
>
> > As to processing several uevents for the same physical device, I
> > think the difference of opinion between us is "FILTER" vs. "MERGE".
> > Personally, I think merging is more accurate. For example, we
> > receive 4 path addition uevent messages from the same physical
> > device:
> > 1) uevent add sdb
> > 2) uevent add sdc
> > 3) uevent add sdd
> > 4) uevent add sde
> >
> > We cannot just filter the 1) 2) 3) uevent messages and only process
> > the 4) uevent message, which would lose paths of this multipath
> > device.
>
> Of course. My "filtering" idea was meant for cases where several
> events for the same device are queued, e.g.
>
> 1) add sda
> 2) change sda
> 3) delete sda
>
> Is it always sufficient to look only at the last event in such a
> case? I think so, but I'm not 100% certain.
>
> Regards
> Martin
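The batching scheme Ben attributes to Hannes (probe paths outside the lock, publish them and reload once) can be sketched in a few lines of C. Everything here is a heavily simplified, hypothetical stand-in for multipathd's real structures (`struct path`, `vecs`, `pathinfo()`), meant only to show where the lock is held and why the batch needs just one map reload.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define MAX_PATHS 256

/* hypothetical, heavily simplified stand-ins for multipathd's structures */
struct path { char dev[16]; };

struct vecs {
    pthread_mutex_t lock;
    struct path *pathvec[MAX_PATHS];
    int n_paths;
    int n_reloads;      /* counts map reloads, for illustration only */
};

/* the expensive per-path probing (real code: pathinfo()); no lock held */
static struct path *probe_path(const char *dev)
{
    struct path *pp = calloc(1, sizeof(*pp));
    strncpy(pp->dev, dev, sizeof(pp->dev) - 1);
    return pp;
}

/*
 * Batched "add path": probe every device of the batch outside the
 * lock, then take the lock once to publish all new paths and trigger
 * a single map reload, instead of one lock round-trip and one reload
 * per path.
 */
static void batch_add_paths(struct vecs *vecs, const char *devs[], int n)
{
    struct path *staged[MAX_PATHS];

    for (int i = 0; i < n; i++)
        staged[i] = probe_path(devs[i]);        /* unlocked, slow part */

    pthread_mutex_lock(&vecs->lock);
    for (int i = 0; i < n; i++)
        vecs->pathvec[vecs->n_paths++] = staged[i];
    vecs->n_reloads++;                          /* one reload per batch */
    pthread_mutex_unlock(&vecs->lock);
}
```

The design point is the ratio: n paths probed, one critical section, one reload. Per-event processing would pay the lock and the reload n times.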
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-18 22:26 ` Benjamin Marzinski
@ 2016-11-23 1:08 ` tang.junhui
2016-11-29 9:07 ` Zdenek Kabelac
2016-11-24 9:21 ` Martin Wilck
1 sibling, 1 reply; 47+ messages in thread
From: tang.junhui @ 2016-11-23 1:08 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: zhang.kai16, dm-devel, Bart Van Assche, Martin Wilck

So, now we have at least 4 ways to improve multipath efficiency:
1) Filtering uevents;
2) Merging uevents;
3) Using separate locks for mpvec and pathvec;
4) Getting rid of the gazillion waiter threads.

This is exciting, but how do we achieve this blueprint? Can we set up
some working groups and develop the improvements in parallel, since
most of them are independent?

From: "Benjamin Marzinski" <bmarzins@redhat.com>
To: Martin Wilck <mwilck@suse.com>
Cc: dm-devel@redhat.com, tang.junhui@zte.com.cn
Date: 2016/11/19 06:33
Subject: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices
Sender: dm-devel-bounces@redhat.com

> My $.02
>
> I'm not sure that we want to wait for a predetermined time to grab
> the uevents. [...] At any rate, I'd rather get rid of the gazillion
> waiter threads first.
>
> -Ben
[...]
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-23 1:08 ` tang.junhui
@ 2016-11-29 9:07 ` Zdenek Kabelac
2016-11-29 10:13 ` tang.junhui
0 siblings, 1 reply; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-29 9:07 UTC (permalink / raw)
To: tang.junhui, Benjamin Marzinski
Cc: zhang.kai16, dm-devel, Martin Wilck, Bart Van Assche

Dne 23.11.2016 v 02:08 tang.junhui@zte.com.cn napsal(a):
> So, now we have at least 4 ways to improve multipath efficiency:
> 1) Filtering uevents;
> 2) Merging uevents;
> 3) Using separate locks for mpvec and pathvec;
> 4) Getting rid of the gazillion waiter threads.
>
> This is exciting, but how do we achieve this blueprint?
> Can we set up some working groups and develop the improvements in
> parallel, since most of them are independent?

What I'm missing here is an analysis of the performance impact. Do we
have 'perf' traces and other reports showing where the CPU is wasted?

While solving all of these issues surely makes sense, some tasks have a
major impact on speed, while others the user will mostly not notice at
all.

Regards

Zdenek
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-29 9:07 ` Zdenek Kabelac
@ 2016-11-29 10:13 ` tang.junhui
0 siblings, 0 replies; 47+ messages in thread
From: tang.junhui @ 2016-11-29 10:13 UTC (permalink / raw)
To: Zdenek Kabelac
Cc: zhang.kai16, dm-devel-bounces, dm-devel, Bart Van Assche, Martin Wilck

Hello Zdenek,

I think your suggestion is constructive; some unnecessary operations in
the udev rules waste too much system resource (CPU, etc.). That is
another way to improve multipath efficiency. Could you put forward a
more specific proposal?

Regards,
Tang

From: Zdenek Kabelac <zdenek.kabelac@gmail.com>
To: tang.junhui@zte.com.cn, Benjamin Marzinski <bmarzins@redhat.com>
Cc: zhang.kai16@zte.com.cn, dm-devel@redhat.com, Martin Wilck <mwilck@suse.com>, Bart Van Assche <bart.vanassche@sandisk.com>
Date: 2016/11/29 17:22
Subject: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices
Sender: dm-devel-bounces@redhat.com

> What I'm missing here is an analysis of the performance impact. Do we
> have 'perf' traces and other reports showing where the CPU is wasted?
>
> While solving all of these issues surely makes sense, some tasks have
> a major impact on speed, while others the user will mostly not notice
> at all.
[...]
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-18 22:26 ` Benjamin Marzinski
2016-11-23 1:08 ` tang.junhui
@ 2016-11-24 9:21 ` Martin Wilck
2016-11-28 18:46 ` Benjamin Marzinski
1 sibling, 1 reply; 47+ messages in thread
From: Martin Wilck @ 2016-11-24 9:21 UTC (permalink / raw)
To: Benjamin Marzinski; +Cc: dm-devel, tang.junhui

On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:

> The issue is that we simply need to work on processing the whole
> list at a time. Martin's filtering definitely has a place here. He
> is correct that if we get a delete uevent for a device, we don't
> need to process any of the earlier ones. [...] The good news is that
> we don't need to actually call uev_add_path and uev_update_path once
> for each of these events. All we need to do is read through the uev
> environment variables for all of them to find the information we
> need (like change in ro status), and then call the function once
> with that information.

Do we need to implement the environment-merging ourselves? AFAICS this
logic is implemented in udev itself. So if the last event for a given
device is a change event, we'd just need to fetch the device
properties from the udev db. Therefore, IMO we just have to look at
the last received event for a given path:

- DELETE => discard the device
- ADD => use the udev properties coming with the event
- CHANGE => query the udev db for the current set of properties

> At any rate, I'd rather get rid of the gazillion waiter threads
> first.

Hm, I thought the threads are good because they avoid having one
unresponsive device stall everything?

Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
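Martin's three-case rule can be sketched as a tiny dispatcher. The names below (`struct uevent`, `dispose`, the `disposition` enum) are hypothetical illustrations, not multipath-tools' actual code: only the last queued event per path decides the action taken.

```c
#include <assert.h>
#include <string.h>

/* hypothetical stand-in for a queued uevent */
struct uevent { char action[8]; };

enum disposition {
    DISCARD_DEVICE,     /* last event was "remove" */
    USE_EVENT_PROPS,    /* last event was "add": its properties suffice */
    QUERY_UDEV_DB       /* last event was "change": re-read the udev db */
};

/*
 * Martin's proposed rule: only the last queued event for a path
 * decides what has to be done with the device.
 */
static enum disposition dispose(const struct uevent *evs, int n)
{
    const char *last = evs[n - 1].action;

    if (strcmp(last, "remove") == 0)
        return DISCARD_DEVICE;
    if (strcmp(last, "add") == 0)
        return USE_EVENT_PROPS;
    return QUERY_UDEV_DB;
}
```

Whether a udev-db query really recovers everything the intermediate change events carried is exactly the question Ben raises in his reply.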
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-24 9:21 ` Martin Wilck
@ 2016-11-28 18:46 ` Benjamin Marzinski
2016-11-29 6:47 ` Hannes Reinecke
2016-11-29 7:57 ` Martin Wilck
0 siblings, 2 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-28 18:46 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel, tang.junhui

On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
> Do we need to implement the environment-merging ourselves? AFAICS
> this logic is implemented in udev itself. So if the last event for a
> given device is a change event, we'd just need to fetch the device
> properties from the udev db. Therefore IMO we just have to look at
> the last received event for a given path:
>
> - DELETE => discard the device
> - ADD => use the udev properties coming with the event
> - CHANGE => query udev db for current set of properties

The issue is that when we get a change event, there may be udev
environment variables that tell us what that specific event is for.
For instance, a change in the read-only status of the path device.
Future change events will not have those environment variables set.
However, if we want to only call uev_update_path once, we need to know
all of the things that changed, so we need to examine the appropriate
udev environment variables for all the change events, so that when we
call uev_update_path, we can take all the necessary actions (in this
instance, changing the read-only status of the multipath device).

> > At any rate, I'd rather get rid of the gazillion waiter threads
> > first.
>
> Hm, I thought the threads are good because they avoid having one
> unresponsive device stall everything?

There is work underway to make dm events pollable, so that you can
wait for any number of them with one thread. At the moment, once we
get an event, we lock the vecs lock, which pretty much keeps
everything else from running, so this doesn't really change that.

-Ben
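Ben's point is that event-specific variables must be folded across the whole run of change events before the single update call. A minimal C sketch of that folding, tracking only the kernel's `DISK_RO` variable as an example (the `struct uevent` layout and function names are hypothetical, not multipathd's real code):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* hypothetical stand-in: one queued "change" uevent with its
 * environment as "KEY=value" strings, NULL-terminated */
struct uevent {
    const char *envp[8];
};

/* look up a KEY in one event's environment */
static const char *ev_getenv(const struct uevent *ev, const char *key)
{
    size_t klen = strlen(key);

    for (int i = 0; ev->envp[i]; i++)
        if (strncmp(ev->envp[i], key, klen) == 0 && ev->envp[i][klen] == '=')
            return ev->envp[i] + klen + 1;
    return NULL;
}

/*
 * Fold the per-event variables of a run of change events into one
 * view, so the path-update function only has to run once.  Only
 * DISK_RO (read-only status) is tracked here as an example; the last
 * event that carries it wins.
 * Returns 0/1, or -1 if no event in the run carried DISK_RO.
 */
static int merged_ro_state(const struct uevent *evs, int n)
{
    int ro = -1;

    for (int i = 0; i < n; i++) {
        const char *v = ev_getenv(&evs[i], "DISK_RO");
        if (v)
            ro = atoi(v);
    }
    return ro;
}
```

A dispatcher that looked only at the final event's environment would return -1 here whenever the ro change arrived in the middle of the run, which is exactly the information loss Ben describes.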
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-28 18:46 ` Benjamin Marzinski
@ 2016-11-29 6:47 ` Hannes Reinecke
2016-11-29 8:02 ` Martin Wilck
2016-11-29 7:57 ` Martin Wilck
1 sibling, 1 reply; 47+ messages in thread
From: Hannes Reinecke @ 2016-11-29 6:47 UTC (permalink / raw)
To: Benjamin Marzinski, Martin Wilck; +Cc: dm-devel, tang.junhui

On 11/28/2016 07:46 PM, Benjamin Marzinski wrote:
> On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
>> Hm, I thought the threads are good because they avoid having one
>> unresponsive device stall everything?
>
> There is work underway to make dm events pollable, so that you can
> wait for any number of them with one thread. At the moment, once we
> get an event, we lock the vecs lock, which pretty much keeps
> everything else from running, so this doesn't really change that.
[...]

Which again leads me to the question: why are we waiting for dm events
at all? The code handling them is pretty arcane, and from what I've
seen there is nothing in there about which we wouldn't be informed via
other mechanisms (path checker, uevents). So why do we still bother
with them?

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-29 6:47 ` Hannes Reinecke
@ 2016-11-29 8:02 ` Martin Wilck
2016-11-29 8:10 ` Zdenek Kabelac
2016-11-29 17:25 ` Benjamin Marzinski
0 siblings, 2 replies; 47+ messages in thread
From: Martin Wilck @ 2016-11-29 8:02 UTC (permalink / raw)
To: Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel, tang.junhui

On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote:
> Which again leads me to the question: why are we waiting for dm
> events at all? The code handling them is pretty arcane, and from
> what I've seen there is nothing in there about which we wouldn't be
> informed via other mechanisms (path checker, uevents). So why do we
> still bother with them?

I was asking myself the same question. From my inspection of the
kernel code, there are two code paths that trigger a dm event but no
uevent (bypass_pg() and switch_pg_num(), both related to path group
switching). If these are covered by the path checker, I see no point
in waiting for dm events. But of course, I may be missing something.

Regards,
Martin

--
Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
* Re: Improve processing efficiency for addition and deletion of multipath devices
2016-11-29 8:02 ` Martin Wilck
@ 2016-11-29 8:10 ` Zdenek Kabelac
2016-11-29 8:16 ` Martin Wilck
0 siblings, 1 reply; 47+ messages in thread
From: Zdenek Kabelac @ 2016-11-29 8:10 UTC (permalink / raw)
To: Martin Wilck, Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel, tang.junhui

Dne 29.11.2016 v 09:02 Martin Wilck napsal(a):
> On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote:
>> Which again leads me to the question: why are we waiting for dm
>> events at all? The code handling them is pretty arcane, and from
>> what I've seen there is nothing in there about which we wouldn't be
>> informed via other mechanisms (path checker, uevents). So why do we
>> still bother with them?
>
> I was asking myself the same question. From my inspection of the
> kernel code, there are two code paths that trigger a dm event but no
> uevent (bypass_pg() and switch_pg_num(), both related to path group
> switching). If these are covered by the path checker, I see no point
> in waiting for dm events. But of course, I may be missing something.

Processing of 'dm' events should likely be delegated to 'dmeventd',
which is a daemon that solves exactly this problem of waiting for an
event. A plugin just takes the action. IMHO there is nothing easier
you can have. It's then up to dmeventd to maintain the best
'connection' with the kernel and its events.

But still, I'd really like to see the focus put on the biggest
bottleneck first, i.e. 'blkid' being executed on appearing and
disappearing component devices.

Regards

Zdenek
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-29 8:10 ` Zdenek Kabelac @ 2016-11-29 8:16 ` Martin Wilck 2016-11-29 8:24 ` Zdenek Kabelac 0 siblings, 1 reply; 47+ messages in thread From: Martin Wilck @ 2016-11-29 8:16 UTC (permalink / raw) To: Zdenek Kabelac, Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel, tang.junhui On Tue, 2016-11-29 at 09:10 +0100, Zdenek Kabelac wrote: > Dne 29.11.2016 v 09:02 Martin Wilck napsal(a): > > On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote: > > > On 11/28/2016 07:46 PM, Benjamin Marzinski wrote: > > > > On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote: > > > > > On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote: > > > > > > > > > > > At any rate, I'd rather get rid of the gazillion waiter > > > > > > threads > > > > > > first. > > > > > > > > > > Hm, I thought the threads are good because this avoids one > > > > > unresponsive > > > > > device to stall everything? > > > > > > > > There is work making dm events pollable, so that you can wait > > > > for > > > > any > > > > number of them with one thread. At the moment, once we get an > > > > event, we > > > > lock the vecs lock, which pretty much keeps everything else > > > > from > > > > running, so this doesn't really change that. > > > > > > > > > > Which again leads me to the question: > > > Why are we waiting for dm events? > > > The code handling them is pretty arcane, and from what I've seen > > > there > > > is nothing in there which we wouldn't be informed via other > > > mechanisms > > > (path checker, uevents). > > > So why do we still bother with them? > > > > I was asking myself the same question. From my inspection of the > > kernel > > code, there are two code paths that trigger a dm event but no > > uevent > > (bypass_pg() and switch_pg_num(), both related to path group > > switching). If these are covered by the path checker, I see no > > point in > > waiting for DM events. 
But of course, I may be missing something. > > > > Processing of 'dm' events likely should be postponed to 'dmeventd' - > which is a daemon resolving the problem here with waiting for an > event. > > Plugin just takes the action. > > IMHO there is nothing easier you can have. > It's then upto dmeventd to maintain the best 'connection' with kernel > and events. But that would simply move the "gazillion waiter threads" from multipathd to dmeventd, right? And it would introduce another boot sequence dependency for multipathd, I'm not sure if that's desirable. Regards Martin -- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-29 8:16 ` Martin Wilck @ 2016-11-29 8:24 ` Zdenek Kabelac 0 siblings, 0 replies; 47+ messages in thread From: Zdenek Kabelac @ 2016-11-29 8:24 UTC (permalink / raw) To: Martin Wilck, Hannes Reinecke, Benjamin Marzinski; +Cc: dm-devel, tang.junhui Dne 29.11.2016 v 09:16 Martin Wilck napsal(a): > On Tue, 2016-11-29 at 09:10 +0100, Zdenek Kabelac wrote: >> Dne 29.11.2016 v 09:02 Martin Wilck napsal(a): >>> On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote: >>>> On 11/28/2016 07:46 PM, Benjamin Marzinski wrote: >>>>> On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote: >>>>>> On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote: >>>>>> >>>>>>> At any rate, I'd rather get rid of the gazillion waiter >>>>>>> threads >>>>>>> first. >>>>>> >>>>>> Hm, I thought the threads are good because this avoids one >>>>>> unresponsive >>>>>> device to stall everything? >>>>> >>>>> There is work making dm events pollable, so that you can wait >>>>> for >>>>> any >>>>> number of them with one thread. At the moment, once we get an >>>>> event, we >>>>> lock the vecs lock, which pretty much keeps everything else >>>>> from >>>>> running, so this doesn't really change that. >>>>> >>>> >>>> Which again leads me to the question: >>>> Why are we waiting for dm events? >>>> The code handling them is pretty arcane, and from what I've seen >>>> there >>>> is nothing in there which we wouldn't be informed via other >>>> mechanisms >>>> (path checker, uevents). >>>> So why do we still bother with them? >>> >>> I was asking myself the same question. From my inspection of the >>> kernel >>> code, there are two code paths that trigger a dm event but no >>> uevent >>> (bypass_pg() and switch_pg_num(), both related to path group >>> switching). If these are covered by the path checker, I see no >>> point in >>> waiting for DM events. But of course, I may be missing something. 
>>>
>>
>> Processing of 'dm' events likely should be postponed to 'dmeventd' -
>> which is a daemon resolving the problem here with waiting for an
>> event.
>>
>> Plugin just takes the action.
>>
>> IMHO there is nothing easier you can have.
>>
>> It's then upto dmeventd to maintain the best 'connection' with kernel
>> and events.
>
> But that would simply move the "gazillion waiter threads" from
> multipathd to dmeventd, right? And it would introduce another boot
> sequence dependency for multipathd, I'm not sure if that's desirable.

Well - frankly, how many multipath devices have you seen on a single machine? Have you noticed that every single "XFS" mounted device spreads around 9 threads these days?

But I'm not going to defend the current 'thread' explosion with device monitoring - this thing is BEING resolved.

The main point here is that multipathd does NOT need to solve this issue at all and can let dmeventd resolve it - and once the kernel has a better mechanism with reliable event passing (which is NOT udev, btw), it will use it. The solution is not that everyone will write everything here...

Regards

Zdenek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-29 8:02 ` Martin Wilck 2016-11-29 8:10 ` Zdenek Kabelac @ 2016-11-29 17:25 ` Benjamin Marzinski 1 sibling, 0 replies; 47+ messages in thread
From: Benjamin Marzinski @ 2016-11-29 17:25 UTC (permalink / raw)
To: Martin Wilck; +Cc: dm-devel, tang.junhui

On Tue, Nov 29, 2016 at 09:02:08AM +0100, Martin Wilck wrote:
> On Tue, 2016-11-29 at 07:47 +0100, Hannes Reinecke wrote:
> > On 11/28/2016 07:46 PM, Benjamin Marzinski wrote:
> > > On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote:
> > > > On Fri, 2016-11-18 at 16:26 -0600, Benjamin Marzinski wrote:
> > > >
> > > > > At any rate, I'd rather get rid of the gazillion waiter threads
> > > > > first.
> > > >
> > > > Hm, I thought the threads are good because this avoids one
> > > > unresponsive device to stall everything?
> > >
> > > There is work making dm events pollable, so that you can wait for
> > > any number of them with one thread. At the moment, once we get an
> > > event, we lock the vecs lock, which pretty much keeps everything
> > > else from running, so this doesn't really change that.
> >
> > Which again leads me to the question:
> > Why are we waiting for dm events?
> > The code handling them is pretty arcane, and from what I've seen
> > there is nothing in there which we wouldn't be informed via other
> > mechanisms (path checker, uevents).
> > So why do we still bother with them?

But multipath would still need to respond to the uevents just like it currently responds to the dm events (by calling update_multipath), so aside from the code setting up waiting on the dm events (which isn't that bad, except that we currently have to set it up and tear it down for every multipath device) things would still be the same. The locking requirements of responding to an event would definitely stay the same, regardless of whether we were responding to a dm_event or a uevent.
> I was asking myself the same question. From my inspection of the kernel
> code, there are two code paths that trigger a dm event but no uevent
> (bypass_pg() and switch_pg_num(), both related to path group
> switching). If these are covered by the path checker, I see no point in
> waiting for DM events. But of course, I may be missing something.

One answer is that uevents are only "best effort". They are not guaranteed to be delivered. And the time when they are least likely to be delivered is when the system is under a lot of memory pressure, which is exactly what can happen if you lose all paths to your devices and are backing up IO. However, from my discussion with Hannes about this, neither of us came up with any situation where missing a uevent is disastrous to multipathd.

uevents don't bring paths back. The checker loop does. Uevents will tell multipathd when a path fails, but the kernel is the one in charge of routing IO. If multipathd missed an event for the last failing path, it would not correctly enter recovery mode, but the checker loop should take care of that when it finds that the path is down.

The way the checker code is right now, if the kernel notices a path failure and marks a path down between checkerloop runs, and multipathd misses the event notifying it of this, and the path comes back before the next checker pass, multipath wouldn't restore the path. We can obviously solve this by just adding the syncing code from update_multipath to the check_path code after we call update_multipath_strings.

So, multipathd will need to add some code to protect it from missing uevents, and losing an event will cause multipathd to do some things slower than otherwise, but switching to a design that just uses uevents is workable.

On the other hand, when you think about it, uevents have a lot of processing associated with them. This entire thread is about how to cut that down.
dm events only need to be processed by the program waiting for them, and programs only get the events that they care about. If we can make dm event processing less cumbersome (and that is currently being worked on, by making them pollable), I'm not even sure that we need to send uevents for paths failing and being restored (except for possibly when your last path goes down, or your first path goes up). The whole point of multipath is to manage these paths so that nobody else needs to care about this stuff. The one big benefit to removing the dm event waiting code, is that we can remove it completely, which we couldn't do with the uevent code. Even if we cut down on the number of uevents that multipath generates and has to respond to, multipathd will still need to respond to uevents, because many of the important ones come from outside of device-mapper. -Ben > > Regards, > Martin > > > -- > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 47+ messages in thread
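[The single-threaded waiting described above — one thread monitoring any number of pollable dm-event descriptors instead of one waiter thread per map — can be sketched with plain poll(2). This is only an illustration: the pollable dm-event fds did not exist yet at the time of this thread, so ordinary pipe fds stand in for per-device event sources, and wait_any_event() is a hypothetical name, not multipathd code.]

```c
#include <poll.h>
#include <unistd.h>

/* Sketch: wait on N event file descriptors with a single thread,
 * instead of spawning one waiter thread per multipath device.
 * Returns the index of the first fd with a pending event, or -1 on
 * timeout or error.  A real dispatcher would loop over all ready fds
 * and service each device's events in turn. */
int wait_any_event(const int *fds, int nfds, int timeout_ms)
{
	struct pollfd pfds[64];
	int i;

	if (nfds > 64)
		return -1;
	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}
	if (poll(pfds, nfds, timeout_ms) <= 0)
		return -1;
	for (i = 0; i < nfds; i++)
		if (pfds[i].revents & POLLIN)
			return i;
	return -1;
}
```

[With this shape, an unresponsive device no longer needs its own thread to avoid stalling the others: poll() returns as soon as any source becomes ready.]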
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 18:46 ` Benjamin Marzinski 2016-11-29 6:47 ` Hannes Reinecke @ 2016-11-29 7:57 ` Martin Wilck 2016-11-29 17:41 ` Benjamin Marzinski 1 sibling, 1 reply; 47+ messages in thread From: Martin Wilck @ 2016-11-29 7:57 UTC (permalink / raw) To: Benjamin Marzinski; +Cc: dm-devel, tang.junhui On Mon, 2016-11-28 at 12:46 -0600, Benjamin Marzinski wrote: > On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote: > > > > from the udev db. Therefore IMO we just have to look at the last > > received event for a given path: > > > > - DELETE => discard the device > > - ADD => use the udev properties coming with the event > > - CHANGE => query udev db for current set of properties > > The issue is that when we get a change event, there may be udev > environment variables that tell us what that specific event is for. > For > instance, a change in the Read-only status of the path device. Future > change events will not have those environment variables set. And the udev db will not have records of the environment variables of previous change events? IOW, in your example, we can't derive the read- only status of a device by looking at the current set of udev properties of the device, only by tracking the full uevent history? Regards Martin -- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-29 7:57 ` Martin Wilck @ 2016-11-29 17:41 ` Benjamin Marzinski 0 siblings, 0 replies; 47+ messages in thread From: Benjamin Marzinski @ 2016-11-29 17:41 UTC (permalink / raw) To: Martin Wilck; +Cc: dm-devel, tang.junhui On Tue, Nov 29, 2016 at 08:57:18AM +0100, Martin Wilck wrote: > On Mon, 2016-11-28 at 12:46 -0600, Benjamin Marzinski wrote: > > On Thu, Nov 24, 2016 at 10:21:10AM +0100, Martin Wilck wrote: > > > > > > from the udev db. Therefore IMO we just have to look at the last > > > received event for a given path: > > > > > > - DELETE => discard the device > > > - ADD => use the udev properties coming with the event > > > - CHANGE => query udev db for current set of properties > > > > The issue is that when we get a change event, there may be udev > > environment variables that tell us what that specific event is for. > > For > > instance, a change in the Read-only status of the path device. Future > > change events will not have those environment variables set. > > And the udev db will not have records of the environment variables of > previous change events? IOW, in your example, we can't derive the read- > only status of a device by looking at the current set of udev > properties of the device, only by tracking the full uevent history? Correct, when a device's read only status is changed, the kernel will send a change uevent with the udev environment variable "DISK_RO=<n>" where <n> is 1 if the device changed to read-only, and 0 if it changed to read-write. This environment variable is only set for this change event. Now, it's totally possible that in every call to uev_update_path, multipathd could check a device's read-only status in sysfs and take action if it had changed. 
Otherwise, it needs to look through all the change events in the batch it is processing to see if any of them have the DISK_RO environment variable set, and it would need to pass this information to uev_update_path. I personally favor this second approach. I can easily imagine multipath wanting to deal with certain change events in a specific way, especially if we go the route of ignoring dm events completely and relying only on uevents for updating multipath. In these cases, it may not always be possible to look in sysfs to find out what caused a previous change event, and we would have to read the uevent environment variables for all the events we are processing.

-Ben

> Regards
> Martin
>
> --
> Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107
> SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton
> HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 47+ messages in thread
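[The second approach above — scanning the batched change events for the DISK_RO variable rather than re-reading sysfs — amounts to a small scan over a uevent's "KEY=value" environment strings. The helper name and the envp representation below are assumptions for illustration, not actual multipathd code; the DISK_RO=0/1 semantics are as described in the mail above.]

```c
#include <string.h>
#include <stdlib.h>

/* Sketch: given a NULL-terminated list of "KEY=value" strings from a
 * change uevent, report the read-only transition it carries.
 * Returns 1 (device went read-only), 0 (went read-write), or -1 if the
 * event carries no DISK_RO variable at all (the common case, since the
 * kernel only sets it on the one change event for the RO flip). */
int uevent_disk_ro(char * const *envp)
{
	static const char key[] = "DISK_RO=";

	for (; *envp; envp++)
		if (strncmp(*envp, key, sizeof(key) - 1) == 0)
			return atoi(*envp + sizeof(key) - 1);
	return -1;
}
```

[A batch handler would run this over every merged change event and hand a nonnegative result to the path-update code, so the information is not lost when several events are folded into one.]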
* Re: Improve processing efficiency for addition and deletion of multipath devices @ 2016-11-28 2:19 tang.junhui 2016-11-28 10:05 ` Hannes Reinecke ` (3 more replies) 0 siblings, 4 replies; 47+ messages in thread
From: tang.junhui @ 2016-11-28 2:19 UTC (permalink / raw)
To: Christophe Varoqui, bmarzins, Hannes Reinecke, Martin Wilck, Bart Van Assche
Cc: zhang.kai16, dm-devel, tang.junhui

[-- Attachment #1.1: Type: text/plain, Size: 9191 bytes --]

Hello Christophe, Ben, Hannes, Martin, Bart,

I am a member of the host-side software development team of the ZXUSP storage project at ZTE Corporation. To meet market demand, our team has decided to write code next month to improve multipath efficiency. The whole idea is in the mail below. We hope to participate in and make progress with the open source community, so any suggestions and comments are welcome.

Thanks,
Tang

------------------------------------------------------------------------------------------------------------------------------

1. Problem
In these scenarios, multipath processing efficiency is low:
1) Many paths exist in each multipath device,
2) Devices are added or deleted during iSCSI login/logout or FC link up/down.

2. Reasons
Multipath processes uevents one by one, and each one also produces a new dm addition/change/deletion uevent, which increases system resource consumption; most of these uevents actually serve no purpose at all. E.g. the login procedure of 4 iSCSI sessions with 3 LUNs:

1) Multipath processes these uevents one by one:
UDEV [89068.806214] add /devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block)
UDEV [89068.909457] add /devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block)
UDEV [89068.944956] add /devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block)
UDEV [89068.959215] add /devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block)
UDEV [89068.978558] add /devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block)
UDEV [89069.004217] add /devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block)
UDEV [89069.486361] add /devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block)
UDEV [89069.495194] add /devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block)
UDEV [89069.511628] add /devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block)
UDEV [89069.716292] add /devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block)
UDEV [89069.748456] add /devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block)
UDEV [89069.789662] add /devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block)

2) Multipath also produces DM uevents in step 1), which are processed by udev and by other processes listening to the kernel:
KERNEL[89068.899614] add /devices/virtual/block/dm-2 (block)
KERNEL[89068.902477] change /devices/virtual/block/dm-2 (block)
KERNEL[89068.955364] add /devices/virtual/block/dm-3 (block)
KERNEL[89068.960663] change /devices/virtual/block/dm-3 (block)
KERNEL[89069.018903] add /devices/virtual/block/dm-4 (block)
KERNEL[89069.042102] change /devices/virtual/block/dm-4 (block)
KERNEL[89069.297252] change /devices/virtual/block/dm-2 (block)
KERNEL[89069.346718] change /devices/virtual/block/dm-4 (block)
KERNEL[89069.388361] change /devices/virtual/block/dm-3 (block)
KERNEL[89069.548270] change /devices/virtual/block/dm-4 (block)
KERNEL[89069.607306] change /devices/virtual/block/dm-2 (block)
KERNEL[89070.118067] change /devices/virtual/block/dm-3 (block)
KERNEL[89070.136256] change /devices/virtual/block/dm-2 (block)
KERNEL[89070.157222] change /devices/virtual/block/dm-4 (block)
KERNEL[89070.216269] change /devices/virtual/block/dm-3 (block)

3) After processing in step 2), udev also forwards these uevents to multipath:
UDEV [89068.926428] add /devices/virtual/block/dm-2 (block)
UDEV [89069.007511] add /devices/virtual/block/dm-3 (block)
UDEV [89069.098054] add /devices/virtual/block/dm-4 (block)
UDEV [89069.291184] change /devices/virtual/block/dm-2 (block)
UDEV [89069.320632] change /devices/virtual/block/dm-4 (block)
UDEV [89069.381434] change /devices/virtual/block/dm-3 (block)
UDEV [89069.637666] change /devices/virtual/block/dm-2 (block)
UDEV [89069.682303] change /devices/virtual/block/dm-4 (block)
UDEV [89069.860877] change /devices/virtual/block/dm-2 (block)
UDEV [89069.904735] change /devices/virtual/block/dm-4 (block)
UDEV [89070.327167] change /devices/virtual/block/dm-2 (block)
UDEV [89070.371114] change /devices/virtual/block/dm-4 (block)
UDEV [89070.434592] change /devices/virtual/block/dm-3 (block)
UDEV [89070.572072] change /devices/virtual/block/dm-3 (block)
UDEV [89070.703181] change /devices/virtual/block/dm-3 (block)

4) Multipath processes the uevents above.

The efficiency of processing uevents one by one is low, and it produces too many uevents, which further reduces the processing efficiency. The problem is similar in the logout procedure of iSCSI sessions.

3. Negative effect
Multipath processes so slowly that some applications are not satisfied. For example, Openstack often times out waiting for the creation of multipath devices.

4. Proposal
Instead of processing uevents one by one, uevents coming from the same LUN can be merged into one, so the uevent processing thread only needs to process it once, and it only produces one DM addition uevent, which reduces system resource consumption. The example in Chapter 2 is reused to explain the proposal:

1) Multipath receives block device addition uevents from udev:
UDEV [89068.806214] add /devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block)
UDEV [89068.909457] add /devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block)
UDEV [89068.944956] add /devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block)
UDEV [89068.959215] add /devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block)
UDEV [89068.978558] add /devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block)
UDEV [89069.004217] add /devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block)
UDEV [89069.486361] add /devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block)
UDEV [89069.495194] add /devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block)
UDEV [89069.511628] add /devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block)
UDEV [89069.716292] add /devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block)
UDEV [89069.748456] add /devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block)
UDEV [89069.789662] add /devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block)

2) Multipath merges these 12 uevents into 3 internal uevents:
UEVENT add sdc sdh sdd sdl
UEVENT add sde sdj sdf sdm
UEVENT add sdg sdk sdi sdn

3) Multipath processes these 3 uevents one by one, and only produces 3 DM addition uevents; no dm change uevents occur.
KERNEL[89068.899614] add /devices/virtual/block/dm-2 (block)
KERNEL[89068.955364] add /devices/virtual/block/dm-3 (block)
KERNEL[89069.018903] add /devices/virtual/block/dm-4 (block)

4) Udev processes the uevents above and forwards them to multipath:
UDEV [89068.926428] add /devices/virtual/block/dm-2 (block)
UDEV [89069.007511] add /devices/virtual/block/dm-3 (block)
UDEV [89069.098054] add /devices/virtual/block/dm-4 (block)

5) Multipath processes these uevents and finishes the creation of the multipath devices.

5. Coding
After taking over uevents from the uevent listening thread, the uevent processing thread can merge these uevents before processing them:

int uevent_dispatch(int (*uev_trigger)(struct uevent *, void * trigger_data),
                    void * trigger_data)
{
        ...
        while (1) {
                ...
                list_splice_init(&uevq, &uevq_tmp);
                ...
                list_merger_uevents(&uevq_tmp);
                service_uevq(&uevq_tmp);
        }
        ...
}

In "struct uevent", an additional member "char wwid[WWID_SIZE]" is added to record each device's WWID for addition or change uevents, to identify whether uevents come from the same LUN, and an additional member "struct list_head merger_node" is added to hold the list of uevents that have been merged into this uevent:

struct uevent {
        struct list_head node;
        struct list_head merger_node;
        char wwid[WWID_SIZE];
        struct udev_device *udev;
        ...
};

In list_merger_uevents(&uevq_tmp), each node is traversed from the latest to the oldest, and any older node with the same WWID and uevent type (e.g. add) is moved to the merger_node list of the later node. If a deletion uevent node occurs, other older uevent nodes for this device are filtered out (thanks to Martin's idea).

After the above processing, attention must be paid to the fact that the parameter "struct uevent *uev" is no longer a single uevent in and after uev_trigger(); the code needs to be modified to process batched uevents in uev_add_path() and so on.
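[A minimal, self-contained sketch of the proposed list_merger_uevents() pass: traverse newest to oldest, fold older same-WWID/same-type events into the newer one, and let a deletion cancel older events. For brevity this uses plain arrays instead of the kernel-style list_head, and keys both merging and deletion-filtering on the WWID (in multipathd the deletion filter would match the individual path device); all names here are illustrative, not the actual patch.]

```c
#include <string.h>

#define WWID_SIZE 128

enum uev_action { UEV_ADD, UEV_DEL };

struct muevent {
	enum uev_action action;
	char wwid[WWID_SIZE];
	int merged;   /* folded into a newer event of the same type */
	int dropped;  /* cancelled by a newer deletion for the device */
};

/* Compact a queue of events held in arrival order (oldest first).
 * Returns the number of events that still need individual processing. */
int merge_uevents(struct muevent *ev, int n)
{
	int i, j, live = 0;

	for (i = n - 1; i >= 0; i--) {          /* newest to oldest */
		if (ev[i].merged || ev[i].dropped)
			continue;
		for (j = i - 1; j >= 0; j--) {   /* scan all older events */
			if (strcmp(ev[i].wwid, ev[j].wwid))
				continue;
			if (ev[i].action == UEV_DEL)
				ev[j].dropped = 1;  /* deletion cancels older work */
			else if (ev[i].action == ev[j].action)
				ev[j].merged = 1;   /* batch into the newer event */
		}
	}
	for (i = 0; i < n; i++)
		if (!ev[i].merged && !ev[i].dropped)
			live++;
	return live;
}
```

[On the 12-path login example above, the 12 add events collapse to 3 surviving events, one per LUN, exactly the "UEVENT add sdc sdh sdd sdl" grouping shown in the proposal.]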
[-- Attachment #1.2: Type: text/html, Size: 17802 bytes --] [-- Attachment #2: Type: text/plain, Size: 0 bytes --] ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 2:19 tang.junhui @ 2016-11-28 10:05 ` Hannes Reinecke 2016-11-28 16:07 ` Benjamin Marzinski 2016-11-28 10:06 ` Zdenek Kabelac ` (2 subsequent siblings) 3 siblings, 1 reply; 47+ messages in thread From: Hannes Reinecke @ 2016-11-28 10:05 UTC (permalink / raw) To: tang.junhui, Christophe Varoqui, bmarzins, Martin Wilck, Bart Van Assche Cc: zhang.kai16, dm-devel Hi Junhui, On 11/28/2016 03:19 AM, tang.junhui@zte.com.cn wrote: > Hello Christophe, Ben, Hannes, Martin, Bart, > I am a member of host-side software development team of ZXUSP storage > project in ZTE Corporation. Facing the market demand, our team decides to write > code to promote multipath efficiency next month. The whole idea is in the mail > below. We hope to participate in and make progress with the open source community, > so any suggestion and comment would be welcome. > Thank you for looking into it. This is one area which definitely needs improvement, so your contribution would be most welcome. > > ------------------------------------------------------------------------------------------------------------------------------ > > ------------------------------------------------------------------------------------------------------------------------------ > > 1. Problem > In these scenarios, multipath processing efficiency is low: > 1) Many paths exist in each multipath device, > 2) Devices addition or deletion during iSCSI login/logout or FC link > up/down. > > 2. Reasons > Multipath process uevents one by one, and each one also produce a new dm > addition change or deletionuevent to increased system resource consumption, > actually most of these uevents have no sense at all. > [ .. ] > In list_merger_uevents(&uevq_tmp), each node is traversed from the > latest to the oldest, > and the older node with the same WWID and uevent type(e.g. add) would be > moved to > the merger_node list of the later node. 
If a deletion uevent node
> occurred, other older uevent nodes about this device would be
> filtered (thanks to Martin's idea).
>
> After the above processing, attention must be paid to the fact that
> the parameter "struct uevent *uev" is not a single uevent any more in
> and after uev_trigger(); the code needs to be modified to process
> batched uevents in uev_add_path() and so on.
>
To handle this properly you need to modify uev_add_path(); basically you need to treat 'uev_add_path()' and 'ev_add_path()' as two _distinct_ events, where the former processes uevents and adds the path to an internal list (->pathvec would be ideally suited here). 'ev_add_path()' would then need to be called once all uevents are processed, and would process all elements in ->pathvec, creating the resulting multipath map in one go.

Actually not a bad idea.

The tricky bit would be to figure out _when_ to stop uevent processing and call 'ev_add_path'. I guess we'd need to make 'ev_add_path' run in a separate thread; that way we'd take pressure off the uevent processing thread, and the whole thing should run far smoother.

And looking at it closer, when you move 'ev_add_path' into a separate thread and separate out 'uev_add_path', you _already_ have your event merging implemented. With much less effort than trying to merge things at the uevent level.

I'd be willing to look into it; please keep me informed about any progress here.

Cheers,

Hannes
--
Dr. Hannes Reinecke Teamlead Storage & Networking
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 10:05 ` Hannes Reinecke @ 2016-11-28 16:07 ` Benjamin Marzinski 2016-11-28 16:26 ` Zdenek Kabelac 0 siblings, 1 reply; 47+ messages in thread From: Benjamin Marzinski @ 2016-11-28 16:07 UTC (permalink / raw) To: Hannes Reinecke Cc: tang.junhui, zhang.kai16, dm-devel, Bart Van Assche, Martin Wilck On Mon, Nov 28, 2016 at 11:05:29AM +0100, Hannes Reinecke wrote: > Hi Junhui, > > On 11/28/2016 03:19 AM, tang.junhui@zte.com.cn wrote: > > Hello Christophe, Ben, Hannes, Martin, Bart, > > I am a member of host-side software development team of ZXUSP storage > > project in ZTE Corporation. Facing the market demand, our team decides to write > > code to promote multipath efficiency next month. The whole idea is in the mail > > below. We hope to participate in and make progress with the open source community, > > so any suggestion and comment would be welcome. > > > Thank you for looking into it. > This is one area which definitely needs improvement, so your > contribution would be most welcome. > > > > > ------------------------------------------------------------------------------------------------------------------------------ > > > > ------------------------------------------------------------------------------------------------------------------------------ > > > > 1. Problem > > In these scenarios, multipath processing efficiency is low: > > 1) Many paths exist in each multipath device, > > 2) Devices addition or deletion during iSCSI login/logout or FC link > > up/down. > > > > 2. Reasons > > Multipath process uevents one by one, and each one also produce a new dm > > addition change or deletionuevent to increased system resource consumption, > > actually most of these uevents have no sense at all. > > > [ .. ] > > In list_merger_uevents(&uevq_tmp), each node is traversed from the > > latest to the oldest, > > and the older node with the same WWID and uevent type(e.g. 
add) would be > > moved to > > the merger_node list of the later node. If a deletion uevent node > > occurred, other older > > uevent nodes about this device would be filtered(Thanks to Martin’s idea). > > > > After above processing, attention must be paid to that the parameter > > “struct uevent * uev” is not a single uevent any more in and after > > uev_trigger(), code > > need to be modified to process batch uevents in uev_add_path() and so on. > > > To handle this properly you need to modify uev_add_path(); basically you > need to treat 'uev_add_path()' and 'ev_add_path()' as two _distinct_ > events, where the former processes uevents and add the path to an > internal list (->pathvec would be ideally suited here). > 'ev_add_path()' then would need to be called once all uevents are > processed, and would be processing all elements in ->pathvec, creating > the resulting multipath map in one go. > > Actually not a bad idea. > > Tricky bit would be to figure out _when_ to stop uevent processing and > calling 'ev_add_path'. > I guess we'd need to make 'ev_add_path' running in a separate thread; > that way we'd be putting pressure off the uevent processing thread and > the whole thing should be running far smoother. I still think that we don't need to force a wait here. uevent_dispatch() already pulls off all the queued events in a batch. We could just merge over the batch that we are given. This has the nice property that when uevents aren't coming in a storm, we quickly process the next event, and when they are, the further multipathd gets behind, the larger the list it will merge over. > And looking at it closer, when you move 'ev_add_path' into a separate > thread, and separating out 'uev_add_path' you _already_ have your event > merging implemented. Like I mentioned before, I'm not sure what a seperate thread will buy us here, since we can't create multipath devices while paths might be getting deleted. 
I do think your earlier point that we don't need any locking until the paths are added to the pathvec is important. Instead of going a path at a time, getting the path's information and then adding it to the pathvec, we could get the information on all the outstanding paths and then add them all at once, and then call the map creation code (basically the work that ev_add_path does) on all the paths. > With much less effort than trying to merge things at the uevent level. Yes. I'd much rather deal with the uevents as they actually are, and have uev_add_path (possibly now called uev_add_paths) do the work. > I'd be willing to look into it; please keep me informed about any > progress here. > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Teamlead Storage & Networking > hare@suse.de +49 911 74053 688 > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton > HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
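The batch-merging idea discussed above can be sketched roughly as follows. This is a hypothetical simplification, not the actual multipath-tools uevent code (the struct and function names here are invented for illustration): walk the batch from newest to oldest, fold older events with the same WWID and action into the newest one, and let a "remove" also cancel earlier events for the same device.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical simplified uevent record; the real struct uevent in
 * multipath-tools carries much more state. */
struct uev {
    char wwid[64];
    char action[8];   /* "add", "remove", ... */
    bool merged;      /* folded into a newer event */
};

/* Merge a batch in place: for each event (newest first), fold older
 * events with the same WWID and the same action into it; a "remove"
 * additionally cancels any older event for that WWID.  Returns the
 * number of events left to process. */
static int merge_uevents(struct uev *ev, int n)
{
    int live = n;

    for (int i = n - 1; i >= 0; i--) {          /* newest first */
        if (ev[i].merged)
            continue;
        for (int j = i - 1; j >= 0; j--) {      /* older events */
            if (ev[j].merged || strcmp(ev[j].wwid, ev[i].wwid))
                continue;
            if (!strcmp(ev[i].action, "remove") ||
                !strcmp(ev[j].action, ev[i].action)) {
                ev[j].merged = true;            /* handled with ev[i] */
                live--;
            }
        }
    }
    return live;
}
```

With a batch of add(A), add(B), add(A), remove(B), only the newest add for A and the remove for B survive, which matches the "deletion filters older uevents" behavior described in the proposal.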
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 16:07 ` Benjamin Marzinski @ 2016-11-28 16:26 ` Zdenek Kabelac 0 siblings, 0 replies; 47+ messages in thread From: Zdenek Kabelac @ 2016-11-28 16:26 UTC (permalink / raw) To: Benjamin Marzinski, Hannes Reinecke Cc: zhang.kai16, dm-devel, Martin Wilck, tang.junhui, Bart Van Assche Dne 28.11.2016 v 17:07 Benjamin Marzinski napsal(a): > On Mon, Nov 28, 2016 at 11:05:29AM +0100, Hannes Reinecke wrote: >> Hi Junhui, >> >> Tricky bit would be to figure out _when_ to stop uevent processing and >> calling 'ev_add_path'. >> I guess we'd need to make 'ev_add_path' running in a separate thread; >> that way we'd be putting pressure off the uevent processing thread and >> the whole thing should be running far smoother. > > I still think that we don't need to force a wait here. > uevent_dispatch() already pulls off all the queued events in a batch. We > could just merge over the batch that we are given. This has the nice > property that when uevents aren't coming in a storm, we quickly process > the next event, and when they are, the further multipathd gets behind, the > larger the list it will merge over. > It's worth mentioning at this point - libdevmapper 1.02 is NOT a library with multithread support - there is a 'certain' limited set of thread-safe functions, but for sure not the whole library. So any threaded usage of libdevmapper.so would likely require a mutex guarding access to the library code. Regards Zdenek ^ permalink raw reply [flat|nested] 47+ messages in thread
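Serializing libdevmapper usage behind one lock, as Zdenek's caveat implies, is a standard pattern. A minimal sketch of that pattern follows; note that `dm_reload_map_unlocked` here is a stand-in for a non-thread-safe library call, not a real libdevmapper function:

```c
#include <assert.h>
#include <pthread.h>

/* One global lock serializing all libdevmapper usage; the library
 * itself must never be entered concurrently. */
static pthread_mutex_t dm_lock = PTHREAD_MUTEX_INITIALIZER;
static int maps_reloaded;   /* stands in for library-internal state */

/* Stand-in for a non-thread-safe library operation. */
static void dm_reload_map_unlocked(void)
{
    maps_reloaded++;        /* unsynchronized on its own */
}

/* Wrapper every thread must use: the library code only ever runs
 * while dm_lock is held. */
static void dm_reload_map(void)
{
    pthread_mutex_lock(&dm_lock);
    dm_reload_map_unlocked();
    pthread_mutex_unlock(&dm_lock);
}

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 10000; i++)
        dm_reload_map();
    return NULL;
}
```

The cost, of course, is that a separate ev_add_path thread guarded this way cannot overlap its libdevmapper work with the uevent thread's, which limits how much the extra thread actually buys.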
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 2:19 tang.junhui 2016-11-28 10:05 ` Hannes Reinecke @ 2016-11-28 10:06 ` Zdenek Kabelac 2016-11-28 10:42 ` Hannes Reinecke 2016-11-28 10:28 ` Martin Wilck 2016-11-28 15:25 ` Benjamin Marzinski 3 siblings, 1 reply; 47+ messages in thread From: Zdenek Kabelac @ 2016-11-28 10:06 UTC (permalink / raw) To: tang.junhui, Christophe Varoqui, bmarzins, Hannes Reinecke, Martin Wilck, Bart Van Assche Cc: zhang.kai16, dm-devel Dne 28.11.2016 v 03:19 tang.junhui@zte.com.cn napsal(a): > Hello Christophe, Ben, Hannes, Martin, Bart, > I am a member of the host-side software development team of the ZXUSP storage project > in ZTE Corporation. Facing the market demand, our team has decided to write code to > promote multipath efficiency next month. The whole idea is in the mail below. We > hope to participate in and make progress with the open source community, so any > suggestions and comments would be welcome. > Hi First - we are aware of these issues. The solution proposed in this mail would surely help - but there is likely a bigger issue to be solved first. The core trouble is avoiding execution of the 'blkid' disk identification. Recent versions of multipath already mark a plain 'RELOAD' operation of the table (which should not be changing disk content) with an extra DM bit, so the udev rules ATM skip 'pvscan' - we also would like to extend the functionality to skip more rules and reimport existing 'symlinks' from the udev database (so they would not get deleted). I believe the processing of udev rules is 'relatively' quick as long as it does not need to read/write ANYTHING from real disks. So while aggregation of 'uevents' in multipath would 'shorten' queue processing of events - it would still not speed up the scan itself. We need to drastically shorten unnecessary disk re-scanning. Also note - if you have a lot of disks - it might be worth checking whether udev picks the 'right amount of udev workers'.
There is heuristic logic to avoid system overload - but it might be worth checking whether, in your system with your amount of CPU/RAM/DISKS, the computed number is the best for scaling - i.e. if you double the number of workers, do you get any better performance? Regards Zdenek ^ permalink raw reply [flat|nested] 47+ messages in thread
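To experiment with the worker count Zdenek mentions, the limit can be pinned explicitly instead of being left to the heuristic. A sketch via a systemd drop-in (the value 32 is purely illustrative, and the exact option spelling varies across systemd versions):

```ini
# /etc/systemd/system/systemd-udevd.service.d/children-max.conf
[Service]
ExecStart=
ExecStart=/usr/lib/systemd/systemd-udevd --children-max=32
```

The same limit can be changed at runtime with `udevadm control --children-max=32`, or at boot via the `udev.children_max=` kernel command-line parameter; check systemd-udevd(8) on the target system for the supported spelling.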
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 10:06 ` Zdenek Kabelac @ 2016-11-28 10:42 ` Hannes Reinecke 2016-11-28 11:51 ` Zdenek Kabelac 0 siblings, 1 reply; 47+ messages in thread From: Hannes Reinecke @ 2016-11-28 10:42 UTC (permalink / raw) To: Zdenek Kabelac, tang.junhui, Christophe Varoqui, bmarzins, Martin Wilck, Bart Van Assche Cc: zhang.kai16, dm-devel On 11/28/2016 11:06 AM, Zdenek Kabelac wrote: > Dne 28.11.2016 v 03:19 tang.junhui@zte.com.cn napsal(a): >> Hello Christophe, Ben, Hannes, Martin, Bart, >> I am a member of host-side software development team of ZXUSP storage >> project >> in ZTE Corporation. Facing the market demand, our team decides to >> write code to >> promote multipath efficiency next month. The whole idea is in the mail >> below.We >> hope to participate in and make progress with the open source >> community, so any >> suggestion and comment would be welcome. >> > > > Hi > > First - we are aware of these issue. > > The solution proposed in this mail would surely help - but there is > likely a bigger issue to be solved first. > > The core trouble is to avoid 'blkid' disk identification to be executed. > Recent version of multipath is already marking plain 'RELOAD' operation > of table (which should not be changing disk content) with extra DM bit, > so udev rules ATM skips 'pvscan' - we also would like to extend the > functionality to skip rules more and reimport existing 'symlinks' from > udev database (so they would not get deleted). > > I believe the processing of udev rules is 'relatively' quick as long > as it does not need to read/write ANYTHING from real disks. > Hmm. You sure this is an issue? We definitely need to skip uevent handling when a path goes down (but I think we do that already), but for 'add' events we absolutely need to call blkid to figure out if the device has changed. 
There are storage arrays out there who use a 'path down/path up' cycle to inform initiators about any device layout change. So we wouldn't be able to handle those properly if we don't call blkid here. > So while aggregation of 'uevents' in multipath would 'shorten' queue > processing of events - it would still not speedup scan alone. > > We need to drastically shorten unnecessary disk re-scanning. > > Also note - if you have a lot of disks - it might be worth to checkout > whether udev picks 'right amount of udev workers'. > There is heuristic logic to avoid system overload - but might be worth > to check if in you system with your amount of CPU/RAM/DISKS the > computed number is the best for scaling - i.e. if you double the amount > of workers - do you > get any better performance ? > That doesn't help, as we only have one queue (within multipath) to handle all uevents. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 10:42 ` Hannes Reinecke @ 2016-11-28 11:51 ` Zdenek Kabelac 2016-11-28 12:06 ` Peter Rajnoha 2016-11-28 12:08 ` Hannes Reinecke 0 siblings, 2 replies; 47+ messages in thread From: Zdenek Kabelac @ 2016-11-28 11:51 UTC (permalink / raw) To: Hannes Reinecke, tang.junhui, Christophe Varoqui, bmarzins, Martin Wilck, Bart Van Assche Cc: zhang.kai16, dm-devel Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a): > On 11/28/2016 11:06 AM, Zdenek Kabelac wrote: >> Dne 28.11.2016 v 03:19 tang.junhui@zte.com.cn napsal(a): >>> Hello Christophe, Ben, Hannes, Martin, Bart, >>> I am a member of host-side software development team of ZXUSP storage >>> project >>> in ZTE Corporation. Facing the market demand, our team decides to >>> write code to >>> promote multipath efficiency next month. The whole idea is in the mail >>> below.We >>> hope to participate in and make progress with the open source >>> community, so any >>> suggestion and comment would be welcome. >>> >> >> >> Hi >> >> First - we are aware of these issue. >> >> The solution proposed in this mail would surely help - but there is >> likely a bigger issue to be solved first. >> >> The core trouble is to avoid 'blkid' disk identification to be executed. >> Recent version of multipath is already marking plain 'RELOAD' operation >> of table (which should not be changing disk content) with extra DM bit, >> so udev rules ATM skips 'pvscan' - we also would like to extend the >> functionality to skip rules more and reimport existing 'symlinks' from >> udev database (so they would not get deleted). >> >> I believe the processing of udev rules is 'relatively' quick as long >> as it does not need to read/write ANYTHING from real disks. >> > Hmm. You sure this is an issue? 
> We definitely need to skip uevent handling when a path goes down (but I > think we do that already), but for 'add' events we absolutely need to > call blkid to figure out if the device has changed. > There are storage arrays out there who use a 'path down/path up' cycle > to inform initiators about any device layout change. > So we wouldn't be able to handle those properly if we don't call blkid here. The core trouble is - With a multipath device - you ONLY want to 'scan' a device (with blkid) when the initial first member device of the multipath gets in. So you start multipath (resume -> CHANGE) - that should be the ONLY place to run the 'blkid' test (which really goes through over 3/4 MB of disk reads, to check whether there is ZFS somewhere). Then any subsequent disk that is a member of the multipath (recognized by 'multipath -c') should NOT be scanned - but as far as I can tell the current order is the opposite: first there is 'blkid' (60) and then the rule (62) recognizes an mpath_member. Thus every added disk fires a very lengthy blkid scan. Of course I'm not an expert on dm multipath rules here, so passing this on to prajnoha@ - but I'd guess this is the primary source of slowdowns. There should be exactly ONE blkid for a single multipath device - as long as 'RELOAD' only adds/removes paths (there is no reason to scan component devices) > >> So while aggregation of 'uevents' in multipath would 'shorten' queue >> processing of events - it would still not speedup scan alone. >> >> We need to drastically shorten unnecessary disk re-scanning. >> >> Also note - if you have a lot of disks - it might be worth to checkout >> whether udev picks 'right amount of udev workers'.
>> > That doesn't help, as we only have one queue (within multipath) to > handle all uevents. This was meant for systems with many different multipath devices. Obviously would not help with a single multipath device. Regards Zdenek ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 11:51 ` Zdenek Kabelac @ 2016-11-28 12:06 ` Peter Rajnoha 2016-11-28 12:08 ` Hannes Reinecke 1 sibling, 0 replies; 47+ messages in thread From: Peter Rajnoha @ 2016-11-28 12:06 UTC (permalink / raw) To: Zdenek Kabelac, Hannes Reinecke, tang.junhui, Christophe Varoqui, bmarzins, Martin Wilck, Bart Van Assche Cc: zhang.kai16, dm-devel On 11/28/2016 12:51 PM, Zdenek Kabelac wrote: > Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a): >> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote: >>> Dne 28.11.2016 v 03:19 tang.junhui@zte.com.cn napsal(a): >>>> Hello Christophe, Ben, Hannes, Martin, Bart, >>>> I am a member of host-side software development team of ZXUSP storage >>>> project >>>> in ZTE Corporation. Facing the market demand, our team decides to >>>> write code to >>>> promote multipath efficiency next month. The whole idea is in the mail >>>> below.We >>>> hope to participate in and make progress with the open source >>>> community, so any >>>> suggestion and comment would be welcome. >>>> >>> >>> >>> Hi >>> >>> First - we are aware of these issue. >>> >>> The solution proposed in this mail would surely help - but there is >>> likely a bigger issue to be solved first. >>> >>> The core trouble is to avoid 'blkid' disk identification to be >>> executed. >>> Recent version of multipath is already marking plain 'RELOAD' operation >>> of table (which should not be changing disk content) with extra DM bit, >>> so udev rules ATM skips 'pvscan' - we also would like to extend the >>> functionality to skip rules more and reimport existing 'symlinks' from >>> udev database (so they would not get deleted). >>> >>> I believe the processing of udev rules is 'relatively' quick as long >>> as it does not need to read/write ANYTHING from real disks. >>> >> Hmm. You sure this is an issue? 
>> We definitely need to skip uevent handling when a path goes down (but I >> think we do that already), but for 'add' events we absolutely need to >> call blkid to figure out if the device has changed. >> There are storage arrays out there who use a 'path down/path up' cycle >> to inform initiators about any device layout change. >> So we wouldn't be able to handle those properly if we don't call blkid >> here. > > The core trouble is - > > > With multipath device - you ONLY want to 'scan' device (with blkid) when > only the initial first member device of multipath gets in. > > So you start multipath (resume -> CHANGE) - it should be the ONLY place > to run 'blkid' test (which really goes though over 3/4MB of disk read, > to check if there is not ZFS somewhere) > > Then any next disk being a member of multipath (recognized by 'multipath > -c', > should NOT scan) - as far as I can tell current order is opposite, > fist there is 'blkid' (60) and then rule (62) recognizes a mpath_member. > > Thus every add disk fires very lengthy blkid scan. > > Of course I'm not here an expert on dm multipath rules so passing this > on to prajnoha@ - but I'd guess this is primary source of slowdowns. > > There should be exactly ONE blkid for a single multipath device - as > long as 'RELOAD' only add/remove paths (there is no reason to scan > component devices) That's true indeed - the blkid is executed in 60-persistent-storage.rules (for all disks in general). And multipath rules are executed later - at the moment, if we detect that the disk is a multipath component, we overwrite the ID_FS_TYPE to mpath_member. But that means the blkid was executed uselessly before. So if we wanted to save some cycles, this could be the place to also look at optimization, as Zdenek already pointed out. When it comes to the need to detect whether the component has changed or not (as Hannes pointed out), that would be only for a change in signatures (that blkid detects only). But is that enough? 
If we care about the change in signatures that blkid detects, don't we care about data/content that may have changed the same way as those signatures? -- Peter ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 11:51 ` Zdenek Kabelac 2016-11-28 12:06 ` Peter Rajnoha @ 2016-11-28 12:08 ` Hannes Reinecke 2016-11-28 12:23 ` Peter Rajnoha ` (2 more replies) 1 sibling, 3 replies; 47+ messages in thread From: Hannes Reinecke @ 2016-11-28 12:08 UTC (permalink / raw) To: Zdenek Kabelac, tang.junhui, Christophe Varoqui, bmarzins, Martin Wilck, Bart Van Assche Cc: zhang.kai16, dm-devel On 11/28/2016 12:51 PM, Zdenek Kabelac wrote: > Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a): >> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote: >>> Dne 28.11.2016 v 03:19 tang.junhui@zte.com.cn napsal(a): >>>> Hello Christophe, Ben, Hannes, Martin, Bart, >>>> I am a member of host-side software development team of ZXUSP storage >>>> project >>>> in ZTE Corporation. Facing the market demand, our team decides to >>>> write code to >>>> promote multipath efficiency next month. The whole idea is in the mail >>>> below.We >>>> hope to participate in and make progress with the open source >>>> community, so any >>>> suggestion and comment would be welcome. >>>> >>> >>> >>> Hi >>> >>> First - we are aware of these issue. >>> >>> The solution proposed in this mail would surely help - but there is >>> likely a bigger issue to be solved first. >>> >>> The core trouble is to avoid 'blkid' disk identification to be >>> executed. >>> Recent version of multipath is already marking plain 'RELOAD' operation >>> of table (which should not be changing disk content) with extra DM bit, >>> so udev rules ATM skips 'pvscan' - we also would like to extend the >>> functionality to skip rules more and reimport existing 'symlinks' from >>> udev database (so they would not get deleted). >>> >>> I believe the processing of udev rules is 'relatively' quick as long >>> as it does not need to read/write ANYTHING from real disks. >>> >> Hmm. You sure this is an issue? 
>> We definitely need to skip uevent handling when a path goes down (but I >> think we do that already), but for 'add' events we absolutely need to >> call blkid to figure out if the device has changed. >> There are storage arrays out there who use a 'path down/path up' cycle >> to inform initiators about any device layout change. >> So we wouldn't be able to handle those properly if we don't call blkid >> here. > > The core trouble is - > > > With multipath device - you ONLY want to 'scan' device (with blkid) when > only the initial first member device of multipath gets in. > > So you start multipath (resume -> CHANGE) - it should be the ONLY place > to run 'blkid' test (which really goes though over 3/4MB of disk read, > to check if there is not ZFS somewhere) > > Then any next disk being a member of multipath (recognized by 'multipath > -c', > should NOT scan) - as far as I can tell current order is opposite, > fist there is 'blkid' (60) and then rule (62) recognizes a mpath_member. > > Thus every add disk fires very lengthy blkid scan. > > Of course I'm not here an expert on dm multipath rules so passing this > on to prajnoha@ - but I'd guess this is primary source of slowdowns. > > There should be exactly ONE blkid for a single multipath device - as > long as 'RELOAD' only add/remove paths (there is no reason to scan > component devices) > ATM 'multipath -c' is just a simple test of whether the device is supposed to be handled by multipath. And the number of bytes read by blkid shouldn't be _that_ large; a simple 'blkid' on my device caused it to read 35k ... Also udev will become very unhappy if we're not calling blkid for every device; you'd be having a hard time reconstructing the event for those devices. While it's trivial to import variables from parent devices, it's impossible to do that from unrelated devices; you'd need a dedicated daemon for that. So we cannot skip blkid without additional tooling.
>> >>> So while aggregation of 'uevents' in multipath would 'shorten' queue >>> processing of events - it would still not speedup scan alone. >>> >>> We need to drastically shorten unnecessary disk re-scanning. >>> >>> Also note - if you have a lot of disks - it might be worth to checkout >>> whether udev picks 'right amount of udev workers'. >>> There is heuristic logic to avoid system overload - but might be worth >>> to check if in you system with your amount of CPU/RAM/DISKS the >>> computed number is the best for scaling - i.e. if you double the amount >>> of workers - do you >>> get any better performance ? >>> >> That doesn't help, as we only have one queue (within multipath) to >> handle all uevents. > > This was meant for systems with many different multipath devices. > Obviously would not help with a single multipath device. > I'm talking about the multipath daemon. There will be exactly _one_ instance of the multipath daemon running for the entire system, which will be handling _all_ udev events with a single queue. Independent on the number of attached devices. Cheers, Hannes -- Dr. Hannes Reinecke Teamlead Storage & Networking hare@suse.de +49 911 74053 688 SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 12:08 ` Hannes Reinecke @ 2016-11-28 12:23 ` Peter Rajnoha 2016-11-28 12:55 ` Zdenek Kabelac 2016-11-28 17:22 ` Benjamin Marzinski 2 siblings, 0 replies; 47+ messages in thread From: Peter Rajnoha @ 2016-11-28 12:23 UTC (permalink / raw) To: Hannes Reinecke, Zdenek Kabelac, tang.junhui, Christophe Varoqui, bmarzins, Martin Wilck, Bart Van Assche Cc: zhang.kai16, dm-devel On 11/28/2016 01:08 PM, Hannes Reinecke wrote: > On 11/28/2016 12:51 PM, Zdenek Kabelac wrote: >> Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a): >>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote: >>>> Dne 28.11.2016 v 03:19 tang.junhui@zte.com.cn napsal(a): >>>>> Hello Christophe, Ben, Hannes, Martin, Bart, >>>>> I am a member of host-side software development team of ZXUSP storage >>>>> project >>>>> in ZTE Corporation. Facing the market demand, our team decides to >>>>> write code to >>>>> promote multipath efficiency next month. The whole idea is in the mail >>>>> below.We >>>>> hope to participate in and make progress with the open source >>>>> community, so any >>>>> suggestion and comment would be welcome. >>>>> >>>> >>>> >>>> Hi >>>> >>>> First - we are aware of these issue. >>>> >>>> The solution proposed in this mail would surely help - but there is >>>> likely a bigger issue to be solved first. >>>> >>>> The core trouble is to avoid 'blkid' disk identification to be >>>> executed. >>>> Recent version of multipath is already marking plain 'RELOAD' operation >>>> of table (which should not be changing disk content) with extra DM bit, >>>> so udev rules ATM skips 'pvscan' - we also would like to extend the >>>> functionality to skip rules more and reimport existing 'symlinks' from >>>> udev database (so they would not get deleted). >>>> >>>> I believe the processing of udev rules is 'relatively' quick as long >>>> as it does not need to read/write ANYTHING from real disks. >>>> >>> Hmm. 
You sure this is an issue? >>> We definitely need to skip uevent handling when a path goes down (but I >>> think we do that already), but for 'add' events we absolutely need to >>> call blkid to figure out if the device has changed. >>> There are storage arrays out there who use a 'path down/path up' cycle >>> to inform initiators about any device layout change. >>> So we wouldn't be able to handle those properly if we don't call blkid >>> here. >> >> The core trouble is - >> >> >> With multipath device - you ONLY want to 'scan' device (with blkid) when >> only the initial first member device of multipath gets in. >> >> So you start multipath (resume -> CHANGE) - it should be the ONLY place >> to run 'blkid' test (which really goes though over 3/4MB of disk read, >> to check if there is not ZFS somewhere) >> >> Then any next disk being a member of multipath (recognized by 'multipath >> -c', >> should NOT scan) - as far as I can tell current order is opposite, >> fist there is 'blkid' (60) and then rule (62) recognizes a mpath_member. >> >> Thus every add disk fires very lengthy blkid scan. >> >> Of course I'm not here an expert on dm multipath rules so passing this >> on to prajnoha@ - but I'd guess this is primary source of slowdowns. >> >> There should be exactly ONE blkid for a single multipath device - as >> long as 'RELOAD' only add/remove paths (there is no reason to scan >> component devices) >> > ATM 'multipath -c' is just a simple test if the device is supposed to be > handled by multipath. > > And the number of bytes read by blkid should be _that_ large; a simple > 'blkid' on my device caused it to read 35k ... > > Also udev will become very unhappy if we're not calling blkid for every > device; you'd be having a hard time reconstructing the event for those > devices. What do you mean with event reconstruction? I don't think we really need to call blkid for every device. 
If we have configured that a certain device is surely an mpath component (based on the WWN in the mpath config), I think we don't need to call blkid at all - it's an mpath component and the top-level device should simply be used for any scanning. I mean, I still don't see why we need to call blkid and then overwrite the ID_FS_TYPE variable right away based on what 'multipath -c' says. If we reverse this order, we could save the extra blkid that's not actually needed. -- Peter ^ permalink raw reply [flat|nested] 47+ messages in thread
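The reversed ordering Peter suggests could look roughly like this in rule form. This is only a sketch: the file name, rule number, and match conditions are illustrative; the real parts are that `multipath -c` exits 0 for a multipath component (which is what the PROGRAM match tests) and that 60-persistent-storage.rules is where blkid is invoked.

```
# hypothetical 59-multipath-member.rules, ordered BEFORE 60-persistent-storage.rules
ACTION=="add|change", SUBSYSTEM=="block", KERNEL=="sd*", \
    PROGRAM=="/sbin/multipath -c $devnode", \
    ENV{ID_FS_TYPE}="mpath_member", ENV{DM_MULTIPATH_DEVICE_PATH}="1"

# 60-persistent-storage.rules could then skip the expensive blkid import:
ENV{ID_FS_TYPE}=="mpath_member", GOTO="persistent_storage_end"
```

This way the component is labelled before blkid ever runs, instead of running blkid first and overwriting its result afterwards.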
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 12:08 ` Hannes Reinecke 2016-11-28 12:23 ` Peter Rajnoha @ 2016-11-28 12:55 ` Zdenek Kabelac 2016-11-28 17:22 ` Benjamin Marzinski 2 siblings, 0 replies; 47+ messages in thread From: Zdenek Kabelac @ 2016-11-28 12:55 UTC (permalink / raw) To: Hannes Reinecke, tang.junhui, Christophe Varoqui, bmarzins, Martin Wilck, Bart Van Assche Cc: zhang.kai16, dm-devel Dne 28.11.2016 v 13:08 Hannes Reinecke napsal(a): > On 11/28/2016 12:51 PM, Zdenek Kabelac wrote: >> Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a): >> With multipath device - you ONLY want to 'scan' device (with blkid) when >> only the initial first member device of multipath gets in. >> >> So you start multipath (resume -> CHANGE) - it should be the ONLY place >> to run 'blkid' test (which really goes though over 3/4MB of disk read, >> to check if there is not ZFS somewhere) >> >> Then any next disk being a member of multipath (recognized by 'multipath >> -c', >> should NOT scan) - as far as I can tell current order is opposite, >> fist there is 'blkid' (60) and then rule (62) recognizes a mpath_member. >> >> Thus every add disk fires very lengthy blkid scan. >> >> Of course I'm not here an expert on dm multipath rules so passing this >> on to prajnoha@ - but I'd guess this is primary source of slowdowns. >> >> There should be exactly ONE blkid for a single multipath device - as >> long as 'RELOAD' only add/remove paths (there is no reason to scan >> component devices) >> > ATM 'multipath -c' is just a simple test if the device is supposed to be > handled by multipath. Yes - 'multipath -c' is a way to recognize 'private/internal' device of multipath device (aka component). This is very same issues with i.e. lvm2 raid1 leg devices - where you have a raid device and its individual _rimageXX legs. In lvm2 we bypass/skip scans of 'rimage' - only resulting 'raid' device is being checked. 
There is no reason to scan a 'hidden' subdevice. Unfortunately udev doesn't support 'hidden' devices - so it's up to the 'blockdevice' maintainer to implement this logic in udev rules. So it needs a careful design to NOT miss the scan for a 'user accessible' device while not doing unnecessary scans of devices no one except e.g. lvm2 should ever access (there is absolutely NO reason to have full udev DB content for any lvm2 private device - there is always only minimal info - the same should apply to dm multipath - where components are 'private' to multipath). > And the number of bytes read by blkid shouldn't be _that_ large; a simple > 'blkid' on my device caused it to read 35k ... Unless you've configured (or is it the default for SUSE) to NOT scan for ZFS, you would see this 3 times in strace: read(3, "@\200\0\0@\200\1\0@\200\2\0@\200\3\0@\200\4\0@\200\f\0@\200\r\0@\200\30\0"..., 262144) = 262144 So a default e.g. 'blkid /dev/sda1' really does read a lot of data from disk. > > Also udev will become very unhappy if we're not calling blkid for every > device; you'd be having a hard time reconstructing the event for those > devices. It's not about making udev happy :) A private device (or subsystem) is simply a missing piece in udev. There is really no reason to fill the udev DB with a device no one should be using. And I hope we both agree that an mpath 'component' should never be used by any user. Of course no one objects to reading it via 'dd' - but a component should not appear in the udev DB device list as a regular device. > While it's trivial to import variables from parent devices, it's > impossible to do that from unrelated devices; you'd need a dedicated > daemon for that. > So we cannot skip blkid without additional tooling. Trust me - we can and we should improve it even more. > >>> >>>> So while aggregation of 'uevents' in multipath would 'shorten' queue >>>> processing of events - it would still not speedup scan alone. >>>> >>>> We need to drastically shorten unnecessary disk re-scanning.
>>>> Also note - if you have a lot of disks - it might be worth to checkout >>>> whether udev picks 'right amount of udev workers'. >>>> There is heuristic logic to avoid system overload - but might be worth >>>> to check if in you system with your amount of CPU/RAM/DISKS the >>>> computed number is the best for scaling - i.e. if you double the amount >>>> of workers - do you >>>> get any better performance ? >>>> >>> That doesn't help, as we only have one queue (within multipath) to >>> handle all uevents. >> >> This was meant for systems with many different multipath devices. >> Obviously would not help with a single multipath device. >> > I'm talking about the multipath daemon. > There will be exactly _one_ instance of the multipath daemon running for > the entire system, which will be handling _all_ udev events with a > single queue. > Independent on the number of attached devices. Ohh - I thought there was some bigger parallelism in multipathd. Ok - so then more workers would really not help. That's another reason why improving the speed of the rules should be prio task #1. Regards Zdenek ^ permalink raw reply [flat|nested] 47+ messages in thread
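Treating components as 'private', as Zdenek describes, is today approximated in rule form rather than supported natively by udev. A sketch of the idea (SYSTEMD_READY is a real udev property consumed by systemd; the DM_MULTIPATH_DEVICE_PATH condition and its placement here are illustrative):

```
# once a device is known to be a multipath component, hide it from other
# consumers instead of filling the udev DB with full per-device metadata
ENV{DM_MULTIPATH_DEVICE_PATH}=="1", ENV{SYSTEMD_READY}="0", \
    ENV{ID_FS_TYPE}="mpath_member"
```

With SYSTEMD_READY=0, systemd does not treat the path device as a usable block device, so mount units and similar consumers wait for the top-level multipath device instead.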
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 12:08 ` Hannes Reinecke 2016-11-28 12:23 ` Peter Rajnoha 2016-11-28 12:55 ` Zdenek Kabelac @ 2016-11-28 17:22 ` Benjamin Marzinski 2016-11-29 9:34 ` Zdenek Kabelac 2 siblings, 1 reply; 47+ messages in thread From: Benjamin Marzinski @ 2016-11-28 17:22 UTC (permalink / raw) To: Hannes Reinecke Cc: tang.junhui, zhang.kai16, Zdenek Kabelac, dm-devel, Bart Van Assche, Martin Wilck On Mon, Nov 28, 2016 at 01:08:51PM +0100, Hannes Reinecke wrote: > On 11/28/2016 12:51 PM, Zdenek Kabelac wrote: > > Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a): > >> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote: > >>> Dne 28.11.2016 v 03:19 tang.junhui@zte.com.cn napsal(a): > >>>> Hello Christophe, Ben, Hannes, Martin, Bart, > >>>> I am a member of host-side software development team of ZXUSP storage > >>>> project > >>>> in ZTE Corporation. Facing the market demand, our team decides to > >>>> write code to > >>>> promote multipath efficiency next month. The whole idea is in the mail > >>>> below.We > >>>> hope to participate in and make progress with the open source > >>>> community, so any > >>>> suggestion and comment would be welcome. > >>>> > >>> > >>> > >>> Hi > >>> > >>> First - we are aware of these issue. > >>> > >>> The solution proposed in this mail would surely help - but there is > >>> likely a bigger issue to be solved first. > >>> > >>> The core trouble is to avoid 'blkid' disk identification to be > >>> executed. > >>> Recent version of multipath is already marking plain 'RELOAD' operation > >>> of table (which should not be changing disk content) with extra DM bit, > >>> so udev rules ATM skips 'pvscan' - we also would like to extend the > >>> functionality to skip rules more and reimport existing 'symlinks' from > >>> udev database (so they would not get deleted). 
> >>> > >>> I believe the processing of udev rules is 'relatively' quick as long > >>> as it does not need to read/write ANYTHING from real disks. > >>> > >> Hmm. You sure this is an issue? > >> We definitely need to skip uevent handling when a path goes down (but I > >> think we do that already), but for 'add' events we absolutely need to > >> call blkid to figure out if the device has changed. > >> There are storage arrays out there that use a 'path down/path up' cycle > >> to inform initiators about any device layout change. > >> So we wouldn't be able to handle those properly if we don't call blkid > >> here. > > > > The core trouble is - > > > > > > With multipath device - you ONLY want to 'scan' device (with blkid) when > > only the initial first member device of multipath gets in. > > > > So you start multipath (resume -> CHANGE) - it should be the ONLY place > > to run 'blkid' test (which really goes through over 3/4MB of disk read, > > to check if there is not ZFS somewhere) > > > > Then any next disk being a member of multipath (recognized by 'multipath > > -c', > > should NOT scan) - as far as I can tell the current order is the opposite: > > first there is 'blkid' (60) and then rule (62) recognizes a mpath_member. > > > > Thus every added disk fires a very lengthy blkid scan. > > > > Of course I'm not here an expert on dm multipath rules so passing this > > on to prajnoha@ - but I'd guess this is the primary source of slowdowns. > > > > There should be exactly ONE blkid for a single multipath device - as > > long as 'RELOAD' only add/remove paths (there is no reason to scan > > component devices) > > > ATM 'multipath -c' is just a simple test if the device is supposed to be > handled by multipath. Well, "simple" might be stretching the truth a little. I'd really like to not have to call multipath -c on every change event to a device since this isn't a particularly quick callout either. 
In fact we do this with the redhat code, but we use some horrible hacks, that could be solved if udev would just allow rules to compare the value of two environment variables. But this is impossible. I opened a bugzilla for allowing this, but it recently got closed WONTFIX. The idea is that multipathd would set a timestamp when it started, when a new wwid was added to the wwids file, and when it got reconfigured. These are the only times a configuration could change (well, users could change /etc/multipath.conf but not reload multipathd, but that will already cause problems). udev would read this timestamp and save it to the database. When change events come along, as long as the timestamp hasn't changed, the old value of "multipath -c" is still correct. > And the number of bytes read by blkid shouldn't be _that_ large; a simple > 'blkid' on my device caused it to read 35k ... > > Also udev will become very unhappy if we're not calling blkid for every > device; you'd be having a hard time reconstructing the event for those > devices. > While it's trivial to import variables from parent devices, it's > impossible to do that from unrelated devices; you'd need a dedicated > daemon for that. > So we cannot skip blkid without additional tooling. We need to run blkid on the multipath device, but I'm not sure what it gets us on the paths, once we have determined that they belong to multipath. multipathd doesn't use any of the values that blkid sets, and really nothing else should be directly accessing this device. -Ben > >> > >>> So while aggregation of 'uevents' in multipath would 'shorten' queue > >>> processing of events - it would still not speedup scan alone. > >>> > >>> We need to drastically shorten unnecessary disk re-scanning. > >>> > >>> Also note - if you have a lot of disks - it might be worth to checkout > >>> whether udev picks 'right amount of udev workers'. 
> >>> There is heuristic logic to avoid system overload - but might be worth > >>> to check if in you system with your amount of CPU/RAM/DISKS the > >>> computed number is the best for scaling - i.e. if you double the amount > >>> of workers - do you > >>> get any better performance ? > >>> > >> That doesn't help, as we only have one queue (within multipath) to > >> handle all uevents. > > > > This was meant for systems with many different multipath devices. > > Obviously would not help with a single multipath device. > > > I'm talking about the multipath daemon. > There will be exactly _one_ instance of the multipath daemon running for > the entire system, which will be handling _all_ udev events with a > single queue. > Independent on the number of attached devices. > > Cheers, > > Hannes > -- > Dr. Hannes Reinecke Teamlead Storage & Networking > hare@suse.de +49 911 74053 688 > SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg > GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton > HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 47+ messages in thread
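The timestamp scheme Ben describes above can be sketched as follows. This is a minimal illustration only; the stamp-file path and all function names are hypothetical, not the actual multipathd or udev-rule implementation. multipathd would rewrite the stamp whenever its configuration could change (startup, new wwid, reconfigure), udev would save the imported value in its database, and a change event could reuse the cached "multipath -c" verdict as long as the saved stamp still matches the file:

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical: multipathd rewrites this stamp whenever its
 * configuration could change. */
static int write_config_stamp(const char *path)
{
    FILE *f = fopen(path, "w");

    if (!f)
        return -1;
    fprintf(f, "%lld\n", (long long)time(NULL));
    return fclose(f);
}

/* Hypothetical: read the current stamp back as an opaque string. */
static int read_config_stamp(const char *path, char *buf, size_t len)
{
    FILE *f = fopen(path, "r");

    if (!f)
        return -1;
    if (!fgets(buf, (int)len, f)) {
        fclose(f);
        return -1;
    }
    fclose(f);
    return 0;
}

/* The cached "multipath -c" verdict stays valid only while the stamp
 * udev saved in its database equals the stamp multipathd last wrote. */
static int cached_verdict_valid(const char *saved, const char *current)
{
    return strcmp(saved, current) == 0;
}
```

On a change event, a rule would compare the saved and current stamps and only re-run the callout when they differ, which is exactly the comparison Ben says udev rules cannot currently express between two environment variables.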
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 17:22 ` Benjamin Marzinski @ 2016-11-29 9:34 ` Zdenek Kabelac 0 siblings, 0 replies; 47+ messages in thread From: Zdenek Kabelac @ 2016-11-29 9:34 UTC (permalink / raw) To: Benjamin Marzinski, Hannes Reinecke Cc: tang.junhui, zhang.kai16, dm-devel, Zdenek Kabelac, Bart Van Assche, Martin Wilck Dne 28.11.2016 v 18:22 Benjamin Marzinski napsal(a): > On Mon, Nov 28, 2016 at 01:08:51PM +0100, Hannes Reinecke wrote: >> On 11/28/2016 12:51 PM, Zdenek Kabelac wrote: >>> Dne 28.11.2016 v 11:42 Hannes Reinecke napsal(a): >>>> On 11/28/2016 11:06 AM, Zdenek Kabelac wrote: >>> >>> With multipath device - you ONLY want to 'scan' device (with blkid) when >>> only the initial first member device of multipath gets in. >>> >>> So you start multipath (resume -> CHANGE) - it should be the ONLY place >>> to run 'blkid' test (which really goes though over 3/4MB of disk read, >>> to check if there is not ZFS somewhere) >>> >>> Then any next disk being a member of multipath (recognized by 'multipath >>> -c', >>> should NOT scan) - as far as I can tell current order is opposite, >>> fist there is 'blkid' (60) and then rule (62) recognizes a mpath_member. >>> >>> Thus every add disk fires very lengthy blkid scan. >>> >>> Of course I'm not here an expert on dm multipath rules so passing this >>> on to prajnoha@ - but I'd guess this is primary source of slowdowns. >>> >>> There should be exactly ONE blkid for a single multipath device - as >>> long as 'RELOAD' only add/remove paths (there is no reason to scan >>> component devices) >>> >> ATM 'multipath -c' is just a simple test if the device is supposed to be >> handled by multipath. > > Well, "simple" might be stretching the truth a little. I'd really like > to not have to call multipath -c on every change event to a device since > this isn't a particularly quick callout either. 
In fact we do this with > the redhat code, but we use some horrible hacks, that could be solved if > udev would just allow rules to compare the value of two environment > variables. But this is impossible. I opened a bugzilla for allowing > this, but it recently got closed WONTFIX. The idea is that multipathd There are number of ways to solve this thing. One could be udev built-in command plugin to make parsing of WWID efficient. multipath could also be able to generate udev rule dynamically (i.e. --makeudevrule) which would check if udev rule is older then config file and reload udev with new rule if necessary. In such generated rules you then could use WWID constant for compare operation. But even plain 'multipath -c' is still better - since it's executed already anyway - my point here is - it should be executed before 'blkid' is attempted. > would set a timestamp when it started, when a new wwid was added to the > wwids file, and when it got reconfigured. These are the only times a > configuration could change (well, users could change /etc/multipath.conf > but not reload multipathd, but that will already cause problems). udev > would read this timestamp and save it to the database. When change > events come along, as long as the timestamp hasn't changed, the old > value of "multipath -c" is still correect. > >> And the number of bytes read by blkid should be _that_ large; a simple >> 'blkid' on my device caused it to read 35k ... >> >> Also udev will become very unhappy if we're not calling blkid for every >> device; you'd be having a hard time reconstructing the event for those >> devices. >> While it's trivial to import variables from parent devices, it's >> impossible to do that from unrelated devices; you'd need a dedicated >> daemon for that. >> So we cannot skip blkid without additional tooling. 
> > We need to run blkid on the multipath device, but I'm not sure what it > gets us on the paths, once we have determined that they belong to There is NO reason to run blkid on a path (component device) - since the discovered filesystem is later overwritten by mpath member info anyway. Now lvm2 even tries to detect if the user has tried to use a 'path' instead of the real mpath device on a misconfigured system - and a number of such misuses have already been detected. Regards Zdenek ^ permalink raw reply [flat|nested] 47+ messages in thread
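Zdenek's ordering argument can be illustrated with a hypothetical rules fragment. The file names, rule numbers, and the guard variable here are illustrative sketches, not the rules actually shipped by multipath-tools or systemd: the cheap member test runs before the persistent-storage rules, and the expensive blkid built-in is then skipped for anything flagged as a multipath member.

```conf
# hypothetical /etc/udev/rules.d/59-mpath-member.rules
# (would run before 60-persistent-storage.rules, which calls blkid)
ACTION=="add", SUBSYSTEM=="block", KERNEL=="sd*", \
    PROGRAM=="/sbin/multipath -c $devnode", \
    ENV{DM_MULTIPATH_DEVICE_PATH}="1"

# hypothetical guard around the blkid call in
# 60-persistent-storage.rules:
ENV{DM_MULTIPATH_DEVICE_PATH}=="1", GOTO="persistent_storage_end"
IMPORT{builtin}="blkid"
```

With an ordering like this, each component path would pay only for the `multipath -c` callout, and exactly one blkid scan would happen - on the assembled dm device - matching Zdenek's "exactly ONE blkid per multipath device" goal.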
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 2:19 tang.junhui 2016-11-28 10:05 ` Hannes Reinecke 2016-11-28 10:06 ` Zdenek Kabelac @ 2016-11-28 10:28 ` Martin Wilck 2016-11-28 17:31 ` Benjamin Marzinski 2016-11-28 15:25 ` Benjamin Marzinski 3 siblings, 1 reply; 47+ messages in thread From: Martin Wilck @ 2016-11-28 10:28 UTC (permalink / raw) To: tang.junhui, Christophe Varoqui, bmarzins, Hannes Reinecke, Bart Van Assche Cc: zhang.kai16, dm-devel On Mon, 2016-11-28 at 10:19 +0800, tang.junhui@zte.com.cn wrote: > > 4. Proposal > Other than processing uevents one by one, uevents which coming from > the > same LUN devices can be mergered to one, and then uevent processing > thread only needs to process it once, and it only produces one DM > addition > uevent which could reduce system resource consumption. > Here comes an idea how to achieve this without a lot of additional code: libmultipath already has code to check whether any maps need to be updated (in coalesce_paths()). Instead of recording uevents, merging them, and calling ev_add_path() for every affected WWID, it might be sufficient to set daemon state to DAEMON_CONFIGURE and wake up the main multipathd thread to call reconfigure(). Then we only need to make sure that coalesce_paths() really reloads or creates maps only when necessary. I have some patches here that I made for that purpose, for a different scenario (multipathd to avoid RELOAD ioctls when it's started in a scenario where most paths are already set up by udev). Regards Martin -- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
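Martin's "set DAEMON_CONFIGURE and wake the main thread" idea is essentially a debounced trigger: any number of uevents collapse into a single pending flag, and the main thread runs one reconfigure pass per wakeup. A minimal sketch with pthreads follows; the function names are hypothetical and the real multipathd main loop would block in pthread_cond_wait (or a timed variant) rather than poll:

```c
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t cfg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cfg_cond = PTHREAD_COND_INITIALIZER;
static bool cfg_pending;

/* Called from the uevent thread for every add/remove event:
 * many requests collapse into one pending flag. */
void request_reconfigure(void)
{
    pthread_mutex_lock(&cfg_lock);
    cfg_pending = true;
    pthread_cond_signal(&cfg_cond);
    pthread_mutex_unlock(&cfg_lock);
}

/* Main-thread side: returns true when a reconfigure pass should run,
 * consuming however many requests piled up since the last pass. */
bool take_pending_reconfigure(void)
{
    bool was_pending;

    pthread_mutex_lock(&cfg_lock);
    was_pending = cfg_pending;
    cfg_pending = false;
    pthread_mutex_unlock(&cfg_lock);
    return was_pending;
}
```

The design choice this illustrates is that the number of DM operations then depends on how often the main thread wakes up, not on how many uevents arrived, which is the batching effect the thread is after.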
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 10:28 ` Martin Wilck @ 2016-11-28 17:31 ` Benjamin Marzinski 2016-11-29 7:52 ` Martin Wilck 0 siblings, 1 reply; 47+ messages in thread From: Benjamin Marzinski @ 2016-11-28 17:31 UTC (permalink / raw) To: Martin Wilck; +Cc: tang.junhui, zhang.kai16, dm-devel, Bart Van Assche On Mon, Nov 28, 2016 at 11:28:56AM +0100, Martin Wilck wrote: > On Mon, 2016-11-28 at 10:19 +0800, tang.junhui@zte.com.cn wrote: > > > > 4. Proposal > > Other than processing uevents one by one, uevents which coming from > > the > > same LUN devices can be mergered to one, and then uevent processing > > thread only needs to process it once, and it only produces one DM > > addition > > uevent which could reduce system resource consumption. > > > > Here comes an idea how to achieve this without a lot of additional > code: > > libmultipath already has code to check whether any maps need to be > updated (in coalesce_paths()). Instead of recording uevents, merging > them, and calling ev_add_path() for every affected WWID, it might be > sufficient to set daemon state to DAEMON_CONFIGURE and wake up the main > multipathd thread to call reconfigure(). Then we only need to make sure > that coalesce_paths() really reloads or creates maps only when > necessary. I have some patches here that I made for that purpose, for a > different scenario (multipathd to avoid RELOAD ioctls when it's started > in a scenario where most paths are already set up by udev). There is a long standing multipath issue that this will likely make a lot worse. Currently, multipathd can drop paths that it doesn't have access to on reconfigure. This is wrong. When the path comes back, it won't issue another add uevent. This means that multipathd won't be able to re-enable it (since it stopped monitoring it during the reconfigure). 
In fact the only way to get the path back is to either manually intervene or to have the path actually get removed from the system and come back. The correct thing for multipath to do is to only remove paths when it is manually instructed to, when a reconfigure blacklists the path, or when it gets a remove uevent. Now currently this isn't a huge issue since people don't commonly reconfigure their multipath device, but if we start running reconfigure all the time, it certainly will become one. -Ben > Regards > Martin > > -- > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 17:31 ` Benjamin Marzinski @ 2016-11-29 7:52 ` Martin Wilck 2016-11-29 19:21 ` Benjamin Marzinski 0 siblings, 1 reply; 47+ messages in thread From: Martin Wilck @ 2016-11-29 7:52 UTC (permalink / raw) To: Benjamin Marzinski; +Cc: Bart, tang.junhui, zhang.kai16, dm-devel, Assche On Mon, 2016-11-28 at 11:31 -0600, Benjamin Marzinski wrote: > On Mon, Nov 28, 2016 at 11:28:56AM +0100, Martin Wilck wrote: > > On Mon, 2016-11-28 at 10:19 +0800, tang.junhui@zte.com.cn wrote: > > > > > > 4. Proposal > > > Other than processing uevents one by one, uevents which coming > > > from > > > the > > > same LUN devices can be mergered to one, and then uevent > > > processing > > > thread only needs to process it once, and it only produces one DM > > > addition > > > uevent which could reduce system resource consumption. > > > > > > > Here comes an idea how to achieve this without a lot of additional > > code: > > > > libmultipath already has code to check whether any maps need to be > > updated (in coalesce_paths()). Instead of recording uevents, > > merging > > them, and calling ev_add_path() for every affected WWID, it might > > be > > sufficient to set daemon state to DAEMON_CONFIGURE and wake up the > > main > > multipathd thread to call reconfigure(). Then we only need to make > > sure > > that coalesce_paths() really reloads or creates maps only when > > necessary. I have some patches here that I made for that purpose, > > for a > > different scenario (multipathd to avoid RELOAD ioctls when it's > > started > > in a scenario where most paths are already set up by udev). > > There is a long standing multipath issue that this will likely make a > lot worse. Currently, multipathd can drop paths that it doesn't have > access to on reconfigure. This is wrong. When the path comes back, it > won't issue another add uevent. 
This means that multipathd won't be > able > to re-enable it (since it stopped monitoring it during the > reconfigure). > In fact the only way to get the path back is to either manually > intervene or to have the path actually get removed from the system > and > come back. Thanks for pointing this out. But don't you think it can be dealt with by fixing the reconfigure() path? My point is: any code that merges uevents and tries to figure out the correct "minimal" set of DM operations to put the changes in place re-implements at least parts of the logic of coalesce_paths(), and faces similar problems. Regards Martin -- Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton HRB 21284 (AG Nürnberg) -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-29 7:52 ` Martin Wilck @ 2016-11-29 19:21 ` Benjamin Marzinski 0 siblings, 0 replies; 47+ messages in thread From: Benjamin Marzinski @ 2016-11-29 19:21 UTC (permalink / raw) To: Martin Wilck; +Cc: tang.junhui, zhang.kai16, dm-devel, Bart Van Assche On Tue, Nov 29, 2016 at 08:52:17AM +0100, Martin Wilck wrote: > On Mon, 2016-11-28 at 11:31 -0600, Benjamin Marzinski wrote: > > On Mon, Nov 28, 2016 at 11:28:56AM +0100, Martin Wilck wrote: > > > On Mon, 2016-11-28 at 10:19 +0800, tang.junhui@zte.com.cn wrote: > > > > > > > > 4. Proposal > > > > Other than processing uevents one by one, uevents which coming > > > > from > > > > the > > > > same LUN devices can be mergered to one, and then uevent > > > > processing > > > > thread only needs to process it once, and it only produces one DM > > > > addition > > > > uevent which could reduce system resource consumption. > > > > > > > > > > Here comes an idea how to achieve this without a lot of additional > > > code: > > > > > > libmultipath already has code to check whether any maps need to be > > > updated (in coalesce_paths()). Instead of recording uevents, > > > merging > > > them, and calling ev_add_path() for every affected WWID, it might > > > be > > > sufficient to set daemon state to DAEMON_CONFIGURE and wake up the > > > main > > > multipathd thread to call reconfigure(). Then we only need to make > > > sure > > > that coalesce_paths() really reloads or creates maps only when > > > necessary. I have some patches here that I made for that purpose, > > > for a > > > different scenario (multipathd to avoid RELOAD ioctls when it's > > > started > > > in a scenario where most paths are already set up by udev). > > > > There is a long standing multipath issue that this will likely make a > > lot worse. Currently, multipathd can drop paths that it doesn't have > > access to on reconfigure. This is wrong. 
When the path comes back, it > > won't issue another add uevent. This means that multipathd won't be > > able > > to re-enable it (since it stopped monitoring it during the > > reconfigure). > > In fact the only way to get the path back is to either manually > > intervene or to have the path actually get removed from the system > > and > > come back. > > Thanks for pointing this out. But don't you think it can be dealt with > by fixing the reconfigure() path? Sure. We should fix that even if we don't make calls to reconfigure more common. I was just pointing out that by going this route, we make fixing that issue much more of a priority. > My point is: any code that merges uevents and tries to figure out the > correct "minimal" set DM operations to put the changes in place re- > implement at least parts of the logic of coalesce_paths(), and face > similar problems. Yeah. But reconfigure does a lot of stuff that we probably shouldn't be doing when we are simply adding and removing paths, such as reload the configuration (and everything that goes along with that, such as remaking the pathvec to deal with changes in which devices are blacklisted). We could try breaking out just what needs to be done to deal with changes to a separate function, but I'm pretty certain that to catch all the possibilities of batched events, we will end up doing more work than necessary in many cases. In fact, we may find that this takes long enough that we actually take a performance hit unless we have a large batch. -Ben > Regards > Martin > > -- > Dr. Martin Wilck <mwilck@suse.com>, Tel. +49 (0)911 74053 2107 > SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton > HRB 21284 (AG Nürnberg) ^ permalink raw reply [flat|nested] 47+ messages in thread
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 2:19 tang.junhui ` (2 preceding siblings ...) 2016-11-28 10:28 ` Martin Wilck @ 2016-11-28 15:25 ` Benjamin Marzinski 2016-11-28 15:37 ` Hannes Reinecke 3 siblings, 1 reply; 47+ messages in thread From: Benjamin Marzinski @ 2016-11-28 15:25 UTC (permalink / raw) To: tang.junhui; +Cc: zhang.kai16, dm-devel, Bart Van Assche, Martin Wilck On Mon, Nov 28, 2016 at 10:19:15AM +0800, tang.junhui@zte.com.cn wrote: > Hello Christophe, Ben, Hannes, Martin, Bart, > I am a member of the host-side software development team of the ZXUSP storage > project > in ZTE Corporation. Facing the market demand, our team decides to write > code to > promote multipath efficiency next month. The whole idea is in the mail > below. We > hope to participate in and make progress with the open source community, > so any > suggestions and comments would be welcome. Like I mentioned before, I think this is a good idea in general, but the devil is in the details here. > > Thanks, > Tang > > ------------------------------------------------------------------------------------------------------------------------------ > ------------------------------------------------------------------------------------------------------------------------------ > 1. Problem > In these scenarios, multipath processing efficiency is low: > 1) Many paths exist in each multipath device, > 2) Device addition or deletion during iSCSI login/logout or FC link > up/down. <snip> > 4. Proposal > Rather than processing uevents one by one, uevents coming from the > same LUN device can be merged into one, and then the uevent processing > thread only needs to process it once, and it only produces one DM addition > uevent, which could reduce system resource consumption. 
> > The example from Chapter 2 is used again to explain the proposal: > 1) Multipath receives block device addition uevents from udev: > UDEV [89068.806214] add > /devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block) > UDEV [89068.909457] add > /devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block) > UDEV [89068.944956] add > /devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block) > UDEV [89068.959215] add > /devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block) > UDEV [89068.978558] add > /devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block) > UDEV [89069.004217] add > /devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block) > UDEV [89069.486361] add > /devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block) > UDEV [89069.495194] add > /devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block) > UDEV [89069.511628] add > /devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block) > UDEV [89069.716292] add > /devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block) > UDEV [89069.748456] add > /devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block) > UDEV [89069.789662] add > /devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block) > > 2) Multipath merges these 12 uevents into 3 internal uevents > UEVENT add sdc sdh sdd sdl > UEVENT add sde sdj sdf sdm > UEVENT add sdg sdk sdi sdn > > 3) Multipath processes these 3 uevents one by one, and only produces 3 > addition > DM uevents; no dm change uevent exists. > KERNEL[89068.899614] add /devices/virtual/block/dm-2 (block) > KERNEL[89068.955364] add /devices/virtual/block/dm-3 (block) > KERNEL[89069.018903] add /devices/virtual/block/dm-4 (block) Just because I'm pedantic: There will, of course, be dm change events. Without them, you couldn't have a multipath device. 
Whenever you load a table in a dm device (including during the initial creation), you get a change event. > 4) Udev process these uevents above, and transfer it to multipath > UDEV [89068.926428] add /devices/virtual/block/dm-2 (block) > UDEV [89069.007511] add /devices/virtual/block/dm-3 (block) > UDEV [89069.098054] add /devices/virtual/block/dm-4 (block) multipathd ignores add events for dm devices (look at uev_trigger). A dm device isn't set up until it's initial change event happens. > 5) Multipath processes these uevents above, and finishes the creation of > multipath > devices. > > 5. Coding > After taking over uevents form uevent listening thread, uevent processing > thread can > merger these uevents before processing�� > int uevent_dispatch(int (*uev_trigger)(struct uevent *, void * > trigger_data), > void * trigger_data) > { > ... > while (1) { > ... > list_splice_init(&uevq, &uevq_tmp); > ... > list_merger_uevents(&uevq_tmp); > service_uevq(&uevq_tmp); > } > ... > } > In structure of ��struct uevent�� , an additional member of ��char > wwid[WWID_SIZE]�� is > added to record each device WWID for addition or change uevent to identify > whether > these uevents coming from the same LUN, and an additional member of > ��struct list_head merger_node�� is added to record the list of uevents > which having been > merged with this uevent: > struct uevent { > struct list_head node; > struct list_head merger_node; > char wwid[WWID_SIZE] > struct udev_device *udev; > ... > }; You can't just get the wwid with no work (look at all work uev_add_path does, specifically alloc_path_with_pathinfo). Now you could reorder this, but there isn't much point, since it is doing useful things, like checking if this is a spurious uevent, and necessary things, like figuring out the device type and using that that the configuration to figure out HOW to get the wwid. 
It seems like what you want to do is to call uev_add_path multiple times, but defer most of the work that ev_add_path does (creating or updating the multipath device), until you've processed all the paths. > In list_merger_uevents(&uevq_tmp), each node is traversed from the latest > to the oldest, > and the older node with the same WWID and uevent type (e.g. add) would be > moved to > the merger_node list of the later node. If a deletion uevent node > occurred, other older > uevent nodes about this device would be filtered (thanks to Martin's > idea). > > After the above processing, note that the parameter > "struct uevent * uev" is not a single uevent any more in and after > uev_trigger(), so code > needs to be modified to process batched uevents in uev_add_path() and so on. -- dm-devel mailing list dm-devel@redhat.com https://www.redhat.com/mailman/listinfo/dm-devel ^ permalink raw reply [flat|nested] 47+ messages in thread
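The merging step in tang's proposal can be sketched in a few lines of C. This is a simplified illustration: the `wwid` and merge-list fields follow the proposal, but the list handling below uses a plain singly linked list rather than the real libmultipath `list_head` code, `mk_uev` is a hypothetical helper, and Martin's rule about a deletion event cancelling older events for the same device is omitted:

```c
#include <stdlib.h>
#include <string.h>

#define WWID_SIZE 128

struct uevent {
    char action[8];          /* "add", "remove", ... */
    char wwid[WWID_SIZE];    /* LUN identity used for merging */
    struct uevent *next;     /* queue linkage; reused as the merged chain
                              * once an event is folded into a newer one */
    struct uevent *merged;   /* older events folded into this one */
};

/* Hypothetical constructor for the sketch. */
struct uevent *mk_uev(const char *action, const char *wwid,
                      struct uevent *next)
{
    struct uevent *ev = calloc(1, sizeof(*ev));

    strncpy(ev->action, action, sizeof(ev->action) - 1);
    strncpy(ev->wwid, wwid, sizeof(ev->wwid) - 1);
    ev->next = next;
    return ev;
}

/* Fold each event into the newest later event with the same WWID and
 * action, so the queue ends up with one entry per (LUN, action). */
struct uevent *merge_uevents(struct uevent *head)
{
    struct uevent **pp = &head;

    while (*pp) {
        struct uevent *ev = *pp, *later;

        /* does a newer event for the same LUN and action follow? */
        for (later = ev->next; later; later = later->next)
            if (!strcmp(later->wwid, ev->wwid) &&
                !strcmp(later->action, ev->action))
                break;

        if (later) {
            *pp = ev->next;            /* unlink ev from the queue */
            ev->next = later->merged;  /* fold it into the newer event */
            later->merged = ev;
        } else {
            pp = &ev->next;            /* keep ev, move on */
        }
    }
    return head;
}
```

With the 12-path example from the proposal, the queue would shrink to 3 entries, each carrying its 3 folded sibling events on the `merged` chain for the handler to process in one pass. (Error handling and freeing are omitted for brevity.)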
* Re: Improve processing efficiency for addition and deletion of multipath devices 2016-11-28 15:25 ` Benjamin Marzinski @ 2016-11-28 15:37 ` Hannes Reinecke 2016-12-01 1:16 ` tang.junhui 0 siblings, 1 reply; 47+ messages in thread From: Hannes Reinecke @ 2016-11-28 15:37 UTC (permalink / raw) To: Benjamin Marzinski, tang.junhui Cc: dm-devel, zhang.kai16, Martin Wilck, Bart Van Assche On 11/28/2016 04:25 PM, Benjamin Marzinski wrote: > On Mon, Nov 28, 2016 at 10:19:15AM +0800, tang.junhui@zte.com.cn wrote: >> Hello Christophe, Ben, Hannes, Martin, Bart, >> I am a member of host-side software development team of ZXUSP storage >> project >> in ZTE Corporation. Facing the market demand, our team decides to write >> code to >> promote multipath efficiency next month. The whole idea is in the mail >> below.We >> hope to participate in and make progress with the open source community, >> so any >> suggestion and comment would be welcome. > > Like I mentioned before, I think this is a good idea in general, but the > devil is in the details here. > >> >> Thanks, >> Tang >> >> ------------------------------------------------------------------------------------------------------------------------------ >> ------------------------------------------------------------------------------------------------------------------------------ >> 1. Problem >> In these scenarios, multipath processing efficiency is low: >> 1) Many paths exist in each multipath device, >> 2) Devices addition or deletion during iSCSI login/logout or FC link >> up/down. > > <snip> > >> 4. Proposal >> Other than processing uevents one by one, uevents which coming from the >> same LUN devices can be mergered to one, and then uevent processing >> thread only needs to process it once, and it only produces one DM addition >> uevent which could reduce system resource consumption. 
>> >> The example in Chapter 2 is continued to use to explain the proposal: >> 1) Multipath receives block device addition uevents from udev: >> UDEV [89068.806214] add >> /devices/platform/host3/session44/target3:0:0/3:0:0:0/block/sdc (block) >> UDEV [89068.909457] add >> /devices/platform/host3/session44/target3:0:0/3:0:0:2/block/sdg (block) >> UDEV [89068.944956] add >> /devices/platform/host3/session44/target3:0:0/3:0:0:1/block/sde (block) >> UDEV [89068.959215] add >> /devices/platform/host5/session46/target5:0:0/5:0:0:0/block/sdh (block) >> UDEV [89068.978558] add >> /devices/platform/host5/session46/target5:0:0/5:0:0:2/block/sdk (block) >> UDEV [89069.004217] add >> /devices/platform/host5/session46/target5:0:0/5:0:0:1/block/sdj (block) >> UDEV [89069.486361] add >> /devices/platform/host4/session45/target4:0:0/4:0:0:1/block/sdf (block) >> UDEV [89069.495194] add >> /devices/platform/host4/session45/target4:0:0/4:0:0:0/block/sdd (block) >> UDEV [89069.511628] add >> /devices/platform/host4/session45/target4:0:0/4:0:0:2/block/sdi (block) >> UDEV [89069.716292] add >> /devices/platform/host6/session47/target6:0:0/6:0:0:0/block/sdl (block) >> UDEV [89069.748456] add >> /devices/platform/host6/session47/target6:0:0/6:0:0:1/block/sdm (block) >> UDEV [89069.789662] add >> /devices/platform/host6/session47/target6:0:0/6:0:0:2/block/sdn (block) >> >> 2) Multipath merges these 12 uevents to 3 internal uvents >> UEVENT add sdc sdh sdd sdl >> UEVENT add sde sdj sdf sdm >> UEVENT add sdg sdk sdi sdn >> >> 3) Multipath process these 3 uevents one by one, and only produce 3 >> addition >> DM uvents, no dm change uevent exists. >> KERNEL[89068.899614] add /devices/virtual/block/dm-2 (block) >> KERNEL[89068.955364] add /devices/virtual/block/dm-3 (block) >> KERNEL[89069.018903] add /devices/virtual/block/dm-4 (block) > > Just because I'm pedantic: There will, of cource, be dm change events. > Without them, you couldn't have a multipath device. 
Whenever you load a > table in a dm device (including during the initial creation), you get a > change event. > >> 4) Udev process these uevents above, and transfer it to multipath >> UDEV [89068.926428] add /devices/virtual/block/dm-2 (block) >> UDEV [89069.007511] add /devices/virtual/block/dm-3 (block) >> UDEV [89069.098054] add /devices/virtual/block/dm-4 (block) > > multipathd ignores add events for dm devices (look at uev_trigger). A dm > device isn't set up until it's initial change event happens. > >> 5) Multipath processes these uevents above, and finishes the creation of >> multipath >> devices. >> >> 5. Coding >> After taking over uevents form uevent listening thread, uevent processing >> thread can >> merger these uevents before processing�� >> int uevent_dispatch(int (*uev_trigger)(struct uevent *, void * >> trigger_data), >> void * trigger_data) >> { >> ... >> while (1) { >> ... >> list_splice_init(&uevq, &uevq_tmp); >> ... >> list_merger_uevents(&uevq_tmp); >> service_uevq(&uevq_tmp); >> } >> ... >> } >> In structure of ��struct uevent�� , an additional member of ��char >> wwid[WWID_SIZE]�� is >> added to record each device WWID for addition or change uevent to identify >> whether >> these uevents coming from the same LUN, and an additional member of >> ��struct list_head merger_node�� is added to record the list of uevents >> which having been >> merged with this uevent: >> struct uevent { >> struct list_head node; >> struct list_head merger_node; >> char wwid[WWID_SIZE] >> struct udev_device *udev; >> ... >> }; > > You can't just get the wwid with no work (look at all work uev_add_path > does, specifically alloc_path_with_pathinfo). Now you could reorder > this, but there isn't much point, since it is doing useful things, like > checking if this is a spurious uevent, and necessary things, like > figuring out the device type and using that that the configuration to > figure out HOW to get the wwid. 
It seems like what you want to do is to
> call uev_add_path multiple times, but defer most of the work that
> ev_add_path does (creating or updating the multipath device), until
> you've processed all the paths.
>
Which is kinda what I've proposed: split off uev_add_path() and
ev_add_path(). Then uev_add_path() could generate a list of fully-formed
paths which ev_add_path() would process, i.e. generalize coalesce_paths()
to work on a passed-in list rather than the normal vecs->pathvec.

Or consider using vecs->pathvec to hold 'orphan' paths when processing
uevents; then we could add the paths to the pathvec in uev_add_path() and
have another thread go through the list of paths and try to call
coalesce_paths() for the changed mpps. That will be a tad tricky, as we'd
need to identify which mpps need to be updated, but nothing too awkward.

Meanwhile I'm working on converting vecs->pathvec and vecs->mpvec to RCU
lists; that should get rid of the need for most locks, which I suspect are
the main reason for the scaling issues.

Cheers,

Hannes
--
Dr. Hannes Reinecke                   Teamlead Storage & Networking
hare@suse.de                                  +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)

--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

^ permalink raw reply	[flat|nested] 47+ messages in thread
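[Editorial note: the two-stage split discussed above — per-uevent path setup first, one map (re)load per batch afterwards — could be sketched roughly as below. All names and structures here (batch_add_path, batch_coalesce, struct batch) are illustrative stand-ins, not the real multipath-tools API, which does far more work in uev_add_path()/coalesce_paths().]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define WWID_SIZE 128
#define MAX_BATCH 64

struct path {
    char dev[16];
    char wwid[WWID_SIZE];
};

struct batch {
    struct path paths[MAX_BATCH];
    int n;
};

/* Stage 1 (per uevent): only gather path info; no DM table load yet.
 * This stands in for a uev_add_path() that appends to a batch list. */
static void batch_add_path(struct batch *b, const char *dev, const char *wwid)
{
    if (b->n >= MAX_BATCH)
        return;
    snprintf(b->paths[b->n].dev, sizeof(b->paths[b->n].dev), "%s", dev);
    snprintf(b->paths[b->n].wwid, sizeof(b->paths[b->n].wwid), "%s", wwid);
    b->n++;
}

/* Stage 2 (once per batch): count distinct WWIDs, i.e. how many maps
 * need a single (re)load.  A real implementation would run
 * coalesce_paths() over the passed-in list here. */
static int batch_coalesce(const struct batch *b)
{
    int maps = 0;

    for (int i = 0; i < b->n; i++) {
        int seen = 0;
        for (int j = 0; j < i; j++)
            if (strcmp(b->paths[j].wwid, b->paths[i].wwid) == 0)
                seen = 1;
        if (!seen)
            maps++;    /* first path of a new LUN: one map to load */
    }
    return maps;
}
```

With the 12-path example from the proposal, twelve batch_add_path() calls followed by one batch_coalesce() pass would load each of the 3 maps once, instead of triggering a table reload per path.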
* Re: Improve processing efficiency for addition and deletion of multipath devices
  2016-11-28 15:37 ` Hannes Reinecke
@ 2016-12-01  1:16 ` tang.junhui
  0 siblings, 0 replies; 47+ messages in thread
From: tang.junhui @ 2016-12-01  1:16 UTC (permalink / raw)
  To: Benjamin Marzinski, Hannes Reinecke
  Cc: Bart Van Assche, dm-devel, Martin Wilck

Hello Ben, Hannes,

I'm sorry for the late reply.

> You can't just get the wwid with no work (look at all the work uev_add_path
> does, specifically alloc_path_with_pathinfo). Now you could reorder
> this, but there isn't much point, since it is doing useful things, like
> checking if this is a spurious uevent, and necessary things, like
> figuring out the device type and using that and the configuration to
> figure out HOW to get the wwid.

IMO, the WWID can be obtained from the uevent itself, since there is an
ID_SERIAL field in the uevent body.

> It seems like what you want to do is to
> call uev_add_path multiple times, but defer most of the work that
> ev_add_path does (creating or updating the multipath device), until
> you've processed all the paths.

Not exactly: the input parameter "struct uevent *uev" would be a batch of
merged uevents, so uev_add_path() is called only once to process the
merged uevent.

> split off uev_add_path() and
> ev_add_path().
> Then uev_add_path could generate a list of fully-formed paths which
> ev_add_path() would process.
> IE generalize coalesce_paths() to work on a passed-in list rather than
> the normal vecs->pathvec.

Hannes, I think my thoughts are close to your idea now. In uev_add_path(),
we get all the information of the merged paths, and then call
ev_add_path() to create or update the multipath device. Maybe the
difference between us is "processing all uevents of the same type (add,
etc.) in ev_add_path()" versus "processing all same-type uevents which
came from the same LUN in ev_add_path()". I think your idea can also
reach the goal, and reduce the number of DM reloads.
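[Editorial note: the ID_SERIAL suggestion above could look roughly like this. In multipathd the property would come from libudev via udev_device_get_property_value(uev->udev, "ID_SERIAL"); the sketch below instead scans a plain KEY=value uevent environment block so it stays self-contained, and uevent_get_wwid() is a hypothetical name.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Pull a WWID candidate out of the uevent's ID_SERIAL property.
 * Returns 0 and fills wwid on success, -1 if the property is absent
 * (in which case the caller must fall back to the full pathinfo()
 * machinery, which knows per device type how to get the WWID). */
static int uevent_get_wwid(const char *envp[], int nenv,
                           char *wwid, size_t len)
{
    const char prefix[] = "ID_SERIAL=";

    for (int i = 0; i < nenv; i++) {
        if (strncmp(envp[i], prefix, sizeof(prefix) - 1) == 0) {
            snprintf(wwid, len, "%s", envp[i] + sizeof(prefix) - 1);
            return 0;
        }
    }
    return -1;
}
```

As Ben notes above, ID_SERIAL is only a candidate identity for merging; it does not replace the spurious-uevent checks and device-type-specific WWID logic that uev_add_path()/pathinfo() perform.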
So we will try to code along the lines of your idea, and have
list_merger_uevents(&uevq_tmp) merge only uevents of the same type (add,
etc.), stopping the merge when a uevent of another type occurs.

Thanks all,
Tang

From: Hannes Reinecke <hare@suse.de>
To: Benjamin Marzinski <bmarzins@redhat.com>, tang.junhui@zte.com.cn
Cc: dm-devel@redhat.com, zhang.kai16@zte.com.cn, Martin Wilck <mwilck@suse.com>, Bart Van Assche <bart.vanassche@sandisk.com>
Date: 2016/11/28 23:46
Subject: Re: [dm-devel] Improve processing efficiency for addition and deletion of multipath devices
Sender: dm-devel-bounces@redhat.com

<snip -- full quoted copy of Hannes's reply of 2016-11-28, which appears
in the thread above>

^ permalink raw reply	[flat|nested] 47+ messages in thread
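[Editorial note: the merge policy agreed on above — merge only uevents of the same type and WWID, and stop a merge run as soon as a uevent of another type arrives — could be sketched as follows. merge_uevents() and struct uev are illustrative names, not the real multipath-tools code.]

```c
#include <assert.h>
#include <string.h>

/* A leader uevent absorbs later uevents with the same action AND the
 * same WWID, but an absorbing run stops at the first uevent with a
 * different action, so e.g. "add" events are never merged across a
 * "remove" for the same LUN. */
struct uev {
    const char *action;   /* "add", "remove", ... */
    const char *wwid;     /* LUN identity, e.g. from ID_SERIAL */
    int merged;           /* absorbed into an earlier leader? */
};

/* Mark merged uevents; return how many leaders remain to be serviced. */
static int merge_uevents(struct uev *q, int n)
{
    int leaders = 0;

    for (int i = 0; i < n; i++) {
        if (q[i].merged)
            continue;
        leaders++;
        for (int j = i + 1; j < n; j++) {
            if (strcmp(q[j].action, q[i].action) != 0)
                break;    /* type boundary: stop merging this run */
            if (!q[j].merged && strcmp(q[j].wwid, q[i].wwid) == 0)
                q[j].merged = 1;
        }
    }
    return leaders;
}
```

For a queue add(lun1), add(lun2), add(lun1), remove(lun1), add(lun1), the third event merges into the first, but the last add is serviced separately because the remove ended the merge run — preserving the ordering guarantees the thread discussion is concerned with.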
end of thread, other threads:[~2016-12-01  1:16 UTC | newest]

Thread overview: 47+ messages
2016-11-16  1:46 Improve processing efficiency for addition and deletion of multipath devices tang.junhui
2016-11-16  7:53 ` Hannes Reinecke
2016-11-16  8:45 ` tang.junhui
2016-11-16  9:49 ` Martin Wilck
2016-11-17  1:41 ` tang.junhui
2016-11-17 10:48 ` Martin Wilck
2016-11-18  1:02 ` tang.junhui
2016-11-18  7:39 ` Martin Wilck
2016-11-18  8:24 ` tang.junhui
2016-11-18  8:30 ` Martin Wilck
2016-11-18  8:56 ` tang.junhui
2016-11-18  9:12 ` tang.junhui
2016-11-21 18:19 ` Benjamin Marzinski
2016-11-18 22:26 ` Benjamin Marzinski
2016-11-23  1:08 ` tang.junhui
2016-11-29  9:07 ` Zdenek Kabelac
2016-11-29 10:13 ` tang.junhui
2016-11-24  9:21 ` Martin Wilck
2016-11-28 18:46 ` Benjamin Marzinski
2016-11-29  6:47 ` Hannes Reinecke
2016-11-29  8:02 ` Martin Wilck
2016-11-29  8:10 ` Zdenek Kabelac
2016-11-29  8:16 ` Martin Wilck
2016-11-29  8:24 ` Zdenek Kabelac
2016-11-29 17:25 ` Benjamin Marzinski
2016-11-29  7:57 ` Martin Wilck
2016-11-29 17:41 ` Benjamin Marzinski
2016-11-28  2:19 tang.junhui
2016-11-28 10:05 ` Hannes Reinecke
2016-11-28 16:07 ` Benjamin Marzinski
2016-11-28 16:26 ` Zdenek Kabelac
2016-11-28 10:06 ` Zdenek Kabelac
2016-11-28 10:42 ` Hannes Reinecke
2016-11-28 11:51 ` Zdenek Kabelac
2016-11-28 12:06 ` Peter Rajnoha
2016-11-28 12:08 ` Hannes Reinecke
2016-11-28 12:23 ` Peter Rajnoha
2016-11-28 12:55 ` Zdenek Kabelac
2016-11-28 17:22 ` Benjamin Marzinski
2016-11-29  9:34 ` Zdenek Kabelac
2016-11-28 10:28 ` Martin Wilck
2016-11-28 17:31 ` Benjamin Marzinski
2016-11-29  7:52 ` Martin Wilck
2016-11-29 19:21 ` Benjamin Marzinski
2016-11-28 15:25 ` Benjamin Marzinski
2016-11-28 15:37 ` Hannes Reinecke
2016-12-01  1:16 ` tang.junhui