* [PATCH] mdadm/systemd: remove KillMode=none from service file
From: Coly Li @ 2022-02-15 13:34 UTC
To: linux-raid
Cc: Coly Li, Benjamin Brunner, Franck Bui, Jes Sorensen, Mariusz Tkaczyk, Neil Brown, Xiao Ni

In mdadm's systemd configuration, KillMode is currently set to "none" in
the following service files:
- mdadm-grow-continue@.service
- mdmon@.service

The systemd developers strongly discourage this "none" mode (see the
"KillMode=" section of man 5 systemd.kill), and it is being considered
for removal in a future systemd version.

As a systemd developer explained in discussion, systemd stops the
processes of a unit as follows:
1. Send the signal specified by KillSignal= to the list of processes (if
   any); TERM is the default.
2. Wait until either the targeted process(es) exit or a timeout expires.
3. If the timeout expires, send the signal specified by FinalKillSignal=;
   KILL is the default.

For "control-group", all remaining processes receive the SIGTERM signal
(by default), and if any processes are still left after a period of
time, they get the SIGKILL signal.

For "mixed", only the main process receives the SIGTERM signal, and if
any processes are still left after a period of time, all remaining
processes (including the main one) receive the SIGKILL signal.

Given the above, KillMode=control-group is currently a proper kill mode.
Since control-group is the default kill mode, the fix is simply to remove
the KillMode=none line from the service files, so that the default mode
takes effect.

Signed-off-by: Coly Li <colyli@suse.de>
Cc: Benjamin Brunner <bbrunner@suse.com>
Cc: Franck Bui <fbui@suse.de>
Cc: Jes Sorensen <jes@trained-monkey.org>
Cc: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Xiao Ni <xni@redhat.com>
---
 systemd/mdadm-grow-continue@.service | 1 -
 systemd/mdmon@.service               | 1 -
 2 files changed, 2 deletions(-)

diff --git a/systemd/mdadm-grow-continue@.service b/systemd/mdadm-grow-continue@.service
index 5c667d2..9fdc8ec 100644
--- a/systemd/mdadm-grow-continue@.service
+++ b/systemd/mdadm-grow-continue@.service
@@ -14,4 +14,3 @@ ExecStart=BINDIR/mdadm --grow --continue /dev/%I
 StandardInput=null
 StandardOutput=null
 StandardError=null
-KillMode=none
diff --git a/systemd/mdmon@.service b/systemd/mdmon@.service
index 85a3a7c..7753395 100644
--- a/systemd/mdmon@.service
+++ b/systemd/mdmon@.service
@@ -25,4 +25,3 @@ Type=forking
 # it out) and systemd will remove it when transitioning from
 # initramfs to rootfs.
 #PIDFile=/run/mdadm/%I.pid
-KillMode=none
-- 
2.31.1
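
The stop sequence described in the commit message can be modelled in a few
lines. What follows is only a toy sketch under stated assumptions, not systemd
code: the Proc class, the ignores_term flag and stop_sequence() are
hypothetical names made up for illustration, the timeout is assumed to always
expire, and only the three modes discussed in this patch are covered.

```python
from dataclasses import dataclass


@dataclass
class Proc:
    """A process in the unit's control group (toy model, not a systemd type)."""
    name: str
    is_main: bool = False
    ignores_term: bool = False  # hypothetical: process survives the first signal
    alive: bool = True


def stop_sequence(kill_mode, procs):
    """Return the (signal, process name) events for one stop request."""
    events = []
    if kill_mode == "none":
        return events  # nothing is signalled; every process keeps running
    if kill_mode == "control-group":
        first_round = [p for p in procs if p.alive]
    elif kill_mode == "mixed":
        first_round = [p for p in procs if p.alive and p.is_main]
    else:
        raise ValueError(f"mode not covered by this sketch: {kill_mode}")

    # Step 1: KillSignal= (SIGTERM by default).
    for p in first_round:
        events.append(("SIGTERM", p.name))
        if not p.ignores_term:
            p.alive = False

    # Steps 2 and 3: assume the timeout expires, then FinalKillSignal=
    # (SIGKILL by default) goes to whatever is still alive.
    for p in procs:
        if p.alive:
            events.append(("SIGKILL", p.name))
            p.alive = False
    return events


if __name__ == "__main__":
    group = [Proc("main", is_main=True, ignores_term=True), Proc("helper")]
    for event in stop_sequence("control-group", group):
        print(event)
```

With "none" the function returns an empty list and leaves every process
alive, which is the behaviour this patch removes from the two service files.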

* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
From: Xiao Ni @ 2022-04-06 6:36 UTC
To: Coly Li
Cc: linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Mariusz Tkaczyk, Neil Brown

Hi Jes,

Could you merge this patch?

On Tue, Feb 15, 2022 at 9:34 PM Coly Li <colyli@suse.de> wrote:
> [...]

Acked-by: Xiao Ni <xni@redhat.com>

* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
From: Jes Sorensen @ 2022-04-06 13:35 UTC
To: Xiao Ni, Coly Li
Cc: linux-raid, Benjamin Brunner, Franck Bui, Mariusz Tkaczyk, Neil Brown

On 4/6/22 02:36, Xiao Ni wrote:
> Hi Jes
>
> Could you merge this patch.
>
> On Tue, Feb 15, 2022 at 9:34 PM Coly Li <colyli@suse.de> wrote:
>> [...]

Seems reasonable to me, applied!

Thanks,
Jes

* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
From: Mariusz Tkaczyk @ 2022-07-28 7:55 UTC
To: Coly Li
Cc: linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Neil Brown, Xiao Ni

On Tue, 15 Feb 2022 21:34:15 +0800
Coly Li <colyli@suse.de> wrote:

> [...]

Hi All,

We are experiencing issues with IMSM metadata on RHEL 8.7 and 9.1 (the patch
was picked up by Red Hat).
There are several issues that result in hung tasks, characteristic of a
missing mdmon:

[ 619.521440] task:umount state:D stack: 0 pid: 6285 ppid: flags:0x00004084
[ 619.534033] Call Trace:
[ 619.539980] __schedule+0x2d1/0x830
[ 619.547056] ? finish_wait+0x80/0x80
[ 619.554261] schedule+0x35/0xa0
[ 619.560999] md_write_start+0x14b/0x220
[ 619.568492] ? finish_wait+0x80/0x80
[ 619.575649] raid1_make_request+0x3c/0x90 [raid1]
[ 619.584111] md_handle_request+0x128/0x1b0
[ 619.591891] md_make_request+0x5b/0xb0
[ 619.599235] generic_make_request_no_check+0x202/0x330
[ 619.608185] submit_bio+0x3c/0x160
[ 619.615161] ? bio_add_page+0x42/0x50
[ 619.622413] submit_bh_wbc+0x16a/0x190
[ 619.629713] jbd2_write_superblock+0xf4/0x210 [jbd2]
[ 619.638340] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2]
[ 619.647773] __jbd2_update_log_tail+0x3f/0x100 [jbd2]
[ 619.656374] jbd2_cleanup_journal_tail+0x50/0x90 [jbd2]
[ 619.665107] jbd2_log_do_checkpoint+0xfa/0x400 [jbd2]
[ 619.673572] ? prepare_to_wait_event+0xa0/0x180
[ 619.681344] jbd2_journal_destroy+0x120/0x2a0 [jbd2]
[ 619.689551] ? finish_wait+0x80/0x80
[ 619.696096] ext4_put_super+0x76/0x390 [ext4]
[ 619.703584] generic_shutdown_super+0x6c/0x100
[ 619.711065] kill_block_super+0x21/0x50
[ 619.717809] deactivate_locked_super+0x34/0x70
[ 619.725146] cleanup_mnt+0x3b/0x70
[ 619.731279] task_work_run+0x8a/0xb0
[ 619.737576] exit_to_usermode_loop+0xeb/0xf0
[ 619.744657] do_syscall_64+0x198/0x1a0
[ 619.751155] entry_SYSCALL_64_after_hwframe+0x65/0xca

It can be reproduced by mounting an LVM volume created on an IMSM RAID1 array
and then rebooting. I verified that reverting the patch fixes the issue.

I understand that, from systemd's perspective, this behavior is not wanted, but
it is exactly what we need: a working mdmon process even if systemd was
stopped. KillMode=none does the job.
I searched for an alternative way to prevent systemd from stopping the mdmon
unit, but I failed.
I tried changing the signals: I configured the unit to send SIGPIPE (because
mdmon ignores it). That worked, but later the system hung because the mdmon
unit could not be stopped.

I also tried to configure the mdmon unit to be stopped after umount.target, and
I failed too. It cannot be achieved by setting After= or Before=. The one
objection I have here is that systemd-shutdown tries to stop RAID arrays later,
so it could be better to have mdmon still running at that point.

IMO KillMode=none is desired in this case. Later, mdmon is restarted in dracut
by the mdraid module.

If there is no other solution to the problem, I will need to ask Jes to revert
this patch. For now, I have asked Red Hat to do it.
Do you have any suggestions?

TIA,
Mariusz
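
The umount in the trace is parked in md_write_start(). A quick way to tell
whether an array is in that situation might look like the sketch below. It
rests on assumptions that are not stated in this thread: that the md sysfs
attributes metadata_version and array_state behave as the kernel md
documentation describes (an "external:" prefix for user-space managed
metadata such as IMSM, and "write-pending" while writes wait for user space),
and the array name md126 is purely hypothetical.

```python
from pathlib import Path


def blocked_on_mdmon(metadata_version: str, array_state: str) -> bool:
    """True if the array uses external metadata and has a write parked."""
    external = metadata_version.strip().startswith("external:")
    return external and array_state.strip() == "write-pending"


def check_array(name: str, sysfs_root: str = "/sys/block") -> bool:
    """Read both attributes for one array, e.g. name="md126" (hypothetical)."""
    md_dir = Path(sysfs_root) / name / "md"
    return blocked_on_mdmon(
        (md_dir / "metadata_version").read_text(),
        (md_dir / "array_state").read_text(),
    )


if __name__ == "__main__":
    # Strings as they might be read from sysfs on an affected system.
    print(blocked_on_mdmon("external:imsm\n", "write-pending\n"))
```

A True result would only be a hint that no monitor is answering for the
array; it does not prove that mdmon was killed by systemd.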

* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
From: Coly Li @ 2022-07-28 8:39 UTC
To: Mariusz Tkaczyk
Cc: linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Neil Brown, Xiao Ni

> On Jul 28, 2022, at 15:55, Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote:
>
> [...]
>
> IMO KillMode=none is desired in this case. Later, mdmon is restarted in dracut
> by mdraid module.
>
> If there is no other solution for the problem, I will need to ask Jes to revert
> this patch. For now, I asked Redhat to do it.
> Do you have any suggestions?

If Red Hat doesn’t use the latest systemd, they should drop this patch. For
mdadm upstream we should keep it, because it was suggested by a systemd
developer.

Thanks.

Coly Li

* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
From: Mariusz Tkaczyk @ 2022-07-28 9:01 UTC
To: Coly Li
Cc: linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Neil Brown, Xiao Ni

On Thu, 28 Jul 2022 16:39:56 +0800
Coly Li <colyli@suse.de> wrote:

> > [...]
> > If there is no other solution for the problem, I will need to ask Jes to
> > revert this patch. For now, I asked Redhat to do it.
> > Do you have any suggestions?
>
> If Redhat doesn’t use the latest systemd, they should drop this patch. For
> mdadm upstream we should keep this because it was suggested by systemd
> developer.

If we want to keep this, we need to resolve the reboot problem. I have
described the problem and am now waiting for feedback. I hope that it can be
fixed in the mdmon service quickly and easily.

If we determine that an mdmon design update is needed, then I will request a
revert until the fix is ready, to minimize the impact on users (distros may
pull this).

Thanks,
Mariusz

* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
From: Coly Li @ 2022-07-28 10:55 UTC
To: Mariusz Tkaczyk
Cc: linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Neil Brown, Xiao Ni

> On Jul 28, 2022, at 17:01, Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote:
>
> [...]
>
> If we want to keep this, we need to resolve reboot problem. I described problem
> and now I'm waiting for feedback. I hope that it can be fixed in mdmon service
> fast and easy.

Hmm, in the latest systemd source code, unit_kill_context() simply ignores
KILL_NONE (KillMode=none), like this:

        /* Kill the processes belonging to this unit, in preparation for shutting the unit down.
         * Returns > 0 if we killed something worth waiting for, 0 otherwise. */

        if (c->kill_mode == KILL_NONE)
                return 0;

No signal is sent to the target unit. Since no other location references
KILL_NONE, it is not clear to me how KillMode=none could help any further.

I do not know systemd very well. My guess (correct me if I am wrong) is that
this happens because the systemd used in RHEL is not the latest version?

> I we will determine that mdmon design update is needed then I will request to
> revert it, until fix is not ready to minimize impact on users (distros may
> pull this).

Yes, I agree. But for the mdadm package in RHEL, I guess they don’t always use
upstream mdadm and just backport selected patches, as other enterprise
distributions do. If the latest mdadm and the latest systemd work fine
together, maybe the fast fix for RHEL is simply to drop this patch from their
backport; there is no need to wait until the patch is reverted or fixed
upstream.

BTW, can I know the exact version of systemd in RHEL 8.7 and 9.1? On my
openSUSE 15.4 the systemd version is 249.11. I will try to reproduce the
operations as well, and try to find some clue if I am lucky.

Thanks.

Coly Li
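
The return-value contract in the quoted comment ("Returns > 0 if we killed
something worth waiting for, 0 otherwise") can be illustrated with a small
stand-in. This is a hedged sketch, not systemd code: only the name
unit_kill_context and the early return for "none" come from the excerpt
above, while stop_unit(), the send_signal callback, the state strings and the
pid values are hypothetical, and it is an assumption here that a caller
treats 0 as "nothing to wait for".

```python
def unit_kill_context(kill_mode, pids, send_signal):
    """Toy stand-in: returns how many processes the caller must wait for."""
    if kill_mode == "none":
        return 0  # mirrors the early return quoted above: no signal is sent
    for pid in pids:
        send_signal(pid)
    return len(pids)


def stop_unit(kill_mode, pids):
    """Hypothetical caller that only waits when something was signalled."""
    signalled = []
    waiting = unit_kill_context(kill_mode, pids, signalled.append)
    state = "stopping" if waiting > 0 else "stopped"
    return state, signalled


if __name__ == "__main__":
    print(stop_unit("none", [1234]))
    print(stop_unit("control-group", [1234, 1235]))
```

In this model a "none" unit reports "stopped" at once while its processes
were never signalled, so they would simply keep running outside the unit's
bookkeeping.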
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file 2022-07-28 10:55 ` Coly Li @ 2022-07-29 7:55 ` Mariusz Tkaczyk 0 siblings, 0 replies; 16+ messages in thread From: Mariusz Tkaczyk @ 2022-07-29 7:55 UTC (permalink / raw) To: Coly Li Cc: linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Neil Brown, Xiao Ni On Thu, 28 Jul 2022 18:55:04 +0800 Coly Li <colyli@suse.de> wrote: > > 2022年7月28日 17:01,Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > > 写道: > > > > On Thu, 28 Jul 2022 16:39:56 +0800 > > Coly Li <colyli@suse.de> wrote: > > > >>> 2022年7月28日 15:55,Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> > >>> 写道: > >>> > >>> On Tue, 15 Feb 2022 21:34:15 +0800 > >>> Coly Li <colyli@suse.de> wrote: > >>> > >>>> For mdadm's systemd configuration, current systemd KillMode is "none" in > >>>> following service files, > >>>> - mdadm-grow-continue@.service > >>>> - mdmon@.service > >>>> > >>>> This "none" mode is strongly againsted by systemd developers (see man 5 > >>>> systemd.kill for "KillMode=" section), and is considering to remove in > >>>> future systemd version. > >>>> > >>>> As systemd developer explained in disuccsion, the systemd kill process > >>>> is, > >>>> 1. send the signal specified by KillSignal= to the list of processes (if > >>>> any), TERM is the default > >>>> 2. wait until either the target of process(es) exit or a timeout expires > >>>> 3. if the timeout expires send the signal specified by FinalKillSignal=, > >>>> KILL is the default > >>>> > >>>> For "control-group", all remaining processes will receive the SIGTERM > >>>> signal (by default) and if there are still processes after a period f > >>>> time, they will get the SIGKILL signal. > >>>> > >>>> For "mixed", only the main process will receive the SIGTERM signal, and > >>>> if there are still processes after a period of time, all remaining > >>>> processes (including the main one) will receive the SIGKILL signal. 
> >>>> > >>>> From the above comment, currently KillMode=control-group is a proper > >>>> kill mode. Since control-gropu is the default kill mode, the fix can be > >>>> simply removing KillMode=none line from the service file, then the > >>>> default mode will take effect. > >>> > >>> Hi All, > >>> We are experiencing issues with IMSM metadata on RHEL8.7 and 9.1 (the > >>> patch was picked by Redhat). There are several issues which results in > >>> hang task, characteristic to missing mdmon: > >>> > >>> [ 619.521440] task:umount state:D stack: 0 pid: 6285 ppid: > >>> flags:0x00004084 [ 619.534033] Call Trace: > >>> [ 619.539980] __schedule+0x2d1/0x830 > >>> [ 619.547056] ? finish_wait+0x80/0x80 > >>> [ 619.554261] schedule+0x35/0xa0 > >>> [ 619.560999] md_write_start+0x14b/0x220 > >>> [ 619.568492] ? finish_wait+0x80/0x80 > >>> [ 619.575649] raid1_make_request+0x3c/0x90 [raid1] > >>> [ 619.584111] md_handle_request+0x128/0x1b0 > >>> [ 619.591891] md_make_request+0x5b/0xb0 > >>> [ 619.599235] generic_make_request_no_check+0x202/0x330 > >>> [ 619.608185] submit_bio+0x3c/0x160 > >>> [ 619.615161] ? bio_add_page+0x42/0x50 > >>> [ 619.622413] submit_bh_wbc+0x16a/0x190 > >>> [ 619.629713] jbd2_write_superblock+0xf4/0x210 [jbd2] > >>> [ 619.638340] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2] > >>> [ 619.647773] __jbd2_update_log_tail+0x3f/0x100 [jbd2] > >>> [ 619.656374] jbd2_cleanup_journal_tail+0x50/0x90 [jbd2] > >>> [ 619.665107] jbd2_log_do_checkpoint+0xfa/0x400 [jbd2] > >>> [ 619.673572] ? prepare_to_wait_event+0xa0/0x180 > >>> [ 619.681344] jbd2_journal_destroy+0x120/0x2a0 [jbd2] > >>> [ 619.689551] ? 
finish_wait+0x80/0x80 > >>> [ 619.696096] ext4_put_super+0x76/0x390 [ext4] > >>> [ 619.703584] generic_shutdown_super+0x6c/0x100 > >>> [ 619.711065] kill_block_super+0x21/0x50 > >>> [ 619.717809] deactivate_locked_super+0x34/0x70 > >>> [ 619.725146] cleanup_mnt+0x3b/0x70 > >>> [ 619.731279] task_work_run+0x8a/0xb0 > >>> [ 619.737576] exit_to_usermode_loop+0xeb/0xf0 > >>> [ 619.744657] do_syscall_64+0x198/0x1a0 > >>> [ 619.751155] entry_SYSCALL_64_after_hwframe+0x65/0xca > >>> > >>> It can be reproduced by mounting LVM created on IMSM RAID1 array and then > >>> reboot. I verified that reverting the patch fixes the issue. > >>> > >>> I understand that from systemd perspective the behavior in not wanted, but > >>> this is exactly what we need, to have working mdmon process even if > >>> systemd was stopped. KillMode=none does the job. > >>> I searched for alternative way to prevent systemd from stopping the mdmon > >>> unit but I failed. I tried to change signals, so I configured unit to send > >>> SIGPIPE (because it is ignored by mdmon)- it worked but later system > >>> hanged because mdmon unit cannot be stopped. > >>> > >>> I also tried to configure mdmon unit to be stopped after umount.target > >>> and I failed too. It cannot be achieved by setting After= or Before=. The > >>> one objection I have here is that systemd-shutdown tries to stop raid > >>> arrays later, so it could be better to have running mdmon there. > >>> > >>> IMO KillMode=none is desired in this case. Later, mdmon is restarted in > >>> dracut by mdraid module. > >>> > >>> If there is no other solution for the problem, I will need to ask Jes to > >>> revert this patch. For now, I asked Redhat to do it. > >>> Do you have any suggestions? > >> > >> > >> If Redhat doesn’t use the latest systemd, they should drop this patch. For > >> mdadm upstream we should keep this because it was suggested by systemd > >> developer. > >> > > > > If we want to keep this, we need to resolve reboot problem. 
I described
> > problem and now I'm waiting for feedback. I hope that it can be fixed in
> > mdmon service fast and easy.
>
>
> Hmm, in the latest systemd source code, unit_kill_context() just simply
> ignores KILL_NONE (KillMode=none) like this,
>
>         /* Kill the processes belonging to this unit, in preparation for shutting the unit down.
>          * Returns > 0 if we killed something worth waiting for, 0 otherwise. */
>
>         if (c->kill_mode == KILL_NONE)
>                 return 0;
>
> And no signal sent to target unit. Since there is no other location
> references KILL_NONE, it is not clear to me how KillMode=none may help more.
>
> I have no too much understanding to systemd, I guess maybe (correct me if I
> am wrong) it was because the systemd used in RHEL is not the latest version?
>

Hi Coly,
It seems clear to me. When "none" is set, 0 is returned to the caller. 0 means
that there is nothing to wait for, so systemd doesn't ping the process to check
whether it has really been killed. It assumes that the process is already
dead/stopped and that the systemd unit can be stopped too. And that is what
happens: the unit is stopped, but the mdmon@ process keeps running in the
background.

>
> > If we will determine that mdmon design update is needed then I will request
> > to revert it, until fix is not ready to minimize impact on users (distros
> > may pull this).
>
> Yes I agree. But for mdadm package in RHEL, I guess they don’t always use
> upstream mdadm, and just do backport for selected patches as other enterprise
> distributions do. If the latest mdadm and latest systemd work fine together,
> maybe the fast fix for RHEL is to just drop this patch from their backport,
> it is unnecessary to wait until the patch is reverted or fixed by upstream.
>

Yes, I recommended reverting it. But I don't think that it will be fixed
automatically in systemd. We need to find a solution and implement it on our
side.

> BTW, can I know the exact version of systemd from RHEL 8.7 and 9.1? On my
> openSUSE 15.4, the systemd version is 249.11, I will try to reproduce the
> operations as well, and try to find some clue if I am lucky.
>

I don't think that the systemd version matters here. To reproduce it you need to:
1. Remove the KillMode line from the service file
   (/lib/systemd/system/mdmon@.service); you don't need to reinstall mdadm
   at all.
2. systemctl daemon-reload or reboot
3. systemctl restart mdmon@md127 (generally it is the IMSM container)
4. Create an LVM volume and mount it somewhere.
5. Reboot.

RHEL 8.7 systemd rpm version is 238.
RHEL 9.1 systemd rpm version is 250.

They are using the systemd-stable repo: https://github.com/systemd/systemd-stable
so please find the latest tag for a release and you should be close.

Thanks,
Mariusz
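The reproduction steps above could be scripted roughly as follows. This is only a dry-run sketch, not something taken from the thread: the run and repro helper names, the DRY_RUN switch, and the sed expression are assumptions made for illustration, and the volume and mount point in step 4 are borrowed from the mount commands used elsewhere in this thread (the LVM volume is assumed to exist already). By default the script only prints what it would do.

```shell
#!/bin/sh
# Dry-run sketch of the five reproduction steps. Nothing is executed unless
# DRY_RUN=0 is set, which should only be done on a disposable test machine.
# Hypothetical names: run, repro, DRY_RUN, UNIT, CONTAINER.
DRY_RUN="${DRY_RUN:-1}"
UNIT="${UNIT:-/lib/systemd/system/mdmon@.service}"
CONTAINER="${CONTAINER:-md127}"

# Print the command in dry-run mode, execute it otherwise.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

repro() {
    run sed -i '/^KillMode=/d' "$UNIT"          # 1. remove the KillMode line
    run systemctl daemon-reload                 # 2. make systemd re-read the unit
    run systemctl restart "mdmon@${CONTAINER}"  # 3. restart mdmon for the container
    run mount /dev/mapper/vg0-lvm_raid /mnt     # 4. mount an LVM volume on the array
    run systemctl reboot                        # 5. reboot and watch for a hung umount
}

repro
```

Keeping the commands behind a dry-run wrapper makes it possible to review the exact sequence before touching a real IMSM system.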
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file 2022-07-28 7:55 ` Mariusz Tkaczyk 2022-07-28 8:39 ` Coly Li @ 2022-07-29 1:55 ` NeilBrown 2022-08-02 15:43 ` Mariusz Tkaczyk 2022-10-04 10:24 ` Mariusz Tkaczyk 2 siblings, 1 reply; 16+ messages in thread From: NeilBrown @ 2022-07-29 1:55 UTC (permalink / raw) To: Mariusz Tkaczyk Cc: Coly Li, linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Xiao Ni On Thu, 28 Jul 2022, Mariusz Tkaczyk wrote: > On Tue, 15 Feb 2022 21:34:15 +0800 > Coly Li <colyli@suse.de> wrote: > > > For mdadm's systemd configuration, current systemd KillMode is "none" in > > following service files, > > - mdadm-grow-continue@.service > > - mdmon@.service > > > > This "none" mode is strongly againsted by systemd developers (see man 5 > > systemd.kill for "KillMode=" section), and is considering to remove in > > future systemd version. > > > > As systemd developer explained in disuccsion, the systemd kill process > > is, > > 1. send the signal specified by KillSignal= to the list of processes (if > > any), TERM is the default > > 2. wait until either the target of process(es) exit or a timeout expires > > 3. if the timeout expires send the signal specified by FinalKillSignal=, > > KILL is the default > > > > For "control-group", all remaining processes will receive the SIGTERM > > signal (by default) and if there are still processes after a period f > > time, they will get the SIGKILL signal. > > > > For "mixed", only the main process will receive the SIGTERM signal, and > > if there are still processes after a period of time, all remaining > > processes (including the main one) will receive the SIGKILL signal. > > > > From the above comment, currently KillMode=control-group is a proper > > kill mode. Since control-gropu is the default kill mode, the fix can be > > simply removing KillMode=none line from the service file, then the > > default mode will take effect. 
> > Hi All, > We are experiencing issues with IMSM metadata on RHEL8.7 and 9.1 (the patch > was picked by Redhat). There are several issues which results in hang task, > characteristic to missing mdmon: > > [ 619.521440] task:umount state:D stack: 0 pid: 6285 ppid: flags:0x00004084 > [ 619.534033] Call Trace: > [ 619.539980] __schedule+0x2d1/0x830 > [ 619.547056] ? finish_wait+0x80/0x80 > [ 619.554261] schedule+0x35/0xa0 > [ 619.560999] md_write_start+0x14b/0x220 > [ 619.568492] ? finish_wait+0x80/0x80 > [ 619.575649] raid1_make_request+0x3c/0x90 [raid1] > [ 619.584111] md_handle_request+0x128/0x1b0 > [ 619.591891] md_make_request+0x5b/0xb0 > [ 619.599235] generic_make_request_no_check+0x202/0x330 > [ 619.608185] submit_bio+0x3c/0x160 > [ 619.615161] ? bio_add_page+0x42/0x50 > [ 619.622413] submit_bh_wbc+0x16a/0x190 > [ 619.629713] jbd2_write_superblock+0xf4/0x210 [jbd2] > [ 619.638340] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2] > [ 619.647773] __jbd2_update_log_tail+0x3f/0x100 [jbd2] > [ 619.656374] jbd2_cleanup_journal_tail+0x50/0x90 [jbd2] > [ 619.665107] jbd2_log_do_checkpoint+0xfa/0x400 [jbd2] > [ 619.673572] ? prepare_to_wait_event+0xa0/0x180 > [ 619.681344] jbd2_journal_destroy+0x120/0x2a0 [jbd2] > [ 619.689551] ? finish_wait+0x80/0x80 > [ 619.696096] ext4_put_super+0x76/0x390 [ext4] > [ 619.703584] generic_shutdown_super+0x6c/0x100 > [ 619.711065] kill_block_super+0x21/0x50 > [ 619.717809] deactivate_locked_super+0x34/0x70 > [ 619.725146] cleanup_mnt+0x3b/0x70 > [ 619.731279] task_work_run+0x8a/0xb0 > [ 619.737576] exit_to_usermode_loop+0xeb/0xf0 > [ 619.744657] do_syscall_64+0x198/0x1a0 > [ 619.751155] entry_SYSCALL_64_after_hwframe+0x65/0xca > > It can be reproduced by mounting LVM created on IMSM RAID1 array and then > reboot. I verified that reverting the patch fixes the issue. > > I understand that from systemd perspective the behavior in not wanted, but > this is exactly what we need, to have working mdmon process even if systemd was > stopped. 
KillMode=none does the job.
> I searched for alternative way to prevent systemd from stopping the mdmon unit
> but I failed. I tried to change signals, so I configured unit to send SIGPIPE
> (because it is ignored by mdmon)- it worked but later system hanged because
> mdmon unit cannot be stopped.
>
> I also tried to configure mdmon unit to be stopped after umount.target and I
> failed too. It cannot be achieved by setting After= or Before=. The one
> objection I have here is that systemd-shutdown tries to stop raid arrays later,
> so it could be better to have running mdmon there.
>
> IMO KillMode=none is desired in this case. Later, mdmon is restarted in dracut
> by mdraid module.
>
> If there is no other solution for the problem, I will need to ask Jes to revert
> this patch. For now, I asked Redhat to do it.
> Do you have any suggestions?

We should be able to make this work.
We don't need mdmon after the last array stops, and we should have
dependencies to tell systemd that the various arrays require mdmon.
Ideally systemd wouldn't even try to stop mdmon until the relevant array
was stopped.

Can we change the udev rule to tell systemd that the device WANTS
mdmon@foo.service?
Or add "Before=sys-devices-md-%I.device" or something like that to
mdmon@.service?

Do you know what exactly is causing systemd to hang because mdmon cannot
be stopped? What other unit is waiting for it?

Even if the root filesystem is on LVM on IMSM, doesn't systemd chroot
back to the initramfs and then tear down the LVM and MD arrays?

NeilBrown
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file 2022-07-29 1:55 ` NeilBrown @ 2022-08-02 15:43 ` Mariusz Tkaczyk 2022-08-18 22:00 ` Michal Koutný 0 siblings, 1 reply; 16+ messages in thread From: Mariusz Tkaczyk @ 2022-08-02 15:43 UTC (permalink / raw) To: NeilBrown Cc: Coly Li, linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Xiao Ni On Fri, 29 Jul 2022 11:55:18 +1000 "NeilBrown" <neilb@suse.de> wrote: > On Thu, 28 Jul 2022, Mariusz Tkaczyk wrote: > > On Tue, 15 Feb 2022 21:34:15 +0800 > > Coly Li <colyli@suse.de> wrote: > > > > > For mdadm's systemd configuration, current systemd KillMode is "none" in > > > following service files, > > > - mdadm-grow-continue@.service > > > - mdmon@.service > > > > > > This "none" mode is strongly againsted by systemd developers (see man 5 > > > systemd.kill for "KillMode=" section), and is considering to remove in > > > future systemd version. > > > > > > As systemd developer explained in disuccsion, the systemd kill process > > > is, > > > 1. send the signal specified by KillSignal= to the list of processes (if > > > any), TERM is the default > > > 2. wait until either the target of process(es) exit or a timeout expires > > > 3. if the timeout expires send the signal specified by FinalKillSignal=, > > > KILL is the default > > > > > > For "control-group", all remaining processes will receive the SIGTERM > > > signal (by default) and if there are still processes after a period f > > > time, they will get the SIGKILL signal. > > > > > > For "mixed", only the main process will receive the SIGTERM signal, and > > > if there are still processes after a period of time, all remaining > > > processes (including the main one) will receive the SIGKILL signal. > > > > > > From the above comment, currently KillMode=control-group is a propervi > > > kill mode. 
Since control-gropu is the default kill mode, the fix can be > > > simply removing KillMode=none line from the service file, then the > > > default mode will take effect. > > > > Hi All, > > We are experiencing issues with IMSM metadata on RHEL8.7 and 9.1 (the patch > > was picked by Redhat). There are several issues which results in hang task, > > characteristic to missing mdmon: > > > > [ 619.521440] task:umount state:D stack: 0 pid: 6285 ppid: flags:0x00004084 > > [ 619.534033] Call Trace: > > [ 619.539980] __schedule+0x2d1/0x830 > > [ 619.547056] ? finish_wait+0x80/0x80 > > [ 619.554261] schedule+0x35/0xa0 > > [ 619.560999] md_write_start+0x14b/0x220 > > [ 619.568492] ? finish_wait+0x80/0x80 > > [ 619.575649] raid1_make_request+0x3c/0x90 [raid1] > > [ 619.584111] md_handle_request+0x128/0x1b0 > > [ 619.591891] md_make_request+0x5b/0xb0 > > [ 619.599235] generic_make_request_no_check+0x202/0x330 > > [ 619.608185] submit_bio+0x3c/0x160 > > [ 619.615161] ? bio_add_page+0x42/0x50 > > [ 619.622413] submit_bh_wbc+0x16a/0x190 > > [ 619.629713] jbd2_write_superblock+0xf4/0x210 [jbd2] > > [ 619.638340] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2] > > [ 619.647773] __jbd2_update_log_tail+0x3f/0x100 [jbd2] > > [ 619.656374] jbd2_cleanup_journal_tail+0x50/0x90 [jbd2] > > [ 619.665107] jbd2_log_do_checkpoint+0xfa/0x400 [jbd2] > > [ 619.673572] ? prepare_to_wait_event+0xa0/0x180 > > [ 619.681344] jbd2_journal_destroy+0x120/0x2a0 [jbd2] > > [ 619.689551] ? 
finish_wait+0x80/0x80 > > [ 619.696096] ext4_put_super+0x76/0x390 [ext4] > > [ 619.703584] generic_shutdown_super+0x6c/0x100 > > [ 619.711065] kill_block_super+0x21/0x50 > > [ 619.717809] deactivate_locked_super+0x34/0x70 > > [ 619.725146] cleanup_mnt+0x3b/0x70 > > [ 619.731279] task_work_run+0x8a/0xb0 > > [ 619.737576] exit_to_usermode_loop+0xeb/0xf0 > > [ 619.744657] do_syscall_64+0x198/0x1a0 > > [ 619.751155] entry_SYSCALL_64_after_hwframe+0x65/0xca > > > > It can be reproduced by mounting LVM created on IMSM RAID1 array and then > > reboot. I verified that reverting the patch fixes the issue. > > > > I understand that from systemd perspective the behavior in not wanted, but > > this is exactly what we need, to have working mdmon process even if systemd > > was stopped. KillMode=none does the job. > > I searched for alternative way to prevent systemd from stopping the mdmon > > unit but I failed. I tried to change signals, so I configured unit to send > > SIGPIPE (because it is ignored by mdmon)- it worked but later system hanged > > because mdmon unit cannot be stopped. > > > > I also tried to configure mdmon unit to be stopped after umount.target and I > > failed too. It cannot be achieved by setting After= or Before=. The one > > objection I have here is that systemd-shutdown tries to stop raid arrays > > later, so it could be better to have running mdmon there. > > > > IMO KillMode=none is desired in this case. Later, mdmon is restarted in > > dracut by mdraid module. > > > > If there is no other solution for the problem, I will need to ask Jes to > > revert this patch. For now, I asked Redhat to do it. > > Do you have any suggestions? > > We should be able to make this work. > We don't need mdmon after the last array stops, and we should have > dependencies to tell systemd that the various arrays require mdmon. > Ideally systemd wouldn't even try to stop mdmon until the relevant array > was stopped. 
>
> Can we change the udev rule to tell systemd that the device WANTS
> mdmon@foo.service??

Hi Neil,

This is done already:
https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/udev-md-raid-arrays.rules#n41
but I can't find the Wants dependency in:
#systemctl show dev-md126.service
#systemctl show dev-md127.service

According to the man page:
https://www.freedesktop.org/software/systemd/man/systemd.device.html
there is nothing else I can do.

> Or add "Before=sys-devices-md-%I.device" or something like that to
> mdmon@.service ??
>

I got:
systemd[1]: /usr/lib/systemd/system/mdmon@.service:11: Failed to resolve unit
specifiers in 'dev-%I.device', ignoring: Invalid slot

> Do you know what exactly is causing systemd to hang because mdmon cannot
> be stopped? What other unit is waiting for it?

There is a special umount.target:
https://www.freedesktop.org/software/systemd/man/systemd.special.html

Probably it tries to umount every existing .mount unit; I didn't check deeply.
https://www.freedesktop.org/software/systemd/man/systemd.mount.html

I can see that we can define something for .mount units, so I tried both:
# mount -o x-systemd.after=mdmon@md127.service /dev/mapper/vg0-lvm_raid /mnt
# mount -o x-systemd.requires=mdmon@md127.service /dev/mapper/vg0-lvm_raid /mnt

but it doesn't help either. It seems that it is ignored, because I cannot find
the mdmon dependency in the systemctl show output for the mnt.mount unit.

Do you have any other ideas?

>
> Even if the root filesystems is on LVM on IMSM, doesn't systemd chroot
> back to the initramfs and then tear down the LVM and MD arrays???

Yes, this is how it works; mdmon is restarted in the initrd later. The system
will reboot successfully after a timeout.

Thanks,
Mariusz
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
  2022-08-02 15:43 ` Mariusz Tkaczyk
@ 2022-08-18 22:00 ` Michal Koutný
  2022-08-24  9:52 ` Mariusz Tkaczyk
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Koutný @ 2022-08-18 22:00 UTC
To: Mariusz Tkaczyk
Cc: NeilBrown, Coly Li, linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Xiao Ni

Hello.

(Coming via
https://lists.freedesktop.org/archives/systemd-devel/2022-August/048201.html.)

On Tue, Aug 02, 2022 at 05:43:05PM +0200, Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote:
> https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/udev-md-raid-arrays.rules#n41
> but i can't find wants dependency in:
> #systemctl show dev-md126.service
> #systemctl show dev-md127.service

Typo here

s/service/device/

But the Wants dependency won't help with shutdown ordering.

> I got:
> systemd[1]: /usr/lib/systemd/system/mdmon@.service:11: Failed to resolve unit
> specifiers in 'dev-%I.device', ignoring: Invalid slot

What was your exact directive in service unit file and what was the
template parameter?
(This may not work though, since there'd be no stop job for .device unit
during shutdown to order against. (not tested))

> Probably it tries to umount every exiting .mount unit, i didn't check deeply.
> https://www.freedesktop.org/software/systemd/man/systemd.mount.html
>
> I can see that we can define something for .mount units so I tried both:
> # mount -o x-systemd.after=mdmon@md127.service /dev/mapper/vg0-lvm_raid /mnt
> # mount -o x-systemd.requires=mdmon@md127.service /dev/mapper/vg0-lvm_raid /mnt
>
> but I doesn't help either. I seems that it is ignored because I cannot find
> mdmon dependency in systemctl show output for mnt.mount unit.

These x-* options are parsed from fstab. If you mount manually like
this, systemd won't learn about these non-kernel options (they don't get
through /proc/mountinfo).

Actually, I think if you add the .mount:After=mdmon@....service
(via fstab), it should properly order the stop of mdmon after the
particular unmount during shutdown.

HTH,
Michal
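The fstab route suggested above could look roughly like this. It is a sketch under assumptions, not a tested fix: the fstab_line helper is hypothetical, the device, mount point and mdmon instance are taken from the mount commands quoted in this thread, ext4 is assumed because the hung-task trace shows ext4, and the "defaults" option and the "0 2" dump/pass fields are placeholders. Only x-systemd.after= itself comes from the systemd.mount documentation.

```shell
#!/bin/sh
# Sketch: build an /etc/fstab entry that makes the generated .mount unit
# ordered After= the given mdmon instance, so the unmount is stopped before
# mdmon at shutdown. fstab_line is a hypothetical helper.
# Usage: fstab_line DEVICE MOUNTPOINT FSTYPE UNIT
fstab_line() {
    printf '%s\t%s\t%s\tdefaults,x-systemd.after=%s\t0\t2\n' "$1" "$2" "$3" "$4"
}

# Print the entry for the volume used in this thread.
fstab_line /dev/mapper/vg0-lvm_raid /mnt ext4 mdmon@md127.service

# After adding the printed line to /etc/fstab as root, the result could be
# checked with:
#   systemctl daemon-reload
#   systemctl show mnt.mount -p After
```

If the option is picked up, mdmon@md127.service should appear in the After= list of mnt.mount, which it did not when the option was passed to a manual mount.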
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
  2022-08-18 22:00 ` Michal Koutný
@ 2022-08-24  9:52 ` Mariusz Tkaczyk
  2022-08-24 12:03 ` Michal Koutný
  0 siblings, 1 reply; 16+ messages in thread
From: Mariusz Tkaczyk @ 2022-08-24 9:52 UTC
To: Michal Koutný
Cc: NeilBrown, Coly Li, linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Xiao Ni

Hi Michal,
Thank you for your support.

On Fri, 19 Aug 2022 00:00:47 +0200
Michal Koutný <mkoutny@suse.com> wrote:

> Hello.
>
> (Coming via
> https://lists.freedesktop.org/archives/systemd-devel/2022-August/048201.html.)
>
> On Tue, Aug 02, 2022 at 05:43:05PM +0200, Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> > https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/udev-md-raid-arrays.rules#n41
> > but i can't find wants dependency in:
> > #systemctl show dev-md126.service
> > #systemctl show dev-md127.service
>
> Typo here
>
> s/service/device/
>
> But the Wants dependency won't help with shutdown ordering.
>
> > I got:
> > systemd[1]: /usr/lib/systemd/system/mdmon@.service:11: Failed to resolve
> > unit specifiers in 'dev-%I.device', ignoring: Invalid slot
>
> What was your exact directive in service unit file and what was the
> template parameter?
> (This may not work though, since there'd be no stop job for .device unit
> during shutdown to order against. (not tested))

I removed those settings, but it was something like:

Before=initrd-switch-root.target dev-%I.device

I can test more if you have suggestions.

>
> > Probably it tries to umount every exiting .mount unit, i didn't check
> > deeply. https://www.freedesktop.org/software/systemd/man/systemd.mount.html
> >
> > I can see that we can define something for .mount units so I tried both:
> > # mount -o x-systemd.after=mdmon@md127.service /dev/mapper/vg0-lvm_raid /mnt
> > # mount -o x-systemd.requires=mdmon@md127.service /dev/mapper/vg0-lvm_raid
> > /mnt
> >
> > but I doesn't help either. I seems that it is ignored because I cannot find
> > mdmon dependency in systemctl show output for mnt.mount unit.
>
> These x-* options are parsed from fstab. If you mount manually like
> this, systemd won't learn about these non-kernel options (they don't get
> through /proc/mountinfo).
>
> Actually, I think if you add the .mount:After=mdmon@....service
> (via fstab), it should properly order the stop of mdmon after the
> particular unmount during shutdown.
>

I will check, but it can only be considered a workaround, not a solution. VROC
arrays are automatically configured in installers, and users may also mount
them manually, without any additional settings (as a standalone disk). We need
to resolve it globally.

Thanks,
Mariusz
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
  2022-08-24  9:52 ` Mariusz Tkaczyk
@ 2022-08-24 12:03 ` Michal Koutný
  2022-08-24 12:57 ` Mariusz Tkaczyk
  0 siblings, 1 reply; 16+ messages in thread
From: Michal Koutný @ 2022-08-24 12:03 UTC
To: Mariusz Tkaczyk
Cc: NeilBrown, Coly Li, linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Xiao Ni

On Wed, Aug 24, 2022 at 11:52:39AM +0200, Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote:
> I removed those setting but it was something like:
>
> Before=initrd-switch-root.target dev-%I.device
>
> I can test more if you have suggestions.

Sorry, I realize it won't work, device deps are restricted [1]. (I
considered relaxing that [2] in order to terminate loop devs properly.)

> Will check but it can be considered as workaround, not as a solution. VROC
> arrays are automatically configured in installers, also users may mount them
> manually, without any additional settings (as standalone disk). We need to
> resolve it globally.

It's not the only setup when a device requires a userspace daemon.
There is a generic solution for root devices [3] (when the daemon is
marked to run indefinitely).

The device job ordering dependencies during shutdown would need better
handling in systemd. (But I don't understand how much mdmon@.service is
necessary for device existence and teardown.)

HTH,
Michal

[1] https://github.com/systemd/systemd/blob/98f3e84342dbb9da48ffa22bfdf122bdae4da1c6/src/core/unit.c#L3101
[2] https://github.com/Werkov/systemd/commit/bdaa49d34e78981f3535c42ec19ac0f314135c07
    (forked repo, not an upstream commit)
[3] https://systemd.io/ROOT_STORAGE_DAEMONS/
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
  2022-08-24 12:03 ` Michal Koutný
@ 2022-08-24 12:57 ` Mariusz Tkaczyk
  2022-08-29 16:19 ` Michal Koutný
  0 siblings, 1 reply; 16+ messages in thread
From: Mariusz Tkaczyk @ 2022-08-24 12:57 UTC
To: Michal Koutný
Cc: NeilBrown, Coly Li, linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Xiao Ni

On Wed, 24 Aug 2022 14:03:25 +0200
Michal Koutný <mkoutny@suse.com> wrote:

> On Wed, Aug 24, 2022 at 11:52:39AM +0200, Mariusz Tkaczyk
> <mariusz.tkaczyk@linux.intel.com> wrote:
> > I removed those setting but it was something like:
> >
> > Before=initrd-switch-root.target dev-%I.device
> >
> > I can test more if you have suggestions.
>
> Sorry, I realize it won't work, device deps are restricted [1]. (I
> considered relaxing that [2] in order to terminate loop devs properly.)
>
> > Will check but it can be considered as workaround, not as a solution. VROC
> > arrays are automatically configured in installers, also users may mount them
> > manually, without any additional settings (as standalone disk). We need to
> > resolve it globally.
>
> It's not the only setup when a device requires a userspace daemon.
> There is a generic solution for root devices [3] (when the daemon is
> marked to run indefinitely).

Yes, I know that trick, and we are setting '@' to prevent systemd from killing
it [1], but we do restart the mdmon@ service after switch root. This is the
simplest way to reopen descriptors. We can try to change that.

It would be great if you could prove that the mechanism really works. Do you
know any project which really uses this functionality?

>
> The device job ordering dependencies during shutdown would need better
> handling in systemd. (But I don't understand how much
> mdmon@.serice is necessary for device existence and teardown.)
>

We need to handle the dirty/clean transition. On shutdown, when umount is
requested, the filesystem may flush in-flight data, and the kernel then waits
for mdmon to acknowledge the change in the metadata [2].

Thanks,
Mariusz

[1] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/mdmon.c#n342
[2] https://git.kernel.org/pub/scm/utils/mdadm/mdadm.git/tree/mdmon-design.txt
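The dependency described above (the kernel blocking writes until the metadata handler acknowledges the clean-to-dirty change) can be observed through the md sysfs attribute array_state, where the kernel's md documentation describes "write-pending" as clean with writes blocked until "active" is written. The following is only a diagnostic sketch: check_md_state is a hypothetical helper, and the optional argument for an alternative sysfs root is an assumption added so the function can be tried against a fake directory tree.

```shell
#!/bin/sh
# Sketch: report the array_state of every md device under a sysfs root and
# flag arrays stuck in "write-pending", the state in which writes wait for
# the external metadata handler (mdmon). check_md_state is hypothetical.
# Usage: check_md_state [SYSFS_ROOT]   (defaults to /sys)
check_md_state() {
    root="${1:-/sys}"
    for f in "$root"/block/md*/md/array_state; do
        [ -r "$f" ] || continue           # no md devices, or unreadable
        dev=${f#"$root"/block/}           # strip the leading path
        dev=${dev%%/*}                    # keep only the device name
        state=$(cat "$f")
        if [ "$state" = "write-pending" ]; then
            echo "$dev: write-pending (writes blocked until the array is marked active)"
        else
            echo "$dev: $state"
        fi
    done
}

check_md_state
```

An array that stays in write-pending while mdmon is no longer running would be consistent with the md_write_start hang shown in the trace earlier in the thread.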
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file
  2022-08-24 12:57 ` Mariusz Tkaczyk
@ 2022-08-29 16:19 ` Michal Koutný
  0 siblings, 0 replies; 16+ messages in thread
From: Michal Koutný @ 2022-08-29 16:19 UTC
To: Mariusz Tkaczyk, systemd-devel
Cc: NeilBrown, Coly Li, linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Xiao Ni

Hello.

On Wed, Aug 24, 2022 at 02:57:56PM +0200, Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote:
> It will be great if you can really prove that the mechanism is working. Do you
> know any project which really uses this functionality?

My knee-jerk response would be open-iscsi daemon but I know that this
one in particular works without '@', so I can't answer your query.

(But generally, it would only protect against the "global" killing upon
initrd transitions, not the killing of a single unit. That's likely what
you run into first during shutdown.)

Let me cross-post (back [1]) to systemd-devel ML.

> We need to handle dirty clean transaction. On shutdown, when umount is
> requested them filesystem could flush in flight data, and them kernel is
> waiting for mdmon to acknowledge the change in metadata[2].

So, technically, you'd want to order the mdmon service wrt .mount unit.
But that's unfortunately not known when mdmon@ starts based on a udev
rule.

Therefore, I suspect removal of KillMode=none would need some version of
[2] to accommodate such device-service orderings.

Michal

[1] In-Reply-To: https://lists.freedesktop.org/archives/systemd-devel/2022-August/048201.html
[1] In-Reply-To: https://lore.kernel.org/r/20220824145756.000048f8@linux.intel.com/
[2] https://github.com/Werkov/systemd/commit/bdaa49d34e78981f3535c42ec19ac0f314135c07
* Re: [PATCH] mdadm/systemd: remove KillMode=none from service file 2022-07-28 7:55 ` Mariusz Tkaczyk 2022-07-28 8:39 ` Coly Li 2022-07-29 1:55 ` NeilBrown @ 2022-10-04 10:24 ` Mariusz Tkaczyk 2 siblings, 0 replies; 16+ messages in thread From: Mariusz Tkaczyk @ 2022-10-04 10:24 UTC (permalink / raw) To: Coly Li Cc: linux-raid, Benjamin Brunner, Franck Bui, Jes Sorensen, Neil Brown, Xiao Ni, Michal Koutný On Thu, 28 Jul 2022 09:55:35 +0200 Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com> wrote: > On Tue, 15 Feb 2022 21:34:15 +0800 > Coly Li <colyli@suse.de> wrote: > > > For mdadm's systemd configuration, current systemd KillMode is "none" in > > following service files, > > - mdadm-grow-continue@.service > > - mdmon@.service > > > > This "none" mode is strongly againsted by systemd developers (see man 5 > > systemd.kill for "KillMode=" section), and is considering to remove in > > future systemd version. > > > > As systemd developer explained in disuccsion, the systemd kill process > > is, > > 1. send the signal specified by KillSignal= to the list of processes (if > > any), TERM is the default > > 2. wait until either the target of process(es) exit or a timeout expires > > 3. if the timeout expires send the signal specified by FinalKillSignal=, > > KILL is the default > > > > For "control-group", all remaining processes will receive the SIGTERM > > signal (by default) and if there are still processes after a period f > > time, they will get the SIGKILL signal. > > > > For "mixed", only the main process will receive the SIGTERM signal, and > > if there are still processes after a period of time, all remaining > > processes (including the main one) will receive the SIGKILL signal. > > > > From the above comment, currently KillMode=control-group is a proper > > kill mode. Since control-gropu is the default kill mode, the fix can be > > simply removing KillMode=none line from the service file, then the > > default mode will take effect. 
>
> Hi All,
> We are experiencing issues with IMSM metadata on RHEL8.7 and 9.1 (the patch
> was picked up by Red Hat). There are several issues which result in hung
> tasks, characteristic of a missing mdmon:
>
> [ 619.521440] task:umount state:D stack: 0 pid: 6285 ppid: flags:0x00004084
> [ 619.534033] Call Trace:
> [ 619.539980] __schedule+0x2d1/0x830
> [ 619.547056] ? finish_wait+0x80/0x80
> [ 619.554261] schedule+0x35/0xa0
> [ 619.560999] md_write_start+0x14b/0x220
> [ 619.568492] ? finish_wait+0x80/0x80
> [ 619.575649] raid1_make_request+0x3c/0x90 [raid1]
> [ 619.584111] md_handle_request+0x128/0x1b0
> [ 619.591891] md_make_request+0x5b/0xb0
> [ 619.599235] generic_make_request_no_check+0x202/0x330
> [ 619.608185] submit_bio+0x3c/0x160
> [ 619.615161] ? bio_add_page+0x42/0x50
> [ 619.622413] submit_bh_wbc+0x16a/0x190
> [ 619.629713] jbd2_write_superblock+0xf4/0x210 [jbd2]
> [ 619.638340] jbd2_journal_update_sb_log_tail+0x65/0xc0 [jbd2]
> [ 619.647773] __jbd2_update_log_tail+0x3f/0x100 [jbd2]
> [ 619.656374] jbd2_cleanup_journal_tail+0x50/0x90 [jbd2]
> [ 619.665107] jbd2_log_do_checkpoint+0xfa/0x400 [jbd2]
> [ 619.673572] ? prepare_to_wait_event+0xa0/0x180
> [ 619.681344] jbd2_journal_destroy+0x120/0x2a0 [jbd2]
> [ 619.689551] ? finish_wait+0x80/0x80
> [ 619.696096] ext4_put_super+0x76/0x390 [ext4]
> [ 619.703584] generic_shutdown_super+0x6c/0x100
> [ 619.711065] kill_block_super+0x21/0x50
> [ 619.717809] deactivate_locked_super+0x34/0x70
> [ 619.725146] cleanup_mnt+0x3b/0x70
> [ 619.731279] task_work_run+0x8a/0xb0
> [ 619.737576] exit_to_usermode_loop+0xeb/0xf0
> [ 619.744657] do_syscall_64+0x198/0x1a0
> [ 619.751155] entry_SYSCALL_64_after_hwframe+0x65/0xca
>
> It can be reproduced by mounting an LVM volume created on an IMSM RAID1
> array and then rebooting. I verified that reverting the patch fixes the
> issue.
>
> I understand that from the systemd perspective this behavior is not wanted,
> but it is exactly what we need: a working mdmon process even after systemd
> has been stopped. KillMode=none does the job.
> I searched for an alternative way to prevent systemd from stopping the mdmon
> unit, but I failed. I tried changing the signals, so I configured the unit to
> send SIGPIPE (because it is ignored by mdmon). It worked, but later the
> system hung because the mdmon unit could not be stopped.
>
> I also tried to configure the mdmon unit to be stopped after umount.target,
> and I failed too. It cannot be achieved by setting After= or Before=. The one
> objection I have here is that systemd-shutdown tries to stop raid arrays
> later, so it could be better to have mdmon running there.
>
> IMO KillMode=none is desired in this case. Later, mdmon is restarted in
> dracut by the mdraid module.
>
> If there is no other solution for the problem, I will need to ask Jes to
> revert this patch. For now, I asked Red Hat to do it.
> Do you have any suggestions?
>

Hi all,
I would like to recommend reverting this for now. Fixing it does not seem to
be trivial, and we need more time. For user experience it will be better to
have upstream working.

Jes, could you please revert this patch for now?

Thanks,
Mariusz
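
For anyone hitting the hang before a revert is available, the old behavior
can in principle be restored locally with a systemd drop-in, without editing
the packaged unit. The fragment below is only a sketch and has not been
tested on the affected releases. It assumes the unit is installed as
mdmon@.service; the file name killmode.conf is a hypothetical choice. systemd
logs a warning for KillMode=none, because the setting is deprecated.

```ini
# /etc/systemd/system/mdmon@.service.d/killmode.conf
# (hypothetical file name; any *.conf file in this drop-in directory is read)
[Service]
# Restore the setting removed by the patch, so that systemd does not
# signal mdmon when the unit is stopped.
KillMode=none
```

After adding the file, run "systemctl daemon-reload" so that systemd picks up
the override. This covers only units started from the root filesystem; an
mdmon instance started from the initramfs uses the unit file shipped there.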
end of thread, other threads:[~2022-10-04 10:24 UTC | newest]

Thread overview: 16+ messages
2022-02-15 13:34 [PATCH] mdadm/systemd: remove KillMode=none from service file Coly Li
2022-04-06  6:36 ` Xiao Ni
2022-04-06 13:35 ` Jes Sorensen
2022-07-28  7:55 ` Mariusz Tkaczyk
2022-07-28  8:39 ` Coly Li
2022-07-28  9:01 ` Mariusz Tkaczyk
2022-07-28 10:55 ` Coly Li
2022-07-29  7:55 ` Mariusz Tkaczyk
2022-07-29  1:55 ` NeilBrown
2022-08-02 15:43 ` Mariusz Tkaczyk
2022-08-18 22:00 ` Michal Koutný
2022-08-24  9:52 ` Mariusz Tkaczyk
2022-08-24 12:03 ` Michal Koutný
2022-08-24 12:57 ` Mariusz Tkaczyk
2022-08-29 16:19 ` Michal Koutný
2022-10-04 10:24 ` Mariusz Tkaczyk