RE: [PATCH v2 1/4] Tools: hv: Reopen the devices if read() or write() returns errors

From: Dexuan Cui <decui@microsoft.com>
To: Michael Kelley <mikelley@microsoft.com>,
	KY Srinivasan <kys@microsoft.com>,
	Haiyang Zhang <haiyangz@microsoft.com>,
	Stephen Hemminger <sthemmin@microsoft.com>,
	"sashal@kernel.org" <sashal@kernel.org>,
	Sasha Levin <Alexander.Levin@microsoft.com>,
	"linux-hyperv@vger.kernel.org" <linux-hyperv@vger.kernel.org>,
	vkuznets <vkuznets@redhat.com>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: RE: [PATCH v2 1/4] Tools: hv: Reopen the devices if read() or write() returns errors
Date: Thu, 23 Jan 2020 08:10:57 +0000	[thread overview]
Message-ID: <HK0P153MB01483385D79219EBCDE4037DBF0F0@HK0P153MB0148.APCP153.PROD.OUTLOOK.COM> (raw)
In-Reply-To: <DM5PR2101MB1047BBD2C8C7B382D77EF981D70C0@DM5PR2101MB1047.namprd21.prod.outlook.com>

> From: Michael Kelley <mikelley@microsoft.com>
> Sent: Wednesday, January 22, 2020 8:54 AM
> > ...
> > @@ -111,12 +113,16 @@ static int hv_copy_data(struct hv_do_fcopy
> *cpmsg)
> >  static int hv_copy_finished(void)
> >  {
> >  	close(target_fd);
> > +	target_fd = -1;
> > +	memset(target_fname, 0, sizeof(target_fname));
> 
> I'm not completely clear on why target_fd and target_fname need to
> be reset.  Could you add a comment with an explanation?  Also,

After checking the code again, now I think we indeed don't need
"target_fd = -1;", but we need to reset target_fname because the kernel
hv_utils hibernation patch fakes a CANCEL_FCOPY message upon suspend,
and later when the VM resumes back the hv_fcopy_daemon may need
to handle the message by calling hv_copy_cancel(), which tries to remove
a file specified by target_fname. 

If some file was copied successfully before suspend, currently 
hv_copy_finished() doesn't reset target_fname; so after resume, when
the daemon handles the faked CANCEL_FCOPY message, hv_copy_cancel()
removes the file unexpectedly.

This patch resets target_fname in hv_copy_finished() to avoid the described
issue. 

In case a file is not fully copied before suspend (which means hv_copy_finished()
is not called), we'd better also reset taret_fname hv_copy_cancel(), since we'd
better make sure we only try to remove the file once, when we suspend/resume
multiple times.

I'll add a comment in the next version of the patch.

> since target_fname is a null terminated string, it seems like
> target_fname[0] = 0 would be sufficient vs. zero'ing all 4096 bytes
> (PATH_MAX).

I agree. Will use this better method.

> > +reopen_fcopy_fd:
> > +	hv_copy_cancel();
> > +	in_handshake = 1;
> >  	fcopy_fd = open("/dev/vmbus/hv_fcopy", O_RDWR);
> >
> >  	if (fcopy_fd < 0) {
> > @@ -196,7 +205,8 @@ int main(int argc, char *argv[])
> >  		len = pread(fcopy_fd, &buffer, sizeof(buffer), 0);
> >  		if (len < 0) {
> >  			syslog(LOG_ERR, "pread failed: %s", strerror(errno));
> > -			exit(EXIT_FAILURE);
> > +			close(fcopy_fd);
> > +			goto reopen_fcopy_fd;
> 
> In this case and all similar cases in this patch, there may be some risk
> of getting stuck in a tight loop doing reopens if things are broken
> in some strange and bizarre way.   Having an absolute limit on the
> number of reopens is potentially too restrictive as it could limit the
> number of times a VM could be hibernated and resumed.  Ideally
> there could a simple rate limit on the reopens -- if it happens too frequently,
> go ahead and exit like the current code does.  Thoughts?

If there is a bug, IMO it's better to get stuck in a tight loop, so the user
can notice it more quickly by the "top" command. :-)

With the patch, the daemon can get stuck in the loop only when the daemon
successfully reopens the dev file and registers itself to the kernel, but fails
to handle one of the messages. IMO the chance is pretty small, and if that
happens, there must be a bug in the hv_utils driver we need to fix, so, again,
IMO it's better to make the bug more noticeable by the tight loop. :-)

If the daemon fails to reopen the dev file or fails to register it to the kernel,
the daemon still calls exit().

> >  static int vss_do_freeze(char *dir, unsigned int cmd)
> >  {
> > @@ -155,8 +157,11 @@ static int vss_operate(int operation)
> >  			continue;
> >  		}
> >  		error |= vss_do_freeze(ent->mnt_dir, cmd);
> > -		if (error && operation == VSS_OP_FREEZE)
> > -			goto err;
> > +		if (operation == VSS_OP_FREEZE) {
> > +			if (error)
> > +				goto err;
> > +			fs_frozen = true;
> > +		}
> >  	}
> >
> >  	endmntent(mounts);
> 
> Shortly after the above code, there's code specifically to
> do the root filesystem last.  It has the same error test as above,
> and it seems like it should also be setting fs_frozen = true if
> it is successful.

Yes, I missed that. Will add code for that.
 
> > @@ -167,6 +172,9 @@ static int vss_operate(int operation)
> >  			goto err;
> >  	}
> >
> > +	if (operation == VSS_OP_THAW && !error)
> > +		fs_frozen = false;
> > +
> >  	goto out;
> >  err:
> >  	save_errno = errno;
> > @@ -175,6 +183,7 @@ static int vss_operate(int operation)
> >  		endmntent(mounts);
> >  	}
> >  	vss_operate(VSS_OP_THAW);
> > +	fs_frozen = false;
> >  	/* Call syslog after we thaw all filesystems */
> >  	if (ent)
> >  		syslog(LOG_ERR, "FREEZE of %s failed; error:%d %s",
> > @@ -202,7 +211,7 @@ int main(int argc, char *argv[])
> >  	int	op;
> >  	struct hv_vss_msg vss_msg[1];
> >  	int daemonize = 1, long_index = 0, opt;
> > -	int in_handshake = 1;
> > +	int in_handshake;
> >  	__u32 kernel_modver;
> >
> >  	static struct option long_options[] = {
> > @@ -232,6 +241,10 @@ int main(int argc, char *argv[])
> >  	openlog("Hyper-V VSS", 0, LOG_USER);
> >  	syslog(LOG_INFO, "VSS starting; pid is:%d", getpid());
> >
> > +reopen_vss_fd:
> > +	if (fs_frozen)
> > +		vss_operate(VSS_OP_THAW);
> 
> Need to set fs_frozen = false after the above statement?

fs_frozen is set to false in vss_operate(), but there is a chance
vss_operate() may fail due to some reason, and hence fs_frozen may
remain to be true. I'll add the below code:

@@ -242,8 +242,14 @@ int main(int argc, char *argv[])
        syslog(LOG_INFO, "VSS starting; pid is:%d", getpid());

 reopen_vss_fd:
-       if (fs_frozen)
-               vss_operate(VSS_OP_THAW);
+       if (fs_frozen) {
+               if (vss_operate(VSS_OP_THAW) || fs_frozen) {
+                       syslog(LOG_ERR, "failed to thaw file system: err=%d",
+                              errno);
+                       exit(EXIT_FAILURE);
+               }
+       }
+
        in_handshake = 1;
        vss_fd = open("/dev/vmbus/hv_vss", O_RDWR);

Thanks,
-- Dexuan