linux-cifs.vger.kernel.org archive mirror
* Re: rsync copy operation fails on a CIFS mount
From: SZIGETVÁRI János @ 2021-08-08 20:33 UTC
  To: linux-cifs

Dear Members,

I work for a company that (among other things) sells Ubuntu-based log
storage appliances. I have run into a problem while copying a large
amount of data from an Ubuntu 18.04-based appliance to a CIFS mount
served by a Windows 2012 R2 Storage Server: rsync fails one to three
hours into the copy operation with an error like:
rsync: failed to set times on "FILENAME.U5EgGX": No such device (19)

Just before that, I see a number of kernel log messages like this:

kernel: [3819786.441711] CIFS VFS: No task to wake, unknown frame received! NumMids 1
kernel: [3819786.441717] 00000000: 6c000000 424d53fe 00000040 00000000  ...l.SMB@.......
kernel: [3819786.441718] 00000010: 00000012 00000001 00000000 ffffffff  ................
kernel: [3819786.441720] 00000020: ffffffff 00000000 00000000 00000000  ................
kernel: [3819786.441721] 00000030: 00000000 00000000 00000000 00000000  ................
kernel: [3819786.441722] 00000040: 00000000
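For reference, the dumped frame can be decoded against the SMB2 sync header layout in Microsoft's [MS-SMB2] specification. The Python sketch below assumes the kernel printed each 4-byte group as a little-endian 32-bit word (the ASCII column, "...l.SMB@", is consistent with that). Under that assumption the frame is a server-to-client SMB2 OPLOCK_BREAK (command 0x0012) with MessageId 0xFFFFFFFFFFFFFFFF, the value the specification uses for unsolicited break notifications. Such a notification is not a reply to any outstanding request, which fits Steve's remark below about races between close and oplock break. The dump stops after the 64-byte header, so the remaining 44 bytes of the 108-byte frame are not shown. The function and constant names are illustrative only.

```python
import struct

# Hexdump words copied from the kernel log above. Assumption: the kernel
# printed each 4-byte group as a little-endian 32-bit value, so repacking
# them little-endian recovers the original wire bytes.
DUMP_WORDS = [
    0x6c000000, 0x424d53fe, 0x00000040, 0x00000000,
    0x00000012, 0x00000001, 0x00000000, 0xffffffff,
    0xffffffff, 0x00000000, 0x00000000, 0x00000000,
    0x00000000, 0x00000000, 0x00000000, 0x00000000,
    0x00000000,
]

# SMB2 sync header ([MS-SMB2] "SMB2 Packet Header - SYNC"): 64 bytes, little-endian.
SMB2_HDR = struct.Struct("<4sHHIHHIIQIIQ16s")

SMB2_COMMANDS = {0x0012: "OPLOCK_BREAK"}  # only the command seen in this dump
SMB2_FLAGS_SERVER_TO_REDIR = 0x00000001


def decode(words):
    """Decode the transport length prefix and SMB2 header from the dump words."""
    raw = b"".join(struct.pack("<I", w) for w in words)
    # First 4 bytes: direct-TCP framing (a zero byte, then a 24-bit big-endian length).
    frame_len = int.from_bytes(raw[1:4], "big")
    (proto, struct_size, _credit_charge, status, command, _credits,
     flags, _next_cmd, message_id, _reserved, tree_id, session_id,
     _signature) = SMB2_HDR.unpack_from(raw, 4)
    return {
        "frame_len": frame_len,
        "protocol": proto,
        "structure_size": struct_size,
        "status": status,
        "command": SMB2_COMMANDS.get(command, hex(command)),
        "from_server": bool(flags & SMB2_FLAGS_SERVER_TO_REDIR),
        "message_id": message_id,
        "tree_id": tree_id,
        "session_id": session_id,
    }


if __name__ == "__main__":
    for key, value in decode(DUMP_WORDS).items():
        print(f"{key:15} {value!r}")
```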

I also searched around for a while and found the exact same packet
hexdump on the linux-cifs mailing list from 2012:
https://www.spinics.net/lists/linux-cifs/msg06634.html

In that thread, the person reporting the problem could no longer
reproduce it a few weeks after reporting it. The recommendation there
was to try mounting the share with SMB1, but that is out of the
question nowadays.

We tried forcing the mount to use vers=3 and vers=3.02, but it made
no difference: the error still recurred.

Best Regards,
János Szigetvári
--
Janos SZIGETVARI
RHCE, License no. 150-053-692

LinkedIn: linkedin.com/in/janosszigetvari

__@__˚V˚
Make the switch to open (source) applications, protocols, formats now:
- windows -> Linux, iexplore -> Firefox, msoffice -> LibreOffice
- msn -> jabber protocol (Pidgin, Google Talk)
- mp3 -> ogg, wmv -> ogg, jpg -> png, doc/xls/ppt -> odt/ods/odp


* Re: rsync copy operation fails on a CIFS mount
From: Steve French @ 2021-08-08 20:56 UTC
  To: SZIGETVÁRI János; +Cc: CIFS, rohiths msft

As Rohith has been investigating, there are cases where
   "CIFS VFS: No task to wake, unknown frame received!"
is benign (e.g. races between close and an oplock break), but some
additional information would help in investigating your storage problem.

Is this a large file workload, or lots of small files in directories,
or deep directory trees?

Large directories and deep directory trees can cause the Linux VFS
layer to issue many revalidate requests once the default 1 second
metadata caching timeout expires (cifs.ko is stricter than some other
file systems in defaulting to 1 second). Especially if this is the
only client likely to update the files while they are being backed
up, setting actimeo much higher (e.g. actimeo=30) could help.

In general, SMB3.1.1 is MUCH faster with a reasonably current Ubuntu
(make sure you have updated your Ubuntu kernel to at least 5.4,
although AFAIK current Ubuntu ships a 5.8 or 5.11 kernel now). It is
VERY important to use a kernel more recent than 5.3, not just because
of the many bugfixes but also because GCM encryption (much faster)
for SMB3.1.1 was added in the 5.3 kernel.

There are very few cases where you want to mount with something other
than the default, which for modern servers is almost always SMB3.1.1
(either by default or explicitly with "vers=3.1.1"), since its GCM
encryption is faster. There are a few cases where the SMB1 POSIX
Extensions can be helpful, but there are tradeoffs: the older SMB1 is
much less secure, and many other features are missing from the 20+
year old protocol. Use the default SMB3.1.1 if possible.

Other parameters, e.g. "nostrictsync", can also be helpful in some
cases.

Also consider whether rsync is what you want to use. rsync picks an
unfortunately small default I/O size, and its maximum I/O size is also
terrible. This is less of an issue if you are using the default
caching (which goes through the Linux page cache and so aggregates I/O
into larger chunks), but for a network FS you really want I/O of 1MB
or larger (SMB2+ will typically default to 4MB or 1MB I/O, and even
NFSv4+ will typically default to 1MB).
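To put the I/O size point in numbers, the Python sketch below counts how many requests it takes to move one large file at different I/O sizes. The 100 GiB file size, the 64 KiB "small" I/O size and the 1 ms per-request round trip are hypothetical values chosen for illustration, not measurements of rsync or cifs.ko. The latency figure also assumes every request waits for the previous one; real clients overlap requests, so it overstates the cost, but the ratio between I/O sizes still holds.

```python
KIB = 1024
MIB = 1024 * KIB
GIB = 1024 * MIB


def requests_needed(file_size: int, io_size: int) -> int:
    """Requests needed to transfer file_size bytes at io_size bytes each (ceiling division)."""
    return (file_size + io_size - 1) // io_size


def serialized_latency_s(requests: int, rtt_s: float) -> float:
    """Time spent waiting on round trips if each request waited for the previous one."""
    return requests * rtt_s


if __name__ == "__main__":
    file_size = 100 * GIB  # hypothetical large file
    rtt_s = 0.001          # hypothetical 1 ms round trip per request
    for label, io_size in (("64 KiB", 64 * KIB), ("1 MiB", MIB), ("4 MiB", 4 * MIB)):
        n = requests_needed(file_size, io_size)
        print(f"{label:>6}: {n:>9,} requests, ~{serialized_latency_s(n, rtt_s):,.0f} s of round trips")
```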

If you are able to try newer kernels (e.g. Ubuntu makes it very easy
to download and install packages from the Ubuntu website to update
older Ubuntu releases to the current 5.13 kernel for testing), give
5.13 a try and experiment with the new mount parameter "rasize", e.g.
setting it to 8MB, to see if that helps.
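The mount options suggested in this message (vers, actimeo, nostrictsync and, on 5.13+ kernels, rasize) can be combined into one -o string. The Python helper below is a hypothetical convenience function, not part of any existing tool. It assumes rasize takes a byte count, so 8MB becomes 8388608; check mount.cifs(8) on your system before relying on that. The server name and mount point in the printed command are placeholders.

```python
from typing import Optional

MIB = 1024 * 1024


def cifs_mount_options(
    vers: Optional[str] = None,     # None: let cifs.ko negotiate its default dialect
    actimeo: Optional[int] = None,  # attribute cache timeout, in seconds
    nostrictsync: bool = False,
    rasize: Optional[int] = None,   # readahead size, assumed to be in bytes (5.13+ kernels)
) -> str:
    """Build a comma-separated option string for `mount -t cifs ... -o <options>`."""
    opts = []
    if vers:
        opts.append(f"vers={vers}")
    if actimeo is not None:
        if actimeo < 0:
            raise ValueError("actimeo must be a non-negative number of seconds")
        opts.append(f"actimeo={actimeo}")
    if nostrictsync:
        opts.append("nostrictsync")
    if rasize is not None:
        opts.append(f"rasize={rasize}")
    return ",".join(opts)


if __name__ == "__main__":
    opts = cifs_mount_options(vers="3.1.1", actimeo=30, nostrictsync=True, rasize=8 * MIB)
    # //server/share and /mnt/backup are placeholders.
    print(f"mount -t cifs //server/share /mnt/backup -o {opts}")
```

Leaving vers unset keeps the default negotiation, which the message above recommends for modern servers.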

Another key thing to look at is whether reconnects are being
triggered, e.g. by bad application behavior (as we saw with scp
sometimes sending signals that accidentally killed the TCP
connection), by timeouts on the server, or by bugs that have been
fixed in more recent kernels. Check the number of reconnects in
/proc/fs/cifs/Stats; if it is increasing, it is worth focusing on
whether that is a server bug or a client bug caused by an older
kernel that is missing fixes.
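A small Python sketch for tracking that counter: take a snapshot of /proc/fs/cifs/Stats before and after a copy job (or poll it during one) and compare. It assumes the reconnect counters appear on a line of the form "N session M share reconnects"; check the exact wording in the Stats file on your own kernel before relying on it. The function names are hypothetical.

```python
import re
import time
from pathlib import Path
from typing import Tuple

STATS_PATH = Path("/proc/fs/cifs/Stats")

# Assumed format of the reconnect line, e.g. "0 session 0 share reconnects".
RECONNECT_RE = re.compile(r"^\s*(\d+) session (\d+) share reconnects", re.MULTILINE)


def parse_reconnects(stats_text: str) -> Tuple[int, int]:
    """Return (session_reconnects, share_reconnects) parsed from the Stats text."""
    match = RECONNECT_RE.search(stats_text)
    if match is None:
        raise ValueError("reconnect line not found; check the Stats format on this kernel")
    return int(match.group(1)), int(match.group(2))


def reconnect_delta(before: str, after: str) -> Tuple[int, int]:
    """Number of session and share reconnects between two Stats snapshots."""
    s0, t0 = parse_reconnects(before)
    s1, t1 = parse_reconnects(after)
    return s1 - s0, t1 - t0


def watch(interval_s: float = 60.0) -> None:
    """Print a line whenever the counters increase; runs until interrupted."""
    previous = STATS_PATH.read_text()
    while True:
        time.sleep(interval_s)
        current = STATS_PATH.read_text()
        sessions, shares = reconnect_delta(previous, current)
        if sessions or shares:
            print(f"{time.ctime()}: +{sessions} session, +{shares} share reconnects")
        previous = current


if __name__ == "__main__":
    if STATS_PATH.exists():
        print("current (session, share) reconnects:", parse_reconnects(STATS_PATH.read_text()))
    else:
        print(f"{STATS_PATH} not found (is cifs.ko loaded?)")
```

Running watch() in the background for the duration of an rsync job shows whether reconnects line up with the failures.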

-- 
Thanks,

Steve


* Re: rsync copy operation fails on a CIFS mount
From: SZIGETVÁRI János @ 2021-08-11 13:13 UTC
  To: linux-cifs

Dear Steve,

Thank you so much for the long list of recommendations and tips.
I will answer the questions I can answer right away, and will come
back to you on the others later.

On Sun, Aug 8, 2021 at 22:57, Steve French <smfrench@gmail.com> wrote:
> Is this a large file workload, or lots of small files in directories,
> or deep directory trees?

Yes, indeed. A directory typically has between 1,100 and 1,700 files,
with a total size of about 150 GB. Each directory contains one large
file of between 90 and 100 GB, and the remaining thousand-plus files
are about 40-50 MB each.
Based on recommendations I received on the samba mailing list, I
suggested some configuration changes that should reduce the number of
smaller files in a directory to about 1/8 of the current number. We'll
see what difference that change makes.
Ideally, a routine rsync job run copies one or two such directories.

> Large directories, and deep directory trees can result in the Linux VFS
> layer doing many revalidate requests beyond the default 1 second metadata
> caching timeout (cifs.ko is stricter than some other fs in defaulting to
> 1 second). Especially if this is the only client likely to update the
> files while backing them up then setting actimeo much higher (e.g.
> actimeo=30) could be helpful.

Thank you, we will try this option!
In our case, the machine offering the share serves only one client.
There is another client, but it uses a different, non-overlapping share.

> In general SMB3.1.1 is MUCH faster with reasonably current Ubuntu (make
> sure you have updated your Ubuntu to at least 5.4 although current Ubuntu
> AFAIK is 5.8 or 5.11 kernel now). It is VERY important to use a kernel
> more recent than 5.3, not just because of the many bugfixes but also
> because of the addition of GCM (much faster) encryption for SMB3.1.1 in
> 5.3 kernel.

As mentioned, our appliance is Ubuntu-based, but unfortunately it is
built on 18.04 LTS. We currently use a 4.15 kernel, which can be
considered old by now.

During our tests we tried mounting with SMB version 3.1.1, but the
mount did not succeed. In our tests, 3.02 was the latest version the
mount succeeded with. It may be that SMB 3.1.1 support was missing
from the 4.15 Linux kernel, or that it was not available on the
Windows 2012 Storage Edition server that was providing the share.
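The second explanation matches Microsoft's documented dialect history: SMB 3.0.2 was introduced with Windows Server 2012 R2, and SMB 3.1.1 first shipped with Windows 10 and Windows Server 2016. The simplified Python model below shows why forcing a single newer dialect then fails. It assumes that an explicit vers= value such as vers=3.1.1 makes the client offer only that one dialect, and that the server picks the highest dialect both sides have in common; it is not how cifs.ko or Windows actually implement negotiation. The dialect codes are the revision numbers carried in an [MS-SMB2] NEGOTIATE request.

```python
from typing import Iterable, Optional

# Dialect revision codes as carried in an SMB2 NEGOTIATE request ([MS-SMB2]).
DIALECTS = {"2.0.2": 0x0202, "2.1": 0x0210, "3.0": 0x0300, "3.0.2": 0x0302, "3.1.1": 0x0311}

# Assumption: a Windows Server 2012 R2 share supports SMB2/3 dialects up to 3.0.2.
WIN2012R2 = [DIALECTS[d] for d in ("2.0.2", "2.1", "3.0", "3.0.2")]


def negotiate(client_offers: Iterable[int], server_supports: Iterable[int]) -> Optional[int]:
    """Simplified model: highest dialect both sides support, or None if there is none."""
    common = set(client_offers) & set(server_supports)
    return max(common) if common else None


def dialect_name(code: Optional[int]) -> str:
    """Human-readable result of a negotiation."""
    if code is None:
        return "no common dialect (mount fails)"
    return next(name for name, value in DIALECTS.items() if value == code)


if __name__ == "__main__":
    # Assumption: vers=X makes the client offer only dialect X.
    for vers in ("3.0", "3.0.2", "3.1.1"):
        print(f"vers={vers}: {dialect_name(negotiate([DIALECTS[vers]], WIN2012R2))}")
```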

> There are other parameters that can be helpful e.g. "nostrictsync" in
> some cases.

Okay, we'll make sure to try that one too.

> Also consider whether rsync is what you want to use - rsync picks an
> unfortunately small default I/O size and its maximum I/O size is also
> terrible - this is less of an issue if you are using the default caching
> (which goes through the Linux page cache so will aggregate I/O into larger
> chunks) but for a network FS you really want I/O 1MB or larger (SMB2+ will
> typically default to 4MB or 1MB I/O and even NFSv4+ will typically default
> to 1MB)

I believe we use the default caching, but I'll look into it.

> If you have the ability to try newer kernels (e.g. Ubuntu makes it very
> easy to download install packages to update to the more current 5.13 kernel
> from the Ubuntu website for testing on older Ubuntu) - give 5.13 a try and
> experiment with the new mount parm ("rasize") e.g. setting it to 8MB and see
> if that helps.

Unfortunately, doing that is not so easy in our case, but we may give
some thought to whether and how it could be done.

> Another key thing to look at is whether there are reconnects being
> triggered (e.g. by bad app behavior - like we saw with scp sometimes sending
> signals accidentally killing the TCP network connection, or by timeouts on
> the server, or bugs that have been fixed in more recent kernels). See the
> number of reconnects in /proc/fs/cifs/Stats and if it is increasing then
> focusing on whether that is a server bug, or a bug due to an older kernel
> on the client which is missing fixes can be useful.

Okay, the next time we do some testing, I will monitor that counter as well.

Thank you so much Steve for your help!

Best Regards,
János Szigetvári
