From: sean finney <seanius-ADwgVSpYHhHR7s880joybQ@public.gmane.org>
To: linux-cifs-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
	samba-technical-w/Ol4Ecudpl8XjKLYN78aQ@public.gmane.org
Subject: Intermittent file corruption problems with cifs driver?
Date: Mon, 12 Sep 2011 10:36:55 +0200
Message-ID: <20110912083654.GA3338@cobija.connexer.com>

Hi all,

Recently at $customer I've been tasked with looking into a problem they
are intermittently having with corrupted file transfers from Linux servers
to a Windows share.

A little info on the servers:

	Ubuntu Lucid 10.04
	Stock and up to date Linux 2.6.32-33-server distro package
	Stock cifs-utils 4.5-2 packages

Description of behavior:

	The servers are all part of a distributed service where each server
	regularly uploads 100-200MB zipfiles to the Windows share.  Intermittently,
	the resulting files are corrupted.  On the client that performs
	the upload, the corrupted file appears to have the correct checksum,
	but every other remote client sees it as corrupted.
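Since the uploading client sees a good checksum while other clients do not, it may help to compare the two copies chunk by chunk to see whether the damage is confined to a few regions. A minimal sketch, assuming you have the original file and a copy fetched from the share by a different client; the function name and the 64 KiB default chunk size are arbitrary choices, not part of any cifs tool:

```python
from typing import Iterator, List


def _chunks(path: str, chunk_size: int) -> Iterator[bytes]:
    """Yield successive fixed-size chunks of a file (the last one may be short)."""
    with open(path, "rb") as fh:
        while True:
            block = fh.read(chunk_size)
            if not block:
                return
            yield block


def differing_chunks(path_a: str, path_b: str, chunk_size: int = 64 * 1024) -> List[int]:
    """Return the byte offsets of chunks whose contents differ between two files.

    If one file is longer, every chunk past the end of the shorter one is
    reported as differing.
    """
    offsets = []
    it_a = _chunks(path_a, chunk_size)
    it_b = _chunks(path_b, chunk_size)
    index = 0
    while True:
        a = next(it_a, None)
        b = next(it_b, None)
        if a is None and b is None:
            return offsets
        if a != b:
            offsets.append(index * chunk_size)
        index += 1
```

Setting chunk_size to the mount's wsize makes it easier to see whether the damage lines up with write-sized blocks.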

	The problem used to be much more frequent, and mounting with -o directio
	seems to have greatly reduced, but not eliminated, the corruption.
	But recently (perhaps due to higher rates of uploads?), the problem has
	started recurring.  It doesn't seem to occur uniformly, but rather in
	spurts: a couple of files will be corrupted in one day, and then a week
	will go by with no corruption.

	I do see occasional errors in the kernel logs, though I'm not sure if
	they are relevant or not (note that they're at substantially different
	times, and at present I have no way to correlate them with corruption,
	though I'm working on that):

	[170873.721023]  CIFS VFS: Error -104 sending data on socket to server
	[170873.728747]  CIFS VFS: Error -32 sending data on socket to server
	[515039.940104]  CIFS VFS: No response to cmd 115 mid 32714
	[515039.947933]  CIFS VFS: Send error in SessSetup = -11
	[521901.595381]  CIFS VFS: No response to cmd 46 mid 37426
	[521901.603422]  CIFS VFS: Send error in read = -11
	[2097744.571138]  CIFS VFS: No response for cmd 50 mid 48502
	[2097849.771138]  CIFS VFS: No response for cmd 114 mid 48519
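The negative numbers in these messages are Linux errno values (-104 ECONNRESET, -32 EPIPE, -11 EAGAIN), and the cmd numbers are SMB1 command codes. A small annotator for the excerpt above might look like the sketch below. The tables only cover the values that appear in this log, and the regexes assume the exact message wording shown; both are assumptions, not a general CIFS log parser.

```python
import re
from typing import Dict

# Linux errno values seen in the log excerpt (hard-coded so the result does
# not depend on the platform this script runs on).
LINUX_ERRNO: Dict[int, str] = {
    11: "EAGAIN",
    32: "EPIPE",
    104: "ECONNRESET",
}

# SMB1 command codes from the CIFS protocol specification; only the ones
# appearing in the excerpt are listed.
SMB1_COMMANDS: Dict[int, str] = {
    0x2E: "SMB_COM_READ_ANDX",           # 46
    0x32: "SMB_COM_TRANSACTION2",        # 50
    0x72: "SMB_COM_NEGOTIATE",           # 114
    0x73: "SMB_COM_SESSION_SETUP_ANDX",  # 115
}

CMD_RE = re.compile(r"cmd (\d+) mid (\d+)")     # "No response to/for cmd N mid M"
ERR_RE = re.compile(r"(?:Error|=)\s*(-\d+)")    # "Error -104 ..." or "... = -11"


def annotate(line: str) -> str:
    """Append the symbolic SMB command or errno name to a CIFS VFS log line."""
    m = CMD_RE.search(line)
    if m:
        cmd = int(m.group(1))
        return f"{line}  [{SMB1_COMMANDS.get(cmd, 'unknown command')}]"
    m = ERR_RE.search(line)
    if m:
        code = -int(m.group(1))
        return f"{line}  [{LINUX_ERRNO.get(code, 'unknown errno')}]"
    return line
```

The decoded names are consistent with the log itself: cmd 115 (SESSION_SETUP_ANDX) is followed by a SessSetup error, and cmd 46 (READ_ANDX) by a read error.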


Reading through the archives along with the rest of teh internetz, I've found
very little info.  Someone posted here back in February about a similar-sounding
problem, though I do not see the wsize-length blocks of NULL bytes in
the resulting files that they did.
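To check a corrupted file for the February symptom (runs of NULL bytes about one wsize long), a scan like the following sketch could be used. min_run would be set to the wsize in effect for the mount; the function name is hypothetical, and a hit is only a hint, since a legitimate zip file can also contain long zero runs.

```python
import mmap
import os
import re
from typing import List, Tuple


def zero_runs(path: str, min_run: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for every run of NUL bytes at least min_run long.

    The file is memory-mapped so a 100-200MB zipfile is not read into a
    Python bytes object; re can search an mmap directly.
    """
    if min_run < 1:
        raise ValueError("min_run must be positive")
    if os.path.getsize(path) == 0:
        return []  # mmap cannot map an empty file
    pattern = re.compile(rb"\x00{%d,}" % min_run)
    with open(path, "rb") as fh, mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        return [(m.start(), m.end() - m.start()) for m in pattern.finditer(mm)]
```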

I've written a small Python script that is now running on a pair of
these servers: with a couple dozen threads, each one uploads similarly sized
files of arbitrary data, and each checks the files uploaded by the other.
After a few hours I haven't reproduced the corruption yet, but I will keep
it running for a couple of days to see if it shows up.
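The actual script isn't included in this mail. As a rough illustration of the same idea (not that script), each host could write random files into the share from a thread pool, record the SHA-256 of what it sent in a manifest, and let the other host verify that manifest. Checking from the other host matters because, as described above, the uploader itself likely sees its own cached data. The target directory, the host_tag naming, and the JSON manifest layout are all hypothetical.

```python
import hashlib
import json
import os
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

CHUNK = 1 << 20  # 1 MiB read/write granularity (arbitrary)


def write_random_file(path: str, size: int) -> str:
    """Write `size` random bytes to `path` and return the SHA-256 of what was sent."""
    digest = hashlib.sha256()
    with open(path, "wb") as fh:
        remaining = size
        while remaining > 0:
            block = os.urandom(min(CHUNK, remaining))
            digest.update(block)
            fh.write(block)
            remaining -= len(block)
        fh.flush()
        os.fsync(fh.fileno())  # push the data to the server before returning
    return digest.hexdigest()


def sha256_of(path: str) -> str:
    """Hash a file as it reads back now."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def upload_batch(target_dir: str, host_tag: str, count: int, size: int, threads: int) -> Dict[str, str]:
    """Write `count` files concurrently and store their expected hashes in a manifest."""
    names = [f"{host_tag}-{i:04d}.bin" for i in range(count)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        hashes = pool.map(lambda n: write_random_file(os.path.join(target_dir, n), size), names)
        manifest = dict(zip(names, hashes))
    with open(os.path.join(target_dir, f"{host_tag}-manifest.json"), "w") as fh:
        json.dump(manifest, fh)
    return manifest


def verify_manifest(target_dir: str, host_tag: str) -> List[str]:
    """Re-read another host's files and return the names whose hash no longer matches."""
    with open(os.path.join(target_dir, f"{host_tag}-manifest.json")) as fh:
        manifest = json.load(fh)
    return [name for name, expected in sorted(manifest.items())
            if sha256_of(os.path.join(target_dir, name)) != expected]
```

For example, host A might run upload_batch("/mnt/share/stress", "hostA", 50, 150 * 2**20, 24) and host B would then call verify_manifest("/mnt/share/stress", "hostA"); a non-empty list would be a reproduction.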

I've also found a couple of suggestions out there to "disable Linux
extensions" and "disable oplocks" when searching on the above kernel error
messages, but I'm hesitant to try them unless there's a strong indication
that they will help, and I'm not sure whether they would.


Does this ring a bell with anyone?  At this point I can't just do a
blanket "try the latest" upgrade of these servers, because they're part
of a production application, at least not without some indication that
a fix for this problem went in between the current and latest versions.
If I can reproduce the problem, however, and then move it to a non-prod
machine, I might have a bit more flexibility; in the meantime, I thought
I'd ask the question here on the off chance...


thanks!
	sean

Thread overview: 6+ messages
2011-09-12  8:36 sean finney [this message]
2011-09-12 12:27 ` Intermittent file corruption problems with cifs driver? Steve French
2011-09-12 12:29     ` Fwd: " Steve French
2011-09-12 13:16 ` Jeff Layton
2011-09-12 14:37       ` Fwd: " sean finney
2011-09-12 15:25           ` Steve French
