From: Joshua Kinard <kumba@gentoo.org>
To: Matt Turner <mattst88@gmail.com>, Manuel Lauss <manuel.lauss@gmail.com>
Cc: Ralf Baechle <ralf@linux-mips.org>,
	James Hogan <james.hogan@imgtec.com>,
	"linux-mips@linux-mips.org" <linux-mips@linux-mips.org>
Subject: Re: NFS corruption, fixed by echo 1 > /proc/sys/vm/drop_caches -- next debugging steps?
Date: Wed, 15 Mar 2017 12:46:03 -0400
Message-ID: <2d13de51-e757-b283-1e7f-f4a54f87965a@gentoo.org>
In-Reply-To: <CAEdQ38FU6H7ThmP2MgUY-uLhf9feZ6US2JwhEQsCuPw9AeV3nQ@mail.gmail.com>

On 03/15/2017 11:31, Matt Turner wrote:
> On Wed, Mar 15, 2017 at 7:00 AM, Manuel Lauss <manuel.lauss@gmail.com> wrote:
>>
>> On Wed, Mar 15, 2017 at 10:25 AM, Ralf Baechle <ralf@linux-mips.org> wrote:
>>>
>>> On Mon, Mar 13, 2017 at 09:47:57AM +0000, James Hogan wrote:
>>>
>>>>>
>>>>> Note that the corruption is different across reboots, both in the size
>>>>> of the corruption and the location. I saw ~1900 and ~1400 byte
>>>>> sequences corrupted on separate occasions, which don't correspond to
>>>>> the system's 16kB page size.
>>>>>
>>>>> I've tested kernels from v3.19 to 4.11-rc1+ (master branch from
>>>>> today). All exhibit this behavior with differing frequencies. Earlier
>>>>> kernels seem to reproduce the issue less often, while more recent
>>>>> kernels reliably exhibit the problem every boot.
>>>>>
>>>>> How can I further debug this?
>>>>
>>>> It smells a bit like a DMA / caching issue.
>>>>
>>>> Can you provide a full kernel log? That might provide some information
>>>> about caching that might be relevant (e.g. does dcache have aliases?).
>>>
>>> The architecture of the BCM1250 SOC used for the BCM91250 boards is
>>> fully coherent; S-cache and D-cache are physically indexed and tagged.
>>> Only the VIVT (plus the usual ASID tagging) I-cache leaves space for
>>> software to screw up cache management but that shouldn't matter for this
>>> case, so I suggest to start looking into this from the NFS side.
>>
>>
>> I did Matt's tests on Alchemy (VIPT caches) with kernels 3.18 to 4.11-rc
>> against an x86 4.9.15 host, and did not see any problems. Given Ralf's
>> comment about the BCM1250 caches, maybe you have bad hardware (BCM board
>> or network)?
> 
> I certainly cannot rule that possibility out. If that is the case, I
> would like to be sure of it -- see a failure in memtester or something
> for instance. Any suggestions? (I have run memtester and never found
> anything)
> 
> For what it's worth, did you determine the cause of the NFS corruption
> you reported [1]?
> 
> [1] https://www.spinics.net/lists/mips/msg44006.html

I'm using NFSv4 between my SGI Octane and an Intel box with no noticeable
issues. I used both rsync and cp to move a large (~845MB) file between the
two, ran md5sum on each copy, and got the same checksum back.  What NFS
versions and transport protocols have you tried?  v4 is TCP-only, but v3 can
do both UDP and TCP.
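
If it helps, the copy-and-compare check above can be scripted so that it also
reports where two copies differ, not just whether their checksums match.  This
is only a rough sketch in Python: the helper names md5_of() and
differing_ranges() are made up for illustration, and it assumes both copies
are readable as regular files (e.g. one local, one on the NFS mount).

```python
import hashlib

CHUNK = 1 << 20  # read 1 MiB at a time so large files need little memory


def md5_of(path):
    """Return the hex MD5 digest of the file at path."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(CHUNK), b""):
            digest.update(block)
    return digest.hexdigest()


def differing_ranges(path_a, path_b):
    """Return (offset, length) runs where the two files differ.

    Only the common prefix is compared if the files differ in size.
    """
    ranges = []
    start = None  # offset where the current mismatching run began
    offset = 0    # file offset of the chunk being compared
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(CHUNK), fb.read(CHUNK)
            n = min(len(a), len(b))
            if n == 0:
                break
            if a[:n] == b[:n]:
                # Whole chunk matches: close a run left open by the last chunk.
                if start is not None:
                    ranges.append((start, offset - start))
                    start = None
            else:
                for i in range(n):
                    if a[i] != b[i]:
                        if start is None:
                            start = offset + i
                    elif start is not None:
                        ranges.append((start, offset + i - start))
                        start = None
            offset += n
    if start is not None:
        ranges.append((start, offset - start))
    return ranges
```

Running differing_ranges() on the original and the NFS copy gives the offset
and size of each corrupted run, which makes it easy to see whether they line
up with the page size or with anything else.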

That said, I doubt this'll affect you, but if you're running the XFS
filesystem, version 5 (crc=1, finobt=1), do you notice any oddities when
untarring a really large tarball, such as a Gentoo stage, on that BCM
machine?  That's revealed a couple of curious issues that may be
Octane-specific and that I haven't tried to track down yet.  It would be
interesting if you saw them as well.  Specifically, if you get a non-fatal
Oops in dmesg while untarring, or a message from xfsaild about a possible
deadlock in kmem_alloc(), I'd love to know.
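
For picking those messages out of a saved log, something like the following
sketch would do.  The two patterns are my assumption about how the messages
are worded ("Oops" and "possible memory allocation deadlock"); adjust them to
whatever your kernel actually prints.  The function name suspicious_lines()
is hypothetical.

```python
import re

# Assumed wording of the messages of interest; not taken from a real log.
PATTERNS = [
    re.compile(r"\bOops\b"),
    re.compile(r"possible memory allocation deadlock"),
]


def suspicious_lines(log_text):
    """Return the lines of log_text that match any pattern in PATTERNS."""
    return [
        line
        for line in log_text.splitlines()
        if any(p.search(line) for p in PATTERNS)
    ]
```

Save the log first (e.g. dmesg > dmesg.txt), read the file into a string, and
pass it to suspicious_lines().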



-- 
Joshua Kinard
Gentoo/MIPS
kumba@gentoo.org
6144R/F5C6C943 2015-04-27
177C 1972 1FB8 F254 BAD0 3E72 5C63 F4E3 F5C6 C943

"The past tempts us, the present confuses us, the future frightens us.  And our
lives slip away, moment by moment, lost in that vast, terrible in-between."

--Emperor Turhan, Centauri Republic
