From mboxrd@z Thu Jan 1 00:00:00 1970
Received: from smtp1.ascom-ws.com ([195.81.148.76] helo=mail.ascom-ws.com)
	by bombadil.infradead.org with esmtps (Exim 4.85_2 #1 (Red Hat Linux))
	id 1bziJ1-0000dR-9w
	for linux-mtd@lists.infradead.org; Thu, 27 Oct 2016 10:53:11 +0000
From: Danesh Daroui
To: Boris Brezillon
CC: Steve deRosier, "linux-mtd@lists.infradead.org"
Subject: RE: OOB Test fails
Date: Thu, 27 Oct 2016 10:51:14 +0000
Message-ID: <39BC08CB3FF4C84CB6397533D4FC79095770D5E7@SEGOTEXCH02.ascom-Resource.ads>
References: <39BC08CB3FF4C84CB6397533D4FC79095770D513@SEGOTEXCH02.ascom-Resource.ads>
	<39BC08CB3FF4C84CB6397533D4FC79095770D530@SEGOTEXCH02.ascom-Resource.ads>
	<20161027093801.7695f05e@bbrezillon>
In-Reply-To: <20161027093801.7695f05e@bbrezillon>
Content-Language: en-US
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable
MIME-Version: 1.0
List-Id: Linux MTD discussion mailing list

Hi Boris,

Thanks for your help. We would really like to upgrade the kernel, and that is of course the wise approach, but first we would like to be sure whether this is due to the outdated kernel or to a hardware problem, since upgrading the kernel is a time-consuming and cumbersome task, though definitely necessary as you mentioned. Right now I am trying to run the UBIFS tests which are included in "mtd_utils". I hope these tests will give me some hints as to whether there is any problem in the UBI/UBIFS layers. I had written my own stress test before, which tests the memory at the POSIX level (more or less the same level as the UBI/UBIFS layers), and I experienced some crashes but could not identify the reason. For instance, I could not find out whether a crash happens due to a bug in the driver, the file system, etc.

The flash memory we are using is a Micron NAND 1GiB 3.3V 8-bit, and the driver is the one delivered with kernel 2.6.39. Have you heard about a similar problem before?
Or do you want me to give you more info about the hardware and the system we have under test?

Thanks again for your help,

Danesh Daroui

-----Original Message-----
From: Boris Brezillon [mailto:boris.brezillon@free-electrons.com]
Sent: 27 October 2016 09:38
To: Danesh Daroui
Cc: Steve deRosier; linux-mtd@lists.infradead.org
Subject: Re: OOB Test fails

Hi Danesh,

On Wed, 26 Oct 2016 16:28:43 +0000
Danesh Daroui wrote:

> Hi Steve,
>
> Thank you for your prompt answer. When I run the OOB test (mtd_oobtest), for instance, one of the devices always returns a "verification failed" error at a certain address. This is all we know and all the test reports. We use a quite old kernel, i.e. 2.6.39, and the outdated kernel is one of the things we suspect as a source of the problem. We are also considering a hardware failure, since on some devices the OOB test shows no errors, while on others more errors are shown and the address sometimes changes randomly.

Yes, please, try with a newer kernel: I won't help debugging such an old thing.

>
> Our main problem is that sometimes UBIFS forces the device into read-only mode due to a "bad CRC" error at startup when the device is booted. I am now running the tests in "mtd_utils" for testing the file system. I have started running two tests, "simple/test_1" and "simple/test_2", which simply write until the drive is full and then read the data back and verify its correctness. During the test, I see lots of:
>
> UBI: scrubbed PEB 585 (LEB 3:770), data moved to PEB 1772
> UBI: scrubbed PEB 1045 (LEB 3:1261), data moved to PEB 828
> UBI: scrubbed PEB 1493 (LEB 3:664), data moved to PEB 814
> UBI: scrubbed PEB 751 (LEB 3:1260), data moved to PEB 1772
>
> In my mind, this is related to problematic hardware: the data is corrupted on so many cells that UBIFS tries to move the data whenever a corruption is detected.
> My question is whether this guess can be valid, or whether this is mostly due to the old kernel that we are using, so that upgrading to a new kernel would most likely solve the problems?

Well, I can't tell. It can be caused by a buggy NAND controller driver, a bug in the UBI layer or maybe your NAND is simply worn.

Try with a newer kernel, and let's see what the MTD tests and MTD utils tests say.

BTW, which NAND and NAND controller are you testing on?

Regards,

Boris
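
The "UBI: scrubbed PEB ..." messages quoted in this thread can be tallied per PEB to see whether the same physical eraseblocks keep getting scrubbed or whether the events are spread across the device. The following is only a minimal sketch, assuming the kernel log uses exactly the message format quoted above; the names `SCRUB_RE`, `SAMPLE` and `count_scrubs` are hypothetical helpers, not part of mtd-utils or the kernel.

```python
import re
from collections import Counter

# Assumed format, copied from the log lines quoted in this thread:
# "UBI: scrubbed PEB 585 (LEB 3:770), data moved to PEB 1772"
SCRUB_RE = re.compile(
    r"UBI: scrubbed PEB (\d+) \(LEB (\d+):(\d+)\), data moved to PEB (\d+)"
)

# The four messages from the report, used here as sample input.
SAMPLE = """\
UBI: scrubbed PEB 585 (LEB 3:770), data moved to PEB 1772
UBI: scrubbed PEB 1045 (LEB 3:1261), data moved to PEB 828
UBI: scrubbed PEB 1493 (LEB 3:664), data moved to PEB 814
UBI: scrubbed PEB 751 (LEB 3:1260), data moved to PEB 1772
"""


def count_scrubs(log_text):
    """Count scrub events per source PEB and per destination PEB.

    Returns a pair of Counters keyed by PEB number. Lines that do not
    match the assumed format are ignored.
    """
    sources = Counter()
    dests = Counter()
    for match in SCRUB_RE.finditer(log_text):
        src_peb, _vol_id, _lnum, dst_peb = (int(g) for g in match.groups())
        sources[src_peb] += 1
        dests[dst_peb] += 1
    return sources, dests


if __name__ == "__main__":
    sources, dests = count_scrubs(SAMPLE)
    print("scrub events:", sum(sources.values()))
    print("most scrubbed source PEBs:", sources.most_common(5))
    print("most used destination PEBs:", dests.most_common(5))
```

To use it on a real device, the output of `dmesg` could be saved to a file and its contents passed to `count_scrubs` instead of `SAMPLE`. The counts alone do not separate a driver bug from worn flash, but they make it easier to compare devices and kernel versions.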