From: Boris Brezillon <boris.brezillon@collabora.com>
To: Lubomir Rintel <lkundrak@v3.sk>
Cc: Richard Weinberger <richard@nod.at>,
	Tudor Ambarus <tudor.ambarus@microchip.com>,
	linux-mtd@lists.infradead.org,
	Vignesh Raghavendra <vigneshr@ti.com>,
	Miquel Raynal <miquel.raynal@bootlin.com>
Subject: Re: [PATCH v2 00/19] mtd: rawnand: cafe: Convert to exec_op() (and more)
Date: Mon, 11 May 2020 10:23:05 +0200
Message-ID: <20200511102305.7d843fbc@collabora.com>
In-Reply-To: <20200510093549.56f74e61@collabora.com>

On Sun, 10 May 2020 09:35:49 +0200
Boris Brezillon <boris.brezillon@collabora.com> wrote:

> On Sun, 10 May 2020 09:21:08 +0200
> Lubomir Rintel <lkundrak@v3.sk> wrote:
> 
> > On Sun, May 10, 2020 at 08:45:41AM +0200, Boris Brezillon wrote:  
> > > On Sun, 10 May 2020 08:31:05 +0200
> > > Boris Brezillon <boris.brezillon@collabora.com> wrote:
> > >     
> > > > On Sat, 9 May 2020 22:28:55 +0200
> > > > Lubomir Rintel <lkundrak@v3.sk> wrote:
> > > >     
> > > > > On Sat, May 09, 2020 at 10:01:02PM +0200, Boris Brezillon wrote:      
> > > > > > On Sat, 9 May 2020 21:34:40 +0200
> > > > > > Lubomir Rintel <lkundrak@v3.sk> wrote:
> > > > > >         
> > > > > > > On Thu, May 07, 2020 at 10:12:57PM +0200, Boris Brezillon wrote:        
> > > > > > > > On Thu, 7 May 2020 15:47:08 +0200
> > > > > > > > Lubomir Rintel <lkundrak@v3.sk> wrote:
> > > > > > > >           
> > > > > > > > > On Wed, May 06, 2020 at 11:35:52PM +0200, Boris Brezillon wrote:          
> > > > > > > > > > On Wed, 6 May 2020 22:36:35 +0200
> > > > > > > > > > Lubomir Rintel <lkundrak@v3.sk> wrote:
> > > > > > > > > >             
> > > > > > > > > > > > We really should mask IRQs (AKA disable IRQs in my naming convention
> > > > > > > > > > > > :-)) here, unless we want to switch to interrupt-based waits (which
> > > > > > > > > > > > would be a good thing when we have DMA or WAIT_RDY involved). Having an
> > > > > > > > > > > > interrupt handler in the current implementation doesn't make any sense
> > > > > > > > > > > > (that's assuming the IRQ_STATUS bits are updated even if the interrupts
> > > > > > > > > > > > > are disabled, which I'm not sure is a valid assumption in this case).
> > > > > > > > > > > 
> > > > > > > > > > > I have no idea why the interrupt handler is there. Perhaps some
> > > > > > > > > > > interrupts can't be masked and need an ack or something.            
> > > > > > > > > > 
> > > > > > > > > > Can you try to set NAND_IRQ_MASK to 0x0 and see if that still works.
> > > > > > > > > > Can you also check the number of NAND interrupts when set to 0x0? It's
> > > > > > > > > > hard to tell exactly what caused the interrupt handler to be called
> > > > > > > > > > since this is a shared interrupt.            
> > > > > > > > > 
> > > > > > > > > When it's set to 0, I get an interrupt with CAFE_NAND_IRQ=0x40000000
> > > > > > > > > (CAFE_NAND_IRQ_FLASH_RDY) right off the bat. That doesn't happen with
> > > > > > > > > a mask of 0xffffffff.
> > > > > > > > > 
> > > > > > > > > When changing the handler to always ack CAFE_NAND_IRQ_FLASH_RDY I've
> > > > > > > > > also seen CAFE_NAND_IRQ=0x80000000 (CAFE_NAND_IRQ_CMD_DONE) suggesting
> > > > > > > > > that other interrupts aren't masked either.
> > > > > > > > > 
> > > > > > > > > > It seems that ones do indeed mask interrupts, but some just can't be
> > > > > > > > > > masked (CAFE_NAND_IRQ_CMD_DONE or CAFE_NAND_IRQ_DMA_DONE), perhaps
> > > > > > > > > > due to hardware bugs.
> > > > > > > > >           
> > > > > > > > 
> > > > > > > > I pushed a new version with some interrupt-related changes [1].
> > > > > > > > 
> > > > > > > > [1]https://github.com/bbrezillon/linux/commits/nand/cafe-nand-exec-op-debug          
> > > > > > > 
> > > > > > > Works with one fix:
> > > > > > > 
> > > > > > > diff --git a/drivers/mtd/nand/raw/cafe_nand.c b/drivers/mtd/nand/raw/cafe_nand.c
> > > > > > > index 591d79730961..e37737b7b089 100644
> > > > > > > --- a/drivers/mtd/nand/raw/cafe_nand.c
> > > > > > > +++ b/drivers/mtd/nand/raw/cafe_nand.c
> > > > > > > @@ -801,6 +801,7 @@ static int cafe_nand_probe(struct pci_dev *pdev,
> > > > > > >         if (!cafe)
> > > > > > >                 return  -ENOMEM;
> > > > > > >  
> > > > > > > +       init_completion(&cafe->complete);        
> > > > > > 
> > > > > > Oops, indeed.
> > > > > >         
> > > > > > >         mtd = nand_to_mtd(&cafe->nand);
> > > > > > >         mtd->dev.parent = &pdev->dev;
> > > > > > >         nand_set_controller_data(&cafe->nand, cafe);
> > > > > > > 
> > > > > > > However, the JFFS2 mount takes about twice as long as it did with
> > > > > > > the polling version:        
> > > > > > 
> > > > > > Yes, that's not surprising. At the same time, using atomic-polling for
> > > > > > something that's expected to take hundreds of microseconds is not that
> > > > > > great. That means your CPU is not doing anything useful while you wait
> > > > > > for the read/write/erase operation to finish.        
> > > > > 
> > > > > Yes. But this really is too much of a slowdown:
> > > > > 
> > > > >   bash-5.0# time dd count=65536 bs=2k if=/dev/mtd0 of=/dev/null
> > > > >   65536+0 records in
> > > > >   65536+0 records out
> > > > >   
> > > > >   real    0m20.191s
> > > > >   user    0m0.346s
> > > > >   sys     0m10.366s
> > > > > 
> > > > > vs (previously):
> > > > >   
> > > > >   bash-5.0# time dd count=65536 bs=2k if=/dev/mtd0 of=/dev/null
> > > > >   65536+0 records in
> > > > >   65536+0 records out
> > > > >   
> > > > >   real    0m7.629s
> > > > >   user    0m0.010s
> > > > >   sys     0m7.500s
> > > > >   bash-5.0#      
> > > > 
> > > > Almost a factor of 3. I was definitely not expecting interrupt-based waits
> > > > to have such a huge impact on performance.
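
For reference, here is a rough sketch of the arithmetic behind the two dd
runs quoted above, in plain userspace C. The helper names (dd_bytes,
mib_per_sec, slowdown) are made up for illustration, and it assumes that
dd's bs=2k means 2048 bytes:

```c
#include <assert.h>
#include <stdint.h>

/* Total bytes moved by `dd count=<count> bs=<block_size>`. */
static uint64_t dd_bytes(uint64_t count, uint64_t block_size)
{
	return count * block_size;
}

/* Throughput in MiB/s for `bytes` transferred in `seconds` of wall time. */
static double mib_per_sec(uint64_t bytes, double seconds)
{
	assert(seconds > 0.0);
	return (double)bytes / (1024.0 * 1024.0) / seconds;
}

/* How many times longer the slow run took compared to the fast one. */
static double slowdown(double slow_seconds, double fast_seconds)
{
	assert(fast_seconds > 0.0);
	return slow_seconds / fast_seconds;
}
```

Plugging in the quoted numbers: 65536 blocks of 2048 bytes is 128 MiB, so
the interrupt-based run (20.191s) moves roughly 6.3 MiB/s against roughly
16.8 MiB/s for the polling run (7.629s), a slowdown of about 2.6x.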
> > > >     
> > > > > 
> > > > > Note that your CPU can't be doing anything useful before the program and
> > > > > its data are loaded from the storage :)
> > > > 
> > > > Well, that's only true at mount time (and if you delay the mount after
> > > > the boot, your CPU might already have other things to do), but any
> > > > erase/write operations are likely to monopolize your CPU for no good
> > > > reason.
> > > >     
> > > > > 
> > > > > I suppose that if someone really prefers to avoid hogging the CPU at
> > > > > this cost, then it makes sense to add a knob (a module parameter or
> > > > > something) that would enable the interrupt-driven operation, but
> > > > > keep polling as a default.      
> > > > 
> > > > Let's not add more module params than we already have, it just
> > > > confuses users and deciding how to wait on HW events doesn't sound
> > > > like something they should be able to choose anyway (just like passing
> > > > the timing params, this should be calculated by the driver). Oh well,
> > > > I'll drop the patch adding interrupt-based waits. Having the driver
> > > > converted to exec_op() is more than enough :-).    
> > > 
> > > Just pushed a new version. If it works for you I'll send a v3.    
> > 
> > Thank you. That's b6b10b45dd9 in nand/cafe-nand-exec-op-debug of
> > https://github.com/bbrezillon/linux/ I suppose?
> > 
> > Without the readl_poll_timeout() -> readl_poll_timeout_atomic() change
> > it's still very slow.  
> 
> Should be fixed now.
> 
> > 
> > Also, commit f89355b6b6 ("mtd: rawnand: cafe: Return IRQ_HANDLED when
> > appropriate") looks somewhat suspicious to me. Previously it wrote the
> > pending interrupt bits back into CAFE_NAND_IRQ, now you're masking them
> > out in CAFE_NAND_IRQ_MASK (which already should be 0xffffffff) at this
> > point. Why?  
> 
> If interrupts are masked we don't need to clear them. We only clear
> them before executing an operation to start from a fresh state.
> 
> > I thought the write back to CAFE_NAND_IRQ serves to ack the
> > interrupts that came up but we don't handle elsewhere because we weren't
> > expecting them.  
> 
> If we reach the handler and all our irqs are masked, that means the irq
> was not for us, which is possible since the irq line is shared. We
> really should return IRQ_NONE in that case, and clearing pending
> interrupts is useless, since they are masked anyway. Since we read
> the interrupt status from exec_op(), I thought it'd be better to never
> clear any interrupt bits instead of clearing all bits but the CMD_DONE,
> DMA_DONE and FLASH_RDY.
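
To make the shared-line reasoning above concrete, here is a minimal
userspace sketch of the decision, not the driver's actual handler. The enum
and function names are hypothetical, the bit values are the ones reported
earlier in the thread, and the mask polarity (a 1 bit masks the interrupt)
is an assumption based on 0xffffffff meaning "everything masked":

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for the kernel's IRQ_NONE / IRQ_HANDLED return values. */
enum sketch_irqreturn { SKETCH_IRQ_NONE = 0, SKETCH_IRQ_HANDLED = 1 };

/* Bit values quoted in the thread for the CAFE_NAND_IRQ register. */
#define SKETCH_IRQ_CMD_DONE	0x80000000u
#define SKETCH_IRQ_FLASH_RDY	0x40000000u

/*
 * Decide what a handler on a shared line should return.
 * Assumption: a 1 bit in `mask` means that interrupt source is masked.
 */
static enum sketch_irqreturn sketch_cafe_irq(uint32_t status, uint32_t mask)
{
	uint32_t pending = status & ~mask;	/* raised and not masked */

	/* Nothing of ours is pending: the interrupt belongs to someone else. */
	return pending ? SKETCH_IRQ_HANDLED : SKETCH_IRQ_NONE;
}

/* Small usage example covering the two cases discussed in the thread. */
static void sketch_selftest(void)
{
	/* Everything masked: not our interrupt, whatever the status says. */
	assert(sketch_cafe_irq(SKETCH_IRQ_FLASH_RDY, 0xffffffffu) == SKETCH_IRQ_NONE);
	/* Nothing masked and FLASH_RDY raised: this one is ours. */
	assert(sketch_cafe_irq(SKETCH_IRQ_FLASH_RDY, 0x0u) == SKETCH_IRQ_HANDLED);
}
```

Returning the "none" value when no unmasked bit is pending lets the other
handlers on the shared line look at the interrupt instead.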
> 
> > 
> > As you correctly pointed out, the source of the interrupts I'm seeing
> > could be something other than the CAFE chip -- the camera or the MMC
> > card. I'm not sure though; camera is certainly off and there shouldn't
> > be much going on about the MMC card. I'm testing with an init=/bin/bash
> > installation off a SD-card currently. I guess I can try switching to the
> > USB flash stick and disable the camera and MMC altogether.  
> 
> Okay, if that happens that would be a HW bug (or an interrupt coming
> from somewhere else, maybe PCI errors?). Can you print the values of
> CAFE_GLOBAL_IRQ and CAFE_GLOBAL_IRQ_MASK in your irq handler?

If you think that's less risky, I can drop "mtd: rawnand: cafe: Return
IRQ_HANDLED when appropriate" and go for your initial fix (avoid
clearing the FLASH_RDY interrupt). It just feels like the current
implementation is papering over a bug.

