From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752212AbdGBOvw (ORCPT ); Sun, 2 Jul 2017 10:51:52 -0400
Received: from smtp-1b.atlantis.sk ([80.94.52.26]:51173 "EHLO
	smtp-1b.atlantis.sk" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1750818AbdGBOvu (ORCPT );
	Sun, 2 Jul 2017 10:51:50 -0400
From: Ondrej Zary
To: Finn Thain
Subject: Re: [PATCH v6 0/6] g_NCR5380: PDMA fixes and cleanup
Date: Sun, 2 Jul 2017 16:51:36 +0200
User-Agent: KMail/1.9.10 (enterprise35 0.20100827.1168748)
Cc: "James E.J. Bottomley", "Martin K. Petersen",
	linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
	Michael Schmitz
References: <201707012349.04652.linux@rainbow-software.org>
In-Reply-To:
X-KMail-QuotePrefix: >
MIME-Version: 1.0
Content-Type: Text/Plain; charset="ansi_x3.4-1968"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline
Message-Id: <201707021651.37016.linux@rainbow-software.org>
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Sunday 02 July 2017 05:11:27 Finn Thain wrote:
> On Sat, 1 Jul 2017, Ondrej Zary wrote:
> > The write corruption is still present - "start" must be rolled back in
> > both IRQ and timeout cases.
>
> Your original algorithm aborts the transfer for a timeout. Same with mine.

I do "start -= 2 * 128" even after timeout.

> The bug must be elsewhere.
>
> > And 128 B is not enough, 256 is OK (why did it work last time?).
>
> When I get contradictory results it usually means I booted the wrong build
> or built the wrong branch.

I've just retested PATCHv5, it really misses 128 bytes and works if I add
"residual += 128;".

> Actually, I think that adding 128 to the residual is correct in some
> situations, and 256 is correct in other situations.
>
> > We just wrote a buffer to the chip but the chip is writing the previous
> > one to the drive - so if a problem arises, both buffers are lost.
>
> I see.
> I guess we have to take buffer swaps into account.
>
> > This fixes the corruption (although the "start > 0" check seems wrong
> > now):
> > --- a/drivers/scsi/g_NCR5380.c
> > +++ b/drivers/scsi/g_NCR5380.c
> > @@ -598,23 +598,17 @@ static inline int generic_NCR5380_psend(struct NCR5380_hostdata *hostdata,
> >                                CSR_HOST_BUF_NOT_RDY, 0,
> >                                hostdata->c400_ctl_status,
> >                                CSR_GATED_53C80_IRQ,
> > -                              CSR_GATED_53C80_IRQ, HZ / 64) < 0)
> > -                      break;
> > -
> > -              if (NCR5380_read(hostdata->c400_ctl_status) &
> > -                  CSR_HOST_BUF_NOT_RDY) {
> > +                              CSR_GATED_53C80_IRQ, HZ / 64) < 0 ||
> > +                  (NCR5380_read(hostdata->c400_ctl_status) &
> > +                   (CSR_HOST_BUF_NOT_RDY | CSR_GATED_53C80_IRQ))) {
>
> You could add a printk to the timeout branch. If it executes, something is
> seriously wrong. E.g.
>
> -                      break;
> +                      { pr_err("send timeout %02x, %d/%d\n",
> NCR5380_read(hostdata->c400_ctl_status), start, len); break; }

Yes, timeouts do happen:
[ 9671.909223] send timeout 14, 3840/4096
[ 9672.978079] send timeout 14, 2816/4096
[ 9675.323751] send timeout 14, 1280/4096

> >                        /* The chip has done a 128 B buffer swap but the first
> >                         * buffer still has not reached the SCSI bus.
> >                         */
> >                        if (start > 0)
> > -                              start -= 128;
> > +                              start -= 256;
> >                        break;
> >                }
>
> BTW, that change carries the risk of 'start' going negative and the
> residual exceeding the length of the original transfer.
>
> But I agree with you that there's a problem with the residual.
>
> If I understand correctly, the 53c400 can't do a buffer swap until the
> disk acknowledges each of the 128 bytes from the buffer. But I guess the
> first buffer is special because the disk will not see the first byte of
> the transfer until after the first buffer swap.
>
> And it appears that the last buffer is also special: we have to wait for
> CSR_HOST_BUF_NOT_RDY even after start == len otherwise we may not detect a
> failure and fix the residual. So I think the datasheet is right; we have
> to iterate until the block counter goes to zero.
>
> I think it is safe to say that when CSR_HOST_BUF_NOT_RDY, 'start' is
> between 128 and 256 B ahead of the disk. Otherwise, the host buffer is
> empty and 'start' is no more than 128 B ahead of the disk.
>
> > -              if (NCR5380_read(hostdata->c400_ctl_status) &
> > -                  CSR_GATED_53C80_IRQ)
> > -                      break;
> > -
> >                if (hostdata->io_port && hostdata->io_width == 2)
> >                        outsw(hostdata->io_port + hostdata->c400_host_buf,
> >                              src + start, 64);
> >
> >
> > DTC seems to work too.
>
> OK. Thanks for testing. Please try the patch below on top of v6.

It misses 256B blocks. It's caused by the timeouts, this patch fixes it:

--- a/drivers/scsi/g_NCR5380.c
+++ b/drivers/scsi/g_NCR5380.c
@@ -598,11 +598,9 @@ static inline int generic_NCR5380_psend(struct NCR5380_hostdata *hostdata,
                               CSR_HOST_BUF_NOT_RDY, 0,
                               hostdata->c400_ctl_status,
                               CSR_GATED_53C80_IRQ,
-                              CSR_GATED_53C80_IRQ, HZ / 64) < 0)
-                      break;
-
-              if (NCR5380_read(hostdata->c400_ctl_status) &
-                  CSR_HOST_BUF_NOT_RDY) {
+                              CSR_GATED_53C80_IRQ, HZ / 64) < 0 ||
+                  (NCR5380_read(hostdata->c400_ctl_status) &
+                   CSR_HOST_BUF_NOT_RDY)) {
                       /* Both 128 B buffers are in use */
                       if (start >= 128)
                               start -= 128;

-- 
Ondrej Zary
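
Note on the arithmetic: what follows is a minimal, standalone sketch of
the rollback and residual calculation discussed in this thread. It is not
code from g_NCR5380.c. The names psend_rollback, psend_residual and
BUF_SIZE are hypothetical, and clamping at zero is an assumption made here
to address the concern about 'start' going negative.

```c
#include <assert.h>

#define BUF_SIZE 128	/* size of one 53c400 buffer, as stated in the thread */

/*
 * Hypothetical helper (not in the driver): roll 'start' back by 'nbufs'
 * buffers that may not have reached the SCSI bus, never going below zero.
 */
static int psend_rollback(int start, int nbufs)
{
	int lost = nbufs * BUF_SIZE;

	assert(start >= 0 && nbufs >= 0);
	return start >= lost ? start - lost : 0;
}

/*
 * Hypothetical helper: bytes to report as not transferred. Because
 * 'start' is clamped to [0, len], the result cannot exceed 'len'.
 */
static int psend_residual(int len, int start)
{
	assert(start >= 0 && start <= len);
	return len - start;
}
```

Taking the first logged timeout (start = 3840, len = 4096): rolling back
one buffer gives start = 3712 and a residual of 384 bytes, while rolling
back two buffers gives start = 3584 and a residual of 512 bytes.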