* [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-21 7:32 ` David Brownell 0 siblings, 0 replies; 19+ messages in thread From: David Brownell @ 2008-12-21 7:32 UTC (permalink / raw) To: Andrew Morton, spi-devel-general; +Cc: lkml, Linus Torvalds, Vernon Sauder From: David Brownell <dbrownell@users.sourceforge.net> The recent "simplify spi_write_then_read()" patch included a small regression from the 2.6.27 behavior with its performance win. All SPI transfers are full duplex, and are packaged as half duplex by either discarding the data that's read ("write only"), or else by writing zeroes ("read only"). That patch wasn't ensuring that zeroes were getting written out during the "half duplex read" part of the transaction; instead, old RX bits were getting sent. The fix is trivial: zero the buffer before using it. Signed-off-by: David Brownell <dbrownell@users.sourceforge.net> Cc: Vernon Sauder <vernoninhand@gmail.com> --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -677,11 +677,13 @@ int spi_write_then_read(struct spi_device *spi, /* ... unless someone else is using the pre-allocated buffer */ if (!mutex_trylock(&lock)) { - local_buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL); + local_buf = kzalloc(SPI_BUFSIZ, GFP_KERNEL); if (!local_buf) return -ENOMEM; - } else + } else { local_buf = buf; + memset(local_buf, 0, x.len); + } memcpy(local_buf, txbuf, n_tx); x.tx_buf = local_buf; ^ permalink raw reply [flat|nested] 19+ messages in thread
* [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-21 7:32 ` David Brownell 0 siblings, 0 replies; 19+ messages in thread From: David Brownell @ 2008-12-21 7:32 UTC (permalink / raw) To: Andrew Morton, spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f Cc: Vernon Sauder, Linus Torvalds, lkml From: David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org> The recent "simplify spi_write_then_read()" patch included a small regression from the 2.6.27 behavior with its performance win. All SPI transfers are full duplex, and are packaged as half duplex by either discarding the data that's read ("write only"), or else by writing zeroes ("read only"). That patch wasn't ensuring that zeroes were getting written out during the "half duplex read" part of the transaction; instead, old RX bits were getting sent. The fix is trivial: zero the buffer before using it. Signed-off-by: David Brownell <dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f@public.gmane.org> Cc: Vernon Sauder <vernoninhand-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> --- a/drivers/spi/spi.c +++ b/drivers/spi/spi.c @@ -677,11 +677,13 @@ int spi_write_then_read(struct spi_device *spi, /* ... unless someone else is using the pre-allocated buffer */ if (!mutex_trylock(&lock)) { - local_buf = kmalloc(SPI_BUFSIZ, GFP_KERNEL); + local_buf = kzalloc(SPI_BUFSIZ, GFP_KERNEL); if (!local_buf) return -ENOMEM; - } else + } else { local_buf = buf; + memset(local_buf, 0, x.len); + } memcpy(local_buf, txbuf, n_tx); x.tx_buf = local_buf; ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-21 23:46 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2008-12-21 23:46 UTC (permalink / raw) To: David Brownell; +Cc: Andrew Morton, spi-devel-general, lkml, Vernon Sauder On Sat, 20 Dec 2008, David Brownell wrote: > > All SPI transfers are full duplex, and are packaged as half duplex > by either discarding the data that's read ("write only"), or else > by writing zeroes ("read only"). That patch wasn't ensuring that > zeroes were getting written out during the "half duplex read" part > of the transaction; instead, old RX bits were getting sent. Hmm. In addition, isn't this broken (in that same function): memcpy(local_buf, txbuf, n_tx); x.tx_buf = local_buf; x.rx_buf = local_buf; /* do the i/o */ status = spi_sync(spi, &message); if (status == 0) memcpy(rxbuf, x.rx_buf + n_tx, n_rx); shouldn't that 'rx_buf' setup be x.rx_buf = local_buf + n_tx; since the whole point was that we allocated a buffer that can hold _both_ the rx and tx parts? Especially as that final copy into the resulting "rxbuf" thing uses that "+ n_tx" addition? Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-21 23:46 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2008-12-21 23:46 UTC (permalink / raw) To: David Brownell Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder, lkml On Sat, 20 Dec 2008, David Brownell wrote: > > All SPI transfers are full duplex, and are packaged as half duplex > by either discarding the data that's read ("write only"), or else > by writing zeroes ("read only"). That patch wasn't ensuring that > zeroes were getting written out during the "half duplex read" part > of the transaction; instead, old RX bits were getting sent. Hmm. In addition, isn't this broken (in that same function): memcpy(local_buf, txbuf, n_tx); x.tx_buf = local_buf; x.rx_buf = local_buf; /* do the i/o */ status = spi_sync(spi, &message); if (status == 0) memcpy(rxbuf, x.rx_buf + n_tx, n_rx); shouldn't that 'rx_buf' setup be x.rx_buf = local_buf + n_tx; since the whole point was that we allocated a buffer that can hold _both_ the rx and tx parts? Especially as that final copy into the resulting "rxbuf" thing uses that "+ n_tx" addition? Linus ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix 2008-12-21 23:46 ` Linus Torvalds (?) @ 2008-12-22 0:48 ` David Brownell 2008-12-23 1:53 ` Ingo Oeser 2008-12-23 18:45 ` Linus Torvalds -1 siblings, 2 replies; 19+ messages in thread From: David Brownell @ 2008-12-22 0:48 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, spi-devel-general, lkml, Vernon Sauder On Sunday 21 December 2008, Linus Torvalds wrote: > > On Sat, 20 Dec 2008, David Brownell wrote: > > > > All SPI transfers are full duplex, and are packaged as half duplex > > by either discarding the data that's read ("write only"), or else > > by writing zeroes ("read only"). That patch wasn't ensuring that > > zeroes were getting written out during the "half duplex read" part > > of the transaction; instead, old RX bits were getting sent. > > Hmm. In addition, isn't this broken (in that same function): No -- this is full duplex. The write_then_read() helper is simplifying a common half-duplex idiom for short operations, but the harware still does full duplex. Buffer layout is: Before: WWWWW0000000 After: xxxxxRRRRRRR That is, for every bit shifted out (W, 0) another one gets shifted in (x, R). The I/O primitive essentially swaps contents of a one-word shift register between master and slave; or, sequences of such words. Words don't need to be byte-size, though that's a common option. > memcpy(local_buf, txbuf, n_tx); > x.tx_buf = local_buf; > x.rx_buf = local_buf; > > /* do the i/o */ > status = spi_sync(spi, &message); > if (status == 0) > memcpy(rxbuf, x.rx_buf + n_tx, n_rx); > > shouldn't that 'rx_buf' setup be > > x.rx_buf = local_buf + n_tx; > > since the whole point was that we allocated a buffer that can hold _both_ > the rx and tx parts? Especially as that final copy into the resulting > "rxbuf" thing uses that "+ n_tx" addition? See above. We only want the "R" bits which were shifted in right *after* the n_tx "W" bits. If we offset rx_buf before the I/O, we'd start with the "x" don't-care bits and need to do something else to discard them. (Plus, allocate more space at the end of the buffer.) - Dave > Linus > > ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix 2008-12-22 0:48 ` David Brownell @ 2008-12-23 1:53 ` Ingo Oeser 2008-12-23 18:37 ` David Brownell 2008-12-23 18:45 ` Linus Torvalds 1 sibling, 1 reply; 19+ messages in thread From: Ingo Oeser @ 2008-12-23 1:53 UTC (permalink / raw) To: David Brownell Cc: Linus Torvalds, Andrew Morton, spi-devel-general, lkml, Vernon Sauder Hi David, On Monday 22 December 2008, you wrote: > On Sunday 21 December 2008, Linus Torvalds wrote: > > > > On Sat, 20 Dec 2008, David Brownell wrote: > > > > > > All SPI transfers are full duplex, and are packaged as half duplex > > > by either discarding the data that's read ("write only"), or else > > > by writing zeroes ("read only"). That patch wasn't ensuring that > > > zeroes were getting written out during the "half duplex read" part > > > of the transaction; instead, old RX bits were getting sent. > > > > Hmm. In addition, isn't this broken (in that same function): > > No -- this is full duplex. The write_then_read() helper is > simplifying a common half-duplex idiom for short operations, > but the harware still does full duplex. Buffer layout is: > > Before: WWWWW0000000 > After: xxxxxRRRRRRR > > That is, for every bit shifted out (W, 0) another one gets > shifted in (x, R). The I/O primitive essentially swaps > contents of a one-word shift register between master and > slave; or, sequences of such words. Words don't need to > be byte-size, though that's a common option. > See above. We only want the "R" bits which were shifted in > right *after* the n_tx "W" bits. If we offset rx_buf before > the I/O, we'd start with the "x" don't-care bits and need to > do something else to discard them. (Plus, allocate more > space at the end of the buffer.) Wow, what interesting hardware logic and a nice explanation. Could you put that into a comment somewhere close to those helpers? You can safely assume, that any code which Linus doesn't understand is non-trivial and needs a comment :-) Best Regards Ingo Oeser ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-23 18:37 ` David Brownell 0 siblings, 0 replies; 19+ messages in thread From: David Brownell @ 2008-12-23 18:37 UTC (permalink / raw) To: Ingo Oeser Cc: Linus Torvalds, Andrew Morton, spi-devel-general, lkml, Vernon Sauder On Monday 22 December 2008, Ingo Oeser wrote: > Hi David, > > On Monday 22 December 2008, you wrote: > > The write_then_read() helper is > > simplifying a common half-duplex idiom for short operations, > > but the hardware still does full duplex. Buffer layout is: > > > > Before: WWWWW0000000 > > After: xxxxxRRRRRRR > > > > That is, for every bit shifted out (W, 0) another one gets > > shifted in (x, R). The I/O primitive essentially swaps > > contents of a one-word shift register between master and > > slave; or, sequences of such words. Words don't need to > > be byte-size, though that's a common option. > > ... > > Wow, what interesting hardware logic and a nice explanation. It's standard SPI ... yes, it's kind of interesting, and I've not come across anything else that synchronizes RX and TX in quite the same way. I suspect that hardware designers find the ability to get quick transfers (tens of megabits/second) from just a shift register is fairly attractive. Less work than high speed I2C, for one example. > Could you put that into a comment somewhere close to those helpers? Fair enough. That file does lack any mention of the I/O model for SPI ... and it's certainly a bit of a surprise to anyone who's not had to look at it before! > You can safely assume, that any code which Linus doesn't understand > is non-trivial and needs a comment :-) I'm not sure I'd agree with that -- while he's many things, "expert in everything" is not one of them! -- but I do agree that comment would be worth adding. However, with regards to $PATCH ... considering that the '28 release is just around the corner, I changed my mind and would suggest just reverting f9b90e39cbc5c4d6ef60022fd1f25d541df0aad1 much though I very much like that fix in general. Reason being: it needs a bit more work, I forgot that it will break for example drivers using drivers/spi/omap_uwire.c since that is built on top of truly half-duplex hardware. (Used with OMAP1 SOCs.) The kerneldoc says it's for providing MicroWire style transactions, so it's kind of significant that it work on MicroWire hardware too. ;) So an updated version would need to both zero the TX bits when using normal full-duplex SPI hardware ... and use more or less the current code when running on half-duplex variants. - Dave ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-23 18:37 ` David Brownell 0 siblings, 0 replies; 19+ messages in thread From: David Brownell @ 2008-12-23 18:37 UTC (permalink / raw) To: Ingo Oeser Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder, Linus Torvalds, lkml On Monday 22 December 2008, Ingo Oeser wrote: > Hi David, > > On Monday 22 December 2008, you wrote: > > The write_then_read() helper is > > simplifying a common half-duplex idiom for short operations, > > but the hardware still does full duplex. Buffer layout is: > > > > Before: WWWWW0000000 > > After: xxxxxRRRRRRR > > > > That is, for every bit shifted out (W, 0) another one gets > > shifted in (x, R). The I/O primitive essentially swaps > > contents of a one-word shift register between master and > > slave; or, sequences of such words. Words don't need to > > be byte-size, though that's a common option. > > ... > > Wow, what interesting hardware logic and a nice explanation. It's standard SPI ... yes, it's kind of interesting, and I've not come across anything else that synchronizes RX and TX in quite the same way. I suspect that hardware designers find the ability to get quick transfers (tens of megabits/second) from just a shift register is fairly attractive. Less work than high speed I2C, for one example. > Could you put that into a comment somewhere close to those helpers? Fair enough. That file does lack any mention of the I/O model for SPI ... and it's certainly a bit of a surprise to anyone who's not had to look at it before! > You can safely assume, that any code which Linus doesn't understand > is non-trivial and needs a comment :-) I'm not sure I'd agree with that -- while he's many things, "expert in everything" is not one of them! -- but I do agree that comment would be worth adding. However, with regards to $PATCH ... considering that the '28 release is just around the corner, I changed my mind and would suggest just reverting f9b90e39cbc5c4d6ef60022fd1f25d541df0aad1 much though I very much like that fix in general. Reason being: it needs a bit more work, I forgot that it will break for example drivers using drivers/spi/omap_uwire.c since that is built on top of truly half-duplex hardware. (Used with OMAP1 SOCs.) The kerneldoc says it's for providing MicroWire style transactions, so it's kind of significant that it work on MicroWire hardware too. ;) So an updated version would need to both zero the TX bits when using normal full-duplex SPI hardware ... and use more or less the current code when running on half-duplex variants. - Dave ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-23 18:45 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2008-12-23 18:45 UTC (permalink / raw) To: David Brownell; +Cc: Andrew Morton, spi-devel-general, lkml, Vernon Sauder On Sun, 21 Dec 2008, David Brownell wrote: > On Sunday 21 December 2008, Linus Torvalds wrote: > > > > Hmm. In addition, isn't this broken (in that same function): > > No -- this is full duplex. The write_then_read() helper is > simplifying a common half-duplex idiom for short operations, > but the harware still does full duplex. Buffer layout is: > > Before: WWWWW0000000 > After: xxxxxRRRRRRR Then, the problem is this: x.tx_buf = local_buf; x.rx_buf = local_buf; which makes no sense. It's not two separate "tx_buf"/"rx_buf" things, it's just a single "buf". Having two separate pointers is not only confusing, but it's actively WRONG, since the "rx_buf" clearly does not start at local_buf. But that's not how most spi drivers seem to do it. They literally set tx_bug and rx_buf to different values. So again, none of what you talk about makes sense, and none of your "explanations" are anything but just incoherent noise. So please explain how it can _possibly_ make sense to - have two separate pointers - and have one of them point to what is clearly not the data it is supposed to point to. Hmm? Linus ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-23 18:45 ` Linus Torvalds 0 siblings, 0 replies; 19+ messages in thread From: Linus Torvalds @ 2008-12-23 18:45 UTC (permalink / raw) To: David Brownell Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder, lkml On Sun, 21 Dec 2008, David Brownell wrote: > On Sunday 21 December 2008, Linus Torvalds wrote: > > > > Hmm. In addition, isn't this broken (in that same function): > > No -- this is full duplex. The write_then_read() helper is > simplifying a common half-duplex idiom for short operations, > but the harware still does full duplex. Buffer layout is: > > Before: WWWWW0000000 > After: xxxxxRRRRRRR Then, the problem is this: x.tx_buf = local_buf; x.rx_buf = local_buf; which makes no sense. It's not two separate "tx_buf"/"rx_buf" things, it's just a single "buf". Having two separate pointers is not only confusing, but it's actively WRONG, since the "rx_buf" clearly does not start at local_buf. But that's not how most spi drivers seem to do it. They literally set tx_bug and rx_buf to different values. So again, none of what you talk about makes sense, and none of your "explanations" are anything but just incoherent noise. So please explain how it can _possibly_ make sense to - have two separate pointers - and have one of them point to what is clearly not the data it is supposed to point to. Hmm? Linus ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-23 20:54 ` David Brownell 0 siblings, 0 replies; 19+ messages in thread From: David Brownell @ 2008-12-23 20:54 UTC (permalink / raw) To: Linus Torvalds; +Cc: Andrew Morton, spi-devel-general, lkml, Vernon Sauder On Tuesday 23 December 2008, Linus Torvalds wrote: > > On Sun, 21 Dec 2008, David Brownell wrote: > > > On Sunday 21 December 2008, Linus Torvalds wrote: > > > > > > Hmm. In addition, isn't this broken (in that same function): > > > > No -- this is full duplex. The write_then_read() helper is > > simplifying a common half-duplex idiom for short operations, > > but the hardware still does full duplex. Buffer layout is: > > > > Before: WWWWW0000000 > > After: xxxxxRRRRRRR > > Then, the problem is this: > > x.tx_buf = local_buf; > x.rx_buf = local_buf; > > which makes no sense. It's not two separate "tx_buf"/"rx_buf" things, > it's just a single "buf". In *this* specific case, yes. That's the whole point of this routine: letting drivers treat a full duplex hardware protocol as if it were a half duplex model. Note that the *caller* provides two separate buffers ... which can safely be on the stack; memcpy is used to access them. The whole local_buf thing is a bounce-buffer, letting the caller ignore DMA and other I/O issues that can be confusing. You're not quite understanding an optimization, I suspect. The original version of this code had two separate tranfers, and each was half duplex: first a TX, then an RX ... and with the buffer pointer in the RX transfer offset as you seem to expect. But it uses local_buf in the *same* way. The only thing different is how the I/O is described. With ", " separating the first and second transfers: Before: WWWWW, 0000000 After: xxxxx, RRRRRRR Previously "discard RX data" (xxxxx) and "TX zeroes" (000000) were implicit, because the TX transfer had no RX buffer and then the RX transfer had no TX buffer. (You'll get that code if you revert the patch, as I suggest for '28-final.) The "interesting" RX data landed in the same place. Thing is, the mid-message switch between two transfers made for nasty performance when DMA or similar pipelining (FIFOs) is involved. The cost to switch from the TX transfer to the RX transfer could easily double the time to perform the whole write-then-read message. (And it exposed a bug in one SPI master controller driver, since fixed.) Hence the switch to a lower overhead implementation, which implements the same on-the-wire protocol more efficiently. It made the "xxxxx" and "0000000" parts explicit, and uses one transfer not two. (And exposed DMA bugs in other SPI master controller drivers ... in one case, its fix seemed to resolve some longstanding hard-to-reproduce problems.) And that's why the offset to the RRRRRRR bits went from being explicit to being implicit: it's doing the whole thing in one transfer, rather than two. > Having two separate pointers is not only confusing, but it's actively > WRONG, since the "rx_buf" clearly does not start at local_buf. Erm, no. It's right as is. Watch the bits on the wire, or as they arrive in the buffers -- it's correct. Bits are arriving and going into rx_buf, starting at exactly the beginning of local_buf. But none of those bits are interesting to the caller until the ones read AFTER the write completes. Hence the name of the function: write, then read. SPI itself does both at the same time. > So please explain how it can _possibly_ make sense to > > - have two separate pointers In the general case, the full duplex nature of SPI is available for drivers to do what they want. This version of write_then_read() is using that for a performance optimization. It would be extremely surprising for a driver to write data and always have its transmit buffer overwritten -- unless it asked for that behavior. Especially for a driver that doesn't care about the data the slave returns from that particular message! Likewise if it had to initialize each RX buffer before using it. The per-transfer interface prevents such surprises by splitting out the RX and TX buffers. Code that wants overwriting can get it; code that doesn't, isn't forced to cope with that. > - and have one of them point to what is clearly not the data it is > supposed to point to. See above ... you're not seeing the consequences of each transfer being full duplex. There aren't many ways to package an interface to hardware transfers where no bit is received without transmitting another one: - Only provide half duplex transfers ... not a good choice, some devices do need full duplex, and custom drivers have wanted to leverage the full duplex hardware. - Only provide a primitive where every TX buffer gets clobbered by RX data, and every RX buffer must be zeroed ... limiting, because of half-duplex variants (Microwire, and "3-wire SPI") that would be precluded. - Provide a primitive with both RX and TX buffers, and let the drivers set them up as needed. The current framework uses the last option, which still seems to me to be the best choice. - Dave ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix @ 2008-12-23 20:54 ` David Brownell 0 siblings, 0 replies; 19+ messages in thread From: David Brownell @ 2008-12-23 20:54 UTC (permalink / raw) To: Linus Torvalds Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder, lkml On Tuesday 23 December 2008, Linus Torvalds wrote: > > On Sun, 21 Dec 2008, David Brownell wrote: > > > On Sunday 21 December 2008, Linus Torvalds wrote: > > > > > > Hmm. In addition, isn't this broken (in that same function): > > > > No -- this is full duplex. The write_then_read() helper is > > simplifying a common half-duplex idiom for short operations, > > but the hardware still does full duplex. Buffer layout is: > > > > Before: WWWWW0000000 > > After: xxxxxRRRRRRR > > Then, the problem is this: > > x.tx_buf = local_buf; > x.rx_buf = local_buf; > > which makes no sense. It's not two separate "tx_buf"/"rx_buf" things, > it's just a single "buf". In *this* specific case, yes. That's the whole point of this routine: letting drivers treat a full duplex hardware protocol as if it were a half duplex model. Note that the *caller* provides two separate buffers ... which can safely be on the stack; memcpy is used to access them. The whole local_buf thing is a bounce-buffer, letting the caller ignore DMA and other I/O issues that can be confusing. You're not quite understanding an optimization, I suspect. The original version of this code had two separate tranfers, and each was half duplex: first a TX, then an RX ... and with the buffer pointer in the RX transfer offset as you seem to expect. But it uses local_buf in the *same* way. The only thing different is how the I/O is described. With ", " separating the first and second transfers: Before: WWWWW, 0000000 After: xxxxx, RRRRRRR Previously "discard RX data" (xxxxx) and "TX zeroes" (000000) were implicit, because the TX transfer had no RX buffer and then the RX transfer had no TX buffer. (You'll get that code if you revert the patch, as I suggest for '28-final.) The "interesting" RX data landed in the same place. Thing is, the mid-message switch between two transfers made for nasty performance when DMA or similar pipelining (FIFOs) is involved. The cost to switch from the TX transfer to the RX transfer could easily double the time to perform the whole write-then-read message. (And it exposed a bug in one SPI master controller driver, since fixed.) Hence the switch to a lower overhead implementation, which implements the same on-the-wire protocol more efficiently. It made the "xxxxx" and "0000000" parts explicit, and uses one transfer not two. (And exposed DMA bugs in other SPI master controller drivers ... in one case, its fix seemed to resolve some longstanding hard-to-reproduce problems.) And that's why the offset to the RRRRRRR bits went from being explicit to being implicit: it's doing the whole thing in one transfer, rather than two. > Having two separate pointers is not only confusing, but it's actively > WRONG, since the "rx_buf" clearly does not start at local_buf. Erm, no. It's right as is. Watch the bits on the wire, or as they arrive in the buffers -- it's correct. Bits are arriving and going into rx_buf, starting at exactly the beginning of local_buf. But none of those bits are interesting to the caller until the ones read AFTER the write completes. Hence the name of the function: write, then read. SPI itself does both at the same time. > So please explain how it can _possibly_ make sense to > > - have two separate pointers In the general case, the full duplex nature of SPI is available for drivers to do what they want. This version of write_then_read() is using that for a performance optimization. It would be extremely surprising for a driver to write data and always have its transmit buffer overwritten -- unless it asked for that behavior. Especially for a driver that doesn't care about the data the slave returns from that particular message! Likewise if it had to initialize each RX buffer before using it. The per-transfer interface prevents such surprises by splitting out the RX and TX buffers. Code that wants overwriting can get it; code that doesn't, isn't forced to cope with that. > - and have one of them point to what is clearly not the data it is > supposed to point to. See above ... you're not seeing the consequences of each transfer being full duplex. There aren't many ways to package an interface to hardware transfers where no bit is received without transmitting another one: - Only provide half duplex transfers ... not a good choice, some devices do need full duplex, and custom drivers have wanted to leverage the full duplex hardware. - Only provide a primitive where every TX buffer gets clobbered by RX data, and every RX buffer must be zeroed ... limiting, because of half-duplex variants (Microwire, and "3-wire SPI") that would be precluded. - Provide a primitive with both RX and TX buffers, and let the drivers set them up as needed. The current framework uses the last option, which still seems to me to be the best choice. - Dave ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <200812231254.55284.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org>]
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix [not found] ` <200812231254.55284.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> @ 2008-12-24 3:26 ` Tang, Feng [not found] ` <EADF0A36011179459010BDF5142A45750458C7DD-Uz4Je35TzWuiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Tang, Feng @ 2008-12-24 3:26 UTC (permalink / raw) To: David Brownell, Linus Torvalds Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder Hi David, I'm writing a SPI-UART device driver for max3111, its working mechanism is if you want read data back, you need write 0 to it first, and almost every readback data from max3111 is meaningful. My driver uses the spi_write_then_read() API, it write some 0s to the device, and read all the feedback from the device, as the feedback may have valid receive data. The format is Before: WWWWWW After: RRRRRR The driver works in 2.6.27 with your previous spi_write_then_read() implementation, but fails on 2.6.28 as it dumps all the readback value of the write. Hope this case be useful to the discussion. Thanks, Feng -----Original Message----- From: David Brownell [mailto:david-b@pacbell.net] Sent: 2008年12月24日 4:55 To: Linus Torvalds Cc: spi-devel-general@lists.sourceforge.net; Andrew Morton; Vernon Sauder; lkml Subject: Re: [spi-devel-general] [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix On Tuesday 23 December 2008, Linus Torvalds wrote: > > On Sun, 21 Dec 2008, David Brownell wrote: > > > On Sunday 21 December 2008, Linus Torvalds wrote: > > > > > > Hmm. In addition, isn't this broken (in that same function): > > > > No -- this is full duplex. The write_then_read() helper is > > simplifying a common half-duplex idiom for short operations, > > but the hardware still does full duplex. Buffer layout is: > > > > Before: WWWWW0000000 > > After: xxxxxRRRRRRR > > Then, the problem is this: > > x.tx_buf = local_buf; > x.rx_buf = local_buf; > > which makes no sense. It's not two separate "tx_buf"/"rx_buf" things, > it's just a single "buf". In *this* specific case, yes. That's the whole point of this routine: letting drivers treat a full duplex hardware protocol as if it were a half duplex model. Note that the *caller* provides two separate buffers ... which can safely be on the stack; memcpy is used to access them. The whole local_buf thing is a bounce-buffer, letting the caller ignore DMA and other I/O issues that can be confusing. You're not quite understanding an optimization, I suspect. The original version of this code had two separate tranfers, and each was half duplex: first a TX, then an RX ... and with the buffer pointer in the RX transfer offset as you seem to expect. But it uses local_buf in the *same* way. The only thing different is how the I/O is described. With ", " separating the first and second transfers: Before: WWWWW, 0000000 After: xxxxx, RRRRRRR Previously "discard RX data" (xxxxx) and "TX zeroes" (000000) were implicit, because the TX transfer had no RX buffer and then the RX transfer had no TX buffer. (You'll get that code if you revert the patch, as I suggest for '28-final.) The "interesting" RX data landed in the same place. Thing is, the mid-message switch between two transfers made for nasty performance when DMA or similar pipelining (FIFOs) is involved. The cost to switch from the TX transfer to the RX transfer could easily double the time to perform the whole write-then-read message. (And it exposed a bug in one SPI master controller driver, since fixed.) Hence the switch to a lower overhead implementation, which implements the same on-the-wire protocol more efficiently. It made the "xxxxx" and "0000000" parts explicit, and uses one transfer not two. (And exposed DMA bugs in other SPI master controller drivers ... in one case, its fix seemed to resolve some longstanding hard-to-reproduce problems.) And that's why the offset to the RRRRRRR bits went from being explicit to being implicit: it's doing the whole thing in one transfer, rather than two. > Having two separate pointers is not only confusing, but it's actively > WRONG, since the "rx_buf" clearly does not start at local_buf. Erm, no. It's right as is. Watch the bits on the wire, or as they arrive in the buffers -- it's correct. Bits are arriving and going into rx_buf, starting at exactly the beginning of local_buf. But none of those bits are interesting to the caller until the ones read AFTER the write completes. Hence the name of the function: write, then read. SPI itself does both at the same time. > So please explain how it can _possibly_ make sense to > > - have two separate pointers In the general case, the full duplex nature of SPI is available for drivers to do what they want. This version of write_then_read() is using that for a performance optimization. It would be extremely surprising for a driver to write data and always have its transmit buffer overwritten -- unless it asked for that behavior. Especially for a driver that doesn't care about the data the slave returns from that particular message! Likewise if it had to initialize each RX buffer before using it. The per-transfer interface prevents such surprises by splitting out the RX and TX buffers. Code that wants overwriting can get it; code that doesn't, isn't forced to cope with that. > - and have one of them point to what is clearly not the data it is > supposed to point to. See above ... you're not seeing the consequences of each transfer being full duplex. There aren't many ways to package an interface to hardware transfers where no bit is received without transmitting another one: - Only provide half duplex transfers ... not a good choice, some devices do need full duplex, and custom drivers have wanted to leverage the full duplex hardware. - Only provide a primitive where every TX buffer gets clobbered by RX data, and every RX buffer must be zeroed ... limiting, because of half-duplex variants (Microwire, and "3-wire SPI") that would be precluded. - Provide a primitive with both RX and TX buffers, and let the drivers set them up as needed. The current framework uses the last option, which still seems to me to be the best choice. - Dave ------------------------------------------------------------------------------ _______________________________________________ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general ------------------------------------------------------------------------------ _______________________________________________ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <EADF0A36011179459010BDF5142A45750458C7DD-Uz4Je35TzWuiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix [not found] ` <EADF0A36011179459010BDF5142A45750458C7DD-Uz4Je35TzWuiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2008-12-24 3:53 ` David Brownell [not found] ` <200812231953.26407.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: David Brownell @ 2008-12-24 3:53 UTC (permalink / raw) To: Tang, Feng Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder, Linus Torvalds On Tuesday 23 December 2008, Tang, Feng wrote: > I'm writing a SPI-UART device driver for max3111, its working > mechanism is if you want read data back, you need write 0 to > it first ... > > The driver works in 2.6.27 with your previous spi_write_then_read() > implementation, but fails on 2.6.28 as it dumps all the readback > value of the write. And so it works with the $SUBJECT patch, yes? ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <200812231953.26407.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org>]
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix [not found] ` <200812231953.26407.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> @ 2008-12-24 4:20 ` Tang, Feng [not found] ` <EADF0A36011179459010BDF5142A45750458C815-Uz4Je35TzWuiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: Tang, Feng @ 2008-12-24 4:20 UTC (permalink / raw) To: David Brownell Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder, Linus Torvalds No. After that it fails on 2.6.28, I wrote a new API inside my driver which is similar to your 2.6.28 spi_write_then_read(), but modify the buffer layout. And the new API works fine for my max3111 device. Thanks, Feng -----Original Message----- From: David Brownell [mailto:david-b@pacbell.net] Sent: 2008年12月24日 11:53 To: Tang, Feng Cc: Linus Torvalds; spi-devel-general@lists.sourceforge.net; Andrew Morton; Vernon Sauder Subject: Re: [spi-devel-general] [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix On Tuesday 23 December 2008, Tang, Feng wrote: > I'm writing a SPI-UART device driver for max3111, its working > mechanism is if you want read data back, you need write 0 to > it first ... > > The driver works in 2.6.27 with your previous spi_write_then_read() > implementation, but fails on 2.6.28 as it dumps all the readback > value of the write. And so it works with the $SUBJECT patch, yes? ------------------------------------------------------------------------------ _______________________________________________ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <EADF0A36011179459010BDF5142A45750458C815-Uz4Je35TzWuiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org>]
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix [not found] ` <EADF0A36011179459010BDF5142A45750458C815-Uz4Je35TzWuiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org> @ 2008-12-24 8:39 ` David Brownell [not found] ` <200812240039.32008.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 0 siblings, 1 reply; 19+ messages in thread From: David Brownell @ 2008-12-24 8:39 UTC (permalink / raw) To: Tang, Feng Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder, Linus Torvalds On Tuesday 23 December 2008, Tang, Feng wrote: > o. After that it fails on 2.6.28, I wrote a new API inside > my driver which is similar to your 2.6.28 spi_write_then_read(), > but modify the buffer layout. And the new API works fine for my > max3111 device. Sounds to me like the issue is a buggy SPI controller driver ... is this a driver that's in mainline? I've certainly seen a couple drivers that handled the full duplex behavior incorrectly, until they fixed that bug. As in, they wrote the tx_buf *then* read rx_buf (a half-duplex model) instead of doing them concurrently (the full-duplex model a spi_master must implement). > > Before: WWWWWW > > After: RRRRRR That's obviously incorrect, or you're introducing some new notation. See my previous notes. - Dave ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <200812240039.32008.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org>]
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix [not found] ` <200812240039.32008.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> @ 2008-12-24 8:58 ` Baruch Siach [not found] ` <20081224085844.GA11197-l+PWtdWbHAs2zBO71LVbc/qBs+8SCbDb@public.gmane.org> 2008-12-25 7:03 ` Tang, Feng 1 sibling, 1 reply; 19+ messages in thread From: Baruch Siach @ 2008-12-24 8:58 UTC (permalink / raw) To: David Brownell Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Vernon Sauder, Linus Torvalds, Andrew Morton Hi David, On Wed, Dec 24, 2008 at 12:39:31AM -0800, David Brownell wrote: > I've certainly seen a couple drivers that handled the > full duplex behavior incorrectly, until they fixed that > bug. As in, they wrote the tx_buf *then* read rx_buf > (a half-duplex model) instead of doing them concurrently > (the full-duplex model a spi_master must implement). May you name those drivers and the commits with the bug fixes? I couldn't find them in Linus' tree. I'm working on an SPI master (DesignWare) driver and I want to make sure that I'm implementing the API correctly. baruch -- ~. .~ Tk Open Systems =}------------------------------------------------ooO--U--Ooo------------{= - baruch-NswTu9S1W3P6gbPvEgmw2w@public.gmane.org - tel: +972.2.679.5364, http://www.tkos.co.il - ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
[parent not found: <20081224085844.GA11197-l+PWtdWbHAs2zBO71LVbc/qBs+8SCbDb@public.gmane.org>]
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix [not found] ` <20081224085844.GA11197-l+PWtdWbHAs2zBO71LVbc/qBs+8SCbDb@public.gmane.org> @ 2008-12-24 19:43 ` David Brownell 0 siblings, 0 replies; 19+ messages in thread From: David Brownell @ 2008-12-24 19:43 UTC (permalink / raw) To: Baruch Siach Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Vernon Sauder, Linus Torvalds, Andrew Morton On Wednesday 24 December 2008, Baruch Siach wrote: > > I've certainly seen a couple drivers that handled the > > full duplex behavior incorrectly, until they fixed that > > bug. As in, they wrote the tx_buf *then* read rx_buf > > (a half-duplex model) instead of doing them concurrently > > (the full-duplex model a spi_master must implement). > > May you name those drivers and the commits with the bug fixes? > I couldn't find them in Linus' tree. I don't recall such a driver merging to mainline; the idea is to catch such fundamental bugs before they merge. > I'm working on an SPI master (DesignWare) driver and I want > to make sure that I'm implementing the API correctly. Lacking a test suite for the spi_master drivers, I suggest that you verify your driver works with some of the existing protocol drivers. If at all possible, hook up a couple SPI slaves to different chipselects on your master, and test them all. Then, if you find misbehavior when you test a driver that's already known to work with a bunch of controller drivers, it's more likely that the bug is in your new code than the existing code. - Dave ------------------------------------------------------------------------------ ^ permalink raw reply [flat|nested] 19+ messages in thread
* Re: [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix [not found] ` <200812240039.32008.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 2008-12-24 8:58 ` Baruch Siach @ 2008-12-25 7:03 ` Tang, Feng 1 sibling, 0 replies; 19+ messages in thread From: Tang, Feng @ 2008-12-25 7:03 UTC (permalink / raw) To: dbrownell-Rn4VEauK+AKRv+LV9MX5uipxlwaOVQ5f Cc: spi-devel-general-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f, Andrew Morton, Vernon Sauder, Linus Torvalds >-----Original Message----- >From: David Brownell [mailto:david-b@pacbell.net] >Sent: 2008年12月24日 16:40 >To: Tang, Feng >Cc: Linus Torvalds; spi-devel-general@lists.sourceforge.net; Andrew Morton; >Vernon Sauder >Subject: Re: [spi-devel-general] [patch 2.6.28-rc9] spi: spi_write_then_read() >regression fix > >On Tuesday 23 December 2008, Tang, Feng wrote: >> o. After that it fails on 2.6.28, I wrote a new API inside >> my driver which is similar to your 2.6.28 spi_write_then_read(), >> but modify the buffer layout. And the new API works fine for my >> max3111 device. > >Sounds to me like the issue is a buggy SPI controller >driver ... is this a driver that's in mainline? [Tang, Feng] The HW I use is a SPI controller from Synopsys, I wrote and am still debugging the driver as the silicon is not stable yet. > >I've certainly seen a couple drivers that handled the >full duplex behavior incorrectly, until they fixed that >bug. As in, they wrote the tx_buf *then* read rx_buf >(a half-duplex model) instead of doing them concurrently >(the full-duplex model a spi_master must implement). > > >> > Before: WWWWWW >> > After: RRRRRR > >That's obviously incorrect, or you're introducing >some new notation. See my previous notes. I agree with your second assumption :). It's a new use case, for the max3111 spi-uart device, no matter you write a valid char or a 0 to it, the readback data may contain a valid data (some bits indicating the data is valid or not), so no data should be dumped. Following link contains the datasheet of max3111 http://pdf1.alldatasheet.com/datasheet-pdf/view/73121/MAXIM/MAX3110.html . > >- Dave > Thanks, Feng ------------------------------------------------------------------------------ _______________________________________________ spi-devel-general mailing list spi-devel-general@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/spi-devel-general ^ permalink raw reply [flat|nested] 19+ messages in thread
end of thread, other threads:[~2008-12-25 7:03 UTC | newest] Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-12-21 7:32 [patch 2.6.28-rc9] spi: spi_write_then_read() regression fix David Brownell 2008-12-21 7:32 ` David Brownell 2008-12-21 23:46 ` Linus Torvalds 2008-12-21 23:46 ` Linus Torvalds 2008-12-22 0:48 ` David Brownell 2008-12-23 1:53 ` Ingo Oeser 2008-12-23 18:37 ` David Brownell 2008-12-23 18:37 ` David Brownell 2008-12-23 18:45 ` Linus Torvalds 2008-12-23 18:45 ` Linus Torvalds 2008-12-23 20:54 ` David Brownell 2008-12-23 20:54 ` David Brownell [not found] ` <200812231254.55284.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 2008-12-24 3:26 ` Tang, Feng [not found] ` <EADF0A36011179459010BDF5142A45750458C7DD-Uz4Je35TzWuiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org> 2008-12-24 3:53 ` David Brownell [not found] ` <200812231953.26407.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 2008-12-24 4:20 ` Tang, Feng [not found] ` <EADF0A36011179459010BDF5142A45750458C815-Uz4Je35TzWuiAffOGbnezLfspsVTdybXVpNB7YpNyf8@public.gmane.org> 2008-12-24 8:39 ` David Brownell [not found] ` <200812240039.32008.david-b-yBeKhBN/0LDR7s880joybQ@public.gmane.org> 2008-12-24 8:58 ` Baruch Siach [not found] ` <20081224085844.GA11197-l+PWtdWbHAs2zBO71LVbc/qBs+8SCbDb@public.gmane.org> 2008-12-24 19:43 ` David Brownell 2008-12-25 7:03 ` Tang, Feng
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.