* [RFC] consistent_sync and non L1 cache line aligned buffers
@ 2003-07-15  4:32 Eugene Surovegin
  2003-07-15 15:46 ` Tom Rini
  2003-07-15 16:17 ` Matt Porter
  0 siblings, 2 replies; 24+ messages in thread
From: Eugene Surovegin @ 2003-07-15  4:32 UTC (permalink / raw)
  To: linuxppc-embedded


Hi!

I think this is a known problem.

There are drivers or even subsystems which use stack allocated DMA buffers.
To make things worse, those buffers are usually not L1 cache line aligned
(start and/or end).

When they use pci_map_* with PCI_DMA_FROMDEVICE, consistent_sync calls
invalidate_dcache_range for the buffer.

invalidate_dcache_range works in L1_CACHE_LINE chunks, so if start and/or
end of the buffer are not aligned we may corrupt data located in the same
cache line (usually stack variable(s) declared before or after buffer
declaration).
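
To make the failure mode concrete, here is a minimal user-space sketch
(not kernel code) of the rounding that a line-granular invalidate
performs. The 32-byte LINE_SIZE and the helper names are assumptions for
illustration only; the real line size is CPU-specific.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration; the real value is CPU-specific. */
#define LINE_SIZE 32u

struct span {
	uintptr_t first;	/* first byte touched by the operation */
	uintptr_t last;		/* one past the last byte touched */
};

/* Range actually hit when [start, end) is invalidated line by line. */
static struct span invalidated_span(uintptr_t start, uintptr_t end)
{
	struct span s;

	/* Round start down and end up to whole cache lines. */
	s.first = start & ~(uintptr_t)(LINE_SIZE - 1);
	s.last = (end + LINE_SIZE - 1) & ~(uintptr_t)(LINE_SIZE - 1);
	return s;
}

/* Bytes outside the buffer that get invalidated along with it. */
static size_t collateral_bytes(uintptr_t start, uintptr_t end)
{
	struct span s = invalidated_span(start, end);

	return (size_t)(start - s.first) + (size_t)(s.last - end);
}
```

With these assumptions, a 32-byte buffer that starts 4 bytes into a line
drags in 4 bytes before it and 28 bytes after it, which is where the
neighbouring stack variables live.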

According to MV kernel, there are USB devices that use such buffers.

After spending last weekend with RISCWatch :) I can say that the SCSI subsystem
is also guilty of this behavior (drivers/scsi/scsi_scan.c::scan_scsis,
scsi_result0).

Unfortunately, I don't know how many similar places of code are still
waiting to be found :(.
To be safe I think it's better to modify consistent_sync to handle such
"bad" buffers.

If start and/or end of the buffer are not properly aligned I use "dcbf" to
flush corresponding cache line(s) and then call invalidate_dcache_range.
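
The difference between invalidate-only and flush-then-invalidate can be
modelled with a toy single-line write-back cache. This is a user-space
sketch under simplifying assumptions (one 32-byte line, no DMA traffic
modelled); toy_line and its helpers are hypothetical names, not kernel
interfaces.

```c
#include <assert.h>
#include <string.h>

#define LINE_SIZE 32	/* assumed line size, for illustration only */

/* One cache line plus the RAM behind it (hypothetical toy model). */
struct toy_line {
	unsigned char mem[LINE_SIZE];	/* backing RAM */
	unsigned char cache[LINE_SIZE];	/* cached copy, meaningful if valid */
	int valid;
	int dirty;
};

/* Bring the line into the cache if it is not there yet. */
static void line_fill(struct toy_line *l)
{
	if (!l->valid) {
		memcpy(l->cache, l->mem, LINE_SIZE);
		l->valid = 1;
		l->dirty = 0;
	}
}

/* CPU store: lands in the cache only (write-back behaviour). */
static void cpu_write(struct toy_line *l, int off, unsigned char v)
{
	assert(off >= 0 && off < LINE_SIZE);
	line_fill(l);
	l->cache[off] = v;
	l->dirty = 1;
}

/* CPU load: served from the cache, filling from RAM on a miss. */
static unsigned char cpu_read(struct toy_line *l, int off)
{
	assert(off >= 0 && off < LINE_SIZE);
	line_fill(l);
	return l->cache[off];
}

/* Invalidate: drop the cached copy; any dirty data is lost. */
static void line_invalidate(struct toy_line *l)
{
	l->valid = 0;
	l->dirty = 0;
}

/* Flush (the effect of dcbf): write back if dirty, then drop the copy. */
static void line_flush(struct toy_line *l)
{
	if (l->valid && l->dirty)
		memcpy(l->mem, l->cache, LINE_SIZE);
	l->valid = 0;
	l->dirty = 0;
}
```

In this model a value stored next to the buffer and not yet written back
is lost by line_invalidate() alone, but survives line_flush() followed
by line_invalidate(). The model says nothing about CPU accesses to the
line while the DMA is in flight.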

This change doesn't noticeably affect the performance of consistent_sync
(unlike the variant I found in the MV kernel, where invalidate_dcache_range
was changed to flush_dcache_range if USB was enabled).

I don't know whether we should "ifdef" this for CONFIG_4xx and I know this
fix is ugly :)
I'm not even sure that such hacks should be included in the kernel :)))
(but I will definitely use it in my tree)

Comments/suggestions are welcome!

Thanks,

Eugene


===== arch/ppc/mm/cachemap.c 1.13 vs edited =====
--- 1.13/arch/ppc/mm/cachemap.c	Thu Feb 27 11:40:16 2003
+++ edited/arch/ppc/mm/cachemap.c	Mon Jul 14 20:49:28 2003
@@ -150,6 +150,21 @@
  	case PCI_DMA_NONE:
  		BUG();
  	case PCI_DMA_FROMDEVICE:	/* invalidate only */
+
+		/* Handle cases when the buffer start and/or end
+		   are not L1 cache line aligned.
+		   Some drivers/subsystems (e.g. USB, SCSI) do DMA
+		   from the stack allocated buffers, to prevent
+		   corruption of the other stack variables located
+		   near the buffer, we flush (instead of invalidate)
+		   these "dangerous" areas                     --ebs
+		*/
+		if (unlikely(start & (L1_CACHE_LINE_SIZE - 1)))
+			__asm__ __volatile__("dcbf 0,%0" : : "r" (start));
+
+		if (unlikely(end & (L1_CACHE_LINE_SIZE - 1)))
+			__asm__ __volatile__("dcbf 0,%0" : : "r" (end));
+
  		invalidate_dcache_range(start, end);
  		break;
  	case PCI_DMA_TODEVICE:		/* writeback only */


** Sent via the linuxppc-embedded mail list. See http://lists.linuxppc.org/


* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15  4:32 [RFC] consistent_sync and non L1 cache line aligned buffers Eugene Surovegin
@ 2003-07-15 15:46 ` Tom Rini
  2003-07-15 16:20   ` Eugene Surovegin
  2003-07-15 16:17 ` Matt Porter
  1 sibling, 1 reply; 24+ messages in thread
From: Tom Rini @ 2003-07-15 15:46 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded


On Mon, Jul 14, 2003 at 09:32:07PM -0700, Eugene Surovegin wrote:

> I think this is a known problem.

Yes, fortunately.

> According to MV kernel, there are USB devices that use such buffers.
>
> After spending last weekend with RISCWatch :) I can say that the SCSI subsystem
> is also guilty of this behavior (drivers/scsi/scsi_scan.c::scan_scsis,
> scsi_result0).

Owch.

> Unfortunately, I don't know how many similar places of code are still
> waiting to be found :(.
> To be safe I think it's better to modify consistent_sync to handle such
> "bad" buffers.
>
> If start and/or end of the buffer are not properly aligned I use "dcbf" to
> flush corresponding cache line(s) and then call invalidate_dcache_range.
>
> This change doesn't noticeably affect the performance of consistent_sync
> (unlike the variant I found in the MV kernel, where invalidate_dcache_range
> was changed to flush_dcache_range if USB was enabled).

Good to know.

> I don't know whether we should "ifdef" this for CONFIG_4xx and I know this
> fix is ugly :)
> I'm not even sure that such hacks should be included in the kernel :)))
> (but I will definitely use it in my tree)
>
> Comments/suggestions are welcome!

Well, one thing that is worth noting is that the USB people knew this
was a problem, and it was / should have been fixed in the 2.5 cycle.
Similarly, SCSI was cleaned up a lot, so perhaps this has been fixed
there.  I think it's generally known that doing DMA off of the stack is
a bad idea, and should be fixed when found.

--
Tom Rini
http://gate.crashing.org/~trini/


* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15  4:32 [RFC] consistent_sync and non L1 cache line aligned buffers Eugene Surovegin
  2003-07-15 15:46 ` Tom Rini
@ 2003-07-15 16:17 ` Matt Porter
  2003-07-15 16:27   ` Eugene Surovegin
  1 sibling, 1 reply; 24+ messages in thread
From: Matt Porter @ 2003-07-15 16:17 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded


On Mon, Jul 14, 2003 at 09:32:07PM -0700, Eugene Surovegin wrote:
>
> Hi!
>
> I think this is a known problem.
>
> There are drivers or even subsystems which use stack allocated DMA buffers.
> To make things worse, those buffers are usually not L1 cache line aligned
> (start and/or end).
>
> When they use pci_map_* with PCI_DMA_FROMDEVICE, consistent_sync calls
> invalidate_dcache_range for the buffer.
>
> invalidate_dcache_range works in L1_CACHE_LINE chunks, so if start and/or
> end of the buffer are not aligned we may corrupt data located in the same
> cache line (usually stack variable(s) declared before or after buffer
> declaration).
>
> According to MV kernel, there are USB devices that use such buffers.

Well, the USB subsystem itself in 2.4 has stack DMA buffers completely
intertwined, yes.  Fixing it in the USB stack was nontrivial enough
to force it to their 2.5/2.6 code.

> After spending last weekend with RISCWatch :) I can say that the SCSI subsystem
> is also guilty of this behavior (drivers/scsi/scsi_scan.c::scan_scsis,
> scsi_result0).

I went through the SCSI subsystem a while back and found a few including
that one too. :)  These should be passed up since they are clear
violations of the DMA API.  I dropped that task since (as you know)
there are other issues with using SCSI drivers on "non-coherents" at
the moment.

> Unfortunately, I don't know how many similar places of code are still
> waiting to be found :(.

<snip>

> Comments/suggestions are welcome!

I'll agree that it's a better hack, but since the offending areas in
the SCSI subsystem are easily located, it seems wiser to fix upstream.
Just my US 2 cents.

We still need someone with interest AND time to properly fix the
consistent alloc from irq issue. :)  All of the patches posted to date
are incomplete bandaids.

Regards,
--
Matt Porter
mporter@kernel.crashing.org


* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 15:46 ` Tom Rini
@ 2003-07-15 16:20   ` Eugene Surovegin
  2003-07-15 16:25     ` Tom Rini
  0 siblings, 1 reply; 24+ messages in thread
From: Eugene Surovegin @ 2003-07-15 16:20 UTC (permalink / raw)
  To: Tom Rini; +Cc: linuxppc-embedded


At 08:46 AM 7/15/2003, Tom Rini wrote:
>Well, one thing that is worth noting is that the USB people knew this
>was a problem, and it was / should have been fixed in the 2.5 cycle.
>Similarly, SCSI was cleaned up a lot, so perhaps this has been fixed
>there.  I think it's generally known that doing DMA off of the stack is
>a bad idea, and should be fixed when found.

I agree this is a VERY bad idea, but the fact is that there is code which
does such nasty things.

I truly hope all this will/was fixed in 2.5 but frankly I wouldn't be so
sure :)

Unfortunately, for production 2.5 is unusable and will be for some time.
A lot of people (I think the majority) still use 2.4. And 2.4 (as of
2.4.22-pre6) is still broken in this respect...

Eugene



* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 16:20   ` Eugene Surovegin
@ 2003-07-15 16:25     ` Tom Rini
  0 siblings, 0 replies; 24+ messages in thread
From: Tom Rini @ 2003-07-15 16:25 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: linuxppc-embedded


On Tue, Jul 15, 2003 at 09:20:24AM -0700, Eugene Surovegin wrote:
> At 08:46 AM 7/15/2003, Tom Rini wrote:
> >Well, one thing that is worth noting is that the USB people knew this
> >was a problem, and it was / should have been fixed in the 2.5 cycle.
> >Similarly, SCSI was cleaned up a lot, so perhaps this has been fixed
> >there.  I think it's generally known that doing DMA off of the stack is
> >a bad idea, and should be fixed when found.
>
> I agree this is a VERY bad idea, but the fact is that there is code which
> does such nasty things.
>
> I truly hope all this will/was fixed in 2.5 but frankly I wouldn't be so
> sure :)

Well, I would be, of the USB code.  SCSI might have had it fixed, and
others that we haven't found yet may or may not.  But the important
point is that doing this is a driver bug and it's OK to beat driver
authors over the head with patches to fix the behavior. :)

> Unfortunately, for production 2.5 is unusable and will be for some time.
> A lot of people (I think the majority) still use 2.4. And 2.4 (as of
> 2.4.22-pre6) is still broken in this respect...

Yes, the changes to USB and SCSI probably won't be backported for some
time, if ever.  So there is still the question of whether we should work
around this in 2.4 (or, more to the point, whether we leave it up to every
$(EMBEDDED VENDOR) to do it, or bite the bullet and commit it to the 2.4
mainline).

--
Tom Rini
http://gate.crashing.org/~trini/


* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 16:17 ` Matt Porter
@ 2003-07-15 16:27   ` Eugene Surovegin
  2003-07-15 18:11     ` PPCBoot on Ebony board Brian Padalino
  2003-07-15 23:51     ` [RFC] consistent_sync and non L1 cache line aligned buffers Matt Porter
  0 siblings, 2 replies; 24+ messages in thread
From: Eugene Surovegin @ 2003-07-15 16:27 UTC (permalink / raw)
  To: Matt Porter; +Cc: linuxppc-embedded


At 09:17 AM 7/15/2003, Matt Porter wrote:
>I'll agree that it's a better hack, but since the offending areas in
>the SCSI subsystem are easily located, it seems wiser to fix upstream.

Matt, the problem is it wasn't that *easy* to locate this, at least for me :)
I'm not sure that this is the only place..

>We still need someone with interest AND time to properly fix the
>consistent alloc from irq issue. :)  All of the patches posted to date
>are incomplete bandaids.

Uhh, I switched to a solution which uses preallocated consistent memory (10
pages are enough for sym53c8xx_2).
It's still not a generic solution, but at least it's safe :)

Eugene



* PPCBoot on Ebony board
  2003-07-15 16:27   ` Eugene Surovegin
@ 2003-07-15 18:11     ` Brian Padalino
  2003-07-15 21:32       ` Chris Zimman
  2003-07-15 23:51     ` [RFC] consistent_sync and non L1 cache line aligned buffers Matt Porter
  1 sibling, 1 reply; 24+ messages in thread
From: Brian Padalino @ 2003-07-15 18:11 UTC (permalink / raw)
  To: linuxppc-embedded


I am trying to transfer PPCBoot that I had compiled over a tftp session in
the 440GP 1.18 ROM Monitor (02/11/02) that came on the Ebony board.  Is that
the wrong method of trying it?  I tried using both serial ports and Kermit
with Xon/Xoff handshaking, but it just ended up timing out.

I keep getting the error:
  Loading file "C:\ppc\bootp\boot.img" ...
  Sending tftp boot request ...
  Not a valid boot image file

Which stems from an improper magic number at the top of the file to say it's
a valid boot image.

I hate to sound like such a newbie, but can anyone help in getting PPCBoot
on this new board?  Any help would be very appreciated.

Thank you,
Brian Padalino



* Re: PPCBoot on Ebony board
  2003-07-15 18:11     ` PPCBoot on Ebony board Brian Padalino
@ 2003-07-15 21:32       ` Chris Zimman
  2003-07-16 11:59         ` Brian Padalino
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Zimman @ 2003-07-15 21:32 UTC (permalink / raw)
  To: Brian Padalino; +Cc: linuxppc-embedded


On Tue, Jul 15, 2003 at 02:11:07PM -0400, Brian Padalino wrote:
>
> I am trying to transfer PPCBoot that I had compiled over a tftp session in
> the 440GP 1.18 ROM Monitor (02/11/02) that came on the Ebony board.  Is that
> the wrong method of trying it?  I tried using both serial ports and Kermit
> with Xon/Xoff handshaking, but it just ended up timing out.
>
> I keep getting the error:
>   Loading file "C:\ppc\bootp\boot.img" ...
>   Sending tftp boot request ...
>   Not a valid boot image file
>
> Which stems from an improper magic number at the top of the file to say it's
> a valid boot image.

This is not going to work.  You want to program PPCBoot into flash via a
flash programmer or via JTAG.

--Chris


* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 16:27   ` Eugene Surovegin
  2003-07-15 18:11     ` PPCBoot on Ebony board Brian Padalino
@ 2003-07-15 23:51     ` Matt Porter
  1 sibling, 0 replies; 24+ messages in thread
From: Matt Porter @ 2003-07-15 23:51 UTC (permalink / raw)
  To: Eugene Surovegin; +Cc: Matt Porter, linuxppc-embedded


On Tue, Jul 15, 2003 at 09:27:10AM -0700, Eugene Surovegin wrote:
>
> At 09:17 AM 7/15/2003, Matt Porter wrote:
> >I'll agree that it's a better hack, but since the offending areas in
> >the SCSI subsystem are easily located, it seems wiser to fix upstream.
>
> Matt, the problem is it wasn't that *easy* to locate this, at least for me :)
> I'm not sure that this is the only place..

I didn't mean to trivialize the difficulty of finding this from the path
of tracking the symptom to the source. :)  I merely was pointing out
that now that you know the source of the problem, it's not *too* difficult
to look for buffers allocated on the stack by simple inspection of the
SCSI code.  I only jumped in on this because I felt a little guilty that
when I noticed this sometime back I got distracted and never tried to
send a patch to the maintainers. :-/

> >We still need someone with interest AND time to properly fix the
> >consistent alloc from irq issue. :)  All of the patches posted to date
> >are incomplete bandaids.
>
> Uhh, I switched to a solution which uses preallocated consistent memory (10
> pages are enough for sym53c8xx_2).
> It's still not a generic solution, but at least it's safe :)

Are you doing this in the sym_2 driver or in the ppc consistent_*
implementations?  I only ask because I finally convinced myself
recently that attempting to make all the locking safe in the VM
subsystem was too much work.  I think Paul suggested at one point
that we might just preallocate a pool for atomic consistent allocations
anyway.
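
A preallocated pool of the kind mentioned above could look roughly like
the following. This is a hypothetical user-space sketch: the names, the
4096-byte page size and the absence of locking are all assumptions, and
a real version would have to hand out consistent memory and protect the
bookkeeping against interrupts.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_PAGES	10u	/* the figure quoted above for sym53c8xx_2 */
#define POOL_PAGE_SIZE	4096u	/* assumed page size */

/* Hypothetical fixed pool; 'base' must point at POOL_PAGES pages
 * reserved once, at a time when sleeping is allowed. */
struct page_pool {
	unsigned char *base;
	unsigned char used[POOL_PAGES];
};

static void pool_init(struct page_pool *p, unsigned char *base)
{
	size_t i;

	assert(base != NULL);
	p->base = base;
	for (i = 0; i < POOL_PAGES; i++)
		p->used[i] = 0;
}

/* Hand out one free page, or NULL if the pool is exhausted.
 * Nothing is allocated here, so this path never needs to sleep. */
static void *pool_alloc_page(struct page_pool *p)
{
	size_t i;

	for (i = 0; i < POOL_PAGES; i++) {
		if (!p->used[i]) {
			p->used[i] = 1;
			return p->base + i * POOL_PAGE_SIZE;
		}
	}
	return NULL;
}

/* Return a page to the pool.  Gives 0 on success, -1 if addr is not
 * a page currently handed out by this pool. */
static int pool_free_page(struct page_pool *p, void *addr)
{
	unsigned char *c = addr;
	size_t off, idx;

	if (c < p->base || c >= p->base + (size_t)POOL_PAGES * POOL_PAGE_SIZE)
		return -1;
	off = (size_t)(c - p->base);
	if (off % POOL_PAGE_SIZE != 0)
		return -1;
	idx = off / POOL_PAGE_SIZE;
	if (!p->used[idx])
		return -1;
	p->used[idx] = 0;
	return 0;
}
```

The point of the design is only that the interrupt-time path touches
memory that was set aside earlier, so it never has to enter the VM
subsystem.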

Regards,
--
Matt Porter
mporter@kernel.crashing.org


* RE: PPCBoot on Ebony board
  2003-07-15 21:32       ` Chris Zimman
@ 2003-07-16 11:59         ` Brian Padalino
  2003-07-16 14:29           ` Chris Zimman
  2003-07-16 14:45           ` Roland Dreier
  0 siblings, 2 replies; 24+ messages in thread
From: Brian Padalino @ 2003-07-16 11:59 UTC (permalink / raw)
  To: Chris Zimman; +Cc: linuxppc-embedded


I found the flash programmer utility, but I am actually a bit scared to
flash the flash with  the version of PPCBoot that I made.  I am working
without a JTAG interface, so flashing using JTAG is out of the question, and
how would I re-flash if the PPCBoot binary I have doesn't work properly?  Is
the board running out of EEPROM right now with the IBM Open Shell always
there?  I have sifted through the documentation a bit, but haven't found a
straight answer as to exactly how the board is setup right out of the box.

Any sort of help is appreciated.  I am extremely new to ICE, JTAG and such
embedded systems -- so please, be patient with me (if you can).

Sincerely,
Brian Padalino


-----Original Message-----
From: owner-linuxppc-embedded@lists.linuxppc.org
[mailto:owner-linuxppc-embedded@lists.linuxppc.org]On Behalf Of Chris
Zimman
Sent: Tuesday, July 15, 2003 5:33 PM
To: Brian Padalino
Cc: linuxppc-embedded@lists.linuxppc.org
Subject: Re: PPCBoot on Ebony board



On Tue, Jul 15, 2003 at 02:11:07PM -0400, Brian Padalino wrote:
>
> I am trying to transfer PPCBoot that I had compiled over a tftp session in
> the 440GP 1.18 ROM Monitor (02/11/02) that came on the Ebony board.  Is
that
> the wrong method of trying it?  I tried using both serial ports and Kermit
> with Xon/Xoff handshaking, but it just ended up timing out.
>
> I keep getting the error:
>   Loading file "C:\ppc\bootp\boot.img" ...
>   Sending tftp boot request ...
>   Not a valid boot image file
>
> Which stems from an improper magic number at the top of the file to say
it's
> a valid boot image.

This is not going to work.  You want to program PPCBoot into flash via a
flash programmer or via JTAG.

--Chris



* Re: PPCBoot on Ebony board
  2003-07-16 11:59         ` Brian Padalino
@ 2003-07-16 14:29           ` Chris Zimman
  2003-07-16 15:39             ` Brian Padalino
  2003-07-16 14:45           ` Roland Dreier
  1 sibling, 1 reply; 24+ messages in thread
From: Chris Zimman @ 2003-07-16 14:29 UTC (permalink / raw)
  To: Brian Padalino; +Cc: linuxppc-embedded


On Wed, Jul 16, 2003 at 07:59:28AM -0400, Brian Padalino wrote:
> I found the flash programmer utility, but I am actually a bit scared to
> flash the flash with  the version of PPCBoot that I made.  I am working
> without a JTAG interface, so flashing using JTAG is out of the question, and
> how would I re-flash if the PPCBoot binary I have doesn't work properly?  Is
> the board running out of EEPROM right now with the IBM Open Shell always
> there?  I have sifted through the documentation a bit, but haven't found a
> straight answer as to exactly how the board is setup right out of the box.

If you built a standard U-Boot 440GP config, odds are that it will work fine.
If you have a flash programmer, you can always save a copy of the OSOpen
image and then reflash with that if U-Boot doesn't come up right.

I'm not sure if your board is using the EEPROM, but that's only used for
strapping anyway.

> Any sort of help is appreciated.  I am extremely new to ICE, JTAG and such
> embedded systems -- so please, be patient with me (if you can).

The first thing I would recommend you do is to get a BDI2000 JTAG
debugger.  If you're going to be doing any serious work on U-Boot,
Linux, etc. this is one of the most valuable tools you can have.

--Chris


* Re: PPCBoot on Ebony board
  2003-07-16 11:59         ` Brian Padalino
  2003-07-16 14:29           ` Chris Zimman
@ 2003-07-16 14:45           ` Roland Dreier
  1 sibling, 0 replies; 24+ messages in thread
From: Roland Dreier @ 2003-07-16 14:45 UTC (permalink / raw)
  To: Brian Padalino; +Cc: linuxppc-embedded


    Brian> I found the flash programmer utility, but I am actually a
    Brian> bit scared to flash the flash with the version of PPCBoot
    Brian> that I made.  I am working without a JTAG interface, so
    Brian> flashing using JTAG is out of the question, and how would I
    Brian> re-flash if the PPCBoot binary I have doesn't work
    Brian> properly?  Is the board running out of EEPROM right now
    Brian> with the IBM Open Shell always there?  I have sifted
    Brian> through the documentation a bit, but haven't found a
    Brian> straight answer as to exactly how the board is setup right
    Brian> out of the box.

The Ebony OpenBIOS runs out of the socketed EEPROM on the board.  Just
pop the current EEPROM, put a sticker on it that says "Original
Bootloader" and use a new EEPROM for PPCBoot testing.

 - Roland


* RE: PPCBoot on Ebony board
  2003-07-16 14:29           ` Chris Zimman
@ 2003-07-16 15:39             ` Brian Padalino
  0 siblings, 0 replies; 24+ messages in thread
From: Brian Padalino @ 2003-07-16 15:39 UTC (permalink / raw)
  To: Chris Zimman; +Cc: linuxppc-embedded


I am actually stuck with an OCDemon Wiggler JTAG interface.  I was quoted
$2300 for the BDI2000.

That is a bit out of my range for this project.  Know anyone selling a used
one?

I guess I will just try to get gdb to work (in cygwin) with this wiggler.

Thanks for everyone's help!

Brian

-----Original Message-----
From: owner-linuxppc-embedded@lists.linuxppc.org
[mailto:owner-linuxppc-embedded@lists.linuxppc.org]On Behalf Of Chris
Zimman
Sent: Wednesday, July 16, 2003 10:30 AM
To: Brian Padalino
Cc: linuxppc-embedded@lists.linuxppc.org
Subject: Re: PPCBoot on Ebony board



On Wed, Jul 16, 2003 at 07:59:28AM -0400, Brian Padalino wrote:
> I found the flash programmer utility, but I am actually a bit scared to
> flash the flash with  the version of PPCBoot that I made.  I am working
> without a JTAG interface, so flashing using JTAG is out of the question,
and
> how would I re-flash if the PPCBoot binary I have doesn't work properly?
Is
> the board running out of EEPROM right now with the IBM Open Shell always
> there?  I have sifted through the documentation a bit, but haven't found a
> straight answer as to exactly how the board is setup right out of the box.

If you built a standard U-Boot 440GP config, odds are that it will work
fine.
If you have a flash programmer, you can always save a copy of the OSOpen
image and then reflash with that if U-Boot doesn't come up right.

I'm not sure if your board is using the EEPROM, but that's only used for
strapping anyway.

> Any sort of help is appreciated.  I am extremely new to ICE, JTAG and such
> embedded systems -- so please, be patient with me (if you can).

The first thing I would recommend you do is to get a BDI2000 JTAG
debugger.  If you're going to be doing any serious work on U-Boot,
Linux, etc. this is one of the most valuable tools you can have.

--Chris



* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 23:04 Darin.Johnson
  2003-07-15 23:34 ` Paul Mackerras
  2003-07-15 23:45 ` Matt Porter
@ 2003-07-16 14:01 ` Dan Malek
  2 siblings, 0 replies; 24+ messages in thread
From: Dan Malek @ 2003-07-16 14:01 UTC (permalink / raw)
  To: Darin.Johnson; +Cc: linuxppc-embedded


Darin.Johnson@nokia.com wrote:

> My headache example was the 4 byte buffer that MPC860 CPM
> was using to talk to an I2C device, there just wasn't an
> elegant solution...

You should have used consistent_alloc and never used consistent_sync()
in this case.  FYI, consistent_sync() should never be used on
consistent_alloc()'ed buffers.....another tidbit for documentation.
The consistent_* are used to manage a variety of consistency related
features of non-coherent processors, not that you should necessarily
use all of them to solve a particular implementation requirement.

Thanks.


	-- Dan



* RE: [RFC] consistent_sync and non L1 cache line aligned buffers
@ 2003-07-16  0:12 Darin.Johnson
  0 siblings, 0 replies; 24+ messages in thread
From: Darin.Johnson @ 2003-07-16  0:12 UTC (permalink / raw)
  To: linuxppc-embedded


> If you think about it, you will see that if you are doing DMA to an
> unaligned buffer, and some other unrelated part of the kernel is
> accessing another part of the cache line, you are in trouble no matter
> what sequence of cache flushes/invalidates/whatever you do.

I sort of assumed this was a given.  I'm used to things like
invalidating just 1514 bytes in a larger buffer, because I know
that it's safe.  However, I can agree with your sentiment, since
the actual reason that I made my invalidate routine safer was
because this was simpler than convincing some other programmers
how to do the right thing :-)


* RE: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 23:34 ` Paul Mackerras
@ 2003-07-15 23:50   ` Eugene Surovegin
  0 siblings, 0 replies; 24+ messages in thread
From: Eugene Surovegin @ 2003-07-15 23:50 UTC (permalink / raw)
  To: Paul Mackerras; +Cc: linuxppc-embedded


At 04:34 PM 7/15/2003, Paul Mackerras wrote:
>If you think about it, you will see that if you are doing DMA to an
>unaligned buffer, and some other unrelated part of the kernel is
>accessing another part of the cache line, you are in trouble no matter
>what sequence of cache flushes/invalidates/whatever you do.
>
>In other words, the driver MUST own the entire extent of all the cache
>lines that overlap the DMA buffer.  There really is no other solution
>on cache-incoherent machines, assuming you want to allow DMA to
>proceed in parallel with the CPU executing other arbitrary code, which
>is really the whole point of DMA.
>
>How to achieve that has been the subject of much debate.  It's clear
>that tweaks to consistent_sync won't do it, though.

So, at least adding an assertion to consistent_sync would be a useful
debugging feature.

E.g. we already have BUG() in consistent_alloc when it is called from
interrupt context :)
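
The check such an assertion would make is small. A minimal user-space
sketch, assuming a 32-byte line and a hypothetical helper name; in the
kernel the result would presumably feed BUG() or a warning rather than
a return value:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LINE_SIZE 32u	/* assumed line size, for illustration only */

/* Nonzero if both ends of [vaddr, vaddr + size) sit on line boundaries,
 * i.e. the buffer shares no cache line with unrelated data. */
static int dma_range_is_line_aligned(const void *vaddr, size_t size)
{
	uintptr_t start = (uintptr_t)vaddr;
	uintptr_t end = start + size;

	return ((start | end) & (LINE_SIZE - 1)) == 0;
}
```

A check like this only reports the offending caller; it does not make an
unaligned buffer safe.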

Eugene



* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 23:04 Darin.Johnson
  2003-07-15 23:34 ` Paul Mackerras
@ 2003-07-15 23:45 ` Matt Porter
  2003-07-16 14:01 ` Dan Malek
  2 siblings, 0 replies; 24+ messages in thread
From: Matt Porter @ 2003-07-15 23:45 UTC (permalink / raw)
  To: Darin.Johnson; +Cc: linuxppc-embedded


On Tue, Jul 15, 2003 at 04:04:24PM -0700, Darin.Johnson@nokia.com wrote:
>
> > IMHO, the easiest solution is
> > alignment of buffers.....plus it's likely to be a performance
> > improvement.
>
> True, it's the easiest solution for the kernel developer, but
> requires more work from driver authors.  Which is ok, *if* it's
> well documented and everyone knows buffers must be aligned,
> and that's the problem.  I think some people implicitly understand
> these issues, and assume that everyone else thinks the same way.

What more do you expect than the "What memory is DMA'able?" section
in Documentation/DMA-mapping.txt?

It's there to refer the developers to. Some maintainers have bugs
in their drivers/subsystems.  They just need a patch from the people
that depend on the bug fix.

I know you are now focusing on some 8xx buffer issue but the original
issue was surrounding generic SCSI subsystem bugs.

Regards,
--
Matt Porter
mporter@kernel.crashing.org


* RE: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 23:04 Darin.Johnson
@ 2003-07-15 23:34 ` Paul Mackerras
  2003-07-15 23:50   ` Eugene Surovegin
  2003-07-15 23:45 ` Matt Porter
  2003-07-16 14:01 ` Dan Malek
  2 siblings, 1 reply; 24+ messages in thread
From: Paul Mackerras @ 2003-07-15 23:34 UTC (permalink / raw)
  To: Darin.Johnson; +Cc: linuxppc-embedded


Darin.Johnson@nokia.com writes:

> > IMHO, the easiest solution is
> > alignment of buffers.....plus it's likely to be a performance
> > improvement.
>
> True, it's the easiest solution for the kernel developer, but
> requires more work from driver authors.  Which is ok, *if* it's

If you think about it, you will see that if you are doing DMA to an
unaligned buffer, and some other unrelated part of the kernel is
accessing another part of the cache line, you are in trouble no matter
what sequence of cache flushes/invalidates/whatever you do.

In other words, the driver MUST own the entire extent of all the cache
lines that overlap the DMA buffer.  There really is no other solution
on cache-incoherent machines, assuming you want to allow DMA to
proceed in parallel with the CPU executing other arbitrary code, which
is really the whole point of DMA.

How to achieve that has been the subject of much debate.  It's clear
that tweaks to consistent_sync won't do it, though.

Paul.


* RE: [RFC] consistent_sync and non L1 cache line aligned buffers
@ 2003-07-15 23:04 Darin.Johnson
  2003-07-15 23:34 ` Paul Mackerras
                   ` (2 more replies)
  0 siblings, 3 replies; 24+ messages in thread
From: Darin.Johnson @ 2003-07-15 23:04 UTC (permalink / raw)
  To: linuxppc-embedded


> IMHO, the easiest solution is
> alignment of buffers.....plus it's likely to be a performance
> improvement.

True, it's the easiest solution for the kernel developer, but
requires more work from driver authors.  Which is ok, *if* it's
well documented and everyone knows buffers must be aligned,
and that's the problem.  I think some people implicitly understand
these issues, and assume that everyone else thinks the same way.
The driver authors on the other hand often come from completely
different environments and may be porting code that's been working
fine.

My headache example was the 4 byte buffer that MPC860 CPM
was using to talk to an I2C device, there just wasn't an
elegant solution...

A big snag is that problems in this area are not readily apparent,
and bugs may not surface early.  "Option 1", to provide assertions,
has some appeal when I think about it, because it'll point out
problems that may arise on other architectures (ie, the author
develops on PowerPC with a 'smart' invalidate_dcache_region, but
then the code mysteriously fails twice a week on a pentium).


* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 21:26   ` David Blythe
@ 2003-07-15 22:15     ` Dan Malek
  0 siblings, 0 replies; 24+ messages in thread
From: Dan Malek @ 2003-07-15 22:15 UTC (permalink / raw)
  To: David Blythe; +Cc: linuxppc-embedded


David Blythe wrote:

> 2) change the definition to allow non-aligned addresses and handle them
> gracefully

What's your implementation proposal?  I contend you can't guarantee
a perfectly working solution due to the race conditions surrounding
the software management of a cache line, the processor potentially
accessing that cache line, and the DMA that is in progress.  In any
case, you are going to require that the programmer have knowledge of something
associated with the cache line, either the alignment or other processor
accessed data that will reside there.  Although the DMA buffers on
the stack are a very poor programming practice, the main problem for
us is that it immediately shows the programmer requires proper buffer
alignment or assignment of other objects to the cache line that won't
cause problems.  If you aligned those stack DMA buffers, you wouldn't
see this problem.

A design constraint of a non-coherent cache is that operations on
cache lines are whole and atomic.  IMHO, the easiest solution is
alignment of buffers.....plus it's likely to be a performance improvement.
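
To make the alignment suggestion concrete, here is a minimal user-space
sketch in C11.  The 32-byte line size, the 36-byte payload, and every
`example_` name are assumptions for illustration, not kernel definitions;
real code would take the line size from the platform headers.

```c
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed line size for illustration; the real value is CPU-specific. */
#define EXAMPLE_CACHE_LINE 32u

/* Round a length up to a whole number of cache lines. */
#define EXAMPLE_LINE_ROUND_UP(n) \
    (((n) + EXAMPLE_CACHE_LINE - 1u) & ~(size_t)(EXAMPLE_CACHE_LINE - 1u))

/* Hypothetical buffer type: it starts on a line boundary and is padded to
 * a whole number of lines, so no unrelated data can share its lines. */
struct example_dma_buf {
    alignas(EXAMPLE_CACHE_LINE) unsigned char data[EXAMPLE_LINE_ROUND_UP(36)];
};

/* Returns 1 if [p, p + len) starts and ends on cache-line boundaries. */
static int example_is_line_aligned(const void *p, size_t len)
{
    uintptr_t start = (uintptr_t)p;

    return ((start | (start + len)) & (EXAMPLE_CACHE_LINE - 1u)) == 0;
}
```

Padding the buffer to a full line matters as much as aligning its start:
a 36-byte buffer that starts on a boundary still shares its second line
with whatever the compiler places after it.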

Thanks.


	-- Dan



* Re: [RFC] consistent_sync and non L1 cache line aligned buffers
  2003-07-15 20:39 ` Eugene Surovegin
@ 2003-07-15 21:26   ` David Blythe
  2003-07-15 22:15     ` Dan Malek
  0 siblings, 1 reply; 24+ messages in thread
From: David Blythe @ 2003-07-15 21:26 UTC (permalink / raw)
  To: linuxppc-embedded


This discussion has been going on (off and on) for 2 years now.  The
consistent_sync routine does seem to be abused and it seems that in
practice a lot of code does fail (like the skbufs were doing 2 years
ago).  The party line has been that the path works as designed and
people are writing incorrect code.  There are two things that could be
done to improve the developer's lot in life:

1) maintain the semantics, but add an assert or something to catch some
of the cases where developers are using it incorrectly rather than
allowing it to fail in subtle ways.

2) change the definition to allow non-aligned addresses and handle them
gracefully

Option 2) would be the most pragmatic.  Doing nothing will ensure that a month
or two from now the topic will come up again and more people will have
wasted countless hours tracking down the same problem in yet another
piece of code.
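
As a rough illustration of option 1), the sketch below shows the kind of
debug check a sync routine could run before invalidating.  It is a
user-space stand-in, not a patch: the 32-byte line size and the
`example_` names are assumptions, and a kernel version would report the
problem through its own logging rather than stderr.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define EXAMPLE_CACHE_LINE 32u  /* assumed line size, CPU-specific in reality */

static unsigned long example_misaligned_calls;  /* count of rejected ranges */

/* Hypothetical debug check: returns 1 if the range starts and ends on
 * cache-line boundaries, otherwise logs the offender and returns 0. */
static int example_check_dma_range(const void *vaddr, size_t size)
{
    uintptr_t start = (uintptr_t)vaddr;
    uintptr_t end = start + size;

    if (((start | end) & (EXAMPLE_CACHE_LINE - 1u)) == 0)
        return 1;

    example_misaligned_calls++;
    fprintf(stderr, "DMA buffer %p+%zu is not cache-line aligned\n",
            (void *)start, size);
    return 0;
}
```

A check like this keeps the existing semantics and only makes the failure
visible at the call site instead of as later stack corruption.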

	david


Eugene Surovegin wrote:
>
> At 01:18 PM 7/15/2003, Darin.Johnson@nokia.com wrote:
>
>> I solved the problem (in a non-Linux system) by just flushing the first
>> and last lines in the requested range, and invalidating the rest.  The
>> very slight performance hit is probably less than testing to see if the
>> buffer is unaligned.
>
>
> I don't think so.
>
> If you take a look at the assembler output of my patch you'll see that the
> test for an unaligned buffer only touches a register, whereas dcbf may
> require a memory access, which is *significantly* slower.
>
> In the majority of cases consistent_sync is called with a properly aligned
> buffer, and I don't want to penalize this path by *unconditionally* (as you
> are suggesting) flushing the start and end of the buffer.
>
> Eugene.



* RE: [RFC] consistent_sync and non L1 cache line aligned buffers
@ 2003-07-15 20:47 Darin.Johnson
  0 siblings, 0 replies; 24+ messages in thread
From: Darin.Johnson @ 2003-07-15 20:47 UTC (permalink / raw)
  To: linuxppc-embedded


> Well, I would be, of the USB code.  SCSI might have had it fixed, and
> others that we haven't found yet may or may not.  But the important
> point is that doing this is a driver bug and it's OK to beat driver
> authors over the head with patches to fix the behavior. :)

I don't necessarily agree.  It's an argument over what is
broken - the invalidate routine that makes assumptions, or the
device writer that assumes the interface functions work as
claimed, or the documentation that doesn't exist?

It's fine to beat the driver authors over the head with the
fact that the functions don't necessarily do what their
names imply.  And it's fine to discourage authors from doing
DMA on the stack (that's a separate issue from unaligned
buffers).


* RE: [RFC] consistent_sync and non L1 cache line aligned buffers
       [not found] <F0B628F30F48064289D8CCC1EE21B7A80C48A5@mvebe001.americas.nokia.com>
@ 2003-07-15 20:39 ` Eugene Surovegin
  2003-07-15 21:26   ` David Blythe
  0 siblings, 1 reply; 24+ messages in thread
From: Eugene Surovegin @ 2003-07-15 20:39 UTC (permalink / raw)
  To: Darin.Johnson; +Cc: linuxppc-embedded


At 01:18 PM 7/15/2003, Darin.Johnson@nokia.com wrote:

>I solved the problem (in a non-Linux system) by just flushing the first
>and last lines in the requested range, and invalidating the rest.  The
>very slight performance hit is probably less than testing to see if the
>buffer is unaligned.

I don't think so.

If you take a look at the assembler output of my patch you'll see that the
test for an unaligned buffer only touches a register, whereas dcbf may
require a memory access, which is *significantly* slower.

In the majority of cases consistent_sync is called with a properly aligned
buffer, and I don't want to penalize this path by *unconditionally* (as you
are suggesting) flushing the start and end of the buffer.
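
The conditional variant argued for here can be sketched roughly as
follows.  This is a user-space model under stated assumptions, not the
actual patch: the cache operations are replaced by counters, the line
size is assumed to be 32 bytes, and all `example_` names are hypothetical.
It also does nothing about the races between the CPU and in-flight DMA
raised elsewhere in this thread.

```c
#include <stdint.h>

#define EXAMPLE_CACHE_LINE 32u  /* assumed line size */

static unsigned example_flushes;      /* stands in for executed dcbf ops */
static unsigned example_invalidates;  /* stands in for invalidated lines */

/* Stub for a single-line flush (write back, then invalidate). */
static void example_flush_line(uintptr_t addr)
{
    (void)addr;
    example_flushes++;
}

/* Stub for a range invalidate that works in whole lines over [start, end). */
static void example_invalidate_range(uintptr_t start, uintptr_t end)
{
    uintptr_t line = start & ~(uintptr_t)(EXAMPLE_CACHE_LINE - 1u);

    for (; line < end; line += EXAMPLE_CACHE_LINE)
        example_invalidates++;
}

/* Flush an edge line only when that edge is unaligned, then invalidate.
 * The aligned fast path pays for two register tests and nothing else. */
static void example_sync_from_device(uintptr_t start, uintptr_t end)
{
    const uintptr_t mask = EXAMPLE_CACHE_LINE - 1u;

    if (start & mask)
        example_flush_line(start);  /* line shared with data before the buffer */
    if (end & mask)
        example_flush_line(end);    /* line shared with data after the buffer */
    example_invalidate_range(start, end);
}
```

If both edges fall in the same line, that line is flushed twice in this
sketch, which is redundant but harmless.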

Eugene.



* RE: [RFC] consistent_sync and non L1 cache line aligned buffers
@ 2003-07-15 20:18 Darin.Johnson
  0 siblings, 0 replies; 24+ messages in thread
From: Darin.Johnson @ 2003-07-15 20:18 UTC (permalink / raw)
  To: linuxppc-embedded


> invalidate_dcache_range works in L1_CACHE_LINE chunks, so if
> start and/or
> end of the buffer are not aligned we may corrupt data located
> in the same
> cache line (usually stack variable(s) declared before or after buffer
> declaration).

I had noticed this a long time ago when looking to see how PowerPC
Linux had solved this problem, but discovered that it just sidestepped
the problem.

I solved the problem (in a non-Linux system) by just flushing the first
and last lines in the requested range, and invalidating the rest.  The
very slight performance hit is probably less than testing to see if the
buffer is unaligned.
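
A rough model of that scheme, for readers who want to see the shape of
it: the first and last lines are always flushed and only the interior
lines are invalidated.  The cache operations are replaced by counters,
the 32-byte line size is an assumption, and the `example_` names are
hypothetical; this is not the code from the non-Linux system mentioned
above.

```c
#include <stdint.h>

#define EXAMPLE_CACHE_LINE 32u  /* assumed line size */

static unsigned example_flushes;      /* lines written back and invalidated */
static unsigned example_invalidates;  /* lines invalidated without write-back */

/* Unconditionally flush the first and last lines of [start, end) and
 * invalidate every line strictly between them. */
static void example_flush_edges_invalidate_rest(uintptr_t start, uintptr_t end)
{
    const uintptr_t mask = EXAMPLE_CACHE_LINE - 1u;
    uintptr_t first, last, line;

    if (end <= start)
        return;                      /* empty range: nothing to do */

    first = start & ~mask;
    last = (end - 1) & ~mask;

    example_flushes++;               /* first line */
    if (last != first)
        example_flushes++;           /* last line, if distinct */

    for (line = first + EXAMPLE_CACHE_LINE; line < last;
         line += EXAMPLE_CACHE_LINE)
        example_invalidates++;       /* interior lines only */
}
```

The trade-off is that an aligned buffer still pays for two flushes,
in exchange for having no alignment test at all.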

> To be safe I think it's better to modify consistent_sync to
> handle such
> "bad" buffers.

"Bad" is the wrong word to use, in my opinion.  If the function is
not called "invalidate_aligned_dcache_range", or is not otherwise
clearly and unambiguously documented (in an easy to locate place)
that there are severe restrictions on the input parameters, then
the function itself is "bad".  The "principle of least astonishment"
implies that a function should do just what its name implies without
hidden surprises.


end of thread, other threads:[~2003-07-16 15:39 UTC | newest]

Thread overview: 24+ messages
2003-07-15  4:32 [RFC] consistent_sync and non L1 cache line aligned buffers Eugene Surovegin
2003-07-15 15:46 ` Tom Rini
2003-07-15 16:20   ` Eugene Surovegin
2003-07-15 16:25     ` Tom Rini
2003-07-15 16:17 ` Matt Porter
2003-07-15 16:27   ` Eugene Surovegin
2003-07-15 18:11     ` PPCBoot on Ebony board Brian Padalino
2003-07-15 21:32       ` Chris Zimman
2003-07-16 11:59         ` Brian Padalino
2003-07-16 14:29           ` Chris Zimman
2003-07-16 15:39             ` Brian Padalino
2003-07-16 14:45           ` Roland Dreier
2003-07-15 23:51     ` [RFC] consistent_sync and non L1 cache line aligned buffers Matt Porter
2003-07-15 20:18 Darin.Johnson
     [not found] <F0B628F30F48064289D8CCC1EE21B7A80C48A5@mvebe001.americas.nokia.com>
2003-07-15 20:39 ` Eugene Surovegin
2003-07-15 21:26   ` David Blythe
2003-07-15 22:15     ` Dan Malek
2003-07-15 20:47 Darin.Johnson
2003-07-15 23:04 Darin.Johnson
2003-07-15 23:34 ` Paul Mackerras
2003-07-15 23:50   ` Eugene Surovegin
2003-07-15 23:45 ` Matt Porter
2003-07-16 14:01 ` Dan Malek
2003-07-16  0:12 Darin.Johnson
