All of lore.kernel.org
 help / color / mirror / Atom feed
* Suspected cache coherency problem on V4L2 and AR7100 CPU
@ 2013-10-03 14:00 Krzysztof Hałasa
  2013-10-04  8:06 ` Krzysztof Hałasa
  2013-10-07  8:38 ` Krzysztof Hałasa
  0 siblings, 2 replies; 10+ messages in thread
From: Krzysztof Hałasa @ 2013-10-03 14:00 UTC (permalink / raw)
  To: linux-mips, linux-media

Hi,

I'm debugging a problem with a SOLO6110-based H.264 PCI video encoder on
Atheros AR7100-based (MIPS, big-endian) platform.

The problem manifests itself with stale data being returned by the
driver (using ioctl VIDIOC_DQBUF). The stale date always starts and ends
on 32-byte cache line boundary.

The driver is drivers/staging/media/solo6x10.

Initially I thought the encoder hardware is at fault (though it works on
i686 and on (both endians) ARM). But I've eliminated actual DMA accesses
from the driver and the problems still persists.

The control flow is now basically the following:
- userspace program initializes the adapter and allocates 192 KB long
  buffers (at least 2 of them):
	open(/dev/video1);

	various ioctl() calls

	for (cnt = 0; cnt < buffer_count; cnt++) {
		struct v4l2_buffer buf = {
			.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
			.memory = V4L2_MEMORY_MMAP,
			.index = cnt,
		};
		ioctl(stream->fd, VIDIOC_QUERYBUF, &buf);
		mmap(NULL, buf.length, PROT_READ | PROT_WRITE, MAP_SHARED, stream->fd, buf.m.offset);
        }

and then:

	for (cnt = 0; cnt < buffer_count; cnt++) {
		struct v4l2_buffer buf = {
			.type = V4L2_BUF_TYPE_VIDEO_CAPTURE,
			.memory = V4L2_MEMORY_MMAP,
			.index = cnt,
		ioctl(stream->fd, VIDIOC_QBUF, &buf);
	}

The buffers are now queued. The driver (upon receiving an encoded frame)
now mostly does:

	various spin_lock() etc.
	vb = list_first_entry(&solo_enc->vidq_active, struct solo_vb2_buf, list);
	list_del(&vb->list);

	struct vb2_dma_sg_desc *vbuf = vb2_dma_sg_plane_desc(vb, 0);

        /* now we have vbuf->sglist which corresponds to a single
	userspace 192-KB buffer */

	vb2_set_plane_payload(vb, 0, 1024 /* bytes */);

	static u32 n = 0; /* a counter to mark each buffer */

        /* the following is normally done using dma_map_sg() and DMA,
        and also with sg_copy_from_buffer() - eliminated for now */

	/* I do the following instead */
	struct page *page = sg_page(vbuf->sglist);
	u32 *addr = kmap_atomic(page);

	/* 4 times as large, I know, the buffer is much longer though */
	for (i = 0; i < 1024; i++)
		addr[i] = n;

	flush_dcache_page(page); /* and/or */
	flush_kernel_dcache_page(page);

	kunmap_atomic(addr);

	vb->v4l2_buf.sequence = solo_enc->sequence++;
	vb->v4l2_buf.timestamp.tv_sec = vop_sec(vh);
	vb->v4l2_buf.timestamp.tv_usec = vop_usec(vh);

	vb2_buffer_done(vb, VB2_BUF_STATE_DONE);

The userspace server now does ioctl(VIDIOC_DQBUF), sends it using UDP,
and populates buffer pool again with ioctl(VIDIOC_QBUF).

The driver uses directly-mapped cached (kernel) pointers to access the
buffers (0x80000000->0x9FFFFFFF kseg0 region) while (obviously)
userspace uses TLB-mapped pointers.

I have verified with a JTAG-based debugger (OpenOCD) that the buffers
are flushed to DRAM (0xAxxxxxxx uncached directly-mapped region has
valid data), however the userspace TLB-mapped buffers (which correspond
to the same physical DRAM addresses) partially contain old cached data
(from previous iterations).

The question is which part of the code is at fault, and how do I fix it.
I understand invalidating (and perhaps first flushing) userspace buffers
(cache) should generally fix the problem.

This could also be a simple bug rather than API/platform incompatibility
because usually (though not always) only 1 of the buffers gets corrupted
(the second one of two).

It looks like this - valid buffer, counter n = 0x499, a fragment
of actual UDP packet:
        0x0030:  0000 0499 0000 0499 0000 0499 0000 0499  ................
        0x0040:  0000 0499 0000 0499 0000 0499 0000 0499  ................
        0x0050:  0000 0499 0000 0499 0000 0499 0000 0499  ................
        0x0060:  0000 0499 0000 0499 0000 0499 0000 0499  ................
        0x0070:  0000 0499 0000 0499 0000 0499 0000 0499  ................
        0x0080:  0000 0499 0000 0499 0000 0499 0000 0499  ................
        0x0090:  0000 0499 0000 0499 0000 0499 0000 0499  ................
        0x00a0:  0000 0499 0000 0499 0000 0499 0000 0499  ................
        0x00b0:  0000 0499 0000 0499 0000 0499 0000 0499  ................

next buffer is corrupted, n = 0x49A:
        0x0030:  0000 049a 0000 049a 0000 049a 0000 049a  ................
        0x0040:  0000 049a 0000 0468 0000 0468 0000 0468  .......h...h...h
        0x0050:  0000 0468 0000 0468 0000 0468 0000 0468  ...h...h...h...h
        0x0060:  0000 0468 0000 049a 0000 049a 0000 049a  ...h............
        0x0070:  0000 049a 0000 049a 0000 049a 0000 049a  ................
        0x0080:  0000 049a 0000 049a 0000 049a 0000 049a  ................
        0x0090:  0000 049a 0000 049a 0000 049a 0000 049a  ................
        0x00a0:  0000 049a 0000 049a 0000 049a 0000 049a  ................
        0x00b0:  0000 049a 0000 049a 0000 049a 0000 049a  ................
        0x00c0:  0000 049a 0000 0478 0000 0478 0000 0478  .......x...x...x
        0x00d0:  0000 0478 0000 0478 0000 0478 0000 0478  ...x...x...x...x
        0x00e0:  0000 0478 0000 049a 0000 049a 0000 049a  ...x............
        0x00f0:  0000 049a 0000 049a 0000 049a 0000 049a  ................
        0x0100:  0000 049a 0000 049a 0000 049a 0000 049a  ................

Additional details: Ubiquity RouterStation Pro, gcc-4.7.3, Linux v3.11.

Any ideas?
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-03 14:00 Suspected cache coherency problem on V4L2 and AR7100 CPU Krzysztof Hałasa
@ 2013-10-04  8:06 ` Krzysztof Hałasa
  2013-10-07  8:38 ` Krzysztof Hałasa
  1 sibling, 0 replies; 10+ messages in thread
From: Krzysztof Hałasa @ 2013-10-04  8:06 UTC (permalink / raw)
  To: linux-mips; +Cc: linux-media

> I'm debugging a problem with a SOLO6110-based H.264 PCI video encoder on
> Atheros AR7100-based (MIPS, big-endian) platform.

BTW this CPU obviously has VIPT data cache, this means a physical page
with multiple virtual addresses (e.g. mapped multiple times) may and
will be cached multiple times.

AR7100 = arch/mips/ath79.
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-03 14:00 Suspected cache coherency problem on V4L2 and AR7100 CPU Krzysztof Hałasa
  2013-10-04  8:06 ` Krzysztof Hałasa
@ 2013-10-07  8:38 ` Krzysztof Hałasa
  2013-10-07 14:24   ` Ralf Baechle
  1 sibling, 1 reply; 10+ messages in thread
From: Krzysztof Hałasa @ 2013-10-07  8:38 UTC (permalink / raw)
  To: linux-mips; +Cc: linux-media

Please forgive me my MIPS TLB ignorance.

It seems there is a TLB entry pointing to the userspace buffer at the
time the kernel pointer (kseg0) is used. Is is an allowed situation on
MIPS 24K?

buffer: len 0x1000 (first page),
	userspace pointer 0x77327000,
	kernel pointer 0x867ac000 (physical address = 0x067ac000)

TLB Index: 15 pgmask=4kb va=77326000 asid=be
       [pa=01149000 c=3 d=1 v=1 g=0] [pa=067ac000 c=3 d=1 v=1 g=0]

Should the TLB entry be deleted before using the kernel pointer (which
points at the same page)?
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-07  8:38 ` Krzysztof Hałasa
@ 2013-10-07 14:24   ` Ralf Baechle
  2013-10-08  8:24     ` Krzysztof Hałasa
  2013-10-08 10:46     ` Krzysztof Hałasa
  0 siblings, 2 replies; 10+ messages in thread
From: Ralf Baechle @ 2013-10-07 14:24 UTC (permalink / raw)
  To: Krzysztof Hałasa; +Cc: linux-mips, linux-media

On Mon, Oct 07, 2013 at 10:38:49AM +0200, Krzysztof Hałasa wrote:

> Please forgive me my MIPS TLB ignorance.

May the manual be with you :-)

> It seems there is a TLB entry pointing to the userspace buffer at the
> time the kernel pointer (kseg0) is used. Is is an allowed situation on
> MIPS 24K?
> 
> buffer: len 0x1000 (first page),
> 	userspace pointer 0x77327000,
> 	kernel pointer 0x867ac000 (physical address = 0x067ac000)
> 
> TLB Index: 15 pgmask=4kb va=77326000 asid=be
>        [pa=01149000 c=3 d=1 v=1 g=0] [pa=067ac000 c=3 d=1 v=1 g=0]
> 
> Should the TLB entry be deleted before using the kernel pointer (which
> points at the same page)?

That's fine.  You just need to ensure that there are no virtual aliases.
One way to do so is to increase the page size to 16kB.

Note that there is a variant of the 24K which has a VIPT cache but uses
hardware to resolve cache aliases.  That is, from a kernel cache management
perspective it behaves like a PIPT cache.

However as I understand what you're mapping to userspace is actually
device memory, right?  You probably want to map that uncached.  That's a
long standing issue in these two macros:

/*
 * Convert a physical pointer to a virtual kernel pointer for /dev/mem
 * access
 */
#define xlate_dev_mem_ptr(p)    __va(p)

/*
 * Convert a virtual cached pointer to an uncached pointer
 */
#define xlate_dev_kmem_ptr(p)   p

which are defined in arch/mips/include/asm/io.h.  These should return
a KSEG1 (uncached XKPHYS) address for anything but RAM.

Would that explain your observations?

  Ralf

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-07 14:24   ` Ralf Baechle
@ 2013-10-08  8:24     ` Krzysztof Hałasa
  2013-10-08 12:07       ` Ralf Baechle
  2013-10-08 10:46     ` Krzysztof Hałasa
  1 sibling, 1 reply; 10+ messages in thread
From: Krzysztof Hałasa @ 2013-10-08  8:24 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips, linux-media

Ralf Baechle <ralf@linux-mips.org> writes:

> That's fine.  You just need to ensure that there are no virtual
> aliases.

Does this include virtual aliasing between a 4 KB TLB-mapped page and
a kseg0 address? I don't really have two TLBs pointing to the same page.

> One way to do so is to increase the page size to 16kB.

Right, this way we will have a unique mapping from the virtual address
to the data cache, as the cache size (per way) is 8 KB here. Is it the
correct fix in this situation?

> Note that there is a variant of the 24K which has a VIPT cache but uses
> hardware to resolve cache aliases.  That is, from a kernel cache management
> perspective it behaves like a PIPT cache.

It seems it's not the case here. What I have here is:
CPU revision is: 00019374 (MIPS 24Kc)
SoC: Atheros AR7161 rev 2
Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes

> However as I understand what you're mapping to userspace is actually
> device memory, right?

Not exactly - I'm using PCI DMA to userspace SG buffers in RAM.

The userspace first allocates the buffers in normal RAM (with vmalloc()
or something, there is an mmap ioctl() for this), the address returned
is 0x7xxxxxxx. Then the buffers (which consist of several pages each)
are presented to the hw driver which obtains separate (kernel) mapping
for each page (kseg0) and does dma_map_sg() and so on. The driver also
simply writes to the buffers. This isn't a problem, though - only the
incoherence between TLB and kseg0 is.

The problem is the userspace doesn't see the kernel writes - The
0x7xxxxxxx TLB-mapped pages are read-cached and not invalidated while
the kernel writes to the same pages using kseg0 addresses.

Thanks for looking at this.
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-07 14:24   ` Ralf Baechle
  2013-10-08  8:24     ` Krzysztof Hałasa
@ 2013-10-08 10:46     ` Krzysztof Hałasa
  1 sibling, 0 replies; 10+ messages in thread
From: Krzysztof Hałasa @ 2013-10-08 10:46 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips, linux-media

Ralf Baechle <ralf@linux-mips.org> writes:

> That's fine.  You just need to ensure that there are no virtual aliases.
> One way to do so is to increase the page size to 16kB.

Checked, this thing works fine with 16 KB pages.
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-08  8:24     ` Krzysztof Hałasa
@ 2013-10-08 12:07       ` Ralf Baechle
  2013-10-09  6:53         ` Krzysztof Hałasa
  0 siblings, 1 reply; 10+ messages in thread
From: Ralf Baechle @ 2013-10-08 12:07 UTC (permalink / raw)
  To: Krzysztof Hałasa; +Cc: linux-mips, linux-media

On Tue, Oct 08, 2013 at 10:24:13AM +0200, Krzysztof Hałasa wrote:

> > That's fine.  You just need to ensure that there are no virtual
> > aliases.
> 
> Does this include virtual aliasing between a 4 KB TLB-mapped page and
> a kseg0 address? I don't really have two TLBs pointing to the same page.

Yes.

Note that the terminology used in the manuals may be confusing here.
They call KSEG0/1 and - on 64 bit - XKPHYS "unmapped spaces".  But
obviously there is a mapping from virtual to physical addresses.  It's
just that there is no TLB entry being used for these mappings.

It's easier to understand if you see that all memory accesses are going
through the cache, TLB mapped or not.

> > One way to do so is to increase the page size to 16kB.
> 
> Right, this way we will have a unique mapping from the virtual address
> to the data cache, as the cache size (per way) is 8 KB here. Is it the
> correct fix in this situation?

16K is a silver bullet solution to all cache aliasing problems.  So if
your issue persists with 16K page size, it's not a cache aliasing issue.
Aside there are generally performance gains from the bigger page size.

> > Note that there is a variant of the 24K which has a VIPT cache but uses
> > hardware to resolve cache aliases.  That is, from a kernel cache management
> > perspective it behaves like a PIPT cache.
> 
> It seems it's not the case here. What I have here is:
> CPU revision is: 00019374 (MIPS 24Kc)
> SoC: Atheros AR7161 rev 2
> Primary instruction cache 64kB, VIPT, 4-way, linesize 32 bytes.
> Primary data cache 32kB, 4-way, VIPT, cache aliases, linesize 32 bytes

Yes; it would print "PIPT" if the cache was aliasing-free.

> > However as I understand what you're mapping to userspace is actually
> > device memory, right?
> 
> Not exactly - I'm using PCI DMA to userspace SG buffers in RAM.
> 
> The userspace first allocates the buffers in normal RAM (with vmalloc()
> or something, there is an mmap ioctl() for this), the address returned
> is 0x7xxxxxxx. Then the buffers (which consist of several pages each)
> are presented to the hw driver which obtains separate (kernel) mapping
> for each page (kseg0) and does dma_map_sg() and so on. The driver also
> simply writes to the buffers. This isn't a problem, though - only the
> incoherence between TLB and kseg0 is.

Now that very much sounds like an aliasing issue!

> The problem is the userspace doesn't see the kernel writes - The
> 0x7xxxxxxx TLB-mapped pages are read-cached and not invalidated while
> the kernel writes to the same pages using kseg0 addresses.
> 
> Thanks for looking at this.

You're welcome.

I'm just wondering the underlying issue might be a generic problem.

  Ralf

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-08 12:07       ` Ralf Baechle
@ 2013-10-09  6:53         ` Krzysztof Hałasa
  2013-10-09  8:17           ` Ralf Baechle
  0 siblings, 1 reply; 10+ messages in thread
From: Krzysztof Hałasa @ 2013-10-09  6:53 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips, linux-media

Ralf Baechle <ralf@linux-mips.org> writes:

> 16K is a silver bullet solution to all cache aliasing problems.  So if
> your issue persists with 16K page size, it's not a cache aliasing issue.
> Aside there are generally performance gains from the bigger page size.

I wonder why isn't the issue present in other cases. Perhaps remapping
of a userspace address and accessing it with kseg0 isn't a frequent
operation.

Shouldn't we change the default page size (on affected CPUs) to 16 KB
then? Alternatively, we could flush/invalidate the cache when needed -
is it a viable option?

Thanks.
-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-09  6:53         ` Krzysztof Hałasa
@ 2013-10-09  8:17           ` Ralf Baechle
  2013-10-09 13:05             ` Krzysztof Hałasa
  0 siblings, 1 reply; 10+ messages in thread
From: Ralf Baechle @ 2013-10-09  8:17 UTC (permalink / raw)
  To: Krzysztof Hałasa; +Cc: linux-mips, linux-media

On Wed, Oct 09, 2013 at 08:53:20AM +0200, Krzysztof Hałasa wrote:

> > 16K is a silver bullet solution to all cache aliasing problems.  So if
> > your issue persists with 16K page size, it's not a cache aliasing issue.
> > Aside there are generally performance gains from the bigger page size.
> 
> I wonder why isn't the issue present in other cases. Perhaps remapping
> of a userspace address and accessing it with kseg0 isn't a frequent
> operation.
> 
> Shouldn't we change the default page size (on affected CPUs) to 16 KB
> then? Alternatively, we could flush/invalidate the cache when needed -
> is it a viable option?

The kernel is supposed to perform the necessary cache flushing, so any
remaining aliasing issue would be considered a bug.  But the code is
performance sensitive, some of the problem cases are twisted and complex
so bugs and unsolved corner cases show up every now and then.

The historic default is 4K page size - on some processors such as the
venerable R3000 it's also the only page size available.  Some application
code wants to know the page size and has wisely hardcoded 4K.  Also
a "fix" to binutils many years ago reduced the alignment of generated
binaries so they'd not run on a kernel with larger page size.  The
kernel configuration defaults are chosen to just work out of the box,
and 4K page size is the safest choice.

Anyway, binutils got "unfixed" again years ago so chances are 16K
will just work.

Does it work for you, even solve your problem?

  Ralf

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: Suspected cache coherency problem on V4L2 and AR7100 CPU
  2013-10-09  8:17           ` Ralf Baechle
@ 2013-10-09 13:05             ` Krzysztof Hałasa
  0 siblings, 0 replies; 10+ messages in thread
From: Krzysztof Hałasa @ 2013-10-09 13:05 UTC (permalink / raw)
  To: Ralf Baechle; +Cc: linux-mips, linux-media

Ralf Baechle <ralf@linux-mips.org> writes:

> The kernel is supposed to perform the necessary cache flushing, so any
> remaining aliasing issue would be considered a bug.  But the code is
> performance sensitive, some of the problem cases are twisted and complex
> so bugs and unsolved corner cases show up every now and then.

Ok. This means I should also investigate the V4L2 and the hw driver
code, because the cache aliasing shouldn't be there in the first place.

> Does it work for you, even solve your problem?

Sure, with 16 KB page size everything works fine.

-- 
Krzysztof Halasa

Research Institute for Automation and Measurements PIAP
Al. Jerozolimskie 202, 02-486 Warsaw, Poland

^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-10-09 13:05 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-03 14:00 Suspected cache coherency problem on V4L2 and AR7100 CPU Krzysztof Hałasa
2013-10-04  8:06 ` Krzysztof Hałasa
2013-10-07  8:38 ` Krzysztof Hałasa
2013-10-07 14:24   ` Ralf Baechle
2013-10-08  8:24     ` Krzysztof Hałasa
2013-10-08 12:07       ` Ralf Baechle
2013-10-09  6:53         ` Krzysztof Hałasa
2013-10-09  8:17           ` Ralf Baechle
2013-10-09 13:05             ` Krzysztof Hałasa
2013-10-08 10:46     ` Krzysztof Hałasa

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.