* Old O_DIRECT story
@ 2014-12-27 13:31 Leon Pollak
  2014-12-27 16:08 ` Theodore Ts'o
  2015-01-05 15:52 ` One Thousand Gnomes
  0 siblings, 2 replies; 5+ messages in thread
From: Leon Pollak @ 2014-12-27 13:31 UTC (permalink / raw)
  To: linux-kernel

Hi, all.
There was a discussion here:
https://lkml.org/lkml/2007/1/10/231

Linus wrote in this discussion:
"So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
instead"

After a full week of tests, searches, and discussions, I have the impudence to
turn to the community: has anyone tried to implement this approach?

The situation is very simple:
I have an incoming DMA stream using the scatter/gather technique. The driver's
read() function provides the next ready DMA buffer descriptor with the
virtual address pointer to the acquired data. I need to store this data to
the disk partition as fast as possible, as the incoming stream is very
fast. According to tests, O_DIRECT/mapping is fast enough, while write() is
not.

I tried every way I could to implement this with mmap(), but did not succeed,
because I did not find a way to mmap() a file opened O_WRONLY. Mapping as O_RDWR
makes the kernel pre-fill the mapped memory with partition data. So the kernel
and DMA actually compete to fill the RAM area: one with garbage, one
with actual data. The kernel wins.

So, how to implement Linus's advice?
Leon


* Re: Old O_DIRECT story
  2014-12-27 13:31 Old O_DIRECT story Leon Pollak
@ 2014-12-27 16:08 ` Theodore Ts'o
  2015-01-05 15:52 ` One Thousand Gnomes
  1 sibling, 0 replies; 5+ messages in thread
From: Theodore Ts'o @ 2014-12-27 16:08 UTC (permalink / raw)
  To: Leon Pollak; +Cc: linux-kernel

On Sat, Dec 27, 2014 at 03:31:26PM +0200, Leon Pollak wrote:
> Hi, all.
> There was a discussion here:
> https://lkml.org/lkml/2007/1/10/231
> 
> Linus wrote in this discussion:
> "So don't use O_DIRECT. Use things like madvise() and posix_fadvise()
> instead"
> 
> After the full week of tests, searches, discussions, I have impudence to
> turn to the community - has one tried to implement this approach?

As Linus stated in one of the other messages in the thread:

   As a result, our madvise and/or posix_fadvise interfaces may not be all
   that strong, because people sadly don't use them that much. It's a sad
   example of a totally broken interface (O_DIRECT) resulting in better
   interfaces not getting used, and then not getting as much development
   effort put into them.

There are two reasons to use O_DIRECT.  One is controlling the cache
usage, and the other is performance.

> The situation is very simple:
> I have the incoming DMA stream using scatter/gather technique. the driver
> read() function provides the next ready DMA buffer descriptor with the
> virtual address pointer to the acquired data. I need to store this data to
> the disk partition as fast as possible, as the incoming stream is too very
> fast. According to tests, O_DIRECT/mapping is fast enough, while write() is
> not.

Do you understand *why* write is not fast enough?  Is it really a
matter of memory bandwidth, where you are actually limited by
the copy time implied by write(2)?  If you are constrained
by memory bandwidth, then this won't help, but if the issue
with using buffered writes is that you can't control the writeback
precisely enough, you might try using sync_file_range(2).
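
A minimal sketch of what that could look like, assuming fixed-size chunks
written sequentially to a regular file or block device.  The helper name
stream_chunk() and the "previous chunk" bookkeeping are illustrative
assumptions, not something from this thread.  It still pays for the copy
in write(2); it only controls when writeback starts and when the page
cache is dropped.

```c
#define _GNU_SOURCE            /* for sync_file_range() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: store one chunk at offset `off` with a buffered
 * write, start writeback of it immediately, then wait for the previous
 * chunk and drop it from the page cache.  Assumes every chunk has the
 * same length.  Returns 0 on success or a negative errno value. */
static int stream_chunk(int fd, const void *buf, size_t len, off_t off)
{
    const char *p = buf;
    size_t done = 0;

    assert(buf != NULL && len > 0);

    /* Buffered write; loop because pwrite() may write less than asked. */
    while (done < len) {
        ssize_t n = pwrite(fd, p + done, len - done, off + (off_t)done);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -errno;
        }
        done += (size_t)n;
    }

    /* Ask the kernel to start asynchronous writeback of this chunk now. */
    if (sync_file_range(fd, off, (off_t)len, SYNC_FILE_RANGE_WRITE) < 0)
        return -errno;

    if (off >= (off_t)len) {
        off_t prev = off - (off_t)len;
        int rc;

        /* Wait until the previous chunk has been written out ... */
        if (sync_file_range(fd, prev, (off_t)len,
                            SYNC_FILE_RANGE_WAIT_BEFORE |
                            SYNC_FILE_RANGE_WRITE |
                            SYNC_FILE_RANGE_WAIT_AFTER) < 0)
            return -errno;

        /* ... and tell the kernel its cached pages are no longer needed.
         * posix_fadvise() returns the error number directly. */
        rc = posix_fadvise(fd, prev, (off_t)len, POSIX_FADV_DONTNEED);
        if (rc != 0)
            return -rc;
    }
    return 0;
}
```

Note that sync_file_range() does not flush file metadata or the disk's
write cache, so this is about pacing writeback, not about durability.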

The perf program should help confirm if you really are getting hit by
memory bandwidth issues.

> I tried in all ways to implement this with mmap(), but it does not success,
> because I did not find a way to mmap() file as O_WRONLY. Mapping as O_RDWR
> makes kernel to pre-fill mapped memory with partition data. So, kernel and
> DMA actually compete on the RAM area to fill it - one with garbage, one
> with actual data. Kernel wins.

I would be *very* surprised that mmap() is fast enough, because the
overhead in dealing with the page tables and TLB flush usually dooms
the mmap() method.

But if the issue really is the pre-fill with partition data: if you
are using a file system and use fallocate, so that you are mapping
a sparse file, then there would be no pre-population.  I'm guessing,
though, that since you mention "partition data", you're using a raw
block device, right?
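
For the file system case, a rough sketch under stated assumptions: the
file is a regular file opened O_RDWR (a shared writable mapping needs
that), and the helper name store_via_mmap() is made up for illustration.
The fallback to ftruncate(), which leaves a hole instead of preallocated
blocks, is also my assumption for file systems without fallocate support.

```c
#define _GNU_SOURCE            /* for fallocate() */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: size the file to `len` bytes without writing any
 * data, map it shared, copy `data` into the mapping and flush it.
 * Returns 0 on success or a negative errno value. */
static int store_via_mmap(int fd, const void *data, size_t len)
{
    void *map;
    int rc = 0;

    assert(data != NULL && len > 0);

    /* Preallocate blocks; they read back as zeros. */
    if (fallocate(fd, 0, 0, (off_t)len) < 0) {
        if (errno != EOPNOTSUPP)
            return -errno;
        /* Assumed fallback: extend the file with a hole instead. */
        if (ftruncate(fd, (off_t)len) < 0)
            return -errno;
    }

    map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED)
        return -errno;

    /* This is still a CPU copy; each first touch of a page faults. */
    memcpy(map, data, len);

    /* Write the dirty pages back and wait for completion. */
    if (msync(map, len, MS_SYNC) < 0)
        rc = -errno;
    if (munmap(map, len) < 0 && rc == 0)
        rc = -errno;
    return rc;
}
```

None of this applies to a raw block device, where there is nothing to
preallocate.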

> So, how to implement Linus's advice?

Ultimately, if nothing else works, O_DIRECT is still there for a
reason.  Nothing should stop you from using it.  It is a very awkward
interface, yes, and from a design perspective it is ugly as sin.  But
if at the end of the day you really need the performance, it's there for
you to use.
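
Much of the awkwardness is the alignment rules.  A minimal sketch,
assuming a 4096-byte alignment for buffer address, length, and file
offset; the real requirement depends on the device and file system, so
ASSUMED_ALIGN and both helper names are illustrative assumptions.

```c
#define _GNU_SOURCE            /* for O_DIRECT */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed alignment; query the device's block size in real code. */
#define ASSUMED_ALIGN 4096

/* Hypothetical helper: allocate a buffer whose address satisfies the
 * assumed O_DIRECT alignment.  Returns NULL on failure; release the
 * buffer with free(). */
static void *alloc_aligned(size_t len)
{
    void *p = NULL;

    assert(len > 0 && len % ASSUMED_ALIGN == 0);
    if (posix_memalign(&p, ASSUMED_ALIGN, len) != 0)
        return NULL;
    return p;
}

/* Hypothetical helper: write one aligned block through a descriptor
 * opened with O_DIRECT.  Returns 0 on success, a negative errno value
 * on failure, and -EIO if the write came back short. */
static int write_direct(int fd, const void *buf, size_t len, off_t off)
{
    ssize_t n;

    assert(len % ASSUMED_ALIGN == 0 && off % ASSUMED_ALIGN == 0);
    do {
        n = pwrite(fd, buf, len, off);
    } while (n < 0 && errno == EINTR);

    if (n < 0)
        return -errno;
    return (size_t)n == len ? 0 : -EIO;
}
```

The descriptor would come from something like
open(path, O_WRONLY | O_DIRECT); some file systems reject that flag with
EINVAL, so check the open() result before relying on it.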

Cheers,

						- Ted


* Re: Old O_DIRECT story
  2014-12-27 13:31 Old O_DIRECT story Leon Pollak
  2014-12-27 16:08 ` Theodore Ts'o
@ 2015-01-05 15:52 ` One Thousand Gnomes
  2015-01-06  2:04   ` Kirill A. Shutemov
  1 sibling, 1 reply; 5+ messages in thread
From: One Thousand Gnomes @ 2015-01-05 15:52 UTC (permalink / raw)
  To: Leon Pollak; +Cc: linux-kernel

> I tried in all ways to implement this with mmap(), but it does not success,
> because I did not find a way to mmap() file as O_WRONLY. Mapping as O_RDWR
> makes kernel to pre-fill mapped memory with partition data. So, kernel and
> DMA actually compete on the RAM area to fill it - one with garbage, one
> with actual data. Kernel wins.
> 
> So, how to implement Linus's advice?

Use O_DIRECT. There are lots of problems with the mmap() model, in
particular with how mmu table changes scale to large numbers of CPU
threads (ie they don't).

You would need to modify the kernel to add an madvise type,
something like ZEROFILL (you can't do WRONLY because the x86 CPU can't
really do write-only mappings), but then you'd be stuck with only running
on the latest and greatest kernel. A ZEROFILL madvise would at least mean
that each time you touched a new page it took a fault and did a 4K clear,
not a read.

mmap works well if you don't need to change the page permissions or map
new pages all the time. If you have to keep touching the MMU then it gets
ugly because you tend to need to synchronize between cores.

Alan



* Re: Old O_DIRECT story
  2015-01-05 15:52 ` One Thousand Gnomes
@ 2015-01-06  2:04   ` Kirill A. Shutemov
  2015-01-06  7:53     ` Leon Pollak
  0 siblings, 1 reply; 5+ messages in thread
From: Kirill A. Shutemov @ 2015-01-06  2:04 UTC (permalink / raw)
  To: One Thousand Gnomes; +Cc: Leon Pollak, linux-kernel

On Mon, Jan 05, 2015 at 03:52:10PM +0000, One Thousand Gnomes wrote:
> > I tried in all ways to implement this with mmap(), but it does not success,
> > because I did not find a way to mmap() file as O_WRONLY. Mapping as O_RDWR
> > makes kernel to pre-fill mapped memory with partition data. So, kernel and
> > DMA actually compete on the RAM area to fill it - one with garbage, one
> > with actual data. Kernel wins.
> > 
> > So, how to implement Linus's advice?
> 
> Use O_DIRECT. There are lots of problems with the mmap() model, in
> particular with how mmu table changes scale to large numbers of CPU
> threads (ie they don't).

They do. Kinda. See split page table lock.
But, yeah, mmap() approach should not be faster anyway.

-- 
 Kirill A. Shutemov


* Re: Old O_DIRECT story
  2015-01-06  2:04   ` Kirill A. Shutemov
@ 2015-01-06  7:53     ` Leon Pollak
  0 siblings, 0 replies; 5+ messages in thread
From: Leon Pollak @ 2015-01-06  7:53 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: One Thousand Gnomes, linux-kernel

IMHO(!):
1. It will be slower, as the pre-fill takes significant time.
2. The DMA-kernel competition makes this method unusable.

On 6 January 2015 at 04:04, Kirill A. Shutemov <kirill@shutemov.name> wrote:
> On Mon, Jan 05, 2015 at 03:52:10PM +0000, One Thousand Gnomes wrote:
>> > I tried in all ways to implement this with mmap(), but it does not success,
>> > because I did not find a way to mmap() file as O_WRONLY. Mapping as O_RDWR
>> > makes kernel to pre-fill mapped memory with partition data. So, kernel and
>> > DMA actually compete on the RAM area to fill it - one with garbage, one
>> > with actual data. Kernel wins.
>> >
>> > So, how to implement Linus's advice?
>>
>> Use O_DIRECT. There are lots of problems with the mmap() model, in
>> particular with how mmu table changes scale to large numbers of CPU
>> threads (ie they don't).
>
> They do. Kinda. See split page table lock.
> But, yeah, mmap() approach should not be faster anyway.
>
> --
>  Kirill A. Shutemov

