linux-kernel.vger.kernel.org archive mirror
* Pipes and fd question. Large amounts of data.
@ 2005-01-30  9:15 Oded Shimon
       [not found] ` <200501300941.45554.miles@milessabin.com>
  2005-01-30 19:41 ` Miquel van Smoorenburg
  0 siblings, 2 replies; 5+ messages in thread
From: Oded Shimon @ 2005-01-30  9:15 UTC (permalink / raw)
  To: linux-kernel

A Unix C programming question. Has to do mostly with pipes, so I am hoping I 
am asking in the right place.

I have a rather unique situation. I have 2 programs, neither of which I have 
control over.
Program A writes into two FIFOs.
Program B reads from two FIFOs.

My program is the middle step.

The problem - neither program is aware of the other, and each accesses the 
FIFOs at its own free will. They will also block until whatever data transfer 
they started is complete.

Meaning, if I were to use the direct approach and have no middle step, the 
programs would be thrown into a deadlock instantly, as one program will be 
writing into FIFO 1 while the other is reading from FIFO 2.

The amount of data is very large - GBs in total, moving at anywhere from 
10MB a second to possibly as much as 300MB a second. So efficiency in context 
switching is very important.

Programs A & B both write and read in large chunks, usually 300KB.

So far, my solution is using select() and non-blocking pipes. I also used 
large buffers (20MB). In my measurements, in the worst case a program 
writes/reads 6MB before switching to the other FIFO, so 20MB is safe enough.

I have implemented this, but it has a major disadvantage - every write() 
only writes 4KB at a time, never more, because of how non-blocking pipes 
work. At 20,000 context switches a second, this method barely reaches 10MB a 
second, if not less.

Blocking pipes have an advantage - they can move large chunks at a time. They 
have a more serious disadvantage, though - the amount of data you ask to be 
written/read IS the amount of data that will be written or read, and the call 
will block until that much data has moved. I cannot know beforehand exactly 
how much data the programs want, so this could easily fall into a deadlock.

Ideally, I could do this:
my program:  write(20MB);
program B:   read(300KB);
my program:  write() returns with return value 300,000

I was unable to find anything like this solution, or similar.
No combination of blocking/non-blocking fds gives this behaviour, nor does 
any system call I could find.
I am looking for alternative/better suggestions.

- ods15.


* Re: Pipes and fd question. Large amounts of data.
       [not found] ` <200501300941.45554.miles@milessabin.com>
@ 2005-01-30 10:48   ` Oded Shimon
  2005-01-31 15:02     ` Chris Friesen
  0 siblings, 1 reply; 5+ messages in thread
From: Oded Shimon @ 2005-01-30 10:48 UTC (permalink / raw)
  To: Miles Sabin; +Cc: linux-kernel

On Sunday 30 January 2005 11:41, Miles wrote:
> My suggestion would be to perform blocking writes in a separate thread
> for each of the two written-to fds. You can still use select/poll for
> the read side ... tho' once you're using threading on the write side it
> might be more straightforward to use threading on the read side as
> well. Bear in mind that if you do that you'll need to dedicate threads
> to _each_ of the four fds, because each of them could block
> independently while progress is required on one or more of the others.
>
> I'd say that this was one of the rare cases where a solution using
> threads is not only superior to one using event-driven IO, but actually
> required.

Yeah, I reached just about the same conclusion. At first I thought only 2 
threads were necessary, one for each data flow, but I realized a deadlock 
could happen just as well there too. Making a 4-thread implementation I 
trust is gonna be hard... I'd better get working. :)

Thanks for the reply,
- ods15


* Re: Pipes and fd question. Large amounts of data.
  2005-01-30  9:15 Pipes and fd question. Large amounts of data Oded Shimon
       [not found] ` <200501300941.45554.miles@milessabin.com>
@ 2005-01-30 19:41 ` Miquel van Smoorenburg
  1 sibling, 0 replies; 5+ messages in thread
From: Miquel van Smoorenburg @ 2005-01-30 19:41 UTC (permalink / raw)
  To: linux-kernel

In article <200501301115.59532.ods15@ods15.dyndns.org>,
Oded Shimon  <ods15@ods15.dyndns.org> wrote:
>I have implemented this, but it has a major disadvantage - every write() 
>only writes 4KB at a time, never more, because of how non-blocking pipes 
>work. At 20,000 context switches a second, this method barely reaches 
>10MB a second, if not less.

If you're using pipe(), you might want to try socketpair()
instead. You can setsockopt() SO_RCVBUF and SO_SNDBUF to
large values if you want.

Mike.



* Re: Pipes and fd question. Large amounts of data.
  2005-01-30 10:48   ` Oded Shimon
@ 2005-01-31 15:02     ` Chris Friesen
  2005-01-31 15:14       ` Oded Shimon
  0 siblings, 1 reply; 5+ messages in thread
From: Chris Friesen @ 2005-01-31 15:02 UTC (permalink / raw)
  To: Oded Shimon; +Cc: Miles Sabin, linux-kernel

Oded Shimon wrote:
> On Sunday 30 January 2005 11:41, Miles wrote:

>>I'd say that this was one of the rare cases where a solution using
>>threads is not only superior to one using event-driven IO, but actually
>>required.

> Yeah, I reached just about the same conclusion. At first I thought only 2 
> threads were necessary, one for each data flow, but I realized a deadlock 
> could happen just as well there too. Making a 4-thread implementation I 
> trust is gonna be hard... I'd better get working. :)

Your other option would be to use processes with shared memory (either 
SysV shm or memory-mapped files).  This gets you the speed of shared 
memory, but also the reliability of not sharing your entire address 
space.

If you use NPTL, your locking should be quick as well.  If not, you can 
always roll your own futex-based locking.

Chris


* Re: Pipes and fd question. Large amounts of data.
  2005-01-31 15:02     ` Chris Friesen
@ 2005-01-31 15:14       ` Oded Shimon
  0 siblings, 0 replies; 5+ messages in thread
From: Oded Shimon @ 2005-01-31 15:14 UTC (permalink / raw)
  To: Chris Friesen; +Cc: Miles Sabin, linux-kernel

On Monday 31 January 2005 17:02, Chris Friesen wrote:
> Your other option would be to use processes with shared memory (either
> sysV or memory-mapped files).  This gets you the speed of shared memory
> maps, but also lets you get the reliability of not sharing your entire
> memory space.
>
> If you use NPTL, your locking should be quick as well.  If not, you can
> always roll your own futex-based locking.

To be honest, most of that was gibberish to me (NPTL, futex, SysV...). Most of 
my experience with system calls is with pipes and files; I know very little 
about these other things...
Either way, you are a bit late - just half an hour ago I completed my 
program, and it works. :) I finished the pthread-instead-of-select() 
implementation pretty quickly (now I understand why lazy programmers use 
threads... heh); what took me so long was trouble with the 2 other programs - 
I had to refine their command line params carefully...
(BTW, the 2 other programs are MPlayer and MEncoder, and my job was 
transferring video AND audio between them.)

Thank you for the reply,
- ods15

