* Single socket with TX_RING and RX_RING
@ 2013-05-15 12:53 Ricardo Tubío
  2013-05-15 13:20 ` Daniel Borkmann
  2013-05-15 22:44 ` Phil Sutter
  0 siblings, 2 replies; 16+ messages in thread
From: Ricardo Tubío @ 2013-05-15 12:53 UTC (permalink / raw)
  To: netdev

Once I tell the kernel to export the TX_RING through setsockopt() (see code
below), I always get an error (EBUSY) if I try to tell the kernel to export the
RX_RING with the same socket descriptor. Therefore, I have to open an
additional socket for the RX_RING, so I need two sockets, although I thought
that I would only need one socket for both TX and RX using mmap()ed
memory.

Do I need both sockets or am I doing something wrong?

Code: 

/* init_ring; type = {PACKET_TX_RING, PACKET_RX_RING} */
void *init_ring(const int socket_fd, const int type)
{
  	
	void *ring = NULL;

	int ring_access_flags = PROT_READ | PROT_WRITE;
	tpacket_req_t *p = init_tpacket_req(FRAMES_PER_RING);
	int ring_len = ( p->tp_block_size ) * ( p->tp_block_nr );
  	
  	if ( setsockopt(socket_fd, SOL_PACKET, type,
                          p, LEN__TPACKET_REQ) < 0 )
	{
		log_sys_error("Setting socket options for this ring");
	}

	// 2) map the ring; mmap() reports failure with MAP_FAILED, not NULL
  	if ( ( ring = mmap(NULL, ring_len, ring_access_flags, MAP_SHARED,
  						socket_fd, 0) ) == MAP_FAILED )
	{
		log_sys_error("mmap()ing error");
		ring = NULL;
	}
	
	return(ring);
	
}


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 12:53 Single socket with TX_RING and RX_RING Ricardo Tubío
@ 2013-05-15 13:20 ` Daniel Borkmann
  2013-05-15 13:32   ` Ricardo Tubío
  2013-05-15 22:44 ` Phil Sutter
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2013-05-15 13:20 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

On 05/15/2013 02:53 PM, Ricardo Tubío wrote:
> Once I tell kernel to export the TX_RING through setsockopt() (see code
> below) I always get an error (EBUSY) if i try to tell kernel to export the
> RX_RING with the same socket descriptor. Therefore, I have to open an
> additional socket for the RX_RING and I require of two sockets when I though
> that I would only require of one socket for both TX and RX using mmap()ed
> memory.
>
> Do I need both sockets or am I doing something wrong?

The second time you call init_ring() in your code, e.g. with TX_RING after
having previously set it up for the RX_RING, the kernel will give you
-EBUSY because the packet socket is already mmap(2)'ed.


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 13:20 ` Daniel Borkmann
@ 2013-05-15 13:32   ` Ricardo Tubío
  2013-05-15 14:47     ` Daniel Borkmann
  2013-05-20 20:50     ` Paul Chavent
  0 siblings, 2 replies; 16+ messages in thread
From: Ricardo Tubío @ 2013-05-15 13:32 UTC (permalink / raw)
  To: netdev

Daniel Borkmann <dborkman <at> redhat.com> writes:

> 
> On 05/15/2013 02:53 PM, Ricardo Tubío wrote:
> > Once I tell kernel to export the TX_RING through setsockopt() (see code
> > below) I always get an error (EBUSY) if i try to tell kernel to export the
> > RX_RING with the same socket descriptor. Therefore, I have to open an
> > additional socket for the RX_RING and I require of two sockets when I though
> > that I would only require of one socket for both TX and RX using mmap()ed
> > memory.
> >
> > Do I need both sockets or am I doing something wrong?
> 
> The second time you call init_ring() in your code e.g. with TX_RING, where
> you have previously set it up for the RX_RING. The kernel will give you
> -EBUSY because the packet socket is already mmap(2)'ed.
> 

Ok, so if I make the following system calls:

void *ring=NULL;
setsockopt(socket_fd, SOL_PACKET, PACKET_RX_RING, p, LEN__TPACKET_REQ);
ring = mmap(NULL, ring_len, ring_access_flags, MAP_SHARED, socket_fd, 0);

Would I be permitted to use the ring map obtained both for RX and for TX? If
so, for me it is confusing to use PACKET_RX_RING if I can also TX data
through that ring...


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 13:32   ` Ricardo Tubío
@ 2013-05-15 14:47     ` Daniel Borkmann
  2013-05-15 14:52       ` Daniel Borkmann
  2013-05-20 20:50     ` Paul Chavent
  1 sibling, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2013-05-15 14:47 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

On 05/15/2013 03:32 PM, Ricardo Tubío wrote:
> Daniel Borkmann <dborkman <at> redhat.com> writes:
>> On 05/15/2013 02:53 PM, Ricardo Tubío wrote:
>>> Once I tell kernel to export the TX_RING through setsockopt() (see code
>>> below) I always get an error (EBUSY) if i try to tell kernel to export the
>>> RX_RING with the same socket descriptor. Therefore, I have to open an
>>> additional socket for the RX_RING and I require of two sockets when I though
>>> that I would only require of one socket for both TX and RX using mmap()ed
>>> memory.
>>>
>>> Do I need both sockets or am I doing something wrong?
>>
>> The second time you call init_ring() in your code e.g. with TX_RING, where
>> you have previously set it up for the RX_RING. The kernel will give you
>> -EBUSY because the packet socket is already mmap(2)'ed.

(if you need an answer, then please do not drop the CC, otherwise it could be
  that I might not read it)

> Ok, so if I make the following system calls:
>
> void *ring=NULL;
> setsockopt(socket_fd, SOL_PACKET, PACKET_RX_RING, p, LEN__TPACKET_REQ);
> ring = mmap(NULL, ring_len, ring_access_flags, MAP_SHARED, socket_fd, 0);
>
> Would I be permitted to use the ring map obtained both for RX and for TX? If
> so, for me it is confusing to use PACKET_RX_RING if I can also TX data
> through that ring...

I haven't tried it out yet, and currently do not really have time to. But
looking at the mmap code, it seems that the size of the mmap area is accumulated
for the rx and tx rings. However, the header status bits are not really interoperable
with each other. So it looks like you will need to have two sockets ...
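
To illustrate the point about the status bits, here is a minimal sketch in C.
It only uses the TP_STATUS_* constants from <linux/if_packet.h>; the helper
names rx_frame_ready() and tx_frame_free() are invented for this example and
are not kernel or libc functions. The same bit (value 1) means "frame is ready
for user space" in an RX ring but "please send this frame" in a TX ring, so a
single ring cannot be driven in both directions.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/if_packet.h>  /* TP_STATUS_* constants */

/* RX ring: the kernel sets TP_STATUS_USER when a frame holds a received
 * packet; user space gives the frame back by writing TP_STATUS_KERNEL. */
static bool rx_frame_ready(unsigned long status)
{
    return (status & TP_STATUS_USER) != 0;
}

/* TX ring: user space may fill a frame only while the kernel is neither
 * waiting to send it nor in the middle of sending it. */
static bool tx_frame_free(unsigned long status)
{
    return (status & (TP_STATUS_SEND_REQUEST | TP_STATUS_SENDING)) == 0;
}
```

Note that TP_STATUS_USER and TP_STATUS_SEND_REQUEST share the same numeric
value, which is why the two rings need separate frame headers.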


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 14:47     ` Daniel Borkmann
@ 2013-05-15 14:52       ` Daniel Borkmann
  2013-05-15 14:58         ` Ricardo Tubío
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Borkmann @ 2013-05-15 14:52 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

On 05/15/2013 04:47 PM, Daniel Borkmann wrote:
> On 05/15/2013 03:32 PM, Ricardo Tubío wrote:
>> Daniel Borkmann <dborkman <at> redhat.com> writes:
>>> On 05/15/2013 02:53 PM, Ricardo Tubío wrote:
>>>> Once I tell kernel to export the TX_RING through setsockopt() (see code
>>>> below) I always get an error (EBUSY) if i try to tell kernel to export the
>>>> RX_RING with the same socket descriptor. Therefore, I have to open an
>>>> additional socket for the RX_RING and I require of two sockets when I though
>>>> that I would only require of one socket for both TX and RX using mmap()ed
>>>> memory.
>>>>
>>>> Do I need both sockets or am I doing something wrong?
>>>
>>> The second time you call init_ring() in your code e.g. with TX_RING, where
>>> you have previously set it up for the RX_RING. The kernel will give you
>>> -EBUSY because the packet socket is already mmap(2)'ed.
>
> (if you need an answer, then please do not drop the CC, otherwise it could be
>   that I might not read it)
>
>> Ok, so if I make the following system calls:
>>
>> void *ring=NULL;
>> setsockopt(socket_fd, SOL_PACKET, PACKET_RX_RING, p, LEN__TPACKET_REQ);
>> ring = mmap(NULL, ring_len, ring_access_flags, MAP_SHARED, socket_fd, 0);
>>
>> Would I be permitted to use the ring map obtained both for RX and for TX? If
>> so, for me it is confusing to use PACKET_RX_RING if I can also TX data
>> through that ring...

No, just as a side note, I think here you rather wanted to say ...

  setsockopt(socket, SOL_PACKET, PACKET_RX_RING, ...);
  setsockopt(socket, SOL_PACKET, PACKET_TX_RING, ...);

... and then only once:

  ring = mmap(NULL, ..., socket, 0);

> I haven't tried it out yet, and currently also do not really have time to. But
> looking at the mmap code, it seems that the size of the mmap area is accumulated
> for rx and tx ring. However, the header status bits are not really interoperable
> with each other. So looks you will need to have two sockets ...


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 14:52       ` Daniel Borkmann
@ 2013-05-15 14:58         ` Ricardo Tubío
  2013-05-15 15:04           ` Daniel Borkmann
  0 siblings, 1 reply; 16+ messages in thread
From: Ricardo Tubío @ 2013-05-15 14:58 UTC (permalink / raw)
  To: netdev

Daniel Borkmann <dborkman <at> redhat.com> writes:

> 
> No, just as a side note, I think here you rather wanted to say ...
> 
>   setsockopt(socket, SOL_PACKET, PACKET_RX_RING, ...);
>   setsockopt(socket, SOL_PACKET, PACKET_TX_RING, ...);
> 
> ... and then only once:
> 
>   ring = mmap(NULL, ..., socket, 0);
> 
> > I haven't tried it out yet, and currently also do not really have time to. But
> > looking at the mmap code, it seems that the size of the mmap area is accumulated
> > for rx and tx ring. However, the header status bits are not really interoperable
> > with each other. So looks you will need to have two sockets ...
> 

I have already tried that and, if I use the same socket_fd twice with
setsockopt(), I get the EBUSY errno from the kernel. Tomorrow, I will try the
first solution with both sockets (it seems the easiest way); afterwards, I
will try to use the TX_RING socket for TX and for RX at the same time.

I will come back with the results.


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 14:58         ` Ricardo Tubío
@ 2013-05-15 15:04           ` Daniel Borkmann
  0 siblings, 0 replies; 16+ messages in thread
From: Daniel Borkmann @ 2013-05-15 15:04 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

On 05/15/2013 04:58 PM, Ricardo Tubío wrote:
> Daniel Borkmann <dborkman <at> redhat.com> writes:
[...]
> I have already tried that and, if I use the same socket_fd twice with
> setsockopt(), I get the EBUSY errno from Kernel. Tomorrow, I will try the
> first solution with both sockets (it seems the easiest way); afterwards, I
> will try to use the TX_RING socket for TX and for RX at a time.
>
> I will come back with the results.

Well, no need. I already told you that it cannot work due to the status bits.


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 12:53 Single socket with TX_RING and RX_RING Ricardo Tubío
  2013-05-15 13:20 ` Daniel Borkmann
@ 2013-05-15 22:44 ` Phil Sutter
  2013-05-16  9:18   ` Ricardo Tubío
  1 sibling, 1 reply; 16+ messages in thread
From: Phil Sutter @ 2013-05-15 22:44 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

On Wed, May 15, 2013 at 12:53:55PM +0000, Ricardo Tubío wrote:
> Once I tell kernel to export the TX_RING through setsockopt() (see code
> below) I always get an error (EBUSY) if i try to tell kernel to export the
> RX_RING with the same socket descriptor. Therefore, I have to open an
> additional socket for the RX_RING and I require of two sockets when I though
> that I would only require of one socket for both TX and RX using mmap()ed
> memory.
> 
> Do I need both sockets or am I doing something wrong?

After requesting the rings, a single mmap() call suffices for both. So
pseudo-code basically looks like this:

| setsockopt(fd, SOL_PACKET, PACKET_RX_RING, p, sizeof(*p));
| setsockopt(fd, SOL_PACKET, PACKET_TX_RING, p, sizeof(*p));
| rx_ring = mmap(NULL, ring_len * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
| tx_ring = (char *)rx_ring + ring_len;

Note that packet_mmap() in net/packet/af_packet.c always maps the TX
ring memory right after the RX one.
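
The sequence above can be sketched in C with error checks. This is only a
sketch under assumptions: fd is taken to be an already opened AF_PACKET socket
(which needs CAP_NET_RAW), and ring_pair, ring_bytes() and map_both_rings() are
names invented here, not part of any kernel or libc API. It lets the RX and TX
rings differ in size and passes the size of the request structure itself to
setsockopt().

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>         /* mmap, MAP_FAILED */
#include <sys/socket.h>       /* setsockopt, SOL_PACKET */
#include <linux/if_packet.h>  /* struct tpacket_req, PACKET_RX_RING, PACKET_TX_RING */

/* Hypothetical container for the one mapping that holds both rings. */
struct ring_pair {
    void  *base;  /* start of the mapping (pass to munmap) */
    size_t len;   /* total mapped length                   */
    void  *rx;    /* RX ring: first part of the mapping    */
    void  *tx;    /* TX ring: directly after the RX ring   */
};

/* Bytes occupied by one ring. */
static size_t ring_bytes(const struct tpacket_req *req)
{
    return (size_t)req->tp_block_size * req->tp_block_nr;
}

/* Request both rings first, then mmap() exactly once.
 * Returns 0 on success, -1 on failure (errno is set by the failing call). */
static int map_both_rings(int fd,
                          const struct tpacket_req *rx_req,
                          const struct tpacket_req *tx_req,
                          struct ring_pair *out)
{
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, rx_req, sizeof(*rx_req)) < 0)
        return -1;
    if (setsockopt(fd, SOL_PACKET, PACKET_TX_RING, tx_req, sizeof(*tx_req)) < 0)
        return -1;

    size_t rx_len = ring_bytes(rx_req);
    size_t len = rx_len + ring_bytes(tx_req);
    void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    out->base = base;
    out->len  = len;
    out->rx   = base;
    out->tx   = (char *)base + rx_len;  /* no arithmetic on void * */
    return 0;
}
```

Calling mmap() between the two setsockopt() calls would make the second
request fail, since the socket is then already mapped.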

HTH, Phil


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 22:44 ` Phil Sutter
@ 2013-05-16  9:18   ` Ricardo Tubío
  2013-05-16 10:45     ` Phil Sutter
  0 siblings, 1 reply; 16+ messages in thread
From: Ricardo Tubío @ 2013-05-16  9:18 UTC (permalink / raw)
  To: netdev

Phil Sutter <phil <at> nwl.cc> writes:

> 
> On Wed, May 15, 2013 at 12:53:55PM +0000, Ricardo Tubío wrote:
> > Once I tell kernel to export the TX_RING through setsockopt() (see code
> > below) I always get an error (EBUSY) if i try to tell kernel to export the
> > RX_RING with the same socket descriptor. Therefore, I have to open an
> > additional socket for the RX_RING and I require of two sockets when I though
> > that I would only require of one socket for both TX and RX using mmap()ed
> > memory.
> > 
> > Do I need both sockets or am I doing something wrong?
> 
> After requesting the rings, a single mmap() call suffices for both. So
> pseudo-code basically looks like this:
> 
> | setsockopt(fd, SOL_PACKET, PACKET_RX_RING, p, sizeof(p));
> | setsockopt(fd, SOL_PACKET, PACKET_TX_RING, p, sizeof(p));
> | rx_ring = mmap(NULL, ring_len * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> | tx_ring = rx_ring + ring_len;
> 
> Note that packet_mmap() in net/packet/af_packet.c always maps the TX
> ring memory right after the RX one.
> 
> HTH, Phil
> 

Phil, the issue arises precisely when I try to do that: the second call to
setsockopt() returns an EBUSY error from the kernel. It seems that once you
have initialized a socket for either TX_RING or RX_RING, you cannot
initialize the same socket again for the other option (RX_RING or
TX_RING).

Does anybody really know whether I am right or wrong?


* Re: Single socket with TX_RING and RX_RING
  2013-05-16  9:18   ` Ricardo Tubío
@ 2013-05-16 10:45     ` Phil Sutter
  2013-05-16 11:01       ` Ricardo Tubío
  0 siblings, 1 reply; 16+ messages in thread
From: Phil Sutter @ 2013-05-16 10:45 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

On Thu, May 16, 2013 at 09:18:03AM +0000, Ricardo Tubío wrote:
> Phil Sutter <phil <at> nwl.cc> writes:
> 
> > 
> > On Wed, May 15, 2013 at 12:53:55PM +0000, Ricardo Tubío wrote:
> > > Once I tell kernel to export the TX_RING through setsockopt() (see code
> > > below) I always get an error (EBUSY) if i try to tell kernel to export the
> > > RX_RING with the same socket descriptor. Therefore, I have to open an
> > > additional socket for the RX_RING and I require of two sockets when I though
> > > that I would only require of one socket for both TX and RX using mmap()ed
> > > memory.
> > > 
> > > Do I need both sockets or am I doing something wrong?
> > 
> > After requesting the rings, a single mmap() call suffices for both. So
> > pseudo-code basically looks like this:
> > 
> > | setsockopt(fd, SOL_PACKET, PACKET_RX_RING, p, sizeof(p));
> > | setsockopt(fd, SOL_PACKET, PACKET_TX_RING, p, sizeof(p));
> > > | rx_ring = mmap(NULL, ring_len * 2, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
> > | tx_ring = rx_ring + ring_len;
> > 
> > Note that packet_mmap() in net/packet/af_packet.c always maps the TX
> > ring memory right after the RX one.
> > 
> > HTH, Phil
> > 
> 
> Phil, the issue comes precisely when I try to do that: the second call to
> setsockopt() returns an "EBUSY" error message from the kernel. It seems that
> if you have initialized one socket for beeing either TX_RING or RX_RING, you
> cannot initialize the same socket again for the other option (RX_RING or
> TX_RING).

So you do not call init_ring() twice as one may imply when reading your
first mail? Please provide a complete code sample.


* Re: Single socket with TX_RING and RX_RING
  2013-05-16 10:45     ` Phil Sutter
@ 2013-05-16 11:01       ` Ricardo Tubío
  2013-05-16 11:14         ` Daniel Borkmann
                           ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Ricardo Tubío @ 2013-05-16 11:01 UTC (permalink / raw)
  To: netdev

Phil Sutter <phil <at> nwl.cc> writes:

> So you do not call init_ring() twice as one may imply when reading your
> first mail? Please provide a complete code sample.
> 

Yes, I call it twice. The problem is that if I call it twice with the same
socket_fd, the second call gets the EBUSY error from the kernel. I
therefore have to use two different sockets (two different socket_fd's) in
order to work around this issue.

The code I use for calling "init_ring" is the one below. If, in function
"init_rings", I use a single socket instead of two different sockets
(rx_socket_fd and tx_socket_fd), I get the EBUSY error from the kernel.

Hope this clarifies, Cardo.

>>>>>>>>>>>>>>>>> FULL CODE EXAMPLE

/* init_rings */
int init_rings(ll_socket_t *ll_socket)
{
  	
  // 1) initialize rx ring
  if ( ( ll_socket->rx_ring_buffer
	  = init_ring(ll_socket->rx_socket_fd, PACKET_RX_RING) ) == NULL )
  {
    handle_app_error("Could not initialize RX ring.");
  }
	
  // 2) initialize tx ring
  if ( ( ll_socket->tx_ring_buffer
	= init_ring(ll_socket->tx_socket_fd, PACKET_TX_RING) ) == NULL )
  {
    handle_app_error("Could not initialize TX ring.");
  }
	
  // 3) set destination address for both kernel rings
  if ( set_sockaddr_ll(ll_socket) < 0 )
  {
    handle_app_error("Could not set sockaddr_ll for TX/RX rings.");
  }
  	
  return(EX_OK);

}

/* init_ring */
void *init_ring(const int socket_fd, const int type)
{
  	
  void *ring = NULL;

  int ring_access_flags = PROT_READ | PROT_WRITE;
  tpacket_req_t *p = init_tpacket_req(FRAMES_PER_RING);
  int ring_len = ( p->tp_block_size ) * ( p->tp_block_nr );
  	
  if ( setsockopt(socket_fd, SOL_PACKET, type, p, LEN__TPACKET_REQ) < 0 )
  {
     handle_sys_error("Setting socket options for this ring");
  }

  // 2) map the ring; mmap() reports failure with MAP_FAILED, not NULL
  if ( ( ring = mmap(NULL, ring_len, ring_access_flags, MAP_SHARED,
  			socket_fd, 0) ) == MAP_FAILED )
  {
    log_sys_error("mmap()ing error");
    ring = NULL;
  }
	
  return(ring);
	
}

/* init_tpacket_req */
tpacket_req_t *init_tpacket_req(const int frames_per_ring)
{
	tpacket_req_t *t = new_tpacket_req();
  	t->tp_block_size = frames_per_ring * getpagesize();
  	t->tp_block_nr = 1;
  	t->tp_frame_size = getpagesize();
  	t->tp_frame_nr = frames_per_ring;
  	return(t); 	
}

>>>>>>>>>>>>>>>>>
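
As a side note on the ring geometry set up in init_tpacket_req(), the
constraints described in Documentation/networking/packet_mmap.txt can be
checked before calling setsockopt(). The following is a hedged sketch only:
tpacket_req_plausible() is a made-up helper name, the page size is passed in
by the caller, and the header length check assumes the default TPACKET_V1
header (TPACKET_HDRLEN) with no reserved space.

```c
#include <assert.h>
#include <stdbool.h>
#include <linux/if_packet.h>  /* struct tpacket_req, TPACKET_HDRLEN, TPACKET_ALIGNMENT */

/* Returns true if the request satisfies the documented geometry rules:
 *  - block size is a non-zero multiple of the page size
 *  - frame size can hold the frame header and is TPACKET_ALIGNMENT aligned
 *  - at least one frame fits in a block
 *  - tp_frame_nr equals frames per block times number of blocks */
static bool tpacket_req_plausible(const struct tpacket_req *req, unsigned page_size)
{
    if (page_size == 0)
        return false;
    if (req->tp_block_size == 0 || req->tp_block_size % page_size != 0)
        return false;
    if (req->tp_frame_size < TPACKET_HDRLEN)
        return false;
    if (req->tp_frame_size % TPACKET_ALIGNMENT != 0)
        return false;

    unsigned frames_per_block = req->tp_block_size / req->tp_frame_size;
    if (frames_per_block == 0)
        return false;

    return frames_per_block * req->tp_block_nr == req->tp_frame_nr;
}
```

With a 4096-byte page, the geometry used above (one block of
frames_per_ring pages, page-sized frames) passes these checks, so the
geometry itself is not what triggers EBUSY.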


* Re: Single socket with TX_RING and RX_RING
  2013-05-16 11:01       ` Ricardo Tubío
@ 2013-05-16 11:14         ` Daniel Borkmann
  2013-05-16 11:52         ` Phil Sutter
  2013-05-20 20:54         ` Paul Chavent
  2 siblings, 0 replies; 16+ messages in thread
From: Daniel Borkmann @ 2013-05-16 11:14 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev, phil

On 05/16/2013 01:01 PM, Ricardo Tubío wrote:
> Phil Sutter <phil <at> nwl.cc> writes:
>
>> So you do not call init_ring() twice as one may imply when reading your
>> first mail? Please provide a complete code sample.
>
> Yes, I call it twice. The problem is that if I call it twice with the same
> socket_fd, the second time I call it I get the EBUSY error from kernel. I
> have to use two different sockets (two different socket_fd's, therefore) in
> order to workaround this issue.
>
> The code I use for calling "init_ring" is the one below. If in function
> "init_rings", instead of using two different sockets (rx_socket_fd and
> tx_socket_fd), I use a single socket, I get the EBUSY error from kernel.

Ricardo, haven't we already been through this, that it cannot work this way?

This is not what we suggested in earlier mails.

Also, why do you keep sending your answers only to netdev without keeping
others in CC?


* Re: Single socket with TX_RING and RX_RING
  2013-05-16 11:01       ` Ricardo Tubío
  2013-05-16 11:14         ` Daniel Borkmann
@ 2013-05-16 11:52         ` Phil Sutter
  2013-05-20 20:54         ` Paul Chavent
  2 siblings, 0 replies; 16+ messages in thread
From: Phil Sutter @ 2013-05-16 11:52 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

On Thu, May 16, 2013 at 11:01:17AM +0000, Ricardo Tubío wrote:
> Phil Sutter <phil <at> nwl.cc> writes:
> 
> > So you do not call init_ring() twice as one may imply when reading your
> > first mail? Please provide a complete code sample.
> > 
> 
> Yes, I call it twice. The problem is that if I call it twice with the same
> socket_fd, the second time I call it I get the EBUSY error from kernel. I
> have to use two different sockets (two different socket_fd's, therefore) in
> order to workaround this issue.

Which call produces the EBUSY response: the second setsockopt() or the
second mmap()?


* Re: Single socket with TX_RING and RX_RING
  2013-05-15 13:32   ` Ricardo Tubío
  2013-05-15 14:47     ` Daniel Borkmann
@ 2013-05-20 20:50     ` Paul Chavent
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Chavent @ 2013-05-20 20:50 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

[-- Attachment #1: Type: text/plain, Size: 3388 bytes --]

On 05/15/2013 03:32 PM, Ricardo Tubío wrote:
> Daniel Borkmann <dborkman <at> redhat.com> writes:
>
>>
>> On 05/15/2013 02:53 PM, Ricardo Tubío wrote:
>>> Once I tell kernel to export the TX_RING through setsockopt() (see code
>>> below) I always get an error (EBUSY) if i try to tell kernel to export the
>>> RX_RING with the same socket descriptor. Therefore, I have to open an
>>> additional socket for the RX_RING and I require of two sockets when I though
>>> that I would only require of one socket for both TX and RX using mmap()ed
>>> memory.
>>>
>>> Do I need both sockets or am I doing something wrong?
>>
>> The second time you call init_ring() in your code e.g. with TX_RING, where
>> you have previously set it up for the RX_RING. The kernel will give you
>> -EBUSY because the packet socket is already mmap(2)'ed.
>>
>
> Ok, so if I make the following system calls:
>
> void *ring=NULL;
> setsockopt(socket_fd, SOL_PACKET, PACKET_RX_RING, p, LEN__TPACKET_REQ);
> ring = mmap(NULL, ring_len, ring_access_flags, MAP_SHARED, socket_fd, 0);
>
> Would I be permitted to use the ring map obtained both for RX and for TX? If
> so, for me it is confusing to use PACKET_RX_RING if I can also TX data
> through that ring...
>

Hello Ricardo.

I managed to use the same socket and a single mmap()ed area for both RX_RING and TX_RING. Here is some sample code:

/* open socket */
sock_fd = socket(PF_PACKET, socket_type, htons(socket_protocol));

/* socket tuning and init */
[...]

/* rings geometry */
rx_packet_req.tp_block_size = pagesize << order;
rx_packet_req.tp_block_nr = 1;
rx_packet_req.tp_frame_size = frame_size;
rx_packet_req.tp_frame_nr = (rx_packet_req.tp_block_size / rx_packet_req.tp_frame_size) * rx_packet_req.tp_block_nr;

tx_packet_req = rx_packet_req;

/* set packet version */
setsockopt(sock_fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version));

/* set RX ring option */
setsockopt(sock_fd, SOL_PACKET, PACKET_RX_RING, &rx_packet_req, sizeof(rx_packet_req));

/* set TX ring option */
setsockopt(sock_fd, SOL_PACKET, PACKET_TX_RING, &tx_packet_req, sizeof(tx_packet_req));

/* map rx + tx buffer to userspace : they are in this order */
mmap_size =
     rx_packet_req.tp_block_size * rx_packet_req.tp_block_nr +
     tx_packet_req.tp_block_size * tx_packet_req.tp_block_nr ;
mmap_base = mmap(0, mmap_size, PROT_READ|PROT_WRITE, MAP_SHARED, sock_fd, 0);

/* get rx and tx buffer description */
rx_buffer_size = rx_packet_req.tp_block_size * rx_packet_req.tp_block_nr;
rx_buffer_addr = mmap_base;
rx_buffer_idx  = 0;
rx_buffer_cnt  = rx_packet_req.tp_block_size * rx_packet_req.tp_block_nr / rx_packet_req.tp_frame_size;

tx_buffer_size = tx_packet_req.tp_block_size * tx_packet_req.tp_block_nr;
tx_buffer_addr = (char *)mmap_base + rx_buffer_size;  /* avoid arithmetic on void * */
tx_buffer_idx  = 0;
tx_buffer_cnt  = tx_packet_req.tp_block_size * tx_packet_req.tp_block_nr / tx_packet_req.tp_frame_size;
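
Given the buffer descriptions above, the address of an individual frame can be
derived from its index. The sketch below is an illustration, not code from the
attachment: ring_frame() and ring_next() are hypothetical helper names. It
indexes by block first, because frames never straddle a block boundary and a
block may have unused space after its last frame. With tp_block_nr = 1, as in
the sample above, this reduces to base + idx * tp_frame_size.

```c
#include <assert.h>
#include <stddef.h>
#include <linux/if_packet.h>  /* struct tpacket_req */

/* Address of frame `idx` in the ring that starts at `ring_base`. */
static void *ring_frame(void *ring_base, const struct tpacket_req *req, unsigned idx)
{
    unsigned frames_per_block = req->tp_block_size / req->tp_frame_size;
    size_t block = idx / frames_per_block;  /* which block holds the frame */
    size_t slot  = idx % frames_per_block;  /* position inside that block  */

    return (char *)ring_base
           + block * req->tp_block_size
           + slot * req->tp_frame_size;
}

/* Index of the frame that follows `idx`, wrapping at the end of the ring. */
static unsigned ring_next(unsigned idx, const struct tpacket_req *req)
{
    return (idx + 1) % req->tp_frame_nr;
}
```

For example, ring_frame(rx_buffer_addr, &rx_packet_req, rx_buffer_idx) would
give the current RX frame, and the same call with the tx_* values the current
TX frame.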


I attach to this mail a complete (but certainly outdated) code sample.

I've also begun to write a kind of howto (in French) on packet mmap at this page: http://paul.chavent.free.fr/packet_mmap.html (this is a work in progress; I will add information on timestamping)

Regards.

Paul.



[-- Attachment #2: ethernet.c --]
[-- Type: text/plain, Size: 38648 bytes --]

/*
 * This module allows sending and receiving ethernet frames.
 * The type of ethernet frames must be specified at compile time :
 *  - use 8021Q or not
 *    - tpid and tci
 *  - ethertype
 *  - filtering or not
 *
 * See /usr/src/linux/Documentation/networking/packet_mmap.txt  for improvement
 *
 *
 * Notes on packet mmap
 *
 * For tx example see :
 *   http://wiki.ipxwarzone.com/index.php5?title=Linux_packet_mmap#Example
 * For rx example see :
 *   http://www.scaramanga.co.uk/code-fu/lincap.c
 *
 * (1) If we open the socket with SOCK_DGRAM, the tp_mac and the
 *     tp_net are the same (the mac header isn't provided by the
 *     user). Eg tp_mac=80 and tp_net=80. If we open the socket with
 *     SOCK_RAW, the tp_net = tp_mac + 14. Eg tp_mac=66 and tp_net=80.
 *     (see (6) for alignment)
 *
 * (2) The tx and rx sides are asymmetric. On tx we fill data at
 *       TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)
 *     on rx we get data at (see (1)) 
 *       tp_mac 
 *     or 
 *       tp_net 
 *
 * (3) The mmap() is done only once for both sides. The mapping places
 *     rx before tx.
 * 
 * (4) The tp_len is the real len of the frame, the tp_snaplen is the
 *     len of the data in the ring buffer. If the tp_frame_size given
 *     in struct tpacket_req is too small for a frame, tp_len still
 *     holds the real length and, if the PACKET_COPY_THRESH sockopt is
 *     set, TP_STATUS_COPY is set in tp_status.
 *
 * (5) The minimum tp_frame_size for tx is the minimum size of the
 *     payload (including the mac header if SOCK_RAW is selected) plus :
 *       TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)           = 32 
 *     The TPACKET2_HDRLEN - sizeof(struct sockaddr_ll) is always aligned
 *     to 16 bytes
 *
 *
 * (6) The minimum tp_frame_size for rx is the minimum size of the
 *     payload (including the mac header if SOCK_RAW is selected) plus :
 *       ALIGN_16(TPACKET2_HDRLEN) + 16 + tp_reserve (=0)       = 80 = tp_net 
 *     The tp_net will always be aligned to 16 bytes boundaries
 *
 *
 * RX FRAME STRUCTURE :
 *
 * Start (aligned to TPACKET_ALIGNMENT=16)   TPACKET_ALIGNMENT=16                                   TPACKET_ALIGNMENT=16
 * v                                         v                                                      v
 * |                                         |                             | tp_mac                 |tp_net
 * |  struct tpacket_hdr  ... pad            | struct sockaddr_ll ... gap  | min(16, maclen) = 16   |
 * |<--------------------------------------------------------------------->|<---------------------->|<----... 
 *                                tp_hdrlen = TPACKET2_HDRLEN                   if SOCK_RAW             user data
 *
 *
 * TX FRAME STRUCTURE :
 *
 * Start (aligned to TPACKET_ALIGNMENT=16)   TPACKET_ALIGNMENT=16
 * v                                         v
 * |                                         |
 * |  struct tpacket_hdr  ... pad            | struct sockaddr_ll ... gap
 * |<--------------------------------------------------------------------->| 
 *                                tp_hdrlen = TPACKET2_HDRLEN
 *                                           |<---- ... 
 *                                               user data
 *
 *
 * TODO / IMPROVEMENTS
 *  vlan 802Q
 *  timestamp
 *  filtering
 *  set the mtu according to the tp_frame_size or set tp_frame_size according
 *  to the mtu ?
 */

#undef  USE_FILTER
#define COOKED_PACKET
#undef  P_8021Q
#define PATCHED_PACKET

#define _GNU_SOURCE 

#include <assert.h>           /* assert */
#include <stdio.h>            /* printf */
#include <stdlib.h>           /* calloc, free */
#include <string.h>           /* memcpy */
#include <errno.h>            /* errno, perror, etc */
#include <unistd.h>           /* close */
#include <sys/ioctl.h>        /* ioctl */
#include <arpa/inet.h>        /* htons, ntohs */
#include <poll.h>             /* poll */
#include <time.h>             /* struct timespec */
#include <sys/timerfd.h>      /* timerfd_create etc. */
#include <sys/mman.h>         /* mmap */
#include <sys/socket.h>       /* socket */
#include <net/if.h>           /* ifreq, ifconf */
#include <net/ethernet.h>     /* struct ether_header, ETH_ALEN, ... */
#include <linux/if_packet.h>  /* packet mmap*/
#if defined(USE_FILTER)
#include <linux/types.h>      /* attach filter */
#include <linux/filter.h>     /* attach filter */
#endif

#include "ethernet.h"
#if !defined(NDEBUG)
#include "debug.h"
#endif

#define MIN(x,y) ((x)<(y)?(x):(y))

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
 
static inline unsigned next_power_of_two(unsigned n)
{
  n--;
  n |= n >> 1;
  n |= n >> 2;
  n |= n >> 4;
  n |= n >> 8;
  n |= n >> 16;
  n++;
  return n;
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
 
static const uint8_t broadcast_addr[6] = {0xff, 0xff, 0xff, 0xff, 0xff, 0xff};

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
struct ethernet_s
{
#if !defined(NDEBUG)
  int                debug;
#endif

  int                timer_fd;

  int                sock_fd;

  struct sockaddr_ll local_addr;
  struct sockaddr_ll remote_addr;

  unsigned           mtu;

  struct tpacket_req rx_packet_req;
  struct tpacket_req tx_packet_req;

  void *             mmap_base;
  unsigned           mmap_size;

  unsigned           rx_buffer_size;
  void *             rx_buffer_addr;
  unsigned           rx_buffer_cnt;
  unsigned           rx_buffer_idx;
  unsigned           rx_buffer_payload_offset;
  unsigned           rx_buffer_payload_max_size;

  unsigned           tx_buffer_size;
  void *             tx_buffer_addr;
  unsigned           tx_buffer_cnt; 
  unsigned           tx_buffer_idx;
  unsigned           tx_buffer_payload_offset;
  unsigned           tx_buffer_payload_max_size;

  struct pollfd      pollfd[2];
};

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

/* http://standards.ieee.org/develop/regauth/ethertype/eth.txt */
#define ETH_TYPE 0x88b5

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

#if !defined(COOKED_PACKET)

static const int socket_type     = SOCK_DGRAM;
static const int socket_protocol = ETH_P_802_3;
static const int bind_protocol   = ETH_P_802_2; // man packet section Notes
static const int send_protocol   = ETH_TYPE;

#endif /* !defined(COOKED_PACKET) */

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

#if defined(COOKED_PACKET) && !defined(P_8021Q)

static const int socket_type     = SOCK_RAW;
static const int socket_protocol = ETH_P_802_3;
static const int bind_protocol   = ETH_P_802_2; // man packet section Notes
static const int send_protocol   = ETH_TYPE;

struct ether_header_s
{
  uint8_t  dhost[ETH_ALEN];
  uint8_t  shost[ETH_ALEN];
  uint16_t type;
} __attribute__ ((__packed__));

typedef struct ether_header_s ether_header_t;

#endif /* defined(COOKED_PACKET) && !defined(P_8021Q) */

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

#if defined(COOKED_PACKET) && defined(P_8021Q)

static const int socket_type     = SOCK_RAW;
static const int socket_protocol = ETH_P_ALL;
static const int bind_protocol   = ETH_P_ALL;
static const int send_protocol   = ETH_TYPE;

struct ether_header_s
{
  uint8_t   dhost[ETH_ALEN];
  uint8_t   shost[ETH_ALEN];
  uint16_t  tpid;
  uint16_t  tci;
  uint16_t  type;
} __attribute__ ((__packed__));

typedef struct ether_header_s ether_header_t;

#define E_8021Q_TPID 0x8100
#define E_8021Q_TCI  0xEFFE

#define E_8021Q_PCP 0x7     /* priority : highest -> better, from 0 to 7 */
#define E_8021Q_CFI 0
#define E_8021Q_VID 0xFFE   /* vlan id, from 0 (reserved) to 0xFFF (reserved) */

#endif /* defined(COOKED_PACKET) && defined(P_8021Q) */

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/

#if defined(USE_FILTER)

static struct sock_filter filt_prog_code[] =
{
#if defined(P_8021Q)
  /* load and check tpid */
  BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 12),                /* Load tpid */
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,   E_8021Q_TPID, 1, 0),/* equal 8021Q_TPID */
  BPF_STMT(BPF_RET | BPF_K,             0),                 /* reject */
  /* load and check tci */
  BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, 14),               /* Load tci */
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,   E_8021Q_TCI, 1, 0),/* equal 8021Q_TCI */
  BPF_STMT(BPF_RET | BPF_K,             0),                /* reject */
#endif /* defined(P_8021Q) */
  BPF_STMT(BPF_LD  | BPF_H   | BPF_ABS, ETH_HDR_LEN - 2),  /* Load ether type */
  BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K,   ETH_TYPE, 1, 0),   /* equal ETHER_TYPE */
  BPF_STMT(BPF_RET | BPF_K,             0),                /* reject */
  BPF_STMT(BPF_RET | BPF_K,             65535),            /* accept */
};

static struct sock_fprog filt_prog =
{
  sizeof(filt_prog_code) / sizeof(filt_prog_code[0]),
  filt_prog_code
};

#endif /* defined(USE_FILTER) */

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
#if !defined(NDEBUG)
static void ethernet_debug_frame(const void * base);
static void ethernet_debug_packet_req(const struct tpacket_req * rx_packet_req, const struct tpacket_req * tx_packet_req);
#endif

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
ethernet_t * ethernet_alloc()
{
  ethernet_t * itf = calloc(1, sizeof(*itf));
  if(itf)
    {
      itf->timer_fd = -1;
      itf->sock_fd = -1;
      itf->mmap_base = (void *)-1;
    }
  return itf;
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
void ethernet_free(ethernet_t *itf)
{
 /* check parameters */
  assert(itf);

  ethernet_close(itf);

  free(itf);
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_open(ethernet_t *itf, const char *itf_name)
{
  struct ifreq ifr;
  int err = 0;
  socklen_t errlen = sizeof(err);

  /* fill ifr name field; copying at most size - 1 bytes into the zeroed
   * struct keeps the name NUL-terminated */
  memset(&ifr, 0, sizeof(ifr));
  strncpy(ifr.ifr_name, itf_name, sizeof(ifr.ifr_name) - 1);

  /* check parameters */
  assert(itf);

  /* cleanup */
  ethernet_close(itf);

  /* setup timer fd */
  itf->timer_fd = timerfd_create(CLOCK_REALTIME, 0);
  if(itf->timer_fd < 0)
    {
      perror("timerfd_create failed");
      return -1;
    }

  /* open socket */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "socket\n");
    }
#endif
  itf->sock_fd = socket(PF_PACKET, socket_type, htons(socket_protocol));
  if(itf->sock_fd < 0)
    {
      perror("socket failed");
      return -1;
    }
  
#if defined(USE_FILTER)
  /* attach filter */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "setsockopt SO_ATTACH_FILTER\n");
    }
#endif
  if(setsockopt(itf->sock_fd, SOL_SOCKET, SO_ATTACH_FILTER, &filt_prog, sizeof(filt_prog)))
    {
      perror("setsockopt SO_ATTACH_FILTER failed");
      return -1;
    }
#endif /* defined(USE_FILTER) */

  /* set local addr */
  memset(&itf->local_addr, 0, sizeof(itf->local_addr));
  itf->local_addr.sll_family = AF_PACKET;
  itf->local_addr.sll_protocol = htons(bind_protocol);

  /* get itf index */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ioctl SIOCGIFINDEX\n");
    }
#endif
  if(ioctl(itf->sock_fd, SIOCGIFINDEX, &ifr) == -1)
    {
      perror("ioctl SIOCGIFINDEX failed");
      return -1;
    }
  itf->local_addr.sll_ifindex = ifr.ifr_ifindex;
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "if index %d\n", ifr.ifr_ifindex);
    }
#endif

  /* get own MAC address */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ioctl SIOCGIFHWADDR\n");
    }
#endif
  if(ioctl(itf->sock_fd, SIOCGIFHWADDR, &ifr) < 0)
    {
      perror("ioctl SIOCGIFHWADDR failed");
      return -1;
    }
  itf->local_addr.sll_halen = ETH_ALEN;
  memcpy(&itf->local_addr.sll_addr, ifr.ifr_hwaddr.sa_data, ETH_ALEN);
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "if mac addr %02x:%02x:%02x:%02x:%02x:%02x\n",
              itf->local_addr.sll_addr[0], itf->local_addr.sll_addr[1], 
              itf->local_addr.sll_addr[2], itf->local_addr.sll_addr[3],
              itf->local_addr.sll_addr[4], itf->local_addr.sll_addr[5]);
    }
#endif

  /* bind to eth */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "bind\n");
    }
#endif
  if(bind(itf->sock_fd, (const void *)&itf->local_addr, sizeof(itf->local_addr)) == -1)
    {
      perror("bind failed");
      return -1;
    }

  /* any pending errors, e.g., network is down? */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "getsockopt SO_ERROR\n");
    }
#endif
  if(getsockopt(itf->sock_fd, SOL_SOCKET, SO_ERROR, &err, &errlen) == -1)
    {
      perror("getsockopt SO_ERROR failed");
      return -1;
    }
  if(err > 0)
    {
      fprintf(stderr, "network is down ?\n");
      return -1;
    }

  /* set remote addr */
  itf->remote_addr = itf->local_addr;
  itf->remote_addr.sll_protocol = htons(send_protocol);
  memcpy(&itf->remote_addr.sll_addr, broadcast_addr, ETH_ALEN);

  /* get own MTU */
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ioctl SIOCGIFMTU\n");
    }
#endif
  if (ioctl(itf->sock_fd, SIOCGIFMTU, &ifr) < 0)
    {
      perror("ioctl SIOCGIFMTU failed");
      return -1;
    }
  itf->mtu = ifr.ifr_mtu;
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "Mtu %u\n", itf->mtu);
    }
#endif

  /* prepare packet mmaping */
  const long pagesize = sysconf(_SC_PAGESIZE); /* assume 4096 */
  const unsigned order = 1;
  const unsigned frame_size = next_power_of_two(itf->mtu + 128); /* 128 is an arbitrary value */ 

  /* tp_block_size must be a multiple of PAGE_SIZE (here PAGE_SIZE << order) */
  itf->rx_packet_req.tp_block_size = pagesize << order; 
  /* tp_block_nr */
  itf->rx_packet_req.tp_block_nr = 1;
  /* tp_frame_size must be greater than TPACKET2_HDRLEN and a multiple 
   * of TPACKET_ALIGNMENT. It should also be a divisor of tp_block_size */
  itf->rx_packet_req.tp_frame_size = frame_size;
  /* tp_frame_nr */
  itf->rx_packet_req.tp_frame_nr = (itf->rx_packet_req.tp_block_size / itf->rx_packet_req.tp_frame_size) * itf->rx_packet_req.tp_block_nr;

  /* sanity checks */
  if(frame_size <= TPACKET2_HDRLEN)
    {
      fprintf(stderr, "frame_size (%u) must be greater than TPACKET2_HDRLEN (%u)\n", frame_size, (unsigned)TPACKET2_HDRLEN);
      return -1;
    }
  if((frame_size % TPACKET_ALIGNMENT) != 0)
    {
      fprintf(stderr, "frame_size (%u) must be a multiple of TPACKET_ALIGNMENT (%u)\n", frame_size, TPACKET_ALIGNMENT);
      return -1;
    }
  if((itf->rx_packet_req.tp_block_size % frame_size) != 0)
    {
      fprintf(stderr, "frame_size (%u) must be a divisor of tp_block_size (%u)\n", frame_size, itf->rx_packet_req.tp_block_size);
      return -1;
    }

  /* same settings for tx */
  itf->tx_packet_req = itf->rx_packet_req;

#if !defined(NDEBUG)
  if(itf->debug)
    {
      ethernet_debug_packet_req(&itf->rx_packet_req, &itf->tx_packet_req);
    }
#endif
  
  /* set packet version option */
  int version = TPACKET_V2;
  if(setsockopt(itf->sock_fd, SOL_PACKET, PACKET_VERSION, &version, sizeof(version)) < 0)
    {
      perror("setsockopt: PACKET_VERSION");
      return -1;
    }

  /* set RX ring option */
  if (setsockopt(itf->sock_fd, SOL_PACKET, PACKET_RX_RING, &itf->rx_packet_req, sizeof(itf->rx_packet_req)) < 0)
    {
      perror("setsockopt: PACKET_RX_RING");
      return -1;
    }
 
  /* set TX ring option*/
  if (setsockopt(itf->sock_fd, SOL_PACKET, PACKET_TX_RING, &itf->tx_packet_req, sizeof(itf->tx_packet_req)) < 0)
    {
      perror("setsockopt: PACKET_TX_RING");
      return -1;
    }

  /* map rx + tx buffer to userspace : they are in this order */
  itf->mmap_size = 
    itf->rx_packet_req.tp_block_size * itf->rx_packet_req.tp_block_nr +
    itf->tx_packet_req.tp_block_size * itf->tx_packet_req.tp_block_nr ;
  itf->mmap_base = mmap(0, itf->mmap_size, PROT_READ|PROT_WRITE, MAP_SHARED, itf->sock_fd, 0);
  if (itf->mmap_base == (void*)-1)
    {
      perror("mmap rx buffer failed");
      return -1;
    }

  /* get rx and tx buffer description */
  itf->rx_buffer_size = itf->rx_packet_req.tp_block_size * itf->rx_packet_req.tp_block_nr;
  itf->rx_buffer_addr = itf->mmap_base;
  itf->rx_buffer_idx  = 0;
  itf->rx_buffer_cnt  = itf->rx_packet_req.tp_block_size * itf->rx_packet_req.tp_block_nr / itf->rx_packet_req.tp_frame_size;

  itf->tx_buffer_size = itf->tx_packet_req.tp_block_size * itf->tx_packet_req.tp_block_nr;
  itf->tx_buffer_addr = itf->mmap_base + itf->rx_buffer_size;
  itf->tx_buffer_idx  = 0;
  itf->tx_buffer_cnt  = itf->tx_packet_req.tp_block_size * itf->tx_packet_req.tp_block_nr / itf->tx_packet_req.tp_frame_size;

  /* 
   * Precompute payload offset and max size 
   * Warning: tx and rx are asymmetric
   */

  /*
   * - on rx we get data at tp_net (SOCK_DGRAM) and tp_mac if we need mac 
   *   header (SOCK_RAW) 
   *   the rx_buffer_payload_offset is the offset from the tp_net of the frame !
   *   For computing max size we consider the tp_net to be :
   *     TPACKET2_HDRLEN + 16 + reserve   (=80)
   *   or
   *     TPACKET2_HDRLEN + min(16, maclen) + reserve
   *   see src/linux/net/packet/af_packet.c tpacket_rcv  
   */
  itf->rx_buffer_payload_offset = TPACKET_ALIGN(TPACKET2_HDRLEN + MIN(sizeof(ether_header_t), 16)); // only used here, use tp_net elsewhere
  itf->rx_buffer_payload_max_size = itf->rx_packet_req.tp_frame_size - itf->rx_buffer_payload_offset;

  /*
   * - on tx we fill data at 
   *     TPACKET2_HDRLEN - sizeof(struct sockaddr_ll)
   *   or
   *     TPACKET2_HDRLEN + min(16, maclen)
   *   see src/linux/net/packet/af_packet.c tpacket_fill_skb  
   */
#if defined(PATCHED_PACKET)
  itf->tx_buffer_payload_offset = TPACKET_ALIGN(TPACKET2_HDRLEN + MIN(sizeof(ether_header_t), 16));
#else /* defined(PATCHED_PACKET) */
  itf->tx_buffer_payload_offset = (TPACKET2_HDRLEN - sizeof(struct sockaddr_ll));
#endif /* defined(PATCHED_PACKET) */
  itf->tx_buffer_payload_max_size = itf->tx_packet_req.tp_frame_size - itf->tx_buffer_payload_offset;

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "rx_buffer_payload_max_size %u\n", 
              itf->rx_buffer_payload_max_size);
      fprintf(stdout, "tx_buffer_payload_max_size %u\n", 
              itf->tx_buffer_payload_max_size);
    }
#endif

#if defined(COOKED_PACKET)
  /* for each packet we initialize the ethernet header */
  ether_header_t ether_header;
  memcpy(ether_header.dhost, &itf->remote_addr.sll_addr, sizeof(ether_header.dhost));
  memcpy(ether_header.shost, &itf->local_addr.sll_addr, sizeof(ether_header.shost));
#if defined(P_8021Q)
  ether_header.tpid = htons(E_8021Q_TPID);
  ether_header.tci  = htons(E_8021Q_TCI);
#endif /* defined(P_8021Q) */
  ether_header.type = htons(send_protocol);
  for(unsigned i = 0; i < itf->tx_buffer_cnt; i++)
    {
      void * base = itf->tx_buffer_addr + i * itf->tx_packet_req.tp_frame_size;
      memcpy(base + itf->tx_buffer_payload_offset - sizeof(ether_header_t), &ether_header, sizeof(ether_header_t));
    }
  
  /* override the setting of the tx data offset and size */

  /* apply the diffs */
  itf->rx_buffer_payload_max_size -= sizeof(ether_header);
  itf->tx_buffer_payload_max_size -= sizeof(ether_header);

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "rx_buffer_payload_max_size %u\n", 
              itf->rx_buffer_payload_max_size);
      fprintf(stdout, "tx_buffer_payload_max_size %u\n", 
              itf->tx_buffer_payload_max_size);
    }
#endif

#endif /* defined(COOKED_PACKET) */

  /* threshold payload max size according to the mtu */
  if(itf->mtu < itf->rx_buffer_payload_max_size)
    {
      itf->rx_buffer_payload_max_size = itf->mtu;
    }
  if(itf->mtu < itf->tx_buffer_payload_max_size)
    {
      itf->tx_buffer_payload_max_size = itf->mtu;
    }

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "rx_buffer_payload_max_size %u\n", itf->rx_buffer_payload_max_size);
      fprintf(stdout, "tx_buffer_payload_max_size %u\n", itf->tx_buffer_payload_max_size);
    }
#endif

  /* setup poll fd */

  itf->pollfd[0].fd      = itf->timer_fd;
  itf->pollfd[0].events  = POLLIN;
  itf->pollfd[0].revents = 0;

  itf->pollfd[1].fd      = itf->sock_fd;
  itf->pollfd[1].events  = POLLIN|POLLRDNORM|POLLERR;
  itf->pollfd[1].revents = 0;

  return 0;
}


/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
void ethernet_close(ethernet_t * itf)
{
  /* check parameters */
  assert(itf);

  /* unmap the rx + tx rings */
  if(itf->mmap_base != (void *)-1)
    {
      munmap(itf->mmap_base, itf->mmap_size);
      itf->mmap_base = (void *)-1;
      itf->mmap_size = 0;
    }

  /* close socket */
  if(0 <= itf->sock_fd)
    {
#if !defined(NDEBUG)
      if(itf->debug)
        {
          fprintf(stdout, "close\n");
        }
#endif
      close(itf->sock_fd);
      itf->sock_fd = -1;
    }

  /* close timer */
  if(0 <= itf->timer_fd)
    {
      close(itf->timer_fd);
      itf->timer_fd = -1;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
void ethernet_purge(ethernet_t * itf)
{
  /* check parameters */
  assert(itf);

  /* get base address of the current rx frame */
  void * base = itf->rx_buffer_addr + itf->rx_buffer_idx * itf->rx_packet_req.tp_frame_size;
  volatile struct tpacket2_hdr * header = (struct tpacket2_hdr *)base;
  while(header->tp_status != TP_STATUS_KERNEL)
    {
      /* load the next rx frame index */
      if(itf->rx_buffer_idx < (itf->rx_buffer_cnt - 1))
        {
          itf->rx_buffer_idx ++;
        }
      else
        {
          itf->rx_buffer_idx = 0;
        }

      /* clear the status */
      header->tp_status = TP_STATUS_KERNEL;

      /* get base address of the current rx frame */
      base = itf->rx_buffer_addr + itf->rx_buffer_idx * itf->rx_packet_req.tp_frame_size;
      header = (struct tpacket2_hdr *)base;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_rx_request(ethernet_t * itf, ethernet_msg_t * msg)
{
  /* check parameters */
  assert(itf && msg);

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ethernet_rx_request\n");
    }
#endif

  if(msg->data || msg->data_len)
    {
      fprintf(stderr, "Rx message has to be released before it is requested again.\n");
      return -1;
    }
 
  /* get base address of the current rx frame */
  void * base = itf->rx_buffer_addr + itf->rx_buffer_idx * itf->rx_packet_req.tp_frame_size;
  volatile struct tpacket2_hdr * header = (struct tpacket2_hdr *)base;

  /* check if we need to poll */
  if(header->tp_status == TP_STATUS_KERNEL)
    {
      int err;

      /* setup read timeout */
      struct itimerspec to = {{0,0}, msg->to};
      int flags = (msg->to_is_relative)?0:TFD_TIMER_ABSTIME;
      err = timerfd_settime(itf->timer_fd, flags, &to, NULL);
      if(err < 0)
        {
          perror("timerfd_settime failed");
          return -1;
        }
     
      /* poll input */
      itf->pollfd[0].revents = 0;
      itf->pollfd[1].revents = 0;
      err = ppoll(itf->pollfd, 2, NULL, NULL);
      if(err < 0)
        {
          perror("ppoll failed");
          fprintf(stderr, "revents = %hd %hd\n", itf->pollfd[0].revents, itf->pollfd[1].revents);
          return -1;
        }
#if !defined(NDEBUG)
      else if(err == 0)
        {
          fprintf(stderr, "ppoll timeout unexpected\n");
          return -1;
        }
#endif
      else if(itf->pollfd[0].revents & POLLIN)
        {
#if !defined(NDEBUG)
          if(itf->debug)
            {
              fprintf(stdout, "timerfd timeout\n");
            }
#endif
          return 0;
        }
#if !defined(NDEBUG)
      else if(!itf->pollfd[1].revents)
        {
          fprintf(stderr, "event on socket expected\n");
          return -1;
        }
#endif
      else if(itf->pollfd[1].revents & POLLERR)
        {
          fprintf(stderr, "error on socket poll\n");
          return -1;
        }
    }

#if !defined(NDEBUG)
  if(itf->debug)
    {
      ethernet_debug_frame(base);
    }
#endif

  /* so, here we have a frame ready to process */

  /* load the next rx frame index */
  if(itf->rx_buffer_idx < (itf->rx_buffer_cnt - 1))
    {
      itf->rx_buffer_idx ++;
    }
  else
    {
      itf->rx_buffer_idx = 0;
    }

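  /* if the frame is good for reading (the kernel may OR extra flags such as
   * TP_STATUS_COPY into tp_status, so only the USER bit is tested) */
  if((header->tp_status & TP_STATUS_USER) && header->tp_snaplen)
    {
      /* give to the caller the payload address and size */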
      msg->data = base + header->tp_net;
      msg->data_len = header->tp_snaplen; 
#if defined(COOKED_PACKET) // hope that header->tp_net - sizeof(ether_header_t) == header->tp_mac
      assert((header->tp_net - sizeof(ether_header_t)) == header->tp_mac);
      msg->data_len -= sizeof(ether_header_t);
#endif
      return 0;
    }
  else
    {
      fprintf(stderr, "capture failed : revents %x, status %d, snap_len %d\n", itf->pollfd[1].revents, header->tp_status, header->tp_snaplen);
      header->tp_status = TP_STATUS_KERNEL;
      return -1;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_rx_release(ethernet_t * itf, ethernet_msg_t * msg)
{
  /* check parameters */
  assert(itf && msg);

#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ethernet_rx_release\n");
    }
#endif

  if(!msg->data || !msg->data_len)
    {
      fprintf(stderr, "Rx message has to be requested before it is released.\n");
      return -1;
    }

  /* find the index of the frame associated to this data pointer */
  int i = (msg->data - itf->rx_buffer_addr) / itf->rx_packet_req.tp_frame_size;
  if((0 <= i) &&  ((unsigned)i < itf->rx_buffer_cnt))
    {
      void * base = itf->rx_buffer_addr + i * itf->rx_packet_req.tp_frame_size;
      volatile struct tpacket2_hdr * header = (struct tpacket2_hdr *)base;
      header->tp_status = TP_STATUS_KERNEL;
      msg->data = 0;
      msg->data_len = 0;
      return 0;
    }
  else
    {
      fprintf(stderr, "Rx release addr out of range (%p).\n", msg->data);
      return -1;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_tx_request(ethernet_t * itf, ethernet_msg_t * msg)
{
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ethernet_tx_request\n");
    }
#endif

  /* check parameters */
  assert(itf && msg);

  if(msg->data || msg->data_len)
    {
      fprintf(stderr, "Tx message has to be released before it is requested again.\n");
      return -1;
    }
 
  /* find the next available tx frame (busy-waits until one is free) */
  void * base;
  volatile struct tpacket2_hdr * header;
  do
    {
      /* get base address of the current tx frame */
      base = itf->tx_buffer_addr + itf->tx_buffer_idx * itf->tx_packet_req.tp_frame_size;
      header = (struct tpacket2_hdr *)base;

      /* load the next tx frame index */
      if(itf->tx_buffer_idx < (itf->tx_buffer_cnt - 1))
        {
          itf->tx_buffer_idx ++;
        }
      else
        {
          itf->tx_buffer_idx = 0;
        }

    } while(header->tp_status != TP_STATUS_AVAILABLE);

  /* give to the caller the payload address and size */
  msg->data = base + itf->tx_buffer_payload_offset;
  msg->data_len = itf->tx_buffer_payload_max_size;

#if !defined(NDEBUG)
  if(itf->debug)
    {
      ethernet_debug_frame(base);
    }
#endif

  return 0;
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
int ethernet_tx_release(ethernet_t * itf, ethernet_msg_t * msg)
{
#if !defined(NDEBUG)
  if(itf->debug)
    {
      fprintf(stdout, "ethernet_tx_release\n");
    }
#endif

  /* check parameters */
  assert(itf && msg);

  if(!msg->data || !msg->data_len)
    {
      fprintf(stderr, "Tx message has to be requested before it is released.\n");
      return -1;
    }

  if(itf->tx_buffer_payload_max_size < msg->data_len)
    {
      fprintf(stderr, "Tx request cannot be greater than %u bytes (requested %d).\n", itf->tx_buffer_payload_max_size, msg->data_len);
      return -1;
    }
 
  /* Ethernet payloads are at least 46 bytes: zero-pad shorter ones */
  if(msg->data_len < 46)
    {
      memset(msg->data + msg->data_len, 0, 46 - msg->data_len);
      msg->data_len = 46;
    }

  /* find the index of the frame associated to this data pointer */
  int i = (msg->data - itf->tx_buffer_addr) / itf->tx_packet_req.tp_frame_size;
  if((i < 0) || (itf->tx_buffer_cnt <= (unsigned)i))
    {
      fprintf(stderr, "Tx release addr out of range (%p).\n", msg->data);
      return -1;
    }

  /* get base address of this tx frame */
  void * base = itf->tx_buffer_addr + i * itf->tx_packet_req.tp_frame_size;
  volatile struct tpacket2_hdr * header = (struct tpacket2_hdr *)base;

#if defined(PATCHED_PACKET)
  /* update packet offset */
  header->tp_net = itf->tx_buffer_payload_offset;
#endif /* defined(PATCHED_PACKET) */
  /* update packet len */
  header->tp_len = msg->data_len;
#if defined(COOKED_PACKET)
  header->tp_len += sizeof(ether_header_t);
#endif
  /* hand the frame to the kernel: SEND_REQUEST marks it ready for xmit */
  header->tp_status = TP_STATUS_SEND_REQUEST;

  /* ask the kernel to send data */
  ssize_t err;
  err = sendto(itf->sock_fd, NULL, 0, 0, (const struct sockaddr *)&itf->remote_addr, sizeof(itf->remote_addr));
  if(err < 0) 
    {
      perror("sendto failed");
      fprintf(stderr, "errno = %d\n", errno);
      return -1;
    }
  else if(err == 0 ) 
    {
      /* nothing to do */
      fprintf(stderr, "Kernel has nothing to send.\n");
      return -1;
    }

  /* reset the tp_len : optional */
  header->tp_len = 0;

  /* release the buffer */
  msg->data = 0;
  msg->data_len = 0;

  return 0;
}

/******************************************************************************
 * Sets the debug mode                                                        *
 *****************************************************************************/
int ethernet_set_debug(ethernet_t * itf, int debug)
{
#if !defined(NDEBUG)
  /* check parameters */
  assert(itf);
 
  int old_debug = itf->debug;

  itf->debug = debug;
 
  return old_debug;
#else
  return 0;
#endif
}

/******************************************************************************
 * Retrieves the MAC address                                                  *
 *****************************************************************************/
void ethernet_fill_with_mac_addr(ethernet_t * itf, uint8_t * addr, unsigned addr_len)
{
  /* check parameters */
  assert(itf && addr);
 
  unsigned i;
  for(i = 0; (i < itf->local_addr.sll_halen)  && (i < addr_len); i++)
    {
      addr[i] = itf->local_addr.sll_addr[i];
    }
  for(; i < addr_len; i++)
    {
      addr[i] = 0;
    }
}

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
#if !defined(NDEBUG)
static void ethernet_debug_frame(const void * base)
{
  fprintf(stdout, "buffer base addr %p\n", base);

  const struct tpacket2_hdr * header = (const struct tpacket2_hdr *)base;
  fprintf(stdout, "tpacket2_header :\n");
  fprintf(stdout, " tp_status   : 0x%02x\n", header->tp_status);
  fprintf(stdout, " tp_len      : %u\n", header->tp_len);
  fprintf(stdout, " tp_snaplen  : %u\n", header->tp_snaplen);
  fprintf(stdout, " tp_mac      : %d\n", header->tp_mac);
  fprintf(stdout, " tp_net      : %d\n", header->tp_net);
  fprintf(stdout, " tp_sec      : %u\n", header->tp_sec);
  fprintf(stdout, " tp_nsec     : %u\n", header->tp_nsec);
  fprintf(stdout, " tp_vlan_tci : 0x%04x\n", header->tp_vlan_tci);

  const struct sockaddr_ll * sll = (const struct sockaddr_ll *)(base + TPACKET_ALIGN(sizeof(struct tpacket2_hdr)));
  fprintf(stdout, "sockaddr_ll :\n");
  fprintf(stdout, " sll_family   : 0x%02x\n", sll->sll_family);
  fprintf(stdout, " sll_protocol : 0x%04x\n", sll->sll_protocol);
  fprintf(stdout, " sll_ifindex  : %d\n", sll->sll_ifindex);
  fprintf(stdout, " sll_hatype   : %d\n", sll->sll_hatype);
  fprintf(stdout, " sll_pkttype  : %d\n", sll->sll_pkttype);
  fprintf(stdout, " sll_halen    : %d\n", sll->sll_halen);
  fprintf(stdout, " sll_addr     : %02x:%02x:%02x:%02x:%02x:%02x\n",
          sll->sll_addr[0], sll->sll_addr[1], sll->sll_addr[2],
          sll->sll_addr[3], sll->sll_addr[4], sll->sll_addr[5]);
}
#endif

/******************************************************************************
 *                                                                            *
 *                                                                            *
 *****************************************************************************/
#if !defined(NDEBUG)
static void ethernet_debug_packet_req(const struct tpacket_req * rx_packet_req, const struct tpacket_req * tx_packet_req)
{
  fprintf(stdout, "Pagesize = %ld\n", sysconf(_SC_PAGESIZE));
  fprintf(stdout, "TPACKET_ALIGNMENT = %d\n", TPACKET_ALIGNMENT);
  fprintf(stdout, "TPACKET2_HDRLEN = %zu\n", (size_t)TPACKET2_HDRLEN);
  fprintf(stdout, "sizeof(struct sockaddr_ll) = %zu\n", sizeof(struct sockaddr_ll));
  fprintf(stdout, "Rx packet req :\n");
  fprintf(stdout, " tp_block_size = %u\n", rx_packet_req->tp_block_size);
  fprintf(stdout, " tp_block_nr   = %u\n", rx_packet_req->tp_block_nr);
  fprintf(stdout, " tp_frame_size = %u\n", rx_packet_req->tp_frame_size);
  fprintf(stdout, " tp_frame_nr   = %u\n", rx_packet_req->tp_frame_nr);
  fprintf(stdout, "Tx packet req :\n");
  fprintf(stdout, " tp_block_size = %u\n", tx_packet_req->tp_block_size);
  fprintf(stdout, " tp_block_nr   = %u\n", tx_packet_req->tp_block_nr);
  fprintf(stdout, " tp_frame_size = %u\n", tx_packet_req->tp_frame_size);
  fprintf(stdout, " tp_frame_nr   = %u\n", tx_packet_req->tp_frame_nr);
}
#endif
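
To make the ring sizing in ethernet_open() concrete, here is the same
arithmetic as a standalone sketch. next_power_of_two() below is a stand-in
written for this example (the helper actually used above is not shown here),
and struct ring_geometry / compute_ring_geometry() are hypothetical names.

```c
#include <assert.h>

/* Stand-in for the next_power_of_two() helper used in ethernet_open();
 * this version is an assumption, written only for this example. */
static unsigned next_power_of_two(unsigned v)
{
  unsigned p = 1;
  while(p < v)
    {
      p <<= 1;
    }
  return p;
}

/* Hypothetical container mirroring the fields of struct tpacket_req. */
struct ring_geometry
{
  unsigned block_size;
  unsigned block_nr;
  unsigned frame_size;
  unsigned frame_nr;
};

/* Same arithmetic as ethernet_open(): a single block of (pagesize << order)
 * bytes, and frames of next_power_of_two(mtu + 128) bytes. */
static struct ring_geometry compute_ring_geometry(unsigned mtu, unsigned pagesize, unsigned order)
{
  struct ring_geometry g;

  assert(pagesize > 0);

  g.block_size = pagesize << order;
  g.block_nr   = 1;
  g.frame_size = next_power_of_two(mtu + 128);
  g.frame_nr   = (g.block_size / g.frame_size) * g.block_nr;
  return g;
}
```

With an MTU of 1500, 4096-byte pages and order 1 this gives 2048-byte frames,
four per 8192-byte block. With an MTU of 9000 the frame (16384 bytes) no
longer fits in the block and tp_frame_nr comes out as 0, a case that the
divisor check in ethernet_open() rejects.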



* Re: Single socket with TX_RING and RX_RING
  2013-05-16 11:01       ` Ricardo Tubío
  2013-05-16 11:14         ` Daniel Borkmann
  2013-05-16 11:52         ` Phil Sutter
@ 2013-05-20 20:54         ` Paul Chavent
  2013-05-22 19:36           ` Ricardo Tubío
  2 siblings, 1 reply; 16+ messages in thread
From: Paul Chavent @ 2013-05-20 20:54 UTC (permalink / raw)
  To: Ricardo Tubío; +Cc: netdev

On 05/16/2013 01:01 PM, Ricardo Tubío wrote:
> Phil Sutter <phil <at> nwl.cc> writes:
>
>> So you do not call init_ring() twice as one may imply when reading your
>> first mail? Please provide a complete code sample.
>>
>
> Yes, I call it twice. The problem is that if I call it twice with the same
> socket_fd, the second time I call it I get the EBUSY error from kernel. I
> have to use two different sockets (two different socket_fd's, therefore) in
> order to workaround this issue.
>
> The code I use for calling "init_ring" is the one below. If in function
> "init_rings", instead of using two different sockets (rx_socket_fd and
> tx_socket_fd), I use a single socket, I get the EBUSY error from kernel.
>
> Hope this clarifies, Cardo.

Hi.

As stated before, you should call mmap only once.

Paul.
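
For readers following the thread, "call mmap only once" can be sketched
roughly as below. This is an illustration, not code posted in the thread:
struct ring_map, ring_bytes() and map_both_rings() are hypothetical names,
and the sketch assumes both requests are already filled in and that any
PACKET_VERSION selection has been done beforehand. It relies on the kernel
placing the TX ring directly after the RX ring in the single mapping, and
it compares the mmap() result against MAP_FAILED rather than NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/* Hypothetical helper type: where each ring sits inside the one mapping. */
struct ring_map {
    void   *rx;      /* start of the mapping, which is the RX ring */
    void   *tx;      /* TX ring, located right after the RX ring   */
    size_t  rx_len;
    size_t  tx_len;
};

/* Size in bytes of the ring described by one request. */
static size_t ring_bytes(const struct tpacket_req *req)
{
    assert(req != NULL);
    return (size_t)req->tp_block_size * req->tp_block_nr;
}

/*
 * Configure both rings on the same socket, then map them with a single
 * mmap() call. Returns 0 on success, -1 on failure with errno set by the
 * call that failed.
 */
static int map_both_rings(int fd,
                          const struct tpacket_req *rx_req,
                          const struct tpacket_req *tx_req,
                          struct ring_map *out)
{
    /* Both setsockopt() calls come first, before any mmap(). */
    if (setsockopt(fd, SOL_PACKET, PACKET_RX_RING, rx_req, sizeof(*rx_req)) < 0)
        return -1;
    if (setsockopt(fd, SOL_PACKET, PACKET_TX_RING, tx_req, sizeof(*tx_req)) < 0)
        return -1;

    out->rx_len = ring_bytes(rx_req);
    out->tx_len = ring_bytes(tx_req);

    /* One mapping that covers the RX ring followed by the TX ring. */
    void *base = mmap(NULL, out->rx_len + out->tx_len,
                      PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED)
        return -1;

    out->rx = base;
    out->tx = (char *)base + out->rx_len;
    return 0;
}
```

With this layout, calling mmap() between the two setsockopt() calls (as an
init_ring()-style helper invoked twice would do) is what the sketch avoids.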

* Re: Single socket with TX_RING and RX_RING
  2013-05-20 20:54         ` Paul Chavent
@ 2013-05-22 19:36           ` Ricardo Tubío
  0 siblings, 0 replies; 16+ messages in thread
From: Ricardo Tubío @ 2013-05-22 19:36 UTC (permalink / raw)
  To: netdev

Paul Chavent <paul.chavent <at> fnac.net> writes:

> 
> Hi.
> 
> As stated before, you should call mmap only once.
> 
> Paul.
> 


Hi Paul,

I think you should check the return value of each call you make to
"setsockopt()", especially the second one. The first call returns 0, that
is, it was executed correctly. The second one, however, fails with "EBUSY"
(error 16), that is, the ring was not set up.

So, could you modify your calls to "setsockopt()" and check whether each
one returns 0 (success) or fails with EBUSY (error 16)?

Best, Cardo.
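
A possible way to make that check explicit is sketched below. Note that
setsockopt() itself signals failure by returning -1 and storing the error
code in errno; the hypothetical wrapper set_ring() (not part of the code
posted in this thread) converts that into 0 on success or a negative errno
value such as -EBUSY, so the caller can tell the two cases apart.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_packet.h>

#ifndef SOL_PACKET
#define SOL_PACKET 263
#endif

/*
 * Hypothetical wrapper: request one ring (PACKET_RX_RING or PACKET_TX_RING)
 * and report the outcome. Returns 0 on success, or -errno on failure.
 */
static int set_ring(int fd, int type, const struct tpacket_req *req)
{
    assert(req != NULL);

    if (setsockopt(fd, SOL_PACKET, type, req, sizeof(*req)) == 0)
        return 0;

    /* Save errno immediately; later library calls may overwrite it. */
    int err = errno;
    fprintf(stderr, "setsockopt(%s) failed: %s (errno %d)\n",
            type == PACKET_RX_RING ? "PACKET_RX_RING" : "PACKET_TX_RING",
            strerror(err), err);
    return -err;
}
```

Calling set_ring() once per ring and logging both results would show
directly whether the second request succeeds or fails on a given setup.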
