
0042001 /*
0042002 ip_read.c
0042003 
0042004 Copyright 1995 Philip Homburg
0042005 */
As its name suggests, ip_read.c contains code that handles the ip layer of a read request. For example, when a packet destined for a udp file descriptor arrives at an ethernet port, the code follows this path:

eth_arrive() 

ip_eth_arrived()

if (unicast packet)
ip_arrived()
else if (ethernet broadcast packet)
ip_arrived_broadcast()

if (packet must be input routed)
hand off packet to destination ip port
else {
ip_port_arrive()
packet2user()
udp_ip_arrived()
}
ip_read.c contains ip_arrived(), ip_arrived_broadcast(), ip_port_arrive(), and packet2user().

Additionally, ip_read.c contains functions that reassemble fragmented packets.


0042006 
0042007 #include "inet.h"
0042008 #include "buf.h"
0042009 #include "clock.h"
0042010 #include "event.h"
0042011 #include "type.h"
0042012 
0042013 #include "assert.h"
0042014 #include "icmp_lib.h"
0042015 #include "io.h"
0042016 #include "ip.h"
0042017 #include "ip_int.h"
0042018 #include "ipr.h"
0042019 
0042020 THIS_FILE
0042021 
0042022 FORWARD ip_ass_t *find_ass_ent ARGS(( ip_port_t *ip_port, U16_t id,
0042023          int proto, ipaddr_t src, ipaddr_t dst ));
0042024 FORWARD acc_t *merge_frags ARGS(( acc_t *first, acc_t *second ));
0042025 FORWARD int ip_frag_chk ARGS(( acc_t *pack ));
0042026 FORWARD acc_t *reassemble ARGS(( ip_port_t *ip_port, acc_t *pack,
0042027          ip_hdr_t *ip_hdr ));
0042028 FORWARD int broadcast_dst ARGS(( ip_port_t *ip_port, ipaddr_t dest ));
0042029 FORWARD void packet2user ARGS(( ip_fd_t *ip_fd, acc_t *pack,
0042030          time_t exp_time ));
0042031 
0042032 PUBLIC int ip_read (fd, count)
0042033 int fd;
0042034 size_t count;
ip_read()

If there are unexpired packets in the ip file descriptor fd's read queue, ip_read(fd, count) passes count (ip_read()'s second parameter) bytes off to the next-higher layer by calling packet2user(). If the packets in the file descriptor's read queue have expired, ip_read() discards the packets.

ip_read() is (indirectly) called by sr_rwio() when a process reads an ip device file (e.g., /dev/ip).

In the udp code, ip_read() is called by read_ip_packets() during the initialization of the udp code. Normally, ip_read() is not called by the udp code after the initialization.
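
The queue handling that ip_read() performs can be modelled in a few lines. This is a minimal sketch and not the Minix code: struct pkt, struct rd_queue and model_ip_read() are hypothetical names, and "delivery" is reduced to returning the packet to the caller.

```c
#include <stddef.h>

/* Hypothetical stand-ins for acc_t and ip_fd_t; only the fields needed here. */
struct pkt {
    struct pkt *next;       /* plays the role of acc_ext_link */
    int freed;              /* set when the model discards the packet */
};

struct rd_queue {
    struct pkt *head;       /* plays the role of if_rdbuf_head */
    long exp_time;          /* plays the role of if_exp_time, in clock ticks */
};

/* Returns the packet to deliver, or NULL if the read has to be suspended.
 * Expired packets are discarded, as ip_read() does on lines 42057-42062. */
struct pkt *model_ip_read(struct rd_queue *q, long now)
{
    struct pkt *p;

    if (q->head == NULL)
        return NULL;                /* nothing queued: suspend */

    if (now <= q->exp_time) {
        p = q->head;                /* unexpired: hand off the first packet */
        q->head = p->next;
        p->next = NULL;
        return p;
    }

    while (q->head != NULL) {       /* expired: drop the whole queue */
        p = q->head;
        q->head = p->next;
        p->freed = 1;
    }
    return NULL;
}
```

A NULL result corresponds to the NW_SUSPEND return on line 42064; a non-NULL result corresponds to the packet2user() call followed by NW_OK.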


0042035 {
0042036          ip_fd_t *ip_fd;
0042037          acc_t *pack;
0042038 
0042039          ip_fd= &ip_fd_table[fd];
Before an ip file descriptor may be used (for example, before the ip file descriptor may be read from), the file descriptor must be configured. Once configured, the IFF_OPTSET flag in the if_flags field of the ip file descriptor is set.


0042040          if (!(ip_fd->if_flags & IFF_OPTSET))
0042041                   return (*ip_fd->if_put_userdata)(ip_fd->if_srfd, EBADMODE,
0042042                            (acc_t *)0, FALSE);
The if_srfd field points to the file descriptor of the higher layer. For example, if the udp code acquired the ip file descriptor, if_srfd points to a udp file descriptor.

If the udp code opened the ip file descriptor, it expects the ip file descriptor to have been configured. If it was not, the network service will terminate.


0042043 
0042044          ip_fd->if_rd_count= count;
0042045 
0042046          ip_fd->if_flags |= IFF_READ_IP;
The IFF_READ_IP flag indicates that a read operation is in progress.

When a udp port is being initialized, the udp port opens up an ip file descriptor. Immediately after initialization, the udp code calls ip_read(), which sets the IFF_READ_IP flag. This flag will never be cleared while the ip file descriptor is open. What this means is that any packet destined for a udp file descriptor can always be immediately handed off to the udp code.


0042047          if (ip_fd->if_rdbuf_head)
If data arrived from the eth/psip port destined for the ip file descriptor, the if_rdbuf_head queue for the ip file descriptor will be non-null.


0042048          {
0042049                   if (get_time() <= ip_fd->if_exp_time)
Verify that the timer for the ip file descriptor hasn't expired. This timer is set if the ip file descriptor receives a packet while it is not being read (see line 42375).


get_time()


get_time() returns the number of clock ticks since reboot.

Several of the clients (eth, arp, ip, tcp, and udp) use get_time() to determine an appropriate timeout value for a given operation. For example, the arp code calls get_time() to determine an appropriate amount of time to wait for a response from an arp request before giving up.


0042050                   {
0042051                            pack= ip_fd->if_rdbuf_head;
0042052                            ip_fd->if_rdbuf_head= pack->acc_ext_link;
0042053                            packet2user (ip_fd, pack, ip_fd->if_exp_time);
packet2user() / ip

packet2user() attempts to pass a packet off to a higher layer (e.g., udp). If the ip file descriptor is not currently being read (i.e., the IFF_READ_IP flag in the if_flags field of the ip file descriptor is not set), packet2user() puts the packet at the end of its read queue and returns. Note that if the udp code opened the ip file descriptor, however, the udp code set the IFF_READ_IP flag during its initialization and that this flag remains set until the ip file descriptor is closed.


0042054                            assert(!(ip_fd->if_flags & IFF_READ_IP));
0042055                            return NW_OK;
0042056                   }
0042057                   while (ip_fd->if_rdbuf_head)
The packets in the read queue of the ip file descriptor have expired. Discard them.


0042058                   {
0042059                            pack= ip_fd->if_rdbuf_head;
0042060                            ip_fd->if_rdbuf_head= pack->acc_ext_link;
0042061                            bf_afree(pack);
bf_afree()

After a chain of accessors is no longer needed, the chain (and not simply the single accessor passed as the parameter) can be freed by calling bf_afree(). However, if either acc_linkC or buf_linkC of one of the accessors in the linked list is not equal to one (1), the entire chain will not be freed. For example, if bf_afree(acc1) is called for the following chain:



Then the resulting chain will be:



bf_afree() returns acc1 (accessors[63]) to acc_freelist (recall that acc_freelist is the linked list of acc_t's without an associated buffer). However, buffers512[127] cannot be freed because acc2 (accessors[64]) still references it.

bf_afree() is called after an accessor's associated data is no longer needed (for example, after a packet has been sent off by the ethernet driver).
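
The reference counting described above can be modelled as follows. This is a simplified sketch under the assumption that the link counts behave as the text describes; struct model_buf, struct model_acc and model_afree() are hypothetical names, and the free lists are left out.

```c
#include <stddef.h>

/* Hypothetical, simplified versions of the buffer and accessor types. */
struct model_buf {
    int linkC;                  /* number of accessors using this buffer */
};

struct model_acc {
    int linkC;                  /* number of references to this accessor */
    struct model_buf *buffer;
    struct model_acc *next;
};

/* Walks the chain, dropping one reference from each accessor, and stops at
 * the first accessor that is still referenced elsewhere.
 * Returns how many buffers became free. */
int model_afree(struct model_acc *acc)
{
    int released = 0;

    while (acc != NULL) {
        struct model_acc *next = acc->next;

        if (--acc->linkC > 0)
            break;              /* accessor still in use: leave the rest alone */
        if (--acc->buffer->linkC == 0)
            released++;         /* last accessor gone: buffer can be reused */
        acc = next;
    }
    return released;
}
```

With two accessors sharing one buffer, freeing the first accessor releases no buffer; only freeing the second one does.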


0042062                   }
0042063          }
0042064          return NW_SUSPEND;
0042065 }
0042066 
0042067 PRIVATE acc_t *reassemble (ip_port, pack, pack_hdr)
0042068 ip_port_t *ip_port;
0042069 acc_t *pack;
0042070 ip_hdr_t *pack_hdr;
reassemble()

reassemble() reassembles a packet from its fragments. If not all of the fragments have been received, reassemble() adds the fragment to ip_ass_table[]; once all of the fragments have been received, it reassembles the original ip packet from them. For example, suppose an ip packet carrying 3744 bytes of data is broken up in transit into 3 fragments carrying 1480 bytes, 1480 bytes, and 784 bytes. The corresponding fragment offsets are 0, 185, and 370 respectively (the "fragment offset" is the actual byte offset divided by 8). reassemble() will reassemble the original ip packet from the fragments it receives.
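
The arithmetic behind those offsets can be checked with a short sketch. plan_fragments() and struct frag_info are hypothetical names used only for illustration; the code describes how a sender or router might split the data, not anything in ip_read.c.

```c
#include <stddef.h>

struct frag_info {
    unsigned length;        /* data bytes carried by the fragment */
    unsigned offset_units;  /* value of the fragment offset field (bytes / 8) */
    int more_frags;         /* 1 if further fragments follow */
};

/* Splits data_len bytes into fragments of at most max_data bytes each.
 * max_data should be a multiple of 8 so that every offset is exact.
 * Returns the number of fragments written to out (at most max_out). */
size_t plan_fragments(unsigned data_len, unsigned max_data,
                      struct frag_info *out, size_t max_out)
{
    size_t n = 0;
    unsigned done = 0;

    while (done < data_len && n < max_out) {
        unsigned len = data_len - done;

        if (len > max_data)
            len = max_data;
        out[n].length = len;
        out[n].offset_units = done / 8;
        out[n].more_frags = (done + len < data_len);
        done += len;
        n++;
    }
    return n;
}
```

Calling plan_fragments(3744, 1480, ...) yields data lengths of 1480, 1480 and 784 bytes at offsets 0, 185 and 370.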



reassemble() is called by ip_port_arrive() and in turn calls find_ass_ent() and merge_frags().



A return value of NULL indicates that the packet to which the fragment belongs was not reassembled in its entirety or that the reassembly time exceeded the maximum time allowed (255 seconds in Minix).



0042071 {
An example is a good way to see how reassemble() works.

Let's say that an ip packet is broken up into four fragments: fragments 1, 2, 3, and 4. Let's also say that the fragments arrive in the order 2 - 4 - 1 - 3.

When fragment 2 arrives, reassemble() on line 42090 adds fragment 2 to the head of the fragment linked list of an ip_ass_table[] element and returns NULL, indicating that the packet has not been reassembled.



When fragment 4 arrives, reassemble() is called again and on line 42123 calls merge_frags(), which links the two fragments together by the acc_ext_link field. Note that merge_frags() does not actually merge the fragments here since 2 and 4 are not consecutive fragments.



When fragment 1 arrives, reassemble() is called again and, on line 42118, reassemble() calls merge_frags(). This time, merge_frags() merges fragments 1 and 2.



When fragment 3 arrives, reassemble() is called again and, on line 42123, merge_frags() merges the combined fragment 1-2 with fragment 3.



Then, on line 42125, merge_frags() is called again to merge the combined fragment 1-2-3 with fragment 4.



At the end of this last call, reassemble() returns the reassembled packet.
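
The walkthrough above (fragment 2, then 4, then 1, then 3) can be replayed with a small model. This is a sketch under simplifying assumptions, not the Minix algorithm: it tracks byte ranges in an array instead of accessors in a linked list, it inserts before merging, and it ignores duplicate or overlapping fragments. All names are hypothetical.

```c
#include <stddef.h>

#define MAX_PIECES 8

/* One contiguous run of received data: bytes [start, end) of the original. */
struct piece {
    unsigned start, end;
    int more;               /* more-fragments flag of the last fragment in the run */
};

struct reass {
    struct piece p[MAX_PIECES];
    size_t n;               /* number of runs currently held, sorted by start */
};

/* Joins run i with run i + 1 and closes the gap in the array. */
static void join(struct reass *r, size_t i)
{
    size_t j;

    r->p[i].end = r->p[i + 1].end;
    r->p[i].more = r->p[i + 1].more;
    for (j = i + 1; j + 1 < r->n; j++)
        r->p[j] = r->p[j + 1];
    r->n--;
}

/* Adds one fragment. Returns 1 once the data is complete, 0 otherwise. */
int model_reassemble(struct reass *r, unsigned start, unsigned len, int more)
{
    size_t i, pos = 0;

    if (r->n == MAX_PIECES)
        return 0;                       /* model limitation: fragment dropped */

    /* Find the first run that starts after the new fragment. */
    while (pos < r->n && r->p[pos].start <= start)
        pos++;

    /* Insert the fragment as its own run at position pos. */
    for (i = r->n; i > pos; i--)
        r->p[i] = r->p[i - 1];
    r->p[pos].start = start;
    r->p[pos].end = start + len;
    r->p[pos].more = more;
    r->n++;

    /* Merge with the run before it, then with the run after it,
     * but only when the runs are adjacent. */
    if (pos > 0 && r->p[pos - 1].end == r->p[pos].start) {
        join(r, pos - 1);
        pos--;
    }
    if (pos + 1 < r->n && r->p[pos].end == r->p[pos + 1].start)
        join(r, pos);

    /* Complete: a single run that starts at 0 and has no fragments pending. */
    return r->n == 1 && r->p[0].start == 0 && !r->p[0].more;
}
```

With four 1000-byte fragments, the model holds two runs after fragments 2 and 4, still two runs after fragment 1 (runs 1-2 and 4), and reports completion only when fragment 3 closes the gap.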


0042072          ip_ass_t *ass_ent;
0042073          size_t pack_hdr_len, pack_data_len, pack_offset, tmp_offset;
0042074          u16_t pack_flags_fragoff;
0042075          acc_t *prev_acc, *curr_acc, *next_acc, *head_acc, *tmp_acc;
0042076          ip_hdr_t *tmp_hdr;
0042077          time_t first_time;
0042078 
0042079          ass_ent= find_ass_ent (ip_port, pack_hdr->ih_id,
0042080                   pack_hdr->ih_proto, pack_hdr->ih_src, pack_hdr->ih_dst);
find_ass_ent() / ip_ass_table[]

If an ip packet is fragmented, ip_ass_table[] (the ip assemble table) holds the fragments until they are reassembled by reassemble(). Each of the 3 elements (which is an oddly small number) of ip_ass_table[] corresponds to a packet that has been fragmented and is of type ip_ass_t (see below). find_ass_ent() searches through ip_ass_table[] for fragments of the same packet and adds the fragment to this packet if found and starts a new packet otherwise. If ip_ass_table[] is full, the oldest fragmented packet is dropped and replaced by the new fragmented packet and an icmp packet is sent to the source of the dropped packet.

typedef struct ip_ass

{
acc_t *ia_frags;
int ia_min_ttl;
ip_port_t *ia_port;
time_t ia_first_time;
ipaddr_t ia_srcaddr, ia_dstaddr;
int ia_proto, ia_id;
} ip_ass_t;
acc_t *ia_frags: The first fragment in the linked list of fragments. These accessors hold the data contained in the fragments and are linked by the accessors' acc_ext_link field.

int ia_min_ttl: Set to IP_MAX_TTL (#define'd as 255 in in.h). This value is in seconds and is the maximum time that a fragmented packet may be in ip_ass_table[] before the source is sent an icmp packet.

ip_port_t *ia_port: The ip port on which the packet arrived.

time_t ia_first_time: The time at which the first fragment of the packet is added.

ipaddr_t ia_srcaddr, ia_dstaddr: The source and destination ip address of the fragment.

int ia_proto: The protocol of the packet to which the fragment belongs. For example, if the packet is a udp packet, ia_proto will be 17. If the packet is a tcp packet, ia_proto will be 6.

ia_id: The value of the ih_id field for the ip header of the first packet sent out is determined by ip_init() and is equal to the number of clock ticks since reboot (i.e., the value returned by get_time) and is incremented for each packet sent out. This value is used to combine fragments at the receiving end if fragmentation has occurred. More specifically, if a packet is fragmented during transit, ia_id will be the same for all the fragments.
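
The lookup that find_ass_ent() performs can be sketched as below. This is an assumption-laden model based only on the description above: struct ass_key, struct model_ass and model_find_ass_ent() are hypothetical, the ip port is left out of the key, and the icmp message sent for a dropped packet is reduced to an "evicted" flag.

```c
#include <stddef.h>

#define ASS_NR 3    /* the text above says ip_ass_table[] has 3 elements */

struct ass_key {
    unsigned id;            /* ih_id */
    int proto;              /* ih_proto */
    unsigned long src, dst; /* ih_src, ih_dst */
};

struct model_ass {
    int in_use;
    long first_time;        /* tick count when the entry was claimed */
    struct ass_key key;
};

static int same_key(const struct ass_key *a, const struct ass_key *b)
{
    return a->id == b->id && a->proto == b->proto &&
           a->src == b->src && a->dst == b->dst;
}

/* Returns the entry for key, claiming a free slot or, if the table is full,
 * replacing the oldest entry. *evicted is set to 1 in the latter case. */
struct model_ass *model_find_ass_ent(struct model_ass *tab,
                                     const struct ass_key *key,
                                     long now, int *evicted)
{
    struct model_ass *free_ent = NULL, *oldest = &tab[0];
    size_t i;

    *evicted = 0;
    for (i = 0; i < ASS_NR; i++) {
        if (tab[i].in_use && same_key(&tab[i].key, key))
            return &tab[i];             /* fragments of this packet exist */
        if (!tab[i].in_use) {
            if (free_ent == NULL)
                free_ent = &tab[i];
        } else if (tab[i].first_time < oldest->first_time) {
            oldest = &tab[i];
        }
    }
    if (free_ent == NULL) {             /* table full: drop the oldest packet */
        free_ent = oldest;
        *evicted = 1;
    }
    free_ent->in_use = 1;
    free_ent->first_time = now;
    free_ent->key = *key;
    return free_ent;
}
```

The oldest pointer is only consulted when no free slot was found, in which case every entry, including tab[0], is in use.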



0042081 
find_ass_ent() on the previous line found either a fragmented packet to which this fragment belonged or it claimed a slot in the ip_ass_table[] in which to start collecting a new fragmented packet.

Lines 42082 and 42085 calculate the byte offset of the fragment while lines 42083 and 42084 calculate the length of the fragment's data (which does not include the header).


ip_hdr_t


struct ip_hdr_t is the structure of an ip header. "ih" (e.g., ih_src, ih_dst) stands for "Ip Header".

ip_hdr_t is declared in /include/net/gen/ip_hdr.h:

typedef struct ip_hdr

{
u8_t ih_vers_ihl, ih_tos;
u16_t ih_length, ih_id, ih_flags_fragoff;
u8_t ih_ttl, ih_proto;
u16_t ih_hdr_chk;
ipaddr_t ih_src, ih_dst;
} ip_hdr_t;

ih_vers_ihl: The lower 4 bits hold the length of the header, including any options, in units of 4 bytes (i.e., the actual length in bytes is 4 times the value stored in these bits). An example of an option is a list of routers that a packet should follow to its destination.

The upper four bits is the version number (e.g., IPv4).


ih_tos: tos stands for "Type Of Service" and is the priority of the ip packet. A value of zero is the lowest priority. Both UDP and TCP have a default TOS of zero.

#define TCP_DEF_TOS 0
#define UDP_TOS 0


ih_length: The length of the entire ip packet, including the ip header.


ih_id: The value of ih_id for the first packet sent out is determined by ip_init() and is equal to the number of clock ticks since reboot (i.e., the value returned by get_time) and is incremented for each packet sent out. This value is used to combine fragments at the receiving end if fragmentation has occurred.


ih_flags_fragoff: ih_flags_fragoff is a combination of flags and a (possible) fragmentation offset ("fragoff").

If the packet should not be fragmented, the IH_DONT_FRAG flag is set in ih_flags_fragoff. If there are additional fragments after this one (e.g., this is the 3rd fragment of 4 fragments), the IH_MORE_FRAGS flag is set.

If the packet is indeed just a fragment of a packet, the fragment offset bits give the starting position of the fragment's data within the original ip packet's data, in 8-byte units. For example, if an ip packet with 2000 bytes of data (not including the ip header) is broken up into two fragments of 1496 and 504 bytes, the first fragment has a byte offset of 0 and the second fragment has a byte offset of 1496, so the fragment offset stored in the second fragment's ih_flags_fragoff is 187 (1496 / 8 = 187).


ih_ttl: "Time to live" for the packet. As a packet is routed to the destination, each router decrements the packet's ttl. When the ttl reaches 0, the router discards the packet and sends an "icmp time exceeded" packet to the source. The ttl is designed to prevent packets that can't reach their destination from indefinitely bouncing around between routers. UDP's default TTL is 30:

#define UDP_TTL 30

Note that the Minix code also uses this value as a timeout value (in seconds). This code was written before the ttl field was redefined to be strictly a hop count. The original IP RFC defines the ttl field as the time to live in seconds.


ih_proto: The protocol of the ip packet. For example, if the packet is a udp packet, ih_proto will be 17. If the packet is a tcp packet, ih_proto will be 6.


ih_hdr_chk: Checksum for the header.


ih_src, ih_dst: Source and destination ip address of the ip packet.


IP HEADER (as given by RFC 791)


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



0042082          pack_flags_fragoff= ntohs(pack_hdr->ih_flags_fragoff);
htons() / ntohs() / htonl() / ntohl()

From htons(3):

"htons() converts a 16-bit quantity from host byte order to network byte order."

Different CPU architectures store multi-byte values differently. For example, on a "little-endian" machine (an example of which is the Intel CPU), the value 0x1234 is stored in memory as the byte 0x34 followed by the byte 0x12. However, on a "big-endian" machine, the value 0x1234 is stored in memory as the byte 0x12 followed by the byte 0x34.

It is important that values in a header are sent across a network in a consistent manner independent of the architecture of the sending or receiving system. For this reason, a standard was chosen. The standard chosen was big-endian although it could have just as well been little-endian.

htons() is defined in /include/net/hton.h, as:
#define htons(x) (_tmp=(x), ((_tmp>>8) & 0xff) | ((_tmp<<8) & 0xff00))

ntohs() converts a 16-bit quantity from network byte order to host byte order, the reverse of htons().

htonl() and ntohl() are identical to htons() and ntohs() except that they convert 32-bit quantities instead of 16-bit quantities.

Processes generally supply header information when sending packets. The data in these fields is converted to the network format (i.e., big-endian) by the process before the process copies the data to the network service.
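
The effect of the conversion can be illustrated without relying on the host's byte order. The helpers below are hypothetical and are not part of /include/net/hton.h: swap16() performs the same byte swap as the htons() macro quoted above, while read_be16() and write_be16() access a 16-bit field byte by byte and so give the same result on any host.

```c
#include <stdint.h>

/* Swaps the two bytes of a 16-bit value. */
uint16_t swap16(uint16_t x)
{
    return (uint16_t)(((x >> 8) & 0xff) | ((x << 8) & 0xff00));
}

/* Reads a 16-bit field stored in network (big-endian) byte order. */
uint16_t read_be16(const unsigned char *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Stores a 16-bit value in network (big-endian) byte order. */
void write_be16(unsigned char *p, uint16_t v)
{
    p[0] = (unsigned char)(v >> 8);     /* most significant byte first */
    p[1] = (unsigned char)(v & 0xff);
}
```

Writing 0x1234 with write_be16() always produces the bytes 0x12, 0x34 on the wire, and applying swap16() twice returns the original value.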


0042083          pack_hdr_len= (pack_hdr->ih_vers_ihl & IH_IHL_MASK) * 4;
IH_IHL_MASK is defined in /include/net/gen/ip_hdr.h:

#define IH_IHL_MASK 0xf


0042084          pack_data_len= ntohs(pack_hdr->ih_length)-pack_hdr_len;


0042085          pack_offset= (pack_flags_fragoff & IH_FRAGOFF_MASK)*8;
IH_FRAGOFF_MASK is defined in /include/net/gen/ip_hdr.h:

#define IH_FRAGOFF_MASK 0x1fff
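
Lines 42082-42085 can be tried out on concrete numbers with the sketch below. struct frag_geom and frag_geometry() are hypothetical names; the masks use the values from ip_hdr.h (0xf, 0x1fff, and 0x2000 for IH_MORE_FRAGS), and the inputs are assumed to have been converted to host byte order already.

```c
#include <stdint.h>

#define IH_IHL_MASK      0x0f
#define IH_FRAGOFF_MASK  0x1fff
#define IH_MORE_FRAGS    0x2000

struct frag_geom {
    unsigned hdr_len;   /* header length in bytes */
    unsigned data_len;  /* payload length in bytes */
    unsigned offset;    /* byte offset of the payload in the original data */
    int more;           /* nonzero if more fragments follow */
};

/* length and flags_fragoff are assumed to be in host byte order. */
struct frag_geom frag_geometry(uint8_t vers_ihl, uint16_t length,
                               uint16_t flags_fragoff)
{
    struct frag_geom g;

    g.hdr_len = (vers_ihl & IH_IHL_MASK) * 4u;          /* 32-bit words to bytes */
    g.data_len = length - g.hdr_len;                    /* total minus header */
    g.offset = (flags_fragoff & IH_FRAGOFF_MASK) * 8u;  /* 8-byte units to bytes */
    g.more = (flags_fragoff & IH_MORE_FRAGS) != 0;
    return g;
}
```

For a fragment with ih_vers_ihl 0x45, a total length of 1500 and a fragment offset field of 185 with the more-fragments flag set, this gives a 20-byte header, 1480 bytes of data and a byte offset of 1480.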


0042086          pack->acc_ext_link= NULL;
The fragments (which are stored in accessors) are linked together by their acc_ext_link field. The fragments are eventually reassembled to form a single accessor.


0042087 
0042088          head_acc= ass_ent->ia_frags;
0042089          ass_ent->ia_frags= NULL;
0042090          if (head_acc == NULL)
If head_acc is NULL (i.e., no previous fragments of the same packet were found), the current fragment is placed in ip_ass_table[] and NULL is returned to indicate that the fragmented packet was not reassembled in its entirety.


0042091          {
0042092                   ass_ent->ia_frags= pack;
0042093                   return NULL;
0042094          }
0042095 
0042096          prev_acc= NULL;
0042097          curr_acc= NULL;
0042098          next_acc= head_acc;
0042099 
0042100          while(next_acc)
This while loop finds the appropriate place in the linked list for the fragment based on its fragmentation offset. Note that this loop does not insert the fragment into the linked list. merge_frags() does that later.


0042101          {
0042102                   tmp_hdr= (ip_hdr_t *)ptr2acc_data(next_acc);
Each fragment has an ip header.


ptr2acc_data()


The macro ptr2acc_data is #define'd in inet/generic/buf.h as:

#define ptr2acc_data(/* acc_t * */ a) (bf_temporary_acc=(a), \
(&bf_temporary_acc->acc_buffer->buf_data_p[bf_temporary_acc-> \
acc_offset]))

ptr2acc_data() simply returns a pointer to the actual data within an accessor.

ptr2acc_data() is usually called so that the fields of a header (e.g., ip header) can be analyzed.
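
Written as a function, the macro amounts to the following. This is a sketch with hypothetical, cut-down structures that contain only the fields the macro touches; the real acc_t and buf_t have more members, and the real macro goes through the global bf_temporary_acc so that its argument is evaluated only once.

```c
#include <stddef.h>

/* Hypothetical minimal versions of the buffer and accessor structures. */
struct model_buf {
    unsigned char *buf_data_p;      /* start of the buffer's storage */
};

struct model_acc {
    struct model_buf *acc_buffer;   /* buffer this accessor refers to */
    size_t acc_offset;              /* where this accessor's data begins */
    size_t acc_length;              /* how many bytes belong to it */
};

/* Returns a pointer to the first byte of the accessor's data. */
unsigned char *model_ptr2acc_data(struct model_acc *a)
{
    return &a->acc_buffer->buf_data_p[a->acc_offset];
}
```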


0042103                   tmp_offset= (ntohs(tmp_hdr->ih_flags_fragoff) &
0042104                            IH_FRAGOFF_MASK)*8;


0042105 
0042106                   if (pack_offset < tmp_offset)
0042107                            break;
0042108 
0042109                   prev_acc= curr_acc;
0042110                   curr_acc= next_acc;
0042111                   next_acc= next_acc->acc_ext_link;
Advance to the next fragment in the list. Recall that the fragments are linked together by their acc_ext_link fields.


0042112          }
0042113          if (curr_acc == NULL)
If the new fragment belongs at the head of the linked list, attempt to merge it with the fragment that was previously first.


0042114          {
0042115                   assert(prev_acc == NULL);
0042116                   assert(next_acc != NULL);
0042117 
0042118                   curr_acc= merge_frags(pack, next_acc);
merge_frags()

merge_frags(first, second) attempts to merge the fragment first with the fragment second. If the two are not consecutive fragments (for example, if they are the 2nd and 4th fragments of the original packet), merge_frags() simply links them together.


0042119                   head_acc= curr_acc;
0042120          }
0042121          else
Otherwise, the fragment belongs somewhere after the first fragment in the list (in the middle or at the end). Attempt to merge the fragment with the fragment before it and, if there is one, the fragment after it.


0042122          {
0042123                   curr_acc= merge_frags(curr_acc, pack);


0042124                   if (next_acc != NULL)
0042125                            curr_acc= merge_frags(curr_acc, next_acc);


0042126                   if (prev_acc != NULL)
Place the new fragment (which will be the result of a merge of two or more fragments) in the appropriate position in the linked list of fragments.


0042127                            prev_acc->acc_ext_link= curr_acc;
0042128                   else
0042129                            head_acc= curr_acc;
0042130          }
Lines 42131-42135 retrieve the ih_flags_fragoff field from the first fragment in the relevant ip_ass_table[] element so that it can be examined on line 42137.


0042131          ass_ent->ia_frags= head_acc;
0042132 
0042133          pack= ass_ent->ia_frags;
The ia_frags field for an element in ip_ass_table[] points to the first fragment in the linked list of fragments.


0042134          pack_hdr= (ip_hdr_t *)ptr2acc_data(pack);


0042135          pack_flags_fragoff= ntohs(pack_hdr->ih_flags_fragoff);


0042136 
0042137          if (!(pack_flags_fragoff & (IH_FRAGOFF_MASK|IH_MORE_FRAGS)))
0042138                   /* it's now a complete packet */
If the first fragment in the list has a fragment offset of zero and its IH_MORE_FRAGS flag is not set, no fragments are missing and the packet has been completely reassembled. Unless it took too long to reassemble the packet (see lines 42151-42154), return the newly reassembled packet to ip_port_arrive().


IH_FRAGOFF_MASK and IH_MORE_FRAGS are #define'd in /include/net/gen/ip_hdr.h:

#define IH_FRAGOFF_MASK 0x1fff
#define IH_MORE_FRAGS 0x2000
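
The test on line 42137 can be isolated in a small helper. is_complete_packet() is a hypothetical name; the two masks are the values quoted above.

```c
#include <stdint.h>

#define IH_FRAGOFF_MASK 0x1fff
#define IH_MORE_FRAGS   0x2000

/* flags_fragoff is the header field in host byte order. The packet is
 * complete when it starts at offset 0 and no further fragments follow. */
int is_complete_packet(uint16_t flags_fragoff)
{
    return !(flags_fragoff & (IH_FRAGOFF_MASK | IH_MORE_FRAGS));
}
```

A value of 0x2000 (offset 0, more fragments) or 185 (nonzero offset) fails the test. A value of 0x4000 passes, because that bit (the don't-fragment bit in the RFC 791 layout, assumed here to be the value of IH_DONT_FRAG) is not covered by either mask.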


0042139          {
0042140                   first_time= ass_ent->ia_first_time;
The variable first_time is used on line 42151 to determine if it has taken too long (in the case of Minix, 255 seconds) to reassemble the packet.


0042141 
The next two lines free up the ip_ass_table[] element.


0042142                   ass_ent->ia_frags= 0;
0042143                   ass_ent->ia_first_time= 0;
0042144 
0042145                   while (pack->acc_ext_link)
Free up any accessors that are unnecessarily linked to the packet (for example, a fragment that was sent twice).


0042146                   {
0042147                            tmp_acc= pack->acc_ext_link;
0042148                            pack->acc_ext_link= tmp_acc->acc_ext_link;
0042149                            bf_afree(tmp_acc);


0042150                   }
0042151                   if ((ass_ent->ia_min_ttl) * HZ + first_time <
0042152                            get_time())


0042153                            icmp_snd_time_exceeded(ip_port->ip_port, pack,
0042154                                     ICMP_FRAG_REASSEM);
If it took too long to reassemble the packet, send an icmp packet to the source.

ICMP_FRAG_REASSEM is #define'd in /include/net/gen/icmp.h:

# define ICMP_FRAG_REASSEM 1

and is the code number for the "Fragment Reassembly Time Exceeded" message.
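
The timeout test on lines 42151-42152 compares tick counts, so the ttl in seconds has to be scaled by the clock rate first. The sketch below assumes a rate of 60 ticks per second purely for illustration; MODEL_HZ and reassembly_timed_out() are hypothetical names, and the real code uses the system's HZ constant.

```c
#define MODEL_HZ 60     /* assumed ticks per second; the real value is HZ */

/* min_ttl is in seconds; first_time and now are tick counts such as
 * those returned by get_time(). */
int reassembly_timed_out(int min_ttl, long first_time, long now)
{
    return (long)min_ttl * MODEL_HZ + first_time < now;
}
```

With min_ttl at 255 and the assumed rate of 60, a packet whose first fragment arrived at tick 1000 is treated as timed out once get_time() returns more than 16300.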


icmp_time_exceeded()


icmp_time_exceeded() is called by process_data() to handle time-exceeded icmp messages.

If the icmp message is valid, ipr_ttl_exc() is called to either increase the ttl of the appropriate entry in the output routing table or mark the entry as unreachable.


0042155                   else
0042156                            return pack;
The packet was successfully reassembled. Return the packet to ip_port_arrive().


0042157          }
0042158          return NULL;
If the original packet was not completely reassembled, return NULL. It will have to wait for the remaining fragments (unless the process timed out).


0042159 }
0042160 
0042161 PRIVATE acc_t *merge_frags (first, second)
0042162 acc_t *first, *second;
merge_frags()

merge_frags(first, second) attempts to merge the fragment first with the fragment second. If the two are not consecutive fragments (for example, if they are the 2nd and 4th fragments of the original packet), merge_frags() simply links them together.


0042163 {
0042164          ip_hdr_t *first_hdr, *second_hdr;
0042165          size_t first_hdr_size, second_hdr_size, first_datasize, second_datasize,
0042166                   first_offset, second_offset;
0042167          acc_t *cut_second, *tmp_acc;
0042168 
0042169          if (!second)
If there isn't a second fragment with which to merge the first, there isn't much that can be done.


0042170          {
0042171                   first->acc_ext_link= NULL;
0042172                   return first;
0042173          }
0042174 
0042175 assert (first->acc_length >= IP_MIN_HDR_SIZE);
0042176 assert (second->acc_length >= IP_MIN_HDR_SIZE);
0042177 
Lines 42178-42182 determine the first fragment's data length (not including the header) and its fragmentation offset. Lines 42184-42188 determine the second fragment's data length (again, not including the header) and its fragmentation offset. These values are ascertained from the fragments' ip headers.


ip_hdr_t


struct ip_hdr_t is the structure of an ip header. "ih" (e.g., ih_src, ih_dst) stands for "Ip Header".

ip_hdr_t is declared in /include/net/gen/ip_hdr.h:

typedef struct ip_hdr

{
u8_t ih_vers_ihl, ih_tos;
u16_t ih_length, ih_id, ih_flags_fragoff;
u8_t ih_ttl, ih_proto;
u16_t ih_hdr_chk;
ipaddr_t ih_src, ih_dst;
} ip_hdr_t;

ih_vers_ihl: The lower 4 bits hold the length of the header, including options (if there are any), in units of 4 bytes (i.e., the actual length in bytes is 4 times as great as the value stored in the lower 4 bits of ih_vers_ihl). An example of an option is a list of routers that a packet should follow to its destination.

The upper four bits hold the version number (e.g., 4 for IPv4).
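
As a minimal sketch of how the two halves of ih_vers_ihl can be unpacked (the EX_ masks and the ex_ helper names are illustrative assumptions and are not taken from the Minix headers):

```c
#include <stdint.h>

/* Illustrative masks (assumed for this sketch; Minix has its own
 * IH_IHL_MASK and IH_VERSION_MASK definitions). */
#define EX_IHL_MASK     0x0f
#define EX_VERSION_MASK 0x0f

/* Header length in bytes: the low nibble counts 4-byte units. */
static unsigned ex_hdr_len(uint8_t vers_ihl)
{
    return (vers_ihl & EX_IHL_MASK) * 4u;
}

/* IP version: the high nibble. */
static unsigned ex_version(uint8_t vers_ihl)
{
    return (vers_ihl >> 4) & EX_VERSION_MASK;
}
```

For the common first byte 0x45, ex_version() gives 4 and ex_hdr_len() gives 20 (a header without options). A first byte of 0x4f would describe the largest possible header, 60 bytes.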


ih_tos: tos stands for "Type Of Service" and is the priority of the ip packet. A value of zero is the lowest priority. Both UDP and TCP have a default TOS of zero.

#define TCP_DEF_TOS 0
#define UDP_TOS 0


ih_length: The length of the entire ip packet, including the ip header.


ih_id: The value of ih_id for the first packet sent out is determined by ip_init() and is equal to the number of clock ticks since reboot (i.e., the value returned by get_time) and is incremented for each packet sent out. This value is used to combine fragments at the receiving end if fragmentation has occurred.


ih_flags_fragoff: ih_flags_fragoff is a combination of flags and a (possible) fragmentation offset ("fragoff").

If the packet should not be fragmented, the IH_DONT_FRAG flag is set in ih_flags_fragoff. If there are additional fragments (e.g., the 3rd fragment of 4 fragments), the IH_MORE_FRAGS flag is set in ih_flags_fragoff.

If the packet is indeed just a fragment of a packet, the lower 13 bits of this value indicate the starting byte position (in 8 byte increments) of the fragment's data within the original ip packet's data. So for example, if an ip packet of data size 2000 bytes (not including the ip header) is broken up into two fragments of 1496 and 504 bytes, the first fragment would have a fragmentation offset of 0 bytes and the second fragment would have a fragmentation offset of 1496 bytes; the offset stored in the second fragment's ih_flags_fragoff is therefore 187 (1496 / 8 = 187).
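
The arithmetic above can be sketched as follows. The bit positions follow the RFC 791 header layout; the EX_ constants and ex_ helpers are illustrative stand-ins (assumptions) for the Minix IH_DONT_FRAG, IH_MORE_FRAGS and IH_FRAGOFF_MASK definitions, and all values are in host byte order:

```c
#include <stdint.h>

/* Bit layout from RFC 791 (names are assumptions for this sketch). */
#define EX_DONT_FRAG    0x4000
#define EX_MORE_FRAGS   0x2000
#define EX_FRAGOFF_MASK 0x1fff

/* Build a host-byte-order flags/offset word from a byte offset. */
static uint16_t ex_make_fragoff(unsigned offset_bytes, int more_frags)
{
    uint16_t v = (uint16_t)((offset_bytes / 8u) & EX_FRAGOFF_MASK);

    if (more_frags)
        v |= EX_MORE_FRAGS;
    return v;
}

/* Recover the byte offset from a host-byte-order flags/offset word. */
static unsigned ex_frag_offset(uint16_t flags_fragoff)
{
    return (flags_fragoff & EX_FRAGOFF_MASK) * 8u;
}

/* Nonzero if more fragments follow this one. */
static int ex_more_frags(uint16_t flags_fragoff)
{
    return (flags_fragoff & EX_MORE_FRAGS) != 0;
}
```

With the 2000 byte example, the first fragment carries ex_make_fragoff(0, 1) and the second carries ex_make_fragoff(1496, 0), which is 187.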


ih_ttl: "Time to live" for the packet. As a packet is routed to the destination, each router decrements the packet's ttl. When the ttl reaches 0, the router discards the packet and sends an "icmp time exceeded" packet to the source. The ttl is designed to prevent packets that can't reach their destination from indefinitely bouncing around between routers. UDP's default TTL is 30:

#define UDP_TTL 30

Note that the Minix code also uses this value as a timeout value (in seconds). This code was written before the ttl field was redefined to be strictly a hop count. The original IP RFC defines the ttl field as the time to live in seconds.


ih_proto: The protocol of the ip packet. For example, if the packet is a udp packet, ih_proto will be 17. If the packet is a tcp packet, ih_proto will be 6.


ih_hdr_chk: Checksum for the header.


ih_src, ih_dst: Source and destination ip address of the ip packet.


IP HEADER (as given by RFC 791)


    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



0042178          first_hdr= (ip_hdr_t *)ptr2acc_data(first);
ptr2acc_data()

The macro ptr2acc_data is #define'd in inet/generic/buf.h as:

#define ptr2acc_data(/* acc_t * */ a) (bf_temporary_acc=(a), \
(&bf_temporary_acc->acc_buffer->buf_data_p[bf_temporary_acc-> \
acc_offset]))

ptr2acc_data() simply returns a pointer to the actual data within an accessor.

ptr2acc_data() is usually called so that the fields of a header (e.g., ip header) can be analyzed.


0042179          first_offset= (ntohs(first_hdr->ih_flags_fragoff) &
0042180                   IH_FRAGOFF_MASK) * 8;
htons() / ntohs() / htonl() / ntohl()

From htons(3):

"htons() converts a 16-bit quantity from host byte order to network byte order."

Different CPU architectures order the bytes of a multi-byte value differently. For example, on a "little-endian" machine (an example of which is the Intel CPU), the 16-bit value 0x1234 is stored in memory as the byte 0x34 followed by the byte 0x12. However, on a "big-endian" machine, the value 0x1234 is stored in memory as the byte 0x12 followed by the byte 0x34.

It is important that values in a header are sent across a network in a consistent manner independent of the architecture of the sending or receiving system. For this reason, a standard was chosen. The standard chosen was big-endian although it could have just as well been little-endian.

htons() is defined in /include/net/hton.h, as:
#define htons(x) (_tmp=(x), ((_tmp>>8) & 0xff) | ((_tmp<<8) & 0xff00))

ntohs() converts a 16-bit quantity from network byte order to host byte order, the reverse of htons().

htonl() and ntohl() are identical to htons() and ntohs() except that they convert 32-bit quantities instead of 16-bit quantities.

Processes generally supply header information when sending packets. The data in these fields is converted to the network format (i.e., big-endian) by the process before the process copies the data to the network service.
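
The conversions can be sketched in portable C as follows. This is an illustration only; the ex_ names are assumptions and this is not the Minix implementation. ex_swap16() is what the htons() macro above amounts to, while ex_hton16() and ex_ntoh16() give the same result on any host by working on the bytes in memory:

```c
#include <stdint.h>
#include <string.h>

/* Unconditional 16-bit byte swap: what htons() amounts to on a
 * little-endian host. */
static uint16_t ex_swap16(uint16_t x)
{
    return (uint16_t)(((x >> 8) & 0xff) | ((x << 8) & 0xff00));
}

/* Host value -> 16-bit object whose bytes in memory are in network
 * (big-endian) order, whatever the host's own byte order is. */
static uint16_t ex_hton16(uint16_t host)
{
    unsigned char wire[2];
    uint16_t out;

    wire[0] = (unsigned char)(host >> 8);    /* most significant byte first */
    wire[1] = (unsigned char)(host & 0xff);
    memcpy(&out, wire, sizeof out);
    return out;
}

/* The reverse: read the two bytes in memory as a big-endian value. */
static uint16_t ex_ntoh16(uint16_t net)
{
    unsigned char wire[2];

    memcpy(wire, &net, sizeof wire);
    return (uint16_t)((wire[0] << 8) | wire[1]);
}
```

Applying ex_swap16() twice returns the original value, which is why the same operation can serve as both htons() and ntohs() on a little-endian host.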


0042181          first_hdr_size= (first_hdr->ih_vers_ihl & IH_IHL_MASK) * 4;
0042182          first_datasize= ntohs(first_hdr->ih_length) - first_hdr_size;
0042183 
0042184          second_hdr= (ip_hdr_t *)ptr2acc_data(second);
0042185          second_offset= (ntohs(second_hdr->ih_flags_fragoff) &
0042186                   IH_FRAGOFF_MASK) * 8;
second_offset is the fragmentation offset of the second fragment.



0042187          second_hdr_size= (second_hdr->ih_vers_ihl & IH_IHL_MASK) * 4;
0042188          second_datasize= ntohs(second_hdr->ih_length) - second_hdr_size;
0042189 
0042190          assert (first_hdr_size + first_datasize == bf_bufsize(first));
0042191          assert (second_hdr_size + second_datasize == bf_bufsize(second));
0042192          assert (second_offset >= first_offset);
0042193 
0042194          if (second_offset > first_offset+first_datasize)
If a fragment is missing between the two fragments, just link the two fragments together. For example, if the fragmentation offset of the first fragment is 500 bytes and its data length is 500 bytes and the second fragment has a fragmentation offset of 1500 bytes, there is a gap of 500 bytes between the two fragments.






0042195          {
0042196                   DBLOCK(1, printf("ip fragments out of order\n"));
0042197                   first->acc_ext_link= second;
0042198                   return first;
0042199          }
0042200 
0042201          if (second_offset + second_datasize <= first_offset +
0042202                   first_datasize)
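If the second fragment's data ends at or before the end of the first fragment's data, the second fragment lies entirely within the range already covered by the first fragment (line 42192 asserts that second_offset is not less than first_offset), so it contributes no new data. merge_frags() does not attempt a merge in this case (see the comment on line 42204); it frees first and returns second.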


0042203          {
0042204                   /* May cause problems if we try to merge. */
0042205                   bf_afree(first);
bf_afree()

After a chain of accessors is no longer needed, the chain (and not simply the single accessor passed as the parameter) can be freed by calling bf_afree(). However, if either acc_linkC or buf_linkC of one of the accessors in the linked list is not equal to one (1), the chain will not be freed in its entirety. For example, if bf_afree(acc1) is called for the following chain:



Then the resulting chain will be:



bf_afree() returns acc1 (accessors[63]) to acc_freelist (recall that acc_freelist is the linked list of acc_t's without an associated buffer). However, buffers512[127] cannot be freed because acc2 (accessors[64]) still references it.

bf_afree() is called after an accessor's associated data is no longer needed (for example, after a packet has been sent off by the ethernet driver).


0042206                   return second;
0042207          }
0042208 
0042209          if (!(second_hdr->ih_flags_fragoff & HTONS(IH_MORE_FRAGS)))
0042210                   first_hdr->ih_flags_fragoff &= ~HTONS(IH_MORE_FRAGS);
Since the first fragment will be merged with the second fragment, turn off the IH_MORE_FRAGS flag in the first fragment if there will be no fragments after the second fragment.



0042211 
The remainder of merge_frags() cuts out the second fragment's header and any overlap between the two fragments (see figure for comment for line 42215) and appends the remainder of the second fragment to the first fragment. In addition to this, merge_frags() also adjusts the ih_length field (the total length of the fragment including the header) of the new, larger first fragment and adjusts this fragment's acc_ext_link field (see line 42224).
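
The offset arithmetic that merge_frags() performs can be sketched on plain numbers, without any accessors. The ex_ types and names below are hypothetical and exist only for this illustration; cut_offset and cut_length correspond to the second and third arguments passed to bf_cut() on lines 42214-42215:

```c
#include <stddef.h>

typedef enum { EX_GAP, EX_COVERED, EX_MERGE } ex_case_t;

typedef struct {
    ex_case_t kind;
    size_t cut_offset;   /* bytes skipped at the front of the second fragment */
    size_t cut_length;   /* bytes of the second fragment that are appended    */
    size_t merged_size;  /* data size of the merged fragment                  */
} ex_plan_t;

/* Offsets and sizes are in bytes; second_offset >= first_offset is
 * assumed, as merge_frags() asserts. */
static ex_plan_t ex_plan_merge(size_t first_offset, size_t first_datasize,
    size_t second_offset, size_t second_datasize, size_t second_hdr_size)
{
    ex_plan_t plan = { EX_MERGE, 0, 0, 0 };
    size_t first_end = first_offset + first_datasize;
    size_t second_end = second_offset + second_datasize;

    if (second_offset > first_end) {
        plan.kind = EX_GAP;         /* a fragment is missing in between */
        return plan;
    }
    if (second_end <= first_end) {
        plan.kind = EX_COVERED;     /* second adds no new bytes */
        return plan;
    }
    /* Skip the header plus the overlap (first_end - second_offset). */
    plan.cut_offset = second_hdr_size + (first_end - second_offset);
    plan.cut_length = second_end - first_end;
    plan.merged_size = first_datasize + plan.cut_length;
    return plan;
}
```

For example, a first fragment covering bytes 0-999 and a second fragment at offset 800 with 700 bytes of data and a 20 byte header overlap by 200 bytes: 220 bytes are skipped, 500 bytes are appended, and the merged fragment holds 1500 bytes of data.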


0042212          second_datasize= second_offset+second_datasize-(first_offset+
0042213                   first_datasize);
0042214          cut_second= bf_cut(second, second_hdr_size + first_offset+
0042215                   first_datasize-second_offset, second_datasize);
Cut out at least the second fragment's header and, if there is some overlap between the two fragments, cut out the length of the overlap in the second fragment's data:






bf_cut()


If a section of a linked list needs to be duplicated, bf_cut(data, offset, length) is called. For example, if a section of length 50 starting at an offset of 75 of the linked list below needs to be duplicated, bf_cut(data, 75, 50) is called:



Note that the original linked list remains unchanged and that acc_linkC for all the accessors in the new linked list is one.

If length (the third parameter) is zero, bf_cut() simply duplicates the first accessor in the linked list but sets acc_length=0 and acc_next=null. In other words, it creates a linked list consisting of a single accessor whose acc_length is 0.

bf_cut() is used in a number of scenarios, including cutting a received ethernet packet to size.




0042216          tmp_acc= second->acc_ext_link;
0042217          bf_afree(second);
0042218          second= tmp_acc;
0042219 
Adjust the first fragment's ih_length field (the total length of the fragment including the header).


0042220          first_datasize += second_datasize;
0042221          first_hdr->ih_length= htons(first_hdr_size + first_datasize);
0042222 
0042223          first= bf_append (first, cut_second);
This is where the actual merging occurs.


bf_append()


bf_append() appends one accessor linked list to another accessor linked list. For example, if the payload of an ethernet packet (1500 bytes) is appended to an ethernet header (14 bytes):



the resulting linked list is as follows:






0042224          first->acc_ext_link= second;
The next fragment in the linked list for the first fragment is the fragment that was originally the next fragment for the second fragment (which no longer exists).


0042225 
0042226 assert (first_hdr_size + first_datasize == bf_bufsize(first));
0042227 
0042228          return first;
The merged fragment is returned.


0042229 }
0042230 
0042231 PRIVATE ip_ass_t *find_ass_ent (ip_port, id, proto, src, dst)
0042232 ip_port_t *ip_port;
0042233 u16_t id;
0042234 ipproto_t proto;
0042235 ipaddr_t src;
0042236 ipaddr_t dst;
find_ass_ent() / ip_ass_table[]

If an ip packet is fragmented, ip_ass_table[] (the ip assemble table) holds the fragments until they are reassembled by reassemble(). Each of the 3 elements (which is an oddly small number) of ip_ass_table[] corresponds to a packet that has been fragmented and is of type ip_ass_t (see below). find_ass_ent() searches through ip_ass_table[] for fragments of the same packet and adds the fragment to this packet if found and starts a new packet otherwise. If ip_ass_table[] is full, the oldest fragmented packet is dropped and replaced by the new fragmented packet and an icmp packet is sent to the source of the dropped packet.

typedef struct ip_ass

{
acc_t *ia_frags;
int ia_min_ttl;
ip_port_t *ia_port;
time_t ia_first_time;
ipaddr_t ia_srcaddr, ia_dstaddr;
int ia_proto, ia_id;
} ip_ass_t;
acc_t *ia_frags: The first fragment in the linked list of fragments. These accessors hold the data contained in the fragments and are linked by the accessors' acc_ext_link field.

int ia_min_ttl: Set to IP_MAX_TTL (#define'd as 255 in in.h). This value is in seconds and is the maximum time that a fragmented packet may be in ip_ass_table[] before the source is sent an icmp packet.

ip_port_t *ia_port: The ip port on which the packet arrived.

time_t ia_first_time: The time at which the first fragment of the packet is added.

ipaddr_t ia_srcaddr, ia_dstaddr: The source and destination ip address of the fragment.

int ia_proto: The protocol of the packet to which the fragment belongs. For example, if the packet is a udp packet, ia_proto will be 17. If the packet is a tcp packet, ia_proto will be 6.

ia_id: The value of the ih_id field for the ip header of the first packet sent out is determined by ip_init() and is equal to the number of clock ticks since reboot (i.e., the value returned by get_time) and is incremented for each packet sent out. This value is used to combine fragments at the receiving end if fragmentation has occurred. More specifically, if a packet is fragmented during transit, ia_id will be the same for all the fragments.
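
The lookup and replacement policy of find_ass_ent() can be sketched with a small stand-alone table. This is a simplified model under stated assumptions, not the Minix code: ex_ass_t, ex_table, ex_find() and ex_evicted are hypothetical names, an explicit in_use flag stands in for the test of ia_frags, and setting ex_evicted marks the point at which the real code frees the old fragments and sends the icmp packet:

```c
#include <stddef.h>
#include <stdint.h>

#define EX_ASS_NR 3   /* mirrors the 3-element table described above */

typedef struct {
    int in_use;           /* simplification: the real code tests ia_frags */
    long first_time;      /* 0 while the slot has never been used */
    int port, proto;
    uint16_t id;
    uint32_t src, dst;
} ex_ass_t;

static ex_ass_t ex_table[EX_ASS_NR];
static int ex_evicted;    /* set to 1 when ex_find() recycles an in-use slot */

/* Return the slot that holds this packet's fragments, or claim one. */
static ex_ass_t *ex_find(int port, uint16_t id, int proto,
    uint32_t src, uint32_t dst, long now)
{
    ex_ass_t *oldest = NULL;
    int i;

    ex_evicted = 0;
    for (i = 0; i < EX_ASS_NR; i++) {
        ex_ass_t *e = &ex_table[i];

        if (e->in_use && e->src == src && e->dst == dst &&
            e->proto == proto && e->id == id && e->port == port)
            return e;                   /* fragments of the same packet */
        if (!oldest || e->first_time < oldest->first_time)
            oldest = e;                 /* remember the oldest slot */
    }
    if (oldest->in_use)
        ex_evicted = 1;                 /* table full: drop the oldest */
    oldest->in_use = 1;
    oldest->first_time = now;
    oldest->port = port; oldest->proto = proto; oldest->id = id;
    oldest->src = src; oldest->dst = dst;
    return oldest;
}
```

Because unused slots have a first_time of 0, they always look older than any slot in use, so the table fills up before anything is evicted.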



0042237 {
0042238          ip_ass_t *new_ass_ent, *tmp_ass_ent;
0042239          int i;
0042240          acc_t *tmp_acc, *curr_acc;
0042241 
0042242          new_ass_ent= 0;
0042243 
0042244          for (i=0, tmp_ass_ent= ip_ass_table; i<IP_ASS_NR; i++,
0042245                   tmp_ass_ent++)
Search through ip_ass_table[] (which has only 3 elements).


0042246          {
0042247                   if (!tmp_ass_ent->ia_frags && tmp_ass_ent->ia_first_time)
0042248                   {
0042249                            DBLOCK(1,
0042250                   printf("strange ip_ass entry (can be a race condition)\n"));
0042251                            continue;
0042252                   }
0042253 
0042254                   if ((tmp_ass_ent->ia_srcaddr == src) &&
0042255                            (tmp_ass_ent->ia_dstaddr == dst) &&
0042256                            (tmp_ass_ent->ia_proto == proto) &&
0042257                            (tmp_ass_ent->ia_id == id) &&
0042258                            (tmp_ass_ent->ia_port == ip_port))
If the fragment has the same source address, destination address, protocol, ip port, and (probably most importantly) id, a match has been found. Return the ip_ass_table[] element (which corresponds to a fragmented packet) to reassemble(). reassemble() will then possibly add the new fragment to the ip_ass_table[] element or possibly reassemble the packet (among other possibilities).


0042259                   {
0042260                            return tmp_ass_ent;
0042261                   }
0042262                   if (!new_ass_ent || tmp_ass_ent->ia_first_time <
0042263                            new_ass_ent->ia_first_time)
Keep track of which element has been in ip_ass_table[] the longest. If ip_ass_table[] is full, this element will be discarded (see the block on lines 42269-42288).


0042264                   {
0042265                            new_ass_ent= tmp_ass_ent;
0042266                   }
0042267          }
0042268 
0042269          if (new_ass_ent->ia_frags)
If ip_ass_table[] is full, the fragmented packet that has been in ip_ass_table[] for the longest time is discarded and the source is sent an icmp packet.


0042270          {
0042271                   DBLOCK(1, printf("old frags id= %u, proto= %u, src= ",
0042272                            ntohs(new_ass_ent->ia_id),
0042273                            ntohs(new_ass_ent->ia_proto));
0042274                            writeIpAddr(new_ass_ent->ia_srcaddr); printf(" dst= ");
0042275                            writeIpAddr(new_ass_ent->ia_dstaddr); printf(": ");
0042276                            ip_print_frags(new_ass_ent->ia_frags); printf("\n"));
Free all the accessors (except for the first one which is used to send an icmp packet back to the source) that were used to hold the fragmented packet's data.


0042277                   curr_acc= new_ass_ent->ia_frags->acc_ext_link;
0042278                   while (curr_acc)
0042279                   {
0042280                            tmp_acc= curr_acc->acc_ext_link;
0042281                            bf_afree(curr_acc);

0042282                            curr_acc= tmp_acc;
0042283                   }
0042284                   curr_acc= new_ass_ent->ia_frags;
0042285                   new_ass_ent->ia_frags= 0;
0042286                   icmp_snd_time_exceeded(ip_port->ip_port, curr_acc,
0042287                            ICMP_FRAG_REASSEM);
Send the source of the discarded packet an icmp packet.


icmp_snd_time_exceeded()


icmp_snd_time_exceeded() sends an ICMP_TYPE_TIME_EXCEEDED icmp message. ICMP_TYPE_TIME_EXCEEDED messages are sent either when a packet's TTL has expired or when some fragments of the original packet were not received in time to be reassembled by the destination system.


0042288          }
Add the fragment (which will be the first from the original packet) to the (perhaps newly) unoccupied element of ip_ass_table[] and return the element.


0042289          new_ass_ent->ia_min_ttl= IP_MAX_TTL;
Minix chooses a max ttl value of 255 seconds as the reassembly timeout while many other implementations (e.g., BSD; see TCP/IP Illustrated, vol. 2, p. 292) use a value of 60 seconds. RFC 1122 recommends a value between 60 and 120 seconds (and requires that an icmp time exceeded message be sent back to the source).

Note that this value is in seconds and not in hops.


0042290          new_ass_ent->ia_port= ip_port;
0042291          new_ass_ent->ia_first_time= get_time();
get_time()

get_time() returns the number of clock ticks since reboot.

Several of the clients (eth, arp, ip, tcp, and udp) use get_time() to determine an appropriate timeout value for a given operation. For example, the arp code calls get_time() to determine an appropriate amount of time to wait for a response from an arp request before giving up.


0042292          new_ass_ent->ia_srcaddr= src;
0042293          new_ass_ent->ia_dstaddr= dst;
0042294          new_ass_ent->ia_proto= proto;
0042295          new_ass_ent->ia_id= id;
0042296 
0042297          return new_ass_ent;
0042298 }
0042299 
0042300 PRIVATE int ip_frag_chk(pack)
0042301 acc_t *pack;
ip_frag_chk()

ip_frag_chk(pack) performs a few checks on the packet pack, ip_frag_chk()'s only parameter. For example, ip_frag_chk() verifies the checksum of the ip header (the one's complement of the sum over the header, including the checksum field, must be zero).

ip_frag_chk() returns 0 on failure.
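
The structural checks can be sketched on a flat byte buffer. This is a simplified illustration, not the Minix code: the real function works on accessor chains, and the checksum and option checks are left out here. The ex_ names and the three sample headers are made up for the example:

```c
#include <stddef.h>
#include <stdint.h>

#define EX_MIN_HDR_SIZE 20
#define EX_IP_VERSION   4

/* Returns 1 if the flat buffer passes the length and version checks,
 * 0 otherwise. Checksum and option checks are not part of this sketch. */
static int ex_frag_chk(const uint8_t *pkt, size_t len)
{
    size_t hdr_len, total_len;

    if (len < EX_MIN_HDR_SIZE)
        return 0;                       /* too short for any ip header */
    hdr_len = (size_t)(pkt[0] & 0x0f) * 4;
    if (len < hdr_len)
        return 0;                       /* shorter than the header claims */
    if ((pkt[0] >> 4) != EX_IP_VERSION)
        return 0;                       /* not IPv4 */
    total_len = ((size_t)pkt[2] << 8) | pkt[3];  /* ih_length, big-endian */
    if (total_len != len)
        return 0;                       /* size disagrees with ih_length */
    return 1;
}

/* Sample headers; the bytes not listed are zero. */
static const uint8_t ex_good[20]        = { 0x45, 0x00, 0x00, 0x14 };
static const uint8_t ex_bad_version[20] = { 0x65, 0x00, 0x00, 0x14 };
static const uint8_t ex_bad_length[20]  = { 0x45, 0x00, 0x00, 0x28 };
```

ex_frag_chk(ex_good, 20) succeeds; the other two samples fail the version check and the total length check respectively.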


0042302 {
0042303          ip_hdr_t *ip_hdr;
0042304          int hdr_len;
0042305 
0042306          if (pack->acc_length < sizeof(ip_hdr_t))
The length of the packet must be at least the length of an ip header.


0042307          {
0042308                   DBLOCK(1, printf("wrong length\n"));
0042309                   return FALSE;
0042310          }
0042311 
0042312          ip_hdr= (ip_hdr_t *)ptr2acc_data(pack);

0042313 
0042314          hdr_len= (ip_hdr->ih_vers_ihl & IH_IHL_MASK) * 4;
The lower 4 bits of ih_vers_ihl hold the length of the header, including options (if there are any), in units of 4 bytes (i.e., the actual length in bytes is 4 times as great as the value stored in the lower 4 bits). An example of an option is a list of routers that a packet should follow to its destination.

The upper four bits hold the version number (e.g., 4 for IPv4).


0042315          if (pack->acc_length < hdr_len)
The packet length must be at least as great as the length of the ip header as reported in the ip header.


0042316          {
0042317                   DBLOCK(1, printf("wrong length\n"));
0042318 
0042319                   return FALSE;
0042320          }
0042321 
0042322          if (((ip_hdr->ih_vers_ihl >> 4) & IH_VERSION_MASK) !=
0042323                   IP_VERSION)
The upper four bits of ih_vers_ihl is the version number. This must be IPv4.

IP_VERSION is defined in /include/net/gen/in.h:

#define IP_VERSION 4


0042324          {
0042325                   DBLOCK(1, printf("wrong version (ih_vers_ihl=0x%x)\n",
0042326                            ip_hdr->ih_vers_ihl));
0042327                   return FALSE;
0042328          }
0042329          if (ntohs(ip_hdr->ih_length) != bf_bufsize(pack))
The length of the packet must be equal to the length of the entire packet as reported by the ip header (i.e., ih_length).



bf_bufsize()


bf_bufsize() returns the total buffer size of a linked list of accessors (i.e., the sum of acc_length for the accessors in a linked list).
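
As a minimal sketch of that sum (ex_acc_t is a toy accessor with only the two fields that matter here, and ex_bufsize() is a hypothetical stand-in, not the Minix function):

```c
#include <stddef.h>

/* Toy accessor: only the two fields the sum needs. */
typedef struct ex_acc {
    size_t acc_length;
    struct ex_acc *acc_next;
} ex_acc_t;

/* Sum acc_length over the chain that starts at acc. */
static size_t ex_bufsize(const ex_acc_t *acc)
{
    size_t total = 0;

    for (; acc != NULL; acc = acc->acc_next)
        total += acc->acc_length;
    return total;
}

/* Example chain: 512 + 512 + 490 bytes. */
static ex_acc_t ex_c = { 490, NULL };
static ex_acc_t ex_b = { 512, &ex_c };
static ex_acc_t ex_a = { 512, &ex_b };
```

For the example chain, ex_bufsize(&ex_a) is 1514, the size of a maximum-length ethernet packet.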



0042330          {
0042331                   DBLOCK(1, printf("wrong size\n"));
0042332 
0042333                   return FALSE;
0042334          }
0042335          if ((u16_t)~oneC_sum(0, (u16_t *)ip_hdr, hdr_len))
oneC_sum()

A checksum is used to determine if errors occurred during the transmission of data. The checksum algorithm used by oneC_sum() (which is also the Internet standard) is described by RFC 1071.

Essentially, the algorithm goes through the data and adds all the 16 bit words together (using one's complement addition). If the sum is accumulated in a 32 bit value, the high 16 bits of that value are then added to the low 16 bits (again, using one's complement addition). The checksum field is then set to the one's complement of this 16 bit sum. (Recall that the one's complement of 0xF0F0 is 0x0F0F.) Since adding any 16 bit number to its 16 bit one's complement gives 0xFFFF, the sum of the packet (without the checksum field) added to the checksum field will equal 0xFFFF (provided the packet was not corrupted after the checksum field was calculated). For example, the one's complement sum of an ip header (including the checksum field) will equal 0xFFFF if the header was not corrupted in delivery.


From RFC 1071:

In outline, the Internet checksum algorithm is fairly simple:

(1) Adjacent octets to be checksummed are paired to form 16-bit
integers, and the 1's complement sum of these 16-bit integers is
formed.

(2) To generate a checksum, the checksum field itself is cleared,
the 16-bit 1's complement sum is computed over the octets
concerned, and the 1's complement of this sum is placed in the
checksum field.

(3) To check a checksum, the 1's complement sum is computed over the
same set of octets, including the checksum field. If the result
is all 1 bits (-0 in 1's complement arithmetic), the check
succeeds.

Below is a "C" code algorithm that describes the process above. This algorithm is also from RFC 1071. Note that addr points to the data (and is taken here to be a pointer to unsigned short), count is the number of bytes that remain to be summed, and checksum is the resulting value.

{
        /* Compute Internet Checksum for "count" bytes
         * beginning at location "addr".
         */
        register long sum = 0;

        while( count > 1 ) {
                /* This is the inner loop: add one 16-bit word */
                sum += * (unsigned short *) addr++;
                count -= 2;
        }

        /* Add left-over byte, if any */
        if( count > 0 )
                sum += * (unsigned char *) addr;

        /* Fold 32-bit sum to 16 bits */
        while (sum>>16)
                sum = (sum & 0xffff) + (sum >> 16);

        checksum = ~sum;
}
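
A self-contained variant of the same algorithm, together with a worked check, might look as follows. This is a sketch, not oneC_sum() itself: ex_checksum() is a hypothetical name, it reads the data byte by byte so that the result does not depend on the host's byte order, and the sample header (192.168.0.1 to 192.168.0.199, udp, ttl 64) is made up for the example:

```c
#include <stddef.h>
#include <stdint.h>

/* One's complement of the one's complement sum of the data, taken as
 * big-endian 16-bit words (RFC 1071). */
static uint16_t ex_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (i < len)
        sum += (uint32_t)data[i] << 8;       /* odd trailing byte, zero padded */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);  /* fold the carries back in */
    return (uint16_t)~sum;
}

/* Sample header with the checksum field (bytes 10-11) cleared ... */
static const uint8_t ex_hdr_nosum[20] = {
    0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
    0x00, 0x00, 0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0xc7
};
/* ... and the same header with the checksum 0xb861 filled in. */
static const uint8_t ex_hdr_sum[20] = {
    0x45, 0x00, 0x00, 0x73, 0x00, 0x00, 0x40, 0x00, 0x40, 0x11,
    0xb8, 0x61, 0xc0, 0xa8, 0x00, 0x01, 0xc0, 0xa8, 0x00, 0xc7
};
```

Summing the nine non-checksum words gives 0x2479C, which folds to 0x479E; its complement, 0xB861, is the value placed in the checksum field. Running ex_checksum() over the header with that field filled in gives 0, which is the condition tested on line 42335.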



0042336          {
0042337                   DBLOCK(1, printf("packet with wrong checksum (= %x)\n",
0042338                            (u16_t)~oneC_sum(0, (u16_t *)ip_hdr, hdr_len)));
0042339                   return FALSE;
0042340          }
0042341          if (hdr_len>IP_MIN_HDR_SIZE && ip_chk_hdropt((u8_t *)
0042342                   (ptr2acc_data(pack) + IP_MIN_HDR_SIZE),
0042343                   hdr_len-IP_MIN_HDR_SIZE))
If the ip header is longer than the minimum size (i.e., the header contains options), verify that the ip header options are acceptable. A nonzero return value from ip_chk_hdropt() indicates that they are not.



ip_chk_hdropt()


ip_chk_hdropt() goes through the ip header options (if there are any) and verifies that the options are acceptable. For example, ip_chk_hdropt() verifies that the same ip header option is not listed twice.


0042344          {
0042345                   DBLOCK(1, printf("packet with wrong options\n"));
0042346                   return FALSE;
0042347          }
0042348          return TRUE;
0042349 }
0042350 
0042351 PRIVATE void packet2user (ip_fd, pack, exp_time)
0042352 ip_fd_t *ip_fd;
0042353 acc_t *pack;
0042354 time_t exp_time;
packet2user() / ip

packet2user() attempts to pass a packet off to a higher layer (e.g., udp). If the ip file descriptor is not currently being read (i.e., the IFF_READ_IP flag in the if_flags field of the ip file descriptor is not set), packet2user() puts the packet at the end of its read queue and returns. Note that if the udp code opened the ip file descriptor, however, the udp code set the IFF_READ_IP flag during its initialization and that this flag remains set until the ip file descriptor is closed.


0042355 {
To fully understand packet2user(), an understanding of the network service's buffer management is necessary.


acc_freelist / buf512_freelist / accessors[] , buffers512[]


While being processed by the network service, packets and configuration data are temporarily stored in buffers associated with struct's named "accessors". These accessors are chained together to form linked lists and can be manipulated by a number of functions. There are two linked lists of significance in buf.c: acc_freelist and buf512_freelist. After initialization, these linked lists are as follows:



where the elements of buffers512[] are of type buf512_t:

typedef struct buf512 

{
buf_t buf_header;
char buf_data[512]; /* the user data is found here */
} buf512_t;

typedef struct buf
{
int buf_linkC;
buffree_t buf_free;
size_t buf_size;
char *buf_data_p;
} buf_t;
and the elements of accessors[] are of type acc_t:

typedef struct acc

{
int acc_linkC;
int acc_offset, acc_length;
buf_t *acc_buffer;
struct acc *acc_next, *acc_ext_link;
} acc_t;
The fields of these struct's are described in the context of the functions that access them.

buf512_freelist is the linked list of accessors that have associated buffers and acc_freelist is the linked list of accessors that do not have associated buffers. When a buffer is needed, an accessor from buf512_freelist is taken (see bf_memreq() below). When a single accessor or a segment of the linked list is duplicated, the duplicate accessor is taken from acc_freelist (see bf_dupacc() and bf_cut() below).

If an accessor is at the beginning of a linked list or if only a single accessor is referencing it, acc_linkC will be one. If more than one accessor references another accessor, however, acc_linkC for the referenced accessor will be greater than one. Similarly, if only a single accessor references a buffer, buf_linkC will be one. If more than one accessor references a buffer, the buffer's buf_linkC will be greater than one. For two good examples of functions that affect acc_linkC/buf_linkC, see bf_dupacc() and bf_cut() below.

As with other queues within the network service, the memory for the acc_freelist queue and the buf512_freelist queue (and its associated buffers) comes from arrays with a limited number of elements. This must be done because memory within Minix is scarce (one reason is that Minix does not have virtual memory).

When buffer space for data is needed (for example, for an incoming ethernet packet), bf_memreq() is called to acquire the space. For example, if ETH_MAX_PACK_SIZE (#define'd as 1514) bytes of buffer space are required, bf_memreq() will remove 3 accessors from buf512_freelist and will return a pointer to the first accessor:
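The count of three follows from rounding the request up to whole 512-byte buffers. A minimal sketch of that arithmetic; the helper name bufs_needed() and the BUF512_SIZE constant are hypothetical and are not part of buf.c:

```c
#include <assert.h>
#include <stddef.h>

#define BUF512_SIZE        512   /* size of buf_data[] in buf512_t */
#define ETH_MAX_PACK_SIZE 1514   /* value quoted in the text above */

/* Hypothetical helper: number of 512-byte buffers needed to hold
 * `size` bytes, i.e. size / 512 rounded up.
 */
static size_t bufs_needed(size_t size)
{
	assert(size > 0);
	return (size + BUF512_SIZE - 1) / BUF512_SIZE;
}
```

For ETH_MAX_PACK_SIZE, two buffers hold only 1024 bytes, so a third buffer is needed for the remaining 490 bytes.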



If an accessor needs to be duplicated, bf_dupacc() is called. For example, if acc (see figure below) needs to be duplicated, bf_dupacc(acc) will return new_acc:



Note that the buf_linkC of buffers512[127] and the acc_linkC of accessors[65] are incremented to 2. Also note that the accessor returned by bf_dupacc() (in this case, new_acc) is taken from acc_freelist and not from buf512_freelist.
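The reference counting described above can be sketched as follows. This is a simplified model and not the code from buf.c: the structs are reduced to the fields involved, dupacc_sketch() is a hypothetical name, and malloc() stands in for taking an accessor from acc_freelist.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct buf { int buf_linkC; } buf_t;   /* reduced buf_t */

typedef struct acc
{
	int acc_linkC;
	int acc_offset, acc_length;
	buf_t *acc_buffer;
	struct acc *acc_next;
} acc_t;

/* Sketch of bf_dupacc(): the copy shares the buffer and the rest of
 * the chain with the original, so both reference counts are bumped.
 */
static acc_t *dupacc_sketch(acc_t *acc)
{
	acc_t *new_acc = malloc(sizeof(*new_acc));

	assert(new_acc != NULL);
	*new_acc = *acc;                  /* same offset, length, buffer, next */
	new_acc->acc_linkC = 1;           /* only the caller references the copy */
	if (new_acc->acc_buffer != NULL)
		new_acc->acc_buffer->buf_linkC++;   /* buffer is now shared */
	if (new_acc->acc_next != NULL)
		new_acc->acc_next->acc_linkC++;     /* successor is now shared */
	return new_acc;
}
```

Only the first buffer and the first successor are touched; accessors further down the chain are reached through the shared successor and keep their counts.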

After a chain of accessors is no longer needed, the chain can be freed by calling bf_afree(). However, if either acc_linkC or buf_linkC of one of the accessors or their associated buffers in the linked list is not equal to one, then the chain will not be freed in its entirety. For example, if bf_afree(acc1) is called for the following chain:



Then the resulting chain will be:



bf_afree() returns acc1 (accessors[63]) to acc_freelist (recall that acc_freelist is the linked list of acc_t's without an associated buffer). However, buffers512[127] cannot be freed because acc2 (accessors[64]) still references it.

bf_afree() is called after an accessor's associated data is no longer needed (for example, after a packet has been sent off by the ethernet driver).
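A minimal sketch of this behavior, under the assumption that freeing simply drops one reference per accessor and stops at the first accessor that is still in use. The reduced structs and the name afree_sketch() are hypothetical; the real function returns accessors and buffers to their free lists rather than counting them.

```c
#include <assert.h>
#include <stddef.h>

typedef struct buf { int buf_linkC; } buf_t;   /* reduced buf_t */

typedef struct acc
{
	int acc_linkC;
	buf_t *acc_buffer;
	struct acc *acc_next;
} acc_t;

/* Walk the chain, dropping one reference per accessor, and stop at
 * the first accessor that is still referenced elsewhere. Returns the
 * number of accessors released.
 */
static int afree_sketch(acc_t *acc)
{
	int released = 0;

	while (acc != NULL)
	{
		acc_t *next = acc->acc_next;

		assert(acc->acc_linkC > 0);
		if (--acc->acc_linkC > 0)
			break;                 /* the rest of the chain is still in use */
		if (acc->acc_buffer != NULL)
		{
			assert(acc->acc_buffer->buf_linkC > 0);
			acc->acc_buffer->buf_linkC--;   /* buffer is free only at zero */
			acc->acc_buffer = NULL;
		}
		released++;
		acc = next;
	}
	return released;
}
```

With two accessors sharing one buffer, releasing the first accessor leaves the buffer's buf_linkC at one, so the buffer stays allocated for the second accessor.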

If a section of a linked list needs to be duplicated (as opposed to a single accessor - see bf_dupacc() above), bf_cut() is called. For example, if a section of length 50 starting at an offset of 75 of the linked list below needs to be duplicated, bf_cut(data, 75, 50) is called:



bf_cut() is used in a number of scenarios, including cutting a received ethernet packet to size. bf_afree() is generally called after bf_cut() to free up the original accessor linked list.
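The offset and length bookkeeping behind such a cut can be sketched without any buffer management. The model below is an assumption made for illustration: each accessor is reduced to a hypothetical seg_t holding only its offset and length, and cut_sketch() writes the resulting segments to a caller-supplied array instead of allocating new accessors. It also omits the special case of a zero-length cut.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduction of an accessor: its window into a buffer. */
typedef struct seg { int offset, length; } seg_t;

/* Describe the bytes [offset, offset + length) of a chain of n
 * segments as new segments in out[]. Returns the number of segments
 * written; out[] must have room for n entries.
 */
static int cut_sketch(const seg_t *chain, int n, int offset, int length,
	seg_t *out)
{
	int i, used = 0;

	assert(offset >= 0 && length >= 0);
	for (i = 0; i < n && length > 0; i++)
	{
		int avail = chain[i].length;
		int take;

		if (offset >= avail)
		{
			offset -= avail;       /* the cut starts after this segment */
			continue;
		}
		take = avail - offset;
		if (take > length)
			take = length;
		out[used].offset = chain[i].offset + offset;
		out[used].length = take;
		used++;
		length -= take;
		offset = 0;                /* later segments are used from their start */
	}
	return used;
}
```

For a chain of two 100-byte segments, a cut of 50 bytes at offset 75 yields two segments: the last 25 bytes of the first and the first 25 bytes of the second.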

If only the beginning of a linked list needs to be freed, bf_delhead() is called. If acc_linkC and buf_linkC are one for all of the relevant accessors and their associated buffers in the linked list, the operation is straightforward:



bf_delhead() is often called to remove the header (e.g., ip header) from a packet. Note that the buffers of the accessors returned to the buf512_freelist have their buf_linkC field set to zero (0).

bf_append() appends one accessor linked list to another accessor linked list. For example, if the payload of an ethernet packet (1500 bytes) is appended to an ethernet header (14 bytes):



the resulting linked list is as follows:



If data is not already packed and aligned, bf_align(acc, size, alignment) packs size bytes from acc, a linked list of accessors, by calling bf_pack(). This packing is necessary to ensure that all of the fields of a header can be accessed easily. For example, the ip code aligns a packet's header contained in the accessors before accessing the various ip header fields.


0042356          acc_t *tmp_pack;
0042357          ip_hdr_t *ip_hdr;
0042358          int result, ip_hdr_len;
0042359          size_t size, transf_size;
0042360 
0042361          assert (ip_fd->if_flags & IFF_INUSE);
0042362          if (!(ip_fd->if_flags & IFF_READ_IP))
If the ip file descriptor is currently not being read (i.e., the IFF_READ_IP flag is not set), place the packet at the end of its read queue and set the expiration time. The IFF_READ_IP flag was set on line 42046 by ip_read().


0042363          {
0042364                   if (pack->acc_linkC != 1)
0042365                   {
0042366                            tmp_pack= bf_dupacc(pack);
0042367                            bf_afree(pack);
0042368                            pack= tmp_pack;
0042369                            tmp_pack= NULL;
0042370                   }
0042371                   pack->acc_ext_link= NULL;
Place the packet at the tail of the read queue (which is also the head of the queue if there were no packets previously in the queue).


0042372                   if (ip_fd->if_rdbuf_head == NULL)
0042373                   {
0042374                            ip_fd->if_rdbuf_head= pack;
0042375                            ip_fd->if_exp_time= exp_time;
This expiration time is checked on line 42049 by ip_read().


0042376                   }
0042377                   else
0042378                            ip_fd->if_rdbuf_tail->acc_ext_link= pack;
0042379                   ip_fd->if_rdbuf_tail= pack;
0042380                   return;
0042381          }
0042382 
0042383          size= bf_bufsize (pack);
0042384          if (ip_fd->if_ipopt.nwio_flags & NWIO_RWDATONLY)
From ip(4):

"NWIO_RWDATONLY specifies that the header should be omitted from a write request...A read operation will also only return the data part, so the IP options will be lost."


0042385          {
0042386 
0042387                   pack= bf_packIffLess (pack, IP_MIN_HDR_SIZE);
0042388                   assert (pack->acc_length >= IP_MIN_HDR_SIZE);
0042389 
0042390                   ip_hdr= (ip_hdr_t *)ptr2acc_data(pack);
0042391                   ip_hdr_len= (ip_hdr->ih_vers_ihl & IH_IHL_MASK) * 4;
0042392 
0042393                   assert (size >= ip_hdr_len);
0042394                   size -= ip_hdr_len;
0042395                   pack= bf_delhead(pack, ip_hdr_len);
Remove the ip header from the packet. The process that made the read request doesn't want it.
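The header length computed on line 42391 can be sketched as below. The values of IH_IHL_MASK and IP_MIN_HDR_SIZE are assumptions (the lower four bits and a 20-byte header without options), and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define IH_IHL_MASK      0x0f   /* assumed: lower 4 bits of ih_vers_ihl */
#define IP_MIN_HDR_SIZE  20     /* assumed: header without options */

/* Header length in bytes: the IHL nibble counts 32-bit words. */
static size_t ip_hdr_len_sketch(unsigned char ih_vers_ihl)
{
	size_t len = (size_t)(ih_vers_ihl & IH_IHL_MASK) * 4;

	assert(len >= IP_MIN_HDR_SIZE);   /* an IHL below 5 is malformed */
	return len;
}

/* Payload size left after the header is stripped (NWIO_RWDATONLY). */
static size_t payload_size_sketch(size_t packet_size,
	unsigned char ih_vers_ihl)
{
	size_t hdr_len = ip_hdr_len_sketch(ih_vers_ihl);

	assert(packet_size >= hdr_len);
	return packet_size - hdr_len;
}
```

A first header byte of 0x45 (version 4, IHL 5) therefore gives a 20-byte header, and a 1500-byte packet is left with 1480 bytes of data.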




0042396          }
0042397 
0042398          if (size>ip_fd->if_rd_count)
Make sure that the higher layer (e.g., udp) doesn't receive more data than it requested.


0042399          {
0042400                   tmp_pack= bf_cut (pack, 0, ip_fd->if_rd_count);
bf_cut()

bf_cut(data, offset, length) duplicates a section of a linked list (see the description of bf_cut() above). Note that the original linked list remains unchanged and that acc_linkC for all the accessors in the new linked list is one.

If length (the third parameter) is zero, bf_cut() simply duplicates the first accessor in the linked list but sets acc_length=0 and acc_next=NULL. In other words, it creates a linked list of one accessor whose acc_length is 0.



0042401                   bf_afree(pack);


0042402                   pack= tmp_pack;
0042403                   transf_size= ip_fd->if_rd_count;
0042404          }
0042405          else
0042406                   transf_size= size;
0042407 
There are two function pointers in the ip file descriptor that can be used to send data to the higher layer: if_put_pkt and if_put_userdata. if_put_pkt references a higher-layer function (e.g., udp_ip_arrived()), as does if_put_userdata (e.g., udp_put_data()). These two values were set by ip_open() (if_put_pkt corresponds to the fifth parameter and if_put_userdata corresponds to the fourth parameter). If if_put_pkt has been set, packet2user() calls the function it references and then returns. Otherwise, packet2user() calls the function that if_put_userdata references.

if_put_pkt is not set for ip file descriptors opened by the icmp code. if_put_userdata is set to reference icmp_putdata().
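The choice between the two function pointers can be sketched as follows. The reduced types, the callback signatures, and the demo callbacks are assumptions made for this illustration; only the order of the calls follows lines 42408 - 42425, and the EPACKSIZE handling is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced stand-ins for acc_t and ip_fd_t. */
typedef struct acc { size_t acc_length; } acc_t;

typedef void (*put_pkt_t)(int fd, acc_t *pack, size_t size);
typedef int (*put_userdata_t)(int fd, size_t offset, acc_t *data,
	int for_ioctl);

typedef struct ip_fd
{
	int if_srfd;                     /* descriptor of the higher layer */
	put_pkt_t if_put_pkt;            /* may be NULL (e.g., icmp) */
	put_userdata_t if_put_userdata;
} ip_fd_t;

/* Prefer if_put_pkt; otherwise call if_put_userdata twice, first with
 * the data and then with the result and a null packet.
 */
static void deliver_sketch(ip_fd_t *ip_fd, acc_t *pack, size_t transf_size)
{
	int result;

	if (ip_fd->if_put_pkt != NULL)
	{
		(*ip_fd->if_put_pkt)(ip_fd->if_srfd, pack, transf_size);
		return;
	}
	result = (*ip_fd->if_put_userdata)(ip_fd->if_srfd, 0, pack, 0);
	if (result >= 0)
		result = (int)transf_size;
	result = (*ip_fd->if_put_userdata)(ip_fd->if_srfd, (size_t)result,
		NULL, 0);
	assert(result >= 0);
}

/* Recording callbacks that show which path was taken. */
static int pkt_calls, userdata_calls;

static void demo_put_pkt(int fd, acc_t *pack, size_t size)
{
	(void)fd; (void)pack; (void)size;
	pkt_calls++;
}

static int demo_put_userdata(int fd, size_t offset, acc_t *data,
	int for_ioctl)
{
	(void)fd; (void)offset; (void)data; (void)for_ioctl;
	userdata_calls++;
	return 0;
}
```

A descriptor with if_put_pkt set triggers one call to the packet handler, while a descriptor without it triggers two calls to if_put_userdata.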


0042408          if (ip_fd->if_put_pkt)
0042409          {
0042410                   (*ip_fd->if_put_pkt)(ip_fd->if_srfd, pack, transf_size);
0042411                   return;
0042412          }
0042413 
0042414          result= (*ip_fd->if_put_userdata)(ip_fd->if_srfd,
0042415                   (size_t)0, pack, FALSE);
0042416          if (result >= 0)
0042417                   if (size > transf_size)
0042418                            result= EPACKSIZE;
0042419                   else
0042420                            result= transf_size;
0042421 
0042422          ip_fd->if_flags &= ~IFF_READ_IP;
Clear the IFF_READ_IP flag to indicate that the read operation is over. Note that this point in the code will never be reached if the ip file descriptor was opened by the udp code and so the ip file descriptor's IFF_READ_IP flag is never cleared.


0042423          result= (*ip_fd->if_put_userdata)(ip_fd->if_srfd, result,
0042424                            (acc_t *)0, FALSE);
0042425          assert (result >= 0);
0042426          return;
0042427 }
0042428 
0042429 PUBLIC void ip_port_arrive (ip_port, pack, ip_hdr)
0042430 ip_port_t *ip_port;
0042431 acc_t *pack;
0042432 ip_hdr_t *ip_hdr;
ip_port_arrive()

For a given packet, ip_port_arrive() finds the ip file descriptors that are interested in the packet. ip_port_arrive() then hands the packet off to the higher layer (e.g., udp layer) by calling either packet2user() or a protocol-specific function (e.g., udp_ip_arrived()).

eth_arrive() is the function analogous to ip_port_arrive() on the ethernet layer. eth_arrive() tries to find interested ethernet file descriptors for a given packet.

udp read path

eth_arrive() 

ip_eth_arrived()

if (unicast packet)
ip_arrived()
else if (ethernet broadcast packet)
ip_arrived_broadcast()

if (packet must be input routed)
hand off packet to destination ip port
else {
ip_port_arrive()
packet2user()
udp_ip_arrived()
}



0042433 {
0042434          ip_fd_t *ip_fd, *first_fd, *share_fd;
0042435          ip_hdr_t *hdr;
ip_port_arrive() examines the ip header fields closely. A careful study of the ip header's fields at this point is beneficial.


ip_hdr_t


struct ip_hdr_t is the structure of an ip header. "ih" (e.g., ih_src, ih_dst) stands for "Ip Header".

ip_hdr_t is declared in /include/net/gen/ip_hdr.h:

typedef struct ip_hdr

{
u8_t ih_vers_ihl, ih_tos;
u16_t ih_length, ih_id, ih_flags_fragoff;
u8_t ih_ttl, ih_proto;
u16_t ih_hdr_chk;
ipaddr_t ih_src, ih_dst;
} ip_hdr_t;

ih_vers_ihl: The lower 4 bits hold the length of the header plus options (if there are any) in units of 4 bytes (i.e., the actual length in bytes is 4 times as great as the value stored in the lower 4 bits of ih_vers_ihl). An example of an option is a list of routers that a packet should follow to its destination.

The upper four bits hold the version number (e.g., 4 for IPv4).


ih_tos: tos stands for "Type Of Service" and is the priority of the ip packet. A value of zero is the lowest priority. Both UDP and TCP have a default TOS of zero.

#define TCP_DEF_TOS 0
#define UDP_TOS 0


ih_length: The length of the entire ip packet, including the ip header.


ih_id: The value of ih_id for the first packet sent out is determined by ip_init() and is equal to the number of clock ticks since reboot (i.e., the value returned by get_time) and is incremented for each packet sent out. This value is used to combine fragments at the receiving end if fragmentation has occurred.


ih_flags_fragoff: ih_flags_fragoff is a combination of flags and a (possible) fragmentation offset ("fragoff").

If the packet should not be fragmented, ih_flags_fragoff is set to IH_DONT_FRAG. If there are additional fragments (e.g., the 3rd fragment of 4 fragments), ih_flags_fragoff is set to IH_MORE_FRAGS.

If the packet is indeed just a fragment of a packet, this value indicates the starting byte position (in 8 byte increments) of the fragment within the original ip packet's data. So for example, if an ip packet of data size 2000 bytes (not including the ip header) is broken up into two fragments of 1496 and 504 bytes, the first fragment would have a fragmentation offset of 0 bytes and the second fragment would have a fragmentation offset of 1496 bytes. The fragment offset stored in ih_flags_fragoff for the second fragment is therefore 187 (1496 / 8 = 187).


ih_ttl: "Time to live" for the packet. As a packet is routed to the destination, each router decrements the packet's ttl. When the ttl reaches 0, the router discards the packet and sends an icmp "time exceeded" packet to the source. The ttl is designed to prevent packets that can't reach their destination from indefinitely bouncing around between routers. UDP's default TTL is 30:

#define UDP_TTL 30

Note that the Minix code also uses this value as a timeout value (in seconds). This code was written before the ttl field was redefined to be strictly a hop count. The original IP RFC defines the ttl field as the time to live in seconds.


ih_proto: The protocol of the ip packet. For example, if the packet is a udp packet, ih_proto will be 17. If the packet is a tcp packet, ih_proto will be 6.


ih_hdr_chk: Checksum for the header.


ih_src, ih_dst: Source and destination ip address of the ip packet.


IP HEADER (as given by RFC 791)


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version|  IHL  |Type of Service|          Total Length         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|         Identification        |Flags|      Fragment Offset    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  Time to Live |    Protocol   |         Header Checksum       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       Source Address                          |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Destination Address                        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                    Options                    |    Padding    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+



0042436          int port_nr;
0042437          unsigned long ip_pack_stat;
0042438          int i;
0042439          int hash, proto;
0042440          time_t exp_time;
0042441 
0042442          assert (pack->acc_linkC>0);
0042443          assert (pack->acc_length >= IP_MIN_HDR_SIZE);
0042444 
0042445          if (ntohs(ip_hdr->ih_flags_fragoff) & (IH_FRAGOFF_MASK|IH_MORE_FRAGS))
If the packet is actually a fragment, call reassemble() to either store the fragment until the remaining fragments arrive or rebuild the original IP datagram from the other fragments already in ip_ass_table[].

The fragment offset portion of the ih_flags_fragoff field has a size of 13 bits and is used by the receiver to determine the fragment's place in the reassembly of the original ip packet. It holds the fragment's starting byte position (divided by 8) within the original IP packet's data (which means that the first fragment's offset will be zero). The fragmentation mask (IH_FRAGOFF_MASK) is used to mask those 13 bits.

The More Fragments flag (IH_MORE_FRAGS) indicates whether or not the fragment is the last fragment of the original packet.

IH_FRAGOFF_MASK and IH_MORE_FRAGS are defined in /include/net/gen/ip_hdr.h:

#define IH_FRAGOFF_MASK 0x1fff
#define IH_MORE_FRAGS 0x2000
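A short sketch of how these two masks are applied to a value that has already been converted to host byte order. The function names are hypothetical; the masks are the ones quoted above.

```c
#include <assert.h>

#define IH_FRAGOFF_MASK 0x1fff
#define IH_MORE_FRAGS   0x2000

/* Byte offset of a fragment's data within the original datagram.
 * `flags_fragoff` must already be in host byte order (after ntohs()).
 */
static unsigned frag_byte_offset(unsigned flags_fragoff)
{
	return (flags_fragoff & IH_FRAGOFF_MASK) * 8u;
}

/* Nonzero if the packet is a fragment: either it has a nonzero
 * offset or more fragments follow it (the test made on line 42445).
 */
static int is_fragment(unsigned flags_fragoff)
{
	return (flags_fragoff & (IH_FRAGOFF_MASK | IH_MORE_FRAGS)) != 0;
}
```

The first fragment of a packet has an offset of zero but has IH_MORE_FRAGS set, and the last fragment has a nonzero offset with IH_MORE_FRAGS clear, which is why both masks are needed in the test.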


htons() / ntohs() / htonl() / ntohl()


From htons(3):

"htons() converts a 16-bit quantity from host byte order to network byte order."

Different CPU architectures order the bytes of a multi-byte value differently. For example, on a "little-endian" machine (an example of which is the Intel CPU), the value 0x1234 is stored in memory as the bytes 0x34 0x12. However, on a "big-endian" machine, the value 0x1234 is stored in memory as the bytes 0x12 0x34.

It is important that values in a header are sent across a network in a consistent manner independent of the architecture of the sending or receiving system. For this reason, a standard was chosen. The standard chosen was big-endian although it could have just as well been little-endian.

htons() is defined in /include/net/hton.h, as:
#define htons(x) (_tmp=(x), ((_tmp>>8) & 0xff) | ((_tmp<<8) & 0xff00))

ntohs() converts a 16-bit quantity from network byte order to host byte order, the reverse of htons().

htonl() and ntohl() are identical to htons() and ntohs() except that they convert 32-bit quantities instead of 16-bit quantities.

Processes generally supply header information when sending packets. The data in these fields is converted to the network format (i.e., big-endian) by the process before the process copies the data to the network service.
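The byte swap performed by the htons() macro on a little-endian host, and a host-independent way of producing network byte order, can be sketched as below. Both function names are hypothetical.

```c
#include <assert.h>

/* Swap the two bytes of a 16-bit value, as the htons() macro above
 * does on a little-endian host.
 */
static unsigned short swap16(unsigned short x)
{
	return (unsigned short)(((x >> 8) & 0xff) | ((x << 8) & 0xff00));
}

/* Store a 16-bit value in network (big-endian) order regardless of
 * the host's byte order: most significant byte first.
 */
static void store_net16(unsigned char out[2], unsigned short x)
{
	out[0] = (unsigned char)((x >> 8) & 0xff);
	out[1] = (unsigned char)(x & 0xff);
}
```

Swapping twice returns the original value, which is why the same operation serves for both htons() and ntohs() on a little-endian host.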


0042446          {
0042447                   pack= reassemble (ip_port, pack, ip_hdr);
reassemble()

reassemble() reassembles a packet from its fragments. reassemble() either adds a fragment to ip_ass_table[] if all of the fragments have not already been received or, if all of the fragments have been received, reassembles the original ip packet from the fragments. For example, the following ip packet of length 3744 bytes is broken up in transit into 3 fragments of lengths 1480 bytes, 1480 bytes, and 784 bytes. The corresponding fragment offsets of the fragments are 0, 185, and 370, respectively (the "fragment offset" is the actual byte offset divided by 8). reassemble() will reassemble this original ip packet from the fragments it receives.



reassemble() is called by ip_port_arrive() and in turn calls find_ass_ent() and merge_frags().



A return value of NULL indicates that the packet to which the fragment belongs was not reassembled in its entirety or that the reassembly time exceeded the maximum time allowed (255 seconds in Minix).
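The fragment offsets in the example above can be checked with a few lines of arithmetic. This is only an illustration of the offset calculation, not of reassemble() itself, and frag_offsets() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Fill offsets[i] with the fragment offset field (byte offset / 8) of
 * fragment i, given each fragment's data length in bytes. Returns the
 * total data length. Every fragment except the last must carry a
 * multiple of 8 bytes; otherwise its successor's offset could not be
 * expressed in 8-byte units.
 */
static unsigned frag_offsets(const unsigned *lengths, size_t n,
	unsigned *offsets)
{
	unsigned byte_off = 0;
	size_t i;

	for (i = 0; i < n; i++)
	{
		assert(byte_off % 8 == 0);
		offsets[i] = byte_off / 8;
		byte_off += lengths[i];
	}
	return byte_off;
}
```

For data lengths of 1480, 1480, and 784 bytes the offsets are 0, 185, and 370, and the lengths add up to 3744 bytes.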



0042448                   if (!pack)
0042449                            return;
If this point in the code has been reached, the packet has been successfully reassembled.


0042450                   assert (pack->acc_length >= IP_MIN_HDR_SIZE);
0042451                   ip_hdr= (ip_hdr_t *)ptr2acc_data(pack);
ptr2acc_data()

The macro ptr2acc_data is #define'd in inet/generic/buf.h as:

#define ptr2acc_data(/* acc_t * */ a) (bf_temporary_acc=(a), \
(&bf_temporary_acc->acc_buffer->buf_data_p[bf_temporary_acc-> \
acc_offset]))

ptr2acc_data() simply returns a pointer to the actual data within an accessor.

ptr2acc_data() is usually called so that the fields of a header (e.g., ip header) can be analyzed.
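Written as a function instead of a macro, the same lookup might look as follows. The structs are reduced to the fields the macro touches and acc_data_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced versions of buf_t and acc_t with only the fields used here. */
typedef struct buf { char *buf_data_p; } buf_t;

typedef struct acc
{
	int acc_offset, acc_length;
	buf_t *acc_buffer;
} acc_t;

/* Function form of ptr2acc_data(): address of the first byte of the
 * accessor's data, i.e. acc_offset bytes into the buffer.
 */
static char *acc_data_sketch(acc_t *acc)
{
	assert(acc != NULL && acc->acc_buffer != NULL);
	return &acc->acc_buffer->buf_data_p[acc->acc_offset];
}
```

Because the accessor carries its own offset, two accessors can point at different parts of the same buffer without copying any data.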


0042452                   assert (!(ntohs(ip_hdr->ih_flags_fragoff) &
0042453                            (IH_FRAGOFF_MASK|IH_MORE_FRAGS)));
0042454          }
0042455 
0042456          exp_time= get_time() + (ip_hdr->ih_ttl+1) * HZ;
The expiration time of the packet is calculated using the ih_ttl (Time to Live) field of its header: the packet expires ih_ttl + 1 seconds after it arrives. As can be seen, the lower a packet's ttl, the higher the likelihood that it will be discarded before it is read. This code was written before ttl was redefined to be strictly a hop count. The original IP RFC defines TTL as the time to live in seconds.

HZ is #define'd in /include/minix/const.h:

#define HZ 60 /* clock freq (software settable on IBM-PC) */
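The calculation on line 42456 can be sketched as below. The function names are hypothetical, `now` stands for the value get_time() would return, and the expiry test assumes that a packet counts as expired once the clock is past its expiration time.

```c
#include <assert.h>

#define HZ 60   /* clock ticks per second, as quoted above */

/* Expiration time in clock ticks: one second more than the packet's
 * ttl, measured from `now`.
 */
static unsigned long exp_time_sketch(unsigned long now, unsigned char ih_ttl)
{
	return now + ((unsigned long)ih_ttl + 1) * HZ;
}

/* Assumed expiry test: stale once the clock has passed exp_time. */
static int expired_sketch(unsigned long now, unsigned long exp_time)
{
	return now > exp_time;
}
```

With UDP's default ttl of 30, a packet that arrives at tick 1000 expires 31 * 60 = 1860 ticks later, at tick 2860.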


get_time()


get_time() returns the number of clock ticks since reboot.

Several of the clients (eth, arp, ip, tcp, and udp) use get_time() to determine an appropriate timeout value for a given operation. For example, the arp code calls get_time() to determine an appropriate amount of time to wait for a response from an arp request before giving up.


0042457 
Determine if the packet has the same destination ip address as the ip address of the ip port on which the packet arrived. This is one of the factors that determine whether a given ip file descriptor accepts the packet. Note that the default setting for an ip file descriptor is to accept both packets sent to the ip port's own address (NWIO_EN_LOC) and broadcast packets (NWIO_EN_BROAD):

#define NWIO_DEFAULT (NWIO_EN_LOC | NWIO_EN_BROAD | NWIO_REMANY | \
NWIO_RWDATALL | NWIO_HDR_O_SPEC)

Ip file descriptors opened by the udp code are also configured to accept both.


0042458          if (ip_hdr->ih_dst == ip_port->ip_ipaddr)
0042459                   ip_pack_stat= NWIO_EN_LOC;
0042460          else
0042461                   ip_pack_stat= NWIO_EN_BROAD;
0042462 
0042463          proto= ip_hdr->ih_proto;
The ih_proto field of the ip header is the protocol of the ip packet. For example, if the packet is a udp packet, ih_proto will be 17. If the packet is a tcp packet, ih_proto will be 6. The protocol of the packet is used to determine which linked lists of ip file descriptors associated with the ip port to search (see line 42472).
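The hash computed on line 42464 keeps only the low-order bits of the protocol number. A minimal sketch, in which the value 16 for IP_PROTO_HASH_NR is an assumption made for illustration (the expression only works as a modulo when the table size is a power of two):

```c
#include <assert.h>

#define IP_PROTO_HASH_NR 16   /* assumed value; must be a power of two */

/* Bucket index for a protocol number: keeps the low-order bits, which
 * equals proto % IP_PROTO_HASH_NR when the size is a power of two.
 */
static int proto_hash(int proto)
{
	return proto & (IP_PROTO_HASH_NR - 1);
}
```

Under this assumed table size, udp (17) and icmp (1) land in the same bucket, which illustrates why line 42475 still compares nwio_proto with the packet's protocol while walking the list.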



0042464          hash= proto & (IP_PROTO_HASH_NR-1);
0042465 
Since the NWIO_SHARED flag of an ip file descriptor is the most difficult to understand, here is a short description.

From ip(4):

"If the NWIO_SHARED flag is set, then multiple channels that all must specify NWIO_SHARED can use the same IP protocol, but incoming packets will be delivered to at most one channel."

An example best demonstrates the NWIO_SHARED flag.

Let's say there are three ip file descriptors (ip_fd_table[2], ip_fd_table[5], and ip_fd_table[9]) that have the same flags set; each is configured to share (i.e., their NWIO_SHARED flag is set) and each of the three ip file descriptors is configured for the udp protocol. The packet will only be delivered to ip_fd_table[9]; the packet will not be delivered to ip_fd_table[2] and ip_fd_table[5].

Let's also say that two other ip file descriptors, ip_fd_table[4] and ip_fd_table[6], have the same flags set with the exception of the NWIO_SHARED flag, which they do not have set. The packet will be delivered to ip_fd_table[4] and ip_fd_table[6] in addition to ip_fd_table[9].


0042466          first_fd= NULL;
0042467          for (i= 0; i<2; i++)
0042468          {
0042469                   share_fd= NULL;
0042470 
0042471                   ip_fd= (i == 0) ? ip_port->ip_proto_any :
0042472                            ip_port->ip_proto[hash];
Search through two of the ip port's linked lists of ip file descriptors. First, search through the linked list of file descriptors that have been configured to accept and send packets of any protocol type (e.g., udp, tcp). Second, search through the linked list associated with the protocol-type of the packet.


0042473                   for (; ip_fd; ip_fd= ip_fd->if_proto_next)
0042474                   {
0042475                            if (i && ip_fd->if_ipopt.nwio_proto != proto)
0042476                                     continue;
For the search of the protocol-specific linked list, continue (i.e., go to the next ip file descriptor) if the protocol of the packet and the protocol of the ip file descriptor are not the same.


0042477                            if (!(ip_fd->if_ipopt.nwio_flags & ip_pack_stat))
0042478                                     continue;
ip_pack_stat was determined on lines 42459-42461 and is either NWIO_EN_LOC or NWIO_EN_BROAD. Continue with the next ip file descriptor if the ip file descriptor is not configured appropriately.


0042479                            if ((ip_fd->if_ipopt.nwio_flags & NWIO_REMSPEC) &&
0042480                                     ip_hdr->ih_src != ip_fd->if_ipopt.nwio_rem)
0042481                            {
0042482                                     continue;
0042483                            }
From ip(4):

"NWIO_REMSPEC can be used to restrict communication to one remote host.
This host is taken from the nwio_rem field. If any remote host is to be
allowed, then NWIO_REMANY can be used."

The default setting for an ip file descriptor is to accept packets from any host:

#define NWIO_DEFAULT (NWIO_EN_LOC | NWIO_EN_BROAD | NWIO_REMANY | \
NWIO_RWDATALL | NWIO_HDR_O_SPEC)


0042484                            if ((ip_fd->if_ipopt.nwio_flags & NWIO_ACC_MASK) ==
0042485                                     NWIO_SHARED)
0042486                            {
0042487                                     if (!share_fd)
0042488                                     {
0042489                                              share_fd= ip_fd;
0042490                                              continue;
0042491                                     }
0042492                                     if (!ip_fd->if_rdbuf_head)
0042493                                              share_fd= ip_fd;
0042494                                     continue;
0042495                            }
From ip(4):

"If the NWIO_SHARED flag is set, then multiple channels that all must specify NWIO_SHARED can use the same IP protocol, but incoming packets will be delivered to at most one channel."

So, for example, if ip_fd_table[2], ip_fd_table[5], and ip_fd_table[9] are configured to share the same protocol and (if the NWIO_REMSPEC flag is configured for the ip file descriptors) the same source address, then the packet will only be delivered to ip_fd_table[9].
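The way lines 42484 - 42495 pick one shared ip file descriptor can be sketched as follows. The reduced descriptor type and the name pick_shared() are hypothetical; the array order stands for the order in which the linked list is walked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced ip file descriptor: only what the choice needs. */
typedef struct fd_sketch
{
	int shared;        /* nonzero if the access mode is NWIO_SHARED */
	int queue_empty;   /* nonzero if if_rdbuf_head is NULL */
} fd_sketch_t;

/* Remember the first shared descriptor, then let any later shared
 * descriptor with an empty read queue replace it. Returns the index
 * of the chosen descriptor, or -1 if none is shared.
 */
static int pick_shared(const fd_sketch_t *fds, size_t n)
{
	int chosen = -1;
	size_t i;

	for (i = 0; i < n; i++)
	{
		if (!fds[i].shared)
			continue;
		if (chosen < 0 || fds[i].queue_empty)
			chosen = (int)i;
	}
	return chosen;
}
```

If every shared descriptor has an empty read queue, the last one visited is chosen; if only the first has an empty queue, the first one is kept.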


0042496                            if (!first_fd)
0042497                            {
0042498                                     first_fd= ip_fd;
0042499                                     continue;
0042500                            }
Note that first_fd is delivered the packet last. This is not an issue since there is no priority among the ip file descriptors.


0042501                            pack->acc_linkC++;
Incrementing acc_linkC of the first accessor forces packet2user() to create a duplicate of the packet.


0042502                            packet2user(ip_fd, pack, exp_time);
packet2user() / ip

packet2user() attempts to pass a packet off to a higher layer (e.g., udp). If the ip file descriptor is not currently being read (i.e., the IFF_READ_IP flag in the if_flags field of the ip file descriptor is not set), packet2user() puts the packet at the end of its read queue and returns. Note that if the udp code opened the ip file descriptor, however, the udp code set the IFF_READ_IP flag during its initialization and that this flag remains set until the ip file descriptor is closed.


0042503 
0042504                   }
0042505                   if (share_fd)
At most two shared ip file descriptors will receive the packet, one ip file descriptor in the ip_proto_any linked list and one ip file descriptor in the ip_proto[hash] linked list that corresponds to the protocol of the incoming packet.


0042506                   {
0042507                            pack->acc_linkC++;
Incrementing acc_linkC of the first accessor forces packet2user() to create a duplicate of the packet.


0042508                            packet2user(share_fd, pack, exp_time);


0042509                   }
0042510          }
At this point in the code, both linked lists, ip_proto_any and ip_proto[hash], have been searched and the only ip file descriptor that hasn't been delivered the packet is the first non-shared ip file descriptor configured to receive the packet (if one exists). Lines 42511 - 42521 attempt to deliver the packet to this ip file descriptor.


0042511          if (first_fd)
0042512          {
If the ip file descriptor's IFF_READ_IP flag is not set, packet2user() places the packet in the ip file descriptor's read queue and returns. If the ip file descriptor's NWIO_RWDATONLY flag is set, packet2user() strips the ip packet's ip header. After these two checks are made, packet2user() calls a protocol-specific function that hands the packet off to the next higher layer (e.g., udp_ip_arrived()). However, if the IFF_READ_IP flag is set and the NWIO_RWDATONLY flag is not set, neither of these two steps applies and there is no point to calling packet2user() instead of directly calling the protocol-specific function.


0042513                   if (first_fd->if_put_pkt &&
0042514                            (first_fd->if_flags & IFF_READ_IP) &&
0042515                            !(first_fd->if_ipopt.nwio_flags & NWIO_RWDATONLY))
If an ip file descriptor's NWIO_RWDATONLY flag is set, the process does not wish to be delivered the ip packet's ip header. The ip packet's ip header will be stripped off by packet2user().
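The three-part condition on lines 42513 - 42515 can be written as a small predicate. The flag values below are placeholders chosen for this sketch and are not the values from the ip headers; can_bypass() is a hypothetical name.

```c
#include <assert.h>

/* Placeholder flag values chosen for this sketch only. */
#define IFF_READ_IP     0x1
#define NWIO_RWDATONLY  0x2

/* Nonzero if ip_port_arrive() may call if_put_pkt directly: a packet
 * handler exists, a read is in progress, and the ip header must not
 * be stripped. In every other case packet2user() has work to do.
 */
static int can_bypass(int has_put_pkt, unsigned if_flags,
	unsigned nwio_flags)
{
	return has_put_pkt
		&& (if_flags & IFF_READ_IP) != 0
		&& (nwio_flags & NWIO_RWDATONLY) == 0;
}
```

The bypass is taken only when all three conditions hold; if any one fails, the packet goes through packet2user(), which queues it or strips its header as needed.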


0042516                   {
0042517                            (*first_fd->if_put_pkt)(first_fd->if_srfd, pack,
0042518                                     ntohs(ip_hdr->ih_length));
Call the protocol-specific function that hands the packet off to the next-higher layer (e.g., udp_ip_arrived()).




udp_ip_arrived()


udp_ip_arrived() is called either (indirectly) from the ip code or from udp_put_data() and is one of the last functions called for a read request from a user process. udp_ip_arrived() does some checks (e.g., a checksum check) and figures out the destination udp file descriptor before calling udp_packet2user() to deliver the packet to the process that requested the read.


0042519                   }
0042520                   else
0042521                            packet2user(first_fd, pack, exp_time);


0042522          }
0042523          else
The packet was not delivered to any non-shared ip file descriptors. Free the packet.


0042524          {
0042525                   if (ip_pack_stat == NWIO_EN_LOC)
0042526                   {
0042527                            DBLOCK(0x01,
0042528                            printf("ip_port_arrive: dropping packet for proto %d\n",
0042529                                     proto));
0042530                   }
0042531                   else
0042532                   {
0042533                            DBLOCK(0x20, printf("dropping packet for proto %d\n",
0042534                                     proto));
0042535                   }
0042536                   bf_afree(pack);
bf_afree()

After a chain of accessors is no longer needed, the chain (and not simply the single accessor passed as the parameter) can be freed by calling bf_afree(). However, if either acc_linkC or buf_linkC of one of the accessors in the linked list is not equal to one (1), the entire chain will not be freed. For example, if bf_afree(acc1) is called for the following chain:

[Figure: accessor chain before the call to bf_afree()]

Then the resulting chain will be:

[Figure: accessor chain after the call to bf_afree()]

bf_afree() returns acc1 (accessors[63]) to acc_freelist (recall that acc_freelist is the linked list of acc_t's without an associated buffer). However, buffers512[127] cannot be freed because acc2 (accessors[64]) still references it.

bf_afree() is called after an accessor's associated data is no longer needed (for example, after a packet has been sent off by the ethernet driver).
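
The link-count behavior described above can be illustrated with a small sketch. This is not the network service's code: struct acc, struct buf, struct released, chain_free(), and demo_shared_buffer() are hypothetical, simplified stand-ins, and free lists are replaced by counters. It only assumes what the text states, namely that an accessor or buffer is released once its link count drops to zero.

```c
#include <stddef.h>

/* Simplified, hypothetical stand-ins for acc_t and buf_t. */
struct buf { int linkC; };
struct acc { int linkC; struct buf *buffer; struct acc *next; };

/* What one call released (for illustration only). */
struct released { int accs; int bufs; };

/* Walk the chain, dropping one reference per accessor. Stop at the
 * first accessor that is still referenced from somewhere else. */
static struct released chain_free(struct acc *a)
{
	struct released r = { 0, 0 };

	while (a != NULL) {
		struct acc *next = a->next;

		if (--a->linkC > 0)
			break;          /* the rest of the chain is shared */
		if (--a->buffer->linkC == 0)
			r.bufs++;       /* last user: buffer can be released */
		a->buffer = NULL;
		r.accs++;               /* accessor returns to the free list */
		a = next;
	}
	return r;
}

/* acc1 and acc2 share one buffer and acc2 has a second user, so freeing
 * the chain at acc1 releases acc1 only and no buffer. */
static struct released demo_shared_buffer(void)
{
	struct buf shared = { 2 };
	struct acc acc2 = { 2, &shared, NULL };
	struct acc acc1 = { 1, &shared, &acc2 };

	return chain_free(&acc1);
}
```

In this sketch demo_shared_buffer() reports one released accessor and no released buffer, which mirrors the acc1/acc2 situation in the example.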


0042537          }
0042538 }
0042539 
0042540 PUBLIC void ip_arrived(ip_port, pack)
0042541 ip_port_t *ip_port;
0042542 acc_t *pack;
ip_arrived()

Depending on the destination ip address of its second parameter,
ip_arrived(ip_port, pack) does one of several things:

1) If the destination ip address is the ip address of the ip port associated with the ethernet port, ip_arrived() calls ip_port_arrive() for the packet.

2) If the destination ip address is the ip address of another ip port, ip_arrived() also calls ip_port_arrive(). This time, however, the first argument passed to ip_port_arrive() is the other port. Note that for this to take place, an input route to the other ip port must exist.

3) If the destination ip address is not the address of another ip port but it is in the same network as another ip port, ip_arrived() sends the packet out the other interface. Again, an input route to the other ip port for this destination must exist.

4) If the destination ip address is not the address of another ip port and it is not in the same network as another ip port, ip_arrived() sends the packet out to the gateway for this network. Again, an input route (including the gateway) to the other ip port for this destination must exist.

5) If the destination ip address is not the ip address of the ip port but an input route for the destination exists and is associated with the same ip port as the packet arrived, an icmp redirect message is sent to the source (provided the source is on the same network) and the packet is then sent. If the source of the ip packet is not on the same network as the ip port, the packet is dropped.

If an ip packet arrives on an ethernet interface, ip_eth_arrived() strips off a packet's ethernet header before handing the packet off to ip_arrived().


0042543 {
0042544          ip_port_t *next_port;
0042545          ip_hdr_t *ip_hdr;
0042546          iroute_t *iroute;
0042547          ipaddr_t dest;
0042548          nettype_t nettype;
0042549          int ip_frag_len, ip_hdr_len;
0042550          size_t pack_size;
0042551          acc_t *tmp_pack;
0042552          int broadcast;
0042553 
0042554          pack_size= bf_bufsize(pack);
bf_bufsize()

bf_bufsize() returns the total buffer size of a linked list of accessors (i.e., the sum of acc_length for the accessors in a linked list).


0042555 
0042556          if (pack_size < IP_MIN_HDR_SIZE)
At this stage, the ip header is still on the packet. Therefore, the packet must be at least as large as a minimal ip header (i.e., IP_MIN_HDR_SIZE bytes, the size of an ip header without options).


0042557          {
0042558                   DBLOCK(1, printf("wrong acc_length\n"));
0042559                   bf_afree(pack);
0042560                   return;
0042561          }
0042562          pack= bf_align(pack, IP_MIN_HDR_SIZE, 4);
bf_align()

If the data is not already packed and aligned, bf_align(acc, size, alignment) packs the first size bytes of acc, a linked list of accessors (i.e., a packet), by calling bf_pack(). This packing ensures that all of the fields of a header can be accessed directly. For example, the ip code aligns a packet's header before accessing the various ip header fields.


0042563          pack= bf_packIffLess(pack, IP_MIN_HDR_SIZE);
This is probably not necessary since bf_align() packs the buffer in addition to aligning the buffer.


bf_packIffLess()


If the first accessor of a linked list of accessors holds fewer than min_len (the second parameter) bytes, bf_packIffLess(pack, min_len) packs the data by calling bf_pack(). The assert on line 42564 checks this property of the first accessor.

bf_packIffLess() is often called to ensure that a packet's header is in a single contiguous buffer so that the individual fields of the header can be easily accessed.


0042564 assert (pack->acc_length >= IP_MIN_HDR_SIZE);
0042565 
0042566          ip_hdr= (ip_hdr_t *)ptr2acc_data(pack);
ptr2acc_data()

The macro ptr2acc_data is #define'd in inet/generic/buf.h as:

#define ptr2acc_data(/* acc_t * */ a) (bf_temporary_acc=(a), \
(&bf_temporary_acc->acc_buffer->buf_data_p[bf_temporary_acc-> \
acc_offset]))

ptr2acc_data() simply returns a pointer to the actual data within an accessor.

ptr2acc_data() is usually called so that the fields of a header (e.g., ip header) can be analyzed.


0042567          ip_hdr_len= (ip_hdr->ih_vers_ihl & IH_IHL_MASK) << 2;
The lower 4 bits of ih_vers_ihl hold the length of the header, including any options, in 32-bit words. Shifting this value left by 2 bit positions multiplies it by 4 and gives the length in bytes. An example of an option is a list of routers that a packet should follow to its destination (i.e., a source route).

The upper 4 bits hold the version number (e.g., 4 for IPv4).
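
As a minimal sketch of the computation on line 42567, the two helpers below split the first byte of an ip header into its two fields. IHL_MASK, hdr_len_bytes(), and ip_version() are hypothetical names chosen for this illustration; IHL_MASK is assumed to have the same value (0x0f) as the IH_IHL_MASK constant used by the network service.

```c
#include <stdint.h>

/* Assumed equivalent of IH_IHL_MASK: the low four bits of the first
 * header byte hold the header length. */
#define IHL_MASK 0x0f

/* Header length in bytes; the 4-bit field counts 32-bit words. */
static unsigned hdr_len_bytes(uint8_t vers_ihl)
{
	return (unsigned)(vers_ihl & IHL_MASK) << 2;
}

/* IP version number, taken from the upper four bits. */
static unsigned ip_version(uint8_t vers_ihl)
{
	return vers_ihl >> 4;
}
```

For the common first byte 0x45, hdr_len_bytes() yields 20 (a header without options) and ip_version() yields 4. The largest possible value, 0x4f, corresponds to a 60-byte header.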


0042568          if (ip_hdr_len>IP_MIN_HDR_SIZE)
The block that preceded this block ensured that the ip header (but not the ip header options) was in one contiguous memory block.

In this case, the ip header has options. Ensure that the ip header and all ip header options are in one contiguous memory block.


0042569          {
0042570                   pack= bf_align(pack, IP_MIN_HDR_SIZE, 4);
0042571                   pack= bf_packIffLess(pack, ip_hdr_len);
0042572                   ip_hdr= (ip_hdr_t *)ptr2acc_data(pack);
0042573          }
0042574          ip_frag_len= ntohs(ip_hdr->ih_length);
htons() / ntohs() / htonl() / ntohl()

From htons(3):

"htons() converts a 16-bit quantity from host byte order to network byte order."

Different CPU architectures order the bytes of a multi-byte value differently. For example, on a "little-endian" machine (an example of which is the Intel CPU), the value 0x1234 is stored in memory as the byte 0x34 followed by the byte 0x12. On a "big-endian" machine, the value 0x1234 is stored in memory as the byte 0x12 followed by the byte 0x34.

It is important that values in a header are sent across a network in a consistent manner independent of the architecture of the sending or receiving system. For this reason, a standard was chosen. The standard chosen was big-endian although it could have just as well been little-endian.

htons() is defined in /include/net/hton.h, as:
#define htons(x) (_tmp=(x), ((_tmp>>8) & 0xff) | ((_tmp<<8) & 0xff00))

ntohs() converts a 16-bit quantity from network byte order to host byte order, the reverse of htons().

htonl() and ntohl() are identical to htons() and ntohs() except that they convert 32-bit quantities instead of 16-bit quantities.

Processes generally supply header information when sending packets. The data in these fields is converted to the network format (i.e., big-endian) by the process before the process copies the data to the network service.
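
The byte swap performed by the macro above can be tried out with the following sketch. swap16() and read_be16() are hypothetical helper names, not part of the network service. swap16() mirrors the macro's arithmetic, which is the right conversion on a little-endian host; read_be16() shows an alternative that assembles a network-order field from raw bytes and therefore gives the same result on any host.

```c
#include <stdint.h>

/* Swap the two bytes of a 16-bit value, as the htons() macro does. */
static uint16_t swap16(uint16_t x)
{
	return (uint16_t)(((x >> 8) & 0xff) | ((x << 8) & 0xff00));
}

/* Read a 16-bit big-endian (network order) field from raw bytes.
 * The result does not depend on the byte order of the host. */
static uint16_t read_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}
```

Applying swap16() twice returns the original value, which is why the same operation can serve for both htons() and ntohs() on a little-endian machine.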


0042575          if (ip_frag_len<pack_size)
If the size of the packet (or fragment) as advertised in the ip header is less than the size of the packet received, trim the packet to the size advertised in the header. This can happen, for example, when a short ip packet arrives in an ethernet frame that was padded to the minimum ethernet frame size.


0042576          {
0042577                   tmp_pack= pack;
0042578                   pack= bf_cut(tmp_pack, 0, ip_frag_len);
0042579                   bf_afree(tmp_pack);
0042580          }
0042581 
0042582          if (!ip_frag_chk(pack))
ip_frag_chk()

ip_frag_chk(pack) performs a few checks on the packet pack, ip_frag_chk()'s only parameter. For example, ip_frag_chk() verifies that the checksum computed over the entire ip header, including the checksum field itself, is zero.

ip_frag_chk() returns 0 on failure.


0042583          {
0042584                   DBLOCK(1, printf("fragment not allright\n"));
0042585                   bf_afree(pack);
0042586                   return;
0042587          }
0042588 
0042589          /* Decide about local delivery or routing. Local delivery can happen
0042590           * when the destination is the local ip address, or one of the
0042591           * broadcast addresses and the packet happens to be delivered
0042592           * point-to-point.
0042593           */
0042594 
Lines 42595-42606 determine whether the destination ip address of the packet is the ip address of the ip port or one of the ip port's broadcast addresses. In both cases, ip_port_arrive() is called. If the destination ip address of the packet is neither the ip address of the ip port nor a broadcast address, the packet must be routed.


0042595          dest= ip_hdr->ih_dst;
0042596 
0042597          if (dest == ip_port->ip_ipaddr)
0042598          {
0042599                   ip_port_arrive (ip_port, pack, ip_hdr);
ip_port_arrive()

For a given packet, ip_port_arrive() finds the ip file descriptors that are interested in the packet. ip_port_arrive() then hands the packet off to the higher layer (e.g., udp layer) by calling either packet2user() or a protocol-specific function (e.g., udp_ip_arrived()).

eth_arrive() is the function analogous to ip_port_arrive() on the ethernet layer. eth_arrive() tries to find interested ethernet file descriptors for a given packet.


0042600                   return;
0042601          }
0042602          if (broadcast_dst(ip_port, dest))
broadcast_dst()

broadcast_dst(ip_port, dest) returns TRUE if dest, the second parameter of broadcast_dst(), is a broadcast address of the ip port ip_port, the first parameter.

Which addresses are broadcast addresses for a given ip port is best shown with an example:

If an ip port has an ip address/subnet pair of 192.168.1.5/255.255.255.252, the following ip addresses are considered to be broadcast addresses:

192.168.1.7 (the broadcast address for the given subnet)
192.168.1.255 (the broadcast address for the address's class C network)
255.255.255.255

If BSD rules apply (i.e., iff IP_42BSD_BCAST is set), the following ip addresses are also considered to be broadcast addresses:

192.168.1.4 (the network address for the given subnet)
192.168.1.0 (the network address for the address's class C network)
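
The rules in the example above can be written down as a short sketch. This is not the code of broadcast_dst(): is_broadcast() and the IP() macro are hypothetical, addresses are held as host-order 32-bit values for readability (the network service keeps them in network byte order), and the classful network mask is passed in explicitly instead of being derived from the address.

```c
#include <stdint.h>

/* Build a host-order address from its four decimal parts. */
#define IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Nonzero if dest is a broadcast address for an interface with address
 * addr, subnet mask submask, and classful network mask netmask.
 * bsd selects the behavior described for IP_42BSD_BCAST. */
static int is_broadcast(uint32_t dest, uint32_t addr, uint32_t submask,
	uint32_t netmask, int bsd)
{
	if (dest == 0xffffffffu)
		return 1;       /* 255.255.255.255 */
	if (dest == ((addr & submask) | ~submask))
		return 1;       /* broadcast address of the subnet */
	if (dest == ((addr & netmask) | ~netmask))
		return 1;       /* broadcast address of the classful network */
	if (bsd && (dest == (addr & submask) || dest == (addr & netmask)))
		return 1;       /* BSD rule: host part is all zeros */
	return 0;
}
```

With addr 192.168.1.5, submask 255.255.255.252, and netmask 255.255.255.0, the sketch accepts 192.168.1.7, 192.168.1.255, and 255.255.255.255, and accepts 192.168.1.4 and 192.168.1.0 only when bsd is nonzero.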


0042603          {
0042604                   ip_port_arrive (ip_port, pack, ip_hdr);
0042605                   return;
0042606          }
0042607 
It is possible that the packet must be routed to another system.


0042608          /* Try to decrement the ttl field with one. */
0042609          if (ip_hdr->ih_ttl < 2)
The packet's ttl would reach zero if it were decremented, so the packet cannot be forwarded. Send an icmp "time exceeded" message to the source of the packet.


0042610          {
0042611                   icmp_snd_time_exceeded(ip_port->ip_port, pack, ICMP_TTL_EXC);
icmp_snd_time_exceeded()

icmp_snd_time_exceeded() sends an ICMP_TYPE_TIME_EXCEEDED icmp message. ICMP_TYPE_TIME_EXCEEDED messages are sent either when a packet's TTL has expired or when some fragments of the original packet were not received in time to be reassembled by the destination system.


0042612                   return;
0042613          }
0042614          ip_hdr->ih_ttl--;
Since the ttl is a hop count and since the packet was not destined for this system but must be sent to another system, decrement the ttl.


0042615          ip_hdr_chksum(ip_hdr, ip_hdr_len);
The TTL has changed. The checksum for the header must reflect this change.

Note that since the ttl is decremented by every router along an ip packet's path to the destination, the checksum of the ip packet must be recalculated by each router through which the packet passes.


ip_hdr_chksum()


ip_hdr_chksum() sets the ih_hdr_chk field of an ip header with the checksum of the ip header (minus the checksum field). This checksum is obtained from oneC_sum().
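
The one's-complement checksum itself can be sketched as follows. This is an illustration of the standard Internet checksum, not the code of oneC_sum() or ip_hdr_chksum(); hdr_checksum() is a hypothetical name, and the sketch reads the header as raw bytes in network order and returns a host-order value.

```c
#include <stddef.h>
#include <stdint.h>

/* One's-complement checksum over len bytes (len is assumed to be even,
 * as an ip header length always is). */
static uint16_t hdr_checksum(const unsigned char *hdr, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i + 1 < len; i += 2)
		sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
	while (sum >> 16)                       /* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)~sum;
}
```

Computed over a header whose checksum field is zero, the function gives the value to store in that field. Computed over a header that already contains a correct checksum, it gives zero, which is the property a receiver checks. Since the ttl is one of the summed bytes, any change to it changes the result.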


0042616 
0042617          /* Avoid routing to bad destinations. */
0042618          nettype= ip_nettype(dest);
ip_nettype()

ip_nettype(ipaddr) returns the network type (which will be of type nettype_t) of ipaddr, the only parameter to ip_nettype().

The nettype_t enum typedef is declared in inet/generic/ip_int.h. Each type's associated ip address range is included in the comments.

typedef enum nettype
{
IPNT_ZERO, /* 0.xx.xx.xx */
IPNT_CLASS_A, /* 1.xx.xx.xx .. 126.xx.xx.xx */
IPNT_LOCAL, /* 127.xx.xx.xx */
IPNT_CLASS_B, /* 128.xx.xx.xx .. 191.xx.xx.xx */
IPNT_CLASS_C, /* 192.xx.xx.xx .. 223.xx.xx.xx */
IPNT_CLASS_D, /* 224.xx.xx.xx .. 239.xx.xx.xx */
IPNT_CLASS_E, /* 240.xx.xx.xx .. 247.xx.xx.xx */
IPNT_MARTIAN, /* 248.xx.xx.xx .. 254.xx.xx.xx + */
IPNT_BROADCAST /* 255.255.255.255 */
} nettype_t;
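
A classification that follows the ranges given in the comments above might look like the sketch below. It is not the code of ip_nettype(): nt_t, the NT_ constants, and classify() are hypothetical names used so that the example compiles on its own, and the address is taken as a host-order 32-bit value.

```c
#include <stdint.h>

/* Local stand-in for nettype_t. */
typedef enum {
	NT_ZERO, NT_CLASS_A, NT_LOCAL, NT_CLASS_B, NT_CLASS_C,
	NT_CLASS_D, NT_CLASS_E, NT_MARTIAN, NT_BROADCAST
} nt_t;

/* Classify a host-order ip address by its first byte. */
static nt_t classify(uint32_t addr)
{
	unsigned first = addr >> 24;

	if (addr == 0xffffffffu) return NT_BROADCAST;   /* 255.255.255.255 */
	if (first == 0)          return NT_ZERO;
	if (first < 127)         return NT_CLASS_A;
	if (first == 127)        return NT_LOCAL;
	if (first < 192)         return NT_CLASS_B;
	if (first < 224)         return NT_CLASS_C;
	if (first < 240)         return NT_CLASS_D;
	if (first < 248)         return NT_CLASS_E;
	return NT_MARTIAN;
}
```

For example, 10.0.0.1 is classified as class A, 192.168.1.5 as class C, and 224.0.0.1 as class D.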



0042619          if (nettype != IPNT_CLASS_A && nettype != IPNT_CLASS_B && nettype !=
0042620                   IPNT_CLASS_C)
Only packets destined for class A, B, or C addresses can be routed.


0042621          {
0042622                   /* Bogus destination address */
0042623                   if (nettype == IPNT_CLASS_D || nettype == IPNT_CLASS_E)
0042624                            bf_afree(pack);
Packets destined for class D and E ip addresses are simply dropped. For all other packets, the source is sent an icmp unreachable packet.

Class D ip addresses are used for multicasting and Class E ip addresses are used in research.


0042625                   else
0042626                   {
0042627                            icmp_snd_unreachable(ip_port->ip_port, pack,
0042628                                     ICMP_HOST_UNRCH);
icmp_snd_unreachable()

icmp_snd_unreachable(port_nr, pack, code) builds an icmp unreachable packet (partially using the ip packet pack, icmp_snd_unreachable()'s second parameter) and then places the icmp unreachable packet in the outgoing queue. Icmp unreachable packets are sent if the network, host, or port number specified by the ip packet pack is unreachable.

The function first calls icmp_err_pack() to build a generic icmp packet, sets the ih_type field of the icmp header to ICMP_TYPE_DST_UNRCH, and recalculates the checksum of the icmp header (since the type and code fields of the icmp header have changed) before placing the packet in the icmp port's write queue.


0042629                   }
0042630                   return;
0042631          }
0042632          iroute= iroute_frag(ip_port->ip_port, dest);
Note that the network service handles two types of routing, output routing and input routing. Output routing handles the routing of outgoing packets. Input routing, on the other hand, handles the routing of packets received on the ethernet or psip ports that are not destined for the local machine.


iroute_frag()


iroute_frag(port_nr, dest) first looks in the input route cache (i.e., iroute_hash_table[][]) for a route for the network to which dest, iroute_frag()'s second parameter, belongs. If the route is not in the cache, iroute_frag() looks for it in the main input routing table (i.e., iroute_table[]) and, if it finds the route there, places the route in the cache.

If iroute_frag() finds the route in neither the cache nor the routing table, it returns NULL.


0042633          if (iroute == NULL || iroute->irt_dist == IRTD_UNREACHABLE)
If a route to the network couldn't be found in the main input routing table or the route to the network was labeled as unreachable, send the source of the packet an icmp unreachable packet.


0042634          {
0042635                   /* Also unreachable */
0042636                   /* Finding out if we send a network unreachable is too much
0042637                    * trouble.
0042638                    */
0042639                   icmp_snd_unreachable(ip_port->ip_port, pack,
0042640                            ICMP_HOST_UNRCH);
0042641                   return;
0042642          }
0042643          next_port= &ip_port_table[iroute->irt_port];
The irt_port field of an input route for a given network holds the ip port to which the packet should be routed.


0042644          if (next_port != ip_port)
0042645          {
0042646                   if (iroute->irt_gateway != 0)
0042647                   {
0042648                            /* Just send the packet to the next gateway */
0042649                            next_port->ip_dev_send(next_port, iroute->irt_gateway,
0042650                                     pack, /* no bradcast */ 0);
Send the packet out the appropriate ip port. If the ip port is associated with an ethernet port (as opposed to a psip port), ip_dev_send for the ip port will be ipeth_send().


ipeth_send()


ipeth_send() is called (indirectly) by ip_send() to send out a packet to a destination address on the same subnet as the ip port from which it is sent or to send out a packet to a broadcast address. ipeth_send() first creates an ethernet header to prepend to the ip packet and then, if there are no packets waiting to be sent out, calls eth_send() in an attempt to send the packet to the ethernet task immediately. If eth_send() is not able to send the ethernet packet immediately, eth_write() is called to queue the packet. If there are already ethernet packets waiting to be sent out, eth_send() and eth_write() are not called and the packet is queued (i.e., placed in the de_q_head queue of the ip port).


0042651                            return;
0042652                   }
If the irt_gateway field of the input route is null, the destination must be on the same network.

To enable this passing of packets between two interfaces (e.g., between two ip ports), the add_route utility can be used:

"The command

add_route -i -g 0.0.0.0 -d 192.31.231.0 -m 1 -n 255.255.255.0 -I /dev/ip0

adds a route to the directly attached ethernet. A packet for 192.31.231.1
that arrives at interface ip2 will be routed to ip0 and ARP will
be used to find the ethernet address associated with 192.31.231.1."


0042653                   /* The packet is for the attached network. Special addresses
0042654                    * are the ip address of the interface and net.0 if
0042655                    * no IP_42BSD_BCAST.
0042656                    */
0042657                   if (dest == next_port->ip_ipaddr)
If the destination ip address of the packet is the ip address of the ip port to which the packet is routed, call ip_port_arrive() to hand the packet off to the ip port. An example of this scenario is if a system has two ethernet ports, ethernet 1 and ethernet 2, and an ip packet arrives on ethernet 1 and the ip packet is destined for the ip address of the ip port associated with ethernet 2. Note that an entry in the input routing table must exist for this routing to occur.


0042658                   {
0042659                            ip_port_arrive (next_port, pack, ip_hdr);
0042660                            return;
0042661                   }
0042662                   if (dest == iroute->irt_dest)
If the destination ip address of the packet is a network address (as opposed to the ip address of a single host), interpret the address as the broadcast address if the network service follows the BSD convention for broadcast addresses (i.e., IP_42BSD_BCAST is TRUE). If the network service does not follow the BSD convention, send the source of the packet an icmp unreachable packet.


0042663                   {
0042664 #if IP_42BSD_BCAST
0042665                            broadcast= 1;
0042666 #else
0042667                            /* Bogus destination address */
0042668                            icmp_snd_dstunrch(pack);
icmp_snd_dstunrch() is not defined in any of the network service's source files. If IP_42BSD_BCAST is not #define'd (or is zero), this branch is compiled and the build will fail.


0042669                            return;
0042670 #endif
0042671                   }
0042672                   else if (dest == (iroute->irt_dest | ~iroute->irt_subnetmask))
0042673                            broadcast= 1;
The packet is destined for the broadcast address of a network. For example, if a network in the input routing table has a network address of 204.140.1.0 and a subnet mask of 255.255.255.0, the broadcast address of the network is 204.140.1.255:

204.140.1.0 | ~(255.255.255.0)
= 204.140.1.0 | 0.0.0.255
= 204.140.1.255

The broadcast variable is used on line 42678.


0042674                   else
0042675                            broadcast= 0;
The packet is a unicast packet.


0042676 
0042677                   /* Just send the packet to it's destination */
0042678                   next_port->ip_dev_send(next_port, dest, pack, broadcast);
Send the packet out the appropriate ip port. If the ip port is associated with an ethernet port (as opposed to a psip port), ip_dev_send for the ip port will be ipeth_send().


0042679                   return;
0042680          }
0042681 
0042682          /* Now we know that the packet should be route over the same network
0042683           * as it came from. If there is a next hop gateway, we can send
0042684           * the packet to that gateway and send a redirect ICMP to the sender
0042685           * if the sender is on the attached network. If there is no gateway
0042686           * complain.
0042687           */
The packet will be sent out the same interface on which it was received. This isn't ideal, since the source could have sent the packet directly to the gateway. If the source is on the attached network, an icmp redirect message is therefore sent to it so that future packets for this destination go to the gateway directly.


0042688          if (iroute->irt_gateway == 0)
If there is no gateway for the input route, free the packet.


0042689          {
0042690 #if !CRAMPED
0042691                   printf("packet should not be here, src=");
0042692                   writeIpAddr(ip_hdr->ih_src);
0042693                   printf(" dst=");
0042694                   writeIpAddr(ip_hdr->ih_dst);
0042695                   printf("\n");
0042696 #endif
0042697                   bf_afree(pack);
0042698                   return;
0042699          }
0042700          if (((ip_hdr->ih_src ^ ip_port->ip_ipaddr) &
0042701                   ip_port->ip_subnetmask) == 0)
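If the source ip address of the packet and the ip address of the ip port are in the same subnet, icmp_snd_redirect() sends the source an icmp redirect message. The packet itself is then sent to the gateway on lines 42722-42723. Because the packet is still needed after the call to icmp_snd_redirect(), acc_linkC is incremented on line 42706; presumably icmp_snd_redirect() releases one reference to the packet when it is done with it.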
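
The test on lines 42700-42701 relies on the fact that the exclusive-or of two addresses has a zero bit wherever the addresses agree. A minimal sketch, using the hypothetical name same_subnet() and host-order addresses:

```c
#include <stdint.h>

/* Nonzero if a and b agree in every bit position covered by mask,
 * i.e., if both addresses lie in the same (sub)network. */
static int same_subnet(uint32_t a, uint32_t b, uint32_t mask)
{
	return ((a ^ b) & mask) == 0;
}
```

With a mask of 255.255.255.0, the addresses 192.168.1.5 and 192.168.1.254 are in the same subnet, while 192.168.1.5 and 192.168.2.5 are not. Because the operations are bitwise, the same expression works unchanged on addresses kept in network byte order, as long as the mask is stored in the same order.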


0042702          {
0042703                   /* Finding out if we can send a network redirect instead of
0042704                    * a host redirect is too much trouble.
0042705                    */
0042706                   pack->acc_linkC++;
0042707                   icmp_snd_redirect(ip_port->ip_port, pack,
0042708                            ICMP_REDIRECT_HOST, iroute->irt_gateway);
icmp_snd_redirect()

icmp_snd_redirect() builds an icmp redirect packet (partially using the ip packet pack, icmp_snd_redirect()'s second parameter) and then places the icmp redirect packet in the outgoing queue. Its fourth argument is the ip address of the gateway to which the source should send future packets for this destination.

The function first calls icmp_err_pack() to build a generic icmp packet, sets the ih_type field of the icmp header to ICMP_TYPE_REDIRECT, and recalculates the checksum of the icmp header (since the type, code, and gateway fields of the icmp header have changed) before placing the icmp packet in the icmp port's write queue.


0042709          }
0042710          else
If the source of the packet was not in the same network as the ip port, discard the packet.


0042711          {
0042712 #if !CRAMPED
0042713                   printf("packet is wrongly routed, src=");
0042714                   writeIpAddr(ip_hdr->ih_src);
0042715                   printf(" dst=");
0042716                   writeIpAddr(ip_hdr->ih_dst);
0042717                   printf("\n");
0042718 #endif
0042719                   bf_afree(pack);
0042720                   return;
0042721          }
0042722          ip_port->ip_dev_send(ip_port, iroute->irt_gateway, pack,
0042723                   /* no broadcast */ 0);
After sending out the icmp redirect message, send the packet out the same interface on which it arrived. If the ip port is associated with an ethernet port (as opposed to a psip port), the ip_dev_send field of the ip port was set to ipeth_send().


0042724 }
0042725 
0042726 PUBLIC void ip_arrived_broadcast(ip_port, pack)
0042727 ip_port_t *ip_port;
0042728 acc_t *pack;
ip_arrived_broadcast()

If a packet arrives on an ethernet interface, ip_arrived_broadcast() is called from ip_eth_arrived() (instead of ip_arrived()) if the arriving ethernet packet has the broadcast ethernet address (i.e., ff:ff:ff:ff:ff:ff). ip_arrived_broadcast() performs some checks that include verifying that the destination ip address (in addition to the destination ethernet address) is the broadcast address.


0042729 {
0042730          ip_hdr_t *ip_hdr;
0042731          int ip_frag_len, ip_hdr_len;
0042732          size_t pack_size;
0042733          acc_t *tmp_pack;
0042734 
0042735          pack_size= bf_bufsize(pack);
bf_bufsize()

bf_bufsize() returns the total buffer size of a linked list of accessors (i.e., the sum of acc_length for the accessors in a linked list).
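
As a rough illustration of the summation described above, the following sketch walks a chain and adds up the lengths. struct chain_node and chain_size() are hypothetical stand-ins for acc_t and bf_bufsize(); only the two fields needed for the sum are modeled, so this is not the network service's actual code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for acc_t. */
struct chain_node {
	size_t length;              /* plays the role of acc_length */
	struct chain_node *next;    /* plays the role of acc_next */
};

/* Return the sum of the lengths of all nodes in the chain. */
static size_t chain_size(const struct chain_node *node)
{
	size_t total = 0;

	for (; node != NULL; node = node->next)
		total += node->length;
	return total;
}

/* Usage: a three-node chain of 14, 20 and 8 bytes holds 42 bytes. */
static size_t chain_size_demo(void)
{
	struct chain_node c = { 8, NULL };
	struct chain_node b = { 20, &c };
	struct chain_node a = { 14, &b };
	size_t total = chain_size(&a);

	assert(total == 42);
	return total;
}
```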



0042736 
0042737          if (pack_size < IP_MIN_HDR_SIZE)
If the ip packet isn't even as big as the minimum ip header, there's a serious problem.


0042738          {
0042739                   DBLOCK(1, printf("wrong acc_length\n"));
0042740                   bf_afree(pack);
0042741                   return;
0042742          }
Lines 42743-42744 pack and align the buffer so that the ip header can be extracted on line 42747.


0042743          pack= bf_align(pack, IP_MIN_HDR_SIZE, 4);
bf_align()

If the data is not already packed and aligned, bf_align(acc, size, alignment) packs the first size bytes of acc (a linked list of accessors, i.e., a packet) by calling bf_pack(). This packing ensures that all of the fields of a header can be accessed easily. For example, the ip code aligns a packet's header before accessing the individual ip header fields.



0042744          pack= bf_packIffLess(pack, IP_MIN_HDR_SIZE);
bf_packIffLess()

If the data in a linked list of accessors is less than min_len (the second parameter), bf_packIffLess(pack, min_len) packs the data by calling bf_pack().

bf_packIffLess() is often called to ensure that a packet's header is in a single contiguous buffer so that the individual fields of the header can be easily accessed.



0042745 assert (pack->acc_length >= IP_MIN_HDR_SIZE);
0042746 
0042747          ip_hdr= (ip_hdr_t *)ptr2acc_data(pack);
ptr2acc_data()

The macro ptr2acc_data is #define'd in inet/generic/buf.h as:

#define ptr2acc_data(/* acc_t * */ a) (bf_temporary_acc=(a), \
(&bf_temporary_acc->acc_buffer->buf_data_p[bf_temporary_acc-> \
acc_offset]))

ptr2acc_data() simply returns a pointer to the actual data within an accessor.

ptr2acc_data() is usually called so that the fields of a header (e.g., ip header) can be analyzed.
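
The pointer arithmetic performed by the macro can be sketched as an ordinary function. The types sk_buf and sk_acc and the function sk_acc_data() are hypothetical and model only the fields the macro touches. The real macro stores its argument in bf_temporary_acc first, presumably so that the argument expression is evaluated only once.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for buf_t and acc_t. */
struct sk_buf {
	char *data_p;               /* role of buf_data_p */
};

struct sk_acc {
	struct sk_buf *buffer;      /* role of acc_buffer */
	size_t offset;              /* role of acc_offset */
};

/* Return a pointer to the first data byte that the accessor refers to:
 * the start of the underlying buffer plus the accessor's offset.
 */
static char *sk_acc_data(struct sk_acc *a)
{
	assert(a != NULL && a->buffer != NULL);
	return &a->buffer->data_p[a->offset];
}
```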


0042748 
0042749          DIFBLOCK(0x20, (ip_hdr->ih_dst & HTONL(0xf0000000)) == HTONL(0xe0000000),
0042750                   printf("got multicast packet\n"));
0042751 
If the ip header includes ip options, it will be larger than the minimum header size and the buffer must be repacked and realigned and the ip header must be extracted again from the buffer.


0042752          ip_hdr_len= (ip_hdr->ih_vers_ihl & IH_IHL_MASK) << 2;
The lower 4 bits of ih_vers_ihl hold the length of the header, including options (if there are any), in units of 32-bit words. Shifting this value left by 2 bit positions (i.e., multiplying it by 4) gives the header length in bytes. An example of an option is a list of routers that a packet should follow to its destination.

The upper four bits of ih_vers_ihl are the version number (e.g., 4 for IPv4).
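
A small sketch of how the two halves of this byte can be pulled apart. SK_IHL_MASK is an assumed value modeled on IH_IHL_MASK, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value: the low nibble of the first header byte. The real
 * IH_IHL_MASK is defined in the network service's own headers.
 */
#define SK_IHL_MASK 0x0f

/* Header length in bytes: the 4-bit word count times 4. */
static unsigned sk_hdr_len(uint8_t vers_ihl)
{
	return (unsigned)(vers_ihl & SK_IHL_MASK) << 2;
}

/* ip version: the upper 4 bits. */
static unsigned sk_version(uint8_t vers_ihl)
{
	return (unsigned)vers_ihl >> 4;
}

/* Usage: 0x45 is the first byte of an IPv4 header without options. */
static void sk_vers_ihl_demo(void)
{
	assert(sk_version(0x45) == 4);
	assert(sk_hdr_len(0x45) == 20);    /* 5 words, no options */
	assert(sk_hdr_len(0x46) == 24);    /* 6 words, 4 bytes of options */
}
```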



0042753          if (ip_hdr_len>IP_MIN_HDR_SIZE)
0042754          {
0042755                   pack= bf_align(pack, IP_MIN_HDR_SIZE, 4);


0042756                   pack= bf_packIffLess(pack, ip_hdr_len);


0042757                   ip_hdr= (ip_hdr_t *)ptr2acc_data(pack);


0042758          }
0042759          ip_frag_len= ntohs(ip_hdr->ih_length);
ih_length is the length of the entire ip packet, including the ip header. If this total length is less than the length of the buffer (line 42760), cut the buffer to size (lines 42761-42765).


htons() / ntohs() / htonl() / ntohl()


From htons(3):

"htons() converts a 16-bit quantity from host byte order to network byte order."

Different CPU architectures order the bytes of a multi-byte value differently. For example, on a "little-endian" machine (an example of which is the Intel CPU), the value 0x1234 is stored in memory as the bytes 0x34 0x12. However, on a "big-endian" machine, the value 0x1234 is stored in memory as the bytes 0x12 0x34.

It is important that values in a header are sent across a network in a consistent manner independent of the architecture of the sending or receiving system. For this reason, a standard was chosen. The standard chosen was big-endian although it could have just as well been little-endian.

htons() is defined in /include/net/hton.h, as:
#define htons(x) (_tmp=(x), ((_tmp>>8) & 0xff) | ((_tmp<<8) & 0xff00))

ntohs() converts a 16-bit quantity from network byte order to host byte order, the reverse of htons().

htonl() and ntohl() are identical to htons() and ntohs() except that they convert 32-bit quantities instead of 16-bit quantities.

Processes generally supply header information when sending packets. The data in these fields is converted to the network format (i.e., big-endian) by the process before the process copies the data to the network service.
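
The byte swap can be sketched as follows. sk_swap16() mirrors the expression in the macro above without the _tmp temporary, and sk_load_be16() is a hypothetical helper that reads a 16-bit network-order field one byte at a time, which gives the same result on a host of either byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Exchange the two bytes of a 16-bit value. */
static uint16_t sk_swap16(uint16_t x)
{
	return (uint16_t)(((x >> 8) & 0xff) | ((x << 8) & 0xff00));
}

/* Read a big-endian (network order) 16-bit field from memory. */
static uint16_t sk_load_be16(const unsigned char *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

/* Usage: swapping twice restores the original value. */
static void sk_swap16_demo(void)
{
	const unsigned char wire[2] = { 0x12, 0x34 };

	assert(sk_swap16(0x1234) == 0x3412);
	assert(sk_swap16(sk_swap16(0xbeef)) == 0xbeef);
	assert(sk_load_be16(wire) == 0x1234);
}
```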


0042760          if (ip_frag_len<pack_size)
If the packet length advertised by its header is less than the size of the packet received, the packet is trimmed to reflect the length advertised by the header.


0042761          {
0042762                   tmp_pack= pack;
0042763                   pack= bf_cut(tmp_pack, 0, ip_frag_len);
bf_cut()

If a section of a linked list needs to be duplicated, bf_cut(data, offset, length) is called. For example, if a section of length 50 starting at an offset of 75 of the linked list below needs to be duplicated, bf_cut(data, 75, 50) is called:

[Figure: the original linked list and the new linked list returned by bf_cut(data, 75, 50)]

Note that the original linked list remains unchanged and that acc_linkC for all the accessors in the new linked list is one.

If length (the third parameter) is zero, bf_cut() simply duplicates the first accessor in the linked list but sets acc_length=0 and acc_next=null. In other words, it creates a linked list consisting of a single accessor whose acc_length is 0.

bf_cut() is used in a number of scenarios, including cutting a received ethernet packet to size.




0042764                   bf_afree(tmp_pack);
bf_afree()

After a chain of accessors is no longer needed, the chain (and not simply the single accessor passed as the parameter) can be freed by calling bf_afree(). However, if either acc_linkC or buf_linkC of one of the accessors in the linked list is not equal to one (1), the entire chain will not be freed. For example, if bf_afree(acc1) is called for the following chain:

[Figure: accessor chain before the call to bf_afree(acc1)]

Then the resulting chain will be:

[Figure: accessor chain after the call to bf_afree(acc1)]

bf_afree() returns acc1 (accessors[63]) to acc_freelist (recall that acc_freelist is the linked list of acc_t's without an associated buffer). However, buffers512[127] cannot be freed because acc2 (accessors[64]) still references it.

bf_afree() is called after an accessor's associated data is no longer needed (for example, after a packet has been sent off by the ethernet driver).


0042765          }
0042766 
0042767          if (!ip_frag_chk(pack))
ip_frag_chk()

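ip_frag_chk(pack) performs a few checks on the packet pack, ip_frag_chk()'s only parameter. For example, ip_frag_chk() verifies the checksum of the ip header (i.e., that the one's-complement checksum computed over the entire header, including the stored checksum field, is zero).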

ip_frag_chk() returns 0 on failure.
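
The checksum test mentioned above can be sketched as follows. This is a minimal sketch of the standard ip header checksum verification, not the code of ip_frag_chk() itself. The names sk_ones_sum() and sk_hdr_checksum_ok() are hypothetical, and the header is read byte by byte so that the result does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One's-complement sum of len bytes, taken as big-endian 16-bit words.
 * len must be even, which always holds for an ip header.
 */
static uint16_t sk_ones_sum(const unsigned char *p, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	assert(len % 2 == 0);
	for (i = 0; i < len; i += 2)
		sum += ((uint32_t)p[i] << 8) | p[i + 1];
	while (sum >> 16)                   /* fold the carries back in */
		sum = (sum & 0xffff) + (sum >> 16);
	return (uint16_t)sum;
}

/* A header is intact if the complement of the sum over the whole
 * header, stored checksum field included, is zero.
 */
static int sk_hdr_checksum_ok(const unsigned char *hdr, size_t hdr_len)
{
	return (uint16_t)~sk_ones_sum(hdr, hdr_len) == 0;
}
```

Summing the stored checksum together with the rest of the header means the receiver never has to zero the checksum field and recompute it. A single comparison against zero is enough.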


0042768          {
0042769                   DBLOCK(1, printf("fragment not allright\n"));
0042770                   bf_afree(pack);


0042771                   return;
0042772          }
0042773 
0042774          if (!broadcast_dst(ip_port, ip_hdr->ih_dst))
Verify that the ip address of the packet is a broadcast address. The ethernet address of the ethernet packet was already determined to be a broadcast address.


broadcast_dst()


broadcast_dst(ip_port, dest) returns TRUE if dest, the second parameter of broadcast_dst(), is a broadcast address for the ip port ip_port, the first parameter.

Which addresses are broadcast addresses for a given ip port is best shown with an example:

If an ip port has an ip address/subnet pair of 192.168.1.5/255.255.255.252, the following ip addresses are considered to be broadcast addresses:

192.168.1.7 (the broadcast address for the given subnet)
192.168.1.255 (the broadcast address for the address's class C network)
255.255.255.255

If BSD rules apply (i.e., iff IP_42BSD_BCAST is set), the following ip addresses are also considered to be broadcast addresses:

192.168.1.4 (the network address for the given subnet)
192.168.1.0 (the network address for the address's class C network)


0042775          {
0042776 #if !CRAMPED
0042777                   printf(
0042778                   "ip[%d]: broadcast packet for ip-nonbroadcast addr, src=",
0042779                            ip_port->ip_port);
0042780                   writeIpAddr(ip_hdr->ih_src);
0042781                   printf(" dst=");
0042782                   writeIpAddr(ip_hdr->ih_dst);
0042783                   printf("\n");
0042784 #endif
0042785                   bf_afree(pack);
0042786                   return;
0042787          }
0042788 
0042789          ip_port_arrive (ip_port, pack, ip_hdr);
ip_port_arrive()

For a given packet, ip_port_arrive() finds the ip file descriptors that are interested in the packet. ip_port_arrive() then hands the packet off to the higher layer (e.g., udp layer) by calling either packet2user() or a protocol-specific function (e.g., udp_ip_arrived()).

eth_arrive() is the function analogous to ip_port_arrive() on the ethernet layer. eth_arrive() tries to find interested ethernet file descriptors for a given packet.


0042790 }
0042791 
0042792 PRIVATE int broadcast_dst(ip_port, dest)
0042793 ip_port_t *ip_port;
0042794 ipaddr_t dest;
broadcast_dst()

broadcast_dst(ip_port, dest) returns TRUE if dest, the second parameter of broadcast_dst(), is a broadcast address for the ip port ip_port, the first parameter.

Which addresses are broadcast addresses for a given ip port is best shown with an example:

If an ip port has an ip address/subnet pair of 192.168.1.5/255.255.255.252, the following ip addresses are considered to be broadcast addresses:

192.168.1.7 (the broadcast address for the given subnet)
192.168.1.255 (the broadcast address for the address's class C network)
255.255.255.255

If BSD rules apply (i.e., iff IP_42BSD_BCAST is set), the following ip addresses are also considered to be broadcast addresses:

192.168.1.4 (the network address for the given subnet)
192.168.1.0 (the network address for the address's class C network)


0042795 {
0042796          /* Treat class D (multicast) address as broadcasts. */
0042797          if ((dest & HTONL(0xF0000000)) == HTONL(0xE0000000))
If the first four bits of the destination address are "1110", the address is a multicast address. In other words, the addresses in the range "224.0.0.0" through "239.255.255.255" (i.e., "Class D" addresses) are multicast addresses.

Multicast addresses are treated here as broadcast addresses.
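
The class D test can be tried out in isolation with the sketch below. It assumes the address is held in host byte order, which is why no HTONL() appears, and sk_is_multicast() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Return 1 if the top four bits of a host-order address are 1110. */
static int sk_is_multicast(uint32_t addr)
{
	return (addr & 0xF0000000u) == 0xE0000000u;
}

/* Usage: the boundaries of the class D range. */
static void sk_is_multicast_demo(void)
{
	assert(!sk_is_multicast(0xDFFFFFFFu));  /* 223.255.255.255 */
	assert(sk_is_multicast(0xE0000000u));   /* 224.0.0.0 */
	assert(sk_is_multicast(0xEFFFFFFFu));   /* 239.255.255.255 */
	assert(!sk_is_multicast(0xF0000000u));  /* 240.0.0.0 */
}
```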




0042798          {
0042799                   return 1;
0042800          }
0042801 
0042802          /* Accept without complaint if netmask not yet configured. */
0042803          if (!(ip_port->ip_flags & IPF_NETMASKSET))
0042804          {
0042805                   return 1;
0042806          }
0042807 
0042808          if (((ip_port->ip_ipaddr ^ dest) & ip_port->ip_netmask) != 0)
The netmask is simply a reflection of the class to which the ip address belongs. For example, if the ip address is 194.77.33.5, then it is a class C address and its netmask is therefore 255.255.255.0. See ip_nettype() for more information.

Determine if the destination ip address and the ip address of the ip port are in the same network. If the two addresses are part of the same network, the condition below will be false. For example, 192.168.1.1 and 192.168.1.255 are in the same network (see calculation below). Note that 192.168.1.x is a class C network - in other words, the netmask is 255.255.255.0.

(192.168.1.1 ^ 192.168.1.255) & 255.255.255.0 =
0.0.0.254 & 255.255.255.0 = 0 (and therefore the condition is false)

(note that "^" is the exclusive-or (XOR) operator)

If the two addresses are not part of the same class network, determine whether the address is 255.255.255.255 or 0.0.0.0 (if BSD rules are followed).
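
The XOR-and-mask comparison can be sketched on its own. Addresses are assumed to be in host byte order, and SK_IP() and sk_same_network() are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Build a host-order address from dotted-quad components. */
#define SK_IP(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/* Two addresses are in the same (sub)network if they agree in every
 * bit position selected by the mask, i.e., if the XOR of the two
 * addresses has no bit set under the mask.
 */
static int sk_same_network(uint32_t a, uint32_t b, uint32_t mask)
{
	return ((a ^ b) & mask) == 0;
}

/* Usage: the calculation from the text, plus a counterexample. */
static void sk_same_network_demo(void)
{
	uint32_t class_c = SK_IP(255, 255, 255, 0);

	assert(sk_same_network(SK_IP(192,168,1,1), SK_IP(192,168,1,255), class_c));
	assert(!sk_same_network(SK_IP(192,168,1,1), SK_IP(192,168,2,1), class_c));
}
```

Passing the subnet mask instead of the class-based netmask turns the same function into the subnet comparison made on line 42819.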


0042809          {
0042810                   /* Two possibilities, 0 (iff IP_42BSD_BCAST) and -1 */
0042811                   if (dest == HTONL((ipaddr_t)-1))
If the address is 255.255.255.255, this is considered to be a broadcast packet.

ipaddr_t is declared in include/net/gen/in.h:

typedef u32_t ipaddr_t;

Casting (-1) to an ipaddr_t produces a value of 0xffffffff (255.255.255.255).
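
The cast can be verified with a short sketch. sk_ipaddr_t is a hypothetical stand-in for ipaddr_t, and sk_swap32() is a hypothetical 32-bit byte swap. The sketch also shows that the all-ones and all-zeros values read the same in either byte order, so a byte swap leaves these two constants unchanged.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t sk_ipaddr_t;   /* stand-in for ipaddr_t (u32_t) */

/* Reverse the four bytes of a 32-bit value. */
static sk_ipaddr_t sk_swap32(sk_ipaddr_t x)
{
	return ((x >> 24) & 0x000000ffu) | ((x >> 8) & 0x0000ff00u) |
	       ((x << 8) & 0x00ff0000u) | ((x << 24) & 0xff000000u);
}

/* Usage: -1 converts to all ones; swapping does not change it. */
static void sk_all_ones_demo(void)
{
	sk_ipaddr_t all_ones = (sk_ipaddr_t)-1;

	assert(all_ones == 0xffffffffu);
	assert(sk_swap32(all_ones) == all_ones);
	assert(sk_swap32(0) == 0);
}
```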




0042812                            return 1;
0042813 #if IP_42BSD_BCAST
0042814                   if (dest == HTONL((ipaddr_t)0))
If BSD rules apply, ip address 0.0.0.0 is also a broadcast address.


0042815                            return 1;
0042816 #endif
0042817                   return 0;
0042818          }
0042819          if (((ip_port->ip_ipaddr ^ dest) & ip_port->ip_subnetmask) != 0)
Using the subnet mask this time (instead of the netmask - see line 42808), determine if the destination address is in the same (sub)network as the ip address of the ip port. Again, the condition below will be false if the two addresses are part of the same network (or, actually, subnet in this case).


0042820          {
0042821                   /* Two possibilities, netwerk.0 (iff IP_42BSD_BCAST) and
0042822                    * netwerk.-1
0042823                    */
0042824                   if ((dest & ~ip_port->ip_netmask) == ~ip_port->ip_netmask)
0042825                            return 1;
Accept "network.-1" as a broadcast address. For example, 192.168.1.255 (which is a class C address and therefore has a netmask of 255.255.255.0) is accepted as a broadcast address whether or not BSD rules apply (this test lies outside the #if IP_42BSD_BCAST block). Note that this will be a broadcast address even if the subnet mask is not 255.255.255.0. In my (Andrew's) opinion, this is a little odd.


0042826 #if IP_42BSD_BCAST
0042827                   if ((dest & ~ip_port->ip_netmask) == 0)
0042828                            return 1;
If BSD rules apply, accept "network.0" as a broadcast address. For example, 192.168.1.0 (which is a class C address and therefore has a netmask of 255.255.255.0) is a broadcast address if BSD rules apply.


0042829 #endif
0042830                   return 0;
0042831          }
0042832 
0042833          /* Two possibilities, netwerk.subnet.0 (iff IP_42BSD_BCAST) and
0042834           * netwerk.subnet.-1
0042835           */
0042836          if ((dest & ~ip_port->ip_subnetmask) == ~ip_port->ip_subnetmask)
0042837                   return 1;
This is the common case of the broadcast address. If the destination address is 192.168.1.7 and the ip address/subnet mask pair is 192.168.1.5/255.255.255.252, the destination address is the broadcast address:

192.168.1.7 & ~(255.255.255.252) = 192.168.1.7 & 0.0.0.3 =
0.0.0.3 ?= ~(255.255.255.252) = 0.0.0.3

The condition is true and therefore 192.168.1.7 is the broadcast address for 192.168.1.5/255.255.255.252.

Note that 192.168.1.3 and 192.168.1.11 will not be broadcast addresses for 192.168.1.5/255.255.255.252.


0042838 #if IP_42BSD_BCAST
0042839          if ((dest & ~ip_port->ip_subnetmask) == 0)
0042840                   return 1;
If BSD rules apply, accept "network.subnet.0". In other words (and using the example above), 192.168.1.4 will be accepted as a broadcast address if 192.168.1.5/255.255.255.252 is the ip address/subnetmask for the ip port.


0042841 #endif
0042842          return 0;
0042843 }
0042844 
0042845 void ip_process_loopb(ev, arg)
0042846 event_t *ev;
0042847 ev_arg_t arg;
ip_process_loopb()

If a packet is either sent (i.e., written) to the ip address of the ip port out of which the packet is sent or sent to the loopback address, ip_send() calls ev_enqueue() with its second argument set to a reference to ip_process_loopb(). The next time that the event queue is processed, ip_process_loopb() will be called.

ip_process_loopb() calls ip_arrived() for every ip packet in the ip port's ip_loopb_head queue.


0042848 {
0042849          ip_port_t *ip_port;
0042850          acc_t *pack;
0042851 
0042852          ip_port= arg.ev_ptr;
0042853          assert(ev == &ip_port->ip_loopb_event);
0042854 
0042855          while(pack= ip_port->ip_loopb_head)
0042856          {
0042857                   ip_port->ip_loopb_head= pack->acc_ext_link;
The packets in the ip port's ip_loopb_head queue are linked together by their acc_ext_link field.
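
The loop on lines 42855-42859 is a plain "pop from the head until empty" traversal. A minimal sketch, using the hypothetical types sk_pkt and sk_queue and a callback in place of ip_arrived():

```c
#include <assert.h>
#include <stddef.h>

struct sk_pkt {
	int id;
	struct sk_pkt *ext_link;    /* role of acc_ext_link */
};

struct sk_queue {
	struct sk_pkt *head;        /* role of ip_loopb_head */
};

/* Pop every packet off the queue, in order, and hand it to deliver().
 * The head pointer is advanced before the callback runs, because the
 * callback may free or re-link the packet. Returns the packet count.
 */
static int sk_drain(struct sk_queue *q, void (*deliver)(struct sk_pkt *))
{
	struct sk_pkt *pkt;
	int n = 0;

	assert(q != NULL && deliver != NULL);
	while ((pkt = q->head) != NULL) {
		q->head = pkt->ext_link;
		deliver(pkt);
		n++;
	}
	return n;
}

/* Example callback: remembers the id of the last packet delivered. */
static int sk_last_id = -1;

static void sk_remember(struct sk_pkt *pkt)
{
	sk_last_id = pkt->id;
}
```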



0042858                   ip_arrived(ip_port, pack);
ip_arrived()

Depending on the destination ip address of its second parameter, ip_arrived(ip_port, pack) does one of several things:

1) If the destination ip address is the ip address of the ip port associated with the ethernet port, ip_arrived() calls ip_port_arrive() for the packet.

2) If the destination ip address is the ip address of another ip port, ip_arrived() also calls ip_port_arrive(). This time, however, the first argument passed to ip_port_arrive() is the other port. Note that for this to take place, an input route to the other ip port must exist.

3) If the destination ip address is not the address of another ip port but it is in the same network as another ip port, ip_arrived() sends the packet out the other interface. Again, an input route to the other ip port for this destination must exist.

4) If the destination ip address is not the address of another ip port and it is not in the same network as another ip port, ip_arrived() sends the packet out to the gateway for this network. Again, an input route (including the gateway) to the other ip port for this destination must exist.

5) If the destination ip address is not the ip address of the ip port but an input route for the destination exists and is associated with the same ip port as the packet arrived, an icmp redirect message is sent to the source (provided the source is on the same network) and the packet is then sent. If the source of the ip packet is not on the same network as the ip port, the packet is dropped.

If an ip packet arrives on an ethernet interface, ip_eth_arrived() strips off the packet's ethernet header before handing the packet off to ip_arrived().



0042859          }
0042860 }
0042861 
0042862 /*
0042863  * $PchId: ip_read.c,v 1.9 1997/01/31 08:51:39 philip Exp $
0042864  */