One of the decisions the CM needs to make is the granularity at which a macroflow is constructed, by deciding which streams belong to the same macroflow and share congestion information. The API provides two functions that allow applications to decide which of their streams ought to belong to the same macroflow.

cm_getmacroflow(i32 cm_streamid) returns a unique i32 macroflow identifier. cm_setmacroflow(i32 cm_macroflowid, i32 cm_streamid) sets the macroflow of the stream cm_streamid to cm_macroflowid. If the cm_macroflowid that is passed to cm_setmacroflow() is -1, then a new macroflow is constructed and this is returned to the caller. Each call to cm_setmacroflow() overrides the previous macroflow association for the stream, should one exist.

The default suggested aggregation method is to aggregate by destination IP address; i.e., all streams to the same destination address are aggregated to a single macroflow by default. The cm_getmacroflow() and cm_setmacroflow() calls can then be used to change this as needed. We do note that there are some cases where this may not be optimal, even over best-effort networks. For example, when a group of receivers are behind a NAT device, the sender will see them all as one address. If the hosts behind the NAT are in fact connected over different bottleneck links, some of those hosts could see worse performance than before. It is possible to detect such hosts when using delay and loss estimates, although the specific mechanisms for doing so are beyond the scope of this document.

The objective of this interface is to set up sharing of groups not sharing policy of relative weights of streams in a macroflow. The latter requires the scheduler to provide an interface to set sharing policy. However, because we want to support many different schedulers (each of which may need different information to set policy), we do not specify a complete API to the scheduler (but see Section 5.2). A later guideline document is expected to describe a few simple schedulers (e.g., weighted round-robin, hierarchical scheduling) and the API they export to provide relative prioritization.

4. CM internals
This section describes the internal components of the CM. It includes a Congestion Controller and a Scheduler, with well-defined, abstract interfaces exported by them.

4.1 Congestion controller
Associated with each macroflow is a congestion control algorithm; the collection of all these algorithms comprises the congestion controller of the CM. The control algorithm decides when and how much data can be transmitted by a macroflow. It uses application notifications (Section 4.3) from concurrent streams on the same macroflow to build up information about the congestion state of the network path used by the macroflow.

The congestion controller MUST implement a "TCP-friendly" [Mahdavi98] congestion control algorithm. Several macroflows MAY (and indeed, often will) use the same congestion control algorithm but each macroflow maintains state about the network used by its streams.

The congestion control module MUST implement the following abstract interfaces. We emphasize that these are not directly visible to applications; they are within the context of a macroflow, and are different from the CM API functions of Section 4.

- void query(u64 *rate, u32 *srtt, u32 *rttdev): This function returns the estimated rate (in bits per second) and smoothed round trip time (in microseconds) for the macroflow.

- void notify(u32 nsent): This function MUST be used to notify the congestion control module whenever data is sent by an application. The nsent parameter indicates the number of bytes just sent by the application.

- void update(u32 nsent, u32 nrecd, u32 rtt, u32 lossmode): This function is called whenever any of the CM streams associated with a macroflow identifies that data has reached the receiver or has been lost en route. The nrecd parameter indicates the number of bytes that have just arrived at the receiver. The nsent parameter is the sum of the number of bytes just received and the number of bytes identified as lost en route. The rtt parameter is the estimated round trip time in microseconds during the transfer. The lossmode parameter provides an indicator of how a loss was detected (section 4.3).


kazza