One of the decisions the CM needs to make is the granularity at which
a macroflow is constructed, by deciding which streams belong to the
same macroflow and share congestion information. The API provides
two functions that allow applications to decide which of their
streams ought to belong to the same macroflow.
cm_getmacroflow(i32 cm_streamid) returns the unique i32 identifier
of the macroflow to which the stream cm_streamid belongs.
cm_setmacroflow(i32 cm_macroflowid, i32 cm_streamid) sets the
macroflow of the stream cm_streamid to cm_macroflowid. If the
cm_macroflowid passed to cm_setmacroflow() is -1, a new macroflow is
constructed and its identifier is returned to the caller. Each call
to cm_setmacroflow() overrides the previous macroflow association for
the stream, should one exist.
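The following C sketch illustrates the semantics of the two calls. It assumes a hypothetical fixed-size table (MAX_STREAMS slots) indexed by stream identifier; the table layout, the sequential allocation of new macroflow identifiers, and returning -1 for an unknown stream are assumptions of this sketch, not part of the API definition.

```c
#include <stdint.h>

typedef int32_t i32;

#define MAX_STREAMS 64  /* hypothetical fixed number of stream slots */

/* Hypothetical per-stream bookkeeping. */
static i32 stream_macroflow[MAX_STREAMS]; /* macroflow of each stream */
static int stream_assigned[MAX_STREAMS];  /* nonzero once associated */
static i32 next_macroflow_id = 0;         /* next identifier to allocate */

/* Returns the macroflow identifier of cm_streamid, or -1 if the stream
 * is out of range or has no macroflow yet. */
i32 cm_getmacroflow(i32 cm_streamid)
{
    if (cm_streamid < 0 || cm_streamid >= MAX_STREAMS ||
        !stream_assigned[cm_streamid])
        return -1;
    return stream_macroflow[cm_streamid];
}

/* Moves cm_streamid into cm_macroflowid, replacing any earlier
 * association. A cm_macroflowid of -1 allocates a new macroflow.
 * Returns the resulting macroflow identifier (-1 for a bad stream). */
i32 cm_setmacroflow(i32 cm_macroflowid, i32 cm_streamid)
{
    if (cm_streamid < 0 || cm_streamid >= MAX_STREAMS)
        return -1;
    if (cm_macroflowid == -1)
        cm_macroflowid = next_macroflow_id++;
    stream_macroflow[cm_streamid] = cm_macroflowid;
    stream_assigned[cm_streamid] = 1;
    return cm_macroflowid;
}
```

For example, calling cm_setmacroflow(-1, 1) and then cm_setmacroflow(id, 2) with the returned id places streams 1 and 2 in the same macroflow; a later cm_setmacroflow(-1, 2) moves stream 2 into a new macroflow of its own while stream 1 stays where it was.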
The default suggested aggregation method is to aggregate by
destination IP address; i.e., all streams to the same destination
address are aggregated to a single macroflow by default. The
cm_getmacroflow() and cm_setmacroflow() calls can then be used to
change this as needed. Note, however, that this default may not be
optimal in some cases, even over best-effort networks. For example,
when a group of receivers is behind a NAT device, the sender sees
them all as one address. If the hosts behind the NAT are in fact
connected over different bottleneck links, some of those hosts could
see worse performance than they would without aggregation. It is
possible to detect such hosts using delay and loss estimates,
although the specific mechanisms for doing so are beyond the scope
of this document.
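A minimal sketch of the default aggregation policy follows, assuming IPv4 destinations held as 32-bit integers and a hypothetical helper, default_macroflow(), that the CM might consult when a stream is opened. Because every host behind a NAT shares one public address, all of them map to the same macroflow here, which is exactly the case in which an application may want to call cm_setmacroflow() to split them.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_DESTS 32  /* hypothetical table capacity */

/* One entry per destination address seen so far. */
struct dest_entry {
    uint32_t dst_addr;   /* IPv4 destination address */
    int32_t macroflow;   /* macroflow shared by all streams to dst_addr */
};

static struct dest_entry dest_table[MAX_DESTS];
static size_t dest_count = 0;
static int32_t next_default_id = 0;

/* Hypothetical helper: returns the default macroflow for a new stream to
 * dst_addr, creating one on first use. Returns -1 if the table is full. */
int32_t default_macroflow(uint32_t dst_addr)
{
    for (size_t i = 0; i < dest_count; i++)
        if (dest_table[i].dst_addr == dst_addr)
            return dest_table[i].macroflow;   /* same destination: share */

    if (dest_count == MAX_DESTS)
        return -1;

    dest_table[dest_count].dst_addr = dst_addr;
    dest_table[dest_count].macroflow = next_default_id++;
    return dest_table[dest_count++].macroflow;
}
```

A linear scan keeps the sketch short; a real implementation would more likely use a hash table keyed by address.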
The objective of this interface is to set up sharing among groups of
streams, not to set the sharing policy (the relative weights) of
streams within a macroflow. The latter requires the scheduler to
provide an interface for setting sharing policy. However, because we
want to support many different schedulers (each of which may need
different information to set policy), we do not specify a complete
API to the scheduler (but see Section 5.2). A later guideline
document is expected to describe a few simple schedulers (e.g.,
weighted round-robin, hierarchical scheduling) and the API they
export to provide relative prioritization.
5. CM internals
This section describes the internal components of the CM. It
includes a Congestion Controller and a Scheduler, each of which
exports a well-defined, abstract interface.
5.1 Congestion controller
Associated with each macroflow is a congestion control algorithm; the
collection of all these algorithms comprises the congestion
controller of the CM. The control algorithm decides when and how
much data can be transmitted by a macroflow. It uses application
notifications (Section 4.3) from concurrent streams on the same
macroflow to build up information about the congestion state of the
network path used by the macroflow.
The congestion controller MUST implement a "TCP-friendly" [Mahdavi98]
congestion control algorithm. Several macroflows MAY (and indeed,
often will) use the same congestion control algorithm, but each
macroflow maintains its own state about the network path used by its
streams.
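Sharing an algorithm while keeping state separate can be pictured as follows. In this hedged C sketch, a single read-only descriptor holding the algorithm's code is referenced by several macroflows, each of which carries its own private state. The structure names, the on_loss hook, and the 1460-byte floor are illustrative assumptions.

```c
#include <stdint.h>

/* Hypothetical per-macroflow state: every macroflow has its own copy. */
struct cc_state {
    uint32_t cwnd;   /* congestion window in bytes */
};

/* Hypothetical algorithm descriptor: one instance may be shared. */
struct cc_algorithm {
    const char *name;
    void (*on_loss)(struct cc_state *st);
};

/* Halve the window on loss, but never below one 1460-byte segment. */
static void aimd_on_loss(struct cc_state *st)
{
    st->cwnd = st->cwnd / 2 > 1460 ? st->cwnd / 2 : 1460;
}

static const struct cc_algorithm aimd = { "aimd", aimd_on_loss };

/* A macroflow pairs shared algorithm code with private path state. */
struct macroflow {
    const struct cc_algorithm *alg;
    struct cc_state state;
};
```

A loss reported on one macroflow changes only that macroflow's state, even though both run the same code.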
The congestion control module MUST implement the following abstract
interfaces. We emphasize that these are not directly visible to
applications; they are within the context of a macroflow, and are
different from the CM API functions of Section 4.
- void query(u64 *rate, u32 *srtt, u32 *rttdev): This function
returns the estimated rate (in bits per second), the smoothed
round trip time, and the round trip time deviation (both in
microseconds) for the macroflow.
- void notify(u32 nsent): This function MUST be used to notify the
congestion control module whenever data is sent by an
application. The nsent parameter indicates the number of bytes
just sent by the application.
- void update(u32 nsent, u32 nrecd, u32 rtt, u32 lossmode): This
function is called whenever any of the CM streams associated with
a macroflow identifies that data has reached the receiver or has
been lost en route. The nrecd parameter indicates the number of
bytes that have just arrived at the receiver. The nsent
parameter is the sum of the number of bytes just received and the
number of bytes identified as lost en route. The rtt parameter is
the estimated round trip time in microseconds during the
transfer. The lossmode parameter provides an indicator of how a
loss was detected (Section 4.3).
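To make the three calls concrete, here is a toy sketch of one possible module: a TCP-like, window-based AIMD (additive-increase, multiplicative-decrease) controller with an RFC 6298-style RTT estimator. Everything beyond the three call names is an assumption: the cc_ prefix, the explicit struct macroflow_cc pointer (the abstract calls take no macroflow argument because they run in the context of one), the 1460-byte MSS, and the LOSS_* constants, which stand in for the lossmode values defined in Section 4.3. It is not claimed to be a complete or verified TCP-friendly algorithm.

```c
#include <stdint.h>

typedef uint32_t u32;
typedef uint64_t u64;

/* Hypothetical loss indicators; Section 4.3 defines the real lossmode values. */
enum { LOSS_NONE = 0, LOSS_TRANSIENT = 1, LOSS_PERSISTENT = 2 };

#define MSS 1460u  /* assumed segment size in bytes */

struct macroflow_cc {
    u32 cwnd;      /* congestion window, bytes */
    u32 ssthresh;  /* slow-start threshold, bytes */
    u32 inflight;  /* bytes sent but not yet reported received or lost */
    u32 srtt;      /* smoothed RTT, microseconds (0 = no sample yet) */
    u32 rttdev;    /* smoothed RTT deviation, microseconds */
};

void cc_init(struct macroflow_cc *cc)
{
    cc->cwnd = 2 * MSS;
    cc->ssthresh = UINT32_MAX;
    cc->inflight = 0;
    cc->srtt = 0;
    cc->rttdev = 0;
}

/* query(): rate estimate is one window per smoothed RTT, in bits/s. */
void cc_query(const struct macroflow_cc *cc, u64 *rate, u32 *srtt, u32 *rttdev)
{
    *rate = cc->srtt ? (u64)cc->cwnd * 8u * 1000000u / cc->srtt : 0;
    *srtt = cc->srtt;
    *rttdev = cc->rttdev;
}

/* notify(): an application just transmitted nsent bytes. */
void cc_notify(struct macroflow_cc *cc, u32 nsent)
{
    cc->inflight += nsent;
}

/* update(): nsent bytes left the network (nrecd received, nsent - nrecd lost). */
void cc_update(struct macroflow_cc *cc, u32 nsent, u32 nrecd, u32 rtt, u32 lossmode)
{
    cc->inflight = nsent > cc->inflight ? 0 : cc->inflight - nsent;

    if (rtt > 0) {  /* EWMA with gains 1/4 (deviation) and 1/8 (mean) */
        if (cc->srtt == 0) {
            cc->srtt = rtt;
            cc->rttdev = rtt / 2;
        } else {
            u32 err = rtt > cc->srtt ? rtt - cc->srtt : cc->srtt - rtt;
            cc->rttdev = cc->rttdev - cc->rttdev / 4 + err / 4;
            cc->srtt = cc->srtt - cc->srtt / 8 + rtt / 8;
        }
    }

    if (lossmode == LOSS_PERSISTENT) {        /* e.g. timeout: restart */
        cc->ssthresh = cc->cwnd / 2 > 2 * MSS ? cc->cwnd / 2 : 2 * MSS;
        cc->cwnd = MSS;
    } else if (lossmode == LOSS_TRANSIENT) {  /* multiplicative decrease */
        cc->ssthresh = cc->cwnd / 2 > 2 * MSS ? cc->cwnd / 2 : 2 * MSS;
        cc->cwnd = cc->ssthresh;
    } else if (cc->cwnd < cc->ssthresh) {     /* slow start */
        cc->cwnd += nrecd;
    } else {                                  /* ~one MSS per window */
        cc->cwnd += (u32)((u64)MSS * nrecd / cc->cwnd);
    }
}
```

Note that update() uses nsent (bytes received plus bytes lost) to drain the in-flight count but grows the window only by nrecd, so lost bytes never earn the macroflow additional sending credit.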