Testing Session Initiation Protocol (SIP)
Effects, caveats, and issues encountered in testing SIP servers
Introduction
The SIP (session initiation protocol) signaling protocol is emerging as the industry’s preferred Voice over IP (VoIP) technology. As such, SIP’s scalability is key to VoIP technology’s continued adoption, and the need to test SIP user agents and servers under heavily saturated scenarios is becoming more acute. SIP servers must be able to bear heavy traffic loads for user registration and call setup functions that require reading from and writing to one or more large dynamic database(s). The corresponding test tools, call generators, terminators and analyzers must all be capable of processing the same significant loads.
The Session Initiation Protocol (SIP) is a signaling protocol, widely used for controlling multimedia communication sessions such as voice and video calls over Internet Protocol (IP). The protocol can be used for creating, modifying and terminating two-party (unicast) or multiparty (multicast) sessions consisting of one or several media streams. The modification can involve changing addresses or ports, inviting more participants, adding or deleting media streams, etc. Other feasible application examples include video conferencing, streaming multimedia distribution, instant messaging, presence information and online games.
SIP was originally designed by Henning Schulzrinne and Mark Handley starting in 1996. The latest version of the specification is RFC 3261 from the IETF Network Working Group. In November 2000, SIP was accepted as a 3GPP signaling protocol and permanent element of the IP Multimedia Subsystem (IMS) architecture for IP-based streaming multimedia services in cellular systems.
The SIP protocol is a TCP/IP-based Application Layer protocol. SIP is designed to be independent of the underlying transport layer; it can run on Transmission Control Protocol (TCP), User Datagram Protocol (UDP), or Stream Control Transmission Protocol (SCTP). It is a text-based protocol, incorporating many elements of the Hypertext Transfer Protocol (HTTP) and the Simple Mail Transfer Protocol (SMTP), allowing for direct inspection by administrators.
Parameters (port numbers, protocols, codecs) for these media streams are defined and negotiated using the Session Description Protocol (SDP) which is transported in the SIP packet body.
A motivating goal for SIP was to provide a signaling and call setup protocol for IP-based communications that can support a superset of the call processing functions and features present in the public switched telephone network (PSTN). SIP by itself does not define these features; rather, its focus is call-setup and signaling. However, it was designed to enable the construction of functionalities of network elements designated proxy servers and user agents. These are features that permit familiar telephone-like operations: dialing a number, causing a phone to ring, hearing ringback tones or a busy signal. Implementation and terminology are different in the SIP world but to the end-user, the behavior is similar.
SIP-enabled telephony networks can also implement many of the more advanced call processing features present in Signaling System 7 (SS7), though the two protocols themselves are very different. SS7 is a centralized protocol, characterized by a complex central network architecture and dumb endpoints (traditional telephone handsets). SIP is a peer-to-peer protocol, thus it requires only a simple (and thus scalable) core network with intelligence distributed to the network edge, embedded in endpoints (terminating devices built in either hardware or software). SIP features are implemented in the communicating endpoints (i.e. at the edge of the network) contrary to traditional SS7 features, which are implemented in the network.
Although several other VoIP signaling protocols exist, SIP is distinguished by its proponents for having roots in the IP community rather than the telecommunications industry. SIP has been standardized and governed primarily by the IETF, while other protocols, such as H.323, have traditionally been associated with the International Telecommunication Union (ITU).
The first proposed standard version (SIP 2.0) was defined by RFC 2543. This version of the protocol was further refined and clarified in RFC 3261, although some implementations are still relying on the older definitions.
SIP network elements
A SIP user agent (UA) is a logical network end-point used to create or receive SIP messages and thereby manage a SIP session. A SIP UA can perform the role of a User Agent Client (UAC), which sends SIP requests, and the User Agent Server (UAS), which receives the requests and returns a SIP response. These roles of UAC and UAS only last for the duration of a SIP transaction.
A SIP phone is a hardware-based or software-based SIP user agent, that provides call functions such as dial, answer, reject, hold/unhold, and call transfer.[6][7] Examples include softphones such as Ekiga, KPhone, Twinkle, Windows Live Messenger, X-Lite, and hardware phones from vendors such as Avaya, Cisco, Leadtek, Polycom, Snom, and Nokia.
Each resource of a SIP network, such as a User Agent or a voicemail box, is identified by a Uniform Resource Identifier (URI), based on the general standard syntax[8] also used in Web services and e-mail. A typical SIP URI is of the form: sip:username:password@host:port. The URI scheme used for SIP is sip:. If secure transmission is required, the scheme sips: is used and SIP messages must be transported over Transport Layer Security (TLS).
In SIP, as in HTTP, the User Agent may identify itself using a message header field 'User-Agent', containing a text description of the software/hardware/product involved. The User-Agent field is sent in request messages, which means that the receiving SIP server can see this information. SIP network elements sometimes store this information, and it can be useful in diagnosing SIP compatibility problems.
SIP also defines server network elements. Although two SIP endpoints can communicate without any intervening SIP infrastructure, which is why the protocol is described as peer-to-peer, this approach is often impractical for a public service.
RFC 3261 defines these server elements:
A proxy server " is an intermediary entity that acts as both a server and a client for the purpose of making requests on behalf of other clients. A proxy server primarily plays the role of routing, which means its job is to ensure that a request is sent to another entity "closer" to the targeted user. Proxies are also useful for enforcing policy (for example, making sure a user is allowed to make a call). A proxy interprets, and, if necessary, rewrites specific parts of a request message before forwarding it."
"A registrar is a server that accepts REGISTER requests and places the information it receives in those requests into the location service for the domain it handles."
"A redirect server is a user agent server that generates 3xx responses to requests it receives, directing the client to contact an alternate set of URIs.The redirect server allows SIP Proxy Servers to direct SIP session invitations to external domains." The RFC specifies: "It is an important concept that the distinction between types of SIP servers is logical, not physical."
SIP Testing - An 0verview
Protocol design
SIP employs design elements similar to the HTTP request/response transaction model.[5] Each transaction consists of a client request that invokes a particular method or function on the server and at least one response. SIP reuses most of the header fields, encoding rules and status codes of HTTP, providing a readable text-based format.
SIP works in concert with several other protocols and is only involved in the signaling portion of a communication session. SIP clients typically use TCP or UDP on port numbers 5060 and/or 5061 to connect to SIP servers and other SIP endpoints. Port 5060 is commonly used for non-encrypted signaling traffic whereas port 5061 is typically used for traffic encrypted with Transport Layer Security (TLS). The voice and video stream communications in SIP applications are carried over another application protocol, the Real-time Transport Protocol (RTP).