242 changes: 242 additions & 0 deletions Packet++/header/SipLayer.h
@@ -2,6 +2,9 @@

#include "TextBasedProtocol.h"

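#include <cstring>
#include <algorithm>
#include <cctype>   // std::tolower, std::isdigit used by the heuristic helpers
#include <iterator> // std::distance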

/// @file

/// @namespace pcpp
@@ -114,6 +117,128 @@ namespace pcpp
return port == 5060 || port == 5061;
}

/// Heuristically detects whether the first line of a buffer looks like a SIP
/// Request-Line or Status-Line (RFC 3261).
///
/// Line is parsed as:
/// token1 SP(space1) token2 SP(space2) token3
///
/// SIP Request-Line:
/// token1 = Method
/// token2 = Request-URI (must contain ':')
/// token3 = SIP-Version (starts with "SIP/")
/// Example: INVITE sip:alice@example.com SIP/2.0
///
/// SIP Status-Line:
/// token1 = SIP-Version (starts with "SIP/")
/// token2 = Status-Code (3 digits)
/// token3 = Reason-Phrase
/// Example: SIP/2.0 200 OK
///
/// RFC References:
/// From section 4.1 of RFC 2543:
/// Request-Line = Method SP Request-URI SP SIP-Version CRLF
///
/// From section 5.1 of RFC 2543:
/// Status-Line = SIP-Version SP Status-Code SP Reason-Phrase CRLF
///
/// From section 7.1 of RFC 3261:
/// Unlike HTTP, SIP treats the version number as a literal string.
/// In practice, this should make no difference.
///
/// @param[in] data Pointer to the raw data buffer
/// @param[in] dataLen Length of the data buffer in bytes
/// @return True if the first line matches SIP request/response syntax, false otherwise
static bool dissectSipHeuristic(const uint8_t* data, size_t dataLen)
Owner

We already have SipRequestFirstLine and SipResponseFirstLine, which parse the first line. Maybe we could use them instead of adding more logic to parse the first line?

Author

I agree that we already have SipRequestFirstLine and SipResponseFirstLine for parsing the first line, but I think keeping the heuristic logic separate is still necessary because the goals are different.

The Sip*FirstLine classes assume we already decided that the payload is SIP, operate on a Sip*Layer instance, update internal state (m_IsComplete, offsets, logging, etc.), and are meant for full parsing.

In contrast, dissectSipHeuristic() is a stateless, side-effect-free check that runs directly on raw data to answer a simpler question: “does this buffer look like a SIP message at all?”. This also matches Wireshark’s design, where heuristic detection is separate from the actual SIP dissector that parses the first line and fields.
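The following is a minimal sketch of what "stateless, side-effect-free" means in practice. It assumes C++17 and rewrites the same first-line checks as a free function over `std::string_view`. The name `looksLikeSip` is hypothetical. This is an illustrative port, not the code in this PR. The function reads only the bytes it is given and touches no layer, packet, or logging state.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical free-function port of the first-line checks. It inspects only
// the bytes passed in: no layer, packet, or logger state is read or written.
bool looksLikeSip(const std::uint8_t* data, std::size_t dataLen)
{
    if (data == nullptr || dataLen == 0)
        return false;

    std::string_view buf(reinterpret_cast<const char*>(data), dataLen);

    // The first line ends at the first CR or LF, or at the end of the buffer.
    std::string_view line = buf.substr(0, buf.find_first_of("\r\n"));

    // Returns the next space-delimited token at or after pos (skipping leading
    // spaces) and advances pos past it; returns an empty view if none is left.
    auto nextToken = [&line](std::size_t& pos) -> std::string_view {
        pos = line.find_first_not_of(' ', pos);
        if (pos == std::string_view::npos)
            return {};
        std::size_t end = line.find(' ', pos);
        std::string_view tok = line.substr(pos, end - pos);
        pos = (end == std::string_view::npos) ? line.size() : end;
        return tok;
    };

    // Case-insensitive "SIP/" prefix check, like startsWithSipVersion().
    auto startsWithSipVersion = [](std::string_view tok) {
        constexpr std::string_view prefix = "SIP/";
        return tok.size() >= prefix.size() &&
               std::equal(prefix.begin(), prefix.end(), tok.begin(), [](char a, char b) {
                   return std::tolower(static_cast<unsigned char>(a)) ==
                          std::tolower(static_cast<unsigned char>(b));
               });
    };

    std::size_t pos = 0;
    const std::string_view tok1 = nextToken(pos);
    const std::string_view tok2 = nextToken(pos);
    const std::string_view tok3 = nextToken(pos); // first word of token3 is enough here
    if (tok1.empty() || tok2.empty() || tok3.empty())
        return false;

    // Status-Line: SIP-Version SP Status-Code SP Reason-Phrase
    if (startsWithSipVersion(tok1))
        return tok2.size() == 3 && std::all_of(tok2.begin(), tok2.end(), [](unsigned char c) {
                   return std::isdigit(c) != 0;
               });

    // Request-Line: Method SP Request-URI SP SIP-Version. The URI needs a ':'
    // after its first character (e.g. "sip:"), mirroring hasColonInRange().
    return tok2.size() >= 3 && tok2.find(':', 1) != std::string_view::npos &&
           startsWithSipVersion(tok3);
}
```

Because its only input is the buffer, the check can be unit-tested with plain string literals. It can also be called from any dispatch point without constructing a layer.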

This separation is particularly important for TCP: when we inspect data per segment, the first line may be incomplete. In that case the heuristic must be able to say “need more data / undecided” without constructing SIP layers or marking anything as invalid, which is a different lifecycle than the existing first-line parsers.
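One way to express "need more data / undecided" is to return a three-valued result instead of a `bool`. The sketch below is hypothetical: `SipHeuristicResult`, `classifySegment`, and the 1024-byte cap are illustrative and are not PcapPlusPlus API. Note that `findFirstLine()` in this diff returns the whole buffer length when no CR/LF is present, so a truncated first line is currently judged as if it were complete. The sketch reports `NeedMoreData` in that case instead.

```cpp
#include <cstddef>
#include <string_view>

// Hypothetical three-valued result; not part of PcapPlusPlus.
enum class SipHeuristicResult
{
    NotSip,       // first line is complete and does not look like SIP
    MaybeSip,     // first line is complete and looks like SIP
    NeedMoreData  // no CR/LF seen yet, so the first line may be cut off
};

// Decides only whether the first line is complete, then hands that line to a
// caller-supplied predicate (for example, a port of dissectSipHeuristic()'s
// token checks). maxFirstLineLen is an assumed cap so a caller does not wait
// forever on a stream that never sends CR/LF.
template <typename LineCheck>
SipHeuristicResult classifySegment(std::string_view segment, LineCheck&& looksLikeSipLine,
                                   std::size_t maxFirstLineLen = 1024)
{
    const std::size_t eol = segment.find_first_of("\r\n");
    if (eol == std::string_view::npos)
        return segment.size() >= maxFirstLineLen ? SipHeuristicResult::NotSip
                                                  : SipHeuristicResult::NeedMoreData;

    return looksLikeSipLine(segment.substr(0, eol)) ? SipHeuristicResult::MaybeSip
                                                    : SipHeuristicResult::NotSip;
}
```

Keeping the line-completeness decision separate from the token checks lets the same predicate serve both UDP (single datagram) and a future TCP path that buffers segments.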

In this pull request I’m not yet handling TCP segmentation or IP fragmentation — the heuristic currently assumes it sees at least one complete first line. I plan to address proper TCP stream reassembly / IP fragmentation handling in a separate follow-up PR.

Owner

> I agree that we already have SipRequestFirstLine and SipResponseFirstLine for parsing the first line, but I think keeping the heuristic logic separate is still necessary because the goals are different.
>
> The Sip*FirstLine classes assume we already decided that the payload is SIP, operate on a Sip*Layer instance, update internal state (m_IsComplete, offsets, logging, etc.), and are meant for full parsing.
>
> In contrast, dissectSipHeuristic() is a stateless, side-effect-free check that runs directly on raw data to answer a simpler question: “does this buffer look like a SIP message at all?”. This also matches Wireshark’s design, where heuristic detection is separate from the actual SIP dissector that parses the first line and fields.

I just noticed that the Sip*FirstLine constructors require a request/response pointer, so those classes can't be used directly. However, they do contain static methods such as parseStatusCode(), parseVersion() and parseMethod() that can definitely be used. If we still end up with a lot of code shared between these classes and the parsing logic you need, we can think about the best way to refactor them so they can be used in both scenarios.
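Here is a sketch of the refactoring idea. It assumes the shared parsing is pulled out into free functions that both a stateful first-line parser and the stateless heuristic can call. Every name here (`sip_detail`, `isSipVersion`, `parseStatusCode`, `looksLikeStatusLine`, `StatusLineParser`) is hypothetical. They only mirror the roles of the existing static helpers and are not PcapPlusPlus's API.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical shared helpers: pure functions over one token, no layer state.
namespace sip_detail
{
    // True if tok starts with "SIP/" (case-insensitive).
    inline bool isSipVersion(std::string_view tok)
    {
        constexpr std::string_view prefix = "SIP/";
        if (tok.size() < prefix.size())
            return false;
        for (std::size_t i = 0; i < prefix.size(); ++i)
            if (std::toupper(static_cast<unsigned char>(tok[i])) != prefix[i])
                return false;
        return true;
    }

    // Numeric status code, or -1 if tok is not exactly three decimal digits.
    inline int parseStatusCode(std::string_view tok)
    {
        if (tok.size() != 3)
            return -1;
        int code = 0;
        for (char c : tok)
        {
            if (!std::isdigit(static_cast<unsigned char>(c)))
                return -1;
            code = code * 10 + (c - '0');
        }
        return code;
    }
} // namespace sip_detail

// Stateless caller, in the role of the heuristic: only a yes/no answer.
inline bool looksLikeStatusLine(std::string_view version, std::string_view code)
{
    return sip_detail::isSipVersion(version) && sip_detail::parseStatusCode(code) != -1;
}

// Stateful caller, in the role of a first-line parser: stores what it parsed.
class StatusLineParser
{
public:
    bool parse(std::string_view version, std::string_view code)
    {
        m_StatusCode = sip_detail::parseStatusCode(code);
        m_IsComplete = sip_detail::isSipVersion(version) && m_StatusCode != -1;
        return m_IsComplete;
    }
    int statusCode() const { return m_StatusCode; }
    bool isComplete() const { return m_IsComplete; }

private:
    int m_StatusCode = -1;
    bool m_IsComplete = false;
};
```

The split keeps a single definition of each token rule. Only the callers differ in whether they record state.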

> In this pull request I’m not yet handling TCP segmentation or IP fragmentation: the heuristic currently assumes it sees at least one complete first line. I plan to address proper TCP stream reassembly / IP fragmentation handling in a separate follow-up PR.

Handling TCP segmentation or IP fragmentation is trickier: PcapPlusPlus parses packets one by one, and there is currently no built-in way to run TcpReassembly or IPReassembly and then parse the reassembled message again as a packet.

Author

Thanks for the suggestion!

I refactored SipLayer::dissectSipHeuristic() to use the static parsing helpers from SipResponseFirstLine and SipRequestFirstLine instead of manually tokenizing the first line.

For responses I'm now using parseVersion() and parseStatusCode(), and for requests I'm using parseMethod(), parseVersion() and parseUri(). This removes the duplicated parsing logic and keeps the heuristic in sync with the actual SIP first-line parsers.

I didn't use the Sip*FirstLine constructors themselves, as they still require a request/response pointer as you mentioned.

Author

You’re absolutely right that handling TCP segmentation and IP fragmentation is more complex. Since PcapPlusPlus currently processes packets one by one, in this PR I focused only on heuristic detection on the first line of a single packet. My plan is to add built-in IP fragmentation support to PcapPlusPlus itself in a separate PR, and I’m really excited to work on that.

{
if (!data || dataLen == 0)
{
return false;
}

int firstLineLen = findFirstLine(data, dataLen);
if (firstLineLen <= 0)
{
return false;
}

const char* line = reinterpret_cast<const char*>(data);
const int len = firstLineLen;

// --- Extract first three tokens from the first line ---
int token1_start = 0;
token1_start = skipSpaces(line, token1_start, len);
if (token1_start >= len)
{
return false;
}

int space1 = findSpace(line, token1_start, len);
if (space1 == -1 || space1 == token1_start)
{
return false;
}

int token1_len = space1 - token1_start;

int token2_start = skipSpaces(line, space1 + 1, len);
if (token2_start >= len)
{
return false;
}

int space2 = findSpace(line, token2_start, len);
if (space2 == -1)
{
return false;
}

int token2_len = space2 - token2_start;

int token3_start = skipSpaces(line, space2 + 1, len);
if (token3_start >= len)
{
return false;
}

int token3_len = len - token3_start;

const char* token1 = line + token1_start;
const char* token2 = line + token2_start;
const char* token3 = line + token3_start;

// --- Check if it's a SIP response line: "SIP/x.y SP nnn SP Reason" ---
if (startsWithSipVersion(token1, static_cast<size_t>(token1_len)))
{
// second token must be a 3-digit status code
return isThreeDigitCode(token2, static_cast<size_t>(token2_len));
}

// --- Check if it's a SIP request line: "METHOD SP URI SP SIP/x.y" ---
if (token2_len < 3)
{
return false;
}

if (!hasColonInRange(line,
static_cast<size_t>(token2_start + 1),
static_cast<size_t>(space2)))
{
return false;
}

if (!startsWithSipVersion(token3, static_cast<size_t>(token3_len)))
{
return false;
}

return true;
}


protected:
SipLayer(uint8_t* data, size_t dataLen, Layer* prevLayer, Packet* packet, ProtocolType protocol)
: TextBasedProtocolMessage(data, dataLen, prevLayer, packet, protocol)
@@ -137,6 +262,123 @@ namespace pcpp
{
return true;
}

private:
/// Finds the length of the first line in the buffer.
/// This method scans the input data for the first occurrence of '\r' or '\n',
/// marking the end of the first line. If no such character exists, the returned
/// value will be equal to the buffer length. Returns -1 if the buffer is null
/// or empty.
/// @param[in] data Pointer to the raw data buffer
/// @param[in] dataLen Length of the data buffer in bytes
/// @return The number of bytes until the first CR/LF, or -1 on invalid input
static int findFirstLine(const uint8_t* data, size_t dataLen)
{
if (!data || dataLen == 0)
return -1;

const char* start = reinterpret_cast<const char*>(data);
const char* end = start + dataLen;

// Find CR or LF
auto it = std::find_if(start, end, [](char c)
{
return c == '\r' || c == '\n';
});

return static_cast<int>(std::distance(start, it));
}

/// Checks whether a buffer starts with the SIP version prefix "SIP/".
/// Comparison is case-insensitive and requires the input length to be at least
/// the size of the prefix.
/// @param[in] s Pointer to the buffer to examine
/// @param[in] len Number of bytes available in the buffer
/// @return True if the buffer begins with "SIP/" (case-insensitive), false otherwise
static bool startsWithSipVersion(const char* s, size_t len)
{
constexpr char prefix[] = "SIP/";
constexpr std::size_t prefixLen = sizeof(prefix) - 1;

if (len < prefixLen)
return false;

return std::equal(
prefix, prefix + prefixLen, s,
[](char a, char b)
{
return std::tolower(static_cast<unsigned char>(a)) ==
std::tolower(static_cast<unsigned char>(b));
}
);
}

/// Determines whether a buffer of length 3 contains only numeric digits.
/// This is primarily used to validate SIP response status codes, which must
/// always be 3-digit numeric values.
/// @param[in] s Pointer to the buffer to check
/// @param[in] len Must be exactly 3 to return true
/// @return True if all three characters are decimal digits, false otherwise
static bool isThreeDigitCode(const char* s, size_t len)
{
if (len != 3)
return false;

return std::all_of(s, s + 3, [](unsigned char ch)
{
return std::isdigit(ch) != 0;
});
}

/// Checks for the presence of a colon (':') within a specific range of a string.
/// This is used to validate that a SIP Request-URI contains a scheme (e.g., sip:),
/// which is required for proper SIP request-line syntax.
/// @param[in] s Pointer to the string to search
/// @param[in] begin Starting index of the range (inclusive)
/// @param[in] end Ending index of the range (exclusive)
/// @return True if a ':' character exists within the specified range, false otherwise
static bool hasColonInRange(const char* s, size_t begin, size_t end)
{
const char* first = s + begin;
const char* last = s + end;

return std::find(first, last, ':') != last;
}

/// Finds the first space (' ') character in the string starting from a given index.
/// The search is limited to the range [start, len). If no space is found, -1 is returned.
/// @param[in] s Pointer to the string to search
/// @param[in] start Index from which to start scanning
/// @param[in] len Total valid length of the string
/// @return The index of the first space, or -1 if not found
static int findSpace(const char* s, int start, int len)
{
const char* begin = s + start;
const char* end = s + len;

auto it = std::find(begin, end, ' ');
if (it == end)
return -1;

return static_cast<int>(std::distance(s, it));
}

/// Skips space (' ') characters starting from a given index.
/// The scan is limited to the range [start, len).
/// @param[in] s Pointer to the string to scan
/// @param[in] start Index from which to start scanning
/// @param[in] len Total valid length of the string
/// @return The index of the first non-space character, or len if only spaces remain
static int skipSpaces(const char* s, int start, int len)
{
const char* begin = s + start;
const char* end = s + len;
const char* it = std::find_if(begin, end, [](unsigned char ch)
{
return ch != ' ';
});
return static_cast<int>(std::distance(s, it));
}
};

class SipRequestFirstLine;
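The index helpers above are easy to get off by one, and `skipSpaces()` carries a doc comment copied from `findSpace()`. A few checks pin down their actual contracts. The sketch below copies `findFirstLine()`, `findSpace()` and `skipSpaces()` out of the class as free functions so they compile on their own. The logic is the same as in the diff, but this harness is illustrative and not part of the PR.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <iterator>

// Standalone copies of the private index helpers from SipLayer above.

// Bytes before the first CR/LF (the whole length if none); -1 on null/empty input.
static int findFirstLine(const std::uint8_t* data, std::size_t dataLen)
{
    if (!data || dataLen == 0)
        return -1;
    const char* start = reinterpret_cast<const char*>(data);
    const char* end = start + dataLen;
    auto it = std::find_if(start, end, [](char c) { return c == '\r' || c == '\n'; });
    return static_cast<int>(std::distance(start, it));
}

// Index of the first ' ' in [start, len), or -1 if there is none.
static int findSpace(const char* s, int start, int len)
{
    const char* end = s + len;
    auto it = std::find(s + start, end, ' ');
    return it == end ? -1 : static_cast<int>(std::distance(s, it));
}

// Index of the first non-space in [start, len), or len if only spaces remain.
static int skipSpaces(const char* s, int start, int len)
{
    const char* it = std::find_if(s + start, s + len, [](unsigned char ch) { return ch != ' '; });
    return static_cast<int>(std::distance(s, it));
}
```

Because `skipSpaces()` returns `len` rather than -1 when only spaces remain, `dissectSipHeuristic()` compares its result with `>= len`. It compares the result of `findSpace()` with `-1` instead.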
5 changes: 4 additions & 1 deletion Packet++/src/UdpLayer.cpp
@@ -108,7 +108,10 @@ namespace pcpp
else if (DnsLayer::isDataValid(udpData, udpDataLen) &&
(DnsLayer::isDnsPort(portDst) || DnsLayer::isDnsPort(portSrc)))
m_NextLayer = new DnsLayer(udpData, udpDataLen, this, m_Packet);
-else if (SipLayer::isSipPort(portDst) || SipLayer::isSipPort(portSrc))
+else if (SipLayer::isSipPort(portDst) ||
+SipLayer::isSipPort(portSrc) ||
+SipLayer::dissectSipHeuristic(udpData, udpDataLen)
+)
{
if (SipRequestFirstLine::parseMethod((char*)udpData, udpDataLen) != SipRequestLayer::SipMethodUnknown)
m_NextLayer = new SipRequestLayer(udpData, udpDataLen, this, m_Packet);
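With the UdpLayer change, the heuristic becomes a fallback. Because `||` short-circuits, `dissectSipHeuristic()` runs only when neither port is 5060/5061, and only after the DNS branch has not matched. The sketch below models just that ordering, with function pointers standing in for the real checks. `Classifiers`, `NextLayer` and `chooseNextLayer` are hypothetical names. The request-versus-response choice that follows in the real code is not modeled.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-ins for DnsLayer::isDataValid/isDnsPort and
// SipLayer::isSipPort/dissectSipHeuristic; only the call order is modeled.
struct Classifiers
{
    bool (*dnsDataValid)(const std::uint8_t*, std::size_t);
    bool (*isDnsPort)(std::uint16_t);
    bool (*isSipPort)(std::uint16_t);
    bool (*sipHeuristic)(const std::uint8_t*, std::size_t);
};

enum class NextLayer { Dns, Sip, Other };

NextLayer chooseNextLayer(const Classifiers& c, std::uint16_t portSrc, std::uint16_t portDst,
                          const std::uint8_t* data, std::size_t dataLen)
{
    // DNS is tried first, as in the hunk above.
    if (c.dnsDataValid(data, dataLen) && (c.isDnsPort(portDst) || c.isDnsPort(portSrc)))
        return NextLayer::Dns;

    // || short-circuits: the byte-level heuristic runs only if neither port is a SIP port.
    if (c.isSipPort(portDst) || c.isSipPort(portSrc) || c.sipHeuristic(data, dataLen))
        return NextLayer::Sip;

    return NextLayer::Other;
}
```

Every UDP datagram that reaches this branch without a SIP port now pays for a scan of its first line, so the heuristic should stay cheap and allocation-free.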