Skip to content

Conversation

@sorooshm78
Copy link

This PR improves SIP packet detection in PcapPlusPlus by introducing heuristic parsing based on Wireshark’s SIP dissector. SIP messages are now detected from the UDP payload itself, not only when using port 5060.

  • Add static bool dissectSipHeuristic(const uint8_t* data, size_t dataLen) to detect SIP requests/responses from payload content (Wireshark-style logic)
  • Use the new heuristic in UdpLayer so SIP packets on non-standard ports are correctly classified
  • Preserve existing behavior for non-SIP payloads

Related issue: #2022

Note: I’m not yet fully familiar with PcapPlusPlus’ internal structure, so if there are better places, names, or patterns for this logic, I’m happy to adjust the PR based on your feedback

@sorooshm78 sorooshm78 requested a review from seladb as a code owner November 17, 2025 16:11
/// @param[in] data Pointer to the raw data buffer
/// @param[in] dataLen Length of the data buffer in bytes
/// @return True if the first line matches SIP request/response syntax, false otherwise
static bool dissectSipHeuristic(const uint8_t* data, size_t dataLen)
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We already have SipRequestFirstLine and SipResponseFirstLine that parse the first line, maybe we could use this instead of adding more logic to parse the first line?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we already have SipRequestFirstLine and SipResponseFirstLine for parsing the first line, but I think keeping the heuristic logic separate is still necessary because the goals are different.

The Sip*FirstLine classes assume we already decided that the payload is SIP, operate on a Sip*Layer instance, update internal state (m_IsComplete, offsets, logging, etc.), and are meant for full parsing.

In contrast, dissectSipHeuristic() is a stateless, side-effect-free check that runs directly on raw data to answer a simpler question: “does this buffer look like a SIP message at all?”. This also matches Wireshark’s design, where heuristic detection is separate from the actual SIP dissector that parses the first line and fields.

This separation is particularly important for TCP: when we inspect data per segment, the first line may be incomplete. In that case the heuristic must be able to say “need more data / undecided” without constructing SIP layers or marking anything as invalid, which is a different lifecycle than the existing first-line parsers.

In this pull request I’m not yet handling TCP segmentation or IP fragmentation — the heuristic currently assumes it sees at least one complete first line. I plan to address proper TCP stream reassembly / IP fragmentation handling in a separate follow-up PR.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we already have SipRequestFirstLine and SipResponseFirstLine for parsing the first line, but I think keeping the heuristic logic separate is still necessary because the goals are different.

The Sip*FirstLine classes assume we already decided that the payload is SIP, operate on a Sip*Layer instance, update internal state (m_IsComplete, offsets, logging, etc.), and are meant for full parsing.

In contrast, dissectSipHeuristic() is a stateless, side-effect-free check that runs directly on raw data to answer a simpler question: “does this buffer look like a SIP message at all?”. This also matches Wireshark’s design, where heuristic detection is separate from the actual SIP dissector that parses the first line and fields.

I just noticed Sip*FirstLine classes do accept a request/response pointer in their constructor, so they can't be used directly. However, they do contain static methods such as parseStatusCode(), parseVersion(), parseMethod() that can definitely be used. If we see we still have a lot of common code between these classes and the parsing logic you need we can think what's the best way to refactor them so they can be used in both scenarios.

In this pull request I’m not yet handling TCP segmentation or IP fragmentation — the heuristic currently assumes it sees at least one complete first line. I plan to address proper TCP stream reassembly / IP fragmentation handling in a separate follow-up PR.

Handling TCP segmentation or IP fragmentation is more tricky - PcapPlusPlus parses packets one by one, there is currently no built-in way to use TcpReassembly or IPReassembly and use the outcome to parse the message again as a packet

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the suggestion!

I refactored SipLayer::dissectSipHeuristic() to use the static parsing helpers from SipResponseFirstLine and SipRequestFirstLine instead of manually tokenizing the first line.

For responses I'm now using parseVersion() and parseStatusCode(), and for requests I'm using parseMethod(), parseVersion() and parseUri(). This removes the duplicated parsing logic and keeps the heuristic in sync with the actual SIP first-line parsers.

I didn't use the Sip*FirstLine constructors themselves, as they still require a request/response pointer as you mentioned.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You’re absolutely right that handling TCP segmentation and IP fragmentation is more complex. Since PcapPlusPlus currently processes packets one by one, in this PR I focused only on heuristic detection on the first line of a single packet. My plan is to add built-in IP fragmentation support to PcapPlusPlus itself in a separate PR, and I’m really excited to work on that.

@codecov
Copy link

codecov bot commented Nov 23, 2025

Codecov Report

❌ Patch coverage is 83.65385% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.88%. Comparing base (24cc309) to head (6c6acda).
⚠️ Report is 2 commits behind head on dev.

Files with missing lines Patch % Lines
Packet++/header/SipLayer.h 83.16% 16 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##              dev    #2024   +/-   ##
=======================================
  Coverage   83.87%   83.88%           
=======================================
  Files         307      307           
  Lines       53952    54046   +94     
  Branches    11352    11335   -17     
=======================================
+ Hits        45254    45334   +80     
- Misses       7483     7494   +11     
- Partials     1215     1218    +3     
Flag Coverage Δ
alpine320 75.91% <87.03%> (+0.01%) ⬆️
fedora42 75.47% <86.04%> (+0.01%) ⬆️
macos-14 81.56% <74.41%> (-0.02%) ⬇️
macos-15 81.56% <74.41%> (-0.02%) ⬇️
mingw32 69.97% <64.86%> (-0.03%) ⬇️
mingw64 69.99% <72.09%> (+0.12%) ⬆️
npcap ?
rhel94 75.49% <85.45%> (+0.01%) ⬆️
ubuntu2004 59.50% <75.47%> (+0.02%) ⬆️
ubuntu2004-zstd 59.61% <75.47%> (+0.04%) ⬆️
ubuntu2204 75.40% <85.45%> (+0.01%) ⬆️
ubuntu2204-icpx 57.91% <72.09%> (+0.06%) ⬆️
ubuntu2404 75.53% <87.03%> (+0.04%) ⬆️
ubuntu2404-arm64 75.56% <87.03%> (+0.02%) ⬆️
unittest 83.88% <83.65%> (+<0.01%) ⬆️
windows-2022 85.40% <81.11%> (+0.15%) ⬆️
windows-2025 85.42% <81.11%> (+0.09%) ⬆️
winpcap 85.42% <81.11%> (-0.10%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@seladb seladb linked an issue Nov 30, 2025 that may be closed by this pull request
@seladb
Copy link
Owner

seladb commented Dec 3, 2025

@sorooshm78 I just saw we already have most of the logic that you implemented: https://github.com/seladb/PcapPlusPlus/blob/master/Packet%2B%2B/src/UdpLayer.cpp#L113-L120

The 2 things you implemented on top of what we have is parsing the request version and the URI, but are they actually required? 🤔

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SIP detection in PcapPlusPlus relies solely on port 5060

2 participants