Improve SIP Packet Detection Using Heuristic Parsing (Fixes #2022) #2024

sorooshm78 · 2025-11-17T16:11:39Z

This PR improves SIP packet detection in PcapPlusPlus by introducing heuristic parsing based on Wireshark’s SIP dissector. SIP messages are now detected from the UDP payload itself, not only when using port 5060.

Add static bool dissectSipHeuristic(const uint8_t* data, size_t dataLen) to detect SIP requests/responses from payload content (Wireshark-style logic)
Use the new heuristic in UdpLayer so SIP packets on non-standard ports are correctly classified
Preserve existing behavior for non-SIP payloads

Related issue: #2022

Note: I’m not yet fully familiar with PcapPlusPlus’ internal structure, so if there are better places, names, or patterns for this logic, I’m happy to adjust the PR based on your feedback

Packet++/header/SipLayer.h

seladb · 2025-11-23T03:41:42Z

Packet++/header/SipLayer.h

+		/// @param[in] data Pointer to the raw data buffer
+		/// @param[in] dataLen Length of the data buffer in bytes
+		/// @return True if the first line matches SIP request/response syntax, false otherwise
+		static bool dissectSipHeuristic(const uint8_t* data, size_t dataLen)


We already have SipRequestFirstLine and SipResponseFirstLine that parse the first line, maybe we could use this instead of adding more logic to parse the first line?

I agree that we already have SipRequestFirstLine and SipResponseFirstLine for parsing the first line, but I think keeping the heuristic logic separate is still necessary because the goals are different.

The Sip*FirstLine classes assume we already decided that the payload is SIP, operate on a Sip*Layer instance, update internal state (m_IsComplete, offsets, logging, etc.), and are meant for full parsing.

In contrast, dissectSipHeuristic() is a stateless, side-effect-free check that runs directly on raw data to answer a simpler question: “does this buffer look like a SIP message at all?”. This also matches Wireshark’s design, where heuristic detection is separate from the actual SIP dissector that parses the first line and fields.

This separation is particularly important for TCP: when we inspect data per segment, the first line may be incomplete. In that case the heuristic must be able to say “need more data / undecided” without constructing SIP layers or marking anything as invalid, which is a different lifecycle than the existing first-line parsers.

In this pull request I’m not yet handling TCP segmentation or IP fragmentation — the heuristic currently assumes it sees at least one complete first line. I plan to address proper TCP stream reassembly / IP fragmentation handling in a separate follow-up PR.

I agree that we already have SipRequestFirstLine and SipResponseFirstLine for parsing the first line, but I think keeping the heuristic logic separate is still necessary because the goals are different.

The Sip*FirstLine classes assume we already decided that the payload is SIP, operate on a Sip*Layer instance, update internal state (m_IsComplete, offsets, logging, etc.), and are meant for full parsing.

In contrast, dissectSipHeuristic() is a stateless, side-effect-free check that runs directly on raw data to answer a simpler question: “does this buffer look like a SIP message at all?”. This also matches Wireshark’s design, where heuristic detection is separate from the actual SIP dissector that parses the first line and fields.

I just noticed Sip*FirstLine classes do accept a request/response pointer in their constructor, so they can't be used directly. However, they do contain static methods such as parseStatusCode(), parseVersion(), parseMethod() that can definitely be used. If we see we still have a lot of common code between these classes and the parsing logic you need we can think what's the best way to refactor them so they can be used in both scenarios.

In this pull request I’m not yet handling TCP segmentation or IP fragmentation — the heuristic currently assumes it sees at least one complete first line. I plan to address proper TCP stream reassembly / IP fragmentation handling in a separate follow-up PR.

Handling TCP segmentation or IP fragmentation is more tricky - PcapPlusPlus parses packets one by one, there is currently no built-in way to use TcpReassembly or IPReassembly and use the outcome to parse the message again as a packet

Thanks for the suggestion!

I refactored SipLayer::dissectSipHeuristic() to use the static parsing helpers from SipResponseFirstLine and SipRequestFirstLine instead of manually tokenizing the first line.

For responses I'm now using parseVersion() and parseStatusCode(), and for requests I'm using parseMethod(), parseVersion() and parseUri(). This removes the duplicated parsing logic and keeps the heuristic in sync with the actual SIP first-line parsers.

I didn't use the Sip*FirstLine constructors themselves, as they still require a request/response pointer as you mentioned.

You’re absolutely right that handling TCP segmentation and IP fragmentation is more complex. Since PcapPlusPlus currently processes packets one by one, in this PR I focused only on heuristic detection on the first line of a single packet. My plan is to add built-in IP fragmentation support to PcapPlusPlus itself in a separate PR, and I’m really excited to work on that.

codecov · 2025-11-23T03:53:35Z

Codecov Report

❌ Patch coverage is 83.65385% with 17 lines in your changes missing coverage. Please review.
✅ Project coverage is 83.88%. Comparing base (24cc309) to head (6c6acda).
⚠️ Report is 2 commits behind head on dev.

Files with missing lines	Patch %	Lines
Packet++/header/SipLayer.h	83.16%	16 Missing and 1 partial ⚠️

Additional details and impacted files

@@           Coverage Diff           @@
##              dev    #2024   +/-   ##
=======================================
  Coverage   83.87%   83.88%           
=======================================
  Files         307      307           
  Lines       53952    54046   +94     
  Branches    11352    11335   -17     
=======================================
+ Hits        45254    45334   +80     
- Misses       7483     7494   +11     
- Partials     1215     1218    +3

Flag	Coverage Δ
alpine320	`75.91% <87.03%> (+0.01%)`	⬆️
fedora42	`75.47% <86.04%> (+0.01%)`	⬆️
macos-14	`81.56% <74.41%> (-0.02%)`	⬇️
macos-15	`81.56% <74.41%> (-0.02%)`	⬇️
mingw32	`69.97% <64.86%> (-0.03%)`	⬇️
mingw64	`69.99% <72.09%> (+0.12%)`	⬆️
npcap	`?`
rhel94	`75.49% <85.45%> (+0.01%)`	⬆️
ubuntu2004	`59.50% <75.47%> (+0.02%)`	⬆️
ubuntu2004-zstd	`59.61% <75.47%> (+0.04%)`	⬆️
ubuntu2204	`75.40% <85.45%> (+0.01%)`	⬆️
ubuntu2204-icpx	`57.91% <72.09%> (+0.06%)`	⬆️
ubuntu2404	`75.53% <87.03%> (+0.04%)`	⬆️
ubuntu2404-arm64	`75.56% <87.03%> (+0.02%)`	⬆️
unittest	`83.88% <83.65%> (+<0.01%)`	⬆️
windows-2022	`85.40% <81.11%> (+0.15%)`	⬆️
windows-2025	`85.42% <81.11%> (+0.09%)`	⬆️
winpcap	`85.42% <81.11%> (-0.10%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…sers in heuristic SIP detection

seladb · 2025-12-03T07:16:57Z

@sorooshm78 I just saw we already have most of the logic that you implemented: https://github.com/seladb/PcapPlusPlus/blob/master/Packet%2B%2B/src/UdpLayer.cpp#L113-L120

The 2 things you implemented on top of what we have is parsing the request version and the URI, but are they actually required? 🤔

sorooshm78 added 5 commits November 17, 2025 15:51

add initial heuristic detection for SIP packets

f8b2d4d

add comment

07ca6b7

refactor move helper methods to private, keep public API minimal

f4fba28

use function dissectSipHeuristic

d18abd0

Remove unused #include <iostream>

6c6acda

sorooshm78 requested a review from seladb as a code owner November 17, 2025 16:11

seladb reviewed Nov 23, 2025

View reviewed changes

sorooshm78 added 2 commits November 29, 2025 07:55

move the implementation to SipLayer.cpp

2f2cc66

Merge branch 'dev' into add-sip-heuristic

09cb3f9

seladb linked an issue Nov 30, 2025 that may be closed by this pull request

SIP detection in PcapPlusPlus relies solely on port 5060 #2022

Open

sorooshm78 added 3 commits December 2, 2025 17:10

Merge branch 'dev' into add-sip-heuristic

8362e31

refactor: use SipRequestFirstLine and SipResponseFirstLine static par…

dfc161c

…sers in heuristic SIP detection

Remove unused helper functions

9f5c52d

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Improve SIP Packet Detection Using Heuristic Parsing (Fixes #2022) #2024

Improve SIP Packet Detection Using Heuristic Parsing (Fixes #2022) #2024

Uh oh!

sorooshm78 commented Nov 17, 2025

Uh oh!

Uh oh!

seladb Nov 23, 2025

Uh oh!

sorooshm78 Nov 29, 2025

Uh oh!

seladb Nov 30, 2025

Uh oh!

sorooshm78 Dec 2, 2025

Uh oh!

sorooshm78 Dec 2, 2025

Uh oh!

codecov bot commented Nov 23, 2025 •

edited

Loading

Uh oh!

seladb commented Dec 3, 2025 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Improve SIP Packet Detection Using Heuristic Parsing (Fixes #2022) #2024

Are you sure you want to change the base?

Improve SIP Packet Detection Using Heuristic Parsing (Fixes #2022) #2024

Uh oh!

Conversation

sorooshm78 commented Nov 17, 2025

Uh oh!

Uh oh!

seladb Nov 23, 2025

Choose a reason for hiding this comment

Uh oh!

sorooshm78 Nov 29, 2025

Choose a reason for hiding this comment

Uh oh!

seladb Nov 30, 2025

Choose a reason for hiding this comment

Uh oh!

sorooshm78 Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

sorooshm78 Dec 2, 2025

Choose a reason for hiding this comment

Uh oh!

codecov bot commented Nov 23, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

seladb commented Dec 3, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

codecov bot commented Nov 23, 2025 •

edited

Loading

seladb commented Dec 3, 2025 •

edited

Loading