You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I recently reviewed the TSQL_data_lineage procedure and the helper function fn_removelistChars. While the concept of a T-SQL-based lineage parser is interesting, I identified several critical issues that make the current implementation unsafe for multi-user environments and unreliable for complex SQL syntax.
Here is a detailed breakdown of the findings:
1. Critical Concurrency Issue (Global Tables)
The procedure creates and drops permanent tables (dbo.SQL_query_table, dbo.LN_Query, dbo.Query_results_no_comment, etc.) during runtime instead of using temporary tables (#Table) or table variables.
The Problem:
If two processes or users execute dbo.TSQL_data_lineage simultaneously, they will attempt to drop and recreate the same global tables. This leads to race conditions, data corruption (User A reading User B's data), or execution errors.
Recommendation:
Replace all instances of CREATE TABLE dbo.Name with CREATE TABLE #Name to ensure session isolation.
2. Naive Parsing Logic
The parsing logic relies heavily on STRING_SPLIT using a space as the delimiter and simple string replacements. This approach fails in valid standard T-SQL scenarios:
No Spaces: T-SQL allows SELECT*FROM[Table]. Since there are no spaces, STRING_SPLIT will not isolate FROM or the table name, causing the lineage detection to fail completely.
String Literals: The script does not respect string boundaries.
Example:SELECT 'This is -- not a comment' AS Text FROM Tbl.
The script interprets the -- inside the string as a comment start and strips the rest of the line, rendering the query invalid.
Identifiers with Spaces: A table named [My Table] will be split into two tokens: [My and Table]. The logic likely fails to reconstruct the full object name.
3. Logical Bug in Conditional Check
Around line 259, there appears to be a logical error or a tautology:
IF (@command NOT LIKE'%#%'OR @command NOT LIKE'%#%')
This condition checks the exact same expression twice with an OR operator. Logically, this makes little sense. If the intention was to check for temporary tables, the logic needs to be revisited.
4. Performance (RBAR)
The script uses WHILE loops to iterate through the text character by character (e.g., in fn_removelistChars and the comment removal section). In T-SQL, this "Row-By-Agonizing-Row" approach is extremely inefficient for larger scripts and results in high CPU usage compared to set-based operations or CLR/PowerShell parsers.
Summary
To make this script production-ready, it requires:
Switching to Temporary Tables immediately to prevent data collisions.
Robust Tokenizing: Handling brackets [], string literals '', and code without spaces correctly.
For robust lineage analysis, relying on sys.dm_exec_describe_first_result_set, sys.sql_expression_dependencies, or the Microsoft.SqlServer.TransactSql.ScriptDom (via PowerShell/C#) is generally recommended over regex-like parsing in T-SQL.
reacted with thumbs up emoji reacted with thumbs down emoji reacted with laugh emoji reacted with hooray emoji reacted with confused emoji reacted with heart emoji reacted with rocket emoji reacted with eyes emoji
Uh oh!
There was an error while loading. Please reload this page.
-
Hi Tomaž,
I recently reviewed the
TSQL_data_lineageprocedure and the helper functionfn_removelistChars. While the concept of a T-SQL-based lineage parser is interesting, I identified several critical issues that make the current implementation unsafe for multi-user environments and unreliable for complex SQL syntax.Here is a detailed breakdown of the findings:
1. Critical Concurrency Issue (Global Tables)
The procedure creates and drops permanent tables (
dbo.SQL_query_table,dbo.LN_Query,dbo.Query_results_no_comment, etc.) during runtime instead of using temporary tables (#Table) or table variables.The Problem:
If two processes or users execute
dbo.TSQL_data_lineagesimultaneously, they will attempt to drop and recreate the same global tables. This leads to race conditions, data corruption (User A reading User B's data), or execution errors.Recommendation:
Replace all instances of
CREATE TABLE dbo.NamewithCREATE TABLE #Nameto ensure session isolation.2. Naive Parsing Logic
The parsing logic relies heavily on
STRING_SPLITusing a space as the delimiter and simple string replacements. This approach fails in valid standard T-SQL scenarios:SELECT*FROM[Table]. Since there are no spaces,STRING_SPLITwill not isolateFROMor the table name, causing the lineage detection to fail completely.SELECT 'This is -- not a comment' AS Text FROM Tbl.--inside the string as a comment start and strips the rest of the line, rendering the query invalid.[My Table]will be split into two tokens:[MyandTable]. The logic likely fails to reconstruct the full object name.3. Logical Bug in Conditional Check
Around line 259, there appears to be a logical error or a tautology:
This condition checks the exact same expression twice with an
ORoperator. Logically, this makes little sense. If the intention was to check for temporary tables, the logic needs to be revisited.4. Performance (RBAR)
The script uses
WHILEloops to iterate through the text character by character (e.g., infn_removelistCharsand the comment removal section). In T-SQL, this "Row-By-Agonizing-Row" approach is extremely inefficient for larger scripts and results in high CPU usage compared to set-based operations or CLR/PowerShell parsers.Summary
To make this script production-ready, it requires:
[], string literals'', and code without spaces correctly.For robust lineage analysis, relying on
sys.dm_exec_describe_first_result_set,sys.sql_expression_dependencies, or the Microsoft.SqlServer.TransactSql.ScriptDom (via PowerShell/C#) is generally recommended over regex-like parsing in T-SQL.Best regards,
Markus
Beta Was this translation helpful? Give feedback.
All reactions