String Escaping/Encoding Bug Fixes#5772
String Escaping/Encoding Bug Fixes#5772christopherholland-workday wants to merge 1 commit intomainfrom
Conversation
There was a problem hiding this comment.
Code Review
This pull request addresses several string escaping and encoding bugs, likely identified by CodeQL. The changes primarily involve replacing single-occurrence string replacements with global regular expression-based replacements to ensure all instances of special characters are correctly handled. This is a good practice for robustness. Additionally, an HTML stripping logic in the Searxng tool has been removed, with the reasoning that ReactMarkdown already handles sanitization. This change also improves safety against potential null or undefined values. The fixes appear correct and well-targeted, though I have a suggestion to make the backslash replacement logic more robust.
| const [, nodeIdPart, outputPath] = outputMatch | ||
| // Clean nodeId (handle escaped underscores) | ||
| const cleanNodeId = nodeIdPart.replace('\\', '') | ||
| const cleanNodeId = nodeIdPart.replace(/\\/g, '') |
There was a problem hiding this comment.
While using a global replacement is correct, replace(/\\/g, '') is a bit aggressive as it removes all backslashes. This could corrupt strings that are meant to contain literal backslashes (e.g., in paths).
A safer approach is to specifically un-escape only the characters that turndown escapes. According to turndown's implementation, these are: \\, *, _, [, ], (, ), #.
I suggest using a more targeted regex for this.
const cleanNodeId = nodeIdPart.replace(/\\([\\*_\[\]()#])/g, '$1')| // Find node data in executed data | ||
| // sometimes turndown value returns a backslash like `llmAgentflow\_1`, remove the backslash | ||
| const cleanNodeId = variableFullPath.replace('\\', '') | ||
| const cleanNodeId = variableFullPath.replace(/\\/g, '') |
There was a problem hiding this comment.
While using a global replacement is correct, replace(/\\/g, '') is a bit aggressive as it removes all backslashes. This could corrupt strings that are meant to contain literal backslashes (e.g., in paths).
A safer approach is to specifically un-escape only the characters that turndown escapes. According to turndown's implementation, these are: \\, *, _, [, ], (, ), #.
I suggest using a more targeted regex for this.
const cleanNodeId = variableFullPath.replace(/\\([\\*_\[\]()#])/g, '$1')
Related to CodeQL findings
59, 61, 62 are not security issues, more so code smells and potential minor bugs.
For 63 we can remove our own HTML stripping because
ReactMarkdownhandles all of the stripping/sanitizing already.