REGEX
Basic searching can only take you so far when looking for vulnerabilities in code. REGEX (Regular Expressions) are a natural progression from this for any bug hunter.
It is not as fast as basic searching, but it allows for more complicated and looser matching. This can help us improve search results by reducing noise and increasing true-positive matches.
wpdirectory.net
wpdirectory.net is a free online tool that will scan all the plugins and themes in the WordPress.org SVN repository for a REGEX pattern.
This uses the RE2 REGEX library, which is very fast but does not support “fancy” features like lookahead and lookbehind. So keep your regex simple!
Note: It is worth double-checking once you’ve found some suspicious code to investigate if it is still active. Many of the plugins and themes that are scanned are now closed, which means they will be ineligible for a bounty. Just click on the icon to be taken to the project’s WordPress.org page which will state if it is closed.
Cheatsheet
When developing REGEX patterns, I would recommend using online tools such as regexr that can help by testing matches, syntax highlighting and explaining what each part does.
Char | Explanation |
---|---|
\s | Whitespace (space, tab, newline) |
\d | Digits (0-9) |
\w | Alphanumeric (a-zA-Z0-9_) |
+ | One or more of previous char |
* | Zero or more of previous char |
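
To see the cheatsheet entries in action, here is a minimal sketch using Python's built-in re module. Python is not RE2, so the re.ASCII flag is passed on the assumption that it brings \s, \d and \w closer to RE2's ASCII-only behaviour. The PHP line being searched is a made-up sample.

```python
import re

# Hypothetical PHP line (two spaces before "=", a tab after it).
line = "$count  =\t42;"

# re.ASCII keeps \s, \d and \w limited to ASCII, similar to RE2.
whitespace = re.findall(r"\s", line, re.ASCII)      # each whitespace char
digits = re.findall(r"\d+", line, re.ASCII)         # one or more digits
words = re.findall(r"\w+", line, re.ASCII)          # runs of a-zA-Z0-9_
assignment = re.findall(r"\s*=\s*", line, re.ASCII) # "=" with optional whitespace

print(whitespace)
print(digits)
print(words)
print(assignment)
```

Note that + requires at least one character, while * also accepts none, which is why \s*=\s* would still match an assignment written without any spaces.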
Examples
Here are some PHP and WordPress specific examples to help get you started on your REGEX adventure:
Matching multiples
\$_(GET|POST|REQUEST)\[
This will match any attempt to access a variable in the GET, POST or REQUEST superglobals. Here the $ and [ are escaped using a \ to force them to be interpreted as literal characters instead of REGEX control sequences.
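
As a quick sanity check, the pattern can be tried locally before running it against a large code base. The sketch below uses Python's re module rather than RE2, and the PHP lines in samples are invented for illustration.

```python
import re

# The pattern from above: a literal "$_", one of three names, then a literal "[".
SUPERGLOBAL = re.compile(r"\$_(GET|POST|REQUEST)\[")

# Hypothetical PHP lines, not taken from any real plugin.
samples = [
    "$id = $_GET['id'];",
    '$name = $_POST["name"];',
    "$page = $_REQUEST['page'];",
    "$agent = $_SERVER['HTTP_USER_AGENT'];",  # different superglobal
    "$get = array();",                        # no superglobal at all
]

hits = [s for s in samples if SUPERGLOBAL.search(s)]
for hit in hits:
    # group(1) holds whichever alternative matched inside the parentheses.
    print(SUPERGLOBAL.search(hit).group(1), "->", hit)
```

Only the first three lines are reported; $_SERVER is skipped because it is not one of the listed alternatives.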
Optional Matching Groups
\.\s*(wp_)?json_decode\(
This will match either PHP’s json_decode or WordPress’ wp_json_decode when used with string concatenation (.).
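
The effect of the optional group can be sketched as follows, again using Python's re module as a stand-in for RE2. The PHP lines are hypothetical samples written only to exercise the pattern.

```python
import re

# A literal ".", optional whitespace, an optional "wp_" prefix, then "json_decode(".
JSON_DECODE = re.compile(r"\.\s*(wp_)?json_decode\(")

# Hypothetical PHP lines for illustration only.
samples = [
    "$out = 'data: ' . json_decode($raw);",  # concatenation, no prefix
    "$out = $prefix .wp_json_decode($raw);", # concatenation, with prefix
    "$data = json_decode($raw);",            # no concatenation
    "$out = 'x' . json_encode($data);",      # different function
]

matched = [s for s in samples if JSON_DECODE.search(s)]
# The optional group is None when the prefix is absent.
prefixes = [JSON_DECODE.search(s).group(1) for s in matched]

print(matched)
print(prefixes)
```

The ? after the group makes the whole wp_ prefix optional, so one pattern covers both function names without a second search.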
Looser matching
When scanning for vulnerabilities using REGEX, it’s important to catch as many true positives as possible. Annoyingly, PHP is a very loose language and allows quite a range of syntax to be valid. For example, let’s say we are searching for the following bit of PHP code:
echo $_GET['variable'];
We could write a pattern like:
echo\s\$_GET\[["']
However, that would not match any of the following PHP, all of which would do the same thing:
echo$_GET['variable'];
echo  $_GET['variable'];
echo $_GET ['variable'];
echo $_GET[ 'variable'];
$field="variable";echo $_GET[$field];
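
One possible way to loosen the pattern is to make the whitespace optional with \s* and to stop matching at the opening bracket, so the array key can be anything. The sketch below compares the strict pattern with this looser variant in Python's re module. The LOOSE pattern is a suggestion rather than the only answer, and the double-space variant in the list is an assumed example of extra whitespace.

```python
import re

# Strict: exactly one whitespace char after echo, and a quote right after "[".
STRICT = re.compile(r"""echo\s\$_GET\[["']""")
# Looser (one possible choice): optional whitespace, any array key.
LOOSE = re.compile(r"echo\s*\$_GET\s*\[")

variants = [
    "echo $_GET['variable'];",
    "echo$_GET['variable'];",
    "echo  $_GET['variable'];",
    "echo $_GET ['variable'];",
    "echo $_GET[ 'variable'];",
    '$field="variable";echo $_GET[$field];',
]

strict_hits = [v for v in variants if STRICT.search(v)]
loose_hits = [v for v in variants if LOOSE.search(v)]

print(len(strict_hits), "of", len(variants), "matched by the strict pattern")
print(len(loose_hits), "of", len(variants), "matched by the looser pattern")
```

The trade-off is that a looser pattern will also pick up more noise, so it is worth tightening it again if the results become unmanageable.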
Going beyond REGEX
As REGEX is not code-aware, it is not great at “understanding” what it sees. For code-aware scanning we need to look at tools that create Abstract Syntax Trees. The following are code scanning tools that support PHP:
Another popular tool is GitHub’s CodeQL; however, this does not currently support PHP.