Business Rules 301: Advanced Functions II

July 21, 2017


This blog post is part of a series designed to demystify the process of understanding and writing your own business rules.

We’re continuing this mini-series with a look at the REGEX “family” of functions: REGEXMATCH, REGEXGET, and REGEXREPLACE. These functions use the power of regular expressions (or “RegEx” for short), a system that can find certain values within text, even without knowing every character in the value it’s searching for. Before we can explain the functions themselves, it will probably help to explain a bit more about regular expressions.

A regular expression is still text placed within quotes, but each character describes part of the pattern instead of being taken literally. It searches the text for this pattern, so it can get results that fit the pattern even if we don’t know all the specifics. Let’s start with a regular expression that describes the pattern of:

“an ‘a’ character, followed by a ‘b’ character, followed by a ‘c’ character”

In this example, the character that describes an a is ‘a’, a b is ‘b’, and a c is ‘c’. So the expression we need to find this pattern is “abc”. This is perfectly valid, but it doesn’t make RegEx any more useful than our CONTAINS or REPLACE rules. In RegEx, some characters have specific meanings outside what we normally use them for in regular text. The period character (.), for example, is able to describe any one character. To see the power of RegEx, we’ll need a pattern that isn’t as specific as “abc”.
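To make the difference concrete, here is a minimal sketch using Python’s `re` module. This is a stand-in rather than the business rules engine itself, and it assumes the two treat these simple patterns the same way. It compares the literal pattern “abc” with “a.c”, where the period stands for any one character.

```python
import re

# Literal pattern: matches only the exact characters "abc".
literal = re.compile(r"abc")
# The period matches any one character (except a newline), so "a.c" is looser.
wildcard = re.compile(r"a.c")

samples = ["abc", "axc", "a-c", "ac"]

literal_hits = [s for s in samples if literal.search(s)]
wildcard_hits = [s for s in samples if wildcard.search(s)]

print(literal_hits)   # ['abc']
print(wildcard_hits)  # ['abc', 'axc', 'a-c']
```

Note that “ac” matches neither pattern: the period requires exactly one character between the a and the c.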

Example 1

Say that each product’s description includes an HTML bulleted list, and one of those bullets specifies the product’s brand. We’d like to isolate the brand through a business rule that we can then apply to an inventory upload template or to multiple outgoing marketplace and digital marketing templates. We know that each description includes text and formatting like this:

<li>Brand: Acme</li>

Since the brand value itself will change, we’ll need to use a pattern like this:

“the characters ‘<li>Brand:’, followed by one or more words, followed by the characters ‘</li>’”

Let’s go through each part of this pattern separately, and build up our regular expression text as we go. For the first part, we can simply describe all of those characters literally.

Our expression so far: “<li>Brand:”

This expression matches: “<li>Brand:” within “<li>Brand: Acme</li>”

Next, we need to use some characters with special meanings so that we can describe any word. The sequence ‘\w’ can be used in RegEx to describe any single letter, digit, or underscore. In RegEx, the backslash character (‘\’) is used to force the next character to take a different meaning. If the next character is usually taken literally, it forces a special meaning; if the next character usually represents a special meaning, the backslash forces a literal interpretation. Here, it’s important to include it; otherwise, ‘w’ alone would be taken literally, as the letter w. We don’t know how many letters, numbers, or underscores will be included in one word, so we need one more entry in this expression to indicate that we want to include all of them. This is done with the plus sign (+). Placing a plus sign after a pattern tells the expression to match that pattern one or more times in a row, taking as many characters as possible until it reaches a character that does not fit the pattern.

Our expression so far: “<li>Brand: \w+”

This expression matches: “<li>Brand: Acme” within “<li>Brand: Acme</li>”

However, what if we have a brand with two or more words, like “Alpha Beta”? The ‘\w+’ sequence does not include spaces, so we need to expand our existing pattern to isolate the entire brand. One of the easiest ways to search for one or more words is to include as many letters, numbers, underscores, AND spaces as possible. We can use square brackets to combine different patterns into a “character class” and locate any one character that matches any of the patterns in that class. The sequence ‘\s’ describes any one whitespace character, such as a space or tab. Including it with ‘\w’ in a character class will search for any single letter, digit, underscore, or whitespace. We can then use the plus sign to identify all of the word characters OR space characters in a row.

Our expression so far: “<li>Brand: [\w\s]+”

This expression matches: “<li>Brand: Alpha Beta” within “<li>Brand: Alpha Beta</li>”
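The difference between ‘\w+’ and the ‘[\w\s]+’ character class is easy to check in code. The sketch below again uses Python’s `re` module as a stand-in, on the assumption that it handles ‘\w’ and ‘\s’ the same way as the business rules engine; the sample strings are the ones used above.

```python
import re

single_word = "<li>Brand: Acme</li>"
two_words = "<li>Brand: Alpha Beta</li>"

# \w+ stops at the first character that is not a letter, digit, or underscore.
word_only = re.compile(r"<li>Brand: \w+")
# [\w\s]+ also accepts whitespace, so it continues across the space.
word_or_space = re.compile(r"<li>Brand: [\w\s]+")

acme = word_only.search(single_word).group(0)
alpha_only = word_only.search(two_words).group(0)
alpha_beta = word_or_space.search(two_words).group(0)

print(acme)        # <li>Brand: Acme
print(alpha_only)  # <li>Brand: Alpha
print(alpha_beta)  # <li>Brand: Alpha Beta
```

In both cases the match stops at the ‘<’ of the closing tag, because ‘<’ is neither a word character nor whitespace.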

The expression above is likely sufficient to identify a full brand name, but in RegEx, it’s good practice to use expressions that describe very specific patterns to make sure they don’t locate values we aren’t expecting. To be complete, our expression should also identify the ending HTML tag to ensure it has included the entire brand value. Most of the remaining characters do not have special meanings and can be taken literally, but the ‘/’ character is treated specially by some RegEx tools, where it marks the start and end of an expression. To be safe, we can place a backslash before it to force it to be taken literally.

Our expression so far: “<li>Brand: [\w\s]+<\/li>”

This expression matches: “<li>Brand: Alpha Beta</li>”
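To see why the closing tag makes the pattern safer, consider the sketch below, written with Python’s `re` module as a stand-in for the business rules engine. The sample descriptions are made up for illustration. Without the closing tag, a brand containing an unexpected character is silently cut short; with it, the pattern finds nothing, which is easier to notice and handle.

```python
import re

loose = re.compile(r"<li>Brand: [\w\s]+")
strict = re.compile(r"<li>Brand: [\w\s]+<\/li>")

# Hypothetical product descriptions.
normal = "<ul><li>Color: Red</li><li>Brand: Alpha Beta</li></ul>"
unusual = "<ul><li>Brand: Alpha & Beta</li></ul>"

strict_normal = strict.search(normal).group(0)
loose_unusual = loose.search(unusual).group(0)
strict_unusual = strict.search(unusual)

print(strict_normal)   # <li>Brand: Alpha Beta</li>
print(loose_unusual)   # the brand is cut off just before the "&"
print(strict_unusual)  # None
```

In Python the backslash before ‘/’ is optional, so `</li>` and `<\/li>` behave identically here.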

Example 2

What if we wanted to find a price value that might be in a product’s title? Not every product has the same price, but all prices follow the same pattern:

“a ‘$’ character, followed by some numbers, followed by a ‘.’ character, followed by two more numbers”

Let’s go through each part of this pattern separately, and build our regular expression text as we go. In RegEx, the dollar sign character has a special meaning of its own, so we again need to add a backslash to force it to be taken literally.

Our expression so far: “\$”

Next, we need to capture a variable amount of numbers. This is similar to how we captured a word in our first example, but this time we’ll use the pattern ‘\d’ instead of ‘\w’ to only capture digits. As before, we’ll use the plus sign character to capture as many digits in a row as possible.

Our expression so far: “\$\d+”

Now that the expression includes as many digits as possible, the next part of our pattern is a decimal point. Because the ‘.’ character has a special meaning of its own, we’ll use a backslash to take it literally.

Our expression so far: “\$\d+\.”

Finally, we need two more digits. We already know that we can describe one digit with the characters ‘\d’, and we have a few options for how we can indicate that we need two of them. One option is ‘\d\d’, which simply matches one digit and then another digit. We could also use ‘\d+’ again, to find as many digits as possible after the decimal point. A third option is to specify the number of repetitions the ‘\d’ pattern should match, which is done by placing a number within braces after the pattern. In this example, we can use ‘\d{2}’.

Our expression so far: “\$\d+\.\d{2}”
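The three ways of asking for the final two digits can be compared side by side. The sketch below uses Python’s `re` module as a stand-in for the business rules engine, and the product titles are invented for illustration.

```python
import re

two_literal = re.compile(r"\$\d+\.\d\d")    # one digit, then another digit
one_or_more = re.compile(r"\$\d+\.\d+")     # as many digits as possible
exactly_two = re.compile(r"\$\d+\.\d{2}")   # exactly two repetitions of \d

title = "Acme Widget $19.99 Free Shipping"
results = [p.search(title).group(0) for p in (two_literal, one_or_more, exactly_two)]
print(results)  # ['$19.99', '$19.99', '$19.99']

# The options differ only when more than two digits follow the decimal point.
odd_title = "Acme Widget $19.995"
greedy = one_or_more.search(odd_title).group(0)
capped = exactly_two.search(odd_title).group(0)
print(greedy)  # $19.995
print(capped)  # $19.99
```

For an ordinary price all three patterns agree, so the choice mostly comes down to how strictly the two-digit rule should be enforced.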

Back to the Business Rules

REGEXMATCH

The REGEXMATCH function searches input text for a regular expression pattern, and returns “true” or “false” based on whether the pattern can be found in that text. Let’s say that we do have a price value in some product titles, and we simply want to identify those products for now. In that case, we would use the following expression:

REGEXMATCH($itemtitle, “\$\d+\.\d{2}”)

We can use this function as a condition in an IF or SELECTCASE function to perform certain logic for products with (or without) prices in their titles.
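For readers who want to experiment outside the platform, the behavior described above can be approximated in Python. The `regex_match` helper below is hypothetical and is not part of any business rules API; it assumes that REGEXMATCH looks for the pattern anywhere in the text, as the description suggests.

```python
import re

PRICE_PATTERN = r"\$\d+\.\d{2}"

def regex_match(text: str, pattern: str) -> bool:
    """Hypothetical stand-in for REGEXMATCH: True if the pattern occurs anywhere in text."""
    return re.search(pattern, text) is not None

# Invented product titles for illustration.
titles = ["Acme Widget $19.99", "Acme Widget Deluxe"]
flags = [regex_match(t, PRICE_PATTERN) for t in titles]
print(flags)  # [True, False]

# Using the result as a condition, much like wrapping REGEXMATCH in an IF.
labels = ["has price" if flag else "no price" for flag in flags]
print(labels)  # ['has price', 'no price']
```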

REGEXGET

What if we need to isolate the price value out of the product title? Use the following expression:

REGEXGET($itemtitle, “\$\d+\.\d{2}”)

The REGEXGET function searches for an expression in the input text value. If there is a match, it will return the specific matching text. Otherwise, it will return a blank value.
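The same behavior can be sketched in Python for quick testing. The `regex_get` helper is hypothetical, and it assumes that only the first match is returned, which the description above does not spell out.

```python
import re

PRICE_PATTERN = r"\$\d+\.\d{2}"

def regex_get(text: str, pattern: str) -> str:
    """Hypothetical stand-in for REGEXGET: the first matching text, or a blank value."""
    match = re.search(pattern, text)
    return match.group(0) if match else ""

with_price = regex_get("Acme Widget $19.99 Free Shipping", PRICE_PATTERN)
without_price = regex_get("Acme Widget Deluxe", PRICE_PATTERN)

print(repr(with_price))     # '$19.99'
print(repr(without_price))  # ''
```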

REGEXREPLACE

Finally, what if we know that some products have prices in their titles, and we’re trying to efficiently remove them to meet a marketplace’s data requirements? Use the REGEXREPLACE function:

REGEXREPLACE($itemtitle, “\$\d+\.\d{2}”, “”)

REGEXREPLACE searches for the given expression in the given text value. If there is a match, the specific matching text will be replaced with the text in the third parameter. Note that the third parameter must be regular text, instead of a second regular expression.
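A rough Python equivalent is sketched below. The `regex_replace` helper is hypothetical, and it assumes every match in the text is replaced, which may or may not be how the business rule behaves. The replacement is supplied through a function so that Python treats it as regular text rather than interpreting backslashes in it.

```python
import re

PRICE_PATTERN = r"\$\d+\.\d{2}"

def regex_replace(text: str, pattern: str, replacement: str) -> str:
    """Hypothetical stand-in for REGEXREPLACE, assuming all matches are replaced."""
    # A callable replacement is inserted as-is, so it is treated as plain text.
    return re.sub(pattern, lambda _match: replacement, text)

cleaned = regex_replace("Acme Widget $19.99 Free Shipping", PRICE_PATTERN, "")
untouched = regex_replace("Acme Widget Deluxe", PRICE_PATTERN, "")

print(repr(cleaned))    # 'Acme Widget  Free Shipping'
print(repr(untouched))  # 'Acme Widget Deluxe'
```

Removing the price leaves two spaces where it used to be, so the surrounding whitespace may need a separate cleanup step.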

Regular expressions are a field all their own, too broad to cover in one blog post. Many online resources and even textbooks on regular expressions are available for additional learning. For a quick reference on which characters have special meanings in RegEx, search for “regex cheat sheet” and pick a concise guide to review.

If you want to learn more and can’t wait two weeks for our next blog post, feel free to explore more about business rules on our SSC or check out the other posts in this series. If you’re struggling with a rule, you can always open a case with support to assist you.


Rithum Team
