Linux Tactic

Mastering Regular Expressions in Bash: Tips and Examples

Introduction to Regular Expressions in Bash

If you’re a developer or system administrator working with the bash shell, you might have heard about regular expressions. Regular expressions, or regex, are a powerful tool used to search for and match specific patterns of text within a larger body of text.

In this article, we’ll explore how to use regular expressions in bash for search queries and programming.

Use of Regular Expressions for Search Queries

Regular expressions can be a lifesaver when you need to search for a specific pattern of text in a large dataset. Let’s say you have a list of email addresses and you want to find all the addresses that end with the .edu domain.

You could manually scan each entry, or you could use a regex to quickly find all the matching entries. Here’s an example:

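grep -E '\.edu$' emails.txt

This returns every line in the emails.txt file that ends with .edu.

The -E flag specifies that we’re using extended regex syntax. In the pattern, \. matches a literal dot (an unescaped . matches any character, so a line ending in "xedu" would also match), and $ anchors the match to the end of the line. The single quotes stop the shell from interpreting the backslash and the dollar sign. This is just a simple example to illustrate how regex can be used to search for specific patterns.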

You can use regex for more complex search queries such as matching phone numbers, social security numbers, or IP addresses.
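As a rough illustration of one of those more complex queries, the sketch below pulls IPv4-looking addresses out of text with grep -Eo. The function name extract_ipv4 and the sample lines are made up for this example, and the pattern only checks the shape of an address (four dot-separated groups of one to three digits), not that each group is in the 0-255 range.

```shell
#!/bin/bash

# Hypothetical helper: print every IPv4-shaped token found on standard input.
# -E selects extended regex syntax, -o prints only the matching part of a line.
# The pattern checks shape only; it would also accept 999.999.999.999.
extract_ipv4() {
    grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}'
}

# Sample input (invented for the example): two lines contain an address.
printf '%s\n' \
    'login from 192.168.1.10 ok' \
    'no address here' \
    'gateway 10.0.0.1 up' | extract_ipv4
```

Run as written, this should print 192.168.1.10 and 10.0.0.1 on separate lines. Tightening the pattern to reject out-of-range numbers is possible but makes the regex considerably longer.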

Use of Regular Expressions in Bash Programming

Aside from search queries, regex is also widely used in bash programming for text manipulation and parsing. Let’s say you have a file with a list of names, but the names are formatted in different cases.

You want to convert all the names to uppercase. This is easy with the tr command, although tr works on sets of characters rather than on regular expressions:

tr 'a-z' 'A-Z' < names.txt

The 'a-z' set lists the lowercase letters, and tr replaces each one with the letter at the same position in the 'A-Z' set.

The < operator redirects the contents of the names.txt file to the standard input of tr. You can also extract specific parts of a string.

Let’s say you have a string that contains a date in the format “MM/DD/YYYY”. You want to extract the year from this string.

Because the format is fixed, this does not need a regex; bash’s substring expansion ${variable:offset} is enough:

date="01/01/2022"

year=${date:6}

The ${date:6} expansion returns the string starting at offset 6. Offsets count from zero, so this is everything from the seventh character onward, which is the year “2022”. Adding a length extracts other parts of the string, such as ${date:0:2} for the month or ${date:3:2} for the day.
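When the input might not follow the expected layout, a real regex can both validate the string and pull out its parts. The sketch below uses bash’s =~ operator and the BASH_REMATCH array, which holds the text captured by each parenthesized group. The function name extract_year and the pattern are assumptions made for this example.

```shell
#!/bin/bash

# Hypothetical helper: print the year from an MM/DD/YYYY string,
# or return a non-zero status if the string does not have that shape.
extract_year() {
    # Keeping the pattern in a variable avoids quoting surprises inside [[ ]].
    local re='^([0-9]{2})/([0-9]{2})/([0-9]{4})$'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match; [1], [2], [3] are the groups.
        echo "${BASH_REMATCH[3]}"
    else
        return 1
    fi
}

extract_year "01/01/2022"
```

Unlike the fixed-offset approach, this refuses input such as "1/1/22" instead of silently returning the wrong characters.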

Limitations of Regular Expressions in Case Statements

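While regex is a powerful tool, bash case statements do not support it: case patterns are shell glob patterns (the same wildcards used to match filenames), not regular expressions. A case statement compares a value against each pattern in turn and runs the commands for the first pattern that matches.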

Here’s an example:

read -p "Enter your name: " name

case $name in
    [Jj][oO][hH][nN]) echo "Hello John!" ;;
    [Mm][aA][rR][yY]) echo "Hello Mary!" ;;
    *) echo "Hello, $name!" ;;
esac

In this example, we’re using a case statement to match a user’s name and output a custom greeting based on it. Each bracket expression such as [Jj] matches one character from its set, so [Jj][oO][hH][nN] matches John in any capitalization, and similarly for Mary.

The * pattern matches any other name. A case pattern must match the entire value, so it always behaves as if it were anchored at both ends.

Regex anchors such as ^ and $ have no special meaning here, and neither do quantifiers such as + or {2}. To match a word anywhere within a value, surround it with wildcards, as in *john*.
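To make the whole-value matching concrete, here is a small sketch. The function name mentions_john and the sample inputs are invented for this example, and the ${1,,} lowercase expansion assumes bash 4 or later.

```shell
#!/bin/bash

# Hypothetical helper showing how case glob patterns match.
mentions_john() {
    # Lowercase copy of the argument (bash 4+), so one pattern
    # covers every capitalization.
    local text=${1,,}
    case $text in
        john)   echo "exactly john" ;;   # must match the whole value
        *john*) echo "contains john" ;;  # wildcards allow text on either side
        *)      echo "no john" ;;
    esac
}

mentions_john "John"
mentions_john "Hello, JOHN Smith"
mentions_john "Mary"
```

The first call matches the plain john pattern, the second only matches once wildcards are added on both sides, and the third falls through to the default branch.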

When you need real regex matching, such as searching text for a pattern, use the grep command or the [[ =~ ]] operator instead.

Example 01: Using Regular Expressions with Grep and Case Statements

Let’s put everything we’ve learned so far into a practical example.

We’re going to create a bash script that reads a file of email addresses and categorizes them based on their domain age. We’ll use regex to extract the domain name, and then check the age using the whois command.

To start, let’s create a new file using the GNU nano editor:

nano email-age.sh

Next, let’s add the following code to the file:

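#!/bin/bash

# Read emails.txt one address per line
while IFS= read -r email; do

    # Extract the domain: everything after the last @
    domain=$(grep -Eo '[^@]+$' <<< "$email")

    # Look up the registration date. The field label varies between
    # registries; "Creation Date:" is assumed here.
    created=$(whois "$domain" | grep -Ei -m1 'Creation Date:' | awk '{print $NF}')

    if [[ -n "$created" ]]; then
        # Age in days (date -d requires GNU date)
        age=$(( ($(date +%s) - $(date -d "$created" +%s)) / 86400 ))

        case 1 in
            $((age < 180))) echo "$domain is less than 6 months old!" ;;
            $((age < 365))) echo "$domain is less than 1 year old!" ;;
            *) echo "$domain is over 1 year old!" ;;
        esac
    else
        echo "Failed to retrieve domain age for $email"
    fi

done < emails.txt

Let’s break down the code. We read the emails.txt file one line at a time with a while read loop, which, unlike for i in $(cat emails.txt), does not split lines on spaces or expand wildcard characters.

Then, we extract the domain name from each email using grep -Eo '[^@]+$'. This regex matches the longest run of characters other than @ at the end of the line, and -o prints only that match, which is the domain name.

Next, we use the whois command to retrieve the domain’s registration record. A second regex selects the creation date line, and awk prints its last field, which is the date itself. The date command then converts that date and the current time to seconds, so the difference can be expressed in days.

Then, we use a case statement to sort the domain age into three categories: less than 6 months, less than 1 year, and over 1 year. Each $((...)) arithmetic comparison expands to 1 when it is true and 0 when it is false, so the first true comparison matches the case word 1.

Finally, we output the result with a custom message.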
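The two string-handling steps of the script can be tried without a network connection. The sketch below isolates them in two functions whose names, domain_of and age_bucket, are made up for this example. It also swaps grep for bash’s ${var##pattern} expansion to extract the domain, which is an alternative technique and not what the script above uses.

```shell
#!/bin/bash

# Hypothetical helper: print the part of an address after the last @.
# ${1##*@} removes the longest prefix that ends in @ (a glob, not a regex).
domain_of() {
    printf '%s\n' "${1##*@}"
}

# Hypothetical helper: sort an age in days into one of three categories.
# Each $((...)) expands to 1 (true) or 0 (false), and the case word is 1,
# so the first true comparison wins.
age_bucket() {
    local days=$1
    case 1 in
        $((days < 180))) echo "less than 6 months" ;;
        $((days < 365))) echo "less than 1 year" ;;
        *)               echo "over 1 year" ;;
    esac
}

domain_of "user@example.com"
age_bucket 90
age_bucket 200
age_bucket 4000
```

Parameter expansion avoids starting a grep process for every address, which can matter for long lists, while the grep version keeps the pattern in regex form.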

Conclusion

Regular expressions are a powerful tool for search queries and text manipulation in bash programming. While case statements accept only glob patterns rather than regex, regular expressions can still be combined with other bash commands to achieve complex tasks.

By learning and mastering regular expressions, you can make your bash scripts more efficient and powerful.

Example 02: Using Regular Expressions with If Statements

Regular expressions can also be used in bash programming for input validation.

If statements are commonly used to check if a certain condition is met and execute a specific command. Here’s an example of using regular expressions with if statements:

read -p "Enter a phone number: " phone

if [[ $phone =~ ^[0-9]{10}$ ]]; then
    echo "Valid phone number entered!"
else
    echo "Invalid phone number entered!"
fi

In this example, we’re using the read command to take in user input for a phone number.

We then use an if statement with the =~ operator to check whether the input matches the regex pattern. The ^[0-9]{10}$ pattern matches any string of exactly 10 digits, which fits, for example, North American numbers written without punctuation or a country code; other formats need a different pattern.

If the condition is met, the script outputs a valid phone number message, otherwise an invalid message.

Another example of using regular expressions with if statements is validating email addresses:

read -p "Enter an email address: " email

if [[ $email =~ ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ ]]; then
    echo "Valid email address entered!"
else
    echo "Invalid email address entered!"
fi

In this example, we’re using the same if statement structure, but with a more complex regex pattern.

The ^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$ pattern matches most common email address formats, though not every address the email standards allow. It matches the local part (username), the @ sign, the domain name, a literal dot (written \. because an unescaped . matches any character), and a top-level domain (TLD) of at least two letters.
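One way to reuse these checks is to keep each pattern in a variable and wrap the test in a small function, as sketched below. The names phone_re, email_re, is_valid_phone, and is_valid_email are invented for this example. Storing the pattern in a variable is a common habit with =~ because it keeps the regex free of shell quoting issues.

```shell
#!/bin/bash

# Patterns from the two examples above, stored in variables (names are hypothetical).
phone_re='^[0-9]{10}$'
email_re='^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}$'

# Each function succeeds (exit status 0) only when its argument matches.
is_valid_phone() { [[ $1 =~ $phone_re ]]; }
is_valid_email() { [[ $1 =~ $email_re ]]; }

# Sample inputs, invented for the example.
for input in "5551234567" "555-123-4567"; do
    if is_valid_phone "$input"; then
        echo "$input: valid phone"
    else
        echo "$input: invalid phone"
    fi
done

for input in "user@example.com" "user@examplecom"; do
    if is_valid_email "$input"; then
        echo "$input: valid email"
    else
        echo "$input: invalid email"
    fi
done
```

The second input in each loop is rejected: the phone number contains hyphens, and the email address has no dot before the TLD.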

Using regular expressions with if statements is an effective way to validate input and ensure that it is in the expected format.

Example 03: Pattern Matching within the Case Statement

The case statement can also sort input by pattern. Its patterns are shell globs rather than regular expressions, but they are often enough for simple checks.

The case statement compares the input against each pattern in turn and executes the commands for the first match. Here’s an example of pattern matching within the case statement:

read -p "Enter a server domain: " domain

case $domain in
    [wW][wW][wW].*) echo "Domain starts with www" ;;
    *.[cC][oO][mM]) echo "Domain ends with .com" ;;
    [a-zA-Z]*.[a-zA-Z]*) echo "Domain has a valid TLD" ;;
    *) echo "Domain does not match known patterns" ;;
esac

In this example, we’re taking in user input for a server domain using the read command.

We then use a case statement to match the input against glob patterns. In a glob, . is a literal dot and * matches any run of characters, so the first pattern, [wW][wW][wW].*, matches any domain that starts with www. in any capitalization.

The second pattern, *.[cC][oO][mM], matches any domain that ends in .com. The third pattern, [a-zA-Z]*.[a-zA-Z]*, matches any input that starts with a letter and contains a dot followed by a letter; this is only a rough check and does not prove that the TLD is valid.

Only the first matching branch runs, so www.example.com is reported as starting with www and never reaches the .com branch. Matching input this way is useful for identifying and handling it in specific ways.
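If a stricter check is wanted, one option is to validate the overall shape with a regex first and only then branch with case. The sketch below assumes a simplified notion of a well-formed domain (dot-separated labels of letters, digits, and inner hyphens, ending in a TLD of at least two letters). The function name classify_domain is invented for this example, and the ${1,,} expansion assumes bash 4 or later.

```shell
#!/bin/bash

# Hypothetical helper: validate with a regex, then classify with case globs.
classify_domain() {
    # Lowercase copy (bash 4+) so the patterns need no [wW]-style brackets.
    local domain=${1,,}
    # Simplified shape check: labels separated by dots, then a 2+ letter TLD.
    local re='^([a-z0-9]([a-z0-9-]*[a-z0-9])?\.)+[a-z]{2,}$'

    if [[ ! $domain =~ $re ]]; then
        echo "not a well-formed domain"
        return
    fi

    case $domain in
        www.*) echo "starts with www" ;;
        *.com) echo "ends with .com" ;;
        *)     echo "other well-formed domain" ;;
    esac
}

classify_domain "WWW.Example.com"
classify_domain "example.com"
classify_domain "example.org"
classify_domain "not a domain"
```

Splitting the work this way keeps the regex responsible for validation and leaves the case statement with short, readable globs.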

Conclusion

Regular expressions are a powerful tool for search queries, text manipulation, input validation, and pattern matching in bash programming. They work with commands like grep and with the =~ operator in if statements, while tr and case statements rely on character sets and glob patterns instead.

Understanding which kind of pattern each tool expects is key to building effective bash scripts. By learning and mastering regular expressions, you can make your scripts more efficient and powerful, and achieve more complex tasks in less time.
