Regex

What is Regex?

Regex (short for Regular Expression) is a sequence of characters that defines a search pattern. Think of it as a mini-language specifically designed for matching, searching, extracting, and replacing text.

Here is a real-world analogy: imagine you are in a library looking for books. Instead of searching for one specific title, you tell the librarian: “Find me all books whose title starts with ‘Java’, has a number in the middle, and ends with ‘Guide’.” That description is essentially a regex — a template that matches multiple possibilities based on a pattern, not a fixed string.

In Java, regex is used everywhere:

  • Validation — Check if user input matches an expected format (email, phone number, password)
  • Search — Find all occurrences of a pattern in a large body of text
  • Extraction — Pull specific pieces of data out of strings (dates from logs, numbers from reports)
  • Replacement — Transform text by replacing matched patterns with new content
  • Splitting — Break strings apart at complex delimiters

Without regex, tasks like “find all email addresses in a 10,000-line log file” would require dozens of lines of manual string parsing. With regex, a single pattern and a short loop are enough.
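As a rough illustration of the log-file task mentioned above, here is a minimal sketch. The class name EmailFinder, the helper findEmails, and the sample log text are all made up for this example, and the email pattern is a deliberately simplified assumption rather than a full RFC-compliant one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailFinder {
    // Simplified email pattern (an assumption for illustration, not RFC-complete)
    private static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");

    // Collects every substring of the text that matches the email pattern
    static List<String> findEmails(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = EMAIL.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical log content
        String log = "login ok for alice@example.com\n"
                   + "mail bounced: bob.smith@mail.example.org (retry)\n"
                   + "no address on this line";
        System.out.println(findEmails(log));
        // [alice@example.com, bob.smith@mail.example.org]
    }
}
```

The pattern is compiled once and the loop calls find() until no further match is left, which is the same workflow the rest of this article builds on.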

Java Regex Classes

Java provides regex support through the java.util.regex package, which contains three core classes:

| Class | Purpose | Key Methods |
|---|---|---|
| Pattern | A compiled representation of a regex pattern. Compiling is relatively expensive, so you compile once and reuse. | compile(), matcher(), matches(), split() |
| Matcher | The engine that performs matching operations against a string using a Pattern. | matches(), find(), group(), replaceAll() |
| PatternSyntaxException | An unchecked exception thrown when a regex pattern has invalid syntax. | getMessage(), getPattern(), getIndex() |
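PatternSyntaxException matters most when the regex does not come from your own source code (for example, a search box or a config file). The following is a minimal sketch of handling it; the class SafeCompile and the helper tryCompile are hypothetical names chosen for this example.

```java
import java.util.Optional;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SafeCompile {
    // Hypothetical helper: returns an empty Optional instead of throwing
    static Optional<Pattern> tryCompile(String regex) {
        try {
            return Optional.of(Pattern.compile(regex));
        } catch (PatternSyntaxException e) {
            // getDescription() explains the problem, getIndex() points at its position
            System.out.println("Invalid regex '" + e.getPattern() + "': "
                + e.getDescription() + " (index " + e.getIndex() + ")");
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile("\\d+").isPresent());      // true
        System.out.println(tryCompile("(unclosed").isPresent()); // false
    }
}
```

Because the exception is unchecked, the compiler will not remind you to catch it, so an explicit try/catch around externally supplied patterns is a reasonable habit.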

The basic workflow for using regex in Java follows three steps:

  1. Compile the regex string into a Pattern object
  2. Create a Matcher by calling pattern.matcher(inputString)
  3. Execute a matching operation: matches(), find(), lookingAt(), etc.
import java.util.regex.Pattern;
import java.util.regex.Matcher;

public class RegexBasics {
    public static void main(String[] args) {
        // Step 1: Compile the pattern
        Pattern pattern = Pattern.compile("Java");

        // Step 2: Create a matcher for the input string
        Matcher matcher = pattern.matcher("I love Java programming");

        // Step 3: Execute matching operations
        boolean found = matcher.find();
        System.out.println("Found 'Java': " + found); // Found 'Java': true

        // matches() checks if the ENTIRE string matches the pattern
        boolean fullMatch = matcher.matches();
        System.out.println("Entire string is 'Java': " + fullMatch); // Entire string is 'Java': false

        // Reset and find the match position
        matcher.reset();
        if (matcher.find()) {
            System.out.println("Match starts at index: " + matcher.start()); // Match starts at index: 7
            System.out.println("Match ends at index: " + matcher.end());     // Match ends at index: 11
            System.out.println("Matched text: " + matcher.group());          // Matched text: Java
        }
    }
}

There is an important distinction between three Matcher methods:

| Method | What it Checks | Example (pattern "Java") |
|---|---|---|
| matches() | Does the entire string match the pattern? | "Java" returns true, "Java rocks" returns false |
| find() | Is the pattern found anywhere in the string? | "I love Java" returns true |
| lookingAt() | Does the beginning of the string match the pattern? | "Java rocks" returns true, "I love Java" returns false |
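To see the three methods side by side, here is a small sketch that runs each of them against the same inputs. The class name MatchModes and the helper describe are illustrative only.

```java
import java.util.regex.Pattern;

class MatchModes {
    private static final Pattern JAVA = Pattern.compile("Java");

    // Runs all three checks; a fresh Matcher per check keeps them independent
    static String describe(String input) {
        boolean matches = JAVA.matcher(input).matches();
        boolean lookingAt = JAVA.matcher(input).lookingAt();
        boolean find = JAVA.matcher(input).find();
        return "matches=" + matches + " lookingAt=" + lookingAt + " find=" + find;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"Java", "Java rocks", "I love Java", "Python"}) {
            System.out.println(s + " -> " + describe(s));
        }
        // Java -> matches=true lookingAt=true find=true
        // Java rocks -> matches=false lookingAt=true find=true
        // I love Java -> matches=false lookingAt=false find=true
        // Python -> matches=false lookingAt=false find=false
    }
}
```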

For quick one-off checks, you can skip the compile step and use the static Pattern.matches() method. However, this recompiles the pattern every time, so avoid it in loops or frequently called methods.

// Quick one-off match (compiles a new Pattern every call -- avoid in loops)
boolean isMatch = Pattern.matches("\\d+", "12345");
System.out.println("All digits: " + isMatch); // All digits: true

// Even quicker: String.matches() delegates to Pattern.matches()
boolean isDigits = "12345".matches("\\d+");
System.out.println("All digits: " + isDigits); // All digits: true
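For code that runs repeatedly, a common alternative is to compile the pattern once into a static final field and create a new Matcher per input. The sketch below assumes a hypothetical class DigitCheck; Pattern instances are immutable and safe to share between threads, while Matcher instances are not.

```java
import java.util.List;
import java.util.regex.Pattern;

class DigitCheck {
    // Compiled once when the class loads, then shared by every call
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    static boolean isAllDigits(String s) {
        // Creating a Matcher is cheap compared with compiling the pattern
        return DIGITS.matcher(s).matches();
    }

    static long countNumeric(List<String> values) {
        return values.stream().filter(DigitCheck::isAllDigits).count();
    }

    public static void main(String[] args) {
        List<String> ids = List.of("12345", "12a45", "007", "");
        System.out.println(countNumeric(ids)); // 2
    }
}
```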

Basic Pattern Syntax

A regex pattern is built from two types of characters:

  • Literal characters — Match themselves exactly. The pattern cat matches the text “cat”.
  • Metacharacters — Special characters with special meaning. They are the building blocks of pattern logic.

Java has 14 metacharacters that have special meaning in regex. If you want to match these characters literally, you must escape them with a backslash (\).

Metacharacter Meaning To Match Literally
. Any single character (except newline by default) \\.
^ Start of string (or line in MULTILINE mode) \\^
$ End of string (or line in MULTILINE mode) \\$
* Zero or more of preceding element \\*
+ One or more of preceding element \\+
? Zero or one of preceding element \\?
{ } Quantifier range (e.g., {2,5}) \\{ \\}
[ ] Character class definition \\[ \\]
( ) Grouping and capturing \\( \\)
\ Escape character \\\\
| Alternation (OR) \\|

Critical Java note: In Java strings, the backslash (\) is itself an escape character. So to write the regex \d (which means “a digit”), you must write "\\d" in Java code — the first backslash escapes the second one for Java, and the resulting \d is what the regex engine sees.

import java.util.regex.*;

public class MetacharacterEscaping {
    public static void main(String[] args) {
        // Without escaping: . matches ANY character
        System.out.println("file.txt".matches("file.txt"));  // true
        System.out.println("fileXtxt".matches("file.txt"));  // true -- oops, . matched 'X'

        // With escaping: \\. matches only a literal dot
        System.out.println("file.txt".matches("file\\.txt")); // true
        System.out.println("fileXtxt".matches("file\\.txt")); // false -- correct!

        // Matching a literal dollar sign in a price
        Pattern price = Pattern.compile("\\$\\d+\\.\\d{2}");
        System.out.println(price.matcher("$19.99").matches()); // true
        System.out.println(price.matcher("$5.00").matches());  // true
        System.out.println(price.matcher("19.99").matches());  // false -- missing $

        // Use Pattern.quote() to treat an entire string as a literal
        String userInput = "price is $10.00 (USD)";
        String searchTerm = "$10.00";
        Pattern literal = Pattern.compile(Pattern.quote(searchTerm));
        Matcher m = literal.matcher(userInput);
        System.out.println(m.find()); // true -- matched "$10.00" literally
    }
}

Character Classes

A character class (also called a character set) matches a single character from a defined set. You define a character class by placing characters inside square brackets [].

Custom Character Classes

| Syntax | Meaning | Example | Matches |
|---|---|---|---|
| [abc] | Any one of a, b, or c | [aeiou] | Any vowel |
| [a-z] | Any character in range a through z | [a-zA-Z] | Any letter |
| [0-9] | Any digit 0 through 9 | [0-9a-f] | Any hex digit |
| [^abc] | Any character except a, b, or c | [^0-9] | Any non-digit |
| [a-z&&[^aeiou]] | Intersection: a-z but not vowels | [a-z&&[^aeiou]] | Any consonant |
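The intersection syntax is the least familiar entry in the table, so here is a short sketch of it in use. The class ConsonantCounter is a made-up name, and lowercasing the input first is a simplification assumed for this example.

```java
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ConsonantCounter {
    // a-z intersected with "not a vowel" leaves only the lowercase consonants
    private static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    static int countConsonants(String text) {
        Matcher m = CONSONANT.matcher(text.toLowerCase(Locale.ROOT));
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countConsonants("hello world")); // 7
        System.out.println(countConsonants("aeiou"));       // 0
    }
}
```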

Predefined Character Classes

Java provides shorthand notation for commonly used character classes. These save typing and improve readability.

| Shorthand | Equivalent | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [a-zA-Z0-9_] | Any word character (letter, digit, or underscore) |
| \W | [^a-zA-Z0-9_] | Any non-word character |
| \s | [ \t\n\x0B\f\r] | Any whitespace character (space, tab, newline, vertical tab, form feed, carriage return) |
| \S | [^ \t\n\x0B\f\r] | Any non-whitespace character |
| . | (almost anything) | Any character except line terminators (unless DOTALL flag is set) |

Remember: in Java strings, you write \\d to produce the regex \d.

import java.util.regex.*;

public class CharacterClasses {
    public static void main(String[] args) {
        // Custom character class: match a vowel followed by a consonant
        Pattern vc = Pattern.compile("[aeiou][^aeiou\\s\\d]");
        Matcher m = vc.matcher("hello world");
        while (m.find()) {
            System.out.println("Found: " + m.group() + " at index " + m.start());
        }
        // Found: el at index 1
        // Found: or at index 7

        // \\d matches any digit
        System.out.println("abc".matches("\\d+")); // false
        System.out.println("123".matches("\\d+")); // true

        // \\w matches word characters (letters, digits, underscore)
        System.out.println("hello_world".matches("\\w+")); // true
        System.out.println("hello world".matches("\\w+")); // false -- space is not a word char

        // \\s matches whitespace
        System.out.println("has spaces".matches(".*\\s.*")); // true
        System.out.println("nospaces".matches(".*\\s.*"));   // false

        // . matches any character except newline
        System.out.println("a".matches("."));  // true
        System.out.println("1".matches("."));  // true
        System.out.println("".matches("."));   // false -- needs exactly one char

        // Ranges: hex digit check
        Pattern hex = Pattern.compile("[0-9a-fA-F]+");
        System.out.println(hex.matcher("1a2bFF").matches()); // true
        System.out.println(hex.matcher("GHIJ").matches());   // false

        // Negation: match non-digits
        Matcher nonDigits = Pattern.compile("[^0-9]+").matcher("abc123def");
        while (nonDigits.find()) {
            System.out.println("Non-digit segment: " + nonDigits.group());
        }
        // Non-digit segment: abc
        // Non-digit segment: def
    }
}

Quantifiers

Quantifiers control how many times a preceding element must occur for a match. Without quantifiers, each element in a pattern matches exactly once.

Quantifier Reference

| Quantifier | Meaning | Example Pattern | Matches | Does Not Match |
|---|---|---|---|---|
| * | Zero or more | ab*c | “ac”, “abc”, “abbc” | “adc” |
| + | One or more | ab+c | “abc”, “abbc” | “ac” |
| ? | Zero or one (optional) | colou?r | “color”, “colour” | “colouur” |
| {n} | Exactly n times | \\d{3} | “123” | “12”, “1234” |
| {n,} | At least n times | \\d{2,} | “12”, “123”, “1234” | “1” |
| {n,m} | Between n and m times | \\d{2,4} | “12”, “123”, “1234” | “1”, “12345” |

Greedy vs Lazy Quantifiers

By default, all quantifiers are greedy — they match as much text as possible. Adding a ? after a quantifier makes it lazy (also called reluctant) — it matches as little text as possible.

| Greedy | Lazy | Behavior of the Lazy Form |
|---|---|---|
| * | *? | Match as few as possible (zero or more) |
| + | +? | Match as few as possible (one or more) |
| ? | ?? | Match zero if possible |
| {n,m} | {n,m}? | Match n times if possible |

The difference matters most when your pattern has flexible parts and you need to control where the match stops.

import java.util.regex.*;

public class Quantifiers {
    public static void main(String[] args) {
        // Greedy vs Lazy demonstration
        String html = "<b>bold</b> and <b>more bold</b>";

        // Greedy: .* grabs as much as possible
        Matcher greedy = Pattern.compile("<b>.*</b>").matcher(html);
        if (greedy.find()) {
            System.out.println("Greedy: " + greedy.group());
            // Greedy: <b>bold</b> and <b>more bold</b>
            // -- matched from the first <b> to the LAST </b>
        }

        // Lazy: .*? grabs as little as possible
        Matcher lazy = Pattern.compile("<b>.*?</b>").matcher(html);
        while (lazy.find()) {
            System.out.println("Lazy: " + lazy.group());
        }
        // Lazy: <b>bold</b>
        // Lazy: <b>more bold</b>
        // -- matched each <b>...</b> pair individually

        // Exact count: match a US zip code (5 digits, optional -4 digits)
        Pattern zip = Pattern.compile("\\d{5}(-\\d{4})?");
        System.out.println(zip.matcher("90210").matches());      // true
        System.out.println(zip.matcher("90210-1234").matches()); // true
        System.out.println(zip.matcher("9021").matches());       // false
        System.out.println(zip.matcher("902101234").matches());  // false

        // Range: password length check (8 to 20 characters)
        Pattern length = Pattern.compile(".{8,20}");
        System.out.println(length.matcher("short").matches());             // false (5 chars)
        System.out.println(length.matcher("justright").matches());         // true (9 chars)
        System.out.println(length.matcher("ab".repeat(11)).matches());     // false (22 chars)

        // Optional element: match "http" or "https"
        Pattern protocol = Pattern.compile("https?://.*");
        System.out.println(protocol.matcher("http://example.com").matches());  // true
        System.out.println(protocol.matcher("https://example.com").matches()); // true
        System.out.println(protocol.matcher("ftp://example.com").matches());   // false
    }
}

Anchors and Boundaries

Anchors do not match characters — they match positions in the string. They assert that the current position in the string meets a certain condition.

| Anchor | Meaning | Example |
|---|---|---|
| ^ | Start of string (or start of each line with MULTILINE flag) | ^Hello matches “Hello world” but not “Say Hello” |
| $ | End of string (or end of each line with MULTILINE flag) | world$ matches “Hello world” but not “world peace” |
| \b | Word boundary (between a word char and a non-word char) | \bcat\b matches “the cat sat” but not “concatenate” |
| \B | Non-word boundary (between two word chars or two non-word chars) | \Bcat\B matches “concatenate” but not “the cat sat” |

Word boundaries (\b) are one of the most useful anchors. A word boundary exists between a word character (\w) and a non-word character (\W), or at the start/end of the string if it begins/ends with a word character.

import java.util.regex.*;

public class AnchorsAndBoundaries {
    public static void main(String[] args) {
        // ^ and $ -- start and end anchors
        System.out.println("Hello World".matches("^Hello.*"));  // true
        System.out.println("Say Hello".matches("^Hello.*"));    // false

        // Without anchors, find() looks anywhere in the string
        Matcher m1 = Pattern.compile("error").matcher("An error occurred");
        System.out.println(m1.find()); // true

        // matches() is implicitly anchored: the ENTIRE string must match, even without ^ and $
        System.out.println("An error occurred".matches("error")); // false -- not the whole string
        System.out.println("error".matches("error"));             // true

        // \\b word boundary -- match whole words only
        String text = "The cat scattered the catalog across the category";
        Matcher wordCat = Pattern.compile("\\bcat\\b").matcher(text);
        int count = 0;
        while (wordCat.find()) {
            System.out.println("Found whole word 'cat' at index " + wordCat.start());
            count++;
        }
        System.out.println("Total matches: " + count);
        // Found whole word 'cat' at index 4
        // Total matches: 1
        // -- "scattered", "catalog", and "category" were correctly excluded

        // Without word boundary -- matches "cat" inside other words too
        Matcher anyCat = Pattern.compile("cat").matcher(text);
        count = 0;
        while (anyCat.find()) {
            count++;
        }
        System.out.println("Without boundary: " + count + " matches");
        // Without boundary: 4 matches

        // ^ and $ with MULTILINE flag -- match each line
        String multiline = "First line\nSecond line\nThird line";
        Matcher lineStarts = Pattern.compile("^\\w+", Pattern.MULTILINE).matcher(multiline);
        while (lineStarts.find()) {
            System.out.println("Line starts with: " + lineStarts.group());
        }
        // Line starts with: First
        // Line starts with: Second
        // Line starts with: Third
    }
}

Groups and Capturing

Parentheses () in a regex serve two purposes: they group parts of the pattern together (so quantifiers or alternation can apply to the whole group), and they capture the matched text (so you can retrieve it later).

Capturing Groups

Each pair of parentheses creates a capturing group, numbered left-to-right starting at 1. Group 0 always refers to the entire match.

For the pattern (\\d{3})-(\\d{3})-(\\d{4}) matching “555-123-4567”:

  • group(0) = “555-123-4567” (entire match)
  • group(1) = “555” (area code)
  • group(2) = “123” (prefix)
  • group(3) = “4567” (line number)

Named Groups

Numbered groups can be hard to read in complex patterns. Java supports named capturing groups using the syntax (?<name>...). You retrieve the value with matcher.group("name").

Non-Capturing Groups

Sometimes you need parentheses for grouping (e.g., to apply a quantifier to a group) but do not need to capture the matched text. Use (?:...) for a non-capturing group. This is slightly more efficient since the regex engine does not need to store the match.

Backreferences

A backreference refers back to a previously captured group within the same pattern. \\1 refers to the text matched by group 1, \\2 refers to group 2, and so on. This is useful for finding repeated patterns like duplicate words.

import java.util.regex.*;

public class GroupsAndCapturing {
    public static void main(String[] args) {
        // --- Numbered Capturing Groups ---
        String phone = "Call me at 555-123-4567 or 800-555-0199";
        Pattern phonePattern = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");
        Matcher m = phonePattern.matcher(phone);

        while (m.find()) {
            System.out.println("Full match:  " + m.group(0));
            System.out.println("Area code:   " + m.group(1));
            System.out.println("Prefix:      " + m.group(2));
            System.out.println("Line number: " + m.group(3));
            System.out.println();
        }
        // Full match:  555-123-4567
        // Area code:   555
        // Prefix:      123
        // Line number: 4567
        //
        // Full match:  800-555-0199
        // Area code:   800
        // Prefix:      555
        // Line number: 0199

        // --- Named Capturing Groups ---
        String dateStr = "2026-02-28";
        Pattern datePattern = Pattern.compile(
            "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
        );
        Matcher dm = datePattern.matcher(dateStr);

        if (dm.matches()) {
            System.out.println("Year:  " + dm.group("year"));   // Year:  2026
            System.out.println("Month: " + dm.group("month"));  // Month: 02
            System.out.println("Day:   " + dm.group("day"));    // Day:   28
        }

        // --- Non-Capturing Groups ---
        // Match "http" or "https" without capturing the "s"
        Pattern url = Pattern.compile("(?:https?)://([\\w.]+)");
        Matcher um = url.matcher("Visit https://example.com today");

        if (um.find()) {
            System.out.println("Full match: " + um.group(0));  // Full match: https://example.com
            System.out.println("Domain:     " + um.group(1));  // Domain:     example.com
            // group(1) is the domain, not "https" -- because (?:...) did not capture
        }

        // --- Backreferences: find duplicate words ---
        String text = "This is is a test test of of duplicate words";
        Pattern dupes = Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);
        Matcher dupeMatcher = dupes.matcher(text);

        while (dupeMatcher.find()) {
            System.out.println("Duplicate found: \"" + dupeMatcher.group() + "\"");
        }
        // Duplicate found: "is is"
        // Duplicate found: "test test"
        // Duplicate found: "of of"
    }
}

Alternation and Lookaround

Alternation (OR)

The pipe character | acts as an OR operator. The pattern cat|dog matches either “cat” or “dog”. Alternation has the lowest precedence of any regex operator, so gray|grey matches “gray” or “grey”, not “gra” followed by “y|grey”.

To limit the scope of alternation, use parentheses: gr(a|e)y matches “gray” or “grey”.
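The low precedence of | is easiest to trip over when anchors are involved. The sketch below, using a hypothetical class AlternationScope, compares an ungrouped and a grouped alternation with find().

```java
import java.util.regex.Pattern;

class AlternationScope {
    // Parsed as (^cat) OR (dog$): each anchor applies to only one branch
    private static final Pattern UNGROUPED = Pattern.compile("^cat|dog$");
    // The group limits the alternation, so both anchors apply to both branches
    private static final Pattern GROUPED = Pattern.compile("^(?:cat|dog)$");

    static boolean ungroupedFinds(String s) {
        return UNGROUPED.matcher(s).find();
    }

    static boolean groupedFinds(String s) {
        return GROUPED.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"dog", "catalog", "hotdog"}) {
            System.out.println(s + " -> ungrouped=" + ungroupedFinds(s)
                + " grouped=" + groupedFinds(s));
        }
        // dog -> ungrouped=true grouped=true
        // catalog -> ungrouped=true grouped=false
        // hotdog -> ungrouped=true grouped=false
    }
}
```

The ungrouped version accepts "catalog" because ^cat matches at the start, and "hotdog" because dog$ matches at the end.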

Lookahead and Lookbehind

Lookaround assertions check if a pattern exists before or after the current position, but they do not consume characters (the match position does not advance). They are “zero-width assertions” — they assert a condition without including the matched text in the result.

| Syntax | Name | Meaning | Example |
|---|---|---|---|
| (?=...) | Positive lookahead | What follows must match | \\d+(?= dollars) matches “100” in “100 dollars” |
| (?!...) | Negative lookahead | What follows must NOT match | \\d+(?! dollars) matches “100” in “100 euros” |
| (?<=...) | Positive lookbehind | What precedes must match | (?<=\\$)\\d+ matches "50" in "$50" |
| (?<!...) | Negative lookbehind | What precedes must NOT match | (?<!\\$)\\b\\d+ matches "50" in "50" but finds nothing in "$50" |

Lookarounds are especially useful in password validation, where you need to check multiple conditions at the same position (e.g., must contain a digit AND a special character AND an uppercase letter).

import java.util.regex.*;

public class AlternationAndLookaround {
    public static void main(String[] args) {
        // --- Alternation ---
        Pattern pet = Pattern.compile("cat|dog|bird");
        String text = "I have a cat and a dog but no bird";
        Matcher m = pet.matcher(text);
        while (m.find()) {
            System.out.println("Found pet: " + m.group());
        }
        // Found pet: cat
        // Found pet: dog
        // Found pet: bird

        // Alternation with grouping
        Pattern color = Pattern.compile("gr(a|e)y");
        System.out.println(color.matcher("gray").matches());  // true
        System.out.println(color.matcher("grey").matches());  // true
        System.out.println(color.matcher("griy").matches());  // false

        // --- Positive Lookahead: find numbers followed by "px" ---
        Matcher lookahead = Pattern.compile("\\d+(?=px)").matcher("width: 100px; height: 50px; margin: 10em");
        while (lookahead.find()) {
            System.out.println("Pixel value: " + lookahead.group());
        }
        // Pixel value: 100
        // Pixel value: 50
        // -- "10" was excluded because it is followed by "em", not "px"

        // --- Negative Lookahead: find numbers NOT followed by "px" ---
        Matcher negLookahead = Pattern.compile("\\d+(?!px)").matcher("width: 100px; margin: 10em");
        while (negLookahead.find()) {
            System.out.println("Non-pixel: " + negLookahead.group());
        }
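        // Non-pixel: 10   <- from "100px": \d+ backtracks to "10", which is followed by "0", not "px"
        // Non-pixel: 10   <- from "10em"
        // To avoid partial numbers, also forbid a following digit: \\d+(?!\\d|px)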

        // --- Positive Lookbehind: extract amounts after $ ---
        Matcher lookbehind = Pattern.compile("(?<=\\$)\\d+\\.?\\d*").matcher("Price: $19.99 and $5.00");
        while (lookbehind.find()) {
            System.out.println("Amount: " + lookbehind.group());
        }
        // Amount: 19.99
        // Amount: 5.00

        // --- Password validation using multiple lookaheads ---
        // At least 8 chars, one uppercase, one lowercase, one digit, one special char
        Pattern strongPassword = Pattern.compile(
            "^(?=.*[A-Z])"  +   // at least one uppercase
            "(?=.*[a-z])"   +   // at least one lowercase
            "(?=.*\\d)"     +   // at least one digit
            "(?=.*[@#$%^&+=!])" + // at least one special character
            ".{8,}$"             // at least 8 characters total
        );

        String[] passwords = {"Passw0rd!", "password", "SHORT1!", "MyP@ss12"};
        for (String pw : passwords) {
            boolean strong = strongPassword.matcher(pw).matches();
            System.out.println(pw + " -> " + (strong ? "STRONG" : "WEAK"));
        }
        // Passw0rd! -> STRONG
        // password -> WEAK
        // SHORT1! -> WEAK
        // MyP@ss12 -> STRONG
    }
}
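Negative lookbehind is the one lookaround from the table that the example above does not exercise. Here is a minimal sketch, assuming whole-number amounts only; the class NegativeLookbehindDemo and the sample sentence are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookbehindDemo {
    // Whole numbers NOT preceded by "$". The \b anchors stop the engine
    // from matching the tail of a number, such as the "0" in "$20".
    private static final Pattern UNPRICED = Pattern.compile("(?<!\\$)\\b\\d+\\b");

    static List<String> unpricedNumbers(String text) {
        List<String> result = new ArrayList<>();
        Matcher m = UNPRICED.matcher(text);
        while (m.find()) {
            result.add(m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(unpricedNumbers("Items: 3 at $20 each, 15 left"));
        // [3, 15]
    }
}
```

Without the word boundaries, the pattern would still find "0" inside "$20", because that digit is preceded by "2" rather than "$".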

Common String Methods with Regex

Java's String class has several built-in methods that accept regex patterns. These are convenient for simple use cases where you do not need the full power of Pattern and Matcher.

| Method | What it Does | Returns |
|---|---|---|
| String.matches(regex) | Tests if the entire string matches the regex | boolean |
| String.split(regex) | Splits the string at each match of the regex | String[] |
| String.split(regex, limit) | Splits with a limit on the number of parts | String[] |
| String.replaceAll(regex, replacement) | Replaces all matches with the replacement | String |
| String.replaceFirst(regex, replacement) | Replaces only the first match | String |

Performance warning: These methods generally compile a new Pattern on every call (in OpenJDK, String.split() skips the regex engine only for simple single-character delimiters). If you call them in a loop or on a hot path, compile the Pattern once yourself and use Matcher instead.
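One way to act on this warning is to keep the patterns in static fields and call the equivalent Pattern and Matcher methods. A minimal sketch, where the class LineTools and its helper names are assumptions made for this example:

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class LineTools {
    // Compiled once and reused for every line
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");
    private static final Pattern COMMA = Pattern.compile(",\\s*");

    // Same result as line.trim().replaceAll("\\s+", " "), without recompiling
    static String normalize(String line) {
        return WHITESPACE.matcher(line.trim()).replaceAll(" ");
    }

    // Same result as line.split(",\\s*"), without recompiling
    static String[] fields(String line) {
        return COMMA.split(line);
    }

    public static void main(String[] args) {
        System.out.println(normalize("  too   many\tspaces "));           // too many spaces
        System.out.println(Arrays.toString(fields("Java, Python,  C++"))); // [Java, Python, C++]
    }
}
```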

import java.util.Arrays;

public class StringRegexMethods {
    public static void main(String[] args) {
        // --- matches() -- checks the ENTIRE string ---
        System.out.println("12345".matches("\\d+"));       // true
        System.out.println("123abc".matches("\\d+"));      // false -- not all digits
        System.out.println("hello".matches("[a-z]+"));     // true

        // --- split() -- break a string into parts ---
        // Split on one or more whitespace characters
        String sentence = "Split   this   string   up";
        String[] words = sentence.split("\\s+");
        System.out.println(Arrays.toString(words));
        // [Split, this, string, up]

        // Split a CSV line (handles optional spaces after commas)
        String csv = "Java, Python,  C++, JavaScript";
        String[] languages = csv.split(",\\s*");
        System.out.println(Arrays.toString(languages));
        // [Java, Python, C++, JavaScript]

        // Split with a limit
        String path = "com.example.project.Main";
        String[] parts = path.split("\\.", 3); // at most 3 parts
        System.out.println(Arrays.toString(parts));
        // [com, example, project.Main]

        // --- replaceAll() -- replace all matches ---
        // Remove all non-alphanumeric characters
        String dirty = "Hello, World! @2026";
        String clean = dirty.replaceAll("[^a-zA-Z0-9]", "");
        System.out.println(clean); // HelloWorld2026

        // Normalize whitespace: replace multiple spaces/tabs with a single space
        String messy = "too   many     spaces    here";
        String normalized = messy.replaceAll("\\s+", " ");
        System.out.println(normalized); // too many spaces here

        // --- replaceFirst() -- replace only the first match ---
        String text = "error: file not found. error: permission denied.";
        String result = text.replaceFirst("error", "WARNING");
        System.out.println(result);
        // WARNING: file not found. error: permission denied.

        // Use captured groups in replacement with $1, $2, etc.
        // Reformat dates from MM/DD/YYYY to YYYY-MM-DD
        String date = "02/28/2026";
        String reformatted = date.replaceAll("(\\d{2})/(\\d{2})/(\\d{4})", "$3-$1-$2");
        System.out.println(reformatted); // 2026-02-28
    }
}
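Replacement strings can also refer to named groups with the ${name} syntax, which tends to read better than $1, $2 once a pattern has several groups. A small sketch of the same date reformatting, using a hypothetical class DateReformatter:

```java
import java.util.regex.Pattern;

class DateReformatter {
    // MM/DD/YYYY with one named group per component
    private static final Pattern US_DATE =
        Pattern.compile("(?<month>\\d{2})/(?<day>\\d{2})/(?<year>\\d{4})");

    // Rewrites every MM/DD/YYYY date in the text as YYYY-MM-DD
    static String toIso(String text) {
        return US_DATE.matcher(text).replaceAll("${year}-${month}-${day}");
    }

    public static void main(String[] args) {
        System.out.println(toIso("Due 02/28/2026, paid 03/01/2026"));
        // Due 2026-02-28, paid 2026-03-01
    }
}
```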

Pattern Flags

Pattern flags modify how the regex engine interprets the pattern. You pass them as the second argument to Pattern.compile(), or embed them directly in the pattern using inline flag syntax.

| Flag Constant | Inline | Effect |
|---|---|---|
| Pattern.CASE_INSENSITIVE | (?i) | Matches letters regardless of case. abc matches "ABC". |
| Pattern.MULTILINE | (?m) | ^ and $ match start/end of each line, not just the entire string. |
| Pattern.DOTALL | (?s) | . matches any character including newline. |
| Pattern.COMMENTS | (?x) | Whitespace and comments (# to end of line) in the pattern are ignored. Great for readability. |
| Pattern.UNICODE_CASE | (?u) | Case-insensitive matching follows Unicode rules, not just ASCII. |
| Pattern.LITERAL | (none) | The pattern is treated as a literal string (metacharacters have no special meaning). |

You can combine multiple flags using the bitwise OR operator (|).

import java.util.regex.*;

public class PatternFlags {
    public static void main(String[] args) {
        // --- CASE_INSENSITIVE ---
        Pattern ci = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
        System.out.println(ci.matcher("JAVA").matches());   // true
        System.out.println(ci.matcher("Java").matches());   // true
        System.out.println(ci.matcher("jAvA").matches());   // true

        // Same thing using inline flag (?i)
        System.out.println("JAVA".matches("(?i)java"));     // true

        // --- MULTILINE ---
        String log = "ERROR: disk full\nWARN: low memory\nERROR: timeout";
        Pattern errorLines = Pattern.compile("^ERROR.*$", Pattern.MULTILINE);
        Matcher m = errorLines.matcher(log);
        while (m.find()) {
            System.out.println(m.group());
        }
        // ERROR: disk full
        // ERROR: timeout

        // --- DOTALL ---
        String html = "<div>\nHello\nWorld\n</div>";

        // Without DOTALL, . does not match newlines
        System.out.println(html.matches("<div>.*</div>")); // false

        // With DOTALL, . matches everything including newlines
        Pattern dotall = Pattern.compile("<div>.*</div>", Pattern.DOTALL);
        System.out.println(dotall.matcher(html).matches()); // true

        // --- COMMENTS -- write readable patterns ---
        // Option 1: plain string concatenation with Java comments (no flag needed)
        Pattern readable = Pattern.compile(
            "\\d{3}" +  // area code
            "-"      +  // separator
            "\\d{3}" +  // prefix
            "-"      +  // separator
            "\\d{4}"    // line number
        );
        System.out.println(readable.matcher("555-123-4567").matches()); // true

        // Option 2: COMMENTS mode, where whitespace inside the pattern itself is ignored
        Pattern commented = Pattern.compile(
            "(?x) "   +  // enable comments mode
            "\\d{3} " +  // area code
            "- "      +  // dash separator
            "\\d{3} " +  // prefix
            "- "      +  // dash separator
            "\\d{4} "    // line number
        );
        System.out.println(commented.matcher("555-123-4567").matches()); // true

        // --- Combining multiple flags ---
        Pattern combined = Pattern.compile(
            "^error.*$",
            Pattern.CASE_INSENSITIVE | Pattern.MULTILINE
        );
        Matcher cm = combined.matcher("Error: something\nERROR: another\ninfo: ok");
        while (cm.find()) {
            System.out.println("Found: " + cm.group());
        }
        // Found: Error: something
        // Found: ERROR: another
    }
}

Practical Validation Examples

One of the most common uses of regex is input validation. Below are commonly used patterns for typical formats, each broken down so you understand every part. Treat them as practical starting points rather than complete specifications.

1. Email Validation

A simplified but practical email regex. Note that the full RFC 5322 email spec is extremely complex -- this pattern covers the vast majority of real-world addresses.

// Email: local-part@domain.tld
// ^                    -- start of string
// [a-zA-Z0-9._%+-]+   -- local part: letters, digits, dots, underscores, %, +, -
// @                    -- literal @ symbol
// [a-zA-Z0-9.-]+      -- domain: letters, digits, dots, hyphens
// \.                   -- literal dot before TLD
// [a-zA-Z]{2,}        -- TLD: at least 2 letters (com, org, io, etc.)
// $                    -- end of string

String emailRegex = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";

String[] emails = {"user@example.com", "first.last@company.co.uk", "invalid@", "@nodomain.com", "test@site.io"};
for (String email : emails) {
    System.out.println(email + " -> " + (email.matches(emailRegex) ? "VALID" : "INVALID"));
}
// user@example.com -> VALID
// first.last@company.co.uk -> VALID
// invalid@ -> INVALID
// @nodomain.com -> INVALID
// test@site.io -> VALID

2. Phone Number (US)

Matches multiple common US phone formats: (555) 123-4567, 555-123-4567, 5551234567, +1-555-123-4567.

// US phone: optional country code, optional area code, various separator formats
// ^                          -- start
// (\\+1[- ]?)?               -- optional +1 country code with optional separator
// (\\(?\\d{3}\\)?[- ]?)?     -- optional area code group:
//                               optional "(", 3 digits, optional ")", optional separator
// \\d{3}                     -- prefix (3 digits)
// [- ]?                      -- optional separator (dash or space)
// \\d{4}                     -- line number (4 digits)
// $                          -- end
// Note: because the area code group is optional, 7-digit numbers such as
// "123-4567" also pass, and the parentheses are not required to be balanced.

String phoneRegex = "^(\\+1[- ]?)?(\\(?\\d{3}\\)?[- ]?)?\\d{3}[- ]?\\d{4}$";

String[] phones = {"(555) 123-4567", "555-123-4567", "5551234567", "+1-555-123-4567", "123"};
for (String phone : phones) {
    System.out.println(phone + " -> " + (phone.matches(phoneRegex) ? "VALID" : "INVALID"));
}
// (555) 123-4567 -> VALID
// 555-123-4567 -> VALID
// 5551234567 -> VALID
// +1-555-123-4567 -> VALID
// 123 -> INVALID

3. Password Strength

Uses lookaheads to enforce multiple rules simultaneously: minimum length, uppercase, lowercase, digit, and special character.

// Password must have:
// (?=.*[A-Z])          -- at least one uppercase letter
// (?=.*[a-z])          -- at least one lowercase letter
// (?=.*\\d)            -- at least one digit
// (?=.*[@#$%^&+=!])    -- at least one special character
// .{8,20}              -- between 8 and 20 characters total

String passwordRegex = "^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[@#$%^&+=!]).{8,20}$";

String[] passwords = {"Str0ng!Pass", "weakpassword", "SHORT1!", "NoSpecial1", "G00d@Pwd"};
for (String pw : passwords) {
    System.out.println(pw + " -> " + (pw.matches(passwordRegex) ? "STRONG" : "WEAK"));
}
// Str0ng!Pass -> STRONG
// weakpassword -> WEAK (no uppercase, no digit, no special)
// SHORT1! -> WEAK (less than 8 chars, and no lowercase letter)
// NoSpecial1 -> WEAK (no special character)
// G00d@Pwd -> STRONG

4. URL Validation

// URL: protocol://domain:port/path?query#fragment
// ^https?://           -- http or https
// [\\w.-]+             -- domain name
// (:\\d{1,5})?         -- optional port (1-5 digits)
// (/[\\w.-]*)*         -- optional path segments ("/" is kept out of the class so each segment can match only one way)
// (\\?[\\w=&%-]*)?     -- optional query string
// (#[\\w-]*)?          -- optional fragment
// $

String urlRegex = "^https?://[\\w.-]+(:\\d{1,5})?(/[\\w.-]*)*(\\?[\\w=&%-]*)?(#[\\w-]*)?$";

String[] urls = {
    "https://example.com",
    "http://localhost:8080/api/users",
    "https://site.com/page?name=test&id=5",
    "ftp://invalid.com",
    "https://example.com/path#section"
};
for (String url : urls) {
    System.out.println(url + " -> " + (url.matches(urlRegex) ? "VALID" : "INVALID"));
}
// https://example.com -> VALID
// http://localhost:8080/api/users -> VALID
// https://site.com/page?name=test&id=5 -> VALID
// ftp://invalid.com -> INVALID
// https://example.com/path#section -> VALID

5. IP Address (IPv4)

// IPv4: four octets (0-255) separated by dots
// Each octet: 25[0-5] | 2[0-4]\\d | [01]?\\d{1,2}
// This handles: 0-9, 10-99, 100-199, 200-249, 250-255

String ipRegex = "^((25[0-5]|2[0-4]\\d|[01]?\\d{1,2})\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d{1,2})$";

String[] ips = {"192.168.1.1", "255.255.255.255", "0.0.0.0", "256.1.1.1", "192.168.1"};
for (String ip : ips) {
    System.out.println(ip + " -> " + (ip.matches(ipRegex) ? "VALID" : "INVALID"));
}
// 192.168.1.1 -> VALID
// 255.255.255.255 -> VALID
// 0.0.0.0 -> VALID
// 256.1.1.1 -> INVALID (256 is out of range)
// 192.168.1 -> INVALID (only 3 octets)

6. Date Format (YYYY-MM-DD)

// Date: YYYY-MM-DD (basic format validation, not full calendar validation)
// \\d{4}              -- 4-digit year
// -                   -- separator
// (0[1-9]|1[0-2])    -- month: 01-12
// -                   -- separator
// (0[1-9]|[12]\\d|3[01]) -- day: 01-31

String dateRegex = "^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])$";

String[] dates = {"2026-02-28", "2026-13-01", "2026-00-15", "2026-12-31", "26-01-01"};
for (String date : dates) {
    System.out.println(date + " -> " + (date.matches(dateRegex) ? "VALID" : "INVALID"));
}
// 2026-02-28 -> VALID
// 2026-13-01 -> INVALID (month 13)
// 2026-00-15 -> INVALID (month 00)
// 2026-12-31 -> VALID
// 26-01-01 -> INVALID (2-digit year)
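
The pattern above checks only the shape of the date, so it still accepts impossible dates such as 2026-02-31. One way to add calendar validation is to hand the string to java.time after (or instead of) the regex check. The sketch below is an illustration, not part of the original example; the class name CalendarDateCheck and the method isRealDate are hypothetical.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;

class CalendarDateCheck {

    // "uuuu" (proleptic year) is used instead of "yyyy" because STRICT resolution
    // with "yyyy" (year-of-era) would also require an era field.
    private static final DateTimeFormatter STRICT_ISO =
        DateTimeFormatter.ofPattern("uuuu-MM-dd").withResolverStyle(ResolverStyle.STRICT);

    // Returns true only if the text is a date that exists on the calendar.
    static boolean isRealDate(String text) {
        if (text == null) return false;
        try {
            LocalDate.parse(text, STRICT_ISO);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRealDate("2026-02-28")); // true
        System.out.println(isRealDate("2026-02-31")); // false (February has no day 31)
        System.out.println(isRealDate("2024-02-29")); // true (leap year)
    }
}
```

A reasonable split is to keep the regex for fast format rejection and let java.time decide whether the day actually exists.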

7. Credit Card Number

// Credit card: 13-19 digits, optionally separated by spaces or dashes every 4 digits
// Common formats: Visa (4xxx), Mastercard (5xxx), Amex (34xx/37xx)

String ccRegex = "^\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{1,7}$";

String[] cards = {"4111111111111111", "4111-1111-1111-1111", "4111 1111 1111 1111", "411", "1234567890123456789012"};
for (String card : cards) {
    System.out.println(card + " -> " + (card.matches(ccRegex) ? "VALID FORMAT" : "INVALID FORMAT"));
}
// 4111111111111111 -> VALID FORMAT
// 4111-1111-1111-1111 -> VALID FORMAT
// 4111 1111 1111 1111 -> VALID FORMAT
// 411 -> INVALID FORMAT
// 1234567890123456789012 -> INVALID FORMAT

// Note: this only validates the FORMAT, not the actual card number.
// Use the Luhn algorithm for checksum validation.
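
As the note above says, a format match does not prove the number is plausible. A minimal sketch of the Luhn checksum might look like the following; the class name LuhnCheck and the method passesLuhn are hypothetical, and the code assumes the same separators (spaces and dashes) that the format regex allows.

```java
class LuhnCheck {

    // Returns true if the card number (with optional spaces or dashes) passes the Luhn checksum.
    static boolean passesLuhn(String cardNumber) {
        if (cardNumber == null) return false;
        String digits = cardNumber.replaceAll("[- ]", "");
        if (!digits.matches("\\d{13,19}")) return false;

        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of the product
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(passesLuhn("4111-1111-1111-1111")); // true
        System.out.println(passesLuhn("4111-1111-1111-1112")); // false
    }
}
```

Running the format regex first and the checksum second keeps each step easy to test on its own.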

8. Social Security Number (SSN)

// SSN format: XXX-XX-XXXX
// (?!000|666)         -- area number cannot be 000 or 666
// (?!9\\d{2})         -- area number cannot start with 9
// \\d{3}              -- 3-digit area number
// -                   -- separator
// (?!00)\\d{2}        -- 2-digit group number (not 00)
// -                   -- separator
// (?!0000)\\d{4}      -- 4-digit serial number (not 0000)

String ssnRegex = "^(?!000|666)(?!9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0000)\\d{4}$";

String[] ssns = {"123-45-6789", "000-12-3456", "666-12-3456", "900-12-3456", "123-00-6789", "123-45-0000"};
for (String ssn : ssns) {
    System.out.println(ssn + " -> " + (ssn.matches(ssnRegex) ? "VALID" : "INVALID"));
}
// 123-45-6789 -> VALID
// 000-12-3456 -> INVALID (area 000)
// 666-12-3456 -> INVALID (area 666)
// 900-12-3456 -> INVALID (area starts with 9)
// 123-00-6789 -> INVALID (group 00)
// 123-45-0000 -> INVALID (serial 0000)

Search and Replace

Beyond validation, regex is heavily used for searching text and performing replacements. The Matcher class gives you fine-grained control over the search and replace process.

Finding All Matches with find()

The find() method scans the input for the next match. Call it in a while loop to iterate through all matches.

import java.util.regex.*;
import java.util.ArrayList;
import java.util.List;

public class SearchAndReplace {
    public static void main(String[] args) {
        // --- Finding all matches ---
        String text = "Contact us at support@company.com or sales@company.com. " +
                       "Personal: john.doe@gmail.com";
        Pattern emailPattern = Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");
        Matcher finder = emailPattern.matcher(text);

        List<String> emails = new ArrayList<>();
        while (finder.find()) {
            emails.add(finder.group());
            System.out.println("Found email at [" + finder.start() + "-" + finder.end() + "]: " + finder.group());
        }
        // Found email at [14-33]: support@company.com
        // Found email at [37-54]: sales@company.com
        // Found email at [66-84]: john.doe@gmail.com
        System.out.println("Total emails found: " + emails.size()); // Total emails found: 3

        // --- Simple replaceAll ---
        String censored = emailPattern.matcher(text).replaceAll("[REDACTED]");
        System.out.println(censored);
        // Contact us at [REDACTED] or [REDACTED]. Personal: [REDACTED]

        // --- replaceFirst ---
        String firstOnly = emailPattern.matcher(text).replaceFirst("[REDACTED]");
        System.out.println(firstOnly);
        // Contact us at [REDACTED] or sales@company.com. Personal: john.doe@gmail.com
    }
}

Custom Replacement with appendReplacement / appendTail

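When you need dynamic replacements (e.g., the replacement depends on the matched value), use appendReplacement() and appendTail(). This pair lets you build a result string incrementally, applying custom logic to each match. The StringBuilder overloads used below require Java 9 or later (earlier versions accept only StringBuffer). Because $ and \ have special meaning in the replacement string, wrap replacement text that may contain them in Matcher.quoteReplacement().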

import java.util.regex.*;

public class CustomReplacement {
    public static void main(String[] args) {
        // Convert all words to title case using appendReplacement
        String input = "the quick brown fox jumps over the lazy dog";
        Pattern wordPattern = Pattern.compile("\\b([a-z])(\\w*)");
        Matcher m = wordPattern.matcher(input);
        StringBuilder result = new StringBuilder();

        while (m.find()) {
            String titleCase = m.group(1).toUpperCase() + m.group(2);
            m.appendReplacement(result, titleCase);
        }
        m.appendTail(result);
        System.out.println(result);
        // The Quick Brown Fox Jumps Over The Lazy Dog

        // Mask credit card numbers: show only last 4 digits
        String data = "Card: 4111-1111-1111-1111, Another: 5500-0000-0000-0004";
        Pattern ccPattern = Pattern.compile("(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
        Matcher ccMatcher = ccPattern.matcher(data);
        StringBuilder masked = new StringBuilder();

        while (ccMatcher.find()) {
            String replacement = "****-****-****-" + ccMatcher.group(4);
            ccMatcher.appendReplacement(masked, replacement);
        }
        ccMatcher.appendTail(masked);
        System.out.println(masked);
        // Card: ****-****-****-1111, Another: ****-****-****-0004

        // Java 9+: Matcher.replaceAll with a Function
        String prices = "Items cost $5 and $23 and $100";
        Pattern pricePattern = Pattern.compile("\\$(\\d+)");
        String doubled = pricePattern.matcher(prices).replaceAll(mr -> {
            int amount = Integer.parseInt(mr.group(1));
            return "\\$" + (amount * 2);
        });
        System.out.println(doubled);
        // Items cost $10 and $46 and $200
    }
}

Common Mistakes

Even experienced developers make regex mistakes. Here are the most frequent pitfalls and how to avoid them.

1. Forgetting to Double-Escape Backslashes in Java

This is the number one mistake for Java developers. In regex, \d means "digit." In a Java string, \d is not a valid escape sequence. You must write \\d so Java's string parser produces the single backslash that the regex engine expects.

// WRONG -- Java does not recognize \d as a string escape
// String pattern = "\d+";  // Compilation error!

// CORRECT -- double backslash to produce \d for the regex engine
String pattern = "\\d+";

// To match a literal backslash in text, you need FOUR backslashes:
// Java string: "\\\\"  -> produces: \\  -> regex sees: \ (literal backslash)
String backslashPattern = "\\\\";
System.out.println("C:\\Users".matches(".*\\\\.*")); // true

2. Catastrophic Backtracking

Certain regex patterns can cause the engine to take an exponential amount of time on certain inputs. This happens when a pattern has nested quantifiers that can match the same characters in multiple ways.

// DANGEROUS -- nested quantifiers can cause catastrophic backtracking
// String bad = "(a+)+b";
// On input "aaaaaaaaaaaaaaaaaac", the engine tries every possible way
// to split the 'a's between the inner and outer groups before failing.
// This can freeze your application.

// SAFE -- flatten the nesting
String safe = "a+b";
// This matches the same thing but without the exponential backtracking risk.

// Another common danger: matching quoted strings with nested quantifiers
// DANGEROUS: "(.*)*"
// SAFE:      "[^"]*"   -- use negated character class instead
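
Before swapping a nested pattern for a flattened one, it can help to confirm that both accept the same inputs. The sketch below does that for the two patterns above on a handful of short strings; the class name FlattenCheck and the method agree are hypothetical, and the samples are deliberately short so the nested pattern cannot blow up.

```java
import java.util.regex.Pattern;

class FlattenCheck {

    private static final Pattern NESTED = Pattern.compile("(a+)+b"); // risky form
    private static final Pattern FLAT = Pattern.compile("a+b");      // flattened form

    // Returns true if both patterns give the same full-match verdict for the input.
    static boolean agree(String input) {
        return NESTED.matcher(input).matches() == FLAT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"ab", "aaab", "b", "aaac", "", "aba"};
        for (String s : samples) {
            System.out.println("\"" + s + "\" -> nested=" + NESTED.matcher(s).matches()
                + ", flat=" + FLAT.matcher(s).matches());
        }
    }
}
```

A spot check like this is not a proof of equivalence, but it catches obvious regressions when simplifying a pattern.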

3. Overly Complex Patterns

If your regex is more than about 80 characters long, consider breaking the validation into multiple simpler steps. A 200-character regex that validates everything at once is nearly impossible to maintain.

// BAD -- one massive unreadable regex
// String nightmare = "^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[@#$%^&+=!])[a-zA-Z0-9@#$%^&+=!]{8,20}$";

// BETTER -- break into understandable steps
public static boolean isStrongPassword(String password) {
    if (password == null) return false;
    if (password.length() < 8 || password.length() > 20) return false;
    if (!password.matches(".*[A-Z].*")) return false;  // needs uppercase
    if (!password.matches(".*[a-z].*")) return false;  // needs lowercase
    if (!password.matches(".*\\d.*")) return false;     // needs digit
    if (!password.matches(".*[@#$%^&+=!].*")) return false; // needs special char
    return true;
}
// Easier to read, debug, and extend. Each rule is independently testable.

4. Testing Only the Happy Path

Always test your regex with edge cases: empty strings, very long strings, strings with special characters, and strings that are close to matching but should not.

// Testing an email regex -- you need ALL of these test cases
String emailRegex = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";

// Happy path
assert "user@example.com".matches(emailRegex);         // standard email
assert "first.last@company.co.uk".matches(emailRegex); // dots and subdomains

// Edge cases that should FAIL
assert !"".matches(emailRegex);                        // empty string
assert !"@example.com".matches(emailRegex);            // missing local part
assert !"user@".matches(emailRegex);                   // missing domain
assert !"user@.com".matches(emailRegex);               // domain starts with dot
assert !"user@com".matches(emailRegex);                // no TLD separator
assert !"user@@example.com".matches(emailRegex);       // double @

// Edge cases that should PASS
assert "a@b.co".matches(emailRegex);                   // minimal valid email
assert "user+tag@gmail.com".matches(emailRegex);       // plus addressing

5. Using matches() When You Meant find()

String.matches() and Matcher.matches() check if the entire string matches the pattern. If you want to check if the pattern appears anywhere in the string, use Matcher.find().

String text = "Error code: 404";

// WRONG -- matches() checks the ENTIRE string
System.out.println(text.matches("\\d+"));  // false -- the entire string is not digits

// CORRECT -- find() searches for the pattern anywhere
Matcher m = Pattern.compile("\\d+").matcher(text);
System.out.println(m.find());     // true
System.out.println(m.group());    // 404

// If you must use matches(), wrap the pattern with .*
System.out.println(text.matches(".*\\d+.*")); // true -- but find() is cleaner

Best Practices

Follow these guidelines to write regex that is correct, readable, and performant.

1. Compile Patterns Once

The Pattern.compile() method is expensive. If you use the same regex multiple times (in a loop, in a method called frequently, etc.), compile it once and store it as a static final field.

public class UserValidator {

    // GOOD -- compiled once, reused many times
    private static final Pattern EMAIL_PATTERN =
        Pattern.compile("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");

    private static final Pattern PHONE_PATTERN =
        Pattern.compile("^(\\+1[- ]?)?(\\(?\\d{3}\\)?[- ]?)?\\d{3}[- ]?\\d{4}$");

    public static boolean isValidEmail(String email) {
        return email != null && EMAIL_PATTERN.matcher(email).matches();
    }

    public static boolean isValidPhone(String phone) {
        return phone != null && PHONE_PATTERN.matcher(phone).matches();
    }

    // BAD -- compiles a new Pattern on every call
    // public static boolean isValidEmailBad(String email) {
    //     return email.matches("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
    // }
}

2. Use Named Groups for Readability

Named groups make your code self-documenting. Instead of remembering that group(3) is the year, use group("year").
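
As a rough illustration of the point above, the following sketch reads a date through named groups. The class name NamedGroupsExample, the method describe, and the group names are chosen for this example only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NamedGroupsExample {

    private static final Pattern ISO_DATE =
        Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    // Returns a readable summary of the date parts, or "no match".
    static String describe(String input) {
        Matcher m = ISO_DATE.matcher(input);
        if (!m.matches()) return "no match";
        // group("year") stays correct even if more groups are added to the pattern later
        return "year=" + m.group("year") + ", month=" + m.group("month") + ", day=" + m.group("day");
    }

    public static void main(String[] args) {
        System.out.println(describe("2026-02-28")); // year=2026, month=02, day=28
        System.out.println(describe("28/02/2026")); // no match
    }
}
```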

3. Use Pattern.quote() for Literal Matching

When you are searching for user-supplied text that might contain regex metacharacters, use Pattern.quote() to escape everything automatically.
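
The difference is easiest to see with input that contains metacharacters. In the sketch below (the class QuoteExample and the method count are hypothetical), the search term "$5.00" finds nothing when compiled as a raw regex, because $ is treated as an end-of-input anchor, but matches literally once quoted.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuoteExample {

    // Counts how many times the pattern is found in the text.
    static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int hits = 0;
        while (m.find()) hits++;
        return hits;
    }

    public static void main(String[] args) {
        String text = "Price: $5.00 (was $5.00 or $5x00)";
        String userInput = "$5.00";

        Pattern raw = Pattern.compile(userInput);                   // $ and . act as metacharacters
        Pattern quoted = Pattern.compile(Pattern.quote(userInput)); // every character is literal

        System.out.println("raw:    " + count(raw, text));    // raw:    0
        System.out.println("quoted: " + count(quoted, text)); // quoted: 2
    }
}
```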

4. Keep Patterns Simple

If a regex grows beyond a readable length, consider breaking the validation into multiple steps or using a combination of regex and plain Java logic.

5. Comment Complex Patterns

Use Java string concatenation with comments, or the COMMENTS flag, to make complex patterns understandable.
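
One possible shape for this, using the date pattern from earlier as an example, is shown below. With Pattern.COMMENTS, whitespace in the pattern is ignored and everything from # to the end of the line is treated as a comment. The class name CommentedPattern is hypothetical.

```java
import java.util.regex.Pattern;

class CommentedPattern {

    // Each line of the pattern carries its own explanation inside the regex itself.
    static final Pattern DATE = Pattern.compile(
        "^(\\d{4})                  # 4-digit year\n" +
        "-(0[1-9]|1[0-2])           # month: 01-12\n" +
        "-(0[1-9]|[12]\\d|3[01])$   # day: 01-31\n",
        Pattern.COMMENTS);

    public static void main(String[] args) {
        System.out.println(DATE.matcher("2026-02-28").matches()); // true
        System.out.println(DATE.matcher("2026-13-01").matches()); // false
    }
}
```

Keep in mind that in this mode a literal space or # in the pattern must be escaped, since unescaped ones are ignored or start a comment.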

6. Test with Edge Cases

Always test with: empty strings, null input, maximum-length input, strings with only special characters, strings that are "almost" valid, and internationalized input (if applicable).

7. Prefer Specific Patterns Over Greedy Wildcards

Instead of .* (which matches anything), use character classes that describe what you actually expect: [^"]* instead of .* inside quotes, \\d+ instead of .+ for numbers.
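
A small sketch of the quoted-string case shows why this matters. The greedy .* runs from the first quote to the last one on the line, while the negated character class stops at the nearest closing quote. The class name GreedyVsSpecific and the method extract are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsSpecific {

    // Collects group 1 of every match of the regex in the text.
    static List<String> extract(String regex, String text) {
        List<String> values = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "name=\"alice\" role=\"admin\"";
        System.out.println(extract("\"(.*)\"", text));     // [alice" role="admin]
        System.out.println(extract("\"([^\"]*)\"", text)); // [alice, admin]
    }
}
```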

Best Practices Summary

Practice             Do                                             Do Not
Compile patterns     static final Pattern P = Pattern.compile(...)  str.matches("...") in a loop
Escape user input    Pattern.quote(userInput)                       Concatenate user input directly into regex
Name groups          (?<year>\\d{4})                                (\\d{4}) then group(1)
Be specific          [^"]* between quotes                           .* between quotes
Handle null          Check null before matching                     Call .matches() on nullable values
Break complex logic  Multiple simple checks                         One enormous regex
Test edge cases      Empty, long, special chars, near-misses        Test only the happy path

Quick Reference Table

A comprehensive reference of all regex syntax elements covered in this tutorial.

Category           Syntax         Meaning                          Java String / Constant
Character Classes  [abc]          Any of a, b, or c                "[abc]"
Character Classes  [^abc]         Not a, b, or c                   "[^abc]"
Character Classes  [a-z]          Range a through z                "[a-z]"
Character Classes  \d / \D        Digit / Non-digit                "\\d" / "\\D"
Character Classes  \w / \W        Word char / Non-word char        "\\w" / "\\W"
Character Classes  \s / \S        Whitespace / Non-whitespace      "\\s" / "\\S"
Character Classes  .              Any character (except newline)   "."
Quantifiers        *              Zero or more                     "a*"
Quantifiers        +              One or more                      "a+"
Quantifiers        ?              Zero or one                      "a?"
Quantifiers        {n}            Exactly n                        "a{3}"
Quantifiers        {n,m}          Between n and m                  "a{2,5}"
Quantifiers        *? / +?        Lazy (minimal) match             "a*?" / "a+?"
Anchors            ^              Start of string/line             "^"
Anchors            $              End of string/line               "$"
Anchors            \b             Word boundary                    "\\b"
Anchors            \B             Non-word boundary                "\\B"
Groups             (...)          Capturing group                  "(abc)"
Groups             (?:...)        Non-capturing group              "(?:abc)"
Groups             (?<name>...)   Named group                      "(?<name>abc)"
Groups             \1             Backreference to group 1         "\\1"
Groups             |              Alternation (OR)                 "cat|dog"
Lookaround         (?=...)        Positive lookahead               "(?=abc)"
Lookaround         (?!...)        Negative lookahead               "(?!abc)"
Lookaround         (?<=...)       Positive lookbehind              "(?<=abc)"
Lookaround         (?<!...)       Negative lookbehind              "(?<!abc)"
Flags              (?i)           Case insensitive                 Pattern.CASE_INSENSITIVE
Flags              (?m)           Multiline (^ $ match lines)      Pattern.MULTILINE
Flags              (?s)           Dotall (. matches newline)       Pattern.DOTALL
Flags              (?x)           Comments mode                    Pattern.COMMENTS
Flags              (?u)           Unicode case                     Pattern.UNICODE_CASE
Flags              --             Literal (no metacharacters)      Pattern.LITERAL

Complete Practical Example: Log Parser and Input Validator

This final example brings together everything we have learned. It is a complete, runnable program that demonstrates regex in two real-world scenarios: parsing structured log files and validating user input for a registration form.

import java.util.regex.*;
import java.util.*;
import java.util.stream.Collectors;

/**
 * Complete Regex Example: LogParser and InputValidator
 *
 * Demonstrates:
 * - Pattern compilation and reuse (static final)
 * - Named capturing groups
 * - Multiple validation patterns
 * - find() with while loop for extraction
 * - replaceAll for data masking
 * - appendReplacement for custom replacement
 * - Lookaheads for password validation
 * - Word boundaries
 * - Numbered capturing groups
 * - Matcher.quoteReplacement() for literal replacement text
 */
public class RegexDemo {

    // =========================================================================
    // Part 1: Log Parser -- Extract structured data from log entries
    // =========================================================================

    // Pre-compiled patterns (compiled once, reused across all calls)
    private static final Pattern LOG_PATTERN = Pattern.compile(
        "(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})" + // 2026-02-28 14:30:00
        "\\s+\\[(?<level>\\w+)]"  +                                  // [ERROR]
        "\\s+(?<class>[\\w.]+)"   +                                  // com.app.Service
        "\\s+-\\s+(?<message>.*)"                                     // - The log message
    );

    private static final Pattern IP_PATTERN = Pattern.compile(
        "\\b(?:(?:25[0-5]|2[0-4]\\d|[01]?\\d{1,2})\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d{1,2})\\b"
    );

    private static final Pattern ERROR_CODE_PATTERN = Pattern.compile(
        "\\b[A-Z]{2,4}-\\d{3,5}\\b"  // e.g., ERR-5001, HTTP-404
    );

    public static void parseLogEntries(String[] logLines) {
        System.out.println("=== LOG PARSER RESULTS ===");
        System.out.println();

        Map<String, Integer> levelCounts = new LinkedHashMap<>();
        List<String> errorMessages = new ArrayList<>();

        for (String line : logLines) {
            Matcher m = LOG_PATTERN.matcher(line);
            if (m.matches()) {
                String timestamp = m.group("timestamp");
                String level = m.group("level");
                String className = m.group("class");
                String message = m.group("message");

                // Count log levels
                levelCounts.merge(level, 1, Integer::sum);

                // Collect error messages
                if ("ERROR".equals(level)) {
                    errorMessages.add(timestamp + " | " + className + " | " + message);
                }

                // Extract IP addresses from the message
                Matcher ipMatcher = IP_PATTERN.matcher(message);
                while (ipMatcher.find()) {
                    System.out.println("  IP found in log: " + ipMatcher.group()
                        + " (from " + className + ")");
                }

                // Extract error codes from the message
                Matcher codeMatcher = ERROR_CODE_PATTERN.matcher(message);
                while (codeMatcher.find()) {
                    System.out.println("  Error code found: " + codeMatcher.group()
                        + " (at " + timestamp + ")");
                }
            }
        }

        System.out.println();
        System.out.println("Log Level Summary:");
        levelCounts.forEach((level, count) ->
            System.out.println("  " + level + ": " + count));

        System.out.println();
        System.out.println("Error Messages:");
        errorMessages.forEach(msg -> System.out.println("  " + msg));
    }

    // =========================================================================
    // Part 2: Input Validator -- Validate form fields for user registration
    // =========================================================================

    private static final Pattern EMAIL_PATTERN = Pattern.compile(
        "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
    );

    private static final Pattern PHONE_PATTERN = Pattern.compile(
        "^(\\+1[- ]?)?(\\(?\\d{3}\\)?[- ]?)?\\d{3}[- ]?\\d{4}$"
    );

    private static final Pattern PASSWORD_PATTERN = Pattern.compile(
        "^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[@#$%^&+=!]).{8,20}$"
    );

    private static final Pattern USERNAME_PATTERN = Pattern.compile(
        "^[a-zA-Z][a-zA-Z0-9_]{2,19}$"  // starts with letter, 3-20 chars, only alphanumeric and _
    );

    private static final Pattern DATE_PATTERN = Pattern.compile(
        "^(?<year>\\d{4})-(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12]\\d|3[01])$"
    );

    public static Map<String, String> validateRegistration(
            String username, String email, String password, String phone, String birthDate) {

        Map<String, String> errors = new LinkedHashMap<>();

        // Username validation
        if (username == null || username.isEmpty()) {
            errors.put("username", "Username is required");
        } else if (!USERNAME_PATTERN.matcher(username).matches()) {
            errors.put("username", "Must start with a letter, 3-20 chars, only letters/digits/underscore");
        }

        // Email validation
        if (email == null || email.isEmpty()) {
            errors.put("email", "Email is required");
        } else if (!EMAIL_PATTERN.matcher(email).matches()) {
            errors.put("email", "Invalid email format");
        }

        // Password validation with specific feedback
        if (password == null || password.isEmpty()) {
            errors.put("password", "Password is required");
        } else {
            List<String> passwordIssues = new ArrayList<>();
            if (password.length() < 8) passwordIssues.add("at least 8 characters");
            if (password.length() > 20) passwordIssues.add("at most 20 characters");
            if (!password.matches(".*[A-Z].*")) passwordIssues.add("an uppercase letter");
            if (!password.matches(".*[a-z].*")) passwordIssues.add("a lowercase letter");
            if (!password.matches(".*\\d.*")) passwordIssues.add("a digit");
            if (!password.matches(".*[@#$%^&+=!].*")) passwordIssues.add("a special character (@#$%^&+=!)");
            if (!passwordIssues.isEmpty()) {
                errors.put("password", "Password needs: " + String.join(", ", passwordIssues));
            }
        }

        // Phone validation
        if (phone != null && !phone.isEmpty() && !PHONE_PATTERN.matcher(phone).matches()) {
            errors.put("phone", "Invalid US phone format");
        }

        // Birth date validation
        if (birthDate != null && !birthDate.isEmpty()) {
            Matcher dm = DATE_PATTERN.matcher(birthDate);
            if (!dm.matches()) {
                errors.put("birthDate", "Invalid date format (use YYYY-MM-DD)");
            } else {
                int year = Integer.parseInt(dm.group("year"));
                if (year > 2026 || year < 1900) {
                    errors.put("birthDate", "Year must be between 1900 and 2026");
                }
            }
        }

        return errors;
    }

    // =========================================================================
    // Part 3: Data Masking -- Redact sensitive information from text
    // =========================================================================

    private static final Pattern SSN_IN_TEXT = Pattern.compile(
        "\\b\\d{3}-\\d{2}-\\d{4}\\b"
    );

    private static final Pattern CC_IN_TEXT = Pattern.compile(
        "\\b(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\b"
    );

    private static final Pattern EMAIL_IN_TEXT = Pattern.compile(
        "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
    );

    public static String maskSensitiveData(String text) {
        // Mask SSNs: 123-45-6789 -> ***-**-6789
        String result = SSN_IN_TEXT.matcher(text).replaceAll(mr -> {
            String ssn = mr.group();
            return "***-**-" + ssn.substring(ssn.length() - 4);
        });

        // Mask credit cards: show only last 4 digits
        Matcher ccMatcher = CC_IN_TEXT.matcher(result);
        StringBuilder sb = new StringBuilder();
        while (ccMatcher.find()) {
            ccMatcher.appendReplacement(sb, "****-****-****-" + ccMatcher.group(4));
        }
        ccMatcher.appendTail(sb);
        result = sb.toString();

        // Mask emails: user@domain.com -> u***@domain.com
        Matcher emailMatcher = EMAIL_IN_TEXT.matcher(result);
        sb = new StringBuilder();
        while (emailMatcher.find()) {
            String email = emailMatcher.group();
            int atIndex = email.indexOf('@');
            String masked = email.charAt(0) + "***" + email.substring(atIndex);
            emailMatcher.appendReplacement(sb, Matcher.quoteReplacement(masked));
        }
        emailMatcher.appendTail(sb);

        return sb.toString();
    }

    // =========================================================================
    // Main -- Run all demonstrations
    // =========================================================================

    public static void main(String[] args) {

        // --- Part 1: Parse log entries ---
        String[] logLines = {
            "2026-02-28 14:30:00 [INFO] com.app.UserService - User login from 192.168.1.100",
            "2026-02-28 14:30:05 [ERROR] com.app.PaymentService - Payment failed: ERR-5001 for IP 10.0.0.1",
            "2026-02-28 14:30:10 [WARN] com.app.AuthService - Failed login attempt from 172.16.0.50",
            "2026-02-28 14:30:15 [ERROR] com.app.OrderService - Order processing failed: HTTP-500 timeout",
            "2026-02-28 14:30:20 [INFO] com.app.CacheService - Cache refreshed successfully",
            "2026-02-28 14:30:25 [ERROR] com.app.DatabaseService - Connection lost: DB-1001 to 192.168.1.200"
        };
        parseLogEntries(logLines);

        System.out.println();
        System.out.println("========================================");
        System.out.println();

        // --- Part 2: Validate registration forms ---
        System.out.println("=== REGISTRATION VALIDATION ===");
        System.out.println();

        // Test case 1: Valid registration
        Map<String, String> errors1 = validateRegistration(
            "john_doe", "john@example.com", "MyP@ss123", "(555) 123-4567", "1990-06-15"
        );
        System.out.println("Test 1 (valid): " + (errors1.isEmpty() ? "PASSED" : "FAILED: " + errors1));

        // Test case 2: Multiple validation failures
        Map<String, String> errors2 = validateRegistration(
            "2bad", "not-an-email", "weak", "12345", "2026-13-45"
        );
        System.out.println("Test 2 (invalid):");
        errors2.forEach((field, error) -> System.out.println("  " + field + ": " + error));

        // Test case 3: Specific password feedback
        Map<String, String> errors3 = validateRegistration(
            "alice", "alice@test.com", "onlylowercase", null, null
        );
        System.out.println("Test 3 (weak password):");
        errors3.forEach((field, error) -> System.out.println("  " + field + ": " + error));

        System.out.println();
        System.out.println("========================================");
        System.out.println();

        // --- Part 3: Mask sensitive data ---
        System.out.println("=== DATA MASKING ===");
        System.out.println();
        String sensitiveText = "Customer SSN: 123-45-6789, CC: 4111-1111-1111-1111, " +
                               "Email: john.doe@gmail.com, Alt SSN: 987-65-4321";
        System.out.println("Original: " + sensitiveText);
        System.out.println("Masked:   " + maskSensitiveData(sensitiveText));
    }
}

Output

=== LOG PARSER RESULTS ===

  IP found in log: 192.168.1.100 (from com.app.UserService)
  IP found in log: 10.0.0.1 (from com.app.PaymentService)
  Error code found: ERR-5001 (at 2026-02-28 14:30:05)
  IP found in log: 172.16.0.50 (from com.app.AuthService)
  Error code found: HTTP-500 (at 2026-02-28 14:30:15)
  IP found in log: 192.168.1.200 (from com.app.DatabaseService)
  Error code found: DB-1001 (at 2026-02-28 14:30:25)

Log Level Summary:
  INFO: 2
  ERROR: 3
  WARN: 1

Error Messages:
  2026-02-28 14:30:05 | com.app.PaymentService | Payment failed: ERR-5001 for IP 10.0.0.1
  2026-02-28 14:30:15 | com.app.OrderService | Order processing failed: HTTP-500 timeout
  2026-02-28 14:30:25 | com.app.DatabaseService | Connection lost: DB-1001 to 192.168.1.200

========================================

=== REGISTRATION VALIDATION ===

Test 1 (valid): PASSED
Test 2 (invalid):
  username: Must start with a letter, 3-20 chars, only letters/digits/underscore
  email: Invalid email format
  password: Password needs: at least 8 characters, an uppercase letter, a digit, a special character (@#$%^&+=!)
  phone: Invalid US phone format
  birthDate: Invalid date format (use YYYY-MM-DD)
Test 3 (weak password):
  password: Password needs: an uppercase letter, a digit, a special character (@#$%^&+=!)

========================================

=== DATA MASKING ===

Original: Customer SSN: 123-45-6789, CC: 4111-1111-1111-1111, Email: john.doe@gmail.com, Alt SSN: 987-65-4321
Masked:   Customer SSN: ***-**-6789, CC: ****-****-****-1111, Email: j***@gmail.com, Alt SSN: ***-**-4321

Concepts Demonstrated

#   Concept                                   Where Used
1   Pattern compilation and reuse             static final Pattern fields throughout
2   Named capturing groups                    LOG_PATTERN: (?<timestamp>...), (?<level>...), (?<class>...), (?<message>...)
3   find() with while loop                    IP address and error code extraction from log messages
4   matches() for full-string validation      All validators: email, phone, username, password, date
5   Lookaheads for password rules             PASSWORD_PATTERN uses (?=.*[A-Z]), (?=.*\\d), etc.
6   Word boundaries                           SSN_IN_TEXT, CC_IN_TEXT, ERROR_CODE_PATTERN use \\b
7   appendReplacement / appendTail            Credit card and email masking with custom replacement logic
8   replaceAll with Function (Java 9+)        SSN masking: replaceAll(mr -> ...)
9   Matcher.quoteReplacement()                Email masking: prevents $ and \ in replacement from being interpreted
10  Numbered capturing groups                 CC_IN_TEXT: group(4) to get last 4 digits
11  Group extraction for further processing   Date validation: extracting year for range check
12  Multiple regex patterns working together  Log parser uses 3 patterns; validator uses 5 patterns; masker uses 3 patterns
13  Breaking complex validation into steps    Password validation gives specific feedback per rule instead of one giant regex
14  Null-safe validation                      All validators check for null before applying regex


