Regex (short for Regular Expression) is a sequence of characters that defines a search pattern. Think of it as a mini-language specifically designed for matching, searching, extracting, and replacing text.
Here is a real-world analogy: imagine you are in a library looking for books. Instead of searching for one specific title, you tell the librarian: “Find me all books whose title starts with ‘Java’, has a number in the middle, and ends with ‘Guide’.” That description is essentially a regex — a template that matches multiple possibilities based on a pattern, not a fixed string.
In Java, regex is used everywhere:
Without regex, tasks like “find all email addresses in a 10,000-line log file” would require dozens of lines of manual string parsing. With regex, a single pattern and a short loop do the job.
Java provides regex support through the java.util.regex package, which contains three core classes:
| Class | Purpose | Key Methods |
|---|---|---|
| Pattern | A compiled representation of a regex pattern. Compiling is expensive, so you compile once and reuse. | compile(), matcher(), matches(), split() |
| Matcher | The engine that performs matching operations against a string using a Pattern. | matches(), find(), group(), replaceAll() |
| PatternSyntaxException | An unchecked exception thrown when a regex pattern has invalid syntax. | getMessage(), getPattern(), getIndex() |
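Because PatternSyntaxException is unchecked, it is easy to forget when the pattern comes from user input or a config file. The sketch below shows one way to handle it; the class name SafeCompile and the helper tryCompile are hypothetical names chosen for illustration, and returning null on failure is just one possible design.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class SafeCompile {
    // Hypothetical helper: returns null instead of throwing when the regex is invalid.
    static Pattern tryCompile(String regex) {
        try {
            return Pattern.compile(regex);
        } catch (PatternSyntaxException e) {
            // getPattern() is the offending regex, getDescription() says what is wrong,
            // getIndex() is the approximate position of the error (or -1 if unknown)
            System.out.println("Invalid regex: " + e.getPattern());
            System.out.println("Problem: " + e.getDescription());
            System.out.println("Near index: " + e.getIndex());
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCompile("\\d+") != null); // true
        System.out.println(tryCompile("(abc") != null); // false -- the group is never closed
    }
}
```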
The basic workflow for using regex in Java follows three steps:
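1. Compile the regex string into a Pattern object
2. Create a Matcher by calling pattern.matcher(inputString)
3. Execute a matching operation: matches(), find(), lookingAt(), etc.
import java.util.regex.Pattern;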
import java.util.regex.Matcher;
public class RegexBasics {
public static void main(String[] args) {
// Step 1: Compile the pattern
Pattern pattern = Pattern.compile("Java");
// Step 2: Create a matcher for the input string
Matcher matcher = pattern.matcher("I love Java programming");
// Step 3: Execute matching operations
boolean found = matcher.find();
System.out.println("Found 'Java': " + found); // Found 'Java': true
// matches() checks if the ENTIRE string matches the pattern
boolean fullMatch = matcher.matches();
System.out.println("Entire string is 'Java': " + fullMatch); // Entire string is 'Java': false
// Reset and find the match position
matcher.reset();
if (matcher.find()) {
System.out.println("Match starts at index: " + matcher.start()); // Match starts at index: 7
System.out.println("Match ends at index: " + matcher.end()); // Match ends at index: 11
System.out.println("Matched text: " + matcher.group()); // Matched text: Java
}
}
}
There is an important distinction between three Matcher methods:
| Method | What it Checks | Example Pattern: "Java" |
|---|---|---|
| matches() | Does the entire string match the pattern? | "Java" returns true, "Java rocks" returns false |
| find() | Is the pattern found anywhere in the string? | "I love Java" returns true |
| lookingAt() | Does the beginning of the string match the pattern? | "Java rocks" returns true, "I love Java" returns false |
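The three rows of the table can be checked side by side with a small probe. This is a minimal sketch; MatchModes and probe are hypothetical names, and the reset() call is there because find() would otherwise resume after a previous successful match.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchModes {
    private static final Pattern JAVA = Pattern.compile("Java");

    // Returns the results of matches(), find(), and lookingAt() in that order.
    static String probe(String input) {
        Matcher m = JAVA.matcher(input);
        boolean whole = m.matches();    // entire string must match
        boolean prefix = m.lookingAt(); // match must start at index 0
        m.reset();                      // start find() from the beginning again
        boolean anywhere = m.find();    // match may start anywhere
        return whole + " " + anywhere + " " + prefix;
    }

    public static void main(String[] args) {
        System.out.println(probe("Java"));        // true true true
        System.out.println(probe("Java rocks"));  // false true true
        System.out.println(probe("I love Java")); // false true false
    }
}
```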
For quick one-off checks, you can skip the compile step and use the static Pattern.matches() method. However, this recompiles the pattern every time, so avoid it in loops or frequently called methods.
// Quick one-off match (compiles a new Pattern every call -- avoid in loops)
boolean isMatch = Pattern.matches("\\d+", "12345");
System.out.println("All digits: " + isMatch); // All digits: true
// Even quicker: String.matches() delegates to Pattern.matches()
boolean isDigits = "12345".matches("\\d+");
System.out.println("All digits: " + isDigits); // All digits: true
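For code that runs in a loop, the usual alternative to the static shortcuts is to compile the pattern once and reuse it. A minimal sketch, assuming Java 9+ for List.of; DigitFilter and countNumeric are hypothetical names:

```java
import java.util.List;
import java.util.regex.Pattern;

class DigitFilter {
    // Compiled once when the class loads, then shared by every call.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Counts how many values consist only of digits.
    static long countNumeric(List<String> values) {
        long count = 0;
        for (String value : values) {
            // matcher() is cheap; only compile() is expensive
            if (DIGITS.matcher(value).matches()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countNumeric(List.of("12345", "abc", "007", "4x"))); // 2
    }
}
```

A compiled Pattern is immutable and safe to share between threads; Matcher instances are not, so create a new one per input as shown.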
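A regex pattern is built from two types of characters:
- Literal characters match themselves. For example, cat matches the text “cat”.
- Metacharacters have a special meaning, such as . (any character) or * (zero or more).
Java has 14 metacharacters that have special meaning in regex. If you want to match these characters literally, you must escape them with a backslash (\).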
| Metacharacter | Meaning | To Match Literally |
|---|---|---|
| . | Any single character (except newline by default) | \\. |
| ^ | Start of string (or line in MULTILINE mode) | \\^ |
| $ | End of string (or line in MULTILINE mode) | \\$ |
| * | Zero or more of preceding element | \\* |
| + | One or more of preceding element | \\+ |
| ? | Zero or one of preceding element | \\? |
| { } | Quantifier range (e.g., {2,5}) | \\{ \\} |
| [ ] | Character class definition | \\[ \\] |
| ( ) | Grouping and capturing | \\( \\) |
| \ | Escape character | \\\\ |
| &#124; | Alternation (OR) | \\&#124; |
Critical Java note: In Java strings, the backslash (\) is itself an escape character. So to write the regex \d (which means “a digit”), you must write "\\d" in Java code — the first backslash escapes the second one for Java, and the resulting \d is what the regex engine sees.
import java.util.regex.*;
public class MetacharacterEscaping {
public static void main(String[] args) {
// Without escaping: . matches ANY character
System.out.println("file.txt".matches("file.txt")); // true
System.out.println("fileXtxt".matches("file.txt")); // true -- oops, . matched 'X'
// With escaping: \\. matches only a literal dot
System.out.println("file.txt".matches("file\\.txt")); // true
System.out.println("fileXtxt".matches("file\\.txt")); // false -- correct!
// Matching a literal dollar sign in a price
Pattern price = Pattern.compile("\\$\\d+\\.\\d{2}");
System.out.println(price.matcher("$19.99").matches()); // true
System.out.println(price.matcher("$5.00").matches()); // true
System.out.println(price.matcher("19.99").matches()); // false -- missing $
// Use Pattern.quote() to treat an entire string as a literal
String userInput = "price is $10.00 (USD)";
String searchTerm = "$10.00";
Pattern literal = Pattern.compile(Pattern.quote(searchTerm));
Matcher m = literal.matcher(userInput);
System.out.println(m.find()); // true -- matched "$10.00" literally
}
}
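To make the two layers of escaping from the Java note above concrete, the following sketch prints what the regex engine actually receives. BackslashLayers and isSingleBackslash are hypothetical names used only for this illustration.

```java
import java.util.regex.Pattern;

class BackslashLayers {
    // The Java literal "\\\\" is the two-character regex \\, which matches ONE backslash.
    static boolean isSingleBackslash(String s) {
        return Pattern.matches("\\\\", s);
    }

    public static void main(String[] args) {
        String digit = "\\d";
        System.out.println(digit);          // \d  (what the regex engine sees)
        System.out.println(digit.length()); // 2

        System.out.println(isSingleBackslash("\\"));   // true  -- input is one backslash
        System.out.println(isSingleBackslash("\\\\")); // false -- input is two backslashes

        // Splitting a Windows-style path on the backslash
        System.out.println("C:\\temp".split("\\\\").length); // 2
    }
}
```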
A character class (also called a character set) matches a single character from a defined set. You define a character class by placing characters inside square brackets [].
| Syntax | Meaning | Example | Matches |
|---|---|---|---|
| [abc] | Any one of a, b, or c | [aeiou] | Any vowel |
| [a-z] | Any character in range a through z | [a-zA-Z] | Any letter |
| [0-9] | Any digit 0 through 9 | [0-9a-f] | Any hex digit |
| [^abc] | Any character except a, b, or c | [^0-9] | Any non-digit |
| [a-z&&[^aeiou]] | Intersection: a-z but not vowels | [a-z&&[^aeiou]] | Any consonant |
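The intersection syntax in the last row is the least familiar of these, so here is a short sketch of it in use. Consonants and count are hypothetical names; note that the class only covers lowercase ASCII letters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Consonants {
    // a-z intersected with "not a vowel" leaves the lowercase consonants
    private static final Pattern CONSONANT = Pattern.compile("[a-z&&[^aeiou]]");

    // Counts lowercase consonants in the text.
    static int count(String text) {
        Matcher m = CONSONANT.matcher(text);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        System.out.println(count("hello world")); // 7  (h, l, l, w, r, l, d)
        System.out.println(count("aeiou"));       // 0
    }
}
```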
Java provides shorthand notation for commonly used character classes. These save typing and improve readability.
| Shorthand | Equivalent | Meaning |
|---|---|---|
| \d | [0-9] | Any digit |
| \D | [^0-9] | Any non-digit |
| \w | [a-zA-Z0-9_] | Any word character (letter, digit, or underscore) |
| \W | [^a-zA-Z0-9_] | Any non-word character |
| \s | [ \t\n\x0B\f\r] | Any whitespace character |
| \S | [^ \t\n\x0B\f\r] | Any non-whitespace character |
| . | (almost anything) | Any character except newline (unless DOTALL flag is set) |
Remember: in Java strings, you write \\d to produce the regex \d.
import java.util.regex.*;
public class CharacterClasses {
public static void main(String[] args) {
// Custom character class: match a vowel followed by a consonant
Pattern vc = Pattern.compile("[aeiou][^aeiou\\s\\d]");
Matcher m = vc.matcher("hello world");
while (m.find()) {
System.out.println("Found: " + m.group() + " at index " + m.start());
}
// Found: el at index 1
// Found: or at index 7
// \\d matches any digit
System.out.println("abc".matches("\\d+")); // false
System.out.println("123".matches("\\d+")); // true
// \\w matches word characters (letters, digits, underscore)
System.out.println("hello_world".matches("\\w+")); // true
System.out.println("hello world".matches("\\w+")); // false -- space is not a word char
// \\s matches whitespace
System.out.println("has spaces".matches(".*\\s.*")); // true
System.out.println("nospaces".matches(".*\\s.*")); // false
// . matches any character except newline
System.out.println("a".matches(".")); // true
System.out.println("1".matches(".")); // true
System.out.println("".matches(".")); // false -- needs exactly one char
// Ranges: hex digit check
Pattern hex = Pattern.compile("[0-9a-fA-F]+");
System.out.println(hex.matcher("1a2bFF").matches()); // true
System.out.println(hex.matcher("GHIJ").matches()); // false
// Negation: match non-digits
Matcher nonDigits = Pattern.compile("[^0-9]+").matcher("abc123def");
while (nonDigits.find()) {
System.out.println("Non-digit segment: " + nonDigits.group());
}
// Non-digit segment: abc
// Non-digit segment: def
}
}
Quantifiers control how many times a preceding element must occur for a match. Without quantifiers, each element in a pattern matches exactly once.
| Quantifier | Meaning | Example Pattern | Matches | Does Not Match |
|---|---|---|---|---|
| * | Zero or more | ab*c | “ac”, “abc”, “abbc” | “adc” |
| + | One or more | ab+c | “abc”, “abbc” | “ac” |
| ? | Zero or one (optional) | colou?r | “color”, “colour” | “colouur” |
| {n} | Exactly n times | \\d{3} | “123” | “12”, “1234” |
| {n,} | At least n times | \\d{2,} | “12”, “123”, “1234” | “1” |
| {n,m} | Between n and m times | \\d{2,4} | “12”, “123”, “1234” | “1”, “12345” |
By default, all quantifiers are greedy — they match as much text as possible. Adding a ? after a quantifier makes it lazy (also called reluctant) — it matches as little text as possible.
| Greedy | Lazy | Behavior |
|---|---|---|
| * | *? | Match as few as possible (zero or more) |
| + | +? | Match as few as possible (one or more) |
| ? | ?? | Match zero if possible |
| {n,m} | {n,m}? | Match n times if possible |
The difference matters most when your pattern has flexible parts and you need to control where the match stops.
import java.util.regex.*;
public class Quantifiers {
public static void main(String[] args) {
// Greedy vs Lazy demonstration
String html = "<b>bold</b> and <b>more bold</b>";
// Greedy: .* grabs as much as possible
Matcher greedy = Pattern.compile("<b>.*</b>").matcher(html);
if (greedy.find()) {
System.out.println("Greedy: " + greedy.group());
// Greedy: <b>bold</b> and <b>more bold</b>
// -- matched from the first <b> to the LAST </b>
}
// Lazy: .*? grabs as little as possible
Matcher lazy = Pattern.compile("<b>.*?</b>").matcher(html);
while (lazy.find()) {
System.out.println("Lazy: " + lazy.group());
}
// Lazy: <b>bold</b>
// Lazy: <b>more bold</b>
// -- matched each <b>...</b> pair individually
// Exact count: match a US zip code (5 digits, optional -4 digits)
Pattern zip = Pattern.compile("\\d{5}(-\\d{4})?");
System.out.println(zip.matcher("90210").matches()); // true
System.out.println(zip.matcher("90210-1234").matches()); // true
System.out.println(zip.matcher("9021").matches()); // false
System.out.println(zip.matcher("902101234").matches()); // false
// Range: password length check (8 to 20 characters)
Pattern length = Pattern.compile(".{8,20}");
System.out.println(length.matcher("short").matches()); // false (5 chars)
System.out.println(length.matcher("justright").matches()); // true (9 chars)
System.out.println(length.matcher("a]".repeat(11)).matches()); // false (22 chars)
// Optional element: match "http" or "https"
Pattern protocol = Pattern.compile("https?://.*");
System.out.println(protocol.matcher("http://example.com").matches()); // true
System.out.println(protocol.matcher("https://example.com").matches()); // true
System.out.println(protocol.matcher("ftp://example.com").matches()); // false
}
}
Anchors do not match characters — they match positions in the string. They assert that the current position in the string meets a certain condition.
| Anchor | Meaning | Example |
|---|---|---|
| ^ | Start of string (or start of each line with MULTILINE flag) | ^Hello matches “Hello world” but not “Say Hello” |
| $ | End of string (or end of each line with MULTILINE flag) | world$ matches “Hello world” but not “world peace” |
| \b | Word boundary (between a word char and a non-word char) | \bcat\b matches “the cat sat” but not “concatenate” |
| \B | Non-word boundary (between two word chars or two non-word chars) | \Bcat\B matches “concatenate” but not “the cat sat” |
Word boundaries (\b) are one of the most useful anchors. A word boundary exists between a word character (\w) and a non-word character (\W), or at the start/end of the string if it begins/ends with a word character.
import java.util.regex.*;
public class AnchorsAndBoundaries {
public static void main(String[] args) {
// ^ and $ -- start and end anchors
System.out.println("Hello World".matches("^Hello.*")); // true
System.out.println("Say Hello".matches("^Hello.*")); // false
// Without anchors, find() looks anywhere in the string
Matcher m1 = Pattern.compile("error").matcher("An error occurred");
System.out.println(m1.find()); // true
// matches() is implicitly anchored at both ends: it checks the entire string
System.out.println("An error occurred".matches("error")); // false -- not the whole string
System.out.println("error".matches("error")); // true
// \\b word boundary -- match whole words only
String text = "The cat scattered the catalog across the category";
Matcher wordCat = Pattern.compile("\\bcat\\b").matcher(text);
int count = 0;
while (wordCat.find()) {
System.out.println("Found whole word 'cat' at index " + wordCat.start());
count++;
}
System.out.println("Total matches: " + count);
// Found whole word 'cat' at index 4
// Total matches: 1
// -- "scattered", "catalog", and "category" were correctly excluded
// Without word boundary -- matches "cat" inside other words too
Matcher anyCat = Pattern.compile("cat").matcher(text);
count = 0;
while (anyCat.find()) {
count++;
}
System.out.println("Without boundary: " + count + " matches");
// Without boundary: 4 matches
// ^ and $ with MULTILINE flag -- match each line
String multiline = "First line\nSecond line\nThird line";
Matcher lineStarts = Pattern.compile("^\\w+", Pattern.MULTILINE).matcher(multiline);
while (lineStarts.find()) {
System.out.println("Line starts with: " + lineStarts.group());
}
// Line starts with: First
// Line starts with: Second
// Line starts with: Third
}
}
Parentheses () in a regex serve two purposes: they group parts of the pattern together (so quantifiers or alternation can apply to the whole group), and they capture the matched text (so you can retrieve it later).
Each pair of parentheses creates a capturing group, numbered left-to-right starting at 1. Group 0 always refers to the entire match.
For the pattern (\\d{3})-(\\d{3})-(\\d{4}) matching “555-123-4567”:
- group(0) = “555-123-4567” (entire match)
- group(1) = “555” (area code)
- group(2) = “123” (prefix)
- group(3) = “4567” (line number)
Numbered groups can be hard to read in complex patterns. Java supports named capturing groups using the syntax (?<name>...). You retrieve the value with matcher.group("name").
Sometimes you need parentheses for grouping (e.g., to apply a quantifier to a group) but do not need to capture the matched text. Use (?:...) for a non-capturing group. This is slightly more efficient since the regex engine does not need to store the match.
A backreference refers back to a previously captured group within the same pattern. \\1 refers to the text matched by group 1, \\2 refers to group 2, and so on. This is useful for finding repeated patterns like duplicate words.
import java.util.regex.*;
public class GroupsAndCapturing {
public static void main(String[] args) {
// --- Numbered Capturing Groups ---
String phone = "Call me at 555-123-4567 or 800-555-0199";
Pattern phonePattern = Pattern.compile("(\\d{3})-(\\d{3})-(\\d{4})");
Matcher m = phonePattern.matcher(phone);
while (m.find()) {
System.out.println("Full match: " + m.group(0));
System.out.println("Area code: " + m.group(1));
System.out.println("Prefix: " + m.group(2));
System.out.println("Line number: " + m.group(3));
System.out.println();
}
// Full match: 555-123-4567
// Area code: 555
// Prefix: 123
// Line number: 4567
//
// Full match: 800-555-0199
// Area code: 800
// Prefix: 555
// Line number: 0199
// --- Named Capturing Groups ---
String dateStr = "2026-02-28";
Pattern datePattern = Pattern.compile(
"(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})"
);
Matcher dm = datePattern.matcher(dateStr);
if (dm.matches()) {
System.out.println("Year: " + dm.group("year")); // Year: 2026
System.out.println("Month: " + dm.group("month")); // Month: 02
System.out.println("Day: " + dm.group("day")); // Day: 28
}
// --- Non-Capturing Groups ---
// Match "http" or "https" without capturing the "s"
Pattern url = Pattern.compile("(?:https?)://([\\w.]+)");
Matcher um = url.matcher("Visit https://example.com today");
if (um.find()) {
System.out.println("Full match: " + um.group(0)); // Full match: https://example.com
System.out.println("Domain: " + um.group(1)); // Domain: example.com
// group(1) is the domain, not "https" -- because (?:...) did not capture
}
// --- Backreferences: find duplicate words ---
String text = "This is is a test test of of duplicate words";
Pattern dupes = Pattern.compile("\\b(\\w+)\\s+\\1\\b", Pattern.CASE_INSENSITIVE);
Matcher dupeMatcher = dupes.matcher(text);
while (dupeMatcher.find()) {
System.out.println("Duplicate found: \"" + dupeMatcher.group() + "\"");
}
// Duplicate found: "is is"
// Duplicate found: "test test"
// Duplicate found: "of of"
}
}
The pipe character | acts as an OR operator. The pattern cat|dog matches either “cat” or “dog”. Alternation has the lowest precedence of any regex operator, so gray|grey means “gray” or “grey” as whole alternatives, not “gra”, then “y” or “g”, then “rey”.
To limit the scope of alternation, use parentheses: gr(a|e)y matches “gray” or “grey”.
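Low precedence also affects anchors: in ^cat|dog$ the ^ belongs only to the first alternative and the $ only to the second. The sketch below contrasts that with a grouped version; AlternationScope, loose, and strict are hypothetical names.

```java
import java.util.regex.Pattern;

class AlternationScope {
    // Read as: (^cat) OR (dog$)
    private static final Pattern LOOSE = Pattern.compile("^cat|dog$");
    // Read as: the whole input is "cat" or "dog"
    private static final Pattern STRICT = Pattern.compile("^(?:cat|dog)$");

    static boolean loose(String s) {
        return LOOSE.matcher(s).find();
    }

    static boolean strict(String s) {
        return STRICT.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(loose("catalog"));  // true  -- starts with "cat"
        System.out.println(loose("hotdog"));   // true  -- ends with "dog"
        System.out.println(strict("catalog")); // false
        System.out.println(strict("hotdog"));  // false
        System.out.println(strict("dog"));     // true
    }
}
```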
Lookaround assertions check if a pattern exists before or after the current position, but they do not consume characters (the match position does not advance). They are “zero-width assertions” — they assert a condition without including the matched text in the result.
| Syntax | Name | Meaning | Example |
|---|---|---|---|
| (?=...) | Positive lookahead | What follows must match | \\d+(?= dollars) matches “100” in “100 dollars” |
| (?!...) | Negative lookahead | What follows must NOT match | \\d+(?! dollars) matches “100” in “100 euros” |
| (?<=...) | Positive lookbehind | What precedes must match | (?<=\\$)\\d+ matches "50" in "$50" |
| (?<!...) | Negative lookbehind | What precedes must NOT match | (?<!\\$)\\d+ matches "50" in "50" but not in "$50" |
Lookarounds are especially useful in password validation, where you need to check multiple conditions at the same position (e.g., must contain a digit AND a special character AND an uppercase letter).
import java.util.regex.*;
public class AlternationAndLookaround {
public static void main(String[] args) {
// --- Alternation ---
Pattern pet = Pattern.compile("cat|dog|bird");
String text = "I have a cat and a dog but no bird";
Matcher m = pet.matcher(text);
while (m.find()) {
System.out.println("Found pet: " + m.group());
}
// Found pet: cat
// Found pet: dog
// Found pet: bird
// Alternation with grouping
Pattern color = Pattern.compile("gr(a|e)y");
System.out.println(color.matcher("gray").matches()); // true
System.out.println(color.matcher("grey").matches()); // true
System.out.println(color.matcher("griy").matches()); // false
// --- Positive Lookahead: find numbers followed by "px" ---
Matcher lookahead = Pattern.compile("\\d+(?=px)").matcher("width: 100px; height: 50px; margin: 10em");
while (lookahead.find()) {
System.out.println("Pixel value: " + lookahead.group());
}
// Pixel value: 100
// Pixel value: 50
// -- "10" was excluded because it is followed by "em", not "px"
// --- Negative Lookahead: find numbers NOT followed by "px" ---
Matcher negLookahead = Pattern.compile("\\d+(?!px)").matcher("width: 100px; margin: 10em");
while (negLookahead.find()) {
System.out.println("Non-pixel: " + negLookahead.group());
}
// Non-pixel: 10  (from "100px": the engine backtracks to "10", which is followed by "0", not "px")
// Non-pixel: 10  (from "10em")
// --- Positive Lookbehind: extract amounts after $ ---
Matcher lookbehind = Pattern.compile("(?<=\\$)\\d+\\.?\\d*").matcher("Price: $19.99 and $5.00");
while (lookbehind.find()) {
System.out.println("Amount: " + lookbehind.group());
}
// Amount: 19.99
// Amount: 5.00
// --- Password validation using multiple lookaheads ---
// At least 8 chars, one uppercase, one lowercase, one digit, one special char
Pattern strongPassword = Pattern.compile(
"^(?=.*[A-Z])" + // at least one uppercase
"(?=.*[a-z])" + // at least one lowercase
"(?=.*\\d)" + // at least one digit
"(?=.*[@#$%^&+=!])" + // at least one special character
".{8,}$" // at least 8 characters total
);
String[] passwords = {"Passw0rd!", "password", "SHORT1!", "MyP@ss12"};
for (String pw : passwords) {
boolean strong = strongPassword.matcher(pw).matches();
System.out.println(pw + " -> " + (strong ? "STRONG" : "WEAK"));
}
// Passw0rd! -> STRONG
// password -> WEAK
// SHORT1! -> WEAK
// MyP@ss12 -> STRONG
}
}
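The class above does not exercise negative lookbehind, so here is a minimal sketch of it. It assumes the goal is to collect numbers that are not dollar amounts; NegativeLookbehind and plainNumbers are hypothetical names. The \b is a deliberate addition: without it, the engine could skip the "5" in "$50" and still match the trailing "0".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NegativeLookbehind {
    // A run of digits that starts at a word boundary and is not preceded by "$"
    private static final Pattern PLAIN_NUMBER = Pattern.compile("(?<!\\$)\\b\\d+");

    static List<String> plainNumbers(String text) {
        List<String> found = new ArrayList<>();
        Matcher m = PLAIN_NUMBER.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(plainNumbers("Paid $50 for 3 items and 12 stickers"));
        // [3, 12]
    }
}
```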
Java's String class has several built-in methods that accept regex patterns. These are convenient for simple use cases where you do not need the full power of Pattern and Matcher.
| Method | What it Does | Returns |
|---|---|---|
| String.matches(regex) | Tests if the entire string matches the regex | boolean |
| String.split(regex) | Splits the string at each match of the regex | String[] |
| String.split(regex, limit) | Splits with a limit on the number of parts | String[] |
| String.replaceAll(regex, replacement) | Replaces all matches with the replacement | String |
| String.replaceFirst(regex, replacement) | Replaces only the first match | String |
Performance warning: These methods typically compile a new Pattern internally on every call. If you call them in a loop or frequently, compile the Pattern once yourself and use Matcher instead.
import java.util.Arrays;
public class StringRegexMethods {
public static void main(String[] args) {
// --- matches() -- checks the ENTIRE string ---
System.out.println("12345".matches("\\d+")); // true
System.out.println("123abc".matches("\\d+")); // false -- not all digits
System.out.println("hello".matches("[a-z]+")); // true
// --- split() -- break a string into parts ---
// Split on one or more whitespace characters
String sentence = "Split   this  string    up";
String[] words = sentence.split("\\s+");
System.out.println(Arrays.toString(words));
// [Split, this, string, up]
// Split a CSV line (handles optional spaces after commas)
String csv = "Java, Python, C++, JavaScript";
String[] languages = csv.split(",\\s*");
System.out.println(Arrays.toString(languages));
// [Java, Python, C++, JavaScript]
// Split with a limit
String path = "com.example.project.Main";
String[] parts = path.split("\\.", 3); // at most 3 parts
System.out.println(Arrays.toString(parts));
// [com, example, project.Main]
// --- replaceAll() -- replace all matches ---
// Remove all non-alphanumeric characters
String dirty = "Hello, World! @2026";
String clean = dirty.replaceAll("[^a-zA-Z0-9]", "");
System.out.println(clean); // HelloWorld2026
// Normalize whitespace: replace multiple spaces/tabs with a single space
String messy = "too   many \t spaces    here";
String normalized = messy.replaceAll("\\s+", " ");
System.out.println(normalized); // too many spaces here
// --- replaceFirst() -- replace only the first match ---
String text = "error: file not found. error: permission denied.";
String result = text.replaceFirst("error", "WARNING");
System.out.println(result);
// WARNING: file not found. error: permission denied.
// Use captured groups in replacement with $1, $2, etc.
// Reformat dates from MM/DD/YYYY to YYYY-MM-DD
String date = "02/28/2026";
String reformatted = date.replaceAll("(\\d{2})/(\\d{2})/(\\d{4})", "$3-$1-$2");
System.out.println(reformatted); // 2026-02-28
}
}
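Following the performance warning, the same split and replace operations can be written against precompiled patterns. This is a sketch under the assumption that the two patterns are reused many times; PrecompiledStringOps, fields, and normalize are hypothetical names.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class PrecompiledStringOps {
    private static final Pattern COMMA = Pattern.compile(",\\s*");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    // Equivalent to line.split(",\\s*") without recompiling the regex
    static String[] fields(String line) {
        return COMMA.split(line);
    }

    // Equivalent to s.replaceAll("\\s+", " ") without recompiling the regex
    static String normalize(String s) {
        return WHITESPACE.matcher(s).replaceAll(" ");
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(fields("Java, Python,C++"))); // [Java, Python, C++]
        System.out.println(normalize("too   many \t spaces"));           // too many spaces
    }
}
```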
Pattern flags modify how the regex engine interprets the pattern. You pass them as the second argument to Pattern.compile(), or embed them directly in the pattern using inline flag syntax.
| Flag Constant | Inline | Effect |
|---|---|---|
| Pattern.CASE_INSENSITIVE | (?i) | Matches letters regardless of case. abc matches "ABC". |
| Pattern.MULTILINE | (?m) | ^ and $ match start/end of each line, not just the entire string. |
| Pattern.DOTALL | (?s) | . matches any character including newline. |
| Pattern.COMMENTS | (?x) | Whitespace and comments (# to end of line) in the pattern are ignored. Great for readability. |
| Pattern.UNICODE_CASE | (?u) | Case-insensitive matching follows Unicode rules, not just ASCII. |
| Pattern.LITERAL | -- | The pattern is treated as a literal string (metacharacters have no special meaning). |
You can combine multiple flags using the bitwise OR operator (|).
import java.util.regex.*;
public class PatternFlags {
public static void main(String[] args) {
// --- CASE_INSENSITIVE ---
Pattern ci = Pattern.compile("java", Pattern.CASE_INSENSITIVE);
System.out.println(ci.matcher("JAVA").matches()); // true
System.out.println(ci.matcher("Java").matches()); // true
System.out.println(ci.matcher("jAvA").matches()); // true
// Same thing using inline flag (?i)
System.out.println("JAVA".matches("(?i)java")); // true
// --- MULTILINE ---
String log = "ERROR: disk full\nWARN: low memory\nERROR: timeout";
Pattern errorLines = Pattern.compile("^ERROR.*$", Pattern.MULTILINE);
Matcher m = errorLines.matcher(log);
while (m.find()) {
System.out.println(m.group());
}
// ERROR: disk full
// ERROR: timeout
// --- DOTALL ---
String html = "\nHello\nWorld\n";
// Without DOTALL, . does not match newlines
System.out.println(html.matches(".*")); // false
// With DOTALL, . matches everything including newlines
Pattern dotall = Pattern.compile(".*", Pattern.DOTALL);
System.out.println(dotall.matcher(html).matches()); // true
// --- COMMENTS -- write readable patterns ---
Pattern readable = Pattern.compile(
"\\d{3}" + // area code
"-" + // separator
"\\d{3}" + // prefix
"-" + // separator
"\\d{4}" // line number
);
System.out.println(readable.matcher("555-123-4567").matches()); // true
// Using the inline (?x) COMMENTS flag: whitespace inside the pattern itself is ignored
Pattern commented = Pattern.compile(
"(?x) " + // enable comments mode
"\\d{3} " + // area code
"- " + // dash separator
"\\d{3} " + // prefix
"- " + // dash separator
"\\d{4} " // line number
);
System.out.println(commented.matcher("555-123-4567").matches()); // true
// --- Combining multiple flags ---
Pattern combined = Pattern.compile(
"^error.*$",
Pattern.CASE_INSENSITIVE | Pattern.MULTILINE
);
Matcher cm = combined.matcher("Error: something\nERROR: another\ninfo: ok");
while (cm.find()) {
System.out.println("Found: " + cm.group());
}
// Found: Error: something
// Found: ERROR: another
}
}
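The flag table also lists LITERAL and UNICODE_CASE, which the class above does not demonstrate. A minimal sketch of both follows; MoreFlags, literalMatch, and caseInsensitiveMatch are hypothetical names, and the accented letters are written as Unicode escapes so the example does not depend on the source file encoding.

```java
import java.util.regex.Pattern;

class MoreFlags {
    // LITERAL: every character of the pattern is matched as-is
    static boolean literalMatch(String pattern, String input) {
        return Pattern.compile(pattern, Pattern.LITERAL).matcher(input).matches();
    }

    // CASE_INSENSITIVE alone only folds ASCII letters; UNICODE_CASE extends it
    static boolean caseInsensitiveMatch(String pattern, String input, boolean unicode) {
        int flags = Pattern.CASE_INSENSITIVE | (unicode ? Pattern.UNICODE_CASE : 0);
        return Pattern.compile(pattern, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(literalMatch("a.b", "a.b")); // true
        System.out.println(literalMatch("a.b", "axb")); // false -- the dot is literal

        String lower = "\u00e9"; // e with acute accent, lowercase
        String upper = "\u00c9"; // e with acute accent, uppercase
        System.out.println(caseInsensitiveMatch(lower, upper, false)); // false
        System.out.println(caseInsensitiveMatch(lower, upper, true));  // true
    }
}
```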
One of the most common uses of regex is input validation. Below are practical patterns for common formats, each broken down so you understand every part.
A simplified but practical email regex. Note that the full RFC 5322 email spec is extremely complex -- this pattern covers the vast majority of real-world addresses.
// Email: local-part@domain.tld
// ^ -- start of string
// [a-zA-Z0-9._%+-]+ -- local part: letters, digits, dots, underscores, %, +, -
// @ -- literal @ symbol
// [a-zA-Z0-9.-]+ -- domain: letters, digits, dots, hyphens
// \. -- literal dot before TLD
// [a-zA-Z]{2,} -- TLD: at least 2 letters (com, org, io, etc.)
// $ -- end of string
String emailRegex = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";
String[] emails = {"user@example.com", "first.last@company.co.uk", "invalid@", "@nodomain.com", "test@site.io"};
for (String email : emails) {
System.out.println(email + " -> " + (email.matches(emailRegex) ? "VALID" : "INVALID"));
}
// user@example.com -> VALID
// first.last@company.co.uk -> VALID
// invalid@ -> INVALID
// @nodomain.com -> INVALID
// test@site.io -> VALID
Matches multiple common US phone formats: (555) 123-4567, 555-123-4567, 5551234567, +1-555-123-4567.
// US phone: optional country code, various separator formats
// ^ -- start
// (\\+1[- ]?)? -- optional +1 country code with optional separator
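// ( -- start of optional area-code group
// \\(? -- optional opening parenthesis
// \\d{3} -- area code (3 digits)
// \\)? -- optional closing parenthesis
// [- ]? -- optional separator (dash or space)
// )? -- end of optional area-code group (7-digit numbers also match)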
// \\d{3} -- prefix (3 digits)
// [- ]? -- optional separator
// \\d{4} -- line number (4 digits)
// $ -- end
String phoneRegex = "^(\\+1[- ]?)?(\\(?\\d{3}\\)?[- ]?)?\\d{3}[- ]?\\d{4}$";
String[] phones = {"(555) 123-4567", "555-123-4567", "5551234567", "+1-555-123-4567", "123"};
for (String phone : phones) {
System.out.println(phone + " -> " + (phone.matches(phoneRegex) ? "VALID" : "INVALID"));
}
// (555) 123-4567 -> VALID
// 555-123-4567 -> VALID
// 5551234567 -> VALID
// +1-555-123-4567 -> VALID
// 123 -> INVALID
Uses lookaheads to enforce multiple rules simultaneously: minimum length, uppercase, lowercase, digit, and special character.
// Password must have:
// (?=.*[A-Z]) -- at least one uppercase letter
// (?=.*[a-z]) -- at least one lowercase letter
// (?=.*\\d) -- at least one digit
// (?=.*[@#$%^&+=!]) -- at least one special character
// .{8,20} -- between 8 and 20 characters total
String passwordRegex = "^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[@#$%^&+=!]).{8,20}$";
String[] passwords = {"Str0ng!Pass", "weakpassword", "SHORT1!", "NoSpecial1", "G00d@Pwd"};
for (String pw : passwords) {
System.out.println(pw + " -> " + (pw.matches(passwordRegex) ? "STRONG" : "WEAK"));
}
// Str0ng!Pass -> STRONG
// weakpassword -> WEAK (no uppercase, no digit, no special)
// SHORT1! -> WEAK (less than 8 chars)
// NoSpecial1 -> WEAK (no special character)
// G00d@Pwd -> STRONG
// URL: protocol://domain:port/path?query#fragment
// ^https?:// -- http or https
// [\\w.-]+ -- domain name
// (:\\d{1,5})? -- optional port (1-5 digits)
// (/[\\w./-]*)* -- optional path segments
// (\\?[\\w=&%-]*)? -- optional query string
// (#[\\w-]*)? -- optional fragment
// $
String urlRegex = "^https?://[\\w.-]+(:\\d{1,5})?(/[\\w./-]*)*(\\?[\\w=&%-]*)?(#[\\w-]*)?$";
String[] urls = {
"https://example.com",
"http://localhost:8080/api/users",
"https://site.com/page?name=test&id=5",
"ftp://invalid.com",
"https://example.com/path#section"
};
for (String url : urls) {
System.out.println(url + " -> " + (url.matches(urlRegex) ? "VALID" : "INVALID"));
}
// https://example.com -> VALID
// http://localhost:8080/api/users -> VALID
// https://site.com/page?name=test&id=5 -> VALID
// ftp://invalid.com -> INVALID
// https://example.com/path#section -> VALID
// IPv4: four octets (0-255) separated by dots
// Each octet: 25[0-5] | 2[0-4]\\d | [01]?\\d{1,2}
// This handles: 0-9, 10-99, 100-199, 200-249, 250-255
String ipRegex = "^((25[0-5]|2[0-4]\\d|[01]?\\d{1,2})\\.){3}(25[0-5]|2[0-4]\\d|[01]?\\d{1,2})$";
String[] ips = {"192.168.1.1", "255.255.255.255", "0.0.0.0", "256.1.1.1", "192.168.1"};
for (String ip : ips) {
System.out.println(ip + " -> " + (ip.matches(ipRegex) ? "VALID" : "INVALID"));
}
// 192.168.1.1 -> VALID
// 255.255.255.255 -> VALID
// 0.0.0.0 -> VALID
// 256.1.1.1 -> INVALID (256 is out of range)
// 192.168.1 -> INVALID (only 3 octets)
// Date: YYYY-MM-DD (basic format validation, not full calendar validation)
// \\d{4} -- 4-digit year
// - -- separator
// (0[1-9]|1[0-2]) -- month: 01-12
// - -- separator
// (0[1-9]|[12]\\d|3[01]) -- day: 01-31
String dateRegex = "^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])$";
String[] dates = {"2026-02-28", "2026-13-01", "2026-00-15", "2026-12-31", "26-01-01"};
for (String date : dates) {
System.out.println(date + " -> " + (date.matches(dateRegex) ? "VALID" : "INVALID"));
}
// 2026-02-28 -> VALID
// 2026-13-01 -> INVALID (month 13)
// 2026-00-15 -> INVALID (month 00)
// 2026-12-31 -> VALID
// 26-01-01 -> INVALID (2-digit year)
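The date pattern above accepts impossible dates such as 2026-02-30, because it checks the shape only. One way to add calendar validation is to hand the string to java.time after the regex check. This is a sketch, not the only approach; CalendarDateCheck and isRealDate are hypothetical names.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;
import java.util.regex.Pattern;

class CalendarDateCheck {
    private static final Pattern SHAPE =
        Pattern.compile("^\\d{4}-(0[1-9]|1[0-2])-(0[1-9]|[12]\\d|3[01])$");

    static boolean isRealDate(String s) {
        // Step 1: cheap format check with the regex
        if (!SHAPE.matcher(s).matches()) {
            return false;
        }
        // Step 2: LocalDate.parse rejects days that do not exist in that month
        try {
            LocalDate.parse(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isRealDate("2026-02-28")); // true
        System.out.println(isRealDate("2026-02-30")); // false -- passes the regex, fails the calendar
        System.out.println(isRealDate("2024-02-29")); // true  -- leap year
        System.out.println(isRealDate("2026-13-01")); // false -- fails the regex
    }
}
```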
// Credit card: 13-19 digits, optionally separated by spaces or dashes every 4 digits
// Common formats: Visa (4xxx), Mastercard (5xxx), Amex (34xx/37xx)
String ccRegex = "^\\d{4}[- ]?\\d{4}[- ]?\\d{4}[- ]?\\d{1,7}$";
String[] cards = {"4111111111111111", "4111-1111-1111-1111", "4111 1111 1111 1111", "411", "1234567890123456789012"};
for (String card : cards) {
System.out.println(card + " -> " + (card.matches(ccRegex) ? "VALID FORMAT" : "INVALID FORMAT"));
}
// 4111111111111111 -> VALID FORMAT
// 4111-1111-1111-1111 -> VALID FORMAT
// 4111 1111 1111 1111 -> VALID FORMAT
// 411 -> INVALID FORMAT
// 1234567890123456789012 -> INVALID FORMAT
// Note: this only validates the FORMAT, not the actual card number.
// Use the Luhn algorithm for checksum validation.
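Since the note above points to the Luhn algorithm, here is a compact sketch of that checksum layered on top of a digit-count check. LuhnCheck and passesLuhn are hypothetical names; the separator handling (spaces and dashes) is an assumption carried over from the format regex.

```java
class LuhnCheck {
    static boolean passesLuhn(String cardNumber) {
        // Drop the optional separators, then require 13 to 19 digits
        String digits = cardNumber.replaceAll("[- ]", "");
        if (!digits.matches("\\d{13,19}")) {
            return false;
        }
        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9; // same as adding the two digits of the product
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(passesLuhn("4111111111111111"));    // true
        System.out.println(passesLuhn("4111-1111-1111-1111")); // true
        System.out.println(passesLuhn("4111111111111112"));    // false -- checksum fails
        System.out.println(passesLuhn("411"));                 // false -- too short
    }
}
```

A passing checksum still says nothing about whether the card exists; it only catches typos and random digit strings.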
// SSN format: XXX-XX-XXXX
// (?!000|666) -- area number cannot be 000 or 666
// (?!9\\d{2}) -- area number cannot start with 9
// \\d{3} -- 3-digit area number
// - -- separator
// (?!00)\\d{2} -- 2-digit group number (not 00)
// - -- separator
// (?!0000)\\d{4} -- 4-digit serial number (not 0000)
String ssnRegex = "^(?!000|666)(?!9\\d{2})\\d{3}-(?!00)\\d{2}-(?!0000)\\d{4}$";
String[] ssns = {"123-45-6789", "000-12-3456", "666-12-3456", "900-12-3456", "123-00-6789", "123-45-0000"};
for (String ssn : ssns) {
System.out.println(ssn + " -> " + (ssn.matches(ssnRegex) ? "VALID" : "INVALID"));
}
// 123-45-6789 -> VALID
// 000-12-3456 -> INVALID (area 000)
// 666-12-3456 -> INVALID (area 666)
// 900-12-3456 -> INVALID (area starts with 9)
// 123-00-6789 -> INVALID (group 00)
// 123-45-0000 -> INVALID (serial 0000)
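The credit card check above validates only the format and points to the Luhn algorithm for the checksum. The following is a minimal sketch of that second step, not a production validator. The class name `LuhnCheck` and the method name `passesLuhn` are made up for this example; the 13 to 19 digit range mirrors the format regex above.

```java
public class LuhnCheck {

    /** Returns true if the digits in cardNumber pass the Luhn checksum. */
    public static boolean passesLuhn(String cardNumber) {
        if (cardNumber == null) return false;
        // Strip the separators that the format regex allows
        String digits = cardNumber.replaceAll("[- ]", "");
        if (!digits.matches("\\d{13,19}")) return false;

        int sum = 0;
        boolean doubleIt = false; // the rightmost digit is not doubled
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) d -= 9; // same as adding the two digits of the product
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    public static void main(String[] args) {
        System.out.println(passesLuhn("4111-1111-1111-1111")); // true
        System.out.println(passesLuhn("5500 0000 0000 0004")); // true
        System.out.println(passesLuhn("4111-1111-1111-1112")); // false
    }
}
```

Running the format regex first and the checksum second keeps each step simple: the regex rejects malformed input, and the arithmetic catches mistyped digits.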
Beyond validation, regex is heavily used for searching text and performing replacements. The Matcher class gives you fine-grained control over the search and replace process.
The find() method scans the input for the next match. Call it in a while loop to iterate through all matches.
import java.util.regex.*;
import java.util.ArrayList;
import java.util.List;
public class SearchAndReplace {
public static void main(String[] args) {
// --- Finding all matches ---
String text = "Contact us at support@company.com or sales@company.com. " +
"Personal: john.doe@gmail.com";
Pattern emailPattern = Pattern.compile("[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}");
Matcher finder = emailPattern.matcher(text);
List<String> emails = new ArrayList<>();
while (finder.find()) {
emails.add(finder.group());
System.out.println("Found email at [" + finder.start() + "-" + finder.end() + "]: " + finder.group());
}
// Found email at [14-33]: support@company.com
// Found email at [37-54]: sales@company.com
// Found email at [66-84]: john.doe@gmail.com
System.out.println("Total emails found: " + emails.size()); // Total emails found: 3
// --- Simple replaceAll ---
String censored = emailPattern.matcher(text).replaceAll("[REDACTED]");
System.out.println(censored);
// Contact us at [REDACTED] or [REDACTED]. Personal: [REDACTED]
// --- replaceFirst ---
String firstOnly = emailPattern.matcher(text).replaceFirst("[REDACTED]");
System.out.println(firstOnly);
// Contact us at [REDACTED] or sales@company.com. Personal: john.doe@gmail.com
}
}
When you need dynamic replacements (e.g., the replacement depends on the matched value), use appendReplacement() and appendTail(). This pair lets you build a result string incrementally, applying custom logic to each match.
import java.util.regex.*;
public class CustomReplacement {
public static void main(String[] args) {
// Convert all words to title case using appendReplacement
String input = "the quick brown fox jumps over the lazy dog";
Pattern wordPattern = Pattern.compile("\\b([a-z])(\\w*)");
Matcher m = wordPattern.matcher(input);
StringBuilder result = new StringBuilder();
while (m.find()) {
String titleCase = m.group(1).toUpperCase() + m.group(2);
m.appendReplacement(result, titleCase);
}
m.appendTail(result);
System.out.println(result);
// The Quick Brown Fox Jumps Over The Lazy Dog
// Mask credit card numbers: show only last 4 digits
String data = "Card: 4111-1111-1111-1111, Another: 5500-0000-0000-0004";
Pattern ccPattern = Pattern.compile("(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})");
Matcher ccMatcher = ccPattern.matcher(data);
StringBuilder masked = new StringBuilder();
while (ccMatcher.find()) {
String replacement = "****-****-****-" + ccMatcher.group(4);
ccMatcher.appendReplacement(masked, replacement);
}
ccMatcher.appendTail(masked);
System.out.println(masked);
// Card: ****-****-****-1111, Another: ****-****-****-0004
// Java 9+: Matcher.replaceAll with a Function
String prices = "Items cost $5 and $23 and $100";
Pattern pricePattern = Pattern.compile("\\$(\\d+)");
String doubled = pricePattern.matcher(prices).replaceAll(mr -> {
int amount = Integer.parseInt(mr.group(1));
return "\\$" + (amount * 2);
});
System.out.println(doubled);
// Items cost $10 and $46 and $200
}
}
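One detail to watch with appendReplacement(): the replacement argument is not plain text. A `$` followed by a digit is read as a group reference and a backslash is an escape character, so a replacement value that contains either can fail (for example with an IndexOutOfBoundsException when the referenced group does not exist) or produce unexpected output. The sketch below shows one way to guard against that with Matcher.quoteReplacement(). The `{price}` placeholder syntax and the `fill` method are hypothetical names chosen for illustration, and the StringBuilder overload assumes Java 9 or later.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteReplacementSketch {
    // Hypothetical placeholder syntax for this example: {price}
    private static final Pattern PLACEHOLDER = Pattern.compile("\\{price\\}");

    /** Replaces every {price} placeholder with the given value, taken literally. */
    public static String fill(String template, String value) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuilder out = new StringBuilder();
        while (m.find()) {
            // quoteReplacement escapes $ and \ so they are not read as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(fill("Total: {price}", "$5"));      // Total: $5
        System.out.println(fill("Path: {price}", "C:\\temp")); // Path: C:\temp
    }
}
```

As a rule of thumb, quote the replacement whenever it comes from matched text or user input rather than from a literal you wrote yourself.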
Even experienced developers make regex mistakes. Here are the most frequent pitfalls and how to avoid them.
This is the number one mistake for Java developers. In regex, \d means "digit." In a Java string, \d is not a valid escape sequence. You must write \\d so Java's string parser produces the single backslash that the regex engine expects.
// WRONG -- Java does not recognize \d as a string escape
// String pattern = "\d+"; // Compilation error!
// CORRECT -- double backslash to produce \d for the regex engine
String pattern = "\\d+";
// To match a literal backslash in text, you need FOUR backslashes:
// Java string: "\\\\" -> produces: \\ -> regex sees: \ (literal backslash)
String backslashPattern = "\\\\";
System.out.println("C:\\Users".matches(".*\\\\.*")); // true
Certain regex patterns can cause the engine to take an exponential amount of time on certain inputs. This happens when a pattern has nested quantifiers that can match the same characters in multiple ways.
// DANGEROUS -- nested quantifiers can cause catastrophic backtracking
// String bad = "(a+)+b";
// On input "aaaaaaaaaaaaaaaaaac", the engine tries every possible way
// to split the 'a's between the inner and outer groups before failing.
// This can freeze your application.
// SAFE -- flatten the nesting
String safe = "a+b";
// This matches the same thing but without the exponential backtracking risk.
// Another common danger: matching quoted strings with nested quantifiers
// DANGEROUS: "(.*)*"
// SAFE: "[^"]*" -- use negated character class instead
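When the nesting cannot simply be flattened, a possessive quantifier is another option: `a++` matches like `a+` but never gives characters back, so the engine has nothing to retry. The sketch below compares the nested, flattened, and possessive forms on a few short inputs to show that they accept the same strings. The class and method names are made up for this example, and the inputs are deliberately short so that the nested pattern stays fast.

```java
import java.util.regex.Pattern;

public class BacktrackingSketch {
    public static final Pattern NESTED = Pattern.compile("(a+)+b");      // risky on long near-misses
    public static final Pattern FLAT = Pattern.compile("a+b");           // flattened equivalent
    public static final Pattern POSSESSIVE = Pattern.compile("(a++)+b"); // inner a++ never backtracks

    /** True when all three patterns give the same matches() answer for the input. */
    public static boolean allAgree(String input) {
        boolean nested = NESTED.matcher(input).matches();
        boolean flat = FLAT.matcher(input).matches();
        boolean possessive = POSSESSIVE.matcher(input).matches();
        return nested == flat && flat == possessive;
    }

    public static void main(String[] args) {
        // Inputs are kept short on purpose so the nested pattern stays fast
        String[] inputs = {"ab", "aaaab", "b", "aaaac"};
        for (String s : inputs) {
            System.out.println(s + " -> matches=" + FLAT.matcher(s).matches()
                    + ", allAgree=" + allAgree(s));
        }
    }
}
```

The three patterns differ only in how much work the engine does before giving up on an input such as a long run of 'a' followed by 'c'.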
If your regex is more than about 80 characters long, consider breaking the validation into multiple simpler steps. A 200-character regex that validates everything at once is nearly impossible to maintain.
// BAD -- one massive unreadable regex
// String nightmare = "^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[@#$%^&+=!])[a-zA-Z0-9@#$%^&+=!]{8,20}$";
// BETTER -- break into understandable steps
public static boolean isStrongPassword(String password) {
if (password == null) return false;
if (password.length() < 8 || password.length() > 20) return false;
if (!password.matches(".*[A-Z].*")) return false; // needs uppercase
if (!password.matches(".*[a-z].*")) return false; // needs lowercase
if (!password.matches(".*\\d.*")) return false; // needs digit
if (!password.matches(".*[@#$%^&+=!].*")) return false; // needs special char
return true;
}
// Easier to read, debug, and extend. Each rule is independently testable.
Always test your regex with edge cases: empty strings, very long strings, strings with special characters, and strings that are close to matching but should not.
// Testing an email regex -- you need ALL of these test cases
String emailRegex = "^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$";
// Happy path
assert "user@example.com".matches(emailRegex); // standard email
assert "first.last@company.co.uk".matches(emailRegex); // dots and subdomains
// Edge cases that should FAIL
assert !"".matches(emailRegex); // empty string
assert !"@example.com".matches(emailRegex); // missing local part
assert !"user@".matches(emailRegex); // missing domain
assert !"user@.com".matches(emailRegex); // domain starts with dot
assert !"user@com".matches(emailRegex); // no TLD separator
assert !"user@@example.com".matches(emailRegex); // double @
// Edge cases that should PASS
assert "a@b.co".matches(emailRegex); // minimal valid email
assert "user+tag@gmail.com".matches(emailRegex); // plus addressing
String.matches() and Matcher.matches() check if the entire string matches the pattern. If you want to check if the pattern appears anywhere in the string, use Matcher.find().
String text = "Error code: 404";
// WRONG -- matches() checks the ENTIRE string
System.out.println(text.matches("\\d+")); // false -- the entire string is not digits
// CORRECT -- find() searches for the pattern anywhere
Matcher m = Pattern.compile("\\d+").matcher(text);
System.out.println(m.find()); // true
System.out.println(m.group()); // 404
// If you must use matches(), wrap the pattern with .*
System.out.println(text.matches(".*\\d+.*")); // true -- but find() is cleaner
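Matcher also offers lookingAt(), which sits between the two: it must match at the start of the input but does not need to consume all of it. The sketch below puts the three methods side by side; the class name and the `describe` helper are made up for this example.

```java
import java.util.regex.Pattern;

public class MatchModesSketch {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    /** Summarizes how the three matching methods treat the same input. */
    public static String describe(String input) {
        // A fresh Matcher per call keeps the three checks independent of each other's state
        boolean whole = DIGITS.matcher(input).matches();    // entire input must match
        boolean prefix = DIGITS.matcher(input).lookingAt(); // match must start at index 0
        boolean anywhere = DIGITS.matcher(input).find();    // match may start anywhere
        return "matches=" + whole + " lookingAt=" + prefix + " find=" + anywhere;
    }

    public static void main(String[] args) {
        System.out.println(describe("404"));             // matches=true lookingAt=true find=true
        System.out.println(describe("404 Not Found"));   // matches=false lookingAt=true find=true
        System.out.println(describe("Error code: 404")); // matches=false lookingAt=false find=true
    }
}
```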
Follow these guidelines to write regex that is correct, readable, and performant.
The Pattern.compile() method is expensive. If you use the same regex multiple times (in a loop, in a method called frequently, etc.), compile it once and store it as a static final field.
public class UserValidator {
// GOOD -- compiled once, reused many times
private static final Pattern EMAIL_PATTERN =
Pattern.compile("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
private static final Pattern PHONE_PATTERN =
Pattern.compile("^(\\+1[- ]?)?(\\(?\\d{3}\\)?[- ]?)?\\d{3}[- ]?\\d{4}$");
public static boolean isValidEmail(String email) {
return email != null && EMAIL_PATTERN.matcher(email).matches();
}
public static boolean isValidPhone(String phone) {
return phone != null && PHONE_PATTERN.matcher(phone).matches();
}
// BAD -- compiles a new Pattern on every call
// public static boolean isValidEmailBad(String email) {
// return email.matches("^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$");
// }
}
Named groups make your code self-documenting. Instead of remembering that group(3) is the year, use group("year").
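As a small illustration, the sketch below reads date parts by name and also uses the `${name}` syntax in a replacement string. The class and method names are invented for this example, and the pattern checks only the shape of the date, not whether the month or day is in range.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NamedGroupSketch {
    private static final Pattern ISO_DATE = Pattern.compile(
            "(?<year>\\d{4})-(?<month>\\d{2})-(?<day>\\d{2})");

    /** Converts YYYY-MM-DD to MM/DD/YYYY using group names instead of indexes. */
    public static String toUsFormat(String isoDate) {
        Matcher m = ISO_DATE.matcher(isoDate);
        if (!m.matches()) {
            throw new IllegalArgumentException("Expected YYYY-MM-DD but got: " + isoDate);
        }
        return m.group("month") + "/" + m.group("day") + "/" + m.group("year");
    }

    /** Rewrites every ISO date found in the text as DD.MM.YYYY. */
    public static String toEuropeanFormat(String text) {
        // ${name} in a replacement string refers to a named group
        return ISO_DATE.matcher(text).replaceAll("${day}.${month}.${year}");
    }

    public static void main(String[] args) {
        System.out.println(toUsFormat("2026-02-28"));           // 02/28/2026
        System.out.println(toEuropeanFormat("Due 2026-02-28")); // Due 28.02.2026
    }
}
```

If the pattern later gains another group in front, the named lookups keep working, whereas numbered lookups would all shift by one.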
When you are searching for user-supplied text that might contain regex metacharacters, use Pattern.quote() to escape everything automatically.
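The difference is easy to see with a search term such as `$5.00`, where `$` and `.` both have special meaning in a regex. The sketch below counts occurrences with and without Pattern.quote(). The class and method names are made up for this example, and `countAsRegex` is included only to show the problem.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class QuoteSketch {

    /** Counts literal occurrences of userInput in text. */
    public static int countLiteral(String text, String userInput) {
        return count(Pattern.compile(Pattern.quote(userInput)), text);
    }

    /** UNSAFE: treats userInput as a regex. Shown only for comparison. */
    public static int countAsRegex(String text, String userInput) {
        return count(Pattern.compile(userInput), text);
    }

    private static int count(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    public static void main(String[] args) {
        String text = "Price: $5.00 (was $5.00 + tax), not $5x00";
        System.out.println(countLiteral(text, "$5.00")); // 2
        System.out.println(countAsRegex(text, "$5.00")); // 0 -- $ is read as an end anchor
        System.out.println(countLiteral(text, "5.00"));  // 2
        System.out.println(countAsRegex(text, "5.00"));  // 3 -- . also matches the x in 5x00
    }
}
```

Unquoted input can also throw a PatternSyntaxException, for example when the user types a lone `(`.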
If a regex grows beyond a readable length, consider breaking the validation into multiple steps or using a combination of regex and plain Java logic.
Use Java string concatenation with comments, or the COMMENTS flag, to make complex patterns understandable.
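Here is a sketch of the COMMENTS approach, applied to the same YYYY-MM-DD pattern used in the validation examples. In this mode the engine ignores whitespace in the pattern and treats everything from `#` to the end of the line as a comment, which is why each fragment ends with `\n`. The class and method names are invented for this example.

```java
import java.util.regex.Pattern;

public class CommentsFlagSketch {
    public static final Pattern ISO_DATE = Pattern.compile(
            "^ \\d{4}                      # four-digit year\n" +
            "- (0[1-9] | 1[0-2])           # month: 01-12\n" +
            "- (0[1-9] | [12]\\d | 3[01])  # day: 01-31\n" +
            "$",
            Pattern.COMMENTS);

    public static boolean isIsoDate(String s) {
        return s != null && ISO_DATE.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(isIsoDate("2026-02-28")); // true
        System.out.println(isIsoDate("2026-13-01")); // false
    }
}
```

One caveat: because whitespace is ignored, a pattern that needs to match a literal space or `#` must escape it or use `\\s`, so this style suits patterns like the one above that contain neither.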
Always test with: empty strings, null input, maximum-length input, strings with only special characters, strings that are "almost" valid, and internationalized input (if applicable).
Instead of .* (which matches anything), use character classes that describe what you actually expect: [^"]* instead of .* inside quotes, \\d+ instead of .+ for numbers.
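The quoted-string case makes a good demonstration. In the sketch below, the greedy `.*` runs to the last quote in the input and returns one oversized match, while the negated character class stops at the first closing quote. The class and method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SpecificitySketch {
    public static final Pattern GREEDY = Pattern.compile("\"(.*)\"");
    public static final Pattern SPECIFIC = Pattern.compile("\"([^\"]*)\"");

    /** Collects the contents of group 1 for every match of pattern in text. */
    public static List<String> quotedValues(Pattern pattern, String text) {
        List<String> values = new ArrayList<>();
        Matcher m = pattern.matcher(text);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) {
        String text = "say \"alpha\" and \"beta\"";
        System.out.println(quotedValues(GREEDY, text));   // [alpha" and "beta]
        System.out.println(quotedValues(SPECIFIC, text)); // [alpha, beta]
    }
}
```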
| Practice | Do | Do Not |
|---|---|---|
| Compile patterns | static final Pattern P = Pattern.compile(...) | str.matches("...") in a loop |
| Escape user input | Pattern.quote(userInput) | Concatenate user input directly into regex |
| Name groups | (?<year>\\d{4}) | (\\d{4}) then group(1) |
| Be specific | [^"]* between quotes | .* between quotes |
| Handle null | Check null before matching | Call .matches() on nullable values |
| Break complex logic | Multiple simple checks | One enormous regex |
| Test edge cases | Empty, long, special chars, near-misses | Test only the happy path |
A comprehensive reference of all regex syntax elements covered in this tutorial.
| Category | Syntax | Meaning | Java String |
|---|---|---|---|
| Character Classes | [abc] | Any of a, b, or c | "[abc]" |
| | [^abc] | Not a, b, or c | "[^abc]" |
| | [a-z] | Range a through z | "[a-z]" |
| | \d / \D | Digit / Non-digit | "\\d" / "\\D" |
| | \w / \W | Word char / Non-word char | "\\w" / "\\W" |
| | \s / \S | Whitespace / Non-whitespace | "\\s" / "\\S" |
| | . | Any character (except newline) | "." |
| Quantifiers | * | Zero or more | "a*" |
| | + | One or more | "a+" |
| | ? | Zero or one | "a?" |
| | {n} | Exactly n | "a{3}" |
| | {n,m} | Between n and m | "a{2,5}" |
| | *? / +? | Lazy (minimal) match | "a*?" / "a+?" |
| Anchors | ^ | Start of string/line | "^" |
| | $ | End of string/line | "$" |
| | \b | Word boundary | "\\b" |
| | \B | Non-word boundary | "\\B" |
| Groups | (...) | Capturing group | "(abc)" |
| | (?:...) | Non-capturing group | "(?:abc)" |
| | (?<name>...) | Named group | "(?<name>abc)" |
| | \1 | Backreference to group 1 | "\\1" |
| | \| | Alternation (OR) | "cat\|dog" |
| Lookaround | (?=...) | Positive lookahead | "(?=abc)" |
| | (?!...) | Negative lookahead | "(?!abc)" |
| | (?<=...) | Positive lookbehind | "(?<=abc)" |
| | (?<!...) | Negative lookbehind | "(?<!abc)" |
| Flags | (?i) | Case insensitive | Pattern.CASE_INSENSITIVE |
| | (?m) | Multiline (^ $ match lines) | Pattern.MULTILINE |
| | (?s) | Dotall (. matches newline) | Pattern.DOTALL |
| | (?x) | Comments mode | Pattern.COMMENTS |
| | (?u) | Unicode case | Pattern.UNICODE_CASE |
| | -- | Literal (no metacharacters) | Pattern.LITERAL |
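The flags in the last rows can be passed to Pattern.compile() as constants (combined with `|`) or embedded at the start of the pattern. The sketch below shows both forms along with the effect of DOTALL; the class, field, and method names are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagsSketch {
    // Flag constants combined with bitwise OR
    private static final Pattern ERROR_LINE =
            Pattern.compile("^error", Pattern.CASE_INSENSITIVE | Pattern.MULTILINE);
    // The same flags embedded in the pattern itself
    private static final Pattern ERROR_LINE_INLINE = Pattern.compile("(?im)^error");

    private static final Pattern SPAN_DEFAULT = Pattern.compile("start.*end");
    private static final Pattern SPAN_DOTALL = Pattern.compile("start.*end", Pattern.DOTALL);

    /** Counts lines that begin with "error" in any letter case. */
    public static int countErrorLines(String log, boolean useInlineFlags) {
        Matcher m = (useInlineFlags ? ERROR_LINE_INLINE : ERROR_LINE).matcher(log);
        int n = 0;
        while (m.find()) n++;
        return n;
    }

    /** Checks whether "start...end" matches the whole text, with or without DOTALL. */
    public static boolean spansWholeText(String text, boolean dotAll) {
        return (dotAll ? SPAN_DOTALL : SPAN_DEFAULT).matcher(text).matches();
    }

    public static void main(String[] args) {
        String log = "info: ok\nERROR: disk full\nError: retry";
        System.out.println(countErrorLines(log, false));         // 2
        System.out.println(countErrorLines(log, true));          // 2
        System.out.println(spansWholeText("start\nend", false)); // false -- . stops at the newline
        System.out.println(spansWholeText("start\nend", true));  // true
    }
}
```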
This final example brings together everything we have learned. It is a complete, runnable program that demonstrates regex in two real-world scenarios: parsing structured log files and validating user input for a registration form.
import java.util.regex.*;
import java.util.*;
import java.util.stream.Collectors;
/**
* Complete Regex Example: LogParser and InputValidator
*
* Demonstrates:
* - Pattern compilation and reuse (static final)
* - Named capturing groups
* - Multiple validation patterns
* - find() with while loop for extraction
* - replaceAll for data masking
* - appendReplacement for custom replacement
* - Lookaheads for password validation
* - Word boundaries
* - Greedy vs lazy matching
* - Pattern flags
*/
public class RegexDemo {
// =========================================================================
// Part 1: Log Parser -- Extract structured data from log entries
// =========================================================================
// Pre-compiled patterns (compiled once, reused across all calls)
private static final Pattern LOG_PATTERN = Pattern.compile(
"(?<timestamp>\\d{4}-\\d{2}-\\d{2} \\d{2}:\\d{2}:\\d{2})" + // 2026-02-28 14:30:00
"\\s+\\[(?<level>\\w+)]" + // [ERROR]
"\\s+(?<class>[\\w.]+)" + // com.app.Service
"\\s+-\\s+(?<message>.*)" // - The log message
);
private static final Pattern IP_PATTERN = Pattern.compile(
"\\b(?:(?:25[0-5]|2[0-4]\\d|[01]?\\d{1,2})\\.){3}(?:25[0-5]|2[0-4]\\d|[01]?\\d{1,2})\\b"
);
private static final Pattern ERROR_CODE_PATTERN = Pattern.compile(
"\\b[A-Z]{2,4}-\\d{3,5}\\b" // e.g., ERR-5001, HTTP-404
);
public static void parseLogEntries(String[] logLines) {
System.out.println("=== LOG PARSER RESULTS ===");
System.out.println();
Map<String, Integer> levelCounts = new LinkedHashMap<>();
List<String> errorMessages = new ArrayList<>();
for (String line : logLines) {
Matcher m = LOG_PATTERN.matcher(line);
if (m.matches()) {
String timestamp = m.group("timestamp");
String level = m.group("level");
String className = m.group("class");
String message = m.group("message");
// Count log levels
levelCounts.merge(level, 1, Integer::sum);
// Collect error messages
if ("ERROR".equals(level)) {
errorMessages.add(timestamp + " | " + className + " | " + message);
}
// Extract IP addresses from the message
Matcher ipMatcher = IP_PATTERN.matcher(message);
while (ipMatcher.find()) {
System.out.println(" IP found in log: " + ipMatcher.group()
+ " (from " + className + ")");
}
// Extract error codes from the message
Matcher codeMatcher = ERROR_CODE_PATTERN.matcher(message);
while (codeMatcher.find()) {
System.out.println(" Error code found: " + codeMatcher.group()
+ " (at " + timestamp + ")");
}
}
}
System.out.println();
System.out.println("Log Level Summary:");
levelCounts.forEach((level, count) ->
System.out.println(" " + level + ": " + count));
System.out.println();
System.out.println("Error Messages:");
errorMessages.forEach(msg -> System.out.println(" " + msg));
}
// =========================================================================
// Part 2: Input Validator -- Validate form fields for user registration
// =========================================================================
private static final Pattern EMAIL_PATTERN = Pattern.compile(
"^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$"
);
private static final Pattern PHONE_PATTERN = Pattern.compile(
"^(\\+1[- ]?)?(\\(?\\d{3}\\)?[- ]?)?\\d{3}[- ]?\\d{4}$"
);
private static final Pattern PASSWORD_PATTERN = Pattern.compile(
"^(?=.*[A-Z])(?=.*[a-z])(?=.*\\d)(?=.*[@#$%^&+=!]).{8,20}$"
);
private static final Pattern USERNAME_PATTERN = Pattern.compile(
"^[a-zA-Z][a-zA-Z0-9_]{2,19}$" // starts with letter, 3-20 chars, only alphanumeric and _
);
private static final Pattern DATE_PATTERN = Pattern.compile(
"^(?<year>\\d{4})-(?<month>0[1-9]|1[0-2])-(?<day>0[1-9]|[12]\\d|3[01])$"
);
public static Map<String, String> validateRegistration(
String username, String email, String password, String phone, String birthDate) {
Map<String, String> errors = new LinkedHashMap<>();
// Username validation
if (username == null || username.isEmpty()) {
errors.put("username", "Username is required");
} else if (!USERNAME_PATTERN.matcher(username).matches()) {
errors.put("username", "Must start with a letter, 3-20 chars, only letters/digits/underscore");
}
// Email validation
if (email == null || email.isEmpty()) {
errors.put("email", "Email is required");
} else if (!EMAIL_PATTERN.matcher(email).matches()) {
errors.put("email", "Invalid email format");
}
// Password validation with specific feedback
if (password == null || password.isEmpty()) {
errors.put("password", "Password is required");
} else {
List<String> passwordIssues = new ArrayList<>();
if (password.length() < 8) passwordIssues.add("at least 8 characters");
if (password.length() > 20) passwordIssues.add("at most 20 characters");
if (!password.matches(".*[A-Z].*")) passwordIssues.add("an uppercase letter");
if (!password.matches(".*[a-z].*")) passwordIssues.add("a lowercase letter");
if (!password.matches(".*\\d.*")) passwordIssues.add("a digit");
if (!password.matches(".*[@#$%^&+=!].*")) passwordIssues.add("a special character (@#$%^&+=!)");
if (!passwordIssues.isEmpty()) {
errors.put("password", "Password needs: " + String.join(", ", passwordIssues));
}
}
// Phone validation
if (phone != null && !phone.isEmpty() && !PHONE_PATTERN.matcher(phone).matches()) {
errors.put("phone", "Invalid US phone format");
}
// Birth date validation
if (birthDate != null && !birthDate.isEmpty()) {
Matcher dm = DATE_PATTERN.matcher(birthDate);
if (!dm.matches()) {
errors.put("birthDate", "Invalid date format (use YYYY-MM-DD)");
} else {
int year = Integer.parseInt(dm.group("year"));
if (year > 2026 || year < 1900) {
errors.put("birthDate", "Year must be between 1900 and 2026");
}
}
}
return errors;
}
// =========================================================================
// Part 3: Data Masking -- Redact sensitive information from text
// =========================================================================
private static final Pattern SSN_IN_TEXT = Pattern.compile(
"\\b\\d{3}-\\d{2}-\\d{4}\\b"
);
private static final Pattern CC_IN_TEXT = Pattern.compile(
"\\b(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})[- ]?(\\d{4})\\b"
);
private static final Pattern EMAIL_IN_TEXT = Pattern.compile(
"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}"
);
public static String maskSensitiveData(String text) {
// Mask SSNs: 123-45-6789 -> ***-**-6789
String result = SSN_IN_TEXT.matcher(text).replaceAll(mr -> {
String ssn = mr.group();
return "***-**-" + ssn.substring(ssn.length() - 4);
});
// Mask credit cards: show only last 4 digits
Matcher ccMatcher = CC_IN_TEXT.matcher(result);
StringBuilder sb = new StringBuilder();
while (ccMatcher.find()) {
ccMatcher.appendReplacement(sb, "****-****-****-" + ccMatcher.group(4));
}
ccMatcher.appendTail(sb);
result = sb.toString();
// Mask emails: user@domain.com -> u***@domain.com
Matcher emailMatcher = EMAIL_IN_TEXT.matcher(result);
sb = new StringBuilder();
while (emailMatcher.find()) {
String email = emailMatcher.group();
int atIndex = email.indexOf('@');
String masked = email.charAt(0) + "***" + email.substring(atIndex);
emailMatcher.appendReplacement(sb, Matcher.quoteReplacement(masked));
}
emailMatcher.appendTail(sb);
return sb.toString();
}
// =========================================================================
// Main -- Run all demonstrations
// =========================================================================
public static void main(String[] args) {
// --- Part 1: Parse log entries ---
String[] logLines = {
"2026-02-28 14:30:00 [INFO] com.app.UserService - User login from 192.168.1.100",
"2026-02-28 14:30:05 [ERROR] com.app.PaymentService - Payment failed: ERR-5001 for IP 10.0.0.1",
"2026-02-28 14:30:10 [WARN] com.app.AuthService - Failed login attempt from 172.16.0.50",
"2026-02-28 14:30:15 [ERROR] com.app.OrderService - Order processing failed: HTTP-500 timeout",
"2026-02-28 14:30:20 [INFO] com.app.CacheService - Cache refreshed successfully",
"2026-02-28 14:30:25 [ERROR] com.app.DatabaseService - Connection lost: DB-1001 to 192.168.1.200"
};
parseLogEntries(logLines);
System.out.println();
System.out.println("========================================");
System.out.println();
// --- Part 2: Validate registration forms ---
System.out.println("=== REGISTRATION VALIDATION ===");
System.out.println();
// Test case 1: Valid registration
Map<String, String> errors1 = validateRegistration(
"john_doe", "john@example.com", "MyP@ss123", "(555) 123-4567", "1990-06-15"
);
System.out.println("Test 1 (valid): " + (errors1.isEmpty() ? "PASSED" : "FAILED: " + errors1));
// Test case 2: Multiple validation failures
Map<String, String> errors2 = validateRegistration(
"2bad", "not-an-email", "weak", "12345", "2026-13-45"
);
System.out.println("Test 2 (invalid):");
errors2.forEach((field, error) -> System.out.println(" " + field + ": " + error));
// Test case 3: Specific password feedback
Map<String, String> errors3 = validateRegistration(
"alice", "alice@test.com", "onlylowercase", null, null
);
System.out.println("Test 3 (weak password):");
errors3.forEach((field, error) -> System.out.println(" " + field + ": " + error));
System.out.println();
System.out.println("========================================");
System.out.println();
// --- Part 3: Mask sensitive data ---
System.out.println("=== DATA MASKING ===");
System.out.println();
String sensitiveText = "Customer SSN: 123-45-6789, CC: 4111-1111-1111-1111, " +
"Email: john.doe@gmail.com, Alt SSN: 987-65-4321";
System.out.println("Original: " + sensitiveText);
System.out.println("Masked: " + maskSensitiveData(sensitiveText));
}
}
=== LOG PARSER RESULTS ===

 IP found in log: 192.168.1.100 (from com.app.UserService)
 IP found in log: 10.0.0.1 (from com.app.PaymentService)
 Error code found: ERR-5001 (at 2026-02-28 14:30:05)
 IP found in log: 172.16.0.50 (from com.app.AuthService)
 Error code found: HTTP-500 (at 2026-02-28 14:30:15)
 IP found in log: 192.168.1.200 (from com.app.DatabaseService)
 Error code found: DB-1001 (at 2026-02-28 14:30:25)

Log Level Summary:
 INFO: 2
 ERROR: 3
 WARN: 1

Error Messages:
 2026-02-28 14:30:05 | com.app.PaymentService | Payment failed: ERR-5001 for IP 10.0.0.1
 2026-02-28 14:30:15 | com.app.OrderService | Order processing failed: HTTP-500 timeout
 2026-02-28 14:30:25 | com.app.DatabaseService | Connection lost: DB-1001 to 192.168.1.200

========================================

=== REGISTRATION VALIDATION ===

Test 1 (valid): PASSED
Test 2 (invalid):
 username: Must start with a letter, 3-20 chars, only letters/digits/underscore
 email: Invalid email format
 password: Password needs: at least 8 characters, an uppercase letter, a digit, a special character (@#$%^&+=!)
 phone: Invalid US phone format
 birthDate: Invalid date format (use YYYY-MM-DD)
Test 3 (weak password):
 password: Password needs: an uppercase letter, a digit, a special character (@#$%^&+=!)

========================================

=== DATA MASKING ===

Original: Customer SSN: 123-45-6789, CC: 4111-1111-1111-1111, Email: john.doe@gmail.com, Alt SSN: 987-65-4321
Masked: Customer SSN: ***-**-6789, CC: ****-****-****-1111, Email: j***@gmail.com, Alt SSN: ***-**-4321
| # | Concept | Where Used |
|---|---|---|
| 1 | Pattern compilation and reuse | static final Pattern fields throughout |
| 2 | Named capturing groups | LOG_PATTERN: (?<timestamp>...), (?<level>...), (?<class>...), (?<message>...) |
| 3 | find() with while loop | IP address and error code extraction from log messages |
| 4 | matches() for full-string validation | All validators: email, phone, username, password, date |
| 5 | Lookaheads for password rules | PASSWORD_PATTERN uses (?=.*[A-Z]), (?=.*\\d), etc. |
| 6 | Word boundaries | SSN_IN_TEXT, CC_IN_TEXT, ERROR_CODE_PATTERN use \\b |
| 7 | appendReplacement / appendTail | Credit card and email masking with custom replacement logic |
| 8 | replaceAll with Function (Java 9+) | SSN masking: replaceAll(mr -> ...) |
| 9 | Matcher.quoteReplacement() | Email masking: prevents $ and \ in replacement from being interpreted |
| 10 | Numbered capturing groups | CC_IN_TEXT: group(4) to get last 4 digits |
| 11 | Group extraction for further processing | Date validation: extracting year for range check |
| 12 | Multiple regex patterns working together | Log parser uses 3 patterns; validator uses 5 patterns; masker uses 3 patterns |
| 13 | Breaking complex validation into steps | Password validation gives specific feedback per rule instead of one giant regex |
| 14 | Null-safe validation | All validators check for null before applying regex |