I often say that we must know the basics so that we have something to hold on to. Having spent so much time in the IT industry, I have realized that people struggle with writing even simple regex. In the age of ChatGPT, you can get a regex solution quickly, but if you do not understand it, you end up having to learn it from scratch anyway.
Regex is essential in programming languages and on Unix systems, and knowing it is an important part of computer science.
In this blog, I try to cover the most important regex concepts. Please read it slowly and practice each problem. This article will surely give you the confidence to be a better programmer and debugger (for example, while scanning logs).
Regex Problem – 1
These lines start with foo, end with bar, and have exactly one character in between.
Hint – . (dot) is a single wildcard: it matches any one character in a single position.
fooabar - include
fooxbar - include
baryfoo - exclude
foobar - exclude
fooxybar - exclude
foocbar - include
Solution
foo.bar
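To try this yourself, here is a minimal sketch in Python, assuming the standard `re` module. It uses `re.search`, which, like grep, looks for the pattern anywhere in the line; the `lines` list is just the sample data above.

```python
import re

# Sample lines from Problem 1
lines = ["fooabar", "fooxbar", "baryfoo", "foobar", "fooxybar", "foocbar"]

# "." matches exactly one character, so foo.bar needs one character between foo and bar
pattern = re.compile(r"foo.bar")

matched = [line for line in lines if pattern.search(line)]
print(matched)  # ['fooabar', 'fooxbar', 'foocbar']
```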
Regex Problem – 2
Question – include 1,3,4,6
These start with foo, end with bar, and have zero to three characters in between.
Hint – .* means zero or more occurrences of the wildcard, i.e. zero or more occurrences of any character.
foobar
barfoo
fooabcbar
foobxcbar
barcbyfoo
foozbar
barafoo
barabfoo
Solution
foo.*bar
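A quick Python check (a sketch using the standard `re` module) confirms which lines match. Note that .* itself sets no upper limit; it works here only because none of the excluded lines has foo followed later by bar.

```python
import re

lines = ["foobar", "barfoo", "fooabcbar", "foobxcbar",
         "barcbyfoo", "foozbar", "barafoo", "barabfoo"]

pattern = re.compile(r"foo.*bar")

# enumerate(..., start=1) gives the same 1-based line numbers the question uses
matched = [n for n, line in enumerate(lines, start=1) if pattern.search(line)]
print(matched)  # [1, 3, 4, 6]

# .* has no upper limit, so a longer gap also matches
print(bool(pattern.search("fooxxxxxxxbar")))  # True
```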
Regex Problem – 3
Question – include 2,5,6,7, i.e. the lines with zero or more whitespace characters between foo and bar.
Hint – \s represents a whitespace character, and \s* represents zero or more occurrences of whitespace.
fooxxxbar
foo bar
fooxbar
fooxxbar
foo bar
foo bar
foobar
fooyyybar
Solution
foo\s*bar
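The following Python sketch (standard `re` module) runs the pattern over the sample lines. The last two strings are extra inputs added for illustration, not part of the problem: they show that \s also matches a tab and that \s* accepts several spaces.

```python
import re

# Sample lines from Problem 3, as they appear above
lines = ["fooxxxbar", "foo bar", "fooxbar", "fooxxbar",
         "foo bar", "foo bar", "foobar", "fooyyybar"]

pattern = re.compile(r"foo\s*bar")
matched = [n for n, line in enumerate(lines, start=1) if pattern.search(line)]
print(matched)  # [2, 5, 6, 7]

# \s is not limited to a single space: tabs and runs of spaces also count
print(bool(pattern.search("foo\tbar")))   # True
print(bool(pattern.search("foo   bar")))  # True
```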
Regex Problem – 4
Question – include 1,3,6
There is no pattern unique to lines 1, 3 and 6: every line is one character followed by oo, so a wildcard such as .oo would match all of them. Instead, we can use a character class. A character class offers a choice of one character out of several; it is wrapped in [], and the characters written inside the brackets are the allowed choices for that single position.
For example, [abc] means that at that position we may have either a, b or c.
foo
moo
coo
doo
poo
loo
boo
hoo
Solution
[fcl]oo
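A small Python sketch (standard `re` module) makes the difference visible: the wildcard version picks up every word, while the character class keeps only lines 1, 3 and 6.

```python
import re

words = ["foo", "moo", "coo", "doo", "poo", "loo", "boo", "hoo"]

# A wildcard accepts any first letter, so it selects every word
wildcard = [n for n, w in enumerate(words, start=1) if re.search(r".oo", w)]
print(wildcard)  # [1, 2, 3, 4, 5, 6, 7, 8]

# The character class allows only f, c or l in the first position
char_class = [n for n, w in enumerate(words, start=1) if re.search(r"[fcl]oo", w)]
print(char_class)  # [1, 3, 6]
```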
Regex Problem – 5
foo
moo
coo
doo
poo
loo
boo
hoo
include 1,3,4,5,6,8
Solution
[fcdplh]oo
Regex Problem – 6
In Problem 5 we solved this with a character class, but here we can do it another way: put the characters we want to exclude inside the brackets and negate them with ^. For example, [^mb] matches any single character except m and b.
foo
moo
coo
doo
poo
loo
boo
hoo
include 1,3,4,5,6,8
Solution
[^mb]oo
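Here is a Python sketch (standard `re` module; `matching_lines` is just a hypothetical helper for this example) comparing the Problem 5 and Problem 6 answers. They select the same lines here, but they are not equivalent in general: a negated class accepts every character it does not list, so a word such as zoo, which is not in the sample, passes [^mb]oo but not [fcdplh]oo.

```python
import re

words = ["foo", "moo", "coo", "doo", "poo", "loo", "boo", "hoo"]

def matching_lines(pattern, items):
    """Return the 1-based positions of items that contain a match (hypothetical helper)."""
    return [n for n, item in enumerate(items, start=1) if re.search(pattern, item)]

listed = matching_lines(r"[fcdplh]oo", words)  # allow-list of first letters
negated = matching_lines(r"[^mb]oo", words)    # deny-list of first letters
print(listed, negated)  # [1, 3, 4, 5, 6, 8] [1, 3, 4, 5, 6, 8]

# The two classes differ on input outside the sample data
print(bool(re.search(r"[^mb]oo", "zoo")))     # True
print(bool(re.search(r"[fcdplh]oo", "zoo")))  # False
```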
Regex Problem – 7
joo
boo
koo
loo
woo
moo
zoo
coo
include 1,3,4,6
Two solutions (a hyphen inside a character class defines a range, so [j-m] is the same as [jklm])
[jklm]oo
[j-m]oo
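Both solutions can be checked side by side with a short Python sketch (standard `re` module); the range j-m expands to exactly j, k, l and m, so the results are identical.

```python
import re

words = ["joo", "boo", "koo", "loo", "woo", "moo", "zoo", "coo"]

explicit = [n for n, w in enumerate(words, start=1) if re.search(r"[jklm]oo", w)]
ranged = [n for n, w in enumerate(words, start=1) if re.search(r"[j-m]oo", w)]
print(explicit, ranged)  # [1, 3, 4, 6] [1, 3, 4, 6]
```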
Regex Problem – 8
joo
boo
Koo
Loo
woo
moo
zoo
coo
Zoo
include 1,3,4,6,7
Solution
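[j-mzK-L]oo
Explanation - here we have two ranges, j-m and K-L, plus the single character z. Be careful with wider ranges: m-z would also cover w and wrongly include woo. If we also need an outlier word, for example Zoo, we can add it to the class: [j-mzK-LZ]oo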
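Testing the classes in Python (a sketch with the standard `re` module; `matching_lines` is a hypothetical helper) shows why ranges need care here: m-z also covers w, so woo slips through, whereas listing z on its own selects exactly lines 1, 3, 4, 6 and 7.

```python
import re

words = ["joo", "boo", "Koo", "Loo", "woo", "moo", "zoo", "coo", "Zoo"]

def matching_lines(pattern, items):
    """Return the 1-based positions of items that contain a match (hypothetical helper)."""
    return [n for n, item in enumerate(items, start=1) if re.search(pattern, item)]

# m-z also covers w, so woo (line 5) slips in
print(matching_lines(r"[j-mm-zK-L]oo", words))  # [1, 3, 4, 5, 6, 7]

# Listing z on its own keeps w out
print(matching_lines(r"[j-mzK-L]oo", words))    # [1, 3, 4, 6, 7]

# Ranges are case-sensitive: Z is not inside m-z, so Zoo needs its own entry
print(matching_lines(r"[j-mzK-LZ]oo", words))   # [1, 3, 4, 6, 7, 9]
```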
Regex Problem – 9
Here we see repetitions of the character x, then a dot, then repetitions of the character y. Since the dot is a wildcard in regex, whenever our input contains a literal dot we need to escape it with a backslash, as in \.
xxx.yy
xx.yyyy
x.yy
xy
xxyy
yyxx
yx
yxxx
include 1,2,3
Solution
x.*\.y.*
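A Python sketch (standard `re` module; `matching_lines` is a hypothetical helper) shows what the backslash buys us: dropping it turns the dot back into a wildcard, and line 5 (xxyy) is then wrongly included.

```python
import re

lines = ["xxx.yy", "xx.yyyy", "x.yy", "xy", "xxyy", "yyxx", "yx", "yxxx"]

def matching_lines(pattern, items):
    """Return the 1-based positions of items that contain a match (hypothetical helper)."""
    return [n for n, item in enumerate(items, start=1) if re.search(pattern, item)]

print(matching_lines(r"x.*\.y.*", lines))  # [1, 2, 3]

# Without the backslash the dot is a wildcard again, so xxyy (line 5) also matches
print(matching_lines(r"x.*.y.*", lines))   # [1, 2, 3, 5]

# re.escape adds the backslashes for you when building a pattern from literal text
print(re.escape("x.y"))  # x\.y
```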
Regex Problem – 10
x#y
x:y
x.y
x&y
x%y
include 1,2,3
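Solution
x[\#\:\.]y
Inside square brackets, #, : and . have no special meaning, so the backslashes are optional and x[#:.]y works as well.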
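In Python's `re` module, both spellings of the class behave the same on this data, as this short sketch shows. Other tools may differ: in POSIX bracket expressions, as used by grep, a backslash inside [] is an ordinary character, so the unescaped form is the safer one there.

```python
import re

lines = ["x#y", "x:y", "x.y", "x&y", "x%y"]

escaped = [n for n, s in enumerate(lines, start=1) if re.search(r"x[\#\:\.]y", s)]
plain = [n for n, s in enumerate(lines, start=1) if re.search(r"x[#:.]y", s)]
print(escaped, plain)  # [1, 2, 3] [1, 2, 3]
```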
Regex Problem – 11
^ (caret) is a symbol that signifies the beginning of the line. Its interpretation differs inside and outside square brackets: as the first character inside square brackets, ^ stands for negation; outside them, it is a placeholder for the beginning of the line.
foo bar baz
bar foo baz
baz foo bar
bar baz foo
foo baz bar
baz bar foo
include 1,5
foo.* would be the wrong solution: every line here contains foo, so it would also match lines 2, 3, 4 and 6. We need something to represent the start of the line, and this is achieved with the caret symbol.
Solution
^foo.*
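When the lines live in one multi-line string, Python's `re` module needs the `re.MULTILINE` flag for ^ to mean "start of each line". A minimal sketch using the sample lines:

```python
import re

text = """foo bar baz
bar foo baz
baz foo bar
bar baz foo
foo baz bar
baz bar foo"""

# Unanchored: every line contains foo, so all six lines match
unanchored = [line for line in text.splitlines() if re.search(r"foo.*", line)]
print(len(unanchored))  # 6

# With re.MULTILINE, ^ matches at the start of every line in the string
print(re.findall(r"^foo.*", text, flags=re.MULTILINE))  # ['foo bar baz', 'foo baz bar']

# Without it, ^ matches only at the very start of the whole string
print(re.findall(r"^foo.*", text))  # ['foo bar baz']
```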
Regex Problem – 12
foo bar baz
bar foo baz
baz foo bar
bar baz foo
foo baz bar
baz bar foo
include 3,5
Explanation - lines 3 and 5 end with bar.
.*bar will not work: every line here contains bar, so it would also match line 1 and the others. We need something to represent the end of the line, which is done with $.
Solution
.*bar$
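This Python sketch (standard `re` module) shows the difference the $ anchor makes on the sample lines.

```python
import re

lines = ["foo bar baz", "bar foo baz", "baz foo bar",
         "bar baz foo", "foo baz bar", "baz bar foo"]

# Unanchored, .*bar finds a match in every line; in line 1 it stops at "foo bar"
print(re.search(r".*bar", lines[0]).group())  # foo bar
print(all(re.search(r".*bar", line) for line in lines))  # True

# $ forces bar to be the last thing on the line
anchored = [n for n, line in enumerate(lines, start=1) if re.search(r".*bar$", line)]
print(anchored)  # [3, 5]
```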
Regex Problem – 13
foo
foo bar
baz foo
foo bar baz
baz bar foo
include 1
Solution
^foo$
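In Python, wrapping a pattern in ^ and $ is close to calling `re.fullmatch`, as this sketch shows. One subtlety of Python's `re` module: $ also matches just before a newline at the end of the string, whereas `re.fullmatch` requires the pattern to cover the whole string, newline included.

```python
import re

lines = ["foo", "foo bar", "baz foo", "foo bar baz", "baz bar foo"]

anchored = [n for n, s in enumerate(lines, start=1) if re.search(r"^foo$", s)]
whole = [n for n, s in enumerate(lines, start=1) if re.fullmatch(r"foo", s)]
print(anchored, whole)  # [1] [1]

# Python-specific difference: $ also matches just before a trailing newline
print(bool(re.search(r"^foo$", "foo\n")))  # True
print(bool(re.fullmatch(r"foo", "foo\n")))  # False
```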
Regex Problem – 14
834
519
4874
5
89
45687
25
645
include 1,2,8
^[0-9][0-9][0-9]$ - This works here because we only need three digits, but if we needed more digits we would not want to repeat [0-9] many times.
Curly braces are used to express repetition.
a{m} -> represents exactly m repetitions of whatever immediately precedes the braces, i.e. a.
Solution
^[0-9]{3}$
Explanation
^ - marks the beginning of the line
[0-9] - a character class matching a single digit from 0 to 9
{3} - curly braces express repetition; here they require exactly three repetitions of the preceding digit class
$ - marks the end of the line
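A short Python sketch (standard `re` module; `matching_lines` is a hypothetical helper) confirms that {3} is shorthand for writing the class three times, and that only the count changes for longer numbers.

```python
import re

numbers = ["834", "519", "4874", "5", "89", "45687", "25", "645"]

def matching_lines(pattern, items):
    """Return the 1-based positions of items that contain a match (hypothetical helper)."""
    return [n for n, item in enumerate(items, start=1) if re.search(pattern, item)]

print(matching_lines(r"^[0-9]{3}$", numbers))         # [1, 2, 8]
print(matching_lines(r"^[0-9][0-9][0-9]$", numbers))  # [1, 2, 8]

# Changing one number is all it takes to select the five-digit line instead
print(matching_lines(r"^[0-9]{5}$", numbers))         # [6]
```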
Regex Problem – 15
lion
tiger
leopard
fox
kangaroo
bat
mouse
cuckoo
deer
include 1,2,7,8,9
Pattern – words that are 4, 5 or 6 characters long.
a{m,n} – represents at least m and at most n repetitions of whatever immediately precedes the braces, i.e. a.
Solution
^[a-z]{4,6}$
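Running the pattern over the animal names in Python (a sketch with the standard `re` module) selects lines 1, 2, 7, 8 and 9. The anchors matter: without them, {4,6} only requires some run of 4 to 6 letters, which every longer word also contains.

```python
import re

animals = ["lion", "tiger", "leopard", "fox", "kangaroo",
           "bat", "mouse", "cuckoo", "deer"]

pattern = re.compile(r"^[a-z]{4,6}$")
matched = [n for n, a in enumerate(animals, start=1) if pattern.search(a)]
print(matched)  # [1, 2, 7, 8, 9]

# Without ^ and $, a longer word such as leopard also contains a 4-6 letter run
print(bool(re.search(r"[a-z]{4,6}", "leopard")))  # True
```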
Regex Problem – 16
ha
hahahahaha
hahaha
hahahaha
haha
hahahahahaha
hahahahahahahaha
hahahahahahahahaha
include 2,4,6,7,8 – the word ha repeats at least 4 times and at most 9 times
Wrong answer – ^ha{4,9}$. This is wrong because the quantifier applies only to the a immediately before it, so it means h followed by 4 to 9 repetitions of a (for example haaaa). Our requirement is to repeat ha, which is two characters long, so we need something to group the characters into a single entity.
Parentheses are used here to group characters into a single entity.
Correct Solution
^(ha){4,9}$
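A Python sketch (standard `re` module) comparing the grouped and ungrouped patterns. The sample lines are rebuilt from their repeat counts so that none of them is miscounted.

```python
import re

# Same strings as the sample lines above: "ha" repeated 1, 5, 3, 4, 2, 6, 8 and 9 times
lines = ["ha" * k for k in (1, 5, 3, 4, 2, 6, 8, 9)]

grouped = re.compile(r"^(ha){4,9}$")  # repeat the two-character unit "ha"
ungrouped = re.compile(r"^ha{4,9}$")  # repeat only the "a"

print([n for n, s in enumerate(lines, start=1) if grouped.search(s)])  # [2, 4, 6, 7, 8]

# Without parentheses the quantifier applies only to "a"
print(bool(ungrouped.search("hahahaha")))  # False
print(bool(ungrouped.search("haaaa")))     # True
```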
Regex Problem – 17
ha
haha
hahahahaha
hahahaha
hahaha
hahahahahahaha
hahahahahaha
include 1,2
Solution
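^(ha){,2}$
Note – omitting the lower bound, as in {,2}, means zero to two repetitions, so this pattern would also match an empty line. Not every regex flavor accepts {,2} (JavaScript, for example, does not treat it as a quantifier), so ^(ha){1,2}$ is the more portable choice.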
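Python's `re` module does accept {,2} and reads it as {0,2}. This sketch compares it with the explicit {1,2} form on the sample lines, rebuilt from their repeat counts, and on an empty string.

```python
import re

# Same strings as the sample lines above: "ha" repeated 1, 2, 5, 4, 3, 7 and 6 times
lines = ["ha" * k for k in (1, 2, 5, 4, 3, 7, 6)]

upto_two = re.compile(r"^(ha){,2}$")    # Python reads {,2} as {0,2}
one_or_two = re.compile(r"^(ha){1,2}$")

print([n for n, s in enumerate(lines, start=1) if upto_two.search(s)])    # [1, 2]
print([n for n, s in enumerate(lines, start=1) if one_or_two.search(s)])  # [1, 2]

# They differ only on an empty line, which {,2} accepts as zero repetitions
print(bool(upto_two.search("")), bool(one_or_two.search("")))  # True False
```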