How to Do Pattern Matching in VB.NET


Searching for a particular piece of text is one of the most common tasks in text processing, and it becomes harder as the text grows. Regular expressions are a way to reduce the time and complexity of such searches.
A regular expression, also called a regex, describes a pattern: the set of strings to be searched for.
HOW TO WRITE A REGULAR EXPRESSION
A regular expression is a string whose characters together describe the set of strings that the search should return.
EXAMPLE
Text : this sentence has more than 10 letters.
Regular expression: \d
Result : 1,0
In the example above, the text is a simple sentence and the regular expression is \d, which matches a single digit. The text contains 2 digits, so the output contains 2 matches (1 and 0).
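The example above can be reproduced in VB.NET with a short console program. This is only a minimal sketch: the module name DigitSearch and the variable names are illustrative, while Regex.Matches comes from the System.Text.RegularExpressions namespace.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module DigitSearch
    Sub Main()
        Dim source As String = "this sentence has more than 10 letters."

        ' VB.NET string literals have no backslash escapes, so "\d"
        ' reaches the regex engine unchanged.
        For Each m As Match In Regex.Matches(source, "\d")
            Console.WriteLine(m.Value)   ' prints 1, then 0
        Next
    End Sub
End Module
```

Regex.Matches returns every non-overlapping match, so each digit is reported as its own match.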

Regular expression   Description                                       Text                                    Output
\d                   Matches one digit                                 Hi, my roll number is 53.               5, 3
\D                   Matches one non-digit                             15 cars                                 space, c, a, r, s
\w                   Matches one word character (not a space)          15 cars                                 1, 5, c, a, r, s
\W                   Matches one non-word character                    15 cars                                 one space
.                    Matches any single character except a line break  15 cars                                 1, 5, space, c, a, r, s
\n                   Matches a new line                                This is line 1(new line)This is line 2  one new line character
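To check the rows of the table, the following sketch runs several of the patterns against the text "15 cars" and prints each match in square brackets so that the space is visible. The module name and the bracket formatting are assumptions made for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module CharacterClassDemo
    Sub Main()
        Dim source As String = "15 cars"
        Dim patterns As String() = {"\d", "\D", "\w", "\W", "."}

        For Each pattern As String In patterns
            ' Collect every match, wrapped in brackets so a space shows up as [ ].
            Dim found As New List(Of String)
            For Each m As Match In Regex.Matches(source, pattern)
                found.Add("[" & m.Value & "]")
            Next
            Console.WriteLine(pattern & " -> " & String.Join(" ", found.ToArray()))
        Next
    End Sub
End Module
```

With this input, \D reports the space as well as c, a, r and s, and \W reports only the space.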

A few quantifiers are used to match more than one character. Here is the list of quantifiers.

Quantifier   Meaning
+            One or more occurrences
*            Zero or more occurrences
?            Zero or one occurrence
{n}          Exactly n occurrences
{n,m}        Between n and m occurrences
{0,n}        Zero to n occurrences (the short form {,n} is not treated as a quantifier by the .NET regex engine)
{n,}         n or more occurrences

\b is not a quantifier but an anchor: it matches the position at the beginning or end of a word.
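The quantifiers can be compared side by side by applying them to \d. The sketch below uses a made-up input string, "1 22 333 4444", and an illustrative module name; the expected matches are listed in the comments.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module QuantifierDemo
    Sub Main()
        Dim source As String = "1 22 333 4444"

        ' \d+        -> 1, 22, 333, 4444
        ' \d{2}      -> 22, 33, 44, 44
        ' \d{2,3}    -> 22, 333, 444
        ' \d{3,}     -> 333, 4444
        ' \b\d{2}\b  -> 22
        Dim patterns As String() = {"\d+", "\d{2}", "\d{2,3}", "\d{3,}", "\b\d{2}\b"}

        For Each pattern As String In patterns
            Dim found As New List(Of String)
            For Each m As Match In Regex.Matches(source, pattern)
                found.Add(m.Value)
            Next
            Console.WriteLine(pattern & " -> " & String.Join(", ", found.ToArray()))
        Next
    End Sub
End Module
```

Quantifiers are greedy by default, which is why \d{2,3} takes 444 from 4444 and leaves a single 4 that cannot match on its own.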

A few examples of regular expressions
1) Find all the 3-digit numbers in the string “this sentence has numbers 1, 45,103, 53, 2456, 23”
Regular expression: \d{3}
Result: 103, 245
Explanation: The regular expression \d matches one digit, and the {3} quantifier makes it match exactly 3 consecutive digits. The first number, 1, is followed by a comma (a non-digit), so it is rejected. The next number, 45, has 2 consecutive digits, but the third character is a comma, so it is also rejected. The next number, 103, satisfies the condition, so it is accepted. The fourth number, 53, and the sixth, 23, are rejected. The number 2456 contains 3 consecutive digits, so its first three digits, 245, are accepted; the remaining 6 is not enough for another match. To match only numbers that have exactly 3 digits, surround the pattern with word boundaries: \b\d{3}\b.
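The difference between a run of 3 digits and a number with exactly 3 digits can be seen by running both patterns on the input from example 1. This is a sketch with an illustrative module name; the second pattern, \b\d{3}\b, adds word boundaries on both sides.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module ThreeDigitDemo
    Sub Main()
        Dim source As String = "this sentence has numbers 1, 45,103, 53, 2456, 23"

        ' Any 3 consecutive digits, even inside a longer number.
        For Each m As Match In Regex.Matches(source, "\d{3}")
            Console.WriteLine("run of three: " & m.Value)    ' 103, then 245
        Next

        ' Only numbers that consist of exactly 3 digits.
        For Each m As Match In Regex.Matches(source, "\b\d{3}\b")
            Console.WriteLine("exactly three: " & m.Value)   ' 103 only
        Next
    End Sub
End Module
```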

2) Find all words that end with at. Input string: “cat hat wet sit fat”
Regular expression: (\w)*at\b
Result: cat, hat, fat
Explanation: \w matches one word character and the * quantifier means zero or more, so (\w)* matches zero or more word characters. The \b placed after at means that at must come at the end of a word. So the regular expression matches every word that ends with at.
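Example 2 can be tried with a reusable Regex object instead of the shared Regex.Matches method, which is convenient when the same pattern is applied to several strings. The module and variable names below are illustrative.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordEndingDemo
    Sub Main()
        Dim source As String = "cat hat wet sit fat"

        ' Build the pattern once; the instance can be reused for other inputs.
        Dim wordRegex As New Regex("(\w)*at\b")

        For Each m As Match In wordRegex.Matches(source)
            Console.WriteLine(m.Value)   ' prints cat, hat, fat
        Next
    End Sub
End Module
```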

3) Find all words that begin with st. Input string: “horse stable, stars in sky, working staffs”
Regular expression: \bst(\w)*
Result: stable, stars, staffs
Explanation: \b is placed before st, so st must come at the beginning of a word. The * quantifier used with \w then matches all the word characters that follow st.
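Example 3 can also be written as a loop that steps from one match to the next with Regex.Match and Match.NextMatch, which avoids building the whole collection up front. The sketch below assumes a console program with an illustrative module name.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module WordStartDemo
    Sub Main()
        Dim source As String = "horse stable, stars in sky, working staffs"

        ' Regex.Match returns the first match; Success is False when none is left.
        Dim m As Match = Regex.Match(source, "\bst(\w)*")
        While m.Success
            Console.WriteLine(m.Value)   ' prints stable, stars, staffs
            m = m.NextMatch()
        End While
    End Sub
End Module
```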

