Wednesday, 28 October 2015

Regular Expressions

A regular expression represents a group of strings that follow a particular pattern. Regular expressions are useful in validation frameworks, pattern-matching applications (Ctrl+F, grep) and communication protocols.

In Java, regular expressions are implemented by the Pattern and Matcher classes of the java.util.regex package.
A Pattern object is the compiled Java representation of a regular expression. It is created with the static compile() method: Pattern.compile("regularExpression");
A Matcher object matches the pattern against a given target string. It is created with the matcher() method of the Pattern class.

Methods in Matcher class:
  • find() - attempts to find the next match; returns true if one is found, otherwise false
  • start() - start index of the current match
  • end() - end index + 1 of the current match (the index just after the last matched character)
  • group() - returns the matched text

Example
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegExDemo1 {
  public static void main(String args[]) {
    int count = 0;
    // Pattern.compile() is a static factory method: a static method that returns an object of its own class.
    Pattern p = Pattern.compile("dv1");
    Matcher m = p.matcher("dv1dv113dv1"); // target string
    while (m.find()) {
      count++;
      System.out.println(m.start() + "   " + m.end() + "    " + m.group());
    }
    System.out.println("The Number of Occurrences: " + count);
  }
}

Character Classes:
[abc] -> either a or b or c
[^abc] -> any character except a, b and c
[a-z] -> any lower case letter
[A-Z] -> any upper case letter
[a-zA-Z] -> any letter
[0-9] -> any digit from 0 to 9
[a-zA-Z0-9] -> any alphanumeric character
[^a-zA-Z0-9] -> any character except an alphanumeric one (i.e. any special character)
Note: a space inside the brackets is a literal character, so [a-z A-Z] would also match a space.

 eg:
   Pattern p = Pattern.compile("[^a-zA-Z0-9]");
   Matcher m = p.matcher("dv1$dv1*13dv1");
   while (m.find())
       System.out.println(m.start() + "    " + m.group());
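For reference, here is the snippet above turned into a complete program. This is a sketch; the class name SpecialCharFinder and its findSpecial() helper are hypothetical. Collecting the matches in a list instead of printing them directly makes the result easy to check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecialCharFinder {
    // [^a-zA-Z0-9] matches one character that is not a letter or a digit.
    private static final Pattern SPECIAL = Pattern.compile("[^a-zA-Z0-9]");

    // Returns "index char" for every special character in the target string.
    static List<String> findSpecial(String target) {
        List<String> result = new ArrayList<>();
        Matcher m = SPECIAL.matcher(target);
        while (m.find()) {
            result.add(m.start() + " " + m.group());
        }
        return result;
    }

    public static void main(String[] args) {
        for (String s : findSpecial("dv1$dv1*13dv1")) {
            System.out.println(s); // prints "3 $" then "7 *"
        }
    }
}
```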

Pre-Defined Character Classes: 
\s   -> whitespace character (space, tab, newline, ...)
\S   -> any character except whitespace
\d   -> any digit from 0 to 9
\D   -> any character except a digit
\w   -> any word character [a-zA-Z_0-9] (letters, digits and underscore)
\W   -> any character except a word character
.    -> any character, including special characters (by default, line terminators are excluded)

   Pattern p = Pattern.compile("\\s"); // "\\s" in Java source is the regex \s: the backslash is doubled because \ is the escape character in Java string literals.
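A small program makes the predefined classes concrete. It is a sketch: the class PredefinedClassDemo, its countMatches() helper and the sample string are made up for illustration. Note that \w counts the underscore as a word character.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PredefinedClassDemo {
    // Counts how many times the regex is found in the target string.
    static int countMatches(String regex, String target) {
        Matcher m = Pattern.compile(regex).matcher(target);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String target = "a7b #k@9_z"; // 10 characters
        System.out.println("\\d -> " + countMatches("\\d", target)); // 2 (7 and 9)
        System.out.println("\\s -> " + countMatches("\\s", target)); // 1 (the space)
        System.out.println("\\w -> " + countMatches("\\w", target)); // 7 (includes _)
        System.out.println("\\W -> " + countMatches("\\W", target)); // 3 (space, # and @)
        System.out.println(".  -> " + countMatches(".", target));    // 10
    }
}
```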

Quantifiers: specify the number of occurrences to match

  a -> exactly one a
  a+ -> at least one a
  a* -> 0 or more a's
  a? -> at most one a (0 or 1)
  a{n} -> exactly n a's
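The quantifiers can be compared by matching each one against a few targets. This sketch (class QuantifierDemo and both helpers are hypothetical) uses Pattern.matches(), which is true only when the whole target matches, and also shows how find() picks up each run of a's with a+.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only when the WHOLE target matches the regex.
    static boolean matchesWhole(String regex, String target) {
        return Pattern.matches(regex, target);
    }

    // Every substring found by repeated find() calls.
    static List<String> findAll(String regex, String target) {
        List<String> found = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(target);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String[] regexes = {"a", "a+", "a*", "a?"};
        String[] targets = {"", "a", "aaa"};
        for (String regex : regexes) {
            for (String target : targets) {
                System.out.println(regex + " matches \"" + target + "\": "
                        + matchesWhole(regex, target));
            }
        }
        // a+ is greedy, so each run of a's becomes one match
        System.out.println(findAll("a+", "abaabaaab")); // [a, aa, aaa]
    }
}
```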


split() -> splits the given target string around matches of the given pattern


import java.util.regex.Pattern;

class RegExDemo2 {
  public static void main(String args[]) {
    Pattern p = Pattern.compile("\\s");
    String[] str = p.split("dv1 dv113 dv1");
    for (String s1 : str) {
      System.out.println(s1);
    }
  }
}

   Pattern p = Pattern.compile("\\."); // or "[.]"; an unescaped . would match any character
   String[] str = p.split("venkatdesu.blogspot.com"); // {"venkatdesu", "blogspot", "com"}


String Class - split() method

   String str = "dv1 dv113 dv1";
   String[] s = str.split("\\s");
   for (String s1 : s) System.out.println(s1);

Note: Pattern.split() takes the target string as its argument, whereas String.split() takes the regular expression as its argument.
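The two split() methods give the same pieces; only the roles of receiver and argument swap. The sketch below (class SplitDemo and its helpers are hypothetical) also shows why the dot must be escaped: an unescaped "." matches every character, every piece is empty, and split() drops trailing empty strings, so the result is an empty array.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class SplitDemo {
    // Pattern.split(): the regex is compiled first, the target is the argument.
    static String[] splitWithPattern(String regex, String target) {
        return Pattern.compile(regex).split(target);
    }

    // String.split(): the target is the receiver, the regex is the argument.
    static String[] splitWithString(String regex, String target) {
        return target.split(regex);
    }

    public static void main(String[] args) {
        String target = "venkatdesu.blogspot.com";
        System.out.println(Arrays.toString(splitWithPattern("\\.", target))); // [venkatdesu, blogspot, com]
        System.out.println(Arrays.toString(splitWithString("\\.", target)));  // [venkatdesu, blogspot, com]
        // Unescaped "." matches every character: all pieces are empty.
        System.out.println(Arrays.toString(splitWithString(".", target)));    // []
    }
}
```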

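StringTokenizer - a legacy class designed for tokenization, present in the java.util package. It splits on a set of delimiter characters, not on a regular expression.
   StringTokenizer st = new StringTokenizer("dv1 dv113 dv1");
   while (st.hasMoreTokens())
          System.out.println(st.nextToken());

 Note - the default delimiters for StringTokenizer are the whitespace characters " \t\n\r\f". A second argument supplies a different set of delimiter characters:
   StringTokenizer st = new StringTokenizer("dv1 dv113 dv1", "1");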
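The effect of the "1" delimiter is easiest to see by printing the tokens. In this sketch (class TokenizerDemo and its methods are hypothetical), the "11" inside "dv113" produces no empty token, because StringTokenizer skips adjacent delimiters, while String.split("1") keeps the empty piece between them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

class TokenizerDemo {
    // Default delimiters: the whitespace characters " \t\n\r\f".
    static List<String> tokens(String target) {
        return collect(new StringTokenizer(target));
    }

    // Every character of 'delimiters' is a delimiter; it is NOT a regex.
    static List<String> tokens(String target, String delimiters) {
        return collect(new StringTokenizer(target, delimiters));
    }

    private static List<String> collect(StringTokenizer st) {
        List<String> result = new ArrayList<>();
        while (st.hasMoreTokens()) {
            result.add(st.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String target = "dv1 dv113 dv1";
        System.out.println(tokens(target));      // [dv1, dv113, dv1]
        System.out.println(tokens(target, "1")); // [dv,  dv, 3 dv]
        // String.split("1") keeps the empty string between the two 1s.
        System.out.println(Arrays.toString(target.split("1"))); // [dv,  dv, , 3 dv]
    }
}
```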
  
Regular Expression to represent a 10 digit mobile number
  • Every number should contain exactly 10 digits
  • The first digit should be 7, 8 or 9
    • [7-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]
    • [7-9][0-9]{9}
    • [789][0-9]{9}
  • If the number has 11 digits, the first digit should be 0. Regex for a 10 or 11 digit number:
    • 0?[7-9][0-9]{9}
  • If the number has 12 digits, the first two digits should be 91. Regex for a 10, 11 or 12 digit number:
    • (0|91)?[7-9][0-9]{9}
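The final mobile-number regex can be checked against a few sample numbers. This sketch (class MobileRegexDemo, isValidMobile() and the sample numbers are hypothetical) uses Matcher.matches(), which succeeds only when the entire string matches; that is the same condition as calling find() and then checking that group() equals the whole input.

```java
import java.util.regex.Pattern;

class MobileRegexDemo {
    // 10 digits starting with 7-9, optionally prefixed by 0 or 91.
    private static final Pattern MOBILE = Pattern.compile("(0|91)?[7-9][0-9]{9}");

    // matches() succeeds only if the entire string matches the pattern.
    static boolean isValidMobile(String number) {
        return MOBILE.matcher(number).matches();
    }

    public static void main(String[] args) {
        String[] samples = {"9876543210", "09876543210", "919876543210",
                            "6876543210", "987654321", "19876543210"};
        for (String s : samples) {
            // true, true, true, false, false, false
            System.out.println(s + " -> " + isValidMobile(s));
        }
    }
}
```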
Regular Expression to represent valid mail ids
  • [a-zA-Z0-9][a-zA-Z0-9_.]*@[a-zA-Z0-9]+([.][a-zA-Z]+)+
 Regular Expression to represent gmail ids
  • [a-zA-Z0-9][a-zA-Z0-9_.]*@gmail[.]com
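Both mail-id expressions can be tried on a handful of ids. This is a sketch: the class MailRegexDemo, its helpers and the sample ids are made up. The first character must be alphanumeric, and the domain needs at least one "." part, so "_abc@gmail.com" and "abc@gmail" are rejected.

```java
import java.util.regex.Pattern;

class MailRegexDemo {
    static final Pattern MAIL =
            Pattern.compile("[a-zA-Z0-9][a-zA-Z0-9_.]*@[a-zA-Z0-9]+([.][a-zA-Z]+)+");
    static final Pattern GMAIL =
            Pattern.compile("[a-zA-Z0-9][a-zA-Z0-9_.]*@gmail[.]com");

    // Whole-string checks against each pattern.
    static boolean isMail(String id) {
        return MAIL.matcher(id).matches();
    }

    static boolean isGmail(String id) {
        return GMAIL.matcher(id).matches();
    }

    public static void main(String[] args) {
        String[] ids = {"dv1.desu@gmail.com", "desu_1@yahoo.co.in",
                        "_abc@gmail.com", "abc@gmail"};
        for (String id : ids) {
            System.out.println(id + " mail=" + isMail(id) + " gmail=" + isGmail(id));
        }
    }
}
```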
     
Regular Expression to represent identifiers with the following rules:
1) Allowed characters are alphanumeric, # and $
2) Length should be at least 2
3) The first character should be a lower case letter between a and k
4) The second character should be a digit divisible by 3

  • [a-k][0369][a-zA-Z0-9#$]*
 Regular Expression to represent all names that start with A or a
  • [aA][a-zA-Z]*
 Regular Expression to represent all names that end with L or l
  • [a-zA-Z]*[lL]
 Regular Expression to represent all names that start with A or a AND end with L or l
  • [aA][a-zA-Z]*[lL]
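The identifier and name expressions above can be checked with Pattern.matches(). In this sketch the class IdentifierAndNameDemo, its constants and the sample inputs are hypothetical; the regexes are the ones listed above.

```java
import java.util.regex.Pattern;

class IdentifierAndNameDemo {
    static final String IDENTIFIER = "[a-k][0369][a-zA-Z0-9#$]*";
    static final String STARTS_WITH_A = "[aA][a-zA-Z]*";
    static final String ENDS_WITH_L = "[a-zA-Z]*[lL]";
    static final String A_TO_L = "[aA][a-zA-Z]*[lL]";

    // True only when the whole input matches the regex.
    static boolean check(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        for (String id : new String[]{"a3", "k9#$x", "l3", "a4", "a", "b0@"}) {
            System.out.println("identifier " + id + " -> " + check(IDENTIFIER, id));
        }
        for (String name : new String[]{"Anil", "Kamal", "Ajay", "Al"}) {
            System.out.println(name + ": startsA=" + check(STARTS_WITH_A, name)
                    + " endsL=" + check(ENDS_WITH_L, name)
                    + " both=" + check(A_TO_L, name));
        }
    }
}
```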

 Program to check a valid mobile number.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CheckValidMobile {
  public static void main(String args[]) {
    Pattern p = Pattern.compile("(0|91)?[7-9][0-9]{9}");
    Matcher m = p.matcher(args[0]); // target string passed on the command line
    // valid only if the match found covers the whole argument
    if (m.find() && m.group().equals(args[0])) {
      System.out.println("Valid Mobile Number");
    } else {
      System.out.println("Invalid Mobile Number");
    }
  }
}

Program to extract valid mail ids in a text file

import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ExtractMailIds {
  public static void main(String args[]) throws IOException {
    Pattern p = Pattern.compile("[a-zA-Z0-9][a-zA-Z0-9_.]*@[a-zA-Z0-9]+([.][a-zA-Z]+)+");

    PrintWriter pw = new PrintWriter("output.txt");
    BufferedReader br = new BufferedReader(new FileReader("input.txt"));
    String line = br.readLine();

    while (line != null) {
      Matcher m = p.matcher(line); // target string: the current line
      while (m.find()) {
        pw.println(m.group()); // one mail id per output line
      }
      line = br.readLine();
    }
    pw.flush();
    pw.close();
    br.close();
  }
}

Program to extract valid mail ids or mobile numbers in a text file
Use the same program with the two expressions joined by | in the form (mail|mobile):
([a-zA-Z0-9][a-zA-Z0-9_.]*@[a-zA-Z0-9]+([.][a-zA-Z]+)+)|([789][0-9]{9})
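A minimal sketch of the combined pattern, reading from a String instead of input.txt so it runs on its own; the class MailOrMobileExtractor, its extract() method and the sample line are hypothetical. The same find() loop can be dropped into the file-reading program.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MailOrMobileExtractor {
    // (mail)|(mobile): find() reports whichever alternative matches first.
    static final Pattern MAIL_OR_MOBILE = Pattern.compile(
            "([a-zA-Z0-9][a-zA-Z0-9_.]*@[a-zA-Z0-9]+([.][a-zA-Z]+)+)|([789][0-9]{9})");

    // Returns every mail id or mobile number found in one line of text.
    static List<String> extract(String line) {
        List<String> found = new ArrayList<>();
        Matcher m = MAIL_OR_MOBILE.matcher(line);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String line = "Mail dv1.desu@gmail.com or call 9876543210 today";
        System.out.println(extract(line)); // [dv1.desu@gmail.com, 9876543210]
    }
}
```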

Program to print all .txt files in a directory 

import java.io.File;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FileNames {
  public static void main(String args[]) throws Exception {
    int count = 0;
    Pattern p = Pattern.compile("[a-zA-Z0-9][a-zA-Z0-9_$.]*[.]txt");
    File f = new File("d:\\temp");
    String[] files = f.list(); // null if d:\temp is not a readable directory
    if (files == null) {
      System.out.println("d:\\temp is not a readable directory");
      return;
    }
    for (String f1 : files) {
      Matcher m = p.matcher(f1);
      if (m.find() && m.group().equals(f1)) {
        count++;
        System.out.println(f1);
      }
    }
    System.out.println("Total file count is: " + count);
  }
}
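The filtering step can be separated from the directory listing, which makes it easy to test on plain file names. This is a sketch under assumptions: the class TxtFileFilter and its txtFiles() helper are hypothetical, and main() takes the directory from the command line (defaulting to the current one) instead of the hard-coded d:\temp.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class TxtFileFilter {
    static final Pattern TXT = Pattern.compile("[a-zA-Z0-9][a-zA-Z0-9_$.]*[.]txt");

    // Keeps only the names that match the whole pattern.
    static List<String> txtFiles(String[] names) {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            if (TXT.matcher(name).matches()) {
                result.add(name);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        File dir = new File(args.length > 0 ? args[0] : ".");
        String[] names = dir.list(); // null when dir is not a readable directory
        if (names == null) {
            System.out.println(dir + " is not a readable directory");
            return;
        }
        List<String> txt = txtFiles(names);
        txt.forEach(System.out::println);
        System.out.println("Total file count is: " + txt.size());
    }
}
```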

