Tag Archives: regular expression

20 Mar

Working with EditPlus Text Editor-Regular Expression How To

EditPlus is a lot better than the regular text editor, Notepad.
Of all its features I like the RegExp support the most, and then comes the block select feature.
Here are quick recipes for carrying out routine tasks using regular expressions in EditPlus. It’s a kind of downloadable cheat sheet.

Remove all empty lines:

Find: “^\n” (Ignore double-quotes in all find/replace)
Replace: “”
Where,
^ – Beginning of the line
\n – New Line
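
For readers who want to try this recipe outside the editor, here is a minimal sketch in Python. It assumes Python's `re` module as a stand-in for the EditPlus engine (the two flavors are similar but not identical), and the sample `text` is made up for illustration.

```python
import re

# Hypothetical sample text containing blank lines.
text = "first\n\nsecond\n\n\nthird\n"

# re.MULTILINE lets ^ match at the start of every line, so "^\n"
# matches each line that consists of nothing but a line break.
cleaned = re.sub(r"^\n", "", text, flags=re.MULTILINE)

print(repr(cleaned))
```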

Convert multiple spaces into a single space:

Find: “ +”
Replace: “ ”
Where,
+ – finds one or more occurrences of the space character.
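
The effect of the space-collapsing pattern can be checked with a short Python sketch. The sample string is an assumption for illustration, and Python's `re.sub` is used here in place of the EditPlus Replace dialog.

```python
import re

# Hypothetical input with runs of several spaces.
text = "too    many   spaces  here"

# " +" matches one or more consecutive spaces; each run becomes one space.
single_spaced = re.sub(r" +", " ", text)

print(single_spaced)
```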

Comment out multiple lines of code:

Find: “^”
Replace: “#” or “//”
You may optionally use: Edit Menu > Format > Line Comment.

Generate a comma-separated list from a newline-delimited list:

Find: “\n”
Replace: “, “
This also helps when joining lines of code: instead of replacing the line break with a comma, replace it with “” (nothing).

Manipulate columns display order / punctuation:

Find: “([0-9]+)\t([a-zA-Z]+)”
Replace: “\2\t\1”
Where,
[0-9]+ – Finds one or more digits
[a-zA-Z]+ – Finds one or more letters
() – marks the block (captures the group)
\1, \2 – the 1st and 2nd marked expressions
Eg (columns separated by a tab):
123 abc
345 cde
567 efg
Becomes:
abc 123
cde 345
efg 567
The Other Way:
– Press Alt+C
– Drag your mouse to select the respective column and click
– Copy / Cut as required
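
The column swap can also be reproduced in a script. The following Python sketch is an illustration under the assumption that the two columns are separated by a tab, as the `\t` in the pattern implies; the variable names are hypothetical.

```python
import re

# The example rows from above, with a tab between the columns.
rows = "123\tabc\n345\tcde\n567\tefg"

# Group 1 captures the digits, group 2 the letters;
# the replacement writes them back in the opposite order.
swapped = re.sub(r"([0-9]+)\t([a-zA-Z]+)", r"\2\t\1", rows)

print(swapped)
```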

Append / Add semicolon (any character) at the end of the line:

Find: “\n”
Replace: “;\n”

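Enclose lines in quotes:

Find: “\n”
Replace: “'\n'”
Note: this puts a closing quote at the end of each line and an opening quote at the start of the next one, so the very first and very last lines need to be fixed by hand.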
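
A small Python sketch shows what the line-break replacement produces, next to a variant that wraps each whole line. The variant is not part of the recipe above; it is an alternative offered for comparison, and the sample data is made up.

```python
import re

lines = "alpha\nbeta\ngamma"

# Recipe as written: every line break becomes quote + break + quote,
# which leaves the outer ends of the text without quotes.
by_recipe = re.sub(r"\n", "'\n'", lines)

# Variant (an assumption, not the original recipe): capture each whole
# line and wrap it, so the first and last lines are quoted as well.
wrapped = re.sub(r"^(.*)$", r"'\1'", lines, flags=re.MULTILINE)

print(by_recipe)
print(wrapped)
```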

Delete all lines containing a given STRING:

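Find: “^.*STRING.*$”
Replace: “”
This empties the matching lines; run “Remove all empty lines” afterwards to drop the blank lines that are left behind.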

Remove lines not containing a given STRING:

I don’t know how to do this!! 🙂
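
EditPlus aside, the inverse filter is easy to sketch in a scripting language. The Python below is an illustration only, not an EditPlus recipe: the function name `keep_lines_containing` and the sample data are hypothetical, and the negative lookahead `(?!...)` used in the second variant is Python `re` syntax that does not appear in the EditPlus expression list in this post.

```python
import re


def keep_lines_containing(text, needle):
    """Return only the lines of `text` that contain `needle` (substring test)."""
    return "".join(
        line for line in text.splitlines(keepends=True) if needle in line
    )


sample = "has STRING a\ndrop b\nSTRING c\n"

# Variant 1: plain filtering, no regular expression needed.
kept = keep_lines_containing(sample, "STRING")

# Variant 2: delete every line that does NOT contain STRING.
# (?!.*STRING) is a negative lookahead: it succeeds only when STRING
# does not occur anywhere on the rest of the current line.
kept_regex = re.sub(r"^(?!.*STRING).*\n?", "", sample, flags=re.MULTILINE)

print(kept)
```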

Convert tab separated file into insert statements:

TSV: abcd de4 iirn 34399
SQL: INSERT INTO TABLENAME VALUES ("abcd", "de4", "iirn", "34399");
Find: “(.*)\t(.*)\t(.*)\t(.*)”
Replace: “INSERT INTO TABLENAME VALUES ("\1", "\2", "\3", "\4");”
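
To see the four captured groups at work, the same conversion can be sketched in Python. This assumes exactly four tab-separated fields per line, as in the example row; `TABLENAME` is the placeholder from the recipe, and no escaping of quotes inside the values is attempted.

```python
import re

# The example row from above, with tabs between the fields.
tsv = "abcd\tde4\tiirn\t34399"

# Each (.*) captures one field; \1 to \4 put them back inside the statement.
sql = re.sub(
    r"(.*)\t(.*)\t(.*)\t(.*)",
    r'INSERT INTO TABLENAME VALUES ("\1", "\2", "\3", "\4");',
    tsv,
)

print(sql)
```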

Format the telephone number:

Find: “([0-9][0-9][0-9])([0-9][0-9][0-9])([0-9].*)”
Replace: “\1-\2-\3”
Eg.:

Original: 1231231231
Formatted-1: 123-123-1231
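
The grouping can be verified with a few lines of Python. This is a sketch using the sample number from above; it assumes the input is a plain run of at least seven digits.

```python
import re

original = "1231231231"

# Three digits, three digits, then the remainder, each in its own group.
formatted = re.sub(
    r"([0-9][0-9][0-9])([0-9][0-9][0-9])([0-9].*)",
    r"\1-\2-\3",
    original,
)

print(formatted)
```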

Remove Brackets:

Find: “\(|\)”
Replace: “”
Where,
\( – Matches a literal (. The backslash is required so that ( is not treated as the start of a marked expression.
| – or

Replace the 1st occurrence of a character (here, the first space on each line):

Find: “ (.*)”
Replace: “-\1”
Where,
(.*) – matches everything up to the end of the line and marks the block
** Make sure you ignore the double-quotes (“ ”) while writing in the find / replace boxes.
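
Why only the first space on each line changes is easier to see in a runnable form. In the Python sketch below (sample text assumed), the match starts at the first space and `(.*)` then consumes the rest of the line, so no later space on that line can match again.

```python
import re

text = "one two three\nfour five six"

# The first space on a line starts the match; (.*) swallows the rest of
# the line and is written back unchanged after the hyphen.
first_replaced = re.sub(r" (.*)", r"-\1", text)

print(first_replaced)
```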

EditPlus supports the following regular expressions in the Find, Replace and Find in Files commands.

Expression – Description
  • \t – Tab character.
  • \n – New line.
  • . – Matches any character.
  • | – Either the expression on its left or the one on its right matches the target string.
  • [] – Any of the enclosed characters may match the target character.
  • [^] – None of the enclosed characters may match the target character.
  • * – Character to the left of asterisk in the expression should match 0 or more times.
  • + – Character to the left of plus sign in the expression should match 1 or more times.
  • ? – Character to the left of question mark in the expression should match 0 or 1 time.
  • ^ – Expression to the right of ^ matches only when it is at the beginning of line.
  • $ – Expression to the left of $ matches only when it is at the end of line.
  • () – Affects evaluation order of expression and also used for tagged expression.
  • \ – Escape character. If you want to use character “\” itself, you should use “\\”.

Notable features of EditPlus are:

  • Spell checking
  • Regex-based find & replace
  • Encoding conversion
  • Newline conversion
  • Syntax highlighting
  • Multiple undo/redo
  • Rectangular block selection
  • Auto indentation
  • Code folding (Text folding)

Download pdf: Editplus-RegExp.

04 Mar

Regular Expression Basics – Quick Reference

Characters

Character – Description – Example
  • Any character except [\^$.|?*+() – All characters except the listed special characters match a single instance of themselves. { and } are literal characters, unless they’re part of a valid regular expression token (e.g. the {n} quantifier). Example: a matches a
  • \ (backslash) followed by any of [\^$.|?*+(){} – A backslash escapes special characters to suppress their special meaning. Example: \+ matches +
  • \Q…\E – Matches the characters between \Q and \E literally, suppressing the meaning of special characters. Example: \Q+-*/\E matches +-*/
  • \xFF where FF are 2 hexadecimal digits – Matches the character with the specified ASCII/ANSI value, which depends on the code page used. Can be used in character classes. Example: \xA9 matches © when using the Latin-1 code page.
  • \n, \r and \t – Match an LF character, CR character and a tab character respectively. Can be used in character classes. Example: \r\n matches a DOS/Windows CRLF line break.
  • \a, \e, \f and \v – Match a bell character (\x07), escape character (\x1B), form feed (\x0C) and vertical tab (\x0B) respectively. Can be used in character classes.
  • \cA through \cZ – Match an ASCII character Control+A through Control+Z, equivalent to \x01 through \x1A. Can be used in character classes. Example: \cM\cJ matches a DOS/Windows CRLF line break.

Character Classes or Character Sets [abc]

Character – Description – Example
  • [ (opening square bracket) – Starts a character class. A character class matches a single character out of all the possibilities offered by the character class. Inside a character class, different rules apply. The rules in this section are only valid inside character classes. The rules outside this section are not valid in character classes, except \n, \r, \t and \xFF.
  • Any character except ^-]\ – All characters except the listed special characters add that character to the possible matches for the character class. Example: [abc] matches a, b or c
  • \ (backslash) followed by any of ^-]\ – A backslash escapes special characters to suppress their special meaning. Example: [\^\]] matches ^ or ]
  • - (hyphen) except immediately after the opening [ – Specifies a range of characters. (Specifies a hyphen if placed immediately after the opening [.) Example: [a-zA-Z0-9] matches any letter or digit
  • ^ (caret) immediately after the opening [ – Negates the character class, causing it to match a single character not listed in the character class. (Specifies a caret if placed anywhere except after the opening [.) Example: [^a-d] matches x (any character except a, b, c or d)
  • \d, \w and \s – Shorthand character classes matching digits 0-9, word characters (letters, digits and underscores) and whitespace respectively. Can be used inside and outside character classes. Example: [\d\s] matches a character that is a digit or whitespace
  • \D, \W and \S – Negated versions of the above. Should be used only outside character classes. (Can be used inside, but that is confusing.) Example: \D matches a character that is not a digit
  • [\b] – Inside a character class, \b is a backspace character. Example: [\b\t] matches a backspace or tab character
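
A few of the character-class rules above can be tried directly. The sketch below uses Python's `re` module, which is one flavor among many; the sample strings are assumptions chosen to mirror the examples in the table.

```python
import re

# A range-based class: any ASCII letter or digit.
letters_digits = re.findall(r"[a-zA-Z0-9]", "a-1 b_2")

# A negated class: any character except a, b, c or d.
not_a_to_d = re.findall(r"[^a-d]", "abxd")

# Escaped specials inside a class: a literal caret or closing bracket.
caret_or_bracket = re.findall(r"[\^\]]", "x^y]z")

# Shorthand classes inside a class: a digit or a whitespace character.
digit_or_space = re.findall(r"[\d\s]", "a1 b")

print(letters_digits, not_a_to_d, caret_or_bracket, digit_or_space)
```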

Dot

Character – Description – Example
  • . (dot) – Matches any single character except line break characters \r and \n. Most regex flavors have an option to make the dot match line break characters too. Example: . matches x or (almost) any other character

Anchors

Character – Description – Example
  • ^ (caret) – Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the caret match after line breaks (i.e. at the start of a line in a file) as well. Example: ^. matches a in abc\ndef. Also matches d in “multi-line” mode.
  • $ (dollar) – Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Most regex flavors have an option to make the dollar match before line breaks (i.e. at the end of a line in a file) as well. Also matches before the very last line break if the string ends with a line break. Example: .$ matches f in abc\ndef. Also matches c in “multi-line” mode.
  • \A – Matches at the start of the string the regex pattern is applied to. Matches a position rather than a character. Never matches after line breaks. Example: \A. matches a in abc
  • \Z – Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks, except for the very last line break if the string ends with a line break. Example: .\Z matches f in abc\ndef
  • \z – Matches at the end of the string the regex pattern is applied to. Matches a position rather than a character. Never matches before line breaks. Example: .\z matches f in abc\ndef
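
The difference between default and “multi-line” anchoring is shown in the Python sketch below. One flavor-specific caveat, stated as such: in Python's `re`, `\Z` matches only at the very end of the string, so it behaves like the `\z` row above rather than the `\Z` row. The variable names are hypothetical.

```python
import re

text = "abc\ndef"

# Default mode: ^ and $ anchor to the whole string.
start_default = re.findall(r"^.", text)
end_default = re.findall(r".$", text)

# Multi-line mode: ^ and $ also match at every line break.
start_multiline = re.findall(r"^.", text, re.MULTILINE)
end_multiline = re.findall(r".$", text, re.MULTILINE)

# \A ignores multi-line mode and never matches after a line break.
strict_start = re.findall(r"\A.", text, re.MULTILINE)

# With a trailing line break, $ still matches just before it,
# while Python's \Z insists on the very end of the string.
dollar_before_final_break = re.findall(r".$", "abc\n")
strict_end = re.findall(r".\Z", "abc\n")

print(start_default, start_multiline, end_default, end_multiline)
```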

Word Boundaries

Character – Description – Example
  • \b – Matches at the position between a word character (anything matched by \w) and a non-word character (anything matched by [^\w] or \W) as well as at the start and/or end of the string if the first and/or last characters in the string are word characters. Example: .\b matches c in abc
  • \B – Matches at the position between two word characters (i.e. the position between \w\w) as well as at the position between two non-word characters (i.e. \W\W). Example: \B.\B matches b in abc
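
Word boundaries are most often used to match whole words only. The Python sketch below reproduces the two table examples and adds a whole-word replacement; the sentence and the word "cat" are made up for illustration.

```python
import re

# The two examples from the table.
last_before_boundary = re.findall(r".\b", "abc")
inner_char = re.findall(r"\B.\B", "abc")

# Whole-word replacement: "cat" inside "concat" or "cats" is left alone.
sentence = "cat concat cats cat."
whole_words_replaced = re.sub(r"\bcat\b", "dog", sentence)

# \B on both sides: "cat" only when it sits inside a longer word.
inside_word = re.findall(r"\Bcat\B", "concatenate cat")

print(whole_words_replaced)
```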

Alternation

Character – Description – Example
  • | (pipe) – Causes the regex engine to match either the part on the left side, or the part on the right side. Can be strung together into a series of options. Example: abc|def|xyz matches abc, def or xyz
  • | (pipe) – The pipe has the lowest precedence of all operators. Use grouping to alternate only part of the regular expression. Example: abc(def|xyz) matches abcdef or abcxyz
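
The precedence rule is easy to get wrong, so a short check helps. In this Python sketch (sample strings assumed), the ungrouped pattern `abcdef|xyz` means "abcdef, or xyz", whereas the grouped pattern limits the alternation to the suffix.

```python
import re

# A series of alternatives.
found = re.findall(r"abc|def|xyz", "abc def xyz ghi")

# Grouped: abc followed by either def or xyz.
grouped_match = re.fullmatch(r"abc(def|xyz)", "abcxyz")

# Ungrouped: the pipe splits the whole pattern, so "abcxyz" is neither
# "abcdef" nor "xyz" and the full match fails.
ungrouped_match = re.fullmatch(r"abcdef|xyz", "abcxyz")

print(found, bool(grouped_match), bool(ungrouped_match))
```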

Quantifiers

Character – Description – Example
  • ? (question mark) – Makes the preceding item optional. Greedy, so the optional item is included in the match if possible. Example: abc? matches ab or abc
  • ?? – Makes the preceding item optional. Lazy, so the optional item is excluded from the match if possible. This construct is often excluded from documentation because of its limited use. Example: abc?? matches ab or abc
  • * (star) – Repeats the previous item zero or more times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is not matched at all. Example: ".*" matches "def" "ghi" in abc "def" "ghi" jkl
  • *? (lazy star) – Repeats the previous item zero or more times. Lazy, so the engine first attempts to skip the previous item, before trying permutations with ever increasing matches of the preceding item. Example: ".*?" matches "def" in abc "def" "ghi" jkl
  • + (plus) – Repeats the previous item once or more. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only once. Example: ".+" matches "def" "ghi" in abc "def" "ghi" jkl
  • +? (lazy plus) – Repeats the previous item once or more. Lazy, so the engine first matches the previous item only once, before trying permutations with ever increasing matches of the preceding item. Example: ".+?" matches "def" in abc "def" "ghi" jkl
  • {n} where n is an integer >= 1 – Repeats the previous item exactly n times. Example: a{3} matches aaa
  • {n,m} where n >= 0 and m >= n – Repeats the previous item between n and m times. Greedy, so repeating m times is tried before reducing the repetition to n times. Example: a{2,4} matches aaaa, aaa or aa
  • {n,m}? where n >= 0 and m >= n – Repeats the previous item between n and m times. Lazy, so repeating n times is tried before increasing the repetition to m times. Example: a{2,4}? matches aa, aaa or aaaa
  • {n,} where n >= 0 – Repeats the previous item at least n times. Greedy, so as many items as possible will be matched before trying permutations with fewer matches of the preceding item, up to the point where the preceding item is matched only n times. Example: a{2,} matches aaaaa in aaaaa
  • {n,}? where n >= 0 – Repeats the previous item n or more times. Lazy, so the engine first matches the previous item n times, before trying permutations with ever increasing matches of the preceding item. Example: a{2,}? matches aa in aaaaa
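
Greedy versus lazy matching is the part of this table that most often surprises people, so here is a small Python sketch of the quoted-string and repetition examples. It uses Python's `re` flavor and the sample strings from the table; the variable names are hypothetical.

```python
import re

text = 'abc "def" "ghi" jkl'

# Greedy star: runs to the LAST quote on the line.
greedy = re.search(r'".*"', text).group()

# Lazy star: stops at the FIRST closing quote.
lazy = re.search(r'".*?"', text).group()

# Lazy plus with findall: each quoted word is found separately.
each_quoted = re.findall(r'".+?"', text)

# Bounded repetition, greedy and lazy, on a run of five a's.
greedy_range = re.search(r"a{2,4}", "aaaaa").group()
lazy_range = re.search(r"a{2,4}?", "aaaaa").group()
open_ended = re.search(r"a{2,}", "aaaaa").group()

print(greedy, lazy, each_quoted)
```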

source: www.regular-expressions.info

-- Kedar Vaijanapurkar --