Wednesday, March 23, 2011

How to exclude a match if it is a substring of another string

I have a problem. I'd like to match all occurrences of \t in my text (I mean the literal characters, a backslash followed by t, not a tab character), but I would like to exclude a match if it is part of a \\t string. How can I do that?

Example

<HTML>Blah</HTML>\t
D:\\UserData\\tui

I'd like to match \t in the first line but not in the second line (as it is a part of \\t there).

Is this at all possible using regular expressions?

From stackoverflow
  • You have to define more precisely what you mean by "part of a string". For example, you might mean: don't match \t if it is followed by more alphanumerics or a backslash. That would become (in Perl):

      \\t(?![\w\\])
    
    Jagger : Simply put, my regular expression is currently \\t, but I don't want any matches in the following text: D:\\UserData\\tui
    j_random_hacker : Then your regular expression is **not** simply \\t. Because \\t matches in your text. Adrian's point is you need to come up with a hard and fast *rule* for deciding whether an occurrence of "\t" should be considered a match or not.
  • You're going to need to define in exactly which cases a \t should match, and in which ones it shouldn't, before it's possible to determine a regex for it. Your current definition seems to be of the "I'll know it when I see it" variety, which is not sufficient.

  • /\\t\b/
    

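    \b matches a word boundary (a transition from a word-like character to a non-word-like one, or vice versa), so this pattern matches \t only when the t is not immediately followed by another word character.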

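  • Another approach: match any character other than a backslash, then a backslash, then a "t" character. Note that the overall match includes the preceding character (the \t itself is in capture group 1), and that a \t at the very start of the text will not match.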

    /[^\\](\\t)/
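
To compare the suggestions above, here is a minimal sketch in Python (the answers use Perl-style syntax; these particular patterns should behave the same way in Python's re module). It assumes the second sample line really contains doubled backslashes. The negative-lookbehind pattern is an addition of mine, not something proposed in the answers.

```python
import re

# The two sample lines from the question. Raw strings keep every
# backslash as a literal backslash character.
lines = [
    r"<HTML>Blah</HTML>\t",
    r"D:\\UserData\\tui",
]

# Patterns from the answers, plus a lookbehind variant (an addition,
# not from the original thread).
patterns = {
    "negative lookahead": r"\\t(?![\w\\])",
    "word boundary": r"\\t\b",
    "preceding non-backslash": r"[^\\](\\t)",
    "negative lookbehind": r"(?<!\\)\\t",
}

def matches_per_line(pattern):
    """Return one boolean per sample line: does the pattern match it?"""
    return [re.search(pattern, line) is not None for line in lines]

results = {name: matches_per_line(p) for name, p in patterns.items()}

for name, hits in results.items():
    print(f"{name}: {hits}")

# Edge case: a \t at the very start of the text. The consuming
# [^\\] needs a character in front of the backslash; the lookbehind
# does not.
at_start = r"\t"
consuming_at_start = re.search(patterns["preceding non-backslash"], at_start)
lookbehind_at_start = re.search(patterns["negative lookbehind"], at_start)
```

All four patterns match the first sample line and reject the second. They only differ on inputs the question does not cover, such as a \t at the start of the text or a \t followed by punctuation, which is why the answers ask for an exact rule first.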
