Thursday, March 3, 2011

HTML last tag conditional match

I have two strings

<EM>is <i>love</i></EM>,<PARTITION />

and

<EM>is <i>love</i>,<PARTITION />

I want a regex that matches the second string completely but does not match the first one. Please help.

Note: Everything can change except the EM and PARTITION tags.

From Stack Overflow
  • I don't think you're asking the right question. This regex matches the second string completely and not the first:

    /^<EM>is <i>love<\/i>,<PARTITION \/>$/

    But obviously, you want to match a class of strings, not just the second string... right? Define the class of strings you want to match and you can be 1 step closer to getting the regular expression you need.

    shabby : Sorry, I meant that if the regex is run on the second string it should match it, and it should not match anything if run on the first string.
  • ^<EM>(?:(?<!</EM>).)*<PARTITION />$
    

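    works, but whether you can use it depends on the target language: JavaScript, for example, did not support lookbehind assertions at the time (they were added in ECMAScript 2018), although it has always supported lookahead.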

    A simpler solution is to use ^<EM>.*<PARTITION />$ and then check separately that the string does not contain </EM>. I believe regular expressions are powerful and a must-have, but I don't try to do everything in a single expression... :-)
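
As a rough illustration of the two approaches in this answer, here is a minimal Python sketch. Python is used only for illustration, since the question does not fix a language; the helper names matches_lookbehind and matches_two_step are hypothetical and not part of the original answer.

```python
import re

FIRST = "<EM>is <i>love</i></EM>,<PARTITION />"
SECOND = "<EM>is <i>love</i>,<PARTITION />"

# Single-expression approach: the dot consumes a character only when the
# five characters immediately before it are not "</EM>".
LOOKBEHIND = re.compile(r"^<EM>(?:(?<!</EM>).)*<PARTITION />$")

# Two-step approach: a loose match for the outer shape, followed by a
# plain substring test for the closing tag.
OUTER = re.compile(r"^<EM>.*<PARTITION />$")


def matches_lookbehind(text: str) -> bool:
    return LOOKBEHIND.search(text) is not None


def matches_two_step(text: str) -> bool:
    return OUTER.search(text) is not None and "</EM>" not in text


for candidate in (FIRST, SECOND):
    # Prints "False False" for FIRST and "True True" for SECOND.
    print(matches_lookbehind(candidate), matches_two_step(candidate))
```

One difference to be aware of: the lookbehind is checked only before characters consumed by the dot, so a string in which </EM> sits directly in front of <PARTITION /> (for example "<EM>x</EM><PARTITION />") still matches the single expression, while the two-step check rejects it.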

  • If you want to match a string entirely if it does not contain a certain substring, use a regex to match the substring, and return the whole string if the regex does not match. You didn't say which language you're using, but you tagged your question with .NET, so here goes in C#:

    if (Regex.IsMatch(subjectString, "</EM>")) {
        return null;
    } else {
        return subjectString;
    }
    

    Since </EM> is just a bit of literal text, you don't even need to use a regular expression:

    if (subjectString.Contains("</EM>")) {
        return null;
    } else {
        return subjectString;
    }
    

    In a situation where all you could use is a regex, try this:

    \A((?!</EM>).)*\Z
    

    The regex-only solution will be far less efficient than the above code samples.
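
For comparison, the regex-only idea can be sketched in Python as well (an illustration only; the answer itself targets C#). The function names below are hypothetical, and re.fullmatch is used in place of the \A and \Z anchors, which is an assumption about how one might port the pattern rather than a literal translation.

```python
import re

# Each character is consumed only if "</EM>" does not start at that
# position; fullmatch requires the whole string to be consumed this way.
NO_CLOSING_EM = re.compile(r"(?:(?!</EM>).)*")


def without_closing_em(subject: str):
    """Return subject if it contains no "</EM>", otherwise None."""
    return subject if NO_CLOSING_EM.fullmatch(subject) else None


def without_closing_em_plain(subject: str):
    """Same result for single-line input, using a substring test."""
    return None if "</EM>" in subject else subject


print(without_closing_em("<EM>is <i>love</i></EM>,<PARTITION />"))  # None
print(without_closing_em("<EM>is <i>love</i>,<PARTITION />"))
```

Note that the dot does not match a newline by default, so for multi-line input the regex version would need the re.DOTALL flag to agree with the substring test.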

  • Luckily, after going through all this and doing a lot of research, I found the right regex. Here it is for you all. Thanks to everyone who helped.

    <EM>\w*\s*\W*\S*[^\(</EM>)]<PARTITION[ ]/>
    

    It captures the second string but leaves the first one. The only problem I was having was negating the </EM> combination, which I did with a backslash before the group; this negates the complete string rather than taking the characters separately.

    Gumbo : This won’t work as “[^\(</EM>)]” describes a character class and not just “</EM>” as assumed.
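
Gumbo's point can be checked with a short Python sketch (Python chosen only for illustration; the constant names are hypothetical). Inside square brackets the parentheses and the tag are not treated as a group, so the class merely excludes a handful of single characters, and the proposed pattern ends up matching both strings.

```python
import re

FIRST = "<EM>is <i>love</i></EM>,<PARTITION />"
SECOND = "<EM>is <i>love</i>,<PARTITION />"

# The pattern proposed in the answer above, unchanged.
PROPOSED = re.compile(r"<EM>\w*\s*\W*\S*[^\(</EM>)]<PARTITION[ ]/>")

# The bracketed part on its own: it matches exactly one character that is
# not any of ( < / E M > ).
SINGLE_CHAR = re.compile(r"[^\(</EM>)]")

print(PROPOSED.search(FIRST) is not None)      # True: the first string matches too
print(PROPOSED.search(SECOND) is not None)     # True
print(SINGLE_CHAR.fullmatch(",") is not None)  # True: the comma satisfies the class
print(SINGLE_CHAR.fullmatch("E") is not None)  # False: "E" is one of the excluded characters
```

In the first string, \S* swallows everything up to and including </EM>, and the character class then matches the comma, so nothing in the pattern actually rejects the closing tag.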
