Replace w/

I am not very good at regex, but I need to convert the following example from this:

<li>Creations by Carol -</li>

to this:

<li>Creations by Carol - <a href="" rel="external"></a></li>
13.10.2009 20:26:42
What language are you using to do this? This can't be done in HTML alone.
GSto 13.10.2009 20:29:56
I am doing this in TextMate's search & replace; sorry I did not mention that earlier.
Brad 13.10.2009 20:40:23
Just take the pattern from my script then: www\.[a-z\d-\.]+\.[a-z]+
David Snabel-Caunt 13.10.2009 20:45:57
I am terribly sorry, but reading the question and some of the answers and comments has led me to believe that some people think "www" as the leftmost label of a domain name means something special. Why?
shylent 26.12.2009 14:08:04

If you're only looking for URLs in <li> elements formatted like the one in your question, the solution can be much simpler than many of the other suggestions. I assume you don't need to validate your URLs; you just want to take a list of site names and URLs and turn the URLs into links.

Your search pattern could be:

<li>(.+) - (https?:\/\/)?(\S+?)<\/li>

And the replace pattern would be:

<li>$1 - <a href="(?2:$2:http\://)$3" rel="external">$3</a></li>

I just tested this find/replace in TextMate and it worked nicely. It adds http:// if it isn't already present, and it assumes that whatever follows the " - " is a URL, as long as it contains no spaces.
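If you want to try the same find/replace outside TextMate, here is a minimal sketch in Python, assuming Python's re module treats this simple pattern the same way TextMate's engine does. Python's replacement strings have no (?2:...) conditional, so a small callback supplies http:// when group 2 did not match. The function name linkify and the www.example.com input are hypothetical placeholders.

```python
import re

# Same search pattern as above: site name, " - ", optional scheme, then the URL.
# \S+? is lazy so a match stops at the first closing </li>.
LI_URL = re.compile(r"<li>(.+) - (https?://)?(\S+?)</li>")


def linkify(line):
    """Wrap the URL in each <li>name - url</li> item in an <a> tag (hypothetical helper)."""

    def repl(m):
        name, scheme, rest = m.group(1), m.group(2), m.group(3)
        # Mimics TextMate's (?2:$2:http\://): keep a captured scheme, else prepend http://
        href = (scheme or "http://") + rest
        return f'<li>{name} - <a href="{href}" rel="external">{rest}</a></li>'

    return LI_URL.sub(repl, line)


print(linkify("<li>Creations by Carol - www.example.com</li>"))
# <li>Creations by Carol - <a href="http://www.example.com" rel="external">www.example.com</a></li>
```

As in the TextMate version, the link text is group 3, so a URL that already had https:// keeps the scheme in the href but not in the visible text.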

For testing regular expressions, Rubular is a great tool: you paste in some text, and it shows you what matches as you type your regex. It's a Ruby tool, but TextMate uses the same regex syntax as Ruby.

13.10.2009 21:30:55
This looks good, but I think the \S+ match should be non-greedy, just in case another <li>withoutspaces</li> follows.
John La Rooy 13.10.2009 21:23:57
Good suggestion; I didn't think of that. I've changed it. (Sorry for misattributing the suggestion in the edit comments, though. That's what I get for copy-pasting too fast.)
Emily 13.10.2009 21:32:33
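To see why the lazy \S+? matters, here is a small Python sketch with two list items on one line and no whitespace between them; greedy and lazy quantifiers behave the same way in Python's re as in TextMate. The sample line is made up for illustration.

```python
import re

GREEDY = re.compile(r"<li>(.+) - (https?://)?(\S+)</li>")
LAZY = re.compile(r"<li>(.+) - (https?://)?(\S+?)</li>")

# Hypothetical input: a second <li> with no spaces follows on the same line.
line = "<li>A - www.a.example</li><li>nospaces</li>"

print(GREEDY.search(line).group(3))  # www.a.example</li><li>nospaces
print(LAZY.search(line).group(3))    # www.a.example
```

The greedy \S+ runs to the last </li> on the line because nothing in between is whitespace; the lazy version stops at the first one.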

How about this in PHP?

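$string = '<li>Creations by Carol -</li>';
// "www." then letters, digits, hyphens or dots, then a final ".extension".
// The hyphen is escaped: PCRE2 (PHP 7.3+) rejects an unescaped "-" right after \d as an invalid range.
$pattern = '/(www\.[a-z\d\-\.]+\.[a-z]+)/i';
// $1 is the matched host; http:// is added to the href only.
$replacement = '<a href="http://$1" rel="external">$1</a>';
echo preg_replace($pattern, $replacement, $string);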

This assumes your links always take the form www.something.extension.
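Here is a rough Python equivalent for readers not using PHP: re.sub with re.IGNORECASE plays the role of preg_replace with the /i flag, and \1 stands in for $1. The link_www name and the sample input are hypothetical, and like the PHP version this only finds hosts that start with www.

```python
import re

# "www." followed by letters, digits, hyphens or dots, ending in ".extension".
# The hyphen inside the character class is escaped so it is a literal character.
WWW_HOST = re.compile(r"(www\.[a-z\d\-.]+\.[a-z]+)", re.IGNORECASE)


def link_www(text):
    # \1 is the matched host; http:// is prepended in the href only.
    return WWW_HOST.sub(r'<a href="http://\1" rel="external">\1</a>', text)


print(link_www("<li>Creations by Carol - www.example.com</li>"))
# <li>Creations by Carol - <a href="http://www.example.com" rel="external">www.example.com</a></li>
```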

13.10.2009 20:40:57
You forgot uppercase letters and symbols, you did not escape the last dot, and the last part of the URL is not really [a-z]+ but rather a list of choices.
NewbiZ 13.10.2009 20:35:13
The i after the closing slash makes the match case-insensitive. I've added the missing backslash. I assumed that Brad doesn't want to enumerate hundreds of TLDs and that his users will enter valid domains. He didn't ask for an exhaustive or highly complex solution, so I wrote a simple regex.
David Snabel-Caunt 13.10.2009 20:43:26
It's for a regex replacement in a text editor, so quick and dirty is desirable. However, some editors have unconventional regex implementations that may differ from PHP's. Can someone confirm this will work in TextMate?
Samantha Branham 13.10.2009 20:55:49
This will work in TextMate with only one modification: the - in the character class needs to be escaped. Also, case sensitivity is set with a checkbox rather than the i flag. So the regex becomes: (www\.[a-z\d\-\.]+\.[a-z]+)
Emily 13.10.2009 21:37:28
13.10.2009 20:32:38
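The hyphen-escaping point is easy to check outside TextMate. This Python sketch uses Python's engine, not TextMate's, so treat it as an illustration: an unescaped hyphen right after \d is rejected as a bad range, while escaping the hyphen or moving it to the end of the class both give a literal hyphen. The hostname is a made-up example.

```python
import re

# Unescaped: "\d-\." reads as a range starting at a class shorthand, which re rejects.
try:
    re.compile(r"www\.[a-z\d-\.]+\.[a-z]+")
    unescaped_ok = True
except re.error:
    unescaped_ok = False

# Two equivalent fixes: escape the hyphen, or put it last in the class.
ESCAPED = re.compile(r"www\.[a-z\d\-\.]+\.[a-z]+")
TRAILING = re.compile(r"www\.[a-z\d.-]+\.[a-z]+")

host = "www.my-site.example"
print(unescaped_ok)                          # False
print(ESCAPED.fullmatch(host) is not None)   # True
print(TRAILING.fullmatch(host) is not None)  # True
```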

You have to be really clear about how much information you need to give the regex to avoid false positives.

For example, is the pattern www.something.somethingelse enough? Are there other occurrences of www in the file that would get caught?

Maybe <li>something - somethingelse</li> is the correct match. We cannot tell without seeing your whole file; there might be other <li> elements in there that you don't want to change.
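One way to see the trade-off is to run both kinds of pattern over a sample file. In this Python sketch the file contents are invented: a bare www pattern also picks up a host mentioned in a paragraph, while a pattern anchored to the <li>name - url</li> layout only touches the list item.

```python
import re

# Hypothetical file contents: one list item to convert, plus prose mentioning another host.
SAMPLE = """<li>Creations by Carol - www.example.com</li>
<p>Mirror: www.example.org</p>"""

BARE = re.compile(r"www\.[a-z\d\-.]+\.[a-z]+", re.IGNORECASE)
ANCHORED = re.compile(r"<li>(.+) - (https?://)?(\S+?)</li>")

print(BARE.findall(SAMPLE))                             # ['www.example.com', 'www.example.org']
print([m.group(3) for m in ANCHORED.finditer(SAMPLE)])  # ['www.example.com']
```

The anchored pattern trades generality for safety: it ignores www hosts outside list items, but it also ignores list items that don't follow the "name - url" layout.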

13.10.2009 21:04:37