php - Find and replace URLs in a blob of text but exclude those in link tags
I have been trying to run through a string, find URLs, and replace them with links. Here is what I have come up with so far, and it seems to work quite well for the most part, but there are a few things I'd like to polish. It also might not be the best-performing way of doing this.
I have read many threads on here on SO, and although they helped a great deal, I still need to tie up the loose ends.
I am running through the string two times: the first time replacing bbtags with HTML tags, and the second time replacing plain-text URLs with links:

    $body_str = preg_replace(
        '/\[url=(.+?)\](.+?)\[\/url\]/i',
        '<a href="\1" rel="nofollow" target="_blank">\2</a>',
        $body_str
    );

    $body_str = preg_replace_callback(
        '!(?:^|[^"\'])(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?!',
        function ($matches) {
            return strpos(trim($matches[0]), 'thisone.com') == false
                ? '<a href="' . ltrim($matches[0], " \t\n\r\0\x0b.,@?^=%&:/~\+#'") . '" rel="nofollow" target="_blank">'
                    . ltrim($matches[0], "\t\n\r\0\x0b.,@?^=%&:/~\+#'") . '</a>'
                : '<a href="' . ltrim($matches[0], " \t\n\r\0\x0b.,@?^=%&:/~\+#'") . '">'
                    . ltrim($matches[0], "\t\n\r\0\x0b.,@?^=%&:/~\+#'") . '</a>';
        },
        $body_str
    );

So far, the few problems I am finding are that the pattern tends to pick up the character immediately before 'http' (e.g. a space, comma, or colon), which broke the links. So I used preg_replace_callback to work around this and trim the unwanted characters that break the link.
The other problem is that, to avoid breaking existing links by matching URLs inside a-tags, I am excluding URLs that start with a quote or double quote, and I'd rather use href='|href=" as the exclusion.
Any tips and advice would be appreciated.
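For reference, the href='|href=" exclusion asked about above can be sketched with negative lookbehinds. Because lookbehinds are zero-width, the character in front of 'http' is never consumed, so no trimming is needed afterwards. This is only a minimal sketch under assumptions: the helper name `linkify`, the simplified URL character class, and the trailing-punctuation rule are illustrative and not from the original post, and the special handling of thisone.com is left out.

```php
<?php
// Sketch: link bare URLs, but skip any URL directly preceded by href=" or href='.
// linkify() is a hypothetical helper name, not part of the original code.
function linkify(string $text): string
{
    // (?<!href=") and (?<!href=') reject URLs that are already attribute values.
    // The final lookbehind keeps trailing punctuation such as "," or "." out of the match.
    $pattern = '~(?<!href=")(?<!href=\')\bhttps?://[^\s<>"\']+(?<![.,;:!?])~i';

    return preg_replace_callback($pattern, function (array $m): string {
        // Escape for HTML so the URL cannot break out of the attribute.
        $url = htmlspecialchars($m[0], ENT_QUOTES);
        return '<a href="' . $url . '" rel="nofollow" target="_blank">' . $url . '</a>';
    }, $text);
}

echo linkify('Visit http://example.com/page, or <a href="http://example.org">this</a>.');
// Visit <a href="http://example.com/page" rel="nofollow" target="_blank">http://example.com/page</a>, or <a href="http://example.org">this</a>.
```

The lookbehinds only look at what sits directly in front of the URL, so a URL that appears as the visible text of an existing link would still be matched and would need separate handling.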
First, I allowed myself to refactor your code a bit to make it easier to read and modify:

    function urltrim($str) {
        return ltrim($str, " \t\n\r\0\x0b.,@?^=%&:/~\+#'");
    }

    function addlink($str, $nofollow = true) {
        return '<a href="' . urltrim($str) . '"'
            . ($nofollow ? ' rel="nofollow" target="_blank"' : '')
            . '>' . urltrim($str) . '</a>';
    }

    function checksite($str) {
        # strict comparison: strpos() returns false only when the needle is absent
        return strpos(trim($str), 'thisone.com') === false ? addlink($str) : addlink($str, false);
    }

    $body_str = preg_replace(
        '/\[url=(.+?)\](.+?)\[\/url\]/i',
        '<a href="\1" rel="nofollow" target="_blank">\2</a>',
        $body_str
    );

    $body_str = preg_replace_callback(
        '!(?:^|[^"\'])(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?!',
        function ($matches) {
            return checksite($matches[0]);
        },
        $body_str
    );

After that, I changed the way links are handled:
- I considered a URL to be a word (= all characters until a space, \n, or \t is found (= \s)).
- I changed the matching method to check for the existence of href= in front of the string.
- If it exists, I don't do anything; it's already a link.
- If no href= is present, I replace the URL with a link.
- So the urltrim method is not useful anymore, since we no longer eat the first character before http.
- And of course, I use urlencode to encode the URL and avoid HTML injection.
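To illustrate the matching rule from the list above in isolation, here is a small sketch that runs the answer's pattern over a sample string and reports what the first capture group contains for each URL. The sample text and the `link`/`skip` labels are made up for illustration.

```php
<?php
// Pattern from the answer: optional "href=", optional quote, then a URL "word".
$pattern = '!(|href=)(["\']?)(https?://[^\s]+)!';

// Hypothetical sample input, not taken from the original post.
$sample = 'See http://example.com/a and <a href="http://example.org/b" rel="nofollow">here</a>';

preg_match_all($pattern, $sample, $matches, PREG_SET_ORDER);

$found = [];
foreach ($matches as $m) {
    // $m[1] is "href=" when the URL already sits inside a link attribute.
    $found[] = [$m[1] !== '' ? 'skip' : 'link', $m[3]];
}

print_r($found);
// First entry:  'link', 'http://example.com/a'
// Second entry: 'skip', 'http://example.org/b"'
```

Since [^\s]+ runs up to the next whitespace, the second capture also swallows the closing quote. That does no harm for href= matches, because the callback returns them unchanged.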

    function urltrim($str) {
        return $str;
    }

    function addlink($str, $nofollow = true) {
        # urlencode() the whole URL, then restore the scheme separator.
        # Note: urlencode() also encodes "/", "?", "&" and "=" in the rest of the URL.
        $url = preg_replace("#(https?)%3A%2F%2F#", "$1://", urlencode(urltrim($str)));
        return '<a href="' . $url . '"'
            . ($nofollow ? ' rel="nofollow" target="_blank"' : '')
            . '>' . urltrim($str) . '</a>';
    }

    function checksite($str) {
        # strict comparison: strpos() returns false only when the needle is absent
        return strpos(trim($str), 'thisone.com') === false ? addlink($str) : addlink($str, false);
    }

    $body_str = preg_replace(
        '/\[url=(.+?)\](.+?)\[\/url\]/i',
        '<a href="\1" rel="nofollow" target="_blank">\2</a>',
        $body_str
    );

    $body_str = preg_replace_callback(
        '!(|href=)(["\']?)(https?://[^\s]+)!',
        function ($matches) {
            if ($matches[1]) {
                # if href= is present, don't do anything, return the original string
                return $matches[0];
            } else {
                # add back the previous char (" or ') and the link
                return $matches[2] . checksite($matches[3]);
            }
        },
        $body_str
    );

I hope this can help with your project. Tell me if it helped.
Bye.