php - Find and replace URLs in a blob of text but exclude those in link tags


I have been trying to run through a string, find the URLs and replace them with links. Here is what I have come up with so far. It seems to work quite well for the most part, but there are a few things I'd like to polish, and it might not be the best-performing way of doing it.

I have read many threads on here on SO, and although they helped a great deal, I still need to tie up some loose ends.

I am running through the string two times: the first time I replace the BBCode tags with HTML tags, and the second time I run through the string again and replace plain-text URLs with links:

    // first pass: [url=...]...[/url] BBCode to HTML links
    $body_str = preg_replace('/\[url=(.+?)\](.+?)\[\/url\]/i', '<a href="\1" rel="nofollow" target="_blank">\2</a>', $body_str);

    // second pass: plain-text URLs to links (nofollow unless the URL contains thisone.com)
    $body_str = preg_replace_callback(
        '!(?:^|[^"\'])(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?!',
        function ($matches) {
            return strpos(trim($matches[0]), 'thisone.com') === false ?
                '<a href="' . ltrim($matches[0], " \t\n\r\0\x0b.,@?^=%&:/~\+#'") . '" rel="nofollow" target="_blank">' . ltrim($matches[0], "\t\n\r\0\x0b.,@?^=%&:/~\+#'") . '</a>' :
                '<a href="' . ltrim($matches[0], " \t\n\r\0\x0b.,@?^=%&:/~\+#'") . '">' . ltrim($matches[0], "\t\n\r\0\x0b.,@?^=%&:/~\+#'") . '</a>';
        },
        $body_str
    );
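For the first pass, one possible alternative to the '\1' / '\2' backreference replacement is a callback, which makes it possible to HTML-escape both the URL and the link text and to reject targets that are not http(s), such as javascript:. This is only a sketch: bbcode_urls_to_links() is a hypothetical name, and it assumes the rest of the body text is escaped elsewhere.

```php
<?php
// Hypothetical helper: first pass, [url=...]...[/url] BBCode to <a> tags.
// Assumption: only http(s) targets are allowed; anything else keeps just its text.
function bbcode_urls_to_links(string $text): string
{
    return preg_replace_callback(
        '/\[url=(.+?)\](.+?)\[\/url\]/i',
        function (array $m): string {
            $href = trim($m[1]);
            if (!preg_match('~^https?://~i', $href)) {
                // e.g. [url=javascript:...] is not turned into a link
                return htmlspecialchars($m[2], ENT_QUOTES);
            }
            return '<a href="' . htmlspecialchars($href, ENT_QUOTES)
                . '" rel="nofollow" target="_blank">'
                . htmlspecialchars($m[2], ENT_QUOTES) . '</a>';
        },
        $text
    );
}

echo bbcode_urls_to_links('[url=http://a.com]A & B[/url]'), "\n";
```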

So far there are a few problems. The matching tends to pick up the character immediately before 'http' (e.g. a space, comma or colon), which broke the links. I used preg_replace_callback to work around this and trim the unwanted characters that break the link.
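The extra character comes from (?:^|[^"\']) being a consuming group: whatever it matches becomes part of $matches[0]. A zero-width negative lookbehind performs the same "not preceded by a quote" check without consuming the previous character, so nothing has to be trimmed afterwards. A minimal comparison, with the URL part simplified to https?://\S+ for brevity (an assumption, not the original pattern):

```php
<?php
$text = 'See:http://example.com';

// Consuming version: (?:^|[^"\']) is part of the match, so the ':' before "http" is swallowed
preg_match('~(?:^|[^"\'])https?://\S+~', $text, $consumed);

// Lookbehind version: (?<!["\']) only inspects the previous character, it does not consume it
preg_match('~(?<!["\'])https?://\S+~', $text, $lookbehind);

var_dump($consumed[0]);   // string(19) ":http://example.com"
var_dump($lookbehind[0]); // string(18) "http://example.com"
```

A URL directly after a quote, as in '"http://x.com"', is still rejected by the lookbehind version.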

The other problem: to avoid breaking links when matching URLs that are already inside a-tags, I exclude URLs that start right after a quote or double quote, but I'd rather use href='|href=" as the exclusion.
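One way to exclude whole link tags, rather than only URLs after a quote, is PCRE's (*SKIP)(*FAIL) idiom: the first alternative matches an entire <a ...>...</a> element and then deliberately fails, which makes the regex engine resume after that element, so neither its href nor its link text is touched. This is a sketch, not an HTML parser: linkify_outside_anchors() is a hypothetical name, and the pattern assumes well-formed, non-nested anchors and that trailing punctuation (.,;:!?)) is not part of a URL.

```php
<?php
// Hypothetical helper: linkify bare URLs but leave existing <a>...</a> elements untouched.
// Branch 1 matches a whole anchor element; (*SKIP)(*FAIL) discards it and resumes after it.
// Branch 2 matches a bare http(s) URL not directly preceded by =, " or '.
function linkify_outside_anchors(string $html): string
{
    $pattern = '~<a\b[^>]*>.*?</a>(*SKIP)(*FAIL)'
             . '|(?<![="\'])https?://[^\s<>"\']*[^\s<>"\'.,;:!?)]~is';

    return preg_replace_callback(
        $pattern,
        function (array $m): string {
            $url = htmlspecialchars($m[0], ENT_QUOTES);
            return '<a href="' . $url . '" rel="nofollow" target="_blank">' . $url . '</a>';
        },
        $html
    );
}

echo linkify_outside_anchors('Visit http://a.com and <a href="http://b.com">http://b.com</a>.'), "\n";
```

Only http://a.com is wrapped; the existing link to b.com comes back unchanged.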

Any tips and advice are appreciated.

First, I allowed myself to refactor the code a bit to make it easier to read and modify:

    function urltrim($str) {
        return ltrim($str, " \t\n\r\0\x0b.,@?^=%&:/~\+#'");
    }

    function addlink($str, $nofollow = true) {
        return '<a href="' . urltrim($str) . '"' . ($nofollow ? ' rel="nofollow" target="_blank"' : '') . '>' . urltrim($str) . '</a>';
    }

    function checksite($str) {
        // strpos() returns false when 'thisone.com' is absent, so compare strictly
        return strpos(trim($str), 'thisone.com') === false ? addlink($str) : addlink($str, false);
    }

    $body_str = preg_replace('/\[url=(.+?)\](.+?)\[\/url\]/i', '\2', $body_str);

    $body_str = preg_replace_callback(
        '!(?:^|[^"\'])(http|https):\/\/[\w\-_]+(\.[\w\-_]+)+([\w\-\.,@?^=%&:/~\+#]*[\w\-\@?^=%&/~\+#])?!',
        function ($matches) {
            return checksite($matches[0]);
        },
        $body_str
    );
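checksite() looks for 'thisone.com' anywhere in the string, so a URL such as http://example.org/?ref=thisone.com would also lose its nofollow. A possible refinement is to compare only the host returned by parse_url(). This is a sketch: is_own_site() and its $own parameter are hypothetical, and treating subdomains such as www.thisone.com as the own site is an assumption.

```php
<?php
// Hypothetical variant of the checksite() test: decide "own site" from the URL's host only.
function is_own_site(string $url, string $own = 'thisone.com'): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host)) {
        return false; // malformed URL or no host part
    }
    $host = strtolower($host);
    $suffix = '.' . $own;
    // the domain itself, or any subdomain of it
    return $host === $own || substr($host, -strlen($suffix)) === $suffix;
}
```

The result can then pick the nofollow flag, e.g. addlink($str, !is_own_site($str)).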

After that, I changed the way links are handled:

  • I considered a URL to be a word (= all characters until a space, \n or \t is found, i.e. \s)
  • I changed the matching method to check for href= in front of the URL
    • if it is present, don't do anything: it's already a link
    • if no href= is present, replace the URL with a link
  • so the urltrim method is not useful anymore, since we no longer eat the character before http
  • and of course, escape the URL with htmlspecialchars to avoid HTML injection (urlencode would also encode the :, /, ? and & separators and break the link)

    function urltrim($str) {
        return $str;
    }

    function addlink($str, $nofollow = true) {
        // escape for both the href attribute and the link text
        $url = htmlspecialchars(urltrim($str), ENT_QUOTES);
        return '<a href="' . $url . '"' . ($nofollow ? ' rel="nofollow" target="_blank"' : '') . '>' . $url . '</a>';
    }

    function checksite($str) {
        return strpos(trim($str), 'thisone.com') === false ? addlink($str) : addlink($str, false);
    }

    $body_str = preg_replace('/\[url=(.+?)\](.+?)\[\/url\]/i', '\2', $body_str);

    $body_str = preg_replace_callback(
        '!(|href=)(["\']?)(https?://[^\s]+)!',
        function ($matches) {
            if ($matches[1]) {
                # href= is present: it's already a link, return the original string
                return $matches[0];
            } else {
                # keep the preceding quote (" or ') if any, then add the link
                return $matches[2] . checksite($matches[3]);
            }
        },
        $body_str
    );
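urlencode() and htmlspecialchars() solve different problems. A quick comparison on a made-up URL shows that urlencode(), which is meant for individual query-string values, also encodes the URL's own separators, while htmlspecialchars() leaves the URL usable and only escapes &, <, > and quotes:

```php
<?php
$url = 'http://example.com/a?x=1&y=<b>';

// urlencode(): ':' '/' '?' '=' '&' are encoded too, so the result is no longer a usable URL
$encoded = urlencode($url);
echo $encoded, "\n"; // http%3A%2F%2Fexample.com%2Fa%3Fx%3D1%26y%3D%3Cb%3E

// htmlspecialchars(): only characters that are special in HTML are escaped
$escaped = htmlspecialchars($url, ENT_QUOTES);
echo $escaped, "\n"; // http://example.com/a?x=1&amp;y=&lt;b&gt;
```

Browsers decode &amp; back to & when reading the href, so the escaped form still points at the original URL.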

I hope this helps with your project. Let me know if it did.

Bye.

