Get Website Title From User Site Input
I'm trying to get the title of a website that is entered by the user.
The text input is a website link entered by the user and sent to the server via AJAX. The user can enter anything: an actual existing link, a single word, or something weird like 'po392#*@8'.
 Here is a part of my PHP script:  
         // Make sure the url is on another host
        if(substr($url, 0, 7) !== "http://" AND substr($url, 0, 8) !== "https://") {
            $url = "http://".$url;
        }
        // Extra confirmation for security
        if (filter_var($url, FILTER_VALIDATE_URL, FILTER_FLAG_HOST_REQUIRED)) {
            $urlIsValid = "1";
        } else {
            $urlIsValid = "0";
        }
        // Make sure there is a dot in the url
        if (strpos($url, '.') !== false) {
            $urlIsValid = "1";
        } else {
            $urlIsValid = "0";
        }
        // Retrieve title if no title is entered
        if($title == "" AND $urlIsValid == "1") {
            function get_http_response_code($theURL) {
                $headers = get_headers($theURL);
                if($headers) {
                    return substr($headers[0], 9, 3);
                } else {
                    return 'error';
                }
            }
            if(get_http_response_code($url) != "200") {
                $urlIsValid = "0";
            } else {
                $file = file_get_contents($url);
                $res = preg_match("/<title>(.*)<\/title>/siU", $file, $title_matches);
                if($res === 1) {
                    $title = preg_replace('/\s+/', ' ', $title_matches[1]);
                    $title = trim($title);
                    $title = addslashes($title);
                }
                // If title is still empty, make title the url
                if($title == "") {
                    $title = $url;
                }
            }
        }
However, there are still errors occurring in this script.
It works perfectly when an existing URL such as 'https://www.youtube.com/watch?v=eB1HfI-nIRg' is entered, and when a non-existing page such as 'https://www.youtube.com/watch?v=NON-EXISTING' is entered, but it doesn't work when the user enters something like 'twitter.com' (without http) or something like 'yikes'.
I tried literally everything: cURL, DOMDocument...
The problem is that when an invalid link is entered, the AJAX call never completes (it keeps loading), while it should set $urlIsValid = "0" whenever an error occurs.
I hope someone can help me - it's appreciated.
Nathan
You have a relatively simple problem but your solution is too complex and also buggy.
These are the problems that I've identified with your code:
// Make sure the url is on another host
if(substr($url, 0, 7) !== "http://" AND substr($url, 0, 8) !== "https://") {
     $url = "http://".$url;
}
Prefixing the scheme does not ensure that the URL is on another host (it could still be localhost). You should remove this code.
// Make sure there is a dot in the url
if (strpos($url, '.') !== false) {
        $urlIsValid = "1";
} else {
        $urlIsValid = "0";
}
This check overwrites the result of the check above it, where you validate that the string is indeed a valid URL, so remove it.
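If you do want to keep both checks, one way to stop the second from overwriting the first is to require every check to pass before calling the URL valid. This is only a minimal sketch: the helper name isPlausibleUrl is hypothetical and not part of your script, and it applies the dot test to the host part only.

```php
<?php
// Hypothetical helper (not in the original script): each check must pass,
// so a later check can never overwrite an earlier failure.
function isPlausibleUrl($url)
{
    // filter_var() returns the URL on success and false on failure.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    // Only look for the dot in the host, not in the path or query string.
    $host = parse_url($url, PHP_URL_HOST);
    return is_string($host) && strpos($host, '.') !== false;
}
```

You would then set $urlIsValid once, from the single return value, instead of assigning it in two separate if/else blocks.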
The additional function get_http_response_code is unnecessary. You could use file_get_contents alone to get the HTML of the remote page and compare the result against false to detect an error.
Also, from your code I conclude that if the (external to this context) variable $title is not empty, you never perform the external fetch, so why not check it first?
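Since your AJAX call keeps loading on bad input, it may also help to put a timeout on the fetch. The sketch below assumes the http stream wrapper's timeout context option is enough for your case. The helper name fetchWithTimeout and the 5 second default are my own choices, not something from your script, and the option is a read timeout, so it may not bound every stage of the connection.

```php
<?php
// Hypothetical helper (not in the original script).
// Returns the response body as a string, or false on any failure.
function fetchWithTimeout($url, $seconds = 5)
{
    // 'timeout' is the read timeout, in seconds, for http:// and https:// streams.
    $context = stream_context_create(array(
        'http' => array('timeout' => $seconds),
    ));
    // @ suppresses the warning emitted on failure; the false return is the signal.
    return @file_get_contents($url, false, $context);
}
```

With this, a false return can be mapped straight to $urlIsValid = "0", so the script always produces a response for the AJAX call.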
To sum it up, your code should look something like this:
if('' === $title && filter_var($url, FILTER_VALIDATE_URL))
{
    // getContentsFromUrl() prefixes its calls with @ to suppress warnings, as we won't need them
    // (this could also be done with error_reporting(0) or a similar side-effect method)
    $html = getContentsFromUrl($url);
    if(false !== $html && preg_match("/<title>(.*)<\/title>/siU", $html, $title_matches))
    {
        $title = preg_replace('/\s+/', ' ', $title_matches[1]);
        $title = trim($title);
        $title = addslashes($title);
    }
    // If title is still empty, make title the url
    if($title == "") {
        $title = $url;
    }
}
function getContentsFromUrl($url)
{
   //if not full/complete url
   if(!preg_match('#^https?://#ims', $url))
   {
       $completeUrl = 'http://' . $url;
       $result = @file_get_contents($completeUrl);
       if(false !== $result)
       {
           return $result;
       }
       //we try with https://
       $url = 'https://' . $url;
   }
   return @file_get_contents($url);
}
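One caveat with the code above: FILTER_VALIDATE_URL requires a scheme, so a bare 'twitter.com' is rejected by filter_var before getContentsFromUrl ever gets a chance to add 'http://'. A possible way around that is to add the scheme first and validate afterwards. The following is only a sketch under that assumption; normalizeUrl and extractTitle are hypothetical names, and the title pattern is slightly more lenient than the one above (it allows attributes on the title tag).

```php
<?php
// Hypothetical helper: add a scheme when it is missing, then validate.
// Returns the normalized URL, or null when the input cannot be a URL.
function normalizeUrl($input)
{
    $url = trim($input);
    if ($url === '') {
        return null;
    }
    if (!preg_match('#^https?://#i', $url)) {
        $url = 'http://' . $url;
    }
    return filter_var($url, FILTER_VALIDATE_URL) !== false ? $url : null;
}

// Hypothetical helper: return the first <title> of an HTML string, or null.
function extractTitle($html)
{
    if (preg_match('/<title[^>]*>(.*?)<\/title>/si', $html, $matches) !== 1) {
        return null;
    }
    // Collapse runs of whitespace (including newlines) into single spaces.
    $title = trim(preg_replace('/\s+/', ' ', $matches[1]));
    return $title === '' ? null : $title;
}
```

When normalizeUrl returns null you can set $urlIsValid = "0" immediately and skip the fetch. Input such as 'yikes' still becomes 'http://yikes', which is syntactically valid, so the fetch itself remains the final test.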
