ASP to PHP Question


judas

Original Poster:

5,992 posts

260 months

Friday 5th October 2007
Anyone proficient with both ASP and PHP? I'm looking for a way to convert an ASP function I have into PHP. The function in question uses a Microsoft.XMLHTTP object to grab the source HTML of another web page and then chop bits out to include in the new page. I don't know much (read: nothing hehe) about PHP and I'm wondering if there's an equivalent way of grabbing the HTML and performing the same treatment to it.

Ta!

incubus

8,788 posts

283 months

Friday 5th October 2007
There is a library called curl that will do what you need. Here's an example script: it takes the HTML of a given URL (in this case http://www.pistonheads.co.uk), changes the string "Pistonheads" to "Example" and outputs it to the screen.


{{{
<?php

// Create a new curl resource
$ch = curl_init();

// Set URL and other appropriate options
curl_setopt($ch, CURLOPT_URL, "http://www.pistonheads.co.uk");

// Return the page as a string rather than printing it straight to the screen
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);

// Perform the curl command using the options set above
$output = curl_exec($ch);

// Close curl and free up system resources
curl_close($ch);

// Replace "Pistonheads" with "Example"
$output = str_replace("Pistonheads", "Example", $output);

// Print
echo $output;

?>
}}}


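The script above assumes the fetch always succeeds. As a rough sketch of how a failure could be caught, the helper below checks the return value of curl_exec() before using it. The name fetchPage and the 15 second timeout are made up for illustration and are not part of the original script.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL with curl and
// return the body as a string, or null if the transfer failed.
function fetchPage($url)
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); // assumed timeout, in seconds

    $output = curl_exec($ch);
    if ($output === false) {
        // curl_error() describes the last failure on this handle
        error_log("curl failed: " . curl_error($ch));
        $output = null;
    }

    curl_close($ch);
    return $output;
}
```

A caller would then test for null before running str_replace() on the result, so a dead link produces a logged error rather than an empty page.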

Edited by incubus on Friday 5th October 14:37

incubus

8,788 posts

283 months

Friday 5th October 2007
Another example. This takes the HTML from a given page (http://www.pistonheads.com), strips out the HTML tags, JavaScript and comments, and prints the raw text to the screen.

{{{

<?php

function webpage2txt($url)
{
    $user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";

    $ch = curl_init(); // initialize curl handle
    curl_setopt($ch, CURLOPT_URL, $url); // set the URL to fetch
    curl_setopt($ch, CURLOPT_FAILONERROR, 1); // fail on HTTP errors (status 400 and above)
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
    curl_setopt($ch, CURLOPT_PORT, 80); // set the port number
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); // times out after 15s
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);

    $document = curl_exec($ch);
    curl_close($ch);

    if ($document === false) {
        return ""; // the request failed, so there is nothing to convert
    }

    // The "U" modifier is left off: combined with ".*?" it makes the match greedy
    $search = array(
        '@<script[^>]*?>.*?</script>@si', // Strip out javascript (modifiers were cut off in the post; "si" assumed)
        '@<style[^>]*?>.*?</style>@si',   // Strip style tags properly
        '@<[\/\!]*?[^<>]*?>@si',          // Strip out HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@',      // Strip multi-line comments including CDATA
        '/\s{2,}/',                       // Collapse runs of whitespace
    );

    $text = preg_replace($search, "\n", html_entity_decode($document));

    $pat[0] = "/^\s+/";
    $pat[2] = "/\s+\$/";
    $rep[0] = "";
    $rep[2] = " ";

    $text = preg_replace($pat, $rep, trim($text));

    return $text;
}

$getText = webpage2txt("http://www.pistonheads.com");
echo $getText;
?>

}}}
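PHP's built-in strip_tags() can take over part of the regex work above, since it removes tags and HTML comments in one call. The following is only a sketch of that alternative, not a drop-in replacement: the name htmlToText is made up, the UTF-8 charset is an assumption, and unlike the function above it decodes entities after stripping the tags rather than before.

```php
<?php
// Hypothetical helper (not from the thread): reduce an HTML string to plain text.
function htmlToText($html)
{
    // Drop <script> and <style> blocks together with their contents
    $html = preg_replace('@<(script|style)\b[^>]*>.*?</\1>@si', ' ', $html);

    // strip_tags() removes the remaining tags and HTML comments
    $text = strip_tags($html);

    // Decode entities such as &amp; once the tags are gone (UTF-8 assumed)
    $text = html_entity_decode($text, ENT_QUOTES, 'UTF-8');

    // Collapse runs of whitespace into single spaces
    return trim(preg_replace('/\s+/', ' ', $text));
}
```

It works on a string, so it can be fed the result of curl_exec() or tried on a literal snippet of HTML first.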


Edited by incubus on Friday 5th October 14:38

judas

Original Poster:

5,992 posts

260 months

Friday 5th October 2007
Thanks for that!

After some Googling I found a function called file_get_contents that does the trick. What I've done is very basic (I said I don't know PHP!) but I'll get our semi-resident PHP chap to knock it into shape when he gets back in next week.

This is my effort:
{{{
<?php
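function getHTML($URL, $StartTag, $EndTag) {
    // Fetch the page; file_get_contents() returns false on failure
    $Source = file_get_contents($URL, false);
    if ($Source === false) {
        return "";
    }

    // Find the start marker and step past it
    $StartTagPos = strpos($Source, $StartTag);
    if ($StartTagPos === false) {
        return "";
    }
    $StartTagPos += strlen($StartTag);

    // Find the end marker, searching only after the start marker
    $EndTagPos = strpos($Source, $EndTag, $StartTagPos);
    if ($EndTagPos === false) {
        return "";
    }

    // Return everything between the two markers
    return substr($Source, $StartTagPos, $EndTagPos - $StartTagPos);
}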

$theStart = "<!--NavStart -->";
$theEnd = "<!--NavEnd -->";
$theSite = "http://thepageIwant.com";

echo getHTML($theSite, $theStart, $theEnd);
?>
}}}
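The marker-chopping step in getHTML can be tried on its own, without fetching anything, by running it against a string. This is a minimal sketch: the function name extractBetween and the sample page are invented for illustration, and returning null for a missing marker is just one possible choice.

```php
<?php
// Hypothetical helper (not from the thread): return the text between two
// markers, or null when either marker is missing.
function extractBetween($source, $startTag, $endTag)
{
    $start = strpos($source, $startTag);
    if ($start === false) {
        return null;
    }
    $start += strlen($startTag);

    // Search for the end marker only after the start marker
    $end = strpos($source, $endTag, $start);
    if ($end === false) {
        return null;
    }

    return substr($source, $start, $end - $start);
}

// Made-up sample page, standing in for the fetched source
$page = "<html><!--NavStart --><ul><li>Home</li></ul><!--NavEnd --></html>";
echo extractBetween($page, "<!--NavStart -->", "<!--NavEnd -->");
```

Keeping the fetch and the chop separate like this means the string handling can be checked with known input before it is pointed at a live page.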

Edited by judas on Friday 5th October 16:41

LivinLaVidaLotus

1,626 posts

202 months

Friday 5th October 2007
Be aware that file_get_contents() isn't always portable: some systems have the opening of URLs with the built-in file functions disabled (the allow_url_fopen setting in php.ini). Obviously that's not a problem if it's for an internal system.
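For anyone who does need it to run on an unknown host, one possible approach is to check the setting at runtime and fall back to curl. This is a sketch under assumptions rather than a tested recipe: the name fetchUrl is made up, and it returns null whenever neither mechanism works.

```php
<?php
// Hypothetical helper (not from the thread): fetch a URL with whichever
// mechanism this PHP installation allows. Returns the body, or null on failure.
function fetchUrl($url)
{
    // allow_url_fopen controls whether the file functions may open URLs
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        $body = @file_get_contents($url);
        return $body === false ? null : $body;
    }

    // Fall back to curl when the extension is loaded
    if (function_exists('curl_init')) {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
        $body = curl_exec($ch);
        curl_close($ch);
        return $body === false ? null : $body;
    }

    return null; // neither mechanism is available
}
```

The @ in front of file_get_contents() only silences the warning it raises on failure; the null return is what the caller should check.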

judas

Original Poster:

5,992 posts

260 months

Friday 5th October 2007
It's entirely for internal convenience. We have a legacy content management system into which we sometimes need to add additional functionality; the original developer has long since departed and it's easier to build more-or-less standalone add-ons. These functions will let us integrate the two almost seamlessly (from the user's perspective at least) and inject the dynamically created site navigation from the CMS into any new applications.