ASP to PHP Question
Anyone proficient with both ASP and PHP? I'm looking for a way to convert an ASP function I have into PHP. The function in question uses a Microsoft.XMLHTTP object to grab the source HTML of another web page and then chops bits out to include in the new page. I don't know much (read: nothing) about PHP and I'm wondering if there's an equivalent way of grabbing the HTML and performing the same treatment on it.
Ta!
There is a library called curl that will do what you need. Here's an example script: it takes the HTML of a given URL (in this case http://www.pistonheads.co.uk), changes the string "Pistonheads" to "Example" and outputs the result to the screen.
<?php
// Create a new curl resource
$ch = curl_init();
// Set URL and other appropriate options
curl_setopt($ch,CURLOPT_URL,"http://www.pistonheads.co.uk");
// Set a parameter to return the page rather than immediately print it on the screen
curl_setopt($ch,CURLOPT_RETURNTRANSFER,true);
// Perform the curl command using the options set above
$output = curl_exec($ch);
// Close curl and free up system resources
curl_close($ch);
// Replace "Pistonheads" with "Example"
$output = str_replace("Pistonheads","Example",$output);
echo $output;
?>
Edited by incubus on Friday 5th October 14:37
Another example. This will take the HTML from a given page (http://www.pistonheads.com), strip out HTML tags, JavaScript and comments, and print the raw text to the screen.
{{{
<?php
// Fetch a page with curl and reduce it to plain text
function webpage2txt($url)
{
    $user_agent = "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT 5.0)";
    $ch = curl_init(); // initialize curl handle
    curl_setopt($ch, CURLOPT_URL, $url); // set URL to fetch
    curl_setopt($ch, CURLOPT_FAILONERROR, 1); // fail on HTTP errors (status 400 and above)
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // allow redirects
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return into a variable
    curl_setopt($ch, CURLOPT_PORT, 80); // set the port number
    curl_setopt($ch, CURLOPT_TIMEOUT, 15); // times out after 15s
    curl_setopt($ch, CURLOPT_USERAGENT, $user_agent);
    $document = curl_exec($ch);
    curl_close($ch);
    if ($document === false) {
        return ""; // the request failed, so there is nothing to convert
    }
    $search = array(
        '@<script[^>]*?>.*?</script>@si', // Strip out javascript
        '@<style[^>]*?>.*?</style>@si', // Strip style tags (no U modifier: it would make .*? greedy)
        '@<[\/\!]*?[^<>]*?>@si', // Strip out HTML tags
        '@<![\s\S]*?--[ \t\n\r]*>@', // Strip multi-line comments including CDATA
        '/\s{2,}/', // Collapse runs of whitespace
    );
    $text = preg_replace($search, "\n", html_entity_decode($document));
    $pat[0] = "/^\s+/";
    $pat[2] = "/\s+\$/";
    $rep[0] = "";
    $rep[2] = " ";
    $text = preg_replace($pat, $rep, trim($text));
    return $text;
}
$getText = webpage2txt("http://www.pistonheads.com");
echo $getText;
?>
}}}
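To try the stripping step without making a network request, here is a minimal sketch that applies the same idea to a string already in memory. The function name `html2text` and the sample markup are assumptions made up for illustration, not part of the script above. It removes script blocks, style blocks and comments with regular expressions, then leaves the remaining tags to PHP's built-in `strip_tags`.

```php
<?php
// Hypothetical helper (name is an assumption): reduce an HTML string to plain text.
function html2text($html)
{
    $search = array(
        '@<script[^>]*>.*?</script>@si', // script blocks, including their contents
        '@<style[^>]*>.*?</style>@si',   // style blocks, including their contents
        '@<!--.*?-->@s',                 // HTML comments
    );
    // Replace each removed block with a space so neighbouring words do not run together
    $text = preg_replace($search, ' ', $html);
    // Strip the remaining tags, then decode entities such as &amp;
    $text = html_entity_decode(strip_tags($text));
    // Collapse runs of whitespace into single spaces
    return trim(preg_replace('/\s+/', ' ', $text));
}

// Made-up markup for illustration
$sample = '<html><head><style>p { color: red; }</style></head>'
        . '<body><!-- nav --><p>Fish &amp; chips</p>'
        . '<script>alert("hi");</script><p>Second line</p></body></html>';
echo html2text($sample), "\n";
// prints: Fish & chips Second line
```

Decoding entities after the tags have been stripped means escaped text such as `&lt;b&gt;` survives as the literal characters `<b>` rather than being mistaken for a tag and removed.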
Edited by incubus on Friday 5th October 14:38
Thanks for that!
After some Googling I found a function called file_get_contents that does the trick. What I've done is very basic (I said I don't know PHP!) but I'll get our semi-resident PHP chap to knock it into shape when he gets back in next week.
This is my effort:
{{{
<?php
// Fetch $URL and return the HTML found between $StartTag and $EndTag
function getHTML($URL, $StartTag, $EndTag) {
    $Source = file_get_contents($URL);
    // Position of the first character after the start marker
    $StartTagPos = strpos($Source, $StartTag) + strlen($StartTag);
    // Length of the wanted fragment; the end marker is searched for only after the start marker
    $Length = strpos($Source, $EndTag, $StartTagPos) - $StartTagPos;
    return substr($Source, $StartTagPos, $Length);
}
$theStart = "<!--NavStart -->";
$theEnd = "<!--NavEnd -->";
$theSite = "http://thepageIwant.com";
echo getHTML($theSite, $theStart, $theEnd);
?>
}}}
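The extraction step can also be tried separately from the fetch, on a plain string. The sketch below is one way to do that, under stated assumptions: `extractBetween` is a hypothetical name, the sample page source is made up, and returning `null` when a marker is missing is a design choice of this sketch rather than anything from the function above.

```php
<?php
// Hypothetical helper (name is an assumption): return the text between two
// markers, or null if either marker cannot be found.
function extractBetween($source, $startTag, $endTag)
{
    $start = strpos($source, $startTag);
    if ($start === false) {
        return null; // start marker not found
    }
    $start += strlen($startTag); // move past the start marker itself

    // Search for the end marker from $start onwards, not from the top of the page
    $end = strpos($source, $endTag, $start);
    if ($end === false) {
        return null; // end marker not found
    }
    return substr($source, $start, $end - $start);
}

// Made-up page source for illustration
$page = '<body><!--NavStart --><ul><li>Home</li></ul><!--NavEnd --><p>Content</p></body>';
echo extractBetween($page, '<!--NavStart -->', '<!--NavEnd -->'), "\n";
// prints: <ul><li>Home</li></ul>
```

Checking `strpos` against `false` with `===` matters because a marker at the very start of the page is found at position 0, which a loose comparison would treat as "not found".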
Edited by judas on Friday 5th October 16:41
It's entirely for internal convenience. We have a legacy content management system to which we sometimes need to add functionality; the original developer has long since departed and it's easier to build more-or-less standalone add-ons. These functions will let us integrate the two almost seamlessly (from the user's perspective at least) and inject the dynamically created site navigation from the CMS into any new applications.