Wednesday 26 February 2014

Fetching a Web Page in Web-Harvest (Part 1)


Now we'll turn to fetching web pages.  Look at the following script:

<?xml version="1.0" encoding="UTF-8"?>

<config>
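    <!-- Read the date (e.g. 20121110) from a local file -->
    <var-def name="datestring">
        <file action="read" path="date.txt"></file>
    </var-def>

    <!-- Fetch the scoreboard for that date and convert the HTML to XML -->
    <var-def name="webpage">
        <html-to-xml>
            <http url="http://scores.espn.go.com/ncb/scoreboard?date=${datestring}"/>
        </html-to-xml>
    </var-def>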
   
</config>




As before, we read in the datestring from a file, and we use the http processor to fetch a web page.  But notice the url attribute:

url="http://scores.espn.go.com/ncb/scoreboard?date=${datestring}"

The end of the URL is "${datestring}".  In processor attributes, just as in the template processor, anything enclosed in ${ } is evaluated in JavaScript.  In this case, "${datestring}" is replaced with the value of the datestring variable, which is the "20121110" we read from the date.txt file.  So the resulting URL is "http://scores.espn.go.com/ncb/scoreboard?date=20121110".  This leads (as you might have guessed) to the college basketball results from November 10, 2012.
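
To make the substitution easier to see, here is a minimal sketch that builds the URL in its own variable with the template processor and then passes it to http. The variable name "url" is only an illustration and is not part of the original script; the fragment assumes it sits inside the same <config> element, after datestring has been defined.

```xml
<!-- Sketch only: "url" is a hypothetical variable name introduced for illustration -->
<var-def name="url">
    <!-- ${datestring} is replaced with the value read from date.txt -->
    <template>http://scores.espn.go.com/ncb/scoreboard?date=${datestring}</template>
</var-def>

<var-def name="webpage">
    <html-to-xml>
        <!-- The same ${ } syntax works in the url attribute -->
        <http url="${url}"/>
    </html-to-xml>
</var-def>
```

With date.txt containing 20121110, the url variable should hold the same address shown above, so this version should fetch the same page as the original script.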

Conclusion

We now have some basic tools for fetching and manipulating data.  Next time we'll get to the real work of pulling information out of a web page.

Continue with Part 2

Unknown Web Developer
