Fetching a Webpage in WEB-HARVEST(PART:1)

Wednesday, 26 February 2014


Now we'll get to fetching web pages. Look at the following script:

<?xml version="1.0" encoding="UTF-8"?>

<config>
    <!-- Read the date string (e.g. 20121110) from date.txt -->
    <var-def name="datestring">
        <file action="read" path="date.txt"></file>
    </var-def>

    <!-- Fetch the scoreboard page for that date and convert its HTML to XML -->
    <var-def name="webpage">
        <html-to-xml>
            <http url="http://scores.espn.go.com/ncb/scoreboard?date=${datestring}"/>
        </html-to-xml>
    </var-def>

</config>

As before, we read the datestring from a file, and, as in Part 1, we use the http processor to fetch a web page. The html-to-xml processor wrapped around it cleans up the fetched HTML into well-formed XML. But notice the URL:

url="http://scores.espn.go.com/ncb/scoreboard?date=${datestring}"

The end of the URL is "${datestring}". In processor attributes, just as in the template processor, anything enclosed in ${ } is evaluated in JavaScript. Here, "${datestring}" is replaced with the value of the datestring variable, which is the "20121110" we read from the date.txt file. The resulting URL is therefore "http://scores.espn.go.com/ncb/scoreboard?date=20121110", which (as you might have guessed) leads to the college basketball results for November 10, 2012.
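If you want to see the URL being built, or check what actually came back, one option is to build the URL in its own variable with the template processor and then save the cleaned page to a file. The sketch below is a minimal example under a few assumptions: it uses the same Web-Harvest 1.x-style syntax as the script above, the variable name "url" and the output file "webpage.xml" are hypothetical names chosen for illustration, and date.txt is assumed to contain just the date with no trailing newline (otherwise the newline would end up in the URL).

```xml
<?xml version="1.0" encoding="UTF-8"?>

<config>
    <!-- Read the date string (e.g. 20121110) from date.txt -->
    <var-def name="datestring">
        <file action="read" path="date.txt"/>
    </var-def>

    <!-- Hypothetical "url" variable: the template processor evaluates
         ${datestring} the same way the http url attribute does -->
    <var-def name="url">
        <template>http://scores.espn.go.com/ncb/scoreboard?date=${datestring}</template>
    </var-def>

    <!-- Fetch the page from the prebuilt URL and clean it into XML -->
    <var-def name="webpage">
        <html-to-xml>
            <http url="${url}"/>
        </html-to-xml>
    </var-def>

    <!-- Hypothetical debugging step: write the cleaned page to webpage.xml -->
    <file action="write" path="webpage.xml">
        <var name="webpage"/>
    </file>
</config>
```

Splitting the URL into its own variable costs one extra step, but it lets you inspect the exact address before the request is made, and the saved webpage.xml shows the structure you will be pulling data out of in the next part.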

Conclusion

We now have some basic tools for fetching and manipulating data. Next time we'll get to the real work of pulling information out of a web page.

Continue with Part 2.

Unknown Web Developer
