javascript - Trying to scrape dynamic data generated by the Google Maps API on a website, but normal scraping returns blank values - Stack Overflow

I am using Scrapy to scrape job data from this website. One job page looks like this. The static data can easily be scraped with Scrapy, but the dynamic data generated by the Google Maps APIs, such as the "Distance" and "Time" fields, is giving me problems: I get the value "Distance Unknown" for the distance field and a blank value for the time field.

When I open the Chrome developer tools and look at the Network tab, filtered to scripts, I can see a JavaScript request ("DirectionsService.Route") made to Google's Maps API, and all the values I need are there in JSON format.

Is there a way to use Scrapy to get this JSON output generated by the Google Maps API?

If not, is there a way to make a Scrapy script wait for the page to load completely (so that the distance and time values are filled in) and then scrape those values?

1 Answer

The issue is that Scrapy does not execute JavaScript, and the Distance and Time fields are both populated by JavaScript after the page loads.
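One way to act on this inside a spider is to detect when the scraped fields still hold their pre-JavaScript placeholder values, and only then fall back to rendering the page with a JavaScript-capable tool. This is a minimal sketch, assuming the placeholders are exactly the "Distance Unknown" text and the empty time value described in the question; the function name `needs_rendering` and the "rendered" sample values are hypothetical.

```python
def needs_rendering(distance_text, time_text):
    """Return True if the fields still hold the placeholders seen without JavaScript.

    Assumption: an unrendered page yields "Distance Unknown" for distance and an
    empty (or missing) value for time, as reported in the question.
    """
    distance = (distance_text or "").strip()
    time_value = (time_text or "").strip()
    # Either placeholder means the Google Maps JavaScript has not filled the field yet.
    return distance == "Distance Unknown" or time_value == ""


# Values as returned by plain Scrapy (no JavaScript executed):
print(needs_rendering("Distance Unknown", ""))  # True
# Made-up values standing in for a page after its JavaScript has run:
print(needs_rendering("12.3 km", "18 mins"))  # False
```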

You have a few options. You can use Splash (http://splash.readthedocs.org/en/latest/index.html), a JavaScript rendering service made by the same folks as Scrapy, or Selenium with PhantomJS.
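With Splash, the simplest route is its HTTP API: the `render.html` endpoint takes the target page as the `url` argument and a `wait` argument (seconds to wait after the page loads), and returns the HTML after JavaScript has run. The sketch below only builds that request URL; it assumes Splash is running locally on its default port 8050, and the helper name `splash_render_url` and the example job URL are hypothetical.

```python
from urllib.parse import urlencode


def splash_render_url(page_url, wait=2.0, splash_base="http://localhost:8050"):
    """Build a Splash render.html URL that returns the page's HTML after JavaScript runs.

    `wait` gives the Google Maps callbacks time to fill in Distance and Time.
    Assumes a Splash instance is reachable at `splash_base`.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{splash_base}/render.html?{query}"


print(splash_render_url("http://example.com/job/123"))
# http://localhost:8050/render.html?url=http%3A%2F%2Fexample.com%2Fjob%2F123&wait=2.0
```

In a spider you could then yield a normal `Request` to this URL and parse the rendered response with the same selectors you already use; the scrapy-splash package wraps this pattern more cleanly. The right `wait` value depends on how quickly the page's Directions request completes, so treat 2 seconds as a starting guess.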

The Stack Overflow question "selenium with scrapy for dynamic page" has lots of links and information in its answers.

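As for JSON with Scrapy, you can use Python's json module (import json) to load JSON into a Python dictionary, like this:

    import json
    from scrapy import Request

    # These are methods of your spider class:
    def start_requests(self):
        json_url = 'http://www.whatever.com/whatever.json'
        yield Request(json_url, callback=self.parse_json)

    def parse_json(self, response):
        json_dict = json.loads(response.text)  # response.text replaces the deprecated body_as_unicode()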

If the URL you requested returns JSON, the data will now be in a Python dictionary called json_dict.
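Once the data is in `json_dict`, pulling out the distance and time is plain dictionary access. The sketch below assumes the JSON follows the layout of Google's public Directions API web-service response (`routes` → `legs` → `distance`/`duration` → `text`); the payload captured from `DirectionsService.Route` in the browser may be structured differently, so check it first. The function name `route_distance_and_time` and the sample values are made up for illustration.

```python
import json


def route_distance_and_time(json_dict):
    """Return (distance_text, duration_text) for the first leg of the first route.

    Assumes a Directions-API-style layout; returns (None, None) when no route exists.
    """
    routes = json_dict.get("routes") or []
    if not routes or not routes[0].get("legs"):
        return None, None  # e.g. status "ZERO_RESULTS"
    leg = routes[0]["legs"][0]
    return leg["distance"]["text"], leg["duration"]["text"]


# Made-up sample payload, loaded the same way parse_json loads the response.
sample = json.loads(
    '{"routes": [{"legs": [{"distance": {"text": "12.3 km", "value": 12300},'
    ' "duration": {"text": "18 mins", "value": 1080}}]}], "status": "OK"}'
)
print(route_distance_and_time(sample))  # ('12.3 km', '18 mins')
```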
