07/03/2024
I am using Scrapy to scrape job data from this website. One job page looks like this. The static data can easily be scraped with Scrapy, but the dynamic data generated by the Google Maps APIs, such as "Distance" and "Time", is giving me problems: I get the value "Distance Unknown" for the distance field and a blank value for the time field.
When I open the developer console in Chrome and go to the Network tab, with the scripts (JS) filter selected, I can see a JavaScript request ("DirectionsService.Route") made to the Google Maps API, and all the values I need are there in JSON format.
Is there a way to use Scrapy to get this JSON output generated by the Google Maps APIs?
If not, is there a way to make a Scrapy script wait for the page to load completely (so that the distance and time values are populated) and then scrape these values?
1 Answer
The issue is that Scrapy does not render JavaScript, and the Distance and Time fields are both populated by JavaScript.
You have a few options: you can use Splash (http://splash.readthedocs.org/en/latest/index.html), made by the same folks as Scrapy, or Selenium/PhantomJS.
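A minimal sketch of the Splash route, assuming a Splash instance is running locally on its default port 8050 (for example started with `docker run -p 8050:8050 scrapinghub/splash`). Splash's HTTP API exposes a `render.html` endpoint that loads a page, runs its JavaScript, optionally waits `wait` extra seconds, and returns the resulting HTML. The helper name `splash_render_url` and the 2-second default are illustrative choices, not part of the original answer.

```python
from urllib.parse import urlencode

# Assumption: Splash is reachable locally on its default port.
SPLASH_RENDER = "http://localhost:8050/render.html"


def splash_render_url(page_url: str, wait: float = 2.0) -> str:
    """Build a Splash render.html URL (hypothetical helper).

    Splash fetches `page_url`, executes its JavaScript, waits `wait`
    extra seconds so late values (e.g. distance/time) can appear,
    and returns the rendered HTML.
    """
    query = urlencode({"url": page_url, "wait": wait})
    return f"{SPLASH_RENDER}?{query}"
```

Inside a spider you would then `yield scrapy.Request(splash_render_url(job_url), callback=self.parse_job)` and use the usual selectors on the rendered HTML. The scrapy-splash plugin wraps the same idea as `SplashRequest(url, self.parse, args={'wait': 2})` if you prefer not to build URLs by hand.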
The question "selenium with scrapy for dynamic page" has lots of links and info in its answer.
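With Selenium, "wait until the page has finished loading" usually means polling an element until the JavaScript has filled it in. Selenium's own `WebDriverWait(driver, 10).until(...)` (from `selenium.webdriver.support.ui`) does this for you; the pure-Python sketch below only illustrates the same polling idea so it runs without a browser. The `read` callable is hypothetical: with Selenium it might be `lambda: driver.find_element(By.ID, "distance").text`, where the `"distance"` locator is an assumption about the job page's markup.

```python
import time
from typing import Callable, Optional


def wait_for_value(read: Callable[[], Optional[str]], timeout: float = 10.0,
                   poll: float = 0.5, placeholder: str = "Distance Unknown") -> str:
    """Poll `read()` until it returns a non-blank value other than `placeholder`.

    Blank or None results (like the empty Time field) keep the loop going.
    Raises TimeoutError if no real value appears within `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        value = read()
        if value and value.strip() != placeholder:
            return value.strip()
        if time.monotonic() >= deadline:
            raise TimeoutError(f"value still {value!r} after {timeout} s")
        time.sleep(poll)
```

Polling for the specific field is more reliable than sleeping for a fixed time, because the Google Maps request may finish faster or slower than any guessed delay.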
As for JSON with Scrapy, you can use Python's json library (import json) to load JSON into a Python dictionary, like this:
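```python
import json
import scrapy

class JobsSpider(scrapy.Spider):
    name = 'jobs'

    def parse(self, response):
        json_url = 'http://www.whatever.com/whatever.json'
        yield scrapy.Request(json_url, callback=self.parse_json)

    def parse_json(self, response):
        # response.text replaces body_as_unicode(), deprecated and later removed
        json_dict = json.loads(response.text)
```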
If the URL you yielded returns JSON, the data will now be in a Python dictionary called json_dict.
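Applied to this question, one option that skips rendering entirely is to request Google's Directions web service yourself and read the distance and time out of the JSON. This is a sketch under assumptions: you have your own Google Maps API key, you can scrape the origin and destination from the static job page, and the helper names `directions_url` and `parse_distance_time` are hypothetical. The `routes[0].legs[0].distance.text` / `duration.text` path follows Google's documented Directions response format.

```python
import json
from typing import Tuple
from urllib.parse import urlencode

DIRECTIONS_URL = "https://maps.googleapis.com/maps/api/directions/json"


def directions_url(origin: str, destination: str, api_key: str) -> str:
    """Build a Directions web service URL (assumes you supply your own key)."""
    query = urlencode({"origin": origin, "destination": destination, "key": api_key})
    return f"{DIRECTIONS_URL}?{query}"


def parse_distance_time(body: str) -> Tuple[str, str]:
    """Return (distance text, duration text) from the first leg of the first route."""
    data = json.loads(body)
    if data.get("status") != "OK":
        raise ValueError(f"Directions request failed: {data.get('status')}")
    leg = data["routes"][0]["legs"][0]
    return leg["distance"]["text"], leg["duration"]["text"]
```

In the spider, `yield scrapy.Request(directions_url(origin, destination, key), callback=self.parse_json)` and call `parse_distance_time(response.text)` inside `parse_json`.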