How to scrape data with import.io – Join in at JoziHub

Few things excite and chill a journalist in equal measure quite like the words “big data”. On the one hand, our world is being deluged by a downpour of data that tells us everything from who’s investing in whom to how many people pass through an e-toll gantry at night: turn your umbrella upside down, catch enough data, and you can learn almost anything.

On the other hand, data without context is just numbers, often out of order and in a format you can’t use. Learning to ‘scrape’ data, which means collating information from publicly available web sources and putting it into a format you can use, is hard. How exactly do you turn that PDF of expense lines into a searchable spreadsheet from which you can extract meaning?

If only there were a fairly simple tool into which you could feed a webpage at one end and watch rows of orderly digits, ready to be analysed, come out of the other. What’s that you say? There is?

import.io and Tabula are two extraordinarily powerful but relatively straightforward tools that can strip the data out of a webpage or PDF in minutes. In the right hands, they cut out hours of work spent manually sorting through data or wrestling it into a usable form.

And you can learn to use them with Hacks/Hackers Johannesburg next week.

On Tuesday 18th at 6pm at JoziHub, 44 Stanley, Hacks/Hackers Johannesburg is kicking off a new program of events looking at the data pipeline for storytelling. The event is open to anyone involved in journalism, investigation, data visualisation, design or data science who wants to learn how to use import.io and Tabula to pull data from reluctant sources. We’ll be there, and we hope you will be too.

Full details over here.
