Free your PDFs! Introduction to Tabula with Manuel Aristarán

In preparation for National Day of Civic Hacking, we wanted to show off Tabula, a tool that helps liberate table data from PDFs. Tabula is an open source tool built by Manuel Aristarán with the help of ProPublica, La Nación DATA and Knight-Mozilla OpenNews. We sat down with Aristarán to talk about the app and give a short demo.

When you first open Tabula, you’re given the option to load PDFs into the system. For this example, we’ve taken the monthly veterans report from the Illinois Department of Employment Security (currently only available in PDF) and loaded it into Tabula.

Once you upload it, Tabula will process the file. This can take a little time, depending on the size of the file. Once it’s loaded, you simply draw rectangles over the tables in the PDF.

From there, Tabula will show you the data that it’s captured. Now you can copy the data to the clipboard (so you can paste it into Excel) or download it to your local machine as a file. It’s that simple.

You can find more information on how Tabula works and download a copy for yourself on their website at tabula.nerdpower.org.
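
If you go the download route, the file can also be picked up by a script instead of a spreadsheet. Here is a minimal sketch in Python, assuming the download is a CSV file. The `read_table` helper and the sample rows are made up for illustration; they are not part of Tabula and are not real figures from the veterans report.

```python
import csv
import io


def read_table(csv_text):
    """Parse CSV text into a list of rows (each row is a list of strings)."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    # Drop rows whose cells are all empty, which can appear when a
    # selection rectangle includes blank space around the table.
    return [row for row in rows if any(cell.strip() for cell in row)]


# Made-up sample standing in for an exported file (assumption, not real data).
sample = "Region,Count\nNorth,10\n\nSouth,20\n"

for row in read_table(sample):
    print(row)
```

To use it on a real download, read the file's contents into a string (for example with `open(path, newline="").read()`) and pass that to `read_table`.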