Recently I was assigned some web scraping and data scraping work. I did some research and found that it is fairly easy, but a somewhat cumbersome job.

I managed to use simple classic VBScript programming to read and save each web page and pull the useful data out with regular expressions. However, this is not a one-time job: you need to analyze the HTML of each page first before you can extract the correct data in the correct way. Pages usually break their own patterns in many places, so you end up fixing a lot of exceptions.
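To give an idea of what this looks like, here is a minimal sketch in Python rather than my original VBScript. The sample HTML, the class names `name` and `price`, and the helper `extract_rows` are all made up for illustration; a real page needs its own pattern after you inspect its markup. The sample deliberately includes the kind of irregularities mentioned above (mixed-case tags, single quotes, stray spaces, a missing price).

```python
import re

# Hypothetical saved page; a real scraper would download and save the HTML first.
SAMPLE_HTML = """
<table>
  <tr><td class="name">Widget A</td><td class="price">$12.50</td></tr>
  <tr><TD class='name'>Widget B</TD><td class="price"> $8 </td></tr>
  <tr><td class="name">Widget C</td><td class="price">N/A</td></tr>
</table>
"""

# Tolerate upper/lower-case tags and either quote style around attribute values.
ROW_RE = re.compile(
    r"""<td\s+class=["']name["']>(?P<name>.*?)</td>\s*"""
    r"""<td\s+class=["']price["']>(?P<price>.*?)</td>""",
    re.IGNORECASE | re.DOTALL,
)

def extract_rows(html):
    """Return (name, price) tuples; price is None when the cell is not a dollar amount."""
    rows = []
    for m in ROW_RE.finditer(html):
        name = m.group("name").strip()
        raw = m.group("price").strip()
        price_match = re.fullmatch(r"\$(\d+(?:\.\d+)?)", raw)
        price = float(price_match.group(1)) if price_match else None
        rows.append((name, price))
    return rows

rows = extract_rows(SAMPLE_HTML)
for name, price in rows:
    print(name, price)
```

Every exception handled here (case, quotes, whitespace, "N/A") had to be found by looking at the page first, which is where most of the effort goes.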

There is also a big problem with this kind of web scraping: your code cannot trigger the JavaScript functions in a web page, which means you cannot read content that is generated by JavaScript or loaded through Ajax.
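A small sketch of why this happens, again in Python and with a made-up page: the downloaded HTML contains only the empty placeholder and the script source, not the value the script would write in a browser. The `price` element and the `read_price` helper are hypothetical.

```python
import re

# Hypothetical page source as a plain HTTP download would return it.
# In a browser the script fills the <div>; in the raw source the <div> is empty.
STATIC_HTML = """
<div id="price"></div>
<script>
  document.getElementById("price").textContent = "$" + (10 + 2.5);
</script>
"""

PRICE_DIV_RE = re.compile(r'<div id="price">(.*?)</div>', re.DOTALL)

def read_price(html):
    """Return the text inside the price <div>, or None if the <div> is missing."""
    m = PRICE_DIV_RE.search(html)
    return m.group(1).strip() if m else None

price_text = read_price(STATIC_HTML)
print(repr(price_text))  # empty string: the script never ran
```

The regular expression matches correctly, but there is nothing to capture because nothing executed the script.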


Luckily, I found a tool that solves these problems: DJuggler Builder. (Actually I found two. The other one is iMacros, but it usually still takes a fair amount of programming effort.)

With DJuggler, you don’t need to do any programming, since it provides a lot of built-in components that you can drag and drop to quickly get the useful data back to your computer or database. It also supports “JavaScript function triggering” and “Mouse Action Recording”.

Best of all, a project can also be compiled into an exe file for other specific purposes. I suggest you try this tool if you are doing web scraping; it will save you a lot of programming and debugging time.
