Most people in digital marketing have so much data available at their fingertips now that working with databases is a necessity. On the SEO team, site crawls, performance numbers, and other data can be worked with in Microsoft Excel, but often it goes into our internal data warehouse where much of the work can be automated.
For this example, we’re going to look at a middle ground between standalone Excel and an automated data warehouse system to handle larger site crawl data when you don’t have access to a team of developers.
1. Collecting a Large Amount of Data with Crawlers
One thing we have done is setup servers that allow us to crawl hundreds of thousands of URLs from very large sites. Start with running a crawler (such as Screaming Frog) and downloading all of the data on a schedule (daily, weekly, or monthly). Amazon AWS is an option here, or buy a dedicated desktop with a lot of RAM.
Once you’re done, export a simple file like a CSV.
2. Take the Data into a Database
Excel can only handle a million or so rows before it keels over. We’ll use a database to store all of the data, but get smaller chunks of data from our it that can still fit into Excel.
Install a database like MySQL (free) or Microsoft Access (pretty cheap). Internally, we use a different database type, but the good news is that databases pretty much work the same regardless of which one you use.
Create a table and name it “site-crawl” (or “i-luv-kittens,” whatever works for you). Before importing it, adjust your crawl file by adding a column for the date of each crawl so you can discover trends. Then, use the Access wizard tool or MySQL import tool to upload your CSV from step 1.
3. Query the Database
Here’s an example you can do to to get you started. In MySQL or Access, you can save that as a view/report.
And now you’ve got trended data of the site’s 404 errors.
4. Buildout the Data in a Useful Format
Last, put that data into a format that’s easier to work and build graphs. You could use a tool like Tableau, but in this example using Excel.
Connect your database from step 3 first. Start a new notebook and then click the “Data” tab. Use “From Access” for Access and the “Connections” button for MySQL.
Set up either connection, which the wizards will help with. Once you have that, you can either select a view you’ve setup, or paste the SQL query you wrote in the previous step directly into Excel. You should get a table with all of your data, small enough that Excel won’t crash.
Now you can set up a graph of that data which refreshes from your database anytime you use it.