4 MINUTE READ | June 20, 2017
How to Insert R data frame into SQL
I’ve run into this issue a couple of times during my time at PMG. I’ve used R to perform some kind of analysis, forecasting, clustering, etc., and I need to get the result into our database so anyone on our analytics team could use the output in a SQL query or connect to it in a dashboard.
For example, we wanted to run our own forecast analysis to determine outliers in a time series. Simple enough to do in R, but the end consumer wanted to see it in a dashboard. By inserting the R result into our database, we were able to add the results of the analysis to our dashboard.
Here’s a dummy data frame to get us started.
The first thing we have to do is create a placeholder in your database. Fortunately, our dev team has built an amazing UI that allows us to easily create tables without having to write any code.
For the less fortunate, there are plenty of tutorials out there on how to do this. Just Google it.
Now onto the fun stuff. The trick here is to figure out how to take an R data frame and convert it to a string that matches SQL’s INSERT syntax. (I’m working on Redshift’s version of Postgres. Double check the INSERT syntax for your database and make any needed edits.) This example is small enough where we can manually type out the INSERT statement we are looking for.
We can obviously hard code the “INSERT INTO [tablename] VALUES” part, but the rest of it has to be completely dynamic to accommodate different numbers of rows and columns when we need to do this.
Enter the R code. Our goal is to find R code to reproduce “(‘2017-06-10’, ‘Campaign A’, ‘1’), (‘2017-06-11’, ‘Campaign A’, ‘2’), (‘2017-06-10’, ‘Campaign B’, ‘3’), (‘2017-06-11’, ‘Campaign B’, ‘4’)”.
If we break it down, there are four elements to this:
Each value is closed in single quote marks.
The values within each row of data are separated by commas.
Each row is enclosed in parentheses.
Each row is separated by a comma.
Let’s try solving 1 and 2. We could apply paste0 to each row and collapse the result to combine each row into a single string.
Certainly a start, but we are missing the first and last single quote for each row as well as the parentheses. We are probably going to have to apply a custom function, so let’s recreate what we have so far before adding anything else.
Same output, but we can build on this much easier than our first attempt. Let’s add the parentheses and an extra single quote at the start and end of each row. This is easily done by wrapping an extra paste0 function within our custom function to be applied to each row.
So close! All that is left is to combine these into one single string (instead of the vector of size 4 we have now) and separate by commas. This sounds like another job for paste0! We can wrap our previous attempt up in paste0 and add a collapse argument.
There we have it! Now we can assign the output to an object and put it in yet another paste0 call to add the “INSERT INTO [tablename] VALUES” at the beginning and submit the query to be run.
Sure enough, when I go check my table, everything is there.
Bonus: What if you have NA’s in R that you want to be input as NULL values in your database? Go through the same steps to produce the values object, then add one extra line. I use the str_replace_all function from the stringr package to replace all ‘NA’ values with NULL.
The trick to finding solutions like this is to start small. Once you solve one little piece, move on to the next as we did with the single quotes and commas, then the parentheses, and then the final commas. By the end of it, you have a solution that you might not have thought you could produce.
Stay in touch
Subscribe to our newsletter
Interested in working with us? See our open engineering roles here.
Posted by: Preston Smith
4 MINUTES READ | August 30, 2021
2 MINUTES READ | February 4, 2020
2 MINUTES READ | September 4, 2019