Database query tuning is mostly about getting better plans. Mostly, but not always. Sometimes the problem has nothing to do with the plan, and you may need to get a bit creative to find a solution. In a recent case, a query performed decently when run from SQL Developer, but took about 5 times longer to complete when run from R. The plan was the same in both cases, so I knew the plan itself was not the issue. The R session wasn't showing as active most of the time, so it was fairly clear that the problem was in fetching the data: the client was fetching too few rows at a time, which led to a large number of network roundtrips and, consequently, high waits on the "idle" event "SQL*Net message from client".
I didn't find any ODBC settings to control the fetch size, but the R function that executed the query and fetched the results turned out to have a "rows_at_once" parameter, which did exactly that. The problem, however, was that its maximum value was 1000, which wasn't enough: it still took almost 3 minutes to fetch all 2 million rows.
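(For what it's worth, if the connection goes through the RODBC package, the analogous knob is the rows_at_time argument, which is capped at 1024. The sketch below is an assumption on my part, not my actual code: the DSN, credentials and table name are made up.)

```r
library(RODBC)

# rows_at_time sets how many rows are fetched per roundtrip;
# RODBC caps it at 1024.
ch <- odbcConnect("my_dsn", uid = "scott", pwd = "tiger",
                  rows_at_time = 1024)

dataset <- sqlQuery(ch, "select ts, key, value from my_timeseries")
odbcClose(ch)
```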
Now let me say a few words about the data I was pulling from the database. It was a collection of time series for several variables, in the form: timestamp, key, value. After getting the data, I did a bit of processing and plotted the time series.
So in a situation like this, what are the options for improving performance? If too much time is spent on network roundtrips, the logical thing is either to reduce the number of roundtrips or to make the average roundtrip faster. Neither seems particularly easy: the natural way to reduce the number of roundtrips is to increase the number of rows fetched at a time, but we have already hit a limit there. Making the roundtrips themselves faster is, in all likelihood, even more difficult.
But in this particular case there was a third, less obvious option. Remember, I was loading the data into R, a powerful and feature-rich data processing and visualization system. Many things are as easy, or even easier, to do in R than in Oracle, which makes Oracle + R a great combination: you can move pieces of your data analysis from Oracle to R or the other way around as you like.
Another thing to notice was that I was pulling a "tall" data set (few columns, many rows). Both Oracle and R can easily transform it into a "fat" (wide) one: the operation is called "pivoting" in Oracle and "casting" in R (e.g. dcast from the reshape2 package); the reverse operation, "unpivoting" in Oracle or "melting" in R, is equally easy. And as it happened, I needed a "fat" data set for my purposes anyway; I had simply chosen to make the transformation at the very last stage of my analysis, after some additional filtering. But I could be flexible and make this transformation earlier, thus effectively reducing the number of rows transferred.
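To make the terminology concrete, here is a self-contained toy example of both directions using the reshape2 package (the column names and values are made up):

```r
library(reshape2)

# A toy "tall" data set: timestamp, key, value.
tall <- data.frame(
  time  = c(1, 1, 2, 2),
  key   = c("key1", "key2", "key1", "key2"),
  value = c(10, 20, 30, 40)
)

# Tall -> fat ("casting"): one column per key.
fat <- dcast(tall, time ~ key, value.var = "value")
# fat now has columns time, key1, key2 -- 2 rows instead of 4.

# Fat -> tall ("melting"): back to the original shape.
tall_again <- melt(fat, id.vars = "time",
                   variable.name = "key", value.name = "value")
```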
So I changed the query, using the PIVOT operator, to something like
with a as (<original query>) select * from a pivot (avg(value) value for key in ('key1', 'key2', ... 'keyN'))
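With close to 700 distinct keys, typing the IN list by hand is not realistic, so it is easier to generate the statement. A hedged sketch, assuming an open RODBC connection ch and a made-up table my_timeseries (Oracle returns column names in upper case by default):

```r
# Build the PIVOT in-list from the distinct keys, then run the query.
keys <- sqlQuery(ch, "select distinct key from my_timeseries order by key")$KEY
in_list <- paste(sprintf("'%s'", keys), collapse = ", ")

sql <- sprintf(
  "with a as (select ts, key, value from my_timeseries)
   select * from a pivot (avg(value) value for key in (%s))",
  in_list)

dataset <- sqlQuery(ch, sql)
```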
which changed the original 3-column by 2-million-row data set into a 680-column by 3000-row one, reducing the time to fetch all results from about 3 minutes to 15-20 seconds (which makes sense: 2 million rows at 1000 rows per roundtrip is about 2000 roundtrips, while 3000 rows need only a handful). Unpivoting (melting) it back into the tall form in R was a matter of a single line of code, something like
melt(dataset, id.vars = "time", variable.name = "key", value.name = "value")
and it only took a couple of seconds to complete.
So when you have the flexibility (as I did), you can use pivoting to reduce the number of rows in the data set you are fetching. It won't change the actual amount of data, but it lets you "repackage" it into a different shape and achieve significant savings on fetch time.