Tech bloggers (myself included) tend to like writing about big issues, with massive impact, company’s reputation (or in extreme cases even very existence) at stake, etc. This adds dramatism to the story (and as an added bonus, helps us feel more important). I think it can also lead to a certain bias, making the IT community believe that only major issues matter. Whenever you spot a relatively harmless anomaly and try to find out more about it on a DBA forum, “is there an actual problem you’re trying to solve” is a very likely response. And if you honestly answer “not really”, you can expect a lecture on “obsessive tuning disorder” and a suggestion to stick to a simple principle, “ain’t broken don’t fix it”. I think this mentality was initially a sound reaction to some inquisitive minds trying to solve non-issues and occasionally creating issues out of nothing instead. When taking too far, however, this attitude becomes dangerous. Anomalies are important even without an immediate impact. Sometimes they are important even without any impact at all. In this post, I’d like to illustrate it with an example.Continue reading “Performance monitoring and anomalies”
In my last blog post I covered some details of our recent battle with memory fragmentation problems on an OL6 server (Exadata compute node). It was mostly focused around page cache growth which was the main scenario. However, in addition to that, there was also a secondary scenario that had a completely different mechanism, and I will describe it in this post.Continue reading “Memory fragmentation via inode cache growth”
Last year I’ve spent quite some time tackling various memory fragmentation issues on an Exadata cluster (I’ve described some of my experiences here and here). In the end, everything was resolved, and the symptoms went away, but only to come back in a different form a few months later. First we had a minor episode months ago — some applications experienced timeouts when trying to connect, and there was a massive spike in Load Average metric with hundreds of processes in the “D” state stuck on rtnetlink_rcv and rtnl_lock. Then everything became stable again, until a few weeks ago, when the same symptoms came back, but the impact became much more severe, with node evictions, reboots and failovers causing serious disruption for the application.Continue reading “Memory fragmentation via buffered file I/O”
Oracle Compute Infrastructure (OCI) is the next generation cloud platform run by Oracle. Like some of its competitors, Oracle offers an always free tier where you can access a range of cloud products and services without having to pay anything ever. And while you will need to provide credit card information to prove that you are not a bot, it will not going to be charged unless you explicitly upgrade to a paid account.
Over last few years, I’ve put together a few utilities in R to visualise database performance data. One that was particularly useful for me is my own version of ASH viewer. I think it could be useful for many other DBAs and developers who deal with performance optimisation topics frequently enough, so I finally published it on github.
A very brief note to alert the community of a nasty JDBC bug affecting INSERT performance. It was noticed by our Java developers after upgrading their JDBC driver 220.127.116.11.0 to version 18.104.22.168.190416DBRU. They were inserting data in batches of 5,000 rows at a time, 250,000 total, and the time to process the entire batch went up from 16 to 102 minutes.
A few illustrations of patterns to look for when using Wireshark to understand poor network performance. I’ve already touched upon this topic in the past, but this time I just want to share a couple of screenshots with a few comments.
Loading data from flat files into an Oracle database is a very common task. Oracle’s implementation of external tables is fantastic, and in many cases it simplifies the job to such a degree that the developer is left with very little to do: just write a “create table” statement with a few additional details about the file’s structure and that’s pretty much it. Even if the information in the file is not in a simple format (like comma-separated or tab-delimited), this doesn’t make things much more complicated, as you can e.g. load the raw text and then use regex functions to process it.
So I’ve been using this feature in a broad variety of situations (some of them I covered in this blog, e.g. here), and one problem that I occasionally incur is that performance isn’t always great. For example, here is the DDL of what I use to parse listener log files:
In my previous posts (e.g. here and here) I showed how to use ps output (e.g. from ExaWatcher) visualization to spot performance problems in Linux. Here I’d like to show that this approach can be taken a little bit further, namely, to find the source of increase in memory usage.
The R code for this is quite straightforward. I also think it shouldn’t be much of a problem to do the same in Python, although I haven’t gotten around to try it myself. In the code below read_ps is the function that reads in a ps output file from ExaWatcher without unzipping it, sum_and_tidy does the aggregation, and visualize_rss does the plotting. There is also an auxillary function keep_top_n which is needed to keep the number of color bands to something reasonable.