Robert's Data Science Blog

Connect to Azure Databases from R

R has great options for connecting to SQL databases through the DBI for R project. RStudio also has great documentation on Databases using R, in particular the section “Best Practices”. The major lines of this post is based on the RStudio documentation, but I’ve added some details that I spent time resolving.

When connecting to SQL databases in Azure I use the DBI package and the odbc package. As explained in the “Best Practices” above, it is not desirable to enter database information and credentials directly in a connection string. That is, don’t do like this:

con <- DBI::dbConnect(odbc::odbc(),
	server = "<server name>.database.windows.net",
	database = "<database name>",
	username = "<user>",
	password = "<password>",
	port = 1433
)
The security credentials are entered in your history for the world to see and it is quite verbose. Instead, rely on a Data Source Name (DSN).

Windows

On Windows I use the “ODBC Data Source Administrator”:

ODBC Data Source Administrator

A DSN with name “MyConnection” is then used in R as

con <- DBI::dbConnect(odbc::odbc(), dsn = "MyConnection")
At least when using Active Directory for authentication, there is no need for further authentication details in R.

You can test the connection with DBI::dbIsValid(con).

Linux

On Linux (which is relevant when using Shiny Server) I rely on the Free TDS drivers. On Ubuntu these are installed by running the following commands:

apt-get install unixodbc unixodbc-dev --install-suggests
apt-get install tdsodbc
I have tried to make other drivers work, but with no luck.

The ODBC drivers are specified in the file /etc/odbcinst.ini:

[FreeTDS]
Description = FreeTDS
Driver = /usr/lib/x86_64-linux-gnu/odbc/libtdsodbc.so
Setup = /usr/lib/x86_64-linux-gnu/odbc/libtdsS.so
Trace = no
The Driver and Setup files may be in different locations on other systems, but at least the name of the shared libraries should be the same.

A connection is specified in the file /etc/odbc.ini:

[MyConnection]
Driver = FreeTDS
Server = <server name>.database.windows.net
Database = <database name>
Port = 1433
Encrypt = yes
TDS_Version = 7.0
Timeout = 60

I can only make a connection work when TDS_Version is 7.0 – using 7.x with x different from 0 has not worked for me.

In Azure one can make Azure Active Directory the only allowed form of authentication. So if you are having problems accessing a database, check if access with username & password is allowed.

You can also enter Username and Password in odbc.ini, but there is no point: They are not read by R. That is, you need a connection string like:

con <- DBI::dbConnect(odbc::odbc(), dsn = "MyConnection", username = "<username>", password = "<password>")
Again, this is a bad idea. One option that at least makes this line of code safe and usable for others is to store username/password in a (local) .Renviron file:

db_username = "<username>"
db_password = "<password>"
This info can be retrieved with Sys.getenv("db_username") and Sys.getenv("db_password"). (Check the references for Renviron in a previous post.)

Differences between Linux and Windows

One thing that has caused me problems is that a SQL script passed to a SQL database in Azure through R is not executed in the same manner on Windows and Linux (with the setup I have described).

As a simple example, I often use SQL Server Management Studio (SSMS) on Windows to write and test SQL scripts (when the dbplyr package doesn’t suffice). In SSMS the tables in a database are listed as <prefix>.<tablename>. In R, the command DBI::dbListTables(con) returns the tables without the <prefix>. When executing a SQL script through DBI::dbSendQuery with <prefix> it works on Windows, but not on Linux.