7 CHAPTER 2 | Microsoft SQL Server R Services
multiple database systems. Most vendors extend the basic SQL constructs to take advantage of the
platform it runs on, and in the case of Microsoft SQL Server, this dialect is called Transact-SQL (T-SQL).
T-SQL is used to query, update, and delete data, along with many other functions.
In both R and T-SQL, the developer types commands in a step-wise fashion in an editor window or at
a command-line interface (CLI). But the path of operations is different from that point on. R is an
interpreted language, which means a set of binaries local to the command environment processes the
operations and returns the result directly to the calling program. In SQL Server, the client is separate
from the processing engine. The installation of SQL Server listens on a network interface, and the
client software puts the commands on the network path in a particular protocol. The server receives
this packet with the T-SQL statements only if the packet is “well formed.” The commands are run on
the server, and the results, along with any messages the server sends (such as the number of rows)
and any error messages, are returned to the client over the same protocol. The primary load in this
approach is on the server rather than the workstation. Of course, the workstation might then further
process the data—using Java, C#, or some other local language—but often the business logic is done
at the server level, with its security, performance, and other advantages and controls.
But SQL Server is much more than just a data store. It’s a rich ecostructure of services, tools, and an
advanced language to deal with data of almost any shape and massive size. Many organizations store
the vast amount of their actionable data within SQL Server by using custom and commercial software.
It has more than 36 data types, and gives you the ability to define more.
SQL Server also has fine-grained security features. When these are applied, the data professional can
simply query the data, and only the allowed datasets are returned. This facilitates good separation of
duties, which is highly important in large, complex systems for which one group of professionals
might handle the security of data, and another handles the querying and processing of the data.
SQL Server also has advanced performance features, such as a column-based index, which can provide
extremely fast search and query functions over very large sets of data.
Using R on SQL Server combines the power of the R language (and its many packages) and the
advantages of the SQL Server platform by placing the computation over the data. This means that you
aren’t moving the data to the R system, involving networking, memory on two systems, CPU power on
each side, and other disadvantages—the code operates on the same system as the application data.
Combining R and SQL Server means that the R environment gains not only the functions and features
in the R language, but also the ecostructure, security, and performance of SQL Server, as well as
increased scale. And using R directly on SQL Server means that the R code can save the results of the
operation to a new or existing table for other queries to access and update.
A brief overview of the SQL Server R Services
architecture
The native implementation of open-source R reads data into a data-frame structure, all of which is
held in memory. This means that R is limited to working with data sizes that will fit into the RAM on
the system that processes the data. Another limitation in R is within a few of the core packages that
process certain algorithms, most notably dealing with linear regression math. These native calls can
perform slowly.
SQL Server R Services
To address these limitations (and others), Microsoft R Server brings several major enhancements to
the R platform—Microsoft R Server is what is used in SQL Server R Services. The first enhancement is
the ScaleR library, which allows MRS to “chunk” data stored on permanent storage in either comma-