Matrices
In this report we will look at how R treats matrices. We will explore the Matrix package, which extends the basic R functionality for matrices. For example, it seems odd that base R does not have a method for determining the rank of a matrix up to a certain tolerance; the Matrix package adds this functionality.
Dense matrices
A matrix is a two-dimensional data structure in R. By default R fills a matrix by columns; we can change this behavior by setting byrow=TRUE. We can assign names to the rows and columns of a matrix, and easily change these names by redefining colnames(<matrix>) and rownames(<matrix>). This is shown below.
# matrices are defined by columns
m <- matrix(1:4, 2, 2)
m
## [,1] [,2]
## [1,] 1 3
## [2,] 2 4
# print the class of the matrix object
class(m)
## [1] "matrix" "array"
# we can observe attributes
attributes(m)
## $dim
## [1] 2 2
# we can print this attribute
dim(m)
## [1] 2 2
# we can assign row and column names when creating the matrix
m <- matrix(c(rnorm(2,0,1), rnorm(2, 1, 1)), 2, 2,
dimnames = list(c("obs 1", "obs 2"), c("r.v. 1", "r.v. 2")))
m
## r.v. 1 r.v. 2
## obs 1 0.5952221 1.4344718
## obs 2 1.0713101 0.8299451
# print colnames
colnames(m)
## [1] "r.v. 1" "r.v. 2"
# print rownames
rownames(m)
## [1] "obs 1" "obs 2"
# redefine colnames
colnames(m) <- c("new col name 1", "new col name 2")
# redefine rownames
rownames(m) <- c("new row name 1", "new row name 2")
# print new matrix
m
## new col name 1 new col name 2
## new row name 1 0.5952221 1.4344718
## new row name 2 1.0713101 0.8299451
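As a quick illustration of the byrow argument mentioned above:

# filling by row instead of by column
matrix(1:4, 2, 2, byrow = TRUE)
## [,1] [,2]
## [1,] 1 2
## [2,] 3 4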
It is perhaps surprising that when we select a single row or column of a matrix, R returns a vector and loses the matrix dimensions. When selecting a row of a \(2 \times 2\) matrix we might expect a \(1 \times 2\) matrix in return.
# select row
m[1,]
## new col name 1 new col name 2
## 0.5952221 1.4344718
# check dimensions of row
dim(m[1,])
## NULL
To change this behavior, we can specify drop=F.
# select row
m[1,,drop=F]
## new col name 1 new col name 2
## new row name 1 0.5952221 1.434472
# check dimensions
dim(m[1,,drop=F])
## [1] 1 2
This is slightly inconvenient, but it allows us to maintain the dimensions of a matrix when indexing. The same applies to columns: selecting a column of a matrix returns a plain vector rather than a column vector unless we specify drop=F.
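A minimal check for the column case:

# column selection also drops dimensions unless drop=F
dim(m[, 1])
## NULL
dim(m[, 1, drop = F])
## [1] 2 1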
R also has higher-dimensional data structures called arrays. We can think of an array as stacked matrices. For example, specifying dimension (4,5,6) creates 6 matrices, each with 4 rows and 5 columns.

Below we specify an array of dimension (2,2,3). Recall that R fills a matrix by columns; an array is filled the same way, with the earliest index varying fastest. We will demonstrate this.
arr <- array(1:3, c(2,2,3))
dim(arr)
## [1] 2 2 3
arr[,,1]
## [,1] [,2]
## [1,] 1 3
## [2,] 2 1
arr[,,2]
## [,1] [,2]
## [1,] 2 1
## [2,] 3 2
arr[,,3]
## [,1] [,2]
## [1,] 3 2
## [2,] 1 3
# We can also select the rows or columns
arr[1,,]
## [,1] [,2] [,3]
## [1,] 1 2 3
## [2,] 3 1 2
Note that arr[1,,] selects the first row of each slice, namely \((1,3)\), \((2,1)\), and \((3,2)\), and arranges them as the columns of the resulting matrix.
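We can verify this directly:

# each first row becomes a column of arr[1,,]
all(arr[1,,] == cbind(arr[1,,1], arr[1,,2], arr[1,,3]))
## [1] TRUE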
Solving linear systems
A common problem encountered in linear algebra is solving a system of linear equations: finding \(x\) that solves \(Ax = b\). On paper we would simply write \(x = A^{-1}b\). There are two ways of implementing this solution in R: we can compute the inverse explicitly with x <- solve(A) %*% b, or solve the system directly with x <- solve(A, b). Naturally we would expect these solutions to be equal, but in floating-point arithmetic this is not necessarily the case. We shall show this with an example using Hilbert matrices, which are well known to be close to singular.
library(Matrix) # provides Hilbert() and rankMatrix()
A <- as.matrix(Hilbert(9))
# the computed rank depends on the chosen tolerance
rankMatrix(A)
## [1] 9
## attr(,"method")
## [1] "tolNorm2"
## attr(,"useGrad")
## [1] FALSE
## attr(,"tol")
## [1] 1.998401e-15
rankMatrix(A, tol = 1e-9)
## [1] 7
## attr(,"method")
## [1] "tolNorm2"
## attr(,"useGrad")
## [1] FALSE
## attr(,"tol")
## [1] 1e-09
# show nearly singular
det(A)
## [1] 9.720265e-43
# create a linear system
x <- matrix(rnorm(9), 9, 1)
b <- A %*% x
# solve
x1 <- solve(A) %*% b
x2 <- solve(A, b)
# compute some metric of the error
norm(x-x1)
## [1] 0.0001742052
norm(x-x2)
## [1] 5.78809e-05
# compare the residuals
norm(b - A %*% x1)
## [1] 2.753297e-06
norm(b - A %*% x2)
## [1] 7.771561e-16
We see that solving the system of equations directly is far superior, in terms of both accuracy and speed. In R we should avoid explicitly computing the inverse of a matrix, unless the inverse will be reused many times to solve linear systems; even then, there are better ways to approach the problem. We can usually find a more numerically stable method of solving a linear system by working with a decomposition of the matrix. This is what functions such as lm() do: they do not solve for the fitted coefficients directly via \(\hat{\beta} = (X^TX)^{-1}X^TY\), but instead rely on the QR decomposition. Before computing the inverse of a matrix we should try to find a more elegant solution by decomposing the matrices involved.
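As a sketch of the idea on simulated data (the variable names here are illustrative, not from any particular source):

# simulate a simple regression problem
set.seed(1)
X <- cbind(1, rnorm(100))
y <- X %*% c(2, 3) + rnorm(100)
# normal equations: the conditioning of t(X) %*% X is cond(X)^2
beta.ne <- solve(t(X) %*% X, t(X) %*% y)
# least squares via the QR decomposition, as lm() does internally
beta.qr <- qr.solve(X, y)
# the two agree here, but QR is the numerically safer route in general
range(beta.ne - beta.qr)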
Numerical stability and finite precision arithmetic
In most programming environments, a floating-point number is stored as a 'double' precision number. According to the IEEE 754 standard, a double consists of 64 binary bits arranged as follows:
- sign: 1 bit
- exponent: 11 bits
- significand (fraction): 52 bits
Thus, we are limited in both the size and the precision of the numbers we can store; in particular, there are smallest and largest representable numbers in R. We can see this demonstrated below.
c(2^(-1075), 2^(-1074), 2^1023, 2^1024, 2^1024 / 2^1024)
## [1] 0.000000e+00 4.940656e-324 8.988466e+307 Inf NaN
We see that \(2^{1023}\) is still representable, but \(2^{1024}\) overflows and R returns Inf; the largest finite double is in fact \((2 - 2^{-52}) \times 2^{1023}\). At the other end, \(2^{-1074}\) is the smallest positive (denormalized) double, and \(2^{-1075}\) underflows to zero. Note also that arithmetic involving Inf is limited: even the simple operation \(2^{1024}/2^{1024}\), which should equal 1, evaluates to NaN.
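A quick check of the largest-double formula, using the .Machine list that we introduce properly below:

# largest finite double: 11 exponent bits, 52-bit significand
(2 - 2^-52) * 2^1023
## [1] 1.797693e+308
.Machine$double.xmax
## [1] 1.797693e+308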
We have 52 significand bits, so we can expect a relative error of approximately \(10^{-16}\) (\(2^{-52} \approx 2.22 \times 10^{-16}\)) in numerical computations with doubles. This is shown below. The R documentation recommends relying on at most 15 significant digits.
# by default R prints only about 7 significant digits (and at most 15 are reliable),
# so adding a very small number to 1 is invisible when printed
print(1 + 1e-14)
## [1] 1
# specify digits
print(1 + 1e-14, digits=15)
## [1] 1.00000000000001
These numerical characteristics of the machine are also stored in R, and we can access them through .Machine. For example, .Machine$double.eps gives the machine epsilon: the smallest positive \(x\) such that \(1 + x \neq 1\) in double arithmetic. .Machine stores many interesting characteristics and is definitely worth exploring.
# the machine epsilon
.Machine$double.eps
## [1] 2.220446e-16
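Two more of its fields, as further examples:

.Machine$double.xmin # smallest positive normalized double
## [1] 2.225074e-308
.Machine$integer.max # largest representable integer
## [1] 2147483647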
Arithmetic with double floating-point numbers in R is not exact, because of numerical errors; this is a problem even for trivial calculations, as shown below. These errors occur because floating-point numbers can only represent dyadic rationals: numbers whose denominator is a power of 2. Hence 0.1 is merely approximated by a nearby dyadic rational. This leads to unexpected results; for example, 0.1+0.2 == 0.3 is FALSE!
# not zero
0.1+0.2-0.3
## [1] 5.551115e-17
# looks okay
0.1 + 0.2
## [1] 0.3
# check
0.1 + 0.2 == 0.3
## [1] FALSE
# all.equal uses a tolerance parameter
all.equal(0.1 + 0.2, 0.3)
## [1] TRUE
We see that the equality comparison == often does not behave the way we expect. When working with double floating-point numbers we should avoid testing equality using == and instead use all.equal(), which absorbs machine error through a tolerance parameter.
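The tolerance defaults to sqrt(.Machine$double.eps), roughly 1.5e-8, and can be set explicitly. Note that when the difference exceeds the tolerance, all.equal() returns a descriptive string rather than FALSE, so it should be wrapped in isTRUE() inside conditions:

# with tolerance 0, all.equal reports the relative difference
all.equal(0.1 + 0.2, 0.3, tolerance = 0)
## [1] "Mean relative difference: 1.850372e-16"
# safe usage inside an if() condition
isTRUE(all.equal(0.1 + 0.2, 0.3))
## [1] TRUE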
R also has a separate integer data type, written with the L suffix.
# 1L seems to be the same as 1
1L
## [1] 1
# test equality
1 == 1L
## [1] TRUE
identical(1, 1L)
## [1] FALSE
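The discrepancy arises because == coerces its arguments to a common type before comparing, whereas identical() also compares the underlying types:

# 1 is stored as a double, 1L as an integer
typeof(1); typeof(1L)
## [1] "double"
## [1] "integer"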
Error is introduced by every arithmetic operation, so matrix multiplication, which chains many additions and scalar multiplications, produces much larger errors.

Suppose we have matrices \(A \in \mathbb{R}^{n \times n}\), \(B \in \mathbb{R}^{n \times m}\), and \(C \in \mathbb{R}^{m \times 1}\). The flop cost of computing \(AB\) is of the order \(\mathcal{O}(n^2m)\), as it requires \(n^2m\) scalar multiplications, while the cost of computing \(BC\) is of the order \(\mathcal{O}(nm)\). Let \(m=n\). Since %*% is evaluated left to right, A %*% B %*% C computes \((AB)C\) at a cost of order \(\mathcal{O}(n^3)\), whereas A %*% (B %*% C) computes \(A(BC)\) at a cost of order \(\mathcal{O}(n^2)\). Thus, whilst these quantities are algebraically identical, the order of evaluation determines the number of multiplications and hence the error produced. In the following example, with \(n = 10\), we expect computing \((AB)C\) to have an error approximately \(10\) times larger than computing \(A(BC)\).
# check error for matrix addition and subtraction
A <- matrix(rnorm(100),10,10); B <- matrix(rnorm(100),10,10); C <- rnorm(10)
summary(c((A + B) - A - B))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.220e-16 -2.776e-17 0.000e+00 -5.785e-18 0.000e+00 4.441e-16
# check error for matrix multiplication
summary(c(A%*%B%*%C - A%*%(B%*%C)))
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -7.105e-15 0.000e+00 1.332e-15 4.441e-16 2.665e-15 3.553e-15
# check computation times
n <- 5000
A <- matrix(rnorm(n^2), n, n); B <- matrix(rnorm(n^2), n, n); C <- rnorm(n)
c(system.time(A%*%B%*%C), system.time(A%*%(B%*%C)))
## user.self sys.self elapsed user.child sys.child user.self sys.self
## 170.762 0.188 171.034 0.000 0.000 0.124 0.000
## elapsed user.child sys.child
## 0.124 0.000 0.000
When working with matrices we must think carefully about what we are trying to compute, in order to avoid unnecessary operations which introduce numerical errors and waste time.

Suppose we want to sum the diagonal entries of the matrix product \(AB\), for \(A \in \mathbb{R}^{n \times m}\), \(B \in \mathbb{R}^{m \times n}\), where \(m \ll n\). Note the cost of forming \(AB\) compared to \(BA\): \(\mathcal{O}(n^2m)\) versus \(\mathcal{O}(m^2n)\). In R there are multiple ways we can do this:

1. Compute the matrix product \(AB\), extract the diagonal entries, and sum them.
2. Compute the matrix product \(BA\) instead, extract the diagonal entries, and sum them. This uses the fact that \(\text{tr}(AB) = \text{tr}(BA)\).
3. Sum the elements of \(A * B^T\) (element-wise multiplication), since \(\text{tr}(AB) = \sum_{i,j} A_{ij} B_{ji}\).
n <- 10000; m <- 1000
A <- matrix(rnorm(n*m), n, m); B <- matrix(rnorm(n*m), m, n)
# first, second, and third method
first <- sum(diag(A %*% B))
second <- sum(diag(B %*% A))
third <- sum(A * t(B))
# compare errors
summary(first - second)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.683e-11 1.683e-11 1.683e-11 1.683e-11 1.683e-11 1.683e-11
summary(first - third)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.411e-12 3.411e-12 3.411e-12 3.411e-12 3.411e-12 3.411e-12
summary(second - third)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1.342e-11 -1.342e-11 -1.342e-11 -1.342e-11 -1.342e-11 -1.342e-11
# compare computation time
system.time(sum(diag(A %*% B))) # first
## user system elapsed
## 83.652 0.228 83.883
system.time(sum(diag(B %*% A))) # second
## user system elapsed
## 8.344 0.000 8.344
system.time(sum(A * t(B))) # third
## user system elapsed
## 0.104 0.028 0.132
The second method is much better than the first in run time, and the third method, which avoids forming any matrix product, is better still; the three results agree to within about \(10^{-11}\).
Sparse matrices
The R package Matrix provides additional functionality for both dense and sparse matrices. In the Matrix package, dense matrices are stored as dgeMatrix objects and sparse matrices as dgCMatrix objects. A useful function we have already seen is rankMatrix, which computes the rank of the input matrix up to a certain tolerance.
library(Matrix)
A <- Matrix(c(1,1,2,2),2,2)
B <- Matrix(c(1,1,2,2) + rnorm(4)/1e10,2,2)
c(rankMatrix(A), rankMatrix(B), rankMatrix(B, tol=1e-7))
## [1] 1 2 1
A sparse matrix is one in which most entries are equal to zero. It’s inefficient to store all of these zeros in memory, and so we only store the non-zero entries.
set.seed(25)
nrows <- ncols <- 500
entries <- sample(c(0,1,2),nrows*ncols,TRUE,c(0.98, 0.01, 0.01))
m1 <- matrix(entries, nrows, ncols)
m2 <- Matrix(entries, nrows, ncols, sparse = TRUE)
c(class(m1), class(m2))
## [1] "matrix" "array" "dgCMatrix"
m1[1:2,1:10]
## [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
## [1,] 0 0 0 0 0 0 0 0 0 0
## [2,] 0 1 0 0 0 0 0 0 0 1
m2[1:2,1:10]
## 2 x 10 sparse Matrix of class "dgCMatrix"
##
## [1,] . . . . . . . . . .
## [2,] . 1 . . . . . . . 1
c(object.size(m1), object.size(m2))
## [1] 2000216 62904
We see that a sparse matrix defined using the Matrix package has class dgCMatrix. The 'd' stands for double, the 'g' for general, and the 'C' for compressed sparse column storage. We can convert a dgCMatrix to a dgeMatrix, the standard class for dense matrices, but we then lose the memory savings of the compressed sparse format. Alternatives to the dgCMatrix are the dgRMatrix (row-compressed) and dgTMatrix (triplet) classes. We can convert a dgCMatrix into a dgTMatrix, but not directly into a dgRMatrix.
# display the structure
str(m2, max.level = 1)
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
str(m2)
## Formal class 'dgCMatrix' [package "Matrix"] with 6 slots
## ..@ i : int [1:4950] 5 46 141 211 315 327 380 430 482 485 ...
## ..@ p : int [1:501] 0 10 21 29 42 47 54 58 67 76 ...
## ..@ Dim : int [1:2] 500 500
## ..@ Dimnames:List of 2
## .. ..$ : NULL
## .. ..$ : NULL
## ..@ x : num [1:4950] 2 1 1 1 1 1 1 1 1 1 ...
## ..@ factors : list()
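In this compressed sparse column format, x holds the non-zero values, i their zero-based row indices, and p the cumulative count of non-zeros up to each column (so p has length ncol + 1). The first column therefore has p[2] - p[1] = 10 non-zero entries, in rows i[1:10] + 1, which the following check confirms.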
which(m2[,1]>0)
## [1] 6 47 142 212 316 328 381 431 483 486
# show difference in memory size
object.size(as(m2, 'dgeMatrix'))
## as(<dgCMatrix>, "dgeMatrix") is deprecated since Matrix 1.5-0; do as(., "unpackedMatrix") instead
## 2001176 bytes
object.size(as(m2, 'dgTMatrix'))
## 80696 bytes
object.size(m2) # dgCMatrix
## 62904 bytes
# print the matrices before and after coercion
m2[1:2, 1:10]
## 2 x 10 sparse Matrix of class "dgCMatrix"
##
## [1,] . . . . . . . . . .
## [2,] . 1 . . . . . . . 1
as(m2, 'dgTMatrix')[1:2, 1:10]
## 2 x 10 sparse Matrix of class "dgTMatrix"
##
## [1,] . . . . . . . . . .
## [2,] . 1 . . . . . . . 1
Operations for sparse matrices
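The examples below show which operations preserve the sparse dgCMatrix class and which return a dense dgeMatrix.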
A <- B <- Matrix(c(1,1,0,0,0,1),3,2, sparse=TRUE)
A
## 3 x 2 sparse Matrix of class "dgCMatrix"
##
## [1,] 1 .
## [2,] 1 .
## [3,] . 1
# division maintains class
A/10
## 3 x 2 sparse Matrix of class "dgCMatrix"
##
## [1,] 0.1 .
## [2,] 0.1 .
## [3,] . 0.1
# addition and subtraction change class to dgeMatrix
A+1; A-1
## 3 x 2 Matrix of class "dgeMatrix"
## [,1] [,2]
## [1,] 2 1
## [2,] 2 1
## [3,] 1 2
## 3 x 2 Matrix of class "dgeMatrix"
## [,1] [,2]
## [1,] 0 -1
## [2,] 0 -1
## [3,] -1 0
# matrix multiplication changes class
A %*% c(1,1)
## 3 x 1 Matrix of class "dgeMatrix"
## [,1]
## [1,] 1
## [2,] 1
## [3,] 1
# addition and subtraction of two dgCMatrix objects produces an object of the same class
A+B; A-B
## 3 x 2 sparse Matrix of class "dgCMatrix"
##
## [1,] 2 .
## [2,] 2 .
## [3,] . 2
## 3 x 2 sparse Matrix of class "dgCMatrix"
##
## [1,] 0 .
## [2,] 0 .
## [3,] . 0
# matrix multiplication of two dgCMatrix objects gives a dgCMatrix object
A %*% t(B)
## 3 x 3 sparse Matrix of class "dgCMatrix"
##
## [1,] 1 1 .
## [2,] 1 1 .
## [3,] . . 1
# row or column binding dgCMatrix objects returns a dgCMatrix
cbind(A,A); rbind(A,A)
## 3 x 4 sparse Matrix of class "dgCMatrix"
##
## [1,] 1 . 1 .
## [2,] 1 . 1 .
## [3,] . 1 . 1
## 6 x 2 sparse Matrix of class "dgCMatrix"
##
## [1,] 1 .
## [2,] 1 .
## [3,] . 1
## [4,] 1 .
## [5,] 1 .
## [6,] . 1
Solving large linear systems
Solving very large sparse linear systems is delicate, because the inverse of a sparse matrix is not necessarily sparse. To avoid materializing a dense inverse, we should solve the linear system directly using solve(A, b).
nrow <- ncol <- 500
A <- Matrix(sample(c(0,1), nrow*ncol, TRUE, c(0.98, 0.02)), nrow, ncol, sparse=TRUE)
A.inv <- solve(A) # explicitly form the inverse: exactly what we should avoid
# object sizes
c(object.size(A), object.size(A.inv))
## [1] 1884440 2001176
# total number of entries
nrow*ncol
## [1] 250000
# non-zero entries in A
nnzero(A)
## [1] 4897
# non-zero entries in A.inv
nnzero(A.inv)
## [1] 250000
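Finally, a minimal sketch of the direct approach, reusing the A defined above (and assuming, as in the run above, that this random A is nonsingular):

# build a right-hand side with a known solution
x.true <- rnorm(ncol)
b <- A %*% x.true
# solve the sparse system directly, without ever forming A.inv
x.direct <- solve(A, b)
# maximum componentwise error of the direct solve
max(abs(x.true - as.vector(x.direct)))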