Gradient package:base R Documentation
Gradient Computation
Description:
These are the facilities for computing gradients via run-time
automatic differentiation. For symbolic differentiation
facilities, see 'deriv'. For numerical differentiation, see
'numericDeriv'.
The present implementation of these facilities is a preliminary
one that supports only gradients of real scalar values, or lists
of real scalar values (or lists of lists of real scalar values,
etc.), with respect to real scalar variables or lists of real
scalar values (or lists of lists, etc.). Support for
differentiation of and with respect to vectors, matrices, and
arrays is planned, as is support for higher-order derivatives.
Perhaps differentiation of complex values will also be supported
some day.
Usage:
with gradient (var=v.expr) expr
with gradient (var) expr
with gradient (var1=v1.expr,var2=v2.expr) expr
with gradient (var1,var2) expr
track gradient (var=v.expr) expr
track gradient (var) expr
track gradient (var1=v1.expr,var2=v2.expr) expr
track gradient (var1,var2) expr
back gradient (var=v.expr) expr
back gradient (var) expr
back gradient (var1=v1.expr,var2=v2.expr) expr
back gradient (var1,var2) expr
compute gradient (var1=v1.expr, var2=v2.expr) expr as (g1.expr, g2.expr)
compute gradient (var1, var2) expr as (g1.expr, g2.expr)
gradient_of (e)
no_gradient (e)
Arguments:
var, var1, var2: (And so forth for var3, etc.) Names (not strings) of
variables with respect to which gradients will be tracked or
computed.
expr: An expression, often a compound expression, of a form such as
'{expr1;expr2}', which is evaluated in a new environment
containing the new variable or variables listed.
g1.expr, g2.expr: (and so forth for g3.expr, etc.) Expressions giving
the gradients of the specified 'expr' with respect to 'var1',
'var2', etc.
e: An expression whose gradient is computed if possible (for
'gradient_of'), or is not computed (for 'no_gradient').
Details:
The 'with gradient' construct evaluates 'expr' in a new
environment (with the current environment as its parent)
containing the variable 'var' (or variables 'var1', 'var2', etc.),
with initial value 'v.expr' (or initial values 'v1.expr',
'v2.expr', etc.). The value of 'expr' is returned as that of
'with gradient', with a '"gradient"' attribute attached containing
the derivative of the value of 'expr' with respect to 'var', or
the derivatives with respect to 'var1', 'var2', etc. (Any
existing '"gradient"' attribute is discarded.)
During the evaluation of 'expr' within 'with gradient',
assignments to local variables (in the new environment) will
record the gradient of the assigned value with respect to 'var'
(or 'var1', 'var2', etc.), and with respect to the variables
listed in any enclosing 'with gradient', 'track gradient', or
active 'back gradient' constructs, so that if the value of the
variable is used to compute the final value for 'with gradient',
or is the argument of a call of 'gradient_of', its gradient can be
computed. Note, however, that the gradients of attribute values
are not recorded, nor are gradients recorded when non-local
assignments are made with '<<-'.
Tracking of gradients continues when a function is called with one
or more arguments with tracked gradients. This includes functions
for S3 methods, but not S4 methods. Tracking will also be
performed when an expression is evaluated with 'eval', if the
environment used for the evaluation is one in which gradients are
being tracked.
Within 'expr', the gradient of an expression, 'e', with respect to
the variable or variables of the innermost enclosing 'with
gradient' or 'track gradient' construct can be found with
'gradient_of(e)'. The value of 'gradient_of' does not itself have
any gradient information - hence 'gradient_of(gradient_of(e))'
will not produce a second derivative (it will always be zero).
Tracking of gradients can be explicitly suppressed (to save time)
with 'no_gradient(e)'. Gradient tracking is automatically
suppressed when evaluating expressions that are used as 'if' or
'while' conditions, as indexes, or as expressions iterated over
with 'for'.
The 'track gradient' construct is like 'with gradient', except
that the gradient is not attached to the final result. It is
therefore useful only in combination with calls of 'gradient_of',
or if it is inside (in a dynamic sense) another 'track gradient',
'with gradient', or active 'back gradient' construct. When these
gradient constructs are nested, derivatives with respect to an
inner variable may determine derivatives with respect to an outer
variable, by application of the chain rule (backpropagation).
The 'back gradient' construct is like 'track gradient', except
that it does not allow 'gradient_of', and hence only needs to
track gradients if they will be backpropagated to give gradients
for a (dynamically) enclosing gradient construct. It may
therefore be included at little performance cost in a function
that may either be called for only its value, or may be called for
both its value and that value's gradient, depending on whether
it is called from within a gradient construct.
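The saving from 'back gradient' comes from the chain rule:
operations on the inner variable need only a derivative with
respect to that one variable, which is then multiplied by the
gradient of the inner variable with respect to the outer variables.
The sketch below works this out by hand in ordinary R, using none
of the gradient constructs, for a polynomial of a weighted sum. The
names 'w', 'f', 'dval_dp', 'dp_dV', and 'grad_V' are illustrative
only, and the finite-difference comparison is merely a sanity
check, not a description of how the implementation works.

```r
# Hand-worked chain rule in ordinary R (no gradient constructs are used).
V <- seq(0, 1, length.out = 11)      # 11 inputs
w <- seq_along(V)                    # weights 1, 2, ..., 11

f <- function(V) {                   # value as a function of all of V
    p <- sum(w * V)
    p^2 + p^3 + p^4 + p^5
}

p <- sum(w * V)
dval_dp <- 2*p + 3*p^2 + 4*p^3 + 5*p^4   # derivative w.r.t. p alone
dp_dV <- w                               # gradient of p w.r.t. each V[i]
grad_V <- dval_dp * dp_dV                # chain rule: gradient w.r.t. V

# Central finite differences, for comparison only.
h <- 1e-6
fd <- sapply(seq_along(V), function(i) {
    e <- replace(numeric(length(V)), i, h)   # perturb element i only
    (f(V + e) - f(V - e)) / (2*h)
})
max_rel_err <- max(abs(fd - grad_V) / abs(grad_V))
```

Only one scalar derivative ('dval_dp') is needed for the polynomial,
however many elements 'V' has; the 11-element gradient appears only
in the final multiplication.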
In forms of 'with gradient', 'track gradient', and 'back gradient'
in which no expression follows a variable, the expression is
assumed to be the same as the variable, evaluated in the current
environment (not the new one).
The gradient of an expression can be specified explicitly using the
'compute gradient' construct, as an alternative to simply letting
the gradient be obtained automatically, or as a necessary measure
if the expression contains built-in functions for which automatic
differentiation has not yet been implemented. The 'expr' within
'compute gradient' is evaluated in a new environment in which one
or more variables have been defined (two in the above templates).
The initial values of these variables are as specified, defaulting
to the variable's value evaluated in the current environment.
Gradients with respect to these new variables are not tracked
automatically, but are instead specified by the expressions after
'as'. The chain rule is used to translate these gradients to
gradients with respect to variables used to compute 'v1.expr',
'v2.expr', etc.
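Since the expressions after 'as' are supplied by the user rather
than derived automatically, it may be worth checking a hand-derived
gradient numerically before relying on it. The sketch below does
this in ordinary R (no gradient constructs are used) for the
logistic function, whose derivative can be written as v*(1-v) in
terms of its value v. The names 'sigmoid_value', 'g_by_hand', and
'g_numeric' are illustrative only.

```r
# Ordinary R check of a hand-derived gradient expression.
sigmoid_value <- function(x) 1 / (1 + exp(-x))

x <- 1
v <- sigmoid_value(x)
g_by_hand <- v * (1 - v)     # candidate expression to place after 'as'

# Central finite difference, for comparison only.
h <- 1e-6
g_numeric <- (sigmoid_value(x + h) - sigmoid_value(x - h)) / (2*h)
abs_diff <- abs(g_by_hand - g_numeric)   # should be very small
```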
If computation of a gradient has not been requested, 'compute
gradient' will evaluate only the value, skipping evaluation of the
expressions after 'as'. It is possible for the gradient
expression to be evaluated for some variables but not others (e.g.,
in the form shown above, 'g2.expr' might be evaluated but
'g1.expr' not be evaluated).
Computation of gradients for built-in functions is also skipped
when it is known that the gradient will not be needed.
Gradients can be computed for expressions that are not
differentiable at some points, with the gradient returned at such
points being arbitrary.
Gradients may not have been defined for some built-in functions
(even if they exist mathematically), in which case they will
appear to be zero. When a built-in function returns 'NA' or a
'NaN' value, the gradient will be regarded as zero (without an
error or warning).
Gradients may be defined for real-valued random generation
functions (e.g., 'rnorm'). The gradient for these functions
indicates how a change in the distribution parameters would
produce a change in the generated random value, if the state of
the random number generator when calling the function were kept
fixed.
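As an illustration of what such a gradient means, consider 'rexp'.
In standard R, 'rexp(1,r)' is a standard exponential variate
divided by 'r', so with the generator state held fixed the value v
varies as 1/r, and its derivative with respect to 'r' is -v/r. The
sketch below checks this reasoning in ordinary R by re-seeding the
generator; it uses none of the gradient constructs and does not
show the value that 'with gradient' itself reports. The names
'g_expected' and 'g_numeric' are illustrative only.

```r
# Ordinary R: sensitivity of a generated value to the rate parameter,
# with the random number generator state held fixed by re-seeding.
r <- 5
set.seed(123); v <- rexp(1, r)
g_expected <- -v / r      # derivative of E/r w.r.t. r, with E = v*r fixed

# Central finite difference using the same seed, for comparison only.
h <- 0.001
set.seed(123); v_lo <- rexp(1, r - h)
set.seed(123); v_hi <- rexp(1, r + h)
g_numeric <- (v_hi - v_lo) / (2*h)
rel_diff <- abs(g_numeric - g_expected) / abs(g_expected)
```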
The following built-in functions and operators will compute
gradients, with respect to all their scalar real arguments (unless
noted), or when applicable, arguments that are lists of scalar
reals (or lists of lists of scalar reals, etc.):
list
$ (for vector lists only), [[ (for vector lists only)
$<- (for vector lists only), [[<- (for vector lists only)
+, -, *, /, ^ (+ and - may be unary)
abs, sqrt, expm1, exp, log1p, log2, log10, log (one-argument form only)
cos, sin, tan, acos, asin, atan, atan2
cosh, sinh, tanh, acosh, asinh, atanh
gamma, lgamma, digamma, trigamma, psigamma, beta, lbeta
dbeta, pbeta (1st argument only), qbeta (1st argument only)
dchisq (no ncp arg), pchisq (1st only, no ncp), qchisq (1st only, no ncp)
dbinom, pbinom
dcauchy, pcauchy, qcauchy, rcauchy
dexp, pexp, qexp, rexp
df (no ncp argument), pf (1st arg only, no ncp), qf (1st arg only, no ncp)
dgamma, pgamma (1st and 3rd args only), qgamma (1st and 3rd args only)
Note: 3rd argument of dgamma/pgamma/qgamma may be either rate or scale
dgeom, pgeom
dlogis, plogis, qlogis, rlogis
dlnorm, plnorm, qlnorm, rlnorm
dnbinom (3rd arg (prob or mu) only), pnbinom (3rd arg (prob or mu) only)
dnorm, pnorm, qnorm, rnorm
dpois, ppois
dt (with no ncp arg), pt (1st arg only, no ncp), qt (1st arg only, no ncp)
dunif, punif, qunif, runif
dweibull, pweibull, qweibull, rweibull
The following replacement functions do not compute any gradient
information, but do leave undisturbed any gradient information
that is associated with the variable that they update:
attr<-
Value:
The gradient returned by 'gradient_of' or attached as a
'"gradient"' attribute by 'with gradient' will be a scalar real if
the gradient is with respect to only one variable, and the
expression this is the gradient of has a scalar real value. The
gradient of a list is a list of the same form, with scalar
gradients replacing the scalar elements. For a gradient with
respect to a list variable, the gradient value is a list (or
nested lists) of the same form, with elements that are the
gradient of the expression value with respect to that element of
the list; note that the gradient of an expression value could
itself be a list.
When the 'with gradient' or 'track gradient' construct has more
than one variable, the gradient will be a list of scalar real
values or lists, with names corresponding to the variables. Note
that this is the same form as would be obtained if the values of
these variables were combined into a named list which was then
used as a single variable in the 'with gradient' or 'track
gradient' construct (see the example below).
Examples:
a <- with gradient (x=3) sin(x)
attr(a,"gradient") # should be cos(3)
a <- with gradient (x=3,y=2) sin(x+2*y)
attr(a,"gradient")$x # should be cos(7)
attr(a,"gradient")$y # should be 2*cos(7)
x <- 3
a <- with gradient (x) { r <- sin(x); r^2 }
attr(a,"gradient") # should be 2*sin(3)*cos(3)
sqr <- function (y) y^2 # gradients can be tracked through sqr
x <- 3
a <- with gradient (x) { r <- sin(x); sqr(r) }
attr(a,"gradient") # should be 2*sin(3)*cos(3)
funny <- function (x,y) { # has a discontinuity
q <- no_gradient(2*x) # gradient of 2*x won't be tracked
if (q>y/2) # gradient of y/2 won't be tracked
sin(x+y)
else
cos(x+y)
}
track gradient (a = 3) {
print (gradient_of(funny(a,a))) # prints 2*cos(3+3)
print (gradient_of(funny(a,8*a))) # prints -9*sin(3+24)
}
sigmoid <- function (x)
compute gradient (x) { v <- 1 / (1+exp(-x)); v }
as (v * (1-v))
sigmoid(1) # no gradient computed, only value
with gradient (x=1) sigmoid(x) # both value and gradient computed
track gradient (x=1) # should compute the same gradient
gradient_of (1/(1+exp(-x))) # as above, but perhaps more slowly
# (though maybe not since x is scalar)
set.seed(123); with gradient (r=5) rexp(1,r)
set.seed(123); v1<-rexp(1,4.999)
set.seed(123); v2<-rexp(1,5.001)
(v2-v1) / 0.002 # should be close to gradient above
r <- with gradient (a=7) list(a,a^2)
attr(r,"gradient") # should be a list of 1 and 14
r <- with gradient (a=7) list(square=a^2,cube=a^3)
attr(r,"gradient")$cube # gradients of lists retain names
with gradient (a=7,b=9) {
r <- list()
r$aplusb <- a+b
r$atimesb <- a*b
r$asqbsq <- r$atimesb^2
list (r, a^2*b^2) # a^2*b^2 should be the same as asqbsq
}
with gradient (a=7) {
L <- list(square=a^2,cube=a^3)
list (L$square*L$cube, a^5) # both values and gradients of the two
} # list elements should be the same
with gradient (a=3) # only the gradient with respect to a
with gradient (b=10*a) # will be shown, but the chain rule
list(a,b,a*b) # is used to convert derivatives wrt
# b into derivatives wrt a
with gradient (a=5,b=6) list(a^2,a*b) # these give the same
with gradient (x=list(a=5,b=6)) list(x$a^2,x$a*x$b) # value and gradient
with gradient (a=2) { # find derivatives of powers of a, from
L <- list(a) # a^1 to a^10, evaluated at a=2
for (i in 2..10) L[[i]] <- L[[i-1]] * a
L
}
V <- as.list(seq(0,1,length=11))
with gradient (V) { # tracks gradient w.r.t. 11 elements of V
p <- 0
for (i along V)
p <- p + i*V[[i]]
p^2 + p^3 + p^4 + p^5 # every operation computes derivatives
} # w.r.t. all 11 elements of V
with gradient (V) { # compute same result more efficiently...
p <- 0
for (i along V)
p <- p + i*V[[i]]
back gradient (p) # operations in the expression below
p^2 + p^3 + p^4 + p^5 # compute derivative w.r.t. p only, then
} # chain rule gives gradient w.r.t. V