With the release of Tensorflow [1] and Twitter’s torch-autograd [2] (inspired by autograd [3] originally written in Python) I think it’s time to take a look at automatic differentiation [4] and why it’s super awesome.
First, I’d like to differentiate between symbolic differentiation (SD) and automatic differentiation (AD). SD is akin to the stuff you did in your calculus class: you take a math expression and return a math expression. AD is where you take code that computes some function and return code that computes the derivative of that function. Theano [5], for example, uses SD. This is precisely why it’s very difficult to express loops in Theano and why models like RNNs take a long time to compile.
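To make the distinction concrete, here’s a small sketch using sympy for SD and the Python autograd package [3] for AD (my choice of illustration; neither is required by anything above):

```python
import sympy

# Symbolic differentiation: expression in, expression out.
x = sympy.Symbol('x')
expr = sympy.sin(x) * x**2
print(sympy.diff(expr, x))   # x**2*cos(x) + 2*x*sin(x)

# Automatic differentiation: a function that computes f becomes
# a function that computes f' -- no symbolic expression is handed back.
import autograd.numpy as np
from autograd import grad

def f(x):
    return np.sin(x) * x**2

df = grad(f)
print(df(1.5))               # numerical value of f'(1.5)
```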
So why does AD even work?
The theory behind AD is that all numeric computations are ultimately compositions of a finite set of elementary operations (+, -, *, /, exp, log, sin, cos, etc.) [6].
So the idea is: if we can write the code to differentiate these elementary operations, then when we encounter a complicated numeric computation we break it down into those elementary ops and deal with them one at a time, as opposed to figuring out a derivative that encapsulates the entire computation. No more fiddling around with backpropagation by hand!
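Here’s a toy forward-mode AD sketch of my own (neither library works exactly this way) to show the composition idea: each value carries a (value, derivative) pair, each elementary op knows how to propagate both, and chaining ops chains derivatives via the chain rule.

```python
import math

class Dual:
    """A value paired with its derivative with respect to the input."""
    def __init__(self, value, deriv=0.0):
        self.value, self.deriv = value, deriv

    def __add__(self, other):
        return Dual(self.value + other.value, self.deriv + other.deriv)

    def __mul__(self, other):
        # product rule
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

def sin(d):
    # chain rule for one elementary op
    return Dual(math.sin(d.value), math.cos(d.value) * d.deriv)

# f(x) = sin(x) * x: seed dx/dx = 1 and just run the code.
x = Dual(2.0, 1.0)
y = sin(x) * x
print(y.value, y.deriv)   # f(2.0) and f'(2.0) = 2*cos(2) + sin(2)
```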
Ok, let’s tie this back in with Tensorflow and torch-autograd.
So far, there have been two approaches to doing AD: explicit vs. implicit graph construction.
Tensorflow
Construct a graph explicitly and have a compilation step that optimizes it. Now, to be fair, Tensorflow uses the graph for much more than just AD. For our purposes we’ll just focus on the AD part.
You also can’t write arbitrary code; for example, you can’t use numpy to do computations. You have to use the Python Tensorflow library. This might not be the case if you write in C++, since Tensorflow’s core is a C++ program. Either way, Tensorflow will most likely be used from a higher-level language, so it makes sense to have a language-agnostic API.
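A rough sketch of the explicit-graph style, using the early (TF 1.x-era) Python API with tf.placeholder, tf.gradients, and tf.Session; the particular function here is just my example:

```python
import tensorflow as tf

# Build the graph first -- nothing is computed yet.
x = tf.placeholder(tf.float32)
y = tf.sin(x) * x * x

# Asking for dy/dx adds gradient nodes to the same graph.
dy_dx = tf.gradients(y, x)[0]

# Only a session run executes the (optimized) graph.
with tf.Session() as sess:
    print(sess.run([y, dy_dx], feed_dict={x: 1.5}))
```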
torch-autograd
No compilation. It constructs a tape data structure on the fly that keeps track of the computations and how to compute the backward pass (it constructs the computation graph for you).
Here we can write arbitrary Torch/Lua code. Unlike Tensorflow, there’s no need for a language-agnostic API: if you buy into Torch you buy into Lua, so arbitrary code makes sense here.
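The define-by-run style is easiest for me to sketch with the Python autograd [3] that torch-autograd was inspired by (the Lua API differs in its details). Note how ordinary control flow just works, because the tape only records the operations that actually ran:

```python
import autograd.numpy as np
from autograd import grad

def f(x, n):
    # Arbitrary code: a plain loop, no graph-compilation step.
    result = x
    for _ in range(n):
        result = np.sin(result) + result
    return result

df = grad(f)       # gradient with respect to the first argument
print(df(1.5, 3))
```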
So which approach is better?
I think both are right for each project’s goals. Also, it doesn’t really matter! We should just be happy AD is taking off and we can avoid the dreaded friction of calculating the backward pass ourselves. Let the computer do the dirty work for you!
Sources:
[1] http://tensorflow.org/
[2] https://github.com/twitter/torch-autograd
[3] https://github.com/HIPS/autograd
[4] https://en.wikipedia.org/wiki/Automatic_differentiation
[5] http://deeplearning.net/software/theano/
[6] http://arxiv.org/abs/1502.05767