neural network - How to correctly accumulate gradients across batches in Torch?


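I'd like to accumulate gradients across several batches. Training with iter_size = 2 and batch_size = 16 should give the same result as training with iter_size = 1 and batch_size = 32. I suspect I've missed something in my code, because the accumulated gradients (gradParams) are not the same in the two cases. I'd appreciate it if someone could help me find the problem. Here is my code: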

    local params, gradParams = net:getParameters()
    local iter_size = 2
    local batch_size = 16
    local iter = 0
    net:zeroGradParameters()
    for i, input, target in trainset:sampleiter(batch_size) do
       iter = iter + 1
       -- forward
       local input = input:cuda()
       local target = target:cuda()
       local output = net:forward(input)
       local loss = criterion:forward(output, target)
       -- backward; net:backward accumulates into gradParams
       local gradOutput = criterion:backward(output, target)
       local gradInput = net:backward(input, gradOutput)
       -- update once every iter_size batches
       if iter == iter_size then
          gradParams:mul(1.0/iter_size)   -- average the accumulated gradients
          net:updateGradParameters(0.9)   -- momentum
          net:updateParameters(0.01)      -- learning rate
          iter = 0
          net:zeroGradParameters()
       end
    end

It is worth mentioning that I manually set the random seed for determinism when comparing results, so the difference is not due to random initialization of the network.
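
For reference, the equivalence expected here can be written out. This is a sketch under two assumptions that the post does not state: the criterion averages the loss over the batch, and the two small batches $B_1$ and $B_2$ (16 samples each) together contain exactly the samples of the large batch $B$ (32 samples). With $\ell_i$ the per-sample loss and $\theta$ the parameters:

$$\nabla_\theta L_B = \frac{1}{32}\sum_{i \in B} \nabla_\theta \ell_i = \frac{1}{2}\left(\frac{1}{16}\sum_{i \in B_1} \nabla_\theta \ell_i + \frac{1}{16}\sum_{i \in B_2} \nabla_\theta \ell_i\right)$$

The factor $\frac{1}{2}$ corresponds to dividing the accumulated gradients by iter_size. The identity holds only when $B = B_1 \cup B_2$; if the small batches contain other samples, the two sides differ.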

The problem was due to sampling: sampleiter returned the images in a different order for different batch sizes, so the batches in these two cases contained different images, and the accumulated gradients were therefore different.
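
The effect can be reproduced without Torch. The following is a minimal sketch in plain Lua, not the original training code: the one-parameter model, the data, and the helper meanGrad are all hypothetical and chosen only for illustration. It compares the gradient of one batch of 32 with the averaged gradient of two batches of 16, first over the same samples and then over different samples.

```lua
-- Toy model: y = w * x, per-sample loss 0.5 * (w*x - t)^2.
-- meanGrad returns d(loss)/dw averaged over samples first..last,
-- mimicking a criterion that averages over the batch (an assumption).
local function meanGrad(w, xs, ts, first, last)
   local g = 0
   for i = first, last do
      g = g + (w * xs[i] - ts[i]) * xs[i]
   end
   return g / (last - first + 1)
end

-- Hypothetical data set of 64 samples.
local xs, ts = {}, {}
for i = 1, 64 do
   xs[i] = i / 64
   ts[i] = 3 * xs[i] + 0.1 * (i % 5)
end
local w = 0.5
local iterSize = 2

-- Case A: a single batch of 32 (samples 1..32).
local gFull = meanGrad(w, xs, ts, 1, 32)

-- Case B: two batches of 16 covering the same 32 samples,
-- accumulated and then divided by iterSize.
local gSame = (meanGrad(w, xs, ts, 1, 16) + meanGrad(w, xs, ts, 17, 32)) / iterSize

-- Case C: two batches of 16 where the second batch holds other samples,
-- as happens when the sampler returns a different order.
local gOther = (meanGrad(w, xs, ts, 1, 16) + meanGrad(w, xs, ts, 33, 48)) / iterSize

print(math.abs(gFull - gSame) < 1e-12)   -- same samples: gradients agree
print(math.abs(gFull - gOther) > 1e-6)   -- different samples: gradients differ
```

Both lines print true. When checking an accumulation loop against a large-batch run, it helps to feed both runs the same samples in the same fixed order, so that any remaining difference points to the loop itself rather than to the sampler.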

