How to correctly accumulate gradients across batches in Torch?
I'd like to accumulate gradients across several batches. Training with iter_size = 2 and batch_size = 16 should give the same result as setting iter_size = 1 and batch_size = 32. However, I suspect I've missed something in my code, because gradParams are not the same in the two cases. I would appreciate help finding the problem. Here is the code:
local params, gradParams = net:getParameters()
local iter_size = 2
local batch_size = 16
local iter = 0

net:zeroGradParameters()

for i, input, target in trainset:sampleIter(batch_size) do
    iter = iter + 1

    -- forward
    local input = input:cuda()
    local target = target:cuda()
    local output = net:forward(input)
    local loss = criterion:forward(output, target)
    local gradOutput = criterion:backward(output, target)
    local gradInput = net:backward(input, gradOutput)

    -- update every iter_size batches
    if iter == iter_size then
        gradParams:mul(1.0 / iter_size)
        net:updateGradParameters(0.9)
        net:updateParameters(0.01)
        iter = 0
        net:zeroGradParameters()
    end
end
It is worth mentioning that I manually set the random seed for determinism when comparing results, so the difference is not due to random initialization of the network.
The problem was due to sampling: sampleIter returned images in a different order for different batch sizes, so the batches in the two cases contained different images and the accumulated gradients differed.
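To confirm that the accumulation logic itself is correct once the data is identical, here is a minimal sketch that feeds the same fixed batch either as one pass of 32 samples or as two accumulated passes of 16. It uses an assumed toy setup (nn.Linear and nn.MSECriterion, not the asker's network or data loader) purely for illustration:

require 'nn'

torch.manualSeed(1)
local net = nn.Linear(10, 2)              -- assumed toy model
local criterion = nn.MSECriterion()       -- assumed criterion
local params, gradParams = net:getParameters()

local input  = torch.randn(32, 10)        -- one fixed batch of 32 samples
local target = torch.randn(32, 2)

-- Case A: single pass over the full batch of 32
net:zeroGradParameters()
criterion:forward(net:forward(input), target)
net:backward(input, criterion:backward(net.output, target))
local gradFull = gradParams:clone()

-- Case B: accumulate over two halves of 16, then average
net:zeroGradParameters()
for i = 1, 2 do
    local sub  = input:narrow(1, (i - 1) * 16 + 1, 16)
    local tsub = target:narrow(1, (i - 1) * 16 + 1, 16)
    criterion:forward(net:forward(sub), tsub)
    net:backward(sub, criterion:backward(net.output, tsub))
end
gradParams:mul(0.5)

-- difference should be ~0 (up to floating-point error)
print((gradParams - gradFull):abs():max())

With the same samples in both cases the accumulated (and averaged) gradients match, which is consistent with the sampling order being the source of the discrepancy in the original code.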