neural network - How to correctly accumulate gradients across batches in Torch?


I'd like to accumulate gradients across several batches. Training with iter_size = 2 and batch_size = 16 should give the same result as setting iter_size = 1 and batch_size = 32. I suspect there is something I've missed in my code, because gradParams are not the same in the two cases. I'd appreciate help finding the problem. Here is the code:

    local params, gradParams = net:getParameters()
    local iter_size = 2
    local batch_size = 16
    local iter = 0

    net:zeroGradParameters()

    for i, input, target in trainset:sampleiter(batch_size) do
       iter = iter + 1

       -- forward
       local input = input:cuda()
       local target = target:cuda()
       local output = net:forward(input)
       local loss = criterion:forward(output, target)

       -- backward
       local gradOutput = criterion:backward(output, target)
       local gradInput = net:backward(input, gradOutput)

       -- update once gradients from iter_size batches have been accumulated
       if iter == iter_size then
          gradParams:mul(1.0 / iter_size)  -- average the accumulated gradients
          net:updateGradParameters(0.9)    -- apply momentum
          net:updateParameters(0.01)       -- SGD step with learning rate 0.01
          iter = 0
          net:zeroGradParameters()
       end
    end
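For reference, this is roughly how I compared the two configurations. It is only a sketch: gradParamsA and gradParamsB are assumed to be clones of gradParams taken right before the parameter update in each run (e.g. via gradParams:clone()); they are not part of the code above.

    -- Sketch: compare accumulated gradients from the two runs.
    -- gradParamsA and gradParamsB are hypothetical clones saved just before
    -- the update step in the iter_size=2/batch_size=16 and
    -- iter_size=1/batch_size=32 runs respectively.
    local diff = (gradParamsA - gradParamsB):abs():max()
    print(string.format('max abs gradient difference: %e', diff))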

It is worth mentioning that I manually set the random seed for determinism when comparing results, so the difference is not due to random initialization of the network.
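Seeding in Torch typically looks like the following (the seed value itself is arbitrary; cutorch is only needed when running on the GPU):

    require 'torch'
    require 'cutorch'

    -- Seed the CPU and GPU RNGs so both runs start from identical weights.
    torch.manualSeed(1234)
    cutorch.manualSeedAll(1234)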

The problem was due to sampling: sampleiter returned images in a different order for different batch sizes, so the batches in the two cases contained different images and the accumulated gradients therefore differed.
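One way to make the comparison fair is to fix the visiting order up front so it does not depend on batch_size, and build the batches from that order. This is only a sketch under stated assumptions: trainset.size and a batched accessor trainset:get(indices) are hypothetical here, so substitute whatever your dataset API actually provides.

    -- Sketch: fix the sample order once so it is identical for both runs.
    torch.manualSeed(1234)
    local order = torch.randperm(trainset.size)  -- same permutation regardless of batch_size

    for start = 1, trainset.size - batch_size + 1, batch_size do
       local indices = order:narrow(1, start, batch_size)
       local input, target = trainset:get(indices)  -- assumed batched accessor
       -- ... forward/backward and accumulation as in the loop above ...
    end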

