[PyTorch] Gradient accumulation example code


developer0hye 2021. 6. 16. 22:43

https://pytorch.org/docs/stable/notes/amp_examples.html#id6

Automatic Mixed Precision examples — PyTorch 1.9.0 documentation


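# Imports as in the PyTorch 1.9 AMP examples; a CUDA device is assumed.
from torch.cuda.amp import GradScaler, autocast

# model, optimizer, loss_fn, data, epochs and iters_to_accumulate are assumed
# to be defined elsewhere.
scaler = GradScaler()

for epoch in epochs:
    for i, (input, target) in enumerate(data):
        with autocast():
            output = model(input)
            loss = loss_fn(output, target)
            # Normalize so the accumulated gradient is an average, not a sum.
            loss = loss / iters_to_accumulate

        # Accumulates scaled gradients.
        scaler.scale(loss).backward()

        # Update the weights once every iters_to_accumulate mini-batches.
        if (i + 1) % iters_to_accumulate == 0:
            # may unscale_ here if desired (e.g., to allow clipping unscaled gradients)

            scaler.step(optimizer)
            scaler.update()
            optimizer.zero_grad()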
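
The comment in the loop above says unscale_ may be called before the optimizer step, for example to clip gradients. Below is a minimal sketch of that variant. The toy linear model, the random data, and the max_norm value of 1.0 are all hypothetical and exist only to make the loop runnable; AMP is switched off automatically when no GPU is available. The imports follow the torch.cuda.amp style used in the post; newer PyTorch releases expose the same classes under torch.amp.

```python
import torch
from torch import nn
from torch.cuda.amp import GradScaler, autocast

torch.manual_seed(0)

# Hypothetical toy setup, only to make the loop runnable.
use_amp = torch.cuda.is_available()  # AMP is disabled when no GPU is present
device = "cuda" if use_amp else "cpu"
model = nn.Linear(8, 1).to(device)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.MSELoss()
data = [(torch.randn(16, 8), torch.randn(16, 1)) for _ in range(8)]

iters_to_accumulate = 4
max_norm = 1.0  # assumed clipping threshold
scaler = GradScaler(enabled=use_amp)
num_updates = 0

for i, (inputs, targets) in enumerate(data):
    inputs, targets = inputs.to(device), targets.to(device)
    with autocast(enabled=use_amp):
        loss = loss_fn(model(inputs), targets) / iters_to_accumulate

    # Gradients keep accumulating in .grad until zero_grad() is called.
    scaler.scale(loss).backward()

    if (i + 1) % iters_to_accumulate == 0:
        # Unscale once per update so clipping sees the true gradient magnitudes.
        scaler.unscale_(optimizer)
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)
        scaler.step(optimizer)
        scaler.update()
        optimizer.zero_grad()
        num_updates += 1

print(num_updates)  # 8 mini-batches / 4 per update = 2
```

unscale_ is called only inside the update branch, once per optimizer step, so the accumulated gradients are unscaled exactly once before clipping.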

When GPU memory is limited and a model has to be trained with a small batch size, training may not converge well. In that case, try the technique above.

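Anyone who has trained or run inference with darknet/yolo has probably edited the batch and subdivisions items in the model cfg. The code above does not state a batch size explicitly, but the batch size we usually mean, and the one we define in PyTorch (strictly speaking, the mini-batch size), corresponds to batch / subdivisions in the yolo cfg, iters_to_accumulate corresponds to subdivisions, and batchsize * iters_to_accumulate corresponds to the yolo cfg's batch.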
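
In darknet, each forward/backward pass handles batch / subdivisions images and the weights are updated once per batch images. The mapping to the names used in the loop above can be sketched as a small helper. The function name darknet_to_pytorch and the dictionary keys are hypothetical, and raising an error on a non-divisible batch is a choice made for this sketch, not darknet behavior.

```python
def darknet_to_pytorch(batch: int, subdivisions: int) -> dict:
    """Translate darknet cfg values into the names used in the PyTorch loop."""
    if batch % subdivisions != 0:
        # Assumption of this sketch: require an exact split.
        raise ValueError("batch must be divisible by subdivisions")
    return {
        "batch_size": batch // subdivisions,   # samples per forward/backward pass
        "iters_to_accumulate": subdivisions,   # passes per weight update
        "effective_batch_size": batch,         # samples per weight update
    }


settings = darknet_to_pytorch(batch=64, subdivisions=16)
print(settings)
# {'batch_size': 4, 'iters_to_accumulate': 16, 'effective_batch_size': 64}
```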

You can think of it as splitting a mini-batch once more into smaller "mini mini-batches", loading several of them in turn, accumulating gradients through the forward & backward passes, and then updating the weights at a fixed interval (iters_to_accumulate).

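Example:

Batch size: 16

iters_to_accumulate: 4

In this case, the data is processed (forward & backward) 16 samples at a time, 4 times, and then optimized in a single step. This updates the weights in much the same way as setting the batch size to 64. Dividing the loss by iters_to_accumulate is what makes the accumulated gradient an average over all 64 samples.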
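
The numbers above can be checked directly. The sketch below compares the gradient from a single pass over 64 samples with the gradient accumulated over 4 passes of 16 samples. The toy linear model and random data are hypothetical, and it runs on CPU in float64 without AMP so that the comparison is not blurred by rounding.

```python
import torch
from torch import nn

torch.manual_seed(0)

batch_size, iters_to_accumulate = 16, 4
x = torch.randn(batch_size * iters_to_accumulate, 8, dtype=torch.float64)
y = torch.randn(batch_size * iters_to_accumulate, 1, dtype=torch.float64)

model = nn.Linear(8, 1).double()
loss_fn = nn.MSELoss()  # mean reduction over the samples in a batch

# Reference: one backward pass over all 64 samples.
model.zero_grad()
loss_fn(model(x), y).backward()
full_grad = model.weight.grad.clone()

# Accumulation: four backward passes over 16 samples each.
model.zero_grad()
for xb, yb in zip(x.split(batch_size), y.split(batch_size)):
    loss = loss_fn(model(xb), yb) / iters_to_accumulate
    loss.backward()  # adds into model.weight.grad
accumulated_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accumulated_grad))  # True
```

The match is exact here because the model has no batch-dependent layers. With layers such as BatchNorm, statistics are still computed over 16 samples per pass, which is one reason the result is similar to, rather than identical to, training with a batch size of 64.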

I used this technique when implementing the project below.

https://github.com/developer0hye/Simple-CenterNet


For reference, yolov5 also uses this technique.

https://github.com/ultralytics/yolov5

