Why are Visually-Grounded Language Models Bad at Image Classification? 왜 VLM 은 이미지 분류를 잘 못하는가?

티스토리 뷰

Deep Learning

Why are Visually-Grounded Language Models Bad at Image Classification? 왜 VLM 은 이미지 분류를 잘 못하는가?

developer0hye 2025. 1. 1. 16:58

Why are Visually-Grounded Language Models Bad at Image Classification?

Image classification is one of the most fundamental capabilities of machine vision intelligence. In this work, we revisit the image classification task using visually-grounded language models (VLMs) such as GPT-4V and LLaVA. We find that existing proprieta

arxiv.org

VLM 이 Computer Vision의 근본 Task 인 Image Classification 을 왜 잘 못하는지를 분석한 논문

논문에 나온 결론은 Classificaiton 위주의 학습 데이터를 구축하고 파인튜닝해주면 VLM도 잘할 수 있다고함.

'Deep Learning' 카테고리의 다른 글

InternVL2.5 78B 메모리 사용량은 얼마나 되고 인퍼런스 타임은 어느정도일까 (2)	2025.01.04
2025년 1월1일 기준 관심가져보면 좋을 거 같은 VLM InternVL2.5 (0)	2025.01.01
2025년 VLM모델의 Vision Encoder 트렌드 예상(스케일링, Native Resolution Processing) (0)	2025.01.01
huggingface 첫 space 개설 (1)	2024.12.27
register 기법 적용된 ViT 사용시 유의 사항 (1)	2024.12.15

공지사항

최근에 올라온 글

최근에 달린 댓글

Total

Today

Yesterday

링크

TAG more

« 2026/08 »
일	월	화	수	목	금	토
						1
2	3	4	5	6	7	8
9	10	11	12	13	14	15
16	17	18	19	20	21	22
23	24	25	26	27	28	29
30	31

글 보관함

지속 가능한 꾸준함

티스토리 뷰

Why are Visually-Grounded Language Models Bad at Image Classification? 왜 VLM 은 이미지 분류를 잘 못하는가?

'Deep Learning' 카테고리의 다른 글

티스토리툴바