Posts from 2025/01/01

InternVL2.5: a VLM worth watching as of January 1, 2025

Project page: https://internvl.github.io/blog/2024-12-05-InternVL-2.5/ (InternVL2.5: "We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series that builds upon InternVL 2.0, maintaining its core model architecture while introducing significant enhancements in training and testing strategies as well as data qualit.." internvl.github.io) Paper: https://arxiv.org/pdf/2412.05271 Hugging Face: https:/..

Deep Learning 2025. 1. 1. 17:51

Why are Visually-Grounded Language Models Bad at Image Classification? (Why do VLMs struggle with image classification?)

Why are Visually-Grounded Language Models Bad at Image Classification? "Image classification is one of the most fundamental capabilities of machine vision intelligence. In this work, we revisit the image classification task using visually-grounded language models (VLMs) such as GPT-4V and LLaVA. We find that existing proprieta.."

Deep Learning 2025. 1. 1. 16:58

Predicted 2025 trends for VLM vision encoders (scaling, native resolution processing)

I expect two VLM trends in 2025: 1. scaling, and 2. native resolution processing. 1. Scaling: the vision encoders in VLMs released in 2024 mostly seemed to fall in the 300M to 1B parameter range. Qwen2VL, for example, uses a ViT of roughly 675M parameters as its visual encoder; it ships in 2B, 7B, and 72B variants that all share the same visual encoder, with only the LLM module scaled up aggressively. LLaVA-NeXT used the same strategy. https://llava-vl.github.io/blog/2024-01-30-llava-next/ (LLaVA-NeXT: Improved reasoning, OCR, and world knowledge) LLaVA..

Deep Learning 2025. 1. 1. 14:37
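The excerpt above is cut off before it explains native resolution processing, so here is a minimal sketch of why it matters for the vision encoder: it compares how many visual tokens an image produces when it is always resized to a fixed square versus when it keeps roughly its original size and aspect ratio. The numbers are assumptions for illustration, not taken from the post: a ViT patch size of 14, a fixed 336×336 input for the fixed-resolution case, and 2×2 merging of adjacent patch tokens in the native-resolution case (as I understand Qwen2-VL does). The function names are hypothetical.

```python
def fixed_resolution_tokens(size: int = 336, patch: int = 14) -> int:
    """Fixed-resolution encoder (assumed): every image is resized to size x size,
    so the token count is the same regardless of the input's size or aspect ratio."""
    return (size // patch) ** 2


def native_resolution_tokens(height: int, width: int,
                             patch: int = 14, merge: int = 2) -> int:
    """Native-resolution encoder (assumed): keep the aspect ratio and only round
    each side to a multiple of patch * merge, so the token count grows with image size."""
    unit = patch * merge
    # Round each side to the nearest multiple of `unit`, but never below one unit.
    h = max(unit, round(height / unit) * unit)
    w = max(unit, round(width / unit) * unit)
    # (h / patch) * (w / patch) ViT patches; every merge x merge block becomes one token.
    return (h // patch) * (w // patch) // (merge * merge)


if __name__ == "__main__":
    print(fixed_resolution_tokens())               # 576, whatever the input image is
    print(native_resolution_tokens(336, 336))      # 144
    print(native_resolution_tokens(1080, 1920))    # 2691 for a 1080p frame
```

The design trade-off this illustrates: a fixed-resolution encoder has a constant cost but throws away detail in large or elongated images (bad for OCR and documents), while native resolution preserves that detail at the price of a token count that scales with image area, which is why such encoders typically also cap the maximum number of pixels.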

Sustainable consistency