'분류 전체보기' 카테고리의 글 목록 (4 Page)

Why are Visually-Grounded Language Models Bad at Image Classification? 왜 VLM 은 이미지 분류를 잘 못하는가?

Why are Visually-Grounded Language Models Bad at Image Classification? Why are Visually-Grounded Language Models Bad at Image Classification?Image classification is one of the most fundamental capabilities of machine vision intelligence. In this work, we revisit the image classification task using visually-grounded language models (VLMs) such as GPT-4V and LLaVA. We find that existing proprietaa..

Deep Learning 2025. 1. 1. 16:58

2025년 VLM모델의 Vision Encoder 트렌드 예상(스케일링, Native Resolution Processing)

2025년 VLM 모델 트렌드는 2개로 예상된다. 1. 스케일링 2. Native Resolution Processing 1. 스케일링 2024년도에 나온 VLM모델의 Vision Encoder 모델들의 사이즈는 대체로 300M~1B정도 였던 거 같다. Qwen2VL 도 Visual Encoder 로 675M 급의 ViT를 사용했고, 2B, 7B, 72B 모델이 있는데 다 Visual Enoder 는 같고 LLM Module만 크기를 냅다 키운식이다. 이런 전략은 LLaVA-NeXT에서도 쓰였다. https://llava-vl.github.io/blog/2024-01-30-llava-next/ LLaVA-NeXT: Improved reasoning, OCR, and world knowledgeLLaVA..

Deep Learning 2025. 1. 1. 14:37

python print 나 pdb가 제대로 출력안될때

with contextlib.redirect_stdout(open(os.devnull, "w")): 로 블록이 감싸져있는 건 아닌지 체크해보기 unsloth 버그 디버깅하다가 이걸로 몇 분 소요함

Python 2024. 12. 30. 22:01

unsloth 너무 너무 불안정하다!

이렇게 많은 스타를 받은 프로젝트에 .gitignore 도 없고 cicd 는 funding 밖에 없다...! gitignore는 내가 추가해서 pr을 날려놨다. https://github.com/unslothai/unsloth-zoo/pull/31 upload .gitignore by developer0hye · Pull Request #31 · unslothai/unsloth-zooWe should make sure adding a .gitignore file for clean project.github.comhttps://github.com/unslothai/unsloth/pull/1489 upload .gitignore by developer0hye · Pull Request #1489 · unsl..

Contribution 일지 2024. 12. 30. 21:31

huggingface 첫 space 개설

워 노스페이스 입던 내가허깅페이스 첫 스페이스 개설 https://huggingface.co/spaces/KingNish/Qwen2-VL-7B Qwen2-VL-7B - a Hugging Face Space by KingNishRunning on Zerohuggingface.co 위 스페이스를 그대로 베꼈다. 7B를 2B로 변경했고 이미지 업로드 하면 업로드한 이미지가 브라우저 창에 출력되게 수정했다. 위에 거는 업로드하면 업로드한 이미지 이름만 출력되고 이미지는 안뜬다. https://huggingface.co/spaces/developer0hye/Qwen2-VL-2B-Instruct Qwen2 VL 2B Instruct - a Hugging Face Space by developer0hyeRunnin..

Deep Learning 2024. 12. 27. 21:22

독거미 황축 키보드 구매 후기

회사 사무실에서 4년간 사용해온 기본 지급된 멤브레인 키보드를 대체하려고 독거미 황축 키보드를 구매했다. 소리가 생각보다 우렁차다. 과연 사무실에서 사용할 수 있을까. 일단은 월요일에 들고가서 하루 실사용해보고 팀원분들의 솔직한 후기를 들을 예정이다. 그리고 키 배열이 묘하게 불편하다. 귀찮아서 사진 첨부는 안하는데 del 키가 내가 사용해온 키보드와는 다른 곳에 위치해있다. 키보드가 높다. 원래 집에서 쓰던 것과 사무실의 멤브레인 키보드가 낮아서 더 체감이된다 이 키보드를 위해서 받침대를 깔아야할판이다. 받침대 구매로 인한 추가 소비는 지금 같은 불경기에 좋지 못하다. 내 손목을 희생할까... 후기글https://developer0hye.tistory.com/825 사무실에 갔던 독거미 황축 키보드..

기타 2024. 12. 20. 20:30

register 기법 적용된 ViT 사용시 유의 사항

import torchimport timmif __name__ == '__main__': dinov2_model_wo_reg = timm.create_model('vit_base_patch14_dinov2.lvd142m', pretrained=True) dinov2_model_w_reg = timm.create_model('vit_base_patch14_reg4_dinov2.lvd142m', pretrained=True) input = torch.randn(1, 3, 518, 518) output_wo_reg = dinov2_model_wo_reg.forward_features(input) output_w_reg = dinov2_model_w_reg.forward_feature..

Deep Learning 2024. 12. 15. 23:05

그냥 드는 의문 SWIN은 왜 CLIP 모델이 없을까

제곧내

Deep Learning 2024. 12. 15. 22:33

FROM CLIP TO DINO: VISUAL ENCODERS SHOUT IN MULTI-MODAL LARGE LANGUAGE MODELS, CLIP과 DINOv2 모델을 잘 Ensemble해보자

FROM CLIP TO DINO: VISUAL ENCODERS SHOUT IN MULTI-MODAL LARGE LANGUAGE MODELS CLIP 의 서로 다른 블락에서 나오는 피쳐들을 잘 Ensemble 해주고, DNIOv2 의 서로 다른 블락에서 나오는 피쳐들을 잘 Ensemble해주고 Concat 해주고 Embedding 해준 피쳐들을 Text Embeddings이랑 잘 LLM 에 넣어주면 VLM의 성능이 올라간다고한다. w/ MFM 은 하나의 모델에서 여러 블락에서 나온 피쳐를 Ensemble 해줬을때의 결과, 같은 모델에서 서로 다른 레이어에서 나오는 피쳐들을 활용하는거라 Ensemble이라고 하긴 뭐하긴하지만 merge보단 ensemble이라는 표현이 더 맞긴한 거 같아서 블로그에 정리..

Deep Learning 2024. 12. 9. 23:35

[Open Source Contribution] Unsloth-zoo Contribution

https://github.com/unslothai/unsloth-zoo/pull/21 Add `formatting_func` to Enable Lazy Data Loading in `UnslothVisionDataCollator` by developer0hye · Pull Request #21 · unslothOverview This PR introduces a new formatting_func parameter to the UnslothVisionDataCollator, allowing for dynamic formatting of examples during data collation. This enhancement addresses a critica...github.com unsloth로 qwe..

Contribution 일지 2024. 12. 9. 21:35

내 블로그 - 관리자 홈 전환	`Q` `Q`
새 글 쓰기	`W` `W`

글 수정 (권한 있는 경우)	`E` `E`
댓글 영역으로 이동	`C` `C`

이 페이지의 URL 복사	`S` `S`
맨 위로 이동	`T` `T`
티스토리 홈 이동	`H` `H`
단축키 안내	`Shift` + `/` `⇧` + `/`

지속 가능한 꾸준함

티스토리툴바

단축키

내 블로그

블로그 게시글

모든 영역

« 2025/04 »
일	월	화	수	목	금	토
		1	2	3	4	5
6	7	8	9	10	11	12
13	14	15	16	17	18	19
20	21	22	23	24	25	26
27	28	29	30