A First Taste of the FLUX Image Generation Model

developer0hye 2024. 10. 6. 04:32

https://huggingface.co/black-forest-labs/FLUX.1-dev

black-forest-labs/FLUX.1-dev · Hugging Face: FLUX.1 [dev] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

https://huggingface.co/black-forest-labs/FLUX.1-schnell

black-forest-labs/FLUX.1-schnell · Hugging Face: FLUX.1 [schnell] is a 12 billion parameter rectified flow transformer capable of generating images from text descriptions.

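What is FLUX? As of October 6, 2024, it is one of the publicly available image generation models that produces notably good images!

There are two variants, dev and schnell, and dev appears to be the higher-performance one. The trade-off is that dev needs around 50 generation steps to produce a high-quality image, while schnell gets there in about 4 steps, so schnell can be seen as the lighter model in terms of inference time. When I tried the demo site, dev did not actually need all 50 steps; setting it to around 28 still gave quite good results.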

https://youtu.be/IhtnxeoqZEY

From the few samples I skimmed, dev is clearly more photorealistic.

https://huggingface.co/spaces/black-forest-labs/FLUX.1-dev

FLUX.1 [dev] - a Hugging Face Space by black-forest-labs


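You can try the demo here. When you run it, it shows the image improving progressively at each step, so don't be put off by the noisy image that appears first; wait, and it gradually turns into a clean image!

Other publicly available image generation models are really bad at rendering text in their outputs, but when I prompted this model with something like "a ~ holding a ~ with the word ~ written on it", it did quite well. With schnell, though, a slightly longer input sentence sometimes came out with an entire word missing. The dev model handled the same input fine!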
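The prompt pattern described above can be captured in a small helper. This is only a convenience sketch: the function name `sign_prompt` and its parameters are hypothetical, and the wording simply mirrors the example prompts used later in this post.

```python
def sign_prompt(subject: str, text: str, held_object: str = "sign") -> str:
    """Build a prompt of the form '<subject> holding a <object> that says <text>'."""
    subject = subject.strip()
    text = text.strip()
    if not subject or not text:
        raise ValueError("subject and text must be non-empty")
    return f"{subject} holding a {held_object} that says {text}"


if __name__ == "__main__":
    print(sign_prompt("A cat", "hello world"))
```

Keeping the quoted text short seems sensible for schnell, given the dropped-word behaviour noted above.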

Installation

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

pip install -U diffusers

pip install transformers

pip install accelerate

pip install protobuf

pip install sentencepiece
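To double-check that the packages above ended up in the active environment, a quick lookup with the standard library can help. This is a minimal sketch: the helper names are made up for illustration, and the module list assumes the usual import names (for example, `protobuf` is imported as `google.protobuf`).

```python
import importlib.util

# Import names, which can differ from the pip package names.
REQUIRED_MODULES = [
    "torch",
    "diffusers",
    "transformers",
    "accelerate",
    "google.protobuf",
    "sentencepiece",
]


def is_installed(module_name: str) -> bool:
    """Return True if the module can be located without fully importing it."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when a parent package (e.g. 'google') is missing or malformed.
        return False


def missing_modules(names=REQUIRED_MODULES):
    """Return the subset of names that cannot be found."""
    return [name for name in names if not is_installed(name)]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", missing if missing else "none")
```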

After installing, sign up for Hugging Face and create an access token, accept the license at https://huggingface.co/black-forest-labs/FLUX.1-dev, and then in a terminal run

huggingface-cli login

and log in.

You will need the token at this point.
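If you would rather not paste the token interactively, one option is to read it from an environment variable and log in from Python. The sketch below assumes the token is stored in `HF_TOKEN` and uses `login` from the `huggingface_hub` package; the two helper functions are hypothetical names for illustration.

```python
import os


def read_hf_token(env_var: str = "HF_TOKEN"):
    """Return the token stored in the environment variable, or None if unset/blank."""
    token = os.environ.get(env_var, "").strip()
    return token or None


def login_with_env_token(env_var: str = "HF_TOKEN") -> None:
    """Log in to Hugging Face using a token taken from the environment."""
    token = read_hf_token(env_var)
    if token is None:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    # Imported lazily so read_hf_token works even without huggingface_hub installed.
    from huggingface_hub import login

    login(token=token)
```

Calling `login_with_env_token()` once before loading the pipeline should have the same effect as the CLI login.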

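Running the Code

The GPU I tested on is a 3070 with 8GB, so I set the resolution to 512x512. (The default resolution seems to be 1024x1024.)

dev needs 50 steps, but the estimated run time shown for that was about 50 minutes, so I gave up on it and ran schnell with 4 steps instead.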

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload() #save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=512,
    width=512,
    guidance_scale=0.0,
    num_inference_steps=4,
    max_sequence_length=256,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-schnell.png")
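As a rough back-of-the-envelope check on why an 8GB card needs CPU offloading here, the size of the weights alone can be estimated from the parameter count. This sketch only counts the 12 billion parameters quoted on the model card at 2 bytes each for bfloat16; it ignores activations and any other components, so treat it as a lower bound. The helper name is hypothetical.

```python
# Bytes used per parameter for a few common dtypes.
BYTES_PER_PARAM = {"float32": 4, "bfloat16": 2, "float16": 2}


def weight_size_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory needed just to hold the weights, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


if __name__ == "__main__":
    flux_params = 12e9  # "12 billion parameter" per the model card
    print(f"bfloat16 weights: {weight_size_gib(flux_params):.1f} GiB")
```

That works out to a little over 22 GiB for the weights in bfloat16, far more than 8GB, which is consistent with needing `enable_model_cpu_offload()` on this card.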

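If the GPU in your test PC is a 3090-class card or better with 24GB or more of VRAM, dev should probably be able to generate a 1024x1024 image in a reasonable amount of time. In that case, the example code below should work.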

import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained("black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload() #save some VRAM by offloading the model to CPU. Remove this if you have enough GPU power

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cpu").manual_seed(0)
).images[0]
image.save("flux-dev.png")
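The two scripts above differ only in the model name and a handful of call arguments, so the per-variant values can be collected in one place. This is a minimal sketch: the dictionary and function names are hypothetical, and the values are copied from the two scripts in this post rather than from any official recommendation.

```python
# Call arguments used in this post for each FLUX.1 variant.
FLUX_SETTINGS = {
    "schnell": {"guidance_scale": 0.0, "num_inference_steps": 4, "max_sequence_length": 256},
    "dev": {"guidance_scale": 3.5, "num_inference_steps": 50, "max_sequence_length": 512},
}


def flux_model_id(variant: str) -> str:
    """Return the Hugging Face repo id for a variant ('dev' or 'schnell')."""
    if variant not in FLUX_SETTINGS:
        raise ValueError(f"unknown variant: {variant!r}")
    return f"black-forest-labs/FLUX.1-{variant}"


def flux_call_kwargs(variant: str, height: int = 1024, width: int = 1024,
                     num_inference_steps=None) -> dict:
    """Build keyword arguments for the pipeline call, optionally overriding the step count."""
    if variant not in FLUX_SETTINGS:
        raise ValueError(f"unknown variant: {variant!r}")
    kwargs = dict(FLUX_SETTINGS[variant], height=height, width=width)
    if num_inference_steps is not None:
        kwargs["num_inference_steps"] = num_inference_steps
    return kwargs
```

With a loaded pipeline, something like `pipe(prompt, **flux_call_kwargs("dev", num_inference_steps=28))` would then try the 28-step setting that looked good enough in the demo.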

Results

Prompt: A horse holding a sign that says ChatGPT is God

Whoa... I expected something completely off when asking for a horse holding a sign, but it does a surprisingly good job.

Prompt: A tennis player holding an umbrella while doing a forehand

https://developer0hye.tistory.com/646

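20230903, 20230910, 20230917, 20230924 Tennis lessons 23, 24, 25, 26, and image generation AI

Float like a butterfly, sting like a bee. Description: a scene of hitting a forehand while holding an umbrella. I suspect even today's image generation AI models would struggle with an image like this. Let's try it https://huggingface.co/spaces/stabilityai/s

The result in the post linked above is from Stable Diffusion 2.x. Back then it generated an image that was completely off the mark...

schnell produced the image below,

and dev produced the image below.

It is on another level. The results are still not 100% satisfying, but I think this is a huge improvement.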

Using FLUX.1 [dev], I entered the following prompt:

an image of an Asian developer around 30 years old, wearing glasses, a checkered shirt under a gray zip-up hoodie, and a chef's hat. He is holding a sign that says 'When should we meet?' A small frog is sitting on his head, next to an orange mushroom.

and this is the image it generated.

I also polished this prompt with ChatGPT before entering it, and dropped the leading "Create".
