Text Generation
Models such as GPT, Gemini, and LLaMA generate essays, articles, stories, and conversational replies.
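To make the idea of generating text one piece at a time concrete, here is a minimal sketch in plain Python. It is a toy bigram sampler, not how GPT, Gemini, or LLaMA are implemented; it only shares the general loop of repeatedly picking a next word given what came before. The corpus and the names build_bigram_table and generate are assumptions made up for this illustration.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Map each word to the list of words that follow it in the text."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def generate(table, start, max_words=10, seed=0):
    """Generate text word by word, sampling each next word from the
    words that were seen after the current one."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    output = [start]
    while len(output) < max_words:
        candidates = table.get(output[-1])
        if not candidates:  # no known continuation, so stop early
            break
        output.append(rng.choice(candidates))
    return " ".join(output)


# Hypothetical toy corpus; a real model learns from vastly more text.
corpus = "the cat sat on the mat and the cat saw the dog on the mat"
table = build_bigram_table(corpus)
sample = generate(table, start="the", max_words=8)
print(sample)
```

Changing the seed gives a different sample from the same table, which loosely mirrors why the same prompt can produce different outputs from a real model when sampling is enabled.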
Image Generation
Tools such as Stable Diffusion, DALL·E, and Midjourney create realistic or artistic images from text prompts.
Audio & Music
Models and tools such as MusicLM (music) and ElevenLabs (speech) generate songs, background music, and realistic human voices.
Video Generation
Platforms such as Runway and Pika Labs generate short clips and animations from prompts.
Code Generation
AI tools such as GitHub Copilot and Tabnine suggest code completions inside the editor, helping developers write code faster.
Multimodal AI
Newer models such as GPT-4o and Gemini 1.5 handle text, images, audio, and video together for richer interactions.
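One way to picture a multimodal request is as an ordered list of typed parts. The sketch below is purely illustrative: Part, MultimodalPrompt, and the file names are hypothetical, and this is not the request format of GPT-4o, Gemini, or any other product.

```python
from dataclasses import dataclass, field
from typing import List

# Modalities named in the text above.
ALLOWED_KINDS = {"text", "image", "audio", "video"}


@dataclass
class Part:
    kind: str     # one of ALLOWED_KINDS
    content: str  # the text itself, or a (hypothetical) file path for media


@dataclass
class MultimodalPrompt:
    parts: List[Part] = field(default_factory=list)

    def add(self, kind, content):
        """Append one part, rejecting modalities the sketch does not know."""
        if kind not in ALLOWED_KINDS:
            raise ValueError(f"unsupported kind: {kind}")
        self.parts.append(Part(kind, content))
        return self  # returning self allows chained calls

    def kinds(self):
        """Return the modalities in the order they were added."""
        return [part.kind for part in self.parts]


prompt = (
    MultimodalPrompt()
    .add("text", "Describe what happens in this clip.")
    .add("image", "frame_001.png")
    .add("audio", "narration.wav")
)
print(prompt.kinds())
```

Keeping the parts in one ordered list, rather than in separate fields per modality, preserves how text and media are interleaved, which matters when the text refers to a specific image or audio segment.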
Example: Generate an Image
The following Python snippet uses Stable Diffusion through the diffusers library to create an image from a text prompt:
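# Requires the diffusers and torch packages and a CUDA-capable GPU
import torch
from diffusers import StableDiffusionPipeline
# Load the pretrained weights in half precision to reduce GPU memory use
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")
# Run the prompt; the result holds a list of PIL images, so take the first
image = pipe("a futuristic cityscape at sunset").images[0]
image.save("city.png")
Summary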
Generative AI spans multiple media types: text, images, audio, video, and code. Multimodal models combine several of these in a single system, so one application can work across them.