Install, Run & Control
Everything
on Your Computer
with 1 Click.

Pinokio is a browser that lets you install, run, and manage ANY server application, locally.
Verified
Scripts from Verified Publishers
Hunyuan3D-2-LowVRAM
Text/Image to 3D (Cross Platform: Mac + Windows + Linux): High-Resolution 3D Assets Generation with Large Scale Hunyuan3D Diffusion Models. https://github.com/deepbeepmeep/Hunyuan3D-2GP
image
Wan 2.1
[NVIDIA ONLY] Super Optimized Gradio UI for Wan2.1 video for GPU poor machines (5GB+ VRAM). Generate up to 12 sec videos https://github.com/deepbeepmeep/Wan2GP
image
cube
Roblox Foundation Model for 3D Intelligence --- Cross Platform (Mac, Windows, Linux): Requires 16GB+ VRAM PC or 18GB+ Memory Macs https://github.com/Roblox/cube
image
DiffRhythm
Generate songs with AI (up to 4 min 45 sec). Both with lyrics or instrumental https://github.com/ASLP-lab/DiffRhythm
image
HunyuanVideo
[NVIDIA ONLY] Super Optimized Gradio UI for Hunyuan Video Generator that works on GPU poor machines. Generate up to 10~14 sec videos https://github.com/deepbeepmeep/HunyuanVideoGP
image
Comfyui
The most powerful and modular diffusion model GUI, api and backend with a graph/nodes interface. https://github.com/comfyanonymous/ComfyUI
image
MatAnyone
MatAnyone AI is a tool for editing videos by separating objects from their backgrounds. It is an AI to remove the background from videos effectively. Stable Video Matting with Consistent Memory Propagation: https://github.com/pq-yang/MatAnyone.git
image
macOS-use
[Mac Only] We make AI agents that control Mac apps: https://github.com/browser-use/macOS-use
image
MagicQuill
An intelligent, interactive Image Editing System. Easily erase and add objects on a user-friendly interface.
image
zonos
Zonos-v0.1 is a leading open-weight text-to-speech model trained on more than 200k hours of varied multilingual speech, delivering expressiveness and quality on par with—or even surpassing—top TTS providers. https://github.com/Zyphra/Zonos
image
Deeper Hermes
deep hermes, but without the need for a system prompt. Autonomously responds based on its OWN judgment https://github.com/cocktailpeanut/deeperhermes
image
browser-use
Run AI Agent in your browser. https://github.com/browser-use/web-ui
image
YuE
[NVIDIA ONLY] YuEGP--A Web UI for YuE, an Open Full-song Generation Foundation Model (10G VRAM required), via https://github.com/deepbeepmeep/YuEGP
image
Open WebUI
User-friendly WebUI for LLMs, supported LLM runners include Ollama and OpenAI-compatible APIs https://github.com/open-webui/open-webui
image
bolt.diy
Prompt, run, edit, and deploy full-stack web apps. https://github.com/stackblitz-labs/bolt.diy
image
StyleTTS2 Studio
Build your own voice for StyleTTS2
image
FaceFusion 3.1.1
Industry leading face manipulation platform
image
MMAudio
Generate synchronized audio from video and/or text inputs https://github.com/hkchengrex/MMAudio
image
PSP
Pinokio System Programming: Make your own custom Pinokio
image
ai-video-composer
The ultimate video editor powered by natural language and FFMPEG https://huggingface.co/spaces/huggingface-projects/ai-video-composer
image
echomimic2
[NVIDIA ONLY] Make virtual avatars talk whatever you want with an image and an audio clip https://github.com/antgroup/echomimic_v2
image
Clarity Refiners UI
An enhanced local port of finegrain-image-enhancer powered by Refiners (https://huggingface.co/spaces/finegrain/finegrain-image-enhancer), which was adapted from philz1337x's Clarity Upscaler (https://github.com/philz1337x/clarity-upscaler)
image
pyramidflow
Pyramd Flow Video Generation AI (text-to-video & image-to-video) https://github.com/jy0205/Pyramid-Flow
image
RMBG-2-Studio
Enhanced background remove and replace app built around BRIA-RMBG-2.0 https://huggingface.co/briaai/RMBG-2.0
image
InstantIR
restore low-res images, restore broken images, recreate a new version of the image with a prompt https://huggingface.co/spaces/fffiloni/InstantIR
image
Hallucinator
[NVIDIA ONLY] Autocomplete any voice(s), powered by Hertz AI (Standard Intelligence)
image
fish
Multilingual Text-to-Speech with Voice Cloning (Supports: English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish) https://github.com/fishaudio/fish-speech
image
MFLUX-WEBUI
[MAC ONLY] A powerful and user-friendly web interface for FLUX, powered by MLX and Gradio via MFLUX
image
Allegro-txt2vid
[NVIDIA ONLY] Generate videos with Allegro txt2vid model https://github.com/rhymes-ai/Allegro
image
omnigen
A unified image generation model that you can use to perform various tasks, including but not limited to text-to-image generation, subject-driven generation, Identity-Preserving Generation, and image-conditioned generation. https://huggingface.co/spaces/Shitao/OmniGen
image
ditto
the simplest self-building coding agent https://github.com/yoheinakajima/ditto
image
e2-f5-tts
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching https://huggingface.co/spaces/mrfakename/E2-F5-TTS
image
diamond
Diffusion for World Modeling https://diamond-wm.github.io/
image
facepoke
[NVIDIA Only] Select a portrait, click to move the head around https://github.com/jbilcke-hf/FacePoke
image
MLX-Video-Transcription
[Mac Only] Super Fast MLX Powered Video Transcription https://github.com/RayFernando1337/MLX-Auto-Subtitled-Video-Generator/ by https://x.com/RayFernando1337
image
Invoke
The Gen AI Platform for Pro Studios https://github.com/invoke-ai/InvokeAI
image
diffusers-image-fill
Remove objects from an image https://huggingface.co/spaces/OzzyGT/diffusers-image-fill
image
Whisper-WebUI
A Web UI for easy subtitle using whisper model.
image
CogStudio
[NVIDIA ONLY] Advanced Web UI for CogVideo (text to video, image to video, video to video, extend video, etc) -- Generate videos with less than 10GB VRAM
image
moshi
[Mac only] a speech-text foundation model for real time dialogue https://github.com/kyutai-labs/moshi
image
Applio
A simple, high-quality voice conversion tool focused on ease of use and performance. https://github.com/IAHispano/Applio
image
fluxgym
[NVIDIA Only] Dead simple web UI for training FLUX LoRA with LOW VRAM support (From 12GB)
image
cogvideo
[NVIDIA ONLY] Generate videos with less than 10GB VRAM https://github.com/THUDM/CogVideo
image
Forge
[NVIDIA ONLY] The most efficient way to run FLUX (Optimized to run even on low memory machines, as low as 3GB VRAM with 512x512 resolution) https://github.com/lllyasviel/stable-diffusion-webui-forge
image
LivePortrait
Bring portraits to life! https://github.com/KwaiVGI/LivePortrait
image
flux-webui
Minimal Flux Web UI powered by Gradio & Diffusers (Flux Schnell + Flux Merged)
image
aura-sr-upscaler
AuraSR-v2 - An open reproduction of the GigaGAN Upscaler from fal.ai https://huggingface.co/spaces/gokaygokay/AuraSR-v2
image
audiocraft_plus
AudioCraft Plus is an all-in-one WebUI for the original AudioCraft, adding many quality features on top https://github.com/GrandaddyShmax/audiocraft_plus
image
artist
Artist is a training-free text-driven image stylization method. You give an image and input a prompt describing the desired style, Artist give you the stylized image in that style. The detail of the original image and the style you provide is harmonically integrated https://huggingface.co/spaces/fffiloni/Artist
image
RC Stable Audio Tools
Advanced Gradio UI for Stable Audio https://github.com/RoyalCities/RC-stable-audio-tools
image
PhotoMaker2
Customizing Realistic Human Photos via Stacked ID Embedding https://huggingface.co/spaces/TencentARC/PhotoMaker-V2
image
Fooocus
Minimal Stable Diffusion UI
image
autogpt
AutoGPT is a powerful tool that lets you create and run intelligent agents https://github.com/Significant-Gravitas/AutoGPT
image
gepeto
Generate Pinokio Launchers, Instantly. https://gepeto.pinokio.computer
image
Florence2
An advanced vision foundation model from MicroSoft https://huggingface.co/spaces/gokaygokay/Florence-2
image
hallo
[NVIDIA Only] Hierarchical Audio-Driven Visual Synthesis for Portrait Image Animation https://github.com/fudan-generative-vision/hallo
image
chat-with-mlx
[Mac Onlyl] An all-in-one LLMs Chat UI for Apple Silicon Mac using MLX Framework. https://github.com/qnguyen3/chat-with-mlx
image
flashdiffusion
Accelerating any conditional diffusion model for few steps image generation https://gojasper.github.io/flash-diffusion-project/
image
StableAudio
An Open Source Model for Audio Samples and Sound Design https://github.com/Stability-AI/stable-audio-tools
image
PCM
Phased Consistency Model - generate high quality images with 2 steps https://huggingface.co/spaces/radames/Phased-Consistency-Model-PCM
image
SillyTavern
a local-install interface that allows you to interact with text generation AIs (LLMs) to chat and roleplay with custom characters. https://docs.sillytavern.app/
image
AITown
Build and customize your own version of AI town - a virtual town where AI characters live, chat and socialize https://github.com/a16z-infra/ai-town
image
LlamaFactory
Unify Efficient Fine-Tuning of 100+ LLMs https://github.com/hiyouga/LLaMA-Factory
image
StoryDiffusion Comics
create a story by generating consistent images https://github.com/HVision-NKU/StoryDiffusion
image
ZeST
ZeST: Zero-Shot Material Transfer from a Single Image. Local port of https://huggingface.co/spaces/fffiloni/ZeST (Project: https://ttchengab.github.io/zest/)
image
Openvoice2
Openvoice 2 Web UI - A local web UI for Openvoice2, a multilingual voice cloning TTS https://x.com/myshell_ai/status/1783161876052066793
image
Lobe Chat
An open-source, modern-design ChatGPT/LLMs UI/Framework. Supports speech-synthesis, multi-modal, and extensible (function call) plugin system. https://github.com/lobehub/lobe-chat
image
IDM-VTON
Improving Diffusion Models for Authentic Virtual Try-on in the Wild https://huggingface.co/spaces/yisol/IDM-VTON
image
devika
Agentic AI Software Engineer https://github.com/stitionai/devika
image
parler-tts
a lightweight text-to-speech (TTS) model that can generate high-quality speech with features that can be controlled using a simple text prompt (e.g. gender, background noise, speaking rate, pitch and reverberation). https://huggingface.co/spaces/parler-tts/parler_tts_mini
image
instantstyle
Upload the picture of an image, and generate images with that image style. Instant generation with no LoRA required https://huggingface.co/spaces/InstantX/InstantStyle
image
face-to-all
diffusers InstantID + ControlNet inspired by face-to-many from fofr (https://x.com/fofrAI) - a localized Version of https://huggingface.co/spaces/multimodalart/face-to-all
image
CustomNet
A unified encoder-based framework for object customization in text-to-image diffusion models https://huggingface.co/spaces/TencentARC/CustomNet
image
spright
Generate images with spatial accuracy https://huggingface.co/spaces/SPRIGHT-T2I/SPRIGHT-T2I
image
brushnet
A Plug-and-Play Image Inpainting Model with Decomposed Dual-Branch Diffusion https://huggingface.co/spaces/TencentARC/BrushNet
image
Arc2Face
A Foundation Model of Human Faces https://huggingface.co/spaces/FoivosPar/Arc2Face
image
supir
[NVIDIA ONLY] Text-driven, intelligent restoration, blending AI technology with creativity to give every image a brand new life https://supir.xpixel.group
image
moondream2
a tiny vision language model that kicks ass and runs anywhere https://github.com/vikhyat/moondream
image
ZETA
Zero-Shot Text-Based Audio Editing Using DDPM Inversion https://huggingface.co/spaces/hilamanor/audioEditing
image
differential-diffusion-ui
Differential Diffusion modifies an image according to a text prompt, and according to a map that specifies the amount of change in each region https://differential-diffusion.github.io/
image
dust3r
Geometric 3D Vision Made Easy https://dust3r.europe.naverlabs.com/
image
Chatbot-Ollama
open source chat UI for Ollama https://github.com/ivanfioravanti/chatbot-ollama
image
remove-video-bg
Video background removal tool https://huggingface.co/spaces/amirgame197/Remove-Video-Background
image
MeloTTS
High-quality multi-lingual text-to-speech library by MyShell.ai. Support English, Spanish, French, Chinese, Japanese and Korean https://github.com/myshell-ai/MeloTTS
image
gligen
An intuitive GUI for GLIGEN that uses ComfyUI in the backend https://github.com/mut-ex/gligen-gui
image
Stable Cascade
Stable Cascade from StabilityAI
image
Bark Voice Cloning
Upload a clean 20 seconds WAV file of the vocal persona you want to mimic, type your text-to-speech prompt and hit submit! A local version of https://huggingface.co/spaces/fffiloni/instant-TTS-Bark-cloning
image
[NVIDIA GPU ONLY] LGM
LGM: Large Multi-View Gaussian Model for High-Resolution 3D Content Creation https://huggingface.co/spaces/ashawkey/LGM
image
BRIA RMBG
Background removal model developed by BRIA.AI, trained on a carefully selected dataset and is available as an open-source model for non-commercial use https://huggingface.co/spaces/briaai/BRIA-RMBG-1.4
image
VideoCrafter 2
[Runs fast on NVIDIA GPUs. Works on M1/M2/M3 Macs but slow] VideoCrafter is an open-source video generation and editing toolbox for crafting video content. It currently includes the Text2Video and Image2Video models https://github.com/AILab-CVC/VideoCrafter
image
InstantID
state-of-the-art tuning-free method to achieve ID-Preserving generation with only single image, supporting various downstream tasks. https://instantid.github.io/
image
PhotoMaker
Customizing Realistic Human Photos via Stacked ID Embedding https://github.com/TencentARC/PhotoMaker
image
MAGNeT
MAGNeT is a text-to-music and text-to-sound model capable of generating high-quality audio samples conditioned on text descriptions https://github.com/facebookresearch/audiocraft/blob/main/docs/MAGNET.md
image
vid2pose
Video to Openpose & DWPose (All OS supported) https://github.com/sdbds/vid2pose
image
Moore-AnimateAnyone-Mini
[NVIDIA ONLY] Efficient Implementation of Animate Anyone (13G VRAM + 2G model size) https://github.com/sdbds/Moore-AnimateAnyone-for-windows
image
Moore-AnimateAnyone
[NVIDIA GPU ONLY] Unofficial Implementation of Animate Anyone https://github.com/MooreThreads/Moore-AnimateAnyone
image
OpenVoice
Instantly clone any voice from any text to any speech, in any language https://huggingface.co/spaces/myshell-ai/OpenVoice
image
IP-Adapter-FaceID
Enter a face image and transform it to any other image. Demo for the h94/IP-Adapter-FaceID model https://huggingface.co/spaces/multimodalart/Ip-Adapter-FaceID
image
dreamtalk
When Expressive Talking Head Generation Meets Diffusion Probabilistic Models (https://github.com/ali-vilab/dreamtalk)
image
Stable Diffusion web UI
One-click launcher for Stable Diffusion web UI (AUTOMATIC1111/stable-diffusion-webui)
image
Video2Openpose
Turn any video into Openpose video https://huggingface.co/spaces/fffiloni/video2openpose2
image
StyleAligned
Style Aligned Image Generation via Shared Attention https://style-aligned-gen.github.io/
image
MagicAnimate Mini
[NVIDIA GPU Only] An optimized version of MagicAnimate https://github.com/sdbds/magic-animate-for-windows
image
Vid2DensePose
Convert your videos to densepose and use it on MagicAnimate https://github.com/Flode-Labs/vid2densepose
image
kohya_ss
1 Click Installer for kohya_ss, a Stable Diffusion LoRa & Dreambooth WebUI (https://github.com/bmaltais/kohya_ss)
image
AudioSep
Separate Anything You Describe (https://huggingface.co/spaces/Audio-AGI/AudioSep)
image
XTTS
clone voices into different languages by using just a quick 3-second audio clip. (a local version of https://huggingface.co/spaces/coqui/xtts)
image
RVC
1 Click Installer for Retrieval-based-Voice-Conversion-WebUI (https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
image
AnimateDiff
Install AnimateDiff Automatic1111 Extension and the models with one click
image
lavie
Text-to-Video (T2V) generation framework from Vchitect https://github.com/Vchitect/LaVie
image
LEDITS++
Limitless Image Editing using Text-to-Image Models
image
Text Generation WebUI
A Gradio web UI for Large Language Models https://github.com/oobabooga/text-generation-webui
image
Latest
Latest Pinokio scripts from the community (tagged as 'pinokio' on GitHub)