Multimodal inference CLI — feed an image (and optional text) to a vision model.
Usage:
multimodal -m model.gguf --mmproj mmproj.gguf \
           [-i image.jpg] [-n n_predict] [-ngl n_gpu_layers] [prompt]
The language model (-m) and the projector (--mmproj) must be compatible, that is, built for the same architecture. If no image is supplied, the tool behaves like a plain text-completion CLI.
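As a rough illustration of how the optional arguments combine, the sketch below assembles an argument list that follows the usage line. The helper name build_multimodal_argv and its parameter names are hypothetical and not part of the tool; only the flags (-m, --mmproj, -i, -n, -ngl) and the trailing prompt come from the usage text, and the binary is assumed to be on PATH under the name multimodal.

```python
from typing import List, Optional


def build_multimodal_argv(
    model: str,
    mmproj: str,
    image: Optional[str] = None,
    n_predict: Optional[int] = None,
    n_gpu_layers: Optional[int] = None,
    prompt: Optional[str] = None,
    binary: str = "multimodal",  # assumed executable name, taken from the usage line
) -> List[str]:
    """Hypothetical helper: build an argv list matching the documented usage."""
    # The model and projector are the only required arguments.
    argv = [binary, "-m", model, "--mmproj", mmproj]
    # Each optional flag is appended only when a value was given.
    if image is not None:
        argv += ["-i", image]
    if n_predict is not None:
        argv += ["-n", str(n_predict)]
    if n_gpu_layers is not None:
        argv += ["-ngl", str(n_gpu_layers)]
    # The prompt is a positional argument and goes last.
    if prompt is not None:
        argv.append(prompt)
    return argv


if __name__ == "__main__":
    # Image plus prompt; omit `image` to get the text-only form.
    print(build_multimodal_argv(
        "model.gguf", "mmproj.gguf",
        image="image.jpg", n_predict=64, prompt="Describe this image.",
    ))
```

The resulting list can be handed to subprocess.run as is. Leaving out image produces the text-only invocation described above.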