multimodal

Multimodal inference CLI: feed an optional image and an optional text prompt to a vision model.

Usage:

multimodal -m model.gguf --mmproj mmproj.gguf \
           [-i image.jpg] [-n n_predict] [-ngl n_gpu_layers] [prompt]

The language model (-m) and the projector (--mmproj) must be compatible, that is, built for the same architecture. If no image is supplied, the tool behaves like a plain text-completion CLI.
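
The usage line above can be read as a small argument grammar. Since main is undocumented in the source, the following is only a minimal sketch of how such flags might be parsed in D, not the tool's actual implementation. The names Options and parseArgs, the default values for -n and -ngl, and the error messages are all assumptions made for illustration. A hand-written loop is used here because -ngl is a multi-character option behind a single dash.

```d
import std.conv : to;

/// Hypothetical container for the parsed options (not taken from the source).
struct Options
{
    string model;       // -m, required
    string mmproj;      // --mmproj, required
    string image;       // -i, optional; empty means text-only mode
    int nPredict = -1;  // -n; this default is an assumption
    int nGpuLayers = 0; // -ngl; this default is an assumption
    string prompt;      // optional trailing positional argument
}

/// Parses the flags listed in the usage text; args[0] is the program name.
/// Throws Exception on a missing flag value or a missing required flag,
/// and std.conv.ConvException on a non-numeric -n or -ngl value.
Options parseArgs(string[] args)
{
    Options opts;
    for (size_t i = 1; i < args.length; ++i)
    {
        string flag = args[i];

        // Consume and return the value that follows the current flag.
        string value()
        {
            if (i + 1 >= args.length)
                throw new Exception("missing value for " ~ flag);
            return args[++i];
        }

        switch (flag)
        {
            case "-m":       opts.model = value(); break;
            case "--mmproj": opts.mmproj = value(); break;
            case "-i":       opts.image = value(); break;
            case "-n":       opts.nPredict = value().to!int; break;
            case "-ngl":     opts.nGpuLayers = value().to!int; break;
            default:         opts.prompt = flag; break; // anything else is the prompt
        }
    }
    if (opts.model.length == 0 || opts.mmproj.length == 0)
        throw new Exception("-m and --mmproj are required");
    return opts;
}
```

With this sketch, a caller could decide between the two modes described above by checking whether opts.image is empty: an empty value corresponds to plain text completion, a non-empty one to image-plus-text inference.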

Members

Functions

main
int main(string[] args)
Undocumented in source. Be warned that the author may not have intended to support it.
printUsage
int printUsage(string prog)
Undocumented in source. Be warned that the author may not have intended to support it.
