multimodal

Multimodal inference CLI: feeds an image (and optional text) to a vision model.

Usage:

multimodal -m model.gguf --mmproj mmproj.gguf \
           [-i image.jpg] [-n n_predict] [-ngl n_gpu_layers] [prompt]

The language model and the projector must be compatible, that is, built for the same architecture. If no image is supplied, the tool behaves like a plain text-completion CLI.
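The two modes described above can be sketched as a pair of invocations. This is an illustrative dry run, not output from the tool: the file names (`model.gguf`, `mmproj.gguf`, `photo.jpg`), the numeric values, and the prompts are placeholders, and the reading of `-n` as a token limit and `-ngl` as a GPU layer count is inferred from the parameter names in the usage line. The `RUN=echo` indirection is a convenience of this sketch so that it runs without the binary installed.

```shell
#!/bin/sh
# Hypothetical file names; substitute your own model and projector.
MODEL=model.gguf
MMPROJ=mmproj.gguf

# Print each command instead of executing it. Set RUN to an empty string
# to run the real binary.
RUN=echo

# Image plus prompt. Assumption: -n caps the number of generated tokens and
# -ngl sets how many layers go to the GPU, as the parameter names suggest.
$RUN multimodal -m "$MODEL" --mmproj "$MMPROJ" -i photo.jpg -n 128 -ngl 32 "Describe this image."

# Text only: with no -i, the tool behaves like a plain text-completion CLI.
# --mmproj is still passed because the usage line does not mark it optional.
$RUN multimodal -m "$MODEL" --mmproj "$MMPROJ" -n 64 "Once upon a time"
```

Quoting the prompt keeps it a single positional argument, which matches the lone `[prompt]` slot in the usage line.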

Members

Functions

main: int main(string[] args). Undocumented in source. Be warned that the author may not have intended to support it.
printUsage: int printUsage(string prog). Undocumented in source. Be warned that the author may not have intended to support it.

