- mtmd_bitmap_free
void mtmd_bitmap_free(mtmd_bitmap* bitmap)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_bitmap_get_data
const(ubyte)* mtmd_bitmap_get_data(const(mtmd_bitmap)* bitmap)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_bitmap_get_id
const(char)* mtmd_bitmap_get_id(const(mtmd_bitmap)* bitmap)
Optional string ID used for KV-cache tracking.
- mtmd_bitmap_get_n_bytes
size_t mtmd_bitmap_get_n_bytes(const(mtmd_bitmap)* bitmap)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_bitmap_get_nx
uint mtmd_bitmap_get_nx(const(mtmd_bitmap)* bitmap)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_bitmap_get_ny
uint mtmd_bitmap_get_ny(const(mtmd_bitmap)* bitmap)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_bitmap_init
mtmd_bitmap* mtmd_bitmap_init(uint nx, uint ny, const(ubyte)* data)
Create an image bitmap from raw RGB pixels (RGBRGBRGB…; length must equal nx * ny * 3).
- mtmd_bitmap_init_from_audio
mtmd_bitmap* mtmd_bitmap_init_from_audio(size_t n_samples, const(float)* data)
Create an audio bitmap from float PCM samples.
- mtmd_bitmap_is_audio
bool mtmd_bitmap_is_audio(const(mtmd_bitmap)* bitmap)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_bitmap_set_id
void mtmd_bitmap_set_id(mtmd_bitmap* bitmap, const(char)* id)
- mtmd_context_params_default
mtmd_context_params mtmd_context_params_default()
Returns a default-initialised mtmd_context_params.
- mtmd_decode_use_mrope
bool mtmd_decode_use_mrope(mtmd_context* ctx)
True if the model uses M-RoPE (Multimodal RoPE) for llama_decode.
- mtmd_decode_use_non_causal
bool mtmd_decode_use_non_causal(mtmd_context* ctx)
True if the model requires a non-causal attention mask for llama_decode.
- mtmd_default_marker
const(char)* mtmd_default_marker()
Returns the default media marker string ("<__media__>").
- mtmd_encode_chunk
int mtmd_encode_chunk(mtmd_context* ctx, const(mtmd_input_chunk)* chunk)
Encode a single image/audio chunk. Returns 0 on success.
- mtmd_free
void mtmd_free(mtmd_context* ctx)
Frees a multimodal context.
- mtmd_get_audio_sample_rate
int mtmd_get_audio_sample_rate(mtmd_context* ctx)
Audio sample rate in Hz (e.g. 16 000 for Whisper), or -1 if unsupported.
- mtmd_get_output_embd
float* mtmd_get_output_embd(mtmd_context* ctx)
Pointer to the float embeddings from the last mtmd_encode_chunk call.
- mtmd_helper_bitmap_init_from_buf
mtmd_bitmap* mtmd_helper_bitmap_init_from_buf(mtmd_context* ctx, const(ubyte)* buf, size_t len)
Load an image or audio bitmap from an in-memory buffer (JPEG/PNG/BMP/GIF/WAV/MP3/FLAC). Thread-safe.
- mtmd_helper_bitmap_init_from_file
mtmd_bitmap* mtmd_helper_bitmap_init_from_file(mtmd_context* ctx, const(char)* fname)
Load an image or audio file into a bitmap. Thread-safe. Returns null on failure.
- mtmd_helper_decode_image_chunk
int mtmd_helper_decode_image_chunk(mtmd_context* ctx, llama_context* lctx, const(mtmd_input_chunk)* chunk, float* encoded_embd, llama_pos n_past, llama_seq_id seq_id, int n_batch, llama_pos* new_n_past)
Decode an already-encoded image chunk (embeddings pre-calculated).
- mtmd_helper_eval_chunk_single
int mtmd_helper_eval_chunk_single(mtmd_context* ctx, llama_context* lctx, const(mtmd_input_chunk)* chunk, llama_pos n_past, llama_seq_id seq_id, int n_batch, bool logits_last, llama_pos* new_n_past)
Like mtmd_helper_eval_chunks, but for a single chunk.
- mtmd_helper_eval_chunks
int mtmd_helper_eval_chunks(mtmd_context* ctx, llama_context* lctx, const(mtmd_input_chunks)* chunks, llama_pos n_past, llama_seq_id seq_id, int n_batch, bool logits_last, llama_pos* new_n_past)
Evaluate all chunks: text chunks via llama_decode, image chunks via mtmd_encode_chunk followed by llama_decode.
Returns 0 on success. NOT thread-safe.
- mtmd_helper_get_n_pos
llama_pos mtmd_helper_get_n_pos(const(mtmd_input_chunks)* chunks)
Total position count across all chunks (may differ from n_tokens for M-RoPE).
- mtmd_helper_get_n_tokens
size_t mtmd_helper_get_n_tokens(const(mtmd_input_chunks)* chunks)
Total token count across all chunks (for KV-cache sizing).
- mtmd_helper_log_set
void mtmd_helper_log_set(ggml_log_callback log_callback, void* user_data)
Set logging callback (also calls mtmd_log_set internally).
- mtmd_image_tokens_get_id
const(char)* mtmd_image_tokens_get_id(const(mtmd_image_tokens)* image_tokens)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_image_tokens_get_n_pos
llama_pos mtmd_image_tokens_get_n_pos(const(mtmd_image_tokens)* image_tokens)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_image_tokens_get_n_tokens
size_t mtmd_image_tokens_get_n_tokens(const(mtmd_image_tokens)* image_tokens)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_image_tokens_get_nx
size_t mtmd_image_tokens_get_nx(const(mtmd_image_tokens)* image_tokens)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_image_tokens_get_ny
size_t mtmd_image_tokens_get_ny(const(mtmd_image_tokens)* image_tokens)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_init_from_file
mtmd_context* mtmd_init_from_file(const(char)* mmproj_fname, const(llama_model)* text_model, mtmd_context_params ctx_params)
Initialises a multimodal context from a projector GGUF file.
Returns null on failure (bad path, incompatible model, etc.).
- mtmd_input_chunk_copy
mtmd_input_chunk* mtmd_input_chunk_copy(const(mtmd_input_chunk)* chunk)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunk_free
void mtmd_input_chunk_free(mtmd_input_chunk* chunk)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunk_get_id
const(char)* mtmd_input_chunk_get_id(const(mtmd_input_chunk)* chunk)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunk_get_n_pos
llama_pos mtmd_input_chunk_get_n_pos(const(mtmd_input_chunk)* chunk)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunk_get_n_tokens
size_t mtmd_input_chunk_get_n_tokens(const(mtmd_input_chunk)* chunk)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunk_get_tokens_image
const(mtmd_image_tokens)* mtmd_input_chunk_get_tokens_image(const(mtmd_input_chunk)* chunk)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunk_get_tokens_text
const(llama_token)* mtmd_input_chunk_get_tokens_text(const(mtmd_input_chunk)* chunk, size_t* n_tokens_output)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunk_get_type
mtmd_input_chunk_type mtmd_input_chunk_get_type(const(mtmd_input_chunk)* chunk)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunks_free
void mtmd_input_chunks_free(mtmd_input_chunks* chunks)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunks_get
const(mtmd_input_chunk)* mtmd_input_chunks_get(const(mtmd_input_chunks)* chunks, size_t idx)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunks_init
mtmd_input_chunks* mtmd_input_chunks_init()
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_input_chunks_size
size_t mtmd_input_chunks_size(const(mtmd_input_chunks)* chunks)
Undocumented in source; this is a binding to a C function, so searching the web for its name may turn up more.
- mtmd_log_set
void mtmd_log_set(ggml_log_callback log_callback, void* user_data)
- mtmd_support_audio
bool mtmd_support_audio(mtmd_context* ctx)
True if the model supports audio input.
- mtmd_support_vision
bool mtmd_support_vision(mtmd_context* ctx)
True if the model supports image input.
- mtmd_tokenize
int mtmd_tokenize(mtmd_context* ctx, mtmd_input_chunks* output, const(mtmd_input_text)* text, const(mtmd_bitmap*)* bitmaps, size_t n_bitmaps)
Tokenise a text prompt that may contain media markers.
Returns 0 on success, 1 on bitmap-count mismatch, 2 on preprocessing error.
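Because mtmd_tokenize reports a bitmap-count mismatch, it can help to compare the number of media markers in a prompt with the number of bitmaps before calling it. The following is a minimal sketch of such a pre-check, not the library's own logic: countMarkers and markersMatchBitmaps are hypothetical helpers, and the marker string is the default one listed for mtmd_default_marker.

```d
import std.algorithm.searching : count;

/// Default media marker, as listed for mtmd_default_marker.
enum defaultMarker = "<__media__>";

/// Hypothetical helper (not part of libmtmd): number of non-overlapping
/// occurrences of the marker in a prompt.
size_t countMarkers(string prompt, string marker = defaultMarker)
{
    return prompt.count(marker);
}

/// Hypothetical pre-check: true when the prompt contains exactly one
/// marker per bitmap.
bool markersMatchBitmaps(string prompt, size_t nBitmaps,
                         string marker = defaultMarker)
{
    return countMarkers(prompt, marker) == nBitmaps;
}

unittest
{
    // Two markers, two bitmaps: counts agree.
    assert(markersMatchBitmaps("compare <__media__> with <__media__>", 2));
    // One marker but no bitmaps: counts disagree.
    assert(!markersMatchBitmaps("describe <__media__>", 0));
}
```

The sketch only mirrors the count comparison; preprocessing errors (return code 2) can still occur inside the library and have to be handled from the return value.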
D bindings and wrappers for libmtmd (multimodal support).
libmtmd encodes images and audio into token embeddings that a language model can attend to alongside ordinary text tokens.
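The raw pixel layout that mtmd_bitmap_init expects (interleaved RGBRGBRGB…, nx * ny * 3 bytes) can be sketched in plain D as follows. makeSolidRgb and pixelOffset are hypothetical helpers rather than part of the bindings, and the row-major, unpadded ordering is an assumption that the listing does not state explicitly.

```d
/// Hypothetical helper (not part of libmtmd): builds an interleaved
/// RGBRGB... buffer for an nx-by-ny image filled with a single colour.
ubyte[] makeSolidRgb(uint nx, uint ny, ubyte r, ubyte g, ubyte b)
{
    immutable nPixels = cast(size_t) nx * ny;
    auto data = new ubyte[](nPixels * 3);
    foreach (i; 0 .. nPixels)
    {
        data[i * 3 + 0] = r;
        data[i * 3 + 1] = g;
        data[i * 3 + 2] = b;
    }
    return data;
}

/// Byte offset of pixel (x, y).
/// Assumption: rows are stored top to bottom with no padding.
size_t pixelOffset(uint nx, uint x, uint y)
{
    return (cast(size_t) y * nx + x) * 3;
}

unittest
{
    auto rgb = makeSolidRgb(4, 2, 255, 128, 0);
    // The length matches the nx * ny * 3 requirement.
    assert(rgb.length == 4 * 2 * 3);

    ubyte[] expected = [255, 128, 0];
    immutable off = pixelOffset(4, 3, 1);
    assert(rgb[off .. off + 3] == expected);
    // With libmtmd linked, the buffer could then be passed as
    // mtmd_bitmap_init(4, 2, rgb.ptr).
}
```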
Typical usage: