Integrations: Docling, Langflow, OpenSearch, Airbyte, GitHub Action & Grafana

Use MDDB alongside popular AI/ML, ELT, and observability tools to build production document processing and RAG pipelines.

Architecture Overview

graph LR
  subgraph Content Sources
    PDF[PDF / DOCX / PPTX]
    DOCLING[Docling<br>IBM Document Parser]
    WP[WordPress]
    WPEXP[wpexporter]
    WPSYNC[WordPress<br>Sync plugin]
    AB[Airbyte<br>300+ ELT sources]
    GHA[GitHub Action<br>repo docs / READMEs]
  end
  subgraph Storage & Search
    MDDB[mddbd<br>:11023 / :11024]
    BOLT[(BoltDB)]
    VEC[(Vector Index)]
    OS[(OpenSearch<br>optional)]
  end
  subgraph Output & Orchestration
    SSG[SSG<br>Static Site Generator]
    LF[Langflow<br>Visual RAG Builder]
    LLM[LLM<br>Claude / GPT / Llama]
    DEPLOY[GitHub Pages<br>Cloudflare / Netlify]
    GRAFANA[Grafana<br>datasource plugin]
  end
  PDF -->|parse| DOCLING
  DOCLING -->|markdown| MDDB
  WP -->|export| WPEXP
  WPEXP -->|markdown| MDDB
  WP -->|live hooks| WPSYNC
  WPSYNC -->|/v1/add /v1/delete| MDDB
  AB -->|destination-mddb| MDDB
  GHA -->|/v1/add per file| MDDB
  MDDB --> BOLT
  MDDB --> VEC
  MDDB -.->|sync| OS
  MDDB -->|fetch docs| SSG
  SSG -->|static HTML| DEPLOY
  LF -->|REST / MCP| MDDB
  LF -->|query| OS
  LF -->|generate| LLM
  LLM -->|answer| LF
  MDDB -->|/v1/temporal /v1/aggregate| GRAFANA

1. Docling → MDDB (Document Ingestion)

Docling is IBM's document parser that converts PDF, DOCX, PPTX, and HTML into structured Markdown. Since MDDB stores Markdown natively, this is a natural fit.

Install Docling

pip install docling

Basic: Parse and Store a Single Document

from docling.document_converter import DocumentConverter
import requests

MDDB_URL = "http://localhost:11023"

converter = DocumentConverter()
result = converter.convert("report.pdf")
markdown = result.document.export_to_markdown()

resp = requests.post(f"{MDDB_URL}/v1/add", json={
    "collection": "reports",
    "key": "report-2026-q1",
    "lang": "en_US",
    "meta": {
        "source": ["docling"],
        "type": ["pdf"],
        "title": ["Q1 2026 Report"],
    },
    "contentMd": markdown,
})
resp.raise_for_status()  # fail loudly if MDDB rejected the document

Batch: Ingest a Folder of Documents

"""
Bulk-import a folder of documents via Docling → MDDB.
Supports PDF, DOCX, PPTX, HTML.
"""
from pathlib import Path

import requests
from docling.document_converter import DocumentConverter

MDDB_URL = "http://localhost:11023"
COLLECTION = "knowledge-base"
INPUT_DIR = Path("./documents")

converter = DocumentConverter()
supported = {".pdf", ".docx", ".pptx", ".html", ".htm"}

for file in INPUT_DIR.iterdir():
    if file.suffix.lower() not in supported:
        continue
    print(f"Processing: {file.name}")
    result = converter.convert(str(file))
    markdown = result.document.export_to_markdown()

    resp = requests.post(f"{MDDB_URL}/v1/add", json={
        "collection": COLLECTION,
        "key": file.stem,
        "lang": "en_US",
        "meta": {
            "source": ["docling"],
            "type": [file.suffix.lstrip(".")],
            "filename": [file.name],
        },
        "contentMd": markdown,
    })
    if resp.status_code == 200:
        print(f"  OK: {file.name}")
    else:
        print(f"  ERROR: {resp.text}")

# Embeddings are generated in the background after /v1/add returns
print("Done. Check embedding progress:")
print(requests.get(f"{MDDB_URL}/v1/vector-stats").json())
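The batch script keys each document by file.stem, so report.pdf and report.docx in the same folder map to the same key, and because /v1/add is an upsert by key (see the Airbyte section), the second file silently replaces the first. A minimal sketch of a key helper that keeps the extension; the slug rules and the "untitled" fallback are assumptions, not MDDB conventions:

```python
import re
from pathlib import Path


def document_key(path: Path) -> str:
    """Derive an MDDB key from a file name, keeping the extension.

    /v1/add replaces any existing document with the same key, so using
    file.stem alone would let report.pdf and report.docx overwrite each other.
    """
    raw = f"{path.stem}-{path.suffix.lstrip('.')}".lower()
    # Collapse every run of characters outside [a-z0-9] into a single hyphen
    slug = re.sub(r"[^a-z0-9]+", "-", raw).strip("-")
    return slug or "untitled"  # hypothetical fallback for names with no ASCII letters/digits


if __name__ == "__main__":
    for name in ["Report.pdf", "Report.docx", "Q1 Results (final).pptx"]:
        print(f"{name} -> {document_key(Path(name))}")
```

To use it, replace `"key": file.stem` with `"key": document_key(file)`. Note that the regex drops non-ASCII characters, so heavily localized file names may need a different slug rule.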

With Chunking (for Better Vector Search)

Long documents should be split into chunks for more precise semantic search:

from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker
import requests

MDDB_URL = "http://localhost:11023"

converter = DocumentConverter()
result = converter.convert("manual.pdf")

chunker = HybridChunker(tokenizer="sentence-transformers/all-MiniLM-L6-v2")
chunks = list(chunker.chunk(result.document))

imported = 0
for i, chunk in enumerate(chunks):
    text = chunk.text
    # Skip very short chunks (stray headings, page furniture)
    if len(text.strip()) < 50:
        continue
    requests.post(f"{MDDB_URL}/v1/add", json={
        "collection": "manual",
        "key": f"manual-chunk-{i:04d}",
        "lang": "en_US",
        "meta": {
            "source": ["docling"],
            "chunk_index": [str(i)],
            "parent_doc": ["manual.pdf"],
        },
        "contentMd": text,
    })
    imported += 1

# Report only the chunks actually sent, not the ones skipped above
print(f"Imported {imported} of {len(chunks)} chunks from manual.pdf")
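Docling's HybridChunker is token-aware and follows the document structure. To experiment with chunk sizes without it (for example, on Markdown that did not come through Docling), a plain character window with overlap is a simpler, cruder substitute. The sketch below is that swapped-in technique, not what HybridChunker does; the 1000/200 sizes are arbitrary starting points, and the payload shape copies the example above.

```python
def window_chunks(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of `size` that overlap by `overlap` chars."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("require size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + size])
        if start + size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


def chunk_payloads(text: str, base_key: str, collection: str, parent_doc: str,
                   min_len: int = 50) -> list[dict]:
    """Build /v1/add bodies in the same shape as the HybridChunker example."""
    payloads = []
    for i, chunk in enumerate(window_chunks(text)):
        if len(chunk.strip()) < min_len:
            continue  # same short-chunk filter as the Docling example
        payloads.append({
            "collection": collection,
            "key": f"{base_key}-chunk-{i:04d}",
            "lang": "en_US",
            "meta": {
                "source": ["docling"],
                "chunk_index": [str(i)],
                "parent_doc": [parent_doc],
            },
            "contentMd": chunk,
        })
    return payloads
```

For example, `chunk_payloads(markdown, "manual", "manual", "manual.pdf")` yields bodies you can POST one by one to /v1/add. Overlap trades storage for recall: a sentence cut at a window edge still appears whole in the neighbouring chunk.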

Docker Pipeline

services:
  mddb:
    image: tradik/mddb:latest
    ports:
      - "11023:11023"
      - "11024:11024"
      - "9000:9000"
    volumes:
      - mddb-data:/app/data
    environment:
      MDDB_EMBEDDING_PROVIDER: openai
      MDDB_EMBEDDING_API_KEY: ${OPENAI_API_KEY}

  docling-ingest:
    build:
      context: .
      dockerfile: Dockerfile.docling
    volumes:
      - ./documents:/documents
    environment:
      MDDB_URL: http://mddb:11023
    depends_on:
      - mddb

volumes:
  mddb-data:

Dockerfile.docling:

FROM python:3.11-slim
RUN pip install docling requests
COPY ingest.py /app/ingest.py
CMD ["python", "/app/ingest.py"]
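Two details matter when ingest.py runs inside this compose file: the batch script hard-codes localhost, while the container must reach http://mddb:11023 from the MDDB_URL variable, and depends_on only waits for the mddb container to start, not for the API to answer. A minimal sketch of a startup preamble for ingest.py, assuming GET /v1/stats (used by the OpenSearch sync script) is a cheap readiness check; the INPUT_DIR variable is hypothetical:

```python
import os
import time
import urllib.request
from pathlib import Path
from typing import Callable, Mapping, Optional


def load_config(env: Optional[Mapping[str, str]] = None) -> tuple[str, Path]:
    """Read the MDDB URL and input folder from the environment set by compose."""
    env = os.environ if env is None else env
    url = env.get("MDDB_URL", "http://localhost:11023").rstrip("/")
    input_dir = Path(env.get("INPUT_DIR", "/documents"))  # INPUT_DIR is hypothetical
    return url, input_dir


def _http_probe(url: str) -> bool:
    # Assumption: GET /v1/stats answers 200 once mddbd is serving requests
    with urllib.request.urlopen(f"{url}/v1/stats", timeout=5) as resp:
        return resp.status == 200


def wait_for_mddb(url: str, timeout: float = 60.0, interval: float = 2.0,
                  probe: Callable[[str], bool] = _http_probe) -> bool:
    """Poll until `probe` succeeds or `timeout` seconds pass; True if MDDB answered."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            if probe(url):
                return True
        except OSError:
            pass  # connection refused, DNS not ready, or HTTP error: try again
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    mddb_url, input_dir = load_config()
    if not wait_for_mddb(mddb_url):
        raise SystemExit(f"MDDB at {mddb_url} did not become ready")
    print(f"MDDB ready at {mddb_url}; ingesting from {input_dir}")
```

An alternative is a compose healthcheck on the mddb service combined with `depends_on: condition: service_healthy`; the polling loop above keeps the script usable outside compose as well.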

2. Langflow + MDDB (Visual RAG Orchestration)

Langflow is a visual framework for building LLM workflows. MDDB can be integrated as a retrieval component via REST API or MCP.

Install Langflow

pip install langflow
langflow run

Option A: Custom Python Component (REST API)

Create a custom Langflow component that queries MDDB:

"""
MDDB Search Component for Langflow.
Save as: mddb_component.py
Import in Langflow via Custom Components.
"""
import requests
from langflow.custom import Component
from langflow.io import IntInput, MessageTextInput, Output
from langflow.schema import Data


class MDDBSearch(Component):
    display_name = "MDDB Semantic Search"
    description = "Search MDDB knowledge base using semantic/vector search."
    icon = "search"

    inputs = [
        MessageTextInput(
            name="query",
            display_name="Search Query",
            info="Natural language query to search for.",
        ),
        MessageTextInput(
            name="mddb_url",
            display_name="MDDB URL",
            value="http://localhost:11023",
            info="MDDB server address.",
        ),
        MessageTextInput(
            name="collection",
            display_name="Collection",
            value="docs",
            info="MDDB collection to search.",
        ),
        IntInput(
            name="top_k",
            display_name="Top K",
            value=5,
            info="Number of results to return.",
        ),
    ]

    outputs = [
        Output(display_name="Results", name="results", method="search"),
    ]

    def search(self) -> list[Data]:
        response = requests.post(
            f"{self.mddb_url}/v1/vector-search",
            json={
                "collection": self.collection,
                "query": self.query,
                "topK": self.top_k,
                "threshold": 0.6,
                "includeContent": True,
            },
            timeout=30,
        )
        response.raise_for_status()  # surface MDDB errors in the Langflow UI
        results = response.json().get("results", [])
        return [
            Data(data={
                "key": r["document"]["key"],
                "content": r["document"].get("contentMd", ""),
                "score": r["score"],
                "meta": r["document"].get("meta", {}),
            })
            for r in results
        ]

Using in Langflow

  1. Open Langflow UI → My Collection → New Project
  2. Go to Custom Components → upload mddb_component.py
  3. Build a flow:
[Chat Input] → [MDDB Semantic Search] → [Parse Data] → [Prompt] → [LLM] → [Chat Output]

The Prompt template:

Answer the user's question based on the following documents from the knowledge base.
Cite sources by their key.

Context:
{documents}

Question: {query}
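The Parse Data step has to turn the component's Data records (key, content, score, meta) into the {documents} string. If you prefer to do that in code, for instance inside the custom component, here is a minimal sketch. The "[key]" prefix is what lets the LLM follow "Cite sources by their key"; the 4000-character budget and the separator are arbitrary assumptions.

```python
def format_context(results: list[dict], max_chars: int = 4000) -> str:
    """Render search hits as '[key] (score)' blocks for the {documents} slot.

    Stops before the block that would push the total past max_chars, so the
    prompt stays within a rough size budget (separators are not counted).
    """
    blocks: list[str] = []
    used = 0
    for r in results:
        block = f"[{r['key']}] (score {r['score']:.2f})\n{r['content'].strip()}"
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    return "\n\n---\n\n".join(blocks)


PROMPT = (
    "Answer the user's question based on the following documents from the knowledge base.\n"
    "Cite sources by their key.\n\n"
    "Context:\n{documents}\n\n"
    "Question: {query}"
)
```

Usage: `PROMPT.format(documents=format_context(hits), query=user_question)`, where `hits` are the dicts returned by MDDBSearch.search(). Because results arrive sorted by score, truncating from the end drops the weakest matches first.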

Option B: Langflow + MDDB via MCP

If your Langflow version supports MCP tool calling, connect MDDB directly:

{
  "mcpServers": {
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--network", "host",
        "-v", "mddb-data:/app/data",
        "-e", "MDDB_MCP_STDIO=true",
        "tradik/mddb:latest"
      ]
    }
  }
}

Available MCP tools for Langflow: semantic_search, full_text_search, hybrid_search, search_documents, add_document, import_url, and 46 more (52 in total).

Option C: Langflow API Tool (No Custom Code)

Use Langflow's built-in API Request component to call MDDB directly:

  1. Add an API Request component
  2. Set Method: POST
  3. Set URL: http://localhost:11023/v1/vector-search
  4. Set Body:
{
  "collection": "docs",
  "query": "{query}",
  "topK": 5,
  "includeContent": true
}
  5. Connect: [Chat Input] → [API Request] → [Parse Data] → [Prompt] → [LLM] → [Chat Output]

Full Langflow RAG Flow Example

graph LR
  INPUT[Chat Input] --> MDDB[MDDB Search<br>vector-search]
  MDDB --> PARSE[Parse Data<br>extract contentMd]
  PARSE --> PROMPT[Prompt Template<br>context + question]
  INPUT --> PROMPT
  PROMPT --> LLM[OpenAI / Claude<br>/ Ollama]
  LLM --> OUTPUT[Chat Output]

3. OpenSearch + MDDB (Scalable Search)

MDDB's built-in vector search works well up to ~50K documents. For larger datasets or advanced full-text search (BM25, aggregations, facets), sync documents to OpenSearch.

Architecture

| Feature | MDDB | OpenSearch |
|---|---|---|
| Storage | Primary (BoltDB) | Search index (replica) |
| Vector search | In-memory, ~50K docs | kNN plugin, millions |
| Full-text search | Built-in TF scoring | BM25, analyzers, stemming |
| Aggregations | Metadata facets only (/v1/aggregate) | Yes (facets, histograms) |
| MCP tools | 52 built-in | No |

Strategy: MDDB as the primary store + MCP interface, OpenSearch as the search backend for scale.

Setup OpenSearch

services:
  mddb:
    image: tradik/mddb:latest
    ports:
      - "11023:11023"
    volumes:
      - mddb-data:/app/data
    environment:
      MDDB_EMBEDDING_PROVIDER: openai
      MDDB_EMBEDDING_API_KEY: ${OPENAI_API_KEY}

  opensearch:
    image: opensearchproject/opensearch:2
    ports:
      - "9200:9200"
    environment:
      discovery.type: single-node
      DISABLE_SECURITY_PLUGIN: "true"
    volumes:
      - os-data:/usr/share/opensearch/data

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2
    ports:
      - "5601:5601"
    environment:
      OPENSEARCH_HOSTS: '["http://opensearch:9200"]'
      DISABLE_SECURITY_DASHBOARDS_PLUGIN: "true"

volumes:
  mddb-data:
  os-data:

Create OpenSearch Index

curl -X PUT http://localhost:9200/mddb-docs \
  -H 'Content-Type: application/json' \
  -d '{
  "settings": {
    "index": { "knn": true, "number_of_replicas": 0 }
  },
  "mappings": {
    "properties": {
      "key":        { "type": "keyword" },
      "collection": { "type": "keyword" },
      "lang":       { "type": "keyword" },
      "contentMd":  { "type": "text", "analyzer": "standard" },
      "meta":       { "type": "object", "enabled": true },
      "addedAt":    { "type": "date" },
      "updatedAt":  { "type": "date" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": { "name": "hnsw", "engine": "lucene" }
      }
    }
  }
}'

Sync Script: MDDB → OpenSearch

"""
Sync documents from MDDB to OpenSearch.
Run periodically (cron) or trigger via MDDB webhook.
"""
import json

import requests

MDDB_URL = "http://localhost:11023"
OS_URL = "http://localhost:9200"
INDEX = "mddb-docs"


def sync_collection(collection: str):
    """Export all docs from MDDB and index into OpenSearch."""
    # Step 1: Export from MDDB as NDJSON
    resp = requests.post(f"{MDDB_URL}/v1/export", json={
        "collection": collection,
        "format": "ndjson",
    })
    if resp.status_code != 200:
        print(f"Export failed: {resp.text}")
        return

    # Step 2: Bulk index into OpenSearch
    bulk_body = ""
    count = 0
    for line in resp.text.strip().split("\n"):
        if not line:
            continue
        doc = json.loads(line)
        # Use one lang value for both the _id and the indexed field, with the
        # same default as the webhook handler, so both paths produce the same IDs
        lang = doc.get("lang", "en_us")
        doc_id = f"{collection}|{doc['key']}|{lang}"
        bulk_body += json.dumps({"index": {"_index": INDEX, "_id": doc_id}}) + "\n"
        bulk_body += json.dumps({
            "key": doc["key"],
            "collection": collection,
            "lang": lang,
            "contentMd": doc.get("contentMd", ""),
            "meta": doc.get("meta", {}),
            "addedAt": doc.get("addedAt"),
            "updatedAt": doc.get("updatedAt"),
        }) + "\n"
        count += 1

    if bulk_body:
        r = requests.post(
            f"{OS_URL}/_bulk",
            data=bulk_body,
            headers={"Content-Type": "application/x-ndjson"},
        )
        result = r.json()
        errors = result.get("errors", False)
        print(f"Synced {count} docs from '{collection}' → OpenSearch (errors={errors})")


stats = requests.get(f"{MDDB_URL}/v1/stats").json()
for coll in stats.get("collections", {}).keys():
    sync_collection(coll)
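The sync script sends a whole collection as one _bulk request and only prints the top-level errors flag. Large collections can exceed OpenSearch's request size limit (http.max_content_length, 100 MB by default), and errors=true does not say which documents failed. A minimal sketch of two helpers for that; the batch size of 500 is an arbitrary assumption:

```python
import json
from typing import Iterable, Iterator


def bulk_batches(docs: Iterable[tuple[str, dict]], index: str,
                 batch_size: int = 500) -> Iterator[str]:
    """Yield NDJSON _bulk bodies holding at most `batch_size` documents each."""
    lines: list[str] = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps(source))
        if len(lines) == 2 * batch_size:  # two lines per document
            yield "\n".join(lines) + "\n"  # _bulk bodies must end with a newline
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


def failed_ids(bulk_response: dict) -> list[str]:
    """Return the _id of every item the _bulk API rejected."""
    failed = []
    for item in bulk_response.get("items", []):
        # Each item is {"<action>": {"_id": ..., "status": ..., "error"?: ...}}
        _action, result = next(iter(item.items()))
        if "error" in result:
            failed.append(result.get("_id"))
    return failed
```

In sync_collection, collect `(doc_id, source)` pairs instead of concatenating a string, POST each body from `bulk_batches(...)` to /_bulk, and log `failed_ids(r.json())` so mapping errors point at concrete documents.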

Real-Time Sync via MDDB Webhooks

Instead of periodic sync, use MDDB webhooks for real-time updates:

curl -X POST http://localhost:11023/v1/webhooks \
  -H 'Content-Type: application/json' \
  -d '{
    "url": "http://sync-service:8080/mddb-webhook",
    "events": ["doc.add", "doc.update", "doc.delete"],
    "collections": ["*"]
  }'

Webhook handler that updates OpenSearch:

"""
Webhook receiver: syncs individual document changes to OpenSearch.
Run as a small FastAPI service, e.g. if saved as webhook.py:
    uvicorn webhook:app --host 0.0.0.0 --port 8080
"""
import requests
from fastapi import FastAPI, Request

app = FastAPI()

OS_URL = "http://localhost:9200"
INDEX = "mddb-docs"
MDDB_URL = "http://localhost:11023"


@app.post("/mddb-webhook")
async def handle_webhook(request: Request):
    payload = await request.json()
    event = payload.get("event")
    collection = payload.get("collection")
    key = payload.get("key")
    lang = payload.get("lang", "en_us")
    doc_id = f"{collection}|{key}|{lang}"

    # Note: requests is blocking; fine at low webhook volume, consider an
    # async HTTP client if this handler must serve many events concurrently.
    if event in ("doc.add", "doc.update"):
        # Fetch full document from MDDB
        resp = requests.post(f"{MDDB_URL}/v1/get", json={
            "collection": collection,
            "key": key,
            "lang": lang,
        })
        if resp.status_code != 200:
            return {"ok": False, "error": resp.text}
        doc = resp.json()
        # Index into OpenSearch
        requests.put(f"{OS_URL}/{INDEX}/_doc/{doc_id}", json={
            "key": key,
            "collection": collection,
            "lang": lang,
            "contentMd": doc.get("contentMd", ""),
            "meta": doc.get("meta", {}),
        })
    elif event == "doc.delete":
        requests.delete(f"{OS_URL}/{INDEX}/_doc/{doc_id}")

    return {"ok": True}

Query OpenSearch from MDDB Pipeline

"""
Hybrid search: MDDB for semantic, OpenSearch for full-text.
Merge results for best recall.
"""
import requests

MDDB_URL = "http://localhost:11023"
OS_URL = "http://localhost:9200"


def hybrid_search(query: str, collection: str, top_k: int = 5) -> list:
    # Semantic search via MDDB
    mddb_resp = requests.post(f"{MDDB_URL}/v1/vector-search", json={
        "collection": collection,
        "query": query,
        "topK": top_k,
        "threshold": 0.6,
        "includeContent": True,
    })
    vector_results = mddb_resp.json().get("results", [])

    # Full-text search via OpenSearch (BM25)
    os_resp = requests.post(f"{OS_URL}/mddb-docs/_search", json={
        "query": {
            "bool": {
                "must": {"match": {"contentMd": query}},
                "filter": {"term": {"collection": collection}},
            }
        },
        "size": top_k,
    })
    os_hits = os_resp.json().get("hits", {}).get("hits", [])

    # Merge and deduplicate by key. Vector hits come first, so BM25-only
    # hits only fill the slots left over within top_k.
    seen = set()
    merged = []
    for r in vector_results:
        key = r["document"]["key"]
        if key not in seen:
            seen.add(key)
            merged.append({
                "key": key,
                "content": r["document"].get("contentMd", ""),
                "vector_score": r["score"],
                "source": "mddb-vector",
            })
    for hit in os_hits:
        key = hit["_source"]["key"]
        if key not in seen:
            seen.add(key)
            merged.append({
                "key": key,
                "content": hit["_source"].get("contentMd", ""),
                "bm25_score": hit["_score"],
                "source": "opensearch-bm25",
            })
    return merged[:top_k]
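hybrid_search lists vector hits first and lets BM25 results fill whatever is left, so a document that ranks first in BM25 but is missing from the vector results can be cut entirely. A common alternative is Reciprocal Rank Fusion (RRF): each document scores the sum of 1/(k + rank) over the lists it appears in, so documents ranked well by both engines rise to the top, and the raw scores (cosine similarity vs BM25, which are not comparable) are ignored. This is a swapped-in merging technique, not something MDDB or OpenSearch does for you here; k = 60 is the conventional default.

```python
def reciprocal_rank_fusion(ranked_lists: list[list[str]], k: int = 60,
                           top_k: int = 5) -> list[tuple[str, float]]:
    """Fuse several best-first key lists with RRF; return (key, score) pairs."""
    scores: dict[str, float] = {}
    for ranking in ranked_lists:
        for rank, key in enumerate(ranking, start=1):
            scores[key] = scores.get(key, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; break ties by key so the order is deterministic
    fused = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return fused[:top_k]
```

Inside hybrid_search you would feed it `[r["document"]["key"] for r in vector_results]` and `[h["_source"]["key"] for h in os_hits]`, then look up content for the fused keys. Requesting more than top_k candidates from each engine (for example 3×) gives the fusion more to work with.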

OpenSearch kNN Search (Vector Search at Scale)

For datasets larger than ~50K documents, use OpenSearch kNN instead of MDDB's in-memory index:

"""
Use OpenSearch kNN for large-scale vector search.
Requires syncing embeddings from MDDB to OpenSearch.
"""
import requests

MDDB_URL = "http://localhost:11023"
OS_URL = "http://localhost:9200"


def get_embedding(text: str) -> list:
    """Get embedding vector from MDDB's embedding endpoint."""
    resp = requests.post(f"{MDDB_URL}/v1/embed", json={"text": text})
    return resp.json().get("embedding", [])


def opensearch_knn_search(query: str, collection: str, k: int = 10):
    """Semantic search via OpenSearch kNN plugin."""
    embedding = get_embedding(query)
    # The bool filter is applied after the k nearest neighbours are found,
    # so a selective collection filter can return fewer than k hits.
    resp = requests.post(f"{OS_URL}/mddb-docs/_search", json={
        "size": k,
        "query": {
            "bool": {
                "must": {
                    "knn": {
                        "embedding": {
                            "vector": embedding,
                            "k": k,
                        }
                    }
                },
                "filter": {"term": {"collection": collection}},
            }
        },
    })
    hits = resp.json().get("hits", {}).get("hits", [])
    return [
        {
            "key": h["_source"]["key"],
            "content": h["_source"].get("contentMd", ""),
            "score": h["_score"],
        }
        for h in hits
    ]
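The kNN query needs the embedding field populated, but the sync script above never writes it. One way to fill it, sketched under assumptions: compute vectors with the same /v1/embed call that get_embedding uses (an endpoint this page assumes, not one verified here) and send partial "update" actions through _bulk. The helper refuses vectors whose length differs from the mapping's dimension (1536 in the index created above), since OpenSearch rejects knn_vector values of the wrong size.

```python
import json

EMBEDDING_DIM = 1536  # must equal "dimension" in the mddb-docs mapping


def embedding_updates(index: str, vectors: dict[str, list[float]],
                      dim: int = EMBEDDING_DIM) -> str:
    """Build an NDJSON _bulk body that sets `embedding` on existing documents."""
    lines = []
    for doc_id, vector in vectors.items():
        if len(vector) != dim:
            raise ValueError(f"{doc_id}: got {len(vector)} dims, index expects {dim}")
        # "update" with a partial "doc" leaves the other indexed fields untouched
        lines.append(json.dumps({"update": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({"doc": {"embedding": vector}}))
    return "\n".join(lines) + "\n" if lines else ""
```

A possible loop: for each synced document, `vectors[doc_id] = get_embedding(doc["contentMd"])`, then POST `embedding_updates("mddb-docs", vectors)` to /_bulk with Content-Type application/x-ndjson. If your embedding model is not 1536-dimensional, change the mapping before indexing; the dimension of an existing knn_vector field cannot be changed in place.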

4. SSG: Static Site Generator from MDDB

SSG is a high-performance static site generator written in Go with built-in MDDB support. It pulls Markdown content directly from MDDB collections and renders complete static websites with themes, minification, and deployment-ready output.

graph LR
  subgraph Content Management
    MDDB[MDDB<br>:11023]
    PANEL[MDDB Panel<br>:9000]
  end
  subgraph Static Generation
    SSG[SSG<br>Static Site Generator]
    TPL[Templates<br>Go / Pongo2 / Mustache]
    OPT[Optimizer<br>WebP / Minify / Sitemap]
  end
  subgraph Deployment
    GH[GitHub Pages]
    CF[Cloudflare Pages]
    NL[Netlify / Vercel]
  end
  PANEL -->|edit content| MDDB
  MDDB -->|fetch docs<br>REST API| SSG
  SSG --> TPL
  TPL --> OPT
  OPT -->|static HTML/CSS/JS| GH
  OPT --> CF
  OPT --> NL

Generate a Site from MDDB Collection

brew install spagu/tap/ssg

ssg --mddb-url=http://localhost:11023 \
    --mddb-collection=blog \
    --mddb-lang=en_US \
    krowy example.com

CLI Flags

| Flag | Description | Default |
|---|---|---|
| --mddb-url | MDDB server URL (enables MDDB mode) | none |
| --mddb-collection | Collection to fetch posts from | none |
| --mddb-key | API key for authentication | none |
| --mddb-lang | Language filter | en_US |
| --mddb-timeout | Request timeout in seconds | 30 |

Dev Server with Live Reload

ssg serve --mddb-url=http://localhost:11023 \
    --mddb-collection=blog \
    krowy example.com

CI/CD Pipeline: MDDB β†’ SSG β†’ GitHub Pages

name: Deploy Site

on:
  workflow_dispatch:
  schedule:
    - cron: '0 */6 * * *'  # every 6 hours

jobs:
  build:
    runs-on: ubuntu-latest
    permissions:
      contents: write  # peaceiris/actions-gh-pages pushes to the gh-pages branch
    steps:
      - uses: actions/checkout@v4

      - name: Install SSG
        run: |
          curl -sL https://github.com/spagu/ssg/releases/latest/download/ssg_linux_amd64.tar.gz | tar xz
          sudo mv ssg /usr/local/bin/

      - name: Build site from MDDB
        run: |
          ssg --mddb-url=${{ secrets.MDDB_URL }} \
              --mddb-collection=blog \
              --mddb-key=${{ secrets.MDDB_API_KEY }} \
              krowy ${{ vars.SITE_DOMAIN }}

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./public

Docker Pipeline

services:
  mddb:
    image: tradik/mddb:latest
    ports:
      - "11023:11023"
      - "9000:9000"
    volumes:
      - mddb-data:/app/data

  ssg:
    image: spagu/ssg:latest
    depends_on:
      - mddb
    command: >
      ssg --mddb-url=http://mddb:11023
      --mddb-collection=blog
      krowy example.com
    volumes:
      - ./public:/app/public

volumes:
  mddb-data:

Workflow: Edit in Panel → Generate → Deploy

sequenceDiagram
  participant U as Editor
  participant P as MDDB Panel
  participant M as MDDB
  participant S as SSG
  participant D as GitHub Pages
  U->>P: Edit blog post in Panel
  P->>M: POST /v1/add (save document)
  Note over M: Webhook fires on doc.add
  M->>S: Webhook triggers SSG build
  S->>M: GET /v1/search (fetch all posts)
  M->>S: Markdown documents + metadata
  S->>S: Render templates, optimize assets
  S->>D: Deploy static files
  D->>U: Site updated at example.com
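In this flow every saved post fires a webhook, so editing ten posts in a row would start ten full site builds. A minimal debounce sketch for the webhook receiver: each event restarts a quiet-period timer, and one build runs only after events stop. The 30-second delay is an assumption, and the ssg invocation reuses the flags from the Docker pipeline above.

```python
import subprocess
import threading
from typing import Callable, Optional


class DebouncedBuild:
    """Run `build` once, `delay` seconds after the most recent trigger()."""

    def __init__(self, build: Callable[[], None], delay: float = 30.0):
        self._build = build
        self._delay = delay
        self._timer: Optional[threading.Timer] = None
        self._lock = threading.Lock()

    def trigger(self) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()  # a newer event restarts the quiet period
            self._timer = threading.Timer(self._delay, self._build)
            self._timer.daemon = True
            self._timer.start()


def run_ssg() -> None:
    # Same flags as the Docker pipeline; adjust URL, collection, and template to your setup
    subprocess.run(
        ["ssg", "--mddb-url=http://mddb:11023", "--mddb-collection=blog",
         "krowy", "example.com"],
        check=True,
    )


builder = DebouncedBuild(run_ssg, delay=30.0)
# In a webhook handler (for example a FastAPI receiver like the OpenSearch one),
# call builder.trigger() for doc.add / doc.update / doc.delete events.
```

This only coalesces bursts; if a build takes longer than the delay and new events keep arriving, two builds can overlap, so a lock around run_ssg is worth adding for slow sites.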

5. wpexporter: WordPress to MDDB Migration

wpexporter is a Go toolkit for exporting WordPress content. It supports 14+ output formats including Markdown, and includes an MCP server (wpmcp) for AI-assisted migrations.

graph LR
  subgraph WordPress
    WP[WordPress Site]
    REST[REST API<br>/wp-json/wp/v2]
    XMLRPC[XML-RPC<br>/xmlrpc.php]
  end
  subgraph wpexporter
    EXP[wpexportjson<br>Public content]
    XR[wpxmlrpc<br>Private content]
    MCP_WP[wpmcp<br>MCP Server]
  end
  subgraph MDDB
    MDDB_S[mddbd<br>:11023]
    MCP_MD[mddb-mcp<br>MCP Server]
    BOLT[(BoltDB)]
    VEC[(Vector Index)]
  end
  WP --> REST --> EXP
  WP --> XMLRPC --> XR
  WP --> MCP_WP
  EXP -->|markdown export| MDDB_S
  XR -->|markdown export| MDDB_S
  MCP_WP -->|AI orchestration| MCP_MD
  MDDB_S --> BOLT
  MDDB_S --> VEC

Quick Export: WordPress → Markdown → MDDB

wpexportjson -url https://your-site.com -format markdown -output ./wp-export/

for file in wp-export/*.md; do
  key=$(basename "$file" .md)
  # Build the JSON body with jq so quotes and newlines in key/content are escaped safely
  jq -n --arg key "$key" --rawfile content "$file" '{
    collection: "blog",
    key: $key,
    lang: "en_US",
    meta: {source: ["wordpress"]},
    contentMd: $content
  }' | curl -X POST http://localhost:11023/v1/add \
    -H 'Content-Type: application/json' \
    -d @-
done

Python: Full Migration with Metadata

"""
Migrate WordPress → MDDB with full metadata preservation.
Uses wpexportjson JSON output for richer metadata.
"""
import json
import subprocess

import requests

MDDB_URL = "http://localhost:11023"
WP_URL = "https://your-site.com"
COLLECTION = "blog"

# Export WordPress content as JSON (check=True stops the script if the export fails)
subprocess.run([
    "wpexportjson",
    "-url", WP_URL,
    "-format", "json",
    "-output", "./wp-export.json",
], check=True)

with open("wp-export.json") as f:
    posts = json.load(f)

for post in posts:
    slug = post.get("slug", "")
    title = post.get("title", "")
    content_md = post.get("content_markdown", post.get("content", ""))
    categories = post.get("categories", [])
    tags = post.get("tags", [])
    date = post.get("date", "")

    resp = requests.post(f"{MDDB_URL}/v1/add", json={
        "collection": COLLECTION,
        "key": slug,
        "lang": "en_US",
        "meta": {
            "title": [title],
            "source": ["wordpress"],
            "wp_url": [f"{WP_URL}/{slug}/"],
            "category": categories if categories else ["uncategorized"],
            "tags": tags,
            "date": [date],
        },
        "contentMd": f"# {title}\n\n{content_md}",
    })
    status = "OK" if resp.status_code == 200 else f"ERROR: {resp.text}"
    print(f"  {slug}: {status}")

print(f"\nMigrated {len(posts)} posts. Embeddings generating in background.")

AI-Assisted Migration via MCP

Both wpexporter (wpmcp) and MDDB (mddb-mcp) have MCP servers. Connect both to Claude Desktop and let the AI orchestrate the migration:

{
  "mcpServers": {
    "wordpress": {
      "command": "wpmcp",
      "args": [],
      "env": {
        "WP_URL": "https://your-site.com",
        "WP_USER": "admin",
        "WP_APP_PASSWORD": "xxxx xxxx xxxx xxxx"
      }
    },
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--network", "host",
        "-v", "mddb-data:/app/data",
        "-e", "MDDB_MCP_STDIO=true",
        "tradik/mddb:latest"
      ]
    }
  }
}

Then ask Claude:

"Migrate all posts from WordPress to MDDB collection 'blog'. Preserve categories, tags, and dates as metadata. Skip draft posts."

sequenceDiagram
  participant U as User
  participant C as Claude Desktop
  participant WP as wpmcp<br>(WordPress)
  participant MD as mddb-mcp<br>(MDDB)
  U->>C: "Migrate all published posts<br>from WordPress to MDDB"
  C->>WP: tool: list_posts(status=published)
  WP->>C: 142 posts with metadata
  loop For each post
    C->>WP: tool: get_post(id=N, format=markdown)
    WP->>C: Markdown content + meta
    C->>MD: tool: add_document(collection=blog,<br>key=slug, contentMd=...)
    MD->>C: OK
  end
  C->>MD: tool: vector_reindex(collection=blog)
  MD->>C: Reindexing 142 documents
  C->>U: "Done! Migrated 142 posts.<br>Embeddings generating in background."

Full WordPress Migration Pipeline

graph TB
  subgraph s1["1. Export"]
    WP[WordPress] -->|REST API| WPEXP[wpexporter]
    WP -->|XML-RPC| WPEXP
    WPEXP -->|markdown + metadata| MD_FILES[Markdown Files]
  end
  subgraph s2["2. Store and Index"]
    MD_FILES -->|bulk import| MDDB[MDDB]
    MDDB -->|auto-embed| VEC[(Vector Index)]
    MDDB -->|store| BOLT[(BoltDB)]
  end
  subgraph s3["3. Use"]
    MDDB -->|MCP| CLAUDE[Claude / AI Agents]
    MDDB -->|REST| SSG_N[SSG<br>New Static Site]
    MDDB -->|vector search| RAG[RAG Pipeline]
  end
  style WP fill:#21759b,color:#fff
  style MDDB fill:#00d4aa,color:#000
  style CLAUDE fill:#d97706,color:#fff

6. Airbyte → MDDB (ELT Destination Connector)

Airbyte is an open-source ELT platform with 300+ source connectors. The MDDB Airbyte destination ships records from any Airbyte source (Postgres, MySQL, Salesforce, Stripe, S3, …) directly into MDDB via POST /v1/add. Each Airbyte stream maps to its own MDDB collection.

Image

| Registry | Image |
|---|---|
| Docker Hub | tradik/airbyte-destination-mddb:0.1.1 |
| GHCR | ghcr.io/tradik/airbyte-destination-mddb:0.1.1 (multi-arch + SLSA build-provenance) |

Source: integrations/airbyte-destination/

Register the connector in Airbyte UI

  1. Settings → Destinations → ⊕ New connector.
  2. Fill in:
    • Connector display name: MDDB
    • Docker repository name: tradik/airbyte-destination-mddb
    • Docker image tag: 0.1.1
    • Connector documentation URL: https://github.com/tradik/mddb/tree/main/integrations/airbyte-destination
  3. Add. Airbyte runs spec and renders the form.
  4. Destinations → ⊕ New destination → MDDB → fill mddbUrl (e.g. https://mddb.tradik.com) and optional apiKey (bearer vk_…) → Set up destination. Airbyte executes check against the MDDB instance and should report SUCCEEDED.

Configuration (spec)

| Field | Default | Description |
|---|---|---|
| mddbUrl | https://mddb.tradik.com | MDDB base URL, no trailing /. |
| apiKey | (empty) | Bearer token (vk_…). Empty = MDDB without auth. |
| keyField | id | Record field used as the MDDB document key. If the field is missing, the SHA-1 of the record is used. |
| language | en_US | Locale stored on every document. |
| batchSize | 100 | Records buffered before flush. A flush is also triggered on every Airbyte STATE message and at end-of-stream. |
| timeoutSeconds | 30 | HTTP timeout per request. |
| verifySsl | true | Set false only for self-signed dev instances. |
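The batchSize row and the Reliability notes describe when the connector flushes: when the buffer is full, on every STATE message, and at end of stream. The sketch below restates that loop in plain Python so the ordering is easy to see. It is not the connector's source: messages are simplified to dicts with a "type" field instead of Airbyte CDK message objects, and `send` stands in for the HTTP calls to /v1/add. In the Airbyte protocol a destination emits a STATE message only once the records before it are persisted, which is why the flush happens before the yield.

```python
from typing import Callable, Iterable, Iterator


class BatchBuffer:
    """Buffer records and hand them to `send` in batches of `batch_size`."""

    def __init__(self, send: Callable[[list[dict]], None], batch_size: int = 100):
        self._send = send
        self._batch_size = batch_size
        self._pending: list[dict] = []

    def add(self, record: dict) -> None:
        self._pending.append(record)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if self._pending:
            self._send(self._pending)  # if send raises, records stay buffered
            self._pending = []


def process(messages: Iterable[dict], buffer: BatchBuffer) -> Iterator[dict]:
    """Consume simplified Airbyte messages; yield STATE only after earlier records are sent."""
    for msg in messages:
        if msg["type"] == "RECORD":
            buffer.add(msg["record"])
        elif msg["type"] == "STATE":
            buffer.flush()  # checkpoint must not get ahead of written data
            yield msg
    buffer.flush()  # end of stream
```

Flushing on STATE keeps the checkpoint honest: if a sync dies mid-stream, Airbyte resumes from the last STATE it saw, and every record before that STATE is already in MDDB.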

Record mapping

Airbyte record {"id":"u-42","email":"a@b.c","tags":["pro","beta"]} on stream users becomes:

{
  "collection": "users",
  "key": "u-42",
  "lang": "en_US",
  "meta": {
    "id": ["u-42"],
    "email": ["a@b.c"],
    "tags": ["pro", "beta"]
  },
  "contentMd": "<!-- emittedAt=… -->\n```json\n{ …record… }\n```\n"
}

contentMd carries the full record inside a fenced JSON code block, so FTS and vector search index it out of the box. meta follows the native MDDB map<string,[]string> schema.
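The mapping can be expressed as a small function, which helps when predicting how a given source record will look in MDDB. A minimal sketch that reproduces the users example; the exact SHA-1 input (JSON with sorted keys), the JSON-encoding of nested objects, and dropping nulls are assumptions, not documented connector behaviour:

```python
import hashlib
import json
from typing import Any, Optional

FENCE = "`" * 3  # built this way so the snippet can sit inside Markdown docs


def record_to_document(stream: str, record: dict[str, Any], key_field: str = "id",
                       language: str = "en_US", emitted_at: Optional[int] = None) -> dict:
    """Map one Airbyte record to a /v1/add body following the mapping above."""
    # Assumption: the SHA-1 fallback hashes the record serialised with sorted keys
    serialized = json.dumps(record, sort_keys=True, ensure_ascii=False)
    value = record.get(key_field)
    if value not in (None, ""):
        key = str(value)
    else:
        key = hashlib.sha1(serialized.encode("utf-8")).hexdigest()

    # meta is map<string, []string>: lists become lists of strings, scalars become
    # one-element lists; nested objects are JSON-encoded and nulls dropped (assumptions)
    meta: dict[str, list[str]] = {}
    for field, v in record.items():
        if v is None:
            continue
        if isinstance(v, list):
            meta[field] = [str(item) for item in v]
        elif isinstance(v, dict):
            meta[field] = [json.dumps(v, sort_keys=True)]
        else:
            meta[field] = [str(v)]

    pretty = json.dumps(record, indent=2, ensure_ascii=False)
    content_md = f"<!-- emittedAt={emitted_at} -->\n{FENCE}json\n{pretty}\n{FENCE}\n"
    return {"collection": stream, "key": key, "lang": language,
            "meta": meta, "contentMd": content_md}
```

Because the key falls back to a content hash, a record without keyField that changes between syncs gets a new key each time, and the old version stays in MDDB (the connector does not delete orphans). Choosing a stable keyField avoids that.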

Sync modes

  • append: every record is upserted by key (existing docs replaced, no orphan deletion).
  • append_dedup: same semantics (/v1/add is upsert-by-key by nature).
  • overwrite: not advertised; if forced by the source, the connector logs WARN and falls back to append-upsert. Orphans are not deleted (MDDB has no batch-delete-by-collection).

Reliability

  • HTTP retry 3× with exponential backoff on 429/5xx (urllib3.Retry).
  • Flush on every AirbyteMessage(STATE) so partial syncs don't lose in-flight batches.
  • 40 unit tests, 97% coverage, CI matrix on Python 3.12 & 3.13.
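The same retry policy is useful in any script that writes to MDDB. A minimal sketch with requests and urllib3.Retry, approximating the connector's "3× with exponential backoff on 429/5xx" (the backoff factor and the exact 5xx codes are assumptions). POST must be allowed explicitly because urllib3 does not retry it by default; retrying /v1/add is safe here because it is an upsert by key.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry


def mddb_session(api_key: str = "", retries: int = 3) -> requests.Session:
    """Session that retries 429/5xx responses with exponential backoff."""
    retry = Retry(
        total=retries,
        backoff_factor=0.5,                          # sleep grows exponentially between attempts
        status_forcelist=(429, 500, 502, 503, 504),
        allowed_methods=frozenset({"GET", "POST"}),  # POST is not retried by default
        respect_retry_after_header=True,             # honour Retry-After on 429/503
        raise_on_status=False,                       # return the last response instead of raising
    )
    adapter = HTTPAdapter(max_retries=retry)
    session = requests.Session()
    session.mount("http://", adapter)
    session.mount("https://", adapter)
    if api_key:
        session.headers["Authorization"] = f"Bearer {api_key}"
    return session
```

Usage: `mddb_session("vk_…").post(f"{url}/v1/add", json=doc, timeout=30)`. Keep an explicit timeout on each call; the retry policy does not set one.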

Example flow: Postgres β†’ MDDB

  1. Source β†’ Postgres pointing at the table you want to index (e.g. wiki.articles).
  2. Destination β†’ MDDB with keyField=slug, language=en_US.
  3. Connection β†’ select streams, sync mode Append + Deduped (the connector treats both append modes identically).
  4. Run sync. MDDB ingests each row, embeds it (auto-vector if configured), and indexes for FTS/vector/hybrid search.
curl -s https://mddb.tradik.com/v1/search \
  -H 'content-type: application/json' \
  -d '{"collection":"articles","query":"vector index","limit":5}'

7. WordPress → MDDB (Sync plugin)

integrations/wordpress-plugin/ contains a first-party WordPress plugin that mirrors posts and pages (or any public post type) into MDDB. Unlike wpexporter (a one-shot bulk migration), this plugin keeps the two stores in lock-step: every save, publish, trash, or delete in WordPress is reflected in MDDB in real time.

| Property | Value |
|---|---|
| Plugin slug | mddb-sync |
| Release tag prefix | wp-v (e.g. wp-v0.1.0), separate from core MDDB vX.Y.Z tags |
| Release asset | mddb-sync-<version>.zip attached to each GitHub Release |
| WP requires | 6.2+ |
| PHP requires | 8.1+ |

Hooks

  • wp_after_insert_post → POST /v1/add (autosaves & revisions skipped; drafts opt-in).
  • wp_trash_post and before_delete_post → POST /v1/delete.
  • pre_set_site_transient_update_plugins + plugins_api → self-update channel hitting repos/tradik/mddb/releases/latest.

Settings (Settings β†’ MDDB Sync)

| Field | What |
|---|---|
| MDDB URL | Base URL, e.g. https://mddb.tradik.com. |
| API key | Bearer token (vk_…); empty for unauthenticated dev instances. |
| Collection | Defaults to a slug derived from the site host. |
| Sync events | Toggle save / delete / include-drafts independently. |
| Post types | Any registered public post type. |
| Language detection | Auto (Polylang → WPML → site locale), or pin one source. |
| Key strategy | posttype-id (default), posttype-slug, or permalink path. |
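The plugin itself is PHP; the Python sketch below only restates the default collection name and the three key strategies so you can predict keys before enabling sync. The posttype-id form matches the "post-42" example in the document shape; the host-to-slug rule (inferred from "example_com"), the exact permalink-path format, and the "index" fallback for the front page are assumptions.

```python
import re
from urllib.parse import urlparse


def default_collection(site_url: str) -> str:
    """example.com -> example_com: lowercase host, non-alphanumeric runs become '_'."""
    host = urlparse(site_url).hostname or ""
    return re.sub(r"[^a-z0-9]+", "_", host.lower()).strip("_")


def wp_document_key(strategy: str, post_type: str, post_id: int,
                    slug: str, permalink: str) -> str:
    """Illustrative re-statement of the key strategies (not the plugin's PHP code)."""
    if strategy == "posttype-id":
        return f"{post_type}-{post_id}"      # e.g. post-42
    if strategy == "posttype-slug":
        return f"{post_type}-{slug}"         # e.g. post-hello-world
    if strategy == "permalink":
        # Assumption: the URL path without surrounding slashes; "index" for the front page
        return urlparse(permalink).path.strip("/") or "index"
    raise ValueError(f"unknown key strategy: {strategy}")
```

posttype-id survives slug and permalink edits, which is presumably why it is the default; the slug and permalink strategies give readable keys but change whenever an editor renames a post, so decide before the first sync.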

Document shape

{
  "collection": "example_com",
  "key": "post-42",
  "lang": "en_US",
  "meta": {
    "postType": ["post"],
    "status": ["publish"],
    "title": ["Hello world"],
    "slug": ["hello-world"],
    "permalink": ["https://example.com/hello-world/"],
    "author": ["Jane Author"],
    "publishedAt": ["2026-05-19T10:14:00+00:00"],
    "categories": ["News"],
    "tags": ["intro", "demo"]
  },
  "contentMd": "# Hello world\n\nHello world.\n"
}

contentMd is built by running the post body through the standard the_content filter and then stripping tags. Because the filter runs first, shortcodes, blocks, and oEmbed embeds are already expanded when tags are stripped, so the indexed text matches the rendered page.

Build & release

The workflow .github/workflows/wordpress-plugin.yml runs composer audit + PHPCS (WordPress security ruleset) + PHPStan level 5 + PHPUnit on PHP 8.1 / 8.2 / 8.3 / 8.4 on every PR and push touching integrations/wordpress-plugin/**. The 8.3 leg enforces ≥90% line coverage. Pushing a wp-v* tag builds the runtime zip and attaches it to a GitHub Release; that asset is what the in-plugin updater downloads.


8. GitHub Action β†’ MDDB (CI sync)

integrations/github-action/ contains a native Node 24 JavaScript action that ingests repository files into an MDDB collection on every push (or any other workflow trigger). Drop it into a workflow to keep an MDDB collection in sync with your docs/, README.md, OpenAPI specs, or any other text, Markdown, or JSON artefacts that live in git.

name: Sync docs to MDDB

on:
  push:
    branches: [main]
    paths: ['docs/**', 'README.md']

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: tradik/mddb/integrations/github-action@gha-v0
        with:
          mddb-url: https://mddb.tradik.com
          api-key: ${{ secrets.MDDB_API_KEY }}
          collection: project-docs
          path: |
            docs/**/*.md
            README.md
          ignore: docs/draft/**
          key-prefix: ${{ github.repository }}/

Inputs

| Input | Default | Description |
|---|---|---|
| mddb-url | https://mddb.tradik.com | MDDB base URL. |
| api-key | (empty) | Bearer token (vk_…). |
| collection | required | Target MDDB collection. |
| path | `**/*.md` | Newline-separated globs (inline ! for negation). |
| ignore | (empty) | Newline-separated exclude globs. |
| key-strategy | path | path (slug of relative path), hash (sha1 of content), filename (basename). |
| key-prefix | (empty) | Prefix for every key; useful when several repos share one collection. |
| concurrency | 8 | Parallel /v1/add requests. |
| dry-run | false | Walk the files and build documents without contacting MDDB. |
| fail-on-error | true | Set to false to demote upload failures to job warnings. |

Outputs

documents-scanned, documents-added, documents-failed.

Document shape

A file at docs/guide/intro.md containing # Hello produces:

{
  "collection": "project-docs",
  "key": "tradik/mddb/docs/guide/intro.md",
  "lang": "en_US",
  "meta": {
    "source": ["github-action"],
    "path": ["docs/guide/intro.md"],
    "extension": [".md"],
    "size": ["7"],
    "repository": ["tradik/mddb"],
    "ref": ["<commit sha>"]
  },
  "contentMd": "# Hello"
}

Markdown and plain-text (.md, .markdown, .mdx, .txt, .rst, .adoc) are stored verbatim. JSON / YAML / TOML / HTML / CSS / JS / TS / Python / Go / Rust / Bash files are wrapped in a fenced code block with the matching language so FTS + vector indexing recognise the structure.
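To check locally what the action will send for a given file, the rules above can be approximated in a few lines. A minimal sketch, not the action's JavaScript source: the extension-to-fence-language table is an assumption covering the listed types, unsupported extensions return None as a hypothetical "skip", and the key keeps the raw relative path because that is what the example shape shows (the exact slug rules of the path strategy are not documented here).

```python
from pathlib import PurePosixPath
from typing import Optional

FENCE = "`" * 3
VERBATIM = {".md", ".markdown", ".mdx", ".txt", ".rst", ".adoc"}
# Assumed extension -> fence language table for the wrapped file types listed above
FENCE_LANG = {
    ".json": "json", ".yaml": "yaml", ".yml": "yaml", ".toml": "toml",
    ".html": "html", ".css": "css", ".js": "javascript", ".ts": "typescript",
    ".py": "python", ".go": "go", ".rs": "rust", ".sh": "bash",
}


def to_content_md(path: str, text: str) -> Optional[str]:
    """Store Markdown/plain text verbatim; wrap code/config files in a fenced block."""
    ext = PurePosixPath(path).suffix.lower()
    if ext in VERBATIM:
        return text
    lang = FENCE_LANG.get(ext)
    if lang is None:
        return None  # hypothetical choice: skip unsupported files
    return f"{FENCE}{lang}\n{text.rstrip()}\n{FENCE}\n"


def build_document(path: str, text: str, collection: str, key_prefix: str,
                   repository: str, ref: str) -> Optional[dict]:
    """Assemble a /v1/add body shaped like the example document above."""
    content = to_content_md(path, text)
    if content is None:
        return None
    return {
        "collection": collection,
        "key": f"{key_prefix}{path}",
        "lang": "en_US",
        "meta": {
            "source": ["github-action"],
            "path": [path],
            "extension": [PurePosixPath(path).suffix],
            "size": [str(len(text.encode("utf-8")))],  # size in bytes
            "repository": [repository],
            "ref": [ref],
        },
        "contentMd": content,
    }
```

With key-prefix `${{ github.repository }}/`, `build_document("docs/guide/intro.md", "# Hello", "project-docs", "tradik/mddb/", "tradik/mddb", sha)` yields the example shape above, including size "7".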

Tests & release

57 unit tests with 90%+ Jest coverage (statements / branches / functions / lines). The workflow .github/workflows/github-action.yml runs a format check, ESLint, and Jest with coverage on a Node 22 & 24 matrix, rebuilds dist/ and asserts it matches the committed bundle (verify-dist), and dry-runs the action against the integration's own README (smoke). Pushing a gha-v* tag verifies package.json.version, force-moves the floating gha-v<major> / gha-v<major>.<minor> tags, and publishes a GitHub Release, so consumers can pin to @gha-v0, @gha-v0.1, or @gha-v0.1.0.


9. Grafana β†’ MDDB (Datasource plugin)

integrations/grafana-datasource/ contains a native Grafana 10/11/12/13 datasource plugin that turns MDDB into a first-class panel source. Five query types map directly to MDDB endpoints, so a dashboard can mix MDDB content signals (hot documents, FTS results, metadata facets, event histograms) with Prometheus/Loki/Tempo panels on the same page.

Note: MDDB also exposes a Prometheus-compatible /metrics endpoint for server-level metrics (request rate, latency, database size, embedding queue) β€” see docs/TELEMETRY.md. This plugin is the complement: it queries MDDB content, not Prometheus metrics.

Plugin ID & supported versions

| Property | Value |
|---|---|
| Plugin ID | tradik-mddb-datasource |
| Type | Datasource (frontend-only, no backend Go binary) |
| Grafana | >=10.0.0 (tested against Grafana 13.0.1) |
| MDDB | 2.9.16+ |

Source: integrations/grafana-datasource/

Install

cd integrations/grafana-datasource && make build

docker run --rm -p 3000:3000 \
  -e GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=tradik-mddb-datasource \
  -v "$(pwd)/dist:/var/lib/grafana/plugins/tradik-mddb-datasource" \
  grafana/grafana:13.0.1

Alternatively, run make docker to bake the plugin into a tradik/mddb-grafana-datasource:<version> image.

In Grafana: Configuration → Data sources → Add data source → MDDB, set the base URL (https://mddb.tradik.com), optional default collection, and a bearer API key, which is stored encrypted in secureJsonData. Save & test pings /v1/stats and distinguishes auth vs server vs network failures.

Query types

| Query type | MDDB endpoint | Output shape | Best panel |
|---|---|---|---|
| Temporal histogram | POST /v1/temporal/histogram | Time + count time series | Time series |
| Hot documents | POST /v1/temporal/hot | docId / accessCount / lastAccessAt table | Table / Bar chart |
| Metadata aggregate | POST /v1/aggregate | value/count table or time-bucket series | Pie / Bar / Time series |
| Full-text search | POST /v1/fts | key / lang / score / highlight table | Table |
| Database stats | POST /v1/stats | Per-collection documents / revisions / embeddings | Stat / Table |

Dashboard time range is applied automatically as from / to (seconds). Grafana dashboard variables are interpolated into collection, query, and facetKey via getTemplateSrv().replace().

Example: temporal access histogram

Datasource: MDDB
Query type: Temporal histogram (time-series)
Collection: blog
Event type: access
Interval: day

This yields a time series that any Grafana time-series panel can plot. Combine it with a Prometheus query for documents per collection on the same dashboard to correlate ingest with reads.

Tests & release

Pure-logic Jest tests on query.ts, transform.ts, client.ts, and datasource.ts enforce ≥90% coverage. make package produces a versioned plugin zip; tag-driven releases use grafana-v<version> and ship the zip as a GitHub Release asset for grafana-cli plugins install --pluginUrl or signing-service submission.


10. Full Pipeline: All Integrations Together

Combine all tools for a complete content platform:

graph TB
  subgraph p1["1. Content Sources"]
    WP[WordPress] -->|wpexporter| WPEXP[wpexporter<br>markdown + meta]
    FILES[PDF / DOCX / PPTX] -->|parse| DOCLING[Docling]
  end
  subgraph p2["2. MDDB - Central Hub"]
    WPEXP -->|bulk import| MDDB[MDDB<br>:11023 / :11024]
    DOCLING -->|markdown + chunks| MDDB
    MDDB -->|primary store| BOLT[(BoltDB)]
    MDDB -->|auto-embed| VEC[(Vector Index)]
    MDDB -->|webhook sync| OS[(OpenSearch<br>scale search)]
  end
  subgraph p3["3. Orchestration and AI"]
    LF[Langflow] -->|semantic search| MDDB
    LF -->|BM25 / kNN| OS
    LF -->|generate| LLM[LLM]
    LLM --> LF
    MDDB -->|MCP| CLAUDE[Claude Desktop]
  end
  subgraph p4["4. Output"]
    MDDB -->|REST API| SSG[SSG<br>Static Site Generator]
    SSG -->|HTML/CSS/JS| DEPLOY[GitHub Pages<br>Cloudflare<br>Netlify]
    LF --> WEBAPP[Web App]
    MDDB -->|REST| API[Custom Apps]
  end
  style WP fill:#21759b,color:#fff
  style MDDB fill:#00d4aa,color:#000
  style LLM fill:#d97706,color:#fff
  style SSG fill:#7c3aed,color:#fff
  style CLAUDE fill:#d97706,color:#fff

Step-by-Step

  1. Import: wpexporter migrates WordPress content; Docling parses PDFs/DOCX into Markdown
  2. Store: MDDB stores all documents with metadata in BoltDB
  3. Index: MDDB auto-generates embeddings; optionally syncs to OpenSearch for scale
  4. Search: Langflow orchestrates RAG: semantic search via MDDB, BM25 via OpenSearch
  5. Answer: LLM generates answers from retrieved context, cites sources
  6. Publish: SSG renders static sites from MDDB collections, deploys to CDN
  7. Manage: Claude Desktop via MCP (52 tools), Panel UI, REST/gRPC APIs

When to Use What

| Scenario | Recommendation |
|---|---|
| Migrate from WordPress | wpexporter → MDDB |
| Parse PDF/DOCX to Markdown | Docling → MDDB |
| Generate static website | MDDB → SSG |
| < 50K docs, simple RAG | MDDB only (no OpenSearch needed) |
| > 50K docs, enterprise search | MDDB + OpenSearch |
| Visual workflow building | MDDB + Langflow |
| AI agent integration | MDDB MCP (52 tools) |
| AI-assisted migration | wpmcp + mddb-mcp via Claude |
| Full production pipeline | All together |
