Integrations: Docling, Langflow, OpenSearch, Airbyte, GitHub Action & Grafana
Use MDDB alongside popular AI/ML, ELT, and observability tools to build production document processing and RAG pipelines.
Architecture Overview
graph LR
    subgraph Content Sources
        PDF[PDF / DOCX / PPTX]
        DOCLING[Docling<br>IBM Document Parser]
        WP[WordPress]
        WPEXP[wpexporter]
        WPSYNC[WordPress<br>Sync plugin]
        AB[Airbyte<br>300+ ELT sources]
        GHA[GitHub Action<br>repo docs / READMEs]
    end
    subgraph Storage & Search
        MDDB[mddbd<br>:11023 / :11024]
        BOLT[(BoltDB)]
        VEC[(Vector Index)]
        OS[(OpenSearch<br>optional)]
    end
    subgraph Output & Orchestration
        SSG[SSG<br>Static Site Generator]
        LF[Langflow<br>Visual RAG Builder]
        LLM[LLM<br>Claude / GPT / Llama]
        DEPLOY[GitHub Pages<br>Cloudflare / Netlify]
        GRAFANA[Grafana<br>datasource plugin]
    end
    PDF -->|parse| DOCLING
    DOCLING -->|markdown| MDDB
    WP -->|export| WPEXP
    WPEXP -->|markdown| MDDB
    WP -->|live hooks| WPSYNC
    WPSYNC -->|/v1/add /v1/delete| MDDB
    AB -->|destination-mddb| MDDB
    GHA -->|/v1/add per file| MDDB
    MDDB --> BOLT
    MDDB --> VEC
    MDDB -.->|sync| OS
    MDDB -->|fetch docs| SSG
    SSG -->|static HTML| DEPLOY
    LF -->|REST / MCP| MDDB
    LF -->|query| OS
    LF -->|generate| LLM
    LLM -->|answer| LF
    MDDB -->|/v1/temporal /v1/aggregate| GRAFANA
1. Docling → MDDB (Document Ingestion)
Docling is IBM's document parser that converts PDF, DOCX, PPTX, and HTML into structured Markdown. Since MDDB stores Markdown natively, this is a natural fit.
Install Docling
pip install docling
Basic: Parse and Store a Single Document
from docling.document_converter import DocumentConverter
import requests

MDDB_URL = "http://localhost:11023"

converter = DocumentConverter()
result = converter.convert("report.pdf")
markdown = result.document.export_to_markdown()

requests.post(f"{MDDB_URL}/v1/add", json={
    "collection": "reports",
    "key": "report-2026-q1",
    "lang": "en_US",
    "meta": {
        "source": ["docling"],
        "type": ["pdf"],
        "title": ["Q1 2026 Report"],
    },
    "contentMd": markdown,
})
Batch: Ingest a Folder of Documents
"""
Bulk-import a folder of documents via Docling → MDDB.
Supports PDF, DOCX, PPTX, HTML.
"""
from docling.document_converter import DocumentConverter
from pathlib import Path
import requests

MDDB_URL = "http://localhost:11023"
COLLECTION = "knowledge-base"
INPUT_DIR = Path("./documents")

converter = DocumentConverter()
supported = {".pdf", ".docx", ".pptx", ".html", ".htm"}

for file in INPUT_DIR.iterdir():
    if file.suffix.lower() not in supported:
        continue

    print(f"Processing: {file.name}")
    result = converter.convert(str(file))
    markdown = result.document.export_to_markdown()

    resp = requests.post(f"{MDDB_URL}/v1/add", json={
        "collection": COLLECTION,
        "key": file.stem,
        "lang": "en_US",
        "meta": {
            "source": ["docling"],
            "type": [file.suffix.lstrip(".")],
            "filename": [file.name],
        },
        "contentMd": markdown,
    })

    if resp.status_code == 200:
        print(f"  OK: {file.name}")
    else:
        print(f"  ERROR: {resp.text}")

print("Done. Check embedding progress:")
print(requests.get(f"{MDDB_URL}/v1/vector-stats").json())
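One thing to watch in the batch script: file.stem drops the extension, so report.pdf and report.docx map to the same key. If /v1/add upserts by key (as the Airbyte section of this page describes), the second upload would replace the first. A minimal sketch of a key builder that keeps the extension follows; make_key is a hypothetical helper for illustration, not part of MDDB or Docling.

```python
import re
from pathlib import Path


def make_key(path: Path) -> str:
    """Build a lowercase, hyphenated key that keeps the file extension."""
    # Collapse every run of characters that are not a-z or 0-9 into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", path.name.lower())
    return slug.strip("-")


keys = [make_key(Path(name)) for name in ("Q1 Report.pdf", "Q1 Report.docx")]
print(keys)
```

Passing make_key(file) instead of file.stem as the "key" value would keep both versions of a document side by side.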
With Chunking (for Better Vector Search)
Long documents should be split into chunks for more precise semantic search:
from docling.document_converter import DocumentConverter
from docling.chunking import HybridChunker
import requests

MDDB_URL = "http://localhost:11023"

converter = DocumentConverter()
result = converter.convert("manual.pdf")

chunker = HybridChunker(tokenizer="sentence-transformers/all-MiniLM-L6-v2")
chunks = list(chunker.chunk(result.document))

imported = 0
for i, chunk in enumerate(chunks):
    text = chunk.text
    # Skip very short chunks
    if len(text.strip()) < 50:
        continue
    requests.post(f"{MDDB_URL}/v1/add", json={
        "collection": "manual",
        "key": f"manual-chunk-{i:04d}",
        "lang": "en_US",
        "meta": {
            "source": ["docling"],
            "chunk_index": [str(i)],
            "parent_doc": ["manual.pdf"],
        },
        "contentMd": text,
    })
    imported += 1

# Report only the chunks that were actually sent, not the skipped ones
print(f"Imported {imported} of {len(chunks)} chunks from manual.pdf")
Docker Pipeline
services:
  mddb:
    image: tradik/mddb:latest
    ports:
      - "11023:11023"
      - "11024:11024"
      - "9000:9000"
    volumes:
      - mddb-data:/app/data
    environment:
      MDDB_EMBEDDING_PROVIDER: openai
      MDDB_EMBEDDING_API_KEY: ${OPENAI_API_KEY}

  docling-ingest:
    build:
      context: .
      dockerfile: Dockerfile.docling
    volumes:
      - ./documents:/documents
    environment:
      MDDB_URL: http://mddb:11023
    depends_on:
      - mddb

volumes:
  mddb-data:
FROM python:3.11-slim
RUN pip install docling requests
COPY ingest.py /app/ingest.py
CMD ["python", "/app/ingest.py"]
2. Langflow + MDDB (Visual RAG Orchestration)
Langflow is a visual framework for building LLM workflows. MDDB can be integrated as a retrieval component via REST API or MCP.
Install Langflow
pip install langflow
langflow run
Option A: Custom Python Component (REST API)
Create a custom Langflow component that queries MDDB:
"""
MDDB Search Component for Langflow.
Save as: mddb_component.py
Import in Langflow via Custom Components.
"""
from langflow.custom import Component
from langflow.io import MessageTextInput, IntInput, Output
from langflow.schema import Data
import requests


class MDDBSearch(Component):
    display_name = "MDDB Semantic Search"
    description = "Search MDDB knowledge base using semantic/vector search."
    icon = "search"

    inputs = [
        MessageTextInput(
            name="query",
            display_name="Search Query",
            info="Natural language query to search for.",
        ),
        MessageTextInput(
            name="mddb_url",
            display_name="MDDB URL",
            value="http://localhost:11023",
            info="MDDB server address.",
        ),
        MessageTextInput(
            name="collection",
            display_name="Collection",
            value="docs",
            info="MDDB collection to search.",
        ),
        IntInput(
            name="top_k",
            display_name="Top K",
            value=5,
            info="Number of results to return.",
        ),
    ]

    outputs = [
        Output(display_name="Results", name="results", method="search"),
    ]

    def search(self) -> list[Data]:
        response = requests.post(
            f"{self.mddb_url}/v1/vector-search",
            json={
                "collection": self.collection,
                "query": self.query,
                "topK": self.top_k,
                "threshold": 0.6,
                "includeContent": True,
            },
        )
        results = response.json().get("results", [])
        return [
            Data(data={
                "key": r["document"]["key"],
                "content": r["document"].get("contentMd", ""),
                "score": r["score"],
                "meta": r["document"].get("meta", {}),
            })
            for r in results
        ]
Using in Langflow
- Open Langflow UI → My Collection → New Project
- Go to Custom Components → upload mddb_component.py
- Build a flow:
[Chat Input] → [MDDB Semantic Search] → [Parse Data] → [Prompt] → [LLM] → [Chat Output]
The Prompt template:
Answer the user's question based on the following documents from the knowledge base.
Cite sources by their key.

Context:
{documents}

Question: {query}
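The template asks the model to cite sources by key, so the {documents} slot has to expose each key next to its content. A minimal sketch of that formatting step, assuming result dictionaries shaped like the ones the MDDBSearch component returns (key, content, score); format_context and max_chars are hypothetical names, and in Langflow the Parse Data component would normally play this role.

```python
def format_context(results: list[dict], max_chars: int = 2000) -> str:
    """Render search results as a context block that shows each document key."""
    blocks = []
    for r in results:
        content = r.get("content", "")[:max_chars]  # truncate long documents
        blocks.append(f"[{r['key']}] (score {r['score']:.2f})\n{content}")
    return "\n\n---\n\n".join(blocks)


PROMPT = (
    "Answer the user's question based on the following documents "
    "from the knowledge base.\nCite sources by their key.\n\n"
    "Context:\n{documents}\n\nQuestion: {query}"
)

sample = [
    {"key": "report-2026-q1", "content": "Revenue grew.", "score": 0.91},
    {"key": "manual-chunk-0003", "content": "Press the reset button.", "score": 0.74},
]
prompt = PROMPT.format(documents=format_context(sample), query="How do I reset?")
print(prompt)
```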
Option B: Langflow + MDDB via MCP
If your Langflow version supports MCP tool calling, connect MDDB directly:
{
  "mcpServers": {
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--network", "host",
        "-v", "mddb-data:/app/data",
        "-e", "MDDB_MCP_STDIO=true",
        "tradik/mddb:latest"
      ]
    }
  }
}
Available MCP tools for Langflow: semantic_search, full_text_search, hybrid_search, search_documents, add_document, import_url, and 46 more (52 in total).
Option C: Langflow API Tool (No Custom Code)
Use Langflow's built-in API Request component to call MDDB directly:
- Add an API Request component
- Set Method: POST
- Set URL: http://localhost:11023/v1/vector-search
- Set Body:
{
  "collection": "docs",
  "query": "{query}",
  "topK": 5,
  "includeContent": true
}
- Connect:
[Chat Input] → [API Request] → [Parse Data] → [Prompt] → [LLM] → [Chat Output]
Full Langflow RAG Flow Example
graph LR
    INPUT[Chat Input] --> MDDB[MDDB Search<br>vector-search]
    MDDB --> PARSE[Parse Data<br>extract contentMd]
    PARSE --> PROMPT[Prompt Template<br>context + question]
    INPUT --> PROMPT
    PROMPT --> LLM[OpenAI / Claude<br>/ Ollama]
    LLM --> OUTPUT[Chat Output]
3. OpenSearch + MDDB (Scalable Search)
MDDB's built-in vector search works well up to ~50K documents. For larger datasets or advanced full-text search (BM25, aggregations, facets), sync documents to OpenSearch.
Architecture
| Feature | MDDB | OpenSearch |
|---|---|---|
| Storage | Primary (BoltDB) | Search index (replica) |
| Vector search | In-memory, ~50K docs | kNN plugin, millions |
| Full-text search | Built-in TF scoring | BM25, analyzers, stemming |
| Aggregations | No | Yes (facets, histograms) |
| MCP tools | 52 built-in | No |
Strategy: MDDB as the primary store + MCP interface, OpenSearch as the search backend for scale.
Setup OpenSearch
services:
  mddb:
    image: tradik/mddb:latest
    ports:
      - "11023:11023"
    volumes:
      - mddb-data:/app/data
    environment:
      MDDB_EMBEDDING_PROVIDER: openai
      MDDB_EMBEDDING_API_KEY: ${OPENAI_API_KEY}

  opensearch:
    image: opensearchproject/opensearch:2
    ports:
      - "9200:9200"
    environment:
      discovery.type: single-node
      DISABLE_SECURITY_PLUGIN: "true"
    volumes:
      - os-data:/usr/share/opensearch/data

  opensearch-dashboards:
    image: opensearchproject/opensearch-dashboards:2
    ports:
      - "5601:5601"
    environment:
      OPENSEARCH_HOSTS: '["http://opensearch:9200"]'
      DISABLE_SECURITY_DASHBOARDS_PLUGIN: "true"

volumes:
  mddb-data:
  os-data:
Create OpenSearch Index
curl -X PUT http://localhost:9200/mddb-docs -H 'Content-Type: application/json' -d '{
  "settings": {
    "index": {
      "knn": true,
      "number_of_replicas": 0
    }
  },
  "mappings": {
    "properties": {
      "key": { "type": "keyword" },
      "collection": { "type": "keyword" },
      "lang": { "type": "keyword" },
      "contentMd": { "type": "text", "analyzer": "standard" },
      "meta": { "type": "object", "enabled": true },
      "addedAt": { "type": "date" },
      "updatedAt": { "type": "date" },
      "embedding": {
        "type": "knn_vector",
        "dimension": 1536,
        "method": {
          "name": "hnsw",
          "engine": "lucene"
        }
      }
    }
  }
}'
Sync Script: MDDB → OpenSearch
"""
Sync documents from MDDB to OpenSearch.
Run periodically (cron) or trigger via MDDB webhook.
"""
import requests
import json

MDDB_URL = "http://localhost:11023"
OS_URL = "http://localhost:9200"
INDEX = "mddb-docs"


def sync_collection(collection: str):
    """Export all docs from MDDB and index into OpenSearch."""
    # Step 1: Export from MDDB as NDJSON
    resp = requests.post(f"{MDDB_URL}/v1/export", json={
        "collection": collection,
        "format": "ndjson",
    })
    if resp.status_code != 200:
        print(f"Export failed: {resp.text}")
        return

    # Step 2: Bulk index into OpenSearch
    bulk_body = ""
    count = 0
    for line in resp.text.strip().split("\n"):
        if not line:
            continue
        doc = json.loads(line)
        doc_id = f"{collection}|{doc['key']}|{doc.get('lang', 'en_us')}"
        bulk_body += json.dumps({"index": {"_index": INDEX, "_id": doc_id}}) + "\n"
        bulk_body += json.dumps({
            "key": doc["key"],
            "collection": collection,
            "lang": doc.get("lang", ""),
            "contentMd": doc.get("contentMd", ""),
            "meta": doc.get("meta", {}),
            "addedAt": doc.get("addedAt"),
            "updatedAt": doc.get("updatedAt"),
        }) + "\n"
        count += 1

    if bulk_body:
        r = requests.post(
            f"{OS_URL}/_bulk",
            data=bulk_body,
            headers={"Content-Type": "application/x-ndjson"},
        )
        result = r.json()
        errors = result.get("errors", False)
        print(f"Synced {count} docs from '{collection}' → OpenSearch (errors={errors})")


stats = requests.get(f"{MDDB_URL}/v1/stats").json()
for coll in stats.get("collections", {}).keys():
    sync_collection(coll)
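The script above sends each collection as one bulk request and only prints the top-level errors flag. For larger collections it may help to split the upload into batches and to list which documents were rejected. The sketch below shows one way to do both; bulk_batches, failed_items, and the batch size of 500 are assumptions for illustration, and the response handling relies on the standard _bulk response layout (an items array whose entries carry an error object on failure).

```python
import json
from typing import Iterable, Iterator


def bulk_batches(docs: Iterable[dict], index: str, collection: str,
                 batch_size: int = 500) -> Iterator[str]:
    """Yield NDJSON bodies for the _bulk API, batch_size documents each."""
    lines: list[str] = []
    for doc in docs:
        doc_id = f"{collection}|{doc['key']}|{doc.get('lang', 'en_us')}"
        lines.append(json.dumps({"index": {"_index": index, "_id": doc_id}}))
        lines.append(json.dumps({**doc, "collection": collection}))
        if len(lines) >= 2 * batch_size:  # two NDJSON lines per document
            yield "\n".join(lines) + "\n"
            lines = []
    if lines:
        yield "\n".join(lines) + "\n"


def failed_items(bulk_response: dict) -> list[tuple[str, str]]:
    """Return (doc_id, reason) for every item the bulk request rejected."""
    failures = []
    for item in bulk_response.get("items", []):
        action = item.get("index", {})
        if "error" in action:
            failures.append((action.get("_id", "?"), action["error"].get("reason", "")))
    return failures


docs = [{"key": f"doc-{i}", "lang": "en_US", "contentMd": "x"} for i in range(5)]
batches = list(bulk_batches(docs, "mddb-docs", "blog", batch_size=2))
print(len(batches), "batches")

sample_response = {
    "errors": True,
    "items": [
        {"index": {"_id": "blog|doc-1|en_US", "status": 400,
                   "error": {"type": "mapper_parsing_exception", "reason": "failed to parse"}}},
        {"index": {"_id": "blog|doc-2|en_US", "status": 201}},
    ],
}
print(failed_items(sample_response))
```

Each yielded body can be posted to {OS_URL}/_bulk with the same Content-Type header the script already uses.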
Real-Time Sync via MDDB Webhooks
Instead of periodic sync, use MDDB webhooks for real-time updates:
curl -X POST http://localhost:11023/v1/webhooks -H 'Content-Type: application/json' -d '{
  "url": "http://sync-service:8080/mddb-webhook",
  "events": ["doc.add", "doc.update", "doc.delete"],
  "collections": ["*"]
}'
Webhook handler that updates OpenSearch:
"""
Webhook receiver: syncs individual document changes to OpenSearch.
Run as a small Flask/FastAPI service.
"""
from fastapi import FastAPI, Request
import requests

app = FastAPI()
OS_URL = "http://localhost:9200"
INDEX = "mddb-docs"
MDDB_URL = "http://localhost:11023"


@app.post("/mddb-webhook")
async def handle_webhook(request: Request):
    payload = await request.json()
    event = payload.get("event")
    collection = payload.get("collection")
    key = payload.get("key")
    lang = payload.get("lang", "en_us")
    doc_id = f"{collection}|{key}|{lang}"

    if event in ("doc.add", "doc.update"):
        # Fetch full document from MDDB
        resp = requests.post(f"{MDDB_URL}/v1/get", json={
            "collection": collection,
            "key": key,
            "lang": lang,
        })
        doc = resp.json()

        # Index into OpenSearch
        requests.put(f"{OS_URL}/{INDEX}/_doc/{doc_id}", json={
            "key": key,
            "collection": collection,
            "lang": lang,
            "contentMd": doc.get("contentMd", ""),
            "meta": doc.get("meta", {}),
        })
    elif event == "doc.delete":
        requests.delete(f"{OS_URL}/{INDEX}/_doc/{doc_id}")

    return {"ok": True}
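The handler places doc_id directly into the URL path. Keys can contain slashes (the GitHub Action integration on this page produces keys such as tradik/mddb/docs/guide/intro.md), and an unescaped slash would split the _doc path into extra segments. A minimal sketch of percent-encoding the id first; doc_url is a hypothetical helper, not part of MDDB or OpenSearch.

```python
from urllib.parse import quote


def doc_url(os_url: str, index: str, collection: str, key: str, lang: str) -> str:
    """Build the OpenSearch document URL with a percent-encoded _id."""
    doc_id = f"{collection}|{key}|{lang}"
    # safe="" also encodes "/" so the id stays a single path segment
    return f"{os_url}/{index}/_doc/{quote(doc_id, safe='')}"


url = doc_url("http://localhost:9200", "mddb-docs",
              "project-docs", "tradik/mddb/docs/guide/intro.md", "en_US")
print(url)
```

The same helper would apply to both the PUT and the DELETE call so that adds and deletes address the same document.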
Query OpenSearch from MDDB Pipeline
"""
Hybrid search: MDDB for semantic, OpenSearch for full-text.
Merge results for best recall.
"""
import requests
from itertools import zip_longest

MDDB_URL = "http://localhost:11023"
OS_URL = "http://localhost:9200"


def hybrid_search(query: str, collection: str, top_k: int = 5) -> list:
    # Semantic search via MDDB
    mddb_resp = requests.post(f"{MDDB_URL}/v1/vector-search", json={
        "collection": collection,
        "query": query,
        "topK": top_k,
        "threshold": 0.6,
        "includeContent": True,
    })
    vector_results = mddb_resp.json().get("results", [])

    # Full-text search via OpenSearch (BM25)
    os_resp = requests.post(f"{OS_URL}/mddb-docs/_search", json={
        "query": {
            "bool": {
                "must": {"match": {"contentMd": query}},
                "filter": {"term": {"collection": collection}},
            }
        },
        "size": top_k,
    })
    os_hits = os_resp.json().get("hits", {}).get("hits", [])

    # Interleave both ranked lists and deduplicate by key. Appending all
    # vector hits first would push every BM25 hit past the top_k cut-off.
    seen = set()
    merged = []
    for vec, hit in zip_longest(vector_results, os_hits):
        if vec is not None and vec["document"]["key"] not in seen:
            seen.add(vec["document"]["key"])
            merged.append({
                "key": vec["document"]["key"],
                "content": vec["document"].get("contentMd", ""),
                "vector_score": vec["score"],
                "source": "mddb-vector",
            })
        if hit is not None and hit["_source"]["key"] not in seen:
            seen.add(hit["_source"]["key"])
            merged.append({
                "key": hit["_source"]["key"],
                "content": hit["_source"].get("contentMd", ""),
                "bm25_score": hit["_score"],
                "source": "opensearch-bm25",
            })
    return merged[:top_k]
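The merge above deduplicates by key but leaves the two score scales (vector similarity and BM25) separate, so results from the two backends are never ranked against each other. One common alternative, which is not something MDDB itself is stated to do, is Reciprocal Rank Fusion: it ignores raw scores and combines documents by their position in each list. A minimal sketch, with rrf_merge as a hypothetical helper and k=60 as the conventional constant:

```python
from collections import defaultdict


def rrf_merge(rankings: list[list[str]], k: int = 60,
              top_k: int = 5) -> list[tuple[str, float]]:
    """Fuse ranked key lists: score(key) = sum over lists of 1 / (k + rank)."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, key in enumerate(ranking, start=1):
            scores[key] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by key for deterministic output
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ordered[:top_k]


vector_keys = ["a", "b", "c"]  # keys from /v1/vector-search, best first
bm25_keys = ["c", "a", "d"]    # keys from the OpenSearch hits, best first
fused = rrf_merge([vector_keys, bm25_keys])
print([key for key, _ in fused])
```

Documents found by both backends ("a" and "c" here) rise above documents found by only one, which is usually the behaviour wanted from a hybrid search.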
OpenSearch kNN Search (Vector Search at Scale)
For datasets larger than ~50K documents, use OpenSearch kNN instead of MDDB's in-memory index:
"""
Use OpenSearch kNN for large-scale vector search.
Requires syncing embeddings from MDDB to OpenSearch.
"""
import requests

MDDB_URL = "http://localhost:11023"
OS_URL = "http://localhost:9200"


def get_embedding(text: str) -> list:
    """Get embedding vector from MDDB's embedding endpoint."""
    resp = requests.post(f"{MDDB_URL}/v1/embed", json={"text": text})
    return resp.json().get("embedding", [])


def opensearch_knn_search(query: str, collection: str, k: int = 10):
    """Semantic search via OpenSearch kNN plugin."""
    embedding = get_embedding(query)
    resp = requests.post(f"{OS_URL}/mddb-docs/_search", json={
        "size": k,
        "query": {
            "bool": {
                "must": {
                    "knn": {
                        "embedding": {
                            "vector": embedding,
                            "k": k,
                        }
                    }
                },
                "filter": {"term": {"collection": collection}},
            }
        },
    })
    hits = resp.json().get("hits", {}).get("hits", [])
    return [
        {
            "key": h["_source"]["key"],
            "content": h["_source"].get("contentMd", ""),
            "score": h["_score"],
        }
        for h in hits
    ]
4. SSG: Static Site Generator from MDDB
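The index created earlier declares the embedding field with dimension 1536, and get_embedding falls back to an empty list when the response carries no vector. A query vector of any other length cannot match that field, so it may be worth failing early with a clear message. A minimal sketch; validate_embedding and INDEX_DIMENSION are hypothetical names.

```python
INDEX_DIMENSION = 1536  # must match "dimension" in the mddb-docs index mapping


def validate_embedding(vector: list[float],
                       expected_dim: int = INDEX_DIMENSION) -> list[float]:
    """Raise early if the query vector cannot match the knn_vector field."""
    if not vector:
        raise ValueError("empty embedding: is an embedding provider configured?")
    if len(vector) != expected_dim:
        raise ValueError(
            f"embedding has {len(vector)} dimensions, index expects {expected_dim}"
        )
    return vector


ok = validate_embedding([0.0] * 1536)
print(len(ok))
```

Calling validate_embedding(get_embedding(query)) before building the kNN query turns a confusing search error into an explicit one.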
SSG is a high-performance static site generator written in Go with built-in MDDB support. It pulls Markdown content directly from MDDB collections and renders complete static websites with themes, minification, and deployment-ready output.
graph LR
    subgraph Content Management
        MDDB[MDDB<br>:11023]
        PANEL[MDDB Panel<br>:9000]
    end
    subgraph Static Generation
        SSG[SSG<br>Static Site Generator]
        TPL[Templates<br>Go / Pongo2 / Mustache]
        OPT[Optimizer<br>WebP / Minify / Sitemap]
    end
    subgraph Deployment
        GH[GitHub Pages]
        CF[Cloudflare Pages]
        NL[Netlify / Vercel]
    end
    PANEL -->|edit content| MDDB
    MDDB -->|fetch docs<br>REST API| SSG
    SSG --> TPL
    TPL --> OPT
    OPT -->|static HTML/CSS/JS| GH
    OPT --> CF
    OPT --> NL
Generate a Site from MDDB Collection
brew install spagu/tap/ssg

ssg --mddb-url=http://localhost:11023 \
    --mddb-collection=blog \
    --mddb-lang=en_US \
    krowy example.com
CLI Flags
| Flag | Description | Default |
|---|---|---|
| --mddb-url | MDDB server URL (enables MDDB mode) | (none) |
| --mddb-collection | Collection to fetch posts from | (none) |
| --mddb-key | API key for authentication | (none) |
| --mddb-lang | Language filter | en_US |
| --mddb-timeout | Request timeout in seconds | 30 |
Dev Server with Live Reload
ssg serve --mddb-url=http://localhost:11023 \
    --mddb-collection=blog \
    krowy example.com
CI/CD Pipeline: MDDB → SSG → GitHub Pages
name: Deploy Site
on:
  workflow_dispatch:
  schedule:
    - cron: '0 */6 * * *'  # every 6 hours

jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Install SSG
        run: |
          curl -sL https://github.com/spagu/ssg/releases/latest/download/ssg_linux_amd64.tar.gz | tar xz
          sudo mv ssg /usr/local/bin/

      - name: Build site from MDDB
        run: |
          ssg --mddb-url=${{ secrets.MDDB_URL }} \
              --mddb-collection=blog \
              --mddb-key=${{ secrets.MDDB_API_KEY }} \
              krowy ${{ vars.SITE_DOMAIN }}

      - name: Deploy to GitHub Pages
        uses: peaceiris/actions-gh-pages@v3
        with:
          github_token: ${{ secrets.GITHUB_TOKEN }}
          publish_dir: ./public
Docker Pipeline
services:
  mddb:
    image: tradik/mddb:latest
    ports:
      - "11023:11023"
      - "9000:9000"
    volumes:
      - mddb-data:/app/data

  ssg:
    image: spagu/ssg:latest
    depends_on:
      - mddb
    command: >
      ssg --mddb-url=http://mddb:11023
      --mddb-collection=blog
      krowy example.com
    volumes:
      - ./public:/app/public

volumes:
  mddb-data:
Workflow: Edit in Panel → Generate → Deploy
sequenceDiagram
    participant U as Editor
    participant P as MDDB Panel
    participant M as MDDB
    participant S as SSG
    participant D as GitHub Pages
    U->>P: Edit blog post in Panel
    P->>M: POST /v1/add (save document)
    Note over M: Webhook fires on doc.add
    M->>S: Webhook triggers SSG build
    S->>M: GET /v1/search (fetch all posts)
    M->>S: Markdown documents + metadata
    S->>S: Render templates, optimize assets
    S->>D: Deploy static files
    D->>U: Site updated at example.com
5. wpexporter: WordPress to MDDB Migration
wpexporter is a Go toolkit for exporting WordPress content. It supports 14+ output formats including Markdown, and includes an MCP server (wpmcp) for AI-assisted migrations.
graph LR
    subgraph WordPress
        WP[WordPress Site]
        REST[REST API<br>/wp-json/wp/v2]
        XMLRPC[XML-RPC<br>/xmlrpc.php]
    end
    subgraph wpexporter
        EXP[wpexportjson<br>Public content]
        XR[wpxmlrpc<br>Private content]
        MCP_WP[wpmcp<br>MCP Server]
    end
    subgraph MDDB
        MDDB_S[mddbd<br>:11023]
        MCP_MD[mddb-mcp<br>MCP Server]
        BOLT[(BoltDB)]
        VEC[(Vector Index)]
    end
    WP --> REST --> EXP
    WP --> XMLRPC --> XR
    WP --> MCP_WP
    EXP -->|markdown export| MDDB_S
    XR -->|markdown export| MDDB_S
    MCP_WP -->|AI orchestration| MCP_MD
    MDDB_S --> BOLT
    MDDB_S --> VEC
Quick Export: WordPress → Markdown → MDDB
wpexportjson -url https://your-site.com -format markdown -output ./wp-export/

for file in wp-export/*.md; do
  key=$(basename "$file" .md)
  content=$(cat "$file")
  curl -X POST http://localhost:11023/v1/add \
    -H 'Content-Type: application/json' \
    -d "{
      \"collection\": \"blog\",
      \"key\": \"$key\",
      \"lang\": \"en_US\",
      \"meta\": {\"source\": [\"wordpress\"]},
      \"contentMd\": $(echo "$content" | jq -Rs .)
    }"
done
Python: Full Migration with Metadata
"""
Migrate WordPress → MDDB with full metadata preservation.
Uses wpexportjson JSON output for richer metadata.
"""
import subprocess
import json
import requests

MDDB_URL = "http://localhost:11023"
WP_URL = "https://your-site.com"
COLLECTION = "blog"

# Export all posts to one JSON file; check=True stops the script if the export fails
subprocess.run([
    "wpexportjson",
    "-url", WP_URL,
    "-format", "json",
    "-output", "./wp-export.json",
], check=True)

with open("wp-export.json") as f:
    posts = json.load(f)

for post in posts:
    slug = post.get("slug", "")
    title = post.get("title", "")
    content_md = post.get("content_markdown", post.get("content", ""))
    categories = post.get("categories", [])
    tags = post.get("tags", [])
    date = post.get("date", "")

    resp = requests.post(f"{MDDB_URL}/v1/add", json={
        "collection": COLLECTION,
        "key": slug,
        "lang": "en_US",
        "meta": {
            "title": [title],
            "source": ["wordpress"],
            "wp_url": [f"{WP_URL}/{slug}/"],
            "category": categories if categories else ["uncategorized"],
            "tags": tags,
            "date": [date],
        },
        "contentMd": f"# {title}\n\n{content_md}",
    })
    status = "OK" if resp.status_code == 200 else f"ERROR: {resp.text}"
    print(f"  {slug}: {status}")

print(f"\nMigrated {len(posts)} posts. Embeddings generating in background.")
AI-Assisted Migration via MCP
Both wpexporter (wpmcp) and MDDB (mddb-mcp) have MCP servers. Connect both to Claude Desktop and let the AI orchestrate the migration:
{
  "mcpServers": {
    "wordpress": {
      "command": "wpmcp",
      "args": [],
      "env": {
        "WP_URL": "https://your-site.com",
        "WP_USER": "admin",
        "WP_APP_PASSWORD": "xxxx xxxx xxxx xxxx"
      }
    },
    "mddb": {
      "command": "docker",
      "args": [
        "run", "-i", "--rm",
        "--network", "host",
        "-v", "mddb-data:/app/data",
        "-e", "MDDB_MCP_STDIO=true",
        "tradik/mddb:latest"
      ]
    }
  }
}
Then ask Claude:
"Migrate all posts from WordPress to MDDB collection 'blog'. Preserve categories, tags, and dates as metadata. Skip draft posts."
sequenceDiagram
    participant U as User
    participant C as Claude Desktop
    participant WP as wpmcp<br>(WordPress)
    participant MD as mddb-mcp<br>(MDDB)
    U->>C: "Migrate all published posts<br>from WordPress to MDDB"
    C->>WP: tool: list_posts(status=published)
    WP->>C: 142 posts with metadata
    loop For each post
        C->>WP: tool: get_post(id=N, format=markdown)
        WP->>C: Markdown content + meta
        C->>MD: tool: add_document(collection=blog,<br>key=slug, contentMd=...)
        MD->>C: OK
    end
    C->>MD: tool: vector_reindex(collection=blog)
    MD->>C: Reindexing 142 documents
    C->>U: "Done! Migrated 142 posts.<br>Embeddings generating in background."
Full WordPress Migration Pipeline
graph TB
    subgraph s1["1. Export"]
        WP[WordPress] -->|REST API| WPEXP[wpexporter]
        WP -->|XML-RPC| WPEXP
        WPEXP -->|markdown + metadata| MD_FILES[Markdown Files]
    end
    subgraph s2["2. Store and Index"]
        MD_FILES -->|bulk import| MDDB[MDDB]
        MDDB -->|auto-embed| VEC[(Vector Index)]
        MDDB -->|store| BOLT[(BoltDB)]
    end
    subgraph s3["3. Use"]
        MDDB -->|MCP| CLAUDE[Claude / AI Agents]
        MDDB -->|REST| SSG_N[SSG<br>New Static Site]
        MDDB -->|vector search| RAG[RAG Pipeline]
    end
    style WP fill:#21759b,color:#fff
    style MDDB fill:#00d4aa,color:#000
    style CLAUDE fill:#d97706,color:#fff
6. Airbyte → MDDB (ELT Destination Connector)
Airbyte is an open-source ELT platform with 300+ source connectors. The MDDB Airbyte destination ships records from any Airbyte source (Postgres, MySQL, Salesforce, Stripe, S3, …) directly into MDDB via POST /v1/add. Each Airbyte stream maps to its own MDDB collection.
Image
| Registry | Image |
|---|---|
| Docker Hub | tradik/airbyte-destination-mddb:0.1.1 |
| GHCR | ghcr.io/tradik/airbyte-destination-mddb:0.1.1 (multi-arch + SLSA build-provenance) |
Source: integrations/airbyte-destination/
Register the connector in Airbyte UI
- Settings → Destinations → + New connector.
- Fill in:
  - Connector display name: MDDB
  - Docker repository name: tradik/airbyte-destination-mddb
  - Docker image tag: 0.1.1
  - Connector documentation URL: https://github.com/tradik/mddb/tree/main/integrations/airbyte-destination
- Add. Airbyte runs spec and renders the form.
- Destinations → + New destination → MDDB → fill mddbUrl (e.g. https://mddb.tradik.com) and optional apiKey (bearer vk_…) → Set up destination. Airbyte executes check against the MDDB instance and should report SUCCEEDED.
Configuration (spec)
| Field | Default | Description |
|---|---|---|
| mddbUrl | https://mddb.tradik.com | MDDB base URL, no trailing /. |
| apiKey | (empty) | Bearer token (vk_…). Empty = MDDB without auth. |
| keyField | id | Record field used as the MDDB document key. SHA-1 of the record on miss. |
| language | en_US | Locale stored on every document. |
| batchSize | 100 | Records buffered before flush. Flush also triggered on every Airbyte STATE message and at end-of-stream. |
| timeoutSeconds | 30 | HTTP timeout per request. |
| verifySsl | true | Set false only for self-signed dev instances. |
Record mapping
Airbyte record {"id":"u-42","email":"a@b.c","tags":["pro","beta"]} on stream users becomes:
{
  "collection": "users",
  "key": "u-42",
  "lang": "en_US",
  "meta": {
    "id": ["u-42"],
    "email": ["a@b.c"],
    "tags": ["pro", "beta"]
  },
  "contentMd": "<!-- emittedAt=… -->\n```json\n{ …record… }\n```\n"
}
contentMd carries the full record inside a fenced JSON code block, so FTS and vector search index it out of the box. meta follows the native MDDB map<string,[]string> schema.
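The mapping above can be sketched in a few lines. This is an illustration of the described behaviour, not the connector's actual code: record_to_document is a hypothetical name, the SHA-1 fallback is assumed to hash the record's canonical (sorted-key) JSON, and the exact contentMd layout is inferred from the example.

```python
import hashlib
import json

FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block


def record_to_document(stream: str, record: dict, emitted_at: int,
                       key_field: str = "id", language: str = "en_US") -> dict:
    """Map one Airbyte record to an MDDB /v1/add payload (illustrative only)."""
    canonical = json.dumps(record, sort_keys=True)
    if record.get(key_field) is not None:
        key = str(record[key_field])
    else:
        # Assumed fallback: SHA-1 over the canonical JSON of the record
        key = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    meta = {}
    for field, value in record.items():
        values = value if isinstance(value, list) else [value]
        meta[field] = [str(v) for v in values]  # MDDB meta is map<string, []string>
    content = f"<!-- emittedAt={emitted_at} -->\n{FENCE}json\n{canonical}\n{FENCE}\n"
    return {"collection": stream, "key": key, "lang": language,
            "meta": meta, "contentMd": content}


doc = record_to_document(
    "users", {"id": "u-42", "email": "a@b.c", "tags": ["pro", "beta"]}, emitted_at=0)
print(json.dumps(doc, indent=2))
```

Scalars become one-element lists and arrays are stringified element by element, which is what lets meta filters treat single-valued and multi-valued fields the same way.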
Sync modes
- append: every record is upserted by key (existing docs replaced, no orphan deletion).
- append_dedup: same semantics (/v1/add is upsert-by-key by nature).
- overwrite: not advertised; if forced by the source, the connector logs WARN and falls back to append-upsert. Orphans are not deleted (MDDB has no batch-delete-by-collection).
Reliability
- HTTP retry 3× with exponential backoff on 429/5xx (urllib3.Retry).
- Flush on every AirbyteMessage(STATE) so partial syncs don't lose in-flight batches.
- 40 unit tests, 97% coverage, CI matrix on Python 3.12 & 3.13.
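The flush rules (batch size reached, STATE message received, end of stream) can be pictured with a small buffer. This is a simplified sketch rather than the connector's implementation: BatchBuffer and consume are hypothetical names, the sink callback stands in for the /v1/add uploads, and the message dictionaries are reduced to a type and a record.

```python
from typing import Callable


class BatchBuffer:
    """Buffer records and hand them to a sink in batches."""

    def __init__(self, sink: Callable[[list[dict]], None], batch_size: int = 100):
        self.sink = sink              # called with each batch, e.g. an uploader
        self.batch_size = batch_size
        self.pending: list[dict] = []

    def add(self, record: dict) -> None:
        self.pending.append(record)
        if len(self.pending) >= self.batch_size:
            self.flush()

    def flush(self) -> None:
        if self.pending:
            self.sink(self.pending)
            self.pending = []


def consume(messages: list[dict], buffer: BatchBuffer) -> None:
    for msg in messages:
        if msg["type"] == "RECORD":
            buffer.add(msg["record"])
        elif msg["type"] == "STATE":
            buffer.flush()            # nothing checkpointed stays only in memory
    buffer.flush()                    # end of stream


flushed: list[list[dict]] = []
messages = (
    [{"type": "RECORD", "record": {"id": i}} for i in range(3)]
    + [{"type": "STATE"}]
    + [{"type": "RECORD", "record": {"id": 3}}]
)
consume(messages, BatchBuffer(flushed.append, batch_size=2))
print([len(batch) for batch in flushed])
```

Flushing on STATE matters because Airbyte treats a state message as a checkpoint: records received before it should be durable by the time the checkpoint is acknowledged.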
Example flow: Postgres → MDDB
- Source: Postgres pointing at the table you want to index (e.g. wiki.articles).
- Destination: MDDB with keyField=slug, language=en_US.
- Connection: select streams, sync mode Append + Deduped (the connector treats both append modes identically).
- Run sync. MDDB ingests each row, embeds it (auto-vector if configured), and indexes for FTS/vector/hybrid search.
curl -s https://mddb.tradik.com/v1/search \
  -H 'content-type: application/json' \
  -d '{"collection":"articles","query":"vector index","limit":5}'
7. WordPress → MDDB (Sync plugin)
integrations/wordpress-plugin/ is a first-party WordPress plugin that mirrors posts and pages (or any public post type) into MDDB. Unlike wpexporter (one-shot bulk migration), this plugin keeps the two stores in lock-step: every save / publish / trash / delete in WordPress is reflected in MDDB in real time.
| Property | Value |
|---|---|
| Plugin slug | mddb-sync |
| Release tag prefix | wp-v (e.g. wp-v0.1.0), separate from core MDDB vX.Y.Z tags |
| Release asset | mddb-sync-<version>.zip attached to each GitHub Release |
| WP requires | 6.2+ |
| PHP requires | 8.1+ |
Hooks
- wp_after_insert_post → POST /v1/add (autosaves & revisions skipped; drafts opt-in).
- wp_trash_post and before_delete_post → POST /v1/delete.
- pre_set_site_transient_update_plugins + plugins_api → self-update channel hitting repos/tradik/mddb/releases/latest.
Settings (Settings → MDDB Sync)
| Field | What |
|---|---|
| MDDB URL | Base URL, e.g. https://mddb.tradik.com. |
| API key | Bearer token (vk_…); empty for unauthenticated dev instances. |
| Collection | Defaults to a slug derived from the site host. |
| Sync events | Toggle save / delete / include-drafts independently. |
| Post types | Any registered public post type. |
| Language detection | Auto (Polylang → WPML → site locale), or pin one source. |
| Key strategy | posttype-id (default), posttype-slug, or permalink path. |
Document shape
{
  "collection": "example_com",
  "key": "post-42",
  "lang": "en_US",
  "meta": {
    "postType": ["post"],
    "status": ["publish"],
    "title": ["Hello world"],
    "slug": ["hello-world"],
    "permalink": ["https://example.com/hello-world/"],
    "author": ["Jane Author"],
    "publishedAt": ["2026-05-19T10:14:00+00:00"],
    "categories": ["News"],
    "tags": ["intro", "demo"]
  },
  "contentMd": "# Hello world\n\nHello world.\n"
}
contentMd runs the post body through the standard the_content filter, then strips tags. Shortcodes, blocks, and oEmbed embeds are expanded first so the indexed text matches the rendered page.
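The three key strategies from the settings table trade stability against readability. The sketch below illustrates the idea in Python (the plugin itself is PHP); only the posttype-id shape ("post-42") is confirmed by the example document, while the posttype-slug and permalink formats shown here are assumptions, and make_key is a hypothetical name.

```python
from urllib.parse import urlparse


def make_key(strategy: str, post_type: str, post_id: int,
             slug: str, permalink: str) -> str:
    """Derive an MDDB document key for a WordPress post."""
    if strategy == "posttype-id":
        return f"{post_type}-{post_id}"              # matches "post-42" above
    if strategy == "posttype-slug":
        return f"{post_type}-{slug}"                 # assumed format
    if strategy == "permalink":
        return urlparse(permalink).path.strip("/")   # assumed format
    raise ValueError(f"unknown key strategy: {strategy}")


post = dict(post_type="post", post_id=42, slug="hello-world",
            permalink="https://example.com/hello-world/")
for strategy in ("posttype-id", "posttype-slug", "permalink"):
    print(strategy, "->", make_key(strategy, **post))
```

An id-based key survives slug and permalink edits, which is a plausible reason for it being the default; slug-based keys are easier to read but change when an editor renames the post.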
Build & release
The workflow .github/workflows/wordpress-plugin.yml runs composer audit + PHPCS (WordPress security ruleset) + PHPStan level 5 + PHPUnit on PHP 8.1 / 8.2 / 8.3 / 8.4 on every PR and push touching integrations/wordpress-plugin/**. The 8.3 leg enforces ≥90% line coverage. Pushing a wp-v* tag builds the runtime zip and attaches it to a GitHub Release; that asset is what the in-plugin updater downloads.
8. GitHub Action → MDDB (CI sync)
integrations/github-action/ is a native Node 24 JavaScript action that ingests repository files into an MDDB collection on every push (or any other workflow trigger). Drop it into a workflow to keep an MDDB collection in sync with your docs/, README.md, OpenAPI specs, or any other text/markdown/JSON artefacts that live in git.
name: Sync docs to MDDB

on:
  push:
    branches: [main]
    paths: ['docs/**', 'README.md']

jobs:
  sync:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v6
      - uses: tradik/mddb/integrations/github-action@gha-v0
        with:
          mddb-url: https://mddb.tradik.com
          api-key: ${{ secrets.MDDB_API_KEY }}
          collection: project-docs
          path: |
            docs/**/*.md
            README.md
          ignore: docs/draft/**
          key-prefix: ${{ github.repository }}/
Inputs
| Input | Default | Description |
|---|---|---|
| mddb-url | https://mddb.tradik.com | MDDB base URL. |
| api-key | (empty) | Bearer token (vk_…). |
| collection | required | Target MDDB collection. |
| path | **/*.md | Newline-separated globs (inline ! for negation). |
| ignore | (empty) | Newline-separated exclude globs. |
| key-strategy | path | path (slug of relative path), hash (sha1 of content), filename (basename). |
| key-prefix | (empty) | Prefix every key; useful when several repos share one collection. |
| concurrency | 8 | Parallel /v1/add requests. |
| dry-run | false | Walk + build documents without contacting MDDB. |
| fail-on-error | true | Set to false to demote upload failures to job warnings. |
Outputs
documents-scanned, documents-added, documents-failed.
Document shape
A file at docs/guide/intro.md containing # Hello produces:
{
  "collection": "project-docs",
  "key": "tradik/mddb/docs/guide/intro.md",
  "lang": "en_US",
  "meta": {
    "source": ["github-action"],
    "path": ["docs/guide/intro.md"],
    "extension": [".md"],
    "size": ["7"],
    "repository": ["tradik/mddb"],
    "ref": ["<commit sha>"]
  },
  "contentMd": "# Hello"
}
Markdown and plain-text (.md, .markdown, .mdx, .txt, .rst, .adoc) are stored verbatim. JSON / YAML / TOML / HTML / CSS / JS / TS / Python / Go / Rust / Bash files are wrapped in a fenced code block with the matching language so FTS + vector indexing recognise the structure.
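The content and key rules can be summarised in a short sketch. It is written in Python for consistency with the other examples on this page (the action itself is JavaScript) and is not the action's code: build_content and build_key are hypothetical names, the extension-to-language map is a partial assumption, and the path strategy is shown as the unchanged relative path because that is what the example key suggests.

```python
import hashlib
from pathlib import PurePosixPath

VERBATIM = {".md", ".markdown", ".mdx", ".txt", ".rst", ".adoc"}
# Partial, assumed mapping from file extension to fence language
LANGUAGES = {".json": "json", ".yaml": "yaml", ".yml": "yaml", ".toml": "toml",
             ".py": "python", ".go": "go", ".rs": "rust", ".sh": "bash"}
FENCE = "`" * 3  # built at runtime so this example can sit inside a fenced block


def build_content(path: str, text: str) -> str:
    """Store prose verbatim; wrap everything else in a fenced code block."""
    ext = PurePosixPath(path).suffix.lower()
    if ext in VERBATIM:
        return text
    lang = LANGUAGES.get(ext, "")
    return f"{FENCE}{lang}\n{text}\n{FENCE}\n"


def build_key(path: str, text: str, strategy: str = "path", prefix: str = "") -> str:
    """Derive the document key for one file under the chosen strategy."""
    if strategy == "hash":
        base = hashlib.sha1(text.encode("utf-8")).hexdigest()
    elif strategy == "filename":
        base = PurePosixPath(path).name
    else:
        base = path
    return prefix + base


print(build_key("docs/guide/intro.md", "# Hello", prefix="tradik/mddb/"))
print(build_content("openapi.yaml", "openapi: 3.1.0"))
```

With key-prefix set to the repository name, as in the workflow above, two repositories that both contain docs/guide/intro.md no longer overwrite each other in a shared collection.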
Tests & release
57 unit tests with 90%+ Jest coverage (statements / branches / functions / lines). The workflow .github/workflows/github-action.yml runs format check + ESLint + Jest with coverage on a Node 22 & 24 matrix, rebuilds dist/ and asserts it matches the committed bundle (verify-dist), and dry-runs the action against the integration's own README (smoke). Pushing a gha-v* tag verifies package.json.version, force-moves floating gha-v<major> / gha-v<major>.<minor> tags, and publishes a GitHub Release, so consumers can pin to @gha-v0, @gha-v0.1, or @gha-v0.1.0.
9. Grafana → MDDB (Datasource plugin)
integrations/grafana-datasource/ is a native Grafana 10/11/12/13 datasource plugin that turns MDDB into a first-class panel source. Five query types map directly to MDDB endpoints, so a dashboard can mix MDDB content signals (hot documents, FTS results, metadata facets, event histograms) with Prometheus/Loki/Tempo on the same page.
Note: MDDB also exposes a Prometheus-compatible /metrics endpoint for server-level metrics (request rate, latency, database size, embedding queue); see docs/TELEMETRY.md. This plugin is the complement: it queries MDDB content, not Prometheus metrics.
Plugin ID & supported versions
| Property | Value |
|---|---|
| Plugin ID | tradik-mddb-datasource |
| Type | Datasource (frontend-only, no backend Go binary) |
| Grafana | >=10.0.0 (tested against Grafana 13.0.1) |
| MDDB | 2.9.16+ |
Source: integrations/grafana-datasource/
Install
cd integrations/grafana-datasource && make build

docker run --rm -p 3000:3000 \
  -e GF_PLUGINS_ALLOW_LOADING_UNSIGNED_PLUGINS=tradik-mddb-datasource \
  -v "$(pwd)/dist:/var/lib/grafana/plugins/tradik-mddb-datasource" \
  grafana/grafana:13.0.1
Or make docker to bake the plugin into a tradik/mddb-grafana-datasource:<version> image.
In Grafana: Configuration → Data sources → Add data source → MDDB, set the base URL (https://mddb.tradik.com), optional default collection, and a bearer API key (stored encrypted in secureJsonData). Save & test pings /v1/stats and distinguishes auth vs server vs network failures.
Query types
| Query type | MDDB endpoint | Output shape | Best panel |
|---|---|---|---|
| Temporal histogram | POST /v1/temporal/histogram | Time + count time-series | Time series |
| Hot documents | POST /v1/temporal/hot | docId / accessCount / lastAccessAt table | Table / Bar chart |
| Metadata aggregate | POST /v1/aggregate | value/count table or time bucket series | Pie / Bar / Time series |
| Full-text search | POST /v1/fts | key / lang / score / highlight table | Table |
| Database stats | POST /v1/stats | per-collection documents / revisions / embeddings | Stat / Table |
Dashboard time range is applied automatically as from / to (seconds). Grafana dashboard variables are interpolated into collection, query, and facetKey via getTemplateSrv().replace().
Example: temporal access histogram
Datasource: MDDB
Query type: Temporal histogram (time-series)
Collection: blog
Event type: access
Interval: day
Yields a time series plottable on any Grafana time-series panel; combine with a documents per collection Prometheus query on the same dashboard to correlate ingest with reads.
Tests & release
Pure-logic Jest tests on query.ts, transform.ts, client.ts, and datasource.ts enforce ≥90% coverage. make package produces a versioned plugin zip; tag-driven releases use grafana-v<version> and ship the zip as a GitHub Release asset for grafana-cli plugins install --pluginUrl or signing-service submission.
10. Full Pipeline: All Integrations Together
Combine all tools for a complete content platform:
graph TB
    subgraph p1["1. Content Sources"]
        WP[WordPress] -->|wpexporter| WPEXP[wpexporter<br>markdown + meta]
        FILES[PDF / DOCX / PPTX] -->|parse| DOCLING[Docling]
    end
    subgraph p2["2. MDDB - Central Hub"]
        WPEXP -->|bulk import| MDDB[MDDB<br>:11023 / :11024]
        DOCLING -->|markdown + chunks| MDDB
        MDDB -->|primary store| BOLT[(BoltDB)]
        MDDB -->|auto-embed| VEC[(Vector Index)]
        MDDB -->|webhook sync| OS[(OpenSearch<br>scale search)]
    end
    subgraph p3["3. Orchestration and AI"]
        LF[Langflow] -->|semantic search| MDDB
        LF -->|BM25 / kNN| OS
        LF -->|generate| LLM[LLM]
        LLM --> LF
        MDDB -->|MCP| CLAUDE[Claude Desktop]
    end
    subgraph p4["4. Output"]
        MDDB -->|REST API| SSG[SSG<br>Static Site Generator]
        SSG -->|HTML/CSS/JS| DEPLOY[GitHub Pages<br>Cloudflare<br>Netlify]
        LF --> WEBAPP[Web App]
        MDDB -->|REST| API[Custom Apps]
    end
    style WP fill:#21759b,color:#fff
    style MDDB fill:#00d4aa,color:#000
    style LLM fill:#d97706,color:#fff
    style SSG fill:#7c3aed,color:#fff
    style CLAUDE fill:#d97706,color:#fff
Step-by-Step
- Import: wpexporter migrates WordPress content; Docling parses PDFs/DOCX into Markdown
- Store: MDDB stores all documents with metadata in BoltDB
- Index: MDDB auto-generates embeddings; optionally syncs to OpenSearch for scale
- Search: Langflow orchestrates RAG: semantic search via MDDB, BM25 via OpenSearch
- Answer: LLM generates answers from retrieved context, cites sources
- Publish: SSG renders static sites from MDDB collections, deploys to CDN
- Manage: Claude Desktop via MCP (52 tools), Panel UI, REST/gRPC APIs
When to Use What
| Scenario | Recommendation |
|---|---|
| Migrate from WordPress | wpexporter → MDDB |
| Parse PDF/DOCX to Markdown | Docling → MDDB |
| Generate static website | MDDB → SSG |
| < 50K docs, simple RAG | MDDB only (no OpenSearch needed) |
| > 50K docs, enterprise search | MDDB + OpenSearch |
| Visual workflow building | MDDB + Langflow |
| AI agent integration | MDDB MCP (52 tools) |
| AI-assisted migration | wpmcp + mddb-mcp via Claude |
| Full production pipeline | All together |