
Clinical Evaluation Matrix

Creator: Clinical App Report
Published: 2026-05-18T00:00:00.000Z

Every review and every ranking on Clinical App Report is scored on the same 100-point matrix: seven weighted criteria, plus an Evidence Grade (A–F) anchored to the published validation literature. The protocol is described below in enough detail that an external party could replicate it.

The 100-point framework

Clinical Evaluation Framework — criteria, weights, and what we measure
Criterion	Weight	What we measure
Evidence & Validation	25%	Peer-reviewed validation studies, regulatory status (FDA/MHRA/CE), citation density in the clinical literature
Clinical Accuracy	20%	Measurement validity (MAPE against weighed reference meals), database verification tier, noise tolerance
AI Recognition Performance	15%	Top-1/Top-3 food identification, portion MAPE, plate segmentation under varying lighting and camera angles
Macronutrient & Goal Framework	10%	Macro depth, target customization, adaptive coaching protocols, recipe analyzer fidelity
Behavioral Adherence	10%	Median logging time across a 20-task battery, friction, drop-off patterns from longitudinal studies
Privacy & Security	10%	Data-handling transparency, HIPAA compliance, export/deletion capability, cancellation friction, monetization conflicts of interest
Cost & Accessibility	10%	Real 12-month cost, free-tier usefulness, language coverage, support for low-resource devices

Each criterion produces a sub-score from 0 to 100; the weighted sum is the overall score. The Evidence Grade is a separate, structured assessment of validation evidence (A–F).
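A minimal Python sketch of that weighted sum, using the weights from the table. The dictionary keys (`evidence_validation`, `clinical_accuracy`, and so on) and the function name `overall_score` are hypothetical labels chosen for illustration, not names from our internal tooling.

```python
# Criterion weights from the framework table (they sum to 1.0).
# Key names are hypothetical labels for illustration.
WEIGHTS = {
    "evidence_validation": 0.25,
    "clinical_accuracy": 0.20,
    "ai_recognition": 0.15,
    "macro_goal_framework": 0.10,
    "behavioral_adherence": 0.10,
    "privacy_security": 0.10,
    "cost_accessibility": 0.10,
}


def overall_score(sub_scores: dict[str, float]) -> float:
    """Weighted sum of 0-100 sub-scores; the result is also on a 0-100 scale."""
    missing = WEIGHTS.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name, value in sub_scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} sub-score out of range: {value}")
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS)
```

For a made-up profile scoring 70, 80, 60, 90, 75, 85, and 65 in table order, the contributions are 17.5 + 16 + 9 + 31.5, for an overall score of 74.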

Evidence & Validation (25%)

Evidence & Validation is the largest criterion because clinical credibility depends on it. We assess peer-reviewed validation studies, regulatory posture (FDA / MHRA / CE), citation depth in clinical literature, and the publisher's own methodology transparency. The Evidence Grade (A–F) is a structured summary: A requires ≥ 1 published RCT validating the app as a clinical intervention versus an active comparator; B requires peer-reviewed observational validation; C requires manufacturer-cited validation; D requires documented methodology; F means none of these requirements is met.
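The grade ladder can be read as a first-match rule: check the strongest requirement first and fall through to F. A minimal sketch, assuming a reviewer has already judged each requirement as a yes/no flag; the `EvidenceProfile` fields and the `evidence_grade` function are hypothetical names. The published scale has no E grade, so the sketch goes straight from D to F.

```python
from dataclasses import dataclass


@dataclass
class EvidenceProfile:
    """Hypothetical yes/no flags summarizing an app's validation record."""
    rct_vs_active_comparator: bool       # >= 1 published RCT vs. an active comparator
    peer_reviewed_observational: bool    # peer-reviewed observational validation
    manufacturer_cited_validation: bool  # validation cited by the manufacturer
    documented_methodology: bool         # methodology is documented


def evidence_grade(p: EvidenceProfile) -> str:
    """Return the highest grade whose requirement is met, checking A first."""
    if p.rct_vs_active_comparator:
        return "A"
    if p.peer_reviewed_observational:
        return "B"
    if p.manufacturer_cited_validation:
        return "C"
    if p.documented_methodology:
        return "D"
    return "F"  # none of the requirements is met
```

Ordering the checks from strongest to weakest means an app with both observational validation and documented methodology is graded B, not D.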

Clinical Accuracy (20%)

Clinical Accuracy is anchored to Mean Absolute Percentage Error (MAPE) against weighed reference meals. Each reference meal is built from USDA FoodData Central composition values, with every ingredient weighed on a calibrated kitchen scale (0.1 g precision). We then compute the MAPE of each app's predicted energy (kcal) against the reference value across the full meal battery.
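A minimal sketch of the MAPE calculation, MAPE = (100 / n) · Σ |predicted − reference| / reference, assuming predicted and reference kcal values are paired by meal in two equal-length lists. The function name `mape` and the sample values below are illustrative, not data from our battery.

```python
def mape(predicted_kcal: list[float], reference_kcal: list[float]) -> float:
    """Mean Absolute Percentage Error, in percent, across a meal battery."""
    if len(predicted_kcal) != len(reference_kcal) or not reference_kcal:
        raise ValueError("need two equal-length, non-empty lists")
    errors = []
    for pred, ref in zip(predicted_kcal, reference_kcal):
        if ref <= 0:
            raise ValueError("reference kcal must be positive")
        # Error is relative to the weighed reference, not the app's estimate.
        errors.append(abs(pred - ref) / ref)
    return 100 * sum(errors) / len(errors)
```

For example, `mape([550, 380, 800], [500, 400, 800])` averages per-meal errors of 10%, 5%, and 0%, giving roughly 5%.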

Scoring anchor: accuracy_points = clamp(100 − MAPE × 4, 0, 100). A 5% MAPE earns 80 points; 15% MAPE earns 40; 25%+ earns zero. The slope was chosen so an app at the boundary of clinical usefulness (~5% MAPE per Schoeller 1995) earns a strong but not perfect sub-score.
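The scoring anchor translates directly into code. This sketch reproduces the clamp formula and the anchor points quoted above; `accuracy_points` is a hypothetical function name, and MAPE is taken in percent.

```python
def accuracy_points(mape_percent: float) -> float:
    """Clinical Accuracy sub-score: clamp(100 - 4 * MAPE, 0, 100), MAPE in percent."""
    raw = 100.0 - 4.0 * mape_percent
    return max(0.0, min(100.0, raw))


if __name__ == "__main__":
    for m in (5, 15, 25, 40):
        print(f"MAPE {m}% -> {accuracy_points(m)} points")
    # MAPE 5% -> 80.0 points
    # MAPE 15% -> 40.0 points
    # MAPE 25% -> 0.0 points
    # MAPE 40% -> 0.0 points
```

The slope of 4 points per percentage point is what makes 25% the zero point; the lower clamp keeps worse apps at 0 rather than going negative.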

AI Recognition Performance (15%)

For each AI-photo-capable app we run a 30-plate photo battery across three lighting conditions, three angles, and three plate sizes. Sub-scoring: Top-1 identification correctness (40 of 100 AI-subscore points), Top-3 identification correctness (20), portion-size MAPE (30), and plate segmentation accuracy on multi-item plates (10).
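A minimal sketch of how the four components could combine into the 0-100 AI sub-score using the 40/20/30/10 point split. It assumes each component has already been normalized to [0, 1]: Top-1 and Top-3 as hit rates over the 30 plates, and portion MAPE and segmentation accuracy mapped to [0, 1] by a rule the text does not specify. The function name, parameter names, and that normalization are assumptions.

```python
# Point budget for the AI Recognition sub-score (sums to 100).
AI_POINTS = {"top1": 40, "top3": 20, "portion": 30, "segmentation": 10}


def ai_subscore(top1_rate: float, top3_rate: float,
                portion_score: float, segmentation_score: float) -> float:
    """Combine component results, each assumed pre-normalized to [0, 1]."""
    components = {
        "top1": top1_rate,
        "top3": top3_rate,
        "portion": portion_score,          # hypothetical normalization of portion MAPE
        "segmentation": segmentation_score,
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return sum(AI_POINTS[name] * value for name, value in components.items())
```

With made-up results of 24/30 Top-1 hits, 27/30 Top-3 hits, a portion score of 0.6, and a segmentation score of 0.5, the sub-score is 32 + 18 + 18 + 5 = 73.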

Macronutrient & Goal Framework (10%)

Macros (10%) covers four sub-dimensions: macro display depth (calories, P/C/F, net carbs, fiber as first-class metrics), target-setting flexibility (custom per-macro targets, time-windowed targets), adaptive coaching protocols (TDEE estimation, weekly target adjustment), and recipe builder fidelity.

Behavioral Adherence (10%)

Behavioral Adherence is measured as median time-to-log across a standardized 20-task battery, plus drop-off pattern from published longitudinal-use studies. Friction matters because logging consistency over weeks is the variable that most predicts weight-management outcomes — a faster-to-log app is structurally more accurate over time even if per-meal accuracy is comparable.
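A minimal sketch of the time-to-log statistic, assuming one wall-clock timing in seconds per task. With 20 tasks, `statistics.median` returns the mean of the 10th and 11th sorted timings, so one or two unusually slow tasks cannot drag the figure the way an arithmetic mean would. The function name and input format are hypothetical, and the drop-off component is not modeled here.

```python
from statistics import median


def median_time_to_log(task_seconds: list[float], expected_tasks: int = 20) -> float:
    """Median seconds to complete the standardized logging-task battery."""
    if len(task_seconds) != expected_tasks:
        raise ValueError(
            f"expected {expected_tasks} task timings, got {len(task_seconds)}"
        )
    return median(task_seconds)
```

Replacing the slowest of 20 timings with an extreme outlier leaves this median unchanged, which is the robustness the metric relies on.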

Privacy & Security (10%)

Privacy is graded on data handling clarity, HIPAA posture (where applicable), retention policy transparency, ease of data export and deletion, cancellation friction, and whether the product's monetization model creates conflicts of interest with user advice quality.

Cost & Accessibility (10%)

Accessibility is computed as feature-density per dollar of annual cost plus free-tier usefulness, language coverage, and low-resource device support. Aggressive trial-conversion pricing reduces the sub-score.
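The exact accessibility formula is not published here, so the following covers only its two cost inputs, under stated assumptions: a "real" 12-month cost that charges the regular price once an introductory price lapses, and feature density as a feature count divided by that annual cost. Function names, parameters, and the example prices are hypothetical.

```python
def real_12_month_cost(intro_monthly: float, intro_months: int,
                       regular_monthly: float) -> float:
    """What a subscriber actually pays over 12 months when an intro price expires."""
    if not 0 <= intro_months <= 12:
        raise ValueError("intro_months must be between 0 and 12")
    return intro_monthly * intro_months + regular_monthly * (12 - intro_months)


def feature_density_per_dollar(feature_count: int, annual_cost: float) -> float:
    """Features delivered per dollar of real annual cost (paid tiers only)."""
    if annual_cost <= 0:
        raise ValueError("annual_cost must be positive; assess free tiers separately")
    return feature_count / annual_cost
```

A hypothetical plan advertised at $1.99/month for 3 months and then billed at $14.99 costs $140.88 over a year, not the $23.88 the headline price implies. This gap is why aggressive trial-conversion pricing lowers the sub-score.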

Test cadence

Top-tier apps are re-evaluated quarterly. Mid-tier apps are re-evaluated semi-annually. A vendor release that changes core methodology, database source, or photo-AI model triggers a 30-day re-test window. Evidence Grade updates as new validation evidence publishes.
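The cadence rules can be expressed as a due-date calculation. A minimal sketch, assuming "top" and "mid" tier labels, calendar-month intervals (3 months for quarterly, 6 for semi-annual), and that the 30-day re-test window runs from the vendor's release date. Those labels, that reading of the window, and the function names are our assumptions.

```python
import calendar
from datetime import date, timedelta
from typing import Optional

# Re-evaluation interval in months by (hypothetical) tier label.
INTERVAL_MONTHS = {"top": 3, "mid": 6}
RETEST_WINDOW = timedelta(days=30)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of the target month."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_evaluation_due(tier: str, last_evaluated: date,
                        core_change_released: Optional[date] = None) -> date:
    """Earlier of the scheduled re-evaluation and any triggered re-test deadline."""
    scheduled = add_months(last_evaluated, INTERVAL_MONTHS[tier])
    if core_change_released is None:
        return scheduled
    return min(scheduled, core_change_released + RETEST_WINDOW)
```

Under these assumptions, a top-tier app last evaluated on 2026-05-18 is next due on 2026-08-18, unless a core model change ships on 2026-06-01, which pulls the deadline forward to 2026-07-01.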

Quality control

All evaluation and scoring is reviewed against the test data before publication. Substantive corrections are logged with date and reason.

Why we don't take affiliate money

We don't maintain affiliate accounts with any of the apps we evaluate. Our reasoning is documented in our no-affiliate disclosure.