// Framework

Clinical Evaluation Framework

Creator: Clinical App Report
Published: 2026-05-18T00:00:00.000Z

Every review and ranking on Clinical App Report is scored on the same 100-point framework. It has seven weighted criteria, plus an Evidence Grade (A–F) anchored to the published validation literature.

The 100-point framework

Clinical Evaluation Framework — criteria, weights, and what we measure
Criterion	Weight	What we measure
Evidence & Validation	25%	Peer-reviewed validation studies, regulatory status (FDA/MHRA/CE), citation depth in clinical literature
Clinical Accuracy	20%	Measurement validity: MAPE versus weighed reference meals, database verification level, noise robustness
AI Recognition Performance	15%	Top-1/Top-3 food identification, portion MAPE, plate segmentation under varying lighting and angle
Macronutrient & Goal Framework	10%	Macro depth, target customization, adaptive coaching protocols, recipe analysis fidelity
Behavioral Adherence	10%	Median time-to-log across a 20-task battery, friction, drop-off pattern from longitudinal studies
Privacy & Security	10%	Data handling transparency, HIPAA posture, ease of export/deletion, cancellation friction, monetization conflicts
Cost & Accessibility	10%	True 12-month cost, free-tier usefulness, language coverage, low-resource device support

Each criterion produces a sub-score from 0 to 100; the weighted sum is the overall score. The Evidence Grade is a separate, structured assessment of validation evidence (A–F).
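As a rough illustration of the weighted sum described above, the sketch below combines seven sub-scores into an overall score. The dictionary keys and the function name `overall_score` are hypothetical labels chosen for this example; only the weights themselves come from the table.

```python
# Weights in percent, taken from the criteria table.
# The key names are hypothetical labels, not identifiers used by the publisher.
WEIGHTS_PERCENT = {
    "evidence_validation": 25,
    "clinical_accuracy": 20,
    "ai_recognition": 15,
    "macronutrient_goal": 10,
    "behavioral_adherence": 10,
    "privacy_security": 10,
    "cost_accessibility": 10,
}


def overall_score(sub_scores):
    """Return the weighted sum of seven 0-100 sub-scores as a 0-100 value."""
    missing = WEIGHTS_PERCENT.keys() - sub_scores.keys()
    if missing:
        raise ValueError(f"missing sub-scores: {sorted(missing)}")
    for name in WEIGHTS_PERCENT:
        if not 0 <= sub_scores[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    weighted_total = sum(WEIGHTS_PERCENT[name] * sub_scores[name] for name in WEIGHTS_PERCENT)
    return weighted_total / 100
```

Under these assumptions, an app that scores 100 on every criterion except 80 on Clinical Accuracy comes out at 96 overall, because the 20-point shortfall is weighted at 20%.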

Evidence & Validation (25%)

Evidence & Validation is the largest criterion because clinical credibility depends on it. We assess peer-reviewed validation studies, regulatory posture (FDA / MHRA / CE), citation depth in clinical literature, and the transparency of the publisher's own methodology. The Evidence Grade (A–F) is a structured summary: A requires at least one published randomized controlled trial (RCT) validating the app as a clinical intervention versus an active comparator; B requires peer-reviewed observational validation; C requires manufacturer-cited validation; D requires documented methodology; F is assigned when none of these is available.
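The grade ladder can be read as a simple cascade. The sketch below is one possible encoding, assuming that the highest tier an app satisfies determines its grade; that tie-breaking rule, the function name, and the boolean parameter names are assumptions made for illustration. The scale as described has no E grade.

```python
def evidence_grade(
    has_rct_vs_active_comparator,
    has_peer_reviewed_observational,
    has_manufacturer_cited_validation,
    has_documented_methodology,
):
    """Map four yes/no evidence checks to a grade, highest tier first.

    Assumption: the best tier that is satisfied wins.
    """
    if has_rct_vs_active_comparator:
        return "A"
    if has_peer_reviewed_observational:
        return "B"
    if has_manufacturer_cited_validation:
        return "C"
    if has_documented_methodology:
        return "D"
    return "F"  # none of the four evidence types is available
```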

Clinical Accuracy (20%)

Clinical Accuracy is anchored to Mean Absolute Percentage Error (MAPE) against weighed reference meals. Each reference meal is built from USDA FoodData Central composition values, with every ingredient weighed on a calibrated kitchen scale (0.1 g precision). We compute the MAPE of each app's predicted kcal versus the reference value across the battery.
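For readers unfamiliar with the metric, here is a minimal sketch of MAPE over a meal battery, using the standard definition (mean of absolute errors expressed as a percentage of the reference). The function name and argument names are hypothetical, and the example values in the usage note are made up.

```python
def mape(predicted_kcal, reference_kcal):
    """Mean Absolute Percentage Error, in percent, across paired meals."""
    if len(predicted_kcal) != len(reference_kcal):
        raise ValueError("predicted and reference lists must have the same length")
    if not reference_kcal:
        raise ValueError("at least one reference meal is required")
    if any(ref <= 0 for ref in reference_kcal):
        raise ValueError("reference kcal values must be positive")
    percentage_errors = [
        100 * abs(pred - ref) / ref
        for pred, ref in zip(predicted_kcal, reference_kcal)
    ]
    return sum(percentage_errors) / len(percentage_errors)
```

With two illustrative meals of 100 kcal each, predictions of 110 and 90 kcal are both 10% off, so the MAPE is 10%. Over- and under-estimates do not cancel out.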

Scoring anchor: accuracy_points = clamp(100 − MAPE × 4, 0, 100). A 5% MAPE earns 80 points; 15% MAPE earns 40; 25%+ earns zero. The slope was chosen so an app at the boundary of clinical usefulness (~5% MAPE per Schoeller 1995) earns a strong but not perfect sub-score.
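The scoring anchor translates directly into code. The sketch below mirrors the formula as stated, with MAPE expressed in percent; the function name is simply the variable name from the formula.

```python
def accuracy_points(mape_percent):
    """clamp(100 - MAPE * 4, 0, 100), with MAPE given in percent."""
    if mape_percent < 0:
        raise ValueError("MAPE cannot be negative")
    return max(0.0, min(100.0, 100.0 - 4.0 * mape_percent))


# The three anchor points quoted in the text.
for example_mape in (5, 15, 25):
    print(example_mape, accuracy_points(example_mape))
```

Each additional percentage point of MAPE costs four sub-score points until the score bottoms out at zero.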

AI Recognition Performance (15%)

For each AI-photo-capable app we run a 30-plate photo battery across three lighting conditions, three angles, and three plate sizes. Sub-scoring: Top-1 identification correctness (40 of 100 AI-subscore points), Top-3 identification correctness (20), portion-size MAPE (30), and plate segmentation accuracy on multi-item plates (10).
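One way to picture the composition of the AI sub-score is sketched below. It assumes each component has already been normalized to a fraction between 0 and 1 (for example, the share of plates identified correctly). How portion-size MAPE is converted to such a fraction is not specified in the text, so that input is treated as given. The names are hypothetical.

```python
# Maximum points per component, as listed in the sub-scoring breakdown.
AI_COMPONENT_POINTS = {
    "top1": 40,
    "top3": 20,
    "portion": 30,
    "segmentation": 10,
}


def ai_subscore(fractions):
    """Combine normalized component results (0-1) into a 0-100 AI sub-score.

    Assumption: each component earns its maximum points scaled linearly
    by the fraction achieved.
    """
    total = 0.0
    for name, max_points in AI_COMPONENT_POINTS.items():
        value = fractions[name]
        if not 0 <= value <= 1:
            raise ValueError(f"{name} must be between 0 and 1")
        total += max_points * value
    return total
```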

Macronutrient & Goal Framework (10%)

The Macronutrient & Goal Framework criterion covers four sub-dimensions: macro display depth (calories, P/C/F, net carbs, fiber as first-class metrics), target-setting flexibility (custom per-macro targets, time-windowed targets), adaptive coaching protocols (TDEE estimation, weekly target adjustment), and recipe builder fidelity.

Behavioral Adherence (10%)

Behavioral Adherence is measured as median time-to-log across a standardized 20-task battery, plus the drop-off pattern from published longitudinal-use studies. Friction matters because logging consistency over weeks is the variable that most predicts weight-management outcomes. An app that makes logging faster tends to produce a more complete, and therefore more accurate, record over time, even when per-meal accuracy is comparable.
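The time-to-log figure is a plain median over the task battery. A minimal sketch, assuming one timing in seconds per task (the unit and the function name are assumptions):

```python
from statistics import median


def median_time_to_log(task_seconds, expected_tasks=20):
    """Median logging time across the battery, in the same unit as the input."""
    if len(task_seconds) != expected_tasks:
        raise ValueError(f"expected {expected_tasks} task timings, got {len(task_seconds)}")
    if any(t < 0 for t in task_seconds):
        raise ValueError("timings must be non-negative")
    return median(task_seconds)
```

Using the median rather than the mean keeps a single slow task, such as a hard-to-find food item, from dominating the result.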

Privacy & Security (10%)

Privacy is graded on data handling clarity, HIPAA posture (where applicable), retention policy transparency, ease of data export and deletion, cancellation friction, and whether the product's monetization model creates conflicts of interest with user advice quality.

Cost & Accessibility (10%)

Accessibility is computed as feature-density per dollar of annual cost plus free-tier usefulness, language coverage, and low-resource device support. Aggressive trial-conversion pricing reduces the sub-score.

Test cadence

Top-tier apps are re-evaluated quarterly. Mid-tier apps are re-evaluated semi-annually. A vendor release that changes core methodology, database source, or photo-AI model triggers a 30-day re-test window. The Evidence Grade is updated as new validation evidence is published.
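The cadence rules can be sketched as date arithmetic. The example below assumes that "quarterly" means three calendar months, "semi-annually" means six, and that the 30-day window is counted from the vendor's release date. Those readings, and all names in the code, are assumptions made for illustration.

```python
import calendar
from datetime import date, timedelta

# Assumed mapping from tier to review interval in calendar months.
REVIEW_INTERVAL_MONTHS = {"top": 3, "mid": 6}


def add_months(start, months):
    """Add calendar months, clamping the day to the target month's length."""
    month_index = start.month - 1 + months
    year = start.year + month_index // 12
    month = month_index % 12 + 1
    day = min(start.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_scheduled_review(last_review, tier):
    """Date of the next routine re-evaluation for a 'top' or 'mid' tier app."""
    return add_months(last_review, REVIEW_INTERVAL_MONTHS[tier])


def retest_window_end(release_date):
    """Last day of the 30-day re-test window after a qualifying vendor release."""
    return release_date + timedelta(days=30)
```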

Quality control

All evaluation and scoring is reviewed against the test data before publication. Substantive corrections are logged with date and reason.

Why we don't take affiliate money

We don't maintain affiliate accounts with any of the apps we evaluate. Our reasoning is documented in our no-affiliate disclosure.