
Clinical Evaluation Framework

Creator: Clinical App Report
Published: 2026-05-18T00:00:00.000Z

Every evaluation and ranking in Clinical App Report is scored on the same 100-point framework: seven weighted criteria, plus an Evidence Grade (A–F) anchored to the published validation literature.

The 100-point framework

Clinical Evaluation Framework — criteria, weights, and what we measure
Criterion	Weight	What we measure
Evidence & Validation	25%	Peer-reviewed validation studies, regulatory posture (FDA/MHRA/CE), citation depth in clinical literature
Clinical Accuracy	20%	Measurement validity: MAPE against weighed reference meals, database verification tier, noise resistance
AI Recognition Performance	15%	Top-1/Top-3 food identification, portion MAPE, plate segmentation under varied lighting and angle
Macronutrient & Goal Framework	10%	Macro depth, target personalization, adaptive coaching protocols, recipe analyzer fidelity
Behavioral Adherence	10%	Median time-to-log on a 20-task battery, friction, drop-off pattern in longitudinal studies
Privacy & Security	10%	Data handling clarity, HIPAA posture, ease of export/deletion, cancellation friction, monetization conflicts
Cost & Accessibility	10%	Real 12-month cost, free-tier usefulness, language coverage, low-resource device support

Each criterion produces a sub-score from 0 to 100; the weighted sum is the overall score. The Evidence Grade is a separate, structured assessment of validation evidence (A–F).
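As a minimal sketch of that weighted sum, the weights below are taken from the criteria table; the dictionary keys and the function name are hypothetical labels chosen for illustration, not identifiers from our tooling.

```python
# Weights in percent, copied from the criteria table. Key names are hypothetical.
WEIGHTS = {
    "evidence_validation": 25,
    "clinical_accuracy": 20,
    "ai_recognition": 15,
    "macro_goal_framework": 10,
    "behavioral_adherence": 10,
    "privacy_security": 10,
    "cost_accessibility": 10,
}


def overall_score(sub_scores: dict[str, float]) -> float:
    """Return the weighted sum of the seven 0-100 sub-scores."""
    if set(sub_scores) != set(WEIGHTS):
        raise ValueError("sub_scores must contain exactly the seven criteria")
    for name, value in sub_scores.items():
        if not 0 <= value <= 100:
            raise ValueError(f"{name} sub-score out of range: {value}")
    # Weights sum to 100, so dividing by 100 keeps the result on a 0-100 scale.
    return sum(WEIGHTS[name] * sub_scores[name] for name in WEIGHTS) / 100
```

Because the weights sum to 100, an app scoring 100 on every criterion lands at exactly 100 overall, and a 10-point change in Evidence & Validation moves the overall score by 2.5 points.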

Evidence & Validation (25%)

Evidence & Validation is the largest criterion because clinical credibility depends on it. We assess peer-reviewed validation studies, regulatory posture (FDA / MHRA / CE), citation depth in clinical literature, and the publisher's own methodology transparency. The Evidence Grade (A–F) is a structured summary: A requires at least one published RCT validating the app as a clinical intervention versus an active comparator; B requires peer-reviewed observational validation; C requires manufacturer-cited validation; D requires documented methodology; F means none of these is available.
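The grade ladder can be read as a first-match rule from strongest to weakest evidence. The sketch below assumes that reading; the `Evidence` record and its field names are hypothetical and only illustrate the ordering described above.

```python
from dataclasses import dataclass


@dataclass
class Evidence:
    """Hypothetical summary of the validation evidence found for one app."""
    rcts_vs_active_comparator: int = 0      # published RCTs validating the app
    peer_reviewed_observational: bool = False
    manufacturer_cited_validation: bool = False
    documented_methodology: bool = False


def evidence_grade(e: Evidence) -> str:
    """Return the highest grade whose requirement is met (assumed first-match rule)."""
    if e.rcts_vs_active_comparator >= 1:
        return "A"
    if e.peer_reviewed_observational:
        return "B"
    if e.manufacturer_cited_validation:
        return "C"
    if e.documented_methodology:
        return "D"
    return "F"
```

For example, an app with only manufacturer-cited validation and a documented methodology would receive a C under this reading, since the stronger of the two tiers applies.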

Clinical Accuracy (20%)

Clinical Accuracy is anchored to Mean Absolute Percentage Error (MAPE) against weighed reference meals. Each reference meal is built from USDA FoodData Central composition values, with every ingredient weighed on a calibrated kitchen scale (0.1g precision). We compute MAPE of each app's predicted kcal vs the reference value across the battery.
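A minimal sketch of the MAPE calculation, assuming one predicted kcal value and one reference kcal value per meal in the battery. The function and parameter names are hypothetical.

```python
def mape(predicted_kcal: list[float], reference_kcal: list[float]) -> float:
    """Mean Absolute Percentage Error, in percent, of predictions vs references."""
    if len(predicted_kcal) != len(reference_kcal) or not reference_kcal:
        raise ValueError("need two non-empty lists of equal length")
    if any(r <= 0 for r in reference_kcal):
        raise ValueError("reference kcal values must be positive")
    # Absolute error of each meal, relative to that meal's reference value.
    relative_errors = [
        abs(p - r) / r for p, r in zip(predicted_kcal, reference_kcal)
    ]
    return 100 * sum(relative_errors) / len(relative_errors)
```

Each meal contributes its own percentage error, so a 50 kcal miss on a 250 kcal snack weighs more than the same miss on a 1,000 kcal dinner.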

Scoring anchor: accuracy_points = clamp(100 − MAPE × 4, 0, 100). A 5% MAPE earns 80 points; 15% MAPE earns 40; 25%+ earns zero. The slope was chosen so an app at the boundary of clinical usefulness (~5% MAPE per Schoeller 1995) earns a strong but not perfect sub-score.
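The scoring anchor translates directly into code. This is a small sketch of the formula as stated, with a hypothetical function name; MAPE is passed in percent.

```python
def accuracy_points(mape_percent: float) -> float:
    """Clamp 100 - 4 * MAPE to the 0-100 range (MAPE given in percent)."""
    if mape_percent < 0:
        raise ValueError("MAPE cannot be negative")
    return max(0.0, min(100.0, 100.0 - 4.0 * mape_percent))


# The worked examples from the text: 5% -> 80, 15% -> 40, 25% and above -> 0.
for m in (5, 15, 25, 40):
    print(m, accuracy_points(m))
```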

AI Recognition Performance (15%)

For each AI-photo-capable app we run a 30-plate photo battery across three lighting conditions, three angles, and three plate sizes. Sub-scoring: Top-1 identification correctness (40 of 100 AI-subscore points), Top-3 identification correctness (20), portion-size MAPE (30), and plate segmentation accuracy on multi-item plates (10).
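The point split above fixes how much each component can contribute, but not how raw measurements map onto those points. The sketch below therefore rests on two loudly labeled assumptions: identification and segmentation are scored linearly from their success rates, and portion-size MAPE reuses the `100 − MAPE × 4` clamp from Clinical Accuracy, rescaled to 30 points. All names are hypothetical.

```python
def ai_subscore(
    top1_rate: float,
    top3_rate: float,
    portion_mape_percent: float,
    segmentation_rate: float,
) -> float:
    """Combine the four AI components into a 0-100 sub-score (illustrative only)."""
    for rate in (top1_rate, top3_rate, segmentation_rate):
        if not 0.0 <= rate <= 1.0:
            raise ValueError("rates must be fractions between 0 and 1")
    if portion_mape_percent < 0:
        raise ValueError("MAPE cannot be negative")

    # ASSUMPTION: linear scaling of success rates to the stated point budgets.
    top1_points = 40 * top1_rate
    top3_points = 20 * top3_rate
    segmentation_points = 10 * segmentation_rate

    # ASSUMPTION: portion MAPE borrows the Clinical Accuracy clamp, scaled to 30.
    portion_fraction = max(0.0, min(100.0, 100.0 - 4.0 * portion_mape_percent)) / 100
    portion_points = 30 * portion_fraction

    return top1_points + top3_points + portion_points + segmentation_points
```

Under these assumptions the four budgets sum to 100, so a perfect run on every component yields the full AI sub-score.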

Macronutrient & Goal Framework (10%)

The Macronutrient & Goal Framework criterion covers four sub-dimensions: macro display depth (calories, P/C/F, net carbs, fiber as first-class metrics), target-setting flexibility (custom per-macro targets, time-windowed targets), adaptive coaching protocols (TDEE estimation, weekly target adjustment), and recipe builder fidelity.

Behavioral Adherence (10%)

Behavioral Adherence is measured as median time-to-log across a standardized 20-task battery, plus drop-off pattern from published longitudinal-use studies. Friction matters because logging consistency over weeks is the variable that most predicts weight-management outcomes: users skip fewer meals in an app that is faster to log in, so its record is more complete and more accurate over time even if per-meal accuracy is comparable.
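The time-to-log statistic is a plain median over the battery. A minimal sketch, assuming each of the 20 tasks yields one timing in seconds; the function name and input shape are hypothetical.

```python
from statistics import median

BATTERY_SIZE = 20  # number of tasks in the standardized battery


def median_time_to_log(task_seconds: list[float]) -> float:
    """Median time-to-log, in seconds, across the 20-task battery."""
    if len(task_seconds) != BATTERY_SIZE:
        raise ValueError(f"expected {BATTERY_SIZE} timings, got {len(task_seconds)}")
    if any(t < 0 for t in task_seconds):
        raise ValueError("timings must be non-negative")
    return median(task_seconds)
```

The median is less sensitive than the mean to a single slow task, such as one barcode that fails to scan, which makes it a steadier summary of everyday friction.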

Privacy & Security (10%)

Privacy is graded on data handling clarity, HIPAA posture (where applicable), retention policy transparency, ease of data export and deletion, cancellation friction, and whether the product's monetization model creates conflicts of interest with user advice quality.

Cost & Accessibility (10%)

Cost & Accessibility is computed as feature density per dollar of annual cost, plus free-tier usefulness, language coverage, and low-resource device support. Aggressive trial-conversion pricing reduces the sub-score.

Test cadence

Top-tier apps are re-evaluated quarterly. Mid-tier apps are re-evaluated semi-annually. A vendor release that changes core methodology, database source, or photo-AI model triggers a 30-day re-test window. Evidence Grade updates as new validation evidence publishes.
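The cadence can be sketched as simple date arithmetic. This assumes quarterly means every 3 calendar months, semi-annually means every 6, and the 30-day window is counted from the vendor release date; the tier labels and function names are hypothetical.

```python
import calendar
from datetime import date, timedelta

# ASSUMPTION: cadence expressed in calendar months per tier.
CADENCE_MONTHS = {"top": 3, "mid": 6}
RETEST_WINDOW = timedelta(days=30)


def add_months(d: date, months: int) -> date:
    """Shift a date by whole months, clamping to the last day of short months."""
    month_index = d.month - 1 + months
    year = d.year + month_index // 12
    month = month_index % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def next_scheduled_evaluation(last_evaluated: date, tier: str) -> date:
    """Next routine re-evaluation date for a top-tier or mid-tier app."""
    return add_months(last_evaluated, CADENCE_MONTHS[tier])


def retest_deadline(release_date: date) -> date:
    """End of the 30-day window opened by a methodology-changing release."""
    return release_date + RETEST_WINDOW
```

Evidence Grade updates are event-driven rather than scheduled, so they are left out of this sketch.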

Quality control

All evaluation and scoring is reviewed against the test data before publication. Substantive corrections are logged with date and reason.

Why we don't take affiliate money

We don't maintain affiliate accounts with any of the apps we evaluate. Our reasoning is documented in our no-affiliate disclosure.