YouTube Coverage Analyse
Wissensbasis fuer die fortlaufende Optimierung der YouTube-Daten-Coverage. Alle Queries laufen direkt gegen die Production-DB (TablePlus oder psql) und sind die Grundlage fuer Plan 56 (Intelligente Tier-Aktivierung) und seine Folge-Plaene.
Konvention: Realdaten-Snapshots werden woechentlich automatisch in youtube_coverage_weekly_snapshots persistiert (siehe Plan 56 Phase 11). Die hier dokumentierten Queries sind die manuelle Variante der gleichen Aggregation — fuer Adhoc-Analysen, neue Hypothesen-Tests, oder bei Plan-Anpassungen.
Snapshot 2026-04-28 (Erstmessung)
Bestand
| Metrik | Wert |
|---|---|
| YouTube-Profile gesamt | 545.644 |
| Tracking aktiv | 544.163 |
| PRO (aktiv getrackt + Sync) | 21.055 (3,9%) |
| Free (kein Per-Video-Sync) | 523.108 (96,1%) |
youtube_video_syncs.auto_sync_enabled = true | 21.064 |
| Sync mit 3+ Fehl-Tagen | 45 |
| PRO ohne Sync (defekt) | 0 |
| Free mit Sync (Fehlkonfiguration) | 0 |
→ PRO ↔ Per-Video-Sync ist 1:1 sauber. Free-Profile bekommen aktuell gar keine Per-Video-Snapshots.
Aktivitaets-Verteilung der 21K PRO-Profile
| Letzter Video-Upload | PRO-Profile | Anteil |
|---|---|---|
| ≤ 7 Tage | 14.737 | 70,0% |
| ≤ 30 Tage | 2.378 | 11,3% |
| 30–60 Tage | 834 | 4,0% |
| 60–90 Tage | 452 | 2,1% |
| 90–180 Tage | 735 | 3,5% |
| 180–365 Tage | 622 | 3,0% |
| > 365 Tage | 1.292 | 6,1% |
| Videos in 30 Tagen | PRO-Profile |
|---|---|
| 20+ | 7.601 |
| 10–19 | 3.206 |
| 5–9 | 2.692 |
| 3–4 | 1.617 |
| 1–2 | 1.999 |
| 0 | ~3.940 |
Per-Video-Snapshot-Realitaet
| Metrik | Wert |
|---|---|
| Daily Snapshots (Mittel letzte 14 Tage) | ~190.000 |
| Daily Snapshots (Schwankungsbreite) | 124.481 – 270.266 |
| Unique Videos mit Snapshots in 30 Tagen | 2.694.981 |
| Total Snapshots in 30 Tagen | 5.152.491 |
| Average Snapshots pro Video / 30 Tage | 1,9 |
Storage youtube_video_daily_metrics | 2.086 MB / 7,26 Mio Rows |
→ Kein Video wird taeglich gesnappt. Im Mittel 1–2× pro Monat. Das ist konsistent mit der Plan-57-Capacity-Begrenzung.
Snapshot-Freshness im 180d-Fenster
| Status | Videos | Anteil |
|---|---|---|
| Frisch (≤ 1 Tag) | 443.756 | 15% |
| ≤ 7 Tage | 342.160 | 12% |
| ≤ 30 Tage (potenziell stale) | 1.505.259 | 52% |
| > 30 Tage (stale) | 433.361 | 15% |
| Kein Snapshot | 145.733 | 5% |
→ 67% der Videos haben Snapshots ≥ 7 Tage alt. Publishing-Analytics-Aggregation arbeitet de facto schon mit z.T. veralteten View-Counts.
Channel-Kategorie
| Videos in 180d | Channels | Total Videos | Anteil Videos |
|---|---|---|---|
| 100+ | 2.752 | 2.365.184 | 82% |
| 51–100 | 2.715 | 190.202 | 7% |
| 21–50 | 7.328 | 234.918 | 8% |
| 6–20 | 4.882 | 72.121 | 2% |
| 1–5 | 1.459 | 3.829 | <1% |
| 0 | 1.916 | 0 | 0% |
→ 2.752 Channels (13%) verursachen 82% der gesamten Video-Last.
Adhoc-Queries (TablePlus / psql)
Block A — Bestand & Volumen
-- A1: Gesamtverteilung YouTube-Profile
SELECT
COUNT(*) AS total_yt_profiles,
COUNT(*) FILTER (WHERE tracking_enabled = true) AS tracking_enabled,
COUNT(*) FILTER (WHERE tracking_enabled = false) AS tracking_disabled,
COUNT(*) FILTER (WHERE archived_at IS NOT NULL) AS archived,
COUNT(*) FILTER (WHERE blocked_at IS NOT NULL) AS blocked,
COUNT(*) FILTER (WHERE pro_enabled = true) AS pro_enabled,
COUNT(*) FILTER (WHERE pro_enabled = true AND tracking_enabled = true) AS pro_active
FROM social_profiles
WHERE platform = 'youtube';
-- A2: Video-Sync-Status (Plan 47 Stat-Boxen)
SELECT
COUNT(*) AS total_syncs,
COUNT(*) FILTER (WHERE auto_sync_enabled = true) AS active_sync,
COUNT(*) FILTER (WHERE auto_sync_enabled = false) AS inactive_sync,
COUNT(*) FILTER (WHERE consecutive_fail_days >= 3) AS failing_sync
FROM youtube_video_syncs;
-- A3: PRO ↔ Sync-Konsistenz
SELECT
COUNT(*) FILTER (WHERE sp.pro_enabled AND yvs.auto_sync_enabled) AS pro_with_active_sync,
COUNT(*) FILTER (WHERE sp.pro_enabled AND (yvs.auto_sync_enabled IS NULL OR yvs.auto_sync_enabled = false)) AS pro_without_sync,
COUNT(*) FILTER (WHERE NOT sp.pro_enabled AND yvs.auto_sync_enabled) AS free_with_active_sync
FROM social_profiles sp
LEFT JOIN youtube_video_syncs yvs ON yvs.social_profile_id = sp.id
WHERE sp.platform = 'youtube' AND sp.tracking_enabled = true;
Block B — Aktivitaets-Verteilung
-- B1: Histogramm "Letzter Video-Upload vor X Tagen"
WITH last_video AS (
SELECT sp.id, MAX(yv.published_at) AS last_published
FROM social_profiles sp
LEFT JOIN youtube_videos yv ON yv.social_profile_id = sp.id AND yv.is_removed = false
WHERE sp.platform = 'youtube' AND sp.tracking_enabled = true
GROUP BY sp.id
)
SELECT
CASE
WHEN last_published IS NULL THEN '00_no_videos'
WHEN last_published >= NOW() - INTERVAL '7 days' THEN '01_<=7d'
WHEN last_published >= NOW() - INTERVAL '30 days' THEN '02_<=30d'
WHEN last_published >= NOW() - INTERVAL '60 days' THEN '03_<=60d'
WHEN last_published >= NOW() - INTERVAL '90 days' THEN '04_<=90d'
WHEN last_published >= NOW() - INTERVAL '180 days' THEN '05_<=180d'
WHEN last_published >= NOW() - INTERVAL '365 days' THEN '06_<=365d'
ELSE '07_>365d'
END AS bucket,
COUNT(*) AS profiles
FROM last_video
GROUP BY bucket
ORDER BY bucket;
-- B2: Histogramm "Videos in letzten 30 Tagen"
WITH recent_videos AS (
SELECT sp.id,
COUNT(yv.video_id) FILTER (
WHERE yv.published_at >= NOW() - INTERVAL '30 days' AND yv.is_removed = false
) AS videos_30d
FROM social_profiles sp
LEFT JOIN youtube_videos yv ON yv.social_profile_id = sp.id
WHERE sp.platform = 'youtube' AND sp.tracking_enabled = true
GROUP BY sp.id
)
SELECT
CASE
WHEN videos_30d = 0 THEN '0_videos'
WHEN videos_30d BETWEEN 1 AND 2 THEN '1-2_videos'
WHEN videos_30d BETWEEN 3 AND 4 THEN '3-4_videos'
WHEN videos_30d BETWEEN 5 AND 9 THEN '5-9_videos'
WHEN videos_30d BETWEEN 10 AND 19 THEN '10-19_videos'
ELSE '20+_videos'
END AS bucket,
COUNT(*) AS profiles
FROM recent_videos
GROUP BY bucket
ORDER BY bucket;
-- B3: Follower-Delta letzte 30 Tage (Channel-Lebenszeichen)
WITH follower_movement AS (
SELECT sp.id,
MAX(spdm.followers_count) - MIN(spdm.followers_count) AS follower_delta_30d,
COUNT(spdm.id) AS metric_days_30d
FROM social_profiles sp
LEFT JOIN social_profile_daily_metrics spdm
ON spdm.social_profile_id = sp.id
AND spdm.date >= CURRENT_DATE - INTERVAL '30 days'
WHERE sp.platform = 'youtube' AND sp.tracking_enabled = true
GROUP BY sp.id
)
SELECT
CASE
WHEN metric_days_30d = 0 OR follower_delta_30d IS NULL THEN '0_no_metrics'
WHEN follower_delta_30d = 0 THEN '1_zero_delta'
WHEN follower_delta_30d <= 10 THEN '2_<=10'
WHEN follower_delta_30d <= 100 THEN '3_<=100'
WHEN follower_delta_30d <= 1000 THEN '4_<=1000'
ELSE '5_>1000'
END AS bucket,
COUNT(*) AS profiles
FROM follower_movement
GROUP BY bucket
ORDER BY bucket;
Block C — Tier-Simulation (Schwellenwert-Tuning)
-- C1: Default-Schwellen 90d/60d/5 — Was wuerde Plan 56 heute klassifizieren?
WITH signals AS (
SELECT
sp.id, sp.pro_enabled,
MAX(yv.published_at) AS last_video_at,
COUNT(yv.video_id) FILTER (
WHERE yv.published_at >= NOW() - INTERVAL '30 days' AND yv.is_removed = false
) AS videos_30d,
COALESCE(MAX(spdm.followers_count) - MIN(spdm.followers_count), 0) AS follower_delta_30d
FROM social_profiles sp
LEFT JOIN youtube_videos yv ON yv.social_profile_id = sp.id AND yv.is_removed = false
LEFT JOIN social_profile_daily_metrics spdm
ON spdm.social_profile_id = sp.id
AND spdm.date >= CURRENT_DATE - INTERVAL '30 days'
WHERE sp.platform = 'youtube' AND sp.tracking_enabled = true
GROUP BY sp.id, sp.pro_enabled
)
SELECT
CASE
WHEN pro_enabled AND (last_video_at IS NULL OR last_video_at < NOW() - INTERVAL '180 days')
THEN 'paused (PRO dormant)'
WHEN pro_enabled AND videos_30d >= 5
THEN 'full (PRO + high)'
WHEN pro_enabled AND videos_30d BETWEEN 1 AND 4
THEN 'standard (PRO + moderate)'
WHEN pro_enabled AND last_video_at >= NOW() - INTERVAL '90 days'
THEN 'light (PRO + recent but no 30d)'
WHEN pro_enabled
THEN 'paused (PRO + 90-180d)'
ELSE 'free (no per-video-sync)'
END AS expected_tier,
COUNT(*) AS profiles,
ROUND(100.0 * COUNT(*) / SUM(COUNT(*)) OVER (), 1) AS pct
FROM signals
GROUP BY expected_tier
ORDER BY profiles DESC;
→ Variiere INTERVAL '90 days' / INTERVAL '60 days' / videos_30d >= 5 um strengere/laschere Heuristiken zu testen.
Block D — Quota-Footprint
-- D1: Daily Per-Video-Snapshot-Volumen (letzte 14 Tage)
SELECT date,
COUNT(*) AS snapshots,
COUNT(DISTINCT youtube_video_id) AS unique_videos
FROM youtube_video_daily_metrics
WHERE date >= CURRENT_DATE - INTERVAL '14 days'
GROUP BY date
ORDER BY date DESC;
-- D2: 30-Tage-Aggregat — wie viele Videos werden ueberhaupt regelmaessig gesnappt?
SELECT
COUNT(DISTINCT youtube_video_id) AS videos_with_snapshots_last_30d,
COUNT(*) AS total_snapshots_last_30d,
ROUND(COUNT(*)::numeric / NULLIF(COUNT(DISTINCT youtube_video_id), 0), 1) AS avg_snapshots_per_video
FROM youtube_video_daily_metrics
WHERE date >= CURRENT_DATE - INTERVAL '30 days';
-- D3: Storage-Footprint
SELECT
pg_size_pretty(pg_total_relation_size('youtube_video_daily_metrics')) AS total_size,
(SELECT COUNT(*) FROM youtube_video_daily_metrics) AS total_rows;
Block E — Publishing-Analytics-Impact
-- E1: Channel-Verteilung im 180d-Fenster
WITH video_per_channel AS (
SELECT sp.id, sp.handle,
COUNT(*) FILTER (
WHERE yv.published_at >= NOW() - INTERVAL '180 days' AND yv.is_removed = false
) AS videos_180d
FROM social_profiles sp
JOIN youtube_videos yv ON yv.social_profile_id = sp.id
WHERE sp.platform = 'youtube' AND sp.tracking_enabled = true
GROUP BY sp.id, sp.handle
)
SELECT
CASE
WHEN videos_180d = 0 THEN '0_videos'
WHEN videos_180d BETWEEN 1 AND 5 THEN '1-5'
WHEN videos_180d BETWEEN 6 AND 20 THEN '6-20'
WHEN videos_180d BETWEEN 21 AND 50 THEN '21-50'
WHEN videos_180d BETWEEN 51 AND 100 THEN '51-100'
ELSE '100+'
END AS bucket,
COUNT(*) AS channels,
SUM(videos_180d) AS total_videos_in_bucket
FROM video_per_channel
GROUP BY bucket
ORDER BY bucket;
-- E2: Was wuerden wir bei light-Tier verlieren?
WITH light_candidates AS (
SELECT sp.id
FROM social_profiles sp
LEFT JOIN youtube_videos yv
ON yv.social_profile_id = sp.id
AND yv.published_at >= NOW() - INTERVAL '30 days'
AND yv.is_removed = false
WHERE sp.platform = 'youtube' AND sp.tracking_enabled = true
GROUP BY sp.id
HAVING COUNT(yv.video_id) = 0
)
SELECT
COUNT(DISTINCT yvdm.youtube_video_id) AS videos_we_would_stop_tracking,
COUNT(*) AS daily_snapshots_we_would_stop
FROM light_candidates lc
JOIN youtube_videos yv ON yv.social_profile_id = lc.id
JOIN youtube_video_daily_metrics yvdm ON yvdm.youtube_video_id = yv.video_id
WHERE yvdm.date >= CURRENT_DATE - INTERVAL '7 days';
-- E3: Snapshot-Freshness pro Video im 180d-Fenster
WITH video_recency AS (
SELECT yv.video_id, yv.social_profile_id,
MAX(yvdm.date) AS last_snapshot_date,
CURRENT_DATE - MAX(yvdm.date) AS days_since_snapshot
FROM youtube_videos yv
LEFT JOIN youtube_video_daily_metrics yvdm ON yvdm.youtube_video_id = yv.video_id
WHERE yv.published_at >= NOW() - INTERVAL '180 days' AND yv.is_removed = false
GROUP BY yv.video_id, yv.social_profile_id
)
SELECT
CASE
WHEN last_snapshot_date IS NULL THEN '0_no_snapshot'
WHEN days_since_snapshot <= 1 THEN '1_<=1d (frisch)'
WHEN days_since_snapshot <= 7 THEN '2_<=7d'
WHEN days_since_snapshot <= 30 THEN '3_<=30d (potenziell stale)'
ELSE '4_>30d (stale)'
END AS snapshot_freshness,
COUNT(*) AS videos
FROM video_recency
GROUP BY snapshot_freshness
ORDER BY snapshot_freshness;
Schluesselerkenntnisse fuer Plan 56 und Folgeplaene
-
PRO ↔ Per-Video-Sync ist 1:1. Plan 56 wirkt fast ausschliesslich auf den PRO-Pool von 21K Profilen — die 523K Free-Profile sind de facto schon
light(kein Per-Video-Sync). -
Quota-Einsparung im Per-Video-Pool ist marginal (~3.000 Snapshots/Tag potentielle Einsparung = vernachlaessigbar). Plan 56 sollte nicht primaer Quota sparen, sondern Coverage-Capacity fairer verteilen.
-
Echter Quota-Hebel liegt ausserhalb Plan 56: 227.311 Profile mit
zero_deltain 30d bekommen taeglichchannels.list(~227K Quota-Units/Tag). Throttling auf woechentlich = -194K Units/Tag. Gehoert in einen Schwester-Plan (Channel-Stats-Throttle). -
Coverage ist heute schon nicht-taeglich: avg 1,9 Snapshots pro Video pro 30 Tage. Plan-56-Tier-Definitionen muessen relative Prioritaet in der Coverage-Pipeline definieren, nicht absolute Frequenz.
-
Publishing-Analytics-Impact gering, wenn
light-Tier weiterhin woechentliche Top-Video-Snapshots macht. Aggregation laeuft ueber 180d-Window mit MAX(view_count) — robust gegen einzelne stale Werte.