Sourcing
There are three legitimate sources of fine-tuning data: production logs (highest signal, highest PII risk), human-curated examples (high cost, low volume), and synthetic data generated by a frontier model (cheap, but prone to low diversity). Most enterprise fine-tunes blend all three. Document each source's licensing posture before ingest.
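
One way to make the "document before ingest" rule enforceable is a small source manifest checked ahead of any data load. The sketch below is illustrative only: the names `SourceKind`, `SourceRecord`, and `ingest_blockers`, the example source names, and the licensing notes are all hypothetical, and the PII-review flag for production logs is an added assumption rather than something the text prescribes.

```python
from dataclasses import dataclass
from enum import Enum


class SourceKind(Enum):
    # The three source types named in this section.
    PRODUCTION_LOGS = "production_logs"
    HUMAN_CURATED = "human_curated"
    SYNTHETIC = "synthetic"


@dataclass(frozen=True)
class SourceRecord:
    name: str
    kind: SourceKind
    licensing_posture: str      # free-text note on what governs reuse
    pii_reviewed: bool = False  # assumption: a review flag for log data


def ingest_blockers(record: SourceRecord) -> list[str]:
    """Return the reasons this source should not be ingested yet."""
    blockers = []
    if not record.licensing_posture.strip():
        blockers.append("licensing posture not documented")
    if record.kind is SourceKind.PRODUCTION_LOGS and not record.pii_reviewed:
        blockers.append("production logs need a PII review")
    return blockers


# Hypothetical manifest covering one source of each kind.
manifest = [
    SourceRecord("support_chat_logs", SourceKind.PRODUCTION_LOGS,
                 "internal data; customer terms permit model training",
                 pii_reviewed=True),
    SourceRecord("expert_written_answers", SourceKind.HUMAN_CURATED,
                 "work for hire, owned outright"),
    SourceRecord("frontier_model_paraphrases", SourceKind.SYNTHETIC, ""),
]

for record in manifest:
    problems = ingest_blockers(record)
    status = "ok" if not problems else "blocked: " + "; ".join(problems)
    print(f"{record.name}: {status}")
```

Keeping the posture as free text is a deliberate simplification; a real pipeline would likely replace it with whatever structured field its legal review process already produces.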