Version: 2.0
🕑 Estimated time for completion

This section takes about 5 minutes to complete.

Data Science Data Requirements

Where an Analyst will often (though not always) ask for very specific or aggregated data, an ML Engineer will more than likely ask for a subset of data that is "as raw as possible". What they mean by this is that they want non-aggregated data, as close to the source as possible. Some standardisation and harmonisation is ok, but it is important to document every transformation that has taken place, so that ML Engineers can tell whether a step has obscured a significant pattern that would otherwise have appeared in their work.
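To make the difference concrete, here is a minimal sketch in plain Python. The records, field names, and the helper functions `aggregate_mean` and `standardise` are all hypothetical and invented for illustration; they are not part of any real pipeline. The sketch shows two customers whose aggregated values are identical even though their raw events look very different, and a light standardisation step that keeps every row and records what it changed.

```python
from statistics import mean

# Hypothetical raw records: one row per event, as close to the source as possible.
raw_events = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 12.0},
    {"customer": "a", "amount": 500.0},  # unusual event, visible only in raw data
    {"customer": "b", "amount": 174.0},
    {"customer": "b", "amount": 174.0},
    {"customer": "b", "amount": 174.0},
]


def aggregate_mean(events):
    """Collapse events to one mean amount per customer (an aggregated view)."""
    by_customer = {}
    for event in events:
        by_customer.setdefault(event["customer"], []).append(event["amount"])
    return {customer: mean(amounts) for customer, amounts in by_customer.items()}


def standardise(events):
    """Light harmonisation that keeps every row and logs each transformation."""
    log = [
        "customer: converted to upper case",
        "amount: rounded to 2 decimal places",
    ]
    cleaned = [
        {"customer": e["customer"].upper(), "amount": round(e["amount"], 2)}
        for e in events
    ]
    return cleaned, log


aggregated = aggregate_mean(raw_events)
cleaned, transformation_log = standardise(raw_events)

print(aggregated)          # both customers share the same mean
print(len(cleaned))        # row count is unchanged by standardisation
print(transformation_log)  # hand this over together with the data
```

In the aggregated view both customers have the same mean, so the 500.0 event for customer "a" is no longer visible. The standardised view still has all six rows, and the log tells the ML Engineer exactly what was changed.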

Key takeaway: heavily curated data is less useful to ML Engineers, so it's important not to over-aggregate your data.

To understand these requirements better, check out the next section (bonus), which covers the kind of work that ML Engineers do!