Notes from the Firehose #1 — Three kinds of hard

It's been five months since I started my new role as a data scientist at Walmart. I'm still going through the usual early stages of a new job, and the usual challenges that come with them. I wanted to share some thoughts I've been having lately.

So far I've come to divide the skills and challenges of the job into three types: technical, methodological, and soft skills.

Technical skills are what you learn at uni: models, training techniques, train/test splits, metrics, and so on. It's writing code. For me this is the most natural to learn, and given a strong academic background I expect to get good at it with time. I see two challenges here: catching up on the NLP "prerequisites" I never formally learned, since I don't come from the field, and learning well, meaning deeply, consistently, and in a way that actually sticks in a work environment.

Methodological skills are how you organize and track experiments and curate data. How to run the "data flywheel" — bootstrapping models from manual annotations to prompt engineering to synthetic data to fine-tuning. How to calibrate judges. How to sample from production, and which "production distribution" you actually care about when you measure something. What belongs in a good report.
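
To make "calibrating judges" a bit more concrete, here is a minimal sketch of one way I read it: take a small set of examples that a human has labeled, run the automated judge on the same examples, and check how often they agree beyond what chance would give you (Cohen's kappa). The labels below are made up for illustration, and the function name is my own, not from any particular library.

```python
from collections import Counter


def cohen_kappa(human, judge):
    """Chance-corrected agreement between two lists of labels."""
    if len(human) != len(judge) or not human:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(human)
    # Fraction of examples where the judge matches the human label.
    observed = sum(h == j for h, j in zip(human, judge)) / n
    # Agreement expected if both labeled at random with their own label rates.
    human_counts = Counter(human)
    judge_counts = Counter(judge)
    expected = sum(
        human_counts[label] * judge_counts[label] for label in human_counts
    ) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical label; agreement is trivially perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical labels for eight production samples.
human_labels = ["good", "good", "bad", "bad", "good", "bad", "good", "bad"]
judge_labels = ["good", "good", "bad", "good", "good", "bad", "bad", "bad"]

kappa = cohen_kappa(human_labels, judge_labels)
print(f"kappa = {kappa:.2f}")
```

Raw agreement here is 6 out of 8, but because both label sets are split evenly, half of that would be expected by chance, which is why kappa comes out lower than the raw number. What counts as "calibrated enough" is a judgment call that depends on the task.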

Then we get to soft skills, the bane of my existence. How to manage yourself and plan your tasks. When to think deeply and when to go with what you have. How to avoid rabbit holes. When to ask for help, how to communicate and stay transparent, when to raise a flag, how to lead a project, how to draw on other people's help and expertise. The challenge here is twofold: these skills are fuzzy and take time and experience to develop, and they're tangled up with what I call "tendencies", the habits you bring with you. For example, I'm used to doing things on my own, which is great for independent learning and less great when it means I ask for help or raise a flag long after I should have.

You can think of this as a pyramid: each level builds on the one below it. But right now it seems to me the top is the most crucial, because without it you can't reliably deliver. Worth noting that AI agents mostly chew through the bottom level — they can write the code, do the visualizations, implement the algorithms — which only pushes the real challenge upward. But that's another post.

So how do you get better at all this? The bottom level you just study — and if you're aiming for a data scientist role, you're probably already good at studying. The trouble is the top two. Methodological knowledge is diffuse: there's a lot of it, it's not deep, it's just knowing the trade. A lot of it is scattered across blogs, though some books (Machine Learning Engineering, for one) gather much of the wisdom in one place. I find there's little point trying to learn it in advance — it's hard to anchor without an experience that forced you to use it. Soft skills are similar but worse: fuzzy, not as talked about, and learned mostly by getting them wrong.

So what's the learning mechanism? What stops you from making the same mistakes again and again — besides feeling bad when you screw up? I think one answer is writing, or at least, that's what I want to try out. Writing fosters reflection. It focuses you and produces something, countering that feeling of reading endlessly without retaining — it gives you the sense that you own a piece of the firehose of information you're supposed to know, and that you might even be useful to others.

So that's what's been going on. I'm aiming for a weekly post (again), covering interesting topics and questions I run into in my daily work, and hopefully some insights — either from experience or distilled from things I read.

See you next week!
