fizz.today

Data engineering is archaeology

Some of the datasets feeding modern quant research pipelines still originate on mainframes.

One vendor we integrated with had a login chain that went: Airflow DAG hits vendor portal, portal authenticates to a credential broker, broker initiates a mainframe export. Four systems deep before you touch a row of data.

If you mistyped the portal password, the mainframe account locked. If you mistyped it twice, the portal account locked too. And only one person in the firm had permission to reset both — a person who kept banker’s hours in a timezone three hours behind the team running the DAG.

The retry logic in the DAG was straightforward. The human retry logic was not. You don’t page someone at 6 AM because your Airflow task got a 401. You wait, you escalate through the right channel, and you hope the data lands before the morning research window closes.
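In practice, "straightforward" retry logic here means treating auth failures as terminal rather than retryable, so a stale password can't burn through the lockout threshold on its own. A minimal sketch of that policy — the function name and status-code sets are illustrative, not from the vendor's actual API:

```python
# Retry policy sketch: retry transient failures, never retry auth failures.
# A 401 retried twice is exactly how the portal and mainframe accounts
# both end up locked before a human sees the alert.

RETRYABLE = {429, 500, 502, 503, 504}  # rate limits and transient server errors
TERMINAL = {401, 403}                  # auth errors: retrying risks account lockout

def should_retry(status_code: int, attempt: int, max_attempts: int = 3) -> bool:
    """Decide whether a failed export call is worth another attempt."""
    if status_code in TERMINAL:
        return False  # stop and page/escalate; do not re-submit the credential
    if status_code in RETRYABLE:
        return attempt < max_attempts
    return False
```

In Airflow terms, the same idea shows up as setting `retries=0` on the authentication step and letting a separate alerting path handle the human escalation.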

Nobody designs a system like this. It accumulates. The mainframe was there first. The portal was bolted on when the vendor added a web layer. The credential broker was added when they needed SSO. Each layer made sense at the time, and the aggregate made sense to nobody.

Alt-data engineering isn’t just building cloud pipelines. It’s excavating the credential ceremonies of systems that predate your entire stack — and writing DAGs polite enough not to trigger the deadlocks buried in them.

#data-engineering #war-stories