HADI RAHJOU

hadirahjou.ir | rahjooh@gmail.com | 0912.6780521

Experience

Bigmining | Software Engineer (Go, Data, Cloud)

June 2025 – Present

  • Built a real-time data ingestion system streaming high-frequency cryptocurrency data into Iceberg tables on AWS S3, optimized for analytical workloads.
  • Provisioned and maintained an Amazon MSK Serverless (Kafka) cluster using Terraform, enabling scalable, event-driven data ingestion with automated infrastructure management and monitoring.
  • Developed a cross-exchange cryptocurrency trading bot in Go, integrating market data streams, strategy execution, and exchange APIs to support low-latency order routing and multi-venue arbitrage.
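
A minimal Python sketch of the fee-adjusted spread check behind the arbitrage bullet above (the production bot is written in Go; the venue quotes and fee figure here are illustrative assumptions):

```python
# Hedged sketch: pick the buy/sell venue pair with the largest
# fee-adjusted spread. Fee level and quote shape are illustrative.
def best_arbitrage(quotes, fee=0.001):
    """quotes: {venue: (bid, ask)}. Returns (buy_venue, sell_venue,
    net_spread) for the most profitable pair, or None if none is profitable."""
    best = None
    for buy_venue, (_, ask) in quotes.items():
        for sell_venue, (bid, _) in quotes.items():
            if buy_venue == sell_venue:
                continue
            # pay the ask (plus fee) on one venue, hit the bid (minus fee) on the other
            net = bid * (1 - fee) - ask * (1 + fee)
            if net > 0 and (best is None or net > best[2]):
                best = (buy_venue, sell_venue, net)
    return best
```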

Snapp | Data Engineer

April 2024 – June 2025

  • Led the migration of 50 TB of big-data infrastructure and 33 legacy Spark pipelines from Hadoop (YARN/HDFS) to Kubernetes and S3, enhancing scalability and efficiency.
  • Led Spark cluster setup and architected 50+ end-to-end Kafka sink and flattening pipelines, using custom Helm charts for development, Spark APIs for deployment, and Airflow operators for scheduling.
  • Managed standard deployments on OKD via CI/CD, Vault, and ArgoCD, and on VMs via Ansible.
  • Performed server maintenance, including upgrades, storage provisioning, and preparing bare metal servers to join OKD clusters.
  • Maintained AWS infrastructure incorporating key technologies like Kafka and ClickHouse.

Informatics Services Corporation | Full-Stack & Senior Data Engineer

Feb 2023 – April 2024

  • Built browser-based admin tools handling 200M+ daily transactions on cloud infrastructure.
  • Developed high-performance Go modules using concurrency and parallelism, cutting reporting time by 80%.
  • Designed Django microservices handling messaging, filtering, and auditing via RESTful middleware.
  • Created a Django-based dashboard for database management and system monitoring.

Soshyant Financial Tech | Senior Data Engineer

Feb 2022 – Jan 2023

  • Built Spark-based ETL pipelines in PySpark for processing billions of stock records.
  • Deployed on Hadoop and MS SQL, exposing results via Kafka and REST APIs.
  • Dockerized jobs and scheduled with Kubernetes; used Git, Jira, and Confluence for team collaboration.

Ayandeh Bank | Senior Data Engineer

May 2017 – Feb 2022

  • Customer 360 Dashboard: a large-scale data project bringing all account information, statistical reports, and analytical feedback into a single page.
    • Fetched historical data from different sources into the Hadoop cluster
    • Wrote a daily fetcher covering 16 different data sources
    • Implemented Customer 360 on PySpark
    • Staged daily results in PostgreSQL to serve the 360 Tableau dashboard

  • Customer Segmentation: a big-data project segmenting customers with the RFM model and a customized k-means clustering algorithm to serve the bank's apps and campaigns.
    • Built and maintained a Hadoop cluster
    • Built and maintained a Spark cluster
    • Conducted extensive R&D to select segmentation methods: clustering algorithms, number of clusters, and financial models
    • Implemented a customized k-means to produce domain-relevant segments
    • Staged daily results in SQL Server
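
A plain-Python sketch of the RFM scoring that feeds the segmentation project above (the production job ran on PySpark; the bin edges below are illustrative assumptions, not the bank's actual thresholds):

```python
# Hedged sketch: map raw recency/frequency/monetary values to 1-5
# scores (5 = best) before clustering. Bin edges are illustrative.
def rfm_score(recency_days, frequency, monetary,
              r_bins=(7, 30, 90, 180),
              f_bins=(2, 5, 10, 20),
              m_bins=(100, 500, 2000, 10000)):
    """Return an (R, F, M) score triple, each in 1..5."""
    def bucket(value, bins, reverse=False):
        score = 1
        for edge in bins:
            if value > edge:
                score += 1
        # low recency means recent activity, so that axis is reversed
        return 6 - score if reverse else score
    return (bucket(recency_days, r_bins, reverse=True),
            bucket(frequency, f_bins),
            bucket(monetary, m_bins))
```

The resulting score triples (or the underlying raw values, standardized) would then be the feature vectors passed to the k-means step.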

  • Churn Detection: a big-data dashboard to detect and predict customer churn.
    • Staged all customer transactional data in HDFS
    • Calculated monetary, frequency, and recency rates for each segment based on average customer and stock behavior
    • Calculated each customer's expected monetary and frequency values from their segment's rates via PySpark
    • Detected churned customers to serve the Tableau dashboard
    • Predicted customers at risk of churn with a logistic regression model on Spark
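
A plain-Python sketch of the segment-baseline churn flag described above (the production pipeline ran on PySpark; the field names and the 0.5 threshold are illustrative assumptions):

```python
# Hedged sketch: a customer is flagged as churned when their recent
# activity falls well below their segment's expected rates.
from dataclasses import dataclass
from statistics import mean

@dataclass
class Customer:
    id: str
    segment: str
    monetary: float   # recent spend
    frequency: int    # recent transaction count

def segment_baselines(customers):
    """Average monetary/frequency per segment (the 'expected' rates)."""
    by_seg = {}
    for c in customers:
        by_seg.setdefault(c.segment, []).append(c)
    return {seg: (mean(c.monetary for c in cs), mean(c.frequency for c in cs))
            for seg, cs in by_seg.items()}

def flag_churned(customers, ratio=0.5):
    """Flag customers below `ratio` of their segment's expected
    monetary AND frequency values."""
    base = segment_baselines(customers)
    churned = []
    for c in customers:
        exp_m, exp_f = base[c.segment]
        if c.monetary < ratio * exp_m and c.frequency < ratio * exp_f:
            churned.append(c.id)
    return churned
```

The prediction step would then train a classifier (logistic regression in the production job) on these flags to score at-risk customers.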

  • I/O Bank Resources: a statistical dashboard reporting all money the bank received and spent across all of its channels; a management dashboard whose users are the bank's managers.
    • Created a Java job to feed the resources table from several sources
    • Implemented highly complex aggregations to produce daily, monthly, and yearly results
    • Created a web application with Django and Plotly to automate report publishing
    • Implemented hierarchical access management for users

  • DBA Assistant: served as a backup database administrator of Ayandeh Bank.
    • Implemented an FTP listener in Python to catch and report late or missing routine files
    • Monitored SQL package statistics
    • Created and scheduled database backups and performed quarterly backup drills
    • Wrote complex queries for bank reconciliation

PEDEC | Software Engineer

Apr 2012 – Apr 2017

  • Implemented the Convert project, unifying Excel, Access, FoxPro, TXT, XML, and other files into Oracle DB
  • Created the PEDEC data policy to ensure all new data is generated in a usable form
  • Designed and implemented an Oracle DB cluster and ran CDC with Oracle GoldenGate
  • Designed and supported PEDEC's data-ticketing web tools with Java Spring
  • Maintained all PEDEC DB needs, including routine backups and grant administration

Education

University of Tehran

Master’s Degree in Computer Science (2016 – 2019)

Skills

  • Big Data: Spark, Hadoop, Kafka, Airflow, ClickHouse
  • Cloud: Terraform, Kubernetes / OKD, Helm, ArgoCD, Docker, AWS
  • Programming: Go, Python, SQL, Bash
  • Data Science: RFM, KMeans, Regression, Dashboards
  • Web & APIs: Gin, Django, FastAPI, Spring Boot