Hi there, I am Nhlakanipho Ngubo, a Data Engineer

Profile picture of Nhlakanipho Ngubo

About Me

Passionate about building scalable ETL pipelines and transforming raw data into actionable insights, I specialize in web scraping, data wrangling, and database integration. Using Pandas, I clean and structure messy datasets, enabling reliable workflows and performance predictions. Whether containerizing MongoDB with Docker, crafting REST APIs with Flask and SQLAlchemy, or optimizing data pipelines, I thrive on solving complex data challenges. Rooted in Agile methodologies, I bring curiosity, innovation, and a growth mindset, and I am ready to drive smarter, data-driven decisions.

Latest Projects

ETL: Top Banks

Bank building Image

Specifications

  • Programming Language:
    • Python
  • Database:
    • SQLite
  • Development Environment:
    • Jupyter Notebook
  • Frameworks / Libraries:
    • Pandas
    • NumPy
    • BeautifulSoup
    • Requests

Description

The Top Banks ETL pipeline automates financial data extraction, transformation, and storage. It scrapes bank rankings, converts market cap values, adds currency conversions (GBP, EUR, INR), and loads the refined data into a CSV file and SQLite database for seamless querying. Designed for accuracy and scalability, it transforms raw data into actionable insights for efficient analysis and informed decision-making.
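The transform and load steps described above can be sketched roughly as follows. This is a minimal, standard-library-only illustration, not the project's actual code (which lists Pandas): the exchange rates, column names, table name, and sample rows are all hypothetical placeholders, and the scraping step is left out so the example runs offline.

```python
import csv
import sqlite3

# Hypothetical USD -> target exchange rates; placeholders, not real quotes.
RATES = {"GBP": 0.8, "EUR": 0.9, "INR": 80.0}


def transform(rows, rates):
    """Add one rounded market-cap column per target currency."""
    records = []
    for name, mc_usd in rows:
        record = {"Name": name, "MC_USD_Billion": mc_usd}
        for code, rate in rates.items():
            record[f"MC_{code}_Billion"] = round(mc_usd * rate, 2)
        records.append(record)
    return records


def load_csv(records, handle):
    """Write the records to an open text file handle as CSV."""
    writer = csv.DictWriter(handle, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)


def load_sqlite(records, conn, table="banks"):
    """Create the table if needed and insert every record."""
    columns = list(records[0])
    types = {"Name": "TEXT"}  # every other column holds a number
    col_defs = ", ".join('"%s" %s' % (c, types.get(c, "REAL")) for c in columns)
    conn.execute('CREATE TABLE IF NOT EXISTS "%s" (%s)' % (table, col_defs))
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        'INSERT INTO "%s" VALUES (%s)' % (table, placeholders),
        [tuple(record[c] for c in columns) for record in records],
    )
    conn.commit()
```

Calling `transform([("Bank A", 100.0)], RATES)` and passing the result to `load_sqlite` with `sqlite3.connect(":memory:")` exercises the whole path without touching disk.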

Flask: CompTrack API

CompTrack API Image

Specifications

  • Programming Language:
    • Python
  • Database:
    • SQLite
  • Frameworks / Libraries:
    • Unittest
    • Flask
    • Flask-SQLAlchemy

Description

CompTrack API streamlines the collection and organization of computer specifications, enabling efficient data management and integration into analytics pipelines. Built for precision and scalability, it transforms raw hardware data into actionable insights, empowering smarter resource management and continuous innovation.

MongoDB: Visitor Admin

Visitor Admin Image

Specifications

  • Programming Language:
    • Python
  • Database:
    • MongoDB
  • Containerization:
    • Docker
  • Frameworks / Libraries:
    • Unittest
    • PyMongo
    • Bson
    • Mongomock

Description

Visitor Admin securely captures and stores visitor credentials in a robust MongoDB database. It ensures efficient data collection, organized storage, and seamless retrieval, empowering organizations to manage critical data reliably and confidently. Designed for scalability, it lays the foundation for advanced workflows.

Pandas: Data Wrangling

Data Wrangling Image

Specifications

  • Programming Language:
    • Python
  • Development Environment:
    • Jupyter Notebook
  • Frameworks / Libraries:
    • Pandas

Description

Data Wrangling transforms messy datasets into clean, structured formats using Pandas, ensuring reliable workflows. By analyzing learners' personality scores and department choices, it identifies "High risk" and "Low risk" learners, enabling performance predictions and guiding proactive actions for mismatches. This process turns raw data into actionable insights, driving smarter decisions.
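One way such a risk label could be derived is sketched below. The rule here is an assumption made for illustration only: a learner is flagged "High risk" when the department they chose differs from the department where their personality score is highest. The field names and the rule itself are hypothetical, and the sketch uses plain Python so it runs without dependencies; in Pandas the same idea would amount to comparing two columns.

```python
def best_fit_department(scores):
    """Return the department with the highest score.

    Ties are broken alphabetically so the result is deterministic.
    """
    return min(scores, key=lambda dept: (-scores[dept], dept))


def label_risk(learner):
    """Label a learner by comparing their choice with their best-fit department.

    Assumed rule (hypothetical): a mismatch means "High risk".
    """
    fit = best_fit_department(learner["scores"])
    return "Low risk" if learner["chosen"] == fit else "High risk"


def label_all(learners):
    """Return a copy of each learner record with a 'risk' field added."""
    return [dict(learner, risk=label_risk(learner)) for learner in learners]
```

A record such as `{"chosen": "Web", "scores": {"Data": 80, "Web": 60}}` would come back labelled "High risk" under this assumed rule, because the highest score points to a different department.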

PostgreSQL: Shop Database

Shop Database Image

Specifications

  • Query Language:
    • SQL
  • Database:
    • PostgreSQL
  • Containerization:
    • Docker

Description

The Shop Database is a scalable relational database built with PostgreSQL, designed to streamline data management through efficient modeling and optimized querying. It ensures data integrity with primary and foreign keys, supports analytics workflows and ETL processes, and applies clean coding practices for maintainability and scalability.
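The role of primary and foreign keys in protecting data integrity can be illustrated with a small sketch. It swaps in Python's built-in sqlite3 module in place of PostgreSQL so it runs anywhere, and the `customers` and `orders` tables are hypothetical, not the project's real schema.

```python
import sqlite3


def build_shop(conn):
    """Create two illustrative tables linked by a foreign key."""
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE customers (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id    INTEGER PRIMARY KEY,
            customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
            total       REAL NOT NULL
        );
        """
    )


def try_insert_order(conn, order_id, customer_id, total):
    """Insert an order; return False if a key constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO orders VALUES (?, ?, ?)",
                (order_id, customer_id, total),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

With this setup, an order that points at a customer who does not exist is rejected by the database itself, and so is a second order that reuses an existing `order_id`.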

Volunteering

Data Entry And Educational Content Management | Umuzi Academy | April 2025 - Present

Data Entry Image

Transferred and organized learning materials from Google Drive to Google Classroom to support remote education delivery. Wrote clear and concise task headings and descriptions to enhance learner comprehension and navigation. Ensured that content uploads were accurate, timely, and aligned with course structure.

Key Contributions

  • Maintained consistent file organization standards to reduce educator workload.
  • Developed descriptive summaries for assignments, improving clarity and learner engagement.
  • Supported educators in streamlining course content distribution across digital platforms.

Technologies I Use

Visual Studio Code

GitHub

HTML

CSS

RabbitMQ

Git

Certificates

Umuzi Academy | National Certificate: Business Analysis Support Practice NQF Level 5

Business Analysis Support Practice Certificate

This certificate brought together analytical thinking and creative problem-solving to research technical challenges and develop effective solutions within consulting environments. I translate business needs into clear, actionable specifications and troubleshoot with precision. Adaptable communication skills and a structured approach support collaborative Data Engineering work, including optimizing pipelines and integrating solutions for meaningful results.

IBM Certificate: Python for Data Science, AI, and Development

Python for Data Science IBM Certificate

Built a strong foundation in Python with a focus on critical data structures, programming logic, and core libraries used in Data Science workflows. Applied these skills to manipulate and analyze datasets and to develop basic data-driven applications. Gained hands-on experience with essential tools such as Pandas and NumPy, reinforcing my readiness to contribute to data engineering tasks such as data wrangling, transformation, and integration within ETL pipelines.

IBM Certificate: Python Project for Data Engineering | Top Banks

Python project for Data Engineering IBM Certificate

Developed a data pipeline for banking sector analysis, extracting financial data via APIs and web scraping. Transformed datasets across formats, applied structured logging for ETL tracking, and prepared analysis-ready data for repository loading. This project showcases practical Python-based Data Engineering skills, scalability in pipeline design, and initiative in applying industry-relevant techniques.

Additional Skills

Reviewed 111+ Pull Requests, ensuring high coding standards.

Completed 46+ projects, demonstrating expertise in scalable data solutions.

Solved 100+ problems across multiple coding platforms, sharpening problem-solving skills.

Experienced in Agile workflows, leading peer learning through POD sessions, and creating clear documentation for seamless project onboarding.

Contact Me

Looking for ETL expertise, database integration, or automation? Let's connect and create impactful data solutions!

nhlakanipho.ngubo@umuzi.org

mpilongubo07@gmail.com

LinkedIn Profile:
