Project Overview

Introduction:

This project was carried out as part of the initiative “Développement d’une interface géospatiale intelligente du panneau 8 de la mine de phosphate à Benguerir” (Development of an intelligent geospatial interface for Panel 8 of the Benguerir phosphate mine). The goal was to design a prototype web application that integrates geological drilling data, provides interactive geospatial visualization, and leverages machine learning to predict phosphate concentration from location and depth.

Objectives:

  • Centralize and serve geological drilling data through a backend API.

  • Develop a predictive machine learning model to estimate phosphate grade.

  • Develop an interactive web interface that enables users to visualize drilling data and generate predictions dynamically.

Project Architecture:

The following diagram shows the project architecture and its main components:

The project follows a modern client–server architecture, combining a RESTful backend with a reactive single-page frontend. The main components are:

Backend (API Layer)

  • Framework: FastAPI running on Uvicorn (ASGI) for high-performance asynchronous request handling.

  • Database: PostgreSQL with SQLAlchemy ORM for data persistence and query abstraction.

  • Validation & Typing: Pydantic ensures strict request/response validation and schema typing.

  • Data Handling: pandas is used for data ingestion, preprocessing, and cleaning.

  • Machine Learning Service:

    • Built with a pre-trained scikit-learn pipeline composed of:

      • PolynomialFeatures

      • StandardScaler

      • PCA

      • KNeighborsRegressor

    • The model is serialized and loaded via joblib.

  • Configuration: Managed with dotenv for environment variables.

  • CORS: Enabled to allow local and cross-origin frontend communication.
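
A rough sketch of how these pieces wire together at startup is shown below; the module layout, artifact path, and variable names are assumptions for illustration, not the project’s actual source:

import os

import joblib
from dotenv import load_dotenv
from fastapi import FastAPI
from fastapi.middleware.cors import CORSMiddleware
from sqlalchemy import create_engine

load_dotenv()  # read DATABASE_URL (and friends) from .env
engine = create_engine(os.environ["DATABASE_URL"])

app = FastAPI(title="Smart Mining Panel API")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"],  # permissive CORS for local/cross-origin frontend calls
    allow_methods=["*"],
    allow_headers=["*"],
)

# Load the pre-trained scikit-learn pipeline (poly → scaler → PCA → KNN) once
# at startup; a single-artifact variant is assumed here, while the four-file
# variant is sketched in the Machine Learning Model section below.
model = joblib.load("models/pipeline.pkl")  # artifact name assumed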

Example endpoint calls:

  • To get the data from the PostgreSQL database:

curl -s http://127.0.0.1:8000/data

  • To get a prediction for a given (X, Y, Z):

curl -s -X POST "http://127.0.0.1:8000/predict" ^
  -H "Content-Type: application/json" ^
  -d "{\"x_coord\": -6.55, \"y_coord\": 33.42, \"z_coord\": 150.0}"

Frontend (User Interface)

  • Framework: Built as a React single-page application (SPA).

  • API Communication: Uses the native fetch API to interact with backend endpoints.

  • Geospatial Visualization:

    • Leaflet.js powers the interactive map interface.

    • Drilling locations are displayed as markers.

    • Prediction results are dynamically updated on the map.

This project adopts a three-tier architecture, ensuring clear separation of concerns and scalability:

  • Presentation Layer (Frontend): A React-based single-page application (SPA) that provides an interactive user interface, including map visualization with Leaflet and user input for prediction requests.

  • Application Layer (Backend): A FastAPI service responsible for request handling, business logic, and ML model inference. It exposes RESTful endpoints for data retrieval and phosphate concentration prediction.

  • Data Layer (Database): A PostgreSQL database, accessed via SQLAlchemy ORM, that stores drilling data and serves as the reliable source for backend queries and analytics.

The following sequence diagram explains what happens when a user enters X, Y, and Z to get a teneur (grade) prediction:

  • Startup:

    • Load configuration and connect to the database.

    • Ensure tables exist.

    • Load the ML pipeline (poly → scaler → PCA → KNN) into memory.

  • Optional: Data insert (CSV):

    • User uploads a CSV.

    • Server parses, validates, and inserts rows into the database (see the ingest sketch after this list).

    • UI refreshes and shows new points on the map.

  • Prediction:

    • User inputs X, Y, Z, and submits.

    • Server validates inputs and assembles features.

    • Pipeline transforms inputs and runs KNN to predict teneur.

    • The server returns the predicted value and the model name.

  • UI:

    • Displays the prediction.

    • Drops a marker at the specified coordinates and centers the map on that location.
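
To make the optional CSV-ingest step concrete, here is a hypothetical /ingest handler continuing the backend sketch above; the Forage ORM model and SessionLocal session factory are assumptions that mirror the forages table described later (x_coord, y_coord, z_coord, teneur):

from io import BytesIO

import pandas as pd
from fastapi import File, HTTPException, UploadFile

REQUIRED = ["x_coord", "y_coord", "z_coord", "teneur"]

@app.post("/ingest")
async def ingest(file: UploadFile = File(...)):
    df = pd.read_csv(BytesIO(await file.read()))
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise HTTPException(status_code=422, detail=f"missing columns: {missing}")
    with SessionLocal() as session:  # assumed SQLAlchemy session factory
        session.add_all(Forage(**row) for row in df[REQUIRED].to_dict("records"))
        session.commit()
    return {"inserted": len(df)}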

The final user interface is presented as follows:

Machine Learning Model:

For the ML Model, we selected and trained a diverse set of models on the Forages dataset to predict the target variable, "Teneur (%)". The models included Linear Regression, Random Forest Regressor, K-Nearest Neighbors (KNN) Regressor, XGBoost Regressor, and Support Vector Regressor (SVR).

These models were chosen to cover a range of simple, ensemble-based, and non-linear approaches. Linear Regression served as a baseline model, while Random Forest and XGBoost leveraged ensemble techniques to capture complex patterns.

We recorded the results of training these models on the Forages dataset, using a 70% train / 20% test / 10% validation split.
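
One way to realize the 70/20/10 split with scikit-learn is sketched below; the CSV path and column names are assumptions based on the forages table described later:

import pandas as pd
from sklearn.model_selection import train_test_split

df = pd.read_csv("forages.csv")  # dataset path assumed
X = df[["x_coord", "y_coord", "z_coord"]]
y = df["teneur"]

# 70% train first, then split the remaining 30% into 20% test / 10% validation
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.30, random_state=42)
X_test, X_val, y_test, y_val = train_test_split(X_rest, y_rest, test_size=1/3, random_state=42)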

Based on the performance plots, the KNeighborsRegressor model achieved the lowest Mean Squared Error (MSE) of 0.1313, compared to 0.1355 for the Random Forest model.

To improve performance, we applied hyperparameter tuning via grid search, which yielded a modest ~1% improvement for both models.
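
A minimal sketch of this grid-search step, continuing the split sketch above; the parameter grid below is an assumption, not the project’s actual search space:

from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

grid = GridSearchCV(
    KNeighborsRegressor(),
    param_grid={"n_neighbors": [3, 5, 7, 9, 11],
                "weights": ["uniform", "distance"]},
    scoring="neg_mean_squared_error",
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_, -grid.best_score_)  # best cross-validated MSE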

Additionally, we introduced Principal Component Analysis (PCA) as a dimensionality reduction technique to uncover hidden patterns in the drilling dataset. After applying PCA, the models produced the following results:

The PCA technique led to an approximate 1% performance improvement for both the KNeighborsRegressor and Random Forest models.

We selected the KNeighborsRegressor model, as it delivered the best performance (R² ≈ 0.50, MSE ≈ 0.1218) among the evaluated models, and adopted it as the core of the machine learning pipeline used in the application.

  • Learning Curve: The training and cross-validation curves show a steady decrease in Mean Squared Error (MSE) as the training set size increases. The gap between the two narrows with more data, indicating improved generalization. The final cross-validation error stabilizes around 0.13–0.14, confirming good predictive accuracy.

  • Residual Plot: The residuals are centered around zero, showing that predictions are generally unbiased. The spread increases slightly for higher predicted values, but no strong patterns or systematic errors are visible, which suggests that the model captures the underlying data well.

  • ROC Curve: Although ROC is primarily for classification, it was applied here by thresholding regression outputs. The model achieved an AUC of 0.72, reflecting a moderate ability to distinguish between higher and lower phosphate values. The curve lies above the chance line, confirming meaningful predictive power beyond random guessing.
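
Continuing the sketches above, the thresholding idea can be reproduced roughly as follows; the median cut-off is an assumption, since the report does not state which threshold was used:

import numpy as np
from sklearn.metrics import roc_auc_score

threshold = np.median(y_test)                  # assumed cut-off between "high" and "low" grade
y_true_bin = (y_test > threshold).astype(int)  # binarize the true teneur values
scores = grid.best_estimator_.predict(X_test)  # raw regression outputs used as scores
print(f"AUC = {roc_auc_score(y_true_bin, scores):.2f}")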

We integrated the KNeighborsRegressor into the ML pipeline used in the application as follows (a serving sketch follows this list):

  • Inputs

    • Features: x_coord, y_coord, z_coord (3 numeric inputs representing position/depth).

    • At inference time, the backend builds a DataFrame with columns X, Y, Z and passes it through the pipeline.

  • Stages

    • PolynomialFeatures:

      • Expands the original 3 features into a richer set of non-linear combinations (e.g., squares, cross-terms).

      • Purpose: Let simple learners capture non-linear relationships between spatial coordinates and grade.

    • StandardScaler:

      • Standardizes each expanded feature to zero mean and unit variance.

      • Purpose: Normalize scales so downstream PCA and KNN are not dominated by larger-magnitude features.

    • PCA (Principal Component Analysis):

      • Projects standardized features into a lower-dimensional subspace while preserving most variance.

      • Purpose: Reduce noise, mitigate multicollinearity from polynomial expansion, and improve KNN efficiency and stability.

    • KNeighborsRegressor:

      • Predicts the target (teneur) by averaging values of the nearest neighbors in the PCA-transformed space.

      • Purpose: Non-parametric local regression that benefits from the denoised, standardized feature space.

  • Artifacts and serving:

    • The fitted transformers and regressor are stored as serialized artifacts (poly_transform.pkl, scaler.pkl, pca_transform.pkl, knn_model.pkl).

    • On API startup, these are loaded into memory.

    • During prediction:

      • Input → PolynomialFeatures → StandardScaler → PCA → KNN → predicted_teneur returned to the client.
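
The serving path above can be sketched as follows; the artifact file names come from the list, while the helper function and directory layout are illustrative:

import joblib
import pandas as pd

# Load the fitted transformers and regressor once at startup
poly = joblib.load("models/poly_transform.pkl")
scaler = joblib.load("models/scaler.pkl")
pca = joblib.load("models/pca_transform.pkl")
knn = joblib.load("models/knn_model.pkl")

def predict_teneur(x: float, y: float, z: float) -> float:
    # Input → PolynomialFeatures → StandardScaler → PCA → KNN
    features = pd.DataFrame([[x, y, z]], columns=["X", "Y", "Z"])
    reduced = pca.transform(scaler.transform(poly.transform(features)))
    return float(knn.predict(reduced)[0])

print(predict_teneur(-6.55, 33.42, 150.0))  # coordinates from the curl example above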

Application Deployment on AWS (EC2):

We deployed the Smart Mining Panel on AWS EC2 behind Nginx with HTTPS. The React frontend is built and served as static assets by Nginx at https://smartmining-16-16-120-47.sslip.io/, while API requests are routed through the same origin under /api and reverse-proxied to the FastAPI backend running on Uvicorn (127.0.0.1:8000), managed by a systemd service.

TLS certificates are provisioned and auto-renewed by Let’s Encrypt (Certbot). The backend reads its database connection from the repository’s .env via DATABASE_URL and uses SQLAlchemy to connect to PostgreSQL.

This setup provides a secure, production-grade deployment with simple redeploy steps: rebuild the frontend and reload Nginx for UI changes, or restart the systemd service for backend updates.
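
A minimal sketch of the Nginx server block implied by this setup; filesystem paths and certificate locations are assumptions (Certbot defaults), and only the static root and the /api proxy are shown:

server {
    listen 443 ssl;
    server_name smartmining-16-16-120-47.sslip.io;

    # TLS managed by Certbot (default Let's Encrypt paths assumed)
    ssl_certificate     /etc/letsencrypt/live/smartmining-16-16-120-47.sslip.io/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/smartmining-16-16-120-47.sslip.io/privkey.pem;

    # Built React frontend served as static assets (path assumed)
    root /var/www/smartmining;
    index index.html;
    location / {
        try_files $uri /index.html;
    }

    # Same-origin /api requests reverse-proxied to FastAPI/Uvicorn;
    # the trailing slash strips the /api prefix before forwarding
    location /api/ {
        proxy_pass http://127.0.0.1:8000/;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}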

Here's a step-by-step walkthrough and a sequence diagram:

  • Page load (UI)

    • The browser loads the built React SPA (static assets) from Nginx over HTTPS.

  • API proxying (Nginx)

    • Any request under /api (e.g., /api/data, /api/ingest, /api/predict) is reverse-proxied by Nginx to FastAPI at 127.0.0.1:8000.

    • TLS is handled by Nginx; certificates are managed by Certbot.

  • Backend service (FastAPI/Uvicorn)

    • Uvicorn runs backend.main:app under a systemd service (smartmining-backend.service); a unit-file sketch follows this list.

    • On startup, the app:

      • Creates DB tables via SQLAlchemy.

      • Loads ML artifacts from models/ (poly, scaler, pca, knn).

    • DB access uses SQLAlchemy via backend/db.py, reading DATABASE_URL from .env.

  • Database (PostgreSQL)

    • forages table stores ingested rows: id, x_coord, y_coord, z_coord, teneur.

    • Queries for listing and prediction run via SQLAlchemy sessions.
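
As referenced above, a unit-file sketch for smartmining-backend.service; every path below is an assumption, only the application module (backend.main:app) and the bind address come from the deployment description:

[Unit]
Description=Smart Mining Panel FastAPI backend (Uvicorn)
After=network.target postgresql.service

[Service]
# Working directory, virtualenv, and .env location assumed for illustration
WorkingDirectory=/home/ubuntu/smartmining
EnvironmentFile=/home/ubuntu/smartmining/.env
ExecStart=/home/ubuntu/smartmining/.venv/bin/uvicorn backend.main:app --host 127.0.0.1 --port 8000
Restart=on-failure

[Install]
WantedBy=multi-user.target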
