by3nrique/KernelEmbedding

Learning Reconstructive Embeddings in Reproducing Kernel Hilbert Spaces via the Representer Theorem

Motivated by the growing interest in representation learning approaches that uncover the latent structure of high-dimensional data, this project proposes new algorithms for reconstruction-based manifold learning within Reproducing Kernel Hilbert Spaces (RKHS). Each observation is first reconstructed as a linear combination of the other samples in the RKHS by optimizing a vector form of the Representer Theorem that captures their autorepresentation property. A separable operator-valued kernel extends the formulation to vector-valued data while retaining the simplicity of a single scalar similarity function. A subsequent kernel-alignment objective projects the data into a lower-dimensional latent space whose Gram matrix aims to match the high-dimensional reconstruction kernel, thus transferring the auto-reconstruction geometry of the RKHS to the embedding. The proposed algorithms therefore offer a principled approach to the autorepresentation property exhibited by many natural datasets, using and adapting well-known results from Kernel Learning Theory. Numerical experiments on both simulated (concentric circles and Swiss roll) and real (cancer molecular activity and IoT network intrusions) datasets provide empirical evidence of the practical effectiveness of the proposed approach.
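
As a rough illustration of the two stages described above, the following NumPy sketch reconstructs each sample from the others in an RBF-induced RKHS and then builds a low-dimensional embedding whose Gram matrix is compared with a reconstruction Gram matrix through kernel alignment. It is not the implementation in ke_toolbox: the function names, the ridge-regularized closed form, the choice of `A @ K @ A.T` as the "reconstruction kernel", and the eigendecomposition used in place of an iterative alignment optimizer are all assumptions made for this example.

```python
import numpy as np


def rbf_kernel(X, gamma=1.0):
    """Gram matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_norms = np.sum(X**2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * X @ X.T
    return np.exp(-gamma * np.maximum(sq_dists, 0.0))


def self_reconstruction_weights(K, reg=1e-2):
    """Weights that rebuild each phi(x_i) from the other samples (assumed ridge form).

    For each i this minimizes
        ||phi(x_i) - sum_{j != i} A[i, j] phi(x_j)||^2 + reg * ||A[i]||^2,
    whose closed form is A[i, others] = (K_oo + reg * I)^{-1} K[others, i].
    """
    n = K.shape[0]
    A = np.zeros((n, n))
    for i in range(n):
        others = np.delete(np.arange(n), i)
        K_oo = K[np.ix_(others, others)]
        A[i, others] = np.linalg.solve(K_oo + reg * np.eye(n - 1), K[others, i])
    return A


def reconstruction_errors(K, A):
    """Squared RKHS error per sample: K_ii - 2 a_i^T k_i + a_i^T K a_i."""
    return np.diag(K) - 2.0 * np.sum(A * K, axis=1) + np.sum((A @ K) * A, axis=1)


def kernel_alignment(K1, K2):
    """Normalized Frobenius inner product between two Gram matrices."""
    return np.sum(K1 * K2) / (np.linalg.norm(K1) * np.linalg.norm(K2))


def spectral_embedding(K_target, dim=2):
    """Embedding Z whose linear Gram matrix Z Z^T is the top-`dim` spectral part of K_target."""
    eigvals, eigvecs = np.linalg.eigh((K_target + K_target.T) / 2.0)
    top = np.argsort(eigvals)[::-1][:dim]
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


# Toy data: two concentric circles, 30 points each (illustrative only).
rng = np.random.default_rng(0)
angles = rng.uniform(0.0, 2.0 * np.pi, size=60)
radii = np.repeat([1.0, 3.0], 30)
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])

K = rbf_kernel(X, gamma=0.5)
A = self_reconstruction_weights(K, reg=1e-2)
errors = reconstruction_errors(K, A)

# Assumed "reconstruction kernel": Gram matrix of the reconstructed feature vectors.
K_rec = A @ K @ A.T
Z = spectral_embedding(K_rec, dim=2)
score = kernel_alignment(Z @ Z.T, K_rec)

print("mean reconstruction error:", errors.mean())
print("alignment between embedding Gram and K_rec:", score)
```

The alignment score lies in (0, 1] here, with values near 1 indicating that the two-dimensional Gram matrix retains most of the structure of the target. The notebooks in this repository remain the reference for the procedure actually used in the paper.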

This repository contains the supplementary code for the paper "Learning Reconstructive Embeddings in Reproducing Kernel Hilbert Spaces via the Representer Theorem".

Citation

If you use this code in your research, please cite:

@article{feito_casares_2025a,
  author        = {Feito-Casares, Enrique and Melgarejo Meseguer, Francisco Manuel and Rojo-{\'A}lvarez, Jos{\'e} Luis},
  journal       = {IEEE Open Journal of the Computer Society},
  title         = {{Learning Reconstructive Embeddings in Reproducing Kernel Hilbert Spaces via the Representer Theorem}},
  year          = {2026},
  month         = apr,
  volume        = {7},
  ISSN          = {2644-1268},
  pages         = {659-669},
  doi           = {10.1109/OJCS.2026.3682462}
}

@online{feito_casares_2025b,
  title        = {Learning Reconstructive Embeddings in Reproducing Kernel Hilbert Spaces via the Representer Theorem (Supplementary Code)},
  author       = {Feito-Casares, Enrique and Melgarejo Meseguer, Francisco Manuel and Rojo-{\'A}lvarez, Jos{\'e} Luis},
  year         = {2025},
  month        = aug,
  publisher    = {Zenodo},
  doi          = {10.5281/zenodo.16812805},
  url          = {https://doi.org/10.5281/zenodo.16812805}
}

Project Structure

KernelEmbedding/
├── README.md                  # This file
├── Experiments_A_B_C.ipynb    # Experiments A: Concentric Circles, B: Swiss Roll, C: Cancer Biomolecules
├── Experiment_D.ipynb         # Experiment D: IoT Network Intrusions (CIC-IoT-2023)
├── requirements.txt           # Python dependencies
├── data/                      # Dataset storage directory
│   ├── swiss_roll.mat         # Synthetic Swiss Roll data
│   ├── CiCIOT/                # CIC-IoT-2023 dataset (network intrusions)
│   └── NCI-CANCER/            # NCI60 cancer biomolecule dataset
└── ke_toolbox/                # Kernel Embedding toolbox
    ├── __init__.py
    ├── dataset.py             # Data loading and preprocessing functions
    ├── kernels.py             # Kernel function implementations and utilities
    ├── main.py                # Main functions and optimization pipeline
    ├── optimization.py        # RKHS reconstruction optimization algorithms
    ├── requirements.txt       # Python dependencies
    └── utils.py               # Synthetic data generation and device management

Usage

  1. Install dependencies:

    pip install -r requirements.txt

  2. Run the experiments: open and run the notebooks Experiments_A_B_C.ipynb or Experiment_D.ipynb in Jupyter to reproduce the experiments from the paper.

Data References

The following datasets were used in the experiments:

NCI60 Dataset

The NCI60 dataset contains 4547 drug candidates with their cancer inhibition potentials against 60 cell line targets. The raw data are provided by the NCI's Division of Cancer Treatment and Diagnosis (DCTD), which makes a wide range of data available to the scientific community.

References:

CICIoT2023 Dataset

Citation: Neto, E.C.P.; Dadkhah, S.; Ferreira, R.; Zohourian, A.; Lu, R.; Ghorbani, A.A. CICIoT2023: A Real-Time Dataset and Benchmark for Large-Scale Attacks in IoT Environment. Sensors 2023, 23, 5941. https://doi.org/10.3390/s23135941

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

This work was supported in part by the CyberFold Project, funded by the European Union through the NextGenerationEU instrument (Recovery, Transformation, and Resilience Plan); in part by Instituto Nacional de Ciberseguridad de España (INCIBE), under Grant ETD202300129; and in part by the Autonomous Community of Madrid (ELLIS Madrid Node), under Project PID2022-140786NB-C32 (LATENTIA), Project AIA2025-163540-C31 (EmbedWorld), and Project PID2023-152331OA-I00 (HERMES) from the Spanish Ministry of Science and Innovation under Grant AEI/10.13039/501100011033.

With the collaboration of URJC.
