{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# Quantum machine learning on lower-dimensional single-cell RNAseq data\n",
"\n",
"## Introduction\n",
"\n",
"The general task of machine learning is to find and study patterns in data. Many machine learning algorithms map their input dataset to a higher dimensional feature space, through the use of a kernel function:\n",
"\n",
"$$\n",
"k(\\vec{x}_i, \\vec{x}_j) = \\langle f(\\vec{x}_i), f(\\vec{x}_j) \\rangle\n",
"$$\n",
"\n",
"where $k$ is the kernel function, $\\vec{x}_i, \\vec{x}_j$ are $n$ dimensional inputs, $f$ is a map from $n$-dimension to $m$-dimension space and $\\langle a,b \\rangle$ denotes the dot product. When considering finite data, a kernel function can be represented as a matrix:\n",
"\n",
"$$\n",
"\\textbf{K}_{ij} = k(\\vec{x}_i,\\vec{x}_j)\n",
"$$\n",
"\n",
"This notebook evaluates the following quantum machine learning models:\n",
"\n",
"* Quantum Support Vector Machine (QSVC) https://qiskit-community.github.io/qiskit-machine-learning/stubs/qiskit_machine_learning.algorithms.QSVC.html\n",
"* Pegasos QSVC: https://qiskit-community.github.io/qiskit-machine-learning/stubs/qiskit_machine_learning.algorithms.PegasosQSVC.html\n",
"* Neural Networks: https://qiskit-community.github.io/qiskit-machine-learning/stubs/qiskit_machine_learning.algorithms.NeuralNetworkClassifier.html\n",
"* Variational Quantum Classifier (VQC): https://qiskit-community.github.io/qiskit-machine-learning/stubs/qiskit_machine_learning.algorithms.VQC.html\n",
"\n",
"\n",
"It takes as input the lower dimensional embedding of the single-cell RNAseq data with eight dimension of the melanoma minimal residual diseases sample and predicts drug-administered melanoma v/s phase II of minimal residual disease. "
]
},
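{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick illustration of the kernel matrix defined above (a minimal sketch added for clarity, not part of the original analysis), the next cell computes $\\textbf{K}_{ij} = \\langle f(\\vec{x}_i), f(\\vec{x}_j) \\rangle$ for a few random points using an explicit toy feature map $f$; both the data and the polynomial map are placeholders."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Minimal sketch (illustrative only): build the kernel matrix K_ij = <f(x_i), f(x_j)>\n",
"# for an explicit feature map f, mirroring the definition in the introduction.\n",
"import numpy as np\n",
"\n",
"rng = np.random.default_rng(42)\n",
"X_demo = rng.normal(size=(5, 2))  # five 2-dimensional example inputs\n",
"\n",
"def f(x):\n",
"    # toy degree-2 polynomial feature map from 2 to 3 dimensions\n",
"    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])\n",
"\n",
"K_demo = np.array([[np.dot(f(xi), f(xj)) for xj in X_demo] for xi in X_demo])\n",
"print(K_demo.shape)  # (5, 5) symmetric Gram matrix"
]
},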
{
"cell_type": "code",
"execution_count": 11,
"metadata": {},
"outputs": [],
"source": [
"# ====== Base class imports ======\n",
"\n",
"import numpy as np\n",
"import pandas as pd\n",
"from glob import glob\n",
"import matplotlib\n",
"import os, time\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns \n",
"sns.set_style('dark')\n",
"\n",
"# ====== Scikit-learn imports ======\n",
"from sklearn.svm import SVC\n",
"from sklearn.metrics import (\n",
" auc,\n",
" roc_curve,\n",
" ConfusionMatrixDisplay,\n",
" f1_score,\n",
" balanced_accuracy_score,\n",
")\n",
"from sklearn.preprocessing import StandardScaler, LabelBinarizer\n",
"from sklearn.model_selection import train_test_split\n",
"\n",
"# ====== Qiskit imports ======\n",
"from qiskit.circuit.library import ZZFeatureMap, ZFeatureMap, PauliFeatureMap\n",
"from qiskit import QuantumCircuit\n",
"from qiskit_aer import AerSimulator\n",
"from qiskit_ibm_runtime import QiskitRuntimeService\n",
"from qiskit_algorithms.utils import algorithm_globals\n",
"from qiskit.primitives import Sampler\n",
"from qiskit_ibm_runtime.fake_provider import FakeManilaV2\n",
"from qiskit.circuit.library import RealAmplitudes\n",
"from qiskit_algorithms.state_fidelities import ComputeUncompute\n",
"from qiskit_machine_learning.kernels import FidelityQuantumKernel\n",
"from qiskit_machine_learning.algorithms import QSVC, PegasosQSVC\n",
"from qiskit_machine_learning.algorithms.classifiers import NeuralNetworkClassifier, VQC\n",
"from qiskit_machine_learning.circuit.library import QNNCircuit\n",
"from qiskit_machine_learning.neural_networks import EstimatorQNN, SamplerQNN\n",
"from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager\n",
"from qiskit_algorithms.optimizers import COBYLA\n",
"\n",
"## ====== Torch imports ======\n",
"import torch\n",
"import torch.nn as nn \n",
"import pytorch_lightning as pl \n",
"from torchmetrics.classification import F1Score\n",
"import torch.optim as optim\n",
"from lightning.pytorch.utilities.types import OptimizerLRScheduler\n",
"import torch.utils.data\n",
"from pytorch_lightning.loggers import TensorBoardLogger\n",
"import lightning, lightning.pytorch.loggers\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Load all checkpoints"
]
},
{
"cell_type": "code",
"execution_count": 12,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Number of checkpoint files: 4\n"
]
}
],
"source": [
"\n",
"all_checkpoints = []\n",
"for fname in glob('../SessionIII_MultiOmics/checkpoints/MRD/*.npy', recursive=True):\n",
" all_checkpoints.append(fname)\n",
"print(\"Number of checkpoint files: \", len(all_checkpoints))"
]
},
{
"cell_type": "code",
"execution_count": 13,
"metadata": {},
"outputs": [],
"source": [
"# QiskitRuntimeService.save_account(channel=\"ibm_quantum\", \n",
"# token=\"4e7791327b95d470dcaae699d5f0a3073486fa3b22bd16c64bf65c609244728719c4389f587cac08285bd28488c69d1c68fc54ecbf4f160f2bff19ef04112846\", \n",
"# set_as_default=True)\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Support Vector Classifier\n",
"\n",
"We use a Support Vector Classifier (SVC) as a baseline to compare quantum machine learning methods. It uses the kernel function defined above to kernelize the data. \n"
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"def compute_svc(X_train, y_train, X_test, y_test, c = 1):\n",
" beg_time = time.time()\n",
" svc = SVC(C=c)\n",
" svc_vanilla = svc.fit(X_train, y_train)\n",
" labels_vanilla = svc_vanilla.predict(X_test)\n",
" f1_svc = f1_score(y_test, labels_vanilla, average='micro')\n",
" \n",
" print(\"Time taken for SVC (secs): \", time.time() - beg_time)\n",
" print(\"F1 SVC: \", f1_svc)\n",
" \n",
" return f1_svc"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Quantum Support Vector Classifier\n",
"\n",
"In quantum machine learning, a quantum feature map, $\\phi(\\vec{x})$, maps a classical feature vector, $\\vec{x}$, to a quantum Hilbert space, $| \\phi(\\vec{x})\\rangle \\langle \\phi(\\vec{x})|$. The quantum feature map transforms $\\vec{x} \\rightarrow | \\phi(\\vec{x})\\rangle$ using a unitary transformation $\\vec{U_\\phi}(\\vec{x})$, which is typically a [parameterized quantum circuit](./parameterized-quantum-circuits).\n",
"\n",
"Constructing quantum feature maps based on parameterized quantum circuits that are hard to simulate classically is an important step towards possibly obtaining an advantage over classical machine learning approaches, and is an active area of current research.\n",
"\n",
"In Reference 1, the authors propose a family of quantum feature maps that are [conjectured](gloss:conjecture) to be hard to simulate classically, and can be implemented as [short-depth](gloss:short-depth) circuits on [near-term quantum devices](gloss:near-term-quantum-devices). Qiskit implements these as the [`PauliFeatureMap`](https://qiskit.org/documentation/stubs/qiskit.circuit.library.PauliFeatureMap.html). The quantum feature map of depth $d$ is implemented by the unitary operator:\n",
"\n",
"$$ \\mathcal{U}_{\\Phi(\\vec{x})}=\\prod_d U_{\\Phi(\\vec{x})}H^{\\otimes n},\\ U_{\\Phi(\\vec{x})}=\\exp\\left(i\\sum_{S\\subseteq[n]}\\phi_S(\\vec{x})\\prod_{k\\in S} P_i\\right)$$\n",
"\n",
"which contains layers of Hadamard gates interleaved with [entangling blocks](gloss:entangling-blocks),\n",
"\n",
"$U_{\\Phi(\\vec{x})}$, encoding the classical data as shown in circuit diagram below for $d=2$.\n",
"\n",
"\n",
"\n",
"Within the entangling blocks, $U_{\\Phi(\\vec{x})}$: $P_i \\in \\{ I, X, Y, Z \\}$ denotes the [Pauli matrices](gloss:pauli-matrices), the index $S$ describes connectivity between different qubits or data points: $S \\in \\{\\binom{n}{k}\\ \\mathrm{combinations,\\ }k = 1,... n \\}$, and by default the data mapping function $\\phi_S(\\vec{x})$ is\n",
"\n",
"$$\n",
"\\phi_S:\\vec{x} \\mapsto \\Bigg\\{\n",
" \\begin{array}{ll}\n",
" x_i & \\textit{if}\\ S=\\{i\\} \\\\\n",
" (\\pi-x_i)(\\pi-x_j) & \\textit{if}\\ S=\\{i,j\\}\n",
" \\end{array}\n",
"$$\n",
"\n",
"\n",
"when $k = 2, P_0 = Z, P_1 = ZZ$, this is the [`ZZFeatureMap`](https://qiskit.org/documentation/stubs/qiskit.circuit.library.ZZFeatureMap.html) in Qiskit:\n",
"\n",
"$${\\mathcal{U}_{\\Phi(\\vec{x})}} = \\left( \\exp\\left(i\\sum_{jk} \\phi_{\\{j,k\\}}(\\vec{x}) \\, Z_j \\otimes Z_k\\right) \\, \\exp\\left(i\\sum_j \\phi_{\\{j\\}}(\\vec{x}) \\, Z_j\\right) \\, H^{\\otimes n} \\right)^d$$\n",
"\n",
"A generic Quantum machine learning pipeline will look like the following:\n",
"\n",
""
]
},
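{
"cell_type": "markdown",
"metadata": {},
"source": [
"The standalone sketch below (added for illustration; the random input vector is only a placeholder) builds the 8-qubit `ZZFeatureMap` that `compute_QSVC` uses for the eight-dimensional embedding, assigns one example data point to its parameters, and reports the circuit size."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: inspect the data-encoding circuit used by the QSVC below.\n",
"# Assumption: 8 features, matching the embedding dimension of the checkpoints.\n",
"fmap_demo = ZZFeatureMap(feature_dimension=8, reps=2, entanglement='linear')\n",
"example_point = np.random.default_rng(0).uniform(0, np.pi, size=8)  # illustrative input\n",
"bound_demo = fmap_demo.assign_parameters(example_point)  # encode one data point\n",
"print('Qubits:', bound_demo.num_qubits, '| depth:', bound_demo.decompose().depth())\n",
"# bound_demo.decompose().draw('mpl')  # uncomment to visualize the circuit"
]
},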
{
"cell_type": "markdown",
"metadata": {},
"source": [
"For different options of simulators look at https://qiskit.github.io/qiskit-aer/tutorials/1_aersimulator.html#Aer-Simulator-Options"
]
},
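{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a quick check (a sketch; the method choice mirrors the one used in `compute_QSVC` below), the simulation methods supported by the installed Aer backend can be listed directly:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch: list the simulation methods supported by the installed AerSimulator.\n",
"sim = AerSimulator()\n",
"print(sim.available_methods())  # typically includes 'statevector', 'matrix_product_state', ...\n",
"sim_mps = AerSimulator(method='matrix_product_state')  # the method used in compute_QSVC\n",
"print(sim_mps.options.method)"
]
},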
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [],
"source": [
"\n",
" \n",
"def compute_QSVC(X_train, y_train, X_test, y_test, encoding='ZZ', c = 1, pegasos=False):\n",
" beg_time = time.time()\n",
" #service = QiskitRuntimeService(instance=\"accelerated-disc/internal/default\")\n",
" #backend = service.backend('ibm_torino') \n",
" service = QiskitRuntimeService() \n",
" backend = AerSimulator(method='matrix_product_state')\n",
" algorithm_globals.random_seed = 12345\n",
"\n",
" feature_map = None\n",
"\n",
" if encoding == 'ZZ' :\n",
" feature_map = ZZFeatureMap(feature_dimension=X_train.shape[1], \n",
" reps=2, \n",
" entanglement='linear')\n",
" #print(\"QSVC Circuit for ZZ Feature Map\")\n",
" #feature_map.draw('mpl')\n",
" else: \n",
" if encoding == 'Z': \n",
" feature_map = ZFeatureMap(feature_dimension=X_train.shape[1], \n",
" reps=2)\n",
" print(\"QSVC Circuit for ZZ Feature Map\")\n",
" #feature_map.draw('mpl')\n",
" if encoding == 'P': \n",
" feature_map = PauliFeatureMap(feature_dimension=X_train.shape[1], \n",
" reps=2, entanglement='linear')\n",
" print(\"QSVC Circuit for ZZ Feature Map\")\n",
" #feature_map.draw('mpl')\n",
"\n",
" sampler = Sampler() \n",
" fidelity = ComputeUncompute(sampler=sampler)\n",
" \n",
" Qkernel = FidelityQuantumKernel(fidelity=fidelity, feature_map=feature_map)\n",
" qsvc = QSVC(quantum_kernel=Qkernel, C=c)\n",
" qsvc_model = qsvc.fit(X_train,y_train)\n",
" labels_qsvc = qsvc_model.predict(X_test)\n",
" f1_qsvc = f1_score(y_test, labels_qsvc, average='micro')\n",
" \n",
" f1_peg_qsvc = 0\n",
" if pegasos == True: \n",
" peg_qsvc = PegasosQSVC(quantum_kernel=Qkernel, C=c)\n",
" peg_qsvc_model = peg_qsvc.fit(X_train, y_train)\n",
" labels_peg_qsvc = peg_qsvc_model.predict(X_test)\n",
" f1_peg_qsvc = f1_score(y_test, labels_peg_qsvc, average='micro')\n",
" \n",
" print(\"Time taken for QSVC (secs): \", time.time() - beg_time)\n",
" if pegasos == False: \n",
" print(\"F1 QSVC: \", f1_qsvc)\n",
" else: \n",
" print(\"F1 Pegasos QSVC: \", f1_peg_qsvc)\n",
" \n",
" return f1_qsvc,f1_peg_qsvc\n"
]
},
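{
"cell_type": "markdown",
"metadata": {},
"source": [
"`QSVC.fit` consumes the quantum kernel implicitly. To connect it back to the kernel matrix $\\textbf{K}_{ij}$ from the introduction, the hedged sketch below evaluates a small fidelity-based kernel matrix directly with `FidelityQuantumKernel.evaluate`; the four random eight-dimensional points only stand in for the real embeddings."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch (illustrative random data, not the MRD embeddings): evaluate the\n",
"# fidelity quantum kernel matrix K_ij explicitly for a handful of points.\n",
"demo_fmap = ZZFeatureMap(feature_dimension=8, reps=2, entanglement='linear')\n",
"demo_kernel = FidelityQuantumKernel(fidelity=ComputeUncompute(sampler=Sampler()),\n",
"                                    feature_map=demo_fmap)\n",
"X_small = np.random.default_rng(1).uniform(0, np.pi, size=(4, 8))  # placeholder points\n",
"K_small = demo_kernel.evaluate(x_vec=X_small)  # 4 x 4 symmetric kernel matrix\n",
"print(np.round(K_small, 3))  # diagonal entries are ~1 (each state overlaps fully with itself)"
]
},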
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Quantum Neural Networks"
]
},
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"def compute_QNN(X_train, y_train, X_test, y_test, primitive: str):\n",
" beg_time = time.time()\n",
" if primitive == 'estimator':\n",
" # construct QNN with the QNNCircuit's default ZZFeatureMap feature map and RealAmplitudes ansatz.\n",
" qc_qnn = QNNCircuit(num_qubits=X_train.shape[1])\n",
" #print(\"Estimator QNN Circuit\")\n",
" #qc_qnn.draw('mpl')\n",
" estimator_qnn = EstimatorQNN(circuit=qc_qnn)\n",
" # QNN maps inputs to [-1, +1]\n",
" estimator_qnn.forward(X_train[0, :], algorithm_globals.random.random(estimator_qnn.num_weights))\n",
" # construct neural network classifier\n",
" estimator_classifier = NeuralNetworkClassifier(estimator_qnn, optimizer=COBYLA(maxiter=100))\n",
" # fit classifier to data\n",
" estimator_classifier.fit(X_train, y_train)\n",
" f1_score_estimator_qnn = estimator_classifier.score(X_test, y_test)\n",
" print(\"Time taken for Sampler QNN (secs): \", time.time() - beg_time)\n",
" print(\"F1 Estimator QNN: \", f1_score_estimator_qnn)\n",
" \n",
" return f1_score_estimator_qnn\n",
" \n",
" if primitive == 'sampler':\n",
" # construct a quantum circuit from the default ZZFeatureMap feature map and a customized RealAmplitudes ansatz\n",
" qc_sampler = QNNCircuit(ansatz=RealAmplitudes(X_train.shape[1], reps=1))\n",
" #print(\"Sampler QNN Circuit\")\n",
" #qc_sampler.draw('mpl')\n",
" # parity maps bitstrings to 0 or 1\n",
" def parity(x):\n",
" return \"{:b}\".format(x).count(\"1\") % 2\n",
" output_shape = 2 # corresponds to the number of classes, possible outcomes of the (parity) mapping\n",
" # construct QNN\n",
" sampler_qnn = SamplerQNN(circuit=qc_sampler, interpret=parity,output_shape=output_shape,)\n",
" # construct classifier\n",
" sampler_classifier = NeuralNetworkClassifier(neural_network=sampler_qnn, optimizer=COBYLA(maxiter=100))\n",
" # fit classifier to data\n",
" sampler_classifier.fit(X_train, y_train)\n",
" f1_score_sampler_qnn = sampler_classifier.score(X_test, y_test)\n",
" \n",
" print(\"Time taken for Sampler QNN (secs): \", time.time() - beg_time)\n",
" print(\"F1 Sampler QNN: \", f1_score_sampler_qnn)\n",
" \n",
" return f1_score_sampler_qnn"
]
},
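{
"cell_type": "markdown",
"metadata": {},
"source": [
"The introduction also lists the Variational Quantum Classifier (VQC), which is imported above but not exercised in the benchmark loop below. The next cell is a hedged sketch of how a `compute_VQC` helper could mirror the other helpers; the feature map, ansatz, optimizer budget, and the use of plain accuracy via `score` are illustrative choices, not settings taken from the original experiment."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch of a VQC helper analogous to compute_svc / compute_QSVC / compute_QNN.\n",
"# Assumptions: ZZFeatureMap encoding, RealAmplitudes ansatz, COBYLA with 100 iterations.\n",
"def compute_VQC(X_train, y_train, X_test, y_test):\n",
"    beg_time = time.time()\n",
"    feature_map = ZZFeatureMap(feature_dimension=X_train.shape[1], reps=2, entanglement='linear')\n",
"    ansatz = RealAmplitudes(num_qubits=X_train.shape[1], reps=1)\n",
"    vqc = VQC(feature_map=feature_map,\n",
"              ansatz=ansatz,\n",
"              optimizer=COBYLA(maxiter=100),\n",
"              sampler=Sampler())\n",
"    vqc.fit(X_train, y_train)  # VQC one-hot encodes the labels internally\n",
"    score_vqc = vqc.score(X_test, y_test)  # mean accuracy on the test split\n",
"\n",
"    print(\"Time taken for VQC (secs): \", time.time() - beg_time)\n",
"    print(\"Accuracy VQC: \", score_vqc)\n",
"\n",
"    return score_vqc"
]
},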
{
"cell_type": "code",
"execution_count": 27,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Time taken for SVC (secs): 0.010152339935302734\n",
"F1 SVC: 0.6\n",
"Time taken for QSVC (secs): 218.56338953971863\n",
"F1 QSVC: 0.6\n",
"Estimator QNN Circuit\n",
"Time taken for Sampler QNN (secs): 162.06342673301697\n",
"F1 Estimator QNN: 0.5666666666666667\n",
"Sampler QNN Circuit\n",
"Time taken for Sampler QNN (secs): 144.42334818840027\n",
"F1 Sampler QNN: 0.5666666666666667\n"
]
}
],
"source": [
"results_dict = {}\n",
"for i in range(1):\n",
" matches = [x for x in all_checkpoints if \"iter\"+str(i)+\"_\" in x]\n",
" x_train = np.load([x for x in matches if \"train_embedding\" in x][0])\n",
" #print(x_train.shape)\n",
" x_test = np.load([x for x in matches if \"test_embedding\" in x][0])\n",
" #print(x_test.shape)\n",
" y_train = np.load([x for x in matches if \"train_labels\" in x][0])\n",
" #print(y_train.shape)\n",
" y_test = np.load([x for x in matches if \"test_labels\" in x][0])\n",
" #print(y_test.shape)\n",
"\n",
" \n",
" f1_svc = compute_svc(x_train,\n",
" y_train,\n",
" x_test,\n",
" y_test,\n",
" c=10)\n",
" \n",
" f1_qsvc, f1_peg_qsvc = compute_QSVC(x_train, \n",
" y_train, \n",
" x_test,\n",
" y_test,\n",
" c=10,\n",
" pegasos=0,\n",
" )\n",
" f1_est_qnn = compute_QNN(x_train, \n",
" y_train, \n",
" x_test, \n",
" y_test, \n",
" 'estimator')\n",
" \n",
" f1_sam_qnn = compute_QNN(x_train, \n",
" y_train, \n",
" x_test, \n",
" y_test, \n",
" 'sampler')\n",
" \n",
" results_dict[i] = [f1_svc, f1_qsvc, f1_peg_qsvc, f1_est_qnn, f1_sam_qnn]\n",
" \n",
"df = pd.DataFrame.from_dict(results_dict, orient='index')\n",
"df.to_csv('PerformanceMetrics.csv', \n",
" index=False, \n",
" header=['SVC', 'QSVC', 'PEG_QSVC', 'EST_QNN', 'SAM_QNN'])"
]
},
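{
"cell_type": "markdown",
"metadata": {},
"source": [
"Seaborn and matplotlib are imported above but not used. As an optional follow-up (a sketch, assuming `PerformanceMetrics.csv` was written by the previous cell), the scores can be reshaped and plotted for a quick visual comparison of the classifiers."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Optional sketch: visualize the scores written to PerformanceMetrics.csv.\n",
"scores = pd.read_csv('PerformanceMetrics.csv')\n",
"scores_long = scores.melt(var_name='Model', value_name='Score')  # one row per (model, score)\n",
"plt.figure(figsize=(6, 3))\n",
"sns.barplot(data=scores_long, x='Model', y='Score')\n",
"plt.ylim(0, 1)\n",
"plt.title('Classifier performance on the MRD embedding')\n",
"plt.tight_layout()\n",
"plt.show()"
]
},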
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Results for predicting T0 vs MRD Phase 2 \n",
""
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.12"
}
},
"nbformat": 4,
"nbformat_minor": 2
}