KI_LAB / machine-learning-services
Commit eed5e315, authored Jun 26, 2024 by Konrad Firley
Delete notebook_2.ipynb -> Not needed
Parent: 62e255fa
Insurance/Insurance Fraud detection/notebook_2.ipynb
0 additions, 1503 deletions
deleted
100644 → 0
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"id": "58fa7892",
"metadata": {
"editable": true,
"slideshow": {
"slide_type": ""
},
"tags": []
},
"source": [
"# 4. Modeling"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "409bf0ce",
"metadata": {},
"source": [
"## 4.1 Importing the relevant modules"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "ce52edf1",
"metadata": {},
"outputs": [],
"source": [
"import pandas as pd\n",
"import numpy as np\n",
"import matplotlib.pyplot as plt\n",
"import seaborn as sns\n",
"\n",
"from sklearn.preprocessing import StandardScaler\n",
"from sklearn.model_selection import train_test_split\n",
"from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, precision_score, recall_score\n",
"from sklearn.linear_model import LogisticRegression\n",
"from sklearn.tree import DecisionTreeClassifier\n",
"from sklearn.ensemble import RandomForestClassifier\n",
"from sklearn.svm import SVC\n",
"\n",
"sns.set()\n",
"\n",
"%matplotlib inline"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "2e9db144",
"metadata": {},
"source": [
"## 4.2 Loading the data"
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "83961ee0",
"metadata": {},
"outputs": [],
"source": [
"data = pd.read_csv('dataset_dummies.csv') # file is generated in notebook_1"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "72081129",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>policy_csl_250/500</th>\n",
" <th>policy_csl_500/1000</th>\n",
" <th>insured_sex_MALE</th>\n",
" <th>insured_education_level_College</th>\n",
" <th>insured_education_level_High School</th>\n",
" <th>insured_education_level_JD</th>\n",
" <th>insured_education_level_MD</th>\n",
" <th>insured_education_level_Masters</th>\n",
" <th>insured_education_level_PhD</th>\n",
" <th>insured_occupation_armed-forces</th>\n",
" <th>...</th>\n",
" <th>capital-gains</th>\n",
" <th>capital-loss</th>\n",
" <th>number_of_vehicles_involved</th>\n",
" <th>bodily_injuries</th>\n",
" <th>witnesses</th>\n",
" <th>injury_claim</th>\n",
" <th>property_claim</th>\n",
" <th>vehicle_claim</th>\n",
" <th>fraud_reported</th>\n",
" <th>pct_paid_insurance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>53300</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>6510</td>\n",
" <td>13020</td>\n",
" <td>52080</td>\n",
" <td>1</td>\n",
" <td>0.986035</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>780</td>\n",
" <td>780</td>\n",
" <td>3510</td>\n",
" <td>1</td>\n",
" <td>0.605523</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>35100</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>2</td>\n",
" <td>3</td>\n",
" <td>7700</td>\n",
" <td>3850</td>\n",
" <td>23100</td>\n",
" <td>0</td>\n",
" <td>0.942280</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>48900</td>\n",
" <td>-62400</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>6340</td>\n",
" <td>6340</td>\n",
" <td>50720</td>\n",
" <td>1</td>\n",
" <td>0.968454</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>66000</td>\n",
" <td>-46000</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1300</td>\n",
" <td>650</td>\n",
" <td>4550</td>\n",
" <td>0</td>\n",
" <td>0.846154</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 74 columns</p>\n",
"</div>"
],
"text/plain": [
" policy_csl_250/500 policy_csl_500/1000 insured_sex_MALE \\\n",
"0 1 0 1 \n",
"1 1 0 1 \n",
"2 0 0 0 \n",
"3 1 0 0 \n",
"4 0 1 1 \n",
"\n",
" insured_education_level_College insured_education_level_High School \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" insured_education_level_JD insured_education_level_MD \\\n",
"0 0 1 \n",
"1 0 1 \n",
"2 0 0 \n",
"3 0 0 \n",
"4 0 0 \n",
"\n",
" insured_education_level_Masters insured_education_level_PhD \\\n",
"0 0 0 \n",
"1 0 0 \n",
"2 0 1 \n",
"3 0 1 \n",
"4 0 0 \n",
"\n",
" insured_occupation_armed-forces ... capital-gains capital-loss \\\n",
"0 0 ... 53300 0 \n",
"1 0 ... 0 0 \n",
"2 0 ... 35100 0 \n",
"3 1 ... 48900 -62400 \n",
"4 0 ... 66000 -46000 \n",
"\n",
" number_of_vehicles_involved bodily_injuries witnesses injury_claim \\\n",
"0 1 1 2 6510 \n",
"1 1 0 0 780 \n",
"2 3 2 3 7700 \n",
"3 1 1 2 6340 \n",
"4 1 0 1 1300 \n",
"\n",
" property_claim vehicle_claim fraud_reported pct_paid_insurance \n",
"0 13020 52080 1 0.986035 \n",
"1 780 3510 1 0.605523 \n",
"2 3850 23100 0 0.942280 \n",
"3 6340 50720 1 0.968454 \n",
"4 650 4550 0 0.846154 \n",
"\n",
"[5 rows x 74 columns]"
]
},
"execution_count": 4,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"data.head()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c139e7ce",
"metadata": {},
"source": [
"## 4.3 Data preparation for modeling"
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "a5c7e329",
"metadata": {},
"outputs": [],
"source": [
"target = data.fraud_reported\n",
"features = data.drop('fraud_reported', axis=1)"
]
},
{
"cell_type": "code",
"execution_count": 6,
"id": "bf93a421",
"metadata": {},
"outputs": [],
"source": [
"# Split the data into training and test sets\n",
"x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=365)"
]
},
{
"cell_type": "code",
"execution_count": 7,
"id": "92201392",
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>policy_csl_250/500</th>\n",
" <th>policy_csl_500/1000</th>\n",
" <th>insured_sex_MALE</th>\n",
" <th>insured_education_level_College</th>\n",
" <th>insured_education_level_High School</th>\n",
" <th>insured_education_level_JD</th>\n",
" <th>insured_education_level_MD</th>\n",
" <th>insured_education_level_Masters</th>\n",
" <th>insured_education_level_PhD</th>\n",
" <th>insured_occupation_armed-forces</th>\n",
" <th>...</th>\n",
" <th>umbrella_limit</th>\n",
" <th>capital-gains</th>\n",
" <th>capital-loss</th>\n",
" <th>number_of_vehicles_involved</th>\n",
" <th>bodily_injuries</th>\n",
" <th>witnesses</th>\n",
" <th>injury_claim</th>\n",
" <th>property_claim</th>\n",
" <th>vehicle_claim</th>\n",
" <th>pct_paid_insurance</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>908</th>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>52600</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>500</td>\n",
" <td>500</td>\n",
" <td>4500</td>\n",
" <td>0.636364</td>\n",
" </tr>\n",
" <tr>\n",
" <th>591</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>7270</td>\n",
" <td>21810</td>\n",
" <td>50890</td>\n",
" <td>0.993748</td>\n",
" </tr>\n",
" <tr>\n",
" <th>836</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>52100</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>21330</td>\n",
" <td>7110</td>\n",
" <td>56880</td>\n",
" <td>0.988279</td>\n",
" </tr>\n",
" <tr>\n",
" <th>145</th>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>-57900</td>\n",
" <td>1</td>\n",
" <td>2</td>\n",
" <td>1</td>\n",
" <td>7640</td>\n",
" <td>15280</td>\n",
" <td>76400</td>\n",
" <td>0.994966</td>\n",
" </tr>\n",
" <tr>\n",
" <th>606</th>\n",
" <td>0</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>-66200</td>\n",
" <td>1</td>\n",
" <td>0</td>\n",
" <td>3</td>\n",
" <td>5750</td>\n",
" <td>5750</td>\n",
" <td>46000</td>\n",
" <td>0.982609</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 73 columns</p>\n",
"</div>"
],
"text/plain": [
" policy_csl_250/500 policy_csl_500/1000 insured_sex_MALE \\\n",
"908 1 0 1 \n",
"591 0 1 0 \n",
"836 0 0 0 \n",
"145 0 0 0 \n",
"606 0 1 0 \n",
"\n",
" insured_education_level_College insured_education_level_High School \\\n",
"908 0 0 \n",
"591 0 0 \n",
"836 0 0 \n",
"145 0 0 \n",
"606 0 0 \n",
"\n",
" insured_education_level_JD insured_education_level_MD \\\n",
"908 0 1 \n",
"591 0 0 \n",
"836 1 0 \n",
"145 0 0 \n",
"606 0 0 \n",
"\n",
" insured_education_level_Masters insured_education_level_PhD \\\n",
"908 0 0 \n",
"591 0 0 \n",
"836 0 0 \n",
"145 0 0 \n",
"606 0 0 \n",
"\n",
" insured_occupation_armed-forces ... umbrella_limit capital-gains \\\n",
"908 0 ... 0 52600 \n",
"591 1 ... 0 0 \n",
"836 0 ... 0 52100 \n",
"145 0 ... 0 0 \n",
"606 0 ... 0 0 \n",
"\n",
" capital-loss number_of_vehicles_involved bodily_injuries witnesses \\\n",
"908 0 1 1 0 \n",
"591 0 1 2 1 \n",
"836 0 1 0 1 \n",
"145 -57900 1 2 1 \n",
"606 -66200 1 0 3 \n",
"\n",
" injury_claim property_claim vehicle_claim pct_paid_insurance \n",
"908 500 500 4500 0.636364 \n",
"591 7270 21810 50890 0.993748 \n",
"836 21330 7110 56880 0.988279 \n",
"145 7640 15280 76400 0.994966 \n",
"606 5750 5750 46000 0.982609 \n",
"\n",
"[5 rows x 73 columns]"
]
},
"execution_count": 7,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"x_train.head()"
]
},
{
"cell_type": "code",
"execution_count": 8,
"id": "20040790",
"metadata": {},
"outputs": [],
"source": [
"# Scale data\n",
"scaler = StandardScaler()\n",
"scaler.fit(x_train)\n",
"\n",
"x_train = scaler.transform(x_train)\n",
"x_test = scaler.transform(x_test)"
]
},
{
"cell_type": "code",
"execution_count": 9,
"id": "64342d4d",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"0 596\n",
"1 204\n",
"Name: fraud_reported, dtype: int64"
]
},
"execution_count": 9,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# distribution of target in train data\n",
"y_train.value_counts()"
]
},
{
"cell_type": "code",
"execution_count": 10,
"id": "5ee57584",
"metadata": {
"tags": []
},
"outputs": [
{
"data": {
"text/plain": [
"0 157\n",
"1 43\n",
"Name: fraud_reported, dtype: int64"
]
},
"execution_count": 10,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# distribution of target in test data\n",
"y_test.value_counts()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "73297e79",
"metadata": {},
"source": [
"## 4.4 Modeling and evaluation"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "c9b07ea0",
"metadata": {},
"source": [
"### 4.4.1 Logistic regression"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "6b330394",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"LogisticRegression()"
]
},
"execution_count": 11,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"logreg = LogisticRegression()\n",
"logreg.fit(x_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 12,
"id": "fa3ef374",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.92 0.93 0.92 596\n",
" 1 0.79 0.75 0.77 204\n",
"\n",
" accuracy 0.89 800\n",
" macro avg 0.85 0.84 0.85 800\n",
"weighted avg 0.88 0.89 0.88 800\n",
"\n"
]
}
],
"source": [
"# train data\n",
"print(classification_report(y_train, logreg.predict(x_train)))"
]
},
{
"cell_type": "code",
"execution_count": 13,
"id": "86ea265b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 88.5\n",
"Precision: 78.8659793814433\n",
"Recall: 75.0\n"
]
}
],
"source": [
"# train data\n",
"print('Accuracy:', accuracy_score(y_train, logreg.predict(x_train))*100)\n",
"print('Precision:', precision_score(y_train, logreg.predict(x_train))*100)\n",
"print('Recall:', recall_score(y_train, logreg.predict(x_train))*100)"
]
},
{
"cell_type": "code",
"execution_count": 14,
"id": "172d0b95",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.91 0.91 0.91 157\n",
" 1 0.67 0.65 0.66 43\n",
"\n",
" accuracy 0.85 200\n",
" macro avg 0.79 0.78 0.78 200\n",
"weighted avg 0.85 0.85 0.85 200\n",
"\n"
]
}
],
"source": [
"# test data\n",
"print(classification_report(y_test, logreg.predict(x_test)))"
]
},
{
"cell_type": "code",
"execution_count": 15,
"id": "8a1e6fd9",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 85.5\n",
"Precision: 66.66666666666666\n",
"Recall: 65.11627906976744\n"
]
}
],
"source": [
"# test data\n",
"print('Accuracy:', accuracy_score(y_test, logreg.predict(x_test))*100)\n",
"print('Precision:', precision_score(y_test, logreg.predict(x_test))*100)\n",
"print('Recall:', recall_score(y_test, logreg.predict(x_test))*100)"
]
},
{
"cell_type": "code",
"execution_count": 16,
"id": "c48c8e65",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"143 14 15 28\n"
]
}
],
"source": [
"tn, fp, fn, tp = confusion_matrix(y_test, logreg.predict(x_test)).ravel() \n",
"print(tn, fp, fn, tp)"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "584079bf",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"cm = confusion_matrix(y_test, logreg.predict(x_test))\n",
"sns.heatmap(cm, annot=True, cmap='terrain', fmt='g')\n",
"plt.xlabel('Predicted data')\n",
"plt.ylabel('Real data')\n",
"plt.show()"
]
},
{
"cell_type": "code",
"execution_count": 18,
"id": "f96ea8be",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([-1.87199801])"
]
},
"execution_count": 18,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"logreg.intercept_"
]
},
{
"cell_type": "code",
"execution_count": 19,
"id": "2950b467",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([[ 5.80290586e-02, -2.37429629e-01, -3.18975936e-02,\n",
" 1.03264901e-01, 1.57395846e-02, 1.54412249e-01,\n",
" 9.98550038e-02, 1.01734974e-01, 1.65847423e-01,\n",
" 7.04430787e-02, 2.21983720e-01, 2.75034627e-01,\n",
" 6.08030524e-02, -8.57181314e-02, 6.39780307e-02,\n",
" -1.53624306e-01, -4.61802683e-02, 1.51000581e-01,\n",
" -5.51566389e-02, 5.90062519e-02, -8.83889618e-03,\n",
" -8.20083809e-03, -1.42216723e-01, 1.90073855e-02,\n",
" -2.16204990e-01, -4.62653155e-01, 7.95035036e-01,\n",
" 5.66640168e-01, -3.05200290e-01, -2.31887634e-01,\n",
" -8.43891138e-02, -9.43983125e-02, -3.03973994e-01,\n",
" -9.70096772e-02, -2.00177795e-01, -5.26322421e-02,\n",
" -1.90789105e-02, -1.19299001e-01, -2.22788639e-01,\n",
" 8.18201612e-03, 2.03631238e-01, 3.87350193e-01,\n",
" 4.07693675e-01, -2.79170918e-02, 2.25397475e-01,\n",
" 1.85538456e-01, -1.95996214e-01, -1.87436352e-01,\n",
" -2.87031385e-01, 1.41595895e-01, -1.11191352e-01,\n",
" -1.74200227e+00, -1.66185713e+00, -1.21492255e+00,\n",
" 1.24360304e-01, 1.64944689e-01, 2.58711282e-01,\n",
" 3.23758889e-01, 1.30618426e-03, -1.22778321e-01,\n",
" -1.01884582e-02, 5.54932851e-02, 5.82757827e-02,\n",
" 3.61968828e-01, -1.20848644e-01, -2.42543595e-01,\n",
" -1.93472781e-01, 8.04217694e-02, 5.86457472e-02,\n",
" -1.71694568e-01, -3.64878309e-02, 1.53029355e-01,\n",
" 3.50253316e-03]])"
]
},
"execution_count": 19,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"logreg.coef_"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "5c21a230",
"metadata": {},
"source": [
"### 4.4.2 Decision tree"
]
},
{
"cell_type": "code",
"execution_count": 20,
"id": "3c2caa28",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"DecisionTreeClassifier()"
]
},
"execution_count": 20,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"tree = DecisionTreeClassifier()\n",
"tree.fit(x_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 21,
"id": "d5102a31",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 596\n",
" 1 1.00 1.00 1.00 204\n",
"\n",
" accuracy 1.00 800\n",
" macro avg 1.00 1.00 1.00 800\n",
"weighted avg 1.00 1.00 1.00 800\n",
"\n"
]
}
],
"source": [
"# train data\n",
"print(classification_report(y_train, tree.predict(x_train)))"
]
},
{
"cell_type": "code",
"execution_count": 22,
"id": "dc4b1761",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 100.0\n",
"Precision: 100.0\n",
"Recall: 100.0\n"
]
}
],
"source": [
"# train data\n",
"print('Accuracy:', accuracy_score(y_train, tree.predict(x_train))*100)\n",
"print('Precision:', precision_score(y_train, tree.predict(x_train))*100)\n",
"print('Recall:', recall_score(y_train, tree.predict(x_train))*100)"
]
},
{
"cell_type": "code",
"execution_count": 23,
"id": "fff555ef",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.83 0.85 0.84 157\n",
" 1 0.39 0.35 0.37 43\n",
"\n",
" accuracy 0.74 200\n",
" macro avg 0.61 0.60 0.61 200\n",
"weighted avg 0.73 0.74 0.74 200\n",
"\n"
]
}
],
"source": [
"# test data\n",
"print(classification_report(y_test, tree.predict(x_test)))"
]
},
{
"cell_type": "code",
"execution_count": 24,
"id": "afdad593",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 74.5\n",
"Precision: 39.473684210526315\n",
"Recall: 34.883720930232556\n"
]
}
],
"source": [
"# test data\n",
"print('Accuracy:', accuracy_score(y_test, tree.predict(x_test))*100)\n",
"print('Precision:', precision_score(y_test, tree.predict(x_test))*100)\n",
"print('Recall:', recall_score(y_test, tree.predict(x_test))*100)"
]
},
{
"cell_type": "code",
"execution_count": 25,
"id": "44062e93",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"cm = confusion_matrix(y_test, tree.predict(x_test))\n",
"sns.heatmap(cm, annot=True, cmap='terrain', fmt='g')\n",
"plt.xlabel('Predicted data')\n",
"plt.ylabel('Real data')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "e867e082",
"metadata": {},
"source": [
"### 4.4.3 Random Forest"
]
},
{
"cell_type": "code",
"execution_count": 26,
"id": "fbd0cd13",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"RandomForestClassifier()"
]
},
"execution_count": 26,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"forest = RandomForestClassifier()\n",
"forest.fit(x_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 27,
"id": "37a0c75e",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 1.00 1.00 1.00 596\n",
" 1 1.00 1.00 1.00 204\n",
"\n",
" accuracy 1.00 800\n",
" macro avg 1.00 1.00 1.00 800\n",
"weighted avg 1.00 1.00 1.00 800\n",
"\n"
]
}
],
"source": [
"# train data\n",
"print(classification_report(y_train, forest.predict(x_train)))"
]
},
{
"cell_type": "code",
"execution_count": 28,
"id": "4192d83b",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 100.0\n",
"Precision: 100.0\n",
"Recall: 100.0\n"
]
}
],
"source": [
"# train data\n",
"print('Accuracy:', accuracy_score(y_train, forest.predict(x_train))*100)\n",
"print('Precision:', precision_score(y_train, forest.predict(x_train))*100)\n",
"print('Recall:', recall_score(y_train, forest.predict(x_train))*100)"
]
},
{
"cell_type": "code",
"execution_count": 29,
"id": "1461c7a0",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.79 0.94 0.85 157\n",
" 1 0.23 0.07 0.11 43\n",
"\n",
" accuracy 0.75 200\n",
" macro avg 0.51 0.50 0.48 200\n",
"weighted avg 0.67 0.75 0.69 200\n",
"\n"
]
}
],
"source": [
"# test data\n",
"print(classification_report(y_test, forest.predict(x_test)))"
]
},
{
"cell_type": "code",
"execution_count": 30,
"id": "04b758a6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 75.0\n",
"Precision: 23.076923076923077\n",
"Recall: 6.976744186046512\n"
]
}
],
"source": [
"# test data\n",
"print('Accuracy:', accuracy_score(y_test, forest.predict(x_test))*100)\n",
"print('Precision:', precision_score(y_test, forest.predict(x_test))*100)\n",
"print('Recall:', recall_score(y_test, forest.predict(x_test))*100)"
]
},
{
"cell_type": "code",
"execution_count": 31,
"id": "1c713122",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"cm = confusion_matrix(y_test, forest.predict(x_test))\n",
"sns.heatmap(cm, annot=True, cmap='terrain', fmt='g')\n",
"plt.xlabel('Predicted data')\n",
"plt.ylabel('Real data')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d4e8e96f",
"metadata": {},
"source": [
"### 4.4.4 Support Vector Machine"
]
},
{
"cell_type": "code",
"execution_count": 32,
"id": "4bd111b7",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"SVC()"
]
},
"execution_count": 32,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"svc = SVC()\n",
"svc.fit(x_train, y_train)"
]
},
{
"cell_type": "code",
"execution_count": 33,
"id": "57b9a951",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.92 0.99 0.95 596\n",
" 1 0.95 0.75 0.84 204\n",
"\n",
" accuracy 0.93 800\n",
" macro avg 0.94 0.87 0.90 800\n",
"weighted avg 0.93 0.93 0.92 800\n",
"\n"
]
}
],
"source": [
"# train data\n",
"print(classification_report(y_train, svc.predict(x_train)))"
]
},
{
"cell_type": "code",
"execution_count": 34,
"id": "83962b23",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 92.625\n",
"Precision: 95.03105590062113\n",
"Recall: 75.0\n"
]
}
],
"source": [
"# train data\n",
"print('Accuracy:', accuracy_score(y_train, svc.predict(x_train))*100)\n",
"print('Precision:', precision_score(y_train, svc.predict(x_train))*100)\n",
"print('Recall:', recall_score(y_train, svc.predict(x_train))*100)"
]
},
{
"cell_type": "code",
"execution_count": 35,
"id": "ffe4ac08",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
" precision recall f1-score support\n",
"\n",
" 0 0.85 0.99 0.91 157\n",
" 1 0.89 0.37 0.52 43\n",
"\n",
" accuracy 0.85 200\n",
" macro avg 0.87 0.68 0.72 200\n",
"weighted avg 0.86 0.85 0.83 200\n",
"\n"
]
}
],
"source": [
"# test data\n",
"print(classification_report(y_test, svc.predict(x_test)))"
]
},
{
"cell_type": "code",
"execution_count": 36,
"id": "99f76a05",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 85.5\n",
"Precision: 88.88888888888889\n",
"Recall: 37.2093023255814\n"
]
}
],
"source": [
"# test data\n",
"print('Accuracy:', accuracy_score(y_test, svc.predict(x_test))*100)\n",
"print('Precision:', precision_score(y_test, svc.predict(x_test))*100)\n",
"print('Recall:', recall_score(y_test, svc.predict(x_test))*100)"
]
},
{
"cell_type": "code",
"execution_count": 37,
"id": "f7ea05f4",
"metadata": {},
"outputs": [
{
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 2 Axes>"
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"cm = confusion_matrix(y_test, svc.predict(x_test))\n",
"sns.heatmap(cm, annot=True, cmap='terrain', fmt='g')\n",
"plt.xlabel('Predicted data')\n",
"plt.ylabel('Real data')\n",
"plt.show()"
]
},
{
"attachments": {},
"cell_type": "markdown",
"id": "d9d7678f",
"metadata": {},
"source": [
"# 5. Deployment"
]
},
{
"cell_type": "code",
"execution_count": 43,
"id": "6c6ba1e6",
"metadata": {},
"outputs": [],
"source": [
"# Select one scaled sample (row 72 of the scaled test set)\n",
"sample_df = x_test[72]"
]
},
{
"cell_type": "code",
"execution_count": 44,
"id": "6b73ff09",
"metadata": {},
"outputs": [
{
"data": {
"text/plain": [
"array([ 1.35903462, -0.65270587, 1.10833761, -0.37363236, -0.42828957,\n",
" 2.30200187, -0.41181385, -0.40137644, -0.38655567, -0.27958383,\n",
" -0.30478874, -0.28730468, -0.24413654, -0.24124895, -0.31207962,\n",
" -0.26636529, -0.26636529, -0.30478874, -0.26636529, -0.28217394,\n",
" -0.28984624, 3.79270555, -0.18328047, -0.22941573, -0.24983394,\n",
" -0.24124895, -0.22331316, 5.06622805, -0.21707238, -0.24699789,\n",
" -0.23241869, -0.24413654, -0.23833416, -0.23833416, -0.23833416,\n",
" -0.22021079, -0.25264558, -0.22021079, -0.19044535, -0.2353911 ,\n",
" -0.24983394, 2.19986728, -0.47248449, -0.46255869, -0.40973554,\n",
" -0.43033148, -0.27958383, -0.82502865, -0.31926223, -0.91370804,\n",
" -0.6352234 , -0.74390729, 1.60356745, -0.29488391, -0.51752183,\n",
" -0.29738086, -0.50780078, -0.65660263, -0.6644106 , -0.67419986,\n",
" -1.6511054 , -1.04810348, 0.18475885, -0.48560679, -0.92537512,\n",
" 0.963709 , 1.11630666, -1.18253256, 0.45167913, 0.85886085,\n",
" 0.85043965, 0.74218584, 0.10204472])"
]
},
"execution_count": 44,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# Features of the selected sample\n",
"sample_df"
]
},
{
"cell_type": "code",
"execution_count": 45,
"id": "05ec844f",
"metadata": {},
"outputs": [],
"source": [
"# Execute prediction\n",
"sample_pred = svc.predict([sample_df])"
]
},
{
"cell_type": "code",
"execution_count": 46,
"id": "ac328f85",
"metadata": {},
"outputs": [],
"source": [
"# Interpret the result\n",
"def check_prediction(pred):\n",
"    if pred[0] == 1:\n",
"        print(\"Fraud.\")\n",
"    else:\n",
"        print(\"No Fraud.\")"
]
},
{
"cell_type": "code",
"execution_count": 47,
"id": "fc24f4ee",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Fraud.\n"
]
}
],
"source": [
"# call the prediction function\n",
"check_prediction(sample_pred)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c91f2802",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"category": "Insurance",
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
},
"skipNotebookInDeployment": true,
"title": "Insurance Fraud detection"
},
"nbformat": 4,
"nbformat_minor": 5
}
%% Cell type:markdown id:58fa7892 tags:
# 4. Modeling
%% Cell type:markdown id:409bf0ce tags:
## 4.1 Importing the relevant modules
%% Cell type:code id:ce52edf1 tags:
```python
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score, precision_score, recall_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

sns.set()

%matplotlib inline
```
%% Cell type:markdown id:2e9db144 tags:
## 4.2 Loading the data
%% Cell type:code id:83961ee0 tags:
```python
data = pd.read_csv('dataset_dummies.csv')  # file is generated in notebook_1
```
%% Cell type:code id:72081129 tags:
```python
data.head()
```
%% Output
policy_csl_250/500 policy_csl_500/1000 insured_sex_MALE \
0 1 0 1
1 1 0 1
2 0 0 0
3 1 0 0
4 0 1 1
insured_education_level_College insured_education_level_High School \
0 0 0
1 0 0
2 0 0
3 0 0
4 0 0
insured_education_level_JD insured_education_level_MD \
0 0 1
1 0 1
2 0 0
3 0 0
4 0 0
insured_education_level_Masters insured_education_level_PhD \
0 0 0
1 0 0
2 0 1
3 0 1
4 0 0
insured_occupation_armed-forces ... capital-gains capital-loss \
0 0 ... 53300 0
1 0 ... 0 0
2 0 ... 35100 0
3 1 ... 48900 -62400
4 0 ... 66000 -46000
number_of_vehicles_involved bodily_injuries witnesses injury_claim \
0 1 1 2 6510
1 1 0 0 780
2 3 2 3 7700
3 1 1 2 6340
4 1 0 1 1300
property_claim vehicle_claim fraud_reported pct_paid_insurance
0 13020 52080 1 0.986035
1 780 3510 1 0.605523
2 3850 23100 0 0.942280
3 6340 50720 1 0.968454
4 650 4550 0 0.846154
[5 rows x 74 columns]
%% Cell type:markdown id:c139e7ce tags:
## 4.3 Data preparation for modeling
%% Cell type:code id:a5c7e329 tags:
```python
target = data.fraud_reported
features = data.drop('fraud_reported', axis=1)
```
%% Cell type:code id:bf93a421 tags:
```python
# Split data into training and test datasets
x_train, x_test, y_train, y_test = train_test_split(features, target, test_size=0.2, random_state=365)
```
%% Cell type:code id:92201392 tags:
```python
x_train.head()
```
%% Output
policy_csl_250/500 policy_csl_500/1000 insured_sex_MALE \
908 1 0 1
591 0 1 0
836 0 0 0
145 0 0 0
606 0 1 0
insured_education_level_College insured_education_level_High School \
908 0 0
591 0 0
836 0 0
145 0 0
606 0 0
insured_education_level_JD insured_education_level_MD \
908 0 1
591 0 0
836 1 0
145 0 0
606 0 0
insured_education_level_Masters insured_education_level_PhD \
908 0 0
591 0 0
836 0 0
145 0 0
606 0 0
insured_occupation_armed-forces ... umbrella_limit capital-gains \
908 0 ... 0 52600
591 1 ... 0 0
836 0 ... 0 52100
145 0 ... 0 0
606 0 ... 0 0
capital-loss number_of_vehicles_involved bodily_injuries witnesses \
908 0 1 1 0
591 0 1 2 1
836 0 1 0 1
145 -57900 1 2 1
606 -66200 1 0 3
injury_claim property_claim vehicle_claim pct_paid_insurance
908 500 500 4500 0.636364
591 7270 21810 50890 0.993748
836 21330 7110 56880 0.988279
145 7640 15280 76400 0.994966
606 5750 5750 46000 0.982609
[5 rows x 73 columns]
%% Cell type:code id:20040790 tags:
```python
# Scale data: fit the scaler on the training data only, then apply it to both sets
scaler = StandardScaler()
scaler.fit(x_train)
x_train = scaler.transform(x_train)
x_test = scaler.transform(x_test)
```
%% Cell type:code id:64342d4d tags:
```python
# distribution of target in train data
y_train.value_counts()
```
%% Output
0 596
1 204
Name: fraud_reported, dtype: int64
%% Cell type:code id:5ee57584 tags:
```python
# distribution of target in test data
y_test.value_counts()
```
%% Output
0 157
1 43
Name: fraud_reported, dtype: int64
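%% Cell type:markdown tags:
The counts above give a fraud share of 204/800 = 25.5% in the training data but 43/200 = 21.5% in the test data, so the random split did not preserve the class ratio. A minimal sketch of a stratified split follows. It assumes synthetic stand-in data with the same class totals (247 fraud, 753 non-fraud); the names `labels` and `dummy_features` are hypothetical and not part of the notebook. The `stratify` parameter of `train_test_split` asks for the same class proportions in both parts.
%% Cell type:code tags:
```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 rows with the same class totals as above
labels = np.array([1] * 247 + [0] * 753)
dummy_features = np.arange(1000).reshape(-1, 1)

# stratify=labels keeps the class proportions (nearly) equal in both splits
x_tr, x_te, y_tr, y_te = train_test_split(
    dummy_features, labels, test_size=0.2, random_state=365, stratify=labels
)

train_share = y_tr.mean()
test_share = y_te.mean()
print(f"fraud share train: {train_share:.3f}, test: {test_share:.3f}")
```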
%% Cell type:markdown id:73297e79 tags:
## 4.4 Modeling and evaluation
%% Cell type:markdown id:c9b07ea0 tags:
### 4.4.1 Logistic regression
%% Cell type:code id:6b330394 tags:
```python
logreg = LogisticRegression()
logreg.fit(x_train, y_train)
```
%% Output
LogisticRegression()
%% Cell type:code id:fa3ef374 tags:
```python
# train data
print(classification_report(y_train, logreg.predict(x_train)))
```
%% Output
              precision    recall  f1-score   support
           0       0.92      0.93      0.92       596
           1       0.79      0.75      0.77       204
    accuracy                           0.89       800
   macro avg       0.85      0.84      0.85       800
weighted avg       0.88      0.89      0.88       800
%% Cell type:code id:86ea265b tags:
```python
# train data
print('Accuracy:', accuracy_score(y_train, logreg.predict(x_train)) * 100)
print('Precision:', precision_score(y_train, logreg.predict(x_train)) * 100)
print('Recall:', recall_score(y_train, logreg.predict(x_train)) * 100)
```
%% Output
Accuracy: 88.5
Precision: 78.8659793814433
Recall: 75.0
%% Cell type:code id:172d0b95 tags:
```python
# test data
print(classification_report(y_test, logreg.predict(x_test)))
```
%% Output
              precision    recall  f1-score   support
           0       0.91      0.91      0.91       157
           1       0.67      0.65      0.66        43
    accuracy                           0.85       200
   macro avg       0.79      0.78      0.78       200
weighted avg       0.85      0.85      0.85       200
%% Cell type:code id:8a1e6fd9 tags:
```python
# test data
print('Accuracy:', accuracy_score(y_test, logreg.predict(x_test)) * 100)
print('Precision:', precision_score(y_test, logreg.predict(x_test)) * 100)
print('Recall:', recall_score(y_test, logreg.predict(x_test)) * 100)
```
%% Output
Accuracy: 85.5
Precision: 66.66666666666666
Recall: 65.11627906976744
%% Cell type:code id:c48c8e65 tags:
```python
tn, fp, fn, tp = confusion_matrix(y_test, logreg.predict(x_test)).ravel()
print(tn, fp, fn, tp)
```
%% Output
143 14 15 28
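%% Cell type:markdown tags:
The four counts above are enough to recompute the test metrics by hand. The short sketch below uses only the printed values and the standard definitions of accuracy, precision and recall; it reproduces the 85.5 / 66.67 / 65.12 percent reported for the logistic regression on the test data.
%% Cell type:code tags:
```python
# Confusion-matrix counts printed above (logistic regression, test data)
tn, fp, fn, tp = 143, 14, 15, 28

accuracy = (tp + tn) / (tn + fp + fn + tp)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted frauds that are real frauds
recall = tp / (tp + fn)                     # share of real frauds that were detected

print('Accuracy:', accuracy * 100)
print('Precision:', precision * 100)
print('Recall:', recall * 100)
```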
%% Cell type:code id:584079bf tags:
```python
cm = confusion_matrix(y_test, logreg.predict(x_test))
sns.heatmap(cm, annot=True, cmap='terrain', fmt='g')
# confusion_matrix puts the true labels on the rows (y-axis) and the predictions on the columns (x-axis)
plt.xlabel('Predicted data')
plt.ylabel('Real data')
plt.show()
```
%% Output
%% Cell type:code id:f96ea8be tags:
```python
logreg.intercept_
```
%% Output
array([-1.87199801])
%% Cell type:code id:2950b467 tags:
```python
logreg.coef_
```
%% Output
array([[ 5.80290586e-02, -2.37429629e-01, -3.18975936e-02,
1.03264901e-01, 1.57395846e-02, 1.54412249e-01,
9.98550038e-02, 1.01734974e-01, 1.65847423e-01,
7.04430787e-02, 2.21983720e-01, 2.75034627e-01,
6.08030524e-02, -8.57181314e-02, 6.39780307e-02,
-1.53624306e-01, -4.61802683e-02, 1.51000581e-01,
-5.51566389e-02, 5.90062519e-02, -8.83889618e-03,
-8.20083809e-03, -1.42216723e-01, 1.90073855e-02,
-2.16204990e-01, -4.62653155e-01, 7.95035036e-01,
5.66640168e-01, -3.05200290e-01, -2.31887634e-01,
-8.43891138e-02, -9.43983125e-02, -3.03973994e-01,
-9.70096772e-02, -2.00177795e-01, -5.26322421e-02,
-1.90789105e-02, -1.19299001e-01, -2.22788639e-01,
8.18201612e-03, 2.03631238e-01, 3.87350193e-01,
4.07693675e-01, -2.79170918e-02, 2.25397475e-01,
1.85538456e-01, -1.95996214e-01, -1.87436352e-01,
-2.87031385e-01, 1.41595895e-01, -1.11191352e-01,
-1.74200227e+00, -1.66185713e+00, -1.21492255e+00,
1.24360304e-01, 1.64944689e-01, 2.58711282e-01,
3.23758889e-01, 1.30618426e-03, -1.22778321e-01,
-1.01884582e-02, 5.54932851e-02, 5.82757827e-02,
3.61968828e-01, -1.20848644e-01, -2.42543595e-01,
-1.93472781e-01, 8.04217694e-02, 5.86457472e-02,
-1.71694568e-01, -3.64878309e-02, 1.53029355e-01,
3.50253316e-03]])
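%% Cell type:markdown tags:
One way to read the intercept: the features were standardized with the training mean, so a claim whose features all equal that mean is the zero vector after scaling, and its linear score is just the intercept. A minimal sketch of the resulting baseline fraud probability, using only the intercept value shown above and the logistic (sigmoid) function:
%% Cell type:code tags:
```python
import math

intercept = -1.87199801  # value of logreg.intercept_ shown above

# For an all-zero (i.e. training-mean) scaled feature vector the linear score
# equals the intercept, so the predicted fraud probability is sigmoid(intercept).
baseline_probability = 1.0 / (1.0 + math.exp(-intercept))
print(round(baseline_probability, 3))
```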
%% Cell type:markdown id:5c21a230 tags:
### 4.4.2 Decision tree
%% Cell type:code id:3c2caa28 tags:
```python
tree = DecisionTreeClassifier()
tree.fit(x_train, y_train)
```
%% Output
DecisionTreeClassifier()
%% Cell type:code id:d5102a31 tags:
```python
# train data
print(classification_report(y_train, tree.predict(x_train)))
```
%% Output
              precision    recall  f1-score   support
           0       1.00      1.00      1.00       596
           1       1.00      1.00      1.00       204
    accuracy                           1.00       800
   macro avg       1.00      1.00      1.00       800
weighted avg       1.00      1.00      1.00       800
%% Cell type:code id:dc4b1761 tags:
```python
# train data
print('Accuracy:', accuracy_score(y_train, tree.predict(x_train)) * 100)
print('Precision:', precision_score(y_train, tree.predict(x_train)) * 100)
print('Recall:', recall_score(y_train, tree.predict(x_train)) * 100)
```
%% Output
Accuracy: 100.0
Precision: 100.0
Recall: 100.0
%% Cell type:code id:fff555ef tags:
```python
# test data
print(classification_report(y_test, tree.predict(x_test)))
```
%% Output
              precision    recall  f1-score   support
           0       0.83      0.85      0.84       157
           1       0.39      0.35      0.37        43
    accuracy                           0.74       200
   macro avg       0.61      0.60      0.61       200
weighted avg       0.73      0.74      0.74       200
%% Cell type:code id:afdad593 tags:
```python
# test data
print('Accuracy:', accuracy_score(y_test, tree.predict(x_test)) * 100)
print('Precision:', precision_score(y_test, tree.predict(x_test)) * 100)
print('Recall:', recall_score(y_test, tree.predict(x_test)) * 100)
```
%% Output
Accuracy: 74.5
Precision: 39.473684210526315
Recall: 34.883720930232556
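%% Cell type:markdown tags:
The tree scores 100% on the training data but only 74.5% accuracy on the test data, which points to overfitting: an unrestricted tree keeps splitting until every training sample is classified correctly. A minimal sketch of one common countermeasure, limiting the depth, is shown below. It assumes synthetic data from `make_classification` rather than the insurance dataset, and `max_depth=3` is an arbitrary illustrative value, not a tuned setting.
%% Cell type:code tags:
```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in data (not the insurance dataset)
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.75, 0.25],
                           flip_y=0.1, random_state=365)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=365)

# Same model class, once unrestricted and once with a depth limit
full_tree = DecisionTreeClassifier(random_state=365).fit(X_tr, y_tr)
small_tree = DecisionTreeClassifier(max_depth=3, random_state=365).fit(X_tr, y_tr)

for name, model in [('unrestricted', full_tree), ('max_depth=3', small_tree)]:
    print(name, 'depth:', model.get_depth(),
          'train acc:', model.score(X_tr, y_tr),
          'test acc:', model.score(X_te, y_te))
```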
%% Cell type:code id:44062e93 tags:
```python
cm = confusion_matrix(y_test, tree.predict(x_test))
sns.heatmap(cm, annot=True, cmap='terrain', fmt='g')
# confusion_matrix puts the true labels on the rows (y-axis) and the predictions on the columns (x-axis)
plt.xlabel('Predicted data')
plt.ylabel('Real data')
plt.show()
```
%% Output
%% Cell type:markdown id:e867e082 tags:
### 4.4.3 Random Forest
%% Cell type:code id:fbd0cd13 tags:
```python
forest = RandomForestClassifier()
forest.fit(x_train, y_train)
```
%% Output
RandomForestClassifier()
%% Cell type:code id:37a0c75e tags:
```python
# train data
print(classification_report(y_train, forest.predict(x_train)))
```
%% Output
              precision    recall  f1-score   support
           0       1.00      1.00      1.00       596
           1       1.00      1.00      1.00       204
    accuracy                           1.00       800
   macro avg       1.00      1.00      1.00       800
weighted avg       1.00      1.00      1.00       800
%% Cell type:code id:4192d83b tags:
```python
# train data
print('Accuracy:', accuracy_score(y_train, forest.predict(x_train)) * 100)
print('Precision:', precision_score(y_train, forest.predict(x_train)) * 100)
print('Recall:', recall_score(y_train, forest.predict(x_train)) * 100)
```
%% Output
Accuracy: 100.0
Precision: 100.0
Recall: 100.0
%% Cell type:code id:1461c7a0 tags:
```python
# test data
print(classification_report(y_test, forest.predict(x_test)))
```
%% Output
              precision    recall  f1-score   support
           0       0.79      0.94      0.85       157
           1       0.23      0.07      0.11        43
    accuracy                           0.75       200
   macro avg       0.51      0.50      0.48       200
weighted avg       0.67      0.75      0.69       200
%% Cell type:code id:04b758a6 tags:
```python
# test data
print('Accuracy:', accuracy_score(y_test, forest.predict(x_test)) * 100)
print('Precision:', precision_score(y_test, forest.predict(x_test)) * 100)
print('Recall:', recall_score(y_test, forest.predict(x_test)) * 100)
```
%% Output
Accuracy: 75.0
Precision: 23.076923076923077
Recall: 6.976744186046512
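%% Cell type:markdown tags:
A test recall of about 7% means the forest flags almost no fraud cases. For a binary problem, `predict` assigns the class with the higher averaged probability, which roughly corresponds to a threshold of 0.5 on `predict_proba`. A minimal sketch of lowering that threshold follows; it assumes synthetic data from `make_classification` instead of the insurance dataset, and the threshold 0.3 is an arbitrary illustrative value. Lowering the threshold can only keep or increase recall, usually at the cost of precision.
%% Cell type:code tags:
```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced stand-in data (not the insurance dataset)
X, y = make_classification(n_samples=1000, n_features=20, weights=[0.75, 0.25],
                           flip_y=0.1, random_state=365)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2,
                                          random_state=365, stratify=y)

forest = RandomForestClassifier(random_state=365).fit(X_tr, y_tr)
fraud_proba = forest.predict_proba(X_te)[:, 1]  # predicted probability of class 1

recalls = {}
for threshold in (0.5, 0.3):  # 0.3 is an arbitrary illustrative value
    pred = (fraud_proba >= threshold).astype(int)
    recalls[threshold] = recall_score(y_te, pred)
    print('threshold', threshold,
          'precision:', precision_score(y_te, pred, zero_division=0),
          'recall:', recalls[threshold])
```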
%% Cell type:code id:1c713122 tags:
```python
cm = confusion_matrix(y_test, forest.predict(x_test))
sns.heatmap(cm, annot=True, cmap='terrain', fmt='g')
# confusion_matrix puts the true labels on the rows (y-axis) and the predictions on the columns (x-axis)
plt.xlabel('Predicted data')
plt.ylabel('Real data')
plt.show()
```
%% Output
%% Cell type:markdown id:d4e8e96f tags:
### 4.4.4 Support Vector Machine
%% Cell type:code id:4bd111b7 tags:
```python
svc = SVC()
svc.fit(x_train, y_train)
```
%% Output
SVC()
%% Cell type:code id:57b9a951 tags:
```python
# train data
print(classification_report(y_train, svc.predict(x_train)))
```
%% Output
              precision    recall  f1-score   support
           0       0.92      0.99      0.95       596
           1       0.95      0.75      0.84       204
    accuracy                           0.93       800
   macro avg       0.94      0.87      0.90       800
weighted avg       0.93      0.93      0.92       800
%% Cell type:code id:83962b23 tags:
```python
# train data
print('Accuracy:', accuracy_score(y_train, svc.predict(x_train)) * 100)
print('Precision:', precision_score(y_train, svc.predict(x_train)) * 100)
print('Recall:', recall_score(y_train, svc.predict(x_train)) * 100)
```
%% Output
Accuracy: 92.625
Precision: 95.03105590062113
Recall: 75.0
%% Cell type:code id:ffe4ac08 tags:
```python
# test data
print(classification_report(y_test, svc.predict(x_test)))
```
%% Output
              precision    recall  f1-score   support
           0       0.85      0.99      0.91       157
           1       0.89      0.37      0.52        43
    accuracy                           0.85       200
   macro avg       0.87      0.68      0.72       200
weighted avg       0.86      0.85      0.83       200
%% Cell type:code id:99f76a05 tags:
```python
# test data
print('Accuracy:', accuracy_score(y_test, svc.predict(x_test)) * 100)
print('Precision:', precision_score(y_test, svc.predict(x_test)) * 100)
print('Recall:', recall_score(y_test, svc.predict(x_test)) * 100)
```
%% Output
Accuracy: 85.5
Precision: 88.88888888888889
Recall: 37.2093023255814
%% Cell type:code id:f7ea05f4 tags:
```python
cm = confusion_matrix(y_test, svc.predict(x_test))
sns.heatmap(cm, annot=True, cmap='terrain', fmt='g')
# confusion_matrix puts the true labels on the rows (y-axis) and the predictions on the columns (x-axis)
plt.xlabel('Predicted data')
plt.ylabel('Real data')
plt.show()
```
%% Output
%% Cell type:markdown id:d9d7678f tags:
# 5. Deployment
%% Cell type:code id:6c6ba1e6 tags:
```python
# Select one scaled sample (row 72) from the test set
sample_df = x_test[72]
```
%% Cell type:code id:6b73ff09 tags:
```python
# Features of the selected sample
sample_df
```
%% Output
array([ 1.35903462, -0.65270587, 1.10833761, -0.37363236, -0.42828957,
2.30200187, -0.41181385, -0.40137644, -0.38655567, -0.27958383,
-0.30478874, -0.28730468, -0.24413654, -0.24124895, -0.31207962,
-0.26636529, -0.26636529, -0.30478874, -0.26636529, -0.28217394,
-0.28984624, 3.79270555, -0.18328047, -0.22941573, -0.24983394,
-0.24124895, -0.22331316, 5.06622805, -0.21707238, -0.24699789,
-0.23241869, -0.24413654, -0.23833416, -0.23833416, -0.23833416,
-0.22021079, -0.25264558, -0.22021079, -0.19044535, -0.2353911 ,
-0.24983394, 2.19986728, -0.47248449, -0.46255869, -0.40973554,
-0.43033148, -0.27958383, -0.82502865, -0.31926223, -0.91370804,
-0.6352234 , -0.74390729, 1.60356745, -0.29488391, -0.51752183,
-0.29738086, -0.50780078, -0.65660263, -0.6644106 , -0.67419986,
-1.6511054 , -1.04810348, 0.18475885, -0.48560679, -0.92537512,
0.963709 , 1.11630666, -1.18253256, 0.45167913, 0.85886085,
0.85043965, 0.74218584, 0.10204472])
%% Cell type:code id:05ec844f tags:
```python
# Execute prediction
sample_pred = svc.predict([sample_df])
```
%% Cell type:code id:ac328f85 tags:
```python
# Interpret the result
def check_prediction(pred):
    if pred[0] == 1:
        print("Fraud.")
    else:
        print("No Fraud.")
```
%% Cell type:code id:fc24f4ee tags:
```python
# Call the prediction function
check_prediction(sample_pred)
```
%% Output
Fraud.
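%% Cell type:markdown tags:
The prediction above only works because the sample was already scaled with the fitted `StandardScaler`. For use outside the notebook, one option is to bundle scaler and classifier in a scikit-learn pipeline and persist it, so that raw rows can be passed in directly. The sketch below is an illustration under stated assumptions, not the notebook's own deployment: it trains on synthetic data from `make_classification`, and the file name `fraud_model.joblib` is hypothetical.
%% Cell type:code tags:
```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (not the insurance dataset)
X, y = make_classification(n_samples=200, n_features=10, random_state=365)

# Bundle scaling and classification so raw, unscaled rows can be passed in
model = make_pipeline(StandardScaler(), SVC())
model.fit(X, y)

# Save and reload the fitted pipeline; the file name is a hypothetical example
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, 'fraud_model.joblib')
    joblib.dump(model, model_path)
    restored = joblib.load(model_path)

sample_pred = restored.predict(X[:1])  # one raw sample, shape (1, n_features)
print('Fraud.' if sample_pred[0] == 1 else 'No Fraud.')
```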