From db4109603721fb0d9edb22e4733a62a0b5615b40 Mon Sep 17 00:00:00 2001
From: chris waisi <kingwaisi@gmail.com>
Date: Sun, 7 Jul 2024 17:24:54 +0200
Subject: [PATCH] updatenewtags1

---
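Note on the VIF check described in the section 2 text added by this patch: the notebook's own VIF cells are only partly visible in this diff, so the following is a minimal sketch under stated assumptions, not the notebook's implementation. It assumes numeric features that are not perfectly collinear, and it swaps in a NumPy identity (the VIF of column j is the j-th diagonal entry of the inverse correlation matrix) for whatever library call the notebook uses. The name `vif_from_correlation` and the toy data are hypothetical.

```python
import numpy as np


def vif_from_correlation(X):
    """Return one variance inflation factor per column of X.

    Uses the identity VIF_j = [R^-1]_jj, where R is the correlation
    matrix of the columns. This matches 1 / (1 - R_j^2) when each
    auxiliary regression includes an intercept.
    """
    X = np.asarray(X, dtype=float)
    corr = np.corrcoef(X, rowvar=False)
    # inv raises LinAlgError (or returns huge values) when columns are
    # exactly collinear; that case is outside this sketch's assumptions.
    return np.diag(np.linalg.inv(corr))


# Hypothetical toy data: column c is almost a copy of column a,
# column b is independent of both.
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
c = a + 0.05 * rng.normal(size=500)

vif = vif_from_correlation(np.column_stack([a, b, c]))
print(vif)
```

The notebook text treats VIF > 10 as the sign of high multicollinearity; in the toy data only the two nearly identical columns should exceed that threshold.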
 ...{notebook (4).ipynb => notebook (5).ipynb} | 1457 +++++++--
 .../notebook.ipynb                            | 2641 -----------------
 2 files changed, 1195 insertions(+), 2903 deletions(-)
 rename Tourism/Prediction cancellation of hotel bookings/{notebook (4).ipynb => notebook (5).ipynb} (98%)
 delete mode 100644 Tourism/Prediction cancellation of hotel bookings/notebook.ipynb

diff --git a/Tourism/Prediction cancellation of hotel bookings/notebook (4).ipynb b/Tourism/Prediction cancellation of hotel bookings/notebook (5).ipynb
similarity index 98%
rename from Tourism/Prediction cancellation of hotel bookings/notebook (4).ipynb
rename to Tourism/Prediction cancellation of hotel bookings/notebook (5).ipynb
index 2d33178..c36522a 100644
--- a/Tourism/Prediction cancellation of hotel bookings/notebook (4).ipynb	
+++ b/Tourism/Prediction cancellation of hotel bookings/notebook (5).ipynb	
@@ -3,99 +3,92 @@
   {
    "cell_type": "markdown",
    "metadata": {
+    "Paragraph": "Analyse der Bewegung und Aktivität freilaufender Rinder",
     "editable": true,
     "include": true,
-    "paragraph": "Analyse der Bewegung und Aktivität von freilaufenden Rindern",
     "slideshow": {
      "slide_type": ""
     },
     "tags": []
    },
    "source": [
-    "# Analyse der Bewegung und Aktivität von freilaufenden Rindern"
+    "# Analysis of the movement and activity of free-ranging cattle"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {
+    "Paragraph": "Geschäftsverstandnis",
     "editable": true,
-    "include": false,
-    "paragraph": "Geschäftsverstandnis",
+    "include": true,
     "slideshow": {
      "slide_type": ""
     },
     "tags": []
    },
    "source": [
-    "# 1. Geschäftsverstandnis"
+    "# 1. Business Understanding\n",
+    "Farmers are under growing economic pressure. The COVID-19 pandemic, for example, caused costly logistical problems. At the same time, more and more consumers demand higher standards of animal welfare. These market conditions are difficult for farmers, because meeting welfare standards incurs costs and requires a fundamental change in thinking. New technologies such as artificial intelligence and sensors can make processes more efficient while also meeting animal-welfare requirements more fully. One problem farmers face is that they cannot care for their animals in a targeted, individual way. Knowledge of which activities have a particular influence on, for example, milk production or meat quality cannot yet be captured comprehensively, so targeted measures to encourage specific activities are not possible either. Conceivable measures include motivating individual animals to move more or optimizing breeding. So far, however, it has been difficult to build continuous activity profiles of the animals. A second problem is that early detection of changes in the health of individual animals is often not possible. Regular checks of a herd's condition are required by law, but on larger farms in particular, the sheer number of animals means that health changes are often noticed late. In case of illness, treatment with antibiotics or other expensive medication is often the only option. Certain behavioral changes, however, can provide early indications. Healthy cattle, for example, normally move quite a lot and reduce their activity when they develop the first signs of illness. Detecting this reduction in movement early would be of great value to a farmer.\n\nTo address these problems, Healthy Cattle plans to develop an activity classification that builds activity profiles and detects illness early. The hypothesis is that activities can be classified using the typical sensors of a smartphone and a machine learning algorithm. Healthy Cattle first wants to test whether those sensors allow a sufficiently accurate classification of activities. To keep this test as inexpensive as possible, the first prototype uses no data of its own but publicly available data from a similar setup."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {
-    "include": true,
-    "paragraph": "",
+    "Include": true,
+    "Paragraph": "Datenverständnis",
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
     "tags": []
    },
    "source": [
-    "Landwirte geraten immer weiter unter wirtschaftlichen Druck. Gleichzeitig\n",
-    "stellen immer mehr Konsumenten höhere Anforderungen an eine artgerechte\n",
-    "Tierhaltung. Diese Marktbedingungen erscheinen für Landwirte problematisch,\n",
-    "da letzterer Aspekt natürlich mit gewissen Kosten verbunden ist und ein\n",
-    "generelles Umdenken notwendig macht.\n",
-    "\n",
-    "Durch neue Technologien wie Künstliche Intelligenz und die Verwendung\n",
-    "von Sensoren können Prozesse effizienter gestaltet und gleichzeitig die\n",
-    "Anforderungen des Tierschutzes besser erfüllt werden.\n",
-    "\n",
-    "Ein Problem der Landwirte ist, dass sie ihre Tiere nicht gezielt bzw. individuell\n",
-    "umsorgen können. Das Wissen darüber, welche Aktivitäten besonderen\n",
-    "Einfluss auf die Milchproduktion oder Fleischqualität haben, kann bisher nicht\n",
-    "umfänglich erfasst werden. Somit sind auch keine gezielten Maßnahmen zur\n",
-    "Förderung bestimmter Aktivitäten möglich. Denkbare Maßnahmen wären die\n",
-    "gezielte Motivation zu mehr Bewegung einzelner Tiere oder die Optimierung der\n",
-    "Zucht. Bisher ist es jedoch schwierig kontinuierliche Aktivitätsprofile der Tiere zu erstellen.\n",
-    "\n",
-    "Ein weiteres Problem der Landwirte ist, dass eine frühzeitige Erkennung von\n",
-    "Veränderungen des Gesundheitszustandes einzelner Tiere oft nicht möglich ist.\n",
-    "Zwar ist die regelmäßige Kontrolle des Zustandes einer Herde rechtlich\n",
-    "vorgeschrieben, jedoch können gerade in größeren Betrieben auf Grund der\n",
-    "großen Anzahl an Tieren, gesundheitliche Veränderungen erst spät erkannt\n",
-    "werden. \n",
-    "\n",
-    "Verhaltensänderungen können jedoch frühzeitig erste Hinweise liefern. Gesunde Rinder beispielsweise bewegen sich normalerweise recht viel und reduzieren ihre Aktivität, wenn sie erste Krankheitszeichen entwickeln. Eine frühe Erkennung dieser Bewegungsreduzierung wäre von großer Bedeutung für einen Landwirt."
+    "# 2. Data Understanding\n",
+    "The dataset used in this term paper comes from Anguita, Ghio et al. and was retrieved from Kaggle.com at https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones. The original dataset is available at https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones. In an experiment, 30 subjects performed a fixed sequence of movements with a smartphone attached at waist height. The built-in sensors recorded linear acceleration and angular velocity signals on three axes each. The individual data points were classified (labeled) by activity. Several filtering steps then removed noise from the signals, and further time- or frequency-domain features were extracted. From these, additional features such as mean, standard deviation, minimum, maximum and median were computed. The underlying machine learning problem is a supervised learning problem, because the model learns from labeled data. Each data point has a set of features, computed from the sensor values, and a target, the activity class. The available dataset consists of 10299 observations. There are no missing values and no duplicates. The number of features is very large, which makes a comprehensive analysis of individual features difficult. A graphical check for multicollinearity using the correlation matrix is therefore not practical. A check of the variance inflation factor (VIF) shows that very many features have a VIF > 10 and thus exhibit high multicollinearity (see table). High multicollinearity harms the interpretability of the models built later.\n"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
    "metadata": {
-    "include": true,
-    "paragraph": "Daten und Datenverständnis"
+    "Include": true,
+    "Paragraph": "Import von Relevant Module",
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
    },
    "outputs": [],
    "source": [
-    "# 2. Daten und Datenverständnis"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "include": true,
-    "paragraph": "Import von relevanten Module"
-   },
-   "source": [
-    "\n",
-    "## 2.1. Import von relevanten Daten "
+    "## 2.1. Import of relevant modules"
    ]
   },
   {
    "cell_type": "code",
-   "execution_count": 34,
+   "execution_count": 2,
    "metadata": {
-    "include": true
+    "Include": false,
+    "Paragraph": "Import von Relevant Module",
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
    },
-   "outputs": [],
+   "outputs": [
+    {
+     "ename": "ModuleNotFoundError",
+     "evalue": "No module named 'plotly'",
+     "output_type": "error",
+     "traceback": [
+      "\u001b[1;31m---------------------------------------------------------------------------\u001b[0m",
+      "\u001b[1;31mModuleNotFoundError\u001b[0m                       Traceback (most recent call last)",
+      "Cell \u001b[1;32mIn[2], line 4\u001b[0m\n\u001b[0;32m      1\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mnumpy\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mnp\u001b[39;00m \u001b[38;5;66;03m# standard for data processing\u001b[39;00m\n\u001b[0;32m      2\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mpandas\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mpd\u001b[39;00m \u001b[38;5;66;03m# standard for data processing\u001b[39;00m\n\u001b[1;32m----> 4\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mplotly\u001b[39;00m\u001b[38;5;21;01m.\u001b[39;00m\u001b[38;5;21;01mgraph_objects\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01mgo\u001b[39;00m \u001b[38;5;66;03m# creates plots\u001b[39;00m\n\u001b[0;32m      5\u001b[0m \u001b[38;5;28;01mimport\u001b[39;00m \u001b[38;5;21;01mseaborn\u001b[39;00m \u001b[38;5;28;01mas\u001b[39;00m \u001b[38;5;21;01msns\u001b[39;00m\n\u001b[0;32m      6\u001b[0m \u001b[38;5;28;01mfrom\u001b[39;00m \u001b[38;5;21;01mmatplotlib\u001b[39;00m \u001b[38;5;28;01mimport\u001b[39;00m pyplot \u001b[38;5;28;01mas\u001b[39;00m plt\n",
+      "\u001b[1;31mModuleNotFoundError\u001b[0m: No module named 'plotly'"
+     ]
+    }
+   ],
    "source": [
     "import numpy as np # standard for data processing\n",
     "import pandas as pd # standard for data processing\n",
@@ -126,8 +119,13 @@
   {
    "cell_type": "markdown",
    "metadata": {
-    "include": "true",
-    "paragraph": "Daten Auslesen"
+    "editable": true,
+    "include": true,
+    "paragraph": "Daten Auslesen",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
    },
    "source": [
     "## Daten Auslesen"
@@ -136,9 +134,9 @@
   {
    "cell_type": "markdown",
    "metadata": {
+    "Include": true,
+    "Paragraph": "Daten Auslesen",
     "editable": true,
-    "include": true,
-    "paragraph": "DataUnderstanding",
     "slideshow": {
      "slide_type": ""
     },
@@ -147,29 +145,37 @@
    "source": [
     "Quelle der Daten:\n",
     "\n",
-    "Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra und Jorge L. Reyes-Ortiz. Ein öffentlich zugänglicher Datensatz für die Erkennung menschlicher Aktivitäten mit Smartphones. 21. Europäisches Symposium über Künstliche Neuronale Netze, Computational Intelligence und Maschinelles Lernen, ESANN 2013. Brügge, Belgien 24-26 April 2013.\n",
-    "Heruntergeladen von Kaggle: https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones\n"
+    "Davide Anguita, Alessandro Ghio, Luca Oneto, Xavier Parra and Jorge L. Reyes-Ortiz. A Public Domain Dataset for Human Activity Recognition Using Smartphones. 21st European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning, ESANN 2013. Bruges, Belgium, 24-26 April 2013.\n",
+    "Downloaded from Kaggle:\n",
+    "https://www.kaggle.com/uciml/human-activity-recognition-with-smartphones"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {
+    "Include": true,
+    "Paragraph": true,
     "editable": true,
-    "include": true,
-    "paragraph": "DataUnderstanding",
     "slideshow": {
      "slide_type": ""
     },
     "tags": []
    },
    "source": [
-    "Es gibt zwei getrennte Datensätze aus der Testgruppe und der Trainingsgruppe. Diese werden für die weitere Analyse zunächst noch einmal zusammengefasst. Die Daten werden identifiziert, ob sie aus dem Trainings- oder Testdatensatz stammen, damit sie später wieder aufgeteilt werden können."
+    "There are two separate datasets, one from the test group and one from the training group. They are first merged for the analysis. Each row is marked as coming from the training or the test dataset so that the two can be split again later."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 35,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df_train = pd.read_csv('https://storage.googleapis.com/ml-service-repository-datastorage/Analysis_of_the_movement_and_activity_of_free-ranging_cattle_test.csv', delimiter=',')"
@@ -178,7 +184,14 @@
   {
    "cell_type": "code",
    "execution_count": 36,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df_test = pd.read_csv('https://storage.googleapis.com/ml-service-repository-datastorage/Analysis_of_the_movement_and_activity_of_free-ranging_cattle_test.csv', delimiter=',')"
@@ -187,7 +200,14 @@
   {
    "cell_type": "code",
    "execution_count": 37,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df_train['type']='train'"
@@ -196,7 +216,14 @@
   {
    "cell_type": "code",
    "execution_count": 38,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df_test['type']='test'"
@@ -205,7 +232,14 @@
   {
    "cell_type": "code",
    "execution_count": 39,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "df = pd.concat([df_train, df_test], axis=0).reset_index(drop=True)"
@@ -214,7 +248,14 @@
   {
    "cell_type": "code",
    "execution_count": 40,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -443,15 +484,30 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "Include": true,
+    "Paragraph": "Datenvorbereitung",
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "3. Datenexploration und -aufbereitung"
+    "# 3. Data Preparation"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 41,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -467,15 +523,30 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "Include": true,
+    "Paragraph": "Suchen nach fehlenden Daten",
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Auf fehlende Werte prüfen"
+    "### Check for missing values"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 42,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -506,7 +577,14 @@
   {
    "cell_type": "code",
    "execution_count": 43,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -525,22 +603,44 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": false,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "--> **Keine fehlenden Werte im Data Frame"
+    "--> **no missing values** in the Data Frame"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "Include": true,
+    "Paragraph": "Suchen nach Duplikate",
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Auf Duplikate prüfen"
+    "### Check for duplicates"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 44,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -611,29 +711,58 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": false,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "--> **keine Duplikatzeilen** in den Daten"
+    "--> **no duplicated rows** in the data"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "Include": true,
+    "Paragraph": "Ziel Variable",
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Ziel Variable"
+    "### Target variable"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "Wir haben ein Klassifizierungsproblem und die Zielspalte ist die Spalte \"Aktivität\"."
+    "This is a classification problem, and the target column is \"Activity\"."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 45,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -655,7 +784,14 @@
   {
    "cell_type": "code",
    "execution_count": 46,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -686,15 +822,29 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "* Im Smart Farming Kontext dieser Aufgabe würde ein Rind normalerweise nicht auf einer Treppe laufen. Daher werden die Zeilen mit den Bezeichnungen \"WALKING_DOWNSTAIRS\" und \"WALKING_UPSTAIRS\" entfernt:"
+    "* In the smart-farming context of this task, cattle would not normally walk on stairs. The rows labeled \"WALKING_DOWNSTAIRS\" and \"WALKING_UPSTAIRS\" are therefore removed:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 47,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "indexNames = df[(df['Activity'] == 'WALKING_DOWNSTAIRS') | (df['Activity'] == 'WALKING_UPSTAIRS')].index\n",
@@ -705,7 +855,14 @@
   {
    "cell_type": "code",
    "execution_count": 48,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -736,23 +893,44 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "- Für jede Aktivität etwa die gleiche Anzahl von Beobachtungen. Wir könnten vor der Modellierung eine Überstichprobe aller Minderheitenklassen nehmen, um einen perfekt ausgewogenen Datensatz zu erhalten."
+    "- Each activity has approximately the same number of observations. We could oversample the minority classes before modeling to obtain a perfectly balanced dataset."
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "Paragraph": " Wie viele Beobachtungen gibt es zu jedem Thema?",
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Wie viele Beobachtungen gibt es von jedem Testobjekt??"
+    "### How many observations exist for each subject?"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 49,
    "metadata": {
-    "scrolled": true
+    "editable": true,
+    "include": true,
+    "scrolled": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
    },
    "outputs": [
     {
@@ -784,22 +962,44 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "Die obige Abbildung ist interessant. Normalerweise haben alle Versuchspersonen die gleiche Versuchsreihe durchgeführt. Daher würde man annehmen, dass die Anzahl der Beobachtungen von jedem Probanden nahezu gleich sein muss. Es gibt jedoch eine Spanne von 200 bis 300 Beobachtungen über alle Versuchspersonen hinweg. Ein Grund dafür könnte sein, dass der Wechsel von einer Aktivität zur nächsten in der Sequenz nicht scharf oder deutlich genug war und die Beobachter die unklaren Beobachtungen in diesen unstabilen Phasen nachträglich gelöscht haben.  "
+    "The figure above is interesting. All subjects performed the same experimental sequence, so we would expect nearly the same number of observations from each of them. Instead, the count ranges from 200 to 300 observations across all subjects. One possible reason is that the transition from one activity to the next was not sharp or clear enough, and the experimenters later deleted the ambiguous observations from these unstable phases."
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Gesamtzahl der Beobachtungen",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Gesamtzahl der Beobachtungen"
+    "### Total number of observations"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 50,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -815,16 +1015,31 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "PCA fuer Visionalisation",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### PCA zur Visualisierung\n",
-    "- **Die PCA ist eine einfache Methode zur Visualisierung hochdimensionaler Daten in einem niedrigdimensionalen Raum. (Vorsicht: wir bezahlen dafür mit einem Informationsverlust-->aber für Visualisierungszwecke ist es OK)"
+    "### PCA for visualization\n",
+    "**Principal component analysis (PCA)** is a simple way to visualize high-dimensional data in a low-dimensional space. (Caution: we pay for this with a loss of information, but for visualization purposes that is acceptable.)"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 51,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "data_visualisation = df.drop('subject', axis =1).drop('Activity', axis=1).drop('type', axis =1)"
@@ -833,7 +1048,14 @@
   {
    "cell_type": "code",
    "execution_count": 52,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "s = StandardScaler()\n",
@@ -843,7 +1065,14 @@
   {
    "cell_type": "code",
    "execution_count": 53,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "#We want 3 \n",
@@ -855,7 +1084,14 @@
   {
    "cell_type": "code",
    "execution_count": 54,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -872,7 +1108,14 @@
   {
    "cell_type": "code",
    "execution_count": 55,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -889,7 +1132,14 @@
   {
    "cell_type": "code",
    "execution_count": 56,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -909,7 +1159,14 @@
   {
    "cell_type": "code",
    "execution_count": 57,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -949,7 +1206,14 @@
   {
    "cell_type": "code",
    "execution_count": 58,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -980,7 +1244,14 @@
   {
    "cell_type": "code",
    "execution_count": 59,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -1016,7 +1287,14 @@
   {
    "cell_type": "code",
    "execution_count": 60,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -23187,6 +23465,26 @@
           "y": 0.8999775657179135,
           "z": 0.6377980341399246
          },
+         "camera": {
+          "center": {
+           "x": 0,
+           "y": 0,
+           "z": 0
+          },
+          "eye": {
+           "x": 2.3987812973900966,
+           "y": 2.398781297390096,
+           "z": 2.3987812973900966
+          },
+          "projection": {
+           "type": "perspective"
+          },
+          "up": {
+           "x": 0,
+           "y": 0,
+           "z": 1
+          }
+         },
          "domain": {
           "x": [
            0,
@@ -24075,34 +24373,61 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Ergebnisse Visualisierung mit PCA\n",
-    "\n",
-    "* Nach der PCA/Transformation in 3 Hauptkomponenten können wir die 3 Klassen in angemessener Weise visuell trennen. \n",
-    "* Allerdings beschreiben die 3 Hauptkomponenten nur 62% der Varianz der ursprünglichen Daten. Das bedeutet, dass wir einen relativ großen Informationsverlust haben. \n",
-    "* Ein zweites Problem ist, dass es schwierig ist, Modelle zu interpretieren, die auf dem Ergebnis einer PCA basieren\n",
-    "\n",
+    "### Results of the PCA visualization\n",
+    "* After the PCA transformation into 3 principal components, the 3 classes can be separated visually reasonably well.\n",
+    "* However, the 3 principal components describe only 62% of the variance of the original data, which means a relatively large loss of information.\n",
+    "* A second problem is that models built on the output of a PCA are hard to interpret.\n\n",
     "-->wir verwenden die PCA nur zur Visualisierung und nicht zur Modellierung"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Funktionsübersicht",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Feature Übersicht"
+    "### Feature overview"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "Aus dem Originalpapier über die Daten/Quelle der Daten können wir die Information erhalten, dass es die folgenden 17 Hauptmerkmale im Zeit- und Frequenzbereich des Signals gibt:"
+    "From the original paper describing the data, we learn that the signals have the following 17 main features in the time and frequency domains:"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
     "|Name|Time|Freq.|\n",
     "| --- | --- | --- |\n",
@@ -24121,7 +24446,14 @@
   {
    "cell_type": "code",
    "execution_count": 61,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "data_temp = df.drop('subject', axis =1).drop('Activity', axis=1).drop('type', axis =1)"
@@ -24130,7 +24462,14 @@
   {
    "cell_type": "code",
    "execution_count": 62,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -24147,7 +24486,14 @@
   {
    "cell_type": "code",
    "execution_count": 63,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -24167,15 +24513,30 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "auf Multikollinearität prüfen",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Multikollinearität prüfen"
+    "### Check for Multicollinearity"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 65,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "variables = data_temp\n",
@@ -24185,7 +24546,14 @@
   {
    "cell_type": "code",
    "execution_count": 66,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -24786,7 +25154,14 @@
   {
    "cell_type": "code",
    "execution_count": 67,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# add name of features to vif dataframe\n",
@@ -24799,7 +25174,14 @@
   {
    "cell_type": "code",
    "execution_count": 68,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -24916,7 +25298,14 @@
   {
    "cell_type": "code",
    "execution_count": 69,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "#Drop all features with a VIF > 10, as they are correlating to much with other features\n",
@@ -24926,7 +25315,14 @@
   {
    "cell_type": "code",
    "execution_count": 70,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "data_with_important_features = data_temp.drop(features_to_drop, axis = 1)"
@@ -24935,7 +25331,14 @@
   {
    "cell_type": "code",
    "execution_count": 71,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -25179,7 +25582,14 @@
   {
    "cell_type": "code",
    "execution_count": 72,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -25742,7 +26152,14 @@
   {
    "cell_type": "code",
    "execution_count": 73,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -25778,7 +26195,14 @@
   {
    "cell_type": "code",
    "execution_count": 74,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -25816,44 +26240,58 @@
    "metadata": {
     "editable": true,
     "include": true,
-    "paragraph": "DataUnderstanding",
+    "paragraph": "Zusammenfassung des Datenverständnisses",
     "slideshow": {
      "slide_type": ""
     },
     "tags": []
    },
    "source": [
-    "### Zusammenfassung des Datenverständnisses\n",
-    "* 1 Spalte mit der Subjekt-ID 'subject'-->nur zur Filterung, muss entfernt werden\n",
-    "* 1 Zielspalte 'Aktivität'-->4 Klassen (GEHEN, SITZEN, LIEGEN, STEHEN), etwas unausgewogen\n",
-    "* wir haben ein Klassifizierungsproblem mit mehreren Klassen ***überwachtes maschinelles Lernen***\n",
-    "* 1 \"support column\" source-->is the observation from the train or test group-->train test split along this value\n",
-    "* 1 \"Unterstützungsspalte\"-Quelle-->ist die Beobachtung aus der Trainings- oder Testgruppe-->Train-Test-Aufteilung entlang dieses Wertes \n",
-    "* Diese wurden nach Möglichkeit für jede der drei Achsen x, y und z berechnet\n",
-    "* In der Summe sind das 561 Merkmale\n",
-    "-->Die Analyse der einzelnen Merkmale im Detail ist aufgrund der großen Anzahl von Merkmalen sehr schwierig. Schließlich haben wir Multilinerarität in den Daten\n",
+    "### Summary of the Data Understanding\n",
+    "\n",
+    "* 1 column with the subject ID 'subject' --> used only for filtering, must be removed\n",
     "\n",
+    "* 1 target column 'Activity' --> 4 classes (WALKING, SITTING, LYING, STANDING), somewhat imbalanced\n",
+    "\n",
+    "* We have a multi-class classification problem --> supervised machine learning\n",
+    "\n",
+    "* 1 \"support column\" giving the source --> indicates whether the observation belongs to the train or the test group --> the train-test split is made along this value\n",
+    "\n",
+    "* In addition, there are many features computed from the 17 signal features in the time and frequency domains, for example mean, min, max, standard deviation, signal magnitude area, etc.\n",
+    "\n",
+    "* Where possible, these were computed for each of the three axes x, y and z\n",
+    "\n",
+    "* In total this gives 561 features --> because of the large number of features, it is very difficult to analyse each feature in detail. In addition, the data exhibit multicollinearity\n",
+    "\n",
+    "* After the multicollinearity check, only 39 features remained. The boxplots show outliers in some features, but these outliers are difficult to analyse and interpret because the data appear to be already partly standardized and we have no units of measurement. For now, we accept these outliers\n"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Train Test Split",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### "
+    "### Train Test Split"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 75,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "data = df.copy()"
@@ -25862,7 +26300,14 @@
   {
    "cell_type": "code",
    "execution_count": 76,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "data_train = data[data['type']=='train']\n",
@@ -25872,7 +26317,14 @@
   {
    "cell_type": "code",
    "execution_count": 78,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "X_train = data_train.drop('subject', axis =1).drop('Activity', axis=1).drop('type', axis =1)\n",
@@ -25887,7 +26339,14 @@
   {
    "cell_type": "code",
    "execution_count": 79,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -25909,15 +26368,30 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Data Sampling",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "### Oversampling von Daten"
+    "### Data Oversampling"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 80,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "ros = RandomOverSampler(random_state=0)\n",
@@ -25927,7 +26401,14 @@
   {
    "cell_type": "code",
    "execution_count": 81,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -25958,9 +26439,17 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Modellierung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "# Modellierung"
+    "# Modeling"
    ]
   },
   {
@@ -25968,26 +26457,40 @@
    "metadata": {
     "editable": true,
     "include": true,
-    "paragraph": "Modeling",
     "slideshow": {
      "slide_type": ""
     },
     "tags": []
    },
    "source": [
-    "Um verschiedene Modelle zu erstellen, ist es eine gute Praxis, Pipelines und GridSearchCV zu verwenden. Diese beiden Tools sind ein einfacher Weg, um Hyperparamter-Tuning und k-fold Cross-Validation auf verschiedenen Modellen durchzuführen. In diesem Fall bestehen die Pipelines aus einem StandardScaler und einem One vs Rest-Klassifikator"
+    "To build different models, it is good practice to use Pipelines and GridSearchCV. These two tools are a simple way to perform hyperparameter tuning and k-fold cross-validation on different models. In this case, the pipelines consist of a StandardScaler and a One-vs-Rest classifier."
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "__Erstellen & Evaluieren verschiedener Modelle__"
+    "__Build & Evaluate Different Models__"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "KNN",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
     "## KNN\n",
     "K-Nearest Neighbors"
@@ -25996,7 +26499,14 @@
   {
    "cell_type": "code",
    "execution_count": 82,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# Pipeline for KNN\n",
@@ -26009,7 +26519,14 @@
   {
    "cell_type": "code",
    "execution_count": 83,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -26029,7 +26546,14 @@
   {
    "cell_type": "code",
    "execution_count": 84,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -26241,7 +26765,14 @@
   {
    "cell_type": "code",
    "execution_count": 85,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -26260,7 +26791,14 @@
   {
    "cell_type": "code",
    "execution_count": 86,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -26411,7 +26949,14 @@
   {
    "cell_type": "code",
    "execution_count": 87,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "classification_report_test = classification_report(y_test, clf.best_estimator_.predict(X_test), output_dict=True)\n",
@@ -26421,7 +26966,14 @@
   {
    "cell_type": "code",
    "execution_count": 88,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "#helper function\n",
@@ -26442,7 +26994,14 @@
   {
    "cell_type": "code",
    "execution_count": 89,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -26615,7 +27174,14 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
     "## Random Forest"
    ]
@@ -26623,7 +27189,14 @@
   {
    "cell_type": "code",
    "execution_count": 90,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# Pipeline\n",
@@ -26636,7 +27209,14 @@
   {
    "cell_type": "code",
    "execution_count": 91,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "param_grid = {'ranfo__estimator__max_depth': [5, 15, 30]}"
@@ -26645,7 +27225,14 @@
   {
    "cell_type": "code",
    "execution_count": 92,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -26845,7 +27432,14 @@
   {
    "cell_type": "code",
    "execution_count": 93,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -26864,7 +27458,14 @@
   {
    "cell_type": "code",
    "execution_count": 94,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "classification_report_test = classification_report(y_test, clf2.best_estimator_.predict(X_test), output_dict=True)\n",
@@ -26874,7 +27475,14 @@
   {
    "cell_type": "code",
    "execution_count": 95,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -27047,15 +27655,30 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Logistic Regression",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Logistische Regression"
+    "## Logistic Regression"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 96,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# Pipeline\n",
@@ -27068,7 +27691,14 @@
   {
    "cell_type": "code",
    "execution_count": 97,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -27093,7 +27723,14 @@
   {
    "cell_type": "code",
    "execution_count": 98,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -27112,7 +27749,14 @@
   {
    "cell_type": "code",
    "execution_count": 99,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "classification_report_test = classification_report(y_test, clf3.best_estimator_.predict(X_test), output_dict=True)\n",
@@ -27122,7 +27766,14 @@
   {
    "cell_type": "code",
    "execution_count": 100,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -27295,15 +27946,30 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Decision Tree",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Entscheidungsbaum"
+    "## Decision Tree"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 101,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# Pipeline\n",
@@ -27316,7 +27982,14 @@
   {
    "cell_type": "code",
    "execution_count": 102,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -27341,7 +28014,14 @@
   {
    "cell_type": "code",
    "execution_count": 103,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -27360,7 +28040,14 @@
   {
    "cell_type": "code",
    "execution_count": 104,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "classification_report_test = classification_report(y_test, clf4.best_estimator_.predict(X_test), output_dict=True)\n",
@@ -27370,7 +28057,15 @@
   {
    "cell_type": "code",
    "execution_count": 105,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "SVM",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -27543,7 +28238,14 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
     "## SVM\n",
     "Support Vector Machine"
@@ -27552,7 +28254,14 @@
   {
    "cell_type": "code",
    "execution_count": 106,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "# Pipeline\n",
@@ -27565,7 +28274,14 @@
   {
    "cell_type": "code",
    "execution_count": 107,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "param_grid = {\n",
@@ -27575,7 +28291,14 @@
   {
    "cell_type": "code",
    "execution_count": 108,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -27600,7 +28323,14 @@
   {
    "cell_type": "code",
    "execution_count": 109,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "name": "stdout",
@@ -27619,7 +28349,14 @@
   {
    "cell_type": "code",
    "execution_count": 110,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -27707,7 +28444,14 @@
   {
    "cell_type": "code",
    "execution_count": 111,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "classification_report_test = classification_report(y_test, clf5.best_estimator_.predict(X_test), output_dict=True)\n",
@@ -27717,7 +28461,14 @@
   {
    "cell_type": "code",
    "execution_count": 112,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -27890,15 +28641,30 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Zusammenfassung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Zusammenfassung"
+    "## Summary"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 113,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "frames = [results_KNN, \n",
@@ -27913,7 +28679,14 @@
   {
    "cell_type": "code",
    "execution_count": 114,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -28523,7 +29296,14 @@
   {
    "cell_type": "code",
    "execution_count": 115,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "final_results.sort_index(level=1).to_csv('Results_VIF_lower10.csv')"
@@ -28531,7 +29311,15 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Evaluation",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
     "# 5.  Evaluation"
    ]
@@ -28539,7 +29327,14 @@
   {
    "cell_type": "code",
    "execution_count": 116,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "#helper function\n",
@@ -28571,9 +29366,17 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Genauigkeit erreicht",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Erzielte Genauigkeit"
+    "## Achieved accuracy"
    ]
   },
   {
@@ -28581,6 +29384,7 @@
    "execution_count": 117,
    "metadata": {
     "editable": true,
+    "include": true,
     "slideshow": {
      "slide_type": ""
     },
@@ -28689,7 +29493,14 @@
   {
    "cell_type": "code",
    "execution_count": 118,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -28726,7 +29537,14 @@
   {
    "cell_type": "code",
    "execution_count": 119,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -28882,32 +29700,46 @@
    "metadata": {
     "editable": true,
     "include": true,
-    "paragraph": "Evaluation",
     "slideshow": {
      "slide_type": ""
     },
     "tags": []
    },
    "source": [
-    "* **KNN** --> Die Leistung auf Testdaten ist ca. 8% niedriger als auf Trainingsdaten. Wahrscheinlich passt sich das Modell zu sehr an die Trainingsdaten an und lässt sich nicht gut auf ungesehene Testdaten verallgemeinern.\n",
-    "* **Random Forest** --> Die Leistung bei den Testdaten ist 15% geringer als bei den Trainingsdaten. Das Modell **übererfüllt** die Trainingsdaten und verallgemeinert sich nicht gut auf die ungesehenen Testdaten.\n",
-    "* **Logistische Regression** --> Trainings- und Testleistung sind sehr ähnlich (diff=3.9%). Dies bedeutet wahrscheinlich, dass das erstellte Modell **gut** auf neue Daten verallgemeinert.\n",
-    "* **Entscheidungsbaum** --> Die Leistung bei den Testdaten ist 34% niedriger als bei den Trainingsdaten. Das Modell passt sich zu sehr an die Trainingsdaten an und verallgemeinert **nicht gut** auf die unbekannten Testdaten.\n",
-    "* **Support Vector Machine** --> Trainings- und Testleistung sind sehr ähnlich (diff=4.8%). Das bedeutet wahrscheinlich, dass das erstellte Modell **gut** auf neue Daten verallgemeinert.\n",
-    "-->Überprüfen wir die Konfusionsmatrix der beiden besten Modelle"
+    "* **KNN** --> Performance on the test data is approx. 8 % lower than on the training data. The model probably overfits the training data and does not generalize well to unseen test data.\n",
+    "* **Random Forest** --> Performance on the test data is 15 % lower than on the training data. The model **overfits** the training data and does not generalize well to unseen test data.\n",
+    "* **Logistic Regression** --> Training and test performance are very similar (difference = 3.9 %). This probably means that the model generalizes **well** to new data.\n",
+    "* **Decision Tree** --> Performance on the test data is 34 % lower than on the training data. The model overfits the training data and does **not** generalize well to unseen test data.\n",
+    "* **Support Vector Machine** --> Training and test performance are very similar (difference = 4.8 %). This probably means that the model generalizes **well** to new data.\n",
+    "--> Let us check the confusion matrix of the two best models"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Confusion Matrix-Logistic Regression",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Konfusionsmatrix-Logistische Regression"
+    "## Confusion Matrix-Logistic Regression"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 120,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -28928,15 +29760,30 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Confusion Matrix SVM",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Konfusionsmatrix-SVM"
+    "## Confusion Matrix-SVM"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 121,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -28959,7 +29806,13 @@
    "cell_type": "code",
    "execution_count": 122,
    "metadata": {
-    "scrolled": true
+    "editable": true,
+    "include": true,
+    "scrolled": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
    },
    "outputs": [
     {
@@ -29143,18 +29996,32 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Funktionsbedeutung",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Wichtigkeit der Merkmale\n",
+    "## Feature Importance\n",
     "\n",
-    "* \n",
-    "Ein One vs Rest Classifier besteht aus n einzelnen Klassifikatoren (mit n=Anzahl der Klassen). Das macht es schwierig, die Merkmalsbedeutung für den gesamten OvsR-Klassifikator zu interpretieren. Eine Möglichkeit besteht darin, die Koeffizienten für jedes einzelne Modell zu bestimmen und den Mittelwert über jeden Koeffizienten zu bilden.\n"
+    "* A One-vs-Rest classifier consists of n individual classifiers (where n = number of classes). This makes it difficult to interpret the feature importance for the OvsR classifier as a whole. One option is to determine the coefficients of each individual model and average each coefficient across the models."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 123,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "def get_the_feature_importances(clfobject,est_name, functype):\n",
@@ -29174,7 +30041,14 @@
   {
    "cell_type": "code",
    "execution_count": 124,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "feature_importance = pd.DataFrame({},[])\n",
@@ -29184,18 +30058,33 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "jp-MarkdownHeadingCollapsed": true,
+    "paragraph": "ergebnis",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Ergebnisse"
+    "## Results"
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "* Das beste Modell ist der OvsR-Klassifikator mit logistischer Regression. \n",
-    "* Es hat eine gute Genauigkeit (0.77:test, 0.81:train) und eine gute Präzision (0.87:test) für die Klasse 'WALKING'.\n",
-    "* Die 10 wichtigsten Eigenschaften dieses OvsR-Klassifikators sind die folgenden:"
+    "* The best model is the Logistic Regression OvsR Classifier. \n",
+    "* It has a good accuracy (0.77:test, 0.81:train) and a good precision (0.87:test) for the class 'WALKING'.\n",
+    "* The 10 most important features of this OvsR classifier are the following:"
    ]
   },
   {
@@ -29312,20 +30201,35 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "-->Die meisten dieser wichtigen Funktionen stammen vom Gyroskopsensor im Smartphone."
+    "--> Most of these important features come from the **gyroscope sensor** in the smartphone."
    ]
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Modell Deployment",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
-    "## Endgültiges Modell für das Deployment"
+    "## Final model for deployment"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 126,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [],
    "source": [
     "final_model = clf3.best_estimator_"
@@ -29333,16 +30237,31 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "paragraph": "Deployment",
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "source": [
     "# 6. Deployment\n",
-    "Teste die Prognose lokal"
+    "Test the prediction locally:"
    ]
   },
   {
    "cell_type": "code",
    "execution_count": 127,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -29561,7 +30480,14 @@
   {
    "cell_type": "code",
    "execution_count": 128,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -29586,7 +30512,14 @@
   {
    "cell_type": "code",
    "execution_count": 129,
-   "metadata": {},
+   "metadata": {
+    "editable": true,
+    "include": true,
+    "slideshow": {
+     "slide_type": ""
+    },
+    "tags": []
+   },
    "outputs": [
     {
      "data": {
@@ -29608,7 +30541,6 @@
    "metadata": {
     "editable": true,
     "include": true,
-    "paragraph": "Deployment",
     "slideshow": {
      "slide_type": ""
     },
@@ -29621,7 +30553,8 @@
  ],
  "metadata": {
   "branche": "Landwirtschaft",
-  "dataSource": "https://storage.googleapis.com/ml-service-repository-datastorage/Analysis_of_the_movement_and_activity_of_free-ranging_cattle_test.csv",
+  "category": "Forschung",
+  "dataSource": "https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones",
   "funktion": "Operations",
   "kernelspec": {
    "display_name": "Python 3 (ipykernel)",
@@ -29640,10 +30573,10 @@
    "pygments_lexer": "ipython3",
    "version": "3.12.3"
   },
-  "repoLink": "https://gitlab.reutlingen-university.de/ki_lab/machine-learning-services/-/tree/main/Agriculture/Analysis%20of%20the%20movement%20and%20activity%20of%20free-ranging%20cattle",
+  "repoLink": "https://github.com/AlexRossmann/machine-learning-services/tree/main/Agriculture/Analysis%20of%20the%20movement%20and%20activity%20of%20free-ranging%20cattle",
   "skipNotebookInDeployment": false,
-  "teaser": "To be done",
-  "title": "Analyse der Bewegungsprofile von Kühen auf dem Feld"
+  "teaser": "Landwirte geraten immer weiter unter wirtschaftlichen Druck. Beispielsweise kam es durch die Coronapandemie zu teuren logistischen Problemen. Gleichzeitig stellen immer mehr Konsumenten höhere Anforderungen an eine artgerechte Tierhaltung. Diese Marktbedingungen erscheinen für Landwirte problematisch, da letzterer Aspekt natürlich mit gewissen Kosten verbunden ist und ein generelles Umdenken notwendig macht ",
+  "title": "Erkennung der Aktivitäten von Kühen auf dem Feld"
  },
  "nbformat": 4,
  "nbformat_minor": 4
diff --git a/Tourism/Prediction cancellation of hotel bookings/notebook.ipynb b/Tourism/Prediction cancellation of hotel bookings/notebook.ipynb
deleted file mode 100644
index b03d30e..0000000
--- a/Tourism/Prediction cancellation of hotel bookings/notebook.ipynb	
+++ /dev/null
@@ -1,2641 +0,0 @@
-{
- "cells": [
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Geschäftsverständnis",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "# 1. Geschäftsverständnis"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "Abschätzung des Kundenverhaltens bezüglich Hotelstornierungen zur Planung von Kapazitäten. Der Anwendungsfall soll testen, ob es möglich ist,\n",
-    "Hotelstornierungen vorherzusagen.Ein wichtiges Ziel für jedes Unternehmen liegt in der Erhaltung wertvoller Kundenbeziehungen, \n",
-    "postive Ranking und Kapazitat plannung. Für Unternehmen ist daher eine Einschätzung der Kundenvehalten bei Buchungstonierung notwendig, \n",
-    "So dass sich das Risiko der Buchungstonierung eines Kunden vorab einschätzen lässt, können Gegenmaßnahmen eingeleitet werden."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Daten und Datenverständnis",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "# 2. Daten und Datenverständnis"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "Die Daten Rahmen sind fuer Hotel Buchungen und Der Datensatz für diese Demo wurde auf der Kaggle Data Science Plattform als c.s.v  file veröffentlicht. \n",
-    "In den Datensätzen sind allerlei Daten zu den Gästen erfasst. Merkmale zu Familie mit Kindern Buchungen über Reisebüros etc. können Aufschluss \n",
-    "darüber geben, ob bei ihnen eine höhere Stornoquote vorliegt. Der Datenrahmen enthält Buchungsinformationen von 2 verschiedenen Hotels und die\n",
-    "Anzahl der Beobachtungen = 119390, Anzahl der Merkmale = 32\n",
-    "Correlation Analysis: stays in weekend nights and stays in week night with 0.5."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Import von Relevant Module",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "# 2.1. Import von Relevant Module"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 2,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "import pandas as pd\n",
-    "import numpy as np\n",
-    "import matplotlib.pyplot as plt\n",
-    "import seaborn as sns\n",
-    "sns.set()\n",
-    "\n",
-    "from sklearn.model_selection import train_test_split\n",
-    "from sklearn.tree import DecisionTreeClassifier\n",
-    "from sklearn.ensemble import RandomForestClassifier\n",
-    "from sklearn.linear_model import LogisticRegression\n",
-    "from sklearn import metrics\n",
-    "from sklearn.metrics import confusion_matrix, classification_report"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Daten Auslesen",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 2.2. Daten Auslesen"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 3,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>hotel</th>\n",
-       "      <th>is_canceled</th>\n",
-       "      <th>lead_time</th>\n",
-       "      <th>arrival_date_year</th>\n",
-       "      <th>arrival_date_month</th>\n",
-       "      <th>arrival_date_week_number</th>\n",
-       "      <th>arrival_date_day_of_month</th>\n",
-       "      <th>stays_in_weekend_nights</th>\n",
-       "      <th>stays_in_week_nights</th>\n",
-       "      <th>adults</th>\n",
-       "      <th>...</th>\n",
-       "      <th>deposit_type</th>\n",
-       "      <th>agent</th>\n",
-       "      <th>company</th>\n",
-       "      <th>days_in_waiting_list</th>\n",
-       "      <th>customer_type</th>\n",
-       "      <th>adr</th>\n",
-       "      <th>required_car_parking_spaces</th>\n",
-       "      <th>total_of_special_requests</th>\n",
-       "      <th>reservation_status</th>\n",
-       "      <th>reservation_status_date</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <td>0</td>\n",
-       "      <td>Resort Hotel</td>\n",
-       "      <td>0</td>\n",
-       "      <td>342</td>\n",
-       "      <td>2015</td>\n",
-       "      <td>July</td>\n",
-       "      <td>27</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>...</td>\n",
-       "      <td>No Deposit</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Transient</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Check-Out</td>\n",
-       "      <td>2015-07-01</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>1</td>\n",
-       "      <td>Resort Hotel</td>\n",
-       "      <td>0</td>\n",
-       "      <td>737</td>\n",
-       "      <td>2015</td>\n",
-       "      <td>July</td>\n",
-       "      <td>27</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>...</td>\n",
-       "      <td>No Deposit</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Transient</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Check-Out</td>\n",
-       "      <td>2015-07-01</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>2</td>\n",
-       "      <td>Resort Hotel</td>\n",
-       "      <td>0</td>\n",
-       "      <td>7</td>\n",
-       "      <td>2015</td>\n",
-       "      <td>July</td>\n",
-       "      <td>27</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>...</td>\n",
-       "      <td>No Deposit</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Transient</td>\n",
-       "      <td>75.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Check-Out</td>\n",
-       "      <td>2015-07-02</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>3</td>\n",
-       "      <td>Resort Hotel</td>\n",
-       "      <td>0</td>\n",
-       "      <td>13</td>\n",
-       "      <td>2015</td>\n",
-       "      <td>July</td>\n",
-       "      <td>27</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>...</td>\n",
-       "      <td>No Deposit</td>\n",
-       "      <td>304.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Transient</td>\n",
-       "      <td>75.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Check-Out</td>\n",
-       "      <td>2015-07-02</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>4</td>\n",
-       "      <td>Resort Hotel</td>\n",
-       "      <td>0</td>\n",
-       "      <td>14</td>\n",
-       "      <td>2015</td>\n",
-       "      <td>July</td>\n",
-       "      <td>27</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>2</td>\n",
-       "      <td>...</td>\n",
-       "      <td>No Deposit</td>\n",
-       "      <td>240.0</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0</td>\n",
-       "      <td>Transient</td>\n",
-       "      <td>98.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>Check-Out</td>\n",
-       "      <td>2015-07-03</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>5 rows × 32 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "          hotel  is_canceled  lead_time  arrival_date_year arrival_date_month  \\\n",
-       "0  Resort Hotel            0        342               2015               July   \n",
-       "1  Resort Hotel            0        737               2015               July   \n",
-       "2  Resort Hotel            0          7               2015               July   \n",
-       "3  Resort Hotel            0         13               2015               July   \n",
-       "4  Resort Hotel            0         14               2015               July   \n",
-       "\n",
-       "   arrival_date_week_number  arrival_date_day_of_month  \\\n",
-       "0                        27                          1   \n",
-       "1                        27                          1   \n",
-       "2                        27                          1   \n",
-       "3                        27                          1   \n",
-       "4                        27                          1   \n",
-       "\n",
-       "   stays_in_weekend_nights  stays_in_week_nights  adults  ...  deposit_type  \\\n",
-       "0                        0                     0       2  ...    No Deposit   \n",
-       "1                        0                     0       2  ...    No Deposit   \n",
-       "2                        0                     1       1  ...    No Deposit   \n",
-       "3                        0                     1       1  ...    No Deposit   \n",
-       "4                        0                     2       2  ...    No Deposit   \n",
-       "\n",
-       "   agent company days_in_waiting_list customer_type   adr  \\\n",
-       "0    NaN     NaN                    0     Transient   0.0   \n",
-       "1    NaN     NaN                    0     Transient   0.0   \n",
-       "2    NaN     NaN                    0     Transient  75.0   \n",
-       "3  304.0     NaN                    0     Transient  75.0   \n",
-       "4  240.0     NaN                    0     Transient  98.0   \n",
-       "\n",
-       "   required_car_parking_spaces  total_of_special_requests  reservation_status  \\\n",
-       "0                            0                          0           Check-Out   \n",
-       "1                            0                          0           Check-Out   \n",
-       "2                            0                          0           Check-Out   \n",
-       "3                            0                          0           Check-Out   \n",
-       "4                            0                          1           Check-Out   \n",
-       "\n",
-       "  reservation_status_date  \n",
-       "0              2015-07-01  \n",
-       "1              2015-07-01  \n",
-       "2              2015-07-02  \n",
-       "3              2015-07-02  \n",
-       "4              2015-07-03  \n",
-       "\n",
-       "[5 rows x 32 columns]"
-      ]
-     },
-     "execution_count": 3,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "# my_file = project.get_file()\n",
-    "\n",
-    "# my_file.seek(0)\n",
-    "df = pd.read_csv(\"https://storage.googleapis.com/ml-service-repository-datastorage/Prediction_cancellation_of_hotel_bookings_data.csv\")\n",
-    "\n",
-    "df.head()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 4,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "def attribute_description(data):\n",
-    "    longestColumnName = len(max(np.array(data.columns), key=len))\n",
-    "    for col in data.columns:\n",
-    "        description = ''\n",
-    "        col_dropna = data[col].dropna()\n",
-    "        example = col_dropna.sample(1).values[0]\n",
-    "        if type(example) == str:\n",
-    "            description = 'str '\n",
-    "            if len(col_dropna.unique()) < 10:\n",
-    "                description += '['\n",
-    "                description += '; '.join([ f'\"{name}\"' for name in col_dropna.unique()])\n",
-    "                description += ']'\n",
-    "            else:\n",
-    "                description += '[ example: \"'+ example + '\" ]'\n",
-    "        else:\n",
-    "            description = str(type(example))\n",
-    "        print(col.ljust(longestColumnName)+ f':   {description}')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 5,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "hotel                         :   str [\"Resort Hotel\"; \"City Hotel\"]\n",
-      "is_canceled                   :   <class 'numpy.int64'>\n",
-      "lead_time                     :   <class 'numpy.int64'>\n",
-      "arrival_date_year             :   <class 'numpy.int64'>\n",
-      "arrival_date_month            :   str [ example: \"May\" ]\n",
-      "arrival_date_week_number      :   <class 'numpy.int64'>\n",
-      "arrival_date_day_of_month     :   <class 'numpy.int64'>\n",
-      "stays_in_weekend_nights       :   <class 'numpy.int64'>\n",
-      "stays_in_week_nights          :   <class 'numpy.int64'>\n",
-      "adults                        :   <class 'numpy.int64'>\n",
-      "children                      :   <class 'numpy.float64'>\n",
-      "babies                        :   <class 'numpy.int64'>\n",
-      "meal                          :   str [\"BB\"; \"FB\"; \"HB\"; \"SC\"; \"Undefined\"]\n",
-      "country                       :   str [ example: \"SWE\" ]\n",
-      "market_segment                :   str [\"Direct\"; \"Corporate\"; \"Online TA\"; \"Offline TA/TO\"; \"Complementary\"; \"Groups\"; \"Undefined\"; \"Aviation\"]\n",
-      "distribution_channel          :   str [\"Direct\"; \"Corporate\"; \"TA/TO\"; \"Undefined\"; \"GDS\"]\n",
-      "is_repeated_guest             :   <class 'numpy.int64'>\n",
-      "previous_cancellations        :   <class 'numpy.int64'>\n",
-      "previous_bookings_not_canceled:   <class 'numpy.int64'>\n",
-      "reserved_room_type            :   str [ example: \"A\" ]\n",
-      "assigned_room_type            :   str [ example: \"C\" ]\n",
-      "booking_changes               :   <class 'numpy.int64'>\n",
-      "deposit_type                  :   str [\"No Deposit\"; \"Refundable\"; \"Non Refund\"]\n",
-      "agent                         :   <class 'numpy.float64'>\n",
-      "company                       :   <class 'numpy.float64'>\n",
-      "days_in_waiting_list          :   <class 'numpy.int64'>\n",
-      "customer_type                 :   str [\"Transient\"; \"Contract\"; \"Transient-Party\"; \"Group\"]\n",
-      "adr                           :   <class 'numpy.float64'>\n",
-      "required_car_parking_spaces   :   <class 'numpy.int64'>\n",
-      "total_of_special_requests     :   <class 'numpy.int64'>\n",
-      "reservation_status            :   str [\"Check-Out\"; \"Canceled\"; \"No-Show\"]\n",
-      "reservation_status_date       :   str [ example: \"2017-02-27\" ]\n"
-     ]
-    }
-   ],
-   "source": [
-    "attribute_description(df)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 6,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>hotel</th>\n",
-       "      <th>is_canceled</th>\n",
-       "      <th>lead_time</th>\n",
-       "      <th>arrival_date_year</th>\n",
-       "      <th>arrival_date_month</th>\n",
-       "      <th>arrival_date_week_number</th>\n",
-       "      <th>arrival_date_day_of_month</th>\n",
-       "      <th>stays_in_weekend_nights</th>\n",
-       "      <th>stays_in_week_nights</th>\n",
-       "      <th>adults</th>\n",
-       "      <th>...</th>\n",
-       "      <th>deposit_type</th>\n",
-       "      <th>agent</th>\n",
-       "      <th>company</th>\n",
-       "      <th>days_in_waiting_list</th>\n",
-       "      <th>customer_type</th>\n",
-       "      <th>adr</th>\n",
-       "      <th>required_car_parking_spaces</th>\n",
-       "      <th>total_of_special_requests</th>\n",
-       "      <th>reservation_status</th>\n",
-       "      <th>reservation_status_date</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <td>count</td>\n",
-       "      <td>119390</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>...</td>\n",
-       "      <td>119390</td>\n",
-       "      <td>103050.000000</td>\n",
-       "      <td>6797.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390.000000</td>\n",
-       "      <td>119390</td>\n",
-       "      <td>119390</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>unique</td>\n",
-       "      <td>2</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>12</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>...</td>\n",
-       "      <td>3</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>4</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>3</td>\n",
-       "      <td>926</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>top</td>\n",
-       "      <td>City Hotel</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>August</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>...</td>\n",
-       "      <td>No Deposit</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Transient</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>Check-Out</td>\n",
-       "      <td>2015-10-21</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>freq</td>\n",
-       "      <td>79330</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>13877</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>...</td>\n",
-       "      <td>104641</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>89613</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>75166</td>\n",
-       "      <td>1461</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>mean</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.370416</td>\n",
-       "      <td>104.011416</td>\n",
-       "      <td>2016.156554</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>27.165173</td>\n",
-       "      <td>15.798241</td>\n",
-       "      <td>0.927599</td>\n",
-       "      <td>2.500302</td>\n",
-       "      <td>1.856403</td>\n",
-       "      <td>...</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>86.693382</td>\n",
-       "      <td>189.266735</td>\n",
-       "      <td>2.321149</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>101.831122</td>\n",
-       "      <td>0.062518</td>\n",
-       "      <td>0.571363</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>std</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.482918</td>\n",
-       "      <td>106.863097</td>\n",
-       "      <td>0.707476</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>13.605138</td>\n",
-       "      <td>8.780829</td>\n",
-       "      <td>0.998613</td>\n",
-       "      <td>1.908286</td>\n",
-       "      <td>0.579261</td>\n",
-       "      <td>...</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>110.774548</td>\n",
-       "      <td>131.655015</td>\n",
-       "      <td>17.594721</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>50.535790</td>\n",
-       "      <td>0.245291</td>\n",
-       "      <td>0.792798</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>min</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>2015.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1.000000</td>\n",
-       "      <td>1.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>...</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1.000000</td>\n",
-       "      <td>6.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>-6.380000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>25%</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>18.000000</td>\n",
-       "      <td>2016.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>16.000000</td>\n",
-       "      <td>8.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>1.000000</td>\n",
-       "      <td>2.000000</td>\n",
-       "      <td>...</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>9.000000</td>\n",
-       "      <td>62.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>69.290000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>50%</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>69.000000</td>\n",
-       "      <td>2016.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>28.000000</td>\n",
-       "      <td>16.000000</td>\n",
-       "      <td>1.000000</td>\n",
-       "      <td>2.000000</td>\n",
-       "      <td>2.000000</td>\n",
-       "      <td>...</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>14.000000</td>\n",
-       "      <td>179.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>94.575000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>75%</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1.000000</td>\n",
-       "      <td>160.000000</td>\n",
-       "      <td>2017.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>38.000000</td>\n",
-       "      <td>23.000000</td>\n",
-       "      <td>2.000000</td>\n",
-       "      <td>3.000000</td>\n",
-       "      <td>2.000000</td>\n",
-       "      <td>...</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>229.000000</td>\n",
-       "      <td>270.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>126.000000</td>\n",
-       "      <td>0.000000</td>\n",
-       "      <td>1.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>max</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>1.000000</td>\n",
-       "      <td>737.000000</td>\n",
-       "      <td>2017.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>53.000000</td>\n",
-       "      <td>31.000000</td>\n",
-       "      <td>19.000000</td>\n",
-       "      <td>50.000000</td>\n",
-       "      <td>55.000000</td>\n",
-       "      <td>...</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>535.000000</td>\n",
-       "      <td>543.000000</td>\n",
-       "      <td>391.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>5400.000000</td>\n",
-       "      <td>8.000000</td>\n",
-       "      <td>5.000000</td>\n",
-       "      <td>NaN</td>\n",
-       "      <td>NaN</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>11 rows × 32 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "             hotel    is_canceled      lead_time  arrival_date_year  \\\n",
-       "count       119390  119390.000000  119390.000000      119390.000000   \n",
-       "unique           2            NaN            NaN                NaN   \n",
-       "top     City Hotel            NaN            NaN                NaN   \n",
-       "freq         79330            NaN            NaN                NaN   \n",
-       "mean           NaN       0.370416     104.011416        2016.156554   \n",
-       "std            NaN       0.482918     106.863097           0.707476   \n",
-       "min            NaN       0.000000       0.000000        2015.000000   \n",
-       "25%            NaN       0.000000      18.000000        2016.000000   \n",
-       "50%            NaN       0.000000      69.000000        2016.000000   \n",
-       "75%            NaN       1.000000     160.000000        2017.000000   \n",
-       "max            NaN       1.000000     737.000000        2017.000000   \n",
-       "\n",
-       "       arrival_date_month  arrival_date_week_number  \\\n",
-       "count              119390             119390.000000   \n",
-       "unique                 12                       NaN   \n",
-       "top                August                       NaN   \n",
-       "freq                13877                       NaN   \n",
-       "mean                  NaN                 27.165173   \n",
-       "std                   NaN                 13.605138   \n",
-       "min                   NaN                  1.000000   \n",
-       "25%                   NaN                 16.000000   \n",
-       "50%                   NaN                 28.000000   \n",
-       "75%                   NaN                 38.000000   \n",
-       "max                   NaN                 53.000000   \n",
-       "\n",
-       "        arrival_date_day_of_month  stays_in_weekend_nights  \\\n",
-       "count               119390.000000            119390.000000   \n",
-       "unique                        NaN                      NaN   \n",
-       "top                           NaN                      NaN   \n",
-       "freq                          NaN                      NaN   \n",
-       "mean                    15.798241                 0.927599   \n",
-       "std                      8.780829                 0.998613   \n",
-       "min                      1.000000                 0.000000   \n",
-       "25%                      8.000000                 0.000000   \n",
-       "50%                     16.000000                 1.000000   \n",
-       "75%                     23.000000                 2.000000   \n",
-       "max                     31.000000                19.000000   \n",
-       "\n",
-       "        stays_in_week_nights         adults  ...  deposit_type          agent  \\\n",
-       "count          119390.000000  119390.000000  ...        119390  103050.000000   \n",
-       "unique                   NaN            NaN  ...             3            NaN   \n",
-       "top                      NaN            NaN  ...    No Deposit            NaN   \n",
-       "freq                     NaN            NaN  ...        104641            NaN   \n",
-       "mean                2.500302       1.856403  ...           NaN      86.693382   \n",
-       "std                 1.908286       0.579261  ...           NaN     110.774548   \n",
-       "min                 0.000000       0.000000  ...           NaN       1.000000   \n",
-       "25%                 1.000000       2.000000  ...           NaN       9.000000   \n",
-       "50%                 2.000000       2.000000  ...           NaN      14.000000   \n",
-       "75%                 3.000000       2.000000  ...           NaN     229.000000   \n",
-       "max                50.000000      55.000000  ...           NaN     535.000000   \n",
-       "\n",
-       "            company days_in_waiting_list customer_type            adr  \\\n",
-       "count   6797.000000        119390.000000        119390  119390.000000   \n",
-       "unique          NaN                  NaN             4            NaN   \n",
-       "top             NaN                  NaN     Transient            NaN   \n",
-       "freq            NaN                  NaN         89613            NaN   \n",
-       "mean     189.266735             2.321149           NaN     101.831122   \n",
-       "std      131.655015            17.594721           NaN      50.535790   \n",
-       "min        6.000000             0.000000           NaN      -6.380000   \n",
-       "25%       62.000000             0.000000           NaN      69.290000   \n",
-       "50%      179.000000             0.000000           NaN      94.575000   \n",
-       "75%      270.000000             0.000000           NaN     126.000000   \n",
-       "max      543.000000           391.000000           NaN    5400.000000   \n",
-       "\n",
-       "        required_car_parking_spaces  total_of_special_requests  \\\n",
-       "count                 119390.000000              119390.000000   \n",
-       "unique                          NaN                        NaN   \n",
-       "top                             NaN                        NaN   \n",
-       "freq                            NaN                        NaN   \n",
-       "mean                       0.062518                   0.571363   \n",
-       "std                        0.245291                   0.792798   \n",
-       "min                        0.000000                   0.000000   \n",
-       "25%                        0.000000                   0.000000   \n",
-       "50%                        0.000000                   0.000000   \n",
-       "75%                        0.000000                   1.000000   \n",
-       "max                        8.000000                   5.000000   \n",
-       "\n",
-       "        reservation_status reservation_status_date  \n",
-       "count               119390                  119390  \n",
-       "unique                   3                     926  \n",
-       "top              Check-Out              2015-10-21  \n",
-       "freq                 75166                    1461  \n",
-       "mean                   NaN                     NaN  \n",
-       "std                    NaN                     NaN  \n",
-       "min                    NaN                     NaN  \n",
-       "25%                    NaN                     NaN  \n",
-       "50%                    NaN                     NaN  \n",
-       "75%                    NaN                     NaN  \n",
-       "max                    NaN                     NaN  \n",
-       "\n",
-       "[11 rows x 32 columns]"
-      ]
-     },
-     "execution_count": 6,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "df.describe(include='all')"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Daten Vorbereitung",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 2.3. Data Cleaning"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 7,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "image/png": "\n",
-      "text/plain": [
-       "<Figure size 1296x1296 with 2 Axes>"
-      ]
-     },
-     "metadata": {
-      "needs_background": "light"
-     },
-     "output_type": "display_data"
-    }
-   ],
-   "source": [
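-    "# Correlate numeric columns only; object columns have no Pearson correlation\n",
-    "f, ax = plt.subplots(figsize=(18, 18))\n",
-    "sns.heatmap(df.select_dtypes(include='number').corr(), annot=True, linewidths=0.5, fmt=\".1f\", ax=ax)\n",
-    "plt.xticks(rotation=90)\n",
-    "plt.yticks(rotation=0)\n",
-    "plt.title('Correlation Map')\n",
-    "plt.show()"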
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 8,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "hotel                                  0\n",
-       "is_canceled                            0\n",
-       "lead_time                              0\n",
-       "arrival_date_year                      0\n",
-       "arrival_date_month                     0\n",
-       "arrival_date_week_number               0\n",
-       "arrival_date_day_of_month              0\n",
-       "stays_in_weekend_nights                0\n",
-       "stays_in_week_nights                   0\n",
-       "adults                                 0\n",
-       "children                               4\n",
-       "babies                                 0\n",
-       "meal                                   0\n",
-       "country                              488\n",
-       "market_segment                         0\n",
-       "distribution_channel                   0\n",
-       "is_repeated_guest                      0\n",
-       "previous_cancellations                 0\n",
-       "previous_bookings_not_canceled         0\n",
-       "reserved_room_type                     0\n",
-       "assigned_room_type                     0\n",
-       "booking_changes                        0\n",
-       "deposit_type                           0\n",
-       "agent                              16340\n",
-       "company                           112593\n",
-       "days_in_waiting_list                   0\n",
-       "customer_type                          0\n",
-       "adr                                    0\n",
-       "required_car_parking_spaces            0\n",
-       "total_of_special_requests              0\n",
-       "reservation_status                     0\n",
-       "reservation_status_date                0\n",
-       "dtype: int64"
-      ]
-     },
-     "execution_count": 8,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "df.isnull().sum()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 9,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['reservation_status'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 10,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['stays_in_weekend_nights'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 11,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['reservation_status_date'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 12,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['arrival_date_day_of_month'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 13,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['arrival_date_year'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 14,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['arrival_date_month'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 15,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['arrival_date_week_number'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 16,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['required_car_parking_spaces'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 17,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['previous_bookings_not_canceled'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 18,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['total_of_special_requests'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 19,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['agent'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 20,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['company'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 21,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.drop(['adr'], axis=1)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 22,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "df = df.dropna(axis=0)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Test auf Multikollinearität",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 2.4. Test for Multicollinearity"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 23,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "from statsmodels.stats.outliers_influence import variance_inflation_factor\n",
-    "\n",
-    "# Numeric columns checked for multicollinearity (note: the list also contains the target is_canceled)\n",
-    "variables = df[['lead_time', 'is_repeated_guest', 'adults', 'booking_changes', 'previous_cancellations', 'is_canceled', 'stays_in_week_nights', 'babies', 'days_in_waiting_list']]\n",
-    "\n",
-    "# One VIF value per column, in the same order as variables.columns\n",
-    "vif = pd.DataFrame()\n",
-    "vif['VIF'] = [variance_inflation_factor(variables.values, i) for i in range(variables.shape[1])]\n",
-    "vif['Features'] = variables.columns"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 24,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>VIF</th>\n",
-       "      <th>Features</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <td>0</td>\n",
-       "      <td>2.285568</td>\n",
-       "      <td>lead_time</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>1</td>\n",
-       "      <td>1.033605</td>\n",
-       "      <td>is_repeated_guest</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>2</td>\n",
-       "      <td>3.354523</td>\n",
-       "      <td>adults</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>3</td>\n",
-       "      <td>1.143147</td>\n",
-       "      <td>booking_changes</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>4</td>\n",
-       "      <td>1.037159</td>\n",
-       "      <td>previous_cancellations</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>5</td>\n",
-       "      <td>1.759416</td>\n",
-       "      <td>is_canceled</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>6</td>\n",
-       "      <td>2.680081</td>\n",
-       "      <td>stays_in_week_nights</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>7</td>\n",
-       "      <td>1.015332</td>\n",
-       "      <td>babies</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>8</td>\n",
-       "      <td>1.049190</td>\n",
-       "      <td>days_in_waiting_list</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "        VIF                Features\n",
-       "0  2.285568               lead_time\n",
-       "1  1.033605       is_repeated_guest\n",
-       "2  3.354523                  adults\n",
-       "3  1.143147         booking_changes\n",
-       "4  1.037159  previous_cancellations\n",
-       "5  1.759416             is_canceled\n",
-       "6  2.680081    stays_in_week_nights\n",
-       "7  1.015332                  babies\n",
-       "8  1.049190    days_in_waiting_list"
-      ]
-     },
-     "execution_count": 24,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "vif"
-   ]
-  },
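As a rough illustration of what the VIF table above measures: for predictor j, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on the remaining predictors. With exactly two predictors, R^2 equals the squared Pearson correlation, so that special case can be sketched in plain Python. The helper names and sample values below are hypothetical and are not taken from the notebook. As far as I know, statsmodels' `variance_inflation_factor` does not add an intercept column on its own, so this sketch (which centers the data) is not expected to reproduce the table values.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def vif_two_predictors(xs, ys):
    """VIF for the two-predictor case, where R^2 is the squared correlation."""
    r_squared = pearson_r(xs, ys) ** 2
    return 1.0 / (1.0 - r_squared)


# Hypothetical toy values, for illustration only
lead_time = [10, 20, 30, 40, 50]
adults = [1, 2, 2, 3, 2]

vif_value = vif_two_predictors(lead_time, adults)
print(round(vif_value, 3))
```

A value near 1 indicates that the predictor shares little linear information with the other one; common rules of thumb treat values above roughly 5 to 10 as a sign of problematic collinearity.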
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Deskriptiv Analyse",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 2.5. Descriptive Analysis"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 25,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "array([[<matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37350C08>,\n",
-       "        <matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37C47A88>,\n",
-       "        <matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37C572C8>],\n",
-       "       [<matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37C913C8>,\n",
-       "        <matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37CC8508>,\n",
-       "        <matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37D02608>],\n",
-       "       [<matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37D3B708>,\n",
-       "        <matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37D73808>,\n",
-       "        <matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37D80408>],\n",
-       "       [<matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37DB85C8>,\n",
-       "        <matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37E1EB48>,\n",
-       "        <matplotlib.axes._subplots.AxesSubplot object at 0x000001DE37E54D08>]],\n",
-       "      dtype=object)"
-      ]
-     },
-     "execution_count": 25,
-     "metadata": {},
-     "output_type": "execute_result"
-    },
-    {
-     "data": {
-      "image/png": "\n",
-      "text/plain": [
-       "<Figure size 1800x1800 with 12 Axes>"
-      ]
-     },
-     "metadata": {
-      "needs_background": "light"
-     },
-     "output_type": "display_data"
-    }
-   ],
-   "source": [
-    "df.hist(figsize=(25,25), bins=50)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Datenaufbereitung",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "# 3. Data Preparation"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "First, the data types are checked after the data has been read into the notebook. Import errors are corrected accordingly.\n",
-    "Dimensionality reduction: attributes without a description are removed. Missing data: rows with missing values are removed.\n",
-    "Data conversion: dummy variables are created."
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Erfassung kategorialer Variablen",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 3.1. Recoding of Categorical Variables"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 26,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>is_canceled</th>\n",
-       "      <th>lead_time</th>\n",
-       "      <th>stays_in_week_nights</th>\n",
-       "      <th>adults</th>\n",
-       "      <th>children</th>\n",
-       "      <th>babies</th>\n",
-       "      <th>is_repeated_guest</th>\n",
-       "      <th>previous_cancellations</th>\n",
-       "      <th>booking_changes</th>\n",
-       "      <th>days_in_waiting_list</th>\n",
-       "      <th>...</th>\n",
-       "      <th>assigned_room_type_H</th>\n",
-       "      <th>assigned_room_type_I</th>\n",
-       "      <th>assigned_room_type_K</th>\n",
-       "      <th>assigned_room_type_L</th>\n",
-       "      <th>assigned_room_type_P</th>\n",
-       "      <th>deposit_type_Non Refund</th>\n",
-       "      <th>deposit_type_Refundable</th>\n",
-       "      <th>customer_type_Group</th>\n",
-       "      <th>customer_type_Transient</th>\n",
-       "      <th>customer_type_Transient-Party</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>342</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "      <td>737</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>4</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>2</td>\n",
-       "      <td>0</td>\n",
-       "      <td>7</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>3</td>\n",
-       "      <td>0</td>\n",
-       "      <td>13</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>4</td>\n",
-       "      <td>0</td>\n",
-       "      <td>14</td>\n",
-       "      <td>2</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>5 rows × 226 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   is_canceled  lead_time  stays_in_week_nights  adults  children  babies  \\\n",
-       "0            0        342                     0       2       0.0       0   \n",
-       "1            0        737                     0       2       0.0       0   \n",
-       "2            0          7                     1       1       0.0       0   \n",
-       "3            0         13                     1       1       0.0       0   \n",
-       "4            0         14                     2       2       0.0       0   \n",
-       "\n",
-       "   is_repeated_guest  previous_cancellations  booking_changes  \\\n",
-       "0                  0                       0                3   \n",
-       "1                  0                       0                4   \n",
-       "2                  0                       0                0   \n",
-       "3                  0                       0                0   \n",
-       "4                  0                       0                0   \n",
-       "\n",
-       "   days_in_waiting_list  ...  assigned_room_type_H  assigned_room_type_I  \\\n",
-       "0                     0  ...                     0                     0   \n",
-       "1                     0  ...                     0                     0   \n",
-       "2                     0  ...                     0                     0   \n",
-       "3                     0  ...                     0                     0   \n",
-       "4                     0  ...                     0                     0   \n",
-       "\n",
-       "   assigned_room_type_K  assigned_room_type_L  assigned_room_type_P  \\\n",
-       "0                     0                     0                     0   \n",
-       "1                     0                     0                     0   \n",
-       "2                     0                     0                     0   \n",
-       "3                     0                     0                     0   \n",
-       "4                     0                     0                     0   \n",
-       "\n",
-       "   deposit_type_Non Refund  deposit_type_Refundable  customer_type_Group  \\\n",
-       "0                        0                        0                    0   \n",
-       "1                        0                        0                    0   \n",
-       "2                        0                        0                    0   \n",
-       "3                        0                        0                    0   \n",
-       "4                        0                        0                    0   \n",
-       "\n",
-       "   customer_type_Transient  customer_type_Transient-Party  \n",
-       "0                        1                              0  \n",
-       "1                        1                              0  \n",
-       "2                        1                              0  \n",
-       "3                        1                              0  \n",
-       "4                        1                              0  \n",
-       "\n",
-       "[5 rows x 226 columns]"
-      ]
-     },
-     "execution_count": 26,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "df_dummies = pd.get_dummies(df, drop_first=True) # 0-1 encoding for categorical values\n",
-    "df_dummies.head()"
-   ]
-  },
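The effect of `drop_first=True` in the cell above can be sketched without pandas: each categorical column becomes one 0/1 column per category, except for the first category in sorted order, which serves as the implicit baseline. The function name and the sample values below are hypothetical, and the sketch assumes that categories are ordered alphabetically, which matches the column names shown in the output (`deposit_type_Non Refund`, `deposit_type_Refundable`).

```python
def one_hot_drop_first(values, prefix):
    """0/1-encode a list of category labels, dropping the first sorted category."""
    categories = sorted(set(values))
    kept = categories[1:]  # the first category becomes the all-zeros baseline
    return [
        {f"{prefix}_{category}": int(value == category) for category in kept}
        for value in values
    ]


# Hypothetical sample column, for illustration only
deposit_type = ["No Deposit", "Non Refund", "Refundable", "No Deposit"]

encoded = one_hot_drop_first(deposit_type, prefix="deposit_type")
for row in encoded:
    print(row)
```

Dropping one level per variable avoids perfectly collinear dummy columns, which matters mainly for the logistic regression; tree-based models are less sensitive to it.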
-  {
-   "cell_type": "code",
-   "execution_count": 27,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "#df_dummies.to_csv('train_dummies.csv', index = False) "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 28,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Int64Index([     0,      1,      2,      3,      4,      5,      6,      7,\n",
-       "                 8,      9,\n",
-       "            ...\n",
-       "            119380, 119381, 119382, 119383, 119384, 119385, 119386, 119387,\n",
-       "            119388, 119389],\n",
-       "           dtype='int64', length=118898)"
-      ]
-     },
-     "execution_count": 28,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "df_dummies.axes[0]"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 29,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Index(['is_canceled', 'lead_time', 'stays_in_week_nights', 'adults',\n",
-       "       'children', 'babies', 'is_repeated_guest', 'previous_cancellations',\n",
-       "       'booking_changes', 'days_in_waiting_list',\n",
-       "       ...\n",
-       "       'assigned_room_type_H', 'assigned_room_type_I', 'assigned_room_type_K',\n",
-       "       'assigned_room_type_L', 'assigned_room_type_P',\n",
-       "       'deposit_type_Non Refund', 'deposit_type_Refundable',\n",
-       "       'customer_type_Group', 'customer_type_Transient',\n",
-       "       'customer_type_Transient-Party'],\n",
-       "      dtype='object', length=226)"
-      ]
-     },
-     "execution_count": 29,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "df_dummies.axes[1]"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Modeling and Evaluation",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "# 4. Modeling and Evaluation"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": null,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "The dataset with its dummy variables is loaded and split into a training set and a test set.\n",
-    "Training and testing are then carried out and evaluated with three different algorithms: logistic regression, decision tree,\n",
-    "and random forest.\n",
-    "For evaluation, hyperparameters:\n",
-    "Output: supervised learning, classification\n",
-    "Data split: 80% training data, 20% test data.\n",
-    "Evaluation metrics, DecisionTree: accuracy = 0.82, recall = 0.74, precision = 0.76.\n",
-    "Evaluation metrics, logistic regression: accuracy = 0.78, recall = 0.56,\n",
-    "precision = 0.78.\n",
-    "Evaluation metrics, random forest: accuracy = 0.82, recall = 0.57, precision = 0.90.\n"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Test and Train Data",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 4.1. Test and Train Data"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 30,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [],
-   "source": [
-    "target = df_dummies['is_canceled'] # feature to be predicted\n",
-    "predictors = df_dummies.drop(['is_canceled'], axis = 1) # all other features are used as predictors"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 31,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/html": [
-       "<div>\n",
-       "<style scoped>\n",
-       "    .dataframe tbody tr th:only-of-type {\n",
-       "        vertical-align: middle;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe tbody tr th {\n",
-       "        vertical-align: top;\n",
-       "    }\n",
-       "\n",
-       "    .dataframe thead th {\n",
-       "        text-align: right;\n",
-       "    }\n",
-       "</style>\n",
-       "<table border=\"1\" class=\"dataframe\">\n",
-       "  <thead>\n",
-       "    <tr style=\"text-align: right;\">\n",
-       "      <th></th>\n",
-       "      <th>lead_time</th>\n",
-       "      <th>stays_in_week_nights</th>\n",
-       "      <th>adults</th>\n",
-       "      <th>children</th>\n",
-       "      <th>babies</th>\n",
-       "      <th>is_repeated_guest</th>\n",
-       "      <th>previous_cancellations</th>\n",
-       "      <th>booking_changes</th>\n",
-       "      <th>days_in_waiting_list</th>\n",
-       "      <th>hotel_Resort Hotel</th>\n",
-       "      <th>...</th>\n",
-       "      <th>assigned_room_type_H</th>\n",
-       "      <th>assigned_room_type_I</th>\n",
-       "      <th>assigned_room_type_K</th>\n",
-       "      <th>assigned_room_type_L</th>\n",
-       "      <th>assigned_room_type_P</th>\n",
-       "      <th>deposit_type_Non Refund</th>\n",
-       "      <th>deposit_type_Refundable</th>\n",
-       "      <th>customer_type_Group</th>\n",
-       "      <th>customer_type_Transient</th>\n",
-       "      <th>customer_type_Transient-Party</th>\n",
-       "    </tr>\n",
-       "  </thead>\n",
-       "  <tbody>\n",
-       "    <tr>\n",
-       "      <td>0</td>\n",
-       "      <td>342</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>3</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>1</td>\n",
-       "      <td>737</td>\n",
-       "      <td>0</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>4</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>2</td>\n",
-       "      <td>7</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>3</td>\n",
-       "      <td>13</td>\n",
-       "      <td>1</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "    <tr>\n",
-       "      <td>4</td>\n",
-       "      <td>14</td>\n",
-       "      <td>2</td>\n",
-       "      <td>2</td>\n",
-       "      <td>0.0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>...</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>0</td>\n",
-       "      <td>1</td>\n",
-       "      <td>0</td>\n",
-       "    </tr>\n",
-       "  </tbody>\n",
-       "</table>\n",
-       "<p>5 rows × 225 columns</p>\n",
-       "</div>"
-      ],
-      "text/plain": [
-       "   lead_time  stays_in_week_nights  adults  children  babies  \\\n",
-       "0        342                     0       2       0.0       0   \n",
-       "1        737                     0       2       0.0       0   \n",
-       "2          7                     1       1       0.0       0   \n",
-       "3         13                     1       1       0.0       0   \n",
-       "4         14                     2       2       0.0       0   \n",
-       "\n",
-       "   is_repeated_guest  previous_cancellations  booking_changes  \\\n",
-       "0                  0                       0                3   \n",
-       "1                  0                       0                4   \n",
-       "2                  0                       0                0   \n",
-       "3                  0                       0                0   \n",
-       "4                  0                       0                0   \n",
-       "\n",
-       "   days_in_waiting_list  hotel_Resort Hotel  ...  assigned_room_type_H  \\\n",
-       "0                     0                   1  ...                     0   \n",
-       "1                     0                   1  ...                     0   \n",
-       "2                     0                   1  ...                     0   \n",
-       "3                     0                   1  ...                     0   \n",
-       "4                     0                   1  ...                     0   \n",
-       "\n",
-       "   assigned_room_type_I  assigned_room_type_K  assigned_room_type_L  \\\n",
-       "0                     0                     0                     0   \n",
-       "1                     0                     0                     0   \n",
-       "2                     0                     0                     0   \n",
-       "3                     0                     0                     0   \n",
-       "4                     0                     0                     0   \n",
-       "\n",
-       "   assigned_room_type_P  deposit_type_Non Refund  deposit_type_Refundable  \\\n",
-       "0                     0                        0                        0   \n",
-       "1                     0                        0                        0   \n",
-       "2                     0                        0                        0   \n",
-       "3                     0                        0                        0   \n",
-       "4                     0                        0                        0   \n",
-       "\n",
-       "   customer_type_Group  customer_type_Transient  customer_type_Transient-Party  \n",
-       "0                    0                        1                              0  \n",
-       "1                    0                        1                              0  \n",
-       "2                    0                        1                              0  \n",
-       "3                    0                        1                              0  \n",
-       "4                    0                        1                              0  \n",
-       "\n",
-       "[5 rows x 225 columns]"
-      ]
-     },
-     "execution_count": 31,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "predictors.head()"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 32,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [],
-   "source": [
-    "X_train, X_test, y_train, y_test = train_test_split(predictors, target, test_size=0.2, random_state=123)"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "DecisionTree",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 4.2. DecisionTree"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 33,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "DecisionTreeClassifier()"
-      ]
-     },
-     "execution_count": 33,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "tree = DecisionTreeClassifier()\n",
-    "tree.fit(X_train, y_train)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 34,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "12983 2041 2295 6461\n"
-     ]
-    }
-   ],
-   "source": [
-    "tn, fp, fn, tp = confusion_matrix(y_test, tree.predict(X_test)).ravel() \n",
-    "print(tn, fp, fn, tp)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 35,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "              precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.97      0.99      0.98     59721\n",
-      "           1       0.99      0.94      0.97     35397\n",
-      "\n",
-      "    accuracy                           0.98     95118\n",
-      "   macro avg       0.98      0.97      0.97     95118\n",
-      "weighted avg       0.98      0.98      0.98     95118\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(classification_report(y_train, tree.predict(X_train)))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 36,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "              precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.85      0.86      0.86     15024\n",
-      "           1       0.76      0.74      0.75      8756\n",
-      "\n",
-      "    accuracy                           0.82     23780\n",
-      "   macro avg       0.80      0.80      0.80     23780\n",
-      "weighted avg       0.82      0.82      0.82     23780\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(classification_report(y_test, tree.predict(X_test)))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "paragraph": "Logistic Regression",
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 4.3. Logistic Regression "
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 37,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "name": "stderr",
-     "output_type": "stream",
-     "text": [
-      "C:\\Users\\alexa\\Anaconda3\\lib\\site-packages\\sklearn\\linear_model\\_logistic.py:765: ConvergenceWarning: lbfgs failed to converge (status=1):\n",
-      "STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.\n",
-      "\n",
-      "Increase the number of iterations (max_iter) or scale the data as shown in:\n",
-      "    https://scikit-learn.org/stable/modules/preprocessing.html\n",
-      "Please also refer to the documentation for alternative solver options:\n",
-      "    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression\n",
-      "  extra_warning_msg=_LOGISTIC_SOLVER_CONVERGENCE_MSG)\n"
-     ]
-    },
-    {
-     "data": {
-      "text/plain": [
-       "LogisticRegression()"
-      ]
-     },
-     "execution_count": 37,
-     "metadata": {},
-     "output_type": "execute_result"
-    }
-   ],
-   "source": [
-    "logreg = LogisticRegression()\n",
-    "logreg.fit(X_train, y_train)"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 38,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "[[13634  1390]\n",
-      " [ 3870  4886]]\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(confusion_matrix(y_test, logreg.predict(X_test)))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 39,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "data": {
-      "text/plain": [
-       "Text(0.5, 39.5, 'Predicted label')"
-      ]
-     },
-     "execution_count": 39,
-     "metadata": {},
-     "output_type": "execute_result"
-    },
-    {
-     "data": {
-      "image/png": "\n",
-      "text/plain": [
-       "<Figure size 720x504 with 2 Axes>"
-      ]
-     },
-     "metadata": {
-      "needs_background": "light"
-     },
-     "output_type": "display_data"
-    }
-   ],
-   "source": [
-    "conf_mat = confusion_matrix(y_test, logreg.predict(X_test))\n",
-    "df_cm = pd.DataFrame(conf_mat, index=['0','1'], columns=['0', '1'],)\n",
-    "fig = plt.figure(figsize=[10,7])\n",
-    "heatmap = sns.heatmap(df_cm, annot=True, fmt=\"d\")\n",
-    "heatmap.yaxis.set_ticklabels(heatmap.yaxis.get_ticklabels(), rotation=0, ha='right', fontsize=14)\n",
-    "heatmap.xaxis.set_ticklabels(heatmap.xaxis.get_ticklabels(), rotation=45, ha='right', fontsize=14)\n",
-    "plt.ylabel('True label')\n",
-    "plt.xlabel('Predicted label')"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 40,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "              precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.78      0.91      0.84     15024\n",
-      "           1       0.78      0.56      0.65      8756\n",
-      "\n",
-      "    accuracy                           0.78     23780\n",
-      "   macro avg       0.78      0.73      0.74     23780\n",
-      "weighted avg       0.78      0.78      0.77     23780\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(classification_report(y_test, logreg.predict(X_test)))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 41,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "              precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.78      0.91      0.84     59721\n",
-      "           1       0.78      0.56      0.66     35397\n",
-      "\n",
-      "    accuracy                           0.78     95118\n",
-      "   macro avg       0.78      0.74      0.75     95118\n",
-      "weighted avg       0.78      0.78      0.77     95118\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "print(classification_report(y_train, logreg.predict(X_train)))"
-   ]
-  },
-  {
-   "cell_type": "markdown",
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": []
-   },
-   "source": [
-    "## 4.4. Random Forest"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 42,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Max tree depth:  5\n",
-      "Train results:                precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.73      1.00      0.84     59721\n",
-      "           1       1.00      0.37      0.54     35397\n",
-      "\n",
-      "    accuracy                           0.77     95118\n",
-      "   macro avg       0.86      0.69      0.69     95118\n",
-      "weighted avg       0.83      0.77      0.73     95118\n",
-      "\n",
-      "Test results:                precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.73      1.00      0.84     15024\n",
-      "           1       1.00      0.36      0.53      8756\n",
-      "\n",
-      "    accuracy                           0.76     23780\n",
-      "   macro avg       0.86      0.68      0.69     23780\n",
-      "weighted avg       0.83      0.76      0.73     23780\n",
-      "\n",
-      "Max tree depth:  10\n",
-      "Train results:                precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.75      0.98      0.85     59721\n",
-      "           1       0.94      0.45      0.61     35397\n",
-      "\n",
-      "    accuracy                           0.78     95118\n",
-      "   macro avg       0.85      0.71      0.73     95118\n",
-      "weighted avg       0.82      0.78      0.76     95118\n",
-      "\n",
-      "Test results:                precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.75      0.98      0.85     15024\n",
-      "           1       0.94      0.44      0.59      8756\n",
-      "\n",
-      "    accuracy                           0.78     23780\n",
-      "   macro avg       0.84      0.71      0.72     23780\n",
-      "weighted avg       0.82      0.78      0.76     23780\n",
-      "\n",
-      "Max tree depth:  20\n",
-      "Train results:                precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.81      0.97      0.88     59721\n",
-      "           1       0.92      0.62      0.74     35397\n",
-      "\n",
-      "    accuracy                           0.84     95118\n",
-      "   macro avg       0.87      0.79      0.81     95118\n",
-      "weighted avg       0.85      0.84      0.83     95118\n",
-      "\n",
-      "Test results:                precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.80      0.96      0.87     15024\n",
-      "           1       0.90      0.58      0.70      8756\n",
-      "\n",
-      "    accuracy                           0.82     23780\n",
-      "   macro avg       0.85      0.77      0.79     23780\n",
-      "weighted avg       0.83      0.82      0.81     23780\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "tree_depth = [5, 10, 20]\n",
-    "for i in tree_depth:\n",
-    "    rf = RandomForestClassifier(max_depth=i)\n",
-    "    rf.fit(X_train, y_train)\n",
-    "    print('Max tree depth: ', i)\n",
-    "    print('Train results: ', classification_report(y_train, rf.predict(X_train)))\n",
-    "    print('Test results: ',classification_report(y_test, rf.predict(X_test)))"
-   ]
-  },
-  {
-   "cell_type": "code",
-   "execution_count": 43,
-   "metadata": {
-    "editable": true,
-    "include": true,
-    "slideshow": {
-     "slide_type": ""
-    },
-    "tags": [
-     "active_ipynb"
-    ]
-   },
-   "outputs": [
-    {
-     "name": "stdout",
-     "output_type": "stream",
-     "text": [
-      "Max tree depth:  20\n",
-      "Train results:                precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.80      0.98      0.88     59721\n",
-      "           1       0.94      0.60      0.73     35397\n",
-      "\n",
-      "    accuracy                           0.84     95118\n",
-      "   macro avg       0.87      0.79      0.81     95118\n",
-      "weighted avg       0.85      0.84      0.83     95118\n",
-      "\n",
-      "Test results:                precision    recall  f1-score   support\n",
-      "\n",
-      "           0       0.79      0.97      0.87     15024\n",
-      "           1       0.91      0.56      0.69      8756\n",
-      "\n",
-      "    accuracy                           0.82     23780\n",
-      "   macro avg       0.85      0.76      0.78     23780\n",
-      "weighted avg       0.83      0.82      0.80     23780\n",
-      "\n"
-     ]
-    }
-   ],
-   "source": [
-    "rf = RandomForestClassifier(max_depth=20)\n",
-    "rf.fit(X_train, y_train)\n",
-    "print('Max tree depth: ', rf.max_depth)\n",
-    "print('Train results: ', classification_report(y_train, rf.predict(X_train)))\n",
-    "print('Test results: ',classification_report(y_test, rf.predict(X_test)))"
-   ]
-  }
- ],
- "metadata": {
-  "branche": "Tourism",
-  "category": "Hotel",
-  "dataSource": "https://www.kaggle.com/jessemostipak/hotel-booking-demand",
-  "funktion": "Market research",
-  "kernelspec": {
-   "display_name": "Python 3 (ipykernel)",
-   "language": "python",
-   "name": "python3"
-  },
-  "language_info": {
-   "codemirror_mode": {
-    "name": "ipython",
-    "version": 3
-   },
-   "file_extension": ".py",
-   "mimetype": "text/x-python",
-   "name": "python",
-   "nbconvert_exporter": "python",
-   "pygments_lexer": "ipython3",
-   "version": "3.12.3"
-  },
-  "repoLink": "https://github.com/AlexRossmann/machine-learning-services/tree/main/Marketing/Generation%20of%20Individual%20Playlists",
-  "skipNotebookInDeployment": false,
-  "teaser": "An important goal for every company is to maximize customer satisfaction and profit. It is problematic when customers suddenly cancel their bookings without warning. This can lead not only to lost profit and negative reviews for the hotel but also to economic uncertainty. Estimating customer cancellation behavior in advance is therefore necessary for capacity planning.",
-  "title": "Predicting Booking Cancellations"
- },
- "nbformat": 4,
- "nbformat_minor": 4
-}
-- 
GitLab
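
Note on the removed notebook's metrics: the decision-tree cell prints the confusion-matrix counts `12983 2041 2295 6461` (tn, fp, fn, tp), and the summary cell reports accuracy 0.82, recall 0.74 and precision 0.76 for that model. The sketch below shows how those three figures follow from the counts. The helper name `binary_metrics` is hypothetical and not part of the notebook; it only restates the standard definitions for the positive class (`is_canceled == 1`).

```python
# Counts printed by the decision-tree cell, in sklearn's ravel() order.
tn, fp, fn, tp = 12983, 2041, 2295, 6461


def binary_metrics(tn, fp, fn, tp):
    """Return (accuracy, precision, recall) for the positive class.

    Hypothetical helper for illustration; the notebook itself uses
    sklearn's classification_report instead.
    """
    total = tn + fp + fn + tp
    accuracy = (tp + tn) / total   # share of all bookings classified correctly
    precision = tp / (tp + fp)     # share of predicted cancellations that were real
    recall = tp / (tp + fn)        # share of real cancellations that were found
    return accuracy, precision, recall


accuracy, precision, recall = binary_metrics(tn, fp, fn, tp)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Rounded to two decimals, the results agree with the class-1 row of the test-set classification report in the same cell group.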